Florian Bordes

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Jan 08, 2026

What's in Common? Multimodal Models Hallucinate When Reasoning Across Scenes

Nov 05, 2025

IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments

Jun 11, 2025

Measuring Déjà vu Memorization Efficiently

Apr 08, 2025

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Feb 21, 2025

Object-centric Binding in Contrastive Language-Image Pretraining

Feb 19, 2025

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Oct 22, 2024

An Introduction to Vision-Language Modeling

May 27, 2024

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Dec 14, 2023

Feedback-guided Data Synthesis for Imbalanced Classification

Sep 29, 2023