Picture for Boyi Li

Boyi Li

Counterfactual VLA: Self-Reflective Vision-Language-Action Model with Adaptive Reasoning

Add code
Dec 30, 2025
Viaarxiv icon

Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving

Add code
Dec 12, 2025
Viaarxiv icon

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos

Add code
Dec 11, 2025
Figure 1 for FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos
Figure 2 for FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos
Figure 3 for FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos
Figure 4 for FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos
Viaarxiv icon

AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond

Add code
Sep 30, 2025
Viaarxiv icon

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation

Add code
Sep 09, 2025
Figure 1 for Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Figure 2 for Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Figure 3 for Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Figure 4 for Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Viaarxiv icon

MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real

Add code
Jul 03, 2025
Viaarxiv icon

Describe Anything: Detailed Localized Image and Video Captioning

Add code
Apr 22, 2025
Viaarxiv icon

Optimization of MedSAM model based on bounding box adaptive perturbation algorithm

Add code
Mar 25, 2025
Viaarxiv icon

Scaling Vision Pre-Training to 4K Resolution

Add code
Mar 25, 2025
Viaarxiv icon

Atlas: Multi-Scale Attention Improves Long Context Image Modeling

Add code
Mar 16, 2025
Viaarxiv icon