Bohan Zeng

Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

Oct 22, 2025 (via arXiv)

MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning

Oct 16, 2025 (via arXiv)

Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models

Jun 15, 2025 (via arXiv)

VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks

Jun 10, 2025 (via arXiv)

Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification

Jun 08, 2025 (via arXiv)

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

May 27, 2025 (via arXiv)

Let's Verify Math Questions Step by Step

May 20, 2025 (via arXiv)

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Apr 14, 2025 (via arXiv)

WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes

Mar 17, 2025 (via arXiv)

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

Jan 27, 2025 (via arXiv)