Haochen Wang

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Nov 18, 2025

CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models

Nov 15, 2025

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Nov 13, 2025

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Oct 23, 2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025

DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving

Oct 14, 2025

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Jul 10, 2025

Holistic Tokenizer for Autoregressive Image Generation

Jul 03, 2025

VGR: Visual Grounded Reasoning

Jun 16, 2025

FastMap: Revisiting Dense and Scalable Structure from Motion

May 07, 2025