Picture for Renrui Zhang

Renrui Zhang

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Add code
Oct 30, 2025
Viaarxiv icon

Can World Models Benefit VLMs for World Dynamics?

Add code
Oct 01, 2025
Viaarxiv icon

MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation

Add code
Sep 30, 2025
Figure 1 for MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
Figure 2 for MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
Figure 3 for MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
Figure 4 for MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
Viaarxiv icon

GLEAM: Learning to Match and Explain in Cross-View Geo-Localization

Add code
Sep 09, 2025
Viaarxiv icon

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Add code
Aug 20, 2025
Viaarxiv icon

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling

Add code
Jul 23, 2025
Viaarxiv icon

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Add code
Jul 02, 2025
Viaarxiv icon

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Add code
Jun 05, 2025
Viaarxiv icon

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Add code
Jun 05, 2025
Figure 1 for Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
Figure 2 for Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
Figure 3 for Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
Figure 4 for Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
Viaarxiv icon

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Add code
May 27, 2025
Viaarxiv icon