
Wenwei Zhang

RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy

Mar 31, 2025
Via arXiv

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Mar 27, 2025
Via arXiv

SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining

Mar 25, 2025
Via arXiv

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs

Mar 04, 2025
Via arXiv

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Feb 10, 2025
Via arXiv

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Jan 21, 2025
Via arXiv

LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving

Jan 07, 2025
Via arXiv

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Jan 07, 2025
Via arXiv

Are Your LLMs Capable of Stable Reasoning?

Dec 17, 2024
Via arXiv

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Dec 12, 2024
Via arXiv