Wangmeng Zuo

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Jun 09, 2025
Via arXiv

Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

Jun 04, 2025
Via arXiv

MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM

May 30, 2025
Via arXiv

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

May 28, 2025
Via arXiv

Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning

May 26, 2025
Via arXiv

Multi-Timescale Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning

May 26, 2025
Via arXiv

Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving

May 23, 2025
Via arXiv

High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution

May 11, 2025
Via arXiv

Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting

May 07, 2025
Via arXiv

EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation

Apr 28, 2025
Via arXiv