Zhongang Cai

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Mar 19, 2026

Demystifing Video Reasoning

Mar 17, 2026

A Very Big Video Reasoning Suite

Feb 24, 2026

VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery

Feb 22, 2026

Scaling Spatial Intelligence with Multimodal Foundation Models

Nov 17, 2025

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

Oct 30, 2025

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Aug 18, 2025

DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

Aug 01, 2025

ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization

May 15, 2025

EgoLife: Towards Egocentric Life Assistant

Mar 05, 2025