Yadong Mu

Columbia University

Balancing Image Compression and Generation with Bootstrapped Tokenization

Jun 04, 2026

GOPAgen: Motion-Aware and Efficient Agentic Long-Video Understanding with Structural Memory and Hierarchical Reasoning

Jun 03, 2026

RePlan-Bot: Multi-Level Replanning for Embodied Instruction Following

May 25, 2026

Extending Embodied Question Answering from Perception to Decision

May 25, 2026

RotVLA: Rotational Latent Action for Vision-Language-Action Model

May 13, 2026

RoboAgent: Chaining Basic Capabilities for Embodied Task Planning

Apr 09, 2026

ProgressVLA: Progress-Guided Diffusion Policy for Vision-Language Robotic Manipulation

Mar 29, 2026

Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation

Mar 23, 2026

Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations

Dec 24, 2025

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Jun 05, 2025