Cong Huang

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

May 07, 2026
Via arXiv

DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax

Apr 20, 2026
Via arXiv

3D-Mix for VLA: A Plug-and-Play Module for Integrating VGGT-based 3D Information into Vision-Language-Action Models

Mar 25, 2026
Via arXiv

CyboRacket: A Perception-to-Action Framework for Humanoid Racket Sports

Mar 15, 2026
Via arXiv

Cybo-Waiter: A Physical Agentic Framework for Humanoid Whole-Body Locomotion-Manipulation

Mar 11, 2026
Via arXiv

ScalSelect: Scalable Training-Free Multimodal Data Selection for Efficient Visual Instruction Tuning

Feb 12, 2026
Via arXiv

A²-LLM: An End-to-end Conversational Audio Avatar Large Language Model

Feb 04, 2026
Via arXiv

LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

Jan 27, 2026
Via arXiv

BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

Jan 21, 2026
Via arXiv

TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers

Jan 20, 2026
Via arXiv