Dingkang Liang

Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models

May 03, 2026

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

Apr 09, 2026

PointTPA: Dynamic Network Parameter Adaptation for 3D Scene Understanding

Apr 06, 2026

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Mar 26, 2026

Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Mar 19, 2026

Towards Generalizable Robotic Manipulation in Dynamic Environments

Mar 16, 2026

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Mar 12, 2026

ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding

Feb 26, 2026

MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning

Dec 16, 2025

NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding

Oct 31, 2025