Mingyu Liu

What to Ignore, What to React: Visually Robust RL Fine-Tuning of VLA Models

May 13, 2026

CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs

Apr 22, 2026

LoViF 2026 Challenge on Real-World All-in-One Image Restoration: Methods and Results

Apr 21, 2026

OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering

Apr 09, 2026

SGTA: Scene-Graph Based Multi-Modal Traffic Agent for Video Understanding

Apr 04, 2026

World Guidance: World Modeling in Condition Space for Action Generation

Feb 25, 2026

VideoAfford: Grounding 3D Affordance from Human-Object-Interaction Videos via Multimodal Large Language Model

Feb 10, 2026

StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation

Oct 06, 2025

Learning by Imagining: Debiased Feature Augmentation for Compositional Zero-Shot Learning

Sep 16, 2025

Learning Primitive Embodied World Models: Towards Scalable Robotic Learning

Aug 28, 2025