Rui Shao

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jul 03, 2025

Multimodal Large Language Models-Enabled UAV Swarm: Towards Efficient and Intelligent Autonomous Aerial Systems

Jun 15, 2025

Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts

Jun 12, 2025

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

Jun 12, 2025

STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization

Jun 04, 2025

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent

May 22, 2025

DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer

Apr 28, 2025

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

Mar 13, 2025

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs

Mar 13, 2025

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Mar 05, 2025