Rui Shao

SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation

Nov 13, 2025

CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification

Aug 28, 2025

Incorporating Legal Logic into Deep Learning: An Intelligent Approach to Probation Prediction

Aug 17, 2025

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jul 03, 2025

Multimodal Large Language Models-Enabled UAV Swarm: Towards Efficient and Intelligent Autonomous Aerial Systems

Jun 15, 2025

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

Jun 12, 2025

Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts

Jun 12, 2025

STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization

Jun 04, 2025

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent

May 22, 2025

DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer

Apr 28, 2025