Yongming Rao

ViQ: Text-Aligned Visual Quantized Representations at Any Resolution

Jun 25, 2026
Via arXiv

Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Jun 12, 2026
Via arXiv

GEM: Generative Supervision Helps Embodied Intelligence

May 27, 2026
Via arXiv

HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents

Apr 08, 2026
Via arXiv

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Mar 18, 2026
Via arXiv

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Mar 12, 2026
Via arXiv

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Nov 19, 2025
Via arXiv

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again

Jul 29, 2025
Via arXiv

Vision Generalist Model: A Survey

Jun 11, 2025
Via arXiv

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Jun 05, 2025
Via arXiv