Yongming Rao

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Mar 18, 2026

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Mar 12, 2026

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Nov 19, 2025

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again

Jul 29, 2025

Vision Generalist Model: A Survey

Jun 11, 2025

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Jun 05, 2025

Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model

May 26, 2025

R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation

May 04, 2025

BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries

Mar 16, 2025

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Feb 06, 2025