Visual Reasoning


VRIQ: Benchmarking and Analyzing Visual-Reasoning IQ of VLMs

Feb 05, 2026
Via arXiv

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention

Feb 05, 2026
Via arXiv

V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval

Feb 05, 2026
Via arXiv

Multimodal Latent Reasoning via Hierarchical Visual Cues Injection

Feb 05, 2026
Via arXiv

SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

Feb 05, 2026
Via arXiv

Training Data Efficiency in Multimodal Process Reward Models

Feb 05, 2026
Via arXiv

Allocentric Perceiver: Disentangling Allocentric Reasoning from Egocentric Visual Priors via Frame Instantiation

Feb 05, 2026
Via arXiv

Weaver: End-to-End Agentic System Training for Video Interleaved Reasoning

Feb 05, 2026
Via arXiv

Imagine a City: CityGenAgent for Procedural 3D City Generation

Feb 05, 2026
Via arXiv

RISE-Video: Can Video Generators Decode Implicit World Rules?

Feb 05, 2026
Via arXiv