Ran He

OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction

Apr 12, 2026
Via arXiv

Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

Apr 09, 2026
Via arXiv

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Apr 06, 2026
Via arXiv

MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding

Mar 24, 2026
Via arXiv

Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth

Mar 24, 2026
Via arXiv

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Mar 23, 2026
Via arXiv

What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time

Mar 20, 2026
Via arXiv

GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection?

Mar 19, 2026
Via arXiv

PACED: Distillation and Self-Distillation at the Frontier of Student Competence

Mar 16, 2026
Via arXiv

PACED: Distillation at the Frontier of Student Competence

Mar 11, 2026
Via arXiv