Haoyu Cao

When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition

Mar 17, 2026

Can Unified Generation and Understanding Models Maintain Semantic Equivalence Across Different Output Modalities?

Feb 27, 2026

RISE-Video: Can Video Generators Decode Implicit World Rules?

Feb 05, 2026

Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding

Jan 28, 2026

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Jan 27, 2026

TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning

Jan 23, 2026

DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model

Dec 14, 2025

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

Oct 10, 2025

CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs

Aug 09, 2025

BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models

Aug 09, 2025