Picture for Zhibo Yang

Zhibo Yang

Triviality Corrected Endogenous Reward

Add code
Apr 13, 2026
Viaarxiv icon

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos

Add code
Mar 18, 2026
Viaarxiv icon

CodePercept: Code-Grounded Visual STEM Perception for MLLMs

Add code
Mar 11, 2026
Viaarxiv icon

From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning

Add code
Mar 04, 2026
Viaarxiv icon

UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents

Add code
Feb 03, 2026
Viaarxiv icon

BabyVision: Visual Reasoning Beyond Language

Add code
Jan 10, 2026
Viaarxiv icon

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

Add code
Jan 08, 2026
Viaarxiv icon

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Add code
Aug 12, 2025
Viaarxiv icon

MAER-Nav: Bidirectional Motion Learning Through Mirror-Augmented Experience Replay for Robot Navigation

Add code
Mar 31, 2025
Figure 1 for MAER-Nav: Bidirectional Motion Learning Through Mirror-Augmented Experience Replay for Robot Navigation
Figure 2 for MAER-Nav: Bidirectional Motion Learning Through Mirror-Augmented Experience Replay for Robot Navigation
Figure 3 for MAER-Nav: Bidirectional Motion Learning Through Mirror-Augmented Experience Replay for Robot Navigation
Figure 4 for MAER-Nav: Bidirectional Motion Learning Through Mirror-Augmented Experience Replay for Robot Navigation
Viaarxiv icon

Generative Compositor for Few-Shot Visual Information Extraction

Add code
Mar 21, 2025
Viaarxiv icon