Xiang Bai

Huazhong University of Science and Technology

OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward

Aug 27, 2025

SoccerNet 2025 Challenges Results

Aug 26, 2025

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Aug 12, 2025

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

Aug 07, 2025

Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching

Jul 03, 2025

MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling

Jun 12, 2025

PlayerOne: Egocentric World Simulator

Jun 11, 2025

AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation

Jun 11, 2025

MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

Jun 05, 2025

TokBench: Evaluating Your Visual Tokenizer before Visual Generation

May 26, 2025