Xiang Bai

Huazhong University of Science and Technology

Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching

Jul 03, 2025

MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling

Jun 12, 2025

AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation

Jun 11, 2025

PlayerOne: Egocentric World Simulator

Jun 11, 2025

MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

Jun 05, 2025

TokBench: Evaluating Your Visual Tokenizer before Visual Generation

May 26, 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?

May 16, 2025

Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving

May 13, 2025

Tetrahedron-Net for Medical Image Registration

May 07, 2025

Visual Text Processing: A Comprehensive Review and Unified Evaluation

Apr 30, 2025