
Zhipeng Xu

CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing

May 05, 2026

Transparent and Controllable Recommendation Filtering via Multimodal Multi-Agent Collaboration

Apr 19, 2026

ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment

Apr 08, 2026

Reasoning-Driven Multimodal LLM for Domain Generalization

Feb 27, 2026

UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents

Feb 03, 2026

MiMo-V2-Flash Technical Report

Jan 08, 2026

MiMo-Audio: Audio Language Models are Few-Shot Learners

Dec 29, 2025

Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

Jul 03, 2025

Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

May 28, 2025

StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning

May 20, 2025