Picture for Botian Shi

Botian Shi

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Add code
Aug 25, 2025
Viaarxiv icon

LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval

Add code
Aug 14, 2025
Viaarxiv icon

MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs

Add code
Aug 07, 2025
Viaarxiv icon

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback

Add code
Jul 28, 2025
Viaarxiv icon

O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering

Add code
May 22, 2025
Viaarxiv icon

GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling

Add code
Apr 30, 2025
Viaarxiv icon

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

Add code
Apr 22, 2025
Viaarxiv icon

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Add code
Apr 15, 2025
Viaarxiv icon

RAKG:Document-level Retrieval Augmented Knowledge Graph Construction

Add code
Apr 14, 2025
Viaarxiv icon

OmniCaptioner: One Captioner to Rule Them All

Add code
Apr 09, 2025
Viaarxiv icon