Botian Shi

O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering

May 22, 2025

GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling

Apr 30, 2025

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

Apr 22, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025

RAKG: Document-level Retrieval Augmented Knowledge Graph Construction

Apr 14, 2025

OmniCaptioner: One Captioner to Rule Them All

Apr 09, 2025

Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning

Mar 17, 2025

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Mar 10, 2025

LimSim Series: An Autonomous Driving Simulation Platform for Validation and Enhancement

Feb 13, 2025

LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking

Jan 14, 2025