Botian Shi

ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm

Jun 11, 2026
Via arXiv

IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing

Jun 11, 2026
Via arXiv

EviProp: Seeded Relevance Diffusion on Chunk-Page Graphs for Long Multimodal Document Retrieval

Jun 08, 2026
Via arXiv

IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval

Jun 04, 2026
Via arXiv

SPIRAL: A Closed-Loop Framework for Self-Improving Action World Models via Reflective Planning Agents

Mar 11, 2026
Via arXiv

Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding

Feb 13, 2026
Via arXiv

The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios

Jan 13, 2026
Via arXiv

SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration

Dec 25, 2025
Via arXiv

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

Oct 09, 2025
Via arXiv

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025
Via arXiv