Haitao Mi

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Aug 27, 2025

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Aug 07, 2025

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

May 29, 2025

VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models

May 28, 2025

InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing

May 28, 2025

WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback

May 26, 2025

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

May 20, 2025

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

May 19, 2025

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

May 16, 2025

Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation

May 06, 2025