Haitao Mi

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

May 14, 2026
Via arXiv

DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification

May 10, 2026
Via arXiv

Reinforcing Multimodal Reasoning Against Visual Degradation

May 10, 2026
Via arXiv

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

Apr 23, 2026
Via arXiv

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

Apr 20, 2026
Via arXiv

Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data

Apr 20, 2026
Via arXiv

The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

Feb 12, 2026
Via arXiv

Free(): Learning to Forget in Malloc-Only Reasoning Models

Feb 08, 2026
Via arXiv

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Feb 04, 2026
Via arXiv

Verified Critical Step Optimization for LLM Agents

Feb 03, 2026
Via arXiv