Haitao Mi

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning

Jan 26, 2026

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

Jan 22, 2026

RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis

Jan 13, 2026

DocDancer: Towards Agentic Document-Grounded Information Seeking

Jan 08, 2026

Stable and Efficient Single-Rollout RL for Multimodal Reasoning

Dec 20, 2025

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Dec 17, 2025

SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning

Nov 17, 2025

DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains

Oct 31, 2025

The End of Manual Decoding: Towards Truly End-to-End Language Models

Oct 30, 2025

Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents

Oct 16, 2025