Fei Mi and Other Contributors

Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

Dec 23, 2025

Rethinking Expert Trajectory Utilization in LLM Post-training

Dec 12, 2025

The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs

Jul 10, 2025

ClusterUCB: Efficient Gradient-Based Data Selection for Targeted Fine-Tuning of LLMs

Jun 12, 2025

Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition

May 29, 2025

Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning

May 28, 2025

Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity

May 28, 2025

How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study

May 21, 2025

Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs

Apr 10, 2025

DAST: Difficulty-Aware Self-Training on Large Language Models

Mar 12, 2025