Lifan Yuan

Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs

Feb 07, 2026
Via arXiv

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

Nov 10, 2025
Via arXiv

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

Oct 01, 2025
Via arXiv

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

May 28, 2025
Via arXiv

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

May 21, 2025
Via arXiv

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

May 16, 2025
Via arXiv

Process Reinforcement through Implicit Rewards

Feb 03, 2025
Via arXiv

Free Process Rewards without Process Labels

Dec 02, 2024
Via arXiv

Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity

Jun 17, 2024
Via arXiv

Advancing LLM Reasoning Generalists with Preference Trees

Apr 02, 2024
Via arXiv