Lifan Yuan

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

May 28, 2025

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

May 21, 2025

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

May 16, 2025

Process Reinforcement through Implicit Rewards

Feb 03, 2025

Free Process Rewards without Process Labels

Dec 02, 2024

Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity

Jun 17, 2024

Advancing LLM Reasoning Generalists with Preference Trees

Apr 02, 2024

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

Feb 29, 2024

Executable Code Actions Elicit Better LLM Agents

Feb 01, 2024

Prudent Silence or Foolish Babble? Examining Large Language Models' Responses to the Unknown

Nov 16, 2023