Zhiyu Mei

How Far Are We from Optimal Reasoning Efficiency?

Jun 08, 2025

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

May 30, 2025

On Designing Effective RL Reward at Training Time for LLM Reasoning

Oct 19, 2024

ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

Jun 20, 2024

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Apr 16, 2024

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Jul 05, 2023