Wei Fu

How Far Are We from Optimal Reasoning Efficiency?

Jun 08, 2025
Via arXiv

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

May 30, 2025
Via arXiv

On Designing Effective RL Reward at Training Time for LLM Reasoning

Oct 19, 2024
Via arXiv

ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

Jun 20, 2024
Via arXiv

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Apr 16, 2024
Via arXiv

Learning Agile Bipedal Motions on a Quadrupedal Robot

Nov 10, 2023
Via arXiv

Iteratively Learn Diverse Strategies with State Distance Information

Oct 23, 2023
Via arXiv

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Jul 05, 2023
Via arXiv

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Jun 15, 2022
Via arXiv

Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization

Apr 04, 2022
Via arXiv