Wei Fu

ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

Jun 20, 2024
Via arXiv

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Apr 16, 2024
Via arXiv

Learning Agile Bipedal Motions on a Quadrupedal Robot

Nov 10, 2023
Via arXiv

Iteratively Learn Diverse Strategies with State Distance Information

Oct 23, 2023
Via arXiv

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Jul 05, 2023
Via arXiv

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Jun 15, 2022
Via arXiv

Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization

Apr 04, 2022
Via arXiv

How to "DODGE" Complex Software Analytics?

Feb 05, 2019
Via arXiv

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

Feb 20, 2018
Via arXiv

500+ Times Faster Than Deep Learning (A Case Study Exploring Faster Methods for Text Mining StackOverflow)

Feb 14, 2018
Via arXiv