Shengyi Huang

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Mar 24, 2024, via arXiv

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Feb 05, 2024, via arXiv

Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks

Oct 26, 2023, via arXiv

Zephyr: Direct Distillation of LM Alignment

Oct 25, 2023, via arXiv

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Sep 29, 2023, via arXiv

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Jun 21, 2022, via arXiv

A2C is a special case of PPO

May 18, 2022, via arXiv

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms

Nov 16, 2021, via arXiv

Gym-μRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning

May 21, 2021, via arXiv

Griddly: A platform for AI research in games

Nov 21, 2020, via arXiv