Weixun Wang

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Mar 24, 2024
Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

Via arXiv

Off-Beat Multi-Agent Reinforcement Learning

May 27, 2022
Wei Qiu, Weixun Wang, Rundong Wang, Bo An, Yujing Hu, Svetlana Obraztsova, Zinovi Rabinovich, Jianye Hao, Yingfeng Chen, Changjie Fan

Via arXiv

A2C is a special case of PPO

May 18, 2022
Shengyi Huang, Anssi Kanervisto, Antonin Raffin, Weixun Wang, Santiago Ontañón, Rousslan Fernand Julien Dossa

Via arXiv

Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents

Mar 16, 2022
Jian Zhao, Youpeng Zhao, Weixun Wang, Mingyu Yang, Xunhan Hu, Wengang Zhou, Jianye Hao, Houqiang Li

Via arXiv

API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks

Mar 10, 2022
Xiaotian Hao, Weixun Wang, Hangyu Mao, Yaodong Yang, Dong Li, Yan Zheng, Zhen Wang, Jianye Hao

Via arXiv

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

Feb 16, 2022
Jian Zhao, Yue Zhang, Xunhan Hu, Weixun Wang, Wengang Zhou, Jianye Hao, Jiangcheng Zhu, Houqiang Li

Via arXiv

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

Jun 03, 2021
Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao

Via arXiv

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Nov 05, 2020
Yujing Hu, Weixun Wang, Hangtian Jia, Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

Via arXiv