Alert button
Picture for Pierre Harvey Richemond

Pierre Harvey Richemond

Alert button

Human Alignment of Large Language Models through Online Preference Optimisation

Add code
Bookmark button
Alert button
Mar 13, 2024
Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Figure 1 for Human Alignment of Large Language Models through Online Preference Optimisation
Figure 2 for Human Alignment of Large Language Models through Online Preference Optimisation
Figure 3 for Human Alignment of Large Language Models through Online Preference Optimisation
Figure 4 for Human Alignment of Large Language Models through Online Preference Optimisation
Viaarxiv icon

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Add code
Bookmark button
Alert button
Feb 08, 2024
Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Viaarxiv icon

Understanding Self-Predictive Learning for Reinforcement Learning

Add code
Bookmark button
Alert button
Dec 06, 2022
Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

Figure 1 for Understanding Self-Predictive Learning for Reinforcement Learning
Figure 2 for Understanding Self-Predictive Learning for Reinforcement Learning
Figure 3 for Understanding Self-Predictive Learning for Reinforcement Learning
Figure 4 for Understanding Self-Predictive Learning for Reinforcement Learning
Viaarxiv icon