Alert button
Picture for Bilal Piot

Bilal Piot

Alert button

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024
Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Viaarxiv icon

Direct Language Model Alignment from Online AI Feedback

Feb 07, 2024
Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, Felipe Llinares, Alexandre Rame, Thomas Mesnard, Yao Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel

Viaarxiv icon

Nash Learning from Human Feedback

Dec 06, 2023
Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

Figure 1 for Nash Learning from Human Feedback
Figure 2 for Nash Learning from Human Feedback
Figure 3 for Nash Learning from Human Feedback
Figure 4 for Nash Learning from Human Feedback
Viaarxiv icon

A General Theoretical Paradigm to Understand Learning from Human Preferences

Oct 18, 2023
Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

Figure 1 for A General Theoretical Paradigm to Understand Learning from Human Preferences
Figure 2 for A General Theoretical Paradigm to Understand Learning from Human Preferences
Viaarxiv icon

Unlocking the Power of Representations in Long-term Novelty-based Exploration

May 02, 2023
Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot

Figure 1 for Unlocking the Power of Representations in Long-term Novelty-based Exploration
Figure 2 for Unlocking the Power of Representations in Long-term Novelty-based Exploration
Figure 3 for Unlocking the Power of Representations in Long-term Novelty-based Exploration
Figure 4 for Unlocking the Power of Representations in Long-term Novelty-based Exploration
Viaarxiv icon

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick

Feb 09, 2023
Pierre H. Richemond, Allison Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill

Figure 1 for The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
Figure 2 for The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
Figure 3 for The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
Figure 4 for The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
Viaarxiv icon

Understanding Self-Predictive Learning for Reinforcement Learning

Dec 06, 2022
Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

Figure 1 for Understanding Self-Predictive Learning for Reinforcement Learning
Figure 2 for Understanding Self-Predictive Learning for Reinforcement Learning
Figure 3 for Understanding Self-Predictive Learning for Reinforcement Learning
Figure 4 for Understanding Self-Predictive Learning for Reinforcement Learning
Viaarxiv icon

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Jun 30, 2022
Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls

Figure 1 for Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Figure 2 for Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Figure 3 for Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Figure 4 for Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Viaarxiv icon

BYOL-Explore: Exploration by Bootstrapped Prediction

Jun 16, 2022
Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

Figure 1 for BYOL-Explore: Exploration by Bootstrapped Prediction
Figure 2 for BYOL-Explore: Exploration by Bootstrapped Prediction
Figure 3 for BYOL-Explore: Exploration by Bootstrapped Prediction
Figure 4 for BYOL-Explore: Exploration by Bootstrapped Prediction
Viaarxiv icon

Shaking the foundations: delusions in sequence models for interaction and control

Oct 20, 2021
Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg

Figure 1 for Shaking the foundations: delusions in sequence models for interaction and control
Figure 2 for Shaking the foundations: delusions in sequence models for interaction and control
Figure 3 for Shaking the foundations: delusions in sequence models for interaction and control
Figure 4 for Shaking the foundations: delusions in sequence models for interaction and control
Viaarxiv icon