Alert button
Picture for Remi Munos

Remi Munos

Alert button

Human Alignment of Large Language Models through Online Preference Optimisation

Add code
Bookmark button
Alert button
Mar 13, 2024
Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Figure 1 for Human Alignment of Large Language Models through Online Preference Optimisation
Figure 2 for Human Alignment of Large Language Models through Online Preference Optimisation
Figure 3 for Human Alignment of Large Language Models through Online Preference Optimisation
Figure 4 for Human Alignment of Large Language Models through Online Preference Optimisation
Viaarxiv icon

Model-free Posterior Sampling via Learning Rate Randomization

Add code
Bookmark button
Alert button
Oct 27, 2023
Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

Figure 1 for Model-free Posterior Sampling via Learning Rate Randomization
Figure 2 for Model-free Posterior Sampling via Learning Rate Randomization
Figure 3 for Model-free Posterior Sampling via Learning Rate Randomization
Figure 4 for Model-free Posterior Sampling via Learning Rate Randomization
Viaarxiv icon

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

Add code
Bookmark button
Alert button
May 02, 2023
Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Remi Munos, Will Dabney, Diana L Borsa

Figure 1 for Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
Figure 2 for Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
Figure 3 for Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
Figure 4 for Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
Viaarxiv icon

Fast Rates for Maximum Entropy Exploration

Add code
Bookmark button
Alert button
Mar 14, 2023
Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

Figure 1 for Fast Rates for Maximum Entropy Exploration
Figure 2 for Fast Rates for Maximum Entropy Exploration
Figure 3 for Fast Rates for Maximum Entropy Exploration
Figure 4 for Fast Rates for Maximum Entropy Exploration
Viaarxiv icon

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

Add code
Bookmark button
Alert button
Sep 28, 2022
Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

Figure 1 for Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
Figure 2 for Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
Figure 3 for Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
Figure 4 for Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
Viaarxiv icon

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Add code
Bookmark button
Alert button
Jun 30, 2022
Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls

Figure 1 for Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Figure 2 for Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Figure 3 for Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Figure 4 for Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Viaarxiv icon

Game Plan: What AI can do for Football, and What Football can do for AI

Add code
Bookmark button
Alert button
Nov 18, 2020
Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, Ali Eslami, Mark Rowland, Andrew Jaegle, Remi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, Demis Hassabis

Figure 1 for Game Plan: What AI can do for Football, and What Football can do for AI
Figure 2 for Game Plan: What AI can do for Football, and What Football can do for AI
Figure 3 for Game Plan: What AI can do for Football, and What Football can do for AI
Figure 4 for Game Plan: What AI can do for Football, and What Football can do for AI
Viaarxiv icon

Navigating the Landscape of Games

Add code
Bookmark button
Alert button
May 04, 2020
Shayegan Omidshafiei, Karl Tuyls, Wojciech M. Czarnecki, Francisco C. Santos, Mark Rowland, Jerome Connor, Daniel Hennes, Paul Muller, Julien Perolat, Bart De Vylder, Audrunas Gruslys, Remi Munos

Figure 1 for Navigating the Landscape of Games
Figure 2 for Navigating the Landscape of Games
Figure 3 for Navigating the Landscape of Games
Figure 4 for Navigating the Landscape of Games
Viaarxiv icon

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

Add code
Bookmark button
Alert button
Feb 19, 2020
Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls

Figure 1 for From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
Figure 2 for From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
Figure 3 for From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
Figure 4 for From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
Viaarxiv icon

Hindsight Credit Assignment

Add code
Bookmark button
Alert button
Dec 05, 2019
Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos

Figure 1 for Hindsight Credit Assignment
Figure 2 for Hindsight Credit Assignment
Figure 3 for Hindsight Credit Assignment
Figure 4 for Hindsight Credit Assignment
Viaarxiv icon