Rémi Munos

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

Feb 12, 2024
Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney

Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling

Feb 08, 2024
Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024
Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Nash Learning from Human Feedback

Dec 06, 2023
Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

A General Theoretical Paradigm to Understand Learning from Human Preferences

Oct 18, 2023
Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

Local and adaptive mirror descents in extensive-form games

Sep 01, 2023
Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

VA-learning as a more efficient alternative to Q-learning

May 29, 2023
Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

May 29, 2023
Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Towards a Better Understanding of Representation Dynamics under TD-learning

May 29, 2023
Yunhao Tang, Rémi Munos

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

May 28, 2023
Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney
