Mark Rowland

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024
Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

A Distributional Analogue to the Successor Representation

Feb 13, 2024
Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

Feb 12, 2024
Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling

Feb 08, 2024
Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024
Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Distributional Bellman Operators over Mean Embeddings

Dec 09, 2023
Li Kevin Wenliang, Grégoire Déletang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland

Nash Learning from Human Feedback

Dec 06, 2023
Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

A General Theoretical Paradigm to Understand Learning from Human Preferences

Oct 18, 2023
Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

Bootstrapped Representations in Reinforcement Learning

Jun 16, 2023
Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

VA-learning as a more efficient alternative to Q-learning

May 29, 2023
Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko
