Michal Valko

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024
Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Ávila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Via arXiv

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024
Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Via arXiv

Decoding-time Realignment of Language Models

Feb 05, 2024
Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel

Via arXiv

Nash Learning from Human Feedback

Dec 06, 2023
Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

Via arXiv

Model-free Posterior Sampling via Learning Rate Randomization

Oct 27, 2023
Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard

Via arXiv

Demonstration-Regularized RL

Oct 26, 2023
Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard

Via arXiv

A General Theoretical Paradigm to Understand Learning from Human Preferences

Oct 18, 2023
Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

Via arXiv

Local and adaptive mirror descents in extensive-form games

Sep 01, 2023
Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

Via arXiv

Half-Hop: A graph upsampling approach for slowing down message passing

Aug 17, 2023
Mehdi Azabou, Venkataramana Ganesh, Shantanu Thakoor, Chi-Heng Lin, Lakshmi Sathidevi, Ran Liu, Michal Valko, Petar Veličković, Eva L. Dyer

Via arXiv

VA-learning as a more efficient alternative to Q-learning

May 29, 2023
Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko

Via arXiv