Michal Valko

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

May 20, 2024
Via arXiv

Understanding the performance gap between online and offline alignment algorithms

May 14, 2024
Via arXiv

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024
Via arXiv

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024
Via arXiv

Decoding-time Realignment of Language Models

Feb 05, 2024
Via arXiv

Nash Learning from Human Feedback

Dec 06, 2023
Via arXiv

Model-free Posterior Sampling via Learning Rate Randomization

Oct 27, 2023
Via arXiv

Demonstration-Regularized RL

Oct 26, 2023
Via arXiv

A General Theoretical Paradigm to Understand Learning from Human Preferences

Oct 18, 2023
Via arXiv

Local and adaptive mirror descents in extensive-form games

Sep 01, 2023
Via arXiv