Bilal Piot

Offline Regularised Reinforcement Learning for Large Language Models Alignment
May 29, 2024

Multi-turn Reinforcement Learning from Preference Human Feedback
May 23, 2024

Human Alignment of Large Language Models through Online Preference Optimisation
Mar 13, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment
Feb 08, 2024

Direct Language Model Alignment from Online AI Feedback
Feb 07, 2024

Nash Learning from Human Feedback
Dec 06, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences
Oct 18, 2023

Unlocking the Power of Representations in Long-term Novelty-based Exploration
May 02, 2023

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
Feb 09, 2023

Understanding Self-Predictive Learning for Reinforcement Learning
Dec 06, 2022