Bilal Piot

Gemma 3 Technical Report
Mar 25, 2025

Preference Optimization as Probabilistic Inference
Oct 05, 2024

Building Math Agents with Multi-Turn Iterative Preference Learning
Sep 04, 2024

Gemma 2: Improving Open Language Models at a Practical Size
Aug 02, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment
May 29, 2024

Multi-turn Reinforcement Learning from Preference Human Feedback
May 23, 2024

Human Alignment of Large Language Models through Online Preference Optimisation
Mar 13, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment
Feb 08, 2024

Direct Language Model Alignment from Online AI Feedback
Feb 07, 2024

Nash Learning from Human Feedback
Dec 06, 2023