
Olivier Bachem

Google Research

WARM: On the Benefits of Weight Averaged Reward Models

Jan 22, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Nash Learning from Human Feedback

Dec 06, 2023

GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models

Jun 23, 2023

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

May 31, 2023

C3PO: Learning to Achieve Arbitrary Goals via Massively Entropic Pretraining

Nov 07, 2022

vec2text with Round-Trip Translations

Sep 14, 2022

Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization

Oct 10, 2021

A functional mirror ascent view of policy gradient methods with function approximation

Aug 12, 2021

Representation Learning for Out-Of-Distribution Generalization in Reinforcement Learning

Jul 12, 2021