Bobak Shahriari

Dima

Gemma 3 Technical Report

Mar 25, 2025
Via arXiv

Capturing Individual Human Preferences with Reward Features

Mar 21, 2025
Via arXiv

Preference Optimization as Probabilistic Inference

Oct 05, 2024
Via arXiv

Gemma 2: Improving Open Language Models at a Practical Size

Aug 02, 2024
Via arXiv

Gemma: Open Models Based on Gemini Research and Technology

Mar 13, 2024
Via arXiv

Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning

May 09, 2023
Via arXiv

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

Apr 22, 2022
Via arXiv

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

Jun 15, 2021
Via arXiv

Critic Regularized Regression

Jun 26, 2020
Via arXiv

Acme: A Research Framework for Distributed Reinforcement Learning

Jun 01, 2020
Via arXiv