Jack W. Rae

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Training Compute-Optimal Large Language Models

Mar 29, 2022

Improving language models by retrieving from trillions of tokens

Jan 11, 2022

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Dec 08, 2021

Top-KAST: Top-K Always Sparse Training

Jun 07, 2021

Do Transformers Need Deep Long-Range Memory?

Jul 07, 2020

Compressive Transformers for Long-Range Sequence Modelling

Nov 13, 2019

Stabilizing Transformers for Reinforcement Learning

Oct 13, 2019

V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

Sep 26, 2019