Erich Elsen

The State of Sparse Training in Deep Reinforcement Learning

Jun 17, 2022
Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

Training Compute-Optimal Large Language Models

Mar 29, 2022
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre

Unified Scaling Laws for Routed Language Models

Feb 09, 2022
Aidan Clark, Diego de las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan

Improving language models by retrieving from trillions of tokens

Jan 11, 2022
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre

Step-unrolled Denoising Autoencoders for Text Generation

Dec 13, 2021
Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, Aaron van den Oord

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Dec 08, 2021
Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d'Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving

Top-KAST: Top-K Always Sparse Training

Jun 07, 2021
Siddhant M. Jayakumar, Razvan Pascanu, Jack W. Rae, Simon Osindero, Erich Elsen

On the Generalization Benefit of Noise in Stochastic Gradient Descent

Jun 26, 2020
Samuel L. Smith, Erich Elsen, Soham De

Sparse GPU Kernels for Deep Learning

Jun 18, 2020
Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen
