Erich Elsen

The State of Sparse Training in Deep Reinforcement Learning

Jun 17, 2022

Training Compute-Optimal Large Language Models

Mar 29, 2022

Unified Scaling Laws for Routed Language Models

Feb 09, 2022

Improving language models by retrieving from trillions of tokens

Jan 11, 2022

Step-unrolled Denoising Autoencoders for Text Generation

Dec 13, 2021

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Dec 08, 2021

Top-KAST: Top-K Always Sparse Training

Jun 07, 2021

On the Generalization Benefit of Noise in Stochastic Gradient Descent

Jun 26, 2020

Sparse GPU Kernels for Deep Learning

Jun 18, 2020

AlgebraNets

Jun 16, 2020