Myle Ott

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

Apr 21, 2023

OPT: Open Pre-trained Transformer Language Models

May 05, 2022

Efficient Language Modeling with Sparse all-MLP

Mar 16, 2022

Efficient Large Scale Language Modeling with Mixtures of Experts

Dec 20, 2021

Few-shot Learning with Multilingual Language Models

Dec 20, 2021

NormFormer: Improved Transformer Pretraining with Extra Normalization

Nov 01, 2021

Sustainable AI: Environmental Implications, Challenges and Opportunities

Oct 30, 2021

On Anytime Learning at Macroscale

Jun 17, 2021

Larger-Scale Transformers for Multilingual Masked Language Modeling

May 02, 2021

Few-shot Sequence Learning with Transformers

Dec 17, 2020