William Fedus

Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts

May 24, 2023

Scaling Instruction-Finetuned Language Models

Oct 20, 2022

A Review of Sparse Expert Models in Deep Learning

Sep 04, 2022

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?

Jul 21, 2022

Emergent Abilities of Large Language Models

Jun 15, 2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022

Designing Effective Sparse Expert Models

Feb 17, 2022

On Bonus-Based Exploration Methods in the Arcade Learning Environment

Sep 22, 2021

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

Sep 22, 2021

Revisiting ResNets: Improved Training and Scaling Strategies

Mar 13, 2021