Mike Lewis

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

Feb 25, 2022

CM3: A Causal Masked Multimodal Model of the Internet

Jan 19, 2022

MetaICL: Learning to Learn In Context

Oct 29, 2021

Sparse Distillation: Speeding Up Text Classification by Using Bigger Models

Oct 16, 2021

Tricks for Training Sparse Translation Models

Oct 15, 2021

8-bit Optimizers via Block-wise Quantization

Oct 06, 2021

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

Aug 27, 2021

DEMix Layers: Disentangling Domains for Modular Language Modeling

Aug 20, 2021

Noisy Channel Language Model Prompting for Few-Shot Text Classification

Aug 15, 2021

HTLM: Hyper-Text Pre-Training and Prompting of Language Models

Jul 14, 2021