Behnam Neyshabur

Shammie

Long Range Language Modeling via Gated State Spaces

Jul 02, 2022
Via arXiv

Solving Quantitative Reasoning Problems with Language Models

Jul 01, 2022
Via arXiv

Understanding the effect of sparsity on neural networks robustness

Jun 22, 2022
Via arXiv

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022
Via arXiv

Block-Recurrent Transformers

Mar 11, 2022
Via arXiv

Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

Feb 09, 2022
Via arXiv

Data Scaling Laws in NMT: The Effect of Noise and Architecture

Feb 04, 2022
Via arXiv

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

Oct 12, 2021
Via arXiv

A Loss Curvature Perspective on Training Instability in Deep Learning

Oct 08, 2021
Via arXiv

Exploring the Limits of Large Scale Pre-training

Oct 05, 2021
Via arXiv