Nitish Shirish Keskar

Shammie

An Analysis of Neural Language Modeling at Multiple Scales

Mar 22, 2018
Via arXiv

Improving Generalization Performance by Switching from Adam to SGD

Dec 20, 2017
Via arXiv

Weighted Transformer Network for Machine Translation

Nov 06, 2017
Via arXiv

Regularizing and Optimizing LSTM Language Models

Aug 07, 2017
Via arXiv

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

Feb 09, 2017
Via arXiv

adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

Feb 23, 2016
Via arXiv