Nitish Shirish Keskar

The Natural Language Decathlon: Multitask Learning as Question Answering

Jun 20, 2018
Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

Via arXiv

Using Mode Connectivity for Loss Landscape Analysis

Jun 18, 2018
Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

Via arXiv

An Analysis of Neural Language Modeling at Multiple Scales

Mar 22, 2018
Stephen Merity, Nitish Shirish Keskar, Richard Socher

Via arXiv

Improving Generalization Performance by Switching from Adam to SGD

Dec 20, 2017
Nitish Shirish Keskar, Richard Socher

Via arXiv

Weighted Transformer Network for Machine Translation

Nov 06, 2017
Karim Ahmed, Nitish Shirish Keskar, Richard Socher

Via arXiv

Regularizing and Optimizing LSTM Language Models

Aug 07, 2017
Stephen Merity, Nitish Shirish Keskar, Richard Socher

Via arXiv

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

Feb 09, 2017
Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang

Via arXiv

adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

Feb 23, 2016
Nitish Shirish Keskar, Albert S. Berahas

Via arXiv