Mostofa Patwary

Efficient Large-Scale Language Model Training on GPU Clusters


Apr 09, 2021
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia



End-to-End Training of Neural Retrievers for Open-Domain Question Answering


Jan 02, 2021
Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L Hamilton, Bryan Catanzaro

* Preprint 


Local Knowledge Powered Conversational Agents


Oct 20, 2020
Sashank Santhanam, Wei Ping, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro



BioMegatron: Larger Biomedical Domain Language Model


Oct 14, 2020
Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, Raghav Mani

* Accepted for publication at EMNLP 2020 


MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models


Oct 02, 2020
Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar, Bryan Catanzaro

* Accepted in EMNLP 2020 main conference 


Large Scale Multi-Actor Generative Dialog Modeling


May 13, 2020
Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro



Training Question Answering Models From Synthetic Data


Feb 22, 2020
Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro



Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism


Oct 05, 2019
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro



DisCo: Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems


Sep 25, 2019
Adam Rupe, Nalini Kumar, Vladislav Epifanov, Karthik Kashinath, Oleksandr Pavlyk, Frank Schlimbach, Mostofa Patwary, Sergey Maidanov, Victor Lee, Prabhat, James P. Crutchfield



Coloring Big Graphs with AlphaGoZero


Feb 28, 2019
Jiayi Huang, Mostofa Patwary, Gregory Diamos



Language Modeling at Scale


Oct 23, 2018
Mostofa Patwary, Milind Chabbi, Heewoo Jun, Jiaji Huang, Gregory Diamos, Kenneth Church

