Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

Reducing Activation Recomputation in Large Transformer Models



Vijay Korthikanti , Jared Casper , Sangkug Lym , Lawrence McAfee , Michael Andersch , Mohammad Shoeybi , Bryan Catanzaro


   Access Paper or Ask Questions

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model



Shaden Smith , Mostofa Patwary , Brandon Norick , Patrick LeGresley , Samyam Rajbhandari , Jared Casper , Zhun Liu , Shrimai Prabhumoye , George Zerveas , Vijay Korthikanti , Elton Zhang , Rewon Child , Reza Yazdani Aminabadi , Julie Bernauer , Xia Song , Mohammad Shoeybi , Yuxiong He , Michael Houston , Saurabh Tiwary , Bryan Catanzaro

* Shaden Smith and Mostofa Patwary contributed equally 

   Access Paper or Ask Questions

Efficient Large-Scale Language Model Training on GPU Clusters



Deepak Narayanan , Mohammad Shoeybi , Jared Casper , Patrick LeGresley , Mostofa Patwary , Vijay Korthikanti , Dmitri Vainbrand , Prethvi Kashinkunti , Julie Bernauer , Bryan Catanzaro , Amar Phanishayee , Matei Zaharia


   Access Paper or Ask Questions