Patrick LeGresley

Nemotron-4 15B Technical Report

Feb 27, 2024
Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan Cohen, Bryan Catanzaro

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Feb 04, 2022
Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

Efficient Large-Scale Language Model Training on GPU Clusters

Apr 09, 2021
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Oct 05, 2019
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Dec 08, 2015
Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu
