Mostofa Patwary

StarCoder 2 and The Stack v2: The Next Generation

Feb 29, 2024
Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

Nemotron-4 15B Technical Report

Feb 27, 2024
Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan Cohen, Bryan Catanzaro

Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

Feb 14, 2023
Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

Evaluating Parameter Efficient Learning for Generation

Oct 25, 2022
Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro

Context Generation Improves Open Domain Question Answering

Oct 12, 2022
Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro

Factuality Enhanced Language Models for Open-Ended Text Generation

Jun 09, 2022
Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

Multi-Stage Prompting for Knowledgeable Dialogue Generation

Mar 16, 2022
Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Feb 08, 2022
Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Feb 04, 2022
Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

Efficient Large-Scale Language Model Training on GPU Clusters

Apr 09, 2021
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia
