Alert button
Picture for Vijay Korthikanti

Vijay Korthikanti

Alert button

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

Feb 09, 2023
Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar

Figure 1 for Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Figure 2 for Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Figure 3 for Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Figure 4 for Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Viaarxiv icon

Reducing Activation Recomputation in Large Transformer Models

May 10, 2022
Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

Figure 1 for Reducing Activation Recomputation in Large Transformer Models
Figure 2 for Reducing Activation Recomputation in Large Transformer Models
Figure 3 for Reducing Activation Recomputation in Large Transformer Models
Figure 4 for Reducing Activation Recomputation in Large Transformer Models
Viaarxiv icon

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Feb 04, 2022
Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

Figure 1 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Figure 2 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Figure 3 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Figure 4 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Viaarxiv icon

Efficient Large-Scale Language Model Training on GPU Clusters

Apr 09, 2021
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

Figure 1 for Efficient Large-Scale Language Model Training on GPU Clusters
Figure 2 for Efficient Large-Scale Language Model Training on GPU Clusters
Figure 3 for Efficient Large-Scale Language Model Training on GPU Clusters
Figure 4 for Efficient Large-Scale Language Model Training on GPU Clusters
Viaarxiv icon