Xia Song

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

Apr 16, 2022
Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao

Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

Apr 07, 2022
Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Feb 04, 2022
Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

Nov 03, 2021
Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training

Sep 15, 2021
Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

XLM-E: Cross-lingual Language Model Pre-training via ELECTRA

Jun 30, 2021
Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Saksham Singhal, Payal Bajaj, Xia Song, Furu Wei

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

Jun 25, 2021
Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

Consistency Regularization for Cross-Lingual Fine-Tuning

Jun 15, 2021
Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

Language Scaling for Universal Suggested Replies Model

Jun 04, 2021
Qianlan Ying, Payal Bajaj, Budhaditya Deb, Yu Yang, Wei Wang, Bojia Lin, Milad Shokouhi, Xia Song, Yang Yang, Daxin Jiang
