
Alexandre Muzio

SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts

Apr 07, 2024
Alexandre Muzio, Alex Sun, Churan He

Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers

May 28, 2022
Rui Liu, Young Jin Kim, Alexandre Muzio, Barzan Mozafari, Hany Hassan Awadalla

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

Nov 03, 2021
Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

Scalable and Efficient MoE Training for Multitask Multilingual Models

Sep 22, 2021
Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla

Improving Multilingual Translation by Representation and Gradient Regularization

Sep 10, 2021
Yilin Yang, Akiko Eriguchi, Alexandre Muzio, Prasad Tadepalli, Stefan Lee, Hany Hassan

Discovering Representation Sprachbund For Multilingual Pre-Training

Sep 01, 2021
Yimin Fan, Yaobo Liang, Alexandre Muzio, Hany Hassan, Houqiang Li, Ming Zhou, Nan Duan

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

Jun 25, 2021
Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Dec 31, 2020
Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei
