Shaohan Huang

A Length-Extrapolatable Transformer

Dec 20, 2022
Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

Dec 20, 2022
Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Zhoujun Li, Furu Wei

TorchScale: Transformers at Scale

Nov 23, 2022
Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei

Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

Oct 26, 2022
Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song

Foundation Transformers

Oct 19, 2022
Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation

Oct 13, 2022
Jian Yang, Shaohan Huang, Shuming Ma, Yuwei Yin, Li Dong, Dongdong Zhang, Hongcheng Guo, Zhoujun Li, Furu Wei

MoEC: Mixture of Expert Clusters

Jul 19, 2022
Yuan Xie, Shaohan Huang, Tianyu Chen, Furu Wei

Language Models are General-Purpose Interfaces

Jun 13, 2022
Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Task-Specific Expert Pruning for Sparse Mixture-of-Experts

Jun 02, 2022
Tianyu Chen, Shaohan Huang, Yuan Xie, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, Furu Wei
