Jilong Xue

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Feb 27, 2024
Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

Retentive Network: A Successor to Transformer for Large Language Models

Aug 09, 2023
Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

Apr 08, 2023
Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui

Dense-to-Sparse Gate for Mixture-of-Experts

Dec 29, 2021
Xiaonan Nie, Shijie Cao, Xupeng Miao, Lingxiao Ma, Jilong Xue, Youshan Miao, Zichao Yang, Zhi Yang, Bin Cui

Towards Efficient Large-Scale Graph Neural Network Computing

Oct 19, 2018
Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai

RPC Considered Harmful: Fast Distributed Deep Learning on RDMA

May 22, 2018
Jilong Xue, Youshan Miao, Cheng Chen, Ming Wu, Lintao Zhang, Lidong Zhou
