Teng Su

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Mar 14, 2024

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X

Mar 30, 2023

PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Mar 20, 2023

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

Apr 26, 2021

TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism

Apr 16, 2020