Lianmin Zheng

On Optimal Caching and Model Multiplexing for Large Model Inference

Jun 03, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU

Mar 13, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

Feb 22, 2023

On Optimizing the Communication of Model Parallelism

Nov 10, 2022

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

Jul 09, 2022

NumS: Scalable Array Programming for the Cloud

Jun 28, 2022

GACT: Activation Compressed Training for General Architectures

Jun 28, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

Jan 28, 2022

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

Apr 29, 2021

Ansor: Generating High-Performance Tensor Programs for Deep Learning

Jun 15, 2020