Alert button
Picture for Lianmin Zheng

Lianmin Zheng

Alert button

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Add code
Bookmark button
Alert button
Jun 09, 2023
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric. P Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Figure 1 for Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Figure 2 for Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Figure 3 for Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Figure 4 for Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Viaarxiv icon

On Optimal Caching and Model Multiplexing for Large Model Inference

Add code
Bookmark button
Alert button
Jun 03, 2023
Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

Figure 1 for On Optimal Caching and Model Multiplexing for Large Model Inference
Figure 2 for On Optimal Caching and Model Multiplexing for Large Model Inference
Figure 3 for On Optimal Caching and Model Multiplexing for Large Model Inference
Figure 4 for On Optimal Caching and Model Multiplexing for Large Model Inference
Viaarxiv icon

High-throughput Generative Inference of Large Language Models with a Single GPU

Add code
Bookmark button
Alert button
Mar 13, 2023
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

Figure 1 for High-throughput Generative Inference of Large Language Models with a Single GPU
Figure 2 for High-throughput Generative Inference of Large Language Models with a Single GPU
Figure 3 for High-throughput Generative Inference of Large Language Models with a Single GPU
Figure 4 for High-throughput Generative Inference of Large Language Models with a Single GPU
Viaarxiv icon

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

Add code
Bookmark button
Alert button
Feb 22, 2023
Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Figure 1 for AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Figure 2 for AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Figure 3 for AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Figure 4 for AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Viaarxiv icon

On Optimizing the Communication of Model Parallelism

Add code
Bookmark button
Alert button
Nov 10, 2022
Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

Figure 1 for On Optimizing the Communication of Model Parallelism
Figure 2 for On Optimizing the Communication of Model Parallelism
Figure 3 for On Optimizing the Communication of Model Parallelism
Figure 4 for On Optimizing the Communication of Model Parallelism
Viaarxiv icon

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

Add code
Bookmark button
Alert button
Jul 09, 2022
Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen

Figure 1 for TensorIR: An Abstraction for Automatic Tensorized Program Optimization
Figure 2 for TensorIR: An Abstraction for Automatic Tensorized Program Optimization
Figure 3 for TensorIR: An Abstraction for Automatic Tensorized Program Optimization
Figure 4 for TensorIR: An Abstraction for Automatic Tensorized Program Optimization
Viaarxiv icon

NumS: Scalable Array Programming for the Cloud

Add code
Bookmark button
Alert button
Jun 28, 2022
Melih Elibol, Vinamra Benara, Samyu Yagati, Lianmin Zheng, Alvin Cheung, Michael I. Jordan, Ion Stoica

Figure 1 for NumS: Scalable Array Programming for the Cloud
Figure 2 for NumS: Scalable Array Programming for the Cloud
Figure 3 for NumS: Scalable Array Programming for the Cloud
Figure 4 for NumS: Scalable Array Programming for the Cloud
Viaarxiv icon

GACT: Activation Compressed Training for General Architectures

Add code
Bookmark button
Alert button
Jun 28, 2022
Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung

Figure 1 for GACT: Activation Compressed Training for General Architectures
Figure 2 for GACT: Activation Compressed Training for General Architectures
Figure 3 for GACT: Activation Compressed Training for General Architectures
Figure 4 for GACT: Activation Compressed Training for General Architectures
Viaarxiv icon

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

Add code
Bookmark button
Alert button
Jan 28, 2022
Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

Figure 1 for Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Figure 2 for Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Figure 3 for Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Figure 4 for Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Viaarxiv icon