Zhuohan Li

Fairness in Serving Large Language Models

Dec 31, 2023
Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

Sep 30, 2023
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

Efficient Memory Management for Large Language Model Serving with PagedAttention

Sep 12, 2023
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Jun 09, 2023
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

What is the State of Memory Saving for Model Training?

Mar 26, 2023
Xiaoxuan Liu, Siddharth Jha, Chuyan Zhu, Zhuohan Li, Alvin Cheung

High-throughput Generative Inference of Large Language Models with a Single GPU

Mar 13, 2023
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

Feb 22, 2023
Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

On Optimizing the Communication of Model Parallelism

Nov 10, 2022
Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

Jan 28, 2022
Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica
