Zhuohan Li

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

Jun 20, 2024

Overcoming systematic softening in universal machine learning interatomic potentials by fine-tuning

May 11, 2024

Fairness in Serving Large Language Models

Dec 31, 2023

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

Sep 30, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention

Sep 12, 2023

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Jun 09, 2023

What is the State of Memory Saving for Model Training?

Mar 26, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU

Mar 13, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

Feb 22, 2023

On Optimizing the Communication of Model Parallelism

Nov 10, 2022