Gabriele Oliaro

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

Feb 29, 2024
Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, Zhihao Jia

Via arXiv

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

Jan 13, 2024
Zhengxin Zhang, Dan Zhao, Xupeng Miao, Gabriele Oliaro, Qing Li, Yong Jiang, Zhihao Jia

Via arXiv

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Dec 23, 2023
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia

Via arXiv

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification

May 16, 2023
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Rae Ying Yee Wong, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

Via arXiv