Xupeng Miao

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

Feb 29, 2024
Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, Zhihao Jia

Generative Dense Retrieval: Memory Can Be a Burden

Jan 19, 2024
Peiwen Yuan, Xinglin Wang, Shaoxiong Feng, Boyuan Pan, Yiwei Li, Heda Wang, Xupeng Miao, Kan Li

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

Jan 13, 2024
Zhengxin Zhang, Dan Zhao, Xupeng Miao, Gabriele Oliaro, Qing Li, Yong Jiang, Zhihao Jia

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Dec 23, 2023
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia

Experimental Analysis of Large-scale Learnable Vector Storage Compression

Nov 27, 2023
Hailin Zhang, Penghao Zhao, Xupeng Miao, Yingxia Shao, Zirui Liu, Tong Yang, Bin Cui

SpotServe: Serving Generative Large Language Models on Preemptible Instances

Nov 27, 2023
Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

Model-enhanced Vector Index

Sep 23, 2023
Hailin Zhang, Yujing Wang, Qi Chen, Ruiheng Chang, Ting Zhang, Ziming Miao, Yingyan Hou, Yang Ding, Xupeng Miao, Haonan Wang, Bochen Pang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Xing Xie, Mao Yang, Bin Cui

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

Jul 05, 2023
Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Xiaonan Nie, Bin Cui

FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference

May 27, 2023
Zihao Yu, Haoyang Li, Fangcheng Fu, Xupeng Miao, Bin Cui

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification

May 16, 2023
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Rae Ying Yee Wong, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
