Yinmin Zhong

LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

Apr 15, 2024
Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin

Via arXiv

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Feb 23, 2024
Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu

Via arXiv

Fast Distributed Inference Serving for Large Language Models

May 10, 2023
Bingyang Wu, Yinmin Zhong, Zili Zhang, Gang Huang, Xuanzhe Liu, Xin Jin

Via arXiv

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

Feb 22, 2023
Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Via arXiv