Zherui Liu

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Feb 23, 2024
Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu

Aryl: An Elastic Cluster Scheduler for Deep Learning

Feb 16, 2022
Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang

Prediction of GPU Failures Under Deep Learning Workloads

Jan 27, 2022
Heting Liu, Zhichao Li, Cheng Tan, Rongqiu Yang, Guohong Cao, Zherui Liu, Chuanxiong Guo

Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem

Sep 18, 2021
Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, Chuanxiong Guo
