Yanghua Peng

ByteDance

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization

Mar 02, 2024
Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu

Via arXiv

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Feb 23, 2024
Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu

Via arXiv

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Nov 17, 2023
Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

Via arXiv

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

May 18, 2022
Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo

Via arXiv

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Dec 16, 2021
Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo

Via arXiv

DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters

Sep 13, 2019
Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, Wei Lin

Via arXiv