Yibo Zhu

ByteDance

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Nov 17, 2023
Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Oct 06, 2022
Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu

ByteComp: Revisiting Gradient Compression in Distributed Training

Jun 06, 2022
Zhuang Wang, Haibin Lin, Yibo Zhu, T. S. Eugene Ng

Espresso: Revisiting Gradient Compression from the System Perspective

May 28, 2022
Zhuang Wang, Haibin Lin, Yibo Zhu, T. S. Eugene Ng

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

May 18, 2022
Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo

Aryl: An Elastic Cluster Scheduler for Deep Learning

Feb 16, 2022
Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Dec 16, 2021
Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo

Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance

Oct 25, 2021
Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu

Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem

Sep 18, 2021
Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, Chuanxiong Guo
