Yanghua Peng

ByteDance

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

Jul 02, 2024

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization

Mar 02, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Feb 23, 2024

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Nov 17, 2023

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

May 18, 2022

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Dec 16, 2021

DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters

Sep 13, 2019