Sangyoon Oh

Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning

Feb 21, 2024

MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training

Oct 02, 2023

DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification

Jul 13, 2023

Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment

Sep 18, 2022
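Several of the papers listed here study Top-k gradient sparsification. As background only (this is a minimal illustrative sketch of the generic Top-k operator, not code from any of these papers), each worker can transmit just the k largest-magnitude gradient entries and their positions, with the receiver treating all other entries as zero:

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a gradient array.

    Returns (indices, values): positions (into the flattened array) and
    values of the kept entries. Everything else is implicitly zero.
    """
    flat = grad.ravel()
    # argpartition selects the k largest-magnitude positions in O(n)
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(indices, values, shape):
    """Rebuild the dense sparsified gradient on the receiving side."""
    out = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    out[indices] = values
    return out.reshape(shape)
```

For a gradient of n elements, only 2k numbers (indices plus values) are communicated instead of n, which is the communication saving these papers analyze and optimize.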

Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability

Dec 30, 2020