Daegun Yoon

HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing
Apr 18, 2025

Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning
Feb 21, 2024

MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training
Oct 02, 2023

DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification
Jul 13, 2023

Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment
Sep 18, 2022
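
Several of the titles above center on Top-k gradient sparsification, where each worker transmits only the k largest-magnitude entries of its gradient instead of the dense tensor. As background context only (this is not code from these papers), below is a minimal PyTorch sketch of that baseline; the names topk_sparsify and desparsify are illustrative, not from the publications.

```python
import math
import torch

def topk_sparsify(grad: torch.Tensor, k: int):
    """Keep the k largest-magnitude entries of a gradient tensor.

    Returns (values, flat_indices): the payload a worker would
    communicate in place of the full dense gradient.
    """
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)  # indices of the top-k magnitudes
    return flat[idx], idx

def desparsify(values: torch.Tensor, idx: torch.Tensor, shape) -> torch.Tensor:
    """Rebuild a dense tensor from (values, indices) on the receiving side."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

# Example: send ~3% of a gradient's entries instead of the full tensor.
g = torch.randn(64, 128)
values, idx = topk_sparsify(g, k=256)
g_hat = desparsify(values, idx, g.shape)
```

In practice, distributed implementations also accumulate the dropped entries locally (error feedback) so that small gradient components are eventually transmitted; judging by their titles, the papers listed here target the selection and communication cost of exactly this step.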