Youngeun Kwon

LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models

Apr 12, 2024
Via arXiv

Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards

May 10, 2022
Via arXiv

Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training

Oct 25, 2020
Via arXiv

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

May 12, 2020
Via arXiv

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

Nov 15, 2019
Via arXiv

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

Aug 25, 2019
Via arXiv

Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning

Feb 18, 2019
Via arXiv