
Minsoo Rhu

Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards

May 10, 2022

GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks

Mar 02, 2022

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

Feb 27, 2022

LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference

Oct 25, 2020

Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training

Oct 25, 2020

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

May 12, 2020

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

Nov 15, 2019

PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units

Sep 06, 2019

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

Aug 25, 2019

Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning

Feb 18, 2019