Alert button
Picture for Yujeong Choi

Yujeong Choi

Alert button

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations

Feb 23, 2023
Yujeong Choi, John Kim, Minsoo Rhu

Figure 1 for Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Figure 2 for Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Figure 3 for Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Figure 4 for Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Viaarxiv icon

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

Feb 27, 2022
Yunseong Kim, Yujeong Choi, Minsoo Rhu

Figure 1 for PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers
Figure 2 for PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers
Figure 3 for PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers
Figure 4 for PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers
Viaarxiv icon

LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference

Oct 25, 2020
Yujeong Choi, Yunseong Kim, Minsoo Rhu

Figure 1 for LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
Figure 2 for LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
Figure 3 for LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
Figure 4 for LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
Viaarxiv icon

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

Nov 15, 2019
Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu

Figure 1 for NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units
Figure 2 for NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units
Figure 3 for NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units
Figure 4 for NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units
Viaarxiv icon

PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units

Sep 06, 2019
Yujeong Choi, Minsoo Rhu

Figure 1 for PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units
Figure 2 for PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units
Figure 3 for PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units
Figure 4 for PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units
Viaarxiv icon