Ramachandran Ramjee

Microsoft

Accuracy is Not All You Need

Jul 12, 2024

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

Jul 09, 2024

Vidur: A Large-Scale Simulation Framework For LLM Inference

May 08, 2024

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

May 07, 2024

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Mar 04, 2024

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

Aug 31, 2023

NGAME: Negative Mining-aware Mini-batching for Extreme Classification

Jul 10, 2022

Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

Feb 21, 2022

LRTuner: A Learning Rate Tuner for Deep Neural Networks

May 30, 2021

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

Mar 09, 2020