Ramachandran Ramjee

Microsoft

ASTRA: Accurate and Scalable ANNS-based Training of Extreme Classifiers

Sep 30, 2024

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations

Sep 25, 2024

Accuracy is Not All You Need

Jul 12, 2024

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

Jul 09, 2024

Vidur: A Large-Scale Simulation Framework For LLM Inference

May 08, 2024

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

May 07, 2024

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Mar 04, 2024

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

Aug 31, 2023

NGAME: Negative Mining-aware Mini-batching for Extreme Classification

Jul 10, 2022

Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

Feb 21, 2022