Alexey Tumanov

Georgia Institute of Technology

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations

Sep 25, 2024

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

Jul 09, 2024

DεpS: Delayed ε-Shrinking for Faster Once-For-All Training

Jul 08, 2024

Vidur: A Large-Scale Simulation Framework For LLM Inference

May 08, 2024

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Mar 04, 2024

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

Dec 27, 2023

Signed Binarization: Unlocking Efficiency Through Repetition-Sparsity Trade-Off

Dec 04, 2023

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation

Oct 24, 2023

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

Jul 03, 2023

Subgraph Stationary Hardware-Software Inference Co-Design

Jun 21, 2023