Alexey Tumanov

Georgia Institute of Technology

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Mar 04, 2024
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

Via arXiv

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

Dec 27, 2023
Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov

Via arXiv

Signed Binarization: Unlocking Efficiency Through Repetition-Sparsity Trade-Off

Dec 04, 2023
Sachit Kuhar, Yash Jain, Alexey Tumanov

Via arXiv

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation

Oct 24, 2023
Anshul Ahluwalia, Rohit Das, Payman Behnam, Alind Khare, Pan Li, Alexey Tumanov

Via arXiv

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

Jul 03, 2023
Debopam Sanyal, Jui-Tse Hung, Manav Agrawal, Prahlad Jasti, Shahab Nikkhoo, Somesh Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov

Via arXiv

Subgraph Stationary Hardware-Software Inference Co-Design

Jun 21, 2023
Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexey Tumanov

Via arXiv

DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization

Jun 20, 2023
Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov

Via arXiv

SuperFed: Weight Shared Federated Learning

Jan 26, 2023
Alind Khare, Animesh Agrawal, Myungjin Lee, Alexey Tumanov

Via arXiv

Signed Binary Weight Networks: Improving Efficiency of Binary Weight Networks by Exploiting Sparsity

Nov 25, 2022
Sachit Kuhar, Alexey Tumanov, Judy Hoffman

Via arXiv

UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification

Oct 28, 2022
Yanbo Xu, Alind Khare, Glenn Matlin, Monish Ramadoss, Rishikesan Kamaleswaran, Chao Zhang, Alexey Tumanov

Via arXiv