Alexey Tumanov

Georgia Institute of Technology

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Mar 04, 2024
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

Via arXiv

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

Dec 27, 2023
Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov

Via arXiv

Signed Binarization: Unlocking Efficiency Through Repetition-Sparsity Trade-Off

Dec 04, 2023
Sachit Kuhar, Yash Jain, Alexey Tumanov

Via arXiv

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation

Oct 24, 2023
Anshul Ahluwalia, Rohit Das, Payman Behnam, Alind Khare, Pan Li, Alexey Tumanov

Via arXiv

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

Jul 03, 2023
Debopam Sanyal, Jui-Tse Hung, Manav Agrawal, Prahlad Jasti, Shahab Nikkhoo, Somesh Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov

Via arXiv

Subgraph Stationary Hardware-Software Inference Co-Design

Jun 21, 2023
Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexey Tumanov

Via arXiv

DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization

Jun 20, 2023
Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov

Via arXiv

SuperFed: Weight Shared Federated Learning

Jan 26, 2023
Alind Khare, Animesh Agrawal, Myungjin Lee, Alexey Tumanov

Via arXiv

Signed Binary Weight Networks: Improving Efficiency of Binary Weight Networks by Exploiting Sparsity

Nov 25, 2022
Sachit Kuhar, Alexey Tumanov, Judy Hoffman

Via arXiv

UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification

Oct 28, 2022
Yanbo Xu, Alind Khare, Glenn Matlin, Monish Ramadoss, Rishikesan Kamaleswaran, Chao Zhang, Alexey Tumanov

Via arXiv