Ramachandran Ramjee

Microsoft

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Mar 04, 2024
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

Aug 31, 2023
Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee

NGAME: Negative Mining-aware Mini-batching for Extreme Classification

Jul 10, 2022
Kunal Dahiya, Nilesh Gupta, Deepak Saini, Akshay Soni, Yajun Wang, Kushal Dave, Jian Jiao, Gururaj K, Prasenjit Dey, Amit Singh, Deepesh Hada, Vidit Jain, Bhawna Paliwal, Anshul Mittal, Sonu Mehta, Ramachandran Ramjee, Sumeet Agarwal, Purushottam Kar, Manik Varma

Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

Feb 21, 2022
Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, Atul Katiyar, Vipul Modi, Vaibhav Sharma, Abhishek Singh, Shreshth Singhal, Kaustubh Welankar, Lu Xun, Ravi Anupindi, Karthik Elangovan, Hasibur Rahman, Zhou Lin, Rahul Seetharaman, Cheng Xu, Eddie Ailijiang, Suresh Krishnappa, Mark Russinovich

Singularity: Planet-Scale, Preemptible, Elastic Scheduling of AI Workloads

Feb 16, 2022
Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, Atul Katiyar, Vipul Modi, Vaibhav Sharma, Abhishek Singh, Shreshth Singhal, Kaustubh Welankar, Lu Xun, Ravi Anupindi, Karthik Elangovan, Hasibur Rahman, Zhou Lin, Rahul Seetharaman, Cheng Xu, Eddie Ailijiang, Suresh Krishnappa, Mark Russinovich

LRTuner: A Learning Rate Tuner for Deep Neural Networks

May 30, 2021
Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

Mar 09, 2020
Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

Privado: Practical and Secure DNN Inference

Oct 01, 2018
Shruti Tople, Karan Grover, Shweta Shinde, Ranjita Bhagwan, Ramachandran Ramjee
