
Shivaram Venkataraman


CHAI: Clustered Head Attention for Efficient LLM Inference

Mar 12, 2024
Saurabh Agarwal, Bilge Acun, Basil Homer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu


Decoding Speculative Decoding

Feb 02, 2024
Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman


PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices

Oct 30, 2023
Minghao Yan, Hongyi Wang, Shivaram Venkataraman


Does compressing activations help model parallel training?

Jan 06, 2023
Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman


BagPipe: Accelerating Deep Recommendation Model Training

Feb 24, 2022
Saurabh Agarwal, Ziyi Zhang, Shivaram Venkataraman


Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine

Feb 04, 2022
Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, Shivaram Venkataraman


Doing More by Doing Less: How Structured Partial Backpropagation Improves Deep Learning Clusters

Nov 20, 2021
Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella


KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks

Jul 04, 2021
J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang


On the Utility of Gradient Compression in Distributed Training Systems

Mar 03, 2021
Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos
