Swagath Venkataramani

Is Finer Better? The Limits of Microscaling Formats in Large Language Models

Jan 26, 2026
Via arXiv

Enhance DNN Adversarial Robustness and Efficiency via Injecting Noise to Non-Essential Neurons

Feb 06, 2024
Via arXiv

Approximate Computing and the Efficient Machine Learning Expedition

Oct 02, 2022
Via arXiv

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

Jun 16, 2022
Via arXiv

4-bit Quantization of LSTM-based Speech Recognition Models

Aug 27, 2021
Via arXiv

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Apr 21, 2021
Via arXiv

Bridging the Accuracy Gap for 2-bit Quantized Neural Networks

Jul 17, 2018
Via arXiv

PACT: Parameterized Clipping Activation for Quantized Neural Networks

Jul 17, 2018
Via arXiv

SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

Nov 29, 2017
Via arXiv

DyVEDeep: Dynamic Variable Effort Deep Neural Networks

Apr 04, 2017
Via arXiv