Andrey Kuzmin

GPTVQ: The Blessing of Dimensionality for LLM Quantization

Feb 23, 2024
Via arXiv

Pruning vs Quantization: Which is Better?

Jul 06, 2023
Via arXiv

FP8 versus INT8 for efficient deep learning inference

Mar 31, 2023
Via arXiv

FP8 Quantization: The Power of the Exponent

Aug 19, 2022
Via arXiv

Quantized Sparse Weight Decomposition for Neural Network Compression

Jul 22, 2022
Via arXiv

Cyclical Pruning for Sparse Neural Networks

Feb 02, 2022
Via arXiv

Taxonomy and Evaluation of Structured Compression of Convolutional Neural Networks

Dec 20, 2019
Via arXiv

End-to-end Learning of Cost-Volume Aggregation for Real-time Dense Stereo

Nov 17, 2016
Via arXiv