Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Silviu-Ioan Filip

TARAN

Mixed precision accumulation for neural network inference guided by componentwise forward error analysis

Mar 19, 2025

El-Mehdi El Arar, Silviu-Ioan Filip, Theo Mary, Elisa Riccietti

Abstract:This work proposes a mathematically founded mixed precision accumulation strategy for the inference of neural networks. Our strategy is based on a new componentwise forward error analysis that explains the propagation of errors in the forward pass of neural networks. Specifically, our analysis shows that the error in each component of the output of a layer is proportional to the condition number of the inner product between the weights and the input, multiplied by the condition number of the activation function. These condition numbers can vary widely from one component to the other, thus creating a significant opportunity to introduce mixed precision: each component should be accumulated in a precision inversely proportional to the product of these condition numbers. We propose a practical algorithm that exploits this observation: it first computes all components in low precision, uses this output to estimate the condition numbers, and recomputes in higher precision only the components associated with large condition numbers. We test our algorithm on various networks and datasets and confirm experimentally that it can significantly improve the cost--accuracy tradeoff compared with uniform precision accumulation baselines.

Via

Access Paper or Ask Questions

AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Apr 22, 2024

Cédric Gernigon, Silviu-Ioan Filip, Olivier Sentieys, Clément Coggiola, Mickael Bruno

Figure 1 for AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Figure 2 for AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Figure 3 for AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Figure 4 for AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Abstract:Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging. In this work, we present Adaptive Bit-Width Quantization Aware Training (AdaQAT), a learning-based method that automatically optimizes weight and activation signal bit-widths during training for more efficient DNN inference. We use relaxed real-valued bit-widths that are updated using a gradient descent rule, but are otherwise discretized for all quantization operations. The result is a simple and flexible QAT approach for mixed-precision uniform quantization problems. Compared to other methods that are generally designed to be run on a pretrained network, AdaQAT works well in both training from scratch and fine-tuning scenarios.Initial results on the CIFAR-10 and ImageNet datasets using ResNet20 and ResNet18 models, respectively, indicate that our method is competitive with other state-of-the-art mixed-precision quantization approaches.

Via

Access Paper or Ask Questions

Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing

Nov 18, 2023

Cédric Gernigon, Silviu-Ioan Filip, Olivier Sentieys, Clément Coggiola, Mickaël Bruno

Figure 1 for Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing

Figure 2 for Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing

Figure 3 for Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing

Figure 4 for Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing

Abstract:One of the major bottlenecks in high-resolution Earth Observation (EO) space systems is the downlink between the satellite and the ground. Due to hardware limitations, on-board power limitations or ground-station operation costs, there is a strong need to reduce the amount of data transmitted. Various processing methods can be used to compress the data. One of them is the use of on-board deep learning to extract relevant information in the data. However, most ground-based deep neural network parameters and computations are performed using single-precision floating-point arithmetic, which is not adapted to the context of on-board processing. We propose to rely on quantized neural networks and study how to combine low precision (mini) floating-point arithmetic with a Quantization-Aware Training methodology. We evaluate our approach with a semantic segmentation task for ship detection using satellite images from the Airbus Ship dataset. Our results show that 6-bit floating-point quantization for both weights and activations can compete with single-precision without significant accuracy degradation. Using a Thin U-Net 32 model, only a 0.3% accuracy degradation is observed with 6-bit minifloat quantization (a 6-bit equivalent integer-based approach leads to a 0.5% degradation). An initial hardware study also confirms the potential impact of such low-precision floating-point designs, but further investigation at the scale of a full inference accelerator is needed before concluding whether they are relevant in a practical on-board scenario.

* 7 pages, 7 figures, EDHPC 2023

Via

Access Paper or Ask Questions

Approximations in Deep Learning

Dec 08, 2022

Etienne Dupuis, Silviu-Ioan Filip, Olivier Sentieys, David Novo, Ian O'Connor, Alberto Bosio

Abstract:The design and implementation of Deep Learning (DL) models is currently receiving a lot of attention from both industrials and academics. However, the computational workload associated with DL is often out of reach for low-power embedded devices and is still costly when run on datacenters. By relaxing the need for fully precise operations, Approximate Computing (AxC) substantially improves performance and energy efficiency. DL is extremely relevant in this context, since playing with the accuracy needed to do adequate computations will significantly enhance performance, while keeping the quality of results in a user-constrained range. This chapter will explore how AxC can improve the performance and energy efficiency of hardware accelerators in DL applications during inference and training.

* Approximate Computing Techniques - From Component- to Application-Level, pp.467-512, 2022, 978-3-030-94704-0

Via

Access Paper or Ask Questions