Abstract: Structured State Space models (SSMs) have recently emerged as a new class of deep learning models, particularly well suited for processing long sequences. Their constant memory footprint, in contrast to the linearly scaling memory demands of Transformers, makes them attractive candidates for deployment on resource-constrained edge-computing devices. While recent works have explored the effects of quantization-aware training (QAT) on SSMs, they typically do not address its implications for specialized edge hardware, for example, analog in-memory computing (AIMC) chips. In this work, we demonstrate that QAT can reduce the complexity of SSMs by up to two orders of magnitude across various performance metrics. We analyze the relationship between model size and numerical precision, and show that QAT enhances robustness to analog noise and enables structural pruning. Finally, we integrate these techniques to deploy SSMs on a memristive analog in-memory computing substrate and highlight the resulting benefits in terms of computational efficiency.
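The quantization-aware training referred to above typically simulates low-precision arithmetic in the forward pass while keeping full-precision gradients. The sketch below shows this common fake-quantization pattern with a straight-through estimator; it is an illustrative example under assumed settings (the 4-bit width, the linear layer, and the random data are placeholders), not the training setup used in this work.

```python
# Minimal sketch of quantization-aware training (QAT) with "fake" quantization
# and a straight-through estimator. Illustrative only; bit width, layer, and
# data are placeholders, not the paper's configuration.
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Round weights to `bits` symmetric levels in the forward pass,
    while letting gradients pass through unchanged (straight-through)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return w + (w_q - w).detach()  # forward: w_q, backward: identity

# Usage: quantize a layer's weights on the fly inside the forward pass.
layer = torch.nn.Linear(8, 8)
x = torch.randn(2, 8)
y = torch.nn.functional.linear(x, fake_quantize(layer.weight, bits=4), layer.bias)
y.sum().backward()  # gradients reach layer.weight despite the rounding
```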
Abstract: Processing long temporal sequences is a key challenge in deep learning. In recent years, Transformers have become state of the art for this task, but they suffer from excessive memory requirements because the full sequence must be stored explicitly. To address this issue, structured state-space sequence (S4) models recently emerged, offering a fixed-size memory state while still enabling the processing of very long sequence contexts. The recurrent linear state update in these models makes them highly efficient on modern graphics processing units (GPUs), since the recurrence can be unrolled into a convolution. However, this approach demands significant memory and massively parallel computation, which is only available on the latest GPUs. In this work, we aim to bring the power of S4 models to edge hardware by significantly reducing the size and computational demand of an S4D model through quantization-aware training, even achieving ternary weights for a simple real-world task. To this end, we extend conventional quantization-aware training to tailor it to analog in-memory computing hardware. We then demonstrate the deployment of recurrent S4D kernels on memristive crossbar arrays, enabling their computation in an in-memory computing fashion. To our knowledge, this is the first implementation of S4 kernels on in-memory computing hardware.
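To make the recurrence-to-convolution equivalence mentioned above concrete, the sketch below computes a diagonal state-space (S4D-style) layer both as a step-by-step linear recurrence and as a convolution with the unrolled kernel, and checks that the two agree. All names and sizes here (A_bar, B_bar, C, state_dim, seq_len) are assumptions for the example, not the paper's model.

```python
# Minimal sketch (illustrative, not the paper's implementation): a diagonal
# state-space layer evaluated as a recurrence and as a convolution.
import numpy as np

rng = np.random.default_rng(0)
state_dim, seq_len = 4, 16

# Discretized diagonal state matrix (entries inside the unit circle for stability)
A_bar = 0.9 * rng.uniform(0.5, 1.0, state_dim)
B_bar = rng.normal(size=state_dim)
C = rng.normal(size=state_dim)
u = rng.normal(size=seq_len)          # input sequence

# 1) Recurrent form: x_k = A_bar * x_{k-1} + B_bar * u_k,  y_k = C . x_k
x = np.zeros(state_dim)
y_rec = np.empty(seq_len)
for k in range(seq_len):
    x = A_bar * x + B_bar * u[k]      # diagonal A => elementwise update
    y_rec[k] = C @ x

# 2) Convolutional form: y = u * K with kernel K_j = C . (A_bar**j * B_bar)
K = np.stack([C @ (A_bar**j * B_bar) for j in range(seq_len)])
y_conv = np.array([K[: k + 1][::-1] @ u[: k + 1] for k in range(seq_len)])

assert np.allclose(y_rec, y_conv)     # both forms give the same output
```

The recurrent form is what maps naturally onto a crossbar array (one matrix-vector product per time step), while the convolutional form is what makes training on GPUs efficient.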