Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mehdi B. Tahoori

Bespoke Co-processor for Energy-Efficient Health Monitoring on RISC-V-based Flexible Wearables

Nov 08, 2025

Theofanis Vergos, Polykarpos Vergos, Mehdi B. Tahoori, Georgios Zervakis

Abstract:Flexible electronics offer unique advantages for conformable, lightweight, and disposable healthcare wearables. However, their limited gate count, large feature sizes, and high static power consumption make on-body machine learning classification highly challenging. While existing bendable RISC-V systems provide compact solutions, they lack the energy efficiency required. We present a mechanically flexible RISC-V that integrates a bespoke multiply-accumulate co-processor with fixed coefficients to maximize energy efficiency and minimize latency. Our approach formulates a constrained programming problem to jointly determine co-processor constants and optimally map Multi-Layer Perceptron (MLP) inference operations, enabling compact, model-specific hardware by leveraging the low fabrication and non-recurring engineering costs of flexible technologies. Post-layout results demonstrate near-real-time performance across several healthcare datasets, with our circuits operating within the power budget of existing flexible batteries and occupying only 2.42 mm^2, offering a promising path toward accessible, sustainable, and conformable healthcare wearables. Our microprocessors achieve an average 2.35x speedup and 2.15x lower energy consumption compared to the state of the art.

* Accepted for publication at IEEE Design, Automation & Test in Europe (DATE 2026)

Via

Access Paper or Ask Questions

Arbitrary Precision Printed Ternary Neural Networks with Holistic Evolutionary Approximation

Aug 27, 2025

Vojtech Mrazek, Konstantinos Balaskas, Paula Carolina Lozano Duarte, Zdenek Vasicek, Mehdi B. Tahoori, Georgios Zervakis

Abstract:Printed electronics offer a promising alternative for applications beyond silicon-based systems, requiring properties like flexibility, stretchability, conformality, and ultra-low fabrication costs. Despite the large feature sizes in printed electronics, printed neural networks have attracted attention for meeting target application requirements, though realizing complex circuits remains challenging. This work bridges the gap between classification accuracy and area efficiency in printed neural networks, covering the entire processing-near-sensor system design and co-optimization from the analog-to-digital interface-a major area and power bottleneck-to the digital classifier. We propose an automated framework for designing printed Ternary Neural Networks with arbitrary input precision, utilizing multi-objective optimization and holistic approximation. Our circuits outperform existing approximate printed neural networks by 17x in area and 59x in power on average, being the first to enable printed-battery-powered operation with under 5% accuracy loss while accounting for analog-to-digital interfacing costs.

* Accepted at IEEE Transactions on Circuits and Systems for Artificial Intelligence

Via

Access Paper or Ask Questions

Invited Paper: Feature-to-Classifier Co-Design for Mixed-Signal Smart Flexible Wearables for Healthcare at the Extreme Edge

Aug 27, 2025

Maha Shatta, Konstantinos Balaskas, Paula Carolina Lozano Duarte, Georgios Panagopoulos, Mehdi B. Tahoori, Georgios Zervakis

Abstract:Flexible Electronics (FE) offer a promising alternative to rigid silicon-based hardware for wearable healthcare devices, enabling lightweight, conformable, and low-cost systems. However, their limited integration density and large feature sizes impose strict area and power constraints, making ML-based healthcare systems-integrating analog frontend, feature extraction and classifier-particularly challenging. Existing FE solutions often neglect potential system-wide solutions and focus on the classifier, overlooking the substantial hardware cost of feature extraction and Analog-to-Digital Converters (ADCs)-both major contributors to area and power consumption. In this work, we present a holistic mixed-signal feature-to-classifier co-design framework for flexible smart wearable systems. To the best of our knowledge, we design the first analog feature extractors in FE, significantly reducing feature extraction cost. We further propose an hardware-aware NAS-inspired feature selection strategy within ML training, enabling efficient, application-specific designs. Our evaluation on healthcare benchmarks shows our approach delivers highly accurate, ultra-area-efficient flexible systems-ideal for disposable, low-power wearable monitoring.

* Accepted at 2025 International Conference on Computer-Aided Design (ICCAD)

Via

Access Paper or Ask Questions

Exploration of Low-Power Flexible Stress Monitoring Classifiers for Conformal Wearables

Aug 27, 2025

Florentia Afentaki, Sri Sai Rakesh Nakkilla, Konstantinos Balaskas, Paula Carolina Lozano Duarte, Shiyi Jiang, Georgios Zervakis, Farshad Firouzi, Krishnendu Chakrabarty, Mehdi B. Tahoori

Abstract:Conventional stress monitoring relies on episodic, symptom-focused interventions, missing the need for continuous, accessible, and cost-efficient solutions. State-of-the-art approaches use rigid, silicon-based wearables, which, though capable of multitasking, are not optimized for lightweight, flexible wear, limiting their practicality for continuous monitoring. In contrast, flexible electronics (FE) offer flexibility and low manufacturing costs, enabling real-time stress monitoring circuits. However, implementing complex circuits like machine learning (ML) classifiers in FE is challenging due to integration and power constraints. Previous research has explored flexible biosensors and ADCs, but classifier design for stress detection remains underexplored. This work presents the first comprehensive design space exploration of low-power, flexible stress classifiers. We cover various ML classifiers, feature selection, and neural simplification algorithms, with over 1200 flexible classifiers. To optimize hardware efficiency, fully customized circuits with low-precision arithmetic are designed in each case. Our exploration provides insights into designing real-time stress classifiers that offer higher accuracy than current methods, while being low-cost, conformable, and ensuring low power and compact size.

* Accepted for publication at the IEEE/ACM International Symposium on Low Power Electronics and Design} (ISLPED 2025)

Via

Access Paper or Ask Questions

Reducing ADC Front-end Costs During Training of On-sensor Printed Multilayer Perceptrons

Nov 14, 2024

Florentia Afentaki, Paula Carolina Lozano Duarte, Georgios Zervakis, Mehdi B. Tahoori

Figure 1 for Reducing ADC Front-end Costs During Training of On-sensor Printed Multilayer Perceptrons

Figure 2 for Reducing ADC Front-end Costs During Training of On-sensor Printed Multilayer Perceptrons

Figure 3 for Reducing ADC Front-end Costs During Training of On-sensor Printed Multilayer Perceptrons

Figure 4 for Reducing ADC Front-end Costs During Training of On-sensor Printed Multilayer Perceptrons

Abstract:Printed electronics technology offers a cost-effectiveand fully-customizable solution to computational needs beyondthe capabilities of traditional silicon technologies, offering ad-vantages such as on-demand manufacturing and conformal, low-cost hardware. However, the low-resolution fabrication of printedelectronics, which results in large feature sizes, poses a challengefor integrating complex designs like those of machine learn-ing (ML) classification systems. Current literature optimizes onlythe Multilayer Perceptron (MLP) circuit within the classificationsystem, while the cost of analog-to-digital converters (ADCs)is overlooked. Printed applications frequently require on-sensorprocessing, yet while the digital classifier has been extensivelyoptimized, the analog-to-digital interfacing, specifically the ADCs,dominates the total area and energy consumption. In this work,we target digital printed MLP classifiers and we propose thedesign of customized ADCs per MLP's input which involvesminimizing the distinct represented numbers for each input,simplifying thus the ADC's circuitry. Incorporating this ADCoptimization in the MLP training, enables eliminating ADC levelsand the respective comparators, while still maintaining highclassification accuracy. Our approach achieves 11.2x lower ADCarea for less than 5% accuracy drop across varying MLPs.

* This article is accepted for publication in IEEE Embedded Systems Letters

Via

Access Paper or Ask Questions

Design and In-training Optimization of Binary Search ADC for Flexible Classifiers

Oct 02, 2024

Paula Carolina Lozano Duarte, Florentia Afentaki, Georgios Zervakis, Mehdi B. Tahoori

Figure 1 for Design and In-training Optimization of Binary Search ADC for Flexible Classifiers

Figure 2 for Design and In-training Optimization of Binary Search ADC for Flexible Classifiers

Figure 3 for Design and In-training Optimization of Binary Search ADC for Flexible Classifiers

Figure 4 for Design and In-training Optimization of Binary Search ADC for Flexible Classifiers

Abstract:Flexible Electronics (FE) offer distinct advantages, including mechanical flexibility and low process temperatures, enabling extremely low-cost production. To address the demands of applications such as smart sensors and wearables, flexible devices must be small and operate at low supply voltages. Additionally, target applications often require classifiers to operate directly on analog sensory input, necessitating the use of Analog to Digital Converters (ADCs) to process the sensory data. However, ADCs present serious challenges, particularly in terms of high area and power consumption, especially when considering stringent area and energy budget. In this work, we target common classifiers in this domain such as MLPs and SVMs and present a holistic approach to mitigate the elevated overhead of analog to digital interfacing in FE. First, we propose a novel design for Binary Search ADC that reduces area overhead 2X compared with the state-of-the-art Binary design and up to 5.4X compared with Flash ADC. Next, we present an in-training ADC optimization in which we keep the bare-minimum representations required and simplifying ADCs by removing unnecessary components. Our in-training optimization further reduces on average the area in terms of transistor count of the required ADCs by 5X for less than 1% accuracy loss.

* Accepted for publication at the 30th Asia and South Pacific Design Automation Conference (ASPDAC '25). doi: https://doi.org/10.1145/3658617.3697715

Via

Access Paper or Ask Questions

Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights

May 07, 2024

Soyed Tuhin Ahmed, Michael Hefenbrock, Mehdi B. Tahoori

Figure 1 for Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights

Figure 2 for Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights

Figure 3 for Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights

Figure 4 for Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights

Abstract:The applications of artificial intelligence (AI) are rapidly evolving, and they are also commonly used in safety-critical domains, such as autonomous driving and medical diagnosis, where functional safety is paramount. In AI-driven systems, uncertainty estimation allows the user to avoid overconfidence predictions and achieve functional safety. Therefore, the robustness and reliability of model predictions can be improved. However, conventional uncertainty estimation methods, such as the deep ensemble method, impose high computation and, accordingly, hardware (latency and energy) overhead because they require the storage and processing of multiple models. Alternatively, Monte Carlo dropout (MC-dropout) methods, although having low memory overhead, necessitate numerous ($\sim 100$) forward passes, leading to high computational overhead and latency. Thus, these approaches are not suitable for battery-powered edge devices with limited computing and memory resources. In this paper, we propose the Tiny-Deep Ensemble approach, a low-cost approach for uncertainty estimation on edge devices. In our approach, only normalization layers are ensembled $M$ times, with all ensemble members sharing common weights and biases, leading to a significant decrease in storage requirements and latency. Moreover, our approach requires only one forward pass in a hardware architecture that allows batch processing for inference and uncertainty estimation. Furthermore, it has approximately the same memory overhead compared to a single model. Therefore, latency and memory overhead are reduced by a factor of up to $\sim M\times$. Nevertheless, our method does not compromise accuracy, with an increase in inference accuracy of up to $\sim 1\%$ and a reduction in RMSE of $17.17\%$ in various benchmark datasets, tasks, and state-of-the-art architectures.

Via

Access Paper or Ask Questions

Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs

Feb 05, 2024

Florentia Afentaki, Michael Hefenbrock, Georgios Zervakis, Mehdi B. Tahoori

Figure 1 for Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs

Figure 2 for Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs

Figure 3 for Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs

Figure 4 for Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs

Abstract:Printed Electronics (PE) stands out as a promisingtechnology for widespread computing due to its distinct attributes, such as low costs and flexible manufacturing. Unlike traditional silicon-based technologies, PE enables stretchable, conformal,and non-toxic hardware. However, PE are constrained by larger feature sizes, making it challenging to implement complex circuits such as machine learning (ML) classifiers. Approximate computing has been proven to reduce the hardware cost of ML circuits such as Multilayer Perceptrons (MLPs). In this paper, we maximize the benefits of approximate computing by integrating hardware approximation into the MLP training process. Due to the discrete nature of hardware approximation, we propose and implement a genetic-based, approximate, hardware-aware training approach specifically designed for printed MLPs. For a 5% accuracy loss, our MLPs achieve over 5x area and power reduction compared to the baseline while outperforming state of-the-art approximate and stochastic printed MLPs.

* Accepted for publication at the 27th Design, Automation and Test in Europe Conference (DATE'24), Mar 25-27 2024, Valencia, Spain

Via

Access Paper or Ask Questions

Enhancing Reliability of Neural Networks at the Edge: Inverted Normalization with Stochastic Affine Transformations

Jan 23, 2024

Soyed Tuhin Ahmed, Kamal Danouchi, Guillaume Prenat, Lorena Anghel, Mehdi B. Tahoori

Abstract:Bayesian Neural Networks (BayNNs) naturally provide uncertainty in their predictions, making them a suitable choice in safety-critical applications. Additionally, their realization using memristor-based in-memory computing (IMC) architectures enables them for resource-constrained edge applications. In addition to predictive uncertainty, however, the ability to be inherently robust to noise in computation is also essential to ensure functional safety. In particular, memristor-based IMCs are susceptible to various sources of non-idealities such as manufacturing and runtime variations, drift, and failure, which can significantly reduce inference accuracy. In this paper, we propose a method to inherently enhance the robustness and inference accuracy of BayNNs deployed in IMC architectures. To achieve this, we introduce a novel normalization layer combined with stochastic affine transformations. Empirical results in various benchmark datasets show a graceful degradation in inference accuracy, with an improvement of up to $58.11\%$.

Via

Access Paper or Ask Questions

NeuSpin: Design of a Reliable Edge Neuromorphic System Based on Spintronics for Green AI

Jan 11, 2024

Soyed Tuhin Ahmed, Kamal Danouchi, Guillaume Prenat, Lorena Anghel, Mehdi B. Tahoori

Abstract:Internet of Things (IoT) and smart wearable devices for personalized healthcare will require storing and computing ever-increasing amounts of data. The key requirements for these devices are ultra-low-power, high-processing capabilities, autonomy at low cost, as well as reliability and accuracy to enable Green AI at the edge. Artificial Intelligence (AI) models, especially Bayesian Neural Networks (BayNNs) are resource-intensive and face challenges with traditional computing architectures due to the memory wall problem. Computing-in-Memory (CIM) with emerging resistive memories offers a solution by combining memory blocks and computing units for higher efficiency and lower power consumption. However, implementing BayNNs on CIM hardware, particularly with spintronic technologies, presents technical challenges due to variability and manufacturing defects. The NeuSPIN project aims to address these challenges through full-stack hardware and software co-design, developing novel algorithmic and circuit design approaches to enhance the performance, energy-efficiency and robustness of BayNNs on sprintronic-based CIM platforms.

Via

Access Paper or Ask Questions