Abstract:Spiking Neural Networks (SNNs) provide an attractive framework for energy-efficient inference due to their event-driven computation and biologically inspired dynamics. However, efficient hardware realization of SNNs remains challenging because neuronal computations incur significant power and area costs, and uncontrolled approximate arithmetic can destabilize recurrent state updates when precision is not properly managed. To address these challenges, this paper presents ReSCom, a reconfigurable SNN accelerator that leverages stochastic computing to reduce hardware complexity while maintaining stable inference. The proposed architecture employs stochastic arithmetic for multiplication operations in neuron dynamics, while preserving exact fixed-point addition/subtraction operations. This stochastic strategy enables runtime trade-offs between accuracy, latency, and energy consumption. A unified reconfigurable neuron design supports Integrate-and-Fire (IF), Leaky Integrate-and-Fire (LIF), and Synaptic neuron models within a single hardware framework. Experimental results for MNIST inference on a Xilinx Artix-7 FPGA show that ReSCom achieves $92.80\%$ classification accuracy while consuming just $0.05~\mathrm{mJ}$ of operational energy per image at $100~\mathrm{MHz}$, outperforming the energy efficiency of recent state-of-the-art implementations. Furthermore, managing the stochastic bit-stream length allows explicit, dynamic control over accuracy-latency-energy trade-offs to meet target application constraints.
Abstract:Spiking Neural Networks (SNNs) offer a brain-inspired path toward highly efficient computation, but their practical deployment is constrained by the challenge of managing and executing their massive parallelism on physical hardware. This problem mirrors the historical challenge in processor design of moving beyond serial execution, a barrier broken by superscalar architectures that dispatch multiple instructions to parallel functional units. Drawing inspiration from this paradigm, we introduce a hardware-software co-design framework that treats synaptic events as parallelizable micro-operations. We present SupraSNN, a superscalar-inspired architecture that achieves high synapse-level parallelism by physically decoupling synaptic and neuronal computations. Within this architecture, a Multi-Cast Tree routes spike data to multiple parallel Synapse Processing Units serve as the computational pipelines, while a Merge Tree consolidates distributed results for processing by a unified Neuron Unit--deliberately centralizing complex neuron state dynamics to mitigate hardware overhead and resource duplication. The efficacy of this architecture is enabled by a sophisticated partitioning and scheduling framework that first maps the SNN onto hardware respecting memory constraints, then heuristic scheduling determines the synaptic execution order, maximizing throughput and resource utilization. Implementing a feedforward SNN trained on MNIST (93.44% accuracy), SupraSNN achieves 149 $μs$ inference latency and 0.025 mJ per image (0.276 nJ per synapse) on the Xilinx Zynq XC7Z020 FPGA--delivering 47.6% lower latency and 5.6$\times$ better energy efficiency than prior FPGA-based SNN accelerators. Beyond vision tasks, a recurrent SNN on the Spiking Heidelberg Dataset (71.82% accuracy) achieves 1.41 ms latency and 0.77 mJ per sample on XC7Z030.