We present a fully memristive spiking neural network (MSNN) consisting of novel memristive neurons trained using the backpropagation through time (BPTT) learning rule. Gradient descent is applied directly to the memristive integrated-and-fire (MIF) neuron designed using analog SPICE circuit models, which generates distinct depolarization, hyperpolarization, and repolarization voltage waveforms. Synaptic weights are trained by BPTT using the membrane potential of the MIF neuron model and can be processed on memristive crossbars. The natural spiking dynamics of the MIF neuron model are fully differentiable, eliminating the need for gradient approximations that are prevalent in the spiking neural network literature. Despite the added complexity of training directly on SPICE circuit models, we achieve 97.58% accuracy on the MNIST testing dataset and 75.26% on the Fashion-MNIST testing dataset, the highest accuracies among all fully MSNNs.
Spiking and Quantized Neural Networks (NNs) are becoming exceedingly important for hyper-efficient implementations of Deep Learning (DL) algorithms. However, these networks face challenges when trained using error backpropagation, due to the absence of gradient signals when applying hard thresholds. The broadly accepted trick to overcoming this is through the use of biased gradient estimators: surrogate gradients which approximate thresholding in Spiking Neural Networks (SNNs), and Straight-Through Estimators (STEs), which completely bypass thresholding in Quantized Neural Networks (QNNs). While noisy gradient feedback has enabled reasonable performance on simple supervised learning tasks, it is thought that such noise increases the difficulty of finding optima in loss landscapes, especially during the later stages of optimization. By periodically boosting the Learning Rate (LR) during training, we expect the network can navigate unexplored solution spaces that would otherwise be difficult to reach due to local minima, barriers, or flat surfaces. This paper presents a systematic evaluation of a cosine-annealed LR schedule coupled with weight-independent adaptive moment estimation as applied to Quantized SNNs (QSNNs). We provide a rigorous empirical evaluation of this technique on high precision and 4-bit quantized SNNs across three datasets, demonstrating (close to) state-of-the-art performance on the more complex datasets. Our source code is available at this link: https://github.com/jeshraghian/QSNNs.
Spiking neural networks can compensate for quantization error by encoding information either in the temporal domain, or by processing discretized quantities in hidden states of higher precision. In theory, a wide dynamic range state-space enables multiple binarized inputs to be accumulated together, thus improving the representational capacity of individual neurons. This may be achieved by increasing the firing threshold, but make it too high and sparse spike activity turns into no spike emission. In this paper, we propose the use of `threshold annealing' as a warm-up method for firing thresholds. We show it enables the propagation of spikes across multiple layers where neurons would otherwise cease to fire, and in doing so, achieve highly competitive results on four diverse datasets, despite using binarized weights. Source code is available at https://github.com/jeshraghian/snn-tha/
The impact of device and circuit-level effects in mixed-signal Resistive Random Access Memory (RRAM) accelerators typically manifest as performance degradation of Deep Learning (DL) algorithms, but the degree of impact varies based on algorithmic features. These include network architecture, capacity, weight distribution, and the type of inter-layer connections. Techniques are continuously emerging to efficiently train sparse neural networks, which may have activation sparsity, quantization, and memristive noise. In this paper, we present an extended Design Space Exploration (DSE) methodology to quantify the benefits and limitations of dense and sparse mapping schemes for a variety of network architectures. While sparsity of connectivity promotes less power consumption and is often optimized for extracting localized features, its performance on tiled RRAM arrays may be more susceptible to noise due to under-parameterization, when compared to dense mapping schemes. Moreover, we present a case study quantifying and formalizing the trade-offs of typical non-idealities introduced into 1-Transistor-1-Resistor (1T1R) tiled memristive architectures and the size of modular crossbar tiles using the CIFAR-10 dataset.
The brain is the perfect place to look for inspiration to develop more efficient neural networks. The inner workings of our synapses and neurons provide a glimpse at what the future of deep learning might look like. This paper serves as a tutorial and perspective showing how to apply the lessons learnt from several decades of research in deep learning, gradient descent, backpropagation and neuroscience to biologically plausible spiking neural neural networks. We also explore the delicate interplay between encoding data as spikes and the learning process; the challenges and solutions of applying gradient-based learning to spiking neural networks; the subtle link between temporal backpropagation and spike timing dependent plasticity, and how deep learning might move towards biologically plausible online learning. Some ideas are well accepted and commonly used amongst the neuromorphic engineering community, while others are presented or justified for the first time here. A series of companion interactive tutorials complementary to this paper using our Python package, snnTorch, are also made available: https://snntorch.readthedocs.io/en/latest/tutorials/index.html
Stochastic Computing (SC) is a computing paradigm that allows for the low-cost and low-power computation of various arithmetic operations using stochastic bit streams and digital logic. In contrast to conventional representation schemes used within the binary domain, the sequence of bit streams in the stochastic domain is inconsequential, and computation is usually non-deterministic. In this brief, we exploit the stochasticity during switching of probabilistic Conductive Bridging RAM (CBRAM) devices to efficiently generate stochastic bit streams in order to perform Deep Learning (DL) parameter optimization, reducing the size of Multiply and Accumulate (MAC) units by 5 orders of magnitude. We demonstrate that in using a 40-nm Complementary Metal Oxide Semiconductor (CMOS) process our scalable architecture occupies 1.55mm$^2$ and consumes approximately 167$\mu$W when optimizing parameters of a Convolutional Neural Network (CNN) while it is being trained for a character recognition task, observing no notable reduction in accuracy post-training.
Deep neural network inference accelerators are rapidly growing in importance as we turn to massively parallelized processing beyond GPUs and ASICs. The dominant operation in feedforward inference is the multiply-and-accumlate process, where each column in a crossbar generates the current response of a single neuron. As a result, memristor crossbar arrays parallelize inference and image processing tasks very efficiently. In this brief, we present a 3-D active memristor crossbar array `CrossStack', which adopts stacked pairs of Al/TiO2/TiO2-x/Al devices with common middle electrodes. By designing CMOS-memristor hybrid cells used in the layout of the array, CrossStack can operate in one of two user-configurable modes as a reconfigurable inference engine: 1) expansion mode and 2) deep-net mode. In expansion mode, the resolution of the network is doubled by increasing the number of inputs for a given chip area, reducing IR drop by 22%. In deep-net mode, inference speed per-10-bit convolution is improved by 29\% by simultaneously using one TiO2/TiO2-x layer for read processes, and the other for write processes. We experimentally verify both modes on our $10\times10\times2$ array.
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors, new opportunities are emerging for applying deep and Spiking Neural Network (SNN) algorithms to healthcare and biomedical applications at the edge. This can facilitate the advancement of the medical Internet of Things (IoT) systems and Point of Care (PoC) devices. In this paper, we provide a tutorial describing how various technologies ranging from emerging memristive devices, to established Field Programmable Gate Arrays (FPGAs), and mature Complementary Metal Oxide Semiconductor (CMOS) technology can be used to develop efficient DL accelerators to solve a wide variety of diagnostic, pattern recognition, and signal processing problems in healthcare. Furthermore, we explore how spiking neuromorphic processors can complement their DL counterparts for processing biomedical signals. After providing the required background, we unify the sparsely distributed research on neural network and neuromorphic hardware implementations as applied to the healthcare domain. In addition, we benchmark various hardware platforms by performing a biomedical electromyography (EMG) signal processing task and drawing comparisons among them in terms of inference delay and energy. Finally, we provide our analysis of the field and share a perspective on the advantages, disadvantages, challenges, and opportunities that different accelerators and neuromorphic processors introduce to healthcare and biomedical domains. This paper can serve a large audience, ranging from nanoelectronics researchers, to biomedical and healthcare practitioners in grasping the fundamental interplay between hardware, algorithms, and clinical adoption of these tools, as we shed light on the future of deep networks and spiking neuromorphic processing systems as proponents for driving biomedical circuits and systems forward.