Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Piotr Wzorek

End-to-End Keyword Spotting on FPGA Using Graph Neural Networks with a Neuromorphic Auditory Sensor

May 10, 2026

Wiktor Matykiewicz, Piotr Wzorek, Kamil Jeziorek, Tomás Muñoz, Antonio Rios-Navarro, Angel Jiménez-Fernández, Tomasz Kryjak

Abstract:With the rapid growth of mobile robotics and embedded intelligence, there is an increasing demand for efficient on-device data processing on edge platforms. A promising research direction is the use of neuromorphic sensors inspired by human sensory systems, which generate sparse, event-based data encoding changes in the environment. In this work, we present the first end-to-end FPGA implementation of a keyword spotting system that integrates a Neuromorphic Auditory Sensor (NAS) and a graph neural network (GNN) on a single FPGA device, enabling real-time processing of raw audio data. The proposed architecture eliminates conventional signal preprocessing and operates directly on event-based audio streams. Leveraging a compute-near-memory network architecture, the system achieves efficient inference with low latency and low power consumption. Experimental results demonstrate an accuracy of 87.43% after quantization on the Google Speech Commands v2 dataset processed through the neuromorphic sensor, with end-to-end latency below 35 us and average power consumption of 1.12 W. The processed datasets, software models, and hardware modules are available at https://github.com/vision-agh/NAS-GNN-KWS.

* Accepted for the ARC 2026 conference

Via

Access Paper or Ask Questions

Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA

Feb 18, 2026

Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Hiroshi Nakano, Manon Dampfhoffer, Thomas Mesquida, Hiroaki Nishi, Thomas Dalgaty, Tomasz Kryjak

Abstract:As the volume of data recorded by embedded edge sensors increases, particularly from neuromorphic devices producing discrete event streams, there is a growing need for hardware-aware neural architectures that enable efficient, low-latency, and energy-conscious local processing. We present an FPGA implementation of event-graph neural networks for audio processing. We utilise an artificial cochlea that converts time-series signals into sparse event data, reducing memory and computation costs. Our architecture was implemented on a SoC FPGA and evaluated on two open-source datasets. For classification task, our baseline floating-point model achieves 92.7% accuracy on SHD dataset - only 2.4% below the state of the art - while requiring over 10x and 67x fewer parameters. On SSC, our models achieve 66.9-71.0% accuracy. Compared to FPGA-based spiking neural networks, our quantised model reaches 92.3% accuracy, outperforming them by up to 19.3% while reducing resource usage and latency. For SSC, we report the first hardware-accelerated evaluation. We further demonstrate the first end-to-end FPGA implementation of event-audio keyword spotting, combining graph convolutional layers with recurrent sequence modelling. The system achieves up to 95% word-end detection accuracy, with only 10.53 microsecond latency and 1.18 W power consumption, establishing a strong benchmark for energy-efficient event-driven KWS.

* Under revision in TRETS Journal

Via

Access Paper or Ask Questions

Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA

Mar 09, 2025

Hiroshi Nakano, Krzysztof Blachut, Kamil Jeziorek, Piotr Wzorek, Manon Dampfhoffer, Thomas Mesquida, Hiroaki Nishi, Tomasz Kryjak, Thomas Dalgaty

Figure 1 for Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA

Figure 2 for Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA

Figure 3 for Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA

Figure 4 for Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA

Abstract:As the quantities of data recorded by embedded edge sensors grow, so too does the need for intelligent local processing. Such data often comes in the form of time-series signals, based on which real-time predictions can be made locally using an AI model. However, a hardware-software approach capable of making low-latency predictions with low power consumption is required. In this paper, we present a hardware implementation of an event-graph neural network for time-series classification. We leverage an artificial cochlea model to convert the input time-series signals into a sparse event-data format that allows the event-graph to drastically reduce the number of calculations relative to other AI methods. We implemented the design on a SoC FPGA and applied it to the real-time processing of the Spiking Heidelberg Digits (SHD) dataset to benchmark our approach against competitive solutions. Our method achieves a floating-point accuracy of 92.7% on the SHD dataset for the base model, which is only 2.4% and 2% less than the state-of-the-art models with over 10% and 67% fewer model parameters, respectively. It also outperforms FPGA-based spiking neural network implementations by 19.3% and 4.5%, achieving 92.3% accuracy for the quantised model while using fewer computational resources and reducing latency.

* Paper accepted for the 21st International Symposium on Applied Reconfigurable Computing ARC 2025, Sevilla, Spain, April 9-11, 2025

Via

Access Paper or Ask Questions

Increasing the scalability of graph convolution for FPGA-implemented event-based vision

Nov 06, 2024

Piotr Wzorek, Kamil Jeziorek, Tomasz Kryjak, Andrea Pinna

Figure 1 for Increasing the scalability of graph convolution for FPGA-implemented event-based vision

Figure 2 for Increasing the scalability of graph convolution for FPGA-implemented event-based vision

Figure 3 for Increasing the scalability of graph convolution for FPGA-implemented event-based vision

Abstract:Event cameras are becoming increasingly popular as an alternative to traditional frame-based vision sensors, especially in mobile robotics. Taking full advantage of their high temporal resolution, high dynamic range, low power consumption and sparsity of event data, which only reflects changes in the observed scene, requires both an efficient algorithm and a specialised hardware platform. A recent trend involves using Graph Convolutional Neural Networks (GCNNs) implemented on a heterogeneous SoC FPGA. In this paper we focus on optimising hardware modules for graph convolution to allow flexible selection of the FPGA resource (BlockRAM, DSP and LUT) for their implementation. We propose a ''two-step convolution'' approach that utilises additional BRAM buffers in order to reduce up to 94% of LUT usage for multiplications. This method significantly improves the scalability of GCNNs, enabling the deployment of models with more layers, larger graphs sizes and their application for more dynamic scenarios.

* Accepted for the PhD forum during FPT 2024 (International Conference on Field Programmable Technology), 10-12 December 2024, Sydney, Australia

Via

Access Paper or Ask Questions

Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs

Jun 11, 2024

Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Andrea Pinna, Tomasz Kryjak

Abstract:The utilisation of event cameras represents an important and swiftly evolving trend aimed at addressing the constraints of traditional video systems. Particularly within the automotive domain, these cameras find significant relevance for their integration into embedded real-time systems due to lower latency and energy consumption. One effective approach to ensure the necessary throughput and latency for event processing systems is through the utilisation of graph convolutional networks (GCNs). In this study, we introduce a series of hardware-aware optimisations tailored for PointNet++, a GCN architecture designed for point cloud processing. The proposed techniques result in more than a 100-fold reduction in model size compared to Asynchronous Event-based GNN (AEGNN), one of the most recent works in the field, with a relatively small decrease in accuracy (2.3% for N-Caltech101 classification, 1.7% for N-Cars classification), thus following the TinyML trend. Based on software research, we designed a custom EFGCN (Event-Based FPGA-accelerated Graph Convolutional Network) and we implemented it on ZCU104 SoC FPGA platform, achieving a throughput of 13.3 million events per second (MEPS) and real-time partially asynchronous processing with a latency of 4.47 ms. We also address the scalability of the proposed hardware model to improve the obtained accuracy score. To the best of our knowledge, this study marks the first endeavour in accelerating PointNet++ networks on SoC FPGAs, as well as the first hardware architecture exploration of graph convolutional networks implementation for real-time continuous event data processing. We publish both software and hardware source code in an open repository: https://github.com/vision-agh/*** (will be published upon acceptance).

* Submitted to the IEEE Transactions on Circuits and System for Video Technology. This manuscript was first submitted for publication on March 31, 2024. It has since been revised twice: on May 22, 2024 and June 10, 2024

Via

Access Paper or Ask Questions

Optimising Graph Representation for Hardware Implementation of Graph Convolutional Networks for Event-based Vision

Jan 10, 2024

Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Andrea Pinna, Tomasz Kryjak

Abstract:Event-based vision is an emerging research field involving processing data generated by Dynamic Vision Sensors (neuromorphic cameras). One of the latest proposals in this area are Graph Convolutional Networks (GCNs), which allow to process events in its original sparse form while maintaining high detection and classification performance. In this paper, we present the hardware implementation of a~graph generation process from an event camera data stream, taking into account both the advantages and limitations of FPGAs. We propose various ways to simplify the graph representation and use scaling and quantisation of values. We consider both undirected and directed graphs that enable the use of PointNet convolution. The results obtained show that by appropriately modifying the graph representation, it is possible to create a~hardware module for graph generation. Moreover, the proposed modifications have no significant impact on object detection performance, only 0.08% mAP less for the base model and the N-Caltech data set.Finally, we describe the proposed hardware architecture of the graph generation module.

* Paper was accepted for the DASIP 2024 workshop in conjunction with HiPEAC 2024 (Munich, Germany)

Via

Access Paper or Ask Questions

Pedestrian detection with high-resolution event camera

May 29, 2023

Piotr Wzorek, Tomasz Kryjak

Figure 1 for Pedestrian detection with high-resolution event camera

Abstract:Despite the dynamic development of computer vision algorithms, the implementation of perception and control systems for autonomous vehicles such as drones and self-driving cars still poses many challenges. A video stream captured by traditional cameras is often prone to problems such as motion blur or degraded image quality due to challenging lighting conditions. In addition, the frame rate - typically 30 or 60 frames per second - can be a limiting factor in certain scenarios. Event cameras (DVS -- Dynamic Vision Sensor) are a potentially interesting technology to address the above mentioned problems. In this paper, we compare two methods of processing event data by means of deep learning for the task of pedestrian detection. We used a representation in the form of video frames, convolutional neural networks and asynchronous sparse convolutional neural networks. The results obtained illustrate the potential of event cameras and allow the evaluation of the accuracy and efficiency of the methods used for high-resolution (1280 x 720 pixels) footage.

* Accepted for the PP-RAI'2023 - 4th Polish Conference on Artificial Intelligence

Via

Access Paper or Ask Questions

Traffic Sign Detection With Event Cameras and DCNN

Jul 27, 2022

Piotr Wzorek, Tomasz Kryjak

Figure 1 for Traffic Sign Detection With Event Cameras and DCNN

Figure 2 for Traffic Sign Detection With Event Cameras and DCNN

Figure 3 for Traffic Sign Detection With Event Cameras and DCNN

Figure 4 for Traffic Sign Detection With Event Cameras and DCNN

Abstract:In recent years, event cameras (DVS - Dynamic Vision Sensors) have been used in vision systems as an alternative or supplement to traditional cameras. They are characterised by high dynamic range, high temporal resolution, low latency, and reliable performance in limited lighting conditions -- parameters that are particularly important in the context of advanced driver assistance systems (ADAS) and self-driving cars. In this work, we test whether these rather novel sensors can be applied to the popular task of traffic sign detection. To this end, we analyse different representations of the event data: event frame, event frequency, and the exponentially decaying time surface, and apply video frame reconstruction using a deep neural network called FireNet. We use the deep convolutional neural network YOLOv4 as a detector. For particular representations, we obtain a detection accuracy in the range of 86.9-88.9% mAP@0.5. The use of a fusion of the considered representations allows us to obtain a detector with higher accuracy of 89.9% mAP@0.5. In comparison, the detector for the frames reconstructed with FireNet is characterised by an accuracy of 72.67% mAP@0.5. The results obtained illustrate the potential of event cameras in automotive applications, either as standalone sensors or in close cooperation with typical frame-based cameras.

* Accepted for the SPA 2022 conference, Poznan, Poland

Via

Access Paper or Ask Questions

Training dataset generation for bridge game registration

Sep 24, 2021

Piotr Wzorek, Tomasz Kryjak

Abstract:This paper presents a method for automatic generation of a training dataset for a deep convolutional neural network used for playing card detection. The solution allows to skip the time-consuming processes of manual image collecting and labelling recognised objects. The YOLOv4 network trained on the generated dataset achieved an efficiency of 99.8% in the cards detection task. The proposed method is a part of a project that aims to automate the process of broadcasting duplicate bridge competitions using a vision system and neural networks.

* Submitted to Zeszyty Studenckiego Towarzystwa Naukowego, ISSN 1732-0925

Via

Access Paper or Ask Questions