Paul Devos

BAT: Better Audio Transformer Guided by Convex Gated Probing

Feb 18, 2026

Houtan Ghaffari, Lukas Rauch, Christoph Scholz, Paul Devos

Abstract:Probing is widely adopted in computer vision to faithfully evaluate self-supervised learning (SSL) embeddings, as fine-tuning may misrepresent their inherent quality. In contrast, audio SSL models still rely on fine-tuning because simple probing fails to unlock their full potential and alters their rankings when competing for SOTA on AudioSet. Hence, a robust and efficient probing mechanism is required to guide the trajectory of audio SSL towards reliable and reproducible methods. We introduce Convex Gated Probing (CGP), a prototype-based method that drastically closes the gap between fine-tuning and probing in audio. CGP efficiently utilizes all frozen layers via a gating mechanism and exposes the location of latent task-relevant information. Guided by CGP, we rework the entire SSL pipeline of current SOTA audio models that use legacy implementations of prior SSL methods. By refining data preprocessing, model architecture, and pre-training recipe, we introduce Better Audio Transformer (BAT), and establish new SOTA on audio benchmarks.
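
The abstract does not spell out how CGP's gate is computed. As a rough illustration only, the sketch below assumes that "convex" means a softmax-weighted (non-negative, sum-to-one) combination of per-layer embeddings taken from the frozen encoder. The names `convex_layer_mix` and `gate_logits` are hypothetical, and the prototype-based classifier mentioned in the abstract is not modeled here.

```python
import math

def softmax(logits):
    """Numerically stable softmax; outputs are non-negative and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def convex_layer_mix(layer_embeddings, gate_logits):
    """Mix one embedding per frozen layer with convex (softmax) weights.

    ASSUMPTION: this is one plausible reading of a convex gate,
    not the formulation from the paper.
    layer_embeddings: list of equal-length vectors, one per layer.
    gate_logits: one learnable scalar per layer.
    """
    weights = softmax(gate_logits)
    dim = len(layer_embeddings[0])
    mixed = [
        sum(w * emb[i] for w, emb in zip(weights, layer_embeddings))
        for i in range(dim)
    ]
    return mixed, weights

# Two layers with equal logits: the mix is the plain average.
mixed, weights = convex_layer_mix([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
print(mixed, weights)
```

Under this assumption, the learned weights would also indicate which layers carry the task-relevant information, which is the kind of diagnostic the abstract attributes to CGP.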

Batch Normalization-Free Fully Integer Quantized Neural Networks via Progressive Tandem Learning

Dec 18, 2025

Pengfei Sun, Wenyu Jiang, Piew Yoong Chee, Paul Devos, Dick Botteldooren

Abstract:Quantised neural networks (QNNs) shrink models and reduce inference energy through low-bit arithmetic, yet most still depend on a running statistics batch normalisation (BN) layer, preventing true integer-only deployment. Prior attempts remove BN by parameter folding or tailored initialisation; while helpful, they rarely recover BN's stability and accuracy and often impose bespoke constraints. We present a BN-free, fully integer QNN trained via a progressive, layer-wise distillation scheme that slots into existing low-bit pipelines. Starting from a pretrained BN-enabled teacher, we use layer-wise targets and progressive compensation to train a student that performs inference exclusively with integer arithmetic and contains no BN operations. On ImageNet with AlexNet, the BN-free model attains competitive Top-1 accuracy under aggressive quantisation. The procedure integrates directly with standard quantisation workflows, enabling end-to-end integer-only inference for resource-constrained settings such as edge and embedded devices.
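
For context, the "parameter folding" baseline that the abstract contrasts against can be sketched in floating point as follows. This is the standard inference-time identity for merging a BN layer into the preceding linear layer, not the paper's progressive distillation scheme; the function names and the `eps` default are illustrative assumptions.

```python
import math

def linear(weights, bias, x):
    """y_j = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BN using fixed running statistics per channel."""
    return [g * (yi - mu) / math.sqrt(v + eps) + be
            for yi, g, be, mu, v in zip(y, gamma, beta, mean, var)]

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding linear layer.

    W'[j] = W[j] * gamma[j] / sqrt(var[j] + eps)
    b'[j] = (b[j] - mean[j]) * gamma[j] / sqrt(var[j] + eps) + beta[j]
    """
    folded_w, folded_b = [], []
    for row, b, g, be, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([scale * w for w in row])
        folded_b.append(scale * (b - mu) + be)
    return folded_w, folded_b

# Illustrative values only.
W, b = [[1.0, -2.0], [0.5, 0.25]], [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.2, -0.1]
mean, var = [0.4, -0.2], [2.0, 0.5]
x = [3.0, -1.0]

reference = batchnorm(linear(W, b, x), gamma, beta, mean, var)
Wf, bf = fold_batchnorm(W, b, gamma, beta, mean, var)
folded = linear(Wf, bf, x)
print(reference, folded)
```

The identity is exact in floating point arithmetic; the difficulty the abstract points to arises once the folded weights are quantised to low-bit integers.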

Data-Efficient Self-Supervised Algorithms for Fine-Grained Birdsong Analysis

Nov 15, 2025

Houtan Ghaffari, Lukas Rauch, Paul Devos

Abstract:Much research in bioacoustics, neuroscience, and linguistics uses birdsong as a proxy model to acquire knowledge in diverse areas. Developing models generally requires precisely annotated data at the level of syllables. Hence, automated and data-efficient methods that reduce annotation costs are in demand. This work presents a lightweight yet performant neural network architecture for birdsong annotation called Residual-MLP-RNN. It then presents a robust three-stage training pipeline for developing reliable deep birdsong syllable detectors with minimal expert labor. The first stage is self-supervised learning from unlabeled data. Two of the most successful pretraining paradigms are explored, namely masked prediction and online clustering. The second stage is supervised training with effective data augmentations to create a robust model for frame-level syllable detection. The third stage is semi-supervised post-training, which leverages the unlabeled data again; unlike the initial stage, this time the training is aligned with the downstream task. The performance of this data-efficient approach is demonstrated on the complex song of the canary in extreme label-scarcity scenarios. The canary has one of the most difficult songs to annotate, which implicitly validates the method for other birds. Finally, the potential of self-supervised embeddings is assessed for linear probing and unsupervised birdsong analysis.

Comparison of self-supervised in-domain and supervised out-domain transfer learning for bird species recognition

Apr 26, 2024

Houtan Ghaffari, Paul Devos

Abstract:Transferring the weights of a pre-trained model to assist another task has become a crucial part of modern deep learning, particularly in data-scarce scenarios. Pre-training refers to the initial step of training models outside the current task of interest, typically on another dataset. It can be done via supervised models using human-annotated datasets or self-supervised models trained on unlabeled datasets. In both cases, many pre-trained models are available to fine-tune for the task of interest. Interestingly, research has shown that models pre-trained on ImageNet can be helpful for audio tasks despite being trained on image datasets. Hence, it is unclear whether in-domain models would be advantageous compared to competent out-domain models, such as convolutional neural networks pre-trained on ImageNet. Our experiments demonstrate the usefulness of in-domain models and datasets for bird species recognition by leveraging VICReg, a recent and powerful self-supervised method.

EEG decoding with conditional identification information

Mar 21, 2024

Pengfei Sun, Jorg De Winne, Paul Devos, Dick Botteldooren

Abstract:Decoding EEG signals is crucial for unraveling the human brain and advancing brain-computer interfaces. Traditional machine learning algorithms have been hindered by the high noise levels and inherent inter-person variations in EEG signals. Recent advances in deep neural networks (DNNs) have shown promise, owing to their advanced nonlinear modeling capabilities. However, DNNs still face challenges in decoding EEG samples of unseen individuals. To address this, this paper introduces a novel approach that incorporates the conditional identification information of each individual into the neural network, thereby enhancing model representation through the synergistic interaction of EEG and personal traits. We test our model on the WithMe dataset and demonstrate that the inclusion of these identifiers substantially boosts accuracy for both subjects in the training set and unseen subjects. This enhancement suggests promising potential for improving EEG interpretability and the understanding of relevant identification features.

* Accepted by 6th International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI' 2024)

Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate

Oct 23, 2023

Pengfei Sun, Jibin Wu, Malu Zhang, Paul Devos, Dick Botteldooren

Abstract:Recurrent Neural Networks (RNNs) are renowned for their adeptness in modeling temporal dependencies, a trait that has driven their widespread adoption for sequential data processing. Nevertheless, vanilla RNNs are confronted with the well-known issue of gradient vanishing and exploding, posing a significant challenge for learning and establishing long-range dependencies. Additionally, gated RNNs tend to be over-parameterized, resulting in poor network generalization. To address these challenges, we propose a novel Delayed Memory Unit (DMU) in this paper, wherein a delay line structure, coupled with delay gates, is introduced to facilitate temporal interaction and temporal credit assignment, so as to enhance the temporal modeling capabilities of vanilla RNNs. Particularly, the DMU is designed to directly distribute the input information to the optimal time instant in the future, rather than aggregating and redistributing it over time through intricate network dynamics. Our proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks, utilizing considerably fewer parameters than other state-of-the-art gated RNN models in applications such as speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification.

Adaptive Axonal Delays in feedforward spiking neural networks for accurate spoken word recognition

Feb 16, 2023

Pengfei Sun, Ehsan Eqlimi, Yansong Chua, Paul Devos, Dick Botteldooren

Abstract:Spiking neural networks (SNNs) are a promising research avenue for building accurate and efficient automatic speech recognition systems. Recent advances in audio-to-spike encoding and training algorithms enable SNNs to be applied to practical tasks. Biologically inspired SNNs communicate using sparse asynchronous events. Therefore, spike timing is critical to SNN performance. In this respect, most works focus on training synaptic weights, and few have considered delays in event transmission, namely axonal delays. In this work, we consider a learnable axonal delay capped at a maximum value, which can be adapted according to the axonal delay distribution in each network layer. We show that our proposed method achieves the best classification results reported on the SHD dataset (92.45%) and NTIDIGITS dataset (95.09%). Our work illustrates the potential of training axonal delays for tasks with complex temporal structures.
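
To make the idea of a capped axonal delay concrete, the sketch below shows only the forward effect of such a delay on a discrete spike train: the train is shifted later in time by a per-neuron delay that is clamped to a maximum. How the delay is learned and how the cap is adapted per layer are not described in the abstract and are not modeled here; the function name and the integer-step discretisation are assumptions.

```python
def apply_axonal_delay(spikes, delay, max_delay):
    """Shift a binary spike train later in time by a capped delay.

    ASSUMPTION: time is discretised into steps and the delay is
    rounded to a whole number of steps; this is an illustration,
    not the training procedure from the paper.
    """
    # Clamp the delay to the range [0, max_delay].
    d = max(0, min(int(round(delay)), max_delay))
    # A delay longer than the window pushes every spike out of it.
    d = min(d, len(spikes))
    # Pad the front with silence and drop spikes pushed past the end.
    return [0] * d + list(spikes[:len(spikes) - d])

spikes = [1, 0, 1, 0, 0]
print(apply_axonal_delay(spikes, delay=2, max_delay=3))
print(apply_axonal_delay(spikes, delay=7, max_delay=3))  # capped at 3 steps
```

The output always has the same length as the input, so delayed trains from different neurons stay aligned on a common time grid.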

* Accepted by ICASSP 2023
