Marelie H. Davel

REACH: Interpretability-Driven Feature Identification and Architecture Compression for Multi-Channel Vehicular Channel Estimation

Jun 10, 2026

Simbarashe Aldrin Ngorima, Albert Helberg, Marelie H. Davel

Abstract:Multi-channel mixed-SNR training improves out-of-distribution (OOD) generalisation of deep learning channel estimators for IEEE 802.11p vehicular communications, yet the internal mechanism responsible for this remains unexplained. This work presents REACH (Relevance-based Explanation and Architectural Compression for cHannel estimators), a gradient-based interpretability framework that operates at two levels. Input-level attribution identifies a subset of time-frequency features consistently relevant across all evaluated channel conditions, enabling input dimensionality reduction with minimal performance loss. Filter-level attribution reveals a near-universal internal representation, providing a representational account of the observed OOD generalisation. Guided by the resulting filter taxonomy, relevance-guided architecture compression substantially reduces both the number of parameters and the number of floating-point operations (FLOPs) with sub-1 dB normalised mean square error (NMSE) degradation, and OOD generalisation degrades more slowly than within-distribution accuracy under increasing compression.

* 22 pages, 16 figures
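As a reading aid for the NMSE figure quoted above, the following minimal sketch computes normalised mean square error in dB using the common definition, the ratio of error energy to true-channel energy. The function name `nmse_db` and the example values are hypothetical and are not taken from the paper.

```python
import math

def nmse_db(h_true, h_est):
    """NMSE in dB: 10*log10(sum|h_est - h_true|^2 / sum|h_true|^2)."""
    err = sum(abs(e - t) ** 2 for t, e in zip(h_true, h_est))
    ref = sum(abs(t) ** 2 for t in h_true)
    return 10.0 * math.log10(err / ref)

# Hypothetical complex channel coefficients (not from the paper).
h = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
# Estimate with a uniform 10% amplitude error.
h_hat = [1.1 * c for c in h]

value = nmse_db(h, h_hat)
print(f"NMSE = {value:.2f} dB")
```

Under this definition, a "sub-1 dB degradation" means the compressed model's NMSE is less than 1 dB above that of the uncompressed model on the same data.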

Feature extraction for plant growth estimation

Jun 10, 2026

Simbarashe Aldrin Ngorima, Albert Helberg, Marelie H. Davel

Abstract:Precision agriculture requires the estimation of plant growth stages in real-time. When the plant growth stage is known, the wastage of resources in cultivation, such as nutrients and water, is reduced as only the required resources need to be supplied. Plants at different growth stages, however, have similar morphological features, which can make autonomous growth stage estimation difficult. This paper presents two feature extraction methods for growth stage estimation: one that uses a bank of Gabor filters and morphological operations, and the other that uses pre-trained convolutional neural networks (CNNs) and transfer learning. We test these methods on a publicly available plant growth stage dataset ("bccr-segset") for two species, canola and radish, grown and captured under indoor conditions. The two proposed feature extraction methods are compared, using support vector machines and boosted trees as classifiers. We find that both methods are suitable for real-time applications, and that CNN features outperform the hand-crafted features, both with regard to speed and accuracy. The best system (VGG-19 features, classified with a radial basis function support vector machine) obtained an accuracy of 98.4% for both species, processing an image in 0.08 seconds.

* Artificial Intelligence Research. SACAIR 2025. Communications in Computer and Information Science, vol 2784. Springer, Cham (2026)
* 13 pages

Does simple trump complex? Comparing strategies for adversarial robustness in DNNs

Aug 25, 2025

William Brooks, Marelie H. Davel, Coenraad Mouton

Abstract:Deep Neural Networks (DNNs) have shown substantial success in various applications but remain vulnerable to adversarial attacks. This study aims to identify and isolate the components of two different adversarial training techniques that contribute most to increased adversarial robustness, particularly through the lens of margins in the input space -- the minimal distance between data points and decision boundaries. Specifically, we compare two methods that maximize margins: a simple approach which modifies the loss function to increase an approximation of the margin, and a more complex state-of-the-art method (Dynamics-Aware Robust Training) which builds upon this approach. Using a VGG-16 model as our base, we systematically isolate and evaluate individual components from these methods to determine their relative impact on adversarial robustness. We assess the effect of each component on the model's performance under various adversarial attacks, including AutoAttack and Projected Gradient Descent (PGD). Our analysis on the CIFAR-10 dataset reveals which elements most effectively enhance adversarial robustness, providing insights for designing more robust DNNs.

A Data Pilot-Aided Temporal Convolutional Network for Channel Estimation in IEEE 802.11p Vehicle-to-Vehicle Communications

Feb 05, 2025

Simbarashe Aldrin Ngorima, Albert Helberg, Marelie H. Davel

Abstract:In modern communication systems, having an accurate channel estimator is crucial. However, when there is mobility, it becomes difficult to estimate the channel, and the pilot signals used for channel estimation become insufficient. In this paper, we introduce the use of Temporal Convolutional Networks (TCNs) with data pilot-aided (DPA) channel estimation and temporal averaging (TA) to estimate vehicle-to-vehicle same direction with wall (VTV-SDWW) channels. The TCN-DPA-TA estimator showed an improvement in Bit Error Rate (BER) performance of up to one order of magnitude. Furthermore, the BER performance of the TCN-DPA without TA also improved by up to 0.7 orders of magnitude compared to the best classical estimator.

* Southern Africa Telecommunication Networks and Applications Conference (SATNAC) 2024
* 10 pages, 7 Figures, SATNAC 2024 proceedings

Pre-training a Transformer-Based Generative Model Using a Small Sepedi Dataset

Jan 25, 2025

Simon P. Ramalepe, Thipe I. Modipa, Marelie H. Davel

Abstract:Due to the scarcity of data in low-resourced languages, the development of language models for these languages has been very slow. Currently, pre-trained language models have gained popularity in natural language processing, especially in developing domain-specific models for low-resourced languages. In this study, we experiment with the impact of using occlusion-based techniques when training a language model for a text generation task. We curate two new datasets: the Sepedi monolingual (SepMono) dataset from several South African resources and the Sepedi radio news (SepNews) dataset from the radio news domain. We use the SepMono dataset to pre-train transformer-based models using the occlusion and non-occlusion pre-training techniques and compare performance. The SepNews dataset is specifically used for fine-tuning. Our results show that the non-occlusion models perform better than the occlusion-based models when measuring validation loss and perplexity. However, analysis of the generated text using the BLEU score metric, which measures the quality of the generated text, shows a slightly higher BLEU score for the occlusion-based models compared to the non-occlusion models.
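The abstract above reports both validation loss and perplexity. The sketch below shows how the two are usually related, assuming the loss is the mean per-token negative log-likelihood in natural-log units, which the abstract does not state. The function name `perplexity` and the toy probabilities are hypothetical.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical model that assigns probability 1/4 to every observed token.
uniform_log_probs = [math.log(0.25)] * 10

ppl = perplexity(uniform_log_probs)
print(f"perplexity = {ppl:.2f}")
```

Because perplexity is a monotonic function of this loss, a model with lower validation loss also has lower perplexity under this assumption. BLEU is computed on generated text instead, so it can rank the same models differently.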

Impact of Batch Normalization on Convolutional Network Representations

Jan 24, 2025

Hermanus L. Potgieter, Coenraad Mouton, Marelie H. Davel

Abstract:Batch normalization (BatchNorm) is a popular layer normalization technique used when training deep neural networks. It has been shown to enhance the training speed and accuracy of deep learning models. However, the mechanics by which BatchNorm achieves these benefits are an active area of research, and different perspectives have been proposed. In this paper, we investigate the effect of BatchNorm on the resulting hidden representations, that is, the vectors of activation values formed as samples are processed at each hidden layer. Specifically, we consider the sparsity of these representations, as well as their implicit clustering -- the creation of groups of representations that are similar to some extent. We contrast image classification models trained with and without batch normalization and highlight consistent differences observed. These findings highlight that BatchNorm's effect on representational sparsity is not a significant factor affecting generalization, while the representations of models trained with BatchNorm tend to show more advantageous clustering characteristics.

* Communications in Computer and Information Science, vol 2326. Springer, Cham (2025)

Is network fragmentation a useful complexity measure?

Nov 07, 2024

Coenraad Mouton, Randle Rabe, Daniël G. Haasbroek, Marthinus W. Theunissen, Hermanus L. Potgieter, Marelie H. Davel

Abstract:It has been observed that the input space of deep neural network classifiers can exhibit 'fragmentation', where the model function rapidly changes class as the input space is traversed. The severity of this fragmentation tends to follow the double descent curve, achieving a maximum at the interpolation regime. We study this phenomenon in the context of image classification and ask whether fragmentation could be predictive of generalization performance. Using a fragmentation-based complexity measure, we show this to be possible by achieving good performance on the PGDL (Predicting Generalization in Deep Learning) benchmark. In addition, we report on new observations related to fragmentation, namely (i) fragmentation is not limited to the input space but occurs in the hidden representations as well, (ii) fragmentation follows the trends in the validation error throughout training, and (iii) fragmentation is not a direct result of increased weight norms. Together, this indicates that fragmentation is a phenomenon worth investigating further when studying the generalization ability of deep neural networks.

Input margins can predict generalization too

Aug 29, 2023

Coenraad Mouton, Marthinus W. Theunissen, Marelie H. Davel

Abstract:Understanding generalization in deep neural networks is an active area of research. A promising avenue of exploration has been that of margin measurements: the shortest distance to the decision boundary for a given sample or its representation internal to the network. While margins have been shown to be correlated with the generalization ability of a model when measured at its hidden representations (hidden margins), no such link between large margins and generalization has been established for input margins. We show that while input margins are not generally predictive of generalization, they can be if the search space is appropriately constrained. We develop such a measure based on input margins, which we refer to as 'constrained margins'. The predictive power of this new measure is demonstrated on the 'Predicting Generalization in Deep Learning' (PGDL) dataset and contrasted with hidden representation margins. We find that constrained margins achieve highly competitive scores and outperform other margin measurements in general. This provides a novel insight on the relationship between generalization and classification margins, and highlights the importance of considering the data manifold for investigations of generalization in DNNs.
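To make the margin definition above concrete, the sketch below computes the shortest distance to the decision boundary in the one case where it has a closed form: a linear binary classifier with boundary w·x + b = 0. This is only an illustration and not the paper's constrained-margin measure. For a deep network the boundary is nonlinear and the distance must be found by a search. The function name and the numbers are hypothetical.

```python
import math

def linear_margin(w, b, x):
    """Euclidean distance from x to the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm

# Hypothetical 2-D classifier and sample.
w = [3.0, 4.0]
b = -5.0
x = [3.0, 4.0]

m = linear_margin(w, b, x)
print(f"input margin = {m:.1f}")
```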

The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs

Feb 14, 2023

Marthinus W. Theunissen, Coenraad Mouton, Marelie H. Davel

Abstract:Classification margins are commonly used to estimate the generalization ability of machine learning models. We present an empirical study of these margins in artificial neural networks. A global estimate of margin size is usually used in the literature. In this work, we point out seldom considered nuances regarding classification margins. Notably, we demonstrate that some types of training samples are modelled with consistently small margins while affecting generalization in different ways. By showing a link with the minimum distance to a different-target sample and the remoteness of samples from one another, we provide a plausible explanation for this observation. We support our findings with an analysis of fully-connected networks trained on noise-corrupted MNIST data, as well as convolutional networks trained on noise-corrupted CIFAR10 data.

* In Communications in Computer and Information Science, vol 1734. Springer, Cham (2022)
* This work is a preprint of a published paper by the same name, which it subsumes. This preprint is an extended version: it contains additional empirical evidence and discussion

Efficient acoustic feature transformation in mismatched environments using a Guided-GAN

Oct 06, 2022

Walter Heymans, Marelie H. Davel, Charl van Heerden

Abstract:We propose a new framework to improve automatic speech recognition (ASR) systems in resource-scarce environments using a generative adversarial network (GAN) operating on acoustic input features. The GAN is used to enhance the features of mismatched data prior to decoding, or can optionally be used to fine-tune the acoustic model. We achieve improvements that are comparable to multi-style training (MTR), but at a lower computational cost. With less than one hour of data, an ASR system trained on good-quality data and evaluated on mismatched audio is improved by between 11.5% and 19.7% in relative word error rate (WER). Experiments demonstrate that the framework can be very useful in under-resourced environments where training data and computational resources are limited. The GAN does not require parallel training data, because it utilises a baseline acoustic model to provide an additional loss term that guides the generator to create acoustic features that are better classified by the baseline.

* Speech Communication, 143, pp.10-20 (2022)
* Final published version available at: Efficient acoustic feature transformation in mismatched environments using a Guided-GAN. Speech Communication, 143, pp.10-20
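The improvement quoted in the abstract above is a relative WER figure. As a small worked example, the sketch below shows the usual arithmetic for a relative reduction, the drop in WER divided by the baseline WER. The function name and the WER values are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, improved_wer):
    """Relative reduction: (baseline - improved) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer

# Hypothetical WERs in percent: 40% before, 34% after feature enhancement.
rel = relative_wer_reduction(40.0, 34.0)
print(f"relative WER reduction = {rel:.1%}")
```

In this made-up case the absolute reduction is 6 percentage points while the relative reduction is 15%, which is why the two should not be confused when comparing systems.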
