Isabel Praça

GECAD, ISEP, Polytechnic of Porto, Portugal

Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data

May 26, 2026

Tomás Pereira, João Vitorino, Eva Maia, Isabel Praça

Abstract:Despite the wide use of explainability techniques to attempt to understand the behavior of Artificial Intelligence (AI), the generated explanations may not always be reliable. An explanation can appear plausible to humans but fail to capture the internal reasoning of a model, particularly when dealing with complex tabular data. This paper studies the trustworthiness of local explainability techniques when applied to complex tabular classification tasks, evaluating metrics for three main properties: faithfulness to the model's predictions, robustness to input data variations, and complexity of the explanation itself. A benchmark was performed for Local Interpretable Model-Agnostic Explanations (LIME), Kernel SHapley Additive exPlanations (SHAP), and Feature Ablation techniques, across 32 datasets and different types of machine learning models. Model performance ranges were analyzed to identify two groups: consensus-correct, which are samples that all models predicted correctly, and consensus-wrong, which are samples that all models predicted incorrectly. The obtained results demonstrate that explanation quality is not always correlated with a model's predictive performance. Instead, dataset complexity and feature distributions seem to be the main factors affecting explanation quality and reliability.

* 9 pages, 12 tables, 1 figure, DATA 2026 Conference
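The consensus-correct and consensus-wrong groups described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `consensus_groups`, the input layout (one list of predictions per model), and the toy labels are assumptions made only for illustration.

```python
def consensus_groups(y_true, predictions):
    """Return (consensus_correct, consensus_wrong) lists of sample indices.

    y_true: list of true labels, one per sample.
    predictions: list of per-model prediction lists (hypothetical layout),
                 each the same length as y_true.
    """
    consensus_correct, consensus_wrong = [], []
    for i, label in enumerate(y_true):
        # One boolean per model: did this model predict sample i correctly?
        hits = [model_preds[i] == label for model_preds in predictions]
        if all(hits):
            consensus_correct.append(i)   # every model was right
        elif not any(hits):
            consensus_wrong.append(i)     # every model was wrong
        # Samples with mixed outcomes belong to neither group.
    return consensus_correct, consensus_wrong


# Toy data: 5 samples, 3 models (values are made up).
y_true = [0, 1, 1, 0, 1]
predictions = [
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
correct_idx, wrong_idx = consensus_groups(y_true, predictions)
print(correct_idx, wrong_idx)
```

Samples on which the models disagree fall into neither group, which is why the two lists need not cover the whole dataset.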

Machine Learning Transferability for Malware Detection

Mar 27, 2026

César Vieira, João Vitorino, Eva Maia, Isabel Praça

Abstract:Malware continues to be a predominant operational risk for organizations, especially when obfuscation techniques are used to evade detection. Despite the ongoing efforts in the development of Machine Learning (ML) detection approaches, there is still a lack of feature compatibility in public datasets. This limits generalization when facing distribution shifts, as well as transferability to different datasets. This study evaluates the suitability of different data preprocessing approaches for the detection of Portable Executable (PE) files with ML models. The preprocessing pipeline unifies datasets that use EMBERv2 (2,381-dimensional) features and trains paired models under two training setups: EMBER + BODMAS and EMBER + BODMAS + ERMDS. For model evaluation, the models from both setups are tested against TRITIUM, INFERNO and SOREL-20M. ERMDS is also used as a test set for the EMBER + BODMAS setup.

* 12 pages, 1 Figure, 2 tables, World CIST 2026

Revisiting Network Traffic Analysis: Compatible network flows for ML models

Nov 11, 2025

João Vitorino, Daniela Pinto, Eva Maia, Ivone Amorim, Isabel Praça

Abstract:To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features. However, it can be difficult to accurately represent the complex traffic patterns of an attack, especially in Internet-of-Things (IoT) networks. This paper studies the impact that seemingly similar features created by different network traffic flow exporters can have on the generalization and robustness of ML models. In addition to the original CSV files of the Bot-IoT, IoT-23, and CICIoT23 datasets, the raw network packets of their PCAP files were analysed with the HERA tool, generating new labelled flows and extracting consistent features for new CSV versions. To assess the usefulness of these new flows for intrusion detection, they were compared with the original versions and were used to fine-tune multiple models. Overall, the results indicate that directly analysing and preprocessing PCAP files, instead of just using the commonly available CSV files, enables the computation of more relevant features to train bagging and gradient boosting decision tree ensembles. It is important to continue improving feature extraction and feature selection processes to make different datasets more compatible and enable a trustworthy evaluation and comparison of the ML models used in cybersecurity solutions.

* 16 pages, 12 tables, 1 figure, FPS 2025 conference

Binary and Multiclass Cyberattack Classification on GeNIS Dataset

Nov 11, 2025

Miguel Silva, Daniela Pinto, João Vitorino, Eva Maia, Isabel Praça, Ivone Amorim, Maria João Viamonte

Abstract:The integration of Artificial Intelligence (AI) in Network Intrusion Detection Systems (NIDS) is a promising approach to tackle the increasing sophistication of cyberattacks. However, since Machine Learning (ML) and Deep Learning (DL) models rely heavily on the quality of their training data, the lack of diverse and up-to-date datasets hinders their generalization capability to detect malicious activity in previously unseen network traffic. This study presents an experimental validation of the reliability of the GeNIS dataset for AI-based NIDS, to serve as a baseline for future benchmarks. Five feature selection methods, Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, were combined to identify the most relevant features of GeNIS and reduce its dimensionality, enabling a more computationally efficient detection. Three decision tree ensembles and two deep neural networks were trained for both binary and multiclass classification tasks. All models reached high accuracy and F1-scores, and the ML ensembles achieved slightly better generalization while remaining more efficient than DL models. Overall, the obtained results indicate that the GeNIS dataset supports intelligent intrusion detection and cyberattack classification with time-based and quantity-based behavioral features.

* 17 pages, 12 tables, FPS 2025 conference
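Two of the filter-style feature selection scores named in the abstract, Mean Absolute Deviation and Dispersion Ratio, can be sketched as follows. This assumes their common textbook definitions (mean absolute deviation from the feature mean, and the ratio of arithmetic mean to geometric mean for strictly positive values); the abstract does not state the exact formulas or how the five methods were combined, and the feature names and values below are hypothetical.

```python
import math


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the feature mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def dispersion_ratio(values):
    """Arithmetic mean divided by geometric mean.

    Assumes strictly positive values; the ratio is >= 1 and equals 1
    only when the feature is constant.
    """
    arithmetic = sum(values) / len(values)
    geometric = math.exp(sum(math.log(v) for v in values) / len(values))
    return arithmetic / geometric


# Hypothetical feature columns (made-up values, not from GeNIS).
features = {
    "flow_duration": [1.0, 2.0, 4.0, 8.0],
    "packet_count": [5.0, 5.0, 5.0, 5.0],
}
scores = {
    name: (mean_absolute_deviation(col), dispersion_ratio(col))
    for name, col in features.items()
}
for name, (mad, ratio) in scores.items():
    print(f"{name}: MAD={mad:.3f}, dispersion ratio={ratio:.3f}")
```

Under both scores, a higher value suggests a more dispersed and potentially more informative feature, so the constant column ranks lowest here.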

SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards

Sep 30, 2025

João Vitorino, Eva Maia, Isabel Praça, Carlos Soares

Abstract:Due to the susceptibility of Artificial Intelligence (AI) to data perturbations and adversarial examples, it is crucial to perform a thorough robustness evaluation before any Machine Learning (ML) model is deployed. However, examining a model's decision boundaries and identifying potential vulnerabilities typically requires access to the training and testing datasets, which may pose risks to data privacy and confidentiality. To improve transparency in organizations that handle confidential data or manage critical infrastructure, it is essential to allow external verification and validation of AI without the disclosure of private datasets. This paper presents Systematic Pattern Analysis (SPATA), a deterministic method that converts any tabular dataset to a domain-independent representation of its statistical patterns, to provide more detailed and transparent data cards. SPATA computes the projection of each data instance into a discrete space where they can be analyzed and compared, without risking data leakage. These projected datasets can be reliably used for the evaluation of how different features affect ML model robustness and for the generation of interpretable explanations of their behavior, contributing to more trustworthy AI.

* 16 pages, 3 tables, 6 figures, SynDAiTE, ECML PKDD 2025

MeAJOR Corpus: A Multi-Source Dataset for Phishing Email Detection

Jul 23, 2025

Paulo Mendes, Eva Maia, Isabel Praça

Abstract:Phishing emails continue to pose a significant threat to cybersecurity by exploiting human vulnerabilities through deceptive content and malicious payloads. While Machine Learning (ML) models are effective at detecting phishing threats, their performance largely relies on the quality and diversity of the training data. This paper presents MeAJOR (Merged email Assets from Joint Open-source Repositories) Corpus, a novel, multi-source phishing email dataset designed to overcome critical limitations in existing resources. It integrates 135894 samples representing a broad number of phishing tactics and legitimate emails, with a wide spectrum of engineered features. We evaluated the dataset's utility for phishing detection research through systematic experiments with four classification models (RF, XGB, MLP, and CNN) across multiple feature configurations. Results highlight the dataset's effectiveness, achieving 98.34% F1 with XGB. By integrating broad features from multiple categories, our dataset provides a reusable and consistent resource, while addressing common challenges like class imbalance, generalisability and reproducibility.

* 8 pages, 2 tables, WI-IAT 2025 conference

Enhancing Large Language Models with Faster Code Preprocessing for Vulnerability Detection

May 08, 2025

José Gonçalves, Miguel Silva, Eva Maia, Isabel Praça

Abstract:The application of Artificial Intelligence has become a powerful approach to detecting software vulnerabilities. However, effective vulnerability detection relies on accurately capturing the semantic structure of code and its contextual relationships. Given that the same functionality can be implemented in various forms, a preprocessing tool that standardizes code representation is important. This tool must be efficient, adaptable across programming languages, and capable of supporting new transformations. To address this challenge, we build on the existing SCoPE framework and introduce SCoPE2, an enhanced version with improved performance. We compare both versions in terms of processing time and memory usage and evaluate their impact on a Large Language Model (LLM) for vulnerability detection. Our results show a 97.3% reduction in processing time with SCoPE2, along with an improved F1-score for the LLM, solely due to the refined preprocessing approach.

* 10 pages, 3 tables, DCAI'25: Distributed Computing and Artificial Intelligence 2025

Evaluating LLaMA 3.2 for Software Vulnerability Detection

Mar 10, 2025

José Gonçalves, Miguel Silva, Bernardo Cabral, Tiago Dias, Eva Maia, Isabel Praça, Ricardo Severino, Luís Lino Ferreira

Abstract:Deep Learning (DL) has emerged as a powerful tool for vulnerability detection, often outperforming traditional solutions. However, developing effective DL models requires large amounts of real-world data, which can be difficult to obtain in sufficient quantities. To address this challenge, the DiverseVul dataset was curated as the largest dataset of vulnerable and non-vulnerable C/C++ functions extracted exclusively from real-world projects. Its goal is to provide high-quality, large-scale samples for training DL models. However, during our study several inconsistencies were identified in the raw dataset while applying pre-processing techniques, highlighting the need for a refined version. In this work, we present a refined version of the DiverseVul dataset, which is used to fine-tune a large language model, LLaMA 3.2, for vulnerability detection. Experimental results show that the use of pre-processing techniques led to an improvement in performance, with the model achieving an F1-Score of 66%, a competitive result when compared to our baseline, which achieved a 47% F1-Score in software vulnerability detection.

* 14 pages, 4 tables, EICC 2025: European Interdisciplinary Cybersecurity Conference 2025

Flow Exporter Impact on Intelligent Intrusion Detection Systems

Dec 18, 2024

Daniela Pinto, João Vitorino, Eva Maia, Ivone Amorim, Isabel Praça

Abstract:High-quality datasets are critical for training machine learning models, as inconsistencies in feature generation can hinder the accuracy and reliability of threat detection. For this reason, ensuring the quality of the data in network intrusion detection datasets is important. A key component of this is using reliable tools to generate the flows and features present in the datasets. This paper investigates the impact of flow exporters on the performance and reliability of machine learning models for intrusion detection. Using HERA, a tool designed to export flows and extract features, the raw network packets of two widely used datasets, UNSW-NB15 and CIC-IDS2017, were processed from PCAP files to generate new versions of these datasets. These were compared to the original ones in terms of their influence on the performance of several models, including Random Forest, XGBoost, LightGBM, and Explainable Boosting Machine. The results obtained were significant. Models trained on the HERA version of the datasets consistently outperformed those trained on the original dataset, showing improvements in accuracy and indicating a better generalisation. This highlighted the importance of flow generation in the model's ability to differentiate between benign and malicious traffic.

* 9 pages, 10 tables, ICISSP 2025 conference

Intelligent Green Efficiency for Intrusion Detection

Nov 11, 2024

Pedro Pereira, Paulo Mendes, João Vitorino, Eva Maia, Isabel Praça

Abstract:Artificial Intelligence (AI) has recently grown in popularity, driving great progress in various industries. However, the environmental impact of AI is a growing concern, in terms of the energy consumption and carbon footprint of Machine Learning (ML) and Deep Learning (DL) models, making it essential to investigate Green AI, an attempt to reduce the climate impact of AI systems. This paper presents an assessment of different programming languages and Feature Selection (FS) methods to improve the computational performance of AI, focusing on Network Intrusion Detection (NID) and cyber-attack classification tasks. Experiments were conducted using five ML models (Random Forest, XGBoost, LightGBM, Multi-Layer Perceptron, and Long Short-Term Memory) implemented in four programming languages (Python, Java, R, and Rust), along with three FS methods (Information Gain, Recursive Feature Elimination, and Chi-Square). The obtained results demonstrated that FS plays an important role in enhancing the computational efficiency of AI models without compromising detection accuracy, and highlighted languages like Python and R, which benefit from a rich ecosystem of AI libraries. These conclusions can be useful to design efficient and sustainable AI systems that still provide good generalization and reliable detection.

* 16 pages, 9 tables, FPS 2024 conference
