Miguel Silva

Binary and Multiclass Cyberattack Classification on GeNIS Dataset

Nov 11, 2025

Miguel Silva, Daniela Pinto, João Vitorino, Eva Maia, Isabel Praça, Ivone Amorim, Maria João Viamonte

Abstract: The integration of Artificial Intelligence (AI) in Network Intrusion Detection Systems (NIDS) is a promising approach to tackle the increasing sophistication of cyberattacks. However, since Machine Learning (ML) and Deep Learning (DL) models rely heavily on the quality of their training data, the lack of diverse and up-to-date datasets hinders their ability to generalize and detect malicious activity in previously unseen network traffic. This study presents an experimental validation of the reliability of the GeNIS dataset for AI-based NIDS, to serve as a baseline for future benchmarks. Five feature selection methods (Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio) were combined to identify the most relevant features of GeNIS and reduce its dimensionality, enabling more computationally efficient detection. Three decision tree ensembles and two deep neural networks were trained for both binary and multiclass classification tasks. All models reached high accuracy and F1-scores, and the ML ensembles achieved slightly better generalization while remaining more efficient than the DL models. Overall, the results indicate that the GeNIS dataset supports intelligent intrusion detection and cyberattack classification with time-based and quantity-based behavioral features.

* 17 pages, 12 tables, FPS 2025 conference
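The idea of combining several feature selection methods can be illustrated with a small voting sketch. It assumes each of the five methods has already produced its own list of selected features and keeps only the features chosen by at least `min_votes` methods. The voting threshold, the function name `consensus_features`, and the feature names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def consensus_features(selections: dict[str, list[str]], min_votes: int) -> list[str]:
    """Keep the features selected by at least `min_votes` methods.

    `selections` maps a method name to the features that method selected.
    The simple vote-count rule is an illustrative assumption.
    """
    # Count each feature once per method, even if a method lists it twice.
    votes = Counter(f for feats in selections.values() for f in set(feats))
    # Order by vote count (descending), then by name for a deterministic result.
    return sorted(
        (f for f, v in votes.items() if v >= min_votes),
        key=lambda f: (-votes[f], f),
    )


# Hypothetical per-method selections on made-up flow features.
selections = {
    "information_gain": ["flow_duration", "pkt_count", "byte_rate"],
    "chi_squared": ["pkt_count", "byte_rate", "src_port"],
    "rfe": ["flow_duration", "pkt_count"],
    "mad": ["flow_duration", "byte_rate"],
    "dispersion_ratio": ["pkt_count", "flow_duration"],
}
print(consensus_features(selections, min_votes=3))
# ['flow_duration', 'pkt_count', 'byte_rate']
```

Raising `min_votes` yields a smaller, stricter feature set; lowering it trades dimensionality reduction for coverage.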

AI-powered Contextual 3D Environment Generation: A Systematic Review

Jun 05, 2025

Miguel Silva, Alexandre Valle de Carvalho

Abstract: The generation of high-quality 3D environments is crucial for industries such as gaming, virtual reality, and cinema, yet remains resource-intensive due to its reliance on manual processes. This study performs a systematic review of existing generative AI techniques for 3D scene generation, analyzing their characteristics, strengths, limitations, and potential for improvement. By examining state-of-the-art approaches, it identifies key challenges such as scene authenticity and the influence of textual inputs. Special attention is given to how AI can blend different stylistic domains while maintaining coherence, the impact of training data on output quality, and the limitations of current models. In addition, this review surveys existing evaluation metrics for assessing realism and explores how industry professionals incorporate AI into their workflows. The findings aim to provide a comprehensive understanding of the current landscape and serve as a foundation for future research on AI-driven 3D content generation. Key findings include that advanced generative architectures enable high-quality 3D content creation at a high computational cost; that effective multi-modal integration techniques, such as cross-attention and latent space alignment, facilitate text-to-3D tasks; and that the quality and diversity of training data, combined with comprehensive evaluation metrics, are critical to achieving scalable and robust 3D scene generation.
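Cross-attention, one of the multi-modal integration techniques named in the findings, lets tokens of a generated 3D representation attend to the embeddings of a text prompt. The NumPy sketch below shows single-head scaled dot-product cross-attention with random projection matrices; the token counts, dimensions, and the function name `cross_attention` are illustrative assumptions, not details of any reviewed model.

```python
import numpy as np


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n, d_model) tokens being generated (e.g. 3D scene latents)
    context: (m, d_model) conditioning tokens (e.g. text embeddings)
    Returns the attended output (n, d_v) and attention weights (n, m).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over text tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model, d_k = 16, 8
scene = rng.normal(size=(4, d_model))  # 4 hypothetical scene tokens
text = rng.normal(size=(6, d_model))   # 6 hypothetical text tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = cross_attention(scene, text, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (4, 8) (4, 6)
```

Each scene token receives a weighted mix of text-token values, which is how textual conditioning is injected into the generator in this style of architecture.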

Enhancing Large Language Models with Faster Code Preprocessing for Vulnerability Detection

May 08, 2025

José Gonçalves, Miguel Silva, Eva Maia, Isabel Praça

Abstract: The application of Artificial Intelligence has become a powerful approach to detecting software vulnerabilities. However, effective vulnerability detection relies on accurately capturing the semantic structure of code and its contextual relationships. Given that the same functionality can be implemented in various forms, a preprocessing tool that standardizes code representation is important. This tool must be efficient, adaptable across programming languages, and capable of supporting new transformations. To address this challenge, we build on the existing SCoPE framework and introduce SCoPE2, an enhanced version with improved performance. We compare both versions in terms of processing time and memory usage and evaluate their impact on a Large Language Model (LLM) for vulnerability detection. Our results show a 97.3% reduction in processing time with SCoPE2, along with an improved F1-score for the LLM, solely due to the refined preprocessing approach.

* 10 pages, 3 tables, DCAI'25: Distributed Computing and Artificial Intelligence 2025
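As a rough illustration of what standardizing code representation means, the sketch below maps two formatting variants of the same C function to one canonical string by removing comments and collapsing whitespace. It is a hypothetical, simplified step: the function `normalize_c_code` is not part of SCoPE or SCoPE2, and it does not handle comment markers that appear inside string literals.

```python
import re


def normalize_c_code(src: str) -> str:
    """Strip C/C++ comments and collapse whitespace (simplified sketch)."""
    src = re.sub(r"/\*.*?\*/", " ", src, flags=re.DOTALL)  # block comments
    src = re.sub(r"//[^\n]*", " ", src)                     # line comments
    return re.sub(r"\s+", " ", src).strip()                 # whitespace runs


variant_a = "int add(int a, int b) {\n  // sum\n  return a + b;\n}"
variant_b = "int add(int a, int b)\n{ /* sum */ return a + b; }"
print(normalize_c_code(variant_a))  # int add(int a, int b) { return a + b; }
print(normalize_c_code(variant_a) == normalize_c_code(variant_b))  # True
```

A production tool would use a real parser so that transformations stay correct across languages and edge cases, which regular expressions cannot guarantee.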

Evaluating LLaMA 3.2 for Software Vulnerability Detection

Mar 10, 2025

José Gonçalves, Miguel Silva, Bernardo Cabral, Tiago Dias, Eva Maia, Isabel Praça, Ricardo Severino, Luís Lino Ferreira

Abstract: Deep Learning (DL) has emerged as a powerful tool for vulnerability detection, often outperforming traditional solutions. However, developing effective DL models requires large amounts of real-world data, which can be difficult to obtain in sufficient quantities. To address this challenge, the DiverseVul dataset was curated as the largest dataset of vulnerable and non-vulnerable C/C++ functions extracted exclusively from real-world projects, with the goal of providing high-quality, large-scale samples for training DL models. However, during our study, several inconsistencies were identified in the raw dataset while applying pre-processing techniques, highlighting the need for a refined version. In this work, we present a refined version of the DiverseVul dataset, which is used to fine-tune a large language model, LLaMA 3.2, for vulnerability detection. Experimental results show that the pre-processing techniques improved performance: the model achieved an F1-score of 66%, a competitive result compared with our baseline, which achieved an F1-score of 47% in software vulnerability detection.

* 14 pages, 4 tables, EICC 2025: European Interdisciplinary Cybersecurity Conference 2025
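The F1-scores reported above are the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN) for the vulnerable class. A minimal helper follows; the confusion counts in the example are made up for illustration and are not the paper's results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 60 vulnerable functions found, 30 false alarms, 30 missed.
print(round(f1_score(60, 30, 30), 3))  # 0.667
```

Because F1 ignores true negatives, it is a sensible headline metric when vulnerable functions are a small minority of the dataset.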

Efficient Network Traffic Feature Sets for IoT Intrusion Detection

Jun 12, 2024

Miguel Silva, João Vitorino, Eva Maia, Isabel Praça

Abstract: The use of Machine Learning (ML) models in cybersecurity solutions requires high-quality data that is free of redundant, missing, and noisy information. By selecting the most relevant features, data integrity and model efficiency can be significantly improved. This work evaluates the feature sets provided by a combination of different feature selection methods, namely Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, on multiple IoT network datasets. The influence of the smaller feature sets on both the classification performance and the training time of ML models is compared, with the aim of increasing the computational efficiency of IoT intrusion detection. Overall, the most impactful features of each dataset were identified, and the ML models obtained higher computational efficiency while preserving good generalization, showing little to no difference between the sets.

* 10 pages, 9 tables, DCAI 2024 conference
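Mean Absolute Deviation and Dispersion Ratio are unsupervised filters that score each feature from its values alone, with higher scores indicating more variable and potentially more informative features. A NumPy sketch of both scores and a top-k selection follows. The +1 shift in the Dispersion Ratio (so the geometric mean is defined for zero-valued features), the helper names, and the toy data are assumptions, not the paper's exact configuration.

```python
import numpy as np


def mean_absolute_deviation(X: np.ndarray) -> np.ndarray:
    """Per-feature MAD: mean of |x - mean(x)| for each column."""
    return np.mean(np.abs(X - X.mean(axis=0)), axis=0)


def dispersion_ratio(X: np.ndarray) -> np.ndarray:
    """Per-feature arithmetic mean / geometric mean (1.0 for a constant column).

    Assumes non-negative features; values are shifted by +1 so the
    geometric mean is defined when a column contains zeros.
    """
    shifted = X + 1.0
    arith = shifted.mean(axis=0)
    geom = np.exp(np.log(shifted).mean(axis=0))
    return arith / geom


def top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(scores)[::-1][:k]


# Toy data: a constant column, a widely spread column, a narrowly spread column.
X = np.array([[0.0, 5.0, 1.0],
              [0.0, 15.0, 3.0],
              [0.0, 10.0, 2.0]])
print(top_k(mean_absolute_deviation(X), 2).tolist())  # [1, 2]
print(top_k(dispersion_ratio(X), 2).tolist())         # [1, 2]
```

Both filters discard the constant first column, which carries no information for classification.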

Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection

Apr 05, 2024

João Vitorino, Miguel Silva, Eva Maia, Isabel Praça

Abstract: Growing cybersecurity threats make it essential to train Machine Learning (ML) models for network traffic analysis on high-quality data, without noisy or missing values. By selecting the most relevant features for cyber-attack detection, it is possible to improve both the robustness and the computational efficiency of the models used in a cybersecurity system. This work presents a feature selection and consensus process that combines multiple methods and applies them to several network datasets. Two different feature sets were selected and used to train multiple ML models with regular and adversarial training. Finally, an adversarial evasion robustness benchmark was performed to analyze the reliability of the different feature sets and their impact on the susceptibility of the models to adversarial examples. By using an improved dataset with more data diversity, selecting the best time-related features and a more specific feature set, and performing adversarial training, the ML models achieved better adversarially robust generalization. The robustness of the models was significantly improved without affecting their generalization to regular traffic flows, without increasing false alarms, and without requiring excessive computational resources, which enables reliable detection of suspicious activity and perturbed traffic flows in enterprise computer networks.

* 24 pages, 17 tables, Annals of Telecommunications journal. arXiv admin note: substantial text overlap with arXiv:2402.16912
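Adversarial training, in its simplest data-augmentation form, extends the training set with perturbed copies of attack samples that keep their true labels, so the model learns to flag them anyway. In this sketch, `adversarial_augment` and the `perturb` callable are hypothetical; the random noise used in the example is only a placeholder for a real, constrained evasion attack.

```python
import numpy as np


def adversarial_augment(X, y, perturb, attack_label=1, rng=None):
    """Return (X, y) extended with perturbed copies of the attack samples.

    The adversarial copies keep their original (attack) label.
    """
    rng = np.random.default_rng() if rng is None else rng
    is_attack = y == attack_label
    X_adv = perturb(X[is_attack], rng)  # hypothetical attack function
    return np.vstack([X, X_adv]), np.concatenate([y, y[is_attack]])


def noise_perturb(X, rng, scale=0.05):
    """Placeholder perturbation: Gaussian noise, clipped to stay non-negative."""
    return np.clip(X + rng.normal(scale=scale, size=X.shape), 0.0, None)


X = np.array([[0.2, 3.0], [0.9, 40.0], [0.8, 35.0], [0.1, 2.0]])
y = np.array([0, 1, 1, 0])  # 0 = benign, 1 = attack
X_aug, y_aug = adversarial_augment(X, y, noise_perturb, rng=np.random.default_rng(42))
print(X_aug.shape, y_aug.tolist())  # (6, 2) [0, 1, 1, 0, 1, 1]
```

The augmented arrays can then be passed to any classifier's training routine in place of the original data.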

An Adversarial Robustness Benchmark for Enterprise Network Intrusion Detection

Feb 25, 2024

João Vitorino, Miguel Silva, Eva Maia, Isabel Praça

Abstract: As cyber-attacks become more sophisticated, improving the robustness of Machine Learning (ML) models must be a priority for enterprises of all sizes. To reliably compare the robustness of different ML models for cyber-attack detection in enterprise computer networks, they must be evaluated under standardized conditions. This work presents a methodical adversarial robustness benchmark of multiple decision tree ensembles with constrained adversarial examples generated from standard datasets. The robustness of regularly and adversarially trained RF, XGB, LGBM, and EBM models was evaluated on the original CICIDS2017 dataset, a corrected version of it designated NewCICIDS, and the HIKARI dataset, which contains more recent network traffic. NewCICIDS led to models with better performance, especially XGB and EBM, but RF and LGBM were less robust against the more recent cyber-attacks of HIKARI. Overall, the robustness of the models to adversarial cyber-attack examples was improved without affecting their generalization to regular traffic, enabling reliable detection of suspicious activity without costly increases in false alarms.

* 15 pages, 8 tables, 2 figures, FPS 2023 conference
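Constrained adversarial examples must remain valid network flows. One common way to enforce this, shown here as a hypothetical sketch, is to project each perturbed sample back into the feature domain: immutable features are restored, values are clipped to per-feature bounds, and count-like features are rounded. The function `project_to_constraints`, the constraint types, and the feature names are illustrative assumptions, not the benchmark's actual constraint set.

```python
import numpy as np


def project_to_constraints(x_adv, x_orig, lower, upper, integer_mask, mutable_mask):
    """Map a perturbed flow back into a valid feature space."""
    x = np.where(mutable_mask, x_adv, x_orig)  # immutable features keep original values
    x = np.clip(x, lower, upper)               # stay within per-feature valid ranges
    x = np.where(integer_mask, np.round(x), x)  # count-like features stay whole numbers
    return x


# Hypothetical features: [duration_s, pkt_count, dst_port]
x_orig = np.array([1.2, 10.0, 443.0])
x_adv = np.array([-0.3, 12.7, 80.0])
lower = np.array([0.0, 1.0, 0.0])
upper = np.array([3600.0, 1e6, 65535.0])
integer_mask = np.array([False, True, True])
mutable_mask = np.array([True, True, False])  # destination port must not change
print(project_to_constraints(x_adv, x_orig, lower, upper, integer_mask, mutable_mask).tolist())
# [0.0, 13.0, 443.0]
```

Applying such a projection after every attack step keeps the generated examples realistic, so measured robustness reflects attacks that could actually occur on the network.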
