Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mohammad Ghafari

Security Bug Report Prediction Within and Across Projects: A Comparative Study of BERT and Random Forest

Apr 28, 2025

Farnaz Soltaniani, Mohammad Ghafari, Mohammed Sayagh

Abstract:Early detection of security bug reports (SBRs) is crucial for preventing vulnerabilities and ensuring system reliability. While machine learning models have been developed for SBR prediction, their predictive performance still has room for improvement. In this study, we conduct a comprehensive comparison between BERT and Random Forest (RF), a competitive baseline for predicting SBRs. The results show that RF outperforms BERT with a 34% higher average G-measure for within-project predictions. Adding only SBRs from various projects improves both models' average performance. However, including both security and nonsecurity bug reports significantly reduces RF's average performance to 46%, while boosts BERT to its best average performance of 66%, surpassing RF. In cross-project SBR prediction, BERT achieves a remarkable 62% G-measure, which is substantially higher than RF.

Via

Access Paper or Ask Questions

Poisoned Source Code Detection in Code Models

Feb 19, 2025

Ehab Ghannoum, Mohammad Ghafari

Abstract:Deep learning models have gained popularity for conducting various tasks involving source code. However, their black-box nature raises concerns about potential risks. One such risk is a poisoning attack, where an attacker intentionally contaminates the training set with malicious samples to mislead the model's predictions in specific scenarios. To protect source code models from poisoning attacks, we introduce CodeGarrison (CG), a hybrid deep-learning model that relies on code embeddings to identify poisoned code samples. We evaluated CG against the state-of-the-art technique ONION for detecting poisoned samples generated by DAMP, MHM, ALERT, as well as a novel poisoning technique named CodeFooler. Results showed that CG significantly outperformed ONION with an accuracy of 93.5%. We also tested CG's robustness against unknown attacks and achieved an average accuracy of 85.6% in identifying poisoned samples across the four attacks mentioned above.

* Accepted for Publication in the Journal of Systems and Software (JSS)

Via

Access Paper or Ask Questions

Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models

Feb 09, 2025

Marc Bruni, Fabio Gabrielli, Mohammad Ghafari, Martin Kropp

Figure 1 for Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models

Figure 2 for Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models

Figure 3 for Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models

Figure 4 for Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models

Abstract:Prompt engineering reduces reasoning mistakes in Large Language Models (LLMs). However, its effectiveness in mitigating vulnerabilities in LLM-generated code remains underexplored. To address this gap, we implemented a benchmark to automatically assess the impact of various prompt engineering strategies on code security. Our benchmark leverages two peer-reviewed prompt datasets and employs static scanners to evaluate code security at scale. We tested multiple prompt engineering techniques on GPT-3.5-turbo, GPT-4o, and GPT-4o-mini. Our results show that for GPT-4o and GPT-4o-mini, a security-focused prompt prefix can reduce the occurrence of security vulnerabilities by up to 56%. Additionally, all tested models demonstrated the ability to detect and repair between 41.9% and 68.7% of vulnerabilities in previously generated code when using iterative prompting techniques. Finally, we introduce a "prompt agent" that demonstrates how the most effective techniques can be applied in real-world development workflows.

* Accepted at the 2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge 2025). 10 pages, 7 figures, 5 tables

Via

Access Paper or Ask Questions

ChatGPT's Potential in Cryptography Misuse Detection: A Comparative Analysis with Static Analysis Tools

Sep 10, 2024

Ehsan Firouzi, Mohammad Ghafari, Mike Ebrahimi

Figure 1 for ChatGPT's Potential in Cryptography Misuse Detection: A Comparative Analysis with Static Analysis Tools

Figure 2 for ChatGPT's Potential in Cryptography Misuse Detection: A Comparative Analysis with Static Analysis Tools

Figure 3 for ChatGPT's Potential in Cryptography Misuse Detection: A Comparative Analysis with Static Analysis Tools

Abstract:The correct adoption of cryptography APIs is challenging for mainstream developers, often resulting in widespread API misuse. Meanwhile, cryptography misuse detectors have demonstrated inconsistent performance and remain largely inaccessible to most developers. We investigated the extent to which ChatGPT can detect cryptography misuses and compared its performance with that of the state-of-the-art static analysis tools. Our investigation, mainly based on the CryptoAPI-Bench benchmark, demonstrated that ChatGPT is effective in identifying cryptography API misuses, and with the use of prompt engineering, it can even outperform leading static cryptography misuse detectors.

* ESEM 2024

Via

Access Paper or Ask Questions