Farnaz Soltaniani

Evaluating Large Language Models for Security Bug Report Prediction

Jan 30, 2026

Farnaz Soltaniani, Shoaib Razzaq, Mohammad Ghafari

Abstract: Early detection of security bug reports (SBRs) is critical for timely vulnerability mitigation. We present an evaluation of prompt-based engineering and fine-tuning approaches for predicting SBRs using Large Language Models (LLMs). Our findings reveal a distinct trade-off between the two approaches. Prompted proprietary models demonstrate the highest sensitivity to SBRs, achieving a G-measure of 77% and a recall of 74% on average across all the datasets, albeit at the cost of a higher false-positive rate, resulting in an average precision of only 22%. Fine-tuned models, by contrast, exhibit the opposite behavior, attaining a lower overall G-measure of 51% but substantially higher precision of 75% at the cost of reduced recall of 36%. Though a one-time investment in building fine-tuned models is necessary, the inference on the largest dataset is up to 50 times faster than that of proprietary models. These findings suggest that further investigations to harness the power of LLMs for SBR prediction are necessary.
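The abstract reports G-measure, recall, and precision without defining them. The sketch below assumes the definition commonly used in the SBR prediction literature, where G-measure is the harmonic mean of recall and (1 - false-positive rate); the paper itself may define it differently. The function names and the confusion-matrix counts are hypothetical and serve only to illustrate how a high G-measure can coexist with low precision on an imbalanced dataset.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of true SBRs that the model flags."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """Fraction of flagged reports that really are SBRs."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of non-security reports wrongly flagged as SBRs."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def g_measure(tp: int, fp: int, tn: int, fn: int) -> float:
    """Harmonic mean of recall and (1 - false-positive rate).

    Assumed definition; not taken from the abstract above.
    """
    r = recall(tp, fn)
    specificity = 1.0 - false_positive_rate(fp, tn)
    denom = r + specificity
    return 2.0 * r * specificity / denom if denom else 0.0


# Hypothetical counts: 100 SBRs hidden among 10,000 non-security reports.
tp, fn, fp, tn = 74, 26, 262, 9738
print(f"recall    = {recall(tp, fn):.2f}")
print(f"precision = {precision(tp, fp):.2f}")
print(f"G-measure = {g_measure(tp, fp, tn, fn):.2f}")
```

Because non-security reports vastly outnumber SBRs in this made-up example, a small false-positive rate still produces many false alarms, which lowers precision while leaving G-measure high.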

From Data Leak to Secret Misses: The Impact of Data Leakage on Secret Detection Models

Jan 30, 2026

Farnaz Soltaniani, Mohammad Ghafari

Abstract: Machine learning models are increasingly used for software security tasks. These models are commonly trained and evaluated on large Internet-derived datasets, which often contain duplicated or highly similar samples. When such samples are split across training and test sets, data leakage may occur, allowing models to memorize patterns instead of learning to generalize. We investigate duplication in a widely used benchmark dataset of hard-coded secrets and show how data leakage can substantially inflate the reported performance of AI-based secret detectors, resulting in a misleading picture of their real-world effectiveness.
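The kind of leakage described above can be probed with a simple overlap check between training and test samples. The following is a minimal sketch, not the authors' method: it assumes samples are plain strings, treats two samples as duplicates when they match after whitespace normalization, and uses hypothetical function names. It will not catch the "highly similar" samples the abstract also mentions; those would need a fuzzy similarity measure.

```python
import hashlib
import re


def normalize(sample: str) -> str:
    """Collapse runs of whitespace; case is kept because secrets are case-sensitive."""
    return re.sub(r"\s+", " ", sample.strip())


def fingerprint(sample: str) -> str:
    """Stable hash of the normalized sample."""
    return hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()


def leaked_indices(train: list[str], test: list[str]) -> list[int]:
    """Indices of test samples whose fingerprint also appears in the training set."""
    train_hashes = {fingerprint(s) for s in train}
    return [i for i, s in enumerate(test) if fingerprint(s) in train_hashes]


def drop_leaked(train: list[str], test: list[str]) -> list[str]:
    """Return the test set with training duplicates removed."""
    leaked = set(leaked_indices(train, test))
    return [s for i, s in enumerate(test) if i not in leaked]


# Hypothetical samples for illustration only.
train = ["password = 'abc123'", "api_key  =  'XYZ'"]
test = ["api_key = 'XYZ'", "token = 'new'"]
print(leaked_indices(train, test))
print(drop_leaked(train, test))
```

Comparing a model's score on the full test set with its score on the filtered test set gives a rough indication of how much the reported performance depends on memorized duplicates.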

Security Bug Report Prediction Within and Across Projects: A Comparative Study of BERT and Random Forest

Apr 28, 2025

Farnaz Soltaniani, Mohammad Ghafari, Mohammed Sayagh

Abstract: Early detection of security bug reports (SBRs) is crucial for preventing vulnerabilities and ensuring system reliability. While machine learning models have been developed for SBR prediction, their predictive performance still has room for improvement. In this study, we conduct a comprehensive comparison between BERT and Random Forest (RF), a competitive baseline for predicting SBRs. The results show that RF outperforms BERT with a 34% higher average G-measure for within-project predictions. Adding only SBRs from various projects improves both models' average performance. However, including both security and non-security bug reports significantly reduces RF's average performance to 46%, while boosting BERT to its best average performance of 66%, surpassing RF. In cross-project SBR prediction, BERT achieves a remarkable 62% G-measure, which is substantially higher than that of RF.
