Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ionut Cardei

Minerva: Reinforcement Learning with Verifiable Rewards for Cyber Threat Intelligence LLMs

Jan 31, 2026

Md Tanvirul Alam, Aritran Piplai, Ionut Cardei, Nidhi Rastogi, Peter J Worth

Abstract:Cyber threat intelligence (CTI) analysts routinely convert noisy, unstructured security artifacts into standardized, automation-ready representations. Although large language models (LLMs) show promise for this task, existing approaches remain brittle when producing structured CTI outputs and have largely relied on supervised fine-tuning (SFT). In contrast, CTI standards and community-maintained resources define canonical identifiers and schemas that enable deterministic verification of model outputs. We leverage this structure to study reinforcement learning with verifiable rewards (RLVR) for CTI tasks. We introduce \textit{Minerva}, a unified dataset and training pipeline spanning multiple CTI subtasks, each paired with task-specific verifiers that score structured outputs and identifier predictions. To address reward sparsity during rollout, we propose a lightweight self-training mechanism that generates additional verified trajectories and distills them back into the model. Experiments across LLM backbones show consistent improvements in accuracy and robustness over SFT across multiple benchmarks.

Via

Access Paper or Ask Questions

Predicting Student Success with Heterogeneous Graph Deep Learning and Machine Learning Models

Jan 11, 2026

Anca Muresan, Mihaela Cardei, Ionut Cardei

Abstract:Early identification of student success is crucial for enabling timely interventions, reducing dropout rates, and promoting on time graduation. In educational settings, AI powered systems have become essential for predicting student performance due to their advanced analytical capabilities. However, effectively leveraging diverse student data to uncover latent and complex patterns remains a key challenge. While prior studies have explored this area, the potential of dynamic data features and multi category entities has been largely overlooked. To address this gap, we propose a framework that integrates heterogeneous graph deep learning models to enhance early and continuous student performance prediction, using traditional machine learning algorithms for comparison. Our approach employs a graph metapath structure and incorporates dynamic assessment features, which progressively influence the student success prediction task. Experiments on the Open University Learning Analytics (OULA) dataset demonstrate promising results, achieving a 68.6% validation F1 score with only 7% of the semester completed, and reaching up to 89.5% near the semester's end. Our approach outperforms top machine learning models by 4.7% in validation F1 score during the critical early 7% of the semester, underscoring the value of dynamic features and heterogeneous graph representations in student success prediction.

* Proceedings of the 18th International Conference on Educational Data Mining, 2025

Via

Access Paper or Ask Questions

Bloom Filter Encoding for Machine Learning

Dec 23, 2025

John Cartmell, Mihaela Cardei, Ionut Cardei

Abstract:We present a method that uses the Bloom filter transform to preprocess data for machine learning. Each sample is encoded into a compact, privacy-preserving bit array. This reduces memory use and protects the original data while keeping enough structure for accurate classification. We test the method on six datasets: SMS Spam Collection, ECG200, Adult 50K, CDC Diabetes, MNIST, and Fashion MNIST. Four classifiers are used: Extreme Gradient Boosting, Deep Neural Networks, Convolutional Neural Networks, and Logistic Regression. Results show that models trained on Bloom filter encodings achieve accuracy similar to models trained on raw data or other transforms. At the same time, the method provides memory savings while enhancing privacy. These results suggest that the Bloom filter transform is an efficient preprocessing approach for diverse machine learning tasks.

* 14 pages, 7 figures

Via

Access Paper or Ask Questions