



Cyber attacks are often identified using system and network logs. There have been significant prior works that utilize provenance graphs and ML techniques to detect attacks, specifically advanced persistent threats, which are very difficult to detect. Lately, there have been studies where transformer-based language models are being used to detect various types of attacks from system logs. However, no such attempts have been made in the case of APTs. In addition, existing state-of-the-art techniques that use system provenance graphs, lack a data processing framework generalized across datasets for optimal performance. For mitigating this limitation as well as exploring the effectiveness of transformer-based language models, this paper proposes LogShield, a framework designed to detect APT attack patterns leveraging the power of self-attention in transformers. We incorporate customized embedding layers to effectively capture the context of event sequences derived from provenance graphs. While acknowledging the computational overhead associated with training transformer networks, our framework surpasses existing LSTM and Language models regarding APT detection. We integrated the model parameters and training procedure from the RoBERTa model and conducted extensive experiments on well-known APT datasets (DARPA OpTC and DARPA TC E3). Our framework achieved superior F1 scores of 98% and 95% on the two datasets respectively, surpassing the F1 scores of 96% and 94% obtained by LSTM models. Our findings suggest that LogShield's performance benefits from larger datasets and demonstrates its potential for generalization across diverse domains. These findings contribute to the advancement of APT attack detection methods and underscore the significance of transformer-based architectures in addressing security challenges in computer systems.




We present ANUBIS, a highly effective machine learning-based APT detection system. Our design philosophy for ANUBIS involves two principal components. Firstly, we intend ANUBIS to be effectively utilized by cyber-response teams. Therefore, prediction explainability is one of the main focuses of ANUBIS design. Secondly, ANUBIS uses system provenance graphs to capture causality and thereby achieve high detection performance. At the core of the predictive capability of ANUBIS, there is a Bayesian Neural Network that can tell how confident it is in its predictions. We evaluate ANUBIS against a recent APT dataset (DARPA OpTC) and show that ANUBIS can detect malicious activity akin to APT campaigns with high accuracy. Moreover, ANUBIS learns about high-level patterns that allow it to explain its predictions to threat analysts. The high predictive performance with explainable attack story reconstruction makes ANUBIS an effective tool to use for enterprise cyber defense.




Rule-based IDS (intrusion detection systems) are being replaced by more robust neural IDS, which demonstrate great potential in the field of Cybersecurity. However, these ML approaches continue to rely on ad-hoc feature engineering techniques, which lack the capacity to vectorize inputs in ways that are fully relevant to the discovery of anomalous cyber activity. We propose a deep end-to-end framework with NLP-inspired components for identifying potentially malicious behaviors on enterprise computer networks. We also demonstrate the efficacy of this technique on the recently released DARPA OpTC data set.




APT, known as Advanced Persistent Threat, is a difficult challenge for cyber defence. These threats make many traditional defences ineffective as the vulnerabilities exploited by these threats are insiders who have access to and are within the network. This paper proposes DeepTaskAPT, a heterogeneous task-tree based deep learning method to construct a baseline model based on sequences of tasks using a Long Short-Term Memory (LSTM) neural network that can be applied across different users to identify anomalous behaviour. Rather than applying the model to sequential log entries directly, as most current approaches do, DeepTaskAPT applies a process tree based task generation method to generate sequential log entries for the deep learning model. To assess the performance of DeepTaskAPT, we use a recently released synthetic dataset, DARPA Operationally Transparent Computing (OpTC) dataset and a real-world dataset, Los Alamos National Laboratory (LANL) dataset. Both of them are composed of host-based data collected from sensors. Our results show that DeepTaskAPT outperforms similar approaches e.g. DeepLog and the DeepTaskAPT baseline model demonstrate its capability to detect malicious traces in various attack scenarios while having high accuracy and low false-positive rates. To the best of knowledge this is the very first attempt of using recently introduced OpTC dataset for cyber threat detection.