Kim Rasmussen

MalwareDNA: Simultaneous Classification of Malware, Malware Families, and Novel Malware

Sep 04, 2023
Maksim E. Eren, Manish Bhattarai, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas

Malware is one of the most dangerous and costly cyber threats to national security and a crucial factor in modern cyber-space. However, the adoption of machine learning (ML) based solutions against malware threats has been relatively slow. Shortcomings in the existing ML approaches are likely contributing to this problem. The majority of current ML approaches ignore real-world challenges such as the detection of novel malware. In addition, proposed ML approaches are often designed either for malware/benign-ware classification or malware family classification. Here we introduce and showcase preliminary capabilities of a new method that can perform precise identification of novel malware families, while also unifying the capability for malware/benign-ware classification and malware family classification into a single framework.

* Accepted at IEEE ISI 2023 

Robust Adversarial Defense by Tensor Factorization

Sep 03, 2023
Manish Bhattarai, Mehmet Cagri Kaymak, Ryan Barron, Ben Nebgen, Kim Rasmussen, Boian Alexandrov

As machine learning techniques become increasingly prevalent in data analysis, the threat of adversarial attacks has surged, necessitating robust defense mechanisms. Among these defenses, methods exploiting low-rank approximations for input data preprocessing and neural network (NN) parameter factorization have shown potential. Our work advances this field further by integrating the tensorization of input data with low-rank decomposition and tensorization of NN parameters to enhance adversarial defense. The proposed approach demonstrates significant defense capabilities, maintaining robust accuracy even when subjected to the strongest known auto-attacks. Evaluations against leading-edge robust performance benchmarks reveal that our results not only hold their ground against the best defensive methods available but also exceed all current defense strategies that rely on tensor factorizations. This study underscores the potential of integrating tensorization and low-rank decomposition as a robust defense against adversarial attacks in machine learning.

* Accepted at 2023 ICMLA Conference 
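
The abstract above mentions low-rank approximation of input data as a preprocessing defense. The following is a minimal sketch of that general idea only, using a truncated SVD on a 2-D array with NumPy. It is not the tensorization method the authors propose; the function name `low_rank_approx`, the synthetic rank-1 input, and the noise level are all assumptions made for illustration.

```python
import numpy as np


def low_rank_approx(image: np.ndarray, rank: int) -> np.ndarray:
    """Return the best rank-`rank` approximation (in Frobenius norm) of a 2-D array."""
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    # Keep only the leading `rank` singular triplets and rebuild the array.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Hypothetical example: a rank-1 "clean" input plus a small dense perturbation.
rng = np.random.default_rng(0)
clean = np.outer(np.linspace(0, 1, 32), np.linspace(1, 2, 32))
perturbation = 0.01 * rng.standard_normal(clean.shape)

# Projecting the perturbed input back to rank 1 discards most of the perturbation.
denoised = low_rank_approx(clean + perturbation, rank=1)
```

In this toy setting the projected input is closer to the clean one than the perturbed input is, which is the intuition behind using low-rank structure as a filter before classification.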

Process Modeling, Hidden Markov Models, and Non-negative Tensor Factorization with Model Selection

Oct 03, 2022
Erik Skau, Andrew Hollis, Stephan Eidenbenz, Kim Rasmussen, Boian Alexandrov

Monitoring of industrial processes is a critical capability in industry and in government to ensure reliability of production cycles, quick emergency response, and national security. Process monitoring allows users to gauge the involvement of an organization in an industrial process or predict the degradation or aging of machine parts in processes taking place at a remote location. Similar to many data science applications, we usually only have access to limited raw data, such as satellite imagery, short video clips, some event logs, and signatures captured by a small set of sensors. To combat data scarcity, we leverage the knowledge of subject matter experts (SMEs) who are familiar with the process. Various process mining techniques have been developed for this type of analysis; typically such approaches combine theoretical process models built from domain expert insights with ad-hoc integration of available pieces of raw data. Here, we introduce a novel mathematically sound method that integrates theoretical process models (as proposed by SMEs) with interrelated minimal Hidden Markov Models (HMMs), built via non-negative tensor factorization and discrete model simulations. Our method consolidates: (a) theoretical process model development, (b) discrete model simulations, (c) HMMs, (d) joint Non-negative Matrix Factorization (NMF) and Non-negative Tensor Factorization (NTF), and (e) custom model selection. To demonstrate our methodology and its abilities, we apply it to simple synthetic and real-world process models.

* 17 pages, 8 figures 

SeNMFk-SPLIT: Large Corpora Topic Modeling by Semantic Non-negative Matrix Factorization with Automatic Model Selection

Aug 21, 2022
Maksim E. Eren, Nick Solovyev, Manish Bhattarai, Kim Rasmussen, Charles Nicholas, Boian S. Alexandrov

As the amount of text data continues to grow, topic modeling is serving an important role in understanding the content hidden by the overwhelming quantity of documents. One popular topic modeling approach is non-negative matrix factorization (NMF), an unsupervised machine learning (ML) method. Recently, Semantic NMF with automatic model selection (SeNMFk) has been proposed as a modification to NMF. In addition to heuristically estimating the number of topics, SeNMFk also incorporates the semantic structure of the text. This is performed by jointly factorizing the term frequency-inverse document frequency (TF-IDF) matrix with the co-occurrence/word-context matrix, the values of which represent the number of times two words co-occur in a predetermined window of the text. In this paper, we introduce a novel distributed method, SeNMFk-SPLIT, for semantic topic extraction suitable for large corpora. Contrary to SeNMFk, our method enables the joint factorization of large documents by decomposing the word-context and term-document matrices separately. We demonstrate the capability of SeNMFk-SPLIT by applying it to the entire artificial intelligence (AI) and ML scientific literature uploaded on arXiv.

* Accepted at ACM Symposium on Document Engineering 2022 (DocEng 22)
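
The abstract above describes a co-occurrence/word-context matrix whose values count how often two words co-occur within a predetermined window of the text. The following is a minimal sketch of that counting step only, not the SeNMFk-SPLIT implementation. The function name `cooccurrence_counts`, the forward-looking window of 2 tokens, the unordered pairing, and the choice to skip a word paired with itself are assumptions made for illustration.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look only forward so each pair of positions is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:  # assumption: ignore a word co-occurring with itself
                counts[tuple(sorted((word, other)))] += 1
    return counts


# Hypothetical toy document.
tokens = "topic models find topic structure in text".split()
pairs = cooccurrence_counts(tokens, window=2)
```

Here `pairs[("models", "topic")]` is 2, because "models" sits within two tokens of both occurrences of "topic". The counts can be arranged into a symmetric word-by-word matrix, which is the kind of input the abstract says is factorized alongside the TF-IDF matrix.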