
Ryan Barron

Interactive Distillation of Large Single-Topic Corpora of Scientific Papers

Sep 19, 2023
Nicholas Solovyev, Ryan Barron, Manish Bhattarai, Maksim E. Eren, Kim O. Rasmussen, Boian S. Alexandrov

Highly specific datasets of scientific literature are important for both research and education. However, it is difficult to build such datasets at scale. A common approach is to build these datasets reductively, by applying topic modeling to an established corpus and selecting specific topics. A more robust but time-consuming approach is to build the dataset constructively, with a subject matter expert (SME) handpicking documents. This method does not scale and becomes error-prone as the dataset grows. Here we showcase a new tool, based on machine learning, for constructively generating targeted datasets of scientific literature. Given a small initial "core" corpus of papers, we build a citation network of documents. At each step of the citation network, we generate text embeddings and visualize them through dimensionality reduction. Papers are kept in the dataset if they are "similar" to the core; otherwise they are pruned through human-in-the-loop selection. Additional insight into the papers is gained through sub-topic modeling using SeNMFk. We demonstrate our new tool for literature review by applying it to two different fields in machine learning.
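
The similarity step described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool: the function names, the use of the core centroid as the reference point, the cosine-similarity measure, and the 0.8 threshold are all assumptions made for illustration. Candidates that fall below the threshold are not discarded automatically; they are set aside for the human reviewer.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def split_candidates(core_embeddings, candidate_embeddings, threshold=0.8):
    """Split candidate papers by similarity to the core corpus.

    core_embeddings: list of embedding vectors for the core papers.
    candidate_embeddings: dict mapping a (hypothetical) paper id to its embedding.
    threshold: assumed similarity cut-off, not a value from the paper.

    Returns (keep, review): ids similar to the core, and ids left for
    human-in-the-loop pruning.
    """
    center = centroid(core_embeddings)
    keep, review = [], []
    for paper_id, vector in candidate_embeddings.items():
        if cosine(vector, center) >= threshold:
            keep.append(paper_id)
        else:
            review.append(paper_id)
    return keep, review


# Toy 2-D embeddings; real text embeddings would have hundreds of dimensions.
core = [[1.0, 0.0], [0.9, 0.1]]
candidates = {"paper_a": [1.0, 0.05], "paper_b": [0.0, 1.0]}
keep, review = split_candidates(core, candidates)
```

In this toy run `paper_a` points in nearly the same direction as the core and is kept, while `paper_b` is orthogonal to it and is routed to manual review.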

* Accepted at 2023 IEEE ICMLA conference 

Robust Adversarial Defense by Tensor Factorization

Sep 03, 2023
Manish Bhattarai, Mehmet Cagri Kaymak, Ryan Barron, Ben Nebgen, Kim Rasmussen, Boian Alexandrov

As machine learning techniques become increasingly prevalent in data analysis, the threat of adversarial attacks has surged, necessitating robust defense mechanisms. Among these defenses, methods exploiting low-rank approximations for input data preprocessing and neural network (NN) parameter factorization have shown potential. Our work advances this field by integrating the tensorization of input data with low-rank decomposition and tensorization of NN parameters to enhance adversarial defense. The proposed approach demonstrates significant defense capabilities, maintaining robust accuracy even when subjected to the strongest known auto-attacks. Evaluations against leading-edge robust performance benchmarks reveal that our results not only hold their ground against the best defensive methods available but also exceed all current defense strategies that rely on tensor factorizations. This study underscores the potential of integrating tensorization and low-rank decomposition as a robust defense against adversarial attacks in machine learning.
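
The general idea of low-rank input preprocessing mentioned in the abstract can be sketched as follows. This is a deliberately simplified stand-in: it uses a plain truncated matrix SVD on a 2-D array, not the tensorization and tensor factorization the paper proposes, and the function name and rank value are assumptions for illustration only.

```python
import numpy as np


def low_rank_project(image, rank):
    """Return the best rank-`rank` approximation of a 2-D array.

    Uses a truncated SVD, which is optimal in the Frobenius norm.
    Small perturbations outside the dominant subspace are attenuated.
    """
    matrix = np.asarray(image, dtype=float)
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Toy input: a rank-1 "image" plus a small perturbation.
rng = np.random.default_rng(0)
clean = np.outer(np.arange(1.0, 9.0), np.arange(1.0, 7.0))
perturbed = clean + 0.01 * rng.standard_normal(clean.shape)

# Preprocess before handing the input to a classifier (classifier not shown).
projected = low_rank_project(perturbed, rank=1)
```

The projected array has the same shape as the input, so it can be passed to a downstream model unchanged; the choice of rank trades fidelity to the clean signal against how much of the perturbation is removed.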

* Accepted at 2023 ICMLA Conference 