
Deanna Needell

Stratified-NMF for Heterogeneous Data

Nov 17, 2023
James Chapman, Yotam Yaniv, Deanna Needell

Non-negative matrix factorization (NMF) is an important technique for obtaining low-dimensional representations of datasets. However, classical NMF does not take into account data that is collected at different times or in different locations, which may exhibit heterogeneity. We resolve this problem by solving a modified NMF objective, Stratified-NMF, that simultaneously learns strata-dependent statistics and a shared topics matrix. We develop multiplicative update rules for this novel objective and prove convergence of the objective. Then, we experiment on synthetic data to demonstrate the efficiency and accuracy of the method. Lastly, we apply our method to three real-world datasets and empirically investigate the learned features.
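The abstract does not reproduce the strata-dependent update rules, but the classical multiplicative updates that Stratified-NMF builds on (Lee–Seung) can be sketched as follows; the function name and parameters here are illustrative, not the authors' code:

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iters=200, eps=1e-10, seed=0):
    """Classical NMF via Lee-Seung multiplicative updates,
    approximately minimizing ||X - W @ H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # coefficient update
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # topics update
    return W, H

# Factor a random nonnegative matrix and measure the relative error
X = np.random.default_rng(1).random((30, 20))
W, H = nmf_multiplicative(X, rank=10)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

The multiplicative form preserves nonnegativity automatically, since each factor is only ever multiplied by nonnegative ratios; Stratified-NMF adds per-stratum statistics on top of this shared topics matrix.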

* 5 pages. Will appear in IEEE Asilomar Conference on Signals, Systems, and Computers 2023 

Manifold Filter-Combine Networks

Jul 25, 2023
Joyce Chew, Edward De Brouwer, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter

We introduce a class of manifold neural networks (MNNs), which we call Manifold Filter-Combine Networks (MFCNs), that aims to further our understanding of MNNs, analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs). This class includes a wide variety of subclasses that can be thought of as the manifold analog of various popular GNNs. We then consider a method, based on building a data-driven graph, for implementing such networks when one does not have global knowledge of the manifold but merely has access to finitely many sample points. We provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. Unlike previous work (which focused on specific graph constructions), our rate of convergence does not directly depend on the number of filters used. Moreover, it exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously. Additionally, we provide several examples of interesting subclasses of MFCNs and of the rates of convergence that are obtained under specific graph constructions.
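One common data-driven graph construction of the kind the abstract alludes to is a symmetrized k-nearest-neighbor graph over the sample points, whose Laplacian stands in for the manifold's differential operators; this sketch uses that construction as an assumed example, since the paper considers several:

```python
import numpy as np

def knn_graph_laplacian(points, k=5):
    """Symmetrized k-NN adjacency and unnormalized graph Laplacian
    built from finitely many sample points -- a standard data-driven
    proxy for an unknown underlying manifold."""
    n = len(points)
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)        # exclude self-neighbors
    A = np.zeros((n, n))
    for i in range(n):
        A[i, np.argsort(d2[i])[:k]] = 1.0
    A = np.maximum(A, A.T)              # symmetrize
    return A, np.diag(A.sum(axis=1)) - A

# Sample points from a circle (a 1-D manifold embedded in R^2)
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
pts = np.stack([np.cos(t), np.sin(t)], axis=1)
A, L = knn_graph_laplacian(pts, k=4)
```

The Laplacian `L` is symmetric and positive semi-definite with zero row sums, which is what lets spectral filters defined on the graph approximate their continuum counterparts as the number of samples grows.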

Training shallow ReLU networks on noisy data using hinge loss: when do we overfit and is it benign?

Jun 16, 2023
Erin George, Michael Murray, William Swartworth, Deanna Needell

We study benign overfitting in two-layer ReLU networks trained using gradient descent and hinge loss on noisy data for binary classification. In particular, we consider linearly separable data for which a relatively small proportion of labels are corrupted or flipped. We identify conditions on the margin of the clean data that give rise to three distinct training outcomes: benign overfitting, in which zero loss is achieved and with high probability test data is classified correctly; overfitting, in which zero loss is achieved but test data is misclassified with probability lower bounded by a constant; and non-overfitting, in which clean points, but not corrupt points, achieve zero loss and again with high probability test data is classified correctly. Our analysis provides a fine-grained description of the dynamics of neurons throughout training and reveals two distinct phases: in the first phase, clean points achieve close to zero loss; in the second phase, clean points oscillate on the boundary of zero loss while corrupt points either converge towards zero loss or are eventually zeroed by the network. We prove these results using a combinatorial approach that involves bounding the number of clean versus corrupt updates across these phases of training.
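The training setup described above (gradient descent, hinge loss, a small fraction of flipped labels) can be sketched as follows. This is a toy illustration, not the paper's construction: the fixed random second-layer signs and all hyperparameters are assumptions made for the sketch.

```python
import numpy as np

def train_shallow_relu_hinge(X, y, width=50, lr=0.1, epochs=500, seed=0):
    """Gradient descent on hinge loss for a two-layer ReLU network
    f(x) = sum_j a_j * relu(w_j . x), with fixed random second-layer
    signs a_j (a simplification assumed for this sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(width, d))
    a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)
    for _ in range(epochs):
        pre = X @ W.T                         # (n, width) pre-activations
        out = np.maximum(pre, 0) @ a          # network outputs
        active = (y * out < 1).astype(float)  # points with nonzero hinge loss
        # dL/dW[j] = -(1/n) sum_i active_i * y_i * a_j * 1[pre_ij > 0] * x_i
        grad = -((active * y)[:, None] * (pre > 0)).T @ X * a[:, None] / n
        W -= lr * grad
    return W, a

def predict(W, a, X):
    return np.sign(np.maximum(X @ W.T, 0) @ a)

# Linearly separable toy data with a few flipped ("corrupt") labels
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0])
y_noisy = y.copy()
y_noisy[:5] *= -1                             # corrupt 5% of the labels
W, a = train_shallow_relu_hinge(X, y_noisy)
clean_acc = np.mean(predict(W, a, X[5:]) == y_noisy[5:])
final_loss = np.mean(np.maximum(0.0, 1 - y_noisy * (np.maximum(X @ W.T, 0) @ a)))
```

Which of the three regimes such a run lands in depends on the margin of the clean data, as the paper's conditions make precise.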

* 48 pages, 2 figures, 1 table 

Stochastic Natural Thresholding Algorithms

Jun 07, 2023
Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin

Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes stochastic natural thresholding algorithms and establishes their convergence guarantees, extending NT from the deterministic setting with linear measurements to the stochastic setting with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of the proposed stochastic natural thresholding (StoNT) algorithms.
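The NT and StoNT operators themselves are not specified in the abstract; as background, the classical iterative hard thresholding (IHT) baseline from the "family of hard thresholding operators" can be sketched as follows (step size and problem sizes are illustrative assumptions):

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x; zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def iht(A, y, s, n_iters=300):
    """Classical iterative hard thresholding for
    min ||y - A x||_2  subject to  ||x||_0 <= s,
    the deterministic baseline that NT and StoNT build on."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # conservative step size
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        x = hard_threshold(x + step * A.T @ (y - A @ x), s)
    return x

# Recover a 5-sparse signal from 100 Gaussian measurements
rng = np.random.default_rng(0)
m, n, s = 100, 200, 5
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.normal(size=s)
x_hat = iht(A, A @ x_true, s)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

A stochastic variant in the spirit of StoNT would replace the full gradient `A.T @ (y - A @ x)` with a mini-batch estimate of the gradient of a general objective.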

Detecting and Mitigating Indirect Stereotypes in Word Embeddings

May 23, 2023
Erin George, Joyce Chew, Deanna Needell

Societal biases in the usage of words, including harmful stereotypes, are frequently learned by common word embedding methods. These biases manifest not only between a word and an explicit marker of its stereotype, but also between words that share related stereotypes. This latter phenomenon, sometimes called "indirect bias," has resisted prior attempts at debiasing. In this paper, we propose a novel method called Biased Indirect Relationship Modification (BIRM) to mitigate indirect bias in distributional word embeddings by modifying biased relationships between words before embeddings are learned. This is done by considering how the co-occurrence probability of a given pair of words changes in the presence of words marking an attribute of bias, and using this to average out the effect of a bias attribute. To evaluate this method, we perform a series of common tests and demonstrate that measures of bias in the word embeddings are reduced in exchange for a minor reduction in the semantic quality of the embeddings. In addition, we conduct novel tests for measuring indirect stereotypes by extending the Word Embedding Association Test (WEAT) with new test sets for indirect binary gender stereotypes. With these tests, we demonstrate the presence of more subtle stereotypes not addressed by previous work. The proposed method is able to reduce the presence of some of these new stereotypes, serving as a crucial next step towards non-stereotyped word embeddings.
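The WEAT score that the paper extends has a standard form (Caliskan et al.): the differential cosine association of two target word sets with two attribute sets, normalized into an effect size. A minimal sketch, with toy two-dimensional "embeddings" in place of real word vectors:

```python
import numpy as np

def weat_effect_size(X, Y, A, B):
    """WEAT effect size: differential association of target word sets
    X, Y with attribute word sets A, B under cosine similarity.
    All inputs are lists of embedding vectors."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    def assoc(w):
        return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])
    sx = [assoc(x) for x in X]
    sy = [assoc(y) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Toy embeddings in which X aligns with attribute A and Y with B,
# so the effect size is large and positive
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
effect = weat_effect_size(X=[e1], Y=[e2], A=[e1], B=[e2])
```

The paper's new test sets plug different target and attribute word lists into this same statistic to probe indirect, rather than explicit, stereotypes.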

* 15 pages 

Robust Tensor CUR Decompositions: Rapid Low-Tucker-Rank Tensor Recovery with Sparse Corruption

May 06, 2023
HanQin Cai, Zehan Chao, Longxiu Huang, Deanna Needell

We study the tensor robust principal component analysis (TRPCA) problem, a tensorial extension of matrix robust principal component analysis (RPCA), which aims to split the given tensor into an underlying low-rank component and a sparse outlier component. This work proposes a fast algorithm, called Robust Tensor CUR Decompositions (RTCUR), for large-scale non-convex TRPCA problems under the Tucker rank setting. RTCUR is developed within a framework of alternating projections that projects between the set of low-rank tensors and the set of sparse tensors. We utilize the recently developed tensor CUR decomposition to substantially reduce the computational complexity in each projection. In addition, we develop four variants of RTCUR for different application settings. We demonstrate the effectiveness and computational advantages of RTCUR against state-of-the-art methods on both synthetic and real-world datasets.
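The tensor CUR decomposition is not spelled out in the abstract; its matrix analog, which conveys the computational idea, can be sketched as follows (uniform sampling and the sizes below are illustrative assumptions):

```python
import numpy as np

def cur_decomposition(X, n_rows, n_cols, seed=0):
    """Matrix CUR sketch: X ~= C @ U @ R, with uniformly sampled
    columns C and rows R, and U the pseudoinverse of their
    intersection block. If the intersection has the same rank as X,
    the reconstruction is exact."""
    rng = np.random.default_rng(seed)
    I = rng.choice(X.shape[0], size=n_rows, replace=False)
    J = rng.choice(X.shape[1], size=n_cols, replace=False)
    C, R = X[:, J], X[I, :]
    U = np.linalg.pinv(X[np.ix_(I, J)])   # pinv of the intersection
    return C, U, R

# Exact recovery of a rank-3 matrix from 6 sampled rows and columns
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))
C, U, R = cur_decomposition(X, n_rows=6, n_cols=6)
rel_err = np.linalg.norm(X - C @ U @ R) / np.linalg.norm(X)
```

Because only a few rows and columns ever need to be touched, the cost of each low-rank projection drops sharply, which is the source of RTCUR's speedup in the tensor setting.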

Linear Convergence of Reshuffling Kaczmarz Methods With Sparse Constraints

Apr 20, 2023
Halyun Jeong, Deanna Needell

The Kaczmarz method (KZ) and its variants, which are types of stochastic gradient descent (SGD) methods, have been extensively studied due to their simplicity and efficiency in solving systems of linear equations. The iterative hard thresholding (IHT) method has gained popularity in various research fields, including compressed sensing and sparse linear regression, machine learning with additional structure, and optimization with nonconvex constraints. Recently, a hybrid method called Kaczmarz-based IHT (KZIHT) has been proposed, combining the benefits of both approaches, but its theoretical guarantees are missing. In this paper, we provide the first theoretical convergence guarantees for KZIHT by showing that it converges linearly to the solution of a system with sparsity constraints up to optimal statistical bias when the reshuffling data sampling scheme is used. We also propose the Kaczmarz with periodic thresholding (KZPT) method, which generalizes KZIHT by applying the thresholding operation after every fixed number of KZ iterations and by employing two different types of step sizes. We establish a linear convergence guarantee for KZPT for randomly subsampled bounded orthonormal systems (BOS) and mean-zero isotropic sub-Gaussian random matrices, which are among the most commonly used models in compressed sensing, dimension reduction, matrix sketching, and many inverse problems in neural networks. Our analysis shows that KZPT with an optimal thresholding period outperforms KZIHT. To support our theory, we include several numerical experiments.
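The classical randomized Kaczmarz iteration underlying these hybrids can be sketched as follows (this is the Strohmer–Vershynin variant with squared-row-norm sampling, not the paper's reshuffled scheme):

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iters=2000, seed=0):
    """Randomized Kaczmarz for a consistent system Ax = b: at each
    step, project the iterate onto the solution hyperplane of one row,
    sampled with probability proportional to its squared norm."""
    rng = np.random.default_rng(seed)
    row_norms2 = np.sum(A ** 2, axis=1)
    p = row_norms2 / row_norms2.sum()
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        i = rng.choice(len(b), p=p)
        # Projection onto {z : A[i] . z = b[i]}
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    return x

# Solve a consistent overdetermined Gaussian system
rng = np.random.default_rng(2)
A = rng.normal(size=(100, 20))
x_true = rng.normal(size=20)
x_hat = randomized_kaczmarz(A, A @ x_true, n_iters=2000)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

KZIHT interleaves a hard thresholding step with these row projections, and KZPT generalizes that by thresholding only after every fixed number of KZ iterations.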

* Submitted to a journal 

One-Bit Quadratic Compressed Sensing: From Sample Abundance to Linear Feasibility

Mar 16, 2023
Arian Eamaz, Farhang Yeganegi, Deanna Needell, Mojtaba Soltanalian

One-bit quantization with time-varying sampling thresholds has recently found significant utilization potential in statistical signal processing applications due to its relatively low power consumption and low implementation cost. In addition to such advantages, an attractive feature of one-bit analog-to-digital converters (ADCs) is their superior sampling rates as compared to their conventional multi-bit counterparts. This characteristic endows one-bit signal processing frameworks with what we refer to as sample abundance. On the other hand, many signal recovery and optimization problems are formulated as (possibly non-convex) quadratic programs with linear feasibility constraints in the one-bit sampling regime. We demonstrate, with a particular focus on quadratic compressed sensing, that the sample abundance paradigm allows for the transformation of such quadratic problems to merely a linear feasibility problem by forming a large-scale overdetermined linear system; thus removing the need for costly optimization constraints and objectives. To efficiently tackle the emerging overdetermined linear feasibility problem, we further propose an enhanced randomized Kaczmarz algorithm, called Block SKM. Several numerical results are presented to illustrate the effectiveness of the proposed methodologies.
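The reduction described above turns each one-bit sample s_i = sign(a_i . x - tau_i) into the halfspace constraint s_i (a_i . x - tau_i) >= 0, giving an overdetermined linear feasibility problem. The following is a sketch in the spirit of a sampling Kaczmarz–Motzkin solver; the block size, iteration count, and greedy most-violated rule are assumptions of this sketch, not the paper's exact Block SKM:

```python
import numpy as np

def one_bit_skm(A, tau, signs, beta=20, n_iters=3000, seed=0):
    """Kaczmarz-type solver for the feasibility problem induced by
    one-bit samples: per step, sample beta constraints and project
    onto the most violated one."""
    rng = np.random.default_rng(seed)
    G = signs[:, None] * A          # rewrite constraints as G x >= h
    h = signs * tau
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        blk = rng.choice(len(h), size=beta, replace=False)
        j = blk[np.argmax(h[blk] - G[blk] @ x)]
        viol = h[j] - G[j] @ x
        if viol > 0:                # project onto the violated halfspace
            x += viol / (G[j] @ G[j]) * G[j]
    return x

# One-bit samples of a random signal with Gaussian time-varying thresholds
rng = np.random.default_rng(3)
m, n = 2000, 10
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
tau = rng.normal(size=m)
signs = np.sign(A @ x_true - tau)
x_hat = one_bit_skm(A, tau, signs)
frac_ok = np.mean(signs * (A @ x_hat - tau) >= 0)
```

Since the true signal satisfies every constraint, each halfspace projection can only move the iterate closer to it (Fejér monotonicity), and sample abundance shrinks the feasible region around the signal.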

* arXiv admin note: substantial text overlap with arXiv:2301.03467 

Neural Nonnegative Matrix Factorization for Hierarchical Multilayer Topic Modeling

Feb 28, 2023
Tyler Will, Runyu Zhang, Eli Sadovnik, Mengdi Gao, Joshua Vendrow, Jamie Haddock, Denali Molitor, Deanna Needell

We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We derive a backpropagation optimization scheme that allows us to frame hierarchical NMF as a neural network. We test Neural NMF on a synthetic hierarchical dataset, the 20 Newsgroups dataset, and the MyLymeData symptoms dataset. Numerical results demonstrate that Neural NMF outperforms other hierarchical NMF methods on these datasets and offers better learned hierarchical structure and interpretability of topics.
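The layered recursion can be sketched with a common multilayer scheme in which each layer's coefficient matrix is factored again, so X ≈ W1 W2 … WL HL; this is only the forward pass under assumed details (Neural NMF's contribution is to additionally backpropagate through the layers):

```python
import numpy as np

def nmf(X, rank, n_iters=300, eps=1e-10, seed=0):
    """Plain NMF via multiplicative updates (inner routine)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(n_iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def multilayer_nmf(X, ranks):
    """Recursively factor each layer's coefficient matrix, yielding
    coarser topics (smaller ranks) at deeper layers."""
    Ws, H = [], X
    for r in ranks:
        W, H = nmf(H, r)
        Ws.append(W)
    return Ws, H

# Two-layer factorization: 10 fine topics grouped into 4 coarse ones
X = np.random.default_rng(4).random((60, 40))
Ws, H = multilayer_nmf(X, ranks=[10, 4])
```

Here the product `Ws[0] @ Ws[1]` maps the 4 coarse topics back to the original feature space, which is what makes the hierarchy interpretable.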
