Deanna Needell

Kernel Alignment for Unsupervised Feature Selection via Matrix Factorization

Mar 13, 2024

Ziyuan Lin, Deanna Needell

Abstract: By removing irrelevant and redundant features, feature selection aims to find a good representation of the original features. With the prevalence of unlabeled data, unsupervised feature selection has been proven effective in alleviating the so-called curse of dimensionality. Most existing matrix factorization-based unsupervised feature selection methods are built upon subspace learning, but they have limitations in capturing nonlinear structural information among features. It is well known that kernel techniques can capture nonlinear structural information. In this paper, we construct a model by integrating kernel functions and kernel alignment, which can be equivalently characterized as a matrix factorization problem. However, such an extension raises another issue: the algorithm performance heavily depends on the choice of kernel, which is often unknown a priori. Therefore, we further propose a multiple kernel-based learning method. By doing so, our model can learn both linear and nonlinear similarity information and automatically generate the most appropriate kernel. Experimental analysis on real-world data demonstrates that the two proposed methods outperform other classic and state-of-the-art unsupervised feature selection methods in terms of clustering results and redundancy reduction in almost all datasets tested.
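
To make the notion of kernel alignment in the abstract above concrete, here is a minimal sketch of the commonly used alignment score between two Gram matrices, namely their Frobenius inner product divided by the product of their Frobenius norms. It is not the paper's model: the function names, the RBF bandwidth `gamma`, and the synthetic data are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) Gram matrix for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative distances

def kernel_alignment(K1, K2):
    """Frobenius inner product of K1 and K2, normalised by their Frobenius norms."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

# Synthetic data (illustration only): compare the kernel built on all
# features with the kernel built on a hypothetical subset of three features.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
K_full = rbf_kernel(X, gamma=0.1)
K_subset = rbf_kernel(X[:, :3], gamma=0.1)
score = kernel_alignment(K_full, K_subset)
```

A score near 1 indicates that the kernel built on the selected features reproduces the similarity structure of the full kernel; a feature selection method based on alignment would search for a subset that keeps this score high.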

Benign overfitting in leaky ReLU networks with moderate input dimension

Mar 11, 2024

Kedar Karhadkar, Erin George, Michael Murray, Guido Montúfar, Deanna Needell

Abstract: The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data which can be decomposed into the sum of a common signal and a random noise component, which lie on subspaces orthogonal to one another. We characterize conditions on the signal-to-noise ratio (SNR) of the model parameters giving rise to benign versus non-benign, or harmful, overfitting: in particular, if the SNR is high then benign overfitting occurs; conversely, if the SNR is low then harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property and show that leaky ReLU networks trained on hinge loss with gradient descent (GD) satisfy this property. In contrast to prior work, we do not require near orthogonality conditions on the training data: notably, for input dimension $d$ and training sample size $n$, while prior work shows asymptotically optimal error when $d = \Omega(n^2 \log n)$, here we require only $d = \Omega\left(n \log \frac{1}{\epsilon}\right)$ to obtain error within $\epsilon$ of optimal.

* 36 pages

Stochastic gradient descent for streaming linear and rectified linear systems with Massart noise

Mar 02, 2024

Halyun Jeong, Deanna Needell, Elizaveta Rebrova

Abstract: We propose SGD-exp, a stochastic gradient descent approach for linear and ReLU regressions under Massart noise (adversarial semi-random corruption model) in the fully streaming setting. We show novel nearly linear convergence guarantees of SGD-exp to the true parameter with up to $50\%$ Massart corruption rate, and with any corruption rate in the case of symmetric oblivious corruptions. This is the first convergence guarantee result for robust ReLU regression in the streaming setting, and it shows an improved convergence rate over previous robust methods for $L_1$ linear regression, due to the choice of an exponentially decaying step size, known for its efficiency in practice. Our analysis is based on the drift analysis of a discrete stochastic process, which may also be of independent interest.

* Submitted to a journal
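
The combination of an $L_1$ loss with an exponentially decaying step size mentioned in the abstract above can be illustrated with a small streaming example. This is a sketch under stated assumptions, not the authors' SGD-exp: the initial step `eta0`, the decay factor, and the corruption model (a random fraction of responses replaced by symmetric noise, which is simpler than an adversarial Massart model) are all hypothetical choices.

```python
import numpy as np

def streaming_sgd_l1(stream, dim, eta0=1.0, decay=0.998):
    """One pass of subgradient SGD on |a @ x - y| with step eta0 * decay**t."""
    x = np.zeros(dim)
    eta = eta0
    for a, y in stream:
        residual = a @ x - y
        x -= eta * np.sign(residual) * a  # subgradient of the absolute loss
        eta *= decay                      # exponentially decaying step size
    return x

def corrupted_stream(x_true, n_samples, corruption_rate, rng):
    """Yield (a, y) pairs; a random fraction of responses is replaced by noise."""
    for _ in range(n_samples):
        a = rng.standard_normal(x_true.shape[0])
        y = a @ x_true
        if rng.random() < corruption_rate:
            y = 10.0 * rng.standard_normal()  # illustrative symmetric corruption
        yield a, y

rng = np.random.default_rng(0)
x_true = rng.standard_normal(5)
x_hat = streaming_sgd_l1(corrupted_stream(x_true, 4000, 0.2, rng), dim=5)
error = np.linalg.norm(x_hat - x_true)
```

Each sample is seen once and then discarded, which is what makes the setting streaming. The sign-based update ignores the magnitude of the residual, so a corrupted response cannot move the iterate by more than a clean one.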

Convergence and complexity of block majorization-minimization for constrained block-Riemannian optimization

Dec 16, 2023

Yuchen Li, Laura Balzano, Deanna Needell, Hanbaek Lyu

Abstract: Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex optimization that sequentially minimizes a majorizing surrogate of the objective function in each block coordinate while the other block coordinates are held fixed. We consider a family of BMM algorithms for minimizing smooth nonconvex objectives, where each parameter block is constrained within a subset of a Riemannian manifold. We establish that this algorithm converges asymptotically to the set of stationary points, and attains an $\epsilon$-stationary point within $\widetilde{O}(\epsilon^{-2})$ iterations. In particular, the assumptions for our complexity results are completely Euclidean when the underlying manifold is a product of Euclidean or Stiefel manifolds, although our analysis makes explicit use of the Riemannian geometry. Our general analysis applies to a wide range of algorithms with Riemannian constraints: Riemannian MM, block projected gradient descent, optimistic likelihood estimation, geodesically constrained subspace tracking, robust PCA, and Riemannian CP-dictionary-learning. We experimentally validate that our algorithm converges faster than standard Euclidean algorithms applied to the Riemannian setting.

* 54 pages, 8 figures
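
As a toy illustration of block-wise updates under manifold constraints, the sketch below minimizes $-u^\top M v$ over two blocks, each constrained to a unit sphere, by updating one block while the other is held fixed. It uses exact block minimization, which is the simplest case where the surrogate is the objective itself; it is not one of the algorithms analysed in the paper, and the test matrix with singular values 5, 2, 1 is an assumption made for the example.

```python
import numpy as np

def alternating_sphere_maximization(M, n_iter=100, seed=0):
    """Alternately maximise u @ M @ v over unit vectors u and v."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[1])
    v /= np.linalg.norm(v)
    values = []
    for _ in range(n_iter):
        u = M @ v                # best u on the sphere for fixed v
        u /= np.linalg.norm(u)
        v = M.T @ u              # best v on the sphere for fixed u
        v /= np.linalg.norm(v)
        values.append(u @ M @ v)
    return u, v, values

# Illustrative matrix with known singular values 5, 2, 1.
rng = np.random.default_rng(1)
Q1, _ = np.linalg.qr(rng.standard_normal((8, 3)))
Q2, _ = np.linalg.qr(rng.standard_normal((6, 3)))
M = Q1 @ np.diag([5.0, 2.0, 1.0]) @ Q2.T
u, v, values = alternating_sphere_maximization(M)
```

Because each block update is optimal given the other block, the recorded values never decrease, and on this matrix they approach the largest singular value.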

Stratified-NMF for Heterogeneous Data

Nov 17, 2023

James Chapman, Yotam Yaniv, Deanna Needell

Abstract: Non-negative matrix factorization (NMF) is an important technique for obtaining low-dimensional representations of datasets. However, classical NMF does not take into account data that is collected at different times or in different locations, which may exhibit heterogeneity. We resolve this problem by solving a modified NMF objective, Stratified-NMF, that simultaneously learns strata-dependent statistics and a shared topics matrix. We develop multiplicative update rules for this novel objective and prove convergence of the objective. Then, we experiment on synthetic data to demonstrate the efficiency and accuracy of the method. Lastly, we apply our method to three real-world datasets and empirically investigate their learned features.

* 5 pages. Will appear in IEEE Asilomar Conference on Signals, Systems, and Computers 2023
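
For readers unfamiliar with multiplicative update rules, the sketch below shows the classical updates for plain NMF with a Frobenius-norm objective. It is the baseline that the abstract above contrasts against, not the Stratified-NMF updates, which are not given in this listing. The rank, iteration count, and the small constant `eps` guarding against division by zero are illustrative assumptions.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Classical NMF: minimise ||X - W H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    losses = []
    for _ in range(n_iter):
        # Element-wise ratios keep every entry non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(np.linalg.norm(X - W @ H) ** 2)
    return W, H, losses

# Synthetic non-negative data with an exact rank-4 factorisation.
rng = np.random.default_rng(1)
X = rng.random((30, 4)) @ rng.random((4, 20))
W, H, losses = nmf_multiplicative(X, rank=4)
```

The updates need no step size and never leave the non-negative orthant, which is the usual reason for preferring them over projected gradient steps in NMF.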

Manifold Filter-Combine Networks

Jul 25, 2023

Joyce Chew, Edward De Brouwer, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter

Abstract: We introduce a class of manifold neural networks (MNNs) that we call Manifold Filter-Combine Networks (MFCNs), which aims to further our understanding of MNNs, analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs). This class includes a wide variety of subclasses that can be thought of as the manifold analog of various popular GNNs. We then consider a method, based on building a data-driven graph, for implementing such networks when one does not have global knowledge of the manifold, but merely has access to finitely many sample points. We provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. Unlike previous work (which focused on specific graph constructions), our rate of convergence does not directly depend on the number of filters used. Moreover, it exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously. Additionally, we provide several examples of interesting subclasses of MFCNs and of the rates of convergence that are obtained under specific graph constructions.

Training shallow ReLU networks on noisy data using hinge loss: when do we overfit and is it benign?

Jun 16, 2023

Erin George, Michael Murray, William Swartworth, Deanna Needell

Abstract: We study benign overfitting in two-layer ReLU networks trained using gradient descent and hinge loss on noisy data for binary classification. In particular, we consider linearly separable data for which a relatively small proportion of labels are corrupted or flipped. We identify conditions on the margin of the clean data that give rise to three distinct training outcomes: benign overfitting, in which zero loss is achieved and with high probability test data is classified correctly; overfitting, in which zero loss is achieved but test data is misclassified with probability lower bounded by a constant; and non-overfitting, in which clean points, but not corrupt points, achieve zero loss and again with high probability test data is classified correctly. Our analysis provides a fine-grained description of the dynamics of neurons throughout training and reveals two distinct phases: in the first phase, clean points achieve close to zero loss; in the second phase, clean points oscillate on the boundary of zero loss while corrupt points either converge towards zero loss or are eventually zeroed by the network. We prove these results using a combinatorial approach that involves bounding the number of clean versus corrupt updates across these phases of training.

* 48 pages, 2 figures, 1 table

Stochastic Natural Thresholding Algorithms

Jun 07, 2023

Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin

Abstract: Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and discusses convergence guarantees for stochastic natural thresholding (StoNT) algorithms by extending NT from the deterministic version with linear measurements to the stochastic version with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT.
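
The hard thresholding operator that the abstract above refers to keeps the `s` largest-magnitude entries of a vector and zeroes the rest. The sketch below pairs it with classical iterative hard thresholding (IHT) on linear measurements. This is the older baseline family, not NT or StoNT, and the problem sizes, unit step size, and signal magnitudes are assumptions picked so that the toy instance is easy.

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -s)[-s:]
    out[keep] = v[keep]
    return out

def iterative_hard_thresholding(A, y, s, n_iter=100, step=1.0):
    """Classical IHT: gradient step on ||y - A x||^2 / 2, then hard threshold."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + step * A.T @ (y - A @ x), s)
    return x

# Illustrative easy instance: Gaussian measurements of a 3-sparse signal.
rng = np.random.default_rng(0)
m, n, s = 400, 800, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=s) * (1.0 + 0.5 * rng.random(s))
y = A @ x_true
x_hat = iterative_hard_thresholding(A, y, s)
error = np.linalg.norm(x_hat - x_true)
```

A stochastic variant would replace the full gradient `A.T @ (y - A @ x)` with a gradient computed on a random subset of the measurements at each iteration.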

Detecting and Mitigating Indirect Stereotypes in Word Embeddings

May 23, 2023

Erin George, Joyce Chew, Deanna Needell

Abstract: Societal biases in the usage of words, including harmful stereotypes, are frequently learned by common word embedding methods. These biases manifest not only between a word and an explicit marker of its stereotype, but also between words that share related stereotypes. This latter phenomenon, sometimes called "indirect bias," has resisted prior attempts at debiasing. In this paper, we propose a novel method called Biased Indirect Relationship Modification (BIRM) to mitigate indirect bias in distributional word embeddings by modifying biased relationships between words before embeddings are learned. This is done by considering how the co-occurrence probability of a given pair of words changes in the presence of words marking an attribute of bias, and using this to average out the effect of a bias attribute. To evaluate this method, we perform a series of common tests and demonstrate that measures of bias in the word embeddings are reduced in exchange for minor reduction in the semantic quality of the embeddings. In addition, we conduct novel tests for measuring indirect stereotypes by extending the Word Embedding Association Test (WEAT) with new test sets for indirect binary gender stereotypes. With these tests, we demonstrate the presence of more subtle stereotypes not addressed by previous work. The proposed method is able to reduce the presence of some of these new stereotypes, serving as a crucial next step towards non-stereotyped word embeddings.

* 15 pages
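
The Word Embedding Association Test mentioned in the abstract above is usually summarised by an effect size: the difference in mean association of two target word sets with two attribute word sets, divided by the standard deviation of the associations. The sketch below computes that standard quantity on made-up three-dimensional vectors. The vectors are synthetic placeholders, not real embeddings or the paper's test sets, and the choice of sample standard deviation (`ddof=1`) is an assumption, since implementations differ on this point.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """Mean cosine similarity of w to attribute set A minus that to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of targets X and Y, in standard deviations."""
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Synthetic placeholder vectors: X and A lean towards the first axis,
# Y and B towards the second.
A = np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
B = np.array([[0.1, 1.0, 0.0], [0.0, 0.9, 0.1]])
X = np.array([[1.0, 0.2, 0.0], [0.8, 0.1, 0.1]])
Y = np.array([[0.2, 1.0, 0.0], [0.1, 0.8, 0.1]])
d = weat_effect_size(X, Y, A, B)
```

A positive value means the targets in `X` are more strongly associated with the attributes in `A` than the targets in `Y` are; swapping `X` and `Y` flips the sign.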

Robust Tensor CUR Decompositions: Rapid Low-Tucker-Rank Tensor Recovery with Sparse Corruption

May 06, 2023

HanQin Cai, Zehan Chao, Longxiu Huang, Deanna Needell

Abstract: We study the tensor robust principal component analysis (TRPCA) problem, a tensorial extension of matrix robust principal component analysis (RPCA), that aims to split the given tensor into an underlying low-rank component and a sparse outlier component. This work proposes a fast algorithm, called Robust Tensor CUR Decompositions (RTCUR), for large-scale non-convex TRPCA problems under the Tucker rank setting. RTCUR is developed within a framework of alternating projections that projects between the set of low-rank tensors and the set of sparse tensors. We utilize the recently developed tensor CUR decomposition to substantially reduce the computational complexity in each projection. In addition, we develop four variants of RTCUR for different application settings. We demonstrate the effectiveness and computational advantages of RTCUR against state-of-the-art methods on both synthetic and real-world datasets.
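
The alternating-projections framework described in the abstract above can be sketched in its simpler matrix form: alternate between a low-rank projection of the data with the current outlier estimate removed, and a sparse projection of what remains. This is not RTCUR. It works on matrices rather than tensors, uses a full truncated SVD where the paper uses tensor CUR decompositions, and keeps a fixed threshold `tau`. All sizes and magnitudes are illustrative assumptions.

```python
import numpy as np

def truncated_svd(M, rank):
    """Best rank-`rank` approximation of M in Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def hard_threshold_entries(M, tau):
    """Keep entries with magnitude above tau, zero the rest."""
    return np.where(np.abs(M) > tau, M, 0.0)

def alternating_projections_rpca(M, rank, tau, n_iter=50):
    S = hard_threshold_entries(M, tau)
    for _ in range(n_iter):
        L = truncated_svd(M - S, rank)          # project onto low-rank matrices
        S = hard_threshold_entries(M - L, tau)  # project onto sparse matrices
    return L, S

# Synthetic example: rank-2 matrix plus 5% large-magnitude outliers.
rng = np.random.default_rng(0)
n, r = 50, 2
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) / np.sqrt(r)
mask = rng.random((n, n)) < 0.05
S_true = np.where(mask, 20.0 * rng.choice([-1.0, 1.0], size=(n, n)), 0.0)
M = L_true + S_true
L_hat, S_hat = alternating_projections_rpca(M, rank=r, tau=10.0)
err_altproj = np.linalg.norm(L_hat - L_true)
err_naive = np.linalg.norm(truncated_svd(M, r) - L_true)
```

The full SVD in every iteration is the expensive step at scale, which is the cost the abstract says the CUR-based projections are designed to reduce.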