Rebecca Willett

Tensor Methods for Nonlinear Matrix Completion

Apr 26, 2018

Greg Ongie, Laura Balzano, Daniel Pimentel-Alarcón, Rebecca Willett, Robert D. Nowak

Abstract: In the low rank matrix completion (LRMC) problem, the low rank assumption means that the columns (or rows) of the matrix to be completed are points on a low-dimensional linear algebraic variety. This paper extends this thinking to cases where the columns are points on a low-dimensional nonlinear algebraic variety, a problem we call Low Algebraic Dimension Matrix Completion (LADMC). Matrices whose columns belong to a union of subspaces (UoS) are an important special case. We propose a LADMC algorithm that leverages existing LRMC methods on a tensorized representation of the data. For example, a second-order tensorized representation is formed by taking the outer product of each column with itself, and we consider higher-order tensorizations as well. This approach will succeed in many cases where traditional LRMC is guaranteed to fail because the data are low-rank in the tensorized representation but not in the original representation. We also provide a formal mathematical justification for the success of our method. In particular, we derive bounds on the rank of these data in the tensorized representation, and we prove sampling requirements to guarantee uniqueness of the solution. Interestingly, the sampling requirements of our LADMC algorithm nearly match the information-theoretic lower bounds for matrix completion under a UoS model. We also provide experimental results showing that the new approach significantly outperforms existing state-of-the-art methods for matrix completion in many situations.
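
The second-order tensorization described in the abstract can be illustrated with a small numerical sketch. It is not the authors' LADMC implementation and performs no completion; it only checks that data drawn from a union of subspaces can be full rank in the original representation while being rank-deficient after tensorization. The helper names (`numerical_rank`, `tensorize`) and all sizes are assumptions chosen for illustration.

```python
import numpy as np

def numerical_rank(M, rel_tol=1e-9):
    """Count singular values above a relative threshold (hypothetical helper)."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def tensorize(X):
    """Map each column x of X (d x n) to the upper-triangular entries of x x^T."""
    d, n = X.shape
    iu = np.triu_indices(d)
    return np.stack([np.outer(X[:, j], X[:, j])[iu] for j in range(n)], axis=1)

rng = np.random.default_rng(0)
d, r, K, n_per = 10, 2, 6, 40  # ambient dim, subspace dim, subspaces, points per subspace

# Columns drawn from a union of K random r-dimensional subspaces of R^d.
blocks = []
for _ in range(K):
    U = rng.standard_normal((d, r))
    blocks.append(U @ rng.standard_normal((r, n_per)))
X = np.concatenate(blocks, axis=1)  # d x (K * n_per)

T = tensorize(X)                    # d(d+1)/2 x (K * n_per)
rank_X = numerical_rank(X)
rank_T = numerical_rank(T)

# Each r-dimensional subspace contributes at most r(r+1)/2 dimensions after
# tensorization, so rank_T is at most K * r * (r + 1) / 2 = 18 out of 55 rows.
print("original:", rank_X, "of", d, "| tensorized:", rank_T, "of", T.shape[0])
```

Because K * r exceeds d here, the original matrix has no low-rank structure for a standard LRMC method to exploit, while the tensorized matrix does.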

Missing Data in Sparse Transition Matrix Estimation for Sub-Gaussian Vector Autoregressive Processes

Feb 26, 2018

Amin Jalali, Rebecca Willett

Abstract: High-dimensional time series data exist in numerous areas such as finance, genomics, healthcare, and neuroscience. An unavoidable aspect of all such datasets is missing data, and dealing with this issue has been an important focus in statistics, control, and machine learning. In this work, we consider a high-dimensional estimation problem where a dynamical system, governed by a stable vector autoregressive model, is randomly and only partially observed at each time point. Our task amounts to estimating the transition matrix, which is assumed to be sparse. In such a scenario, where covariates are highly interdependent and partially missing, new theoretical challenges arise. While transition matrix estimation in vector autoregressive models has been studied previously, the missing data scenario requires separate efforts. Moreover, while transition matrix estimation can be studied from a high-dimensional sparse linear regression perspective, the covariates are highly dependent and existing results on regularized estimation with missing data from i.i.d. covariates are not applicable. At the heart of our analysis lies 1) a novel concentration result when the innovation noise satisfies the convex concentration property, as well as 2) a new quantity for characterizing the interactions of the time-varying observation process with the underlying dynamical system.
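
The observation model in the abstract (a stable VAR process with a sparse transition matrix, partially observed at each time point) can be simulated as in the sketch below. It only generates data and is not the paper's estimator. The Gaussian innovations, the independent Bernoulli observation mask, and all sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p_dim, T, obs_prob = 10, 500, 0.7  # dimension, time points, observation probability

# Sparse transition matrix: random support plus the diagonal (assumed pattern).
support = rng.random((p_dim, p_dim)) < 0.2
np.fill_diagonal(support, True)
A = np.where(support, rng.standard_normal((p_dim, p_dim)), 0.0)

# Rescale so the spectral radius is 0.9, which makes the process stable
# without changing the sparsity pattern.
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))

# Simulate x_t = A x_{t-1} + noise with Gaussian (hence sub-Gaussian) innovations.
X = np.zeros((T, p_dim))
for t in range(1, T):
    X[t] = A @ X[t - 1] + rng.standard_normal(p_dim)

# Each entry is observed independently with probability obs_prob;
# unobserved entries are marked as NaN.
mask = rng.random((T, p_dim)) < obs_prob
Z = np.where(mask, X, np.nan)

print("observed fraction:", mask.mean())
```

An estimator of `A` would only have access to `Z` and `mask`, which is what makes the covariates both dependent across time and partially missing.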

Network Estimation from Point Process Data

Feb 13, 2018

Benjamin Mark, Garvesh Raskutti, Rebecca Willett

Abstract: Consider observing a collection of discrete events within a network that reflect how network nodes influence one another. Such data are common in spike trains recorded from biological neural networks, interactions within a social network, and a variety of other settings. Data of this form may be modeled as self-exciting point processes, in which the likelihood of future events depends on the past events. This paper addresses the problem of estimating self-excitation parameters and inferring the underlying functional network structure from self-exciting point process data. Past work in this area was limited by strong assumptions, which are addressed by the novel approach here. Specifically, in this paper we (1) incorporate saturation in a point process model, which both ensures stability and models non-linear thresholding effects; (2) impose general low-dimensional structural assumptions, including sparsity, group sparsity, and low-rankness, that allow bounds to be developed in the high-dimensional setting; and (3) incorporate long-range memory effects through moving average and higher-order auto-regressive components. Using our general framework, we provide a number of novel theoretical guarantees for high-dimensional self-exciting point processes that reflect the role played by the underlying network structure and long-term memory. We also provide simulations and real data examples to support our methodology and main results.

* Submitted to IEEE Transactions on Information Theory

Subspace Clustering with Missing and Corrupted Data

Jan 15, 2018

Zachary Charles, Amin Jalali, Rebecca Willett

Abstract: Given full or partial information about a collection of points that lie close to a union of several subspaces, subspace clustering refers to the process of clustering the points according to their subspace and identifying the subspaces. One popular approach, sparse subspace clustering (SSC), represents each sample as a weighted combination of the other samples, with weights of minimal $\ell_1$ norm, and then uses those learned weights to cluster the samples. SSC is stable in settings where each sample is contaminated by a relatively small amount of noise. However, when there is a significant amount of additive noise, or a considerable number of entries are missing, theoretical guarantees are scarce. In this paper, we study a robust variant of SSC and establish clustering guarantees in the presence of corrupted or missing data. We give explicit bounds on the amount of noise and missing data that the algorithm can tolerate, both in deterministic settings and in a random generative model. Notably, our approach provides guarantees for higher tolerance to noise and missing data than existing analyses for this method. By design, the results hold even when we do not know the locations of the missing data; e.g., as in presence-only data.

* 31 pages, 2 figures
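
The self-expression step of SSC described in the abstract (each sample written as an $\ell_1$-sparse combination of the other samples) can be sketched as follows. This is the plain Lasso form on complete, noiseless data, solved with proximal gradient descent (ISTA); it is not the robust variant analyzed in the paper. The function names, the penalty `lam`, and the data sizes are assumptions for illustration.

```python
import numpy as np

def soft_threshold(V, tau):
    """Entrywise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def self_expression(X, lam=0.05, n_iter=500):
    """ISTA for min_C 0.5 * ||X - X C||_F^2 + lam * ||C||_1 with diag(C) = 0."""
    n = X.shape[1]
    G = X.T @ X                                # Gram matrix of the samples
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ C - G                       # gradient of the quadratic term
        C = soft_threshold(C - step * grad, step * lam)
        np.fill_diagonal(C, 0.0)               # a sample may not represent itself
    return C

rng = np.random.default_rng(0)
d, r, n_per = 20, 2, 15                        # ambient dim, subspace dim, points per subspace

# Unit-norm samples from two random r-dimensional subspaces.
blocks = []
for _ in range(2):
    U = rng.standard_normal((d, r))
    blocks.append(U @ rng.standard_normal((r, n_per)))
X = np.concatenate(blocks, axis=1)
X /= np.linalg.norm(X, axis=0)
labels = np.repeat([0, 1], n_per)

C = self_expression(X)
same = labels[:, None] == labels[None, :]
within_mass = np.abs(C)[same].sum()            # weight placed on same-subspace samples
cross_mass = np.abs(C)[~same].sum()            # weight placed on other-subspace samples
print("within:", within_mass, "cross:", cross_mass)
```

In a full SSC pipeline the symmetrized weights `np.abs(C) + np.abs(C).T` would typically be passed to spectral clustering; that step is omitted here.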

Online Learning for Changing Environments using Coin Betting

Nov 06, 2017

Kwang-Sung Jun, Francesco Orabona, Stephen Wright, Rebecca Willett

Abstract: A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments. Recent studies have proposed "meta" algorithms that convert any online learning algorithm to one that is adaptive to changing environments, where the adaptivity is analyzed in a quantity called the strongly-adaptive regret. This paper describes a new meta algorithm that has a strongly-adaptive regret bound that is a factor of $\sqrt{\log(T)}$ better than other algorithms with the same time complexity, where $T$ is the time horizon. We also extend our algorithm to achieve a first-order (i.e., dependent on the observed losses) strongly-adaptive regret bound for the first time, to our knowledge. At its heart is a new parameter-free algorithm for the learning with expert advice (LEA) problem in which experts sometimes do not output advice for consecutive time steps (i.e., sleeping experts). This algorithm is derived by a reduction from optimal algorithms for the so-called coin betting problem. Empirical results show that our algorithm outperforms state-of-the-art methods in both learning with expert advice and metric learning scenarios.

* submitted to a journal. arXiv admin note: substantial text overlap with arXiv:1610.04578
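
As background for the coin betting problem mentioned in the abstract, the sketch below implements the standard Krichevsky-Trofimov (KT) betting strategy: at each round the bettor wagers a signed fraction of current wealth equal to the running average of past outcomes. It is not the paper's meta algorithm or its sleeping-experts reduction; the function name and the example outcome sequences are assumptions for illustration.

```python
def kt_coin_betting(outcomes, initial_wealth=1.0):
    """Krichevsky-Trofimov bettor for outcomes g_t in [-1, 1].

    Returns the final wealth and the list of signed bets placed.
    """
    wealth = initial_wealth
    outcome_sum = 0.0
    bets = []
    for t, g in enumerate(outcomes, start=1):
        fraction = outcome_sum / t   # |fraction| < 1, so wealth stays positive
        bet = fraction * wealth      # signed amount wagered in round t
        bets.append(bet)
        wealth += g * bet            # win or lose in proportion to the outcome
        outcome_sum += g
    return wealth, bets


# A biased coin lets the bettor grow wealth (about 2.5 after three wins),
# while an alternating coin loses only a bounded share of the initial wealth.
wealth_biased, _ = kt_coin_betting([1, 1, 1])
wealth_alternating, _ = kt_coin_betting([1, -1, 1, -1])
print(wealth_biased, wealth_alternating)
```

The strategy has no learning rate to tune, which is the sense in which algorithms derived from coin betting are called parameter-free.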

Scalable Generalized Linear Bandits: Online Computation and Hashing

Oct 21, 2017

Kwang-Sung Jun, Aniruddha Bhargava, Robert Nowak, Rebecca Willett

Abstract: Generalized Linear Bandits (GLBs), a natural extension of the stochastic linear bandits, have been popular and successful in recent years. However, existing GLBs scale poorly with the number of rounds and the number of arms, limiting their utility in practice. This paper proposes new, scalable solutions to the GLB problem in two respects. First, unlike existing GLBs, whose per-time-step space and time complexity grow at least linearly with time $t$, we propose a new algorithm that performs online computations to enjoy a constant space and time complexity. At its heart is a novel Generalized Linear extension of the Online-to-confidence-set Conversion (GLOC method) that takes any online learning algorithm and turns it into a GLB algorithm. As a special case, we apply GLOC to the online Newton step algorithm, which results in a low-regret GLB algorithm with much lower time and memory complexity than prior work. Second, for the case where the number $N$ of arms is very large, we propose new algorithms in which each next arm is selected via an inner product search. Such methods can be implemented via hashing algorithms (i.e., "hash-amenable") and result in a time complexity sublinear in $N$. While a Thompson sampling extension of GLOC is hash-amenable, its regret bound for $d$-dimensional arm sets scales with $d^{3/2}$, whereas GLOC's regret bound scales with $d$. Towards closing this gap, we propose a new hash-amenable algorithm whose regret bound scales with $d^{5/4}$. Finally, we propose a fast approximate hash-key computation (inner product) with a better accuracy than the state-of-the-art, which can be of independent interest. We conclude the paper with preliminary experimental results confirming the merits of our methods.

* accepted to NIPS'17 (typos fixed)

Improved Strongly Adaptive Online Learning using Coin Betting

Aug 07, 2017

Kwang-Sung Jun, Francesco Orabona, Rebecca Willett, Stephen Wright

Abstract: This paper describes a new parameter-free online learning algorithm for changing environments. In comparing against algorithms with the same time complexity as ours, we obtain a strongly adaptive regret bound that is a factor of at least $\sqrt{\log(T)}$ better, where $T$ is the time horizon. Empirical results show that our algorithm outperforms state-of-the-art methods in learning with expert advice and metric learning scenarios.

* fixed a few typos

Inference of High-dimensional Autoregressive Generalized Linear Models

Jun 24, 2017

Eric C. Hall, Garvesh Raskutti, Rebecca Willett

Abstract: Vector autoregressive models characterize a variety of time series in which linear combinations of current and past observations can be used to accurately predict future observations. For instance, each element of an observation vector could correspond to a different node in a network, and the parameters of an autoregressive model would correspond to the impact of the network structure on the time series evolution. Often these models are used successfully in practice to learn the structure of social, epidemiological, financial, or biological neural networks. However, little is known about statistical guarantees on estimates of such models in non-Gaussian settings. This paper addresses the inference of the autoregressive parameters and associated network structure within a generalized linear model framework that includes Poisson and Bernoulli autoregressive processes. At the heart of this analysis is a sparsity-regularized maximum likelihood estimator. While sparsity regularization is well-studied in the statistics and machine learning communities, those analysis methods cannot be applied to autoregressive generalized linear models because of the correlations and potential heteroscedasticity inherent in the observations. Sample complexity bounds are derived using a combination of martingale concentration inequalities and modern empirical process techniques for dependent random variables. These bounds, which are supported by several simulation studies, characterize the impact of various network parameters on estimator performance.

* Submitted to IEEE Transactions on Information Theory

Algebraic Variety Models for High-Rank Matrix Completion

Mar 28, 2017

Greg Ongie, Rebecca Willett, Robert D. Nowak, Laura Balzano

Abstract: We consider a generalization of low-rank matrix completion to the case where the data belong to an algebraic variety, i.e., each data point is a solution to a system of polynomial equations. In this case the original matrix is possibly high-rank, but it becomes low-rank after mapping each column to a higher dimensional space of monomial features. Many well-studied extensions of linear models, including affine subspaces and their union, can be described by a variety model. In addition, varieties can be used to model a richer class of nonlinear quadratic and higher degree curves and surfaces. We study the sampling requirements for matrix completion under a variety model with a focus on a union of affine subspaces. We also propose an efficient matrix completion algorithm that minimizes a convex or non-convex surrogate of the rank of the matrix of monomial features. Our algorithm uses the well-known "kernel trick" to avoid working directly with the high-dimensional monomial matrix. We show the proposed algorithm is able to recover synthetically generated data up to the predicted sampling complexity bounds. The proposed algorithm also outperforms standard low rank matrix completion and subspace clustering techniques in experiments with real data.
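
The claim that a high-rank matrix becomes low-rank after a monomial feature map, and that the kernel trick avoids forming that map, can be checked on a toy variety. The sketch below uses points on the unit circle, which satisfy the single polynomial equation $x_1^2 + x_2^2 - 1 = 0$. It is not the paper's completion algorithm; the degree-2 feature map, the polynomial kernel $(1 + x^\top y)^2$, and the helper `numerical_rank` are assumptions chosen for illustration.

```python
import numpy as np

def numerical_rank(M, rel_tol=1e-9):
    """Count singular values above a relative threshold (hypothetical helper)."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, size=50)
X = np.vstack([np.cos(theta), np.sin(theta)])  # 2 x 50, columns on the unit circle

# Explicit monomial features of degree at most 2: 1, x1, x2, x1^2, x1*x2, x2^2.
x1, x2 = X
Phi = np.vstack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])  # 6 x 50

# Kernel trick: the Gram matrix of (suitably weighted) degree-2 monomial
# features equals the polynomial kernel (1 + x^T y)^2, computed from X alone.
G = (1.0 + X.T @ X) ** 2                       # 50 x 50

rank_X = numerical_rank(X)      # 2: full rank in the original representation
rank_Phi = numerical_rank(Phi)  # 5: one linear relation, x1^2 + x2^2 - 1 = 0
rank_G = numerical_rank(G)      # 5: same rank, without forming Phi
print(rank_X, rank_Phi, rank_G)
```

Each polynomial equation satisfied by the data removes one dimension from the span of the monomial features, which is the source of the low-rank structure.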

On Learning High Dimensional Structured Single Index Models

Nov 29, 2016

Nikhil Rao, Ravi Ganti, Laura Balzano, Rebecca Willett, Robert Nowak

Abstract: Single Index Models (SIMs) are simple yet flexible semi-parametric models for machine learning, where the response variable is modeled as a monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function that relates features to observations. While methods have been described to learn SIMs in the low dimensional regime, a method that can efficiently learn SIMs in high dimensions, and under general structural assumptions, has not been forthcoming. In this paper, we propose computationally efficient algorithms for SIM inference in high dimensions with structural constraints. Our general approach specializes to sparsity, group sparsity, and low-rank assumptions among others. Experiments show that the proposed method enjoys superior predictive performance when compared to generalized linear models, and achieves results comparable to or better than single layer feedforward neural networks with significantly less computational cost.

* 7 pages, 3 tables, 1 Figure, substantial text overlap with arXiv:1506.08910; Accepted for publication at AAAI 2017; added new experimental results comparing our method to a single layer neural network
