John Lafferty

University of Chicago

Conditional Sparse Coding and Grouped Multivariate Regression

Jun 27, 2012

Min Xu, John Lafferty

Abstract:We study the problem of multivariate regression where the data are naturally grouped, and a regression matrix is to be estimated for each group. We propose an approach in which a dictionary of low rank parameter matrices is estimated across groups, and a sparse linear combination of the dictionary elements is estimated to form a model within each group. We refer to the method as conditional sparse coding since it is a coding procedure for the response vectors Y conditioned on the covariate vectors X. This approach captures the shared information across the groups while adapting to the structure within each group. It exploits the same intuition behind sparse coding that has been successfully developed in computer vision and computational neuroscience. We propose an algorithm for conditional sparse coding, analyze its theoretical properties in terms of predictive accuracy, and present the results of simulation and brain imaging experiments that compare the new technique to reduced rank regression.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

Sequential Nonparametric Regression

Jun 27, 2012

Haijie Gu, John Lafferty

Abstract:We present algorithms for nonparametric regression in settings where the data are obtained sequentially. While traditional estimators select bandwidths that depend upon the sample size, for sequential data the effective sample size is dynamically changing. We propose a linear time algorithm that adjusts the bandwidth for each new data point, and show that the estimator achieves the optimal minimax rate of convergence. We also propose the use of online expert mixing algorithms to adapt to unknown smoothness of the regression function. We provide simulations that confirm the theoretical results, and demonstrate the effectiveness of the methods.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
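
The idea of giving each new data point its own bandwidth can be illustrated with a recursive kernel estimator. This is a minimal sketch, not the paper's algorithm: the class name `SequentialKernelRegressor`, the fixed evaluation grid, the Gaussian kernel, and the schedule `h_t = scale * t^(-1/5)` (the classical rate for twice-differentiable functions in one dimension) are all assumptions made for illustration.

```python
import math


class SequentialKernelRegressor:
    """Recursive kernel regression on a fixed grid (illustrative sketch)."""

    def __init__(self, grid, scale=1.0):
        self.grid = list(grid)
        self.scale = scale  # hypothetical tuning constant
        self.t = 0          # number of points seen so far
        self.num = [0.0] * len(self.grid)  # running sum of kernel weight * y
        self.den = [0.0] * len(self.grid)  # running sum of kernel weight

    def update(self, x, y):
        """Fold in one observation using a bandwidth tied to its arrival index."""
        self.t += 1
        h = self.scale * self.t ** (-0.2)  # assumed schedule: h_t ~ t^(-1/5)
        for j, g in enumerate(self.grid):
            k = math.exp(-0.5 * ((g - x) / h) ** 2) / h
            self.num[j] += k * y
            self.den[j] += k

    def predict(self):
        """Current estimate of the regression function at each grid point."""
        return [n / d if d > 0 else float("nan")
                for n, d in zip(self.num, self.den)]


model = SequentialKernelRegressor(grid=[0.0, 0.5, 1.0])
for i in range(50):
    x = (i % 10) / 10.0
    model.update(x, 2.0)  # constant response, so every estimate should be 2.0
estimates = model.predict()
```

Each update costs time proportional to the grid size and never revisits earlier points, so the total cost grows linearly with the number of observations.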

Sparse Additive Functional and Kernel CCA

Jun 18, 2012

Sivaraman Balakrishnan, Kriti Puniyani, John Lafferty

Abstract:Canonical Correlation Analysis (CCA) is a classical tool for finding correlations among the components of two random vectors. In recent years, CCA has been widely applied to the analysis of genomic data, where it is common for researchers to perform multiple assays on a single set of patient samples. Recent work has proposed sparse variants of CCA to address the high dimensionality of such data. However, classical and sparse CCA are based on linear models, and are thus limited in their ability to find general correlations. In this paper, we present two approaches to high-dimensional nonparametric CCA, building on recent developments in high-dimensional nonparametric regression. We present estimation procedures for both approaches, and analyze their theoretical properties in the high-dimensional setting. We demonstrate the effectiveness of these procedures in discovering nonlinear correlations via extensive simulations, as well as through experiments with genomic data.

* ICML 2012

Forest Density Estimation

Oct 20, 2010

Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, Larry Wasserman

Abstract:We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models.

* Extended version of earlier paper titled "Tree density estimation"
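
The forest-selection step mentioned in the abstract relies on Kruskal's algorithm, which can be sketched as follows. This is an illustration under stated assumptions: the function name `max_weight_forest`, the `max_edges` cap, and the edge weights in the example are hypothetical. In the paper the weights would come from kernel density estimates of the marginals and the forest size would be chosen on held-out data, neither of which is reproduced here.

```python
def max_weight_forest(num_nodes, weighted_edges, max_edges=None):
    """Greedy Kruskal: take the heaviest edges that do not close a cycle.

    weighted_edges is a list of (weight, u, v) tuples with 0 <= u, v < num_nodes.
    max_edges optionally caps the forest size (a stand-in complexity knob).
    """
    parent = list(range(num_nodes))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    forest = []
    for w, u, v in sorted(weighted_edges, reverse=True):
        if max_edges is not None and len(forest) >= max_edges:
            break
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two different trees, so no cycle is created
            parent[ru] = rv
            forest.append((u, v, w))
    return forest


# Hypothetical pairwise scores for four variables.
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 0, 2), (0.1, 2, 3)]
full = max_weight_forest(4, edges)
small = max_weight_forest(4, edges, max_edges=2)
```

Stopping early with `max_edges` yields a forest rather than a spanning tree, which is one simple way to trade model complexity against fit.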

Union Support Recovery in Multi-task Learning

Aug 31, 2010

Mladen Kolar, John Lafferty, Larry Wasserman

Abstract:We sharply characterize the performance of different penalization schemes for the problem of selecting the relevant variables in the multi-task setting. Previous work focuses on the regression problem where conditions on the design matrix complicate the analysis. A clearer and simpler picture emerges by studying the Normal means model. This model, often used in the field of statistics, is a simplified model that provides a laboratory for studying complex procedures.

Graph-Valued Regression

Jun 21, 2010

Han Liu, Xi Chen, John Lafferty, Larry Wasserman

Abstract:Undirected graphical models encode in a graph $G$ the dependency structure of a random vector $Y$. In many applications, it is of interest to model $Y$ given another random vector $X$ as input. We refer to the problem of estimating the graph $G(x)$ of $Y$ conditioned on $X=x$ as ``graph-valued regression.'' In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph. We call the method ``Graph-optimized CART,'' or Go-CART. We study the theoretical properties of Go-CART using dyadic partitioning trees, establishing oracle inequalities on risk minimization and tree partition consistency. We also demonstrate the application of Go-CART to a meteorological dataset, showing how graph-valued regression can provide a useful tool for analyzing complex data.

The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

Mar 03, 2009

Han Liu, John Lafferty, Larry Wasserman

Abstract:Recent methods for estimating sparse undirected graphs for real-valued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula--or "nonparanormal"--for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of one-dimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method's theoretical properties, and show that it works well in many examples.
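
One common way to realize a Gaussian copula transformation is to map each variable through its empirical CDF and then through the standard normal quantile function. The sketch below shows that idea for a single variable. It is not the paper's estimator: the function name `normal_scores` and the `rank / (n + 1)` scaling are assumptions, and any truncation or smoothing the authors apply is omitted.

```python
import math
from statistics import NormalDist


def normal_scores(values):
    """Map one variable to approximately standard normal scores via its ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r  # ties are broken by original position
    std_normal = NormalDist()
    # rank / (n + 1) keeps the argument strictly inside (0, 1).
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


raw = [10.0, 1.0, 100.0]
scores = normal_scores(raw)
# A monotone transformation of the data leaves the scores unchanged.
scores_after_log = normal_scores([math.log(v) for v in raw])
```

After transforming every variable this way, a sparse Gaussian graph estimator could be applied to the transformed data. That second stage is not shown.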

Time Varying Undirected Graphs

Apr 29, 2008

Shuheng Zhou, John Lafferty, Larry Wasserman

Abstract:Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using $\ell_1$ penalization methods. However, current methods assume that the data are independent and identically distributed. If the distribution, and hence the graph, evolves over time, then the data are no longer identically distributed. In this paper, we show how to estimate the sequence of graphs for non-identically distributed data, where the distribution evolves over time.

* The 21st Annual Conference on Learning Theory (COLT 2008), Helsinki, Finland
* 12 pages, 3 figures, to appear in COLT 2008

Compressed Regression

Jan 11, 2008

Shuheng Zhou, John Lafferty, Larry Wasserman

Abstract:Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. This line of work shows that $\ell_1$-regularized least squares regression can accurately estimate a sparse linear model from $n$ noisy examples in $p$ dimensions, even if $p$ is much larger than $n$. In this paper we study a variant of this problem where the original $n$ input variables are compressed by a random linear transformation to $m \ll n$ examples in $p$ dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for $\ell_1$-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called ``sparsistence.'' In addition, we show that $\ell_1$-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called ``persistence.'' Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the mutual information between the compressed and uncompressed data that decay to zero.

* IEEE Transactions on Information Theory, Volume 55, No.2, pp 846--866, 2009
* 59 pages, 5 figures, submitted for review
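
The compression step described in the abstract can be sketched as multiplying both the design matrix and the response by the same random matrix. This is only an illustration: the function name `compress`, the seed argument, and the choice of i.i.d. Gaussian entries with variance 1/n are assumptions, and the subsequent $\ell_1$-regularized fit on the compressed pair is not shown.

```python
import random


def compress(X, y, m, seed=0):
    """Return (Phi X, Phi y) for a random m-by-n Gaussian matrix Phi."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Assumption: i.i.d. N(0, 1/n) entries; other scalings are possible.
    Phi = [[rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(n)]
           for _ in range(m)]
    X_c = [[sum(Phi[i][k] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(m)]
    y_c = [sum(Phi[i][k] * y[k] for k in range(n)) for i in range(m)]
    return X_c, y_c


# Toy noiseless data: n = 4 examples, p = 2 dimensions, compressed to m = 2 rows.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
beta = [3.0, -2.0]
y = [sum(b * v for b, v in zip(beta, row)) for row in X]
X_c, y_c = compress(X, y, m=2)
```

Because the map is linear, a noiseless linear relationship between `X` and `y` carries over exactly to `X_c` and `y_c`, which is why regression on the compressed data can still target the same coefficients.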

A Model of Lexical Attraction and Repulsion

Jun 16, 1997

Doug Beeferman, Adam Berger, John Lafferty

Abstract:This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. Empirical data drawn from English and Japanese text, as well as conversational speech, reveals that the ``attraction'' between words decays exponentially, while stylistic and syntactic constraints create a ``repulsion'' between words that discourages close co-occurrence. We show that these characteristics are well described by simple mixture models based on two-stage exponential distributions which can be trained using the EM algorithm. The resulting distance distributions can then be incorporated as penalizing features in an exponential language model.

* 8 pages, LaTeX source and postscript figures for ACL/EACL'97 paper
