Abstract: We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees are missing even for simple latent variable models such as Gaussian mixtures. Formally, we are given censored data from a mixture of univariate Gaussians $$\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2),$$ i.e. a sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within error $\varepsilon$.
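To make the observation model concrete, here is a minimal sketch of how censored mixture data arise: a draw from the mixture is kept only if it lands inside $S$. Passing $S$ as an indicator function is an assumption made purely for illustration.

```python
import numpy as np

def censored_mixture_samples(n, weights, means, sigma, in_S, rng=None):
    """Observation model sketch: draw from the mixture
    sum_i w_i N(mu_i, sigma^2); a draw is observed only if it lies in S.
    (S is given as an indicator function -- an illustrative assumption.)"""
    rng = np.random.default_rng(rng)
    out = []
    while len(out) < n:
        i = rng.choice(len(weights), p=weights)   # pick a component
        x = rng.normal(means[i], sigma)           # draw from N(mu_i, sigma^2)
        if in_S(x):                               # censoring: keep only if x in S
            out.append(x)
    return np.array(out)

# example: two components, observed only on the interval [0, 3]
xs = censored_mixture_samples(1000, [0.5, 0.5], [-1.0, 1.0], 1.0,
                              lambda x: 0.0 <= x <= 3.0)
```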




Abstract: The combinatorial problem of learning directed acyclic graphs (DAGs) from data was recently framed as a purely continuous optimization problem by leveraging a differentiable acyclicity characterization of DAGs based on the trace of a matrix exponential. Existing acyclicity characterizations build on the idea that powers of an adjacency matrix contain information about walks and cycles. In this work, we propose a $\textit{fundamentally different}$ acyclicity characterization based on the log-determinant (log-det) function, which leverages the nilpotency property of DAGs. To deal with the inherent asymmetries of a DAG, we relate the domain of our log-det characterization to the set of $\textit{M-matrices}$, a key difference from the classical log-det function defined over the cone of positive definite matrices. Like previously proposed acyclicity functions, our characterization is exact and differentiable. Compared to existing characterizations, however, our log-det function: (1) is better at detecting large cycles; (2) has better-behaved gradients; and (3) runs about an order of magnitude faster in practice. On the optimization side, we drop the typically used augmented Lagrangian scheme and propose DAGMA ($\textit{Directed Acyclic Graphs via M-matrices for Acyclicity}$), a method that resembles the central path of barrier methods. Each point on the central path of DAGMA is the solution to an unconstrained problem regularized by our log-det function; we then show that in the limit of the central path the solution is guaranteed to be a DAG. Finally, we provide extensive experiments for $\textit{linear}$ and $\textit{nonlinear}$ SEMs and show that our approach achieves significant speed-ups and smaller structural Hamming distances than state-of-the-art methods.
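As a concrete illustration, below is a minimal numpy sketch of a log-det acyclicity function, assuming it takes the form $h(W) = -\log\det(sI - W\circ W) + d\log s$ for $s > 0$. If $W$ is the weighted adjacency matrix of a DAG, then $W\circ W$ is nilpotent, $\det(sI - W\circ W) = s^d$, and $h(W) = 0$; any cycle makes $h(W) > 0$.

```python
import numpy as np

def h_logdet(W, s=1.0):
    """Log-det acyclicity sketch: h(W) = -log det(sI - W*W) + d log s,
    where W*W is the elementwise square. Zero iff W encodes a DAG.
    (Assumed form for illustration, following the log-det idea above.)"""
    d = W.shape[0]
    sign, logdet = np.linalg.slogdet(s * np.eye(d) - W * W)
    assert sign > 0, "sI - W*W left the M-matrix domain; increase s"
    return -logdet + d * np.log(s)

# a DAG gives h == 0, while a 2-cycle is detected (h > 0)
W_dag   = np.array([[0.0, 0.8], [0.0, 0.0]])
W_cycle = np.array([[0.0, 0.8], [0.9, 0.0]])
print(h_logdet(W_dag), h_logdet(W_cycle))  # 0.0, positive
```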




Abstract: We prove identifiability of a broad class of deep latent variable models that (a) have universal approximation capabilities and (b) are the decoders of variational autoencoders commonly used in practice. Unlike existing work, our analysis does not require weak supervision, auxiliary information, or conditioning in the latent space. Recently, there has been a surge of works studying identifiability of such models. In these works, the main assumption is that, along with the data, an auxiliary variable $u$ (also known as side information) is observed as well. At the same time, several works have empirically observed that this does not seem to be necessary in practice. In this work, we explain this behavior by showing that for a broad class of generative (i.e. unsupervised) models with universal approximation capabilities, the side information $u$ is not necessary: we prove identifiability of the entire generative model when we do not observe $u$ and only observe the data $x$. The models we consider are tightly connected with autoencoder architectures used in practice that leverage mixture priors in the latent space and ReLU/leaky-ReLU activations in the encoder. Our main result is an identifiability hierarchy that significantly generalizes previous work and exposes how different assumptions lead to different "strengths" of identifiability. For example, our weakest result establishes (unsupervised) identifiability up to an affine transformation, which already improves on existing work. It is well known that these models have universal approximation capabilities; moreover, they have been extensively used in practice to learn representations of data.
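To make the model class concrete, here is a minimal sketch of the kind of generative model in question: a mixture-of-Gaussians prior in the latent space pushed through a piecewise-affine (leaky-ReLU) network. The layer shapes and the `decoder_weights` parameter are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def sample_generative(n, pi, mus, decoder_weights, rng=None):
    """Sketch: z drawn from a Gaussian mixture prior, pushed through a
    piecewise-affine network f to give x = f(z). (Illustrative only;
    the precise architecture studied in the paper may differ.)"""
    rng = np.random.default_rng(rng)
    k, d = mus.shape
    comp = rng.choice(k, size=n, p=pi)            # mixture components
    z = mus[comp] + rng.standard_normal((n, d))   # latent samples
    x = z
    for W in decoder_weights:
        h = x @ W
        x = np.where(h > 0, h, 0.1 * h)           # leaky-ReLU layer
    return x

# example: 2 components in a 2-d latent space, one hidden layer
x = sample_generative(500, [0.5, 0.5], np.array([[-2., 0.], [2., 0.]]),
                      [np.random.randn(2, 3)])
```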




Abstract: We introduce and study the neighbourhood lattice decomposition of a distribution, a compact, non-graphical representation of conditional independence that is valid in the absence of a faithful graphical representation. The idea is to view the set of neighbourhoods of a variable as a subset lattice, and to partition this lattice into convex sublattices, each of which directly encodes a collection of conditional independence relations. We show that this decomposition exists in any compositional graphoid and can be computed efficiently and consistently in high dimensions. In particular, this gives a way to encode all of the independence relations implied by a distribution that satisfies the composition axiom, which is strictly weaker than the faithfulness assumption typically made by graphical approaches. We also discuss various special cases, such as graphical models and projection lattices, each of which has an intuitive interpretation. Along the way, we show how this problem is closely related to neighbourhood regression, which has been extensively studied in the context of graphical models and structural equations.
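For context, neighbourhood regression (mentioned above) estimates the neighbourhood of a node by sparse regression on the remaining variables. Below is a minimal lasso-based sketch in the Meinshausen-Bühlmann style; the choice of estimator and the support threshold are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def neighbourhood(X, j):
    """Neighbourhood regression sketch: regress X[:, j] on all other
    columns and read off the support of the sparse coefficient vector.
    (Lasso and the 1e-8 threshold are assumptions for illustration.)"""
    others = [i for i in range(X.shape[1]) if i != j]
    coef = LassoCV(cv=5).fit(X[:, others], X[:, j]).coef_
    return [others[i] for i in np.flatnonzero(np.abs(coef) > 1e-8)]
```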

Abstract: We study the problem of learning nonparametric distributions in a finite mixture, and establish a super-polynomial lower bound on the sample complexity of learning the component distributions in such models. Namely, we are given i.i.d. samples from $f$ where $$ f=\sum_{i=1}^k w_i f_i, \quad\sum_{i=1}^k w_i=1, \quad w_i>0 $$ and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_i)\cap \text{supp}(\nu_j)=\emptyset$. Our main result shows that $\Omega((\frac{1}{\varepsilon})^{C\log\log \frac{1}{\varepsilon}})$ samples are required for estimating each $f_i$. The proof relies on a fast rate for approximation with Gaussians, which may be of independent interest. This result has important implications for the hardness of learning more general nonparametric latent variable models that arise in machine learning applications.
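To illustrate the identifying assumption, a draw from $f_i = \mathcal{N}(0,\sigma^2) * \nu_i$ is simply a draw from $\nu_i$ plus independent Gaussian noise. A minimal sketch follows; the `nu_sampler` callback is a hypothetical stand-in for a sampler of $\nu_i$.

```python
import numpy as np

def sample_component(n, nu_sampler, sigma=1.0, rng=None):
    """Convolution sketch: f_i = N(0, sigma^2) * nu_i, so draw Y ~ nu_i
    and return X = Y + Z with independent Z ~ N(0, sigma^2).
    (`nu_sampler` is a hypothetical sampler for the density nu_i.)"""
    rng = np.random.default_rng(rng)
    return nu_sampler(n, rng) + sigma * rng.standard_normal(n)

# example: nu_i uniform on [2, 3] (compactly supported)
xs = sample_component(1000, lambda n, rng: rng.uniform(2.0, 3.0, n))
```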



Abstract: We study the optimal sample complexity of learning a Gaussian directed acyclic graph (DAG) from observational data. Our main result establishes the minimax optimal sample complexity for learning the structure of a linear Gaussian DAG model with equal variances to be $n\asymp q\log(d/q)$, where $q$ is the maximum number of parents and $d$ is the number of nodes. We further make comparisons with the classical problem of learning (undirected) Gaussian graphical models, showing that under the equal variance assumption, these two problems share the same optimal sample complexity. In other words, at least for Gaussian models with equal error variances, learning a directed graphical model is not more difficult than learning an undirected graphical model. Our results also extend to more general identification assumptions as well as to subgaussian errors.




Abstract: Motivated by empirical arguments that are well known from the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate SNP in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification for this technique as a way to trade off velocity against veracity. Second, we investigate how mixed models can correct for confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding, population stratification and environmental confounding, and study how different methods that are commonly used in practice trade off these two sources of confounding.
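For reference, a standard construction of the kinship matrix is the genomic relationship matrix $K = GG^\top/m$ computed from a column-standardized genotype matrix. The sketch below also shows the variant that leaves the candidate SNP out, which is the alternative to the inclusion practice studied above; it is a minimal illustration, not the paper's exact estimator.

```python
import numpy as np

def kinship(G):
    """Genomic relationship matrix from a standardized genotype matrix
    G (n individuals x m SNPs): K = G G^T / m. (Standard construction,
    shown for illustration.)"""
    return G @ G.T / G.shape[1]

def kinship_excluding(G, j):
    """Kinship recomputed with the candidate SNP j left out, the
    counterpart to including the candidate SNP in K."""
    return kinship(np.delete(G, j, axis=1))
```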




Abstract: Context-specific Bayesian networks (i.e. directed acyclic graphs, DAGs) identify context-dependent relationships between variables, but the non-convexity induced by the acyclicity requirement makes it difficult to share information between context-specific estimators (e.g. with graph generator functions). For this reason, existing methods for inferring context-specific Bayesian networks have favored breaking datasets into subsamples, limiting statistical power and resolution and preventing the use of multidimensional and latent contexts. To overcome this challenge, we propose NOTEARS-optimized Mixtures of Archetypal DAGs (NOTMAD). NOTMAD models context-specific Bayesian networks as the output of a function that learns to mix archetypal networks according to sample context. The archetypal networks are estimated jointly with the context-specific networks and do not require any prior knowledge. We encode the acyclicity constraint as a smooth regularization loss that is back-propagated to the mixing function; in this way, NOTMAD shares information between context-specific acyclic graphs, enabling the estimation of Bayesian network structures and parameters even at single-sample resolution. We demonstrate the utility of NOTMAD and sample-specific network inference through analysis and experiments, including patient-specific gene expression networks that correspond to morphological variation in cancer.
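A minimal sketch of the two core ingredients described above: mixing archetypal adjacency matrices with context-dependent weights, and the smooth NOTEARS-style acyclicity penalty $h(W)=\operatorname{tr}(e^{W\circ W})-d$ that can be back-propagated through the mix. The `mixer` callback (e.g. a softmax head) is an illustrative assumption.

```python
import numpy as np
from scipy.linalg import expm

def notears_penalty(W):
    """Smooth acyclicity penalty h(W) = tr(exp(W*W)) - d, which is zero
    iff W is the weighted adjacency matrix of a DAG."""
    return np.trace(expm(W * W)) - W.shape[0]

def context_network(context, archetypes, mixer):
    """Context-specific network as a mix of archetypal DAGs.
    `mixer` maps a context vector to mixture weights (hypothetical
    stand-in for the learned mixing function); `archetypes` has shape
    (n_archetypes, d, d)."""
    weights = mixer(context)                        # shape: (n_archetypes,)
    W = np.tensordot(weights, archetypes, axes=1)   # weighted sum of DAGs
    return W, notears_penalty(W)
```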




Abstract: Greedy algorithms have long been a workhorse for learning graphical models, and more broadly for learning statistical models with sparse structure. In the context of learning directed acyclic graphs, greedy algorithms are popular despite their worst-case exponential runtime; in practice, however, they are very efficient. We provide new insight into this phenomenon by studying a general greedy score-based algorithm for learning DAGs. Unlike edge-greedy algorithms such as the popular GES and hill-climbing algorithms, our approach is vertex-greedy and requires at most a polynomial number of score evaluations. We then show how recent polynomial-time algorithms for learning DAG models are a special case of this algorithm, thereby illustrating how these order-based algorithms can be rigorously interpreted as score-based algorithms. This observation suggests new score functions and optimality conditions based on the duality between Bregman divergences and exponential families, which we explore in detail. We also derive explicit sample and computational complexity bounds. Finally, we provide extensive experiments suggesting that this algorithm indeed optimizes the score in a variety of settings.
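One well-known order-based special case makes the vertex-greedy idea concrete: under an equal-variance linear Gaussian SEM, each greedy step can pick the variable with the smallest conditional variance given the vertices chosen so far. The sketch below illustrates that step only; it is not the paper's general score.

```python
import numpy as np

def greedy_order(X):
    """Vertex-greedy ordering sketch: repeatedly append the variable
    with the smallest conditional variance given the variables chosen
    so far (valid under an equal-variance linear Gaussian SEM; an
    illustration of the vertex-greedy idea, not the general algorithm)."""
    Sigma = np.cov(X, rowvar=False)
    order, rest = [], list(range(X.shape[1]))
    while rest:
        best, best_var = None, np.inf
        for j in rest:
            if order:
                S = Sigma[np.ix_(order, order)]
                s = Sigma[np.ix_(order, [j])]
                cvar = Sigma[j, j] - float(s.T @ np.linalg.solve(S, s))
            else:
                cvar = Sigma[j, j]        # no conditioning set yet
            if cvar < best_var:
                best, best_var = j, cvar
        order.append(best)
        rest.remove(best)
    return order
```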


Abstract: We analyze the complexity of learning directed acyclic graphical models from observational data in general settings without specific distributional assumptions. Our approach is information-theoretic and uses a local Markov boundary search procedure to recursively construct ancestral sets in the underlying graphical model. Perhaps surprisingly, we show that for certain graph ensembles, a simple forward greedy search algorithm (i.e. without a backward pruning phase) suffices to learn the Markov boundary of each node. This substantially improves the sample complexity, which we show is at most polynomial in the number of nodes. We then apply this to learn the entire graph under a novel identifiability condition that generalizes existing conditions from the literature. Of independent interest, we establish finite-sample guarantees for the problem of recovering Markov boundaries from data. Moreover, we apply our results to the special case of polytrees, for which the assumptions simplify, and provide explicit conditions under which polytrees are identifiable and learnable in polynomial time. We further illustrate the performance of the algorithm, which is easy to implement, in a simulation study. Our approach is general, works for discrete or continuous distributions without distributional assumptions, and as such sheds light on the minimal assumptions required to efficiently learn the structure of directed graphical models from data.
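A minimal sketch of a forward greedy Markov boundary search (no backward pruning), as described above. The `score` callback and the stopping threshold are illustrative assumptions: `score(j, S)` stands in for any estimate of the conditional dependence between variable $j$ and the target given the current set $S$ (e.g. conditional mutual information), which should be near zero under conditional independence.

```python
def forward_markov_boundary(score, candidates, tol=1e-3):
    """Forward greedy Markov boundary search sketch: repeatedly add the
    candidate with the largest conditional dependence on the target
    given the current set S, stopping when no addition exceeds `tol`.
    (`score` and `tol` are hypothetical stand-ins for illustration.)"""
    S = []
    while True:
        gains = {j: score(j, S) for j in candidates if j not in S}
        if not gains:
            break
        j_best = max(gains, key=gains.get)
        if gains[j_best] <= tol:          # no remaining dependence found
            break
        S.append(j_best)
    return S
```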