Pradeep Ravikumar

Responsible AI (RAI) Games and Ensembles

Oct 28, 2023

Yash Gupta, Runtian Zhai, Arun Suggala, Pradeep Ravikumar

Abstract: Several recent works have studied the societal effects of AI; these include issues such as fairness, robustness, and safety. In many of these objectives, a learner seeks to minimize its worst-case loss over a set of predefined distributions (known as uncertainty sets), with typical examples being perturbed versions of the empirical distribution. In other words, the aforementioned problems can be written as min-max problems over these uncertainty sets. In this work, we provide a general framework for studying these problems, which we refer to as Responsible AI (RAI) games. We provide two classes of algorithms for solving these games: (a) game-play based algorithms, and (b) greedy stagewise estimation algorithms. The former class is motivated by online learning and game theory, whereas the latter class is motivated by the classical statistical literature on boosting and regression. We empirically demonstrate the applicability and competitive performance of our techniques for solving several RAI problems, particularly around subpopulation shift.
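
The min-max formulation in the abstract above can be illustrated with a small toy game. The sketch below is not the paper's algorithm: it assumes a hypothetical problem with two subpopulations, each with a quadratic loss in a scalar parameter `theta`, where an adversary reweights the groups with a multiplicative-weights (Hedge) update and the learner best-responds to the current weights. All names and constants (`CURVATURE`, `CENTER`, `eta`, `rounds`) are illustrative assumptions.

```python
import math

# Hypothetical toy problem: group g has loss a_g * (theta - c_g)^2.
CURVATURE = [1.0, 2.0]
CENTER = [0.0, 3.0]


def group_losses(theta):
    """Per-group losses at the scalar parameter theta."""
    return [a * (theta - c) ** 2 for a, c in zip(CURVATURE, CENTER)]


def best_response(weights):
    """Learner: exact minimizer of the weighted loss sum_g w_g * a_g * (theta - c_g)^2."""
    num = sum(w * a * c for w, a, c in zip(weights, CURVATURE, CENTER))
    den = sum(w * a for w, a in zip(weights, CURVATURE))
    return num / den


def play_game(rounds=300, eta=0.05):
    """Alternate a Hedge update for the adversary with a learner best response."""
    weights = [0.5, 0.5]  # adversary starts from the uniform mixture of groups
    theta = best_response(weights)
    for _ in range(rounds):
        losses = group_losses(theta)
        # Adversary: upweight groups that currently suffer a larger loss.
        weights = [w * math.exp(eta * loss) for w, loss in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Learner: minimize the loss under the adversary's current mixture.
        theta = best_response(weights)
    return theta, weights


theta, weights = play_game()
worst_case_loss = max(group_losses(theta))
print(theta, weights, worst_case_loss)
```

In this toy instance the iterates settle where the two group losses are equal, which is the minimizer of the worst-case loss (theta = 6 - 3√2 ≈ 1.757). The uncertainty set here is simply all mixtures of the two groups; richer uncertainty sets would need a different adversary update.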

Sample based Explanations via Generalized Representers

Oct 27, 2023

Che-Ping Tsai, Chih-Kuan Yeh, Pradeep Ravikumar

Abstract: We propose a general class of sample based explanations of machine learning models, which we term generalized representers. To measure the effect of a training sample on a model's test prediction, generalized representers use two components: a global sample importance that quantifies the importance of the training point to the model and is invariant to test samples, and a local sample importance that measures similarity between the training sample and the test point with a kernel. A key contribution of the paper is to show that generalized representers are the only class of sample based explanations satisfying a natural set of axiomatic properties. We discuss approaches to extract global importances given a kernel, and also natural choices of kernels given modern non-linear models. As we show, many popular existing sample based explanations could be cast as generalized representers with particular choices of kernels and approaches to extract global importances. Additionally, we conduct empirical comparisons of different generalized representers on two image and two text classification datasets.

* Accepted by NeurIPS 2023
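
The two components described in the abstract above (a test-independent global importance and a kernel-based local similarity) can be sketched as follows. This is only an illustration: it assumes the two components combine as a product, uses an RBF kernel as one possible kernel choice, and the importance values in `global_importance` are made-up numbers, not the output of any extraction method from the paper.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Local similarity between a training point x and a test point z (assumed RBF kernel)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def representer_scores(train_points, global_importance, test_point, kernel=rbf_kernel):
    """Score of each training sample for one test point: alpha_i * k(x_i, x_test)."""
    return [
        alpha * kernel(x, test_point)
        for x, alpha in zip(train_points, global_importance)
    ]


# Hypothetical data: three training points with made-up global importances.
train_points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
global_importance = [0.8, -0.5, 1.2]
test_point = (0.0, 0.0)

scores = representer_scores(train_points, global_importance, test_point)
for point, score in zip(train_points, scores):
    print(point, score)
```

Note that the third training point has the largest global importance but contributes almost nothing to this test point, because the kernel similarity decays with distance.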

Identifying Representations for Intervention Extrapolation

Oct 06, 2023

Sorawit Saengkyongam, Elan Rosenfeld, Pradeep Ravikumar, Niklas Pfister, Jonas Peters

Abstract: The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress in questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: predicting how interventions affect an outcome, even when those interventions are not observed at training time, and show that identifiable representations can provide an effective solution to this task even if the interventions affect the outcome non-linearly. Our setup includes an outcome Y, observed features X, which are generated as a non-linear transformation of latent features Z, and exogenous action variables A, which influence Z. The objective of intervention extrapolation is to predict how interventions on A that lie outside the training support of A affect Y. Here, extrapolation becomes possible if the effect of A on Z is linear and the residual when regressing Z on A has full support. As Z is latent, we combine the task of intervention extrapolation with identifiable representation learning, which we call Rep4Ex: we aim to map the observed features X into a subspace that allows for non-linear extrapolation in A. We show using Wiener's Tauberian theorem that the hidden representation is identifiable up to an affine transformation in Z-space, which is sufficient for intervention extrapolation. The identifiability is characterized by a novel constraint describing the linearity assumption of A on Z. Based on this insight, we propose a method that enforces the linear invariance constraint and can be combined with any type of autoencoder. We validate our theoretical findings through synthetic experiments and show that our approach succeeds in predicting the effects of unseen interventions.

iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models

Jun 30, 2023

Tianyu Chen, Kevin Bello, Bryon Aragam, Pradeep Ravikumar

Abstract: Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the true underlying directed acyclic graph (DAG) structure is often unknown, and determining it from observational or interventional data remains a challenging task. However, in many situations, the end goal is to identify changes (shifts) in causal mechanisms between related SCMs rather than recovering the entire underlying DAG structure. Examples include analyzing gene regulatory network structure changes between healthy and cancerous individuals or understanding variations in biological pathways under different cellular contexts. This paper focuses on identifying $\textit{functional}$ mechanism shifts in two or more related SCMs over the same set of variables -- $\textit{without estimating the entire DAG structure of each SCM}$. Prior work under this setting assumed linear models with Gaussian noise; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key contribution of this work is to show that the Jacobian of the score function for the $\textit{mixture distribution}$ allows for identification of shifts in general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach.

* 38 pages, 17 figures

Global Optimality in Bivariate Gradient-based DAG Learning

Jun 30, 2023

Chang Deng, Kevin Bello, Bryon Aragam, Pradeep Ravikumar

Abstract: Recently, a new class of non-convex optimization problems motivated by the statistical problem of learning an acyclic directed graphical model from data has attracted significant interest. While existing work uses standard first-order optimization schemes to solve this problem, proving the global optimality of such approaches has proven elusive. The difficulty lies in the fact that unlike other non-convex problems in the literature, this problem is not "benign", and possesses multiple spurious solutions that standard approaches can easily get trapped in. In this paper, we prove that a simple path-following optimization scheme globally converges to the global minimum of the population loss in the bivariate setting.

* 39 pages, 13 figures

Learning Linear Causal Representations from Interventions under General Nonlinear Mixing

Jun 04, 2023

Simon Buchholz, Goutham Rajendran, Elan Rosenfeld, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar

Abstract: We study the problem of learning causal representations from unknown, latent interventions in a general setting, where the latent distribution is Gaussian but the mixing function is completely general. We prove strong identifiability results given unknown single-node interventions, i.e., without having access to the intervention targets. This generalizes prior works which have focused on weaker classes, such as linear maps or paired counterfactual data. This is also the first instance of causal identifiability from non-paired interventions for deep neural network embeddings. Our proof relies on carefully uncovering the high-dimensional geometric structure present in the data distribution after a non-linear density transformation, which we capture by analyzing quadratic forms of precision matrices of the latent distributions. Finally, we propose a contrastive algorithm to identify the latent variables in practice and evaluate its performance on various tasks.

* 38 pages

Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation

Jun 01, 2023

Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar

Abstract: Good data augmentation is one of the key factors that lead to the empirical success of self-supervised representation learning such as contrastive learning and masked language modeling, yet theoretical understanding of its role in learning good representations remains limited. Recent work has built the connection between self-supervised learning and approximating the top eigenspace of a graph Laplacian operator. Learning a linear probe on top of such features can naturally be connected to RKHS regression. In this work, we use this insight to perform a statistical analysis of augmentation-based pretraining. We start from the isometry property, a key geometric characterization of the target function given by the augmentation. Our first main theorem provides, for an arbitrary encoder, near-tight bounds for both the estimation error incurred by fitting the linear probe on top of the encoder, and the approximation error entailed by the fitness of the RKHS the encoder learns. Our second main theorem specifically addresses the case where the encoder extracts the top-d eigenspace of a Monte-Carlo approximation of the underlying kernel with the finite pretraining samples. Our analysis completely disentangles the effects of the model and the augmentation. A key ingredient in our analysis is the augmentation complexity, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance on synthetic and real datasets.

* 33 pages

Representer Point Selection for Explaining Regularized High-dimensional Models

May 31, 2023

Che-Ping Tsai, Jiong Zhang, Eli Chien, Hsiang-Fu Yu, Cho-Jui Hsieh, Pradeep Ravikumar

Abstract: We introduce a novel class of sample-based explanations, which we term high-dimensional representers, that can be used to explain the predictions of a regularized high-dimensional model in terms of importance weights for each of the training samples. Our workhorse is a novel representer theorem for general regularized high-dimensional models, which decomposes the model prediction in terms of contributions from each of the training samples, with positive (negative) values corresponding to training samples that have a positive (negative) impact on the model's prediction. We derive consequences for the canonical instances of $\ell_1$ regularized sparse models, and nuclear norm regularized low-rank models. As a case study, we further investigate the application of low-rank models in the context of collaborative filtering, where we instantiate high-dimensional representers for specific popular classes of models. Finally, we study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets. We also showcase the utility of high-dimensional representers in explaining model recommendations.

* Accepted by ICML 2023

Optimizing NOTEARS Objectives via Topological Swaps

May 26, 2023

Chang Deng, Kevin Bello, Bryon Aragam, Pradeep Ravikumar

Abstract: Recently, an intriguing class of non-convex optimization problems has emerged in the context of learning directed acyclic graphs (DAGs). These problems involve minimizing a given loss or score function, subject to a non-convex continuous constraint that penalizes the presence of cycles in a graph. In this work, we delve into the optimization challenges associated with this class of non-convex programs. To address these challenges, we propose a bi-level algorithm that leverages the non-convex constraint in a novel way. The outer level of the algorithm optimizes over topological orders by iteratively swapping pairs of nodes within the topological order of a DAG. A key innovation of our approach is the development of an effective method for generating a set of candidate swapping pairs for each iteration. At the inner level, given a topological order, we utilize off-the-shelf solvers that can handle linear constraints. The key advantage of our proposed algorithm is that it is guaranteed to find a local minimum or a KKT point under weaker conditions compared to previous work and finds solutions with lower scores. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in terms of achieving a better score. Additionally, our method can also be used as a post-processing algorithm to significantly improve the score of other algorithms. Code implementing the proposed method is available at https://github.com/duntrain/topo.

* 39 pages, 12 figures, ICML 2023
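
For readers unfamiliar with the continuous acyclicity constraint mentioned in the abstract above, the sketch below evaluates the standard NOTEARS function h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W describes a DAG. It is not taken from the linked repository: the matrix exponential is approximated by a truncated power series in plain Python (the `terms` cutoff is an assumption that suits small matrices), and the example matrices are made up.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def acyclicity(weights, terms=30):
    """h(W) = tr(exp(W * W)) - d, with exp approximated by a truncated series."""
    d = len(weights)
    squared = [[w * w for w in row] for row in weights]  # elementwise square
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace_sum = 0.0
    for k in range(terms):
        if k > 0:
            power = matmul(power, squared)  # power now holds (W * W)^k
        trace_sum += sum(power[i][i] for i in range(d)) / math.factorial(k)
    return trace_sum - d


# Hypothetical examples: an upper-triangular (acyclic) graph and a 2-cycle.
dag = [[0.0, 1.5, -2.0], [0.0, 0.0, 0.7], [0.0, 0.0, 0.0]]
cycle = [[0.0, 1.0], [1.0, 0.0]]

dag_h = acyclicity(dag)
cycle_h = acyclicity(cycle)
print(dag_h, cycle_h)
```

The acyclic example evaluates to zero, while the 2-cycle gives a strictly positive value (2·cosh(1) − 2), which is what lets the constraint penalize cycles in a continuous optimization.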

Learning with Explanation Constraints

Mar 25, 2023

Rattana Pukdee, Dylan Sam, J. Zico Kolter, Maria-Florina Balcan, Pradeep Ravikumar

Abstract: While supervised learning assumes the presence of labeled data, we may have prior information about how models should behave. In this paper, we formalize this notion as learning from explanation constraints and provide a learning theoretic framework to analyze how such explanations can improve the learning of our models. For what models would explanations be helpful? Our first key contribution addresses this question via the definition of what we call EPAC models (models that satisfy these constraints in expectation over new data), and we analyze this class of models using standard learning theoretic tools. Our second key contribution is to characterize these restrictions (in terms of their Rademacher complexities) for a canonical class of explanations given by gradient information for linear models and two-layer neural networks. Finally, we provide an algorithmic solution for our framework, via a variational approximation that achieves better performance and satisfies these constraints more frequently, when compared to simpler augmented Lagrangian methods to incorporate these explanations. We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
