Vasilis Syrgkanis

Omitted Variable Bias in Machine Learned Causal Models

Dec 29, 2021

Victor Chernozhukov, Carlos Cinelli, Whitney Newey, Amit Sharma, Vasilis Syrgkanis

Abstract: We derive general, yet simple, sharp bounds on the size of the omitted variable bias for a broad class of causal parameters that can be identified as linear functionals of the conditional expectation function of the outcome. Such functionals encompass many of the traditional targets of investigation in causal inference studies, such as (weighted) averages of potential outcomes, average treatment effects (including subgroup effects, such as the effect on the treated), (weighted) average derivatives, and policy effects from shifts in covariate distribution -- all for general, nonparametric causal models. Our construction relies on the Riesz-Frechet representation of the target functional. Specifically, we show how the bound on the bias depends only on the additional variation that the latent variables create both in the outcome and in the Riesz representer for the parameter of interest. Moreover, in many important cases (e.g., average treatment effects in partially linear models, or in nonseparable models with a binary treatment) the bound is shown to depend on two easily interpretable quantities: the nonparametric partial $R^2$ (Pearson's "correlation ratio") of the unobserved variables with the treatment and with the outcome. Therefore, simple plausibility judgments on the maximum explanatory power of omitted variables (in explaining treatment and outcome variation) are sufficient to place overall bounds on the size of the bias. Finally, leveraging debiased machine learning, we provide flexible and efficient statistical inference methods to estimate the components of the bounds that are identifiable from the observed distribution.

* This version of the paper was prepared for the NeurIPS-2021 Workshop "Causal Inference & Machine Learning: Why now?"; 32 pages; 4 figures; typos corrected
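
To make the Riesz representation mentioned in the abstract above concrete, consider the average treatment effect with a binary treatment $D$ and observed covariates $X$. This is a standard textbook instance, not the paper's general result; the symbols $g$, $\alpha$, $\pi$ and $W = (D, X)$ are notation chosen here for illustration, and overlap ($0 < \pi(X) < 1$) is assumed.

```latex
\begin{align*}
\theta &= \mathbb{E}\big[g(1,X) - g(0,X)\big],
  & g(d,x) &= \mathbb{E}[Y \mid D=d,\, X=x], \\
\theta &= \mathbb{E}\big[\alpha(W)\, g(W)\big],
  & \alpha(W) &= \frac{D}{\pi(X)} - \frac{1-D}{1-\pi(X)},
  \quad \pi(x) = \Pr(D=1 \mid X=x).
\end{align*}
```

The second line follows from iterated expectations: conditional on $X$, the term $D\,g(D,X)/\pi(X)$ averages to $g(1,X)$, and likewise for the control arm.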

Robust Generalized Method of Moments: A Finite Sample Viewpoint

Oct 13, 2021

Dhruv Rohatgi, Vasilis Syrgkanis

Abstract: For many inference problems in statistics and econometrics, the unknown parameter is identified by a set of moment conditions. A generic method of solving moment conditions is the Generalized Method of Moments (GMM). However, classical GMM estimation is potentially very sensitive to outliers. Robustified GMM estimators have been developed in the past, but suffer from several drawbacks: computational intractability, poor dimension-dependence, and no quantitative recovery guarantees in the presence of a constant fraction of outliers. In this work, we develop the first computationally efficient GMM estimator (under intuitive assumptions) that can tolerate a constant $\epsilon$ fraction of adversarially corrupted samples, and that has an $\ell_2$ recovery guarantee of $O(\sqrt{\epsilon})$. To achieve this, we draw upon and extend a recent line of work on algorithmic robust statistics for related but simpler problems such as mean estimation, linear regression and stochastic optimization. As two examples of the generality of our algorithm, we show how our estimation algorithm and assumptions apply to instrumental variables linear and logistic regression. Moreover, we experimentally validate that our estimator outperforms classical IV regression and two-stage Huber regression on synthetic and semi-synthetic datasets with corruption.

* 24 pages, 1 figure
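
As a point of reference for the abstract above, the following is a minimal sketch of classical (non-robust) linear IV estimation written as a just-identified GMM problem, together with a toy corruption of 1% of the samples. It is not the paper's robust estimator; the function name `iv_gmm`, the simulated data, and the corruption pattern are all assumptions made for illustration.

```python
import numpy as np

def iv_gmm(y, x, z):
    """Just-identified linear IV as GMM.

    Solves the sample moment conditions mean(z_i * (y_i - x_i' theta)) = 0,
    whose closed form is theta = (Z'X)^{-1} Z'y.
    """
    return np.linalg.solve(z.T @ x, z.T @ y)

rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved confounder
x = z + u + rng.normal(size=n)               # endogenous regressor
y = 1.5 * x + 2.0 * u + rng.normal(size=n)   # true coefficient on x is 1.5

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

theta_iv = iv_gmm(y, X, Z)                         # close to 1.5 on clean data
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # biased by the confounder

# Toy corruption: overwrite outcome and instrument for 1% of the samples.
k = n // 100
y_bad, Z_bad = y.copy(), Z.copy()
y_bad[:k] = 1e3
Z_bad[:k, 1] = 10.0
theta_iv_bad = iv_gmm(y_bad, X, Z_bad)             # slope is pulled far from 1.5

print(theta_iv[1], theta_ols[1], theta_iv_bad[1])
```

On clean data the IV slope recovers the true coefficient while OLS does not; after the corruption the classical IV slope moves far from it, which is the sensitivity the abstract refers to.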

RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests

Oct 12, 2021

Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, Vasilis Syrgkanis

Abstract: Many causal and policy effects of interest are defined by linear functionals of high-dimensional or non-parametric regression functions. $\sqrt{n}$-consistent and asymptotically normal estimation of the object of interest requires debiasing to reduce the effects of regularization and/or model selection on the object of interest. Debiasing is typically achieved by adding a correction term to the plug-in estimator of the functional, which is derived from a functional-specific theoretical derivation of what is known as the influence function and which leads to properties such as double robustness and Neyman orthogonality. We instead implement an automatic debiasing procedure based on automatically learning the Riesz representation of the linear functional using Neural Nets and Random Forests. Our method solely requires value query oracle access to the linear functional. We propose a multi-tasking Neural Net debiasing method with stochastic gradient descent minimization of a combined Riesz representer and regression loss, while sharing representation layers for the two functions. We also propose a Random Forest method which learns a locally linear representation of the Riesz function. Even though our methodology applies to arbitrary functionals, we experimentally find that it beats state-of-the-art performance of the prior neural-net-based estimator of Shi et al. (2019) for the case of the average treatment effect functional. We also evaluate our method on the more challenging problem of estimating average marginal effects with continuous treatments, using semi-synthetic data of gasoline price changes on gasoline demand.
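
The idea of learning a Riesz representer using only value queries of the functional can be sketched with a linear model instead of a neural net or forest. The example below minimizes the empirical Riesz loss, mean of $\alpha(W)^2 - 2\,m(W;\alpha)$, over $\alpha = \phi^\top\rho$ for the average treatment effect functional $m(W; g) = g(1,X) - g(0,X)$. The feature map `features`, the function `fit_linear_riesz`, and the randomized-treatment simulation are hypothetical choices made here, not the RieszNet or ForestRiesz implementations.

```python
import numpy as np

def features(d, x):
    """Hypothetical feature map phi(d, x): arm indicators interacted with [1, x]."""
    base = np.column_stack([np.ones_like(x), x])
    return np.column_stack([d[:, None] * base, (1.0 - d)[:, None] * base])

def fit_linear_riesz(d, x):
    """Minimize mean(alpha(W)^2 - 2 m(W; alpha)) over alpha = phi' rho.

    For a linear alpha the loss is rho' G rho - 2 M' rho, so rho = G^{-1} M,
    where G = mean(phi phi') and M = mean(m(W; phi)).
    """
    phi = features(d, x)
    # Value queries of the ATE functional on each feature: phi(1, x) - phi(0, x).
    m_phi = features(np.ones_like(d), x) - features(np.zeros_like(d), x)
    G = phi.T @ phi / len(d)
    M = m_phi.mean(axis=0)
    return np.linalg.solve(G, M)

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
d = rng.binomial(1, 0.5, size=n).astype(float)   # randomized treatment, pi = 0.5

rho = fit_linear_riesz(d, x)
alpha_hat = features(d, x) @ rho

# With pi = 0.5 the known representer D/pi - (1-D)/(1-pi) equals +2 / -2.
print(alpha_hat[d == 1].mean(), alpha_hat[d == 0].mean())
```

No propensity model is fitted anywhere in this sketch: the inverse-propensity form of the representer emerges from the loss and the functional's value queries alone.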

DoWhy: Addressing Challenges in Expressing and Validating Causal Assumptions

Aug 27, 2021

Amit Sharma, Vasilis Syrgkanis, Cheng Zhang, Emre Kıcıman

Abstract: Estimation of causal effects involves crucial assumptions about the data-generating process, such as directionality of effect, presence of instrumental variables or mediators, and whether all relevant confounders are observed. Violation of any of these assumptions leads to significant error in the effect estimate. However, unlike cross-validation for predictive models, there is no global validator method for a causal estimate. As a result, expressing different causal assumptions formally and validating them (to the extent possible) becomes critical for any analysis. We present DoWhy, a framework that allows explicit declaration of assumptions through a causal graph and provides multiple validation tests to check a subset of these assumptions. Our experience with DoWhy highlights a number of open questions for future research: developing new ways beyond causal graphs to express assumptions, the role of causal discovery in learning relevant parts of the graph, and developing validation tests that can better detect errors, both for average and conditional treatment effects. DoWhy is available at https://github.com/microsoft/dowhy.

* Presented at ICML 2021 Workshop on the Neglected Assumptions in Causal Inference (NACI)
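
One kind of validation test for a causal estimate is a placebo-treatment check: replace the treatment with a shuffled copy and confirm that the estimated effect collapses toward zero. The sketch below implements this idea in plain NumPy and does not use DoWhy's API; the helper `ols_effect`, the linear data-generating process, and the true effect of 2.0 are assumptions chosen for illustration.

```python
import numpy as np

def ols_effect(y, d, x):
    """Coefficient on d from regressing y on [1, d, x].

    This is a backdoor adjustment for the observed confounder x,
    valid here only because the simulated model is linear.
    """
    design = np.column_stack([np.ones(len(y)), d, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)                  # observed confounder
d = x + rng.normal(size=n)              # treatment depends on x
y = 2.0 * d + x + rng.normal(size=n)    # true effect of d on y is 2.0

effect = ols_effect(y, d, x)

# Placebo check: a shuffled treatment has no causal link to y,
# so the same estimator should return an effect near zero.
placebo_effect = ols_effect(y, rng.permutation(d), x)

print(effect, placebo_effect)
```

A placebo estimate far from zero would suggest a problem with the estimator or its assumptions; a value near zero is reassuring but does not prove that all confounders are observed.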

Incentivizing Compliance with Algorithmic Instruments

Jul 28, 2021

Daniel Ngo, Logan Stapleton, Vasilis Syrgkanis, Zhiwei Steven Wu

Abstract: Randomized experiments can be susceptible to selection bias due to potential non-compliance by the participants. While much of the existing work has studied compliance as a static behavior, we propose a game-theoretic model to study compliance as dynamic behavior that may change over time. In rounds, a social planner interacts with a sequence of heterogeneous agents who arrive with their unobserved private type that determines both their prior preferences across the actions (e.g., control and treatment) and their baseline rewards without taking any treatment. The planner provides each agent with a randomized recommendation that may alter their beliefs and their action selection. We develop a novel recommendation mechanism that views the planner's recommendation as a form of instrumental variable (IV) that only affects an agent's action selection, but not the observed rewards. We construct such IVs by carefully mapping the history -- the interactions between the planner and the previous agents -- to a random recommendation. Even though the initial agents may be completely non-compliant, our mechanism can incentivize compliance over time, thereby enabling the estimation of the treatment effect of each treatment, and minimizing the cumulative regret of the planner whose goal is to identify the optimal treatment.

* In Proceedings of the Thirty-eighth International Conference on Machine Learning (ICML 2021), 17 pages of main text, 53 pages total, 3 figures

Knowledge Distillation as Semiparametric Inference

Apr 20, 2021

Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

Abstract: A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model. Surprisingly, this two-step knowledge distillation process often leads to higher accuracy than training the student directly on labeled data. To explain and enhance this phenomenon, we cast knowledge distillation as a semiparametric inference problem with the optimal student model as the target, the unknown Bayes class probabilities as nuisance, and the teacher probabilities as a plug-in nuisance estimate. By adapting modern semiparametric tools, we derive new guarantees for the prediction error of standard distillation and develop two enhancements -- cross-fitting and loss correction -- to mitigate the impact of teacher overfitting and underfitting on student performance. We validate our findings empirically on both tabular and image data and observe consistent improvements from our knowledge distillation enhancements.

Estimating the Long-Term Effects of Novel Treatments

Mar 15, 2021

Keith Battocchi, Eleanor Dillon, Maggie Hei, Greg Lewis, Miruna Oprescu, Vasilis Syrgkanis

Abstract: Policy makers typically face the problem of wanting to estimate the long-term effects of novel treatments, while only having historical data of older treatment options. We assume access to a long-term dataset where only past treatments were administered and a short-term dataset where novel treatments have been administered. We propose a surrogate-based approach where we assume that the long-term effect is channeled through a multitude of available short-term proxies. Our work combines three major recent techniques in the causal machine learning literature: surrogate indices, dynamic treatment effect estimation and double machine learning, in a unified pipeline. We show that our method is consistent and provides root-n asymptotically normal estimates under a Markovian assumption on the data and the observational policy. We use a dataset from a major corporation that includes customer investments over a three-year period to create a semi-synthetic data distribution where the major qualitative properties of the real dataset are preserved. We evaluate the performance of our method and discuss practical challenges of deploying our formal methodology and how to address them.

Evidence-Based Policy Learning

Mar 12, 2021

Jann Spiess, Vasilis Syrgkanis

Abstract: The past years have seen the development and deployment of machine-learning algorithms to estimate personalized treatment-assignment policies from randomized controlled trials. Yet such algorithms for the assignment of treatment typically optimize expected outcomes without taking into account that treatment assignments are frequently subject to hypothesis testing. In this article, we explicitly take significance testing of the effect of treatment-assignment policies into account, and consider assignments that optimize the probability of finding a subset of individuals with a statistically significant positive treatment effect. We provide an efficient implementation using decision trees, and demonstrate its gain over selecting subsets based on positive (estimated) treatment effects. Compared to standard tree-based regression and classification tools, this approach tends to yield substantially higher power in detecting subgroups with positive treatment effects.

Adversarial Estimation of Riesz Representers

Dec 30, 2020

Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

Abstract: We provide an adversarial approach to estimating Riesz representers of linear functionals within arbitrary function spaces. We prove oracle inequalities based on the localized Rademacher complexity of the function space used to approximate the Riesz representer and the approximation error. These inequalities imply fast finite sample mean-squared-error rates for many function spaces of interest, such as high-dimensional sparse linear functions, neural networks and reproducing kernel Hilbert spaces. Our approach offers a new way of estimating Riesz representers with a plethora of recently introduced machine learning techniques. We show how our estimator can be used in the context of de-biasing structural/causal parameters in semi-parametric models, for automated orthogonalization of moment equations and for estimating the stochastic discount factor in the context of asset pricing.

Asymptotics of the Empirical Bootstrap Method Beyond Asymptotic Normality

Nov 23, 2020

Morgane Austern, Vasilis Syrgkanis

Abstract: One of the most commonly used methods for forming confidence intervals for statistical inference is the empirical bootstrap, which is especially expedient when the limiting distribution of the estimator is unknown. However, despite its ubiquitous role, its theoretical properties are still not well understood for non-asymptotically normal estimators. In this paper, under stability conditions, we establish the limiting distribution of the empirical bootstrap estimator, derive tight conditions for it to be asymptotically consistent, and quantify the speed of convergence. Moreover, we propose three alternative ways to use the bootstrap method to build confidence intervals with coverage guarantees. Finally, we illustrate the generality and tightness of our results by a series of examples, including uniform confidence bands, two-sample kernel tests, minmax stochastic programs and the empirical risk of stacked estimators.
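
For readers unfamiliar with the procedure, here is a minimal sketch of the empirical bootstrap, applied first to a well-behaved statistic (the sample mean) and then to a classic non-asymptotically-normal one (the sample maximum of uniform draws). The helper `empirical_bootstrap` and the choice of examples are assumptions made here and are not taken from the paper.

```python
import numpy as np

def empirical_bootstrap(sample, statistic, n_boot, rng):
    """Recompute `statistic` on n_boot resamples drawn with replacement."""
    n = len(sample)
    idx = rng.integers(0, n, size=(n_boot, n))   # one row of indices per resample
    return statistic(sample[idx], axis=1)

rng = np.random.default_rng(0)
sample = rng.uniform(size=500)

# Sample mean: the percentile interval behaves as expected.
boot_mean = empirical_bootstrap(sample, np.mean, 2_000, rng)
ci = np.percentile(boot_mean, [2.5, 97.5])

# Sample maximum: a resample contains the observed maximum with probability
# 1 - (1 - 1/n)^n, about 0.632, so the bootstrap distribution has a large
# point mass at the observed maximum.
boot_max = empirical_bootstrap(sample, np.max, 2_000, rng)
tie_fraction = np.mean(boot_max == sample.max())

print(ci, tie_fraction)
```

The point mass in the second case is a well-known example of the naive bootstrap failing to mimic a continuous limiting distribution, which is the kind of setting where consistency conditions matter.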