In environmental epidemiology, it is critically important to identify the subpopulations that are most vulnerable to the adverse effects of air pollution so that targeted interventions can be developed. In recent years, there have been many methodological developments for addressing heterogeneity of treatment effects in causal inference. A common approach is to estimate the conditional average treatment effect (CATE) for a pre-specified covariate set. However, this approach does not provide an easy-to-interpret tool for identifying susceptible subpopulations, nor does it help discover new subpopulations that were not defined a priori by the researchers. In this paper, we propose a new causal rule ensemble (CRE) method that offers two features simultaneously: 1) it ensures interpretability by revealing heterogeneous treatment effect structures in terms of decision rules, and 2) it provides CATE estimates with high statistical precision, similar to causal machine learning algorithms. We provide theoretical results that guarantee consistency of the estimated causal effects for the newly discovered causal rules. Furthermore, via simulations, we show that the CRE method performs competitively in its ability to discover subpopulations and then accurately estimate the causal effects. We also develop a new sensitivity analysis method that examines robustness to unmeasured confounding bias. Lastly, we apply the CRE method to the study of the effects of long-term exposure to air pollution on the 5-year mortality rate of the New England Medicare-enrolled population in the United States. Code is available at https://github.com/kwonsang/causal_rule_ensemble.
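To make the notion of a rule-defined subgroup effect concrete, the following minimal Python sketch computes a difference-in-means estimate within the subgroup selected by a single decision rule. It is an illustration under strong assumptions (a binary exposure that is as-if randomized within the subgroup, and a hand-written rule), not the CRE method itself; the function name `rule_effect`, the toy records, and the age threshold are all hypothetical.

```python
from statistics import mean


def rule_effect(records, rule):
    """Treated-minus-control difference in mean outcome among units satisfying `rule`.

    Naive estimator for illustration only: it assumes exposure is as-if
    randomized within the subgroup. It is NOT the CRE estimator.
    """
    treated = [r["y"] for r in records if rule(r) and r["z"] == 1]
    control = [r["y"] for r in records if rule(r) and r["z"] == 0]
    if not treated or not control:
        raise ValueError("rule selects no treated or no control units")
    return mean(treated) - mean(control)


# Hypothetical toy data: (age, z = exposure indicator, y = died within 5 years).
rows = [
    (80, 1, 1), (82, 1, 1), (79, 1, 0), (85, 1, 1),
    (81, 0, 0), (77, 0, 1), (90, 0, 0), (78, 0, 0),
    (66, 1, 0), (68, 1, 1),
    (67, 0, 0), (70, 0, 1),
]
records = [{"age": a, "z": z, "y": y} for a, z, y in rows]

# One decision rule and its complement define two candidate subgroups.
older = rule_effect(records, lambda r: r["age"] >= 75)
younger = rule_effect(records, lambda r: r["age"] < 75)
print(older, younger)
```

In this toy data the estimated effect is larger in the subgroup selected by the rule than in its complement, which is the kind of contrast a rule-based summary is meant to expose.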
Thanks to the increasing availability of granular, yet high-dimensional, firm-level data, machine learning (ML) algorithms have been successfully applied to multiple research questions related to firm dynamics. In particular, supervised learning (SL), the branch of ML dealing with the prediction of labelled outcomes, has been used to better predict firms' performance. In this contribution, we illustrate a series of SL approaches for prediction tasks that are relevant at different stages of the company life cycle. The stages we focus on are (i) startup and innovation, (ii) growth and performance of companies, and (iii) firms' exit from the market. First, we review SL implementations to predict successful startups and R&D projects. Next, we describe how SL tools can be used to analyze company growth and performance. Finally, we review SL applications to better forecast financial distress and company failure. In the concluding section, we extend the discussion of SL methods in the light of targeted policies, result interpretability, and causality.
This paper introduces a Bayesian machine learning algorithm for drawing inference on heterogeneous causal effects in the presence of imperfect compliance (i.e., under an irregular assignment mechanism). We show, through Monte Carlo simulations, that the proposed Bayesian Causal Forest with Instrumental Variable (BCF-IV) algorithm outperforms other machine learning techniques tailored for causal inference (namely, Generalized Random Forest and Causal Trees with Instrumental Variable) in estimating the causal effects. Moreover, we show that it converges to optimal asymptotic performance in discovering the drivers of heterogeneity in a simulated scenario. BCF-IV sheds light on the heterogeneity of causal effects in instrumental variable scenarios and, in turn, provides policy-makers with a relevant tool for targeted policies. In an empirical application, we evaluate the effects of additional funding on students' performance. The results indicate that BCF-IV could be used to enhance the effectiveness of school funding on students' performance by 3.2 to 3.5 times.
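As a point of reference for what a subgroup-level effect under imperfect compliance means, the sketch below computes the classical Wald (ratio) estimate within a subgroup: the intention-to-treat effect on the outcome divided by the intention-to-treat effect on treatment receipt. Under the standard instrumental variable assumptions (relevance, exclusion restriction, monotonicity), this ratio targets the effect among compliers. It is not the BCF-IV algorithm; the function name `subgroup_wald`, the field names, and the toy data are hypothetical.

```python
from statistics import mean


def subgroup_wald(records, rule):
    """Wald (ratio) estimate of the complier effect among units satisfying `rule`.

    Each record holds z (assignment / instrument), w (treatment actually
    received) and y (outcome). Illustration only, NOT the BCF-IV estimator.
    """
    sub = [r for r in records if rule(r)]
    z1 = [r for r in sub if r["z"] == 1]
    z0 = [r for r in sub if r["z"] == 0]
    if not z1 or not z0:
        raise ValueError("subgroup lacks assigned or non-assigned units")
    # Intention-to-treat effects of assignment on the outcome and on receipt.
    itt_y = mean([r["y"] for r in z1]) - mean([r["y"] for r in z0])
    itt_w = mean([r["w"] for r in z1]) - mean([r["w"] for r in z0])
    if itt_w == 0:
        raise ValueError("assignment does not shift treatment receipt in this subgroup")
    return itt_y / itt_w


# Hypothetical toy data: (group, z, w, y).
rows = [
    ("a", 1, 1, 4.0), ("a", 1, 1, 6.0), ("a", 1, 0, 2.0), ("a", 1, 1, 4.0),
    ("a", 0, 0, 2.0), ("a", 0, 0, 3.0), ("a", 0, 1, 4.0), ("a", 0, 0, 3.0),
    ("b", 1, 1, 3.0), ("b", 1, 1, 5.0),
    ("b", 0, 0, 3.0), ("b", 0, 0, 4.0),
]
records = [{"g": g, "z": z, "w": w, "y": y} for g, z, w, y in rows]

effect_a = subgroup_wald(records, lambda r: r["g"] == "a")
effect_b = subgroup_wald(records, lambda r: r["g"] == "b")
print(effect_a, effect_b)
```

Comparing the ratio across subgroups gives a crude picture of effect heterogeneity; a method such as the one described above would instead learn the subgroups from the data rather than take them as given.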