Alerting the public when heat may harm their health is a crucial service, especially considering that extreme heat events will be more frequent under climate change. Current practice for issuing heat alerts in the US does not take advantage of modern data science methods for optimizing local alert criteria. Specifically, application of reinforcement learning (RL) has the potential to inform more health-protective policies, accounting for regional and sociodemographic heterogeneity as well as sequential dependence of alerts. In this work, we formulate the issuance of heat alerts as a sequential decision-making problem and develop modifications to the RL workflow to address challenges commonly encountered in environmental health settings. Key modifications include creating a simulator that pairs hierarchical Bayesian modeling of low-signal health effects with sampling of real weather trajectories (exogenous features), constraining the total number of alerts issued as well as preventing alerts on less-hot days, and optimizing location-specific policies. Post-hoc contrastive analysis offers insights into scenarios in which using RL for heat alert issuance may protect public health better than the current or alternative policies. This work contributes to a broader movement of advancing data-driven policy optimization for public health and climate change adaptation.
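The two alert constraints named above (a cap on the total number of alerts and no alerts on less-hot days) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `constrained_alerts`, the per-day `scores` standing in for a learned policy's preference to alert, and the thresholds are all hypothetical.

```python
from typing import List, Sequence


def constrained_alerts(
    heat_index: Sequence[float],
    scores: Sequence[float],
    budget: int,
    min_heat: float,
    score_threshold: float = 0.5,
) -> List[bool]:
    """Apply two hard constraints on top of a (hypothetical) policy score.

    An alert is issued on a day only if:
      * the policy score exceeds `score_threshold` (stand-in for an RL policy),
      * the day is at least `min_heat` hot (no alerts on less-hot days), and
      * fewer than `budget` alerts have been issued so far.
    """
    alerts: List[bool] = []
    remaining = budget
    for heat, score in zip(heat_index, scores):
        issue = remaining > 0 and heat >= min_heat and score > score_threshold
        alerts.append(issue)
        if issue:
            remaining -= 1  # each alert consumes one unit of the budget
    return alerts


# Illustrative inputs only; these are not real weather data.
example = constrained_alerts(
    heat_index=[85, 95, 100, 102, 98],
    scores=[0.9, 0.2, 0.8, 0.7, 0.9],
    budget=2,
    min_heat=90,
)
```

Because each alert spends part of a fixed budget, alerting early reduces what is available for later, possibly hotter, days. This is one reason the problem is sequential rather than a day-by-day classification task.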
Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. It also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatment and covariates from diverse domains, including climate, health and social sciences. SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and the evaluation of machine learning and causal inference models. The SpaCE project provides several dozen datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions.
Policymakers are required to evaluate the health benefits of lowering the National Ambient Air Quality Standards (NAAQS; i.e., the safety standards) for fine particulate matter (PM$_{2.5}$) before implementing new policies. We formulate this objective as a shift-response function (SRF) and develop methods to estimate it using causal inference, specifically under the stochastic interventions framework. SRFs model the average change in an outcome of interest resulting from a hypothetical shift in the observed exposure distribution. We propose a new, broadly applicable doubly-robust method to learn SRFs using targeted regularization with neural networks. We evaluate our proposed method under various benchmarks specific to marginal estimates as a function of continuous exposure. Finally, we implement our estimator in the motivating application that considers the potential reduction in deaths from lowering the NAAQS from the current level of 12 $\mu g/m^3$ to levels recently proposed by the Environmental Protection Agency in the US (10, 9, and 8 $\mu g/m^3$).
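To make the idea of a shift-response function concrete, the sketch below computes a simple plug-in estimate on simulated data. It is not the doubly-robust, targeted-regularization estimator proposed in the abstract: it assumes an additive shift of the exposure by `delta`, a linear outcome model, and no unmeasured confounding, and the function name `plug_in_srf` and all simulated quantities are hypothetical.

```python
import numpy as np


def plug_in_srf(X, a, y, deltas):
    """Plug-in SRF estimate with a linear outcome model (illustrative only).

    For each delta, returns the mean of mu_hat(X, a - delta) - mu_hat(X, a),
    i.e. the estimated average change in the outcome if every unit's
    exposure were lowered by delta.
    """
    design = np.column_stack([np.ones(len(a)), a, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # ordinary least squares
    baseline = design @ coef
    changes = []
    for delta in deltas:
        shifted = np.column_stack([np.ones(len(a)), a - delta, X])
        changes.append(float(np.mean(shifted @ coef - baseline)))
    return np.array(changes)


# Simulated data: the exposure depends on a covariate (confounding by X[:, 0]),
# and the true effect of one unit of exposure on the outcome is 0.5.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
a = 10 + X[:, 0] + rng.normal(size=n)
y = 0.5 * a + X[:, 1] + rng.normal(scale=0.1, size=n)

srf = plug_in_srf(X, a, y, deltas=[0.0, 1.0, 2.0])
```

In this simulation the true change for a shift of `delta` is `-0.5 * delta`, so the estimates should be close to 0, -0.5, and -1. A plug-in estimate of this kind depends entirely on the outcome model being correct, which is the weakness that doubly-robust methods are designed to mitigate.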
In environmental epidemiology, it is critically important to identify subpopulations that are most vulnerable to the adverse effects of air pollution so we can develop targeted interventions. In recent years, there have been many methodological developments for addressing heterogeneity of treatment effects in causal inference. A common approach is to estimate the conditional average treatment effect (CATE) for a pre-specified covariate set. However, this approach does not provide an easy-to-interpret tool for identifying susceptible subpopulations or for discovering new subpopulations that are not defined a priori by the researchers. In this paper, we propose a new causal rule ensemble (CRE) method that simultaneously 1) ensures interpretability by revealing heterogeneous treatment effect structures in terms of decision rules and 2) provides CATE estimates with high statistical precision, similar to causal machine learning algorithms. We provide theoretical results that guarantee consistency of the estimated causal effects for the newly discovered causal rules. Furthermore, via simulations, we show that the CRE method is competitive in its ability to discover subpopulations and then accurately estimate the causal effects. We also develop a new sensitivity analysis method that examines robustness to unmeasured confounding bias. Lastly, we apply the CRE method to the study of the effects of long-term exposure to air pollution on the 5-year mortality rate of the New England Medicare-enrolled population in the United States. Code is available at https://github.com/kwonsang/causal_rule_ensemble.
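The notion of a treatment effect attached to a decision rule can be illustrated with a small simulation. This sketch is not the CRE method: the rule is written by hand rather than discovered from data, treatment is assumed to be randomized so that a difference in means is unbiased, and the name `rule_effect` and all simulated variables are hypothetical.

```python
import numpy as np


def rule_effect(y, z, rule_mask):
    """Difference-in-means treatment effect among units satisfying a rule.

    y: outcomes, z: binary treatment indicator (0/1),
    rule_mask: boolean array marking units for which the decision rule holds.
    Valid as a causal estimate only under randomized treatment (assumed here).
    """
    treated = y[rule_mask & (z == 1)]
    control = y[rule_mask & (z == 0)]
    return float(treated.mean() - control.mean())


# Simulated data: the treatment effect is 2.0 only when x1 == 1 and x2 == 0.
rng = np.random.default_rng(1)
n = 20000
x1 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 2, size=n)
z = rng.integers(0, 2, size=n)  # randomized treatment assignment
tau = np.where((x1 == 1) & (x2 == 0), 2.0, 0.0)
y = 1.0 + tau * z + rng.normal(scale=0.5, size=n)

in_rule = (x1 == 1) & (x2 == 0)  # hand-written decision rule
effect_in = rule_effect(y, z, in_rule)
effect_out = rule_effect(y, z, ~in_rule)
```

The rule "x1 == 1 and x2 == 0" reads directly as a description of a subpopulation, which is the sense in which rule-based summaries of heterogeneity are interpretable. In observational data such as the Medicare application, confounding adjustment would be needed in place of the raw difference in means.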
Fine particulate matter (PM$_{2.5}$) is one of the criteria air pollutants regulated by the Environmental Protection Agency in the United States. There is strong evidence that ambient exposure to PM$_{2.5}$ increases risk of mortality and hospitalization. Large-scale epidemiological studies on the health effects of PM$_{2.5}$ provide the necessary evidence base for lowering the safety standards and inform regulatory policy. However, ambient monitors of PM$_{2.5}$ (as well as monitors for other pollutants) are sparsely located across the U.S., and therefore studies based only on the levels of PM$_{2.5}$ measured from the monitors would inevitably exclude large portions of the population. One approach to resolving this issue has been developing models to predict local PM$_{2.5}$, NO$_2$, and ozone based on satellite, meteorological, and land use data. This typically involves developing a prediction model for air pollution levels in unmonitored areas, which relies on large amounts of input data and is highly computationally intensive. We have developed a flexible R package that allows environmental health researchers to design and train spatio-temporal models capable of predicting multiple pollutants, including PM$_{2.5}$. We utilize H2O, an open-source big data platform, to achieve both performance and scalability when used in conjunction with cloud or cluster computing systems.