Abstract: A class of causal effect functionals requires integration over conditional densities of continuous variables, as in mediation effects and nonparametric identification in causal graphical models. Estimating such densities and evaluating the resulting integrals can be statistically and computationally demanding. A common workaround is to discretize the variable and replace integrals with finite sums. Although convenient, discretization alters the population-level functional and can induce non-negligible approximation bias, even under correct identification. Under smoothness conditions, we show that this coarsening bias is first order in the bin width and arises at the level of the target functional, distinct from statistical estimation error. We propose a simple bias-reduced functional that evaluates the outcome regression at within-bin conditional means, eliminating the leading term and yielding a second-order approximation error. We derive plug-in and one-step estimators for the bias-reduced functional. Simulations demonstrate substantial bias reduction and near-nominal confidence interval coverage, even under coarse binning. Our results provide a simple framework for controlling the impact of variable discretization on parameter approximation and estimation.
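
The contrast between first-order and second-order approximation error can be seen in a small population-level toy calculation. The sketch below is not the paper's estimator or setup. It assumes a hypothetical continuous variable M with density p(m) = 2m on [0, 1], a hypothetical outcome regression mu(m) = exp(m), equal-width bins, and a naive coarsening that evaluates mu at each bin's left edge (one simple choice of representative point, made here only for illustration). The target is the integral of mu(m) p(m) over [0, 1], which equals 2 in closed form.

```python
import numpy as np

TRUE_VALUE = 2.0  # closed form: integral of exp(m) * 2m over [0, 1]


def coarsened_functionals(n_bins):
    """Return (naive, bias_reduced) coarsened versions of E[exp(M)].

    Toy assumptions (not from the paper): M has density 2m on [0, 1],
    the outcome regression is mu(m) = exp(m), bins have equal width.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    lo, hi = edges[:-1], edges[1:]
    # Bin probabilities P(M in B_k) under the density 2m.
    prob = hi**2 - lo**2
    # Within-bin conditional means E[M | M in B_k].
    cond_mean = (2.0 / 3.0) * (hi**3 - lo**3) / prob
    # Naive coarsening: evaluate mu at the left edge of each bin (assumed choice).
    naive = np.sum(prob * np.exp(lo))
    # Bias-reduced coarsening: evaluate mu at the within-bin conditional mean.
    reduced = np.sum(prob * np.exp(cond_mean))
    return naive, reduced


for k in (4, 8, 16):
    naive, reduced = coarsened_functionals(k)
    print(f"bins={k:2d}  naive error={abs(naive - TRUE_VALUE):.5f}  "
          f"bias-reduced error={abs(reduced - TRUE_VALUE):.5f}")
```

In this toy case, halving the bin width roughly halves the naive error and roughly quarters the bias-reduced error, which matches the first-order versus second-order behavior described above. Everything here is computed at the population level, so no statistical estimation error is involved.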




Abstract: Racial disparities in healthcare expenditures are well-documented, yet the underlying drivers remain complex and require further investigation. This study employs causal and counterfactual path-specific effects to quantify how various factors, including socioeconomic status, insurance access, health behaviors, and health status, mediate these disparities. Using data from the Medical Expenditure Panel Survey, we estimate how expenditures would differ under counterfactual scenarios in which the values of specific mediators were aligned across racial groups along selected causal pathways. A key challenge in this analysis is ensuring robustness against model misspecification while addressing the zero-inflation and right-skewness of healthcare expenditures. For reliable inference, we derive asymptotically linear estimators by integrating influence function-based techniques with flexible machine learning methods, including super learners and a two-part model for the expenditure outcome.
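
The idea behind a two-part model for a zero-inflated, right-skewed outcome is the decomposition E[Y | X] = P(Y > 0 | X) * E[Y | Y > 0, X]. The sketch below illustrates that decomposition on simulated data and is not the study's model. The binary covariate `group`, the data-generating parameters, and the helper `two_part_mean` are all hypothetical, and both parts are fit with simple within-group sample averages rather than the super learner or regression models an actual analysis would use.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical binary covariate and simulated zero-inflated, right-skewed outcome.
group = rng.integers(0, 2, size=n)
p_positive = np.where(group == 1, 0.8, 0.6)   # assumed chance of any spending
any_spend = rng.random(n) < p_positive
amount = rng.lognormal(mean=7.0 + 0.3 * group, sigma=1.0, size=n)
y = np.where(any_spend, amount, 0.0)


def two_part_mean(y, mask):
    """Two-part estimate of E[Y] within the rows selected by `mask`.

    Part 1: share of rows with a positive outcome.
    Part 2: mean outcome among the positive rows.
    """
    y_sub = y[mask]
    positive = y_sub > 0
    p_hat = positive.mean()          # estimate of P(Y > 0)
    m_hat = y_sub[positive].mean()   # estimate of E[Y | Y > 0]
    return p_hat * m_hat


for g in (0, 1):
    print(f"group={g}  two-part mean={two_part_mean(y, group == g):.2f}")
```

With saturated within-group averages, the two-part product reproduces the group sample mean exactly. The decomposition becomes useful once each part is modeled separately, since the probability of any expenditure and the size of positive expenditures can then depend on covariates in different ways.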