-
Bayesian Nonparametric Trees for Principal Causal Effects
Authors:
Chanmin Kim,
Corwin Zigler
Abstract:
Principal stratification analysis evaluates how causal effects of a treatment on a primary outcome vary across strata of units defined by their treatment effect on some intermediate quantity. This endeavor is substantially challenged when the intermediate variable is continuously scaled and there are infinitely many basic principal strata. We employ a Bayesian nonparametric approach to flexibly evaluate treatment effects across flexibly-modeled principal strata. The approach uses Bayesian Causal Forests (BCF) to simultaneously specify two Bayesian Additive Regression Tree models: one for principal stratum membership and one for the outcome, conditional on principal strata. We show how the capability of BCF for capturing treatment effect heterogeneity is particularly relevant for assessing how treatment effects vary across the surface defined by continuously-scaled principal strata, in addition to other benefits relating to targeted selection and regularization-induced confounding. The capabilities of the proposed approach are illustrated with a simulation study, and the methodology is deployed to investigate how causal effects of power plant emissions control technologies on ambient particulate pollution vary as a function of the technologies' impact on sulfur dioxide emissions.
Submitted 19 March, 2024;
originally announced March 2024.
-
Causal Analysis of Air Pollution Mixtures: Estimands, Positivity, and Extrapolation
Authors:
Joseph Antonelli,
Corwin Zigler
Abstract:
Causal inference for air pollution mixtures is an increasingly important issue with appreciable challenges. When the exposure is a multivariate mixture, there are many exposure contrasts that may be of nominal interest for causal effect estimation, but the complex joint mixture distribution often renders observed data extremely limited in their ability to inform estimates of many commonly-defined causal effects. We use potential outcomes to 1) define causal effects of air pollution mixtures, 2) formalize the key assumption of mixture positivity required for estimation and 3) offer diagnostic metrics for positivity violations in the mixture setting that allow researchers to assess the extent to which data can actually support estimation of mixture effects of interest. For settings where there is limited empirical support, we redefine causal estimands that apportion causal effects according to whether they can be directly informed by observed data versus rely entirely on model extrapolation, isolating key sources of information on the causal effect of an air pollution mixture. The ideas are deployed to assess the ability of a national United States data set on the chemical components of ambient particulate matter air pollution to support estimation of a variety of causal mixture effects.
Submitted 30 January, 2024;
originally announced January 2024.
-
Causal health impacts of power plant emission controls under modeled and uncertain physical process interference
Authors:
Nathan B. Wikle,
Corwin M. Zigler
Abstract:
Causal inference with spatial environmental data is often challenging due to the presence of interference: outcomes for observational units depend on some combination of local and non-local treatment. This is especially relevant when estimating the effect of power plant emissions controls on population health, as pollution exposure is dictated by (i) the location of point-source emissions and (ii) the transport of pollutants across space via dynamic physical-chemical processes. In this work, we estimate the effectiveness of air quality interventions at coal-fired power plants in reducing two adverse health outcomes in Texas in 2016: pediatric asthma ED visits and Medicare all-cause mortality. We develop methods for causal inference with interference when the underlying network structure is not known with certainty and instead must be estimated from ancillary data. Notably, uncertainty in the interference structure is propagated to the resulting causal effect estimates. We offer a Bayesian, spatial mechanistic model for the interference mapping, which we combine with a flexible non-parametric outcome model to marginalize estimates of causal effects over uncertainty in the structure of interference. Our analysis finds some evidence that emissions controls at upwind power plants reduce asthma ED visits and all-cause mortality; however, accounting for uncertainty in the interference renders the results largely inconclusive.
Submitted 13 May, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Statistical Inference for Complete and Incomplete Mobility Trajectories under the Flight-Pause Model
Authors:
Marcin Jurek,
Catherine A. Calder,
Corwin Zigler
Abstract:
We formulate a statistical flight-pause model for human mobility, represented by a collection of random objects, called motions, appropriate for mobile phone tracking (MPT) data. We develop the statistical machinery for parameter inference and trajectory imputation under various forms of missing data. We show that common assumptions about the missing data mechanism for MPT are not valid for the mechanism governing the random motions underlying the flight-pause model, representing an understudied missing data phenomenon. We demonstrate the consequences of missing data and our proposed adjustments in both simulations and real data, outlining implications for MPT data collection and design.
Submitted 30 June, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Weather2vec: Representation Learning for Causal Inference with Non-Local Confounding in Air Pollution and Climate Studies
Authors:
Mauricio Tec,
James Scott,
Corwin Zigler
Abstract:
Estimating the causal effects of a spatially-varying intervention on a spatially-varying outcome may be subject to non-local confounding (NLC), a phenomenon that can bias estimates when the treatments and outcomes of a given unit are dictated in part by the covariates of other nearby units. In particular, NLC is a challenge for evaluating the effects of environmental policies and climate events on health-related outcomes such as air pollution exposure. This paper first formalizes NLC using the potential outcomes framework, providing a comparison with the related phenomenon of causal interference. Then, it proposes a broadly applicable framework, termed "weather2vec", that uses the theory of balancing scores to learn representations of non-local information into a scalar or vector defined for each observational unit, which is subsequently used to adjust for confounding in conjunction with causal inference methods. The framework is evaluated in a simulation study and two case studies on air pollution where the weather is an (inherently regional) known confounder.
Submitted 11 December, 2022; v1 submitted 25 September, 2022;
originally announced September 2022.
-
Bayesian Nonparametric Adjustment of Confounding
Authors:
Chanmin Kim,
Mauricio Tec,
Corwin M Zigler
Abstract:
Analysis of observational studies increasingly confronts the challenge of determining which of a possibly high-dimensional set of available covariates are required to satisfy the assumption of ignorable treatment assignment for estimation of causal effects. We propose a Bayesian nonparametric approach that simultaneously 1) prioritizes inclusion of adjustment variables in accordance with existing principles of confounder selection; 2) estimates causal effects in a manner that permits complex relationships among confounders, exposures, and outcomes; and 3) provides causal estimates that account for uncertainty in the nature of confounding. The proposal relies on specification of multiple Bayesian Additive Regression Trees models, linked together with a common prior distribution that accrues posterior selection probability to covariates on the basis of association with both the exposure and the outcome of interest. A set of extensive simulation studies demonstrates that the proposed method performs well relative to similarly-motivated methodologies in a variety of scenarios. We deploy the method to investigate the causal effect of emissions from coal-fired power plants on ambient air pollution concentrations, where the prospect of confounding due to local and regional meteorological factors introduces uncertainty around the confounding role of a high-dimensional set of measured variables. Ultimately, we show that the proposed method produces more efficient and more consistent results across adjacent years than alternative methods, lending strength to the evidence of the causal relationship between SO2 emissions and ambient particulate pollution.
Submitted 22 March, 2022;
originally announced March 2022.
-
Bipartite Interference and Air Pollution Transport: Estimating Health Effects of Power Plant Interventions
Authors:
Corwin Zigler,
Vera Liu,
Fabrizia Mealli,
Laura Forastiere
Abstract:
Evaluating air quality interventions is confronted with the challenge of interference since interventions at a particular pollution source likely impact air quality and health at distant locations and air quality and health at any given location are likely impacted by interventions at many sources. The structure of interference in this context is dictated by complex atmospheric processes governing how pollution emitted from a particular source is transformed and transported across space, and can be cast with a bipartite structure reflecting the two distinct types of units: 1) interventional units on which treatments are applied or withheld to change pollution emissions; and 2) outcome units on which outcomes of primary interest are measured. We propose new estimands for bipartite causal inference with interference that construe two components of treatment: a "key-associated" (or "individual") treatment and an "upwind" (or "neighborhood") treatment. Estimation is carried out using a semi-parametric adjustment approach based on joint propensity scores. A reduced-complexity atmospheric model is deployed to characterize the structure of the interference network by modeling the movement of air parcels through time and space. The new methods are deployed to evaluate the effectiveness of installing flue-gas desulfurization scrubbers on 472 coal-burning power plants (the interventional units) in reducing Medicare hospitalizations among 21,577,552 Medicare beneficiaries residing across 25,553 ZIP codes in the United States (the outcome units).
Submitted 2 January, 2023; v1 submitted 8 December, 2020;
originally announced December 2020.
-
A Mechanistic Model of Annual Sulfate Concentrations in the United States
Authors:
Nathan B. Wikle,
Ephraim M. Hanks,
Lucas R. F. Henneman,
Corwin M. Zigler
Abstract:
We develop a mechanistic model to analyze the impact of sulfur dioxide emissions from coal-fired power plants on average sulfate concentrations in the central United States. A multivariate Ornstein-Uhlenbeck (OU) process is used to approximate the dynamics of the underlying space-time chemical transport process, and its distributional properties are leveraged to specify novel probability models for spatial data (i.e., spatially-referenced data with no temporal replication) that are viewed as either a snapshot or a time-averaged observation of the OU process. Air pollution transport dynamics determine the mean and covariance structure of our atmospheric sulfate model, allowing us to infer which process dynamics are driving observed air pollution concentrations. We use these inferred dynamics to assess the regulatory impact of flue-gas desulfurization (FGD) technologies on human exposure to sulfate aerosols.
Submitted 8 October, 2020;
originally announced October 2020.
-
Posterior Predictive Treatment Assignment Methods for Causal Inference in the Context of Time-Varying Treatments
Authors:
Shirley Liao,
Lucas Henneman,
Corwin Zigler
Abstract:
Marginal structural models (MSM) with inverse probability weighting (IPW) are used to estimate causal effects of time-varying treatments, but can result in erratic finite-sample performance when there is low overlap in covariate distributions across different treatment patterns. Modifications to IPW that target the average treatment effect (ATE) estimand either introduce bias or rely on unverifiable parametric assumptions and extrapolation. This paper extends an alternate estimand, the average treatment effect on the overlap population (ATO), which is estimated on a sub-population with a reasonable probability of receiving alternate treatment patterns, to time-varying treatment settings. To estimate the ATO within an MSM framework, this paper extends a stochastic pruning method based on the posterior predictive treatment assignment (PPTA), as well as a weighting analogue, to the time-varying treatment setting. Simulations demonstrate the performance of these extensions compared against IPW and stabilized weighting with regard to bias, efficiency, and coverage. Finally, an analysis using these methods is performed on Medicare beneficiaries residing across 18,480 zip codes in the U.S. to evaluate the effect of coal-fired power plant emissions exposure on ischemic heart disease hospitalization, accounting for seasonal patterns that lead to change in treatment over time.
Submitted 15 July, 2019;
originally announced July 2019.
-
Bayesian data fusion for unmeasured confounding
Authors:
Leah Comment,
Brent A. Coull,
Corwin Zigler,
Linda Valeri
Abstract:
Bayesian causal inference offers a principled approach to policy evaluation of proposed interventions on mediators or time-varying exposures. We outline a general approach to the estimation of causal quantities for settings with time-varying confounding, such as exposure-induced mediator-outcome confounders. We further extend this approach to propose two Bayesian data fusion (BDF) methods for unmeasured confounding. Using informative priors on quantities relating to the confounding bias parameters, our methods incorporate data from an external source where the confounder is measured in order to make inferences about causal estimands in the main study population. We present results from a simulation study comparing our data fusion methods to two common frequentist correction methods for unmeasured confounding bias in the mediation setting. We also demonstrate our method with an investigation of the role of stage at cancer diagnosis in contributing to Black-White colorectal cancer survival disparities.
Submitted 27 February, 2019;
originally announced February 2019.
-
A Source-Oriented Approach to Coal Power Plant Emissions Health Effects
Authors:
Kevin Cummiskey,
Chanmin Kim,
Christine Choirat,
Lucas R. F. Henneman,
Joel Schwartz,
Corwin Zigler
Abstract:
There is increasing focus on whether air pollution originating from different sources has different health implications. In particular, recent evidence suggests that fine particulate matter (PM2.5) with chemical tracers suggesting coal combustion origins is especially harmful. Augmenting this knowledge with estimates from causal inference methods to identify the health impacts of PM2.5 derived from specific point sources of coal combustion would be an important step towards informing specific, targeted interventions. We investigated the effect of high exposure to coal combustion emissions from 783 coal-fired power generating units on ischemic heart disease (IHD) hospitalizations in over 19 million Medicare beneficiaries residing in 21,351 ZIP codes in the eastern United States. We used InMAP, a newly developed, reduced-complexity air quality model, to classify each ZIP code as either a high-exposed or control location. Our health outcomes analysis uses a causal inference method, propensity score matching, to adjust for potential confounders of the relationship between exposure and IHD. We fit separate Poisson regression models to the matched data in each geographic region to estimate the incidence rate ratio (IRR) for IHD comparing high-exposed to control locations. High exposure to coal power plant emissions and IHD were positively associated in the Northeast (IRR = 1.08, 95% CI = 1.06, 1.09) and the Southeast (IRR = 1.06, 95% CI = 1.04, 1.08). No significant association was found in the Industrial Midwest (IRR = 1.02, 95% CI = 1.00, 1.04), likely the result of small exposure contrasts between high-exposed and control ZIP codes in that region. This study provides targeted evidence of the association between emissions from specific coal power plants and IHD hospitalizations among Medicare beneficiaries.
Submitted 25 February, 2019;
originally announced February 2019.
-
Survivor average causal effects for continuous time: a principal stratification approach to causal inference with semicompeting risks
Authors:
Leah Comment,
Fabrizia Mealli,
Sebastien Haneuse,
Corwin Zigler
Abstract:
In semicompeting risks problems, nonterminal time-to-event outcomes such as time to hospital readmission are subject to truncation by death. These settings are often modeled with illness-death models for the hazards of the terminal and nonterminal events, but evaluating causal treatment effects with hazard models is problematic due to conditioning on survival (a post-treatment outcome) that is embedded in the definition of a hazard. Extending an existing survivor average causal effect (SACE) estimand, we frame the evaluation of treatment effects in the context of semicompeting risks with principal stratification and introduce two new causal estimands: the time-varying survivor average causal effect (TV-SACE) and the restricted mean survivor average causal effect (RM-SACE). These principal causal effects are defined among units that would survive regardless of assigned treatment. We adopt a Bayesian estimation procedure that parameterizes illness-death models for both treatment arms. We outline a frailty specification that can accommodate within-person correlation between nonterminal and terminal event times, and we discuss potential avenues for adding model flexibility. The method is demonstrated in the context of hospital readmission among late-stage pancreatic cancer patients.
Submitted 15 February, 2019;
originally announced February 2019.
-
Bayesian Methods for Multiple Mediators: Relating Principal Stratification and Causal Mediation in the Analysis of Power Plant Emission Controls
Authors:
Chanmin Kim,
Michael Daniels,
Joseph Hogan,
Christine Choirat,
Corwin Zigler
Abstract:
Emission control technologies installed on power plants are a key feature of many air pollution regulations in the US. While such regulations are predicated on the presumed relationships between emissions, ambient air pollution, and human health, many of these relationships have never been empirically verified. The goal of this paper is to develop new statistical methods to quantify these relationships. We frame this problem as one of mediation analysis to evaluate the extent to which the effect of a particular control technology on ambient pollution is mediated through causal effects on power plant emissions. Since power plants emit various compounds that contribute to ambient pollution, we develop new methods for multiple intermediate variables that are measured contemporaneously, may interact with one another, and may exhibit joint mediating effects. Specifically, we propose new methods leveraging two related frameworks for causal inference in the presence of mediating variables: principal stratification and causal mediation analysis. We define principal effects based on multiple mediators, and also introduce a new decomposition of the total effect of an intervention on ambient pollution into the natural direct effect and natural indirect effects for all combinations of mediators. Both approaches are anchored to the same observed-data models, which we specify with Bayesian nonparametric techniques. We provide assumptions for estimating principal causal effects, then augment these with an additional assumption required for causal mediation analysis. The two analyses, interpreted in tandem, provide the first empirical investigation of the presumed causal pathways that motivate important air quality regulatory policies.
Submitted 16 February, 2019;
originally announced February 2019.
-
Bayesian Longitudinal Causal Inference in the Analysis of the Public Health Impact of Pollutant Emissions
Authors:
Chanmin Kim,
Corwin M Zigler,
Michael J Daniels,
Christine Choirat,
Jason A Roy
Abstract:
Pollutant emissions from coal-burning power plants have been deemed to adversely impact ambient air quality and public health conditions. Despite the noticeable reduction in emissions and the improvement of air quality since the Clean Air Act (CAA) became law, the public-health benefits from changes in emissions have not yet been widely evaluated. In terms of the chain of accountability (HEI Accountability Working Group, 2003), the link between pollutant emissions from the power plants (SO2) and public health conditions (respiratory diseases), accounting for changes in ambient air quality (PM2.5), is unknown. We provide the first assessment of the longitudinal effect of specific pollutant emissions (SO2) on public health outcomes that is mediated through changes in ambient air quality. It is of particular interest to examine the extent to which the effect that is mediated through changes in local ambient air quality differs from year to year. In this paper, we propose a Bayesian approach to estimate novel causal estimands: time-varying mediation effects in the presence of mediators and responses measured every year. We replace the commonly invoked sequential ignorability assumption with a new set of assumptions that are sufficient to identify the distributions of the natural indirect and direct effects in this setting.
Submitted 3 January, 2019;
originally announced January 2019.
-
Evaluating Federal Policies Using Bayesian Time Series Models: Estimating the Causal Impact of the Hospital Readmissions Reduction Program
Authors:
Georgia Papadogeorgou,
Fiammetta Menchetti,
Christine Choirat,
Jason H. Wasfy,
Corwin M. Zigler,
Fabrizia Mealli
Abstract:
Researchers are often faced with evaluating the effect of a policy or program that was simultaneously initiated across an entire population of units at a single point in time, and its effects over the targeted population can manifest at any time period afterwards. In the presence of data measured over time, Bayesian time series models have been used to impute what would have happened after the policy was initiated, had the policy not taken place, in order to estimate causal effects. However, the considerations regarding the definition of the target estimands, the underlying assumptions, the plausibility of such assumptions, and the choice of an appropriate model have not been thoroughly investigated. In this paper, we establish useful estimands for the evaluation of large-scale policies. We discuss how imputation of missing potential outcomes relies on an assumption which, even though untestable, can be partially evaluated using observed data. We illustrate an approach to evaluate this key causal assumption and facilitate model elicitation based on data from the time interval before policy initiation and using classic statistical techniques. As an illustration, we study the Hospital Readmissions Reduction Program (HRRP), a US federal intervention aiming to improve health outcomes for patients with pneumonia, acute myocardial infarction, or congestive heart failure admitted to a hospital. We evaluate the effect of the HRRP on population mortality among the elderly across the US and in four geographic subregions, and at different time windows. We find that the HRRP increased mortality from pneumonia and acute myocardial infarction across at least one geographical region and time horizon, and is likely to have had a detrimental effect on public health.
Submitted 28 October, 2022; v1 submitted 13 September, 2018;
originally announced September 2018.
-
Uncertainty in the Design Stage of Two-Stage Bayesian Propensity Score Analysis
Authors:
Shirley Liao,
Corwin Zigler
Abstract:
The two-stage process of propensity score analysis (PSA) includes a design stage where propensity scores are estimated and implemented to approximate a randomized experiment and an analysis stage where treatment effects are estimated conditional upon the design. This paper considers how uncertainty associated with the design stage impacts estimation of causal effects in the analysis stage. Such design uncertainty can derive from the fact that the propensity score itself is an estimated quantity, but also from other features of the design stage tied to choice of propensity score implementation. This paper offers a procedure for obtaining the posterior distribution of causal effects after marginalizing over a distribution of design-stage outputs, lending a degree of formality to Bayesian methods for PSA (BPSA) that have gained attention in recent literature. Formulation of a probability distribution for the design-stage output depends on how the propensity score is implemented in the design stage, and propagation of uncertainty into causal estimates depends on how the treatment effect is estimated in the analysis stage. We explore these differences within a sample of commonly-used propensity score implementations (quantile stratification, nearest-neighbor matching, caliper matching, inverse probability of treatment weighting, and doubly robust estimation) and investigate in a simulation study the impact of statistician choice in PS model and implementation on the degree of between- and within-design variability in the estimated treatment effect. The methods are then deployed in an investigation of the association between levels of fine particulate air pollution and elevated exposure to emissions from coal-fired power plants.
Submitted 15 July, 2019; v1 submitted 13 September, 2018;
originally announced September 2018.
-
Bipartite Causal Inference with Interference
Authors:
Corwin M. Zigler,
Georgia Papadogeorgou
Abstract:
Statistical methods to evaluate the effectiveness of interventions are increasingly challenged by the inherent interconnectedness of units. Specifically, a recent flurry of methods research has addressed the problem of interference between observations, which arises when one observational unit's outcome depends not only on its treatment but also on the treatment assigned to other units. We introduce the setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Basic definitions and formulations are provided for this setting, highlighting similarities and differences with more commonly considered settings of causal inference with interference. Several types of causal estimands are discussed, and a simple inverse probability of treatment weighted estimator is developed for a subset of simplified estimands. The estimators are deployed to evaluate how interventions to reduce air pollution from 473 power plants in the U.S. causally affect cardiovascular hospitalization among Medicare beneficiaries residing at 23,458 zip code locations.
Submitted 23 July, 2018;
originally announced July 2018.
-
Causal inference for interfering units with cluster and population level treatment allocation programs
Authors:
Georgia Papadogeorgou,
Fabrizia Mealli,
Corwin M. Zigler
Abstract:
Interference arises when an individual's potential outcome depends not only on the individual's own treatment level but also on the treatment level of others. A common assumption in the causal inference literature in the presence of interference is partial interference, implying that the population can be partitioned into clusters of individuals whose potential outcomes only depend on the treatment of units within the same cluster. Previous literature has defined average potential outcomes under counterfactual scenarios where treatments are randomly allocated to units within a cluster. However, within clusters there may be units that are more or less likely to receive treatment based on covariates or neighbors' treatment. We define new estimands that describe average potential outcomes for realistic counterfactual treatment allocation programs, extending existing estimands to take into consideration the units' covariates and dependence between units' treatment assignment. We further propose entirely new estimands for population-level interventions over the collection of clusters, which correspond in the motivating setting to regulations at the federal (vs. cluster or regional) level. We discuss these estimands, propose unbiased estimators, and derive asymptotic results as the number of clusters grows. Finally, we estimate effects in a comparative effectiveness study of power plant emission reduction technologies on ambient ozone pollution.
Submitted 14 May, 2018; v1 submitted 3 November, 2017;
originally announced November 2017.
-
Posterior Predictive Treatment Assignment for Estimating Causal Effects with Limited Overlap
Authors:
Corwin M Zigler,
Matthew Cefalu
Abstract:
Estimating causal effects with propensity scores relies upon the availability of treated and untreated units observed at each value of the estimated propensity score. In settings with strong confounding, limited so-called "overlap" in propensity score distributions can undermine the empirical basis for estimating causal effects and yield erratic finite-sample performance of existing estimators. We propose a Bayesian procedure designed to estimate causal effects in settings where there is limited overlap in propensity score distributions. Our method relies on the posterior predictive treatment assignment (PPTA), a quantity that is derived from the propensity score but serves a different role in estimation of causal effects. We use the PPTA to estimate causal effects by marginalizing over the uncertainty in whether each observation is a member of an unknown subset for which treatment assignment can be assumed unconfounded. The resulting posterior distribution depends on the empirical basis for estimating a causal effect for each observation and has commonalities with recently-proposed "overlap weights" of Li et al. (2016). We show that the PPTA approach can be construed as a stochastic version of existing ad-hoc approaches such as pruning based on the propensity score or truncation of inverse probability of treatment weights, and highlight several practical advantages including uncertainty quantification and improved finite-sample performance. We illustrate the method in an evaluation of the effectiveness of technologies for reducing harmful pollution emissions from power plants in the United States.
Submitted 24 October, 2017;
originally announced October 2017.
-
Adjusting for Unmeasured Spatial Confounding with Distance Adjusted Propensity Score Matching
Authors:
Georgia Papadogeorgou,
Christine Choirat,
Corwin Zigler
Abstract:
Propensity score matching is a common tool for adjusting for observed confounding in observational studies, but is known to have limitations in the presence of unmeasured confounding. In many settings, researchers are confronted with spatially-indexed data where the relative locations of the observational units may serve as a useful proxy for unmeasured confounding that varies according to a spatial pattern. We develop a new method, termed Distance Adjusted Propensity Score Matching (DAPSm), that incorporates information on units' spatial proximity into a propensity score matching procedure. We show that DAPSm can adjust for both observed and some forms of unobserved confounding and evaluate its performance relative to several other reasonable alternatives for incorporating spatial information into propensity score adjustment. The method is motivated by and applied to a comparative effectiveness investigation of power plant emission reduction technologies designed to reduce population exposure to ambient ozone pollution. Ultimately, DAPSm provides a framework for augmenting a "standard" propensity score analysis with information on spatial proximity and provides a transparent and principled way to assess the relative trade-offs of prioritizing observed confounding adjustment versus spatial proximity adjustment.
Submitted 6 December, 2017; v1 submitted 24 October, 2016;
originally announced October 2016.
-
The central role of Bayes theorem for joint estimation of causal effects and propensity scores
Authors:
Corwin M. Zigler
Abstract:
Although propensity scores have been central to the estimation of causal effects for over 30 years, only recently has the statistical literature begun to consider in detail methods for Bayesian estimation of propensity scores and causal effects. Underlying this recent body of literature on Bayesian propensity score estimation is an implicit discordance between the goal of the propensity score and the use of Bayes theorem. The propensity score condenses multivariate covariate information into a scalar to allow estimation of causal effects without specifying a model for how each covariate relates to the outcome. Avoiding specification of a detailed model for the outcome response surface is valuable for robust estimation of causal effects, but this strategy is at odds with the use of Bayes theorem, which presupposes a full probability model for the observed data. The goal of this paper is to explicate this fundamental feature of Bayesian estimation of causal effects with propensity scores in order to provide context for the existing literature and for future work on this important topic.
Submitted 8 April, 2014; v1 submitted 26 August, 2013;
originally announced August 2013.
-
The potential for bias in principal causal effect estimation when treatment received depends on a key covariate
Authors:
Corwin M. Zigler,
Thomas R. Belin
Abstract:
Motivated by a potential-outcomes perspective, the idea of principal stratification has been widely recognized for its relevance in settings susceptible to posttreatment selection bias such as randomized clinical trials where treatment received can differ from treatment assigned. In one such setting, we address subtleties involved in inference for causal effects when using a key covariate to predict membership in latent principal strata. We show that when treatment received can differ from treatment assigned in both study arms, incorporating a stratum-predictive covariate can make estimates of the "complier average causal effect" (CACE) derive from observations in the two treatment arms with different covariate distributions. Adopting a Bayesian perspective and using Markov chain Monte Carlo for computation, we develop posterior checks that characterize the extent to which incorporating the pretreatment covariate endangers estimation of the CACE. We apply the method to analyze a clinical trial comparing two treatments for jaw fractures in which the study protocol allowed surgeons to overrule both possible randomized treatment assignments based on their clinical judgment and the data contained a key covariate (injury severity) predictive of treatment received.
Submitted 7 November, 2011;
originally announced November 2011.