Treatment Prediction, Balance, and Propensity Score Adjustment
For a binary treatment Z and vector of confounding variables X, the propensity score, e(X) = Pr(Z = 1 | X), is a balancing score and is a scalar quantity, whatever the dimension of X. Achieving balance on covariates that are not confounders (particularly instruments, i.e., strong predictors of treatment) is unhelpful, and yet there is still considerable interest in (and some advocacy for) using procedures to estimate the treatment model that are more complex than the simple approach of fitting a binary regression model. We caution that such procedures must be used with care, as the goals of optimal treatment prediction and balancing are very different.
We report findings from an empirical study of the impact of current smoking on systolic blood pressure using data from the National Health and Nutrition Examination Survey, restricting attention to adults in the second wave of the survey. We compare the estimated propensity scores and balance statistics obtained from propensity score estimation procedures of increasing complexity. Potential confounders are gender, age, race, education, marital status, household income, and a poverty index. We compare logistic regression, generalized boosted models as suggested previously,3 and the ensemble approach of Super Learning,4 using R libraries for the following learners: k-nearest neighbors, regularized generalized linear models, mean prediction, and random forests (all with default settings). We examined the standardized mean difference for each of the confounding variables (i) in the original sample, (ii) within quintiles of each fitted propensity score, (iii) following 1:1 matching with replacement,5 and (iv) following inverse probability of treatment weighting.
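The balance diagnostic described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names `ipt_weights` and `standardized_mean_difference` are hypothetical, the data are simulated rather than taken from the survey, and the pooled standard deviation uses unweighted group variances, which is one common convention among several.

```python
import numpy as np


def ipt_weights(z, e):
    """Inverse probability of treatment weights: 1/e if treated, 1/(1-e) otherwise."""
    z = np.asarray(z, dtype=float)
    e = np.asarray(e, dtype=float)
    return z / e + (1.0 - z) / (1.0 - e)


def standardized_mean_difference(x, z, w=None):
    """(Weighted) difference in covariate means between groups, in pooled-SD units."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(z).astype(bool)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    mean_1 = np.average(x[treated], weights=w[treated])
    mean_0 = np.average(x[~treated], weights=w[~treated])
    # Assumption: pooled SD from the unweighted within-group variances.
    pooled_sd = np.sqrt((x[treated].var(ddof=1) + x[~treated].var(ddof=1)) / 2.0)
    return (mean_1 - mean_0) / pooled_sd


# Simulated example (not the survey data): one confounder that drives treatment.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-x))  # true propensity score
z = rng.binomial(1, e_true)

smd_raw = standardized_mean_difference(x, z)
smd_weighted = standardized_mean_difference(x, z, ipt_weights(z, e_true))
print(f"SMD, original sample: {smd_raw:.3f}")
print(f"SMD, after weighting: {smd_weighted:.3f}")
```

In this simulation, weighting by the true propensity score should bring the standardized mean difference close to zero, whereas the original sample is clearly imbalanced.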
Local balance is not achieved within quintiles of the propensity score for any of the estimation approaches (Table). Thus a stratified analysis could not rely on fitting simple means within quintiles, but would instead need outcome regression modeling within quintiles; typically, however, the analyst wishes to avoid specifying an outcome regression model. Using logistic regression to compute the propensity score, we observe that excellent balance is achieved through inverse probability of treatment weighting, and that balance is quite good following matching.
The propensity score estimated via a generalized boosted model leads to greater predictive accuracy (within-sample accuracy of 0.80, compared with 0.70 for logistic regression), greater separation in the propensity score distribution between smokers and nonsmokers (see eFigure 1; http://links.lww.com/EDE/B189), and worse balance as measured by the standardized mean difference. Super Learning’s predictive accuracy is even greater (0.98), yielding such strong separation between the smokers and nonsmokers that only the third quintile contains both exposure groups. Within the third quintile, balance is generally worse than in the original sample.
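The loss of overlap within quintiles can be illustrated with a small check of which propensity score quintiles contain both exposure groups. The sketch below is illustrative only: `quintiles_with_both_groups` is a hypothetical helper, and the scores are constructed by hand to mimic a near-perfect treatment classifier rather than taken from the study.

```python
import numpy as np


def quintiles_with_both_groups(e, z):
    """Count treated/untreated per pooled propensity score quintile.

    Returns (counts, overlap): counts maps quintile (1..5) to
    (n_treated, n_untreated); overlap lists quintiles containing both groups.
    """
    e = np.asarray(e, dtype=float)
    treated = np.asarray(z).astype(bool)
    edges = np.quantile(e, [0.2, 0.4, 0.6, 0.8])  # pooled quintile cut points
    quintile = np.digitize(e, edges) + 1          # labels 1..5
    counts = {}
    for q in range(1, 6):
        in_q = quintile == q
        counts[q] = (int(np.sum(in_q & treated)), int(np.sum(in_q & ~treated)))
    overlap = [q for q, (n1, n0) in counts.items() if n1 > 0 and n0 > 0]
    return counts, overlap


# Hand-built scores from a hypothetical, almost perfectly separating model:
# untreated scores near 0, treated scores near 1.
z = np.repeat([0, 1], 500)
e_sharp = np.concatenate([np.linspace(0.01, 0.05, 500),
                          np.linspace(0.95, 0.99, 500)])

counts, overlap = quintiles_with_both_groups(e_sharp, z)
for q, (n1, n0) in counts.items():
    print(f"quintile {q}: treated={n1}, untreated={n0}")
print("quintiles with both groups:", overlap)
```

With equal group sizes and complete separation, only the middle quintile straddles the two groups, so within-quintile comparisons are unavailable in the remaining four.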
Beyond the lack of overlap (positivity violations) resulting from high predictive accuracy, in an analysis using inverse probability weighting, weights will be close to 1 when the treatment model is very accurate, because the fitted probability of the treatment actually received is then close to 1 for nearly every individual. Thus, in this example, the average treatment effect estimated using weighting by a treatment model fit by Super Learning is most similar to the naive (unweighted) difference of averages between exposed and unexposed. The naive estimate (95% confidence interval [CI]) is −3.70 (−5.71, −1.78), whereas the inverse probability weighted estimates using a treatment model fit by logistic regression, generalized boosted model, and Super Learning are, respectively, −1.99 (−3.93, −0.16), −2.18 (−4.22, −0.17), and −3.49 (−5.39, −1.87).
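The collapse of the weighted estimate toward the naive difference can be seen in a toy calculation. The sketch below assumes a normalized (ratio-type) inverse probability weighted estimator, which may differ from the estimator used in the study; the function name `normalized_ipw_difference` and all outcome and propensity values are made up purely for illustration.

```python
import numpy as np


def normalized_ipw_difference(y, z, e):
    """Difference of weighted outcome means, with weights 1/e (treated)
    and 1/(1-e) (untreated), normalized within each group."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    e = np.asarray(e, dtype=float)
    w = z / e + (1.0 - z) / (1.0 - e)
    mean_1 = np.sum(w * z * y) / np.sum(w * z)
    mean_0 = np.sum(w * (1.0 - z) * y) / np.sum(w * (1.0 - z))
    return mean_1 - mean_0


# Made-up outcomes and treatment indicators (illustrative values only).
y = np.array([118.0, 125.0, 131.0, 122.0, 140.0, 128.0])
z = np.array([1, 1, 1, 0, 0, 0])
naive = y[z == 1].mean() - y[z == 0].mean()

# A very accurate treatment model: fitted probabilities near 1 for the
# treated and near 0 for the untreated, so every weight is about 1.
e_sharp = np.where(z == 1, 0.99, 0.01)
# A less accurate model: fitted probabilities vary within each group.
e_moderate = np.array([0.7, 0.5, 0.3, 0.6, 0.2, 0.4])

est_sharp = normalized_ipw_difference(y, z, e_sharp)
est_moderate = normalized_ipw_difference(y, z, e_moderate)
print(f"naive: {naive:.3f}")
print(f"IPW, sharp scores: {est_sharp:.3f}")
print(f"IPW, moderate scores: {est_moderate:.3f}")
```

When the weights are essentially constant within each exposure group, the weighted means reduce to the unweighted group means, so no adjustment for confounding takes place.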