Data-adaptive approaches to confounding adjustment may improve performance beyond expert knowledge when analyzing electronic healthcare databases and have additional practical advantages for analyzing multiple databases in rapid cycles. Improvements seemed possible if outcome predictors were reliably identified empirically and adjusted for.
Methods:
In five cohort studies from diverse healthcare databases, we implemented a base-case high-dimensional propensity score (hdPS) algorithm with propensity score decile-adjusted outcome models to estimate treatment effects among prescription drug initiators. The original variable selection procedure, which is based on the estimated bias of each variable derived from unadjusted associations between confounders and exposure (RRCE) and between confounders and disease outcome (RRCD), was augmented by alternative strategies. These included using increasingly adjusted RRCD estimates, among them models considering >1,500 variables jointly (Lasso, Bayesian logistic regression); using prediction statistics or likelihood-ratio statistics for covariate prioritization; directly estimating the propensity score with >1,500 variables (Lasso, Bayesian regression); or directly fitting an outcome model using all covariates jointly (Lasso, Ridge).
Results:
In the five example studies, most tested augmentations of the base-case hdPS did not meaningfully change estimates, given the wide confidence intervals. The exceptions were Bayesian regression and Lasso used to estimate RRCD, which moved estimates minimally closer to the expectation in three of five examples. Direct outcome estimation with Lasso performed worst.
Conclusion:
Overall, the basic heuristic of variable reduction in high-dimensional propensity score adjustment performed as well as alternative approaches in diverse settings. Minor improvements in variable selection may be possible using Bayesian outcome regression to prioritize variables for propensity score estimation when outcomes are rare. See video abstract at http://links.lww.com/EDE/B162.
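The base-case prioritization step described in the Methods can be illustrated with a minimal sketch. It assumes the Bross bias formula as it is commonly presented for the high-dimensional propensity score, with binary covariates, RRCD inverted when it falls below 1, and ranking by the absolute log of the bias multiplier. The function names, input layout, and example values are hypothetical and are not taken from the study.

```python
import math


def bross_bias_multiplier(p_c1, p_c0, rr_cd):
    """Multiplicative bias from leaving one binary covariate unadjusted.

    p_c1  : prevalence of the covariate among the exposed
    p_c0  : prevalence of the covariate among the unexposed
    rr_cd : unadjusted covariate-outcome relative risk (RRCD)

    Assumption: an RRCD below 1 is inverted before use, a convention
    commonly described for hdPS covariate prioritization.
    """
    if rr_cd <= 0:
        raise ValueError("rr_cd must be positive")
    if rr_cd < 1:
        rr_cd = 1.0 / rr_cd
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def prioritize(covariates, k):
    """Return the names of the k covariates with the largest |log(bias)|.

    covariates maps a covariate name to a (p_c1, p_c0, rr_cd) tuple.
    """
    scored = {
        name: abs(math.log(bross_bias_multiplier(*stats)))
        for name, stats in covariates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Hypothetical inputs: (prevalence in exposed, prevalence in unexposed, RRCD)
example = {
    "A": (0.4, 0.2, 2.0),  # imbalanced, positively associated with outcome
    "B": (0.1, 0.1, 5.0),  # balanced across exposure, so no estimated bias
    "C": (0.2, 0.5, 0.5),  # imbalanced, inversely associated with outcome
}
top_two = prioritize(example, 2)
```

In this toy input, covariate B receives a bias multiplier of exactly 1 despite its strong outcome association, because it is balanced across exposure groups. The alternative strategies compared in the study would replace the unadjusted RRCD passed to this ranking with increasingly adjusted estimates.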