Participants in epidemiologic and genetic studies are rarely true random samples of the populations they are intended to represent, and both known and unknown factors can influence participation in a study, a process known as selection. Practitioners of instrumental variable (IV) analyses do not widely understand the circumstances in which selection causes bias in an IV analysis. We use directed acyclic graphs (DAGs) to depict assumptions about the selection mechanism (the factors affecting selection) and show how DAGs can be used to determine when different selection mechanisms bias a two-stage least squares IV analysis. Through simulations, we show that selection can produce a biased IV estimate with substantial confidence interval (CI) undercoverage. We also show that the magnitude of the bias can depend on instrument strength, on whether the exposure–instrument association is linear or nonlinear, and on whether the exposure has a causal effect. We present an application from the UK Biobank study, which is known to be a selected sample of the general population. The question of interest was the causal effect of staying in school for at least 1 extra year on the decision to smoke. Based on 22,138 participants, the two-stage least squares exposure estimates differed markedly between the IV analysis that ignored selection and the IV analysis that adjusted for selection (risk differences, 1.8% [95% CI, −1.5%, 5.0%] and −4.5% [95% CI, −6.6%, −2.4%], respectively). We conclude that selection bias can have a major effect on an IV analysis and that further research is needed on how to conduct sensitivity analyses when selection depends on unmeasured data.
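The selection mechanism described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulation design. The data-generating model is a hypothetical choice made for illustration: a standard normal instrument Z, an unmeasured confounder U, linear exposure and outcome models, and a true effect of 0.3. Selection depends on the exposure. The helper names `two_stage_least_squares` and `simulate` are also hypothetical. Because selection depends on the exposure X, which is a collider on the path Z → X ← U, restricting the analysis to selected participants induces an association between Z and U, so the instrument is no longer independent of the confounder. The adjusted estimate uses inverse probability of selection weights computed from the known selection probabilities. The abstract does not state which adjustment method the authors used, so treat weighting as one possible approach.

```python
import numpy as np


def two_stage_least_squares(z, x, y, weights=None):
    """2SLS slope for a single instrument z, exposure x and outcome y.

    If weights are given, both stages become weighted least squares
    (each row is scaled by the square root of its weight).
    """
    n = len(y)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)
    # Stage 1: regress the exposure on the instrument (with intercept).
    design1 = np.column_stack([np.ones(n), z])
    coef1, *_ = np.linalg.lstsq(design1 * sw[:, None], x * sw, rcond=None)
    x_hat = design1 @ coef1
    # Stage 2: regress the outcome on the fitted exposure (with intercept).
    design2 = np.column_stack([np.ones(n), x_hat])
    coef2, *_ = np.linalg.lstsq(design2 * sw[:, None], y * sw, rcond=None)
    return coef2[1]


def simulate(n=400_000, beta=0.3, seed=1):
    """Hypothetical linear-Gaussian model with selection on the exposure."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                  # instrument
    u = rng.normal(size=n)                  # unmeasured confounder of X and Y
    x = 0.5 * z + u + rng.normal(size=n)    # exposure
    y = beta * x + u + rng.normal(size=n)   # outcome
    # Probability of selection rises with X and stays within [0.05, 0.95],
    # which keeps the inverse probability weights bounded (at most 20).
    p_select = 0.05 + 0.9 / (1.0 + np.exp(-2.0 * x))
    selected = rng.uniform(size=n) < p_select
    return z, x, y, p_select, selected


BETA = 0.3
z, x, y, p, s = simulate(beta=BETA)
full = two_stage_least_squares(z, x, y)                  # whole population
naive = two_stage_least_squares(z[s], x[s], y[s])        # ignores selection
ipw = two_stage_least_squares(z[s], x[s], y[s], weights=1.0 / p[s])  # weighted
print(f"true {BETA:.2f}  full {full:.3f}  selected {naive:.3f}  IPW {ipw:.3f}")
```

Under these assumptions, the full-population and weighted estimates should fall close to the true value, while the unweighted estimate in the selected sample is pulled well below it. In practice the selection probabilities are not known and would have to be estimated from measured predictors of participation. That is only possible when selection does not depend on unmeasured data.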