Objective: Complete documentation in large-scale datasets such as administrative data or disease registries is often difficult to achieve. Because the subset of patients with complete documentation is most likely not a random sample of all patients, selection bias threatens the validity of results when a complete case analysis is used. To demonstrate, we assess the presence and magnitude of selection bias in ischemic stroke patients with a documented National Institutes of Health Stroke Scale (NIHSS) score, a measure whose documentation is often incomplete, using the Heckman Selection Model.
Methods: Patient-level variables including demographics, clinical EMS and admission variables, and medical history/comorbidities were obtained for 10,717 ischemic stroke patients aged 65 and older in the Michigan Stroke Registry in 2009-2012. The Heckman Selection Model assesses the presence and magnitude of selection bias by estimating a correlation coefficient between the error components of two models, each conditional on patient and hospital predictors: a linear regression model predicting patient NIHSS score (the outcome model) and a binary probit model predicting NIHSS documentation (the selection model). The outcome model predicting NIHSS score was specified using a backward selection process with stepwise deletion of non-significant predictors. The selection model included all variables in the outcome model, plus additional significant predictors of NIHSS documentation. Quasi-maximum likelihood estimation was used to produce robust standard errors. All analyses were done using the QLIM procedure (PROC QLIM) in SAS.
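For readers unfamiliar with the approach, the standard textbook form of the Heckman Selection Model can be sketched as follows. The notation is an assumption introduced here for illustration and does not come from the analysis itself: y_i is the NIHSS score, d_i indicates NIHSS documentation, x_i and z_i are the outcome and selection predictors, and ρ is the correlation between the two error terms.

```latex
% Selection model (probit): NIHSS is documented when the latent index is positive
d_i^{*} = z_i^{\top}\gamma + u_i, \qquad d_i = \mathbf{1}\{ d_i^{*} > 0 \}

% Outcome model (linear): NIHSS score, observed only when d_i = 1
y_i = x_i^{\top}\beta + \varepsilon_i

% Joint error assumption: bivariate normal with correlation \rho
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^{2} \end{pmatrix}
\right)

% Conditional mean among documented cases: the last term is the selection bias
E[\, y_i \mid x_i, z_i, d_i = 1 \,]
  = x_i^{\top}\beta + \rho\,\sigma\,\lambda(z_i^{\top}\gamma),
\qquad
\lambda(a) = \frac{\phi(a)}{\Phi(a)}
```

Under this formulation, ρ = 0 means a complete case regression of y_i on x_i is unbiased, while ρ > 0 means documented cases tend to have higher scores than their predictors alone would imply.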
Results: Of the 10,717 cases, 7,956 (74.2%) had NIHSS documented. Significant predictors in the outcome and selection models are shown in the Table. The Heckman Selection Model found a statistically significant but modest correlation coefficient of ρ=0.1089 (SE=0.0119, p<0.0001). The positive correlation indicates that NIHSS was more likely to be documented in patients with higher NIHSS scores, i.e., more severe strokes.
Conclusions: We found statistically significant, albeit weak, selection bias in the documentation of NIHSS in stroke patients. The Heckman Selection Model is a novel method that can be used to assess the presence and magnitude of selection bias when missing data are common.