Introduction: Real-time identification of patients with acute ischemic stroke (AIS) in the electronic health record (EHR) can enhance care delivery systems, clinical decision support, and research subject recruitment. EHR data accessible during a patient’s admission may be used to identify patients with AIS, but there are no established methods for determining which data to use.
Hypotheses: 1. An EHR “phenotype” of AIS can be identified using clinical EHR data. 2. Machine learning can identify the AIS phenotype using similar inputs with greater accuracy than clinician-specified identification algorithms.
Methods: Two stroke neurologists selected generalizable AIS-related clinical data points from the Columbia University Medical Center EHR (clinical laboratory results and medication, imaging, and stroke service list orders) to identify the AIS phenotype, and pre-specified priority logic based on institutional practice patterns. Separately, a regularized logistic regression (RLR) model was applied to all available neurology-related order sets and clinical laboratory inputs. The classification accuracy of the two algorithms was compared using a “gold standard” data set consisting of our institution’s ischemic stroke registry from January 1, 2015, to March 31, 2016. Negative controls were selected from all patients admitted to the neurology service at our institution during the same time period.
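As a rough illustration of the RLR approach described above, the following is a minimal sketch of a regularized logistic regression fit by gradient descent. It assumes an L2 penalty and a plain batch gradient descent solver; the abstract does not state which penalty or solver was used. The feature names, the toy records, and the hyperparameters (`l2`, `lr`, `epochs`) are hypothetical and are not taken from the study.

```python
import math

# Hypothetical binary EHR features (illustrative names, not the study's inputs).
FEATURES = ["thrombolytic_order", "stroke_service_list", "head_ct_order"]

# Toy records invented for illustration: 1 = feature present during admission.
X = [
    [1, 1, 1], [0, 1, 1], [1, 1, 0], [0, 1, 1],  # labeled AIS
    [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0],  # labeled non-AIS
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that record x belongs to the AIS class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_rlr(X, y, l2=0.1, lr=0.5, epochs=500):
    """Fit logistic regression with an L2 penalty on the weights (assumed).

    The intercept b is left unpenalized, a common convention.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # Average log-loss gradient plus the L2 shrinkage term.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


w, b = fit_rlr(X, y)
p_all = predict_proba(w, b, [1, 1, 1])   # every feature present
p_none = predict_proba(w, b, [0, 0, 0])  # no feature present
```

In this sketch a record is flagged as AIS when its estimated probability exceeds a chosen threshold. Lowering the threshold favors sensitivity over specificity, which is the trade-off relevant to a screening use case.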
Results: Our data contained 482 patients with AIS and 3,628 negative controls. The clinician-specified identification algorithm identified the AIS phenotype with sensitivity of 90.6%, specificity of 50.4%, and positive predictive value (PPV) of 93.5%. In comparison, the RLR-based algorithm had a sensitivity of 96.3%, specificity of 52.2%, and PPV of 93.8%.
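For reference, the three reported measures can be computed from confusion-matrix counts as sketched below. The function name and the counts in the usage line are hypothetical and chosen only for illustration; they are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and PPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all actual cases
        "specificity": tn / (tn + fp),  # true negatives among all actual controls
        "ppv": tp / (tp + fp),          # true positives among all flagged records
    }


# Hypothetical counts for illustration only.
m = classification_metrics(tp=90, fp=50, tn=50, fn=10)
```

Sensitivity and specificity are each computed within one class (cases or controls), whereas PPV also depends on the ratio of cases to controls in the evaluated cohort.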
Conclusions: We defined an AIS phenotype that can be identified using clinical, non-claims data available during a patient’s admission, and used machine learning to improve classification performance. Although specificity is low, the high sensitivity may allow use for screening and clinical decision support. Further studies are needed to externally validate these findings and optimize algorithm specificity.