Introduction: Epidemiological studies that use administrative databases typically rely on International Classification of Diseases (ICD) codes to identify stroke cases and estimate incidence rates. However, the accuracy of these codes is limited, with sensitivity and specificity varying across study designs and stroke types. Few studies use physician chart review of patient records to confirm cases for improved accuracy, as this is labor intensive. We sought to develop a machine learning (ML) approach that could adjudicate potential stroke events.

Methods: We used 8081 hospitalized stroke events from the Greater Cincinnati/Northern Kentucky Stroke Study. The study coordinators identified events with stroke-related diagnoses (ICD-9 codes 430-438) from 17 regional hospitals in 2005 and 2010 and performed detailed chart abstraction. Information (e.g., diagnostic tests) was abstracted from patients’ medical records for each event, followed by physician case adjudication. Using all clinical variables, an ML algorithm (logistic regression) was used to classify each event as ischemic stroke, hemorrhagic stroke, TIA, or non-stroke. Linear regression (LR) was applied to calibrate the ML outputs and estimate prediction intervals against gold-standard physician adjudication. The ML and LR models were trained on one year of data and tested on the other year. The model results were compared with those obtained from ICD-9 codes (ischemic: 434/436; hemorrhagic: 430-432; TIA: 435; non-stroke: other codes), also calibrated by LR.

Results: Prediction intervals generated by ML covered the majority of the true numbers of stroke events (Table). Compared with ICD-9 codes, the ML algorithm achieved better sensitivity/specificity and more “hits” with narrower prediction intervals.

Conclusions: The ML algorithm showed promise in matching physician adjudication and subtyping stroke cases. Future work is required to refine the methods to automate stroke epidemiology with improved accuracy and granularity.
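
The ICD-9 comparison grouping given in the Methods can be illustrated with a minimal sketch. This is not the study's code: the function names `icd9_to_subtype` and `sensitivity_specificity` are hypothetical, the six example events are invented purely for illustration, and treating non-numeric codes (such as V or E codes) as non-stroke is an assumption. Only the code-to-subtype grouping itself comes from the abstract.

```python
def icd9_to_subtype(code: str) -> str:
    """Map an ICD-9 code to the subtype grouping listed in the Methods."""
    category = code.split(".")[0]  # keep the 3-digit category, e.g. "434.91" -> "434"
    if not category.isdigit():
        return "non-stroke"  # assumption: V/E codes count as "other codes"
    number = int(category)
    if number in (434, 436):
        return "ischemic"
    if 430 <= number <= 432:
        return "hemorrhagic"
    if number == 435:
        return "TIA"
    return "non-stroke"  # all other codes, including 433, 437 and 438


def sensitivity_specificity(predicted, adjudicated, subtype):
    """One-vs-rest sensitivity and specificity for a single subtype."""
    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted, adjudicated):
        if truth == subtype:
            if pred == subtype:
                tp += 1
            else:
                fn += 1
        else:
            if pred == subtype:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented toy events (NOT study data): discharge code and physician adjudication.
codes = ["434.91", "431", "435.9", "433.10", "436", "430"]
adjudicated = ["ischemic", "hemorrhagic", "TIA", "ischemic", "non-stroke", "hemorrhagic"]

predicted = [icd9_to_subtype(c) for c in codes]
sens, spec = sensitivity_specificity(predicted, adjudicated, "ischemic")
print(f"ischemic: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The same one-vs-rest comparison could in principle be applied to the subtype labels produced by a classifier, which would put both approaches on the same footing against physician adjudication.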