Surveillance of venous thromboembolisms (VTEs) is necessary for improving patient safety in acute care hospitals, but current detection methods are inaccurate and inefficient. With the growing availability of clinical narratives in an electronic format, automated surveillance using natural language processing (NLP) techniques may represent a better method.
Objective:
We assessed the accuracy of using symbolic NLP for identifying the 2 clinical manifestations of VTE, deep vein thrombosis (DVT) and pulmonary embolism (PE), from narrative radiology reports.
Methods:
A random sample of 4000 narrative reports was selected from imaging studies that could diagnose DVT or PE and that were performed between 2008 and 2012 in a university health network of 5 adult-care hospitals in Montreal (Canada). The reports were coded by clinical experts to identify positive and negative cases of DVT and PE, which served as the reference standard. Using data from the largest hospital (n=2788), 2 symbolic NLP classifiers were trained: one for DVT, the other for PE. The accuracy of these classifiers was tested on data from the other 4 hospitals (n=1212).
Results:
On manual review, 663 DVT-positive and 272 PE-positive reports were identified. In the testing dataset, the DVT classifier achieved 94% sensitivity (95% CI, 88%-97%), 96% specificity (95% CI, 94%-97%), and 73% positive predictive value (95% CI, 65%-80%), whereas the PE classifier achieved 94% sensitivity (95% CI, 89%-97%), 96% specificity (95% CI, 95%-97%), and 80% positive predictive value (95% CI, 73%-85%).
Conclusions:
Symbolic NLP can accurately identify VTEs from narrative radiology reports. This method could facilitate VTE surveillance and the evaluation of preventive measures.
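
For readers who want to reproduce this kind of evaluation, the accuracy measures reported in the Results (sensitivity, specificity, and positive predictive value, each with a 95% CI) can be derived from a 2×2 table comparing classifier output with the reference standard. The following is a minimal sketch in Python, not the authors' implementation. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption. The function names and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 gives an approximate 95% interval. This interval method is an
    assumption; the abstract does not say how its CIs were computed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_accuracy(tp, fp, fn, tn):
    """Return (estimate, (low, high)) for each measure from 2x2 counts."""
    return {
        # Proportion of reference-positive reports flagged by the classifier.
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        # Proportion of reference-negative reports correctly not flagged.
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
        # Proportion of flagged reports that are reference-positive.
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
    }


# Hypothetical counts for illustration only; NOT the study's data.
example = diagnostic_accuracy(tp=90, fp=10, fn=10, tn=890)
for name, (estimate, (low, high)) in example.items():
    print(f"{name}: {estimate:.1%} (95% CI, {low:.1%}-{high:.1%})")
```

Positive predictive value depends on how common positive reports are in the test set, which is one reason it can be noticeably lower than sensitivity and specificity when positives are a minority of reports.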