Introduction: Electronic Health Records (EHRs) benefit record keeping, information collation, error prevention, and charge capture. They also provide a large database of clinical information that can be used for research. Sorting through vast amounts of data manually is inefficient; hence, an effective, validated method is needed to extract information from large datasets and generate knowledge.
The U.S., and especially West Virginia, bears a heavy burden of cardiovascular disease (CVD). Undiagnosed Familial Hypercholesterolemia (FH) is an important contributor to CVD in the U.S. FH causes elevated low-density lipoprotein (LDL) cholesterol levels from childhood and early atherosclerotic disease. We are interested in better screening processes for FH. One method is to detect adults with coronary artery disease (CAD) and determine whether their lipid levels are indicative of FH. Their relatives, including children, can then be screened for FH and treated. Efficient identification of a CAD phenotype from EHRs is an important first step in this screening process.
Hypothesis: We hypothesized that a CAD phenotype detection algorithm using discrete data elements from EHRs could be validated as a precursor to the detection of FH.
Methods: We developed an algorithm to detect a CAD phenotype by searching discrete data elements, such as diagnosis lists (ICD-10 codes) and procedure (CPT) codes. Direct inspection of discrete EHR data avoided the need for artificial intelligence techniques such as natural language processing.
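As a minimal sketch of this kind of rule-based search over discrete data, the Python below flags a patient when any ICD-10 diagnosis falls in an assumed CAD code range or any CPT code matches an assumed coronary revascularization procedure. The code lists, the `PatientRecord` type, and `has_cad_phenotype` are hypothetical illustrations, not the study's actual constraints.

```python
from dataclasses import dataclass, field

# Hypothetical, illustrative constraints; NOT the study's actual code lists.
# ICD-10 block I20-I25 covers ischemic heart diseases.
CAD_ICD10_PREFIXES = ("I20", "I21", "I22", "I23", "I24", "I25")
# Assumed examples of coronary revascularization CPT codes (PCI, CABG).
CAD_CPT_CODES = frozenset({"92920", "92928", "33510", "33533"})


@dataclass
class PatientRecord:
    patient_id: str
    icd10_codes: list[str] = field(default_factory=list)
    cpt_codes: list[str] = field(default_factory=list)


def has_cad_phenotype(record: PatientRecord) -> bool:
    """Return True if any diagnosis or procedure code matches the rule set."""
    # Normalize ICD-10 codes ("I25.10" -> "I2510") before prefix matching.
    dx_hit = any(
        code.replace(".", "").upper().startswith(CAD_ICD10_PREFIXES)
        for code in record.icd10_codes
    )
    proc_hit = any(code in CAD_CPT_CODES for code in record.cpt_codes)
    return dx_hit or proc_hit


patients = [
    PatientRecord("A", icd10_codes=["I25.10"]),  # matches the diagnosis rule
    PatientRecord("B", cpt_codes=["92928"]),     # matches the procedure rule
    PatientRecord("C", icd10_codes=["E11.9"]),   # no CAD code in this rule set
]
flags = {p.patient_id: has_cad_phenotype(p) for p in patients}
print(flags)
```

Keeping the constraints in plain code sets means that refining the algorithm, for example by narrowing a code range or adding a required combination of codes, changes only the data and the matching rule, not the surrounding pipeline.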
The algorithm was applied to a cohort of 1,000 patients with varying characteristics. We then determined which patients had CAD by systematically reviewing their EHRs.
We then revised the algorithm by refining the constraints under which it operated, ran it again on the same 1,000 patients, and determined the accuracy of the modified algorithm.
Results: Manual validation of the 1,000 patients identified 413 with CAD and 587 without. The original algorithm identified 488 CAD-positive and 512 CAD-negative patients, for an accuracy of 89%, sensitivity of 96%, and specificity of 85%.
When applied to the same cohort, the revised algorithm identified 474 CAD-positive and 526 CAD-negative patients, for an accuracy of 93%, sensitivity of 99%, and specificity of 89%.
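Accuracy, sensitivity, and specificity depend on per-patient agreement between the algorithm and manual chart review, so the predicted-positive totals alone do not determine them. A minimal sketch, assuming both labels are available as paired booleans for each patient; the function names and the eight-patient example are hypothetical and are not study data.

```python
from typing import Sequence


def confusion_counts(predicted: Sequence[bool], actual: Sequence[bool]) -> dict[str, int]:
    """Count true/false positives/negatives against the reference standard."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    return {
        "tp": sum(1 for p, a in pairs if p and a),
        "fp": sum(1 for p, a in pairs if p and not a),
        "fn": sum(1 for p, a in pairs if not p and a),
        "tn": sum(1 for p, a in pairs if not p and not a),
    }


def _ratio(num: int, den: int) -> float:
    # Undefined metrics (e.g. no true CAD cases) are reported as NaN.
    return num / den if den else float("nan")


def diagnostic_metrics(predicted: Sequence[bool], actual: Sequence[bool]) -> dict[str, float]:
    c = confusion_counts(predicted, actual)
    total = sum(c.values())
    return {
        "accuracy": _ratio(c["tp"] + c["tn"], total),
        "sensitivity": _ratio(c["tp"], c["tp"] + c["fn"]),  # true-positive rate
        "specificity": _ratio(c["tn"], c["tn"] + c["fp"]),  # true-negative rate
    }


# Hypothetical example: manual review is the reference standard ("actual").
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
m = diagnostic_metrics(predicted, actual)
print(m)
```

In this toy example the algorithm finds 2 of 3 true cases and correctly rejects 4 of 5 non-cases, giving an accuracy of 0.75, sensitivity of about 0.67, and specificity of 0.8.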
Conclusion: EHR usage has created a large pool of minable clinical data. However, without an efficient method to draw inferences from it, the information cannot be used effectively. We have created an algorithm that detects CAD on a large scale with high accuracy, and it has proven useful in a varied patient population. Because its constraints, such as ICD and CPT codes, are standard coding systems, it can be used across many hospital systems, although local validation is prudent. This algorithm can be used to select a population with a propensity for FH, allowing us to screen and manage patients with undiagnosed FH or other familial dyslipidemias.