To compare probabilistic and deterministic algorithms for linking mothers and infants within electronic health records (EHRs) to support pregnancy outcomes research.

Methods
The study population comprised women enrolled in Group Health (Washington State, USA) who delivered a liveborn infant from 2001 through 2008 (N = 33 093 deliveries), together with infant members born in these years. We linked women to infants by surname, address, and dates of birth and delivery using deterministic and probabilistic algorithms. In a subset previously linked using “gold standard” identifiers (N = 14 449), we assessed each approach's sensitivity and positive predictive value (PPV). For deliveries with no “gold standard” linkage (N = 18 644), we compared the algorithms' linkage proportions. We repeated our analyses in an independent test set of deliveries from 2009 through 2013. We reviewed medical records to validate a sample of pairs apparently linked by one algorithm but not the other (N = 51, or 1.4% of discordant pairs).

Results
In the 2001–2008 “gold standard” population, the probabilistic algorithm had a sensitivity of 84.1% (95% CI, 83.5–84.7) and a PPV of 99.3% (99.1–99.4), while the deterministic algorithm had a sensitivity of 74.5% (73.8–75.2) and a PPV of 95.7% (95.4–96.0). In the test set, the probabilistic algorithm again had higher sensitivity and PPV. For deliveries in 2001–2008 with no “gold standard” linkage, the probabilistic algorithm found matched infants for 58.3% and the deterministic algorithm for 52.8%. On medical record review, all 51 reviewed pairs appeared valid.

Conclusions
A probabilistic algorithm improved both linkage proportion and accuracy compared with a deterministic algorithm. Better linkage methods can increase the value of EHRs for pregnancy outcomes research.

Copyright © 2014 John Wiley & Sons, Ltd.
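The abstract does not specify how either algorithm scores a candidate mother-infant pair. As a minimal sketch, assuming a textbook Fellegi-Sunter formulation (a swapped-in technique, not necessarily the authors' method), the Python below contrasts an all-fields-must-agree rule with a weighted score over the three fields named under Methods (surname, address, and delivery/birth date), then computes sensitivity, PPV, and a normal-approximation confidence interval. The `Record` type, the m/u probabilities in `M_U`, the one-day date tolerance, and the threshold of 10 are all hypothetical illustrations, not values from the study.

```python
import math
from dataclasses import dataclass
from datetime import date

# Hypothetical m/u probabilities per field (NOT values from the study):
#   m = P(field agrees | true mother-infant pair)
#   u = P(field agrees | unrelated mother-infant pair)
M_U = {"surname": (0.85, 0.01), "address": (0.90, 0.001), "date": (0.95, 0.003)}


@dataclass
class Record:
    surname: str
    address: str
    event_date: date  # delivery date for mothers, birth date for infants


def agreement(mom: Record, baby: Record) -> dict:
    """Field-level agreement pattern for one candidate mother-infant pair."""
    return {
        "surname": mom.surname.strip().lower() == baby.surname.strip().lower(),
        "address": mom.address.strip().lower() == baby.address.strip().lower(),
        # Hypothetical one-day tolerance between delivery and birth dates.
        "date": abs((mom.event_date - baby.event_date).days) <= 1,
    }


def deterministic_link(mom: Record, baby: Record) -> bool:
    """Rule-based linkage: every field must agree."""
    return all(agreement(mom, baby).values())


def match_weight(mom: Record, baby: Record) -> float:
    """Fellegi-Sunter weight: sum of per-field log2 likelihood ratios."""
    total = 0.0
    for field, agrees in agreement(mom, baby).items():
        m, u = M_U[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def probabilistic_link(mom: Record, baby: Record, threshold: float = 10.0) -> bool:
    """Link when the total weight clears a (hypothetical) cut-off."""
    return match_weight(mom, baby) >= threshold


def sensitivity(tp: int, fn: int) -> float:
    """Share of gold-standard pairs that the algorithm linked."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Share of algorithm-linked pairs that are gold-standard pairs."""
    return tp / (tp + fp)


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation (Wald) 95% CI for a proportion estimated from n units."""
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


if __name__ == "__main__":
    mom = Record("Smith", "12 Elm St", date(2005, 3, 1))
    baby = Record("Jones", "12 elm st", date(2005, 3, 1))  # infant has a different surname
    print(deterministic_link(mom, baby))  # False: surnames differ
    print(probabilistic_link(mom, baby))  # True: address and date weights outweigh the surname mismatch
    print(wald_ci(0.841, 14449))
```

In the example pair the infant carries a different surname, so the deterministic rule rejects it while the weighted score still links on address and date; this is one way a probabilistic approach can gain sensitivity. As an arithmetic check only, and assuming the sensitivity denominator is the 14 449 gold-standard deliveries, `wald_ci(0.841, 14449)` gives roughly (0.835, 0.847), in line with the reported 83.5–84.7; the abstract does not state which interval method was used.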