Comparison of methods for algorithmic classification of dementia status in the Health and Retirement Study




Dementia ascertainment is time-consuming and costly. Several algorithms use existing data from the U.S.-representative Health and Retirement Study (HRS) to identify dementia. However, the relative performance of these algorithms remains unknown.


We compared the performance of five algorithms (Herzog-Wallace, Langa-Kabeto-Weir, Crimmins, Hurd, Wu), overall and within sociodemographic subgroups, among participants in both HRS and Wave A of the Aging, Demographics, and Memory Study (ADAMS, 2000-2002), an HRS sub-study that included in-person dementia ascertainment. We then compared algorithmic performance in an internal (time-split) validation dataset comprising participants in HRS and ADAMS Waves B, C, and/or D (2002-2009).


In the unweighted training data, sensitivity ranged from 53% to 90%, specificity from 79% to 97%, and overall accuracy from 81% to 87%. Although sensitivity was lower in the unweighted validation data (range: 18% to 62%), overall accuracy was similar (range: 79% to 88%) because specificities were higher (range: 82% to 98%). In analyses weighted to represent the age-eligible US population, accuracy ranged from 91% to 94% in the training data and from 87% to 94% in the validation data. Using a 0.5 probability cutoff, Crimmins maximized sensitivity, Herzog-Wallace maximized specificity, and Wu and Hurd maximized accuracy. In both weighted and unweighted analyses, accuracy was higher among younger, highly educated, and non-Hispanic white participants than among their complements.
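The metrics above can be illustrated with a short sketch. It assumes each algorithm outputs a predicted probability of dementia, that a participant is classified as demented when that probability is at or above the 0.5 cutoff, and that the ADAMS diagnosis is the reference standard. The function names (`classification_performance`, `by_subgroup`) and the toy data are hypothetical; this does not reproduce any of the five algorithms or the study's actual analysis code. Survey weights enter simply as per-person counts in the confusion matrix.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Performance:
    sensitivity: float
    specificity: float
    accuracy: float


def _ratio(num: float, den: float) -> float:
    # Undefined when a group has no true cases (or no non-cases).
    return num / den if den > 0 else math.nan


def classification_performance(
    prob_dementia: Sequence[float],
    true_dementia: Sequence[bool],
    weights: Optional[Sequence[float]] = None,
    cutoff: float = 0.5,
) -> Performance:
    """Hypothetical helper: compare predicted probabilities with a reference diagnosis.

    Assumption: probability >= cutoff counts as "dementia". With weights=None
    every participant counts once (unweighted analysis).
    """
    if weights is None:
        weights = [1.0] * len(prob_dementia)
    if not (len(prob_dementia) == len(true_dementia) == len(weights)):
        raise ValueError("inputs must have equal length")
    tp = fn = tn = fp = 0.0
    for p, truth, w in zip(prob_dementia, true_dementia, weights):
        predicted = p >= cutoff
        if truth and predicted:
            tp += w  # true positive
        elif truth:
            fn += w  # missed dementia case
        elif predicted:
            fp += w  # false alarm
        else:
            tn += w  # correctly classified as no dementia
    return Performance(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        accuracy=_ratio(tp + tn, tp + fn + tn + fp),
    )


def by_subgroup(
    prob_dementia: Sequence[float],
    true_dementia: Sequence[bool],
    groups: Sequence[str],
    weights: Optional[Sequence[float]] = None,
    cutoff: float = 0.5,
) -> Dict[str, Performance]:
    """Hypothetical helper: the same metrics computed separately within each subgroup."""
    if weights is None:
        weights = [1.0] * len(prob_dementia)
    results: Dict[str, Performance] = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        results[g] = classification_performance(
            [prob_dementia[i] for i in idx],
            [true_dementia[i] for i in idx],
            [weights[i] for i in idx],
            cutoff,
        )
    return results


if __name__ == "__main__":
    # Toy, made-up data: five participants, two with reference-standard dementia.
    probs = [0.9, 0.4, 0.2, 0.7, 0.1]
    truth = [True, True, False, False, False]
    print(classification_performance(probs, truth))
    # Up-weighting correctly classified non-cases raises accuracy without
    # changing sensitivity.
    print(classification_performance(probs, truth, weights=[1, 1, 4, 1, 4]))
    print(by_subgroup(probs, truth, groups=["a", "a", "b", "b", "b"]))
```

In this toy example the unweighted run gives sensitivity 0.5, specificity 2/3, and accuracy 0.6. Weighting the correctly classified non-cases more heavily lifts accuracy to 9/11 while sensitivity stays at 0.5. Because most of a population-weighted sample does not have dementia, weighted accuracy tends to track specificity more closely than sensitivity.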


Algorithmic diagnoses provide a cost-effective way to conduct dementia research. However, naïve use of existing algorithms in disparities or risk-factor research may induce non-conservative bias. Algorithms with more comparable performance across relevant subgroups are needed.
