Objectives
To determine, using a national survey as a reference, the generalizability of crowdsourced, electronic health data from self-selected individuals.

Methods
In 2015, we used the world’s largest crowdsourcing platform to collect data on characteristics known to influence cardiovascular disease risk, and we identified comparable data from the 2013 Behavioral Risk Factor Surveillance System. We used age-stratified logistic regression models to identify differences among groups.

Results
Crowdsourced respondents were younger, more likely to be non-Hispanic and White, and had higher educational attainment. Those aged 40 to 59 years were similar to US adults in the rates of smoking, diabetes, hypertension, and hyperlipidemia. Those aged 18 to 39 years were less similar, whereas those aged 60 to 75 years were underrepresented among crowdsourced respondents.

Conclusions
Crowdsourced health data might be most generalizable to adults aged 40 to 59 years, but studies of younger or older populations, racial and ethnic minorities, or those with lower educational attainment should approach crowdsourced data with caution.

Public Health Implications
Policymakers, the national Precision Medicine Initiative, and others planning to use crowdsourced data should take explicit steps to define and address the anticipated underrepresentation of important population subgroups.