Purpose:
To assess the accuracy of crowdsourcing for grading optic nerve images for glaucoma using Amazon Mechanical Turk before and after training modules.
Materials and Methods:
Images (n=60) from 2 large population studies were graded for glaucoma status and vertical cup-to-disc ratio (VCDR). In the baseline trial, users on Amazon Mechanical Turk (Turkers) graded fundus photos for glaucoma and VCDR after reviewing annotated example images. In 2 additional trials, Turkers viewed a 26-slide PowerPoint training or a 10-minute video training and passed a quiz before being permitted to grade the same 60 images. Each image was graded by 10 unique Turkers in all trials. The mode of Turker grades for each image was compared with an adjudicated expert grade to determine accuracy as well as the sensitivity and specificity of Turker grading.
Results:
In the baseline study, 50% of the images were graded correctly for glaucoma status and the area under the receiver operating characteristic curve (AUROC) was 0.75 [95% confidence interval (CI), 0.64-0.87]. Post-PowerPoint training, 66.7% of the images were graded correctly with AUROC of 0.86 (95% CI, 0.78-0.95). Finally, Turker grading accuracy was 63.3% with AUROC of 0.89 (95% CI, 0.83-0.96) after video training. Overall, Turker VCDR grades for each image correlated with expert VCDR grades (Bland-Altman plot mean difference=−0.02).
Conclusions:
Turkers graded 60 fundus images quickly and at low cost, with grading accuracy, sensitivity, and specificity all improving with brief training. With effective education, crowdsourcing may be an efficient tool to aid in the identification of glaucomatous changes in retinal images.
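
The mode-based scoring described in the Materials and Methods can be illustrated with a minimal sketch. It is not the study's analysis code. It assumes glaucoma status is coded as a binary label (1 = glaucoma, 0 = not glaucoma), which the abstract does not specify, and it resolves ties among the Turker grades toward the positive class, which is also an assumption. The function names `mode_grade` and `score_against_expert` are hypothetical.

```python
from collections import Counter


def mode_grade(grades):
    """Return the most frequent grade among the Turker grades for one image.

    Assumption: ties resolve to the larger label, i.e. to 1 (glaucoma)
    when labels are coded 0/1.
    """
    counts = Counter(grades)
    top = max(counts.values())
    return max(g for g, c in counts.items() if c == top)


def score_against_expert(turker_grades, expert_grades):
    """Compare the per-image mode grade with the expert grade.

    turker_grades: one list of 0/1 grades per image (10 per image in the study).
    expert_grades: one adjudicated 0/1 grade per image.
    Returns accuracy, sensitivity, and specificity as fractions;
    a metric with an empty denominator is returned as NaN.
    """
    tp = tn = fp = fn = 0
    for grades, expert in zip(turker_grades, expert_grades):
        consensus = mode_grade(grades)
        if consensus == 1 and expert == 1:
            tp += 1
        elif consensus == 0 and expert == 0:
            tn += 1
        elif consensus == 1 and expert == 0:
            fp += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

Calling `score_against_expert` with one list of grades per image and the matching expert grades returns the three summary metrics for that trial, so the baseline, PowerPoint, and video trials could each be scored the same way.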