Evidence-based practice (EBP) is a dominant paradigm in healthcare that aims to deliver the highest-quality patient care. EBP requires clinicians to integrate the best-available, current research evidence with their own clinical expertise, and to consider patients’ needs and preferences, when making clinical decisions. Consideration of the ‘best’ evidence requires clinicians to evaluate the scientific quality of published studies (i.e., undertake critical appraisal); however, recognised barriers to this process include a lack of skill, a lack of time, and the sheer volume of published research.

Objectives
To overcome these established barriers to EBP, we developed a free, online tool that teaches critical appraisal and facilitates the sharing of appraisals amongst a global community of clinicians (CrowdCARE, Crowdsourcing Critical Appraisal of Research Evidence: crowdcare.unimelb.edu.au). Our aim was to investigate the rigour of crowdsourcing critical appraisal from trained novice raters using CrowdCARE.

Method
Systematic reviews (n=71) were critically appraised in CrowdCARE by five trained novice raters and two expert raters. For each article, the appraisal was performed using a validated tool (Assessing Methodological Quality of Systematic Reviews, AMSTAR) to yield: (i) an aggregate quality score (range: 0–11); and (ii) a domain-specific response for each of the 11 assessment items. After performing independent appraisals, the experts resolved any disagreements by consensus to produce an ‘expert consensus’ rating, the gold-standard approach for appraisal in systematic reviews. For the novices, the mean aggregate score was calculated.
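The scoring just described can be sketched as follows. This is an illustrative sketch with hypothetical data, not the study’s actual code; it assumes the usual AMSTAR convention that each of the 11 items is answered “yes”, “no”, “can’t answer”, or “not applicable”, with only “yes” contributing a point to the 0–11 aggregate score.

```python
def amstar_score(responses):
    """Aggregate AMSTAR quality score: one point per 'yes' across 11 items."""
    if len(responses) != 11:
        raise ValueError("AMSTAR has exactly 11 assessment items")
    return sum(r == "yes" for r in responses)

def mean_novice_score(per_rater_responses):
    """Mean aggregate score across several novice raters for one article."""
    scores = [amstar_score(r) for r in per_rater_responses]
    return sum(scores) / len(scores)

# Hypothetical ratings of one article by two novices (9 and 8 'yes' items).
novices = [["yes"] * 9 + ["no"] * 2, ["yes"] * 8 + ["no"] * 3]
print(mean_novice_score(novices))  # 8.5
```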
Critical appraisal quality was investigated by: (i) assessing variability in AMSTAR scoring, both between experts and between the expert consensus and mean novice ratings; (ii) calculating the concordance of ratings using Cohen’s Kappa (κ); and (iii) identifying ‘contentious’ AMSTAR items, defined as items for which more than half of the novice raters provided a different response from the expert consensus rating.

Results
The variability in aggregate AMSTAR scores was similar between the expert raters, and between the expert consensus and mean novice ratings. Comparing the expert consensus rating with the individual expert ratings, the AMSTAR score was within ±1 unit for 82% of studies; comparing the expert consensus rating with the mean novice rating, it was within ±1 unit for 87% of studies. A strong correlation was evident between the expert consensus rating and the mean novice rating (Pearson’s r² = 0.89, p < 0.0001). Rating concordance, evaluated using Cohen’s Kappa, indicated good overall agreement (κ = 0.67, 95% CI: 0.61 to 0.73) between the expert consensus and mean novice ratings. Furthermore, for 82% of articles, the mean novice assessment was consistent with the expert consensus assessment on at least nine of the 11 individual AMSTAR assessment items.

Conclusions
These data are the first to demonstrate the merit of crowdsourcing for assessing research quality. We show that novices can be trained to critically appraise systematic reviews in CrowdCARE and, overall, achieve a high degree of accuracy relative to experts. CrowdCARE provides clinicians with the essential skills to appraise research quality, and contributes to making EBP more efficient by removing the substantial duplication of effort currently made by individual clinicians across the globe. The CrowdCARE data stream can support efficient and rapid evidence synthesis for clinical guidelines and systematic reviews, informing practice and/or policy based upon the best-available research evidence.