Measuring interobserver variation in a pathology EQA scheme using weighted κ for multiple readers




A Urological Pathology External Quality Assurance (EQA) Scheme in the UK has reported observer variation in the diagnosis and grading of adenocarcinoma in prostatic biopsies using basic κ statistics, which rate all disagreements equally.


The aim of this study is to use customised weighting schemes to report κ statistics that reflect the closeness of interobserver agreement in the prostate EQA scheme.


A total of 83, 114 and 116 pathologists took part, respectively, in three web-based circulations and were classified as either expert or other readers. For analyses of diagnosis, there were 10, 8 and 8 cases in the three circulations, respectively. For analyses of Gleason Sum Score, only invasive cases were included, leaving 5, 5 and 6 cases, respectively. Analyses were conducted using customised weighting schemes with ‘pairwise-weighted’ κ for multiple readers.
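To illustrate how a customised weighting scheme changes a multi-reader κ, the following is a minimal sketch, not the study's implementation. The abstract does not define 'pairwise-weighted' κ, so the sketch assumes one plausible reading: Cohen's weighted κ computed for every pair of readers and then averaged. The category coding, the partial-credit weights and the ratings are all hypothetical.

```python
from itertools import combinations


def weighted_kappa(a, b, weights):
    """Cohen's weighted kappa for two readers rating the same cases.

    a, b    : lists of category indices (0..k-1), one entry per case.
    weights : k x k agreement weights, 1.0 = full credit, 0.0 = no credit.
    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(a)
    k = len(weights)
    # Marginal proportion of each category for each reader.
    pa = [sum(1 for x in a if x == c) / n for c in range(k)]
    pb = [sum(1 for x in b if x == c) / n for c in range(k)]
    # Observed and chance-expected weighted agreement.
    po = sum(weights[x][y] for x, y in zip(a, b)) / n
    pe = sum(weights[i][j] * pa[i] * pb[j] for i in range(k) for j in range(k))
    return (po - pe) / (1 - pe)


def mean_pairwise_kappa(ratings, weights):
    """Average weighted kappa over all reader pairs (assumed definition)."""
    pairs = list(combinations(ratings, 2))
    return sum(weighted_kappa(a, b, weights) for a, b in pairs) / len(pairs)


# Hypothetical coding: 0 = benign, 1 = suspicious, 2 = malignant.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]        # all disagreements equal
PARTIAL = [[1, 0.5, 0], [0.5, 1, 0.5], [0, 0.5, 1]]  # adjacent = half credit

# Hypothetical ratings: three readers, six cases each.
ratings = [
    [0, 1, 2, 2, 1, 0],
    [0, 1, 2, 1, 1, 0],
    [0, 2, 2, 2, 1, 1],
]

unweighted = mean_pairwise_kappa(ratings, IDENTITY)
weighted = mean_pairwise_kappa(ratings, PARTIAL)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this toy data every disagreement falls between adjacent categories, so giving partial credit raises κ, which mirrors the pattern the abstract reports for Gleason Sum Score.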


Analysis of diagnosis for all circulations and all readers gave a composite κ value of 0.86 and pairwise-weighted κ (κp–w) value of 0.91, both regarded as ‘almost perfect’ agreement. The higher weighted value was due to the high proportion of responses that showed partial agreement. Analysis of Gleason Sum Score gave κ=0.38 and κp–w=0.58 over all circulations and all readers, indicating that discrepancies occur at the boundary between adjacent grades and may not be as clinically significant as suggested by composite κ.


Weighted κ statistics show higher levels of agreement than previously reported because they apply weights that reflect the relative importance of different types of discordance in diagnosis or grading. Agreement on grading nevertheless remained low.
