The validity and reliability of expert-based assessments can be improved by using multiple raters. However, to maximise scarce resources, the use of multiple raters should focus on jobs for which experts are more likely to disagree. For comparisons of agreement across subgroups, the standard metric Kappa must be used cautiously because it is sensitive to the ratings’ marginal distribution. As an alternative, we used Kappa’s numerator: the difference between observed and expected agreement. This value equals the Mean Risk Stratification (MRS), a novel metric also used to evaluate the predictiveness of risk models. MRS is interpreted as the number of observations (per 100) on which raters will agree beyond chance. For subgroups of jobs in three industries, stratified on four characteristics, we evaluated quadratically weighted MRS from six experts’ ordinal, 4-category exposure ratings (67–74 workers per industry). In all industries, MRS was consistently lower for jobs in far vs. near proximity to an exposure source and for jobs with multiple vs. a single work location, with experts agreeing on 2–8 fewer jobs (per 100) for far-proximity jobs and 0.4–12 fewer (per 100) for jobs with multiple work locations. MRS was also lower for jobs with subject-reported non-visible vs. visible dust accumulation in two industries (difference: 1–6 jobs) and for non-production vs. production jobs in one industry (difference: 9 jobs). The use of MRS allowed us to identify job characteristics associated with lower agreement between experts and to quantify the potential benefit of using multiple raters.
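The quantity described above, weighted Kappa’s numerator (weighted observed minus weighted chance agreement), can be sketched for a pair of raters as follows. This is a minimal illustration assuming standard quadratic weights w(i,j) = 1 − (i − j)²/(k − 1)²; the `weighted_mrs` helper and the example ratings are hypothetical, not data from the study.

```python
import numpy as np

def weighted_mrs(r1, r2, k=4):
    """Quadratically weighted MRS: weighted observed agreement minus
    weighted chance agreement (the numerator of weighted Kappa).
    Multiply by 100 to read as 'agreements beyond chance per 100 jobs'.
    r1, r2 are two raters' integer category labels in 0..k-1."""
    n = len(r1)
    # joint distribution of the two raters' ratings
    joint = np.zeros((k, k))
    for a, b in zip(r1, r2):
        joint[a, b] += 1
    joint /= n
    # quadratic agreement weights: 1 on the diagonal,
    # falling off with squared category distance
    idx = np.arange(k)
    w = 1.0 - (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2
    p_obs = (w * joint).sum()  # weighted observed agreement
    # product of marginals gives agreement expected by chance
    marg = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    p_exp = (w * marg).sum()
    return p_obs - p_exp

# hypothetical 4-category ratings from two experts on eight jobs
r1 = [0, 1, 2, 3, 1, 2, 0, 3]
r2 = [0, 1, 2, 2, 1, 3, 1, 3]
print(round(100 * weighted_mrs(r1, r2), 1))  # prints 20.8
```

Because the marginal-dependent denominator of Kappa is dropped, this difference can be compared across job subgroups with different rating distributions, which is the property the abstract exploits.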