Abstract ranking processes for scientific conferences are essential but controversial. This study examined the validity of a structured abstract rating instrument, evaluated interrater variability, and modeled the impact of interrater variability on abstract ranking decisions. Additionally, we examined whether a more efficient rating process (abstracts rated by two rather than three raters) supported valid abstract rankings.

Methods:
Data were 4016 sets of abstract ratings from the 2011 and 2013 national scientific conferences for a health discipline. Many-faceted Rasch analysis procedures were used to examine the validity of the abstract rating instrument and to identify and adjust for interrater variability. The two-rater simulation was created by deleting one set of ratings for each abstract in the 2013 data set.

Results:
The abstract rating instrument demonstrated sound measurement properties. Although each rater applied the rating criteria consistently (high intrarater reliability), there was significant variability between raters. Adjusting for interrater variability changed the final presentation format for approximately 10–20% of abstracts. The two-rater simulation demonstrated that abstract rankings derived through this process remained valid, although the impact of interrater variability was more pronounced.

Discussion:
Interrater variability exerts a small but important influence on overall abstract acceptance outcomes. Many-faceted Rasch analysis allows this variability to be identified and adjusted for. Additionally, Rasch procedures support more efficient abstract ranking by reducing the number of raters required per abstract.
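The two-rater simulation described in Methods can be sketched as follows. This is a minimal illustration, not the study's code: the data structures, function name, and random deletion rule are assumptions (the abstract says only that one set of ratings was deleted per abstract).

```python
import random

def simulate_two_raters(ratings_by_abstract, seed=0):
    """Simulate a two-rater panel by removing one rater's rating set
    per abstract. Input maps abstract IDs to lists of per-rater rating
    sets (hypothetical structure; which set is deleted is assumed to
    be chosen at random)."""
    rng = random.Random(seed)
    reduced = {}
    for abstract_id, rater_sets in ratings_by_abstract.items():
        keep = list(rater_sets)            # copy so the input is unchanged
        keep.pop(rng.randrange(len(keep))) # drop one rater at random
        reduced[abstract_id] = keep
    return reduced

# Toy data: three raters each scoring one abstract on four criteria.
ratings = {"A-001": [[4, 3, 5, 4], [3, 3, 4, 4], [5, 4, 4, 3]]}
reduced = simulate_two_raters(ratings)
print(len(reduced["A-001"]))  # → 2
```

The reduced rating sets would then be re-run through the same many-faceted Rasch analysis to compare two-rater rankings against the original three-rater rankings.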