Operational USMLE™ computer-based case simulation results were examined to determine the extent to which rater reliability and regression model performance met expectations based on preoperational data.
Method. Operational data resulted from Step 3 examinations given between 1999 and 2004. Plots were produced using reliability and multiple correlation coefficients.
Results. Operational testing reliabilities increased over the four years but were lower than the preoperational reliability. Multiple correlation coefficient results are somewhat superior to the results reported during the preoperational period and suggest that the operational scoring algorithms have been relatively consistent.
Conclusions. Changes in the rater population, changes in the rating task, and enhancements to the training procedures are several factors that can explain the identified differences between preoperational and operational results. The present findings have important implications for test development and test validity.