We conducted a pilot study to test the interrater reliability of emergency department (ED) physician assessments of 3 ED visit attributes (severity, immediacy, and ideal setting), with the long-term goal of developing a novel ED categorization system.

Methods:
Using 2010 National Hospital Ambulatory Care Survey data, we randomly selected 300 ED patient records for review by 6 emergency medicine physicians. Each record was assessed by 2 physicians for severity and immediacy using a 7-point scale; the "ideal" setting was chosen from 6 possible settings. Weighted and unweighted κ statistics and intraclass correlation coefficients were used to test interrater agreement.

Results:
For severity, immediacy, and ideal setting, agreement was "fair," with weighted κ of 0.33 (95% confidence interval [CI], 0.27-0.40) and 0.30 (95% CI, 0.23-0.36) and unweighted κ of 0.28 (95% CI, 0.21-0.34), respectively. When both raters were "very certain" about their assessments, weighted κ increased to 0.42 (95% CI, 0.34-0.51) for severity and 0.35 (95% CI, 0.27-0.44) for immediacy. Intraclass correlation coefficients showed similar results. Raters agreed on ideal setting in 162 (54%) of 300 scenarios. Scenarios with poor agreement on ideal setting generally involved care for nonspecific symptoms rather than specific diagnoses.

Conclusions:
Rater agreement among ED physicians assessing clinical data on specific ED visits was fair for severity and immediacy ratings. Raters agreed on the ideal treatment setting about half the time. Agreement was generally greater when a specific diagnosis was identified than when workups for symptoms were negative. These findings point to a validity concern in developing and applying categorization systems for ED visits and in assessing setting appropriateness.
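For readers unfamiliar with the agreement statistic used in the Methods, the following is a minimal illustrative sketch (not the study's code) of linearly weighted Cohen's κ for two raters scoring the same cases on a 7-point ordinal scale; the ratings shown are hypothetical, and the choice of linear weights is an assumption for illustration.

```python
def weighted_kappa(ratings_a, ratings_b, categories=7):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale."""
    n = len(ratings_a)
    # Observed joint counts and per-rater marginal counts
    observed = [[0.0] * categories for _ in range(categories)]
    marg_a = [0.0] * categories
    marg_b = [0.0] * categories
    for a, b in zip(ratings_a, ratings_b):
        observed[a - 1][b - 1] += 1
        marg_a[a - 1] += 1
        marg_b[b - 1] += 1
    # Linear disagreement weights: 0 on the diagonal, growing with
    # the distance between the two ratings on the ordinal scale
    obs_dis = 0.0  # weighted observed disagreement
    exp_dis = 0.0  # weighted chance-expected disagreement
    for i in range(categories):
        for j in range(categories):
            w = abs(i - j)
            obs_dis += w * observed[i][j] / n
            exp_dis += w * (marg_a[i] / n) * (marg_b[j] / n)
    return 1.0 - obs_dis / exp_dis

# Hypothetical paired severity ratings (1-7); perfect agreement yields 1.0
a = [1, 2, 2, 3, 4, 5, 5, 6, 7, 3]
b = [1, 2, 3, 3, 4, 4, 5, 6, 6, 4]
print(round(weighted_kappa(a, b), 2))
```

Because near-miss disagreements (e.g., 4 vs 5) are penalized less than distant ones (e.g., 1 vs 7), weighted κ is the conventional choice for ordinal scales such as the 7-point severity and immediacy ratings, while unweighted κ suits the nominal ideal-setting choice.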