Providing the best treatment options and appropriate prognostic information to patients with cartilaginous neoplasms of long bones depends on distinguishing benign from malignant lesions. Correlative interpretation of imaging, histopathology, and clinical information is the current method for making this distinction, yet the reliability of this approach has not been critically evaluated. This study quantifies the interobserver reliability of the determination of grade for cartilaginous neoplasms among a group of experienced musculoskeletal pathologists and radiologists.

Methods
Nine recognized musculoskeletal pathologists and eight recognized musculoskeletal radiologists reviewed forty-six consecutive cases of cartilaginous lesions in long bones that underwent open biopsy or intralesional curettage. All diagnosticians had a bulleted history and preoperative conventional radiographs for review. Pathologists reviewed the original hematoxylin and eosin-stained glass slides from each case. Radiologists reviewed any additional imaging that was available, variably including serial radiographs, magnetic resonance imaging, and computed tomography scans. Each diagnostician classified a lesion as benign, low-grade malignant, or high-grade malignant. Kappa coefficients were calculated as a measure of reliability.

Results
Kappa coefficients for interrater reliability were 0.443 for the pathologists and 0.345 for the radiologists (p < 0.0001 for both). Kappa coefficients for a subgroup of cases determined to be high risk by subsequent clinical course were poorer at 0.236 and 0.206, respectively (p < 0.0001 for both). Slightly improved agreement among radiologists was noted for the twenty lesions that had magnetic resonance imaging available (kappa = 0.437, p < 0.0001), but not for the lesions analyzed with serial plain radiographs or computed tomography scans.

Conclusions
This study demonstrates low reliability for the grading of cartilaginous lesions in long bones, even among specialized and experienced pathologists and radiologists. This included low reliability both in differentiating benign from malignant lesions and in differentiating high-grade from low-grade malignant lesions, both of which are critical to the safe treatment of these neoplasms. This may explain in part the wide variation in outcomes reported for chondrosarcomas treated in different medical centers. New diagnostic and grading strategies linked to protocol-driven treatments are needed, but they must be measured against the long-term gold standard of patient outcomes.
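For readers unfamiliar with the statistic, the following is a minimal sketch of how a multi-rater kappa coefficient can be computed from three-category gradings such as those described in the Methods. The abstract does not state which kappa variant was used; the sketch assumes Fleiss' kappa, a common choice when more than two raters grade every case. The function name `fleiss_kappa`, the label strings in `GRADES`, and the toy ratings are hypothetical and are not study data.

```python
from collections import Counter

# Hypothetical label strings for the three grades named in the Methods.
GRADES = ("benign", "low-grade", "high-grade")


def fleiss_kappa(ratings, categories=GRADES):
    """Fleiss' kappa for cases that were each graded by the same number of raters.

    ratings: list of cases; each case is a list holding one label per rater.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    # How many raters chose each category, per case.
    counts = [Counter(case) for case in ratings]

    # Observed agreement: per-case proportion of agreeing rater pairs, averaged.
    per_case = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_case) / n_cases

    # Chance agreement from the overall share of ratings in each category.
    shares = [sum(c[k] for c in counts) / (n_cases * n_raters) for k in categories]
    expected = sum(p * p for p in shares)

    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (observed - expected) / (1.0 - expected)


# Toy data (not from the study): four cases, each graded by three raters.
toy_ratings = [
    ["benign", "benign", "low-grade"],
    ["low-grade", "low-grade", "low-grade"],
    ["high-grade", "low-grade", "high-grade"],
    ["benign", "benign", "benign"],
]
print(round(fleiss_kappa(toy_ratings), 3))
```

A value of 1 indicates complete agreement and a value near 0 indicates agreement no better than chance, which is the scale on which the coefficients in the Results are reported.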