Scoring systems are frequently used to assess the outcome of total hip arthroplasty. The result may be presented as a numeric value or in descriptive terms such as excellent, good, fair, and poor (category system). The current study investigated how descriptive and numeric presentation of outcomes influences the interobserver reliability and interscore correlation of five different hip scores. Sixty-four patients (83 hips) were included in the study. The average age of the patients at followup was 70 years (range, 48–88 years). The average followup was 6.2 years (range, 2–17 years). For the numeric outcome, higher interobserver reliability (correlation coefficient, 0.71–0.81) and interscore correlation (correlation coefficient, 0.81–0.92) were found than for the category system (interobserver reliability [correlation coefficient, 0.57–0.72]; interscore correlation [correlation coefficient, 0.46–0.62]). These findings suggest that categorizing the results of total hip arthroplasty reduces interobserver reliability and interscore correlation.