The widespread use of standardized health surveys is predicated on the largely untested assumption that scales constructed from those surveys will satisfy minimum psychometric requirements across diverse population groups. Data from the Medical Outcomes Study (MOS) were used to evaluate data completeness and quality, test scaling assumptions, and estimate internal-consistency reliability for the eight scales constructed from the MOS SF-36 Health Survey. Analyses were conducted among 3,445 patients and were replicated across 24 subgroups differing in sociodemographic characteristics, diagnosis, and disease severity. For each scale, item-completion rates were high across all groups (88% to 95%), but tended to be somewhat lower among the elderly, those with less than a high school education, and those in poverty. On average, surveys were complete enough to compute scale scores for more than 96% of the sample. Across patient groups, the scales passed nearly all tests of item-internal consistency (97% passed) and item-discriminant validity (92% passed). Reliability coefficients ranged from a low of 0.65 to a high of 0.94 across scales (median = 0.85) and varied somewhat across patient subgroups. Floor effects were negligible except for the two role disability scales. Noteworthy ceiling effects were observed for both role disability scales and the social functioning scale. These findings support the use of the SF-36 survey across the diverse populations studied and identify the population groups in which use of standardized health status measures is more, or less, likely to be problematic.
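The reliability and floor/ceiling statistics summarized above can be illustrated with a minimal sketch. It assumes that internal-consistency reliability is estimated with coefficient alpha (Cronbach's alpha), which the abstract does not state explicitly, and that scale scores run from 0 to 100. The function names `cronbach_alpha` and `floor_ceiling` and the toy data are hypothetical and are not taken from the MOS analyses.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for one scale.

    rows: list of respondents, each a list of k item scores
    (complete cases only; assumed estimator, not confirmed by the source).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def floor_ceiling(scores, lo=0, hi=100):
    """Proportion of respondents at the lowest and highest possible score."""
    n = len(scores)
    at_floor = sum(s == lo for s in scores) / n
    at_ceiling = sum(s == hi for s in scores) / n
    return at_floor, at_ceiling


# Hypothetical toy data: three respondents, three perfectly consistent items.
toy_items = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
alpha = cronbach_alpha(toy_items)

# Hypothetical scale scores on an assumed 0-100 range.
floor_share, ceiling_share = floor_ceiling([0, 50, 100, 100])
```

Running the same two functions within each subgroup, rather than on the pooled sample, is one way to reproduce the kind of subgroup comparison the abstract describes.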