We analyzed diatom and water chemistry data collected by The Academy of Natural Sciences from 47 rivers throughout the eastern United States to address several ecological questions. How does the composition of diatom assemblages vary over large regional scales? What are the most important environmental factors affecting assemblage composition and how does their influence vary among regions and with spatial scale? How do distributions and autecological characteristics of individual taxa vary spatially? What are the implications of answers to these questions for use of diatoms as water quality indicators? Data for 186 samples at 116 sites were collected from 1951 to 1991 on moderate- to large-sized rivers ranging from Maine to Texas as part of Academy monitoring and survey programs, most initiated and implemented by Dr. Ruth Patrick. Several sites were highly impaired by point and non-point source pollution. Diatom assemblages grouped into four main categories, based on multivariate analyses. Group membership correlated equally well with intermediate-scale geographic regions and water chemistry: (1) Northeastern US rivers with lower alkalinity and hardness, and pH 6.5-7.8; (2) Primarily dilute coastal plain rivers in the southeastern United States with the lowest average pH (5.5-7.3) of all sites and some with high DOC; (3) Rivers within and west of the Appalachian Mountains, generally having higher pH (>7.5) than those in other regions, but with relatively low chloride concentrations; and (4) Gulf Coast rivers with the highest chloride (>100 mg l⁻¹), hardness (>250 mg l⁻¹), and pH of rivers in all the groups. Hardness, pH, alkalinity, and Cl explained most of the variation among diatom assemblages, based on ordination analysis. Factors related to water quality problems, such as BOD, P, NH4, and turbidity explained much less variability at the eastern US scale, but were more important in the four intermediate-scale regions.
Abundance-weighted mean values of water chemistry characteristics for individual diatom taxa varied among the four intermediate-scale regions, often greatly, and in proportion to the average measured values for each region. Design of calibration data sets for development of water quality indicators should account for spatial scale in relation to species dispersal, regional geochemistry and habitat types, and human-influenced water chemistry characteristics.
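
The abundance-weighted mean mentioned above can be illustrated with a minimal sketch. It assumes the usual weighted-averaging form, in which a taxon's value for a chemistry variable is the mean of the measured values across samples, weighted by the taxon's abundance in each sample. The function name, the region labels, and all numbers are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from typing import Sequence


def abundance_weighted_mean(abundances: Sequence[float],
                            values: Sequence[float]) -> float:
    """Mean of a chemistry variable across samples, weighted by one
    taxon's abundance in each sample (assumed weighted-averaging form)."""
    if len(abundances) != len(values):
        raise ValueError("abundances and values must have the same length")
    total = sum(abundances)
    if total <= 0:
        raise ValueError("taxon is absent from all samples")
    return sum(a * v for a, v in zip(abundances, values)) / total


# Hypothetical counts of one taxon and measured pH in two made-up regions.
region_a_counts = [10, 30, 60]
region_a_ph = [6.0, 7.0, 8.0]
region_b_counts = [20, 60, 20]
region_b_ph = [5.5, 6.5, 7.5]

wa_region_a = abundance_weighted_mean(region_a_counts, region_a_ph)
wa_region_b = abundance_weighted_mean(region_b_counts, region_b_ph)
print(wa_region_a, wa_region_b)
```

In this made-up case the same taxon receives a higher weighted-mean pH in the region whose samples have higher measured pH, which is the kind of regional dependence that a calibration data set would need to account for.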