No consensus exists regarding whether early response to an antidepressant strongly predicts a good outcome, what the criterion for early response should be, or when to measure it. We hypothesized that early response (a decrease of at least 20% in the 21-item Hamilton Rating Scale for Depression [HAM-D21] score) after any of weeks 1, 2, or 3 of fluoxetine treatment of major depression in geriatric outpatients would predict a favorable outcome by week 6 or an earlier endpoint accurately enough for clinical use. We also hypothesized that the week 1, 2, and 3 percent changes in HAM-D21 would predict the percent change at week 6 (or endpoint) accurately enough for clinical use. We enrolled 671 elderly outpatients with unipolar DSM-III-R major depression in a double-blind, placebo-controlled trial of fluoxetine, 20 mg/day. For analysis, fluoxetine-treated patients were randomly divided into a development set (N = 154) for a preliminary test of our criteria and a validation set (N = 181) to validate the results obtained in the development set. Early responders at weeks 1, 2, and 3 were statistically significantly more likely to experience marked improvement or remission than patients without early response. However, at week 3, this criterion correctly classified only about three-fourths of patients with regard to marked improvement and only about two-thirds with regard to remission. Moreover, about one-third of patients predicted to experience marked improvement and about three-fifths of those predicted to remit did not. The continuous variable, percent change in HAM-D21, did not produce predictive results of any greater clinical utility. We believe that the sensitivity, specificity, false-positive rate, false-negative rate, and kappa of outcome predictions should all be reported in future studies. Without a full set of descriptive statistics, clinicians can be misled by statistically significant results. (J Clin Psychopharmacol 1995;15:421-427).
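
To make the recommended descriptive statistics concrete, the following is a minimal sketch of how they could be computed from a 2 x 2 table of early response against final outcome. It is not the authors' analysis code. The names `PredictionTable`, `percent_decrease`, `is_early_responder`, and `describe` are hypothetical, the counts in the usage example are invented for illustration and are not data from this study, and the false-positive rate is assumed to follow the conventional definition FP / (FP + TN). The proportion of positive predictions that turned out wrong, FP / (TP + FP), is reported separately because it corresponds to the "predicted to improve but did not" figures quoted above. Kappa is computed as Cohen's kappa, which is also an assumption.

```python
from dataclasses import dataclass


def percent_decrease(baseline: float, score: float) -> float:
    """Percent decrease in a rating-scale score relative to baseline."""
    return 100.0 * (baseline - score) / baseline


def is_early_responder(baseline: float, score: float, threshold: float = 20.0) -> bool:
    """Early response as stated in the abstract: a decrease of at least 20%."""
    return percent_decrease(baseline, score) >= threshold


@dataclass(frozen=True)
class PredictionTable:
    tp: int  # early responders who reached the favorable outcome
    fp: int  # early responders who did not
    fn: int  # non-early-responders who reached the favorable outcome
    tn: int  # non-early-responders who did not


def describe(t: PredictionTable) -> dict:
    """Descriptive statistics for a binary prediction.

    Assumes every row and column total of the table is non-zero.
    """
    n = t.tp + t.fp + t.fn + t.tn
    p_observed = (t.tp + t.tn) / n  # proportion correctly classified
    # Agreement expected by chance, from the marginal totals (Cohen's kappa).
    p_expected = (
        (t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)
    ) / (n * n)
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "false_positive_rate": t.fp / (t.fp + t.tn),  # assumed definition
        "false_negative_rate": t.fn / (t.fn + t.tp),
        "wrong_positive_predictions": t.fp / (t.tp + t.fp),
        "correctly_classified": p_observed,
        "kappa": (p_observed - p_expected) / (1.0 - p_expected),
    }


# Illustrative counts only; NOT taken from the study.
example = PredictionTable(tp=40, fp=20, fn=10, tn=30)
for name, value in describe(example).items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, 70% of patients are classified correctly, yet kappa is only about 0.40 and one-third of positive predictions are wrong. This is the kind of gap between a significant association and clinically useful prediction that a full set of descriptive statistics exposes.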