It can be difficult to judge the effectiveness of encoding techniques in a within-subject design. Consider the production effect, the finding that words read aloud are better remembered than words read silently. In the absence of a baseline, a within-subject production effect in a mixed study list could reflect a benefit of reading aloud, a cost of reading silently, or both. To help interpret within-subject data, memory researchers have compared within-subject and between-subjects designs, using the between-subjects (i.e., pure-list) conditions as baselines for the within-subject (i.e., mixed-list) conditions. In the present article, the authors highlight a shortcoming of using this comparison to assess costs and benefits in recognition. Unlike between-subjects experiments, in which a separate false alarm rate is obtained for each condition, the typical within-subject experiment yields a single collapsed false alarm rate, which, the authors argue, can bias calculations of memory discrimination (d′). Across 3 experiments that used production as the encoding manipulation, the authors first employed a typical mixed-list versus pure-list design (Experiment 1) and then modified this design (Experiments 2 and 3) so that it yielded separate mixed-list false alarm rates. The results of the latter 2 experiments demonstrated that words read aloud in a mixed list have an overall memorial benefit over words read aloud in a pure list, in terms of both increased hits and reduced false alarms. The authors frame these results in terms of the distinctiveness heuristic.
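To illustrate why a collapsed false alarm rate can distort discrimination estimates, the following minimal sketch uses the standard signal detection formula d′ = z(H) − z(FA), where z is the inverse of the standard normal CDF. All hit and false alarm rates below are hypothetical values invented for illustration. They are not data from these experiments. The collapsed rate is a simple average, which assumes equal numbers of lures for each item type. The sketch does not model how a design yields condition-specific false alarm rates.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Signal detection sensitivity: d' = z(H) - z(FA)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical mixed-list rates, invented for illustration only.
hits = {"aloud": 0.80, "silent": 0.60}
separate_fa = {"aloud": 0.10, "silent": 0.30}

# A typical within-subject test yields one false alarm rate for all lures.
# Averaging assumes equal numbers of lures per item type (an assumption).
collapsed_fa = sum(separate_fa.values()) / len(separate_fa)

# Compare d' computed with the collapsed rate vs. condition-specific rates.
results = {
    cond: {
        "collapsed": d_prime(hits[cond], collapsed_fa),
        "separate": d_prime(hits[cond], separate_fa[cond]),
    }
    for cond in hits
}

for cond, vals in results.items():
    print(f"{cond}: d' collapsed = {vals['collapsed']:.2f}, "
          f"d' separate = {vals['separate']:.2f}")
```

With these made-up rates, the collapsed false alarm rate shrinks the apparent aloud-versus-silent difference in d′, because it credits silent items with fewer false alarms and aloud items with more than their condition-specific rates would imply.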