Any mature field of research in psychology—such as short-term/working memory—is characterized by a wealth of empirical findings. It is currently unrealistic to expect a theory to explain them all; theorists must satisfice with explaining a subset of findings. The aim of the present article is to make the choice of that subset less arbitrary and idiosyncratic than is current practice. We propose criteria for identifying benchmark findings that every theory in a field should be able to explain: Benchmarks should be reproducible, generalize across materials and methodological variations, and be theoretically informative. We propose a set of benchmarks for theories and computational models of short-term and working memory. The benchmarks are described in as theory-neutral a way as possible, so that they can serve as empirical common ground for competing theoretical approaches. Benchmarks are rated on three levels according to their priority for explanation. Selection and ratings of the benchmarks is based on consensus among the authors, who jointly represent a broad range of theoretical perspectives on working memory, and they are supported by a survey among other experts on working memory. The article is accompanied by a web page providing an open forum for discussion and for submitting proposals for new benchmarks; and a repository for reference data sets for each benchmark.