Evaluations of assessment instruments based on classical test theory typically rely on indices of internal consistency, test–retest reliability, and construct validity. Models from item response theory (IRT), by contrast, allow instruments (and items) to be compared in terms of how much information they provide and where along the severity continuum of the assessed construct they provide it. Such results help to identify the measures most appropriate for specific clinical and research contexts. The present study used IRT methods to examine the functioning of the Beck Depression Inventory (BDI), the Center for Epidemiologic Studies – Depression (CES-D) scale, and the nine primary symptoms from the depression module of the Schedule for Affective Disorders and Schizophrenia – Children (K-SADS). A large sample of adolescents (n = 1709) completed all three measures. IRT calibration analyses demonstrated that the BDI and CES-D scale performed well in similar ranges of depressive severity (approximately −1 to +3 standard deviations [SDs]), although the BDI provided more information at higher severity levels and the CES-D scale at lower severity levels. The K-SADS depression items, which are dichotomous and focused on clinical disorder, provided the least information, and that information was restricted to the narrowest range (approximately +1 to +3 SDs). These findings are consistent with the prior rationale for using the BDI in clinical samples and the CES-D scale in epidemiological studies. The results for the K-SADS suggest that interview measures may benefit from increasing the number of items and/or response options to collect more psychometric information. Copyright © 2012 John Wiley & Sons, Ltd.
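To make concrete the idea of "information" provided at different points along the severity continuum, the following is a minimal illustrative sketch, not the calibration used in the study. It assumes a two-parameter logistic (2PL) model for dichotomous items, in which an item with discrimination a and severity location b has Fisher information a²·P(θ)·(1 − P(θ)) at severity θ. The function names and all item parameters are hypothetical and chosen only to show how item locations determine where a scale is informative.

```python
import math


def item_information(theta, a, b):
    """Fisher information of one 2PL item at severity level theta.

    a: discrimination, b: severity location (both hypothetical here).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))  # probability of endorsement
    return a * a * p * (1.0 - p)


def scale_information(theta, items):
    """Information of a whole scale: the sum of its item informations."""
    return sum(item_information(theta, a, b) for a, b in items)


# Hypothetical (a, b) pairs, NOT estimates from the study.
low_severity_items = [(1.5, -0.5), (1.5, 0.0), (1.5, 0.5)]
high_severity_items = [(1.5, 1.5), (1.5, 2.0), (1.5, 2.5)]

if __name__ == "__main__":
    # Print both information curves on a coarse grid of severity levels (in SDs).
    for theta in (-1.0, 0.0, 1.0, 2.0, 3.0):
        low = scale_information(theta, low_severity_items)
        high = scale_information(theta, high_severity_items)
        print(f"theta={theta:+.1f}  low-severity scale={low:.3f}  high-severity scale={high:.3f}")
```

Under these assumed parameters, the scale whose items are located at lower severity is more informative near θ = 0, while the scale whose items are located at higher severity is more informative near θ = +2. Instruments with polytomous response options would typically need a different model (for example, a graded response model), which this sketch does not cover.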