Our goal in this investigation was to evaluate the reliability of scores from the Balanced Inventory of Desirable Responding (BIDR) more comprehensively than in prior research by using a generalizability-theory framework based on both dichotomous and polytomous scoring of items. Generalizability coefficients accounting for specific-factor, transient, and random-response error ranged from .64 to .75 for the BIDR's Self-Deception Enhancement (SDE) and Impression Management (IM) subscale scores, and these values were systematically lower than corresponding alpha (.66 to .83) and 1-week test–retest (.78 to .86) coefficients. Polytomous scoring provided higher reliability than dichotomous scoring on nearly all indexes reported. Random-response error (8%–17%) and specific-factor error (11%–17%) exceeded transient error (3%–6%) for both subscales and scoring methods. Doubling the number of items on a single occasion provided greater improvements in generalizability (.76–.83) than aggregating scores across 2 administrations (.72–.81). Both scoring methods provided reasonably high indexes of consistency (φ coefficients ≥ .91) at cut scores on the IM scale for detecting faked responses when all sources of error were taken into account. Implications of these results for common uses of the BIDR are discussed.
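
To make the comparison between doubling items and aggregating across administrations concrete, the following is a minimal sketch of a decision-study projection. It assumes a persons × items × occasions design in which specific-factor error shrinks with the number of items, transient error shrinks with the number of occasions, and random-response error shrinks with both. The function name `g_coefficient` and its parameters are hypothetical, and the input proportions are illustrative values chosen from within the ranges reported above; they are not the estimates for any particular subscale or scoring method.

```python
def g_coefficient(universe, specific, transient, random_resp,
                  item_mult=1.0, n_occasions=1):
    """Project a generalizability coefficient for a changed design.

    Inputs are proportions of observed-score variance for the original
    design (one occasion, original item count). Assumed scaling:
      specific-factor error  / item_mult
      transient error        / n_occasions
      random-response error  / (item_mult * n_occasions)
    """
    error = (specific / item_mult
             + transient / n_occasions
             + random_resp / (item_mult * n_occasions))
    return universe / (universe + error)


# Illustrative proportions only (assumed, not taken from a specific result).
parts = dict(universe=0.75, specific=0.11, transient=0.06, random_resp=0.08)

baseline = g_coefficient(**parts)
doubled_items = g_coefficient(**parts, item_mult=2)
two_occasions = g_coefficient(**parts, n_occasions=2)

print(f"baseline:       {baseline:.2f}")
print(f"doubled items:  {doubled_items:.2f}")
print(f"two occasions:  {two_occasions:.2f}")
```

Under these assumptions, doubling items yields the larger gain whenever specific-factor error exceeds transient error, because both changes reduce random-response error by the same amount.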