The past 2 decades have seen increasing use of experience sampling methods (ESMs) to gain insights into the daily experience of affective states (e.g., their variability, as well as the antecedents and consequences of temporary shifts in affect). Much less attention has been given to methodological challenges, such as how to ensure the reliability of test scores obtained using ESMs. The present study demonstrates the use of dynamic factor analysis (DFA) to quantify the reliability of test scores in ESM contexts, evaluates the potential impact of unreliable test scores, and seeks to identify characteristics of individuals that may account for their unreliable test scores. One hundred twenty-seven participants completed baseline measures (demographics and personality traits), followed by a 7-day ESM phase in which positive and negative state affect were measured up to 6 times per day. Analyses showed that although scores on these affect measures exhibited adequate reliability at the sample level, up to one third of participants failed to meet conventional standards of reliability. Although these low reliability estimates were not significantly associated with personality factors, they could, in some cases, be explained by model misspecification where a meaningful alternative factor structure was available. Despite these potential differences in factor structure across participants, subsequent modeling with and without these “unreliable” cases yielded similar substantive results. Hence, the present findings suggest that typical analyses based on ESM data may be robust to individual differences in data structure and/or quality. Ways to augment the DFA approach to better understand unreliable cases are discussed.