Recognizing others’ emotional expressions is vital for socioemotional development; impairments in this ability occur in several psychiatric disorders. Further study is needed to map the development of this ability and to evaluate its components as potential transdiagnostic endophenotypes. Before doing so, however, research is required to establish the test–retest reliability of scores on the face emotion identification tasks linked to developmental psychopathology. The current study estimated the test–retest reliability of scores on one such task, the facial expression labeling task (FELT), in a sample of twin children (N = 157; ages 9–14). Participants completed the FELT at two visits, two to five weeks apart. On each trial, participants identified the emotion shown by a face depicting one of six emotions (i.e., happiness, anger, sadness, fear, surprise, and disgust), each morphed with a neutral face to provide 10 levels of increasing emotional expressivity. The present study found strong test–retest reliability (Pearson r) of the FELT scores across all emotions. Results suggested that data from this task may be effectively analyzed using a latent growth curve model to estimate overall ability (i.e., intercept; r = 0.76–0.85) and improvement as emotions become clearer (i.e., linear slope; r = 0.69–0.83). Evidence of high test–retest reliability of this task’s scores informs future developmental research and the potential identification of transdiagnostic endophenotypes for child psychopathology.
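The reliability logic described above can be illustrated with a minimal sketch. It is not the study's analysis: instead of a latent growth curve model, it fits an ordinary least-squares line to each participant's scores across the 10 morph levels, then correlates the resulting intercepts and slopes between two visits with Pearson r. The function names (`growth_parameters`, `pearson_r`), the centering of morph levels, and all simulated values are assumptions made for illustration; only the sample size and the number of levels come from the text.

```python
import numpy as np


def growth_parameters(scores):
    """Fit a straight line per participant across morph levels.

    scores: array of shape (n_participants, n_levels), e.g. accuracy per level.
    Levels are centered (an assumption of this sketch), so the intercept
    equals each participant's mean score across levels.
    Returns (intercepts, slopes), each of shape (n_participants,).
    """
    n_levels = scores.shape[1]
    levels = np.arange(n_levels) - (n_levels - 1) / 2
    # polyfit accepts a 2-D y (one column per participant) and returns
    # coefficients ordered from highest degree to lowest.
    slopes, intercepts = np.polyfit(levels, scores.T, deg=1)
    return intercepts, slopes


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


# Synthetic data only: sizes mirror the abstract, values are made up.
rng = np.random.default_rng(0)
n_participants, n_levels = 157, 10
levels = np.arange(n_levels) - (n_levels - 1) / 2
true_intercept = rng.normal(0.60, 0.10, n_participants)
true_slope = rng.normal(0.05, 0.02, n_participants)
true_curve = true_intercept[:, None] + true_slope[:, None] * levels

# Each visit observes the same underlying curve plus independent noise.
visit1 = true_curve + rng.normal(0, 0.05, true_curve.shape)
visit2 = true_curve + rng.normal(0, 0.05, true_curve.shape)

i1, s1 = growth_parameters(visit1)
i2, s2 = growth_parameters(visit2)

print("intercept test-retest r:", round(pearson_r(i1, i2), 2))
print("slope test-retest r:", round(pearson_r(s1, s2), 2))
```

In this simplified setup the intercept reflects overall ability and the slope reflects improvement as expressions become clearer. A latent growth curve model estimates both as latent factors and accounts for measurement error, so its estimates will generally differ from these per-participant fits.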