A great deal of perceptual and social information is conveyed by facial motion. Here, we investigated observers' sensitivity to the complex spatio-temporal information in facial expressions and what cues they use to judge the similarity of these movements. We motion-captured four facial expressions and decomposed them into time courses of semantically meaningful local facial actions (e.g., eyebrow raise). We then generated approximations of the time courses which differed in the amount of information about the natural facial motion they contained, and used these and the original time courses to animate an avatar head. Observers chose which of two animations based on approximations was more similar to the animation based on the original time course. We found that observers preferred animations containing more information about the natural facial motion dynamics. To explain observers' similarity judgments, we developed and used several measures of objective stimulus similarity. The time course of facial actions (e.g., onset and peak of eyebrow raise) explained observers' behavioral choices better than image-based measures (e.g., optic flow). Our results thus revealed observers' sensitivity to changes of natural facial dynamics. Importantly, our method allows a quantitative explanation of the perceived similarity of dynamic facial expressions, which suggests that sparse but meaningful spatio-temporal cues are used to process facial motion.