Emotion perception naturally entails multisensory integration. It is also assumed that multisensory emotion perception is characterized by enhanced activation of brain areas implied in multisensory integration, such as the superior temporal gyrus and sulcus (STG/STS). However, most previous studies have employed designs and stimuli that preclude other forms of multisensory interaction, such as crossmodal prediction, leaving open the question whether classical integration is the only relevant process in multisensory emotion perception. Here, we used video clips containing emotional and neutral body and vocal expressions to investigate the role of crossmodal prediction in multisensory emotion perception.
While emotional multisensory expressions increased activation in the bilateral fusiform gyrus (FFG), neutral expressions compared to emotional ones enhanced activation in the bilateral middle temporal gyrus (MTG) and posterior STS. Hence, while neutral stimuli activate classical multisensory areas, emotional stimuli invoke areas linked to unisensory visual processing. Emotional stimuli may therefore trigger a prediction of upcoming auditory information based on prior visual information. Such prediction may be stronger for highly salient emotional compared to less salient neutral information. Therefore, we suggest that multisensory emotion perception involves at least two distinct mechanisms; classical multisensory integration, as shown for neutral expressions, and crossmodal prediction, as evident for emotional expressions.