To recognize banter or sarcasm in social interactions, listeners must integrate verbal and vocal emotional expressions. Here, we investigated event-related potential correlates of this integration in Asian listeners. We presented emotional words spoken with congruous or incongruous emotional prosody. When listeners classified word meaning as positive or negative and ignored prosody, incongruous trials elicited a larger late positivity than congruous trials in women but not in men. Sex differences were absent when listeners evaluated the congruence between word meaning and emotional prosody. The similarity of these results to those obtained in Western listeners suggests that sex differences in emotional speech processing depend on attentional focus and may reflect culturally independent mechanisms.