The current study employed behavioral and electrophysiological measures to investigate the timing, localization, and neural oscillation characteristics of cortical activities associated with phonetic and emotional information processing of speech. The experimental design used a cross-modal priming paradigm in which the normal adult participants were presented a visual prime followed by an auditory target. Primes were facial expressions that systematically varied in emotional content (happy or angry) and mouth shape (corresponding to /a/ or /i/ vowels). Targets were spoken words that varied by emotional prosody (happy or angry) and vowel (/a/ or /i/). In both the phonetic and prosodic conditions, participants were asked to judge congruency status of the visual prime and the auditory target. Behavioral results showed a congruency effect for both percent correct and reaction time. Two ERP responses, the N400 and late positive response (LPR), were identified in both conditions. Source localization and inter-trial phase coherence of the N400 and LPR components further revealed different cortical contributions and neural oscillation patterns for selective processing of phonetic and emotional information in speech. The results provide corroborating evidence for the necessity of differentiating brain mechanisms underlying the representation and processing of co-existing linguistic and paralinguistic information in spoken language, which has important implications for theoretical models of speech recognition as well as clinical studies on the neural bases of language and social communication deficits.