Speech perception requires the effortless mapping from smooth, seemingly continuous changes in sound features into discrete perceptual units, a conversion exemplified in the phenomenon of categorical perception. Explaining how/when the human brain performs this acoustic–phonetic transformation remains an elusive problem in current models and theories of speech perception. In previous attempts to decipher the neural basis of speech perception, it is often unclear whether the alleged brain correlates reflect an underlying percept or merely changes in neural activity that covary with parameters of the stimulus. Here, we recorded neuroelectric activity generated at both cortical and subcortical levels of the auditory pathway elicited by a speech vowel continuum whose percept varied categorically from /u/ to /a/. This integrative approach allows us to characterize how various auditory structures code, transform, and ultimately render the perception of speech material as well as dissociate brain responses reflecting changes in stimulus acoustics from those that index true internalized percepts. We find that activity from the brainstem mirrors properties of the speech waveform with remarkable fidelity, reflecting progressive changes in speech acoustics but not the discrete phonetic classes reported behaviorally. In comparison, patterns of late cortical evoked activity contain information reflecting distinct perceptual categories and predict the abstract phonetic speech boundaries heard by listeners. Our findings demonstrate a critical transformation in neural speech representations between brainstem and early auditory cortex analogous to an acoustic–phonetic mapping necessary to generate categorical speech percepts. Analytic modeling demonstrates that a simple nonlinearity accounts for the transformation between early (subcortical) brain activity and subsequent cortical/behavioral responses to speech (> 150–200 ms) thereby describing a plausible mechanism by which the brain achieves its acoustic-to-phonetic mapping. Results provide evidence that the neurophysiological underpinnings of categorical speech are present cortically by ˜ 175 ms after sound enters the ear.