Speech perception depends on long-term representations that reflect regularities of the native language. However, listeners rapidly adapt when speech acoustics deviate from these regularities due to talker idiosyncrasies such as foreign accents and dialects. To better understand these dual aspects of speech perception, we probe native English listeners' baseline perceptual weighting of 2 acoustic dimensions (spectral quality and vowel duration) toward vowel categorization and examine how they subsequently adapt to an “artificial accent” that deviates from English norms in the correlation between the 2 dimensions. At baseline, listeners rely relatively more on spectral quality than vowel duration to signal vowel category, but duration nonetheless contributes. Upon encountering an “artificial accent” in which the spectral-duration correlation is perturbed relative to English language norms, listeners rapidly down-weight reliance on duration. Listeners exhibit this type of short-term statistical learning even in the context of nonwords, confirming that lexical information is not necessary to this form of adaptive plasticity in speech perception. Moreover, learning generalizes to both novel lexical contexts and acoustically distinct altered voices. These findings are discussed in the context of a mechanistic proposal for how supervised learning may contribute to this type of adaptive plasticity in speech perception.