Auditory perceptual inference engages learning of complex statistical information about the environment. Inferences help simplify perception by highlighting what can be predicted on the basis of prior learning (through the formation of internal “prediction” models) and what might be new, potentially requiring an investment of resources to remodel those predictions. In the present study, we tested the hypothesis that sound sequences with multiple levels of predictability may draw on cognitive resources, and be cognitively penetrable, to a greater extent than previously shown by studies presenting simpler sound sequences. Auditory-evoked potentials (AEPs) were recorded from 117 participants. All participants heard the exact same sound sequence but under different conditions: 51 while watching a DVD movie and 66 while performing a cognitively demanding task. Participants were asked to ignore the sounds and focus their attention on the movie/task. However, before the experiment commenced, we manipulated what participants knew about the sound sequence by providing explicit sequence information to 15 and 34 of the participants in the DVD and cognitive-task conditions, respectively, and no information to the others. The results demonstrated that although local pattern violations elicited distinctive AEP responses (namely, mismatch negativity), the way the amplitude of this response was modulated by sequence learning over time depended on both task and explicit sequence knowledge. The implications are discussed with reference to how the division of available attentional resources between the primary task and concurrent sound affects what is learned.