Musical melodies have “peaks” and “valleys”. Although the vertical component of pitch and music is well-known, the mechanisms underlying its mental representation still remain elusive. We show evidence regarding the importance of previous experience with melodies for crossmodal interactions to emerge. The impact of these crossmodal interactions on other perceptual and attentional processes was also studied. Melodies including two tones with different frequency (e.g., E4 and D3) were repeatedly presented during the study. These melodies could either generate strong predictions (e.g., E4-D3-E4-D3-E4-[D3]) or not (e.g., E4-D3-E4-E4-D3-[?]). After the presentation of each melody, the participants had to judge the colour of a visual stimulus that appeared in a position that was, according to the traditional vertical connotations of pitch, either congruent (e.g., high-low-high-low-[up]), incongruent (high-low-high-low-[down]) or unpredicted with respect to the melody. Behavioural and electroencephalographic responses to the visual stimuli were obtained. Congruent visual stimuli elicited faster responses at the end of the experiment than at the beginning. Additionally, incongruent visual stimuli that broke the spatial prediction generated by the melody elicited larger P3b amplitudes (reflecting ‘surprise’ responses). Our results suggest that the passive (but repeated) exposure to melodies elicits spatial predictions that modulate the processing of other sensory events.