The first time a newborn is held, he or she is attracted to the human face. A talking face is even more captivating, as it is the first time he or she hears and sees another human talking. Older infants are relatively good at detecting the relationship between images and sounds when someone is addressing them, but it is unclear whether this ability depends on experience. Using an intermodal matching procedure, we presented newborns with two silent point-line displays representing the same face uttering different sentences while they heard a vocal-only utterance that matched one of the two stimuli. Nearly all of the newborns looked longer at the matching point-line face than at the mismatching one, whether with (Experiment 1) or without (Experiment 2) prior exposure to the stimuli. These results are interpreted in terms of newborns’ ability to extract common visual and auditory information from continuous speech events despite their brief experience with talking faces. The implications are discussed in light of the language processing and acquisition literature.