Traditional studies of memory and object recognition involved objects presented within a single sensory modality (i.e., purely visual or purely auditory objects). However, in naturalistic settings, objects are often evaluated and processed in a multisensory manner. This begets the question of how object representations that combine information from the different senses are created and utilised by memory functions. Here we review research that has demonstrated that a single multisensory exposure can influence memory for both visual and auditory objects. In an old/new object discrimination task, objects that were presented initially with a task-irrelevant stimulus in another sense were better remembered compared to stimuli presented alone, most notably when the two stimuli were semantically congruent. The brain discriminates between these two types of object representations within the first 100 ms post-stimulus onset, indicating early “tagging” of objects/events by the brain based on the nature of their initial presentation context. Interestingly, the specific brain networks supporting the improved object recognition vary based on a variety of factors, including the effectiveness of the initial multisensory presentation and the sense that is task-relevant. We specify the requisite conditions for multisensory contexts to improve object discrimination following single exposures, and the individual differences that exist with respect to these improvements. Our results shed light onto how memory operates on the multisensory nature of object representations as well as how the brain stores and retrieves memories of objects.