Three experiments examined the immediate free recall (IFR) of auditory-verbal and visuospatial materials from single-modality and dual-modality lists. In Experiment 1, we presented participants with between 1 and 16 spoken words, between 1 and 16 visuospatial dot locations, or between 1 and 16 word-dot pairs with synchronized onsets. For dual-modality lists, we found that (a) overall performance, initial recalls, and serial position curves were largely determined by the within-modality list lengths; (b) there was only a small dual-task trade-off, limited to the visuospatial items; and (c) output orders were strongly constrained: participants tended to alternate between words and dots from equivalent or neighboring serial positions. In Experiments 2 and 3, we compared lists of 6 single-modality items with dual-modality lists of 6 words and 6 dots presented with synchronous or alternating onsets (Experiment 2) or with random but asynchronous onsets (Experiment 3). In all 3 dual-modality conditions, we again found only a small trade-off in visuospatial (but not verbal) IFR performance. Output orders were similarly highly constrained with synchronous and alternating onsets, and these patterns were present but attenuated with randomized onsets. We propose that both auditory-verbal and visuospatial list items are associated with a common temporal episodic context that guides cross-modal retrieval, and we speculate that the limited, asymmetric interference could arise because the less variable representations of the dots share only a relatively small subset of features with the more variable representations of the words.