The episodic buffer has been described as a component of working memory capable of maintaining multimodal information in an integrated format. Although the role of the episodic buffer in binding features into objects has received considerable attention, several of its characteristics remain underexplored, notably its maintenance capacity limit and its separability from domain-specific maintenance buffers. The present study addressed these questions using a complex span paradigm in which participants were asked to maintain cross-domain (i.e., verbal–spatial) associations. The 1st experiment showed that the capacity limit for these cross-domain associations was lower than the capacity limit for single features and did not exceed 3 associations. However, cross-domain associations and single features depended to the same extent on attentional resources for their maintenance. The 2nd experiment showed that domain-specific (verbal or spatial) resources were not involved in the maintenance of cross-domain information, revealing a clear distinction between the episodic buffer and the domain-specific buffers. Overall, in line with the episodic buffer hypothesis, these findings support the existence of a central, limited-capacity system for the maintenance of cross-domain information.