Many pupils have difficulty with the abstract verbal information presented in history lessons. This study assessed the value of actively constructing multimodal representations of historical phenomena. In an experiment, we compared the learning outcomes of pupils who co-constructed textual representations, visual-textual representations, or visual-textual representations integrated in a timeline. Eighty-five pupils in pre-vocational secondary education, aged 12–13, worked in dyads on a series of four history tasks. All pupils took a pre-test, a post-test, and a retention test. The results show that co-constructing visual-textual representations integrated in a timeline leads to higher short-term learning outcomes than co-constructing textual representations. Dialogue analyses of two dyads in the timeline condition indicate that the extent to which pupils verbally integrate textual and visual information differs across the four tasks.