Demonstrations of emotional Stroop effects with conditioned made-up words are flawed because they lack a task ensuring similar word encoding across conditions. Here, participants were trained on associations between made-up words (e.g., ‘drott’) and pictures with alarming or neutral content (e.g., ‘a dead sheep’ vs. ‘a munching cow’) in a situation that required attention to both ends of each association. To test whether a word's emotional attributes need to consolidate before they can hijack attention, one set of associations was learned seven days before the test, whereas the other set was learned either six hours or immediately before the test. The novel words’ ability to evoke their emotional attributes was assessed using both the emotional Stroop task and an auditory analogue called pause detection. Matching words and pictures was harder for alarming associations. However, learning rates and forgetting at seven days were similar for both types of associations. Pause detection revealed no emotion effect for same-day (i.e., unconsolidated) associations, but robust interference for seven-day-old (i.e., consolidated) alarming associations. Attention capture was found in the emotional Stroop task as well, though only when trial n−1 referred to a same-day association. This task also showed stronger response repetition priming (independently of emotion) when trials n and n−1 both tapped into seven-day-old associations. A word's emotional attributes hence take between six hours and seven days to become operational. Moreover, interactions between the ages of the associations probed on consecutive trials can be used to gauge implicitly the indirect (relational) episodic associations that develop in the meantime between the memories of individual items.