Although there is ample empirical evidence for the power of implicit learning of structures and regularities in several modalities and materials, it remains controversial whether implicit learning extends to temporal structures and regularities. We investigated whether (a) an artificial grammar can be learned equally well when expressed in duration sequences as when expressed in pitch sequences, (b) learning of the artificial grammar in either duration or pitch (as the primary dimension) sequences can be influenced by the properties of the secondary dimension (invariant vs. randomized), and (c) learning can be boosted when the artificial grammar is expressed in both pitch and duration. After an exposure phase with grammatical sequences, learning was assessed in a subsequent test phase with a grammaticality judgment task. Participants in both the pitch and duration conditions showed incidental (not fully implicit) learning of the artificial grammar when the secondary dimension was invariant, but randomizing the pitch sequence prevented learning of the artificial grammar in duration sequences. Expressing the artificial grammar in both pitch and duration resulted in disproportionately better performance, suggesting an interaction between the learning of pitch and temporal structure. The findings are relevant to research investigating the learning of temporal structures and the learning of structures presented simultaneously in 2 dimensions (e.g., space and time, space and objects). Because the study examines learning, the findings also provide further insight into the potential specificity of pitch and time processing, and into their integrated versus independent processing, as previously debated in music cognition research.