The brain integrates signals from multiple modalities to provide a reliable estimate of environmental events. A temporally cluttered environment presents a challenge for sensory integration because of the risk of misbinding, yet it also provides scope for cross-modal binding to greatly enhance performance by highlighting multimodal events. We present a tactile search task in which the fingertips received pulsed vibrations and participants identified which finger was stimulated in synchrony with an auditory signal. Results showed that performance in identifying the target finger was impaired when other fingers were stimulated, even though all fingers were stimulated sequentially. When the number of fingers vibrated was fixed, we found that both spatial and temporal factors constrained performance: events occurring close to the target vibration in either space or time reduced accuracy. When tactile search was compared with visual search, overall performance was lower in touch than in vision, although the cost of reducing temporal separation between stimuli or increasing the presentation rate was similar for both target modalities. Audiotactile performance benefitted from increased spatial separation between target and distractors, especially when the target was on a different hand from the distractors, whereas these spatial manipulations did not affect audiovisual performance. The similar trends in performance for temporal manipulations across vision and touch suggest a common supramodal binding mechanism that, when combining audition and touch, is limited by the poor resolution of the underlying unisensory representation of touch in cluttered settings.