Gaze cues are used alongside language to communicate. Lab-based studies have shown that people reflexively follow gaze cue stimuli; however, it is unclear whether this effect is present in real interactions. Language specificity influences the extent to which we utilize gaze cues in real interactions, but it is unclear whether the type of language used can similarly affect gaze cue utilization. We aimed to (a) investigate whether automatic gaze following effects are present in real-world interactions, and (b) explore how gaze cue utilization varies depending on the form of concurrent language used. Wearing a mobile eye-tracker, participants followed instructions to complete a real-world search task. The instructor varied the determiner used (featural or spatial) and the presence of gaze cues (absent, congruent, or incongruent). Congruent gaze cues were used more when provided alongside featural references. Incongruent gaze cues were initially followed no more often than chance. However, unlike participants in the no-gaze condition, participants in the incongruent condition did not benefit from receiving spatial instructions over featural instructions. We suggest that although participants selectively use informative gaze cues and ignore unreliable gaze cues, visual search can nevertheless be disrupted when inherently spatial gaze cues are accompanied by contradictory verbal spatial references.