To test whether the attentional selection of targets defined by a combination of visual and auditory features is guided in a modality-specific fashion or by control processes that are integrated across modalities, we measured attentional capture by visual stimuli during unimodal visual and audiovisual search. Search arrays were preceded by spatially uninformative visual singleton cues that matched the current target-defining visual feature. Participants searched for targets defined by a visual feature, or by a combination of visual and auditory features (e.g., red targets accompanied by high-pitched tones). Spatial cueing effects indicative of attentional capture were reduced during audiovisual search, and cue-triggered N2pc components were attenuated and delayed. This reduction of cue-induced attentional capture effects during audiovisual search provides new evidence for the multimodal control of selective attention.