Acoustic speech is easier to detect in noise when the talker can be seen. This finding could be explained by integration of multisensory inputs or refinement of auditory processing from visual guidance. In two experiments, we studied two-interval forced-choice detection of an auditory ‘ba’ in acoustic noise, paired with various visual and tactile stimuli that were identically presented in the two observation intervals. Detection thresholds were reduced under the multisensory conditions vs. the auditory-only condition, even though the visual and/or tactile stimuli alone could not inform the correct response. Results were analysed relative to an ideal observer for which intrinsic (internal) noise and efficiency were independent contributors to detection sensitivity. Across experiments, intrinsic noise was unaffected by the multisensory stimuli, arguing against the merging (integrating) of multisensory inputs into a unitary speech signal, but sampling efficiency was increased to varying degrees, supporting refinement of knowledge about the auditory stimulus. The steepness of the psychometric functions decreased with increasing sampling efficiency, suggesting that the ‘task-irrelevant’ visual and tactile stimuli reduced uncertainty about the acoustic signal. Visible speech was not superior for enhancing auditory speech detection. Our results reject multisensory neuronal integration and speech-specific neural processing as explanations for the enhanced auditory speech detection under noisy conditions. Instead, they support a more rudimentary form of multisensory interaction: the otherwise task-irrelevant sensory systems inform the auditory system about when to listen.