We investigated how a picture fosters learning from text, both when it is presented for self-paced study and when it is presented briefly before the text. In an experiment, participants (N = 114) learned about the structure and functioning of a pulley system in one of six conditions: text only; picture presented before the text for 150 milliseconds, 600 milliseconds, 2 seconds, or a self-paced duration; or self-paced concurrent presentation of text and picture. Presenting the picture for self-paced study, whether before or concurrently with the text, fostered recall and comprehension and sped up text processing compared with presenting text only. Moreover, even inspecting the picture for only 600 milliseconds or 2 seconds improved comprehension and yielded faster reading of subsequent text about the spatial structure of the system compared with text only. These findings suggest that pictures, even if attended to only briefly, may yield a spatial mental scaffold that allows for integration with verbal information, thereby fostering comprehension. Copyright © 2013 John Wiley & Sons, Ltd.