Research on second language (L2) vocabulary acquisition has shown that words associated with actual objects or learned with imagery techniques are acquired more easily than words without such associations. Multimedia applications make it possible to supplement traditional word definitions with other types of information, such as pictures and videos. Thus, one of the fundamental research questions in the use of multimedia systems is: How effective are annotations with different media types for vocabulary acquisition? This article discusses the results of three studies conducted with 160 university students of German using CyberBuch, a hypermedia application for reading German texts that contains a variety of annotations for words in the form of text, pictures, and video. The issues examined are (a) how well vocabulary is learned incidentally when the goal is reading comprehension, (b) the effectiveness of different types of annotations for vocabulary acquisition, and (c) the relationship between look-up behavior and performance on vocabulary tests. The results showed a higher rate of incidental learning than expected (25% accuracy on production tests, 77% on recognition tests), significantly higher scores for words annotated with pictures + text than for those annotated with video + text or text only, and a correlation between looking up a certain annotation type and using that type as the retrieval cue for remembering words.