Previous research with artificial language learning paradigms has shown that infants are sensitive to statistical cues to word boundaries (Saffran, Aslin & Newport, 1996) and that they can use these cues to extract word-like units (Saffran, 2001). However, it is unknown whether infants use statistical information to construct a receptive lexicon when acquiring their native language. In order to investigate this issue, we rely on the fact that besides real words a statistical algorithm extracts sound sequences that are highly frequent in infant-directed speech but constitute nonwords. In three experiments, we use a preferential listening paradigm to test French-learning 11-month-old infants' recognition of highly frequent disyllabic sequences from their native language. In Experiments 1 and 2, we use nonword stimuli and find that infants listen longer to high-frequency than to low-frequency sequences. In Experiment 3, we compare high-frequency nonwords to real words in the same frequency range, and find that infants show no preference. Thus, at 11 months, French-learning infants recognize highly frequent sound sequences from their native language and fail to differentiate between words and nonwords among these sequences. These results are evidence that they have used statistical information to extract word candidates from their input and stored them in a ‘protolexicon’, containing both words and nonwords.