Correspondence should be sent to Sophie Dufour, Laboratoire Parole et Langage, Université de Provence, 5, avenue Pasteur, 13604 Aix-en-Provence, France. E-mail: email@example.com
Although the word-frequency effect is one of the most firmly established findings in spoken-word recognition, the precise processing locus of this effect is still debated. In this study, we used event-related potentials (ERPs) to track the time course of the word-frequency effect. We also examined the neighborhood density effect, which is known to reflect mechanisms involved in word identification. The ERP data showed a clear frequency effect on the P350, as early as 350 ms after word onset, followed by a later effect on the late N400 at word offset. A neighborhood density effect was also found at an early stage of spoken-word processing, on the PMN, and at word offset, on the late N400. Overall, the ERP differences between high- and low-frequency words suggest that frequency affects the core processes of word identification, from the initial phase of lexical activation through target word selection. These results thus rule out any account that confines the word-frequency effect to a purely decisional locus after word identification has been completed.
Numerous studies have shown that the ease with which spoken words are recognized strongly depends on their frequency of occurrence in the language (e.g., Connine, Mullennix, Shernoff, & Yelen, 1990; Dahan, Magnuson, & Tanenhaus, 2001; Luce & Pisoni, 1998; Marslen-Wilson, 1990; Radeau, Morais, & Segui, 1995). High-frequency words are responded to faster and more accurately than low-frequency ones, and this advantage has been found in different tasks, including identification in noise, lexical decision, and naming. This word-frequency effect is one of the most influential effects in the word recognition literature. However, the spoken-word recognition literature remains divided over the exact point at which frequency comes into play during word recognition. In this study, we used event-related potentials (ERPs) to track the time course of the word-frequency effect and to identify when, during the recognition process, neural responses change as a function of word frequency.
In models with discrete lexical representations, such as interactive-activation models (e.g., McClelland & Rumelhart, 1981) or the Cohort Model (Marslen-Wilson, 1987, 1990), frequency is generally coded in the resting activation level of lexical units and thus determines the baseline activation level of each word. Because high-frequency words have higher resting activation levels than low-frequency ones, they reach the recognition threshold earlier and are therefore processed faster. In the distributed version of the Cohort Model (DCM, distributed cohort model; Gaskell & Marslen-Wilson, 1997), greater exposure to high-frequency words leads to stronger connection weights in the network and hence to more activation of these words. In contrast to these models, the neighborhood activation model (NAM; Luce & Pisoni, 1998) places the locus of the frequency effect after the initial activation of lexical candidates: frequency is not coded in the resting activation level but acts later, biasing the decision process in favor of high-frequency words. Finally, a third hypothesis about the locus of the word-frequency effect comes from studies of visual word recognition. Balota and Chumbley (1984) claimed that the visual word-frequency effect in tasks such as lexical decision results from a response bias that affects the decision after word identification has been completed. Hence, unlike the preceding accounts, this one places the primary locus of the word-frequency effect after word identification and attributes the effect to task-specific decision processes. It should be noted, however, that although the response bias hypothesis places the locus of the frequency effect somewhat later than the NAM does, the two proposals are quite similar and difficult to tease apart.
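The contrast between coding frequency in resting activation and applying it as a decision bias can be made concrete with a toy sketch. The Python below is illustrative only and does not reproduce any of the cited models: the log-frequency mapping, the input gain, the threshold, and the frequency weights are all assumptions. In the first two functions, frequency shifts the starting point of activation, so a high-frequency word reaches threshold in fewer steps; in the third, modeled loosely on the NAM's frequency-weighted decision rule, frequency only weights the evidence once candidates are already active.

```python
import math

def resting_level(freq_per_million, scale=0.1):
    # Hypothetical mapping: resting activation grows with log frequency.
    return scale * math.log10(freq_per_million + 1)

def steps_to_threshold(resting, input_gain=0.05, threshold=1.0):
    # Resting-activation account: a unit driven by constant bottom-up input
    # needs fewer update steps to reach threshold when it starts higher.
    return max(0, math.ceil((threshold - resting) / input_gain))

def frequency_weighted_probability(target, neighbors):
    # NAM-style decision rule (simplified): frequency enters only as a weight
    # on activation-based evidence, after candidates have been activated.
    # target and each neighbor are (stimulus_probability, frequency_weight).
    sw, fw = target
    numerator = sw * fw
    return numerator / (numerator + sum(sn * fn for sn, fn in neighbors))

high, low = resting_level(100.0), resting_level(1.0)
print(steps_to_threshold(high), steps_to_threshold(low))  # 16 20
```

Both functions favor high-frequency words, which is why the two accounts are hard to distinguish with end-point measures such as RTs; they differ in *when* frequency acts.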
Evidence in support of an early locus of the frequency effect has been found in cross-modal semantic priming experiments by Marslen-Wilson (1987). He presented participants with spoken word fragments in which both high- and low-frequency words were possible completions. These fragments were immediately followed by a visual word target which was semantically related to either the low- or the high-frequency candidate. He reported less priming (in comparison to a control condition) for target words related to low-frequency candidates than for target words related to high-frequency candidates. Marslen-Wilson interpreted these results as evidence that low-frequency words become less activated than high-frequency ones in the early stages of lexical processing. However, in a subsequent study, Connine, Titone, and Wang (1993) claimed that this differential priming effect can be explained without assuming that frequency directly affects the activation level of lexical candidates. According to Connine et al., because the frequency effect was observed only when an incomplete word was presented and not when the entire prime was presented, participants may have strategically completed the fragments with the more frequent alternative, thus leading to more priming for target words related to high-frequency candidates.
To test the hypothesis that word frequency introduces a response bias in favor of high-frequency words without affecting their level of activation, Connine et al. (1993) used a phoneme identification task in which participants had to identify an ambiguous phoneme that could map onto both a high- and a low-frequency word (e.g., b/pest), and they manipulated the frequency composition of the word lists. They reported a reversed frequency effect, with ambiguous tokens labeled in accordance with the low-frequency word, when the stimulus lists consisted of low-frequency words, and an exaggerated frequency effect, with ambiguous tokens labeled in accordance with the high-frequency word, when the stimulus lists consisted of high-frequency words. Because Connine et al. (1993) also observed that fast response times (RTs) were affected more by list composition than by word frequency itself, they argued that frequency played no role in the initial phase of lexical activation. According to these authors, their results supported a late use of frequency during spoken-word recognition, with frequency operating as a response bias rather than guiding the activation of lexical candidates. They thus proposed an account of the word-frequency effect similar to that of Balota and Chumbley (1984), with “word frequency functioning as a default source of information used late in the decision process rather than an early source of information used to shape lexical hypotheses” (Connine et al., 1993, p. 91).
The controversy about the locus of the frequency effect in studies using RT measurements highlights the need for more fine-grained measures of the time course of word recognition. Because an RT provides only a single measure summarizing the various mental operations between stimulus onset and the participant’s response, RTs are limited in their ability to locate the frequency effect. To examine when word frequency comes into play in the recognition process, Dahan et al. (2001) turned to another measure. They examined the eye movements of participants who followed spoken instructions to manipulate objects pictured on a computer screen. The way the participants’ eyes fixated the pictures as the spoken word unfolded over time provided clues to the dynamics of speech perception and word recognition. In their first experiment, a referent picture (e.g., the picture of a bench) was presented along with three other pictures, two of which had names sharing the initial phonemes of the referent’s name (e.g., bed and bell) but differing in frequency relative to that of the referent. They observed that as early as 267 ms after target onset, participants were more likely to fixate the high-frequency competitor (bed) than the low-frequency one (bell). In a subsequent experiment, they directly manipulated the frequency of the name of the referent picture, presented among three unrelated distractors, and showed that fixation latencies to referents with high-frequency names were shorter than those to referents with low-frequency names (see also Magnuson, Dixon, Tanenhaus, & Aslin, 2007). Together, their results suggest that the word-frequency effect emerges very early during the processing of the spoken word, that is, while the speech signal is still ambiguous with respect to the referent word’s identity. Contrary to Connine et al. (1993), they thus claimed that their result “rule[s] out models in which frequency effects in spoken word recognition are primarily due to decision biases that apply after lexical activation is complete” (p. 360).
1.1. Electrophysiological correlates of spoken word recognition
Like eye tracking but unlike RT measurements, the ERP technique, with its millisecond-by-millisecond resolution, allows spoken-word processing to be studied as it unfolds. Moreover, unlike eye-tracking measures, specific electrophysiological components have been associated with distinct stages of spoken-word recognition. For example, the phonological mismatch negativity (PMN) has been used to probe prelexical processing, that is, processing that occurs before lexical access (Connolly, Phillips, Stewart, & Brake, 1992; Newman, Connolly, Service, & McIvor, 2003), whereas the N400 (Desroches, Newman, & Joanisse, 2009) has been used to probe lexical processing, from the activation of a set of lexical candidates to the selection of the target word from this activated set. ERPs are thus a promising tool for investigating the exact locus at which frequency effects arise during spoken-word recognition.
Another component, the P350, a positive wave peaking around 350 ms, has been proposed to reflect lexical processing (Friedrich, Kotz, Friederici, & Gunter, 2004; Friedrich, Kotz, Friederici, & Alter, 2004; Friedrich, Schild, & Röder, 2009). According to Friedrich et al. (2009), the P350 reflects the activation of modality-independent neural word form representations. This component has mainly been reported in cross-modal fragment priming, but also in auditory and visual unimodal fragment priming. For example, a mismatch in syllable stress between the visual target and the auditory prime fragment (e.g., unstressed fa preceding the stressed word “Faden”) produced a larger P350 than the matching stress condition (Friedrich et al., 2004a). The authors interpreted these results as reflecting the activation of lexical word forms. Hence, a frequency effect on this component would suggest that frequency contributes to lexical activation and thus has an early effect on spoken-word recognition.
Although the N400, a negative deflection peaking around 400 ms after word onset with a centroparietal distribution (Kutas & Hillyard, 1984; Kutas & Federmeier, 2000), is traditionally associated with semantic processing (see Kutas & Federmeier, 2000, for a review), some studies suggest that it can also reflect the ease with which phonological information is retrieved at the lexical level (Desroches et al., 2009; Dumay et al., 2001). In a recent study, Desroches et al. (2009) examined the impact of different types of phonological competitors assumed to be activated during word recognition. Specifically, they used a picture–word matching task in which participants saw a picture followed by an auditory probe word and had to judge whether the picture and the probe matched. Critically, the auditory probe was identical to the picture label (e.g., CONE–cone), rhymed with the picture label (e.g., CONE–bone), shared the initial consonant and the following vowel with the picture label (e.g., CONE–comb), or was unrelated to the picture label (e.g., CONE–fox). Consistent with other studies (e.g., Dumay et al., 2001), Desroches et al. observed a reduction in N400 amplitude in the word-final (rhyme) similarity condition (CONE–bone). In the word-initial similarity condition (CONE–comb), N400 amplitude increased significantly relative to the other conditions at a slightly later time point (the late N400). This increase in late N400 amplitude for word-initial phonological overlap has been taken as evidence for competition between initially similar-sounding words during word recognition: the presence of a close competitor makes it harder to select the target word from the set of activated candidates.
Hence, because the N400 or the late N400 appears to reflect the ease with which a word is selected, a frequency effect on either component would suggest that frequency acts on the selection process that follows lexical activation.
1.2. The present study
This study had two goals. The main goal was to provide further insight into the precise processing locus of the frequency effect during spoken-word recognition. To this end, we used ERPs and identified the moments at which the amplitudes and spatial distributions of brain responses were modulated by target word frequency. Although the study of Dahan et al. (2001) strongly suggests that frequency affects the earliest moments of spoken-word recognition, we judged it useful to provide new evidence for this claim using a different methodological approach, in which cerebral activity is probed directly. We predicted that if word frequency guides lexical activation, the word-frequency effect should emerge in time windows before the offset of the target word, that is, during the initial phase of lexical activation, when information about the identity of the target word is still ambiguous and partially compatible with other words.
The second goal was to examine the time course of another lexical factor, namely phonological neighborhood density. Neighborhood density refers to the number of words that sound similar to a target word; words with many neighbors are said to have dense neighborhoods, whereas words with few neighbors are said to have sparse neighborhoods. Numerous studies have reported an inhibitory influence of neighborhood density on spoken-word recognition: words residing in dense neighborhoods are recognized more slowly than words residing in sparse neighborhoods (Luce & Pisoni, 1998; Vitevitch & Luce, 1998, 1999). This finding has been taken as evidence for a competition process between simultaneously activated lexical candidates, a process embraced by all models of spoken-word recognition. Neighborhood density has also been operationalized by summing the frequencies of the neighbors (e.g., words with high neighborhood frequencies are said to occur in dense similarity neighborhoods and words with low neighborhood frequencies in sparse similarity neighborhoods; see Magnuson et al., 2007; Vitevitch & Luce, 1998). Here, by contrast, we examined the neighborhood density effect while matching summed neighbor frequencies across the two density conditions. This procedure allowed us to compare the time course of the word-frequency effect with that of another effect, the neighborhood density effect, which is assumed to affect lexical identification but in which frequency does not play a role. Because various types of neighbors have been found to be activated as speech unfolds (e.g., Allopenna, Magnuson, & Tanenhaus, 1998; Desroches et al., 2009; Magnuson et al., 2007), we used a broad metric to count neighbors, applying the definition proposed by Luce and Pisoni (1998): all the words that can be formed by adding, substituting, or deleting one phoneme in the target word.1
As current models of spoken-word recognition assume parallel, partial activation of lexical representations that are acoustically similar to the unfolding input, we expected to find an influence of neighborhood density at the earliest moments of spoken-word recognition, that is, when the information in the speech signal does not yet allow the target word to be uniquely identified. In particular, an effect of neighborhood density could appear on the PMN component. Some studies have already shown that neighborhood density can affect prelexical phonemic processing (Vitevitch & Luce, 1998, 1999). Because words residing in dense neighborhoods tend to contain frequently occurring phonemes and phoneme sequences, their prelexical processing is facilitated, and in such cases a reversed neighborhood density effect has been observed, with words residing in dense neighborhoods processed faster than those residing in sparse neighborhoods (Vitevitch & Luce, 1998, 1999). We also expected to find a neighborhood density effect on components reflecting lexical processing, such as the N400 or the P350, providing evidence for competition between phonological neighbors.
Beyond the main theoretical goal of probing the locus of the word-frequency effect, this study addresses an empirical gap: to our knowledge, no ERP study has examined the simultaneous influence of word frequency and phonological neighborhood density on the precise time course of spoken-word recognition. We used monosyllabic words whose recognition point (the moment at which a word becomes unique with respect to the other words in the lexicon) fell after the last phoneme, and we recorded ERPs while participants performed a lexical decision task, a task widely used to tap into lexical processing.
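As a rough illustration of how a uniqueness point can be computed from a phonemic lexicon, the sketch below scans a word’s prefixes until no other entry shares them. It assumes, hypothetically, that each character stands for one phoneme and uses a made-up toy lexicon. A word that is the onset of a longer word never diverges within its own phonemes, so its uniqueness point falls after the last phoneme, as for the words used here.

```python
def uniqueness_point(word, lexicon):
    """1-based phoneme position at which `word` diverges from every other entry.

    Returns len(word) + 1 when the word never becomes unique within its own
    phonemes, i.e. its uniqueness point lies after the last phoneme.
    Assumes one character per phoneme (a toy simplification).
    """
    others = [w for w in lexicon if w != word]
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if not any(w.startswith(prefix) for w in others):
            return i
    return len(word) + 1

toy_lexicon = ["kar", "kart", "karte", "kaz", "bal"]  # hypothetical entries
print(uniqueness_point("kaz", toy_lexicon))  # 3
print(uniqueness_point("kar", toy_lexicon))  # 4, i.e. after the last phoneme
```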
Twenty-six right-handed French speakers from the University of Aix-Marseille participated in the experiment after having given written informed consent. Handedness was assessed using the Edinburgh Inventory (Oldfield, 1971). All participants reported having no neurological or hearing impairments.
One hundred monosyllabic words, three to four phonemes long, were selected from Vocolex, a lexical database for the French language (Dufour, Peereman, Pallier, & Radeau, 2002). All words had their uniqueness point (the phonemic position at which a word can be reliably identified) after their last phoneme. Fifty words were of low frequency, and the remaining 50 were of high frequency. For each frequency level (low, high), half of the words resided in a sparse neighborhood and the other half in a dense neighborhood. Neighborhood density was calculated by counting the words generated by substituting, adding, or deleting a single phoneme at any position within the word (Luce & Pisoni, 1998). The four categories of words were matched in phonological length (number of phonemes), summed frequency of neighbors, uniqueness point, and duration. Item characteristics are summarized in Table 1, and the words are listed in Appendix A. For the lexical decision task, 100 nonwords were created by changing the last phoneme of words not included in the stimulus sets (see also Vitevitch, 2007). Changing the last phoneme of words allowed us to create nonwords with both legal phonemic sequences and a late deviation point, that is, a late moment at which the nonwords were no longer compatible with other words. Note also that using nonwords with both legal phonemic sequences and a late deviation point forces participants to consult their mental lexicon to discriminate between word and nonword stimuli.
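The Luce and Pisoni (1998) neighbor definition used above amounts to counting lexicon entries at a phoneme edit distance of exactly one. The sketch below assumes one character per phoneme and a hypothetical frequency-annotated toy lexicon (not Vocolex); it returns both the neighborhood density and the summed neighbor frequency, the two quantities that the stimulus design kept apart.

```python
def is_one_phoneme_edit(a, b):
    # True if b differs from a by exactly one substitution, addition, or deletion.
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):  # substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):   # make a the shorter string
        a, b = b, a
    # Addition/deletion: removing one phoneme from the longer string gives the shorter.
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))

def neighborhood(word, lexicon_freq):
    # lexicon_freq: {phonemic form: frequency per million} (toy data here).
    neighbors = [w for w in lexicon_freq if is_one_phoneme_edit(word, w)]
    return len(neighbors), sum(lexicon_freq[w] for w in neighbors)

toy = {"kat": 50.0, "bat": 10.0, "kap": 5.0, "at": 2.0, "kart": 1.0, "dog": 30.0}
print(neighborhood("kat", toy))  # (4, 18.0)
```

Matching summed neighbor frequency while varying density, as in the present design, means choosing items whose second value is similar while the first differs.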
Table 1. Characteristics of the stimulus sets (mean values)
Frequency (occurrences per million)
Summed frequency of neighbors
Positional phoneme frequency (c)
Positional biphone frequency (d)
Number of phonemes
Duration (ms)
Note. (a) Using Luce and Pisoni’s (1998) definition; (b) number of words that share the first two phonemes, regardless of length; (c) how often a particular segment occurs in a given position in a word; (d) segment-to-segment co-occurrence frequency. Positional phoneme and biphone frequencies were calculated from the Lexique 2 French database (New et al., 2001).
The stimuli were recorded by a female native speaker of French and digitized at a sampling rate of 44 kHz with 16-bit resolution. Participants were tested in a sound-attenuated booth, and stimuli were presented over headphones at a comfortable listening level. Participants were asked to make a lexical decision as quickly and accurately as possible, giving “word” responses with their dominant hand on a button box placed in front of them. RTs were measured from stimulus onset. An intertrial interval of 2,000 ms elapsed between the end of one trial and the beginning of the next. Participants began the experiment with 16 practice trials.
2.3.1. ERP recording and processing
The electroencephalographic (EEG) signal was recorded during auditory stimulation at a sampling rate of 1,024 Hz with a 64-channel BioSemi ActiveTwo AD-box. Electrode offsets were kept stable and below 20 mV. The EEG data were bandpass filtered offline (1–30 Hz) with a second-order Butterworth filter (12 dB/octave roll-off) applied in two passes, forward and backward, which yields zero phase shift. Epochs started 100 ms before stimulus onset and ended 800 ms after it, and they were baseline corrected using the 100-ms prestimulus interval. Epochs were accepted according to an artifact rejection criterion of ±100 μV. Data from bad channels were interpolated for each participant (Perrin, Pernier, Bertrand, Giard, & Echallier, 1987), and the EEG signal was re-referenced to the average reference. Accepted epochs were averaged for each participant and each experimental condition; every participant contributed at least 20 accepted trials per condition.
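The preprocessing steps described above can be approximated with NumPy and SciPy as follows. This is a minimal sketch, not the pipeline actually used: it assumes continuous data in microvolts (channels × samples) and stimulus onsets given as sample indices, applies rejection after baseline correction, and omits bad-channel interpolation. With `btype="bandpass"` and order 2, `scipy.signal.butter` gives a second-order slope on each band edge; `filtfilt` runs the filter forward and backward, which cancels the phase shift but doubles the effective attenuation.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1024  # sampling rate in Hz, as reported above

def bandpass_zero_phase(data, low=1.0, high=30.0, order=2, fs=FS):
    # Butterworth band-pass applied forward and backward (zero phase shift).
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, data, axis=-1)

def average_reference(data):
    # Subtract the mean across channels (axis 0) at every sample.
    return data - data.mean(axis=0, keepdims=True)

def epoch_and_average(data, onsets, tmin_ms=-100, tmax_ms=800,
                      reject_uv=100.0, fs=FS):
    # Cut epochs around each onset, baseline-correct on the prestimulus
    # interval, reject epochs exceeding +/- reject_uv, and average the rest.
    pre = int(round(-tmin_ms * fs / 1000))   # 102 samples for 100 ms
    post = int(round(tmax_ms * fs / 1000))   # 819 samples for 800 ms
    kept = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > data.shape[1]:
            continue  # epoch would run past the recording
        epoch = data[:, onset - pre:onset + post]
        epoch = epoch - epoch[:, :pre].mean(axis=1, keepdims=True)
        if np.abs(epoch).max() <= reject_uv:
            kept.append(epoch)
    if not kept:
        raise ValueError("no epoch survived artifact rejection")
    return np.mean(kept, axis=0), len(kept)
```

In practice, `epoch_and_average` would be called once per participant and condition, with that condition’s onsets, and the returned count checked against the 20-trial minimum.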
Four items that gave rise to an error rate of more than 40% were excluded from the analyses. The exclusion of these items did not affect the matching across experimental conditions.
3.1. Behavioral results
Following Van Selst and Jolicoeur (1994), a two-step trimming procedure was applied to the latency data. In the first step, latencies longer than 1,300 ms were excluded. This cutoff was based on the overall RT distribution (M = 873 ms, median = 837 ms, SD = 209 ms) and was set close to the median + 2 SD (1,255 ms). In the second step, for each condition, latencies more than 2.5 SD above or below each participant’s mean were removed. Together, these criteria led to the rejection of less than 5% of the data. Mean RTs and error rates in each condition are presented in Table 2. Analyses of variance (ANOVAs) by participants (F1) and by items (F2) were performed with frequency (low, high) and neighborhood density (sparse, dense) as variables.
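A minimal sketch of the two-step trimming procedure follows; the trial format (participant, condition, RT tuples) and the handling of single-trial cells are assumptions. The first step applies the fixed 1,300-ms cutoff; the second removes, within each participant-by-condition cell, RTs more than 2.5 SD from that cell’s mean.

```python
from statistics import mean, stdev

ABS_CUTOFF_MS = 1300  # close to median + 2 SD of all RTs: 837 + 2 * 209 = 1255 ms

def trim_rts(trials, abs_cutoff=ABS_CUTOFF_MS, sd_criterion=2.5):
    """trials: list of (participant, condition, rt_ms) tuples; returns kept trials."""
    # Step 1: absolute cutoff on the whole distribution.
    step1 = [t for t in trials if t[2] <= abs_cutoff]
    # Step 2: per participant and condition, drop RTs beyond +/- sd_criterion SD.
    cells = {}
    for p, c, rt in step1:
        cells.setdefault((p, c), []).append(rt)
    stats = {key: (mean(v), stdev(v)) for key, v in cells.items() if len(v) > 1}
    kept = []
    for p, c, rt in step1:
        if (p, c) not in stats:
            kept.append((p, c, rt))  # a single RT cannot be judged against an SD
            continue
        m, s = stats[(p, c)]
        if abs(rt - m) <= sd_criterion * s:
            kept.append((p, c, rt))
    return kept
```

Because the outlier itself inflates the cell SD, the second step can only remove an extreme RT when a cell holds enough trials (roughly ten or more for a single outlier at 2.5 SD).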
Table 2. Mean reaction times (RTs, in ms), standard deviation (SD) for correct responses and error rates (in %) in each condition
In the RT analyses, only the main effect of frequency was significant [F1(1,25) = 41.52, p < .0001; F2(1,92) = 7.87, p < .01]: RTs were shorter for high-frequency words than for low-frequency words. Neither the effect of neighborhood density (Fs < 1) nor the interaction between frequency and neighborhood density [F1(1,25) = 2.85, p = .10; F2 < 1] was significant.
The error analyses showed a main effect of frequency that was significant by participants (F1(1,25) = 8.70, p < .01) and nearly significant by items (F2(1,92) = 3.42, p = .07). There were more errors for low- than for high-frequency words. The main effect of neighborhood density was significant both by participants (F1(1,25) = 8.62, p < .01) and by items (F2(1,92) = 3.81, p = .05). There were more errors for words residing in a dense neighborhood than for words residing in a sparse neighborhood. The interaction between frequency and neighborhood density approached significance by participants (F1(1,25) = 3.84, p = .06) but was not significant by items (F2(1,92) = 1.08, p > .20).
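For a two-level within-participant factor such as frequency, the by-participants F reported above can be checked by hand: F1(1, n - 1) equals the square of a paired t computed on each participant’s frequency means, collapsed over neighborhood density. The sketch below assumes a hypothetical dictionary of cell means with made-up values; the by-items F2, which treats items as the random factor, is not covered.

```python
from statistics import mean, variance

def f1_main_effect_frequency(cell_means):
    """By-participant F for the 2-level frequency factor in a 2 x 2 within design.

    cell_means maps participant -> {(frequency, density): mean RT}.
    With two levels, F(1, n - 1) equals the squared paired t computed on
    each participant's frequency means averaged over density.
    """
    diffs = []
    for cells in cell_means.values():
        low = mean([cells[("low", d)] for d in ("sparse", "dense")])
        high = mean([cells[("high", d)] for d in ("sparse", "dense")])
        diffs.append(low - high)
    n = len(diffs)
    f = n * mean(diffs) ** 2 / variance(diffs)
    return f, (1, n - 1)

example = {  # hypothetical cell means (ms) for three participants
    "s1": {("low", "sparse"): 900, ("low", "dense"): 880,
           ("high", "sparse"): 820, ("high", "dense"): 800},
    "s2": {("low", "sparse"): 850, ("low", "dense"): 870,
           ("high", "sparse"): 800, ("high", "dense"): 800},
    "s3": {("low", "sparse"): 920, ("low", "dense"): 900,
           ("high", "sparse"): 810, ("high", "dense"): 830},
}
print(f1_main_effect_frequency(example))
```

With 26 participants, the degrees of freedom become (1, 25), matching the F1 values reported here.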
3.2. ERP results
Analyses were conducted on the three negative components usually associated with spoken-word recognition (N100, PMN, and N400) and on the positive component known as the P350. Four time windows were selected around the peak amplitude of each component after visual inspection: 90–110 ms (N100),2 250–330 ms (PMN), 330–400 ms (P350), and 400–500 ms (N400). A late time window from 550 to 650 ms after word onset was also used to analyze later effects on the N400 component (for a similar approach, see Desroches et al., 2009).
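To make the window definitions concrete, the sketch below converts the windows above into sample indices and averages amplitudes within each window. It assumes the epoch parameters from the recording section (1,024 Hz sampling, epochs starting 100 ms before onset) and an averaged ERP stored as a channels × samples array; the half-open window convention is an illustrative choice.

```python
import numpy as np

FS = 1024              # Hz
EPOCH_START_MS = -100  # epochs begin 100 ms before word onset

WINDOWS_MS = {
    "N100": (90, 110),
    "PMN": (250, 330),
    "P350": (330, 400),
    "N400": (400, 500),
    "late N400": (550, 650),
}

def ms_to_sample(t_ms, fs=FS, epoch_start_ms=EPOCH_START_MS):
    # Index of the sample closest to t_ms relative to word onset.
    return int(round((t_ms - epoch_start_ms) * fs / 1000))

def window_mean(erp, window_ms, fs=FS, epoch_start_ms=EPOCH_START_MS):
    # Mean amplitude per channel over [start, stop) of the window.
    start = ms_to_sample(window_ms[0], fs, epoch_start_ms)
    stop = ms_to_sample(window_ms[1], fs, epoch_start_ms)
    return erp[:, start:stop].mean(axis=1)
```

The resulting per-channel means would then be averaged over the electrodes of each site and hemisphere before entering the ANOVA.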
An ANOVA was performed on each of these five time windows with frequency (low, high), neighborhood density (sparse, dense), site (anterior, frontocentral, centroparietal, posterior), and laterality (left, right) as variables. Sites and electrodes were chosen to provide appropriate scalp coverage for the components of interest: left anterior (F7, F5, AF7, AF3), right anterior (F8, F6, AF8, AF4), left frontocentral (F1, FC1, FC3, C1), right frontocentral (F2, FC2, FC4, C2), left centroparietal (CP1, CP3, P1, P3), right centroparietal (CP2, CP4, P2, P4), left posterior (P7, P5, PO7, PO3), and right posterior (P8, P6, PO8, PO4). The factor laterality tested for lateral differences between left and right sites, and the factor site tested the topography of effects along the anterior–posterior axis, with anterior, frontocentral, centroparietal, and posterior electrodes. The Greenhouse–Geisser correction was applied (Greenhouse & Geisser, 1959), and the corrected p-values are reported below. Grand-average waveforms for high- and low-frequency words are displayed in Fig. 1, and those for words residing in sparse and dense neighborhoods in Fig. 2. Topographical maps are shown in Fig. 3.
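The Greenhouse–Geisser correction scales both degrees of freedom by an estimate of sphericity, epsilon, which ranges from 1/(k - 1) to 1 for a factor with k levels. A sketch of the standard (Box) estimate follows, computed from the double-centered covariance matrix of the k levels; the participants × levels input layout is an assumption. With k = 4 sites and n = 26 participants, the uncorrected degrees of freedom are (3, 75), as in the site effects reported below.

```python
import numpy as np

def gg_epsilon_from_cov(cov):
    # Box / Greenhouse-Geisser epsilon from a k x k covariance matrix of the
    # k levels of one within-subject factor, via its double-centered form.
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    dc = (cov - cov.mean(axis=0, keepdims=True)
          - cov.mean(axis=1, keepdims=True) + cov.mean())
    return np.trace(dc) ** 2 / ((k - 1) * np.sum(dc ** 2))

def gg_epsilon(data):
    # data: participants x levels array (e.g., mean amplitude per site).
    return gg_epsilon_from_cov(np.cov(np.asarray(data, dtype=float), rowvar=False))

def corrected_df(eps, k, n):
    # Uncorrected df are (k - 1, (k - 1) * (n - 1)); GG multiplies both by eps.
    return eps * (k - 1), eps * (k - 1) * (n - 1)
```

A compound-symmetric covariance matrix (equal variances, equal covariances) satisfies sphericity and gives epsilon = 1, leaving the degrees of freedom unchanged.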
On the N100 time window, only effects of site (F(3,75) = 22.14, p < .001) and laterality (F(1,25) = 4.49, p < .05) were observed. As is typically found, N100 amplitude was largest over the frontocentral and centroparietal recording sites. Additionally, the N100 was larger over the left hemiscalp. The interaction between these two factors was also significant (F(3,75) = 8.27, p < .001): the N100 was larger over the left than over the right hemiscalp, particularly at the centroparietal (F(1,25) = 14.34, p < .01) and posterior (F(1,25) = 10.20, p < .01) sites.
On the PMN time window, a main effect of site was found (F(3,75) = 61.46, p < .001), with positive values at the anterior and frontocentral sites and negative values at the centroparietal and posterior sites. Because effects on the PMN are usually centered on frontocentral recording sites (e.g., Newman & Connolly, 2009), neighborhood density and frequency effects were examined at these sites. While no effect of frequency was observed (F(1,25) = 0.16, p > .2), an effect of density was found (F(1,25) = 5.32, p < .05): PMN amplitude was greater for words residing in a sparse neighborhood than for words residing in a dense neighborhood. No interaction between frequency and density was found (F(1,25) = 0.16, p > .2).
On the P350 time window, a main effect of site (F(3,75) = 50.28, p < .001) was observed, with positive values at the anterior and frontocentral sites and negative values at the centroparietal and posterior recording sites. Interestingly, the frequency × site interaction was significant (F(3,75) = 3.45, p = .05): P350 amplitude was greater for low- than for high-frequency words at both the frontocentral (F(1,25) = 12.12, p < .01) and posterior (F(1,25) = 5.74, p < .05) recording sites. No other factors or interactions were significant.
On the N400 time window, the effect of site was also significant (F(3,75) = 26.51, p < .001), with negative values at the frontocentral, centroparietal, and posterior sites and positive values at the anterior sites. Because electrodes at centroparietal sites are usually used to study the N400 (e.g., Kutas & Hillyard, 1984), neighborhood density and frequency effects were examined at these sites. Neither an effect of frequency (F(1,25) = 0.51, p > .2) nor an effect of density (F(1,25) = 2.28, p = .15) was found, and the interaction between these two factors was not significant (F(1,25) = 0.03, p > .2).
On the late N400 time window, a main effect of site was observed (F(3,75) = 5.68, p < .01), with negative values at the frontocentral and centroparietal recording sites and positive values at the anterior and posterior recording sites. A main effect of frequency was also observed (F(1,25) = 4.45, p < .05): late N400 amplitude was greater for low- than for high-frequency words. The frequency × site interaction was significant (F(3,75) = 8.41, p < .01), reflecting a frequency effect restricted to the centroparietal (F(1,25) = 10.87, p < .05), posterior (F(1,25) = 6.94, p < .05), and anterior (F(1,25) = 10.88, p < .01) recording sites. A significant density × site × laterality interaction (F(3,75) = 7.39, p < .001) was also observed, reflecting an effect of neighborhood density at the left centroparietal (F(1,25) = 4.25, p < .05) and left posterior (F(1,25) = 5.22, p < .05) recording sites. Specifically, late N400 amplitude was greater for words residing in a dense neighborhood than for words residing in a sparse neighborhood. No interaction between frequency and density was found.
The main goal of this study was to determine the precise moment at which word frequency comes into play during spoken-word recognition. The earliest ERP differences between low- and high-frequency words were found around 350 ms after stimulus onset, that is, long before the end of the words, whose average duration was around 565 ms. More precisely, the first effect of frequency was seen on the P350, a component thought to reflect the activation of lexical forms (Friedrich et al., 2004a), with high-frequency words eliciting smaller amplitudes than low-frequency ones. This frequency effect had two major characteristics. First, it occurred roughly during the processing of the first two phonemes of the target words, the point in time at which lexical activation is assumed to start (e.g., Marslen-Wilson & Welsh, 1978). Second, it occurred before our words became uniquely identifiable, that is, when information in the speech signal was still ambiguous and partially compatible with other lexical candidates. Together, these observations strongly suggest that word frequency exerts its influence at an early stage of word processing, namely during lexical activation (Dahan et al., 2001; Marslen-Wilson, 1987, 1990; McClelland & Rumelhart, 1981), and they thereby rule out a purely late decisional locus (Connine et al., 1993). Note also that, at the electrophysiological level, the word-frequency effect on the P350 is theoretically important because it confirms the claim that this component reflects processes involved in the activation of lexical forms (Friedrich et al., 2004a). In particular, the amplitude difference between high- and low-frequency words indicates that high-frequency words are more easily activated, which leads to smaller amplitudes for these words.
Our results also suggest that frequency exerts an influence beyond the stage of lexical activation. We observed a second effect of frequency that started just after the offset of the target words and continued until about 80 ms after this offset. This late effect was seen on the late N400 component, with high-frequency words eliciting smaller negativities than low-frequency ones. A major characteristic of this frequency effect is that it occurred when information in the speech signal was no longer compatible with other lexical candidates, and thus when the target words could be reliably identified. This late frequency effect appears to reflect the ease with which the target word is selected as the best candidate: when the target word is of high frequency, its selection is easier, which leads to smaller negativities. This interpretation fits with the results of other studies in which an effect thought to reflect lexical selection was found in approximately the same time window (Desroches et al., 2009). Following Desroches et al. (2009), the late uniqueness point of our words, which fell after the last phoneme, may have shifted the selection-related effect in time, so that the frequency effect emerged only on the late N400.
Crucially, our results suggest that word frequency affects the core processes of word identification, starting from the initial phase of lexical activation and including target word selection. This observation is compatible with spoken-word recognition models such as the Cohort model (Marslen-Wilson, 1987, 1990) and the TRACE model (McClelland & Elman, 1986). In these models, word frequency can influence both lexical activation and lexical selection, because it is assumed to be encoded in the resting activation level of lexical candidates,3 and because the selection of a particular word takes into account the global activation within the lexicon. For example, in the Cohort model, the target word is recognized when the difference in activation between it and its most activated competitors reaches a certain criterion. As a result, all other things being equal, the more frequent a word is, the more rapidly it surpasses the activation level of the other candidates and the more rapidly it can be selected. Although the authors of the TRACE model proposed a mechanism to account for phoneme recognition, they did not do so for words. Nonetheless, numerous TRACE simulations have taken into account the activation level of the other candidates to define the precise moment at which the target word is selected and recognized (e.g., Frauenfelder & Peters, 1998). The Distributed Cohort Model (DCM; Gaskell & Marslen-Wilson, 1997) could likewise accommodate early frequency effects by assuming stronger connection weights for high-frequency words, producing an advantage for these words even while the input is still being received and is therefore ambiguous. Late frequency effects could be accounted for by a learning mechanism in which the model must produce maximal lexical activation; as a result, its output is necessarily biased in favor of higher-frequency candidates.
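The Cohort-style selection criterion described above can be illustrated with a deliberately minimal toy model. The Python sketch below rests on assumptions that are ours, not the Cohort or TRACE models' actual equations: resting activation is set to a log-frequency term, each heard phoneme adds +1 to candidates that match it at the same position and -1 to those that do not, and a word is selected once its activation exceeds that of its most activated competitor by a fixed criterion. The lexicon, frequencies, criterion, and scaling factor are invented for illustration.

```python
import math
from typing import Dict, Optional

# Hypothetical toy lexicon: one character = one phoneme; values are invented
# occurrences per million, used only to set resting activation.
TOY_FREQS: Dict[str, float] = {"kab": 100.0, "kap": 1.0, "mur": 50.0}


def activation(word: str, heard: str, freq: float, freq_scale: float) -> float:
    """Toy activation: a resting level proportional to log10 frequency, plus
    +1 for each heard phoneme matching `word` at the same position and -1
    for each mismatch (including positions beyond the word's end)."""
    resting = freq_scale * math.log10(freq)
    support = sum(
        1 if i < len(word) and word[i] == p else -1
        for i, p in enumerate(heard)
    )
    return resting + support


def selection_point(
    target: str,
    freqs: Dict[str, float],
    criterion: float = 2.5,
    freq_scale: float = 1.0,
) -> Optional[int]:
    """Return the first phoneme position at which the target's activation
    exceeds that of its most activated competitor by at least `criterion`,
    or None if this never happens before the target's offset."""
    for t in range(1, len(target) + 1):
        heard = target[:t]
        target_act = activation(target, heard, freqs[target], freq_scale)
        best_rival = max(
            (activation(w, heard, f, freq_scale) for w, f in freqs.items() if w != target),
            default=float("-inf"),
        )
        if target_act - best_rival >= criterion:
            return t
    return None


if __name__ == "__main__":
    print(selection_point("kab", TOY_FREQS))  # high-frequency member of the pair
    print(selection_point("kap", TOY_FREQS))  # low-frequency member of the pair
```

With these invented values, "kab" reaches the criterion at its third phoneme, whereas "kap", which differs from it only in its final phoneme and in frequency, does not reach the criterion by its offset. Setting `freq_scale=0.0` removes the frequency difference, and the two words then behave identically.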
Our results are thus broadly consistent with several implementations of the word-frequency effect. However, it is difficult to determine exactly which model best accounts for this effect, as the detailed TRACE simulations of different frequency implementations by Dahan et al. (2001) illustrate. These authors compared the three major implementations of frequency discussed in the introduction: frequency operating on resting activation levels, on connection weights, and as a bias applied to activations in a decision rule during the selection process. They showed that although all three methods fit their eye-tracking data, the connection-strength approach provided the best fit to their frequency effect. Most crucially, however, with regard to the precise time course of the word-frequency effect, the three methods made similar predictions, giving high-frequency words an advantage early in the recognition process. Our findings are thus consistent with the experimental results and simulations of Dahan et al. (2001) in showing that frequency plays a role at the earliest moments of spoken-word recognition.
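Of the three implementations compared by Dahan et al. (2001), the decision-rule bias is the easiest to show in isolation. The Python sketch below is a hypothetical illustration, not their formulation: activations are converted into choice probabilities with a Luce-style exponential choice rule, and frequency enters only as an additive log-frequency term applied after activation. The activations, frequencies, bias weight, and scaling constant `k` are all invented for illustration.

```python
import math
from typing import Dict


def luce_choice(activations: Dict[str, float], k: float = 5.0) -> Dict[str, float]:
    """Luce-style choice rule: p_i = exp(k * a_i) / sum_j exp(k * a_j)."""
    strengths = {w: math.exp(k * a) for w, a in activations.items()}
    total = sum(strengths.values())
    return {w: s / total for w, s in strengths.items()}


def with_frequency_bias(
    activations: Dict[str, float], freqs: Dict[str, float], bias: float = 0.1
) -> Dict[str, float]:
    """Hypothetical post-activation bias: add bias * log10(frequency) to each
    candidate's activation before it enters the decision rule."""
    return {w: a + bias * math.log10(freqs[w]) for w, a in activations.items()}


if __name__ == "__main__":
    # Ambiguous input: both candidates are equally supported by the signal so far.
    acts = {"kab": 0.5, "kap": 0.5}
    freqs = {"kab": 100.0, "kap": 1.0}
    print(luce_choice(acts))
    print(luce_choice(with_frequency_bias(acts, freqs)))
```

With equal bottom-up support, the unbiased rule splits the choice evenly, whereas the biased rule favors the higher-frequency candidate, which is the kind of early advantage that all three implementations were found to predict.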
The second goal of our study was to track the time course of the neighborhood density effect, which is known to reflect mechanisms involved in lexical access. Although no neighborhood density effect was observed in the RT data,4 the ERP findings indicated that neighborhood density also affected the early stages of spoken-word recognition, around 250 ms after stimulus onset. In particular, we observed a first effect of neighborhood density on the PMN component, with words residing in dense neighborhoods eliciting lower amplitudes than words residing in sparse neighborhoods. Because much evidence suggests that the PMN reflects prelexical processing (e.g., Connolly et al., 1992; Newman et al., 2003), we think that this first effect of neighborhood density reflects the ease with which words are processed at the phonemic level. Indeed, words residing in dense neighborhoods contain frequently occurring phonemes and phoneme sequences, which facilitates their prelexical processing (see Vitevitch & Luce, 1998, 1999). Interestingly, an effect of neighborhood density was also found at word offset, lasting until about 80 ms after stimulus offset. As for the word-frequency effect, this second effect of neighborhood density was seen on the late N400, with greater negativities for words residing in dense neighborhoods than for words residing in sparse neighborhoods. Again, we think that this later effect reflects the ease with which words are selected: words residing in dense neighborhoods encounter more intense competition from activated lexical candidates and are therefore more difficult to select as the best candidate (see also Desroches et al., 2009).
To conclude, our results converge with those of Dahan et al. (2001) in showing that word frequency exerts an influence during spoken-word processing long before word offset and, more important, before there is sufficient information to make a reliable decision about target identity. They thus rule out any interpretation of the word-frequency effect in terms of a purely decisional locus after word identification has been completed. Importantly, our results also show that the influence of word frequency persists for some time after target offset. We therefore conclude that word frequency is involved in both lexical activation and lexical selection.
Because some models emphasize onset-based similarity (Marslen-Wilson & Welsh, 1978; Norris, 1994) by activating a set of candidates that match the initial stretch of the input (roughly the first two phonemes), we also computed the number of words that match the target words on their first two phonemes, regardless of length. As will be seen in the description of our materials, words residing in dense neighborhoods according to the definition of Luce and Pisoni (1998) tend to have many neighbors that share their first phonemes; conversely, words residing in sparse neighborhoods by that definition tend to have few such neighbors. Hence, the two metrics lead to similar predictions.
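The two density metrics mentioned in this note can be sketched as follows. Under Luce and Pisoni's (1998) definition, a neighbor is a word that differs from the target by a single phoneme substitution, addition, or deletion; the onset-based count simply tallies words sharing the target's first two phonemes, whatever their length. The Python below is a hypothetical illustration on an invented toy lexicon (one character per phoneme). Excluding the target itself from both counts is our assumption, and the code is not intended to reproduce the counts for the present materials.

```python
from typing import Sequence

Phonemes = Sequence[str]  # a word as a sequence of phoneme symbols

# Hypothetical toy lexicon: one character = one phoneme (invented for illustration).
TOY_LEXICON = ["kab", "kap", "ka", "kabo", "tab", "mur", "kapo", "kato"]


def is_lp_neighbor(a: Phonemes, b: Phonemes) -> bool:
    """True if b differs from a by exactly one phoneme substitution,
    addition, or deletion (Luce & Pisoni, 1998, neighbor definition)."""
    a, b = tuple(a), tuple(b)
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: same length, exactly one mismatching position.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Addition/deletion: removing one phoneme from the longer form yields the shorter.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def lp_density(target: Phonemes, lexicon: Sequence[Phonemes]) -> int:
    """Number of Luce-and-Pisoni neighbors of `target` in `lexicon`."""
    return sum(is_lp_neighbor(target, w) for w in lexicon)


def onset_count(target: Phonemes, lexicon: Sequence[Phonemes], n: int = 2) -> int:
    """Number of other words sharing the target's first n phonemes, any length."""
    t = tuple(target)
    return sum(1 for w in lexicon if tuple(w) != t and tuple(w)[:n] == t[:n])


if __name__ == "__main__":
    print(lp_density("kab", TOY_LEXICON), onset_count("kab", TOY_LEXICON))
```

Here "kapo" and "kato" count toward the onset-based measure but not toward the Luce and Pisoni measure, which shows that the two metrics overlap without being identical.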
The N100 is considered to reflect early acoustic processing of spoken words.
Although word frequency is not implemented in the original version of the TRACE model (McClelland & Elman, 1986), the authors envisage that “frequency effects could be accommodated, as they were in the interactive-activation model of word recognition, in terms of variation in the resting activation level of word units….” (p. 60).
Unexpectedly, despite the robust behavioral evidence for an inhibitory effect of neighborhood density, we did not find this effect in the RT analyses. In our study, words residing in sparse neighborhoods had on average 16 neighbors. However, in a recent behavioral study conducted with French materials (Dufour & Frauenfelder, 2010), in which strong inhibitory effects of neighborhood density were found, words in sparse neighborhoods had on average fewer than five neighbors. The words categorized as having few competitors in the present study may thus already have been subject to so much competition that our RT measures could not capture the additional competition caused by the extra neighbors in the dense-neighborhood condition.
The ERP analyses were performed using Cartool software (developed at the Center for Biomedical Imaging of Geneva and Lausanne). Our thanks go to Ronald Peereman for his help in the calculation of positional phoneme and biphone frequencies. We are grateful to Jim Magnuson and three anonymous reviewers for their helpful comments on earlier versions of this article.
Appendix A Words used in each experimental condition