Chinese readers' eye movements were simulated in the context of the E-Z Reader model, which was developed to account for the eye movements of readers of English. Despite obvious differences between English and Chinese, the model did a fairly good job of simulating the eye movements of Chinese readers. The successful simulation suggests that the control of eye movements in reading Chinese is similar to that in an alphabetic language such as English.
During the past few years, a number of formal computational models of eye movement control in reading have been proposed.1 These models differ on a number of dimensions including (a) the extent to which cognitive/lexical processing influences eye movements, (b) whether lexical processing drives the eye movements or primarily serves to interrupt processing when something does not compute, and (c) the extent to which words are lexically processed serially or in parallel. Nevertheless, most of them are able to account for certain benchmark data on eye movements during reading. Specifically, they generally account for the following effects (see Rayner, 1998, for a summary of studies supporting these effects): (a) word frequency (readers spend less time fixating on high-frequency words than low-frequency words), (b) predictability (readers spend less time looking at predictable words than unpredictable words), (c) preview benefit (readers spend less time looking at a word when they have received a valid preview of the word than when they received an invalid preview), and (d) landing position (the initial fixation in a word tends to be halfway between the middle and the beginning of a word).
Although these different models can account for much of the data, they all deal with the reading of alphabetic languages. Our view is that one venue for testing the generalizability of a model is to determine how effectively it can account for the data on eye movements when reading a non-alphabetic language.2 Perhaps the best test case, in terms of being the most dramatically different from alphabetic writing systems like English, is Chinese. In this article, we will discuss how the E-Z Reader model, which we developed for English (Pollatsek, Reichle, & Rayner, 2006; Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Rayner, & Pollatsek, 2003), fared when we applied it to Chinese. Although such an endeavor is interesting in the context of models of eye movement control in reading, it also has broader significance in that there has recently been considerable interest in the extent to which cultural differences influence language processing and cognition in general (Chua, Boland, & Nisbett, 2005; Rayner, Li, Williams, Cave, & Well, 2007).
We will first review a few basic facts about Chinese and then provide a quick overview of what is known about the eye movements of Chinese readers. Chinese text is formed by strings of equally spaced box-like symbols called characters. Historically, it was printed from top-to-bottom (with the columns printed from right-to-left). However, like English, it is now most typically printed horizontally from left-to-right. Unlike English (and other alphabetic writing systems), Chinese is written without spaces between successive characters and words. Furthermore, individual characters vary in terms of complexity because they differ in (a) the number of strokes per character, (b) the number of radicals (or certain combinations of strokes that denote semantic or phonological information), and (c) the manner of construction (i.e., radicals can be combined in different ways to form compound words). Basically, there are many visual details packed into a constant, box-shaped area for each character.
Although the concept of a word is not as clearly defined in Chinese as it is in English (so that Chinese readers will disagree somewhat concerning where word boundaries are located), Chinese words, like English words, differ in frequency. Chinese characters are more like morphemes, and most words are made up of two characters, although some words consist of only one character and some consist of three or more characters. Finally, Chinese words, like English words, vary in terms of how predictable they are from the preceding context.
What is known about the eye movements of Chinese readers? First, the perceptual span3 for Chinese readers extends 1 character to the left of fixation to 2 to 3 characters to the right when reading from left-to-right (Chen & Tang, 1998; Inhoff & Liu, 1997, 1998);4 in contrast, in English the span extends 3 to 4 letters to the left of fixation to about 14 to 15 letters to the right of fixation (Rayner, 1998). Second, not surprisingly, average saccades are much shorter in Chinese (about 2.6 characters) than in English (about 7–8 letters) because the information is more densely packed in Chinese (Chen, Song, Lau, Wong, & Tang, 2003). Third, average fixation durations tend to be very similar (about 225–250 msec) for readers of Chinese and English (Chen et al., 2003; Rayner, 1998; Sun & Feng, 1999). Fourth, regression rate appears to be slightly higher in Chinese (about 15%) than English (about 10%) skilled readers (Chen et al., 2003; Rayner, 1998). Fifth, the probability of skipping a word tends to be fairly similar in Chinese and English (Rayner, Li, Juhasz, & Yan, 2005).5 Sixth, Chinese readers, like English readers, fixate for less time on high-frequency words than on low-frequency words (Yan, Tian, Bai, & Rayner, 2006) and on high-predictable words than on low-predictable words (Rayner et al., 2005); like English readers, they also skip high-predictable words more than low-predictable words (Rayner et al., 2005) and high-frequency words more than low-frequency words (Yan et al., 2006). Finally, character frequency affects fixation time on a word, but only when word frequency is low (Yan et al., 2006).
2. The E-Z Reader model
We will next describe the basic architecture of the E-Z Reader model, and then turn to simulations to see how well it could account for the eye movements of Chinese readers. A major motivation for the E-Z Reader model was to account for eye movements during reading in English (or a typical alphabetic orthography) making the fewest and simplest assumptions possible. In such orthographies, a salient visual entity is the word, which is delimited by spaces.6 Thus, in E-Z Reader, the primary “engine” to move the eyes forward is the encoding of a word and the target for a saccade is the middle of a word. Given that words are not visually marked in Chinese (and indeed, there are disagreements about where word boundaries are), it is an open question whether a model that makes such assumptions will give a satisfactory account of reading in Chinese. On the other hand, there is good reason to believe that words are also quite salient for Chinese readers (even though the word boundaries are not clearly marked). For example, Bai, Yan, Liversedge, Zang, and Rayner (2007) had Chinese readers read sentences in which spaces were inserted between (a) every character, (b) every word, or (c) pseudo-randomly (so that the spaces did not clearly mark words). They found that inserting spaces between characters and pseudo-randomly disrupted reading, whereas inserting spaces between words did not yield reading times that were any different from normal Chinese text. Given that the Chinese readers had not seen spaced text before (and had a lifetime of experience reading unspaced text), the fact that spaces between words didn't increase reading time is evidence that words are psychologically salient for Chinese readers.
2.1. Encoding assumptions
The key encoding assumption of E-Z Reader is that words are lexically processed serially (i.e., one word at a time), although it is tacitly assumed that the component letters within a word are processed in parallel. That is, covert attention is posited to be directed to a word (word n) and does not shift to the next word (word n + 1) until encoding of the prior word is completed. The second key assumption is that the signal that triggers an eye movement to a word is not the same signal that triggers an attention shift to that word. That is, lexical encoding is, within the model, formally divided into two stages, L1 and L2, although they may not be distinct stages in the lexical processing system but instead may reflect the crossing of two different thresholds of lexical activation. Completion of the earlier stage, L1, provides a signal to the eye movement system to make an eye movement to the next word and completion of the later stage, L2, provides a signal to the attention system to shift attention to the next word. Both of these stages, in which words are processed serially, are assumed to be preceded by a visual stage, V, in which visual information impinging on the retina (although with acuity constraints) is assumed to be processed in parallel. Thus, low-level information, such as the spaces between words, is available early in a fixation to aid the planning of eye movements.
A key assumption of the model is that the durations of both L1 and L2 are functions of the frequency of the word in the language and its predictability from the prior text, and thus that the speed of processing a word can influence the fixation time on it (see Equations 1 and 2 in Fig. 1). A second key assumption is that processing speed decreases as words are further from the fixation point. The form of Equation 3 (see Fig. 1), in which speed is modulated by the average absolute distance of the letters in a word from the fixation point, implies that (a) words in the parafovea will be processed more slowly than those in foveal vision; (b) fixated words will be processed more slowly the further the fixation point is from the center of the word;7 and (c) all else being equal, longer words will take longer to process than shorter words because, on average, the letters will be further from fixation.
2.2. Eye movement control assumptions
The second set of assumptions of the model deals with how signals are sent to, and executed by, the eye movement system. The key assumption, which explains why words that are easier to process are skipped, is a saccade cancellation mechanism. This is realized in the model by positing two stages in the eye movement planning time. That is, when a signal goes to the eye movement system, there is a program immediately laid down, but it is in a labile state and can be cancelled by a subsequent eye movement program; after a period of time has elapsed it goes into a non-labile state and cannot be cancelled by a subsequent program.
In the normal course of affairs, when the reader is on word n, a signal goes out to fixate word n + 1 when L1 is completed, and after both the labile and non-labile stages of the eye movement program are complete, a saccade is executed whose target is word n + 1. However, when word n + 1 is easy to process (and thus its L1 and L2 stages are short), a second eye movement program can be initiated to fixate word n + 2 (while the reader is still fixating word n) while the eye movement program to word n + 1 is still in the labile stage. In these cases, the eye movement program to fixate word n + 1 will be cancelled, the program to fixate word n + 2 will be executed (later), and thus word n + 1 will be skipped. To handle refixations on words, the model also assumes that an eye movement program to refixate a word is initiated almost automatically at the beginning of a fixation. However, this refixation program can be cancelled (and will often be cancelled) by the initiation of the program to fixate word n + 1. This cancellation will occur less often when L1 is long, and thus the model predicts that words that are more difficult to process will be refixated more often.
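The cancellation logic can be sketched as two timing comparisons. This is a deliberately simplified, deterministic illustration, not the model's implementation: it ignores the visual stage, the refixation decision time, and the variability of stage durations, and the function names and the 100 ms labile duration are hypothetical.

```python
def word_is_skipped(l2_word_n_ms, l1_next_word_ms, labile_ms=100.0):
    """True if the program to fixate word n + 1 is cancelled.

    The program to word n + 1 starts when L1 on word n completes. Attention
    shifts to word n + 1 once L2 on word n completes, and the program to
    word n + 2 starts when L1 on word n + 1 completes. If that second
    program starts while the first is still labile, the first is cancelled.
    """
    time_until_second_program = l2_word_n_ms + l1_next_word_ms
    return time_until_second_program < labile_ms


def refixation_cancelled(l1_word_n_ms, refix_labile_ms=100.0):
    """True if the automatic refixation program is cancelled.

    The refixation program starts at fixation onset; it is cancelled when
    the program to word n + 1 (triggered by L1 completion) begins while
    the refixation program is still labile.
    """
    return l1_word_n_ms < refix_labile_ms
```

With these illustrative numbers, an easy word n + 1 (L1 of 40 ms after an L2 of 30 ms on word n) is skipped, whereas a word n whose L1 takes 150 ms keeps its refixation program and is refixated.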
The last set of assumptions deals with eye movement targeting. As indicated above, all eye movements are assumed to be directed to the middle of the targeted word (both refixation programs and programs targeting other words). However, no motor programming is perfect, so there are errors assumed in the execution of the eye movement programs. First, there is random error assumed, which follows a normal (Gaussian) distribution whose standard deviation increases linearly with the length of the programmed eye movement. Second, and perhaps more important, there is assumed to be bias: There is, in some sense, a preferred eye movement distance, and saccades that are programmed to be longer than this distance tend to be shorter than they were intended and saccades that are programmed to be shorter than this distance tend to be longer than intended (see Equation 4 in Fig. 1). Thus, short words can also be skipped just because they are short. However, if they are also reasonably easy to process (and thus the L2 stage on word n + 1 is complete before word n + 2 is fixated) this will create no problems for the reader.
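The two sources of targeting error can be sketched as follows. Equation 4 is not reproduced in this text, so the linear bias term is an assumption, as is drawing the random error from a zero-mean Gaussian whose standard deviation is an intercept plus a slope times the intended length. The parameter names echo the symbols in Table 1, but all values are hypothetical.

```python
import random


def landing_position(launch, target_center, psi=3.0, omega=0.3,
                     eta1=0.5, eta2=0.15, rng=random):
    """Simulated landing position in character units (illustrative only).

    psi   : preferred saccade length (hypothetical value)
    omega : strength of the pull toward psi (assumed linear bias)
    eta1, eta2 : intercept and slope of the random-error SD (hypothetical)
    """
    intended = target_center - launch
    # Bias: saccades shorter than psi overshoot, longer ones undershoot.
    systematic = omega * (psi - intended)
    # Random error: zero-mean Gaussian, SD grows with intended length.
    sd = eta1 + eta2 * abs(intended)
    return launch + intended + systematic + rng.gauss(0.0, sd)
```

Passing a seeded `random.Random` instance as `rng` makes a simulation reproducible; setting `eta1` and `eta2` to zero isolates the systematic bias.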
3. Applying the E-Z Reader model to Chinese reading data
On some level, the obvious question is whether the control of eye movements during reading in a language like Chinese is fundamentally different than in English. One way to test this is to try to apply E-Z Reader, which gives a good account of eye movement data in English, to Chinese. The corpus we selected to model was that of Rayner et al. (2005). In the modeling, there are three obvious differences in the orthography that one has to consider. First, the concept of “word” in Chinese is far from settled and thus, as noted earlier, different readers of Chinese may disagree on where the word boundaries are. We believe, however, that there is still a reasonable amount of agreement on what words are, and thus the word boundaries we used (see Fig. 2 for examples) were based on the judgments of three native speakers of Chinese.8 Second, there are no word boundaries signaled by the orthography similar to the spaces between words in English. Thus, it is far from clear how a Chinese reader would target the middle of a word. However, we decided to leave that part of the model unchanged. (We will discuss alternatives after presenting the simulation; however, it is far from clear what a reasonable alternative is.) Third, a character is quite different from a letter, both visually and linguistically. However, again, to keep the change from English as small as possible, we treated it as an orthographic unit, just as a letter is in English.
The corpus of text simulated consisted of 36 Chinese sentences (see Rayner et al., 2005). Each sentence was on a single line. The 16 participants read each sentence and pressed a key when they had read the sentence, and were periodically asked questions that tested for their understanding of the sentence they had just read.9 As with the simulations involving reading of English, we did not attempt to model fixations on the first or last words of the sentence. Moreover, as E-Z Reader does not attempt to explain “higher order” effects on eye movements (such as syntactic misparsing), we also did not analyze sentences in which there were interword regressions (as per earlier simulations).
Our method of fitting was one in which some of the parameters were fixed and others were free to vary within certain constrained reasonable limits (see Table 1 for details on the parameters), and as there is no algebraic solution for a best fit, we simply repeated the simulation a number of times and found the best fit (i.e., the one that minimized the root mean squared deviation, or RMSD, between observed and predicted values) within these constraints. As with our simulations in English, we divided the words in the text into five frequency classes, using the Chinese Dictionary (National Languages Committee, 1997), and examined how well the model fit various indexes of eye movements such as first fixation duration, gaze duration, and probability of skipping, for the five classes.
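The fitting procedure can be sketched as a search over candidate parameter sets scored by root mean squared deviation. This is a minimal illustration under stated assumptions: `simulate` stands in for a full run of the model, `best_fit` is a hypothetical helper name, and the deviations here are raw rather than scaled in any way.

```python
import math


def rmsd(observed, predicted):
    """Root mean squared deviation between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_fit(candidates, simulate, observed):
    """Return the candidate parameter set whose simulated values have the
    lowest RMSD from the observed values (simple exhaustive search)."""
    return min(candidates, key=lambda params: rmsd(observed, simulate(params)))
```

In practice the observed and predicted sequences would hold measures such as first fixation duration, gaze duration, and skipping probability for each of the five frequency classes, and the candidates would be drawn from within the constrained parameter ranges.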
Table 1. Best fitting parameters
[Table body not recovered. Columns: Parameter; Best-fitting value in current simulation; Best-fitting value in simulation of English]
Note. α1, α2, α3, ε, Ω1, Ω2, Ψ, η1, and η2 were free to vary when fitting; all of the other parameters were fixed. V = duration of visual stage; M1 and M2 = durations of labile and non-labile stages of eye-movement programming; R = the mean refixation decision time; S = the assumed time for a saccadic movement; η1 and η2 = the intercept and slope of the random error component of saccadic programming; σγ = the standard deviation of each processing distribution as a proportion of its mean (set to 0.22); λ = the parameter controlling refixation probabilities as a function of distance from the center of the word.
As seen in Table 2, E-Z Reader does a reasonable job of predicting the durations of fixations and the probabilities of fixating Chinese words as a function of their frequency. As with our simulations of English reading, the fit is not perfect, but there do not seem to be any serious anomalies. With the fixation durations, the major deviation between observed and predicted is for frequency Class 4, and there it appears that the observed values are a bit anomalous (e.g., the mean individual fixation durations on Class 4 words are actually less than on Class 5 words). Moreover, the means are based on relatively little data, as skipping rates were high for these words. The model also appeared to be under-predicting the effect of frequency on the skipping rates a bit.
Note. Root mean squared deviation (RMSD) = 0.1731. The RMSD for E-Z Reader 9 in English was 0.153 (Pollatsek, Reichle, & Rayner, 2006). The skipping rate presented here is much higher than that reported by Rayner et al. (2005). All of the target words in Rayner et al. (2005) were two characters long. In this analysis of the data, one-character words were also included, leading to the higher overall skipping rates.
In evaluating the performance of the model, one also would want the parameter values estimated to be defensible in terms of what is known about reading. We therefore present in Table 1 the best fitting parameter values for Chinese compared with those in our latest full simulation of data in English. As can be seen, the parameters involved in the speed of lexical access are all fairly similar to the English values. The parameters that are notably different, Ψ, Ω1, Ω2, η1, and η2, deal with systematic and random errors in programming; however, these parameters were scaled in terms of characters and letters for Chinese and English, respectively, so that it makes sense that most of the values should be three to five times as big in English as in Chinese. The parameter η2, indexing the random error component of targeting saccades, indicates that saccade targeting is more variable for larger planned saccades in Chinese. This seems reasonable, as we discuss below. Only the eccentricity parameter, ε, appears to be at all anomalous: It is about the same size in the two simulations, even though the unit is the character in Chinese and the letter in English. Our attempts to force this value to be bigger in the Chinese simulations, so that the eccentricity parameters would be approximately equal when scaled in terms of degrees of visual angle, were unsuccessful. Apparently, the Chinese readers were more successful at extracting relevant information about characters further from fixation (in terms of visual angle) than were readers of English. This could be because there is lower spatial frequency information in Chinese orthography that plays a significant role in character and word identification, whereas analogous information (e.g., word shape) appears to play little or no role in English.
Another issue that our initial simulation did not consider was whether character frequency plays an important role in determining eye movement behavior apart from word frequency.
There are several ways to test this. The one we chose was to include character frequency as a predictor, along with word frequency and word predictability, in the equation for the duration of L1. (All other aspects of the model were unchanged, although the parameter values were free to vary as in the above fit.) The equation we chose (Equation 5; see Table 3) added a term that we designed to have the most reasonable form. That is, the term was constructed so that for very frequent words, the character frequency effect would be small, and for infrequent words, the character frequency effect would be bigger (Yan et al., 2006). In fact, when we applied this model to the data, the fit was actually slightly worse than with the model that used only word frequency and word predictability, in spite of having two more free parameters (see Table 3). This suggests that character frequency may not play an important role in Chinese reading apart from word frequency. However, it is possible that some alternative equation combining the effects of word frequency and character frequency may do a better job.
Table 3. E-Z Reader fits using character frequency
Simulations using the E-Z Reader model indicated that it did a good job of accounting for both when (i.e., how long readers look at words) and where (i.e., which words are skipped) Chinese readers move their eyes. We thus think that our simulations indicate that the hypothesis that the control of eye movements in reading Chinese is similar to that in an alphabetic language such as English is a reasonable one. Obviously, however, we are just scratching the surface and more sophisticated tests need to be done. A major question that needs to be explored, of course, is the mechanism by which saccades in Chinese are targeted given that there are no obvious cues for word boundaries. In our current simulations, we assumed that they were appropriately targeted as if these cues were present, and the one indication that something different was going on was that the η2 parameter was substantially larger in Chinese, indicating more variability in programming, especially for larger saccades. In terms of the current simulation, one could characterize this as follows: Chinese readers were trying to do the same thing as English readers (i.e., trying to target the center of a word) but were less successful because word boundaries are not marked in the orthography. If the next word is short (e.g., a single character), this may be less of a problem. Needless to say, it is reasonable to consider alternatives to targeting words. However, they are far from obvious. Given that the probability of skipping a word in Chinese is influenced by the predictability of a word with word length held constant (Rayner et al., 2005), one cannot reasonably posit that Chinese readers simply target saccades a fixed distance forward. Thus, they appear to have some sort of target based on what they have read.
One possibility is that they do in fact have a fairly good idea where the end of a word is that has been encoded and program the next saccade to be something like one to two characters ahead of that (or possibly something like 1° of visual angle past the last word encoded). The answers to these questions require better knowledge of what kinds of visual information Chinese readers have online when they are reading.
In closing, we want to make it clear that by simulating the eye movements of Chinese readers in the context of the E-Z Reader model, we do not mean to imply that other extant models of eye movement control in reading could not do likewise. Our goal with the present simulations was not to suggest that E-Z Reader can do it, and the other models cannot. We do suspect that any type of purely oculomotor control model, in which the eyes move forward a set number of characters and pause for variable amounts of time to input the information, would have a difficult time accounting for the data. However, it seems at least plausible that other models that allow for cognitive/lexical influences on eye movements during reading could provide simulations that would capture the data pattern. Finally, further research such as that reported here examining language and cognition as a function of cultural background should help us to further understand important differences and similarities in cognitive processing due to cultural differences.
For a recent survey of existing models of eye-movement control during reading, see the 2006 (vol. 7) special issue of Cognitive Systems Research, which includes current instantiations of the following computational models: E-Z Reader (Reichle, Pollatsek, & Rayner, 2006), SHARE (Feng, 2006), SWIFT (Richter, Engbert, & Kliegl, 2006), Glenmore (Reilly & Radach, 2006), and the Competition/Activation model (Yang, 2006). See Reichle, Rayner, and Pollatsek (2003) for an overview of the models and the differences between them. Among the models listed here, E-Z Reader has the most lexical involvement and the Competition/Activation model, which is generally described as an oculomotor model, has the least amount of lexical involvement. Whereas E-Z Reader involves the serial lexical processing of words, SWIFT and Glenmore allow for parallel processing of words.
The perceptual span refers to the region of effective vision during an eye fixation. It is usually determined via the moving window technique (McConkie & Rayner, 1975; Rayner & Bertera, 1979), in which the number of letters or characters available on each fixation is manipulated.
If the perceptual span is measured in terms of words or “idea units” rather than characters, the spans are fairly equivalent.
Actually, single fixations on a word tend to be shorter on the ends of words than in the middle of words. This counterintuitive finding is accounted for within the E-Z Reader model via mislocalized fixations (see Pollatsek, Reichle, & Rayner, 2006).
A possible counter-example is spaced compound words such as tennis ball, which are single words in some analyses.
There was initial disagreement on approximately 10% of the word boundaries in the sentences. Virtually all of the disagreements had to do with function words, and subsequent discussion among the three raters led to full agreement on the word boundaries used in the simulations.
This work was supported by Grant HD26765 from the National Institutes of Health and a grant from Microsoft. Talks describing the work were presented at the 11th International Conference on Processing Chinese and Other East Asian Languages in Hong Kong, December 2005; and at the 2nd China International Conference on Eye Movements in Tianjin, June 2006. We thank the reviewers for their helpful comments on an earlier draft.