Orthographic consistency influences morphological processing in reading aloud: Evidence from a cross‐linguistic study

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. Developmental Science published by John Wiley & Sons Ltd
1 Max Planck Institute for Human Development (MPIB), Berlin, Germany
2 Department of Educational Psychology, University of Göttingen, Göttingen, Germany
3 Department of Cognitive Science and Macquarie University Centre for Reading, Macquarie University, Sydney, NSW, Australia
4 Laboratoire de Psychologie Cognitive, Centre National de la Recherche Scientifique (CNRS), Aix-Marseille University, Marseille, France
5 International School for Advanced Studies (SISSA), Trieste, Italy
6 Laboratoire Apprentissage, Didactique, Évaluation, Formation, Aix-Marseille University, Marseille, France


| INTRODUCTION
Forming links between spoken and written language provides the foundation for learning to read. However, learning the print-to-sound relationships in a given orthography is not sufficient to become a skilled reader. Children also need to learn to map print onto meaning in order to recognize words quickly, reliably, and efficiently (Nation, 2009). How might this process occur? Morphemes, the minimal linguistic units with a lexical or a grammatical meaning (Booij, 2012), are thought to play an important role in reading acquisition. Critically, morpheme identification facilitates word recognition (see Amenta & Crepaldi, 2012), thus enabling skilled reading.
Despite its importance for the development of skilled reading, morphology has been neglected even in the most recent and prominent theoretical conceptualizations of reading acquisition (e.g. Perry, Zorzi, & Ziegler, 2019; Ziegler, Perry, & Zorzi, 2014). The empirical evidence shows that as children move beyond the first stages of learning to read, their ability to reflect on and manipulate the morphological structure of words, known as morphological awareness (Carlisle, 1995), starts to influence their reading (e.g. Carlisle & Stone,

| Orthographic Depth Hypothesis and Psycholinguistic Grain Size Theory
The Orthographic Depth Hypothesis postulates that the use of phonology should be more prevalent when reading in a shallow orthography than when reading in a deep orthography, because the consistency of grapheme-to-phoneme correspondences (GPCs) in the former makes the phonological representation of a printed word available to the reader at less cost when its phonology is assembled. In contrast, the inconsistency of grapheme-to-phoneme relationships in deep orthographies encourages the reader to focus on the visual-orthographic structure of printed words, which could be effectively done by referring to their morphology. Critically, according to the Orthographic Depth Hypothesis, during the course of reading acquisition, readers of deep orthographies would shift their reliance from phonological codes to orthographic lexico-semantic codes.
From this viewpoint, only when readers have well-established lexical representations can they use a visual-orthographic semantic reading mechanism. The Orthographic Depth Hypothesis is consistent with the idea that different alphabetic orthographies afford different reading mechanisms or strategies (Seidenberg, 2011).

Research Highlights
• We investigated whether the orthographic consistency of a language or its morphological complexity influences morphological processing in developmental and skilled reading.
• A reading aloud task was used in four alphabetic orthographies that differ in orthographic consistency and morphological complexity (i.e. English, French, German, Italian).
• Developing and skilled readers of English, the least consistent and most morphologically sparse language, showed greater morphological processing than readers of the other three languages.
• Our findings suggest that the orthographic consistency of a language, and not its morphological complexity, influences the extent to which morphology is used in reading.

The Psycholinguistic Grain Size Theory was developed to explain cross-language variation in reading acquisition, namely, that children learning to read in a deep orthography lag behind children learning to read in a shallow orthography (e.g. Seymour et al., 2003). According to this theory, while readers of shallow orthographies can reliably use GPCs to pronounce words correctly, readers of deep orthographies need to rely on larger orthographic units, such as syllables, rimes, or even whole words to assign correct pronunciations. This is because smaller grain sizes tend to be more inconsistent than larger grain sizes in deep orthographies (Treiman, Mullennix, Bijeljac-Babic, & Richmond-Welty, 1995). There are many more orthographic units to learn when the grain size is large than when it is small, thus slowing down the rate of reading acquisition in deep orthographies compared to shallow orthographies. Even though morphemes are not mentioned in the original description of the Psycholinguistic Grain Size Theory, morphological units are thought to bring an important degree of consistency to orthographies that are characterized by inconsistency in the mapping between spelling and sound (Ulicheva, Harvey, Aronoff, & Rastle, 2018). It follows from this theory that readers of deep orthographies are likely to rely on morphemes to the same extent they rely on other sublexical units such as syllables and rimes when reading aloud (see also Goswami & Ziegler, 2006, who acknowledge that morphology should be given a greater role in Psycholinguistic Grain Size Theory). Critically, according to the Psycholinguistic Grain Size Theory, there is no shift during reading development from phonological to lexico-semantic processing as a function of the consistency of the writing system. All readers have to go through an orthography-phonology mapping (phonological decoding), but they do so using different grain sizes.
With regard to our study, we predicted that based on the two theories outlined above, English readers should show overall more robust morphological processing than readers of French, German, and Italian. However, according to the Orthographic Depth Hypothesis, greater reliance on morphemes via a visual-orthographic reading mechanism should be apparent only in skilled, and not in developing readers of English, who just like the developing readers of the other three languages should show a preference for a phonological reading mechanism. In contrast, according to the Psycholinguistic Grain Size Theory, both developing and skilled adult readers of English should rely more on morphemes than on smaller grain sizes in reading aloud.

| Morphological complexity
Languages vary with regard to their morphological complexity.
Accordingly, deep orthographies seem to have simple inflectional morphology (e.g. English), whereas shallow orthographies tend to have complex inflectional morphology (e.g. German, Finnish, Italian, Serbo-Croatian). French appears to fall in the middle in this case, as its inflectional morphology is not as complex as in most shallow orthographies, but also not as simple as in English (Seidenberg, 2011).
Attempts to quantify morphological complexity across languages (for a review, see Borleffs, Maassen, Lyytinen, & Zwarts, 2017) reveal that according to the three main morphological complexity methods used in the literature, namely, Linguistica (Bane, 2008), Juola (1998, 2008), and type-token ratio (TTR; Kettunen, 2014), English is the least morphologically complex, followed in increasing order by German, French, and Italian (Linguistica), or Italian, French, and German (Juola), or French, Italian, and German (TTR). An empirical question that arises is whether the use of morphology during reading depends on the morphological complexity of a language. We would expect that readers might be more sensitive to the morphological structure of printed letter strings in morphologically rich languages (e.g. French, German, Italian) than in morphologically sparse languages (e.g. English). Such sensitivity might be more prominent in skilled adult readers than in developing readers, because of greater exposure of the former to the characteristics of their language.
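To illustrate the intuition behind the type-token ratio mentioned above, the following minimal sketch computes the basic ratio of distinct word forms (types) to running words (tokens). It is an illustration only: the function name and the toy token list are hypothetical, and the published TTR-based comparisons may rely on refined variants of this basic measure.

```python
def type_token_ratio(tokens):
    """Return the number of distinct word forms divided by the number of tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


# Toy example (hypothetical data): two distinct forms among three tokens.
example_tokens = ["the", "cat", "the"]
example_ttr = type_token_ratio(example_tokens)
```

All else being equal, a language with richer inflection yields more distinct word forms for the same amount of text, and therefore a higher ratio.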

| Previous studies
To our knowledge, only one study has implicitly tested the above hypotheses cross-linguistically (Casalis, Quémart, & Duncan, 2015).
In that study, greater morphological processing was observed in French than in English developing readers, suggesting that the use of morphology in reading depends on the morphological complexity of a language. One limitation of that study was that the items in the different conditions were not matched on psycholinguistic variables that are known to influence reading processes. Also, the stems in the nonword items were often modified within and across languages inconsistently. This is problematic, because children seem to process morphologically complex words with modified stems differently than words with preserved stems (Lázaro, García, & Burani, 2015). We took these issues into consideration when constructing the stimuli for the present study.

| Present study
Conducting cross-linguistic research is challenging, insofar as both within- and across-language factors need to be taken into account.
One common strategy is to use materials that are as similar as possible across the languages under examination (Frith, Wimmer, & Landerl, 1998; Ziegler, Perry, Jacobs, & Braun, 2001). Thus, we chose translation-equivalent nouns, which often happened to be cognates, either in some or all of the languages. These were used for the construction of morphologically structured and non-morphologically structured nonwords, which were the focus of the present study. Four conditions were created: Stem + Suffix (e.g. nightness), Stem + Non-Suffix (e.g. nightlude), Non-Stem + Suffix (e.g. nishtness), and Non-Stem + Non-Suffix (e.g. nishtlude). The advantage of this design is that it allowed us to investigate how the presence of a stem or a suffix in printed letter strings may independently influence reading aloud processes, as well as how these may interact during reading aloud. To avoid the use of strategic reading processes, such as focusing exclusively on sublexical units during nonword reading, morphologically simple and morphologically complex words were also included in the study.
Our aim was to investigate the processes that are at play when developing readers encounter new words with familiar units (i.e. morphemes). To simulate the situation that children face in natural reading we presented the nonwords intermixed with words. We used the reading aloud task and focused on morphologically structured and non-morphologically structured nonwords, because nonword reading aloud provides an index of children's decoding skills independently of their word knowledge (Castles, Rastle, & Nation, 2018).
The study was carried out with typically developing readers from Australia, France, Germany, and Italy, who attended Grade 3. We chose children in this grade, because compared to French, German, and Italian children, who typically reach 80%-90% nonword reading accuracy by the end of Grade 1, English-speaking children only start to reach similar levels of accuracy by Grade 3 (see Cossu, Gugliotta, & Marshall, 1995; Frith et al., 1998; Goswami et al., 1998; Landerl, 2000; Sprenger-Charolles, Siegel, & Bonnet, 1998; Wimmer & Goswami, 1994). Also, most words (60%-80%) that children encounter in third grade English texts tend to be morphologically complex (Anglin, 1993; Nagy & Anderson, 1984). Such proportions are likely to be higher in more morphologically productive languages. Critically, third graders are thought to be sensitive to the morphological characteristics of their language (Mann & Singson, 2003). Children in all four countries were roughly the same age. To test the predictions of the opposing theoretical accounts with regard to potential morphological processing differences as a function of reading experience, we also tested skilled adult readers on the same task in all four languages.

| Participants
A total of 126 children (30 Australian, 32 French, 32 German, and 32 Italian) in Grade 3 participated in the study for a small gift. French and German children were randomly selected from a larger sample that participated in an independent longitudinal project on the role of morphology in reading development. The selection criteria for these children were that (a) they were tested between February and March (to ensure that testing times were comparable across all languages; see below), and (b) their reading aloud accuracy was above 50%. Australian and Italian children were recruited for the purposes of the present study. Six Australian children achieved below 50% accuracy and were excluded, leaving a total of 24 to be included in the analyses. German, French, and Italian children were tested between February and May. Australian children were tested between September and October of the same year (given that the start of the school year in Australia is in February). Therefore, data collection in all countries took place after the first half of the third school year. Children in Australia started to receive formal reading instruction in the second half of the first school year, known as kindergarten (between ages five and six). In France, some reading instruction starts in the last year of école maternelle (at the age of five), which corresponds to kindergarten. In Germany and Italy, children start to receive reading instruction in the first grade (at the age of six).
A total of 128 adults (32 Australian, 32 French, 32 German, and 32 Italian) participated in the study for monetary compensation. Although studying at the university was not a requirement for participating in the study, most adult participants were university students. Both children and adult participants were native speakers of their respective languages, had normal or corrected-to-normal vision, and reported no hearing, reading, or language difficulties.
Participants' age and gender, as well as other demographic information, are shown in Table 1. The study was approved by the ethics committees of the participating universities and research institutions, as well as the relevant school authorities. Prior to participating in the study children gave oral consent, while written consent was obtained from their parents. Adult participants gave written consent.

| Materials
Sixty morphologically simple frequent nouns were selected from each language (e.g. night, nuit, Nacht, notte) for the construction of morphologically structured and non-morphologically structured nonword targets. The selected nouns served as stems and were combined with a frequent suffix, forming nonwords in the Stem + Suffix condition (e.g. nightness, nuiteur, Nachter, nottenza), or a letter sequence that did not correspond to a suffix, forming nonwords in the Stem + Non-Suffix condition (e.g. nightlude, nuiterge, Nachtatz, notterto). After a letter was replaced in the stems, the resulting non-stems were combined with the suffixes, forming nonwords in the Non-Stem + Suffix condition (e.g. nishtness, naiteur, Nechter, nuttenza), or the letter sequences (e.g. nishtlude, naiterge, Nechtatz, nutterto), forming nonwords in the Non-Stem + Non-Suffix condition. Translation-equivalent stems and whenever possible, translation-equivalent suffixes were used across languages.
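The 2 × 2 construction described above can be sketched as follows. This is an illustrative reconstruction using the English example from the text, not the authors' stimulus-generation procedure; the names `Quadruplet`, `build_nonwords`, and `differs_by_one_letter` are hypothetical, and simple concatenation is an assumption that need not hold where a language requires adjusting the stem before the ending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quadruplet:
    stem: str        # real stem, e.g. "night"
    non_stem: str    # stem with one letter replaced, e.g. "nisht"
    suffix: str      # real suffix, e.g. "ness"
    non_suffix: str  # letter sequence that is not a suffix, e.g. "lude"


def differs_by_one_letter(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_nonwords(q):
    """Cross stem status with suffix status to obtain the four conditions."""
    if not differs_by_one_letter(q.stem, q.non_stem):
        raise ValueError("non-stem must differ from the stem by one letter")
    return {
        "Stem + Suffix": q.stem + q.suffix,
        "Stem + Non-Suffix": q.stem + q.non_suffix,
        "Non-Stem + Suffix": q.non_stem + q.suffix,
        "Non-Stem + Non-Suffix": q.non_stem + q.non_suffix,
    }


english_example = build_nonwords(Quadruplet("night", "nisht", "ness", "lude"))
```

Because the stem and the ending vary independently, each quadruplet contributes exactly one item to every cell of the design.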

| Procedure
Three hundred items (60 words and 240 nonwords) were used in each language. Nonword items belonged to four conditions: Stem + Suffix, Stem + Non-Suffix, Non-Stem + Suffix, Non-Stem + Non-Suffix. Word items belonged to two conditions: Suffix and Non-Suffix. Four lists were created with each target nonword appearing once across the four lists and each target word appearing once in every list. Thus, each list comprised 120 items, 60 nonwords (15 with stem + suffix, 15 with stem + non-suffix, 15 with non-stem + suffix, and 15 with non-stem + non-suffix) and 60 words (30 suffixed and 30 non-suffixed), with all conditions being represented in every list. An equal number of participants were assigned to each list. The order of trial presentation within each list was randomized across participants. Six practice trials were presented prior to the experimental trials.
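The counterbalancing arithmetic above (240 nonwords spread over four lists of 60, with 15 per condition) can be checked with a short sketch. It assumes a standard Latin-square rotation in which each stem contributes one nonword to each list; the text does not state this explicitly, and the function and variable names are hypothetical.

```python
from collections import Counter

CONDITIONS = [
    "Stem + Suffix",
    "Stem + Non-Suffix",
    "Non-Stem + Suffix",
    "Non-Stem + Non-Suffix",
]


def assign_nonwords_to_lists(n_stems=60, n_lists=4):
    """Rotate conditions across lists so each (stem, condition) pair occurs once overall."""
    lists = {list_id: [] for list_id in range(n_lists)}
    for stem_id in range(n_stems):
        for cond_index, condition in enumerate(CONDITIONS):
            # Assumed rotation: shift the target list by one for each condition.
            list_id = (stem_id + cond_index) % n_lists
            lists[list_id].append((stem_id, condition))
    return lists


nonword_lists = assign_nonwords_to_lists()
per_list_counts = {
    list_id: Counter(condition for _, condition in items)
    for list_id, items in nonword_lists.items()
}
```

Adding the same 60 words to every list then gives the 120 items per list reported above.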
Participants were tested individually, seated approximately 60 cm in front of a laptop or a PC monitor in a quiet room. Stimulus presentation and data recordings were controlled by DMDX software (Forster & Forster, 2003). Participants were instructed to read aloud the items quickly and carefully. Each item was presented in lowercase letters, except for German, where the first letter of nouns is always uppercase. For consistency, all German items were presented in the same format. The stimuli appeared in white on a black background (20-point Arial font) and remained on the screen for 4,000 ms (children) or 3,000 ms (adults). The task lasted 15 min for children and 10 min for adults.

| Reading fluency
Children's reading ability was assessed to ensure they had no reading impairments that could affect their performance on the task.
Based on each sample, we calculated a z-score for correctly read words and a z-score for correctly read nonwords. The average of the two was used as a reading ability score in the analyses. The Italian MT Reading Test involved reading aloud of a text passage that contained words. A measure of reading speed of correctly read words, which was expressed in seconds per syllable, was extracted. This meant that higher scores on this test corresponded to slower children. Hence, z-scores based on the sample were first calculated, and then multiplied by −1.
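The scoring steps above can be sketched as follows. This is a minimal illustration, assuming sample-based z-scores computed with the sample standard deviation (the text does not say which estimator was used); all function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values against their own sample mean and sample SD (assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def reading_ability(words_correct, nonwords_correct):
    """Average the word and nonword z-scores for each child."""
    z_words = z_scores(words_correct)
    z_nonwords = z_scores(nonwords_correct)
    return [(zw + zn) / 2 for zw, zn in zip(z_words, z_nonwords)]


def reading_ability_from_speed(seconds_per_syllable):
    """Flip the sign so that faster readers (fewer seconds) get higher scores."""
    return [-z for z in z_scores(seconds_per_syllable)]
```

Multiplying the speed-based z-scores by −1 puts all samples on a scale where higher values mean better reading.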

| Vocabulary
Children's vocabulary size was assessed to obtain an estimate of the general level of lexical knowledge. The tests used were the vocabulary subtest of the Wechsler Abbreviated Scale of Intelligence (Wechsler, 2011) in English, the Test de vocabulaire actif et passif pour enfants de 5 à 8 ans (TVAP 5-8) in French (Deltour & Hupkens, 1980), and the vocabulary subtest of the CFT-20R (Weiß, 2006) in German. Based on each sample, z-scores were calculated. Due to testing time limitations, Italian children could not be administered a vocabulary test.

| Results
Naming latencies were determined by the acoustic onsets of participants' reading aloud responses. Acoustic onsets were hand-marked with CheckVocal (Protopapas, 2007) following the criteria specified by Rastle, Croot, Harrington, and Coltheart (2005). A response was deemed correct or incorrect using the same rules in all languages. In the case of words, incorrect responses were considered those where the word was read incorrectly. In the case of nonwords, incorrect responses corresponded to utterances containing mispronounced, deleted, or additional phonemes. The vast majority of nonwords yielded a single pronunciation. In those cases where nonwords could be pronounced in more than one way, all plausible pronunciations were considered correct. Generally, only pronunciations of nonwords that native speakers of the corresponding language considered illegitimate were marked as incorrect. In each language, trained research assistants who were naïve to the purposes of the study labelled the acoustic onsets and determined the accuracy of the reading aloud responses.
Analyses were performed using (generalized) linear mixed-effects (LME) models (Baayen, Davidson, & Bates, 2008). Data from all four languages were analysed together. The analyses of the children's data are reported first, followed by the analyses of the adult data. Nonwords and words were analysed separately.
Nonwords were the focus of the present study, so only the nonword analyses are reported in the paper. All data and the R code corresponding to the present analyses are available via the Open Science Framework (OSF; https://osf.io/byqp9/).

| Nonword naming latencies in children
Incorrect responses to words and nonwords (17.8% of the data) were removed. For the nonword analyses, latencies below 300 or above 3,000 ms (2.2% of the data) were considered as extreme values and were also removed. Outliers were identified following the procedure outlined by Baayen and Milin (2010). A base model that included only participants and items as random intercepts was fitted to the data and data points with residuals exceeding 2.5 SDs were removed (1.8% of the data).
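The two trimming steps described above can be sketched as follows. The sketch does not fit the base mixed-effects model itself: it assumes the residuals from such a model are already available and are passed in as a list. Function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def drop_extreme_latencies(latencies_ms, low=300, high=3000):
    """Remove latencies below `low` or above `high` (children's cutoffs by default)."""
    return [rt for rt in latencies_ms if low <= rt <= high]


def drop_residual_outliers(latencies_ms, residuals, sd_cut=2.5):
    """Keep observations whose standardized residual is within +/- sd_cut.

    `residuals` are assumed to come from a base model with random
    intercepts for participants and items, fitted elsewhere.
    """
    if len(latencies_ms) != len(residuals):
        raise ValueError("latencies and residuals must have the same length")
    m, s = mean(residuals), stdev(residuals)
    return [
        rt
        for rt, r in zip(latencies_ms, residuals)
        if abs((r - m) / s) <= sd_cut
    ]
```

For the adult data, the same functions would be called with `low=200` and `high=2000`, matching the cutoffs reported for adults.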
The LME model included the effect-coded fixed effects of

Morphological processing as a function of Reading Ability
The interaction between Suffix and Reading Ability was significant.

Morphological processing as a function of Vocabulary Knowledge
An additional prediction derived from the Psycholinguistic Grain Size Theory is that vocabulary knowledge might facilitate reading in all languages, because the phonological decoding network can only be reinforced when children know the words they decode (Ziegler et al., 2014), and this is true in all languages no matter the grain size they might use for the computation. In contrast, the Orthographic Depth Hypothesis would predict that vocabulary knowledge is particularly important for reading in English, because it would further boost lexico-semantic processing in this language (Harm & Seidenberg, 2004). We tested these predictions in English, French, and German.
The analyses were conducted in the same way as the analyses on Importantly, no differences across languages were observed.

| Nonword accuracy in children
Accuracy was analysed in the same way as naming latencies. The GLME model included the same fixed effects and interactions as the LME model. Results are shown in Table 4 and mean model errors are shown in Figure 4.

Main effects
The main effect of Stem was significant. Nonwords with stems  German; z = 0.969, p = .333, for French vs. Italian; z = 0.394, p = .694, for German vs. Italian). The absence of suffixes in nonwords was thus detrimental to children's reading accuracy in English. Last, the interaction between Stem and Suffix was significant. The Stem effect for suffixed nonwords (Δ = 8.9, z = −4.832, p < .001) was larger than the Stem effect for non-suffixed nonwords (Δ = 6.4, z = −2.820, p = .005).

| Nonword naming latencies in adults
Incorrect responses to words and nonwords (2.3% of the data) were removed. For the nonword analyses, latencies below 200 or above 2,000 ms (0.4% of the data) were considered as extreme values and were also removed. Outliers (1.9% of the data) were removed following the same procedure as for the children. The same LME model as for the analyses of the children's data was created except that Reading Ability was not included in the model. Results are shown in Table 3 and mean model naming latencies are shown in Figure 5.

Main effects
The

| Nonword accuracy in adults
The GLME model included the same fixed effects and interactions as the LME model. Results are shown in Table 4 and mean model errors are shown in Figure 6.

Main effects
There was a significant main effect of Language. Errors in English

| GENERAL DISCUSSION
The present study is the first that uses a tightly controlled cross-linguistic experimental design to examine whether readers of deep orthographies use morphemes to compute pronunciations (rather than other large grain sizes such as syllables, rimes, or whole words, which have been extensively investigated in the literature). We observed that morphological processing is, indeed, more robust in English than in more consistent orthographies such as French, German, and Italian. Our findings provide support for the Orthographic Depth Hypothesis and the Psycholinguistic Grain Size Theory, showing that the orthographic consistency of a language, and not its morphological complexity, modulates the extent to which morphology is used in reading (see Vannest, Bertram, Järvikivi, & Niemi, 2002, who also found more morphological computation in English than in Finnish, even though Finnish is renowned for its morphological richness).⁸ It is worth noting that cross-linguistic differences were even greater for stems than for suffixes, perhaps because of the serial left-to-right nature of the reading aloud task, which requires stem recognition prior to suffix recognition, thus placing more emphasis on the stem. Also, stem morphemes are thought to be highly salient units contributing the largest amount of meaning to morphologically complex words (Grainger & Beyersmann, 2017).
Another important finding of the present study is that the observed cross-linguistic differences in morphological processing were astonishingly similar for developing and skilled readers. This result is predicted by the Psycholinguistic Grain Size Theory, according to which readers of all alphabetic writing systems have to go through an orthography-to-phonology mapping (decoding) to acquire reading, but they use different grain sizes to do so. English spelling prioritizes the consistency of morphemes over the consistency of phonemes (Bowers & Bowers, 2018), which is why morphological units might be used by English-speaking children right from the start to achieve an efficient orthography-to-phonology mapping. The Orthographic Depth Hypothesis makes a somewhat different prediction by stating that readers of deep orthographies, such as English, shift their reliance from the phonological to the orthographic lexico-semantic route (Katz & Frost, 1992). Given that lexico-semantic processing takes time to develop, the Orthographic Depth Hypothesis would predict that cross-linguistic differences in morphological processing may only emerge with sufficient reading experience.
This specific finding also challenges connectionist reading models (e.g. Seidenberg & McClelland, 1989), which require a huge amount of training before they exhibit any cross-language differences in reading aloud (e.g. Hutzler, Ziegler, Perry, Wimmer, & Zorzi, 2004).
Such models make the strong prediction that morphological effects would only occur late during the learning-to-read process, when the division of labour shifts the focus from spelling-to-sound to spelling-to-meaning mappings.

| Morphological processing as a function of Reading Ability and Vocabulary Knowledge
The analyses on naming latencies showed that good readers yielded a 56-ms larger suffix effect than poor readers (see Figure 2), indicating that reading skill modulates sensitivity to suffixes. Also, French good readers yielded a large stem effect (i.e. 108 ms), whereas French poor readers yielded no stem effect (i.e. 2 ms), indicating that reading ability may modulate sensitivity to stem morphemes too. Moreover, we observed that children with good vocabulary knowledge yielded a 73-ms larger suffix effect than children with poor vocabulary knowledge (see Figure 3), indicating that vocabulary knowledge also modulates sensitivity to suffixes. Taken together, these results suggest that children who read fluently and children who have a rich vocabulary make more extensive use of morphology during reading. This finding is consistent with the idea that individuals with good reading and language skills are better at mapping letters onto large grain sizes (Andrews & Lo, 2013), thus promoting more efficient reading. As per the Psycholinguistic Grain Size Theory, and in contrast to the Orthographic Depth Hypothesis, the extensive use of morphology as an index of efficient reading by children with good vocabulary knowledge was not modulated by the orthographic consistency of the language.

FIGURE 6 Adult nonword accuracy (%) and standard errors

| Limitations
One limitation of our study is that we used only suffixed items, so it might be that our results do not generalize to prefixed items. It is worth noting though that a few recent studies that specifically sought to investigate differences in the processing of prefixed and suffixed items during reading found no differences between the two affix types. For example, in a study conducted in French, equivalent priming was observed when targets (e.g. AMOUR) were preceded by prefixed (e.g. preamour) and suffixed (e.g. amouresse) nonword primes, or similarly constructed non-affixed nonword primes (e.g. brosamour, amourugne), compared to an unrelated condition (Beyersmann, Cavalli, Casalis, & Colé, 2016). Similar findings have been reported in English (Heathcote, Nation, Castles, & Beyersmann, 2018) and German (Mousikou & Schroeder, 2019). Moreover, the results from the German study were replicated in three single-word reading experiments and one sentence reading experiment. Therefore, there is no reason to think that the cross-linguistic differences observed in the present study would not also arise with prefixed items.

| Educational implications
There is a general consensus that systematic phonics, that is, explicit instruction of the relationship between letters and sounds, is best practice for early reading instruction in English (see Castles et al., 2018). However, as it has been recently pointed out by Bowers and Bowers (2018), English is a morphophonemic system that evolved to jointly represent units of meaning (morphemes) and phonology (phonemes). In fact, English prioritizes the consistent spelling of morphemes over the consistent spelling of phonemes. Accordingly, it has been suggested that reading instruction in English should be guided by the logic of the English writing system (Bowers & Bowers, 2017). Thus, it should be organized around morphology and phonology rather than just phonology. Our findings support this idea. We found that developing readers of English made extensive use of morphology in reading aloud. Furthermore, we observed that good readers were overall more sensitive to morphological structure than poor readers.
Importantly, poor readers of English often exhibit phonological processing deficits, so these children might benefit even more from teaching methods that focus on the optimal grain sizes of their writing system (i.e. morphemes), which would allow a more straightforward mapping between print and sound, in addition to an easy mapping between print and meaning.
To conclude, cross-linguistic studies can help us gain an insight into both universal and language-specific processes involved in reading acquisition, which is critical for addressing theoretical and applied issues that are relevant for a universal science of reading.

CONFLICT OF INTERESTS
The authors declare that they have no conflicts of interest with respect to their authorship or the publication of this article.

DATA AVAILABILITY STATEMENT
The datasets analysed during the current study are available in the

ENDNOTES
2 For the Italian nonwords busazione and busalorte, a completive thematic vowel was added to the stem bus (-a, resulting in busa), so that the nonwords were phonologically legal when combined with the suffix -zione and the letter sequence -lorte. Due to an oversight, the counterpart "non-stem" nonwords basazione and basalorte contained the real stem bas-. Therefore, neither the psycholinguistic properties of this quadruplet nor the naming latencies corresponding to it were included in the respective calculations and analyses. Similarly, the nonwords bisfuitful (English) and tanneloso (Italian) were accidentally used twice, correctly in the Non-Stem + Suffix condition but incorrectly in the Non-Stem + Non-Suffix condition. The psycholinguistic properties of these nonwords and the naming latencies corresponding to them were excluded from the respective calculations and analyses.
5 Population norms were available for the German SLRT, the French TIME3, and the English TOWRE. A one-sample t test revealed that German children performed significantly below the population mean for words, t(31) = −3.436, p = .002, and nonwords, t(31) = −3.173, p = .003. French children did not differ significantly from the population mean on word reading, t(31) = 1.145, p = .261. English-speaking children did not differ significantly from the population mean for words, t(23) = 0.637, p = .530, yet they scored slightly above the population mean for nonwords, t(23) = 2.934, p = .007. Because of these differences, we computed a population-based Reading Ability score for each child in these three languages, where population norms for the corresponding reading tests were available. We then analyzed the data in the same way, except that population-based reading scores were included in the model. Results did not differ from those reported in the paper (see relevant analyses at https://osf.io/byqp9/).
6 One possibility is that the stronger morphological effects observed in English are due to the higher overall exposure to print of the Australian sample, given that formal reading instruction in Australia begins in kindergarten, so before Grade 1. To exclude this possibility, we carried out additional analyses. Given that our French and German data came from a sample of children who participated in an independent longitudinal project (see Section 2.1), we had reading aloud data (from the same task that includes the same stimuli) from the same French and German children in Grade 4. Our additional analyses thus included the reading aloud data from 24 English-speaking third-graders, 24 French fourth-graders, and 31 German fourth-graders (testing in Grade 4 occurred exactly a year later than testing in Grade 3 in both France and Germany, so there were a few dropouts). Critically, results from these analyses were similar to those reported in the paper, showing more robust morphological effects in English-speaking third-graders than in French and German fourth-graders (see relevant analyses at https://osf.io/byqp9/).
7 Six English-speaking children were originally excluded from the analyses due to an error rate of over 50%. Hence, French and English children yielded more errors than German and Italian children. The high error rate in French could be due to the substantial number of silent letters in the French nonwords, which would likely increase pronunciation uncertainty.
8 One possibility is that English readers' sensitivity to morphology is due to the "high visibility" of morphological information in English spelling (Rastle, 2018). English past tense forms, for example, are always spelled with 'ed' even when their ending is pronounced /əd/, /d/, or /t/, thus making morphological relationships in print particularly prominent. However, morphological information is not less visible in the other three languages. In German, for example, a similar morphological principle applies: the written form of morphologically related words (e.g., Sand-sandig 'sand-sandy') is preserved even when the spoken form slightly varies (/zant/-/zandɪk/), where the 'd' in 'Sand' is pronounced /t/ due to devoicing.