Distinct genetic influences on grammar and phonological short-term memory deficits: evidence from 6-year-old twins


D. V. M. Bishop, Department of Experimental Psychology, University of Oxford, Tinbergen Building, South Parks Road, Oxford, OX1 3UD, UK. E-mail: dorothy.bishop@psy.ox.ac.uk


Children with language impairments have limitations of phonological short-term memory (STM) and have distinctive problems with certain aspects of grammar. Both deficits have been proposed as phenotypic markers of heritable language impairment. We studied 173 twin pairs, selected to be over-representative of children with risk of developmental language impairment, using a battery of standardized language and intelligence tests, a test of nonword repetition to index phonological STM and two elicitation tasks to assess use of verb tense marking. As predicted, the phonological STM and the verb tense measures both discriminated children with risk of language impairment from low risk children, and DeFries–Fulker analysis showed that impairments on both tasks were significantly heritable. However, there was minimal phenotypic and etiological overlap between the two deficits, suggesting that different genes are implicated in causing these two kinds of language difficulty. From an evolutionary perspective, these data are consistent with the view that language is a complex function that depends on multiple underlying skills with distinct genetic origins.

Most children learn language effortlessly, so that by 4 years of age they are able to talk clearly in complex utterances and to understand most of what is said to them. However, a minority of children make much slower progress than this, even though all the prerequisites for language development (adequate hearing and intelligence, normal physical development and supportive home environment) are in place. This selective and unexplained difficulty in learning language is known as specific language impairment (SLI) and affects around 7% of 5- to 6-year-old children (Tomblin et al. 1997).

Over the past two decades, several studies have converged in finding that SLI is under genetic influence (see Bishop 2002 for review). The strongest evidence comes from comparisons of identical [monozygotic (MZ)] and fraternal [dizygotic (DZ)] twins. One would expect twins growing up together to resemble one another, because they are exposed to many of the same environmental influences. However, twins differ in their genetic relatedness: MZ twins are genetically identical, whereas, on average, DZ twins share only 50% of their alleles. Thus, if genes affect a trait, then MZ twins should be more similar to one another than DZ twins. Three twin studies of SLI have shown that if one member of an MZ twin pair has SLI, the probability is high (70–96%) that the co-twin will also have some evidence of language impairment; in contrast, if a DZ twin has SLI, the chance that the co-twin will be affected is around 46–69% (Bishop et al. 1995; Lewis & Thompson 1992; Tomblin & Buckwalter 1998). The evidence for genetic influence on SLI is now so strong that few would dispute it. What is not known, however, is the mechanism whereby genes affect language development. There are many theories of the underlying basis of SLI (Bishop 1997; Leonard 1998). These vary widely in terms of the specificity of the mechanism that is thought to be defective, encompassing those that postulate low-level auditory deficits, those that argue for a general limitation of processing capacity and others that maintain that there is impairment of brain regions that are innately specified for language processing. We shall focus here on two theories that are of particular interest insofar as they make specific claims about the nature of the inherited deficit in SLI.

Baddeley et al. (1998) argued that humans have a specialized phonological short-term memory (STM) system that evolved to facilitate language learning and is deficient in SLI. To assess integrity of this system, Gathercole and Baddeley (1990) asked children to repeat nonsense words, such as ‘blonterstaping’ or ‘dopelate’, and showed that those with SLI were impaired on this task relative to controls. The deficit was particularly pronounced for nonwords with four or five syllables, consistent with a theory that attributes poor language learning to limitations of phonological STM. Bishop et al. (1999a) replicated this finding in a twin sample of children aged 7–12 years and also showed that deficits in nonword repetition were highly heritable, with little evidence of any environmental influence. In contrast, ability to remember sequences of nonverbal tones showed no genetic influence. This study suggested that there might be specific genes that influence nonword repetition. Subsequent molecular genetic studies have supported this idea, revealing a region on chromosome 16 that shows significant linkage to nonword repetition performance (SLI Consortium 2002; 2004).

An account of SLI that implicates deficient phonological STM contrasts sharply with theoretical accounts derived from a linguistic perspective. An influential theory of this kind was proposed by Rice et al. (1995) who suggested that SLI occurs when there is delayed maturation of a modular brain system that is implicated in one aspect of grammar: marking of finite verb phrases. A finite verb is one that is marked for tense and number with a grammatical morpheme such as past tense -ed or third person singular -s. For instance, in English, the verb ‘go’ is finite in a sentence such as ‘John went to school’ or ‘he goes to school’ but is non-finite in sentences such as ‘I want him to go to school’ or ‘I saw him go to school’. Before the age of about 4 years, typically developing children treat marking of finite verbs as optional and will find a sentence such as ‘John go to school’ as acceptable as ‘John goes to school’. After this age, it is hypothesized that the module matures and a grammatical parameter controlling finite verb marking is set for English; hence, correct verb inflections are seen. According to Rice et al. (1995), mastery of finite verb marking is determined by biological maturation, with only a minimal amount of language exposure being needed to trigger the correct setting. However, in SLI, it is argued that maturation of the hypothesized module is delayed, and children continue to treat the marking of tense and agreement on finite verbs as optional. Rice (2000) reviewed evidence in support of this ‘Extended Optional Infinitive’ theory, noting that it was supported by the distinctive pattern of grammatical errors seen in children with SLI and by the lack of correlation between their use of verb inflections and vocabulary development. She suggested that problems with verb inflections might act as a phenotypic marker of heritable language disorder. In line with this idea, Rice et al. (1998) found an increase in language impairments in first degree relatives of children who made errors on verb inflections. Bishop (2005) found further support for this proposal in a study showing high heritability of impaired verb inflection use in a sample of 6-year-old twins.

We thus have two candidate behavioral markers for heritable SLI, nonword repetition and verb inflection marking. Both markers have a strong theoretical basis, are good at discriminating children with SLI from typically developing children (Conti-Ramsden 2003), and show high heritability (Bishop et al. 1996; 1999a; Bishop, 2005). The immediate question that is raised is whether they are different manifestations of the same underlying impairment or whether they correspond to etiologically distinct subtypes of SLI. This is an important question, not only for those carrying out studies on the molecular basis of SLI but also for theories of the origins of language.

The Rice–Wexler account of grammatical impairments is in line with theorizing by Chomsky (1986) who argued that language, and especially grammar, could not be acquired by general-purpose learning mechanisms, and hence must depend on a specialized ‘language acquisition device’. However, this raises the question of how such a device could have evolved in humans. It seems implausible that such a complex function could be the result of a single ‘macromutation’ that distinguishes humans from other primates (Pinker 2003). A radically different view has been put forward by those working in the neuroconstructivist tradition (e.g. Elman et al. 1996) who argue that a great deal of functional specialization emerges during the course of learning and who query whether any domain-specific language learning mechanisms need be postulated. Rather, the capacity for language might be explicable in terms of the unique conjunction of sensorimotor, attentional and computational skills that differentiates humans from other primates (Bates 2004). According to this view, problems with verb inflections could be an indirect result of limitations of verbal STM, rather than the consequence of delayed maturation of a specific syntactic module (Baddeley et al. 1998; Bates 2004; Bishop 1997; Joanisse & Seidenberg 1998).

A contrasting account of SLI, put forward by Ullman and Pierpont (2005), also predicts a link between impairments of nonword repetition and syntax. According to these authors, SLI is the consequence of impairment of a neurological system affecting procedural learning. This system is implicated in certain types of nonverbal, especially motoric, function and also in syntax acquisition and working memory. On this view, we should see associations between poor nonword repetition and impaired mastery of verb morphology in SLI, because both are manifestations of the same underlying neurological dysfunction.

To date, there is little evidence on this point, and such studies as exist are ambiguous. Norbury et al. (2001) confirmed that both nonword repetition and verb inflection tasks distinguish between children with SLI and typically developing children, but the correlation between these two deficits within the SLI group was weak and non-significant. However, Botting and Conti-Ramsden (2001) divided a group of 11-year olds with language impairments into those with good and poor nonword repetition and found that the latter group did significantly worse on a range of language tests, including measures of verb tense use.

The current study used a sample of 6-year-old twins selected so that children with poor language skills were over-represented. These children were given a battery of language tests, including two tasks designed to elicit verb grammatical morphemes and a test of nonword repetition. The principal goal of the study was to establish whether deficits in nonword repetition and verb inflection use were etiologically distinct.

Materials and methods


The twin pairs seen for the current study were a subset of participants in the Twins Early Development Study (TEDS), a population-based study of twins born in England and Wales in 1994, 1995 and 1996 (Trouton et al. 2002). The main TEDS sample was recruited through the UK Office for National Statistics, with parents of all live 1-year-old twins born in this period invited to take part. Given the large sample size, and the dispersal of families around the UK, it was not possible to do individual language testing of all the twins in the TEDS sample, but parents completed assessments of their children's language and nonverbal abilities at 2, 3 and 4 years of age (Dale et al. 2003), with 4-year parental report data being available on 5426 same-sex twin pairs. On the basis of parental assessment at 4 years of age, children were identified as at risk of language impairment (‘LI risk’), if they had a poor score on any one of three indices. The first was a grammar rating, which was used to identify children who were not yet talking in full sentences. The second was a parental estimate of the child's vocabulary size based on an experimental checklist designed for 4-year olds. This consisted of a list of 48 words from which parents were asked to check those that they had heard their child say. Finally, children were included in the at risk group if parents answered ‘yes’ to the question ‘Do you have any concerns about your child's speech and language?’ and selected the option ‘his/her language is developing slowly’ when asked to specify the nature of the concern. These parental report measures have been shown to be effective at identifying children who obtain low language scores when seen for individual testing (Oliver et al. 2004). For the whole sample, around 10% of twin pairs met criteria for LI risk in one or both twins. The remaining children were designated as ‘low risk’.

The identification of language risk status was made on the basis of parental report at 4 years of age, but children were seen for the current study at 6 years of age. We excluded cases where the language impairment was associated with sensorineural hearing loss, physical handicap, autism or another syndrome affecting cognitive development. We also excluded families where English was not the only language spoken in the home. The participants were selected to be white, which includes over 90% of the population of England and Wales, in order to reduce the possible effects of ethnic stratification in future molecular genetic studies.

Same-sex twin pairs were selected from the main TEDS sample so that pairs where one or both children met criteria for LI risk constituted around 2/3 of the sample, and there were equal numbers of MZ and DZ pairs; however, twin concordance for LI pairs was not taken into account when selecting twin pairs, as this would have biased heritability estimates. The initial sample contained 196 twin pairs, but data were excluded for 23 twin pairs (19 LI risk and four low risk) where one or both twins was either reluctant to speak or was too unintelligible to give valid results on the nonword repetition and/or verb inflection tasks. The remaining sample of 173 twin pairs (see Table 1) did not differ from the remainder of the TEDS sample in terms of socio-economic status. The sample of twins described here overlaps partially with a 4-year-old sample described by Colledge et al. (2002).

Table 1.  Numbers of twin pairs selected for in-depth study in relation to zygosity, gender and LI risk status

             Twins with LI risk
             Neither twin   One twin   Both twins   Total
MZ female    16             8          13           37
DZ female    16             19         9            44
MZ male      18             13         19           50
DZ male      12             19         11           42

DZ, dizygotic; LI risk, risk of language impairment; MZ, monozygotic.

Children participating in the current study were seen individually in a quiet room at home or school for an assessment lasting around 90 min at the age of 6 years (range 6.0–6.9 years, mean = 6.5 years; SD = 0.185 year). The protocols for this study were approved by the Ethics Committee of Oxford University Experimental Psychology Department. Signed consent was obtained from all parents whose children participated.

Test battery

Children were given a battery that included the four subtests of the Wechsler Abbreviated Scale of Intelligence (WASI) (Wechsler 1999) and three subtests from the Clinical Evaluation of Language Fundamentals – Revised (CELF-R) (Semel et al. 1987): Listening to Paragraphs, Sentence Structure (both measures of receptive language) and Recalling Sentences (an expressive test of STM for sentences).

Nonword repetition was assessed using the Children's Nonword Repetition Test (Gathercole et al. 1994), with the test words digitized and presented on a laptop computer by a talking monster animation. The child listened through headphones and repeated the nonwords. Accuracy of the child's repetition was scored on-line, but an audio recording was made of the session, so that the item could be rescored in cases of uncertainty. Because of time constraints, articulation was not routinely assessed in all children. However, the Goldman–Fristoe Sounds-In-Words subtest (Goldman & Fristoe 1986) was administered to any child whose speech was judged by the examiner to be unclear or immature. Of 59 such children, 24 produced fewer than 85% of consonants correctly; all analyses reported below were rerun with these children excluded to confirm that poor articulation was not responsible for observed deficits on other measures. Results were essentially unchanged.

Use of verb inflections was tested using pre-publication versions of two subtests from the Rice–Wexler Test of Early Grammatical Impairment (Rice & Wexler 2001). In the first of these, the Past Tense probe, the child is first given a demonstration item, showing two pictures, one of a boy raking leaves and the second of him having completed the task. The examiner says ‘I have two pictures. I will describe the first one, and you will tell me about the second one. Here the boy is raking; now he is done. Tell me what he did’. The goal is to elicit the past tense form ‘raked’. The child is encouraged to produce a full sentence with an overt subject and is given demonstration of what is required if necessary. Another practice item, ‘skated’ is then given, with feedback, followed by 19 test items, including 11 regular verbs (wash, colour, paint, brush, kick, clean, climb, jump, play, pick and plant) and eight irregular verbs (fall, catch, make, throw, write, ride, swim and dig). Responses were tape recorded and coded according to whether or not the verb was inflected for tense.

The second test of verb inflections was the 3rd person singular probe. The child was given a series of pictures depicting occupations, such as dentist, painter and cowboy, and prompted ‘Here's an (occupation name). Tell me what an (occupation name) does’, with the goal of eliciting verbs such as ‘looks’, ‘paints’ or ‘rides’. Two practice items were first given, with additional prompting as necessary, followed by 12 test items. Responses were tape recorded and coded for presence of marking of 3rd person singular.

Note that the focus of interest for the Extended Optional Infinitive Theory is whether or not the verb is marked for tense, rather than whether the marking is correct. The ‘verb inflections’ score was the percentage of items inflected for tense across both tests, expressed in relation to all items where an appropriate verb form (inflected or stem) was elicited. Incorrect inflections, such as overgeneralizations of irregular verbs (‘runned’ for ‘ran’), were included in the total of inflected forms. Univariate heritability estimates for this measure on this sample have been reported by Bishop (2005).
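The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the response codes ("inflected", "stem", "other") and the function name are hypothetical labels introduced here, not codes taken from the test materials.

```python
def verb_inflection_score(responses):
    """Return the percentage of elicited verb forms that carry a tense inflection.

    `responses` holds one hypothetical code per test item:
      "inflected" - verb marked for tense, whether correct or overgeneralized
      "stem"      - bare stem produced where an inflection was required
      "other"     - no appropriate verb form elicited (left out of the denominator)
    """
    inflected = sum(1 for r in responses if r == "inflected")
    stem = sum(1 for r in responses if r == "stem")
    scorable = inflected + stem
    if scorable == 0:
        return None  # no appropriate verb forms, so the score is undefined
    return 100.0 * inflected / scorable


# Illustrative (made-up) child: 19 past tense items plus 12 third person items
example = ["inflected"] * 24 + ["stem"] * 4 + ["other"] * 3
print(round(verb_inflection_score(example), 1))  # 85.7
```

Items coded "other" drop out of the denominator, so a child who often fails to produce a scorable verb is not penalized twice.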


Mean scores on the core battery for LI risk and low risk children

Table 2 summarizes mean scores on the test battery for individual children in relation to language risk status. Scores are shown as age-scaled scores except for the verb inflections task, where norms were not available. These data allow us to see how far the low risk sample is representative of the general population and how effective the criteria for identifying language risk were. The low risk children score close to the normative mean of 100, whereas the LI risk cases score significantly lower on all language measures. The effect size is shown by η2, which shows the proportion of variance accounted for by the group factor. The largest effect size is seen for Recalling Sentences, consistent with findings by Conti-Ramsden (2003) who found sentence repetition to be a particularly good marker of SLI. Although there is a significant group difference on the measure of Performance IQ, the effect size is very small, and it is clear that overall the nonverbal abilities of the LI risk group are well within normal limits.

Table 2.  Mean scores on test battery for low risk and LI risk children

                             Low risk        LI risk
                             n = 183         n = 163         F      P        η2
Performance IQ               100.9 (11.22)   97.7 (10.79)    7.1    0.008    0.02
Verbal IQ                    101.0 (13.21)   93.3 (12.52)    31.3   <0.001   0.08
Listening to paragraphs*     99.8 (13.62)    94.6 (15.41)    11.6   0.001    0.03
Sentence structure*          99.5 (13.00)    92.6 (12.68)    24.2   <0.001   0.07
Recalling sentences*         97.0 (12.43)    86.7 (13.82)    52.7   <0.001   0.13
Nonword repetition scaled    96.6 (17.07)    85.1 (18.13)    36.8   <0.001   0.10
% verbs inflected (raw)      94.9 (10.08)    88.2 (18.39)    18.2   <0.001   0.05

LI risk, risk of language impairment.
* Subtest from CELF-R; scores rescaled to mean of 100 and SD 15 for comparability with other tests.
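For a two-group comparison, the η2 values can be recovered from the F ratios. The sketch below assumes error degrees of freedom of 344 (346 children minus two groups) for every measure; Table 2 does not state the degrees of freedom, so this is an assumption, and the function name is ours.

```python
def eta_squared(f_value, df_effect, df_error):
    """Proportion of variance explained by the group factor, from an F ratio."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as given in Table 2; df_error = 344 is assumed, not reported there
f_values = {
    "Performance IQ": 7.1,
    "Verbal IQ": 31.3,
    "Listening to paragraphs": 11.6,
    "Sentence structure": 24.2,
    "Recalling sentences": 52.7,
    "Nonword repetition scaled": 36.8,
    "% verbs inflected (raw)": 18.2,
}

for measure, f in f_values.items():
    print(f"{measure}: eta squared = {eta_squared(f, 1, 344):.2f}")
```

Under that assumption, the computed values round to the η2 column of the table.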

Derivation of a measure of phonological STM

As summarized in Table 2, when nonword repetition scores were converted to scaled scores on the basis of the test norms, there was a significant difference between the low risk and LI risk groups. Analysis of raw data in relation to syllable length indicated that this effect was driven by performance on the longer nonwords, with the two groups having similar mean scores on two-syllable nonwords (see Fig. 1). A similar pattern was observed by Bishop et al. (1996), but in their study, with an older sample, performance on the two-syllable items was near ceiling. The fact that children made errors on the two-syllable nonwords, which posed little memory load, suggested that performance was affected by articulatory efficiency as well as by phonological STM. This is reminiscent of findings by Colledge et al. (2002) who reported that a brief nonword repetition test loaded on the same factor as articulation in 4-year olds. To obtain a measure that reflected memory capacity, independent of articulatory efficiency, we computed a derived measure that reflected the score on three-, four- and five-syllable nonwords, after adjusting for the score on two-syllable nonwords. To do this, we computed the total score on nonwords of three syllables and more, and then, using the low risk group only, computed the regression equation for predicting this total from the score on two-syllable nonwords. This equation was then applied to all children to compute a residual score. A low residual score indicates that the child's score on nonwords of three syllables and more is lower than would be predicted from their two-syllable score. This measure differentiated between the low risk and LI risk groups, with a greater effect size than any of the other measures: mean for low risk = 0, SD = 1; mean for LI risk = −0.87, SD = 1.12; F (1, 344) = 58.8, P < 0.001, η2 = 0.15. This score will be referred to as the phonological STM index.
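The derivation of the index can be sketched as follows. This is a minimal illustration, not the authors' code: the data are made up, and dividing the residuals by the low risk group's residual SD is an assumption inferred from the reported low risk mean of 0 and SD of 1.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def stm_index(two_syll, longer, is_low_risk):
    """Residual of the 3+ syllable total after regressing on the 2-syllable score.

    The regression is fitted on low risk children only, then applied to everyone.
    Residuals are scaled by the low risk residual SD (an assumption, see lead-in).
    """
    x_ref = [x for x, low in zip(two_syll, is_low_risk) if low]
    y_ref = [y for y, low in zip(longer, is_low_risk) if low]
    slope, intercept = fit_line(x_ref, y_ref)
    residuals = [y - (intercept + slope * x) for x, y in zip(two_syll, longer)]
    ref_sd = stdev(r for r, low in zip(residuals, is_low_risk) if low)
    return [r / ref_sd for r in residuals]


# Made-up scores: five low risk children followed by three LI risk children
two_syll = [8, 9, 10, 9, 10, 7, 8, 9]
longer = [20, 22, 25, 21, 24, 10, 12, 15]
is_low_risk = [True] * 5 + [False] * 3

index = stm_index(two_syll, longer, is_low_risk)
print([round(v, 2) for v in index])
```

Because the line is fitted only to the low risk group, that group's index has mean 0 by construction, and a negative value flags a child whose longer-nonword score falls below what their two-syllable score predicts.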

Figure 1.

Mean number of items correct on nonword repetition at each syllable length for 6-year-old children, subdivided according to 4-year-old language status. Error bars show standard errors. On t-test, the two groups differ significantly at the 0.001 level for all syllable lengths except two syllables, where the difference falls well short of significance. LI risk, risk of language impairment.

Univariate analysis: heritability of different language measures

To examine heritability of language impairment, we applied DeFries–Fulker (DF) analysis (DeFries & Fulker 1985) using the Mx implementation of this method developed by Purcell and Sham (2003). For this analysis, one first identifies as a proband any child who scores below a cutoff level on the measure of interest. The scores of co-twins will tend to regress toward the population mean. If proband/co-twin similarity were determined solely by environmental factors, then the amount of regression to the mean should be similar for MZ and DZ twins. If, however, genes are implicated in causing disorder, the DZ co-twin scores should regress further to the mean than the MZ co-twin scores. After appropriate scaling of the data, one can obtain estimates of heritability of impairment from a regression analysis in which co-twin scores are predicted from proband scores and from the degree of genetic relationship between twins (1.0 for MZ and 0.5 for DZ). The statistic h2g estimates the extent to which differences between impaired and unimpaired children are caused by genetic variation. This method is robust even when data are strongly skewed, as in the verb inflections task (Bishop, 2005) and is not sensitive to ascertainment bias, and hence can be applied to samples where there is over-representation of impaired cases. DF analysis requires that a cutoff be selected to identify probands in relation to an estimated population mean. In the analyses conducted here, we estimated the population mean by taking an average from the whole sample, with the low risk and LI risk pairs weighted to reflect their frequencies in the whole TEDS population (0.9 for low risk and 0.1 for LI risk). The cutoff was placed so as to select the lowest 13% of cases as probands.
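The logic of differential regression to the mean can be illustrated with the basic DF estimator applied to group means. This sketch is a simplification: the analyses reported here used model fitting in Mx on individual scores, so its output need not match the published estimates. The function name is ours, and the input means are those given for phonological STM in Table 3.

```python
def df_group_estimates(pop_mean, mz_proband, mz_cotwin, dz_proband, dz_cotwin):
    """Approximate h2g and c2g from proband and co-twin group means.

    Each co-twin mean is expressed as a fraction of the proband deficit, so that
    0 means full regression to the population mean and 1 means none.
    """
    mz_ratio = (mz_cotwin - pop_mean) / (mz_proband - pop_mean)
    dz_ratio = (dz_cotwin - pop_mean) / (dz_proband - pop_mean)
    # MZ pairs share all their genes and DZ pairs half on average, so:
    #   mz_ratio = h2g + c2g   and   dz_ratio = 0.5 * h2g + c2g
    h2g = 2.0 * (mz_ratio - dz_ratio)
    c2g = 2.0 * dz_ratio - mz_ratio
    return h2g, c2g


# Phonological STM means as given in Table 3 (population mean 0, SD 1)
h2g, c2g = df_group_estimates(0.0, -1.94, -1.46, -1.96, -0.92)
print(f"h2g ~ {h2g:.2f}, c2g ~ {c2g:.2f}")
```

DZ co-twins regress further toward the population mean than MZ co-twins here, which is the pattern that yields a positive estimate of group heritability.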

On DF analysis (Table 3 and Fig. 2), h2g was significant at the .05 level only for phonological STM, verb inflections and the CELF-R Sentence Structure subtest: the latter measures the ability to understand sentences that incorporate various syntactic structures, such as passive voice or embedded clauses. These three measures were striking insofar as they showed no significant effect of environmental influences that are common to both members of a twin pair (c2g). Genetic terms fell short of significance for the other CELF-R subtests and were very low for the two WASI subtests, vocabulary and similarities. Both the vocabulary subtest and the CELF-R Recalling Sentences showed significant influence of shared environment. Not summarized in Table 3 and not shown in Fig. 2 are results from DF analysis of the raw scores from nonword repetition. Group heritability of this measure was not statistically significant (h2g = 0.18).

Table 3.  Results of univariate DeFries–Fulker analysis

                           General population   MZ                         DZ
Measure*                   Mean      SD         n    Proband   Co-twin     n    Proband   Co-twin     h2g
Performance IQ             101.06    11.54      30   84.30     90.03       25   84.64     92.00       0.21
Listening to paragraphs    5.96      1.77       34   2.03      3.03        25   1.80      3.92        0.51
Sentence structure         21.85     3.40       43   14.00     15.44       34   14.59     18.91       0.82
Recalling sentences        47.90     10.60      36   24.42     27.42       34   25.82     32.59       0.36
Phonological STM           0.00      1.00       38   −1.94     −1.46       41   −1.96     −0.92       0.61
Verb inflections           95.07     10.73      35   64.52     70.75       35   70.85     84.48       0.74

DZ, dizygotic; MZ, monozygotic.
* Means shown for scores before transformation for DF analysis: all are raw scores, except Performance IQ, which is a scaled score derived from two subtests.

Figure 2.

Proportions of variance in language-deficit status attributable to genes (h2g), environmental influences shared by both twins (c2g) or other influences (e2g). The ‘other’ term incorporates influences traditionally referred to as nonshared environment and includes measurement error. WASI, Wechsler Abbreviated Scale of Intelligence; STM, short-term memory.

Bivariate analysis: do the same genes cause different language deficits?

The crucial next question was whether phonological STM and verb inflections are different indices of a common underlying language deficit, or whether the deficits have different origins. The relationship between the two measures, although statistically significant, was weak: Pearson correlation in the whole sample of 346 children is 0.299, P < 0.001; LI risk subgroup, r = 0.209, P = 0.007, n = 163; low risk subgroup, r = 0.267, P < 0.001, n = 183. This suggests that they are not different manifestations of the same underlying ability. A stronger test of common origins of the two deficits is given using a bivariate extension of DF analysis (see Purcell et al. 2001). In this method, one identifies probands on the basis of low scores on one measure (X) and then considers whether one can predict scores of their co-twins on another measure (Y). If the prediction is stronger for MZ than for DZ twins, this points to shared genetic origins for X and Y. Bivariate heritability, h2g.xy, is the bivariate analog of h2g and assesses the extent to which genetic factors are responsible for the lowered Y scores of probands with low X. In model fitting, h2g.xy was constrained to a lower bound of zero. Figure 3 shows the relationship between h2g.xy and the univariate values of h2g for X and Y. The genetic correlation, rg, estimates the extent to which genetic factors affecting X are the same as those affecting Y and is computed as h2g.xy/√(h2g.x·h2g.y). In principle, it is possible to have a high value of rg despite low group heritabilities for X and/or Y: this would indicate that, although small in magnitude, genetic effects on X and Y overlap substantially. In practice, however, rg is of interest only when univariate values of h2g are significant, and accordingly, bivariate analyses were conducted only for measures that showed significant group heritability. 
Another statistic that can be estimated from bivariate DF analysis is the phenotypic association between X and Y, which is estimated by the mean transformed score of probands on Y, regardless of zygosity. If X and Y are unrelated, this should be zero; if they are equivalent, it will be 1.
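The formula for the genetic correlation can be written as a small function. The input values below are hypothetical and chosen only to illustrate the two situations described above; they are not estimates from this study.

```python
import math


def genetic_correlation(h2g_xy, h2g_x, h2g_y):
    """rg = h2g.xy / sqrt(h2g.x * h2g.y); undefined unless both h2g are positive."""
    if h2g_x <= 0 or h2g_y <= 0:
        raise ValueError("univariate group heritabilities must be positive")
    return h2g_xy / math.sqrt(h2g_x * h2g_y)


# Hypothetical case 1: both deficits heritable, but little shared genetic influence
print(round(genetic_correlation(0.05, 0.60, 0.70), 3))

# Hypothetical case 2: weak heritabilities, yet largely overlapping genetic effects
print(round(genetic_correlation(0.09, 0.10, 0.10), 3))
```

The second case shows why rg on its own can mislead: it approaches 1 even though genes account for little of either deficit.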

Figure 3.

The relationship between bivariate heritability and genetic correlation, based on Purcell et al. (2001).

The terminology X→Y indicates that probands were selected as low scorers on measure X, and X was then used to predict scores of co-twins on measure Y, after appropriate scaling of the data. Analyses of X→Y and Y→X will select different cases as probands, and hence, will not necessarily give the same results. As summarized in Table 4, bivariate DF analysis gave estimates close to zero for bivariate heritability (h2g.xy) of phonological STM and verb morphology, regardless of which test was used to identify probands. The other language measure that showed significant group heritability, Sentence Structure, showed no genetic overlap with phonological STM, but there was suggestive evidence of a link with verb inflections, with the estimates of bivariate heritability falling just short of statistical significance.

Table 4.  Results from bivariate DeFries–Fulker analysis

                                                    MZ                        DZ
Proband selected on (X)    Co-twin measure (Y)      n    Co-twin (mean Y*)    n    Co-twin (mean Y*)
Verb inflections           Phonological STM         35   0.328                35   0.297
Phonological STM           Verb inflections         38   0.306                41   0.427
Verb inflections           Sentence structure       35   0.355                35   0.161
Sentence structure         Verb inflections         43   0.517                34   0.184
Phonological STM           Sentence structure       38   0.495                41   0.397
Sentence structure         Phonological STM         43   0.281                34   0.348

DZ, dizygotic; MZ, monozygotic; STM, short-term memory.
* Scaled for bivariate DF analysis.
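A rough point estimate of bivariate heritability can be obtained from the scaled co-twin means in Table 4, by analogy with the univariate case. This sketch assumes that Y* is scaled so that twice the MZ minus DZ difference estimates h2g.xy; it ignores sampling error and will not reproduce the model-fitted values exactly. The lower bound of zero follows the constraint described in the text.

```python
def bivariate_h2g(mz_cotwin_y, dz_cotwin_y):
    """Approximate h2g.xy as twice the MZ-DZ difference, bounded below at zero."""
    return max(0.0, 2.0 * (mz_cotwin_y - dz_cotwin_y))


# (X, Y): (MZ co-twin mean Y*, DZ co-twin mean Y*), as given in Table 4
table4 = {
    ("Verb inflections", "Phonological STM"): (0.328, 0.297),
    ("Phonological STM", "Verb inflections"): (0.306, 0.427),
    ("Verb inflections", "Sentence structure"): (0.355, 0.161),
    ("Sentence structure", "Verb inflections"): (0.517, 0.184),
    ("Phonological STM", "Sentence structure"): (0.495, 0.397),
    ("Sentence structure", "Phonological STM"): (0.281, 0.348),
}

estimates = {pair: bivariate_h2g(mz, dz) for pair, (mz, dz) in table4.items()}
for (x, y), value in estimates.items():
    print(f"{x} -> {y}: h2g.xy ~ {value:.2f}")
```

For the two pairings of phonological STM with verb inflections, the MZ and DZ co-twin means barely differ or differ in the wrong direction, so the approximate estimates sit at or near zero.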

The phenotypic association between deficits in phonological STM and verb inflections

In the whole sample of 346 children, 48 (13.8%) were impaired on phonological STM but not verb inflections, 39 (11.3%) were impaired on verb inflections but not phonological STM and 31 (9%) were impaired on both. This is a statistically significant association between deficits, χ2(1) = 22.9, P < 0.001, with ϕ coefficient = 0.257. When impairment on these two measures was considered in relation to parental report at 4 years, it was evident that children with a ‘double deficit’ were most likely to have come from the LI risk group. The percentages of cases in the LI risk group were 47% for the whole sample, 35% for those with no deficit on either test, 59% for those with a deficit on phonological STM only, 67% for those with a deficit on verb inflections only and 87% for those with a deficit on both measures. The frequency of LI risk cases differed significantly between the four groups: χ2(3) = 41.7, P < 0.001. Another way of describing the results is to say that 17.4% of children in the low risk group compared with 33.7% in the LI risk group had a single deficit (risk ratio 1.92), whereas 2.2% of those in the low risk group had a double deficit compared with 16.6% of those in the LI risk group (risk ratio 7.58).
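The association statistics can be checked from the counts given above. In this sketch the number of children with neither deficit (228) is derived by subtraction from the total of 346, and the chi-square is computed without a continuity correction, which is an assumption about how the reported value was obtained.

```python
import math


def two_by_two_association(both, x_only, y_only, neither):
    """Return (phi, chi-square with 1 df) for a 2 x 2 table of two deficits."""
    n = both + x_only + y_only + neither
    numerator = both * neither - x_only * y_only
    denominator = math.sqrt(
        (both + x_only) * (y_only + neither) * (both + y_only) * (x_only + neither)
    )
    phi = numerator / denominator
    return phi, n * phi ** 2


# X = phonological STM deficit, Y = verb inflections deficit
neither = 346 - 48 - 39 - 31  # derived count: 228 children with no deficit
phi, chi2 = two_by_two_association(both=31, x_only=48, y_only=39, neither=neither)
print(f"phi = {phi:.3f}, chi-square = {chi2:.1f}")  # phi = 0.257, chi-square = 22.9
```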


Overall, these results agree with earlier research that indicated that phonological STM is a good marker of heritable language impairment in SLI. Our findings are also in agreement with predictions made by Rice and colleagues, in confirming that deficits in use of verb inflections commonly persist beyond the age of 4 years in children with language impairments and are heritable. Most crucially, this study reveals that impairments in use of verb inflections have distinctive genetic origins and cannot be explained away as secondary consequences of limitations of phonological STM. A finding of such selective influences on specific aspects of language development is discrepant with previous studies of language development in pre-school twins from the TEDS cohort. These have found significant genetic influence on a range of language skills but have suggested that there is little etiological differentiation between different aspects of language functioning (Viding et al. 2003) or indeed between verbal and nonverbal skills (Colledge et al. 2002; Purcell et al. 2001). The different findings may reflect the fact that these studies were conducted on relatively young children. It is difficult in pre-schoolers to obtain specific measures of the kinds of memory and syntactic functions that are the focus of the current study, because performance may be more affected by factors such as articulatory limitations or poor attention.

Another point of difference between the current data and previous TEDS analyses is in our finding of no genetic influence on the vocabulary measure. Purcell et al. (2001) obtained significant estimates of heritability for low vocabulary in twins at 2 years of age (using the much larger dataset from which the current sample was selected); it is likely that the discrepancy reflects differences in the measures used, as the 2-year vocabulary measure was a parent-report measure of words used by their children, whereas our vocabulary measure required children to define words. It is also possible that the importance of genetic influences on vocabulary declines with age, but this would be an exception to a general rule that genes exert an increasing influence on cognition as children grow older (Plomin & Spinath 2004). The current study suggests that identification of more selective genetic influences on language development may depend crucially on both the age of the children and the measures that are used.

Nonword repetition and phonological STM

Results from the current study suggest that the nonword repetition test measures different underlying skills at different ages. In these 6-year-old children, performance on this task appeared to be influenced by articulatory constraints as well as by STM, insofar as many children made errors on two-syllable nonwords. A purer measure of memory was obtained by regressing out the effect of accuracy on two-syllable nonwords. When this was done, the resulting measure gave stronger differentiation between LI risk and low risk children than the raw score, and it also gave high estimates of heritability. Note that in older children, such as those studied by Bishop et al. (1996), most children score at ceiling on two-syllable nonwords, and hence, this alternative method of scoring would have little effect on results. It is also of interest to see that influences on STM for meaningful materials, as measured by Recalling Sentences, do not pattern closely with nonword repetition. Even though Recalling Sentences did a good job in differentiating between LI risk and low risk children, it did not seem to be a sensitive indicator of a heritable phenotype in these 6-year-olds and showed substantial influence of shared environment, in this regard showing some similarity with the vocabulary measure. It is possible that this measure is sensitive to children's familiarity with vocabulary in the test items as well as indexing STM.
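The idea of regressing out two-syllable accuracy can be sketched as follows. This is a toy illustration under stated assumptions: the scores are invented, the variable names are hypothetical, and a one-predictor least-squares fit stands in for whatever procedure was actually used in the analysis.

```python
from statistics import mean

def residualise(outcome, predictor):
    """Residuals of `outcome` after a one-predictor least-squares fit."""
    mx, my = mean(predictor), mean(outcome)
    sxx = sum((x - mx) ** 2 for x in predictor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predictor, outcome))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(predictor, outcome)]

# Invented scores for six children (NOT data from the study).
two_syllable_correct = [10, 9, 10, 7, 8, 6]   # proxy for articulatory accuracy
repetition_score = [34, 25, 28, 18, 27, 15]   # hypothetical raw repetition score

# Adjusted score: the part of the raw score not predicted by two-syllable accuracy.
adjusted = residualise(repetition_score, two_syllable_correct)
```

By construction the adjusted scores have a mean of zero and are uncorrelated with two-syllable accuracy, so differences between children on the adjusted measure cannot be attributed to that predictor.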

Verb inflections and grammatical impairment

The verb inflections measure showed no evidence for genetic overlap with phonological STM, but there was suggestive evidence (just short of statistical significance) of common genetic influence between this test and the CELF-R Sentence Structure subtest, which is a comprehension test assessing the ability to understand grammatically complex sentences. Sentence Structure uses a multiple-choice picture-selection format to assess understanding and requires no speech from the child. These data suggest that the genes that affect grammatical development may be implicated more generally in computation of syntactic relationships, rather than solely impacting on use of verb inflections. In this regard, the results are compatible with theorising by Van der Lely (2005), who has argued for a specific subtype of ‘grammatical SLI’ in which both expressive and receptive grammatical difficulties are explained in terms of a deficit in computational grammatical complexity. According to this account, children with such a deficit will have syntactic problems that extend beyond tense marking to affect all linguistic operations that involve computation of non-local dependencies between grammatical elements.

Why do deficits in phonological STM and verb inflections co-occur?

The bivariate analysis indicated no significant shared genetic influence on phonological STM and verb inflections, yet the current study, together with some previous studies (Conti-Ramsden et al. 2001; Norbury et al. 2001), finds a modest but significant phenotypic correlation between the two language traits. How, then, are we to explain this association?

A first point to note is that our sample size is small, and the standard errors surrounding estimates of h²g.xy are correspondingly large. We cannot rule out the possibility that genetic overlap between phonological STM and verb inflections might be found in a replication study. However, it is unlikely that any such overlap would be substantial given that estimates of h²g.xy in our data were close to zero.

One possibility to consider is that the phenotypic association is an artifact, arising because of selection bias. Our sample was selected so as to over-represent children whose parents expressed concern about language development when they were 4 years of age. If the likelihood of a parent showing such concern is particularly high when a child has a double deficit, then children with a double deficit will be more common in the sample than in an unselected population. This type of explanation is given some credence by the finding that children with a double deficit (i.e. both phonological STM and verb inflections impaired) were much more likely than those with a single deficit to have come from the LI risk group.
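The selection-bias argument can be illustrated with a toy calculation. Everything numeric here is hypothetical: the 10% prevalence of each deficit, the independence of the two deficits in the population and the selection probabilities are chosen only to show the mechanism, and none of these values is estimated from the study.

```python
import math

def phi(both, a_only, b_only, neither):
    """Phi coefficient for a 2 x 2 table of counts or proportions."""
    cross = both * neither - a_only * b_only
    denom = (both + a_only) * (b_only + neither) * (both + b_only) * (a_only + neither)
    return cross / math.sqrt(denom)

# Hypothetical population: two INDEPENDENT deficits, each with 10% prevalence.
p = 0.10
population = {"both": p * p, "a_only": p * (1 - p),
              "b_only": (1 - p) * p, "neither": (1 - p) * (1 - p)}

# Hypothetical probability of entering the sample through parental concern,
# set disproportionately high for children with a double deficit.
selection = {"both": 0.9, "a_only": 0.2, "b_only": 0.2, "neither": 0.1}

# Cell weights in the selected sample (phi does not depend on their total).
sample = {cell: population[cell] * selection[cell] for cell in population}

phi_population = phi(**population)  # zero, up to rounding error
phi_sample = phi(**sample)          # positive: association created by selection
```

In this sketch the two deficits are unrelated in the population, yet a positive ϕ appears in the selected sample. The effect depends on the double-deficit weight being higher than the single-deficit weights would imply multiplicatively (here 0.9 × 0.1 exceeds 0.2 × 0.2); with weights of 0.9, 0.3, 0.3 and 0.1 the association would vanish again.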

Another way to account for the association between verb morphology and phonological STM is to propose that a common environmental factor is implicated in both deficits. However, it is noteworthy that for neither test was there a significant influence of the common environment shared by both twins (see Table 3). It is possible, however, that some child-specific environmental factor might influence both tasks, such as nervousness in the test situation.

A final possibility is that non-random mating may lead to risk alleles from different genes co-occurring at above chance levels in the same individuals. Inbreeding will tend to increase homozygosity of the genome, leading to recessive effects being more evident at multiple loci. In addition, assortative mating of people with language impairments could lead to clustering of different risk alleles for LI in the offspring.

Broader theoretical implications

As noted in the Introduction, a number of authors have queried whether grammatical deficits in SLI provide evidence for the kind of specialized language acquisition device postulated by Chomsky (1986). Rather, they have suggested that grammatical deficits may arise as secondary consequences of other more domain-general impairments in perception, cognition or memory. In particular, theorists in the field of SLI have suggested that specific grammatical deficits might be explained in terms of limitations of auditory perception or STM (Baddeley et al. 1998; Bates 2004; Bishop 1997; Joanisse & Seidenberg 1998; Tallal 2000). However, over the past few years, evidence has accumulated to show that although children with SLI often have auditory perceptual problems, these cannot adequately account for their grammatical deficits (Bishop et al. 1999b; Norbury et al. 2001; Van der Lely et al. 2004) and do not appear to be heritable (Bishop et al. 1999a). The current study expands the list of factors that seem inadequate to account for grammatical impairments: the heritable deficits in verb inflections and syntactic comprehension seen in our sample cannot be explained in terms of weak phonological STM, low IQ, poor articulation or vocabulary limitations. Thus, most of the domain-general candidate explanations that have been put forward to explain grammatical deficits in SLI are inadequate to account for this pattern of results. Can we go further and conclude that there is a domain-specific syntactic module? Anyone attempting to argue such a point is in the difficult position of trying to prove a negative: we cannot rule out the possibility that problems in mastering syntax might be caused by some other domain-general cognitive factor that was not measured in our study. However, the scope of plausible domain-general explanations for grammatical ability is considerably constrained by our data.

On the other hand, our results also challenge the idea that the human language faculty could be the consequence of a single genetic ‘macromutation’ that made syntax possible. Our results sit more comfortably with contemporary views on evolution of language that maintain that a complex function such as language is likely to be the result of multiple adaptive specializations (Hauser et al. 2002; Pinker 2003). The current study provides evidence for genetic variation relevant to two such specializations: a capacity for retaining strings of unfamiliar speech sounds for brief periods of time and a capacity for carrying out grammatical computations. We suggest that evidence for other specialized systems may be found as researchers move away from reliance on standardized clinical tests and start instead to use more theoretically motivated language measures. Finally, we note that these results have important implications for those conducting molecular genetic studies of developmental language impairments: they suggest that measures of computational grammatical skills will define a heritable phenotype that will have different genetic origins from deficits in phonological STM.


We thank the twins and their families and teachers who participated in this research. This study would not have been possible without the generous assistance of Robert Plomin, Bonamy Oliver, Alexandra Trouton and other staff from the Twins Early Development Study. Thanks are also due to Barbara Arfe and Lesley Bretherton for assistance with data collection, to Mabel Rice for making available pre-publication materials from the Rice–Wexler Test of Early Grammatical Impairment and to Simon Fisher for helpful discussion about genetic mechanisms that could lead to comorbidity. This research was supported by a programme grant from the Wellcome Trust.