• Open Access

CNTNAP2 variants affect early language development in the general population


  • A. J. O. Whitehouse,

    Corresponding author
    1. Telethon Institute for Child Health Research, Centre for Child Health Research
    2. Neurocognitive Development Unit, School of Psychology, University of Western Australia, Perth, Australia
  • D. V. M. Bishop,

    1. Neurocognitive Development Unit, School of Psychology, University of Western Australia, Perth, Australia
    2. Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
  • Q. W. Ang,

    1. School of Women's and Infants' Health, University of Western Australia, Perth, Australia
  • C. E. Pennell,

    1. School of Women's and Infants' Health, University of Western Australia, Perth, Australia
  • S. E. Fisher

    1. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    2. Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands


This article is corrected by:

  1. Errata: CNTNAP2 variants affect early language development in the general population Volume 11, Issue 4, 501, Article first published online: 7 June 2012

Dr A. Whitehouse, Telethon Institute for Child Health Research, Centre for Child Health Research, University of Western Australia, 100 Roberts Road, Subiaco, Western Australia. E-mail: awhitehouse@ichr.uwa.edu.au


Early language development is known to be under genetic influence, but the genes affecting normal variation in the general population remain largely elusive. Recent studies of disorder reported that variants of the CNTNAP2 gene are associated both with language deficits in specific language impairment (SLI) and with language delays in autism. We tested the hypothesis that these CNTNAP2 variants affect communicative behavior, measured at 2 years of age in a large epidemiological sample, the Western Australian Pregnancy Cohort (Raine) Study. Singlepoint analyses of 1149 children (606 males and 543 females) revealed patterns of association which were strikingly reminiscent of those observed in previous investigations of impaired language, centered on the same genetic markers and with a consistent direction of effect (rs2710102, P = 0.0239; rs759178, P = 0.0248). On the basis of these findings, we performed analyses of four-marker haplotypes of rs2710102–rs759178–rs17236239–rs2538976 and identified significant association (haplotype TTAA, P = 0.049; haplotype CGAG, P = 0.0014). Our study suggests that common variants in the exon 13–15 region of CNTNAP2 influence early language acquisition, as assessed at age 2, in the general population. We propose that these CNTNAP2 variants increase susceptibility to SLI or autism when they occur together with other risk factors.

Although nearly all children learn to talk, there is substantial variation in the timing of language development. Around 10% of children can talk in sentences at 18 months of age, whereas the slowest 10% produce at most a handful of single words at this age (Neligan & Prudham 1969). Many late-talkers are actually ‘late bloomers’, catching up with their peers by the time they are 3 or 4 years old (Thal & Katich 1997). Nevertheless, in some children late talking is the first indication of persistent language impairment (Haynes & Naidoo 1991) and in a minority of these it may be a symptom of autistic disorder (Hagberg et al. 2010).

It is often assumed that the age at which a child develops language is largely dependent on the language input he or she receives. However, a recent epidemiological study found that family history of delayed language development predicted late talking in 24-month-olds, while other factors, such as maternal education, birth risks and maternal depression, did not have significant influence (Zubrick et al. 2007). Data from twin studies indicate that inherited factors make substantial contributions to early language development (Dale et al. 1998) and affect levels of performance on components of language in the normal range of abilities (Kovas et al. 2005). Still, at this point very little is known regarding the specific genetic variants that are associated with language development in toddlers from the general population. Here, we address this issue through analyses of early communicative behavior in a large epidemiological sample.

Our investigations were tightly constrained by prior evidence from molecular studies of neurodevelopmental disorders, which have converged on CNTNAP2 as a gene with relevance to language learning. One notable study reported associations between markers in CNTNAP2 and parental report of ‘age at first word’ in probands with autism (Alarcón et al. 2008). Independent analyses of children with specific language impairment (SLI), but not autism, identified association of CNTNAP2 variants with reduced performance on quantitative indices of language ability (Vernes et al. 2008). Intriguingly, these separate investigations of distinct language-related disorders (Whitehouse et al. 2007) highlighted the same markers and alleles within CNTNAP2 as risk factors. CNTNAP2 encodes a member of the neurexin superfamily – neuronal transmembrane proteins involved in cell adhesion – and shows enriched expression in language-related circuits of the brain (Abrahams et al. 2007). Moreover, this gene is directly regulated by FOXP2, a transcription factor mutated in rare monogenic forms of speech and language disorder (Fisher & Scharff 2009).

Thus, in the current investigation, we carried out a hypothesis-driven study of links between common CNTNAP2 variants and early language proficiency, assessed at 24 months of age, in an epidemiological sample of over a thousand children (the Raine sample). We specifically targeted the same single-nucleotide polymorphisms (SNPs) across the CNTNAP2 gene as those previously investigated in SLI by Vernes et al. (2008). Our hypothesis was that the particular CNTNAP2 markers implicated in language impairments of SLI and delayed language in autism would extend their influence beyond disorder, to show association with early language acquisition in the general population.

Materials and methods


The Western Australian Pregnancy Cohort (Raine) Study is a longitudinal investigation of 2900 pregnant women and their offspring consecutively recruited from maternity units between 1989 and 1991 (Newnham et al. 1993). The inclusion criteria were (1) English language skills sufficient to understand the study demands, (2) an expectation to deliver at King Edward Memorial Hospital (KEMH) and (3) an intention to remain in Western Australia to enable future follow-up of their child. Ninety percent of eligible women agreed to participate in the study.

From the original cohort, 2868 children have been followed over two decades. Participant recruitment and all follow-ups of their families were approved by the Human Ethics Committee at King Edward Memorial Hospital and/or Princess Margaret Hospital for Children in Perth. The Raine sample is representative of the larger Australian population (88% Caucasian); only those children with both biological parents of White European origin were included in the current analyses. DNA and phenotypic data were available for 1149 children (606 males and 543 females).

Phenotypic measure

Our study specifically concerned early indicators of language acquisition in toddlers, where direct assessment of ability can be challenging. For phenotyping at such young ages, parental report has been shown to provide a robust alternative to direct testing (Johnson et al. 2008). The Communication subscale of the Infant Monitoring Questionnaire (IMQ) (Bricker & Squires 1989) was administered when the child was 2 years old. This parent-completed checklist contains seven items assessing early communicative behavior, such as protoimperative actions (e.g. looking or pointing at an item to request it), the following of simple commands (e.g. ‘come here’, ‘sit down’), and the use of two- or three-word strings (e.g. ‘go, car’, ‘shut door’). Parents indicate whether their child shows each behavior always (2 points), sometimes (1 point) or never (0 points), yielding an overall score ranging from 0 to 14. Validity and reliability coefficients for the IMQ range from 0.85 to 0.90 (Bricker et al. 1988). Questionnaires with one missing item (n = 155) were prorated to yield a score out of 14. Scores were transformed from centile equivalents to z-scores to give a normally distributed variable.
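The scoring scheme described above can be illustrated with a minimal Python sketch. The proration rule shown here (scaling the sum of the six answered items up to seven items) and the function names `imq_score` and `centile_to_z` are assumptions made for illustration; the text does not specify the exact proration formula or how centile equivalents were derived.

```python
from statistics import NormalDist

N_ITEMS = 7  # items on the Communication subscale


def imq_score(responses):
    """Return a score out of 14 from seven item responses.

    Each response is 2 (always), 1 (sometimes), 0 (never) or None (missing).
    Assumption: a questionnaire with exactly one missing item is prorated by
    scaling the sum of the answered items up to seven items.
    """
    if len(responses) != N_ITEMS:
        raise ValueError("expected seven item responses")
    answered = [r for r in responses if r is not None]
    if any(r not in (0, 1, 2) for r in answered):
        raise ValueError("each item must be scored 0, 1 or 2")
    if len(answered) < N_ITEMS - 1:
        raise ValueError("more than one missing item; score not prorated")
    return sum(answered) * N_ITEMS / len(answered)


def centile_to_z(centile):
    """Map a centile (strictly between 0 and 100) to a standard normal z-score."""
    return NormalDist().inv_cdf(centile / 100.0)
```

Under this assumed rule, a child scored 2 on six items with one item missing would receive the maximum prorated score of 14.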

Genetic data

For the Raine study, DNA samples have been collected using standardized procedures at 14 or 16 years of age, followed by genotyping on an Illumina 660 Quad Array (San Diego, CA, USA). SNPs that did not meet quality control criteria (call rate ≥95%; minor allele frequency >0.05; Hardy–Weinberg disequilibrium P value >0.000001) were discarded. It is important to emphasize that, although genome-wide SNP data have been collected for this sample, we did not perform a hypothesis-free genome-wide association scan for our measure of interest. Instead, this study was a tightly constrained hypothesis-driven candidate gene approach, based on prior literature, which considered a set of 30 SNPs from the CNTNAP2 gene [matching those from Vernes et al. (2008)]. This led us to a focused analysis of the rs2710102–rs759178–rs17236239–rs2538976 multimarker combination. No other markers from elsewhere in the genome were assessed for association with early communicative behavior in this sample.
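The three quality control thresholds can be sketched as a simple per-SNP filter. This is an illustrative sketch only: the text does not state which Hardy–Weinberg test was used, so a 1-df chi-square test is assumed here, and the function names `hwe_chisq_p` and `passes_qc` are hypothetical.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test of Hardy-Weinberg proportions.

    Assumption: a chi-square test is used; an exact test is a common alternative.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing):
    """Apply the thresholds quoted in the text to one SNP's genotype counts."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (
        call_rate >= 0.95                          # call rate >= 95%
        and maf > 0.05                             # minor allele frequency > 0.05
        and hwe_chisq_p(n_aa, n_ab, n_bb) > 1e-6   # HWE P value > 0.000001
    )
```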

Data analysis

Our panel of 30 SNPs matching those used to study SLI in previous CNTNAP2 analyses (Vernes et al. 2008) constituted the majority of the 38 SNPs assessed in the prior study. Each biallelic SNP was first tested for association with the quantitative measure of the communication phenotype using an allelic test of association within R (R Development Core Team 2009). On the basis of the previous findings by Vernes et al. (2008), our model assumed that the risk allele of the SNP had a dominant mode of action. Consideration of the singlepoint SNP findings, and their convergence with earlier studies, led us to test the four-marker haplotypes of rs2710102–rs759178–rs17236239–rs2538976, analyzing the three common alleles using R. Our analysis of each such multimarker allele involved two factors: (1) comparison between harboring two copies and one copy of the haplotype and (2) comparison between harboring two copies and no copies of the haplotype – allowing us to separately assess the modes of action of each of the three alleles. To minimize multiple testing, we did not analyze any further marker configurations. Linkage disequilibrium (LD) among CNTNAP2 SNPs was determined with Haploview version 4.2 (http://www.broadinstitute.org/haploview/haploview) (Barrett et al. 2005). Haplotypes were inferred using SimHap version 1.0.2, and the most-likely haplotypes of each individual used as inputs for the R analyses described above.
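The genotype and haplotype codings described above can be sketched as follows. The analyses themselves were run in R; this Python sketch only illustrates the coding logic. Treating "two copies" as the reference level for both haplotype factors is an assumption based on the wording of the two comparisons, and the function names are hypothetical.

```python
def dominant_code(genotype, risk_allele):
    """Dominant coding: 1 if at least one copy of the risk allele is carried."""
    return int(risk_allele in genotype)


def haplotype_copies(hap_pair, target):
    """Count copies (0, 1 or 2) of a target haplotype in an individual's
    pair of most-likely haplotypes."""
    return sum(1 for hap in hap_pair if hap == target)


def copy_factors(copies):
    """Two indicator variables with two copies as the assumed reference level.

    Factor 1 contrasts one copy with two copies;
    factor 2 contrasts no copies with two copies.
    """
    if copies not in (0, 1, 2):
        raise ValueError("copy number must be 0, 1 or 2")
    return (int(copies == 1), int(copies == 0))
```

For example, an individual whose inferred haplotypes are CGAG and TTAA carries one copy of CGAG, giving factors (1, 0) for that haplotype.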

Principal components analysis of genome-wide SNP data with Eigenstrat (Price et al. 2006) has revealed evidence of population stratification in the Raine sample, and so the first two principal components were included as cofactors in all analyses. This procedure has been used previously in genetic analyses of the Raine cohort (Paracchini et al. 2011).
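Including principal components as cofactors amounts to adding them as covariates alongside the genotype term. The sketch below assumes an ordinary linear regression of the z-scored phenotype on a dominant genotype code and the first two principal components; the text does not specify the model form used in R, and the functions `ols` and `fit_association` are hypothetical. The sketch returns coefficients only and does not compute standard errors or P values.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    `rows` is a list of predictor rows, each including a leading 1.0 for the
    intercept. Solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def fit_association(z_scores, carrier, pc1, pc2):
    """Return (intercept, genotype effect, PC1 effect, PC2 effect)."""
    rows = [[1.0, float(c), a, b] for c, a, b in zip(carrier, pc1, pc2)]
    return tuple(ols(rows, z_scores))
```

In this formulation, the genotype coefficient is the difference in mean z-score between carriers and non-carriers after adjusting for the two ancestry components.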


Results

We assessed the same panel of markers across CNTNAP2 as Vernes et al. (2008), but focused instead on a quantitative measure of early language in a general population cohort. This panel included most of the key SNPs that were significantly associated in that study, as well as the flanking markers from elsewhere in the gene that had not shown association. Our hypothesis was that a similarly localized subset of SNPs within the panel would show evidence of association in our sample, against a background of nonsignificant results. The pattern of single SNP associations in our general population sample (Table 1) was strikingly reminiscent of that observed by Vernes et al. (2008) in their SLI families, highlighting an almost identical subset of markers, located in the exon 13–15 region of CNTNAP2. Two neighboring SNPs – rs2710102 and rs759178 – showed nominal significance (P = 0.0239 and 0.0248) and another three markers in their vicinity – rs17236239, rs2538976 and rs2710117 – displayed suggestive trends (P values between 0.05 and 0.085). These markers corresponded to those showing strongest associations in the Vernes et al. (2008) study of SLI and overlapped with the most significant findings from the Alarcón et al. (2008) investigation of language delay in autistic probands. The effects observed were consistently in the same direction as prior studies; the alleles that correlated with reduced language performance in the Raine sample (Table 2) were the same as those identified as putative susceptibility alleles in studies of disorder [cf. Table S3 in Vernes et al. (2008) and Table S1 in Alarcón et al. (2008)]. For example, risk alleles in SLI and autism were C for marker rs2710102 (C/T polymorphism) and G for marker rs759178 (G/T polymorphism); these same alleles were associated with lower early language scores in our general population sample (Table 2).

Table 1. Singlepoint association between CNTNAP2 variants and a quantitative measure of early communicative behavior

SNP marker    Position (bp)*    Location     SNP†    MAF‡    P value§    SLI association¶
rs7806058     146007792         Intron 1     A/G     0.35    ns
rs6946112     146059217         Intron 1     C/T     0.27    ns
rs12703803    146062909         Intron 1     T/G     0.29    ns
rs2058377     146090070         Intron 1     A/G     0.31    ns
rs12667234    146111176         Intron 1     A/G     0.30    ns
rs2888335     146124574         Intron 1     T/C     0.30    ns
rs7805539     146160278         Intron 1     G/A     0.28    ns
rs4726793     146276890         Intron 1     A/G     0.20    ns
rs10277654    146352576         Intron 1     T/C     0.44    ns
rs7794745     146489606         Intron 2     A/T     0.26    ns
rs6945085     146691220         Intron 3     T/C     0.09    ns
rs1024676     146715861         Intron 3     C/T     0.38    ns
rs10282158    146738067         Intron 3     T/A     0.06    ns
rs7812091     146740577         Intron 3     T/C     0.38    ns
rs10500170    146848251         Intron 8     A/G     0.16    ns
rs1603453     146908919         Intron 8     T/A     0.11    ns
rs1603450     146913540         Intron 8     G/A     0.18    0.0426*
rs10251377    147117454         Intron 10    A/G     0.25    ns
rs851715      147526906         Intron 13    A/G     0.32    ns          +
rs1177007     147546371         Intron 13    A/G     0.31    ns
rs10246256    147554807         Intron 13    T/C     0.31    ns          +
rs2710102     147574390         Intron 13    C/T     0.49    0.0239*     +
rs759178      147575112         Intron 13    G/T     0.49    0.0248*     +
rs17236239    147582305         Intron 13    A/G     0.35    0.0851      +
rs2538976     147585819         Intron 13    G/A     0.50    0.0535      +
rs2538963     147599446         Intron 13    G/A     0.32    ns
rs2710117     147601772         Intron 14    A/T     0.38    0.0771      +
rs10240503    147674978         Exon 15      A/G     0.11    ns
rs12155129    147856865         Intron 17    A/G     0.06    ns
rs11980146    147956733         Intron 20    A/G     0.34    ns

  1. ns, not significant.

  2. *Position based on the hg19 assembly of the Human Genome sequence.

  3. †Alleles of each SNP are given with respect to the forward strand of chromosome 7.

  4. ‡Minor allele frequency within the Raine sample.

  5. §P values <0.1 are shown, with P values <0.05 denoted by an asterisk.

  6. ¶Summary of findings from the Vernes et al. (2008) study of SLI: ‘+’ indicates SNPs showing significant association in that study, whereas ‘−’ denotes negative results.

Table 2. Effects of singlepoint CNTNAP2 variants on early communicative behavior

SNP marker    SNP*    Putative risk allele (Raine)†    Putative risk allele (SLI)‡    Non-risk homozygote§    Heterozygote§    Risk homozygote§
rs1603450     G/A     G                                na                             0.649 (0.841)           0.204 (0.961)    0.221 (0.958)
rs2710102     C/T     C                                C                              0.356 (0.977)           0.189 (0.958)    0.191 (0.933)
rs759178      G/T     G                                G                              0.356 (0.977)           0.191 (0.958)    0.190 (0.935)
rs17236239    A/G     G                                G                              0.293 (0.969)           0.166 (0.959)    0.246 (0.907)
rs2538976     G/A     G                                G                              0.336 (0.972)           0.201 (0.954)    0.180 (0.949)
rs2710117     A/T     A                                A                              0.368 (0.954)           0.228 (0.966)    0.186 (0.937)

  1. na, not applicable.

  2. *Alleles of each SNP are given with respect to the forward strand of chromosome 7.

  3. †Allele which was correlated with reduced scores in the Raine sample.

  4. ‡Allele which was correlated with reduced scores in the Vernes et al. (2008) study of SLI.

  5. §Mean (and SD) scores of the language phenotype at age 2 years (z-score transformed scores on the Communication subscale of the Infant Monitoring Questionnaire) according to diploid genotype in the Raine sample.

In the main cluster of associated SNPs – rs2710102, rs759178, rs17236239, rs2538976 – the markers were in strong LD, with D′ values of 1 for all pairwise comparisons (Figure S1, Supporting information). Notably, these four SNPs were central to a nine-marker risk haplotype previously studied by Vernes et al. (2008). We therefore constructed multimarker haplotypes using these four neighboring SNPs and identified three common combinations (TTAA, CGGG and CGAG), representing 98% of individuals (Table 3). As expected from the direction of effects observed in the singlepoint results (Table 2) and consistent with prior published results (Vernes et al. 2008), the TTAA multimarker allele was associated with higher scores on the measure of early language, whereas the CGGG and CGAG alleles were associated with reduced scores. TTAA showed nominal significance (P = 0.0488) and CGGG displayed a suggestive trend (P = 0.0627), but the strongest association was for CGAG (P = 0.0014); this remains significant after accounting for the number of tests that we performed in the study (30 singlepoint tests and 3 haplotypic analyses). Children carrying two copies of this haplotype obtained substantially lower scores (mean = −0.355, SE = 0.169) than those with one copy (mean = 0.313, SE = 0.055) or no copies (mean = 0.223, SE = 0.033).
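The statement that the CGAG result survives correction for 33 tests can be checked with simple arithmetic. The sketch below assumes a Bonferroni adjustment, which the text does not name explicitly; the function name `bonferroni` is hypothetical.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1.0 (assumed correction method)."""
    return min(1.0, p_value * n_tests)


N_TESTS = 30 + 3  # 30 singlepoint tests plus 3 haplotypic analyses

adjusted_cgag = bonferroni(0.0014, N_TESTS)  # approximately 0.0462
adjusted_ttaa = bonferroni(0.0488, N_TESTS)  # capped at 1.0

print(f"CGAG adjusted P = {adjusted_cgag:.4f}")
print(f"TTAA adjusted P = {adjusted_ttaa:.4f}")
```

Under this assumption, the adjusted P value for CGAG is about 0.046, just below the conventional 0.05 threshold, whereas the nominally significant TTAA result does not survive correction.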

Table 3. Association of rs2710102–rs759178–rs17236239–rs2538976 haplotypes with a quantitative measure of early communicative behavior

Haplotype*    Frequency†    P value    Factor‡

  1. *Alleles are given with respect to the forward strand of chromosome 7.

  2. †Frequency of haplotype within the Raine sample.

  3. ‡Analysis in R assessed two factors: 1 = comparison between harboring two copies and one copy of the haplotype; 2 = comparison between harboring two copies and no copies of the haplotype. This column indicates which factor yielded the most significant result, as reported in the preceding column.



Discussion

Our results suggest that variants in the exon 13–15 region of CNTNAP2 previously associated with deficits in SLI (Vernes et al. 2008) and delayed language in autism (Alarcón et al. 2008; Poot et al. 2010) also affect the early stages of language development in children from the general population. This was a targeted hypothesis-driven study of a single gene, focusing on specific markers that have been strongly implicated in multiple prior reports of language-related disorder, rather than a genome-wide search for new variants.

The consistencies in findings across multiple investigations are noteworthy given several key differences in the nature of these studies. Alarcón et al. (2008) studied probands with autism in an American sample, employing a parental report of language delay. Vernes et al. (2008) assessed a UK sample, examined language test scores in older children and focused on families selected for SLI. In this study, we investigated an Australian sample, used a parental report measure assessing language development at age 2, and tested for association across the normal range. Despite the obvious differences in sample ascertainment and phenotypic characterization, there was agreement not only regarding the pattern of SNPs that were associated but also in the direction of allelic effects.

In our study, we constructed a single set of haplotypes using four neighboring markers in high LD which, based on the singlepoint pattern of results, appeared to form a core site of association. Although we did not genotype every associated marker from the Vernes et al. (2008) study, these four markers were central to the nine-marker haplotypes that they previously assessed in SLI. Thus, our haplotypic alleles would be expected to capture much of the relevant variation from the earlier investigation. Indeed, haplotypic analyses from the two studies are generally concordant – both investigations found that the TTAA multimarker allele of rs2710102–rs759178–rs17236239–rs2538976 is associated with higher scores, whereas the alternative CGGG/CGAG alleles are associated with reduced performance (cf. Table S4 of Vernes et al. 2008). However, although the CGGG allele showed the strongest association in the SLI study, our analyses of the Raine sample identified much more significant effects for the rare CGAG combination, which here had particularly dramatic effects on language scores. These differences in haplotypic background could relate to the distinct population history of the samples. Regardless, the data suggest that in the vicinity of rs2710102–rs759178–rs17236239–rs2538976 there lie specific functional risk variants (as yet unidentified) with particular relevance to early language acquisition. Of note, the CNTNAP2 gene locus is one of the largest in the genome and could potentially contain multiple additional sites with functional relevance to neurodevelopmental phenotypes, to be clarified in future with high-density SNP screening and sequence-based strategies.

A methodological conclusion from our study is that a simple parental questionnaire focused on early language development can provide valuable phenotypic information for molecular genetic analyses, which may be particularly pertinent given the difficulties in directly assessing a child's performance in the earliest years of life. This is consistent with the core findings of Alarcón et al. (2008), who reported that rs2710102 and neighboring variants were associated with just a single item from the Autism Diagnostic Interview – Revised (Lord et al. 1994), ‘age at first word’, in autistic probands. In addition, in a recent study of multiple traits contributing to the autistic spectrum, Steer et al. (2010) reported a nominal association between rs17236239 and a factor they termed ‘language acquisition’, which primarily loaded on parental report measures of early language development. Our conclusion is also in line with the findings of Johnson et al. (2008), who showed good agreement between parent report and direct assessment of children's abilities at 2 years of age.

In terms of theoretical implications, it is clear that these common CNTNAP2 variants are not sufficient by themselves to account for language and communication disorders in children. This conclusion is in line with the current consensus that both SLI and autism are complex disorders resulting from the combined effect of multiple influences (Geschwind 2008). We hypothesize that CNTNAP2 variants which usually yield only a small boost or lag in language acquisition will have more marked consequences when they occur in concert with other genetic or environmental risk factors. Bishop (2010) suggests that autism may result from epistatic rather than additive interactions between genes. From this perspective, it would be of considerable interest to see whether there are additive or interactive effects of CNTNAP2 with genetic variants affecting social cognition, such as a recently described locus on chromosome 5p14 (St Pourcain et al. 2010).


The authors would like to acknowledge the National Health and Medical Research Council (NHMRC) for their long-term contribution to funding the Raine study over the last 20 years. Core Management of the Raine study has been funded by the University of Western Australia (UWA), the UWA Faculty of Medicine, Dentistry and Health Sciences, the Raine Medical Research Foundation, the Telethon Institute for Child Health Research and the Women's and Infants Research Foundation. The genotyping of the Raine cohort was funded by a project grant from the NHMRC (572613). The authors are extremely grateful to the study participants and their families as well as the Raine study team for cohort co-ordination and data collection. A.J.O.W. is supported by the NHMRC. S.E.F. is supported by the Royal Society, the Simons Foundation Autism Research Initiative and the Max Planck Society. D.V.M.B. is supported by the Wellcome Trust.