Dense-map genome scan for dyslexia supports loci at 4q13, 16p12, 17q22; suggests novel locus at 7q36

Authors


Corresponding author: Dr L. L. Field, Department of Medical Genetics, Rm C234, BC Women's Hospital and Health Centre, 4500 Oak Street, Vancouver, BC, Canada V6H 3N1. E-mail: llfield@mail.ubc.ca

Abstract

Analysis of genetic linkage to dyslexia was performed using 133,165 array-based SNPs genotyped in 718 persons from 101 dyslexia-affected families. Results showed five linkage peaks with lod scores >2.3 (4q13.1, 7q36.1-q36.2, 7q36.3, 16p12.1, and 17q22). Of these five regions, three have been previously implicated in dyslexia (4q13.1, 16p12.1, and 17q22), three have been implicated in attention-deficit hyperactivity disorder (ADHD, which frequently co-occurs with dyslexia; 4q13.1, 7q36.3, 16p12.1) and four have been implicated in autism (a condition characterized by language deficits; 7q36.1-q36.2, 7q36.3, 16p12.1, and 17q22). These results highlight the reproducibility of dyslexia linkage signals, even without formally significant lod scores, and suggest dyslexia predisposing genes with relatively major effects and locus heterogeneity. The largest lod score (2.80) occurred at 17q22 within the MSI2 gene, involved in neuronal stem cell lineage proliferation. Interestingly, the 4q13.1 linkage peak (lod 2.34) occurred immediately upstream of the LPHN3 gene, recently reported both linked and associated with ADHD. Separate analyses of larger pedigrees revealed lods >2.3 at 1–3 regions per family; one family showed strong linkage (lod 2.9) to a known dyslexia locus (18p11) not detected in our overall data, demonstrating the value of analyzing single large pedigrees. Association analysis identified no SNPs with genome-wide significance, although a borderline significant SNP (P = 6 × 10⁻⁷) occurred at 5q35.1 near FGF18, involved in laminar positioning of cortical neurons during development. We conclude that dyslexia genes with relatively major effects exist, are detectable by linkage analysis despite genetic heterogeneity, and show substantial overlapping predisposition with ADHD and autism.

Dyslexia is a common neurodevelopmental disorder of language acquisition and processing (Habib 2000; Shaywitz & Shaywitz 2005), which manifests primarily as difficulty in learning to read and spell, independent of general intelligence and educational opportunity. Most specialists agree that dyslexia is characterized by core deficits in processing the basic phoneme units of language (Bogliotti et al. 2008; Ramus et al. 2003; Van Orden & Goldinger 1996). Dyslexia affects 6–10% of school age children (DeFries 1989; Shaywitz et al. 1990) and results in major emotional, social, educational and economic repercussions (Spreen 1988). If genetically at-risk children could be identified prior to onset of language difficulties, long-term sequelae could be largely prevented by early educational intervention.

The best evidence for a genetic basis to dyslexia comes from comparisons of monozygotic (MZ) vs. dizygotic (DZ) twins, showing a higher rate of dyslexia-concordance in MZ than in DZ twins (83% vs. 29%, Bakwin 1973; 68% vs. 38%, DeFries & Alarcon 1996), and among dyslexia-discordant twins, a higher correlation of word recognition and spelling scores in MZ than in DZ twins (DeFries et al. 1987). Studies of twins with at least one dyslexic member have estimated heritabilities ranging from 0.61 for reading performance (Wadsworth et al. 2010) to 0.93 for the phonological coding component of word recognition (Olson et al. 1989). Although twin studies clearly show a genetic basis for dyslexia, the inheritance mode is unclear.

The demonstration of significant linkage between dyslexia and mapped genetic markers has provided definitive proof of a genetic basis to dyslexia, as well as localizing the susceptibility genes to specific genomic regions. Nine dyslexia loci, assigned gene symbols DYX1-DYX9 by the HUGO Gene Nomenclature Committee, have been located on chromosomal regions 1p34-p36 (DYX8), 2p15-p16 (DYX3), 3p14.1-q13 (DYX5), 6p22.2 (DYX2), 6q12-q14.1 (DYX4), 11p15.5 (DYX7), 15q21 (DYX1), 18p11 (DYX6) and Xq27 (DYX9) (for review and references, see Bates et al. 2007). Additional dyslexia linkages identified through whole genome scans have been reported at 2p22.3 (Loo et al. 2004), 2q22.3 (Bates et al. 2007; Raskind et al. 2005), 4p15.33-p16.1 (Bates et al. 2007), 4q13.1 (Brkanac et al. 2008), 7q32 (Bates et al. 2007; Kaminen et al. 2003), 10q26 (Loo et al. 2004), 12p13 (Brkanac et al. 2008), 13q22 (Fisher et al. 2002), 16p12-p13 (Loo et al. 2004), 17p13.3 (Bates et al. 2007), 17q22 (Loo et al. 2004) and 17q24.2 (Brkanac et al. 2008).

While genome-wide association studies (GWAS) are designed to detect effects of common predisposing genes on the phenotype being studied, genome-wide linkage analyses excel at detecting effects of rarer genes with relatively major effects. The goal of linkage scans is to identify these major (large-effect) dyslexia genes, in order to develop early-intervention genetic testing and, more generally, to increase understanding of the genetic control of normal language-related brain development. Here, we report results of a genome-wide linkage search for dyslexia-predisposing genes using 133,165 genomic markers typed in 718 persons from 101 dyslexia-affected families. This represents the first search for dyslexia genes using a high-density array-based single-nucleotide polymorphism (SNP) marker map.

Materials and methods

Families

Families with two or more siblings affected with phonological coding dyslexia (PCD) were ascertained primarily through probands attending schools for learning-disabled children in southern Alberta and British Columbia, Canada. Unaffected siblings were also sampled whenever possible. All subjects were >8 years of age and, with the exception of four unaffected parents in four families, all subjects were of European ancestry. Informed consent was obtained from each subject or subject's parent. Research was performed in compliance with the Code of Ethics of the World Medical Association Helsinki Declaration and the Ethics Review Boards of the University of Calgary and University of British Columbia.

The current study families were derived from 100 families previously analyzed by our research group (Hsiung et al. 2004; Tzenova et al. 2004). However, due to budgetary constraints, the total number of genotyped individuals in the current array-based study was considerably lower than in our previous microsatellite-based studies (718 vs. 914). One additional extended kindred with four dyslexic members was also studied, bringing the total number of families to 101.

PCD phenotype

Phonological coding skills were the basis for assigning a qualitative dyslexia status (affected/unaffected/uncertain). Phonological coding was tested with ‘word attack’ (nonword pronunciation) subtests from the Woodcock Reading Mastery Test (Woodcock 1987) and the Woodcock–Johnson Psychoeducational Test – Revised (Woodcock & Johnson 1989). Additional psychometric tests were used to assist in determining the degree of certainty (not severity) for the assigned affection status (i.e. definite vs. probable). Phonological awareness was assessed using the Auditory Analysis Test (Rosner & Simon 1971), spelling was assessed with the Wide Range Achievement Test (Jastak & Wilkinson 1984) and general intelligence was measured with short forms of the Wechsler Intelligence Scale (Wechsler 1974, 1981).

Children (8–18 years old) were diagnosed as affected if there was at least a 2-year difference between chronological age and phonological coding test performance (for further details see Field & Kaplan 1998). Only one of the word attack subtests had published norms for subjects over 18 years of age. Therefore, for adult subjects, a structured interview to collect reported history of reading problems was also used to assist in assigning PCD status. Dr. Kaplan and a reading specialist independently reviewed each subject's psychometric scores and assigned subjects to one of five categories: 1 = definitely unaffected, 2 = probably unaffected, 3 = uncertain, 4 = probably affected and 5 = definitely affected. This coding scheme produced high inter-rater reliability (κ = 0.84), with 100% agreement between the raters for affected (codes 4 or 5) and unaffected status (codes 1 or 2). In other words, in no case did one rater assign an affected status and the other rater an unaffected status. Any untested individual, whether deceased or unavailable, was considered uncertain (code 3), regardless of family history.

In this study, linkage and association analyses were performed using only three categories (affected/unaffected/uncertain). However, the certainty rating was used when possible to select subjects from our DNA bank for array-genotyping by preferentially selecting those with ‘definite’ (codes 1 or 5) status and by omitting less informative persons with uncertain (code 3) status. If we assume that ‘definitely affected’ subjects tend to be more severely affected, then preferentially including severely affected individuals who may have a stronger genetic component to the phenotype should increase the power of the analyses to detect linkage (Leal & Ott 2000).

Samples and quality control (QC)

A total of 749 persons from our 101 families were genotyped using Affymetrix 500K arrays (see SNP genotyping and QC section below), and 742 samples with BRLMM-algorithm call rates ≥90% were retained in the study (i.e. successful calls for ≥90% of SNPs on that individual's array). We also retained results from three additional persons who were critical to the linkage analyses but whose call rates were slightly below 90% (87.5%, 89.4% and 89.7%). The mean and median call rates for these 745 samples were 97.8% and 98.5%, respectively. Data from the 745 samples were then submitted to further QC analyses using the PLINK software designed to perform advanced error checking for high-density array genotyping data (Purcell et al. 2007; http://pngu.mgh.harvard.edu/~purcell/plink/). This resulted in the omission of 10 additional individuals: five because they were confirmed to be one of a dyslexia-concordant MZ twin pair (mean sample identity = 0.9995; the twin with the higher call rate was retained for the linkage analyses), two due to excessive intra-individual heterozygosity (F < −0.1) suggesting DNA contamination, two because their presence created high (>5%) family Mendelian error rates suggesting sample mix-up or misspecified paternity, and one because genomic sex differed from recorded sex, suggesting possible sample or status mix-up. As expected theoretically, in the four families with one European and one non-European ancestry parent, the children showed increased heterozygosity (F −0.06 to −0.08; all subjects retained in study). As a result of the above QC decisions, two complete families (six more persons) were removed because the nuclear family now had only one child (no linkage information). This reduced the original dataset from 101 to 99 families. Thus, 745 – 10 – 6 = 729 samples from 99 families were available for analysis of linkage.

Among these 99 independent families, 65 were nuclear families; 48/65 (74%) of the nuclear families contained genotype data for both parents and 18/65 (28%) contained more than two affected members – in other words, the ‘core’ affected sibpair plus other affected siblings or affected parents. The remaining 34 study families were extended pedigrees containing an affected sibpair and additional affected members such as grandparents, cousins and aunts/uncles.
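
To illustrate the heterozygosity check described above, the following minimal Python sketch computes a method-of-moments inbreeding coefficient F for one sample, in which negative values indicate an excess of heterozygous calls. The function names, the 0/1/2 genotype coding and the omission of any small-sample correction are assumptions made for illustration and are not taken from PLINK itself; only the −0.1 cutoff comes from the text.

```python
def inbreeding_f(genotypes, allele_freqs):
    """Method-of-moments F for one sample.

    genotypes: minor-allele counts (0, 1, 2) per SNP, or None for a missing call.
    allele_freqs: minor-allele frequency per SNP (same order as genotypes).
    """
    observed_hom = 0      # number of homozygous calls actually seen
    expected_hom = 0.0    # number expected under Hardy-Weinberg proportions
    n_called = 0
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue
        n_called += 1
        if g != 1:
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n_called - expected_hom == 0:
        raise ValueError("no informative (polymorphic, called) SNPs")
    return (observed_hom - expected_hom) / (n_called - expected_hom)


def flag_excess_heterozygosity(f_value, cutoff=-0.1):
    """True when F falls below the cutoff quoted in the text (F < -0.1)."""
    return f_value < cutoff
```

Under this definition, the admixed children mentioned above (F of −0.06 to −0.08) would not be flagged, which agrees with their retention in the study.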

Due to constraints of the MERLIN linkage analysis programs, the size of some extended families needed to be reduced to achieve a maximum size of 22 bits (where bits = 2[# nonfounders] – [# founders]). Therefore, (a) six of the larger extended families were broken into subfamilies, resulting in a total of 112 families for MERLIN analysis (four families were divided into two parts, one family into three parts, and one large pedigree into eight subfamilies) and (b) 11 genotyped persons with unaffected or ‘uncertain’ status were deleted to further reduce the size of problematic families. As a result, the final dataset included 112 families with 729 – 11 = 718 genotyped individuals. To minimize the loss of linkage information, nine genotyped individuals were duplicated at the breakage points in split families (i.e. the duplicated person was a child in one subfamily and a parent in another subfamily).
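
The pedigree-complexity formula quoted above can be sketched as a small Python helper. The function names and the example pedigree sizes are hypothetical; only the formula (bits = 2 × nonfounders − founders) and the 22-bit ceiling come from the text.

```python
MAX_BITS = 22  # maximum pedigree complexity quoted in the text


def pedigree_bits(n_founders, n_nonfounders):
    """Complexity in bits: 2 * (# nonfounders) - (# founders)."""
    return 2 * n_nonfounders - n_founders


def needs_splitting(n_founders, n_nonfounders, max_bits=MAX_BITS):
    """True when a pedigree exceeds the bit limit and must be split or trimmed."""
    return pedigree_bits(n_founders, n_nonfounders) > max_bits


# Hypothetical examples: a nuclear family with two parents and three children,
# and an extended pedigree with 6 founders and 15 nonfounders.
print(pedigree_bits(2, 3), needs_splitting(2, 3))
print(pedigree_bits(6, 15), needs_splitting(6, 15))
```

Because each nonfounder adds two bits while each founder removes one, trimming uninformative nonfounders or splitting at a connecting individual are the two direct ways of bringing a pedigree under the limit.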

In summary, the final analysis dataset consisted of 112 families and included a total of 830 individuals, of whom 718 (87%) were array-genotyped. The remaining 112 persons were untyped but were necessary to the linkage analyses (e.g. unavailable parents or connecting persons in extended pedigrees). The average family size was 7.4, ranging from 4 to 23 members. Among the 718 genotyped persons, 400 (56%) were affected with dyslexia, 233 (32%) were unaffected and 85 (12%) were ‘uncertain’ status. The frequency of male sex was 257/400 (64%) among affected (sex ratio 1.8:1), 83/233 (36%) among unaffected and 47/85 (55%) among persons with ‘uncertain’ status. The families include an unusual number of dyslexia-concordant MZ twins (five pairs), which may be due to ascertainment bias.

SNP genotyping and QC

Genomic DNA was extracted from whole blood as previously described (Field & Kaplan 1998). For each person, a 250 ng sample of DNA was genotyped for 262,217 SNPs on an NspI array from the Affymetrix Human Mapping 500K Array Set (one array of the 2-array 500K set) using an Affymetrix GeneChip® Instrument System, following manufacturer's protocols (https://www.affymetrix.com/support/downloads/manuals/500k_assay_manual.pdf). Genotyping results were stored and managed using the Progeny Lab 7 database system (www.progenygenetics.com; Progeny software). Although the 718 study samples had high individual call rates (see above), we further analyzed these samples using PLINK to assess SNP data quality. Since undetected genotyping errors have a strong negative impact on true linkage signals, we applied highly stringent criteria to remove error-prone SNPs. We excluded SNPs having call rates <98% in the 718 samples (85,717 SNPs), SNPs with minor allele frequency <5% in 116 unaffected family founder members (57,940 SNPs) and SNPs that failed an exact test of HW equilibrium with p value <0.001 in the unaffected founders (221 SNPs). The resulting quality-controlled (QC) dataset contained 133,165 SNPs (129,052 SNPs omitted, at least 14,826 of which failed more than one criterion). As a result of removing SNPs with demonstrated or potential lower quality, the mean individual genotyping call rate rose from 97.8% to 99.3%.
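
As a rough illustration of the SNP-level filters described above, the Python sketch below applies the call-rate and minor-allele-frequency thresholds to one SNP and re-derives the SNP accounting from the counts given in the text. The function names and the 0/1/2/None genotype coding are assumptions; the Hardy–Weinberg exact test is deliberately left out of the sketch.

```python
CALL_RATE_MIN = 0.98  # SNP call-rate threshold from the text
MAF_MIN = 0.05        # minor-allele-frequency threshold from the text


def call_rate(genotypes):
    """Fraction of non-missing calls; genotypes are 0/1/2 or None."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_snp_qc(sample_genotypes, founder_genotypes):
    """Call rate is judged on all samples, MAF on unaffected founders only."""
    return (call_rate(sample_genotypes) >= CALL_RATE_MIN
            and minor_allele_frequency(founder_genotypes) >= MAF_MIN)


# Accounting check using the counts reported in the text.
TOTAL_SNPS = 262_217
RETAINED_SNPS = 133_165
FAILED_CALL_RATE, FAILED_MAF, FAILED_HWE = 85_717, 57_940, 221

omitted = TOTAL_SNPS - RETAINED_SNPS
# Failures summed per criterion exceed the number of distinct SNPs omitted;
# the excess is a lower bound on SNPs failing more than one criterion.
failed_multiple_min = FAILED_CALL_RATE + FAILED_MAF + FAILED_HWE - omitted
print(omitted, failed_multiple_min)
```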

Linkage analyses

SNP genotypes for the 718 family members were output from the Progeny database into the MERLIN linkage analysis program, which was designed to efficiently analyze high-density SNP data (www.sph.umich.edu/csg/abecasis/Merlin; Abecasis & Wigginton 2005; Abecasis et al. 2002). The Affymetrix 500K array database software assigned SNP physical locations based on NCBI build 36 (2006) as well as cM locations. MERLIN processes only markers with different cM positions to the 4th decimal place; therefore, Progeny deletes other markers with the same cM location (although different base pair positions) from the MERLIN analysis files. This reduced the 133,165 QC SNPs to a MERLIN input dataset of 128,435 SNPs. When employing a highly dense marker map, it is essential to remove or adjust for markers displaying strong linkage disequilibrium (LD), because LD can falsely inflate linkage signals (Huang et al. 2004). We used the LD modelling option in MERLIN to delete markers showing LD (r²) >0.3, which resulted in a severely pruned marker map with a final linkage analysis dataset containing 47,940 SNPs. The chromosomal distribution of SNPs after LD pruning was as follows (number in brackets): chr. 1 (3664); 2 (3822); 3 (3278); 4 (3134); 5 (2971); 6 (3009); 7 (2640); 8 (2513); 9 (2210); 10 (2471); 11 (2244); 12 (2372); 13 (1916); 14 (1573); 15 (1397); 16 (1460); 17 (1135); 18 (1576); 19 (695); 20 (1214); 21 (766); 22 (605); X (1275). MERLIN detects Mendelian inheritance incompatibilities and deletes genotypes for any problematic SNP in the family showing incompatibility. Note that these incompatibilities were due to genotype call errors rather than misspecified paternity, since array comparisons during PLINK QC analyses verified genetic relationships between family members.
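
The idea of thinning a dense map by an r² threshold can be sketched in Python as follows. This is a simplified greedy scheme using the squared correlation of 0/1/2 genotype codes as a stand-in for r²; it is an illustrative assumption and not a reproduction of MERLIN's own LD handling. All names and the toy genotype vectors are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_x * var_y)


def prune_by_ld(snps, threshold=0.3):
    """Greedy pruning along the map.

    snps: list of (name, genotype_vector) tuples in map order.
    A SNP is dropped when its r^2 with the most recently kept SNP
    exceeds the threshold.
    """
    kept = []
    for name, geno in snps:
        if kept and r_squared(kept[-1][1], geno) > threshold:
            continue
        kept.append((name, geno))
    return [name for name, _ in kept]


# Toy data: snpB duplicates snpA (r^2 = 1), snpC is uncorrelated with snpA.
toy = [
    ("snpA", [0, 1, 2, 0, 1, 2]),
    ("snpB", [0, 1, 2, 0, 1, 2]),
    ("snpC", [2, 0, 2, 0, 2, 0]),
]
print(prune_by_ld(toy))
```

Comparing each marker only with the last retained one keeps the sketch short; a windowed comparison against several preceding markers would prune more aggressively.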

MERLIN was used to perform both parametric and nonparametric multipoint linkage analysis of the qualitative dyslexia phenotype (affected/unaffected/uncertain). Use of the ‘uncertain’ phenotype allows those individuals to contribute marker information but remain neutral with respect to dyslexia status in linkage analyses. Since the parameters of any dyslexia-predisposing gene are unknown, we used two generalized parametric models to attempt to detect major susceptibility genes with incomplete penetrance: a dominant (DOM) model (disease allele A frequency 0.05, penetrances 0.8, 0.8, 0.0001 for genotypes AA, Aa, aa) and a recessive (REC) model (disease allele a frequency 0.05, penetrances 0.0001, 0.0001, 0.8). We performed nonparametric linkage (NPL) analysis using the linear model all pairs option (‘LinAll’; Kong & Cox 1997), which is designed to detect small increases in allele sharing among affected individuals, in other words, common susceptibility genes with small effect. Conversely, the parametric analyses are designed to detect more ‘major’ gene effects, even if restricted to a subset of families. Parametric heterogeneity lod scores (Hlods) were calculated, including an estimate of the proportion of families showing linkage at each analysis location. SNP allele frequencies used in the linkage analyses were derived from the genotypes of 116 unaffected family founders.
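
Two quantities implied by the parametric models above can be worked through in a short Python sketch: the population prevalence that each model implies under Hardy–Weinberg proportions, and a heterogeneity lod obtained from per-family lod scores with the standard admixture formula, HLOD = max over α of Σ log10(α·10^lod_i + 1 − α). The function names, the grid search over α and the example lod scores are assumptions for illustration; MERLIN's internal maximization may differ.

```python
import math


def prevalence(disease_allele_freq, penetrances):
    """Population prevalence under Hardy-Weinberg proportions.

    penetrances: (f0, f1, f2) for carriers of 0, 1 and 2 disease alleles.
    """
    q = disease_allele_freq
    p = 1.0 - q
    f0, f1, f2 = penetrances
    return p * p * f0 + 2.0 * p * q * f1 + q * q * f2


# Model parameters as given in the text, ordered by number of disease alleles.
DOM_PREVALENCE = prevalence(0.05, (0.0001, 0.8, 0.8))
REC_PREVALENCE = prevalence(0.05, (0.0001, 0.0001, 0.8))


def hlod(family_lods, steps=1000):
    """Grid search for the heterogeneity lod and the linked proportion alpha.

    family_lods: per-family lod scores at one map position.
    Returns (hlod, alpha); alpha = 0 gives a score of 0 by construction.
    """
    best_score, best_alpha = 0.0, 0.0
    for i in range(steps + 1):
        alpha = i / steps
        score = sum(math.log10(alpha * 10.0 ** lod + (1.0 - alpha))
                    for lod in family_lods)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha


print(round(DOM_PREVALENCE, 4), round(REC_PREVALENCE, 4))
print(hlod([2.0, -2.0]))  # hypothetical: one linked and one unlinked family
```

The sketch shows why an Hlod can remain positive when the summed lod is zero or negative: families with negative lods are down-weighted through α rather than cancelling the linked families outright.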

Association analyses

Genotype data on 718 family members for the 125,047 autosomal SNPs that were used in the linkage analyses (without LD pruning) were input to the PLINK program package for family-based association analyses using the transmission disequilibrium test (TDT) (Purcell et al. 2007). Two types of association testing were performed: (a) association in linkage regions, by (1) testing SNPs within 2.5 Mb of lod peaks ≥1.5 (total 5 Mb around each peak SNP) and (2) testing SNPs within 2.5 Mb of lod peaks >2.3; and (b) genome-wide association analysis.
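
For readers unfamiliar with the TDT, the basic statistic can be sketched in a few lines of Python. The sketch assumes the standard McNemar-type form, χ² = (b − c)²/(b + c), where b and c count how often heterozygous parents did or did not transmit a given allele to affected offspring. The function name and the example counts are hypothetical, and refinements applied by PLINK (for example, handling of incomplete trios) are not modelled.

```python
import math


def tdt(transmitted, untransmitted):
    """Basic TDT from transmission counts of heterozygous parents.

    Returns (chi-square with 1 df, asymptotic P-value, odds ratio b/c).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = b / c if c else math.inf
    return chi2, p_value, odds_ratio


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
print(tdt(60, 40))
```

Because only heterozygous parents contribute, the test is robust to population stratification, which is one reason family-based association complements the linkage analyses in the same pedigrees.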

Results

Linkage analysis

The results of analyzing genetic linkage of 47,940 genome-wide SNPs to the dyslexia phenotype under nonparametric (NPL ‘LinAll’), general DOM and general REC models are shown in Figs. 1-3 (Note: genome-wide lod scores available upon request from L.L.F.). These analyses detected 15 linkage regions with maximum lod or Hlod scores ≥1.5 under one or more models (Table 1). None of the lod scores exceeded the recommended 3.3–3.6 threshold for genome-wide significance (Holmans & Craddock 1997; Lander & Kruglyak 1995); however, five regions produced lod scores >2.3, which may be considered ‘suggestive’ of true linkage. Table 1 shows the locations of the 15 possible linkage regions, the maximum NPL lod score or parametric Hlod (heterogeneity lod) observed, the SNP at the peak, its position in cM and bp (NCBI build 36), the model producing the highest score at that region, alpha (estimated proportion of linked families under parametric models), closest gene(s) of interest within 1.5 Mb of the peak SNP (3 Mb region), and whether linkage (L) or association (A) with dyslexia (DYS), attention-deficit hyperactivity disorder (ADHD), or autism (AUT) has been previously reported in that region.
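
To give a sense of scale for these lod thresholds, the Python sketch below converts a lod score to an approximate pointwise P-value using the usual asymptotic relationship χ² = 2·ln(10)·lod with a one-sided 1-df test. This is an assumption made for illustration: it applies most directly to a lod maximized over a single parameter, and scores maximized over additional parameters (such as Hlods) follow a different null distribution. The function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise P-value for a lod score.

    Uses chi2 = 2 * ln(10) * lod and halves the 1-df chi-square tail,
    P(X > x) = erfc(sqrt(x / 2)). Negative lods are treated as 0.
    """
    chi2 = 2.0 * math.log(10.0) * max(lod, 0.0)
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


for score in (1.5, 2.3, 3.3):
    print(score, lod_to_pointwise_p(score))
```

Pointwise values of this kind ignore the multiple testing inherent in a genome scan, which is why genome-wide thresholds are set far above the lod corresponding to a nominal P of 0.05.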

Figure 1.

Whole genome lod scores under NPL (linear all pairs, ‘LinAll’) model.

Figure 2.

Whole genome lod scores under parametric DOM model.

Figure 3.

Whole genome lod scores under parametric REC model.

Table 1. Fifteen linkage regions with LOD (NPL) or HLOD (DOM or REC) scores ≥1.5

| Chr. | Band | Peak SNP | SNP bp [a] | Model | SNP cM | H/LOD | Alpha | Gene/s within 1.5 Mb of peak SNP | Linkage, assoc. [b] |
|---|---|---|---|---|---|---|---|---|---|
| 1p | 1p31 | rs4970643 | 82,272,279 | DOM | 107.08 | 1.50 | 0.14 | LPHN2 (82.23 Mb) | NO |
| 2p | 2p21 | rs1377906 | 44,631,967 | NPL | 69.64 | 1.95 | n/a | SIX3 (45.02 Mb), PRKCE (45.73 Mb) | DYS (L, A) |
| 4p | 4p15.32 | rs17505201 | 16,677,686 | DOM | 32.60 | 1.95 | 0.12 | LDB2 (16.51 Mb) | DYS (L) |
| | 4p13 | rs16853747 | 41,589,703 | DOM | 63.13 | 2.06 | 0.17 | PHOX2B (41.44 Mb), LIMCH1 (41.40 Mb) | NO |
| 4q | **4q13.1** | rs41454650 [c] | 59,724,645 | DOM | 74.54 | **2.34** | 0.20 | LPHN3 (61.75 Mb) | ADHD (L, A), DYS (L) |
| | 4q35.1 | rs17324527 | 183,775,949 | DOM | 184.40 | 1.78 | 0.16 | ODZ3 (183.48–183.96 Mb) | DYS (L) |
| 5p | 5p13.1-p13.3 | rs283128 [d] | 32,102,829 | DOM | 53.50 | 1.69 | 0.15 | PDZD2 (31.83–32.15 Mb) | ADHD (L), AUT (L) |
| | 5p13.1-p13.3 | rs443383 [d] | 40,222,598 | DOM | 64.65 | 1.69 | 0.16 | | |
| 5q | 5q14 | rs17446285 | 86,474,785 | REC | 102.30 | 1.66 | 0.13 | RASA1 (86.60 Mb) | NO |
| 7q | **7q36.1-q36.2** | rs4236441 | 152,220,735 | REC | 168.42 | **2.42** | 0.11 | ACTR3B (152.18 Mb), DPP6 (153.21 Mb) | AUT (L, A) |
| 7q | **7q36.3** | rs11770182 | 155,628,329 | REC | 181.02 | **2.57** | 0.13 | SHH (155.29 Mb), EN2 (154.95 Mb), PTPRN2 (157.02 Mb) | AUT (A), ADHD (A) |
| 9q | 9q31.2 | rs607019 | 109,985,117 | REC | 111.73 | 1.52 | 0.09 | AL390170 (110.93 Mb) | DYS (L, A) |
| | 9q33.3 | rs2182660 | 125,911,120 | DOM | 131.32 | 2.16 | 0.22 | LHX2 (125.84 Mb) | ADHD (L) |
| 12q | 12q24.23 | rs1365433 | 118,188,040 | REC | 138.64 | 1.67 | 0.14 | SRRM4 (118.08 Mb), MSI1 (119.26 Mb) | AUT (A) |
| 16p | **16p12.1** | rs154533 | 22,626,439 | NPL | 45.08 | **2.42** | n/a | HS3ST2 (22.73 Mb), PRKCB (23.75 Mb) | DYS (L), ADHD (L), AUT (L, A) |
| 17q | **17q22** | rs2332933 | 52,922,575 | REC | 87.41 | **2.80** | 0.15 | MSI2 (52.69–53.11 Mb), BZRAP1 (53.73 Mb) | DYS (L), AUT (A) |

Five regions with scores >2.3 are bolded.
[a] SNP bp positions are from NCBI Build 36 (2006).
[b] Previously reported linkage (L) or association (A) of dyslexia (DYS), attention-deficit hyperactivity disorder (ADHD), or autism (AUT) in this region.
[c] Also known as SNP_A-2147644.
[d] These two SNPs spanning 5p13.3-p13.1 are conservatively interpreted to reflect the same peak.

Regions with lod scores >2.3: Five regions generated lods >2.3 in our dataset (bolded in Table 1). Figures 4-7 show the localizations of these five linkage peaks (using NPL ‘LinAll’, DOM and REC models), the SNPs at the main peak and nearby minor peaks, and all genes in the 3 Mb region around the peak SNP (from UCSC Genome Browser), with red circles highlighting genes of interest listed in Table 1.

Figure 4.

Chromosome 17 linkage results. Top: lod scores under DOM, REC and NPL (LinAll) models; top inset: SNPs at 17q22 peak and nearby subpeaks; bottom: all genes within 1.5 Mb of peak SNP (from UCSC Genome Browser); red circles highlight genes of interest.

Figure 5.

Chromosome 7 linkage results. Top: chromosome 7 lod scores under DOM, REC and NPL (LinAll) models; top inset: SNPs at 7q36.1-q36.2 peak and 7q36.3 peak, and nearby subpeaks; bottom: all genes within 1.5 Mb of either peak SNP (from UCSC Genome Browser); red circles highlight genes of interest.

Figure 6.

Chromosome 16 linkage results. Top: lod scores under DOM, REC and NPL (LinAll) models; top inset: SNPs at 16p12 peak and nearby subpeaks; bottom: all genes within 1.5 Mb of peak SNP (from UCSC Genome Browser); red circles highlight genes of interest.

Figure 7.

Chromosome 4 linkage results. Top: lod scores under DOM, REC and NPL (LinAll) models; top inset: SNPs at 4q13 peak and nearby subpeaks; bottom: all genes within 1.5 Mb of 4q13 peak SNP (from UCSC Genome Browser); red circles highlight genes of interest.

All five of the regions with lods >2.3 have been previously implicated in dyslexia, ADHD, or autism: three have been implicated in dyslexia (4q13.1, 16p12.1, 17q22), three in ADHD (4q13.1, 7q36.3, 16p12.1), and four in autism (7q36.1-q36.2, 7q36.3, 16p12.1, 17q22). ADHD commonly co-occurs in individuals or families with dyslexia, and twin studies have demonstrated this is partially due to shared genetic predisposition between dyslexia and the inattention component of ADHD (Willcutt & Pennington 2000; Willcutt et al. 2007). Autism is characterized by language impairments and other social communication deficits, and parents of autistic children have an increased frequency of language-related disabilities (Bradford et al. 2001; Folstein et al. 1999; Schmidt et al. 2008).

The largest lod score (2.80) occurred on chromosome 17q22 under the REC model, at a SNP within the MSI2 gene, which is expressed during CNS development (Fig. 4; see Discussion below). The next largest lod (2.57, REC model, Fig. 5) occurred at 7q36.3, maximizing close to the SHH and EN2 loci, implicated in holoprosencephaly and autism, respectively. A nearby but potentially distinct peak occurred at the border between 7q36.1 and 7q36.2 (lod 2.42, REC model, Fig. 5) and maximized between ACTR3B and DPP6, both expressed in fetal brain. An equally strong linkage signal (lod 2.42, NPL model, Fig. 6) occurred at 16p12.1 at the HS3ST2 gene involved in cortical neuron development. The fifth largest score (2.34, DOM model, Fig. 7) occurred at 4q13.1; the gene closest to the peak is LPHN3, which has recently been reported to be both linked and associated with ADHD. The NPL model also produced a peak lod score of 2.16 at the same SNP in 4q13.1 (Fig. 7).

Regions with lod scores 1.5–2.3: Seven of the 10 regions that produced lod scores between 1.5 and 2.3 (see Table 1), although not considered statistically ‘suggestive’ evidence for linkage, have been previously reported by other research groups to show linkage or association with dyslexia (2p21, 4p15, 4q35, 9q31), ADHD (5p13, 9q33) or autism (5p13, 12q24.23). Thus, these may represent true linkage signals.

Regions with lod scores 1.0–1.5: In addition to the 15 regions with lod scores of ≥1.5 shown in Table 1, our analyses also detected lod scores of 1.0–1.5 in 10 other regions (see Figs. 1-3). Since this weak evidence for linkage may nevertheless be of use to other researchers, the results are summarized here including the peak SNP: 1q31.1 (lod 1.1, rs17591814), 2q14.2 (lod 1.4, rs10201826), 4q22.2-q23 (lod 1.4, rs4399995), 7q31.33 (lod 1.0, rs2299555), 8p21.1 (lod 1.0, rs2123472), 17p13.3 (lod 1.1, rs8065080), 17q12 (lod 1.0, rs2106772), 17q24.3 (lod 1.3, rs236565), 17q25.3 (lod 1.2, rs9898429) and Xq13.1 (lod 1.0, rs5980949) (Note: genome-wide lod scores available upon request from L.L.F.).

Association analysis in linkage regions

A logical approach to searching for association caused by LD between genetic markers and a trait locus is to focus on regions with a priori evidence of containing genes predisposing to the trait through positive linkage signals. In fact, detection of association might enable ‘fine-mapping’ of a susceptibility gene that was localized more roughly by linkage. We searched for LD using family-based TDT methods around the 15 linkage peaks with lod scores ≥1.5, by testing all 3853 SNPs within 2.5 Mb of the 15 peak SNPs (total 5 Mb around each peak, except for a broader 13 Mb range around the 5p13 dual peak). None of the SNPs generated p values equivalent to a significance level of 0.05 after Bonferroni correction for multiple testing (i.e. 1.3 × 10⁻⁵). Similarly, when association analysis was restricted to 962 SNPs within 2.5 Mb of the five most suggestive linkage peaks (peaks with lod scores >2.3), there were also no SNPs showing a significant p value after correction for total number of SNPs tested (i.e. 5.2 × 10⁻⁵).
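
The Bonferroni thresholds quoted in this study follow directly from dividing the family-wise significance level by the number of SNPs tested, as the short Python sketch below shows. The function name and labels are illustrative; the SNP counts are those given in the text.

```python
def bonferroni_threshold(n_tests, family_wise_alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


# SNP counts reported in the text for each analysis.
analyses = {
    "15 linkage regions (lod >= 1.5)": 3853,
    "5 linkage regions (lod > 2.3)": 962,
    "genome-wide autosomal SNPs": 125_047,
}
for label, n_snps in analyses.items():
    print(f"{label}: {bonferroni_threshold(n_snps):.1e}")
```

Bonferroni correction treats every SNP as an independent test, so with markers in LD these thresholds are conservative.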

There were 1225 autosomal SNPs with P < 0.01, and 39 (3.2%) of these SNPs were in the 15 putative linkage regions. Based on an estimated Build 36 autosome size of 2868 Mb (http://www.ncbi.nlm.nih.gov/genome/assembly/2928/), the 15 linkage regions account for 2.8% of the autosomal genome (81.5/2868 Mb). This slight increase of SNPs with P < 0.01 in putative linkage regions compared with the expected frequency (3.2% vs. 2.8%) was not statistically significant.
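
One way to check the enrichment comparison above is an exact one-sided binomial test, sketched here in Python. The text does not state which test was used, so the choice of test, the function names and the simplifying assumption that SNPs are independent and uniformly distributed along the genome are all assumptions of this sketch.

```python
import math


def log_binomial_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    return sum(math.exp(log_binomial_pmf(i, n, p)) for i in range(k, n + 1))


# Figures from the text: 39 of 1225 SNPs with P < 0.01 fall in linkage
# regions that cover 81.5 Mb of a 2868 Mb autosomal genome.
region_fraction = 81.5 / 2868
expected_hits = 1225 * region_fraction
p_enrichment = binomial_upper_tail(39, 1225, region_fraction)
print(round(expected_hits, 1), p_enrichment)
```

Working in log space avoids overflow in the binomial coefficients, which become very large for n = 1225.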

Genome-wide association analysis

We performed a genome-wide association analysis using all 125,047 (non-LD-pruned) autosomal SNPs. None of the SNPs produced a p value <4.0 × 10⁻⁷, equivalent to a genome-wide significance level of 0.05 after Bonferroni correction. The SNP with the lowest P-value, which almost met genome-wide significance (rs9313548, 170.89 Mb, P = 6.2 × 10⁻⁷; odds ratio [OR] for overtransmitted major allele = 2.77) occurred at 5q35.1, only 77 kb downstream of FGF18, a gene that has been implicated in specification of left-right asymmetry (Ohuchi et al. 2000) and in laminar positioning of cortical neurons during neocortex development (Hasegawa et al. 2004). The SNP with the next lowest P-value (rs577043, 78.61 Mb, P = 3.8 × 10⁻⁶; OR for overtransmitted major allele = 4.67) occurred at 11q14.1, within the ODZ4 gene. ODZ genes are expressed in complementary patterns in developing cortex (Li et al. 2006). It is therefore of interest that another ODZ gene, ODZ3 at 4q35, showed evidence for linkage with dyslexia in our data (lod 1.78, Table 1), and Loo et al. (2004) reported linkage of spelling ability to 4q35.

In addition to 5q35 and 11q14, there were two other genomic locations with SNP association P < 10⁻⁵ (7p11.2, rs1454517; 19q13.11, rs10410583) and another eight locations with P < 10⁻⁴ (1p33, rs4926547; 3q28, rs9839711; 4q22.2, rs280063; 7p21.1, rs3807493; 9q21.11, rs4744608; 10q21.3, rs1942006; 13q12.11, rs2880301; 14q23.1, rs268821) (Note: association p values for all genome-wide SNPs tested are available upon request from L.L.F.). Interestingly, none of these 12 locations were among our 15 regions with lod scores ≥1.5. However, one association location (4q22.2) overlapped a linkage region with lod score 1.0–1.5. The associated SNP rs280063 (95.17 Mb, P = 9.6 × 10⁻⁵; OR 1.64) is located just 204 kb downstream of ATOH1, which encodes a transcription factor involved in neuronal differentiation in the developing cerebellum (Klisch et al. 2011). The linkage peak occurred at 4q23 (lod 1.42 at rs4399995, 99.65 Mb) and the lod score remained >1.0 at ATOH1 (data not shown). ATOH1 regulates SHH signalling (Flora et al. 2009) and progenitor cells expressing ATOH1 give rise to interneurons expressing LHX2 (Gowan et al. 2001); this is noteworthy because both SHH and LHX2 lie very close to linkage peaks in our families (7q36.3 and 9q33, respectively; Table 1).

Linkage analyses of six largest pedigrees

Large pedigrees have the potential to provide strong linkage signals for major predisposition genes that are segregating in that family. Each of six large pedigrees with 10 or more affected genotyped members (mean 19.3 affected) was analyzed for evidence of such major genes by summing lod scores across their component subfamilies (see Materials and methods). Table 2 shows, for each pedigree, the regions that produced lod scores >2.3. No family generated a lod score exceeding 3.3, but all except one family generated at least one lod >2.3. Two families produced multiple regions with lod >2.3, suggesting within-family locus heterogeneity (such as bilineal inheritance) or oligogenic inheritance, rather than single-locus Mendelian-like predisposition. The majority of peaks corresponded to (i.e. contributed to) linkage regions detected in the whole dataset. However, the largest peak produced among these six pedigrees occurred at 18p11.22 (lod 2.9, DOM model, family 3933), a linkage region not detected in the whole dataset. Surprisingly, this corresponds precisely to a previously reported dyslexia linkage region, DYX6 (Bates et al. 2007; Fisher et al. 2002), demonstrating that summing evidence across heterogeneous families can obscure large linkage signals at previously reported locations. The current finding represents strong independent confirmation of the DYX6 susceptibility locus in a single large pedigree.

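Table 2. LOD scores >2.3 in six large pedigrees

| Family ID | # Affected | LODs >2.3 | Chromosome | Peak (cM) | Model | Alpha | Gene | Comments |
|---|---|---|---|---|---|---|---|---|
| 1913 | 35 | 2.8 | 7q36.1-q36.2 | 168–169 cM | DOM | 0.61 | ACTR3B | Fetal brain exp. |
| 3933 | 21 | 2.9 | 18p11.22 [a] | 32–33 cM | DOM | 1 | NAPG | Linked to dyslexia (Fisher et al. 2002) |
| 1919 | 20 | None | | | | | | |
| 1953 | 19 | 2.6 | 1p22.2 | 114 cM | DOM | 1 | | |
| | | 2.6 | 2p24.1-p23.2 | 47–51 cM | DOM | 1 | | |
| | | 2.4 | 6q22.1 [a] | 118 cM | REC | 1 | HS3ST5 | Fetal brain exp. NB: 16p12 linkage at HS3ST2 (Table 1) |
| 1916 | 11 | 2.7 | 17q24.3 | 102 cM | DOM | 1 | | |
| | | 2.5 | 20q13.33 [a] | 106 cM | DOM | 1 | CADH4 | Fetal brain exp. Linked to ADHD (Ogdie et al. 2004) |
| 1160 | 10 | 2.6 | 2p22.2-p21 | 62–70 cM | DOM | 1 | | |

[a] Indicates a linkage region not seen in whole dataset of current study.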

Comparison with our previous studies

Previous studies of a qualitative dyslexia phenotype, using families from which the current study subjects were derived, showed significant linkage signals (e.g. Hsiung et al. 2004; Tzenova et al. 2004) not detected in this study. Several possible factors may contribute to this failure to reproduce strong signals: (1) reduction in the numbers of subjects in this array-based study (n = 718) compared to previous microsatellite-based studies (n = 914); (2) dividing large pedigrees for analysis in Merlin, not required in some previous programs (such as Fastlink); (3) different analysis methods used in previous studies (e.g. maximization-over-models, affected sibpair analyses) and (4) use of many low-information SNPs, rather than highly-informative microsatellites. However, in multipoint analyses one would theoretically expect dense SNPs to provide the same linkage information as microsatellites, assuming allele call accuracy is similar. We tested this expectation by merging data from 237 genome-wide microsatellites into our SNP data, and found that the linkage results were virtually identical using merged vs. SNP-only markers (unpublished results). Thus, differences between our previous findings and those of the current study are likely due to changes in number of subjects, pedigree structure and analytical methods employed, rather than marker type, and do not invalidate our previous findings.

Discussion

Putative linkage regions (lod >2.3)

17q22: Our largest linkage signal (lod 2.80, REC model) occurred at 17q22 (Table 1, Fig. 4) with an estimated 15% of families showing linkage to this region. The peak SNP rs2332933 occurred in an intron of the MSI2 (musashi 2) gene encoding an RNA-binding protein that influences generation and/or maintenance of specific CNS stem cell lineages (Sakakibara et al. 2002). Sakakibara et al. demonstrated that MSI2 and the highly homologous MSI1 gene are strongly co-expressed and may be cooperatively involved in proliferation and maintenance of CNS stem cell populations. It is therefore very interesting that our data also produced a linkage peak only 1.1 Mb from MSI1 on chromosome 12q24.23 (lod 1.67, REC model, Table 1). Significant association has been reported between autism and a marker near MSI1 (Lauritsen et al. 2006). Kawase et al. (2011) recently identified an intronic element within MSI1 that regulates MSI1 transcription in neural stem/progenitor cells; since MSI2 may also have an intronic regulator, it is noteworthy that the MSI2 linkage peak localized within the gene. The 17q22 region has also been reported linked to reading ability in ADHD-affected siblings using 404 genome-wide microsatellite markers (Loo et al. 2004). Using a ‘reading recognition’ phenotype, they obtained a lod score of 2.6 near D17S787 (50.6 Mb), which is remarkably close to our linkage maximum at 52.9 Mb. The current study provides the first independent support for a reading disability locus at 17q22, obtained despite using different phenotypic definitions from those of Loo et al. (2004). The 17q22 region has also shown association with autism: rare exonic dup/del variants at the BZRAP1 gene (53.7 Mb, just distal to MSI2; Fig. 4) were increased in autistic cases and unaffected family members, compared to healthy controls (Bucan et al. 2009).
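The "estimated 15% of families showing linkage" is a heterogeneity (alpha) estimate. A minimal sketch of how such an estimate can arise, assuming the standard admixture heterogeneity lod, in which each family contributes log10[α·10^lod + (1 − α)] at a given position (the per-family lod values and the simple grid search below are illustrative assumptions, not data or code from this study):

```python
import math


def hlod(family_lods, alpha):
    """Heterogeneity LOD at one map position under the admixture model.

    family_lods: per-family LOD scores (log10 scale) at that position.
    alpha:       assumed proportion of linked families, 0 <= alpha <= 1.
    """
    return sum(math.log10(alpha * 10 ** z + (1 - alpha)) for z in family_lods)


def maximize_alpha(family_lods, steps=100):
    """Grid-search alpha over [0, 1]; return (best_alpha, best_hlod)."""
    grid = [i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda a: hlod(family_lods, a))
    return best, hlod(family_lods, best)


# Hypothetical per-family LODs: two linked families, four unlinked.
family_lods = [1.2, 0.9, -0.8, -0.6, -0.7, -0.5]

best_alpha, best_hlod = maximize_alpha(family_lods)
print(sum(family_lods))       # homogeneity LOD (alpha = 1) is negative
print(best_alpha, best_hlod)  # HLOD is positive at an intermediate alpha
```

With alpha fixed at 1 the expression reduces to the ordinary summed lod, and with alpha at 0 it is zero, so the maximized heterogeneity lod is never negative.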

7q36.3: Our second largest lod score (lod 2.57, REC model) occurred on chromosome 7q36.3 at 155.6 Mb, maximizing close to both SHH (sonic hedgehog) and EN2 (engrailed homeobox-2) loci (Table 1, Fig. 5).

SHH is a candidate dyslexia gene, because some individuals with SHH mutations causing holoprosencephaly (incomplete cleavage of the cerebral hemispheres) show delayed speech and learning disabilities as their primary clinical presentation (Hehr et al. 2004). Regarding SHH candidacy, our data also generated a lod score of 1.95 very close to the SIX3 gene on 2p21 (Table 1), and a recent study showed that SIX3 directly regulates SHH transcription, so that SIX3 mutations can also produce holoprosencephaly (Jeong et al. 2008). The 2p21-p22 region has previously shown evidence of linkage (Loo et al. 2004) and association (Francks et al. 2002) with dyslexia. This region appears to be distinct from the more proximal DYX3 dyslexia region at 2p11-p16 (Fagerheim et al. 1999; Kaminen et al. 2003; Petryshen et al. 2002).

The EN2 gene is also of interest because it has been reported associated with autism in several studies (reviewed in Sen et al. 2010). It may be relevant that our data also produced a lod score of 1.4 at 2q14.2 (see above ‘regions with lod scores 1.0–1.5’) only 1.5 Mb from the EN1 gene, since EN2 and EN1 function collaboratively in development of cerebellar organization late in embryogenesis (Cheng et al. 2010). Thus, like MSI2 and MSI1 above, both EN2 and EN1 generated linkage signals, suggesting developmentally interacting pairs of dyslexia-predisposing genes. Future analyses using two-locus disease modelling should address this possibility.

Finally, PTPRN2 is also a regional dyslexia candidate gene. Murine double knock-outs of PTPRN2 and PTPRN show impaired neuroendocrine secretion and changes in behaviour and learning (Nishimura et al. 2009). Furthermore, Lionel et al. (2011) reported a rare inherited duplication of PTPRN2 in two brothers with ADHD. Thus, there are several good candidates in the 7q36.3 region, including SHH, EN2 and PTPRN2. The current study is the first report of suggestive evidence for dyslexia linkage to this region.

7q36.1-q36.2: The next largest linkage signal in our data maximized at the junction of 7q36.1 and 7q36.2, at 152.2 Mb (lod 2.42, REC model, Table 1 and Fig. 5). Conservatively, this linkage peak could reflect the same dyslexia-predisposing locus as the 7q36.3 peak. On the other hand, this peak appears distinct from the 7q36.3 peak using a 1-lod-drop confidence interval (trough lod 1.20 at rs4960690, 154.57 Mb). Furthermore, 7q36.1-q36.2 produces a stronger signal under a DOM model than 7q36.3 (Fig. 5), with one large kindred making a major contribution (family #1913, see Table 2), supporting the possibility of two separate dyslexia loci. The lod score maximized between ACTR3B and DPP6, both good candidate genes. ACTR3B is highly expressed in fetal brain and may influence neuronal morphogenesis and migration during brain development (Jay et al. 2000). Marshall et al. (2008) found dup/dels involving the DPP6 gene in four unrelated children with autism. In addition, studies have reported linkage of the 7q36.1-q36.2 region with autism (Liu et al. 2001, lod 2.1 at D7S483, 151.8 Mb) and autism with developmental dysphasia (Auranen et al. 2002; lod 3.7 at D7S2462, 153.19 Mb). The current study is the first report of suggestive evidence for dyslexia linkage to this region.
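The 1-lod-drop argument above can be made concrete with a short Python sketch. The two peak values and the trough value are taken from the text; the other positions and lod scores, and both function names, are hypothetical fillers so that the example runs. The interval is read off the marker grid without interpolation.

```python
def one_lod_drop_interval(positions, lods, peak_index, drop=1.0):
    """Return (left, right) positions around a peak where lod >= peak - drop."""
    cutoff = lods[peak_index] - drop
    left = peak_index
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak_index
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]


def peaks_are_distinct(positions, lods, i, j, drop=1.0):
    """True if the support intervals of peaks i and j do not overlap."""
    a = one_lod_drop_interval(positions, lods, i, drop)
    b = one_lod_drop_interval(positions, lods, j, drop)
    return a[1] < b[0] or b[1] < a[0]


# Positions in Mb. Values at 152.2, 154.57 and 155.6 Mb come from the text;
# the remaining three points are invented for illustration.
positions = [151.0, 152.2, 153.4, 154.57, 155.6, 156.5]
lods = [1.60, 2.42, 1.90, 1.20, 2.57, 1.80]

print(one_lod_drop_interval(positions, lods, 1))  # interval around 7q36.1-q36.2 peak
print(one_lod_drop_interval(positions, lods, 4))  # interval around 7q36.3 peak
print(peaks_are_distinct(positions, lods, 1, 4))
```

Because the trough lod of 1.20 lies more than one lod unit below both maxima, neither support interval extends across it, and the two peaks are treated as separate under this criterion.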

16p12.1: The 16p12.1 region showed a linkage peak at 22.6 Mb (lod 2.42, NPL model, Table 1 and Fig. 6) with a secondary peak at 16p12.3, 19.1 Mb (rs916767, lod 2.08, Fig. 6). The major peak is immediately upstream of the HS3ST2 gene (heparan sulphate glucosamine 3-O-sulfotransferase 2), which modifies the structure of heparan sulphate (HS) and shows complex spatial-temporal expression patterns in the developing cerebral cortex and cerebellum (Yabe et al. 2005). HS is critical to midline axon guidance in development of the corpus callosum, hippocampal commissure, and anterior commissure (Inatani et al. 2003). Another regional candidate gene is PRKCB (protein kinase C beta, 23.7 Mb), which has been reported associated with autism (Philippi et al. 2005). Protein kinase C is known to regulate cognitive functioning in the prefrontal cortex (Chen et al. 2009) (note also PRKCE at 2p21 linkage region; Table 1). The secondary 16p12.3 peak lies over another candidate gene, SYT17 (synaptotagmin 17), which is abundantly expressed in frontal and temporal lobes (Chin et al. 2006). The 16p11-p13 region has been previously linked to dyslexia in two microsatellite genome-wide scans (16p12.2, D16S3046, lod 2.2–2.4, Loo et al. 2004; 16p11.2, D16S753, lod 2.9, Raskind et al. 2005), to ADHD in two scans (16p13, lod 3.7, Ogdie et al. 2004; 16p13, lod 2.4, Gayan et al. 2005) and to autism (16p13, D16S3102, lod 2.9, IMGSAC 2001). The current study represents additional strong support for dyslexia linkage to the 16p12 region, using a high density SNP map which may assist in finer localization of the susceptibility locus, and highlights again the possible overlapping genetic predisposition of dyslexia with ADHD and autism.

4q13.1: This linkage signal maximized at 59.7 Mb (lod 2.34, DOM model, 20% of families linked, Table 1 and Fig. 7) in a ‘gene desert’ immediately upstream of the large LPHN3 gene (latrophilin 3, 61.7–62.6 Mb), encoding a brain-specific G-protein coupled receptor that is expressed in both fetal and adult brain. Arcos-Burgos et al. (2004) reported linkage of ADHD to this same region. Subsequently, in a large case–control study, Arcos-Burgos et al. (2010) showed significant association of ADHD to three SNPs within the LPHN3 locus (OR 1.21–1.23) and demonstrated that these variants were associated with differences in metabolic brain activity and in response to stimulant medication used to treat ADHD. The association of LPHN3 with ADHD has been confirmed in adult cases (Ribasés et al. 2011). This same chromosomal region was also reported linked to a latent class cluster including ADHD, co-morbid behavioural disorders, and alcohol/nicotine dependence (lod 4.0, D4S3248, 59.7 Mb, Jain et al. 2007) and linked to a dyslexia ‘nonword repetition’ phenotype – maximizing at the same microsatellite as the ADHD latent class (lod 2.43, D4S3248, 59.7 Mb, Brkanac et al. 2008). Our linkage signal maximized upstream of LPHN3; this is consistent with a recent LPHN3 mutation screen showing that coding variants did not account for the observed ADHD association (Domené et al. 2011). We tested for association at 393 SNPs located in the 5 Mb region around the linkage peak, across the LPHN3 gene, and 500 kb downstream of the gene, and found only one SNP with nominal P < 0.01 (rs13141378, P = 0.0036, OR 1.67), located 707 kb upstream of LPHN3. In summary, the current study provides the first independent data supporting linkage of dyslexia to LPHN3, a gene strongly implicated in ADHD and ADHD-related behavioural traits. 
Given the convincing evidence of LPHN3 involvement in dyslexia, it is interesting that our families also produced a linkage signal at 1p31, only 42 kb from the LPHN2 locus (latrophilin 2, lod 1.5, Table 1) which is also expressed in fetal brain.

Other regions (lod 1.5–2.3)

Six of the ten regions with lod scores between 1.5 and 2.3 are interesting since they have been previously reported linked to dyslexia or ADHD. However, these results must be interpreted with caution because they are not statistically significant.

Four of these weaker linkage peaks are in regions implicated in dyslexia by other research groups. Peaks at 2p21 (lod 1.95) and at 4q35.1 (lod 1.78) have already been discussed above (Table 1). Our peak at 4p15.32 (lod 1.95, 16.7 Mb, Table 1) is very close to a linkage peak for reading ability found by Bates et al. (2007) at 4p15.33 (lod 2.1, D4S403, 13.4 Mb). Our signal maximized adjacent to LDB2 (LIM-domain binding 2), which is expressed in fetal and adult brain and implicated in early neuronal differentiation pathways (Azim et al. 2009). Our families also generated a linkage signal at 9q31.2 (lod 1.52, 109.9 Mb, Table 1), which is close to a previous report of linkage to spelling ability at 9q31.3 (lod 1.2, D9S1677, 111.0 Mb, Loo et al. 2004). Our localization is also supported by results of the first high density QTL association genome screen for early reading ability (Meaburn et al. 2008), which found and replicated an association with rs1323381 at 110.10 Mb (although this association was not replicated by Luciano et al. 2011). A possible regional candidate is AL390170 at 110.93 Mb, an uncharacterized clone highly expressed in fetal and adult brain.

Two of our weaker peaks are in regions previously linked to ADHD. The 9q33.3 peak localized right at LHX2 (lod 2.16, 125.9 Mb, 22% of families linked, Table 1), a gene critical to development of cerebral cortex, particularly hippocampus (Mangale et al. 2008). The 9q33.3 region has been previously reported linked to ADHD in two microsatellite genome screens (Bakker et al. 2003, lod 2.1, D9S1825, 126.9 Mb; Arcos-Burgos et al. 2004, 9q33.3, marker not stated). Similarly, our peak at 5p13.3-p13.1 (lod 1.69, 32.1–40.2 Mb, Table 1) has also shown linkage in two independent genome screens of ADHD (Ogdie et al. 2004, lod 2.6, D5S418, 40.1 Mb; Arcos-Burgos et al. 2004, 5p13.3, marker not stated) as well as autism (D5S2494, lod 2.55, 40.3 Mb, Liu et al. 2001).

Conclusions

In summary, this study generated five ‘suggestive’ linkage signals (lod >2.3), of which three provide strong support for previously reported dyslexia linkage loci (at 17q22, 16p12 and 4q13.1), and the remaining two peaks suggest novel dyslexia loci (at 7q36.1-q36.2 and/or 7q36.3) in a region strongly implicated in autism. The study highlights several points: (1) the reproducibility of dyslexia linkage signals suggests the existence of a limited number of dyslexia genes of relatively large effect; (2) now-popular GWAS complement, rather than substitute for, more arduous family linkage studies, because the regions showing the strongest linkage and association signals in our data were nonoverlapping; (3) large pedigrees are invaluable for linkage studies, because major gene effects in individual pedigrees may be obscured by summing linkage signals across numerous genetically heterogeneous families and (4) the remarkable overlap of linkage regions for dyslexia, ADHD and autism suggests common underlying neurodevelopmental genes. The convergence of genetic predisposition for clinically distinct conditions (dyslexia, ADHD, autism) should remind us that the investigator-defined behavioural phenotypes utilized in linkage/association studies might only loosely correlate with the underlying genetic determination of brain structure/function. Thus, rather than searching for ‘dyslexia genes’, we are searching for ‘neurodevelopment genes’ that may underlie a variety of clinical conditions.

Acknowledgments

Supported by grants from the Canadian Institutes of Health Research (grant MT-15661 to L.L.F.), the Alberta Mental Health Research Fund (to B.J.K. and L.L.F.), Alberta Children's Hospital Foundation (B.J.K.), and the Canadian Genetic Diseases Network of Centres of Excellence Programme (L.L.F.), as well as Senior Scientist Awards to L.L.F. from the BC Children's Hospital Foundation and the Alberta Heritage Foundation for Medical Research. We greatly appreciate the assistance of D. Fong in graphics preparation, and thank Affymetrix for generous assistance while our lab was a beta test site. The authors have no conflicts of interest to declare.
