*Dr R. Usha, Manovikas Biomedical Research & Diagnostic Centre, Manovikas Kendra Rehabilitation & Research Institute for the Handicapped, 482 Madudah, Plot I-24, Sector-J, EM Bypass, Kolkata 700 107, India. E-mail: firstname.lastname@example.org
Engrailed 2 (EN2) is a homeobox transcription factor involved in patterning of the cerebellum during brain development. Linkage analysis and studies of knockout mice support EN2, located on chromosome 7q36.3, as a potential risk locus for autism, and candidate gene studies have suggested an association of EN2 with autism spectrum disorder (ASD) in various populations. Here, we investigated the association of five EN2 markers [rs3735653 (C/T) in exon 1; rs34808376 (GC/-) and rs6150410 (CGCATCCCC/-) in the promoter region; rs1861972 (A/G) and rs1861973 (C/T) in the intron] with autism and ASD in the Indian population using a family-based approach. Probands were recruited using the Diagnostic and Statistical Manual of Mental Disorders Fourth Edition (DSM-IV) diagnostic criteria. Genotype distributions conformed to Hardy–Weinberg equilibrium. The intronic single nucleotide polymorphisms (SNPs) were in complete linkage disequilibrium, with A-C and corresponding G-T allelic association. We observed significant preferential transmission of the C allele of rs1861973 from parents to affected offspring [transmission disequilibrium test (TDT): narrow diagnosis likelihood ratio statistic (LRS) = 6.63, P = 0.006; broad diagnosis LRS = 4.47, P = 0.05]. Gender-based analysis showed significant transmission of the C allele to affected females [TDT: LRS = 7.36, P = 0.0025; haplotype-based haplotype relative risk (HHRR): LRS = 7.16, P = 0.02]. Maternal overtransmission of these alleles was also noted (TDT: LRS = 3.65, P = 0.036; HHRR: LRS = 2.81, P = 0.036). Bioinformatic analysis using TFSearch predicted the generation of a Sp1 binding site in the presence of the C allele. The Del-T haplotype of rs34808376-rs1861973 showed increased non-transmission, whereas the Ins-C haplotype showed significant transmission, suggesting a protective effect and a risk, respectively, conferred by these haplotypes in autism etiology.
These results suggest a positive genetic association of EN2 with autism in the Indian population.
Several studies show that En2 plays an important role in the development and patterning of the cerebellum in mice. En2−/− mice show complex changes in social and motor behavior (Cheh et al. 2006) that resemble some features of the autistic phenotype (Fatemi et al. 2002; Palmen et al. 2004). EN2 has also been implicated in serotonergic neuron development (Simon et al. 2005). A cerebellum-specific increase in serotonin level has been reported in En2−/− mice (Cheh et al. 2006), and platelet hyperserotoninemia is the best characterized endophenotype of autism (Cook et al. 1993). Together, these data suggest that EN2 is a potential candidate gene for autism.
The first case–control study, by Petit et al. (1995), reported a significant association between the rs34808376 polymorphism in the EN2 promoter region and autism in 100 cases and 100 controls. Two subsequent family-based studies of the rs3735653 marker (Gharani et al. 2004; Zhong et al. 2003) failed to show any association with autism. However, Gharani et al. (2004) found a significant association of two intronic single nucleotide polymorphisms (SNPs), rs1861972 and rs1861973, with ASD. When the study was repeated with additional data sets, a significant association was again reported for the intronic SNPs, with no involvement of rs34808376, rs6150410 or rs3735653 (Benayed et al. 2005, 2009). In contrast, a recent case–control study of Han Chinese children suggested a protective effect of the A-C haplotype formed by the two intronic SNPs against autism (Yang et al. 2008). A family-based study in the same population reported an association with autism of the A allele of rs3824068 and of specific haplotypes formed by this allele and multiple markers in the exonic and intronic regions of EN2 (Wang et al. 2008). In the same year, a family-based study by Brune et al. (2008) also suggested an association of rs1861972 with ASD.
Converging evidence from functional and genetic studies supports the importance of EN2 in autism pathophysiology. The EN2 gene comprises two exons separated by a single intron and spans an 8.1-kb region of genomic DNA. To date, there has been no report on this gene and its association with autism from the Indian population. In the current study, we therefore investigated the possible association with autism, in the Indian population, of five EN2 markers located in the promoter (rs34808376 and rs6150410), exon 1 (rs3735653) and intron (rs1861972 and rs1861973) (Fig. 1).
Materials and methods
Selection of subjects
The subjects were recruited through the Outpatient Department of our institute and the Assam Autism Foundation, Guwahati, India. All probands were diagnosed according to the Diagnostic and Statistical Manual of Mental Disorders Fourth Edition (DSM-IV) criteria. Further assessment was carried out with the Childhood Autism Rating Scale (CARS), which classifies probands as mild, moderate or severe autistic cases (Schopler et al. 1986). Independent diagnostic evaluations were conducted by a psychiatrist and a clinical psychologist of the institute, as described earlier (Dutta et al. 2008). Informed written consent was obtained from all parents for themselves and their children to participate in the study. A total of 128 families of children with ASD, comprising 105 trios and 23 duos, were considered for the analysis: 98 families from West Bengal (83 male and 15 female probands) and 30 families from various other states of India (23 male and 7 female probands). For the present study, the probands were classified into two groups, broad and narrow diagnosis of autism. The broad diagnosis includes all probands with ASD, whereas the narrow diagnosis includes strictly autistic individuals only. Accordingly, 16 children diagnosed with pervasive developmental disorder-not otherwise specified (PDD-NOS) and one child with Asperger syndrome were included under the broad diagnostic criterion. All but one of the children with PDD-NOS belonged to the West Bengal cohort. The male to female ratio was 4.77:1, and the mean age was 5.98 ± 3.1 years. Subjects with Fragile X syndrome [determined by Fragile X mental retardation protein (FMRP) immunocytochemical staining and methylation-specific polymerase chain reaction (PCR)] or gross chromosomal abnormalities (assessed by karyotyping) were excluded from the study.
A standard data set was used to collect information on prenatal and postnatal history. The study protocol was approved by the Human Ethics Committee of the institute.
Peripheral blood was collected from all subjects. Genomic DNA was extracted with phenol, chloroform and isoamyl alcohol after isolation and lysis of the white blood cells, and the regions encompassing the polymorphisms were then amplified by PCR. The 9-bp Ins/Del variant (rs6150410) was genotyped by gel electrophoresis of the PCR products. The remaining four markers, rs3735653, rs34808376, rs1861972 and rs1861973, were genotyped by restriction fragment length polymorphism (RFLP) analysis using AluI, PvuII, TspRI and Tth111I, respectively (details of PCR and genotyping conditions are provided in Tables S1 and S2). All enzymes were procured from New England Biolabs Inc., Ipswich, MA, USA. The sequences of the amplified PCR products were confirmed by direct sequencing of at least 20 randomly chosen DNA samples per locus on an ABI 3130 Genetic Analyzer.
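To make explicit how an RFLP genotype is read from a gel, the sketch below calls a biallelic genotype from the band sizes observed after digestion. It is an illustration only: the fragment sizes are hypothetical placeholders, not the values in Tables S1 and S2, and the sizing tolerance `tol` is an assumption standing in for gel sizing error.

```python
from typing import Iterable

# HYPOTHETICAL fragment sizes (bp) for one marker; NOT the study's values.
# Allele "C": restriction site present, amplicon cut into two fragments.
# Allele "T": restriction site absent, amplicon left uncut.
EXPECTED = {
    "C": (180, 120),
    "T": (300,),
}


def has_band(observed: Iterable[int], size: int, tol: int) -> bool:
    """True if any observed band lies within tol bp of the expected size."""
    return any(abs(b - size) <= tol for b in observed)


def call_genotype(observed: list[int], expected: dict = EXPECTED, tol: int = 10) -> str:
    """Return a genotype such as 'C/C', 'C/T' or 'T/T' from observed band sizes.

    An allele counts as present only if all of its diagnostic fragments are seen.
    """
    present = [allele for allele, frags in expected.items()
               if all(has_band(observed, f, tol) for f in frags)]
    if len(present) == 1:
        return f"{present[0]}/{present[0]}"
    if len(present) == 2:
        return "/".join(sorted(present))
    raise ValueError(f"band pattern {observed} matches no expected allele")


print(call_genotype([298, 181, 119]))  # heterozygote: cut and uncut products
```

In practice, incomplete digestion leaves uncut product that can mimic a heterozygote, which is one reason to confirm a subset of genotypes by direct sequencing.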
Genotype distributions and Hardy–Weinberg equilibrium of the markers were calculated separately for the probands and their parents using POPGENE version 1.31, which is freely available online at http://www.ualberta.ca/~fyeh/download.htm. The transmission disequilibrium test (TDT) and haplotype-based haplotype relative risk (HHRR) analysis were used as tests of association (Spielman et al. 1993; Terwilliger & Ott 1992), implemented in tdtphase from the UNPHASED program suite version 2.403 (Dudbridge 2003). TDT with phase-certain haplotypes uses the extended transmission disequilibrium test (ETDT) method (Sham & Curtis 1995). HHRR with phase-uncertain haplotypes treats all transmitted haplotypes as ‘cases’ and all untransmitted haplotypes as ‘controls’. For HHRR, the EM algorithm is used to obtain maximum-likelihood estimates of case and control parental haplotype frequencies under both the null and the alternative hypotheses. This procedure is not robust to population stratification, although the permutation option can considerably reduce this problem. For each analysis, 10 000 permutations were performed to compute the global P value, and throughout the analysis we report only these global P values after 10 000 permutations, i.e. the corrected P values. Permutation estimates the significance of the best result while correcting for multiple testing of haplotypes, loci and phenotypes: in each replicate, all selected markers are analyzed and the most significant P value is stored, so that the permutation procedure gives a significance level corrected for the number of haplotypes and markers tested. For TDT and HHRR, all default parameters were used except that the affection status of all parents was set to unaffected, the test was performed for individual haplotypes and the ‘number of random permutations’ was set to 10 000.
The only difference between the two test conditions was that the ‘uncertain haplotypes’ option was selected for the HHRR analysis. The reference haplotype for odds ratio (OR) and relative risk (RR) calculations was set to the allele showing decreased transmission. The UNPHASED program suite is available online at http://www.mrc-bsu.cam.ac.uk/personal/frank/software/unphased. Haploview software version 4.1 was used to compute pairwise linkage disequilibrium (LD) (Barrett et al. 2005); this software also checks for missing data and Mendelian inconsistencies. All power calculations were performed with the Genetic Power Calculator, available online at http://www.pngu.mgh.harvard.edu/purcell/gpc/ (Purcell et al. 2003).
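The max-statistic permutation described above (store the most significant result across markers in each replicate, then compare with the observed best result) can be sketched as follows. This is a simplified illustration under stated assumptions, not the UNPHASED implementation: each informative parent is coded per marker as 1 (allele of interest transmitted), 0 (not transmitted) or None (uninformative); a replicate swaps each parent's transmitted and untransmitted haplotypes with probability 1/2, which flips all of that parent's markers together and so preserves LD between markers; and the per-marker statistic is the biallelic TDT likelihood ratio. The (hits + 1)/(replicates + 1) estimator is one common convention, assumed here.

```python
import math
import random


def tdt_lrs(b: int, c: int) -> float:
    """Biallelic TDT likelihood ratio statistic: ML rate b/(b+c) versus 1/2."""
    n = b + c
    return 2 * sum(k * math.log(2 * k / n) for k in (b, c) if k > 0)


def max_stat(parents: list[list]) -> float:
    """Largest TDT LRS over all markers (None = parent uninformative there)."""
    best = 0.0
    for m in range(len(parents[0])):
        calls = [p[m] for p in parents if p[m] is not None]
        b = sum(calls)
        best = max(best, tdt_lrs(b, len(calls) - b))
    return best


def global_p(parents: list[list], n_perm: int = 10_000, seed: int = 1) -> float:
    """Permutation P value for the best marker, corrected for the markers tested."""
    rng = random.Random(seed)
    observed = max_stat(parents)
    hits = 0
    for _ in range(n_perm):
        # Swap each parent's transmitted/untransmitted haplotypes with prob. 1/2.
        permuted = [
            [None if x is None else 1 - x for x in p] if rng.random() < 0.5 else p
            for p in parents
        ]
        # Small tolerance so ties are not lost to floating-point rounding.
        if max_stat(permuted) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented toy data: 26 parents x 2 markers.
toy = [[1, 1]] * 15 + [[1, None]] * 5 + [[0, 0]] * 6
print(global_p(toy, n_perm=2000))
```

Because only the maximum over markers is compared, a strong result at one marker is penalized for the number of markers examined, which is the sense in which the global P values reported here are corrected.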
The search engine TFSearch, available online at http://www.cbrc.jp/research/db/TFSEARCH.html, was used to search the sequences for putative transcription factor binding sites based on the TRANSFAC database (Heinemeyer et al. 1998). The input sequences encompassed approximately 20 bp upstream and downstream of each polymorphic site. The taxonomy of the matrices was set to ‘vertebrate’ and the threshold score to 85.0.
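TFSearch reports matrix hits with scores on a 0 to 100 scale and keeps those at or above the threshold (85.0 here). The sketch below shows one common way such a score can be computed, by rescaling the raw position weight matrix sum between the worst and best possible sequences; TFSearch's exact scoring may differ. The count matrix is a made-up toy loosely modelled on a GC box, not the TRANSFAC Sp1 matrix, and only the given strand is scanned.

```python
# HYPOTHETICAL toy count matrix for a GC-box-like motif; NOT the TRANSFAC Sp1 matrix.
TOY_MATRIX = [
    {"A": 1, "C": 1, "G": 8, "T": 0},
    {"A": 0, "C": 0, "G": 9, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 1, "G": 7, "T": 1},
]


def window_score(window: str, matrix=TOY_MATRIX) -> float:
    """Score (0-100) of one window, scaled between the minimum and maximum possible sums."""
    raw = sum(col[base] for col, base in zip(matrix, window))
    lo = sum(min(col.values()) for col in matrix)
    hi = sum(max(col.values()) for col in matrix)
    return 100.0 * (raw - lo) / (hi - lo)


def scan(seq: str, threshold: float = 85.0, matrix=TOY_MATRIX) -> list[tuple[int, str, float]]:
    """Return (0-based position, site, score) for each window at or above threshold."""
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) <= set("ACGT"):  # skip windows containing N or other symbols
            s = window_score(window, matrix)
            if s >= threshold:
                hits.append((i, window, round(s, 1)))
    return hits


print(scan("AAGGGCGGAA"))  # C-containing allele: one hit
print(scan("AAGGGTGGAA"))  # C->T at the same position: no hit
```

With this toy matrix the C-containing window scores 100 and the T-containing window about 84.3, just below the threshold, which shows qualitatively how a single base change can create or abolish a predicted site; the numbers themselves carry no biological meaning.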
Figure 1 shows a schematic representation of the EN2 gene, which consists of two exons separated by a single intron, with the locations of the polymorphisms included in this study and the inter-marker distances in base pairs. We analyzed the possible association of five markers (rs34808376 and rs6150410 in the promoter region, rs3735653 in exon 1, and rs1861972 and rs1861973 in the intron) with autism and ASD in the Indian population. The polymorphisms were genotyped systematically and checked for Mendelian inconsistencies. The genotype frequencies for the cohort from West Bengal are shown in Table 1; those for the 30 families from 11 other states, which are ethnically distinct, are provided in Table S6. The genotype distributions of all markers in the different study groups were in agreement with Hardy–Weinberg equilibrium (data not shown). After the initial genotyping of 388 samples, we found that rs1861972 (A/G) is in complete LD with rs1861973 (C/T) in the Indian population (D′ = 1, r² = 1): allele A always occurs with C, and G with T. Further analyses were therefore performed with the rs1861973 SNP only.
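The Hardy–Weinberg check mentioned above was done in POPGENE; for readers who want the test spelled out, the sketch below gives the standard Pearson chi-square version (1 df) for one biallelic marker. POPGENE's implementation may differ in detail, and the genotype counts in the usage lines are invented, not taken from Table 1.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) of Hardy-Weinberg equilibrium.

    Takes genotype counts for a biallelic marker; returns (statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Invented counts for 98 individuals.
print(hwe_chi2(50, 40, 8))   # exactly at HWE proportions: chi-square ~ 0, P ~ 1
print(hwe_chi2(30, 20, 48))  # strong heterozygote deficit: very small P
```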
Table 1. Analysis of genotypic distribution of EN2 markers for the samples from West Bengal
Genotype count (frequency) under broad diagnosis of autism (probands, n = 98; parents, n = 183) and narrow diagnosis of autism (probands, n = 82; parents, n = 156).
Family-based association analyses were performed on the combined data set from West Bengal and other states using TDT and HHRR (Tables 2 and 3). For the rs3735653, rs34808376 and rs6150410 markers, neither analysis showed biased transmission of any allele. However, TDT analysis of rs1861973 showed significant preferential transmission of the C allele from parents to affected offspring [likelihood ratio statistic (LRS) = 4.47, global P = 0.05]. This bias was more pronounced for the narrow diagnosis (LRS = 6.63, global P = 0.006). The power of this study was 0.56 for the narrow and 0.39 for the broad diagnosis of autism, assuming a prevalence of 0.002 and 0.006, respectively; the allele frequencies used for the power calculation are those given in Table 1. Similarly, HHRR analysis showed significant overtransmission of the C allele to probands (broad diagnosis LRS = 3.49, global P = 0.027; narrow diagnosis LRS = 5.24, global P = 0.006), as shown in Table 3. When the probands were separated by gender, rs1861973 showed highly significant overtransmission of the C allele to female autistic probands (Table 4) by both TDT (LRS = 7.36, global P = 0.0025) and HHRR (broad diagnosis LRS = 7.16, global P = 0.02; narrow diagnosis LRS = 7.29, global P = 0.02). No such biased transmission was observed for male probands (data provided in Table S5). When testing for a parent-of-origin effect, we observed significant maternal overtransmission of the C allele of rs1861973 under the narrow diagnosis (TDT: n = 31, 19 transmissions vs. 9 non-transmissions, LRS = 3.65, df = 1, global P = 0.036; HHRR: n = 93, 75 transmissions vs. 65 non-transmissions, LRS = 2.81, df = 1, global P = 0.036).
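For a single biallelic marker, the TDT likelihood ratio statistic has a closed form in terms of the number of transmissions (b) and non-transmissions (c) of an allele from heterozygous parents. The sketch below is our own reconstruction, not the UNPHASED code, but with the maternal counts given above (19 transmissions vs. 9 non-transmissions) it reproduces the reported LRS of 3.65. The classical McNemar-type TDT statistic of Spielman et al. (1993) is shown for comparison.

```python
import math


def tdt_lrs(b: int, c: int) -> float:
    """Likelihood ratio statistic (1 df) for a biallelic TDT.

    b: transmissions of the allele of interest from heterozygous parents
    c: non-transmissions of that allele from heterozygous parents
    The null fixes the transmission probability at 1/2; the alternative
    uses its maximum-likelihood estimate b / (b + c).
    """
    n = b + c
    return 2 * sum(k * math.log(2 * k / n) for k in (b, c) if k > 0)


def mcnemar(b: int, c: int) -> float:
    """Classical TDT chi-square: (b - c)^2 / (b + c)."""
    return (b - c) ** 2 / (b + c)


# Maternal transmissions of the rs1861973 C allele, narrow diagnosis.
print(round(tdt_lrs(19, 9), 2))  # 3.65
print(round(mcnemar(19, 9), 2))  # 3.57
```

The two statistics are close for moderate counts; the global P values reported here additionally reflect the permutation correction and are not simple chi-square tail probabilities of these statistics.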
Table 2. Family-based association analyses using TDT employing the UNPHASED program suite
TDT (C allele): broad (n = 8), narrow (n = 7). HHRR (C allele): broad (n = 17), narrow (n = 16).
*TDT, n = no. of informative trios; HHRR, n = all complete trios.
§Likelihood ratio statistics.
¶P value after 10 000 permutations; bold figures represent significant P values.
Pairwise LD was computed for the probands from West Bengal (Fig. 2). LD between rs34808376 and rs3735653 was weak, with a normalized LD coefficient (D′) of 0.25, whereas the other marker pairs showed moderate LD (Fig. 2). Although D′ between rs6150410 and rs1861973 was 1, r² was only 0.053; this D′ value is most likely an overestimate arising because not all four possible haplotypes were observed. The LD structure for all affected children, including the 30 children from other states, is provided in Fig. S1.
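Why D′ can reach 1 while r² stays small is easiest to see from the definitions. Whenever one of the four possible two-locus haplotypes is absent, |D′| = 1 regardless of allele frequencies, whereas r² also depends on how closely the allele frequencies at the two loci match. The sketch below computes both from haplotype frequencies; the frequencies in the usage lines are invented, and in practice they must first be estimated from unphased genotypes (e.g. by an EM algorithm).

```python
def ld_stats(hap_freqs: dict[str, float]) -> tuple[float, float]:
    """Return (D', r^2) for two biallelic loci from haplotype frequencies.

    Keys are two-letter haplotypes: 'A'/'a' at locus 1, 'B'/'b' at locus 2.
    Haplotypes not listed are treated as having frequency 0.
    """
    f = {h: hap_freqs.get(h, 0.0) for h in ("AB", "Ab", "aB", "ab")}
    p1 = f["AB"] + f["Ab"]  # frequency of allele A at locus 1
    p2 = f["AB"] + f["aB"]  # frequency of allele B at locus 2
    d = f["AB"] - p1 * p2
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Only two haplotypes (as for rs1861972/rs1861973): D' = 1 and r^2 = 1.
print(ld_stats({"AB": 0.6, "ab": 0.4}))
# Three haplotypes with a rare allele at locus 1: D' = 1 but r^2 is about 0.12.
print(ld_stats({"AB": 0.70, "Ab": 0.25, "ab": 0.05}))
```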
Further association analyses were conducted using 11 haplotype combinations of the four markers. TDT (Table 5) showed significant preferential transmission of rs3735653-rs1861973 and rs34808376-rs1861973 haplotypes for the broad and narrow diagnostic groups, respectively. HHRR analysis supported this biased transmission for both the broad and the narrow diagnosis (Table 6) and, in addition, showed a significant transmission pattern for the rs6150410-rs1861973 haplotypes in the narrow diagnostic group. Details of the haplotype transmission data underlying these significant findings are provided in Tables S3 and S4. These data show significant overtransmission of the haplotypes formed by the common alleles, and significant non-transmission of those formed by the rare alleles, to affected offspring.
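The figure of 11 haplotype combinations is consistent with testing every subset of two or more of the four retained markers (six pairs, four triplets and one four-marker haplotype). That the study used exactly this set is our inference from the count rather than something stated in the text; the sketch below simply enumerates the subsets, with markers listed in the order used in the text rather than necessarily in genomic order.

```python
from itertools import combinations

# Four markers retained for haplotype analysis (rs1861972 dropped: complete LD with rs1861973).
MARKERS = ["rs34808376", "rs6150410", "rs3735653", "rs1861973"]

# Every combination of two or more markers: C(4,2) + C(4,3) + C(4,4) = 6 + 4 + 1.
haplotype_sets = [
    combo
    for size in range(2, len(MARKERS) + 1)
    for combo in combinations(MARKERS, size)
]

for combo in haplotype_sets:
    print("-".join(combo))
print(len(haplotype_sets))  # 11
```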
TFSearch was used to examine putative transcription factor binding sites created or deleted by each polymorphism (Table 7); the probable positions of these sites are shown in Fig. 1. A Sp1 binding site with a score of 90.4 was predicted at the rs1861973 site when the C allele is present, and the site is lost when C is replaced by T. No transcription factor binding site was predicted at rs1861972 for either allele. For rs6150410, insertion of the 9-bp stretch constituting the Ins allele generated GATA-1, GATA-2 and MZF1 binding sites, with scores of 92.2, 88.9 and 87.8, respectively, whereas a CdxA binding site adjacent to this marker was unchanged in both variants. The rs34808376 dinucleotide polymorphism showed no potential binding site for either allele.
†Underlines indicate the transcription factor binding sites.
In the present study, we investigated the genetic association of the EN2 gene with autism (narrow diagnosis) and ASD (broad diagnosis) in the Indian population using a candidate gene approach. To our knowledge, this is the first report on this gene in relation to autism from this population. Five markers located in the promoter, exon 1 and intronic region of EN2 were selected. Family-based TDT analysis showed that the two intronic markers, which are in complete LD, are associated with autism/ASD, with significantly higher transmission of the C allele of rs1861973 (and hence the A allele of rs1861972) to affected offspring. The bias was highly significant for female probands, suggesting a gender-specific transmission pattern that has not been reported previously in any population. We also observed significant non-transmission of the Del-T haplotypes of rs34808376-rs1861973 and rs6150410-rs1861973 to the probands, suggesting a protective role for these haplotypes in the etiology of autism.
EN2 was considered a candidate gene on the basis of the following evidence. First, En2 is well established as a regulator of both embryonic and postnatal development of the mouse cerebellum and hindbrain: En2 knockout mice exhibit cerebellar hypoplasia, foliation defects and reduced Purkinje cell numbers. These mice also display behavioral deficits such as decreased play, reduced social sniffing and low aggressive behavior. Similar neuroanatomical and behavioral alterations are observed in autistic individuals (Baader et al. 1998; Cheh et al. 2006; Jankowski et al. 2004; Kuemerle et al. 1997, 2007; Millen et al. 1994). A cerebellum-specific increase in serotonin level was also observed in the knockout mice; defects in the serotonergic system have been implicated in autism, and platelet hyperserotoninemia is one of its endophenotypes. Functional magnetic resonance imaging (fMRI) studies show that the cerebellum is involved in a number of key cognitive functions, including attention and the processing of language, music, problem solving and other sensory temporal stimuli, and these functions are altered in the majority of autistic subjects (Akshoomoff et al. 1997; Corina et al. 2003; Courchesne 1997; Courchesne & Allen 1997; Gao et al. 1996; Kim et al. 1994). Second, EN2 maps to chromosome 7q36.3, a region linked to autism in genome-scan studies. Together, this evidence led us to propose EN2 as a possible candidate gene for autism. Testing this requires analysis of variants at different marker loci, and although such studies in other populations point to EN2 as a risk locus for autism, no reports were available for the Indian population.
The first case–control study (Petit et al. 1995), in a French population, reported a significant association of rs34808376 with autism, but two subsequent family-based studies (Benayed et al. 2005; Gharani et al. 2004) failed to show any association of this marker with autism. Our findings agree with the latter reports: we detected no preferential transmission of rs34808376 alleles from parents to affected offspring and hence found no evidence of association. Earlier family-based studies of the rs3735653 polymorphism in exon 1 (Benayed et al. 2005; Gharani et al. 2004; Zhong et al. 2003) did not indicate any association with autistic disorder, although Wang et al. (2008) reported an association with autism of haplotypes formed by rs3735653 and intronic SNPs. Consistent with the earlier reports, the present study showed no association of this SNP alone with the disorder. However, haplotypes including rs3735653 were associated with autism under both broad and narrow diagnosis, supporting the report by Wang et al. (2008).
The two intronic SNPs, rs1861972 and rs1861973, have attracted the most attention in this regard. These SNPs are located 152 bp apart and are in strong LD in all populations studied so far; in the present study, they were in complete LD, with A-C/G-T allelic associations. They have been consistently reported to be associated with autism, both individually and as the A-C haplotype (Benayed et al. 2005, 2009; Brune et al. 2008; Gharani et al. 2004). However, two recent studies in the Han Chinese population have painted a different picture (Wang et al. 2008; Yang et al. 2008). In their family-based study, Wang et al. (2008) found no association of rs1861972 or rs1861973 individually with autism, but observed a significant association with haplotypes of multiple markers, including rs3735653, rs1861972 and rs1861973. In contrast, a case–control study of the same population by Yang et al. (2008) suggested that the A-C haplotype might in fact confer a protective effect. As in Caucasian populations, our preliminary observations show that the C allele of rs1861973 (and hence the A allele of rs1861972) is transmitted more frequently to affected ASD probands. This pattern was clearer when only strictly autistic cases (narrow diagnosis) were considered, and such a transmission bias suggests that the C allele confers risk for autism. To examine whether these SNPs might have a regulatory role in transcription, we performed a bioinformatic search for putative transcription factor binding sites in this region. Although no binding site was predicted at rs1861972, a Sp1 binding site was generated at rs1861973 when the autism-associated C allele was present (Fig. 1), suggesting that this SNP may be involved in the transcriptional regulation of EN2. A very recent functional study by Benayed et al. (2009) identified the A-C haplotype of rs1861972-rs1861973 as a risk haplotype, which also supports our present finding.
Autistic disorder shows a marked gender bias, being more common in males. We therefore studied the preferential transmission of alleles separately to male and female offspring. The C allele was strongly overtransmitted to female probands (n = 17), and this effect was contributed solely by the female cases of the narrow diagnosis group. This observation suggests a possible gender-specific functional significance of the EN2 protein. In addition, a maternal transmission bias of this allele was observed under the narrow diagnosis. Besides being expressed in the mid-hindbrain region of early embryos, En2 is expressed in mandibular arch myoblasts of mouse embryos (Davis & Joyner 1988; Davis et al. 1991; Logan et al. 1993). Loss of En2 in jaw muscle has been reported to shift fiber metabolic properties exclusively in the jaws of female mice (Degenhardt & Sassoon 2001). Because the jaw muscles are sexually dimorphic, the authors proposed that the function of this protein is integrated with the mechanisms leading to sexual dimorphism of these muscles. A similar interaction might explain the gender-specific transmission bias observed here; however, no information is available on such a mechanism, or on any hormonal influence on its regulation, in the central nervous system. The maternal transmission bias also raises the possibility of genomic imprinting, and imprinted regions have been identified on chromosome 7q. These results, however, need to be confirmed in a larger sample.
We observed comparatively strong LD between all pairs of markers except rs34808376 and rs3735653, and haplotype analyses were carried out on the basis of this LD information. TDT and HHRR analyses showed association of haplotypes formed by rs34808376-rs1861973 and rs6150410-rs1861973 with autism/ASD (Tables 5 and 6). Significant non-transmission of the Del-T haplotypes (rs34808376-rs1861973, rs6150410-rs1861973) and significant overtransmission of the Ins-C haplotype (rs34808376-rs1861973) suggest that these haplotypes are protective and risk-conferring, respectively (Tables S3 and S4). In all these association analyses, 10 000 permutations were carried out to correct for multiple testing. The bioinformatic prediction that the insertion allele of rs6150410 generates three additional putative transcription factor binding sites might suggest a functional role for this marker in the etiology of autism.
In conclusion, this preliminary candidate gene study provides suggestive evidence for a role of the EN2 gene in the pathology of autism and ASD in the Indian population. However, replication of these findings in larger samples and in different ethnic groups is warranted to strengthen this hypothesis.
Financial assistance to B.S. in the form of a Senior Research Fellowship from the Council of Scientific and Industrial Research (Govt. of India) is gratefully acknowledged. The work was initially supported by a Senior Research Scholarship from the Lady Tata Memorial Trust, India, to B.S. We gratefully acknowledge the helpful discussion and advice provided by Dr Manoranjan Singh, Research Director of the institute, during the preparation of the manuscript. We thank Dr K. Zaman Ahmed, Pain Clinic of North East India, Guwahati, for the clinical contribution to this study. We are also grateful to the families who have been our partners in this research.