Evaluating Mitochondrial DNA Variation in Autism Spectrum Disorders

Authors


Corresponding author: JACOB L. MCCAULEY, PhD. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, 1501 NW 10th Avenue, BRB-307 (M-860) Miami Florida 33136 Tel: 305-243-4578 Fax: 305-243-2396; E-mail: jmccauley@med.miami.edu

Summary

Despite the increasing speculation that oxidative stress and abnormal energy metabolism may play a role in Autism Spectrum Disorders (ASD), and the observation that patients with mitochondrial defects have symptoms consistent with ASD, there are no comprehensive published studies examining the role of mitochondrial variation in autism. Therefore, we have sought to comprehensively examine the role of mitochondrial DNA (mtDNA) variation with regard to ASD risk, employing a multi-phase approach.

In phase 1 of our experiment, we examined 132 mtDNA single-nucleotide polymorphisms (SNPs) genotyped as part of our genome-wide association studies of ASD. In phase 2 we genotyped the major European mitochondrial haplogroup-defining variants within an expanded set of autism probands and controls. Finally in phase 3, we resequenced the entire mtDNA in a subset of our Caucasian samples (∼400 proband-father pairs). In each phase we tested whether mitochondrial variation showed evidence of association to ASD. Despite a thorough interrogation of mtDNA variation, we found no evidence to suggest a major role for mtDNA variation in ASD susceptibility. Accordingly, while there may be attractive biological hints suggesting the role of mitochondria in ASD our data indicate that mtDNA variation is not a major contributing factor to the development of ASD.

Introduction

Autism Spectrum Disorders (ASDs) are neurobehavioral disorders characterized by deficits in social abilities, problems with language and communication, and the presence of patterns of repetitive behaviors, restricted interests, and resistance to change. ASD has an estimated population prevalence of one in every 1000 individuals, with a male:female ratio of 4:1 (Fombonne, 2002; Fombonne, 2009). Autism, Asperger's syndrome, Rett syndrome, and other pervasive developmental disorders are more generally classified as ASDs and may affect as many as one in 88 children in the United States (Autism and Developmental Disabilities Monitoring Network Surveillance Year 2012). Little is known about the etiology of ASD; however, evidence from numerous studies indicates that idiopathic autism has a complex genetic etiology. Twin and sibling studies overwhelmingly suggest a strong genetic component and high heritability for ASD. Studies show a concordance rate of ∼60% for classic autism and up to ∼90% for ASD among monozygotic (MZ) twins and <10% among dizygotic (DZ) twins (Folstein & Piven 1991; Ritvo et al., 1985; Bailey et al., 1995). Numerous linkage and association studies, including genome-wide association studies (GWAS) and candidate gene studies, have failed to characterize an appreciable amount of the genetic variation believed to be involved in this devastating disease. These approaches have identified a multitude of possible locations and genes for susceptibility, but few consensus regions or genome-wide significant associations have resulted (Anney et al., 2010; Autism Genome Project Consortium et al., 2007; Ma et al., 2009; Wang et al., 2009; Weiss et al., 2009). Recently, copy number variants have been shown to explain some of the variation in ASD, strengthening the hypothesis that multiple sources contribute to ASD etiology (Salyakina et al., 2011; Griswold et al., 2012).

The mitochondrial genome is small and circular (16,569 base pairs), possesses a distinct code from the nuclear genome (Wallace et al., 1999) and has a unique maternal inheritance pattern. Multiple copies of the mitochondrial DNA (mtDNA) are contained in each mitochondrion, and some may differ in sequence, a phenomenon called “heteroplasmy.” The mtDNA encodes 13 protein subunits of the mitochondrial electron transport chain and a distinct set of rRNAs and tRNAs, all of which are critical for life-sustaining oxidative phosphorylation and energy generation (Wallace, 1994). Relative to the nuclear genome, the mitochondrial genome has been understudied in the search for common genetic variation associated with human disease, despite the fact that mitochondria play a vital role in cellular energy production (Papa, 1996; Wallace, 1997; Wallace et al., 1999).

Variation in mitochondrial DNA has been examined in numerous neurological, age-related common genetic diseases (van der Walt et al., 2003; Howell et al., 2005; Raule et al., 2007; Canter et al., 2008; Rollins et al., 2009). Increased production of reactive oxygen species (ROS) due to mitochondrial respiratory activity and the resultant damage to both mtDNA and nuclear DNA have long been implicated in disease (Feig et al., 1994; de Zwart et al., 1999; Wallace et al., 1999; Penta et al., 2001; Scheffler, 2001; Kang & Hamasaki, 2003). There has been increasing speculation that oxidative stress and abnormal energy metabolism may play a role in ASD, consistent with some level of mitochondrial dysfunction (Lombard, 1998; Chugani et al., 1999; Clark-Taylor & Clark-Taylor 2004; Oliveira et al., 2005; Chauhan & Chauhan 2006; Rossignol et al., 2007), although this remains somewhat controversial (Lerman-Sagie et al., 2004). In addition, numerous clinical reports have described patients with mitochondrial disorders or mutations who have symptoms consistent with ASD (Graf et al., 2000; Fillano et al., 2002; Filipek et al., 2003; Pons et al., 2004; Oliveira et al., 2005; Poling et al., 2006; Tsao & Mendell 2007). Further, mitochondrial inheritance is consistent with the observed increased neuropsychological abnormalities in the mothers of ASD children (Baron-Cohen et al., 1997; Bishop et al., 2004; Constantino & Todd 2005). To date there has been one investigation of the mitochondrial haplogroups in ASD (Kent et al., 2008). The small sample size (n = 162) generated borderline significant results, but the study was underpowered to detect anything but large main effects with odds ratios > 2.0.

Additional evidence for the role of mitochondria in autism stems from data regarding the SLC25A12 gene, a nuclear encoded protein for the mitochondrial aspartate/glutamate transporter, ARALAR (De Zwart et al., 1999; Satrustegui et al., 2007). Mice with a homozygous deletion of the Aralar gene develop severe birth defects soon after birth and die approximately 20 days postnatal. In addition, aralar deficiency causes a large drop in aspartate and its derivative N-acetylaspartate (NAA) in the brain and in primary neuronal cultures. Interestingly, NAA is commonly used in 1H-NMR spectroscopy as a potential diagnostic marker for neuronal function or loss (Pan & Takahashi 2005; Tsai & Coyle 1995), and is reduced in certain brain regions of autistic patients (Otsuka et al., 1999). In humans, two polymorphisms within SLC25A12 were associated with an increased risk of autism in a dataset of 411 families (Ramoz et al., 2004). A positive replication was found in a relatively homogeneous dataset of Irish autism families (Segurado et al., 2005), but negative reports in more heterogeneous datasets (Rabionet et al., 2006; Blasi et al., 2006; Correia et al., 2006) have also been published. There are several possible explanations for the inconsistency of results, including underpowered datasets and genetic heterogeneity, which could arise from having differing mitochondrial haplogroup backgrounds that may affect SLC25A12 function.

In summary, the potential role of mitochondrial variation in ASDs remains intriguing and warrants thorough investigation. While the nuclear genome has been the focus of countless studies over the past couple of decades in search for autism susceptibility genes, our study marks the largest examination of the mitochondrial genome in ASD, and the first report of mtDNA resequencing in ASD.

Materials and Methods

Patient Ascertainment and Description

Individual patient samples included in this study (n = 1298) (Table 1) consist of samples ascertained at the John P. Hussman Institute for Human Genomics (HIHG) at the University of Miami, Miller School of Medicine (Miami, Florida) (n = 668), the University of South Carolina (Columbia, South Carolina) (n = 317), the Center for Human Genetics Research at the Vanderbilt University (Nashville, Tennessee) (n = 108), and samples obtained from the Autism Genetic Resource Exchange (AGRE) (n = 205) (Autism Genetics Resource Exchange, 2008). Families were enrolled through a multi-site genetics study of autism and recruited via family support groups, advertisements, and clinical and educational settings. All participants were ascertained and sampled according to approved Institutional Review Board (IRB) protocols. Participants with ASD met the following minimum criteria for inclusion: (1) chronological age between 3 and 21 years of age; (2) presumptive clinical diagnosis of ASD; (3) expert clinical determination of ASD diagnosis using DSM-IV criteria supported by the Autism Diagnostic Interview (ADI-R) (Rutter et al., 2003b). Diagnostic determination was based on review by clinical psychologists with extensive experience in autism and related disorders. In those instances where an ADI-R was not available, a best-estimate diagnosis was assigned using all available clinical information including clinician summaries, caregiver report, and medical records; (4) minimal developmental level of 18 months as determined by the Vineland Adaptive Behavior Scale (VABS) (Sparrow et al., 1984) or the VABS-II (Sparrow et al., 2005) or an IQ equivalent > 35. These minimal developmental levels assure that ADI-R results are valid and reduce the likelihood of including individuals with severe mental retardation. 
We excluded participants with severe sensory problems (e.g., visual impairment or hearing loss), significant motor impairments (e.g., failure to sit by 12 months or walk by 24 months), or identified metabolic, genetic, or progressive neurological disorders. Family history and pedigree information (including any known health and psychiatric history of family members) was collected in a standard semi-structured interview with a biological parent of the proband, frequently the mother. Phenotypic data regarding the family also was collected through a review of available medical and psychiatric records of the proband and/or affected sibling, as well as review of available photographs of the proband, siblings and parents in the patient charts. Confounding by race and ethnicity was addressed using both a stratified analysis and a principal components analysis (PCA) in phase 1, haplogroup definition in phase 2, and a homogeneous sample of self-reported Caucasian non-Hispanic individuals in phase 3.

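Table 1. Race/Ethnicity dataset descriptions

| Race | Ethnicity | Overall unique* Cases | Overall unique* Controls | Phase 1 (Illumina) Cases | Phase 1 (Illumina) Controls | Phase 2 (Sequenom) Cases | Phase 2 (Sequenom) Controls | Phase 3 (Affymetrix) Cases | Phase 3 (Affymetrix) Controls |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Caucasian | Non-Hispanic | 885 | 1524 | 596 | 1212 | 551 | 1391 | 379 | 372 |
| Caucasian | Hispanic | 78 | 163 | 63 | 163 | 29 | 22 | 0 | 0 |
| Caucasian | Unknown | 145 | 708 | 167 | 182 | 165 | 172 | 0 | 0 |
| African American | Non-Hispanic | 41 | 139 | 40 | 143 | 6 | 3 | 0 | 0 |
| African American | Unknown | 32 | 21 | 32 | 21 | 2 | 1 | 0 | 0 |
| Other/Unknown** | | 117 | 91 | 66 | 42 | 65 | 52 | 0 | 0 |
| TOTAL | | 1298 | 2646 | 964 | 1763 | 818 | 1641 | 379 | 372 |

*As different phases of this project contain overlapping samples, this represents the non-overlapping unique sample set.
**Contains individuals of self-reported race and ethnicity of American Indian, Asian, and other/unknown.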

Control Ascertainment and Description

Control samples utilized in this study (n = 2,646) were obtained from multiple sources (Table 1). Healthy children (n = 513) between the ages of 4 and 21 years were recruited by the HIHG. Participants were screened for eligibility using a series of preliminary questions to determine whether the child, his or her parent, or sibling had been diagnosed with a developmental, behavioral, neurological or other disability or condition. If none of those conditions were present, parents of minor children or participants reviewed and signed the informed consent and completed the Social Communication Questionnaire (Rutter et al., 2003a) to screen for potential ASDs. These control participants provided a saliva sample for these and other ongoing genetic studies. A second set of control individuals (n = 327) were part of ongoing studies of preterm birth. These samples were collected via the Centennial Medical Center in Nashville from the cord blood of term pregnancies (>37 weeks gestation). Mothers between the ages of 18 and 40 were recruited for this collection, with cord blood being collected from live singleton births. Our third set of control individuals (n = 582) were from the National Institute of Mental Health (NIMH) Human Genetics Initiative and consist of non-Hispanic European-ancestry DNA samples made available through this resource. These samples and their corresponding data have been used by multiple investigators through permission and collaboration with the Center for Collaborative Studies of Mental Disorders resource (https://www.nimhgenetics.org). Finally, given that mtDNA is maternally inherited, we included the unaffected fathers (n = 1,224) from the ascertained autism families (n = 614 HIHG; n = 290 South Carolina; n = 106 Vanderbilt; n = 214 AGRE) as additional controls for these experiments.

Phase 1 Dataset (Illumina Genotyping)

Description and genotyping

A total of 2727 samples (964 cases, 1763 controls) were included in the phase 1 experiment (Table 1; Fig. 1). Cases were selected from among our ASD families, with a single case selected (probands were chosen preferentially) from each nuclear family. Additional cases were chosen from 76 extended multiplex families provided they did not share mitochondrial lineage (i.e., each case chosen had a unique maternal founder). Control data came from the fathers of probands (n = 923) and the pediatric controls from the HIHG and the preterm birth study (n = 840).

Figure 1.

A graphical representation of the overlap among the three phases of this study for a) samples and b) markers.

Analysis

Individual sample data was selected for cases and controls following a comprehensive quality control (QC) analysis of these samples using the autosomal markers of the GWAS panel as previously reported (Ma et al., 2009; Wang et al., 2009). Of the total 163 mtDNA single-nucleotide polymorphisms (SNPs) genotyped across both the Illumina 1M and Illumina 1M Duo BeadChip arrays, we examined the 132 common to both platforms. These SNPs were examined for call rate (requiring >95% for inclusion). Using the PLINK analysis software package, we next examined the call rate of samples and dropped any sample with a call rate below 95% in this set of SNPs (Purcell et al., 2007). We further checked both samples and SNPs by examining inconsistencies between mother and child genotypes at these 132 mtDNA SNPs with the use of the PLATO software package (Grady et al., 2010). This resulted in the removal of one case sample. Because the assay is not sensitive enough to detect heteroplasmy, we set all heterozygous calls to missing.
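The call-rate filters and the masking of heterozygous calls described above can be sketched as follows. This is a minimal illustration, not the PLINK or PLATO workflow used in the study: the data layout (one list of single-letter calls per sample), the missing code "N", the use of any non-ACGT letter to stand for a heterozygous call, and the function names are all assumptions made for the example.

```python
MISSING = "N"                 # assumed code for a missing call
HOMOPLASMIC = set("ACGT")     # any other letter is treated as a heterozygous call

def call_rate(calls):
    """Fraction of non-missing calls; 0.0 for an empty sequence."""
    if not calls:
        return 0.0
    return sum(c != MISSING for c in calls) / len(calls)

def qc_filter(genotypes, snp_names, threshold=0.95):
    """SNP call-rate filter, then sample call-rate filter, then mask heterozygous calls.

    genotypes: dict mapping sample ID -> list of calls, one per SNP in snp_names.
    Returns (filtered genotypes, names of retained SNPs).
    """
    samples = list(genotypes)
    # 1) Keep SNPs whose call rate across samples exceeds the threshold.
    keep = [j for j in range(len(snp_names))
            if call_rate([genotypes[s][j] for s in samples]) > threshold]
    kept_snps = [snp_names[j] for j in keep]
    reduced = {s: [genotypes[s][j] for j in keep] for s in samples}
    # 2) Drop samples whose call rate over the retained SNPs is below the threshold.
    reduced = {s: g for s, g in reduced.items() if call_rate(g) >= threshold}
    # 3) Set heterozygous calls to missing.
    masked = {s: [c if c in HOMOPLASMIC else MISSING for c in g]
              for s, g in reduced.items()}
    return masked, kept_snps
```

The mother-child consistency check is omitted here because it depends on pedigree information that this toy layout does not carry.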

To test for association while accounting for possible confounding by population stratification, we performed a stratified analysis, using the Cochran–Mantel–Haenszel (CMH) test, as implemented in PLINK, with genetically defined clusters generated with the software program CLUSTER (Table S1, Fig. S1). CLUSTER was developed in-house as an alternative to the STRUCTURE software application (Pritchard et al., 2000). It uses Ward's clustering algorithm to assign individuals to populations on the basis of information from multiple loci. In addition, to address the issue of population stratification on a finer scale, we conducted Eigenstrat analysis using the 132 mtDNA SNPs (Price et al., 2006; Biffi et al., 2010). This PCA was used to infer continuous axes of genetic variation which control for ancestry in the place of the categorical self-reported ethnicity variable. Eigenstrat analysis resulted in the exclusion of 49 samples with eigenvector values that were ≥6 standard deviations from the mean of principal components 1, 2 and 3 (mtPCs). The mitochondrial genomic inflation factor (mtGIF) was used as a measure of deviation from the median of the test statistic distribution. Association analysis was performed using logistic regression as implemented in PLINK, with mtPC1, mtPC2 and mtPC3 used as covariates in the analysis (Table 2). The mtGIF = 1.0 with the inclusion of these principal components. We did not attempt to incorporate autosomal data to further correct for mitochondrial population substructure, as data suggest this results in little improvement (Biffi et al., 2010). Furthermore, we conducted permutation testing to assess the significance of our results, using PLINK (--mperm 10000 --model-trend options) on the self-reported Caucasian non-Hispanic subset of the phase 1 dataset.
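The mtGIF is described only briefly above. As a hedged sketch, assuming it follows the usual genomic-control definition (the observed median of 1-degree-of-freedom χ2 statistics divided by the median expected under the null, about 0.455), it could be computed as below. The function name and the input list are illustrative and are not taken from the study's software.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation_factor(chi2_stats):
    """Observed median test statistic divided by its expected value under the null."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1.0, as reported after including the three mtPCs, indicates that the bulk of the test statistics behaves as expected in the absence of stratification.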

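Table 2. Nominally significant (P < 0.05) logistic regression results of mtDNA SNPs analyzed in phase 1

| SNP | Basepair position | Minor allele | MAF in controls | MAF in mtDB | P value | Amino acid change | Region |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MitoG228A | 228 | A | 0.05 | 0.03 | 0.03 | na | D-loop |
| MitoG7522A | 7521 | A | 0.09 | 0.06 | 0.03 | na | t-RNA (Asp) |
| MitoT9900C | 9899 | C | 0.02 | 0.01 | 0.04 | syn | COX 3 |
| MitoG10590A | 10,589 | A | 0.01 | 0.02 | 0.02 | syn | ND4L |
| MitoG11378A | 11,377 | A | 0.01 | 0.01 | 0.03 | syn | ND4 |
| MitoT13966C | 13,965 | C | 0.02 | 0.01 | 0.04 | syn | ND5 |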

Phase 2 Dataset (Sequenom Genotyping)

Description and genotyping

A total of 2459 samples (818 cases, 1641 controls) were included in the phase 2 experiment (Table 1; Fig. 1). Cases were selected as in phase 1 from among our ASD families (n = 613), with additional cases from the AGRE resource (n = 205). Control data for this experiment came from the mitochondrially unrelated fathers of probands (n = 806), from the cord blood of the term pregnancies (n = 253), and from the NIMH Human Genetics Initiative (n = 582). In phase 2 of our experiment, we used both the Sequenom MassARRAY iPLEX and TaqMan genotyping platforms to genotype the major European mitochondrial haplogroup defining variants (Table S2). A total of 12 SNPs, only five of which overlap with phase 1, were genotyped in this effort for the main purpose of defining these haplogroups within our dataset.

Analysis

Cases and controls were chosen for analysis following a QC approach similar to phase 1. As we genotyped our entire ASD dataset on the 12 selected SNPs, our QC analysis in this phase benefited from mother–child as well as cross-platform genotype concordance. Due to the limited number of markers and their importance in haplogroup assignment the sample call rate threshold was set to 100%. After sample and marker checks, any remaining instances of erroneous heterozygous genotype calls or platform discordant genotypes were set to missing for our analysis. Each sample was assigned to a specific haplogroup (Table 3) using information from the 12 genotyped SNPs (Table S2). Logistic regression analysis was conducted using SAS for both haplogroup and single marker tests of association (Table 4).

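Table 3. Phase 2 haplogroup results

| Haplogroup | Cases (count) | Controls (count) | Cases (frequency) | Controls (frequency) | P value | OR | L95 | U95 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| H | 358 | 762 | 0.44 | 0.46 | 0.21 | 0.90 | 0.76 | 1.06 |
| I | 21 | 36 | 0.03 | 0.02 | 0.56 | 1.18 | 0.68 | 2.03 |
| J | 83 | 146 | 0.10 | 0.09 | 0.32 | 1.16 | 0.87 | 1.54 |
| K | 64 | 134 | 0.08 | 0.08 | 0.77 | 0.96 | 0.70 | 1.30 |
| T | 82 | 173 | 0.10 | 0.11 | 0.69 | 0.95 | 0.72 | 1.25 |
| U | 147 | 272 | 0.18 | 0.17 | 0.39 | 1.10 | 0.88 | 1.38 |
| V | 32 | 56 | 0.04 | 0.03 | 0.53 | 1.15 | 0.74 | 1.79 |
| W | 18 | 36 | 0.02 | 0.02 | 0.99 | 1.00 | 0.57 | 1.78 |
| X | 13 | 26 | 0.02 | 0.02 | 0.99 | 1.00 | 0.51 | 1.96 |
| Total | 818 | 1641 | 1.00 | 1.00 | | | | |

OR, odds ratio; L95/U95, lower and upper bounds of the 95% confidence interval for the OR.

Table 4. Logistic regression results for haplogroup defining mtDNA SNPs analysed in Phase 2

| SNP | Basepair position | Minor allele | MAF** | P value | OR | L95 | U95 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| *MitoA1719G | 1719 | G | 0.04 | 0.44 | 1.18 | 0.78 | 1.79 |
| MitoC4216T | 4216 | T | 0.17 | 0.58 | 1.07 | 0.85 | 1.34 |
| MitoA4580G | 4580 | G | 0.04 | 0.79 | 0.94 | 0.58 | 1.52 |
| *MitoG4917A | 4917 | A | 0.09 | 0.34 | 0.86 | 0.63 | 1.17 |
| MitoC7028T | 7028 | T | 0.39 | 0.30 | 0.83 | 0.58 | 1.18 |
| MitoA8251G | 8251 | G | 0.04 | 0.73 | 1.08 | 0.71 | 1.63 |
| MitoA9055G | 9055 | G | 0.09 | 0.93 | 0.99 | 0.73 | 1.33 |
| *MitoG10398A | 10,398 | A | 0.26 | 0.31 | 1.12 | 0.90 | 1.41 |
| *MitoG12308A | 12,308 | A | 0.22 | 0.99 | 1.00 | 0.78 | 1.27 |
| MitoA13368G | 13,368 | G | 0.1 | 0.66 | 0.94 | 0.69 | 1.26 |
| MitoA13708G | 13,708 | G | 0.09 | 0.09 | 1.28 | 0.97 | 1.71 |
| *MitoA16391G | 16,391 | G | 0.02 | 0.68 | 1.13 | 0.63 | 2.03 |

*These 5 SNPs were also genotyped in Phase 1.
**Minor allele frequency in controls.
OR, odds ratio; L95/U95, lower and upper bounds of the 95% confidence interval for the OR.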

Phase 3 Dataset (Affymetrix Genotyping)

Description and genotyping

A subset of Caucasian samples was selected for resequencing of the entire mitochondrial genome (Table 1; Fig. 1). Before QC, the case group consisted of 400 samples that represented one affected individual per family (typically the proband), and the control group consisted of the 400 fathers of these individuals. We utilized the Affymetrix Human Mitochondrial Resequencing Array 2.0. (Affymetrix Inc. Santa Clara, CA) to resequence the entire mtDNA in this sample subset. This array uses microarray chip technology to sequence both strands of the entire mtDNA sequence after performing three long-range PCR amplifications. Each base position is interrogated with eight unique 25-mer probes on the resequencing array, and allows for the detection of both known and novel base substitutions.

Analysis

Haploid calls were initially made by setting the Affymetrix GSEQ software algorithm parameters to the haploid model and the quality score threshold to 12 (Coon et al., 2006). A summary of the sequencing results for each sample was generated with MSDAT, a powerful tool developed by our group for the analysis of mtDNA sequence data. Call rate thresholds were set at 95% for both samples and SNPs. As a result, 842 out of 16,544 positions and 49 out of 800 samples were dropped using the PLATO software package (Grady et al., 2010). Additional base positions were dropped depending on the analysis, with both tri-allelic (n = 29) and monomorphic base calls (n = 14,732) being dropped as part of the single marker analysis. In this phase, maternal genotypes were unavailable for conducting additional mitochondrial error checks. The sample call rate threshold reduced the final sample counts to 372 controls and 379 cases. We assessed significance by permutation testing for 183 common variants (MAF > 0.01), and separately for 941 variants (no MAF filter), using the program RVASSOC (Kinnamon, 2010), which implements Cochran-Armitage (CA) max/sum tests (Kinnamon et al., 2012). Moreover, we tested for coding variation (Table 5) and heteroplasmy differences, examined our dataset for presumably rare variation that has been shown to be associated with other mitochondrial disorders (Table 6), and performed mutational burden assessments between our cases and controls. We qualitatively assessed the differences in heteroplasmic SNP distributions between cases and controls (Cutler et al., 2001) using the diploid GSEQ algorithm settings, a quality threshold of 3 and a call rate threshold of 95%. We were able to examine 16,010 out of the 16,544 mitochondrial positions. We followed the approaches set forth by Coon and colleagues to examine both heteroplasmy and mutational burden (Coon et al., 2006).
Total mutational burden was calculated as the number of variants observed divided by the product of the number of sites/positions examined and the number of cases or controls (Table 7).
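As a worked illustration of this calculation, the short sketch below applies the formula from the Table 7 footnote (total variants observed divided by total sites possible times N) to the sequence-variant counts reported in Table 7. The function names are hypothetical and chosen only for this example.

```python
def per_individual(total_observed, n_individuals):
    """Average number of observed variants per individual."""
    return total_observed / n_individuals

def mutational_burden(total_observed, sites_possible, n_individuals):
    """Total variants observed / (total sites possible x number of individuals)."""
    return total_observed / (sites_possible * n_individuals)

# Sequence-variant counts from Table 7: 16,010 sites; 379 cases, 372 controls.
case_burden = mutational_burden(20085, 16010, 379)
control_burden = mutational_burden(18435, 16010, 372)
```

Rounded to four decimals, these values agree with the 0.0033 (cases) and 0.0031 (controls) listed in Table 7.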

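Table 5. Coding changes per gene in Phase 3 analysis

| Mito Gene | *Synonymous changes: Controls | *Synonymous changes: Cases | *Synonymous changes: P value | **Non-synonymous changes: Controls | **Non-synonymous changes: Cases | **Non-synonymous changes: P value |
| --- | --- | --- | --- | --- | --- | --- |
| ATPase8 | 12 | 18 | 0.30 | 24 | 19 | 0.41 |
| ATPase6 | 80 | 66 | 0.20 | 93 | 101 | 0.66 |
| ND1 | 100 | 86 | 0.25 | 124 | 117 | 0.55 |
| ND2 | 136 | 152 | 0.43 | 92 | 99 | 0.71 |
| ND3 | 36 | 30 | 0.42 | 100 | 89 | 0.35 |
| ND4 | 525 | 544 | 0.78 | 46 | 32 | 0.10 |
| ND7 | 56 | 63 | 0.59 | 6 | 3 | 0.46 |
| ND5 | 357 | 364 | 0.99 | 131 | 129 | 0.78 |
| ND6 | 133 | 136 | 0.98 | 23 | 20 | 0.60 |
| CO1 | 302 | 306 | 0.95 | 37 | 26 | 0.14 |
| CO2 | 66 | 71 | 0.75 | 10 | 9 | 0.79 |
| CO3 | 117 | 103 | 0.28 | 55 | 70 | 0.22 |
| Cytb | 162 | 151 | 0.43 | 448 | 452 | 0.88 |

*Coding changes per gene that DO NOT result in an amino acid change from rCRS.
**Coding changes per gene that DO result in an amino acid change from rCRS.

Table 6. mtDNA variants shown to be associated with other diseases

| *SNP | Basepair position | Minor allele | **P value |
| --- | --- | --- | --- |
| MitoT1005C | 1005 | C | 0.50 |
| MitoG1438A | 1438 | A | 0.14 |
| MitoA1555G | 1555 | G | 0.12 |
| MitoG3316A | 3316 | A | 0.50 |
| MitoA3796G | 3796 | G | 0.25 |
| MitoC4025T | 4025 | T | 0.50 |
| MitoC4171A | 4171 | A | 0.50 |
| MitoA4295G | 4295 | G | 0.38 |
| MitoC4640A | 4640 | A | 0.49 |
| MitoT5814C | 5814 | C | 0.50 |
| MitoC6489A | 6489 | A | 0.25 |
| MitoG7444A | 7444 | A | 0.50 |
| MitoA8348G | 8348 | G | 0.50 |
| MitoG9804A | 9804 | A | 0.10 |
| MitoT9957C | 9957 | C | 0.50 |
| MitoT10237C | 10,237 | C | 0.49 |
| MitoA11084G | 11,084 | G | 0.50 |
| MitoT11253C | 11,253 | C | 0.16 |
| MitoG11696A | 11,696 | A | 0.51 |
| MitoA12026G | 12,026 | G | 0.50 |
| MitoT12297C | 12,297 | C | 0.50 |
| MitoT12811C | 12,811 | C | 0.25 |
| MitoA13637G | 13,637 | G | 0.02 |
| MitoT14325C | 14,325 | C | 0.38 |
| MitoG14831A | 14,831 | A | 0.25 |

*Novel Phase 3 interrogated variation was compared against the “mtDNA Mutations with Reports of Disease-Association” from http://www.mitomap.org.
**Fisher's exact test with 1 degree of freedom.
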
Table 7. The distribution of mtDNA SNPs between ASD cases and controls in the Phase 3 resequencing dataset

| | Total sites possible** | Cases: total observed | Cases: per individual (N = 379) | Cases: total mutational burden* | Controls: total observed | Controls: per individual (N = 372) | Controls: total mutational burden* | χ2 P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sequence variants(a) | 16,010 | 20,085 | 53.0 | 0.0033 | 18,435 | 49.6 | 0.0031 | >0.05 |
| “n” SNPs | 16,010 | 13,153 | 34.7 | 0.0022 | 11,566 | 31.1 | 0.0019 | >0.05 |
| Homoplasmic SNPs(b) | 16,010 | 6932 | 18.3 | 0.0011 | 6869 | 18.5 | 0.0012 | >0.05 |
| Potential heteroplasmic SNPs(c) | 16,010 | 14,221 | 37.5 | 0.0023 | 12,610 | 33.9 | 0.0021 | >0.05 |
| Heteroplasmic SNPs(d) | 16,010 | 1068 | 2.8 | 0.0002 | 1044 | 2.8 | 0.0002 | >0.05 |

*Total mutational burden = total variants observed/(total sites possible x N of cases or controls).
**rCRS probes to 16,544 positions, 534 of which failed the 95% call rate threshold using the diploid algorithm.
(a) Homoplasmic SNPs and n calls.
(b) IUPAC codes “a,” “c,” “g,” “t.”
(c) IUPAC codes “r,” “y,” “k,” “m,” “s,” “w,” “n.”
(d) IUPAC codes “r,” “y,” “k,” “m,” “s,” “w.”

Additional QC across all phases

The existence of overlapping sample and marker datasets made a cross-platform genotype concordance QC measure possible. This step provided additional validation of our genotype calls across the different platforms and phases of this project. Examining only non-missing data, comparison of phase 1 with phase 2 data (6371 pair-wise comparisons across 1276 common samples and five common SNPs) yielded a genotype concordance rate of 0.998. Comparison of phase 1 with phase 3 data considered 713 common samples and 132 common SNPs, with a total of 92,516 pair-wise comparisons, and yielded a concordance rate of 0.999. Finally, comparison of phase 2 with phase 3 data (6340 comparisons between 534 common samples and 12 common SNPs) returned a concordance rate of 1.0.
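The concordance measure can be illustrated with a small sketch that compares two platforms' calls for the same sample-SNP pairs and skips any pair in which either call is missing. The missing code "N" and the function name are assumptions for illustration and do not reflect the software actually used.

```python
def concordance_rate(calls_a, calls_b, missing="N"):
    """Return (concordance rate, number of pairs compared) over non-missing pairs."""
    compared = 0
    matched = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue  # only non-missing data are examined
        compared += 1
        if a == b:
            matched += 1
    rate = matched / compared if compared else float("nan")
    return rate, compared
```

The second return value corresponds to the number of pair-wise comparisons quoted above for each pair of phases.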

Results

In phase 1, we had >80% power to detect a genotype relative risk (GRR) of 1.3 for any single SNP, with a type 1 error rate of 0.05 and an allele frequency of 0.10. The small number of individuals with African ancestry in our dataset substantially decreases the power for this subset (12% power for GRR of 1.3, α = 0.05, MAF = 0.10). Our stratified CMH test identified six nominally significant variants (P ≤ 0.05) that do not survive Bonferroni correction (Table S1). Using a continuous variable to control for stratification in a logistic regression framework, we identified six SNPs with nominal P-values (P ≤ 0.05) through our single marker tests of association (Table 2; Fig. 2; Table S3). These nominal associations do not survive either Bonferroni correction or our less conservative assessment via permutation testing using RVASSOC in the homogeneous Caucasian non-Hispanic subset of phase 1 (observed max χ2 = 4.81, P = 0.93). Of note, four of these six nominally significant variants overlap the stratified and logistic regression approaches; of these four overlapping variants, two (mtDNA positions 9899 and 10,589) tag subgroups of haplogroup L.
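To make the Bonferroni statement concrete, the sketch below compares the nominal P values from Table 2 against a corrected threshold. It assumes the correction is applied over the 132 SNPs examined in phase 1, which is an assumption about the number of tests rather than a figure stated in the text; the variable names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# Nominal P values from Table 2.
table2_p = {
    "MitoG228A": 0.03,
    "MitoG7522A": 0.03,
    "MitoT9900C": 0.04,
    "MitoG10590A": 0.02,
    "MitoG11378A": 0.03,
    "MitoT13966C": 0.04,
}

threshold = bonferroni_threshold(0.05, 132)  # 132 tests is an assumption
survivors = [snp for snp, p in table2_p.items() if p <= threshold]
```

Under this assumption the threshold is roughly 0.0004, so none of the six nominal associations would be retained.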

Figure 2.

A Manhattan plot of the phase 1 logistic regression results.

In phase 2, we examined both the major European haplogroups and European haplogroup defining SNPs for association to ASD susceptibility. Our haplogroup analysis yielded no significant difference in the frequency of cases versus controls for any particular haplogroup (Table 3). There are too few SNPs in this phase to control for population substructure using the principal components approach we used in phase 1. Instead, we used the haplogroups as covariates in a logistic regression analysis. We see no significant difference between our cases and controls for any of these single haplogroup defining SNPs (Table 4). Interestingly, individuals with self-reported African ancestry belong to a number of haplogroups other than haplogroup L.
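For readers who wish to check the haplogroup odds ratios, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table of carriers versus non-carriers. This is a simplified stand-in for the SAS logistic regression, and the function name is hypothetical. Applied to the haplogroup H counts in Table 3 (358 of 818 cases, 762 of 1641 controls), it gives values that round to the reported OR of 0.90 with bounds 0.76 and 1.06.

```python
import math

def odds_ratio_wald(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from carrier counts."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Haplogroup H versus all other haplogroups, counts from Table 3.
or_h, l95_h, u95_h = odds_ratio_wald(358, 818, 762, 1641)
```

The calculation requires non-zero counts in all four cells; sparse haplogroups would need an exact or corrected method instead.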

Given that the samples in phase 3 are almost completely overlapping with those genotyped in phase 1 and 2, we chose not to perform single marker association tests of the 139 SNPs previously examined in those phases within this dataset. The advantage of this phase rests in capturing rare variants which cannot be powerfully tested with single marker tests of association. We confirmed the homogeneity of the phase 3 dataset which was based on self-report using the mtGIF calculated with χ2 test statistics from all 941 polymorphic positions. Subsequently, we performed a joint test of these variants, many of which have MAF ≤ 0.01 and are spread across the mitochondrial genome; this returned no significant results (observed max χ2 = 6.34, P = 0.70).
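The idea behind a max-statistic permutation test can be sketched as follows. This is not RVASSOC and does not reproduce its Cochran-Armitage max/sum tests: it is a simplified illustration that uses a Pearson χ2 on carrier versus non-carrier counts for each variant, takes the maximum across variants, and compares it with the maxima obtained after shuffling case/control labels. The data layout (a list of case flags plus one set of carrier indices per variant) and all names are assumptions.

```python
import random

def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def max_chi2(is_case, carriers_by_variant):
    """Largest per-variant chi-square comparing carriers in cases and controls."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    best = 0.0
    for carriers in carriers_by_variant:
        a = sum(1 for i in carriers if is_case[i])  # carriers among cases
        c = len(carriers) - a                        # carriers among controls
        best = max(best, chi2_2x2(a, n_case - a, c, n_ctrl - c))
    return best

def max_stat_permutation_p(is_case, carriers_by_variant, n_perm=1000, seed=0):
    """Observed max chi-square and its permutation P value."""
    rng = random.Random(seed)
    observed = max_chi2(is_case, carriers_by_variant)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between labels and genotypes
        if max_chi2(labels, carriers_by_variant) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because the null distribution is built from the maximum over all variants, the resulting P value already accounts for the number of variants tested, which is why a single joint P value is reported for the whole set.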

Furthermore, we examined whether we detected more rare variation in total among cases when compared to controls specifically in the coding regions of the 13 protein subunits of the mitochondrial electron transport chain. We found no significant difference between our cases and controls for either the number of synonymous or non-synonymous changes in these genes (Table 5).

We specifically analyzed our phase 3 dataset to determine whether any of our samples carried rare variants previously reported in association with disease, as organized by the MitoMap project (http://www.mitomap.org). We identified 25 variants that were not previously examined in phases 1 or 2 and that appear in the MitoMap tables of “mtDNA Mutations with Reports of Disease-Associations” (Table 6). These tables contain any mtDNA variant reported in the literature as disease-associated, though not necessarily replicated in subsequent investigations. A single variant (A13637G), previously associated with Leber's Hereditary Optic Neuropathy (LHON), demonstrated a nominally significant P-value before multiple testing correction (P = 0.02) using Fisher's exact test. This rare allele was found in eight ASD cases and a single control.
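The A13637G result can be examined with a small exact-test sketch based on the hypergeometric distribution. Two assumptions are made that the text does not state: that the test is one-sided (excess carriers among cases), and that all 379 cases and 372 controls had a call at this position. The function name is illustrative.

```python
from math import comb

def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """Probability of at least `case_carriers` carriers among cases under the null."""
    k_total = case_carriers + control_carriers
    n_total = n_cases + n_controls
    denom = comb(n_total, k_total)
    upper = min(k_total, n_cases)
    tail = sum(comb(n_cases, k) * comb(n_controls, k_total - k)
               for k in range(case_carriers, upper + 1))
    return tail / denom

# Eight carriers among 379 cases, one carrier among 372 controls.
p_a13637g = fisher_one_sided(8, 379, 1, 372)
```

Under these assumptions the result is close to the reported P = 0.02; a two-sided test on the same counts would give a larger value.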

Finally, we examined our phase 3 dataset for evidence of heteroplasmy and mutational burden differences between cases and controls. We examined the mutational burden in specific subsets of the variants detected but failed to find any significant difference between our ASD case and control datasets (Table 7). No significant difference was found between cases and controls in the heteroplasmy analysis using individual raw intensity allele data (data not shown).

Discussion

In recent years, there has been considerable speculation that mitochondrial variation may play a role in ASDs. We performed the first comprehensive investigation of this hypothesis. Because of common ancestry and maternal inheritance of mtDNA, the vast majority of humans can be assigned to a known haplogroup that arose during ancient migrations. Approximately 40% of Caucasians of European descent belong to haplogroup H, and the total prevalence of the next most common mitochondrial haplogroups (I, J, and K) is approximately 25% (Torroni et al., 1994). Although most persons of African descent belong to haplogroup L, this group is extremely diverse and can be divided into many sub-haplogroups (Chen et al., 1995). This genetic variation results in distinctive sets of human mitochondrial electron transport chains with different capacities for energy production, free radical generation and apoptosis (Swerdlow et al., 1996; Wallace, 2005). Despite the initial sequencing of the mitochondrion in the 1980s, routine full mitochondrial sequencing has been prohibitively expensive and thus the full complement of variation has not been routinely examined in any disease.

In each phase of this study, we tested for evidence of association between mitochondrial variation and ASD. We do not detect strong evidence for significant main effects of any single mtDNA variant in our current dataset; only a handful of variants demonstrate even a mild level of significance. Furthermore, with a dataset substantially larger than the one explored by Kent and colleagues (Kent et al., 2008), we find no compelling evidence that European mitochondrial haplogroups influence the risk of developing ASDs. Our inability to detect a significant haplogroup association is not surprising given the work of Samuels and colleagues, who noted that reliable detection of haplogroup associations in complex disease is difficult under most study conditions, with power limited at sample sizes <10,000 (Samuels et al., 2006).

It is intriguing to note that a couple of the nominal hits in phase 1 tag subgroups of haplogroup L. These signals are generated by the small and diverse set of self-reported African-ancestry samples within our dataset. The inherent diversity among individuals of African ancestry, which this panel of markers is designed to capture, combined with probable admixture and consequent misclassification of mitochondrial ancestry, increases the difficulty of appropriately matching cases and controls based on self-report. Importantly, our study underlines the need for larger, more powerful studies of patients with African ancestry. These will allow us to make a definitive statement on the involvement of mtDNA in ASD susceptibility in a more inclusive manner.

Beyond our examination of single marker tests for association, we have explored a number of additional possible mechanisms by which mitochondrial variation may play a role in ASD. Collectively, our investigations fail to provide any convincing evidence for a major contribution of mtDNA variation or heteroplasmy to ASD.

Software

MSDAT source code and documentation are available for download from the Hussman Institute for Human Genomics website at http://hihg.med.miami.edu/software-download.

Acknowledgements

We thank the autism patients and their families, as well as the control parents and children for their participation in our studies. This work would not be possible without their generosity. We gratefully acknowledge the resources provided by the AGRE Consortium and the participating AGRE families. The AGRE is a program of Autism Speaks and is supported, in part, by grant 1U24MH081810 from the NIMH to Clara M. Lajonchere (PI). We thank the individuals who volunteered for the control sample for their participation. A number of control subjects came from the National Institute of Mental Health Schizophrenia Genetics Initiative (NIMH-GI); data and biomaterials were collected by the “Molecular Genetics of Schizophrenia II” (MGS-2) collaboration. The investigators and co-investigators are: ENH/Northwestern University, Evanston, IL, MH059571, Pablo V. Gejman, M.D. (Collaboration Coordinator; PI), Alan R. Sanders, M.D.; Emory University School of Medicine, Atlanta GA, MH59587, Farooq Amin, M.D. (PI); Louisiana State University Health Sciences Center; New Orleans, Louisiana, MH067257, Nancy Buccola APRN, BC, MSN (PI); University of California-Irvine, Irvine, CA, MH60870, William Byerley, M.D. (PI); Washington University, St. Louis, MO, U01, MH060879, C. Robert Cloninger, M.D. (PI); University of Iowa, Iowa City, IA, MH59566, Raymond Crowe, M.D. (PI), Donald Black, M.D.; University of Colorado, Denver, CO, MH059565, Robert Freedman, M.D. (PI); University of Pennsylvania, Philadelphia, PA, MH061675, Douglas Levinson, M.D. (PI); University of Queensland, Queensland, Australia, MH059588, Bryan Mowry, M.D. (PI); Mt. Sinai School of Medicine, New York, NY, MH59586, Jeremy Silverman, Ph.D. (PI). In addition, cord blood samples were collected by V L Nimgaonkar's group at the University of Pittsburgh, as part of a multi-institutional collaborative research project with J Smoller, M.D. D.Sc. and P Sklar, M.D. Ph.D. (Massachusetts General Hospital) (grant MH 63420).

We are grateful to the John P. Hussman Institute for Human Genomics (HIHG) personnel within the Patient and Family Ascertainment Core, the Biorepository, and the Center for Genome Technology for their commitment to this project. This research was supported by grants from the National Institutes of Health (9R01MH080647 and 7P01NS026630) and by a gift from the Hussman Foundation. A subset of the participants was ascertained while Dr. Pericak-Vance was a faculty member at Duke University.

Authors' Contributions

JLM, JLH, and MPV designed the study. JLM, AH, and ERM led the writing and revising of the manuscript. PLW, IK, DJH, and JG performed and directed all molecular work, including the genotyping and mtDNA resequencing efforts. AH, JLM, MS, and ERM performed the QC and statistical analysis. HHW, RKA, MLC, JLH, and MPV were involved in recruiting autism families and controls for this study. RKM and SMW provided control sample DNA. All authors have read and contributed to the manuscript.

Conflict of Interest

The authors declare no conflict of interest.
