Qihua Tan, Epidemiology, Institute of Public Health, University of Southern Denmark, J. B. Winsløws Vej 9B, DK-5000 Odense C, Denmark. Tel.: 0045 65503536; fax: 0045 65411911; e-mail: email@example.com
Genetic interactions, or epistasis, could make a substantial contribution to variation in human complex traits, including longevity. However, detecting epistatic interactions in high-dimensional datasets is difficult for several reasons, including the multiple testing of correlated tests. We introduce a novel permutation strategy for the case-only analysis of gene-by-gene interaction using multiple SNPs. The method is applied to genes coding for Forkhead box O (FOXO) transcription factors, which have recently been associated with human longevity across different populations, under the hypothesis that epistatic interaction in the regulation and expression of the FOXO gene family could contribute to the human longevity phenotype. Genotype data were collected from 1088 individuals of the Danish 1905 birth cohort aged 92–93 years, with 12 SNPs in the FOXO1a gene and 15 SNPs in the FOXO3a gene. Our analysis detected a joint effect of rs9486902 in FOXO3a and rs2701858 in FOXO1a that contributes highly significantly to human longevity (OR = 3.23, 95% CI: 2.93–3.53) and is consistent in both males and females. Our results are compared with published studies, and the importance of our novel method and findings is discussed.
Complex diseases or phenotypes such as longevity may have multiple genetic and environmental causes. The complexity arises because many genetic and environmental factors may interact with each other, so that the phenotype cannot be accurately predicted from the individual effects of each component factor considered alone. Genetic interactions, or epistasis, could hold the key to understanding complex conditions such as Alzheimer's disease and diabetes (Carlborg & Haley, 2004; Moore, 2005). Recently, epistatic interactions have been invoked to explain the ‘missing heritability’ in genome-wide association studies (GWAS) (Zuk et al., 2012), which assume that genetic variants act in an additive and independent manner. However, detecting epistatic interactions in high-dimensional datasets is difficult because of the computational burden of evaluating all possible combinations of genetic variants across loci and because of multiple testing under strong dependence between tests. As one solution, a recently proposed approach uses biological knowledge, such as functional pathways, to restrict the evaluation to gene combinations with a clear biological rationale (Ma et al., 2012).
The FOXO (Forkhead box O) transcription factors, characterized by a conserved DNA-binding domain, are essential in both development and adult physiology. Members of the FOXO family are believed to be evolutionarily conserved post-translational mediators of insulin and growth factor signaling. In C. elegans, the FOXO orthologue DAF-16 has been shown to regulate life span. In humans, genetic variations in the genes coding for FOXO1a and FOXO3a have been associated with longevity in different populations (Bonafè et al., 2003; Kojima et al., 2004; Willcox et al., 2008; Li et al., 2009; Soerensen et al., 2010). Recently, Zeng et al. (2010) found evidence of gene-by-gene and gene-by-environment interactions affecting longevity in a case–control study of middle-aged Chinese and Chinese centenarians.
The case-only design is a powerful method for analyzing gene-by-gene interaction effects on longevity (Tan et al., 2002), provided that the two genetic variants are not in linkage disequilibrium (LD). The method assesses interaction effects without controls, an important advantage in longevity studies over the traditional case–control design (Tan et al., 2006), in which the longevity phenotype of the young controls is censored. As in all association analyses, current applications of the case-only design face a multiple-testing problem owing to the large number of single nucleotide polymorphism (SNP) markers typed. The situation is further complicated by the correlation among the SNPs typed in each of the interacting genes. Resampling-based methods have been applied to microarray gene expression data to adjust P values for multiplicity and for the dependence structure among the thousands of genes tested simultaneously in a case–control microarray experiment (Dudoit et al., 2002). In the case-only analysis of gene–gene interactions (G × G), however, the popular phenotype-based permutation test for multiple comparisons is inapplicable because all samples have the same phenotype, that is, they are all cases. This paper therefore aims, first, to introduce a novel genotype-based permutation scheme for the case-only analysis of G × G using multiple SNPs genotyped in each of the interacting genes and, second, to apply the method to SNP data on the FOXO1a and FOXO3a genes in members of the Danish 1905 birth cohort who survived to ages 92–93 years.
The combination of 12 SNPs in FOXO1a and 15 SNPs in FOXO3a produced 12 × 15 = 180 pairs of potentially interacting SNPs. The case-only analysis was applied to each pair in the whole sample of cases and in males and females separately. To correct for multiple testing, we applied the genotype-based permutation test to each of the three analyses with B = 10 000 iterations. After adjustment, only one pair of interacting SNPs (rs2701858 in FOXO1a and rs9486902 in FOXO3a) remained significant (permutation P value < 0.0001, OR = 3.23, 95% CI: 2.93–3.53). Interestingly, the separate analyses of males and females identified the same pair as the only significant interacting SNPs (permutation P value < 0.0001, OR = 4.42, 95% CI: 3.85–5.00 in males; permutation P value < 0.0001, OR = 2.83, 95% CI: 2.47–3.18 in females).
Table 1 gives the genotype frequency counts for the two interacting SNPs in the male, female, and whole samples. A total of 49 individuals were excluded because of missing genotypes at either of the two SNPs. From Table 1, we calculated the frequencies of minor-allele carriers at one SNP conditional on genotype at the other SNP, together with 95% CIs, for males (Fig. 1A,B), females (Fig. 1C,D), and the whole sample (Fig. 1E,F). As the figures show, the frequency of minor-allele carriers at one SNP is significantly higher among carriers of the minor allele of the other SNP than among homozygotes for its common allele, suggesting strong dependence between the genotypes of the two SNPs as a result of interaction (Fig. 1).
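The conditional carrier frequencies plotted in Fig. 1 come directly from a 2 × 2 table of carrier status. The sketch below illustrates the calculation under stated assumptions: the counts are hypothetical (not those of Table 1), "SNP A" and "SNP B" are generic labels, the helper name carrier_freq_ci is made up for illustration, and the normal-approximation (Wald) interval is an assumption, since the text does not say which interval was used.

```python
import math


def carrier_freq_ci(carriers: int, total: int, z: float = 1.96):
    """Carrier proportion with a normal-approximation (Wald) confidence interval (assumed method)."""
    p = carriers / total
    se = math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical counts, NOT the Table 1 data.
# Key: (carrier status at SNP A, carrier status at SNP B); 1 = carries the minor allele.
counts = {(0, 0): 300, (0, 1): 100, (1, 0): 150, (1, 1): 150}

results = {}
for b in (0, 1):
    # Frequency of SNP A carriers among subjects with carrier status b at SNP B.
    a_carriers = counts[(1, b)]
    total = counts[(0, b)] + a_carriers
    results[b] = carrier_freq_ci(a_carriers, total)
    p, lo, hi = results[b]
    print(f"SNP B status {b}: SNP A carrier frequency {p:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

With these made-up counts the carrier frequency at SNP A rises from about 0.33 among SNP B noncarriers to 0.60 among SNP B carriers, which corresponds to an odds ratio above 1 in the same table (300 × 150 / (100 × 150) = 3).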
Table 1. Sample counts by genotype (0 for noncarriers and 1 for carriers of the minor allele) at SNPs rs9486902 in FOXO3a and rs2701858 in FOXO1a, and case-only test results
Genotype categories: minor allele rs2701858 (0, 1) × minor allele rs9486902 (0, 1)
Males: χ2 = 26.28, P = 2.96 × 10⁻⁷, OR = 4.42, 95% CI: 3.85–5.00
Females: χ2 = 32.95, P = 9.44 × 10⁻⁹, OR = 2.83, 95% CI: 2.47–3.18
All: χ2 = 60.16, P = 8.76 × 10⁻¹⁵, OR = 3.23, 95% CI: 2.93–3.53
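The P values in Table 1 follow from the χ2 statistics with one degree of freedom. A short standard-library check is sketched below; the function name chi2_1df_pvalue is hypothetical, and it relies on the identity that if X follows a χ2 distribution with 1 df then X = Z² for a standard normal Z, so P(X ≥ x) = erfc(√(x/2)).

```python
import math


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom.

    If X ~ chi-square(1), then X = Z**2 with Z standard normal, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# (chi-square statistic, reported P value) as listed in Table 1
reported = {
    "males": (26.28, 2.96e-7),
    "females": (32.95, 9.44e-9),
    "all": (60.16, 8.76e-15),
}
recomputed = {name: chi2_1df_pvalue(stat) for name, (stat, _) in reported.items()}
for name, (stat, p_rep) in reported.items():
    print(f"{name}: chi2 = {stat}, recomputed P = {recomputed[name]:.3g}, reported P = {p_rep:.3g}")
```

The recomputed values closely match the reported ones; the small differences in the third significant figure are consistent with rounding of the χ2 statistics to two decimals.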
We have introduced a novel resampling-based approach that enables case-only analysis of gene–gene interactions using multiple-SNP data. Applying the method to genotype data on FOXO genes in the Danish 1905 birth cohort surviving to ages 92–93 detected a highly significant interaction effect of one pair of SNPs (rs9486902 in FOXO3a and rs2701858 in FOXO1a) that is beneficial for longevity in both males and females. The approach rests on a novel genotype-based permutation strategy that makes resampling-based case-only analysis of interaction effects feasible in high-dimensional SNP data, accounting for multiple testing among tests that are correlated because of LD.
Detection of epistatic effects in current genetic association studies is a challenge in genetic epidemiology because of the issues created by high-dimensional analysis (Carlborg & Haley, 2004). Appropriate methods for multiple testing are needed because the tests are dependent; conservative methods such as the Bonferroni correction can detect only large interaction effects and will miss subtle epistasis. Through the application to the FOXO genes, we have shown that our novel genotype-based permutation strategy can handle multiple comparisons among correlated tests of epistatic interaction between two genes, each with multiple typed SNPs. Besides longevity, the strategy can also be applied to case-only analysis of gene-by-gene interaction on disease phenotypes, provided that the main interest is in epistatic effects.
Soerensen et al. (2010) performed an association analysis of FOXO3a gene variants with human longevity using a case–control design, with the same samples from the Danish 1905 birth cohort as in this study and middle-aged Danes as controls. That study identified SNP rs9486902 as beneficial for longevity under a recessive model, with higher statistical significance in males. Using the same case–control data, we fitted a logistic regression model to the genotypes of SNPs rs2701858 in FOXO1a and rs9486902 in FOXO3a with an interaction term, assuming dominant effects as in our case-only analysis. A significant interaction effect was found only in males (OR = 2.36, 95% CI: 1.04–5.35, P = 0.04). Compared with our case-only result for males, this interaction effect, although in the same direction, is smaller and its confidence interval is considerably wider. This comparison illustrates, with empirical data, the higher efficiency of the case-only model in detecting interaction compared with the traditional case–control design.
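The efficiency argument can be made concrete with a 2 × 2 × 2 table. The sketch below assumes a saturated logistic model with the two carrier indicators and their product and no covariates (the model fitted in the text may have differed), and all counts are hypothetical. Under that assumption the interaction odds ratio exp(b3) equals the case-only OR divided by the OR among controls, and the variance of its log is the sum of the two log-OR variances. When the two SNPs are independent in the source population the control OR is close to 1, so the case-only OR estimates the same quantity without the control-table variance.

```python
import math


def or_and_se(t):
    """Odds ratio a00*a11/(a01*a10) and SE of its natural log for a 2x2 table."""
    a00, a01, a10, a11 = t[(0, 0)], t[(0, 1)], t[(1, 0)], t[(1, 1)]
    return (a00 * a11) / (a01 * a10), math.sqrt(1 / a00 + 1 / a01 + 1 / a10 + 1 / a11)


# Hypothetical counts (NOT the Danish data), keyed by (carrier at SNP 1, carrier at SNP 2).
cases = {(0, 0): 400, (0, 1): 150, (1, 0): 200, (1, 1): 150}
controls = {(0, 0): 300, (0, 1): 120, (1, 0): 160, (1, 1): 64}

or_case, se_case = or_and_se(cases)      # case-only estimate
or_ctrl, se_ctrl = or_and_se(controls)   # SNP-SNP association among controls

# Saturated logistic model: logit P(case) = b0 + b1*g1 + b2*g2 + b3*g1*g2.
# Its fitted odds in each cell are cases/controls, so exp(b3) is a ratio of cell odds.
odds = {k: cases[k] / controls[k] for k in cases}
or_int = (odds[(1, 1)] * odds[(0, 0)]) / (odds[(1, 0)] * odds[(0, 1)])  # exp(b3)
se_int = math.sqrt(se_case ** 2 + se_ctrl ** 2)  # SE of b3: sqrt of sum of 1/count over 8 cells

print(f"case-control interaction OR = {or_int:.2f} (SE of log = {se_int:.3f})")
print(f"case-only OR = {or_case:.2f} (SE of log = {se_case:.3f}); control OR = {or_ctrl:.2f}")
```

With these made-up counts both estimates equal 2, but the standard error of the log OR is about 0.14 for the case-only estimate versus about 0.23 for the case-control interaction term, mirroring the wider interval observed in the case–control analysis.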
Although useful, the case-only model cannot handle interactions between genes that each have an additive mode of inheritance. Moreover, the permutation test can be computationally intensive when applied to very high-dimensional data such as GWAS data. We therefore emphasize that our permutation procedure is better suited to case-only analysis of gene–gene interaction in candidate gene studies using multiple SNP markers. Recently, Pierce & Ahsan (2010) introduced a case-only genome-wide interaction analysis and suggested a screening procedure that searches the genome for variants that interact with a candidate polymorphism. Although promising, the selection of candidates and the management of potential interactions remain challenging. More work is needed to bring the powerful case-only model to genome-wide studies.
Zeng et al. (2010) reported epistatic effects on longevity in the FOXO gene family (SNPs rs2755209 and rs2755213 in FOXO1a; SNPs rs2802292 and rs2253310 in FOXO3a) in a Chinese study. Although their risk estimates did not reach statistical significance, they found consistent joint effects for carriers of the minor alleles of the two genes, increasing the chance of survival from middle age to over 100 years. For the three SNPs also covered in our Danish data (rs2755209 and rs2755213 in FOXO1a and rs2802292 in FOXO3a), we found neither significant interactions nor a consistent trend. Nevertheless, our two SNPs, which were not covered in the Chinese study, did exhibit a joint effect favoring longevity in carriers of their minor alleles, as indicated by the ORs in our case-only analysis. More replication studies are needed to establish epistatic interactions on human longevity in the FOXO gene family.
Materials and methods
Our longevity cases consist of 1088 subjects who were born in 1905 and who were first interviewed in 1998 at ages 92–93 years. All cases were collected by the Danish 1905 Cohort Study, a study that covers all Danes born in 1905 (Nybo et al., 2001). Blood samples were taken at the first interview for DNA extraction. The Danish 1905 Cohort Study was approved by the Danish National Committee on Medical Research Ethics.
The procedures for DNA extraction and genotyping were the same as described by Soerensen et al. (2010). In brief, DNA was purified from dry blood spots using QIAamp DNA Mini and Micro Kits (Qiagen, Dusseldorf, Germany) and genotyped by the Illumina GoldenGate assay (Illumina Inc, San Diego, CA, USA). Genotyping was performed for 12 SNPs in the FOXO1a gene on chromosome 13 (rs10507486, rs12854161, rs12876443, rs2180961, rs2701858, rs2721068, rs2755209, rs2755213, rs2951787, rs2984121, rs4581585, and rs9603776) and 15 SNPs in the FOXO3a gene on chromosome 6 (rs10499051, rs12206094, rs12207868, rs12212067, rs13217795, rs13220810, rs2764264, rs2802292, rs3800231, rs3800232, rs479744, rs7762395, rs9398172, rs9400239, and rs9486902). SNPs for genotyping were selected to cover the majority of known common genetic variations in the two genes. Selection of chromosome regions and tagging SNPs was described in detail elsewhere (Soerensen et al., 2010).
The case-only analysis
The case-only design is more efficient than the traditional case–control design for detecting gene–gene (Yang et al., 1999) and gene–environment (Piegorsch, 1994; Khoury & Flanders, 1996) interactions under the assumption that the interacting factors are independent in the population. The design requires far fewer study samples (Yang et al., 1997) and avoids difficulties in selecting appropriate controls (Khoury & Flanders, 1996), with the trade-off that only the interaction, and no main effects, can be assessed. By treating long-lived individuals (such as centenarians or nonagenarians) as cases, Tan et al. (2002) introduced and validated the method for detecting gene–gene interaction in longevity studies. Table 2 is a typical contingency table for the case-only analysis of genes M and N, with 0 and 1 denoting noncarriers and carriers of the minor allele of a SNP. The four cell counts are for individuals who are noncarriers of both minor alleles (a00), carriers of both minor alleles (a11), carriers of the minor allele in gene M and the major allele in gene N (a01), and carriers of the minor allele in gene N and the major allele in gene M (a10). In brief, the case-only design detects an interaction effect on longevity as an association between the two interacting genes in long-lived individuals: a positive association suggests synergistic interaction, whereas an inverse association indicates antagonistic interaction (Ottman, 1996). The null hypothesis is thus that the genotype distribution of one gene is independent of that of the other gene, which can be tested with the standard χ2 statistic with one degree of freedom,
$$\chi^2 = \frac{n\,(a_{00}a_{11} - a_{01}a_{10})^2}{n_{1.}\,n_{0.}\,n_{.1}\,n_{.0}},$$
where n1., n0., n.1 and n.0 are the marginal sums of the genotype counts and n is the overall sum. The odds ratio for the interaction effect can be calculated as
$$\mathrm{OR} = \frac{a_{00}\,a_{11}}{a_{01}\,a_{10}},$$
which has been shown to measure departure from a multiplicative joint effect of the two genes on longevity (Tan et al., 2002). The standard error of the natural log of the OR can be calculated as
$$\mathrm{SE}[\ln(\mathrm{OR})] = \sqrt{\frac{1}{a_{00}} + \frac{1}{a_{01}} + \frac{1}{a_{10}} + \frac{1}{a_{11}}}.$$
A 95% confidence interval for ln(OR) can then be constructed and converted to the original scale by exponentiation: $\exp\{\ln(\mathrm{OR}) \pm 1.96\,\mathrm{SE}[\ln(\mathrm{OR})]\}$.
Table 2. A contingency table for calculating the test statistic in a case-only study
Gene N \ Gene M    0      1      Total
0                  a00    a01    n0.
1                  a10    a11    n1.
Total              n.0    n.1    n
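The case-only formulas can be collected into one function. This is a minimal sketch: the function name case_only_test and the example counts are hypothetical, cell indices follow the a_ij notation of Table 2, the P value uses the 1-df identity P = erfc(√(χ2/2)), and all four cells are assumed to be non-zero (an empty cell would make the OR or its standard error undefined).

```python
import math


def case_only_test(a00: int, a01: int, a10: int, a11: int, z: float = 1.96):
    """Case-only chi-square test (1 df), interaction OR and its CI from a 2x2 table."""
    n = a00 + a01 + a10 + a11
    n0_, n1_ = a00 + a01, a10 + a11  # row margins n0., n1.
    n_0, n_1 = a00 + a10, a01 + a11  # column margins n.0, n.1
    chi2 = n * (a00 * a11 - a01 * a10) ** 2 / (n0_ * n1_ * n_0 * n_1)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    odds_ratio = (a00 * a11) / (a01 * a10)
    se_log_or = math.sqrt(1 / a00 + 1 / a01 + 1 / a10 + 1 / a11)
    # CI built on the log scale, then exponentiated back to the OR scale.
    lo = math.exp(math.log(odds_ratio) - z * se_log_or)
    hi = math.exp(math.log(odds_ratio) + z * se_log_or)
    return chi2, p, odds_ratio, (lo, hi)


# Hypothetical counts, for illustration only.
chi2, p, odds_ratio, ci = case_only_test(400, 150, 200, 150)
print(f"chi2 = {chi2:.2f}, P = {p:.2e}, OR = {odds_ratio:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

For these made-up counts the function gives χ2 ≈ 23.38, OR = 2 and a 95% CI of roughly 1.51–2.65; note that an interval built on the log scale is asymmetric around the OR.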
Correction for multiple testing
As mentioned above, assessing G × G with the case-only design using SNP data creates a multiple-testing problem in which the tests are correlated because of LD among the multiple SNPs genotyped in each of the interacting genes. Resampling-based methods have been designed to correct for multiple comparisons when the test statistics (or P values) are correlated, for example, the maxT and minP approaches introduced by Westfall & Young (1993). These methods have been applied to genome-wide microarray gene expression data (Dudoit et al., 2002). The maxT method is also implemented in the popular PLINK package for genetic association analysis (http://pngu.mgh.harvard.edu/~purcell/plink/). Given the special situation of case-only analysis, a novel permutation strategy is needed to make use of these resampling-based methods for multiple testing.
In a traditional case–control study, a permutation test can be performed by shuffling the phenotype, that is, by randomly reassigning phenotype labels to the samples. The procedure produces a null distribution of the test statistic, which can be used to obtain an empirical significance level for the statistic computed from the original data.
In the case-only design, phenotype-based permutation is inapplicable because all samples have the same phenotype, that is, they are all cases. Instead of permuting the phenotypes, we introduce a genotype-based permutation procedure for assessing the statistical significance of the test statistics and for correcting for multiple testing due to the multiple SNPs typed in the interacting genes. In Fig. 2, we illustrate the procedure with two interacting genes typed for m SNPs in gene 1 and n SNPs in gene 2. The permutation is conducted by shuffling the genotypes among all subjects for one of the two interacting genes (e.g., gene 2 in Fig. 2). The idea is to use permutation to destroy any genotype dependence between the two genes that arises from gene–gene interaction in the cases. The permuted samples are therefore random with respect to epistatic interaction and, over a large number of iterations, can be used to generate the null distribution of the χ2 statistic from the case-only analysis. Note that under the null hypothesis the χ2 statistics from all pairs of interacting SNPs follow the same χ2 distribution with one degree of freedom, a situation that meets the assumption of ‘subset pivotality’ in resampling-based multiple testing (Westfall & Troendle, 2008).
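A single genotype-based permutation replicate can be sketched as below. The helper name permute_gene2_block and the toy genotypes are hypothetical. We assume that shuffling the genotypes among all subjects means reassigning each subject's complete gene-2 genotype vector to another subject as one block, so that LD among the gene-2 SNPs (and, untouched, among the gene-1 SNPs) is preserved while the pairing of gene-1 and gene-2 genotypes is randomized; the text does not state this block structure explicitly.

```python
import random


def permute_gene2_block(gene2_rows, rng):
    """Hypothetical helper: one genotype-based permutation of gene 2.

    gene2_rows[k] holds the genotype codes of subject k at all SNPs typed in
    gene 2. Whole rows are reassigned to subjects, so correlations among the
    gene-2 SNPs (their LD) are kept, while the link between a subject's gene-1
    and gene-2 genotypes is broken at random.
    """
    order = list(range(len(gene2_rows)))
    rng.shuffle(order)
    return [gene2_rows[k] for k in order]


rng = random.Random(1905)
gene1_rows = [(0, 1), (1, 1), (0, 0), (1, 0), (1, 1)]  # toy data: 5 subjects x 2 SNPs
gene2_rows = [(1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]  # 5 subjects x 3 SNPs
gene2_perm = permute_gene2_block(gene2_rows, rng)
# gene1_rows keeps its original order; the pairs below feed the next round of case-only tests.
print(list(zip(gene1_rows, gene2_perm)))
```

Because the permuted gene-2 data are the same rows in a new subject order, any statistic computed within gene 2 alone is unchanged; only the cross-gene pairing differs between replicates.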
P value correction
To correct for multiple testing, we adopt the step-down minP procedure introduced by Westfall & Young (1993) which consists of the following steps:
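1. Order the raw P values from the original data for the m × n pairs of interacting SNPs: $p_{r_1} \le p_{r_2} \le \cdots \le p_{r_{m \times n}}$.
2. Permute the genotypes as described above.
3. Compute the m × n P values $p_{1,b}, \ldots, p_{m \times n,b}$ from the χ2 test applied to the bth permuted replicate.
4. Compute the successive minima of the P values for the bth replicate: $q_{m \times n,b} = p_{r_{m \times n},b}$ and $q_{i,b} = \min(q_{i+1,b},\, p_{r_i,b})$ for i = m × n − 1, …, 1.
5. Repeat steps 2–4 B times, and estimate the adjusted P values as $\tilde{p}_{r_i} = \frac{1}{B}\sum_{b=1}^{B} I(q_{i,b} \le p_{r_i})$ for i = 1, …, m × n.
Here, I(·) is an indicator function equal to 1 when its condition is true and 0 otherwise. Monotonicity of the adjusted P values is enforced by setting $\tilde{p}_{r_i} = \max(\tilde{p}_{r_{i-1}}, \tilde{p}_{r_i})$ for i = 2, …, m × n.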
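The step-down minP procedure can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names (chi2_pvalue, pair_pvalues, step_down_minp) are hypothetical, genotypes are coded as 0/1 minor-allele carrier indicators, gene-2 subjects are shuffled as one block (our reading of the genotype-based permutation), and the toy data are simulated. B = 200 keeps the demonstration fast, whereas the analysis in the text used B = 10 000.

```python
import math
import random


def chi2_pvalue(x, y):
    """Case-only chi-square (1 df) P value for two 0/1 carrier vectors over the same subjects."""
    a = [[0, 0], [0, 0]]
    for xi, yi in zip(x, y):
        a[xi][yi] += 1
    r0, r1 = a[0][0] + a[0][1], a[1][0] + a[1][1]
    c0, c1 = a[0][0] + a[1][0], a[0][1] + a[1][1]
    if 0 in (r0, r1, c0, c1):
        return 1.0  # a monomorphic margin makes the test undefined; treat as null
    chi2 = len(x) * (a[0][0] * a[1][1] - a[0][1] * a[1][0]) ** 2 / (r0 * r1 * c0 * c1)
    return math.erfc(math.sqrt(chi2 / 2.0))


def pair_pvalues(g1, g2):
    """P values for all m x n SNP pairs; g1 and g2 are lists of SNP columns."""
    return [chi2_pvalue(s1, s2) for s1 in g1 for s2 in g2]


def step_down_minp(g1, g2, B=1000, seed=0):
    """Westfall-Young step-down minP adjustment with genotype-based permutation."""
    rng = random.Random(seed)
    p_raw = pair_pvalues(g1, g2)
    M = len(p_raw)
    order = sorted(range(M), key=lambda j: p_raw[j])  # step 1: indices r_1, ..., r_M
    hits = [0] * M
    subjects = list(range(len(g2[0])))
    for _ in range(B):
        rng.shuffle(subjects)  # step 2: reorder gene-2 subjects as one block
        g2_perm = [[snp[s] for s in subjects] for snp in g2]
        p_b = pair_pvalues(g1, g2_perm)  # step 3
        q = 1.0
        for i in range(M - 1, -1, -1):  # step 4: successive minima q_{i,b}
            q = min(q, p_b[order[i]])
            if q <= p_raw[order[i]]:
                hits[i] += 1
    adj = [h / B for h in hits]  # step 5
    for i in range(1, M):  # enforce monotonicity along the raw-P ordering
        adj[i] = max(adj[i], adj[i - 1])
    p_adj = [0.0] * M
    for i, j in enumerate(order):  # map back to the original pair order
        p_adj[j] = adj[i]
    return p_raw, p_adj


# Simulated toy data (hypothetical): gene 1 has 2 SNPs, gene 2 has 3 SNPs.
rng = random.Random(2012)
N = 400
g1_a = [rng.randint(0, 1) for _ in range(N)]
g1_b = [x if rng.random() < 0.8 else 1 - x for x in g1_a]  # in LD with g1_a
g2_a = [x if rng.random() < 0.75 else 1 - x for x in g1_a]  # dependent on g1_a
g2_b = [x if rng.random() < 0.8 else 1 - x for x in g2_a]  # in LD with g2_a
g2_c = [rng.randint(0, 1) for _ in range(N)]  # independent of gene 1
p_raw, p_adj = step_down_minp([g1_a, g1_b], [g2_a, g2_b, g2_c], B=200, seed=1)
for j, (pr, pa) in enumerate(zip(p_raw, p_adj)):
    print(f"pair {j}: raw P = {pr:.2e}, adjusted P = {pa:.3f}")
```

Because every statistic here is referred to the same χ2 distribution with 1 df, ranking by P value is the same as ranking by χ2, so in this setting the minP and maxT adjustments order the pairs identically.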
This work was partially supported by the EU Seventh Framework Programme (FP7/2007-2011) under grant agreement no. 259679 and NIH/NIA grant P01 AG08761.