Open Access

Review: a meta-analysis of GWAS and age-associated diseases


Norman E. Sharpless, MD, The Lineberger Comprehensive Cancer Center, CB #7295, Departments of Medicine and Genetics, The University of North Carolina School of Medicine, Chapel Hill, NC 27599-7295, USA. Tel.: +(919) 966 1185; fax: +(919) 966 8212; e-mail: nes@med.unc.edu


Genome-Wide Association Studies (GWAS) offer an unbiased means to understand the genetic basis of traits by identifying single nucleotide polymorphisms (SNPs) linked to causal variants of complex phenotypes. GWAS have identified a host of susceptibility SNPs associated with many important human diseases, including diseases associated with aging. In an effort to understand the genetics of broad resistance to age-associated diseases (i.e., ‘wellness’), we performed a meta-analysis of human GWAS. Toward that end, we compiled 372 GWAS that identified 1775 susceptibility SNPs for 105 unique diseases and used these SNPs to create a genomic landscape of disease susceptibility. This map was constructed by partitioning the genome into 200 kb ‘bins’ and mapping the 1775 susceptibility SNPs to bins based on their genomic location. Investigation of these data revealed significant heterogeneity of disease association within the genome, with 92% of bins devoid of disease-associated SNPs. In contrast, 10 bins (0.06%) were significantly (P < 0.05) enriched for susceptibility to multiple diseases, 5 of which formed two highly significant peaks of disease association (P < 0.0001). These peaks mapped to the Major Histocompatibility (MHC) locus on 6p21 and the INK4/ARF (CDKN2a/b) tumor suppressor locus on 9p21.3. Provocatively, all 10 significantly enriched bins contained genes linked to either inflammation or cellular senescence pathways, and SNPs near regulators of senescence were particularly associated with diseases of aging (e.g., cancer, atherosclerosis, type 2 diabetes, glaucoma). This analysis suggests that germline genetic heterogeneity in the regulation of immunity and cellular senescence influences the human healthspan.


A central tenet of gerontology is that common pathogenic mechanisms cause age-related phenotypes in disparate organs and tissues. For example, telomere dysfunction in the liver, bone marrow, and lung has been linked to age-associated, tissue-specific diseases such as cirrhosis, aplastic anemia, and pulmonary fibrosis, respectively (Armanios, 2012). Several broad pathways have been suggested as candidate global modifiers of human aging including sirtuins, insulin/IGF-1, ROS metabolism, inflammation, and cellular senescence. A prediction of the notion that common pathogenic pathways contribute to aging of distinct tissues is that there should be genes whose expression modulates these pathways, and heterogeneous expression of such genes within a population should be associated with multiple, seemingly distinct tissue-specific diseases.

High-density single nucleotide polymorphism (SNP) arrays have provided population geneticists with a high-throughput method for identifying polymorphisms associated with the onset of complex phenotypes (e.g., physiological traits and/or markers, congenital abnormalities, and disease susceptibility/resistance). Large-scale, population-based studies that use SNP arrays to gain insight into genes that may promote or cause a complex phenotype comprise Candidate Gene Association Studies (CGAS) and Genome-Wide Association Studies (GWAS). The key difference between these two epidemiological study methods is that CGAS take a hypothesis-driven approach, whereas GWAS are performed in a non-biased manner (see the review by Jorgensen et al., 2009, for a more detailed discussion of the advantages and disadvantages of these methodologies). Moreover, modern pedigree studies (linkage analyses) can use SNP arrays to perform genome-wide searches for variants associated with complex diseases, such as Alzheimer’s disease (Zuchner et al., 2008), but variants identified from such efforts may be limited to small numbers of actual cases (i.e., individual families).

GWAS have been successfully employed to identify common polymorphic variations that contribute to several complex phenotypes. The value of GWAS is underscored by the ready identification of risk alleles that have been replicated in independent populations; these alleles have pointed to both novel and known modulators of disease pathogenesis and have revealed new therapeutic targets (Altshuler et al., 2008). Moreover, the National Human Genome Research Institute (NHGRI) maintains a catalog of published GWAS that currently houses approximately 1000 studies linking >4500 SNPs to >500 phenotypes (Hindorff et al., 2011). In an effort to understand what GWAS tell us about diseases of human aging, we performed a meta-analysis of the NHGRI GWAS catalog. In particular, we used this resource to ask in an unbiased, genome-wide manner whether there are ‘hotspot’ loci associated with multiple disease susceptibility/resistance phenotypes. Toward that end, we filtered this NHGRI data set to include only studies that focused on clinically relevant human diseases. To better visualize chromosomal loci and candidate genes associated with multiple, distinct human diseases, especially age-associated diseases, we summed the frequency of disease-associated SNPs in 200 kb bins spanning the whole genome. While ‘age-related disease’ is clearly not the same thing as ‘aging’, we elected to focus this analysis on disease susceptibility given the tractability of many well-delineated diseases to GWAS, as opposed to the mixed results obtained for less discrete endpoints (e.g., longevity, frailty, etc.). We believe this approach is still of interest to gerontologists given that freedom from disease (wellness) is an essential determinant of healthspan.

To compile and filter GWAS that identified SNPs specific to human disease resistance/susceptibility, the complete June 29, 2011 release of the NHGRI GWAS database was downloaded from the NHGRI GWAS website (Hindorff et al., 2011). This release contained 932 published GWAS that identified 4558 SNPs in 511 phenotypes, with each SNP achieving a combined P-value of < 1.0 × 10⁻⁵. Studies included in the catalog are also required to include at least 100 000 SNPs to permit a truly genome-wide analysis. Our analysis did not distinguish between ‘susceptibility’ SNPs and ‘protective’ SNPs, as each ‘susceptibility’ allele implies an alternative ‘protective’ allele at the same location. This data set was filtered to exclude small GWAS (< 300 cases) as well as those that investigated non-disease traits, congenital deformities, and medical conditions of limited morbidity (e.g., restless leg syndrome). In rare instances (n = 14) where disease versus non-disease classification of a GWAS was not obvious, classification was performed with blinding to GWAS results (see Table S1 and Table S2 for included and excluded ‘diseases’). The inclusion or exclusion of these borderline conditions did not affect the conclusions of the analysis. The filtered GWAS data set consisted of 372 studies that identified 1775 SNPs associated with susceptibility/resistance to 105 unique human diseases. Together, these 372 studies included more than 2.3 million individuals from diverse ethnic backgrounds.
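The filtering criteria described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the study: the `Association` record, its field names, and the `is_disease` flag (standing in for the manual disease versus non-disease classification) are all assumptions introduced here, and the real catalog file has a different layout.

```python
from dataclasses import dataclass

MIN_CASES = 300      # studies with fewer cases are treated as "small GWAS"
MAX_P = 1.0e-5       # catalog-wide P-value criterion quoted in the text


@dataclass(frozen=True)
class Association:
    """One SNP-phenotype association (hypothetical record layout)."""
    study_id: str
    phenotype: str
    is_disease: bool   # assumed manual label: clinically relevant disease?
    n_cases: int
    snp_id: str
    p_value: float


def keep(a: Association) -> bool:
    """Apply the three criteria: disease phenotype, >= 300 cases, P < 1e-5."""
    return a.is_disease and a.n_cases >= MIN_CASES and a.p_value < MAX_P


def filter_catalog(records):
    """Return only the associations that pass every criterion."""
    return [a for a in records if keep(a)]


def summarize(records):
    """Count distinct studies, SNPs, and diseases in a set of associations."""
    return {
        "studies": len({a.study_id for a in records}),
        "snps": len({a.snp_id for a in records}),
        "diseases": len({a.phenotype for a in records}),
    }
```

Applied to the full catalog release, a summary of this kind would be expected to correspond to the study, SNP, and disease totals reported above, provided the manual labels match those in Tables S1 and S2.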

This data set was then used to construct the genome-wide disease susceptibility map (Fig. 1). The hg19 release of the human genome was divided into 15 157 bins, with each bin containing 200 kb of genomic sequence (see Table S3 for genomic coordinates and hits associated with each bin). The analysis was not sensitive to the choice of bin size. SNPs from the filtered GWAS data set were mapped to the binned genome, with redundant hits of the same disease to the same bin counted as a single hit. For example, when SNPs for a broad disease category and for a more specific disease state within it (e.g., inflammatory bowel disease (IBD) and ulcerative colitis) mapped to the same bin, they were counted as a single disease hit for that bin. Conversely, if only specific forms of IBD mapped to a bin, each was counted as an individual disease hit for that bin. This approach allowed studies that identified distinct effects on disease subtypes to be included, without overrepresenting studies focused on identifying SNPs associated with disease categories. Distinguishing disease subtypes (e.g., of IBD) had minimal impact on our findings, and only shifted one locus, 17q12, above the significance threshold. The number of unique disease associations per bin was then graphed versus chromosomal location in a ‘Manhattan plot’ (Fig. 1), and a 10 000 iteration permutation analysis was performed on the mapped SNPs to estimate statistical significance.
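The binning and counting rules can be illustrated with a short sketch. It is not the authors' implementation: the function names, the `(chromosome, position, disease)` tuple layout, and the `parent_of` mapping from a disease subtype to its umbrella category are assumptions made for this example, and positions are assumed to be hg19 base-pair coordinates.

```python
from collections import defaultdict

BIN_SIZE = 200_000  # 200 kb bins, as in the text


def bin_key(chrom, pos):
    """Identify a bin by chromosome and the index of its 200 kb window."""
    return (chrom, pos // BIN_SIZE)


def diseases_per_bin(hits, parent_of=None):
    """Count unique diseases per bin.

    hits      -- iterable of (chromosome, position, disease) tuples
    parent_of -- assumed mapping from subtype to umbrella disease,
                 e.g. {"ulcerative colitis": "IBD"}
    """
    parent_of = parent_of or {}
    by_bin = defaultdict(set)
    for chrom, pos, disease in hits:
        # A set absorbs repeated hits of the same disease in the same bin.
        by_bin[bin_key(chrom, pos)].add(disease)

    counts = {}
    for key, diseases in by_bin.items():
        # Drop a subtype only when its umbrella category is in the same bin;
        # subtypes appearing without the umbrella are counted individually.
        collapsed = {d for d in diseases if parent_of.get(d) not in diseases}
        counts[key] = len(collapsed)
    return counts
```

With this rule, a bin hit by IBD, ulcerative colitis, and psoriasis counts two diseases, whereas a bin hit only by ulcerative colitis and Crohn's disease also counts two, because the umbrella category is absent there.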

Figure 1.

 The genetic landscape of human disease. Manhattan plot depicting the number of unique human diseases per bin linked to disease susceptibility SNPs identified by GWAS. Each point represents a 200 kb bin ordered by chromosomal location. The dotted line represents the cutoff for statistical significance as determined by a 10 000 iteration permutation test (P < 0.05). The two highest peaks of disease association (P < 0.0001) are circled.

We elected to use a permutation test to estimate statistical significance, as this approach accounts for variation in the number of SNPs tested, our method of counting diseases in shared categories, and the multiple comparisons involved in assessing each bin for significance. Permutation testing is the gold standard for determining significance, provided that it is computationally tractable (Johnson et al., 2010). In each iteration of this test, all SNPs were randomly and independently assigned to the 15 157 bins that represent the whole genome, and the bin with the maximum number of randomly assigned SNPs was identified. Bins containing more than four unique disease-associated SNPs occurred in less than 5% (i.e., P < 0.05) of the 10 000 iterations performed, setting this as our threshold for significance (indicated by a dashed line on Fig. 1). Although it is possible that not all of the more than 15 000 bins are assayed equally well by GWAS, the NHGRI inclusion criterion requiring at least 100 000 mapping SNPs indicates that the large majority of the genome is covered in these analyses, and the major conclusions of the study remain significant even if the permutation analysis is restricted to a small fraction of the genome.
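A maximum-statistic permutation test of the kind described can be sketched as follows. This is a simplified illustration under stated assumptions rather than the study's code: each hit is represented only by its disease label, bins are plain integer indices, the disease-category collapsing is omitted, and the P-value is the plain fraction of iterations (some authors add one to numerator and denominator). The function names and the `seed` parameter are introduced here for the example.

```python
import random
from collections import defaultdict


def null_max_counts(diseases, n_bins, n_iter, seed=0):
    """Null distribution of the maximum unique-disease count in any bin.

    diseases -- non-empty list with one disease label per mapped SNP
    Each iteration drops every SNP into a uniformly random bin and
    records the largest number of distinct diseases found in one bin.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    maxima = []
    for _ in range(n_iter):
        bins = defaultdict(set)
        for disease in diseases:
            bins[rng.randrange(n_bins)].add(disease)
        maxima.append(max(len(s) for s in bins.values()))
    return maxima


def p_value(observed, maxima):
    """Fraction of iterations whose maximum reaches the observed count."""
    return sum(m >= observed for m in maxima) / len(maxima)


def threshold(maxima, alpha=0.05):
    """Smallest per-bin disease count whose P-value falls below alpha."""
    k = 1
    while p_value(k, maxima) >= alpha:
        k += 1
    return k
```

Because each iteration records only the genome-wide maximum, comparing an observed bin against this distribution already accounts for testing every bin. With the study's dimensions (1775 SNPs, 15 157 bins, 10 000 iterations) a pure-Python loop of this kind involves roughly 18 million random assignments, so a vectorized implementation may be preferable in practice.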

This analysis revealed substantial heterogeneity in the human genome with regard to disease susceptibility. The majority of bins (13 900 of 15 157; 92%) did not contain any disease-associated SNPs. In contrast, only ten bins (2 Mb or 0.06% of the genome) showed statistically significant enrichment (P < 0.05) for disease association, with two strong ‘peaks’ (P < 0.0001) of multi-disease association. The largest peak spanned four neighboring bins (800 kb) that contain the gene-rich MHC locus on chromosome 6p21 (Fig. 1). SNPs in these bins were linked to 24 unique diseases, most of which were autoimmune in nature (e.g., asthma, inflammatory bowel disease, lupus, Hodgkin’s disease; Table 1), and not classical diseases of aging. This finding confirms the well-established pathogenic role of MHC polymorphisms in the development of diverse autoimmune diseases (Fernando et al., 2008; Rioux et al., 2009). Thus, while the association of the MHC locus with autoimmune diseases is not surprising, this finding serves as a positive control for the analysis.

Table 1.   Chromosomal loci of significantly enriched (‘hotspot’) bins
Chromosome region | No. unique diseases/bin | Candidate gene(s) | Associated disease susceptibilities
1p31.3 | 5 | IL23R | Immune: IBD (x2), Behcet’s disease, Psoriasis, Ankylosing spondylitis
2p16.1 | 5 | REL | Immune: IBD, RA, Psoriasis, Celiac Disease, Hodgkin’s Lymphoma
5p15.33 | 5 | TERT | Senescence: Cancers (x5), Idiopathic pulmonary fibrosis
6p21 | 26 across four bins | MHC, NOTCH4 | Immune: Arthritis (x4), IBD (x2), Cancer (x5), Lupus, MS, Scleroderma, Celiac disease, T1DM, Asthma, Primary biliary cirrhosis, Psoriasis
7q32.1 | 5 | IRF5, TNPO3 | Immune: SLE, IBD, RA, Primary biliary cirrhosis, Scleroderma
9p21.3 | 10 | p15INK4b, p16INK4a, p14ARF, ANRIL | Senescence: MI, stroke, T2DM, Glaucoma, Aortic aneurysm, intracranial aneurysm, Cancers (x3), Endometriosis
17q12 | 5 | IKZF3, GSDMA, GSDMB | Immune: IBD (x2), Asthma, RA, T1DM

The second highest disease susceptibility association peak mapped to a gene-poor bin on chromosome 9p21.3. This bin contains only four transcripts emanating from the INK4/ARF (or CDKN2a/b) locus, which harbors three related protein-encoding transcripts (p15INK4b, p16INK4a, and p14ARF) as well as a long non-coding RNA (ANRIL) that is anti-sense to p15INK4b. The INK4/ARF locus is a key mediator of cellular senescence that inhibits cell cycle progression from G1 to S phase in response to various forms of cellular stress (Sharpless & DePinho, 2007). The 9p21.3 bin was linked to 10 unique diseases, almost all of which are age-associated: cancers (e.g., breast, glioblastoma), type 2 diabetes mellitus (T2DM), glaucoma and several atherosclerotic diseases (e.g., stroke, aortic aneurysm, myocardial infarction) (Table 1). It is worth noting the considerable size of these two peaks: the 6p21 and 9p21.3 disease susceptibility hotspots represent 0.03% of the genome, but combined were associated with nearly a third (34 of 105) of the unique diseases analyzed by GWAS.

The remaining five bins (1p31.3, 2p16.1, 5p15.33, 7q32.1, and 17q12) that were significantly enriched for disease associations (P < 0.05) were also directly linked to either immunity/inflammation or cellular senescence pathways. The 1p31 and 2p16 bins contain IL23R and REL, respectively, which modulate immunity and lymphocyte biology, and these bins were predominantly associated with autoimmune disease (Table 1). The 5p15.33 bin includes TERT, a critical subunit of telomerase, which is associated with cellular senescence through its modulation of telomere length (Martinez & Blasco, 2011). Disease susceptibilities mapping to the 5p15.33 bin were mainly cancers, consistent with the association between telomere length and cancer susceptibility (Table 1) (Hills & Lansdorp, 2009; Willeit et al., 2010). The 5p15.33 bin was also associated with idiopathic pulmonary fibrosis (IPF), consistent with the finding of increased IPF in patients with congenital telomerase deficiency (Armanios et al., 2007; Tsakiri et al., 2007). Candidate genes in the 7q32 and 17q12 bins are less obvious, but these loci were also solely associated with autoimmune or inflammatory diseases (Table 1), suggesting these bins harbor modulators of the immune response. In general, the five loci associated with immunity and inflammation were mostly associated with autoimmune diseases (e.g., T1DM, asthma, IBD, Hodgkin’s disease) and were not as strongly linked to age-associated diseases (i.e., cancers, atherosclerosis, T2DM, glaucoma, pulmonary fibrosis) as the two bins associated with senescence.

Although all loci associated with broad disease resistance appeared related to effects on immunity or senescence, there are limitations to this analysis. First, cis-regulatory elements can act over a large genomic scale (e.g., several Mb); for example, 9p21.3 variants have been suggested to influence expression both of the nearby tumor suppressor proteins of the INK4a/ARF locus and of IFNa-21, a more distant (approximately 1 Mb) regulator of inflammation (Liu et al., 2009; Harismendy et al., 2011). Likewise, another gene in the 5p15.33 bin, CLPTM1L, has also been postulated to contribute to cancer progression (McKay et al., 2008). Therefore, the true causal variant located near GWAS-identified SNPs may influence expression of one or more local transcripts, some or all of which may not be located in the same bin. Moreover, an ascertainment bias exists in that certain well-demarcated disease states (e.g., autoimmune diseases) appear more tractable to GWAS than less clinically distinct entities (e.g., community-acquired pneumonia). Therefore, not all morbid conditions of aging are tractable to GWAS.

Importantly, the prevalence and morbidity of each disease were not weighted in this study. For example, scleroderma (rare) and myocardial infarction (common) were each counted as a single, unique disease per genomic bin, despite differing greatly in their total contribution to human morbidity. Additionally, this analysis does not account for SNP prevalence or scale of their effect. Future work could incorporate these factors to estimate the multi-disease population attributable risk associated with certain SNP genotypes. As a result of these limitations, this analysis may overestimate the importance of the MHC locus, which is strongly associated with several rare diseases. By contrast, it may underestimate the relevance of the 9p21.3 bin, which is associated with common, highly morbid diseases (Table 1).

Although the association of senescence regulators such as TERT and p16INK4a with cancer and the MHC locus with autoimmunity is not surprising, the finding that all identified hotspots of recurrent disease association map to bins linked to either inflammation/immunity or cellular senescence is striking. The diversity of age-related diseases associated with the 9p21.3 bin is particularly remarkable (Fig. 2). Of the four principal causes of age-related morbidity (neoplasia, metabolic disease, atherosclerosis, and neurodegeneration), three are recurrently associated with polymorphisms mapping near the INK4/ARF locus by GWAS (Fig. 2). Recently, even the outlier, neurodegenerative disease, has been linked to this locus based on a genome-wide pedigree study of late onset Alzheimer’s disease (Zuchner et al., 2008). While it remains unclear how modulating senescence may contribute to some diseases in the 9p21.3 bin, this finding is consistent with several recent murine studies showing an effect of modulating p16INK4a expression in vivo on many non-malignant, age-associated phenotypes including T2DM, atherosclerosis, T-cell function, cataracts, and sarcopenia (Krishnamurthy et al., 2006; Baker et al., 2011; Chen et al., 2011; Kuo et al., 2011; Liu et al., 2011).

Figure 2.

 The INK4/ARF Locus is a Genomic Hotspot for Age-Associated Disease Susceptibility. The primary transcripts within the human INK4/ARF locus including the long non-coding RNA, ANRIL, are depicted relative to chromosome 9. Disease susceptibility SNPs identified by GWAS that mapped to the INK4/ARF locus are represented by color-coded circles above the genomic ruler. Susceptibility SNPs displayed below the ruler were identified by non-GWAS such as genome-wide pedigree studies. Each color represents a specific human disease that is described in the SNP color key. Clusters of susceptibility SNPs associated with a specific disease are marked by large color-coded ovals. ASVD = Atherosclerotic vascular diseases and T2DM = Type 2 Diabetes Mellitus.

It is also worth noting what was not associated with broad disease susceptibility: conserved pathways that modulate longevity in model organisms (e.g., insulin/IGF-1 signaling, mTOR signaling, reactive oxygen species signaling, sirtuins, etc.). It is possible that regulation of these pathways is not variable among human populations or that these pathways do not modulate general disease resistance in humans, but we think it more likely that this observation reflects a lack of power of the GWAS meta-analysis approach. Indeed, SNPs near IGF1R, FOXO3A, and AKT1 have been associated with longevity in candidate studies and pedigree analyses (Suh et al., 2008; Pawlikowska et al., 2009; Sebastiani et al., 2012), suggesting that an association of these loci with age-associated conditions may emerge in genome-wide analyses with further study. Nonetheless, this unbiased meta-analysis of results from approximately 2.3 million patients identifies only polymorphic regulation of cellular senescence and immunity as general determinants of genetic susceptibility to a host of human diseases, with a particularly striking association of senescence with age-associated disease. These genetic data support the therapeutic targeting of these specific pathways to promote broad disease resistance and augment the human healthspan.


This work was supported by grants from the Burroughs Wellcome Foundation and the NIA.

Author contributions

WRJ contributed to the study design, analysis and interpretation of data, preparation of figures and tables, and writing and editing the manuscript. APS contributed to the study design, collection and interpretation of data, preparation of figures and tables, and writing and editing the manuscript. NES conceived the study, contributed to the study design, interpretation of data, and writing and editing the manuscript. The authors have no conflict of interest to report.