The largest, most comprehensive GWA study to date was carried out by the Wellcome Trust Case-Control Consortium (WTCCC). The WTCCC study used the Affymetrix 500 K GeneChip and examined 3000 shared controls and 2000 cases for each of seven common complex diseases, with all participants of UK Caucasian ancestry. The seven diseases were bipolar disorder, coronary artery disease, Crohn’s disease, hypertension, rheumatoid arthritis, type 1 diabetes and type 2 diabetes. The study identified 25 independent association signals at a stringent level of significance (P < 5 × 10−7). Association signals were identified for all diseases except hypertension, where the strongest signal had P = 7.7 × 10−7. This initial analysis of the WTCCC study therefore doubled the number of known complex disease genes. However, the WTCCC was primarily a hypothesis-generating study, with only the ‘low hanging fruit’ being convincingly identified in this ‘first pass’ analysis: many of the SNPs with P-values >5 × 10−7 will also be disease-causing variants. As will be described in the sections below, follow-up studies in sufficiently powered replication cohorts, and the combination of findings from several GWA scans, have confirmed many other complex disease variants.
Type 2 diabetes
It is perhaps the results of the first type 2 diabetes genome-wide scans [18–22] that best illustrate the power of the GWA approach for identifying novel genes that are important in the aetiology of complex disease. Following up the results from the initial WTCCC GWA scan, we worked closely with the DGI and FUSION groups, who had performed similar studies. We used the combined information from these three studies to prioritize variants for follow-up. Including replication samples, these three studies provided data from 14 586 cases and 17 968 controls. The combined analyses identified three entirely novel type 2 diabetes susceptibility genes: CDKAL1 [cyclin-dependent kinase 5 (CDK5) regulatory subunit-associated protein 1-like 1; OR 1.12, combined P = 4.1 × 10−11], IGF2BP2 (insulin-like growth factor 2-binding protein 2; OR 1.14, combined P = 8.6 × 10−16) and the CDKN2A/CDKN2B (cyclin-dependent kinase inhibitor 2A/B) gene region (OR 1.20, combined P = 7.8 × 10−15). They also demonstrated that integrating the results from multiple genome scans can aid the prioritization of signals for replication, and can allow confirmation of genes at levels of statistical confidence that are not attainable with individual GWA studies.
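The gain in statistical confidence from combining scans can be illustrated with a minimal sketch of a fixed-effect, inverse-variance combination of per-study log odds ratios. This is a standard textbook approach and is offered only as an illustration; it is not necessarily the procedure used in the combined analyses described above. The function name `combine_fixed_effect` and the three per-scan estimates are hypothetical and are not taken from any of the studies.

```python
import math


def combine_fixed_effect(studies):
    """Inverse-variance fixed-effect combination of per-study odds ratios.

    studies: sequence of (odds_ratio, standard_error_of_log_odds_ratio).
    Returns (combined_odds_ratio, z_score, two_sided_p_value).
    """
    studies = list(studies)
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for _, se in studies]
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in studies]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return math.exp(pooled), z, p


# Hypothetical per-scan estimates for one SNP (illustrative values only,
# not taken from the WTCCC, DGI or FUSION scans).
scans = [(1.12, 0.040), (1.15, 0.050), (1.10, 0.045)]

combined_or, combined_z, combined_p = combine_fixed_effect(scans)
single_scan_p = [combine_fixed_effect([scan])[2] for scan in scans]

print(f"combined OR = {combined_or:.3f}, P = {combined_p:.2e}")
print("single-scan P values:", [f"{p:.2e}" for p in single_scan_p])
```

With consistent effect directions across scans, the pooled standard error shrinks as studies are added, so a modest odds ratio that is unconvincing in any single scan can reach a much smaller combined P-value.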
Other type 2 diabetes GWA studies have also been published. The deCODE study of several European populations and a Chinese population replicated the association of the CDKAL1 variant (OR 1.20 in Europeans and 1.25 in Chinese). These four studies also confirmed the association of variants near the HHEX (homeobox, hematopoietically expressed) and SLC30A8 [solute carrier family 30 (zinc transporter), member 8] genes, originally published by Sladek et al. Importantly, as a positive control, associations for variants in PPARG, KCNJ11 and TCF7L2, originally identified through candidate gene and positional cloning methods, were also seen in the GWA scans, with the expected odds ratios.
Of the variants identified through the GWA approach, the two near the CDKN2A/CDKN2B genes are particularly interesting. CDKN2A encodes p16INK4a and is a known tumour-suppressor gene. Mutations of CDKN2A cause diverse neoplasias. Its product is an inhibitor of cyclin-dependent kinase 4 (CDK4), which is important for beta-cell replication. Overexpression of Cdkn2a in mice leads to decreased islet proliferation, whilst Cdkn2a knockout mice demonstrate enhanced islet proliferation and survival after beta-cell ablation. Overexpression of Cdkn2b causes islet hypoplasia and diabetes in murine models. Together with the CDKAL1 association, the CDKN2A/B finding implicates the cyclin-dependent kinase pathway in the pathophysiology of type 2 diabetes.
Another interesting feature of the CDKN2A/B finding is that, as described below, variants in the CDKN2A/B region have also recently been shown to predispose to myocardial infarction (MI). Determining why the same region predisposes to both type 2 diabetes and heart disease may help to explain the link between these two disorders.
The CDKN2A/B finding also highlights the power of GWA studies to identify variants outside described genes: whilst one of the signals occurs in the CDKN2A/B region, the other (much stronger) association signal occurs >200 kb from these genes, in a gene desert. This association would not have been picked up by a candidate gene approach. Identifying the mechanism by which this variant (presumably) affects CDKN2A/B expression will provide new insights into the regulation of these important genes.
The other newly identified type 2 diabetes genes are generally involved in beta-cell development and function, and in insulin secretion [18–20]. For example, the HHEX gene is highly expressed in foetal and adult pancreas, and is implicated in pancreatic development [28, 29]. It is a target of the WNT signalling pathway, which has been shown to be critical for the development of the pancreas and islets during embryonic growth. Notably, TCF7L2 also has a key role in WNT signalling, acting as a nuclear receptor for β-catenin. Together, these findings highlight the importance of the WNT signalling pathway in glucose homeostasis.
In addition to the newly identified type 2 diabetes genes described above, the WTCCC study found strong association with the FTO (fat mass and obesity associated) gene region (OR 1.27, P = 2.0 × 10−8). This signal, the strongest outside TCF7L2, showed strong replication in a further 3757 type 2 diabetes cases and 5346 controls from the UK (OR 1.22, P = 5.4 × 10−7). However, the lack of such strong association in the DGI study, which matched cases and controls for BMI, and in the FUSION study, where there were minimal BMI differences between cases and controls, suggested that the association with type 2 diabetes was caused by a primary effect on adiposity. Indeed, adjustment for BMI in the UK replication samples abolished the type 2 diabetes association (OR 1.03, P = 0.44). This exciting observation led to a study of the association of FTO gene variation with BMI and with the risk of being overweight and obese in an additional 19 424 adults and 10 172 children, all of white European origin. In the combined data set, each additional copy of the rs9939609 risk allele (A) is associated with a BMI increase of approximately 0.4 kg m−2 (P = 3 × 10−35). Individuals homozygous for the A allele (16% of the population) are at a substantially increased risk of being overweight (OR 1.38, P = 4 × 10−11) and obese (OR 1.67, P = 1 × 10−14) compared to those homozygous for the low-risk T allele (37% of the population). This association was observed in children at ages 7–11, but not at birth, and reflects a specific increase in fat mass.
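As a rough consistency check on the genotype figures quoted above, the following sketch derives allele frequencies from the two homozygote frequencies, assuming Hardy–Weinberg proportions, and compares the expected heterozygote frequency with the remainder of the population. It also works out the BMI difference between the two homozygote groups under a purely additive per-allele effect. Both the Hardy–Weinberg and the additive assumptions are made here for illustration; the function name is hypothetical.

```python
import math


def allele_freqs_from_homozygotes(freq_hom_a, freq_hom_t):
    """Allele frequencies implied by homozygote frequencies under
    Hardy-Weinberg proportions (frequency of AA = p**2, of TT = q**2)."""
    return math.sqrt(freq_hom_a), math.sqrt(freq_hom_t)


# Genotype frequencies quoted in the text for rs9939609.
FREQ_AA = 0.16  # homozygous for the risk allele A
FREQ_TT = 0.37  # homozygous for the low-risk allele T

p_a, q_t = allele_freqs_from_homozygotes(FREQ_AA, FREQ_TT)

# Heterozygotes: expected under Hardy-Weinberg versus what is left over.
expected_het = 2.0 * p_a * q_t
remaining_het = 1.0 - FREQ_AA - FREQ_TT

# Additive model (an assumption): AA carries two more A alleles than TT.
PER_ALLELE_BMI = 0.4  # kg m^-2 per copy of A, as quoted in the text
bmi_diff_aa_vs_tt = 2 * PER_ALLELE_BMI

print(f"A allele frequency ~ {p_a:.2f}, T allele frequency ~ {q_t:.2f}")
print(f"heterozygotes: expected {expected_het:.2f}, remaining {remaining_het:.2f}")
print(f"AA versus TT BMI difference ~ {bmi_diff_aa_vs_tt:.1f} kg m^-2")
```

The two implied allele frequencies sum to approximately one, and the expected heterozygote frequency is close to the remaining 47% of the population, so the quoted figures are mutually consistent under these assumptions.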
FTO is a gene of unknown function in an unknown pathway. It seems to be widely expressed in both foetal and adult tissues, with highest levels in the brain. One possibility, therefore, is that FTO is an important regulator of appetite. This would be consistent with the role of monogenic obesity genes, such as leptin, but much work is needed to determine whether this is the case. It is clear, though, that understanding how variants of the FTO gene increase fat mass will lead to the identification of a new obesity pathway, with implications for drug development and treatments.
Age-related macular degeneration
Age-related macular degeneration (AMD), the main cause of blindness in developed countries, is a chronic, common and complex disease characterized by progressive destruction of the retina’s central region and drusen formation behind the retina (reviewed in Ref. ). Currently, there is no broadly effective therapy available. The major environmental risk factor for AMD is smoking (smokers have up to a 2.5-fold higher risk of AMD than nonsmokers) [34, 35]. One of the first published genome-wide case–control studies was by Klein and colleagues. Using the relatively sparse Affymetrix 100 K chip (Affymetrix Inc.), they identified a common variant in the complement factor H gene (CFH) as the SNP most strongly associated with AMD. Although this was a small study (96 cases and 50 controls), it increased its power by using enriched samples: severe AMD cases, and older controls who were less likely to go on to develop AMD. A more recent case–control candidate gene study replicated the association of the CFH gene, and confirmed that individuals homozygous for the most strongly associated risk allele have an over sevenfold higher risk of AMD than those homozygous for the nonrisk allele. Human CFH is a regulator of the innate complement system, which responds to infection and normally attacks only diseased cells. Observations of activated complement components within drusen of AMD patients, and of strong effects of smoking and age on CFH plasma levels, suggest that AMD may result from abnormal complement activation in an anomalous inflammatory response. Although the CFH polymorphisms are noncoding, they may alter the binding of CFH to heparin and C-reactive protein. Furthermore, as CFH is a member of the complement and coagulation cascade pathway, these findings highlight several different complement and coagulation factors as potential drug targets and justify further research.
Crohn’s disease
Crohn’s disease, most commonly affecting the ileum and colon, is a common form of idiopathic inflammatory bowel disease (IBD). A genetic predisposition is supported by twin studies showing a concordance rate of 50% in monozygotic pairs compared to 10% in dizygotic pairs. Previously, years of research effort involving linkage, candidate gene and targeted association studies identified only two genuinely associated variants, in the CARD15 gene and the IBD5 haplotype. A recent GWA study by Rioux and colleagues identified and replicated several new susceptibility loci for ileal Crohn’s disease. The most strongly associated SNP, independently identified by a smaller German study, was a nonsynonymous change in the ATG16 autophagy-related 16-like 1 (ATG16L1) gene. The risk allele is the major allele (it has a frequency of about 52% and 60% in controls and cases, respectively), and individuals carrying one copy are at a 35–45% higher risk of developing the disease than those carrying no ATG16L1 risk alleles [38, 39]. This SNP is in strong LD (r² = 0.97) with the strongest signal in the WTCCC scan for Crohn’s disease. Autophagy is a constitutive biological process involved in immune pathogen recognition, and the variants in the ATG16L1 gene may alter innate immune control or antigen presentation in the adaptive immune pathways. The WTCCC study identified four novel association signals, all of which have since been replicated. These map to the IRGM (immunity-related guanosine triphosphatase), MST1 (macrophage stimulating 1), NKX2-3 (NK2 transcription factor related, locus 3) and PTPN2 (protein tyrosine phosphatase, nonreceptor type 2) gene regions. These novel findings highlight that defects in a number of components of innate and adaptive immune pathways, such as those in autophagy and the processing of phagocytosed bacteria, are a major cause of Crohn’s disease.
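The ATG16L1 allele frequencies and the quoted per-copy risk can be related with a short worked calculation. The sketch below computes the allelic odds ratio directly from the case and control allele frequencies given above; the function name is hypothetical, and treating the allelic odds ratio as the per-copy increase in risk assumes a roughly multiplicative effect of each allele.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds ratio for carrying the risk allele on a given chromosome,
    computed from its frequency in cases and in controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Approximate ATG16L1 risk-allele frequencies quoted in the text.
or_atg16l1 = allelic_odds_ratio(freq_cases=0.60, freq_controls=0.52)

print(f"allelic odds ratio ~ {or_atg16l1:.2f}")
```

Frequencies of 60% in cases and 52% in controls give an allelic odds ratio of about 1.38, which falls within the 35–45% increase in risk per copy reported for this variant.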
Prostate cancer
Until recently, the only risk factors established for prostate cancer, the most prevalent noncutaneous malignancy in males in developed countries, were family history and African-American ethnic background. Two genome-wide case–control association studies have now both confirmed a known associated variant and separately identified new independent risk variants in the linked 8q24 region [41, 66]. Yeager et al. estimate that their new variant and a previously reported variant together confer an over threefold risk of disease in double homozygotes for the risk alleles compared to double homozygotes for the protective alleles, giving a combined population attributable risk (PAR) of 27%, whilst Gudmundsson et al. estimate a combined PAR of 13% in European populations for their two variants. As none of the independent 8q24 signals lie in known genes, it is possible that there are several unknown prostate cancer susceptibility genes in the region. Alternatively, the risk variants may independently affect the regulation of genes outside the linked region, such as the nearby proto-oncogene MYC, by making the whole region prone to somatic amplification, a common event in prostate tumours.
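To show how a PAR depends on both genotype frequency and relative risk, the sketch below applies Levin's formula, generalized to several at-risk genotype classes. This is a standard formulation and is not necessarily the one used by the cited studies; the function name, the genotype frequencies and the relative risks in the example are hypothetical. Odds ratios from case–control data approximate relative risks only when the disease is uncommon.

```python
def population_attributable_risk(at_risk_classes):
    """Levin's formula for several at-risk classes.

    at_risk_classes: sequence of (population_frequency, relative_risk),
    one entry per genotype class other than the reference class.
    PAR = sum(f * (RR - 1)) / (1 + sum(f * (RR - 1))).
    """
    excess = sum(freq * (rr - 1.0) for freq, rr in at_risk_classes)
    return excess / (1.0 + excess)


# Hypothetical variant: risk-allele frequency 0.5 under Hardy-Weinberg
# proportions, with a multiplicative relative risk of 1.3 per copy.
heterozygotes = (0.50, 1.30)
risk_homozygotes = (0.25, 1.30 ** 2)

par = population_attributable_risk([heterozygotes, risk_homozygotes])

print(f"PAR ~ {par:.1%}")
```

Because the PAR weights the excess risk by how common each genotype is, a frequent allele with a modest relative risk can account for a larger share of cases in the population than a rare allele with a strong effect.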
Coronary heart disease
Coronary heart disease (CHD), including MI, has reached epidemic proportions worldwide and is the major cause of mortality in Western countries. Two GWA studies seem to have discovered the major genetic risk factor [44, 45]. Both studies identified a strong signal on chromosome 9p21, close to the CDKN2A and CDKN2B genes (MI association P = 1.2 × 10−20). Homozygotes for the risk allele (20–25% of Caucasians) are 64% more likely to suffer a heart attack and have up to a 40% higher risk of CHD than homozygotes for the nonrisk allele. In the study by Helgadottir et al., each copy of the associated variant reduced the age at onset of MI by approximately 1 year (P = 2.9 × 10−7), and the variant had a PAR of 21% (31% for early onset cases). Resequencing of the associated 58-kb interval identified a copy number variation in a putative noncoding RNA of unknown function, suggesting that specific variation in the transcript expression or function may predispose to heart disease.
Although the relative risks explain only a small proportion of the familial clustering of the disease, this susceptibility locus is the same one found to associate with type 2 diabetes, as described above. It is possible that the risk allele is located within a regulatory element that controls the expression of a gene outside the associated region, or that the functional variant itself lies outside the associated region, in a part of the genome that was not well covered by the genotyping platforms used in the studies. Therefore, fine mapping and further studies of this region are needed to find the causative variant(s) for both cardiovascular disease and type 2 diabetes, potentially providing the same drug target for two common diseases.