For Mendelian disorders, linkage-based approaches have been extremely successful, with thousands of gene loci and specific mutations identified over the last 30 years. By contrast, progress in common multifactorial disease without a clear Mendelian pattern of inheritance was slow. Linkage did yield some notable successes, such as the role of NOD2 in Crohn’s disease, but was not a tractable approach in the vast majority of cases. Similarly, candidate gene analysis, while fruitful in some instances such as APOE ε4 in Alzheimer’s disease [11, 12] or factor V Leiden in venous thrombosis, was in most cases unsuccessful or yielded associations that failed to replicate. The common disease: common variant hypothesis became tractable to test with the advent of affordable high-throughput genotyping and with growing insights into the nature and coinheritance of genetic variation across different populations through large collaborative studies such as the International HapMap Project. This set the scene for GWAs, in which informative common biallelic genetic markers could be genotyped in thousands of cases and controls to look for evidence of association. One minor note on terminology: single nucleotide variants (SNVs) include single nucleotide substitutions, which are also referred to as single nucleotide polymorphisms (SNPs) when both alleles are present in the human population at a frequency of >1%. Rare SNVs may be defined by a minor allele frequency (MAF) of <1%. Of note, GWAs typically involve genotyping SNPs with a MAF of >5%, which means that neither rare variants nor less common variants (MAF 1–5%) are well captured.
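The frequency bands in the terminology note above can be written as a simple rule. The following is a minimal sketch, assuming allele counts are available for a biallelic site. The function names `minor_allele_frequency` and `classify_variant` are hypothetical, the 1% and 5% cut-offs are those quoted in the text, and the handling of values exactly on a boundary is a choice made here for illustration.

```python
def minor_allele_frequency(ref_count: int, alt_count: int) -> float:
    """Return the frequency of the less frequent allele at a biallelic site."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no alleles observed at this site")
    return min(ref_count, alt_count) / total


def classify_variant(maf: float) -> str:
    """Assign a variant to the frequency bands used in the text.

    Assumed boundaries: MAF < 1% is 'rare', 1-5% is 'less common',
    and > 5% is 'common' (the range typically genotyped in GWAs).
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "less common"
    return "common"


if __name__ == "__main__":
    # Illustrative counts only: 10 minor alleles among 2000 chromosomes.
    maf = minor_allele_frequency(ref_count=1990, alt_count=10)
    print(maf, classify_variant(maf))
```

Under these assumed boundaries, only variants in the "common" band would typically be represented on the genotyping arrays described above.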
Genome-wide association studies
By June 2011, 1449 genome-wide associations had been reported at a P value of <5 × 10⁻⁸ for 237 traits (http://www.genome.gov/gwastudies/) (Fig. 1). The results have been striking in terms of the strength of association, with many variants implicated with considerable statistical confidence for diseases ranging from type I diabetes [16, 17] to leprosy [18, 19] and cancer. However, the magnitude of effect of individual disease-associated variants was in almost all cases very modest, typically 1.2-fold, and the proportion of the estimated heritability explained by such variants was relatively low, ranging, for example, from 5% to 10% in type II diabetes to 25% in Crohn’s disease. It would be wrong, however, to interpret this as meaning that GWAs have been unsuccessful: for the first time, we have a substantial number of robustly replicated associations with common genetic markers, which are starting to be translated into risk modelling and prediction of clinical utility, notably in cancer [23, 24]. More evident are the new insights into disease pathogenesis that GWAs are providing, ranging from the role of complement factor H in age-related macular degeneration to Crohn’s disease, where the significance of autophagy [26–29] and IL23 signalling [28, 30] has been highlighted and is providing new targets for therapeutic intervention [31, 32].
Figure 1. Genome-wide association studies. (a) Number of published genome-wide association studies (GWAs) reporting at least one significant single-nucleotide polymorphism (SNP)-trait association by year to October 2011, as catalogued by the National Human Genome Research Institute (NHGRI) GWAS Catalog [Hindorff LA, MacArthur J, Wise A, Junkins HA, Hall PN, Klemm AK, and Manolio TA. A Catalog of Published Genome-Wide Association Studies. Available at: http://www.genome.gov/gwastudies. Accessed 10/30/2011]; (b) Schematic showing location of associated marker SNPs from GWAs by frequency; (c) Reported GWAs loci by disease classification.
The ‘missing heritability’ in common disease following GWAs has been the subject of much debate [33–35]. Some investigators have dubbed it the ‘dark matter’, underlining how elusive the basis of this heritable risk has proved; current views highlight the potential role of rarer variants with moderate or high magnitude of effect. GWAs have not interrogated such variants to date, and their analysis has become achievable through the increasing application of massively parallel sequencing as costs continue to fall (Fig. 2). Results anticipated over the next 12 months from ongoing studies involving large numbers of cases will be highly informative in the context of common disease. There may also be much potentially useful information within GWAs data sets in associated variants that fall just below the selected thresholds for statistical significance: mining such data and sifting the wheat from the chaff is challenging, and may be facilitated by increasing sample sizes (with a note of caution in terms of the cost benefit of doing so) and by using functional genomics and other approaches to inform this process. There are many other potentially relevant contributors to unexplained heritability: estimates of heritable risk may be inflated; epigenetic factors are increasingly recognized to be significant contributors to heritable risk (including parent-of-origin effects and environmental modulators); and gene–gene and gene–environment interactions have not yet been well characterized.
Figure 2. DNA sequencing cost of the human genome. The dramatic fall in sequencing costs is illustrated by data arising from sequencing centres funded by the NHGRI [Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Large-Scale Genome Sequencing Program. Available at: http://www.genome.gov/sequencingcosts. Accessed 10/30/2011].
The relentless pace of technological advances in our ability to detect and quantify genetic variation in a high-throughput manner now makes the analysis of rarer variants a feasible option at the whole exome and increasingly the whole genome level. While for common disease, the jury remains out on the relative importance of such variants, there is growing optimism that for rare ‘orphan’ diseases with very robust phenotypes such as primary immunodeficiencies and metabolic disorders, the potential is very great, while for Mendelian diseases, considerable success has already been reported [36–40].
The potential of whole exome sequencing was underlined in 2009 by data from sequencing four unrelated individuals with the rare autosomal dominant disorder Freeman–Sheldon syndrome, which resolved the known causal gene. Whole exome sequencing has since been used successfully to determine the genetic basis of a number of unresolved Mendelian disorders. These include autosomal dominant traits: for example, sequencing 10 unrelated probands resolved mutations of MLL2 as a major cause of Kabuki syndrome, while for Schinzel–Giedion syndrome, SETBP1 was implicated following whole exome sequencing of four unrelated individuals, which revealed de novo mutations involving this gene. Success has also been achieved for autosomal recessive diseases, as illustrated by whole exome sequencing of four individuals from three families with Miller syndrome, which defined DHODH as the disease gene, and for hyperphosphatasia mental retardation syndrome, where mutations in PIGV were resolved following whole exome sequencing of three siblings with validation in additional families. This latter work also highlighted the power of filtering regions based on identity by descent.
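One way to picture how sequencing a handful of unrelated affected individuals can point to a single gene is a gene-level intersection of candidate variants. The following is a minimal sketch, assuming that each individual's variant calls have already been reduced to rare, protein-altering changes absent from reference databases; the function name `genes_shared_by_cases`, the `min_cases` parameter and the gene labels are hypothetical, and this is not presented as the pipeline used in the studies cited above.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def genes_shared_by_cases(
    candidates_per_case: Iterable[Set[str]],
    min_cases: Optional[int] = None,
) -> Set[str]:
    """Return genes carrying a candidate variant in enough affected individuals.

    candidates_per_case: one set of gene symbols per affected individual.
    min_cases: number of individuals that must share the gene; defaults to
    all of them. A lower value allows for genetic heterogeneity.
    """
    cases = [set(genes) for genes in candidates_per_case]
    if not cases:
        return set()
    required = len(cases) if min_cases is None else min_cases
    # Count, for each gene, how many individuals carry a candidate in it.
    counts = Counter(gene for genes in cases for gene in genes)
    return {gene for gene, n in counts.items() if n >= required}


if __name__ == "__main__":
    # Placeholder gene labels, not real findings.
    cases = [
        {"GENE_A", "GENE_B"},
        {"GENE_A", "GENE_C"},
        {"GENE_A", "GENE_B"},
    ]
    print(genes_shared_by_cases(cases))
    print(genes_shared_by_cases(cases, min_cases=2))
```

Relaxing `min_cases` trades specificity for tolerance of heterogeneity, which may matter when not every affected individual has a mutation in the same gene.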
The value of whole exome sequencing using a family-based approach for sporadic disease was illustrated for 10 cases of unexplained severe mental retardation, in which case-parent trios were sequenced and de novo, likely pathogenic nonsynonymous SNVs were identified in seven of the affected individuals. For very specific phenotypes where extensive biochemical and functional data and validation are possible, whole exome sequencing of a single individual may be informative, as illustrated for a mitochondrial respiratory chain disorder in which a mutation involving ACAD9 (encoding acyl-CoA dehydrogenase 9) was identified and causally implicated in disease, with other mutations in the same gene identified in further cases.
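The trio design described above rests on a simple genotype comparison: a de novo variant is present in the affected child but absent from both parents. The following is a minimal sketch, assuming genotypes are coded as the number of alternate alleles (0, 1 or 2) with `None` for a missing call; the function names and variant identifiers are hypothetical, and real analyses would add quality filters that are not modelled here.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[int]  # alternate-allele count (0, 1, 2) or None if missing


def is_de_novo(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child carries an alternate allele that neither parent carries.

    Sites with any missing genotype are conservatively rejected.
    """
    if child is None or mother is None or father is None:
        return False
    return child > 0 and mother == 0 and father == 0


def de_novo_candidates(
    trio_calls: Dict[str, Tuple[Genotype, Genotype, Genotype]]
) -> List[str]:
    """Return the sorted ids of variants that look de novo in a trio.

    trio_calls maps a variant id to (child, mother, father) genotypes.
    """
    return sorted(v for v, gts in trio_calls.items() if is_de_novo(*gts))


if __name__ == "__main__":
    # Placeholder variant ids and genotypes for illustration only.
    calls = {
        "var1": (1, 0, 0),     # present in child only: de novo candidate
        "var2": (1, 1, 0),     # inherited from the mother
        "var3": (1, 0, None),  # father's genotype missing: rejected
        "var4": (0, 0, 0),     # absent from the child
    }
    print(de_novo_candidates(calls))
```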
The utility of whole exome sequencing for clinical diagnosis is also increasingly recognized. This is illustrated by a patient referred with renal disease in whom Bartter syndrome was suspected: sequencing revealed a mutation in SLC26A3, leading to the diagnosis of congenital chloride diarrhoea. A further case involving a young patient with intractable and atypical inflammatory bowel disease illustrates how the therapeutic implications can be significant. In this instance, whole exome sequencing revealed a mutation in XIAP, knowledge of which contributed to a clinical decision to carry out stem cell transplantation [46, 47].
The optimal strategy for applying whole exome and whole genome sequencing in Mendelian disease is still being resolved for different scenarios, while for common multifactorial traits, how to apply high-throughput sequencing approaches is a source of considerable debate. If families can be identified, then sequencing distantly related individuals within the pedigree, looking for cosegregation and testing specific implicated variants in large cohorts may be fruitful, while for other traits, studying individuals in the extreme tails of the phenotype distribution, particularly where there is a younger age of onset, is advocated. As costs continue to fall, however, whole genome sequencing of hundreds or thousands of cases and controls to identify all variants will be carried out.
The bioinformatic and analytical challenges such data sets represent should not be underestimated. The mapping and accurate calling of sequence and structural variants remain a very active area of development and research, in which further progress is being made and is urgently needed [50–54]. The amounts of data involved are prodigious and on a scale more commonly encountered in astrophysics. Even accepting that these challenges can be overcome, the subsequent analysis to narrow down the lists of potentially deleterious variants causing disease is challenging enough in Mendelian traits. Recent data from the 1000 Genomes Project have highlighted that, on average, each of us carries 250–300 loss-of-function variants in annotated genes and 50–100 variants previously associated with inherited disease.
There are several examples of common multifactorial traits where rare variants have been shown to play a significant role. Resequencing candidate genes identified through GWAs has been productive, with rare variants with large effects resolved in hypertriglyceridaemia, Crohn’s disease and type I diabetes. The latter highlighted rare variants in IFIH1, a gene encoding interferon induced with helicase domain 1, which is important in the recognition of RNA from picornaviruses and may be highly relevant given the link between enteroviruses and development of diabetes. Other candidate genes resolved through animal studies, such as SIAE (encoding the enzyme sialic acid acetylesterase), have revealed several functionally important rare variants associated with autoimmune disease. Further examples from autoimmune disease include the association of rare variants in the DNA exonuclease gene TREX1 with systemic lupus erythematosus.
For common traits, if rare variants play a role, lessons can be learned from Mendelian diseases: because many different mutations may result in a common phenotype, analysing association with the rare variants present at a given gene or locus collectively may be of value. Various analytical strategies are being advocated and have been reviewed elsewhere. A blurring of the distinction between common and Mendelian disease is apparent as we also appreciate the role of modifier genetic variants and the environment in observed penetrance and phenotypic heterogeneity in Mendelian disease, such that conditions such as sickle cell disease are viewed as complex multigenic disorders rather than monogenic disease. The role of modifier variants is highlighted by recent work in cystic fibrosis, where genome-wide association and linkage analysis have highlighted variation at chromosome 11p13 and 20q13.2, respectively, in modulating observed variation in the severity of lung disease among patients with two copies of loss-of-function CFTR alleles.
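The idea of analysing rare variants at a gene collectively can be illustrated with a simple collapsing comparison, in which each individual is scored as a carrier or non-carrier of any rare variant in the gene and carrier counts are compared between cases and controls. The following is a minimal sketch of one such approach using a one-sided Fisher's exact test; it is only one of the strategies that have been proposed, the function names are hypothetical, and the genotype data are invented for illustration.

```python
from math import comb
from typing import List, Sequence


def carrier_count(genotypes: Sequence[Sequence[int]]) -> int:
    """Count individuals carrying at least one rare allele in the gene.

    genotypes: one sequence per individual, giving alternate-allele counts
    at each rare variant site within the gene.
    """
    return sum(1 for person in genotypes if any(g > 0 for g in person))


def fisher_one_sided(
    case_carriers: int, n_cases: int, control_carriers: int, n_controls: int
) -> float:
    """P value for observing at least `case_carriers` carriers among cases.

    Uses the hypergeometric distribution with all margins fixed, which is
    the null model of Fisher's exact test.
    """
    total_carriers = case_carriers + control_carriers
    n = n_cases + n_controls
    denominator = comb(n, n_cases)
    upper = min(total_carriers, n_cases)
    numerator = sum(
        comb(total_carriers, k) * comb(n - total_carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Invented genotypes: three cases and three controls, three rare sites.
    cases: List[List[int]] = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
    controls: List[List[int]] = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    p = fisher_one_sided(
        carrier_count(cases), len(cases), carrier_count(controls), len(controls)
    )
    print(p)
```

Note that in this toy example each case carries a different variant, so no single site would stand out on its own; it is the pooling across the gene that produces the signal.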