Diabet. Med. 28, 132–140 (2011)
Recent advances in genetic analysis have enabled researchers to perform genome-wide surveys for common DNA sequence variants associated with risk of Type 2 diabetes and related traits. Over the past 4 years, these endeavours have extended the number of proven Type 2 diabetes-susceptibility loci from a handful to the current total of over 40. Each of these loci provides an opportunity to uncover insights into the biology of glucose regulation and the pathogenesis of Type 2 diabetes, insights which should support clinical translation to identify novel ways of treating and preventing disease. Here, I describe (i) progress in identification of diabetes-susceptibility loci; (ii) biological insights that have been gained in the relatively short period since these loci were discovered; and (iii) the challenges that need to be addressed if we are to maximize the translational benefits of this research.
Abbreviations: DIAGRAM, Diabetes Genetics Replication And Meta-analysis; MODY, maturity-onset diabetes of the young
Dorothy Hodgkin’s research career, celebrated at the Diabetes UK meeting through the annual lecture given in her name, involved seminal contributions to the determination of the structures of both insulin and vitamin B12. Amongst the many insightful quotes attributed to her is the following: ‘A great advantage of X-ray analysis as a method of chemical structure analysis is its power to show some totally unexpected and surprising structure with, at the same time, complete certainty’. In much the same way, recent advances in genetic technology, which have enabled systematic exploration of the effects of variation in DNA sequence on individual risk of diabetes and related conditions, are increasingly able to provide reproducible and novel insights into disease biology.
What is genetics for?
Human discovery genetics seeks to identify variation in DNA sequence that underlies individual differences in risk of disease and/or variation in continuous traits of biomedical significance. Broadly speaking, the expectation is that such discoveries will underpin advances in disease management in two major ways. The first route involves the transformation of genetic discoveries into an improved understanding of the biology of the disease of interest, building on the capacity of disease-susceptibility variants to highlight mechanisms and networks fundamental to disease pathogenesis. Such biological insights provide the substrate for future translational advances, be those through definition of novel therapeutic targets, characterization of biomarkers with the potential for improved diagnosis or monitoring of disease progression, or identification of more effective preventative strategies. The second route to translation follows from the belief that it will become increasingly possible to use genetics to support ‘personalized’ approaches to clinical care that take account of more precise measures of individual diagnosis, prognosis and anticipated therapeutic response.
Common and rare variants
Whilst, as we shall see, we remain some way from satisfying these lofty ambitions for the most common forms of diabetes, there are many, less frequent subtypes of diabetes for which the power of discovery genetics has already transformed medical care. The best example is provided by neonatal diabetes mellitus: the demonstration, by Andrew Hattersley and colleagues, that mutations in the KCNJ11 gene (encoding one component of the β-cell KATP channel) were responsible for a large proportion of cases of this condition led directly to the hypothesis that such patients would be likely to respond to treatment with sulphonylureas. It is now standard clinical practice for anyone with diabetes diagnosed before 6 months of age to undergo diagnostic testing for mutations in the KCNJ11 and ABCC8 genes, and where these are found, to attempt transfer from insulin (with which such patients have conventionally been treated) to sulphonylureas. For many individuals with neonatal diabetes, this has transformed management, bringing substantially improved diabetes control and relief from regular injections. There have been similar advances in the genetic understanding of other highly familial forms of diabetes, such as maturity-onset diabetes of the young (MODY), which have also had a direct impact on clinical care delivery for patients and their families.
It is not difficult to understand why the successful application of ‘genomic medicine’ to diabetes care has, so far at least, been limited to these less frequent subtypes. These are the most highly genetic forms of diabetes, caused by rare but penetrant mutations. Within families segregating such diseases, knowing whether or not an individual carries a particular mutation is usually sufficient to define their present (or future) disease status. These tight correlations between genotype and disease phenotype have driven successful efforts to identify the genes and variants responsible, and have also enabled rapid translation. These variants have a strong impact at the level of the individual, but, because they are rare, contribute relatively little to the burden of disease at the population level.
The situation is quite different for the overwhelming majority of individuals with non-autoimmune forms of diabetes. As far as we can tell, for most individuals with Type 2 diabetes, individual predisposition depends on the cumulative effects of the alleles they have inherited at a great many susceptibility variants (some of which will increase individual risk, whilst others protect), and the impact of a series of environmental exposures and lifestyle choices. In such a setting, any single genetic variant can only have a marginal effect on disease risk. The loose correlation between genotype and phenotype has, until recently, frustrated efforts to identify causal genetic variants, and also limits the clinical impact to be expected from any single genotype. However, whilst individual impact may be modest, because such variants are common, the effects at the population level can be substantial.
Genome-wide association analysis
It is only in the past 3–4 years that researchers have been able to make substantial progress in characterizing the variants that influence common forms of diabetes. Prior to 2006, association approaches were mostly restricted to small-scale studies of candidate genes. Although there were some successes (coding variants in the PPARG and KCNJ11 genes were shown to be robustly associated with Type 2 diabetes risk) [8,9], more often than not the findings from such studies failed to survive the crucial test of replication.
Three, more or less simultaneous, advances transformed the prospects for genetic discovery for common diseases such as Type 2 diabetes. First, building on completion of the human genome sequence, the International HapMap Consortium generated maps of DNA sequence differences that revealed the population-level structure of human sequence variation. Second, novel genotyping methods became available that enabled efficient, low-cost interrogation of many hundreds of thousands of variant genetic sites in a single experiment. Third, the realization that the variants being sought were likely to have modest effect sizes led to a consolidation of research effort into consortia able to bring far larger sample sizes to bear [7,12].
The culmination of this ‘perfect storm’ was the emergence of the ‘genome-wide association study’. To date, genome-wide association scans for Type 2 diabetes have focused on common variants (those for which the frequency of the less common allele is above 5%) and have been reported in samples from European and Asian populations [12–30]. For such studies, it suffices to focus on the 10 million or so sites (out of 3 billion) which are commonly variant in such populations. In fact, because of the extensive correlations between nearby markers (a phenomenon known as ‘linkage disequilibrium’), most of the information needed for genome-wide association analysis, for common variants at least, can be captured by directly typing approximately 500 000 markers.
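To make ‘linkage disequilibrium’ concrete: one standard measure of the correlation between two biallelic markers is r², computed from a haplotype frequency and the two allele frequencies. The sketch below is illustrative only; the function name and the frequencies are hypothetical, and in practice these quantities are estimated from reference data such as the HapMap panels. When r² between an untyped variant and a genotyped marker is close to 1, typing the marker captures nearly all the association information for the untyped variant, which is why a few hundred thousand well-chosen markers can stand in for millions of common sites.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Linkage disequilibrium r^2 between two biallelic markers.

    p_ab: frequency of haplotypes carrying allele A (marker 1) and allele B (marker 2)
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical frequencies. A and B always occur on the same haplotype:
    # perfect LD, so r^2 is 1 (up to floating-point rounding).
    print(ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3))
    # Haplotype frequency equals the product of the allele frequencies:
    # the markers are independent and r^2 is 0.
    print(ld_r2(p_ab=0.3 * 0.4, p_a=0.3, p_b=0.4))
```

Intermediate values of r² correspond to partial tagging: a marker in partial LD with a causal variant still carries some of its association signal, but with correspondingly reduced power.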
In a series of studies of ever-escalating size, the number of genomic regions now confidently associated with Type 2 diabetes (that is, demonstrated to levels of significance well beyond those possible by chance) has grown, in just 3 years, to more than 40 [12–30] (Fig. 1). In one recent study, members of the Diabetes Genetics Replication And Meta-analysis (DIAGRAM) consortium were able to combine genome-wide association data from over 8000 samples from patients with Type 2 diabetes and 38 000 control subjects without diabetes, and to substantiate interesting associations in close to 100 000 additional samples, an effort which added a dozen new signals to the list, including the first on the X-chromosome.
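Combining evidence across many studies is commonly done by inverse-variance weighted, fixed-effect meta-analysis of per-study log odds ratios, with the pooled result judged against a stringent ‘genome-wide significance’ threshold (conventionally 5 × 10⁻⁸, roughly a Bonferroni correction for about a million independent common-variant tests). The sketch below illustrates that general technique; it is not a reproduction of the DIAGRAM consortium’s actual pipeline, and the function name and per-study estimates are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional threshold, roughly 0.05 / 1 million tests


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study log odds ratios for the same allele
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    return pooled, pooled_se, p


if __name__ == "__main__":
    # Hypothetical estimates for one SNP from three studies (per-allele ORs near 1.1)
    betas = [math.log(1.10), math.log(1.12), math.log(1.08)]
    ses = [0.03, 0.025, 0.02]
    beta, se, p = fixed_effect_meta(betas, ses)
    print(f"pooled OR = {math.exp(beta):.3f}, p = {p:.2e}, "
          f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

None of the three hypothetical studies would reach 5 × 10⁻⁸ on its own; pooling shrinks the standard error, which is why the consortium model made signals with per-allele odds ratios near 1.1 detectable.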
There are a number of important points to make about these findings. The first is that the effect sizes of the associated variants are quite modest, ranging from a per allele odds ratio of ∼1.05 (the lower end of the range of detectability with the sample sizes available) to ∼1.35 (in the case of variants near TCF7L2) [27,31]. Put another way, when individuals with two copies of a particular high-risk allele are compared with those with no copies, the increase in diabetes risk ranges from ∼10% (at the lower end) to approximately double (for TCF7L2). The second is that, whilst it is customary (and convenient) to designate each of the signals by the name of a nearby gene (usually the transcript that maps most closely, or the strongest regional biological candidate), at very few of these loci is there strong functional evidence to indicate which of the nearby genes is actually responsible for mediating the effect on diabetes risk.
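The step from per-allele odds ratios to the comparison of two-copy and zero-copy carriers assumes a multiplicative (log-additive) model, under which the homozygote odds ratio is the per-allele odds ratio squared. A minimal sketch, treating odds ratios as approximate relative risks in the way the text does; the function name is hypothetical:

```python
def odds_ratio_vs_noncarriers(per_allele_or: float, n_risk_alleles: int) -> float:
    """Odds ratio for carrying n risk alleles (0, 1 or 2) relative to carrying none,
    assuming each additional allele multiplies the odds by the same factor."""
    if n_risk_alleles not in (0, 1, 2):
        raise ValueError("a diploid genotype carries 0, 1 or 2 risk alleles")
    return per_allele_or ** n_risk_alleles


if __name__ == "__main__":
    for per_allele in (1.05, 1.35):  # lower and upper ends of the range in the text
        two_copies = odds_ratio_vs_noncarriers(per_allele, 2)
        print(f"per-allele OR {per_allele}: two copies vs none = {two_copies:.2f}")
    # prints 1.10 (about a 10% increase) and 1.82 (close to double)
```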
Whilst both these factors pose challenges to the efficient clinical translation of these genetic discoveries, it is also true that each of these 40 signals provides an opportunity for novel insights into diabetes pathogenesis. What lessons have been learned to date?
Lesson 1. Some of these association signals fall near interesting genes
At many Type 2 diabetes-susceptibility loci, a survey of the nearby genes fails to reveal strong biological candidates and a great deal of complex functional work will be required to characterize the mechanisms responsible.
However, at others, the association signal maps close to transcripts with established links to diabetes pathogenesis, exposing these as particularly compelling candidates. For example, in the case of the common variant signals that map near the WFS1, HNF1B, HNF1A and GCK genes, the fact that rare, penetrant mutations in the coding sequences of these genes are known to be causal for monogenic forms of diabetes indicates that the common variant associations are likely to be acting through those particular transcripts [5,27]. In the case of HNF1A, the common variants implicated in Type 2 diabetes-susceptibility map ∼20 kb 3′ to the gene, and are (geographically and statistically) distinct from a cluster of common variants 5′ to the gene that have been shown to influence phenotypes such as lipid concentrations and C-reactive protein levels. The likely explanation is that the former variants influence HNF1A expression in the pancreatic islet, whereas the latter exercise their effects on HNF1A transcription in the liver: in individuals with HNF1A-MODY attributable to coding mutations in the gene, of course, both pancreatic and hepatic phenotypes are observed.
However, first impressions may not always be correct. The region of chromosome 11 near the KCNQ1 gene harbours two distinct Type 2 diabetes-susceptibility signals [20,21,27]. It would be natural to regard KCNQ1 as a strong positional candidate, given that it encodes a potassium channel expressed in the pancreatic islet. However, neither humans with rare mutations in the KCNQ1 gene (which are causal for serious abnormalities of cardiac rhythm) nor Kcnq1 knockout mice have any glycaemic phenotype. One clue to a possible alternative explanation comes from the fact that one of the Type 2 diabetes association signals maps to a part of the KCNQ1 sequence which also encodes an entirely different transcript (KCNQ1OT1). KCNQ1OT1 is known to be an important regulator of other genes in the region, including not only KCNQ1 itself but also CDKN1C, a gene already heavily implicated in islet development. What is more, KCNQ1OT1 is imprinted (in normal individuals, only the copy of the gene that is inherited from one’s father is expressed) and disruption of this imprinting mechanism can result in a wide range of disorders, including hyperinsulinaemic hypoglycaemia. Recent evidence that KCNQ1 associations with Type 2 diabetes risk are entirely driven by the maternally inherited allele would seem to support the hypothesis that the variants at this locus exercise their effects via disruption of this regulatory system. In more general terms, these data are consistent with a role for imprinting in Type 2 diabetes pathogenesis.
Lesson 2. β-cell dysfunction predominates amongst known Type 2 diabetes-risk loci
Patients with established Type 2 diabetes typically exhibit abnormalities of both insulin action and β-cell function. There has been a long-standing (and somewhat specious) debate as to which of these processes should be considered the most important contributor to the pathogenesis of Type 2 diabetes. One way of addressing this question is to examine what effects Type 2 diabetes-risk alleles have on these processes by physiological studies conducted in healthy, non-diabetic individuals (Fig. 2). By and large, these studies support the view that most of these alleles are associated with reduced insulin secretion, placing the β-cell centre stage with respect to diabetes development [26,27,35]. However, this is not a universal finding and, for a handful of the loci, the primary effect appears to be a reduction in insulin sensitivity. In the case of FTO, this reduction is secondary to increased BMI [14,36], but at the other loci (including PPARG, KLF14, IRS1 and ADAMTS9) the evidence points to intrinsic defects in insulin-responsive tissues [8,25,27].
Amongst those loci where the predominant effect involves reduced insulin secretion, there are several where the best positional candidates (e.g. CDKAL1, CDKN2A/B, HHEX) have functions that may be consistent with involvement in regulating β-cell mass at the level of development, regeneration or senescence. At others, the focus would appear to be on β-cell dysfunction (e.g. at KCNJ11 and SLC30A8). However, at many loci, these inferences are no more than educated guesses. At most of the known loci, the transcript through which the susceptibility effect is mediated remains to be identified with certainty: furthermore, we know little about the precise cellular functions of many regional transcripts, CDKAL1 being a prime example. Much more work is required before we can make confident assertions about the mechanisms through which each one of these variants perturbs glucose homeostasis.
Lesson 3. Some key mechanisms in diabetes pathogenesis have been revealed
Notwithstanding the difficulties outlined in the previous section, inspection of the growing number of loci identified is providing clues to pathways that appear to be involved in diabetes pathogenesis. In the recent paper from the DIAGRAM consortium, for example, evidence that susceptibility regions were enriched for genes influencing particular molecular networks and pathways was sought using a range of approaches, each of which makes use of a different type of reference data (such as the text of PubMed abstracts, pathway databases or protein–protein interaction maps).
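One simple way to ask whether genes near association signals are over-represented in a given pathway is a one-sided hypergeometric (Fisher-type) test. This is a generic illustration, not one of the specific methods used in the DIAGRAM analysis, and it ignores complications (gene size, correlation between neighbouring genes, the choice of which genes to assign to each signal) that purpose-built methods try to address; the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p(n_genes: int, n_in_pathway: int, n_near_signals: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value: P(overlap >= n_overlap) if the genes near
    association signals were a random draw of n_near_signals genes from n_genes."""
    total = comb(n_genes, n_near_signals)
    # Sum the probability mass for every overlap at least as large as observed
    tail = sum(
        comb(n_in_pathway, k) * comb(n_genes - n_in_pathway, n_near_signals - k)
        for k in range(n_overlap, min(n_in_pathway, n_near_signals) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 20 000 genes, 500 in a cell-cycle pathway,
    # 60 genes near association signals, 6 of which fall in the pathway
    # (about 1.5 would be expected by chance).
    print(enrichment_p(20_000, 500, 60, 6))
```

Exact integer arithmetic (`math.comb`) avoids the underflow that can arise from multiplying many small probabilities; only the final ratio is converted to a float.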
Few, if any, of the pathways or processes that have been widely touted as having a key role in Type 2 diabetes pathogenesis (e.g. oxidative phosphorylation and inflammation) were supported by such analyses. The only signal to emerge with any confidence was that related to cell cycle regulation, there being modest enrichment for genes encoding proteins involved in control of the key processes in mitosis. Many of these genes are expressed in islets, highlighting regulation of β-cell mass as one possible mechanism whereby genetic predisposition to Type 2 diabetes is enacted [27,35]. Perhaps the strongest evidence for this resides at the chromosome 9p21 association signal, one of the first loci to emerge from genome-wide association studies [11,15–17]. This region harbours at least one strong signal for Type 2 diabetes, as well as a second, statistically independent signal which influences risk of coronary artery disease and arterial aneurysm [11,37]. The nearest coding genes (these map approximately 200 kb away from the diabetes signal) are the cyclin-dependent kinase inhibitors, CDKN2A and CDKN2B. The proteins they encode are well known for their contribution to cancer pathogenesis: loss-of-function mutations, whether in the germ line (causing some familial forms of melanoma) or somatic (involved in a wide range of neoplasms), lead to release of cyclin-dependent kinase inhibition and contribute to the unrestrained cell division which is one characteristic of cancer. There is good evidence, from rodent models in particular, that the diabetes predisposition in this region might be the result of gain-of-function of these same genes, leading to inhibition of downstream cyclin-dependent kinases (including CDK4), which play an important role in the control of β-cell proliferation and senescence. In the mouse, both Cdkn2a overexpression and Cdk4 knockout result in islet hypoplasia and recapitulate the Type 2 diabetes phenotype.
Quite how the Type 2 diabetes-risk variants in this region influence CDKN2A/B function and/or expression is not yet certain, but the recent identification of a non-coding RNA (named ANRIL), which is transcribed from the regions of maximal Type 2 diabetes and coronary artery disease association, and is thought to influence expression of CDKN2B at least, suggests that this may be involved in some way.
This example also demonstrates how genetic studies can lead to unexpected mechanistic connections between traits (in this case, Type 2 diabetes and cancer) which normally reside in separate chapters of medical textbooks. This connection is not restricted to the chromosome 9p region. Within the HNF1B gene, the same alleles that predispose to Type 2 diabetes are clearly protective against prostate cancer, a finding which mirrors epidemiological evidence that men with diabetes are protected against this form of neoplasia.
Lesson 4. The genetic variants that influence fasting glucose levels in healthy individuals are not the same as those influencing risk of Type 2 diabetes
Non-diabetic individuals differ in their set points for fasting glucose and these differences show substantial heritability. As the processes that govern fasting glucose (the balance between insulin sensitivity and β-cell function) are the same as those considered fundamental to the pathogenesis of Type 2 diabetes, it seems reasonable to ask whether or not the variants that influence physiological variation in continuous glycaemic traits (fasting glucose, fasting insulin and derived measures of insulin secretion and action) are the same as those implicated in the pathological deterioration of glucose homeostasis that we recognize as Type 2 diabetes. Data from two large parallel efforts based around genome-wide association scan meta-analysis in European subjects, one centred on Type 2 diabetes risk (the DIAGRAM consortium), the other on fasting glucose and insulin levels in healthy individuals [the Meta-Analysis of Glucose and Insulin-related traits Consortium (MAGIC)], have allowed us to examine this question [22–24,26,27].
In short, the overall picture is that, at some loci (such as the melatonin receptor, MTNR1B, gene), the same variants influence both [22–24], but that the overlap is far from complete [26,27]. There are some established diabetes-susceptibility loci (such as KCNJ11 and HNF1A) where, despite amassing data from over 40 000 non-diabetic subjects, no discernible association with fasting glucose can be observed [26,27]. There are other loci, such as G6PC2, which have effects on fasting glucose as strong as those seen at MTNR1B, but with no compelling evidence that they increase risk of diabetes (and, indeed, some suggestion that the glucose-raising allele might even be protective) [26,27].
These findings have a number of implications. Firstly, from a genetic perspective it is clear that not all individuals with a raised fasting glucose are at the same increased risk of Type 2 diabetes. Whilst the genetic signals are not strong enough to provide individual prediction in this regard, nonetheless these findings should bolster efforts to identify features that enable more accurate stratification of risk. Secondly, it is clear that the mechanisms involved in the normal, physiological regulation of basal glucose homeostasis in healthy individuals are somewhat distinct from those responsible for the pathogenesis of Type 2 diabetes. Thirdly, and more broadly, naive notions concerning the relationships between continuous biomedical traits (such as fasting glucose) and their cognate disease diagnoses (in this case, Type 2 diabetes) are likely to provide a misleadingly simplistic model of reality.
Lesson 5. Relationships between birthweight and diabetes risk are in part genetic
Over the past two decades, compelling evidence has accumulated connecting events in early life (particularly fetal undernutrition) with an increased risk of diabetes and cardiovascular disease in adulthood. The dominant explanation for these relationships has been provided by the fetal origins hypothesis, which argues that exposure to an adverse environment during fetal life and early infancy primes (or ‘programmes’) the individual to be at greater than average risk of cardiovascular and metabolic disease, particularly if exposed to a ‘diabetogenic’ environment in later life. These relationships are complex, in that a plot of the association between birthweight and diabetes is, in many developed countries, actually ‘U-shaped’ (Fig. 3). The association between high birthweight and diabetes risk reflected in the upper stroke of the ‘U’ can be attributed to the consequences (both genetic and epigenetic) of maternal gestational diabetes.
Whilst data supporting these hypotheses have been used to argue that the contribution of genetics to Type 2 diabetes pathogenesis has been overstated, an alternative (complementary) explanation for these observed associations invokes shared genetic influences on early growth and diabetes risk. Alleles that result in reduced insulin secretion (or action) are strong candidates in this respect: as well as their expected contribution to Type 2 diabetes risk in adulthood, the importance of insulin as a trophic factor in early life suggests that such alleles might also compromise early growth.
Rare mutations in the glucokinase gene have effects on diabetes risk and birthweight that are consistent with this ‘fetal insulin’ hypothesis [43,44], but are too rare to contribute to the observed epidemiological associations. By comparing results from genome-wide association studies for diabetes and for birthweight, it is possible to establish whether common variants with more substantial population-level effects might also be contributing. In recent work by the Early Growth Genetics (EGG) consortium, genome-wide association meta-analyses identified two signals for birthweight on chromosome 3, one of which overlies the ADCY5 gene (encoding adenylate cyclase 5). The same ADCY5 variants are also significantly associated with both Type 2 diabetes and fasting glucose levels, such that the diabetes-risk (and glucose-raising) allele results in lower birthweight. A similar relationship is observed at CDKAL1, another known Type 2 diabetes-risk locus with a clear effect on birthweight. These two genetic associations therefore provide proof of principle that the observed epidemiological relationships are, in part at least, the consequence of shared genetic determinants.
However, this story has a further twist. Diabetes-risk alleles at the TCF7L2 gene (which remains the strongest common variant signal for Type 2 diabetes) are associated not with lower, but with elevated, birthweight. By examining both maternal and fetal genotypes and their effects on birthweight, it is possible to understand this apparent anomaly. The risk alleles at CDKAL1 and ADCY5 appear to reduce birthweight through a direct effect on fetal insulin secretion, whereas the association between fetal genotype at TCF7L2 and birthweight is indirectly mediated through maternal hyperglycaemia. Remember that at least half the mothers of a fetus heterozygous for a TCF7L2 risk allele will carry the same allele, and that this comes with an increased risk of gestational diabetes, thereby driving fetal macrosomia. The net effect of any given diabetes-risk allele on birthweight will reflect the balance of these two opposing forces. One way of interpreting these findings is to regard the direction of effect of the diabetes-risk allele on birthweight as providing a ‘bioassay’ of the timing of maximal β-cell defect: if this occurs in early life, birthweight tends to be reduced, whilst a variant that compromises β-cell performance mostly in later life will likely result in the opposite effect.
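The claim about mothers can be checked with a short calculation. Assuming Hardy–Weinberg proportions and random mating (assumptions made here for illustration only), a heterozygous fetus received the risk allele from its mother in half of cases; in the other half the mother transmitted the alternative allele but may still carry the risk allele herself. The overall probability that the mother is a carrier therefore works out to (1 + p)/2, where p is the risk-allele frequency, which is always at least one half. The sketch below enumerates parental genotypes with exact fractions; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def p_mother_carrier_given_het_fetus(p: Fraction) -> Fraction:
    """Exact P(mother carries >= 1 risk allele | fetus is heterozygous).

    Assumes Hardy-Weinberg genotype proportions in both parents and random mating;
    p is the risk-allele frequency.
    """
    q = 1 - p
    # Parental genotype coded as the number of risk alleles carried
    geno_prob = {0: q * q, 1: 2 * p * q, 2: p * p}
    num = Fraction(0)
    den = Fraction(0)
    for m, f in product(geno_prob, repeat=2):
        tm, tf = Fraction(m, 2), Fraction(f, 2)  # P(each parent transmits the risk allele)
        p_het = tm * (1 - tf) + (1 - tm) * tf  # exactly one risk allele transmitted
        joint = geno_prob[m] * geno_prob[f] * p_het
        den += joint
        if m > 0:
            num += joint
    return num / den


if __name__ == "__main__":
    # Illustrative risk-allele frequency of 0.3 (not an estimate for any specific locus)
    print(p_mother_carrier_given_het_fetus(Fraction(3, 10)))  # 13/20
```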
Lesson 6. The effects discovered are individually small and this has limited the potential for disease prediction
Part of the ‘hype’ concerning the genetics ‘revolution’ has centred on the potential for genetic discoveries to have value in terms of individual prediction of disease risk and response. However, despite the fact that over 40 common variant Type 2 diabetes association signals have been defined, the performance of such tests remains rather poor.
It is certainly possible, within a population, to identify some individuals who, because they have inherited a particularly high number of diabetes-risk variants, are at especially high risk of diabetes. Equally, it is possible to define some individuals who carry relatively few risk alleles and who are therefore presumed to be at low risk. However, such individuals are relatively few, and the effects are not large: for example, individuals in the top 1% of the risk-score distribution have approximately twice the average risk of Type 2 diabetes, whilst those in the bottom 1% have about half. Further, it is likely that the risk profile of many such individuals will already be obvious based on classical (non-genetic) risk factors. Indeed, when one uses receiver operating characteristic (ROC) analysis (which plots sensitivity against 1 – specificity across all possible test thresholds, the area under the resulting curve providing a measure of the overall ‘discriminative accuracy’ of a given diagnostic test), the results are rather disappointing. The discriminative accuracy of a test that combines all currently known common variants is approximately 60%, somewhat higher than the 50% score which would reflect performance no better than tossing a coin, but far below the 80% figure which is a reasonable benchmark for clinical utility, and well short of the performance achievable with classical (non-genetic) risk factors such as body mass index and family history.
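The modest discrimination of allele-count scores can be illustrated with a toy simulation. The sketch below uses hypothetical values throughout (40 independent loci, each with risk-allele frequency 0.3 and per-allele odds ratio 1.1, combined into an unweighted allele count, with disease status drawn from a logistic model); it is not a model of the actual Type 2 diabetes loci. Discrimination is summarized by the area under the ROC curve (AUC), computed through its equivalence to the Mann–Whitney statistic: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
import math
import random


def auc(case_scores, control_scores):
    """ROC AUC via the Mann-Whitney U statistic (tied scores count one half)."""
    labelled = sorted([(s, 1) for s in case_scores] + [(s, 0) for s in control_scores])
    rank_sum_cases = 0.0
    i, n = 0, len(labelled)
    while i < n:
        j = i
        while j < n and labelled[j][0] == labelled[i][0]:
            j += 1
        mean_rank = (i + 1 + j) / 2  # tied block occupies 1-based ranks i+1..j
        rank_sum_cases += mean_rank * sum(label for _, label in labelled[i:j])
        i = j
    n1, n0 = len(case_scores), len(control_scores)
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    return u / (n1 * n0)


def simulate_auc(n_people=20_000, n_loci=40, freq=0.3, per_allele_or=1.1,
                 odds_at_mean=0.08, seed=1):
    """Toy model: unweighted risk-allele count, disease drawn from a logistic model."""
    rng = random.Random(seed)
    beta = math.log(per_allele_or)
    mean_count = 2 * freq * n_loci
    cases, controls = [], []
    for _ in range(n_people):
        # Two independent allele draws per locus (Hardy-Weinberg, no LD between loci)
        count = sum((rng.random() < freq) + (rng.random() < freq) for _ in range(n_loci))
        odds = odds_at_mean * math.exp(beta * (count - mean_count))
        if rng.random() < odds / (1 + odds):
            cases.append(count)
        else:
            controls.append(count)
    return auc(cases, controls)


if __name__ == "__main__":
    print(f"AUC = {simulate_auc():.3f}")
```

With these settings the AUC lands only modestly above the coin-toss value of 0.5; increasing `n_loci` or `per_allele_or` shows how much larger the combined effects would need to be to approach the 0.8 benchmark.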
Why do these genetic tests perform so poorly? The main reason is that the effect sizes associated with the common variant signals found to date are typically modest, so that, even in combination, their predictive value is poor. In fact, when one compares the extent to which these known variants can explain the patterns of familiality that are characteristic of Type 2 diabetes, it is clear that the known variants explain no more than 5–10% of the likely component of inherited predisposition. If we can find the remaining 90% or so (the ‘missing heritability’, as it has been termed), we may be able to do much better [27,49].
Challenges for the future
Despite the advances of the past few years, it is clear that there is much work to be done if we are to realize our expectations in terms of better ways of preventing and treating Type 2 diabetes. Successful translation will depend on a more complete enumeration of all aspects of genetic predisposition (finding the ‘missing heritability’) and on rigorous pursuit of approaches to turn these genetic discoveries into a detailed understanding of disease pathogenesis.
One particular focus at the moment for many groups is to understand the contribution to diabetes risk that is attributable to low-frequency variants of intermediate effect. Human genetic discovery efforts in the past two decades have been successful at finding rare variants of large impact (through linkage analyses conducted within families), as well as common variants of modest effect (through genome-wide association analyses), but neither approach has much power to survey variants with properties (allele frequencies, penetrances) in between those extremes. The advent of next-generation sequencing technologies is now powering efforts to explore these in a systematic and comprehensive way and we can expect answers to emerge in the coming year.
At the same time, all of these genetic discoveries are of limited value unless we manage to connect them to the biological ‘stories’ they can tell. To the extent that the search for low-frequency variants identifies (at least some) alleles with larger effects, the task of molecular and physiological characterization may be accelerated. In parallel, we need to understand far better the inner workings of the tissues that are key to diabetes pathogenesis, most notably the pancreatic islet. Again, next-generation sequencing approaches are transforming our capacity to define those genomic sequences which play a key role in the regulation of gene expression within such tissues, and studies such as these should make it far easier to extend functional studies of interesting variants beyond coding sequence.
Some final thoughts
For a disease such as diabetes, it is clear that the genetic variants one inherits do not explain all of one’s predisposition and, equally, genetic studies will provide only part of the solution. However, what genetics does provide is a unique opportunity to uncover fundamental processes of disease pathogenesis and to deliver measures of predisposition that are stable over a lifetime. Access to large-scale sequencing, coupled to massive, well-characterized population and patient samples, means that researchers, for the first time, have the tools required to face down the challenges of complex biology, and that a systematic and comprehensive understanding of the inherited component of diabetes predisposition will be available within the next decade. The genome, although large, is, after all, finite. Predicting the precise translational benefits that will accrue from this knowledge is not easy (as a great deal will depend on the specifics of what is found), but the prospects look brighter than at any point in the past.
Competing interests
Nothing to declare.
Acknowledgements
I would like to acknowledge the contributions of the many colleagues, senior and junior, national and international, with whom it has been such a pleasure to work on these challenging problems. The commitment to productive collaboration of researchers in our field has served as a powerful example to others. I would also like to thank Diabetes UK for sustained research support and, in particular, Moira Murphy and Simon Howell, who initiated the Type 2 diabetes genetics collaboration in the UK. I have also been fortunate to receive research support from the Wellcome Trust, Medical Research Council, UK National Institute of Health Research, the European Commission and the US National Institutes of Health.