Twins and the mystery of missing heritability: the contribution of gene–environment interactions


  • J. Kaprio

    Corresponding author
    1. Department of Public Health, Hjelt Institute, University of Helsinki, Helsinki, Finland
    2. Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
    3. Department of Mental Health and Substance Abuse Services, National Institute for Health and Welfare, Helsinki, Finland
    • Correspondence: Jaakko Kaprio, MD, PhD, Department of Public Health, PO Box 41 (Mannerheimintie 172), University of Helsinki, FIN-00014 Helsinki, Finland.

      (fax: 358 9 191 27 600; e-mail:



Since 2006, increasingly large genome-wide association studies and their meta-analyses have led to numerous replicated findings of genetic polymorphisms associated with many diseases and traits. Early studies suggested that the identified loci generally accounted for only a small fraction of the genetic variance estimated from twin and family studies, which led to the concept of ‘missing heritability’. Here, progress in accounting for a greater proportion of the variance is reviewed. In particular, gene–environment interactions can, for some traits and in certain circumstances, explain part of this missing heritability.


The importance of family history has long been known, but its limitations are also recognized. The aggregate evidence from human studies of twins, adoptees and families, both nuclear families and extended pedigrees, in conjunction with animal studies has indicated the substantial role of genetic variability in accounting for inter-individual differences in normal human traits and in the risk of developing many diseases. Current genetic methods using genome-wide association (GWA) studies and next-generation sequencing aim to discover the underlying causes of genetic traits and diseases.

Cultural and biological inheritance, particularly for behavioural traits, may be difficult to disentangle from each other. Figure 1 shows a single family pedigree (the author's family) for a trait that currently has an incidence of 2.3 per 100 000 [95% confidence interval (CI) 1.8–2.7] in Finland. There are cases in four generations over more than 100 years, and this trait would appear to be inherited by autosomal dominant transmission. Current genetic technology should be able to identify the causative gene relatively easily given that several such families exist. Despite its apparent Mendelian nature, the trait is ‘being accepted by (and graduating as a physician from) the University of Helsinki Faculty of Medicine’. It is unlikely to be due to a single gene, but rather to the interaction of genetic predisposition, family influence and parental support.

Figure 1.

A schematic family tree of a Finnish family with the trait of ‘being accepted by (and graduating as a physician from) the University of Helsinki Faculty of Medicine’. Onset of the trait (calendar year of acceptance) is shown. See text for further explanation.

After sequencing of the first human genomes a decade ago, increasing information regarding genetic variants, mostly single-nucleotide polymorphisms (SNPs), gave rise to GWA studies as technology permitted genotyping of hundreds of thousands of SNPs located throughout the genome. Large numbers of cases and controls could be genotyped for common variants with the aim of detecting allele frequency differences between cases and controls. Such differences, provided they are statistically significant considering the large number of statistical tests performed in each analysis, offer clues as to which genes may be of importance in the aetiology of the disease or trait.

In an early and influential review in 2008, setting the stage for GWA studies, 50 new loci were discussed [1]. Since then, the number of disease- or trait-associated loci has increased 100-fold. In their conclusion, McCarthy et al. [1] noted ‘We remain unable to explain more than a small proportion of observed familial clustering for most multifactorial traits, a fact that emphasizes the need to extend analysis to a more complete range of potential susceptibility variants, and to support more explicit modelling of the joint effects of genes and environment’. The relatively small proportion of variance explained gave rise to the concept of ‘missing heritability’, as highlighted in 2009 by Manolio et al. [2], who examined potential sources of the unaccounted genetic variance.

Progress in GWASs

Despite the short time since the advent of GWA studies, they have identified numerous novel common variants associated with complex traits and provided insights into pathology and underlying mechanisms. As of May 2012, the database of GWA studies at the National Institutes of Health (NIH) National Human Genome Research Institute [3] included almost 1300 publications reporting a total of 6439 SNPs significant at P < 1 × 10⁻⁵. As noted by Hunter in a recent review of these studies, this effort has provided an enormous source of novel risk factors for disease that dwarfs decades of earlier epidemiological research [4]. The challenge is to turn the data into relevant information for clinical use and public health policy.

There are two possible explanations for the missing genetic variance: (i) multiple common variants with small effect sizes and (ii) relatively rare variants responsible for a significant proportion of the unidentified genetic predisposition. Both hypotheses may be correct, depending on the trait or disease being studied. In the first case, sample sizes need to be larger, and to this end, meta-analyses soon followed the first wave of GWA studies. Thus, a series of GWA studies with sample sizes of more than 100 000 culminated in 2010 in very large meta-analyses of height [5], body mass index (BMI) [6], regional adiposity [7] and blood lipids [8] as prototypic continuous traits of medical relevance. These studies identified tens of novel loci, each of which provides a potential window into previously unknown mechanisms. The fraction of variance accounted for ranged from <5% for BMI and waist circumference to about 10% for height and more than 10% for blood lipids, which for the latter corresponds to at least 25% of the genetic variance. Clearly, an adequate sample size is an important prerequisite for gene discovery. These results highlight the polygenic nature of these common traits, with each individual locus contributing very little by itself, as hypothesized decades ago by R.A. Fisher and other quantitative geneticists.
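The link between effect size and required sample size can be made concrete with a standard power approximation. The sketch below is a minimal illustration, not a method used in the studies cited: it assumes a quantitative trait, unrelated individuals and an additive 1-df test whose statistic is approximately non-central chi-square with non-centrality n·q²/(1 − q²), where q² is the fraction of trait variance explained by the locus, together with the conventional genome-wide threshold α = 5 × 10⁻⁸. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def gwas_power(n, q2, alpha=5e-8):
    """Approximate power of a 1-df additive association test for a SNP that
    explains a fraction q2 of the trait variance in n unrelated individuals."""
    ncp = n * q2 / (1 - q2)  # non-centrality of the 1-df chi-square statistic
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = ncp ** 0.5
    # The test statistic is the square of a N(shift, 1) variable, so power is
    # the probability that this normal variable falls outside +/- z_crit.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_for_power(q2, target=0.8, alpha=5e-8, step=1000):
    """Smallest sample size (in multiples of `step`) reaching the target power."""
    n = step
    while gwas_power(n, q2, alpha) < target:
        n += step
    return n


for q2 in (0.01, 0.001, 0.0001):
    print(f"locus explaining {q2:.2%} of variance: ~{n_for_power(q2):,} subjects")
```

Under these assumptions, each tenfold reduction in q² requires roughly tenfold more subjects, which is why loci explaining a few hundredths of a percent of the variance only emerge reliably in samples of 100 000 or more.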

Variants identified in GWA studies have stimulated research to understand their impact at the cellular, individual and population levels. For example, multiple twin, family and adoption studies as well as animal studies indicate an important role of genetics in smoking behaviour [9]. In 2008, three GWA studies demonstrated that variation in regions of 15q24 and 15q25 containing nicotinic acetylcholine receptor genes (α5/α3/β4) contributes to lung cancer risk and is strongly associated with amount smoked and with nicotine dependence measured by the Fagerström Test for Nicotine Dependence [10]. The functional variant causing the D398N substitution in the α5 subunit of the nicotinic acetylcholine receptor explained <1% of the variance in amount smoked, with an average effect of one cigarette per day per allele [11]; however, the same variant accounts for about five times as much variance when cotinine levels are examined [12, 13]. Better phenotyping and the use of biomarkers can therefore account for more of the variance. To improve phenotypes, multivariate twin models can be used to test whether, and to what extent, related phenotypes share the same underlying genes.
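Why a variant with a clear effect can still explain <1% of the variance follows from the additive-model formula for the variance explained by one biallelic SNP, 2p(1 − p)β²/Var(Y). The sketch below plugs in illustrative values: the allele frequency (0.35) and the standard deviation of cigarettes per day (10) are assumptions, not figures from the cited studies; only the effect of roughly one cigarette per day per allele comes from the text above.

```python
def snp_variance_explained(p, beta, var_y):
    """Fraction of trait variance explained by one biallelic SNP under an
    additive model, assuming Hardy-Weinberg equilibrium.

    p     -- frequency of the effect allele
    beta  -- change in the trait per copy of the effect allele
    var_y -- total (phenotypic) variance of the trait
    """
    genotype_variance = 2 * p * (1 - p)  # variance of a 0/1/2 allele count
    return genotype_variance * beta ** 2 / var_y


# Illustrative, assumed inputs: allele frequency 0.35, +1 cigarette/day per
# allele (the order of magnitude quoted above), SD of 10 cigarettes/day.
share = snp_variance_explained(p=0.35, beta=1.0, var_y=10.0 ** 2)
print(f"{share:.2%}")
```

With these inputs the variant explains about half a percent of the variance. Because the denominator is the total trait variance, measurement error in a crude phenotype such as self-reported cigarettes per day dilutes the fraction explained, which is one reason better phenotyping and biomarkers can capture more of it.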

Despite the low level of variance accounted for, finding a functional variant is very important. For example, GWA studies of smoking have repeatedly confirmed the importance of the α5 nicotinic receptor variant, a nonsynonymous variant that alters α5 nicotinic receptor function [14, 15], and have thus provided clear insights into the mechanisms underlying the development of addictions.

The very strict criteria for statistical significance in GWA studies, together with the evidence for a highly polygenic architecture of many traits with multiple loci of very small effect, mean that many relevant loci may remain to be discovered and may thus account for some of the missing heritability. Recently, Visscher and colleagues elegantly demonstrated this by applying novel biostatistical methods to the whole GWA study data set. They showed that 45% of the variance in height and 17% of the variance in BMI can be accounted for by autosomal SNPs [16], that SNPs in genes are more informative than those elsewhere in the genome, and that chromosome length is also an important determinant of informativeness. A similar result was obtained in a very recent GWA study of intelligence, in which no single significant locus was found, but genome-wide SNPs together accounted for 40–51% of the variance. Thus, the more genetic information is used, the greater the amount of variance accounted for. It is possible that, at least for some traits, much of the heritability is no longer missing, as summarized by Visscher et al. [17]. For traits as diverse as obesity, Crohn's disease, bipolar disorder, height and QT interval, using the Genome-wide Complex Trait Analysis (GCTA) approach [17], genome-wide SNPs accounted for about half of the heritability estimated from family studies. This proportion will undoubtedly increase as more sequence data become available. Even low-coverage exome sequencing can increase the power of GWA studies severalfold, enhancing both gene discovery and the proportion of variance explained [18].
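GCTA-type analyses start from a genomic relationship matrix (GRM) estimated from all SNPs and then fit a mixed model in which Var(y) = Aσ²_g + Iσ²_e, with σ²_g estimated by restricted maximum likelihood. The pure-Python sketch below covers only the first step, using the standardized-genotype estimator A_jk = (1/N) Σ_i (x_ij − 2p_i)(x_ik − 2p_i)/[2p_i(1 − p_i)] over N informative SNPs. It is illustrative and not the GCTA implementation, which among other differences uses a modified estimator for the diagonal elements; the function name and toy genotypes are hypothetical.

```python
def genetic_relationship_matrix(genotypes):
    """Genomic relationship matrix from standardized SNP genotypes.

    genotypes -- one list per individual of 0/1/2 allele counts, same SNP order.
    Entry (j, k) averages, over informative SNPs, the product of the two
    individuals' genotypes centred on 2p and scaled by 2p(1 - p).
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    grm = [[0.0] * n_ind for _ in range(n_ind)]
    used = 0
    for i in range(n_snp):
        p = sum(person[i] for person in genotypes) / (2 * n_ind)  # sample frequency
        if p == 0.0 or p == 1.0:
            continue  # monomorphic SNPs carry no information on relatedness
        used += 1
        scale = 2 * p * (1 - p)
        centred = [person[i] - 2 * p for person in genotypes]
        for j in range(n_ind):
            for k in range(j, n_ind):
                contrib = centred[j] * centred[k] / scale
                grm[j][k] += contrib
                if k != j:
                    grm[k][j] += contrib  # keep the matrix symmetric
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    return [[value / used for value in row] for row in grm]


# Hypothetical toy data: individuals 0 and 1 have identical genotypes.
toy = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1]]
A = genetic_relationship_matrix(toy)
for row in A:
    print([round(v, 2) for v in row])
```

In the toy data the first two individuals have identical genotypes, so their off-diagonal entry equals each one's diagonal entry, whereas the third individual, carrying the opposite homozygous genotypes, receives a negative relationship estimate.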

Despite these advances in technologies and statistical methods, it is unlikely that sequencing and identification of rare variants will be able to explain all the genetic variance for many traits. For BMI, the amount accounted for in GWA studies is at most 17% [16], and yet the heritability of BMI is typically at least 60% [19]. In this review, the case for a role of gene–environment interactions will be made, in particular, for BMI and obesity, and the use of twin studies to investigate these interactions will be discussed.

Quantitative genetics

Twin studies have become the standard approach to distinguishing between genetic and environmental effects on the variability of a trait or susceptibility to disease [20]. The proportion of variance attributable to inter-individual genetic differences is termed heritability. The derivation of, and underlying assumptions for, estimating heritability are well grounded in genetic epidemiology and quantitative genetics [21, 22]. Because heritability depends on both genetic and environmental influences, it is not a fixed characteristic of a disease or trait but a population-specific estimate, analogous to, for example, the mean height, cholesterol level or life expectancy in a population. Nor can it be interpreted at the family or individual level.
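In the notation commonly used for twin models (an assumption about notation, not taken from this review), phenotypic variance V_P is partitioned into additive genetic (V_A), common environmental (V_C) and unique environmental (V_E) components, and heritability is the additive genetic share. The worked numbers below are hypothetical and show why heritability is population-specific: the same genetic variance yields a lower heritability where environmental variance is larger.

```latex
\begin{aligned}
h^2 &= \frac{V_A}{V_P} = \frac{V_A}{V_A + V_C + V_E} \\
V_A = 6,\ V_C = 0,\ V_E = 4:\quad h^2 &= 6/10 = 0.6 \\
V_A = 6,\ V_C = 0,\ V_E = 6:\quad h^2 &= 6/12 = 0.5
\end{aligned}
```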

Approximately 1 in 50 individuals is a twin, and twins are born into all social classes. Although some characteristics of pregnancy and infancy, such as low birth weight, distinguish twins from singletons, twins are highly representative of the general population for most traits from early childhood onwards. Although twin studies have been used to estimate the heritability of different traits and disorders since the beginning of the twentieth century, only following the statistical developments of the past 20–30 years and more extensive and systematic data collection has the scope of twin studies greatly expanded. Over 50 large twin registers and cohorts currently exist worldwide; these have been well documented in themed issues of the journal Twin Research and Human Genetics in 2002 and 2006.

Basic twin model

Twin analyses enable testing of sophisticated hypotheses regarding the sources of individual differences [23]. In studies of traditional nuclear families, shared environmental and genetic effects can be difficult to disentangle, because individuals who are more closely related genetically (e.g. siblings as compared with cousins) are also more likely to share environmental influences, as illustrated above (Fig. 1). Basic twin models assume that gene–environment interactions do not exist; if present, interactions between genes and environmental factors shared by co-twins are absorbed into the additive genetic variance, inflating heritability estimates (interactions with environmental factors unique to each twin are instead absorbed into the unique environmental variance).
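The consequence of ignoring interactions in the basic model can be checked by simulation. The sketch below is illustrative only: it generates MZ and DZ pairs under a hypothetical model with standardized additive genetic (A), common environmental (C) and unique environmental (E) effects plus an optional A × C product term, and estimates heritability with Falconer's approximation h² = 2(r_MZ − r_DZ), a simple stand-in for the maximum-likelihood model fitting used in practice. Because the interaction term is fully shared by MZ pairs and half shared by DZ pairs, it behaves like additive genetic variance.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def twin_correlation(r_genes, gxc, n_pairs, rng):
    """Simulate pairs sharing a fraction r_genes of additive genetic variance
    (1.0 for MZ, 0.5 for DZ) and a common environment C; return the
    within-pair trait correlation. gxc scales an A x C interaction term."""
    twin1, twin2 = [], []
    for _ in range(n_pairs):
        g_shared = rng.gauss(0, 1)
        c = rng.gauss(0, 1)  # common environment, identical for both twins
        for out in (twin1, twin2):
            a = r_genes ** 0.5 * g_shared + (1 - r_genes) ** 0.5 * rng.gauss(0, 1)
            e = rng.gauss(0, 1)  # unique environment
            out.append(a + c + gxc * a * c + e)
    return pearson(twin1, twin2)


def falconer_h2(gxc, n_pairs=20000, seed=1):
    """Falconer heritability estimate from simulated MZ and DZ pairs."""
    rng = random.Random(seed)
    r_mz = twin_correlation(1.0, gxc, n_pairs, rng)
    r_dz = twin_correlation(0.5, gxc, n_pairs, rng)
    return 2 * (r_mz - r_dz)


print(round(falconer_h2(0.0), 2))  # true V_A / V_P is 1/3
print(round(falconer_h2(1.0), 2))  # true V_A / V_P is 1/4, but the estimate rises
```

Without the interaction the estimate is close to the true value of 1/3; with it, the expected estimate rises to about 1/2 even though the true additive share falls to 1/4, because the interaction variance is counted as genetic.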

Twin studies are considered a powerful tool and the method of choice for distinguishing cultural from genetic inheritance. In addition, they allow partitioning of variance into genetic influences, common environmental influences and unique environmental influences. The twin model has provided extensive estimates of the heritability of different traits and conditions in many different populations. Whilst some variation in heritability estimates is due to chance, because study samples are not always sufficiently large, examining the sources of this variation, particularly within a single population or with standardized methods and common protocols, can illuminate factors that modify the expression of genetic predisposition in a population. For example, the heritability of height is lower in Finnish twins born in the early part of the twentieth century than in those born later, reflecting more adverse environmental conditions such as poorer nutrition and higher infection rates [24]. If GWA study samples from populations that differ in childhood exposure to poor nutrition and infections are combined in the same analysis, some true associations are likely to remain undetected.
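Under the classical twin assumptions (MZ twins share all and DZ twins on average half of their segregating genes, both types of twins share their common environment equally, and there is no dominance or gene–environment interaction), the three components can be approximated from the MZ and DZ twin correlations with Falconer's formulas. The sketch below is a back-of-the-envelope version only; the studies reviewed here fit these models by maximum likelihood, and the input correlations in the usage example are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate ACE proportions of variance from MZ and DZ twin correlations."""
    a2 = 2 * (r_mz - r_dz)  # additive genetic variance (heritability)
    c2 = 2 * r_dz - r_mz    # common (shared) environmental variance
    e2 = 1 - r_mz           # unique environment, including measurement error
    return a2, c2, e2


# Hypothetical twin correlations for a trait such as BMI.
a2, c2, e2 = falconer_ace(r_mz=0.80, r_dz=0.45)
print(f"A = {a2:.2f}, C = {c2:.2f}, E = {e2:.2f}")  # A = 0.70, C = 0.10, E = 0.20
```

A negative c² (r_MZ more than twice r_DZ) is a common hint of non-additive genetic effects such as dominance, which this simple decomposition cannot represent.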

Developments in modelling

The basic twin model can be extended to test more complex hypotheses about traits, including sex differences. Including multiple phenotypes in the model allows one to study whether overlap between different traits, or between measurements of the same trait over time, is due to shared genetic or environmental influences. Likewise, current models allow the inclusion of information on specific genetic variants and measured environmental factors, and their interactions. In some contexts, gene–environment interactions, i.e. environmental modification of the effects of genes on the trait being studied, may account for a substantial part of the apparent heritability. Advances in statistical modelling allow the effects of moderators on heritability to be tested and the role of gene–environment interactions to be estimated [25]. The moderation model examines whether the magnitudes of the additive genetic, common environmental and unique environmental variances change as a function of a measured environmental factor [26].
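One widely used parameterization of the moderation model, following Purcell's formulation (stated here as an assumption about the form used in [26]), lets each path coefficient depend linearly on the measured moderator M. With A, C and E standardized latent factors (A correlated 1 in MZ and 0.5 in DZ pairs, C correlated 1 in both), the trait T and the moderated variance components are:

```latex
\begin{aligned}
T &= \mu + \beta_M M + (a_0 + \beta_a M)\,A + (c_0 + \beta_c M)\,C + (e_0 + \beta_e M)\,E \\
V_A(M) &= (a_0 + \beta_a M)^2, \quad V_C(M) = (c_0 + \beta_c M)^2, \quad V_E(M) = (e_0 + \beta_e M)^2 \\
h^2(M) &= \frac{V_A(M)}{V_A(M) + V_C(M) + V_E(M)}
\end{aligned}
```

Setting β_a = β_c = β_e = 0 recovers the basic twin model; a non-zero β_a indicates that the additive genetic variance changes with the moderator.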

Longitudinal models

A cross-sectional study design can be used to assess whether the magnitude of genetic and environmental influences differs across age, assuming that there are no cohort effects. For example, using Finnish twin data, we found that the effect of genetic factors on coffee consumption was greater in young than in middle-aged adults but increased again in old age [27]. This might suggest that, given the coffee-drinking culture in Finnish workplaces with set coffee breaks, the environmental pressure to drink coffee is greater amongst individuals of working age, whereas younger adults and older retired adults can more freely express their biological predisposition. To identify the genes underlying this predisposition, it may be important to take into account environmental sources of variation in the genetic effect. Several genes, including CYP1A1 and CYP1A2, were identified in a recent GWA study, but with very modest effect sizes [28]; this study of coffee consumption did not have the power to address differential effects across age.

However, it is desirable, and in fact necessary, to use longitudinal models to investigate whether the same genetic and environmental factors operate over time. Using three self-reported measurements of BMI over a 15-year period amongst adults in the Finnish Twin Cohort, we applied a latent growth model to estimate genetic effects on BMI level at baseline and on the rate of change in BMI. The twins were aged 20–46 years at baseline. The heritability of the stable part of BMI was high, but we also found a substantial genetic influence on weight gain [heritability for men = 58% (95% CI 50–69%), heritability for women = 64% (95% CI 58–69%)]. The genetic correlation between BMI level and rate of change in BMI was almost zero, suggesting that the genes affecting BMI level differ from those involved in weight change in adults [29].

To extend these observations, we investigated growth patterns in adolescents and young adults [30]. The study sample consisted of 4915 monozygotic and like- and unlike-sex dizygotic twins born between 1975 and 1979. Data on BMI were gathered when the twins were on average 16.1, 17.1, 18.6 and 24.4 years old. Genetic and environmental influences on the BMI trajectories were modelled using latent growth curves (Fig. 2). The heritability of BMI decreased slightly after adolescence, from about 80% to 70%. The BMI transition from adolescence to young adulthood was best described by a quadratic trajectory that was largely (73–84%) accounted for by additive genetic influences. As in the older twins, genetic influences on BMI level showed a low correlation with those on the increasing trend in BMI with age, indicating that different sets of genes underlie the change in BMI during these periods. Figure 3 shows that approximately three-quarters of the genetic variance of BMI in young adults was the same as in adolescents at the end of puberty. This study confirmed and extended the findings of several twin and family studies of longitudinal weight change, as summarized in a recent meta-analysis [31].
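A statement such as ‘three-quarters of the genetic variance in young adulthood is shared with adolescence’ can be read through a bivariate Cholesky decomposition of the genetic covariance matrix: the share of the later genetic variance carried over from the earlier measurement equals the squared genetic correlation, r_g². The sketch below uses hypothetical variances and a hypothetical r_g of 0.87 (so r_g² ≈ 0.76); it does not reproduce the estimates in [30], and the function names are illustrative.

```python
import math


def bivariate_cholesky(var1, cov12, var2):
    """Lower-triangular factor [[l11, 0], [l21, l22]] of [[var1, cov12], [cov12, var2]]."""
    l11 = math.sqrt(var1)
    l21 = cov12 / l11                 # path from the time-1 genetic factor to time 2
    l22 = math.sqrt(var2 - l21 ** 2)  # path from the genetic factor new at time 2
    return l11, l21, l22


def carried_over_share(var1, cov12, var2):
    """Share of time-2 genetic variance explained by the time-1 genetic factor."""
    _, l21, l22 = bivariate_cholesky(var1, cov12, var2)
    return l21 ** 2 / (l21 ** 2 + l22 ** 2)


# Hypothetical genetic variances of BMI, in (kg/m^2)^2, at the two ages.
var_adolescent, var_adult, r_g = 4.0, 5.0, 0.87
cov = r_g * math.sqrt(var_adolescent * var_adult)
print(round(carried_over_share(var_adolescent, cov, var_adult), 2))  # 0.76
```

The remaining share corresponds to genetic variance that appears only at the later measurement, which is what the columns of a Cholesky-style figure such as Fig. 3 display as newly added variance.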

Figure 2.

Growth curve models for four waves of BMI in the Finntwin16 study [30] from age 16.1 to 24.4 years (young adulthood). Numbers are standardized estimates (and 95% confidence intervals) of genetic (A) and environmental (E) variances on BMI level and BMI slopes. Genetic and environmental correlations between BMI level and BMI slopes are shown on curved arrows. Environmental influences with no effects across time are shown as Esx, whilst the heritability of BMI at each measurement is shown as h2.

Figure 3.

Genetic and environmental variances of BMI in adolescent and young adult twins from the FT16 study [30]. The columns show how new variance in BMI is added across measurements (at ages 16, 17 and 18.5 years), whilst the fourth measurement was taken when the subjects were young adults (mean age 24.4 years). Data combined for male and female subjects; gender-specific data are provided in the original article.

Given that BMI captures variability in both lean and fat mass, whereas most of the increase in mass after growth is complete is fat rather than muscle, this weak genetic correlation is not surprising. In the large EPIC-Norfolk cohort study reported in 2010 [32], a genetic risk score based on the 12 BMI genes known at the time did not predict weight change prospectively.

Modification by selected environmental factors

In addition to age and time, genetic influences on obesity may vary as a function of other factors. Socio-economic status, especially level of education, is known to be associated with rates of obesity, but how it modifies the expression of genetic susceptibility to obesity had not been studied. We used data from the longitudinal FinnTwin12 study to investigate how parental education modifies genetic and environmental influences on BMI during adolescence [33]. Self-reported BMI values at the ages of 11–12, 14 and 17 years were collected from a population sample of 2432 complete Finnish twin pairs born between 1983 and 1987. On the basis of parental report, twins were grouped into those with high (both parents were high school graduates), mixed (only one parent graduated from high school) and limited (neither parent graduated) levels of parental education. Genetic and environmental influences on variation in BMI were then modelled using a twin model stratified by parental education level.

The heritability of BMI amongst 11- to 12-year-olds with a high parental education level was 85–87% compared to 61–68% if the parental education level was limited or mixed. A common environmental effect, that is, environmental factors shared by family members, was found (17–22%) if the level of parental education was limited or mixed but not if it was high. With increasing parental education, common environmental variance in BMI decreased at age 14 amongst boys (from 22% to 3%) and girls (from 17% to 10%); heritability increased amongst boys from 63% to 78%, but did not change amongst girls. The common environmental component disappeared and heritability of BMI was larger at the age of 17 for all parental education groups. These results are summarized in Fig. 4. We found that in highly educated families, a common environment did not affect variation in adolescent BMI but did in families with limited parental education [33]. This indicates that the contribution of various aetiological factors to early obesity development is likely to depend on parental educational level. If confirmed by other studies, these findings suggest that prevention programmes could be tailored according to parental educational level.

Figure 4.

Proportion of variance (as percentage of the total) in BMI in adolescent boys as a function of age (three time-points) and parental educational level (low, mixed or high). Blue bars represent genetic variance (heritability) and red areas show the proportion of variance due to common environmental effects. Adapted from Lajunen et al. [33].

Modification of the heritability of BMI and related traits by the level of physical activity has been convincingly shown in three recent studies [34-36]. Based on moderation models of twin data, the heritability of BMI is dependent on the level of physical activity at leisure time amongst young adult twins, as shown in Fig. 5 [35]. This dependence on physical activity was also seen for waist circumference in men. Similar results were obtained in the Danish Geminakar study of adult twins and in the Finntwin12 study of young adult twins [36] as well as amongst middle-aged Vietnam veteran twins [34].
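The moderation model can be used to trace heritability across levels of a moderator such as leisure-time physical activity. The sketch below evaluates h²(M) = V_A(M)/[V_A(M) + V_C(M) + V_E(M)], with each variance component equal to a squared linear path coefficient, (a0 + a1·M)², (c0 + c1·M)² and (e0 + e1·M)². The path values and the 0 to 4 activity coding are hypothetical, chosen only so that the genetic path shrinks with activity, and are not the estimates of [34-36].

```python
def moderated_h2(m, a0, a1, c0, c1, e0, e1):
    """Heritability at moderator level m when each path is linear in m."""
    va = (a0 + a1 * m) ** 2  # additive genetic variance
    vc = (c0 + c1 * m) ** 2  # common environmental variance
    ve = (e0 + e1 * m) ** 2  # unique environmental variance
    return va / (va + vc + ve)


# Hypothetical path estimates; activity coded 0 (inactive) to 4 (most active).
paths = dict(a0=0.9, a1=-0.1, c0=0.0, c1=0.0, e0=0.5, e1=0.02)
for level in range(5):
    print(level, round(moderated_h2(level, **paths), 2))
```

Because the components are squared paths they can never become negative, but a linear path that crosses zero makes the variance rise again, so such curves should only be interpreted within the observed range of the moderator.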

Figure 5.

Modification of heritability estimates for BMI by level of leisure physical activity amongst young adults. Fitted models based on moderation models of genetic and environmental influences on BMI. Adapted from Mustelin et al. [35]

Li et al. [32] extended this model design to use measured genotypes known to be associated with BMI. In the very large EPIC-Norfolk study of more than 20 000 adults, a genetic risk score based on a dozen known genes identified from GWA studies was constructed. The study showed cross-sectionally that physical activity attenuated the association between risk score and BMI (Fig. 6 [32]), but prospectively the result was even more dramatic (Fig. 7). Overall, the BMI genetic risk score was not associated with weight change (based on measured weight at baseline and at follow-up). The association between genetic risk score and weight change was positive in inactive subjects, as expected given that the score is built from risk alleles for BMI. However, amongst active subjects, the direction of the association was reversed; that is, those with a genetic predisposition to gain weight in the current obesogenic and largely physically inactive environment were more likely to lose weight when active. Physical activity itself was not correlated with the BMI genetic risk score, so it acts here as a purely environmental factor. As the authors concluded, ‘these findings challenge the deterministic view of the genetic predisposition to obesity that is often held by the public, as they show that even the most genetically predisposed individuals will benefit from adopting a healthy lifestyle’ [32].
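The cross-sectional part of this design can be mimicked on simulated data: build a genetic risk score, then compare the association between score and BMI in inactive and active subgroups. Everything in the sketch below is hypothetical (12 SNPs with equal allele frequency 0.3, a simple unweighted allele count, half of the sample active, per-allele effects of 0.15 and 0.05 kg/m² in inactive and active people, residual SD 3 kg/m²), and comparing subgroup slopes is a simplified stand-in for fitting a score × activity interaction term.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_grs_by_activity(n=20000, n_snps=12, freq=0.3, seed=7):
    """Return the BMI-on-score slope separately for inactive and active people.
    All parameters are hypothetical; the per-allele effect is set smaller in
    the active group to mimic an attenuating interaction."""
    rng = random.Random(seed)
    groups = {False: ([], []), True: ([], [])}  # active? -> (scores, bmis)
    for _ in range(n):
        # Unweighted score: count of risk alleles over n_snps SNPs.
        grs = sum((rng.random() < freq) + (rng.random() < freq) for _ in range(n_snps))
        active = rng.random() < 0.5  # activity drawn independently of the score
        per_allele = 0.05 if active else 0.15  # kg/m2 per risk allele (assumed)
        bmi = 25.0 + per_allele * grs + rng.gauss(0, 3)
        groups[active][0].append(grs)
        groups[active][1].append(bmi)
    return {("active" if k else "inactive"): ols_slope(*v) for k, v in groups.items()}


print(simulate_grs_by_activity())
```

The per-allele slope is clearly smaller amongst active people; in real data, the corresponding test is usually the coefficient of a score × activity product term in a single regression model.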

Figure 6.

Attenuation of the association between BMI genetic score and BMI by level of physical activity. Source [32].

Figure 7.

Modification of the association between BMI genetic risk score and weight change in the prospective EPIC-Norfolk study. Source [32].

Reliable demonstration of gene–environment interactions requires very large sample sizes for the SNPs with typical effect sizes found in GWA studies. The role of physical activity in modifying the action of the FTO gene on BMI was conclusively demonstrated in a meta-analysis of 45 studies with 218 166 adults [37]. Physical activity attenuated the association between the FTO risk allele and the odds of obesity by 27% as predicted by the twin models. The studies of Li et al. [32] with a dozen genes and of Kilpeläinen et al. [37] demonstrate that very large sample sizes are needed. However, the options for creating gene risk scores are increasing. Once genome-wide genotype marker data have been obtained, they can be used to construct measured gene by measured environment interaction models, using targeted genes (i.e. the ‘traditional candidate gene’ approach; as in [37]) or nominally significant GWA study data for construction of genetic risk scores of likely causal genes (i.e. the most genetically informative subset of genes in the genome; as in [32]). Finally GCTA [17], which was designed to estimate the proportion of phenotypic variance explained by genome- or chromosome-wide SNPs for complex traits, can be applied in gene–environment interaction analyses for specific environments (including, for example, physical activity, diet or history of deprivation in childhood).
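Genetic risk scores of the kind discussed above are typically either an unweighted count of risk alleles or a sum of allele dosages weighted by per-allele effect sizes from a discovery GWA study. The sketch below shows both; the function name, SNP identifiers, dosages and weights are hypothetical placeholders.

```python
def genetic_risk_score(dosages, weights=None):
    """Genetic risk score for one individual.

    dosages -- mapping SNP id -> risk-allele dosage (0, 1, 2 or an imputed value)
    weights -- optional mapping SNP id -> per-allele effect size; if omitted the
               score is the unweighted count of risk alleles
    """
    if weights is None:
        return float(sum(dosages.values()))
    missing = set(dosages) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[snp] * dose for snp, dose in dosages.items())


person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}          # hypothetical genotypes
betas = {"snp_a": 0.10, "snp_b": 0.30, "snp_c": 0.05}  # hypothetical effect sizes
print(genetic_risk_score(person))          # unweighted: 3.0
print(genetic_risk_score(person, betas))   # weighted sum of beta * dosage
```

Raising an error for SNPs without weights, rather than silently skipping them, keeps scores comparable across individuals; the resulting scores can then enter an interaction model with a measured environment such as physical activity.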


The past decade has seen the most rapid changes in genetics, as sequencing of the human genome has translated into tools to understand the role of genetic variation underlying inter-individual differences in traits and risk of disease. Relatively few common variants with even modest effect size were found with the first wave of GWA studies, thus generating debate on the source of the missing heritability; however, recent progress in increasing the coverage of the genome, increasing study sample size through meta-analyses and improving phenotyping have meant that a substantial proportion of the variance can now be accounted for in many common traits and conditions. Further statistical and technological advances can be expected to increase the understanding of the complex genetic epidemiology of these traits and diseases.

For traits such as obesity and BMI, which are known to vary with time, between countries and even within countries by social strata, gene–environment interactions appear to be of importance. This importance has been clearly shown by results from twin studies and supported by the information gained from recent GWA studies. To understand the genetic epidemiology of diseases and traits, study designs based on the known characteristics of the disease are needed to further elucidate the aetiological factors.

Conflict of interest statement

No conflicts of interest were declared.


Studies in the Finnish Twin Cohort have been supported by the Academy of Finland Center of Excellence in Complex Disease Genetics (grants 213506 and 129680), the Academy of Finland (grants 100499, 205585, 118555 and 141054), Global Research Awards for Nicotine Dependence (GRAND), European Network for Genetic and Genomic Epidemiology (ENGAGE), FP7-HEALTH-F4-2007 (grant 201413) and by the NIH (grants DA12854, AA12502, AA00145, AA09203, AA15416). I thank Alfredo Ortega-Alonso, PhD for creating Figs 2 and 3. I would also like to thank all my past and present co-authors and collaborators.