Beyond genotype to phenotype: why the phenotype of an individual cannot always be predicted from their genome sequence and the environment that they experience


  • Alejandro Burga,

    1. Genetic Systems, EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Spain
  • Ben Lehner

    Corresponding author
    1. Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
    2. Genetic Systems, EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Spain


B. Lehner, Genetic Systems, EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG) and UPF, Dr. Aiguader 88, 08003 Barcelona, Spain

Fax: +34 93 316 00 99

Tel: +34 93 316 01 00



One promise of personalized medicine is that it will be possible to make useful predictions about the phenotypes of individuals from their complete genome sequences (e.g. concerning their susceptibility to disease). However, to what extent is knowledge about an individual's genotype, together with information about the environment that they have experienced, sufficient to predict phenotypic variation? In the present review, we argue that, although the ‘typical’ phenotypic outcome of an individual's genome can be predicted, it is much more difficult to predict the actual outcome for a particular individual. We highlight three reasons for this. First, the outcome of mutations can be influenced by random (stochastic) processes. Second, genetic variation present in one generation can influence phenotypic traits in the next generation, even if individuals do not inherit this variation. Third, the environment experienced by one generation can influence phenotypic variation in the next generation. These contributions to phenotypic variation have long been appreciated by quantitative geneticists, although they have only recently been studied at the molecular level. Taken together, they mean that, in many cases, the genotypes of individuals and the environment that they experience may not be sufficient to determine their phenotypes. A more comprehensive genotype-to-phenotype model will be required to make accurate predictions about the biology of individuals.




RNA interference


What is the biological basis of individuality? To put it simply: what makes each of us different from each other? The phenotype of each individual is usually considered as an interaction between two variables: the genes each individual carries and the environment that they experience. In this review, the word ‘environment’ refers to general external factors that are (or could be) shared by groups of individuals, such as diet, exposure to pathogens and lifestyle. However, are the genes and the environment of an individual sufficient to determine their phenotype? We review the evidence that suggests this is not always the case. In the present review, we focus on three areas. First, recent studies have provided insights into how stochastic molecular variation can be an important influence on phenotypic variation. Second, the outcome of an inherited mutation can be influenced by the genotype of the previous generation. Third, in multiple species, it is now clear that the environment of one generation can influence the phenotype of subsequent generations. Research in these three areas is enriching our conception of what determines an individual, revealing that the genes and the environment of an individual are not necessarily sufficient to determine their phenotype (Fig. 1). We speculate that this may also be true in our own species, and that it could be important for understanding and predicting disease susceptibility.

Figure 1.

Factors influencing the phenotype of an individual. In addition to the genome and environment experienced by individuals, recent advances have provided molecular insights into the influence of non-inherited parental genetic variants, parental environment and stochastic molecular variation (noise). These factors can be classified according to whether they would contribute to the traditional genetic (Vg) (light blue) or environmental (Ve) (light red) components of phenotypic variance (Vp) used in the population genetics literature.

Mapping phenotype to genotype plus the environment

The distinction between genotype and phenotype was an important conceptual advance made by Danish researcher Wilhelm Johannsen in 1911 [1]. When studying the inheritance of traits such as seed size in beans, he noted that, even though the beans were derived from highly inbred (isogenic) lines, the size of the seeds was variable and followed a normal distribution. To explain this observation, he differentiated between the genotype and phenotype of an individual seed and attributed the differences in seed size to variability in the environment in which the plants developed. This joint contribution of genes and environment has been the reigning paradigm that has been used to explain phenotypic variation among individuals. This suggests that, if we know the genome sequence of an individual and understand how the environment that they experience modifies the action of these genes, we should be able to predict all of their phenotypic traits. But how much can we say about any individual human simply based on knowledge of their genome sequence? A recent study went some way towards addressing this question in the model organism budding yeast, showing that, at least for some phenotypic traits, reasonable predictions could be made about how they vary relative to a reference individual from individual genome sequences [2]. The prevalence of genetic interactions or epistasis in genomes [3, 4] also poses a challenge for these predictions because the effect of a particular variant could depend on the genetic background. However, such studies only assess the ‘typical’ outcome of a particular genome sequence, and not the actual phenotype of each individual with that genome.

There is no doubt that both genetic and environmental variation have a major role in determining phenotypic variation [5]. What is less clear in many instances is their relative contribution. In 1920, Sewall Wright, one of the founders of quantitative genetics, addressed this problem by studying the relative importance of inheritance and environment in the piebald pattern of guinea pigs [6]. Wright could estimate the contribution of genes (heritability) to the total phenotypic variance from parental–offspring correlations and, in addition, he could estimate the contribution of the environment common to litter mates before birth (tangible environment). In an outbred control population, 42% of the phenotypic variation in piebald pattern could be attributed to genes but, unexpectedly, the contribution of the tangible environment was very small. Wright hypothesized that the remaining 58% must be a result of ‘irregularities during development due to the intangible sort of causes to which the word chance is applied’ [6].

Seventy years later, Gärtner reached similar conclusions by studying highly inbred lines of mice that showed extensive phenotypic variation in diverse traits even when grown in controlled environments and attributed this variation to a ‘third component’ in addition to genes and environment [7]. In quantitative genetics, the term ‘environment’ was initially used by Fisher to denote arbitrary external causes independent of heredity [8]. Thus, traditionally, the total phenotypic variance of a population (Vp) is partitioned into a genetic (Vg) and an environmental (Ve) component, where Vp = Vg + Ve. The environmental variance term, Ve, includes both tangible and intangible (stochastic) sources of variation [9] (Fig. 1).
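The partition Vp = Vg + Ve can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies: it assumes that genetic and environmental deviations are independent, additive and normally distributed, and it borrows the 42%/58% split from Wright's piebald estimate purely as illustrative input. The function name and all parameters are hypothetical.

```python
import random
import statistics


def simulate_variance_partition(n=20000, vg=0.42, ve=0.58, seed=1):
    """Simulate phenotypes as independent genetic + environmental deviations.

    vg and ve are assumed (illustrative) variance components, not measured values.
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, vg ** 0.5) for _ in range(n)]
    environmental = [rng.gauss(0.0, ve ** 0.5) for _ in range(n)]
    phenotype = [g + e for g, e in zip(genetic, environmental)]
    # Estimate each variance component from the simulated population.
    est_vg = statistics.pvariance(genetic)
    est_ve = statistics.pvariance(environmental)
    est_vp = statistics.pvariance(phenotype)
    return est_vp, est_vg, est_ve


vp, vg, ve = simulate_variance_partition()
print(f"Vp = {vp:.3f}, Vg + Ve = {vg + ve:.3f}, Vg / Vp = {vg / vp:.3f}")
```

In this toy population the estimated Vp is close to the sum of the two components only because the deviations were drawn independently; any covariance or interaction between genotype and environment would add further terms. Note also that the Ve term here lumps tangible and intangible (stochastic) sources together, exactly as in the traditional partition.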

Progress made in the last decade has increased our knowledge about the sources and consequences of this previously intangible variation. This, together with the fact that an important component of the phenotypic variance is not genetic, nor a result of general environmental factors, suggests that the measurement of stochastic variables is likely to increase our predictive power for individuals [10] and also suggests that stochastic processes should be considered as a third variable, independent from the environmental variance term [11]. Moreover, when aiming to predict the particular phenotype of individuals, it becomes a necessity to gain molecular insights into the various mechanisms in which Vg and Ve are further partitioned (Fig. 1).

Most human diseases also have a significant genetic component (i.e. heritability) [5], yet many observations indicate that genes and the tangible environment are insufficient to predict the phenotypes of individual people. Monozygotic (MZ) twins often differ substantially in their susceptibility to disease, despite sharing the same genome sequence and normally similar environments [12]. For example, if a woman develops breast cancer, the probability that her monozygotic twin sister will also develop this condition is only approximately 0.2 (concordance rate of 20%) [13]. Similarly, if a man suffers from schizophrenia, his twin brother will suffer the same condition in approximately 50% of cases [14] (i.e. the genetic equivalent of tossing a coin). Although twin studies suggest the presence of a third factor besides genes and environment, a low concordance rate in disease liability in MZ twins cannot be taken as evidence for a lack of heritability [15]. MZ twins may show some somatic genetic variation [16, 17], although this is unlikely to explain phenotypic discordance in most cases [18, 19]. Importantly, for these and many additional diseases [12], concordance rates set a limit to the best predictions we can ever hope to achieve from knowledge of an individual's genome alone. Aside from somatic mutations arising in particular tissues [20], what are the causes of this variation in the outcome of mutations? Is the general environment responsible for these differences? No single pair of individuals has experienced exactly the same environment. However, several studies performed with twins who grew up in the same or different families have tried to estimate the contribution of the ‘shared environment’. Shared environment makes reference to the environmental factors that two siblings growing in the same family have in common. The unexpected result of these studies is that the proportion of the variance explained for most traits and diseases is quite small. In other words, genetics aside, two siblings growing in the same house are not much more similar than any two children taken from the same population [21, 22]. In model organisms, the failure of a simple genotype-to-phenotype model is even more striking because both the genotype and the environment can be controlled. For example, in both Caenorhabditis elegans [23] and mice [7, 24, 25], isogenic individuals often show substantial phenotypic variation in a highly controlled homogeneous environment, particularly when they carry detrimental mutations [26] (Fig. 2). What is the nature of this third variable?

Figure 2.

Stochastic influences on the outcome of mutations. Radioactive decay is a stochastic process at the atomic level but deterministic in a population (A). Stochasticity and determinism are not mutually exclusive. Variation in the outcome of an inherited mutation in C. elegans (B). The mutation flh-1(bc374) inactivates a transcription factor required for embryonic development. However, even in isogenic strains and in a homogenous environment, only a subset of individuals is detrimentally affected by the mutation.

Tossing a coin: how inter-individual variation in gene expression can influence the outcome of mutations

It has long been suspected [27, 28] that isogenic individuals vary stochastically at the molecular level because of the low copy numbers of many important biological molecules involved in gene expression and the genome itself. Only recently, however, has it become possible to visualize and to quantify this inter-individual variation using fluorescent reporter constructs and single molecule detection techniques [29, 30]. Inter-individual variation in gene expression can be substantial, in both bacteria and higher eukaryotes [31-33] and can have important biological consequences, such as in cell signalling and development [34, 35]. It can also be a substantial cause of variation with respect to the phenotypic consequences of inherited mutations, as revealed in several recent studies [10, 36-38]. In this review, we focus on variability in gene expression, which could also be influenced by variation in DNA methylation [39] or errors in transcription or translation [40]. However, promiscuous molecular interactions or catalysis [41, 42], protein aggregation [43], variable cell-to-cell contacts and variation in mechanical force generation could all have a stochastic nature that influences phenotypic variation.

The first study to link variability in gene expression to the phenotypic outcome of a mutation (incomplete penetrance) was performed by Elowitz et al. [44]. The bacterium Bacillus subtilis can develop into a dormant spore when environmental conditions do not guarantee proper cell growth [45]. The gene spoIIR, together with other genes, activates this differentiation programme. Genetic modifications affecting the rate and/or time of onset of spoIIR expression were shown to affect the sporulation process of only a subset of the population. Quantifying the induction of a transcriptional reporter for spoIIR by time-lapse microscopy revealed that variation in the rate and delay of expression of this gene was correlated with the phenotypic outcome of a mutation in spoIIR in each individual. However, variation in the induction of the mutated gene only partially explained variation in the outcome of the mutation amongst individuals. This suggests that other unidentified factors, such as variation in the expression of other genes, are also important: reduced activity of the spoIIR gene exposed phenotypic variation that was normally buffered by the fully active gene [44].

The downstream consequences of an incompletely penetrant mutation have also been studied in C. elegans [36]. As a model, Van Oudenaarden and colleagues [36] used the development of the worm intestine, for which the gene regulatory network is well described [46]. The gene skn-1 codes for a transcription factor that is maternally provided to the zygote and is required for embryonic specification of the EMS blastomere [47]. Embryos from mothers carrying mutations in skn-1 undergo a developmental arrest (100% lethality) and the endoderm is absent in approximately 70% of embryos [47]. The SKN-1 protein initiates a highly redundant regulatory cascade by activating the expression of pairs of transcription factors, including med-1/med-2 and end-1/end-3, which later leads to the activation of elt-2, the master regulator of gene expression in the intestine. Using a fluorescence in situ hybridization technique that can detect single mRNA molecules, the levels of several downstream targets of skn-1 were quantified in individual embryos. skn-1 mutants showed an increased variability among individuals in the expression of the target gene end-1 and a lower mean level compared to wild-type worms. Animals with end-1 expression below a certain threshold failed to activate elt-2, producing an approximate binary response in the activation of this gene that presumably affects intestinal development. In a similar study, mutations in the transcription factor alr-1 were shown to increase variation in the expression of the target gene mec-3 in C. elegans [37].
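The way a threshold converts continuous variation in expression into an approximately binary outcome can be sketched numerically. The following is a toy model only, not the analysis used in [36]: the expression levels are drawn from a normal distribution, and the means, standard deviations and threshold are hypothetical values chosen so that the mutant has a lower mean and a higher variability than the wild-type, as described above.

```python
import random


def fraction_activating(mean, sd, threshold, n=10000, seed=0):
    """Fraction of simulated embryos whose upstream expression reaches a threshold.

    All numbers are hypothetical; expression is modelled as a normal variable
    truncated at zero, which is an assumption made for illustration.
    """
    rng = random.Random(seed)
    activated = 0
    for _ in range(n):
        level = max(0.0, rng.gauss(mean, sd))  # expression cannot be negative
        if level >= threshold:
            activated += 1  # downstream gene switched on in this embryo
    return activated / n


# Wild-type: high mean, low variability. Mutant: lower mean, higher variability.
wild_type = fraction_activating(mean=100, sd=10, threshold=50)
mutant = fraction_activating(mean=60, sd=30, threshold=50)
print(f"wild-type activating: {wild_type:.2f}, mutant activating: {mutant:.2f}")
```

Under these assumed parameters essentially every wild-type embryo clears the threshold, whereas only a subset of the genetically identical mutant embryos does, which is one simple way to picture incomplete penetrance.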

These studies [36, 37] demonstrate how mutations in transcription factors can increase the variability in the expression of target genes and also how this variability can propagate in gene regulatory cascades. However, although single molecule detection by fluorescence in situ hybridization allows an elegant quantitative description of the system, the use of fixed samples limits the study of gene expression dynamics and makes it difficult to establish a causal relationship between molecular variation and the actual phenotypic variation that occurs in each individual. The use of fluorescent reporter proteins such as green fluorescent protein allows the dynamics of gene expression to be quantified in vivo and compared among individuals. Importantly, the use of live imaging to quantify gene expression also makes it possible to evaluate how well particular differences in gene expression predict later phenotypic variation.

Genetically identical C. elegans individuals growing in a homogeneous environment show a high degree of variation in their lifespan, similar to the high variability present in human populations [48]. Also, isogenic lines of rats growing under laboratory controlled conditions show a large range of variability in their life spans (ranging from 60 to 140 weeks) [49]. Rea et al. [50] noted that, after applying a mild heat stress, which extends lifespan, the induction of a transcriptional fluorescent reporter for the heat shock hsp-16.2 gene was highly variable among individual worms. Furthermore, they showed that worms inducing higher levels of the reporter were longer lived compared to their low-expression counterparts [50]. However, given that this heat stress increases the mean lifespan of the population, it is not clear whether the predicted lifespan differences are the result of intrinsic differences in the nonstressed individuals or differences in the heat shock response.

In another example of the use of reporters, Pincus et al. [51] developed an ingenious way of measuring the growth rate and reporter fluorescence from individual worms during most of their lifespan. They found that variability in the expression levels of three microRNA reporters through mid-adulthood (mir-71, mir-246 and mir-239) was a predictor of lifespan [51]. Given that these microRNAs act upstream in the insulin pathway, variability in their levels may be one causal determinant of lifespan. Also, there is evidence that a variable pathogenicity response among individuals, as reported by a sod-3 transcriptional reporter, could play a role in determining the lifespan of C. elegans [52]. In all of these studies, however, reporter gene expression levels are measured quite late in life, meaning that they could, in reality, be reporting on variation in the life-history exposure to environmental stimuli.

We have also made use of fluorescent reporters to understand the causes of incomplete penetrance in C. elegans [10]. tbx-8 and tbx-9 are two partially redundant genes that originated from a gene duplication event and are required for the morphogenesis of the worm [53, 54]. A double knockout of these genes is 100% embryonic lethal, although the deletion of each gene alone results in a subset of embryos with abnormal phenotypes, despite the fact that all of them are genetic clones developing in the same environment. We found that tbx-9 null mutants induced higher levels of tbx-8 compared to non-mutant animals, indicating the existence of a compensatory feedback circuit [55, 56]. Moreover, the expression of this reporter correlated with the phenotypic outcome of the mutation: those embryos with higher levels of induction of the redundant gene were more likely to develop into phenotypically normal animals.

We also found that variation in the expression of chaperones such as daf-21 (homologue of mammalian HSP90) early in development predicted tbx-9 mutation outcome. The outcome of many mutations is, directly or indirectly, dependent on chaperone activity [38, 57-59], and so stochastic variation in chaperone activity among individuals may represent a somewhat general influence on mutation outcome [10, 38]. In the case of the tbx-9 mutation, variation in the two buffering mechanisms (partially redundant gene duplicate and molecular chaperone expression) is independent. Thus, simultaneously quantifying the levels of both systems increased the accuracy of predictions. For any particular mutation, variation in many genetic interaction partners [3] could contribute to incomplete penetrance. By quantifying this variation, it should be possible to increase the accuracy of phenotypic predictions in individuals carrying a mutation.
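Why measuring two independent buffering systems improves prediction can be shown with a toy simulation. This is a sketch under stated assumptions, not the model used in [10]: the outcome is assumed to depend on the sum of two independent, standardized reporter levels (standing in for paralog induction and chaperone expression) plus an unmeasured stochastic term, and all names and distributions are hypothetical.

```python
import random


def prediction_accuracy(n=20000, seed=2):
    """Compare outcome prediction from one measured factor versus two.

    Assumes three independent standard-normal contributions; the embryo is
    'normal' when their sum is positive. Purely illustrative.
    """
    rng = random.Random(seed)
    hits_single = 0
    hits_both = 0
    for _ in range(n):
        paralog = rng.gauss(0, 1)      # hypothetical redundant-gene reporter level
        chaperone = rng.gauss(0, 1)    # hypothetical chaperone reporter level
        unmeasured = rng.gauss(0, 1)   # residual stochastic variation
        normal = (paralog + chaperone + unmeasured) > 0
        # Predict 'normal' when the measured factor(s) are above average.
        hits_single += (paralog > 0) == normal
        hits_both += ((paralog + chaperone) > 0) == normal
    return hits_single / n, hits_both / n


acc_single, acc_both = prediction_accuracy()
print(f"one reporter: {acc_single:.2f}, two reporters: {acc_both:.2f}")
```

In this setting the second reporter helps precisely because it carries information that the first does not; if the two measurements were strongly correlated, adding the second would improve the prediction far less.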

Not your genes, but your parents'

Beyond stochastic molecular variation, there are other reasons why the genotype and environment of an individual can be insufficient to explain their phenotype. For example, some mutations do not affect the phenotype of the individuals who carry them but rather that of their offspring. Famous examples of this phenomenon are mutations in genes that code for maternal factors controlling development. The early development of a wide range of organisms depends on the activity of genes contributed maternally as mRNA or protein [60]. For example, in Drosophila melanogaster, genes such as bicoid and hunchback are contributed maternally and deposited as mRNA during oogenesis [61]. Consequently, a female fly homozygous mutant for bicoid will produce offspring with a ‘mutant’ phenotype even if the offspring themselves are only heterozygous for the mutation. Thus, in this relatively simple scenario, it is clear that the phenotype of the offspring also depends on the genotype of the mother.

Interestingly, the parental genotype can also influence disease susceptibility in offspring. Xing et al. [62] studied this phenomenon in Drosophila using a hyperactive JAK kinase mutant that develops blood tumours. A screen for genetic modifiers of this mutation (hopTum-I) revealed that many of these modifiers (such as Krüppel) act epigenetically and that their effects persisted in the offspring for at least two generations even in the absence of the modifier mutation [62] (Fig. 3). If this phenomenon also applies to humans, then the genetic variation in your parents' genome that you did not inherit could increase your risk of developing disease (e.g. cancer).

Figure 3.

Parental genotype influences on an offspring's phenotype. Paternal non-inherited genotype influences blood tumour susceptibility in the offspring of flies [62]. Mutations in Krüppel (Kr1) can enhance tumour formation in offspring carrying a HopTum-I mutation even when the offspring do not inherit the Kr1 mutation (*).

Another elegant example of an individual's phenotype being influenced by the genetics of their parents is provided in C. elegans. Genetic incompatibility occurs when individuals with different genotypes produce nonviable or infertile offspring. This occurs in crosses between two different strains of C. elegans; when the two strains are mated, a subset of the hybrid progeny die. Mapping the loci underlying this incompatibility identified two important natural variants. For one of these, it is the zygotic genotype that is important but, for the other, it is the parental genotype that matters: embryos homozygous for a mutation in the gene zeel-1 only arrest if the parent's sperm carries the incompatible allele in the gene peel-1 [63]. Thus, the outcome of a mutation in zeel-1 depends on the parental genotype at a second locus.

Blame your parents' lifestyle: transgenerational ‘epigenetic’ influences on the outcome of inherited mutations

Mutations in particular genes are relatively infrequent events. However, changes in the environment can be much more frequent. How can biological systems cope with this? In response to environmental change, a single genome can produce different phenotypes providing adaptation to the new conditions. This property is known as phenotypic plasticity [64]. If a new environmental condition is also likely to be experienced by the next generation, then it could prove adaptive to pass on information about the environment to the next generation and so elicit an appropriate phenotypic response. An old and controversial idea, such ‘Lamarckian’ influences on phenotypic variation are again beginning to receive much attention.

Phenotypic changes that occur in response to an environmental change in the previous generation have now been reported in several species [65-68]. One example is the flowering time in the monocarpic herb Campanulastrum americanum, where the maternal light environment influences the choice between annual and biennial flowering time in the next generation [69]. This ‘transgenerational’ plasticity could provide a fitness advantage, predicting long-term environmental variation [69].

In plants, there is no early separation of soma and germline, and gametes can be derived from somatic tissue late in development. It is therefore much easier to envisage how the environment experienced by one generation could influence the phenotype of the next generation. However, there are now also increasing numbers of likely examples of epigenetic (non-DNA encoded) inheritance described in animals [67-70]. For example, the waterflea Daphnia cucullata responds to the presence of predators by changing its morphology and developing a larger helmet. Agrawal et al. [70] showed that the progeny of mothers who grew in the presence of predators were better protected (had larger helmets) than those whose mothers grew in a control environment. This adaptive phenotypic plasticity effect could be detected for two generations.

In nematodes, RNA interference (RNAi)-triggered gene silencing can be inherited for many generations when selecting for phenotypically silenced progenitors in the absence of the original dsRNA trigger [71-73]. A recent study by Hobert and colleagues provided a potential explanation for the existence of such a mechanism. It was found that the anti-viral RNAi-response of C. elegans [74] can also be transmitted to its progeny, and also that the antiviral RNA agent could be transmitted through sperm [75]. Perhaps this RNAi-based mechanism is also responsible for a reported case of inheritance of a behavioural imprinting in C. elegans [76]. In addition, mutations in genes coding for chromatin modifiers have been shown to have transgenerational effects on lifespan in C. elegans, possibly by altering the normal resetting of chromatin marks that takes place in the germline [77, 78].

Transgenerational effects have also been described in mammals. Examples include the epigenetic inheritance of coat colour in mice [79-81], parental imprinting of gene expression [82], maternal behaviours influencing gene expression in offspring [83] and inherited RNA-induced ‘paramutations’ [84]. Two recent studies have also demonstrated the potential for the transgenerational inheritance of environmental information in mammals. Ng et al. [85] showed that changes in paternal diet can influence the metabolic status of offspring in rats. Male rats fed a high-fat or normal diet were crossed with females on a control diet. Markedly, the female offspring of fathers fed a high-fat diet had impaired glucose-insulin homeostasis (Fig. 4). Expression profiling using microarrays showed that the β-cells of affected offspring had mild changes in the expression of hundreds of genes. For at least one gene, this change in expression was correlated with changes in the DNA methylation state of its promoter region [85]. In a similar study, Rando and colleagues showed that a paternal low-protein diet in mice caused an up-regulation of proliferation and lipid biosynthesis gene expression in the livers of the next generation. These changes in hepatic gene expression were also modest and associated with differences in promoter methylation [86]. However, as in rats, the actual mechanism of transgenerational inheritance remains unknown.

Figure 4.

Parental environment influences on the phenotype of offspring. Ng et al. [85] recently reported that female rats whose fathers were fed a high-fat diet (*) show altered glucose-insulin homeostasis compared to a control diet group.

In humans too, there is some epidemiological evidence for transgenerational environmental influences on phenotypic variation [87, 88]. How can environmental information be propagated through the germline to the next generation? Possibilities include patterns of histone modifications; even human sperm retain modified histones in many promoter regions [89, 90] and these are transmitted to the zygote [91]. In addition, DNA methylation, noncoding RNAs, proteins and metabolite levels could all transmit epigenetic information across generations. Understanding how this information is transmitted, as well as its consequences for phenotypic variation and evolutionary theory, are major questions for future research.


With advances in next-generation sequencing technologies, there is great excitement about the use of individual genome sequencing for the prediction of disease risk and in the development of personalized medicine. But what can we really expect from this? We have highlighted in the present review that studies in monozygotic twins and model organisms show that precise phenotypic predictions from personal genomes are unlikely. The genome of an individual is an important factor, although clearly not sufficient to explain variation in most phenotypic traits. A role for the environment in influencing phenotypic variation is widely appreciated, although we have emphasized three additional important influences on phenotypic variation that, although long recognized by quantitative geneticists [9], have only recently begun to be studied at the molecular level. First, biological processes are inherently stochastic and this random variation can be an important influence on the outcome of a particular mutation. Second, genetic variation carried by one generation can be an important influence on phenotypic variation in the next generation, especially for developmental processes. Third, there is now good evidence that the environment experienced by one generation can influence the phenotype of the next generation. Maternal and paternal genetic effects, as well as nongenetic inheritance, clearly demonstrate that parental influence on offspring goes beyond the inherited genes. New layers of detail will undoubtedly need to be added in forthcoming years. For example, humans can be considered as ‘superorganisms’ with an internal ecosystem of diverse symbiotic microbiota and parasites [92], which is highly variable among individuals [93], and the implications of this in health and disease are only beginning to be studied.

When it comes to human genetics, it is the individual that matters. A person does not want to know only the typical outcome of a mutation that she carries; she wants to know whether she will actually develop a disease or not. To be able to make predictions about the phenotypes of individuals, it is clear from both historical and recent work in model organisms that knowledge of genome sequences will be insufficient in many cases. Rather, we need to consider how genetic, environmental and stochastic variation, together with transgenerational effects, combine to determine the phenotypes of individuals.


Our research is funded by grants from the European Research Council, Ministerio de Ciencia e Innovación Plan Nacional BFU2008-00365 and BFU2011-26206, Agència de Gestió d'Ajuts Universitaris i de Recerca, ERASysBio+, the European Molecular Biology Organization Young Investigator Program, EU Framework 7 project 277899 4DCellFate, the EMBL/CRG Systems Biology Program and by a Formación de Personal Investigador–Ministerio de Ciencia e Innovación fellowship to A.B. We thank three anonymous reviewers for their helpful suggestions.