Hybrids between genetically diverse varieties display enhanced growth, and increased total biomass, stress resistance and grain yield. Gene expression and metabolic studies in maize, rice and other species suggest that protein metabolism plays a role in the growth differences between hybrids and inbreds. Single trait heterosis can be explained by the existing theories of dominance, overdominance and epistasis. General multigenic heterosis is observed in a wide variety of different species and is likely to share a common underlying biological mechanism. This review presents a model to explain differences in growth and yield caused by general multigenic heterosis. The model describes multigenic heterosis in terms of energy-use efficiency and faster cell cycle progression where hybrids have more efficient growth than inbreds because of differences in protein metabolism. The proposed model is consistent with the observed variation of gene expression in different pairs of inbred lines and hybrid offspring as well as growth differences in polyploids and aneuploids. It also suggests an approach to enhance yield gains in both hybrid and inbred crops via the creation of an appropriate computational analysis pipeline coupled to an efficient molecular breeding program.
Charles Darwin recognized the predominance of sexual reproduction in plants, and published descriptions of hybrid vegetables (including hybrid maize) in his 1876 book The effects of Cross and Self-fertilization in the Vegetable Kingdom (Darwin, 1876). Darwin wrote ‘There is weighty and abundant evidence that the flowers of most kinds of plants are constructed so as to be occasionally or habitually cross-fertilised by pollen from another flower, produced either by the same plant, or generally …by a distinct plant. Cross-fertilisation is sometimes ensured by the sexes being separated, and in a large number of cases by the pollen and stigma of the same flower being matured at different times … Cross-fertilisation is also ensured, in many cases, by mechanical contrivances of wonderful beauty, preventing the impregnation of the flowers by their own pollen … plants (can) present two or three distinct forms, adapted for reciprocal fertilisation, (that) can hardly fail to be intercrossed in each generation …there is a class, in which the ovules absolutely refuse to be fertilised by pollen from the same plant, but can be fertilised by pollen from any other individual of the same species. There are many species which are partially sterile with their own pollen … there is a large class in which the flowers present no apparent obstacle of any kind to self-fertilisation, nevertheless these plants are frequently intercrossed, owing to the prepotency of pollen from another individual or variety over the plant’s own pollen’ (Darwin, 1876). Sexual reproduction is clearly advantageous and under strong evolutionary selection. Animal behavioral studies and human cultural taboos suggest that most species have evolved mechanisms to avoid inbreeding (Pusey & Wolf, 1996) and inbreeding depression is essentially the opposite of heterosis or hybrid vigor. 
Heterosis is observed in a large number of different plant and animal species, and is a very basic conserved biological phenomenon, yet the molecular nature of heterosis remains poorly understood. Single gene or single trait heterosis is considered a distinct type of heterosis and accounts for many of the observations in the literature focused on specific traits, species or varieties. This review describes evidence suggesting a relatively simple molecular mechanism underlying general multigenic hybrid vigor shared across diverse species. The intent is to stimulate thought, generate challenges and promote further research toward understanding heterosis and sexual reproduction as well as stimulate development of novel approaches to molecular breeding. The simple working model is as follows: cells distinguish between parental alleles based on the relative stability of the encoded proteins, and use allele-specific gene expression to conserve energy and promote growth. Outcrossing provides more opportunity for allele selection and thereby increases the potential for enhanced vigor (Fig. 1).
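The working model above can be illustrated with a toy calculation. The sketch below is not taken from any study cited here: the stability scores, the inverse relationship between stability and turnover cost, and the function names are all assumptions chosen only to make the logic concrete. Each inbred parent carries one allele per gene, whereas the hybrid carries both and is assumed to express whichever encodes the more stable protein.

```python
import random


def turnover_cost(stability):
    """Hypothetical relative energy cost of maintaining one protein.

    Assumption: cost scales inversely with stability, because less stable
    proteins are degraded and resynthesized more often.
    """
    return 1.0 / stability


def simulate(n_genes=1000, seed=0):
    """Return total turnover cost for inbred A, inbred B and their hybrid."""
    rng = random.Random(seed)
    # Each inbred is homozygous: one stability score per gene (arbitrary units).
    parent_a = [rng.uniform(0.2, 1.0) for _ in range(n_genes)]
    parent_b = [rng.uniform(0.2, 1.0) for _ in range(n_genes)]
    # Inbreds must express the only allele they carry.
    cost_a = sum(turnover_cost(s) for s in parent_a)
    cost_b = sum(turnover_cost(s) for s in parent_b)
    # Under the working model the hybrid expresses, gene by gene,
    # the allele encoding the more stable protein.
    cost_hybrid = sum(
        turnover_cost(max(a, b)) for a, b in zip(parent_a, parent_b)
    )
    return cost_a, cost_b, cost_hybrid


if __name__ == "__main__":
    cost_a, cost_b, cost_hybrid = simulate()
    print(f"inbred A: {cost_a:.1f}  inbred B: {cost_b:.1f}  hybrid: {cost_hybrid:.1f}")
```

In this toy setup the hybrid total can never exceed that of either parent, and two identical parents give no gain at all, which mirrors the observation that heterosis generally increases with parental diversity.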
II. Early studies of heterosis
The yield impact of maize hybrid crosses was described in 1908 by George Harrison Shull at the Cold Spring Harbor Laboratory in New York (for details see Shull, 1946). Shull is most commonly credited with the discovery of heterosis in maize, having introduced the concept of creating inbred lines which, when crossed, generate high-yielding hybrids with consistent, synchronous growth and development. Maize hybrid vigor was originally designated the ‘stimulus of heterozygosity’ by Shull, and was rapidly confirmed by E. M. East at the Connecticut State College. These early studies triggered a search for the biological mechanism underlying maize hybrid vigor that continues today, over a century later. Shull and East published several research papers on hybrid vigor between 1908 and 1920, and ‘stimulus of heterozygosity’ was shortened to ‘heterosis’ by Shull in 1914. Both Shull and East believed that the level of hybrid vigor was directly related to the number of heterozygous ‘characters’ (see the review by Crabb (1947) for an early historical account of the research on hybrid vigor). It was clear from these early studies that inbreds suffer serious yield losses (typically 40–60% or more) relative to outcrossing parental varieties.
Hybrids made from low-yielding inbreds typically gain the yield losses back, and occasionally yield up to 120% of the original parents. Generally, the greater the genetic diversity of the parents, the higher the level of heterosis achieved, although very diverse parental lines do not always make the best hybrids (Moll et al., 1965). The theory that varying levels of hybrid vigor are caused by the number of heterozygous characters in the hybrids became known as the ‘over-dominance theory’ (Shull, 1948). Overdominance is often described as the result of combining high quality alleles, and can be thought of as having the advantage of enzymatic or regulatory activity over the sum of different activities under varying conditions. Other researchers believed that hybrid vigor resulted from the number of dominant growth-promoting elements rather than the number of heterozygous characters (Jones, 1925). This became known as the ‘dominance theory’, and is commonly thought of as complementation of recessive deleterious alleles. Many attempts were made to identify specific trait differences between hybrid offspring and their inbred parents, but the main difference consistently found was an enhanced growth rate in the hybrid (Castle, 1926). Heterosis in plants and animals was initially considered to be the same phenomenon, and it was believed that all tissues of a hybrid shared the growth advantage (Livesay, 1930). Hybrid mice were shown to display a fitness advantage over inbred mice: they reproduce more frequently and are more resistant to stresses under adverse environmental conditions (Barnett & Scott, 1963; Barnett, 1964). Ashby and colleagues examined various traits in maize such as embryo size, cell size, meristem size, nucleus size, photosynthetic efficiency and germination, and concluded that inbreds and hybrids had a small difference in embryo size (Ashby, 1930, 1932).
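The yield comparisons described above are commonly expressed in the breeding literature as mid-parent and better-parent heterosis. The sketch below uses those standard definitions, which are not spelled out in this review. The yield values are hypothetical, chosen only to fall within the ranges quoted above: inbreds at 40 and 60 relative to an outcrossing variety at 100, and a hybrid at 120.

```python
def mid_parent_heterosis(f1, parent1, parent2):
    """Percentage by which the hybrid exceeds the mean of its two parents."""
    mid = (parent1 + parent2) / 2
    return 100 * (f1 - mid) / mid


def better_parent_heterosis(f1, parent1, parent2):
    """Percentage by which the hybrid exceeds the higher-yielding parent."""
    best = max(parent1, parent2)
    return 100 * (f1 - best) / best


if __name__ == "__main__":
    # Hypothetical relative yields (outcrossing variety = 100).
    inbred_1, inbred_2, hybrid = 40, 60, 120
    print(mid_parent_heterosis(hybrid, inbred_1, inbred_2))     # 140.0
    print(better_parent_heterosis(hybrid, inbred_1, inbred_2))  # 100.0
```

Both indices are relative to the inbred parents, so a hybrid that merely recovers the yield of the original outcrossing variety already shows substantial heterosis by either measure.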
The early research found no consistent differences in observed traits that could explain hybrid vigor and suggested that general multigenic heterosis is a cell-based, evolutionarily conserved basic biological phenomenon that displays enhanced growth rates (cell division) without an obvious impact on the overall developmental program. Although many more recent studies on specific genes or specific traits suggest that heterosis is more complex than any single model can explain, there remains the possibility of an underlying biochemical difference that helps explain heterosis across species.
Some early studies suggested that single gene differences might underpin hybrid vigor, but Donald F. Jones from the Connecticut Experimental Research Station, and others, concluded that the results of the first several decades of heterosis research were most consistent with the involvement of multiple genes (for an excellent personal description of the early history see Singleton, 1941). Specific traits controlled by a small number of genes can display heterosis in specific inbred crosses, but do not explain the entire spectrum of growth and stress-resistance characteristics of hybrids in all species. For example, recent studies in tomato demonstrate that the Single Flower Truss gene drives heterosis for yield (Krieger et al., 2010), but a flowering regulatory gene/pathway is unlikely to explain heterosis in diverse species such as mollusks and mammals. Similarly, single gene or single trait heterosis in maize is predicted to be linked to the complex genetics caused by genome duplication and subsequent fractionation events (Schnable et al., 2009). The dominance and over-dominance theories remain the best models to explain heterosis, especially single gene or single trait heterosis, and are not mutually exclusive. Epistasis, the interaction between different genes, was later added as a third theory to explain hybrid vigor (Powers, 1944). Most recent results using genomic-scale approaches suggest that all three models of heterosis are in operation but are not sufficient to explain all the experimental observations.
1. It is reasonable to conclude that multiple plausible explanations exist for heterosis of specific traits, but what is the explanation for growth differences?
Inbreeding is detrimental to growth and vigor, creating less overall biomass as well as higher susceptibility to disease and environmental stress. Any model for hybrid vigor should explain the decreased growth and vigor of inbred plants, animals and even the decreased stature of inbred humans. Similarly, a viable model for multigenic hybrid vigor needs to explain why the highest heterosis is generally observed when the parental varieties are genetically diverse. The successful model also needs to explain why progressive polyploids display higher levels of heterosis than inbreds or diploid hybrids with similar genetics (Auger et al., 2005; Birchler et al., 2005, 2006; Riddle & Birchler, 2008; Riddle et al., 2010), and why haploid plants and plants carrying extra unpaired chromosomes (aneuploids) display lower heterosis than diploids (Birchler et al., 2007). These observations are not easily explained by the existing theories of dominance, overdominance or epistasis. The successful model will be species-independent, cell-based, evolutionarily conserved and explain why the underlying biological mechanism of hybrid vigor has been so elusive.
Commercial interest in exploiting maize heterosis resulted in the founding of Pioneer Hi-Bred by Henry A. Wallace in 1926. The very high yield of hybrid maize (Zea mays) and the consistency in first generation growth and maturation provide an excellent incentive for farmers to purchase hybrid maize seed each and every year. Commercial maize hybrids were introduced in the 1930s and 1940s and yield gains moved from nearly zero to c. 2% annually. The overwhelming majority of maize seed sold in the USA and developed countries today is hybrid seed generated by commercial seed companies. Very strong interest remains in determining the molecular mechanism of heterosis because that understanding could create more efficient routes to accelerate yield gains in maize and possibly other crop species. A molecular breeding program driven by an understanding of yield biology could result in savings of tens of millions of dollars (out of hundreds of millions spent) in empirically driven field trials carried out each year. Understanding multigenic heterosis can be viewed as a route toward understanding the biology of yield, the most important trait for farmers selling commodity crops. From studies of heterosis in Pioneer Hi-Bred’s commercial maize varieties introduced over time between the 1950s and 1990s, Don Duvick demonstrated that hybrids and their inbred parental lines were both increasing in yield essentially in a parallel fashion (Duvick, 1992, 1999). This suggests that improvements in the inbred parental breeding varieties are leading to better commercial hybrids. Over the past few decades, annual yield gains appear to be slowly leveling off in conventional breeding programs (i.e. not counting the incremental yield gains created by transgenes). The leveling of yield gains places increasing importance on understanding the biology of yield and developing knowledge-based molecular breeding technology.
Heterosis in other species is also of strong academic and commercial interest and will benefit from an efficient molecular breeding analysis pipeline. Ultra high throughput DNA sequencing is also providing breeders with new molecular tools.
III. Heterosis in diverse species
The vigor associated with hybrid animals was recognized thousands of years ago as shown by the breeding of male donkeys and female horses to create the sterile, but vigorous hybrid mule. F1 hybrid mice have also been shown to be more vigorous, reach maximum growth rates earlier and display higher growth overall than inbreds (Laird & Howard, 1967). Shellfish display high levels of heterosis and have become good model systems for research on growth, metabolism and hybrid vigor. Before genomic-scale studies were feasible, a positive correlation between heterozygosity and growth rates was described in a variety of shellfish species (Zouros, 1976, 1987; Zouros et al., 1980; Koehn & Gaffney, 1984; Mitton & Grant, 1984; Zouros & Foltz, 1987). For example, blue mussels (Mytilus edulis) collected from shore and grown at high-density, stressful conditions display a correlation between increased growth rate and enzyme heterozygosity (Gentili & Beaumont, 1988). Similarly, coot clams (Mulinia lateralis) grown under temperature and salinity stress display a correlation between growth rates and enzyme heterozygosity (Scott & Koehn, 1990). Another study of five pair-crosses of Louisiana oysters (Crassostrea virginica) reported that heterozygosity in eight out of nine enzymes did not correlate with growth rates of 1387 oysters studied (Foltz & Chatry, 1986), but this study examined less than one-tenth of one percent of oyster genes. They concluded that enzyme heterosis of all electrophoretically variable allozymes is not directly responsible for the increased growth rate of hybrid oysters. These findings are consistent with results from maize research, where it has been difficult to establish a direct connection between multigenic heterosis and a specific biochemical pathway.
A number of studies in mussels and clams linked heterosis and growth rates to increased metabolic efficiency (Koehn & Shumway, 1982; Hawkins et al., 1986; Toro et al., 1996; Bayne & Hawkins, 1997). These studies measured growth as a function of oxygen utilization and demonstrated that hybrids grow more efficiently. Although there are exceptions to this general finding (Garton & Haag, 1991), most studies report that hybrids use less metabolic energy per unit growth. Heterosis in oysters has also been correlated with efficient oxygen consumption and growth (Koehn & Shumway, 1982). Enhanced growth of hybrid oysters has since been correlated with decreased rates of protein metabolism (Hawkins et al., 1986; Bayne & Hawkins, 1997). Whole-body protein turnover is believed to be the metabolically costly difference between slowly growing inbred and rapidly growing hybrid species such as mussels and clams (Bayne & Hawkins, 1997). This is also illustrated by species that survive anoxic intervals by reducing protein synthesis and degradation, ion transport across membranes, urea synthesis and gluconeogenesis as these processes account for the majority of basal metabolic needs (Buck & Hochachka, 1993; Land et al., 1993; Land & Hochachka, 1994; Hochachka et al., 1996). Examples of such species are lungfish, frogs, toads, aquatic turtles (for a review see Hochachka & Somero, 2002) and the terrestrial snail Otala lactea (Ramnanan et al., 2009). These species reduce their metabolism as much as 10-fold to conserve energy in low-oxygen environments.
The amount of metabolic energy required to synthesize, fold, degrade, and resynthesize proteins is significant. Although this has not been studied extensively in plants, it is estimated that 17–60% of basal metabolism is dedicated to protein deposition, which includes protein synthesis and turnover (summarized in Quigg & Beardall, 2003). Gene expression profiling in shellfish and various plant species establishes a further experimental association between decreased rates of protein metabolism and heterosis (see Section V).
IV. Gene expression studies
A number of gene expression studies of hybrids vs their inbred parents have been carried out on large sets of predicted genes from both maize and rice (Kollipara et al., 2002; Guo et al., 2003, 2004, 2006; Auger et al., 2005; Bao et al., 2005; Huang et al., 2006a,b; Swanson-Wagner et al., 2006; Meyer et al., 2007; Song et al., 2007; Springer & Stupar, 2007; Uzarowska et al., 2007; Hoecker et al., 2008a,b; Stupar et al., 2008; Zhang et al., 2008; Wei et al., 2009; Frisch et al., 2010; Jahnke et al., 2010; Riddle et al., 2010). These studies were undertaken in an attempt to identify specific genes or pathways responsible for hybrid vigor in these important crops. Maize studies identified hundreds of genes with altered expression levels between the inbred parental lines and the hybrid offspring. The gene expression changes reported in hybrids vs inbreds vary, with some studies reporting a high percentage of additive gene expression changes (Li et al., 2009) and others reporting a high percentage of non-additive changes (Stupar et al., 2007, 2008). It is unclear what causes these different findings. An important and curious observation is that most gene expression changes are in different sets of genes when comparing one inbred and the hybrid vs an unrelated inbred and the hybrid. In addition, the majority of gene expression changes observed are not associated with any specific biochemical pathway, but appear to be randomly dispersed among pathways and functions. Therefore, gene expression analysis has not implicated a specific biochemical pathway responsible for hybrid vigor. However, one frequently observed change in gene expression in these studies is a decrease of protein metabolism genes in the hybrid relative to the inbred parental lines. For example, in studies of hybrid rice using serial analysis of gene expression (SAGE), Bao et al. (2005) state ‘Most of the downregulated genes in the hybrid were found related to protein processing (maturation and degradation)’. Examples of genes downregulated in hybrids are UBC2, a ubiquitin-conjugating enzyme for unfolded proteins, PPIase, a rate-limiting step in protein folding and UGGT, an endoplasmic reticulum enzyme that recognizes unfolded proteins. These findings suggest that the gene expression changes in inbreds vs hybrids may result from the proteins encoded by the differentially expressed genes rather than by the pathways that these proteins function in. This would explain why different sets of genes change expression when comparing specific unrelated inbreds with the hybrid they create. It would also explain why specific pathways other than protein metabolism are not identified. It is noteworthy that the protein metabolism genes responsible for protein folding, refolding and degradation are observed to decrease in the hybrids that are growing more rapidly (i.e. making more protein). This is consistent with the shellfish observations that growth rates are higher in hybrids displaying lower rates of protein metabolism.
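The distinction between additive and non-additive expression referred to above can be made concrete with a minimal sketch. It assumes the usual working definition that additive expression means the hybrid level is close to the mean of the two parents. The function name, the category labels and the 10% tolerance are illustrative assumptions; the cited studies rely on replicated measurements and statistical tests rather than a fixed cut-off.

```python
def classify_expression(parent1, parent2, hybrid, tolerance=0.1):
    """Classify hybrid expression relative to the two inbred parents.

    tolerance is a hypothetical cut-off: the hybrid is called additive if
    it lies within this fraction of the mid-parent value.
    """
    mid = (parent1 + parent2) / 2
    low, high = min(parent1, parent2), max(parent1, parent2)
    if abs(hybrid - mid) <= tolerance * mid:
        return "additive"
    if hybrid > high:
        return "above high parent"
    if hybrid < low:
        return "below low parent"
    return "non-additive, within parental range"


if __name__ == "__main__":
    # Hypothetical expression levels (arbitrary units).
    for levels in [(10, 20, 15), (10, 20, 19), (10, 20, 30), (10, 20, 5)]:
        print(levels, classify_expression(*levels))
```

Because the outcome depends on the chosen threshold and on measurement noise, differences in such choices may be one contributor to the divergent proportions of additive and non-additive genes reported between studies, although that is a conjecture rather than a conclusion drawn in the cited work.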
V. Protein metabolism
In 1938, Rudolph Schoenheimer demonstrated that dietary amino acids were incorporated into an animal’s tissue proteins, and that these tissue proteins were constantly being broken down and resynthesized (Schoenheimer & Rittenberg, 1938; Schoenheimer & Clarke, 1942). Unlike many cellular components, protein turnover is relatively rapid, with average half-lives of c. 30 h in eukaryotic cells (Goldberg & Dice, 1974; Goldberg & St John, 1976). Regulated protein metabolism is now known to be involved in a great many cellular responses ranging from cell division to developmental progression, responses to light and the environment, and programmed cell death (Buchler et al., 2005; Ciechanover, 2007; Varshavsky, 2008). A significant percentage of newly synthesized proteins are also known to be rapidly degraded, and are believed to represent a class of unstable proteins that did not fold properly (Schubert et al., 2000; Benaroudj et al., 2001; Goldberg, 2003). Intracellular protein metabolism is autoregulated by the amount of protein substrate entering the degradation pathway. When unfolded proteins are generated within a cell, the genes encoding protein-refolding enzymes and protein degradation pathways are increased by transcriptional responses to an appropriate readjusted level. This transcriptional response is controlled by short-lived positive regulators (sigma factors in prokaryotes and transcription factors in eukaryotes) that are stabilized or activated by the presence of the unfolded protein substrates (Bahl et al., 1987; Straus et al., 1987, 1989, 1990; Yura et al., 1990; Kitagawa et al., 1991; Herman et al., 1995; Mathew et al., 1998; Zhao et al., 2005). Small molecule proteasome inhibitors induce the genes encoding protein degradation and refolding activities, and conditions that generate unfolded proteins lead to this cytoplasmic ‘unfolded protein response’ demonstrating that protein metabolism is regulated at the level of proteolysis. 
Proteins with defects in folding or proteins damaged post-translationally are rapidly degraded (Goff et al., 1984; Goff & Goldberg, 1985; Goldberg et al., 1987; Jubete et al., 1996). Unfolded proteins can result from damage caused by heat shock, heavy metal exposure, oxidation, incorporation of amino acid analogs, premature translation termination and a number of other stressful conditions. Even single unfolded protein molecules are rapidly degraded and can trigger increased transcription of protein metabolism genes (Goff & Goldberg, 1985). The amounts of available ubiquitin are indicative of cytoplasmic levels of unfolded protein with increases in the latter causing signals to be sent to the nucleus to lower rates of overall transcription and growth (Dennis et al., 1999; Dantuma et al., 2006; Groothuis et al., 2006; Foster & Fingar, 2010). This response may have evolved to limit synthesis of new proteins when a significant amount of protein is in the queue for refolding or degradation.
Although the average half-life of proteins in eukaryotic cells is c. 30 h (Goldberg & Dice, 1974; Goldberg & St John, 1976), variation in protein half-lives ranges from seconds for short-lived regulatory proteins to several weeks for long-lived structural proteins. Similarly, mRNAs have a wide range of half-lives (3–90 min in yeast) with long half-life transcripts involved in central metabolic functions and short half-life transcripts involved in regulatory functions (Wang et al., 2002). Many transcription factors and regulatory proteins serving transient functions are examples of short-lived regulatory proteins with short-lived mRNAs. The drought response element binding proteins (DREB2a and DREB2b) are good examples of plant short-lived regulatory proteins that control drought-response genes under stress conditions. Rubisco is an abundant protein with a relatively long half-life of 7–8 d (Hirel & Gallais, 2006), although the subunits of Rubisco are unstable until assembled into multisubunit complexes (Liu et al., 2010), reflecting a property of the degradation system that maintains stoichiometric balance between subunits in multimeric complexes. Some plant signal transduction proteins regulated by hormones or other conditions also have short half-lives (Dreher et al., 2006; Dreher & Callis, 2007).
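To give a sense of scale for the half-lives quoted above, the sketch below assumes simple first-order (exponential) decay, which is a modelling assumption rather than something established in this review. It compares a protein with the average half-life of about 30 h with one lasting 7.5 d (180 h), the midpoint of the range quoted for Rubisco. The pool size of 100 units is arbitrary.

```python
import math


def fraction_remaining(hours, half_life_hours):
    """Fraction of an initial protein pool left after `hours` of decay."""
    return 0.5 ** (hours / half_life_hours)


def degradation_rate(half_life_hours):
    """First-order rate constant (per hour) for a given half-life."""
    return math.log(2) / half_life_hours


def synthesis_to_hold_steady(pool_size, half_life_hours):
    """Synthesis rate (units per hour) needed to keep a pool constant."""
    return pool_size * degradation_rate(half_life_hours)


if __name__ == "__main__":
    for name, half_life in [("average protein", 30.0), ("long-lived protein", 180.0)]:
        left = fraction_remaining(24.0, half_life)
        supply = synthesis_to_hold_steady(100.0, half_life)
        print(f"{name}: {left:.2f} left after 24 h, {supply:.2f} units/h to hold steady")
```

Under these assumptions the resynthesis needed to maintain a pool is inversely proportional to half-life, so replacing an unstable protein with a more stable variant reduces the ongoing cost of keeping the same amount of protein in the cell.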
Gene expression studies on hybrid vs inbred oysters revealed lower expression of protein metabolism genes analogous to the results reported in plants (Hedgecock et al., 2007; Meyer & Manahan, 2010). These observations in oysters are consistent with metabolic labeling studies demonstrating lower oxygen consumption per unit protein deposition (growth) in hybrid shellfish (Garton et al., 1984; Hawkins et al., 1986; Bayne & Hawkins, 1997; Bayne, 2000). Furthermore, differences in metabolic efficiency are reported to be significant enough to explain the majority of the growth rate difference between inbred and hybrid clams. When differences in the energy used for feeding are accounted for, almost all the growth rate difference between inbred and hybrid oysters can be explained by the efficiency of protein deposition (synthesis and metabolism) (Garton et al., 1984). The metabolic efficiency evidence and protein metabolism differences between hybrids and inbreds suggest that they differ in either the protein substrates being degraded or in the protein degradation machinery itself.
What proteins are being degraded in the inbred parental lines and why are they absent or not degraded in the hybrid lines?
VI. Allele-specific gene expression
There is now considerable evidence that a significant percentage (1–5% in several reports) of plant and animal genes are expressed in either an allele-specific fashion or even mono-allelically. Allele-specific expression (ASE) has been documented in humans (Okamoto et al., 1994; Gimelbrant et al., 2007), animal model systems (Cowles et al., 2002), yeast (Ronald et al., 2005), hybrid rice, and hybrid maize (Springer & Stupar, 2007; Guo et al., 2008; Stupar et al., 2008). Genes displaying ASE are present throughout the autosomes and not simply restricted to known regions of imprinting such as genes on the X chromosome, antibody genes, olfactory receptor genes or other epigenetically down-regulated chromosomal domains. Allele-specific expression is independent of which parent the specific allele originated from, but it is inherited and disease susceptibility genes in humans display allele-specific expression. Heterozygous carriers of several disease susceptibility genes express the wild-type allele but have undetectable levels of the disease susceptibility allele (Okamoto et al., 1994; Yan et al., 2002b; Parker-Katiraee et al., 2008; Voutsinas et al., 2010). It is estimated that most individuals are carriers of a few deleterious genes yet do not display disease symptoms (Morton et al., 1956), and relatively small changes in gene expression can have a dramatic impact on susceptibility to disease (Yan et al., 2002a). For example, canine cyclic neutropenia, an autosomal recessive stem cell disease affecting white blood cells, is caused by a frameshift mutation in the AP3B1 gene encoding a clathrin assembly complex protein (Benson et al., 2004). Heterozygous carriers of the recessive frameshift mutation do not display disease symptoms, and the mutant transcript is also not detectable. Only the wild-type allele is expressed, while the mutant allele encoding a defective frameshift gene product is reported to be rapidly degraded (Yan et al., 2002a). 
In homozygous dogs the mutant alleles are expressed and the dogs display disease symptoms. Curiously, the disease symptoms are not as severe as expected from the frameshift mutation, and transcripts the size of wild-type are also present. These wild-type sized transcripts are generated by an RNA polymerase slippage error at a polyA track creating transcripts with a restored open reading frame. The corrected transcripts accumulate to 5–10% of the wild-type transcript and partly relieve the disease symptoms. RNA polymerase does generate errors, particularly at polyA tracks where slippage is possible, but the percentage seen in several cases analogous to this canine disease is in excess of what would be expected from slippage errors alone. A plausible explanation would be that RNA polymerase slippage creates a more stable message encoding a relatively stable protein and this mRNA accumulates to a higher percentage of total transcript because the mutant transcript is rapidly degraded.
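The reasoning in the preceding paragraph can be checked with a simple steady-state calculation. The sketch assumes that the steady-state abundance of a transcript is proportional to its synthesis rate multiplied by its half-life. The slippage rate and the stability ratio used here are hypothetical values for illustration only; the source gives no measured rates.

```python
def corrected_fraction(slippage_rate, stability_ratio):
    """Steady-state share of frame-restored transcripts.

    slippage_rate: fraction of transcription events that restore the
        reading frame (hypothetical value).
    stability_ratio: half-life of the corrected mRNA divided by the
        half-life of the mutant mRNA (hypothetical value).
    """
    # Abundance is taken as synthesis rate times relative half-life.
    corrected = slippage_rate * stability_ratio
    mutant = (1.0 - slippage_rate) * 1.0
    return corrected / (corrected + mutant)


if __name__ == "__main__":
    # Equal stability: the corrected share simply equals the error rate.
    print(f"{corrected_fraction(0.01, 1):.3f}")
    # A tenfold more stable corrected message at the same error rate.
    print(f"{corrected_fraction(0.01, 10):.3f}")
```

With equal stability a 1% slippage rate yields 1% corrected transcript, whereas a tenfold stability advantage raises the share to roughly 9%. Percentages of the order reported in the text could therefore arise from rare errors, provided the mutant message is degraded much faster than the corrected one.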
Similar observations have been reported from patients with Hemophilia A, where a frameshift mutation in the factor VIII gene causes the disease (Young et al., 1997). Heterozygous carriers of the mutation do not show disease symptoms or expression of the mutant allele. Homozygous diseased individuals express the mutant alleles but have less severe phenotypes than expected from the severity of the mutation. Again, the frameshift creates a polyA track that RNA polymerase slippage errors can partly restore. Approximately 7% of clones derived from mRNA in homozygous patients have a restored reading frame and encode a functional factor VIII protein (Young et al., 1997). Allele-specific expression at varying levels is also observed in patients with a dominant-negative point mutation in the PIT1 gene controlling pituitary gene expression (Okamoto et al., 1994). Some heterozygous carriers of the dominant Arg271Trp allele show no signs of disease and have no detectable mutant transcript. Disease-free family members display no detectable mutant transcript suggesting that the mutant transcript is down-regulated or rapidly degraded. As this is a point mutation encoding a non-synonymous amino acid substitution, it further suggests that a destabilized protein product encoded by the gene is involved in the downregulation or rapid degradation of the transcript. Family members with the dominant disease have a small amount of transcript relative to the wild-type gene. Heterozygous carriers with higher, yet not wild-type levels of expression of the mutation suffer from severe disease symptoms. Humans with a cytosine deletion in the apolipoprotein B gene (apoB) or compound heterozygotes with two mutant alleles display reading frame restoration and production of wild-type size proteins from the frame-shifted mutant alleles (Linton et al., 1992, 1997). These wild-type size proteins are believed to originate from mRNAs making up 11% of the transcripts from the mutant genes. 
Once again, the most reasonable explanation for the high percentage of corrected transcript is that rare errors made by RNA polymerase create a stabilized transcript encoding the in-frame protein relative to the out-of-frame mutant. A similar observation has been made in humans with a frameshift in the carbonic anhydrase II gene (Hu et al., 1995).
Together, these studies demonstrate that at least some allele-specific gene expression is caused by downregulation or rapid degradation of transcripts encoding defective proteins. Such a model is consistent with observations of ASE in many disease carriers and the fact that many known disease genes encode defective proteins owing to nonsynonymous amino acid substitutions (Yue et al., 2005). These findings link stability of the proteins to the expression or stability of the transcript and are consistent with known quality control mechanisms such as nonsense mediated decay (NMD) operating in the nucleus.
Although synthesis of proteins in the nucleus goes against the central dogma of molecular biology, considerable evidence exists that a small fraction of translation does indeed occur in the nucleus and is linked to quality control processes (Hentze, 2001; Pederson, 2001). This ‘pioneer’ round of translation in the nucleus accounts for c. 10–15% of total translation, and is consistent with historical observations of nuclear ribosomes, tRNAs, and nascent proteins. The evolution of prokaryotes to eukaryotes apparently did not eliminate coupled transcription and translation, but added a nuclear quality control scanning mechanism before mRNA maturation and export from the nucleus. Examples of quality control mechanisms outside the nucleus are the ‘unfolded protein response of the endoplasmic reticulum’ (UPR ER) in eukaryotic cells (Kozutsumi et al., 1988; Sidrauski et al., 1998; Hampton, 2000; Jonikas et al., 2009) and the ‘unfolded protein response of the periplasmic space’ in prokaryotes (Hasenbein et al., 2010). In the UPR ER, proteins being synthesized on rough ER and translocated across the membrane into the cisternae of the ER are scanned for proper folding. If the nascent polypeptide is not folding properly, a signal is sent across the membrane to the cytoplasmic side that activates an RNase and degrades the mRNA for that specific unfolded protein. In addition to eliminating specific unfolded proteins, genes encoding proteins required for proper folding in the ER are induced by the presence of unfolded proteins (Mori et al., 1996). Therefore, signaling mechanisms exist to monitor the level of unfolded proteins in the ER and respond to the level of unfolded proteins appropriately.
The ability of cells to recognize and degrade unfolded proteins in the cytoplasm has been known for several decades (Goldberg & Dice, 1974; Goldberg & St John, 1976), and the sophistication of these quality control mechanisms reflects the importance of eliminating unfolded proteins and preventing them from being synthesized in large amounts.
If a nuclear scanning quality control process stabilizes/destabilizes mRNAs based on protein folding and stability, processing of transcripts and proteins should be colocalized. Studies on the localization of transcription and processing of transcripts have identified specific nuclear organelles called Cajal bodies (also known as ‘coiled bodies’) associated with growth and transcriptional activity (Cioce & Lamond, 2005). Cajal bodies are tethered to specific regions in the nucleus, probably via direct interactions with chromatin (Platani et al., 2002). The Cajal bodies disappear when cells enter mitosis, are more numerous in rapidly growing cells and are known to be the sites of RNA processing (Cioce & Lamond, 2005). Various functional activities have been associated with Cajal bodies, including processing of mRNAs and maturation of ribonucleoproteins (Gall, 2000, 2001; Frey & Matera, 2001; Dundr et al., 2004; Espert et al., 2006; Cristofari et al., 2007). Short half-life proteins have also been shown to co-localize in Cajal bodies following inhibition of protein metabolism or heat shock (Handwerger et al., 2002). Such colocalization studies are consistent with transcription, processing of mRNAs and degradation of short half-life proteins occurring in the same nuclear subcompartment.
The C-terminal domain (CTD) of RNA polymerase II (or the plant-specific RNA polymerase IV) is believed to play an organizing role in the various steps between gene transcription, mRNA processing, protein and RNA stabilization or degradation and epigenetic regulation of genes (Iborra et al., 2004). The CTD has been shown to interact with a number of proteins that carry out these different activities in a staged fashion (Iborra et al., 2001). RNA Pol II and Pol IV have also been shown to interact with Argonaute via the CTD, establishing a link between transcription and epigenetic regulation via RNAi (Schramke et al., 2005; Till & Ladurner, 2007). It appears that the nuclear scanning quality control mechanism allows cells to discriminate between alleles based on the quality of the protein products made by those alleles. Homozygous alleles in the inbred are expressed regardless of the stability of the encoded protein. Although it remains unclear how a cell senses that both alleles are identical, a number of findings suggest that gene regulation is influenced by sister chromatid cohesion (Dorsett, 2007; Cipak et al., 2008; Wendt et al., 2008; Wendt & Peters, 2009; Wood et al., 2010). In summary, it appears that a great many cellular processes work in concert to produce high-quality mRNAs and proteins, to coordinate nuclear with cytoplasmic activities and to conserve the energy required to support growth (Keene, 2007; Komili & Silver, 2008).
VIII. A synthesis model
A relatively simple working model of multigenic heterosis, growth and yield emerges from the various studies described above. This model postulates that a cell-based quality control mechanism detects and downregulates alleles encoding unstable proteins in hybrids. Cells downregulate alleles based on the relative stability of the encoded proteins, and the hybrid growth advantage results from conserved energy and more rapid cell division (Fig. 2). Based on ASE, it is estimated that c. 1–5% of genes have alleles encoding unstable proteins. This model is consistent with the three existing heterosis theories of dominance, overdominance and epistasis, and serves to explain a number of other findings that are difficult to explain with the existing theories. In the case of dominance, weak alleles encoding unstable proteins are complemented by alleles encoding more stable proteins, allowing the weak allele to be downregulated. Hybrids therefore express more dominant growth-promoting ‘characters’ relative to inbred parental lines. In the case of overdominance, two different alleles encoding stable proteins under different environmental conditions are brought together and provide the hybrid with higher activity and stability of the protein function over a larger variation in environmental conditions. The vigor experienced by hybrids would be proportional to the number of different high-quality alleles brought together.
1. What other observations does such a model explain?
Progressive polyploids would be expected to display higher levels of heterosis than diploid hybrids under this model simply because they would have a larger number of alleles to choose between, and therefore the most stable proteins or protein combinations could be generated. Higher heterosis in polyploids has been described (Birchler et al., 2003, 2005, 2006; Auger et al., 2005; Riddle & Birchler, 2008; Riddle et al., 2010), but aneuploids appear to display lower levels of heterosis and this is not easily explained by existing theories. The answer could again be a consequence of increased protein metabolism in the aneuploids. Studies of yeast carrying extra, unpaired chromosomal regions demonstrate that expressed genes from unpaired DNA cause decreased fitness (growth rates). There is no consequence if the extra, unpaired DNA is noncoding, as is the case with yeast carrying an artificial chromosome with human DNA that is not expressed. Detailed studies revealed that the unpaired regions of coding DNA create proteins that are degraded because they are no longer in stoichiometric balance with the products of genes encoded in other regions of the genome (Torres et al., 2007). It has been known for several decades that subunits of multimeric proteins made in excess are recognized and degraded when unassembled with their holoenzyme complexes, analogous to the case of the unassembled Rubisco subunits described earlier. Therefore, it is possible that the decreased heterosis displayed by aneuploid plants is a consequence of the increased degradation of proteins encoded by the unpaired regions of the genome.
Newly formed polyploids or alloploids are known to undergo a period of instability in both phenotypes displayed and gene expression profiles. Approximately 0.4% of genes in newly formed Arabidopsis allotetraploids are reported to be silenced (Comai et al., 2000). Gene silencing and activation are also observed in newly synthesized wheat allotetraploids (Kashkush et al., 2002). If the proposed quality control mechanism functions in a dynamic fashion, one would expect to see different choices of alleles made under different environmental conditions and in newly formed genome combinations. For example, when an organism experiences heat shock and proteins are denatured, the model would predict a transient instability of gene expression and epigenetic regulation, which has been reported (Madlung & Comai, 2004). Similarly, when a completely new set of chromosomes is introduced via creation of a synthetic hybrid, genomic instability and variability in phenotype and gene expression would also be predicted. McClintock and others have described such instabilities as ‘genome shock’ (McClintock, 1983). Does this period of genome shock represent the time needed for the newly introduced genome to be compared with the existing genome and for allele-specific expression of the alleles encoding the most stable proteins to be re-established? Does protein stability play a role in gene fractionation events? These are questions awaiting further research.
IX. Putting the model to work
How can this model be tested and used to enhance crop yield? Direct experimental evidence for this model can be generated by previously established wet-laboratory techniques. A prediction of the model is that unique unstable proteins will be present in the two inbred parental lines and not in the hybrid created from these lines. Unstable proteins targeted for degradation are typically conjugated with multiple ubiquitins at lysine residues, then recognized by the proteasome and degraded into peptides. The peptides are then rapidly degraded to amino acids before amino acid recycling or further degradation to amino acid breakdown products. Hybrids should display a lower metabolic profile of recycled or degraded amino acids than inbreds. To identify specific unstable proteins, it is possible to enrich for poly-ubiquitinated proteins from plant extracts, digest with protease or otherwise fragment the enriched proteins, and identify the peptides by standard proteomics technologies. The genes encoding these unstable proteins can then be identified computationally. Allele-specific gene expression profiling would complement these efforts because specific alleles encoding unstable proteins should be downregulated in the hybrid relative to the parental inbred lines. To accelerate yield enhancement over generations, identification and elimination of highly expressed alleles encoding unstable proteins could be used to drive a molecular breeding program. One prediction of the model is that inbred lines used to generate commercial hybrid maize seed varieties over the past several decades would have progressively fewer highly expressed unstable proteins. These defective alleles would have been selectively eliminated by common empirical breeding approaches. Expression analysis and allele sequencing of inbreds and hybrids introduced over decades of seed product development would address this prediction and add supportive evidence to the theory.
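The comparison proposed above can be sketched computationally. The following Python example is only an illustration under stated assumptions: the protein identifiers, the read counts and the 0.25 expression-fraction cutoff are all hypothetical, and the function names are introduced here for illustration rather than taken from any published pipeline. It intersects the two predictions of the model: a candidate protein should appear in the poly-ubiquitin-enriched fraction of a parent but not of the hybrid, and its allele should contribute a small share of the transcripts in the hybrid.

```python
def parental_only_proteins(parent_a, parent_b, hybrid):
    """Proteins detected in the poly-ubiquitin-enriched fraction of
    either inbred parent but absent from that fraction in the hybrid."""
    return (set(parent_a) | set(parent_b)) - set(hybrid)


def allele_fraction(allele_reads, other_reads):
    """Share of hybrid transcripts that come from the allele of interest."""
    total = allele_reads + other_reads
    if total == 0:
        raise ValueError("no reads for this gene")
    return allele_reads / total


def candidate_unstable(parent_a, parent_b, hybrid, ase_reads, max_fraction=0.25):
    """Parental-only proteins whose allele is also downregulated in the hybrid.

    ase_reads maps a protein id to (reads from the allele encoding it,
    reads from the alternative allele) in the hybrid. The max_fraction
    cutoff is an arbitrary, hypothetical threshold.
    """
    hits = []
    for protein in parental_only_proteins(parent_a, parent_b, hybrid):
        if protein not in ase_reads:
            continue  # no expression data, so the prediction cannot be checked
        if allele_fraction(*ase_reads[protein]) <= max_fraction:
            hits.append(protein)
    return sorted(hits)


# Hypothetical data: ids and counts are invented for illustration only.
parent_a = {"P1", "P2", "P3"}
parent_b = {"P3", "P4"}
hybrid = {"P3"}
ase_reads = {"P1": (10, 90), "P2": (50, 50), "P4": (5, 45)}

print(candidate_unstable(parent_a, parent_b, hybrid, ase_reads))
```

In this toy data set P3 is excluded because it is still ubiquitinated in the hybrid, and P2 is excluded because its allele is expressed at parity, so neither matches both predictions.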
How can this model be used to accelerate yield gains by computational approaches? Breeding of field crops, including maize, is done empirically, requiring many years of expensive field trials to generate a higher-yielding seed variety. It is feasible today to identify alleles encoding unstable proteins by a computational approach and use molecular breeding to eliminate highly expressed alleles encoding unstable proteins. There are currently approx. 65 k protein structures in the protein structure database and many software programs available to analyse protein stability based on primary amino acid sequence, conserved domains, or three-dimensional structure (Adzhubei et al., 2010). Ultra-high-throughput DNA sequencing is now available to generate the sequence of genomes or transcriptomes of plant varieties. Therefore, it is feasible to create an analysis pipeline comparing the stability of all known alleles (or gene family members) relative to each other, and rank their stability values. Together with the cost to synthesize the protein based on the amino acid composition and the relative abundance (transcription level), it should be possible to prioritize alleles and design an ideal genotype. Assuming 30 k genes with 10 alleles per gene, the number of possible combinations is approx. 6 × 10^44. Assuming that 5000 genes are expressed in the major tissues and 5% are alleles encoding unstable proteins, it would be necessary to replace c. 250 alleles in the two inbred parental lines to create an enhanced yielding hybrid. This is a feasible number of alleles to replace, although it would require a number of generations to achieve.
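A minimal sketch of the ranking step in such a pipeline is given below in Python. It is not an existing tool: the `Allele` record, the 0–1 stability score, the cost units and the `waste_score` formula (expression × synthesis cost × predicted instability) are all assumptions made here for illustration, standing in for whatever stability predictor and cost model a real pipeline would use. The last lines check the back-of-envelope numbers in the paragraph above; the observation that the quoted combination count is numerically close to 30,000^10 is ours and is not stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    gene: str
    name: str
    stability: float       # hypothetical predicted stability, 0 (unstable) to 1 (stable)
    synthesis_cost: float  # hypothetical energy cost per protein molecule
    expression: float      # relative transcript abundance


def waste_score(allele):
    """Hypothetical priority: energy spent on protein likely to be degraded."""
    return allele.expression * allele.synthesis_cost * (1.0 - allele.stability)


def rank_for_replacement(alleles):
    """Alleles ordered from highest to lowest estimated wasted energy."""
    return sorted(alleles, key=waste_score, reverse=True)


def best_allele_per_gene(alleles):
    """For each gene, the allele with the highest predicted stability."""
    best = {}
    for allele in alleles:
        current = best.get(allele.gene)
        if current is None or allele.stability > current.stability:
            best[allele.gene] = allele
    return best


def alleles_to_replace(expressed_genes, unstable_fraction):
    """Number of alleles to replace, as estimated in the text."""
    return round(expressed_genes * unstable_fraction)


# Invented example values, for illustration only.
alleles = [
    Allele("g1", "a", stability=0.9, synthesis_cost=100.0, expression=10.0),
    Allele("g1", "b", stability=0.5, synthesis_cost=100.0, expression=10.0),
    Allele("g2", "c", stability=0.2, synthesis_cost=50.0, expression=40.0),
]

print([a.name for a in rank_for_replacement(alleles)])
print({gene: a.name for gene, a in best_allele_per_gene(alleles).items()})
print(alleles_to_replace(5000, 0.05))   # 5% of 5000 expressed genes
print(f"{30_000 ** 10:.1e}")            # same order as the quoted combination count
```

Multiplying the three factors is a deliberately simple design choice: it pushes abundant, costly and unstable alleles to the top of the list, which matches the stated aim of eliminating highly expressed alleles encoding unstable proteins, but any weighting of the three terms would need to be calibrated against field data.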
In summary, this paper describes a model for heterosis that is relatively simple, consistent with a number of diverse observations from a variety of species, and provides an obvious route toward crop enhancement. Recently, Brian Ginn authored an article in the Journal of Theoretical Biology describing the importance of protein stability in heterosis, although the proposed mechanism differs from allele-specific expression described here (Ginn, 2010). Ginn proposes that ‘The accumulation of misfolded and aggregated proteins within inbred organisms are the result of more negative free energies of folding for proteins encoded at homozygous gene loci and higher concentrations of potentially aggregating non-native protein species within the cell’. Identification and elimination of alleles encoding unstable proteins is technically challenging, yet currently feasible. An efficient bioinformatics analysis pipeline could be developed to support a molecular breeding program to accelerate yield gains across both hybrid and inbred crops. Artificial intelligence could then be used to refine the analysis pipeline based on iterative feedback from field trials incorporating environmental variability into the analysis. Ultimately this computational analysis pipeline used and refined over numerous generations of crop enhancement would generate the knowledge and understanding to allow future generations of synthetic biologists to efficiently design genes encoding stable proteins for novel environments or applications.
The author thanks Stephen Welch (Kansas State University) and James Birchler (University of Missouri) for critical comments on the manuscript.
This paper is dedicated to the memory of Dr James ‘Paulo’ Dice (1948–2010), an excellent teacher and research scientist, a superb mentor, a great fisherman and a lifelong friend.