As sessile organisms, land plants have exploited their metabolic systems to produce a panoply of structurally and functionally diverse natural chemicals and polymers to adapt to challenging ecosystems. Many of these core and specialized metabolites act as chemical shields against a multitude of abiotic stresses, while others play important roles in plants' interactions with their biotic environments. Plant specialized metabolites can be viewed as complex traits in the sense that their biosynthesis typically requires multistep metabolic pathways comprising numerous specific enzymes belonging to diverse protein fold families. Resolving the evolutionary trajectories underlying the emergence of these specialized metabolic pathways bears on a fundamental question in biology: how do complex traits evolve in a Darwinian fashion? Here, I discuss several general patterns observed in rapidly evolving specialized metabolic systems in plants, and propose mechanistic features at the enzyme, pathway and organismal levels that rationalize the remarkable malleability of these systems through stepwise evolution. Future studies focused on fine sampling of metabolic enzymes and pathways in phylogenetically related plant species, or employing directed evolution strategies in synthetic systems, will significantly broaden our perspective on how biological complexity arises at the metabolic level.
Complex traits are phenotypic features that emerge from the interplay of multiple genetic components (Lander & Schork, 1994). Many complex traits are also described as irreducible, because the absence of any component would abolish the overall function of the trait (Weber, 1999). Although complex traits are omnipresent at different biological scales, with examples ranging from butterfly wing patterning to the assembly of multiprotein complexes (Beldade & Brakefield, 2002; Alber et al., 2007), little is known about the stepwise trajectories through which they arise in nature. Charles Darwin pondered this question in On the Origin of Species, using the evolution of the complex human eye as an example: ‘To suppose that the eye … could have been formed by natural selection, seems, I freely confess, absurd in the highest possible degree.’ (Darwin, 1859). Despite the lack of molecular detail available at the time, he offered a rational hypothesis to resolve this seemingly impossible puzzle: ‘If numerous gradations from a perfect and complex eye to one very imperfect and simple, each grade being useful to its possessor, can be shown to exist, … then the difficulty of believing that a perfect and complex eye could be formed by natural selection, though insuperable by our imagination, can hardly be considered real.’ (Darwin, 1859). One and a half centuries later, biologists have elucidated many of the molecular underpinnings of eyes, as well as of related but less complex light-sensitive structures in multiple animal lineages, and Darwin's hypothesis that the complex eye evolved through stepwise selection is being corroborated.
Not only is the human eye homologous to simpler light-sensitive organs in other animal lineages, but as an organ shaped by the opportunistic twists and turns of evolution, it is also far from a perfectly engineered machine, and exhibits major flaws from a designer's perspective (Lamb, 2011). Indeed, since Darwin, elucidating plausible evolutionary paths towards complexity has become an increasingly valued goal of modern biology, and is being actively pursued in a wide selection of biological systems (Lenski et al., 2003; Gompel et al., 2005; Capra et al., 2012; Finnigan et al., 2012).
Metabolism is a defining property of cellular life, often depicted as a complex network of chemical transformations mediated by a multitude of enzymes (Weng & Noel, 2012b). Living organisms harness energy and chemical substances from their external environments, while synthesizing and degrading a plethora of metabolites to fulfill discrete physiological needs for host survival and fitness in ever-changing environments. Metabolic systems can be viewed as complex traits, because individual metabolic enzymes in isolation are rarely pertinent to in vivo physiology until they become part of an organized metabolic pathway. Extant metabolic pathways generally contain a set of enzymes catalyzing sequential reactions in a highly concerted manner (Weng & Noel, 2012b). Given that it is the end metabolites of these pathways that ultimately confer selective advantages on the host, it is difficult to understand how individual incipient enzymes could arrive at specific activities over defined periods of time through stepwise evolution, or descent with modification, before being assembled into a more complex and integrated pathway. Without reference to the historical record encoded in organismal phylogeny and function, this quandary mirrors the question Darwin raised concerning the evolution of the complex eye.
Metabolism provides an attractive platform for studying the evolutionary processes that lead towards biological complexity. Metabolic systems exhibit tremendous depth and richness in their phylogenetic distribution throughout all domains of life. On the one hand, primary metabolism, encompassing pathways immediately required for host survival, is conserved to a great extent across all living organisms, with modifications often found in organisms living under extreme conditions (Mullins et al., 2008). On the other hand, specialized metabolism, comprising pathways that yield chemicals dispensable for survival under normal conditions but critical to the fitness of host populations in particular ecological niches, is often distributed in a taxonomically restricted manner. This restricted distribution is especially valuable for comparative studies of extant metabolic systems across lineages spanning a wide range of evolutionary timescales, from which useful information can be extracted to help reconstruct plausible trajectories underlying the emergence of discrete metabolic traits.
Plants are renowned for their ability to produce an enormous array of chemicals as unique adaptive strategies suited to their sessile lifestyle in challenging terrestrial environments (Weng et al., 2012b). The total number of these so-called ‘specialized’ metabolites in the plant kingdom remains elusive, even as more and more species are examined at the molecular level. Nonetheless, even the small fraction of this chemodiversity characterized so far continues to reveal the remarkable ability of plants to select and exploit a diverse collection of physicochemical properties of natural chemicals, including but not limited to color, flavor, fragrance, toxicity, stickiness, hydrophobicity, physical rigidity and redox potential, as means to overcome the multitude of biotic and abiotic challenges they face throughout their life cycle. Over the past few decades, the application of molecular genetics, biochemistry and structural biology to plant metabolism has yielded a growing mechanistic understanding of how specialized metabolites are biosynthesized. Recent advances in genomic resources for a widening collection of taxonomically diverse species across the green plant lineage have further facilitated genome mining for new metabolic pathways, and afford growing opportunities for comparative studies of phylogenetically related metabolic enzymes and pathways.
In order to address the conundrum of the evolutionary origin of multi-step metabolic pathways, several hypotheses have been put forward historically, including the retrograde model (Horowitz, 1945), the patch hypothesis (Jensen, 1976), the screening hypothesis (Jones et al., 1991) and more recent hypotheses deduced from studying adaptive microbial metabolic pathways for detoxifying xenobiotics (Copley, 2009). Here, in the context of metabolic enzyme and pathway evolution, I summarize a set of general observations regarding the rapid expansion of specialized metabolic systems in plants, and provide a number of examples to illustrate some probable evolutionary trajectories underlying certain specialized metabolic traits. I then discuss the mechanistic basis for enzyme catalytic promiscuity and its recently recognized role in metabolic evolvability. Finally, I propose a generalized model to explain how complex metabolic traits could arise, sometimes in a saltatory fashion, through the assembly of promiscuous enzymes into new pathways, followed by functional refinement at the catalytic, spatial and temporal levels.
General evolutionary patterns in plant specialized metabolism
System-level phylogenetic examination of specialized metabolic enzymes and pathways across major phyla of the plant kingdom reveals several general patterns, all consistent with stepwise evolutionary processes underlying the emergence and evolution of the metabolic traits observed in extant plants. First, diverse plant specialized metabolic pathways branch from core primary metabolism at different nodes (Fig. 1a). For example, the enormously rich phenylpropanoid metabolism widely present in plants starts with the deamination of the aromatic amino acid phenylalanine by phenylalanine ammonia-lyase (PAL; Vogt, 2010). The diverse family of specialized plant terpenes (isoprenoids) begins with the primary metabolites isopentenyl pyrophosphate and dimethylallyl pyrophosphate (Chen et al., 2011). Caffeine and related purine alkaloids, found sporadically in 13 orders of flowering plants, derive from core purine nucleotides (Ashihara et al., 2008). This observation suggests that the birth of the major specialized pathways present in extant plants probably involved emergent catalytic activities towards certain primary metabolites, yielding new compounds that enhanced host fitness in particular environments.
Second, the taxonomic distribution of plant specialized metabolic traits generally correlates with the gradual development of specialized tissue types, organs and/or lifestyles in land plants as they underwent extensive divergence over the last 500 million yr (Fig. 1b). Several pathways, such as the biosynthesis of core phenylpropanoids, cuticles, sporopollenins, abscisic acid and flavonoids, are absent from the extant charophytic algae most closely related to land plants, but ubiquitously present in all extant land plants (Weng & Chapple, 2010). The primary functions of these compounds are protection against UV radiation and desiccation, the major abiotic stresses facing early land plants as they migrated from aquatic habitats to terrestrial environments. When vascular plants arose, the ancestral core phenylpropanoid pathway was further elaborated to produce lignin, a phenolic polymer that provides physical rigidity to water-conducting xylem cells in the vasculature and enables vascular plants to stand upright (Weng & Chapple, 2010; Weng et al., 2010b). The evolution of trichomes in euphyllophytes coincides with the occurrence of a diverse array of metabolites enriched in these specialized surface structures, most of which are involved in chemical defense against herbivores (Dai et al., 2010). The emergence of seed plants c. 300 million yr ago also led to the appearance of a number of metabolic features related to seed physiology, such as the accumulation of condensed tannins in the seed coat and the rapid breakdown of starch during seed germination (Bradford & Nonogaki, 2007). Moreover, the rise of flowering plants over the past 200 million yr led to an explosion of chemodiversity in volatile compounds, emitted by floral tissues to attract co-evolving pollinating insects (Pichersky et al., 2006).
Similarly, many land plant lineages form symbiotic relationships with root microbiomes, in which certain metabolites are secreted from roots into the rhizosphere as chemical signals that mediate specific root–microbial interactions (e.g. the trihydroxychalcone-derived isoflavonoids produced in legumes induce nodulation by rhizobia; Walker et al., 2003; Bulgarelli et al., 2013). In plant specialized metabolism, new pathways continuously build on existing ones, resulting in a relatively conserved set of earlier-evolved pathways extended by variable, lineage-specific peripheral branches in extant land plants.
Third, the expansion of specialized metabolism in plants did not involve the emergence of new protein folds, but rather the extensive exploitation by natural selection of the sequence space within pre-existing folds, many of which are rooted in more ancient primary metabolic systems (Weng et al., 2012b; Fig. 2, Table 1). The catalytic machinery inherited along with an ancestral fold is often conserved during enzyme family expansion, although new catalytic chemistry also commonly arises within a fold family through the assembly of new catalytic residues in the active site (Weng & Noel, 2012b). For example, chalcone synthase (CHS), the first committed enzyme in flavonoid biosynthesis in plants, shares the same fold and catalytic machinery as β-ketoacyl-ACP synthase III (KAS III), a key enzyme of fatty acid biosynthesis in plants and bacteria (Weng & Noel, 2012a). In another remarkable case, chalcone isomerase (CHI), a stereospecific and catalytically perfected isomerase acting downstream of CHS in flavonoid biosynthesis, evolved from a clade of noncatalytic CHI-fold proteins. Within the green plant lineage, which includes simple chlorophyte algae, these noncatalytic CHI-fold proteins play a role in lipid biosynthesis and homeostasis, and they are structurally conserved in several other eukaryotic lineages as well as in some bacteria (Ngaki et al., 2012). Different enzyme families, typified by their structural folds, underwent tremendous radiation during land plant evolution through gene duplication followed by selective refinement (Xue et al., 2012). New specialized metabolic pathways continuously emerged in a lineage-specific manner by independently recruiting and refining descendants of these radiating families to catalyze sequential chemical reactions (Weng et al., 2012b; Fig. 2).
Table 1. Major enzyme families known to be involved in specialized metabolism in extant land plants and their presumed cousins in primary metabolism
Major enzyme families involved in plant specialized metabolism
Cousins in primary metabolism
The numbers of genes encoding enzymes of each family in the genomes of the green alga Chlamydomonas reinhardtii and the flowering plant Arabidopsis thaliana are listed for comparison.
Glycosyltransferase family 1
Class III peroxidase
Cytochrome c peroxidase
Farnesyl pyrophosphate synthase
CCR-like NAD(P)H-dependent reductase
CAD-like alcohol dehydrogenase
Long-chain fatty acyl-CoA synthetase
Long-chain fatty acyl-CoA synthetase
Type III polyketide synthase
β-ketoacyl-ACP synthase III
Repeated emergence of identical metabolic traits in disparate lineages
Although divergent evolution predominantly drives the continuous expansion of chemical complexity in plants (Fig. 3a), identical metabolic traits also often arise independently in disparate lineages through parallel or convergent evolution (Barton et al., 2007; Pichersky & Lewinsohn, 2011; Weng & Noel, 2013; Fig. 3b,c). Because plant species from divergent lineages often co-occupy the same ecological niches, the repeated evolution of common metabolic traits probably reflects natural selection driven by similar selective pressures associated with particular environments. Alternatively, because plant specialized pathways are typically less constrained than their cousins in primary metabolism, an early phase of neutral drift, followed later by selection before the drifting genes were lost, may also have contributed to some extent to the contingent occurrence of identical specialized metabolites in separate lineages.
Here, in the context of enzyme evolution, I use the term parallel evolution to describe independent acquisitions of identical catalytic properties by homologous enzymes belonging to the same fold family, whereas convergent evolution is reserved for the evolution of identical catalytic functions in nonhomologous enzymes possessing distinct protein folds (Fig. 3). These definitions, which use protein fold structure as the major criterion, are more stringent than those previously used to describe repeated evolution in plant metabolic systems (Weng et al., 2010a; Pichersky & Lewinsohn, 2011), but are consistent with usage in other subdisciplines of evolutionary biology (Zuckerkandl et al., 1965; Barton et al., 2007). Moreover, separating convergent from parallel evolution, where parallel evolution was previously treated as subordinate to convergent evolution, sharpens the blurry and imperfect boundary between the two terms as currently used.
In an apparent example of parallel evolution, syringyl (S) lignin, a fundamental building block of plant cell walls, occurs in two major plant lineages, lycophytes and angiosperms, which diverged from each other more than 400 million yr ago (Towers & Gibbs, 1953; Weng et al., 2008). In angiosperms, S lignin biosynthesis requires two enzymes, ferulate 5-hydroxylase (F5H) and caffeic acid O-methyltransferase (COMT), which form a metabolic branch diverting flux from the biosynthesis of guaiacyl (G) lignin, a lignin type common to all vascular plants (Weng & Chapple, 2010). It was later discovered that the lycophyte Selaginella moellendorffii independently evolved a bifunctional phenylpropanoid 3,5-hydroxylase, SmF5H, and a companion bifunctional phenylpropanoid OMT, SmCOMT. Together, SmF5H and SmCOMT mediate a new S lignin pathway derived directly from p-hydroxyphenyl (H) monolignols, bypassing four steps of the canonical lignin biosynthetic pathway defined in angiosperms (Weng et al., 2008, 2010a, 2011). At the enzyme level, the F5Hs and COMTs of angiosperms and Selaginella belong to the cytochrome P450 and S-adenosyl-L-methionine (SAM)-dependent OMT families, respectively, and apparently arrived at their equivalent activities in the two lineages through parallel evolution (Fig. 3b). Nonetheless, because of major differences in substrate specificity between the angiosperm and Selaginella enzymes, the exact metabolic routes of S lignin biosynthesis differ at the pathway level between the two lineages (Weng et al., 2010a).
Under the structure-based distinction between convergent and parallel evolution drawn above, clear cases of convergent evolution are relatively rare compared with clear cases of parallel evolution, but they have been documented in various biological systems (Zuckerkandl et al., 1965). One example lies in the plant flavonoid biosynthetic pathway. Whereas most flowering plants examined to date employ type II flavone synthases (FNS II), members of the cytochrome P450 family, to catalyze the oxidation of flavanones to the corresponding flavones, members of the Apiaceae evolved a distinct type I flavone synthase (FNS I), belonging to the 2-oxoglutarate-dependent dioxygenase (ODD) family, which mediates the same overall oxidation through a chemical mechanism distinct from that of FNS II (Leonard et al., 2005; Fig. 3c).
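The fold-based definitions of parallel and convergent evolution reduce to a simple decision rule once two enzymes are known to have acquired the same activity independently. The Python sketch below encodes that rule using the S lignin and flavone synthase examples; the `Enzyme` record, the `classify_repeated_evolution` helper and the short activity labels are hypothetical simplifications introduced for illustration (for instance, SmF5H is bifunctional, but only its 5-hydroxylase activity is recorded here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    """Hypothetical minimal record of an enzyme; not a real database schema."""
    name: str
    lineage: str       # plant lineage in which the enzyme occurs
    fold_family: str   # protein fold family, e.g. "cytochrome P450"
    activity: str      # simplified label for the catalytic function


def classify_repeated_evolution(a: Enzyme, b: Enzyme) -> str:
    """Apply the fold-based definitions to two enzymes.

    Assumes the caller already knows that the shared activity arose
    independently in the two lineages; the function does not test that.
    """
    if a.activity != b.activity:
        return "not repeated evolution"   # no shared catalytic function
    if a.fold_family == b.fold_family:
        return "parallel"                 # homologous enzymes, same fold
    return "convergent"                   # nonhomologous enzymes, distinct folds


# Examples from the text, with simplified activity labels.
angio_f5h = Enzyme("F5H", "angiosperms", "cytochrome P450", "5-hydroxylation")
sm_f5h = Enzyme("SmF5H", "Selaginella", "cytochrome P450", "5-hydroxylation")
fns_ii = Enzyme("FNS II", "most angiosperms", "cytochrome P450",
                "flavanone to flavone oxidation")
fns_i = Enzyme("FNS I", "Apiaceae", "2-oxoglutarate-dependent dioxygenase",
               "flavanone to flavone oxidation")

print(classify_repeated_evolution(angio_f5h, sm_f5h))  # parallel
print(classify_repeated_evolution(fns_ii, fns_i))      # convergent
```

Because fold family is the sole criterion, the rule deliberately ignores pathway-level differences: the angiosperm and Selaginella S lignin routes differ, yet the enzyme pair is still classified as parallel.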
Repeated emergence of similar or identical metabolic traits in distantly related taxa is indeed a common theme during land plant evolution, indicative of the considerable pliability of plant specialized metabolic systems (Pichersky & Lewinsohn, 2011; Weng & Noel, 2013). Nevertheless, our current knowledge of this phenomenon remains mostly at the level of comparative phytochemistry, focused on the identity of end products and rarely on their function. Elucidating the molecular underpinnings of these cases will greatly illuminate how specific enzyme mechanisms, structures and metabolic networks are accessed through alternative evolutionary trajectories in nature.
Catalytic promiscuity and its role in metabolic evolvability
Enzymes are commonly regarded as exquisitely precise and efficient molecular catalysts. Yet natural enzymes are also evolvable: they can traverse mutational trajectories to arrive at alternative catalytic activities, sometimes in an unexpected saltatory fashion (Zuckerkandl et al., 1965; O'Maille et al., 2008). How does an enzyme evolve from a given ancestral function to novel functions through descent with modification? Scrutiny of the behavior of many enzymes has increasingly revealed that they are not as ‘perfected’ as once assumed, but instead exhibit varying levels of catalytic promiscuity (Tokuriki & Tawfik, 2009; Fig. 4a). Enzyme catalytic promiscuity is intrinsically associated with the dynamic nature of proteins, and encompasses a range of mechanistic processes, namely substrate permissiveness, mechanistic elasticity and concomitant product diversity (Weng & Noel, 2012b). Under altered physicochemical conditions or following mutation, these latent behaviors may be amplified to yield alternative metabolites at levels sufficient for natural selection to operate on (Aharoni et al., 2005; Fig. 4a). In many cases, gene duplication yields multiple isozymes that subsequently differentiate in expression or subcellular localization without changes in catalytic function. In other cases, gene duplication releases the new copy from the evolutionary constraint often imposed by its ancestral function, allowing its catalytic function to diverge rapidly through additional mutations (Fig. 4a).
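The amplification of a latent activity can be made concrete with standard Michaelis-Menten kinetics for two substrates competing for the same active site, where the ratio of the two rates equals the ratio of the specificity constants (kcat/KM) multiplied by the substrate concentrations. The Python sketch below uses that textbook relation; the function name and all kinetic values are hypothetical, chosen only to show how a mutation that raises the specificity constant for a secondary substrate can lift a trace product towards levels that selection could act on.

```python
def promiscuous_fraction(spec_native: float, conc_native: float,
                         spec_alt: float, conc_alt: float) -> float:
    """Fraction of total turnover that goes to the alternative product.

    For two substrates competing for one active site under Michaelis-Menten
    kinetics, v_alt / v_native = (kcat/KM)_alt [alt] / ((kcat/KM)_native [native]).
    Specificity constants are in M^-1 s^-1 and concentrations in M.
    """
    ratio = (spec_alt * conc_alt) / (spec_native * conc_native)
    return ratio / (1.0 + ratio)


# Equal substrate concentrations (M), an assumption made for simplicity.
conc = 1e-4

# Hypothetical ancestral enzyme: strong native activity, weak latent activity.
before = promiscuous_fraction(1e6, conc, 1e2, conc)

# Hypothetical mutant: 100-fold gain in the latent specificity constant.
after = promiscuous_fraction(1e6, conc, 1e4, conc)

# Hypothetical mutant that also loses 10-fold native specificity.
traded_off = promiscuous_fraction(1e5, conc, 1e4, conc)

print(f"{before:.2e} {after:.2e} {traded_off:.2e}")
```

The trade-off case mirrors the scenario in which gene duplication frees one copy from its ancestral function, so that a loss of native activity in that copy no longer carries the same fitness cost.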
In primary metabolism, the essential functions of core pathways impose constant selective pressure over long historical periods, ultimately driving the participating enzymes towards catalytic efficiency and, consequently, much lower levels of catalytic promiscuity (Bar-Even et al., 2011). In specialized metabolism, however, enzyme functions tend to meander more frequently in response to fluctuating environments and/or have not been under sufficiently strong selection to reach optimal efficiency, so catalytic promiscuity is more readily observed in these specialized enzymes (Tokuriki et al., 2012; Weng et al., 2012b). For example, terpene synthases (TPSs) are a class of plant specialized enzymes well known for their catalytic promiscuity. A single TPS can form a bouquet of hydrocarbon compounds in vivo (Tholl et al., 2005), and the product distribution of a given TPS is acutely sensitive to environmental conditions and mutations (O'Maille et al., 2008). The elevated catalytic promiscuity of specialized metabolic enzymes is probably one of the fundamental factors underlying the remarkable evolvability of these metabolic systems.
A clear example of metabolic evolution through the exploitation of ancestral enzyme promiscuity was recently demonstrated in Arabidopsis thaliana, in which a new set of α-pyrone-bearing metabolites, the arabidopyrones, emerged (Weng et al., 2012a). The first step of arabidopyrone biosynthesis is catalyzed by a P450 enzyme, CYP84A4, which evolved from a duplicate of the ancestral CYP84A1 gene, encoding a phenylpropanoid 5-hydroxylase involved in S lignin biosynthesis. CYP84A4 apparently adopted one of the latent activities of its progenitor and neofunctionalized into a specific phenylpropanoid 3-hydroxylase. This innovation produces a catechol-substituted substrate that is further metabolized by a conserved extradiol ring-cleavage dioxygenase, together with additional downstream redox enzymes, to yield arabidopyrones. This study indicates that the system-level plasticity of the ancestral metabolic network allows a newly evolved activity in a single enzyme to propagate through multiple pre-existing promiscuous catalytic steps, giving rise to more profound metabolic outcomes.
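The idea that one new activity can propagate through pre-existing promiscuous steps can be pictured as reachability in a metabolic graph. In the Python sketch below, metabolites are nodes and enzymatic conversions are directed edges searched breadth-first; the node labels are hypothetical simplifications of the arabidopyrone route rather than the actual intermediates, and the latent dioxygenase and redox edges are assumed to exist before CYP84A4 evolves. Adding a single edge, standing in for the new 3-hydroxylation, makes three previously inaccessible products reachable.

```python
from collections import deque


def reachable(network: dict[str, list[str]], start: str) -> set[str]:
    """Return every metabolite reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for product in network.get(node, []):
            if product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


# Hypothetical, simplified ancestral network.
ancestral = {
    # ancestral 5-hydroxylation (CYP84A1-like), feeding S lignin
    "phenylpropanoid precursor": ["5-hydroxylated intermediate"],
    "5-hydroxylated intermediate": ["S lignin monomers"],
    # latent promiscuous steps: present, but their substrate is never made
    "catechol intermediate": ["ring-cleavage product"],   # extradiol dioxygenase
    "ring-cleavage product": ["arabidopyrones"],          # downstream redox enzymes
}

# Copy the network and add one new edge: the CYP84A4-like 3-hydroxylation.
evolved = {node: list(products) for node, products in ancestral.items()}
evolved["phenylpropanoid precursor"].append("catechol intermediate")

start = "phenylpropanoid precursor"
new_products = reachable(evolved, start) - reachable(ancestral, start)
print(sorted(new_products))
```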
Enzyme catalytic promiscuity thus seems to be a key component of how complex specialized metabolic pathways can rapidly evolve through Darwinian evolution (Fig. 4b). In the crowded cellular milieu, thousands of different enzymes simultaneously transform a multiplicity of metabolites at any given time point. Because of the intrinsic catalytic promiscuity of enzymes, especially those involved in specialized metabolism, there is always a level of ‘metabolic noise’ underlying so-called normal metabolic behavior, yielding low-level metabolites that are selectively neutral and may vary greatly in identity and concentration throughout cellular metabolism. Contingent events in evolution, such as genetic variation, oscillating ecological conditions or sudden environmental change, can elicit rapid selection of promiscuous activities and their products, resulting in the recruitment of ancillary enzymes to shape an emergent metabolic pathway that enhances host and population fitness. Natural selection continually samples these low-level, promiscuously derived metabolites and, when conditions are ripe, captures previously neutral metabolites that now confer selective advantages. When the selected alleles persist, they can spread rapidly through a population, progressing from pathways of stochastic origin to fixed metabolic traits, at times subject to further improvement through additional rounds of positive selection.
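How quickly a newly advantageous allele can spread is illustrated by the textbook deterministic model of selection at a haploid locus, in which the allele frequency p changes each generation as p' = p(1 + s) / (1 + sp) for a selective advantage s. The Python sketch below assumes a large population, constant s and no genetic drift; in real populations a rare new allele is frequently lost by drift before selection can act, so the numbers are illustrative only. The function names are hypothetical.

```python
import math
from typing import Optional


def next_frequency(p: float, s: float) -> float:
    """One generation of haploid selection with relative fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + s * p)


def generations_to_reach(p0: float, s: float, target: float,
                         max_generations: int = 1_000_000) -> Optional[int]:
    """Iterate the recursion until the allele frequency reaches `target`."""
    p = p0
    for generation in range(1, max_generations + 1):
        p = next_frequency(p, s)
        if p >= target:
            return generation
    return None


def closed_form_generations(p0: float, s: float, target: float) -> float:
    """Under this model the odds p/(1 - p) grow by a factor (1 + s) per generation."""
    odds_ratio = (target / (1 - target)) / (p0 / (1 - p0))
    return math.log(odds_ratio) / math.log(1 + s)


# An allele starting at 1 in 1000, followed until it reaches 99%.
for s in (0.01, 0.1):
    print(s, generations_to_reach(0.001, s, 0.99),
          round(closed_form_generations(0.001, s, 0.99), 1))
```

With a 1% advantage, the allele needs on the order of a thousand generations to approach fixation; a 10% advantage shortens this roughly tenfold.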
According to a recent estimate, of the four billion species that have lived on Earth over the past 3.5 billion yr, only about 1% survive today (Barnosky et al., 2011). As biologists trying to understand the evolutionary trajectories that connect related but discrete biological features, we are left with only a very small fraction of the lineages that life has explored throughout its history. Many highly complex traits, for example the human eye, are widely present in extant organisms. However, the phylogenetic record underlying such traits commonly contains huge gaps that can never be filled with certainty, even with the most sophisticated means of ‘resurrecting’ ancestral proteins, particularly as the phylogenies used for such inference represent a very sparse sampling of extant organisms (Thornton et al., 2003). In short, most evolutionary intermediates, the internal nodes of a phylogenetic tree, have been swept away as distinct lineages diverged, even those intermediates that once provided fitness advantages in organismal populations. This key aspect of descent with modification hinders our understanding of the detailed processes underlying molecular evolution, a problem that also impinges on the emerging field of synthetic biology (Smock & Gierasch, 2005). Examining metabolic systems from a molecular evolutionary perspective, constrained by our understanding of catalytic mechanisms, likewise reveals optimized complex pathways, for example the TCA cycle, that offer little clue as to how they initially emerged. By contrast, plant specialized metabolism, which has expanded enormously over the past 500 million yr, provides a phylogenetically rich system for studying the evolutionary processes that lead to metabolic complexity on varying biological, chemical, architectural and time scales.
At the enzyme level, one can examine homologous enzymes from closely related species to identify recently acquired mutations that are under positive selection in certain lineages and alter enzymatic activities, representing an early record of divergent evolution. By harnessing the great sequence diversity of a number of enzyme families with bioinformatic methods, for example statistical coupling analysis (SCA; Halabi et al., 2009), one can infer, and then experimentally test, the biophysical constraints that shape the evolvability of a given protein fold family. At the pathway level, the progenitor enzymes of a clearly characterized, evolutionarily new pathway (Matsuno et al., 2009; Weng et al., 2012a) can be reassembled in vitro or introduced into a suitable set of hosts to monitor system-level metabolic promiscuity under differing conditions. Directed evolution can then reveal mutational effects and their contributions to catalytic promiscuity or specificity, flux and so on, and can partially recapitulate mutational trajectories that mimic natural evolutionary processes involving simultaneous modification of multiple pathway components. At the organismal level, it will be of great interest to investigate whether rapid enzyme evolution in plants is facilitated by additional molecular machinery encoded in plant genomes. For example, heat shock proteins may help stabilize protein folds carrying destabilizing mutations long enough for favorable activities to be selected (Queitsch et al., 2002; O'Maille et al., 2008).
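Co-variation analyses of this kind start from a multiple sequence alignment and ask which pairs of positions vary together. SCA itself weights co-variation by positional conservation and is not reproduced here; the Python sketch below instead computes plain mutual information between alignment columns, a simpler and widely used co-variation measure, on a hypothetical six-sequence toy alignment. The function names and the 0.5 nat threshold are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log


def column_mi(col_i: str, col_j: str) -> float:
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def coupled_pairs(alignment: list[str], threshold: float) -> list[tuple[int, int, float]]:
    """Return position pairs whose mutual information meets the threshold."""
    length = len(alignment[0])
    columns = ["".join(seq[k] for seq in alignment) for k in range(length)]
    hits = []
    for i, j in combinations(range(length), 2):
        mi = column_mi(columns[i], columns[j])
        if mi >= threshold:
            hits.append((i, j, round(mi, 3)))
    return hits


# Hypothetical toy alignment: positions 0 and 2 co-vary (A with L, G with V),
# position 1 is invariant and position 3 varies largely independently.
alignment = ["AKLS", "AKLT", "AKLS", "GKVT", "GKVS", "GKVT"]

print(coupled_pairs(alignment, threshold=0.5))
```

Real coevolution studies need large, diverse alignments and corrections for phylogenetic relatedness among sequences; with six sequences, this score serves only to show the mechanics.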
I thank my postdoctoral advisor Joseph P. Noel for his support and helpful comments in preparation of this manuscript. I also acknowledge the constructive comments from the reviewers. J.K.W. was supported by the Howard Hughes Medical Institute and a postdoctoral fellowship from the Pioneer Foundation.