Being by far the largest family of enzymes to support plant metabolism, the cytochrome P450s (CYPs) constitute an excellent reporter of metabolism architecture and evolution. The huge superfamily of CYPs found in angiosperms is built on the successful evolution of 11 ancestral genes, with very different fates and progenies. Essential functions in the production of structural components (membrane sterols), light harvesting (carotenoids) or hormone biosynthesis kept some of them under purifying selection, limiting duplication and sub/neofunctionalization. One group (the CYP71 clan), after an early trigger to diversification, has kept growing, producing bursts of gene duplications at an accelerated rate. The CYP71 clan now represents more than half of all CYPs in higher plants. Such bursts of gene duplication are likely to contribute to adaptation to specific niches and to speciation. They also occur, although with lower frequency, in gene families under purifying selection. The CYP complements (CYPomes) of rice and the model grass weed Brachypodium distachyon have been compared to view evolution in a narrower time window. The results show that evolution of new functions in plant metabolism is a very long-term process. Comparative analysis of the plant CYPomes provides information on the successive steps required for the evolution of land plants, and points to several cases of convergent evolution in plant metabolism. It constitutes a very useful tool for spotting essential functions in plant metabolism and for guiding investigations on gene function.
At present, 5100 sequences of plant cytochrome P450s (CYPs) have been annotated and named. The number of annotated CYPs in plants is significantly greater than is seen in other taxa: 1461 in vertebrates, 2137 in insects, 2960 in fungi, 1042 in bacteria, 27 in Archaea and two in viruses (Mimivirus) (Nelson, 2011). The 5100 sequences include 3651 CYPs from 11 fully sequenced plant genomes: Arabidopsis, papaya, grapevine, soybean, tomato, rice, Brachypodium distachyon, Selaginella moellendorffii (lycopod), Physcomitrella patens (moss), Chlamydomonas reinhardtii and Volvox carteri (colonial green algae) (see Table S1 in Supporting Information). An additional 1449 CYP sequences are named from 255 incompletely sequenced plant species with 1–183 sequences [Medicago truncatula (barrel medic, Fabales) http://www.medicagohapmap.org/downloads_genome/Mt3.5/genome_stats.php]. Named sequences can be found at the Cytochrome P450 Homepage (Nelson, 2009): http://drnelson.uthsc.edu/CytochromeP450.html. With the present increase in genome sequencing capacity, these numbers will be rapidly outdated, but analysis of existing data is already very informative.
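The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text (Nelson, 2011); the variable names are illustrative and not part of any database schema.

```python
# Named CYP sequences per group, as quoted in the text (Nelson, 2011).
named_cyps = {
    "plants": 5100,
    "vertebrates": 1461,
    "insects": 2137,
    "fungi": 2960,
    "bacteria": 1042,
    "archaea": 27,
    "viruses": 2,
}

# The plant total is split by source in the text.
plant_from_complete_genomes = 3651   # 11 fully sequenced genomes
plant_from_partial_species = 1449    # 255 incompletely sequenced species

plant_total = plant_from_complete_genomes + plant_from_partial_species
all_named = sum(named_cyps.values())
plant_share = named_cyps["plants"] / all_named

print(f"plant total from both sources: {plant_total}")
print(f"all named CYPs across groups:  {all_named}")
print(f"plant share of named CYPs:     {plant_share:.1%}")
```

The two plant sources add up to the 5100 sequences stated, and plants account for roughly 40% of all named CYPs in this tally.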
When looking at individual plant genomes, CYPs form the third largest family of plant genes (245 in Arabidopsis when excluding predicted pseudogenes). The two largest gene families code for F-box proteins and receptor-like kinases (692 and 610 genes, respectively, in Arabidopsis) (Table 1). Typically, the ‘CYPome’ (i.e. the CYP complement of a given species) from angiosperms consists of around 300 genes grouping in about 50 families (Table S1). CYP genes represent around 1% of the plant protein-coding genes, a proportion exceeded or approached only by genes coding for regulatory proteins and some families of transcription factors (C2H2, C3H, bHLH). CYPs thus constitute by far the largest family of enzymes in plant metabolism, the next being the family 1 glycosyltransferases (123 genes in Arabidopsis). For this reason, CYPs are an excellent mirror/reporter of plant evolution, and in particular of the evolution and role of plant metabolism in development and adaptation. The large number of CYP enzymes points to the importance of metabolite functionalization for bioactivity in signaling and defense, for polymerization (lignin, suberins, cutins, sporopollenin) and for the emergence of complex anatomical (e.g. pollen exine, anther cuticle) and chemical (e.g. taxol, vinblastine) structures. The CYPs have a key function in generating the chemical diversity that is the hallmark of plants compared with animals.
Table 1. Large gene families in Arabidopsis
Number of genes
C2H2 transcription factors
C3H transcription factors
bHLH transcription factors
AP2-EREBP transcription factors
MYB transcription factors
Glycosyltransferases family 1
MADS transcription factors
NAC transcription factors
Homeobox transcription factors
B3 DNA binding
MAP kinase kinase kinases
bZIP transcription factors
Peroxidase, class III
WRKY transcription factors
Glycoside hydrolase family 28
Subtilisin-like serine family
EF-hand containing proteins: group IV
Plant CYP families and nomenclature
The CYPs have been classified into families and subfamilies based on homology and phylogenetic criteria (Nelson et al., 1996). The original definition of CYP family membership was 40% amino acid sequence identity or higher. This was easy to apply when very few sequences were known. As more and more sequences have been added, a phenomenon called family creep has stretched the 40% boundary backwards. If two adjacent families on trees are clearly separated, and a new sequence falls just outside the 40% family boundary but is obviously closest to one of the two families, it is more logical to include it in an existing family rather than making a whole new family just for that one sequence. The boundary for membership in that family is then set instead at, say, 39%. For very large families, one can see that mergers will occur as sequences that were originally 37 or 38% identical to a known family become absorbed into the adjacent large family due to family creep. This phenomenon has occurred in the CYP71 family and in the CYP89 family (see examples below). Two choices are possible: rename these families as subfamilies of the large encapsulating family, or keep the names and add a footnote concerning the association with the larger family. For gene names that have been published and used in the literature it is best to add the footnote rather than change the name.
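The membership rule and the effect of family creep can be illustrated with a small decision function. This is a minimal sketch, not the procedure actually used for naming: the 40% threshold comes from the text, whereas `creep_margin` and `clear_lead` are hypothetical parameters standing in for "just outside the boundary" and "obviously closest to one of the two families", and the identity values in the usage lines are invented.

```python
def assign_family(identities, threshold=40.0, creep_margin=3.0, clear_lead=5.0):
    """Suggest a family for a new sequence.

    identities: dict mapping family name -> best % amino acid identity
                between the new sequence and a member of that family.
    creep_margin, clear_lead: hypothetical tolerances (assumptions).
    Returns (family_or_None, reason).
    """
    if not identities:
        return None, "new family"

    ranked = sorted(identities.items(), key=lambda kv: kv[1], reverse=True)
    best_family, best_id = ranked[0]

    # Original rule: 40% identity or higher puts the sequence in the family.
    if best_id >= threshold:
        return best_family, "meets 40% rule"

    # Family creep: just below the boundary, but clearly closest to one family.
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    near_boundary = best_id >= threshold - creep_margin
    clearly_closest = best_id - runner_up >= clear_lead
    if near_boundary and clearly_closest:
        return best_family, "family creep"

    return None, "new family"


print(assign_family({"CYP71": 45.0, "CYP76": 38.0}))  # above the boundary
print(assign_family({"CYP71": 38.5, "CYP76": 30.0}))  # absorbed by creep
print(assign_family({"CYP71": 38.5, "CYP76": 37.0}))  # ambiguous, new family
```

In practice the decision also rests on tree topology, which a pairwise identity table cannot capture; the function only makes the threshold logic explicit.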
Naming of plant CYPs proceeded without taking orthology into account (e.g. cinnamate hydroxylase from Arabidopsis bears the same family and subfamily name as cinnamate hydroxylase from Helianthus tuberosus, but a different CYP number: CYP73A5 versus CYP73A1). This enormously increases the number of plant CYP names. A recent trend in nomenclature of vertebrate genes has been to emphasize orthology by applying the same CYP name to orthologs. This may also be applied to plant CYPs when relevant in the future, but would considerably increase the workload in the naming process.
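The structure of a CYP name (family number, subfamily letter, individual gene number) can be made explicit with a small parser. This is a sketch under the assumption that names follow the simple pattern seen in the examples above, such as CYP73A5; names carrying extra suffixes would need a more permissive pattern. The function and class names are illustrative.

```python
import re
from typing import NamedTuple


class CypName(NamedTuple):
    family: int      # e.g. 73
    subfamily: str   # e.g. "A"
    gene: int        # e.g. 5


# Assumed pattern: "CYP" + family number + subfamily letter(s) + gene number.
_CYP_RE = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp_name(name: str) -> CypName:
    """Split a CYP name into family, subfamily and gene number."""
    match = _CYP_RE.match(name)
    if match is None:
        raise ValueError(f"not a simple CYP name: {name!r}")
    family, subfamily, gene = match.groups()
    return CypName(int(family), subfamily, int(gene))


def same_subfamily(a: str, b: str) -> bool:
    """True when two names share both family and subfamily."""
    pa, pb = parse_cyp_name(a), parse_cyp_name(b)
    return (pa.family, pa.subfamily) == (pb.family, pb.subfamily)


print(parse_cyp_name("CYP73A5"))
print(same_subfamily("CYP73A5", "CYP73A1"))
```

The two cinnamate hydroxylases share family 73 and subfamily A but differ in gene number, which is exactly the situation that inflates the number of names when orthology is not taken into account.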
The number of families/subfamilies in an organism is usually correlated with the number of main functions of CYPs. Those CYPs within the same family (or subfamily for large groups) can catalyze subsequent steps in the same pathway or similar reactions on different substrates. For example, CYP71Cs in maize catalyze the sequence of reactions leading from indole to the benzoxazinone DIMBOA (Gierl and Frey, 2001). Conversely, CYP98s in Arabidopsis catalyze the meta-hydroxylation of the ring of different phenolic conjugates, esters or amides (Bassard et al., 2010), and CYP79s catalyze the conversion of different amino acids into oximes (Bak et al., 2006). Plant CYPs group so far into 127 families. This compares with 19 families in vertebrates, 67 in insects, but 399 in fungi and 333 in bacteria (Nelson, 2011). While the latter reflect a larger number of sequenced genomes, they also indicate that fungi and bacteria have evolved an even larger chemical diversity than plants, although a subset of them is also used for growth-supporting biodegradation. The increase in gene families discovered in plants in the last few years almost exclusively results from the sequencing of the genomes of algae including those of Ostreococcus and Micromonas (21 new families), of moss (15 new families) and Selaginella (24 new families). As new angiosperm genomes (tomato and Brachypodium) are sequenced, no new CYP families are being discovered. The sequence space (the number of discoverable families) from angiosperms (Nelson, 2011) thus seems to be close to saturation. Consequently, the major functions of CYPs in flowering plants are most likely close to being covered by the named sequences. The genes per family in 14 annotated plant genomes are shown in Table S1.
Definition and evolution of plant CYP clans
Cytochrome P450 clans are deep gene clades observable on phylogenetic trees. This is illustrated by Figure 1 that shows a tree built using one representative member of each CYP family so far listed in plants. Each clan is derived from a single gene ancestor. Land plants have 11 clans, with the green alga Chlamydomonas having four or five unclassified deep branches that may be additional clans that evolved in single-celled green plants. More algal genomes will be needed to clarify this point. The 11 land plant clans form two groups: single-family clans (CYP51, CYP74, CYP97, CYP710, CYP711, CYP727, CYP746) and multi-family clans (CYP71, CYP72, CYP85, CYP86).
Figure 2 shows the occurrence of plant CYP families and clans across plant phylogeny taking into account the 5100 named plant CYPs. The families are grouped in clans and inside each clan families with the oldest members (deepest in the phylogeny) are arranged to the left side. Younger families are successively added to the right. This produces a stairstep pattern that shows the first appearance of CYP families in plant evolution. For example, in the CYP71 clan eight CYP families first appear in the Magnoliids, indicating a diversification of CYPs at the onset of the angiosperms. The figure also illustrates temporal relationships among CYPs. CYP77 and CYP89 are sister families (the phylogenetic tree in Figure 1 shows that they share a common ancestor), but looking at Figure 2 it is clearly apparent that CYP89 (present only in angiosperms) is younger than CYP77 (already found in lycopods). CYP89 thus had to evolve from CYP77 after duplication and divergence. This information is not apparent from phylogenetic trees alone.
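The stairstep arrangement described for Figure 2 amounts to sorting families by the deepest taxon in which they occur. The sketch below is a toy illustration only: the taxon list is simplified, and the first appearances are taken from statements elsewhere in this text for a handful of families (CYP77 is placed at lycopods because it is "already found" there, which is an assumption about its first appearance). The function names are hypothetical.

```python
# Simplified taxon order, from deepest to most recent (assumption).
TAXON_ORDER = ["green algae", "liverworts", "moss", "lycopods", "angiosperms"]

# First appearance of a few families, as stated in the text.
first_seen = {
    "CYP89": "angiosperms",
    "CYP77": "lycopods",
    "CYP701": "moss",
    "CYP73": "liverworts",
    "CYP98": "liverworts",
    "CYP703": "moss",
}


def stairstep(first_seen, taxon_order):
    """Order families left to right: oldest first, then by family number."""
    rank = {taxon: i for i, taxon in enumerate(taxon_order)}
    return sorted(first_seen, key=lambda fam: (rank[first_seen[fam]], int(fam[3:])))


def younger_sister(fam_a, fam_b, first_seen, taxon_order):
    """Of two sister families, return the one that appears later, or None if tied."""
    rank = {taxon: i for i, taxon in enumerate(taxon_order)}
    ra, rb = rank[first_seen[fam_a]], rank[first_seen[fam_b]]
    if ra == rb:
        return None
    return fam_a if ra > rb else fam_b


print(stairstep(first_seen, TAXON_ORDER))
print(younger_sister("CYP77", "CYP89", first_seen, TAXON_ORDER))
```

Combining the sister relationship from the tree with the order of first appearance is what allows the direction of descent (CYP89 from CYP77) to be inferred.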
The loss of a CYP family in a lineage such as rosids or asterids would produce a vertical white stripe in the column for that family. Examination of Figure 2 shows that such gene loss is extremely rare. CYP709 in the CYP72 clan is missing from all asterids, but only Solanales (potato and tomato) have full genome sequences. The CYP709 family may appear in another of these plant orders as more sequencing is done. Among eight missing families in Rosales after searches in nr, ESTdb and GSS, three families were found in the new apple genome sequence in the WGS section of NCBI. This demonstrates the necessity for whole genome sequences for confidence that a family is really missing from a taxon.
The flowering plants are evolutionarily young, with the oldest known fossil dated to 125 million years ago (Sun et al., 2002; Pennisi, 2010). There have been very few new CYP families added after angiosperms first appeared. CYP728 was first observed in Amborella, the basal angiosperm. Monocots have the first appearance of CYP87, CYP722 and CYP733, all of which also occur in dicots. So, these families evolved in their common ancestor. CYP99 and CYP723 are monocot specific, so they evolved after separation from dicots. CYP712 and CYP718 are found first in rosids. CYP702, CYP705 and CYP708 are Brassicales-specific families. Their evolution reflects the emergence of Brassicales-specific pathways. Other plant CYP families were present before angiosperms arose.
Figure 2 shows that CYP family loss in plants seems to be limited to single taxa and maybe to single species. It is rare to find two adjacent taxa that are missing the same CYP family. Even though the birth and death of sequences is a stochastic process, once created and functional, a plant CYP family is apparently difficult to lose.
The single-family clans are constrained from duplication and are most likely under purifying selection. The enzymes they include seem to be fundamental and highly conserved. Five of these single-family CYP clans are ancient, with orthologs already present in green algae (CYP51, CYP97, CYP710, CYP711, CYP746). Single-family clans usually code for enzymes with essential functions. CYP746 is a single-family clan that exists in green algae and moss, but it has not continued as a recognizable family into lycopods or vascular plants.
The CYP55 family in Chlamydomonas is related to the CYP55 family of fungi and probably results from horizontal transfer (see below). An animal CYP4-like clan is found in green algae, but not in land plants.
Of the four multiple-family CYP clans, CYP71, CYP72 and CYP85 have members in the liverworts. The CYP85 clan also appears to have divergent members in green algae. The best BLAST hits of CYP737, CYP738, CYP739 and CYP740 are all CYP85 clan members, though they do not cluster with CYP85 on the tree. The youngest clan in this set is CYP86 with three families present in moss (CYP86, CYP94, CYP704). All land plant clans were present by the time moss evolved. The expected completion of the liverwort genome (Marchantia polymorpha) may push some of these first appearances back in time.
The clustering of families on the global plant CYP tree (Figure 1) hints at some deeper relationships. CYP86 is adjacent to CYP97. The CYP97 family is a carotenoid-metabolizing family. CYP86 clan enzymes take fatty acids and alkanes as substrates. A transition from carotenoids to fatty acids/alkanes is easy to imagine. Similarly, the sector that contains CYP51, CYP710 and the CYP85 clan may have evolved from a sterol-metabolizing CYP51 ancestor. This most likely reflects the history of acquisition of new functions. This is particularly obvious in the case of the CYP51 > CYP710 > CYP85 lineage, where membrane sterol metabolism led to the emergence of steroids (brassinosteroids) and other large terpenoids.
Single-family CYP clans
CYP51 is one of the oldest and the most conserved eukaryotic CYPs. CYP710 most likely evolved from it very early in eukaryote evolution. The fungal ortholog of CYP710 is called CYP61. CYP710 and CYP61 have recently been recognized as belonging to a single clan that pre-dated the first split that led to plant–fungal divergence and may be nearly as ancient as the CYP51 clan. Both are found in Euglenozoa, Chromista (Alveolates plus Stramenopiles), brown algae and fungi. CYP51 and CYP710 are paired enzymes in sterol biosynthesis. CYP710A1 has been identified as a C-22 desaturase that converts β-sitosterol to stigmasterol in Arabidopsis (Morikawa et al., 2006). CYP710/CYP61s act after CYP51 (sterol 14α-demethylase) in sterol biosynthesis. Animals have lost the CYP710/CYP61 gene, which alters their sterols. Arthropods have lost both of them and rely on their diet for their membrane sterols. CYP51, with single or low copies in plant genomes, was long considered as the prototype of a stable and highly conserved CYP due to high selection pressure. Recent work, however, has shown that grasses have recruited CYP51 to a new role in secondary metabolite synthesis. Rice has a CYP51H subfamily with multiple members. The same CYP51H subfamily has been characterized in Avena strigosa (oats) with CYP51H10 catalyzing the second step in avenacin biosynthesis (Qi et al., 2006). Avenacin is a triterpene glycoside antifungal toxin. This CYP51H subfamily is the only known excursion of CYP51 away from its ancestral role in sterol biosynthesis.
CYP74 was first observed in liverworts when looking just at plant taxa, but it has also recently been reported in Rhizobacteria, Cnidaria and Chordata (in amphioxus) (Lee et al., 2008). CYP74 from amphioxus was associated with epoxyalcohol synthase activity. Lateral transfer is the most plausible explanation for the presence of CYP74s in Rhizobacteria. If CYP74 in plants and animals descended from a shared ancestor without lateral transfer, then CYP74 would have to have originated in single-celled protists. The possibility of a lateral transfer cannot be ruled out at this stage, and this would alter the predicted origin of this family. CYP74s are atypical CYP enzymes that do not require molecular oxygen and electron donors, but catalyze the rearrangement of fatty acid hydroperoxides with unusually high turnover as allene oxide synthase in the jasmonate pathway, hydroperoxide lyase and divinyl ether synthase to form compounds with antimicrobial or signaling functions (for a recent review see Hughes et al., 2009). CYP74s are the first plant CYPs whose crystal structure has been resolved (Lee et al., 2008; Li et al., 2008). Due to their atypical mechanism and properties, their structure is, however, not representative for modeling other plant CYPs.
Some single-family clans have multiple subfamilies that are conserved across plant taxa. CYP97 has three subfamilies, each with a single gene in most sequenced land plant genomes. Soybeans have additional copies due to two whole genome duplications (Schmutz et al., 2010). CYP97 is one of the oldest plant-specific CYP clans. It is also found in the brown alga Ectocarpus siliculosus (Cock et al., 2010). CYP97s are plastidial enzymes that catalyze ring-hydroxylation of α- and β-carotenes to form essential components of the light-harvesting systems (Tian and DellaPenna, 2004; Kim et al., 2009, 2010).
The CYP711 clan appeared early in plant evolution, probably after the CYP97 clan. CYP711 sequences are found in land plants from lycopods but not in mosses. They are usually found singly or in low copies, and with gene duplication (five genes) in monocots. CYP711s catalyze a critical and still unidentified step in the synthesis of the carotenoid-derived shoot-branching hormones strigolactones (Booker et al., 2005). The CYP711 clan has two additional related families present in green algae only: CYP743 and CYP744 have best BLAST hits to CYP711, though these do not cluster with CYP711 in the global tree. They may be long-branch attracted by CYP804A1 from Micromonas. Those sequences might provide some hint concerning the CYP711 clan’s ancestral activity. The CYP727 family is not found in the moss P. patens; however, CYP751 is present and it seems to be a member of the CYP727 clan. CYP751 may just be a long diverging CYP727 ortholog in moss. CYP727 and CYP746 do not have known functions.
The diversification of CYPs in the multiple-family clans parallels land plant evolution. The earliest families are key to adaptation to life on land. These clans, especially CYP71, CYP72 and CYP85, have expanded dramatically. CYP86 is a more conservative clan with only four families. The CYP86 clan is remarkably consistent over time. Figure 2(b) shows that CYP86, CYP94 and CYP704 are in all taxa with adequate sequence sampling from moss on to angiosperms. CYP96 is the youngest family in this clan, first observable in Magnoliids. Apparently, the CYP96 family is an angiosperm invention. The only complete angiosperm genomes that are missing CYP96 are apple and strawberry. Both are Rosales genomes and both have CYP86, CYP94 and CYP704 sequences. Loss of CYP96 may enhance the desirability of Rosales fruits as an animal food and help to disperse the seeds. CYP86 clan members are so far exclusively associated with hydroxylation or epoxidation of fatty acids, fatty alcohols or alkanes and their derivatives (Pinot and Beisson, 2011). The different CYP86 clan enzymes have different preferences for chain length (as seen in fatty acyl transferases) or hydroxylation at the terminal/subterminal positions.
It is more difficult to predict substrate preferences by sequence membership in clans CYP71, CYP72 and CYP85, although phylogeny can support function prediction within families or subfamilies. As mentioned before, the CYP85 clan is essentially devoted to the metabolism of medium to large isoprenoids, including brassinosteroids (CYP85 and CYP90 families). The CYP72 clan is associated with the metabolism of a diversity of fairly hydrophobic compounds including fatty acids and isoprenoids, with the catabolism of hormones (brassinosteroids and gibberellin, GA) and with the biosynthesis of cytokinins. The CYP71 clan represents by itself more than 50% of all plant CYPs and, consequently, a huge diversity of functions. Those cover metabolism of aromatic and aliphatic amino acid derivatives (phenylpropanoids, indolic derivatives, glucosinolates, cyanogenic glucosides), small isoprenoids (mono- and sesquiterpenoids) and some triterpenoid derivatives, alkaloids, fatty acids, precursors of hormones, including GAs (CYP701), and probably also novel hormones to be identified. The CYPs involved in the biosynthesis of hormones and biopolymers (lignin, cutin, sporopollenin) form the ‘root’ (most ancient branches) of the clan.
The CYPs within a single family or branch (subfamily in case of CYP71s) usually metabolize similar/related compounds. However, consecutive steps in the same pathway can also involve completely divergent enzymes. This is particularly true for those earliest evolved CYPs. For example, CYP701A and CYP88A act sequentially in the same pathway as ent-kaurene oxidase and kaurenoic acid oxidase, but belong to the CYP71 and CYP85 clans, respectively. Therefore, it is not always instructive to discuss the functions of CYP clan families based on their sequence relatedness. Instead, an evolutionary approach based on first appearance in plant phylogeny reveals the emergence of new biochemical functions as plants evolved from algae to angiosperms and can serve as a guide to CYP functions.
A snapshot of early plant CYP evolution: Chlamydomonas and Volvox
Chlamydomonas has 40 CYP genes and two pseudogenes in 22 different CYP families. Some of these genes have incomplete gene models and lack expressed sequence tag (EST) support (some may not be expressed). Chlamydomonas contains the six plant clans CYP51, CYP85, CYP97, CYP710, CYP711 and CYP746. It is missing five plant clans including CYP72, CYP74, CYP86, CYP727 and CYP71; the latter is the dominant land plant clan. Even moss contains 41 CYP71 clan genes, suggesting that the CYP71 clan is unique to land plants.
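The clan comparison above is a simple set difference. The following sketch restates it using only the clan names listed in the text; the variable and function names are illustrative.

```python
# The 11 land plant clans named in the text.
LAND_PLANT_CLANS = {
    "CYP51", "CYP74", "CYP97", "CYP710", "CYP711", "CYP727", "CYP746",  # single-family
    "CYP71", "CYP72", "CYP85", "CYP86",                                 # multi-family
}

# Plant clans reported in Chlamydomonas.
CHLAMYDOMONAS_CLANS = {"CYP51", "CYP85", "CYP97", "CYP710", "CYP711", "CYP746"}


def clan_number(clan):
    """Numeric part of a clan name, used only for tidy sorting."""
    return int(clan[3:])


shared = sorted(LAND_PLANT_CLANS & CHLAMYDOMONAS_CLANS, key=clan_number)
missing = sorted(LAND_PLANT_CLANS - CHLAMYDOMONAS_CLANS, key=clan_number)

print("shared with land plants:  ", shared)
print("missing in Chlamydomonas: ", missing)
```

Six clans are shared and five are missing, which together account for all 11 land plant clans.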
A surprising finding is that the Chlamydomonas genome appears to encode only five previously known CYP families, four of which are present in plants and one in fungi. The other 17 Chlamydomonas CYP families may be unique, although preliminary analyses suggest that close orthologs for 17 of the 22 Chlamydomonas families exist in Volvox. CYP55, CYP737, CYP738, CYP740 and CYP770 are found in Chlamydomonas but not in Volvox. Furthermore, there are two groups of Chlamydomonas sequences that are more animal-like than plant-like. One group is most similar to animal CYP4 clan members (see phylogeny in Figure 2b top right). The CYP711 clan members (CYP743, CYP744) are similar to CYP5 (thromboxane A2 synthase in the CYP3 clan). CYP5 is only found in vertebrates. These sequences match well to an unnamed Trichoplax adhaerens sequence in the CYP3 clan (XM_002111744).
The presence of CYP51 and CYP710 was expected since those or their orthologs are found in fungi and more ancient phyla, such as brown algae and Euglenozoa. There is only one CYP710B1 sequence in Chlamydomonas. Most plants have multiple CYP710s, although Populus has only one. The red alga Cyanidioschyzon merolae has only five CYP genes, with two encoding CYP710 polypeptides (they are nearly identical to each other). The third clan shared by land plants and green algae is CYP97. This clan has three subfamilies in all plants, and all three subfamilies are encoded in the Chlamydomonas and Volvox genomes, as expected for photosynthetic organisms where major carotenoids found in land plants are present. The fourth family, CYP746, has no known function. It is present in moss but in no other plants that have been analyzed.
The CYP85 and CYP711 clans are represented in green algae by families divergent from or ancestral to the families found in land plants. Functional analysis of these CYP737, CYP738, CYP739, CYP740, CYP743 and CYP744 families might thus provide some hint about the evolution of the brassinosteroid and strigolactone signals and their ancestor’s function(s). In the same way, the members of the CYP4-like clan found in green algae can help to retrace the evolutionary history of the CYP86 clan since all share conserved signature sequences (Benveniste et al., 1998) and most likely also fatty-acid-metabolizing activity. In Chlamydomonas, CYP85 clan members of the CYP740 and CYP739 families are found in a six-member cluster on an 84-kb block of chromosome 10. The CYP711 clan has 12 members, or one-third of all Chlamydomonas CYPs. If all these CYP711 clan enzymes metabolize apocarotenoids, there would likely be a large number of apocarotenoid derivatives in Chlamydomonas. Those might be related to the presence of retinal and rhodopsin-type photoreceptors, to eyespot formation and function, and to phototaxis (Kreimer, 2009).
CYP55 is present in Chlamydomonas, but not in Volvox, moss and other plant genomes sequenced so far. CYP55 in filamentous fungi is a soluble nitric oxide reductase (P450nor) that does not require a redox partner protein and is essential for denitrification (Kizawa et al., 1991). All other eukaryotic CYPs are membrane bound, while bacterial CYPs are soluble. The CYP55 family is closely related to the bacterial CYP105 and was almost certainly acquired by fungi from bacteria. The origin of the Chlamydomonas CYP55 is unclear; however, the Chlamydomonas CYP55B1 gene does share one intron boundary with the Neurospora crassa CYP55A6.
The absence of the CYP74 clan in Chlamydomonas and Volvox suggests that either the oxylipin pathway was lost in unicellular/colonial algae or the jasmonate pathway evolved in land plants after divergence of the chlorophyte and streptophyte lineages. Another possibility is that CYP74 evolved in animals and was later laterally transferred to land plants via symbiotic bacteria.
Emergence of plants onto land
Green algae and land plant lineages are estimated to have diverged approximately 725 million years ago (Zimmer et al., 2007). The transitions from an aquatic environment to an air/land environment required some biochemical innovations. The transition to multicellular organization also demanded new functions, but the analysis of the colonial Volvox genome in comparison to Chlamydomonas found very few new genes (Pennisi, 2010; Prochnik et al., 2010). The CYP families that are first seen in liverworts (from ESTs) are CYP73, CYP74, CYP88, CYP98, CYP736 and CYP761. More CYP families may exist in liverworts, but the Marchantia genome is not available yet. For example, CYP701 that catalyzes the conversion of ent-kaurene to kaurenoic acid for the formation of GAs (Helliwell et al., 1999) acts upstream of CYP88 in the GA biosynthesis pathway, so CYP701 would presumably be present in liverworts. Sequencing of the Marchantia genome will solve this mystery. It is also curious that the moss Physcomitrella does have CYP701 but does not have a CYP88 gene. It may have been lost in this genome. This seems confirmed by the absence of angiosperm-type active GAs and of the GA perception proteins GID1 and DELLA in moss though they exist in Selaginella (Hirano et al., 2007; Hayashi et al., 2010). In agreement with the presence of CYP701, quite large amounts of ent-kaurenoic acid were detected in P. patens protonemata (Hayashi et al., 2010). It was found essential for chloronemata to caulonemata differentiation, but the active GA-type endogenous compound in moss has still not been identified. The presence of two different CYP74A ESTs (BJ854294, BJ840718) in Marchantia shows that allene oxide synthase and the jasmonate pathway are present or evolving. It has been noted that GAs modulate jasmonate signaling (Hou et al., 2010) so the presence of CYP74 suggests that GAs may also be present or co-evolving, arguing for the existence of CYP701 in liverworts.
CYP761 is specific to liverworts and moss and was lost in later evolution. CYP761 is clearly a CYP71 clan member, one of five CYP71 clan families already known in liverworts, anticipating the expansion of the CYP71 clan in moss to 41 members. The catalytic function of CYP736s is still unknown. Despite their early emergence, they do not seem to fulfill essential function(s) since they are absent in several taxa such as Arabidopsis and monocots. Recent work relates them to plant defense and cross-talk with microorganisms, either pathogens or symbionts. Up to eight copies are present in the grape genome, with the three CYP736A25 variants in tandem on chromosome 7 (FJ828517) found to be differentially regulated at the transcriptional and post-transcriptional levels (via use of different initiation starts, polyadenylation sites and differential splicing) in Xylella fastidiosa susceptible and tolerant lines (Cheng et al., 2010). In soybean, CYP736A34 was identified as highly co-expressed with genes involved in root and Rhizobium-induced nodule development (Guttikonda et al., 2010).
CYP73s (C4H, cinnamate 4-hydroxylases) and CYP98s (C3′H, coumaryl shikimic acid/spermidine meta-hydroxylases) that catalyze the first two hydroxylations of the phenolic ring of phenylpropane units (Ehlting et al., 2006) are present in Marchantia and logically evolved earlier than CYP84s that catalyze the third hydroxylation of the phenolic ring of monolignols. They contribute to the synthesis of lignin in tracheophytes and probably to the formation of hydroids in moss (Weng and Chapple, 2010), but also of a large set of soluble phenolics present in mosses such as caffeic, ferulic and caffeoylquinic acids (Jockovic et al., 2008), flavonoids or lignans (Umezawa, 2003). Both para- and meta-hydroxylation of phenolic precursors are most probably also required to form the precursors of the moss spore wall and of the UV-screening compounds found on moss sporophyte and gametophyte (Clarke and Robinson, 2008). UV-B absorption conferred by phenolic ring para-hydroxylation is usually considered as the main reason for the early evolution of the phenylpropanoid pathway (Weng and Chapple, 2010). Antioxidant activity provided by the second meta-hydroxylation might have become important in land plants. Reactivity of these antioxidants was palliated by conjugation with shikimate/quinate or spermidine that also provided cross-talk with shikimate and polyamine metabolisms (Schoch et al., 2001; Matsuno et al., 2009). The presence of lignin-like material in the cell wall of early non-vascular plants and the sporopollenin-like biopolymer found surrounding the zygotes of freshwater algae such as Charophytes (Weng and Chapple, 2010) suggest that evolution of CYP73 and CYP98 might in fact pre-date that of land plants. CYP73s and CYP98s usually have few duplications in plant genomes, probably due to high constraints imposed by high metabolic fluxes and high channeling. It is thus noteworthy that CYP73 is present in four quite divergent copies in moss and in at least two copies in liverworts, since this seems indicative of early diversification in phenolic metabolism.
CYP75 for hydroxylation of the B-ring of flavonoids has not been reported yet in Marchantia, nor is it found in P. patens or Selaginella. Chalcone synthase is, however, present in moss (Jiang et al., 2006) and various flavonoids, including luteolin (3′-hydroxylated on the B ring), have been extracted from liverworts (Markham et al., 1998) and various moss species (Basile et al., 2003; Ryan et al., 2009). This would suggest convergent evolution of an independent B-ring hydroxylase in bryophytes. CYP788A1 from S. moellendorffii represents a well-documented case of such convergent evolution in lignin synthesis (Weng et al., 2006, 2010). Rather unexpectedly, Selaginella contains a significant amount of S lignin in cortical tissues. Weng and co-workers have shown that CYP788A1 catalyzes both the 3′ and 5′ meta-hydroxylations of p-coumaraldehyde and p-coumaryl alcohol and thus can bypass both reactions catalyzed by CYP98s and CYP84s in angiosperms. Another convergent evolution might have occurred in lignin metabolism, since the red alga Calliarthron cheilosporioides was recently reported to contain lignin with H, G and S units similar to angiosperms (Martone et al., 2009). These early CYPs are thus key enzymes in the evolution of important land plant hormone biosynthesis pathways, signaling/defense pathways and structural molecule biosynthesis pathways that evolved soon after emergence onto land.
By the time bryophytes evolved, seven more CYP families were established. CYP701 is definitely present in moss, though it may be older than that. CYP78, CYP86, CYP94, CYP703, CYP704 and CYP716 are new. CYP78 is once again a hormone biosynthesis CYP, essential for the production of a mobile growth factor mainly associated with flower and fruit size and coordination of apical meristem development (Ito and Meyerowitz, 2000; Miyoshi et al., 2004; Eriksson et al., 2010; Kazama et al., 2010). The chemical structure of this signal is still unknown. CYP78As are found in all angiosperm species and moss, indicating an ancient conserved role. The presence of 3–10 paralogs in all plant genomes (three in moss) and mutant phenotypes suggest that they might catalyze successive steps in the same pathway, as CYP90s do in brassinosteroid metabolism.
CYP703 and CYP704Bs are respectively in-chain and omega fatty acid hydroxylases required to make precursors of the tough spore and pollen wall polymer sporopollenin (Morant et al., 2007; Dobritsa et al., 2009; Li et al., 2010). CYP704B2 in rice was also shown to provide lipid monomers for the synthesis of anther cutin (Li et al., 2010). CYP703 and CYP704 thus have essential roles in the protection of spores against desiccation upon conquest of land. It is interesting to note that while CYP704 is an early branch in the CYP86 clan, that includes the fatty acid hydroxylases of the CYP86 and CYP94 families (also functional in making cutin and suberin monomers to coat the exterior of the plant to protect against water loss; Pinot and Beisson, 2011), CYP703s share a common ancestor with CYP79s converting amino acids into oximes in the clan CYP71 (Bak et al., 2006). It will be interesting to reconstruct the evolution of this clade to the last available common ancestor to possibly understand this unexpected relationship. More can be found on the CYP86 clan in Nelson (2006) and Nelson et al. (2008).
CYP716s do not have a known function, but interestingly have as their closest non-plant relatives CYP26As, involved in hydroxylation of retinoic acid.
A large subset of CYP71 clan families and a few representatives of clans CYP85, CYP72 and CYP727 are found in moss but not in other land plants. Either these families evolved from a precursor after the moss lineage separated from other land plants or they became extinct in the other lineages. Those may provide candidates for pathways resulting from convergent evolution, producing, for example, moss flavonoids.
The next level of plant complexity is the lycopods represented by Selaginella. CYP77, CYP90 and CYP707 appear for the first time in lycopods. CYP77 is the second example, besides CYP703, of a branch evolved in the clan CYP71 that led to a fatty acid in-chain oxygenase. It was predicted by gene co-expression analysis (http://www-ibmp.u-strasbg.fr/~CYPedia/) and then confirmed with recombinant enzymes. CYP77s can either in-chain hydroxylate or epoxidize fatty acids, in particular with 16- and 18-carbon chain lengths to form the multiple precursors of cutin and suberin (Sauveplane et al., 2009). CYP77A6 from Arabidopsis thaliana was confirmed to be involved in the synthesis of flower cutin monomers (Li-Beisson et al., 2009; Pinot and Beisson, 2011). Omega hydroxylases all cluster in clan CYP86. Plant omega and in-chain fatty acid hydroxylases thus seem to result from independent radiations. It is interesting to note a CYP703 duplication in moss while it is present in single copy in all other land plants. This might indicate an early diversification of CYP703 function(s) beyond spore protection before the emergence of CYP77s. A subset of one CYP77A and one CYP77B seems the minimum required in angiosperms. CYP90 and CYP707 are the landmarks of the emergence of brassinosteroid biosynthesis and of abscisic acid (ABA) catabolism. CYP90 is the first family of the CYPs required for brassinosteroid synthesis. CYP90s catalyze the first steps and all the side chain and ring A modifications required for brassinosteroid biosynthesis (Nomura and Bishop, 2006). CYP90Bs, -As, -Ds and -Cs act successively in the brassinosteroid pathway. Each of them is single or just duplicated in each plant genome (Table S1). CYP90Cs that catalyze the 2-hydroxylation of the A-ring are not present in monocots. CYP85 catalyzes ring B oxidation in position 6 and ring extension for the formation of brassinolide. They are first seen in cycads and ginkgo.
This implies the appearance of 6-oxo brassinosteroids with gymnosperms. It is interesting to mention that the presence of brassinosteroids has been reported in one green alga (Hydrodictyon reticulatum) and in Marchantia (Bajguz and Tretyn, 2003). If confirmed, this will mean again convergent evolution of steroid pathways in different phyla. CYP707s are the inactivators of ABA via 8′-hydroxylation to form phaseic acid, and play a key role in regulation of ABA-mediated physiological processes (Mizutani and Todoroki, 2006). Abscisic acid is detected at low concentrations in bacteria, fungi and algae, but seems to result from carotenoid-independent pathways (Hartung, 2010). A CYP707 radiation seems to coincide with the increase in ABA concentrations in plants that occurred upon colonization of the land and after the evolution of CYP97s (Hartung, 2010). Phaseic acid is found in some bryophytes. More genome sequencing might thus reveal that the CYP707s emerged earlier than lycophytes.
Ferns add four new CYP families. CYP75, in the CYP71 clan, is needed for flavonoid B-ring hydroxylation, as mentioned before. CYP75s ‘bloom’ (show lineage-specific duplication as defined by Feyereisen, 2011) in grapevine, probably due to selection for organoleptic properties (high and complex flavonoid content). The three other novelties in ferns are CYP72, CYP709 and CYP735, all members of the CYP72 clan. The oldest members of the CYP72 clan are CYP765 and CYP766. CYP765 is found in moss and liverworts and CYP766 is found only in moss, but these CYP72 clan members do not continue as recognizable families into later plant species. Early in the nomenclature of the CYP72 clan there were only two families: CYP72 and CYP709. As more sequences (CYP714, CYP715, CYP721) were added, the structure in the original families split and CYP72B was renamed CYP734 and CYP709A was renamed CYP735. CYP734A and CYP72C in angiosperms have similar functions, inactivating brassinosteroids (Takahashi et al., 2005; for a more recent update see Thornton et al., 2010). Therefore, while the role of the ancestral fern CYP72 is not established, it might be related to brassinosteroid catabolism. The CYP72 family is, however, quite diverse, with 155 named sequences in the CYP72A subfamily that shows ‘blooms’, as for example in grape. The majority of these have unknown functions; however, demonstrated activity of the first member of the family, CYP72A1, as secologanin synthase in Catharanthus roseus indole alkaloid biosynthesis (Irmler et al., 2000) seems indicative of functions in the production of iridoid or other isoprenoid derivatives. CYP709C1 from wheat is able to hydroxylate fatty acids in ω-1 and ω-2 (Kandel et al., 2005), but its role in planta has to be established. The function of CYP735 is cytokinin biosynthesis (Takei et al., 2004; Sakakibara, 2005; Kamada-Nobusada and Sakakibara, 2009).
Cytokinins are adenine derivatives with an isopentenyl side chain that becomes hydroxylated by CYP735 to form trans-zeatin. Cytokinins with hydroxylated side-chains are also found in moss, but essentially as cis-zeatin rather than trans-zeatin (Kamada-Nobusada and Sakakibara, 2009). This seems to coincide with radiation of an early branch of the cytokinin signal transduction pathway in moss (Pils and Heyl, 2009). It is thus likely that, in this case again, multiple solutions have been found in evolution to create novel branches in metabolic and signaling pathways. This is one further example of the role of CYPs in plant hormone metabolism.
Early gymnosperms represented by cycad and ginkgo sequences introduce six new CYP families: CYP85, CYP92, CYP715, CYP720, CYP724 and CYP727. As mentioned before, CYP85 represents an elaboration of the brassinosteroid pathway that began in lycopods and a switch from 6-deoxo- to 6-oxobrassinosteroids. CYP724 appears in the cycad Zamia vazquezii. CYP724 catalyzes a similar reaction (C22-hydroxylation) to the CYP90B subfamily (Ohnishi et al., 2006; Sakamoto et al., 2006). It is either redundant with CYP90B or it has as yet undetermined functions that would explain its conservation in all sequenced plant genomes. Information on the function of CYP92s is scarce. This CYP71 clan family is not represented in Arabidopsis, so it is not performing an essential function. CYP92B1 has been associated with fatty acid metabolism (Petkova-Andonova et al., 2002) in petunia flower bud. CYP92A5 was found to be responsive to elicitors in tobacco cells (Ralston et al., 2001). CYP92A6 in pea was reported to hydroxylate the C2 of brassinosteroids (see Nomura and Bishop, 2006). This does not seem to be the original function of CYP92s and might illustrate gene recruitment for a specific function in Fabaceae. CYP715 is another addition to the CYP72 clan with an unknown function that seems to be under purifying selection since it is present as a single copy in most plant genomes (two copies in poplar). CYP720 is a member of the CYP85 clan. It is found as a single copy in sequenced dicot genomes where it seems under strong purifying selection, but it is lost in monocots. CYP720 is thus likely to exert an essential function in dicots. The situation is very different in gymnosperms where the CYP720B subfamily has bloomed as a part of a defense-related terpenoid biosynthesis in conifers (Zulak and Bohlmann, 2010). CYP720B subfamily genes are induced by insect attack on trees.
They are unusually promiscuous enzymes capable of catalyzing multiple oxidations on a broad range of diterpenoids in a reaction similar to kaurene oxidase (Ro et al., 2005; Zulak and Bohlmann, 2010). The last new family, CYP727, is the only maintained representative of this clan and has no known function. It is under strong purifying selection in both monocots and dicots, but was lost in Arabidopsis where it was possibly replaced by another enzyme.
Conifers are separated from angiosperms by more than 300 million years. They would be expected to have developed many new CYP families in that time. Progress in sequencing a conifer genome has been slow due to their very large size (20–40 Gb). Targeted isolation of CYP-containing bacterial artificial chromosome (BAC) clones has been accomplished, opening a way forward for CYP analysis at the conifer genome level (Hamberger et al., 2009). Current sampling by ESTs does not find many new CYP families. There are four conifer-specific families: CYP725, CYP750, CYP798 and CYP799. The last two are only known from sequencing and no function has yet been assigned to them. CYP725 is in the taxol biosynthesis pathway (Jennewein et al., 2004; Rontein et al., 2008). CYP725A is found exclusively in the Taxus genus. CYP725B is also found in pine (Pinus) and spruce (Picea). The CYP725 family seems to be an offshoot of the much older CYP716 family. The CYP750 family was reported by Ro et al. (2005) but its function is not determined. The family has some relationship to CYP75 and CYP92, and all three of these families may share a common ancestor.
From gymnosperms to angiosperms
Relationships among the angiosperms are continuing to be revealed by more comprehensive phylogenetic analyses (Moore et al., 2010). Attempts to identify the deepest-branching angiosperm have singled out Amborella trichopoda as the best living candidate (Soltis et al., 2008). Some CYP sequences are already known from Amborella, but the genome has not been sequenced yet. Amborella contains the first appearance of the CYP71, CYP721 and CYP728 families. The number of first appearances may increase as the genome becomes available. This would make sense since the first angiosperm is expected to have many novel CYP-dependent pathways. CYP71 is the largest CYP clan and the largest CYP family in plants. Up to 84 CYP71 genes have been found in a single plant genome (rice). The family is an angiosperm innovation and CYP71A1 from avocado was the first named CYP to be cloned from plants. It was associated with the metabolism of monoterpenoids (Bozak et al., 1992). Several CYP71A and CYP71D enzymes from different plant sources were associated with terpenoid metabolism since then, usually for oxidation of mono- or sesquiterpenoids (Mau and Croteau, 2006). The CYP71 subfamilies have, however, been blooming in a species-specific manner, usually associated with the oxygenation of small molecules, for example of indolic derivatives in Arabidopsis and maize (Nafisi et al., 2007; Frey et al., 2009). These blooms seem to support adaptive responses of the corresponding taxa and were proposed to be associated with speciation (Feyereisen, 2011). CYP721 (CYP72 clan) and CYP728 (CYP85 clan) are of unknown function. The CYP728 family is lost in Arabidopsis.
Magnoliids are the first angiosperm group with good sequence representation and they contain the first appearances of eight CYP71 clan families (CYP79, CYP80, CYP81, CYP82, CYP84, CYP89, CYP93 and CYP706). Magnoliids also have the first CYP96 (CYP86 clan) and CYP714 (CYP72 clan). Many of those are associated with biotic interactions. The CYP79 family takes as substrates amino acids for making a whole range of compounds to repel herbivores and fight pathogens, such as cyanogenic glucosides, beta- and gamma-hydroxynitrile glucosides and glucosinolates (Bak et al., 2006; Nafisi et al., 2006; Bjarnholt and Møller, 2008; Jørgensen et al., 2011), or production of oximes (Møller, 2010). The CYP80 family has been associated with phenolic coupling and the biosynthesis of morphine in opium poppies (Frick et al., 2007) and hyoscyamine in black henbane (Hyoscyamus niger) (Nasomjai et al., 2009). CYP80 makes other alkaloids in other Ranunculales such as Coptis japonica (Ikezawa et al., 2003) and Eschscholzia californica (Pauli and Kutchan, 1998), but CYP80 is also present in many plants that are used as foods (grape, tomatoes and potatoes for example). This may reflect the presence of alkaloid toxins in their leaves, stems or green fruits or CYP80 recruitment for phenolic coupling in other pathways.
CYP84 catalyzes the 5-hydroxylation of the phenolic ring of coniferaldehyde and coniferyl alcohol to form syringyl (S) precursors of lignin and soluble sinapoyl derivatives. The presence of S lignin is typical of angiosperms (Meyer et al., 1996; Osakabe et al., 1999; Ehlting et al., 2006). This results in a new kind of wood (hardwood) that differs from that of conifers, which do not have this CYP family. The guaiacyl/syringyl copolymer is assumed to provide superior mechanical support and a better chemical barrier to pathogens (Weng and Chapple, 2010). CYP81 is another blooming-prone family that seems to contribute to various defense pathways in several taxa. CYP81Es from legumes catalyze B-ring hydroxylations of isoflavonoids (Ayabe and Akashi, 2006). CYP81Q1 from Sesamum indicum was shown to catalyze the formation of methylenedioxy bridges of the lignan sesamin (Ono et al., 2006). In Arabidopsis, CYP81F2 4-hydroxylates the indole ring of indole-3-ylmethylglucosinolate for glucosinolate metabolism resulting in activation of plant defense against pathogens and aphids (Bednarek et al., 2009; Clay et al., 2009; Pfalz et al., 2009). CYP93B/C family members are also flavonoid/isoflavonoid biosynthesis enzymes acting upstream of CYP81s (for review see Ayabe and Akashi, 2006). The CYP93Bs and CYP93Cs are responsible for the oxidative attack of the ring C of flavonoids at positions 2 or 3 for the formation of flavones and licodione (CYP93Bs), or of isoflavones (CYP93C; restricted to legumes). CYP93A is a pterocarpan hydroxylase for the formation of the isoflavonoid phytoalexin glyceollins in pea.
CYP89 is closely related to the older family CYP77. It is probably an offshoot from it. Unlike CYP77, CYP89 has bloomed in several species of monocots and dicots. CYP89A35 in Capsicum annuum (chili pepper) is linked to pathogen response and is induced by salicylic acid and ABA (Kim et al., 2006). The CYP706 family shares a common ancestor with CYP75 and CYP76 families. Only one has an assigned function. CYP706B1 from cotton (Gossypium arboreum) is a (+)-δ-cadinene-8-hydroxylase (Luo et al., 2001), in the pathway for gossypol biosynthesis. Gossypol is a sesquiterpene aldehyde associated with plant defense.
A novelty in the CYP72 clan is CYP714. CYP714D1 from rice acts on GAs to inactivate them by epoxidation (Ma et al., 2006; Zhu et al., 2006). The gene is called EUI (elongated uppermost internode). EUI catalyzes 16α,17-epoxidation of non-13-hydroxylated GAs. This affects plant growth. Other subfamilies in CYP714 might be active on other GAs or other diterpenoids. For example CYP714A2 from A. thaliana is patented as a steviol synthase for the production of the sweetening diterpene derivative steviol (Yamaguchi, 2008).
CYP96 seems to be the most recent radiation in the CYP86 clan. Only the function of CYP96A15 from Arabidopsis is reported (Greer et al., 2007). It is a mid-chain alkane oxidase required for the surface wax biosynthesis. CYP96 might thus provide the wax precursors for reinforced surface protection against water loss.
Three CYP families appear in monocots that are also found in many dicots, namely CYP87, CYP722 and CYP733. All three of them belong to the CYP85 clan. CYP87A3 was spotted as an auxin- and light-responsive gene in rice coleoptile by Chaban et al. (2003). These authors suggested it might be part of a feedback mechanism to suppress auxin-induced growth. CYP722 and CYP733 are represented by single-copy genes in most plant genomes, indicating strong purifying selection and a possible role in plant development. CYP733 is, however, lost in Arabidopsis. Both are related to the ABA-metabolizing CYP707.
Monocots have two grass-specific families, CYP99 and CYP723. CYP99 was named before the CYP71 family was populated with many sequences. CYP99 now falls inside the CYP71 family on trees, illustrating the phenomenon of family creep in large CYP families. CYP83 (in glucosinolate biosynthesis; Nafisi et al., 2006) and CYP726 (fatty acid epoxidase from Euphorbia; Cahoon et al., 2002) have also merged into the ever-expanding CYP71 family. CYP99 is a diterpene oxidase from the momilactone biosynthesis gene cluster in rice (Shimura et al., 2007; Wang et al., 2011). Rice CYP723A2 is 40% identical to CYP89F1 from rice and seems to share a common ancestor with CYP77 and CYP89. It has no assigned function.
The lower eudicots debut two new families, CYP719 and CYP749. CYP719 results from CYP701-related radiation in the CYP71 clan, restricted to Ranunculales and Aristolochiales. CYP719 enzymes are part of the same alkaloid synthesis pathways as CYP80s to form methylenedioxy bridges (Ikezawa et al., 2003, 2008; Gesell et al., 2009; Díaz Chávez et al., 2011). The CYP749 family in the CYP72 clan is found in rosids, asterids and Ranunculales, but it is absent from Arabidopsis. No function has been assigned to CYP749.
CYP718 in the CYP85 clan is first seen in Vitis vinifera and is present also in many rosids and a few asterids. This family seems to be an offshoot of CYP716 in the core eudicots. Expressed sequence tags of CYP718 mRNAs are very rare. This gene seems to be expressed only in special conditions or for short times. Its function is not known. CYP712, in the CYP71 clan, also emerges in Vitales. It probably results from the evolution of a tandem duplicate of CYP93 since CYP712 genes occur adjacent to CYP93s in several plant genomes (Nelson et al., 2008) and both are closely related paralogs (Figure 1). This physical clustering also suggests that CYP93s and CYP712 may catalyze successive steps in the same pathway(s) in different plants (Osbourn and Field, 2009).
The remaining families, CYP702, CYP705 and CYP708, are Brassicales specific. CYP702 and CYP708 in clan 85 share a common ancestor with CYP87 (Figure 1). CYP705 is clearly a further diversification of CYP712, based on sequence identities in grape and soybean (Nelson et al., 2008). CYP705 (clan 71) and CYP708 (clan 85) seem closely linked in their co-evolution for the synthesis of triterpenoid derivatives (Field and Osbourn, 2008). They belong to common gene clusters exemplified by the operon-like cluster leading to the synthesis of thalianol-derived triterpenoids in Arabidopsis. This cluster assembles on chromosome 5 within 30 kilobases, a terpene synthase (the thalianol synthase), a thalianol hydroxylase (CYP708A2), a thalianol-diol desaturase (CYP705A5) and a BADH acyltransferase (Field and Osbourn, 2008). Similar gene associations are found in the Arabidopsis genome, that are predicted to lead to other triterpenoids. Other clusters of genes for the synthesis of defense compounds are also found in rice, leading to the formation of avenacin and of momilactone, or in maize for the formation of benzoxazinones (Osbourn and Field, 2009). This gene organization was proposed to facilitate inheritance of genes in the same pathway to avoid accumulation of deleterious intermediates. Some CYP702 and CYP705 genes are clustered in Arabidopsis. CYP702s are thus likely to also be related to triterpenoid metabolism.
Arabidopsis provides another interesting example of concerted gene evolution to build a novel pathway (Matsuno et al., 2009). It concerns the older and conserved CYP98 family and involves successive retroposition of the parent gene, strong positive selection for neo- or subfunctionalization, and a second tandem duplication for neofunctionalization. The resulting novel pathway leads to successive 3′- and 5′-hydroxylations of the phenolic ring of p-coumaric acid conjugated to spermidine for the synthesis of a major component of the pollen coat. This event occurred during evolution of the Brassicaceae and is also detected in Brassica napus. It is the first example of pathway birth via retroposition in higher plants.
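The first appearances described in this section, from bryophytes onward, can be collected in a small lookup table. The following Python sketch is only an illustration: the group labels, the `FIRST_APPEARANCE` name and the `first_seen` helper are assumptions made here, and the table simply restates the families listed in the text, so it will need revision as more genomes are sequenced.

```python
# First appearances of CYP families as listed in the text, from bryophytes
# onward. Group labels are shorthand chosen for this sketch, not formal taxa.
FIRST_APPEARANCE = {
    # CYP701 is present in moss but, per the text, may be older than that.
    "bryophytes": ["CYP701", "CYP78", "CYP86", "CYP94",
                   "CYP703", "CYP704", "CYP716"],
    "lycopods": ["CYP77", "CYP90", "CYP707"],
    "ferns": ["CYP75", "CYP72", "CYP709", "CYP735"],
    "early gymnosperms": ["CYP85", "CYP92", "CYP715",
                          "CYP720", "CYP724", "CYP727"],
    "conifers": ["CYP725", "CYP750", "CYP798", "CYP799"],
    "Amborella": ["CYP71", "CYP721", "CYP728"],
    "magnoliids": ["CYP79", "CYP80", "CYP81", "CYP82", "CYP84",
                   "CYP89", "CYP93", "CYP706", "CYP96", "CYP714"],
    "monocots": ["CYP87", "CYP722", "CYP733"],
    "grasses": ["CYP99", "CYP723"],
    "lower eudicots": ["CYP719", "CYP749"],
    "Vitales": ["CYP718", "CYP712"],
    "Brassicales": ["CYP702", "CYP705", "CYP708"],
}


def first_seen(family):
    """Return the group in which a CYP family first appears, or None."""
    for group, families in FIRST_APPEARANCE.items():
        if family in families:
            return group
    return None


if __name__ == "__main__":
    for fam in ("CYP707", "CYP84", "CYP705"):
        print(fam, "->", first_seen(fam))
```

Families established before bryophytes (for example CYP73 and CYP98) are deliberately left out, so `first_seen` returns `None` for them.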
Plant CYP evolution in a narrower time window: comparison of rice and Brachypodium distachyon
The analysis of the CYPs presented so far has focused on long time-scales across fairly deep phylogenetic distances. The CYPs do continue to evolve in smaller time-scales and even within a single taxon. Here we show a detailed comparison of two grasses, rice and B. distachyon. Brachypodium diverged from rice ∼40–53 million years ago (The International Brachypodium Initiative, 2010). Figures 3 and 4 show phylogenetic trees comparing 329 rice sequences with 226 Brachypodium sequences. These are the full-length, non-pseudogenes from the two genomes, with a few nearly intact pseudogenes included (these have P on the end of the CYP name). The Brachypodium branches are labeled in red and the rice branches are green. Orthologs appear as red and green pairs. Note that clear orthologs are given the same CYP name across the two species.
As mentioned above, genes with essential functions in signaling or development are expected to be under strong purifying selection. Duplicates tend to be eliminated and paralogs are easily recognized. Duplicated genes have more chances to be duplicated again (Feyereisen, 2010). Thus more recently evolved families are more versatile and likely to provide starting material for positive selection and for generating diversity to match adaptive requirements. This is well illustrated by the trees in Figures 3 and 4.
Figure 3 includes only the CYP71 clan members in 19 families. Based on phylogeny and the tree in Figure 3, CYP99 is inside the CYP71 family and CYP723 is inside the CYP89 family, though the best match is only 39% to other CYP89 members. These two families have been tagged as grass specific in Figure 2. In fact, they might be better viewed as subfamilies of CYP71 and CYP89, respectively. Figure 4 shows the remaining nine CYP clans with 27 CYP families. This tree has 139 rice branches and 113 Brachypodium branches. Genes in this tree feature most CYPs with functions in hormone synthesis and catabolism, carotenoids and oxylipin metabolism. There are 46 easily recognizable orthologous pairs, with a few more that are probable based on the tree. Approximately 40% of the Brachypodium sequences are in orthologous pairs with rice, twice as many as seen in the CYP71 clan. The largest bloom is seen in the CYP728C subfamily with seven genes in rice and only one in Brachypodium. The core set of CYPs within each clan is conserved as orthologs, while some blooms occur in clans CYP72, CYP85 and CYP86. Gene losses seem to be less common, since it is rare to find a gene cluster that is only in rice or only in Brachypodium. The CYP71 tree in Figure 3 has 23 pairs of orthologs (one pair is with a sorghum sequence included because rice did not have an ortholog to CYP71AM1). There are 190 rice sequences and 113 Brachypodium sequences in the tree. Only 19% of the 133 possible orthologs come up as true orthologs. This implies considerable divergence between the two grass genomes, with expansion of some families occurring most often in rice – see the CYP71X subfamily with 11 genes in rice and only one in Brachypodium (one o’clock), or the CYP71Z subfamily with a ratio of 8 : 1 in favor of rice (12 o’clock). The direction is, however, not always in rice’s favor: CYP89E is 8 : 1 in favor of Brachypodium (4 o’clock). 
These divergences of the grass CYPomes permit one to anticipate considerable divergences in their metabolomes. In addition, the compared CYPomes in Figures 3 and 4 nicely illustrate how each gene bloom and related metabolic diversification is initiated from a pre-existing gene and core pathway.
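The counts quoted for the two trees can be cross-checked with a few lines of arithmetic. The Python sketch below is a minimal illustration under one stated assumption: the ortholog fraction is computed as the number of orthologous pairs divided by the number of Brachypodium sequences in each tree. That choice of denominator is made here for illustration, so the CYP71 value need not match the 19% quoted in the text, which is expressed relative to 133 possible orthologs. The names `TREES` and `ortholog_fraction` are hypothetical.

```python
# Sequence and ortholog-pair counts quoted in the text for each tree.
TREES = {
    "CYP71 clan (Figure 3)": {"rice": 190, "brachypodium": 113, "pairs": 23},
    "other clans (Figure 4)": {"rice": 139, "brachypodium": 113, "pairs": 46},
}


def ortholog_fraction(pairs, n_sequences):
    """Fraction of sequences that sit in an orthologous pair.

    Assumption of this sketch: the denominator is the number of
    Brachypodium sequences in the tree.
    """
    return pairs / n_sequences


if __name__ == "__main__":
    # The two trees together should account for all compared sequences.
    total_rice = sum(t["rice"] for t in TREES.values())
    total_brachy = sum(t["brachypodium"] for t in TREES.values())
    print(f"rice: {total_rice}, Brachypodium: {total_brachy}")

    for name, t in TREES.items():
        frac = ortholog_fraction(t["pairs"], t["brachypodium"])
        print(f"{name}: {frac:.1%} of Brachypodium sequences in pairs")
```

The totals recover the 329 rice and 226 Brachypodium sequences stated for the comparison, and the ratio between the two fractions reflects the observation that orthologous pairs are about twice as frequent outside the CYP71 clan.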
As more sequence information becomes available, an increasing amount of information can be extracted from comparison of the plant CYPomes. In the near future, enzymes in taxon-specific pathways will be identified from the comparison of their respective CYPomes. This information will be coupled to search for operon-like clusters (Osbourn and Field, 2009), gene-coexpression (see, e.g. http://www-ibmp.u-strasbg.fr/~CYPedia/; Ehlting et al., 2008) and proteome analyses, for accelerated pathway reconstruction. Focus on taxon-specific gene bursts will in addition provide huge resources for functionalization of target molecular structures when required for industrial applications.
Readily available data identify a subset of genes that appear under high purifying selection. Those are likely to serve important functions in higher plants. They also point to the duplication of genes encoding enzymes with well-documented functions in major pathways. This, for example, is the case for enzymes in phenylpropanoid metabolism. This pathway is so far described as linear. Gene duplication will need to be taken into account to incorporate additional complexity or diversification in such pathways. Given present data, this complexity may vary in different taxa. The comparative plant CYPome also identifies potential algal or moss homologs of genes with important functions in angiosperms. Determination of their functions will support reconstruction of the evolutionary processes that led to the emergence of angiosperms and highlight alternative solutions that have been found in different lineages to support important signaling or structural functions. Conversely, the comparative CYPome highlights alga- or moss-specific enzymes. Those are expected to be representative of alga or moss metabolic requirements. As recently proposed by Weng et al. (2010) in the case of lignin engineering, some of the enzymes evolved in lower plants might provide interesting solutions or shortcuts for the improvement of higher plants.
DW is grateful to the European Commission for the funding of the SmartCell project. Financial support of the Agence Nationale pour la Recherche to PHENOWALL (ANR-10-BLAN) is also gratefully acknowledged.