Genomic clustering of cyanogenic glucoside biosynthetic genes aids their identification in Lotus japonicus and suggests the repeated evolution of this chemical defence pathway

Authors


(fax +45 35333300; e-mail frro@life.ku.dk).

Summary

Cyanogenic glucosides are amino acid-derived defence compounds found in a large number of vascular plants. Their hydrolysis by specific β-glucosidases following tissue damage results in the release of hydrogen cyanide. The cyanogenesis deficient1 (cyd1) mutant of Lotus japonicus carries a partial deletion of the CYP79D3 gene, which encodes a cytochrome P450 enzyme that is responsible for the first step in cyanogenic glucoside biosynthesis. The genomic region surrounding CYP79D3 contains genes encoding the CYP736A2 protein and the UDP-glycosyltransferase UGT85K3. In combination with CYP79D3, these genes encode the enzymes that constitute the entire pathway for cyanogenic glucoside biosynthesis. The biosynthetic genes for cyanogenic glucoside biosynthesis are also co-localized in cassava (Manihot esculenta) and sorghum (Sorghum bicolor), but the three gene clusters show no other similarities. Although the individual enzymes encoded by the biosynthetic genes in these three plant species are related, they are not necessarily orthologous. The independent evolution of cyanogenic glucoside biosynthesis in several higher plant lineages by the repeated recruitment of members from similar gene families, such as the CYP79s, is a likely scenario.

Introduction

Plants have evolved a large spectrum of chemical defence compounds, and while some are produced by only a limited number of related species, others, such as cyanogenic glucosides, are more widely distributed. Over 60 different cyanogenic glucosides, amino acid-derived α-hydroxynitrile glucoside compounds, are known from over 2600 plant species ranging from ferns to angiosperms, including a disproportionately large number of crops (Seigler, 1975; Conn, 1980; Jones, 1998). Upon tissue disruption, cyanogenic glucosides are degraded by specific β-glucosidases, resulting in the release of hydrogen cyanide, which provides a defence mechanism against generalist herbivores (Morant et al., 2008). Cyanogenesis is often regarded as an evolutionarily ancient plant defence mechanism. A single evolutionary origin is implied by the fact that all presently identified enzymes for the first committed step, the conversion of an amino acid into an oxime, are cytochrome P450s belonging to the CYP79 family (for reviews, see Bak et al., 2006; Bjarnholt and Møller, 2008).

The first genes encoding enzymes for cyanogenic glucoside biosynthesis were identified in Sorghum bicolor using biochemical approaches (Koch et al., 1995; Bak et al., 1998a; Jones et al., 1999), which provided the basis for the identification of genes in other plant species by sequence homology. Sorghum produces the aromatic cyanogenic glucoside dhurrin, derived from tyrosine, and its biosynthesis requires three sequential steps involving two cytochrome P450 enzymes and a UDP-glucosyltransferase. The first cytochrome P450, CYP79A1, converts tyrosine into an oxime intermediate. The second cytochrome P450, CYP71E1, converts the oxime into a hydroxynitrile, which is then stabilized by glucosylation catalysed by the UDP-glucosyltransferase UGT85B1, resulting in dhurrin formation (Figure 1).

Figure 1.

 The hydroxynitrile glucoside biosynthetic pathways in S. bicolor and L. japonicus.
In S. bicolor, the synthesis of dhurrin from tyrosine involves CYP79A1 (Koch et al., 1995), CYP71E1 (Bak et al., 1998a) and UGT85B1 (Jones et al., 1999). L. japonicus produces the cyanogenic glucosides linamarin (from valine) and lotaustralin (from isoleucine), as well as non-cyanogenic hydroxynitrile glucosides called rhodiocyanoside A and D (from isoleucine). CYP79D3 is the predominant enzyme in leaves; CYP79D4 is expressed in the root (Forslund et al., 2004). The isoleucine-derived 2-methylbutanal oxime intermediate is used in the synthesis of lotaustralin and rhodiocyanosides A and D. CYP736A2 is involved in the production of cyanogenic glucosides, while rhodiocyanoside production depends on the product encoded by the Rho locus. The UDP-glucosyltransferases identified belong to the UGT85 family.

Similarly, in other plant species, members of the CYP79 family have been shown to catalyse the first step in cyanogenic glucoside biosynthesis. In cassava (Manihot esculenta), the cyanogenic glucosides linamarin and lotaustralin are synthesized from the amino acids valine and isoleucine, respectively, involving the activity of CYP79D1 and CYP79D2 (Andersen et al., 2000). The model legume Lotus japonicus contains two genes, CYP79D3 and CYP79D4, that encode cytochrome P450 enzymes involved in the production of linamarin and lotaustralin and related non-cyanogenic β- and γ-hydroxynitrile glucosides called rhodiocyanosides (Forslund et al., 2004). Rhodiocyanoside biosynthesis is thought to diverge from cyanogenic glucoside biosynthesis at the level of the hydroxynitrile intermediate, and natural variation for the presence or absence of rhodiocyanosides involves a genetic locus that we refer to here as Rho (Figure 1) (Bjarnholt et al., 2008). In white clover (Trifolium repens), an adaptive defence polymorphism controlling the presence or absence of cyanogenic glucosides in natural populations is conferred by variation at the Ac locus. Ac encodes the CYP79D15 enzyme, which catalyses the first step in lotaustralin and linamarin biosynthesis (Olsen et al., 2008). The production of oximes from amino acids by members of the CYP79 family is also part of the biosynthesis of non-cyanogenic defence compounds such as glucosinolates and camalexin (Bak et al., 1998b; Hull et al., 2000; Reintanz et al., 2001; Glawischnig et al., 2004). Functional homologues of the sorghum CYP71E1 and UGT85B1 enzymes are more difficult to identify because they are members of large protein families and lack conserved sequence motifs indicating their specific metabolic function, and only recently have they been identified in cassava (Jørgensen et al., 2011; Kannangara et al., 2011).

The legume Lotus japonicus is used as a genetic model to study cyanogenesis, and we recently reported a high-throughput genetic screen for cyanogenesis deficient mutants (Takos et al., 2010). Here we show that the cyd1 locus corresponds to the CYP79D3 gene, providing genetic evidence that CYP79D3 is the enzyme that produces cyanogenic glucosides in leaves of mature Lotus plants. The genomic region containing CYP79D3 contains at least five loci encoding enzymes involved in hydroxynitrile glucoside biosynthesis. These include the CYP736A2 and UGT85K3 genes, which complete the cyanogenic glucoside biosynthetic pathway, and the genetic locus responsible for natural variation in rhodiocyanoside biosynthesis (Rho). Clustering of genes involved in cyanogenic glucoside biosynthesis was also observed in cassava and sorghum, but the three genomic regions are very diverse in structure. Based on the available plant genome sequence data, it is likely that the recruitment of these sets of genes for cyanogenic glucoside synthesis represents independent events.

Results

The cyanogenesis deficient1 mutant is defective in CYP79D3

A screen for mutants defective in cyanogenesis (Takos et al., 2010) identified cyd1 as an acyanogenic mutant that, under the screening conditions, did not contain hydroxynitrile glucosides (either cyanogenic glucosides or rhodiocyanosides) in leaves. As synthesis of all hydroxynitrile glucosides in the cyd1 mutant was severely affected, a defect in an early step of the biosynthetic pathway, possibly in the formation of the oxime from the amino acid precursors, was suggested (Figure 1). Two paralogous cytochrome P450 genes, CYP79D3 and CYP79D4, had been shown previously to convert valine and isoleucine to the corresponding oximes, and of these CYP79D3 is predominantly expressed in leaves (Forslund et al., 2004). To determine the chromosomal position of the cyd1 locus, the mutant was crossed with L. japonicus cv. Gifu B-129 (Handberg and Stougaard, 1992). Analysis of F2 plants for the acyanogenic phenotype demonstrated close linkage of the recessive cyd1 locus with the SSLP marker TM1185 on genomic contig CM0241 of chromosome 3. In the L. japonicus genome sequence (Sato et al., 2008; http://www.kazusa.or.jp/lotus), both CYP79D3 and CYP79D4 are localized on contig CM0241 and estimated to be 240 kb apart. The sequences of the CYP79D3 (CM0241.700) and CYP79D4 (CM0241.310) genes were determined in the cyd1 mutant, and while both exons of the CYP79D4 gene were intact, it was not possible to amplify the second exon of CYP79D3 from the cyd1 mutant.

The Lotus genome database contains a 1926 bp fragment, LjSGA_030566, that contains a sequence that is 98% identical at the nucleotide level to part of the CYP79D3 intron and the entire second exon. LjSGA_030566 is thought to be a pseudo-gene lacking the first exon, and its precise chromosomal location is presently unknown. A number of single nucleotide differences, and most notably a 32 bp insertion in the LjSGA_030566 sequence, distinguish it from CYP79D3. Primers spanning this insertion site amplified two bands from genomic DNA of the wild-type MG-20 (191 and 223 bp), but only the larger LjSGA_030566-derived band was amplified from the cyd1 mutant (Figure 2a). Lack of a CYP79D3 transcript in the cyd1 mutant also indicated that the gene was defective (Figure 2b).
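The logic of this PCR assay can be sketched in a few lines of Python. The amplicon sizes and the 32 bp insertion are taken from the text; the function name and the size tolerance are hypothetical and serve only as an illustration.

```python
# Amplicon sizes reported in the text for primers spanning the insertion site.
INSERTION_BP = 32
CYP79D3_AMPLICON_BP = 191
# The pseudo-gene fragment LjSGA_030566 carries the insertion, so its band is larger.
PSEUDOGENE_AMPLICON_BP = CYP79D3_AMPLICON_BP + INSERTION_BP


def has_cyp79d3_second_exon(band_sizes_bp, tolerance_bp=5):
    """Return True if a band matching the CYP79D3 amplicon is present.

    tolerance_bp is an assumed allowance for sizing error on a gel.
    """
    return any(abs(size - CYP79D3_AMPLICON_BP) <= tolerance_bp
               for size in band_sizes_bp)


if __name__ == "__main__":
    print("wild-type MG-20:", has_cyp79d3_second_exon([191, 223]))
    print("cyd1 mutant:", has_cyp79d3_second_exon([223]))
```

Because the pseudo-gene band is amplified from both genotypes, it acts as an internal positive control for the PCR itself.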

Figure 2.

 The cyd1 mutant is defective in CYP79D3.
(a) PCR amplification showing the 32 bp insertion polymorphism between CYP79D3 and LjSGA_030566 using MG-20 and cyd1 genomic DNA as template.
(b) Absence of the CYP79D3 transcript in the cyd1 mutant was determined by quantitative RT-PCR.
(c) Restoration of cyanogenesis in leaves of the cyd1 mutant transformed with a 35S:CYP79D3 construct was detected using HCN-sensitive Feigl–Anger paper.
(d) Extracted ion chromatogram of the MG-20 parental line, the cyd1 mutant, and the cyd1 mutant complemented with the 35S:CYP79D3 construct. Apical leaves of 5-month-old tissue culture-grown plants were analysed. Extracted ion peaks are for sodium adducts: linamarin (m/z 270, cyan); rhodiocyanoside A and D (m/z 282, red); lotaustralin (m/z 284, blue).

To confirm that loss of a functional CYP79D3 gene was responsible for the lack of cyanogenesis in the cyd1 mutant, the gene was re-introduced by transforming the cyd1 mutant with the CYP79D3 coding region under the control of the 35S CaMV promoter. Cyanogenesis was restored in multiple independent transformed lines as shown for line 5 in Figure 2c. Complementation of the cyd1 phenotype was further analysed by LC-MS of leaf extracts, and showed not only restored biosynthesis of lotaustralin and linamarin, but also of the non-cyanogenic hydroxynitrile glucosides rhodiocyanoside A and D (Figure 2d).
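The m/z values used for the extracted ion chromatograms can be reproduced from elemental compositions. The sketch below assumes the formulas C10H17NO6 (linamarin), C11H19NO6 (lotaustralin) and C11H17NO6 (rhodiocyanosides A and D, which are isomers); these formulas are not stated in the text, and the atomic masses are standard monoisotopic values.

```python
# Standard monoisotopic masses (u).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
SODIUM = 22.9897693
ELECTRON = 0.00054858

# Elemental compositions (assumed; not given in the text).
FORMULAS = {
    "linamarin": {"C": 10, "H": 17, "N": 1, "O": 6},
    "lotaustralin": {"C": 11, "H": 19, "N": 1, "O": 6},
    "rhodiocyanoside A/D": {"C": 11, "H": 17, "N": 1, "O": 6},
}


def sodium_adduct_mz(formula):
    """m/z of the singly charged [M+Na]+ ion for an elemental composition."""
    neutral = sum(MONOISOTOPIC[element] * count
                  for element, count in formula.items())
    return neutral + SODIUM - ELECTRON


if __name__ == "__main__":
    for name, formula in FORMULAS.items():
        print(f"{name}: [M+Na]+ = {sodium_adduct_mz(formula):.4f}")
```

Rounded to nominal mass, these give m/z 270, 284 and 282, matching the values quoted for Figure 2(d).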

Contig CM0241 contains a UGT85 that is involved in cyanogenic glucoside biosynthesis

The first UDP-glucosyltransferase identified as glucosylating hydroxynitriles was SbUGT85B1, which is involved in dhurrin production in sorghum (Jones et al., 1999). In L. japonicus, the CM0241 contig contains a member of another UGT85 sub-family (gene CM0241.610) that has been assigned the name UGT85K3 following the recommendations for nomenclature of Mackenzie et al. (1997). A second gene, UGT85K2 (T45I18.30), encodes a protein that is 86% identical at the amino acid level to UGT85K3. UGT85K2 is located further down on chromosome 3 at approximately 31.6 cM (approximately 18 cM from CM0241). Both UGT85K2 and UGT85K3 show 41% amino acid identity with SbUGT85B1, and 56% identity with UGT85K4 and UGT85K5, which are involved in cyanogenic glucoside production in cassava (Kannangara et al., 2011).
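Percentage identity values such as those above depend on how the alignment is scored. As a minimal sketch, the function below computes identity over the aligned, ungapped columns of a pre-aligned sequence pair. This is one common convention and not necessarily the one used for the values reported here; the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment, not real UGT85 sequence data.
    print(round(percent_identity("MKT-LLA", "MRTALLG"), 1))
```

Other conventions divide by the full alignment length including gap columns, which gives lower values for gapped alignments.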

The coding regions of UGT85K2 and UGT85K3 were amplified as cDNAs from MG-20 and expressed in Escherichia coli. To test the ability of UGT85K2 and UGT85K3 to glucosylate the different hydroxynitriles, E. coli extracts were incubated with 20 mM of the putative acceptors and 3.3 μM UDP-[14C]-glucose as donor. The reaction products formed were visualized on TLC plates and verified by LC-MS analysis. Both UGT85K2 and UGT85K3 were able to glucosylate acetone cyanohydrin (the linamarin aglycone), and 2-hydroxy-2-methylbutyronitrile (the lotaustralin aglycone) (Figure 3).

Figure 3.

 Glucosylation of hydroxynitrile substrates by UGT85K2 and UGT85K3.
(a) Activity of E. coli-expressed UGT85K2 and UGT85K3 on acetone cyanohydrin (AC) and 2-hydroxy-2-methylbutyronitrile (HBN) substrates using UDP-[14C]-glucose (G) as donor and analysed by TLC.
(b) Unlabelled standard for linamarin (lin) and lotaustralin (lot) run on the same plate and visualized following charring by H2SO4.

The CYP736A2 gene completes a functional biosynthetic pathway for cyanogenic glucoside biosynthesis in Lotus japonicus

In sorghum, the second step in cyanogenic glucoside biosynthesis is mediated by CYP71E1 (Bak et al., 1998a), and a functional homologue, CYP71E7, was recently characterized in cassava (Jørgensen et al., 2011). CYP71E7 was identified using CYP71E1 as the query in a BLASTp search of the cassava genome sequence, and the proteins show 49% identity. Using either CYP71E1 or CYP71E7 to query the Lotus genome sequence identified members of the CYP83E sub-family as the most similar, with identity scores of approximately 40%. Using CYP71E7 as the query, the highest identity score (42%) was obtained for CYP83E4 (CM1089.70), and no member of the CYP71E sub-family was identified in the Lotus genome. However, the CM0241 contig contains a member of another cytochrome P450 gene family named CYP736A2 (CM0241.850), as well as three closely related pseudo-genes (CM0241.540, CM0241.880 and CM0241.940). The CYP736A2 amino acid sequence shows approximately 35% identity with CYP71E1 and CYP71E7. The CYP71, CYP83 and CYP736 families all belong to the CYP71 clan, which contains over 50 CYP families (Nelson and Werck-Reichhart, 2011). Within the CYP71 clan, the three families are part of the same sub-branch formed by seven CYP families. Their phylogenetic relationship is shown in Figure 4. The genome sequences of sorghum, cassava and Lotus were queried using the CYP71E1, CYP71E7, CYP736A2 and CYP83E4 sequences. The sequences with the highest homology scores from each BLASTp search were collected, and a phylogenetic tree was constructed using the MEGA5 package (Figure 4) (Tamura et al., 2011). Additional protein sequences of members characteristic for the relevant CYP71 clan families, mostly from other plant species, were included in the analysis. Families just outside this sub-branch of the CYP71 clan are represented by CYP76A3, CYP80A1, CYP92A1 and CYP786A1, and anchor the tree.

Figure 4.

  Phylogenetic analysis of CYP736A2 in relation to the CYP71Es and CYP83Es.
The protein sequences most closely related to CYP736A2, CYP71E1, CYP71E7 and CYP83E4 (indicated with black circles) were collected from the Lotus, sorghum and cassava genomes. A phylogenetic tree was constructed by the neighbour-joining method using a 1000-replicate bootstrap test. Gene identifiers are those used in the respective genome databases; additional assigned names are indicated. Characteristic members of several families from the CYP71 clan were included for comparison (indicated with diamonds).

As shown in Figure 4, the CYP71 and CYP83 families are most closely related to each other, and group with the CYP726 and CYP99 families. These four families are now considered to have merged due to expansion of the CYP71 family as more sequences have become available (Nelson and Werck-Reichhart, 2011). The CYP736 family containing the Lotus CYP736A2 gene is not found in monocots, and consequently no sorghum homologues were identified. The CYP736 family is most closely related to the CYP84 and CYP750 families, and is clearly distinct from the CYP71E sub-family (Figure 4).

The genomic co-localization of CYP79D3 and UGT85K3 with CYP736A2 suggested that the latter might encode the second step of the cyanogenic glucoside pathway. To test this possibility, all three genes were cloned into the pJAM1502 vector under the control of the CaMV 35S promoter, and their activity was tested by Agrobacterium-mediated transient expression in Nicotiana benthamiana. Testing the L. japonicus genes separately and in various combinations showed that only the triple combination of CYP79D3, CYP736A2 and either UGT85K2 or UGT85K3 resulted in the production of linamarin and lotaustralin (Figures 5a and S1). Interestingly, the production of rhodiocyanosides was not observed. The cassava gene CYP71E7 was able to effectively substitute for CYP736A2, but the Lotus CYP83E4 was not (Figure 5b,c). The in planta enzymatic activities provide strong support for the idea that these genes make up the cyanogenic glucoside biosynthetic pathway in L. japonicus, and that the genes encoding the pathway are clustered on chromosome 3.

Figure 5.

 Reconstitution of the L. japonicus cyanogenic glucoside biosynthetic pathway by transient expression in Nicotiana benthamiana.
(a) Extracted ion chromatogram of tobacco leaves co-infiltrated with the L. japonicus genes CYP79D3, CYP736A2 and UGT85K3. Extracted ion peaks are for sodium adducts: linamarin (m/z 270, cyan); rhodiocyanoside A and D (m/z 282, red); lotaustralin (m/z 284, blue), as indicated in Figure 2(d).
(b) As (a), with cassava CYP71E7 replacing CYP736A2.
(c) As (a), with CYP83E4 replacing CYP736A2.
(d) As (a), with the mutated CYP736A2 (R366K) from cyd4.

The cyd4 mutant is defective in CYP736A2

In planta expression demonstrated the role of CYP736A2 in the biosynthesis of the cyanogenic glucosides linamarin and lotaustralin, and that CYP736A2 was not involved in the synthesis of rhodiocyanosides (see Figure 5). Consequently, a putative mutant in the CYP736A2 gene was expected to show no or reduced levels of cyanogenic glucosides but wild-type levels of rhodiocyanosides, resulting from the presence of an intact Rho gene (Figure 1). One of the lines obtained in the genetic screen for cyanogenesis-deficient mutants, referred to here as cyd4, displayed the predicted metabolic profile, with a >60% reduction in lotaustralin and wild-type levels of rhodiocyanosides (Figure 6). Sequencing of the CYP736A2 gene in the cyd4 mutant showed a G→A nucleotide substitution, changing the arginine at position 366 to a lysine. R366 in CYP736A2 is the invariant arginine residue of the EXXR motif that is present in virtually all cytochrome P450 proteins, and is essential for formation of the tertiary structure (Hasemann et al., 1995; Rupasinghe et al., 2006). When the mutated version of CYP736A2 was cloned as a cDNA from cyd4 and transiently expressed in Nicotiana benthamiana in combination with CYP79D3 and UGT85K3, no production of either linamarin or lotaustralin was observed (Figure 5d). These data provide genetic support for the role of CYP736A2 in the biosynthesis of cyanogenic glucosides in L. japonicus.
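A single G→A transition can convert arginine into lysine only at certain codons. The sketch below enumerates them using the standard genetic code, assuming that the substitution is read on the coding strand; the codon actually present at position 366 is not given in the text.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def arg_to_lys_by_g_to_a():
    """List (arginine codon, lysine codon) pairs that differ by one G->A change."""
    hits = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "R":
            continue
        for position, base in enumerate(codon):
            if base != "G":
                continue
            mutant = codon[:position] + "A" + codon[position + 1:]
            if CODON_TABLE[mutant] == "K":
                hits.append((codon, mutant))
    return hits


if __name__ == "__main__":
    print(arg_to_lys_by_g_to_a())
```

Only the AGA and AGG arginine codons qualify (giving AAA and AAG); a G→A change in any CGN codon yields histidine, glutamine or a silent change instead.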

Figure 6.

 The cyd4 mutant defective in CYP736A2 contains reduced levels of cyanogenic glucosides.
(a) Extracted ion chromatogram of the cyd4 mutant. Extracted ion peaks are for sodium adducts: linamarin (m/z 270, cyan); rhodiocyanoside A and D (m/z 282, red); lotaustralin (m/z 284, blue), as indicated in Figure 2(d).
(b) Quantitative comparison of hydroxynitrile glucoside content in leaves of MG-20 wild-type and cyd4. Values are the means of five biological replicates ± SE.

Expression of cyanogenic glucoside biosynthetic genes in relation to leaf age

Quantitative real-time PCR was used to examine the expression profile of CYP736A2, UGT85K2 and UGT85K3 in comparison with that of CYP79D3. Characteristically, CYP79D3 expression is related to leaf age, with the highest expression observed in the apical leaves (Figure 7) (Forslund et al., 2004). CYP736A2 showed an expression profile similar to that of CYP79D3, with transcript levels decreasing in older leaves. The UGT85K2 and UGT85K3 expression levels did not change with leaf age. Although the genes are not strictly co-regulated, the overlap in expression pattern of CYP79D3, CYP736A2, UGT85K2 and UGT85K3 supports the view that these genes encode the enzymes of the pathway for cyanogenic glucoside biosynthesis in L. japonicus.
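Expression relative to a calibrator sample, such as the 3rd leaf used in Figure 7, is often calculated with the 2^-ΔΔCt method. The quantification method is not stated in this section, so the sketch below is an assumption: it presumes a reference gene and 100% amplification efficiency, and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene relative to a calibrator sample (2^-ddCt).

    Assumes perfect doubling per cycle for both target and reference gene.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: apical leaf compared with the 3rd leaf (calibrator).
    print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

By construction the calibrator sample itself has a relative expression of 1, which is why values in such plots are read as fold differences from that sample.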

Figure 7.

 Expression of the L. japonicus cyanogenic glucoside biosynthetic genes in relation to leaf age, determined by quantitative RT-PCR.
Values are relative to the expression level in the 3rd leaf, and are the means of three biological replicates ± SE.
(a) Expression of CYP79D3.
(b) Expression of CYP736A2.
(c) Expression of UGT85K3.
(d) Expression of UGT85K2.

Natural variation in rhodiocyanoside biosynthesis is genetically linked

We previously investigated natural variation in hydroxynitrile glucoside content in L. japonicus accessions, and showed that absence of the non-cyanogenic rhodiocyanosides A and D in the MG-74 accession (see Figure 1) was due to a single recessive genetic locus named rho (Bjarnholt et al., 2008). The map position of rho was initially established using 30 re-selected rho F2 lines from an MG-74 × MG-20 cross, and all showed the MG-74 genotype at marker TM1185 on chromosome 3. The mapping population was expanded to 350 F2 lines by initially selecting recombinants between markers TM0436 (at 10.5 cM) and TM0403 (at 24 cM). These two markers were polymorphic between MG-20 and MG-74 and flanked the region containing rho. The recombinant F2 lines, and, where necessary, their segregating F3 progeny, were analysed for rhodiocyanoside production, and genotyped using additional markers. This established that rho was localized between markers TM1185 on contig CM0241 (at 13.3 cM) and TM0450 (at 15.3 cM). The genomic sequence in this region contains most of contig CM0241 but is discontinuous between CM0241 and BAC clone TM0450, and the identity of the Rho gene remains to be established.
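The way recombinant lines narrow a mapping interval can be illustrated with a short sketch. Marker names and positions are taken from the text; the two recombinant lines are hypothetical, and the function assumes a recessive trait with at most one crossover per line within the region.

```python
# Marker positions (cM) on chromosome 3 as given in the text.
MARKERS = [("TM0436", 10.5), ("TM1185", 13.3), ("TM0450", 15.3), ("TM0403", 24.0)]


def narrow_interval(lines):
    """Return the (left, right) markers flanking the locus.

    Each line is (homozygous_mg74, lacks_rhodiocyanosides), where
    homozygous_mg74 maps marker name -> bool. A marker is consistent with the
    phenotype when homozygosity for MG-74 matches the recessive rho phenotype.
    """
    left, right = MARKERS[0], MARKERS[-1]
    for homozygous_mg74, lacks_rhodiocyanosides in lines:
        consistent = [pos for name, pos in MARKERS
                      if homozygous_mg74[name] == lacks_rhodiocyanosides]
        if not consistent:
            continue  # uninformative under the single-crossover assumption
        for name, pos in MARKERS:
            if homozygous_mg74[name] == lacks_rhodiocyanosides:
                continue
            # An inconsistent marker is separated from the locus by a crossover.
            if pos < min(consistent) and pos > left[1]:
                left = (name, pos)
            elif pos > max(consistent) and pos < right[1]:
                right = (name, pos)
    return left, right


if __name__ == "__main__":
    # Hypothetical recombinants, for illustration only.
    lines = [
        ({"TM0436": False, "TM1185": False, "TM0450": True, "TM0403": True}, True),
        ({"TM0436": False, "TM1185": False, "TM0450": True, "TM0403": True}, False),
    ]
    left, right = narrow_interval(lines)
    print(left, right, f"width = {right[1] - left[1]:.1f} cM")
```

With these illustrative lines the interval shrinks from TM0436-TM0403 to TM1185-TM0450, a width of 2 cM, which corresponds to the interval reported above.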

Clustering of genes involved in cyanogenic glucoside biosynthesis in the Lotus, cassava and sorghum genomes

The clustering of the genes for biosynthesis of cyanogenic glucosides and related hydroxynitrile glucosides in the genome of L. japonicus prompted the analysis of additional plant genomes of cyanogenic species. Such an analysis is possible in cassava and sorghum due to recent progress in the availability of genome sequences (Paterson et al., 2009; Cassava Genome Project 2010, http://www.phytozome.net/cassava) and identification of CYP71 and UGT85 homologues involved in cyanogenic glucoside biosynthesis in cassava (Jørgensen et al., 2011; Kannangara et al., 2011).

In the sorghum genome sequence (available at http://www.phytozome.net/sorghum), the three genes previously identified as encoding the cyanogenic glucoside biosynthetic pathway are clustered within a 104 kb region localized at the top of chromosome 1 (Figure 8). Figure 8 also shows an uncharacterized member of the CYP71E sub-family, Sb01g001160, located to the left of CYP71E1 (Sb01g001180), that has 61% amino acid identity to CYP71E1 (see also Figure 4). The region generally shows low gene density, but also contains an alcohol dehydrogenase (Sb01g001170) and two putative transposases of the hAT element superfamily.

Figure 8.

 Schematic representation of the clustering of cyanogenic glucoside biosynthetic genes in the genomes of L. japonicus, S. bicolor and M. esculenta.
Functional genes are presented by arrows indicating their orientation. Confirmed genes in cyanogenic glucoside biosynthesis are labelled above each bar, with CYP79 genes in pink, CYP71E and CYP736 genes in green, and UGT85 genes in blue. The three CYP736A2-like pseudo-genes are indicated below the L. japonicus bar, as is the additional CYP71 in S. bicolor. The Rho locus is within 2 cM of the CM0241 contig. The remaining genes are numbered and annotated as follows: (1) acireductone dioxygenase, (2) nucleic acid binding, OB-fold, (3,4) hypothetical proteins, (5) ribonuclease H, (6) short-chain dehydrogenase/reductase, (7,8) putative transposases, (9) leucine-rich repeat receptor-like protein kinase, (10,11) α/β-fold hydrolases, (12) hypothetical protein, (13–15) putative hydroxynitrile lyases, (16) hypothetical protein.

Clustering of the biosynthetic genes for cyanogenic glucoside metabolism was also observed in cassava; a 100 kb region of the 4.6 Mb Scaffold08265 (genome draft version Cassava4.1; http://www.phytozome.net/cassava) is shown in Figure 8. In cassava, two cytochrome P450 genes of the CYP79 family, CYP79D1 and CYP79D2, are known, and the CYP79D2 gene (cassava4.1_005079) is part of the gene cluster, which also contains the two CYP71E paralogues (Andersen et al., 2000; Jørgensen et al., 2011) and the UDP-glucosyltransferase genes UGT85K4 and UGT85K5 (Kannangara et al., 2011). The CYP79D1 gene (cassava4.1_005059) is not part of the gene cluster and is localized on Scaffold08167, which is 356 kb long. The CYP79D1 and CYP79D2 genes are each adjacent to a highly similar leucine-rich repeat receptor-like protein kinase gene (>90% identity), suggesting a duplication event. In contrast to the gene clusters in Lotus and sorghum, the one in cassava is relatively gene-rich. Of the additional genes in the region, a set of three genes with homology to hydroxynitrile lyases is noteworthy, as these may play a role in cyanogenic glucoside catabolism (cassava4.1_027709, cassava4.1_031085 and cassava4.1_014254, numbered 13–15 in Figure 8).

The region of the cyanogenic glucoside biosynthetic gene cluster in Lotus is the most extended in terms of overall size. Only a limited number of functional genes are present, including a putative aci-reductone dioxygenase (CM0241.600) and a retrotransposon-related ribonuclease H domain-containing protein (CM0241.860), while additional partial gene fragments related to retroelements are also present (not indicated in Figure 8). Although clustering of genes for cyanogenic glucoside biosynthesis is observed in all three species, no other points of similarity could be identified, and the regions differ substantially in size, gene density, the identity of additional genes and the presence of transposon-related sequences.

Discussion

CYP79D3 as shared first enzyme in the biosynthesis of hydroxynitrile glucosides

Cyanogenesis is the release of toxic HCN following tissue damage and hydrolysis of cyanogenic glucosides, and results from the inherent instability of the α-hydroxynitrile aglycones. In L. japonicus, the isoleucine-derived α-hydroxynitrile glucoside lotaustralin co-occurs with the non-cyanogenic compounds rhodiocyanoside A (a γ-hydroxynitrile glucoside) and rhodiocyanoside D (a β-hydroxynitrile glucoside) (Forslund et al., 2004). We previously reported natural variation in rhodiocyanoside content in L. japonicus, and a diversification of the hydroxynitrile glucoside biosynthetic pathway at the level of the nitrile intermediate was suggested based on biochemical considerations (Bjarnholt et al., 2008). The present study of the cyd1 mutant provides genetic evidence for the idea that the first enzymatic step is shared. The cyd1 mutant was previously identified in a leaf-based screen for cyanogenesis-deficient mutants, and metabolite analysis showed that it contains neither cyanogenic glucosides nor the related rhodiocyanosides (Takos et al., 2010). This suggested a defect early in the biosynthetic pathway, possibly at the formation of the oximes from the amino acid precursors valine and isoleucine. The map position of the cyd1 mutant and the absence of the second exon indicate that cyd1 contains a defect in the CYP79D3 gene encoding a leaf-expressed oxime-producing cytochrome P450 enzyme. Complementation of the cyd1 mutant with a functional CYP79D3 gene restored the biosynthesis of all hydroxynitrile glucosides.

Genomic clustering identified the CYP736A2 and UGT85K3 genes

The genomic region with the genes encoding CYP79D3 and its paralogue CYP79D4 also contained candidate genes for the remaining two steps in cyanogenic glucoside biosynthesis in L. japonicus: the UDP-glucosyltransferase UGT85K3 and the cytochrome P450 CYP736A2. Both UGT85K3 and the closely related UGT85K2 were able to glucosylate the hydroxynitrile aglycones of linamarin and lotaustralin in vitro. No close homologue of the sorghum CYP71E1 or cassava CYP71E7 was identified in the Lotus genome sequence, and the candidacy of CYP736A2 was suggested by the observed genomic co-localization. The activity of CYP736A2 was demonstrated by reconstituting the Lotus biosynthetic pathway using transient co-expression in N. benthamiana. The triple combination of CYP79D3, CYP736A2 and UGT85K3 produced the cyanogenic glucosides linamarin and lotaustralin in tobacco. The cyd4 mutant provided genetic evidence for the role of CYP736A2 in cyanogenic glucoside biosynthesis. The CYP736A2 protein in the cyd4 mutant is defective in the invariant arginine of the EXXR motif (Hasemann et al., 1995), and its substitution by a lysine is likely to result in improper protein folding, thus explaining the observed lack of enzymatic activity. The cyd4 mutant showed a large reduction in cyanogenic glucoside content but wild-type levels of rhodiocyanosides, indicating that CYP736A2 is specific for cyanogenic glucoside biosynthesis. The metabolic profile of the cyd4 mutant most likely reflects the activity of the Rho enzyme, primarily involved in producing rhodiocyanosides but also capable of some cyanogenic glucoside production, although the possible existence of a third oxime-metabolizing enzyme cannot be excluded. Consistent with the idea that the Rho enzyme does not exclusively produce rhodiocyanosides is the observation that none of the available cyd mutants or any of eighty natural Lotus accessions produce rhodiocyanosides in the absence of lotaustralin. Identification and molecular characterization of the Rho gene is required to confirm this.

Gene clustering in cyanogenic glucoside biosynthesis

Our work in L. japonicus demonstrated that the biosynthetic genes for cyanogenic glucoside metabolism were co-localized in the genome. Additional gene clusters containing the genes for the biosynthesis of cyanogenic glucosides were observed in sorghum and cassava. The regions differed considerably in size and structure, with only the presence of the cyanogenic glucoside biosynthetic genes in common. The Lotus cluster is 355 kb long when including CYP79D4 but not considering the Rho locus. The region containing CYP79D3, CYP736A2 and UGT85K3 is 160 kb long, larger than the 104 kb region containing the corresponding genes in sorghum, or the 83 kb region observed in cassava. The cluster in cassava is relatively gene-dense and contains a total of 13 genes, including two CYP71E and two UGT85K genes. The diverse genomic structure of the three gene clusters suggests they have independent origins.

While gene clusters originating from repeated tandem duplication events are common in eukaryotes, the genetic and physical clustering of non-homologous genes that are part of the same biosynthetic pathway is rare. In plants, such clustering of non-homologous genes has only been described for a number of chemical defence compounds, mostly terpenoids (reviewed by Chu et al., 2011). Genes for benzoxazinoid biosynthesis in maize (Zea mays) are co-localized within a 6 cM region on chromosome 4 (Frey et al., 1997), genetic loci for avenacin biosynthesis in oat (Avena sativa) are clustered within 3.6 cM of the β-amyrin synthase gene AsbAS1 encoding the first committed enzyme of the pathway (Qi et al., 2004), and a 168 kb region containing genes for the biosynthesis of momilactones has been identified in rice (Oryza sativa) (Shimura et al., 2007). Rice also contains a second diterpenoid biosynthetic gene cluster that is multi-functional, containing genes for the production of two distinct sets of phytoalexins (Swaminathan et al., 2009). Arabidopsis contains a gene cluster for synthesis of the triterpene thalianol (Field and Osbourn, 2008).

It is presently unclear how metabolic gene clusters arise and why gene clustering is observed for some plant secondary metabolic pathways, for example the biosynthesis of terpenoids and cyanogenic glucosides, but not for others, such as anthocyanin or glucosinolate biosynthesis. Several selective advantages that may drive and maintain the clustering of non-homologous genes in the biosynthesis of plant defence compounds have been proposed (Gierl and Frey, 2001; Qi et al., 2004; Chu et al., 2011). Clustering would allow inheritance of the defensive trait as a single functional unit, and in polymorphic populations this advantage could drive the clustering of genes following their functional evolution. Similarly, clustering of the genes of a biosynthetic pathway with toxic reaction intermediates would reduce the likelihood of its disruption by segregation. Clustering may also allow coordinated expression of all genes in the pathway, potentially by regulation at the chromatin level. These proposed selective advantages for gene clusters are not mutually exclusive, and may each contribute to establishing or maintaining this genomic organization. For example, a two-stage selection process was suggested for cluster formation in prokaryotes, where the proximity of genes resulting from selection driving genes together would establish the conditions for a separate selection for co-regulation and operon assembly (Martin and McInerney, 2009).

When evaluating these selective advantages in relation to gene clustering in the cyanogenic glucoside biosynthetic pathway, the toxicity of intermediates is the most convincing amongst them. Cyanogenesis is a two-component system, and inheritance as a single defence unit would require the β-glucosidase to be co-localized with the biosynthetic genes. In L. japonicus, some linkage is observed between LjBGD2 (Takos et al., 2010; T33P07.150) and the biosynthetic genes, which are approximately 20 cM apart. However, in sorghum, the β-glucosidase responsible for the hydrolysis of dhurrin is localized on chromosome 8 (Sb08g007570) while the biosynthetic genes are localized on chromosome 1. Operon-like gene clusters that facilitate coordinated expression of genes at the chromatin level have been suggested (Qi et al., 2004; Field and Osbourn, 2008). In L. japonicus, the CYP79D3 and CYP79D4 genes are differentially expressed, while UGT85K2 and UGT85K3 expression levels do not follow leaf age, arguing against co-ordination of gene expression driving the clustering of genes in cyanogenic glucoside biosynthesis. In contrast, the toxicity of the intermediates in the biosynthesis of cyanogenic glucosides has been experimentally demonstrated for the dhurrin biosynthetic pathway. Introduction of the sorghum genes CYP79A1 and CYP71E1 in Arabidopsis resulted in stunted plants with pleiotropic phenotypes, which were alleviated by the subsequent introduction of UGT85B1, resulting in dhurrin production and restoration of plant growth (Kristensen et al., 2005).

The CYP79 family and the repeated evolution of cyanogenic glucoside biosynthesis

The complete set of genes required for the biosynthesis of cyanogenic glucosides has now been identified in three plant species: the monocot S. bicolor and the dicots M. esculenta and L. japonicus. The genes encoding the different enzymes in these species are related but not necessarily orthologous. This suggests that cyanogenic glucoside biosynthesis has evolved independently in several higher plant lineages by recruiting members from similar gene families such as the CYP79s.

Members of the CYP79 family characteristically produce oximes from amino acids in the biosynthetic pathways of several chemical plant defence compounds, and besides cyanogenic glucosides these include glucosinolates and camalexin (Bak et al., 1998b; Hull et al., 2000; Reintanz et al., 2001; Glawischnig et al., 2004). Presently available sequence data show the first appearance of the CYP79 family in angiosperms, as is the case for the CYP71 family (Nelson and Werck-Reichhart, 2011). Because of the high substrate specificity of the oxime-producing CYP79 enzyme, it is regarded as the most selective step in determining which cyanogenic glucoside is produced in a certain plant species (Bjarnholt and Møller, 2008). A small CYP79 gene family is present in higher plant genomes, including non-cyanogenic species such as soybean (Glycine max) and poplar (Populus trichocarpa). The biological role of most of these CYP79 genes, including additional CYP79 genes in cyanogenic species, remains to be established. Phylogenetic analysis shows that the CYP79 enzymes involved in cyanogenic glucoside biosynthesis are more closely related to other CYP79s in the same plant species or lineage than to CYP79s involved in cyanogenic glucoside biosynthesis in more distant plant species (Figure S2). For instance, CYP79D3 and CYP79D4 from L. japonicus are part of a legume-specific clade, and only CYP79D15, which produces cyanogenic glucosides in white clover (T. repens), is likely to be orthologous (>90% identity).

The independent evolution of a similar function from homologous genes is referred to as ‘repeated evolution’, and is thought to represent the majority of convergent evolution events in plant secondary metabolism (Cseke et al., 1998; Pichersky and Lewinsohn, 2011). For example, a polyphyletic origin was proposed for the evolution of the pyrrolizidine alkaloid-mediated defence system in angiosperms (Reimann et al., 2004). The first enzyme in this pathway, homospermidine synthase, was independently recruited from deoxyhypusine synthase at least four times by loss of its ability to bind its protein substrate eIF5A. The first enzyme in the synthesis of benzoxazinoid pesticides, BX1, and the indole-3-glycerol phosphate lyase enzyme (IGL) both produce indole, as an intermediate and as a volatile defence compound, respectively, and are thought to have originated from independent gene duplications of the α-subunit of tryptophan synthase (Frey et al., 2000).

The ability to produce oximes from specific amino acids seems to be the overall characteristic of the CYP79 family. It has been suggested that the glucosinolate defence pathway, which has evolved independently on at least two occasions, originated from the cyanogenic glucoside pathway (Rodman et al., 1998), but it can equally be said that both independently evolved from the predisposition of members of the CYP79 family to produce oximes from amino acids and the need to convert the oximes into more stable and less reactive compounds.

Broad substrate specificity facilitated the repeated evolution of cyanogenic glucoside biosynthesis

In contrast with the CYP79s, members of the CYP71 and UGT families often have diverse and broad substrate specificities, and it has been argued that enzymatic versatility enables a limited number of glucosyltransferase genes to produce the large number of glucosides found in plants (Jones et al., 1999; Hansen et al., 2003; Schwab, 2003; Bjarnholt and Møller, 2008; Jørgensen et al., 2011). Broad substrate specificity or promiscuity of enzymes provides an immediately accessible starting point for the evolution of new functions (Khersonsky and Tawfik, 2010), and would facilitate the repeated evolution of cyanogenic glucoside biosynthesis following the emergence of a more specific signature enzyme, here for example a CYP79. It has also been suggested that the emergence of a signature enzyme by duplication and functional divergence could ‘seed’ the formation of a metabolic gene cluster (Chu et al., 2011).

The independent evolution of cyanogenic glucoside biosynthesis is most clearly demonstrated by the identification of CYP736A2 in L. japonicus. Although it could be argued that L. japonicus is exceptional in not possessing a CYP71E orthologue, the primary basis for identifying cyanogenic glucoside biosynthetic genes in plant species to date has been their homology to the original biochemically identified sorghum genes. Like the CYP79 and CYP71 families, CYP736s belong to the wider CYP71 clan. The CYP736 family shows an early emergence in plant evolution and is first seen in liverworts, but is also absent from several plant lineages including monocots (Nelson and Werck-Reichhart, 2011). Recent reports connect the CYP736 family with defence responses against microbial pathogens in grape (Vitis vinifera) (Cheng et al., 2010), and using microarray data, with nematode and Phytophthora infection and with nodulation in soybean (Guttikonda et al., 2010).

Convergent evolution of cyanogenic glucoside biosynthesis in plants and insects

In what is referred to as the ‘chemical arms race’ between plants and herbivores, the same solution may recur frequently, particularly for biosynthetic pathways involving a relatively small number of enzymes. While cyanogenic glucosides provide plants with a defence mechanism against generalist herbivores, some specialized herbivores have co-evolved with their host plants. The larvae of the Burnet moth (Zygaena filipendulae) feed on Lotus corniculatus and sequester the cyanogenic glucosides linamarin and lotaustralin from their food plants for use in their own defence against predators (Zagrobelny et al., 2007). They are also able to synthesize both compounds de novo when the supply from their food is insufficient. By convergent evolution, the Burnet moth has obtained a biosynthetic pathway with identical reaction intermediates but involving insect versions of the two cytochrome P450s, called CYP405A2 and CYP332A3, and a UGT gene named UGT33A1 (Jensen et al., 2011). This example from insects supports the idea of independent recruitment of genes for cyanogenic glucoside synthesis in plants. Further insights into the evolution of cyanogenic defence pathways in plants can be obtained by the identification of the biosynthetic genes in a diverse range of species, including non-angiosperms, which seem to lack members of the CYP79 family (Nelson and Werck-Reichhart, 2011). Gene clustering, if a general feature, may aid this identification.

Experimental procedures

Plant materials and growth conditions

Lotus japonicus cv. MG-20 (Legume Base, Miyazaki University, Japan) and the cyd1 mutant (Takos et al., 2010) were grown on half-strength Murashige & Skoog basal salt mix containing 1% w/v sucrose in 1% w/v agar. Plants were germinated and grown at 24°C with a 16 h light cycle unless otherwise indicated. Nicotiana benthamiana plants were germinated from seed and grown in soil in a greenhouse at 22°C.

Chemicals

UDP-[14C]-glucose (Perkin Elmer NEN Radiochemicals, http://www.perkinelmer.com/) was dried under nitrogen gas to remove ethanol and dissolved in water. The substrates acetone cyanohydrin and 2-hydroxy-2-methylbutyronitrile were synthesized in our lab by M.S.M. Lotaustralin was purified by preparative HPLC (Bjarnholt et al., 2008). Linamarin was purchased from AG Scientific (http://www.agscientific.com/). Amygdalin was purchased from Sigma-Aldrich (http://www.sigmaaldrich.com/).

RNA extraction and cDNA synthesis

Total RNA was prepared from 100 mg of plant tissue using an RNeasy plant mini kit (Qiagen, http://www.qiagen.com/) with on-column DNase I digestion. For cDNA synthesis, 1–2 μg of total RNA was reverse-transcribed by SuperScript III reverse transcriptase (Invitrogen, http://www.invitrogen.com/) in a reaction primed using 50 μM oligo(dT)20 or using an iScript cDNA synthesis kit (Bio-Rad, http://www.bio-rad.com/), which contains MMLV-derived reverse transcriptase and is primed using a blend of oligo(dT) and random hexamer primers.

Genetic markers and genome analysis

The L. japonicus genome sequence (Sato et al., 2008) and marker information are available at http://www.kazusa.or.jp/lotus/index.html. Genome assembly build 2.5 was used in our analysis. Primers flanking the 32 bp insertion in LjSGA_030566 and distinguishing it from CYP79D3 were 5′-GGAGAAAGATCAGAATCAATGAC-3′ and 5′-GATTCGCTGTGCAATGAGCTGAC-3′. The polymorphism between the PCR products was visualized on a 4% agarose gel. The S. bicolor genome sequence (version 1.0 release) and the genome sequence of M. esculenta (version 4.1) are both available from http://www.phytozome.net. Named cytochrome P450 sequences are available from the cytochrome P450 homepage at http://drnelson.uthsc.edu/cytochromeP450.html.

Preparation of constructs

The cDNA sequences of genes were amplified by PCR from L. japonicus MG-20 cDNA prepared from RNA extracted from young leaf tissue, and from cassava cv. TME12 cDNA. The PCR products were cloned by Gateway recombination into the entry vector pDONR207 (Invitrogen). Expression constructs were prepared by linearizing the pDONR207 entry clones by restriction digest and transferring the inserts by Gateway recombination into either the S-tag vector pJAM1786 (similar to pJAM1784 but with reading frame C.1) for expression in bacteria or pJAM1502 (Luo et al., 2007) for both stable and transient expression in plants. All the PCR primers contained the attB1 and attB2 Gateway cloning sites, and the sequences are given in Table S1.

Complementation of the cyd1 mutant with CaMV35S:LjCYP79D3

The CYP79D3 cDNA sequence under the control of the CaMV 35S promoter/terminator elements in pJAM1502 was transferred into Agrobacterium tumefaciens (AGL1) by electroporation. Transformation of the cyd1 mutant with this construct was based on a method using hypocotyl explants as previously described (Lombari et al., 2005).

Cyanogenesis detection and metabolite profiling

Cyanogenesis was visualized using Feigl–Anger paper (Feigl and Anger, 1966), which was prepared by wetting Whatman 3MM paper (http://www.whatman.com/) in a 5 g L−1 chloroform solution of copper ethylacetoacetate (Alfa Aesar, http://www.alfa.com/) and 4,4′-methylenebis(N,N-dimethylaniline) (Sigma-Aldrich). The dried paper was stored at 4°C until use. Plant tissue was disrupted by a freeze–thaw cycle, and HCN release was visualized as described previously (Takos et al., 2010).

For metabolite profiling of hydroxynitrile glucosides, plant material was weighed and extracted in 300 μl of 85% v/v methanol spiked with amygdalin to 200 μM to correct for volume losses during sample preparation. The extraction was performed by boiling in a water bath for 3 min and subsequent cooling of the sample on ice. The extracts were filtered through 45 μm Ultrafree-MC Durapore PVDF filters (Millipore, http://www.millipore.com/) and diluted 1:10 in water. Extracts were stored in glass vials at 4°C prior to analysis. Analytical LC-MS was performed using an Agilent 1100 Series LC (Agilent Technologies, http://www.agilent.com) coupled to a Bruker HCT-Ultra ion trap mass spectrometer (Bruker Daltonics, http://www.bdal.com). A Zorbax SB-C18 column (Agilent, 2.1 mm id × 50 mm, 1.8 μm) was used, with chromatography conditions as described previously (Takos et al., 2010). The mass spectrometer was run in positive electrospray mode.
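The internal-standard correction mentioned above can be illustrated with a minimal Python sketch. It assumes that the correction is a simple ratio between the amygdalin peak area expected for an unperturbed sample and the area actually observed, and that analyte and standard are affected equally by volume changes. The function names and peak areas are hypothetical and are not taken from this study.

```python
def correct_for_internal_standard(analyte_area: float,
                                  istd_area_observed: float,
                                  istd_area_expected: float) -> float:
    """Scale an analyte peak area by the recovery of the internal standard.

    Assumption: the analyte and the internal standard (here amygdalin) are
    affected equally by volume changes, so a single ratio corrects both.
    """
    if istd_area_observed <= 0:
        raise ValueError("internal standard peak area must be positive")
    return analyte_area * istd_area_expected / istd_area_observed


def per_mg_fresh_weight(corrected_area: float, fresh_weight_mg: float) -> float:
    """Express a corrected peak area relative to the weighed plant material."""
    if fresh_weight_mg <= 0:
        raise ValueError("fresh weight must be positive")
    return corrected_area / fresh_weight_mg


if __name__ == "__main__":
    # Hypothetical peak areas (arbitrary units) and a hypothetical 25 mg sample.
    corrected = correct_for_internal_standard(1000.0, 80.0, 100.0)
    print(corrected, per_mg_fresh_weight(corrected, 25.0))
```

Converting corrected peak areas into absolute amounts would additionally require calibration against authentic standards, which this sketch does not attempt.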

Quantitative real-time PCR

Transcript levels of genes were measured by real-time PCR using a SYBR Green I-based method on a Rotor-Gene Q cycler (Qiagen). The PCR reaction conditions were as previously described (Takos et al., 2010). For each primer pair, a single PCR product was obtained from each cDNA as verified by gel electrophoresis and melt curve analysis. Normalized gene expression of target genes was obtained from the difference in cycle threshold of target genes and the reference gene, taking into account primer efficiencies calculated from standard curves of diluted PCR product using Q-Gene software (Muller et al., 2002). The L. japonicus RNA polymerase II large subunit (AV777095) was used as a reference gene, and the sequences of the real-time PCR primers are given in Table S1.
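The efficiency-corrected normalization described above can be sketched in Python as follows. This is a minimal illustration of a commonly used form, in which the amplification efficiency E is derived from the slope of a standard curve and expression is calculated as E_ref^Ct_ref / E_target^Ct_target. The function names and example values are hypothetical, and the exact procedure implemented in Q-Gene (for example, how replicates are averaged) may differ.

```python
def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency E (2.0 = perfect doubling per cycle).

    The slope comes from a standard curve of cycle threshold (Ct) plotted
    against log10 of the template dilution, and is expected to be negative.
    """
    if slope >= 0:
        raise ValueError("standard-curve slope must be negative")
    return 10 ** (-1.0 / slope)


def normalized_expression(ct_target: float, ct_ref: float,
                          e_target: float = 2.0, e_ref: float = 2.0) -> float:
    """Expression of a target gene relative to a reference gene.

    Computed as E_ref ** Ct_ref / E_target ** Ct_target, so a target that
    crosses the threshold later than the reference gives a value below 1.
    """
    return (e_ref ** ct_ref) / (e_target ** ct_target)


if __name__ == "__main__":
    # Hypothetical standard-curve slopes and Ct values.
    e_target = efficiency_from_slope(-3.45)
    e_ref = efficiency_from_slope(-3.32)
    print(normalized_expression(24.1, 20.3, e_target, e_ref))
```

With both efficiencies equal to 2, the expression reduces to 2 raised to the power (Ct_ref − Ct_target), the familiar difference-in-Ct form.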

Glycosyltransferase assays

Clones of the E. coli XJa autolysis strain (Zymo Research, http://www.zymoresearch.com) containing the cDNA sequences of LjUGT85K3 and LjUGT85K2 in pJAM1786 were grown at 28°C in Luria broth containing ampicillin and 3 mM L-arabinose. Expression was induced after 5 h by addition of 4 mM IPTG, and incubation was continued overnight. Cells were harvested by centrifugation for 10 min at 4°C, and briefly washed with 100 mM Tris/HCl, 2 mM EDTA, pH 7.5. Pellets were resuspended in 100 mM Tris/HCl pH 7.5, 2 mM EDTA, 5 mM DTT, containing Complete protease inhibitor cocktail (Roche, http://www.roche.com). The resuspension was frozen in aliquots of 400 μl in liquid nitrogen, and then thawed at 37°C for 5 min to lyse cells. The crude extract was clarified by centrifugation at 20,000 × g for 20 min at 4°C, and the supernatant used for glycosylation assays. All steps were performed at 4°C. The amount of recombinant S-tagged enzyme was quantified using a FRETWorks S-Tag assay kit (Novagen, http://www.merck-chemicals.com/).

The glycosylation assay was performed as previously described (Hansen et al., 2003) in assay mixtures of 20 μl containing 0.1 μg recombinant S-tagged enzyme, 100 mM Tris/HCl pH 7.5, an initial concentration of 20 mM of putative substrates (acetone cyanohydrin or 2-hydroxy-2-methylbutyronitrile) and 3.3 μM UDP-[14C]-glucose. Assay mixtures were incubated at 30°C for 60 min, and the reaction was terminated by addition of 2 μl 10% acetic acid. Glycosylation products were separated by TLC on Silica Gel 60 F254 plates (Merck, http://www.merck-chemicals.com/) using a mobile phase of ethyl acetate:acetone:dichloromethane:methanol:water (40:30:12:10:8 by volume). Radiolabelled compounds were visualized by exposure of PhosphorImager screens (Molecular Dynamics, http://www.moleculardynamics.com/) for 4 days. Pictures were developed using a Storm 860 scanner and Storm scanner control software version 5.03 (Amersham Biosciences, http://www.gelifesciences.com/). In addition to the assay mixtures, 5 μg of unlabelled linamarin and lotaustralin were spotted on the same TLC plate. After separation, the linamarin and lotaustralin lanes were cut from the TLC plate and the hydroxynitrile glucosides were visualized by dipping in 20% v/v H2SO4 in methanol to char the sugars.
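As a worked illustration of the assay composition given above, the following Python sketch converts the stated concentrations into absolute amounts per 20 μl reaction. It assumes that the quoted concentrations refer to the final 20 μl assay volume; the helper names are hypothetical and serve only to make the arithmetic explicit.

```python
def amount_pmol(concentration_uM: float, volume_ul: float) -> float:
    """Amount in pmol, using the identity 1 uM x 1 ul = 1 pmol."""
    return concentration_uM * volume_ul


def final_percent(stock_percent: float, added_ul: float, initial_ul: float) -> float:
    """Final percentage of a stock solution after adding it to a reaction."""
    return stock_percent * added_ul / (initial_ul + added_ul)


if __name__ == "__main__":
    assay_volume_ul = 20.0
    # 3.3 uM UDP-glucose in 20 ul corresponds to about 66 pmol sugar donor.
    donor_pmol = amount_pmol(3.3, assay_volume_ul)
    # 20 mM (20 000 uM) acceptor in 20 ul corresponds to 400 nmol.
    acceptor_pmol = amount_pmol(20_000.0, assay_volume_ul)
    print(donor_pmol, acceptor_pmol / 1000.0)   # pmol donor, nmol acceptor
    print(acceptor_pmol / donor_pmol)           # molar excess of acceptor
    # 2 ul of 10% acetic acid added to 20 ul gives roughly 0.9% final.
    print(final_percent(10.0, 2.0, assay_volume_ul))
```

Under this assumption, the acceptor substrate is present in several-thousand-fold molar excess over the radiolabelled sugar donor.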

Transient expression in leaves of N. benthamiana plants

Overnight cultures of A. tumefaciens (AGL1) containing expression constructs of cDNAs under the control of CaMV 35S promoter/terminator elements in pJAM1502 and the gene-silencing inhibitor protein p19 (Voinnet et al., 2003) were grown. The cells were harvested and resuspended to an OD600 of 2.0 in 10 mM MES, 10 mM MgCl2 and 100 μM acetosyringone. After 4 h incubation at ambient temperature, the A. tumefaciens was used to infiltrate leaves of 3-week-old N. benthamiana plants. A. tumefaciens strains containing expression constructs of plant cDNAs and the strain containing the expression construct for the p19-encoding gene were co-infiltrated. After 4–5 days, leaf discs (1 cm diameter) were cut from infiltrated leaves and extracted in 85% v/v methanol for metabolite analysis as described above.

Acknowledgements

This work was financially supported by the Danmark Grundforskningsfond (Danish National Research Foundation) under the ‘Niels Bohr Visiting Professorship’ program. B.L.M. acknowledges financial support from the Villum Research Center ‘Pro-Active Plants’ and from the Danish Council for Independent Research/Technology and Production Sciences. C.K. acknowledges a PhD stipend granted by the Faculty of Life Sciences, University of Copenhagen, Denmark.
