ISOPRENE SYNTHASE GENES FORM A MONOPHYLETIC CLADE OF ACYCLIC TERPENE SYNTHASES IN THE TPS-B TERPENE SYNTHASE FAMILY

Authors


Abstract

Many plants emit significant amounts of isoprene, which is hypothesized to help leaves tolerate short episodes of high temperature. Isoprene emission is found in all major groups of land plants including mosses, ferns, gymnosperms, and angiosperms; however, within these groups isoprene emission is variable. The patchy distribution of isoprene emission implies an evolutionary pattern characterized by many origins or many losses. To better understand the evolution of isoprene emission, we examine the phylogenetic relationships among isoprene synthase and monoterpene synthase genes in the angiosperms. In this study we identify nine new isoprene synthases within the rosid angiosperms. We also document the capacity of a myrcene synthase in Humulus lupulus to produce isoprene. Isoprene synthases and (E)-β-ocimene synthases form a monophyletic group within the Tps-b clade of terpene synthases. No asterid genes fall within this clade. The chemistry of isoprene synthase and ocimene synthase is similar and likely affects the apparent relationships among Tps-b enzymes. The chronology of rosid evolution suggests a Cretaceous origin followed by many losses of isoprene synthase over the course of evolutionary history. The phylogenetic pattern of Tps-b genes indicates that isoprene emission from non-rosid angiosperms likely arose independently.

Isoprene emission occurs from the leaves of many but not all plant species. It is produced enzymatically in the chloroplast from dimethylallyl diphosphate (DMADP; Silver and Fall 1991; Wildermuth and Fall 1996; Lichtenthaler et al. 1997; Wildermuth and Fall 1998; Schnitzler et al. 2005) and diffuses into the atmosphere through the stomata (Fall and Monson 1992). The rate of isoprene emission is strongly correlated with light intensity in a pattern closely following that of photosynthesis (Guenther et al. 1991; Harley et al. 1996, 1997).

Isoprene emission may provide benefits for emitting plants. The most frequently suggested benefits are protection from very short periods of thermal stress (Sharkey and Singsaas 1995; Singsaas et al. 1997; Sharkey et al. 2001; Siwko et al. 2007) and protection from reactive oxygen species (Loreto et al. 2001; Loreto and Velikova 2001; Affek and Yakir 2002; Vickers et al. 2009). Exposure to short periods of thermal stress is a common occurrence for leaves of broadleaved species (Singsaas and Sharkey 1998; Singsaas et al. 1999).

Isoprene emission accounts for inputs of up to 660 Tg of reactive carbon into the atmosphere annually, exceeding inputs from anthropogenic sources (Chameides et al. 1988; Guenther et al. 1995, 2006). Isoprene emissions from plants have been implicated in the production of tropospheric ozone (Brasseur and Chatfield 1991; Chameides and Lodge 1992; Chameides et al. 1992; Hofzumahaus et al. 2009), carbon monoxide, formaldehyde (Sumner et al. 2001), and aerosols (Claeys et al. 2004; Edney et al. 2005; Kroll et al. 2005, 2006; Ng et al. 2006; Kourtchev et al. 2008), though isoprene may also suppress aerosol formation (Kanawade et al. 2011). Although isoprene is not a strong greenhouse gas, its production and emission may indirectly affect the greenhouse effects of methane by increasing the residence time of methane in the atmosphere (Archibald et al. 2011). Despite the importance of isoprene in global atmospheric chemistry, the biology and evolutionary origins of isoprene emission remain obscure.

Among those plants that emit, isoprene typically represents a carbon cost of 0.5% to 2% of net photosynthetic carbon assimilation, although under conditions of high temperature or when water stress restricts photosynthetic capacity, isoprene emission can exceed 15% of photosynthesis (Tingey et al. 1979; Monson and Fall 1989; Loreto and Sharkey 1990; Sharkey and Loreto 1993; Fuentes et al. 1999). The fact that emitting and nonemitting species often cooccur in ecosystems indicates that the trait of isoprene emission is not essential for plants. It has been speculated that the large carbon cost borne by isoprene-emitting species must be balanced by benefits that outweigh the carbon cost to the emitting plant (Sharkey and Yeh 2001).

Isoprene emission is widespread across the plant kingdom. In a survey of mosses and ferns, Hanson et al. (1999) found that 94% of the mosses and 50% of the ferns sampled produced isoprene. Within the gymnosperms isoprene emission has been found in all of the species in the Ephedraceae that have been examined, and has been observed from many species in the genera Picea and Abies (Harley et al. 1999). Angiosperms contain numerous isoprene-emitting taxa. Within the angiosperms, isoprene emission is common but not universal in the rosids, uncommon in the asterids, present in some grasses and palms (monocots), and sporadic in early diverging angiosperm lineages (Harley et al. 1999). In Figure 1, plant orders with at least one species reported by Kesselmeier and Staudt (1999) to emit significant amounts of isoprene are mapped onto a recent phylogeny of the angiosperms (The Angiosperm Phylogeny Group 2009) and this shows the preponderance of isoprene emission in the rosids.

Figure 1.

Phylogenetic distribution of isoprene emission. The plant families for which significant isoprene emission was reported by Kesselmeier and Staudt (1999) are shown by boxes on a recent Angiosperm phylogeny (The Angiosperm Phylogeny Group 2009; tree used with permission).

This patchy distribution of isoprene emission in plants has prompted considerable interest in the evolution of isoprene emission. A single origin of isoprene emission at the base of the land plants was proposed by Hanson et al. (1999) based in large part on the observation that isoprene emission was common in early diverging plant lineages and the idea that isoprene emission was more likely to be lost than gained. An alternative view that isoprene emission has evolved many times was put forward by Harley et al. (1999) and Lerdau and Gray (2003). This “multiple origins” hypothesis was supported by work on the evolution of monoterpene synthases (Bohlmann et al. 1998; Trapp and Croteau 2001) showing that the terpene synthases of angiosperms and gymnosperms are independently derived. Monson et al. (in press) proposed that evolution of the capacity for isoprene synthesis was almost as likely as loss of the capacity for emission and used phylogenetic reconstructions to infer many gains and losses of function.

Sequencing of genes for isoprene synthase in Populus and Pueraria montana (Miller et al. 2001; Sasaki et al. 2005; Sharkey et al. 2005; Fortunati et al. 2008; Vickers et al. 2010) showed they were quite different but it was unclear if this pattern reflects rapid evolution or multiple origins of these genes. Methylbutenol is closely related to isoprene and the MBO synthase of Pinus makes a small amount of isoprene (Gray et al. 2011). The DNA sequences of the known isoprene synthases from angiosperms place them within the Tps-b group of angiosperm monoterpene synthases (Sharkey et al. 2005; Lee and Chappell 2008; Keszei et al. 2010; Gray et al. 2011) whereas the MBO synthase gene nests within the gymnosperm Tps-d monoterpene synthase clade (Gray et al. 2011). The genomic organization of monoterpene synthases in angiosperms and gymnosperms is very different. Angiosperm monoterpene synthase genes contain 6 introns and 7 exons whereas gymnosperm monoterpene synthases contain 9 introns and 10 exons (Trapp and Croteau 2001). This suggests that when a gymnosperm isoprene synthase is found, it too will group with the gymnosperm monoterpene synthases and almost assures that isoprene emission arose independently in the angiosperms and gymnosperms. Gene sequences can also provide information on the evolution of isoprene synthesis within the angiosperms.

The gene for isoprene synthase is known from two groups of plants, Populus sp. in the order Malpighiales and P. montana (kudzu) in the order Fabales. Both genes nest within the angiosperm monoterpene synthase Tps-b clade, and appear to have evolved from monoterpene synthases. Modeling by Sharkey et al. (2005) and a recently solved isoprene synthase crystal structure (Köksal et al. 2010) identified two homologous Phe residues that seem to play a role in reducing the active site volume such that the 10-carbon substrate of the monoterpene synthases, geranyl diphosphate (GDP), will not fit. Despite sharing these two Phe residues, the amino acid sequences of the isoprene synthases from Populus sp. and kudzu share remarkably little AA identity (53%). In angiosperms, it is possible that isoprene synthase genes arose independently in each plant order (13 times or more if it arose more than once in each order) or once at the base of the angiosperms, with subsequent losses, or something in between these extremes.

Acquiring isoprene synthase genes from additional species and examining them within the context of the angiosperm monoterpene synthase genes holds promise for distinguishing between the single origin and multiple origin hypotheses for the evolution of isoprene emission within the angiosperms. An apparently monophyletic grouping of isoprene synthases could result from a single evolutionary origin or from functional constraints or convergent evolution. The Tps-b terpene synthases have two distinct domains, with the N-terminal domain likely having no specific role in the enzymatic reaction. Therefore, the question of functional constraints can be tested by assessing the relationships among genes using just the N-terminal domain sequence or just the C-terminal domain sequence, which contains nearly all of the active site residues. To better understand the evolution of isoprene synthase in the angiosperms, we identified new isoprene synthase genes and compared them with known monoterpene synthases.

Materials and Methods

RNA EXTRACTION AND cDNA SYNTHESIS

Leaves were harvested, washed in deionized water to remove surface contaminants, flash frozen in liquid nitrogen, and stored at −80°C until RNA extraction. Total RNA was extracted using a modified CTAB procedure adapted from Chang et al. (1993). Leaf tissue (100–250 mg) was ground to a fine powder under liquid nitrogen using a mortar and pestle and extracted with 800 μL of hot (70°C) extraction buffer by shaking vigorously for 30 s and incubating at 70°C for 30 min. Extraction buffer contained 2% CTAB (hexadecyl-cetyltrimethylammonium bromide), 100 mM Tris-HCl pH 8.0, 2 M NaCl, 25 mM EDTA, 2 mM ascorbic acid, 0.5 mg/ml spermidine, and 4% PVP-40, with 4% PVPP and 4% β-mercaptoethanol added immediately before use. This extraction buffer was amended with an additional 4% of PVP-10 when used to extract RNA from leaves of Robinia pseudoacacia. Following incubation, the aqueous phase was extracted twice with equal volumes of chloroform:isoamyl alcohol (24:1) by shaking to form an emulsion and centrifuging to separate the aqueous and organic layers. RNA was precipitated with one fourth volume of 10 M LiCl by incubating at 4°C overnight and collected by centrifugation. RNA was further purified by resuspending in 500 μl SSTE buffer (1 M NaCl, 0.5% SDS, 10 mM Tris-HCl pH 8.0, 1 mM EDTA), and extracted once with equal volumes of phenol, phenol:chloroform:IAA (24:24:1), and chloroform:IAA (24:1). Following the final extraction, RNA was precipitated with 2 volumes of 70% ethanol at −20°C for 2 h. RNA was collected by centrifugation (20 min, 16,000 ×g, 4°C), the resulting pellet washed twice with 1 ml of cold 70% ethanol, air dried, and resuspended in 50 μl of RNase-free water. Purified RNA was stored at −80°C until use in cDNA synthesis. First-strand cDNAs were synthesized using Moloney murine leukemia virus reverse transcriptase (Invitrogen) following the manufacturer's instructions.

HOMOLOGY-BASED CLONING

To obtain isoprene synthase genes from Populus and Salix, primers were designed to match the known sequence of isoprene synthase from Populus alba. Primer pairs designed to match the DNA sequence surrounding the start codon and stop codon successfully amplified isoprene synthase genes from cDNA. Amplification was carried out using Phusion High Fidelity DNA polymerase (Finnlabs) with an initial denaturation at 98°C for 2 min followed by 35 to 45 cycles of denaturation at 98°C for 15 s, annealing at 63°C for 30 s, and extension at 72°C for 90 s, with a final extension at 72°C for 5 min. PCR reactions contained 0.5 U of Phusion DNA polymerase, 0.25 μl cDNA, 0.2 mM dNTPs, 0.5 μM of each primer, and the manufacturer's HF buffer at 1× concentration in a 20 μl total reaction volume.

To obtain isoprene synthase genes from Robinia pseudoacacia and Wisteria, degenerate primers were designed to the N-terminal RRX8W motif and the DDXXD region of the amino acid sequence of isoprene synthase from P. montana. PCR reactions contained 1.25 U of GoTaq DNA polymerase (Promega), 0.25 μl cDNA, 0.2 mM dNTPs, 0.6 μM of each primer, and the manufacturer's buffer at 1× concentration. Reactions were carried out with an initial denaturation at 95°C for 2 min followed by 35 to 45 cycles of 95°C for 30 s, 50°C for 30 s, and 72°C for 3 min, with a final extension at 72°C for 9 min, and yielded amplicons with high homology to the P. montana isoprene synthase sequence. Rapid amplification of cDNA ends (3′-RACE) was used to amplify the 3′ end of the gene from cDNA using a gene specific forward primer and an adapter primer used to prime the initial cDNA synthesis reaction (Frohman 1993). Amplification was carried out in two steps with an initial denaturation at 95°C for 2 min followed by one cycle of denaturing at 95°C for 30 s, annealing at 48°C for 5 min, extension at 72°C for 40 min, and a second series of 35 to 45 cycles of denaturation at 95°C for 30 s, annealing at 55°C to 65°C for 30 s, and extension at 72°C for 3 min, with a final extension at 72°C for 9 min. Reactions yielding positive PCR hits were purified and sequenced directly, and the resulting sequence data were used to design primers specific to the 3′ end of the gene. Pairing primers designed to the RRX8W motif and the 3′ end of the gene allowed amplification of fragments suitable for expression in Escherichia coli.

SCREENING GENBANK FOR PUTATIVE ISOPRENE SYNTHASES

BLAST searches of the GenBank database were performed using the amino acid sequences of the known isoprene synthases from Populus and kudzu as queries. The sequences identified were screened for the presence of two Phe residues that have been implicated in reducing active site volume in isoprene synthases relative to monoterpene synthases. We identified sequences from Eucalyptus globulus (AB266390) and Melaleuca alterniflora (AY279379), which contained both Phe homologs, and a myrcene synthase from Humulus lupulus (EU760349) that contained the Phe located 7 residues upstream of the DDXXD region but lacked the second Phe residue homologous to F485 in P. alba. These sequences were selected for functional expression to determine whether they possessed isoprene synthase activity.
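The screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used: it assumes each candidate has already been aligned to the P. alba reference, and the function names, the column indices, and the toy sequences are all hypothetical.

```python
def phe_screen(aligned_seq, col_first, col_second):
    """Report whether the two diagnostic alignment columns hold Phe (F).

    aligned_seq: candidate sequence taken from an alignment that includes
                 the reference isoprene synthase (gaps written as '-').
    col_first, col_second: 0-based alignment columns homologous to the two
                 Phe residues of the reference (assumed known in advance).
    """
    return aligned_seq[col_first] == "F", aligned_seq[col_second] == "F"


def classify(aligned_seq, col_first, col_second):
    """Turn the two boolean checks into a short screening label."""
    first, second = phe_screen(aligned_seq, col_first, col_second)
    if first and second:
        return "both Phe present"
    if first or second:
        return "one Phe present"
    return "no Phe"


# Toy aligned strings (hypothetical, not real terpene synthase sequences).
toy = {
    "candidate_A": "MKFAV-GGTYAFGT",
    "candidate_B": "MKFAV-GGTYAVGT",
    "candidate_C": "MKIAV-GGTYAVGT",
}
labels = {name: classify(seq, 2, 11) for name, seq in toy.items()}
```

In this toy case candidate_A mirrors the pattern reported for the Eucalyptus and Melaleuca sequences (both Phe present), and candidate_B mirrors the pattern reported for the Humulus myrcene synthase (first Phe only).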

FUNCTIONAL EXPRESSION

Isoprene synthase genes from Populus sp. and Salix were ligated into the pET28 expression vector, and transformed into E. coli BL21 RIL for protein expression. Isoprene synthase genes obtained from Robinia pseudoacacia, and the putative isoprene synthase genes obtained by screening GenBank (Eucalyptus#AB266390, Melaleuca#AY279379, and Humulus#EU760349) were codon optimized for expression in E. coli using the GeneDesigner software package (DNA 2.0), ligated into the expression vector pJexpress 401 (DNA 2.0, Menlo Park, CA), and transformed into E. coli BL21 RIL for protein expression. Cell cultures of transformed E. coli cell lines were grown to an OD600 = 0.8 at 37°C with shaking at 250 rpm in 100 ml volumes of Terrific Broth (TB) contained in 500 ml Erlenmeyer baffle flasks. Cultures were induced with 2 mM isopropyl-β-d-thiogalactopyranoside (IPTG) and grown for an additional 3 h at 34°C for protein expression. Cells were harvested by centrifugation (4000 ×g, 20 min, 4°C), resuspended in lysis buffer (50 mM NaH2PO4, 2 M NaCl, 1 mM DTT, 20 mM β-mercaptoethanol, 0.1% Tween-20, 40% glycerol), sonicated to break open cells, and centrifuged to pellet cell fragments (5000 ×g, 45 min, 4°C). The supernatant was used in enzyme assays.

ENZYME ASSAY AND CHARACTERIZATION

Enzyme assays were performed in 20 μl total reaction volumes contained in 2 ml screw cap vials. Reaction mixtures contained 100 mM Hepes buffer pH 8.0, 20 mM DTT, 20 mM MgCl2, 40% glycerol, 11 mM DMADP, and 5 μl of crude extract. To test for monoterpene synthase activity, 2 mM GDP was substituted for the DMADP. Vials were incubated at 37°C for 30 min after which a 2 ml sample was withdrawn from the vial using a syringe, and manually injected onto a stainless steel cryotrapping sample loop immersed in liquid nitrogen and purged with helium. The cryo-concentrated sample was flash desorbed and injected onto the column of a Shimadzu GC17A equipped with a photoionization detector. Separation occurred on a DB1 column (30 m, 0.32 mm ID, 5 μm film; J&W Scientific/Agilent Technologies, Palo Alto, CA) held isothermally at 60°C with a head pressure of 23 psi. Headspace concentrations of volatiles were determined from four-point standard curves run daily by injecting various amounts of a 100 ppm isoprene in N2 standard from a tank obtained from Scott-Marrin (Riverside, CA). The gene from E. globulus was further characterized using a separate gene synthesized (DNA 2.0) with a His tag. The supplied plasmid was used to transform DH5α E. coli. Cells were grown and isoprene synthase was purified using a nickel column. Assays were carried out with a range of DMADP concentrations to determine kinetic constants. The kinetic analysis was carried out by measuring isoprene with a Fast Isoprene Sensor (Hills Scientific, Boulder, CO; Hills and Zimmerman 1990).
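The use of a four-point standard curve can be illustrated with a small least-squares sketch. It assumes a linear detector response, which the text does not state, and the standard amounts and peak areas below are made-up placeholder values, not measurements from this study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_area(area, slope, intercept):
    """Invert the calibration line to convert a peak area into an amount."""
    return (area - intercept) / slope


# Hypothetical four-point standard curve (placeholder numbers only).
std_amount = [1.0, 2.0, 4.0, 8.0]    # arbitrary units of isoprene injected
std_area = [10.0, 20.0, 40.0, 80.0]  # made-up detector peak areas

slope, intercept = fit_line(std_amount, std_area)
sample_amount = amount_from_area(30.0, slope, intercept)
```

Running the curve daily, as described, would correspond to refitting `slope` and `intercept` before each batch of samples.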

PHYLOGENETIC RECONSTRUCTION

The phylogenetic position of the novel isoprene synthases identified in this study within the angiosperm terpene synthase gene family was estimated in a Bayesian framework with MrBayes using amino acid sequence data. Angiosperm monoterpene synthase genes were selected from the monoterpene synthases with known function (Table 1 of Degenhardt et al. 2009). When highly similar genes (>95% amino acid identity) from the same genus were present, only one was chosen to reduce excessive recent branching. Isoprene synthases were added to this collection and then additional genes were sought to increase the representation across phylogenetic groups in the angiosperms. Several genes from Glycine max were predicted by Gnomon (http://www.ncbi.nlm.nih.gov/genome/guide/build.shtml) as “isoprene synthase, chloroplast-like” and these were included. Terpene synthases from species that have been fully sequenced (Vitis, Arabidopsis, Populus) were included for additional gene diversity. This resulted in a group of 70 amino acid sequences. Amino acid sequence data for terpene synthases was obtained from GenBank (Benson et al. 2011) and combined with the amino acid sequences of the genes identified in this study. Sequences were aligned with CLC Sequence Viewer (CLC bio http://www.clcbio.com/).

Table 1. Identity and similarity of isoprene synthase amino acid sequences. Values below the diagonal (–) are % amino acid identity; values above the diagonal are % amino acid similarity.

Order         Taxon       Populus  Salix  Robinia  Wisteria  Pueraria  Eucalyptus  Melaleuca  Humulus
Malpighiales  Populus     –        98     71       68        68        71          68         61
Malpighiales  Salix       97       –      73       69        69        71          68         61
Fabales       Robinia     55       56     –        88        88        68          68         62
Fabales       Wisteria    53       53     77       –         86        69          69         61
Fabales       Pueraria    53       53     79       75        –         66          67         63
Myrtales      Eucalyptus  53       53     52       50        51        –           90         59
Myrtales      Melaleuca   49       50     51       49        50        82          –          60
Rosales       Humulus     45       45     42       42        42        41          40         –
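How pairwise values of the kind shown in Table 1 can be obtained from an alignment is sketched below. The convention used to compute the published values is not stated here, so this sketch makes its own assumptions: columns with a gap in either sequence are ignored, and "similar" residues are defined by a hypothetical grouping rather than by a substitution matrix.

```python
# Hypothetical groups of physicochemically similar residues; alignment tools
# usually derive similarity from a substitution matrix instead.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def _similar(a, b):
    """True if two residues are identical or share a similarity group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_and_similarity(seq1, seq2):
    """Percent identity and similarity over columns where neither has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0.0
    n = len(pairs)
    identical = sum(a == b for a, b in pairs)
    similar = sum(_similar(a, b) for a, b in pairs)
    return 100.0 * identical / n, 100.0 * similar / n


# Toy aligned pair (not real sequences): 7 comparable columns, 4 identical.
pct_identity, pct_similarity = identity_and_similarity("MKFLV-DE", "MRFIVADD")
```

Because identical residues also count as similar, similarity is always at least as high as identity, which matches the pattern in Table 1.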

The resulting alignment was edited by hand to correct for minor misalignments to yield a final alignment containing 70 taxa and 731 sites. The amino acid dataset was analyzed with the MrBayes version 3.2 (http://mrbayes.sourceforge.net/) Markov chain Monte Carlo algorithm using the Jones matrix (Jones et al. 1992) for fixed rate amino acid substitution as recommended in the MrBayes 3.2 manual with 1 million generations in which 1 of every 1000 trees was sampled. Posterior probabilities of splits were obtained from the 50% majority rule consensus of sampled trees after discarding the first 25% as burn-in. Phylogenetic trees were rendered in FigTree v1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/). The trees were rooted with the Tps-g monoterpene synthases (Dudareva et al. 2003). The N-terminal alignment was taken from the complete alignment by deleting all residues starting with the second of two strictly conserved Trp residues, making an alignment of 403 characters. The C-terminal alignment was taken from the second Trp to the end and had 328 characters.
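The bookkeeping in this paragraph can be checked with a short sketch. It assumes the split column is simply the position of the second conserved Trp in the 731-site alignment (column 403 when counting from zero, inferred from the reported lengths), and it ignores the starting tree when counting samples; the function name and toy alignment are hypothetical.

```python
def split_alignment(alignment, split_col):
    """Split every aligned sequence into N- and C-terminal parts.

    alignment: dict mapping sequence name -> aligned string (equal lengths).
    split_col: 0-based column of the second conserved Trp; that column
               becomes the first character of the C-terminal part.
    """
    n_term = {name: seq[:split_col] for name, seq in alignment.items()}
    c_term = {name: seq[split_col:] for name, seq in alignment.items()}
    return n_term, c_term


TOTAL_SITES = 731
SPLIT_COL = 403  # assumed from the reported 403 + 328 character split

# Toy alignment with a single placeholder sequence of the right length.
toy_alignment = {"toy_seq": "A" * SPLIT_COL + "W" + "C" * (TOTAL_SITES - SPLIT_COL - 1)}
n_term, c_term = split_alignment(toy_alignment, SPLIT_COL)

# Sampling arithmetic for the MCMC run described in the text.
generations = 1_000_000
sample_every = 1_000
sampled_trees = generations // sample_every          # starting tree not counted
kept_trees = sampled_trees - int(0.25 * sampled_trees)  # after 25% burn-in
```

Under these assumptions roughly 1000 trees are sampled and about 750 contribute to the consensus.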

Trees based on DNA were generated for a subset of genes that included all of an isoprene/ocimene synthase clade plus one gene found to be sister to this clade in the protein tree plus TPS10 from Arabidopsis. A Neighbor Joining tree with 1000 bootstraps was constructed with CLC Sequence Viewer, and a tree based on output from MrBayes was produced as above using the default values recommended in MrBayes and with 1,000,000 generations. In all cases the average standard deviation of split frequencies declined to 0.01 or less during the run.

Results

CLONING NOVEL ISOPRENE SYNTHASES

Nine new isoprene synthase genes were identified using a homology-based cloning approach and by searching GenBank for genes annotated as terpene synthases with active site features consistent with those found in the known isoprene synthases from Populus and P. montana. In the Malpighiales we identified isoprene synthases from Populus balsamifera (JN173037), Populus deltoides (JN173039), Populus fremontii (JN173040), and Populus grandidentata (JN173038), and an isoprene synthase from Salix (JN173043). In the Fabales we identified isoprene synthases from Robinia pseudoacacia (JN173041), and Wisteria (JN173042). In the Myrtales we confirmed the identity of genes from E. globulus (AB266390) and M. alterniflora (AY279379) as isoprene synthases. Genes were expressed in E. coli, and crude extracts showed isoprene synthase activity with DMADP but no activity with GDP.

Isoprene synthases derived from species in the same order show a high degree of amino acid identity and similarity (Table 1) whereas there is much less similarity or identity between isoprene synthases from species in different orders. Within the Malpighiales isoprene synthases are >97% identical at the amino acid level. Within the Fabales amino acid identity of isoprene synthases is >75%, and within the Myrtales amino acid identity of isoprene synthases is >82%. This high degree of identity within a plant order corresponds to a high degree of support (posterior probability 100%) for gene relationships within a plant family (Fig. 2). Between plant orders isoprene synthases show 49%–55% amino acid identity and 68%–73% similarity (Table 1). The kinetics of the Eucalyptus enzyme were determined: the enzyme had a lower Km (0.16 mM) and a higher kcat (0.195 s−1) than other known isoprene synthases, and it showed substrate inhibition (Ki = 0.9 mM) as seen in Populus sp. isoprene synthases (Schnitzler et al. 2005), but no evidence of the strong cooperativity found in kudzu (Sharkey et al. 2005).
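To make the reported constants concrete, the sketch below evaluates the standard substrate inhibition rate law, v/[E] = kcat·S / (Km + S + S²/Ki), with the Eucalyptus values. The choice of this particular rate law is an assumption for illustration; the equation actually fitted is not given in the text.

```python
import math

KM_MM = 0.16        # mM, Km reported for the Eucalyptus enzyme
KCAT_PER_S = 0.195  # s^-1, kcat reported for the Eucalyptus enzyme
KI_MM = 0.9         # mM, substrate inhibition constant reported in the text


def turnover(s_mm, kcat=KCAT_PER_S, km=KM_MM, ki=KI_MM):
    """Rate per enzyme (s^-1) at DMADP concentration s_mm (mM).

    Assumed model: classic substrate inhibition, in which a second
    substrate molecule binds and yields an inactive complex.
    """
    return kcat * s_mm / (km + s_mm + s_mm ** 2 / ki)


# For this model the rate peaks at S = sqrt(Km * Ki).
s_opt = math.sqrt(KM_MM * KI_MM)
peak_rate = turnover(s_opt)
```

With these constants the model rate peaks near 0.38 mM DMADP at roughly half of kcat, and it declines at higher substrate concentrations, which is the signature of substrate inhibition.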

Figure 2.

Relationships between isoprene synthases and angiosperm monoterpene synthases. Amino acid sequences were analyzed in a Bayesian framework using MrBayes 3.2 for 1,000,000 generations. Red branches indicate the rosid group, blue labels indicate asterid taxa, green labels indicate basal taxa, and pink labels indicate monocot taxa. Brown tips indicate genes with isoprene synthase activity. Numbers at nodes represent posterior probabilities of splits and the tree illustrated represents a 50% consensus tree. Blue dots show gene pairs of Vitis vinifera. Boxes show the extent of the isoprene/ocimene clade, the true isoprene synthases, Arabidopsis thaliana proteins, and three prominent groups of enzymes from mints (Lamiaceae). Tree drawn in FigTree 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).

ISOPRENE SYNTHASES AND OCIMENE SYNTHASES FORM A MONOPHYLETIC CLADE IN THE TPS-B GENE FAMILY

A phylogenetic tree based on a Bayesian analysis of amino acid sequences showed significant structure within the Tps-b family (Fig. 2). Several Tps-g sequences were used to root the tree. Tps-b enzymes have a canonical RRX8W sequence and typically this sequence is RRSANYXPX2W. The Tps-g/Tps-b border in the tree is defined by this sequence very near the N-terminus. One section of the true Tps-b enzymes contains genes from magnoliids (e.g., Litsea cubeba) and genes from a large number of rosids but no asterids. All of these Tps-b genes (similar to the grouping labeled Clade 2 by Martin et al. 2010) produce acyclic products. All previously known isoprene synthases and all new isoprene synthases reported here fall into this Tps-b2 family. The other Tps-b genes (clade 1) contain many genes that make cyclic products. Clade 1 contains two genes from sorghum, the only example of Tps-b genes in grasses that could be found. Litsea cubeba had a gene in clade 1 as well as in clade 2, suggesting that clades 1 and 2 result from an ancient duplication in the Tps-b gene family with only one branch developing the capacity for cyclization. Cyclization of monoterpenes involves a complex rearrangement of GDP to linalyl diphosphate (Fig. 3).

Figure 3.

Chemistry of formation of isoprene, ocimene, and cyclic monoterpenes (represented by limonene). Cyclization involves several steps not required for isoprene or ocimene synthesis. Figure made in ChemDraw.

The relationships deduced from Tps-b genes are incongruent with the species relationships in many places. This could be because of extensive gene duplications followed by extensive gene losses or phylogenetic errors that result from selective pressures on amino acid replacements.

The proteins that are known isoprene synthases cluster together in a monophyletic group (Fig. 2). Protein BAF02831 from E. globulus is annotated as a predicted monoterpene synthase but was shown to be an isoprene synthase here and it clusters with all the other known isoprene synthases. However, other considerations are not completely consistent with a monophyletic origin of the isoprene synthases. Populus sp. (Malpighiales) and kudzu (Fabales) are in the fabid section of the rosids whereas Eucalyptus (Myrtales) is in the malvid section, but phylogenetic analysis and enzyme kinetics suggested that the Populus isoprene synthase was more similar to the Eucalyptus gene than to the kudzu gene. In addition, although the Bayesian analysis gave strong support to the monophyly of isoprene synthases (posterior probability > 99%), a Neighbor Joining tree did not confirm this (Fig. S1). Trees of this clade based on DNA, or on just the third codon positions, could not resolve the relationships among isoprene synthases from the different orders of plants (Fig. 4).

Figure 4.

Tree based on DNA sequences for the genes in the isoprene/ocimene synthase clade together with Tps47 of Vitis vinifera and TPS10 of Arabidopsis thaliana. Sequences were aligned in CLC Sequence Viewer and then adjusted by hand to ensure triplets were aligned and that codons for highly conserved amino acids were aligned. The left side is a tree based on complete codons; on the right is a tree based only on third position bases. The analysis was carried out using MrBayes 3.2 as recommended with 1,000,000 generations.

The isoprene synthase enzyme contains two domains, a catalytically active C-terminal α-domain and a noncatalytically active N-terminal β-domain. The C-terminal domain contains the active site, whereas the N-terminal domain has little interaction with the substrate. Of these two domains, the N-terminal domain is less likely to be under selective pressure related to enzyme function, and may therefore provide a more unbiased measure of underlying phylogenetic relationships. Reanalyzing the isoprene synthase amino acid sequences using only the data from the N-terminus (between the highly conserved RRSANY motif and the completely conserved WW motif in the A helix) produced a tree that did not confirm the monophyly of the isoprene synthases but a tree based on the C-terminus amino acids did confirm monophyly (Fig. 5).

Figure 5.

Trees based on either the N-terminal or C-terminal parts of the enzymes. The analysis was carried out with the 70 sequences shown in Figure 2 but just the isoprene/ocimene synthase clade is shown. The scale bar shows frequency of change per site and the figures of the two trees were stretched to make the scales identical.

The isoprene synthase clade is part of a larger clade of proteins that also contains either acyclic-monoterpene synthases [e.g., ABY65110 is an (E)-β-ocimene synthase (Arimura et al. 2008)] or proteins whose function has not been experimentally verified. The monophyly of this isoprene/ocimene synthase clade was supported in all analyses undertaken (e.g., a Neighbor Joining analysis found 100% bootstrap support for monophyly of this clade). All of the proteins within this isoprene/ocimene synthase clade are derived from members of the rosid group of angiosperms. Within this clade are two pairs of sequences from Glycine max that are annotated in GenBank as putative isoprene synthases (Benson et al. 2011).

AMINO ACIDS UNIQUE TO ISOPRENE SYNTHASES

The single function isoprene synthase enzymes all contained a pair of Phe residues in positions homologous to F338 (most commonly Ile in other proteins) and F485 (most commonly Val in other proteins; numbers based on the Populus alba sequence; shown in magenta in Fig. 6). Two other amino acids appeared to be specific to isoprene synthases. The first is S445 (most commonly Val or Ile in other proteins, shown in brown in Fig. 6), the middle of a triple serine motif that is at a bend in helix G and that contributes some surface to the active site. The second is at position 505, where isoprene synthases have an asparagine (green in Fig. 6) whereas nearly all other Tps-b proteins have a lysine. An “isoprene score” was devised that counts how many of these four positions carry the canonical residues F338, S445, F485, and N505. All known isoprene synthases have a score of four whereas the average score for all of the other enzymes is less than 0.5, no protein has a score of 3, and only four have a score as high as 2. Of the four Glycine max putative isoprene synthases, two have an isoprene synthase score of 4 whereas two have scores of 1.
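The isoprene score defined above amounts to a simple count, sketched here. The four canonical residues come from the text; the function name, the dictionary layout, and the example residue sets are hypothetical and assume that each protein's residues have already been mapped to P. alba numbering through an alignment.

```python
# Canonical residues at the four diagnostic positions (P. alba numbering).
CANONICAL = {338: "F", 445: "S", 485: "F", 505: "N"}


def isoprene_score(residues):
    """Count how many of the four diagnostic positions match the canonical residue.

    residues: dict mapping P. alba-homologous position -> one-letter amino acid.
    Missing positions simply do not contribute to the score.
    """
    return sum(residues.get(pos) == aa for pos, aa in CANONICAL.items())


# Illustrative residue sets (hypothetical proteins, not entries from the tree).
all_canonical = {338: "F", 445: "S", 485: "F", 505: "N"}
common_other = {338: "I", 445: "V", 485: "V", 505: "K"}  # most common non-isoprene residues
first_phe_only = {338: "F", 445: "V", 485: "V", 505: "K"}

scores = [isoprene_score(r) for r in (all_canonical, common_other, first_phe_only)]
```

A protein carrying all four canonical residues scores 4, whereas one carrying the residues most common in other Tps-b proteins scores 0.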

Figure 6.

Structure of isoprene synthase (PDB 3NOG) showing the positions of the four amino acid residues that are specific to isoprene synthases. These are F338 (magenta on right), S445 (brown), F485 (magenta on left), and N505 (green, with N-atom in blue and carbonyl in red). The substrate analog dimethylallyl-S-thiolodiphosphate is shown along with aspartates of the DDXXD conserved region (red) and magnesium atoms (blue spheres surrounded by dots). Figure made in MacPyMol.

Because Glycine max does not emit isoprene (T. D. Sharkey, unpubl. data) the DNA sequences of Glycine max putative isoprene synthases with scores of 4 were examined. The coding regions were very similar to that of P. montana but the stop codon was absent in both putative isoprene synthases. The database reported a longer protein in both cases. The exons were generally much longer on the Glycine max sequences and the 5’ UTR showed evidence of DNA insertions. The P. montana sequence contains 250 base pairs upstream of the translation start site and these were sufficient to cause expression when the gene was put into Arabidopsis. A TATA box near the beginning of this sequence was missing in both Glycine max sequences. Finally, an exon border appeared to be missing in one of the Glycine max sequences and the predicted protein was wrong as a result. A corrected protein sequence could be translated from the original genomic DNA and this predicted protein sequence was used in analyses here.

Given the monophyly of the isoprene/ocimene clade and potentially similar chemistry, the amino acid alignment was searched for residues unique to the isoprene/ocimene clade that might be involved in the reaction mechanism. Asparagine 489 (Populus sequence number) was identified. In nearly all other Tps-b proteins this position has an aspartate residue, but all proteins in the isoprene/ocimene clade have asparagine. The side chain nitrogen of this amino acid is located at the top of an empty space adjacent to the active site in the crystal structure (Fig. 7). This would put it in an ideal place to coordinate a water molecule, which could serve as the general base in the reaction mechanism. Abstraction of the proton closest to this location from GDP would make (E)-β-ocimene. The Vitis vinifera enzymes in the isoprene/ocimene clade make only (E)-β-ocimene whereas ocimene synthases outside this clade (e.g., Arabidopsis thaliana Tps02 and 03) make both ocimene and myrcene. This is consistent with a specific proton abstraction mechanism in the isoprene/ocimene clade enzymes but a less specific mechanism outside this clade.
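A search of this kind can be sketched as a scan over alignment columns. The following is an illustrative sketch under stated assumptions rather than the method used here: the alignment is assumed to be a mapping from sequence name to equal-length aligned strings, the function name, the tolerance parameter, and the toy sequences are all hypothetical, and column indices are 0-based alignment columns, not Populus residue numbers.

```python
def clade_specific_columns(alignment, clade, max_outside_fraction=0.1):
    """Find columns where all clade members share a residue that is rare elsewhere.

    alignment: dict of sequence name -> aligned sequence (equal lengths assumed).
    clade: set of names belonging to the clade of interest.
    max_outside_fraction: hypothetical tolerance for "nearly all other" proteins.
    Returns a list of (column_index, residue, fraction_of_outside_with_residue).
    """
    inside = [seq for name, seq in alignment.items() if name in clade]
    outside = [seq for name, seq in alignment.items() if name not in clade]
    if not inside or not outside:
        return []

    hits = []
    for column in range(len(inside[0])):
        residues_inside = {seq[column] for seq in inside}
        if len(residues_inside) != 1:
            continue  # clade members disagree at this column
        residue = next(iter(residues_inside))
        if residue == "-":
            continue  # a shared gap is not informative
        outside_fraction = sum(seq[column] == residue for seq in outside) / len(outside)
        if outside_fraction <= max_outside_fraction:
            hits.append((column, residue, outside_fraction))
    return hits


# Toy alignment for illustration only; these are not real Tps-b sequences.
toy_alignment = {
    "clade_a": "MDNLF",
    "clade_b": "MDNIF",
    "other_a": "MDDLF",
    "other_b": "MDDVF",
    "other_c": "MDDLY",
}
print(clade_specific_columns(toy_alignment, {"clade_a", "clade_b"}))
# [(2, 'N', 0.0)]
```

In the toy data only column 2 qualifies, with Asn throughout the clade and Asp elsewhere, which is the same kind of pattern reported for position 489.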

Figure 7.

The substrate analog dimethylallyl-S-thiolodiphosphate shown with mesh indicating active site surface of isoprene synthase (3NOG) plus the asparagine 489 characteristic of isoprene/ocimene clade proteins. The N atom is in a good location to coordinate a water molecule that could serve as a base for the abstraction of a proton from the correct position to make (E)-β-ocimene.

IDENTIFICATION OF A SECONDARY ISOPRENE SYNTHASE ACTIVITY IN THE MYRCENE SYNTHASE FROM H. LUPULUS

Searching GenBank for sequences similar to isoprene synthase revealed a myrcene synthase gene/protein (EU760349 / ACI32638) from H. lupulus (hops) that contained a Phe residue homologous to F338 in the isoprene synthase from Populus alba but lacked the second Phe residue homologous to F485 in P. alba (Table 1). Instead, this position is occupied by a Val residue (V502). Examining the Humulus myrcene synthase allows a test of the importance of the second Phe residue in generating isoprene synthase function. The Humulus myrcene synthase gene was artificially synthesized and cloned into E. coli for protein expression. Crude extracts showed isoprene synthase activity with DMADP, but at a lower rate than that observed for the other isoprene synthases identified in this study.

Discussion

All currently published isoprene synthase sequences, and several additional sequences reported here, are from the rosid section of the angiosperms. Phylogenetic reconstructions based on both DNA and amino acid sequences suggest that isoprene synthases in rosids form a monophyletic clade closely related to several (E)-β-ocimene synthases. This indicates that there could be a functional constraint on the evolution of isoprene emission. This would reduce the number of independent events of isoprene emission relative to that suggested by Monson et al. (in press), but both this analysis and that of Monson et al. agree that the number of independent isoprene synthase evolution events is large, likely greater than 10. Given the simplicity of the chemical mechanism of isoprene synthesis from DMADP, some of these other isoprene synthases may be very different from the currently known isoprene synthases.

FUNCTIONAL CONSTRAINTS AND MONOPHYLY OF ISOPRENE SYNTHASES

Isoprene synthases in the rosids, accounting for a large proportion of the total isoprene emission into the atmosphere, may have arisen just once or nearly simultaneously from a slightly larger clade of genes that include ocimene synthases. Several amino acids appear to be essential to isoprene synthases. The importance of two Phe residues (F338 and F485 in P. alba) in creating isoprene synthase activity was first suggested by modeling exercises conducted by Sharkey et al. (2005), and later supported when the crystal structure of poplar isoprene synthase was solved by Köksal et al. (2010). Using these two Phe residues as markers for isoprene synthase activity, we identified several new isoprene synthases using a homology-based cloning approach and by screening GenBank for sequences with probable isoprene synthase function. To date, the presence of both Phe residues has been an unambiguous marker for isoprene synthase function in angiosperms.

The first of these Phe residues (homologous to F338 in P. alba) appears to close off the rear of the active site and is present in all of the single-product isoprene synthases as well as the myrcene synthase from H. lupulus. The second Phe residue (homologous to F485 in P. alba) closes off the H-helix side of the active site and prevents GDP from binding in an alternate conformation (Gray et al. 2011). When both Phe residues are present the resulting enzymes are obligate isoprene synthases. None of the isoprene synthases containing both Phe residues showed any activity with GDP in this study.

The other two amino acid residues specific to isoprene synthases are S445 and N505. S445 occurs in the middle of a triple serine motif and forms a bend in helix G that pushes into the active site (Fig. 7). Most other enzymes have a hydrophobic residue between the two flanking serines. This would have a relatively large effect on the hydrophobicity of the active site in this region.

Residue N505 occurs at a location that determines the requirement for K+ (Green et al. 2009). Most angiosperm monoterpene synthases have a Lys at this location and require little K+ for activity. Many gymnosperm terpene synthases have an uncharged amino acid at this location and are dependent on K+ for activity. All known isoprene synthases have an Asn at this location. Isoprene synthase is stimulated somewhat by K+ (Köksal et al. 2010), but K+ is not essential for activity.

The tree based on N-terminal sequences did not completely resolve the true isoprene synthases from the ocimene synthases. The N-terminal sequences appeared to have evolved at a faster rate than the C-terminal sequences. This makes it difficult to conclude with certainty whether isoprene synthase arose once or has arisen several times within the isoprene/ocimene clade. The fact that Eucalyptus and Populus genes are normally more similar to each other than to Fabales sequences indicates that these could represent two independent events leading to isoprene synthase. Eucalyptus and Populus isoprene synthases share the property of substrate inhibition at high substrate concentration, which is not seen in P. montana, reinforcing the idea that Eucalyptus and Populus isoprene synthases have a common origin that is different from that of P. montana.

The myrcene synthase from H. lupulus will also catalyze the conversion of DMADP to isoprene. The hops enzyme contains a Phe residue (F354) homologous to F338 in P. alba but lacks a Phe residue homologous to F485 in P. alba. The presence of F354 therefore appears to be sufficient to allow the enzyme to function as an isoprene synthase in vitro, and the second Phe residue (F485 in P. alba) is not required for isoprene formation. The hops enzyme catalyzes isoprene production from DMADP, but does so at a lower rate than the single-function isoprene synthases, suggesting that the active site configuration of this enzyme is less optimal for isoprene production. This may be due to the replacement of the residue homologous to F485 in P. alba with a Val in the hops myrcene synthase. Phenylalanine contains a large aromatic side chain with a high density of π electrons. These π electrons have been implicated in stabilizing the carbocation formed in the initial stages of the terpene synthase reaction mechanism (Dougherty 1996; Christianson 2006). Valine lacks these π electrons, so the conversion of DMADP into isoprene may proceed less efficiently.

Two sequences from Glycine max appear to be isoprene synthases, but several lines of evidence indicate these are pseudogenes that no longer function. These sequences are not found in EST databanks. The fact that these two genes have been disabled in different ways indicates that it may be relatively easy to lose the trait of isoprene emission, perhaps especially when plants are under strong selection pressure because of use as crops. Crop breeding selects for high rates of photosynthesis and therefore for open stomata. This will reduce the frequency of high-temperature episodes experienced by leaves and so reduce the benefits of emitting isoprene. Soybeans (Glycine max) appear to have lost the capacity for isoprene emission relatively recently, and pseudogenes are still apparent in the soybean genome.

GENOMIC ORIGINS OF Tps-b DIVERSITY

It is likely that Tps-b genes evolved from Tps-g genes (Tholl and Lee 2011). Tps-g enzymes typically make linalool, an acyclic alcohol. The event that gave rise to Tps-b genes must have occurred very early in the evolution of angiosperms because Tps-b genes are present in the Laurales (Litsea cubeba). Litsea cubeba has three terpene synthases. LcTps1 (Chang and Chu 2011) makes only ocimene, primarily in leaves (Fig. 2). LcTps2 makes primarily α-thujene and LcTps3 makes a range of products, mostly cyclic monoterpenes, mostly in fruits (Chang and Chu 2011). This fundamental split between enzymes that make (E)-β-ocimene and some acyclic alcohols versus enzymes that make cyclic monoterpenes must have occurred very early in the evolution of angiosperms. This could represent an ancient genome-wide duplication followed by gain of the ability to carry out the complex cyclization chemistry (Fig. 3). Also quite early was the emergence of the isoprene/ocimene clade. Tps-b enzymes typically have transit peptides and function in chloroplasts but a number of Tps-b enzymes are now known that apparently are not targeted to chloroplasts. These enzymes likely are making farnesene, farnesol, and nerolidol given the similar chemistry required to make farnesene and ocimene [the most common farnesene isomer, E,E,α-farnesene, results from abstraction of the same proton that causes production of the most common ocimene, (E)-β-ocimene (Gray et al. 2011)].

Despite ample evidence of the acyclic-only Tps-b genes emerging prior to the emergence of asterids, no asterid sequences were found in this clade. This suggests that asterids lost the isoprene/ocimene type of Tps-b gene and the large number of Tps-b genes in some asterid lineages arose relatively recently, likely through either whole-genome duplication or localized duplication. A similar loss followed by radiation is apparent in Arabidopsis thaliana genes. There is a Tps-g enzyme that makes only linalool, and a group of Tps-b genes that include a pair of genes that make myrcene and ocimene, and one coding for an enzyme that can make cyclized products. All of the Tps-b genes appear to have a single origin followed by several rounds of duplication and neofunctionalization (Fig. 2; Tholl and Lee 2011). Terpene synthases from Solanaceae and Cannabaceae species also appear to have a single progenitor. Tomato has a range of Tps-b genes (Falara et al. 2011) although none in the isoprene/ocimene clade. A search (tblastn) restricted to potato turned up no Tps-b genes despite the fact that the potato genome has been sequenced (2011). The monophyly of Tps-b genes in several families, along with the pseudogenes in Glycine max, indicates that terpene synthase genes are easily lost.

However, Vitis vinifera appears to have retained many Tps genes. The Vitales have numerous Tps-g genes (not shown; Martin et al. 2010). In addition, they have genes basal to the isoprene/ocimene clade, a pair of ocimene synthase genes in the isoprene/ocimene synthase clade, and a pair of genes in the cyclic Tps-b clade (gene pairs are indicated with blue dots on Fig. 2). This is consistent with ancient duplications leading to the acyclic versus cyclic genes and a more recent genome-wide duplication, giving rise to the pairs.

If the known isoprene synthases arose after the divergence of the rosids then it is likely that a different evolutionary event is responsible for the ability of some magnoliids and other groups to make isoprene. The genome for date palm is currently being annotated (http://qatar-weill.cornell.edu/research/datepalmGenome). Four genes were annotated as terpene synthases but none belong to the Tps-b family. Two are closely related to kaurene synthases and the other two appear to be very similar but with varying lengths of the N terminus, reflecting either a very recent duplication or mis-annotation. The sequence PDK_30s1006771g001 was annotated as a sesquiterpene synthase but is the closest to the monoterpene synthases among date palm sequences. This sequence was most closely related to other monocot sequences (data not shown). No evidence could be found in angiosperm lineages other than the rosids for an isoprene synthase similar to the known isoprene/ocimene clade synthases. Thus, isoprene synthase may have arisen as many as five times in the angiosperms, once in the rosids and once each in the magnoliids, commelinids, Ranunculales, and Proteales. It is also possible that the occurrence in the Ericales represents an independent evolution of isoprene synthase. Isoprene emission also arose at least once in the gymnosperms and must have arisen separately in ferns and mosses, possibly independently in each. Therefore, it is likely that isoprene synthases have arisen at least seven times and potentially more often.

EVOLUTION OF ISOPRENE EMISSION

The presence of Vitales genes in the isoprene/ocimene clade and the presence of isoprene emission in the Saxifragales (Liquidambar) at the base of the rosid clade, together with the low frequency of isoprene emission in the asterids (Fig. 1; Harley et al. 1999), which are a sister clade to the rosids, suggest that the origin of many isoprene synthases may have occurred at the base of the rosids.

Recent work on the evolution of the rosid angiosperms suggests that the rosids evolved during the Cretaceous and that they underwent a rapid radiation into the current diversity of families within a 4 to 5 million year period (Wang et al. 2009). This rapid radiation into families may explain why isoprene synthases do not reflect the phylogeny in Fig. 1. Using molecular clock-based evidence, Wang et al. (2009) estimated a time of origin of the rosids at 125-95 mya, and the divergence between the Malvidae and Fabidae at 112-93 mya.

A single origin of isoprene emission at the base of the rosids requires there to have been an origin of isoprene emission in the early Cretaceous followed by many losses of isoprene emission over the subsequent course of rosid evolution. Losses of isoprene emission may be easy to understand given the high cost of emission in isoprene-emitting taxa. Isoprene emission routinely accounts for as much as 2% of net photosynthesis in emitters, but under stress that inhibits photosynthetic capacity, the cost of emission can be much higher (Tingey et al. 1979; Monson and Fall 1989; Loreto and Sharkey 1990; Sharkey and Loreto 1993; Fuentes and Wang 1999). It is reasonable to infer that this high carbon cost could place isoprene-emitting taxa at a disadvantage relative to nonemitting taxa in the absence of a benefit conferred by isoprene. The frequent loss of Tps-b genes in other lineages further supports the likelihood of gene loss explaining much of the varied occurrence of the capacity for isoprene emission among extant plants.

A paleohistorical perspective may provide insight into why isoprene emission has been lost in so many plant taxa. Sharkey and Yeh (2001) suggested that atmospheric CO2 levels may have played an important role in shaping the evolution of isoprene emission. During the Cretaceous, when isoprene emission appears to have arisen in the rosids, CO2 levels were greater than 1000 ppm (Retallack 2009; Doria et al. 2011). Under high CO2, photosynthesis operates more efficiently in C3 plants due to a reduction in photorespiration (Jordan and Ogren 1984; Ehleringer et al. 1991; Cowling and Sage 1998).

Following the Cretaceous, atmospheric CO2 levels declined, reaching unusually low levels in both the Miocene (260–300 ppm) and the Pleistocene (180–240 ppm; Retallack 2009; Doria et al. 2011). Under low CO2 conditions, C3 plants experience reductions in photosynthesis (Cowling and Sage 1998; Gerhart and Ward 2010; Possell and Hewitt 2011) and growth, and may experience delayed reproduction or reproductive failure. Although photosynthesis declines under low CO2, the response of isoprene emission to low CO2 is quite different. Isoprene emission increases at low CO2 (Monson and Fall 1989; Centritto et al. 2004) and may have been two to three times higher at Pleistocene CO2 levels (180 ppm) than under current CO2 conditions (380 ppm; Possell and Hewitt 2011). Thus, in a low CO2 atmosphere such as that occurring during the Miocene (8–25 mya) and Pleistocene (2.5 mya–12,000 years ago), isoprene emission would have been much more costly for an emitting plant than it was in the high CO2 atmosphere present during the Cretaceous when isoprene emission appears to have evolved in the rosids. The added cost of isoprene emission during the low CO2 Miocene may have made isoprene emitters less competitive, and selection may have favored those individuals in which isoprene emission was lost.
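The cost argument above can be made concrete with a short calculation. This is a sketch under stated assumptions, not an analysis from this study: each isoprene molecule carries five carbon atoms, so the carbon cost is five times the molar emission rate divided by net CO2 assimilation. All flux values below are hypothetical, chosen so that the baseline cost is 2%. The 2.5-fold rise in emission is the midpoint of the two- to three-fold range cited in the text, and the 40% drop in photosynthesis is an assumed figure for illustration.

```python
CARBONS_PER_ISOPRENE = 5  # isoprene is C5H8


def carbon_cost_fraction(isoprene_nmol_m2_s, net_photosynthesis_umol_m2_s):
    """Fraction of net assimilated carbon that is re-emitted as isoprene.

    isoprene_nmol_m2_s: isoprene emission rate in nmol m^-2 s^-1.
    net_photosynthesis_umol_m2_s: net CO2 assimilation in umol m^-2 s^-1.
    """
    carbon_emitted_umol = CARBONS_PER_ISOPRENE * isoprene_nmol_m2_s / 1000.0
    return carbon_emitted_umol / net_photosynthesis_umol_m2_s


# Hypothetical baseline under current CO2 (illustrative values only).
baseline = carbon_cost_fraction(40.0, 10.0)

# Hypothetical low-CO2 case: emission 2.5x higher, photosynthesis 40% lower.
low_co2 = carbon_cost_fraction(40.0 * 2.5, 10.0 * 0.6)

print(f"{baseline:.1%}")  # 2.0%
print(f"{low_co2:.1%}")   # 8.3%
```

Under these assumed values the relative cost rises roughly four-fold, because the increase in emission and the decrease in assimilation compound each other.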

This leads to the expectation that losses of isoprene emission should be clustered during the low CO2 periods of the mid-Miocene and Pleistocene. Isoprene emission is generally uniform among species within a genus but variable among genera within a family (Harley et al. 1999). This suggests that losses of isoprene emission occurred after the diversification of the rosids into families and during the development of genus-level diversity. The well-supported Cretaceous origin of the rosid orders and families (Wang et al. 2009) is consistent with a more recent loss of emission for some taxa in the Miocene. Although the divergence dates for rosid genera are not well known, it is clear that many representatives of modern genera were present in the Miocene (Axelrod 1983; Gerhart and Ward 2010). Selection against isoprene emission in a Miocene low CO2 environment could then have led to losses of isoprene emission in some of these ancestral species whose descendants then retained their emission status and diversified into the uniformly emitting or nonemitting genera observed today.


Associate Editor: Dr. Luke Harmon

ACKNOWLEDGMENTS

This work was supported by the US National Science Foundation (Grant No. IOS-0950574) and a gift from ZuvaChem, Inc. The kinetic constants of the Eucalyptus isoprene synthase (clone supplied by the Center for Industrial Biotechnology in partnership with ZuvaChem) were determined by Allison Sutter and Sean Weise. T. D. Sharkey consults for ZuvaChem, a company interested in using isoprene synthases for commercial production of isoprene.
