Identification and Characterization of Bacterial Diterpene Cyclases that Synthesize the Cembrane Skeleton

Digging up skeletons: We report the identification and the functional characterization of two terpene cyclases (DtcycA and DtcycB) that were mined from the genome of Streptomyces sp. SANK 60404. DtcycA and DtcycB are novel bacterial diterpene cyclases for the synthesis of the cembrane skeleton.

octat-9-en-7-ol synthase (CotB2) from cyclooctatin-producing Streptomyces melanosporofaciens MI614-43F2 [13] (Figure S1 in the Supporting Information). Of these four enzymes, Cyc1, SsCPS, and Rv3377c are considered to be class II terpene cyclases. As mentioned above, a diphosphate group is found in the reaction product formed by these class II enzymes. These class II enzymes are accompanied by class I terpene synthases Cyc2, [8] ORF3, [10] and Rv3378c, [14] which react with each reaction product of the class II terpene cyclases (Cyc1, SsCPS, and Rv3377c, respectively) to release a diphosphate group (Figure S1). Interestingly, Cyc1 and SsCPS were found by their proximity to the mevalonate pathway gene cluster, which provides isoprene unit precursors of GGDP. [8,10] However, of the four above-mentioned enzymes, CotB2 is considered to be a class I diterpene synthase because it possesses an NSE motif. [13] In addition, a diphosphate group is removed by the reaction catalyzed by CotB2 to form the reaction product cyclooctat-9-en-7-ol.
Here, we report two novel diterpene cyclases, DtcycA and DtcycB, that were mined from Streptomyces sp. SANK 60404, which was not previously known to be a terpenoid producer. [15,16] To mine these diterpene cyclases from the bacterium, we directed our attention to genes in close proximity to a gene encoding a GGDP synthase, which is indispensable for diterpene production. There are two reasons for this choice of approach: 1) the bacterial diterpene cyclases mentioned above exhibit only a low level of overall sequence similarity, whereas the GGDP synthases from these bacteria display more than 30 % identity, [8,10,13] and 2) a GGDP synthase gene is located near a diterpene cyclase gene in the genomes of these bacteria (Figure S2).
We first de novo sequenced the genome of Streptomyces sp. SANK 60404. An assembly of the sequence reads (estimated coverage, 124-fold) yielded 1565 contigs with a total of 7 706 959 base pairs. To mine the unidentified diterpene cyclases from this bacterium, we first attempted to retrieve the GGDP synthases in the draft sequence because biochemically characterized diterpene cyclases are frequently located in close proximity to GGDP synthase genes, as mentioned above. Using the BLAST program with the amino acid sequence of CotB1 (a Streptomyces GGDP synthase for cyclooctatin biosynthesis) [13] as the query, we found five contigs containing open reading frames that displayed 31-32 % sequence identity to CotB1. We then searched the regions flanking these five open reading frames for putative terpene synthases that include DDXXD/E and/or NSE/DTE motifs (class I terpene synthases) or a DXDD motif (class II terpene cyclases). We found two genes that encoded putative class I terpene synthases (DtcycA and DtcycB) in the regions that flank the GGDP synthase genes, but no gene encoding a putative class II terpene cyclase (Figure S3). DtcycA (371 amino acids, DDBJ/EMBL/GenBank accession number AB738084) has a clear NSE motif, but exhibits only 14 % identity to CotB2 over 332 amino acids. DtcycB (343 amino acids, accession number AB738085) has a clear NSE motif, but exhibits only 19 % identity to CotB2 over 308 amino acids (Figure S4).
With DtcycA and DtcycB as the query sequences, we then performed BLAST searches of the NCBI protein database, and identified terpene cyclase homologues (E values < 1 × 10⁻¹⁰). For the closest homologues, DtcycA showed 27 % identity over 369 amino acids with a putative terpene cyclase (Protein ID, YP_004922572) from Streptomyces flavogriseus ATCC 33331, and DtcycB showed 23 % identity over 338 amino acids with a putative terpene cyclase (Protein ID, ZP_06911744) from Streptomyces pristinaespiralis ATCC 25486. Both homologues are found in a phylogenetic dendrogram based on the analysis of amino acid sequences from bacterial sesquiterpene (C15) cyclase homologues. [17] In addition, it has been reported that sesquiterpene cyclases with similar sequences produce the same major sesquiterpene. [17] Therefore, if a highly homologous terpene cyclase has previously been characterized, a phylogenetic analysis of terpene cyclases could allow prediction of their reaction products. However, we were unable to predict the reaction products synthesized by DtcycA and DtcycB, because none of the closely homologous terpene cyclases has been characterized.
Therefore, to elucidate the enzymatic functions of DtcycA and DtcycB, we overexpressed the corresponding genes in Escherichia coli and characterized the gene products. The molecular masses of the recombinant DtcycA and DtcycB enzymes were estimated at 42 and 38 kDa, respectively, by SDS-PAGE, and at 95 and 69 kDa, respectively, by gel filtration chromatography. These results suggest that both proteins are likely homodimers (Figures S5 and S6). A functional analysis of the recombinant enzymes when using GDP (C10), FDP (C15), and GGDP (C20) as substrates indicated that only GGDP was converted into several reaction products in the presence of MgCl₂. The two main products formed by the DtcycA reaction were detected at 10:29 and 11:13 min in the GC analysis, whereas the three main products formed by DtcycB were detected at 10:04, 11:11, and 11:13 min (Figures 1 and S7). In the absence of MgCl₂, no products were detected in the reaction mixtures.
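The homodimer inference above rests on the ratio of the native mass (gel filtration) to the subunit mass (SDS-PAGE). The sketch below shows that arithmetic with the four masses quoted in the text. The function name subunit_count is hypothetical, and rounding the ratio to the nearest integer is an assumed simplification that ignores the shape dependence of gel filtration estimates.

```python
def subunit_count(native_kda, subunit_kda):
    """Return (mass ratio, nearest integer number of subunits)."""
    ratio = native_kda / subunit_kda
    return ratio, round(ratio)


# (native mass by gel filtration, subunit mass by SDS-PAGE), in kDa, from the text.
MASSES = {"DtcycA": (95, 42), "DtcycB": (69, 38)}

for enzyme, (native, subunit) in MASSES.items():
    ratio, n = subunit_count(native, subunit)
    print(f"{enzyme}: native/subunit = {ratio:.2f}, nearest integer = {n}")
```

Both ratios (about 2.26 and 1.82) round to two, which is consistent with the homodimer assignment.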
Large-scale preparation of the reaction products allowed us to deduce the structures of these compounds. The two DtcycA-catalyzed reaction products were determined to be the isopropylidene isomer of cembrene C (1, (1E,5E,9E)-1,5,9-trimethyl-12-(propan-2-ylidene)cyclotetradeca-1,5,9-triene) [18,19] and (R)-nephthenol (2); [20] two of the three DtcycB-catalyzed reaction products were identified as (R)-nephthenol (2) and (R)-cembrene A (3). [18,20] These structures, with the exception of 1, were elucidated by comparison of the ¹H and ¹³C NMR spectral data and the specific optical rotation of each product with literature data (Tables S1 to S3). Although 1 is a known compound (the isopropylidene isomer of cembrene C), we determined the structure of 1 by extensive NMR spectroscopic analysis, because of the lack of spectral data for the isopropylidene isomer of cembrene C in the literature (Figures S8-S12). [18,19] We thus conclude that both DtcycA and DtcycB encode diterpene cyclases that are capable of forming multiple diterpene products.
The absolute stereochemistry of 4 was assigned by using the modified Mosher method. [21] The reaction of compound 4 with (R,S)-α-methoxy-α-(trifluoromethyl)phenylacetyl (MTPA) chloride in pyridine formed (S,R)-MTPA esters of 4. Based on the ¹H chemical shift differences (Δδ = δ(S) − δ(R)) of the (S,R)-MTPA derivatives, the absolute configuration of the secondary alcohol of 4 was determined to be S (Scheme 1 and Figure S18).
None of the products formed by DtcycA and DtcycB possessed a diphosphate group, which suggests that the DtcycA and DtcycB enzymes initiate carbocation formation with substrate ionization. Therefore, we propose an ionization-dependent cyclization reaction mechanism for the multiple products that are formed by DtcycA and DtcycB (Scheme 2). The mechanism for the reaction catalyzed by the DtcycA enzyme presumably involves ionization of GGDP and formation of a cyclopropyl carbocation intermediate (5). Deprotonation of 5 at C-14 yields 1, and the subsequent attack at the C-15 position of 5 by a water nucleophile yields compound 2. The mechanism of the reaction catalyzed by the DtcycB enzyme also involves the ionization of GGDP and the formation of 5. Compound 2 is then formed as in the mechanism catalyzed by DtcycA. In addition, the deprotonation of 5 at C-16 yields 3, and the nucleophilic attack at the C-14 position of 5 by water yields the novel compound 4.
We calculated the steady-state kinetic constants of DtcycA and DtcycB when using GGDP as the substrate. A typical hyperbolic curve was obtained for product formation as a function of substrate concentration (GGDP, Figures S5 and S6), which indicates that the reaction exhibits Michaelis-Menten kinetics. The steady-state kinetic constants for each cyclase were estimated by incubating the reactions for 3 min and using various GGDP concentrations (10-200 μm). This process yielded a Km value of 93.7 ± 8.4 μm and a kcat value of 2.8 min⁻¹ for the recombinant DtcycA construct, and Km 42.1 ± 7.3 μm and kcat 1.3 min⁻¹ for recombinant DtcycB; these values are comparable to those obtained for the previously characterized Cyc1 from Kitasatospora griseola, [9] SsCPS from Streptomyces sp. KO-3988, [11] and Rv3377c from Mycobacterium tuberculosis H37 (Table 2). [12]
Finally, to determine whether diterpenes 1, 2, 3, and 4 are produced by Streptomyces sp. SANK 60404 in vivo, we attempted to detect these diterpenes in the culture broths of the  SANK 60404 under these standard laboratory culture conditions. However, it is also possible that diterpenes 1, 2, 3, and 4 are converted into as-yet unidentified products by Streptomyces sp. SANK 60404. Interestingly, the DtcycA and DtcycB genes are flanked by hypothetical genes that might be involved in the modification of these diterpene products (Figure S3).
In this study, we found two genes that encode diterpene cyclases, DtcycA and DtcycB, in proximity to GGDP synthase genes in the genome of Streptomyces sp. SANK 60404. In contrast, BLAST searches with CotB2 as the query sequence failed to find DtcycA or DtcycB in the draft genome sequence of Streptomyces sp. SANK 60404.
This finding suggests that database searches that are based on proximity to GGDP synthase genes can be used to find novel diterpene cyclases in public protein databases, and that many currently unrecognized diterpene cyclases might be buried in public protein databases. We then identified diterpene products synthesized by recombinant DtcycA and DtcycB. Serendipitously, these diterpene cyclases formed a diterpene with a novel skeleton and three cembrane diterpenes that have previously been isolated from soft coral [18-20] and some plants, [22,23] but not from bacteria. However, DtcycA and DtcycB show no sequence similarity to cembrene A synthase (CAS2, 614 amino acids, accession number XM_002513288) from the castor oil plant (Ricinus communis) [22] or to cembratrien-ol synthases (NsCBTS2a, 598 amino acids, accession number HM241151; NsCBTS2b, 598 amino acids, accession number HM241152; NsCBTS3, 598 amino acids, accession number HM241153) from tobacco (Nicotiana sylvestris). [23] Thus, DtcycA and DtcycB are novel diterpene cyclases that synthesize the cembrane skeleton.
The crystal structures of two plant diterpene cyclases have been previously resolved: taxadiene synthase and ent-copalyl diphosphate synthase. [24,25] These plant enzymes have three α-helical domains (α, β, and γ). The active site of taxadiene synthase (a class I synthase) is located at the α domain, and an aspartate-rich motif and an NSE/DTE motif are found on opposite sides of the entrance to the active site pocket. In class I synthase family members, these motifs are involved in the binding of Mg²⁺ ions, which is indispensable for substrate recognition.
Because DtcycA and DtcycB catalyze the Mg²⁺-dependent cyclization of GGDP, these motifs were expected to be conserved in both enzymes. In CotB2, which catalyzes the Mg²⁺-dependent cyclization of GGDP, both motifs were clearly detected. However, the alignment of DtcycA and DtcycB with other bacterial class I diterpene synthases clearly revealed NSE/DTE motifs, but the presence of aspartate-rich motifs was ambiguous (Figure S4). A previous study reported that only the NSE/DTE motif is frequently conserved in class I terpene synthases. [26] Further insights into the structural basis of substrate recognition require the determination of the crystal structures of the DtcycA and DtcycB enzymes complexed with their substrates and Mg²⁺ ions. Such crystal structures might be able to answer the particularly intriguing question of why both DtcycA and DtcycB yield 2 in spite of their low sequence similarity.
Enzyme assay with polyprenyl diphosphate substrates: The enzyme assays were performed in HEPES·NaOH (50 mm, pH 7.5) containing MgCl₂ (5 mm), polyprenyl diphosphate (1 mm; GDP, FDP, or GGDP), and recombinant enzyme (1 mg mL⁻¹). The reaction mixture (500 μL) was incubated at 30 °C for 2 h. After incubation, the reaction mixture was extracted with ethyl acetate (500 μL). The organic extracts were then evaporated to dryness under vacuum, and the residues were reconstituted with ethyl acetate (100 μL) for GC-MS analysis (see below).
Large-scale preparation of diterpene products: The large-scale production of the diterpene products was performed in a total volume of 350 mL. The reaction was performed in HEPES·NaOH (50 mm, pH 7.5) containing MgCl₂ (2 mm), GGDP (0.5 mm), and recombinant enzyme (up to 0.2 mg mL⁻¹, added in two consecutive 4 h steps). The reaction mixture was incubated (30 °C, 18 h) and then extracted twice with ethyl acetate (350 mL). After drying over Na₂SO₄, the organic layer was evaporated in vacuo, and the residue was dissolved in methanol (1 mL). The reaction products were purified by preparative HPLC with a PEGASIL ODS column (20 × 250 mm; Senshu Scientific, Tokyo, Japan) and isocratic elution (methanol (90 %), 8 mL min⁻¹). The absorption of the eluate was monitored at 210 nm. In a single preparation, 1 (4.0 mg) and 2 (3.0 mg) were obtained from the DtcycA-catalyzed reaction mixtures, and 2 (5.0 mg), 3 (5.4 mg), and 4 (2.4 mg) were obtained from the DtcycB-catalyzed reaction mixtures. To prepare the (R,S)-α-methoxy-α-(trifluoromethyl)phenylacetyl esters of 4 (below), the DtcycB-catalyzed reaction was performed several times.
Structural analysis of the diterpene products: The structures of the diterpene products were analyzed by ¹H NMR spectroscopy, ¹³C NMR spectroscopy, HMBC, HSQC, and COSY (600 MHz, ECA-600; JEOL, Tokyo, Japan). A high-resolution Triple TOF 5600 MS apparatus (AB SCIEX, Tokyo, Japan) was used to determine the molecular formulae of the reaction products. MS analysis was performed by using electrospray ionization in the positive-ion mode. The optical rotations were recorded with a DIP-1000 polarimeter (JASCO, Tokyo, Japan).
After this reaction mixture (without enzyme) was incubated at 30 °C for 3 min, the reaction was initiated by adding DtcycA (532 μg) or DtcycB (466 μg). The enzyme-dependent oxidation of NADH was monitored by using a UV-1600PC spectrophotometer (Shimadzu, Kyoto, Japan) equipped with a CPS-240A cell holder (Shimadzu) adjusted to 30 °C. Initial velocity was determined from the slope of a plot of NADH consumption against incubation time.
The molar extinction coefficient (ε) of NADH is 6220 m⁻¹ cm⁻¹ at 340 nm. The steady-state kinetic parameters were calculated by using SigmaPlot 12.3 software (Systat Software, Point Richmond, CA).
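The conversion from an absorbance slope to an initial velocity, and the Michaelis-Menten relation used for the kinetic constants, can be sketched as follows. This is an illustration rather than the SigmaPlot analysis itself. It assumes a 1 cm path length, takes ε as 6220 M⁻¹ cm⁻¹, treats the reported Km values as micromolar, and uses hypothetical function names. The stoichiometry between NADH oxidation and diterpene formation in the coupled assay is not modelled.

```python
EPSILON_NADH = 6220.0  # M^-1 cm^-1 at 340 nm, as quoted in the text


def rate_from_absorbance(slope_per_min, path_cm=1.0):
    """Beer-Lambert: convert a dA340/min slope into micromolar NADH per min.

    path_cm = 1.0 is an assumption; the cuvette path length is not stated.
    """
    molar_per_min = abs(slope_per_min) / (EPSILON_NADH * path_cm)
    return molar_per_min * 1e6


def michaelis_menten(s, vmax, km):
    """Initial velocity at substrate concentration s (same units as km)."""
    return vmax * s / (km + s)


def catalytic_efficiency(kcat, km):
    """kcat/Km in min^-1 per micromolar when km is given in micromolar."""
    return kcat / km


# Reported constants: (kcat in min^-1, Km assumed to be in micromolar).
REPORTED = {"DtcycA": (2.8, 93.7), "DtcycB": (1.3, 42.1)}

for enzyme, (kcat, km) in REPORTED.items():
    eff = catalytic_efficiency(kcat, km)
    print(f"{enzyme}: kcat/Km = {eff:.4f} min^-1 uM^-1")
```

With these numbers the two enzymes have nearly the same kcat/Km (about 0.030 min⁻¹ μM⁻¹): the lower turnover of DtcycB is offset by its lower Km.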
Accession numbers: The nucleotide sequences of DtcycA and DtcycB have been deposited in the DDBJ/EMBL/GenBank nucleotide sequence database and assigned the accession numbers AB738084 and AB738085, respectively.