Heartwood-specific transcriptome and metabolite signatures of tropical sandalwood (Santalum album) reveal the final step of (Z)-santalol fragrance biosynthesis


  • The sequences reported in this paper have been deposited in the NCBI GenBank database under accession numbers KU169302 (SaCYP736A167) and BioProject ID PRJNA297453 (RNA-Seq data).


Tropical sandalwood (Santalum album) produces one of the world's most highly prized fragrances, which is extracted from mature heartwood. However, in some places such as southern India, natural populations of this slow-growing tree are threatened by over-exploitation. Sandalwood oil contains four major and fragrance-defining sesquiterpenols: (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol. The first committed step in their biosynthesis is catalyzed by a multi-product santalene/bergamotene synthase. Sandalwood cytochromes P450 of the CYP76F sub-family were recently shown to hydroxylate santalenes and bergamotene; however, these enzymes produced mostly (E)-santalols and (E)-α-exo-bergamotol. We hypothesized that different santalene/bergamotene hydroxylases evolved in S. album to stereo-selectively produce (E)- or (Z)-sesquiterpenols, and that genes encoding (Z)-specific P450s contribute to sandalwood oil formation if co-expressed in the heartwood with upstream genes of sesquiterpene biosynthesis. This hypothesis was validated by the discovery of a heartwood-specific transcriptome signature for sesquiterpenoid biosynthesis, including highly expressed SaCYP736A167 transcripts. We characterized SaCYP736A167 as a multi-substrate P450, which stereo-selectively produces (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol, matching authentic sandalwood oil. This work completes the discovery of the biosynthetic enzymes of key components of sandalwood fragrance, and highlights the evolutionary diversification of stereo-selective P450s in sesquiterpenoid biosynthesis. Bioengineering of microbial systems using SaCYP736A167, combined with santalene/bergamotene synthase, has potential for development of alternative industrial production systems for sandalwood oil fragrances.


Sandalwoods (Santalum spp.) are slow-growing hemi-parasitic trees of the Santalaceae family that are native to tropical and temperate regions of India, Indonesia, Australia and the Pacific Islands (Harbaugh and Baldwin, 2007). For centuries, sandalwood has been used for the production of incense and carvings. High-quality sandalwood oil, extracted from the heartwood (HW) of stems and roots, is used in the manufacture of modern high-end perfumes and cosmetics. Santalum album, commonly known as tropical or Indian sandalwood, is one of the most valuable commercially used sandalwood species. The essential oil of S. album contains predominantly the sesquiterpene alcohols (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol (Figure 1). Of those, (Z)-α-santalol and (Z)-β-santalol are the most important fragrance-defining components of sandalwood oil (Baldovini et al., 2011). The biological functions of santalols in the tree remain unknown, but it is possible that the sesquiterpenoids protect the HW against microbial decay.

Figure 1.

Biosynthetic pathway for santalols and bergamotol in S. album.

The multi-product sesquiterpene synthase SaSSy converts farnesyl diphosphate to the olefins α-santalene, α-exo-bergamotene, epi-β-santalene and β-santalene, which are hydroxylated at the C12 position by multi-substrate stereo-selective P450 enzymes that produce (Z) or (E) isomers of the santalols and exo-bergamotol.

Unsustainable exploitation of wild trees, combined with increasing global demand for sandalwood products, has threatened native sandalwood populations in some places such as southern India (Arun Kumar et al., 2012; Rashkow, 2014). In an effort to address the growing demand for sandalwood oil, sandalwood plantations have been established, for example in northern Australia. However, the productivity of plantations is limited by slow growth, long rotation times, and variation in oil yield. Chemical synthesis of santalols has also been explored (Christenson and Willis, 1980; Brocke et al., 2008; Muratore et al., 2010), but remains uneconomic at an industrial scale. Metabolic engineering of microbial or plant systems may provide an alternative to sustainably produce key components of sandalwood oil. This approach requires elucidation of the genes and enzymes for the biosynthesis of sandalwood oil sesquiterpenoids.

Sesquiterpenoid biosynthesis relies on isopentenyl diphosphate and dimethylallyl diphosphate precursors of the mevalonic acid (MEV) pathway, and their condensation by farnesyl diphosphate synthase (FPPS) (Figure S1). In sandalwood, farnesyl diphosphate is converted by a multi-product sesquiterpene synthase, santalene/bergamotene synthase (SaSSy), into a blend of α-santalene, β-santalene, epi-β-santalene and α-exo-bergamotene (Jones et al., 2011) (Figure 1). Recently, we reported the discovery of nine different cytochrome P450-dependent mono-oxygenases (P450s) of the S. album CYP76F sub-family, which function as multi-substrate santalene/bergamotene oxidases (Diaz-Chavez et al., 2013). These P450s hydroxylate α-santalene, β-santalene, epi-β-santalene and α-exo-bergamotene. However, all nine functionally characterized SaCYP76F enzymes produced predominantly (E)-α-santalol, (E)-β-santalol, (E)-epi-β-santalol and (E)-α-exo-bergamotol, and only produced small amounts of the (Z) stereoisomers, which are characteristic of the sandalwood oil, suggesting that one or more additional enzymes are required for sandalwood oil biosynthesis (Diaz-Chavez et al., 2013).

Given the abundance of the (Z) stereoisomers of α-santalol, β-santalol, epi-β-santalol and α-exo-bergamotol in the HW oil, we hypothesized that: (i) additional santalene/bergamotene-hydroxylating P450s are active in S. album, (ii) the ability of P450s to hydroxylate santalenes and bergamotene evolved in different sub-families of the S. album P450 gene family, (iii) different S. album P450s produce preferentially either the (Z) or (E) stereoisomers of α-, β- and epi-β-santalol and α-exo-bergamotol, (iv) expression of a S. album P450 for stereo-selective biosynthesis of the (Z) stereoisomers is associated with, and perhaps restricted to, HW, and (v) such a spatial pattern of P450 transcript expression may also correlate with expression of transcripts for earlier steps of sandalwood oil sesquiterpenoid biosynthesis.

Here, we describe the validation of these hypotheses, which led to the discovery, cloning and functional characterization of SaCYP736A167, expressed in HW of S. album. SaCYP736A167 encodes a P450 enzyme that stereo-selectively produces (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol as found in authentic sandalwood oil. This enzyme is not closely related to other terpenoid-oxidizing P450s in sandalwood or other species. We demonstrate biosynthesis of (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol in yeast cells expressing SaSSy and SaCYP736A167, providing proof of concept that a bioengineered microbial system has potential for industrial production of sandalwood oil fragrances.


Gene discovery strategy

We tested our hypotheses using an integrated approach comprising metabolite and transcriptome analyses, P450 cDNA cloning, and functional enzyme characterization. We collected samples representing three stages of wood development from stems of four individual 15-year-old S. album trees in a tropical plantation in northern Australia. Samples were defined as sapwood (SW), transition zone (TZ) or heartwood (HW). SW represents the outer and youngest wood tissue and is characterized by a white or pale yellow color, followed towards the center of the stem by a TZ of pink or faint red color, with the oldest HW of red or dark-brown color at the center (Figures 2 and 3). These three zones, which typically display uneven shapes, are characteristic of oil-producing sandalwood trees. Given the variable thickness of these zones, we distinguished SW, TZ and HW phenotypically by the color of wood shavings while they were collected from living trees using a woodborer. We used the same SW, TZ and HW samples from each of the four trees (biological replicates) for sesquiterpenoid metabolite profiling and transcriptome analysis of sesquiterpenoid biosynthesis.

Figure 2.

Sandalwood oil sesquiterpenols accumulate in HW tissue.

Normalized relative amounts of the four major sandalwood oil sesquiterpenols that accumulate preferentially in sandalwood HW: (Z)-α-santalol, (Z)-α-exo-bergamotol, (Z)-epi-β-santalol and (Z)-β-santalol, determined by GC/MS in pentane extracts (Figure 3). Values are means and standard deviation of four biological replicates. The inset shows a cross-section from the base of a 15-year-old S. album stem, indicating the change in color across the gradient of wood maturation (SW, sapwood; TZ, transition zone; HW, heartwood).

Figure 3.

Sesquiterpenoid profile of HW tissue extracts.

Sesquiterpenoids were identified by GC/MS in pentane extracts of HW tissue. Peaks representing sesquiterpenoids that were identified are indicated by numbers in the order of retention time. Individual segments (a)–(c) of the GC profile are magnified. Sesquiterpenoids were identified by comparison to reference mass spectra of the National Institute of Standards and Technology (NIST http://chemdata.nist.gov/) and searches against the Wiley W9N08L library, and, where available, by comparison with published linear retention indices.

Sesquiterpenol profiles across the gradient of S. album wood maturation

Across the gradient of wood maturation from SW to HW, we detected the major sesquiterpenols (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol, and additional minor sesquiterpenoids (Figure 3). Separation of (Z) and (E) stereoisomers of these sesquiterpenols was achieved by GC/MS. In HW, (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol were found in a ratio of 13:6:1:2 (Figures 2 and 3). The relative tissue distribution of all four sesquiterpenols combined was 0.4% in SW, 0.1% in TZ and 99.5% in HW. While confirming that sandalwood oil accumulates predominantly in the HW, this pattern of accumulation does not rule out that biosynthesis may occur in SW or the TZ, followed by transport into HW.
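The 13:6:1:2 ratio can be restated as percentage shares of the four-compound total. The following is a minimal sketch of that arithmetic; the dictionary keys and the function name are illustrative, and the shares refer only to the sum of these four sesquiterpenols, not to the whole oil.

```python
# Ratio of the four major HW sesquiterpenols as reported in the text (13:6:1:2).
HW_RATIO = {
    "(Z)-alpha-santalol": 13,
    "(Z)-beta-santalol": 6,
    "(Z)-epi-beta-santalol": 1,
    "(Z)-alpha-exo-bergamotol": 2,
}


def ratio_to_percent(ratio):
    """Convert ratio parts into percentage shares of their combined total."""
    total = sum(ratio.values())
    return {name: 100.0 * part / total for name, part in ratio.items()}


shares = ratio_to_percent(HW_RATIO)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

On this basis, (Z)-α-santalol and (Z)-β-santalol together make up 19 of 22 parts of the four major sesquiterpenols in HW.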

Transcriptome signatures of sesquiterpenoid biosynthesis across the gradient of S. album wood maturation

To identify spatial patterns of sesquiterpenoid biosynthesis in sandalwood stems, we established a comprehensive reference transcriptome from a total of 12 separate SW, TZ and HW libraries produced from four trees. The transcriptome represents a total of 593 510 non-redundant transcripts with a mean length of 864 bp, which are translated into 111 422 predicted non-redundant peptides with a mean length of 230 amino acids. Across the four biological replicates, we identified 2263 transcripts as differentially expressed between SW and TZ, 829 transcripts as differentially expressed between TZ and HW, and 2874 transcripts as differentially expressed between HW and SW (more than fourfold difference of transcript abundance; P < 0.01, FDR < 0.05). These results indicate that there are substantial differences in the transcriptome profiles of different stages of S. album wood maturation.
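The three cut-offs used to call a transcript differentially expressed (more than fourfold difference, P < 0.01, FDR < 0.05) can be written as a single filter. This is only a sketch of the thresholding step: the statistical software that produced the fold changes, P values and FDR values is not named in this section, and the record type, identifiers and toy values below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DEResult:
    transcript_id: str       # hypothetical identifier, not from the paper
    log2_fold_change: float  # log2(abundance in tissue A / abundance in tissue B)
    p_value: float
    fdr: float


def is_differentially_expressed(result, min_fold=4.0, max_p=0.01, max_fdr=0.05):
    """Apply the thresholds stated in the text: >4-fold, P < 0.01, FDR < 0.05."""
    return (
        abs(result.log2_fold_change) > math.log2(min_fold)
        and result.p_value < max_p
        and result.fdr < max_fdr
    )


# Toy values for illustration only; they are not data from the study.
toy_results = [
    DEResult("tx_1", 3.1, 0.001, 0.01),    # passes all three thresholds
    DEResult("tx_2", -2.5, 0.004, 0.03),   # passes (direction does not matter)
    DEResult("tx_3", 1.5, 0.0001, 0.001),  # fails: less than fourfold
    DEResult("tx_4", 4.0, 0.02, 0.04),     # fails: P is not below 0.01
]

de_ids = [r.transcript_id for r in toy_results if is_differentially_expressed(r)]
print(de_ids)
```

A fourfold difference corresponds to an absolute log2 fold change of 2, so the filter treats up- and down-regulation symmetrically.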

Transcripts were annotated manually for all steps of the MEV and methylerythritol phosphate (MEP) pathways and FPPS based on the Arabidopsis thaliana reference genes (Lange et al., 2000a; Zhang et al., 2005), and for terpene synthases and P450s using databases described by Zerbe et al. (2013). Expression analysis of these annotated transcripts revealed strong, HW-specific signatures for sesquiterpenoid biosynthesis (Figure 4). HW showed preferential expression of MEV and FPPS transcripts, contrasting with low expression of the MEP pathway (Figure 4). Among the set of annotated terpene synthases, the sesquiterpene synthases santalene/bergamotene synthase (SaSSy) and sesquisabinene synthase (SaSSABS) were highly and differentially expressed in HW (Figure 4). SaSSy ranked among the top 0.1% most highly expressed transcripts in HW. In contrast, MEP pathway transcripts were preferentially expressed in SW and expressed at a low level in HW. These patterns, together with the HW accumulation of sesquiterpenols (Figure 2), suggested that biosynthesis of sandalwood oil occurs in the HW; they do not support a scenario of sandalwood oil biosynthesis in the younger SW or TZ. Based on the discovery of a unique sesquiterpenoid transcriptome signature in the HW, we focused our search for P450s involved in sandalwood oil biosynthesis on transcripts with preferential HW expression.

Figure 4.

Transcriptome signatures for terpenoid biosynthesis in sandalwood stems across the gradient of wood maturation.

Transcript abundance was assessed for (a) terpenoid biosynthesis genes in the MEP and MEV pathways, plus FPPS, and (b) terpene synthase genes. All sesquiterpenoid biosynthetic genes of the MEV pathway, FPPS, and functionally known sandalwood sesquiterpene synthases (SaSSy, SaBS, SaSSABS1 and SaSSABS2) showed highly preferential HW expression. In contrast, MEP pathway transcripts were part of a SW-specific cluster. Gene names are as indicated in Figure S1. Genes shaded in grey are significantly differentially expressed. Expression values in each row are z-score-normalized as described in Experimental procedures.
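The row-wise z-score normalization mentioned in this legend can be sketched as follows. The exact procedure is described in Experimental procedures and is not reproduced here, so this example assumes the common convention of subtracting the row mean and dividing by the sample standard deviation; the function name and the toy expression values are hypothetical.

```python
import statistics


def zscore_row(values):
    """Return z-scores for one transcript across samples (one heat-map row)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    if sd == 0:
        # A transcript with identical values in all samples has no variation to scale.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Toy abundance values for one transcript in SW, TZ and HW (illustration only).
toy_row = [2.0, 4.0, 90.0]
print([round(z, 2) for z in zscore_row(toy_row)])
```

Because each row is scaled independently, the resulting values show where a transcript is relatively most abundant across SW, TZ and HW, but they do not allow abundance to be compared between transcripts.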

SaCYP736A167 is highly and selectively expressed in S. album heartwood

We identified sandalwood P450 transcripts by homology search against a previously described P450 protein database (Zerbe et al., 2013). Across the complete sandalwood transcriptome, we found 116 unique assembled transcripts encoding predicted P450 proteins of more than 400 amino acids in length, 82 of which were apparently full-length (>495 amino acids). Candidate P450s were selected based on HW-preferential expression (Figure 5a), matching the expression of SaSSy, FPPS and MEV pathway genes as well as sesquiterpenol profiles. Family and sub-family annotations were assigned by phylogenetic clustering and on the basis of bi-directional best BLAST hits (http://www.genome.jp/tools/kaas/). Using the criteria that the target P450 must be both preferentially and highly expressed in HW, we found that a P450 of the CYP736A sub-family (SaCYP736A167) was the most highly expressed P450 transcript in the HW (Figure 5b).
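The candidate selection described above combines a length filter with two expression criteria, HW preference and high HW abundance. The sketch below illustrates one way to express that logic. Only the 400-amino-acid length cut-off comes from the text; the fourfold HW/SW ratio threshold, the pseudocount, the candidate names and all expression values are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class P450Candidate:
    name: str            # hypothetical label, not a real transcript name
    peptide_length: int  # predicted protein length in amino acids
    sw_cpm: float        # abundance in sapwood, counts per million
    hw_cpm: float        # abundance in heartwood, counts per million


def hw_log2_ratio(candidate, pseudocount=1.0):
    """log2 of HW over SW abundance; the pseudocount avoids division by zero."""
    return math.log2(
        (candidate.hw_cpm + pseudocount) / (candidate.sw_cpm + pseudocount)
    )


def rank_candidates(candidates, min_length=400, min_log2_ratio=2.0):
    """Keep long, HW-preferential P450s and sort them by HW abundance."""
    kept = [
        c for c in candidates
        if c.peptide_length > min_length and hw_log2_ratio(c) > min_log2_ratio
    ]
    return sorted(kept, key=lambda c: c.hw_cpm, reverse=True)


# Toy values for illustration only; they are not data from the study.
toy_candidates = [
    P450Candidate("P450_a", 500, 1.0, 300.0),    # HW-preferential and abundant
    P450Candidate("P450_b", 510, 200.0, 250.0),  # abundant but not HW-preferential
    P450Candidate("P450_c", 380, 0.0, 500.0),    # too short
    P450Candidate("P450_d", 496, 3.0, 40.0),     # HW-preferential, lower abundance
]

ranked = rank_candidates(toy_candidates)
print([c.name for c in ranked])
```

In this toy ranking the first entry plays the role that SaCYP736A167 has in Figure 5b: the P450 that is both HW-preferential and most abundant in HW.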

Figure 5.

P450 genes of S. album form a HW-specific transcriptome cluster with highly expressed CYP736A167 transcripts.

(a) Annotated non-redundant S. album P450 transcripts were mapped onto the gradient of wood maturation covering SW, TZ and HW. SW and HW tissues show distinct clusters of P450 transcripts, with several P450s showing HW-preferential expression. Expression values in each row are z-score-normalized as described in Experimental procedures.

(b) Scatter plot of P450 transcript abundance in SW versus HW (expressed as counts per million, cpm) identified SaCYP736A167 as the top candidate for a role in HW sandalwood oil biosynthesis based on the two criteria of (i) HW specificity and (ii) transcript abundance.

SaCYP736A167 produces (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol

To test the function encoded by the SaCYP736A167 transcript, a codon-optimized cDNA was co-expressed with the S. album cytochrome P450 reductase gene SaCPR2 (Diaz-Chavez et al., 2013) in yeast (Saccharomyces cerevisiae) cells. Microsomal SaCPR2/SaCYP736A167 was assayed for enzyme activity in vitro using NADPH and purified α-santalene or β-santalene, or these two substrates combined. SaCYP736A167 converted α-santalene into (Z)-α-santalol and β-santalene into (Z)-β-santalol as determined by GC/MS retention times and mass spectra of assay products, which matched those of authentic components of sandalwood oil (Figure 6 and Figure S3). We also tested SaCYP736A167 using the complete set of the four SaSSy products as substrates (Figure 6 and Figure S4). In these assays, α-santalene, β-santalene, epi-β-santalene and α-exo-bergamotene were converted into (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol, respectively. Using α-santalene and β-santalene, obtained by purification of SaSSy products, as substrates, SaCYP736A167 showed apparent Km values of 2.7 μM (Vmax 0.24 μM min−1) and 35.8 μM (Vmax 0.65 μM min−1), respectively.
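To illustrate what the apparent kinetic constants imply, the sketch below evaluates reaction rates under a Michaelis-Menten model. The use of this model is an assumption: the reported Km and Vmax values suggest it, but the fitting procedure is not described in this section. Concentrations are taken as micromolar and rates as micromolar per minute; the function and variable names are illustrative.

```python
def michaelis_menten_rate(substrate_um, km_um, vmax_um_per_min):
    """Rate v = Vmax * [S] / (Km + [S]) for a single substrate."""
    if substrate_um < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax_um_per_min * substrate_um / (km_um + substrate_um)


# Apparent constants reported in the text for SaCYP736A167 microsomes:
# substrate -> (Km in micromolar, Vmax in micromolar per minute).
KINETICS = {
    "alpha-santalene": (2.7, 0.24),
    "beta-santalene": (35.8, 0.65),
}

# Compare the predicted rates for both substrates at a few concentrations.
for substrate_um in (1.0, 10.0, 100.0):
    for name, (km, vmax) in KINETICS.items():
        rate = michaelis_menten_rate(substrate_um, km, vmax)
        print(f"[S] = {substrate_um:6.1f}  {name}: v = {rate:.3f}")
```

Under this model the rate equals half of Vmax when the substrate concentration equals Km, so the lower apparent Km for α-santalene means that the enzyme approaches its maximal rate at a much lower concentration of α-santalene than of β-santalene.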

Figure 6.

Sesquiterpenols formed in in vitro assays using SaCYP736A167.

Sesquiterpene substrates were incubated with microsomes containing SaCYP736A167 and SaCPR2 expressed in yeast.

(a–d) Total ion chromatograms were obtained for assays using (a) α-santalene alone, (b) β-santalene alone, (c) α-santalene and β-santalene, and (d) products of SaSSy activity, specifically α-santalene, β-santalene, epi-β-santalene and α-exo-bergamotene (Figure S4).

(e) Total ion chromatogram for authentic S. album oil.

(f) Control assays were performed using microsomes isolated from yeast cells transformed with the empty vector.

Products were identified as (Z)-α-santalol (peak 1), (Z)-α-exo-bergamotol (peak 2), (Z)-epi-β-santalol (peak 3) and (Z)-β-santalol (peak 4). Mass spectra of compounds corresponding to peaks 1–4 are shown in Figure S3.

Engineered yeast produces sandalwood sesquiterpenols

To validate the results from in vitro assays and assess the potential to produce the major sandalwood oil components in a microbial host, we reconstructed a biosynthetic pathway including FPPS, SaSSy, SaCPR2 and SaCYP736A167 in yeast strain AM94 (Ignea et al., 2012). Yeast cells expressing FPPS and SaSSy produced santalenes and bergamotene, but no detectable levels of the corresponding sesquiterpenols, while cells transformed with FPPS, SaSSy, SaCPR2 and SaCYP736A167 formed the four main sandalwood oil sesquiterpenols (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol (Figure 7).

Figure 7.

In vivo biosynthesis of sandalwood sesquiterpenols in engineered yeast.

(a–c) Total ion chromatograms of extracts of engineered yeast AM94 strains: (a) control strain transformed with empty vectors, (b) strain expressing FPPS and SaSSy, (c) strain expressing FPPS and SaSSy together with SaCYP736A167 and SaCPR2.

(d) Total ion chromatogram for authentic S. album oil.

Sesquiterpenols were identified by GC/MS as (Z)-α-santalol (peak 1), (Z)-α-exo-bergamotol (peak 2), (Z)-epi-β-santalol (peak 3) and (Z)-β-santalol (peak 4). Peaks indicated by asterisks correspond to farnesol, which was present in all extracts. The peaks in (c) indicated by hash symbols (#) represent yeast modifications of santalols (Diaz-Chavez et al., 2013).

Phylogenetic relationships for SaCYP736A167

We reconstructed a phylogeny of plant CYP736A sub-family members (Figure 8) including SaCYP736A167 and functionally characterized CYP736A members from Lotus japonicus, Sorbus aucuparia and Malus × domestica. Selected CYP71, CYP84 and CYP750 sub-family members (belonging to the CYP71 clan) were also included, some of which are known to be involved in terpenoid oxidation (Table S1). Overall, SaCYP736A167 does not cluster with previously characterized P450s involved in terpenoid oxidation. The most closely related and functionally defined P450s are L. japonicus LjCYP736A2 (cyanogenic glucoside metabolism; Takos et al., 2011) and S. aucuparia SoCYP736A107 (noraucuparin formation in biphenyl phytoalexin biosynthesis; Sircar et al., 2015). Among P450s with a known function in terpenoid oxidation, the CYP750B1 sabinene hydroxylase of the gymnosperm Thuja plicata appears to be the most closely related (Gesell et al., 2015).

Figure 8.

Phylogenetic relationship for SaCYP736A167.

Maximum-likelihood phylogeny for SaCYP736A167 and P450s of the CYP71 clan, including sub-families CYP736A, CYP750, CYP84 and CYP71. Products of functionally characterized CYP736A enzymes, including SaCYP736A167, are shown with their structures. For comparison, SaCYP76F39v1 and its product are also included (Diaz-Chavez et al., 2013). Bootstrap values greater than 75% from 1000 replicates are indicated by solid black circles at branch points. The scale bar represents 0.2 amino acid substitutions per site. A. thaliana CYP51G1 served as an outgroup. Species included in the tree are: Aa, Artemisia annua; At, Arabidopsis thaliana; Bc, Bupleurum chinense; Bs, Barnadesia spinosa; Ci, Cichorium intybus; Eg, Eucalyptus globulus; Gm, Glycine max; Hm, Hyoscyamus muticus; Ls, Lactuca sativa; Lj, Lotus japonicus; Md, Malus × domestica (cv. Golden Delicious and Holsteiner Cox); Pc, Pyrus communis; Pg, Panax ginseng; Pm, Prunus mume; Sa, Santalum album; Sl, Solanum lycopersicum; So, Sorbus aucuparia; Vv, Vitis vinifera. Accession numbers and known functions are listed in Table S1.


Despite the long history of humans using tropical sandalwood for traditional ceremonial and modern industrial purposes, none of the genes that are specific to biosynthesis of the fragrance-defining sesquiterpenols were known until recently (Jones et al., 2011). This lack of progress was partly due to sandalwood trees being recalcitrant to biochemical and functional genomic studies, requiring access to oil-producing trees in remote locations, collection of biological materials under tropical field conditions, and isolation of RNA from HW. In the last few years, genes and enzymes have been reported for all but the critical final step in biosynthesis of the most valuable sesquiterpenol components of tropical sandalwood and related Santalum species. Specifically, we have cloned and characterized FPPS and a suite of sesquiterpene synthases including SaSSy (Jones et al., 2011; Moniodis et al., 2015), plus several monoterpene synthases for biosynthesis of minor sandalwood compounds (Jones et al., 2008). The elusive final step in biosynthesis of the four main quality-defining sandalwood oil components, namely (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol, was proposed to involve stereo-selective hydroxylation of the four SaSSy products α-santalene, β-santalene, epi-β-santalene and α-exo-bergamotene. Members of the sandalwood SaCYP76F sub-family catalyze hydroxylation of all four SaSSy products; however, the resulting sesquiterpenols were mostly of (E) stereochemistry (Diaz-Chavez et al., 2013).

Here, we successfully validated the following line of hypotheses: several different santalene/bergamotene hydroxylases exist in S. album, whereby different stereo-selective P450s that use the same santalene and bergamotene substrates evolved in distant P450 clades [namely the CYP736A (this work) and the CYP76F (Diaz-Chavez et al., 2013) sub-families]. While SaCYP76F enzymes preferentially produce the (E)-stereoisomers, SaCYP736A167 produces almost exclusively the (Z)-stereoisomers of α-, β- and epi-β-santalol and α-exo-bergamotol. Our discovery of SaCYP736A167 was based on the underlying hypothesis that transcript expression of a P450 for stereo-selective biosynthesis of the (Z) stereoisomers is associated primarily with HW, and that this spatially selective P450 expression correlates with expression of upstream steps of sesquiterpenoid biosynthesis.

Genome- and transcriptome-enabled discovery of P450s of terpenoid biosynthesis may be informed by protein phylogeny, whereby P450 candidates are selected if they cluster with CYP sub-families containing enzymes of known function in terpenoid metabolism (Ro et al., 2005; Zerbe et al., 2013). However, this strategy excludes candidate genes in P450 clades with different or unknown functions. For example, SaCYP736A167 belongs to a sub-family for which biochemical functions were only recently identified, none of which involved a terpenoid substrate (Takos et al., 2011; Sircar et al., 2015). Thus, association by function would not have revealed SaCYP736A167 as the P450 of interest. An alternative approach is to screen for P450s with genomic locations near terpene synthase genes (Boutanaev et al., 2015). This approach is applicable in plant species with well-assembled genome sequences, but does not apply for the many plant species without assembled genomes, such as sandalwood, or for species where genome assembly remains a challenge (Warren et al., 2015). Here, we applied an alternative screening strategy for sandalwood P450s, which was completely relaxed from the criterion of clade-specific association by function. Instead, our strategy focused on association with the broader biosynthetic process, combined with spatial and temporal patterns of transcript expression and metabolite accumulation. Our approach was based on hypothesized tissue-specific and developmentally controlled expression in HW, and association of P450 expression with the expression of upstream sesquiterpenoid pathway genes and sesquiterpenol accumulation.

We found highly distinctive transcriptome signatures for the MEP pathway associated with younger SW, in contrast with the MEV pathway plus downstream sesquiterpenoid transcriptome signatures associated with HW. The HW-specific transcriptome signature of sesquiterpenoid biosynthesis enabled the discovery of SaCYP736A167, which is highly and selectively expressed in HW and encodes an enzyme that stereo-selectively produces an authentic sandalwood oil profile comprising (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol. The clear delineation of SW and HW transcriptome signatures was surprising, because our field-based collections in a remote tropical plantation were not performed using any means of tissue micro-dissection, as would be possible in the laboratory. Instead, small samples of wood shavings were taken with a wood borer from living trees, and samples were phenotypically scored by eye for differences in color associated with HW formation. Nevertheless, this approach separated oil-producing and -accumulating HW from the non-accumulating SW and TZ. The observed resolution of contrasting SW- and HW-specific transcriptome signatures and metabolite profiles was independently reproduced using samples from four trees. Despite harvesting RNA from a highly recalcitrant mature wood tissue, the HW transcriptome quality was such that it covered all steps of sandalwood oil biosynthesis. We identified a tissue age-dependent transition from predominantly MEP pathway expression (in younger SW) to predominantly MEV pathway expression (in older HW) along the multi-year gradient of wood maturation. While cell type-specific dominance of a particular terpenoid pathway has been reported previously, for example in glandular trichomes of various plant species (Lange et al., 2000a,b; Wang et al., 2008; Schilmiller et al., 2009), sandalwood xylem showed an age-dependent pattern of separation of MEP and MEV pathways in the same tissue. The underlying mechanisms controlling this shift from predominance of the MEP pathway in younger SW to predominance of the MEV pathway in older HW remain unknown. They may include selective up- and down-regulation of these two pathways over time in the same cell types. Alternatively, as the wood tissue ages, the metabolism of specific cell types specialized for sesquiterpenol biosynthesis may come to the fore, while metabolism of other cell types ceases, perhaps associated with cell death in maturing HW.

Accumulation of secondary metabolites in HW appears to be a common feature in many tree species. Examples include accumulation of anti-microbial thujaplicins and lignans in the HW of cedars (Morris and Stirling, 2012) or tannins in the HW of oak, the latter also being appreciated for barrel-aging of wines (Hillis, 1987). Deposition of secondary metabolites in HW is thought to be a consequence of programmed cell death, which results in ray parenchyma and other previously metabolically active cells releasing their contents into the apoplastic space. However, our work shows that the complete sesquiterpenoid pathway of sandalwood oil biosynthesis comprising the MEV pathway as well as FPPS, SaSSy and SaCYP736A167 becomes selectively up-regulated in the sandalwood HW, despite much of the HW comprising dead xylem. Specialized HW ray parenchyma cells have been suggested to play a role in the biosynthesis of sandalwood oil (Jones et al., 2008). It is indeed possible that these cells do not undergo cell death, but become metabolically specialized for biosynthesis of sesquiterpenoids, whose accumulation may protect the HW against microbial decay.

The CYP736A sub-family of P450s appears to be present in some but not all plant species. For example, monocots seem to lack CYP736As. Earlier work in grapevine (Vitis vinifera) and soybean (Glycine max) suggested that CYP736As may be involved in interactions with pathogenic or symbiotic micro-organisms (Cheng et al., 2010; Guttikonda et al., 2010). Recently, two CYP736As with roles in cyanogenic glucoside biosynthesis in Lotus japonicus (Takos et al., 2011) and biphenyl phytoalexin biosynthesis in rowan and apple (Sircar et al., 2015) were functionally characterized. The substrates of these two CYP736As are as different from one another as they are from the sesquiterpene substrates of SaCYP736A167 (Figure 8). Together, these results suggest that different CYP736As play species-specific and diverse roles in plant secondary metabolism, but are not essential across the plant kingdom.

Future work will investigate the differences that control the opposite stereo-selectivity of SaCYP736A167 and SaCYP76F39v1 (Diaz-Chavez et al., 2013), which act on the same substrates. SaCYP736A167 and SaCYP76F39v1 only share 36% amino acid sequence identity (Figure S4), belong to distant P450 clades, and only SaCYP736A167, but none of the SaCYP76Fs, showed high HW-preferential expression. Identification of transcription factors and promoter elements that effect differential expression of SaCYP736A167 and SaCYP76Fs may shed light on the control of HW-specific accumulation of sandalwood oil. Substrate affinity for α-santalene appears to be higher by an order of magnitude for SaCYP736A167 (Km = 2.7 μM) compared to SaCYP76F39v1 (Km = 26 μM), while the affinity for β-santalene is similar for SaCYP736A167 (Km = 36 μM) and SaCYP76F39v1 (Km = 34 μM) (Diaz-Chavez et al., 2013). These basic kinetic parameters were obtained using microsomes, which do not represent a purified protein, and it was not possible to assess the amount of active CYP736A167 protein in microsome preparations due to inconclusive CO spectra.

Following recent work on formation of santalenes in Escherichia coli and yeast (Jones et al., 2011; Chen et al., 2013; Diaz-Chavez et al., 2013) and formation of (E)-santalols in yeast (Diaz-Chavez et al., 2013), we have successfully completed metabolic engineering of a biosynthetic pathway to produce the highly desirable (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol in a recombinant yeast strain expressing SaCYP736A167 together with SaSSy, SaCPR2 and SaFPPS. We have thereby identified all of the genes that are essential for metabolic engineering of a microbial system to produce sandalwood oil constituents, and tested them in a successful proof-of-concept experiment. With additional strain and process optimization, such a biotechnological system holds potential to supplement traditional sandalwood oil production. Use of gene expression probes for SaSSy and SaCYP736A167 may also allow plantation managers to non-destructively assess the variable capacity of sandalwood trees for oil production. Furthermore, the unique HW-specific gene expression signatures of sesquiterpenoid biosynthesis will allow delineation of developing sandalwood HW from the TZ and SW using molecular probes that measure (and may predict) onset of oil accumulation under normal growth conditions. This will allow managers to test the effectiveness of treatments that may induce oil production in younger plantation trees, for example by means of elicitor or stress treatments that induce terpenoid accumulation in the xylem of other woody plants (Martin et al., 2002; Zulak et al., 2009).

In conclusion, this work completes the discovery of genes necessary for the biosynthesis of (Z)-α-santalol, (Z)-β-santalol, (Z)-epi-β-santalol and (Z)-α-exo-bergamotol in the HW of tropical sandalwood. The discovery of the previously elusive SaCYP736A167 was enabled by identification of a HW-specific transcriptome signature for sesquiterpenoid biosynthesis, which contrasted with transcriptome signatures for the younger wood tissues. Stereo-selective activity of SaCYP736A167 evolved in a CYP sub-family, CYP736A, that was not previously known to include enzymes of terpenoid metabolism. Our results enable development of practical applications, involving metabolic engineering of micro-organisms and use of biomarkers in plantations, to improve existing and develop new production systems for sandalwood oil fragrances.

Experimental procedures

Plant material and RNA isolation

Wood samples were collected in the form of shavings using a hand-driven drill from the stems of four 15-year-old S. album trees grown in a commercial sandalwood plantation in Kununurra, northern Australia, on 2nd May 2014. Trees were sampled at 50 cm from the ground. Progressive samples from the outer SW, intermediate TZ and inner HW were separated on the basis of color (SW, white/yellow; TZ, pink/red; HW, red/dark-brown). Samples were frozen and stored in liquid nitrogen, shipped to the laboratory and stored at −80°C. Frozen tissue was ground to a fine powder, and RNA was extracted using PureLink plant RNA reagent (Invitrogen, https://www.thermofisher.com/ca/en/home/brands/invitrogen.html) with addition of glycogen at a final concentration of 3.3 ng μl−1 prior to RNA precipitation. RNA quality and concentration were assessed using an Agilent Technologies 2100 Bioanalyzer (www.agilent.com).

Transcriptome sequencing, assembly and annotation

Approximately 700 ng total RNA per sample was used for strand-specific library preparation and sequencing at McGill University/Génome Québec Innovation Centre (Montreal, Canada). Separate libraries were produced for SW, TZ and HW from the four trees (four biological replicates). rRNA was depleted prior to library construction using a Ribo-Zero plant kit (Illumina, http://www.illumina.com/). Sequencing was performed on an Illumina HiSeq 2000 platform using 100-cycle paired-end sequencing, generating a total of 618 million 100 bp paired-end reads from the 12 libraries. Initially, we combined all sequence reads, removed rRNA sequences, and trimmed adapters using Trimmomatic (Lohse et al., 2012). The remaining 559 million paired-end reads were merged into single-end reads using the BBMerge tool (BBmap software; http://sourceforge.net/projects/bbmap/). Merged single-end reads and unmerged paired-end reads were assembled de novo using Trinity (Haas et al., 2013), generating 593 510 non-redundant transcripts with a mean length of 864 bp. Predicted peptides were determined using Transdecoder (Haas et al., 2013), generating 111 422 predicted non-redundant peptides with a mean length of 230 amino acids. Candidate P450s and terpene synthases were identified and annotated by homology search using customized protein databases (Zerbe et al., 2013). MEP and MEV pathway genes were identified using A. thaliana sequences from PlantCyc (http://www.plantcyc.org/) as reference, and annotated manually.
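As a quick arithmetic check on the read numbers above, the short Python sketch below computes the fraction of reads retained after rRNA removal and adapter trimming (roughly 90%) and the mean raw depth per library (about 51.5 million read pairs). The function and constant names are illustrative only, and the per-library value is a simple average; actual depth of individual libraries is not given in the text.

```python
# Read counts as reported in the text, in millions of 100 bp paired-end reads.
RAW_READS_M = 618        # total reads from all libraries
FILTERED_READS_M = 559   # reads remaining after rRNA removal and trimming
N_LIBRARIES = 12         # 3 tissues (SW, TZ, HW) x 4 trees


def retained_fraction(raw, kept):
    """Fraction of reads that survive filtering."""
    return kept / raw


def mean_reads_per_library(total, n_libraries):
    """Average depth per library; individual libraries will differ."""
    return total / n_libraries


retained = retained_fraction(RAW_READS_M, FILTERED_READS_M)
mean_depth = mean_reads_per_library(RAW_READS_M, N_LIBRARIES)
print(f"retained after filtering: {retained:.1%}")
print(f"mean raw depth: {mean_depth:.1f} million read pairs per library")
```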

Identification of differentially expressed genes and tissue-specific expression signatures

Transcript abundance was determined using Salmon (http://combine-lab.github.io/salmon/) (Patro et al., 2014) based on k-mer counts from each library, and expressed as simulated read counts. Read counts estimated using Salmon for each library were normalized using the R package edgeR (Robinson et al., 2010), and analyzed for differentially expressed genes between the three different wood samples (SW, TZ and HW) for each of the four biological replicates using the ‘generalized linear models’ function. The false discovery rate was set at 5%. Transcripts with a more than fourfold difference in transcript abundance (P < 0.01, FDR < 0.05) were defined as differentially expressed. Hierarchical clustering was performed by simultaneously comparing normalized read count data from all three datasets (SW, TZ and HW) using Pearson's correlation as a distance measure of transcripts and the complete linkage algorithm for clustering. Heatmaps were generated using the heatmap.2 function of the R package gplots version 2.17.0 (https://cran.r-project.org/web/packages/gplots/) with the scale = ‘row’ option to z-score-normalize rows. Z-scores were calculated as: z = (x − mean)/SD, where x is the mean transcript abundance in a given tissue for the four biological replicates, the mean represents the mean transcript abundance across all three tissues (row; SW, TZ and HW), and SD is the standard deviation of transcript abundance across all three tissues (row; SW, TZ, and HW).
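The row z-score and the differential-expression thresholds described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the abundance values are hypothetical, the function names are invented for this example, and the sample SD (n − 1 denominator, as computed by R's sd()) is assumed for the z-score.

```python
import math
import statistics


def row_z_scores(tissue_means):
    """Z-score one transcript's per-tissue means: z = (x - mean) / SD.

    Assumes the sample SD (n - 1 denominator), as R's sd() computes.
    """
    mean = statistics.mean(tissue_means)
    sd = statistics.stdev(tissue_means)
    if sd == 0:
        # Choice made for this sketch: a flat row maps to zeros, not NaN.
        return [0.0 for _ in tissue_means]
    return [(x - mean) / sd for x in tissue_means]


def is_differentially_expressed(fold_change, p_value, fdr):
    """Thresholds stated in the text: more than fourfold, P < 0.01, FDR < 0.05."""
    more_than_fourfold = abs(math.log2(fold_change)) > 2  # up or down
    return more_than_fourfold and p_value < 0.01 and fdr < 0.05


# Hypothetical normalized abundances for one transcript, 4 replicates per tissue.
replicates = {
    "SW": [1.0, 3.0, 2.0, 2.0],
    "TZ": [4.0, 4.0, 3.0, 5.0],
    "HW": [6.0, 7.0, 5.0, 6.0],
}
# x in the formula is the mean of the four biological replicates per tissue.
tissue_means = [statistics.mean(replicates[t]) for t in ("SW", "TZ", "HW")]
z = row_z_scores(tissue_means)
print(dict(zip(("SW", "TZ", "HW"), z)))
```

Because each row is centered and scaled independently, the heatmap colors show in which tissue a transcript is relatively most abundant, not how abundant it is in absolute terms.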

cDNA cloning, yeast transformation, microsome preparation and P450 assays

The full-length cDNA for SaCYP736A167 was PCR-amplified from sandalwood cDNA template using primers 5′-CTTCCTCAACATGTCTCCG-3′ and 5′-GGACTCCAAGCGATAGGTT-3′, cloned into the pJET vector (Thermo Fisher Scientific), and sequence-verified. A yeast codon-optimized version was synthesized using GeneArt (https://www.thermofisher.com/ca/en/home/life-science/cloning/gene-synthesis/geneart-gene-synthesis.html) custom algorithms and services, cloned into pYeDP60 (Hamann and Møller, 2007) and transformed into BY4741 yeast cells (GE Life Sciences, http://www.gelifesciences.com/). For in vitro assays, microsomes were isolated from BY4741 cells transformed with plasmids harboring SaCYP736A167 and SaCPR2 and grown as previously described (Diaz-Chavez et al., 2013). Microsomes were assayed for P450 enzyme activity with α-santalene and β-santalene as described previously (Diaz-Chavez et al., 2013). The substrates were produced using SaSSy and purified (Diaz-Chavez et al., 2013). Hexane extracts of P450 assays were concentrated under N2 gas to approximately 50 μl, followed by GC/MS analysis.

Kinetic analysis was performed as described by Diaz-Chavez et al. (2013) using a single microsomal batch and TLC-purified α-santalene and β-santalene (Daramwar et al., 2012). A range of substrate concentrations from 0 to 250 μM was tested in triplicate for β-santalene and as single reactions for α-santalene (due to limited availability of substrate). Reactions were performed in 50 mM potassium phosphate (pH 7.5), 1 mM NADPH, and started by adding 20 μl of microsomes (containing 10 μg protein μl−1) in a total volume of 400 μl. Reactions were incubated at 30°C for 20 min with gentle shaking at 30 rpm. Reactions were stopped by adding 0.5 ml of hexane spiked with isobutyl benzene as an internal standard, followed by vigorous vortexing. Hexane extracts were analyzed by GC/MS. Calculation of kinetic constants was performed by non-linear regression to the Michaelis–Menten equation using the R package drc 2.5-12 (Ritz et al., 2005). For yeast in vivo assays, strain AM94 (Ignea et al., 2012) was co-transformed with plasmids harboring cDNAs for SaFPPS, SaSSy, SaCPR2 and codon-optimized SaCYP736A167. The yeast strain AM94 was selected as it had been previously improved with respect to sesquiterpene production (Ignea et al., 2012). Transformed yeast cells were cultured, harvested and extracted, and extracts were prepared for GC/MS analysis, as described previously (Diaz-Chavez et al., 2013).
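To illustrate what non-linear regression to the Michaelis–Menten equation, v = Vmax·[S]/(Km + [S]), involves, the following Python sketch fits Vmax and Km by least squares using only the standard library. It is a stand-in for, not a reproduction of, the drc-based analysis: the substrate concentrations and rates are synthetic (generated from Km = 36 μM and an arbitrary Vmax of 1.0), and all function names and the search range for Km are assumptions of this example. For any fixed Km the model is linear in Vmax, so Vmax has a closed-form solution and only Km needs to be searched.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def best_vmax(km, conc, rates):
    """For a fixed Km the model is linear in Vmax: closed-form least squares."""
    f = [s / (km + s) for s in conc]
    return sum(fi * vi for fi, vi in zip(f, rates)) / sum(fi * fi for fi in f)


def sse(km, conc, rates):
    """Sum of squared residuals at this Km, using the best Vmax for it."""
    vmax = best_vmax(km, conc, rates)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(conc, rates))


def fit_michaelis_menten(conc, rates, km_lo=0.1, km_hi=1000.0, n_grid=200):
    """Return (vmax, km) minimizing the sum of squared residuals."""
    # Coarse log-spaced scan over Km to bracket the minimum.
    ratio = (km_hi / km_lo) ** (1.0 / (n_grid - 1))
    grid = [km_lo * ratio ** i for i in range(n_grid)]
    best = min(range(n_grid), key=lambda i: sse(grid[i], conc, rates))
    a = grid[max(best - 1, 0)]
    b = grid[min(best + 1, n_grid - 1)]
    # Golden-section refinement inside the bracket.
    g = (5 ** 0.5 - 1) / 2
    for _ in range(100):
        c, d = b - g * (b - a), a + g * (b - a)
        if sse(c, conc, rates) < sse(d, conc, rates):
            b = d
        else:
            a = c
    km = (a + b) / 2
    return best_vmax(km, conc, rates), km


# Synthetic, noise-free data (NOT the authors' measurements); concentrations
# are hypothetical points within the 0 to 250 micromolar range in the text.
conc = [0.0, 5.0, 10.0, 25.0, 50.0, 100.0, 150.0, 250.0]
rates = [mm_rate(s, 1.0, 36.0) for s in conc]
vmax_fit, km_fit = fit_michaelis_menten(conc, rates)
print(f"Vmax = {vmax_fit:.3f} (arbitrary units), Km = {km_fit:.2f} micromolar")
```

With real, noisy data and single reactions per concentration (as for α-santalene), the uncertainty of the fitted Km would be considerably larger than this noise-free example suggests.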

Metabolite extraction from plant material

Metabolites were extracted using 50 mg ground tissue (dry weight) with 1 ml of pentane, spiked with isobutyl benzene (0.1 mg ml−1) as an internal standard, by end-over-end mixing for 24 h at room temperature. Samples were centrifuged at 1000 g for 15 min, and the pentane phase was transferred to a new GC vial for GC/MS analysis. The relative abundance of sesquiterpenoid metabolites was calculated by manual integration of peak areas and normalization using the internal standard and the amount of tissues used (dry weight). Analyses were performed using four biological replicates, each of which comprised two technical replicates.
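The normalization described above can be written as a small Python sketch. The exact formula is an assumption: the text states only that peak areas were normalized using the internal standard and the dry weight of tissue, so the ratio below (analyte area divided by internal-standard area, divided by dry weight) is one plausible reading. The peak areas and function names are hypothetical.

```python
def relative_abundance(peak_area, istd_area, dry_weight_mg):
    """Analyte peak area relative to the internal standard, per mg dry weight."""
    if istd_area <= 0 or dry_weight_mg <= 0:
        raise ValueError("internal standard area and dry weight must be positive")
    return (peak_area / istd_area) / dry_weight_mg


def mean_of_technical_replicates(values):
    """Collapse technical replicates into one value per biological replicate."""
    return sum(values) / len(values)


# Hypothetical peak areas for one compound in two technical replicates of one
# tree; 50 mg dry tissue per extraction as stated in the text.
technical = [
    relative_abundance(2_000_000, 500_000, 50.0),
    relative_abundance(2_200_000, 500_000, 50.0),
]
per_tree = mean_of_technical_replicates(technical)
print(per_tree)
```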

GC/MS analysis

Sesquiterpenoids extracted from wood samples and obtained in enzyme assays were analyzed on an Agilent Technologies 7890A/5975C GC/MS system operating in electron ionization selected ion monitoring mode. Samples were analyzed on a DB-Wax fused-silica column (Agilent Technologies) (30 m long, 250 μm internal diameter, 0.25 μm film thickness). The injector was operated in pulsed splitless mode, with the injector temperature maintained at 250°C. Helium was used as the carrier gas, with a flow rate of 0.8 ml min−1 and pulsed pressure set at 25 psi for 0.5 min. Scan range: m/z 40–500; SIM: m/z 93, 94, 105, 107, 119, 122 and 202. The dwell time was 50 msec. The oven program comprised 40°C for 3 min, ramps of 10°C min−1 to 130°C, 2°C min−1 to 200°C, 50°C min−1 to 250°C, then 250°C for 15 min. ChemStation software (Agilent Technologies) was used for data acquisition and processing. Compounds were identified by comparison of their mass spectra against the NIST/EPA/NIH mass spectral library version 2.0 (http://chemdata.nist.gov/).
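For readers setting up a comparable method, the oven program above implies a run time of 63 min per injection (3 min hold, then ramps of 9, 35 and 1 min, then a 15 min final hold). The Python sketch below makes that calculation explicit; the function name and the data layout are illustrative assumptions, not part of any instrument software.

```python
def oven_runtime_min(initial_temp, initial_hold, ramps, final_hold):
    """Total run time (min) for a GC oven temperature program.

    ramps: list of (rate in degC per min, target temperature in degC).
    """
    total = initial_hold
    temp = initial_temp
    for rate, target in ramps:
        total += (target - temp) / rate  # time spent on this ramp
        temp = target
    return total + final_hold


# Oven program as stated in the text.
RAMPS = [(10, 130), (2, 200), (50, 250)]
runtime = oven_runtime_min(initial_temp=40, initial_hold=3, ramps=RAMPS, final_hold=15)
print(runtime)  # 63.0
```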


We thank the Forest Products Commission, Western Australia, for generously providing access to sandalwood plantation trees, David Nelson (University of Tennessee Health Science Center, Memphis, TN) for P450 naming, and Karen Reid (University of British Columbia) for excellent project and laboratory management. This work was supported with funds from Allylix Inc. and Evolva Inc. (to J.B.), the Natural Sciences and Engineering Research Council of Canada (to J.B.), and the Australian Research Council (to P.M.F., E.L.B. and J.B.). J.B. is a University of British Columbia Distinguished University Scholar. The authors declare that they have no conflict of interest in accordance with journal policy, except that J.B. served as an adviser to Allylix Inc. and Evolva Inc., who supported this work in part.