• Open Access

Camelina seed transcriptome: a tool for meal and oil improvement and translational research


Correspondence (fax +1 402 472 3139; email: ecahoon2@unl.edu)

Summary

Camelina (Camelina sativa), a Brassicaceae oilseed, has received recent interest as a biofuel crop and production platform for industrial oils. Limiting wider production of camelina for these uses is the need to improve the quality and content of the protein-rich seed meal and of the oil, which is enriched in oxidatively unstable polyunsaturated fatty acids that are deleterious for biodiesel. To identify candidate genes for meal and oil quality improvement, a transcriptome reference was built from 2047 Sanger ESTs and more than 2 million 454-derived sequence reads, representing genes expressed in developing camelina seeds. The transcriptome of approximately 60 000 transcripts from 22 597 putative genes includes camelina homologues of nearly all known seed-expressed genes, suggesting a high level of completeness and usefulness of the reference. These sequences included candidates for 12S (cruciferins) and 2S (napins) seed storage proteins (SSPs) and nearly all known lipid genes, which have been compiled into an accessible database. To demonstrate the utility of the transcriptome for seed quality modification, seed-specific RNAi lines deficient in napins were generated by targeting 2S SSP genes, and high oleic acid oil lines were obtained by targeting FATTY ACID DESATURASE 2 (FAD2) and FATTY ACID ELONGASE 1 (FAE1). The high sequence identity between Arabidopsis thaliana and camelina genes was also exploited to engineer high oleic lines by RNAi with Arabidopsis FAD2 and FAE1 sequences. It is expected that these transcriptomic data will be useful for breeding and engineering of additional camelina seed traits and for translating findings from the model Arabidopsis to an oilseed crop.

Introduction

Camelina sativa is an emerging Brassicaceae oilseed crop in the Great Plains of North America and Pacific Northwest of the United States. Archaeological excavations have uncovered evidence of camelina seed in agricultural use as early as 1500–400 BC, with more recent use as an oilseed crop in the 20th century in parts of Europe (Putnam et al., 1993; Zubr, 1997). Several characteristics of camelina contribute to the rising interest in its use as an oilseed crop for both food and nonfood purposes. The oil content of camelina seeds ranges from 28% to 40% of the seed weight, with storage proteins making up an additional approximately 30% of the seed weight (Budin et al., 1995; Marquard and Kuhlmann, 1986; Putnam et al., 1993). In addition, camelina seed oil has received commercial interest for the nutraceutical value of its high omega-3 fatty acid content (Putnam et al., 1993). Moreover, compared with that of other Brassicaceae species, camelina meal contains lower levels of glucosinolates, which can break down into toxic intermediates and limit the livestock feed value of the protein-rich meal (Pilgeram et al., 2007; Zubr, 1997).

A number of agronomic attributes make camelina particularly attractive for production in geographic regions such as the Great Plains of North America. Camelina grows well on marginal land and has been shown to surpass yields of oilseed crops such as flax under drought-like conditions (Bramm et al., 1990; Zubr, 1997). Furthermore, camelina has a moderate-to-low requirement for nutrients and a low seeding rate, making input costs low (Pilgeram et al., 2007; Zubr, 1997). Because camelina reaches maturity in only 85–100 days, it can be used in double-cropping systems with crops such as winter wheat (Putnam et al., 1993).

For research and technological development, camelina surpasses other oil crops in efficiency with respect to generation time and ease of transformation. Camelina is amenable to Agrobacterium-mediated transformation by simple floral dip infiltration under vacuum (Lu and Kang, 2008), a procedure commonly used in Arabidopsis research laboratories throughout the world. With this method, transgenic camelina lines can be generated 6–8 weeks after transformation. By contrast, soybean transformations typically require 6–10 months for the generation of transgenic seed following the initial biolistic or Agrobacterium delivery of genes to cells as well as extensive tissue culture maintenance of plant material (Wang et al., 2012). Because of these attributes, camelina can be quickly engineered for improved seed quality and agronomic traits with minimal technical expertise.

One target for biotechnological improvement is the fatty acid profile of camelina seed oil, which is not currently ideal for any single purpose. The high percentage of polyunsaturated fatty acids [35%–40% α-linolenic acid (18 : 3) and 20%–25% linoleic acid (18 : 2)] in camelina oil makes it highly prone to oxidation and suboptimal for fuel and bio-based lubricant applications (Frohlich and Rice, 2005). Additionally, the potential negative impacts of the C20 and C22 fatty acid content of camelina oil on fuel and lubricant applications are still unclear and warrant further investigation (Kramer et al., 1992). Transgenic approaches are beginning to show the effectiveness of single transgenes for modifying seed composition. Modification of camelina with a single FATTY ACID DESATURASE 2 (FAD2) antisense suppression construct generated mid-oleic acid (18 : 1∆9) lines but did not reduce levels of C20 and C22 fatty acids (Kang et al., 2011).

Camelina agronomic and breeding programmes are emerging with the hope of developing marketable traits, as has been achieved over the past 20 years in canola (Putnam et al., 1993). In an initial genetic characterization of important agronomic traits, Gehringer et al. (2006) used AFLPs to construct a genetic map of camelina, with QTLs for seed yield, oil content, 1000-seed mass and plant height. In a recent molecular marker-based survey, Ghamkhar et al. (2010) evaluated four oil quality measures with respect to the geographic origins of 53 accessions, showing a strong association between environmental adaptation and oil content, and high genetic diversity within the species that could be exploited in breeding. These efforts are currently limited by the lack of publicly available sequence data.

A diploid inheritance of camelina traits was observed by Gehringer et al. (2006), while skewed segregation of some AFLP markers and extra amplification of Brassica-derived SSR markers suggested genomic duplication or polyploidy. Recent findings by Hutcheon et al. (2010) show that several genes known to be single copy in Arabidopsis thaliana and other diploids, including FAD2 and FATTY ACID ELONGASE 1 (FAE1), are present in three copies in the C. sativa genome, evidence that C. sativa may be allohexaploid, which has significant implications for camelina crop improvement through breeding.

Here, we present expressed gene sequences from developing C. sativa seeds. This report provides a detailed profile of transcripts that are potential targets for seed quality improvement through biotechnology and breeding, and it represents an advance towards the development of camelina genomics tools. Data from 454 pyrosequencing combined with Sanger-based EST sequences from developing camelina seeds were used to build a seed transcriptome reference. Sequence alignments showed high conservation of transcripts for genes associated with seed quality traits among camelina, Arabidopsis and canola. Homology among genes acting in acyl lipid metabolism is presented, and a publicly available sequence catalogue was developed for camelina seed quality enhancement. We further show that the patterns of highly abundant seed storage proteins (SSPs) in these taxa are closely comparable. In addition, we demonstrate the value of the seed transcriptome for targeted suppression of genes to modify SSP composition and oil quality traits for feed and biodiesel applications. Finally, we present data highlighting the value of camelina for translating Arabidopsis-based research.

Results and discussion

The transcriptome is a broad seed gene resource

Sequencing and assembly of >2 million 454 pyrosequencing reads from developing camelina seeds at 15–20 days after pollination (DAP), together with 2047 Sanger reads of cloned ESTs, identified more than 60 000 camelina transcripts (Table 1). The camelina transcriptome provided here includes both assembled transcripts (isotigs and unbroken contigs from the Newbler assembly) and singletons, and the data set was trimmed to retain only elements at least 100 nucleotides long. The assembly is available as a BLAST database at www.camelinagenome.org, and the unassembled reads are available at NCBI under accessions SRA056520 (454 reads) and JZ030844–JZ032890 (dbEST submission; LIBEST accession numbers 027979 and 027980), and under NCBI BioProject PRJNA167924. The 103 276 searchable elements in the database comprise isotigs grouped by the Newbler assembly into 22 597 isogroups, together with 43 060 singleton sequences that provide a single-pass reference for additional transcripts (Table 1). Because we expected that at least some close paralogs and homeologs would be expressed together in seeds, we surveyed polymorphic transcripts by mapping the original reads back to the assembly and estimating variants for the isogroups of greatest interest. The assembly reference as a whole, however, intentionally remains compressed with respect to possible homeologs, paralogs and alleles; that is, transcripts sharing >90% identity (40-bp overlap) remain assembled together. These assembled sequences were used for the comparative functional genomics surveys below.
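The 100-nucleotide minimum length applied to the data set can be illustrated with a short Python sketch. This is not the pipeline used for the camelina assembly: it assumes elements are supplied as FASTA text, and the helper names `read_fasta` and `filter_by_length` are hypothetical.

```python
from typing import Iterable, Iterator

MIN_LENGTH = 100  # minimum element length (nt) stated for the camelina data set


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length: int = MIN_LENGTH):
    """Keep only elements that are at least min_length nucleotides long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


if __name__ == "__main__":
    # Hypothetical example: one 150-nt isotig and one 80-nt singleton.
    fasta = ">isotig_example\n" + "A" * 150 + "\n>singleton_example\n" + "C" * 80 + "\n"
    kept = filter_by_length(read_fasta(fasta.splitlines()))
    print([h for h, _ in kept])  # ['isotig_example']
```

The cutoff is treated here as inclusive (elements of exactly 100 nt are kept), which is one reading of "minimum length of 100 nucleotides".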

Table 1. Summary of camelina transcript data sequenced and assembled

Reads sequenced (quality filtered)
Data type | N | Read length, mean ± SD (nt) | Read length, median (nt) | Data (nt)
454 GS-FLX Titanium | 2 013 672 | 373 ± 129 | 418 | 748 486 621
Sanger EST clones | 2047 | 555 ± 169 | 601 | 1 135 358

Assembled database
Total elements | Singleton 454 reads | Singleton ESTs | Isotigs
103 276 | 43 060 | 29 | 60 187

Isogroups and transcript statistics
Isogroups: 22 597
Single-isotig isogroups: 10 764 (18%)
Mean isotig length ± SD: 1492 ± 899 nt
Longest isotig length: 23 135 nt
Mean read coverage depth: 14×
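The internal arithmetic of Table 1 can be cross-checked with a small Python sketch. It only re-uses numbers transcribed from the table; the test that read count × mean read length approximates the total nucleotide yield is our own sanity check, not an analysis from the study.

```python
# Values transcribed from Table 1.
READS = {
    "454 GS-FLX Titanium": {"n": 2_013_672, "mean_len": 373, "total_nt": 748_486_621},
    "Sanger EST clones": {"n": 2_047, "mean_len": 555, "total_nt": 1_135_358},
}
SINGLETON_454, SINGLETON_EST, ISOTIGS, TOTAL_ELEMENTS = 43_060, 29, 60_187, 103_276
ISOGROUPS, SINGLE_ISOTIG_ISOGROUPS = 22_597, 10_764


def relative_difference(expected: float, observed: float) -> float:
    """Absolute difference between two values, relative to the observed one."""
    return abs(expected - observed) / observed


def check_table() -> dict[str, float]:
    # Database elements should be the two singleton classes plus the isotigs.
    assert SINGLETON_454 + SINGLETON_EST + ISOTIGS == TOTAL_ELEMENTS
    report = {}
    for name, row in READS.items():
        # Read count x mean read length should approximate the reported yield.
        report[name] = relative_difference(row["n"] * row["mean_len"], row["total_nt"])
    report["single-isotig / isotigs"] = SINGLE_ISOTIG_ISOGROUPS / ISOTIGS
    report["single-isotig / isogroups"] = SINGLE_ISOTIG_ISOGROUPS / ISOGROUPS
    return report


if __name__ == "__main__":
    for key, value in check_table().items():
        print(f"{key}: {value:.3f}")
```

Run as a script, the yield estimates agree with the reported nucleotide totals to well within 1%, and 10 764 single-isotig isogroups correspond to about 18% of isotigs but about 48% of isogroups, which suggests the percentage in Table 1 is expressed relative to the isotig count.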

More than 95% of isogroups showed conservation with either TAIR10 or dbEST plant entries, as evidenced by camelina tBLASTX hits with E-values <1e−5. Of hits to TAIR10 proteins, 89% of isogroups (68.3% of individual transcripts) matched with E-values <1e−10, and of those with less significant TAIR10 hits, half showed substantially more significant hits to NCBI dbEST entries representing other Arabidopsis data sets (A. thaliana or A. lyrata) or Brassica cDNA collections. About 16.7% of the camelina transcriptome data set did not significantly match Arabidopsis or Brassica protein sequences. Although most of these (91%) were singletons and therefore less complete or accurate, matches to the NCBI NR database suggest that many are expressed portions of transposable elements conserved with those catalogued in the A. thaliana and A. lyrata genomes.

About half (45%) of all camelina BLASTX hits to TAIR10 showed ≥90% primary sequence identity to Arabidopsis proteins, with an average identity overall of 81%, suggesting very high conservation in seed functions. The majority of proteins involved in protein translation, folding and secretion, ubiquitination and proteolysis, among other highly conserved functions, were identical or nearly identical to those in Arabidopsis.
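The E-value and identity summaries above can be sketched in Python from BLAST+ tabular output (`-outfmt 6`, whose default 12 columns are listed in `FIELDS`). This is a minimal illustration, not the study's workflow: it summarizes per query sequence rather than per isogroup (collapsing transcripts to isogroups is assumed to happen elsewhere), takes identity from the top hit's `pident`, and uses all queries with at least one hit as the denominator.

```python
from statistics import mean

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def best_hits(lines):
    """Best hit per query: highest bit score, ties broken by lowest E-value."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hit = {"subject": row["sseqid"], "pident": float(row["pident"]),
               "evalue": float(row["evalue"]), "bitscore": float(row["bitscore"])}
        rank = (-hit["bitscore"], hit["evalue"])
        current = best.get(row["qseqid"])
        if current is None or rank < (-current["bitscore"], current["evalue"]):
            best[row["qseqid"]] = hit
    return best


def summarize(best):
    """Fractions of queries passing the E-value and identity thresholds."""
    hits = list(best.values())
    n = len(hits)
    return {
        "evalue_lt_1e-5": sum(h["evalue"] < 1e-5 for h in hits) / n,
        "evalue_lt_1e-10": sum(h["evalue"] < 1e-10 for h in hits) / n,
        "identity_ge_90": sum(h["pident"] >= 90 for h in hits) / n,
        "mean_identity": mean(h["pident"] for h in hits),
    }
```

Ranking by bit score before E-value mirrors the tie-breaking order the article uses for its reciprocal best-hits analysis.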

A variety of measures indicated a high degree of transcriptome completeness. Full-length coding sequences made up a large proportion of the data set, as evidenced by coverage of homologous Arabidopsis proteins: 10 647 camelina transcripts covered 100% of the length of TAIR10 representative gene model coding regions, and 30 887 covered more than 80% of these lengths.
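The article does not spell out how coverage was computed, so the Python sketch below assumes one common definition: the merged span of a transcript's alignments (HSPs) on the Arabidopsis protein divided by the protein length. It also assumes that the ">80%" tier includes the full-length transcripts; the function names are hypothetical.

```python
def merged_length(intervals):
    """Residues covered by possibly overlapping 1-based, inclusive intervals."""
    total = 0
    start = end = None
    for a, b in sorted((min(x, y), max(x, y)) for x, y in intervals):
        if end is not None and a <= end + 1:
            end = max(end, b)  # overlapping or adjacent interval: extend the block
        else:
            if end is not None:
                total += end - start + 1  # close the previous block
            start, end = a, b
    if end is not None:
        total += end - start + 1
    return total


def subject_coverage(hsps, subject_length):
    """Fraction of the Arabidopsis protein covered by a transcript's HSPs."""
    return merged_length(hsps) / subject_length


def coverage_tiers(coverages):
    """Count transcripts covering 100% and more than 80% of their homologue."""
    full = sum(c >= 1.0 for c in coverages)
    over_80 = sum(c > 0.8 for c in coverages)
    return full, over_80
```

Merging intervals before summing avoids double-counting residues when several HSPs from one transcript overlap on the same protein.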

Our camelina seed sampling was limited to mid-developmental stages, in which the embryo matures, cells expand and storage reserves accumulate, with morphology similar to that of Brassica napus seeds between 3 and 4 weeks of development (Hajduch et al., 2006). We sought to estimate the extent to which our transcriptome represents a complete set of seed-expressed genes, although technically comparable data sets from other plants are not available. In a comprehensive microarray-based profiling study, Le et al. (2010) recently reported genes expressed in Arabidopsis seeds from very early to late developmental stages. After removing gene models that are now obsolete in TAIR10 from this list and accounting for redundancy in ATH1 probe hybridization, only 10 Arabidopsis genes, or 2.6% of the Le et al. Arabidopsis seed expression survey, appeared to lack homologues in our camelina data set.
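The comparison with the Le et al. (2010) gene list reduces to set operations, sketched below in Python. The sketch assumes that the survey list, the obsolete TAIR10 models and the Arabidopsis genes matched by camelina elements are already available as identifier collections, and that redundant ATH1 probe sets have already been collapsed to genes; the identifiers in the test are placeholders.

```python
def missing_homologues(seed_genes, obsolete_models, genes_with_camelina_hits):
    """Arabidopsis seed-expressed genes lacking any camelina homologue.

    seed_genes: gene identifiers from a seed expression survey
    obsolete_models: identifiers no longer current in TAIR10
    genes_with_camelina_hits: identifiers matched by at least one camelina element
    Returns the sorted missing genes and their fraction of the cleaned list.
    """
    current = set(seed_genes) - set(obsolete_models)
    missing = sorted(current - set(genes_with_camelina_hits))
    return missing, len(missing) / len(current)
```

The denominator is the cleaned list (after removing obsolete models), which is the basis on which the 2.6% figure is reported.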

Overall, these results demonstrate that the transcriptome sequence reference presented here provides comprehensive representation of expressed genes in developing seed/embryos and will therefore broadly serve future studies aimed at camelina seed quality enhancement.

Developing camelina seeds contain abundant SSPs: potential targets for high-value protein production in C. sativa

The camelina developing seed transcriptome provides important information about SSP genes that can ultimately be used for meal improvement, as described below. Two types of SSPs occur in Arabidopsis, B. napus and other Brassicaceae crops: 12S globulins (referred to as cruciferins) and 2S albumins (referred to as napins; Herman and Larkins, 1999). SSPs are the main source of protein in meal and thus represent a potential target for modification (Schmidt and Herman, 2008). Whether the goal is to increase the protein content of the meal, to replace the SSPs with a more valuable protein, or to decrease the protein content to drive flux of energy reserve intermediates to a metabolic pathway of higher priority, SSP genes represent valuable targets for manipulation (Boothe et al., 2010).

To compare the SSPs of camelina with those of Arabidopsis and B. napus, proteins were extracted from seeds and subjected to SDS-PAGE. Coomassie staining indicated that the predominant proteins in each species are similar in size (Figure 1). Mass spectrometry confirmed the presence of the large and small subunits of the 2S albumins and the α and β subunits of the 12S globulins in all three species (data not shown).

Figure 1.

Comparison between Arabidopsis, camelina and Brassica seed storage proteins. Equal loadings of protein extracts (12 μL) from 100-mg seeds from each species were separated by 15% SDS-PAGE. Gels were stained with Coomassie for band detection. The α and β subunits of the 12S and the large (L) and small (S) subunits of the 2S proteins are noted. Protein identities were confirmed by mass spectrometry (data not shown).

In Arabidopsis, the 2S albumins are encoded by five genes designated SESA1-5 (Accession numbers P15457, P15458, P15459, P15460 and Q9FH31, respectively; Krebbers et al., 1988; Van der Klei et al., 1993). Interestingly, at least eight transcripts coding for 2S albumins in camelina were identified in our assembly of EST and 454 reads and confirmed by generating and sequencing individual PCR products (Figure 2a and Figure S1). The camelina 2S protein sequences share a high degree of identity with one another, ranging from 71% to 98%.
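A pairwise identity range such as 71%–98% is typically computed from a multiple alignment. The Python sketch below assumes one common convention, counting identities only over columns where neither sequence has a gap; alignment tools and papers differ in how they treat gaps, so values may not match the published ones exactly. The function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns in which neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def identity_range(alignment: dict[str, str]) -> tuple[float, float]:
    """Minimum and maximum pairwise identity across all aligned sequences."""
    values = [percent_identity(alignment[p], alignment[q])
              for p, q in combinations(alignment, 2)]
    return min(values), max(values)
```

At least two aligned sequences are needed for `identity_range`; with fewer, `min` raises on an empty list.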

Figure 2.

Evolutionary relationship of camelina, Arabidopsis and Brassica 2S and 12S seed storage proteins (SSPs). 2S (a) and 12S (b) camelina contigs were aligned with Arabidopsis and Brassica homologues using ClustalW (Thompson et al., 1994). Phylogenetic trees were built with the MEGA4 software, using the neighbour-joining method (Tamura et al., 2007). The Arabidopsis thaliana and Brassica napus sequences used for (a) are AtSESA1–5 (Accession numbers P15457, P15458, P15459, P15460 and Q9FH31, respectively) and BnSESAE, BnSESA2, BnSESA4, BnSESAB, BnSESA3 and BnSESA1. The A. thaliana and B. napus sequences used for (b) are AtSESCRU1–4 (Accession numbers Q96318, Q9ZWA9, P15456 and P15455, respectively) and BnSESCRU1, BnSESCRU2, BnSESCRUA, BnSESCRU3 and BnSESCRU4. Accession numbers for the camelina and Brassica SSPs are shown in the trees.
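The trees in Figure 2 were built with MEGA4. To show what the neighbour-joining method does, here is a textbook Python sketch of the algorithm operating on a precomputed distance matrix (for example, p-distances derived as 1 − identity/100). It is not MEGA4's implementation, it assumes labels contain no Newick special characters, and the example taxa are hypothetical.

```python
from itertools import combinations


def neighbour_joining(names, distances):
    """Build an unrooted neighbour-joining tree and return it in Newick format.

    names: taxon labels; distances: {(a, b): d} for every unordered pair.
    """
    d = {}
    for a, b in combinations(names, 2):
        value = distances[(a, b)] if (a, b) in distances else distances[(b, a)]
        d[(a, b)] = d[(b, a)] = value
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising the Q criterion (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        branch_i = 0.5 * d[(i, j)] + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = d[(i, j)] - branch_i
        new = f"({i}:{branch_i:.3f},{j}:{branch_j:.3f})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[(new, k)] = d[(k, new)] = 0.5 * (d[(i, k)] + d[(j, k)] - d[(i, j)])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    return f"({a},{b}:{d[(a, b)]:.3f});"
```

For four taxa where A/B and C/D are each 2 units apart and all other pairs are 4 apart, the sketch recovers the two pairs as sister groups joined by a branch of length 2.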

SESCRU1–4 encode the Arabidopsis 12S globulins (Accession numbers Q96318, Q9ZWA9, P15456 and P15455, respectively; Pang et al., 1988; Theologis et al., 2000). Seven full-length genes and ten partial-length genes annotated as 12S-related proteins were identified in our camelina transcriptome (Figure 2b). Each of the full-length 12S proteins of camelina shows a higher degree of identity with homologues in Arabidopsis than with the corresponding B. napus proteins. For example, C. sativa SESCRU1 (CsSESCRU1) is 92% identical to the Arabidopsis 12S SESCRU3 protein but only 84% identical to the B. napus 12S SESCRU4 protein (Figure S2). Additionally, CsSESCRU6, CsSESCRU4 and CsSESCRU5 all share 90% amino acid sequence identity with the AtSESCRU4 protein but only about 80% identity with the B. napus SESCRU1 protein (Figure S3).

The camelina developing seed transcriptome can be used to modify SSP composition

To demonstrate the value of the transcriptome for seed quality modification, we sought to suppress camelina SSP expression based on our sequence information. Alteration in SSP composition can affect both livestock feed and industrial uses of the seed meal (Holding and Larkins, 2008). A chimeric hairpin construct was made using fragments of the camelina 2S CsSESA7 and camelina 12S CsSESCRU3 sequences (Figure 3a), with the transgene under control of the strong seed-specific promoter of the soybean glycinin-1 gene. Based on SDS-PAGE analyses, knockdown of camelina 2S proteins was nearly complete in eight individual T1 transgenic seeds compared with the wild-type control (Figure 3b). These results suggest that the hairpin sequence derived from CsSESA7 may recognize all of the camelina 2S transcripts.

Figure 3.

Seed storage protein (SSP) suppression in camelina. (a) Transgene of the double 2S and 12S camelina SSP hairpin used for RNAi suppression. Protein extracts (40 μg) from eight different T1 DsRed-positive individual seeds and one control seed (b) and six individual seeds from two different T2 lines and controls (c) were separated by 15% SDS-PAGE. Gels were stained with Coomassie for band detection.

In contrast, our construct did not reduce overall 12S protein levels in seeds: SDS-PAGE protein staining showed little to no reduction (Figure 3b,c). To further investigate the protein species present in these seeds, peptide analysis was carried out on gel-isolated camelina 12S β and α subunits, revealing that the best matches among Arabidopsis proteins were the subunits of SESCRU4 (data not shown). The hairpin sequence used more closely matched the Arabidopsis SESCRU2 protein (93%, compared with 53% for SESCRU4). We suspect that a camelina 12S hairpin sequence that more closely matches the Arabidopsis SESCRU4 sequence might produce a more obvious knockdown of the camelina 12S proteins. An alignment of 12S protein sequences showed greater sequence divergence than that among 2S sequences (Figures S2 and S3), so the hairpin derived from CsSESCRU3 is likely to have hybridized more selectively among camelina 12S transcripts. Our results show broad reduction in 2S SSPs and selective suppression of members of the 12S SSP family. Refining construct design using alternative 12S gene sequences would likely target the 12S proteins more broadly. It is also possible that seed development is more sensitive to overall loss of 12S protein than to loss of 2S protein, a possibility that warrants further investigation.
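One way to explore why a hairpin silences some family members more than others is to count the contiguous sequence stretches it shares with each target transcript. The Python sketch below uses shared 21-mers as a proxy, based on the widely used assumption that hairpin-derived siRNAs of about 21 nt need near-perfect complementarity to their targets; this heuristic is not an analysis reported in the article, and the function names are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers of a nucleotide sequence (RNA 'U' treated as 'T')."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(trigger: str, targets: dict[str, str], k: int = 21) -> dict[str, int]:
    """Number of distinct k-mers of the hairpin trigger found exactly in each target."""
    trigger_kmers = kmers(trigger, k)
    return {name: len(trigger_kmers & kmers(seq, k)) for name, seq in targets.items()}
```

Targets sharing many exact 21-mers with the trigger would be expected to be silenced more strongly than divergent paralogs sharing few or none; because the hairpin fragment is taken in the sense orientation, the comparison is made against target sequences in the same orientation.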

Analysis of T2 seeds revealed that camelina 2S SSP suppression is stable (Figure 3c). Moreover, the suppression had no detrimental effects on seed germination under greenhouse conditions (results not shown). When total seed protein and oil content were measured in the SSP RNAi lines, neither differed significantly between wild-type seeds and two different homozygous RNAi lines, SSP RNAi 2 and SSP RNAi 3 (Figure S4). Lower expression of SSPs could change the accumulation of other important seed components, including other proteins, oil and starch, which will require more in-depth proteomic evaluation. In addition, introduction of high-value foreign proteins could make camelina seed meal a more attractive source of co-products for biofuel and industrial oil production.

Genes related to fatty acid and oil synthesis are well-represented in the camelina developing seed transcriptome

An acyl lipid metabolism gene database was previously developed for Arabidopsis (Beisson et al., 2003) and was recently expanded using deep transcriptional profiling of developing seeds of four different oilseed species (Troncoso-Ponce et al., 2011). This significant endeavour produced detailed sequence information for more than 350 genes encoding proteins involved in lipid metabolism and provides a useful resource for constructing additional oilseed lipid databases (http://aralip.plantbiology.msu.edu/).

To build a similar database for camelina, we queried the transcriptome assembly to predict camelina orthologs of Arabidopsis lipid metabolic pathway components using a reciprocal best-hits BLAST (RBH) approach. Protein sequences of the representative Arabidopsis gene model (AGI indicated) from the lipid metabolism database above were used to query the camelina transcriptome using tBLASTN, and in parallel, the camelina transcriptome was used with BLASTX to query the representative gene model proteins of TAIR10. For any given tBLASTN AGI query, we retained the top hit of each camelina transcriptome element (reduced to isogroup or singleton), based first on bit score and second on E-value. RBH pairs are those in which the AGI used as the tBLASTN query is also the top TAIR10 hit of the corresponding camelina element in the BLASTX search. Both RBH pairs and members of likely co-orthologous groups are indicated in Table S1.
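The reciprocal best-hits test described above can be illustrated with a minimal Python sketch. It assumes that BLAST results have already been parsed into (query, subject, bit score, E-value) tuples and that camelina transcripts have already been reduced to isogroup or singleton identifiers; the function names and identifiers are hypothetical. Following the text, hits are ranked first by bit score and then by E-value, and one Arabidopsis gene may pair with several camelina elements, which is how fewer AGIs can map to more isogroups.

```python
from collections import defaultdict

Hit = tuple[str, str, float, float]  # (query, subject, bit score, E-value)


def top_hit_per_query(hits: list[Hit]) -> dict[str, str]:
    """Best subject per query: highest bit score, then lowest E-value."""
    best = {}
    for query, subject, bitscore, evalue in hits:
        rank = (-bitscore, evalue)
        if query not in best or rank < best[query][0]:
            best[query] = (rank, subject)
    return {query: subject for query, (_rank, subject) in best.items()}


def reciprocal_best_hits(forward: list[Hit], reverse: list[Hit]) -> dict[str, list[str]]:
    """Pair each AGI with the camelina elements whose own top hit is that AGI.

    forward: tBLASTN hits, Arabidopsis AGI -> camelina element
    reverse: BLASTX hits, camelina element -> TAIR10 AGI
    """
    top_agi_for_element = top_hit_per_query(reverse)
    pairs = defaultdict(set)
    for agi, element, _bitscore, _evalue in forward:
        if top_agi_for_element.get(element) == agi:
            pairs[agi].add(element)
    return {agi: sorted(elements) for agi, elements in pairs.items()}
```

Elements whose top BLASTX hit is a different AGI drop out of the RBH set, which corresponds to the candidates the article goes on to treat as putative co-orthologs.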

From the 571 Arabidopsis lipid metabolism genes queried, 825 putative unique camelina genes or co-assembled homeologous gene sets were found. Of these queries, 457 (80%) had reciprocal best BLAST hits and are therefore likely orthologous to 635 unique camelina isogroups.

Ninety-eight (17%) of the Arabidopsis queries matched camelina genes well but did not pass the RBH test. Two groups were considered putative co-orthologs (Table S1): camelina isogroups that were not assigned a putative ortholog in the RBH set but aligned well to more than one Arabidopsis sequence, and the remaining Arabidopsis queries that matched camelina genes already assigned in the RBH analysis, indicating lesser relative homology. It is important to note that because our transcriptome contains only genes expressed in seeds rather than a whole-genome complement, orthology cannot be determined accurately. Figure 4b shows a portion of the database we prepared at www.camelinagenome.org.

Figure 4.

Acyl-CoA-dependent and acyl-CoA-independent routes to triacylglycerol (TAG) biosynthesis in camelina and the Camelina acyl lipid metabolism database. (a) Overview of acyl-CoA-dependent and acyl-CoA-independent flux between diacylglycerol (DAG) and TAG. (b) Screenshot of the Camelina acyl lipid metabolism database. Abbreviations for (a): DGAT, diacylglycerol acyltransferase; PDAT, phosphatidylcholine: diacylglycerol acyltransferase; PDCT, phosphatidylcholine: diacylglycerol cholinephosphotransferase; LPCAT, 1-acyl-glycerol-3-phosphocholine acyltransferase; PC, phosphatidylcholine; LPC, lysophosphatidylcholine. Camelina candidate genes involved are shown in the pathway (a) as isogroup numbers or singleton identifiers.

In the seed transcriptome presented here, transcripts for all plastid-localized de novo fatty acid biosynthetic pathway genes and all ER-associated triacylglycerol (TAG) synthesis enzymes were identified. Also found were several genes of current interest implicating a more complex route to TAG formation, including camelina homologues of At1g12640 and At1g63050, 1-acyl-glycerol-3-phosphocholine acyltransferase (LPLAT/LPCAT) genes that are involved in phosphatidylcholine acyl editing (Bates et al., 2012; Figure 4a). The acyl editing activity of LPCAT was shown to play an important role in phosphatidylcholine: diacylglycerol acyltransferase 1 (PDAT1) catalysed TAG biosynthesis in an Arabidopsis diacylglycerol acyltransferase 1 (DGAT1) mutant (Xu et al., 2012; Figure 4a).

In addition to the acyl-CoA-dependent enzymes mentioned above, transcripts for acyl-CoA-independent phospholipid: diacylglycerol acyltransferase (PDAT) and phosphatidylcholine: diacylglycerol cholinephosphotransferase (PDCT/ROD1) were identified (Figure 4a). PDAT synthesizes TAG from phosphatidylcholine (PC) and diacylglycerol (DAG) and was recently shown to contribute, along with DGAT1, to TAG biosynthesis (Dahlqvist et al., 2000; Zhang et al., 2009; Xu et al., 2012). PDCT transfers a phosphocholine headgroup of PC to a DAG molecule at the sn-3 position (Lu et al., 2009). This headgroup exchange allows 18 : 1-containing DAG molecules to be converted into PC for further desaturation or other modifications, and allows the resulting modified fatty acids on PC to return to DAG for TAG biosynthesis, thus modifying the fatty acid composition of TAG (Lu et al., 2009; Hu et al., 2012).

SNPs in assembled transcripts estimate complexity in the camelina developing seed transcriptome

Use of a single set of overlap parameters does not allow separate assembly of all transcripts in a data set as complex as that of camelina. Transcripts from close paralogs, homeologs or alleles are sometimes compressed into a single isotig when the entire read set is used for a transcriptome assembly. Here, we subjected singletons to additional assembly (see 'Developing camelina seeds contain abundant SSPs: potential targets for high-value protein production in C. sativa'), potentially further combining slightly polymorphic transcripts into isotigs. The consensus sequence of a mixed-variant isotig is useful for identifying functional homology, our primary goal in this study; however, the single-nucleotide polymorphisms (SNPs) present in the reads are lost. These variable positions could reveal evolutionary distance and functional and regulatory diversity in camelina genes, and could be used to design primers and probes for further investigation. Therefore, to add detail to the RBH analysis of lipid biosynthesis genes, we estimated variation within the isotigs listed in Table S1.

Briefly, the reads that made up each assembled isotig were mapped back to its consensus sequence. With restrictions on quality and depth support for SNP identification and on overall length match to the original isotig (Appendix S1), we counted the number of variants that could be formed from overlapping SNPs in the reads. Reads that disagreed at any SNP(s) were used as evidence for other variant(s). Minimum variants and the fraction of reads supporting them are shown in Table S1, columns 18 and 19, respectively. In columns 16 and 17 of Table S1, we also indicate the maximum number of variants that can be estimated from the data. In each case, variants contain no conflicting SNPs in overlapping regions. The minimum variant estimation requires all reads with nonconflicting polymorphisms to assemble into a single variant, whereas the maximum variant estimation allows for all combinations of reads that could possibly assemble based on sequence overlaps. The latter admits noise, that is, read combinations that are unlikely to be biologically real; however, we show this estimate to indicate that for some isotigs there is evidence that additional variants might be found with greater sequencing depth and/or reassembly of reads. Each isotig in our web database, www.camelinagenome.org, is linked to the reads that make it up, which can be found using the ‘Show Reads’ option within the camelina BLAST server. These reads are provided so that users can selectively reassemble them and/or design primers and probes for whole-transcript validation.
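The idea of grouping reads into non-conflicting variants can be sketched in Python by representing each read as a partial haplotype (SNP position mapped to base). This is a hypothetical illustration, not the procedure in Appendix S1: a set of reads forms one valid variant only if every pair agrees at shared SNPs, so the true minimum equals the chromatic number of the read-conflict graph, and the greedy grouping below only gives an upper bound on it.

```python
def compatible(a: dict[int, str], b: dict[int, str]) -> bool:
    """True if two partial haplotypes agree at every SNP position they share."""
    return all(a[pos] == b[pos] for pos in a.keys() & b.keys())


def greedy_variants(reads: list[dict[int, str]]) -> list[tuple[dict[int, str], int]]:
    """Group reads into non-conflicting variants; return (SNP calls, supporting reads)."""
    variants: list[list] = []  # each entry: [merged SNP calls, read count]
    # Place reads covering more SNPs first so that variants are anchored early.
    for read in sorted(reads, key=len, reverse=True):
        for variant in variants:
            if compatible(variant[0], read):
                variant[0].update(read)
                variant[1] += 1
                break
        else:
            variants.append([dict(read), 1])
    return [(calls, count) for calls, count in variants]
```

Dividing each count by the total number of reads gives the fraction of reads supporting each variant, analogous to the read-support column described for Table S1.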

The variant estimates in Table S1 provide a valuable starting point for further study of camelina lipid metabolism genes, whereas direct validation of transcripts and a whole-genome assembly of camelina will allow clearer separation of close paralogs, homeologs and expressed alleles. Figure S5 summarizes estimated transcript variation in the overall camelina lipid biosynthesis database with respect to the minimum variants calculated in the above analysis. Forty percent of the database is contained in isotigs that may be true single transcripts. Singletons, which were not subjected to variant analysis, make up 15%, leaving approximately 45% contained in isotigs that may be resolved into additional individual transcript sequences upon further experimental analysis.

The camelina developing seed transcriptome is useful for oil modification and the close relationship between Arabidopsis and camelina allows for translational research for oil enhancement

In addition to its use for seed protein modification, the camelina developing seed transcriptome is a valuable resource for oil modification. More than 50% of camelina oil is composed of polyunsaturated fatty acids [35%–40% α-linolenic acid (18 : 3) and 20%–25% linoleic acid (18 : 2)], which limits its use as a biofuel because these fatty acids oxidize readily. A more ideal biodiesel blend, for example, would have high proportions of the monoenoic oleic acid (18 : 1), which has greater oxidative stability than polyunsaturated fatty acids and better cold flow properties than saturated fatty acids (e.g. 16 : 0 and 18 : 0; Durrett et al., 2008). FAD2 encodes an ER membrane-bound desaturase that catalyses the conversion of oleic acid to linoleic acid, and FAE1 encodes an enzyme that initiates the addition of two-carbon units to 18-carbon fatty acyl-CoAs, leading to the synthesis of very long-chain fatty acids. These two genes are therefore priority targets for genetic manipulation in camelina to lower the polyunsaturated fatty acid content and increase the oleic acid content. Hutcheon et al. (2010) used Southern blot analyses and cloning to show that there are three copies of each gene, each sharing more than 90% amino acid sequence identity with the Arabidopsis homologue. Seed-specific antisense suppression of the camelina FAD2 gene by Kang et al. (2011) produced transgenic camelina lines with up to approximately 50 wt% oleic acid and reduced 18 : 2 (6 wt%) and 18 : 3 (11 wt%) content compared with wild type, but no reduction in the content of C20 and C22 fatty acids.

According to our RBH analysis, camelina isogroup 03202 was orthologous to Arabidopsis FAD2. Upon inspection of the 132 454 reads that were assembled into this isogroup, we confirmed three variants closely resembling the cloned FAD2-A, -B and -C sequences previously reported (Hutcheon et al., 2010), sharing ≥98% identity.

We amplified the FAD2-C cDNA for use in RNAi experiments (Figure 5a). Upon screening 24 FAD2 RNAi lines, we were able to attain about 50 wt% oleic acid in camelina seed oil (Figure 5c). The 18 : 2 and 18 : 3 content was reduced to approximately 4.5 and 13 wt%, respectively, in the top-performing homozygous RNAi line. To further increase the oleic acid content in seed oil of camelina and reduce the content of C20 and C22 fatty acids, a camelina FAE1 RNAi expression cassette was introduced into the same binary vector with the camelina FAD2 RNAi cassette. The sequence amplified from a camelina seed cDNA library based on our reference set was identical to the previously reported FAE1-A gene (Hutcheon et al., 2010). From 19 independent lines screened, we identified a line with as high as 70 wt% oleic acid (Figure 5a,c). The double FAD2/FAE1 RNAi seeds contained only about 4 wt% 18 : 2 and 8 wt% 18 : 3 compared with 17 wt% 18 : 2 and 36 wt% 18 : 3 in wild-type seeds. In addition, C20 and C22 fatty acids were reduced from approximately 17 wt% of total fatty acids in wild-type seeds to approximately 4 wt% in FAD2/FAE1 RNAi seeds. Of most significance, 20 : 1 levels were reduced from 12 wt% in wild-type seeds to 3 wt% in FAD2/FAE1 RNAi seeds. We acknowledge that transgenic lines were not generated for FAE1 RNAi suppression alone. However, the decrease in 20 : 1 and other very long-chain fatty acids and the corresponding increase in oleic acid content between the FAD2/FAE1 RNAi and FAD2 RNAi lines can be directly attributed to the FAE1 suppression.
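The compositional changes reported for the double FAD2/FAE1 RNAi line can be summarized with simple arithmetic on the approximate values given in the text, as in the Python sketch below. Only the stated wt% values are used; wild-type oleic acid is omitted because it is not given in this paragraph, and the dictionary and function names are hypothetical.

```python
# Approximate compositions (wt% of total fatty acids) as stated in the text.
WILD_TYPE = {"18:2": 17.0, "18:3": 36.0, "20:1": 12.0, "C20+C22": 17.0}
FAD2_FAE1_RNAI = {"18:2": 4.0, "18:3": 8.0, "20:1": 3.0, "C20+C22": 4.0}


def polyunsaturated(profile: dict[str, float]) -> float:
    """Sum of linoleic (18:2) and alpha-linolenic (18:3) acid content."""
    return profile["18:2"] + profile["18:3"]


def change_in_points(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Percentage-point change for each fatty acid class present in both profiles."""
    return {fa: after[fa] - before[fa] for fa in before.keys() & after.keys()}


if __name__ == "__main__":
    print(polyunsaturated(WILD_TYPE), polyunsaturated(FAD2_FAE1_RNAI))  # 53.0 12.0
    print(sorted(change_in_points(WILD_TYPE, FAD2_FAE1_RNAI).items()))
```

On these figures, polyunsaturated fatty acids fall from about 53 wt% to about 12 wt%, consistent with the statement that more than half of wild-type camelina oil is polyunsaturated.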

Figure 5.

Camelina and Arabidopsis FAD2/FAE1 RNAi increase the oleic acid (18 : 1) content in seeds of transgenic camelina. Arabidopsis and camelina partial FAD2 (a) and FAE1 (b) sequences used for RNAi in (d) were aligned using ClustalW (Thompson et al., 1994). Fatty acid composition of seeds from wild type and top-performing homozygous FAD2 or FAD2/FAE1 RNAi lines generated using camelina FAD2 and FAD2/FAE1 sequences (cFAD2 RNAi and cFAD2/cFAE1 RNAi; c) or Arabidopsis FAD2 and FAD2/FAE1 sequences (aFAD2 RNAi and aFAD2/aFAE1 RNAi; d) for RNAi targeting. Results shown are from T3-generation seeds. 18 : 2i, Δ9,15 isomer of linoleic acid.

Camelina lines transformed with a seed-specific RNAi suppression construct based on the Arabidopsis FAD2 sequence showed a lipid profile nearly identical to that of lines transformed with the camelina FAD2 construct, demonstrating the cross-utility of sequence references for oil engineering in these two closely related plants. Seeds of the top-performing lines contained up to 50 wt% oleic acid, compared with approximately 12 wt% in wild-type seeds (Figure 5d). A double Arabidopsis FAD2/FAE1 RNAi binary vector raised oleic acid levels in transgenic camelina seeds even further, to approximately 66 wt% of total fatty acids (Figure 5d). In these lines, the 18 : 2, 18 : 3, and C20 and C22 fatty acid components of the seed oil were also decreased to levels similar to those achieved with the camelina homologues. Although the Arabidopsis FAD2/FAE1 RNAi seeds did have significantly more total oil than wild-type seeds, total oil content was between 30% and 40% in all transgenic lines studied (Figure S4b).

Currently, no camelina genome sequence is available, and the complexity and polyploidy of the camelina genome are expected to hinder future oil engineering and other modifications based on selective gene suppression, particularly through breeding. The results presented here show that the fatty acid profile of camelina can be altered with limited genomic information, based on homology to equivalent Arabidopsis genes. The ability to achieve similar oil composition phenotypes in camelina using Arabidopsis homologues for gene suppression highlights the value of camelina for translating findings in Arabidopsis to a crop plant, which is useful for demonstrating the agronomic relevance of Arabidopsis discoveries. In addition, the high oleic acid phenotypes achieved here can serve as optimized oil platforms for more extensive metabolic engineering towards higher value traits, such as sn-3-acetyl TAGs or wax esters for drop-in biofuels or specialty lubricants (Carlsson et al., 2011; Durrett et al., 2010).

Conclusions

In this study, C. sativa genomics tools were enhanced with detailed transcript analyses of developing seeds, including genes recognized as targets for meal and oil improvement. The sequences identified in this study were highly homologous to genes in Arabidopsis and B. napus. Camelina seed protein composition was dramatically altered in transgenic plants by SSP suppression based on sequence information from the seed transcriptome. The potential benefits of this genetic modification remain to be explored, for example whether lowering protein content in the meal can increase oil content directly or indirectly, or whether more valuable proteins can be produced instead. Moreover, camelina oil composition was tailored towards a better biofuel and bio-based lubricant feedstock by generating transgenic lines with high oleic acid content and low polyunsaturated and very long-chain fatty acid content. This trait was achieved by RNAi suppression of genes for key enzymes of fatty acid modification, using not only camelina sequences identified in the transcriptomic data but also Arabidopsis sequences, which reflects the close relationship between the two species and the utility of camelina for translating findings from Arabidopsis. Genes relevant to seed lipid metabolism in camelina, and their variants, were identified in a detailed sequence alignment survey and compiled into a publicly accessible database. This database will support future breeding and engineering efforts to modify camelina oil composition for industrial or fuel purposes.

Experimental procedures

RNA isolation from developing seeds and cDNA library construction

Total RNA was isolated from C. sativa (cv. Sunesson) seeds at 10–13 and 15–20 DAP, removed from pods with seed coats intact. Seeds were collected from greenhouse-grown plants, immediately frozen in liquid nitrogen and stored at −80 °C until RNA isolation.

For seeds collected at 10–13 DAP, total RNA was isolated according to a previously described method (Mattheus et al., 2003). In brief, developing seeds were ground to a fine powder in liquid nitrogen. The powder was transferred to a chilled centrifuge tube containing cold extraction buffer consisting of 100 mM Tris–HCl, pH 8.0, 50 mM ethylenebis(oxyethylenenitrilo)tetraacetic acid, pH 8.0, 100 mM sodium chloride, 1% 6-(p-toluidino)-2-naphthalenesulphonic acid, 6% sodium p-aminosalicylic acid, 1% SDS, 1% PVP-40, 3% PVPP: chloroform and 1% β-mercaptoethanol. The sample was centrifuged for 10 min at 10 000 g at 4 °C. An equal volume of chloroform was added to the recovered supernatant, and the mixture was vortexed for 2 min and centrifuged for 10 min at 10 000 g at 4 °C. The recovered aqueous phase was extracted twice with phenol/chloroform (1 : 1, v/v) and once with chloroform. The RNA was precipitated overnight at −20 °C with 0.1 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of 95% ethanol, pelleted by centrifugation for 30 min at 10 000 g at 4 °C, rinsed once with 70% ethanol, briefly dried and dissolved in DEPC-treated water. Total RNA was isolated from seeds collected at 15–20 DAP using the method described by Suzuki et al. (2004).

PolyA+-RNA was enriched from total RNA by two passes over oligo-dT cellulose columns from an Illustra mRNA purification kit (GE Healthcare, Little Chalfont, UK). PolyA+-RNA from developing seeds at 15–20 DAP was used for 454 sequencing, and polyA+-RNA from both stages was used to prepare cDNA libraries for Sanger EST sequencing. A cDNA library for seeds at 10–13 DAP was prepared using the SuperScript Plasmid System for cDNA Synthesis and Plasmid Cloning with Gateway Technology (Invitrogen, Grand Island, NY). cDNA inserts were cloned directionally 5′ to 3′ into the SalI and NotI sites of the vector pSPORT1. A cDNA library for seeds at 15–20 DAP was prepared using the CloneMiner cDNA library kit (Invitrogen) with directional cloning by topoisomerase reaction into the pDONR222 vector. Aliquots of the cloning reactions for each library were introduced into Escherichia coli DH10B Electromax (Invitrogen) cells by electroporation. The cDNA library prepared from seeds at 10–13 DAP was designated Cam1 (LIBEST accession 027979), and the cDNA library prepared from seeds at 15–20 DAP was designated CsDS2 (LIBEST accession 027980).

Sanger-based EST sequencing of cDNA clones

Plasmids were isolated from randomly picked colonies from the Cam1 and CsDS2 libraries in E. coli. cDNA inserts in the isolated plasmids were subjected to Sanger-based DNA sequencing from their 5′ ends using a T7 primer in the case of the Cam1 library (LIBEST accession 027979) or M13 forward primer in the case of the CsDS2 library (LIBEST accession 027980) as described. Sequences were trimmed to remove vector and low-quality sequences. Tentative identities of cDNAs were assigned following BLASTX and BLASTN analyses of trimmed sequences.

454 transcriptome sequencing

About 200 ng of polyA+-enriched RNA was used to prepare a single sequencing library with custom adaptors according to the methods of Carter, Smith and Mockaitis (in press). The double-stranded cDNA intermediate was partially normalized by duplex-specific nuclease (DSN) treatment (Evrogen, Moscow, Russia) to reduce the representation of the most abundant transcripts. The cDNA was sheared by nebulization (30 s, 30 psi) before adaptor ligation. The final library was assessed on a Bioanalyzer DNA7500 chip (Agilent) and showed a peak size of 660 bp.

Emulsion PCR and sequencing were carried out according to the manufacturer's instructions (Roche/454 Sequencing, Branford, CT). Two regions of a two-region GS-FLX Titanium PicoTitre™ plate and three regions of a four-region plate were run for 800 cycles.

Transcriptome assembly

Reads were cleaned of adaptor sequences (http://sourceforge.net/projects/estclean/) and assembled together with the Sanger-sequenced EST clone reads, using NEWBLER v2.3 GSAssembler (Roche/454 Sequencing) with default parameters for cDNA (40 bp overlap; 90% identity). Using NEWBLER 2.3 GSMapper, 7.5% of singletons mapped back to this assembly. Finally, 253 additional isotigs were assembled after subjecting the remaining singletons to another round of assembly as above.

The camelina transcriptome provided here includes both assembled transcripts (isotigs and unbroken contigs from the Newbler assembly) and singletons; only elements at least 100 nt long are retained in the data set.
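The 100-nt length threshold amounts to a simple filter over the assembly FASTA file. The Python sketch below is illustrative only: the FASTA parsing is a minimal hand-written reader, and the record names in the usage lines are hypothetical rather than real identifiers from the camelina assembly.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_LENGTH = 100  # nt; threshold stated in the text


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def keep_long(records: Iterable[Tuple[str, str]],
              min_len: int = MIN_LENGTH) -> List[Tuple[str, str]]:
    """Drop assembly elements (isotigs, contigs or singletons) shorter than min_len."""
    return [(h, s) for h, s in records if len(s) >= min_len]


# Hypothetical records: a 110-nt element and a 99-nt element.
example = [">isotig_demo", "A" * 60, "C" * 50, ">singleton_demo", "G" * 99]
print([h for h, _ in keep_long(read_fasta(example))])  # ['isotig_demo']
```

A streaming reader like this keeps memory use low, which matters for a transcriptome with tens of thousands of elements.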

Phylogenetic tree building and sequence alignments

Phylogenetic trees were generated after ClustalW sequence alignment with the MEGA4 software neighbour-joining tree application and 1000 bootstrap replicates (Tamura et al., 2007; Thompson et al., 1994).
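MEGA4 performs the bootstrap analysis internally, but the resampling step behind the 1000 replicates can be sketched. In this Python example, each pseudo-replicate is built by sampling alignment columns with replacement (the standard nonparametric bootstrap). The neighbour-joining tree building that would be applied to every replicate is not shown, and the sequence names in the usage line are hypothetical.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One pseudo-replicate: resample alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column indices are used for every sequence, so sites stay aligned.
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int = 1000,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Generate n_replicates resampled alignments (1000 in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


print(len(bootstrap_replicates({"seq1": "ACGT", "seq2": "ACGA"})))  # 1000
```

Each replicate would then be passed to the tree builder, and the bootstrap support for a clade is the fraction of replicate trees that recover it.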

Homology analysis and construction of the Camelina acyl lipid metabolism gene database

The camelina transcriptome assembly was matched by BLASTX (Altschul et al., 1990), using the BLOSUM62 scoring matrix and a word size of 3, to protein sequences of TAIR10 representative gene models (www.arabidopsis.org) with an E-value limit of 1e−5. The top hit(s) for each query sequence were retained based on best bit score and E-value. Second, the TAIR10 models were matched to camelina assembly elements (isotigs and singletons) using tBLASTN with an E-value limit of 1e−5. Candidate acyl lipid metabolism gene sequences were retrieved from the Arabidopsis Lipid Gene database (http://lipids.plantbiology.msu.edu/), and the BLAST result sets above were trimmed to include only these genes. Only the best isotig for each isogroup was retained, which removed putative alternative transcripts of the same gene.

The Camelina acyl lipid metabolism gene database (www.camelinagenome.org) was constructed using Dreamweaver MX version 7.0.1 (Macromedia Inc., San Francisco, CA) and custom scripts written in PERL v5.10.1. The BLAST server uses NCBI BLAST 2.2.22.

Analysis of lipid metabolism gene variants within isotigs

A custom Perl script was written to estimate the number of transcript variants within the isotigs listed in Table S1. Briefly, reads were mapped back to the consensus isotig into which they had been assembled, and SNPs were identified. Putative transcript variants were then called from the SNPs shared among overlapping reads, giving two estimates: a maximum, which allows for all possible combinations of overlapping reads, and a minimum, which assigns the polymorphisms to the smallest possible number of transcripts. The script and its full description are available at www.camelinagenome.org (see also Supporting information).
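The idea of bracketing the number of transcript variants between a minimum and a maximum can be illustrated with a simplified sketch. This Python example is not the authors' Perl script, which is available at www.camelinagenome.org. Its two rules are hypothetical simplifications. The minimum is the largest number of alleles observed at any single SNP site, because each transcript carries only one base per site. The maximum counts distinct SNP patterns among reads, discarding only patterns fully contained in another observed pattern. Because the maximum does not merge compatible but partially overlapping patterns, it errs on the high side.

```python
from collections import defaultdict
from typing import Dict, List

Pattern = Dict[int, str]  # SNP position in the isotig -> base observed in one read


def min_variants(reads: List[Pattern]) -> int:
    """Lower bound: at least as many transcripts as alleles at the most
    polymorphic single SNP site (1 if the isotig has no SNPs)."""
    alleles = defaultdict(set)
    for read in reads:
        for pos, base in read.items():
            alleles[pos].add(base)
    return max((len(bases) for bases in alleles.values()), default=1)


def max_variants(reads: List[Pattern]) -> int:
    """Upper estimate: distinct read SNP patterns, ignoring any pattern that is
    a proper subset of another observed pattern."""
    patterns = {frozenset(read.items()) for read in reads if read}
    maximal = [p for p in patterns if not any(p < q for q in patterns)]
    return max(len(maximal), 1)


# Hypothetical reads covering SNP sites at isotig positions 100, 150 and 210.
reads = [{100: "A", 150: "G"}, {100: "A"}, {100: "T", 150: "G"},
         {150: "C", 210: "A"}, {100: "T", 150: "C"}]
print(min_variants(reads), max_variants(reads))  # 2 4
```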

SDS-PAGE analyses of SSPs

Briefly, crude SSPs were extracted by grinding 100 mg of mature seeds in 200 μL of extraction buffer [100 mM Tris–HCl, pH 8.0, 0.5% SDS (w/v), 10% glycerol (v/v) and 2% β-mercaptoethanol (v/v)]. Extracts were boiled for 3 min and centrifuged for 3 min at 16 300 g. Twelve microlitres of each supernatant was resolved by 15% SDS-PAGE. Gels were stained with Coomassie Brilliant Blue R250 for 30 min and destained in 10% (v/v) glacial acetic acid/40% (v/v) methanol until bands were visible. For the transgenic SSP camelina seed analyses, single seeds were ground in 100 μL of extraction buffer, and 40 μg of protein from each supernatant was resolved by 15% SDS-PAGE. Total seed protein was measured as described in Supporting information.
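Loading a fixed 40 μg of protein per lane means that the volume taken from each single-seed extract depends on its measured protein concentration. The Python sketch below shows this arithmetic. The function name and the 5 μg/μL concentration in the usage line are hypothetical, and the protein assay itself (described in Supporting information) is not modelled.

```python
def loading_volume_ul(target_ug: float, protein_ug_per_ul: float) -> float:
    """Volume of single-seed extract (µL) that contains target_ug of total protein."""
    if protein_ug_per_ul <= 0:
        raise ValueError("protein concentration must be positive")
    return target_ug / protein_ug_per_ul


# Hypothetical extract at 5 µg/µL: 40 µg of protein is contained in 8 µL.
print(loading_volume_ul(40.0, 5.0))  # 8.0
```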

Mass spectrometry analyses

Mass spectrometry analyses were carried out by the Nebraska Center for Mass Spectrometry (NCMS) at the University of Nebraska-Lincoln using protein spot picking and mass spectrometry methods as described (Shevchenko et al., 1996; Schweitzer et al., 2012) with slight modifications described in Supporting information.

Confirmation of individual camelina SSPs

To confirm that the camelina 2S SSP transcripts represented distinct genes, gene-specific primers were designed to PCR-amplify each candidate from a camelina cDNA library (Table S2), and the PCR products were verified by sequencing.

Preparation of RNAi suppression constructs for camelina SSP and FAD2/FAE1, and Arabidopsis FAD2/FAE1 genes

Detailed descriptions of the construction of the RNAi suppression constructs targeting camelina SSP and FAD2/FAE1 genes, and of those built from Arabidopsis FAD2/FAE1 sequences, are provided in Supporting information.

Camelina transformation and selection of transformants

Camelina plants were grown under greenhouse conditions with 14-h day length (24–26 °C) and 8-h dark (18–20 °C) with natural and supplemental lighting at 400–500 μmol m−2 s−1. Transgenic camelina lines were generated according to the Agrobacterium-mediated method of Lu and Kang (2008). DsRed-positive seeds were identified using a green LED flashlight with a red camera filter lens (Lu and Kang, 2008). For Basta-resistant lines, T1 seeds were planted in flats, and the first true leaves were allowed to emerge. A 0.003% Basta solution was sprayed onto the surface of the leaves at 3- to 4-day intervals. Resistant young plants were transferred to individual pots and allowed to mature. For kanamycin-resistant lines, approximately 1 g of sterilized T1 seeds was spread on a 15-cm plate containing 1 × MS with vitamins (pH 5.7) medium with 1% sucrose, 0.28% phytagar and 50 μg/mL kanamycin. After 2 weeks of growth at room temperature under continuous light, resistant plants were planted and transferred to the greenhouse.

Analysis of camelina seed fatty acid composition

Fatty acid methyl esters (FAMEs) were prepared by transesterification with trimethylsulphonium hydroxide (TMSH; Butte, 1983). Single transgenic seeds were directly crushed in 50 μL of TMSH in glass GC vials. Heptane (400 μL) was added to each vial. After room temperature incubation with agitation for 30 min, FAMEs were analysed by gas chromatography as described (Cahoon et al., 2006). Total fatty acid content of seeds was measured as described in Supporting information.
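The wt% values reported throughout are proportions of total fatty acids derived from the GC traces. A common simplification converts FAME peak areas to weight percentages by area normalisation, assuming that detector response is proportional to FAME mass. This Python sketch shows that calculation, with optional per-peak response factors. The method of Cahoon et al. (2006) may differ in detail, and the peak areas in the usage lines are invented.

```python
from typing import Dict, Optional


def weight_percent(peak_areas: Dict[str, float],
                   response_factors: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Convert FAME peak areas to wt% of total fatty acids by area normalisation.

    Assumes detector response is proportional to FAME mass; optional per-peak
    response factors (default 1.0) can correct for unequal response.
    """
    factors = response_factors or {}
    corrected = {fa: area * factors.get(fa, 1.0) for fa, area in peak_areas.items()}
    total = sum(corrected.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {fa: 100.0 * area / total for fa, area in corrected.items()}


# Hypothetical peak areas for one seed (arbitrary units).
areas = {"16:0": 60, "18:1": 120, "18:2": 170, "18:3": 360, "20:1": 120, "other": 170}
print(round(weight_percent(areas)["18:1"], 1))  # 12.0
```

Because each seed is normalised to its own total, single-seed measurements can be compared directly even though seed sizes differ.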

Acknowledgements

The authors thank Zach Smith and James Ford (IU CGB) for preparing the 454 library and sequencing runs, respectively, and Aaron Buechlein (IU CGB) for assistance with data analysis. We also thank Rebecca Cahoon for technical assistance and for critique of the manuscript. Transcriptome sequencing was supported by the Center for Advanced Biofuel Systems (CABS), an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Award Number DE-SC0001295 to EBC. Camelina engineering studies were supported by the U.S. Department of Agriculture-Agriculture and Food Research Initiative 2009-05988, and NSF Plant Genome IOS 0701919 to EBC. The IU CGB was supported in part by the Indiana METACyt Initiative of Indiana University and by Lilly Endowment, Inc.
