• Open Access

Optimization of the expression of the HIV fusion inhibitor cyanovirin-N from the tobacco plastid genome


(fax +49 331 567 8701; email rbock@mpimp-golm.mpg.de)


Plants with transgenic plastid (chloroplast) genomes represent a promising production platform in molecular farming, mainly because of the plastids’ potential to accumulate foreign proteins to very high levels and the increased biosafety conferred by the maternal mode of plastid inheritance. Although some transgenes can be expressed to extraordinarily high levels, the expression of others has been unsuccessful. Lack of detectable transgene expression is usually attributable to either RNA instability or protein instability. Here, we have investigated ways to improve the production of a pharmaceutical protein that is difficult to express in chloroplasts: the HIV-1 fusion inhibitor cyanovirin-N (CV-N). By testing various N-terminal and C-terminal fusions to peptide sequences from two proteins known to accumulate to high levels in transgenic plastids (GFP and the protein antibiotic PlyGBS), we show that both low mRNA stability and low protein stability contribute to the lack of detectable CV-N expression in chloroplasts. Both problems can be alleviated by N-terminal fusions to the CV-N coding region, thus highlighting a suitable strategy for optimizing plastid transgene expression.


Techniques for inserting transgenes into the plastid (chloroplast) genome by genetic transformation have stirred considerable interest among plant biotechnologists (Maliga, 2004; Bock, 2007; Koop et al., 2007). Transgene expression in plastids offers a number of attractions, including high-precision transgene insertion via homologous recombination, absence of gene silencing mechanisms and position effects, convenient transgene pyramiding by co-expression from operons and greatly increased transgene confinement because of maternal chloroplast inheritance (Bock, 2001; Ruf et al., 2007; Svab and Maliga, 2007). Arguably the most alluring feature of chloroplast transformation is the potential to achieve extraordinarily high accumulation levels of foreign proteins (De Cosa et al., 2001; Tregoning et al., 2003; Molina et al., 2004; Zhou et al., 2008; Oey et al., 2009a,b). However, although a number of recombinant proteins could be expressed to levels >5% of the total soluble protein (TSP; reviewed, e.g., in Daniell et al., 2009; Bock and Warzecha, 2010) and, in the most extreme case, accumulation levels of >70% of TSP were achieved (Oey et al., 2009b), there are also a number of examples of foreign proteins that accumulated to only very low levels or could not be expressed to detectable levels at all (Birch-Machin et al., 2004; Bellucci et al., 2005). It appears that accumulation of foreign proteins in transgenic chloroplasts is often limited by protein stability (Birch-Machin et al., 2004; Zhou et al., 2008; Oey et al., 2009a,b), although lack of RNA stability can also be responsible for unsuccessful expression of plastid transgenes (Wurbs et al., 2007). In a few reported cases, N-terminal peptide fusions resulted in serendipitous stabilization of otherwise unstable recombinant proteins (Ye et al., 2001; Herz et al., 2005; Lenzi et al., 2008), but the molecular basis for this effect has remained unknown.
Recent work has demonstrated that key determinants of plastid protein stability reside in the N-terminus. Different N-terminal fusions to a reporter protein, while not affecting translation rates, can cause large differences in protein accumulation (Apel et al., 2010).

Here, we have investigated factors that limit expression of a pharmaceutical protein that could not be expressed to detectable levels in plastids using standard approaches, and evaluated possible strategies to improve transgene expression. Cyanovirin-N (CV-N) is a small protein (11 kDa) produced by the cyanobacterium Nostoc ellipsosporum. The physiological function of CV-N in the cyanobacterium is unknown. In screens for inhibitors that block entry of the human immunodeficiency virus (HIV) into human cells, CV-N was identified as a potent antiviral agent (Boyd et al., 1997). CV-N binds irreversibly to the HIV surface envelope glycoprotein gp120 by targeting N-linked high-mannose oligosaccharides (Bewley et al., 2002; Botos et al., 2002). Masking of these sugar moieties blocks binding of the virus to its receptors on host cells, thereby inhibiting attachment and fusion of the viral particles to the target cells and ultimately preventing infection. Low nanomolar concentrations of CV-N inactivate all variants of HIV-1, and the protein is also active against HIV-2, SIV and herpes virus. The CV-N protein contains two intramolecular disulphide bonds, and structural analyses have revealed that the protein occurs as a dimer (Barrientos and Gronenborn, 2002; Matei et al., 2010). Several properties make CV-N an excellent candidate for development as a topical anti-HIV microbicide. These include its high specific activity against HIV, its extreme resistance to physicochemical degradation and its high safety (for review, see, e.g., De Clercq, 2000; Xiong et al., 2010). Preclinical development of CV-N is currently underway. Recombinant production of CV-N has been achieved in several systems, including bacteria and transgenic plants, but protein yields have been relatively low (Colleluori et al., 2005; Sexton et al., 2006; Gao et al., 2010; Xiong et al., 2010).
Because development of CV-N as an affordable anti-HIV microbicide will require inexpensive mass production of the protein, we attempted expression of CV-N from the plastid genome of tobacco plants. As our initial efforts did not result in accumulation of detectable amounts of CV-N in chloroplasts, we decided to investigate the molecular cause of unsuccessful transgene expression and test possible strategies to improve CV-N expression levels in transplastomic plants.


Design of expression constructs for stabilization of cyanovirin-N in plastids

We initially attempted to express the anti-HIV polypeptide CV-N from the tobacco chloroplast genome using standard vectors for high-level transgene expression (Zhou et al., 2008). Although transplastomic plants were readily obtained, the CV-N protein was undetectable in Western blot analyses with anti-CV-N antibodies (see below and data not shown). As protein stability has recently emerged as a major limitation to foreign protein accumulation in transgenic plastids (Birch-Machin et al., 2004) and the N-terminal amino acid sequence has been found to harbour important determinants of protein stability in chloroplasts (Apel et al., 2010), we decided to explore the possibility of improving CV-N expression by protecting its termini with peptide sequences from proteins that had been demonstrated to accumulate to very high levels in chloroplasts. We chose two such proteins: the green fluorescent protein GFP and the phage endolysin protein PlyGBS. GFP is a standard reporter of gene expression and has previously been used in numerous transplastomic studies (Reed et al., 2001; Newell et al., 2003; Limaye et al., 2006; Stegemann and Bock, 2009). Expression levels are typically in the range of a few per cent of the total soluble protein (TSP; Reed et al., 2001), but can be considerably higher in some protein fusions (Limaye et al., 2006). PlyGBS is a bactericidal protein with considerable potential as a next-generation antibiotic. When expressed from the tobacco plastid genome, it accumulated to more than 70% of the TSP, and these extraordinarily high expression levels were shown to be largely due to the protein’s extremely high stability inside chloroplasts (Oey et al., 2009b).

As CV-N is a relatively small polypeptide and it is well known that small proteins tend to be more unstable (Ortigosa et al., 2010), we first wanted to determine the effects of fusing long polypeptide sequences to the N-terminus or the C-terminus of CV-N. To this end, we fused the full-length gfp gene sequence in frame to the cv-n reading frame in three different ways (Figure 1). In vector pZE15, gfp precedes cv-n, thus producing a GFP::CV-N fusion protein. In vector pDK122, gfp follows cv-n, resulting in expression of a CV-N::GFP fusion. In vector pZE16, cv-n was embedded within the gfp coding region, thus producing a protein whose N- and C-termini are from GFP (Figure 1b).

Figure 1.

 Construction of transformation vectors for optimizing cyanovirin-N expression from the tobacco plastid genome. (a) Physical map of the targeting region in the plastid genome and structure of plastid transformation vectors containing cv-n expression cassettes. Genes above the line are transcribed from left to right, and genes below the line are transcribed in the opposite direction. The transgenes are targeted to the intergenic region between trnfM and trnG. The expression cassette consists of the Chlamydomonas rRNA operon promoter (Prrn) fused to the phage T7 gene 10 leader sequence (T7g10) and the atpA 3′ UTR (TatpA) from Chlamydomonas. Expected sizes of DNA fragments in restriction fragment length polymorphism (RFLP) analyses with the restriction enzymes BamHI or BglII are indicated. The location of the RFLP probe is shown as a black bar. The selectable marker gene aadA is driven by a chimeric rRNA operon promoter (Prrn; Svab and Maliga, 1993) and fused to the 3′ UTR from the tobacco psbA gene (TpsbA). (b) Overview of expression constructs analysed in this work. The names of the 14 plastid transformation vectors are given, and the foreign proteins expressed from them are shown schematically. The calculated molecular masses of all proteins are also given. N, N-terminal sequence from PlyGBS or GFP; C, C-terminal sequence from PlyGBS or GFP; aa, amino acids.

In addition to determining the influence of large N-terminal and C-terminal fusions, we also wanted to test whether the addition of short sequences to the termini of CV-N can improve protein accumulation. We therefore constructed a series of fusions to N-terminal and C-terminal sequences of different lengths taken from the highly stable endolysin protein PlyGBS. In steps of 10 amino acids, we systematically elongated the termini of the CV-N protein, producing three sets of constructs, each comprising (i) an N-terminal fusion, (ii) a C-terminal fusion and (iii) the double fusion to both termini. Varying the length of the fused PlyGBS sequences from 10 to 30 amino acids thus yielded a total of nine fusion constructs (Figure 1b). Finally, constructs for expression of the unfused GFP (pDK133) and the unfused CV-N (pZE17) were generated as controls.
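The arithmetic of this design can be checked with a short Python sketch. It is only a bookkeeping aid under our own naming: it crosses the three PlyGBS peptide lengths with the three fusion positions, converts peptide length into inserted nucleotides (3 nt per codon) and adds the three GFP fusions and two controls named in the text. Vector names are attached only where the text gives them; the helper `plygbs_designs` is hypothetical.

```python
from itertools import product

# PlyGBS peptide lengths (amino acids) and fusion positions described in the text.
PLYGBS_LENGTHS_AA = (10, 20, 30)
POSITIONS = ("N-terminal", "C-terminal", "both termini")

# GFP fusions and unfused controls, as named in the text.
GFP_FUSIONS = {"pZE15": "GFP::CV-N", "pDK122": "CV-N::GFP", "pZE16": "CV-N embedded in GFP"}
CONTROLS = {"pDK133": "unfused GFP", "pZE17": "unfused CV-N"}


def plygbs_designs():
    """Return (length_aa, position, nt_added_5prime, nt_added_3prime) for each PlyGBS fusion."""
    designs = []
    for length_aa, position in product(PLYGBS_LENGTHS_AA, POSITIONS):
        nt = 3 * length_aa  # three nucleotides per codon
        five_prime = nt if position in ("N-terminal", "both termini") else 0
        three_prime = nt if position in ("C-terminal", "both termini") else 0
        designs.append((length_aa, position, five_prime, three_prime))
    return designs


designs = plygbs_designs()
total_vectors = len(designs) + len(GFP_FUSIONS) + len(CONTROLS)
print(len(designs), total_vectors)  # 9 14
```

The nucleotide columns make it easy to see which constructs place 30, 60 or 90 nt of plyGBS sequence upstream of the cv-n coding region, a distinction that becomes relevant in the transcript analysis.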

The gene constructs were inserted into an expression cassette consisting of the ribosomal RNA operon promoter (Prrn) fused to the strong translation initiation signals derived from the gene 10 leader of bacteriophage T7 (T7g10; Kuroda and Maliga, 2001) and the 3′ UTR sequence from the atpA gene. Promoter and 3′ UTR sequences were taken from Chlamydomonas reinhardtii to avoid unwanted recombination with the resident copies in the tobacco plastid genome (Rogalski et al., 2006, 2008). All expression cassettes were integrated into a standard plastid transformation vector containing a chimeric spectinomycin resistance gene (aadA) as selectable marker (Figure 1a).

Introduction of cv-n fusion genes into the tobacco plastid genome

The 14 constructed vectors (Figure 1b) were used for plastid transformation of tobacco by particle gun-mediated (biolistic) transformation (Svab and Maliga, 1993). The bombarded leaf samples were subjected to spectinomycin selection by exposing leaf pieces to antibiotic-containing regeneration medium. Spectinomycin-resistant lines were obtained with all transformation vectors. To eliminate spontaneous antibiotic-resistant mutants that can arise from specific point mutations in the plastid 16S rRNA gene, all primary spectinomycin-resistant lines were tested for double resistance to spectinomycin and streptomycin (Svab and Maliga, 1993; Bock, 2001). Doubly resistant lines were considered true chloroplast transformants (transplastomic lines) and subjected to three additional regeneration rounds on spectinomycin-containing medium to isolate homoplasmic tissue. Successful transformation of the plastid genome and correct targeting of the transgenes to the intergenic spacer between the trnfM and trnG genes by homologous recombination (Figure 1a) were analysed by Southern blot experiments for two independently generated transplastomic lines per construct (Figure 2a). The results confirmed that transplastomic lines had been obtained for all constructs. All lines showed the expected hybridization patterns in restriction fragment length polymorphism (RFLP) analyses, and the absence of a hybridization signal for the wild-type fragment suggested that the lines are homoplasmic. However, upon strong exposure of the blots, weak signals corresponding in size to the wild-type fragments were seen (Figure 2a and data not shown). Previous work had established that such signals usually do not come from residual copies of the wild-type plastid genome, but are instead explained by the presence of so-called promiscuous DNA of plastid origin in the nuclear genome (Hager et al., 1999; Ruf et al., 2000).

Figure 2.

 Analysis of plastid transformants. (a) Restriction fragment length polymorphism analysis of transplastomic tobacco lines. Total cellular DNA was digested with the restriction enzymes indicated and hybridized to a radiolabelled probe detecting the region of the plastid genome that flanks the transgene insertion site (Figure 1a). The GFP constructs and the unfused CV-N expression construct were analysed with the restriction enzyme BglII, and all PlyGBS fusion constructs were assayed with the enzyme BamHI (cf. Figure 1a). The upper panel shows a first set of transplastomic lines (one per construct), the lower panel a second set of independently generated lines. Fragment sizes for the wild type (3.4 kb BglII fragment and 4.4 kb BamHI fragment) and the unfused CV-N expression construct (6.5 kb) are indicated (cf. Figure 1a). The size difference between the hybridization signals in the wild type and the transplastomic lines corresponds to the combined size of the two transgene cassettes. (b) Seed germination assays on spectinomycin-containing medium to confirm homoplasmy of transplastomic lines. Five lines representing different constructs are shown as examples. Wt, wild type.
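The size logic in this legend can be made explicit with a little arithmetic, sketched here in Python. Subtracting the 3.4-kb wild-type BglII fragment from the 6.5-kb fragment of the unfused CV-N line gives the combined cassette size of about 3.1 kb. The second helper, which adds extra fusion sequence to predict fragment sizes for other constructs, is hypothetical: it assumes the extra sequence lies between the two flanking restriction sites and introduces no new site for the enzyme, which the legend does not state.

```python
# Fragment sizes (kb) stated in the Figure 2 legend.
WT_BGLII_KB = 3.4
WT_BAMHI_KB = 4.4
PZE17_BGLII_KB = 6.5


def insert_size_kb(transplastomic_kb: float, wild_type_kb: float) -> float:
    """Combined size of the integrated cassettes, inferred from the RFLP shift."""
    return transplastomic_kb - wild_type_kb


def expected_fragment_kb(wild_type_kb: float, cassettes_kb: float, extra_nt: int = 0) -> float:
    """Predicted hybridizing fragment size for a transplastomic line.

    Hypothetical assumption: any additional fusion sequence lies between the two
    flanking restriction sites and creates no new site for the enzyme used.
    """
    return wild_type_kb + cassettes_kb + extra_nt / 1000


cassettes = insert_size_kb(PZE17_BGLII_KB, WT_BGLII_KB)
print(round(cassettes, 2))  # 3.1
# Hypothetical example: a construct carrying 90 nt of extra fusion sequence, BamHI digest.
print(round(expected_fragment_kb(WT_BAMHI_KB, cassettes, extra_nt=90), 2))  # 7.59
```

Short fusions of 30 to 180 nt shift the fragment by less than 0.2 kb, so on an RFLP blot they mainly confirm correct integration rather than resolve individual constructs.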

To confirm homoplasmy conclusively, we conducted inheritance tests, which provide the most sensitive assay of the homoplasmic state of plastid transformants (Bock, 2001). Germination of T1 seeds on antibiotic-containing medium revealed a homogeneous population of spectinomycin-resistant seedlings, strongly indicating that the transplastomic lines are indeed homoplasmic and confirming maternal inheritance of the aadA transgene, as expected for a plastid gene in tobacco (Ruf et al., 2007).

None of the transplastomic lines displayed any phenotypic abnormalities and, upon transfer to soil and growth under standard greenhouse conditions, all transplastomic plants were indistinguishable from wild-type plants.

cv-n mRNA accumulation in transgenic plastids

To test for expression of the introduced transgene constructs, we first performed Northern blot experiments using transgene-specific probes. Hybridization to a cv-n probe detected a doublet of transcripts (Figure 3): in plants expressing the unfused cv-n gene (pZE17), the upper band of ∼0.7 kb represents the expected full-length cv-n mRNA and the lower band a short transcript of ∼0.5 kb. In the GFP fusion constructs pZE15, pDK122 and pZE16, the expected ∼1.5-kb mRNA for the fusion gene was detected; in pZE15, the same ∼0.5-kb short transcript seen in pZE17 was additionally present. The short transcript was also seen in all nine plyGBS fusion constructs. However, while its size increased with the length of the plyGBS sequence fused to cv-n in all constructs carrying C-terminal fusions, it remained unchanged compared with pZE17 in all N-terminal fusions (pZE23, pZE48 and pZE49). We considered all possible explanations for the appearance of this additional band (RNA degradation, aberrant RNA processing, premature transcription termination and alternative transcription initiation from a cryptic promoter). Only two were consistent with the observations that (i) the size of the transcript remained unchanged in all N-terminal fusion constructs and (ii) the short transcript was not detected in the two constructs carrying long C-terminal GFP fusions (pDK122 and pZE16): alternative transcription initiation from a cryptic promoter within the cv-n coding region, or aberrant RNA processing in the 5′ part of cv-n. In both cases, only C-terminal fusions increase the size of the short transcript. In construct pDK122, which carries the entire gfp reading frame fused to the 3′ end of cv-n, the size difference between the short and the full-length transcripts is too small to be resolved in the ∼1.5-kb range of the gel (Figure 3). In pZE16, the diffuse band at ∼1 kb could correspond to the combined size of the shortened cv-n sequence and the 3′ part of gfp (cf. Figures 1b and 3).
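The reasoning used to narrow down the explanations can be expressed as a toy length model, sketched in Python below. It assumes that the short species has a new 5′ end inside cv-n but shares the 3′ end of the full-length mRNA; under that assumption its length is unaffected by N-terminal fusions and grows with C-terminal fusions, matching the Northern blot pattern. A new 3′ end inside cv-n (for example, premature termination) would instead make the short species grow with N-terminal fusions. All lengths, including the offset `NEW_5PRIME_END`, are arbitrary placeholders, not values taken from the construct maps.

```python
from dataclasses import dataclass

# Placeholder lengths (nt), chosen only so that the unfused transcript is ~0.7 kb
# and the short species ~0.5 kb; they are NOT derived from the actual constructs.
LEADER_NT = 100        # promoter-derived 5' leader
CVN_NT = 300           # cv-n coding region
TRAILER_NT = 300       # 3' UTR
NEW_5PRIME_END = 100   # hypothetical offset of the short species' 5' end inside cv-n


@dataclass
class Construct:
    name: str
    n_fusion_nt: int = 0  # sequence inserted upstream of cv-n (N-terminal fusion)
    c_fusion_nt: int = 0  # sequence inserted downstream of cv-n (C-terminal fusion)


def full_length_nt(c: Construct) -> int:
    """Length of the full-length transcript from leader to 3' UTR."""
    return LEADER_NT + c.n_fusion_nt + CVN_NT + c.c_fusion_nt + TRAILER_NT


def short_species_nt(c: Construct) -> int:
    """Short RNA with a new 5' end inside cv-n (cryptic promoter or 5' processing).

    It shares its 3' end with the full-length mRNA, so only sequence added
    downstream of the new 5' end (C-terminal fusions) changes its length.
    """
    return (CVN_NT - NEW_5PRIME_END) + c.c_fusion_nt + TRAILER_NT


for c in (Construct("unfused"),
          Construct("N-terminal 90 nt", n_fusion_nt=90),
          Construct("C-terminal 90 nt", c_fusion_nt=90)):
    print(c.name, full_length_nt(c), short_species_nt(c))
# unfused 700 500
# N-terminal 90 nt 790 500
# C-terminal 90 nt 790 590
```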
At present, we cannot confidently distinguish between transcription initiation from a cryptic promoter and aberrant RNA processing as the cause of the additional, shorter-than-normal transcript species. When searching the cv-n coding region for putative cryptic promoter sequences, we found a perfect −10 box (TATAAT) starting at position +28 of the cv-n coding region. At the conserved spacing of 18 nt upstream, there is a sequence that loosely resembles an (imperfect) −35 box, and at the conserved spacing of 7 nt downstream, there is an A nucleotide that could represent the transcription start site. However, experimental evidence from in vitro capping or a similar technique suitable for identifying primary transcription start sites would be required to confirm that the short transcript species really originates from cryptic transcription initiation.
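A motif search of the kind described here can be sketched in Python. The sketch rests on several assumptions of our own: it uses the E. coli σ70 consensus hexamers (TTGACA, TATAAT) as reference, interprets the 18-nt spacing as the number of nucleotides between the end of the −35 hexamer and the start of the −10 box, interprets the 7-nt spacing as the distance from the last base of the −10 box to the putative +1, and runs on a hypothetical toy sequence rather than the cv-n sequence. It only flags candidates; start sites still have to be mapped experimentally.

```python
# E. coli sigma70-type consensus hexamers, used here only as reference patterns.
MINUS10 = "TATAAT"
MINUS35 = "TTGACA"


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_cryptic_promoters(seq: str, spacer: int = 18, tss_gap: int = 7) -> list:
    """Find perfect -10 boxes; report the hexamer `spacer` nt upstream and the putative +1.

    `spacer`: nt between the end of the -35 hexamer and the start of the -10 box.
    `tss_gap`: distance from the last base of the -10 box to the putative +1.
    Positions are 0-based indices into `seq`.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(MINUS10)
    while start != -1:
        m35_start = start - spacer - len(MINUS35)
        tss = start + len(MINUS10) - 1 + tss_gap
        if m35_start >= 0 and tss < len(seq):
            box35 = seq[m35_start:m35_start + len(MINUS35)]
            hits.append({
                "minus10_pos": start,
                "minus35_box": box35,
                "minus35_mismatches": mismatches(box35, MINUS35),
                "tss_pos": tss,
                "tss_base": seq[tss],
            })
        start = seq.find(MINUS10, start + 1)  # continue with overlapping matches
    return hits


# Hypothetical toy sequence (not cv-n): imperfect -35 box, 18-nt spacer, -10 box, +1 A.
toy = "GC" + "TTGAAA" + "GC" * 9 + "TATAAT" + "GCGCGC" + "A" + "GGG"
print(scan_cryptic_promoters(toy))
```

For the toy sequence, the scan reports one candidate: a −10 box at index 26, an upstream hexamer TTGAAA with one mismatch to the consensus, and an A at the putative start site.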

Figure 3.

 Analysis of RNA accumulation from the various expression constructs in transplastomic tobacco plants. Total cellular RNA was electrophoretically separated, blotted and hybridized to a radiolabelled cv-n probe (upper panel) or an aadA probe (lower panel). Sizes of major hybridizing bands are indicated in kb. Note that most transplastomic lines (with the exception of the GFP fusions in pZE16 and pDK122) accumulate an additional small RNA species that is shorter than the full-length cv-n mRNA and may be the product of alternative transcription initiation within the cv-n coding region (cf. Figure 1a). The asterisk marks a diffuse band in the pZE16-4 sample that corresponds to the combined size of the shortened cv-n sequence and the 3′ part of gfp (see text for details). The upper bands in pZE48, pZE46, pZE49 and pZE47 most probably represent a read-through transcript, which has been observed previously (Zhou et al., 2008) and may originate from incomplete processing at the atpA 3′ UTR. The aadA hybridization signals were used for normalization and quantification of mRNA accumulation from the cv-n fusion genes.

The most interesting result from the transcript analysis was the finding that the different fusion constructs showed strong variation in RNA accumulation levels. Among the plyGBS fusions, constructs pZE48, pZE46, pZE49 and pZE47 displayed much higher transcript accumulation levels than the others (Figure 3). The common feature of these four constructs is that they harbour ≥60 nt of plyGBS sequence at the 5′ end of the coding region. Thus, 5′ fusions to the cv-n coding region appear to exert a strong stabilizing effect on the mRNA. Interestingly, this effect was not present in constructs pZE23 and pZE21, which carry the 10 amino acid (30 nt) plyGBS fusion at the 5′ end of the cv-n coding region, suggesting that the insertion must have a certain minimum length to confer the transcript-stabilizing effect. This interpretation is further supported by the analysis of the gfp fusions. While N-terminal fusion of either the full-length gfp sequence (pZE15) or its 5′ half (pZE16) resulted in strong transcript accumulation, fusion of gfp to the 3′ end of cv-n (pDK122) yielded much lower mRNA accumulation levels, consistent with the proposed transcript-stabilizing effect of (sufficiently long) insertions between the 5′ untranslated leader and the cv-n coding region.

Quantification of the differences in transcript accumulation, using a control hybridization to an aadA-specific probe for normalization (Figure 3), revealed that the insertion of 60 or 90 nt of 5′ plyGBS sequence resulted in an approximately 8- to 10-fold increase in mRNA accumulation levels. Taken together, these data raise the possibility that the poor expression of CV-N in plastids could, at least in part, be due to low transcript stability. Moreover, the data suggest that internal sequences can contribute significantly to the stability of transgene-derived mRNAs.
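The normalization step can be written out explicitly. In this Python sketch, each lane's cv-n signal is divided by its aadA signal, which corrects for differences in RNA loading, and the result is expressed relative to the unfused construct. The band intensities are hypothetical values chosen to fall in the 8- to 10-fold range reported here; they are not the measured data.

```python
def normalized_signal(cvn: float, aada: float) -> float:
    """cv-n hybridization signal divided by the aadA signal from the same lane."""
    return cvn / aada


def fold_change(sample: tuple, reference: tuple) -> float:
    """Ratio of normalized signals, sample relative to reference."""
    return normalized_signal(*sample) / normalized_signal(*reference)


# Hypothetical (cv-n, aadA) band intensities in arbitrary units; not measured data.
lanes = {
    "unfused cv-n": (120.0, 400.0),
    "60 nt 5' insertion": (1080.0, 400.0),
    "90 nt 5' insertion": (1500.0, 500.0),
}
reference = lanes["unfused cv-n"]
for name, signals in lanes.items():
    print(name, round(fold_change(signals, reference), 1))
```

Because aadA is expressed from its own promoter in every transplastomic line, it serves as an internal reference; the third lane shows that a higher raw cv-n signal does not translate directly into a higher fold change when more RNA was loaded.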

CV-N protein accumulation in transplastomic tobacco plants and bacteria: GFP fusions

Having shown that the different cv-n fusion genes give rise to large differences in mRNA accumulation levels, we next wanted to analyse gene expression at the protein level. To this end, we conducted a series of Western blot analyses using specific antibodies against GFP and CV-N. As chloroplasts are derived from cyanobacteria and have retained a prokaryotic-type gene expression machinery, plastid expression elements (promoters, 5′ and 3′ UTRs) are usually also active in bacteria (Apel et al., 2010). This allowed us to test the identical fusion gene constructs also in the bacterium Escherichia coli and, in this way, conduct a side-by-side comparison of the effects of the various fusions on protein accumulation between bacteria and plastids.

We first analysed accumulation of the fusions between CV-N and GFP in transplastomic plants. Probing of Western blots with an anti-GFP antibody revealed very different protein accumulation patterns for the three GFP fusions (Figure 4a). The full-length fusion protein of GFP and CV-N has a calculated molecular mass of 38 kDa. This 38-kDa protein was not detectable in transplastomic plants expressing the GFP::CV-N fusion (pZE15); instead, several smaller proteins accumulated (Figure 4a). These proteins presumably represent degradation intermediates of the full-length GFP::CV-N. In plants expressing the CV-N::GFP fusion (pDK122; Figure 4a), a putative degradation product of similar size to the largest band in the pZE15 sample was detected in addition to the full-length protein. Interestingly, none of the putative degradation products seen in the N-terminal and C-terminal fusions were observed in the transplastomic pZE16 plants. This may indicate that both the N-terminus and the C-terminus of the CV-N protein are prone to proteolytic degradation and suggests that protection of both termini is required to stabilize the protein in chloroplasts.

Figure 4.

 Western blot analysis to determine foreign protein accumulation in transplastomic plants and bacteria expressing gfp fusion constructs. (a) Protein accumulation in leaves from transplastomic plants. For pDK133, 1 μg of total soluble protein was loaded; for all other lines, 5 μg. For quantitative assessment of protein accumulation, purified recombinant GFP was loaded in four dilutions. The left blot was probed with an anti-GFP antibody, the right blot with an anti-CV-N antibody. Wt, wild-type tobacco. (b) Recombinant protein accumulation in Escherichia coli strains harbouring the same plasmid constructs as the corresponding transplastomic tobacco lines.
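Quantification against the purified GFP dilutions amounts to a standard-curve interpolation. The Python sketch below assumes that all band signals lie in the linear range of detection, fits a straight line to hypothetical standard values by ordinary least squares, and converts a sample band into a percentage of the total soluble protein loaded. The optional molar-mass correction (`mass_ratio`, a hypothetical parameter) is our own assumption; the legend does not state whether such a correction was applied.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_of_tsp(band, slope, intercept, tsp_loaded_ng, mass_ratio=1.0):
    """Convert a sample band into % of total soluble protein via the standard curve.

    The curve yields ng of GFP-equivalents. `mass_ratio` (hypothetical) rescales to
    the fusion protein's mass, e.g. 38 / 27 for a 38-kDa fusion read against a
    ~27-kDa GFP standard on a per-molecule basis.
    """
    ng_equivalents = (band - intercept) / slope
    return 100.0 * ng_equivalents * mass_ratio / tsp_loaded_ng


# Hypothetical densitometry values (arbitrary units) for four GFP standard amounts (ng).
standard_ng = [2.5, 5.0, 10.0, 20.0]
standard_signal = [50.0, 100.0, 200.0, 400.0]
slope, intercept = fit_line(standard_ng, standard_signal)

# Hypothetical sample lane: band of 300 units from 5 ug (5000 ng) of TSP.
print(round(percent_of_tsp(300.0, slope, intercept, tsp_loaded_ng=5000.0), 3))
```

With these illustrative numbers the sample corresponds to 15 ng of GFP-equivalents, or 0.3% of the 5 μg of TSP loaded.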

To obtain further information on the identity of the putative degradation intermediates accumulating in the pZE15 and pDK122 plants, we probed Western blots with an anti-CV-N antibody. Remarkably, with the exception of the largest protein species in the pZE15 plants, none of the prominent degradation products detected with the anti-GFP antibody were recognized by the anti-CV-N antibody (Figure 4a). This indicates that most of the degradation intermediates lack the CV-N sequence and provides strong evidence for degradation initiating in the CV-N part of the fusion proteins. Consistent with this interpretation, the unfused CV-N expressed from construct pZE17 was undetectable in the corresponding transplastomic plants.

To compare these results for protein accumulation in plastids with a bacterial system, we performed the analogous Western blots with E. coli strains expressing the identical constructs. In agreement with earlier reports (Colleluori et al., 2005), the unfused CV-N protein was readily detectable, indicating that CV-N may be significantly more stable in bacteria than in plastids. Although some weakly hybridizing putative degradation products could be detected in the bacterial strains expressing the GFP fusions, their accumulation levels were much lower than in chloroplasts (Figure 4b). Moreover, in contrast to the transplastomic plant lines, accumulation of the fusion proteins was very similar in all three transgenic bacterial strains. Similar to the chloroplast-produced protein, the bacterial GFP::CV-N fusion protein (expressed from construct pZE15) migrated faster than the other two fusion proteins of GFP and CV-N. Whether this is because of an unusual behaviour of this fusion protein in gel electrophoresis or rather because of a proteolytic processing event occurring in both plastids and bacteria is currently not clear. If aberrant electrophoretic mobility is the cause, the upper band in the transplastomic pZE15-1 sample (Figure 4a) would likely also represent the full-length fusion protein.

CV-N protein accumulation in transplastomic tobacco plants and bacteria: PlyGBS fusions

Next, we analysed protein accumulation for the CV-N fusions with N-terminal and/or C-terminal peptide sequences from the protein antibiotic PlyGBS. Comparison of all constructs expressed in transgenic chloroplasts (Figure 5a) revealed a conspicuous pattern. The CV-N protein was undetectable in all lines expressing proteins that expose the CV-N sequence at their N-terminus. In contrast, all fusions in which the N-terminus of CV-N was protected by PlyGBS sequences accumulated the CV-N protein. Remarkably, although producing 10-fold less RNA than the 20 and 30 amino acid fusions, the 10 amino acid fusion resulted in very similar protein accumulation levels (Figure 5a). A distinct proteolytic cleavage event appears to occur in the fusion proteins carrying both N-terminal and C-terminal PlyGBS-derived sequences (pZE46 and pZE47). The virtual absence of similar cleavage products from the pZE48 and pZE49 plants suggests that this cleavage occurs within the C-terminal fusion part of the proteins (Figure 5a).
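The two empirical patterns emerging from Figures 3 and 5a can be summarized as a lookup, sketched here in Python. The table encodes only qualitative features stated in the text; the C-terminal-only fusions are omitted because their vector names are not given here. The rules describe these plastid data and are not offered as a general predictor. The summary makes explicit that constructs with a 10 amino acid N-terminal PlyGBS peptide yield detectable protein without elevated mRNA.

```python
# Qualitative features transcribed from the text: whether the CV-N N-terminus is
# covered by a PlyGBS peptide, and whether >= 60 nt of plyGBS sequence precede cv-n.
CONSTRUCTS = {
    "pZE17": {"n_protected": False, "insert_5p_ge_60nt": False},
    "pZE23": {"n_protected": True, "insert_5p_ge_60nt": False},
    "pZE21": {"n_protected": True, "insert_5p_ge_60nt": False},
    "pZE48": {"n_protected": True, "insert_5p_ge_60nt": True},
    "pZE49": {"n_protected": True, "insert_5p_ge_60nt": True},
    "pZE46": {"n_protected": True, "insert_5p_ge_60nt": True},
    "pZE47": {"n_protected": True, "insert_5p_ge_60nt": True},
}


def summarize(constructs):
    """Apply the two observed patterns and list constructs where they diverge."""
    rows = {}
    for name, features in constructs.items():
        rows[name] = {
            "mrna_elevated": features["insert_5p_ge_60nt"],  # transcript pattern (Figure 3)
            "protein_detected": features["n_protected"],     # protein pattern (Figure 5a)
        }
    decoupled = sorted(name for name, row in rows.items()
                       if row["protein_detected"] and not row["mrna_elevated"])
    return rows, decoupled


rows, decoupled = summarize(CONSTRUCTS)
print(decoupled)  # ['pZE21', 'pZE23']
```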

Figure 5.

 Western blot analysis to determine foreign protein accumulation in transplastomic plants and bacteria expressing the plyGBS fusion constructs. (a) Protein accumulation in leaves from transplastomic plants as determined with an anti-CV-N antibody. For each plant line, 60 μg of total soluble protein was loaded. For indirect quantitative assessment of protein accumulation, protein from transplastomic line pZE16-4 (cf. Figure 4a) was loaded in three dilutions. Equal loading was further confirmed by Coomassie staining of the high-molecular-weight region of an identical gel and assessing the amount of the large subunit of Rubisco (RbcL). Immunoreacting bands in the 30–60-kDa region of the gel are nonspecific cross-reactions (as evidenced by their occurrence at similar intensity in the wild-type control), with the possible exception of the band indicated by an asterisk, which occurs only in samples accumulating detectable amounts of CV-N fusion proteins and may represent a CV-N multimer (Sexton et al., 2006). Wt, wild-type tobacco. (b) Recombinant protein accumulation in Escherichia coli strains harbouring the same plasmid constructs as the corresponding transplastomic tobacco lines.

CV-N is known to form highly stable multimers (Sexton et al., 2006). In addition to several cross-reacting bands present also in the wild-type control, all transplastomic plants accumulating detectable amounts of CV-N fusion protein also showed an apparently specific band of approximately 50 kDa (labelled by the asterisk in Figure 5a). Although we used strongly denaturing (urea-containing) gel electrophoresis systems, complete denaturation of the CV-N multimers turned out to be difficult to achieve (data not shown). We, therefore, believe that this band represents residual CV-N multimers.

In contrast to the large differences in recombinant protein accumulation among the different transplastomic plant lines, only moderate differences were seen when the same constructs were expressed in E. coli (Figure 5b). This confirms that CV-N is much more stable in bacteria than in chloroplasts and that neither N-terminal nor C-terminal fusions significantly enhance expression levels in E. coli. Overall, the sizes of the major bands detected in the bacterial samples correspond well to those of the (undegraded) chloroplast-expressed proteins (Figure 5; cf. Figure 1b).


In the course of this work, we have investigated the possibility of enhancing the expression of an unstable recombinant protein in plastids by protecting its N-terminus and/or C-terminus with polypeptide sequences taken from the highly stable proteins GFP and PlyGBS. Our data show that expression of the native (unfused) cv-n gene results in low levels of mRNA and no detectable protein. Interestingly, it was possible to stabilize the mRNA by inserting sequences of ≥60 nt between the 5′ UTR and the cv-n coding region (Figure 3). However, the resulting approximately 10-fold increase in mRNA accumulation was not chiefly responsible for the achieved improvement in protein accumulation. Much more important was an apparent stabilizing effect on the protein conferred by protecting N-terminal sequences. This becomes clear upon comparison of constructs pZE23, pZE48 and pZE49. mRNA stabilization occurs only in pZE48 (carrying 20 codons from the N-terminus of PlyGBS) and pZE49 (carrying 30 codons from the N-terminus of PlyGBS), but not in pZE23 (carrying 10 codons from the N-terminus of PlyGBS). Nonetheless, transplastomic pZE23 plants accumulate levels of CV-N fusion protein similar to those in the pZE48 and pZE49 lines. This finding underscores the importance of translational regulation in the expression of chloroplast genes and transgenes (Eberhard et al., 2002; Kahlau and Bock, 2008) and demonstrates that translational regulation can largely override changes in mRNA accumulation.

The finding that an internal mRNA sequence can drastically enhance transcript stability was somewhat surprising. Previous work had shown that protection of the 5′ and 3′ termini of plastid transcripts is crucial to their stability. While the 3′ ends of plastid mRNAs are usually protected by stem-loop-type RNA secondary structures (and RNA-binding proteins associated with them; Stern and Gruissem, 1987; Adams and Stern, 1990; Hayes et al., 1996), the 5′ ends are usually less structured and are often protected by sequence-specific RNA-binding proteins (Pfalz et al., 2009). In the absence of these protective structures, transcripts are rapidly degraded by exoribonucleases acting in the 3′→5′ or 5′→3′ direction (Hayes et al., 1996; Drager et al., 1998, 1999; Walter et al., 2002). However, there is also evidence that transcript degradation in chloroplasts can be initiated by endoribonucleolytic cleavage within the coding region (Klaff, 1995). Thus, it seems possible that the insertion of plyGBS or gfp sequences downstream of the 5′ UTR impedes endoribonucleolytic cleavage of the cv-n coding region and thereby exerts the observed stabilizing effect on the mRNA. Whether this is mediated by the alteration in the mRNA sequence or by changes in the secondary structure of the mRNA remains to be determined.

Recent work has uncovered that important determinants of plastid protein stability reside in the N-terminal sequence (Apel et al., 2010). The data obtained in this study confirm this important role of the N-terminus (Figures 4a and 5a) and, in addition, reveal that, at least in certain cases, protection of the C-terminus can also improve protein stability (cf. pDK122 and pZE16 in Figure 4a). Overall, the protective effect of the GFP fusions was considerably stronger than that of the PlyGBS fusions. While the GFP fusions to CV-N in pZE15 and pZE16 transplastomic plants accumulated to approximately 0.3% of TSP, accumulation of the PlyGBS fusions was more than 20-fold lower (i.e. below approximately 0.015% of TSP). This contrasts somewhat with the much higher accumulation levels of PlyGBS compared with GFP in transplastomic plants (Reed et al., 2001; Oey et al., 2009b), but may simply reflect the larger size of the GFP sequences fused to CV-N (Figure 1b) and/or their more favourable effect on the three-dimensional structure of the fusion proteins.

In addition to stabilizing recombinant proteins, N-terminal and/or C-terminal fusion peptides can serve other useful purposes. For example, they can accommodate (cleavable) tags that aid subsequent protein purification and/or sequences that facilitate transmucosal delivery of orally administered pharmaceutical proteins (Davoodi-Semiromi et al., 2010; Ruhlman et al., 2010; Lee et al., 2011).

Although plastids are derived from formerly free-living eubacteria, expression levels obtained in E. coli are not always a reliable predictor of attainable protein accumulation levels in chloroplasts (Magee et al., 2004). This becomes particularly evident when the expression of the unfused CV-N and of the PlyGBS fusions with an N-terminally exposed CV-N is compared between bacteria and plastids (Figure 5). The reasons for these striking differences between E. coli and chloroplasts are not clear, but the less reducing milieu in the chloroplast stroma and the concomitantly higher propensity for correct disulphide bond formation (Bally et al., 2008; Bock and Warzecha, 2010) could potentially alter protein folding and, in this way, influence protein stability. Alternatively, differences in the proteolytic activities present in chloroplasts versus bacteria could be responsible for the observed differences in recombinant protein accumulation.

Unfortunately, our knowledge of the rules that govern protein stability in plastids is still very limited, and it is currently impossible to predict the attainable accumulation levels of recombinant proteins intended for expression in transgenic chloroplasts. However, the identification of sequences that stabilize otherwise unstable recombinant proteins provides useful tools for improving protein expression. In addition, our finding that substantial increases in transcript stability can be achieved by inserting sequences from stable mRNAs between the 5′ UTR and the coding region of the transgene of interest suggests a possible solution in all cases where transgene expression is limited by mRNA accumulation.

Experimental procedures

Bacterial strains and growth conditions

Cloning work and isolation of plasmids were carried out using E. coli cells (One Shot® Top10F’; Invitrogen, Karlsruhe, Germany). For protein extraction from E. coli, strain SURE 2 (Agilent Technologies, Cedar Creek, TX) was used. Bacteria were grown in LB medium (10 g/L bacto tryptone, 5 g/L yeast extract, 10 g/L NaCl) with ampicillin (100 μg/mL) or spectinomycin (100 μg/mL) at 37 °C under continuous shaking (180 rpm).

Plant material

Aseptically grown tobacco (Nicotiana tabacum cv. Petit Havana) plants were raised on agar-solidified MS medium (Murashige and Skoog, 1962) containing sucrose (30 g/L). Regenerated shoots from transplastomic lines were rooted and propagated on the same medium. Rooted homoplasmic plants were transferred to soil and grown to maturity under standard glasshouse conditions.

Cloning procedures

The plastid transformation vectors constructed in this study are based on the previously described plasmid pRB95 (Ruf et al., 2001). Expression of gfp, cv-n and all fusion genes is driven by the Chlamydomonas reinhardtii 16S rRNA operon promoter (Prrn) fused to the T7g10 leader sequence and the 3′ UTR from the Chlamydomonas plastid atpA gene. To construct the Prrn-T7g10 fusion, the Chlamydomonas Prrn promoter was PCR-amplified using primers that introduce SacI and AflII restriction sites (CrPrrnfor: 5′-TTGAGCTCGTAAGGGGAAGGGGAAAAC-3′ and CrPrrnrev: 5′-TTTTCTTAAGCAGTGTTTTTAATTTAACTT-3′). The T7g10 sequence was amplified with primers G10T7for (5′-TTTTCTTAAGGGAGACCACAACGGTTTCC-3′) and G10T7rev (5′-TTTCATATGTATATCTCCTTCTT-3′), introducing AflII and NdeI restriction sites. Promoter and 5′ UTR were subsequently combined via their AflII sites and assembled into expression cassette pHK20 (Kuroda and Maliga, 2001) as a SacI/NdeI fragment. The 3′ UTR from the Chlamydomonas plastid atpA gene (TatpA) was amplified using primers CrTatpAfor (5′-TTTTTCTAGATTTTAATTAAGTAGGAACTCGG-3′) and CrTatpArev (5′-TTTTAAGCTTCAAAAATTTTTAATGTTAACATAC-3′), which introduce XbaI and HindIII restriction sites. Following digestion with XbaI and HindIII, the fragment replaced the 3′ UTR sequence in the aforementioned pHK20-derived expression cassette. Finally, the expression cassette was inserted into plastid transformation vector pRB95, generating vector pDK115. All transgenes were cloned into pDK115 as NdeI/XbaI fragments (pZE vector series, pDK122 and pDK133; Figure 1a).
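The primer design above can be checked in silico before ordering oligonucleotides. The following is a minimal Python sketch, not part of the original workflow: it scans each primer listed in this paragraph for the standard recognition sequence of the enzyme it is said to introduce. The dictionary names and the `find_sites` helper are hypothetical illustrations.

```python
# Standard recognition sequences (all palindromic) of the enzymes named above.
RECOGNITION_SITES = {
    "SacI": "GAGCTC",
    "AflII": "CTTAAG",
    "NdeI": "CATATG",
    "XbaI": "TCTAGA",
    "HindIII": "AAGCTT",
}

# Primer sequences (5'->3') copied from the text, with the site each should carry.
PRIMERS = {
    "CrPrrnfor": ("TTGAGCTCGTAAGGGGAAGGGGAAAAC", "SacI"),
    "CrPrrnrev": ("TTTTCTTAAGCAGTGTTTTTAATTTAACTT", "AflII"),
    "G10T7for": ("TTTTCTTAAGGGAGACCACAACGGTTTCC", "AflII"),
    "G10T7rev": ("TTTCATATGTATATCTCCTTCTT", "NdeI"),
    "CrTatpAfor": ("TTTTTCTAGATTTTAATTAAGTAGGAACTCGG", "XbaI"),
    "CrTatpArev": ("TTTTAAGCTTCAAAAATTTTTAATGTTAACATAC", "HindIII"),
}


def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for every site found in seq.

    Because the sites are palindromic, scanning one strand also covers the other.
    """
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, (seq, expected) in PRIMERS.items():
        hits = find_sites(seq)
        status = "ok" if expected in hits else "MISSING"
        print(f"{name}: expected {expected} -> {status}; found {hits}")
```

The short 5′ T-rich extensions in front of each site (e.g. `TT` or `TTTT`) leave room for the enzyme to bind near the end of the PCR product; the script only checks that the site is present, not cutting efficiency.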

A synthetic gene for CV-N was produced by DNA synthesis (Entelechon, Regensburg, Germany) using the codon usage preferred by Nicotiana tabacum chloroplasts and introducing a 5′ NdeI and a 3′ XbaI site for subsequent cloning steps. The synthetic gene was cloned into vector pCR4-TOPO, generating plasmid pCR4-TOPO-cv-n. The gfp::cv-n fusion gene was constructed by PCR amplification of gfp using primers designed to eliminate the stop codon and to introduce NdeI and AseI restriction sites upstream and downstream, respectively, of the gfp coding region. The primers used were PZF9-5′n-GFP (5′-TTTTCATATGAGTAAAGGAGAAGAACTT-3′) and PZF10-3′n-GFP (5′-TTTTATTAATGATTAGTTCATCCATGCC-3′). The modified gfp was inserted as an NdeI/AseI fragment into vector pCR4-TOPO-cv-n opened with NdeI. The cv-n::gfp fusion gene was obtained by amplifying the cv-n coding sequence from plasmid pCR4-TOPO-cv-n using the primer pair M13-reverse (5′-GGAAACAGCTATGACCATG-3′) and PZF6-3′CVN (5′-TTTTTCTAGATTTTGGATCCTTCATATTTTAATGTTCC-3′) to create XbaI and BamHI sites at the 3′ end. The PCR product was then digested with NdeI and XbaI and inserted into vector pDK115 to obtain plasmid pDK121. Subsequently, a gfp variant with a BamHI site at the 5′ end (instead of NdeI) was digested with BamHI and XbaI and ligated into BamHI/XbaI-digested pDK121. The gfp::cv-n::gfp fusion gene was produced by PCR amplification of the cv-n coding sequence with the primer pair PZF13-5′i-CVN (5′-TTTTATCGATATGTTAGGAAAATTTTCTC-3′) and PZF14-3′i-CVN (5′-TTTTATCGATTTCATATTTTAATGTTCC-3′) to eliminate the stop codon and introduce ClaI restriction sites at both the 5′ and 3′ ends. After digestion with ClaI, the cv-n fragment was inserted into the unique ClaI site within the gfp coding region.
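Inserting an NdeI/AseI fragment into an NdeI-opened vector works because both enzymes leave the same 5′-TA overhang (NdeI cuts CA^TATG, AseI cuts AT^TAAT). The minimal sketch below computes 5′ overhangs for palindromic sites and checks whether two ends can be ligated. `ENZYMES`, `five_prime_overhang`, `ends_compatible` and `hybrid_junction` are hypothetical helpers; the cut positions are the standard ones for these enzymes, and the model deliberately handles only enzymes that leave 5′ overhangs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# (recognition site, top-strand cut offset); e.g. NdeI cuts CA^TATG -> offset 2.
ENZYMES = {
    "NdeI": ("CATATG", 2),
    "AseI": ("ATTAAT", 2),
    "ClaI": ("ATCGAT", 2),
    "BamHI": ("GGATCC", 1),
    "XbaI": ("TCTAGA", 1),
}


def five_prime_overhang(enzyme):
    """Single-stranded 5' overhang left by a palindromic site."""
    site, cut = ENZYMES[enzyme]
    bottom_cut = len(site) - cut  # symmetric cut on the bottom strand
    if cut >= bottom_cut:
        raise ValueError(f"{enzyme} does not leave a 5' overhang")
    return site[cut:bottom_cut]


def ends_compatible(enzyme_a, enzyme_b):
    """Two 5'-overhang ends anneal if their overhangs are reverse complements."""
    return five_prime_overhang(enzyme_a) == revcomp(five_prime_overhang(enzyme_b))


def hybrid_junction(left_enzyme, right_enzyme):
    """Sequence formed when the upstream half of one site is ligated to the
    downstream half of another via a shared overhang (top strand)."""
    if not ends_compatible(left_enzyme, right_enzyme):
        raise ValueError("ends are not compatible")
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


if __name__ == "__main__":
    print("NdeI/AseI compatible:", ends_compatible("NdeI", "AseI"))
    print("AseI->NdeI junction:", hybrid_junction("AseI", "NdeI"))
```

For example, an AseI end joined to an NdeI end forms the hybrid sequence ATTATG, which neither enzyme recognizes, so such a junction cannot be re-cut with either enzyme.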
The 30 nt plyGBS::cv-n fusion was obtained by annealing the synthetic oligonucleotides P5′PlyGBS-30n (5′-TATGGCTAGCGCTACTTATCAAGAATATAAA-3′) and P5′PlyGBS-30nc (5′-TATTTATATTCTTGATAAGTAGCGCTAGCCA-3′), followed by insertion of the annealed oligonucleotides into the 5′ NdeI site of cv-n. The cv-n::30 nt plyGBS fusion was constructed by amplification of the cv-n sequence from pDK121 (including the Prrn-T7g10 promoter) using primers P3′PlyGBS-30n (5′-TTTCTAGAGATATCAGTAGCATTTACTAAATAACTATCTTCATATTTTAATGTTCCA-3′) and M13-forward (5′-GTAAAACGACGGCCAGT-3′). Primer P3′PlyGBS-30n contains 30 nt of the 3′ end of the plyGBS coding sequence, overlaps with the 3′ part of the cv-n coding region and carries a terminal XbaI recognition sequence. The PCR product was then inserted as a SacI/XbaI fragment into the similarly digested vector pDK115. The 30 nt plyGBS::cv-n::30 nt plyGBS fusion was produced from the cv-n::30 nt plyGBS fusion as described for the 30 nt plyGBS::cv-n fusion. The 60 nt plyGBS::cv-n fusion was generated by PCR amplification of 60 bp from the 5′ portion of the plyGBS coding region (Oey et al., 2009b) with primers P5′PlyGBS-NdeI (5′-TTTCATATGGCTAGCGCTACTTA-3′) and P5′PlyGBS60nRev (5′-AATATCATAAGCATTTCCAT-3′), and separate amplification of the cv-n sequence combining a primer whose 5′ tail overlaps with the 3′ end of this 60-bp plyGBS fragment (P5′PlyGBS60n-cvn: 5′-ATGGAAATGCTTATGATATTATGTTAGGAAAATTTTCTC-3′) with primer P3′CVN-XbaI (5′-TTTTCTAGATTATTCATATTTTAA-3′). Subsequently, both PCR products were combined in one PCR to produce the 60 nt plyGBS::cv-n fusion. The PCR product was then inserted into pDK115 as an NdeI/XbaI fragment.
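The annealed-oligonucleotide step relies on the two 31-nt oligonucleotides forming a 29-bp duplex with a 5′-TA overhang at each end, which matches the overhang left by NdeI (CA^TATG). The minimal sketch below checks this complementarity; the `anneal_overhangs` helper and constant names are hypothetical, and the script does not model the reading frame of the resulting fusion.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


TOP = "TATGGCTAGCGCTACTTATCAAGAATATAAA"     # P5'PlyGBS-30n, 5'->3'
BOTTOM = "TATTTATATTCTTGATAAGTAGCGCTAGCCA"  # P5'PlyGBS-30nc, 5'->3'


def anneal_overhangs(top, bottom, overhang_len):
    """Check that two oligos pair perfectly once a 5' overhang of
    overhang_len nt is set aside on each strand; return both overhangs."""
    core_top = top[overhang_len:]
    core_bottom = bottom[overhang_len:]
    if core_top != revcomp(core_bottom):
        raise ValueError("oligonucleotides are not complementary in the duplex region")
    return top[:overhang_len], bottom[:overhang_len]


if __name__ == "__main__":
    top_oh, bottom_oh = anneal_overhangs(TOP, BOTTOM, 2)
    print(f"duplex: {len(TOP) - 2} bp, 5' overhangs {top_oh}/{bottom_oh}")
```

Because `TA` is its own reverse complement, the annealed duplex can ligate into the NdeI-cut vector in either orientation, which is why such inserts are usually verified by sequencing.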
The cv-n::60 nt plyGBS fusion was obtained by a similar strategy using primers P3′PlyGBS-XbaI (5′-TTTTCTAGATATCAGTAGCATTTA-3′) and P3′PlyGBS60nFor (5′-AATCATCCTGAATCTGCTCA-3′) for amplification of the 3′ plyGBS fragment and primer pair P3′PlyGBS60n-cvn (5′-TGAGCAGATTCAGGATGATTTTCATATTTTAATGTTCCA-3′) and P5′CVN-NdeI (5′-TTTCATATGTTAGGAAAATTTTCT-3′) for cv-n amplification. The 60 nt plyGBS::cv-n::60 nt plyGBS fusion was obtained by separately amplifying the 5′ and 3′ fragments of plyGBS as described above and the cv-n sequence with primer pair P5′PlyGBS60n-cvn and P3′PlyGBS60n-cvn. The fusion was then produced by including all three PCR products in one PCR, followed by cloning of the PCR product into pDK115 as an NdeI/XbaI fragment. The 90 nt plyGBS::cv-n fusion was produced analogously to the 60 nt plyGBS::cv-n fusion using primers P5′PlyGBS-NdeI and P5′PlyGBS90nRev (5′-ATCCCAACATTGTGCTCCAA-3′) for amplification of the plyGBS fragment and primers P5′PlyGBS90n-cvn (5′-TTGGAGCACAATGTTGGGATATGTTAGGAAAATTTTCTC-3′) and P3′CVN-XbaI for cv-n amplification. The cv-n::90 nt plyGBS fusion was produced analogously to the cv-n::60 nt plyGBS fusion using primers P3′PlyGBS-XbaI and P3′PlyGBS90nFor (5′-TATGAAAAAGTAAATGGATG-3′) for amplification of the plyGBS fragment and primers P3′PlyGBS90n-cvn (5′-CATCCATTTACTTTTTCATATTCATATTTTAATGTTCCA-3′) and P5′CVN-NdeI for cv-n amplification. The 90 nt plyGBS::cv-n::90 nt plyGBS fusion was obtained analogously to the 60 nt plyGBS::cv-n::60 nt plyGBS fusion using the 90-bp PCR products from plyGBS and a cv-n amplification product generated with primers P5′PlyGBS90n-cvn and P3′PlyGBS90n-cvn. All expression cassettes were finally cloned into vector pRB95 as SacI/HindIII fragments (Figure 1), and their DNA sequences were verified by complete sequencing.
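The fusion PCRs described above work because the 5′ tail of each cv-n fusion primer is the reverse complement of the corresponding plyGBS-side primer, so the plyGBS and cv-n first-round products share an overlapping end that primes the joining PCR (overlap-extension PCR). The minimal sketch below measures that overlap for the four primer pairs given in the text; the `tail_overlap` helper and the `PAIRS` dictionary are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# construct: (plyGBS-side primer, cv-n fusion primer), sequences 5'->3' from the text
PAIRS = {
    "60 nt plyGBS::cv-n": ("AATATCATAAGCATTTCCAT",       # P5'PlyGBS60nRev
                           "ATGGAAATGCTTATGATATTATGTTAGGAAAATTTTCTC"),
    "cv-n::60 nt plyGBS": ("AATCATCCTGAATCTGCTCA",       # P3'PlyGBS60nFor
                           "TGAGCAGATTCAGGATGATTTTCATATTTTAATGTTCCA"),
    "90 nt plyGBS::cv-n": ("ATCCCAACATTGTGCTCCAA",       # P5'PlyGBS90nRev
                           "TTGGAGCACAATGTTGGGATATGTTAGGAAAATTTTCTC"),
    "cv-n::90 nt plyGBS": ("TATGAAAAAGTAAATGGATG",       # P3'PlyGBS90nFor
                           "CATCCATTTACTTTTTCATATTCATATTTTAATGTTCCA"),
}


def tail_overlap(flank_primer, fusion_primer):
    """Number of 5' nucleotides of fusion_primer that reproduce the reverse
    complement of flank_primer, i.e. the overlap shared by the two PCR products."""
    target = revcomp(flank_primer)
    n = 0
    while n < min(len(target), len(fusion_primer)) and fusion_primer[n] == target[n]:
        n += 1
    return n


if __name__ == "__main__":
    for label, (flank, fusion) in PAIRS.items():
        print(f"{label}: {tail_overlap(flank, fusion)} nt overlap")
```

The remaining 3′ part of each fusion primer anneals to the cv-n template, so one primer both amplifies cv-n and appends the plyGBS overlap.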

Transformation of tobacco chloroplasts

Plastid transformation was performed using the biolistic protocol (Svab and Maliga, 1993). Briefly, young leaves from aseptically grown tobacco plants were bombarded with plasmid DNA-coated 0.6 μm gold particles using the PDS-1000/He biolistic gun with the Hepta Adaptor (Bio-Rad, Munich, Germany). Primary spectinomycin-resistant lines were selected on an MS-based regeneration medium containing 500 mg/L spectinomycin (Svab and Maliga, 1993). Spontaneous spectinomycin-resistant mutants were eliminated by double resistance tests on regeneration medium containing both spectinomycin and streptomycin (500 mg/L each; Bock, 2001). For each transgene construct, several independent transplastomic lines were generated and subjected to two to three additional rounds of regeneration on spectinomycin-containing medium to obtain homoplasmic tissue.

Isolation of nucleic acids and gel blot analyses

Total plant DNA was isolated from leaf samples by a cetyltrimethylammonium bromide (CTAB)-based method (Doyle and Doyle, 1990). Total cellular RNA was extracted using the peqGOLD TriFast reagent (Peqlab GmbH, Erlangen, Germany) following the manufacturer’s protocol.

For Southern blot analysis, DNA samples (5 μg total DNA) were digested with BglII or BamHI and separated by electrophoresis in 1% agarose gels. RNA samples (5 μg total RNA) were electrophoresed in formaldehyde-containing 1% agarose gels. A 550 bp PCR product generated by amplification of a portion of the psaB coding region (Wurbs et al., 2007) was used as RFLP probe to verify plastid transformation and assess the homoplasmic state of the transformants. For detection of aadA and cv-n transcripts by Northern blot analysis, the entire coding sequences (excised from plasmid clones) were used as probes. Electrophoretically separated nucleic acids were transferred onto Hybond XL (GE Healthcare, Buckinghamshire, UK) membranes by capillary blotting using standard protocols.

Hybridization probes were purified by agarose gel electrophoresis followed by extraction of the DNA fragments of interest from excised gel slices using the Nucleospin Extract II kit (Macherey-Nagel, Düren, Germany). Probes were labelled with [α-32P]dCTP by random priming using the Multiprime DNA labelling system (GE Healthcare). Hybridizations were performed at 65 °C in Rapid-Hyb buffer (GE Healthcare) following the manufacturer’s protocol.

Protein extraction and immunoblot analyses

Extraction of total soluble protein from E. coli cells (harvested at an OD600 of approximately 0.4) was carried out as described previously (Neupert et al., 2008). Bacterial pellets were solubilized in 500 μL of SDS–PAGE loading buffer, and 1 μL was used for Western blot analysis. Total soluble protein from plant material was isolated using a phenol-based extraction method (Cahoon et al., 1992). The obtained protein pellets were dissolved in 1% SDS, and their concentration was measured using the BCA Protein Assay kit (Pierce, Rockford, IL, USA). For maximum denaturation of the expressed fusion proteins, samples were mixed with SDS loading buffer (115 mM Tris–HCl, pH 6.8; 4% (w/v) SDS; 100 mM DTT; 19% (v/v) glycerol), heated at 95 °C for 10 min, then separated by electrophoresis in 16% or 18% polyacrylamide gels containing 8 M urea and blotted onto polyvinylidene difluoride membranes. Membranes were treated with blocking buffer (20 mM Tris–HCl, pH 7.6; 137 mM NaCl; 0.5% casein) for 1 h and then incubated for 1 h with either rabbit polyclonal anti-GFP antibody (JL-8; Clontech, Mountain View, CA, USA) diluted 1 : 10 000 or rabbit polyclonal anti-CV-N antibody (kindly provided by Dr James B. McMahon, NCI-Frederick, USA) diluted 1 : 1000 in buffer (20 mM Tris–HCl, pH 7.6; 137 mM NaCl; 0.1% Tween-20). Detection was performed with the ECL Plus system (GE Healthcare) and an anti-rabbit secondary antibody (Agrisera AB; Vännäs, Sweden).

Quantification of RNA and protein accumulation

Hybridization signals were analysed using a Typhoon Trio+ variable mode imager (GE Healthcare). Quantification of signal intensities from Northern blots (phosphorimages) and Western blots (X-ray films) was carried out using the ImageQuant software (GE Healthcare). Selected band areas were quantified with the automatic quantification option, and background correction was performed by selecting an appropriate background area of the blot. Signal intensities of the mRNA bands of the cv-n fusions in Northern blots were normalized to the corresponding aadA signals and then expressed relative to the reference cv-n mRNA signal of construct pZE17. Signal intensities of bands in Western blots were compared with serial dilutions of recombinant GFP (to quantify CV-N fusions with GFP) or serial dilutions of the GFP::CV-N::GFP fusion (to indirectly quantify the CV-N fusions with the various PlyGBS sequences).
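The normalization steps described above can be written out explicitly. This is a minimal Python sketch under stated assumptions, not the ImageQuant procedure itself: signals are taken as already background-corrected, the Western blot signal is assumed to be linear over the dilution series (so an ordinary least-squares straight line is fitted, which the text does not specify), and amounts are read off in units of the standard. All function names and example numbers are hypothetical.

```python
def relative_mrna(cvn, aada, ref_cvn, ref_aada):
    """Normalize a cv-n band to its aadA band (loading control), then express
    it relative to the equally normalized reference construct (pZE17)."""
    return (cvn / aada) / (ref_cvn / ref_aada)


def fit_standard_curve(amounts, intensities):
    """Least-squares line intensity = slope * amount + intercept through a
    serial-dilution standard; assumes signals lie in a linear range."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, intensities))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_amount(intensity, slope, intercept):
    """Invert the standard curve to estimate the amount in a sample band."""
    return (intensity - intercept) / slope


def percent_tsp(recombinant_ng, total_protein_ug):
    """Recombinant protein as a percentage of the total soluble protein loaded."""
    return 100.0 * recombinant_ng / (total_protein_ug * 1000.0)


if __name__ == "__main__":
    # Hypothetical, background-corrected intensities (arbitrary units).
    print("relative cv-n mRNA:", relative_mrna(1200.0, 800.0, 150.0, 1000.0))
    slope, intercept = fit_standard_curve([5.0, 10.0, 20.0, 40.0],
                                          [510.0, 990.0, 2050.0, 3980.0])
    ng = interpolate_amount(1500.0, slope, intercept)
    print(f"estimated amount: {ng:.1f} ng ({percent_tsp(ng, 10.0):.3f}% of 10 ug TSP)")
```

Sample bands whose intensity falls outside the range of the dilution series should not be extrapolated, since X-ray film signals in particular saturate at high protein amounts.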


We thank Stefanie Seeger and Claudia Hasse for help with plant transformation, Anja Klevesath for technical assistance and the MPI-MP Green Team for plant care and cultivation. The anti-cyanovirin antibody was kindly provided by Dr James B. McMahon (NCI-Frederick, USA). This research was supported by the Max Planck Society and a grant from the European Union (FP7 METAPRO 244348).