Metabolic engineering of recombinant protein secretion by Saccharomyces cerevisiae

Authors

  • Jin Hou,

    1. Department of Chemical and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
    2. State Key Laboratory of Microbial Technology, Shandong University, Jinan, China
  • Keith E.J. Tyo,

    1. Department of Chemical and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
    2. Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
  • Zihe Liu,

    1. Department of Chemical and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
  • Dina Petranovic,

    1. Department of Chemical and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
  • Jens Nielsen

    Corresponding author
    • Department of Chemical and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden

Correspondence: Jens Nielsen, Department of Chemical and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Göteborg, Sweden. Tel.: +46 31 772 3804; fax: +46 31 772 3801; e-mail: nielsenj@chalmers.se

Abstract

The yeast Saccharomyces cerevisiae is a widely used cell factory for the production of fuels and chemicals, and it also provides a platform for the production of many heterologous proteins of medical or industrial interest. Therefore, many studies have focused on metabolic engineering of S. cerevisiae to improve recombinant protein production, and with the development of systems biology, it is interesting to see how this approach can be applied both to gain further insight into protein production and secretion and to further engineer the cell for improved production of valuable proteins. In this review, post-translational processing steps such as folding, trafficking, and secretion, which are traditionally studied in isolation, are described in the context of the whole protein secretion system. Furthermore, examples of secretion pathway engineering, high-throughput screening, and systems biology studies of protein production and secretion are given to show how protein production can be improved by different approaches. The objective of the review is to describe individual biological processes in the context of the larger, complex protein synthesis network.

Introduction

The introduction of genetic engineering in the 1970s resulted in the establishment of an efficient biotech industry, with one of the foci being the production of recombinant proteins for therapeutic use. Today more than 50 pharmaceutical proteins are being produced using recombinant technologies, and many of these are blockbuster pharmaceuticals (Walsh, 2010). These recombinant proteins can be produced using a range of different cell factories, including bacteria, yeast, filamentous fungi, insect cells, and mammalian cells. Many different yeast and fungal systems have been compared for protein production for pharmaceutical, food, and other industries (Madzak et al., 2004; Porro et al., 2005). At the industrial level, there is a consolidation in the choice of cell factory, so most of the production is achieved in Escherichia coli, Pichia pastoris, Saccharomyces cerevisiae, and Chinese hamster ovary (CHO) cells. This consolidation, which provides a limited number of general production platforms, allows faster optimization and scale-up of protein production by the given cell factory. Furthermore, engineering of these cell factories is driven by the desire to improve productivity and the ability to produce new products with optimal pharmacokinetic properties, for example, strains of P. pastoris that can produce proteins with human glycosylation patterns (Gerngross, 2004; Hamilton et al., 2006; Li et al., 2006). This development of more efficient and improved cell factories is driven by metabolic engineering, which involves directed genetic engineering of cell factories with the objective to change and improve their properties (Kim et al., 2012).

The yeast S. cerevisiae is a widely used cell factory for the production of fuels and chemicals, such as bioethanol, by far the largest-volume fermentation product. It is also used for the production of several recombinant proteins, for example, human insulin, hepatitis vaccines, and human papillomavirus (HPV) vaccines. Saccharomyces cerevisiae also serves as an important model eukaryote, and many fundamental studies have therefore been performed on this organism. It was also the first eukaryotic organism to have its genome sequenced, and a number of high-throughput studies have been pioneered using this organism as a model (Nielsen & Jewett, 2008). Owing to its model organism status and use in industry for recombinant protein production, there have been many studies on both (1) the basic cell and molecular biology of protein secretion and (2) strategies for engineering these processes for improved protein production.

There are many examples of engineering of S. cerevisiae for improved protein production, including optimizing the fermentation process, selecting the expression vector system, choosing the signal sequence for extracellular targeting, and engineering host strains for better folding and post-translational modification (Idiris et al., 2010). Based on such engineering, heterologous protein titers have improved substantially over the past decade, from milligrams to grams per liter. However, as illustrated in these reviews, many of these attempts have given rather specific conclusions: rational targets for over-expression or deletion have been chosen, but often it was found that the strategy worked successfully only for one (or a few) protein(s), and the same engineered strain could not be used as a general cell factory platform for the production of a range of different recombinant proteins. This can be explained by the complexity of protein processing and secretion pathways. Folding, glycosylation, disulfide bond formation, and vesicle trafficking must all be accomplished while maintaining quality control feedback loops and avoiding situations that will perturb cellular homeostasis. Each process must be tuned to a specific state based on the secreted protein's physical properties, for example, number of disulfide bonds, protein size, and protein hydrophobicity. Through detailed understanding of the individual processes and integrated analysis of the interplay between these processes, it should be possible to derive general models for protein secretion that can be used for engineering the secretion pathway and thereby result in improved cell factories for recombinant protein production. Therefore, genetic engineering combined with systems biology approaches has become increasingly useful for effective recombinant protein production (Graf et al., 2009).

Systems biology approaches are increasingly valuable for metabolic engineering of cell factories for metabolite production (Nielsen & Jewett, 2008). This is particularly due to the robustness of genome-scale metabolic models (GEMs). Whether these concepts can be expanded into use for improving recombinant protein production is still to be seen, but considering the complexity and many interaction partners involved in protein synthesis, protein folding, protein processing and secretion, it is very likely that systems biology approaches may substantially impact this field, both in terms of gaining system-level understanding and in terms of identifying engineering targets using these system-level models.

Our review focuses on systematically organizing and interconnecting secretory processes, that is, mapping the key components of the post-translational modification process. This scaffold moves us toward a systems-level view of the large and complicated process of protein production. Different examples of recombinant protein production by yeast, including an overview of the different tools available for efficient protein production, will highlight the parameters that can be altered in these systems and potential outcomes. There are very few systems biology studies on protein secretion, but we will give examples of the use of omics analysis for studying specific processes, and we will also provide overall flowcharts for protein secretion processing that may be used as scaffolds for building more detailed models of protein secretion.

A scaffold for protein post-translational modifications

For secreted proteins and proteins targeted to the plasma membrane and organelles of the endosome membrane system, there are many steps after translation before the protein is matured and trafficked to the correct location. A common pathway, called the secretory pathway, is used to complete the protein maturation process. Correct folding, post-translational modifications, and trafficking are required for membrane-bound, ER, Golgi, vacuole, cell outer membrane, cell wall, or secreted proteins. The secretory pathway primarily relies on local interactions (e.g. receptor-secreted protein interactions or chemical alterations to the secreted protein) to make decisions on the fate of the secretory protein, rather than transcriptional responses (e.g. transcriptional activation of a secretory machinery). Exhaustive catalogs of secretory/vps/endocytotic factors have been obtained by forward genetics, suppressor screens, screens of null mutant collections, and synthetic genetic analysis (Bard & Malhotra, 2006; Weerapana & Imperiali, 2006; see also yeastgenome.org), so in the following, we will focus on the ‘decision making’ components of the secretory pathway that interact directly with proteins traversing this pathway. In this way, we map how the inherent biochemistry and the state of a secretory protein (amino acid sequence, folding, oxidation, glycosylation, etc.) determine the response by the secretory pathway. Many recognition complexes that are responsible for directing the vesicle to the correct organelle operate independently from the cargo that is in the vesicle and operate at a higher level of organization than is covered here. These aspects are covered in reviews by Pfeffer (Pfeffer & Aivazian, 2004). In many of the reviews discussed earlier, aspects of the yeast secretory pathway are melded with higher eukaryotic secretory pathways, but here we will focus on delineating yeast-specific processes. Figure 1 shows an overview of the secretory pathway and the major processes involved. Supporting Information Figs S1 through S5 and Tables 1 and 2 break down specific processes and catalog the secretory proteins associated with this pathway.

Figure 1.

Overview of the secretory machinery. The nascent peptide is folded and modified through different mechanisms until it reaches an appropriate structure to perform its functions as a protein. SRP, signal recognition particle; SPC, signal peptidase complex; PMT, protein O-mannosyl transferase; OST, oligosaccharyl transferase; Ubiq, ubiquitin; Lect, lectin; ALP, alkaline phosphatase pathway; CPY, carboxypeptidase Y pathway.

Table 1. Proteins involved in cytosolic and ER decisions
Protein complex or grouping | Proteins involved | Action
Signal recognition particle (SRP) | Srp14p, Srp21p, Srp54p, Sec65p, Srp68p, scR1 RNA | Recognize presignal, direct to SR
SRP receptor (SR) | Srp101p, Srp102p | ER receptor for SRP
Sec61 complex | Sec61p, Sbh1p, Sss1p | Cotranslational translocation pore
Sec62/63 complex | Sec62p, Sec63p, Sec71p, Sec72p | Post-translational translocation pore
Signal peptidase complex (SPC) | Sec11p, Spc1p, Spc2p, Spc3p | Presignal cleavage
Oligosaccharyl transferase (OST) | Wbp1p, Swp1p, Ost2p, Ost1p, Ost5p, Stt3p, Ost3p, Ost6p, Ost4p | N-linked glycosylation
Protein O-mannosyl transferases (PMT) | Pmt1p, Pmt2p, Pmt3p, Pmt4p, Pmt5p, Pmt6p, Pmt7p | O-linked glycosylation
ER chaperones | Kar2p, Sil1p, Lhs1p | Protein folding
ER redox enzymes | Ero1p, Pdi1p, Eug1p, Mpd1p, Mpd2p, Eps1p | Oxidation/reduction of disulfide bonds
N-linked glycan trimming | Cwh41p, Rot2p, Mns1p | Misfolded protein sensing
Hrd1p complex | Hrd1p, Hrd3p, Usa1p, Der1p | Misfolded protein sensing/trafficking
COPII cargo receptors | Sec24p, Sfb2p, Sfb3p, Shr3p, Chs7p, Vma22p, Uso1p, Ypt1p | Traffic proteins from ER to Golgi

Table 2. Proteins involved in Golgi and post-Golgi decisions
Protein complex or grouping | Proteins involved | Action
M-Pol I complex | Mnn9p, Van1p | Mannose extension (2–10)
M-Pol II complex | Anp1p, Mnn9p, Mnn10p, Mnn11p, Hoc1p | Mannose extension (11–40)
Extension N-linked mannan polymerases | Mnn1p, Mnn2p, Mnn5p | Mannose extension (+40)
O-linked mannosylases | Ktr1p, Ktr3p, Mnt1p/Kre2p, Mnn1p | Mannose extension (5) for secretory proteins
COPI complex | Cop1p (α), Sec27p (β), Sec21p (γ), Ret2p (δ) | Receptors for retrotransport from cis-Golgi to ER
AP-1 complex | Aps1p, Apl2p, Apl4p, Apm1p | CPY pathway to vacuole
AP-3 complex | Apl6p, Aps3p, Apm3p, Apl5p | CPY & ALP pathway to vacuole
GGA complex | Gga1p, Gga2p | Ubiquitin-based sorting to vacuole
ESCRT-0 complex | Vps27p, Hse1p | Ubiquitin-based sorting to vacuole
ESCRT-1 complex | Stp22p, Srn2p, Vps28p, Mvb12p | Ubiquitin-based sorting to vacuole
ESCRT-2 complex | Vps25p, Snf8p, Vps36p | Ubiquitin-based sorting to vacuole
ESCRT-3 complex | Vps20p, Vps24p, Did4p, Snf7p | Ubiquitin-based sorting to vacuole

Targeting to the endoplasmic reticulum

After ribosomal synthesis begins, a protein bound for the secretory pathway must be selectively targeted to the ER, the first organelle the protein has to pass through in this pathway. The presignal sequence, an N-terminal 15–50 amino acid sequence, determines this step. Varying hydrophobicity of the central region of the presignal can lead to one of three fates (Fig. S1; Martoglio & Dobberstein, 1998). The first, default route uses a hydrophilic presignal (or the lack of a presignal) to ensure cytosolic translation of the protein. A second route uses highly hydrophobic signals to initiate cotranslational translocation at the ER/cytosol interface. In this process, the presignal is bound by the signal recognition particle (SRP) during translation (Table 1; Ng et al., 1996; Mason et al., 2000). SRP will pause translation and direct the ribosome to the ER membrane-bound SRP receptor (SR) (Table 1). Once the ribosome/SRP complex has docked at the SR, cotranslational translocation proceeds, that is, the polypeptide is synthesized as it passes through the Sec61 complex into the ER lumen (Rapiejko & Gilmore, 1997). The energy to drive the polypeptide into the ER is generated by GTP hydrolysis during translation (Osborne et al., 2005). Membrane-bound proteins will be inserted into the ER membrane during cotranslational translocation. After the N-terminal presignal has been inserted into the Sec61 complex, the various hydrophobic regions of the polypeptide chains can leave the Sec61 pore and enter the lipid phase of the ER membrane (Van den Berg et al., 2004). A third pathway exists for presignals that are weakly hydrophobic (Fig. S1). These presignals are not bound by the SRP; translation is carried out in the cytosol, and the unfolded polypeptide chain is stabilized by cytosolic chaperones. In this scenario, the presignal will interact directly with the Sec61 and Sec62/63 complexes (Table 1), independent of SRP (Plath et al., 1998). The newly synthesized, but unfolded, protein is pulled through the Sec61 complex by being bound to Kar2p, the yeast ER chaperone homolog of BiP/GRP78 (Matlack et al., 1999). As Kar2p binds more and more of the polypeptide chain in the ER, the protein is pulled from the cytosol to the ER. This third, SRP-independent pathway appears to be sufficient to traffic ER-bound proteins necessary for growth and survival. SRP null mutants are viable, but grow slowly, indicating that the second route (involving the SRP) is important but not strictly required for viability (Brown et al., 1994; Rapoport, 2007).

Endoplasmic reticulum processing

By either of the routes described previously, the polypeptide will begin to enter the ER. During translocation, many structural and chemical modifications occur to manage folding and quality control (Fig. S2). For soluble proteins, the presignal is cleaved by the signal peptidase complex (SPC) immediately (Table 1; YaDeau et al., 1991). For membrane proteins with multiple transmembrane regions, the presignal remains until all membrane-spanning regions have been synthesized. Finally, folding chaperones will begin to cover exposed hydrophobic patches (Simons et al., 1995).

Importantly, initial glycosylations occur during translocation (Fig. S2). Glycosylation (1) helps to fold the protein, (2) protects it from proteases, and (3) serves as a signal for quality control. Glycosylation occurs in two varieties in yeast, N-linked and O-linked. N-linked glycosylation is accomplished by adding a 14-sugar glycan tree to the asparagine residue of the recognition sequence N-X-S or N-X-T, where X may be any amino acid except proline (Bause, 1983). An N-acetylglucosamine is the anchor of the glycan tree and is attached to the asparagine of the polypeptide. The N-linked glycosylation is completed by the ER-resident oligosaccharyl transferase (OST) (Table 1; Burda & Aebi, 1998). O-linked glycosylation occurs at the hydroxyl groups of serine and threonine and is catalyzed by protein O-mannosyltransferases (PMTs) (Table 1; Strahl-Bolsinger et al., 1999). PMTs transfer a single mannose to the serine/threonine in the ER, but more mannoses may be added later in the Golgi. O-linked glycosylation appears to occur before N-linked glycosylation, resulting in O-linked glycosylation on the serine/threonine of the N-linked recognition sequence (N-X-[S/T]). This implies that N-linked asparagine glycosylation and O-linked serine/threonine glycosylation may be in competition (Ecker et al., 2003).
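
The N-linked recognition sequence lends itself to a simple pattern search. The following is a minimal Python sketch, not a tool from the reviewed work, that lists candidate asparagines in a one-letter amino acid sequence. The function name find_n_glyc_sequons and the example sequence are hypothetical, and a sequence match says nothing about whether a site is actually glycosylated in vivo.

```python
import re

# N-X-[S/T] with X != proline. The zero-width lookahead lets overlapping
# sequons (e.g. N-N-T-S) be reported separately.
# [^P] accepts any character other than P, so the input is assumed to be
# a clean one-letter amino acid sequence.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_n_glyc_sequons(seq):
    """Return 0-based positions of Asn residues that start an N-X-[S/T] sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]

# Hypothetical toy sequence, for illustration only.
print(find_n_glyc_sequons("MNGSANPTNKT"))  # [1, 8]; N-P-T at position 5 is skipped
```

Because the serine or threonine two residues after each reported asparagine is also a potential O-mannosylation site, the same positions mark where the competition between the two glycosylation types could occur.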

After translocation, the nascent protein must undergo a series of folding and disulfide bond-forming steps. Quality control sensing determines if the correct structures have been formed before the folded protein is allowed to leave the ER for the Golgi (Fig. S2). Protein chaperones assist the polypeptides along the path to correct folding and help to remove them from the ER when a protein has terminally misfolded (Table 1). Kar2p (BiP), an Hsp70 family molecular chaperone, binds exposed hydrophobic stretches of amino acids (Blond-Elguindi et al., 1993). These hydrophobic regions are generally on the interior of a protein and are only exposed in incorrectly folded proteins. Kar2p repeatedly binds/releases these hydrophobic regions while hydrolyzing ATP (Gething, 1999). ATP-bound Kar2p binds misfolded proteins weakly, while ADP-bound Kar2p binds them tightly.

Disulfide bond formation must correctly pair distal cysteines of the polypeptide chain to form and stabilize the protein in its mature conformation (Fig. S2 and Table 1). Electrons are transferred from the newly formed disulfide bond to protein disulfide isomerase (PDI, Pdi1p in S. cerevisiae) which in turn passes the electrons to the FAD-bound Oxidoreductin 1 (Ero1p). Finally, the electrons are passed to the terminal electron acceptor O2 (Tu & Weissman, 2002). This mechanism forms disulfide bridges at random, and the correct pairings must be found by a trial and error process, involving the repeated oxidation/reduction of cysteines by Pdi1p and its homologs (Tu & Weissman, 2004).
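
To give a sense of the size of this trial-and-error search, the number of ways to pair all cysteines of a protein into disulfide bonds can be counted directly. The sketch below is a purely combinatorial illustration and is not taken from the cited studies: it assumes every cysteine is paired and ignores chain geometry and folding, both of which restrict the pairings that are actually accessible. The function name full_pairings is hypothetical.

```python
from math import factorial

def full_pairings(n_cys):
    """Number of ways to pair all n_cys cysteines into n_cys/2 disulfide bonds.

    Uses (2n)! / (2**n * n!) with n = n_cys / 2, i.e. the double factorial
    (2n - 1)!!. Purely combinatorial; sterics and folding are ignored.
    """
    if n_cys < 0 or n_cys % 2:
        raise ValueError("n_cys must be a non-negative even number")
    n = n_cys // 2
    return factorial(n_cys) // (2 ** n * factorial(n))

for n_cys in (2, 4, 6, 8):
    print(n_cys, full_pairings(n_cys))
# 2 1
# 4 3
# 6 15
# 8 105
```

Only one of these pairings is the native one, which is why repeated oxidation and reduction cycles become more demanding as the number of cysteines grows.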

Exit from the ER can proceed by two pathways: (1) to the degradation pathway, ER-associated degradation (ERAD), for misfolded proteins (Fig. S3), and (2) to the Golgi, for properly folded proteins (Fig. S2). The exact biochemical mechanisms for these two pathways have not been completely determined in yeast. However, many parts of the decision-making process have been identified. Detection of misfolded proteins and subsequent degradation is accomplished by several pathways (Fig. S3). Glycosylation structures of glycoproteins can traffic proteins to degradation. N-linked glycan trimming by glucosidase I (Cwh41p) and glucosidase II (Rot2p) is accomplished quickly and is observed for proteins that exit the ER (Fig. S3 and Table 1; Herscovics, 1999). ER mannosidase I (Mns1p) appears to be a gatekeeper for this degradation pathway. Mns1p removes a single mannose that is involved with targeting for the degradation pathway. Mns1p activity is lower than that of Cwh41p and Rot2p (Jakob et al., 1998), and this may result in a residence-time clock for proteins that are attempting to fold. If a protein remains in the ER for too long, the mannose will be removed from the glycoprotein, and the protein will be retranslocated to the cytosol for degradation (Knop et al., 1996). Yos9p, Htm1p, and Mnl1p are believed to act as lectins for targeting de-mannosylated proteins to the ERAD (Fig. S3; Jakob et al., 2001). Kar2p and the Sec61 complex are also involved in the ERAD pathway, with Kar2p binding acting as a residence-time clock similar to Mns1p, causing terminally misfolded proteins to be shuttled out of the ER (Brodsky et al., 1999). Membrane-bound misfolded proteins can be trafficked to degradation by three different pathways, depending on whether the misfolding takes place in the ER lumen, in the intramembrane space, or on the cytosolic side (Fig. S3; Carvalho et al., 2006). When misfolding occurs on the ER luminal side, Der1p recruits the misfolded protein to the Hrd1p complex for ubiquitination (Table 1). When misfolding occurs inside the membrane, the Hrd1p complex ubiquitinates in a Der1p-independent manner. Finally, cytosolic misfolding is managed by the Doa10p ubiquitin ligase. Ubiquitinated proteins are trafficked to the cytosolic proteasome. Calnexin/calreticulin systems have been elucidated in mammalian systems. However, the calnexin homolog in yeast (Cne1p) does not appear to have the same function but does have chaperone activity and is involved in the protein degradation pathway (Xu et al., 2004).

For a protein to exit to the Golgi, it must bypass the degradation pathways mentioned previously and be recognized by receptors for export in COPII vesicles (Fig. S2). These COPII vesicles will traverse from the ER to the Golgi, where the membrane-bound or soluble proteins are further processed (for a recent review, see Dancourt & Barlowe, 2010). Sar1p acts as a trigger for the structural formation of the COPII vesicles, recruiting Sec13p, Sec23p, Sec24p, and Sec31p to complete the bud formation (Matsuoka et al., 1998). Importantly, several recognition signals are used to specifically bind export-ready proteins inside the forming vesicle. Soluble proteins are trafficked by Sec24p binding to the di-acidic DXE cargo-sorting signal (Mossessova et al., 2003) and by the Emp24p, Erv14p, Erv25p, Erv26p, and Erv29p receptors binding to other unidentified motifs (Schimmoller et al., 1995; Belden & Barlowe, 1996). Membrane-bound proteins have cytosolic signals that are recognized by the Sec23-Sec24 complex (Table 1; Bonifacino & Glick, 2004). Sfb2p and Sfb3p, which are Sec24p homologs, are believed to bind other cargo-sorting signals (Roberg et al., 1999; Kurihara et al., 2000; Peng et al., 2000). Shr3p, Chs7p, and Vma22p associate specifically with secretory proteins and may be involved in sorting their target proteins to the Golgi (Herrmann et al., 1999). Glycosylphosphatidylinositol (GPI)-anchored proteins are sorted to the Golgi by Uso1p and/or Ypt1p (Morsomme et al., 2003). Still other proteins appear to be captured nonspecifically and are transported to the Golgi by bulk flow (Malkus et al., 2002). After the COPII vesicle buds off the ER, it traverses to the Golgi by diffusion (Preuss et al., 1992).

Golgi processing

In S. cerevisiae, the Golgi apparatus exists as individual cisternae scattered throughout the cell, which change from cis cisternae to trans cisternae, in contrast to higher eukaryotes that have well-ordered stacked cisternae (Matsuura-Tokita et al., 2006). Regardless of the localization, many important modifications are made to the proteins in the Golgi, and these modifications affect the post-Golgi trafficking (Fig. S4). Glycoproteins are mannosylated (sometimes exceeding 50 mannoses) on the N-linked and O-linked sugar structures (Fig. S4; Hashimoto & Yoda, 1997; Jungmann & Munro, 1998). Mannoses are added to N-linked sugars in consecutive order by Och1p (one mannose), mannan polymerase I complex (M-Pol I) (10 mannoses), mannan polymerase II complex (M-Pol II) (40 mannoses), and finally, Mnn1p, Mnn2p, and Mnn5p, which can add more mannoses (Table 2; Hashimoto & Yoda, 1997; Jungmann & Munro, 1998). O-linked glycans have more stringent mannosylation: only five mannoses are added, and only to proteins that will be on the exterior of the cell (Table 2; Strahl-Bolsinger et al., 1999). The O-mannosylations are believed to be a signal for trafficking to the exocytosis pathways.

Maturation of the protein in the Golgi also involves cleaving the polypeptide chain. Three Golgi-resident proteases can cleave the polypeptide based on different recognition sites (Fig. S4, lower part). Kex1p cleaves C-terminal arginine or lysine (Cooper & Bussey, 1989). Kex2p, the most well-studied protease, cleaves a (K/R)-R motif (Rockwell et al., 2002). Ste13p is a dipeptidyl aminopeptidase that cleaves repeated X-A motifs (Julius et al., 1983). These polypeptide cleavages allow the following: maturation of proteins, activation of catalytic activity, and changed conformation for binding the intended receptor.
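
The three recognition rules can be combined into a small processing sketch. The Python below is illustrative only and rests on several assumptions that go beyond the text: the Kex2p cut is placed on the C-terminal side of the (K/R)-R pair, Kex1p is modeled as removing every trailing lysine or arginine, Ste13p is modeled as removing N-terminal dipeptides whose second residue is alanine, and the steps are applied in that fixed order. All function names and the toy sequence are hypothetical.

```python
import re

def kex2_split(seq):
    """Split after each non-overlapping K-R or R-R pair (cut site is an assumption)."""
    pieces, start = [], 0
    for m in re.finditer(r"[KR]R", seq):
        pieces.append(seq[start:m.end()])
        start = m.end()
    if start < len(seq):
        pieces.append(seq[start:])
    return pieces

def kex1_trim(seq):
    """Remove C-terminal lysine/arginine residues."""
    return seq.rstrip("KR")

def ste13_trim(seq):
    """Remove repeated N-terminal X-A dipeptides."""
    while len(seq) >= 2 and seq[1] == "A":
        seq = seq[2:]
    return seq

def mature_peptides(seq):
    """Apply the three simplified rules and drop empty fragments."""
    trimmed = (ste13_trim(kex1_trim(p)) for p in kex2_split(seq))
    return [p for p in trimmed if p]

# Hypothetical toy precursor, for illustration only.
print(mature_peptides("MKREAEAWHWLKREAEAWHWL"))  # ['M', 'WHWL', 'WHWL']
```

The sketch treats each rule as purely sequence-based; it does not model accessibility of the sites or the efficiency of each protease.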

Post-Golgi sorting

After the Golgi maturation processes are completed, the most important sorting processes will take place on the exit from the Golgi. Trafficking from the Golgi can go in many directions, depending on the final destination of the protein, retrograde to ER, transport to early endosome, late endosome, vacuole, plasma membrane, or extracellular space (Fig. S5). Retrograde transport from the cis-Golgi to the ER is important to return membrane area, ER SNAREs, cargo adaptor proteins, and membrane components to the ER, otherwise these resources would be depleted from the ER. COPI vesicles are responsible for the retrograde transport from cis-Golgi to ER (Table 2). Soluble proteins in the Golgi that must be transferred back to the ER contain an HDEL sequence that is bound by the COPI protein Erd2p (Aoe et al., 1997). A range of COPI subunits can recognize cytoplasmic motifs of membrane proteins, such as α and β′ to KKXX, γ to FF or K[K/R]XX of p24 protein, and δ to the δL motifs (WXX[W/Y/F]) (Eugster et al., 2004). Another motif, RKR, on the cytoplasmic side of potassium transporters Trk1p/Trk2p causes retrograde transport to the ER, although the receptor is not known (Zerangue et al., 1999).

Three pathways exit from the trans-Golgi network (TGN): (1) the carboxypeptidase Y (CPY) pathway, (2) the Golgi-localized, γ-Ear–containing, ADP-ribosylation factor-binding proteins (GGAs)-associated pathway, and (3) the alkaline phosphatase (ALP) pathway. The default route to the vacuole is via the CPY pathway, a two-step process using adaptor protein (AP) complex 1 (Fig. S5 and Table 2). AP-1 complex vesicles can transfer proteins from the TGN to the early endosome (Valdivia et al., 2002; Abazeed & Fuller, 2008). In in vitro studies, Kex2p is sorted via the AP-1 complex to the early and then the late endosome (Abazeed & Fuller, 2008). From the early endosome, the default route moves proteins via the late endosome to the vacuole (Dell'Angelica et al., 1997). Data suggest that proteins without a sorting signal, such as recombinant secretory proteins, are automatically sorted to the CPY pathway (Cowles et al., 1997).

The GGA-associated pathway traffics vesicles directly from the TGN to the late endosome. A QRPL motif followed by ubiquitination appears to be the common signal for targeting through this pathway. Gga1p and Gga2p are the sorting proteins, and this pathway traffics Vps10p and other vacuole-resident proteins to the late endosome (Valls et al., 1990). Rsp5p is a broad-range ubiquitin ligase responsible for ubiquitinating these proteins (Dunn & Hicke, 2001; Wang et al., 2001). The ubiquitin-binding domain of Gga1p and Gga2p (Table 2) appears important in the trafficking process (Costaguta et al., 2006). At the late endosome, the GGA pathway converges with the CPY pathway in default transport to the vacuole.

Finally, an additional route exists to traffic proteins from the TGN directly to the vacuole, namely the ALP pathway. The ALP pathway transports membrane proteins using AP-3, independent of the endosome (Fig. S5; Piper et al., 1997). This pathway relies on a 13–16 amino acid (arginine- and lysine-rich) cytoplasmic signal and was identified by ALP sorting aberrant mutants (Cowles et al., 1997).

Endosomal sorting complex required for transport (ESCRT) complexes, four complexes in all, can also bind ubiquitinated proteins and form luminal vesicles that are trafficked to the vacuole (Table 2). The ESCRT-0 complex (Vps27p and Hse1p) has ubiquitin-interacting motifs that recruit the other ESCRT complexes (Bilodeau et al., 2002). These complexes recruit a deubiquitinating enzyme (Doa4p), necessary for maintaining ubiquitin homeostasis in the cytosol, and structural proteins that create the luminal vesicles for vacuolar degradation that are characteristic of the multivesicular bodies (MVB) (Dupre & Haguenauer-Tsapis, 2001; Luhtala & Odorizzi, 2004).

Exocytosis

For proteins that will follow the exocytotic pathway from the trans-Golgi, two pathways exist (Fig. S5). From density-based separation experiments, two types of vesicles are known to merge with the cell membrane and are named light density secretory vesicles (LDSV) and heavy density secretory vesicles (HDSV) (Harsay & Bretscher, 1995). LDSV are known to carry constitutively expressed cell membrane proteins, such as Bgl2p, Pma1p, and Gas1p. LDSV are believed to emerge from the trans-Golgi and transit directly to the cell membrane (Gurunathan et al., 2002). This process takes around 30 min. LDSV may be the final step in lipid raft-based sorting that begins in the ER (Bagnat et al., 2000). Specific cell membrane proteins partition to sterol-rich domains of the ER membrane. These rafts are directed through the secretory pathway and are finally merged with the cell membrane. Conversely, HDSV package soluble, secreted proteins, such as invertase (Suc2p) and acid phosphatases (Pho11p, Pho12p, and Pho5p), that are transcriptionally regulated and induced under certain conditions. HDSV move from the endosome to the cell membrane and are thus affected by many of the mutations that block movement to and through the early/late endosome (Gurunathan et al., 2002). Mutants that block the HDSV pathway were shown to use the LDSV pathway for the secretion of proteins normally bound for the HDSV pathway (Harsay & Schekman, 2002).

Unfolded protein response: transcriptional control of the secretory pathway

Much of the secretory pathway is managed on the basis of protein–protein interactions (such as ubiquitination of misfolded proteins) and chemical modifications to the trafficked protein (such as glycosylation and disulfide bond formation); these processes operate under unstressed conditions during normal cell growth. However, when protein folding stress begins to overwhelm the processing machinery of the ER, large-scale transcriptional alterations become necessary to bring the secretory pathway back into homeostasis. This transcriptional response, the Unfolded Protein Response (UPR), is a large-scale orchestrated response that increases the capacity of the secretory pathway, clearance of misfolded proteins, and oxidative conditions in the ER (Bard & Malhotra, 2006).

The UPR broadly consists of an upstream sensing mechanism and a downstream activation mechanism to coordinate this broad stress response. The upstream mechanism has been studied in great detail and is primarily controlled by two key proteins, the ER transmembrane protein, Ire1p, and the transcriptional activator, Hac1p. Ire1p contains an ER luminal domain that binds Kar2p/BiP and a cytosolic domain that has kinase and endonuclease activity. Misfolded proteins in the ER are detected when large amounts of Kar2p are recruited away from Ire1p. Under normal conditions, a portion of Kar2p is associated with immature proteins, allowing them to fold completely, while the majority of Kar2p is associated with Ire1p. This association with Ire1p causes steric effects that prevent dimerization of Ire1p. However, under stress conditions, most Kar2p molecules are associated with unfolded protein, while simultaneously unfolded proteins are bound to Ire1p. This exchange of Kar2p for unfolded protein causes Ire1p to dimerize. Upon dimerization, the cytoplasmic portion of Ire1p phosphorylates itself, which, in turn, activates an endonuclease domain on the cytoplasmic portion of Ire1p. This endonuclease activity is specific to an mRNA sequence in HAC1u, the transcribed RNA from HAC1. Unactivated HAC1u mRNA is constitutively expressed in the cell. However, because of the presence of a 3′ RNA hairpin, HAC1u cannot be translated. Activated Ire1p cleaves HAC1u (which becomes HAC1i, for induced) to remove the hairpin, which is followed by ligation by Rlg1p (tRNA ligase), allowing translation to proceed. Hac1p can then be expressed as a functional transcriptional activator. A recent study revealed that the ER-luminal domain of yeast Ire1p can bind unfolded proteins directly, driving Ire1p dimerization and activating the UPR (Gardner & Walter, 2011).

A mathematical model has been developed to describe the upstream/activation portion of the UPR. Raden et al. (2005) used a series of ordinary differential equations to describe Ire1p activation as it relates to its Kar2p binding state. The model predicted that steric effects of Kar2p alone are not adequate to explain the dynamics of UPR activation. A key facet of this work was that the model considered the relative concentrations of Ire1p and Kar2p in the ER, combined with expected kinetics. The model predicted that with Kar2p over-expression, the cell should tolerate higher amounts of unfolded protein before inducing the UPR. This prediction was tested experimentally, and it was found that the amplitude of UPR activation was decreased, but the UPR induction threshold occurred at the same unfolded protein levels. A revised model, which included an unknown secondary effector (presumably unfolded protein binding to Ire1p), was able to capture the experimental observations. This model should be useful in understanding the conditions that lead to upstream UPR activation and the level of activation that can be expected.
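
To make the first prediction concrete, the following is a minimal equilibrium sketch; it is not the ODE model of Raden et al. (2005). It assumes that Kar2p partitions between Ire1p and unfolded protein by simple 1:1 binding, that Kar2p-free Ire1p is the activatable species, and that all totals and dissociation constants (`k_total`, `u_total`, `i_total`, `kd_u`, `kd_i`) are dimensionless, hypothetical values chosen only for illustration. Under these assumptions, raising total Kar2p raises the unfolded protein load needed to free half of the Ire1p, which is the threshold shift that the experiments did not show.

```python
def free_kar2p(k_total, u_total, i_total, kd_u, kd_i):
    """Free Kar2p from the mass balance K_free + K:U + K:Ire1p = K_total."""
    def excess(k_free):
        bound_u = u_total * k_free / (kd_u + k_free)  # Kar2p on unfolded protein
        bound_i = i_total * k_free / (kd_i + k_free)  # Kar2p on Ire1p
        return k_free + bound_u + bound_i - k_total

    lo, hi = 0.0, k_total  # excess() increases monotonically on this interval
    for _ in range(100):   # bisection
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def free_ire1_fraction(k_total, u_total, i_total=1.0, kd_u=1.0, kd_i=1.0):
    """Fraction of Ire1p not bound by Kar2p (hypothetical default parameters)."""
    k_free = free_kar2p(k_total, u_total, i_total, kd_u, kd_i)
    return kd_i / (kd_i + k_free)


def induction_threshold(k_total, level=0.5, u_max=1.0e4):
    """Unfolded protein load at which the Kar2p-free Ire1p fraction reaches `level`."""
    lo, hi = 0.0, u_max
    for _ in range(100):   # bisection on the unfolded protein load
        mid = 0.5 * (lo + hi)
        if free_ire1_fraction(k_total, mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for k_total in (10.0, 20.0):  # baseline vs. Kar2p over-expression
        print(k_total, induction_threshold(k_total))
```

With the default parameters, the half-activation threshold of this toy model grows linearly with total Kar2p, so a Kar2p-only sensing scheme necessarily predicts a shifted threshold. Adding a second, Kar2p-independent input to Ire1p would be needed to reproduce an unchanged threshold.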

The downstream portion of the UPR is characterized by a large, multifaceted response to bring the secretory pathway back to homeostasis (Tyo et al., 2012). Hac1p is a transcriptional activator that is known to interact with three binding sequences (in coordination with Gcn4p) to regulate many different activities within the cell in an attempt to correct the misfolded protein problem in the ER (Mori et al., 1996; Travers et al., 2000; Patil et al., 2004). In all, the expression of approximately 380 genes is altered in the UPR, although only half have Hac1p binding sequences in the promoter (Travers et al., 2000; Kimata et al., 2006). The upstream/detection part of the UPR pathway has been elucidated; however, work on the downstream/implementation part of the response has been limited to identifying promoter sequences that are specific to the UPR and to DNA microarray analyses that have identified genes altered by the UPR (Travers et al., 2000; Kimata et al., 2006).

Many cellular responses are activated simultaneously. Broadly, the UPR (1) increases the capacity of the secretory pathway and (2) clears unwanted/unnecessary proteins. In the ER, the folding rate is increased by upregulating chaperones, such as Fkb2p, Lhs1p, and Kar2p, and disulfide bond formation is increased by upregulating Ero1p, Pdi1p, and others. To accommodate increased disulfide bond formation activity and the subsequent reactive oxygen species that can damage the cell (Haynes et al., 2004), oxidative stress response genes are also activated. Glycosylation processing elements of the ER and Golgi are also upregulated to increase the processing capacity of the secretory pathway, as these glycosylations are required for proper folding of many proteins. Trafficking components used in COPI, COPII, and post-Golgi vesicles are upregulated. Finally, metabolic pathways for lipid and inositol are upregulated to increase the amount of membrane. Membrane, while often not considered to be an active component of the secretory pathway, provides surface area that is essential for almost all secretory pathway processes.

Aside from increasing secretion capacity, the UPR also clears unfolded protein and reduces the demand for the secretory pathway. To remove misfolded proteins, elements of the ERAD and ubiquitin/proteasome system are upregulated (Travers et al., 2000). Interestingly, cotranslational translocation and post-translational translocation are increased at the ER-cytosol interface, but this is most likely to facilitate the transport of misfolded proteins back to the cytosol for proteolysis, not transport into the ER. Misfolded proteins may also be cleared from the ER in a “feed forward” manner by moving them through the Golgi to the vacuole, as COPII vesicle components are upregulated. Evidence indicates that misfolded proteins can be degraded independent of ERAD, as mutants that abolish ERAD are constitutively activated for UPR and misfolded proteins can be targeted to the vacuole (Hong et al., 1996; Travers et al., 2000). Kimata et al. (2006) also found that a number of exocytosis-targeted proteins were downregulated in the UPR. For example, acid phosphatases (Pho3p and Pho5p), various transporters (Ato3p, Fet3p, Fre1p, and Tpo1p), and α-factor, which consume secretory pathway capacity, are reduced to help relieve secretory pathway stress.

The downstream portion of the UPR is ripe for systems biology modeling. As discussed, the UPR initiates and coordinates many processes in the cell to bring the secretory pathway back to homeostasis. While the transcription factor Hac1p is known to signal the UPR, the specific transcription factors that initiate the many subtasks of the UPR have not been identified. In addition, understanding this biological information flow should be useful for engineering the secretory pathway for greater recombinant protein productivity. Recently, integrative systems biology analysis was used to identify Hac1p, Fhl1p, and Skn7p as significant transcription factors in the UPR (Tyo et al., 2012). Fhl1p plays a role in the coordinated downregulation of ribosomal protein and rRNA, thereby decreasing the total translational capacity of the cell. Skn7p is responsible for managing oxidative and osmotic stress responses in the cell. In the UPR, Skn7p is used to upregulate the oxidative stress response, thereby mitigating ROS, while downregulating the osmotic stress response. Downregulating the osmotic stress response results in fewer cell wall proteins being processed in the secretory pathway, freeing up additional secretion capacity. Further study should lead to scaffold models to map all major branches of the UPR.

Biotechnology: parameters to increase secretion

Through detailed knowledge of the secretion pathway, it has become possible to improve the secretion yield and efficiency through a combination of different molecular techniques (Idiris et al., 2010): (1) engineering signal sequences, (2) optimizing the ER folding environment, (3) affecting vesicle transport, and (4) reducing proteinase activities. High-throughput screening approaches are also frequently used to improve the secretory capability, and in the future, it will be interesting to exploit systems biology tools for the evaluation of improved mutants with the objective of finding novel metabolic engineering targets.

Nowadays, the level of recombinant protein secretion in S. cerevisiae is still in the order of mg L−1, although some industrial companies have managed to elevate the titers of certain proteins to the g L−1 range. A summary of recombinant protein secretion systems in S. cerevisiae is presented in Table S1, and a more detailed review of different strategies is given in the following sections.

Engineering the signal sequence

The leader sequence

The leader sequence determines, in part, the trafficking of a secreted protein. The presequence determines whether cotranslational translocation or post-translational translocation occurs for entrance to the ER, and the pro-sequence determines the sorting mechanisms in the trans-Golgi network. Native S. cerevisiae leader sequences, foreign leader sequences, and leader sequences devised from theory (synthetic leaders) have been used to target heterologous proteins for secretion.

Native leaders often possess certain advantages, as shown in many cases, including human serum albumin (HSA) (Sleep et al., 1990), human interferon (IFN) (Piggott et al., 1987), and Aspergillus niger glucose oxidase (GOD) (De Baetselier et al., 1992). However, recombinant proteins produced by S. cerevisiae are often hyperglycosylated and retained in the periplasmic space (Spear & Ng, 2003; Schmidt, 2004). It is therefore sometimes preferred to choose highly glycosylated leaders, such as the S. cerevisiae α-factor leader, which has proven to be very efficient in some cases, for example, for the secretion of human epidermal growth factor (hEGF) (Chigira et al., 2008), human platelet derived growth factor (PDGF) (Robinson et al., 1994), and Schizosaccharomyces pombe acid phosphatase (Baldari et al., 1987). However, it is not possible to predict which leader is best suited for efficient secretion of a given protein, and it is therefore often necessary to evaluate different leaders experimentally. This is illustrated by a study of Li et al. (2002), who evaluated various leader sequences, including INU1, SUC2, PHO5, and MEL1, to secrete either green fluorescent protein (GFP) or GFP-hexokinase fusions. In all cases, the majority of the protein accumulated in the vacuole or endosome (Li et al., 2002). However, using a viral leader from the K28 preprotoxin, secretion was efficient (Eiden-Plach et al., 2004). Another example is a study which showed that the yeast invertase signal SUC2 was correctly cleaved from all secreted IFN molecules (Parekh & Wittrup, 1997), whereas the native IFN leader resulted in only 64% cleavage (Hitzeman et al., 1983). However, when the same SUC2 leader was used to secrete human α-1-antitrypsin (α-AT), approximately 80% of the protein accumulated in the secretory pathway (Moir & Dumais, 1987).

Synthetic leaders are often used to solve secretion problems, such as (1) inefficient processing of pre- or pro-leaders, (2) accumulation of hyperglycosylated protein, and (3) incorrect trafficking in the secretory pathway. Examples of synthetic pre- and pro-leaders include the expression of insulin precursor (IP) (Kjeldsen, 2000), human adenosine A2a receptor (A2aR) (Butz et al., 2003), bovine pancreatic trypsin inhibitor (BPTI) (Parekh & Wittrup, 1997), and single-chain antibody (scFv) (Shusta et al., 1998). Recently, we performed a comparison of a synthetic leader with the α-factor leader and found the synthetic leader to be slightly more efficient for the secretion of insulin precursor and α-amylase (Liu et al., 2012).

There have also been several studies on the importance of both the pre- and pro-regions for different secretion strategies. For most proteins, for example, human insulin-like growth factor 1 (fhIGF-1) (Romanos et al., 1992) and α-globin (Rothblatt et al., 1987), both the pre- and pro-leader should be applied to achieve optimal secretion. However, there are some exceptions. Ernst (1988) found that the pro-region of the α-factor leader has only a minor effect on secreting aminoglycoside phosphotransferase (APH) and granulocyte colony-stimulating factor (GCSF), whereas for interleukin-1β, the preregion decreased Kex2p processing efficiency compared with the case when only the pro-region was applied. One possible explanation is that the pro-region may help to stabilize the mRNA or facilitate the transcription process (Gabrielsen et al., 1990); however, more studies are still needed to look further into the roles of the different parts of the leader sequence.

Spacers for leader sequences

To achieve a correct final product, the specific proteases need to efficiently cut the pre- and pro-proteins at the correct places. This affects sorting as well as product quality. Recombinant protein secretion directed by pre-pro-leader sequences typically relies on Kex2p endoprotease activity, which is often limiting. Inefficient Kex2p processing results in the secretion of hyperglycosylated unprocessed pro-proteins (Fabre et al., 1991; Kjeldsen et al., 1996). There are many ways to solve this problem. In some cases, spacer residues were included to provide a hydrophilic environment that improves cleavage by Kex2p (Guisez et al., 1991). Modifying the protein coding sequence, for example by including an alanine at the N-terminus of human interleukin-6 (hIL-6), can also improve cleavage (Guisez et al., 1991). Kjeldsen et al. (1996) tried either to apply a spacer peptide between the leader and the insulin precursor or to apply a “mini C-peptide” (Kjeldsen et al., 1999), and both approaches were found to increase the efficiency of Kex2p endoprotease processing. However, a spacer at the N-terminus of the secreted protein is not always helpful, and in one study, it was found that this approach resulted in 5% intracellular retention of hEGF and 50% for IFN (Singh et al., 1984). Another approach is to over-express the protease genes. Barr et al. (1987) over-expressed the KEX2 gene, and this resulted in improved secretion of correctly processed transforming growth factor-α (TGFα) into the culture medium. Over-expression of S. cerevisiae aspartyl protease (YAP3) (Egel-Mitani et al., 1990) or dipeptidyl aminopeptidase (STE13) (Julius et al., 1983) was also found to improve pro-sequence cleavage. In general, the spacer should have an absence of nonspecific interaction sequences (Fuchs et al., 1997), optimal proteolytic accessibility (Leong & Chen, 2007), and protection of the interface from hydrophobic fragments (Reiter et al., 1994).

Engineering protein folding and glycosylation

Glycosylation takes place in the ER and Golgi and can be engineered based on the amino acid sequence of the protein or the glycosylation enzymes (Tables 1 and 2). Glycosylation mitigates aggregation (Parthasarathy et al., 2006) and hydrolysis (Rudd et al., 2004), and also increases interaction affinity and selectivity (Rudd et al., 1999), but it is still not fully clarified how glycosylation affects secretion level.

Glycosylation seems to have no significant effect on the secretion of α-amylase (Nieto et al., 1999) and IL-1α (Livi et al., 1990). On the other hand, the loss of one essential glycosylation site of CD47 reduced its surface expression level by more than 90% (Parthasarathy et al., 2006). Glycosylation has been shown to facilitate protein folding of EGF (Demain & Vaishnav, 2009) and immunoglobulin (Rudd et al., 1999) and to maintain the activity of interleukin-1β (Livi et al., 1991). Furthermore, introducing extra N-glycosylation sites can improve secretion, as illustrated by the secretion of cutinase, where a fivefold or 1.8-fold increase in secretion was obtained after introducing an N-glycosylation site in the N-terminal and C-terminal regions, respectively (Sagt et al., 2000).
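
When considering where an extra N-glycosylation site might be introduced, a first step is to list the potential sites a sequence already carries. The sketch below scans a protein sequence for the canonical N-glycosylation sequon Asn-X-Ser/Thr, where X is any residue except proline. The function name and the example peptide are hypothetical, and a sequon only marks a potential site; whether it is actually glycosylated in yeast has to be determined experimentally.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTT") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein_sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    seq = protein_sequence.upper()
    return [match.start() + 1 for match in SEQUON.finditer(seq)]


if __name__ == "__main__":
    example = "MNGTANPSNAT"  # made-up peptide, not a real protein
    print(find_sequons(example))  # prints [2, 9]; the N-P-S motif is skipped
```

Comparing the list before and after a planned point mutation is a quick way to check that the mutation creates the intended sequon and does not destroy an existing one.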

When no glycosylation sites can be added or engineered in the coding region of the protein, an alternative solution is to apply a leader sequence which contains N-glycosylation sites (Chen et al., 1994). N-glycosylation has been shown to be very important for α-factor leader, especially for the pro-region, when directing insulin secretion (Caplan et al., 1991; Kjeldsen et al., 1998). A synthetic leader LA19 with two N-glycosylation sites has also been developed (Fabre et al., 1991) and demonstrated optimal glycosylation for insulin secretion (Kjeldsen et al., 1998). In addition to engineering glycosylation to improve secretory efficiency, important improvements have been made in engineering humanized glycosylation in yeasts. Wildt and Gerngross review this topic in detail (Wildt & Gerngross, 2005).

The number of disulfide bonds is another factor that affects protein secretion (Hober & Ljung, 1999). For example, the expression level of insulin-like growth factor-1 (IGF1) decreased by about one-third when removing either Cys23p or Cys96p, which are likely to be involved in disulfide bond formation (Steube et al., 1991). The expression level and affinity of CD47 decreased by 30% when the core disulfide bond is missing (Parthasarathy et al., 2006).

Protein folding in the ER is often considered the flux controlling step in the secretion pathway (Lim et al., 2002), and over-expression of chaperones, especially Kar2p and PDI, therefore often allows for improved secretion. Kar2p acts as a folding chaperone by binding to exposed hydrophobic sequences (Ma et al., 1990) and also as an ER detergent functioning in the ERAD process (Robinson et al., 1996). On the other hand, PDI catalyzes disulfide bond formation and isomerization (Laboissière et al., 1995). The soluble levels of PDI decrease upon over-expressing recombinant proteins, implying that it functions not only as a catalyst, but also as a chaperone, binding to the heterologous proteins (Robinson & Wittrup, 1995). Over-expression of either Kar2p or PDI improves secretion levels in many cases (Table 3). Over-expression of PDI also improves secretion for proteins that do not contain disulfide bonds, for example, Pyrococcus furiosus β-glucosidase (Smith & Robinson, 2002), suggesting that PDI may act in a chaperone-like capacity or cooperate with the folding or degradation mechanisms on nondisulfide bonded protein (Powers & Robinson, 2007).

Table 3. Chaperone over-expression for recombinant protein secretion in Saccharomyces cerevisiae
Protein name | Amino acid | Disulfide bond | N-glycosylation site | BiP+ (by fold) | PDI+ (by fold) | BiP+ PDI+ (by fold)
  1. Several data points come from Protein Knowledgebase (UniProtKB).

PDGF-B1095110 (Robinson et al., 1994)
Hirudin65302.5 (Kim et al., 2003)
BPTI58301 (Robinson et al., 1996)1 (Kowalski et al., 1998)
scFv244212.4 (Shusta et al., 1998)2.3 (Shusta et al., 1998)10.4 (Hackel et al., 2006)
scTCR240132 (Shusta et al., 2000)
A2aR412 01 (Butz et al., 2003)75% (Butz et al., 2003)1 (Butz et al., 2003)
rhG-CSF174201 (Robinson et al., 1996)
PHO435891 (Robinson et al., 1996)4 (Robinson et al., 1994)
P.furiosus β-glucosidase4211 Cys01 (Smith & Robinson, 2002)1 (Smith & Robinson, 2002)1.6 (Smith et al., 2004)
Bovine prochymosin3454 Cys220 (Harmsen et al., 1996)
Plant thaumatin235801 (Harmsen et al., 1996)

Sometimes, Kar2p and PDI work together to ensure proper folding. Mayer et al. (2000) suggested that Kar2p may maintain the protein in an unfolded state by binding to it, which makes the cysteine residues accessible for PDI activity. This Kar2p/PDI cooperativity increased secretion of scFv (Xu et al., 2005) and β-glucosidase (Smith et al., 2004). However, in other cases, over-expression yields only a minor increase or even a decrease in secretion, as illustrated for plant thaumatin (Harmsen et al., 1996), IFN-α2a and A2aR (Butz et al., 2003). These differences can be explained by each protein's unique characteristics, such as the presence of glycosylation sites and the number of disulfide bonds.

Besides Kar2p, the cochaperones that are involved in regulating the ATPase activities of Kar2p, such as the DnaJ-like chaperones Jem1p and Scj1p and the nucleotide exchange factors Sil1p and Lhs1p, are also reported to increase protein production. By single or multiple over-expression of these chaperones, the secretion levels of recombinant human albumin (rHA), granulocyte–macrophage colony-stimulating factor (GM-CSF), and recombinant human transferrin were improved significantly (Payne et al., 2008). Another approach to engineering protein folding and secretion is to activate the UPR by manipulation of the HAC1 gene. Over-expression of S. cerevisiae HAC1 resulted in a 70% increase in Bacillus amyloliquefaciens α-amylase secretion, but did not increase the secretion of ER-accumulated Trichoderma reesei endoglucanase EGI (Valkonen et al., 2003). Over-expressing T. reesei HAC1 in yeast resulted in a 2.4-fold increase in α-amylase secretion (Higashio & Kohno, 2002). This indicates that the effect of UPR activation by HAC1 over-expression is protein specific and dependent on protein properties and regulation impact.

Engineering protein trafficking and minimizing protein degradation

High-level expression of recombinant proteins often results in misfolding and accumulation of protein at certain steps in the secretion pathway. However, different proteins accumulate in different compartments: hepatitis B surface antigen (HBsAg) (Biemans et al., 1991), α-1-antitrypsin (Moir & Dumais, 1987), and erythropoietin (Elliott et al., 1989) accumulate in the ER compartment, but soybean proglycinin is retained in the Golgi (Utsumi et al., 1991). Secretion of heterologous proteins may also interfere with native protein secretion; for example, the secretion of host acid phosphatase is disturbed by the secretion of tissue-type plasminogen activator (tPA) (Hinnen et al., 1989), probably due to induction of cell stress and lack of capacity in the secretion pathway. Secretion of heterologous proteins may also cause increased ER stress that may link to other cellular processes and thereby result in reduced overall productivity.

Other proteins also assist with secretion. For example, over-expression of the PDI oxidant Ero1p and of the cell wall protein Ccw12p has been reported to improve the secretion of scTCR by 5.1- and 7.9-fold, respectively (Wentz & Shusta, 2007). Over-expression of the UBI4 gene increased the secretion level of elafin by 10-fold (Chen et al., 1994). Over-expression of SSO1 and SSO2, which are crucial for vesicle fusion to the plasma membrane, increased α-amylase secretion by 2-fold (Larsson et al., 2001; Toikkanen et al., 2004). Co-over-expression of COG6, COY1, and IMH1, all genes related to Golgi vesicle transport, enhanced Fab production by 1.2-fold (Gasser et al., 2007). Mutation of the cell wall protein Gas1p strongly improved the secretion of IGF1 (Brinkmann et al., 1993), and a mutation of PMR1, a Golgi-resident calcium ATPase gene (Rudolph et al., 1989), increased the secretion of prochymosin (Harmsen et al., 1996) and propapain (Ramjee et al., 1996). Recently, we showed that it is also possible to improve protein secretion by over-expression of the SNARE-regulating Sec1/Munc18 (SM) proteins, which modulate vesicle transport (Hou et al., 2012).

Proteins that are targeted to the vacuole by a group of vacuolar sorting proteins (VPS) (Graham, 1991) and degraded there cannot be exported. Interestingly, the intracellular sorting is dependent on the catalytic activity of Kex2p (Zhang et al., 2001). Deleting VPS4, VPS8, VPS13, VPS35, VPS36, or PEP4, genes involved in vacuolar sorting or vacuolar proteolysis, resulted in higher yields of an insulin-containing fusion protein (ICFP) (Zhang et al., 2001). Single deletion of the extracellular protease Ski5p successfully improved the secretion level of killer toxin (Bussey et al., 1983), and disruption of YAP3 alone or together with KEX2 reduced the degradation of HSA and HSA-human growth hormone fusion protein. A single deletion of KEX2 also had a minor effect (Geisow et al., 1991).

Besides vacuolar sorting, some proteins may undergo proteasome-based protein degradation. This has been seen for cutinase production in yeast (Sagt et al., 2002). Delta's strains carry a mutant genomic UBC4 gene, which encodes the ubiquitin-conjugating enzyme, resulting in extremely high plasmid copy number and over-expression of different proteins (Sleep et al., 2001).

High-throughput screening for secretory pathway mutants

Random mutagenesis and screening

Random mutagenesis and screening is another powerful tool to optimize protein expression level, stability, function and antigen-binding affinity (Wittrup, 2001; Vasserot et al., 2003). This can be mutagenesis of either (1) the recombinant protein to be secreted, or (2) the host strain to alter synthesis and secretory properties.

Concerning mutagenesis of the recombinant protein, Zhang et al. (2003) studied single- and double-point mutations within the insulin B-chain and suggested that failure to properly form disulfide bonds contributes to altered intracellular trafficking. Kowalski et al. (1998) created all possible single and pairwise mutants of the BPTI cysteines and concluded that the 5–55 disulfide bond is essential for protein folding and secretion.

When pursuing mutagenesis of the host strain, Smith et al. (1985) found four possible targets by screening mutagenized bovine growth hormone (rBGH) secretion strains and reported that mutations in two genes in particular, SSC1 and SSC2, yielded the highest increase, around 15-fold compared with reference strains. Arffman et al. (1990) successfully isolated a strain that could secrete 70-fold more endoglucanase I (EGI) than a reference strain through multiple rounds of mutagenesis and selection.

Screening through yeast surface display system

Yeast surface display is a useful technology for the screening of improved protein expression, and it has been used for selecting high-secretion mutants of tumor necrosis factor receptor (TNFR) (Schweickhardt et al., 2003) and scFv (Starwalt et al., 2003). In yeast surface display, the target protein is bound to the mating agglutinin Aga2p by a pair of disulfide bonds. Then, the fusion is displayed on the surface of the cell by binding to the cell wall protein Aga1p (Huang & Shusta, 2005). Surface display data correlates well with secretion data (Shusta et al., 1999), and the technology can therefore be used for the screening of efficient secretion clones. Wentz & Shusta (2007) performed a genome-wide screening through flow cytometric scan by combining yeast cDNA libraries with yeast surface display and found five gene products that promoted display level of a single-chain T-cell receptor (scTCR), including cell wall proteins (Ccw12p, Cwp2p, and Sed1p), ribosomal protein (Rpp0p), and an ER oxidase (Ero1p).

Omics analysis application for recombinant protein secretion

Genome-wide systems analysis is becoming a very powerful tool to understand the cellular responses to protein production and assess the potential strategies for improving secretion. Bonander et al. (2009) analyzed the transcriptome data of eukaryotic glycerol facilitator (Fsp1) producing strains and showed that tuning BMS1 transcript levels resulted in a change of ribosomal subunit ratio and could be used to optimize yields of functional membrane and soluble protein targets. Gonzalez et al. (2003) used metabolic flux analysis to compare a human superoxide dismutase (SOD) production strain to a wild-type strain and showed that the flux of precursors to amino acids and nucleotides was higher, and the activities of the pentose phosphate (PP) pathway and TCA cycle were lower in the recombinant strain. They demonstrated that, using the growth-associated expression system, ideal conditions for SOD synthesis were either active growth during respirofermentative metabolism or the transition phase from a growing to a nongrowing state. The data indicated that an increase in SOD flux could be achieved using a nongrowth-associated expression system that can eliminate part of the metabolic burden. Recently, we analyzed secretory pathway dysfunction resulting from heterologous production of human insulin precursor or α-amylase in a HAC1-dependent and HAC1-independent manner by transcriptome and flux analysis. This study revealed oxidative radical production caused by a futile cycle of disulfide bond formation and breaking, and it has implications for engineering recombinant protein secretion, such as engineering post-Golgi sorting and balancing protein folding rates and oxidation rates (Tyo et al., 2012).

Besides S. cerevisiae, the systems biology approach was also used to analyze the secretion capability of P. pastoris. The comparison of the transcriptome of a P. pastoris strain producing human trypsinogen with a nonproducing strain revealed a set of secretion helper genes. Thirteen of 524 upregulated genes were selected and the respective S. cerevisiae homologs were cloned and over-expressed in a P. pastoris strain expressing human antibody Fab fragment. Besides five previously characterized secretion helpers (PDI, Ero1p, Sso2p, Kar2p/BiP, and Hac1p), another six proteins proved beneficial for protein production: Bfr2p and Bmh2p, which are involved in protein transport; the chaperones Ssa4p and Sse1p; the vacuolar ATPase subunit Cup5p; and Kin2p, a protein kinase connected to exocytosis (Gasser et al., 2007). Through modeling and measuring intracellular fluxes of secreted recombinant protein in P. pastoris with a 34S procedure, Pfeffer et al. demonstrated that 58% of the protein produced intracellularly was degraded within the cell, 35% was secreted to the exterior, and 7% was inherited by the daughter cells. This study provides insight into the bottlenecks of recombinant protein production and is useful for determining a suitable strategy for secretion improvement (Pfeffer et al., 2012). Although there are not many examples of omics-based cell engineering, as the requirement for advanced cell factory platforms for protein production becomes greater, these systems biology tools will be highly useful in providing a genome-wide understanding of protein production processes and in guiding further rational engineering in yeast. The studies mentioned earlier provide excellent illustrations of the power of systems biology for studying the complex protein secretory pathway.
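
The partition reported by Pfeffer et al. (2012) can be read as a simple mass balance. The sketch below uses only the three fractions quoted in the text (58% degraded, 35% secreted, 7% passed to daughter cells). The function name and the choice to express the result as synthesis required per unit of secreted product are illustrative assumptions, and the numbers apply only to the strain and conditions of that study.

```python
# Fate of intracellularly produced recombinant protein, as fractions of the total.
FATE_FRACTIONS = {"degraded": 0.58, "secreted": 0.35, "inherited": 0.07}


def synthesis_required(secreted_amount, fractions=FATE_FRACTIONS):
    """Total protein that must be synthesized to secrete `secreted_amount`."""
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fate fractions must sum to 1")
    return secreted_amount / fractions["secreted"]


if __name__ == "__main__":
    # Units are arbitrary; only the ratio matters.
    print(round(synthesis_required(1.0), 2))
```

Under this balance, roughly 2.9 units of protein are synthesized for every unit that reaches the medium, which is why intracellular degradation stands out as a candidate bottleneck.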

Conclusions and perspectives

From the discussions above, it is clear that there are many examples where engineering different parts of the protein secretion pathway has resulted in improvement of heterologous protein production by S. cerevisiae. The availability of efficient expression systems and fermentation techniques, combined with the advances in systems and synthetic biology, has secured yeast as an important platform for the production of many proteins.

To obtain higher yields and higher quality proteins, secretion pathway engineering will be further applied to increase the protein secretion capability. Additional studies on quality control mechanism in ER are required to understand the cellular response to protein folding burden.

Still, current engineering strategies are often only successful for a single protein, and they do not result in the establishment of a generally improved cell factory platform for heterologous protein production. Thus, with the objective of establishing such a platform, there is clearly a need for improved knowledge about how the flux through the secretory pathway is controlled by the individual steps in the pathway.

Considering the complexity of protein production and secretion with the involvement of a very large number of components, such knowledge can only be obtained through integrated analysis of the complete system/pathway. Such integrated analysis should preferentially be performed using different engineered strains producing different types of proteins to understand the full spectrum of states the yeast protein production system can express. This kind of study could be carried out by expressing several different types of proteins, ideally ranging from small nonglycosylated proteins like human insulin to more complex proteins such as erythropoietin, which is highly glycosylated and has a large number of disulfide bonds, in many different engineered strains, for example, strains that over-express different foldases and isomerases. Through detailed analysis of these strains grown at different environmental conditions, for example, using different omics techniques and quantitative analysis of the secretion kinetics (e.g. by pulse-chase experiments), it will be possible to establish a large dataset that would allow for advanced correlation analysis. Such correlation analysis could, for example, lead to identification of whether there is a correlation between expression and production for small and simple proteins or whether there is consistently a UPR for more complex proteins, independent of expression strength. Such correlations may lead to a number of hypotheses that can then form the basis for more detailed experiments, for example, on the role of individual proteins (or groups of proteins) in protein synthesis and secretion. Results from these experiments can further be evaluated in the context of specific models for protein synthesis and secretion, and the end result of this kind of study may be a rather detailed mathematical model for these pathways, in analogy with models built for metabolism (Soh et al., 2012).
Besides allowing for quantitative analysis of the role of the different steps in the pathways, such models can be used to guide engineering design of new cell factories (Tyo et al., 2010). Another path often used in metabolic engineering for improved metabolite production is adaptive evolution (Çakar et al., 2012) combined with detailed phenotypic analysis to identify novel metabolic engineering targets, an approach generally referred to as inverse metabolic engineering (Oud et al., 2012).
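
As a starting point for the correlation analysis proposed above, the sketch below computes a Pearson correlation coefficient between expression strength and secreted titer across a set of strains. The numbers are placeholders invented purely to show the calculation; they are not measurements, and the variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values (arbitrary units), one entry per hypothetical strain.
    expression_strength = [1.0, 2.0, 3.0, 4.0, 5.0]
    secreted_titer = [0.8, 1.9, 2.7, 2.9, 3.0]
    print(pearson(expression_strength, secreted_titer))
```

A coefficient close to 1 for simple proteins but not for complex ones would be the kind of pattern that motivates the more detailed experiments described in the text. Saturation effects, such as a titer that plateaus at high expression, are better examined with a rank-based measure or by direct inspection of the data.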

Even though there are already some examples of mathematical models for specific subprocesses, for example, transcription and translation, there are currently no detailed mathematical models for the overall protein production process in yeast. An obvious first step would be to use existing mathematical models for glycosylation in CHO cells (Shelikoff et al., 1996; Umaña & Bailey, 1997; Krambeck & Betenbaugh, 2005) and expand them to predict glycosylation in yeast. By this, we would better understand how both native and heterologous proteins are glycosylated and could use this knowledge to enhance our understanding of late secretory pathway sorting. Furthermore, there are relatively few studies where omics technologies have been used to their full potential to study the global effect on cellular function of, for example, the UPR. Compared with metabolism, where very detailed mathematical models have been set up and are used for designing pathway engineering strategies, much development is needed before similar models can be used for designing novel engineering strategies for improving protein production. Thus, we conclude that even though there are currently very few examples of how systems biology has contributed to both our basic understanding and engineering of protein synthesis and secretion, systems biology has much to offer in this research field.

Acknowledgements

We thank NIH F32 Kirschstein NRSA fellowship, The Knut and Alice Wallenberg Foundation, EU Framework VII project SYSINBIO (Grant no. 212766), European Research Council project INSYSBIO (Grant no. 247013), and the Chalmers Foundation for funding.

Authors’ contribution

J.H., K.E.J.T. and Z.L. contributed equally to this work.
