Correspondence: Jens Nielsen, Department of Chemical and Biological Engineering, Chalmers University of Technology, Kemivägen 10, 412 96 Gothenburg, Sweden. Tel.: +46 31 772 3804; fax: +46 31 772 3801; e-mail: email@example.com
The generation of novel yeast cell factories for production of high-value industrial biotechnological products relies on three metabolic engineering principles: design, construction, and analysis. In the last two decades, strong efforts have been put into developing faster and more efficient strategies and/or technologies for each of these principles. For design and construction, three major strategies are described in this review: (1) rational metabolic engineering; (2) inverse metabolic engineering; and (3) evolutionary strategies. Independent of the selected strategy, the process of designing yeast strains involves five decision points: (1) choice of product, (2) choice of chassis, (3) identification of target genes, (4) regulating the expression level of target genes, and (5) network balancing of the target genes. At the construction level, several molecular biology tools have been developed through the concept of synthetic biology and applied for the generation of novel, engineered yeast strains. For comprehensive and quantitative analysis of constructed strains, systems biology tools are commonly used. Using a multi-omics approach, key information about the biological system can be revealed, for example, through the identification of genetic regulatory mechanisms and competitive pathways, thereby assisting the in silico design of metabolic engineering strategies for improving strain performance. Examples of how systems and synthetic biology have brought yeast metabolic engineering closer to industrial biotechnology are described in this review, and these examples should demonstrate the potential of a systems-level approach for fast and efficient generation of yeast cell factories.
With recent progress in genomics, for example, high-throughput genome sequencing, and in systems biology, for example, genome-scale metabolic models, large data sets can now be generated and analyzed in an integrated and comprehensive way. In combination with robust design/construction tools, such a global molecular-level understanding of cellular processes and functions may allow for the development of synthetic networks and genetic circuits leading to, for example, production of engineered biomolecules or programmable organisms displaying novel biological behaviors. Indeed, systems-level knowledge is providing biotech researchers and product developers with the necessary tools to more rapidly (re)design and (re)construct microbial systems and cellular capabilities for a variety of applications. Examples of industrial biotechnology applications of microorganisms engineered using a systems-level approach include renewable chemicals, for example, commodity, fine and bulk chemicals; biopharmaceuticals, for example, compounds for the treatment of infectious diseases and cancer; biofuels; and food ingredients (reviewed in Nevoigt, 2008; Park et al., 2008; Ruder et al., 2011). In the long run, synthetic minimal cells may emerge as a valuable tool for generating other bioproducts, providing the appropriate chassis to integrate functional synthetic parts and devices with functions that cannot generally be found in nature (Gibson et al., 2010; Zhang et al., 2010). Currently, the only functional strategy is still to re-engineer the metabolism of existing cells.
Retrofitting existing microorganisms for the generation of natural or synthetic products is however complex, requiring high-level understanding of how cellular functions and intracellular molecular interactions perform under external and internal perturbations. In addition, the insertion of heterologous pathways in a microorganism does not imply per se a high-level production of the desired bioproduct (Nielsen & Jewett, 2008). To improve yields and/or productivities for reducing costs of production, knowledge-oriented engineering strategies must be adopted. This may include improving precursor metabolites and cofactor supply, up-regulation of genes involved in product export routes, or redesign/reconstruction of metabolic pathways so that feedback inhibition and competing routes are avoided. Other alternatives include adaptive evolution and protein engineering. Adaptive evolution tries to maximize performance by manipulating entire cellular systems (Hong et al., 2011). By combining genetic variations with the selection of beneficial mutations of evolved microorganisms, microorganisms with improved efficiency or tolerance can be generated. Protein engineering focuses on the optimization of the structure and function of proteins involved in specific biosynthetic pathways, combining and/or modifying genes from a number of organisms as well as designing entirely new synthetic genes (Yu et al., 2006).
A widely used model organism for the generation of novel bioproducts, the study of human diseases, or simply process optimization is the yeast Saccharomyces cerevisiae. Triggered by the publication of its entire genome sequence in 1996 and subsequent advances in disciplines such as genetics and molecular biology, S. cerevisiae has emerged as a platform with enormous potential for R&D. Extensive libraries of genes, metabolites, profiles of RNA transcripts, enzymes, and protein structures have been created. With the aid of detailed mathematical models, the data deposited in such libraries have been gradually analyzed, for example, by transcriptomics, proteomics and metabolomics, and integrated, for example, by interactomics and fluxomics, allowing a more comprehensive knowledge of how networks and genetic components displaying certain biological behaviors are regulated in this organism. Today, a wide range of products is produced by engineered S. cerevisiae, ranging from protein drugs, for example, human insulin, and vaccines, for example, against human papillomavirus and hepatitis, to fine chemicals, for example, sesquiterpenes, commodity chemicals, for example, lactic acid, or biofuels, for example, butanol. A comprehensive list of products produced by engineered S. cerevisiae is presented in Table 1. A promising application of S. cerevisiae is the production of antibodies, an ability commonly associated with other yeast species such as Pichia pastoris (Gerngross, 2005). In the last two decades, triggered by the ease of yeast genetic manipulation and the capacity of S. cerevisiae proteins to undergo post-translational modifications, for example, N- and O-glycosylation, there has been a significant increase in the number of studies targeting the generation of humanized S. cerevisiae N- and O-glycosylation strains (Nakayama et al., 1992; Chigira et al., 2008). Although further developments are still required, these studies demonstrate that S. cerevisiae as a cell factory for the production of antibodies and other human recombinant glycoproteins can indeed become reality in the near future.
Another promising end-product with potential interest to biotechnological industries is the yeast S. cerevisiae per se, nonengineered, that can be used as a probiotic. In a recent study, orally administered S. cerevisiae strain UFMG 905 isolated from ‘cachaça’ production inhibited weight loss and increased survival rate after Salmonella typhimurium (ST) challenge in a murine model of typhoid fever (Martins et al., 2011). By binding to ST, S. cerevisiae impairs the activation of mitogen-activated protein kinases (p38 and JNK) and transcription factors NF-κB and AP-1, signaling pathways involved in the transcriptional activation of proinflammatory mediators. It is evident that improved characterization of probiotic S. cerevisiae strains using systems biology tools will drive further research into characterization of yeast-derived bioactive compounds in many different human tissues, with clear opportunities for further biotechnological innovation.
Table 1. Examples of products generated by engineered Saccharomyces cerevisiae cell factories
Isolation of DTPSs from different plants species via PCR-based methods, cDNA library sequencing, and database screening, coupled to the substitution of native yeast GGPPs and engineered strain with enhanced flux through the mevalonate pathway
Combining coexpression of two heterologous genes from different plant species (4CL5 and HCBT) with the deletion of pad1 and exogenous supply of various combinations of cinnamic acids and anthranilate derivatives
Strain generated for 26 different cinnamoyl anthranilate molecules – PoC
Coupling metabolic engineering (expression of selenocysteine methyltransferase plus high intracellular levels of S-adenosyl-methionine) with bioprocess optimization (fine-tuned carbon- and sulfate-limited fed-batch)
24-fold increase in SeMCys production compared with certified reference material of selenized yeast
The aforementioned applications illustrate the potential of a systems-level approach for reducing R&D time and increasing speed to market of yeast-derived products. However, the development of such yeast cell factories is often neither fast nor efficient. Product- and cell-related bottlenecks, such as product or intermediate toxicity, competing metabolic pathways, cofactor imbalances, transcription factor regulation, or molecular interactions, are commonly responsible for hampering the generation of novel/efficient strains. To tackle these issues, a systems biological approach, where methods and approaches developed by many disciplines such as mathematics, physics, and biology are integrated in a rational and comprehensive way, must join forces with synthetic biology with its capacity to synthetically create whole systems and circuits. In this review, we provide an overview of the different developmental phases involved in the generation of yeast cell factories (Fig. 1) as well as a description of tools, techniques, methods, and strategies normally used in each of those phases.
Developmental phases of metabolic engineering yeast cell factories
Phase I – the design
The design of yeast cell factories can be accomplished using: (1) rational metabolic engineering – the construction of a microorganism producing a desired product by genetic engineering based on physiological, biochemical, and genetic information (Tavares et al., 2011); and (2) inverse metabolic engineering – initial selection of cellular systems with phenotypes similar to the desired one followed by comparative analysis for the identification of genetic differences among selected systems and verification of the potential of identified target genes for generating the desired phenotype in genetically engineered strains (Lee et al., 2011). A classic example of metabolic engineering applied to strain generation is the production of resveratrol, an antioxidant known to display cancer chemopreventive activity and to reduce the risk of coronary heart diseases, in S. cerevisiae (Becker et al., 2003). By coupling co-expression of two heterologous genes (coenzyme-A ligase and grapevine resveratrol synthase) in S. cerevisiae laboratory strains with p-coumaric acid medium supplementation, Becker and coworkers developed a yeast strain with the capacity to produce resveratrol. A second example is the production of ethanol as a biofuel in S. cerevisiae (Guadalupe Medina et al., 2010). By combining strain engineering (gene deletions and integration) with medium optimization, Guadalupe Medina et al. were able to generate a yeast strain capable of eliminating glycerol production, a major byproduct of fermentation, and partially converting acetate, an inhibitor of yeast performance in lignocellulosic hydrolysates, to ethanol.
Adaptive evolution is an alternative to metabolic engineering strategies (Fig. 2). Using indirect evolution, for example, evolutionary engineering and random mutagenesis, or directed evolution, for example, recombination and shuffling of genes, pathways or even whole cells, strains with a desired phenotype can be generated (Sauer, 2001). The underlying concept of indirect adaptive evolution is that microorganisms under intracellular or extracellular stimuli tend to evolve their intrinsic characteristics, for example, cell functions and metabolism, to adapt to the non-natural conditions. During these evolution processes, random genetic mutations are prone to occur, and if the right selection pressure is applied, there is continuous selection for beneficial mutations. Directed evolution, on the other hand, tries to mimic natural evolution in vitro by operating at the molecular level and focusing on specific product-related metabolic pathways. This method requires a mechanism for introducing genetic variations, for example, DNA shuffling, as well as a powerful selection/screening method to select for the improved strain (Liu et al., 2011; Tyo et al., 2011; Wang et al., 2011). The key advantage in using adaptive evolution approaches is that one does not need to understand the underlying molecular mechanisms for a desirable phenotype to improve it. Combined with global engineering strategies, adaptive evolution can take advantage of recent advances in systems and synthetic biology to close the gap between genotype and phenotype, thereby allowing for the identification of novel metabolic engineering targets and a resulting faster design of yeast cell factories (Young et al., 2010). Many successful examples of tuning the phenotype of a microorganism are reported in the literature, including the adaptation of S. cerevisiae to high concentrations of acetic acid (Wright et al., 2011), to multiple stresses (Cakar et al., 2005), to growth on a specific carbon source (Liu & Hu, 2010), or to improved growth on galactose (Hong et al., 2011).
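The mutate-and-select logic that underlies the evolutionary strategies described above can be illustrated with a deliberately simplified sketch. It is not a model of any of the cited experiments: the nucleotide alphabet, the `target` string used as a stand-in for selection pressure, and all parameter values are assumptions chosen only for illustration.

```python
import random


def evolve(target, generations=200, pop_size=20, mutation_rate=0.05, seed=0):
    """Toy mutate-and-select loop (hypothetical parameters throughout).

    Fitness is the number of positions matching `target`, which here
    stands in for whatever selection pressure is applied in the lab.
    Returns the best fitness observed in each generation.
    """
    rng = random.Random(seed)
    alphabet = "ACGT"

    def fitness(seq):
        return sum(a == b for a, b in zip(seq, target))

    def mutate(seq):
        # Each position is independently replaced with a random base.
        return "".join(
            rng.choice(alphabet) if rng.random() < mutation_rate else base
            for base in seq
        )

    population = [
        "".join(rng.choice(alphabet) for _ in target) for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        # Selection: rank by fitness and keep the better half unchanged.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]
        # Variation: refill the population with mutated copies of survivors.
        offspring = [
            mutate(rng.choice(survivors))
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + offspring
    return history
```

Because the best individuals are carried over unchanged, the recorded best fitness can never decrease from one generation to the next. This mirrors the point made in the text that beneficial mutations accumulate under a suitable selection pressure without any knowledge of the underlying mechanism.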
The workflow for designing product producing strains based on the technologies described previously involves five key decision points (Fig. 1). The first decision is the type of product to be synthesized. This will influence the second decision point, the choice of chassis to be used for producing the desired product. This selection is based on information such as the biosynthetic capability of the host, response of the host strain to substrates, intermediates and product, accessibility of genetic tools, and fermentation process accessibility (Keasling, 2010). The second decision point involves verifying whether the chosen chassis can naturally produce the desired product. If this is the case, one can proceed to step three, but if it cannot, pathway engineering strategies, for example, expression of heterologous enzymes, must be developed for establishing product formation. It can also well be that the chosen chassis can produce the desired product, but it does so via a pathway that results in low yield and one may then still include reconstruction of a heterologous pathway in the chassis chosen for production. With recent progress in bioinformatics, metabolic pathway design can now be performed in silico using tools such as the Biochemical Network Integrated Computational Explorer (BNICE) framework (Hatzimanikatis et al., 2005), thus aiding the selection of appropriate genes for reconstructing a biosynthetic pathway. An example of reconstruction of a heterologous pathway is the production of the antimalarial drug precursor artemisinic acid in S. cerevisiae. By engineering the mevalonate pathway concomitantly with introducing amorphadiene synthase and a novel cytochrome P450 monooxygenase from Artemisia annua, Ro and coworkers generated strains able to produce artemisinic acid at titers up to 100 mg L−1 (Ro et al., 2006). The third decision point involves identification of which genes to modify so that the desired product can be produced in high yield and rates. 
These target genes can be classified into genes that have a positive or a negative effect on product formation. Positive effect genes include, for example, the biosynthetic genes leading to the desired product and its exporter, substrate uptake genes, and stress response genes. This class of genes can be of endogenous, exogenous, or de novo source (Prather & Martin, 2008). Negative effect genes include genes leading to product degradation, genes encoding enzymes that compete for intermediates, and product importer genes. Here, optimization algorithms such as OptKnock (Burgard et al., 2003), OptGene (Patil et al., 2005), or OptForce (Ranganathan et al., 2010) can be applied to genome-scale metabolic models to identify metabolic engineering targets in silico. Examples of practical applications of this approach are given further below. The fourth decision point is how to control the quantity and quality of target gene(s). At the quantity level, numerous methods can be used to control gene expression levels. For example, gene copy number control can be achieved using plasmid or chromosomal integration or deletion. Transcription or translation level control can be performed via promoter modification, 5′-UTR modifications and 3′-UTR modifications including RNA riboswitches, codon usage, and translation level alteration. At the quality level, control of enzyme activity levels can be achieved using protein engineering (see section below for more details on protein engineering). The fifth and final decision point is to fine-tune the product biosynthetic network to achieve maximum yields and productivities (Klein-Marcuschamer et al., 2010).
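The idea behind in silico identification of knockout targets (third decision point) can be sketched with a toy example. The sketch below does not implement OptKnock, OptGene, or OptForce, and it does not use a genome-scale model or linear programming. Instead, it assumes a hypothetical list of three metabolic routes with made-up biomass and product yields, and it only reproduces the two-level reasoning: the cell is assumed to choose the feasible route with the highest biomass yield, and the designer searches for the deletion set under which that choice also gives the most product.

```python
from itertools import combinations

# Hypothetical routes: name -> (reactions used, biomass yield, product yield).
# All names and numbers are illustrative assumptions, not measured values.
MODES = {
    "respiration": ({"glycolysis", "tca"}, 0.5, 0.0),
    "byproduct_fermentation": ({"glycolysis", "byproduct_branch"}, 0.3, 0.0),
    "product_coupled": ({"glycolysis", "product_branch"}, 0.2, 0.6),
}
# Reactions the designer is allowed to delete (glycolysis is kept).
CANDIDATES = ["tca", "byproduct_branch", "product_branch"]


def cell_response(knockouts):
    """Inner problem: the cell picks the feasible route with maximal biomass.

    Ties are resolved pessimistically (lowest product yield).
    Returns (biomass yield, product yield); (0.0, 0.0) if nothing is feasible.
    """
    feasible = [
        (biomass, product)
        for reactions, biomass, product in MODES.values()
        if not (reactions & knockouts)
    ]
    if not feasible:
        return (0.0, 0.0)
    best_biomass = max(biomass for biomass, _ in feasible)
    worst_product = min(p for b, p in feasible if b == best_biomass)
    return (best_biomass, worst_product)


def best_design(max_knockouts=2):
    """Outer problem: enumerate deletion sets and keep the most productive one."""
    best = (frozenset(), cell_response(frozenset())[1])
    for k in range(1, max_knockouts + 1):
        for combo in combinations(CANDIDATES, k):
            knockouts = frozenset(combo)
            biomass, product = cell_response(knockouts)
            # Only designs that still allow growth are accepted.
            if biomass > 0 and product > best[1]:
                best = (knockouts, product)
    return best
```

In this toy network, deleting only one competing route leaves the cell with another high-growth, product-free route, so product formation becomes growth-coupled only when both competing routes are removed. Real applications replace the exhaustive enumeration with optimization over a genome-scale stoichiometric model.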
Phase II – the construction
Selection of target gene(s)
The main target genes related to the biosynthesis of a valuable product are selected from endogenous (host cell), exogenous (a different organism), or synthetic (entirely designed or resulting from natural or artificial mutagenesis) sources. These target genes can be modified by over-expression, attenuation, or deletion. Recently, there have been many attempts to create artificial enzymes (Prather & Martin, 2008). For example, 3-hydroxypropionic acid (3-HP) producing strains were constructed using different synthetic pathways containing synthetic enzymes: (1) a β-alanine intermediate pathway using l-alanine-2,3-mutase, an enzyme capable of converting l-alanine to β-alanine that does not exist in nature and was developed from l-lysine-2,3-mutase by protein engineering (Jessen et al., 2008); and (2) an oxaloacetate intermediate pathway using a broad substrate range 2-ketoacid decarboxylase, CoA-dependent oxaloacetate dehydrogenase, or malate decarboxylase (Burk & Osterhout, 2010).
Expression level – regulation of target gene
The success of metabolic engineering strategies relies partially on the capacity to accurately control the expression levels of genes directly involved in the metabolic pathway leading to a desired product. To achieve different degrees of controllability, numerous approaches can be used (Table 2). The simplest one is to regulate/control gene copy number. This can be achieved by introducing a plasmid harboring a positive effect target gene, by integrating one or several additional copies of positive effect target genes into the chromosome, or by deleting negative effect target genes from the chromosome. Alternative methods for regulating target gene expression levels include the following: (1) modification of the promoters of target genes or their cognate binding proteins; (2) alteration of 5′-nontranslated and 3′-nontranslated regions; and (3) development of riboregulators. In the first approach, the control of gene expression relies on the identification or design of optimal promoters. For this, promoter libraries, which led to medium to large data sets on gene expression levels in yeast, have been created and screened using either native (Partow et al., 2010; Tochigi et al., 2010) or synthetic promoters (Jeppsson et al., 2003; Nevoigt et al., 2006; Hartner et al., 2008). In the second approach, regulation by 5′-UTRs and 3′-UTRs is mediated by binding of specific RNA-binding proteins to nucleotide motifs located in the 5′-UTRs and 3′-UTRs or by interactions between sequence elements located in the UTRs and specific complementary noncoding RNAs, i.e. riboregulators. For example, in a recent study, reporter protein expression was down-regulated in S. cerevisiae by integrating a synthetic tetracycline binding aptamer into the 5′-UTR of the reporter genes (Kotter et al., 2009). Another remarkable example is the use of RNA-based synthetic control modules, integrated in the 3′-UTR of the target gene, that are based on the hairpin substrate of the RNase III Rnt1. 
These recognition sites were modulated by either random mutagenesis (Babiskin & Smolke, 2011a) or introduction of RNA-aptamers (Babiskin & Smolke, 2011b). This approach could be employed as a useful gene expression regulation tool for yeast metabolic engineering.
Table 2. Examples of genetic engineering tools for metabolic engineering in yeast
Yeast centromeric plasmids containing a centromere sequence (CEN6), an autonomously replicating sequence (ARSH4), one of four yeast selectable auxotrophic marker genes (HIS3, TRP1, LEU2, or URA3), and an antibiotic resistance gene
High-efficiency (70–100%) one-step assembly of a functional combined D-xylose utilization and zeaxanthin biosynthesis pathway (~19 kb consisting of eight genes) either on a plasmid or on a yeast chromosome
In Pichia pastoris, novel short synthetic promoter was developed from a synthetic promoter library, which was constructed by deletion and duplication of putative transcription factor-binding sites within the AOX1 promoter sequence
Fusion protein variants generated by coupling yeast's farnesyl diphosphate synthase (FPPS) with patchoulol synthase (PTS) of plant origin (Pogostemon cablin) in S. cerevisiae increased the production of patchoulol up to 2-fold
Riboregulators, such as antisense RNA, micro RNA (miRNA), short interfering RNAs (siRNAs), riboswitches and ribozymes, of both prokaryote and eukaryote origin have been proven to be efficient in regulating gene expression (Isaacs et al., 2006). In addition, making use of their regulatory capacities, researchers have been using riboregulators as synthetic parts and devices for conditional gene expression systems (Suess & Weigand, 2008; Saito & Inoue, 2009). For example, gene regulation has been achieved using antisense RNA (Nasr et al., 1995; Olsson et al., 1997), riboregulators that represent ligand-dependent RNA-encoded genetic control elements (Bayer & Smolke, 2005), and ribozymes (i.e. RNA enzymes) (Atkins & Gerlach, 1994).
In metabolic engineering, specific biochemical reactions can be improved by mutation of a target protein. This can be achieved using protein engineering, which may result in increased enzyme activity, activity on non-natural substrates, enhanced thermostability, tolerance toward organic solvents, enantioselectivity or prevention of substrate/product inhibition (Luetz et al., 2008). Protein engineering is commonly performed using two approaches (Tang & Zhao, 2009): (1) rational design performed by site-directed mutagenesis based on existing knowledge about the structure and kinetic properties of the target enzyme and computational design (Saven, 2011); and (2) directed evolution achieved through random mutagenesis using error-prone PCR or DNA shuffling, for example (Labrou, 2010). Examples of protein engineering for S. cerevisiae strain improvement include the following: (1) the generation of a novel NADP+-dependent xylitol dehydrogenase via site-directed mutagenesis with improved thermostability and catalytic efficiency for ethanol production (Watanabe et al., 2007); and (2) improved lactate production in S. cerevisiae via site-directed mutagenesis of the Lactobacillus plantarum lactate dehydrogenase gene (Branduardi et al., 2006).
Network fine-tuning – expression level of target gene
In metabolic engineering strategies, fine-tuning the expression level of multiple genes is critical. Synchronous and optimal expression of multiple genes in S. cerevisiae can be achieved using a multiple gene promoter shuffling (MGPS) strategy consisting of (1) promoter selection from a promoter library; (2) fusion of selected promoters to multiple target genes; and (3) combination of different promoter-gene pairs (Lu & Jeffries, 2007). Using MGPS, Lu and coworkers shuffled the promoters of GND2 and HXK2 with the genes for transaldolase, transketolase, and pyruvate kinase and generated an ethanol-producing strain for growth on xylose. An alternative approach to MGPS is global transcription machinery engineering (gTME). In gTME, a mutant library of a natural transcription factor is generated and expressed in the host, followed by screening of the resulting strain library for the desired phenotype. The selected transcription factor can subsequently be used for metabolic engineering (Alper et al., 2006). Another alternative is zinc-finger protein library engineering. In this strategy, a synthetic Cys2-His2 zinc-finger motif library is generated and screened, and the optimal motif is used to control the expression of a target gene (Papworth et al., 2006). It has also been shown that internal ribosome entry sites (IRESs) have a certain potential to allow expression of multiple genes simultaneously in yeast (Seino et al., 2005; Xia & Holcik, 2009). IRESs are found in mammals and certain viruses and mediate cap-independent translation in eukaryotes (Jackson et al., 2010). Thompson et al. (2001) inserted an IRES sequence from the genome of a cricket paralysis virus into the intergenic region of a dicistronic mRNA consisting of LEU2 and URA3 in S. cerevisiae and succeeded in efficiently translating the second (URA3) cistron. This finding allows the use of IRESs as synthetic expression regulation parts in yeast. 
An alternative approach for redirecting/optimizing fluxes toward a desired product is to modulate the spatial organization of pathway enzymes, which may prevent the loss of intermediates by diffusion, degradation, or conversion through competitive pathways (Albertsen et al., 2011). Recently, the Keasling group introduced the protein scaffold approach (Dueber et al., 2009). Using protein–protein interaction domains and ligands from metazoan cells, protein scaffolds carrying three mevalonate biosynthetic enzymes were constructed. This allowed for optimization of local enzyme ratios and led to an improvement in product formation even at low enzyme expression levels, thus reducing the metabolic load. Another approach is direct fusion of pathway enzymes (Albertsen et al., 2011). As a model system, several fusion protein variants in which farnesyl diphosphate synthase of yeast was coupled with patchoulol synthase of plant origin (Pogostemon cablin) were constructed and applied to increase the production of patchoulol.
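The combinatorial step of the MGPS strategy described above, in which every target gene is paired with every selected promoter, can be sketched as a simple enumeration. The promoter and gene labels below are taken from the example in the text; the function name and the data layout are assumptions made for illustration, and the sketch says nothing about how the resulting constructs are built or screened.

```python
from itertools import product

# Promoters and target genes as named in the text (labels only).
PROMOTERS = ["GND2 promoter", "HXK2 promoter"]
GENES = ["transaldolase", "transketolase", "pyruvate kinase"]


def promoter_gene_designs(promoters, genes):
    """Enumerate every assignment of one promoter to each target gene.

    Returns a list of dicts mapping gene -> promoter. The number of
    designs is len(promoters) ** len(genes).
    """
    return [
        dict(zip(genes, combo))
        for combo in product(promoters, repeat=len(genes))
    ]


designs = promoter_gene_designs(PROMOTERS, GENES)
```

With two promoters and three genes the design space contains 2^3 = 8 promoter-gene combinations. The count grows exponentially with the number of genes, which is why a screening or selection step is needed to find the best-performing combination.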
Phase III – the analysis
To address the function or dysfunction of engineered microorganisms in a global quantitative way, tools from systems biology can be valuable. In the following, the usefulness of ‘omics’ technologies will be addressed, including the description of some useful methods/techniques as well as examples of applications of ‘omics’ for strain engineering/improvement (see Fig. 2).
Genomics combines three areas of science and technology: genetics, high-throughput analytical tools, and bioinformatics. Together, they allow the study of all genes of a cell, identification of entire DNA sequences and analysis of function/interaction of genes within the entire genome's network, for example, epistasis. Since the 1980s when Fred Sanger sequenced for the first time a complete genome of a virus and a mitochondrion, numerous techniques/methods for efficient and accurate gene identification, genome sequencing and mapping, data storage and analysis have been developed. From the analysis of single/specific genes, for example, PCR, one has progressed to DNA sequencing techniques such as the ‘Sanger method’ (Sanger et al., 1977) and more recently to high-throughput DNA sequencing techniques, for example, massively parallel signature sequencing (Brenner et al., 2000) and pyrosequencing (Albert et al., 2007). A classical application of genomics is the identification of mutations arising during yeast evolution experiments (Kvitek & Sherlock, 2011). Using whole-genome sequencing, Kvitek et al. were able to detect all single-nucleotide polymorphisms (SNPs), insertions, deletions, and copy number variants in evolved strains under glucose-limited conditions. Combined with competitive fitness assays, these findings allowed the identification of the mutations individually responsible for yeast adaptation as well as the impact of intergenic interactions on the adaptation process. A second example reports the application of a chemogenomics approach for the identification of novel engineering targets for increased tolerance of yeast cells toward ethanol (Teixeira et al., 2009). By combining genome-wide screening with clustering analysis, Teixeira et al. identified 254 genes important for ethanol tolerance, including those involved in vacuolar, peroxisomal, and vesicular transport, mitochondrial function, protein sorting, and aromatic amino acid metabolism. 
Of these 254 genes, 18 were considered essential genes for the increase in ethanol tolerance (e.g. FPS1 encoding the plasma membrane aquaglyceroporin). Using comparative genomics, it is possible to identify potential genome structure and function correlations among several yeast species or strains. For gene identification, techniques such as sequencing of expressed sequence tags, serial analysis of gene expression and hybridization to microarrays are commonly used (reviewed in Moody, 2001). Using pairwise or multiple alignment computer programs, for example, RAPYD (Schneider et al., 2011), it is possible to extract key features from sequences and identify promoter regions, regulatory regions, domain information, or SNPs. A recent study demonstrates the usefulness of comparative genomics for the generation of triterpene high-producer yeast strains (Madsen et al., 2011). The authors based their analysis on previous work that showed how whole-genome sequencing and comparative genome analysis for identifying SNPs between two S. cerevisiae strains (CEN.PK113-7D and S288C) could be used to explain phenotypic differences between the strains (Otero et al., 2010). Several pathways with a significant number of SNPs were identified, including the ergosterol biosynthetic and fatty acid metabolic pathways. Based on these findings, Madsen et al. were able to construct seven yeast mutants engineered to enhance carbon flux through the mevalonate pathway and accumulate high levels of β-amyrin. Recently, we combined genome sequencing, transcriptome and metabolome analysis for the characterization of yeast strains adapted to grow faster on galactose, and thereby identified novel metabolic engineering targets (Hong et al., 2011). 
The study showed the value of analyzing several different mutants, as this allowed for the identification of consensus mutations, in this case in the RAS2 gene, and it was shown that one mutation, RAS2Tyr112, was significantly contributing to the improved galactose uptake in the mutated strains.
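The core operation in the comparative analyses described above, finding single-nucleotide differences between two strains, can be illustrated on a pair of already aligned sequences. This is a minimal sketch under strong assumptions: the input sequences are hypothetical, they are assumed to be pre-aligned and of equal length, and gap characters are simply skipped. Real SNP calling from sequencing reads additionally involves read mapping, quality filtering, and statistical genotyping, none of which is shown here.

```python
def find_snps(reference, sample):
    """Report single-nucleotide differences between two aligned sequences.

    Both inputs must have the same length (i.e. be pre-aligned).
    Positions where either sequence has a gap ('-') are ignored, since
    those are insertions/deletions rather than SNPs.
    Returns a list of (1-based position, reference base, sample base).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for index, (ref_base, alt_base) in enumerate(zip(reference, sample)):
        if "-" in (ref_base, alt_base):
            continue
        if ref_base != alt_base:
            snps.append((index + 1, ref_base, alt_base))
    return snps
```

For example, calling `find_snps("ATGGCC", "ATGACC")` on these two made-up fragments reports a single G-to-A difference at position 4. Mapping such differences onto annotated genes and pathways is what allows SNPs to be linked to phenotypic differences between strains.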
Unlike the genome, the transcriptome is extremely dynamic; studying the transcriptome is, therefore, a comprehensive way of assessing gene expression patterns through the quantification of all RNA molecules. There are several methods/techniques for transcriptome analysis, and they include among others genome tiling arrays, alternative splicing arrays like cross-linking immunoprecipitation (CLIP), RNA-tag sequencing like SAGE (Velculescu et al., 1995), whole RNA sequencing and gene expression arrays, for example, Affymetrix or Agilent (Canelas et al., 2010). For example, Hanlon and coworkers used a modified chromatin immunoprecipitation (ChIP) procedure with DNA microarray analysis (ChIP-chip) to identify the mechanisms by which Tup1, a transcriptional regulator, regulates gene expression in S. cerevisiae under changing macro-environments (Hanlon et al., 2011). They were able to identify four novel candidate cofactors (Cin5, Skn7, Phd1, and Yap6) that interact with Tup1 and therefore modulate the cellular response to a variety of stress conditions. This work paves the way for the identification of novel transcription factor interactions and regulatory mechanisms. UV cross-linking methods such as CLIP are commonly used to map key RNA-protein interaction sites with impact on cellular regulation and phenotype. In a recent study, Wolf and coworkers coupled CLIP with Illumina sequencing to understand the role of Khd1, an RNA-binding protein, in the transition from yeast shape to filamentous growth (Wolf et al., 2010). They discovered that Khd1 regulates both transcription and translation of FLO11, which encodes a cell wall protein essential for filamentous growth, by acting at the mRNA level. The translation of FLO11 is regulated via binding of Khd1 to repeated sequences in the open reading frame of FLO11 mRNA, while the transcription is regulated indirectly via the repression of ASH1 mRNA. 
RNA deep sequencing (RNA-seq) on platforms such as Illumina and SOLiD is commonly used for high-throughput transcriptome analysis (further details on these and other RNA sequencing techniques are reviewed in Wang et al., 2009). In a recent study, Smith and coworkers used Illumina sequencing to understand the role of Dis3, a catalytically active RNase associated with the yeast exosome core, in RNA metabolism (for example, turnover of mRNA, rRNA or tRNA), cell cycle progression, and microtubule localization and structure in S. cerevisiae (Smith et al., 2011). They found that the levels of cell cycle- and microtubule-related transcripts in DIS3 mutant strains differed markedly from those in wild-type strains. In addition, it was shown for the first time that Dis3, Rrp6 (an RNase associated with yeast nuclear-localized exosome subunits and complexes) and exosome subunit localization/interaction/function are intrinsically connected. However, the molecular mechanisms underlying this link are still to be fully understood. In another study, Illumina and SOLiD sequencing were used to elucidate the role of noncoding RNAs (ncRNAs) in gene regulation in S. cerevisiae (van Dijk et al., 2011). Through transcriptome-wide analysis, these authors identified a novel class of Xrn1-sensitive unstable transcripts with a strong impact on the regulation of gene expression in S. cerevisiae and, potentially, on post-transcriptional control. Regarding gene expression arrays, our group has recently compared different transcriptome platforms and found good consistency between them, showing that gene expression arrays give very reliable data (Canelas et al., 2010).
For transcriptome data mining, statistical significance tests (Ndukum et al., 2011), gene set enrichment analysis (Subramanian et al., 2005), and cluster analysis (Sampaio et al., 2011), among many other methods, can be used. Software packages such as the BioMet Toolbox (Cvijovic et al., 2010) allow not only statistical analysis of transcriptome-wide data but also visualization of key features extracted from such analyses.
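As an illustration of the kind of statistical test used in such data mining, the sketch below computes a hypergeometric over-representation p-value for a gene set among differentially expressed genes. This is a simpler test than the rank-based gene set enrichment analysis of Subramanian et al. (2005), and the function name and all gene counts are hypothetical values chosen only for illustration.

```python
from math import comb


def enrichment_p_value(n_genome, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_genome   : total number of genes measured
    n_in_set   : genes annotated to the gene set of interest
    n_selected : genes called differentially expressed
    n_overlap  : differentially expressed genes that belong to the set
    """
    total = comb(n_genome, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, i) * comb(n_genome - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 6000 genes, a 50-gene set, 200 selected genes, 8 in the set.
p_value = enrichment_p_value(6000, 50, 200, 8)
print(f"over-representation p-value: {p_value:.3g}")
```

In practice, one such p-value is computed per gene set and the resulting list is corrected for multiple testing before interpretation.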
Owing to specific regulatory mechanisms unrelated to transcriptional control, for example, translation, post-translational modifications and protein degradation, transcript levels and protein levels are often poorly correlated (Olivares-Hernández et al., 2010; Foss et al., 2011; Olivares-Hernandez et al., 2011; Straub, 2011). Two recent studies from our group have shed some light on possible mechanisms responsible for such variation (Olivares-Hernández et al., 2010; Olivares-Hernandez et al., 2011). In the first study, published experimental proteome and transcriptome data were analyzed, clustered and then correlated to find patterns linking the function of a gene with its post-transcriptional regulation. Using an in-house categorization method and integrating genome-wide information, we observed that translational regulation is gene-specific, i.e. transcript and protein levels correlate well in genes having similar cellular functions. In a follow-up study, the impact of translation efficiency on protein-mRNA level variation was investigated. We found that protein-mRNA correlation is highly dependent on gene codon composition, i.e. genes with similar codon frequencies show similar protein and mRNA levels. These findings clearly highlight the value of a global analysis of the proteome rather than solely a ‘transcript-centric’ view of the cell for a precise quantification/localization of proteins as well as for improved understanding of protein functions and state modifications, interaction networks and expression regulation patterns. In research areas such as metabolic engineering, where performance-hampering problems, for example, in the abundance of proteins and enzyme activities, are prone to occur as a consequence of the modifications introduced to the cell, the application of proteomics has proven to be extremely helpful (Redding-Johanson et al., 2011).
By monitoring all pathway intermediates and not only the final product, proteomics can help to identify limiting steps in the global metabolic network, thus providing key information for rerouting pathways toward optimized product generation. A recent study highlighted the potential of proteomics for unraveling the importance of protein complexes in cellular functions (Lee et al., 2011). By applying a new optimization method for protein complexome analysis to the Munich Information Center for Protein Sequences database (Ruepp et al., 2004), Lee and coworkers were able to assign abundance and biological functions to protein complexes as well as previously unknown abundance and functions to reported proteins. In two recent studies, global quantitative S. cerevisiae proteome methods were compared (Usaite et al., 2008) and implemented (Zhang et al., 2011b) to understand cellular functions such as nutrient sensing and metabolic pathway coordination. In the first, two quantitative approaches (spectral counting and stable isotope labeling) were combined with online multidimensional fractionation and tandem mass spectrometry and evaluated regarding their potential for identifying protein expression differences between engineered and wild-type strains. Despite presenting distinct benefits, for example, in sensitivity and reproducibility, both methods were equally effective in finding, within the strains analyzed, the proteins with significantly different expression levels (Usaite et al., 2008). By combining quantitative proteomics with transcriptome and metabolome analysis, a detailed network map was reconstructed for the key protein kinase Snf1 (Usaite et al., 2009), and such a map may be very useful for identifying targets for metabolic engineering. In the other study, a systems-level approach was used to obtain key information on how nutrient-limited conditions impact the interaction of Snf1 and TORC1, two key nutrient sensing pathways.
By combining phospho-proteomics technology with transcriptome and metabolome data, we were able to map the interaction between these two kinases and identify mechanisms through which the Snf1-TORC1 interaction regulates nutrient sensing and metabolic pathways (Zhang et al., 2011a, b).
Metabolomics involves comprehensive quantitative analysis of all measurable intracellular and extracellular metabolites (e.g. carbohydrates, fatty acids, and amino acids) and their changes over time under given genetic/environmental perturbations. Using this information, one can identify key regulatory nodes in the cellular metabolic network and, further on, characterize cellular functions, for example, gene, transcript, and protein abundance. The analysis of the microbial metabolome consists of three steps: (1) metabolome sample preparation which includes cell quenching, metabolome extraction, and concentration; (2) qualification and quantification of the metabolome; and (3) data analysis and interpretation (Reaves & Rabinowitz, 2011).
In metabolome sample preparation, it is important to use approaches that allow rapid quenching of enzymatic activity, separation of intra- and extra-metabolome, and extraction of the complete metabolome (van Gulik, 2010). For this, rapid sampling and quenching techniques have been developed in yeast (Weibel et al., 1974; de Koning & van Dam, 1992; Theobald et al., 1993; Larsson & Tornkvist, 1996; Gonzalez et al., 1997; Lange et al., 2001; Mashego et al., 2003, 2006; Bolten & Wittmann, 2008; Canelas et al., 2008b). Using a GC-MS-based analysis of metabolites (Villas-Boas et al., 2005b), we performed a comparative analysis of six different methods for extraction of metabolites and found that there are quite large variations in terms of recovery of different metabolite classes using the different methods (Villas-Boas et al., 2005a).
Qualification and quantification of the metabolome have been performed using different methods. The detection technique to use is highly dependent on the number of metabolites to analyze, their complexity, for example, volatile metabolites, and the required accuracy. Historically, metabolite quantification relied either on spectrophotometric assays (detection of single molecules) or on simple chromatographic separation techniques (detection of molecules in mixtures of low complexity). Over the past decade, advanced methods for analyzing highly complex mixtures of compounds with high accuracy and sensitivity have been established. They consist, in most cases, of combinations of two technologies: chromatographic techniques that allow an initial separation of extracts and spectrometry-based techniques, for example, LC-MS (Zhou et al., 2011), GC-MS (Garcia & Barbas, 2011), CE-MS (Ramautar et al., 2011), NMR (Zhang et al., 2011a, b) and MALDI-MS (Shepherd et al., 2011). The analysis and interpretation of metabolome data can be performed using chemometric approaches or targeted profiling. The former looks at all metabolites simultaneously, identifying and quantifying specific compounds and clustering them into specific categories or conditions. For that, multivariate analyses such as cluster analysis, principal component analysis or partial least-squares are commonly used (Allen et al., 2003; Henschke et al., 2006; Rellini et al., 2009). In targeted profiling, metabolite identification and quantification are achieved by comparing the spectrum of interest to a library of reference spectra of pure compounds.
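The library comparison used in targeted profiling can be sketched as follows. This is a minimal illustration that assumes spectra have already been binned onto a common axis and that cosine similarity is used as the matching score; the compound names, intensities, and function names are hypothetical, and real tools typically add peak alignment, weighting, and score thresholds.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equally binned intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, library):
    """Return (name, score) of the reference spectrum most similar to the query."""
    scored = ((name, cosine_similarity(query, ref)) for name, ref in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical reference spectra of pure compounds (four intensity bins each).
reference_library = {
    "compound_A": [0.0, 10.0, 0.0, 5.0],
    "compound_B": [8.0, 0.0, 2.0, 0.0],
}
# Hypothetical measured spectrum: roughly compound_A at twice the intensity plus noise.
measured = [0.0, 20.0, 1.0, 10.0]

name, score = best_match(measured, reference_library)
print(name, round(score, 3))
```

Because the score is scale-invariant, it supports identification only; quantification still requires relating the measured intensity to that of a standard of known concentration.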
To reduce inaccuracy in metabolome analysis, isotope dilution mass spectrometry (IDMS) was developed (Bowers et al., 1993). Key features of IDMS are that it allows the use of species-specific internal standards and compensates for metabolite loss during sample preparation, leading to higher accuracy and reproducibility (Yang et al., 2004). IDMS has been applied to analyze methionine and seleno-methionine in S. cerevisiae (Goenaga Infante et al., 2008). It was further employed to quantify intermediates of the nonoxidative pentose phosphate pathway, including the epimers ribulose-5-phosphate and xylulose-5-phosphate as well as erythrose-4-phosphate and glyceraldehyde-3-phosphate, in S. cerevisiae in steady-state continuous cultures under pulses of glucose (Cipollina et al., 2009).
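The reason an isotope-labeled internal standard compensates for sample loss can be shown with a short calculation. The sketch below assumes that a known amount of labeled standard is added before extraction, that analyte and standard are lost in the same proportion, and that their detector responses are related by a single response factor; the function name and all numbers are hypothetical.

```python
def idms_concentration(analyte_signal, standard_signal, standard_conc, response_factor=1.0):
    """Estimate analyte concentration from the analyte/labeled-standard signal ratio.

    response_factor is the assumed ratio of analyte to standard detector response
    at equal concentration (1.0 means identical response).
    """
    if standard_signal <= 0 or response_factor <= 0:
        raise ValueError("standard_signal and response_factor must be positive")
    return (analyte_signal / standard_signal) * standard_conc / response_factor


# Hypothetical peak areas and spike concentration (arbitrary units).
full_recovery = idms_concentration(200.0, 100.0, 5.0)

# Losing 40% of the extract scales both signals equally, so the ratio is preserved.
partial_recovery = idms_concentration(200.0 * 0.6, 100.0 * 0.6, 5.0)

print(full_recovery, partial_recovery)
```

An external calibration based on the analyte signal alone would, under the same loss, underestimate the concentration by the lost fraction.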
Metabolome analysis in S. cerevisiae can also provide reliable, quantitative estimates of the NAD/NADH ratio (Canelas et al., 2008a). In that study, a bacterial mannitol-1-phosphate 5-dehydrogenase was introduced as a sensor reaction, and the ratio of fructose-6-phosphate to mannitol-1-phosphate was measured to determine the cytosolic free NAD/NADH ratio. This ratio was found to be 10 times higher than the whole-cell total NAD/NADH ratio under aerobic glucose-limited conditions. The approach was then applied to analyze short-term metabolic responses to pulses of glucose (electron donor) and acetaldehyde (electron acceptor), respectively.
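The logic of such a sensor reaction can be written out as a simple mass-action relation. The expression below is a sketch under two assumptions that are not stated in the text above: that the sensor reaction operates close to equilibrium in the cytosol, and that an apparent equilibrium constant at the cytosolic pH (denoted here by the hypothetical symbol K'_eq) is known.

```latex
% Sensor reaction (direction assumed): M1P + NAD^+ <=> F6P + NADH + H^+
% K'_{eq}: apparent equilibrium constant at the cytosolic pH (assumed known)
\[
K'_{\mathrm{eq}}
  = \frac{[\mathrm{F6P}]\,[\mathrm{NADH}]}{[\mathrm{M1P}]\,[\mathrm{NAD^{+}}]}
\quad\Longrightarrow\quad
\frac{[\mathrm{NAD^{+}}]}{[\mathrm{NADH}]}
  = \frac{1}{K'_{\mathrm{eq}}}\,\frac{[\mathrm{F6P}]}{[\mathrm{M1P}]}
\]
```

Under these assumptions, the free cytosolic NAD/NADH ratio follows directly from the two measured metabolite concentrations, without measuring the cofactors themselves.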
Additional examples of metabolomics applications in yeast have been reported by the Sauer group. In one study, Christen & Sauer (2011) used metabolomics and 13C-flux analysis to characterize the intracellular aerobic glucose metabolism of several yeast species (including S. cerevisiae). They discovered that the extracellular physiology observed when yeast cells change from respiro-fermentative metabolism to fully aerobic respiration is not reflected at the intracellular level. In addition, intracellular metabolite concentrations were found to be species-specific. In another study, the genotype and phenotype of two yeast strains (CEN.PK and S288C) under aerobic, high glucose conditions (glucose repression) were investigated (Kümmel et al., 2010). Using a comparative multi-omic analysis (metabolomics, proteomics and physiology), Kümmel et al. demonstrated the importance of the genetic background for major metabolic pathways such as those involved in glucose signaling and regulation. Multi-omics analysis (metabolomics, proteomics, and fluxomics) was also applied to thermodynamically classify reactions of the central carbon metabolism in yeast as either pseudo-, near- or far-from-equilibrium (Canelas et al., 2011).
Fluxomics involves quantification of the rate of turnover of metabolites through metabolic pathways. With this information, metabolic networks (control and functional regulation) and, subsequently, the phenotype of an organism can be comprehensively characterized. For the analysis of metabolic fluxes, two types of mathematical models exist: (1) steady-state models, for example, flux balance analysis (FBA) and 13C-based metabolic flux analysis (13C-MFA), that focus on stoichiometric properties of the metabolic networks; and (2) kinetic models, which can be combined with FBA in so-called dynamic FBA (dFBA), that focus on cell-wide dynamic regulation (reviewed in Feng et al., 2010). FBA is normally used to identify metabolic pathways with potential for enhanced product formation and cellular metabolic performance, especially if combined with metabolic pathway analysis. On the other hand, 13C-MFA aims at understanding the conceptual operation of a metabolic network using labeled precursors. When applied to large-scale metabolic networks, these models provide an advanced understanding of cell metabolism at the genome scale (examples and applications of genome-scale metabolic models are reviewed in Osterlund et al., 2011). Using dFBA, it is possible to analyze the changes in enzyme activities at a global scale. Cellular metabolic and regulatory changes upon perturbations of the extracellular environment can also be assessed using the concept of metabolic control analysis (Kacser & Burns, 1973; Heinrich & Rapoport, 1974). This analytical tool is very useful in predicting the effect of enzymes on a target metabolic pathway, for example, identification of enzymes leading to a nonspecific and nondesired flux. Open access toolboxes are available to analyze fluxome data within reasonable computational times and with significant accuracy [e.g. BioMet Toolbox (Cvijovic et al., 2010), OpenFLUX (Quek et al., 2009), and COBRA toolbox (Becker et al., 2007)]. Other methods/algorithms used in fluxomics are extensively described in Park et al. (2009) and Liu et al. (2010).
Fluxomics is widely used in metabolic engineering as it provides a direct view of how the carbon fluxes are distributed throughout the metabolic network, and hence, it is possible to read out the impact of genetic modifications on the global physiological behavior of an organism. In several recent studies, FBA was successfully used to identify new target genes for enhanced production of succinate (Otero et al., 2007), sesquiterpenes (Asadollahi et al., 2009), vanillin (Brochado et al., 2010), and formic acid (Kennedy et al., 2009). A similar strategy was applied for the identification of key enzymes involved in ethanol production in yeast (Bro et al., 2006). Using a genome-scale reconstructed metabolic network of S. cerevisiae, Bro and co-authors were able to identify an enzyme with potential to reduce glycerol formation and increase ethanol yields under anaerobic, glucose/xylose growth conditions. By expressing the NADP+-dependent glyceraldehyde-3-phosphate dehydrogenase (GAPN) in engineered S. cerevisiae strains, glycerol formation was reduced by up to 58% and ethanol yields increased by up to 24%. In addition, a genome-scale dFBA model was applied to predict glycerol and ethanol formation under different environmental or genetic perturbations (Vargas et al., 2011).
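The principle behind FBA can be illustrated on a toy network. The sketch below is not taken from any of the cited studies: the four-reaction network, its bounds, and the function name `run_fba` are hypothetical, and the linear program is solved with SciPy's general-purpose `linprog` rather than with one of the dedicated toolboxes mentioned above. It maximizes product export subject to the steady-state constraint S·v = 0 and to flux bounds.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows: metabolites A, B; columns: reactions R1..R4).
# R1: -> A (substrate uptake)   R2: A -> B
# R3: A -> byproduct (export)   R4: B -> product (export)
S = np.array([
    [1, -1, -1,  0],   # A: produced by R1, consumed by R2 and R3
    [0,  1,  0, -1],   # B: produced by R2, consumed by R4
])

# Flux bounds (arbitrary units): uptake limited to 10, byproduct flux of at least 1.
bounds = [(0, 10), (0, 1000), (1, 1000), (0, 1000)]

# linprog minimizes, so the product flux (R4) gets a coefficient of -1.
objective = np.array([0, 0, 0, -1])


def run_fba(stoichiometry, objective, bounds):
    """Solve max(-objective . v) subject to stoichiometry @ v = 0 and flux bounds."""
    result = linprog(
        objective,
        A_eq=stoichiometry,
        b_eq=np.zeros(stoichiometry.shape[0]),
        bounds=bounds,
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


fluxes = run_fba(S, objective, bounds)
for name, value in zip(["R1", "R2", "R3", "R4"], fluxes):
    print(f"{name}: {value:.2f}")
```

Lowering the minimum byproduct flux in `bounds` and re-solving mimics, in a very simplified way, how FBA is used to ask whether removing or bypassing a competing reaction could raise the product yield.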
Today, numerous examples of how systems and synthetic biology have brought yeast metabolic engineering closer to industrial biotechnology have been reported. Using engineered S. cerevisiae as the model organism, it is now possible to generate non-natural biological products such as fine and commodity chemicals, for example, sesquiterpenes and lactic acid, or novel biofuels, for example, butanol, via a nonchemical route. However, retrofitting microorganisms for the generation of such non-natural or synthetic products is extremely complex, requiring a comprehensive understanding of the cellular mechanisms inducing a specific genotype or phenotype. Moreover, the impact of specific, targeted modifications, for example, insertion of heterologous pathways, on product formation is not always satisfactory, necessitating the design and implementation of direct and dynamic feedback optimization strategies.
Recent advances in biology, bioinformatics, and many other disciplines have enabled the development of efficient technologies for designing, constructing, and analyzing novel yeast strains. For example, genes or even entire metabolic pathways can nowadays be designed in silico using tools such as the BNICE framework. Whole genomes can be sequenced and analyzed in a straightforward and fast way by combining high-throughput analytical methods with bioinformatics tools. Molecular biology tools, such as riboswitches or MGPSs, and enzyme engineering enable accurate control of the quantity and/or quality of single and multiple gene expression. Omics technologies, on the other hand, have proven to be extremely useful for analyzing the cell quantitatively and comprehensively, providing rich information about the behavior of the system, for example, genetic regulation, transcriptional control, and protein–protein interactions. Merged with metabolic engineering strategies (rational or inverse), the aforementioned bio-based technologies will clearly enable much faster generation of yeast cell factories. Therefore, a rapid transition of yeast-derived products from laboratory scale to industrial implementation is expected in the forthcoming years. In the long run, a systems-level approach will undoubtedly play a central role in industrial biotechnology, driving the design of yeast cell factories with improved capacity to generate high-value bioproducts.
The authors acknowledge the financial support received from the EU Framework VII project SYSINBIO (www.sysbio.se/sysinbio), the European Research Council, Knut and Alice Wallenberg Foundation and the Chalmers Foundation.