Correspondence: Vassily Hatzimanikatis, Laboratory of Computational Systems Biotechnology, Ecole Polytechnique Fédérale de Lausanne, Station 6, CH-1015 Lausanne, Switzerland. Tel.: +41 021 693 98 70; fax: +41 021 693 98 75; e-mail: firstname.lastname@example.org
Many important problems in cell biology arise from the dense nonlinear interactions between functional modules. The importance of mathematical modelling and computer simulation in understanding cellular processes is now indisputable and widely appreciated. Genome-scale metabolic models have gained much popularity and utility in helping us to understand and test hypotheses about these complex networks. However, there are some caveats that come with the use and interpretation of different types of metabolic models, which we aim to highlight here. We discuss and illustrate how the integration of thermodynamic and kinetic properties of the yeast metabolic networks in network analyses can help in understanding and utilizing this organism more successfully in the areas of metabolic engineering, synthetic biology and disease treatment.
Yeast has been used for the production of food, beverages, ingredients, fuels, chemicals and pharmaceutical proteins. The long history of yeast in the development of diverse bioprocesses has led to the accumulation of a wealth of data about its physiology, biochemistry and regulation (Walker, 1998). These studies have established yeast as a robust industrial organism (Petranovic & Vemuri, 2009) and with the current renaissance of interest for the production of fuels and chemicals from renewable resources, interest in yeast has exploded and we expect it will grow stronger. A deeper understanding of its cellular physiology and metabolism can help us to better engineer Saccharomyces cerevisiae to improve the efficiency of production of heterologous products (Bro et al., 2003; van Maris et al., 2007; Wattanachaisaereekul et al., 2007; Wisselink et al., 2009; Zelle et al., 2010).
In addition, in the last 25 years, yeast has been used extensively as a model system for the study of the cell cycle and its connection to cancer (Hartwell, 2004). The similarities in terms of carbon, energy and lipid metabolism between yeast and humans have also made yeast an excellent model system of choice for the study of the role of metabolism in disease aetiology and treatment (Petranovic & Nielsen, 2008; Nielsen, 2009; Bolotin-Fukuhara et al., 2010).
The emergence of functional genomics and systems biology has opened new perspectives for the analysis and the study of biological organisms, and yeast was one of the first organisms to be studied during the development of these technologies. Nielsen and colleagues have reviewed and discussed the development and uses of these technologies in yeast research and development (Jewett et al., 2005; Nielsen & Jewett, 2008). One of the key messages from these reviews is the importance of metabolic fluxes, as the final outcome of intricate nonlinear interactions between the different networks: the networks of transcription, translation, post-translational modification, signal transduction and protein–protein interaction.
Network analysis has been a major effort of research in the area of biological sciences (Albert & Barabasi, 2002; Newman, 2003; Barabasi & Oltvai, 2004; Papin et al., 2004; Joyce & Palsson, 2006; Feist et al., 2009). Technologies that emerged from the progress in genomics have allowed the experimental identification and verification of interactions between genes and their products, from proteins to metabolites to integrated phenotypes, and a wealth of computational methods has been developed and is continuously developing, for the integration of this information into networks and their analysis. The ultimate goal of these methods is to synthesize knowledge into predictive mathematical models that can be used in computational analyses to provide insight and accelerate discovery.
Although it is acknowledged that it is difficult to classify mathematical models in systems biology, two main classes are generally considered (Nielsen & Jewett, 2008): top-down models, where new biological information is extracted from large data sets and the analysis used is mainly inductive (Kell, 2005; Joyce & Palsson, 2006; Ananiadou et al., 2010); and bottom-up models, which are built on detailed mechanistic knowledge and the analysis is deductive but is limited to small networks (Rieger et al., 2005). Herein lies the major challenge in systems biology, the ability to build models with the mechanistic quality of the bottom-up models and the scale (i.e. number of components and interactions) of the top-down models (Papin et al., 2004; Mehra & Hatzimanikatis, 2006).
Nielsen & Jewett (2008) observed that although it is difficult to reconcile bottom-up and top-down modelling, the efforts in curating and building metabolic network models are coming close to achieving this. The combined knowledge of physiology, biochemistry and metabolism allows the reconstruction of networks, which are further curated using flux balance analysis (FBA) to complete missing parts and derive a functional metabolic network. The continuing integration of knowledge about the networks that regulate metabolic activities is the first successful demonstration of bridging the gaps between functional regulatory networks (Papin et al., 2004; Joyce & Palsson, 2006; Hyduke & Palsson, 2010; Schellenberger et al., 2010; Hasunuma et al., 2011).
This remarkable progress in the area of metabolic modelling is very good news for those working in yeast metabolism, physiology and bioprocessing. The systems-level, genome-scale understanding of carbon and energy metabolism is critical to enhancing our metabolic toolbox for optimizing the production of industrial chemicals and fuels from yeast. Additionally, it can also help in the elucidation of the aetiology of many metabolic diseases. Understanding carbon metabolism and energy management is also important for understanding and engineering stress tolerance, which is an unavoidable consequence of the desired bioprocess conditions (e.g. high titres and low pH) (Nicolaou et al., 2010), and cellular stress has also been implicated in many diseases (Costa & Moradas-Ferreira, 2001; Sorolla et al., 2008).
However, many important technological issues limit the full promise of useful applications of metabolic models in yeast. Although we understand the structure of the model, i.e. the biochemistry and many regulatory connections, we do not have enough global-scale data. Rather, we have partial data from proteomic, metabolomic, and physiology studies, and in many cases, uncritical analysis of partial data can lead to erroneous conclusions. Another problem arises from the complexity of biological systems and the large-scale, high-throughput nature of the data, as there are also differences in results that arise from the ‘same’ studies in different laboratories, an important issue that has been recently acknowledged and addressed (Lin et al., 2009; Ukibe et al., 2009; Hong et al., 2010; Zhao et al., 2010).
In all the studies of yeast networks, the information is mainly qualitative, i.e. network interactions are described as on/off properties, with very little information on the strength of these interactions. Hence, although these networks can be used to integrate and interpret quantitative observations, such as fluxes and expression data, they can only simulate and predict experiments that disrupt the network connectivity, such as gene knock-out, loss-of-function mutations and mutations of gene regulatory elements. The ultimate objective would be to formulate models that can both describe the steady-state behaviour and predict the dynamic responses of yeast metabolic networks in order to provide insight into how the system would respond to the knock-in or knock-down of genes. This would allow us to manipulate the metabolic network to achieve our desired objectives. We will discuss here some of the approaches towards this objective and how integration of thermodynamic and kinetic properties can bring us closer to that aim.
Some considerations in the use and application of the genome-scale metabolic model of yeast
Genome-scale metabolic models have gained significant popularity as versatile tools in many studies (Feist & Palsson, 2008; Oberhardt et al., 2009) and they have proven to be valuable in guiding metabolic engineering decisions (Bro et al., 2003; Patil et al., 2004). With the development of high-throughput and automated reconstruction methods (DeJongh et al., 2007; Henry et al., 2010; Radrich et al., 2010), genome-scale metabolic reconstructions have been increasing at an accelerated pace, even though their number still lags behind that of the genome sequences being completed.
The first genome-scale metabolic model of the yeast S. cerevisiae, named iFF708, was published in 2003 (Famili et al., 2003; Forster et al., 2003). This model was subsequently modified through the inclusion of additional biochemical reactions, genes, regulatory constraints and compartments (Duarte et al., 2004; Kuepfer et al., 2005; Herrgard et al., 2008; Nookaew et al., 2008). Three laboratories, two of which have collaborated in the development of the original model, have led the main developments of these models. Recently, a ‘consensus’ model has been developed through a collaborative approach of a community of yeast researchers to serve as a resource for collecting and summarizing the current and growing knowledge of yeast metabolism (Herrgard et al., 2008).
A main use of the genome-scale models is the study of the physiology of gene deletions. Snitkin et al. (2008) compared model (iFF708) predictions against 465 gene deletion mutants under 16 conditions and found a high fraction of correct predictions (94%) that validated the high predictive capacity of the model and demonstrated how inconsistencies can also be used to drive further hypothesis testing. What is interesting in this study is that Snitkin et al. used the disagreements between model predictions and experiments to guide experimental refinement, which also improved the experimental data significantly. After these refinements in experimental information, they repeated the computational analysis and comparisons with experiments to improve and refine the genome-scale metabolic model.
The first yeast genome-scale model has, as of the time of preparing this review (September 2011), 365 citations, with 79 reviews and 263 research articles, suggesting an important impact in yeast research. However, a few observations can be made regarding the applications of the model. First, a very small number of studies have used the model for discovery of genetic modifications and guidance for metabolic engineering towards improved strain performance. In the first of the three most notable cases, the model was used to identify and rank a set of gene deletions and insertions for the manipulation of redox metabolism towards increased ethanol yield (Bro et al., 2006). Experimental implementation validated the predictions and demonstrated improved ethanol yields even on xylose/glucose mixtures. The second notable case involves the identification of five nonobvious gene deletions for the engineering of C1 metabolism (Kennedy et al., 2009). Finally, in the third case, the yeast genome-scale model was used for the identification of metabolic engineering targets for improving the production of sesquiterpenes (Asadollahi et al., 2009). The complexity of the pathway and its distance from the central carbon pathway [there are eight reactions in the mevalonate pathway from acetyl-coenzyme A (AcCoA) to farnesyl-diphosphate, which is the primary precursor of the various sesquiterpenes] made the use of the genome-scale model indispensable. The resulting metabolic engineering strategy, which involved multiple genetic modifications, demonstrated the value and the validity of the model.
On the other hand, a large number of papers that cited the first yeast genome-scale model focused on metabolomics analysis. However, they primarily use the model as a high-quality curated database of metabolites and reactions. Although this two-dimensional annotation has been one of the objectives in genome-scale modelling (Palsson, 2004; Reed et al., 2006), it does not contribute immediately into design of strategies for strain improvement or disease treatment.
The work by Patil & Nielsen (2005) enabled a breakthrough because it allowed the integration of gene expression and metabolomics data into the genome-scale metabolic model for the identification of network patterns that follow a common transcriptional response. The algorithm they developed identifies reporter metabolites and a set of connected genes with significant coordinated changes to genetic and environmental perturbations. This method allows the genome-scale model, together with other genomic technologies, such as transcription factor enrichment, to be used for the identification of important regulatory proteins and their associated regulatory networks (Cakir et al., 2006; Raghevendran et al., 2006; Fazio et al., 2008; Cimini et al., 2009).
Finally, the integration of proteomics information within the context of genome-scale modelling is a recent exciting development (Costenoble et al., 2011). Although this work focused on the study of metabolic adaptation to changes in nutritional conditions, it demonstrated the feasibility of using targeted proteomics for the quantification of almost all the enzymes in central carbon and amino-acid pathways. The synergistic application of these technologies and methodologies with genome-scale model analysis will represent major progress for metabolic engineering.
Approaches to address some issues in the Flux Balance Analysis (FBA) of metabolic models
The discussion above highlights a surprisingly limited use of genome-scale models for metabolic engineering. It appears that the community working in this field has been more active with generating new and larger models and less so with actually using the models. As U. Sauer observed (personal communication), it is the latter that matters, but in every nascent field it is easier to develop tools than to reach new scientific discoveries by applying them, and metabolomics or fluxomics are no different in this respect.
The limited uses of genome-scale models are due to many challenges and issues, which make it difficult for anyone without good experience in modelling and computation to use them in a productive manner. One of the key challenges in FBA of genome-scale models is the identification of multiple solutions resulting from the underdetermined nature of the problem. The number of alternative solutions scales exponentially with the size of the network (Mahadevan & Schilling, 2003). Even though there are methods that aim to characterize the different flux modes to analyse the possibilities systematically, such as elementary flux modes (EFMs), extreme pathways (EPs) and other variants, most of these methods still do not perform well as the size of the model increases, and hence their applicability and usefulness are restricted. Therefore, given the limited amount of information about certain fluxes or enzyme activities, the main challenge is how we can derive a representative or characteristic flux distribution that can explain the observed phenotype at steady state. Such representative flux state(s) could also be a combination of more elementary flux states that should be further identified and characterized (Hoffmann et al., 2006; Barrett et al., 2009; Llaneras & Pico, 2010).
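The underdetermined nature of the problem can be made concrete with a minimal sketch. It counts the degrees of freedom of the steady-state condition S·v = 0 for a hypothetical four-reaction toy network with two parallel routes from A to B. The network, the flux values and the helper names (rank, is_steady_state) are illustrative assumptions and are not taken from any yeast model.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        # Find a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_steady_state(S, v):
    """True if every metabolite balance in S*v is exactly zero."""
    return all(sum(s * f for s, f in zip(row, v)) == 0 for row in S)


# Hypothetical toy network (columns are reactions):
#   v1: -> A (uptake), v2: A -> B, v3: A -> B (parallel route), v4: B -> (export)
S = [
    [1, -1, -1, 0],   # balance of metabolite A
    [0, 1, 1, -1],    # balance of metabolite B
]

# Number of independent fluxes left free by the steady-state constraints.
degrees_of_freedom = len(S[0]) - rank(S)

# Two different flux distributions with the same uptake and the same export.
flux_a = [10, 10, 0, 10]
flux_b = [10, 3, 7, 10]

print("degrees of freedom:", degrees_of_freedom)
print("both steady states:", is_steady_state(S, flux_a), is_steady_state(S, flux_b))
```

Even in this small example, fixing the uptake rate and maximizing export does not determine how the flux splits between the two parallel routes; additional constraints, such as labelling data or thermodynamic information, are needed to narrow the alternatives.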
Flux balance models of metabolism are routinely used in the fitting of labelling experiments for the quantification of metabolic fluxes. However, all of these studies, with one notable exception (Blank et al., 2005), employ small-scale, reduced models of yeast metabolism and they derive additional constraints for determining unique flux profiles. The concept of core models is not new; in fact, historically genome-scale models have evolved from reduced ‘core’ stoichiometric models by including increasing detail. The scale of these models made them more manageable and facilitated analysis. The issue of manageability is illustrated by the number of possible flux modes that the network can have; for example, for a small yeast network comprising 53 reactions, there can be up to 6741 EFMs depending on the carbon source (Dunn et al., 1994) whereas for an Escherichia coli model with 112 reactions, the number of EFMs calculated was 2 450 787 (Perko, 1986). Hence, even though there are methods that can allow the almost complete enumeration and characterization of the EFMs/EPs, the scale of the resulting number of possibilities will remain a huge obstacle in analysis and we must make some drastic assumptions to reduce the possibilities.
Another driver in the use of reduced models has been to understand central metabolism well enough, before attempting to understand and make predictions at the genome scale. Actually, in most of the problems in metabolic engineering, the desired outcome has been the manipulation of central metabolism for redirecting the carbon flux towards desired pathways.
However, reduced models that are used to perform analyses of experimental data are often incompatible with each other as the set of reactions, components and degree of detail (e.g. proton-balancing, balancing of cofactors, etc.) differ significantly. Moreover, there is not an explicit list of the assumptions that would allow consistency checks of the model. For example, the assumptions about the presence or absence of alternative pathways in the determination of flux ratios for labelling experiments will affect the variability of the flux distribution. This can lead to different conclusions arising from the same set of data and difficulty in cross-utilization of datasets across laboratories that could have helped to complete the characterization of the network.
Typically, reduced or core models in the past have been built in a bottom-up approach. We believe that we need a top-down approach that can take advantage of all the knowledge in the genome-scale models. One of the main objectives of such an approach will be to recover the simplicity and clarity of these earlier core models without losing the annotation details and the curated knowledge that has been amassed into the genome-scale models. With the increasing addition of details in these genome-scale models, it is necessary and important to add new knowledge consistently and modularly to keep track of changes, for example with different releases of the S. cerevisiae reconstructions. Moreover, a computational method will allow for a systematic and unambiguous model reduction, and it will facilitate consistency and communication between different laboratories.
Thermodynamic analysis of metabolic networks has also been shown to be important in reducing the flux space and eliminating thermodynamically infeasible pathways (Henry et al., 2007; Boghigian et al., 2010; Soh & Hatzimanikatis, 2010). Thermodynamics can also help to eliminate the need for ad hoc assignment of reaction directionality that can unwittingly preclude possible flux distributions that might be of interest. An example is the phosphoenolpyruvate carboxykinase (PEPCK) reaction that is often assumed to be operating in the ATP-utilization direction. However, as shown both experimentally and computationally, this reaction can operate in the reverse direction under certain conditions (Deok et al., 2006; Gorsich et al., 2006; Singh et al., 2011). Hence, by assuming certain fixed directionalities in the model, we might prematurely eliminate the true state of the network prior to analysis. Therefore, thermodynamics must be used to improve the curation of the models, as it provides additional control over the decision between the assumed (in literature or based on generalized arguments) reaction directionality vs. the possible reaction directionality, based on the estimated Gibbs free energy and the possible range of metabolite concentrations in the cell or metabolomics data.
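The directionality argument can be illustrated with a minimal sketch. It assumes the usual relation ΔG' = ΔG'° + RT Σ ν_i ln c_i with concentrations in mol/L, ignores corrections such as ionic strength and pH, and uses a hypothetical reaction A → B with made-up ΔG'° values and concentration bounds that are not taken from any yeast model. The function names are likewise illustrative.

```python
import math

R = 8.314e-3   # gas constant in kJ/(mol K)
T = 298.15     # temperature in K (assumed)


def dg_range(dg0, stoich, bounds):
    """Minimum and maximum Gibbs free energy of reaction (kJ/mol).

    stoich maps metabolite -> stoichiometric coefficient (negative for substrates);
    bounds maps metabolite -> (lowest, highest) allowed concentration in mol/L.
    """
    lo = hi = dg0
    for met, nu in stoich.items():
        c_min, c_max = bounds[met]
        if nu > 0:   # product: a low concentration lowers the free energy
            lo += R * T * nu * math.log(c_min)
            hi += R * T * nu * math.log(c_max)
        else:        # substrate: a high concentration lowers the free energy
            lo += R * T * nu * math.log(c_max)
            hi += R * T * nu * math.log(c_min)
    return lo, hi


def feasible_directions(dg0, stoich, bounds):
    """Classify a reaction from the sign of its attainable free-energy range."""
    lo, hi = dg_range(dg0, stoich, bounds)
    if hi < 0:
        return "forward only"
    if lo > 0:
        return "reverse only"
    return "reversible"


# Hypothetical reaction A -> B with assumed concentration bounds of 10 uM to 10 mM.
stoich = {"A": -1, "B": 1}
bounds = {"A": (1e-5, 1e-2), "B": (1e-5, 1e-2)}

for dg0 in (-30.0, 5.0, 30.0):
    print(dg0, feasible_directions(dg0, stoich, bounds))
```

In this toy case a reaction with a moderately positive standard free energy remains feasible in both directions within the assumed concentration bounds, which is the situation in which an a priori directionality assignment would discard feasible flux states.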
Besides reducing the flux space effectively, thermodynamics offers another approach for integrating and overlaying additional layers of information in the form of thermodynamic displacement and metabolite concentration information. It has been shown (Henry et al., 2007; Soh & Hatzimanikatis, 2010) that if we include additional information in the form of metabolomics and fluxomics data, we can reduce the possible flux ranges of the network to help us better characterize the flux distribution. Network thermodynamics can also be used to check for consistency of the metabolomics data with flux data, as we would expect that the directionality of the reactions, as determined by the full set of metabolites measured in an experiment, is not in conflict with the directionality of the reactions as determined from the labelling data.
As FBA models are only snapshots of the network at a point in time, they do not allow us to extrapolate the dynamic response of the network. Although approaches based on FBA, such as dynamic FBA (dFBA; Mahadevan et al., 2002), attempt to overcome this limitation, these methods often use a highly reduced model, and they cannot simulate or predict the response of the metabolite levels because they do not integrate kinetic information. The biggest limitation of FBA methods is their inability to predict the response to changes in enzyme activities. In most metabolic engineering studies, we are interested in identifying enzymes as targets for overexpression and/or downregulation, as gene knock-out can have a detrimental effect on the physiology of the strain. A recent study investigated the effects of single nucleotide polymorphisms (SNPs) on the phenotypic differences between two different yeast strains (Canelas et al., 2010). The investigators found SNPs in 20% of the metabolic genes and based on these differences they hypothesized physiological differences, which they confirmed experimentally. Based on further transcriptomics analysis, the authors hypothesized that SNPs can be responsible for changes in enzyme concentration and/or function, such as kinetic properties. Such hypotheses, as well as identification of targets for gene overexpression and protein engineering, cannot be analysed without the use of kinetic models of metabolic networks.
Some considerations on the development of kinetic models of yeast metabolism
One of the more widely used yeast kinetic models (Teusink et al., 2000) for analysis and also further model development has about 257 citations as of September 2011. However, very little has been done in the original development of large-scale kinetic models in yeast. With the exception of one case (Wang & Hatzimanikatis, 2006b), almost all models of yeast central carbon catabolism do not distinguish the mitochondrial reactions from the cytosolic reactions. The main issue in the development of kinetic models of metabolic networks is the limited available information and the uncertainty associated with this information. We have previously studied and classified the uncertainty in the study of metabolic pathways into two types: structural and quantitative (Miskovic & Hatzimanikatis, 2010). Structural uncertainty concerns the limited knowledge of the stoichiometry and of the kinetic laws of the enzymes in the pathways. Although the stoichiometry of the pathways in yeast is well characterized, there still exist gaps in some pathways and the kinetics of their enzymes are completely unknown (DeJongh et al., 2007; Feist et al., 2009; Henry et al., 2009; Kumar & Maranas, 2009; Stanley et al., 2010). Also, the kinetic parameters of most enzymes are not available and when they are available they are usually known as ‘apparent Km values’ but not as parameters in detailed kinetic mechanisms. There is also an important concern regarding how the parameters of the enzymes quantified in vitro will change in the crowded intracellular environment (Savageau, 1995; Schnell & Turner, 2004).
Flux distributions, thermodynamic information, metabolite concentration and kinetic parameters are subject to quantitative uncertainty. Despite the advances in methods for the quantification of metabolic fluxes, they still carry some error. The thermodynamic properties of most of the reactions are estimated using group contribution methods, and therefore they contain estimation errors and the error of the experiments used in the estimation process (Jankowski et al., 2008). The biggest uncertainty is in the metabolite measurements, and in addition only a relatively small number of metabolites can be measured compared with the entire metabolome of the organism.
The uncertainty in building mathematical models is very large even for systems that are well studied, such as E. coli and S. cerevisiae. Therefore, when we consider the analysis and engineering of novel pathways, we should expect much higher qualitative and quantitative uncertainty in the information about these systems (Tyo et al., 2007; Alper & Stephanopoulos, 2009).
Uncertainty is a problem common to many areas of physical and chemical sciences and engineering. Within these fields there exist a large number of methods and approaches that allow for the modelling and quantification of uncertainty. These methods have been used in the analysis of metabolic networks and they have provided some insight into the properties of the networks, and guidance for metabolic engineering (Wang et al., 2004; Wang & Hatzimanikatis, 2006a, b; Kiparissides et al., 2009). However, any significant effort in this area faces challenges in the modelling and simulation of uncertainty. When we consider uncertainty modelling and analysis of kinetic models of chemical and biochemical systems, we must ensure sufficiency in the sampling of the kinetic parameters, calculate the properties of a population of the system, solve large systems of nonlinear equations, and perform a statistical analysis to characterize the properties of the population of the system. This leads to many computational challenges: (i) the ranges of the parameter values are not known or they are very large; (ii) the size and nonlinearities introduce computational difficulties; and (iii) reliable statistics can require a computationally prohibitive number of samples. We have recently developed an uncertainty analysis framework, tailored to metabolic systems, and we have made significant progress in addressing these issues (Miskovic & Hatzimanikatis, 2010).
Predicting network responses with limited information
Optimization and Risk Analysis of Complex Living Entities (ORACLE) is a modelling and computational framework which we have recently introduced for the study of metabolic networks under uncertainty (Wang et al., 2004; Wang & Hatzimanikatis, 2006a, b; Miskovic & Hatzimanikatis, 2010). It uses uncertainty and risk analysis methods to circumvent most of the limitations mentioned above. In its current stage, ORACLE is used for metabolic control analysis and it allows flux control coefficients and concentration control coefficients to be determined. These coefficients quantify the fold change in metabolic fluxes and metabolite concentrations for a fold change in enzyme activities or in any environmental parameter. There are other similar algorithms for analysis of kinetic metabolic models (Steuer et al., 2006; Tran et al., 2008) and Miskovic & Hatzimanikatis (2011) explain the differences between ORACLE and these approaches. The main advantages of the ORACLE framework are: (i) the ability to consistently integrate thermodynamics and physicochemical constraints into kinetic models, (ii) the ability to integrate omics information (transcriptomics, proteomics, metabolomics, fluxomics) and (iii) its scalability, which enables us to predict kinetic responses of metabolism even for genome-scale metabolic models, which is not feasible with any of the other approaches.
The pivotal point in the development of ORACLE is the recognition that control coefficients depend on the degree of enzyme saturation, also known as enzyme elasticities, which in turn can be estimated through the distribution of the enzyme between the different mechanistic enzyme states. This observation led us to reconsider the uncertainties in the enzyme state space instead of the kinetic parameter space. This reformulation gives the major advantage that we can derive the degree of saturation, or elasticities, by sampling the enzyme state space, which, unlike the parameter space, is very well bounded between 0 and 1. These bounds can be further constrained if the kinetic parameters of an enzyme are approximately known.
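The link between enzyme states and elasticities can be checked numerically in the simplest case. The sketch below assumes an irreversible Michaelis-Menten rate law, for which the elasticity with respect to the substrate equals one minus the fraction of enzyme in the substrate-bound state. It is a toy illustration under that assumption and not the ORACLE implementation; the function names and the sampled parameter ranges are hypothetical.

```python
import math
import random


def mm_rate(s, vmax, km):
    """Irreversible Michaelis-Menten rate law (assumed for illustration)."""
    return vmax * s / (km + s)


def elasticity_numeric(s, vmax, km, h=1e-6):
    """Central finite difference of ln(v) with respect to ln(s)."""
    up = math.log(mm_rate(s * math.exp(h), vmax, km))
    down = math.log(mm_rate(s * math.exp(-h), vmax, km))
    return (up - down) / (2 * h)


def elasticity_from_state(bound_fraction):
    """Elasticity expressed through the substrate-bound enzyme fraction."""
    return 1.0 - bound_fraction


random.seed(0)
max_error = 0.0
elasticities = []
for _ in range(1000):
    # Km and substrate level are sampled over six orders of magnitude (assumed ranges).
    km = 10 ** random.uniform(-6, 0)
    s = 10 ** random.uniform(-6, 0)
    sigma = s / (km + s)   # fraction of enzyme in the substrate-bound state
    eps = elasticity_numeric(s, 1.0, km)
    elasticities.append(eps)
    max_error = max(max_error, abs(eps - elasticity_from_state(sigma)))

print("largest deviation between the two elasticity estimates:", max_error)
print("elasticity range:", min(elasticities), max(elasticities))
```

Although Km and the substrate concentration vary over six orders of magnitude here, the resulting elasticity always lies between 0 and 1, which is why sampling the enzyme states directly is better bounded than sampling the kinetic parameters.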
The ORACLE framework involves a set of computational procedures, which integrate the available information into a mathematical structure, and through Monte Carlo sampling for retrofitting missing information they generate the population of all possible control coefficients. Conceptually, ORACLE involves the following steps (Fig. 1):
Integration of available information
We start by defining the stoichiometry, based on the information from the genome-scale model. We proceed further by integrating the estimated flux profiles based on information from fluxomics analysis or on hypotheses about desirable flux distributions in an engineered pathway. Finally, we estimate the standard free energy of reactions based on the available experimental information, or using group contribution methods.
Exploring the space of metabolite concentrations
The concentration levels for some of the metabolites in the system might be available, or can be estimated from experiments under similar physiological conditions. For the metabolites whose levels are missing, we can use sampling under thermodynamic constraints in order to preserve the observed flux directionality.
Exploring the space of the kinetic properties (elasticities)
Sampling of either the enzyme states (Miskovic & Hatzimanikatis, 2011) or the degree of saturation of the enzyme's active site (Wang et al., 2004) is very efficient and it can also integrate partial knowledge of the enzyme kinetic parameters.
Consistency checks and pruning
Partial knowledge of the experimentally observed response of a metabolic flux to changes in the activity of an enzyme is used to reject inconsistent samples.
Calculation and statistical analysis, data mining, and visualization of control coefficients
The populations of control coefficients are subsequently analysed using nonparametric statistics and data mining to assess and rank the importance of the enzymes with respect to their impact on the specified objectives (Silverman, 1986; Conover, 1998; Chen & Lonardi, 2009).
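The sampling, pruning and ranking steps above can be sketched for a hypothetical two-step pathway X0 → S → X1 with fixed X0 and X1. The sketch assumes that enzyme 1 is inhibited by its product S, so that its elasticity is minus the product-bound enzyme fraction, and that enzyme 2 is an irreversible enzyme with elasticity equal to one minus its substrate-bound fraction. The flux control coefficients then follow from the summation and connectivity theorems of metabolic control analysis. The pathway, the uniform sampling ranges, the pruning threshold and all function names are illustrative assumptions, not the ORACLE implementation.

```python
import random
import statistics


def control_coefficients(eps1, eps2):
    """Flux control coefficients of a two-step pathway from its elasticities to S."""
    denom = eps2 - eps1
    return eps2 / denom, -eps1 / denom


def sample_population(n, seed=0):
    """Monte Carlo sampling of enzyme-state fractions, bounded between 0 and 1."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        inhibited = rng.uniform(0.01, 0.99)   # enzyme 1 fraction bound to product S
        saturated = rng.uniform(0.01, 0.99)   # enzyme 2 fraction bound to substrate S
        population.append(control_coefficients(-inhibited, 1.0 - saturated))
    return population


def prune(population, min_c1):
    """Reject samples inconsistent with a (hypothetical) observed response to enzyme 1."""
    return [c for c in population if c[0] >= min_c1]


def summarise(population):
    """Nonparametric summary (quartiles) of each enzyme's control coefficient."""
    summary = {}
    for index, name in enumerate(("enzyme 1", "enzyme 2")):
        values = [c[index] for c in population]
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary[name] = {"q1": q1, "median": median, "q3": q3}
    return summary


population = sample_population(5000)
kept = prune(population, min_c1=0.3)   # threshold is an assumed experimental constraint
summary = summarise(kept)
ranking = sorted(summary, key=lambda name: summary[name]["median"], reverse=True)

print("samples kept:", len(kept), "of", len(population))
for name in ranking:
    print(name, summary[name])
```

The output is a distribution of control coefficients per enzyme rather than a single value, so the ranking is a statement about the population of models consistent with the available information.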
Ultimately, the results from ORACLE are not predictions but statistical expectations of success of the metabolic engineering targets they identify. ORACLE provides a set of alternative solutions, evaluated with respect to their uncertainty, which can be given back to the experts for evaluation. This ‘expert opinion’ is the ultimate integration of information that is almost impossible to take into account during the formulation of the model. Overall, ORACLE brings to metabolic networks modelling and analysis approaches that have been used successfully in other disciplines.
Thermodynamic and kinetic analysis of a reduced, core metabolic model of yeast
In this section, we discuss some of our recent unpublished work to illustrate how we can approach some of the problems discussed above. Our work is based on a core yeast metabolic model. We have developed a computational algorithm that allows the reduction of genome-scale models into core metabolic models. This method allows the unambiguous reduction of genome-scale models and it is also ‘reversible’, in the sense that the results from the analysis of the core model are commutable with analysis using the genome-scale model. The reduction was based on the iMM904 model (Mo et al., 2009). This is the first such core model for yeast, and it consists of 89 reactions and 88 metabolites across two compartments (cytosol and mitochondria), as shown in Fig. 2. All the reactions are proton-balanced as this has been shown to be important in affecting the overall solution (Reed et al., 2003). Modelling the reactions around, across and inside compartments is important for understanding the in vivo redox and energy balance (Karbowicz & Smith, 1984), but unfortunately is often neglected in most reduced models. In addition, we performed thermodynamic curation and we have been able to include thermodynamic constraints in our reduced model.
We first used this model to perform some basic flux balance analyses, and compared our results against reference experimental data from recent work from the Sauer Laboratory. Initially we used only information about the carbon source uptake rate and product fluxes (Wang et al., 2011) and, without assuming any reaction directionalities a priori, we performed FBA and flux variability analysis without any thermodynamic constraints. As expected, the system is under-constrained and it is able to generate biomass from CO2 recycling reactions and ATP recycling. However, after adding thermodynamic constraints, the maximum biomass flux drops to close to the measured value, as many of the CO2 recycling reactions are automatically constrained in the proper direction under normal physiological concentration ranges predicted by the thermodynamic constraints (Fig. 3). By contrast, when we fix the reaction directionalities in the direction most commonly assumed in genome-scale models, the flux variability is significantly reduced (Fig. 4). By specifying the reaction directionality a priori, we can overly constrain the model in two ways. First, as discussed earlier, some reactions can be reversible under certain conditions. An example is PEPCK, which was found to be able to operate in the ATP-generating direction in E. coli (Deok et al., 2006; Singh et al., 2011) and in S. cerevisiae (Gorsich et al., 2006) under high CO2 concentrations. Hence, by setting the reaction directionality a priori we would have eliminated this possibility and could not explain the observed physiology using the model. Second, by assigning a priori directionalities, we introduce ad hoc inflexibility and tight constraints into the system: we observe that, even with thermodynamic constraints, the flux ranges are quite large compared with those obtained with specified reaction directionalities.
Although in metabolomics and fluxomics studies we would like the model to have few degrees of freedom, so that the uncertainties in the estimated flux values are smaller, we should not contaminate our analysis with artefacts from arbitrary assumptions about reaction directionality.
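The effect of a priori directionality on flux variability can be reproduced on a toy network. The sketch below is not the core yeast model: the four-reaction network, its bounds and the helper names `fba` and `fva` are hypothetical, and thermodynamic constraints are not included. It only shows, using `scipy.optimize.linprog`, how fixing reaction directions narrows the flux range of a reaction without changing the optimal objective value.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical, not the core yeast model):
#   R_up: -> A    R1: A -> B    R2: A -> B (parallel route)    R_bio: B ->
S = np.array([
    [1.0, -1.0, -1.0,  0.0],   # mass balance of metabolite A
    [0.0,  1.0,  1.0, -1.0],   # mass balance of metabolite B
])
BIO = 3  # column index of the biomass-like objective reaction

def fba(bounds):
    """Maximise the objective flux subject to S v = 0 and the flux bounds."""
    c = np.zeros(S.shape[1])
    c[BIO] = -1.0  # linprog minimises, so negate to maximise
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    return float(res.x[BIO])

def fva(bounds, rxn, v_opt, tol=1e-6):
    """Minimum and maximum flux of reaction `rxn` while the objective
    flux is kept at (essentially) its optimal value v_opt."""
    constrained = list(bounds)
    constrained[BIO] = (v_opt - tol, bounds[BIO][1])
    limits = []
    for sign in (1.0, -1.0):  # +1 minimises v[rxn], -1 maximises it
        c = np.zeros(S.shape[1])
        c[rxn] = sign
        res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                      bounds=constrained, method="highs")
        limits.append(float(res.x[rxn]))
    return limits[0], limits[1]

# Case 1: internal reactions left reversible (no directionality assumed)
free = [(0, 10), (-1000, 1000), (-1000, 1000), (0, 1000)]
# Case 2: internal reactions fixed in the forward direction a priori
fixed_dir = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

v_free = fba(free)
v_fixed = fba(fixed_dir)
range_free = fva(free, 1, v_free)
range_fixed = fva(fixed_dir, 1, v_fixed)
print("objective:", v_free, v_fixed)
print("R1 range, reversible:", range_free)
print("R1 range, fixed direction:", range_fixed)
```

On this toy network the optimal objective is the same in both cases, but the admissible range of R1 shrinks from roughly [-990, 1000] to [0, 10] once directions are fixed, which is the kind of imposed tightness the text warns about.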
All these results and conclusions from FBA of the core model also hold for the genome-scale model from which the core model was derived. Therefore, researchers who are familiar with FBA on small stoichiometric models, but who are not experienced with genome-scale models, can use and analyse this reduced core model much more easily, and their results and conclusions can then be used for genome-scale analysis.
After obtaining a representative flux profile from the thermodynamics-based FBA, we sampled feasible metabolite concentrations and computed the corresponding reaction displacement from thermodynamic equilibrium. We observed (Fig. 2) that the displacement of approximately half of the reactions could be either near or far from equilibrium, whereas the other reactions could assume a wider range of displacements (Table 1).
Table 1. Distribution of reaction displacements from thermodynamic equilibrium in the network. Reactions are classified according to their displacement into the following groups: I, near equilibrium (NE); II, neither near equilibrium nor far from equilibrium (Between); III, far away from equilibrium (FA). Reactions that might belong to more than one of these groups are denoted as I+II, etc. (see Fig. 2 for more details).
I + II
II + III
I + II
No of reactions
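The classification behind Table 1 can be sketched for a single reaction A -> B. The displacement is taken here as Γ = Q/K_eq, so that ΔG' = RT ln Γ and Γ < 1 corresponds to a feasible forward flux. The thresholds separating groups I, II and III, the concentration range and the function names are assumptions made for illustration, not values from this study.

```python
import math
import random

# Hypothetical thresholds on Gamma = Q / K_eq (assumed, not from the source)
NEAR_EQ = 0.9   # Gamma at or above this: group I (near equilibrium)
FAR_EQ = 0.1    # Gamma at or below this: group III (far from equilibrium)

def displacement(substrates, products, k_eq):
    """Gamma = Q / K_eq for unit stoichiometric coefficients."""
    q = math.prod(products) / math.prod(substrates)
    return q / k_eq

def classify(gamma):
    """Assign a displacement value to group I, II or III."""
    if gamma >= NEAR_EQ:
        return "I"
    if gamma <= FAR_EQ:
        return "III"
    return "II"

def groups_visited(k_eq, conc_range, n_samples=2000, seed=0):
    """Sample concentrations of A and B log-uniformly for A -> B and collect
    every group the reaction can fall into while the forward flux stays
    thermodynamically feasible (Gamma < 1)."""
    rng = random.Random(seed)
    lo, hi = (math.log(c) for c in conc_range)
    seen = set()
    for _ in range(n_samples):
        a = math.exp(rng.uniform(lo, hi))
        b = math.exp(rng.uniform(lo, hi))
        gamma = displacement([a], [b], k_eq)
        if gamma < 1.0:  # discard samples inconsistent with a forward flux
            seen.add(classify(gamma))
    return "+".join(sorted(seen, key=["I", "II", "III"].index))

conc_range = (1e-5, 1e-2)  # assumed concentration bounds, in mol/L
for k_eq in (1e5, 2e3, 1.0):
    print(k_eq, groups_visited(k_eq, conc_range))
```

A reaction with a very large equilibrium constant stays in group III for every sampled concentration, whereas one with K_eq close to 1 can fall into any group, which corresponds to the combined classes such as I+II in the table.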
We also used ORACLE to investigate how changes in the activities of the enzymes in the network would affect the flux distribution and the levels of the metabolites. We investigated the response of the splitting ratio of the glycolytic fluxes, quantified as the ratio of the flux through fructose-bisphosphate aldolase to the flux through glucose-6-phosphate-1-dehydrogenase (ZWF). The primary positive control over this ratio lies in ATP maintenance (ATPM) and pyruvate decarboxylase (PDC), whereas the negative control lies in ammonia (NH4t) and oxaloacetate (OAt) transport (Fig. 5). Interestingly, even though an enhancement of hexose transporter (HXT) or hexokinase (HXK) activity has, on average, a negative impact on this ratio, the error bars suggest that there are physiological states where the effect could be positive.
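If the control coefficients are taken to be the usual scaled (logarithmic) sensitivities of metabolic control analysis, the control coefficient of a flux ratio follows directly from those of the two fluxes. The symbols r, J_FBA, J_ZWF and p_i (the activity of enzyme i) are notation introduced here for illustration and are not taken from the original analysis.

```latex
r = \frac{J_{\mathrm{FBA}}}{J_{\mathrm{ZWF}}},
\qquad
C^{r}_{p_i} = \frac{\partial \ln r}{\partial \ln p_i}
            = \frac{\partial \ln J_{\mathrm{FBA}}}{\partial \ln p_i}
            - \frac{\partial \ln J_{\mathrm{ZWF}}}{\partial \ln p_i}
            = C^{J_{\mathrm{FBA}}}_{p_i} - C^{J_{\mathrm{ZWF}}}_{p_i}
```

Under this definition, an enzyme has positive control over the splitting ratio when it raises the aldolase flux proportionally more than the ZWF flux, and negative control in the opposite case.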
The ATP/ADP ratio and the redox potential (NADH/NAD) are important factors in metabolic engineering, as adenylate cofactors and pyridine nucleotides are involved in many reactions. Our analysis suggests that the control over these quantities is distributed differently depending on the compartment of the cell. More specifically, a group of enzymes, i.e. HXT, PDC, external NADH dehydrogenase (NDH), ATP synthase and CO2 transport, have positive control, and ATPM and ADP/ATP carrier protein have negative control, over the ATP/ADP ratio in mitochondria (Fig. 6a). By contrast, in the cytosol the positive control over the ATP/ADP ratio is primarily from HXT, whereas the negative control is shifted to PDC, pyruvate dehydrogenase and NH4t (Fig. 6b). Similarly, the major positive control over redox potential in mitochondria (Fig. 6c) is in glucose-6-phosphate isomerase (PGI), and the negative control is distributed between ZWF, CO2 and 6-phospho-d-glucono-1,5-lactone lactonohydrolase (GND1). In contrast, in the cytosol the largest positive control coefficients of energy charge are those with respect to PGI and ATPM, whereas HXT and HXK appear to have the most important negative control. This interesting connection between the redox potential in the mitochondria and the activities in upper glycolysis and the pentose phosphate pathway can be identified and explained only through the application of ORACLE.
We have also analysed how control over the ethanol yield from glucose is distributed, ethanol being one of the most important industrial products. Although ATPM and HXT have the major positive and negative effects, respectively (Fig. 7), the control coefficients are very small in magnitude: even for the most significant enzymes the mean value is near 0.1, suggesting that the activities of multiple enzymes should be altered to increase ethanol yield effectively.
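A back-of-the-envelope calculation shows why a control coefficient near 0.1 implies that a single target is unlikely to suffice. It assumes the first-order (log-linear) relation of metabolic control analysis, which is strictly valid only for small changes; the symbols Y (ethanol yield) and p_i (activity of enzyme i) and the twofold change are illustrative choices, not results from this study.

```latex
\Delta \ln Y \approx \sum_i C^{Y}_{p_i}\, \Delta \ln p_i ,
\qquad
C^{Y}_{p_i} = 0.1,\quad \frac{p_i^{\mathrm{new}}}{p_i} = 2
\;\Rightarrow\;
\Delta \ln Y \approx 0.1 \times \ln 2 \approx 0.07
```

Within this approximation, doubling the activity of one such enzyme would change the yield by only about 7%, whereas simultaneous changes in several enzymes add up through the sum.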
The integration of regulatory constraints will be the next major advancement in the area of metabolic modelling in yeast. Past and ongoing work in the Palsson and Nielsen laboratories is advancing developments in this area rapidly. An interesting approach could come from the combination of concepts from the work of Patil & Nielsen (2005) and Hasunuma et al. (2011). Such an approach will provide important missing links for the development of kinetic models.
Ultimately, a kinetic, nonlinear model is the goal. Although a number of publications claim such models, they all face numerous limitations that have not been adequately addressed. We should always keep in mind the proverbial quote from Manfred Eigen: ‘A theory has only the alternative of being right or wrong. A model has a third possibility: it may be right, but irrelevant.’ The relevance of the mathematical models in yeast will be evaluated from their contribution to the advancement in our understanding of disease and to the accelerated development of industrial strains. Although there is considerable research evidence in these areas, successful resolution of some of the issues discussed in this article will enhance and broaden the impact of mathematical modelling in yeast research.
We acknowledge the help of Christen Stefan and Professor Uwe Sauer with providing details on their published data. K.C.S. was supported by the Swiss National Science Foundation. L.M. and V.H. were supported by funding from Ecole Polytechnique Fédérale de Lausanne (EPFL) and NEMO for Bioethanol, an EU FP7 Programme. Support was also provided from SystemsX.ch, The Swiss Initiative in Systems Biology, through project MetaNetX.
Names of metabolites and reactions of the network in Fig. 2: HXT, hexose transporters for glucose; HXK, hexokinase; PGI, glucose-6-phosphate isomerase; PFK, phosphofructokinase; FBA, fructose-bisphosphate aldolase; TPI, triose phosphate isomerase; TDH, glyceraldehyde-3-phosphate dehydrogenase; PGK, phosphoglycerate kinase; GPM, phosphoglycerate mutase; ENO, enolase; PYK, pyruvate kinase; ZWF, glucose-6-phosphate-1-dehydrogenase; RKI, ribose-5-phosphate isomerase; RPE, ribulose-5-phosphate 3-epimerase; TKL1, transketolase; TKL2, transketolase; TAL, transaldolase; PDC, pyruvate decarboxylase; ALD, aldehyde dehydrogenase; ACS, acetyl-CoA synthase; CAT, carnitine o-acetyltransferase; ACARtrans, acetylcarnitine diffusion; YAT, carnitine o-acetyltransferase; CARtrans, carnitine diffusion; PYRtrans, pyruvate carrier; PDA, pyruvate dehydrogenase; PYC, pyruvate carboxylase; PCK, phosphoenolpyruvate carboxykinase; OAtrans, oxaloacetate carrier; MAE, malic enzyme; CIT, citrate synthase; ACO, aconitase; IDH, isocitrate dehydrogenase; KGD, α-ketoglutarate dehydrogenase; LSC, succinate-CoA ligase; SDH, succinate dehydrogenase; FUM, fumarase; MDH, malate dehydrogenase; NDH, external NADH dehydrogenase; NDI, NADH dehydrogenase; NDR, NADPH reductase; QCR, ubiquinol cytochrome C reductase; COX, cytochrome C oxidase; ASN, ATP synthase; AAC, ADP/ATP carrier protein; ADK, adenylate kinase; ATPmt, ATP maintenance; ADH, cytosolic alcohol dehydrogenase; SCD, succinate dehydrogenase (ubiquinone-6), mitochondrial; ACET, acetate diffusion; COH, carbonic acid hydro-lyase; PPP, pyrophosphate phosphohydrolase; MLPIT, malate transport, mitochondrial; ICL, isocitrate glyoxylate-lyase; MLS, l-malate glyoxylate-lyase (CoA-acetylating); MDHc, (S)-malate:NAD+ oxidoreductase; CITc, citrate oxaloacetate-lyase cytosolic; ACOc, citrate hydro-lyase cytosolic; LACm2r, d-lactate transport, mitochondrial; CITt2m, citrate transport, mitochondrial; LDH, (R)-lactate:ferricytochrome-c 2-oxidoreductase; O2m, O2 transport (diffusion); CO2m, CO2 transport (diffusion), mitochondrial; PIm, phosphate transporter, mitochondrial; CO2t, CO2 transport via diffusion; GLYCt, glycerol transport in/out via diffusion reversible; PYRst, pyruvate transport via proton symport; SO4t, sulfate transport via proton symport; Pit, phosphate transport via proton symport; O2t, O2 transport via diffusion; NH4t, ammonia transport via diffusion; LACt2r, d-lactate transport via proton symport; SUCCt2r, succinate transporter in/out via proton symport; MALt2r, l-malate transport in via proton symport; GND1, 6-phospho-d-glucono-1,5-lactone lactonohydrolase; GND2, 6-phospho-d-gluconate:NADP+ 2-oxidoreductase (decarboxylating); GPD1, glycerol-3-phosphate:NAD+ 2-oxidoreductase; GPD2, glycerol-3-phosphate phosphohydrolase.