Department of Biomedical Engineering, University of Virginia, Health System, Charlottesville, VA, USA
Corresponding author. Department of Biomedical Engineering, University of Virginia, Box 800759, Health System, Charlottesville, VA 22908, USA. Tel.: +1 434 924 8195; Fax: +1 434 982 3870; E-mail: firstname.lastname@example.org
The availability and utility of genome-scale metabolic reconstructions have exploded since the first genome-scale reconstruction was published a decade ago. Reconstructions have now been built for a wide variety of organisms, and have been used toward five major ends: (1) contextualization of high-throughput data, (2) guidance of metabolic engineering, (3) directing hypothesis-driven discovery, (4) interrogation of multi-species relationships, and (5) network property discovery. In this review, we examine the many uses and future directions of genome-scale metabolic reconstructions, and we highlight trends and opportunities in the field that will make the greatest impact on many fields of biology.
Biochemistry has long been occupied with the reconstruction of metabolic pathways. With modern genome-sequencing capabilities, these pathway reconstructions have been increasingly integrated into genome-scale metabolic models. Ten years ago, a metabolic model of Haemophilus influenzae became the first genome-scale metabolic reconstruction to be published (Edwards and Palsson, 1999). In the decade since, the field of genome-scale metabolic network analysis has expanded rapidly, and today >50 genome-scale metabolic reconstructions have been published (see Figure 1A). With the growing influence of these reconstructions on biomedical and biological research, and with the field now shifting from an inward focus on method development to an outward focus on application development, it is timely to review the various uses of genome-scale metabolic reconstructions and the future potential of systems-based analyses of metabolism.
Of all organisms that have been analyzed through a constraint-based metabolic reconstruction, Escherichia coli has gained the most attention as a model organism. As the applications of E. coli genome-scale models have been reviewed earlier in detail (Feist and Palsson, 2008), we specifically exclude E. coli from this review, and instead focus on the many other target organisms that have been studied.
A survey of papers citing metabolic reconstructions over the last 10 years revealed several themes in uses of the reconstructions. Following these themes, this review assigns uses of genome-scale metabolic reconstructions (metabolic GENREs; Becker and Palsson, 2005) to five major categories: (1) contextualization of high-throughput data, (2) guidance of metabolic engineering, (3) directing hypothesis-driven discovery, (4) interrogation of multi-species relationships, and (5) network property discovery (see Figure 2). These categories were chosen because they cover the majority of topics that have been addressed using metabolic GENREs, and they each represent a significantly different aim of these models. Areas of research not covered in these categories represent directions which in general have not been as well developed, and which may be possible avenues for future study. Each section describes the importance and historical context of that use, ways that metabolic GENREs have accelerated the knowledge gained in the given field, and finally the drawbacks or current hurdles in applying metabolic GENREs to the given problem. These applications include both applied and theoretical approaches, and represent a broad range of problems that have been engaged using genome-scale metabolic reconstructions.
Model building and analysis
Methods for developing metabolic reconstructions have been reviewed in the past (Reed et al, 2006; Durot et al, 2009; Feist et al, 2009) and several resources exist for model building and analysis (Becker et al, 2007), so those methods are not presented in detail here. Although the majority of this review focuses on uses of completed metabolic GENREs, we will nevertheless quickly overview the reconstruction process itself to highlight ways in which it can be intrinsically useful.
To date, all high-confidence genome-scale metabolic reconstructions have been built manually through a four-step process (Oberhardt et al, 2008). First, an initial reconstruction is built from gene-annotation data coupled with information from online databases such as KEGG (Kanehisa et al, 2006) and EXPASY (Gasteiger et al, 2003), which link known genes to functional categories and help bridge the genotype–phenotype gap. Second, the initial reconstruction is curated through an examination of the primary literature. Then, the reconstruction as a knowledge base is converted into a mathematical model that can be analyzed through constraint-based approaches. Third, the reconstruction is validated through comparison of model predictions to phenotypic data. In a final fourth step, a metabolic reconstruction is subjected to continued wet- and dry-lab cycles, which improve accuracy and allow investigation of key hypotheses.
Generally, a reconstruction includes semi-automated gene-annotation data based on BLAST-homology scores from a sequenced genome, augmented by detailed, manually collected data from organism-specific literature. One of the most immediate contributions of metabolic GENREs to biological knowledge comes in the process of gap analysis during model building, whereby formerly un-annotated gene functions are incorporated into gene-annotation knowledge by analysis of incomplete but essential metabolic pathways (Gonzalez et al, 2008; Oberhardt et al, 2008; Chavali et al, 2008b). The gap-analysis process can be beneficial both by stimulating literature searches that reveal previously overlooked phenotypic data and by posing hypotheses for enzymes that likely exist in the organism but for which no corresponding gene is currently annotated. Aside from offering hypotheses for future experiments, this process serves to crystallize the work done on a particular organism and highlight major areas still left for investigation. The gap-analysis step is also crucial for conversion of a genome-scale reconstruction as a knowledge base into the metabolic GENRE as a functional model, toward whose analysis the full suite of network tools can be applied (Manichaikul et al, 2009).
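One simple form of gap analysis can be sketched computationally. The example below is a minimal illustration, not the procedure used in any of the cited studies: it assumes a hypothetical toy network in which every reaction is irreversible and is stored as a dictionary of stoichiometric coefficients, and it flags 'dead-end' metabolites that are either never produced or never consumed. All reaction and metabolite names are invented.

```python
from collections import defaultdict

# Hypothetical toy network: reaction -> {metabolite: stoichiometric coefficient}.
# Negative coefficients are consumed, positive are produced.
# Assumption: every reaction is irreversible.
TOY_NETWORK = {
    "uptake_A": {"A": 1},
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},   # C is never consumed
    "R3": {"D": -1, "E": 1},   # D is never produced
    "export_E": {"E": -1},
}


def find_dead_ends(network):
    """Return (never_produced, never_consumed) metabolite lists."""
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for rxn, stoich in network.items():
        for met, coeff in stoich.items():
            if coeff > 0:
                producers[met].add(rxn)
            elif coeff < 0:
                consumers[met].add(rxn)
    metabolites = set(producers) | set(consumers)
    never_produced = sorted(m for m in metabolites if not producers[m])
    never_consumed = sorted(m for m in metabolites if not consumers[m])
    return never_produced, never_consumed


never_produced, never_consumed = find_dead_ends(TOY_NETWORK)
print("never produced:", never_produced)
print("never consumed:", never_consumed)
```

In a real reconstruction, each flagged metabolite would prompt either a literature search or a hypothesis about a missing enzyme, in line with the two benefits of gap analysis noted above.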
A completed metabolic GENRE can be used for a variety of applications. Many of the constraint-based analyses possible on metabolic GENREs rely on the theory that evolution selects for fitness-optimizing organisms, a concept crystallized with the development of flux balance analysis (FBA) (Maynard Smith, 1978; Lee et al, 2006). FBA involves optimization of a network for a given objective function, often a ‘biomass’ reaction, to predict in silico flux values and/or growth. This optimization process outputs an optimal set of metabolic flux values that are consistent with maximization (or minimization) of the chosen objective. Another class of constraint-based approaches, termed pathway analysis methods, enables an accounting of all possible flux pathways in a given network, and has been used to examine network properties such as flexibility and variability of flux distributions (Papin et al, 2004). These techniques underlie most model validation performed on metabolic GENREs, and have opened up a wide variety of applications and new directions for model use.
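In its most common form, the FBA optimization described above is posed as a linear program under a steady-state assumption. The formulation below is a generic sketch rather than the setup of any particular study; the symbols S (stoichiometric matrix), v (flux vector), c (objective weights, for example selecting the biomass reaction), and the flux bounds are notation introduced here for illustration.

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for every reaction } j .
\end{aligned}
```

The equality constraint states that each internal metabolite is produced as fast as it is consumed, and the bounds encode reaction reversibility and measured or assumed uptake limits.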
Currently available reconstructions
Metabolic GENREs of prokaryotes encompass an average of 600 metabolites, 650 genes, and 800 reactions, whereas metabolic GENREs of eukaryotes include on average 1200 metabolites, 1000 genes, and 1500 reactions. Excluding the two existing reconstructions of Homo sapiens metabolism lowers the average eukaryotic network size to 800 metabolites, 800 genes, and 1300 reactions, which is closer to, but still larger than, the average prokaryotic network (see Figure 1B–D). Depending on whether the mouse, human, and Arabidopsis thaliana metabolic reconstructions are included in the statistic, between 6 and 13% of all ORFs in a eukaryotic genome are generally included in a metabolic GENRE, whereas metabolic GENREs of prokaryotes include on average 18% of all ORFs. Existing reconstructions span the domains Eukaryota, Bacteria, and Archaea. The most represented domain is Bacteria, with 25 species reconstructed. The phylogenetic tree in Figure 3 reveals a conspicuous lack of plant metabolic reconstructions, with a preliminary reconstruction of A. thaliana (Radrich et al, http://hdl.handle.net/10101/npre.2009.3309.1) as the only plant metabolic GENRE released so far. This gap indicates an important direction for future efforts.
Category 1: Contextualization of high-throughput data
With biology increasingly becoming a data-rich field, an emerging challenge has been determining how to organize, sort, interrelate, and contextualize all of the high-throughput datasets now available. This challenge has motivated the field of top–down systems biology, wherein statistical analyses of high-throughput data are used to infer biochemical network structures and functions. In top–down modeling, determination of network structure poses a major technological and computational hurdle (Stark et al, 2003). However, many of the weaknesses of top–down modeling, such as lower accuracy and confidence in the resulting models, can be alleviated by comparison or merging with carefully built bottom–up models, such as metabolic GENREs. By serving as a framework on which other data types can be overlaid, the metabolic reconstruction has served as a powerful tool for contextualizing high-throughput data and aiding top–down approaches, as described below.
High-throughput data can be overlaid onto a metabolic GENRE in several ways. One highly functional way to use a metabolic GENRE for contextualization of gene expression data, protein expression data, 13C flux data, or high-performance liquid chromatography-derived byproduct secretion profiles is by directly imposing constraints on the GENRE based on the values in the experimental dataset. For instance, if an experimental dataset indicates that glycolytic enzymes are highly active under a given condition, flux can be funneled through glycolysis by constraining the relevant fluxes in silico, thereby forcing flux through the activated reactions and allowing evaluation of changes in global flux distributions. Gene microarray data can be similarly used to constrain metabolic fluxes and can give tremendous insight into conditional changes in metabolic activity (Shlomi et al, 2008), despite an imperfect correlation between gene expression and protein expression (Ideker et al, 2001; Chechik et al, 2008). In addition to altering the constraints in a functioning metabolic model, high-throughput data can also simply be overlaid on a metabolic network to foster insight into metabolic hotspots or pathways that are significantly altered under certain conditions (Usaite et al, 2006). This process can elucidate otherwise inscrutable relationships between data points. Other types of high-throughput data can be analyzed in the context of a metabolic GENRE as well, including rapid phenotyping data (e.g. BIOLOG phenotype microarrays, which test metabolic phenotypes of cells under thousands of growth conditions simultaneously), and whole or partial-genome gene essentiality data (Oberhardt et al, 2008). These data describe physiological states, which can directly be compared with in silico phenotypes, often in a qualitative or binary way: an organism grows on substrate X or it does not.
In this sense, contextualization of these data represents a refinement of traditional taxonomic methods, which often use growth profiles of an organism for identification but generally lack mechanistic details or explanations for the majority of assayed phenotypes (Boone et al, 2001).
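The binary comparison of observed and predicted growth described above can be sketched as a simple tally. The example is illustrative only: the substrate names and growth calls are invented, and the in silico calls are assumed to come from a growth simulation performed beforehand.

```python
# Hypothetical growth calls (True = growth) for the same set of substrates.
in_vivo = {"glucose": True, "acetate": True, "citrate": False, "xylose": False}
in_silico = {"glucose": True, "acetate": False, "citrate": False, "xylose": True}


def compare_phenotypes(observed, predicted):
    """Tally agreement between observed and predicted growth calls."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    # Only substrates present in both datasets are compared.
    for substrate in observed.keys() & predicted.keys():
        obs, pred = observed[substrate], predicted[substrate]
        if obs and pred:
            counts["TP"] += 1   # growth observed and predicted
        elif not obs and not pred:
            counts["TN"] += 1   # no growth observed or predicted
        elif pred:
            counts["FP"] += 1   # growth predicted but not observed
        else:
            counts["FN"] += 1   # growth observed but not predicted
    return counts


counts = compare_phenotypes(in_vivo, in_silico)
accuracy = (counts["TP"] + counts["TN"]) / sum(counts.values())
print(counts, "accuracy:", accuracy)
```

Disagreements are often the informative part of such a tally: growth that is observed but not predicted may point to a missing pathway in the reconstruction, whereas predicted growth that is not observed may point to regulation or other constraints the model lacks.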
Metabolic reconstructions have been particularly useful as context for gene expression data. Many examples exist of gene microarrays being used in conjunction with genome-scale metabolic reconstructions to give a deeper understanding of why certain changes in expression occur in different environments. In particular, Saccharomyces cerevisiae has been used as a model organism for this type of analysis. A comparison of in silico metabolic fluxes versus microarray gene expression data in E. coli and S. cerevisiae revealed that metabolic genes whose fluxes are directionally coupled generally show similar expression patterns, share transcriptional regulators, and reside in the same operon (Notebaart et al, 2008). Expression data have also been coupled with various generations of S. cerevisiae metabolic reconstructions to determine which portions of metabolism are most sensitive to nitrogen limitation (Usaite et al, 2006) and to compare metabolic states during growth on glucose, maltose, ethanol, and acetate (Daran-Lapujade et al, 2004). In these studies, expression states of metabolic genes were overlaid on the reactions their protein products catalyze, and expression patterns of metabolic enzymes were then compared against the fluxes predicted in silico under the given growth condition. Without a model on which to overlay these expression data, it would be difficult to characterize the global expression states. In another striking example, a metabolic model of S. cerevisiae was augmented with 55 regulatory transcription factors regulating 348 metabolic genes to form a regulatory-metabolic network (Herrgard et al, 2006). From an initial regulatory network, ChIP-chip and binding-site motif data were used to expand the regulatory rule-set, and this expanded network was shown to have higher predictive power of gene expression when evaluated with 12 microarray datasets.
The use of a regulatory-metabolic model to predict gene expression changes is a powerful direction for further research with metabolic reconstructions, one which pushes closer toward modeling the function of an entire cell (Lee et al, 2008b).
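Overlaying gene expression values on the reactions their protein products catalyze, as in the studies described above, requires a rule for combining genes into a per-reaction value. The sketch below uses one common convention as an assumption, not a method taken from the cited papers: subunits of an enzyme complex are combined with a minimum, and alternative isozymes with a maximum. The gene names, reaction names, and expression values are hypothetical.

```python
# Hypothetical gene-protein-reaction (GPR) rules. Each reaction maps to a list
# of alternative enzymes (isozymes); each enzyme is a list of subunit genes.
GPR = {
    "PGI": [["g1"]],               # single gene
    "PFK": [["g2"], ["g3"]],       # two isozymes
    "SUCDH": [["g4", "g5"]],       # one two-subunit complex
}

# Hypothetical expression values (arbitrary units).
expression = {"g1": 8.0, "g2": 2.5, "g3": 6.0, "g4": 7.0, "g5": 1.5}


def reaction_scores(gpr, expr):
    """Map gene expression onto reactions.

    Assumed convention: a complex is limited by its least expressed subunit
    (min), and a reaction takes the best of its alternative enzymes (max).
    """
    return {
        rxn: max(min(expr[gene] for gene in enzyme) for enzyme in enzymes)
        for rxn, enzymes in gpr.items()
    }


scores = reaction_scores(GPR, expression)
print(scores)
```

The resulting per-reaction values can then be displayed on a network map or compared against fluxes predicted in silico.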
High-throughput technologies to determine the intracellular metabolic state of cells have also been aided by the development of metabolic GENREs. Intracellular metabolic fluxes can be determined through the use of 13C-labeled glucose experiments, in which labeled carbon is tracked during growth of cells in a chemostat culture and computational methods are used to reconstruct the paths that carbon took inside the cells during growth. Although 13C isotopomer tracking has been performed without the aid of a metabolic GENRE, the comprehensive coverage of metabolic pathways enabled by the genome-scale reconstructions has made these attractive frameworks for 13C tracking experiments (Vo et al, 2007; Panagiotou et al, 2008). Metabolic GENREs have also been used as frameworks for interpreting metabolite concentration data. In one study (Cakir et al, 2006), a high-throughput GC-MS method was used to determine concentrations of 52 metabolites in S. cerevisiae. Differences in metabolite concentrations under known environmental conditions were mapped onto a modified S. cerevisiae metabolic GENRE, and this mapping was then combined with transcriptome data to investigate the effectors of metabolic regulation in the cell.
In many cases such as the one highlighted above, multiple high-throughput data types are analyzed in concert through the framework of the metabolic GENRE, which allows for a highly integrated picture of cell function to emerge. Transcriptomic data in particular are often linked with other data types, such as protein expression data (Shlomi et al, 2008), protein–protein interaction data, protein–metabolite interaction data, and physical interaction data (Prinz et al, 2004). Particularly in light of multiple data types, the metabolic GENRE can be a valuable tool for data interpretation. For instance, multiple data types were used in concert with a metabolic GENRE to determine tissue-specific metabolic activities in H. sapiens (Shlomi et al, 2008) and to compare the filamentous form versus the yeast form of S. cerevisiae (Prinz et al, 2004).
Metabolic GENREs are natural frameworks for contextualizing genome-scale data, and as a result there have been many studies that use metabolic GENREs in this manner. High-throughput data have even been used to aid in building metabolic GENREs; for instance, the fact that some mitochondrial genes lie outside the mitochondrion made proteomic data key to building a model of human mitochondrial metabolism, in combination with biochemical data from literature (Vo et al, 2004). However, a major challenge still lies in determining an optimal strategy for interpreting high-throughput data. For instance, while a link has been established between expression of the gene for a metabolic enzyme and the bounds on metabolite flux through that enzyme (Chechik et al, 2008), establishing a scalable, reliable heuristic for bounding reaction fluxes using transcriptomic or proteomic data remains an unanswered challenge in the field. Noise also plays a role in obscuring the relationship between high-throughput data and flux-related phenotypes. Particularly in situations where multiple noisy high-throughput datasets are considered at once, the question arises of how to integrate all of the data into one cohesive mathematical framework. These difficulties will have to be addressed through development of rigorous quality control measures for standardizing data analysis in the future.
Category 2: Guidance of metabolic engineering
Metabolic engineering involves the use of recombinant DNA technology to selectively alter cell metabolism and improve a targeted cellular function (Bailey et al, 1990). Traditionally, metabolic engineering has been performed on a small scale through manipulation of a few genes to affect yield of a target metabolite. Enzymatic targets are chosen through analysis of literature-derived central metabolic pathway maps, or intuitive engineering based on local metabolic knowledge. These local approaches have yielded success in the past, enabling engineering of new metabolic pathways and improvement of existing processes in E. coli (Bailey et al, 1990), S. cerevisiae (Nevoigt, 2008), and other microorganisms (Park and Lee, 2008). However, the complexity of metabolic networks, compounded by multiple layers of transcriptional, protein, and substrate-level regulation of metabolic enzymes, renders predictable metabolic engineering extremely difficult, and often causes unwanted consequences or sub-optimal outcomes when local network maps or intuitive knowledge are the basis of engineering decisions (Kim et al, 2008; Nevoigt, 2008). The inherent drawbacks of using local analysis tools to guide cell-scale metabolic engineering efforts have motivated the use of metabolic GENREs and other genome-scale technologies, in what has been termed ‘systems metabolic engineering’ (Park and Lee, 2008). The use of metabolic GENREs represents a major evolution for the field, wherein whole-cell networks and systems-level analyses are for the first time being leveraged to determine optimal engineering strategies on a whole-cell basis (Park and Lee, 2008).
Because of the industrial importance and metabolic centrality of TCA intermediates such as malic acid and succinic acid, many recent metabolic engineering efforts have focused on increasing production of these metabolites. In a recent study, S. cerevisiae was engineered to produce 59 g/l of malate, an amount five times higher than earlier efforts (Zelle et al, 2008). This remarkable improvement was validated by 13C-NMR flux determination, using a metabolic GENRE as the basis for the 13C flux model. Several other examples of metabolic GENRE-guided metabolic engineering involve genome-scale reconstructions of the succinic acid producing bacterium Mannheimia succiniciproducens. In the initial publication of the M. succiniciproducens genome sequence, a constraint-based metabolic model was presented including 373 reactions and 352 metabolites (Hong et al, 2004). This model was used to compare metabolic flux distributions between M. succiniciproducens and E. coli, and to identify a combination of three pyruvate-forming enzymes whose removal from E. coli would likely increase succinic acid production. Genetic engineering efforts targeting those three genes were successful, and a succinic acid producing strain of E. coli was generated (Lee et al, 2008c). Later, an expanded metabolic GENRE was published for M. succiniciproducens, including 686 reactions and 519 metabolites (Kim et al, 2007). This expanded reconstruction, which was used to predict succinic acid production in a variety of experimental circumstances, is currently being used to further investigate the metabolic capabilities and guide engineering strategies in M. succiniciproducens (Lee et al, 2008c).
Metabolic GENREs have been used to guide other types of genetic engineering efforts aside from increasing production of value-added chemicals. In one study, an algorithm called OptKnock (Burgard et al, 2003) was applied to the metabolic GENRE of Geobacter sulfurreducens to determine optimal gene knockouts to maximally increase respiration rates (Izallalen et al, 2008). G. sulfurreducens is a bacterium whose ability to oxidize organic compounds using metals as terminal electron acceptors has made it highly attractive for bioremediation efforts. The OptKnock analysis predicted that increasing ATP demand would increase NADPH oxidation rates, and subsequent alteration of membrane-bound F0F1 ATP synthase achieved the predicted increase in G. sulfurreducens respiration rate (Izallalen et al, 2008). Metabolic GENRE-guided genetic engineering has also been used to aid the scale-up for bulk production of a vaccine against the pathogen Neisseria meningitidis (Baart et al, 2007a). These examples serve to highlight the various ways in which metabolic GENREs have aided genetic engineering by casting the engineering efforts in the context of the whole-cell metabolism of the organism being studied.
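Knockout-design methods of the OptKnock type are usually written as a bilevel optimization, in which an outer problem chooses the knockouts and an inner problem lets the cell optimize its own objective. The formulation below is a schematic sketch under assumed notation, not a transcription of the published algorithm: y_j is a binary variable that is 0 when reaction j is removed, K is the maximum number of knockouts allowed, and v_target stands for whichever flux is to be increased (a product secretion rate or, as in the study above, a respiration-related flux).

```latex
\begin{aligned}
\max_{y} \quad & v_{\text{target}} \\
\text{subject to} \quad
& v \in \arg\max_{v} \left\{ v_{\text{biomass}} \;:\; S\,v = 0,\;\;
  y_j\, v^{\min}_j \le v_j \le y_j\, v^{\max}_j \;\; \forall j \right\}, \\
& \sum_{j} (1 - y_j) \le K, \qquad y_j \in \{0, 1\} \;\; \forall j .
\end{aligned}
```

The nesting expresses the design goal: the engineered strain should raise the target flux as a consequence of pursuing its own growth optimum, rather than only when forced to.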
Although metabolic GENREs are powerful tools of great utility for metabolic engineering, issues such as pleiotropy (a single gene that affects multiple phenotypic traits), unaccounted for or inactive isozymes, and mis-annotation of critical genes can weaken the efficacy of computational predictions in determining engineering targets. In addition, it is sometimes not clear whether the considerable effort it takes to build a metabolic reconstruction is a good investment when developing a metabolic engineering strategy, as issues such as allosteric enzyme regulation can necessitate detailed dynamic modeling of a specific pathway to accurately predict phenotypes (Stephanopoulos and Vallino, 1991; Contador et al, 2009). Still, even when such approaches are necessary for the engineering process, metabolic GENREs are uniquely capable of predicting secondary effects of a given metabolic perturbation on other, often nonobvious portions of metabolism (Nevoigt, 2008), and therefore nearly always have the potential to be useful in developing such strategies. Further, kinetic constants for metabolic interactions can be extremely difficult to ascertain, so constraint-based modeling remains an attractive alternative to these methods (Contador et al, 2009). Another hurdle in metabolic engineering is the importance of transcriptional regulation in determining metabolic phenotypes. Regulatory genes are often important targets in metabolic engineering efforts due to their primary function in determining the distribution of metabolic flux (Bailey et al, 1990). With the inclusion of some regulatory rules, the usefulness of a metabolic GENRE can be significantly increased for guiding these efforts. However, regulatory networks are generally less well characterized than metabolic networks, and tend to be far more species and strain specific (Herrgard et al, 2004). 
This makes it difficult to reliably predict good metabolic engineering targets, as some crucial regulatory information is generally unknown for a given organism. Therefore, an increased effort into the reconstruction and analysis of regulatory networks will be of major utility for GENRE-guided metabolic engineering efforts in the future.
Category 3: Directing hypothesis-driven discovery
Much of what is known in biology today is the result of meticulous, hypothesis-driven discovery. This research has been guided by heuristic, informal models of biology developed in the minds of experts during years of work in a particular field or on a particular problem. However, with the tremendous expansion of biological data in recent years, the need has arisen for new method development to integrate high-throughput data with the biological discovery process. Gene microarrays serve as a prime example; a traditional hypothesis-driven study might include examination of 1 or 2 genes in a microarray that are of particular interest. This approach would ignore the thousands of other genes on the chip, however, and could miss important information or trends embedded in those data. Therefore, a systematic framework for incorporating genome-scale data available from multiple high-throughput methods would allow hypothesis-driven biology to benefit from the full range of tools available today.
Metabolic GENREs represent concise collections of existing hypotheses, and taken together as a broad context they enable systematic identification of new hypotheses that can be tested and resolved. Therefore, they represent a crucial framework for incorporating the flood of biological data now available into the biological discovery process.
Beyond computational predictions, metabolic GENREs have been used to frame investigations into specific biological questions, using a mix of traditional biological approaches and computational systems-level thinking. This type of mixed analysis has been used extensively in G. sulfurreducens to determine pathway usage in redundant cellular systems. In one study, metabolic GENRE-derived flux predictions were compared with growth phenotypes of G. sulfurreducens to elucidate which of eight sets of redundant pathways are used in vivo (Segura et al, 2008), and several functions were shown to be carried out by only one gene despite the existence of seemingly redundant pathways. This finding bolstered an earlier study, in which it was shown through computational analysis and in vivo deletion studies that isozymes are often not perfectly redundant (Harrison et al, 2007). Another study used 13C-labeling to track fluxes through a known threonine-associated isoleucine biosynthetic pathway in G. sulfurreducens (Risso et al, 2008). With the discovery that only a fraction of isoleucine is generated through this pathway, a putative citramalate synthase gene was tested and shown through knockout experiments to comprise part of a previously uncharacterized citramalate-associated isoleucine synthesis pathway, which produces the majority of isoleucine in the cell. The initial search for the citramalate synthase gene was motivated by model simulations in which addition of a citramalate pathway significantly improved agreement of experimental and predicted isotopomer distributions in the 13C experiment (Risso et al, 2008).
Some biological questions investigated using metabolic GENREs involve cellular-level phenomena difficult to approach without a whole-cell model of metabolism. In one analysis, the proximity of in vivo transposon-disrupted genes to downstream essential genes was compared with the in silico essentiality of the transposon-disrupted genes using a metabolic GENRE of Pseudomonas aeruginosa (Oberhardt et al, 2008). It was shown that transposon inserts in one published genome-wide transposon study affected downstream genes, whereas transposon inserts in another published study did not, consistent with claims in the two studies. In another analysis, the transcriptional timing of metabolic genes in defined sub-networks of a metabolic GENRE of S. cerevisiae was studied using time courses of transcriptomic and proteomic data, as well as protein binding affinity data from ChIP-chip assays (Chechik et al, 2008). This analysis suggested that under relatively static environmental conditions, metabolism is primarily controlled through protein-level regulation, whereas during times of environmental change, transcriptional control guides metabolic function. Transcriptional control has also been studied by identification of ‘reporter metabolites,’ which represent the most highly transcriptionally regulated metabolites in a system (Patil and Nielsen, 2005; Raghevendran et al, 2006; David et al, 2008). Like the analysis of transcriptional timing in S. cerevisiae, reporter metabolites represent a marriage of metabolic GENRE-network analysis and traditional biological investigation, and they enable study of functional phenomena in cells that would be difficult to assess otherwise.
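The idea behind reporter metabolites can be sketched as a scoring exercise: each metabolite is scored by combining the differential-expression significance of the enzymes that act on it. The example below is a simplified illustration under stated assumptions, not a reimplementation of the cited method. It assumes per-enzyme p-values are available, converts them to Z-scores with the inverse normal distribution, and combines the k neighbors of each metabolite as the sum of Z-scores divided by the square root of k. Any correction against a background distribution is omitted, and all names and values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical metabolite -> neighboring enzymes (those that produce or consume it).
neighbors = {
    "pyruvate": ["e1", "e2", "e3"],
    "citrate": ["e3", "e4"],
}

# Hypothetical p-values for differential expression of each enzyme's gene.
p_values = {"e1": 0.001, "e2": 0.01, "e3": 0.5, "e4": 0.8}


def aggregate_z(neighbors, p_values):
    """Score each metabolite by the combined Z-score of its neighboring enzymes."""
    norm = NormalDist()
    # Small p-values map to large positive Z-scores.
    z = {enzyme: norm.inv_cdf(1.0 - p) for enzyme, p in p_values.items()}
    return {
        met: sum(z[e] for e in enzymes) / sqrt(len(enzymes))
        for met, enzymes in neighbors.items()
    }


scores = aggregate_z(neighbors, p_values)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Metabolites at the top of such a ranking are those around which transcriptional changes are concentrated, which is what makes them candidates for further experimental study.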
Metabolic GENREs intrinsically represent a simplification of cellular function. The distinct biochemical networks categorized by scientists (e.g. metabolism, regulation, and signaling) blend together in a living cell, creating a far more complicated web of interactions than is convenient or possible to model (Featherstone and Broadie, 2002). This web is fundamentally stochastic, and co-habits the cell with many other simultaneous phenomena including transcription and translation, protein modification, cell division, adhesion, motility, and mechanical transduction of external forces. The very simplifications that make metabolic GENREs powerful tools also make them challenging to use for the study of totally unknown or novel phenomena.
Ostensibly, these challenges would limit the usefulness of metabolic GENREs to analyzing purely metabolic processes and refining our knowledge of already characterized cellular systems. However, the examples highlighted above in this section suggest otherwise. Metabolic GENREs enable integration of large datasets for analysis of whole-cell phenotypes, and when wielded effectively, these analyses can be targeted to answer profound questions in biology. The key to unlocking the potential of metabolic GENREs is to ask tractable questions, and to understand well the limitations of the technology used to determine the answers.
Category 4: Interrogation of multi-species relationships
Few cells grow in pure cultures outside of the laboratory, and in many cases it is through the interactions of species that the most interesting phenotypes emerge (Riedel et al, 2001; Filoche et al, 2004; Fernandez et al, 2008). Metagenomics studies particularly have shown most ecosystems to be extremely diverse, including up to thousands of distinct taxa (Vieites et al, 2009). Further, higher-eukaryotic biology necessitates the study of multi-cellular systems, as it inherently focuses on interactions between different cell types. Metabolic GENREs are increasingly being applied to these multi-cell problems, as well as to the study of functional differences between species. These avenues of research offer exciting prospects for deepening our understanding of the workings of multi-cellular communities and bridging the phenotype–genotype gap in the future.
A promising direction for computational systems biology is the incorporation of network-level analysis into the field of comparative genomics, which is currently driven by bioinformatics. Comparative network-level analysis is not completely new, as some of the first metabolic reconstructions generated were compared to discern phenotypic differences between species (Schilling et al, 2002; Forster et al, 2003). However, several studies have emerged in recent years that either compare metabolic reconstructions of highly related species or use models of interacting species to predict communal phenotypes. As more metabolic GENREs become available, these comparisons become more feasible and relevant for a wide variety of organisms. However, most multi-species analyses reported to date have involved either sub-genome-scale metabolic models or models that have not been carefully annotated. For instance, an analysis of the syntrophic bacteria Desulfovibrio vulgaris and Methanococcus maripaludis included creation of a dual-species stoichiometric model, including 170 reactions and 147 metabolites that comprised the central metabolism of both species (Stolyar et al, 2007). This work represents an important step in applying scalable computational methods to mutualistic bacterial communities, but it is focused on central metabolism rather than the genome scale.
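One way to picture a dual-species stoichiometric model is as two single-species networks joined through a shared pool of exchanged metabolites. The sketch below illustrates only that bookkeeping step and makes no claim about how the cited model was built: species tags, reaction names, and stoichiometric coefficients are invented and not biochemically balanced, and metabolites listed in `shared` are assumed to sit in a common extracellular compartment.

```python
def merge_models(models, shared):
    """Combine per-species networks into a single community network.

    Reactions and internal metabolites are prefixed with the species tag so
    they stay distinct; metabolites in `shared` keep their names so that both
    species draw on the same pool.
    """
    community = {}
    for species, network in models.items():
        for rxn, stoich in network.items():
            community[f"{species}:{rxn}"] = {
                (met if met in shared else f"{species}:{met}"): coeff
                for met, coeff in stoich.items()
            }
    return community


# Hypothetical two-species example: sp1 releases hydrogen that sp2 consumes.
models = {
    "sp1": {"ferment": {"lactate_ext": -1, "h2_ext": 2, "atp": 1}},
    "sp2": {"methanogenesis": {"h2_ext": -4, "ch4_ext": 1, "atp": 1}},
}
shared = {"lactate_ext", "h2_ext", "ch4_ext"}

community = merge_models(models, shared)
for rxn, stoich in community.items():
    print(rxn, stoich)
```

Because the shared hydrogen pool appears in both species' reactions, any steady-state analysis of the merged network couples the two organisms: one cannot consume more than the other releases.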
Several multi-species analyses have focused on highly related cell types, attempting to discern differences in metabolic phenotypes based on network analysis. One group developed genome-level reconstructions of four halophilic bacteria, and compared their metabolic phenotypes by highlighting differences in various pathways between the four organisms (Falb et al, 2008). Although the group used computational flux methods in a separate publication to analyze a highly developed metabolic GENRE of a halophilic bacterium (Gonzalez et al, 2008), the comparison study itself included no in silico flux analysis. Another recent study did use in silico flux methods to compare two whole-cell metabolic networks, those of the human fibroblast and the diseased fibroblast suffering from Leigh's syndrome (Vo et al, 2007). The networks were derived from the global human metabolic network reconstruction (Duarte et al, 2007), and 13C-flux, literature-derived phenotypic data, and in silico computed flux states were used to discern differences between the normal and diseased states. Other efforts have included highly automated comparisons between species, sometimes taking many species into account (Verkhedkar et al, 2007; Borenstein et al, 2008; Lee et al, 2009).
Of the five categories of uses of metabolic GENREs described in this paper, multi-species studies have been represented the least in the literature so far. With more genome-scale metabolic models being built and an increased focus on studying multi-cellular systems, however, we anticipate that this field will see a major increase in activity in the coming years. The difficulty and time required to build a well-curated metabolic GENRE are a major bottleneck for these efforts, as a comparison of multiple species or the interactions between a host and a pathogen requires the building of two or more genome-scale models. Also, meaningful modeling of multi-species interactions often necessitates augmentation of a metabolic GENRE with some regulatory and signaling information, which further increases the complexity and difficulty of this type of modeling (de Kievit and Iglewski, 2000).
Along with intracellular signaling and regulatory pathways, spatial and temporal constraints can also become important for interactions between different cell types in a biofilm or tissue, leading to other computational and experimental challenges. Various modeling methods have been proposed to address these issues (Robertson et al, 2007; Chavali et al, 2008a), but coupling metabolic GENREs with models at varying spatiotemporal scales will require novel computational approaches (Jiang et al, 2005; Lee et al, 2008b; Zhang et al, 2009). A merging of metabolic GENREs, regulatory and signaling networks into multi-cell or multi-species models will be a major achievement of systems biology, and will move us closer to the goal of accurately modeling cells in therapeutically and environmentally important contexts.
Category 5: Network property discovery
With the development of powerful molecular biology tools over the last half century, a reductionist mindset has dominated the practice of biology (Singh, 2003). However, scientists have long understood the importance of holistic thinking when approaching biological systems, as complex cellular networks can spawn emergent phenomena that would be undetectable by reductionist approaches (Waliszewski et al, 1998; Westerhoff and Palsson, 2004). Conditionally essential genes (i.e. ‘synthetic lethals’) would be overlooked, for instance, if genes were studied purely in isolation (Scherens and Goffeau, 2004). In recent years, some of the same tools that underpinned reductionist biology have been expanded to high-throughput methods, enabling the development of gene-based holistic network analysis techniques (Westerhoff and Palsson, 2004). Metabolic GENREs have enabled analysis of emergent phenomena through a focus on whole networks rather than individual pathways or genes, and many computational techniques have been developed to probe network properties. These types of network-level analyses will be critical to fully unravel the complex genotype–phenotype relationships in cells.
One of the most direct contributions of metabolic GENREs to our understanding of metabolism has been to enable the study of otherwise inaccessible network properties. Metabolic properties such as the existence of loops (Kun et al, 2008; Wright and Wagner, 2008), optimal pathway usage (Nishikawa et al, 2008), metabolite connectivity (Becker et al, 2006; Samal et al, 2006; Guimera et al, 2007), and pathway redundancy (Papin et al, 2002b; Mahadevan and Lovley, 2008) have all been studied in metabolic GENREs using computational methods. Many of these network analyses are performed through variants of FBA (see ‘Model building and analysis’ section). A primary end toward which network analyses have been used is the improvement of existing genome annotations, such as in the FBA-driven gap-analysis process of model building. However, other more systematic methods have been developed for improving genome annotations based on the analyses afforded by a metabolic GENRE. One such method used a metabolic GENRE of S. cerevisiae to derive condition-dependent annotations of metabolic genes, achieving higher accuracy than gene ontology annotations in determining gene function (Rokhlenko et al, 2007; Shlomi et al, 2007). Another major application of network analysis tools has been the discovery of co-regulated genes, through a computational process called flux coupling analysis (Burgard et al, 2004). This computational method has been validated recently through NMR-derived metabolite profiles of single-gene knockouts in yeast (Bundy et al, 2007). Coupled reaction sets have been used for a variety of purposes, including prediction of novel drug targets in Mycobacterium tuberculosis (Jamshidi and Palsson, 2007).
The field of computational systems biology has produced a rich array of methods for network-based analysis, offering tremendous insight into the functioning of metabolic networks. However, many of these methods produce results that can be difficult to link to observable phenotypes. Forging this link poses the greatest challenge toward development of useful network-based tools. For instance, several methods exist to analyze redundancy in metabolic networks (Price et al, 2002, 2003; Papin et al, 2002a, 2002b; Mahadevan and Lovley, 2008). Although these techniques define ‘redundancy’ intuitively in terms of the number of available paths between a given set of inputs and outputs, relating ‘redundancy’ to an observable phenotype poses a difficult challenge.
Although some network analysis techniques focus on phenotypes that are currently difficult to measure, there still exists great value in this research. With improvements in experimental technology, some currently unobservable phenotypes will become measurable, and the gap between in silico and observable phenotypes will shrink. Also, some network analyses have already deeply influenced biological thinking, even in some cases where the methods yield no easily measured phenotypes (Jeong et al, 2000). Network analysis tools will continue to be critical to the success of systems biology, both by expanding the scope of what is thought possible and by anticipating the emergence of cutting-edge wet-lab technologies.
Exploring evolutionary relationships
Significant interest exists in using metabolic GENREs to investigate functional evolution of metabolic and regulatory networks, especially considering the strongly evolutionary-based assumptions underlying analysis techniques such as FBA. Some studies have examined the phenomenon of short-term adaptive evolution, and have shown that as a particular strain adapts to the media in which it is grown, its growth characteristics will converge toward the FBA-predicted optimal solution (Fong and Palsson, 2004; Fong et al, 2005). Although these studies support the theory that evolution has honed organisms to optimize for fitness-related phenotypes (Maynard Smith, 1978), the historical evolutionary process itself remains fairly unstudied in the context of metabolic GENREs. This fascinating research direction represents a promising area of study. There have been a few recent attempts to use metabolic models toward this end. In particular, it has been suggested that certain topological properties of metabolic networks (such as degree distribution) might be formed as a byproduct of selection for some other phenotypes (such as growth rate), rather than because the topological properties themselves elicit a selective advantage (Pfeiffer et al, 2001; Papp et al, 2009). Genome-scale metabolic models have yet to be significantly used toward answering these important questions, but the breadth of organisms for which GENREs are available today (see Figure 3) represents a tremendous opportunity to merge evolutionary genomics with network-based metabolic analysis, and gain unique insight into the evolutionary forces that have acted on metabolism.
Current status of genome-scale metabolic reconstructions
Metabolic GENREs are contributing to the development of predictive, mechanistic models of an entire cell, an unrealistic goal before the genomic revolution. These models provide a broad framework whose predictions can be continually refined as more data and computational methods become available. Figure 4 shows the analyses that have already been performed on metabolic GENREs of various species, and also highlights many gaps where some possibly informative analyses have yet to be done. These gaps represent a roadmap for future efforts. Metabolic GENREs are best viewed as low-resolution blueprints on top of which other systems, constraints, and perturbations can be overlaid. As these features are overlaid on metabolic GENREs, they continually improve the resolution of model predictions. With incorporation of regulatory and signaling data as well as other high-order systems into the constraint sets, metabolic GENREs are becoming increasingly agile and expressive of realistic cell phenotypes.
As one of the simplest and most informative methods in constraint-based modeling, FBA has become a standard in the field, with a biomass reaction usually serving as the objective. Although the biomass objective yields accurate predictions under simple growth conditions for prokaryotic cells (Fong and Palsson, 2004), it is unclear to what degree optimization of biomass is descriptive of growth conditions in nature. Some studies have explored this by examining whether cells follow nonbiomass objective functions under certain conditions (Schuetz et al, 2007; Gianchandani et al, 2008), but even these studies stop short of questioning the optimization premise itself. Furthermore, although FBA predicts metabolic flux values through a network, FBA notably produces only one optimal solution, whereas it is quite common for multiple equally valid optima to exist. This concept has been examined through an extension of FBA called flux variability analysis, which explores the entire optimal solution space as opposed to picking just one optimal solution (Mahadevan and Schilling, 2003), but it is an important caveat that should curb over-interpretation of FBA results.
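The caveat about alternative optima can be illustrated with a minimal sketch on a hypothetical four-reaction network, in which two parallel routes convert an imported metabolite A into a biomass precursor B. The network, the bounds, and all function names below are assumptions made for illustration and are not taken from any published reconstruction. For transparency the sketch enumerates the vertices of the feasible flux polytope with exact arithmetic instead of calling a linear programming solver, which is what FBA tools use in practice; the approach is only workable for tiny networks and assumes that the stoichiometric matrix has full row rank and that all fluxes are bounded.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; return None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [x / pivot_value for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def feasible_vertices(S, lb, ub):
    """Enumerate vertices of {v : S v = 0, lb <= v <= ub} (S assumed full row rank)."""
    m, n = len(S), len(S[0])
    found = set()
    # A vertex pins n - m fluxes to one of their bounds; steady state fixes the rest.
    for fixed in combinations(range(n), n - m):
        for sides in product((0, 1), repeat=n - m):
            A = [list(row) for row in S]
            b = [0] * m
            for j, side in zip(fixed, sides):
                unit = [0] * n
                unit[j] = 1
                A.append(unit)
                b.append(ub[j] if side else lb[j])
            v = solve_square(A, b)
            if v is not None and all(lb[j] <= v[j] <= ub[j] for j in range(n)):
                found.add(tuple(v))
    return found


def fba_with_variability(S, lb, ub, c):
    """Return the optimal objective value and the (min, max) of each flux over all optima."""
    def score(v):
        return sum(ci * vi for ci, vi in zip(c, v))

    vertices = feasible_vertices(S, lb, ub)
    best = max(score(v) for v in vertices)
    # The optimal face is the convex hull of the optimal vertices, so flux
    # ranges over all optimal solutions are attained at those vertices.
    optimal = [v for v in vertices if score(v) == best]
    ranges = [(min(v[j] for v in optimal), max(v[j] for v in optimal))
              for j in range(len(c))]
    return best, ranges


# Hypothetical toy network (columns: uptake, route_1, route_2, biomass).
reactions = ["uptake", "route_1", "route_2", "biomass"]
S = [[1, -1, -1, 0],   # metabolite A: made by uptake, consumed by both routes
     [0, 1, 1, -1]]    # metabolite B: made by both routes, consumed by biomass
lb = [0, 0, 0, 0]
ub = [10, 1000, 1000, 1000]
c = [0, 0, 0, 1]       # objective: maximize flux through the biomass reaction

best, ranges = fba_with_variability(S, lb, ub, c)
for name, (low, high) in zip(reactions, ranges):
    print(f"{name}: {low} to {high}")
```

In this toy case the maximal biomass flux is fixed by the uptake bound, but either route can carry anywhere between none and all of that flux without changing the optimum. A single FBA solution would report only one such split, which is the kind of over-interpretation that flux variability analysis is meant to guard against.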
Given a fully defined metabolic, regulatory, and signaling network, we would hypothesize that no extrinsic objective would be necessary, but rather that what we call an ‘objective’ would be intrinsically built into the rules of the network itself. Therefore, as metabolic GENREs are increasingly augmented with regulatory and signaling rules, at what point does the ‘objective function’ hypothesis break down entirely? Where exactly in a cell is the information content we call an ‘objective’ held? These questions will become increasingly crucial and addressable as systems biology models become more sophisticated, and will stand to provide enormous insight into evolutionary biology.
An assumption underlying most approaches with metabolic GENREs is that networks modeled at steady state can still yield valuable information, regardless of the lack of detailed kinetic data. However, the steady state flux approximation unravels at the edges of metabolic activity, where stochasticity, enzyme kinetics, spatial distributions, and varying levels of metabolic regulation become dominant forces in cell activity. These edge effects become increasingly relevant as more accurate results are demanded from metabolic GENREs, and they represent the major challenges to building models of an entire cell. We have already described some efforts to incorporate dynamics and transcriptional control into metabolic GENREs (Herrgard et al, 2006; Lee et al, 2008b). A framework was also recently proposed for incorporating genome-scale kinetic data into metabolic GENREs for whole-cell dynamic modeling (Jamshidi and Palsson, 2008). However, a standing challenge is the determination of enough kinetic parameters to enable use of such a framework. In some cases, it has been shown that knowledge of a few key parameters can be sufficient for predicting metabolic and regulatory dynamics (Lee et al, 2008b), but it is unclear how effective these reduced-parameter models can be. Also, we are only now beginning to tackle different temporal and spatial scales at work in cells, and efforts to incorporate stochasticity into genome-scale regulatory and metabolic models are in their infancy (Zhu et al, 2004; Ghosh et al, 2007).
With transcriptional regulatory interactions, stochastic effects, nonlinear enzyme regulation, and other unaccounted-for cellular events intrinsically limiting the accuracy of predictions, validation of metabolic GENREs becomes a difficult and open-ended goal. Metabolic GENREs are often validated with comparisons between in silico phenotypes and various sets of in vivo data. However, no standard exists for how a model should be validated, which is apparent from the scattered representation of methods in validation of existing models (see Figure 4). Recent efforts have been made to quantify the level of discrepancy expected between in silico and in vivo metabolic phenotypes. In one notable study, 465 single-gene mutants of S. cerevisiae were grown and quantified under 16 different growth conditions each (Snitkin et al, 2008). An analysis of the performance of two published S. cerevisiae metabolic GENREs revealed sensitivity (correctly predicted nonessential genes versus the total number of nonessential genes) to be on the order of 95%, and specificity (correctly predicted essentials versus the total number of essential genes) to range between 50 and 60%. These numbers were significantly improved to approximately 95–98% and 69–86% (respectively) through disqualification of some in vivo experiments, which were discovered on further analysis to be in error. These final numbers are a gauge of the accuracy that might be expected from metabolic GENREs of well-studied organisms, given a preponderance of well-verified experimental data for comparison of in vivo and in silico results.
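The two validation metrics defined above can be computed directly from sets of gene identifiers. The following is a minimal sketch that follows the convention used in this paragraph (nonessential genes are the class scored by sensitivity, essential genes the class scored by specificity); the function name and the gene labels are hypothetical and the numbers are not data from Snitkin et al (2008).

```python
def essentiality_scores(all_genes, observed_essential, predicted_essential):
    """Return (sensitivity, specificity) for in silico gene essentiality predictions.

    sensitivity = correctly predicted nonessential genes / observed nonessential genes
    specificity = correctly predicted essential genes / observed essential genes
    """
    all_genes = set(all_genes)
    observed_ess = set(observed_essential) & all_genes
    predicted_ess = set(predicted_essential) & all_genes
    observed_non = all_genes - observed_ess
    predicted_non = all_genes - predicted_ess
    if not observed_ess or not observed_non:
        raise ValueError("both essential and nonessential genes must be observed")
    sensitivity = len(observed_non & predicted_non) / len(observed_non)
    specificity = len(observed_ess & predicted_ess) / len(observed_ess)
    return sensitivity, specificity


# Hypothetical example: ten genes, four essential in vivo, three predicted essential.
genes = [f"g{i}" for i in range(1, 11)]
observed = {"g1", "g2", "g3", "g4"}
predicted = {"g1", "g2", "g7"}

sensitivity, specificity = essentiality_scores(genes, observed, predicted)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

In this made-up example, five of the six nonessential genes and two of the four essential genes are predicted correctly. The asymmetry mirrors the pattern reported above, in which models tend to identify nonessential genes more reliably than essential ones.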
Many of the most pressing questions in basic and applied biology involve studying relationships between multiple cell types. Comparisons of diseased versus normal networks in the human will continue to yield insight into disease and drug activity (Vo et al, 2007), whereas development of more sophisticated modeling methods for interacting species (Stolyar et al, 2007) will enable increasingly realistic prediction of communal phenotypes. Evolutionary genomics will be enriched by the incorporation of network data into the analysis of species relationships. Also, methods developed to examine dynamic model activity (Klipp et al, 2005; Jamshidi and Palsson, 2008), to integrate metabolic models with regulatory and signaling information (Lee et al, 2008b), and to model interactions on multiple spatial or temporal scales (Chavali et al, 2008a; Zhang et al, 2009) have yet to be significantly applied to metabolic GENREs. These extensions of metabolic GENREs will likely yield great insight in the future.
Metabolic GENREs represent a bold attempt to characterize the ‘black box’ of genotype–phenotype relationships within a fully mechanistic model. These models have already led to many advances, ranging from theoretical to highly practical applications. The five categories outlined in this paper show many of these uses, and explain both the limitations and the promise of metabolic GENREs. As systems biology matures and continues to deepen the marriage between cutting-edge wet-lab technology and sophisticated computational modeling, metabolic GENREs will serve a crucial function in the years to come.
We thank Daniel Segre for suggesting the phylogenetic tree analysis, and Andres Pinzon for helping construct the figure. We also acknowledge Erwin Gianchandani, Arvind Chavali, and Ani Manichaikul for helpful comments on the figures. We also thank our funding sources, the National Science Foundation (NSF) (CAREER Grant 0643548 to JP) and the National Institutes of Health (NIH) (GM08715/NIH Biotechnology Training Grant). BOP is funded by the NIGMS and the NIAID.
Conflict of Interest
The authors declare that they have no conflict of interest.