A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information



An updated genome-scale reconstruction of the metabolic network in Escherichia coli K-12 MG1655 is presented. This updated metabolic reconstruction includes: (1) an alignment with the latest genome annotation and the metabolic content of EcoCyc leading to the inclusion of the activities of 1260 ORFs, (2) characterization and quantification of the biomass components and maintenance requirements associated with growth of E. coli and (3) thermodynamic information for the included chemical reactions. The conversion of this metabolic network reconstruction into an in silico model is detailed. A new step in the metabolic reconstruction process, termed thermodynamic consistency analysis, is introduced, in which reactions are checked for consistency with thermodynamic reversibility estimates. Applications demonstrating the capabilities of the genome-scale metabolic model to predict high-throughput experimental growth and gene deletion phenotypic screens are presented. The increased scope and computational capability of this new reconstruction are expected to broaden the spectrum of both basic biology and applied systems biology studies of E. coli metabolism.


Genome sequencing and annotation, along with biochemical characterization of cellular machinery, have enabled reconstruction of cellular processes on the genomic scale since the turn of the millennium. Initial targets for reconstruction were primarily microorganisms, but with advancements in genomic sequencing technology and annotation techniques, reconstructions of higher-order species are appearing (Reed et al, 2006a). Of the available genome-scale reconstructions, those for metabolism are the most prevalent, owing to the large body of work characterizing and cataloging metabolic processes. Among these, the metabolic reconstruction of the bacterium E. coli is arguably the most advanced, because this organism has the most complete body of data available for its metabolism and growth behavior. Building on the rich history of E. coli metabolic reconstruction (Figure 1), we generated an updated genome-scale metabolic reconstruction for E. coli containing 1260 ORFs from the latest genome annotation (Riley et al, 2006). We characterize the iAF1260 reconstruction, detail its conversion to a computational model and demonstrate its application as a model to predict selected cellular phenotypes.

Figure 1 characterizes the content of iAF1260 by categorizing the ORFs (i.e., genes), reactions and metabolites contained in the reconstruction in terms of their Clusters of Orthologous Groups (COGs) functional class. The figure also outlines the genetic content of five previous reconstructions of E. coli metabolism. The major areas of expansion for iAF1260 relative to previous work (i.e., the enhancements) fall into five categories: (i) an increased scope, (ii) compartmentalization of the reconstruction into three distinct cellular compartments (the cytosol, periplasm and extracellular space), (iii) increased pathway detail, (iv) incorporation of reaction thermodynamic information, and (v) alignment with the E. coli-specific database, EcoCyc. Each of these five major areas of expansion is discussed further in the text.

To demonstrate clearly how a genome-scale reconstruction can be used as a computational model for phenotypic predictions (e.g., flux balance analysis (FBA) calculations), we delineated the steps necessary for converting a reconstruction to a computational model (Figure 3). In summary, this process involves: (i) explicit assignment of the metabolites participating in a reaction (often incorporated into the reconstruction process), (ii) definition of a system boundary, (iii) conversion of the defined system into a mathematical format that forms the basis for a computational model, (iv) curation of the network, which often requires filling gaps in the reconstruction, and (v) determination of the strain-specific parameters for a particular organism or system (e.g., maintenance parameters).
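As a minimal sketch of steps (i) to (iii), the example below encodes a small network as a stoichiometric matrix S and checks the steady-state mass balance S * v = 0 that FBA imposes as a constraint. The four-reaction network, the metabolite names and the flux vector are hypothetical and chosen only for illustration; they are not taken from iAF1260.

```python
# Toy network (hypothetical, not taken from iAF1260):
#   EX_A : -> A      exchange reaction crossing the system boundary
#   R1   : A -> B
#   R2   : B -> 2 C
#   EX_C : C ->      exchange reaction crossing the system boundary
# Each reaction maps metabolite -> stoichiometric coefficient
# (negative = consumed, positive = produced).
reactions = {
    "EX_A": {"A": 1},
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 2},
    "EX_C": {"C": -1},
}


def build_stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_ids, S), where S[i][j] is the
    coefficient of metabolite i in reaction j."""
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    reaction_ids = list(reactions)
    S = [[reactions[r].get(m, 0) for r in reaction_ids] for m in metabolites]
    return metabolites, reaction_ids, S


def mass_balance_residuals(S, fluxes):
    """Compute S * v; every entry is zero when the flux vector is at steady state."""
    return [sum(coeff * v for coeff, v in zip(row, fluxes)) for row in S]


metabolites, reaction_ids, S = build_stoichiometric_matrix(reactions)

# Candidate flux vector, ordered as reaction_ids (EX_A, R1, R2, EX_C).
v = [1.0, 1.0, 1.0, 2.0]
residuals = mass_balance_residuals(S, v)

for metabolite, residual in zip(metabolites, residuals):
    print(metabolite, residual)
```

In a full FBA calculation, this mass-balance constraint would be combined with flux bounds and a linear objective (e.g., a biomass reaction) and passed to a linear programming solver; that step is omitted here.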

After formulating a computational model based on the iAF1260 reconstruction, we utilized this model to predict and quantify the active pathways and probable system outputs under glucose aerobic conditions, a common E. coli laboratory growth condition. In this process, we addressed modeling issues specific to E. coli and the chosen growth conditions that are required for accurate phenotypic predictions using iAF1260, which include (i) transcriptional regulatory events, (ii) cellular maintenance costs, and (iii) reaction kinetic effects. We then applied the information from this process to study growth under a different condition, aerobic growth on succinate. The modeling predictions made using iAF1260 agreed well with reported experimental values under these conditions (Figure 3). Looking further at modeling predictions on a pathway-by-pathway and reaction-by-reaction basis, we compared results from growth on glucose to 13C-labeled experiments (Fischer et al, 2004) and found good agreement between predicted and observed values. A sensitivity analysis was also performed to determine how changes in key model parameters affect the computational results generated using FBA in conjunction with the iAF1260 model.

In forming iAF1260, we incorporated thermodynamic information to provide another means of assessing reaction reversibility beyond what is stated in the primary literature and assignments made using general heuristic rules. This process, termed thermodynamic consistency analysis, utilized flux variability analysis (Mahadevan and Schilling, 2003) under 174 different carbon source conditions to identify reactions that operated in a thermodynamically infeasible direction during near-optimal growth. Specific examples are presented that describe how thermodynamically inconsistent reactions were altered for inclusion in iAF1260. Additionally, this analysis facilitated the further classification of reactions as essential, substitutable or blocked. Interestingly, a large number of the reactions in the reconstruction behaved uniformly regardless of the carbon source being utilized. Once the reactions that operated in thermodynamically infeasible directions according to the flux variability analysis had been identified and adjusted to remove all thermodynamic inconsistencies, we examined calculated ΔrG values and further adjusted reactions to be consistent with those we predicted to be irreversible with high likelihood.
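The consistency check described above can be illustrated with a short sketch. It assumes that each reaction has a flux range from flux variability analysis and an estimated range for ΔrG, and it flags any reaction whose flux range includes a direction that the ΔrG range rules out. The flagging criterion, the function names, the reaction identifiers and all numerical values are assumptions made for illustration; the criterion actually applied to iAF1260 may differ.

```python
def feasible_direction(dg_min, dg_max):
    """Classify a reaction from an estimated range of reaction Gibbs energy (kJ/mol).

    A strictly negative range permits only the forward direction, a strictly
    positive range permits only the reverse direction, and a range spanning
    zero leaves the reaction reversible.
    """
    if dg_max < 0:
        return "forward"
    if dg_min > 0:
        return "reverse"
    return "reversible"


def is_inconsistent(flux_min, flux_max, dg_min, dg_max, tol=1e-9):
    """Return True if the flux range allows a direction the energy range forbids."""
    direction = feasible_direction(dg_min, dg_max)
    if direction == "forward":
        return flux_min < -tol  # reverse flux is possible but infeasible
    if direction == "reverse":
        return flux_max > tol  # forward flux is possible but infeasible
    return False


# Hypothetical flux ranges (min, max) and energy ranges (min, max) per reaction.
fva_ranges = {"R1": (0.0, 5.0), "R2": (-2.0, 3.0), "R3": (-1.0, 1.0)}
dg_ranges = {"R1": (-30.0, -5.0), "R2": (-40.0, -10.0), "R3": (-8.0, 12.0)}

inconsistent = [
    rxn for rxn in sorted(fva_ranges)
    if is_inconsistent(*fva_ranges[rxn], *dg_ranges[rxn])
]
print(inconsistent)
```

In this toy data set only R2 is flagged: its energy range is strictly negative, yet its flux range extends below zero. A flagged reaction would then be a candidate for a revised reversibility assignment.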

To demonstrate how iAF1260 can be used as context for biological content, we analyzed two high-throughput data types: growth phenotype (http://www.biolog.com) and gene essentiality (Figure 5) (Baba et al, 2006; Joyce et al, 2006) data for E. coli. To do this, we utilized FBA to screen carbon, nitrogen, phosphorus and sulfur sources that could support simulated growth, and also computationally determined the essential ORFs for growth under glucose aerobic conditions. We compared our predictions to the high-throughput data sets and outlined the level of agreement between computational and experimental results. Whereas the agreements validate the content of the reconstruction and modeling methods (an overall agreement of 76% for growth phenotype and 92% for gene essentiality predictions), the disagreements provide areas for further investigation (included in the Supplementary Data). For the growth phenotypes, disagreements indicate possible errors in the reconstruction, cases where regulation limits the utilization of pathways needed for growth, and/or areas where further biochemical characterization is needed (targeted areas for biological discovery). Disagreements in gene essentiality data point to specific areas where additional intracellular and transport reactions can be examined to rectify the disagreements or identify a current limitation of the reconstruction and modeling methods (e.g., transcription and translation processes are not currently incorporated in the modeling scheme).
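The comparison between predictions and screening data can be sketched as a simple tally. The example below counts agreements and disagreements between predicted and observed gene essentiality and reports the overall agreement as the fraction of matching calls. The gene labels and essentiality calls are hypothetical placeholders, not data from the cited screens, and the function name is an assumption made for illustration.

```python
def agreement_summary(predicted, observed):
    """Compare essentiality calls (True = essential) for genes present in both sets.

    Returns the four agreement/disagreement counts and the overall agreement,
    defined here as matching calls divided by the number of genes compared.
    """
    shared = sorted(set(predicted) & set(observed))
    counts = {
        "both_essential": 0,
        "both_nonessential": 0,
        "predicted_essential_only": 0,
        "observed_essential_only": 0,
    }
    for gene in shared:
        if predicted[gene] and observed[gene]:
            counts["both_essential"] += 1
        elif not predicted[gene] and not observed[gene]:
            counts["both_nonessential"] += 1
        elif predicted[gene]:
            counts["predicted_essential_only"] += 1
        else:
            counts["observed_essential_only"] += 1
    matches = counts["both_essential"] + counts["both_nonessential"]
    counts["agreement"] = matches / len(shared) if shared else float("nan")
    return counts


# Hypothetical calls for five placeholder genes.
predicted = {"g1": True, "g2": False, "g3": False, "g4": True, "g5": False}
observed = {"g1": True, "g2": False, "g3": True, "g4": True, "g5": False}

summary = agreement_summary(predicted, observed)
print(summary)
```

Keeping the two kinds of disagreement separate is useful because they point to different follow-up work: a gene that is essential only in the experiment suggests missing or misassigned content in the model, whereas a gene that is essential only in the model suggests an alternative route that the reconstruction lacks.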

Finally, future directions for improvement of the metabolic reconstruction of E. coli are discussed, and as the field of systems biology expands, it is expected that iAF1260 will serve as a key component for the study of E. coli and related organisms by providing a comprehensive picture of cellular metabolism.