Lactic acid bacteria (LAB) are used all over the world in a variety of fermentations. Besides their most important application, which is undoubtedly in the dairy industry, LAB are also applied at industrial scale in the fermentation of other raw food materials such as meat and vegetables. LAB have a relatively simple carbon and energy metabolism, characterized by the rapid glycolytic conversion of sugars into lactic acid. Many successful metabolic engineering approaches in LAB focus on rerouting pyruvate metabolism. Recently, LAB have also been used to engineer complex biosynthetic pathways leading to the production of valuable metabolites with health benefits for consumers (Hugenholtz and Smid 2002). Engineering complex biosynthetic pathways, such as those for vitamin or polysaccharide biosynthesis, often leads to unexpected phenotypes that can only be understood if genome-wide metabolic models of the micro-organism are available. Here we describe the construction of metabolic models of Lactobacillus plantarum based on its genome sequence. Starting from the prediction of gene function, we have focused on developing and improving methods and tools to go from genome sequence to gene annotation, to pathway reconstruction, and to prediction of phenotype through metabolic models. We have set up different bioinformatics tools, including web-interfaced databases and simulation software. This paper describes some of these tools, and how they are used and combined with experimental data to arrive at a model of the metabolic network of L. plantarum. The uses of these models and the types of questions they can address are also discussed.
The food industry is expected to produce safe, healthy and nutritious products of high quality. For many food products, fermentation with starter cultures containing lactic acid bacteria (LAB) is an essential part of the production process. In terms of world-wide production figures, the fermentations of milk, meat, vegetables and cereals are the most important. The key process in food fermentations is the production of lactic acid by these fermentative bacteria. Besides lactic acid, which has a preservative effect, various important flavour, texture and nutritional compounds are produced through the activity of LAB enzymes, both during fermentation and during product maturation (De Vos and Hugenholtz 2004).
Lactic acid bacteria are not only found in food fermentation processes; a large diversity of species is also present in different environmental niches, including the mammalian gastrointestinal tract, and several of these have probiotic properties. Probiotics are live microbial food and feed supplements that are reported to improve the microbial balance of the intestine (Naidu et al. 1999). This effect is presumably mediated through stimulation of the commensal flora and competitive exclusion of pathogens. Finally, some LAB species, such as Streptococcus pyogenes, Enterococcus faecalis, Streptococcus mutans and Streptococcus sobrinus, are known to have pathogenic and cariogenic properties (Molinari and Chhatwal 1999).
To fully exploit their potential, LAB have been the subject of considerable research and commercial development. The focus, however, has primarily been on empirical strain selection and on the study of individual enzymes or simple metabolic pathways. As the entire metabolism of LAB seems to be tuned towards rapid and maximal production of lactic acid from sugar, many metabolic engineering strategies for LAB have focused on rerouting pyruvate metabolism towards other products that can be used as food ingredients of considerable economic value (flavours, aroma components, sweeteners). Hols et al. (1999) demonstrated efficient rerouting of lactococcal sugar metabolism towards alanine production by overexpressing a heterologous alanine dehydrogenase in an LDH-deficient strain. Because of the central role of glycolysis and the relatively high fluxes observed in this pathway, the first published metabolic models of LAB focused primarily on this major pathway. Hoefnagel et al. (2002) showed that the key control points of the flux towards two important flavour compounds, acetoin and diacetyl, can be identified using a model of pyruvate distribution in Lactococcus lactis based on enzyme kinetics, combined with metabolic control analysis. Experiments confirmed, both qualitatively and quantitatively, the model's prediction that knocking out lactate dehydrogenase and overexpressing NADH oxidase increases the flux through the acetolactate synthase branch of pyruvate metabolism. By contrast, many kinetic models of yeast glycolysis have been published, none of which has led to a clear strategy for enhancing glycolytic flux. Teusink et al. (2000) showed that the in vivo behaviour of yeast glycolysis cannot easily be explained in terms of the in vitro kinetic properties of its constituent enzymes.
These findings clearly illustrate the difficulties of using kinetic models to improve metabolic engineering strategies. Only when relatively isolated, linear pathways without multiple interactions with central metabolism are studied can kinetic models accurately predict the fluxes in engineered strains and thereby support metabolic engineering strategies (Nielsen and Jorgensen 1995). Kinetic modelling is therefore unsuitable for developing metabolic engineering strategies for complex anabolic pathways with unknown kinetic parameters and many links to cofactors and other key metabolites such as ATP and NAD+/NADH.
With the genomics revolution in biology, and the genome sequencing of numerous LAB initiated in the past few years (Klaenhammer et al. 2002), we now have a unique opportunity to radically change the way we use metabolic models to improve the fermentation and cell factory performance of industrial micro-organisms. Instead of using metabolic models of single pathways, we can now reconstruct the complete metabolic networks of micro-organisms from annotated genome sequences. Knowledge of the complete genetic potential has paved the way for the integration of high-throughput functional genomics data into comprehensive models of cell factories (Siezen et al. 2004).
Stoichiometric network models allow direct input of genome sequence data, so this modelling approach can be regarded as a logical first step in exploring the basic properties of the metabolic network. These models are based on a stoichiometry matrix with all enzyme-catalysed reactions in one dimension and all metabolites in the other. Tools are available for automatic construction of the matrix, and a number of straightforward matrix calculations can be performed to explore properties of the metabolic network. For example, linear programming can be used to obtain optimal flux distributions in the network, given physicochemical constraints that restrict the behaviour of the network and an objective function such as maximal biomass or product yield (Covert and Palsson 2002). This genome-scale approach is more attractive than kinetic modelling, especially for engineering complex biosynthetic pathways involving key metabolites like folate. Engineering folate biosynthesis will have a global effect on the overall metabolism of the host cell, because folate acts as a cofactor in a number of central processes such as the biosynthesis of methionine, glycine, serine and purines (see Fig. 1).
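To make the matrix formulation concrete, the following is a minimal sketch using SciPy's `linprog`. It assumes a four-metabolite toy network invented purely for illustration: the reaction names, the glucose uptake cap of 10 units and the ATP drain used as a stand-in for growth are all hypothetical and are not taken from the L. plantarum model or from any tool mentioned in this paper. The steady-state condition S v = 0 enters as equality constraints, the flux bounds as the physicochemical constraints, and the objective as a single linear function.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (NOT the L. plantarum model).
# Rows = metabolites, columns = reactions.
metabolites = ["glc", "pyr", "lac", "atp"]
reactions = ["glc_uptake", "glycolysis", "ldh", "lac_export", "atp_drain"]

S = np.array([
    # uptake glycolysis  ldh  lac_export atp_drain
    [1.0,  -1.0,  0.0,  0.0,  0.0],   # glc
    [0.0,   2.0, -1.0,  0.0,  0.0],   # pyr: glc -> 2 pyr
    [0.0,   0.0,  1.0, -1.0,  0.0],   # lac: pyr -> lac
    [0.0,   2.0,  0.0,  0.0, -1.0],   # atp: net 2 ATP per glucose
])

# Constraints: all reactions irreversible; glucose uptake capped at an arbitrary 10 units.
bounds = [(0.0, 10.0)] + [(0.0, 1000.0)] * (len(reactions) - 1)

# Objective: maximise the ATP drain (stand-in for growth). linprog minimises, so negate.
c = np.zeros(len(reactions))
c[reactions.index("atp_drain")] = -1.0

# Steady state: S @ v = 0 for every internal metabolite.
res = linprog(c, A_eq=S, b_eq=np.zeros(len(metabolites)), bounds=bounds, method="highs")
fluxes = dict(zip(reactions, res.x))
print(res.status, {r: round(v, 3) for r, v in fluxes.items()})
```

With these numbers the optimum converts all 10 units of glucose into 20 units of lactate and 20 units of ATP. In a genome-scale model the same formulation applies; only the matrix is much larger.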
3. New tools for the genomics approach
Methods for data integration, storage and analysis at the scale of hundreds to thousands of genes, compounds and reactions are still under development, as data of this amount and complexity are new to biology. Important aspects are the standardization of vocabulary and of functional interactions (ontologies), and biological concepts for data integration and visualization. New software needs to be developed, and existing tools that traditionally deal with only a few genes or reactions need to be scaled up. These developments are moving extremely fast, and some of the required concepts and tools are now emerging.
We are developing a model of the complete metabolic network of L. plantarum, based on its genome annotation (Kleerebezem et al. 2003). The emphasis here is on questions such as which tools need to be used or developed, and which information is required for model development. Finally, we briefly discuss the types of questions that can be addressed with global metabolic models.
3.1 Metabolic network reconstruction
To reconstruct the metabolic network, information is required about the (putative) functions of genes, together with databases containing information on metabolic pathways. We used the genome sequence of L. plantarum WCFS1 (Kleerebezem et al. 2003). Putative biological functions could be assigned to 2120 (70%) of the 3052 predicted protein-encoding genes. Annotation data describing gene functions, curators' comments and other additional information are stored in an in-house developed, web-interfaced MySQL database. The database stores information on different micro-organisms and on different versions (updates) of the annotated genomes. Through the web interface, queries can be performed and comments can be added.
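The text does not describe the layout of the annotation database, so the following is only a hypothetical sketch of how organisms, annotation releases, gene annotations and curator comments could be related. It uses Python's built-in `sqlite3` module as a stand-in for MySQL; all table and column names, the locus tag `gene_0001`, the version labels and the example annotations are invented placeholders, not records from the actual database.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for the in-house MySQL database.
SCHEMA = """
CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE annotation_release (
    id INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism(id),
    version TEXT NOT NULL
);
CREATE TABLE gene_annotation (
    id INTEGER PRIMARY KEY,
    release_id INTEGER NOT NULL REFERENCES annotation_release(id),
    locus_tag TEXT NOT NULL,
    predicted_function TEXT,
    ec_number TEXT
);
CREATE TABLE curator_comment (
    id INTEGER PRIMARY KEY,
    annotation_id INTEGER NOT NULL REFERENCES gene_annotation(id),
    curator TEXT NOT NULL,
    comment TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def add_release(organism_id, version):
    """Register a new annotation version (update) for an organism."""
    return conn.execute(
        "INSERT INTO annotation_release (organism_id, version) VALUES (?, ?)",
        (organism_id, version)).lastrowid


def annotate(release_id, locus_tag, function, ec=None):
    """Store one gene's predicted function within a given release."""
    return conn.execute(
        "INSERT INTO gene_annotation (release_id, locus_tag, predicted_function, ec_number) "
        "VALUES (?, ?, ?, ?)", (release_id, locus_tag, function, ec)).lastrowid


# Placeholder content, for illustration only.
org = conn.execute("INSERT INTO organism (name) VALUES (?)",
                   ("Lactobacillus plantarum WCFS1",)).lastrowid
r1 = add_release(org, "v1")
r2 = add_release(org, "v2")
annotate(r1, "gene_0001", "hypothetical protein")
a2 = annotate(r2, "gene_0001", "alcohol dehydrogenase", "1.1.1.1")
conn.execute("INSERT INTO curator_comment (annotation_id, curator, comment) VALUES (?, ?, ?)",
             (a2, "curator_A", "function assigned after manual review"))
conn.commit()

# Query: follow one gene's annotation across releases.
history = conn.execute("""
    SELECT r.version, g.predicted_function, g.ec_number
    FROM gene_annotation AS g JOIN annotation_release AS r ON g.release_id = r.id
    WHERE g.locus_tag = ? ORDER BY r.version
""", ("gene_0001",)).fetchall()
print(history)
```

Keeping releases as separate rows, rather than overwriting annotations, is one way to let queries compare versions of an annotated genome.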
For information on metabolic pathways, we used primary literature and a number of databases. These include KEGG (http://www.genome.ad.jp/kegg/kegg2.html), ERGO Bioinformatics Suite (http://ergo.integratedgenomics.com/ERGO), Brenda (http://www.brenda.uni-koeln.de/), MetaCyc (Karp et al. 2002, http://www.metacyc.org) and Simpheny™ (Genomatica Inc., San Diego, CA, USA). All these databases contain similar information, but they also contain many errors (wrong annotations, different EC numbers assigned to the same reaction by different databases, wrong reaction stoichiometries, incorrect chemical formulae, etc.), so the databases are complementary but also contradictory. Moreover, information is required not only about reactions, but also about the enzymes that carry out these reactions. There are many-to-many relationships between genes, proteins and the reactions they catalyse. For example, alcohol dehydrogenase is encoded by a single gene, but the enzyme can oxidize many alcohols. The proton-translocating F1F0-ATPase, by contrast, carries out only one reaction, but the enzyme consists of two complexes with five (F1 complex) and three (F0 complex) subunits, encoded by eight different genes. The branched-chain amino acid transporter LivABCDE combines both features (Fig. 2). These complicated relations between genes and reactions, together with the inevitable mistakes in the databases, make the construction of the first model very labour-intensive; but once a first high-quality model has been made, it can form the basis for models of other strains and species.
The experimental data required to reconstruct the metabolic network are the potential inputs and outputs: the substrates that the organism can consume, the products that it can make, and the composition of its biomass. The latter is very important for resolving issues in membrane and cell-wall biochemistry, which constitute significant sinks of, for example, carbon and phosphate. Half of the 20 amino acids, and seven of 10 known cofactors and vitamins, need to be supplied in a minimal growth medium (A. Wegkamp, unpublished data), reflecting the relatively rich environments in which L. plantarum grows. Yet all but three amino acid biosynthesis routes appear to be complete in L. plantarum. There are many possible explanations for these inconsistencies, including kinetic constraints (regulation), mutations that render genes inactive, and errors in the prediction of gene functions. We are in the process of resolving some of these issues, which has led to many new hypotheses about the function of particular genes and metabolic pathways.
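The comparison between growth requirements and genome-based pathway predictions can be organised as a simple two-way check. The following is a minimal sketch with placeholder compound names (`aa_A` to `aa_E`) and made-up sets; it does not use the measured L. plantarum data. Compounds that must be supplied although their route looks complete correspond to the inconsistencies discussed above; compounds that are not required although no complete route was found may instead point to missing or incorrect annotations.

```python
def classify(compounds, required_in_medium, route_predicted_complete):
    """Compare growth requirements with genome-based pathway predictions.

    Returns a dict mapping each compound to one of three (hypothetical) labels.
    """
    result = {}
    for compound in sorted(compounds):
        required = compound in required_in_medium
        complete = compound in route_predicted_complete
        if required and complete:
            # e.g. regulation, inactivating mutations or mis-annotation
            result[compound] = "required_but_complete"
        elif not required and not complete:
            # e.g. a missing gene in the annotation or an unknown route
            result[compound] = "not_required_but_incomplete"
        else:
            result[compound] = "consistent"
    return result


# Placeholder inputs, purely illustrative.
compounds = {"aa_A", "aa_B", "aa_C", "aa_D", "aa_E"}
required = {"aa_A", "aa_B", "aa_C"}          # must be added to a minimal medium
predicted_complete = {"aa_B", "aa_C", "aa_D"}  # all pathway genes found in the genome

report = classify(compounds, required, predicted_complete)
for compound, label in report.items():
    print(compound, label)
```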
A specific roadmap of the metabolism of the organism of interest is in itself extremely useful. Systematic exploration of the network properties allows one to predict the viability and product formation of knockout strains, optimal yields and perhaps even regulatory sites (Stelling et al. 2002). These are all qualitative aspects of the network. Other questions that can be addressed are more quantitative in nature: given a certain fixed growth rate, what is the maximal by-product formation, and which fluxes within the network increase to achieve it? Given a certain rate of glucose uptake, what is the maximal growth rate, and what is the optimal flux distribution to achieve it (for an introduction, see Covert et al. 2001)? The modelling tool Simpheny™ (Genomatica Inc.) can be used to answer these more quantitative questions. Moreover, as Simpheny™ contains a large database of biochemical reactions and many quality-control features, it facilitates the reconstruction and quality analysis of large-scale metabolic networks.
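Both kinds of question can be posed as linear programs. The sketch below uses SciPy's `linprog` on a hypothetical toy network with two alternative pyruvate sinks, so that one knockout is viable and another is lethal. All reaction names, the placeholder product `ac` and the bounds are invented for illustration, the "growth" reaction is a lumped ATP drain, and redox (NADH) balance is deliberately ignored; in a real network it would constrain the alternative route. This is not how Simpheny™ implements these analyses, only a minimal instance of the same constraint-based idea.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (NOT the L. plantarum model).
reactions = ["glc_uptake", "glycolysis", "ldh", "alt_sink", "lac_export", "ac_export", "growth"]
metabolites = ["glc", "pyr", "lac", "ac", "atp"]
S = np.array([
    [1, -1,  0,  0,  0,  0,  0],   # glc
    [0,  2, -1, -1,  0,  0,  0],   # pyr
    [0,  0,  1,  0, -1,  0,  0],   # lac
    [0,  0,  0,  1,  0, -1,  0],   # ac (placeholder alternative product)
    [0,  2,  0,  0,  0,  0, -1],   # atp, drained by a lumped "growth" reaction
], dtype=float)
BASE_BOUNDS = [(0.0, 10.0)] + [(0.0, 1000.0)] * (len(reactions) - 1)


def maximise(target, bounds):
    """Maximise one flux subject to S v = 0 and the bounds; None if infeasible."""
    c = np.zeros(len(reactions))
    c[reactions.index(target)] = -1.0
    res = linprog(c, A_eq=S, b_eq=np.zeros(len(metabolites)), bounds=bounds, method="highs")
    return -res.fun if res.status == 0 else None


def with_bound(reaction, lo, hi, bounds=BASE_BOUNDS):
    """Return a copy of the bounds with one reaction's bounds replaced."""
    new = list(bounds)
    new[reactions.index(reaction)] = (lo, hi)
    return new


# Qualitative question: which single knockouts remain viable?
wild_type_growth = maximise("growth", BASE_BOUNDS)
viable = {}
for rxn in ["glycolysis", "ldh", "alt_sink"]:
    g = maximise("growth", with_bound(rxn, 0.0, 0.0))
    viable[rxn] = g is not None and g > 1e-6

# Quantitative question: at a fixed growth flux, what is the maximal by-product formation?
max_ac_at_fixed_growth = maximise("ac_export", with_bound("growth", 10.0, 10.0))
print(wild_type_growth, viable, max_ac_at_fixed_growth)
```

A knockout is simulated by fixing the reaction's flux to zero, and a fixed growth rate by setting equal lower and upper bounds; the second optimisation then maximises a different objective within the remaining feasible space.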
Our current model of the L. plantarum WCFS1 network comprises 708 genes (23% of the genome), 595 reactions and 601 metabolites. When the model is finished, the steady-state flux space, defined by the null space of the network's stoichiometry matrix, will be explored with Simpheny's built-in linear programming tool to obtain optimal flux distributions given constraints and objective functions (Schilling et al. 2000).
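The relation between the null space and flux optimisation can be illustrated with `scipy.linalg.null_space` on a small hypothetical matrix (a two-branch toy network, not the 595-reaction model). Every steady-state flux vector v satisfies S v = 0 and therefore lies in the null space; linear programming then selects, within that space and the flux bounds, the vector that optimises the objective. This sketch only shows the linear algebra and makes no claim about how Simpheny™ performs the analysis internally.

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical toy network: rows = metabolites, columns = reactions.
reactions = ["glc_uptake", "glycolysis", "ldh", "alt_sink", "lac_export", "ac_export", "growth"]
S = np.array([
    [1, -1,  0,  0,  0,  0,  0],   # glc
    [0,  2, -1, -1,  0,  0,  0],   # pyr
    [0,  0,  1,  0, -1,  0,  0],   # lac
    [0,  0,  0,  1,  0, -1,  0],   # ac
    [0,  2,  0,  0,  0,  0, -1],   # atp
], dtype=float)

# Orthonormal basis of {v : S v = 0}; one column per independent steady-state route.
N = null_space(S)
print(N.shape)                       # (7, 2): 7 reactions, rank-5 matrix, 2 routes
residual = np.abs(S @ N).max()       # zero up to floating-point error

# A flux distribution sending all pyruvate to lactate is a steady state,
# so it can be written exactly as a combination of the basis vectors.
v_lactate = np.array([10.0, 10.0, 20.0, 0.0, 20.0, 0.0, 20.0])
coeffs, *_ = np.linalg.lstsq(N, v_lactate, rcond=None)
print(np.allclose(N @ coeffs, v_lactate))
```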
3.3 Integration and visualization
Within the genomics revolution, new high-throughput methods have been developed that allow the activities of all genes and/or proteins in the system to be measured. Obviously, many relevant statistical analyses can be performed on these data sets. However, visualizing the data in a biological context is extremely important for interpreting them from a biological viewpoint. Once the connections between genes and reactions in a metabolic map have been defined, high-throughput transcriptome or proteome data can be projected onto metabolic maps. This can be done with MetaCyc-related software and within Simpheny™ (see Fig. 3). Moreover, these data sets can be projected onto circular and linear genome maps through the Microbial Genome Viewer (Kerkhoven et al. 2004).
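Projecting gene-level data onto reactions requires a rule for combining values when several genes contribute to one reaction. The sketch below assumes one simple heuristic, which is not necessarily what MetaCyc-related software or Simpheny™ do: for an enzyme complex (AND) the lowest subunit value is taken, for isozymes (OR) the highest. The GPR encoding, the gene names `geneX1`, `geneX2` and `geneY`, the reaction labels and all log2 expression ratios are hypothetical placeholders, not measured data.

```python
def reaction_score(rule, expression):
    """Score a reaction from gene-level values via its GPR rule.

    Assumed heuristic: complexes ("and") take the minimum, isozymes ("or")
    the maximum. Unmeasured genes are skipped; returns None if no gene in
    the rule was measured.
    """
    if isinstance(rule, str):
        return expression.get(rule)
    op, parts = rule
    scores = [s for s in (reaction_score(p, expression) for p in parts) if s is not None]
    if not scores:
        return None
    if op == "and":
        return min(scores)
    if op == "or":
        return max(scores)
    raise ValueError(f"unknown operator: {op!r}")


LIV = ("and", ["livA", "livB", "livC", "livD", "livE"])
GPR = {
    "leucine transport": LIV,
    "ethanol oxidation": "adh",
    "reaction X": ("or", ["geneX1", "geneX2"]),  # hypothetical isozyme pair
    "reaction Y": "geneY",                       # hypothetical gene, not measured
}

# Hypothetical log2 expression ratios (placeholders, not measured data).
log2_ratio = {"livA": 1.5, "livB": 0.8, "livC": 1.2, "livD": 1.0, "livE": 0.9,
              "adh": -2.0, "geneX1": -0.5, "geneX2": 0.7}

reaction_values = {rxn: reaction_score(rule, log2_ratio) for rxn, rule in GPR.items()}
print(reaction_values)
```

The resulting reaction-level values could then be used to colour the corresponding arrows on a metabolic map.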
Having a large-scale metabolic model allows one to relate changes in gene activity to changes in the flux distribution. Understanding the changes at the metabolic level will then help in understanding the regulatory mechanisms that lead to these changes in fluxes. The metabolic model can thus form a solid biochemical basis on which to build and interpret other functions of the cell, such as signalling and regulatory networks.
In the coming years, detailed analysis and comparison of the complete genome content (presence or absence of genes for certain metabolic routes, regulatory networks, etc.) of many LAB species and strains will provide key insights into the natural diversity of their capabilities, roles and interactions. This knowledge will greatly assist efforts to select for specific traits, and to maintain and design stable genomic arrangements in existing strains and in new derivatives. With the elucidation of entire genome sequences, the future approach to metabolic analysis will be the reconstruction of metabolic potential using bioinformatics tools and databases, followed by targeted experimental verification and exploration of the metabolic network properties. For this, models and simulations will become essential. We anticipate that metabolic networks for particular applications will in future be designed in silico, much as is already commonplace for most of today's high-tech products (Csete and Doyle 2002).