Keywords:

  • systems biology;
  • biological network;
  • mathematical modelling;
  • high-throughput data;
  • computational biology

Abstract

Systems Biology is a nascent field that arose from the technology-driven omics measurement revolution. It goes beyond mere data analysis and focuses on the biological behaviour that emerges from the dynamic interactions between system components organized in a hierarchical and highly connected manner. Mathematical models have been used as a conceptual framework to study such systems, and their impact is maximal when there is a synergistic interplay between the models, experimental data, and biological domain knowledge. This review provides an introduction to the modelling process and selectively highlights manuscripts that have strong biological impact.

INTRODUCTION: WHAT IS SYSTEMS BIOLOGY?

Living systems consist of numerous biological components that are connected as layered networks in time and space. The components and the interactions among them are constantly changing in response to perturbations inside and outside the system, and the components function at different time and length scales. The changing interactions among the components result in higher order behaviour that leads to the adaptation and survival of the organism in changing environments. Systems Biology is the study of the complex dynamic interactions among the various components that underlie biological systems. The biological components include molecular entities, such as DNA, RNA, proteins, and metabolites, or more organized entities, such as organelles, cells, tissues, and organs. Systems Biology results in an understanding of behaviour that emerges from the hierarchical dynamic networks present in living systems. Application areas of interest in Systems Biology include medicine, agriculture, marine biology, and industrial biotechnology. The focus of this manuscript is on the synergistic interplay between biology and mathematics, which is the basis for successful applications of Systems Biology.

Systems Biology is a field that is in its early stages of evolution and is still trying to grapple with the best strategy to study complex systems. Such studies may be purely experimental, purely mathematical or computational, or a combination of both approaches. The analysis of the complex dynamic interactions leads first to the ability to understand the behaviour of biological systems and subsequently to the ability to manipulate these systems. One approach to Systems Biology is purely experimental, with an attempt to characterize the various molecular components such as genes, proteins, metabolites, and signaling molecules. Recent breakthroughs in measurement technologies (the “omics revolution”) have led to the ability to characterize molecular components and have resulted in a concurrent explosion of new data: genomic, transcriptomic, proteomic, metabolomic, lipidomic, and intermolecular interaction (interactome) data. A detailed experimental cataloguing of molecular components is thus becoming feasible. A large number of manuscripts in the literature have used a purely experimental strategy in Systems Biology. A mere experimental cataloguing of the system components, although useful, does not by itself reveal much about the complex and constantly changing dynamic interactions among the component parts that ultimately determine system behaviour. In this review, we have deliberately focused our attention on studies that go beyond purely experimental approaches. At the very least, there is a need for a conceptual framework to understand the connections and relationships among the components. One needs to understand how the component parts are connected, how they relate to each other in time and space, and how they respond and adapt to external environmental influences. The key to the success of Systems Biology is therefore the availability of a language or a conceptual framework to describe and analyse interactions. An explicit and “dynamic language of connectivity” is therefore the next revolution, one that is even more important for Systems Biology than the current “omics” measurement revolution.

At the simplest level, a language of connectivity can be a “verbal” description that lists how various components are connected and interact with each other. However, verbal representations or “verbal models” have their limitations, especially as the number of components increases and the interactions become complicated. Another way of describing connections and interactions is as a “pictorial representation” (as in a connected graph or a map, node-and-arrow diagrams, signed digraphs, flowcharts, etc.). Metabolic and signaling pathways (commonly available in biology textbooks) are examples of pictorial representations or “graphic models.” These pictorial representations can be converted to more rigorous mathematical representations using graph-theoretic approaches and Bayesian network formulations. However, pictorial maps are usually inadequate for dynamic relationships, even though animated images or pictures could have limited utility in providing spatial localization information. The big limitation of static maps is that they do not provide information on flows. For example, in a chemical engineering system, a piping and instrumentation diagram does not provide any information on material and energy flows. Similarly, a map of a global news network does not say anything about the information or news flowing through the network. Similar analogies can be pointed out for electrical circuits, road maps, etc.

Mathematical models, and especially differential equations, provide a trusted language to describe how the quantitative value of a system component changes in a dynamic manner as a function of time and environmental context. Such models require a quantitative understanding of the system and knowledge of the functional relationships among the components. Models need not always be mathematical. Because of the inherent complexity of biological systems, it may not always be possible to represent every detail in a mathematical model. In many cases, one may not know “a priori” what details are important and should be included in the model and what details are unimportant enough to be excluded. Even if one knew the important variables, one may not know the exact quantitative relationships among them. In such cases, one could potentially use qualitative differential equations, computer representations, or models that are a hybrid of quantitative and qualitative approaches. Control systems engineering provides a language and a set of tools to describe feedback loops that are ubiquitous in biological systems. Systems engineering has contributed to Systems Biology by bringing in systems tools and perspectives. However, the complexity of engineering systems is relatively low in comparison to that of biological systems. Systems engineering tools are not easily or directly transferable to biology without paying special attention to the unique aspects of living systems. Biological systems are complex, self-organizing, highly adaptive, highly nonlinear, and operate at many levels of hierarchy. It is indeed a daunting challenge to meaningfully apply mathematical or systems engineering tools to complex dynamic nonlinear adaptive living systems. One could justifiably question whether existing tools are sufficient to meet the challenges of living systems or whether a new “language of connectivity” has to be invented.

Mathematical modelling is not new to biology. Models such as the logistic model of population growth (proposed by Verhulst, 1838) have been around for over a century. Other notable models include the work of Hodgkin and Huxley (1952), who developed a quantitative model of the ionic mechanisms involved in neuronal stimulation and subsequently received the 1963 Nobel Prize in Physiology or Medicine for their role in discovering the ionic basis of nervous conduction. What is new to biology is the huge explosion of “omic” data and the impact that it has had on the motivation for modelling. Whenever there has been a paucity of experimental data, modelling has been out of vogue. When there is a surplus of experimental data, the first reaction is to look for ways to make sense of this excess of information. Quantitative analysis is sometimes an afterthought that is necessitated by the need to analyse huge amounts of data and to detect underlying patterns. The field of bioinformatics was an outgrowth of this need. Analysis of data alone (as done in bioinformatics) is not sufficient to understand how the components are connected together and especially how they interact in a dynamic manner. One needs an appropriate framework to describe the dynamically changing glue that connects everything together. This requires a change in focus from the analysis of data (bioinformatics) to creating a modelling or conceptual framework to understand and use the data (Systems Biology). Bioinformatics is an initial response to large amounts of data and an attempt to make sense out of the data, while Systems Biology goes beyond the mere analysis of data towards the next step of creating a conceptual and knowledge framework in the language of mathematical models.
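
To give a concrete flavour of a dynamic model, the Verhulst logistic equation mentioned above, dN/dt = rN(1 − N/K), can be simulated in a few lines. The sketch below is illustrative only: the parameter values and initial condition are hypothetical, chosen simply to show how a model turns a verbal statement (“growth saturates at a carrying capacity”) into quantitative trends.

```python
# Minimal sketch of the Verhulst logistic growth model, dN/dt = r*N*(1 - N/K).
# Parameter values (r, K, N0) are illustrative only.
import numpy as np
from scipy.integrate import odeint

def logistic(N, t, r, K):
    """Rate of change of population size N at time t."""
    return r * N * (1.0 - N / K)

r, K, N0 = 0.5, 1e9, 1e6        # growth rate (1/h), carrying capacity, initial cells
t = np.linspace(0, 40, 200)     # time points (h)
N = odeint(logistic, N0, t, args=(r, K))

print(f"Population at t = 40 h: {N[-1, 0]:.3e} (approaches K = {K:.0e})")
```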

Although mathematical modelling has been around for a long time (Bailey, 1998), the term “Systems Biology” has come into common use in the literature only recently (see Figure 1). Since its inception, there have been several reviews and textbooks on this topic (Ideker et al., 2001a; Kitano, 2002; Ge et al., 2003; Wiley et al., 2003; Kell, 2004; Weston and Hood, 2004; Snoep and Westerhoff, 2005; Palsson, 2006; Swedlow et al., 2006; Alon, 2007; Bruggeman and Westerhoff, 2007) and the term has been defined in a variety of ways. For example, the most common definition of Systems Biology appears to be the one given by Selinger et al. (2003), who state that Systems Biology is “the study of how the parts work together to form a functioning biological system” (Selinger et al., 2003; Snoep and Westerhoff, 2005). An alternative, related definition is provided by Gutierrez et al. (2005), who state that Systems Biology is “the exercise of integrating the existing knowledge about biological components and extracting the unifying organizational principles that explain the form and function of living organisms.” Yet another related definition is provided by Leroy Hood (Hood et al., 2004), who states that “Systems Biology is a scientific discipline that endeavours to quantify all of the molecular elements of a biological system to assess their interactions and to integrate that information into graphical network models that serve as predictive hypotheses to explain emergent behaviours.” Hence, there appears to be an emerging consensus on the broader definition of Systems Biology, although details such as the emphasis on modelling tools are perhaps not conserved. In addition, another key aspect that has been emphasized previously is the modular or hierarchical nature of biological systems, which naturally requires systems-level analysis (Hartwell et al., 1999; Trewavas, 2006). Interestingly, Trewavas uses a 1974 quote from Francois Jacob, “Every object that biology studies is a system of systems,” to highlight the hierarchical nature of biological systems in his perspective.

Figure 1. Growth in Systems Biology research in the last decade (2007 numbers as of December 7, 2007).

In this review, in addition to the perspective of Systems Biology as the study of behaviour emerging from hierarchical dynamic networks, we emphasize the interplay between the modelling and experimental components and its importance in Systems Biology research. In fact, most of the papers reviewed in this manuscript are papers that contain both modelling and experimental aspects of research (Figure 1) rather than either by itself.

The ultimate benefit from models in Systems Biology is in the design or reengineering of biological systems or processes. For example, in industrial biotechnology, one may want to use a model to design a cell to produce a protein or product that it may not have produced before. A goal in plant biotechnology may be to engineer disease resistance in plants. Genetic improvement of systems may require the introduction (or deletion) of several genes. Such interventions may perturb the entire dynamic network of relationships among genes, proteins, metabolites, and cellular structures. The successful design of a new system may require time-consuming and expensive trial and error experiments. In engineering, design or changes in processes are often done with the help of models and simulations. In biology, design or reengineering rarely makes use of quantitative mathematical models and is usually done by “Edisonian” trial and error. When quantitative models begin to be effectively and widely used for design and reengineering of biological systems, then the golden age of Systems Biology will have arrived!

In this manuscript, we start with a detailed primer on the use of mathematics as a language of connectivity in Systems Biology (Section “Role of Models in Systems Biology: Can Mathematics Be a Good Language of Connectivity?”). This introduction is meant for individuals with stronger expertise in biology than in mathematical modelling. Section “Conceptual Basis for Classification of Models: What are the Metrics of Usefulness of Models in Systems Biology?” provides a conceptual basis for classifying models in terms of their utility in Systems Biology. We also address the considerable challenges involved in the integration of models and experiments. Section “Selected Manuscripts Based on Specific Biological Impact” consists of several examples of papers from the literature that illustrate the various concepts detailed in this manuscript. This review paper is not intended to be an extensive review of all work done in the field. With modern Internet search tools, it is easy to get a comprehensive listing of papers in a particular area. However, for a novice to the field, this review can be used as a guide to classify results from such search engines into a conceptual framework that leads to a better understanding of the developments in the field of Systems Biology.

ROLE OF MODELS IN SYSTEMS BIOLOGY: CAN MATHEMATICS BE A GOOD LANGUAGE OF CONNECTIVITY?

In the life sciences, we have made a sudden and dramatic transition from a data-poor environment to an extremely data-rich “omics” context. This data-rich world is beginning to have a major effect on the degree of complexity of potential models and the motivation for building such models. In the past, models have focused on a macroscopic level of system detail because of the inability to get sufficient experimental information at the molecular level. Detailed models only have meaning to the extent that they can be “validated” or connected to biological reality. Validation requires experimental data. In the absence of experimental data, the motivation to write very detailed models is indeed limited. The model of Domach and Shuler (Domach et al., 1984) is a rare example of a model with a large amount of detail long before the omic measurement revolution. However, recent advances in analytical technologies have enabled genome-wide measurements of gene and protein expression as well as metabolite profiling (Ferea and Brown, 1999; Fiehn et al., 2000; Pandey and Mann, 2000), which in turn has led to the development of more sophisticated models that can describe a number of physiological descriptors.

Systems Biology is an interdisciplinary field and it is difficult to be an expert in all aspects of the field. This is especially challenging when making the connections between experiments and modelling. This section has been written for individuals with a strong background in the experimental aspects of Systems Biology who may want to fully understand the role of mathematical modelling in biology. A good appreciation of the role of mathematical models can result in strong synergies between experiments and mathematics and the potential for significant breakthroughs. In this section, we provide a basic understanding of the process of mathematical modelling, the experimental validation of models, and the basis for selecting the degree of detail in the model.

The Process of Mathematical Modelling: Eight Steps to a Model

Modelling is the process of representing a “real system” in a symbolic language. The new language may be mathematical, in which case the process of “translating reality” into mathematics is termed “mathematical modelling.” It is extremely important in the modelling process to understand, “What is lost in the translation of the real system to a model?” If only limited aspects of the real system are sampled, one may end up with a very inadequate, incorrect, or incomplete “model” of the system. The quality of modelling depends on the quality of understanding or knowledge about a system. In some cases, there may be little or no connection between the mathematical model and the real system. Such incomplete or inadequate models are what often lead to the high degree of cynicism or distrust of mathematical models among biologists. It is therefore extremely critical to have the capability to discriminate between inadequate, incomplete, or misleading models and high-quality models based on a sound understanding of the system and clear goals. Whether a model is complete or adequate depends largely on the goals and expectations of the systems biologist and the reasons why the model was built. Understanding the steps involved in building a mathematical model can help the experimentalist build (or at least identify) useful mathematical models.

There are eight key steps in building a mathematical model of a biological system:

  • Step 1: Define Goals and Expectations: where to focus.

  • Step 2: Define Assumptions: what is important, what to include, and what to rule out.

  • Step 3: Define Variables: select entities that are relevant to the modelling goals.

  • Step 4: Define System Connectivity among the Variables: what is connected.

  • Step 5: Define Functional Relationships and Equations: how things are connected and how they change.

  • Step 6: Identify Values of Constant Parameters in equations: quantifying relationships.

  • Step 7: Run Computer Simulations to obtain Trends: visualizing connections.

  • Step 8: Validate the Model: qualitative and quantitative confirmation.

Once the goals of the model are defined and the expectations of the biologists are clarified, one defines the key assumptions that are to be made. This leads to a definition of the variables that are considered important and are to be included in the model. It is helpful if the variables of the model can be measured experimentally. The next step in the modelling process is to define the critical connectivities among the key variables of interest within the system. How are the variables of the system related to each other in time and space? Is there a strong or weak connection or no connection at all? Is there a linear relationship among the variables or a nonlinear relationship? Is there a cause and effect relationship or is there a circular relationship, as in the case of regulated phenomena with feedback loops? Are the variables related through a metabolic reaction network? Or are the variables part of an information network that alerts or signals the biological system about an impending problem or the necessity to take decisive action (such as turning a gene on or off, or changing the shape or activity of a protein)? Are the variables related by sharing a physical structure or spatial location (e.g. a ribosome, mitochondrial membrane, tissue, or organ)? Can such connectivities be quantified experimentally? Once the connectivity of the system is defined, one comes up with the functional relationships and the equations describing the system. Modelling is an evolutionary process. One starts with a first version of the model and refines it through multiple iterations (Figure 2). The first model is rarely adequate. It is only through a process of successive refinement that models improve to a point where they can be of utility. Computer simulation of mathematical models and qualitative validation using domain knowledge can play an important role in speeding up the process of model refinement. The next subsection discusses the process of model refinement and validation in greater detail.
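
To make Steps 3–7 concrete before turning to refinement and validation, the hypothetical sketch below defines two variables (biomass and substrate), assumes a Monod-type growth law as the functional relationship, supplies illustrative placeholder parameter values, and simulates the resulting trends. Step 8 would then compare these simulated trends against measured data.

```python
# Hedged illustration of Steps 3-7 for a simple batch-culture model.
# Variables: biomass X and substrate S; Monod kinetics is an *assumed* functional form,
# and all parameter values are placeholders, not fitted to any real data set.
import numpy as np
from scipy.integrate import odeint

def batch_growth(y, t, mu_max, Ks, Yxs):
    X, S = y
    mu = mu_max * S / (Ks + S)          # Step 5: assumed growth law (Monod)
    dXdt = mu * X                       # biomass balance
    dSdt = -mu * X / Yxs                # substrate balance
    return [dXdt, dSdt]

params = (0.4, 0.1, 0.5)                # Step 6: mu_max (1/h), Ks (g/L), yield (g/g)
y0 = [0.05, 10.0]                       # initial biomass and substrate (g/L)
t = np.linspace(0, 24, 100)

sol = odeint(batch_growth, y0, t, args=params)   # Step 7: simulate trends
print("Final biomass %.2f g/L, final substrate %.2f g/L" % (sol[-1, 0], sol[-1, 1]))
```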

Figure 2. The process of mathematical modelling.

Model Refinement and Validation: Synergistic Interplay between Mathematics and Biological Reality

The key to the success of Systems Biology lies in the strong interplay between experimental work, biological domain knowledge, and mathematical models. The reconnection of the mathematics back to the underlying biological reality is an essential part of Systems Biology. This iterative interplay between biological reality and mathematical modelling has to occur at “every step” in the modelling process and is illustrated in Figure 3. For example, biological domain knowledge and availability of experimental data are key to deciding on the variables to be included in the model, the degree of biological detail, and in the simplifying assumptions that can be made on the connectivity of the system. Experimental data can provide input on the functional mathematical forms that can be chosen and on the values of the parameters in the model. The variables, assumptions, functional forms, etc., suggested by the experiments need to be modified in an iterative manner until the model is fully validated. The modelling process is successful when the evolutionary process of iterating between biology and mathematics leads to a continuous improvement in the level of understanding of the biology. Modelling is a continuous process where many views of a subsystem have to be eventually integrated into a “systems” view of all key aspects of reality. No model may be perfect but each successive iteration of the model may lead to a “better” model and thus greater success in achieving the goals of the modelling process. A well-trained Systems Biology team can be characterized by the ability to catalyze such a dialectic process. An example of this interplay between modelling and experiments can be seen in the work of Agrawal on modelling programmed cell death in Arabidopsis plant defense systems (Agrawal et al., 2004). In this work, an initial hypothesis on the connectivity of the signaling network was successively refined until the simulated results were validated by the experimental observation.

Figure 3. Interplay between biological data and mathematical modelling.

The identification of the quantitative values of the parameters is a very challenging task in modelling. It is possible to develop mathematical models with very little biological insight behind them. Such models are useless unless they are “validated” or connected to the real world. This process of validation may involve both qualitative and quantitative validation. In qualitative validation, one has to ensure that the qualitative trends are correct. For example, an expected trend may be that the population of cells in the bioreactor is increasing with time or that the level of nutrients in the bioreactor is decreasing with time. Such qualitative validation can be based on “intuition,” “biological common sense,” or deeper biological knowledge of the system. Quantitative validation, however, requires real numbers for each of the parameter values. It is no longer sufficient to say that something is increasing or decreasing; one needs to predict the exact numerical value of the variable in space and time. This is only possible if the mathematical functions and equations relating the variables are accurate and if the numerical values of the parameters are exactly identified. The task of parameter identification is considerably easier if one has a simple model with minimal variables and few parameters. The degree of detail in a mathematical model is extremely important and often depends strongly on the ability to experimentally validate the model. This is discussed in greater detail in the next subsection.
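
As a minimal sketch of what quantitative parameter identification can look like in practice, the example below fits the two parameters of a logistic growth model to synthetic “measurements” (simulated data with added noise) by nonlinear least squares. Real applications would use experimental data, and the parameter uncertainties would feed back into decisions about model detail.

```python
# Sketch of parameter identification by nonlinear least squares.
# The "data" here are synthetic (simulated + noise) purely for illustration.
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

def simulate(t, r, K):
    """Logistic growth solution obtained numerically for given parameters."""
    N0 = 0.05
    return odeint(lambda N, _t: r * N * (1 - N / K), N0, t)[:, 0]

t_data = np.linspace(0, 20, 15)
rng = np.random.default_rng(0)
N_data = simulate(t_data, 0.45, 8.0) + rng.normal(0, 0.1, t_data.size)  # noisy pseudo-data

popt, pcov = curve_fit(simulate, t_data, N_data, p0=[0.3, 5.0])
perr = np.sqrt(np.diag(pcov))           # rough confidence in each estimate
print("Estimated r = %.2f +/- %.2f, K = %.2f +/- %.2f"
      % (popt[0], perr[0], popt[1], perr[1]))
```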

Mathematical Model Complexity Versus Degree of Biological Detail: Top-Down Versus Bottom-Up Models

A major challenge for a systems biologist is: what is the right degree of complexity of the mathematical model? The answer is best given by someone with deep domain knowledge about the biological system that is being modelled and a genuine understanding of the goals of the modelling process. The choice of a simple or complex model is often dictated by the modelling goals as well as by the quantity and quality of biological information available about the system. The art of mathematical modelling requires a trade-off between mathematical simplicity and biological sophistication. The main disadvantage of simple (top-down) models is that they may not capture all the biological details of a system that are essential to describing a behaviour of interest, especially if such behaviour was not considered during model development. If a certain biological component or phenomenon is missing from the model, then one lacks the ability to understand or manipulate the missing component. There is a temptation to build detailed (bottom-up) models just to incorporate the new types of experimental data that are being generated. However, one should check whether the additional data contribute to the achievement of the modelling goals. This challenge is even greater today in the context of the explosion in experimental data resulting from the “omic” measurement technologies.

A mathematical model of a biological system that includes all the physiological descriptors of the system would consist of tens of thousands of partial differential equations defining the location and concentration of various molecules at each point in time. Such an approach to model building would be considered a “bottom-up” approach, where one starts with all available details and then builds the most elaborate feasible model. Many purely descriptive models are of the bottom-up type (such as the E-Cell and Virtual Cell modelling frameworks developed by Tomita et al. (1999), Takahashi et al. (2003), and Loew et al. (2001)). One of the earliest examples of a bottom-up descriptive model was that of Domach and Shuler (Domach et al., 1984). More recently, Castellanos et al. (2004) have been working on making these detailed descriptions more modular. Most bioinformatics approaches are of the bottom-up type, although they rarely go beyond data analysis methodologies into the realm of mathematical modelling.

The approach of Entelos (Defranoux et al., 2005) to the modelling of diseases is, on the other hand, very different and can be called a “top-down” approach. In a top-down approach, one starts from the whole-system level and drills down into detail as needed. In chemical engineering, the term “lumping” describes the process of treating a large number of variables as a single mathematical entity. One starts with the minimum level of detail and only adds detail according to the demands of the goals of the model. Variables that cannot be measured or experimentally validated may be avoided in such a top-down approach. Variables that are not relevant to the goals and expectations of the model are not included. The cybernetic modelling approach proposed originally by Dhurjati et al. (1985) is an example of a top-down approach that tries to avoid going into too many biological details by focusing on the apparent “goal” of the system. The cybernetic approach is analogous to focusing on the general of an army and trying to figure out the general's strategy rather than keeping track of what all the soldiers are doing. The living system is considered to have a goal that it may have acquired over the course of evolution, and this cybernetic goal is used mathematically to reduce the model complexity. The flux balance models later proposed by Majewski and Domach (1990) and Varma and Palsson (1994) have a similar concept of an optimal goal.
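
The mathematical core of a flux balance model is a linear program: maximize an assumed cellular objective subject to steady-state mass balances Sv = 0 and bounds on the fluxes. The sketch below applies this idea to a toy, entirely hypothetical three-reaction network purely to show the form of the calculation; genome-scale applications use the same structure with thousands of reactions.

```python
# Sketch of a flux balance analysis (FBA) calculation on a toy, hypothetical network.
# maximize c.v  subject to  S v = 0,  lb <= v <= ub
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix: rows = metabolites A, B; columns = reactions
#   R1: -> A,  R2: A -> B,  R3: B -> biomass (objective)
S = np.array([[ 1, -1,  0],
              [ 0,  1, -1]])
c = np.array([0, 0, 1])                  # assumed objective: maximize flux through R3
bounds = [(0, 10), (0, 10), (0, 10)]     # illustrative flux bounds

# linprog minimizes, so negate the objective to maximize the biomass flux
res = linprog(-c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("Optimal flux distribution:", res.x, "objective:", -res.fun)
```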

The choice of the degree of detail in a model may eventually be determined by more practical considerations such as the ability to accurately determine the numerical values of parameters and the ability to perform validation experiments. There is an optimal degree of detail dictated by the balance between being inclusive and being useful.

CONCEPTUAL BASIS FOR CLASSIFICATION OF MODELS: WHAT ARE THE METRICS OF USEFULNESS OF MODELS IN SYSTEMS BIOLOGY?

The metrics of “utility” or “success” in the modelling process depend on one's goal. There are several reasons for building mathematical models in Systems Biology. One may build a model to organize information, to test alternative hypotheses, to provide guidance for new experiments, to aid in data analysis, to uncover regulatory networks, or to discover underlying principles, etc. Alternatively, one may want to use a model to “predict behaviour in new domains” thus alleviating the need for further expensive experiments. Or one may just try to “fit data” over a limited domain so that one can interpolate or predict behaviour in that limited domain. The goal of a model may also be to take “control action” in order to “maintain the system” within a range of behaviours or to respond to extreme or crisis situations. The success or failure of a model depends on the ability to achieve the stated goals of the modelling. If one achieves the goal, one has succeeded in the modelling endeavour. Too many models claim to predict but it is a rare gem of a model that truly has predictive ability. Models are governed by the rules of “garbage in–garbage out”. A hollow inadequate model of a system will produce misleading and garbage predictions. Simple models are as likely to be inadequate as complex sophisticated models. The ability to achieve the goals of the modelling is the ultimate metric of model quality.

Manuscripts may be classified based on the type of applications that are being considered. For example, the application areas may include: application to medicine as in drug discovery; application to agriculture as in understanding plant disease resistance or genetic modification of plants; industrial applications as in production of recombinant proteins or metabolic engineering of new pathways, small molecules, etc.; environmental applications as in bioremediation; energy applications as in biofuels, etc. One can also classify the manuscripts according to the specific system that is being modelled (e.g. a regulatory circuit, organelle, type of cell such as bacteria vs. yeast, tissue type, organ, animals, or plants). Models may also be classified based on the biological phenomena that have been included in the model, such as gene induction/repression, regulation, transcription, post-transcriptional modifications, translation, post-translational modifications, folding and activity, binding of molecules, protein–RNA interactions or other molecule–molecule interactions, metabolic or signaling networks, regulatory circuitry, membrane-level phenomena, intracellular events, intercellular phenomena, biological transport, programmed cell death, network behaviour and interactions, multi-cellular phenomena, growth, differentiation, development, and toxicity response. Manuscripts can also be classified based on the type of quantitative methodology used. In the case of purely data analysis type approaches (more Bioinformatics than Systems Biology), one could classify manuscripts based on whether they use clustering, ANOVA, singular value decomposition, neural nets, genetic algorithms, etc. In the case of models, one could classify them according to whether they are based on flux balance (stoichiometry, matrix algebra) or whether they use ODEs, PDEs, delay differential equations, stochastic approaches, etc.

Conceptual classification of the various models that are available in the Systems Biology literature can act as a guide in exploring the large amounts of literature in the field. In this section, we have chosen to classify manuscripts conceptually on the basis of the biological impact (Figure 4).

Figure 4. Classification of models and manuscripts related to Systems Biology.

Models for Organizing Information

A major role of models is in providing an organizing framework or a language to connect the various components and describe their relationships in time and space. This can provide a “big picture” view of how the parts are connected to the whole. Mathematics can provide a much more compact representation than verbal or pictorial descriptions. For dynamic systems, mathematics may be the only way to describe complex, non-linear, spatiotemporal relationships among components (as it is not easy to describe a partial differential equation relationship in words or pictures!). However, spatial organization of the parts may sometimes be better illustrated in a picture than in an equation! Even if the model is not able to describe all the relationships in precise quantitative terms, the very process of organizing the information about the system and representing it in mathematical terms can have beneficial effects. This step might be the basis for subsequent detailed quantitative analysis and simulation. A mathematician or engineer may find it much easier to understand a set of partial differential equations rather than very detailed and qualitative biological descriptions. A peripheral goal of such modelling may thus be to act as a linguistic translator for “communicating information across disciplines.”

Models for Guidance in Design of Experiments

The goal of modelling may be to design the next set of experiments to improve the understanding of the system. In Systems Biology, an important reason to model is to provide guidance prior to doing experiments. What should one measure, when, and how many times? How much confidence can one have in the measurements? For both the modeller and the experimentalist, it is not the “amount” of data that is important but the “nature and quality” of the data and its relevance that is critical. If one is doing a microarray experiment with an Arabidopsis plant system, should one be collecting data every few minutes, every few hours, or every few days? What are the other important molecules and when should they be sampled? Should one be measuring metabolite information or should one be focusing on data at the level of genes and proteins? Models have a role in the proper design and conception of an experiment in order to get the most out of expensive experiments. For example, a model can provide some initial and approximate ideas on the dynamic characteristics of some of the measured variables and thus guide the experimentalist on when to take data in order to capture the dynamic behaviour. The quality of experimental information also has an impact on determining the usefulness of the model itself. The proper design and conception of the experiment allows one to get the most out of the model. Recently, Gadkar et al. (2005) have developed techniques to determine the optimal set of experiments that will lead to the maximal identifiability of parameters in models of large-scale biological networks. Such techniques can be valuable in designing the most informative experiments to discriminate between different hypotheses.
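
As a much simpler, hedged illustration of model-guided experiment design than the techniques of Gadkar et al. (2005), the sketch below computes finite-difference sensitivities of a logistic growth model (illustrative parameters) to one of its parameters and flags the time points where measurements would be most informative for estimating that parameter.

```python
# Sketch: finite-difference sensitivity of a model output to a parameter,
# used to suggest informative sampling times. Model and parameters are illustrative.
import numpy as np
from scipy.integrate import odeint

def simulate(t, r, K, N0=0.05):
    return odeint(lambda N, _t: r * N * (1 - N / K), N0, t)[:, 0]

t = np.linspace(0, 30, 61)
r, K = 0.45, 8.0
base = simulate(t, r, K)

# Sensitivity of N(t) to a relative change in r, approximated by a small perturbation
eps = 1e-3
sens_r = (simulate(t, r * (1 + eps), K) - base) / eps

best = t[np.argsort(-np.abs(sens_r))[:5]]   # five most sensitive time points
print("Most informative sampling times for estimating r (h):", np.sort(best))
```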

Models to Understand Dynamics and Regulation, Discover Novel Interactions and Organizing Principles

Once one starts dealing with highly time-dependent (dynamic) phenomena or systems with a large number of regulatory feedback loops, one immediately starts to appreciate the extremely valuable role of mathematical models. One can get away without mathematical models for organizing static information, for analysis of simple data, for design of simple experiments, for testing of simple hypotheses, or for understanding simple underlying patterns. However, the key and defining characteristic of life is change. The state of a living system has to be characterized in terms of dynamically changing physiological descriptors. The responsiveness and adaptability of living systems arise from the sophisticated regulatory networks at all levels, ranging from gene-level networks to protein and metabolite level networks to signaling networks. It is impossible to reason non-mathematically through the dynamic effects of the interactions among a large number of regulatory networks. Mathematical modelling and systems engineering tools are a necessity. The incorporation of regulation into the modelling framework is not new (Goodwin, 1963). Both dynamics and regulation are complicated and one cannot deal with them in the absence of a mathematical framework. In the best of circumstances, models and experiments working in concert can lead to the discovery of novel organizing principles underlying biological systems, such as the goal of “optimal resource utilization” in modelling bacterial metabolic networks (Schuetz et al., 2007).
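
A small example of why a mathematical framework is needed for regulation: even a three-variable negative feedback loop of the kind studied by Goodwin (1963) can either settle to a steady state or oscillate, depending on parameter values, in a way that is hard to anticipate by verbal reasoning. The sketch below uses a generic Goodwin-type loop with illustrative parameters, not any specific published parameter set.

```python
# Sketch of a Goodwin-type negative feedback loop (mRNA -> protein -> repressor -| mRNA).
# Parameter values are illustrative; the qualitative outcome (steady state vs. sustained
# oscillation) depends on them, which is exactly why simulation is needed.
import numpy as np
from scipy.integrate import odeint

def goodwin(y, t, k=1.0, d=0.1, n=10):
    m, p, r = y                           # mRNA, protein, repressor
    dm = k / (1.0 + r**n) - d * m         # transcription repressed by r
    dp = k * m - d * p                    # translation
    dr = k * p - d * r                    # repressor production
    return [dm, dp, dr]

t = np.linspace(0, 400, 4000)
sol = odeint(goodwin, [0.1, 0.1, 0.1], t)
late = sol[2000:, 0]                      # mRNA after initial transients
print("Late-time mRNA range: %.3f to %.3f" % (late.min(), late.max()))
print("A wide range indicates sustained oscillation; a narrow one, a stable steady state.")
```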

Models to Guide in Data Analysis

The goal of data analysis is to convert large amounts of “data” into “knowledge”. This goal is not unique to biology. In chemical engineering, considerable work has been done on the analysis of data coming from chemical plants in order to recognize “faults” (Petti et al., 1990; Venkatsubramanian et al., 2003). This requires recognition of the data pattern, fingerprint, or fault signature generated by a specific fault. The fault fingerprint or fault signature may be obtained from domain knowledge that is stored in the form of rules, stored as a pattern in computer memory, or generated as a dynamic pattern by a model. Many of the ideas and methodologies used in other fields can be borrowed fruitfully to analyze biological data.

Experimental data are analyzed to obtain parameter information or to discover underlying patterns. Certain types of data analysis, such as regression, clustering, PCA, ANOVA, support vector machines, do not require “formal” mathematical models. One can regress coefficients for polynomial expressions that cannot be considered as mathematical models. However, if one is trying to detect dynamic patterns or complicated dynamic motifs, then one needs to have a mathematical model to generate and recognize these dynamic patterns (Michaud et al., 2003). Biological domain knowledge and models can substantially enhance data analysis capabilities and can lead to much better analysis and understanding of the data. Weather data, chemical plant data, satellite image data, etc., all have different underlying characteristics. It is sometimes easier to diagnose certain patterns when one has domain knowledge that may be encoded in models. The earliest motivation for quantitative approaches in Systems Biology was the need to make sense out of an ever-expanding pool of experimental data. The initial approaches in bioinformatics were a good start but as one deals with dynamics and non-linear phenomena, one starts seeing the need for knowledge-based data analysis and model-based data analysis.
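
For reference, the model-free end of the spectrum mentioned above can be sketched in a few lines: principal component analysis via singular value decomposition extracts the dominant patterns in a genes-by-conditions matrix. The data here are random placeholders standing in for real expression measurements; knowledge-based and model-based analyses build on, rather than replace, such decompositions.

```python
# Sketch of PCA (via SVD) on a genes x conditions matrix; the data are random
# placeholders standing in for real expression measurements.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 12))            # 500 genes x 12 conditions (synthetic)
X -= X.mean(axis=1, keepdims=True)        # centre each gene across conditions

U, s, Vt = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("Variance explained by the first three components:",
      np.round(explained[:3], 3))
```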

Models for Hypothesis Generation, Discrimination, and Testing

The real essence of a mathematical model may be found in the assumptions that facilitate the translation of physical reality into a set of equations. The model may thus be considered to consist of two different parts: (1) a set of mathematical equations and (2) a set of assumptions. Too often, one equates a mathematical model with a set of equations and does not pay sufficient attention to the assumptions. However, the assumptions are just as important as the equations, both to the development of the model and to its application. Each set of assumptions may lead to a different set of equations or system parameters. Each set of assumptions can thus be considered a hypothesis, and one can test many different hypotheses using models. Besides assumptions about what is connected to what and how, there are several other types of modelling assumptions. These include assumptions about rates, assumptions about the presence and nature of feedback (positive, negative, or more complex), assumptions about the key variables influencing system behaviour (what variables to focus on and what behaviours to ignore), assumptions related to system parameters, assumptions related to spatiotemporal aspects, etc. Many of these assumptions can be tested mathematically and computationally to gain system insights and provide guidance for experiments. One could also do hypothesis testing via experiments, but this is more expensive than mathematical or computational hypothesis testing. Thus models can be used to prune down the potential set of hypotheses so that the experiments can be focused on discriminating between a few key hypotheses.
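
A hedged sketch of computational hypothesis discrimination: two alternative assumed functional forms (a linear response and a saturating response) are each fit to the same synthetic data set, and a simple information criterion that penalizes extra parameters indicates which hypothesis the data favour. The models and data are illustrative, not drawn from any specific study.

```python
# Sketch: discriminate two hypothetical model structures by fitting both to the
# same (synthetic) data and comparing a simple AIC score. Data and models are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def linear_response(x, a):            # Hypothesis 1: output proportional to input
    return a * x

def saturating_response(x, vmax, km): # Hypothesis 2: Michaelis-Menten-like saturation
    return vmax * x / (km + x)

rng = np.random.default_rng(2)
x = np.linspace(0.1, 10, 20)
y = 5.0 * x / (2.0 + x) + rng.normal(0, 0.1, x.size)   # pseudo-data from hypothesis 2

def aic(model, n_params):
    popt, _ = curve_fit(model, x, y, p0=np.ones(n_params))
    rss = np.sum((y - model(x, *popt))**2)
    return x.size * np.log(rss / x.size) + 2 * n_params  # AIC up to an additive constant

print("AIC hypothesis 1 (linear):     %.1f" % aic(linear_response, 1))
print("AIC hypothesis 2 (saturating): %.1f" % aic(saturating_response, 2))
print("The lower AIC identifies the better-supported hypothesis.")
```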

An example of a modelling hypothesis is the assumption of a particular type of connectivity between various components (Agrawal et al., 2004). There are many possible ways to connect signaling molecules or other cellular components. One could mathematically or computationally test the consequences of each of these alternative hypotheses and see which ones are biologically plausible. Each hypothesis leads to a different model and one can “test” each of these models to figure out which hypothesis results in behaviour that is closest to reality. Such an approach has proved to be very fruitful in the work of Agrawal et al. (2004) on the Arabidopsis plant defense system. One can test hypotheses systematically, starting with a few simple hypotheses and progressively refining them with successive modelling iterations. From this point of view, modelling is the process of hypothesis building and testing. This is also the essence of the scientific method.

Models as Process Substitutes and for Reengineering and Design of Systems

A model that has been validated and in which one has sufficient confidence can be used as a very valuable tool for design and reengineering. Such a model can serve as a substitute for the real system or process to ask “what if” questions and to simulate a large range of scenarios. As computational experiments are much less expensive than real experiments, using a validated model as a process substitute can have great value. The manipulation of a system in order to improve or optimize it can be very difficult and expensive if it is done in a purely empirical and Edisonian manner. Incorporation of models into the design loop can be very effective. Most advanced industries, such as the chemical, automobile, and aerospace industries, already integrate models into their design on a regular basis. One should expect the same to occur in Systems Biology once high-quality models are available. In synthetic biology, one tries to come up with novel genetic regulatory elements that can act together as a module and exhibit simple and unique engineering functions, such as a toggle switch (Gardner et al., 2000) or an oscillator (Elowitz and Leibler, 2000), with the eventual goal of assembling these elements into modules of increasing complexity, perhaps even a synthetic cell. In genetic engineering and metabolic engineering, one tries to manipulate the genes and metabolism in such a way as to achieve some system or process goal.
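
As an illustration of the kind of module mentioned above, the mutual-repression toggle switch is commonly described by two coupled equations in which each gene represses the other. The sketch below uses this standard two-variable form with illustrative parameter values and shows that different initial conditions lead to different stable end states, i.e., bistability, the property that makes the circuit a switch.

```python
# Sketch of a mutual-repression toggle switch (two genes repressing each other).
# The equation form follows the commonly cited two-variable model; parameters are illustrative.
import numpy as np
from scipy.integrate import odeint

def toggle(y, t, alpha=10.0, n=2.0):
    u, v = y
    du = alpha / (1.0 + v**n) - u     # gene 1, repressed by gene 2
    dv = alpha / (1.0 + u**n) - v     # gene 2, repressed by gene 1
    return [du, dv]

t = np.linspace(0, 50, 500)
high_u = odeint(toggle, [5.0, 0.1], t)[-1]   # start with gene 1 dominant
high_v = odeint(toggle, [0.1, 5.0], t)[-1]   # start with gene 2 dominant
print("Final (u, v) starting u-high:", np.round(high_u, 2))
print("Final (u, v) starting v-high:", np.round(high_v, 2))
# Two different stable end states from different initial conditions => bistability.
```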

SELECTED MANUSCRIPTS BASED ON SPECIFIC BIOLOGICAL IMPACT

Several manuscripts are purely experimental and provide important insights into selected biological systems. We have not included such manuscripts in this section. Manuscripts that are purely mathematical, with no attempt to connect to experimental or biological reality, have also not been included. The main focus of this section is on manuscripts where there is a genuine attempt to integrate both mathematical and experimental approaches. An integrated and iterative experimental and modelling approach is critical to effectively understanding the behaviour of biological systems and to developing useful models of the system. In this section, we have reviewed key papers across four broad areas in the context of the biological question posed and the iterative interaction between the modelling and experimental studies. We have not included manuscripts that use models for the organization of information as described in Section “Models for Organizing Information,” as this objective is ubiquitous to most articles. Further, manuscripts where models were used exclusively for the design of experiments (Section “Models for Guidance in Design of Experiments”) were rather sparse and, consequently, we did not devote a separate section to this topic. We have organized the four areas to mirror the classification of models discussed previously (Section “Conceptual Basis for Classification of Models: What are the Metrics of Usefulness of Models in Systems Biology?”), so that the first section focuses on the discovery of novel interactions and organizational principles and is followed by a section on systems-level data analysis (Figure 4). The third section describes studies on hypothesis generation and testing, and the final section presents manuscripts in the area of model-based design for synthetic biology.

Discovery of Novel Interactions/Organizational Principles

Improved understanding of biological systems necessitates a search for the governing principles underlying the response of the system to environmental and genetic perturbations. Identification of similar functional modules across different pathways and the characterization of the role played by such modules are required for improved models of the systems. Another important application of Systems Biology is in the discovery of novel interactions among biological entities, for example, a new metabolic pathway or a gene regulatory interaction. Studies that focus on characterizing these organizational principles and novel biological interactions are discussed below.

Regulatory networks

As an example, Milo et al. (2002), Shen-Orr et al. (2002), and Mangan and Alon (2003) analyzed the transcriptional regulatory network in E. coli and uncovered several patterns of regulatory interactions that were over-represented relative to randomized networks. These motifs represent functional modules such as the feed-forward loop, the single input module, and the dense overlapping region. Mangan and Alon (2003) subsequently developed a mathematical model of the feed-forward loop and showed that this regulatory module can lead to sign-sensitive acceleration, that is, a shorter response time for the target gene expression to the input. Additional features such as sign-sensitive delays, cooperativity, and pulse generators were also identified depending on the nature of the interactions in the loop. Thus, in this study, the objective was the identification of novel design principles, where the input to the initial computational study was the curated regulatory network. These motifs then formed the basis of the mathematical model used to elucidate the potential functions of the regulatory network modules. However, the regulatory network has not been characterized in most organisms, and identifying regulatory interactions is an intense area of research for which several methods have been proposed (Tavazoie et al., 1999; Liao et al., 2003; Vadigepalli et al., 2003).
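
One commonly discussed behaviour of the coherent feed-forward loop, its sign-sensitive delay, can be illustrated with a minimal simulation. The sketch below is not the published model of Mangan and Alon (2003); it uses simple on/off logic and illustrative parameters to compare the response of a target gene under direct regulation with its response under an AND-gated feed-forward loop.

```python
# Sketch of a coherent type-1 feed-forward loop (X -> Y, X -> Z, Y -> Z with AND logic),
# compared against simple regulation (X -> Z). Parameters and thresholds are illustrative.
import numpy as np
from scipy.integrate import odeint

def step_input(t):
    return 1.0 if t > 5.0 else 0.0          # X switches ON at t = 5

def ffl(y, t, k=1.0, d=1.0, K=0.5):
    Y, Z_ffl, Z_simple = y
    X = step_input(t)
    dY = k * (X > K) - d * Y                          # Y produced when X is ON
    dZ_ffl = k * ((X > K) and (Y > K)) - d * Z_ffl    # AND gate: needs both X and Y
    dZ_simple = k * (X > K) - d * Z_simple            # direct regulation by X only
    return [dY, dZ_ffl, dZ_simple]

t = np.linspace(0, 15, 1500)
sol = odeint(ffl, [0.0, 0.0, 0.0], t, hmax=0.05)      # small steps to resolve the switch
t_half = lambda z: t[np.argmax(z > 0.5 * z[-1])]
print("Time to half-maximal Z, simple regulation:  %.2f" % t_half(sol[:, 2]))
print("Time to half-maximal Z, coherent FFL (AND): %.2f (delayed ON response)"
      % t_half(sol[:, 1]))
```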

Regulatory network inference approaches involve the reconstruction of connections based on gene expression data through bioinformatics approaches that rely on sequence and clustering analysis (Tavazoie et al., 1999; McCue et al., 2001). Once the structure of these interactions is established, other approaches such as Network Component Analysis (Liao et al., 2003) can be used to define the strength of the interactions based on gene expression data. However, in these analyses the primary emphasis is on the reconstruction of the component interactions from data and hence most of these initial studies have focused on the connection from data to conceptual or qualitative models of gene regulation. Recent studies have begun to validate these regulatory network models by using them to guide experimental design and the collection of gene expression data. For example, Covert et al. (2004) developed an integrated metabolic and regulatory network model of E. coli and utilized the combined model to design experiments to better characterize modules that were found to be inconsistent with literature data (Herrgard et al., 2004). Gene expression data from the model-designed experiments were reconciled with the model and subsequent refinements to the model were made, resulting in improved accuracy for predictions of both gene deletion phenotypes and gene expression.

An interesting approach to regulatory network reconstruction based on gene expression data was proposed by Gardner et al. (2003). In this approach, a linear system model of gene expression was used to identify the strength and presence of gene interactions based on transcriptome data. Specifically, a model of a nine-gene subnetwork of the SOS pathway was developed and the expression data from nine perturbation experiments were used to obtain a quantitative model of the regulatory network. Subsequently, the incorporation of expression data upon exposure to mitomycin C (recA inhibitor) in the model correctly identified the transcriptional target of mitomycin C. Following this work, di Bernardo et al. (2005) used a similar algorithm to develop a network model of genome-wide gene expression from 515 experimental conditions that included exposure to compounds, gene deletions, and overexpression. This network model was then utilized to infer the mechanism of action of a drug (PTSB) with a previously unknown mode of action. The inferred target (thioredoxin reductase) was subsequently validated with biochemical experiments. Thus, in both these cases, the objective of the modelling and experimental exercise was to characterize the regulatory network and to identify the potential mechanism of action of drug targets. The network models in these cases were used to identify novel interactions in the network, which were subsequently verified experimentally. This interaction between the models and experiments is an essential defining feature of the Systems Biology approach.
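
The linear-systems idea underlying such inference approaches can be sketched on a small synthetic example: near steady state, responses x to known perturbations u satisfy Ax + u = 0, so the interaction matrix A can be recovered by least squares from enough perturbation experiments. This is a simplified illustration of the general idea, not the published algorithm of Gardner et al. (2003), and the four-gene network is hypothetical.

```python
# Simplified sketch of linear regulatory-network inference: near steady state,
# A x + u = 0, so measured responses x to known perturbations u constrain the
# interaction matrix A. The 4-gene network and data here are synthetic.
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_experiments = 4, 12

A_true = np.array([[-1.0,  0.0,  0.8,  0.0],
                   [ 0.9, -1.0,  0.0,  0.0],
                   [ 0.0,  0.7, -1.0,  0.0],
                   [ 0.0,  0.0, -0.6, -1.0]])   # hypothetical interaction matrix

U = rng.normal(size=(n_genes, n_experiments))       # applied perturbations (known)
X = np.linalg.solve(A_true, -U)                     # steady-state responses: A X = -U
X += 0.01 * rng.normal(size=X.shape)                # measurement noise

# Recover A by least squares: find A minimizing ||A X + U||
A_est = np.linalg.lstsq(X.T, -U.T, rcond=None)[0].T
print("Largest error in recovered interaction strengths: %.3f"
      % np.max(np.abs(A_est - A_true)))
```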

Metabolic networks

Analyses of metabolic networks have resulted in the identification of organizational principles of metabolic modules. For example, Almaas et al. (2004, 2005) analyzed the metabolic network of E. coli and identified a high-flux backbone of the network that was primarily reorganized in response to changes in the environment. Mahadevan and Palsson (2005) analyzed the metabolic networks of S. cerevisiae, G. sulfurreducens, and E. coli and found that lowly connected metabolites, such as those in low-flux biosynthetic pathways, are just as likely to participate in essential metabolic reactions. In another study, Ihmels et al. (2004) analyzed the transcriptional regulation of metabolic networks in S. cerevisiae and observed that the coordinated effect of regulation, via the suppression of branch points, led to increased linearity in the metabolic flow. Papp et al. (2004) analyzed the role of gene duplicates in the metabolic network of S. cerevisiae and concluded that the functional role of these duplicates was not to provide robustness to gene deletions but to boost enzymatic flux through the presence of multiple copies of the same gene. Taken together, these studies point out several design principles in these metabolic networks, namely, the importance of regulation of the high-flux backbone in changing environments, the essentiality of lowly connected metabolites, the transcriptional mechanism of orchestrating global changes to the network, and the relation of gene copies to the functional role of the gene product.
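
In these analyses, the “connectivity” of a metabolite is typically the number of reactions in which it participates, which can be read directly off the stoichiometric matrix. The toy matrix and metabolite names below are hypothetical and serve only to show the calculation.

```python
# Sketch: metabolite connectivity = number of reactions each metabolite takes part in,
# counted from the nonzero entries of a toy, hypothetical stoichiometric matrix.
import numpy as np

metabolites = ["ATP", "NADH", "G6P", "rare_precursor"]
S = np.array([[-1,  1, -1,  0,  1],    # ATP appears in 4 reactions (highly connected)
              [ 0,  1,  0, -1,  1],    # NADH in 3
              [ 1, -1,  0,  0,  0],    # G6P in 2
              [ 0,  0,  1,  0,  0]])   # rare precursor in 1 (lowly connected)

connectivity = np.count_nonzero(S, axis=1)
for name, k in zip(metabolites, connectivity):
    print(f"{name:15s} participates in {k} reaction(s)")
```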

Signaling networks

Detailed models have also been constructed for signaling pathways in mammalian and plant systems. For example, Bhalla and Iyengar (1999) constructed a comprehensive model of signaling pathways in the hippocampal CA1 neuron, comprising over 100 biochemical reactions across 15 different modules. All of these reactions were represented using mass action and Michaelis–Menten kinetics, with parameter values obtained from literature studies. The resulting comprehensive model was used to simulate interactions among the pathways, and these simulations provided critical insights into the systems-level response of these pathways, including complex dynamics such as bistability, signal integration, and non-linear response, all of which arise as a result of the intricate interplay between the different modules analyzed. In another study, Hoffmann et al. (2002) developed a pathway-scale computational model of the nuclear localization of the transcription factor NF-κB and its activation. The analysis of the model in this case suggested novel functional roles for the three IκB isoforms in enabling a shorter NF-κB response and decreasing oscillations in the response, which were subsequently experimentally validated, leading to an improved understanding of this signaling module.

Computational analysis of signaling pathways has also been conducted for bacterial processes. Alon et al. (1999) investigated the robustness of the chemotaxis network in E. coli, through which the cell senses and moves towards various chemoattractants. Both the computational analysis of a two-state model (Barkai and Leibler, 1997) and the experimental observations in that study confirmed that the property of exact adaptation arises from the network structure and is invariant to changes in intracellular protein concentrations. Finally, Yi et al. (2000) have shown that the robustness of exact adaptation in the chemotaxis network is due to the integral feedback control property arising from the structure of the chemotaxis network.

Systems-Level Data Analysis and Mining

Conceptual, qualitative, and quantitative models have been used to interpret and mine biological data. Several statistical tools have been used to integrate data at different biological levels and to draw physiologically meaningful conclusions. While a detailed review of all of these methods is outside the scope of this article, we summarize a few key articles that establish the fundamentals of using models for data integration and mining.

Mehra et al. (2003) developed a mathematical model of transcription and translation and identified several factors, such as the protein decay rate and the concentration of free ribosomes, that can reduce the correlation between changes in mRNA and protein levels. Experiments that measured both mRNA and protein levels from the same culture under different conditions (Lee et al., 2003) revealed that the changes in mRNA and protein levels were not significantly correlated, clearly illustrating the need to measure both mRNA and protein levels for the system under study (Hatzimanikatis and Lee, 1999). Hence, protein and mRNA levels represent two complementary sources of information about the biological system.
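
A toy calculation, sketched below, illustrates one of these factors: after a step increase in mRNA, the protein relaxes to its new steady state on a time scale set by its decay rate, so slowly degraded proteins show much smaller fold-changes than their transcripts at a given sampling time. The rate constants and fold-changes are hypothetical and are not taken from the cited studies.

```python
# Toy sketch of why mRNA and protein fold-changes can decorrelate: protein follows
# dp/dt = k_tl * m - k_dp * p, so its response lags the mRNA by a time ~1/k_dp.
import numpy as np

k_tl = 1.0                  # translation rate constant (hypothetical)
m0, m1 = 1.0, 4.0           # mRNA level before and after a 4-fold induction
t_sample = 2.0              # time (h) at which both species are measured

for k_dp in (0.1, 1.0, 10.0):                      # protein decay rates (1/h)
    p0 = k_tl * m0 / k_dp                           # steady-state protein before induction
    p_inf = k_tl * m1 / k_dp                        # new steady state after induction
    p_t = p_inf + (p0 - p_inf) * np.exp(-k_dp * t_sample)  # relaxation towards it
    print(f"protein decay {k_dp:4.1f}/h: mRNA fold-change = 4.0, "
          f"protein fold-change = {p_t / p0:.2f}")
```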

Ideker et al. (2001b) developed a physical interaction network model with ∼3000 interactions and used this qualitative network to integrate mRNA and protein expression data in S. cerevisiae to obtain information on the functional roles of proteins in the galactose utilization pathway. They identified 997 mRNAs that were significantly perturbed and subsequently verified some of the hypotheses generated by the qualitative network model. More recently, Haugen et al. (2004) used an updated interaction network model consisting of 20 985 interactions to analyze gene expression data, together with phenotype data from growth competition experiments on 4650 strains, to identify global changes in the transcriptional and metabolic networks of S. cerevisiae in response to arsenic.
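
The core operation in these studies, overlaying expression changes on an interaction network and extracting connected groups of perturbed genes, can be sketched in a few lines. The edge list, gene names, and log2 values below are toy placeholders rather than the published network or data, and the example assumes the networkx library is available.

```python
# Minimal sketch of the network-overlay idea: project differential expression onto
# a (hypothetical) physical interaction network and extract perturbed subnetworks.
import networkx as nx

interactions = [("GAL4", "GAL80"), ("GAL4", "GAL1"), ("GAL1", "GAL10"),
                ("GAL80", "GAL3"), ("CDC19", "PGK1")]          # toy edge list
log2_change = {"GAL4": 0.2, "GAL80": -1.5, "GAL1": 3.1,
               "GAL10": 2.7, "GAL3": 1.8, "CDC19": 0.1, "PGK1": -0.2}

G = nx.Graph(interactions)
perturbed = [g for g, x in log2_change.items() if abs(x) >= 1.0]  # simple threshold
sub = G.subgraph(perturbed)
for component in nx.connected_components(sub):
    print("perturbed module:", sorted(component))
```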

Hwang et al. (2005a) also recognized the challenges of data integration when the different data types vary in size, confidence, and network coverage, and developed a general algorithm based on advanced statistical techniques to better interpret such data. These authors used the approach to analyze 18 data sets, including mRNA levels, protein levels, protein–DNA interaction data, and protein–protein interaction data related to galactose utilization in S. cerevisiae (Hwang et al., 2005b), and identified 69 genes that were significantly perturbed across the data sets. Additionally, the analysis suggested that fructose metabolism would be downregulated in the presence of galactose via the downregulation of a hexose transporter, a hypothesis that was experimentally verified through the measurement of the corresponding protein levels.

Patil and Nielsen (2005) analyzed transcriptional changes associated with metabolic networks and identified reporter metabolites around which significant transcriptional changes occur. They used this approach to integrate gene expression data from S. cerevisiae with the metabolic network and identified metabolites in the parts of the network that were perturbed, suggesting that the approach could be used to infer the mechanism of action of a perturbation from expression data. Recently, Cakir et al. (2006) used a similar approach to integrate metabolome data on 84 intracellular metabolites with transcriptional data from a laboratory strain and an industrial strain of S. cerevisiae used for ethanol production. The results of this analysis allowed the dominant regulatory process in the two strains to be classified as either metabolic or transcriptional regulation, providing valuable insights for the potential engineering of these strains towards improved biocatalytic performance.
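
The reporter-metabolite calculation itself is simple to sketch: each metabolite is scored by aggregating the differential-expression z-scores of the enzymes acting on it, normalized so that metabolites with many unchanged neighbours do not score highly. The metabolite-to-gene mapping and the z-scores below are invented for illustration, and the background correction used in the published method is omitted.

```python
# Hedged sketch of the reporter-metabolite idea: score each metabolite by the
# aggregated z-scores of its neighbouring enzymes, normalized by sqrt(k).
import math

# Hypothetical mapping: metabolite -> genes encoding its neighbouring enzymes
neighbours = {"G6P": ["HXK1", "PGI1", "ZWF1"], "PYR": ["PYK1", "PDC1", "PDA1"]}
z_score = {"HXK1": 2.1, "PGI1": 1.8, "ZWF1": 2.5,
           "PYK1": 0.3, "PDC1": -0.2, "PDA1": 0.4}   # differential-expression z-scores

def reporter_score(genes):
    # sum of neighbour z-scores divided by sqrt(k): a metabolite surrounded by
    # unchanged genes keeps a score near zero regardless of its connectivity
    return sum(z_score[g] for g in genes) / math.sqrt(len(genes))

for met, genes in neighbours.items():
    print(f"{met}: reporter score = {reporter_score(genes):.2f}")
```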

Another research area where such data mining techniques are valuable is the discovery of biomarkers from biological data. These studies typically involve the collection of a large number of data sets across different conditions, such as diseased and healthy tissues. The goal is to identify specific biological entities that are altered in the diseased tissue and can be reliably correlated with the disease. One of the earliest studies to utilize microarray data in this way is the work of Nutt et al. (2003) on cancer classification. The authors analyzed gene expression data from ∼12 000 genes in 100 different tissue types and used the k-nearest neighbour classification algorithm to classify the tissue types. They report that the gene expression-based classifier distinguished the different tissue types better, and was more predictive of clinical outcome, than classical methods.
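
The classification step is standard supervised learning; the sketch below shows a k-nearest neighbour classifier of the kind used in such studies, trained on synthetic expression data rather than the published microarray set, and assumes the scikit-learn library is available.

```python
# Illustrative sketch of expression-based tumour classification with k-nearest
# neighbours, using synthetic data in place of a real microarray data set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_samples, n_genes = 100, 200                      # toy dimensions
labels = rng.integers(0, 2, n_samples)             # two hypothetical tumour classes
X = rng.normal(size=(n_samples, n_genes))
X[labels == 1, :10] += 1.5                         # 10 "marker" genes shifted in class 1

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```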

A key difference between these data mining approaches and the rest of the reviewed articles lies in the coarse-grained nature of the interaction network model and in the use of heterogeneous data sets as inputs to the computational approaches. Because the models are qualitative, they can be scaled to accommodate regulatory, metabolic, and signaling interactions in one integrated network.

Generation, Testing, and Elimination of Biological Hypotheses

In addition to signaling networks, cell differentiation leading to sporulation in B. subtilis was analyzed by Iber et al. (2006). In this study, a mathematical model of the sporulation process that included 150 reactions with 25 parameters was developed and validated against literature data. The validated model was used to generate a key biological hypothesis: that the difference in volume between the two cell types that develop from the same cell determines cell fate and the progression of sporulation. Specifically, a low rate of dephosphorylation catalyzed by a phosphatase was predicted to be a key factor in sporulation, and subsequent experimental measurement of this rate confirmed the hypothesis, illustrating the value of the interplay between modelling for hypothesis generation and experiments for hypothesis testing.

The availability of whole-genome sequences for micro-organisms has made possible the genome-scale reconstruction of metabolic networks. Such reconstructed networks formed the basis for genome-scale metabolic models based on flux balance analysis. Edwards and Palsson (2000) developed a comprehensive steady-state metabolic model of E. coli comprising 627 reactions and 438 metabolites and compared the model predictions with large-scale gene essentiality data. Edwards et al. (2001) used this genome-scale metabolic model to test the hypothesis that E. coli grows on glucose at the optimal rate predicted by the model. While the experimental observations confirmed optimal growth on glucose, they clearly indicated that E. coli growth on glycerol was suboptimal. Interestingly, subsequent adaptive evolution of E. coli on glycerol led to increased growth rates that were closer to the model-predicted optimum (Ibarra et al., 2002). Resequencing the adapted strains revealed that the genetic changes accompanying the increased growth rate on glycerol involved multiple mutations in RNA polymerase and glycerol kinase, suggesting that the molecular basis for adaptation in micro-organisms involves both local and global changes (Herring et al., 2006). These results also suggested that flux through glycerol kinase might have been responsible for the suboptimal growth observed initially.
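
Flux balance analysis itself reduces to a linear program: maximize a biomass flux subject to the steady-state mass balance S·v = 0 and capacity bounds on the fluxes. The sketch below solves this for a three-reaction toy network; the stoichiometry and bounds are invented for illustration and bear no relation to the genome-scale E. coli model.

```python
# Hedged sketch of flux balance analysis on a toy network: maximize a "biomass"
# flux subject to steady-state mass balance S.v = 0 and bounds on the fluxes.
import numpy as np
from scipy.optimize import linprog

# Columns: v_uptake, v_reaction, v_biomass; rows: metabolites A and B
#   uptake:  -> A ;  reaction: A -> B ;  biomass: B ->
S = np.array([[1, -1,  0],
              [0,  1, -1]])
bounds = [(0, 10), (0, None), (0, None)]   # uptake limited to 10 flux units
c = np.array([0, 0, -1])                   # linprog minimizes, so negate the biomass flux

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal biomass flux:", -res.fun, "flux distribution:", res.x)
```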

More recently, Reed et al. (2006) used an updated model of E. coli metabolism, compared its predictions with phenotype data, and identified cases of incorrect predictions. They then formulated an optimization algorithm to identify missing reactions that, if added, would enable correct prediction of the phenotype data. The computer-generated hypotheses were experimentally confirmed in five cases, leading to novel functional assignments for two enzymatic activities and four transport proteins. Mahadevan et al. (2006) recently constructed a genome-scale model of metabolism in the dissimilatory Fe(III)-reducing bacterium G. sulfurreducens, with the objective of better understanding this unique mode of metabolism. The model consisted of 588 genes and 524 reactions and led to the finding that the energetics of global proton balance could explain the observed decrease in biomass yields associated with Fe(III) reduction. The model was also used to predict the outcome of knockout experiments, and a counter-intuitive prediction of increased biomass yield in a mutant was experimentally verified, further validating the energetics of metal respiration incorporated in the model. Segura et al. (2007) further investigated the role of redundant pathways by computationally enumerating all of the alternate pathways and verifying the physiological role of the potentially redundant pathways through a combination of genetic, biochemical, and growth physiology data for six mutant strains in 12 different environmental conditions. This analysis provided valuable information on the functioning of central metabolism in G. sulfurreducens.
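
The in-silico knockout predictions underlying such comparisons can be sketched by constraining individual fluxes to zero and re-solving the flux balance problem: redundant reactions leave the predicted growth unchanged, whereas essential ones abolish it. The toy network below is invented for illustration and is far simpler than the genome-scale models used in the cited studies.

```python
# Hedged sketch of in-silico knockout screening: force each flux to zero in turn
# and re-maximize growth to classify reactions as redundant or essential.
import numpy as np
from scipy.optimize import linprog

# Toy network: uptake -> A; r1: A -> B; r2: A -> B (isoenzyme); r3: B -> C; growth: C ->
S = np.array([[1, -1, -1,  0,  0],     # metabolite A
              [0,  1,  1, -1,  0],     # metabolite B
              [0,  0,  0,  1, -1]])    # metabolite C
bounds = [(0, 10)] + [(0, None)] * 4
c = np.array([0, 0, 0, 0, -1])          # maximize the growth flux (last column)

def max_growth(bnds):
    res = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bnds, method="highs")
    return res.x[-1]                    # predicted growth flux at the optimum

print("wild type growth:", max_growth(bounds))
for ko, name in [(1, "r1"), (2, "r2"), (3, "r3")]:
    b = list(bounds)
    b[ko] = (0, 0)                      # knockout = force zero flux through the reaction
    print(f"knockout {name}: growth = {max_growth(b)}")
```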

Model-Based Design for Synthetic Biology

Chemical engineering and other engineering disciplines rely heavily on analytical tools and mathematical models in several areas, including reactor design, unit operations, and process control, where modelling and simulation are routinely used in design. With the availability of detailed models of biological systems, such systematic design and manipulation of biological processes at the molecular level is becoming a reality and has led to the emergence of “synthetic biology.” In this section, we focus on studies that have developed an integrated modelling and experimental approach to the implementation of model-based designs.

As an example, Gardner et al. (2000) developed a mathematical model of a gene regulatory network with two repressors that mutually downregulate each other's expression and whose repression can be relieved either by the addition of IPTG or by an increase in temperature. Dynamical analysis of the mathematical model revealed the existence of bistability when the strengths of the two repressible promoters are balanced. They then constructed a plasmid carrying the two repressors and promoters and observed the switch-like behaviour predicted by the model. Although the system modelled in this example was small, with two states, the linkage between the molecular entities (transcripts) and a higher-level response (the toggle switch) was clearly elucidated.
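
The toggle-switch equations are compact enough to reproduce in a few lines: two Hill-type repression terms and first-order decay. The sketch below uses symmetric, illustrative parameter values rather than those fitted in the original study, and shows that the system settles into a different stable state depending on the initial condition.

```python
# Sketch of a two-repressor toggle switch with Hill-type mutual repression;
# the parameter values are illustrative, chosen to place the system in the
# bistable regime, and are not those of the published construct.
import numpy as np
from scipy.integrate import odeint

alpha, n = 10.0, 2.0          # promoter strength and cooperativity (hypothetical)

def toggle(y, t):
    u, v = y                  # concentrations of the two repressors
    du = alpha / (1.0 + v**n) - u
    dv = alpha / (1.0 + u**n) - v
    return [du, dv]

t = np.linspace(0, 50, 500)
for y0 in ([5.0, 0.1], [0.1, 5.0]):            # two different initial conditions
    u_ss, v_ss = odeint(toggle, y0, t)[-1]
    print(f"start {y0} -> steady state u = {u_ss:.2f}, v = {v_ss:.2f}")
```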

Similarly, Leibler and co-workers (Barkai and Leibler, 2000) developed a computational model based on the common features of the gene regulatory networks controlling circadian rhythms (the biological clock) in different organisms. The analysis of the model indicated that the presence of both negative and positive feedback regulation provides robust oscillations in spite of biochemical noise and a changing cellular environment. Elowitz and Leibler (2000) constructed an oscillatory gene regulatory network, the repressilator, with three repressors acting in series. Essentially, this involved three negative feedback elements arranged in a ring, where each repressor downregulates the expression of the next element. This gene regulatory network was found to exhibit oscillations similar to those of the networks controlling circadian rhythms. However, in contrast to the circadian networks, the repressilator was found to be sensitive to biochemical noise and the environment, suggesting that the structure of the regulatory network is a primary determinant of dynamic behaviour such as robustness (Vilar et al., 2002).
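
A stripped-down, protein-only version of the repressilator ring is sketched below; the published model also tracks mRNA explicitly, and the promoter strength and cooperativity here are chosen simply so that this reduced form oscillates. It illustrates the ring architecture rather than the original parameterization.

```python
# Sketch of a three-gene repression ring (protein-only simplification of the
# repressilator); parameters are illustrative and chosen so the reduced model oscillates.
import numpy as np
from scipy.integrate import odeint

alpha, n = 200.0, 3.0     # promoter strength and cooperativity (hypothetical)

def repressilator(p, t):
    p1, p2, p3 = p
    # each protein is expressed from a promoter repressed by the previous protein in the ring
    dp1 = alpha / (1.0 + p3**n) - p1
    dp2 = alpha / (1.0 + p1**n) - p2
    dp3 = alpha / (1.0 + p2**n) - p3
    return [dp1, dp2, dp3]

t = np.linspace(0, 100, 2000)
traj = odeint(repressilator, [1.0, 1.5, 2.0], t)
late = traj[1500:, 0]     # protein 1 after the initial transient has died out
print(f"protein 1 keeps oscillating between {late.min():.1f} and {late.max():.1f}")
```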

Liao and co-workers (Fung et al., 2005) extended this concept by integrating a gene regulatory network with a metabolic pathway to obtain sustained oscillations in both enzyme and metabolite levels. In this study, an integrated model of the regulatory and metabolic network was developed, consisting of four metabolites and three enzymes. Even though this was not a large-scale model, its analysis suggested that a high glycolytic rate is required for sustained oscillations. Experiments with substrates supporting different glycolytic fluxes (e.g., glucose, which supports a relatively high flux compared to glycerol) confirmed the prediction: oscillations were observed when glucose was utilized, whereas there were no oscillations when glycerol was the substrate.

Other researchers have constructed synthetic biological networks involving bacterial signaling. An example is the quorum sensing pathway, which is ubiquitous in bacteria and serves as a mechanism for detecting cell density. Weiss and co-workers engineered the sensing circuit so that cells at different distances from the centre of a plate would express different proteins in response to signaling molecules synthesized by cells in the centre. By changing the sensitivity of the circuit, they were able to create a programmed pattern of cells on the plate (Basu et al., 2004; McDaniel and Weiss, 2005). Computational analysis with a model comprising four proteins and the signaling molecule revealed the importance of the degradation rate of a protein in determining the size of the pattern formed.

More recently, genome-scale metabolic models have been used to engineer strains capable of enhanced synthesis of biochemical compounds. As an example, Bro et al. (2006) used the genome-scale model of S. cerevisiae to identify gene insertion strategies whose implementation led to improved ethanol production. Similarly, Palsson and co-workers (Fong et al., 2005) used an optimization algorithm to calculate gene deletion strategies that would couple lactic acid overproduction in E. coli to growth. They constructed a strain with the identified mutations and adaptively evolved the mutant to enhance both growth rate and lactic acid production. Alper et al. (2005) integrated this model-based identification of gene deletions with random mutagenesis to select strains with increased production of lycopene in E. coli. Hence, these promising studies indicate the potential of a systematic approach to designing complex biological systems that encompass all types of biochemical networks.

SYSTEMS BIOLOGY TOOLS

The future of Systems Biology depends on the training and education of a new generation of researchers capable of working in a highly interdisciplinary field, and also on the availability of research tools and infrastructure that allow one to bridge the disciplinary gaps. We conclude with a few comments on research tools and education.

In the last few years, an increasing number of tools and software packages have become available for modelling biological systems, including both academic-grade and commercial-grade modelling software. Examples of academic-grade software include Gepasi, Systems Biology Workbench, MetaFluxNet, CellNetAnalyzer, and FBA, which are summarized in Table 1. Most of these packages allow the simulation of a biochemical network once the pathways and kinetic parameters are known, and some also include dynamical analysis capabilities that are valuable for assessing stability and other dynamic properties. Several companies have also developed tools for modelling and analysis, including Mathworks (Natick, MA), Genomatica Inc. (San Diego, CA), Gene Network Sciences (Ithaca, NY), and Entelos (Foster City, CA), although in most cases the modelling platform is proprietary and not publicly available. A more detailed list of Systems Biology tools is available at http://sbml.org/.

Table 1. List of academic and commercial software and their applications

Software | Application | Source

Academic software
CellNetAnalyzer | Simulation and topological analysis of metabolic and signaling networks | www.mpi-magdeburg.mpg.de/projects/cna/cna.html
Biology Workbench | Web-based tool for searching protein and gene sequence databases | http://workbench.sdsc.edu/
GENESIS | Simulation platform for neural systems consisting of biochemical reactions and single neuron models | www.genesis-sim.org/genesis/
Gepasi | Biochemical kinetic simulator, metabolic control analysis, optimization, and stability analysis | www.gepasi.org
MetaFluxNet | Metabolic flux analysis, flux balance analysis | mbel.kaist.ac.kr/mfn
Systems Biology Workbench | Platform connecting applications for modelling, analysis, visualization, and data manipulation | sbw.kgi.edu
VCELL | Remote modelling and simulation environment | vcell.org

Commercial software
SimPheny™ | Models of metabolism in simple and complex organisms | Genomatica Inc.
SimBiology™ | Framework for developing and analyzing models of biological systems | Mathworks Inc.
Physiolab™ | Models of human physiology | Entelos Inc.
VisualCell™, VisualHeart™ | Platform for data integration and simulation, cardiac modelling | Gene Network Sciences Inc.
In silico Discovery | Models of cellular metabolism | In silico Biotechnology GmbH

The abundance of biological models has led to the definition of the Systems Biology Markup Language, SBML (http://sbml.org/), which serves to standardize model descriptions and enhance their portability across different software. Finally, model databases are becoming available where researchers can deposit models of biological systems for community use; examples include the DOQCS (doqcs.ncbs.res.in) and BioModels (http://www.biomodels.net/) databases. The Alliance for Cellular Signaling has made data from a variety of studies on signaling networks available to the research community at http://www.signaling-gateway.org/data/Data.html. Similar gateways exist for other biological fields, such as cell migration (http://www.cellmigration.org/index.shtml) and cell signaling (stke.sciencemag.org/cm). The future of Systems Biology research will be greatly aided by the availability of tools to move seamlessly between heterogeneous databases, powerful and easy-to-use analysis tools, modelling and simulation tools, visualization capabilities, easy access to domain knowledge, new technologies for experimental measurements, and so on. In addition, novel computational approaches that can combine the deterministic description of well-studied subsystems with the biological uncertainty of less-studied subsystems will be valuable in advancing the utility of the Systems Biology approach. These activities set the stage for comprehensive models of complex biological systems to be assembled by integrating the efforts of the Systems Biology community as a whole.
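
To give a flavour of the kind of structure SBML standardizes, the sketch below parses a schematic, SBML-like fragment with the Python standard library. A real SBML file carries additional required attributes and namespaces and would normally be read and written with a dedicated library such as libSBML; the fragment here is illustrative only.

```python
# Illustrative sketch only: a schematic SBML-like fragment parsed with the
# standard library, to show the species/reaction structure such files encode.
import xml.etree.ElementTree as ET

sbml_text = """<sbml level="2" version="1">
  <model id="toy_pathway">
    <listOfSpecies>
      <species id="A" initialConcentration="1.0"/>
      <species id="B" initialConcentration="0.0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

model = ET.fromstring(sbml_text).find("model")
print("species:", [s.get("id") for s in model.find("listOfSpecies")])
print("reactions:", [r.get("id") for r in model.find("listOfReactions")])
```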

An area of major potential impact for Systems Biology is medicine. Ultimately, the health of a system is defined by its physiological state. From a Systems Biology point of view, health is defined by a certain state of the physiological components and a certain state of dynamic connectivity among those components; disease, in this context, is a state with the wrong components or connections. The long-standing view of medicine has been that a drug molecule can somehow take a system from a state of disease to a state of health. As Jay Bailey states (Gibbs, 2001), “one reason why drug discovery technologies have not paid off as hoped is that they are based on the naive idea that you can redirect the cell in a way that you want it to go by sending in a drug that inhibits only one protein.” Systems Biology provides a novel and revolutionary perspective on health, disease, and the path to pharmaceutical discovery (Hood et al., 2004). Many of the problems of unanticipated drug toxicity that have plagued the pharmaceutical industry recently are the result of missing a systems-level perspective (Bugrim et al., 2004; Ekins et al., 2005).

EDUCATION OF FUTURE SYSTEMS BIOLOGISTS

The training and education of a successful systems biologist or mathematical modeller is a challenging task. The individual must have sufficient multi-disciplinary knowledge and background to communicate seamlessly between the disciplines of the life sciences and mathematics. The systems biologist needs sufficient knowledge of the biological system to understand the goals of the biologist and to capture the essential features of the system in a mathematical language, as well as sufficient mastery of mathematical, statistical, and engineering tools to select the appropriate quantitative methodologies. The ability to translate biology into mathematics without losing key information in the process is important, as is the ability to interpret the mathematical models in a way that generates biological insight. The systems biologist should also be able to communicate to life scientists what level of expectation is appropriate for the mathematical analysis. Excessive expectations of mathematical models eventually result in unnecessary hype, or in cynicism and distrust from life scientists; excessive distrust of the models, on the other hand, can lead to under-utilization of potentially powerful and valuable tools for furthering the frontiers of biology. A successful systems biologist should therefore balance these conflicting pressures of expectation and potential, and should also have the ability to understand the biological questions and select the right quantitative tools to answer them effectively.

There are three ways in which one could train systems biologists. First, one can take individuals who are well trained in mathematics and computational sciences and provide them with the biological domain knowledge needed to be effective. Second, one can take individuals with expertise in the biological domain and train them in mathematics and computation. These first two approaches are relevant in corporations or institutions where senior people are attempting to enter the field of Systems Biology. The third and most effective approach is to start at the undergraduate level and train a whole new generation of students who are equally adept at mathematics and biology. In fields such as chemical engineering, it has long been common to have students who are comfortable with both mathematics and chemistry. In Systems Biology, one needs to develop a new class of students who are in command of both mathematics and biology and who can bring about new synergies between the two previously disparate fields.

REFERENCES
