### Abstract

- Top of page
- Abstract
- INTRODUCTION: WHAT IS SYSTEMS BIOLOGY?
- ROLE OF MODELS IN SYSTEMS BIOLOGY: CAN MATHEMATICS BE A GOOD LANGUAGE OF CONNECTIVITY?
- CONCEPTUAL BASIS FOR CLASSIFICATION OF MODELS: WHAT ARE THE METRICS OF USEFULNESS OF MODELS IN SYSTEMS BIOLOGY?
- SELECTED MANUSCRIPTS BASED ON SPECIFIC BIOLOGICAL IMPACT
- SYSTEMS BIOLOGY TOOLS
- EDUCATION OF FUTURE SYSTEMS BIOLOGISTS
- REFERENCES

Systems Biology is a nascent field that arose from the technology-driven omics measurement revolution. It goes beyond mere data analysis and focuses on the biological behaviour emerging from the dynamic interactions between system components that are organized in a hierarchical and highly connected manner. Mathematical models have been used as a conceptual framework to study such systems, and their impact is maximal when there is a synergistic interplay of the models with experimental data and biological domain knowledge. This review provides an introduction to the modelling process and selectively highlights manuscripts that have strong biological impact.

### INTRODUCTION: WHAT IS SYSTEMS BIOLOGY?

Living systems consist of numerous biological components that are connected together as layered networks in time and space. The components and the interactions among them are constantly changing in response to perturbations inside and outside the system. The components of a living system function at different time and length scales. The changing interactions among the components result in higher order behaviour that leads to the adaptation and survival of the organism in changing environments. Systems Biology is the study of the complex dynamic interactions among the various components that underlie biological systems. The biological components include molecular entities, such as DNA, RNA, proteins, and metabolites, as well as more organized entities, such as organelles, cells, tissues, and organs. Systems Biology results in an understanding of behaviour that emerges from hierarchical dynamic networks present in living systems. Application areas of interest in Systems Biology include medicine, agriculture, marine biology, and industrial biotechnology. The focus of this manuscript is on the synergistic interplay between biology and mathematics, which is the basis for successful applications of Systems Biology.

Systems Biology is a field that is in its early stages of evolution and is still grappling with the best strategy to study complex systems. Such studies may be purely experimental, purely mathematical or computational, or a combination of both approaches. The analysis of the complex dynamic interactions leads first to the ability to understand the behaviour of biological systems and subsequently to the ability to manipulate these systems. One approach to Systems Biology is purely experimental, attempting to characterize the various molecular components such as genes, proteins, metabolites, and signaling molecules. Recent breakthroughs in measurement technologies (the “omics revolution”) have led to the ability to characterize molecular components and resulted in a concurrent explosion of new data: genomic, transcriptomic, proteomic, metabolomic, lipidomic, and intermolecular interaction (interactome) data. A detailed experimental cataloguing of molecular components is thus becoming feasible. A large number of manuscripts in the literature have used a purely experimental strategy in Systems Biology. A mere experimental cataloguing of the system components, although useful, does not by itself reveal much about the complex and constantly changing dynamic interactions among the component parts that ultimately determine system behaviour. In this review, we have deliberately focused our attention on studies that go beyond purely experimental approaches. At the very least, there is a need for a conceptual framework to understand the connections and relationships among the components. One needs to understand how the component parts are connected, how they relate to each other in time and space, and how they respond and adapt to external environmental influences. The key to the success of Systems Biology is therefore the availability of a language or a conceptual framework to describe and analyse interactions. An explicit and “dynamic language of connectivity” is therefore the next revolution, one that is even more important for Systems Biology than the current “omics” measurement revolution.

At the simplest level, a language of connectivity can be a “verbal” description that lists how various components are connected and interact with each other. However, verbal representations or “verbal models” have their limitations, especially as the number of components increases and the interactions become complicated. Another way of describing connections and interactions is as a “pictorial representation” (as in a connected graph or a map, arrow-and-node diagrams, signed digraphs, flowcharts, etc.). Metabolic and signaling pathways (commonly available in biology textbooks) are examples of pictorial representations or “graphic models.” These pictorial representations can be converted to more rigorous mathematical representations using graph-theoretic approaches and Bayesian network formulations. However, these pictorial maps are usually inadequate for dynamic relationships, even though animated images or pictures could have limited utility in providing spatial localization information. The big limitation of static maps is that they do not provide information on flows. For example, in a chemical engineering system, a piping and instrumentation diagram does not provide any information on material and energy flows. Similarly, a map of a global news network does not say anything about the information or news flowing through the network. Similar analogies can be drawn for electrical circuits, road maps, etc.
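
As a minimal sketch of how a pictorial map can be turned into a graph-theoretic object, the Python snippet below encodes a signed digraph as a signed adjacency matrix. The three components `A`, `B`, `C` and their interactions are hypothetical and chosen only for illustration; they do not come from any of the cited studies. Note that the matrix records only what is connected and with which sign, so it shares the limitation of static maps discussed above: it carries no information on flows.

```python
# Hypothetical three-component network, used only for illustration.
# Each edge is (source, target, sign): +1 = activation, -1 = inhibition.
edges = [
    ("A", "B", +1),   # A activates B
    ("B", "C", +1),   # B activates C
    ("C", "A", -1),   # C inhibits A, closing a feedback loop
]

nodes = sorted({name for src, dst, _ in edges for name in (src, dst)})
index = {name: i for i, name in enumerate(nodes)}

# Signed adjacency matrix: entry [i][j] is the sign of edge i -> j, 0 if absent.
adjacency = [[0] * len(nodes) for _ in nodes]
for src, dst, sign in edges:
    adjacency[index[src]][index[dst]] = sign


def loop_sign(path):
    """Product of edge signs along a closed path such as ["A", "B", "C"].

    A negative product indicates a negative feedback loop; a result of 0
    means at least one edge on the path does not exist.
    """
    sign = 1
    for src, dst in zip(path, path[1:] + path[:1]):
        sign *= adjacency[index[src]][index[dst]]
    return sign


for name, row in zip(nodes, adjacency):
    print(name, row)
print("sign of loop A -> B -> C -> A:", loop_sign(["A", "B", "C"]))
```

The loop sign is one simple structural property that can be read off such a matrix; describing how much of each component is present over time requires the dynamic models discussed next.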

Mathematical models, and especially differential equations, provide a trusted language to describe how the quantitative value of a system component changes in a dynamic manner as a function of time and environmental context. Such models require a quantitative understanding of the system and knowledge of the functional relationships among the components. Models need not always be mathematical. Because of the inherent complexity of biological systems, it may not always be possible to represent every detail in a mathematical model. In many cases, one may not know “a priori” what details are important and should be included in the model and what details are unimportant enough to be excluded. Even if one knew the important variables, one may not know the exact quantitative relationships among the variables. In such cases, one could potentially use qualitative differential equations, computer representations, or models that are a hybrid of quantitative and qualitative approaches. Control systems engineering provides a language and a set of tools to describe feedback loops that are ubiquitous in biological systems. Systems Engineering has contributed to Systems Biology by bringing in systems tools and perspectives. However, the complexity of engineering systems is much lower than that of biological systems. Systems engineering tools are not easily or directly transferable to biology without paying special attention to the unique aspects of living systems. Biological systems are complex, self-organizing, highly adaptive, highly nonlinear, and operate at many levels of hierarchy. It is indeed a daunting challenge to meaningfully apply mathematical or systems engineering tools to complex dynamic nonlinear adaptive living systems. One could justifiably question whether existing tools are sufficient to meet the challenges of living systems or whether a new “language of connectivity” has to be invented.
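
To make the idea of a differential equation with a feedback loop concrete, the following Python sketch integrates a single hypothetical component `x` that represses its own synthesis. The equation form (a Hill-type repression term minus first-order degradation), the parameter names `k_syn`, `k_deg`, `K`, `n`, and all numerical values are assumptions made for illustration only; explicit Euler integration is used simply because it needs no external libraries.

```python
def simulate_feedback(k_syn=1.0, k_deg=0.5, K=1.0, n=2,
                      x0=0.0, dt=0.01, t_end=20.0):
    """Euler integration of dx/dt = k_syn / (1 + (x/K)**n) - k_deg * x.

    The synthesis term falls as x rises (negative feedback), while
    degradation is proportional to x.
    """
    steps = int(round(t_end / dt))
    x = x0
    trajectory = [x]
    for _ in range(steps):
        synthesis = k_syn / (1.0 + (x / K) ** n)   # repressed by x itself
        degradation = k_deg * x
        x += dt * (synthesis - degradation)
        trajectory.append(x)
    return trajectory


trajectory = simulate_feedback()
print(f"x at start: {trajectory[0]:.3f}, x at end: {trajectory[-1]:.3f}")
```

With the illustrative default values, the steady state satisfies x + x³ = 2, so the trajectory settles near x = 1. Unlike a static map, the model states how fast the component changes and where it settles, but only because a functional form and parameter values were assumed.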

Mathematical modelling is not new to biology. Models such as the logistic model of population growth (proposed by Verhulst, 1838) have been around for well over a century. Other notable models include the work of Hodgkin and Huxley (1952), who developed a quantitative model of the ionic mechanisms involved in neuronal stimulation and subsequently received the 1963 Nobel Prize in Physiology or Medicine for their role in discovering the ionic basis of nervous conduction. What is new to biology is the huge explosion of “omic” data and the impact that it has had on the motivation for modelling. Whenever there has been a paucity of experimental data, modelling has been out of vogue. When there is a surplus of experimental data, the first reaction is to look for ways to make sense of this excess of information. Quantitative analysis is sometimes an afterthought that is necessitated by the need to analyse huge amounts of data and to detect underlying patterns. The field of bioinformatics was an outgrowth of this need. Analysis of data alone (as done in bioinformatics) is not sufficient to understand how the components are connected together and especially how they interact in a dynamic manner. One needs an appropriate framework to describe the dynamically changing glue that connects everything together. This requires a change in focus from the analysis of data (bioinformatics) to creating a modelling or conceptual framework to understand and use the data (Systems Biology). Bioinformatics is an initial response to large amounts of data and an attempt to make sense of the data, while Systems Biology goes beyond the mere analysis of data towards the next step of creating a conceptual and knowledge framework in the language of mathematical models.
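
For readers unfamiliar with it, the logistic model mentioned above is commonly written as follows. The symbols are conventional textbook notation rather than notation taken from this review: $N$ is the population size, $r$ the intrinsic growth rate, $K$ the carrying capacity, and $N_0$ the initial population.

$$
\frac{dN}{dt} = r N \left(1 - \frac{N}{K}\right),
\qquad
N(t) = \frac{K}{1 + \dfrac{K - N_0}{N_0}\, e^{-rt}}
$$

For small $N$ the growth is approximately exponential at rate $r$; as $N$ approaches $K$ the factor $(1 - N/K)$ drives the growth rate to zero, which gives the familiar S-shaped curve.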

Although mathematical modelling has been around for a long time (Bailey, 1998), the term “Systems Biology” has come into common use in the literature only recently (see Figure 1). Since its inception, there have been several reviews and textbooks on this topic (Ideker et al., 2001a; Kitano, 2002; Ge et al., 2003; Wiley et al., 2003; Kell, 2004; Weston and Hood, 2004; Snoep and Westerhoff, 2005; Palsson, 2006; Swedlow et al., 2006; Alon, 2007; Bruggeman and Westerhoff, 2007) and the term has been defined in a variety of ways. For example, the most common definition of Systems Biology appears to be that of Selinger et al. (2003), who state that Systems Biology is “the study of how the parts work together to form a functioning biological system” (Selinger et al., 2003; Snoep and Westerhoff, 2005). An alternative related definition is provided by Gutierrez et al. (2005), who state that Systems Biology is “the exercise of integrating the existing knowledge about biological components and extracting the unifying organizational principles that explain the form and function of living organisms.” Yet another related definition is provided by Hood et al. (2004), who state that “Systems Biology is a scientific discipline that endeavours to quantify all of the molecular elements of a biological system to assess their interactions and to integrate that information into graphical network models that serve as predictive hypotheses to explain emergent behaviours.” Hence, there appears to be an emerging consensus on the broader definition of Systems Biology, although details such as the emphasis on modelling tools are perhaps not conserved across definitions. In addition, another key aspect that has been emphasized previously is the modular or hierarchical nature of biological systems, which naturally requires systems level analysis (Hartwell et al., 1999; Trewavas, 2006). Interestingly, Trewavas uses a 1974 quote from Francois Jacob, “Every object that biology studies is a system of systems,” to highlight the hierarchical nature of biological systems in his perspective.

In this review, in addition to the perspective of Systems Biology as the study of behaviour emerging from hierarchical dynamic networks, we emphasize the interplay between the modelling and experimental components and its importance in Systems Biology research. In fact, most of the papers reviewed in this manuscript are papers that contain both modelling and experimental aspects of research (Figure 1) rather than either by itself.

The ultimate benefit from models in Systems Biology is in the design or reengineering of biological systems or processes. For example, in industrial biotechnology, one may want to use a model to design a cell to produce a protein or product that it may not have produced before. A goal in plant biotechnology may be to engineer disease resistance in plants. Genetic improvement of systems may require the introduction (or deletion) of several genes. Such interventions may perturb the entire dynamic network of relationships among genes, proteins, metabolites, and cellular structures. The successful design of a new system may require time-consuming and expensive trial and error experiments. In engineering, design or changes in processes are often done with the help of models and simulations. In biology, design or reengineering rarely makes use of quantitative mathematical models and is usually done by “Edisonian” trial and error. When quantitative models begin to be effectively and widely used for design and reengineering of biological systems, then the golden age of Systems Biology will have arrived!

In this manuscript, we start with a detailed primer on the use of mathematics as a language of connectivity in Systems Biology (Section “Role of Models in Systems Biology: Can Mathematics Be a Good Language of Connectivity?”). This introduction is meant for individuals with stronger expertise in biology than in mathematical modelling. Section “Conceptual Basis for Classification of Models: What are the Metrics of Usefulness of Models in Systems Biology?” provides a conceptual basis for classifying models in terms of their utility in Systems Biology. We also address the considerable challenges involved in the integration of models and experiments. Section “Selected Manuscripts Based on Specific Biological Impact” consists of several examples of papers from the literature that illustrate the various concepts detailed in this manuscript. This review paper is not intended to be an extensive review of all work done in the field. With modern Internet search tools, it is easy to get a comprehensive listing of papers in a particular area. However, for a novice to the field, this review can be used as a guide to classify results from such search engines into a conceptual framework that leads to a better understanding of the developments in the field of Systems Biology.

### ROLE OF MODELS IN SYSTEMS BIOLOGY: CAN MATHEMATICS BE A GOOD LANGUAGE OF CONNECTIVITY?

In the life sciences, we have made a sudden and dramatic transition from a data-poor environment to an extremely data-rich “omics” context. This data-rich world is beginning to have a major effect on the degree of complexity of potential models and the motivation for building such models. In the past, models have focused on a macroscopic level of system detail because of the inability to get sufficient experimental information at the molecular level. Detailed models only have meaning to the extent that they can be “validated” or connected to biological reality. Validation requires experimental data. In the absence of experimental data, the motivation to write very detailed models is indeed limited. The model of Domach and Shuler (Domach et al., 1984) is a rare example of a model with a large amount of detail long before the omic measurement revolution. However, recent advances in analytical technologies have enabled genome-wide measurements of gene and protein expression as well as metabolite profiling (Ferea and Brown, 1999; Fiehn et al., 2000; Pandey and Mann, 2000), which in turn has led to the development of more sophisticated models that can describe a number of physiological descriptors.

Systems Biology is an interdisciplinary field and it is difficult to be an expert in all aspects of the field. This is especially challenging when making the connections between experiments and modelling. This section has been written for individuals with a strong background in experimental aspects of Systems Biology and who may want to fully understand the role of mathematical modelling in biology. A good appreciation of the role of mathematical models can result in strong synergies between experiments and mathematics and the potential for significant breakthroughs. In this section, we have provided a basic understanding of the process of mathematical modelling, the experimental validation of models and the basis for selecting the degree of detail in the model.

#### The Process of Mathematical Modelling: Eight Steps to a Model

Modelling is the process of representing a “real system” in a symbolic language. When that language is mathematical, the process of “translating reality” into mathematics is termed “mathematical modelling.” It is extremely important in the modelling process to understand, “What is lost in the translation of the real system to a model?” If only limited aspects of the real system are sampled, one may end up with an inadequate, incomplete, or even completely incorrect “model” of the system. The quality of modelling depends on the quality of understanding or knowledge about a system. In some cases, there may be little or no connection between the mathematical model and the real system. Such incomplete or inadequate models are what often lead to the high degree of cynicism or distrust of mathematical models among biologists. It is therefore critical to be able to discriminate between inadequate, incomplete, or misleading models and high quality models based on a sound understanding of the system and clear goals. Whether a model is complete or adequate depends largely on the goals and expectations of the systems biologist and the reasons why the model was built. Understanding the steps involved in building a mathematical model can help the experimentalist build (or at least identify) useful mathematical models.

There are eight key steps in building a mathematical model of a biological system:

**Step 1**: *Define Goals and Expectations: where to focus.*

**Step 2**: *Define Assumptions: what is important, what to include, and what to rule out.*

**Step 3**: *Define Variables: select entities that are relevant to the modelling goals.*

**Step 4**: *Define System Connectivity among the Variables: what is connected.*

**Step 5**: *Define Functional Relationships and Equations: how are things connected and how they change.*

**Step 6**: *Identify Values of Constant Parameters in equations: quantifying relationships.*

**Step 7**: *Run Computer Simulations to obtain Trends: visualizing connections.*

**Step 8**: *Validate the Model: qualitative and quantitative confirmation.*

Once the goals of the model are defined and the expectations of the biologists are clarified, one defines the key assumptions that are to be made. This leads to a definition of the variables that are considered important and are to be included in the model. It is helpful if the variables of the model can be measured experimentally. The next step in the modelling process is to define the critical connectivities among the key variables of interest within the system. How are the variables of the system related to each other in time and space? Is there a strong or weak connection or no connection at all? Is there a linear relationship among the variables or a nonlinear relationship? Is there a cause and effect relationship or is there a circular relationship as in the case of regulated phenomena with feedback loops? Are the variables related through a metabolic reaction network? Or are the variables part of an information network that alerts or signals the biological system about an impending problem or the necessity to take decisive action (such as turning a gene on or off or changing the shape or activity of a protein)? Are the variables related by their sharing a physical structure or spatial location (e.g. a ribosome, mitochondrial membrane, tissue, or organ)? Can such connectivities be quantified experimentally? Once the connectivity of the system is defined, one comes up with the functional relationships and the equations describing the system. Modelling is an evolutionary process. One starts with a first version of the model and refines it through multiple iterations (Figure 2). The first model is rarely adequate. It is only through a process of successive refinement that models improve to a point where they can be of utility. Computer simulation of mathematical models and qualitative validations using domain knowledge can play an important role in speeding up the process of model refinement. The next subsection discusses the process of model refinement and validation in greater detail.
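
The eight steps can be sketched end to end on a deliberately small example. The Python code below is a minimal illustration, assuming a well-mixed batch culture with one limiting nutrient and Monod growth kinetics; the model choice, the function names (`growth_rate`, `simulate`, `qualitative_checks`), and every parameter value are assumptions made here for illustration and are not taken from any study cited in this review.

```python
# Step 1 (goal): describe how biomass and nutrient change in a batch culture.
# Step 2 (assumptions): well-mixed vessel, one limiting nutrient, no cell death.
# Step 3 (variables): X = biomass (g/L), S = limiting nutrient (g/L).
# Step 4 (connectivity): S drives growth of X; growth of X consumes S.

# Step 5 (functional relationship): Monod kinetics, an assumed form.
def growth_rate(S, mu_max, K_s):
    """Specific growth rate (1/h) as a saturating function of nutrient S."""
    return mu_max * S / (K_s + S)


# Step 6 (parameters): illustrative values, not fitted to any experiment.
# mu_max in 1/h, K_s in g/L, Y in g biomass formed per g nutrient consumed.
PARAMS = {"mu_max": 0.5, "K_s": 0.2, "Y": 0.5}


# Step 7 (simulation): explicit Euler integration of the two balances
#   dX/dt = mu(S) * X      and      dS/dt = -(1/Y) * mu(S) * X
def simulate(X0=0.1, S0=10.0, dt=0.01, t_end=20.0, p=PARAMS):
    X, S = X0, S0
    history = [(0.0, X, S)]
    for i in range(1, int(round(t_end / dt)) + 1):
        dX = growth_rate(S, p["mu_max"], p["K_s"]) * X * dt
        X += dX
        S -= dX / p["Y"]
        history.append((i * dt, X, S))
    return history


# Step 8 (validation): qualitative checks against expected trends.
def qualitative_checks(history):
    """True if biomass never decreases and nutrient never increases."""
    X_vals = [x for _, x, _ in history]
    S_vals = [s for _, _, s in history]
    biomass_rises = all(b >= a for a, b in zip(X_vals, X_vals[1:]))
    nutrient_falls = all(b <= a for a, b in zip(S_vals, S_vals[1:]))
    return biomass_rises and nutrient_falls


history = simulate()
t_end, X_end, S_end = history[-1]
print(f"t = {t_end:.1f} h, biomass = {X_end:.2f} g/L, nutrient = {S_end:.2e} g/L")
print("qualitative trends as expected:", qualitative_checks(history))
```

In this sketch the quantity X + Y·S stays constant at every step, which provides a simple mass-balance check in addition to the trend checks. A first version like this would then be refined iteratively, for example by revisiting the assumptions of Step 2 if measured trends disagree with the simulation.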

#### Model Refinement and Validation: Synergistic Interplay between Mathematics and Biological Reality

The key to the success of Systems Biology lies in the strong interplay between experimental work, biological domain knowledge, and mathematical models. The reconnection of the mathematics back to the underlying biological reality is an essential part of Systems Biology. This iterative interplay between biological reality and mathematical modelling has to occur at “every step” in the modelling process and is illustrated in Figure 3. For example, biological domain knowledge and availability of experimental data are key to deciding on the variables to be included in the model, the degree of biological detail, and the simplifying assumptions that can be made on the connectivity of the system. Experimental data can provide input on the functional mathematical forms that can be chosen and on the values of the parameters in the model. The variables, assumptions, functional forms, etc., suggested by the experiments need to be modified in an iterative manner until the model is fully validated. The modelling process is successful when the evolutionary process of iterating between biology and mathematics leads to a continuous improvement in the level of understanding of the biology. Modelling is a continuous process where many views of a subsystem have to be eventually integrated into a “systems” view of all key aspects of reality. No model may be perfect, but each successive iteration of the model may lead to a “better” model and thus greater success in achieving the goals of the modelling process. A well-trained Systems Biology team can be characterized by the ability to catalyze such a dialectic process. An example of this interplay between modelling and experiments can be seen in the work of Agrawal et al. (2004) on modelling programmed cell death in *Arabidopsis* plant defense systems. In this work, an initial hypothesis on the connectivity of the signaling network was successively refined until the simulated results were validated by experimental observation.

The identification of the quantitative values of the parameters is a very challenging task in modelling. It is possible to develop mathematical models with very little biological insight behind them. Such models are useless unless they are “validated” or connected to the real world. This process of validation may involve both qualitative validation and quantitative validation. In qualitative validation, one has to ensure that the qualitative trends are correct. For example, an expected trend may be that the population of cells in the bioreactor is increasing with time or the level of nutrients in the bioreactor is decreasing with time. Such qualitative validation can be based on “intuition” or “biological knowledge” or “biological common sense” or “deeper biological knowledge” of the system. Quantitative validation, however, requires real numbers for each of the parameter values. It is no longer sufficient to say that something is increasing or decreasing; one needs to predict the exact numerical value of the variable in space and time. This is only possible if the mathematical functions and equations relating the variables are accurate and if the numerical values of the parameters are exactly identified. The task of parameter identification is considerably easier if one has a simple model with minimal variables and few parameters. The degree of detail in a mathematical model is extremely important and is often largely a function of the ability to experimentally validate the model. This is discussed in greater detail in the next subsection.
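
The difference between qualitative and quantitative validation can be illustrated with a small Python sketch. It assumes a logistic growth curve as the model and uses synthetic “observations” generated from known parameter values plus small fixed offsets that stand in for measurement error; none of the numbers are real data. A plain grid search over one parameter is used here as a stand-in for parameter identification, purely because it is easy to follow; practical work would normally rely on a dedicated optimization or estimation routine.

```python
import math


def logistic(t, r, K, N0):
    """Closed-form logistic growth curve."""
    return K / (1.0 + (K - N0) / N0 * math.exp(-r * t))


# Synthetic observations (not real data): generated from TRUE_R, with small
# fixed offsets standing in for measurement error.
TRUE_R, K, N0 = 0.8, 100.0, 2.0
times = [0, 1, 2, 3, 4, 5, 6, 8, 10]
offsets = [0.0, 0.1, -0.2, 0.3, -0.4, 0.5, -0.3, 0.2, -0.1]
observed = [logistic(t, TRUE_R, K, N0) + e for t, e in zip(times, offsets)]


def trend_is_increasing(r):
    """Qualitative validation: does the simulated population rise over time?"""
    values = [logistic(t, r, K, N0) for t in times]
    return all(b > a for a, b in zip(values, values[1:]))


def rmse(r):
    """Quantitative validation: root-mean-square error for a candidate r."""
    residuals = [logistic(t, r, K, N0) - y for t, y in zip(times, observed)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


# Parameter identification by grid search over r from 0.10 to 2.00.
candidates = [0.01 * i for i in range(10, 201)]
best_r = min(candidates, key=rmse)

print("every candidate passes the qualitative check:",
      all(trend_is_increasing(r) for r in candidates))
print(f"best-fitting r = {best_r:.2f}, RMSE = {rmse(best_r):.3f}")
```

Every candidate growth rate produces an increasing population, so the qualitative check alone cannot discriminate among them; only the quantitative comparison against the (here synthetic) numbers singles out a value close to the one used to generate the data. With only one unknown parameter the search is trivial, which mirrors the point above that identification becomes much harder as the number of parameters grows.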

#### Mathematical Model Complexity Versus Degree of Biological Detail: Top-Down Versus Bottom-Up Models

A major challenge for a systems biologist is: what is the right degree of complexity of the mathematical model? The answer is best given by someone with deep domain knowledge about the biological system that is being modelled and a genuine understanding of the goals of the modelling process. The choice of a simple or complex model is often dictated by the modelling goals as well as by the quantity and quality of biological information available about the system. The art of mathematical modelling requires a trade-off between mathematical simplicity and biological sophistication. The main disadvantage of simple (top-down) models is that they may not capture all the biological details of a system that are essential to describing a behaviour of interest, especially if such behaviour was not considered during their development. If a certain biological component or phenomenon is missing from the model, then one lacks the ability to understand or manipulate the missing component. There is a temptation to build detailed (bottom-up) models just to incorporate the new types of experimental data that are being generated. However, one should check whether the additional data contribute to the achievement of the modelling goals. This challenge is even greater today in the context of the explosion in experimental data resulting from the “omic” measurement technologies.

A mathematical model of a biological system that includes all the physiological descriptors of the system would consist of tens of thousands of partial differential equations that define the location and concentrations of various molecules at each point in time. Such an approach to model building would be considered a “bottom-up” approach, where one starts with all available details and then builds the most elaborate feasible model. Many purely descriptive models are of the bottom-up type (such as the E-Cell and Virtual Cell modelling frameworks developed by Tomita et al. (1999), Takahashi et al. (2003), and Loew et al. (2001)). One of the earliest examples of a bottom-up descriptive model was that of Domach–Shuler (Domach et al., 1984). More recently, Castellanos et al. (2004) have been working on making these detailed descriptions more modular. Most bioinformatics approaches are of the bottom-up type, although they rarely go beyond data analysis methodologies into the realm of mathematical modelling.

The approach of Entelos (Defranoux et al., 2005) to the modelling of diseases is, on the other hand, very different and can be called a “top-down” approach. Manuscripts may be classified as “top-down” approaches when one starts from the whole system level and drills down into detail as needed. In chemical engineering, one uses the term “lumping” to describe the process of treating a large number of variables as a single mathematical entity. One starts with the minimum level of detail and only adds detail according to the demands of the goals of the model. Variables that cannot be measured or experimentally validated may be avoided in such a top-down approach. Variables that are not relevant to the goals and expectations of the model are not included. The cybernetic modelling approach proposed originally by Dhurjati et al. (1985) is an example of a top-down approach that tries to avoid going into too many biological details by focusing on the apparent “goal” of the system. The cybernetic approach is analogous to focusing on the general of an army and trying to figure out his strategy rather than keeping track of what all the soldiers are doing. The living system is considered to have a goal that it may have acquired over the course of evolution, and this cybernetic goal is used mathematically to reduce the model complexity. The flux balance models later proposed by Majewski and Domach (1990) and Varma and Palsson (1994) have a similar concept of an optimal goal.
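
As a rough illustration of what an “optimal goal” looks like mathematically, flux balance models are commonly posed as a linear optimization problem of the following general form. The notation is a conventional one assumed here, not necessarily that of the cited papers: $S$ is the stoichiometric matrix (metabolites by reactions), $v$ the vector of reaction fluxes, $c$ a vector of weights defining the assumed cellular objective (for example, biomass formation), and $v^{\min}$, $v^{\max}$ bounds on individual fluxes.

$$
\max_{v} \; c^{\mathsf{T}} v
\quad \text{subject to} \quad
S\,v = 0,
\qquad
v^{\min} \le v \le v^{\max}
$$

The steady-state constraint $S v = 0$ encodes the network connectivity, while the objective plays the role of the assumed goal; detailed kinetic parameters for each reaction are not required in this formulation, which is what keeps the model complexity low.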

The choice of the degree of detail in a model may eventually be determined by more practical considerations such as the ability to accurately determine the numerical values of parameters and the ability to perform validation experiments. There is an optimal degree of detail dictated by the balance between being inclusive and being useful.

### CONCEPTUAL BASIS FOR CLASSIFICATION OF MODELS: WHAT ARE THE METRICS OF USEFULNESS OF MODELS IN SYSTEMS BIOLOGY?

The metrics of “utility” or “success” in the modelling process depend on one's goal. There are several reasons for building mathematical models in Systems Biology. One may build a model to organize information, to test alternative hypotheses, to provide guidance for new experiments, to aid in data analysis, to uncover regulatory networks, or to discover underlying principles, etc. Alternatively, one may want to use a model to “predict behaviour in new domains” thus alleviating the need for further expensive experiments. Or one may just try to “fit data” over a limited domain so that one can interpolate or predict behaviour in that limited domain. The goal of a model may also be to take “control action” in order to “maintain the system” within a range of behaviours or to respond to extreme or crisis situations. The success or failure of a model depends on the ability to achieve the stated goals of the modelling. If one achieves the goal, one has succeeded in the modelling endeavour. Too many models claim to predict but it is a rare gem of a model that truly has predictive ability. Models are governed by the rules of “garbage in–garbage out”. A hollow inadequate model of a system will produce misleading and garbage predictions. Simple models are as likely to be inadequate as complex sophisticated models. The ability to achieve the goals of the modelling is the ultimate metric of model quality.

Manuscripts may be classified based on the type of applications that are being considered. For example, the application areas may include: application to medicine as in drug discovery; application to agriculture as in understanding plant disease resistance or genetic modification of plants; industrial applications as in production of recombinant proteins or metabolic engineering of new pathways, small molecules, etc.; environmental application as in bioremediation; energy applications as in biofuels, etc. One can also classify the manuscripts according to the specific system that is being modelled (e.g. a regulatory circuit, organelle, type of cell such as bacteria vs. yeast, tissue type, organ, animals, or plants). Models may also be classified based on the biological phenomena that have been included in the model, such as gene induction/repression, regulation, transcription, post-transcriptional modifications, translation, post-translational modifications, folding and activity, binding of molecules, protein–RNA interactions or other molecule–molecule interactions, metabolic or signaling networks, regulatory circuitry, membrane-level phenomena, intracellular events, intercellular phenomena, biological transport, programmed cell death, network behaviour and interactions, multi-cellular phenomena, growth, differentiation, development, and toxicity response. Manuscripts can also be classified based on the type of quantitative methodology used. In the case of pure data analysis approaches (more Bioinformatics than Systems Biology), one could classify manuscripts based on whether they use clustering, ANOVA, singular value decomposition, neural nets, genetic algorithms, etc. In the case of models, one could classify them according to whether they are based on flux balance (stoichiometry, matrix algebra) or whether they use ODEs, PDEs, delay differential equations, stochastic approaches, etc.

Conceptual classification of the various models that are available in the Systems Biology literature can act as a guide in exploring the large amounts of literature in the field. In this section, we have chosen to classify manuscripts conceptually on the basis of the biological impact (Figure 4).

#### Models for Organizing Information

A major role of models is in providing an organizing framework or a language to connect the various components and describe their relationships in time and space. This can provide a “big picture” view of how the parts are connected to the whole. Mathematics can provide a much more compact representation than verbal or pictorial descriptions. For dynamic systems, mathematics may be the only way to describe complex, non-linear, spatiotemporal relationships among components (as it is not easy to describe a partial differential equation relationship in words or pictures!). However, spatial organization of the parts may sometimes be better illustrated in a picture than in an equation! Even if the model is not able to describe all the relationships in precise quantitative terms, the very process of organizing the information about the system and representing it in mathematical terms can have beneficial effects. This step might be the basis for subsequent detailed quantitative analysis and simulation. A mathematician or engineer may find it much easier to understand a set of partial differential equations rather than very detailed and qualitative biological descriptions. A peripheral goal of such modelling may thus be as a linguistic translator for “communicating information across disciplines.”

#### Models for Guidance in Design of Experiments

The goal of modelling may be to design the next set of experiments to improve the understanding of the system. In Systems Biology, an important reason to model is to provide guidance prior to doing experiments. What should one measure, when, and how many times? How much confidence can one have in the measurements? For both the modeller and the experimentalist, it is not the “amount” of data that is critical but its “nature and quality” and its relevance. If one is doing a micro-array experiment with an *Arabidopsis* plant system, should one be collecting data every few minutes, every few hours, or every few days? What are the other important molecules and when should they be sampled? Should one be measuring metabolite information or should one be focusing on data at the level of genes and proteins? Models have a role in the proper design and conception of an experiment in order to get the most out of expensive experiments. For example, a model can provide some initial and approximate ideas on the dynamic characteristics of some of the measured variables and thus guide the experimentalist on when to take data in order to capture the dynamic behaviour. The quality of experimental information also has an impact on the usefulness of the model itself. The proper design and conception of the experiment allows one to get the most out of the model. Recently, Gadkar et al. (2005) have developed techniques to determine the optimal set of experiments that will lead to the maximal identifiability of parameters in models of large-scale biological networks. Such techniques can be valuable for designing the most informative experiments to discriminate between different hypotheses.
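The idea that a rough model can indicate when to sample can be illustrated with a deliberately small sketch. It is not the method of Gadkar et al. (2005); it assumes a hypothetical measured variable that decays as x(t) = x0·exp(−k·t), a preliminary guess for k, and measurement noise of constant variance, in which case the squared sensitivity of x to k is proportional to the information a sample at time t carries about k. The function names and numerical values are illustrative assumptions.

```python
import math

def decay(t, x0, k):
    """First-order decay x(t) = x0 * exp(-k * t); a stand-in for a measured variable."""
    return x0 * math.exp(-k * t)

def sensitivity_to_k(t, x0, k, rel_step=1e-6):
    """Central finite-difference estimate of dx/dk at time t."""
    h = k * rel_step
    return (decay(t, x0, k + h) - decay(t, x0, k - h)) / (2.0 * h)

def rank_sampling_times(candidate_times, x0, k_guess):
    """Order candidate sampling times from most to least informative about k.

    The score is the squared sensitivity, which is proportional to the
    single-parameter Fisher information under constant-variance noise.
    """
    scored = [(sensitivity_to_k(t, x0, k_guess) ** 2, t) for t in candidate_times]
    scored.sort(reverse=True)
    return [t for _, t in scored]

# Hypothetical candidate times (arbitrary units) and a preliminary guess for k.
candidates = [0.5, 1.0, 2.0, 4.0, 8.0]
ranking = rank_sampling_times(candidates, x0=1.0, k_guess=0.5)
print(ranking)
```

For this model the sensitivity magnitude is t·x0·exp(−k·t), which peaks at t = 1/k, so samples taken much earlier or much later than the characteristic time say little about the rate constant. In a realistic network model the same reasoning would be applied to many parameters at once.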

#### Models to Understand Dynamics and Regulation, Discover Novel Interactions and Organizing Principles

Once one starts dealing with highly time-dependent (dynamic) phenomena or systems with a large number of regulatory feedback loops, the value of mathematical models becomes immediately apparent. One can get away without mathematical models for organizing static information, for analysis of simple data, for design of simple experiments, for testing of simple hypotheses, or for understanding simple underlying patterns. However, the key and defining characteristic of life is change. The state of a living system has to be characterized in terms of dynamically changing physiological descriptors. The responsiveness and adaptability of living systems arise from sophisticated regulatory networks at all levels, ranging from gene-level networks to protein and metabolite level networks to signaling networks. It is impossible to reason non-mathematically through the dynamic effects of the interactions among a large number of regulatory networks. Mathematical modelling and systems engineering tools are a necessity. The incorporation of regulation into the modelling framework is not new (Goodwin, 1963). Both dynamics and regulation are complicated and one cannot deal with them in the absence of a mathematical framework. In the best of circumstances, models and experiments working in concert can lead to the discovery of novel organizing principles underlying biological systems, such as the goal of “optimal resource utilization” in modelling bacterial metabolic networks (Schuetz et al., 2007).
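To make the point about feedback concrete, the following is a minimal sketch of a three-variable negative feedback loop, loosely patterned on the end-product repression models associated with Goodwin. The equations, parameter values, and function names are illustrative assumptions and are not taken from Goodwin (1963): a species x is synthesized at a rate repressed by z, x drives y, and y drives z. Even this small loop is hard to reason through verbally, yet it takes only a few lines to simulate.

```python
def feedback_loop_rhs(state, a=10.0, b=1.0, n=4):
    """Right-hand side of a three-step loop with end-product repression."""
    x, y, z = state
    dx = a / (1.0 + z ** n) - b * x  # synthesis of x is repressed by z
    dy = x - b * y                   # x drives y
    dz = y - b * z                   # y drives z, closing the loop
    return (dx, dy, dz)

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )

def simulate(f, state, dt=0.01, steps=5000):
    """Integrate from the initial state; returns the list of visited states."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(f, state, dt)
        trajectory.append(state)
    return trajectory

trajectory = simulate(feedback_loop_rhs, (0.0, 0.0, 0.0))
x_final, y_final, z_final = trajectory[-1]
print(x_final, y_final, z_final)
```

Without repression, x would settle at a/b; with the loop closed it settles at a lower level that depends on the steepness n of the repression. Changing n, the rate constants, or the number of steps in the loop changes the transient behaviour in ways that are difficult to anticipate without running the model.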

#### Models to Guide in Data Analysis

The goal of data analysis is to convert large amounts of “data” to “knowledge”. This goal is not unique to biology. In chemical engineering, there has been considerable work done in the analysis of data coming from chemical plants in order to recognize “faults” (Petti et al., 1990; Venkatsubramanian et al., 2003). This requires recognition of the data pattern or fingerprints or fault signatures generated by a specific fault. The fault fingerprint or fault signature may be obtained from domain knowledge that may be stored in the form of rules, stored as a pattern in a computer memory or as a dynamic pattern generated by a model. Many of the ideas and methodologies used in the other fields can be borrowed fruitfully to analyze biological data.

Experimental data are analyzed to obtain parameter information or to discover underlying patterns. Certain types of data analysis, such as regression, clustering, PCA, ANOVA, support vector machines, do not require “formal” mathematical models. One can regress coefficients for polynomial expressions that cannot be considered as mathematical models. However, if one is trying to detect dynamic patterns or complicated dynamic motifs, then one needs to have a mathematical model to generate and recognize these dynamic patterns (Michaud et al., 2003). Biological domain knowledge and models can substantially enhance data analysis capabilities and can lead to much better analysis and understanding of the data. Weather data, chemical plant data, satellite image data, etc., all have different underlying characteristics. It is sometimes easier to diagnose certain patterns when one has domain knowledge that may be encoded in models. The earliest motivation for quantitative approaches in Systems Biology was the need to make sense out of an ever-expanding pool of experimental data. The initial approaches in bioinformatics were a good start but as one deals with dynamics and non-linear phenomena, one starts seeing the need for knowledge-based data analysis and model-based data analysis.

#### Models for Hypothesis Generation, Discrimination, and Testing

The real essence of a mathematical model may be found in the assumptions that facilitate the translation of physical reality into a set of equations. The model may thus be considered to consist of two different parts: (1) a set of mathematical equations and (2) a set of assumptions. Too often, one equates a mathematical model with a set of equations and does not pay sufficient attention to the assumptions. However, the assumptions are just as important as the equations, both to the development of the model and its application. Each set of assumptions may lead to a different set of equations or system parameters. Each set of assumptions can thus be considered to be a hypothesis, and one can test many different hypotheses using models. Besides assumptions about what is connected to what and how, there are several other types of modelling assumptions. These include assumptions about rates, assumptions about feedback and the nature of feedback (positive, negative, more complex), assumptions about the key variables influencing the system behaviour (what variables to focus on and what behaviours to ignore), assumptions related to system parameters, assumptions related to spatiotemporal aspects, etc. Many of these assumptions can be tested mathematically and computationally to gain system insights and provide guidance for experiments. One could also do hypothesis testing via experiments, but this is more expensive than mathematical or computational hypothesis testing. Thus models can be used to prune the potential set of hypotheses so that the experiments can be focused on discriminating between a few key hypotheses.

An example of a modelling hypothesis is the assumption of a particular type of connectivity between various components (Agrawal et al., 2004). There are many possible ways to connect signaling molecules or other cellular components. One could mathematically or computationally test the consequences of each of these alternative hypotheses and see which ones are biologically plausible. Each hypothesis leads to a different model, and one can “test” each of these models to figure out which hypothesis results in behaviour that is closest to reality. Such an approach has proved to be very fruitful in the work of Agrawal et al. (2004) on the *Arabidopsis* plant defense system. One can have a systematic way of testing hypotheses, starting with a few simple hypotheses and progressively refining them with successive modelling iterations. From this point of view, modelling is the process of hypothesis building and testing. This is also the essence of the scientific method.
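The step of "testing" competing models against observations can be sketched in a few lines. The example below is a toy illustration and is unrelated to the model of Agrawal et al. (2004): two hypothetical hypotheses about how a response depends on a stimulus (proportional versus saturating) are each fitted to the same data, and the sum of squared errors (SSE) is compared. The data are synthetic, generated here from the saturating form with small fixed offsets purely for illustration; all names and values are assumptions.

```python
def fit_linear(doses, responses):
    """Hypothesis A: response = m * dose (least-squares slope through the origin)."""
    m = sum(s * y for s, y in zip(doses, responses)) / sum(s * s for s in doses)
    sse = sum((y - m * s) ** 2 for s, y in zip(doses, responses))
    return {"params": {"m": m}, "sse": sse}

def fit_saturating(doses, responses, k_grid):
    """Hypothesis B: response = vmax * dose / (K + dose).

    K is scanned over a grid; for each K the best vmax has a closed form.
    """
    best = None
    for K in k_grid:
        g = [s / (K + s) for s in doses]
        vmax = sum(gi * y for gi, y in zip(g, responses)) / sum(gi * gi for gi in g)
        sse = sum((y - vmax * gi) ** 2 for gi, y in zip(g, responses))
        if best is None or sse < best["sse"]:
            best = {"params": {"vmax": vmax, "K": K}, "sse": sse}
    return best

# Synthetic, illustrative data: a saturating response plus small fixed offsets.
doses = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
offsets = [0.02, -0.03, 0.01, -0.02, 0.03, -0.01]
responses = [2.0 * s / (2.0 + s) + e for s, e in zip(doses, offsets)]

k_grid = [0.1 * i for i in range(1, 101)]
fit_a = fit_linear(doses, responses)
fit_b = fit_saturating(doses, responses, k_grid)
print("linear SSE:", fit_a["sse"])
print("saturating SSE:", fit_b["sse"], fit_b["params"])
```

Comparing raw SSE favours the model with more adjustable parameters, so in practice one would penalize model complexity or, better, use the surviving hypotheses to propose an experiment in the region where their predictions differ most.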

#### Models as Process Substitutes and for Reengineering and Design of Systems

A model that has been validated, and in which one has sufficient confidence, can be a very valuable tool for design and reengineering. Such a model can be used as a substitute for the real system or process to ask “what if” questions and to simulate a large range of scenarios. As computational experiments are much less expensive than real experiments, using a validated model as a process substitute can have great value. The manipulation of a system in order to improve or optimize it can be very difficult and expensive if it is done in a purely empirical and Edisonian manner. Incorporation of models into the design loop can be very effective. Most advanced industries, such as the chemical, automobile, and aerospace industries, already integrate models into their design on a regular basis. One should expect the same to occur in Systems Biology once high-quality models are available. In synthetic biology, one is trying to come up with novel genetic regulatory elements that can act together as a module and exhibit simple and unique engineering functions, such as a toggle switch (Gardner et al., 2000) or an oscillator (Elowitz and Leibler, 2000), with the eventual goal of assembling these elements into modules of increasing complexity, perhaps even a synthetic cell. In genetic engineering and metabolic engineering, one is trying to manipulate the genes and metabolism in such a way as to achieve some system or process goal.
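A small example of asking “what if” questions of a model is sketched below for a toggle-switch-like circuit of two mutually repressing genes. It assumes the dimensionless two-variable form commonly associated with the genetic toggle switch, du/dt = α1/(1 + v^β) − u and dv/dt = α2/(1 + u^γ) − v; the parameter values, integrator, and function names are illustrative assumptions rather than values from Gardner et al. (2000).

```python
def toggle_rhs(u, v, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0):
    """Two mutually repressing species in dimensionless form."""
    du = alpha1 / (1.0 + v ** beta) - u   # u is repressed by v
    dv = alpha2 / (1.0 + u ** gamma) - v  # v is repressed by u
    return du, dv

def simulate_toggle(u0, v0, dt=0.01, steps=5000):
    """Forward-Euler integration; returns the final (u, v)."""
    u, v = u0, v0
    for _ in range(steps):
        du, dv = toggle_rhs(u, v)
        u, v = u + dt * du, v + dt * dv
    return u, v

# "What if" the circuit starts biased toward u, or biased toward v?
u_a, v_a = simulate_toggle(5.0, 0.1)
u_b, v_b = simulate_toggle(0.1, 5.0)
print("start biased to u ->", (u_a, v_a))
print("start biased to v ->", (u_b, v_b))
```

With these symmetric parameters the two runs settle into mirror-image states, one with u high and v low and the other the reverse, which is the memory-like behaviour that makes such a module useful as a switch. Repeating the runs with weaker promoters or lower cooperativity is a cheap way to explore which designs are likely to lose bistability before building anything.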

### SYSTEMS BIOLOGY TOOLS

The future of Systems Biology depends on the training and education of a new generation of researchers capable of doing work in a highly interdisciplinary field and also on the availability of research tools and infrastructure that allows one to bridge the disciplinary gaps. We conclude with a few comments on research tools and education.

In the last few years, an increasing number of software tools have become available for modelling biological systems. These include both academic grade and commercial grade modelling software. Examples of academic grade software include Gepasi, Systems Biology Workbench, MetaFluxNet, CellNetAnalyzer, and FBA, which are summarized in Table 1. Most of these packages allow for the simulation of a biochemical network once the pathways and the kinetic parameters are known. Some also include dynamical analysis software that is valuable for the assessment of stability and other dynamic properties. Several companies have also developed tools for modelling and analysis, including Mathworks (Natick, MA), Genomatica Inc. (San Diego, CA), Gene Network Sciences (Ithaca, NY), and Entelos (Foster City, CA), although in most cases the modelling platform is proprietary and is not publicly available. A more detailed list of tools in Systems Biology is also available at http://sbml.org/.

Table 1. List of academic and commercial software and their applications

| Software | Application | Source |
|---|---|---|
| Academic software | | |
| CellNetAnalyzer | Simulation and topological analysis of metabolic and signaling networks | www.mpi-magdeburg.mpg.de/projects/cna/cna.html |
| Biology workbench | Web-based tool for searching protein and gene sequence databases | http://workbench.sdsc.edu/ |
| GENESIS | Simulation platform for neural systems consisting of biochemical reactions and single neuron models | www.genesis-sim.org/genesis/ |
| Gepasi | Biochemical kinetic simulator, metabolic control analysis, optimization, and stability analysis | www.gepasi.org |
| MetaFluxNet | Metabolic flux analysis, flux balance analysis | mbel.kaist.ac.kr/mfn |
| Systems Biology workbench | Platform connecting applications for modelling, analysis, visualization, and data manipulation | sbw.kgi.edu |
| VCELL | Remote modelling and simulation environment | vcell.org |
| Commercial software | | |
| SimPheny™ | Models of metabolism in simple and complex organisms | Genomatica Inc. |
| SimBiology™ | Framework for developing and analyzing models of biological systems | Mathworks Inc. |
| Physiolab™ | Models of human physiology | Entelos Inc. |
| VisualCell™, VisualHeart™ | Platform for data integration and simulation, cardiac modelling | Gene Network Sciences Inc. |
| In silico discovery | Models of cellular metabolism | In silico Biotechnology GmbH |

The abundance of biological models has led to the definition of a Systems Biology Markup Language, SBML (http://sbml.org/), that serves to standardize models and enhance their portability across different software. Finally, model databases are also becoming available where researchers can post their model of a biological system for community use. Examples of these model databases include DOQCS (doqcs.ncbs.res.in) and BIOMODELS (http://www.biomodels.net/). The Alliance for Cell Signaling has made available data from a variety of studies relating to signaling networks at http://www.signaling-gateway.org/data/Data.html for the research community. Similar gateways also exist for other biological fields, such as cell migration (http://www.cellmigration.org/index.shtml) and cell signaling (stke.sciencemag.org/cm). The future of Systems Biology research will be greatly aided by the availability of tools to move seamlessly between heterogeneous databases, powerful and easy-to-use analysis tools, modelling and simulation tools, visualization capabilities, easy access to domain knowledge, new technologies for experimental measurements, etc. In addition, novel computational approaches that can represent both the deterministic nature of well-studied subsystems and the biological uncertainty in less studied subsystems will be valuable in advancing the utility of the Systems Biology approach. These activities set the stage for comprehensive models of complex biological systems to be assembled by integrating the efforts of the Systems Biology community as a whole.

An area of major potential impact for Systems Biology is medicine. Ultimately, the health of a system is defined by its physiological state. From a Systems Biology point of view, health is defined by a certain state of the physiological components and a certain state of dynamic connectivity among the components. Disease in this context is a state with the wrong components or connections. The long-standing view of medicine has been that a drug molecule can somehow take a system from a state of disease to a state of health. As Jay Bailey states (Gibbs, 2001), “one reason why drug discovery technologies have not paid off as hoped is that they are based on the naive idea that you can redirect the cell in a way that you want it go by sending in a drug that inhibits only one protein.” Systems Biology provides a novel and revolutionary perspective on health, disease, and the way to pharmaceutical discovery (Hood et al., 2004). Many of the current problems of unanticipated drug toxicity that have been plaguing the pharmaceutical industry recently are a result of missing a systems level perspective (Bugrim et al., 2004; Ekins et al., 2005).

### EDUCATION OF FUTURE SYSTEMS BIOLOGISTS

The training and education of a successful systems biologist or mathematical modeller is a very challenging task. The individual has to have sufficient multi-disciplinary knowledge and background to be able to communicate seamlessly between the disciplines of life sciences and mathematics. The systems biologist also needs to have sufficient knowledge of the biological system to understand the goals of the biologist and to capture the essential features of the system in a mathematical language. The individual has to have sufficient mastery of mathematical, statistical, and engineering tools to select the appropriate quantitative tools and methodologies. The ability to translate biology into mathematics without losing key information in the translation process is important. The ability to interpret the mathematical models properly in order to generate biological insights is also critical. Systems biologists should be able to communicate to life scientists what the right level of expectations from the mathematical analysis ought to be. Excessive expectations of the mathematical models eventually result in unnecessary hype, or in cynicism and distrust from the life scientists. On the other hand, excessive distrust of the models can lead to under-utilization of potentially powerful and valuable tools for furthering the frontiers of biology. A successful systems biologist should therefore attempt to balance these conflicting objectives of expectation and potential, and should also have the ability to understand the biological questions and select the right quantitative tools in order to answer these questions effectively.

There are three ways in which one could train systems biologists. First, one can take individuals who are well trained in mathematics and computational sciences and provide them with the biological domain knowledge they need to be effective. Second, one can take individuals with expertise in the biological domain and train them in mathematics and computation. The first two approaches are relevant in corporations or institutions where senior people are attempting to enter the field of Systems Biology. The third and most effective approach to the training of systems biologists is to start at the undergraduate level and train a whole new generation of students who are equally adept at mathematics and biology. In fields such as chemical engineering, it has been common to have students who are comfortable with both mathematics and chemistry. In Systems Biology, one needs to come up with a new class of students who are in command of both mathematics and biology and who can bring about new synergies between the two previously disparate fields.