Neil Dalchau was awarded the 2011 New Phytologist Tansley Medal for excellence in plant science. The medal is in recognition of Neil's outstanding contribution to research in plant science, at an early stage in his career, as presented in this article; see the Editorial by Dolan, 193: 821–822.
The use of mathematical modelling in understanding and dissecting physiological mechanisms in plants has seen many successes. Notably, studies of the component interactions of the Arabidopsis circadian clock have yielded multiple insights into the roles of specific regulators at the transcriptional and post-transcriptional level. In this article, I review the use of mathematical techniques in dissecting the Arabidopsis clock mechanism, covering first the well-established use of mechanistic models implemented as systems of nonlinear ordinary differential equations. In situations where mechanistic models are not appropriate, I describe how linear time-invariant (LTI) systems, a type of black-box model, can offer quantitative descriptions of biological systems that provide a systems-level understanding without detailed descriptions of the underlying mechanism. A comparison of the two approaches is provided to exemplify when LTI systems modelling might offer advantages for interpreting biological measurements. In particular, formal analysis of large datasets with LTI systems can offer genome-scale inferences, which is of timely relevance as novel experimental techniques are generating increasingly large quantities of data.
In physics, the equations of motion are thought of and referred to as the ‘governing equations’, and can be solved to compute trajectories of objects in space and time when subjected to various forces. The closest we have come to governing equations in biology are systems of ordinary differential equations (ODEs) which arise when applying mass action kinetics to biochemical equations. For example, the widely known Michaelis–Menten and Hill functions arise when describing the kinetics of enzymes (Hill, 1910; Murray, 2005). Models of gene regulatory networks described as systems of nonlinear ODEs where transcription factor regulation is described with Hill functions are also ubiquitous in biology (Karlebach & Shamir, 2008). Nonlinear ODE systems have been used to model a large number of mechanisms in plants, including the Arabidopsis circadian clock (Locke et al., 2005a,b, 2006; Zeilinger et al., 2006; Pokhilko et al., 2010), carbohydrate metabolism (Nägele et al., 2010) and photosynthesis (Poolman et al., 2000).
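As a minimal sketch of the Hill-type terms mentioned above, the functions below show how an activating or repressing transcription factor might enter the rate equation for a transcript. The function names, argument names and parameter values are illustrative assumptions and are not taken from any published clock model; Michaelis–Menten kinetics corresponds to the special case n = 1.

```python
def hill_activation(x, k, n):
    """Fraction of maximal activation at regulator level x.

    k is the level giving half-maximal activation, n is the Hill coefficient.
    """
    return x**n / (k**n + x**n)


def hill_repression(x, k, n):
    """Fraction of maximal expression remaining under a repressor at level x."""
    return k**n / (k**n + x**n)


def transcript_rate(m, tf, v_max, k, n, d):
    """dm/dt for an mRNA m repressed by a transcription factor tf.

    Production is Hill-repressed; degradation is first order with rate d.
    """
    return v_max * hill_repression(tf, k, n) - d * m


# At x = k the activation term is half-maximal, whatever the value of n.
print(hill_activation(2.0, 2.0, 3))
```

Each regulated gene in a network contributes one such equation, which is why the number of kinetic parameters (here v_max, k, n and d) grows quickly with the number of components.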
Modelling biological mechanisms with nonlinear systems enables a detailed description and analysis of the underlying biochemical interactions. For investigating the role of specific regulators (e.g. transcription factors, proteases, F-box proteins, etc.), this approach is particularly appropriate, as it is straightforward to simulate the effect of mutations and compare the dynamics with experimental measurements. However, this comes at a cost. The number of model variables and kinetic parameters can quickly grow as more details are represented within the model. Often the kinetic rate parameters are not known, and when they are, the uncertainty associated with those values may be high, as they may have been measured in a context different from the one of interest (for example, at a different temperature or pH, with different reagents, or in vitro rather than in vivo). Taking these considerations into account, the level of abstraction for a model of a biological system should be largely determined by the level of detailed biochemical understanding a priori, and also the quantity and relevance of experimental measurements available for directing model construction.
When the level of biochemical understanding is limited, and few experimental observations exist for constituent components, mechanistic models may be of limited utility, as it is difficult to know the precise functional forms and the underlying topology of the regulatory network. In such cases, it can be more appropriate to describe the dynamics of biological processes with black-box models, which seek to describe the measured output signals, potentially driven by variable input signals. In the following sections, I describe first the use of mechanistic models in dissecting the Arabidopsis circadian clock network, then progress onto an example of using black-box models to understand high-level properties of circadian networks, discussing along the way the breakthroughs that have arisen, and the relative advantages/disadvantages of each approach (see Table 1 for a summary).
Table 1. A comparison of mechanistic and black-box models
Intuitive. Mechanisms are explicitly represented by equations or rules.
Very limited. Internal variables are hidden and generic.
Variable. Depending on the level of mechanistic detail desired. Model reduction techniques enable good efficiency.
Difficult and idiosyncratic. Many algorithms exist, but often unreliable.
For linear time-invariant (LTI) systems, parameter identification is efficient and mostly reliable. Nonlinear black-box models suffer from difficulties analogous to those of nonlinear mechanistic models.
For ordinary differential equation (ODE) models, stability and bifurcation analysis tools exist. Frequency response analysis not straightforward. Many analyses rely on linearisation.
For LTI systems, frequency response analysis is straightforward and very informative.
The transcriptional feedback loops in the Arabidopsis circadian oscillator network
Circadian clocks have evolved to enable organisms to adapt to the daily rotation of the planet. Anticipation of daily cycles in light availability and temperature is conferred to cells by intricate biochemical networks of feedback loops that generate oscillations with a near 24 h period. In Arabidopsis, the first feedback loop identified was the transcriptional inhibition of TIMING OF CAB EXPRESSION 1 (TOC1) by dimers of CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) and LATE ELONGATED HYPOCOTYL (LHY) that bind to conserved promoter motifs known as evening elements (EE) (Alabadí et al., 2001). Since then, a combination of reverse genetics and mathematical modelling has been instrumental in identifying and characterising an increasing number of components.
A series of mechanistic mathematical models from Professor A. Millar's laboratory (University of Edinburgh) has consistently contributed new theories and predictions regarding the relationships between central clock genes and light inputs (Locke et al., 2005a,b, 2006; Pokhilko et al., 2010; see Fig. 1 for a comparison). The first model illustrated the fundamental deficiencies in single-loop views of circadian clocks, which cannot account for robust timekeeping in plants carrying monogenic mutations in CCA1, LHY or TOC1 (Locke et al., 2005a). Nevertheless, this article introduced a general modelling framework that has been the foundation of subsequent models.
The major drawback of describing gene networks with equations in this way is the large number of unknown kinetic rate parameters upon which the resulting simulations will critically depend. To solve this problem, Locke and colleagues used parameter optimisation techniques to infer parameter values from the behaviours that the model sought to reproduce (Locke et al., 2005a). These techniques rely upon defining a measure of the distance of the model simulation from the experimental observations. For example, the period of oscillation in constant light (LL) or constant dark (DD) conditions and the phase relationship between the expression of TOC1 mRNA and CCA1/LHY mRNA were compared with experimentally determined values and a high cost assigned when the simulation deviated from the measured periods/phases. The optimisation procedure chosen was based on simulated annealing, which seeks to randomly traverse the parameter space in a step-wise manner, homing in on parameter values that minimise the deviation between the model simulated behaviour and the target experimental characteristics. An advantage of performing these global searches over the parameter space is that inconsistencies between model and data can be attributed to inaccuracies in the genetic network, as opposed to inaccuracies in the parameter values. A disadvantage is that it is difficult to know in advance whether a sufficient number of experimental observations have been obtained to reliably constrain the parameter values.
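The cost-function and simulated-annealing idea can be sketched as follows. This is not the procedure or model of Locke et al. (2005a): the two-parameter "clock" is a toy stand-in whose periods are given by a closed-form expression, the target periods are arbitrary illustrative numbers rather than measurements, and the function names, step sizes and cooling schedule are all assumptions chosen for brevity.

```python
import math
import random

# Illustrative targets only (hypothetical values, not experimental data).
TARGET_PERIOD_LL = 24.0  # hours, constant light
TARGET_PERIOD_DD = 26.0  # hours, constant dark


def simulated_periods(k, g):
    """Toy stand-in for a clock simulation.

    The oscillation has angular frequency sqrt(k) in the dark and
    sqrt(k * (1 + g)) in the light, where g is a hypothetical light gain.
    """
    period_dd = 2.0 * math.pi / math.sqrt(k)
    period_ll = 2.0 * math.pi / math.sqrt(k * (1.0 + g))
    return period_ll, period_dd


def cost(params):
    """Sum of squared relative deviations from the target periods."""
    k, g = params
    if k <= 0.0 or g <= -1.0:
        return math.inf  # outside the region where the toy model oscillates
    period_ll, period_dd = simulated_periods(k, g)
    return (((period_ll - TARGET_PERIOD_LL) / TARGET_PERIOD_LL) ** 2
            + ((period_dd - TARGET_PERIOD_DD) / TARGET_PERIOD_DD) ** 2)


def anneal(start, steps=20000, t_start=0.1, cooling=0.9995, seed=0):
    """Simulated annealing over (k, g); returns the best parameters found."""
    rng = random.Random(seed)
    current, current_cost = list(start), cost(start)
    best, best_cost = list(current), current_cost
    temperature = t_start
    for _ in range(steps):
        # Random step: multiplicative for k, additive for g.
        candidate = [current[0] * math.exp(rng.gauss(0.0, 0.1)),
                     current[1] + rng.gauss(0.0, 0.05)]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature is lowered.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling
    return best, best_cost


best, best_cost = anneal([0.2, 0.0])
print(best, best_cost, simulated_periods(*best))
```

In a realistic setting, each call to the cost function would require a full numerical simulation of the ODE model under each experimental condition, which is where most of the computational expense lies.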
Nevertheless, an impressive discovery was made whilst attempting to extend the single-loop model. As it was not clear which genes were fundamental to the operation of the clock, two hypothetical transcriptional regulators (named X and Y) were incorporated into an improved model (Locke et al., 2005b; Fig. 1b). Comparison of the simulated expression profiles for Y with experimental data measuring circadian-regulated genes revealed that GIGANTEA (GI) might confer the action of Y in the clock (Locke et al., 2005b). Subsequent experimentation has shown that GI could comprise up to 70% of the behaviour of Y, implicating GI as a central oscillator component (Locke et al., 2006; Martin-Tryon et al., 2007).
The most recent model of the Arabidopsis clock (P2010; Fig. 1d) aimed to incorporate additional components and experimental observations in order to provide a more complete mechanistic description of circadian timing (Pokhilko et al., 2010). In particular, the F-box protein ZEITLUPE (ZTL), which targets TOC1 protein for degradation (Más et al., 2003), was explicitly represented, including its stabilisation by GI and light (Kim et al., 2007). A further hypothetical component, the night inhibitor (NI), was included to enable tighter phase control of morning LHY/CCA1 expression. It was suggested that PSEUDO RESPONSE REGULATOR 5 (PRR5) is a candidate for NI, acting downstream of PRR7/9.
Owing to the success of identifying GI and the ability to incorporate new components regularly with extensive comparisons with experimental data, the plant clock community has largely accepted the importance of this work, and the models produced have laid the foundations for understanding this complex network more completely. The P2010 model is published alongside all of the data used to guide its construction (http://millar.bio.ed.ac.uk), enabling other research groups to extend the model with additional components, whilst maintaining a common footing with the existing published models. This offers scientists the opportunity to interpret their own perturbation experiments in a quantitative framework, the results of which are not intuitive because of the interconnected topology of the network.
Exploring the effect of external inputs on the Arabidopsis clock using mathematical modelling: sucrose modulation of the oscillator
Mathematical models offer a fast way to interpret surprising phenotypes. I was investigating the effect of exogenous sucrose supply on the circadian network in order to understand why oscillations of cytosolic-free Ca2+ ([Ca2+]cyt) are abolished when plants are grown on agar media containing sucrose (Johnson et al., 1995). To determine whether this effect was mediated by the central oscillator, oscillations of CCA1, TOC1 and GI promoter activity were measured in LL and DD conditions (Dalchau et al., 2011). The oscillations were weak or absent in DD unless sucrose was supplied in the media, despite there being minor or no effects in LL (Knight et al., 2008).
The models of the Arabidopsis clock available at the time were all derived through comparing the simulated behaviours with experimental measurements of seedlings grown in the presence of exogenous sucrose. Therefore, if any of the processes represented in the model are sucrose-dependent, then their simulated behaviour may not correlate well with measured behaviours of plants grown in the absence of exogenous sucrose. I supposed that the exogenous supply of sucrose would not re-wire the interconnections of the gene regulatory network, rather that it might simply perturb (accelerate or decelerate) some of these processes. Accordingly, the kinetic rate constants might be different in non-sucrose-buffered plants. To test this hypothesis, simulated parametric perturbations to an Arabidopsis clock model (Locke et al., 2006; Fig. 1c) were compared with experimental measurements to see whether the effect of removing sucrose could be reproduced (Fig. 2). Several parameters could be modified which abolished DD oscillations with no effect on LL oscillations. However, closer examination revealed that only one parameter change produced a single oscillation following the transition from light–dark (LD) cycles to DD, as observed experimentally. The simulations therefore predicted that this parameter, the rate of Y/GI transcription, is sucrose-dependent. Experimental evidence for this prediction was provided by showing that gi-11 null mutant plants were arrhythmic in DD when supplied with sucrose, indicating GI is required for the circadian clock network to report metabolic status.
The use of mathematical modelling as an experimental tool was vital for identifying the role of GI as part of a sucrose-sensing network. The ability to predict dynamics resulting from a wide range of perturbations reduced the need to perform large-scale experimental screens, which are costly, time-consuming and often lacking in temporal resolution. Furthermore, as many transcripts respond transiently to sucrose treatment, generating hypotheses of how sucrose signals permeate through the circadian network is not trivial.
The significance of sucrose sensing by the circadian oscillator in Arabidopsis is still not clear, though the reverse interaction, circadian regulation of metabolism, results in increased biomass and photosynthesis (Dodd et al., 2005; Ni et al., 2009). It is possible that interactions in both directions are necessary for these physiological benefits. Feedback loops are a natural mechanistic choice for improving performance and robustness in engineered systems (e.g. industrial process control) and have been frequently associated with improved performance, flexibility and robustness in circadian networks (Stelling et al., 2004; Rand et al., 2006).
Describing biological phenomena with black-box models
While the approach of using detailed mechanistic models has proved beneficial for understanding the transcriptional loops of the Arabidopsis circadian clock and its regulation by metabolic sugars, such models are not appropriate in black-box modelling scenarios, where unknown mechanisms connect regulatory input signals to observable output signals. In such a situation, a more generic model may offer a reasonable approximation of the underlying mechanism, and help to interpret properties of the system in a quantitative way without needing to specify mechanistic details. Employing a rigorous formalism tailored to such black-box analyses can enable automatic model construction, simply from measurements of the input and output signals. In this way, a simple model can be quickly generated that quantifies the major contributions of each input signal to the observed output. The major drawback is that the internal variables of black-box models do not, in general, carry a physical interpretation (see Table 1 for a comparison of the advantages/disadvantages of mechanistic models and black-box models). Examples of black-box modelling formalisms include neural networks, cubic (polynomial) splines and statistical models such as generalised linear models or Gaussian process models (for technical definitions, see Davison, 2003).
We and others have proposed the use of linear time-invariant (LTI) models for approximating biological mechanisms (Mettetal et al., 2008; Dalchau et al., 2010). LTI models are systems of ODEs in which the dependencies of model variables are linear. Therefore, LTI models are a form of linear black-box model, which carry some distinctions from nonlinear black-box models (Table 1). Biological systems are inherently nonlinear, so LTI models can only ever be an approximation of the underlying mechanisms. However, all models, whether linear or nonlinear, dynamic or static, deterministic or stochastic, are only ever approximations. The justification for any model derives from its ability to predict previously unobserved behaviours. In the absence of detailed mechanistic knowledge, it is not possible to write an appropriate nonlinear model, although it is possible to write down a first-order approximation in a general way. LTI models can be interpreted as such an approximation. What remains is to estimate the parameters of this approximation. Estimation and analysis of LTI models are far easier than for nonlinear models, meaning that a great deal more can be learned, if a good approximation of the real system can be obtained.
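The sense in which an LTI model is a first-order approximation can be sketched with a standard Taylor expansion. The symbols f, g, x* and u* are notation introduced here for illustration only: f and g stand for the unknown nonlinear dynamics and readout, and (x*, u*) for an operating point at which the system is at rest.

```latex
\begin{aligned}
\frac{dx}{dt} &= f(x,u), \qquad y = g(x), \qquad f(x^*,u^*) = 0,\\[4pt]
\frac{d\,\delta x}{dt} &\approx
  \underbrace{\left.\frac{\partial f}{\partial x}\right|_{(x^*,u^*)}}_{A}\,\delta x
  + \underbrace{\left.\frac{\partial f}{\partial u}\right|_{(x^*,u^*)}}_{B}\,\delta u,\\[4pt]
\delta y &\approx
  \underbrace{\left.\frac{\partial g}{\partial x}\right|_{x^*}}_{C}\,\delta x,
\end{aligned}
\qquad
\delta x = x - x^*,\quad \delta u = u - u^*,\quad \delta y = y - g(x^*).
```

The matrices A, B and C are constant because the derivatives are evaluated at a fixed point, which is what makes the resulting model linear and time-invariant. In the black-box setting f and g are unknown, so A, B and C cannot be computed from derivatives and are instead estimated directly from input and output measurements.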
The general theory of systems identification (Ljung, 1999) enables the construction of several black-box model classes, including LTI models, which map input signals to output signals in the form of observation data (Fig. 3). Systems identification originates in industrial engineering, and has been used to represent dynamical behaviour of measurable outputs to applied stimuli (inputs) that are typically measurable variables or Boolean (on/off) signals. A common choice is to employ the prediction-error method to obtain estimates of the system parameters (Box 1). A suite of algorithms, including the prediction-error method, is available in the System Identification Toolbox for MATLAB.
Box 1. Linear time-invariant (LTI) models
An LTI model can be written in compact notation as
$$\frac{dx}{dt} = Ax + Bu, \qquad y = Cx$$
where y is the system output, u is the system input, x is the internal state and A, B, C contain the rate parameters which define the interactions between inputs, outputs and internal states. The internal states are latent variables: they are not measured, but they approximate the dynamics of the underlying mechanism. Varying the number of latent variables gives models of varying complexity, so that increasingly complex dynamics can be described.
The prediction-error method is commonly used to approximate the system parameters in A, B and C. The method is characterised by taking the observations of the input and output signals at a specific time, $u(t_k)$ and $y(t_k)$, and obtaining a model estimate for the output at the next time, $\eta(t_{k+1})$. By calculating the difference between the corresponding measurement and the model estimate at that time, $y(t_{k+1}) - \eta(t_{k+1})$, we obtain the prediction-error at that time-point. Squaring the prediction-errors at each time and summing these gives a measure of fitness for a parameterised model, compared with the whole dataset. The goal is then to adjust the parameters to the values that minimise the total prediction-error, which can be achieved very efficiently with existing techniques (Ljung, 1999).
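The one-step-ahead idea in Box 1 can be sketched on a deliberately simple case. The example below assumes a discrete-time, single-state model, y[k+1] = a·y[k] + b·u[k], as a stand-in for the continuous-time state-space form; for this model the sum of squared prediction errors is minimised by ordinary least squares. The data are synthetic, the "true" parameter values are arbitrary, and the function names are hypothetical; this is not the algorithm implemented in the MATLAB toolbox.

```python
import numpy as np


def simulate(a, b, u, noise_sd=0.0, seed=0):
    """Generate synthetic output from y[k+1] = a*y[k] + b*u[k] + noise."""
    rng = np.random.default_rng(seed)
    y = np.zeros(len(u))
    for k in range(len(u) - 1):
        y[k + 1] = a * y[k] + b * u[k] + noise_sd * rng.normal()
    return y


def fit_one_step_predictor(u, y):
    """Estimate (a, b) by minimising the summed squared prediction error."""
    regressors = np.column_stack([y[:-1], u[:-1]])  # known at time t_k
    targets = y[1:]                                  # measured at time t_{k+1}
    (a_hat, b_hat), *_ = np.linalg.lstsq(regressors, targets, rcond=None)
    predictions = regressors @ np.array([a_hat, b_hat])
    total_squared_error = float(np.sum((targets - predictions) ** 2))
    return a_hat, b_hat, total_squared_error


# Synthetic experiment: hourly samples over 10 days of 12 h light : 12 h dark.
hours = np.arange(240)
light = ((hours % 24) < 12).astype(float)
measured = simulate(a=0.9, b=0.5, u=light, noise_sd=0.01)

a_hat, b_hat, sse = fit_one_step_predictor(light, measured)
print(a_hat, b_hat, sse)
```

Because the prediction is linear in the parameters here, the minimisation has a closed-form solution. Models with several latent states generally require iterative optimisation, but the objective being minimised has the same structure.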
A drawback of using systems identification methods is that the modeller is constrained to fitting experimental data-points, when qualitative features might be more desirable. For example, if the measurements of the input process are oscillatory, then unless the oscillation period of the output process is the same, it may not be possible to obtain a good model. This is because the simulated outputs will always take on the period of the inputs, preventing synchrony with the measured output. Consequently, it is important to carefully select appropriate measurements for model parameter estimation.
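The period-locking property described above can be checked numerically. The sketch below drives a discrete-time, single-state LTI model (an assumed simplification, with arbitrary parameter values) with a 24 h sinusoid and, once the transient has decayed, finds the lag at which the output best repeats itself. The function names are hypothetical.

```python
import math


def simulate_lti(a, b, inputs, y0=0.0):
    """Discrete-time first-order LTI model: y[k+1] = a*y[k] + b*u[k]."""
    outputs = [y0]
    for u in inputs[:-1]:
        outputs.append(a * outputs[-1] + b * u)
    return outputs


def best_repeat_lag(signal, candidate_lags):
    """Lag (in samples) at which the signal most closely repeats itself."""
    def mean_squared_mismatch(lag):
        pairs = zip(signal[lag:], signal[:-lag])
        total = sum((later - earlier) ** 2 for later, earlier in pairs)
        return total / (len(signal) - lag)
    return min(candidate_lags, key=mean_squared_mismatch)


# Hourly samples over 40 days of a 24 h sinusoidal input.
hours = range(24 * 40)
driver = [math.sin(2.0 * math.pi * h / 24.0) for h in hours]

locked_periods = []
for a, b in [(0.5, 1.0), (0.8, 0.3), (0.95, 2.0)]:
    output = simulate_lti(a, b, driver)
    steady = output[24 * 20:]  # discard the first 20 days as transient
    locked_periods.append(best_repeat_lag(steady, range(20, 29)))

print(locked_periods)
```

Whatever stable parameter values are chosen, the output settles to the 24 h period of the input; only its amplitude and phase change. A measured output oscillating with, say, a 26 h period could therefore not be matched by adjusting the parameters of such a model.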
Analysing seasonal adaptation with LTI models
Many circadian-regulated processes change their time of peak activity relative to dawn between different photoperiods, as daylength undergoes seasonal variations (Michael et al., 2008). Several theories have been proposed that seek to explain how photoperiodic time measurement occurs, each combining specific roles for circadian and light signals in adapting the output rhythm. The ‘external coincidence’ hypothesis suggests light inputs entrain the clock but also act on the output rhythm to adjust timing (Bünning, 1936), while an ‘internal coincidence’ hypothesis suggests that light need only entrain the clock because multiple oscillator components can coordinate seasonal adaptation internally (Pittendrigh & Minis, 1964).
To determine how seasonal adaptation depends on the combination of light signalling pathways and signalling from circadian oscillators, my colleagues and I constructed and analysed LTI models of a range of biological processes (Dalchau et al., 2010). We began with a model of the circadian and light regulation of [Ca2+]cyt to demonstrate the applicability of LTI models in describing biological processes (Fig. 3). Light modulation, distinct from that mediated by the clock, was required to reproduce the change in oscillation phase between days with long (16 h light : 8 h dark) and short (8 h light : 16 h dark) photoperiods. To determine if a similar conclusion could be established at the transcriptome scale, 3503 LTI models were inferred from microarray datasets measuring transcript abundance in a variety of photoperiodic regimes over 48 h. Each model was permitted to incorporate two input signals, one from CCA1, and one from light.
Following model estimation, a cross-validation procedure was used to assess the predictive capability of each model by testing the fit to previously unseen datasets. Approximately half of the models achieved a correlation that exceeded 0.75. These highly predictive models were then used to analyse the contributions to the model dynamics of each input signal, using frequency response analysis. Transcripts whose peak activity changed oscillation phase between different photoperiods were associated with significant regulation from light signalling pathways, while those whose peak activity was independent of the photoperiod were predicted to be regulated only by the circadian clock. This finding is consistent with the external coincidence hypothesis proposed by Bünning (1936), which links seasonal variation in the timing of circadian oscillations to external light inputs.
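For an LTI model in state-space form, the frequency response to each input follows from the transfer function H(iω) = C(iωI − A)⁻¹B, evaluated here at the 24 h period. The sketch below is not the analysis code of Dalchau et al. (2010): the two-state, two-input model and its parameter values are invented so that the answer can be checked by hand, and the function names are hypothetical.

```python
import numpy as np


def frequency_response(A, B, C, omega):
    """Transfer function H(i*omega) = C (i*omega*I - A)^(-1) B.

    Returns a complex array of shape (n_outputs, n_inputs).
    """
    n_states = A.shape[0]
    return C @ np.linalg.solve(1j * omega * np.eye(n_states) - A, B)


def gain_and_lag_hours(A, B, C, period_hours=24.0):
    """Gain and time lag (in hours) of the output relative to each input."""
    omega = 2.0 * np.pi / period_hours
    H = frequency_response(A, B, C, omega)
    gain = np.abs(H)
    lag = -np.angle(H) / omega  # positive values: output lags the input
    return gain, lag


# Hypothetical model: each input drives its own latent state, and the
# output is the sum of the two states.
omega = 2.0 * np.pi / 24.0
A = np.array([[-omega, 0.0],
              [0.0, -np.sqrt(3.0) * omega]])
B = np.array([[0.4, 0.0],   # column 0: clock input
              [0.0, 1.2]])  # column 1: light input
C = np.array([[1.0, 1.0]])

gain, lag = gain_and_lag_hours(A, B, C)
print("gain per input:", gain[0])
print("lag per input (h):", lag[0])
```

With these values the output lags the clock input by 3 h and the light input by 2 h, since a first-order pathway with decay rate a has lag arctan(ω/a)/ω. Comparing the gains of the two pathways at the 24 h period is one simple way to express the relative contribution of each input to the output rhythm.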
Mathematical modelling has demonstrably enhanced our understanding of the plant circadian clock and its interactions with physiology. In the examples presented, I showed how mechanistic models have proven useful for investigating the transcriptional network underlying the Arabidopsis circadian clock. Models of increasing complexity have formalised the increasing understanding of the clock, consistently incorporating and predicting novel components (Locke et al., 2005b, 2006; Pokhilko et al., 2010). A quantitative representation of the Arabidopsis circadian clock (Locke et al., 2006) enabled us to predict how sucrose modulates clock function (Fig. 1; Dalchau et al., 2011), illustrating how models can be used not only for specific understanding of the components they represent, but for interpreting the role of previously unconsidered regulators. In any complex biological system, the number of components and interactions is large, so intelligent hypothesis generation helps to direct experimentation. This is a key contribution that mathematical modelling offers to biological research.
In contrast to mechanistic models, I have introduced the use of black-box models for describing and analysing biological systems. LTI models are black-box dynamical systems that can sometimes approximate complex behaviour observed in biological systems, and can be efficiently identified from experimental measurements using systems identification techniques. In situations where few pathway components have been identified and dynamical behaviour appears to rely on multiple input pathways, LTI models can be used to quantify the contribution of each input (Fig. 3). The efficiency and reliability of linear systems identification techniques facilitate rapid model construction, enabling genome-scale analyses that incorporate non-equilibrium dynamics. In this way, we provided evidence for external coincidence in the regulation of circadian outputs by the central clock and external light inputs, as proposed by Bünning some 75 yr ago (Dalchau et al., 2010). Linear systems identification therefore enables the interpretation of large biological datasets with dynamical models, offering insights beyond what is possible with purely statistical comparisons (Dalchau et al., 2010; Honkela et al., 2010). In the increasingly data-rich climate of biological research, LTI modelling offers a solution for assigning quantitative traits to biological mechanisms. Therefore, LTI models could be seen with more regularity in the coming years.
I am extremely grateful to my PhD supervisor Alex Webb for guidance and motivation during my PhD studies, and advice on this manuscript. I am grateful to the BBSRC for funding the work discussed in this review. Finally, I am grateful to the reviewers for technical advice on different modelling formalisms.