[ Steven Niederer (left) received an Engineering Science Degree at the University of Auckland in 2003. He completed his DPhil on ‘Modelling the rat left ventricle’ in the Computing Laboratory at the University of Oxford in 2008. In 2009 he was awarded a UK Engineering and Physical Sciences Research Council Life Science Interface Fellowship to develop mathematical models of heart failure. This work led to a position at the Biomedical Engineering Department at King's College London where he works on the translation of cardiac modelling techniques into clinical applications. Nic Smith (right) has formerly held academic positions at the Universities of Oxford and Auckland and is currently Professor and Head of Biomedical Engineering at King's College London. His research is characterised by the development of integrated multi-scale and multi-physics models of the heart, which provide the ability to link biophysically detailed experimental data to integrated function from sub-cellular to the whole organ level. He has particular interests in cardiac excitation–contraction coupling and fluid structure interaction in the coronary system.]
Abstract The link between experimental data and biophysically based mathematical models is key to computational simulation meeting its potential to provide physiological insight. However, despite the importance of this link, scrutiny and analysis of the processes by which models are parameterised from data are currently lacking. While this situation is common to many areas of physiological modelling, to provide a concrete context, we use examples drawn from detailed models of cardiac electro-mechanics. Using this biophysically detailed cohort of models we highlight the specific issues of model parameterization and propose this process can be separated into three stages: observation, fitting and validation. Finally, future research challenges and directions in this area are discussed.
Physiology is the study of both the constitutive parts and the integrated function of biological systems. However, the advent of ‘-omic’ science has increasingly resulted in the field being reposed within reductionist frameworks (Noble, 2008). Despite the aim of many of these emerging fields to consider all the parts of physiological systems collectively (Evans, 2000) research endeavours have consistently focused on quantitating the constituents. This has led to less attention being paid to understanding the functional mechanisms and consequences underpinning these measurements and observations. At the same time the cost of data storage is decreasing while tools and techniques for rapidly and robustly obtaining large volumes of biological data continue to develop. This improved ability to create and store data means that the underlying physiology of many systems risks becoming exceptionally well described but still relatively poorly understood.
Exploiting this growth in data to increase our understanding of physiological function is now necessitating the linking of this, often fractionated, information over multiple spatial and temporal scales, species and populations. Conventional intuition-driven analysis rapidly becomes limiting in these circumstances. Overcoming these limitations is increasingly motivating, and in some cases even requiring, the use of computational and mathematical models to integrate data within biophysically constrained physiological frameworks (Davidson et al. 1995; Hernandez & Kambhampati, 2004). The potential application of this type of computational approach is widespread and has been previously recognised in multiple reviews. A central theme in the exemplar cases provided in much of this literature is the study of complex systems where experimental data alone have failed to provide functional insight. In these systems, where sufficient experimental data are available to characterise the underlying complexity, computational techniques have been used to formalise and represent the underlying system biophysics. Using these quantitative frameworks multiple, often disparate, data sets have been linked together to provide insight not available through observation alone.
Specific examples of the successful application of this approach span multiple spatial scales. In genetics, despite comprehensive genome sequences now being available for multiple species, many aspects of gene function and regulation remain elusive (Benfey & Mitchell-Olds, 2008). Directly addressing this issue, computational models have begun to expose and unravel some of the intrinsic complexity of gene regulation and the link between genotypes and phenotypes (Collins et al. 2003). The elements of non-linear feedback mechanisms, multiple agents and significant quantities of data, common in genetics, are also present in the simulation and computational analysis of cellular physiology and function in systems biology (Kitano, 2002; Butcher et al. 2004). These models at the cellular scale provide the additional ability to simulate interactions between proteins and molecules, linking cellular patho-physiology with pharmacology and drug treatments (van de Waterbeemd & Gifford, 2003; Deisboeck et al. 2009). Cellular scale models also enable the integration of information at the increasing spatial scales of the whole organ and body. Measurements at the scale of the organ often represent emergent behaviour resulting from the integrated behaviour of the system's constituent components. Applying models to these data provides an essential mechanism for linking organ scale and clinically relevant measurements with underlying cellular and subcellular function (Lee et al. 2009).
Despite the potential of this approach to derive function from observations, linking models to data remains a significant challenge. While this issue is generic, to provide a specific context with a set of concrete, and we hope relevant, examples from our own research community, we will focus in the remainder of the article below on the challenges faced in linking coupled electrophysiological and mechanical cardiac models (Fig. 1) to experimental and clinical data. Although the examples provided below are taken from this specific research area, the breadth of spatial and temporal scales and applications means these examples are generally transferable to other computational physiology modelling systems.
Models of cardiac electromechanics have been applied at the genetic (Campbell & McCulloch, 2011), cellular (Niederer & Smith, 2007; Campbell et al. 2008), tissue (Niederer & Smith, 2008; Land et al. 2011; Nordsletten et al. 2011) and organ scales (Nickerson et al. 2005; Campbell et al. 2009; Niederer & Smith, 2009; Gurev et al. 2010, 2011; Kerckhoffs et al. 2010). Most recently, significant technical advances have led to a renewed focus on translation of this approach to the clinic (Rudy et al. 2008) with computational models being used to simulate human cardiac function (Keldermann et al. 2010) and specific patient cases (Aguado-Sierra et al. 2011; Niederer et al. 2011a,b). A selection of these models is presented in Fig. 2. This transition towards clinical applications in electromechanics models parallels advances in defibrillation (Luther et al. 2011; Tandri et al. 2011) and drug interaction (Mirams et al. 2011) modelling studies and places renewed emphasis on the issues discussed above. Specifically, while it has significant potential to positively impact human health, the use of simulation results to inform treatment decisions only increases the importance of rigorously describing the links between data, model parameters and simulation results. The links between models and data are formed during three distinct phases in the model development process: observation, fitting and validation.
Consistent with the scientific method, the motivation for developing a computational physiology model begins with an observation, question or hypothesis. Intrinsically, mathematical models are approximations providing a finite caricature of the unbounded real system. The initial observation is used to define the limits and specifics of the system under study, including the species, temperature, age and spatial and temporal scales of the model (Smith et al. 2007; Niederer et al. 2009). In clinical and experimental studies the impact of this information is generally recognized and these values are routinely reported. Unlike wet lab studies, where new and intrinsically relevant data from the system of interest are typically clearly presented as results and set apart from contextual and potentially less relevant external data introduced in the discussion, computational models are agnostic to the source or type of data. Unless otherwise specified, the relevance of data to the system under study is not represented in the model. As a result the conclusions of computational studies are often derived from a mix of data with varying degrees of relevance to the original observation.
Defining the limits of the system of interest, explicitly stating approximations and clearly defining the envelope within which the model can be applied, provides the necessary context for interpreting modelling results. In some cases the system under study is general; for example, a ventricular, atrial or Purkinje cell, with specific details such as the temperature, species or specific cell location not explicitly stated. These models often retain the potential to provide important generic insights but are clearly more limited when aiming to represent specific experimental or clinical systems. While this issue remains in computational physiology studies, recognition of the importance of these factors is reflected in the growing number of recent computational physiology models focused on addressing it (Fink et al. 2011). This work has resulted in second generation models, which integrate additional data to enhance the temperature and species consistency of a model parameterisation. This progression is demonstrated in Fig. 3, which shows the change in the experimental data dependence between an initial foundation mouse model (Bondarenko et al. 2004) and a more recent refinement (Li et al. 2010). Most recently, this evolution of model specificity has been further focused in our group to account for specific genetic strains within a species (Li et al. 2010, 2011).
The second link between experimental data and models in the development process is fitting parameter values. This fitting process is underpinned by three elements: labelling data and dependencies, processing data and determining parameters. To fully describe the fitting of model parameters to data requires a transparent description of the link between the value used in the final simulations and the actual recording made. This description is provided by accurately labelling the source of experimental data used to fit model parameters and labelling the dependency of model parameters on specific data. Despite the relatively low cost and high value of comprehensive labelling it is often absent in modelling studies.
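The labelling described above can be sketched as a small data structure. This is only an illustration of the idea, not a format used by any of the cited models: the class names, fields, dataset labels and parameter values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSource:
    """One experimental recording, with the context needed to judge its relevance."""
    label: str            # hypothetical identifier, e.g. a citation key
    species: str
    temperature_c: float
    preparation: str      # e.g. "isolated myocyte", "skinned trabecula"


@dataclass
class Parameter:
    """A model parameter together with the data it was fitted to."""
    name: str
    value: float
    units: str
    sources: list = field(default_factory=list)


def inconsistent_sources(param, species, temperature_c, tol_c=2.0):
    """Return the sources whose species or temperature differ from the target system."""
    return [
        s for s in param.sources
        if s.species != species or abs(s.temperature_c - temperature_c) > tol_c
    ]


# Illustrative values only; they are not taken from any published model.
src_a = DataSource("dataset_A", "mouse", 37.0, "isolated myocyte")
src_b = DataSource("dataset_B", "rat", 22.0, "skinned trabecula")
g_na = Parameter("g_Na", 13.0, "mS/uF", [src_a])
ca50 = Parameter("Ca50", 0.8, "uM", [src_a, src_b])

for p in (g_na, ca50):
    flagged = inconsistent_sources(p, species="mouse", temperature_c=37.0)
    print(p.name, "->", [s.label for s in flagged] or "all sources match")
```

Recording sources per parameter in this way makes it possible to list, for a stated target system, which parameters rest on data from a different species or temperature.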
Determining model parameters from experimental data often requires a degree of data processing. When measurements are made from the system under study, data processing can involve multiple steps with the raw measurements being transformed, calibrated, filtered and smoothed to remove noise and infer more physiologically meaningful measurements. For example, tension development in single cardiac myocytes can be calculated by measuring the bending of carbon tweezers with known stiffness (Iribe et al. 2007), the cytosolic Ca2+ concentration or the cell membrane potential can be calculated from fluorescence measurements (Bishop et al. 2007; Li et al. 2010) and endocardial electrical activation times can be extracted from non-contact catheter mapping systems (Niederer et al. 2011a). The process of transforming these indirect measurements into physiological values that feed into the model development process is thus clearly important for model fidelity and reproducibility. Once the data are processed the physiological values can then be used to determine model parameters.
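A minimal sketch of such a processing chain is given below. It assumes a single-wavelength Ca2+ indicator calibrated with the commonly used relation [Ca] = Kd (F - Fmin)/(Fmax - F), and a linear (Hooke's law) probe for the force measurement; the cited studies may use different calibrations. The function names and numerical values are illustrative.

```python
def moving_average(signal, window):
    """Smooth with a centred moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def fluorescence_to_calcium(f, f_min, f_max, kd):
    """Single-wavelength calibration: [Ca] = Kd * (F - Fmin) / (Fmax - F)."""
    if not f_min <= f < f_max:
        raise ValueError("fluorescence outside the calibrated range")
    return kd * (f - f_min) / (f_max - f)


def probe_force(deflection_um, stiffness_un_per_um):
    """Force from the bending of a probe of known stiffness (assumed linear)."""
    return stiffness_un_per_um * deflection_um


# Illustrative raw trace (arbitrary fluorescence units), not measured data.
raw = [1.02, 1.00, 1.05, 1.40, 1.62, 1.55, 1.30, 1.12, 1.04, 1.01]
smoothed = moving_average(raw, window=3)
calcium_um = [fluorescence_to_calcium(f, f_min=1.0, f_max=2.0, kd=0.5) for f in smoothed]
print([round(c, 3) for c in calcium_um])
```

Keeping each step as a separate, named function means the calibration constants and filter settings can be reported alongside the fitted parameters they influence.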
In some cases it is not possible to make direct measurements in situ or the desired measurement is confounded by other signals. In cardiac electromechanics ion channel currents can be isolated in expression systems, sarcomere kinetics can be measured in skinned preparations and SERCA-ATPase can be characterized in sarcoplasmic reticulum vesicles. Although all of these approaches allow the properties of specific proteins to be characterised, the isolation process can often have an unknown effect on function. Expression systems may lack small but important channel subunits or post-translational regulation (Tseng-Crank et al. 1990; Blair et al. 1991; Paulmichl et al. 1991), altering the channel kinetics. Skinning alters the chemical environment of the sarcomere and may compromise sarcomere proteins, causing the observed decrease in active tension Ca2+ sensitivity in these preparations (Niederer et al. 2006). Creating sarcoplasmic reticulum vesicles compromises the cellular environment, removing potentially important regulatory mechanisms of SERCA function. Mapping parameters from these reduced or altered systems to cardiac myocytes and whole organ systems poses significant challenges. Previously these differences have been assumed nominal or parameters from the altered system have been scaled to match a limited set of in situ observations (Niederer et al. 2006). However, in the case of expression systems, recent models that link channel structure at the protein scale to function (Silva et al. 2009) may provide a novel method for quantitatively linking expression system measurements to in situ channel kinetics.
Parameters are typically determined by minimizing a cost function describing the difference between a model result and the corresponding experimental or clinical measurement. In some cases measurements relate directly to model parameters; for example, the length–tension relationship in a cardiac myocyte can be measured directly and can be represented explicitly in the model (Niederer et al. 2006; Rice et al. 2008). Alternatively, model parameters can be fitted to integrated emergent phenomena, such as fitting the sodium channel density to achieve a desired action potential upstroke (Li et al. 2010). In these, often numerous, situations it is only possible to fit parameters to system scale observations. However, the difficulty in fitting parameters to systemic phenomena is that any errors from fitting other model components accumulate in the parameters that are fitted to system scale observations. This process can thus significantly decrease confidence in the value of parameters estimated using system responses (Niederer et al. 2009). Once the cost function is defined it can then be minimized manually or using one of many minimization algorithms. However, it is important to note that these approaches are not always successful or viable due to (1) disparities between model complexity and the information contained in experimental data and/or (2) non-linearities in the model leading to fitting algorithms identifying local minima. For cardiac cell models it has been shown that in some cases it is impossible to fit a unique set of parameters to the model equations (Sarkar & Sobie, 2010). This difficulty in fitting parameters has been addressed by either manually fitting parameters or by developing the models in a form that enables parameters to be directly identifiable (Fink & Noble, 2009). The inclusion of sensitivity studies has also been adopted to provide additional information regarding the dependence of model outcomes on specific model parameters (Li et al. 2010; O’Hara et al. 2011). The goal of these sensitivity analyses is to separate model parameters into those that have a significant impact on model outcomes and those that have only a nominal effect. The parameters that have a significant impact must all be well constrained by experimental data; if an important parameter is poorly constrained this significantly reduces the confidence in any conclusions drawn from the model results. Conversely the data requirements underpinning parameters whose variation is identified as not producing large changes in key predictions are lower.
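The fitting and sensitivity steps can be illustrated with a deliberately small example. The sketch below assumes a three-parameter Hill curve for steady-state tension against Ca2+, a sum-of-squares cost, an exhaustive grid search (chosen here because it cannot be trapped in a local minimum, at the price of poor scaling) and a one-at-a-time normalised sensitivity. The data are synthetic and noise-free, and none of the names or values come from the cited models.

```python
import itertools


def hill_tension(ca, t_max, ca50, n):
    """Steady-state tension as a Hill function of Ca2+ concentration."""
    return t_max * ca**n / (ca**n + ca50**n)


def cost(params, data):
    """Sum of squared differences between model output and measurement."""
    return sum((hill_tension(ca, *params) - t) ** 2 for ca, t in data)


def grid_fit(data, grids):
    """Exhaustive search: evaluate the cost at every combination of grid values."""
    return min(itertools.product(*grids), key=lambda p: cost(p, data))


def normalised_sensitivity(output_fn, params, index, rel_step=1e-4):
    """Central-difference estimate of (p_i / y) * dy/dp_i at the given parameters."""
    up, down = list(params), list(params)
    h = params[index] * rel_step
    up[index] += h
    down[index] -= h
    dy_dp = (output_fn(up) - output_fn(down)) / (2 * h)
    return params[index] * dy_dp / output_fn(params)


# Synthetic, noise-free "measurements" generated from known parameters.
true_params = (1.0, 0.8, 3.0)  # t_max (normalised), ca50 (uM), Hill coefficient
ca_levels = [0.1, 0.3, 0.6, 0.8, 1.0, 2.0, 5.0]
data = [(ca, hill_tension(ca, *true_params)) for ca in ca_levels]

grids = [
    [0.5 + 0.1 * i for i in range(11)],  # t_max: 0.5 to 1.5
    [0.4 + 0.1 * i for i in range(11)],  # ca50:  0.4 to 1.4
    [1.0 + 0.5 * i for i in range(9)],   # n:     1.0 to 5.0
]
fitted = grid_fit(data, grids)

# Sensitivity of the tension at 0.8 uM Ca2+ to each fitted parameter.
tension_at_08 = lambda p: hill_tension(0.8, *p)
sensitivities = [normalised_sensitivity(tension_at_08, fitted, i) for i in range(3)]
print("fitted:", fitted)
print("sensitivities:", [round(s, 3) for s in sensitivities])
```

In this example the chosen output is evaluated at half activation, where it is insensitive to the Hill coefficient but strongly dependent on ca50. A sensitivity result is therefore specific to the output being examined, and a parameter that appears unimportant for one prediction may dominate another.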
The final crucial stage in model development is validation. Rather than the validation of a model being considered binary, i.e. a model is validated or it is not, we argue the degree of validation should be thought of as a spectrum. At one end there is no validation, where the model can only replicate the data it was fitted to. At the other end of the spectrum it can be shown that a unique set of parameters has been determined from the available training data and that the model predicts measurements from an identified set of system-level data not included in the fitting process (Niederer et al. 2011a; Provost et al. 2011). The multi-physics nature of cardiac electromechanics models and an insufficient quantity and quality of data have limited validation in the past. Examples of previous studies include the combination of models of rat contraction at room temperature with guinea pig electrophysiology models at body temperature (Nickerson et al. 2001), using a guinea pig electrophysiology model within a rabbit geometry (Jie et al. 2010), and the development of whole organ models, where the cellular models were based on available data at room temperature, which could not be compared directly against in vivo measurements (Niederer & Smith, 2009).
In computational modelling, comprehensive validation is often not achievable. However, we assert that in these cases, models compared against limited data demonstrate that the proposed set of parameters and equations provides only a plausible representation of the system under study. Moving to the concept of a plausibility spectrum would lead to models demonstrating that they are highly plausible through multiple independent comparisons of model predictions and experimental data. This would, in turn, provide a more transparent description of the model validation process.
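One simple way to report where a model sits on such a spectrum is to tabulate its error against each data set that was not used for fitting. The sketch below is an assumption about how this could be organised, not a method from the cited studies. The placeholder model, the data set names, the values and the acceptance tolerance are all hypothetical.

```python
import math


def rmse(model, data):
    """Root-mean-square difference between model predictions and measurements."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))


def plausibility_report(model, training, independent_sets, tolerance):
    """Compare one fitted model against data sets that were not used for fitting."""
    report = {"training": rmse(model, training)}
    for name, data in independent_sets.items():
        report[name] = rmse(model, data)
    passed = [name for name in independent_sets if report[name] <= tolerance]
    return report, passed


# A stand-in for an already fitted model; any callable x -> prediction would do.
fitted_model = lambda x: 2.0 * x + 1.0

training = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
independent_sets = {
    "protocol_A": [(3.0, 7.1), (4.0, 8.9)],   # close to the model predictions
    "protocol_B": [(3.0, 9.0), (4.0, 12.0)],  # clearly outside them
}

report, passed = plausibility_report(fitted_model, training, independent_sets, tolerance=0.5)
for name, error in report.items():
    print(f"{name}: RMSE = {error:.3f}")
print(f"independent comparisons passed: {len(passed)} of {len(independent_sets)}")
```

Reporting every comparison, including those the model fails, gives a reader the count of independent tests passed, which is more informative than a single validated or not validated label.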
Improving the link between models and experimental data is essential if models are to fulfill their role in linking multiple physiological observations across disparate spatial and temporal scales with underlying function. Cardiac electromechanical models have begun to consider the impact of temperature and species on model outcomes. However, at present the vast majority of these models still ignore the dominant predictor of disease: age (Houle et al. 2010). Models have begun to account for the effects of age by simulating neonatal cells (Wang & Sobie, 2008) and the progression of heart failure following genetic knockout of SERCA over periods of weeks (Li et al. 2011). Accounting for age in computational physiology models will need to increase as models are used in clinical applications. This is particularly important in the use of computational models in drug screening (Mirams et al. 2011; Moreno et al. 2011), where the failure to identify drugs that adversely affect older unhealthy patients arises from efficacy being based on experimental studies using young healthy animals (Lin, 1995).
In addition to age, models will also need to start to account for biological variations associated with inter-subject differences in experimental data. Current cardiac electromechanics models often implicitly aim to represent population averages or a single individual. Studies have begun to include the impact of spatial variability in models with cell types fitted to data from cells in specific regions of the heart (Campbell et al. 2008; Fink et al. 2011; O’Hara et al. 2011). However, these models have yet to account for the intrinsic variation between individual cells (Bahar et al. 2006) or between individuals (Lerner & Kannel, 1986). Accounting for this variation motivates the need for computational frameworks to move away from a single model methodology to studying populations of models that can represent the underlying variability present in an individual's cells or across a population of individuals.
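A population-of-models approach can be sketched as follows: candidate parameter sets are sampled around a baseline and only those whose simulated outputs fall within experimentally observed ranges are retained. The Hill-curve model, the uniform sampling scheme, the baseline values and the acceptance ranges below are all assumptions made for illustration.

```python
import random


def hill_tension(ca, t_max, ca50, n):
    """Steady-state tension as a Hill function of Ca2+ concentration."""
    return t_max * ca**n / (ca**n + ca50**n)


def sample_population(baseline, spread, size, rng):
    """Scale each baseline parameter by a uniform factor in [1 - spread, 1 + spread]."""
    return [
        {name: value * rng.uniform(1 - spread, 1 + spread) for name, value in baseline.items()}
        for _ in range(size)
    ]


def simulate(params):
    """Model outputs that can be compared with measured quantities."""
    return {
        "peak_tension": hill_tension(10.0, **params),
        "half_tension": hill_tension(0.8, **params),
    }


def calibrate(population, ranges):
    """Keep only the candidates whose outputs all lie inside the observed ranges."""
    accepted = []
    for params in population:
        outputs = simulate(params)
        if all(lo <= outputs[key] <= hi for key, (lo, hi) in ranges.items()):
            accepted.append(params)
    return accepted


rng = random.Random(0)  # fixed seed so the sketch is reproducible
baseline = {"t_max": 1.0, "ca50": 0.8, "n": 3.0}
observed_ranges = {"peak_tension": (0.8, 1.2), "half_tension": (0.3, 0.7)}

population = sample_population(baseline, spread=0.3, size=200, rng=rng)
accepted = calibrate(population, observed_ranges)
print(f"{len(accepted)} of {len(population)} candidate models are consistent with the ranges")
```

Predictions are then made with every accepted model, so the spread of results reflects the variability that the data permit and is not collapsed to a single averaged parameter set.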
In addition to intrinsic variability within experimental data, as already discussed, measurements also inevitably suffer from uncertainty due to noise, under-sampling or bias. Regardless of the cause, models need to take into account the impact of uncertainty on predictions. This has motivated our own work in cardiac electromechanics models to provide confidence intervals or ranges of potential outcomes (Niederer & Smith, 2009) as opposed to single deterministic predictions.
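As a minimal sketch of replacing a single deterministic prediction with a range, the example below propagates uncertainty in one parameter through a model by Monte Carlo sampling and reports an empirical interval. The Gaussian uncertainty on ca50, its magnitude and the Hill-curve model are assumptions for illustration and do not reproduce the method of the cited work.

```python
import random
import statistics


def hill_tension(ca, t_max, ca50, n):
    """Steady-state tension as a Hill function of Ca2+ concentration."""
    return t_max * ca**n / (ca**n + ca50**n)


def prediction_interval(ca, ca50_mean, ca50_sd, samples, rng, coverage=0.95):
    """Sample ca50 from a Gaussian and return (lower, median, upper) predicted tension."""
    draws = sorted(
        # Clamp to a small positive value so the model stays well defined.
        hill_tension(ca, 1.0, max(rng.gauss(ca50_mean, ca50_sd), 1e-9), 3.0)
        for _ in range(samples)
    )
    lo_idx = int((1 - coverage) / 2 * (samples - 1))
    hi_idx = int((1 + coverage) / 2 * (samples - 1))
    return draws[lo_idx], statistics.median(draws), draws[hi_idx]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
lower, median, upper = prediction_interval(
    ca=0.8, ca50_mean=0.8, ca50_sd=0.08, samples=2000, rng=rng
)
point = hill_tension(0.8, 1.0, 0.8, 3.0)  # single deterministic prediction
print(f"point prediction: {point:.3f}")
print(f"95% interval: [{lower:.3f}, {upper:.3f}], median {median:.3f}")
```

The width of the interval relative to the effect being studied indicates whether the parameter uncertainty is small enough for the prediction to be useful.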
Current community efforts have led to the use of standardized languages for publishing models (Hucka et al. 2003; Garny et al. 2008), open source and/or freely available simulation software (Garny et al. 2003; Pitt-Francis et al. 2009; Bradley et al. 2011), and the availability of models from individual groups and community repositories (Fink et al. 2011). Together these provide a transparent representation of the model equations, parameters and solution methods. However, these efforts have yet to provide a comprehensive format to describe experimental measurements, data processing, mapping and fitting methods and the link between data and model parameters.
As demonstrated by cardiac electromechanics models, computational physiology has a significant role to play in rationalising data, linking observations to function and translating laboratory results into clinical contexts. Embedded within this role there are significant challenges and opportunities, a number of which have been outlined above. Ensuring that models transparently define the system under study, link parameters with data and compare model results with data to demonstrate increased plausibility is a major challenge for the modelling community. Towards this goal, extensive supplements describing the model development methodology have been provided alongside recent modelling studies. If this trend continues, models will reach their potential of providing a succinct, complete and transparent representation of physiological data and function and, in turn, provide a valuable approach for revealing physiological insight.