Bayesian sensitivity analysis of a 1D vascular model with Gaussian process emulators

One-dimensional models of the cardiovascular system can capture the physics of pulse waves but involve many parameters. Since these may vary among individuals, patient-specific models are difficult to construct. Sensitivity analysis can be used to rank model parameters by their effect on outputs and to quantify how uncertainty in parameters influences output uncertainty. This type of analysis is often conducted with a Monte Carlo method, where large numbers of model runs are used to assess input-output relations. The aim of this study was to demonstrate the computational efficiency of variance-based sensitivity analysis of 1D vascular models using Gaussian process emulators, compared with a standard Monte Carlo approach. The methodology was tested on four vascular networks of increasing complexity to analyse its scalability. The computational time needed to perform the sensitivity analysis with an emulator was reduced by 99.96% compared with a Monte Carlo approach. Despite the reduced computational time, sensitivity indices obtained using the two approaches were comparable. The scalability study showed that the number of mechanistic simulations needed to train a Gaussian process for sensitivity analysis was of the order O(d), rather than the O(d × 10³) needed for Monte Carlo analysis (where d is the number of parameters in the model). The efficiency of this approach, combined with the capacity to estimate the impact of uncertain parameters on model outputs, will enable the development of patient-specific models of the vascular system and has the potential to produce results with clinical relevance.


INTRODUCTION
The cardiovascular system is a complex network of elastic vessels through which blood is pumped by contraction of the heart. This pulsatile regime and vessel elasticity cause pressure to propagate along the arterial circulation as waves. Mechanical discontinuities caused by bifurcations, bends, or cardiovascular pathology, eg, vessel stenoses, cause pressure waves to be reflected in all directions. As a result, pressure waves measured at a specific location can be seen as the result of a superposition of incident and reflected waves. The analysis of this superposition mechanism allows for the study of mechanical properties upstream and downstream of the measurement point and can be a rich source of diagnostic information about the system through which these waves propagate. However, given the complexity of the vascular system, it is at present difficult to ascribe a particular waveform feature to a specific trait of the arterial circulation. [1][2][3] Numerical models of the vascular tree are ultimately aimed at a better understanding of the circulation. Depending on the model dimensionality and on the target application, different levels of complexity can be reached. 4 Three-dimensional (3D) models are spatially accurate and can predict blood flow in complex geometries, eg, around heart valves, near aneurysms, or at arterial bifurcations. 5,6 However, these models rely on an accurate description of the system geometry and of the boundary conditions. Hence, when noninvasive measurements are difficult to obtain, 3D simulations cannot be performed. One-dimensional (1D) models are capable of simulating the physics of pulse wave propagation and reflection. Many of these models are based on a reduction of the 3-dimensional Navier-Stokes equations and on a constitutive equation linking transmural pressure to arterial wall displacement.
[7][8][9] Each vessel of the arterial tree is represented by a straight elastic tube whose parameters can be fixed or varied along the tube length. 10 One-dimensional models are less computationally expensive than 3-dimensional ones and require a less precise description of the vasculature geometry. The drawbacks reside in the lack of accuracy in areas where the flow is not fully developed, eg, in proximity of valve outlets and bifurcations, where secondary and recirculatory flows may originate. One-dimensional models have been used to study and reproduce the behaviour of the entire systemic circulation, 11,12 the coronaries, 13 the cerebral vasculature, 14 and the pulmonary circulation. 15 Although faster to solve than full 3-dimensional simulations, 1D models still require a large number of parameters to be specified for each vessel in the system. In a patient-specific scenario, the total number of parameters to be measured easily becomes infeasible and clinically nonjustifiable, eg, an arterial tree made of 103 segments requires about 500 parameters. 12 This issue can be overcome by identifying those parameters that have the most significant effect on the output of interest. By ranking the inputs, the parameters worth measuring accurately can be identified and their uncertainty reduced, whereas parameters whose variation has less effect on the outcome of interest can be fixed to their typical reference values. The process of parameter ranking and fixing consists of performing a model sensitivity analysis. 16 The state-of-the-art technique for parameter ranking and fixing is the computation of sensitivity indices. 17 These indices assess the sensitivity of outputs to variation in individual inputs or combinations of inputs. First-order sensitivity indices measure the proportion of output variance that can be accounted for by the variance of one individual input.
The variation of a single output due to the combined variation of multiple inputs is measured by higher-order sensitivity indices. The sum of all indices concerning a specific input is called the total sensitivity index. First-order indices can be used to rank inputs and to decide which has the strongest influence on the model outputs. Finally, by highlighting input interactions, total sensitivity indices identify the inputs, or model parameters, that can be fixed.
The Monte Carlo method is a simple approach to compute sensitivity indices. Several thousand simulator (ie, the mechanistic model) runs are performed. For each simulator run, a different set of inputs is randomly drawn from a distribution of points covering the entire input space. Ideally, the distribution would contain infinitely many points, so that, by obtaining a result for each point, the global behaviour of the model would be known. In practice, an infinite distribution cannot be sampled, and Monte Carlo sampling requires a number of runs of the order of O(d × 10⁶), where d is the number of input parameters. Saltelli et al 18,19 introduced a Monte Carlo-based technique to calculate sensitivity indices that requires a number of runs of order O(d × 10³), but this can still be an impractical number of runs for most applications.
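As an illustration (not the implementation used in this study), Saltelli's scheme for first-order indices can be sketched in a few lines: two independent sample matrices A and B are drawn, plus d "hybrid" matrices in which one column of A is replaced by the corresponding column of B, for a total of n(d + 2) runs. The toy model f(x₁, x₂) = x₁ + 0.5x₂ and the sample size n are arbitrary choices made for this sketch; its analytic first-order indices for independent U(0, 1) inputs are S₁ = 0.8 and S₂ = 0.2.

```python
import random

def f(x):
    """Toy additive model with analytic indices S1 = 0.8, S2 = 0.2."""
    return x[0] + 0.5 * x[1]

def saltelli_first_order(f, d, n, seed=0):
    """Estimate first-order Sobol indices with n*(d+2) model runs."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(d)] for _ in range(n)]
    B = [[rng.random() for _ in range(d)] for _ in range(n)]
    fA = [f(x) for x in A]
    fB = [f(x) for x in B]
    mean = sum(fA + fB) / (2 * n)
    var = sum((y - mean) ** 2 for y in fA + fB) / (2 * n)
    S = []
    for i in range(d):
        # A_B^i: matrix A with its i-th column replaced by B's i-th column
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        Si = sum(fb * (fab - fa) for fb, fab, fa in zip(fB, fABi, fA)) / n / var
        S.append(Si)
    return S

S = saltelli_first_order(f, d=2, n=50000)  # approaches [0.8, 0.2] as n grows
```

The estimator used here is the standard product form S_i ≈ (1/n) Σ f(B)(f(A_B^i) − f(A)) / V(y); more robust variants exist, but this conveys why the run count scales as O(d × 10³) rather than O(d × 10⁶).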
When considering a large number of parameters, the computational time needed for the d × 10³ simulations becomes prohibitively high. The analysis can be made more efficient by introducing a fast-running approximation of the mechanistic model, ie, an emulator. 16 Emulators are well known in both the applied mathematics and the statistics communities. 20 The former mostly uses a tool called polynomial chaos expansion (PCE), whereas the latter principally uses Gaussian process (GP) emulators. Both tools aim to infer the simulator global behaviour from observed simulator runs. The main advantage of GP over the PCE technique resides in the availability of uncertainty information. This characteristic follows directly from the probabilistic nature of a GP, which allows embedding of uncertainty and explicit treatment of model parameters as uncertain quantities. This is essential because it enables the impact of missing, uncertain, or noisy measurements on model outputs to be quantified, which can become relevant in a clinical setting. For a detailed analysis of the differences between the two techniques, see other works. 20,21 The PCE has been successfully used for sensitivity analysis in the cardiovascular field, see, for example, other studies. [22][23][24][25] Outside of the cardiovascular field, GP is a widely used emulation technique, 26 but to our knowledge, GP has never been used to predict 1D blood flow model outcomes for sensitivity analysis purposes.
The aim of this work was therefore to compare sensitivity analysis of a 1D cardiovascular model based on GP emulators with the traditional approach based on Monte Carlo sampling. We have also shown how the GP properties scale with the vascular network complexity. Four patient-generic arterial networks of increasing size were used to demonstrate the benefits of using GP emulators for cardiovascular applications. Results were validated against the outcomes of a Monte Carlo analysis.

Vascular model
The vascular model was based on the reduced 1D form of the general continuity and Navier-Stokes equations for incompressible flows within narrow straight elastic tubes. 7,9 A constitutive equation describing elastic wall behaviour was used to close the equation system. The equations governing the problem of flow through an elastic vessel are

∂A/∂t + ∂Q/∂x = 0,
∂Q/∂t + ∂(Q²/A)/∂x + (A/ρ) ∂P/∂x = −8πμQ/(ρA),    (1)
P = P_ext + (√π E h_0)/((1 − ν²) A_0) (√A − √A_0),

where t is time, x is the longitudinal space coordinate, A is the lumen cross-sectional area, Q is the volumetric flow rate, P is the pressure, ρ is the blood density, μ is the blood dynamic viscosity, A_0 is the reference cross-sectional area (ie, when P − P_ext = 0), h_0 is the reference wall thickness, E is the wall Young modulus, ν = 1/2 is the wall Poisson ratio, and P_ext is the external pressure. The numerical solution of Equation 1 was achieved by means of a finite-volume scheme. [27][28][29] The numerical mesh was independently set for each vessel by using at least 5 elements or, where possible, Δx = 1 mm. To guarantee numerical stability, time steps were adaptively computed at each iteration in all vessels depending on the maximum local wave speed c_max and Δx as

Δt = C_cfl Δx/c_max,

where the Courant number was set to C_cfl = 0.9 and the Δt set for the entire system was the smallest computed over all vessels in the system. A boundary condition was applied to the inlet of the root vessel as a flow time function. Outlet boundaries of peripheral bifurcating vessels were coupled to 3-element windkessel models 30 ; at capillary level, the last boundary condition was assigned by assuming the arterial-venous interface relative pressure equal to 0. To avoid artificial wave reflections induced by discontinuities caused by time-dependent changes in diameters at the bifurcation outlets, windkessel impedances Z were calculated at each time step to match the daughter vessel outlet impedance.
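The adaptive time-step rule can be sketched as follows; the vessel data structure and the field names are illustrative, not taken from the actual solver.

```python
# Sketch of the adaptive time-step rule: Dt = C_cfl * min_i(Dx_i / c_max,i),
# where the minimum is taken over all vessels in the system.

C_CFL = 0.9  # Courant number used in the study

def adaptive_timestep(vessels):
    """Return the system-wide time step: the smallest CFL-stable step
    over all vessels (each vessel carries its mesh size dx in metres
    and its current maximum local wave speed c_max in m/s)."""
    return C_CFL * min(v["dx"] / v["c_max"] for v in vessels)

# Hypothetical two-vessel system with Dx = 1 mm in both segments
vessels = [
    {"dx": 1e-3, "c_max": 6.0},   # slower wave speed
    {"dx": 1e-3, "c_max": 10.0},  # stiffer segment, faster waves
]
dt = adaptive_timestep(vessels)   # limited by the fastest wave speed
```

The global Δt is set by whichever vessel has the most restrictive ratio Δx/c_max, so a single stiff peripheral segment controls the cost of the whole simulation.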
To show how the computational time, the accuracy, and the convergence of the proposed methodology scale with the mechanistic model complexity, 4 vascular networks were analysed (Figure 1). These represent the iliac bifurcation (8 arteries), the ascending and upper thoracic aorta (7 arteries), the thoracic aorta and the right arm (15 arteries), and a more complete cardiovascular system (61 arteries). Geometrical values (lumen radius R_0 and vessel length ℓ), material properties (wall Young modulus E), the inlet flow time function (Figure 2A), and windkessel parameters (peripheral resistance R and peripheral compliance C) were based on data published in previous studies. [31][32][33] Uncertainty domains for all the parameters were set as in Huberts et al 1 (Table 1). Analysis of the results was conducted by extracting 2 outputs: pressure waveforms were computed at each node of the system, and the minimum and maximum values at the midpoint of the root vessel (ie, the vessel to which the inlet boundary condition is applied) were recorded (Figure 2B).

Gaussian process for regression
Gaussian process theory for both regression and classification is well known and established in the machine learning community. In this context, only GP regression is presented. A complete and exhaustive mathematical description of this method can be found in the literature, see, for example, other studies. 26,[34][35][36] Let us consider a set of N d-dimensional input vectors X = {x_1, …, x_N} and the corresponding outputs Y = {y_1, …, y_N}, which together form the dataset D = {X, Y}. Inputs are mapped into outputs by an unknown function f(X). The goal of regression is to learn this function from D. Since the input data can be uncertain, it is desirable to obtain a set of likely interpolating functions rather than a single function f(X).
A GP describes a probability distribution over functions, p(f). This conveys the prior knowledge about the interpolating functions before observing the data D. By definition, the prior of a GP has a multivariate Gaussian distribution, mathematically described by

p(f) = N(f | m, Σ),

where m = m(x) is the mean vector computed with the mean function m and Σ is a covariance matrix. The mean is often set to 0; this is because datasets can simply be preprocessed to have 0 mean before training the statistical model. The covariance matrix is built through a kernel function k(x, x′) as

Σ_ij = k(x_i, x_j).

The kernel choice depends on the problem, and complex behaviour may require a combination of kernel functions. Kernel function building starts from widely used kernel functions like the squared exponential

k(x, x′) = σ_1² exp(−‖x − x′‖² / (2σ_2²)),

or the Matérn class of functions

k(x, x′) = σ_1² (2^(1−ν)/Γ(ν)) (√(2ν)‖x − x′‖/σ_2)^ν K_ν(√(2ν)‖x − x′‖/σ_2),

where σ_i (i = 1, …, 3) are hyper-parameters, Γ(·) is the Gamma function, K_ν is a modified Bessel function, and ν is a positive parameter that controls the smoothness of the kernel function. 37 Of particular interest are the cases ν = 3/2 and ν = 5/2, for which the kernel function can be simplified as

k_{3/2}(x, x′) = σ_1² (1 + √3 r/σ_2) exp(−√3 r/σ_2),
k_{5/2}(x, x′) = σ_1² (1 + √5 r/σ_2 + 5r²/(3σ_2²)) exp(−√5 r/σ_2),  with r = ‖x − x′‖.

Hyper-parameters σ_i govern kernel properties, and their selection is part of the optimisation process. The GP optimises hyper-parameters from training data. This is done by applying Bayes theorem, which evaluates the uncertainty in f after D has been observed. The posterior probability, p(f|D), is obtained by taking into account prior knowledge about f before seeing D, p(f), and by expressing how likely it is to observe D for different f:

p(f|D) = p(D|f) p(f) / p(D).

The effect of the observed data is expressed by the likelihood function p(D|f). The denominator term is a normalisation constant. The likelihood function is used as a cost function, and it is maximised with respect to the hyper-parameters σ_i. This, in turn, maximises the posterior probability of f describing D.
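The simplified Matérn kernels for ν = 3/2 and ν = 5/2 can be sketched directly; σ_1 is written here as an amplitude and σ_2 as a length-scale hyper-parameter, matching the notation above.

```python
import math

def matern32(r, s1=1.0, s2=1.0):
    """Matern nu = 3/2 kernel as a function of the distance r = |x - x'|."""
    a = math.sqrt(3.0) * r / s2
    return s1 ** 2 * (1.0 + a) * math.exp(-a)

def matern52(r, s1=1.0, s2=1.0):
    """Matern nu = 5/2 kernel; note 5r^2/(3*s2^2) equals a^2/3
    with a = sqrt(5)*r/s2."""
    a = math.sqrt(5.0) * r / s2
    return s1 ** 2 * (1.0 + a + a ** 2 / 3.0) * math.exp(-a)

k0 = matern52(0.0)  # at zero distance the kernel returns the variance s1^2
k1 = matern52(1.0)  # correlation decays with distance
```

Both functions equal σ_1² at r = 0 and decay monotonically with distance; the ν = 5/2 case gives twice-differentiable sample functions, which is why it is a common default for smooth physical responses.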
Let us now introduce a new set of test inputs X_T and corresponding outputs Y_T. The predictive distribution is assumed to have a multivariate Gaussian distribution,

p(Y_T | X_T, D) = N(Y_T | m_T, Σ_T),    (10)

with mean and covariance given by

m_T = Σ_NT^⊤ Σ⁻¹ Y,
Σ_T = Σ_TT − Σ_NT^⊤ Σ⁻¹ Σ_NT,

where Σ_NT = k(x_n, x_{N+T}) for n = 1, …, N. Predictions are made by sampling the distribution (Equation 10) at points from X_T. Note that the predictive distribution does not have a zero mean function and that its covariance matrix is computed by using the optimised kernel function learned from the training set D.
The training cost has O(N³) complexity because of the covariance matrix inversion. Nevertheless, this is a one-time operation, and the GP obtained can be used to predict an output at any input point within the input space.
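For illustration, the predictive mean and variance can be computed explicitly for a toy 1-dimensional, noise-free training set of N = 2 points, where the covariance matrix inverse can be written by hand. The squared exponential kernel and the fixed (not optimised) hyper-parameters are assumptions made for this sketch, not the values used in the study.

```python
import math

def k(x, xp, var=1.0, ell=1.0):
    """Squared exponential kernel with amplitude var and length scale ell."""
    return var * math.exp(-(x - xp) ** 2 / (2 * ell ** 2))

def gp_predict(X, Y, x_star):
    """Posterior mean and variance at x_star for N = 2 training points,
    using the explicit 2x2 inverse of the training covariance matrix."""
    a, b = k(X[0], X[0]), k(X[0], X[1])
    c, d = k(X[1], X[0]), k(X[1], X[1])
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]   # Sigma^-1
    ks = [k(x_star, X[0]), k(x_star, X[1])]            # Sigma_NT
    alpha = [inv[i][0] * Y[0] + inv[i][1] * Y[1] for i in range(2)]
    mean = ks[0] * alpha[0] + ks[1] * alpha[1]         # Sigma_NT^T Sigma^-1 Y
    w = [inv[i][0] * ks[0] + inv[i][1] * ks[1] for i in range(2)]
    var = k(x_star, x_star) - (ks[0] * w[0] + ks[1] * w[1])
    return mean, var

X, Y = [0.0, 1.0], [0.0, 2.0]
m0, v0 = gp_predict(X, Y, 0.0)  # at a training point a noise-free GP interpolates
```

At a training input, the posterior mean reproduces the training output and the posterior variance collapses to 0; between training points the variance grows, which is the uncertainty information the emulator exposes.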

Gaussian process emulator verification
To assess the GP prediction error, a Monte Carlo (MC) analysis was performed. This consisted of running d × 10³ simulations by sampling the d-dimensional input space. The d × 10³ inputs and the d × 10³ × 2 outputs constituted the GP design data D. A portion of the dataset was randomly sampled and saved for diagnostic purposes, and the GP model was trained on the remaining part. The MC analysis was performed for networks I, II, and III, but not for the complete model IV, as the computational time required would have been prohibitive, estimated at around 9 years and 4 months (see Figure 5).
The GP model was implemented by using the GPy library. 38 To avoid numerical problems due to bad conditioning of the covariance matrix, training inputs and training outputs were normalised dimension-wise, ie, each dimension in the input space was separately normalised. The kernel function used to compute the covariance matrix was obtained as the sum of a squared exponential kernel and a Matérn 5/2 kernel. Hyper-parameter optimisation was conducted by maximising the log-likelihood (ie, minimising the negative log-likelihood) through the conjugate gradient descent method.
The training sample was varied in size to assess the relation between the prediction error and the sample size. Three diagnostics 39 were used to verify GP model predictions: 1. Graphical analysis. Normalised GP model (emulator) predictions were compared with normalised vascular model (simulator) outputs (Figure 3A-C). The normalised values were computed by first subtracting their mean, then dividing by their standard deviation, and finally combining them all together. Points lying on the dashed line of equality indicate a good agreement between emulator and simulator.

2. Standardised prediction error. The differences between simulator outputs (Y) and emulator mean predictions were quantified as

e = (Y − E[f(X)|D]) / sd[f(X)|D],

where E[f(X)|D] and sd[f(X)|D] denote the emulator posterior mean and standard deviation. The emulator can be claimed to represent the simulator properly when this error distribution is normally distributed (Figure 3D-F). The normal distribution with mean and standard deviation computed from the error distribution is plotted over the actual error distribution; a visual inspection confirms the similarity between the two distributions. 3. Mean absolute percentage error (MAPE). The MAPE between emulator predictions and simulator outputs was used to quantify the overall prediction accuracy as a function of the training sample size (Figure 4).
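The standardised prediction error diagnostic can be sketched in a few lines; the numeric values below are made up for illustration only.

```python
# Standardised prediction error: the difference between simulator output y
# and emulator posterior mean, scaled by the emulator posterior standard
# deviation. For a well-calibrated emulator these values should look like
# draws from a standard normal distribution.

def standardised_errors(y_sim, mean_emu, sd_emu):
    return [(y - m) / s for y, m, s in zip(y_sim, mean_emu, sd_emu)]

errs = standardised_errors(
    y_sim=[10.2, 11.0, 9.7],     # hypothetical simulator outputs
    mean_emu=[10.0, 11.1, 9.9],  # hypothetical emulator posterior means
    sd_emu=[0.2, 0.1, 0.2],      # hypothetical emulator posterior std devs
)
```

Values consistently larger than about 2 in magnitude would flag inputs where the emulator is overconfident, which is why the histogram of these errors is compared against a fitted normal curve.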

Sensitivity analysis
Sobol sensitivity indices are briefly introduced here; more mathematical background is given in previous works. 17,19,23 If we consider a model y = f(x_1, …, x_d), the model total variance reads

V(y) = ∫_S f²(x) dx − y_0²,    (13)

where the term y_0 represents the model mean value and S is the input hyperspace. Analysis of variance (ANOVA) decomposition allows the variance (Equation 13) to be rewritten as the sum

V(y) = Σ_i V_i + Σ_{i<j} V_ij + … + V_{1…d},    (14)

where the partial variances are defined as

V_i = ∫_{S_i} f_i²(x_i) dx_i,  V_ij = ∫_{S_i} ∫_{S_j} f_ij²(x_i, x_j) dx_i dx_j,  …,

where S_i is the domain of x_i. The partial summands f_i, f_ij, and f_ij…k are uniquely defined as

f_i(x_i) = ∫_{S−i} f(x) dx_{−i} − y_0,
f_ij(x_i, x_j) = ∫_{S−ij} f(x) dx_{−ij} − f_i − f_j − y_0,  …,

where S_{−i} is the input hyperspace except the domain of x_i, ie, S = S_i ∪ S_{−i}. The partial variances in Equation 14, normalised by the total variance V(y), return the sensitivity indices. First-order sensitivity indices S_i measure the amount of output variance that can be attributed to the variance of each individual input. The variance in the output due to the interaction between the variances of two inputs is given by second-order sensitivity indices S_ij. The sum of all the indices regarding input i is called the total sensitivity index T_i. The influence of input i on the output y due to the various input interactions is measured by the difference between T_i and S_i. By construction, sensitivity indices are smaller than 1 and can be interpreted as proportions.
The sensitivity analysis for parameter fixing and prioritisation was performed following a 3-step strategy 16 : 1. The mechanistic model was run with a small set of input values evenly spanning the entire parameter space. Simulation inputs and outputs constituted the dataset T on which the GP emulators were trained. T was designed through the orthogonal Latin hypercube sampling method to ensure an even coverage of the input space. To avoid an ill-conditioned covariance matrix, inputs and outputs were normalised. 2. The trained GP was then used to predict outcomes for a bigger set of inputs, with size of order O(d × 10³). 3. Sobol sensitivity indices were computed by means of the ANOVA decomposition. Inputs were ranked according to first-order indices. The largest first-order indices indicated those inputs that mainly affect the outcome. Any difference between first-order and total sensitivity indices indicated that the outcome variance could be ascribed to the covariance of more than one input.
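Step 1 above can be illustrated with a plain Latin hypercube sampler: each of the d input dimensions is divided into n equal strata and every stratum is sampled exactly once. The orthogonality refinement used in the study is omitted here for brevity, so this is a sketch of the sampling idea, not the exact design method.

```python
import random

def latin_hypercube(n, d, seed=0):
    """Return n points in the unit hypercube [0, 1)^d such that, in every
    dimension, each of the n equal strata contains exactly one point."""
    rng = random.Random(seed)
    design = []
    for _ in range(d):
        # one point per stratum, then shuffle the stratum order
        col = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(col)
        design.append(col)
    # transpose: list of n points, each with d coordinates
    return [[design[j][i] for j in range(d)] for i in range(n)]

pts = latin_hypercube(n=8, d=3)  # 8 training points in a 3-parameter space
```

In practice, each unit-cube coordinate is then rescaled to the uncertainty domain of the corresponding physical parameter (eg, a radius or Young modulus range) before running the mechanistic model.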

RESULTS AND DISCUSSION
The time spent to run a single simulation with the mechanistic solver, t_s, for each network is reported in Figure 5A. The trained GP emulators were used to predict results for all the d × 10³ input points needed for the sensitivity analysis (Figure 5B). These input points were chosen to explore the input hyperspace thoroughly. The sensitivity analysis computational running times, t_SA, of both the numerical vascular model and the GP emulator are reported in Figure 5C. The computational time of the MC analysis added up to 2.5 days for the single bifurcation, and it was estimated to scale up to about 9 years in the case of the complete vascular model. The GP computational time for both training and prediction phases increased as the number of vessels in the network increased. The t_SA was always 4 orders of magnitude smaller than with the MC approach. By coupling the numerical model with the GP regression model, the bulk of the computational time was taken by the numerical simulations needed for the training dataset (Table 2), eg, 14 hours for the 61-artery model. Predictions for the d × 10³ datasets were made in 0.42 seconds by the GP emulator. Therefore, the sensitivity analysis on the complete model was performed in 14 hours.
The GP prediction error decreased as the number of points in the training sample increased (Figure 4). The number of training points needed to achieve a MAPE lower than 1% was always lower than the number of points needed to perform the MC analysis (Table 2). These results indicated that the number of points needed to train the GP is of the order O(d).
In the case of the complete model, the ensemble of input points and predicted outputs was used to compute first-order Sobol sensitivity indices (Figure 6A-E). To ease the analysis, the indices were subdivided into 5 sets depending on the vessel location, as in Eck et al. 2 The vessel length and the peripheral compliance (ℓ and C, respectively) scored low first-order sensitivity indices (less than 0.02, Figure 6B and 6E). Upper limb vessel radii and Young moduli slightly affected (S_{R_0,E} ∼ 0.05) the maximum pressure in the ascending aorta, whereas they had no effect on the minimum pressure (Figure 6A and 6C). Aorta vessel radii and Young moduli were the parameters with the most effect on the maximum pressure (S_{R_0,E} ∼ 0.1). The peripheral resistance (R) variation at organ vessels affected the variation of the maximum and minimum pressure (S_R ∼ 0.25), whereas the upper part of the network (upper limb and neck vessels) had a slightly lower effect (0.05 < S_R < 0.15) on both minimum and maximum pressure at the ascending aorta (Figure 6D). To show the spatial distribution of the sensitivity indices, the first-order indices relating the maximum pressure variation to the variation of the Young modulus were converted to luminance values and plotted in a heat map (Figure 6F).
The choice of different inlet boundary conditions for the networks investigated did not seem to affect the main outcome of the study, as the computational time scaled linearly with the number of vessels for all networks, regardless of the conditions set at their inlets (Figure 5C). In addition, the same prediction accuracy, as measured by the MAPE (Figure 4), was achieved for a sample size proportional to the number of vessels rather than to the type of flow time function used. Nonetheless, the inlet boundary condition is a source of uncertainty because of its large variability between individuals. In future studies aimed at finding clinical biomechanical markers, the inlet flow function will be included as a GP input.

FIGURE 6
A to E, First-order sensitivity indices (S_i) relating the variation of minimum and maximum pressure (the outputs of the deterministic model, P_min and P_max, respectively) at the ascending aorta location (* marker in F) with the variation of the lumen radius R_0, vessel length ℓ, wall Young modulus E, peripheral resistance R, and peripheral compliance C for 5 groups of arteries. F, Diagram of the 61-vessel vascular network.
The 5 groups of arteries used for the sensitivity analysis are indicated by different colours

CONCLUSIONS
One-dimensional models of the cardiovascular system provide an accurate description of the physics of wave transmission in blood and can be used to provide realistic or patient-specific pulse and flow rate waveforms. Their mathematical description relies on the specification of a large number of parameters, which are often not readily available as typical or patient-specific values. Many of these cannot be specified as a constant value either, as they will vary within the physiological envelope of an individual. In this context, ranking and fixing of parameters through sensitivity analysis has been previously proposed as a way to focus on the most influential model inputs or simply to quantify the effect of input uncertainty on variables of interest. However, these operations may require many simulations, resulting in large or infeasible computational times.
A time-efficient approach to sensitivity analysis is proposed in this paper. A reduced number of model simulations were used to train a GP regression model. This emulator was able to mimic the vascular numerical model with a percentage error lower than 1% when compared with the actual model runs. The emulator running time was also much shorter than that of the simulator: the GP prediction phase for network IV took 0.19 seconds, as opposed to an average of 197 seconds required for a single run of the deterministic model. More generally, the adoption of a GP emulator in the sensitivity analysis framework allowed a reduction of computational time of at least 99.96% compared with the Monte Carlo analysis.
The scalability of this framework was tested by developing 4 vascular models of increasing complexity, ie, starting from a single bifurcation, the number of vessels was increased up to 61 in the case of a complete vascular model. In all four cases, the introduction of the GP reduced the number of simulator runs needed for the sensitivity analysis from d × 10³ to O(d).
The analysis of sensitivity indices allowed us to identify the location in the network of model parameters affecting maximum and minimum pressures in the ascending aorta. In particular, the minimum pressure was affected by changes in the peripheral resistance of organ arteries. The maximum pressure was sensitive to changes in the aorta Young modulus as well as in the upper limbs arteries.
The introduction of a GP regression model as an output generator for a mechanistic model is a novel approach in the cardiovascular research community. The conclusions drawn from the sensitivity analysis are not novel, but they confirm that the developed framework is sound and capable of capturing the intrinsic nonlinear behaviour of flows through a vascular network. Running times were drastically reduced when using the emulator approach, which allowed a thorough sensitivity analysis with accuracy comparable to a much more time-consuming approach. The study of model sensitivity indices gave an insight into how the inputs interact and could be used to study how input uncertainty propagates through to the outputs. The same approach has the potential to improve efficiency in the analysis of more complex and complete models of the cardiovascular system.