Keywords:

  • parameter estimation;
  • parameter selection;
  • sensitivity analysis;
  • hierarchical clustering;
  • dynamic optimization

Abstract

It is common that only a subset of the parameters of models can be accurately estimated. One approach for identifying a subset of parameters for estimation is to perform clustering of the parameters into groups based upon their sensitivity vectors. However, this has the drawback that uncertainty cannot be directly incorporated into the procedure as the sensitivity vectors are based upon the nominal values of the parameters. This article addresses this drawback by presenting a parameter set selection technique that can take uncertainty in the parameter space into account. This is achieved by defining sensitivity cones, where a sensitivity cone includes all sensitivity vectors of a parameter for different values, resulting from the uncertainty, in the parameter space. Parameter clustering can then be performed based upon the angles between the sensitivity cones, instead of the angle between sensitivity vectors. The presented technique is applied to two case studies. © 2013 American Institute of Chemical Engineers AIChE J, 60: 181–192, 2014


Introduction

Mathematical models composed of ordinary differential equations (ODEs) or differential algebraic equations (DAEs) are widely used to describe the behavior of dynamic systems, ranging from ecological systems,[1] power systems,[2] chemical processes,[3] biochemical reaction networks,[4] to pharmaceutical processes.[5] The accuracy of these models not only depends on the structure of the model which is determined by the physics, chemistry, and biology of the system, but also relies on adjustable parameters in the model, many of which are either taken from the literature or estimated using experimental data.

A number of studies have investigated various aspects of parameter estimation.[6-8] The parameter estimation problem can be formulated as an optimization problem which minimizes the norm of the error between data and model predictions while treating the equations of the system as constraints. The problem can then be solved by either a sequential approach or a simultaneous approach,[9] with both approaches offering advantages for certain classes of problems. However, before any parameter estimation is performed, it is important to determine if all parameters are numerically identifiable and, if not, then what subset of parameters can be accurately estimated.[10]

A variety of methods for parameter set selection based on sensitivity analysis have been proposed in the literature. These methods include, but are not limited to, genetic algorithms,[11] collinearity index methods,[1] column pivoting methods,[12] Gram–Schmidt orthogonalization methods,[13] and clustering methods.[14] A systematic scheme for parameter set selection is based on optimality criteria computed from the Fisher information matrix which is closely related to the parameter-output sensitivity matrix.[15]

It is important to note that all of the methods mentioned above for parameter set selection utilize local sensitivity analysis. The main drawback of local sensitivity analysis is that the sensitivity vectors are dependent on the parameter values that are not precisely known prior to parameter estimation. This may result in identification of parameter subsets that are suboptimal, which can have a significant impact on the model's prediction accuracy if the parameter uncertainty is large.[16] One alternative to local sensitivity analysis is global sensitivity analysis which simultaneously varies multiple parameters, often over a large range. Additionally, global sensitivity analysis can incorporate the uncertainty description of the parameters into the sensitivity analysis procedure. Unfortunately, the results from global sensitivity analysis, for example, the Morris method,[17] sampling-based method,[18] or a variance-based method,[19] are nontrivial to interpret for experimental design or parameter set selection[16] as they do not rely on the concept of sensitivity vectors.

This article addresses the challenges mentioned above for parameter set selection of dynamic systems under uncertainty. This is achieved by combining a hierarchical clustering method[14] and dynamic optimization techniques, in order to simultaneously vary all parameters over a large range, to quantify the effect of the uncertainty in the parameter space on the sensitivity vectors. As the uncertainty of the parameter values has an effect on the sensitivity vectors, a sensitivity cone is computed for each parameter. All sensitivity vectors associated with one parameter, which correspond to different values of the parameters according to their uncertainty, are contained inside a sensitivity cone. Unlike the local approach where parameters are clustered according to their sensitivity vectors,[14] the presented approach clusters the parameters based upon the sensitivity cones introduced in this work. Important challenges that are dealt with in this work arise from the computation of the sensitivity cones, not only because of the uncertainty description over all parameters, but also because not all cones have the same angle, as the sensitivity vectors of some parameters are significantly more affected by uncertainty than those corresponding to other parameters.

The article is structured as follows: preliminaries concerning sensitivity equations, hierarchical clustering, and dynamic optimization are presented. Then, the problem formulation and the solution approach for quantifying an upper bound on the uncertainty will be discussed. A new scheme for parameter selection is then introduced based upon hierarchical clustering of sensitivity cones. Two models will be presented in the case studies.

Preliminaries

Sensitivity equations

One form of dynamic system containing n states, m parameters, and l inputs can be represented as

  • display math(1)

where x ∈ R^n is the state vector, p ∈ R^m is the parameter vector, and u ∈ R^l is the input vector. The sensitivity equation is derived by taking the derivative of Eq. (1) with respect to the parameters and by applying the chain rule. The resulting dynamic sensitivity equation for the state xj and the parameter pi is given by

  • display math

which is equivalent to

  • display math(2)

Here, sij can be defined as the sensitivity of the state xj with respect to the parameter pi. As i ∈ {1,…,m} and j ∈ {1,…,n}, the total number of sensitivity equations is m × n. The extended model consists of the original model and the sensitivity equations, which corresponds to (m + 1) × n ODEs. The sensitivities can be calculated by integrating all of the ODEs simultaneously using an ODE solver. However, as the original model (Eq. (1)) is independent of the sensitivity equations and the sensitivity equations in Eq. (2) are independent for different pi, only 2n equations (Eqs. (1) and (2)) for a particular pi need to be integrated at a time.
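For reference, the chain-rule step described above can be written out explicitly. The block below is a sketch of the standard forward sensitivity equations, assuming Eq. (1) has the form dx/dt = f(x, p, u). The symbol f and the zero initial condition for the sensitivities (valid when the initial state does not depend on p) are assumptions, since the original displays are not reproduced in this text.

```latex
\begin{align}
\frac{dx_j}{dt} &= f_j(x, p, u), \qquad j = 1,\dots,n \\
\frac{d}{dt}\frac{\partial x_j}{\partial p_i}
  &= \sum_{k=1}^{n} \frac{\partial f_j}{\partial x_k}\,
     \frac{\partial x_k}{\partial p_i}
     + \frac{\partial f_j}{\partial p_i} \\
\frac{ds_{ij}}{dt}
  &= \sum_{k=1}^{n} \frac{\partial f_j}{\partial x_k}\, s_{ik}
     + \frac{\partial f_j}{\partial p_i}, \qquad s_{ij}(0) = 0
\end{align}
```

For a fixed i, the last line couples only the n sensitivities s_{i1}, …, s_{in} to the n states, which is why 2n equations suffice per parameter.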

One advantage of utilizing sensitivity equations rather than automatic differentiation to calculate the sensitivity is that the integration of the extended ODE model, given by Eqs. (1) and (2), has a higher guaranteed accuracy than automatic differentiation.[20-22] This is important as a small perturbation of a single model parameter might not generate a sufficiently large output variation, if this calculation is performed with automatic differentiation, whereas a relatively large perturbation might cause the dynamic system to become unstable.[21] Additionally, due to the unique structure of the extended ODE model, the sensitivities of all the states can be calculated with respect to only one parameter at a time by integrating 2n equations, which makes the scheme computationally inexpensive.
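As an illustration of integrating the extended model for one parameter at a time, the following minimal sketch uses a toy one-state system, dx/dt = −p·x, so that n = 1 and 2n = 2 equations are integrated. The toy model, the hand-written Runge–Kutta integrator, and all names are assumptions made for this example and are not taken from the article.

```python
import math

def rk4(f, y0, t_end, steps):
    """Integrate dy/dt = f(t, y) from 0 to t_end with the classical RK4 scheme."""
    h = t_end / steps
    t, y = 0.0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def extended_rhs(p):
    """Toy model dx/dt = -p*x plus its sensitivity s = dx/dp: ds/dt = -p*s - x."""
    def rhs(t, y):
        x, s = y
        return [-p * x, -p * s - x]
    return rhs

# Hypothetical values for the illustration only.
p, x0, t_end = 0.5, 2.0, 3.0
x, s = rk4(extended_rhs(p), [x0, 0.0], t_end, steps=300)
```

For this toy model the result can be checked against the closed-form solution x(t) = x0·exp(−p·t) and s(t) = −t·x0·exp(−p·t).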

After calculating the sensitivity variables using numerical integration, the sensitivity vector can be obtained by interpolating between the values for uniformly spaced time points t = (t1,t2,…,th)T, where h is the number of time points. The i-th column of the sensitivity matrix S represents the sensitivity vector of the output xj with respect to the parameter pi.

  • display math(3)

If r outputs are considered at the same time, that is, inline image, inline image, …, inline image, then the matrix from Eq. (3) becomes a tensor

  • display math(4)

where sij represents the sensitivity vector of the j-th output with respect to the i-th parameter. The sensitivity matrix shown in Eq. (3) or the tensor from Eq. (4) is normalized by dividing each sensitivity vector by inline image, where inline image is the steady-state value of its corresponding output and inline image is the nominal value of its corresponding parameter.

As the local sensitivity matrix is dependent on the initial value of the adjustable parameters, the performance of parameter set selection using this matrix is heavily influenced by the nominal values. To address this challenge, this work develops a formulation to incorporate the uncertainty in the parameter space for sensitivity analysis of a dynamic system with respect to the parameters. The approach for quantifying this uncertainty for the sensitivity vectors will be discussed in the next section.

Hierarchical clustering

Hierarchical clustering is a technique to group data based upon similarities of statistical properties. Results from hierarchical clustering are commonly presented in a dendrogram. A dendrogram does not only represent a single set of clusters, but rather a multilevel hierarchy, where clusters at one level are grouped into clusters to form the next level and the different levels correspond to degrees of similarity that are assigned to objects within clusters.[23] Strategies for hierarchical clustering generally fall into two types: agglomerative and divisive. Agglomerative clustering is a bottom-up approach where each object starts in its own cluster, and pairs of clusters are merged as the threshold value moves up the hierarchy. Divisive clustering is a top-down approach where all objects start in one cluster, and splits are performed recursively as the threshold value moves down the hierarchy.

For parameter set selection, hierarchical clustering is used to reduce the number of model parameters considered by determining several groups of parameters that are pairwise indistinguishable (i.e., they cannot be uniquely estimated for a reasonable level of noise in the measurements). It is then possible to only consider one parameter per group for estimation. Hierarchical clustering via agglomeration is implemented using the following steps:

  1. Calculate the distance between every pair of objects in the dataset. For a dataset containing m objects, there are m(m − 1)/2 pairs for which distances need to be measured.
  2. Group the objects into a dendrogram based on the distance.
  3. Determine a threshold distance cutoff value for the similarity which directly affects the number of clusters.

In step one, the choice of an appropriate distance metric is very important, as some objects may be close to one another according to one distance metric and farther away according to another, and this will significantly influence the clustering algorithm. For example, the Euclidean distance is commonly applied to data scattered in Euclidean space; however, for high-dimensional vectors, cosine similarity is usually used to measure the angle between the vectors.[24, 25] As explained previously, each column of the normalized sensitivity matrix represents the sensitivity vector for a corresponding parameter. A large angle between two sensitivity vectors implies a large distance (less similarity) between these vectors (shown in Figure 1a). As the sign of the direction of the sensitivity vector has no influence on parameter set selection, the cosine distance between two sensitivity vectors is defined using the following equation

  • display math(5)

Figure 1. Illustration of cosine distance between (a) sensitivity vectors and (b) two clusters of sensitivity vectors.

Here, inline image and inline image represent two different sensitivity vectors, and inline image is the defined cosine distance.
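A sign-insensitive cosine distance of the kind described here can be sketched as follows. The exact form of Eq. (5) is not reproduced in this text, so the expression 1 − |cos θ| is an assumption consistent with the statement that the sign of a sensitivity vector does not matter. The function name is hypothetical.

```python
import math

def cosine_distance(a, b):
    """Return 1 - |cos(angle)| between vectors a and b (sign-insensitive)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - abs(dot) / (norm_a * norm_b)

# Antiparallel vectors are treated as identical directions (distance 0),
# orthogonal vectors are maximally distant (distance 1).
d_antiparallel = cosine_distance([1.0, 0.0], [-2.0, 0.0])
d_orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])
```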

In step two, the data are clustered using a dendrogram based on the distance defined in step one. Commonly used linkage criteria between two clusters include complete linkage clustering, where the longest distance between two clusters is found, single-linkage clustering, where the shortest distance is found, average linkage clustering, and minimum energy clustering.[23] For parameter set selection, parameters need to be clustered into several groups such that the parameters within a group are pairwise indistinguishable; therefore, the complete linkage criterion is preferred.[14]
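The agglomerative procedure with complete linkage and a distance cutoff can be sketched in a few lines. This is a simple quadratic-search illustration rather than an efficient implementation, and the distance matrix below is made up for the example.

```python
def complete_linkage_clusters(dist, cutoff):
    """Agglomerative clustering with complete linkage.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Merging stops once the closest pair of clusters is farther apart
    than cutoff.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the two most dissimilar members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        d, a, b = min(pairs)
        if d > cutoff:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Hypothetical pairwise distances between four objects.
dist = [[0.00, 0.01, 0.02, 0.60],
        [0.01, 0.00, 0.03, 0.50],
        [0.02, 0.03, 0.00, 0.70],
        [0.60, 0.50, 0.70, 0.00]]
groups = complete_linkage_clusters(dist, cutoff=0.05)
```

Because complete linkage uses the largest member-to-member distance, every pair inside a returned cluster is guaranteed to be no farther apart than the cutoff, which matches the requirement that parameters within a group be pairwise indistinguishable.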

A threshold cutoff value for the similarity is determined in the last step. This threshold value affects the number of clusters that the sensitivity vectors are partitioned into (shown in Figure 1b). As a rule of thumb, the cutoff value is chosen to be small enough so that the angle between any pair of sensitivity vectors within a cluster is small. However, it should be noted that if the uncertainty in the parameter space is large, then the parameters cannot be distinguished no matter how small the cutoff value is chosen. This point will be discussed in more detail in the next section.

One of the key contributions of this work is to translate the uncertainty of the parameters into the angle of each sensitivity cone and then use the angles between the sensitivity cones for clustering. Hierarchical clustering will be performed based on the corresponding cosine distance of the angle between sensitivity cones.

Dynamic optimization

Dynamic optimization refers to a category of optimization problems that address time-varying systems. Generally, these problems seek to maximize or minimize an objective function by determining a group of input profiles that may change over time, while the dynamic model, described by initial-value ODEs or DAEs, is considered as constraints. It is worth noting that, for parameter estimation applications, the optimization problem seeks to minimize some measure of the prediction error and the parameters are assumed to be constant over time. As DAE systems can be treated as ODE systems with equality constraints imposed on the state variables, all of the dynamic systems mentioned below are described by ODEs. The mathematical formulation of the dynamic optimization problem is given by

  • display math(6)
  • display math(7)
  • display math(8)

Here, y is the performance index to be optimized, u(t) is the input profile, and p are the parameters. Equation (7) represents the ODE system over the time horizon inline image with states x and known initial values x0. The path constraints are denoted by Ψ and the terminal constraints by Ω.

Two numerical approaches can be used to solve dynamic optimization problems. The sequential approach parameterizes the inputs using a finite number of decision variables, integrates the system states iteratively to compute the performance index, and then uses an optimization algorithm to update the values of the decision variables. Although the sequential approach is straightforward to implement, it can be very computationally expensive, especially when dealing with a large number of degrees of freedom and inequality path constraints.[26] The simultaneous approach parameterizes the input variable and also discretizes the dynamic system, typically using some collocation method. The discretized system is then included as algebraic constraints in a large-scale optimization problem. This approach increases the size of the optimization problem considerably, but the performance can be significantly better as the model (represented by the discretized equality constraints) is solved simultaneously with the optimization problem. The discretized problem can be especially large for stiff systems due to the fine discretization required, and powerful Nonlinear Programming (NLP) solvers must be used to deal with these problems.[27] All the dynamic optimization problems presented in this work are solved using the simultaneous approach because of its computational efficiency.

Parameter Set Selection for Dynamic System Under Uncertainty

Visualization of the effect of uncertainty in the parameter space on the sensitivity vectors

As discussed in the previous section, the distance of each pair of sensitivity vectors is reflected by the angle between the vectors in a high-dimensional Euclidean space. A large angle represents a long distance (low similarity) and a small angle represents a short distance (high similarity). Based on the agglomerative clustering strategy and the complete linkage criterion, sensitivity vectors can be clustered into different groups in a dendrogram. If parameter uncertainty is considered for the dynamic system, the sensitivity vector generated from the sensitivity equation for each parameter will not be a single fixed vector but a group of vectors distributed around the nominal vector, as the sensitivity vector changes for perturbations in the parameter values corresponding to the uncertainty range (shown in Figure 2a). A sensitivity cone can be defined where all the sensitivity vectors corresponding to one parameter for different values of all parameters lie within the cone. Any vectors inside the sensitivity cone cannot be distinguished from each other due to uncertainty in the parameter values. Furthermore, vectors associated with different parameters may have sensitivity cones that overlap due to uncertainty (shown in Figure 2b). One result of this is that parameters that have sensitivity cones which overlap, or have only small angles between them, need to be grouped in the same cluster.

Figure 2. Visualization of effect of uncertainty in parameter space on sensitivity vectors.

In order to capture the largest uncertainty for each parameter, inline image is defined as the largest angle between the vectors inside the sensitivity cone and their corresponding nominal vector

  • display math(9)

Here, si(p0) and si(p) are the sensitivity vectors of the i-th parameter at the nominal values p0, and at other values, p, chosen from the uncertainty range of all parameters, respectively. As the sign (or direction) of the sensitivity vector has no impact on the parameter set selection, only the absolute value of inline image is considered, and inline image is an acute or right angle.

It can easily be seen that two sensitivity cones will not overlap when inline image (illustrated in Figure 2b), thus the corresponding parameters p1 and p2 can be distinguished as long as the cutoff value is small enough. Conversely, if inline image2, then the two sensitivity vectors may be colinear, and the two parameters cannot be distinguished no matter how small the cutoff value is. It should be noted that the condition inline image is conservative as it accounts for the worst possible case that can happen for a given level of uncertainty in the parameter values. As the sum of inline image and inline image will always be compared with inline image, an effective angle inline image is defined for each pair of parameters pi and pj using Eq. (10). The corresponding cosine distance between two sensitivity cones becomes inline image. Hierarchical clustering using the complete linkage criterion is then performed based on the cosine distance of the effective angle between each pair of sensitivity cones. It is important to note that calculating the pairwise cosine distance of the effective angle between two sensitivity cones and then clustering them are two independent steps. Both steps are needed to group all sensitivity cones, where any pairwise distance is smaller than a specified cutoff value, into one cluster.

  • display math(10)
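The effective angle and the resulting cone-to-cone distance can be illustrated with a short sketch. Since Eq. (10) is not reproduced in this text, the form used below, the angle between the nominal vectors minus the two cone angles and clipped at zero when the cones overlap, is an assumption inferred from the surrounding description. All function names are hypothetical.

```python
import math

def acute_angle(a, b):
    """Angle in radians between the lines spanned by a and b, in [0, pi/2]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(min(1.0, abs(dot) / (na * nb)))

def cone_distance(s_i, s_j, alpha_i, alpha_j):
    """Cosine distance of the assumed effective angle between two cones.

    s_i, s_j are nominal sensitivity vectors; alpha_i, alpha_j are the
    cone angles in radians.
    """
    theta = acute_angle(s_i, s_j)
    effective = max(theta - alpha_i - alpha_j, 0.0)  # overlapping cones give 0
    return 1.0 - math.cos(effective)

# Two nominal vectors that are 60 degrees apart (hypothetical example).
s1 = [1.0, 0.0]
s2 = [math.cos(math.radians(60)), math.sin(math.radians(60))]
d_separate = cone_distance(s1, s2, math.radians(10), math.radians(20))
d_overlap = cone_distance(s1, s2, math.radians(40), math.radians(30))
```

In the first call the cones leave an effective angle of 30 degrees between them. In the second call the cone angles sum to more than 60 degrees, so the distance collapses to zero and no cutoff value can separate the two parameters.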

This approach incorporates the uncertainty of the parameter values into the parameter set selection procedure while retaining existing methods for generating a local sensitivity matrix and hierarchical clustering. The next subsection will describe the numerical implementation of this approach.

Problem formulation

Each sensitivity cone can be described by the sensitivity vector corresponding to the nominal values of the parameters and the angle between the cone surface and the nominal sensitivity vector. For each specific parameter inline image ( inline image is a fixed index here), this angle inline image can be calculated by solving the following dynamic optimization problem

  • display math(11)
  • display math
  • display math(12)
  • display math(13)
  • display math(14)

Here, y is the square of the cosine of the angle between the sensitivity vector and the nominal vector. The square is used rather than the absolute value so that the objective function will be continuously differentiable (a common requirement for NLP solvers). Equation (12) is the original dynamic system and Eq. (13) represents the sensitivity equations of the state xj for the parameter inline image. In this problem, the input u is fixed and treated as a known parameter, whereas all the parameters p are perturbed to determine the maximum angle inline image. The perturbation range of p can be specified to reasonable values according to its physical meaning or prior knowledge. It is worth noting that p can be either simultaneously or partially perturbed (e.g., fix some parameters at their nominal values) based on different conditions. For m different parameters, there will be m different maximum angles inline image that describe the largest possible angle between any vector inside the sensitivity cone and its corresponding nominal vector.
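To make the objective concrete, the sketch below estimates a cone angle for a toy model by scanning a grid over the uncertainty box and keeping the smallest squared cosine. This brute-force scan is a stand-in for the dynamic optimization problem of Eqs. (11)-(14), not the article's method, and it only yields a lower bound on the true maximum angle. The toy model x(t) = p2·exp(−p1·t), its closed-form sensitivity, and all names are assumptions for illustration.

```python
import math

def sensitivity_p1(p1, p2, times):
    """Closed-form sensitivity dx/dp1 of the toy model x(t) = p2*exp(-p1*t)."""
    return [-t * p2 * math.exp(-p1 * t) for t in times]

def cos_squared(a, b):
    """Squared cosine of the angle between a and b (smooth, sign-insensitive)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot * dot / (sum(x * x for x in a) * sum(y * y for y in b))

def cone_angle_by_grid(nominal, rel_uncertainty, times, points=11):
    """Scan a grid over the uncertainty box; return the largest angle found."""
    s_nom = sensitivity_p1(nominal[0], nominal[1], times)
    worst = 1.0  # smallest cos^2 seen so far
    for a in range(points):
        for b in range(points):
            scale1 = 1 - rel_uncertainty + 2 * rel_uncertainty * a / (points - 1)
            scale2 = 1 - rel_uncertainty + 2 * rel_uncertainty * b / (points - 1)
            s = sensitivity_p1(nominal[0] * scale1, nominal[1] * scale2, times)
            worst = min(worst, cos_squared(s_nom, s))
    return math.acos(math.sqrt(min(1.0, worst)))

times = [0.1 * k for k in range(1, 101)]
alpha_small = cone_angle_by_grid((1.0, 2.0), 0.05, times)  # 5% uncertainty
alpha_large = cone_angle_by_grid((1.0, 2.0), 0.25, times)  # 25% uncertainty
```

As expected, the wider uncertainty range produces the larger cone angle. In this toy model p2 only scales the sensitivity vector, so the angle is driven entirely by p1.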

Solution method

The dynamic optimization problem formulated in Eqs. (11)-(14) is solved using the simultaneous approach, which discretizes the extended ODE model using a collocation method. The discretization method used in this study is a Legendre–Gauss–Radau collocation on finite elements. Using this discretization technique the differential equation

  • display math(15)

is transformed to a set of algebraic equations given by

  • display math(16)

Here, x denotes the state vector consisting of entries of the states at different points in time t. In the algebraic equations, i is the finite element number, j and k are the collocation point numbers, and CP is the set of collocation points. The finite element step length is h, which is specified based on the tradeoff between the accuracy needed for the discretization, which increases with smaller step sizes, and an acceptable size of the optimization problem, which becomes computationally more expensive for smaller step sizes. To obtain a more accurate discretization, the finite element length can be reduced, or a higher order collocation strategy can be used. Although variant-step collocation can be used if the system stiffness varies dramatically across the entire time horizon, this work uses a fixed-step three-point Radau collocation. The Radau collocation coefficient matrix for this method (shown in Eq. (17)) is found from computing the roots of the Lagrange interpolating polynomials.

  • display math(17)
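The idea of turning a differential equation into algebraic collocation equations can be shown on a single linear test equation, dx/dt = −p·x. The sketch below uses the three Radau points on the unit interval together with the element start point and builds the matrix of Lagrange-basis derivatives. Whether this matrix matches the layout of Eq. (17) is an assumption, and the test equation, names, and small linear solver are illustrative only.

```python
import math

# Element start plus the three Radau points on the unit interval.
TAU = [0.0, (4 - math.sqrt(6)) / 10, (4 + math.sqrt(6)) / 10, 1.0]

def lagrange_derivative(j, tau):
    """Derivative at tau of the j-th Lagrange basis polynomial built on TAU."""
    total = 0.0
    for m in range(len(TAU)):
        if m == j:
            continue
        term = 1.0 / (TAU[j] - TAU[m])
        for l in range(len(TAU)):
            if l != j and l != m:
                term *= (tau - TAU[l]) / (TAU[j] - TAU[l])
        total += term
    return total

# A[k][j] is the derivative of basis polynomial j at collocation point k + 1.
A = [[lagrange_derivative(j, TAU[k]) for j in range(4)] for k in range(1, 4)]

def solve(M, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def radau_step(x0, p, h):
    """One finite element for dx/dt = -p*x: sum_j A[k][j]*x_j = -h*p*x_k."""
    M = [[A[k][j + 1] + (h * p if j == k else 0.0) for j in range(3)]
         for k in range(3)]
    rhs = [-A[k][0] * x0 for k in range(3)]
    return solve(M, rhs)[-1]  # state at the element end (tau = 1)

x = 1.0
for _ in range(5):  # five elements of length h = 0.2 cover t in [0, 1]
    x = radau_step(x, p=1.0, h=0.2)
```

For a nonlinear model the same collocation equations become nonlinear equality constraints, which is why they are handed to an NLP solver rather than solved by elimination as in this linear toy case.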

Discretizing the dynamic optimization problem creates a nonlinear programming problem that can be solved using standard NLP solvers. Here, the optimization problem is formulated in A Mathematical Programming Language (AMPL),[28] which is an algebraic modeling language that provides first- and second-order derivatives through automatic differentiation. The problem is solved using the interior-point nonlinear solver IPOPT,[29] a software package for large-scale nonlinear optimization with excellent convergence properties (q-quadratic). It is important to note that, when using interior-point methods, the computation time required for solving the linear system to calculate the Newton steps can dominate the total solution time. When using IPOPT, the choice of the linear solver can dramatically affect the solution speed. For the results in this article, MA86[30] was used as the linear solver in IPOPT. MA86 is designed to solve large sparse symmetric linear systems in parallel. Other solvers can also be used and their convergence may depend upon a variety of factors, including, but not limited to, the conditioning of the problem and the solver itself.

Initialization is another important consideration when addressing nonlinear optimization problems using a simultaneous approach. If a problem is initialized far from its optimum or in some infeasible region, the optimizer may require a very large number of iterations to converge, whereas a good initialization might require only a few iterations. A two-step approach is used in this work to provide a reasonable initialization for this problem: the model is first simulated using the nominal parameter values, and then the values of the states computed using these nominal parameter values are used to initialize the optimization problem.

Schematic for parameter set selection under uncertainty

Now that all the preliminaries have been introduced, and the technical details for quantifying the effect of uncertainty on the sensitivity vectors have been discussed, the steps of the parameter set selection algorithm under uncertainty are summarized in Table 1. Step 2 represents an optional preliminary screening procedure for reducing the parameter set, as parameters with small sensitivity vector lengths are unlikely to be chosen for estimation. Step 3 solves an optimization problem to compute the sensitivity cones of all parameters still under consideration for parameter set selection. Step 4 performs clustering on the basis of the cosine distance of the effective angle between two cones which is defined in Eq. (10). An appropriately small cutoff value is chosen for partitioning the clusters in Step 5. In Step 6, the parameters with the largest nominal sensitivity vectors of all groups are chosen as the representatives for these groups of parameters. These selected parameters form the parameter subset that should be estimated.

Table 1. Algorithm for Parameter Set Selection under Uncertainty
Step 1. Calculate the normalized sensitivity matrix for the nominal parameter values.
Step 2 (optional). Fix parameters whose nominal sensitivity vectors have small lengths (e.g., less than 5% of the largest one) at their nominal values.
Step 3. Calculate the maximum angle inline image of each sensitivity cone associated with parameter pi that is not fixed at its nominal value in Step 2.
Step 4. Cluster the parameters into a dendrogram by hierarchical clustering on the basis of the pairwise cosine distance between two cones inline image, where inline image.
Step 5. Choose a cutoff value (usually smaller than 0.05) to partition the parameters into n different clusters.
Step 6. Select the parameters with the largest nominal sensitivity vectors of each of the n clusters as representatives from these clusters. These n parameters form the subset that needs to be estimated.
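The final selection step can be sketched as picking, within each cluster, the parameter whose nominal sensitivity vector has the largest norm. The vectors below are hypothetical numbers chosen only to exercise the rule, and the function name is an assumption.

```python
import math

def select_representatives(clusters, nominal_vectors):
    """For each cluster, pick the parameter with the longest nominal vector."""
    def norm(name):
        return math.sqrt(sum(v * v for v in nominal_vectors[name]))
    return [max(cluster, key=norm) for cluster in clusters]

# Hypothetical nominal sensitivity vectors, not taken from the article.
vectors = {"p1": [0.1, 0.2], "p2": [1.0, -0.5], "p3": [0.4, 0.9],
           "p4": [0.2, 0.1], "p5": [0.0, 0.7]}
clusters = [["p1", "p3", "p4"], ["p2"], ["p5"]]
selected = select_representatives(clusters, vectors)
```

With these made-up vectors, one parameter is returned per cluster, so the size of the estimated subset equals the number of clusters produced by the cutoff.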

The limitation of the proposed approach is that, as the approach considers the worst situation, the dendrogram generated from clustering the sensitivity cones will be quickly pushed toward zero when dealing with large dynamic systems, even for relatively small uncertainty ranges. Modifications dealing with this limitation will be investigated in future work.

Case Studies

This section presents two examples that illustrate the details of the technique developed previously. The first example is a dynamic model of a continuous stirred-tank reactor (CSTR) involving an exothermic reaction. Two states are measured and the D-optimality criterion is used to evaluate the performance of different combinations of parameter sets. The second example is a signal transduction pathway model representing a biochemical reaction network in liver cells exposed to the cytokine IL-6. A single state is measured and the performance of parameter set selection is evaluated using cross-validation involving a norm of the prediction error.

CSTR model

This model describes an exothermic CSTR in which a first-order reaction, A → B, is taking place

  • display math(18)

The model is described by the following differential equations

  • display math(19)

The three states of the system are the concentration of component A, the temperature inside the reactor, and the temperature of the cooling jacket. The temperature of the reactor and the cooling jacket are measured as the outputs. There are 16 parameters in the model and their nominal values are taken from the reference[10] and are listed in a table in the Appendix. Figure 3 shows a simulation of the system using nominal parameter values and the initial conditions listed in the Appendix.

Figure 3. Simulation of CSTR model at nominal parameter values.

The values of the parameters ρ, Cp, k, E, R, ΔH, and h are not known as precisely as the parameters fixed by the design of the process. Therefore, some of these parameters need to be estimated to improve the prediction capability of this dynamic model.

Considering that ρ and Cp never appear independently, but only in the form of their product, ρCp, and E and R only ever appear as (−E/R), these two expressions are treated as two new parameters rather than as four parameters. With this substitution, there are a total of five parameters that can be considered for estimation. They are renamed and listed in Table 2. It is worth noting that if there are only five parameters in a system then the correlations among these parameters are unlikely to cause significant over-parameterization. However, this simple model is useful for illustrating the utility of the presented technique for parameter set selection under uncertainty. The application of the developed technique to a larger model will be presented in the second case study.

Table 2. Parameters that Need to be Estimated in CSTR Model
Original parameter   k    ρCp   (−E/R)   ΔH   h
New parameter        p1   p2    p3       p4   p5
Nominal value        2.51590−2551601000

First, an extended ODE model is formulated by combining the original model and the sensitivity equations. Equation (20) is the original CSTR model and Eq. (21) shows the sensitivity equations based on the output for each parameter pi (i = 1, …, 5). For different parameters, the differences between each group of sensitivity equations are reflected in the last terms shown in Eq. (21), that is, inline image and inline image, while the remaining portions of Eq. (21) are the same for all values of i = 1, …, 5. According to simulations (Figure 3), the system reaches steady state after approximately 10 h, so the sensitivity vector is generated by interpolating the sensitivity variables from zero to 10 h with a step size of 0.1 h. Step 2 in Table 1 is skipped here as there are only five parameters and no further reduction of the parameter space is needed. Assuming that there is no uncertainty, the parameters are grouped in a dendrogram, as shown in Figure 4a, using complete linkage clustering. The parameters can be grouped into different numbers of clusters, based upon the cutoff value of the cosine distance. For instance, if a cutoff of 0.05 is chosen, the parameters can be grouped into three distinguishable clusters containing {p1, p3, p4}, {p2}, and {p5}. The corresponding parameter subset that should be estimated is {p2, p3, p5} as these are the parameters with the largest norms of the nominal sensitivity vectors in each of the three clusters. If the cutoff value is increased to 0.35, the parameters can now only be grouped into two clusters containing {p1, p2, p3, p4} and {p5}, where p2 and p5 should be estimated.

  • display math(20)

Figure 4. Hierarchical clustering of sensitivity cones under different uncertainty in CSTR model.

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


For i = 1, …, 5

  • display math(21)
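The idea of integrating a model together with its sensitivity equations can be illustrated on a much simpler system. The sketch below does not use the CSTR equations; it assumes a hypothetical first-order decay model dx/dt = −p·x, whose sensitivity s = ∂x/∂p obeys ds/dt = −p·s − x, and integrates the extended system with a classical Runge-Kutta scheme on the same 0 to 10 h grid with a 0.1 h step. All function names are illustrative.

```python
import math


def extended_rhs(state, p):
    """Right-hand side of the extended system [x, s] for dx/dt = -p*x.

    s = dx/dp obeys ds/dt = (df/dx)*s + df/dp = -p*s - x.
    """
    x, s = state
    return (-p * x, -p * s - x)


def rk4_step(state, p, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return tuple(b + scale * k for b, k in zip(base, slope))

    k1 = extended_rhs(state, p)
    k2 = extended_rhs(shifted(state, k1, h / 2), p)
    k3 = extended_rhs(shifted(state, k2, h / 2), p)
    k4 = extended_rhs(shifted(state, k3, h), p)
    return tuple(
        y + h / 6 * (a + 2 * b + 2 * c + d)
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def sensitivity_vector(p, x0=1.0, t_end=10.0, h=0.1):
    """Sample s(t) on a uniform grid from 0 to t_end (inclusive)."""
    n_steps = round(t_end / h)
    state = (x0, 0.0)  # the sensitivity starts at zero
    samples = [state[1]]
    for _ in range(n_steps):
        state = rk4_step(state, p, h)
        samples.append(state[1])
    return samples


vec = sensitivity_vector(p=0.5)
print(len(vec))
```

For this toy model the sensitivity has the closed form s(t) = −t·x0·exp(−p·t), which gives a convenient check on the numerical result. In the article, one such vector per parameter forms a column of the sensitivity matrix.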

The number of clusters is also influenced by the uncertainty range in the parameter space. The larger the uncertainty range, the smaller the cosine distance between each pair of sensitivity cones, so fewer clusters can be partitioned using the same cutoff value. Furthermore, once two sensitivity cones overlap, the corresponding two parameters cannot be distinguished no matter how small a cutoff value is chosen. In this example, the largest angle of each sensitivity cone for different uncertainty ranges is listed in Table 3, the cosine distance between each pair of sensitivity cones is listed in Table 4, and the dendrograms for different uncertainty ranges are shown in Figure 4. It can be seen from Figure 4 that as the uncertainty range increases, the dendrogram is pushed toward zero and the sensitivity cones start to overlap, resulting in clusters of indistinguishable parameters. For example, if 15% uncertainty is considered, the cosine distances between the sensitivity cones of p1, p3, and p4 are zero. Thus, no matter how small the cutoff value is, the parameters can at most be partitioned into three clusters (i.e., {p1, p3, p4}, {p2}, and {p5}). If the uncertainty is increased to 50%, at most two clusters can be partitioned. Therefore, both the cutoff value and the uncertainty range determine the number of clusters, and the uncertainty range determines the largest number of clusters that can be partitioned.
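How a sensitivity cone widens with the uncertainty range can be illustrated on a toy output. The sketch below rests on several assumptions that are not from the article: a hypothetical output y(t) = p2·exp(−p1·t) with analytic sensitivities, a cone angle taken as the largest angle between the nominal sensitivity vector and a perturbed one, and a coarse grid scan over the uncertainty box in place of the dynamic optimization used in this work.

```python
import math
from itertools import product

TIMES = [0.1 * k for k in range(101)]  # 0 to 10 h, step 0.1 h


def sensitivities(p1, p2):
    """Analytic sensitivity vectors of y(t) = p2*exp(-p1*t) w.r.t. p1 and p2."""
    s1 = [-p2 * t * math.exp(-p1 * t) for t in TIMES]
    s2 = [math.exp(-p1 * t) for t in TIMES]
    return s1, s2


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # guard against round-off
    return math.degrees(math.acos(cos))


def cone_angles(nominal, rel_range, points=5):
    """Largest angle between the nominal and perturbed sensitivity vectors.

    Each parameter is scanned on a grid over nominal*(1 +/- rel_range).
    """
    nominal_s = sensitivities(*nominal)
    fractions = [-rel_range + 2 * rel_range * k / (points - 1) for k in range(points)]
    worst = [0.0, 0.0]
    for f1, f2 in product(fractions, repeat=2):
        perturbed = sensitivities(nominal[0] * (1 + f1), nominal[1] * (1 + f2))
        for i in range(2):
            worst[i] = max(worst[i], angle_deg(nominal_s[i], perturbed[i]))
    return worst


for r in (0.0, 0.1, 0.3, 0.5):
    print(r, cone_angles((0.5, 2.0), r))
```

A grid scan only gives a lower bound on the true largest angle, which is one reason an optimization-based computation is preferable for models with many parameters.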

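Table 3. Maximum Angle (°) of the Sensitivity Cones for Different Uncertainty Ranges

| Uncertainty range (%) | p1 | p2 | p3 | p4 | p5 |
| --- | --- | --- | --- | --- | --- |
| 5 | 7.476 | 1.404 | 7.476 | 7.432 | 3.032 |
| 10 | 11.96 | 1.621 | 12.23 | 11.96 | 4.211 |
| 15 | 16.72 | 1.985 | 17.14 | 16.72 | 5.438 |
| 20 | 21.52 | 2.292 | 21.94 | 21.50 | 6.686 |
| 25 | 26.14 | 2.688 | 26.47 | 26.14 | 7.904 |
| 30 | 30.54 | 3.032 | 30.67 | 30.54 | 9.069 |
| 50 | 48.91 | 7.343 | 47.49 | 48.94 | 13.49 |
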
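Table 4. Cosine Distance Between Each Pair of Sensitivity Cones (i.e., inline image) for Different Uncertainty Ranges

| Uncertainty range (%) | p1, p2 | p1, p3 | p1, p4 | p1, p5 | p2, p3 | p2, p4 | p2, p5 | p3, p4 | p3, p5 | p4, p5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.2972 | 0.0020 | 0.0000 | 0.9336 | 0.2554 | 0.2972 | 0.9596 | 0.0020 | 0.9429 | 0.9336 |
| 5 | 0.1958 | 0.0000 | 0.0000 | 0.7527 | 0.1613 | 0.1962 | 0.8824 | 0.0000 | 0.7618 | 0.7535 |
| 10 | 0.1498 | 0.0000 | 0.0000 | 0.6583 | 0.1172 | 0.1498 | 0.8582 | 0.0000 | 0.6627 | 0.6583 |
| 15 | 0.1061 | 0.0000 | 0.0000 | 0.5621 | 0.0778 | 0.1061 | 0.8308 | 0.0000 | 0.5640 | 0.5621 |
| 20 | 0.0698 | 0.0000 | 0.0000 | 0.4699 | 0.0471 | 0.0699 | 0.8041 | 0.0000 | 0.4717 | 0.4701 |
| 25 | 0.0413 | 0.0000 | 0.0000 | 0.3863 | 0.0245 | 0.0413 | 0.7766 | 0.0000 | 0.3893 | 0.3863 |
| 30 | 0.0210 | 0.0000 | 0.0000 | 0.3127 | 0.0102 | 0.0210 | 0.7510 | 0.0000 | 0.3178 | 0.3127 |
| 50 | 0.0000 | 0.0000 | 0.0000 | 0.0849 | 0.0000 | 0.0000 | 0.6068 | 0.0000 | 0.0993 | 0.0848 |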
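The statement that the uncertainty range fixes the largest attainable number of clusters can be checked directly against Table 4. The sketch below is illustrative only: it treats a tabulated distance of 0.0000 as overlapping cones (the printed values are rounded to four decimals) and groups overlapping parameters with a small union-find. The names `PAIRS`, `TABLE4`, and `overlap_groups` are not from the article.

```python
NAMES = ["p1", "p2", "p3", "p4", "p5"]
PAIRS = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p1", "p5"), ("p2", "p3"),
         ("p2", "p4"), ("p2", "p5"), ("p3", "p4"), ("p3", "p5"), ("p4", "p5")]

# Selected rows of Table 4: uncertainty range (%) -> distances in PAIRS order.
TABLE4 = {
    5:  [0.1958, 0.0000, 0.0000, 0.7527, 0.1613, 0.1962, 0.8824, 0.0000, 0.7618, 0.7535],
    15: [0.1061, 0.0000, 0.0000, 0.5621, 0.0778, 0.1061, 0.8308, 0.0000, 0.5640, 0.5621],
    30: [0.0210, 0.0000, 0.0000, 0.3127, 0.0102, 0.0210, 0.7510, 0.0000, 0.3178, 0.3127],
    50: [0.0000, 0.0000, 0.0000, 0.0849, 0.0000, 0.0000, 0.6068, 0.0000, 0.0993, 0.0848],
}


def overlap_groups(distances, names):
    """Group parameters whose sensitivity cones overlap (tabulated distance zero)."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for (a, b), d in zip(PAIRS, distances):
        if d == 0.0:
            parent[find(a)] = find(b)  # join the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


for level, row in TABLE4.items():
    groups = overlap_groups(row, NAMES)
    print(level, len(groups), groups)
```

The number of groups is the upper bound on the cluster count for that uncertainty range; any positive cutoff can only merge these groups further.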

The D-optimality criterion,[31] which is the most popular experimental design criterion, is used in this example to verify the performance of the parameter set selection. This criterion minimizes the volume of the confidence ellipsoid with an arbitrary fixed confidence level for a least-squares estimator. If the uncertainty of the parameters is close to zero, the criterion value is obtained from the nominal sensitivity matrix. However, if a larger uncertainty is considered, for example, 15% or 50%, a Monte Carlo simulation is performed in which 10,000 parameter sets are chosen randomly within the uncertainty range, and the average criterion value is obtained from the sensitivity matrices computed for these different parameter values. The magnitude of the criterion value therefore reflects the performance of the parameter selection for estimation. As there are only five parameters in this model, it is possible to compare all of the 2^5 − 1 = 31 different combinations of parameter subsets, which are listed in Table 5.

Several trends can clearly be seen in the data from Table 5. One trend is that parameter sets that consist of only one or two parameters are hardly affected by the uncertainty. This is in stark contrast to parameter sets with more parameters, where the criterion value decreases significantly for larger uncertainty. For example, estimating all five parameters results in a criterion value that is three orders of magnitude lower for 50% uncertainty than for no uncertainty (the values shown are on a logarithmic scale). What is particularly striking is that this general trend holds for every set that can potentially be estimated from the five parameters under consideration. A second point worth noting is that parameter sets considered good by the procedure introduced in this work are significantly less affected by increases of the uncertainty than other sets with the same number of parameters. Lastly, parameter sets that are found to be optimal by the presented procedure, that is, {p2} for estimating one parameter, {p2, p5} for estimating two parameters, and {p2, p3, p5} for estimating three parameters, are indeed found to have by far the largest criterion values among all sets with the same number of parameters. It is important to note that there is likely no perfect parameter set for estimation of uncertain systems, especially as the choice of an appropriate cutoff value will be part of future work. However, it is important to determine one or more potential sets of parameters that are good candidates for estimation and to differentiate those from other sets that would clearly result in worse prediction accuracy.

Table 5. Different Combinations of Parameter Subsets and the Corresponding Criterion Values(a) (the Top Five Choices for Each Uncertainty Level are Highlighted)

| Parameter subset | 0% | 15% | 30% | 50% |
| --- | --- | --- | --- | --- |
| {p1} | −6.025 | −6.025 | −6.025 | −6.020 |
| {p2} | 3.068 | 3.067 | 3.061 | 3.049 |
| {p3} | −3.726 | −3.728 | −3.733 | −3.747 |
| {p4} | −6.037 | −6.035 | −6.032 | −6.024 |
| {p5} | 0.3060 | 0.3075 | 0.3119 | 0.3240 |
| {p1, p2} | −3.253 | −3.267 | −3.309 | −3.414 |
| {p1, p3} | −12.09 | −12.11 | −12.18 | −12.29 |
| {p1, p4} | −15.93 | −16.02 | −16.61 | −18.18 |
| {p1, p5} | −5.721 | −5.720 | −5.716 | −5.702 |
| {p2, p3} | −1.012 | −1.029 | −1.085 | −1.234 |
| {p2, p4} | −3.270 | −3.282 | −3.321 | −3.420 |
| {p2, p5} | 3.374 | 3.373 | 3.369 | 3.362 |
| {p3, p4} | −12.22 | −12.23 | −12.27 | −12.34 |
| {p3, p5} | −3.421 | −3.422 | −3.423 | −3.429 |
| {p4, p5} | −5.732 | −5.730 | −5.723 | −5.706 |
| {p1, p2, p3} | −10.29 | −10.38 | −10.56 | −10.87 |
| {p1, p2, p4} | −13.32 | −13.41 | −14.09 | −16.59 |
| {p1, p2, p5} | −2.955 | −2.968 | −3.009 | −3.108 |
| {p1, p3, p4} | −22.39 | −22.47 | −23.14 | −25.62 |
| {p1, p3, p5} | −11.79 | −11.82 | −11.88 | −11.99 |
| {p1, p4, p5} | −15.64 | −15.72 | −16.36 | −18.71 |
| {p2, p3, p4} | −10.70 | −10.77 | −10.87 | −11.06 |
| {p2, p3, p5} | −0.7140 | −0.7308 | −0.7839 | −0.9277 |
| {p2, p4, p5} | −2.972 | −2.983 | −3.020 | −3.114 |
| {p3, p4, p5} | −11.92 | −11.93 | −11.97 | −12.04 |
| {p1, p2, p3, p4} | −21.24 | −21.29 | −21.98 | −24.38 |
| {p1, p2, p3, p5} | −9.995 | −10.08 | −10.25 | −10.57 |
| {p1, p2, p4, p5} | −13.02 | −13.11 | −13.79 | −16.25 |
| {p1, p3, p4, p5} | −22.10 | −22.18 | −22.84 | −25.22 |
| {p2, p3, p4, p5} | −10.41 | −10.47 | −10.58 | −10.76 |
| {p1, p2, p3, p4, p5} | −20.95 | −20.99 | −21.68 | −24.16 |

(a) D-optimality criterion: inline image, S is the normalized sensitivity matrix of the subset.
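The comparison of subsets of equal size in Table 5 can be automated. The sketch below simply transcribes the tabulated criterion values and, for a given uncertainty level, returns the subset with the largest value for each subset size; `TABLE5`, `LEVELS`, and `best_subsets` are illustrative names, and parameters are identified by their index (1 for p1, and so on).

```python
LEVELS = (0, 15, 30, 50)  # uncertainty levels (%), in column order

# Criterion values from Table 5: subset -> values at the four levels.
TABLE5 = {
    (1,): (-6.025, -6.025, -6.025, -6.020),
    (2,): (3.068, 3.067, 3.061, 3.049),
    (3,): (-3.726, -3.728, -3.733, -3.747),
    (4,): (-6.037, -6.035, -6.032, -6.024),
    (5,): (0.3060, 0.3075, 0.3119, 0.3240),
    (1, 2): (-3.253, -3.267, -3.309, -3.414),
    (1, 3): (-12.09, -12.11, -12.18, -12.29),
    (1, 4): (-15.93, -16.02, -16.61, -18.18),
    (1, 5): (-5.721, -5.720, -5.716, -5.702),
    (2, 3): (-1.012, -1.029, -1.085, -1.234),
    (2, 4): (-3.270, -3.282, -3.321, -3.420),
    (2, 5): (3.374, 3.373, 3.369, 3.362),
    (3, 4): (-12.22, -12.23, -12.27, -12.34),
    (3, 5): (-3.421, -3.422, -3.423, -3.429),
    (4, 5): (-5.732, -5.730, -5.723, -5.706),
    (1, 2, 3): (-10.29, -10.38, -10.56, -10.87),
    (1, 2, 4): (-13.32, -13.41, -14.09, -16.59),
    (1, 2, 5): (-2.955, -2.968, -3.009, -3.108),
    (1, 3, 4): (-22.39, -22.47, -23.14, -25.62),
    (1, 3, 5): (-11.79, -11.82, -11.88, -11.99),
    (1, 4, 5): (-15.64, -15.72, -16.36, -18.71),
    (2, 3, 4): (-10.70, -10.77, -10.87, -11.06),
    (2, 3, 5): (-0.7140, -0.7308, -0.7839, -0.9277),
    (2, 4, 5): (-2.972, -2.983, -3.020, -3.114),
    (3, 4, 5): (-11.92, -11.93, -11.97, -12.04),
    (1, 2, 3, 4): (-21.24, -21.29, -21.98, -24.38),
    (1, 2, 3, 5): (-9.995, -10.08, -10.25, -10.57),
    (1, 2, 4, 5): (-13.02, -13.11, -13.79, -16.25),
    (1, 3, 4, 5): (-22.10, -22.18, -22.84, -25.22),
    (2, 3, 4, 5): (-10.41, -10.47, -10.58, -10.76),
    (1, 2, 3, 4, 5): (-20.95, -20.99, -21.68, -24.16),
}


def best_subsets(level):
    """For each subset size, return the subset with the largest criterion value."""
    col = LEVELS.index(level)
    best = {}
    for subset, values in TABLE5.items():
        size = len(subset)
        if size not in best or values[col] > TABLE5[best[size]][col]:
            best[size] = subset
    return best


for level in LEVELS:
    print(level, best_subsets(level))
```

The same dictionary can be used to quantify how much a subset degrades with uncertainty, for example by subtracting its 50% value from its 0% value.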

IL-6 signaling pathway model

Modeling and analysis of intracellular signaling networks is an important area in systems biology. Signaling pathways initiate essential processes for regulating cell growth, division, apoptosis, or responses to environmental stimuli. These pathways include a large number of components, which detect, amplify, and integrate diverse external signals to generate responses, such as changes in enzyme activity or gene expression. It is infeasible to measure all the components in these pathways, which limits the number of parameters that can be estimated. Therefore, the values of most of the kinetic parameters are taken from the literature and contain a significant level of uncertainty. An IL-6 signaling pathway model is used in this case study to illustrate the selection of a subset of uncertain parameters for estimation using limited experimental data. This model consists of 13 state variables and 19 parameters. The IL-6 concentration is the input, and the concentration of the transcription factor STAT3 in the nucleus, STAT3N*-STAT3N*, is the only measured output. The mathematical model is described by Eq. (22),[32] and the initial values of all states, the value of the input, and the descriptions and nominal values of all the parameters are listed in the Appendix.

The sensitivity matrix is generated by interpolating the sensitivity variables from 0 to 10 h using a 15-minute time step. After Step 2 (shown in Table 1), p1, p3, p5, p6, p14, and p15 are left as the candidates for parameter clustering. The largest angle of each sensitivity cone is presented in Table 6, and dendrograms for different uncertainty ranges obtained using hierarchical clustering for these six candidate parameters are shown in Figure 5.


Figure 5. Hierarchical clustering of IL-6 signaling pathway model under uncertainty.

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


It can be seen from Table 6 that with an increase in the uncertainty range, the largest angles of the sensitivity cones of p1, p5, p6, p14, and p15 increase monotonically. For example, the largest angles of the sensitivity cones of p6 and p14 are very close to zero when the uncertainty range is within 5%. This means that perturbation of the entire set of parameters would not cause the sensitivity vectors of p6 and p14 to deviate noticeably from their nominal directions. However, the changes in the sensitivity vector of p3 due to the uncertainty are significantly larger than those of the other parameters. When the uncertainty range reaches 30%, the corresponding maximum angle for p3 is 90°, so all the parameters have to be placed in a single cluster of pairwise indistinguishable parameters. When this happens, a similar dynamic optimization problem is solved while fixing p3 and perturbing all other parameters. However, the solution of this problem still returns a very large angle of the sensitivity cone for p3. This result can be explained if p3 is strongly correlated with other parameters when uncertainty is taken into account.

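Table 6. Largest Angle (°) of the Sensitivity Cones for Different Uncertainty Ranges

| Uncertainty range (%) | p1 | p3 | p5 | p6 | p14 | p15 |
| --- | --- | --- | --- | --- | --- | --- |
| 5 | 2.688 | 8.271 | 2.563 | 0.000 | 0.000 | 1.621 |
| 10 | 5.062 | 17.73 | 3.970 | 4.997 | 4.930 | 3.032 |
| 15 | 6.486 | 31.00 | 5.377 | 7.607 | 5.438 | 4.365 |
| 20 | 7.904 | 61.14 | 6.832 | 9.970 | 5.789 | 6.975 |
| 30 | 10.55 | 90.00 | 9.769 | 13.88 | 10.95 | 9.530 |
| 40 | 12.99 | 90.00 | 13.17 | 17.82 | 12.42 | 12.37 |
| 50 | 19.30 | 90.00 | 17.86 | 22.00 | 15.53 | 15.80 |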

It can be seen from Figure 5 that at most three clusters can be formed if the uncertainty range is 10%. If the cutoff value is chosen to be 0.05, two clusters are partitioned and the corresponding parameters considered for estimation are p1 and p6. In this example, the performance of the parameter set selection is compared using cross-validation: the measured output data are randomly partitioned into 10 subsets, nine of which are used for estimating the parameters in each run, whereas the remaining subset is used to measure the prediction error. The uncertainty range, that is, the bounds on the parameters for estimation, is ±10%. Seven different parameter subsets are listed in Table 7. The subset {p1, p6} is the one obtained from the presented approach when the uncertainty is 10%. {p1, p5, p6} is the subset for estimation when there is no uncertainty, and {p1} is the subset when the uncertainty increases to an extent that all the sensitivity cones overlap and merge into one cluster. {p14, p15} is chosen as a counterexample in which two parameters are from the same cluster. {p3, p6} is also selected for comparison purposes because p3 has a strong correlation with other parameters, as discussed previously. It can be seen from Table 7 that the average prediction errors associated with the sets {p1, p6}, {p1}, and {p6} are significantly smaller than that of the set {p1, p5, p6} or any of the other sets. The reason for this is that estimating fewer parameters is generally preferable if large uncertainty is considered. The prediction error associated with the set {p14, p15} is the largest because the effects of the two parameters are strongly correlated and neither of them can describe the dynamics of the model when there is noise in the experimental data. The set {p3, p6} performs acceptably because the effects of changes in p3 are strongly correlated with those of p1, p14, and p15, so p3 can partly compensate for the role of p1.

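Table 7. Prediction Error using 10-Fold Cross-Validation (CV) When Different Combinations of Parameter Subsets are being Estimated from Simulated Data

| CV data set # | {p1} | {p6} | {p1, p6} | {p3, p6} | {p5, p6} | {p14, p15} | {p1, p5, p6} |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 14.84 | 22.76 | 21.75 | 11.92 | 16.82 | 158.0 | 20.26 |
| 2 | 16.55 | 23.58 | 23.91 | 64.73 | 13.83 | 40.78 | 52.85 |
| 3 | 22.73 | 30.03 | 20.25 | 52.70 | 18.36 | 174.0 | 19.14 |
| 4 | 21.06 | 21.04 | 20.76 | 22.79 | 31.10 | 26.04 | 21.96 |
| 5 | 18.40 | 16.44 | 9.85 | 26.44 | 21.42 | 15.66 | 29.31 |
| 6 | 19.58 | 18.52 | 22.93 | 14.43 | 20.66 | 65.67 | 20.56 |
| 7 | 21.38 | 14.50 | 17.62 | 137.4 | 278.9 | 21.61 | 76.03 |
| 8 | 44.31 | 15.64 | 20.77 | 17.07 | 21.44 | 19.57 | 17.98 |
| 9 | 19.53 | 39.82 | 19.69 | 70.34 | 112.8 | 307.7 | 22.80 |
| 10 | 15.45 | 26.03 | 22.40 | 13.14 | 33.97 | 25.33 | 150.2 |
| Mean | 21.38 | 22.83 | 19.99 | 43.09 | 56.93 | 85.43 | 43.11 |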
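The 10-fold cross-validation procedure used for Table 7 can be sketched in a few lines. Everything model-specific below is a stand-in: the hypothetical one-parameter model y = a·t, its closed-form least-squares fit, the synthetic noisy data, and the use of the sum of squared residuals as the prediction error are all assumptions made for illustration. The IL-6 model and the error measure used in the article are not reproduced.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition range(n_samples) into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(times, outputs, fit, predict, k=10, seed=0):
    """Return the prediction error (sum of squared residuals) for each fold."""
    folds = k_fold_indices(len(times), k, seed)
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(times)) if i not in held]
        # Estimate parameters on nine folds, then predict the held-out fold.
        params = fit([times[i] for i in train], [outputs[i] for i in train])
        errors.append(
            sum((outputs[i] - predict(times[i], params)) ** 2 for i in held_out)
        )
    return errors


def fit_gain(ts, ys):
    """Least-squares estimate of a in the stand-in model y = a*t."""
    return sum(t * y for t, y in zip(ts, ys)) / sum(t * t for t in ts)


def predict_gain(t, a):
    return a * t


rng = random.Random(1)
times = [0.25 * i for i in range(1, 41)]  # 15-minute grid up to 10 h
outputs = [2.0 * t + rng.gauss(0.0, 0.1) for t in times]  # synthetic data
errors = cross_validate(times, outputs, fit_gain, predict_gain)
print(sum(errors) / len(errors))
```

Repeating the call with a different `fit` and `predict` pair for each candidate parameter subset, while keeping the same seed so that the folds match, yields one column of per-fold errors per subset, which is the layout of Table 7.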

This example illustrates the utility of the presented approach by comparing the performance of estimating different sets of parameters using cross-validation. The results returned by the presented method are reasonable, while choosing other parameter sets can result in varying performance. It should be noted that the cross validation would not be performed in practice; it was simply used here to illustrate that the sets of parameters determined for estimation indeed perform better than other sets of parameters. Furthermore, both of the case studies illustrate that the number of parameters that can be estimated is affected by the magnitude of the uncertainty. Although a uniform distribution of the parameters over their uncertainty intervals is assumed in these examples, other uncertainty distributions can also be implemented within the presented framework.

Conclusions


This work presented an approach for parameter set selection for dynamic systems under uncertainty. The technique extends existing methods that are based upon clustering of parameters according to their sensitivity vectors. The extension is made by realizing that the sensitivity vectors vary due to the uncertainty and that a sensitivity cone can be defined, where all sensitivity vectors of a parameter computed for different parameter values lie within the sensitivity cone. Computation of the sensitivity cones is nontrivial, and this article presents a dynamic optimization technique for doing so. The angle between two sensitivity cones, each one corresponding to the sensitivity of a different parameter, can be used as an indicator of the similarity of the effect that changes in the parameter have on the outputs under uncertainty. The cosine distance corresponding to this angle is used for clustering the sensitivity cones. The main advantage of this approach is that existing techniques for parameter set selection can be used, for example, hierarchical clustering, while at the same time the effect of uncertainty on the sensitivity vectors is incorporated into the procedure by determining sensitivity cones.

Acknowledgments


The authors gratefully acknowledge partial financial support by the National Science Foundation (Grant CBET#0941313) and the American Chemical Society (ACS-PRF#50978-ND9).

Literature Cited

  1. Brun R, Reichert P, Künsch HR. Practical identifiability analysis of large environmental simulation models. Water Resour Res. 2001;37(4):1015–1030.
  2. Hiskens IA. Nonlinear dynamic model evaluation from disturbance measurements. IEEE Trans Power Syst. 2001;16(4):702–710.
  3. Dai W, Word DP, Hahn J. Modeling and dynamic optimization of fuel-grade ethanol fermentation using fed-batch process. Control Eng Pract. In Press.
  4. Gadkar KG, Varner J, Doyle FJ III. Model identification of signal transduction networks from data using a state regulator problem. IET Proceedings on Systems Biology, 2005.
  5. Gernaey KV, Gani R. A model-based systems approach to pharmaceutical product-process design and analysis. Chem Eng Sci. 2010;65(21):5757–5769.
  6. Poyton A, Varziri MS, McAuley KB, McLellan P, Ramsay JO. Parameter estimation in continuous-time dynamic models using principal differential analysis. Comput Chem Eng. 2006;30(4):698–708.
  7. Ramsay JO, Hooker G, Campbell D, Cao J. Parameter estimation for differential equations: a generalized smoothing approach. J R Stat Soc Ser B (Stat Methodol). 2007;69(5):741–796.
  8. Lin Y, Stadtherr MA. Deterministic global optimization for parameter estimation of dynamic systems. Ind Eng Chem Res. 2006;45(25):8438–8448.
  9. Tjoa IB, Biegler LT. Simultaneous solution and optimization strategies for parameter estimation of differential-algebraic equation systems. Ind Eng Chem Res. 1991;30(2):376–385.
  10. Chu Y, Hahn J. Parameter set selection for estimation of nonlinear dynamic systems. AIChE J. 2007;53(11):2858–2870.
  11. Chu Y, Hahn J. Integrating parameter selection with experimental design under uncertainty for nonlinear dynamic systems. AIChE J. 2008;54(9):2310–2320.
  12. Velez-Reyes M, Verghese G. Subset selection in identification, and application to speed and parameter estimation for induction machines. Proceedings of the 4th IEEE Conference on IEEE Control Applications, 1995, Albany, NY, USA, 1995.
  13. Lund BF, Foss BA. Parameter ranking by orthogonalization—applied to nonlinear mechanistic models. Automatica. 2008;44(1):278–281.
  14. Chu Y, Hahn J. Parameter set selection via clustering of parameters into pairwise indistinguishable groups of parameters. Ind Eng Chem Res. 2008;48(13):6000–6009.
  15. Walter E, Pronzato L. Qualitative and quantitative experiment design for phenomenological models—a survey. Automatica. 1990;26(2):195–213.
  16. Chu Y, Hahn J. Quantitative optimal experimental design using global sensitivity analysis via quasi-linearization. Ind Eng Chem Res. 2010;49(17):7782–7794.
  17. Morris MD. Factorial sampling plans for preliminary computational experiments. Technometrics. 1991;33(2):161–174.
  18. Hornberger GM, Spear R. Approach to the preliminary analysis of environmental systems. J Environ Manag. 1981;12(1):7–18.
  19. Atherton R, Schainker R, Ducot E. On the statistical sensitivity analysis of models for chemical kinetics. AIChE J. 1975;21(3):441–448.
  20. Dickinson RP, Gelinas RJ. Sensitivity analysis of ordinary differential equation systems—a direct method. J Comput Phys. 1976;21(2):123–143.
  21. Griewank A. A mathematical view of automatic differentiation. Acta Numer. 2003;12(1):321–398.
  22. Feehery WF, Tolsma JE, Barton PI. Efficient sensitivity analysis of large-scale differential-algebraic systems. Appl Numer Math. 1997;25(1):41–54.
  23. Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv (CSUR). 1999;31(3):264–323.
  24. van der Laan MJ, Pollard KS. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. J Stat Plan Inference. 2003;117(2):275–303.
  25. Zhao Y, Karypis G. Evaluation of hierarchical clustering algorithms for document datasets. Proceedings of Eleventh International Conference on Information and Knowledge Management, McLean, VA, USA, 2002.
  26. Vassiliadis V, Sargent R, Pantelides C. Solution of a class of multistage dynamic optimization problems. 1. Problems without path constraints. Ind Eng Chem Res. 1994;33(9):2111–2122.
  27. Kameswaran S, Biegler LT. Simultaneous dynamic optimization strategies: recent advances and challenges. Comput Chem Eng. 2006;30(10):1560–1575.
  28. Fourer R, Gay D, Kernighan B. AMPL: a modeling language for mathematical programming. Duxbury Press, 2002.
  29. Wächter A, Biegler LT. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math Program. 2006;106(1):25–57.
  30. HSL2013. Collection of Fortran codes for large-scale scientific computation. 2013. Available at: http://www.hsl.rl.ac.uk. Last accessed on April 1, 2013.
  31. John RS, Draper NR. D-optimality for regression designs: a review. Technometrics. 1975;17(1):15–23.
  32. Huang Z, Chu Y, Hahn J. Model simplification procedure for signal transduction pathway models: an application to IL-6 signaling. Chem Eng Sci. 2010;65(6):1964–1975.

Appendix:  

Table A1. Nominal Parameter Values and Initial Values of CSTR Model

Nominal Parameter Values

| Symbol | Description | Value |
| --- | --- | --- |
| Tf | Feed temperature | 20 °C |
| CAf | Feed composition | 2500 mol/m³ |
| ρ | Fluid density | 1025 kg/m³ |
| ΔH | Heat of reaction | 160 kJ/mol |
| E/R | Activation energy | 255 K |
| k | Pre-exponential factor | 2.5 h⁻¹ |
| Tcf | Coolant inlet temperature | 10 °C |
| ρc | Coolant density | 1000 kg/m³ |
| h | Heat transfer coefficient | 1000 W m⁻²/°C |
| F | Feed flow rate | 0.1 m³/h |
| Fc | Coolant flow rate | 0.15 m³/h |
| V | Reactor volume | 0.2 m³ |
| Vc | Cooling jacket volume | 0.055 m³ |
| A | Heat transfer area | 4.5 m² |
| Cpc | Coolant heat capacity | 1.20 kJ/kg/°C |
| Cp | Fluid heat capacity | 1.55 kJ/kg/°C |

Initial Values

| Symbol | Description | Value |
| --- | --- | --- |
| CA0 | Initial composition | 1000 mol/m³ |
| T0 | Initial reactor temperature | 20 °C |
| Tc0 | Initial coolant temperature | 20 °C |

Table A2. IL-6 Signaling Pathway Model
inline image(22)
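Table A3. Initial Values of IL-6 Signaling Pathway Model

| State | Species | Initial value |
| --- | --- | --- |
| x1 | inline image | 0 (nM) |
| x2 | STAT3C | 1000 |
| x3 | inline image | 0 |
| x4 | STAT3N*-STAT3N* | 0 |
| x5 | SOCS3 | 0 |
| x6 | inline image | 0 |
| x7 | SHP2 | 100 |
| x8 | inline image | 0 |
| x9 | Erk-pp | 0 |
| x10 | inline image | 16468 |
| x11 | (IL6-gp80-gp130-JAK)2 | 0 |
| x12 | C/EBPβi | 40.493 |
| x13 | C/EBPβn | 0 |
| u | IL-6 (input) | 3.83 |
| R | Receptor | 4 |
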
Table A4. Nominal Parameter Values of IL-6 Signaling Pathway Model

| Parameter | Description | Nominal value |
| --- | --- | --- |
| p1 | Forward rate constant for Reaction #1 | 2.336e−005 |
| p2 | Backward rate constant for Reaction #1 | 0.002 |
| p3 | Forward rate constant for Reaction #2 | 0.0138 |
| p4 | Backward rate constant for Reaction #2 | 1.502 |
| p5 | Forward rate constant for Reaction #3 | 0.273 |
| p6 | Forward rate constant for Reaction #4 | 3.282e−004 |
| p7 | Maximum rate for Reaction #5 | 0.023 |
| p8 | Time delay for Reaction #5 | 1290 |
| p9 | Michaelis–Menten constant for Reaction #5 | 50.6 |
| p10 | Forward rate constant for Reaction #6 | 2.067e−004 |
| p11 | Forward rate constant for Reaction #7 | 16.52 |
| p12 | Backward rate constant for Reaction #7 | 0.04 |
| p13 | Forward rate constant for Reaction #8 | 0.0023 |
| p14 | Forward rate constant for Reaction #9 | 4.059e−004 |
| p15 | Backward rate constant for Reaction #9 | 5.086e−004 |
| p16 | Maximum rate constant for Reaction #10 | 16.00 |
| p17 | Michaelis–Menten constant for Reaction #10 | 5.115e+003 |
| p18 | Forward rate constant for Reaction #11 | 1.198e−005 |
| p19 | Forward rate constant for Reaction #12 | 1.0e−006 |