CSTR model
This model describes an exothermic CSTR in which a first-order reaction

A → B  (18)

is taking place.
The model is described by the following differential equations:

dC_{A}/dt = (q/V)(C_{Af} − C_{A}) − k exp(−E/(R T)) C_{A}
dT/dt = (q/V)(T_{f} − T) + ((−ΔH)/(ρC_{p})) k exp(−E/(R T)) C_{A} + (hA/(ρC_{p}V))(T_{j} − T)
dT_{j}/dt = (q_{j}/V_{j})(T_{jf} − T_{j}) + (hA/(ρ_{j}C_{pj}V_{j}))(T − T_{j})  (19)
The three states of the system are the concentration of component A, the temperature inside the reactor, and the temperature of the cooling jacket. The temperatures of the reactor and of the cooling jacket are measured as the outputs. There are 16 parameters in the model; their nominal values are taken from reference [10] and are listed in a table in the Appendix. Figure 3 shows a simulation of the system using the nominal parameter values and the initial conditions listed in the Appendix.
The values of the parameters ρ, C_{p}, k, E, R, ΔH, and h are not known as precisely as the parameters fixed by the design of the process. Therefore, some of these parameters need to be estimated to improve the prediction capability of this dynamic model.
Considering that ρ and C_{p} never appear independently, but only in the form of their product ρC_{p}, and that E and R only ever appear as the ratio (−E/R), these two expressions are treated as two new parameters rather than as four separate parameters. With this substitution, there are a total of five parameters that can be considered for estimation. They are renamed and listed in Table 2. It is worth noting that if there are only five parameters in a system, then the correlations among these parameters are unlikely to cause significant overparameterization. However, this simple model is useful for illustrating the utility of the presented technique for parameter set selection under uncertainty. The application of the developed technique to a larger model is presented in the second case study.
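The need for this lumping can be checked numerically: whenever the output depends on ρ and C_{p} only through their product, the two sensitivity vectors are exactly collinear, so the pair can never be estimated separately. The sketch below demonstrates this with a toy output and hypothetical nominal values; it is illustrative, not the CSTR response.

```python
# Numerical check that rho and C_p cannot be estimated separately when
# the output depends on them only through the product rho*C_p: the two
# finite-difference sensitivity vectors are then exactly collinear.
# The toy output and nominal values below are illustrative assumptions.
import numpy as np

t = np.linspace(0.1, 10.0, 50)
rho, cp = 1000.0, 1.59   # hypothetical nominal values

def output(rho, cp):
    # depends on rho and cp only via their product rho*cp
    return np.exp(-t / (rho * cp))

# Finite-difference sensitivity vectors with respect to rho and cp.
eps = 1e-6
s_rho = (output(rho * (1 + eps), cp) - output(rho, cp)) / (rho * eps)
s_cp = (output(rho, cp * (1 + eps)) - output(rho, cp)) / (cp * eps)

cos_sim = s_rho @ s_cp / (np.linalg.norm(s_rho) * np.linalg.norm(s_cp))
print(cos_sim)   # essentially 1: the two directions are identical
```

Because the two perturbations change the product ρC_{p} identically, the sensitivity vectors differ only by a scalar factor, which is exactly the structural unidentifiability the lumping removes.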
Table 2. Parameters that Need to be Estimated in the CSTR Model

Original parameter  k  ρC_{p}  (−E/R)  ΔH  h
New parameter  p_{1}  p_{2}  p_{3}  p_{4}  p_{5}
Nominal value  2.5  1590  −255  160  1000
First, an extended ODE model is formulated by combining the original model and the sensitivity equations. Equation 20 is the original CSTR model and Eq. 21 shows the sensitivity equations based on the output for each parameter p_{i} (i = 1, …, 5). For different parameters, the differences between the groups of sensitivity equations appear only in the last term of Eq. 21, that is, in ∂f/∂p_{i}, while the remaining portions of Eq. 21 are the same for all values of i = 1, …, 5. According to simulations (Figure 3), the system reaches steady state after approximately 10 h, so the sensitivity vector is generated by interpolating the sensitivity variables from 0 to 10 h with a step size of 0.1 h. Step 2 in Table 1 is skipped here, as there are only five parameters and no further reduction of the parameter space is needed. Assuming that there is no uncertainty, the parameters are grouped in a dendrogram, as shown in Figure 4a, using complete-linkage clustering. The parameters can be grouped into different numbers of clusters depending on the cutoff value of the cosine distance. For instance, if a cutoff of 0.05 is chosen, the parameters can be grouped into three distinguishable clusters: {p_{1}, p_{3}, p_{4}}, {p_{2}}, and {p_{5}}. The corresponding parameter subset that should be estimated is {p_{2}, p_{3}, p_{5}}, as these are the parameters with the largest norms of the nominal sensitivity vectors in each of the three clusters. If the cutoff value is increased to 0.35, the parameters can only be grouped into two clusters, {p_{1}, p_{2}, p_{3}, p_{4}} and {p_{5}}, in which case p_{2} and p_{5} should be estimated.
dx/dt = f(x, p),  y = [T, T_{j}]ᵀ,  with x = [C_{A}, T, T_{j}]ᵀ and p = [p_{1}, …, p_{5}]ᵀ  (20)
For i = 1, …, 5:

d/dt(∂x/∂p_{i}) = (∂f/∂x)(∂x/∂p_{i}) + ∂f/∂p_{i},  ∂y/∂p_{i} = [∂T/∂p_{i}, ∂T_{j}/∂p_{i}]ᵀ  (21)
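The extended system of state and sensitivity equations can be integrated with any standard ODE solver. A minimal sketch, using a scalar toy model dx/dt = −p·x rather than the CSTR equations: the sensitivity s = ∂x/∂p obeys ds/dt = (∂f/∂x)s + ∂f/∂p = −p·s − x, and the numerical result can be checked against the analytic solution.

```python
# Minimal forward-sensitivity sketch (toy model, not the CSTR):
# integrate the state x and its sensitivity s = dx/dp together.
import numpy as np
from scipy.integrate import solve_ivp

p, x0 = 0.5, 2.0   # illustrative parameter and initial condition

def augmented(t, z):
    x, s = z
    dxdt = -p * x          # f(x, p) = -p*x
    dsdt = -p * s - x      # (df/dx)*s + df/dp
    return [dxdt, dsdt]

t_grid = np.linspace(0.0, 10.0, 101)   # 0 to 10 h, 0.1 h step
sol = solve_ivp(augmented, (0.0, 10.0), [x0, 0.0],
                t_eval=t_grid, rtol=1e-9, atol=1e-12)

# Check against the analytic sensitivity dx/dp = -x0 * t * exp(-p*t).
s_exact = -x0 * t_grid * np.exp(-p * t_grid)
err = np.max(np.abs(sol.y[1] - s_exact))
print(err)
```

Sampling `sol.y[1]` on the time grid is what produces one column of the sensitivity matrix used for clustering.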
The number of clusters is also influenced by the uncertainty range in the parameter space. The larger the uncertainty range, the shorter the cosine distance between each pair of sensitivity cones becomes, so fewer clusters can be partitioned using the same cutoff value. Furthermore, once two sensitivity cones overlap, the corresponding two parameters cannot be distinguished, no matter how small a cutoff value is chosen. In this example, the largest angle of each sensitivity cone for different uncertainty ranges is listed in Table 3, the cosine distance between each pair of sensitivity cones is listed in Table 4, and the dendrograms for different uncertainty ranges are shown in Figure 4. It can be seen from Figure 4 that an increase in the uncertainty range pushes the linkage distances in the dendrogram toward zero, and the sensitivity cones start to overlap, resulting in clusters of indistinguishable parameters. For example, if 15% uncertainty is considered, the cosine distances between the sensitivity cones of p_{1}, p_{3}, and p_{4} are zero. Thus, no matter how small the cutoff value, the parameters can at most be partitioned into three clusters (i.e., {p_{1}, p_{3}, p_{4}}, {p_{2}}, and {p_{5}}). If the uncertainty is increased to 50%, at most two clusters can be formed. Therefore, the cutoff value and the uncertainty range together determine the number of clusters, with the uncertainty range setting the largest number of clusters that can be partitioned.
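The cutoff-based grouping and representative selection can be sketched with SciPy's hierarchical clustering on cosine distances between sensitivity columns. The sensitivity matrix below is synthetic (p1, p3, and p4 are made nearly collinear to mimic the CSTR case), not the actual model output.

```python
# Sketch of the clustering/selection step: columns of S are sensitivity
# vectors; parameters are grouped by cosine distance between columns,
# and one representative per cluster (largest sensitivity norm) is
# chosen for estimation.  S is a synthetic stand-in.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n_time = 100
labels = ["p1", "p2", "p3", "p4", "p5"]

# Synthetic sensitivity matrix: p1, p3, p4 nearly collinear (as in the
# CSTR example), p2 and p5 pointing in unrelated directions.
base = rng.normal(size=n_time)
S = np.column_stack([
    2.0 * base + 0.01 * rng.normal(size=n_time),   # p1
    rng.normal(size=n_time),                        # p2
    5.0 * base + 0.01 * rng.normal(size=n_time),   # p3
    1.0 * base + 0.01 * rng.normal(size=n_time),   # p4
    rng.normal(size=n_time),                        # p5
])

# Complete-linkage hierarchical clustering on cosine distance.
Z = linkage(S.T, method="complete", metric="cosine")
clusters = fcluster(Z, t=0.05, criterion="distance")  # cutoff = 0.05

# In each cluster, pick the parameter whose nominal sensitivity vector
# has the largest 2-norm.
selected = []
for c in np.unique(clusters):
    members = np.where(clusters == c)[0]
    best = members[np.argmax(np.linalg.norm(S[:, members], axis=0))]
    selected.append(labels[best])
print(sorted(selected))
```

Raising `t` in `fcluster` coarsens the grouping, mirroring the cutoff discussion above.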
Table 3. Maximum Angle (°) of the Sensitivity Cones for Different Uncertainty Ranges

Uncertainty Range (%)  p_{1}  p_{2}  p_{3}  p_{4}  p_{5}

5  7.476  1.404  7.476  7.432  3.032 
10  11.96  1.621  12.23  11.96  4.211 
15  16.72  1.985  17.14  16.72  5.438 
20  21.52  2.292  21.94  21.50  6.686 
25  26.14  2.688  26.47  26.14  7.904 
30  30.54  3.032  30.67  30.54  9.069 
50  48.91  7.343  47.49  48.94  13.49 
Table 4. Cosine Distance Between Each Pair of Sensitivity Cones for Different Uncertainty Ranges

Uncertainty Range (%)  p_{1}–p_{2}  p_{1}–p_{3}  p_{1}–p_{4}  p_{1}–p_{5}  p_{2}–p_{3}  p_{2}–p_{4}  p_{2}–p_{5}  p_{3}–p_{4}  p_{3}–p_{5}  p_{4}–p_{5}

0  0.2972  0.0020  0.0000  0.9336  0.2554  0.2972  0.9596  0.0020  0.9429  0.9336 
5  0.1958  0.0000  0.0000  0.7527  0.1613  0.1962  0.8824  0.0000  0.7618  0.7535 
10  0.1498  0.0000  0.0000  0.6583  0.1172  0.1498  0.8582  0.0000  0.6627  0.6583 
15  0.1061  0.0000  0.0000  0.5621  0.0778  0.1061  0.8308  0.0000  0.5640  0.5621 
20  0.0698  0.0000  0.0000  0.4699  0.0471  0.0699  0.8041  0.0000  0.4717  0.4701 
25  0.0413  0.0000  0.0000  0.3863  0.0245  0.0413  0.7766  0.0000  0.3893  0.3863 
30  0.0210  0.0000  0.0000  0.3127  0.0102  0.0210  0.7510  0.0000  0.3178  0.3127 
50  0.0000  0.0000  0.0000  0.0849  0.0000  0.0000  0.6068  0.0000  0.0993  0.0848 
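Maximum cone angles of the kind reported in Table 3 can be approximated by sampling the parameter vector within its uncertainty range and tracking the worst-case rotation of each sensitivity vector away from its nominal direction. The sketch below uses a toy output y(t) = p1·exp(−p2·t) with closed-form sensitivities; the model and nominal values are illustrative assumptions, not the CSTR sensitivities.

```python
# Monte Carlo sketch of sensitivity cones: sample the full parameter
# vector within +/-delta of nominal and record the largest angle each
# sensitivity vector makes with its nominal direction.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0.0, 10.0, 0.1)
p_nom = np.array([2.0, 0.5])   # illustrative nominal values

def sens_vectors(p):
    s1 = np.exp(-p[1] * t)               # dy/dp1
    s2 = -p[0] * t * np.exp(-p[1] * t)   # dy/dp2
    return [s1, s2]

def angle_deg(u, v):
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def max_cone_angle(delta, n_samples=2000):
    nominal = sens_vectors(p_nom)
    worst = np.zeros(p_nom.size)
    for _ in range(n_samples):
        p = p_nom * (1.0 + rng.uniform(-delta, delta, size=p_nom.size))
        for i, s in enumerate(sens_vectors(p)):
            worst[i] = max(worst[i], angle_deg(s, nominal[i]))
    return worst

a10 = max_cone_angle(0.10)   # 10% uncertainty
a30 = max_cone_angle(0.30)   # 30% uncertainty
print(a10, a30)              # cones widen with the uncertainty range
```

The monotone widening of the cones with the uncertainty range is exactly the trend seen down the columns of Table 3.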
The D-optimality criterion,[31] which is the most widely used experimental design criterion, is used in this example to verify the performance of the parameter set selection. This criterion minimizes the volume of the confidence ellipsoid, at an arbitrary fixed confidence level, for a least-squares estimator. If the parameter uncertainty is close to zero, the criterion value is obtained by evaluating the nominal sensitivity matrix. However, if a larger uncertainty is considered, for example, 15% or 50%, a Monte Carlo simulation is performed in which 10,000 parameter sets are chosen randomly within the uncertainty range and the average criterion value is obtained by evaluating the sensitivity matrix at these different parameter values. The magnitude of the criterion value thus reflects the performance of the parameter selection for estimation. As there are only five parameters in this model, it is possible to compare all 2^{5} − 1 = 31 combinations of parameter subsets, which are listed in Table 5. Several trends can clearly be seen when analyzing the data in Table 5. One trend is that parameter sets consisting of only one or two parameters are hardly affected by the uncertainty. This is in stark contrast to parameter sets with more parameters, where the criterion value decreases significantly for larger uncertainty; for example, estimating all five parameters results in a criterion value that is three orders of magnitude lower for 50% uncertainty than for no uncertainty (the values shown are on a logarithmic scale). What is particularly striking is that this general trend holds for every set that can potentially be estimated from the five parameters under consideration. A second point worth noting is that parameter sets considered good by the procedure introduced in this work are significantly less affected by increases of the uncertainty than other sets with the same number of parameters.
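One way to reproduce this comparison in code is to score each candidate subset by the average log-determinant of the information matrix S_subᵀS_sub over Monte Carlo samples. The sketch below uses a synthetic sensitivity matrix (with one nearly collinear group, mimicking {p1, p3, p4}) rather than the CSTR sensitivities, so the numbers are illustrative only.

```python
# Sketch of the Monte Carlo D-optimality comparison: each subset is
# scored by the mean log10-determinant of S_sub^T S_sub over randomly
# perturbed sensitivity matrices.  The matrix is a synthetic stand-in.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n_time = 100

# Nominal sensitivity columns: columns 0, 2, 3 nearly collinear
# (mimicking the cluster {p1, p3, p4}); columns 1 and 4 independent.
base = rng.normal(size=n_time)
nominal = np.column_stack([
    base,
    rng.normal(size=n_time),
    base + 0.01 * rng.normal(size=n_time),
    base + 0.01 * rng.normal(size=n_time),
    rng.normal(size=n_time),
])

def d_criterion(subset, jitter, n_mc=200):
    # Average log10 det of the information matrix for the subset, with
    # the sensitivities perturbed to mimic parametric uncertainty.
    vals = []
    for _ in range(n_mc):
        S = nominal + jitter * rng.normal(size=nominal.shape)
        sign, logdet = np.linalg.slogdet(S[:, subset].T @ S[:, subset])
        vals.append(logdet / np.log(10.0))
    return float(np.mean(vals))

# Score all 2^5 - 1 nonempty subsets at one uncertainty level.
scores = {sub: d_criterion(list(sub), jitter=0.1)
          for r in range(1, 6) for sub in combinations(range(5), r)}

# A nearly collinear pair scores far below an independent pair.
print(scores[(0, 2)], scores[(1, 4)])
```

The gap between the collinear and independent pairs is the same effect that separates, for example, {p_{1}, p_{4}} from {p_{2}, p_{5}} in Table 5.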
Lastly, parameter sets that are found to be optimal by the presented procedure, that is, {p_{2}} for estimating one parameter, {p_{2}, p_{5}} for estimating two parameters, and {p_{2}, p_{3}, p_{5}} for estimating three parameters, are indeed found to have by far the largest criterion values for all sets with the same number of parameters. It is important to note that there is likely no perfect parameter set for estimation of uncertain systems, especially as the choice of an appropriate cutoff value will be part of future work; however, it is important to determine one or more potential sets of parameters which are good candidates for estimation and differentiate those from other sets which would clearly result in worse prediction accuracy.
Table 5. Different Combinations of Parameter Subsets and the Corresponding Criterion Values (the Top Five Choices for Each Uncertainty Level are Highlighted)

Parameter Subset  0%  15%  30%  50%


{p_{1}}  −6.025  −6.025  −6.025  −6.020 
{p_{2}}  3.068  3.067  3.061  3.049 
{p_{3}}  −3.726  −3.728  −3.733  −3.747 
{p_{4}}  −6.037  −6.035  −6.032  −6.024 
{p_{5}}  0.3060  0.3075  0.3119  0.3240 
{p_{1}, p_{2}}  −3.253  −3.267  −3.309  −3.414 
{p_{1}, p_{3}}  −12.09  −12.11  −12.18  −12.29 
{p_{1}, p_{4}}  −15.93  −16.02  −16.61  −18.18 
{p_{1}, p_{5}}  −5.721  −5.720  −5.716  −5.702 
{p_{2}, p_{3}}  −1.012  −1.029  −1.085  −1.234 
{p_{2}, p_{4}}  −3.270  −3.282  −3.321  −3.420 
{p_{2}, p_{5}}  3.374  3.373  3.369  3.362 
{p_{3}, p_{4}}  −12.22  −12.23  −12.27  −12.34 
{p_{3}, p_{5}}  −3.421  −3.422  −3.423  −3.429 
{p_{4}, p_{5}}  −5.732  −5.730  −5.723  −5.706 
{p_{1}, p_{2}, p_{3}}  −10.29  −10.38  −10.56  −10.87 
{p_{1}, p_{2}, p_{4}}  −13.32  −13.41  −14.09  −16.59 
{p_{1}, p_{2}, p_{5}}  −2.955  −2.968  −3.009  −3.108 
{p_{1}, p_{3}, p_{4}}  −22.39  −22.47  −23.14  −25.62 
{p_{1}, p_{3}, p_{5}}  −11.79  −11.82  −11.88  −11.99 
{p_{1}, p_{4}, p_{5}}  −15.64  −15.72  −16.36  −18.71 
{p_{2}, p_{3}, p_{4}}  −10.70  −10.77  −10.87  −11.06 
{p_{2}, p_{3}, p_{5}}  −0.7140  −0.7308  −0.7839  −0.9277 
{p_{2}, p_{4}, p_{5}}  −2.972  −2.983  −3.020  −3.114 
{p_{3}, p_{4}, p_{5}}  −11.92  −11.93  −11.97  −12.04 
{p_{1}, p_{2}, p_{3}, p_{4}}  −21.24  −21.29  −21.98  −24.38 
{p_{1}, p_{2}, p_{3}, p_{5}}  −9.995  −10.08  −10.25  −10.57 
{p_{1}, p_{2}, p_{4}, p_{5}}  −13.02  −13.11  −13.79  −16.25 
{p_{1}, p_{3}, p_{4}, p_{5}}  −22.10  −22.18  −22.84  −25.22 
{p_{2}, p_{3}, p_{4}, p_{5}}  −10.41  −10.47  −10.58  −10.76 
{p_{1}, p_{2}, p_{3}, p_{4}, p_{5}}  −20.95  −20.99  −21.68  −24.16 
IL-6 signaling pathway model
Modeling and analysis of intracellular signaling networks is an important area in systems biology. Signaling pathways initiate essential processes that regulate cell growth, division, apoptosis, and responses to environmental stimuli. These pathways include a large number of components, which detect, amplify, and integrate diverse external signals to generate responses, such as changes in enzyme activity or gene expression. It is infeasible to measure all the components in these pathways, which limits the number of parameters that can be estimated. Therefore, the values of most of the kinetic parameters are taken from the literature and contain a significant level of uncertainty. An IL-6 signaling pathway model is used in this case study to illustrate the selection of a subset of uncertain parameters for estimation using limited experimental data. This model consists of 13 state variables and 19 parameters. The IL-6 concentration is the input, and the concentration of the nuclear STAT3 dimer, STAT3N*–STAT3N*, is the only measured output. The mathematical model is described by Eq. 22,[32] and the initial values of all states, the value of the input, and the descriptions and nominal values of all the parameters are listed in the Appendix.
The sensitivity matrix is generated by interpolating the sensitivity variables from 0 to 10 h using a 15-minute time step. After Step 2 (shown in Table 1), p_{1}, p_{3}, p_{5}, p_{6}, p_{14}, and p_{15} are left as the candidates for parameter clustering. The largest angle of each sensitivity cone is presented in Table 6, and dendrograms for different uncertainty ranges, obtained using hierarchical clustering for these six candidate parameters, are shown in Figure 5.
It can be seen from Table 6 that, with an increase in the uncertainty range, the largest angles of the sensitivity cones of p_{1}, p_{5}, p_{6}, p_{14}, and p_{15} increase monotonically. For example, the largest angles of the sensitivity cones of p_{6} and p_{14} are very close to zero when the uncertainty range is within 5%, which means that perturbation of the entire set of parameters would not cause the sensitivity vectors of p_{6} and p_{14} to noticeably deviate from their nominal directions. However, the changes of p_{3} due to the uncertainty are significantly larger than those of the other parameters. When the uncertainty range reaches 30%, the corresponding maximum angle for p_{3} is 90°; that is, the cone of p_{3} overlaps with every other cone, and all the parameters have to be partitioned into one cluster in which no pair is distinguishable. When this happens, a similar dynamic optimization problem is solved with p_{3} fixed while all other parameters are perturbed. However, the solution of this problem still returns a very large angle for the sensitivity cone of p_{3}. This result can be explained by p_{3} being strongly correlated with other parameters when uncertainty is taken into account.
Table 6. Largest Angle (°) of the Sensitivity Cones for Different Uncertainty Ranges

Uncertainty Range (%)  p_{1}  p_{3}  p_{5}  p_{6}  p_{14}  p_{15}

5  2.688  8.271  2.563  0.000  0.000  1.621 
10  5.062  17.73  3.970  4.997  4.930  3.032 
15  6.486  31.00  5.377  7.607  5.438  4.365 
20  7.904  61.14  6.832  9.970  5.789  6.975 
30  10.55  90.00  9.769  13.88  10.95  9.530 
40  12.99  90.00  13.17  17.82  12.42  12.37 
50  19.30  90.00  17.86  22.00  15.53  15.80 
It can be seen from Figure 5 that at most three clusters can be formed if the uncertainty range is 10%. If the cutoff value is chosen to be 0.05, two clusters are partitioned, and the corresponding parameters considered for estimation are p_{1} and p_{6}. In this example, the performance of the parameter set selection is compared using cross-validation: the measured output data are randomly partitioned into 10 subsets, nine of which are used for estimating the parameters in each run, while the remaining subset is used to measure the prediction error. The uncertainty range, which sets the bounds on the parameters during estimation, is ±10%. Seven different parameter subsets are listed in Table 7. These are chosen for comparison because the subset {p_{1}, p_{6}} is obtained from the presented approach when the uncertainty is 10%, {p_{1}, p_{5}, p_{6}} is the subset for estimation when there is no uncertainty, and {p_{1}} is the subset when the uncertainty increases to the point that all the sensitivity cones overlap and merge into one cluster. {p_{14}, p_{15}} is chosen as a counterexample in which two parameters are taken from the same cluster. {p_{3}, p_{6}} is also selected for comparison because p_{3} has a strong correlation with other parameters, as discussed previously. It can be seen from Table 7 that the average prediction errors associated with the sets {p_{1}, p_{6}}, {p_{1}}, and {p_{6}} are significantly smaller than those of the set {p_{1}, p_{5}, p_{6}} or any of the other sets. The reason for this is that estimating fewer parameters is generally preferable when large uncertainty is considered. The prediction error associated with the set {p_{14}, p_{15}} is the largest because the effects of the two parameters are strongly correlated and neither of them can describe the dynamics of the model when there is noise in the experimental data.
The set {p_{3}, p_{6}} has acceptable performance because the effect of changes in p_{3} is strongly correlated with those of p_{1}, p_{14}, and p_{15}, so p_{3} can partly compensate for the function of p_{1}.
Table 7. Prediction Error Using 10-Fold Cross-Validation (CV) When Different Combinations of Parameter Subsets are Estimated from Simulated Data

CV Data Set #  {p_{1}}  {p_{6}}  {p_{1}, p_{6}}  {p_{3}, p_{6}}  {p_{5}, p_{6}}  {p_{14}, p_{15}}  {p_{1}, p_{5}, p_{6}}

1  14.84  22.76  21.75  11.92  16.82  158.0  20.26 
2  16.55  23.58  23.91  64.73  13.83  40.78  52.85 
3  22.73  30.03  20.25  52.70  18.36  174.0  19.14 
4  21.06  21.04  20.76  22.79  31.10  26.04  21.96 
5  18.40  16.44  9.85  26.44  21.42  15.66  29.31 
6  19.58  18.52  22.93  14.43  20.66  65.67  20.56 
7  21.38  14.50  17.62  137.4  278.9  21.61  76.03 
8  44.31  15.64  20.77  17.07  21.44  19.57  17.98 
9  19.53  39.82  19.69  70.34  112.8  307.7  22.80 
10  15.45  26.03  22.40  13.14  33.97  25.33  150.2 
Mean  21.38  22.83  19.99  43.09  56.93  85.43  43.11 
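The 10-fold cross-validation protocol used for Table 7 can be sketched as follows. The exponential toy model, the noise level, and the ±10% bounds are illustrative stand-ins for the IL-6 pathway simulation, not the actual case-study setup.

```python
# Sketch of 10-fold cross-validation: estimate the chosen parameters on
# nine folds (within +/-10% bounds around nominal) and score the squared
# prediction error on the held-out fold.  Model and data are synthetic.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
t = np.linspace(0.0, 10.0, 100)
p_true = np.array([2.0, 0.5])   # hypothetical "true" parameters

def model(t, a, b):
    return a * np.exp(-b * t)

y = model(t, *p_true) + 0.05 * rng.normal(size=t.size)

# Shuffle the sample indices and split them into 10 folds.
idx = rng.permutation(t.size)
folds = np.array_split(idx, 10)

errors = []
for k in range(10):
    test = folds[k]
    train = np.concatenate([folds[j] for j in range(10) if j != k])
    # +/-10% bounds around the nominal values, as in the case study.
    popt, _ = curve_fit(model, t[train], y[train], p0=p_true,
                        bounds=(0.9 * p_true, 1.1 * p_true))
    resid = y[test] - model(t[test], *popt)
    errors.append(float(np.sum(resid ** 2)))

print(np.mean(errors))   # average held-out prediction error
```

Repeating this loop while estimating different parameter subsets (and fixing the rest at nominal) yields a table of prediction errors analogous to Table 7.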
This example illustrates the utility of the presented approach by comparing the performance of estimating different sets of parameters using cross-validation. The results returned by the presented method are reasonable, whereas choosing other parameter sets can result in widely varying performance. It should be noted that the cross-validation would not be performed in practice; it was used here simply to illustrate that the sets of parameters determined for estimation indeed perform better than other sets of parameters. Furthermore, both case studies illustrate that the number of parameters that can be estimated is affected by the magnitude of the uncertainty. Although a uniform distribution of the parameters over their uncertainty intervals is assumed in these examples, other uncertainty distributions can also be implemented within the presented framework.