Keywords:

  • process control;
  • soft sensor;
  • degradation;
  • adaptive model;
  • database;
  • monitoring

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Method
  5. Results and Discussion
  6. Conclusion
  7. Acknowledgments
  8. Appendix: A
  9. Literature Cited

In chemical plants, soft sensors are used to predict process variables that are difficult to measure. Soft sensor models must adapt to process changes by using newly measured data. However, when a model is reconstructed with data that have low variation, it cannot predict abrupt changes in process characteristics. The predictive performance of adaptive models therefore depends on their databases. We propose an index for monitoring a database, the database monitoring index (DMI), and a database monitoring method that uses it. The DMI is based on the similarity between two data: the more similar two data are, the smaller the DMI value. New data are stored when their minimum DMI value exceeds a threshold. Through the analysis of simulation data and real industrial data, we confirmed that the proposed method manages databases appropriately and increases the predictive accuracy of adaptive soft sensor models. © 2013 American Institute of Chemical Engineers AIChE J, 60: 160–169, 2014


Introduction

Soft sensors are widely used to predict process variables that are difficult to measure online.[1] An inferential model is constructed between the variables that are easy to measure online and those that are not, and an objective variable, y, is then predicted using that model.

Their use, however, is accompanied by some practical difficulties, one of which is the degradation of soft sensor models. The predictive accuracy of soft sensors tends to decrease gradually for several reasons, including changes in the state of the chemical plant, loss of catalyst performance, and sensor and process drift. This decrease is called the degradation of soft sensor models.

Countermeasures against the degradation of soft sensor models are strongly desired. To reduce the degradation, the model is reconstructed with the newest data. Moving window (MW) models[2, 3] and recursive models[4] are categorized as the sequentially updating type, whereas distance-based just-in-time (JIT) models,[5] correlation-based JIT models,[6] and locally weighted partial least-squares models[7] are categorized as the JIT type. For example, an MW model is constructed with the most recently measured data, and a distance-based JIT model is constructed with the data that are more similar to the prediction data than the other data are. Indexes such as the Euclidean distance and correlation are used to measure the similarity. To reconstruct MW and JIT models, new data are stored in the database and data regarded as old are deleted from it.

Problems of model reconstruction, such as the incorporation of abnormal data into the training data and an increase in maintenance costs, have been discussed, and a model based on the time difference (TD) of y and that of the explanatory variables, X, was proposed.[8-10] This model is referred to as a TD model. A TD model can handle the effects of deterioration with age, such as drift and gradual changes in the state of a plant, without being reconstructed. Models such as MW, JIT, and TD models that can predict y-values while adapting to the states of a plant are called adaptive models.[11] In addition, when data distributions are multimodal, multiple modeling approaches[12, 13] can be combined with adaptive models.

No adaptive model has high predictive ability in all process states; the prediction accuracy of each adaptive model depends on the process state.[14] Kaneko et al. categorized the degradation of soft sensor models, discussed the characteristics of adaptive models such as MW, JIT, and TD models based on this classification, and confirmed the discussion through analyses of numerical simulation data and real industrial data.[15] The predictive accuracy of TD models was high when a shift of X-values or y-values occurred, regardless of the rapidity of the degradation. Meanwhile, when the slope between X and y changes rapidly, a predictive model can be constructed by the JIT method if there is a shift of X-values. However, if there is no shift of X-values, JIT models cannot adapt to the degradation. On the other hand, MW models based on the support vector regression (SVR) method, which is a nonlinear regression method, and the time variable can adapt to rapid changes of the slope between X and y even without a shift of X-values (Kaneko and Funatsu. Adaptive soft sensor model using online support vector regression and the time variable. AIChE J. submitted).

While the appropriate use of the MW and JIT models enables soft sensors to adapt to the changes of the relationship between X and y, there remain some problems for the introduction of soft sensors into practice. One of the problems is that reconstructed models have a high tendency to specialize in predictions over a narrow data range. Subsequently, when variations in the process variables occur, these models cannot predict the resulting variations in data with a high degree of accuracy. However, if the model is not reconstructed frequently, the predictive ability of the model decreases due to the slow change of process states such as process and sensor drifts.

Therefore, in this study, we manage the database appropriately to construct adaptive models with high predictive accuracy over a wide data range. When the number of data in a database is too large, constructing MW and JIT models takes much time. Hence, it is necessary to judge whether newly measured data should be stored in the database. Data carrying much information should be included in the database, whereas data carrying little information are not required. Although new measurement data have been selected for storage in JIT modeling on the basis of the prediction error to reduce the size of the database,[16, 17] to the best of our knowledge, no research has yet managed a database on the basis of the amount of information in the data.

We are developing the database monitoring index (DMI) and a database management method that uses the DMI.[18] The DMI is an index based on the similarity between two data and is defined as the absolute difference of y divided by the similarity of X. The more similar two data are, the smaller the DMI value.

When new data are measured, the DMI-values are calculated between the new data and all data in the database. New data whose minimum DMI-value is large carry much information. Therefore, by storing only such data, the amount of information can be increased while the number of data in the database is controlled. The DMI enables a database to be managed in consideration of not only the shifts of X and y but also the change of the slope between X and y. In addition, the proposed database management method can be combined with not only adaptive soft sensor models[19, 20] but also process monitoring models[21, 22] that are updated or reconstructed with a database including new measurement data.

To verify the effectiveness of the proposed method, we analyze simulation data where the relationship between X and y is nonlinear and data variation is small for a constant time. The drift of y is also considered. Then, the proposed method is applied to real industrial data of a distillation column. The management of databases using the DMI makes it possible for adaptive soft sensor models to adapt to rapid changes of process characteristics after long states of small variations in X and y. In this study, it is assumed that there are no abnormal data and no outliers in new measurement data. In practice, therefore, abnormal data and outliers must be detected first by using multivariate statistical process control methods.[23]

Method

DMI

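The DMI proposed in this article for managing databases is defined between two data, (xi, yi) and (xj, yj), as follows

  • \[ \mathrm{DMI}_{i,j} = \frac{|y_i - y_j|^{a}}{\mathrm{sim}(\mathbf{x}_i, \mathbf{x}_j)} \tag{1} \]

where sim(xi, xj) is the similarity between xi and xj, and a is a constant. For example, the inverse of the Euclidean distance or of the Mahalanobis distance, the correlation, and the Gaussian kernel (GK) can be used as the similarity. The inverse of the Euclidean distance (iED) is given as follows

  • \[ \mathrm{iED}(\mathbf{x}_i, \mathbf{x}_j) = \frac{1}{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert} \tag{2} \]

and GK is given as follows

  • \[ \mathrm{GK}(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right) \tag{3} \]

where γ is a tuning parameter controlling the width of the kernel function. In our application, the similarity index is GK. The DMI used in this study is given as follows

  • \[ \mathrm{DMI}_{i,j} = \frac{|y_i - y_j|^{a}}{\exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right)} \tag{4} \]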

The basic concept of the DMI is shown in Figure 1. The DMI-values are large when two data are dissimilar in X and y, whereas the DMI-values are small when two data are similar in X and y. Although the dissimilarity used in process monitoring[24, 25] is calculated between one data set and another, the proposed DMI is calculated between one datum and another. Therefore, the DMI can quantify data similarity even for nonlinear and non-Gaussian processes.
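As an illustration of how the DMI between two data might be computed, the following minimal sketch assumes that the DMI is |yi − yj|^a divided by a Gaussian-kernel similarity exp(−γ‖xi − xj‖²), as described in the text. The function names, the default parameter values, and the guard against floating-point overflow for very dissimilar X are assumptions of this sketch, not part of the original method.

```python
import math


def squared_distance(x_i, x_j):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(x_i, x_j))


def dmi(x_i, y_i, x_j, y_j, a=1.0, gamma=1.0):
    """DMI between (x_i, y_i) and (x_j, y_j) with a Gaussian-kernel similarity."""
    dy = abs(y_i - y_j)
    if dy == 0.0:
        return 0.0  # identical y-values: the numerator is zero
    exponent = gamma * squared_distance(x_i, x_j)
    if exponent > 700.0:
        # Assumption of this sketch: the kernel underflows to 0 here,
        # so the pair is treated as maximally dissimilar.
        return math.inf
    # |dy|^a / exp(-exponent) is the same as |dy|^a * exp(exponent)
    return dy ** a * math.exp(exponent)


# Same x, different y: the DMI reduces to |dy|^a.
print(dmi([0.0, 0.0], 1.0, [0.0, 0.0], 1.5))
# Same y difference, but x further apart: the DMI grows.
print(dmi([0.0, 0.0], 1.0, [1.0, 0.0], 1.5))
```

Writing the division by the kernel as a multiplication by exp(γ‖xi − xj‖²) avoids dividing by a similarity that has underflowed to zero, which can happen for large γ and distant X.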

Figure 1. The basic concept of the DMI.

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The weight of y relative to X can be changed by varying the a-value in Eqs. (1) and (4). Figure 2 shows the relationship between |yi − yj| and |yi − yj|^a. When a is smaller than 1, |yi − yj|^a and hence the DMI-values become larger for small variations of y, such as process and sensor drifts. However, much care should be taken because the DMI then becomes sensitive to noise, which is also a small variation. On the other hand, when a is larger than 1, the DMI-values are small for small variations of y. Although the DMI then takes small values for a drift of y, it is not much affected by noise and is robust against it. The a-value can therefore be chosen to suit the process states and the objective of the database management.
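The effect of the a-value on the numerator of the DMI can be checked numerically. The sketch below tabulates |yi − yj|^a for a few illustrative differences; the function name and the chosen values are assumptions made only for this example.

```python
def weighted_difference(y_i, y_j, a):
    """Numerator of the DMI: the absolute difference of y raised to the power a."""
    return abs(y_i - y_j) ** a


# Rows: difference in y; columns: a = 0.5, 1, 2.
for dy in (0.01, 0.1, 1.0, 2.0):
    row = [weighted_difference(dy, 0.0, a) for a in (0.5, 1.0, 2.0)]
    print(dy, [round(v, 4) for v in row])
```

For differences smaller than 1, a = 0.5 enlarges the term and a = 2 shrinks it; the ordering reverses once the difference exceeds 1, so the scaling of y matters when choosing a.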

Figure 2. The relationships between |yi − yj| and |yi − yj|^a.[18]

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The GK used in this study is only one of the similarity indexes, and other similarity indexes can be applied to the DMI. When the number of X-variables is large, the curse of dimensionality may have to be handled.

Database management with DMI

The flow of the proposed database management is shown in Figure 3. When new data are given, the DMI-values are calculated between the new data and all data in the database. If the minimum of these DMI-values exceeds the threshold PDMI, the new data are stored in the database. If not, either the new data are not stored or the new data replace the data in the database that give the minimum DMI-value. Then, if the size of the database exceeds the upper limit, data are deleted from the database. If the current database has the necessary and sufficient data, it is reasonable to eliminate the oldest data, because the minimum DMI-values of all the data in the database exceed at least PDMI and the information loss therefore does not differ much whichever data are eliminated. In addition, the process state in which the oldest data were measured is likely to differ from the process state in which the new data were measured. For example, accurate JIT models cannot be constructed with databases that contain both data before a y-shift and data after it.[15] The data before the y-shift should be removed immediately for JIT modeling. Although this elimination of the data before the y-shift is not covered in this study, few such data will remain when the oldest data are deleted from the database.
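The update flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: the database is a plain list of (x, y) tuples ordered from oldest to newest, the similarity is the Gaussian kernel, new data below the threshold are simply discarded (the "not stored" variant rather than the replacement variant), and the names `update_database`, `p_dmi`, and `max_size` are hypothetical.

```python
import math


def dmi(new, old, a=1.0, gamma=1.0):
    """DMI between two (x, y) tuples, assuming a Gaussian-kernel similarity."""
    (x_n, y_n), (x_o, y_o) = new, old
    dy = abs(y_n - y_o)
    if dy == 0.0:
        return 0.0
    exponent = gamma * sum((p - q) ** 2 for p, q in zip(x_n, x_o))
    # Guard against overflow when the kernel similarity is effectively zero.
    return math.inf if exponent > 700.0 else dy ** a * math.exp(exponent)


def update_database(database, new, p_dmi, max_size, a=1.0, gamma=1.0):
    """Store `new` only if its minimum DMI exceeds p_dmi; return True if stored."""
    min_dmi = min((dmi(new, old, a, gamma) for old in database), default=math.inf)
    if min_dmi <= p_dmi:
        return False  # too similar to an existing datum: not stored
    database.append(new)
    if len(database) > max_size:
        del database[0]  # upper limit exceeded: delete the oldest datum
    return True


db = [([0.0], 0.0)]
print(update_database(db, ([0.0], 0.001), p_dmi=0.01, max_size=2))  # near-duplicate
print(update_database(db, ([1.0], 1.0), p_dmi=0.01, max_size=2))    # informative
print(update_database(db, ([2.0], 3.0), p_dmi=0.01, max_size=2))    # evicts the oldest
print(db)
```

The fraction of calls that return True over a run corresponds to the rate of update discussed in the case studies.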

Figure 3. The flow of monitoring database with the DMI.

The proposed DMI enables the management of databases used for the construction of not only MW models but also JIT models. In JIT modeling, some data must be selected from an enormous number of data, or weights must be assigned to an enormous number of data, by using an index such as distance or correlation. By appropriately managing the database and maintaining the amount of information, highly accurate JIT models can be constructed in less time.

In the field of structure-activity relationships, the structure-activity landscape index (SALI)[26] is well known as an index that, like Eq. (1), uses the similarity of X and that of y. The SALI is defined between two molecules p and q as follows

  • \[ \mathrm{SALI}_{p,q} = \frac{|A_p - A_q|}{1 - \mathrm{sim}(p, q)} \tag{5} \]

where Ap and Aq are the activities of p and q, respectively, and sim(p, q) is the similarity of the chemical structures of p and q. A large difference in activity and a high similarity lead to large SALI values. The SALI has achieved significant results,[27] such as the detection of activity cliffs, which are pairs of molecules whose structures are similar but whose activities differ greatly.
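To make the contrast with the DMI concrete, the sketch below evaluates the SALI in its commonly cited form, |Ap − Aq| / (1 − sim(p, q)), for the same activity difference at two similarity levels. The function name and the handling of a similarity of exactly 1 are assumptions of this example.

```python
import math


def sali(activity_p, activity_q, similarity):
    """SALI for two molecules, assuming similarity lies in [0, 1]."""
    diff = abs(activity_p - activity_q)
    if similarity >= 1.0:
        # Assumption of this sketch: identical structures give an
        # unbounded index unless the activities are also identical.
        return math.inf if diff > 0.0 else 0.0
    return diff / (1.0 - similarity)


# Same activity difference; the more similar pair gets the larger SALI.
print(sali(5.0, 7.0, 0.5))
print(sali(5.0, 7.0, 0.9))
```

In this form the SALI grows with similarity, whereas the DMI, which divides by the similarity itself, shrinks as two data become more similar in X.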

Results and Discussion

To verify the effectiveness of the proposed method, we analyzed simulation data and industrial distillation column data. The relationships between X and y have strong nonlinearity for the simulation data set.

Data in which the relationship between X and y is nonlinear

An analysis of data in which the relationship between X and y is nonlinear was performed to verify the performance of the proposed method. The relationship between X and y in the data was given as follows

  • display math(6)

Equation (6) is based on a test problem described in the literature,[28] to which the term 0.1x1 was added. Figure 4 shows the relationship among x1, x2, and y in Eq. (6). The colorbar represents y-values.

Figure 4. The relationship among x1, x2, and y for the simulation data.[18]

The colorbar represents y-values. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The X-variables were generated as random walks within ±3. Random numbers from the normal distribution with a mean of 0 and a standard deviation of 0.01 were added to the y-variable. The time plots of x1, x2, and y without the drift of y for the simulation data are shown in Figure 5. From time 786 to 986, the data variation comes only from noise, as can be seen in Figure 5. The first 100 data were used for training and the next 1200 data were used as test data.
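One way such inputs could be generated is sketched below. The text states only that the X-variables are random walks within ±3 and that the noise on y has a standard deviation of 0.01; the Gaussian step with standard deviation 0.1, the clipping at the bounds, the seeds, and the function names are assumptions of this sketch. The function relating X to y is not reproduced here, and the low-variation interval from time 786 to 986 is not modeled.

```python
import random


def bounded_random_walk(n_steps, bound=3.0, step_sd=0.1, seed=0):
    """Random walk starting at 0 and clipped to [-bound, bound]."""
    rng = random.Random(seed)
    value, walk = 0.0, []
    for _ in range(n_steps):
        value += rng.gauss(0.0, step_sd)
        value = min(bound, max(-bound, value))  # keep the walk inside the bounds
        walk.append(value)
    return walk


def add_measurement_noise(y_values, noise_sd=0.01, seed=1):
    """Add zero-mean Gaussian noise to a sequence of y-values."""
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, noise_sd) for y in y_values]


# 100 training data followed by 1200 test data, as in the text.
x1 = bounded_random_walk(1300, seed=0)
x2 = bounded_random_walk(1300, seed=1)
x1_train, x1_test = x1[:100], x1[100:]
print(len(x1_train), len(x1_test), min(x1) >= -3.0, max(x1) <= 3.0)
```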

Figure 5. The time plots of x1, x2, and y without the drift of y for the simulation data.[18]

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

In this study, two cases were assumed: no drift (Case 1) and linear time-varying drift (Case 2). In Case 2, a drift with a weight of 0.01 times the standard deviation of y was added to y.

Case 1: No drift. The γ-value in Eq. (4) was optimized by 5-fold cross validation using SVR[29] with the GK. The DMI-values were calculated using Eq. (4) with the γ-value of 25, and the database was updated according to the flow in Figure 3. The upper limit of the number of data in the database was set to 50, and old data were deleted automatically. We set the a-value to 0.5, 1, and 2 while changing the PDMI-value from 0 to 1 in steps of 0.0001, and then calculated the ratio of database updates. Figure 6 shows the results. As the PDMI-value became larger, the ratio of database updates became lower. Additionally, the relationship between the PDMI-value and the ratio of database updates changed with the a-value.

Figure 6. The relationship between PDMI and rate of update when no drift of y existed for the simulation data.[18]

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Then we predicted y-values with the online support vector regression (OSVR) model (Kaneko and Funatsu. Adaptive soft sensor model using online support vector regression and the time variable. AIChE J. submitted),[30] which is one of the adaptive models. The details of OSVR are given in Appendix A. To account for the measurement time of y, it was assumed that a y-value is obtained after a delay of 5 time steps, whereas X-values are available in real time. When a y-value is obtained, the DMI is used to judge whether the new data should be stored in the database (Figure 3).

Table 1 shows the prediction results of the OSVR models for different PDMI-values when a equals 1. The rp2 is the coefficient of determination, r2, for the test data, and RMSEP is the root-mean-square error (RMSE) for the test data. Larger values of rp2 and smaller values of RMSEP indicate higher predictive accuracy of the model. According to Table 1, the models with lower update frequencies tended to have higher predictive ability than the model updated every time (PDMI is 0). However, when the PDMI-value exceeded 0.07, rp2 decreased and RMSEP increased. When PDMI was not too large, only new data dissimilar to the data in the database were stored, whereas new data similar to the data in the database were not stored; hence, informative data in the database were not removed. Therefore, the variety of the database increased, and the OSVR model constructed with that database could predict a wide data range of y with high accuracy. On the other hand, when PDMI was too large, so few new data were stored in the database that the OSVR model could not adapt to new process states. Thus, an appropriate PDMI-value exists for database management with the DMI.
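The two accuracy measures can be computed as in the sketch below. It assumes that rp2 is defined as one minus the ratio of the residual sum of squares to the total sum of squares on the test data; the source does not state the formula, but this definition can take negative values, as one entry of Table 1 does. The function names are hypothetical.

```python
import math


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction on test data."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_prediction(y_true, y_pred):
    """Coefficient of determination on test data (assumed form: 1 - SSres / SStot)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_test = [1.0, 2.0, 3.0]
print(r2_prediction(y_test, [1.0, 2.0, 3.0]), rmsep(y_test, [1.0, 2.0, 3.0]))  # perfect
print(r2_prediction(y_test, [2.0, 2.0, 2.0]), rmsep(y_test, [2.0, 2.0, 2.0]))  # mean only
print(r2_prediction(y_test, [3.0, 2.0, 1.0]))  # worse than the mean: negative
```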

Table 1. The Prediction Results for Each PDMI when No Drift of y Existed for the Simulation Data when a Equals 1[18]
PDMI      Rate of update    rp2        RMSEP
0         1.0000            0.9962     0.0378
0.0001    0.9775            0.9964     0.0367
0.0002    0.9517            0.9961     0.0381
0.0005    0.8975            0.9972     0.0323
0.0007    0.8675            0.9978     0.0288
0.001     0.8317            0.9976     0.0300
0.002     0.7667            0.9975     0.0307
0.005     0.6500            0.9971     0.0330
0.007     0.5900            0.9976     0.0301
0.01      0.5200            0.9970     0.0333
0.02      0.3733            0.9975     0.0305
0.05      0.1675            0.9968     0.0343
0.07      0.1200            0.9980     0.0270
0.1       0.0825            0.9930     0.0508
0.2       0.0358            0.9902     0.0603
0.5       0.0108            0.8789     0.2121
0.7       0.0058            0.7869     0.2814
5         0.0000            −0.1407    0.6510

The result for the PDMI-value of 0.07, where the rp2-value was maximum and the RMSEP-value was minimum, showed that high prediction accuracy was achieved with an update rate of only 12.0% by appropriately selecting the data to be stored in the database. The data required to represent the nonlinear relationship between X and y could be properly selected by using the proposed DMI.

Figure 7 shows the time plots of simulated and predicted y from time 950 to 1050 when no drift of y existed. When the model was updated every time, the prediction errors were large around time 990, when the abrupt process change happened (Figure 7a). This is because the model was constructed only with data of small variation and specialized in that state; thus, the model could not adapt to the subsequent abrupt change. Meanwhile, when the PDMI-value was 0.001 and model updates with data of small variation were avoided, the model could adapt to the abrupt variation around time 990 (Figure 7b). Moreover, Figure 7c shows that high prediction accuracy was also achieved with the model update frequency of 12.0% (PDMI is 0.07). We confirmed that appropriately selecting the data to be stored in the database with the proposed DMI makes it possible to predict a wide range of y with high accuracy.

Case 2: Existing drift. The drift was added only to the y-variable, and we conducted the same case study as in Case 1. The relationship between PDMI and the rate of update is shown in Figure 8. As in Case 1, the rate of update decreased as PDMI increased, and the relationship between PDMI and the rate of update changed with the a-value. Figure 8 is very similar to Figure 6 because the only difference between the data of Case 1 and Case 2 is the drift added to the y-variable.

Figure 7. The time plots of simulated and predicted y from 950 to 1050 when no drift of y existed for the simulation data.[18]

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 8. The relationship between PDMI and rate of update when drift of y existed for the simulation data.[18]

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Table 2 shows the prediction results of the OSVR models. Even when the drift of y existed, the rp2-value increased and the RMSEP-value decreased by selecting the threshold PDMI appropriately. When the rp2-value was maximum and the RMSEP-value was minimum, the PDMI-value was 0.002 and the rate of update was 79.9%. The rate increased compared with that of no drift (Case 1) since the OSVR model had to adapt frequently to the change of the relationship between X and y due to the drift.

Table 2. The Prediction Results for Each PDMI when Drift of y Existed for the Simulation Data when a Equals 1[18]
PDMI      Rate of update    rp2       RMSEP
0         1.0000            0.9958    0.0416
0.0001    0.9850            0.9957    0.0424
0.0002    0.9642            0.9956    0.0427
0.0005    0.9233            0.9955    0.0432
0.0007    0.9025            0.9959    0.0413
0.001     0.8675            0.9966    0.0374
0.002     0.7992            0.9971    0.0344
0.005     0.6817            0.9971    0.0344
0.007     0.6208            0.9969    0.0359
0.01      0.5633            0.9964    0.0385
0.02      0.4233            0.9965    0.0382
0.05      0.2208            0.9958    0.0416
0.07      0.1725            0.9959    0.0413
0.1       0.1258            0.9954    0.0434
0.2       0.0675            0.9898    0.0649
0.5       0.0267            0.9708    0.1099
0.7       0.0183            0.9717    0.1083
5         0.0000            0.8773    0.2253

Figure 9 shows the time plots of simulated and predicted y from time 950 to 1050 when the PDMI-values are 0, 0.002, and 0.07. When the model was updated every time, it adapted to the data of small variation and could not accurately predict the y-values during the rapid variation (Figure 9a). Meanwhile, Figures 9b and 9c show that, when the DMI was used to select the data for the model update, the updated model could predict y-values with high accuracy not only during the small variation but also during the subsequent rapid variation. Even with the drift of y, appropriate database management could be achieved using the proposed DMI.

Figure 9. The time plots of simulated and predicted y from 950 to 1050 when drift of y existed for the simulation data.[18]

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

In these case studies, we changed the PDMI-values and checked the predictive accuracy of the updated model, but in practice, an adequate PDMI-value must be set beforehand when managing a database with the DMI.

Distillation column data

We analyzed data obtained from the operation of a distillation column at the Mizushima Works, Mitsubishi Chemical Corporation. Figure 10 shows a schematic representation of the distillation column, and Table 3 shows the process variables. The y-variable is the concentration of the bottom product with the lowest boiling point, and the X-variables are the 19 variables given in Table 3. The y-variable was measured every 30 min, whereas the X-variables were measured every minute. The OSVR method was used as the regression method.

Figure 10. Schematic representation of the distillation column.

Table 3. Process Variables Measured in the Distillation Column
Variable    Symbol      Process variable
y           A           Bottom product concentration
x1          F1          Reflux flow
x2          F2          Reboiler flow
x3          F3          Feed 1 flow
x4          F4          Feed 2 flow
x5          F5          Bottom flow
x6          F6          Top flow
x7          L1          Liquid level
x8          P1          Pressure 1
x9          P2          Pressure 2
x10         T1          Temperature 1
x11         T2          Temperature 2
x12         T3          Temperature 3
x13         T4          Temperature 4
x14         T5          Bottom temperature
x15         T6          Feed 1 temperature
x16         T7          Feed 2 temperature
x17         T8          Top temperature
x18         F4/F3=R     Reflux ratio
x19         F1/F6=F     Feed flow ratio

We collected 42 days of data starting from January 1, 2003. Figure 11 shows the time plot of y. The data of the first week were used as training data, and the remaining data were used as test data.

Figure 11. The time plot of y for the distillation column data.

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The relationships between PDMI and the rate of update are shown in Figure 12. The relationships changed with the a-value. Although Figures 6 and 8 are very similar, Figure 12 differs from them because the data sets are completely different and the numbers of X-variables also differ. Table 4 shows the prediction results of the OSVR model when a equals 1. The rp2-values tended to increase and the RMSEP-values tended to decrease as the PDMI-value increased from 0. However, when the PDMI-value exceeded 0.005, the rp2-values decreased and the RMSEP-values increased. For small PDMI-values, the variety of the database and the prediction accuracy of the OSVR model increased because only informative new data were stored in the database. However, when the PDMI-value was too large, few new data were stored, and hence the model could not be updated appropriately. The results for the PDMI-value of 0.005, where the rp2-value was maximum and the RMSEP-value was minimum, confirmed that the model updated with only about 67% of the new data could predict y-values with the highest accuracy.

Figure 12. The relationship between PDMI and rate of update for the distillation column data.

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Table 4. The Prediction Results for Each PDMI for the Distillation Column Data when a Equals 1
PDMI      Rate of update    rp2       RMSEP
0         1.0000            0.9572    0.3505
0.0001    0.9780            0.9573    0.3504
0.0005    0.9036            0.9574    0.3498
0.001     0.8405            0.9561    0.3551
0.005     0.6690            0.9580    0.3476
0.01      0.6274            0.9568    0.3524
0.05      0.2946            0.9538    0.3644
0.1       0.1720            0.9297    0.4493
0.5       0.0179            0.9401    0.4149

Figures 13 and 14 show the time plots of measured and predicted y. When the OSVR model was updated every time, the prediction errors were large in the relatively large variations that followed the small variations (see around time 770 in Figure 13a and around time 1160 in Figure 14a). The model that had specialized in the small variations could not adapt to the rapid variations that occurred subsequently. Meanwhile, when the database was managed with the DMI according to the flow in Figure 3, the y-values were accurately predicted even in the rapid variations after the stable states (Figures 13b and 14b). It was confirmed that the database can be appropriately managed with the DMI, which enables soft sensor models to predict y-values over a wide data range with high accuracy.

Figure 13. The time plots of measured and predicted y from 740 to 800 for the distillation column data.

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 14. The time plots of measured and predicted y from 1140 to 1180 for the distillation column data.

[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Conclusion

We proposed the DMI, an index for monitoring databases, and a method that uses the DMI to manage databases for adaptive soft sensors. Whether newly measured data are stored in the database is decided on the basis of the DMI-values of the new data. By storing only informative data, the amount of information increases while the number of data is controlled. Through the analysis of the simulation data and the real industrial data, we confirmed that the database can be appropriately managed with the DMI and that, accordingly, both the predictive accuracy of the soft sensor models and the data range over which y-values can be accurately predicted increase.

Although only MW models were used as adaptive soft sensor models in this article, proper database management is also required for JIT models.[31] In JIT modeling, the data similar to the prediction data are selected, or the data are weighted so that data more similar to the prediction data receive larger weights. JIT models cannot adapt to the latest state of a plant unless new data are stored in the database. However, when the number of data in the database is too large, constructing JIT models takes too much time. JIT models can be operated appropriately by using the DMI to select the data that should be stored in the database (Figure 3).

In the case studies, we set the a-value to 1, changed the PDMI-value, and checked the performance of the soft sensor models. In practice, appropriate a- and PDMI-values must be set beforehand; how to set them is left for future work. Additionally, the sensitivity of the predictive ability of a soft sensor model to these parameters should be investigated. In our case studies, all the initial training data were stored in the database, but the DMI-values among the training data can be calculated so that data carrying little information are eliminated from the training data. In this way, a compact database that retains a large amount of information can be maintained.
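The idea of thinning the initial training data with the DMI could be realized in several ways; the text leaves the procedure open. The sketch below shows one possible greedy scheme, in which the training data are scanned in order and a datum is kept only if its minimum DMI to the data already kept exceeds the threshold. The scheme itself, the Gaussian-kernel similarity, and the names `prune_training_data` and `p_dmi` are assumptions of this example.

```python
import math


def dmi(d_i, d_j, a=1.0, gamma=1.0):
    """DMI between two (x, y) tuples, assuming a Gaussian-kernel similarity."""
    (x_i, y_i), (x_j, y_j) = d_i, d_j
    dy = abs(y_i - y_j)
    if dy == 0.0:
        return 0.0
    exponent = gamma * sum((p - q) ** 2 for p, q in zip(x_i, x_j))
    # Guard against overflow when the kernel similarity is effectively zero.
    return math.inf if exponent > 700.0 else dy ** a * math.exp(exponent)


def prune_training_data(data, p_dmi, a=1.0, gamma=1.0):
    """Greedily keep only data whose minimum DMI to the kept data exceeds p_dmi."""
    kept = []
    for datum in data:
        min_dmi = min((dmi(datum, k, a, gamma) for k in kept), default=math.inf)
        if min_dmi > p_dmi:
            kept.append(datum)
    return kept


training = [([0.0], 0.0), ([0.0], 0.001), ([1.0], 1.0)]
print(prune_training_data(training, p_dmi=0.01))  # the near-duplicate is dropped
```

The result of this greedy pass depends on the order of the data, which is one reason the choice of pruning procedure would need to be examined before use.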

In this article, we assumed that there are no abnormal data and no outliers in new measurement data; however, abnormal data and outliers deteriorate the predictive ability of adaptive models. In practice, fault detection[23] is therefore an essential task.

We believe that by managing the database and increasing the predictive accuracy of adaptive soft sensor models, chemical plants will be operated effectively and stably.

Acknowledgments

H. Kaneko is grateful for the financial support of the Japan Society for the Promotion of Science (JSPS) through a Grant-in-Aid for Young Scientists (B) (No. 24760629). The authors acknowledge the support of Mizushima Works, Mitsubishi Chemical Corporation, and the financial support of the Mizuho Foundation for the Promotion of Sciences.

Appendix: A

OSVR

The SVR method applies a support vector machine (SVM) to regression analysis and, like the SVM, can be used to construct nonlinear models by applying the kernel trick. The OSVR method efficiently updates an SVR model so that it meets the Karush–Kuhn–Tucker (KKT) conditions that the SVR model must fulfill when training data are added or deleted.

The primal form of SVR can be shown to be the following optimization problem.

Minimize

  • display math(A1)

where yi and xi are training data, f is an SVR model, w is a weight vector, ε is a threshold, and C is a penalizing factor that controls the trade-off between model complexity and training errors. The second term of Eq. A1 is the ε-insensitive loss function and is given as follows

  • display math(A2)
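For reference, the ε-insensitive loss in its standard form, max(0, |y − f(x)| − ε), can be written as the short sketch below. The standard form is assumed here because the equation itself is not reproduced in this text; the function name is hypothetical.

```python
def epsilon_insensitive_loss(y, f_x, epsilon):
    """Zero inside the epsilon-tube, linear in the absolute error outside it."""
    return max(0.0, abs(y - f_x) - epsilon)


print(epsilon_insensitive_loss(1.0, 1.05, 0.1))  # error within the tube: no penalty
print(epsilon_insensitive_loss(1.0, 1.5, 0.1))   # error outside the tube: penalized
```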

Through the minimization of Eq. A1, we can construct a regression model that has good balance between generalization capabilities and the ability to adapt to the training data. A y-value predicted by inputting data x is represented as follows

  • display math(A3)

where N is the number of training data, b is a constant term, and K is a kernel function. The kernel function in our application is a radial basis function

  • display math(A4)

where γ is a tuning parameter controlling the width of the kernel function. From Eqs. A1 and A2, αi and αi* in Eq. A3 are obtained by minimizing the equation given as

  • display math(A5)

subject to

  • display math(A6)
  • display math(A7)

Kij in Eq. A5 is represented as follows

  • display math(A8)

Now, we define θi as follows

  • display math(A9)

From Eqs. A3, A4, and A8, a predicted y-value of data xi is given as

  • display math(A10)

where θi meets the following equation

  • display math(A11)

The error function h is defined as

  • display math(A12)

Then the KKT conditions can be summarized as follows

  • display math(A13)
  • display math(A14)
  • display math(A15)
  • display math(A16)
  • display math(A17)

Each training datum must meet one of Eqs. A13–A17. All training data can be divided into the following sets: error support vectors, E, which meet Eq. A13 or A17, margin support vectors, S, which meet Eq. A14 or A16, and remaining vectors, R, which meet Eq. A15.
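The division into E, S, and R can be sketched in code. The example assumes the formulation commonly used for accurate online SVR, in which the coefficient θi has magnitude C for error support vectors, lies strictly between 0 and C in magnitude for margin support vectors, and is zero for remaining vectors; the KKT equations themselves are not reproduced in this text, so this correspondence, the tolerance, and the function name are assumptions.

```python
def classify_training_datum(theta, c, tol=1e-9):
    """Assign a training datum to E, S, or R from its coefficient theta."""
    magnitude = abs(theta)
    if magnitude <= tol:
        return "R"  # remaining vector: does not contribute to the model
    if magnitude >= c - tol:
        return "E"  # error support vector: coefficient at the bound C
    return "S"      # margin support vector: coefficient strictly inside the bounds


C = 10.0
for theta in (0.0, -3.5, 10.0, -10.0):
    print(theta, classify_training_datum(theta, C))
```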

When new data xc, yc are added, there is no need to update the SVR model θi, b if xc belongs to R. On the other hand, if xc belongs to E or S, the initial value of θc, which is the θi corresponding to xc, is set to 0, and θc, θi, and b are gradually changed to meet the KKT conditions. These changes may cause some training data to move to another region. Assuming no such movements, however, the variations of h(xi), θc, θi, and b, namely Δh(xi), Δθc, Δθi, and Δb, respectively, can be represented from Eqs. A11 and A12 as follows

  • display math(A18)
  • display math(A19)

The θi-values of the training data belonging to E and R do not change because of Eqs. A13, A15, and A17, and thus, Eq. A18 can be transformed as

  • display math(A20)

The h(xi)-values of the training data belonging to S are fixed because of Eqs. A14 and A16. Thus, Eqs. A19 and A20 can be rewritten as

  • display math(A21)
  • display math(A22)

Then, Δθc, Δθi and Δb can be represented as

  • display math(A23)
  • display math(A24)

where

  • display math(A25)
  • display math(A26)

Here M is the number of the training data that belong to S. From Eqs. A20, A23, and A24, h(xi) for the training data belonging to E and R can be transformed as

  • display math(A27)

where

  • display math(A28)

From Eqs. A24 and A27, Δθc for the movement of each training data is represented as

  • display math(A29)
  • display math(A30)

The absolute Δθi-values required for each training datum to move from its current region to another region, i.e., from E to S, from S to E or R, and from R to S, are calculated by using Eqs. A29 and A30. The minimum of the absolute Δθi-values calculated over all training data is selected, and the datum having this minimum value is actually moved to a new region. The calculation of the absolute Δθi-values and the movement of the datum having the minimum absolute Δθi-value are repeated until every training datum meets the KKT conditions, namely, one of Eqs. A13–A17. When a datum is deleted from the training data, the same iterative calculation is performed until all the data meet the KKT conditions.

Literature Cited
