2.1. Support Vector Regression
The support vector machine (SVM), proposed by Vapnik et al.,14 is a statistical learning approach based on the structural risk minimization principle. In recent years, the SVM has been widely regarded as a state-of-the-art technique for a variety of learning, classification, regression and prediction problems. Support vector regression (SVR), the regression version of SVM, solves nonlinear regression problems by means of kernel functions and often compares favorably with other pattern recognition and regression approaches. SVR has been reported to exhibit a number of significant advantages, such as excellent learning performance on small samples, good generalization ability, small errors and high calculation accuracy.15 At present, it has become a focus of machine learning research and is extensively employed in a wide range of real-world problems.16–30
Suppose a sample is described by (x, y), where x represents the independent variable and y the dependent response. In general, the relationship between y and x is complex and highly nonlinear. The basic idea of SVR is to map x from the original space X into a higher dimensional feature space F via a nonlinear mapping function ϕ(x), and then to conduct a linear regression in F. Therefore, the purpose of SVR is to find the linear function of Equation (1) based on a given training dataset {(x1,y1), …, (xn,yn)}:
$$f({\bf x}) = {\bf w} \cdot \phi({\bf x}) + b$$ (1)
where w is the vector of regression coefficients and b is a bias. They are estimated by minimizing the regularized risk function R(C), namely:
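$$R(C) = \frac{1}{2}\left\| {\bf w} \right\|^{2} + C\sum\limits_{i = 1}^{n} L_{\varepsilon}\left( f({\bf x}_{i}) - y_{i} \right)$$ (2)
$$L_{\varepsilon}\left( f({\bf x}) - y \right) = \begin{cases} 0, & \left| f({\bf x}) - y \right| \le \varepsilon \\ \left| f({\bf x}) - y \right| - \varepsilon, & {\rm otherwise} \end{cases}$$ (3)
where n is the number of training samples, C is a regularization factor, ε is a prescribed parameter controlling the tolerance to error, and (1/2)||w||2 is used as a measurement of function flatness. The second term, $C\sum_{i = 1}^{n} L_{\varepsilon}( f({\bf x}_{i}) - y_{i})$, is the so-called empirical risk, measured by the ε-insensitive loss function Lε(f(xi)–yi), which does not penalize errors below ε.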
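The ε-insensitive loss and the regularized risk described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes that the loss is summed (rather than averaged) over the training samples before being weighted by C.

```python
def eps_insensitive_loss(residual, eps):
    """Return 0 inside the eps-tube, otherwise the excess |residual| - eps."""
    return max(0.0, abs(residual) - eps)


def regularized_risk(w, residuals, C, eps):
    """Flatness term (1/2)||w||^2 plus C times the eps-insensitive loss.

    Assumption: the loss is summed, not averaged, over the samples.
    residuals[i] stands for f(x_i) - y_i.
    """
    flatness = 0.5 * sum(wi * wi for wi in w)
    empirical = sum(eps_insensitive_loss(r, eps) for r in residuals)
    return flatness + C * empirical


# Toy numbers only: one residual inside the tube, one outside it.
print(eps_insensitive_loss(0.5, eps=1.0))   # inside the tube, not penalized
print(eps_insensitive_loss(-3.0, eps=1.0))  # outside the tube, penalized by the excess
print(regularized_risk([3.0, 4.0], [0.5, -3.0], C=10.0, eps=1.0))
```

A larger C puts more weight on the errors outside the tube, whereas a smaller C favors a flatter function.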
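In order to control function complexity and regression errors according to the desired precision, the slack variables ξi and ξi* are introduced to deal with the data points that fall outside the ε-insensitive zone of Equation (3). Equations (2) and (3) can then be transformed into the primal problem:
$$\begin{array}{ll} \min\limits_{{\bf w},b,\xi,\xi^{*}} & \frac{1}{2}\left\| {\bf w} \right\|^{2} + C\sum\limits_{i = 1}^{n} \left( \xi_{i} + \xi_{i}^{*} \right) \\ {\rm s.t.} & y_{i} - {\bf w} \cdot \phi({\bf x}_{i}) - b \le \varepsilon + \xi_{i}, \\ & {\bf w} \cdot \phi({\bf x}_{i}) + b - y_{i} \le \varepsilon + \xi_{i}^{*}, \\ & \xi_{i},\ \xi_{i}^{*} \ge 0,\quad i = 1, \ldots, n \end{array}$$ (4)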
In Equation (4), the first term increases the smoothness of the regression function to improve its generalization ability and the second term reduces errors. The regularization factor C is a positive constant, determining the tradeoff between the training error and the model flatness.
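In order to obtain w and b in Equation (1), the Lagrange function is built:
$$\begin{array}{l} L = \frac{1}{2}\left\| {\bf w} \right\|^{2} + C\sum\limits_{i = 1}^{n} \left( \xi_{i} + \xi_{i}^{*} \right) - \sum\limits_{i = 1}^{n} \alpha_{i}\left( \varepsilon + \xi_{i} - y_{i} + {\bf w} \cdot \phi({\bf x}_{i}) + b \right) \\ \quad - \sum\limits_{i = 1}^{n} \alpha_{i}^{*}\left( \varepsilon + \xi_{i}^{*} + y_{i} - {\bf w} \cdot \phi({\bf x}_{i}) - b \right) - \sum\limits_{i = 1}^{n} \left( \eta_{i}\xi_{i} + \eta_{i}^{*}\xi_{i}^{*} \right) \end{array}$$ (5)
where αi, αi*, ηi and ηi* are non-negative Lagrange multipliers to be solved. Only the nonzero values of αi and αi* contribute to the regression, and their corresponding samples are known as support vectors (SVs). The saddle point of Equation (5) is obtained by setting the partial derivatives with respect to w, b, ξi and ξi* equal to zero:
$$\begin{array}{l} \partial L/\partial {\bf w} = {\bf w} - \sum\limits_{i = 1}^{n} \left( \alpha_{i} - \alpha_{i}^{*} \right)\phi({\bf x}_{i}) = 0, \\ \partial L/\partial b = \sum\limits_{i = 1}^{n} \left( \alpha_{i}^{*} - \alpha_{i} \right) = 0, \\ \partial L/\partial \xi_{i} = C - \alpha_{i} - \eta_{i} = 0, \\ \partial L/\partial \xi_{i}^{*} = C - \alpha_{i}^{*} - \eta_{i}^{*} = 0 \end{array}$$ (6)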
Substituting Equation (6) into Equation (5), the dual optimization problem can be written as:
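$$\begin{array}{ll} \min\limits_{\alpha,\alpha^{*}} & Q = \frac{1}{2}\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \left( \alpha_{i} - \alpha_{i}^{*} \right)\left( \alpha_{j} - \alpha_{j}^{*} \right)\phi({\bf x}_{i}) \cdot \phi({\bf x}_{j}) + \varepsilon\sum\limits_{i = 1}^{n} \left( \alpha_{i} + \alpha_{i}^{*} \right) - \sum\limits_{i = 1}^{n} y_{i}\left( \alpha_{i} - \alpha_{i}^{*} \right) \\ {\rm s.t.} & \sum\limits_{i = 1}^{n} \left( \alpha_{i} - \alpha_{i}^{*} \right) = 0,\quad 0 \le \alpha_{i},\ \alpha_{i}^{*} \le C,\quad i = 1, \ldots, n \end{array}$$ (7)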
Therefore, SVR reduces the function regression problem to a quadratic programming problem. By minimizing Q, the vector w can be written in terms of the Lagrange multipliers and training samples as:
$${\bf w} = \sum\limits_{i = 1}^{l} \left( \alpha_{i} - \alpha_{i}^{*} \right)\phi({\bf x}_{i})$$ (8)
where l is the number of SVs. Finally, the linear Equation (1) has the following explicit form:
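$$f({\bf x}) = \sum\limits_{i = 1}^{l} \left( \alpha_{i} - \alpha_{i}^{*} \right)k({\bf x},{\bf x}_{i}) + b$$ (9)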
In Equation (9), k(x,xi) = ϕ(x)•ϕ(xi) is a kernel function. Several types of kernel functions are commonly used, such as the linear, radial basis, polynomial and sigmoid kernels. In this study, the sigmoid kernel of Equation (10), with kernel parameters α and β, is used as the kernel function of the SVR because it tends to achieve better performance.
$$k({\bf x},{\bf x}_{i}) = \tanh\left( \alpha\,{\bf x} \cdot {\bf x}_{i} + \beta \right)$$ (10)
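As an illustration of how the explicit form of the regression function is evaluated, the following Python sketch computes a prediction from a set of support vectors. It assumes the common tanh form of the sigmoid kernel with parameters α and β (named `k_alpha` and `k_beta` here to avoid confusion with the Lagrange multipliers); all function names and the toy numbers are hypothetical and are not taken from the fitted model.

```python
import math


def sigmoid_kernel(x, xi, k_alpha, k_beta):
    """Sigmoid kernel, assumed here to be tanh(k_alpha * <x, xi> + k_beta)."""
    dot = sum(a * b for a, b in zip(x, xi))
    return math.tanh(k_alpha * dot + k_beta)


def svr_predict(x, support_vectors, dual_coefs, b, k_alpha, k_beta):
    """Evaluate f(x) as a kernel expansion over the support vectors plus b.

    dual_coefs[i] stands for (alpha_i - alpha_i*) of the i-th support vector.
    """
    expansion = sum(
        coef * sigmoid_kernel(x, sv, k_alpha, k_beta)
        for coef, sv in zip(dual_coefs, support_vectors)
    )
    return expansion + b


# Toy example with two hypothetical support vectors.
svs = [[1.0, 0.0], [0.0, 1.0]]
coefs = [2.0, -1.0]
print(svr_predict([1.0, 1.0], svs, coefs, b=0.5, k_alpha=1.0, k_beta=0.0))
```

Only the support vectors enter the sum, so the cost of a prediction grows with the number of SVs rather than with the size of the training set.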
2.2. Choosing of SVR Parameters with PSO
The particle swarm optimization (PSO) method, proposed by Kennedy and Eberhart,31 is a population-based optimization technique motivated by the social behavior of organisms such as bird flocking and fish schooling. The generalization ability of SVR depends strongly on four parameters, i.e., ε of the ε-insensitive loss function, the regularization constant C, and the kernel function parameters α and β. Searching for the optimal parameters (ε, C, α, β) is therefore a key step in building an SVR model. In this study, PSO was utilized for this search, and the mean absolute percentage error (MAPE), which directly reflects the modeling performance of SVR, was chosen as the fitness function:
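$${\rm MAPE} = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100\%$$ (11)
where n denotes the number of training samples, yi represents the actual measured value and $\hat{y}_{i}$ is the estimated value for the ith training sample.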
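The parameter search described above can be sketched as a basic global-best PSO that minimizes a MAPE fitness. This is a minimal illustration and not the implementation used in this study: the inertia weight, acceleration constants, swarm size, bounds and the toy linear model are all assumptions, and in the actual setting the fitness would train an SVR with the candidate (ε, C, α, β) and return its training MAPE.

```python
import random


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def pso_minimize(fitness, bounds, n_particles=20, n_iter=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns (best position, best value, best-value history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull towards personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


# Toy stand-in for the SVR: fit slope and intercept of y = 2x + 1 by MAPE.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
best, best_val, history = pso_minimize(
    lambda p: mape(ys, [p[0] * x + p[1] for x in xs]),
    bounds=[(-5.0, 5.0), (-5.0, 5.0)],
)
print(best, best_val)
```

Because the global best is replaced only when a particle improves on it, the recorded best fitness never increases from one iteration to the next.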
2.3. Dataset and Modeling Method
The training and test sets used in this study were generated by Liu et al.7 from data originally taken from ref. 32, and comprise 32 polymethacrylates. Table 1 lists the Tg values and the values of six related quantum chemical descriptors (|L-1.356|, Etotal, qC6, α, q− and Etherm) for the 25 training samples and seven test samples. Here, L is the side-chain length, i.e., the distance from the ester O5 to its furthest atom on the side chain R6 (see Figure 1), where R6 = CnH2n+1 (n is the number of carbon atoms in the alkyl side chain of the polymethacrylates, n = 1, 2, 3, 4, 6, 8, 10, 12, 14, 16). After analyzing the relationship between the side-chain length L and Tg, Liu et al.7 found that Tg declines to its lowest point at L = 1.356 nm (n = 10), where the free volume and the intermolecular attraction strike a balance. Thus, L = 1.356 nm is taken as the side-chain length at which Tg is lowest, and the descriptor |L-1.356| measures the deviation from this length irrespective of whether the side chain is longer or shorter than 1.356 nm.7 Etotal is the total energy of the macromolecule, qC6 is the net charge of the carbon atom C6 connected directly to the ester O5, α is the molecular average polarizability, q− is the net charge of the most negative atom and Etherm is the thermal energy of the polymethacrylates. All the quantum chemical descriptors were calculated directly from the structure of the monomer with the Gaussian 03 program at the DFT/B3LYP/6-31G(d) level with the keywords OPT, POLAR and FREQ, and each optimized structure was characterized as a true local energy minimum on the potential energy surface, without imaginary frequencies. In this study, the polymethacrylates were represented by their repeating units end-capped by hydrogen atoms to calculate the molecular descriptors. The detailed definitions and calculations of the above quantum chemical descriptors are also given in ref. 7. In addition, five other independent samples, tabulated in Table 3, were taken from the literature.2, 8
Table 1. Quantum chemical descriptors and glass transition temperature Tg for 32 polymethacrylates.7

| No. | Polymer | \|L-1.356\| [nm] | Etotal [au] | qC6 [au] | q− [au] | α [au] | Etherm [kJ · mol−1] | Exp. Tg [K] |
|---|---|---|---|---|---|---|---|---|
| 1 | Poly(4-cyanophenyl-methacrylate) | 0.68 | −631 | 0.3604 | −0.5274 | 125.33 | 555.98 | 428 |
| 2 | Poly(phenyl methacrylate) | 0.83 | −538.75 | 0.3562 | −0.5336 | 104.34 | 550.12 | 407 |
| 3a) | Poly(tert-butyl methacrylate) | 1.02 | −464.97 | 0.3118 | −0.4964 | 89.6 | 642.01 | 380 |
| 4 | Poly(4-methoxycarbonylphenyl-methacrylate) | 0.54 | −766.4 | 0.3561 | −0.5273 | 139.68 | 680.28 | 379 |
| 5 | Poly(methyl methacrylate) | 1.15 | −347.02 | −0.2169 | −0.4805 | 58.09 | 410.71 | 373 |
| 6 | Poly(4-tertbutylphenyl methacrylate) | 0.6 | −696.01 | 0.3444 | −0.5318 | 152.5 | 866.35 | 371 |
| 7 | Poly(2-chloroethyl-methacrylate) | 0.85 | −845.93 | −0.028 | −0.4787 | 79.26 | 466.24 | 365 |
| 8a) | Poly(2-hydroxyethyl-methacrylate) | 0.96 | −461.54 | −0.0555 | −0.609 | 72.93 | 502.01 | 358 |
| 9 | Poly(cyclohexyl-methacrylate) | 0.93 | −542.39 | 0.1318 | −0.4913 | 108.94 | 742.84 | 356 |
| 10 | Poly(1,1,1-trifluoro-2-propyl-methacrylate) | 1 | −723.37 | −0.0032 | −0.4758 | 79.55 | 510.31 | 354 |
| 11a) | Poly(isopropyl methacrylate) | 1.02 | −425.65 | 0.1303 | −0.4859 | 79.49 | 565.94 | 354 |
| 12 | Poly(2-hydroxypropyl-methacrylate) | 0.9 | −500.86 | −0.0369 | −0.6099 | 83.17 | 581.1 | 349 |
| 13 | Poly(2-cyanoethyl-methacrylate) | 0.88 | −478.57 | −0.0328 | −0.4765 | 81.35 | 487.92 | 347 |
| 14 | Poly(ethyl methacrylate) | 1.02 | −386.33 | −0.0315 | −0.4832 | 69.08 | 488.81 | 338 |
| 15a) | Poly(sec-butyl methacrylate) | 0.96 | −464.97 | 0.1328 | −0.4856 | 89.9 | 642.15 | 333 |
| 16 | Poly(phenethyl-methacrylate) | 0.6 | −617.39 | −0.0266 | −0.4806 | 126.9 | 714.2 | 329 |
| 17 | Poly(benzyl methacrylate) | 0.75 | −577.99 | −0.1442 | −0.485 | 114.84 | 636.01 | 327 |
| 18 | Poly(isobutyl methacrylate) | 1 | −464.96 | −0.0233 | −0.4828 | 90.42 | 644.9 | 326 |
| 19 | Poly(2-bromoethyl-methacrylate) | 0.94 | −2957.44 | −0.0336 | −0.4794 | 86.46 | 467.36 | 325 |
| 20a) | Poly(3,3-dimethylbutyl-methacrylate) | 0.96 | −543.59 | −0.0422 | −0.4844 | 111.59 | 800.59 | 318 |
| 21 | Poly(neopentyl methacrylate) | 0.9 | −504.28 | −0.0207 | −0.4826 | 100.58 | 719.13 | 312 |
| 22 | Poly(propyl methacrylate) | 0.9 | −425.65 | −0.0258 | −0.483 | 80.01 | 564.88 | 308 |
| 23 | Poly(2-methoxyethyl-methacrylate) | 0.79 | −500.85 | −0.0631 | −0.4787 | 83.89 | 582.85 | 297 |
| 24a) | Poly(butyl methacrylate) | 0.77 | −464.96 | −0.0315 | −0.4835 | 90.99 | 643.75 | 293 |
| 25 | Poly(hexadecyl methacrylate) | 0.77 | −936.73 | −0.0312 | −0.4837 | 224.75 | 1590.82 | 288 |
| 26 | Poly(pentyl methacrylate) | 0.64 | −504.28 | −0.045 | −0.4841 | 101.65 | 725.07 | 283 |
| 27 | Poly(hexyl methacrylate) | 0.51 | −543.59 | −0.0312 | −0.4835 | 113.14 | 803.64 | 273 |
| 28 | Poly(2-ethylhexyl-methacrylate) | 0.52 | −622.21 | −0.0487 | −0.489 | 134.51 | 959.99 | 263 |
| 29 | Poly(octyl methacrylate) | 0.26 | −622.22 | −0.0311 | −0.4836 | 135.37 | 961.05 | 253 |
| 30 | Poly(tetradecyl-methacrylate) | 0.51 | −858.1 | −0.0312 | −0.4837 | 202.37 | 1433.38 | 233 |
| 31 | Poly(dodecyl methacrylate) | 0.26 | −779.47 | −0.0312 | −0.4837 | 180 | 1275.94 | 208 |
| 32a) | Poly(decyl-methacrylate) | 0 | −700.85 | −0.044 | −0.4847 | 157.51 | 1118.66 | 203 |
In this study, in order to compare the modeling, prediction and generalization performance of the MLR, ANN and SVR models directly, the training and test procedures were carried out with the same training and test samples as used previously by Liu et al.7 Therefore, as indicated in Table 1, 25 samples were employed as the training dataset to develop the model, and the other seven samples (marked a) in Table 1) acted as the test set. The five independent samples acted as the independent set to further validate the generalization ability of the established SVR model.
The SVR model was tuned to make the training error (MAPE) as small as possible by learning from the training samples while the parameters were optimized over 10 000 rounds of training adjustment. Finally, an optimal SVR model with a sigmoid kernel was obtained, with a penalty factor C of 56700186162.908470, an insensitivity factor ε of 4.011811 and kernel function parameters (α, β) of (0.150243, −4.425184).
2.4. Generalization Performance Evaluation
The Tg values of the seven test samples and five independent samples were calculated by using the constructed SVR model. Besides MAPE, three other indices, i.e., the mean absolute error (MAE), the root mean square error (RMSE) and the squared correlation coefficient (R2), were used for performance evaluation, as defined by Equations (12)–(14), respectively:
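$${\rm MAE} = \frac{1}{m}\sum\limits_{j = 1}^{m} \left| y_{j} - \hat{y}_{j} \right|$$ (12)
$${\rm RMSE} = \sqrt{\frac{1}{m}\sum\limits_{j = 1}^{m} \left( y_{j} - \hat{y}_{j} \right)^{2}}$$ (13)
$$R^{2} = \frac{\left[ \sum\limits_{j = 1}^{m} \left( y_{j} - \bar{y} \right)\left( \hat{y}_{j} - \bar{\hat{y}} \right) \right]^{2}}{\sum\limits_{j = 1}^{m} \left( y_{j} - \bar{y} \right)^{2} \sum\limits_{j = 1}^{m} \left( \hat{y}_{j} - \bar{\hat{y}} \right)^{2}}$$ (14)
where m denotes the number of test/independent samples, yj and $\hat{y}_{j}$ stand for the target (measured) and predicted values of the jth test/independent sample, respectively, $\bar{y}$ is the mean experimental value and $\bar{\hat{y}}$ is the mean predicted value for all test/independent samples.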
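The three evaluation indices can be computed with a few lines of Python, as sketched below. The function names are hypothetical, and R2 is assumed here to be the squared Pearson correlation between measured and predicted values, since its definition refers to both the mean experimental value and the mean predicted value. The numbers in the usage lines are toy values, not results from this study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Squared Pearson correlation between measured and predicted values."""
    mean_y = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(y_true, y_pred))
    var_y = sum((y - mean_y) ** 2 for y in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_y * var_p)


# Toy values: predictions that are exactly twice the measurements.
measured = [1.0, 2.0, 3.0]
predicted = [2.0, 4.0, 6.0]
print(mae(measured, predicted), rmse(measured, predicted), r_squared(measured, predicted))
```

The toy case shows why R2 alone is not sufficient: predictions that are perfectly correlated with the measurements can still carry large absolute errors, which MAE and RMSE expose.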