Data‐driven modeling based on kernel extreme learning machine for sugarcane juice clarification

Abstract Clarification of sugarcane juice is an important operation in the sugar production process. The gravity purity of juice and the color value of clear juice are the two most important evaluation indexes in cane sugar production using the sulphitation clarification method. In actual operation, however, these two indexes are usually measured by offline experimental titration, which makes timely adjustment of the process impossible. A data-driven model based on the kernel extreme learning machine is proposed to predict the gravity purity of juice and the color value of clear juice. The model parameters are optimized by particle swarm optimization. Experiments are conducted to verify the effectiveness and superiority of the modeling method. Compared with the BP neural network, the radial basis function neural network, and the support vector machine, the proposed model performs well, which demonstrates its reliability.


| INTRODUCTION
Sugarcane juice clarification involves complex physical and chemical reactions. It is desirable to establish a mathematical model to analyze and control the clarification process so that an optimal final product can be achieved. In the control of complex industrial processes, there are generally two types of mathematical modeling approaches: mechanism-based modeling and data-driven modeling. Abderafi and Bounahmidi (1999) used the adapted Peng-Robinson equation of state to estimate the boiling temperatures of industrial beet and sugar cane juices over a wide range of dry substance content. Jourani and Bounahmidi (2002) explained the reaction process of calcium phosphate in the first stage by combining the crystal growth rate equation with the dissolution equation through a kinetic method. Mirsaeedghazi et al. (2010) proposed a mathematical model of mass transfer in the concentration polarization layer of flat-sheet membranes during clarification of pomegranate juice. Cheng et al. (2011) used data regression to establish the quantitative relationship between calcium salt cations and acidic anions and constructed a mathematical model of calcium salt content and pH value. Hamerski, Silva, Corazza, Ndiaye, and Aquino (2012) presented a study of sugarcane juice carbonation and evaluated the effects of variables such as pH, carbonation time, and temperature on industrially relevant parameters for sugarcane juice quality.
Three different batches of sugarcane juice were evaluated using a complete two-level factorial design with central point performed in triplicate.
There are few studies on the mechanism model of the sugarcane juice clarification process. This is because cane juice clarification is a nonlinear process with large time delays and multivariable coupling, and it is extremely difficult to establish a complete mechanism model describing the various complex physical and chemical reactions involved. Therefore, more and more research turns to data-driven approaches, which do not require a mechanistic description of the clarification process. By relying on the online and offline data of the monitoring system, a model predicting the development of the clarification process can be obtained through mathematical processing. Lin and Yang (2009) established an Elman network model with improved dual heuristic dynamic programming to predict the neutralized pH value and the purified pH value of sugarcane juice. Song, Wu, Lin, and Liu (2012) used a generalized dynamic fuzzy neural network to predict the color value and alkalinity during the carbonation clarification process of sugarcane juice and obtained satisfactory results. Sartori et al. (2017) proposed artificial neural network (Lambda NN) models to predict the effects of different variables on sugarcane juice color removal and sucrose content.
The above-mentioned models of the clarification process focused on pH prediction, while more critical process parameters, such as gravity purity, have not been addressed. Moreover, these models mainly rely on the gradient descent method to update the model parameters (Al-Batah, Mat Isa, Zamli, & Azizli, 2010). Although the generalization performance of these models is good, problems such as slow training speed and a tendency to fall into local optima limit their application and development (Kaya & Uyar, 2013; Mohammed, Minhas, Jonathan Wu, & Sid-Ahmed, 2011).
To address the slow training speed and susceptibility to local optima of traditional learning machines, Huang, Zhu, and Siew (2006) proposed a new learning method, namely the extreme learning machine (ELM). The ELM has the advantages of few training parameters, very fast speed, and good generalization performance. Many researchers have applied ELM variants to different industrial problems. Wong, Wong, Vong, and Cheung (2015) used a kernel-based ELM and cuckoo search to model and optimize the performance of a biodiesel engine. Farias et al. (2014) used an extreme learning machine and bat algorithms to monitor product quality and provide fast and reliable assessment of key process variables in second-generation ethanol production. Mohammadi et al. (2015) proposed an ELM-based model for the prediction of daily dew point temperature, with much greater prediction capability than SVM and ANN models. Sadgrove, Falzon, Miron, and Lamb (2017) presented a color feature extreme learning machine (CF-ELM) for fast object detection in pastoral landscapes, which takes three color inputs instead of the standard grayscale input.
Among the different applications of ELM, the kernel-based ELM has been shown to achieve generalization performance similar to SVM while maintaining a much faster learning speed (Uçar & Özalp, 2017). Therefore, the kernel-based ELM is employed in our research to tackle the large time delays and strong coupling in the sugarcane juice clarification process, where establishing a mechanism model is difficult.
In our data-driven model based on the kernel extreme learning machine, four easy-to-measure variables in the sugarcane juice clarification process are selected as input: the flow rate of the mixed juice, the intensity of sulfitation, the neutralization pH value, and the preliming pH value. Two difficult-to-measure variables are chosen as output: the gravity purity of juice and the color value of clear juice. The parameters of the model are optimized by particle swarm optimization, and the effectiveness of the model is verified by experiment. To further evaluate the model's accuracy and time consumption, its predictions are also compared with those of other models, namely BP, RBF, and SVM.

| Kernel extreme learning machine
The support vector machine does not fall into local minima during the learning process, which makes it more generalizable than BP in training feedforward neural networks, and its fit on test data sets is more reliable (Huang, 2014; Wang, Zheng, Yoon, & Ko, 2018). However, the support vector machine has limited applicability to the field of system modeling. For a complex system, it may be necessary to build multiple parallel networks, which leads to a long modeling period. In order to overcome the shortcomings of neural networks and the support vector machine, and inspired by biological learning, Huang et al. (2006) proposed a new learning method, namely the extreme learning machine (ELM). Unlike traditional neural network methods, which are usually time-consuming and prone to overfitting, the ELM does not need to tune the parameters of its hidden layer, resulting in a faster training speed and improved generalization performance (Huang, 2014; Lu, Du, Liu, Xia, & Yeap, 2017).
The network structure of the extreme learning machine is shown in Figure 1.
Suppose the training data set {(x_i, y_i)} has N samples, where x_i = [x_i1, x_i2, …, x_in]^T ∈ R^n is the input vector, indicating that there are n input variables, and y_i = [y_i1, y_i2, …, y_im]^T ∈ R^m is the output vector, indicating that there are m output variables. According to the structural principle of the feedforward neural network, the mathematical model of input and output with l hidden layer neurons can be expressed by Equation (1):

y_i = Σ_{k=1}^{l} β_k G(⟨a_k, x_i⟩ + b_k), i = 1, 2, …, N (1)

In the above formula, β_k is the network output weight; a_k is the network input weight; ⟨a_k, x_i⟩ is the inner product of a_k and x_i; and b_k is the threshold of the kth hidden layer neuron. The activation function G(·) of the hidden layer neurons can be any infinitely differentiable function, such as the sigmoid, sine, cosine, or a compound function. If Equation (1) is written in matrix form consisting of N equations, it can be expressed as Equation (2):

Hβ = Y (2)

where H is the hidden layer output matrix.

Unlike the traditional feedforward neural network, which needs to adjust all network parameters in the training process to reach optimality, Huang et al. demonstrated that the ELM input weights and hidden layer neuron thresholds can be randomly initialized prior to training and remain unchanged during training. The weight vector β connecting the hidden layer and the output layer can be solved from Equation (5):

min_β ‖Hβ − Y‖ (5)

The solution of the above formula is

β̂ = H^+ Y (6)

where H^+ is the Moore-Penrose generalized inverse of the hidden layer output matrix H. It can be obtained by various methods such as orthogonal projection, orthogonalization, iterative methods, and singular value decomposition. When the Moore-Penrose generalized inverse is used to solve Hβ = Y, the result is a least squares solution, which easily overfits in the case of large samples. By introducing the concept of the kernel function into the extreme learning machine (Huang, Zhou, Ding, & Zhang, 2012), the kernel extreme learning machine (KELM) is formed. This effectively avoids the original randomness of ELM, achieving faster training and better generalization performance (Huang, 2014; Jian et al., 2017). The output matrix of the ELM is replaced with the corresponding kernel matrix, as shown in Equation (7):

Ω_ELM = HH^T, with Ω_ELM(i, j) = K(x_i, x_j) (7)

This leads to

β̂ = (I/C + Ω_ELM)^{−1} Y (8)

Therefore, the output of KELM can be written as

f(x) = [K(x, x_1), …, K(x, x_N)] (I/C + Ω_ELM)^{−1} Y (9)

where I is the identity matrix, C is the penalty factor, and the Gaussian kernel function is chosen as the kernel function of the model.
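The KELM training and prediction steps described above reduce to a few lines of linear algebra. The following NumPy sketch illustrates this; it is not the authors' implementation, and the class name, the Gaussian-kernel parameterization K(a, b) = exp(−‖a − b‖²/γ), and the toy data are illustrative assumptions:

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """Pairwise Gaussian kernel K(a, b) = exp(-||a - b||^2 / gamma)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / gamma)

class KELM:
    """Minimal kernel extreme learning machine sketch."""
    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, Y):
        self.X = X
        n = X.shape[0]
        omega = gaussian_kernel(X, X, self.gamma)   # kernel matrix Ω
        # β̂ = (I/C + Ω)^(-1) Y -- use solve() rather than an explicit inverse
        self.beta = np.linalg.solve(np.eye(n) / self.C + omega, Y)
        return self

    def predict(self, Xnew):
        # f(x) = [K(x, x_1), ..., K(x, x_N)] β̂
        return gaussian_kernel(Xnew, self.X, self.gamma) @ self.beta

# sanity check: with weak regularization (large C) the model
# nearly interpolates its own training data
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([0.0, 1.0, 4.0, 9.0])
pred = KELM(C=1e6, gamma=1.0).fit(X, Y).predict(X)
```

Solving the regularized linear system is the only training step, which is why KELM is much faster to train than iteratively tuned networks.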

| Input and output variables
There are many parameters that may influence the clarification of sugarcane juice. Table 1 lists eight candidate parameters that may have a significant influence on the clarification process. To eliminate the parameters with insignificant influence and thereby reduce the dimensionality of the data set to be treated during modeling, the principal component analysis (PCA) method is used. In the end, the parameters that significantly influence the clarification process of the cane juice are extracted.
By calculating the eigenvalues of each variable and their cumulative contribution rates to the clarification, it is possible to determine which features are extracted by the PCA dimension reduction. Table 2 shows the contribution rate statistics of each variable, calculated from a data set of 277 samples obtained on the experimental platform.
It can be seen from Table 2 that the cumulative contribution rate of the first four variables, n 1 , n 6 , n 4 , and n 3 , has already reached 86.19%, exceeding the usual requirement of 85% for the cumulative variance contribution rate method (Niu, 2011). Therefore, the mixed juice flow (x 1 ), the intensity of sulfitation (x 2 ), the neutralization pH value (x 3 ), and the preliming pH value (x 4 ) are selected as input. Meanwhile, two difficult-to-measure parameters, the gravity purity of juice (y 1 ) and the color value of clear juice (y 2 ), are taken as output, as shown in Table 3.
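The cumulative-contribution-rate criterion can be illustrated with a short sketch. Since the 277-sample data set is not reproduced here, the sketch runs on synthetic stand-in data; the function name and the handling of the 85% threshold are assumptions for illustration:

```python
import numpy as np

def select_by_contribution(X, threshold=0.85):
    """Rank principal components by variance contribution rate and
    keep just enough of them to exceed the cumulative threshold."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize the variables
    cov = np.cov(Xc, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]      # eigenvalues, descending
    rates = eigvals / eigvals.sum()              # contribution rate of each component
    cum = np.cumsum(rates)                       # cumulative contribution rate
    k = int(np.searchsorted(cum, threshold) + 1) # components needed to pass threshold
    return rates, cum, k

rng = np.random.default_rng(0)
X = rng.normal(size=(277, 8))   # stand-in for the 8 candidate variables in Table 1
rates, cum, k = select_by_contribution(X)
```

In the paper this analysis yields four retained variables at a cumulative contribution of 86.19%; on uncorrelated synthetic data more components are naturally needed.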

| Parameter optimization
Particle swarm optimization (PSO) is a swarm intelligence global search algorithm proposed in 1995 (Kennedy & Eberhart, 1995), inspired by the foraging behavior of bird flocks. The particle swarm optimization algorithm has the characteristics of fast convergence, easy experimentation, and easy combination with other algorithms. It has been widely used in many fields, such as economic dispatch, robotics, signal processing, and image segmentation (Mahor, Prasad, & Rangnekar, 2009; Sengupta & Das, 2017; Suresh & Lal, 2017; Zhang, Gong, & Zhang, 2013). Therefore, PSO is employed to optimize the model parameters for the clarification process of the sugarcane juice.
The fitness function is used to evaluate the quality of the particles and has a direct impact on the optimization results.
During the optimization process, each particle in the swarm moves according to its own fitness value rather than flying randomly. In each iteration, the optimal value of each individual particle is updated by comparing its historical best with its current state. p_i = (p_i1, p_i2, …, p_iD) denotes the best position an individual particle has found so far, called the "local optimum." After traversing all individual particles, the swarm's global optimum is updated by comparing the swarm's historical best with the current swarm best. p_g = (p_g1, p_g2, …, p_gD) denotes the best position the swarm has found so far, called the "global optimum." The velocity and position iteration equations of the particle swarm algorithm are as follows:

v_id^(k+1) = ω v_id^k + c_1 r_1 (p_id^k − x_id^k) + c_2 r_2 (p_gd^k − x_id^k)

x_id^(k+1) = x_id^k + v_id^(k+1)

where k is the number of algorithm iterations; d = 1, 2, ⋯, D indexes the dimensions of the D-dimensional search space; v_id is the velocity of particle i and x_id its position; p_id^k is the best position of the individual particle after k iterations; p_gd^k is the best position of the swarm after k iterations; ω is the inertia factor; c_1, c_2 are the acceleration factors; and r_1, r_2 are two random numbers in (0, 1).
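The velocity and position updates above can be sketched as a compact PSO minimizer. This is an illustrative implementation, not the authors' code; the parameter values (20 particles, inertia 0.7, acceleration factors 1.5) and the quadratic test function are assumptions:

```python
import numpy as np

def pso_minimize(f, bounds, n_particles=20, iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO implementing the velocity/position update equations."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    D = lo.size
    x = rng.uniform(lo, hi, size=(n_particles, D))   # positions x_id
    v = np.zeros((n_particles, D))                   # velocities v_id
    p = x.copy()                                     # personal bests p_i
    pf = np.array([f(xi) for xi in x])
    g = p[pf.argmin()].copy()                        # global best p_g
    gf = pf.min()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, D))
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(xi) for xi in x])
        better = fx < pf                             # update personal bests
        p[better], pf[better] = x[better], fx[better]
        if pf.min() < gf:                            # update global best
            gf = pf.min()
            g = p[pf.argmin()].copy()
    return g, gf

# toy fitness: a quadratic with minimum at (3, 3); for the clarification
# model, f would instead return the validation RMSE of a KELM trained
# with the candidate (C, γ) pair
best, best_val = pso_minimize(lambda z: ((z - 3.0) ** 2).sum(),
                              bounds=[(0, 10), (0, 10)])
```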
The performance of the KELM-based cane juice clarification process model is affected by the penalty factor C and the kernel parameter γ. In this model, the RMSE of the predictions on the sample data of the sugarcane juice clarification process is taken as the fitness function, from which the optimal penalty factor C and kernel parameter γ are obtained.

| Data-driven model
The specific steps for constructing the data-driven model based on KELM are shown in Figure 2. As mentioned previously, four easy-to-measure variables serve as the model input and two difficult-to-measure variables as the output.

| Model performance index
In order to evaluate the performance of the data-driven model for sugarcane juice clarification, it is necessary to define model evaluation criteria. Suppose there are m test samples, the actual value of the ith test sample is y i , the corresponding predicted value is ŷ i , and the mean of the actual test values is ȳ. The performance of the data-driven model for the sugarcane juice clarification process was evaluated using the indexes shown in Table 4 (Malik, 2005).
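Table 4 itself is not reproduced in this excerpt, but evaluation indexes of this kind are typically the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). The sketch below uses their standard definitions; the sample values are made up for illustration:

```python
import numpy as np

def rmse(y, yhat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - yhat)))

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# illustrative actual vs. predicted gravity purity values
y    = np.array([84.1, 85.3, 83.8, 86.0])
yhat = np.array([84.0, 85.5, 84.0, 85.7])
```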

| Experimental platform
In order to verify the data-driven model of the sugarcane juice clarification process, a comprehensive experimental platform is developed. The platform mainly includes a sedimentation tank, some sugarcane juice tanks, an auxiliary part and a control valve part, as shown in Figure 3.

| Experimental result
The computing environment used in this paper is an Intel(R) Core i5 at 2.4 GHz with 4 GB of memory, running the Windows 7 operating system; MATLAB 2014a is used as the computing software.

After the iteration is completed, the optimal parameter combination of the model is obtained. The optimization results are shown in Table 5.
After obtaining the optimal penalty factor and kernel function parameter, the KELM-based model is trained with them and used for prediction.

| Comparison with other learning methods
In order to verify the validity and superiority of the data-driven model based on KELM for the cane juice clarification process, a number of models with different learning methods including BP, RBF, and SVM have been run using the same data set of 277 data samples that was used in the KELM-based model to compare their performances. Figure 6 shows the performance comparison of the four models when the gravity purity of juice is chosen as output. Figure 7 shows the results of predicting the color value of clear juice. Tables 7 and 8 compare the performance indexes of different models in predicting gravity purity and color value, respectively.
It is seen from Figures 6 and 7 that all models perform better in predicting the gravity purity than in predicting the color value.
While the prediction accuracy and generalization performance of both BP and RBF are substandard, with R² < 0.85 and widely scattered data points, the SVM and KELM models perform much better, with higher R² values and more tightly clustered predictions.

ACKNOWLEDGMENT
The project is supported by National Natural Science Foundation of China (No. 61763001).

CONFLICT OF INTEREST
The authors declare no conflict of interest.

ETHICAL STATEMENTS
This research does not involve any human or animal testing.