A framework for energy optimization of distillation process using machine learning‐based predictive model

The distillation process is one of the most common and energy‐intensive processes in the chemical industry. Most chemical processes are nonlinear and complex, which makes it difficult to find optimal operating conditions. To solve this problem, we developed a framework for energy optimization of the distillation process based on a machine learning (ML) model. The framework enables efficient operation of the process by using the optimal operating conditions recommended by the ML‐based predictive model. The predictive model, which is the key component, is developed in three steps: learning, validation, and improvement. In the learning step, we select an algorithm suited to the purpose of the process and train it on process data. In the validation step, the model is validated using hold‐out cross‐validation. Finally, in the improvement step, the model performance is improved through hyper‐parameter optimization. We applied the framework to commercial mixed butane distillation columns of 45,000 metric tons per annum capacity. The predictive model was trained on commercial process data and predicts the temperature at the product stage. The model recommends the steam flow rate required to maintain the target temperature of the product stage under the current operating conditions; the recommended steam flow rate serves as a guideline for the on‐site operator. We also developed software through which the predictive model can be easily applied to commercial processes; it identifies the state of the process and recommends optimal operating conditions.


| INTRODUCTION
The chemical industry is related to almost all other industries and accounts for a substantial proportion of the entire industrial sector. As of 2017, the chemical industry accounted for the largest share (15.6%) of the global gross domestic product of the manufacturing sector. Recently, owing to the social problems of resource depletion and environmental pollution, attention has been drawn to the optimal operation of chemical processes that use less energy. Naturally, process safety and environmental performance must be guaranteed in the chemical industry; beyond that, cost minimization, including energy saving, is becoming a crucial issue. Energy costs account for the second-largest share of the operating costs of chemical plants, making energy cost reduction a vital issue. Chemical processes are typically nonlinear and complex. For example, a chemical process consisting of various devices, such as reactors, separators, and utility facilities, has several operating variables and complex correlations between them. If the target system to be optimized were simple, only a few operating variables would require consideration, eliminating the need for optimization. However, because the chemical process is a complex nonlinear system and the business environment changes continuously, engineers and operators cannot consider all of these variables within a short period to identify optimal operating conditions.
To solve these problems, theoretical simulation analysis methods based on mathematical modeling have traditionally been used for control, optimization, and scheduling, regardless of the type of target process. Process optimization using mathematical modeling has been extensively applied to various processes, such as distillation. [1][2][3][4][5][6][7] For example, Gu et al. performed energy optimization of a multistage crude oil distillation process using methods that combine energy analysis and mathematical programming. 1 Mathematical modeling has also been used for knowledge-based corrosion control: Kim et al. developed a knowledge-based corrosion control system for a crude oil distillation unit, with which the cause of corrosion in the distillation column can be determined and the column prepared before the onset of corrosion. [2][3][4][5] However, mathematical modeling is limited because nonlinear and complex processes require numerous assumptions. Therefore, suitable modeling methods need to be applied to optimize and control such complex and dynamic chemical processes. Recently, many studies have applied models based on neural networks, which can overcome the challenges and limitations of mathematical models. Neural networks have been proposed as an alternative for classical problems of process engineering, 8 such as process control, optimization, and scheduling. [9][10][11][12][13][14][15][16][17] For example, Faúndez et al. estimated phase equilibrium properties using an artificial neural network (ANN) model, which can reliably estimate the bubble pressure and the congener concentration in the vapor phase of ethanol + congener systems. 9 Dantas et al.
used response surface methodology and ANN to optimize patchouli oil distillation to obtain patchoulol-rich fractions. 10 There are also cases in which machine learning (ML) models have been applied to the distillation process. [18][19][20][21][22] Fernandez et al. used adaptive neuro-fuzzy inference systems for the modeling and indirect control of the heavy and light product compositions in a binary methanol-water distillation column. 18 Osuolale and Zhang used ANN to obtain optimal distillation operating conditions that maximize the exergy performance of distillation systems. 19 However, the data used in that model development were theoretical data generated using Aspen HYSYS; therefore, there are limitations to its application to real-time processes.
There have been various studies reporting applications of ML models to improve process efficiency, and the effectiveness of such models has been demonstrated. However, there has been no reported case of the development and application of an ML-based predictive model to a real-time commercial distillation process. In this study, we developed a framework based on ML using commercial distillation column process data and optimized the energy consumption of the distillation process. Furthermore, software including the predictive model was developed so that it can be easily applied to the commercial distillation process. The novelty and major contributions of this study are as follows:
• This study proposes a framework for energy optimization of the distillation process using an ML-based prediction model;
• Software was developed for implementing the prediction model in a commercial distillation process; optimal operating conditions of the process are suggested in real-time by the developed software.
The rest of this paper is organized as follows. Section 2 describes the gated recurrent unit (GRU) model, root mean square error (RMSE), and R². Section 3 describes in detail the learning process of the predictive model, which is the core of our proposed framework. Section 4 introduces the process of applying the framework and the results obtained through the application. Finally, we conclude the paper and state its potential value in Section 5.

| GRU algorithms
In this study, we aimed to develop a model that predicts the temperature in the distillation column from time-series data. Therefore, we developed a prediction model using the GRU, a type of time-series regression model.
GRU is a newer generation of recurrent neural network (RNN) and is regarded as a variant of long short-term memory (LSTM). 23 GRU does not require as many memory units as LSTM, thereby reducing the computational burden and using considerably fewer parameters than LSTM. Compared to LSTM, GRU uses two gates instead of three, but it has been reported that GRU performs as well as LSTM. 24 Therefore, GRU was used in this study. The structure of the GRU cell is illustrated in Figure 1. The state of the GRU cell and the output of each layer are calculated as follows:

z_t = σ(W_z x_t + U_z h_{t−1} + b_z)
r_t = σ(W_r x_t + U_r h_{t−1} + b_r)
g_t = tanh(W_g x_t + U_g (r_t ⊙ h_{t−1}) + b_g)
h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ g_t

One gate controller, z_t, controls both the forget and input gates. When the z_t output is 1, the forget gate opens and the input gate closes; when the z_t output is 0, the forget gate closes and the input gate opens. That is, whenever the previous (t − 1) memory is stored, the time-step input is deleted. The GRU cell does not have an output gate; hence, the entire state vector h_t is the output at each time step, and there is a second gate controller, r_t, that controls which part of the previous state h_{t−1} is exposed to the candidate state g_t. The developed model predicts the output variable (the temperature at stage 64) from input variables such as the feed flow rate, steam flow rate, top pressure, and so on. Therefore, x_t contains the input variables, and h_{t−1} is the previous hidden state from which the temperature at stage 64 is predicted. The details of the input and output variables are covered in Section 4.2.
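As an illustration, the gate equations above can be implemented directly. The sketch below is a minimal NumPy forward step, not the study's production model; the weight names (W_z, U_z, b_z, ...) are our own notation, and note that some texts swap the roles of z_t and 1 − z_t in the final interpolation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, params):
    """One GRU time step following the gate equations above.

    x_t: input vector at time t; h_prev: previous hidden state h_{t-1}.
    params: dict of weight matrices W_*, U_* and bias vectors b_*
    (names are our own; frameworks such as Keras bundle these differently).
    """
    z = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])  # update gate
    r = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])  # reset gate
    g = np.tanh(params["W_g"] @ x_t + params["U_g"] @ (r * h_prev) + params["b_g"])  # candidate state
    # z = 1 keeps the previous memory; z = 0 overwrites it with the candidate.
    return z * h_prev + (1.0 - z) * g

# Tiny smoke test: 3 input variables, 2 hidden units, random weights.
rng = np.random.default_rng(0)
p = {k: rng.normal(size=(2, 3)) for k in ("W_z", "W_r", "W_g")}
p.update({k: rng.normal(size=(2, 2)) for k in ("U_z", "U_r", "U_g")})
p.update({k: np.zeros(2) for k in ("b_z", "b_r", "b_g")})
h = gru_cell(rng.normal(size=3), np.zeros(2), p)
```

With a zero previous state, the new state is (1 − z) ⊙ g, so every component stays inside (−1, 1), which is a quick sanity check on the implementation.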

| RMSE and R²
RMSE and R 2 are criteria for judging the predicted performance of the model, and the model is improved by adjusting the hyper-parameters until the values of RMSE and R 2 reach the desired level.
RMSE indicates the error between the actual and predicted values; the smaller the value, the more accurate the model. RMSE can provide a balanced evaluation of the goodness of fit of a model, as it is more sensitive to larger errors. 25 RMSE is expressed as 25 :

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (S_ia − S_ip)² )

where N represents the number of data points, S_ia represents the ith actual value, and S_ip represents the ith value of the model forecast. R² is the coefficient of determination, a measure of how well the estimated model fits the given data. The closer the R² value is to 1, the higher the correlation between the actual and predicted values, and the better the performance of the forecasting model. [26][27][28] R² is expressed as:

R² = 1 − Σ_{i=1}^{N} (S_ia − S_ip)² / Σ_{i=1}^{N} (S_ia − S̄_a)²

where S̄_a is the mean of the actual values; that is, R² compares the squared deviations of the forecast from the actual values with the squared deviations of the actual values from their mean.
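Both metrics are straightforward to compute. A minimal sketch follows; the sample values are illustrative, not the study's data.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)          # squared forecast errors
    ss_tot = np.sum((actual - actual.mean()) ** 2)      # spread of the actual values
    return float(1.0 - ss_res / ss_tot)

# Illustrative stage-64 temperatures (hypothetical values).
y_true = [71.2, 71.5, 71.8, 71.4]
y_pred = [71.3, 71.5, 71.6, 71.5]
```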

| THE FRAMEWORK WITH ML-BASED PREDICTIVE MODEL
The purpose of the proposed framework is to control the distillation process with optimized operating conditions for energy cost reduction. Most processes are operated and controlled using the distributed control system (DCS), which is most suited for plants involving large numbers of continuous control loops, special control functions, process variables, and alarms. 29 Most existing distillation processes operate as illustrated in Figure 2A. The operational situation of the process is briefly checked through the DCS and the operator makes appropriate decisions for each situation according to the process operation guidelines based on experience or mathematical models. Figure 2B depicts a proposed system for applying a data-driven model to commercial processes developed in this study. The developed ML-based predictive model has been applied to the process operated using an existing DCS, and it is an integrated system designed for the distillation process. Real-time data of the process are collected through the DCS and sent to the software. The collected data is stored in the operating data history in real-time, and the predictive model embedded in the software learns using the stored data with respect to all possible operating conditions. The learned predictive model recommends the calculated optimal value corresponding to the entered real-time operating data. Subsequently, the recommended value is applied to the process through the DCS, and the result values are displayed on the DCS operator screen. The process can be safely and effectively operated through the optimal operating conditions according to the situation.
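The data flow of the proposed system can be sketched as a simple read-store-recommend-apply loop. All three interface functions below are hypothetical stand-ins: a real deployment would talk to the DCS through its own protocol (e.g., an OPC client), and `recommend` would call the trained predictive model rather than the toy rule shown here.

```python
from collections import deque

history = deque(maxlen=10_000)  # rolling operating-data history

def read_dcs():
    """Stand-in for a real-time DCS read (hypothetical tag values)."""
    return {"feed_flow": 12.0, "steam_flow": 3.2, "top_pressure": 4.1, "T64": 71.4}

def recommend(sample):
    """Stand-in for the ML model: return an adjusted steam flow rate.
    Toy rule only: nudge steam up when T64 is below the 71.5 C target."""
    return sample["steam_flow"] + 0.05 * (71.5 - sample["T64"])

def write_dcs(setpoint):
    """Stand-in for sending the recommended setpoint back to the DCS."""
    return setpoint

def control_cycle():
    sample = read_dcs()           # 1. collect real-time data via the DCS
    history.append(sample)        # 2. store it in the operating-data history
    setpoint = recommend(sample)  # 3. the model recommends an optimal value
    return write_dcs(setpoint)    # 4. apply the recommendation via the DCS

sp = control_cycle()
```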
The predictive model, which recommends the optimal value, is the core of the framework. The predictive model processes real-time process data to recommend the operating conditions necessary to reach the goal in each situation. The predictive model is developed in three steps: learning, validation, and improvement. Figure 3 illustrates the development sequence of the predictive model. First, the data from the target process are collected and preprocessed. Next, the predictive model is trained by selecting a model that fits the purpose. The developed predictive model is then validated and, finally, improved through hyper-parameter tuning if its performance is not at the desired level. Developing a predictive model suited to the target process is the most important task when applying the framework to other processes. In the following sections, we detail the development of the ML-based predictive model.

| Model learning
In the model learning step, the predictive model is trained using the preprocessed data. The predictive model is an ML-based model comprising the constituent elements described below. ML is defined as a field that uses computers to learn implicit patterns from data. ML is divided into supervised learning, unsupervised learning, and reinforcement learning, depending on the goal. Supervised learning is mainly used for systems that predict values or labels, unsupervised learning is used to extract patterns, and reinforcement learning is used to build systems that interact with an environment. The ML model is chosen according to the goal: which model should be used depends on the purpose for which ML is applied and on the process situation, but most separation processes require appropriate operating values. The model developed in this study predicts the temperature at stage 64 using time-series data. Therefore, we used algorithms for time-series regression in supervised learning.
(Figure 2. Schematic of operation system: (A) existing operation system and (B) proposed operation system.)
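A minimal sketch of how time-series process data is framed as a supervised regression problem: each window of past values becomes an input and the next value becomes the target. The window length and temperature values below are illustrative, not the study's actual configuration.

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (X, y) pairs: each window of past values
    predicts the next value - the usual supervised framing for
    time-series regression."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # input: `window` past readings
        y.append(series[i + window])     # target: the following reading
    return np.array(X), np.array(y)

temps = [71.4, 71.5, 71.6, 71.5, 71.4, 71.5]  # illustrative T64 readings
X, y = make_windows(temps, window=3)
```

In practice each input row would also carry the other operating variables (feed flow rate, steam flow rate, top pressure, and so on) alongside the past temperatures.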

| Model validation: Hold-out cross-validation
The objective of validation is to confirm the efficacy of the prediction model. 30 Hold-out cross-validation is a popular cross-validation technique, and it is illustrated in Figure 4. The hold-out strategy is currently considered one of the most reliable ways to estimate the accuracy of a prediction model. [31][32][33] First, the data set is divided into a training set for model training and a test set for estimating generalization performance. Overuse of the same test set during model selection will cause the model to overfit the test set. Therefore, it is appropriate to divide the data into training, validation, and test sets. The validation set is used to repeatedly train models with different hyper-parameter values and evaluate their performance; this process is called hyper-parameter optimization (HPO). To improve predictive performance, hyper-parameters must be tuned and compared. There is no fixed formula or single correct answer for the split ratio, so an appropriate ratio can be set for the situation. In this study, RMSE and R² were used as performance evaluation indicators. Of the total data, 70% was used for training and the remaining 30% as test data; the training portion was in turn split 70%/30% into training and validation data.
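The split described above can be sketched as follows. The paper does not state whether records were shuffled; the sketch assumes a chronological split, which is the safer default for time-series data.

```python
def holdout_split(data, train_frac=0.7, val_frac=0.3):
    """70/30 train/test split, then a further 70/30 train/validation split
    inside the training portion, as used in this study. The split is
    chronological (no shuffling) because the data form a time series."""
    n_train_total = int(len(data) * train_frac)
    train_total, test = data[:n_train_total], data[n_train_total:]
    n_train = int(len(train_total) * (1 - val_frac))
    return train_total[:n_train], train_total[n_train:], test

data = list(range(100))               # stand-in for time-ordered records
train, val, test = holdout_split(data)
```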

| Model improvement: Hyper-parameter optimization
In the model validation step, the performance of the predictive model may not meet the required level, which reduces the reliability of, and the need for, the model itself. To solve this problem, HPO is used to improve accuracy and precision. Studies of the effects of hyper-parameters on different deep learning architectures have revealed complex relationships: hyper-parameters that give large performance improvements in simple networks do not have the same effect in more complex architectures. 34 The appropriate hyper-parameters differ depending on the data and model, and there is no absolutely good value. With no explicit formula for choosing a correct set of hyper-parameters, their selection often depends on a combination of previous experience and trial and error. 35 The hyper-parameters, which govern the external elements of the model, vary depending on the type of algorithm; for the GRU algorithm they include the learning rate, batch size, number of epochs, number of hidden layers, and number of hidden nodes. In this study, HPO was performed on the number of hidden nodes and the batch size.
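The paper does not specify the search strategy used for HPO; a common minimal approach is an exhaustive grid search over candidate values, sketched below with a toy evaluation function in place of actual model training. The candidate values and the toy optimum (10 hidden nodes, batch size 64) mirror the values reported later in the study, but the evaluator itself is purely illustrative.

```python
import itertools

def grid_search(evaluate, grid):
    """Exhaustive search over hyper-parameter combinations, keeping the
    configuration with the lowest validation RMSE. `evaluate` stands in
    for 'train the GRU with these settings and return validation RMSE'."""
    best_params, best_rmse = None, float("inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = evaluate(params)
        if score < best_rmse:
            best_params, best_rmse = params, score
    return best_params, best_rmse

# Toy evaluator: pretend 10 hidden nodes with batch size 64 is optimal.
def toy_eval(p):
    return abs(p["hidden_nodes"] - 10) * 0.01 + abs(p["batch_size"] - 64) * 0.001

grid = {"hidden_nodes": [1, 5, 10, 20], "batch_size": [64, 128, 256, 512]}
best, score = grid_search(toy_eval, grid)
```

For larger search spaces, random search or Bayesian optimization are common alternatives to the exhaustive grid.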

| Mixed butane separating process
The target process separates mixed butane into n-butane and iso-butane, producing n-butane of 99% purity or higher. The core facility is the distillation column, which is the most basic and general facility in the petrochemical industry. Figure 5 is the schematic diagram of the target process. This process involves two distillation columns of 20,000 and 25,000 metric tons per annum (MTPA) capacity, respectively. The total processing capacity of these distillation columns is 45,000 MTPA of n-butane; the number of column stages is 78, with the feed stage at stage 35 and the product stage at stage 64.
The goal of this distillation process is to maintain the 99% purity of the product, but it is difficult to operate the process directly according to product purity. Measuring the purity of the product takes time and cannot be done in real-time; therefore, we used the temperature at the product stage as an indicator instead of the purity itself. The stage temperature is extremely important because the distillation column separates substances using differences in boiling points. The temperature at the product stage, stage 64, and the purity of the product are therefore closely related, and maintaining the temperature at stage 64 (T64) at 71.5°C ensures 99% product purity, as reflected by the data accumulated from past operating experience with the process. To maintain T64 at 71.5°C, process operators controlled the flow rate of the steam used in the bottom reboiler. The operating conditions of the distillation column, including the steam flow rate, required adjustment according to the amount of n-butane in the incoming raw material. As the flow rate and composition of the feed were not precisely known, this process required recommended values for the steam flow rate. Therefore, we developed a data-driven model to derive the steam flow rate according to the process conditions.
(Figure 4. Schematic of hold-out cross-validation. Figure 5. Schematic of the target process. MTPA, metric tons per annum.)

| Predictive model development
The objective of the target process, the mixed butane distillation process, was to maintain T64 at 71.5°C so as to maintain the target purity even when the incoming feed changed. Our framework was applied to optimize and stabilize the process through optimal operating conditions. To apply the framework to the target process, the predictive model, the core of the framework, was trained on the target process data.
The energy optimization problem is to operate the process at the target conditions at minimum cost. Defined in terms of an objective function and constraints, it takes the form of Equations (7)-(11). The objective function is the minimum energy, and the constraint is to maintain the purity of the product above 99%. Therefore, the decision variables were set to T64 and the steam flow rate, which affect both the purity of the product and the energy consumption.
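As a hedged sketch of the formulation described above, and using our own notation (F_steam for the steam flow rate, f_GRU for the ML-based process model, u for the remaining operating variables), the problem has the general form:

```latex
% Sketch of the energy-optimization problem described in the text
% (notation is ours, not the paper's Equations (7)-(11)).
\begin{align}
  \min_{F_{\mathrm{steam}}} \quad & F_{\mathrm{steam}}
    && \text{(energy cost $\propto$ steam flow rate)} \\
  \text{s.t.} \quad & x_{n\text{-butane}} \ge 0.99
    && \text{(product purity constraint)} \\
  & T_{64} = 71.5\,^{\circ}\mathrm{C}
    && \text{(temperature proxy for the purity constraint)} \\
  & T_{64} = f_{\mathrm{GRU}}(F_{\mathrm{steam}}, \mathbf{u})
    && \text{(ML-based predictive model of the process)}
\end{align}
```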

| Process data
The data used to train the predictive model comprised a total of 5 weeks of data, from June 1, 2018 to July 4, 2018, at 1-minute intervals. The total number of data points, excluding missing values, was 46,066. Figure 6 illustrates the process data: feed flow rate, reflux flow rate, steam flow rate, product flow rate, bottom flow rate, bottom pressure, top pressure, and T64. The data were preprocessed for the distillation column of 20,000 MTPA capacity (DA-141).

| Model learning and evaluation
We developed the T64 predictive model following the development sequence described above. First, we divided the data into input and output variables for model learning, and set details such as the activation function, optimizer, and loss function. Table 1 lists the parameters and algorithms used in the predictive model. The predictive model was trained on four core variables using the GRU model and the Adam optimizer. The training data comprised 70% of the total data, and the test data used for validating the performance of the model comprised the remaining 30%. To verify the performance of the developed model, we compared the test data with the values predicted by the model. Figure 7 illustrates the temperature at stage 64: the black line is the temperature in the test data, and the red line is the temperature predicted by the model. The RMSE and R² values for the predictive performance of the model are 0.151 and 0.780, respectively. Although this level of performance is satisfactory, we proceeded with HPO to obtain a better model. The HPO was conducted on only two hyper-parameters: the number of hidden nodes (neurons) and the batch size. As a result of the HPO, the number of hidden nodes was changed from 1 to 10, and the batch size was changed from 512 to 64. Figure 8 illustrates the T64 predicted by the improved model: the black line is the T64 of the test data, as in Figure 7, and the red line is the T64 predicted by the improved model. The RMSE and R² values for the predictive performance of the improved model are 0.117 and 0.865, respectively, confirming the improvement. Comparing Figures 7 and 8 shows that the red line in Figure 8 more closely resembles the black line, and that the model's predictive performance improved owing to the HPO.

| Recommendation of steam flow rate via the predictive model
The problem with the target process is that T64 is not maintained at 71.5°C, which, as explained earlier, directly affects product quality. To address this problem, we obtain from the developed predictive model a steam flow rate value that can maintain 71.5°C, thereby providing guidelines on the optimal steam flow rate for the on-site operators. The optimal steam flow rate is calculated with the predictive model as follows: the steam flow rate is increased sequentially from zero and input to the T64 predictive model, and the steam flow rate at which the predicted T64 reaches 71.5°C is recorded. The optimal steam flow rate is obtained by repeating this process. Figure 9 illustrates the steam flow rate of the operation data and the recommended values: the black line is the operation data, and the red line is the recommended values. Figure 10 shows the operational T64 and the T64 expected when using the recommended steam flow rate: the black line is the operation data, and the red line indicates the expected values. Note that Figure 10 does not show T64 under continuous closed-loop control at the recommended steam flow rate; rather, it shows, at each point in time, the optimal steam flow rate required to maintain T64 at 71.5°C and the T64 expected if that rate were applied. Using the steam flow rate recommended by the predictive model, T64 stays close to 71.5°C, allowing stable operation and saving operating costs, because the total steam consumption is also reduced. The recommended steam flow rate is intended as a guideline for the on-site operator, who can otherwise be left guessing when some operating values drift into an inefficient region.
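The search procedure described above can be sketched as a simple upward sweep. `toy_model` is a hypothetical monotone stand-in for the trained T64 predictor (at fixed values of the other operating variables), used only to illustrate the sweep; the step size and bounds are likewise illustrative.

```python
def recommend_steam(predict_t64, target=71.5, step=0.01, max_steam=50.0):
    """Sweep the steam flow rate upward from zero and return the first
    value whose predicted T64 reaches the target - the search procedure
    described above. `predict_t64` stands in for the predictive model's
    output at fixed values of the other operating variables."""
    steam = 0.0
    while steam <= max_steam:
        if predict_t64(steam) >= target:
            return steam
        steam += step
    return None  # target unreachable within the allowed range

# Toy monotone model: T64 rises with steam flow (illustrative only).
toy_model = lambda s: 65.0 + 2.0 * s
rate = recommend_steam(toy_model)
```

In practice a coarse-to-fine sweep or bisection would reach the same answer with far fewer model evaluations, since T64 increases monotonically with reboiler steam.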
(Figure 11. Overview window. Figure 12. Process trends window.)

| ML-based software
The ML-based predictive model, developed based on the framework, aims to recommend optimal operation values for process situations that change in real-time during actual operation. Since most processes use a DCS to operate and control the process, the DCS needs to be connected to the ML-based predictive model. Therefore, ML-based software was developed to link the predictive model to the DCS and allow it to play its role. The overview window is shown in Figure 11. The current status and recommended operating conditions of the core variables can be checked in the overview window, which displays the distillation column and the flow rates of feed, product, and steam in real-time. The overview window also displays the recommended values of the optimal operating conditions depending on changes in the feed: "A:" is the real-time value in the current process, and "R:" is the recommended value presented by the predictive model. The recommended values are also sent to the DCS. The process trends window is shown in Figure 12. In the process trends window, graphs depict the data measured by sensors installed throughout the distillation column, and specific data of interest can be selected for display via the left pane. In addition, the graphs provide real-time visibility of the degree and shape of change: the x-axis is time and the y-axis is the value of the selected data; yellow is the real-time operation value and blue is the recommended operation value. Using the software, the process operation values can be checked in real-time, recommended operation values can be checked according to the current situation, and the optimization process can be controlled by ensuring that the values recommended by the predictive model are applied to the real-time process.

| CONCLUSION
In this study, we developed a framework for the distillation process and applied it to a commercial distillation process. The framework enables process optimization by providing optimal operating conditions. The predictive model in the framework recommends operating conditions depending on the process situation. The predictive model is trained using ML and developed through three steps: learning, validation, and improvement. The predictive model predicts the desired variable values and exploits them to derive recommended control values for other variables.
The commercial process was a mixed butane distillation process using distillation columns with a processing capacity of 45,000 MTPA. The purpose of this process was to produce n-butane of 99% purity. However, measuring purity in real-time was not feasible under the circumstances; hence, the predictive model was used to obtain the steam flow rate for maintaining the temperature at stage 64 of the distillation column at 71.5°C. We therefore developed the T64 predictive model using five weeks of process data. The performance of the developed predictive model was evaluated using RMSE and R², with values of 0.107 and 0.863, respectively. Subsequently, we applied the model to obtain the steam flow rate for maintaining T64 at 71.5°C. Moreover, we developed ML-based software that applies the values recommended by this predictive model to the process, and deployed it on the process. Our framework can provide guidelines for on-site operators to stabilize and optimize other distillation processes or real plants with complex environments.