Forecasting Chinese carbon emissions based on a novel time series prediction method

Many calculation methods have been developed to forecast carbon emissions (CE). These methods fall into two categories: time series prediction and variable regression. However, time series prediction cannot make full use of all CE-related information, and variable regression can only predict CE when the other variables are known in advance. To solve these problems, this paper proposes KLS, a fusion of the Kalman filter (KF), long short-term memory (LSTM), and the support vector machine (SVM). LSTM is used to predict CE as a time series, and 10 of 14 candidate variables, selected by ridge regression (RR), are used to regress CE with SVM. Finally, KF integrates the results of LSTM and SVM according to their variances. Overall, statistical filtering and deep learning are combined in this paper, achieving a fusion of time series prediction and variable regression. To confirm the effectiveness of KLS, we trained the KLS model on CE data from 1965 to 2005 and tested it on data from 2006 to 2014. The results showed that KLS was more accurate than four existing methods. Finally, we predicted the CE of China over the next decade.


| INTRODUCTION
Due to the "reform and opening-up" policy, China's economic growth has been greatly promoted. 1 However, with rapid economic growth, China's carbon emissions (CE) have come to threaten the environment. This is likely driven by huge energy consumption, since a Granger causal relationship between economic growth and energy consumption was found by Wang et al. 2 Overall, huge energy consumption supports China's economic growth and, at the same time, greatly damages the environment through carbon emissions.
Due to the relationship between CE and energy consumption, 3 China has become both the largest energy consumer and the largest carbon emitter in the world. Many factors affect CE, such as the residential building sector, 4 gas consumption, 5,6 and carbon trading volume and price. 7 Previous studies have discussed their relationship with CE, and some have developed methods to predict them. Under pressure from countries around the world and given the importance of sustainable development, the Chinese government promised to reach its carbon emissions peak in 2030 (United Nations Climate Change Summit 2009). Therefore, Chinese energy consumption and CE have received worldwide attention. 8 The future trend of China's CE, which can directly affect the world's energy structure and environmental change, is a major concern. 9 Accordingly, many researchers are devoted to developing calculation methods to predict CE. We can divide their methods into two categories: 10 time series prediction and regression.
For time series prediction, researchers treat CE data as a time series and use CE from previous years to forecast CE in the following years. 11 Because many factors affecting CE cannot be fully counted, this kind of method assumes that the historical CE data contain all the information affecting CE. Therefore, both the input and the output of this kind of method are CE data alone. The gray prediction model (GM) and the autoregressive integrated moving average (ARIMA) were used by Hsiao-Tien Pao and Chung-Ming Tsai 12 to forecast CO2 emissions, energy, and gross domestic product (GDP) in Brazil in 2011. ARIMA was also used to predict greenhouse gas (GHG) emissions in India by Sen et al. 13 GM and ARIMA are suitable for stationary linear time series but cannot perform well in nonlinear situations. Then, in 2017, Pao et al 14 proposed the nonlinear gray Bernoulli model (NGBM) to predict the same factors in China. They compared their method with ARIMA and GM and found that NGBM performed best among these methods. In addition, the discrete form and the continuous form of the GM equation cannot be made exactly equivalent. To overcome this problem, the discrete gray model (DGM) was developed. 15 GM (1,1) and DGM (1,1) were used to predict CE, renewable energy, and economic growth in Vietnam by Ho et al. 16 Wu et al 17 developed a rolling multivariable gray forecasting model to predict CE in Brazil, Russia, India, China, and South Africa (the BRIC nations). In addition, Wenqing Wu et al 18 also proposed an improved nonhomogeneous gray model to predict CE in the BRIC nations. Other methods, such as the extended environmental Kuznets curve (EKC), have also been applied to CE prediction. 19 In summary, traditional methods such as GM and ARIMA are the most commonly used methods to predict CE as a time series. However, two problems exist in this type of method. (1) Novel and more powerful methods need to be applied to improve precision.
(2) Using only historical CE data to predict CE results in poor interpretability.
For regression methods, researchers use CE-related variables such as GDP and energy consumption to predict CE. 20 Therefore, the input of this type of method is the related variables and its output is CE. Zhou et al 21 considered the residential, commercial, industrial, transport, and power sectors to regress CE in China. The industry, transport, and buildings sectors were used to regress CE by Gambhir et al. 22 A system dynamic model (SDM) was employed to estimate the energy consumption and CE of China in 2008-2020 by Liu et al, 23 with GDP and different types of energy used to regress CE. Wei Sun and Mohan Liu 24 discussed the impact of the three major industries on CE using the least-squares support-vector machine (LSSVM). Zhu et al 25 also used LSSVM to predict the carbon price in 2017. However, the large number of support vectors in LSSVM leads to poorer sparsity and lower accuracy than SVM. Zheng-Xin Wang and De-Jun Ye 26 predicted CE from GDP using a nonlinear gray multivariable model. Huang et al 27 integrated sixteen potential influencing factors by principal component analysis (PCA) and used the long short-term memory (LSTM) method to predict CE in China. LSTM has proved to be very suitable for long-series prediction. 28 In summary, economic and energy-related factors are the most common variables used to regress CE, which makes the results easy to understand. However, this type of method has two drawbacks. (1) The model built in this way can only explain the relationship between the variables and CE; if we want to predict CE in the next few years, we must first know the related variables for those years. This leads to two types of error: the error of the related variables and the error of the model. Together, these errors can be much larger than their linear superposition. (2) Since the related variables for future years must be known first, this kind of method is not real prediction but a kind of estimation.
To solve the drawbacks of these two kinds of methods, we developed KLS, a fusion of the Kalman filter (KF), LSTM, and the support vector machine (SVM), which borrows ideas from both time series prediction and regression. First, LSTM is used to predict CE as a time series, which avoids the two types of error and the problem of fake prediction. Then, SVM is used to regress CE on the variables selected from 14 candidates, which solves the problem of poor interpretability. Finally, KF serves as a bridge between time series prediction and regression, integrating the results from LSTM and SVM. The final KF prediction is more accurate than that of either LSTM or SVM alone.

| Work frame
The work frame of our method is shown in Figure 1.
As we can see in Figure 1, the workflow of KLS can be divided into three steps.
The first step is to predict CE as a time series by LSTM. From this step we obtain the LSTM prediction result and its error covariance Q_LSTM (details in Section 2.2). The second step is to predict CE from other variables using SVM. Fourteen related variables are prepared first. Since some of them may not be strongly related to CE and multicollinearity may occur between some variables, ridge regression (RR) is used to select the significantly related variables (details in Section 2.3.1). P variables (P < 14) are selected for regression by SVM. To achieve real prediction, the values of the variables in the previous year are used to predict CE in the next year. At the end of this step, the SVM prediction result and its error covariance R_SVM are obtained (details in Section 2.3). Finally, KF is used to integrate the results from LSTM and SVM. The Kalman gain is computed from Q_LSTM and R_SVM and then used to update the final prediction. Lastly, the overall error covariance of the system is updated; this covariance is critical for calculating the Kalman gain in the next prediction step (details in Section 2.4).

| Predict CE by LSTM
Because many classical linear methods are difficult to adapt to multi-input prediction problems, LSTM, 29 a representative deep learning algorithm for time series prediction, has gradually replaced GM, ARIMA, and similar methods. LSTM solved the vanishing gradient problem of RNNs by introducing the "forget gate." LSTM is now a powerful tool for time series prediction and is widely used in the fields of stocks, 30 energy, 31 and economics. 32

| Preprocessing of time series
First, we encode CE as a supervised time series so that LSTM can use it for prediction. Formula (1) shows the encoding:

[CE_(t−m), …, CE_(t−1)] → CE_t,  t = m + 1, …, n  (1)

In this paper, the previous 5 years of data are used to predict the data of the 6th year, so m is set to 5 in formula (1): the previous 5 years of data form the left side, and the target data of the 6th year form the right side. In addition, n is the length of the whole CE series.
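The sliding-window encoding described above can be sketched as follows; the function name and the toy series are illustrative, not part of the paper:

```python
import numpy as np

def encode_series(ce, m=5):
    """Slide a window of length m over the series: each sample uses
    the previous m years of CE as input and the next year as target."""
    X, y = [], []
    for t in range(m, len(ce)):
        X.append(ce[t - m:t])   # previous m years
        y.append(ce[t])         # target year
    return np.array(X), np.array(y)

# toy series standing in for annual CE values
series = np.arange(1965, 1975, dtype=float)
X, y = encode_series(series, m=5)
print(X.shape, y.shape)  # (5, 5) (5,)
```

A series of length n yields n − m supervised samples, which is why the usable sample count shrinks slightly relative to the raw series.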

| Building model by LSTM
The forget gate distinguishes LSTM from a normal recurrent neural network (RNN), and it is the key reason why LSTM is suitable for time series prediction. 33 The forget gate helps LSTM "forget" some historical information so that it can focus on newly updated data that contain more useful information. For example, if we want to predict CE in 2020, the usefulness of CE in 1965 must be less than that of CE in 2019. LSTM processes data with the following formulas.

First, the data enter the input gate:

i_t = σ(W_i · [h_(t−1), x_t] + b_i)  (2)

where i_t is the input gate, h_(t−1) denotes the output of the previous step, x_t is the input data, b_i and W_i denote the bias and weight of the input gate respectively, and σ(·) is the sigmoid function.

Then, the data pass through the forget gate:

f_t = σ(W_f · [h_(t−1), x_t] + b_f)  (3)

where f_t is the forget gate, W_f is the weight of the forget gate, and b_f is its bias.

Then, the candidate state is computed:

C̃_t = tanh(W_C · [h_(t−1), x_t] + b_C)  (4)

where C̃_t denotes the candidate state, W_C its weight, and b_C its bias.

Then, the new cell state is obtained from the previous state. Since i_t, f_t, and C̃_t are given by (2), (3), and (4), we can substitute them into (5) to obtain the cell state of the current moment:

C_t = f_t ∗ C_(t−1) + i_t ∗ C̃_t  (5)

Finally, the cell state of the current moment is passed to the output gate:

o_t = σ(W_o · [h_(t−1), x_t] + b_o)  (6)

where o_t denotes the output gate, and W_o and b_o are its weight and bias, respectively.

The last step is to generate the output for the next iteration:

h_t = o_t ∗ tanh(C_t)  (7)

The structure of the LSTM network is shown in Table 1.
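A single LSTM cell step following the gate equations above can be sketched in NumPy; the weight layout and the tiny random example are assumptions for illustration, not the paper's trained network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step. W holds the weights of the input (i),
    forget (f), candidate (c), and output (o) gates, each applied to
    the concatenated [h_prev, x_t]; b holds the matching biases."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W['i'] @ z + b['i'])       # input gate
    f_t = sigmoid(W['f'] @ z + b['f'])       # forget gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])   # candidate state
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    o_t = sigmoid(W['o'] @ z + b['o'])       # output gate
    h_t = o_t * np.tanh(c_t)                 # output
    return h_t, c_t

# tiny example: hidden size 2, input size 1, random small weights
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((2, 3)) * 0.1 for k in 'ifco'}
b = {k: np.zeros(2) for k in 'ifco'}
h, c = np.zeros(2), np.zeros(2)
for x in [0.1, 0.2, 0.3]:                    # a short CE-like sequence
    h, c = lstm_step(np.array([x]), h, c, W, b)
print(h.shape)  # (2,)
```

Note how the forget gate f_t scales the previous cell state, which is the mechanism by which older observations lose influence.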

| Calculate error covariance of LSTM
Since the real error covariance of the prediction result cannot be known, we approximate it by the error covariance of the training results, obtained by formula (8):

Q_LSTM = std(LSTM(train) − CE_train)  (8)

where LSTM(train) denotes the training results, obtained by feeding the training data into the LSTM model; CE_train denotes the real CE values in the training dataset; and std(·) denotes the standard deviation function.

By formula (8), Q_LSTM, the estimated error covariance of LSTM, is obtained. If the prediction result of LSTM obeys a normal distribution, Q_LSTM is the standard deviation of this distribution; in other words, the prediction result obeys N(result, Q_LSTM).
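The residual-based approximation of formula (8) is a one-liner; the function name and the toy residuals below are illustrative:

```python
import numpy as np

def error_covariance(pred_train, ce_train):
    """Approximate the error covariance by the standard deviation
    of the training residuals, as in formula (8)."""
    return np.std(np.asarray(pred_train) - np.asarray(ce_train))

# toy example: three training predictions vs. true CE values
q_lstm = error_covariance([1.1, 2.0, 2.9], [1.0, 2.0, 3.0])
print(q_lstm)
```

The same function later yields R_SVM from the SVM training residuals, so both filters feed the Kalman update with comparable uncertainty estimates.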

| Predict CE by SVM
Since our data cover 1965 to 2014, only 50 samples are available. SVM is suitable for building models from small samples, so it is chosen to regress the CE-related variables onto CE.

| Select variables by RR
The first step is to select CE-related variables by RR. RR is a typical way to select the most significantly relevant variables for regression since it can effectively avoid multicollinearity. 34,35 Although RR is biased because it uses partial information, its coefficients are more reliable. 36 We prepared 14 potential variables that may be related to CE, and the Pearson correlation coefficients (PCC) between them and CE are all very high with low P-values; therefore, to avoid multicollinearity and reduce redundant information, RR is used to screen the variables.
The biggest difference between RR and ordinary linear regression is the loss function. The loss function of ordinary linear regression is:

L = Σ_i (y_i − β^T x_i)^2  (9)

However, the loss function of RR adds an L2 penalty:

L = Σ_i (y_i − β^T x_i)^2 + λ Σ_j β_j^2  (10)

As we can see, a regularization term with parameter λ is introduced in RR, which shrinks the coefficients of unimportant variables toward zero. In this way, we can screen the most significantly relevant variables from the 14 candidates for regression.
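The shrinkage effect of the RR loss can be demonstrated with the closed-form solution β = (XᵀX + λI)⁻¹Xᵀy; the synthetic collinear data below are an assumption for illustration only:

```python
import numpy as np

def ridge_coefficients(X, y, lam=1.0):
    """Closed-form ridge regression: beta = (X^T X + lam*I)^{-1} X^T y.
    The L2 penalty spreads weight across collinear columns and keeps
    coefficients of irrelevant columns near zero."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# two nearly identical (collinear) predictors plus one irrelevant column
rng = np.random.default_rng(1)
x1 = rng.standard_normal(400)
X = np.column_stack([x1,
                     x1 + 0.01 * rng.standard_normal(400),
                     rng.standard_normal(400)])
y = 3.0 * x1
beta = ridge_coefficients(X, y, lam=10.0)
# the two collinear columns share the true weight of 3;
# the irrelevant third column stays close to 0
```

In the paper's setting, variables whose ridge coefficients are driven toward zero (or whose P-values exceed .05) are the ones screened out.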

| Construct feature
The most important part of building the model is constructing the features. As mentioned in Section 2.3.1, we prepared 14 variables potentially related to CE, and RR was used to screen the most significantly CE-related ones; SVM therefore uses only these variables to regress CE. Suppose P variables are significantly related to CE; we take these P variables as the features of CE. To avoid fake prediction, the values of the P variables in the previous year are taken as the features of CE in the next year. In this way, once we obtain the values of the P variables for this year, we can predict the CE of next year.
The encoding method in formula (11) is the same as that in formula (1).
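The one-year-lag pairing can be sketched as below; the function name and the toy arrays are illustrative assumptions:

```python
import numpy as np

def lagged_features(variables, ce):
    """Pair the P selected variables of year t with CE of year t+1,
    so the model performs real prediction rather than same-year fitting."""
    X = variables[:-1]   # variables for years 1 .. n-1
    y = ce[1:]           # CE for years 2 .. n
    return X, y

# toy data: 6 years, P = 2 selected variables
vars_by_year = np.arange(12, dtype=float).reshape(6, 2)
ce = np.arange(6, dtype=float)
X, y = lagged_features(vars_by_year, ce)
print(X.shape, y.shape)  # (5, 2) (5,)
```

The lag costs one sample but is what lets the fitted model forecast next year's CE from data already in hand.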

| Building model by SVM
The input of SVM is the feature constructed in the previous section and the output is the corresponding CE. Since constructing an SVM model requires two parameters, a grid search with 10-fold cross-validation was used to select the best parameters. Finally, we set the regularization parameter to 1.4 and the kernel width parameter to 2.9, and the radial basis function (RBF) kernel was selected. The resulting SVM model can be written in the standard support-vector form:

CE = Σ_i α_i K(x_i, x) + b  (12)

where K(·,·) is the RBF kernel, x_i are the support vectors, and α_i and b are the learned coefficients and bias.
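The parameter search can be sketched as follows. To keep the sketch self-contained, kernel ridge regression stands in for the SVM solver (both fit an RBF-kernel expansion with a regularization parameter playing the role of C); the data and parameter grids are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def kernel_ridge_fit(X, y, C, gamma):
    """Kernel ridge as a lightweight stand-in for the RBF-kernel SVM
    regressor; C acts as the regularization parameter."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + np.eye(len(X)) / C, y)
    return lambda Xq: rbf_kernel(Xq, X, gamma) @ alpha

def grid_search(X, y, Cs, gammas, k=5):
    """Plain k-fold cross-validated grid search over (C, gamma)."""
    folds = np.array_split(np.arange(len(X)), k)
    best, best_err = None, np.inf
    for C in Cs:
        for g in gammas:
            err = 0.0
            for f in folds:
                mask = np.ones(len(X), bool)
                mask[f] = False
                model = kernel_ridge_fit(X[mask], y[mask], C, g)
                err += ((model(X[f]) - y[f]) ** 2).mean()
            if err < best_err:
                best, best_err = (C, g), err
    return best

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
C, gamma = grid_search(X, y, [0.5, 1.4, 5.0], [0.1, 1.0, 2.9])
```

In practice the paper uses 10-fold cross-validation; k = 5 is used here only because the toy dataset is small.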

| Calculate error covariance of SVM
This step is handled exactly as in the LSTM case: the error covariance of the training results is used to approximate the error covariance of the prediction,

R_SVM = std(SVM(train) − CE_train)

where SVM(train) denotes the training results, obtained by feeding the training data into the SVM model; CE_train denotes the real CE values in the training dataset; and std(·) denotes the standard deviation function.

| Integrate results by KF
The Kalman filter estimates the state of a dynamic system by statistically combining several uncertain sources of information. 37 It obtains a more accurate estimate by combining the state of the previous moment with the measured values.
In our work frame, LSTM obtains a prediction from historical values and SVM obtains a prediction from other measured variables. Therefore, the two kinds of information KF needs are both available, and KF can produce a more accurate estimate.

| Obtain final prediction by iteration of KF
There are five steps in a KF iteration.
Step 1. Input the CE data of previous years and obtain Q_LSTM.
Step 2. Obtain the prior covariance P′_k = P_(k−1) + Q_LSTM, starting from the initial value P_0.
Step 3. Obtain the Kalman gain K_k = P′_k / (P′_k + R_SVM) from R_SVM and P′_k.
Step 4. Obtain the final KF prediction x_k by correcting the LSTM prediction with the SVM prediction, weighted by K_k.
Step 5. Obtain the current covariance P_k = (1 − K_k) P′_k from the P′_k of Step 2.
Through these five steps, KF effectively filters the prediction noise of LSTM and SVM. After setting P_0 = 0.05, the final CE prediction is obtained.
The work frame of KF is shown in Figure 2.
The noise variances of LSTM and SVM are used to obtain the Kalman gain. The final CE prediction is then obtained from the Kalman gain and the prediction results of LSTM and SVM. Finally, the covariance is updated from Q, R, and the previous covariance; this updated covariance is used in the next round of prediction.
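The iteration above can be sketched as a scalar Kalman update, treating the LSTM prediction as the prior and the SVM prediction as the measurement; the exact wiring inside KLS may differ slightly, and the numeric values are illustrative:

```python
import numpy as np

def kls_fuse(lstm_preds, svm_preds, q_lstm, r_svm, p0=0.05):
    """Scalar Kalman-filter fusion: weight the LSTM prediction (prior)
    and the SVM prediction (measurement) by their variances."""
    p = p0
    fused = []
    for x_lstm, x_svm in zip(lstm_preds, svm_preds):
        p_prior = p + q_lstm                 # Step 2: prior covariance
        k = p_prior / (p_prior + r_svm)      # Step 3: Kalman gain
        x = x_lstm + k * (x_svm - x_lstm)    # Step 4: fused prediction
        p = (1.0 - k) * p_prior              # Step 5: updated covariance
        fused.append(x)
    return np.array(fused)

fused = kls_fuse([10.0, 11.0], [10.4, 11.2],
                 q_lstm=0.04, r_svm=0.09, p0=0.05)
# each fused value lies between the LSTM and SVM predictions
```

Because the gain depends on the ratio of the two variances, the more reliable predictor automatically receives more weight at every step.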

| Data description
We obtained energy consumption, coal consumption, oil consumption, natural gas consumption, nuclear consumption, hydroelectricity consumption, other renewable resource consumption, population, primary industry, secondary industry, tertiary industry, GDP, and industrial value-added for each year from the "China Statistical Yearbook." The CE data were obtained from the World Bank. Since the World Bank has only released CE data for China from 1965 to 2014, all the data used in this paper are from this period.

| Variable selection
RR coefficients and PCCs between 14 variables and CE are shown in Table 2.
As we can see in Table 2, PCC cannot help us screen variables, whereas 10 of the 14 variables are screened by RR (P-value lower than .05). PCC can only test the relevance between each variable and CE; although all these variables are strongly related to CE, they exhibit multicollinearity. In other words, the information offered by some variables has already been provided by others, which makes those variables far less useful.
Since the units of the 14 variables differ, the magnitudes of the RR coefficients are not important, but their signs are crucial for understanding each variable's contribution to CE. As common sense suggests, oil, hydroelectricity, nuclear energy, other renewable resources, and the tertiary industry can help reduce CE, so their RR coefficients are negative. However, the RR coefficient of natural gas is positive, which means that natural gas consumption increases CE. This is not hard to understand, since burning natural gas still produces carbon dioxide; natural gas is cleaner than oil only in that its combustion does not produce other pollutants and its processing and transport costs are lower.

FIGURE 2 Work frame of the Kalman filter
Overall, since the RR P-values of energy consumption, coal consumption, natural gas consumption, nuclear consumption, other renewable resource consumption, population, primary industry, secondary industry, GDP, and industrial value-added are lower than .05, these 10 variables are selected to predict CE by SVM.

As we can see in Figure 3, the KLS result is a combination of the LSTM and SVM results. Although LSTM performed best from 2009 to 2011, KLS performed best over the whole period. To compare the three methods further, we computed three error indicators, including the mean square error (MSE) and the average absolute error, as listed in Table 3. Table 3 clearly shows that KLS performed best on all three error indicators.

| Power of KF
This experiment shows the power of KF: based on the variances, KF corrects the error of each method and obtains a more accurate prediction result.

| Power of KLS
In this section, we compare KLS with some commonly used methods: GM (1,1), the nonlinear gray multivariable model (NGMM), ARIMA, and the back-propagation artificial neural network (BP-ANN).
In Figure 4, the blue line denotes the prediction result of KLS and the red line denotes the true CE values. The black line denotes the prediction result of NGMM, 26 and the line with points is the prediction result of GM (1,1). The prediction result of BP-ANN is shown as a dotted line, ARIMA as a dotted line with triangles, and the kernel-based nonlinear multivariate gray model (KGM) 38 as a dotted line with squares.
As we can see in Figure 4, KLS again performed best among these methods, while ARIMA was the least accurate. Table 3 shows the errors of these methods, and Table 4 shows the performance of each method: KLS has the smallest error, followed by NGMM and then BP-ANN, with ARIMA the worst.
This experiment shows that KLS is an effective method for forecasting CE and outperforms the previous methods.

| Prediction of Chinese CE in 2030 by KLS
Since the effectiveness of KLS was proved in Section 3.4, we use it in this section to predict the CE of China from 2015 to 2030.
This prediction involves two steps. The first is to estimate the CE of China from 2015 to 2019: since the 10 variables selected in Section 3.2 are available for 2014 to 2018 from the "China Statistical Yearbook," we can input them into the KLS model directly. The second step is to predict the CE of China from 2020 to 2030: because the variables for 2019 to 2029 cannot be obtained, we must first predict their trends and then feed them into the model.
Therefore, for the first step, we trained the model on the data from 1965 to 2014 and estimated CE from 2015 to 2019. The 10 variables (energy consumption, coal consumption, natural gas consumption, nuclear consumption, other renewable resource consumption, population, primary industry, secondary industry, GDP, and industrial value-added) were obtained from the "China Statistical Yearbook" for each year, and the historical CE from 1965 to 2014 was known, so the estimation was straightforward. Figure 5 shows the estimation results of KLS: CE leveled off from 2014 to 2016 and then gradually increased from 2017.

The next step was to predict the CE of China from 2020 to 2030. Since the 10 variables for 2019 to 2029 cannot be obtained, we summarized their trends from 2010 to 2018 and projected them under three situations: optimistic, normal, and pessimistic. As shown in Figure 6, the annual growth rates of all variables change little, except for 'other renewable resource'. Based on the boxplots, the first quartile, median, and third quartile of the historical growth rates were used to describe the trends of these variables from 2019 to 2029; Table 5 shows the predicted growth trends. Note that the growth of most variables increases CE, while the growth of nuclear consumption and other renewable resource consumption helps reduce CE. Therefore, for variables other than nuclear consumption and other renewable resource consumption, the optimistic growth rate should be lower than the pessimistic one; that is, the pessimistic and optimistic values are swapped. Finally, based on the predicted growth rates, the values of the 10 variables from 2019 to 2029 were projected, and KLS was used to predict the CE of China from 2020 to 2030.
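The quartile-based scenario construction can be sketched as follows; the historical series is a made-up stand-in for one variable over 2010-2018, and the sign swap for CE-reducing variables (described above) is applied afterwards:

```python
import numpy as np

def scenario_growth_rates(values):
    """From a variable's historical annual values, compute year-over-year
    growth rates and take the first quartile, median, and third quartile
    as the three scenario rates."""
    values = np.asarray(values, dtype=float)
    rates = values[1:] / values[:-1] - 1.0
    q1, med, q3 = np.percentile(rates, [25, 50, 75])
    return q1, med, q3

def project(last_value, rate, years):
    """Compound the chosen growth rate forward over the horizon."""
    return last_value * (1.0 + rate) ** np.arange(1, years + 1)

# illustrative series standing in for one variable over 2010-2018
hist = [100, 104, 109, 113, 118, 124, 129, 135, 142]
q1, med, q3 = scenario_growth_rates(hist)
future = project(hist[-1], med, years=3)  # normal-situation projection
```

The projected variable values for each scenario are then fed into the trained KLS model to produce the three CE trajectories.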
As shown in Figure 7, we predicted the CE of China from 2015 to 2030 under the three situations. The red line shows the prediction in the normal situation, the black line the pessimistic situation, and the green line the optimistic situation. The results suggest that the CE peak will be reached around 2024 and CE will then gradually decrease; before that, CE in China will keep increasing.

| CONCLUSIONS
China has become the world's highest carbon emitter, and its impact on the environment has received much attention. Forecasting China's CE can show the energy demand and environmental impact in advance, which is conducive to the improvement of low-carbon development policies.
Many researchers have developed methods to forecast CE. Due to the complex nonlinear relationship between CE and other variables, the accuracy of these methods is unsatisfactory. In addition, most of these methods cannot truly predict CE; they can only do so by obtaining the corresponding variables in advance. Most scholars first roughly estimate the corresponding variables and then feed them into a regression model to predict CE. This results in a secondary transmission of errors, which directly affects the accuracy of the results.
To solve these problems, KLS is proposed in this paper. KLS combines the advantages of time series prediction and regression and integrates the two kinds of methods through statistical filtering. LSTM is used to predict CE as a time series, and SVM is used to predict CE from 10 relevant variables. Finally, KF corrects the results of LSTM and SVM and gives a more accurate result. The 10 relevant variables are selected by RR, which effectively avoids the problem of multicollinearity. In addition, to achieve real prediction by regression, the 10 variables of one year are used to regress the CE of the next year; in this way, we can predict the CE of 2015 from the ten variables of 2014.
Then, two experiments were conducted to show the superiority of KLS. In these experiments, we trained the KLS model on CE data from 1965 to 2005 and tested it on data from 2006 to 2014. The first experiment compared KLS with LSTM and SVM; the results showed that fusing the two methods with KF achieves better accuracy than using either one alone. The second experiment compared KLS with GM (1,1), ARIMA, BP-ANN, and NGMM; KLS performed best among these methods. Overall, these experiments proved that KLS is well suited to forecasting CE.
After verifying the effectiveness of KLS, we used it to predict the CE of China up to 2030. Since we had CE data from 1965 to 2014 and the values of the 10 variables from 2014 to 2018, we first estimated CE from 2015 to 2019. In the next step, we estimated the 10 variables from 2019 to 2029 based on their historical trends, under three situations: pessimistic, normal, and optimistic. We then fed these variables into the KLS model to predict CE from 2020 to 2030, obtaining CE predictions for the three situations. The results show that CE in China will keep increasing, reach its peak around 2024, and then gradually decrease.

ACKNOWLEDGMENT
This work was supported by the National Social Science Foundation of China (No: 16CJY074).