An ensemble‐driven long short‐term memory model based on mode decomposition for carbon price forecasting of all eight carbon trading pilots in China

The carbon trading market has become a powerful instrument for alleviating carbon emissions in China, and the carbon price is at the core of its operation. Hence, forecasting the carbon price accurately in advance is indispensable to a well-functioning carbon trading market. This paper innovatively explores an ensemble-driven long short-term memory network (LSTM) model based on complementary ensemble empirical mode decomposition (CEEMD) for carbon price forecasting, applying it to all eight carbon trading pilots in China. CEEMD was initially implemented for mode transformation in order to decompose the original complicated mode into a set of simple modes. Then, the partial autocorrelation function (PACF) selected time-lagged features as inputs for each mode. Subsequently, LSTM was used to model the mapping between the time-lagged factors and each mode's target values, constructing multiple LSTM models for ensemble learning. Finally, the inverse CEEMD computation was introduced to integrate the predicted results of the multiple modes into the final results. The empirical application simultaneously embraced all eight carbon pilots in China, covering their corresponding carbon price data over a considerably long period. The obtained results illustrate that the proposed ensemble-driven model possesses sufficient accuracy in carbon price forecasting in China compared with the single LSTM model as well as other conventional artificial neural network models. Furthermore, given the scope of its application, the model exhibits strong stability and universality.


| INTRODUCTION
Global climate change, triggered by the excessive emission of greenhouse gases, has become an obstacle to sustainable development. Among all greenhouse gases, carbon dioxide is the most prominent and is artificially controllable. China ranks as the largest emitter of carbon dioxide worldwide and is a pioneer in economic growth. 1 As the international community advocates reducing carbon emissions, Chinese carbon emissions and their changes have become the focus of attention, which places tremendous pressure on the Chinese government. The EU Emissions Trading Scheme (ETS) is considered an internationally effective means of reducing carbon emissions according to a previous study. 2 Inspired by this, and in order to cope with growing economic development demands and alleviate the adverse effects of climate change, a total of eight carbon trading pilots have been established in China since 2013. This was followed by the national emissions trading scheme (ETS), introduced on December 19, 2017, which nearly doubled the coverage of the carbon pricing system. The carbon price is well known to be at the heart of establishing a stable and effective ETS. Besides, accurately grasping the trend of price fluctuations in advance may both assist the government in constructing a stable and mature carbon price mechanism and help investors take timely steps to reduce investment risk. In this light, exploring an accurate, stable, and universal carbon price forecasting technique is of prime significance for the Chinese ETS.

| Overview of studies on carbon price forecasting and other relevant prediction techniques
An increasing number of academic studies exist on carbon price forecasting, and the techniques applied embrace two types: time series prediction and multi-factor prediction. Multi-factor prediction refers to several related exogenous variables being used in the forecasting process. This method has achieved definite success, though in certain cases adverse consequences have emerged, owing to two obvious issues. First, the final prediction results are based on the prediction values of other exogenous variables, which may cause errors to accumulate. 3 Second, multicollinearity may be present among the selected variables, inevitably leading to over-fitting. 4 Time series forecasting is independent of exogenous variables and can acquire future trends by using intrinsic characteristics of the historical data. Therefore, time series prediction is both feasible and highly efficient. Moreover, the wide application of time series prediction 5,6 has demonstrated its validity in forecasting. Accordingly, in this study, multi-factor forecasting was abandoned and time series forecasting was adopted to anticipate the carbon price.
The time series forecasting technique is usually data-based and can be categorized into two branches: statistical methods and machine learning algorithms. Statistical methods previously employed for ETS carbon price forecasting include multiple linear regression, 7 generalized autoregressive conditional heteroskedasticity (GARCH)-type models, 8 the heterogeneous autoregressive model of realized volatility (HAR-RV), 9 and the grey model GM(1,1). 10 Nevertheless, the nonlinearity and high volatility of carbon prices may not be handled well enough by statistical methods. The advent of machine learning has spawned a considerable number of studies on future trends of the carbon price in which nonlinear methods are involved, for example, the artificial neural network (ANN), the least squares support vector machine (LSSVM), and other hybrid methods. Yi et al 11 confirmed the effectiveness of a feed-forward neural network (FFNN) technique in analyzing future trends of the EUA carbon price. Tsai and Kuo 12 carried out the radial basis function neural network (RBFNN) with parameters optimized by ant colony optimization so as to forecast the carbon price; additionally, an empirical study on carbon price forecasting for four seasons demonstrated that the RBFNN is superior to the contrast models. Zhu et al derived a model combining empirical mode decomposition and particle swarm optimization, which improved the LSSVM to forecast the EU carbon price; the corresponding results proved that the proposed model was more appropriate than conventional models in terms of accuracy and stability. 3 In order to obtain a more accessible method for predicting the carbon price, Zhu 13 proposed an ANN-based multiscale ensemble forecasting model which integrated empirical mode decomposition, genetic algorithms, and artificial neural networks for carbon price forecasting, yielding superior forecasting results.
Among these methods, the traditional ANN and the hybrid models extended from it are extensively used and have yielded quite adequate results, with strong fitting capability and fault tolerance for nonlinear and complex time series. However, modeling of the nonlinear carbon price pattern is still underdeveloped, as existing techniques are static in nature and unable to handle long-term dependencies.
The recurrent neural network (RNN) is a special neural network structure belonging to deep learning. Unlike traditional ANNs, for example, the FFNN and RBFNN, it incorporates feedback loops, enabling an information flow that forms a closed loop between two adjacent hidden layers, which may subsequently learn features in the long term and extract hidden information more comprehensively. 14 However, the inability of the traditional RNN to solve the vanishing or exploding gradient problem could lead to inaccurate results; thus, an alternative method called LSTM emerged to effectively overcome this issue. As a promising approach, the LSTM model has attained state-of-the-art performance in challenging prediction problems. Zhang et al 15 innovatively adopted the LSTM model to predict wind turbine power, and case studies proved that its forecasting accuracy could be greatly improved. Moreover, the convergence speed of LSTM is faster than that of the FFNN, RBFNN, and other approaches. Xia et al presented an LSTM-driven deep learning framework to forecast the remaining useful life of machines, incorporating multiple time windows to further enhance its applicability and effectiveness; their empirical study verified its superiority over individual benchmark models. 16 Ma et al 17 combined the Grid concept and LSTM (G-LSTM) for the forecasting of fuel cell degradation. Moreover, recent research has applied LSTM to other active areas of prediction, for example, electricity price forecasting, 18 flood forecasting, 19 wind speed forecasting, 20 air pollution forecasting, 21 voltage forecasting, 22 demand forecasting, 23 photovoltaic power forecasting, 24 and so forth. 25,26 In these studies, LSTM was wielded to alleviate the vanishing gradient in multi-layer network architectures, and the prediction results validated the developed models. However, to our knowledge, few studies have applied this technique to carbon price forecasting.
Consequently, this study attempted to extend the LSTM model into carbon price modeling and forecasting and provide new references for accurate carbon price prediction.
Considering the high volatility and intermittence of the researched carbon price series, a single model is insufficient for obtaining satisfactory forecasting results. Ensemble learning theory holds that various models or multi-mode learning can extract inherent characteristic information from multiple dimensions, resulting in better forecasting quality than individual models or algorithms. 27 Wang et al ensembled three kinds of machine learning approaches, namely the random forest model, the long short-term memory model, and the Gaussian process regression model, to forecast wind gusts; the sufficient accuracy and generalization performance of the established model was then verified by a series of empirical studies. 28 Yu et al presented a novel hybrid ensemble model for forecasting monthly biofuel production. In this model, the original time series was decomposed and reconstructed into multiple modes by the empirical mode decomposition method and a fine-to-coarse approach, after which the modes were separately predicted and the individual predictions were accumulated to generate the final prediction results. The aforementioned study indicated that the proposed hybrid ensemble forecasting technique competitively predicted biofuel production. 29 Ribeiro and dos Santos 30 analyzed the role of different types of ensemble learning methods (Bagging, Boosting, and Stacking) in predicting price series, and concluded that ensemble techniques showed statistically significant gains, reducing prediction errors to a large extent. In addition, many recent studies 20,31-33 have analyzed ensemble learning methods and have made great progress in this regard. Thus, ensemble learning based on mode transformation methods was introduced to forecast the carbon price in this study.
Mode transformation methods used in existing studies have mainly included empirical mode decomposition (EMD), 3 variational mode decomposition (VMD), 34 ensemble empirical mode decomposition (EEMD), 35 and the wavelet transform (WT). 36 However, these methods are not sufficiently advanced: for example, EMD suffers from modal aliasing, and EEMD overcomes this issue but leaves residual noise. The complementary ensemble empirical mode decomposition (CEEMD) is a type of self-adaptive approach used to decompose a time series into multiple modes, which are named intrinsic mode functions (IMFs). 37 As the most up-to-date advancement pertaining to EMD, 38 CEEMD alleviates the mode mixing issue of the traditional EMD by adding white noise and implementing an averaging calculation, while addressing the issue that the white noise added in EEMD is not completely neutralized. 39 Moreover, the decomposition level of CEEMD is determined only by the length of the time series and is independent of the chosen basis function. 40 Thus, CEEMD is preferable for integration with LSTM in order to establish an ensemble learning framework (ED-LSTM). Additionally, to the best of our knowledge, CEEMD has never been used in carbon price time series forecasting. In light of these aspects, an innovative ensemble-driven LSTM model (ED-LSTM) based on CEEMD was proposed in this paper. The superiority of the ED-LSTM model mainly derives from two aspects: LSTM has memory units and can acquire serialization features and long-term dependencies, and the ensemble learning architecture enables the model to enhance feature representation, information extraction, and the application of long short-term relationships, so that the data can be deeply explored and their features utilized to the greatest potential.
The modeling process entailed the following three steps: (a) the CEEMD was introduced to multi-mode feature extraction, transforming a complex single mode into multi-mode; (b) LSTM was implemented on each mode to learn features; (c) the inverse CEEMD calculation was utilized in integrating the results of multi-mode learning.

| Overview of empirical research on carbon price of China carbon trading pilot
Gokhan et al 41 pointed out that, under the premise of ensuring forecasting accuracy, generality and stability are extremely important for a model. The generality of a model refers to the breadth of its application, and conducting research over a long period can, to a certain degree, measure whether a forecasting model has enough generality. The Chinese carbon trading pilots emerged relatively late; hence, studies are limited. Table 1 lists related papers from the perspectives of the number of China carbon trading pilots covered as well as the corresponding time horizon. Moreover, the table demonstrates that most research attempts to predict the carbon price of one or several carbon markets and deduce the good applicability of the method to other carbon price series. However, this is not reasonable enough, as the inherent characteristics of carbon price data vary across the different Chinese carbon markets. In past studies, only one paper 54 predicted the carbon prices of all eight carbon markets, which shows that, thus far, very few related studies exist on forecasting all Chinese carbon prices. The researched period range of that study, 54 however, was not long enough, and the sample capacity was limited, which lacked comprehensiveness for the carbon trading markets in China. In addition, the study 54 proposed two kinds of models to predict carbon prices, one of which could not fully adapt to all data characteristics and obtain good prediction results. Therefore, the proposed model was not sufficiently universal.
In order to strengthen the existing literature, this paper aims to explore a general and stable forecasting model that can precisely anticipate future carbon prices for the eight carbon trading pilots in China given the time series data. An ensemble-driven LSTM model (ED-LSTM) based on complementary ensemble empirical mode decomposition (CEEMD), which builds multiple LSTM models on different modes for ensemble learning, was explored to forecast the carbon price. Methodologically, CEEMD was initially employed for multi-mode feature extraction, and PACF was executed in each mode so as to select time-lagged factors as inputs. Subsequently, LSTM was implemented for multi-mode feature learning, and the inverse CEEMD computation was ultimately utilized to integrate the forecasted results of the multiple modes. Empirically, all eight pilot carbon trading markets in China were simultaneously researched, and the proposed model was applied to forecast carbon prices in all markets during a long forecasting period. The range of each carbon price series essentially covered the daily carbon price from the inception of trading to the most recent period. The contributions and findings of this study are concisely summarized as follows:
• This paper focuses on the accuracy of the model as well as its stability and universality.
• This study newly proposes a general carbon price forecasting technique and applies it to accurately and simultaneously forecast all Chinese carbon prices, spanning data over a considerably long period.
• A novel ED-LSTM model is established in order to forecast the carbon price. LSTM is introduced to learn long-term characteristics. Moreover, inspired by ensemble learning, an ED-LSTM model based on CEEMD is formed, capable of deeply extracting inherent features of data from various aspects, thus greatly enhancing its predictive performance.
• The explored model is shown to be accurate, stable, and universal, and may serve as a promising tool for Chinese carbon price forecasting.

FIGURE 1 A, The simple architecture of FFNN and RBFNN. B, The simple architecture of RNN

FIGURE 2 The structure of the LSTM network
The rest of the paper is organized as follows. Section 2 describes in detail the methodologies used throughout this paper. In Section 3, we introduce the proposed model and its building process. Section 4 presents eight case studies of the China carbon markets using the proposed method as well as related models executed for comparison; the forecasting results of the utilized models are also analyzed and compared. In Section 5, we draw some key conclusions.
FIGURE 4 All eight China carbon markets and data description

2 | RELATED METHODOLOGIES

| Complementary ensemble empirical mode decomposition (CEEMD)
The CEEMD, a novel adaptive signal decomposition approach, was developed in 2010. 55 As illustrated in Equation (1), it is utilized to decompose the original complex time series signal into a limited number of intrinsic mode functions (IMFs) and a residue component:

$x(t) = \sum_{i=1}^{n} C_i(t) + R(t)$ (1)

where $C_i(t)$ denotes the ith IMF and $R(t)$ is the residue component that represents the main trend of the original signal.
Complementary ensemble empirical mode decomposition is a promising substitute for EMD and EEMD, given that it can not only effectively solve the problems of decomposition instability, mode aliasing, and endpoint effects in the EMD method but also suppress the error caused by the artificially added white noise in EEMD; meanwhile, it is computationally efficient. In the process of executing the CEEMD algorithm, two vital parameters are involved: the amplitude k of the added white noise, usually set to 0.05-0.5 times the standard deviation of the original signal, and the iteration time P. Driven by the data characteristics and empirical research, this paper sets k as 0.4 and P as 100. This setting ensures both the quality and the speed of the decomposition. The succinct calculation procedure of CEEMD is as follows.
(1) Two new signals $x_p^{+}(t)$ and $x_p^{-}(t)$ are obtained by adding a pair of standard white noise series of the same magnitude and 180° phase angle difference to the original signal $x(t)$, as in Equation (2), where $x_p^{+}(t)$ represents the summation of the original data and the positive noise, while $x_p^{-}(t)$ is the summation of the original data and the negative one:

$x_p^{+}(t) = x(t) + w_p(t), \quad x_p^{-}(t) = x(t) - w_p(t)$ (2)

(2) Both $x_p^{+}(t)$ and $x_p^{-}(t)$ are decomposed by EMD, yielding the IMF sets $C_{i,p}^{+}(t)$ and $C_{i,p}^{-}(t)$.

(3) Steps (1) and (2) are repeated P times until a smooth decomposition signal is acquired. The ensemble pattern of all corresponding IMFs is computed by the following Equation (3):

$C_i(t) = \frac{1}{2P} \sum_{p=1}^{P} \left[ C_{i,p}^{+}(t) + C_{i,p}^{-}(t) \right]$ (3)

where $C_i$ represents the ith final IMF component derived from CEEMD and P is the iteration time set in this paper.
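The paired-noise ensemble procedure above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes a pluggable `emd` routine (e.g., from the PyEMD package) that splits a signal into components summing back to the signal, and the function name and interface are our own.

```python
import numpy as np

def ceemd(x, emd, P=100, k=0.4, seed=0):
    """Paired-noise ensemble wrapper in the spirit of CEEMD.

    `emd` is any routine mapping a 1-D signal to a list of components
    (IMFs plus residue) that sum back to the signal.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    sigma = k * np.std(x)                  # noise amplitude relative to the data
    trials = []
    for _ in range(P):
        w = rng.normal(0.0, sigma, size=len(x))
        trials.append(emd(x + w))          # positive-noise realization
        trials.append(emd(x - w))          # complementary negative-noise one
    n_modes = max(len(t) for t in trials)
    # pad shorter decompositions with zeros so modes align before averaging
    modes = np.zeros((n_modes, len(x)))
    for t in trials:
        for i, c in enumerate(t):
            modes[i] += c
    return modes / (2 * P)                 # Equation (3): ensemble average
```

Because the noise is added in complementary pairs, it cancels exactly in the ensemble average, which is the property that suppresses the residual noise left by EEMD.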

| The partial autocorrelation function (PACF)
The partial autocorrelation function is capable of capturing correlations between a given time series and its lagged values, which has made it widely used in describing the structural characteristics of stochastic processes. For a time series $x_t$, the so-called partial autocorrelation coefficient of lag k measures the correlation between $x_{t-k}$ and $x_t$ after removing the effects of the intermediate k − 1 variables.
Overall, the k-order autoregressive model is described in Equation (4):

$x_t = \phi_{k1} x_{t-1} + \phi_{k2} x_{t-2} + \cdots + \phi_{kk} x_{t-k} + u_t$ (4)

where $\phi_{kj}$ denotes the jth regression coefficient of the kth-order autoregressive equation, and $\phi_{kk}$, the last coefficient among them, is the partial autocorrelation coefficient at lag k.
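For illustration, the coefficients $\phi_{kk}$ can be computed with the Durbin-Levinson recursion; the sketch below is a hand-rolled stand-in for library routines such as statsmodels' `pacf` (the function names and the confidence-band selection helper are our own, chosen to mirror the lag-selection rule used later in the paper).

```python
import numpy as np

def pacf(x, nlags):
    """Partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    # sample autocorrelations r_0 .. r_nlags
    r = np.array([np.dot(x[: n - k], x[k:]) / np.dot(x, x) for k in range(nlags + 1)])
    phi = np.zeros((nlags + 1, nlags + 1))
    out = np.zeros(nlags + 1)
    out[0] = 1.0
    for k in range(1, nlags + 1):
        if k == 1:
            phi[1, 1] = r[1]
        else:
            prev = phi[k - 1, 1:k]
            # phi_kk = (r_k - sum_j phi_{k-1,j} r_{k-j}) / (1 - sum_j phi_{k-1,j} r_j)
            phi[k, k] = (r[k] - prev @ r[1:k][::-1]) / (1.0 - prev @ r[1:k])
            # update the remaining coefficients of order k
            phi[k, 1:k] = prev - phi[k, k] * prev[::-1]
        out[k] = phi[k, k]
    return out

def select_lags(x, nlags=10, z90=1.645):
    """Keep lags whose PACF lies outside the approximate 90% band +/- z/sqrt(n)."""
    band = z90 / np.sqrt(len(x))
    p = pacf(x, nlags)
    return [k for k in range(1, nlags + 1) if abs(p[k]) > band]
```

For an AR(1) process, only the lag-1 coefficient should exceed the band, which is exactly the cutoff behavior the PACF-based input selection relies on.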

| Long short-term memory network (LSTM)
Long short-term memory network is a kind of recurrent neural network (RNN). Traditional neural networks (NNs), for example, the FFNN and RBFNN shown in Figure 1(A), treat pieces of information as independent, ignoring the relevancy between the data and considering only the input data at the current time. Significantly distinct from traditional NNs, the RNN, shown in Figure 1(B), incorporates feedback loops that allow an information flow forming a closed loop between two adjacent hidden layers, constructing a self-connecting hidden layer. Theoretically, this structure makes the hidden units share parameters across time indexes, and a long-sequence memory is thus built for recognition and prediction. In practice, however, the issue of vanishing gradients prohibits its extensive application in complex long-term tasks.
Therefore, the LSTM proposed by Hochreiter and Schmidhuber has gradually developed into an available method. 56 It can capture serialized features and long-term dependencies. Meanwhile, it can eliminate the gradient vanishing issue existing in the traditional RNN algorithm by introducing a memory cell and a self-connected gate mechanism in the hidden units. Among these core improvements, memory cells are exploited for recording historical information. The gate structure does not provide information; rather, it is essentially a multi-level feature selection method to remove or add messages to the cell state. Figure 2 depicts the structure of LSTM memory units. Four main operational phases are described in conjunction with Figure 2.
I. Forgetting phase. The input from the previous node is selectively forgotten; that is, the forget gate controls the retention or forgetting of information from the last cell state. The behavior of the forget gate is shown in Equation (5):

$\Gamma_f = \sigma(W_f [a^{(t-1)}, x^{(t)}] + b_f)$ (5)
II. Selecting and memorizing phase. Determine what new information will be reserved in the current cell state. The sigmoid layer is the input gate layer determining which values will be updated; the tanh(·) layer creates a new candidate value vector $\tilde{C}^{(t)}$ to be added to the current state, as in Equation (6):

$\Gamma_i = \sigma(W_i [a^{(t-1)}, x^{(t)}] + b_i), \quad \tilde{C}^{(t)} = \tanh(W_c [a^{(t-1)}, x^{(t)}] + b_c)$ (6)

III. Updating phase. Update the cell state from $C^{(t-1)}$ to $C^{(t)}$ by the following Equation (7): the previous cell state $C^{(t-1)}$ is multiplied by $\Gamma_f$ to discard the information that needs to be forgotten, and the candidate cell state $\tilde{C}^{(t)}$ multiplied by the input gate $\Gamma_i$ is added:

$C^{(t)} = \Gamma_f * C^{(t-1)} + \Gamma_i * \tilde{C}^{(t)}$ (7)

IV. Outputting phase. The output is obtained according to the cell state. First, the sigmoid layer decides which part of the cell state will be output. Then, the cell state is processed by the tanh(·) layer and multiplied by the above result to finally determine the output information, as in Equation (8):

$\Gamma_o = \sigma(W_o [a^{(t-1)}, x^{(t)}] + b_o), \quad a^{(t)} = \Gamma_o * \tanh(C^{(t)})$ (8)
where $\Gamma_f$, $\Gamma_i$, and $\Gamma_o$ denote the forget gate, the input gate, and the output gate, respectively; $a^{(t)}$ is the implicit vector of all useful information stored at time t and before; $C^{(t)}$ represents the cell state vector; $W_f$, $W_i$, $W_c$, and $W_o$ denote the weight matrices of the forget gate, the input gate, the current input cell state, and the output gate, respectively; and $b_f$, $b_i$, $b_c$, and $b_o$ are the corresponding offset vectors. $\sigma(\cdot)$ is the sigmoid function and $\tanh(\cdot)$ is the hyperbolic tangent function.
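The four gate and update operations described above can be written out directly as one memory-cell step. This is a didactic sketch with parameter names of our own choosing, not the network implementation used in the case studies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM memory-cell update (forget, select/memorize, update, output)."""
    z = np.concatenate([a_prev, x_t])                # [a^(t-1); x^(t)]
    gamma_f = sigmoid(params["W_f"] @ z + params["b_f"])   # forget gate
    gamma_i = sigmoid(params["W_i"] @ z + params["b_i"])   # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate cell state
    c_t = gamma_f * c_prev + gamma_i * c_tilde             # cell-state update
    gamma_o = sigmoid(params["W_o"] @ z + params["b_o"])   # output gate
    a_t = gamma_o * np.tanh(c_t)                           # hidden state
    return a_t, c_t
```

Because the cell state is carried additively through `c_t`, gradients can flow across many steps without vanishing, which is the property the text credits for LSTM's long-term memory.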

| Ensemble-driven LSTM technique based on mode decomposition CEEMD (ED-LSTM)
The core concept of the ensemble learning framework is integrating one or more constituent algorithms or multi-mode learning to obtain improved performance. 14 Given the nonlinear complexity and high volatility nature of carbon price series in single mode, along with its vulnerability to external factors, the ensemble-driven LSTM model based on mode decomposition CEEMD was employed to improve the performance of LSTM.
According to the aforementioned description of the CEEMD process, its application procedure in this paper was as follows:
1. Standard white noise was added into the original carbon price time series, so that the multi-mode frequencies could be separated more readily, facilitating the handling of mode-mixing-related issues.
2. Different IMFs originating from the raw carbon price series were identified.
3. The ensemble calculation on the corresponding decomposed IMFs and R was realized as the ultimate result:

$Y(t) = \sum_{i=1}^{m} M_i(t) + R(t)$ (9)

where Y represents the original carbon price series, which can subsequently be decomposed into m IMFs (marked as M) and a residue mode R. After CEEMD, PACF was applied to extract latent input information for each IMF and R. Then, LSTM was performed on each mode. Specifically, (m + 1) LSTM models were applied to learn the features of the different modes, respectively. The modeling architecture may be presented as:

$\hat{M}_i(t) = f_i\big(M_i(t-1), \ldots, M_i(t-k)\big), \quad \hat{R}(t) = f_R\big(R(t-1), \ldots, R(t-k)\big)$ (10)

where $M_i(t-k)$ denotes the selected input factors of each IMF mode, exerting influence on $M_i$ under the given confidence interval, and k refers to the time-lagged feature of order k. Analogously, $R(t-k)$ are the input variables of the R mode, $\hat{M}_i(t)$ refers to the prediction results of each IMF, and $\hat{R}(t)$ is the prediction result of the R mode.
Finally, the inverse CEEMD was used to reconstruct the forecasted carbon price value $\hat{Y}(t)$ by means of Equation (11):

$\hat{Y}(t) = \sum_{i=1}^{m} \hat{M}_i(t) + \hat{R}(t)$ (11)

| Building process of the proposed model
Essentially, by adhering to the technical route "mode decomposition, separate modeling, and ensemble learning", a novel ED-LSTM model was explored to forecast the carbon price in Chinese carbon markets. The building process is presented in Figure 3, and the detailed summarization is given below.
Step 1: The CEEMD technique was initially implemented for mode transformation to decompose the original complicated mode into a set of simple modes, which was conducive to acquiring ideal prediction results. Through this approach, m IMFs as well as a residue mode R were generated.
Step 2: PACF was performed on each decomposed mode to mine suitable input features for the follow-up forecasting procedure.
Step 3: (m + 1) LSTM models were constructed according to Equation (10) in order to acquire their forecasting values.
Step 4: The inverse CEEMD was applied to gain the predicted carbon price values according to Equation (11). Ultimate predicting results can be obtained by aggregating forecasting values of all IMFs and R.
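Steps 1-4 amount to a decompose-model-aggregate skeleton, sketched below with pluggable stand-ins: `decompose` for CEEMD and `fit_predict` for the per-mode PACF + LSTM pipeline. The function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def ed_forecast(y, decompose, fit_predict):
    """ED-LSTM skeleton: Step 1 decompose, Steps 2-3 model each mode,
    Step 4 aggregate the per-mode forecasts (inverse CEEMD)."""
    modes = decompose(y)                          # Step 1: IMFs plus residue R
    mode_preds = [fit_predict(m) for m in modes]  # Steps 2-3: one model per mode
    return np.sum(mode_preds, axis=0)             # Step 4: sum back to Y-hat
```

The design point is that the aggregation is a plain sum, so any per-mode error cancels or accumulates transparently, and each mode's model can be trained and tuned independently.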

| China carbon markets and datasets description
Since the establishment of the first carbon trading market pilot, the Shenzhen carbon market, on June 19, 2013, the Chinese government has successively established seven regional carbon trading markets in Beijing, Guangdong, Shanghai, Tianjin, Hubei, Chongqing, and Fujian. Figure 4 displays the carbon trading volume (unit: 10 000 tons) from their inception to June 1, 2019; the larger the bubble, the greater the trading volume. The total volume is up to 17 122, which is nearly a quarter of the total national carbon emissions in 2016, showing the prosperity of the carbon market. Besides, the carbon price is the focal point in the course of carbon trading. Therefore, it is necessary to comprehensively predict the Chinese carbon prices of all carbon markets because of the differences in their economic development, geographical location, and policy regulations. The target of this research is to apply an innovative approach to forecasting the carbon price in all eight Chinese carbon markets. The original daily carbon price series were obtained from the China carbon trading website (http://www.tanjiaoyi.com/). The concrete values are shown in Appendix S1. The acquired time series are drawn in Figure 4, where the three parallel lines from top to bottom represent the maximum, average, and minimum values of each carbon price series; high volatility, nonlinearity, uncertainty, and complexity are clearly seen. More detailed statistical information is illustrated in Table 2, including the minimum value (Min), the maximum value (Max), the mean value (Mean), the standard deviation (Std), and the sample entropy (SE). Among them, Std measures the degree of dispersion of the data distribution, and SE denotes the complexity of a time series: the smaller the SE, the less complex the given series, and vice versa.
Each series was divided into a training set and a test set in the case studies, where approximately 80% of the data were used to train the model and the rest were employed as the observations. Figure 4 shows the curve of each carbon price series, containing the daily carbon price for each carbon market over a long period since its inception; the left and right sides of the vertical red line represent the training subset and the test subset, respectively. Details are presented in Table 2.

| Multi-mode feature extraction and input selection
The proposed mode transformation technique was employed to decompose each original carbon price series into 8 IMFs and one residue R. The Shenzhen carbon price series decomposed by CEEMD is taken as an example for illustration. The modes in Figure 5(A) are shown in order of frequency from high to low. The decomposition results of the other carbon price series used in this paper are similar. Subsequently, PACF was utilized to extract input features that have high relevance to each decomposed mode, considering the short time interval and the intrinsic relationship of the selected data. Setting $x_i$ as the output, once the PACF result at lag k falls outside the 90% confidence interval, $x_{i-k}$ is mined as an input feature. Figure 5(B) likewise takes the Shenzhen carbon price as an example to display the input variables of each mode. Table 3 lists the input variables for each mode of the carbon price series of the eight different carbon markets.
Thus, nine LSTM models were built and trained to learn the features of each mode, respectively.
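The PACF-selected lags translate into model inputs as follows. This is a minimal sketch with helper names of our own (`make_supervised`, `chrono_split`): each row of the input matrix holds the lagged values for one time step, and the chronological 80/20 split mirrors the one used in the case studies.

```python
import numpy as np

def make_supervised(x, lags):
    """Build inputs from a PACF-selected lag set: row for time t holds x[t-k] for each k."""
    x = np.asarray(x, float)
    kmax = max(lags)
    X = np.array([[x[t - k] for k in lags] for t in range(kmax, len(x))])
    y = x[kmax:]
    return X, y

def chrono_split(X, y, train_frac=0.8):
    """Chronological split: the first 80% for training, the rest for testing."""
    cut = int(len(y) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

Keeping the split chronological (no shuffling) matters for time series: the test set must lie strictly after the training set to avoid look-ahead leakage.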

| Parameter setting
In this paper, the ED-LSTM model was explored for carbon price forecasting in the eight Chinese carbon markets. Its construction and modeling procedures were based on the parameter settings in Table 4. The parameter setting of the LSTM technique is illustrated in detail here. Comprehensively considering performance and computational cost, an LSTM structure with two layers was chosen in this paper. As for the number of nodes in the second layer, if the number of hidden layer nodes is too small, the network will lack the necessary learning and information processing ability; on the contrary, if there are too many, the complexity of the network structure will greatly increase, the network will be more likely to fall into a local minimum during the learning process, and learning will become very slow. Therefore, choosing a moderate number of hidden layer nodes is necessary; we set it as 10 in this paper. Since the total sample size of each empirical study in this paper is relatively large, the batch size and time step were both set relatively large, at 20, in order to improve computational efficiency. The learning rate and training epochs were set as 0.0006 and 500, respectively, based on previous trial-and-error research. As for the inverse CEEMD arithmetic, the input nodes $\hat{M}_i(t)$ and $\hat{R}(t)$ denote the prediction results of each IMF and the R mode, respectively, while the output node $\hat{Y}(t)$ is the final prediction result.
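For reference, the settings described above can be collected into a single configuration object; this is a hypothetical consolidation of our own, not an artifact from the paper, with the key names chosen for readability.

```python
# Hypothetical consolidation of the LSTM settings described in the text (Table 4).
ED_LSTM_CONFIG = {
    "lstm_layers": 2,         # two-layer LSTM structure
    "hidden_nodes": 10,       # moderate number of nodes in the second layer
    "batch_size": 20,         # relatively large, for computational efficiency
    "time_step": 20,
    "learning_rate": 0.0006,  # from trial-and-error research
    "training_epochs": 500,
}
```

A single dictionary like this keeps the nine per-mode models consistently parameterized and makes the trial-and-error values easy to audit.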
For the purpose of examining the performance of the proposed ED-LSTM, the traditional NNs, including FFNN and RBFNN, as well as the single LSTM, were executed as benchmark models for carbon price forecasting. The detailed parameter setting of each contrast model is listed in Table 5. The number of hidden layer nodes of the FFNN model was set as 5 because the number of hidden nodes is generally set to 75% of the number of input layer nodes; in this paper, the number of input nodes was mostly between 1 and 10, so the number of hidden layer nodes was set to 5. The iteration time was set as 100 because, in the case studies of this paper, once the number of iterations exceeded 100, the results changed very little, so there was no need to increase the number of iterations. The learning rate and goal were determined by trial and error: adjusting these two parameters separately, we chose the values that gave the best results, that is, a learning rate of 0.1 and a goal of 0.00004. The parameters of the RBFNN model were obtained by network self-training. As for the modeling data, the same training data and exactly the same input data were used.

| Performance evaluation criteria
Firstly, traditional statistical criteria were utilized in this paper to measure forecasting performance. Among them, the mean absolute error (MAE) is the average absolute error between the original and predicted data, the mean absolute percentage error (MAPE) measures the average forecasting accuracy, the root mean square error (RMSE) tests model stability, and the coefficient of determination (R²) denotes the goodness of fit. Smaller MAE, MAPE, and RMSE together with a larger R² indicate better prediction quality, and vice versa. These criteria are defined as

MAE = (1/N) Σ_{t=1..N} |y_t − y*_t|
MAPE = (1/N) Σ_{t=1..N} |(y_t − y*_t)/y_t| × 100%
RMSE = √[(1/N) Σ_{t=1..N} (y_t − y*_t)²]
R² = 1 − Σ_{t=1..N} (y_t − y*_t)² / Σ_{t=1..N} (y_t − ȳ)²

where y_t and y*_t are the real values and the predicted results of the carbon price series, respectively, N is the number of samples, and ȳ is the mean of the real values.
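Using the definitions above, the four criteria can be computed directly; a minimal NumPy sketch:

```python
import numpy as np

def metrics(y, y_hat):
    """Return (MAE, MAPE in %, RMSE, R^2) for real values y and forecasts y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y - y_hat
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y)) * 100.0      # assumes no zero prices
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return mae, mape, rmse, r2
```

A perfect forecast yields MAE = MAPE = RMSE = 0 and R² = 1.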
Moreover, A_MAE, A_MAPE, A_RMSE, and A_R2 were introduced to evaluate the improvement of the proposed model. If all of these indices are greater than 0, the proposed model is better; the closer their values are to 0, the smaller the improvement the proposed model renders. A_MAE, A_MAPE, A_RMSE, and A_R2 are defined in Equations (16)-(19).
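Equations (16)-(19) are not reproduced in this excerpt; assuming the standard relative-improvement form (error metrics improve when they shrink, R² improves when it grows), the indices could be computed as:

```python
def amelioration(proposed, benchmark):
    """Relative improvement of the proposed model over a benchmark.
    Both arguments are dicts with keys 'mae', 'mape', 'rmse', 'r2'.
    All four indices are positive when the proposed model is better."""
    return {
        "a_mae":  (benchmark["mae"]  - proposed["mae"])  / benchmark["mae"],
        "a_mape": (benchmark["mape"] - proposed["mape"]) / benchmark["mape"],
        "a_rmse": (benchmark["rmse"] - proposed["rmse"]) / benchmark["rmse"],
        "a_r2":   (proposed["r2"]   - benchmark["r2"])   / benchmark["r2"],
    }
```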
where the subscripts 1 and 2 denote the statistical indices of the proposed model and the benchmark model, respectively. Besides, a stability test was also carried out in this study to evaluate the capabilities of the models; its principle depends on Equation (20).
The smaller the S_Var value, the higher the stability, and vice versa.
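Equation (20) is likewise not reproduced in this excerpt. A common choice for such a stability statistic, assumed here, is the variance of the forecast errors; a minimal sketch under that assumption:

```python
import numpy as np

def s_var(y, y_hat):
    # Assumed form of Eq. (20): variance of the forecast errors.
    # A model whose errors are constant (even if biased) scores 0.
    e = np.asarray(y, float) - np.asarray(y_hat, float)
    return float(np.var(e))
```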

| Results and discussion
According to the aforementioned procedure, the proposed ED-LSTM model and the contrast models (LSTM, FFNN, and RBFNN) were established. The final forecasting results of the related models are shown in Figures 6-13, which represent the carbon prices of Shenzhen, Beijing, Guangdong, Shanghai, Tianjin, Hubei, Chongqing, and Fujian, respectively. The Shenzhen carbon price, shown in Figure 6, serves as an example to discuss the forecasting results. The figure presents the fitting curves of the forecasting values from the different models, the frequency histogram of the relative error (RE), and the scatter correlation diagrams.
The fitting curve of the ED-LSTM model was nearest to the actual curve, implying the best fit. Moreover, in the frequency histogram, the RE of the proposed model is distributed more toward the left side compared with the benchmark models; in other words, the proposed model had small errors in most instances. In the scatter correlation diagram, the regression line denotes that the real value equals the predicted value; thus, the closer the scatters are to this line, the better the performance. The scatters of the ED-LSTM model were nearest to and most concentrated around the regression line. Analogously, the forecasting results for the other seven Chinese carbon price series (Figures 7-13) show that the proposed model was superior in three respects: its fitting curve was closest to the actual curve, its RE was relatively small, and its scatters were nearest to and most concentrated around the regression line. The values of the traditional statistical criteria are listed in Table 6 and shown more clearly in Figure 14, with the optimal values marked in bold in Table 6. Although the evaluation values of the ED-LSTM model varied across the carbon price series, it achieved the smallest MAE, MAPE, and RMSE and the highest R² in every case, unlike the benchmark models. Besides, the statistical values of the LSTM model were also better than those of the FFNN and RBFNN models: its MAE, MAPE, and RMSE were smaller, and its R² was larger. Key findings can be derived from these statistical indicators.
First, the prediction performance of the ED-LSTM model was superior to that of the single LSTM model because the ensemble technique was based on mode decomposition, which effectively predicts each of the more stable decomposed modes and thereby significantly enhances prediction accuracy. This also indicates the merit of the "mode decomposition, separate modeling, and ensemble learning" principle. Second, the single LSTM model generated prediction results closer to the true values than the single FFNN and RBFNN models, which is mainly attributable to its ability to capture serialization features and long-term dependencies.
In the statistical index radar charts (A), (B), and (C) in Figure 14, the pink line representing the ED-LSTM model lies at the innermost side, followed by the yellow line representing the LSTM model. In radar chart (D) of Figure 14, the pink line lies at the outermost side, followed by the yellow line. Together these show reduced MAE, MAPE, and RMSE but a higher R² for the proposed model across all eight carbon price series. Combining Figure 15 and Table 6, the models with a memory architecture (ED-LSTM and LSTM), especially ED-LSTM, greatly enhanced forecasting accuracy compared with FFNN and RBFNN.
Another four evaluation indices, A_MAE, A_MAPE, A_RMSE, and A_R2, were also computed, as shown in Figure 15. The stability test results are listed in Table 7, where the smallest S_Var is highlighted in bold. Obviously, the S_Var of the proposed model was the smallest: the S_Var values of the ED-LSTM model were 0.00111, 0.00223, 0.00061, 0.00019, 0.00009, 0.00017, 0.00325, and 0.00008 in the eight case studies, respectively. Thus, compared with the benchmark models, the developed model also possessed strong prediction stability.
Moreover, to verify the positive effect of PACF-based input feature selection on the prediction results, a contrast model, ED-LSTM*, that omitted the PACF step was also executed; in this model, the input features were set manually. The accuracy and stability evaluation results of all experiments are listed in Tables 6 and 7. Accordingly, it can be found that adding PACF improves performance relative to the ED-LSTM* model; hence, PACF is essential in constructing the forecasting model.
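The PACF step selects, for each mode, the time-lagged values that are significantly correlated with the target. A self-contained NumPy sketch of this selection (using the Durbin-Levinson recursion and the usual 95% confidence bound of 1.96/√N; the helper names and the lag cap of 10 are our assumptions):

```python
import numpy as np

def pacf(x, nlags):
    """Partial autocorrelations up to `nlags` via the Durbin-Levinson recursion."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    acf = np.array([np.sum(x[:n - k] * x[k:]) / np.sum(x * x)
                    for k in range(nlags + 1)])
    pac = [1.0, acf[1]]
    phi = np.array([acf[1]])                 # AR coefficients at order 1
    for k in range(2, nlags + 1):
        num = acf[k] - phi @ acf[1:k][::-1]
        den = 1.0 - phi @ acf[1:k]
        phi_kk = num / den                   # last coefficient = PACF at lag k
        phi = np.concatenate([phi - phi_kk * phi[::-1], [phi_kk]])
        pac.append(phi_kk)
    return np.array(pac)

def select_lags(x, nlags=10):
    """Keep lags whose |PACF| exceeds the 95% bound, as candidate LSTM inputs."""
    bound = 1.96 / np.sqrt(len(x))
    p = pacf(x, nlags)
    return [k for k in range(1, nlags + 1) if abs(p[k]) > bound]
```

For an AR(1) series, for example, lag 1 dominates the PACF and would be selected.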
Overall, the following remarkable conclusions may be inferred from these results:
a. According to Table 6 and Figures 6-13, the forecasting results for the Beijing and Guangdong carbon prices were relatively poor, while those for Tianjin and Hubei were good, which corresponds to the volatility and complexity of the original data shown in Table 2: the Beijing carbon price shows relatively large volatility and complexity, the Guangdong carbon price is the most unstable and relatively complex, the Tianjin carbon price shows relatively small volatility and complexity, and the Hubei carbon price has the smallest fluctuation.
b. In contrast to the traditional ANNs (FFNN and RBFNN), the LSTM and the proposed ED-LSTM model performed well. This may be because traditional ANN models are static in nature, being driven only by historical data, while models possessing memory units can capture serialization features and long-term dependencies.
c. The overall performance assessments show that the ensemble-driven hybrid model based on mode decomposition outperforms the individual model once the data fluctuate strongly. This may be because the ensemble learning architecture enhances feature representation, information extraction, and the exploitation of long- and short-term relationships, so that the data are deeply explored and their features are utilized to the greatest potential.
d. The proposed model was applied to forecasting all eight Chinese carbon price series over large samples. The forecasting results demonstrated that the model was precise, stable, and sufficiently general, showing its capabilities in Chinese carbon price forecasting.

| CONCLUSIONS
The precise prediction of the carbon price is essential nowadays, as carbon trading markets are being widely established to promote environmental protection. This paper explored a general and stable forecasting model to efficiently anticipate future carbon prices from time series data, applied to all eight pilot carbon trading markets in China. Specifically, an ED-LSTM model that builds multiple LSTM models on different modes for ensemble learning was explored for carbon price forecasting. The model was implemented in three steps: (a) CEEMD was first employed for multi-mode feature extraction, and PACF was executed on each mode to select time-lagged factors as inputs; (b) LSTM was then employed for multi-mode feature learning; (c) inverse CEEMD computation was finally utilized to integrate the forecasted results of the multiple modes. In practice, the performance of the developed ED-LSTM was compared with the single LSTM as well as two kinds of conventional ANN, using evaluation indicators including MAE, MAPE, RMSE, R², A_MAE, A_MAPE, A_RMSE, and A_R2. The proposed model improved on the LSTM by 38.9%, 34.6%, 48.0%, 49.4%, 17.2%, 47.1%, 38.0%, and 42.9% in the eight carbon price series, and was still more enhanced relative to the two conventional static ANN models. Additionally, the stability test demonstrated that the developed model was superior in prediction stability as well as prediction accuracy.
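Step (c) above is simple in form: because CEEMD decomposes the series additively into IMFs plus a residue, the inverse CEEMD computation reduces to summing the per-mode forecasts, Ŷ(t) = Σ_i M̂_i(t) + R̂(t). A one-line sketch (the function name is ours):

```python
import numpy as np

def inverse_ceemd(imf_forecasts, residue_forecast):
    """Recombine per-mode forecasts into the final carbon price forecast:
    y_hat(t) = sum_i IMF_i_hat(t) + R_hat(t)."""
    return np.sum(imf_forecasts, axis=0) + np.asarray(residue_forecast, float)
```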
Taking the daily carbon prices of all eight Chinese carbon trading pilots over an extended period as the research object, the obtained forecasting results indicated the satisfactory performance of the proposed model, which possessed sufficient accuracy, stability, and strong universality. Overall, the proposed method may serve as a promising tool for Chinese carbon price forecasting.