Crude oil price forecasting: An ensemble‐driven long short‐term memory model based on CEEMDAN decomposition and ALS‐PSO optimization

As the lifeline of various industries, crude oil is frequently considered a pillar of economic development. Accurate and reliable prediction of crude oil prices can provide investors and decision makers with valuable guidance for formulating their strategies. However, the complexity of the crude oil market and the volatility of oil prices pose significant challenges to forecasting. To achieve higher prediction accuracy, this paper proposes a new model that introduces a modal component reconstruction rule based on sample entropy (SE) and an improved particle swarm optimization (PSO) algorithm that avoids becoming trapped in local optima. Two real crude oil price series are selected for experimentation, and the results demonstrate that the proposed model exhibits superior performance in predicting crude oil prices. The developed model outperforms the compared models in terms of mean absolute error, root-mean-square error, mean absolute percentage error, and coefficient of determination. These results indicate that Complete Ensemble Empirical Mode Decomposition with Adaptive Noise exhibits better decomposition performance, that SE can effectively reduce error accumulation, and that particle swarm optimization with an adaptive learning strategy (ALS-PSO) achieves better optimization results for a long short-term memory network than PSO alone. Therefore, the proposed model is an effective crude oil price prediction tool that can be applied in practice for investment and policy making.


| INTRODUCTION
As the basis and driving force for the progress of human civilization, energy has occupied an important position in the history of human development. With energy being the lifeline of various industries and the national economy, rapid economic development and long-term social stability are inseparable from it.1 Energy prices, which are considered an important factor in ensuring people's living standards and developing national economies, are closely linked to global economic development, international relations, and national policies.3-7 The accurate prediction of energy prices is crucial for governments, financial companies, and energy companies. Governments can avoid policy failure or resource waste by accurately predicting the trend of energy prices when formulating energy policies, maximizing profits and ensuring the stability of the energy supply.8 Financial companies can profit from crude oil futures and derivatives trading, while energy companies must predict prices to formulate production and sales strategies that maximize profits. If the predictions are inaccurate, these companies may suffer huge economic losses.9 Predicting crude oil prices is also essential for investors. Investors can decide whether to enter a market or adjust their investment portfolios on the basis of prediction results. Accurate predictions can help investors achieve better returns in a market while avoiding risks that threaten their investments. Therefore, accurate predictions of crude oil prices can provide valuable reference information to investors, helping them make more informed decisions. However, crude oil prices are highly nonlinear and nonstationary,10-12 making the accurate forecasting of crude oil prices challenging. Nevertheless, many researchers are still engaged in the search for more accurate prediction models.
In accordance with existing research, commonly used forecasting methods are classified into three categories: econometric models, artificial intelligence (AI) methods, and hybrid models.
Econometric models, such as GARCH and ARIMA, have been widely applied to energy price forecasting.14,15 Hou and Suardi16 used a nonparametric GARCH model to predict the volatility of crude oil prices, and their experimental results showed that the nonparametric GARCH-based model achieved higher prediction accuracy than parametric GARCH. The nonparametric GARCH model is highly adaptable and flexible because it imposes no parameter restrictions. However, the lack of specified parameters also makes the model harder to interpret and requires more data for prediction, leading to longer training time and higher computational costs. Nan et al.17 applied the ARIMA model to predict China's energy ecological footprint. Sheoran and Pasari18 investigated the applicability of the window-sliding ARIMA (WS-ARIMA) method to daily and weekly wind speed forecasts. Their experimental results on daily and weekly wind speed data showed that the WS-ARIMA method consistently outperformed the conventional ARIMA method. The ARIMA model exhibits high predictive accuracy and interpretability once its parameters are determined. However, it assumes stationarity of the time series data and may fail to capture the features and trends of complex and nonstationary data. Moreover, the accuracy of ARIMA predictions largely depends on the selection of parameters, which requires expertise and skill. The WS-ARIMA model is more effective in handling nonperiodic, sudden, and volatile data and can adjust its parameters to accommodate changes in the time series. It can also make predictions at multiple time points, improving prediction accuracy. However, it has drawbacks such as longer training times, higher computational costs, and the need for expertise in selecting the window size and ARIMA parameters. Additionally, WS-ARIMA predictions may be less interpretable than those of ARIMA and may be difficult to explain intuitively. To address these issues, some scholars have turned to AI methods for nonlinear and nonstationary time series analysis.
AI methods can learn and map rules from large amounts of historical data to build nonlinear predictive models. By training and learning approximations iteratively, AI methods can effectively handle nonlinear and nonsmooth time series. Compared with econometric models such as ARIMA and GARCH, machine learning models can automatically adapt to the variability and complexity of the data without the need to manually adjust the model parameters. Thus, they provide higher prediction accuracy than statistical methods.21-23 Tarmanini et al.24 forecast electricity load data by using two models: ARIMA and ANN. They also compared the two forecasting models in terms of error factor, and the results showed that the ANN model could cope better with nonlinear data, while ARIMA could only manage linear data. The ANN model has advantages in nonlinear modeling, strong adaptability, parallel computing, scalability, and the ability to process large-scale data, making it a powerful tool for solving complex problems and predicting nonlinear data. However, when the data set is small or the model is too complex, ANN is prone to overfitting, resulting in insufficient generalization ability. The optimization of ANN typically employs the gradient descent algorithm, which is susceptible to the influence of the initial weight values, leading to the algorithm being trapped in a local optimal solution and unable to obtain the global optimal solution. Fałdziński et al.25 selected energy commodities (crude oil, natural gas, heating oil, diesel, and gasoline) and compared the forecasting performance of GARCH models with support vector regression (SVR) models; they showed that SVR forecasts were typically more accurate than GARCH forecasts. The GARCH model assumes that data changes are linear, while SVR can handle nonlinear relationships through kernel functions. For nonlinear data, SVR can better capture complex patterns, thereby improving prediction accuracy. The goal of SVM is to find an optimal decision boundary that maximizes accuracy on new, unseen data, which means that SVM can effectively avoid overfitting and has good generalization ability. However, SVM is slow in processing large-scale and high-dimensional data sets, requiring a significant amount of computing resources and time. Additionally, it is sensitive to noise and outliers, which may affect the performance of the model. Chen et al.26 used LSTM to predict oil import risk and compared its prediction results with those of other models, such as SVM, to verify whether the LSTM model exhibited a better fit. The advantages of LSTM are its excellent performance in handling long sequences, its ability to effectively capture long-term dependencies in the sequence, and its ability to avoid the vanishing-gradient problem of traditional recurrent neural networks, thereby reducing the likelihood of local optima. The disadvantages of LSTM include its high model complexity and computational cost, as well as its requirement for large amounts of training data and time for training.
A conclusion can be drawn that, compared with econometric models, AI models can effectively improve the prediction accuracy of oil prices. However, all these models have their own strengths and weaknesses, and thus no single model can be applied to all situations.
To overcome the shortcomings of individual models, many scholars have proposed hybrid models in recent years. The purpose of fusing models is to combine the strengths of each model and offset their weaknesses, thereby improving the predictive performance of the overall model.
Common hybrid models can also be divided into three categories. The first type combines optimization algorithms with a prediction model. For example, optimization algorithms such as particle swarm optimization (PSO),27 genetic algorithm (GA),28 and grey wolf optimization (GWO)29 are used to improve the accuracy of prediction. Soleimanzade et al.30 used the PSO algorithm to optimize three deep neural networks based on convolutional neural networks and LSTM neural networks to improve the performance of an intelligent energy management system (IEMS) when studying energy management during the desalination process. Finally, GWO and GA were used to optimize the model for a comparison test, and the experimental results showed that PSO exhibited a better optimization effect. Song et al.31 used the PSO algorithm to optimize the hyperparameters of the LSTM model to predict oil production. Their experiments showed that PSO-LSTM outperformed the LSTM model. These methods are common optimization algorithms based on biological behavior or natural phenomena, designed to solve complex optimization problems. In PSO, each individual is called a "particle," and the particles move at a certain speed in the search space to find the optimal solution. The PSO algorithm has advantages such as easy implementation and requiring little parameter tuning, but it may easily get stuck in local optima. GA simulates the process of biological population evolution through selection, crossover, mutation, and other operations to find the optimal solution. GA has advantages such as strong global search ability and insensitivity to the representation of the problem solution, but it also has disadvantages such as slow convergence speed and difficult parameter selection. The GWO algorithm has advantages such as strong global search ability and ease of implementation, but it also has disadvantages such as difficult parameter selection and a tendency to get stuck in local optima. Previous studies have shown that models optimized by PSO outperform other algorithms, which may be due to the sensitivity of the data set features to PSO optimization, or because the optimization properties of PSO are more suitable for this problem. However, this result does not necessarily mean that the PSO algorithm is better than GWO and GA, because different problems may require different algorithms. Nevertheless, many studies suggest that models optimized by optimization algorithms perform better than those without them. The second type is a combination of different forecasting models. Alameer et al.32 proposed an LSTM-deep neural network (DNN) model for predicting future coal price fluctuations. Compared with other models, their model utilized an LSTM neural network at the front end to extract implicit features from the initial feature volume and a DNN at the back end to classify and identify the transient stability results on the basis of the features extracted at the front end. Then, it fused the LSTM neural network with the DNN by means of linear superposition. The LSTM neural network can effectively solve the gradient vanishing and exploding problems that traditional neural networks encounter when processing time-series data, and it has good long-term memory capability. The LSTM-DNN model requires certain preprocessing of the input data, such as time-windowing for time-series data, which may have an impact on the data dimensions. Xie et al.33
combined multiple superimposed LSTM and backpropagation neural network (BPNN) models to propose LSTM-BPNN + BPNN square, improving prediction accuracy by dividing the cooling load of an improved ground source heat pump system into separate predictions of sensible and latent heat. The BPNN model has a simple structure, a small number of parameters, and relatively fast training speed. By stacking LSTM and BPNN, the advantages of the two models can be effectively combined to improve prediction accuracy. However, the BPNN model is not as effective as other neural network models in fitting nonlinear data. The stacking process of the LSTM-BPNN + BPNN model is complex and requires sufficient parameter tuning for both models.
Although both types of hybrid model can further improve the prediction accuracy of a single model, neither can extract endogenous patterns from the time series. Thus, some scholars have proposed a third type of hybrid model by combining signal decomposition algorithms with forecasting models. The price series is decomposed into several eigenmodal functions, that is, intrinsic mode functions (IMFs), through the signal decomposition algorithm. Each IMF represents different characteristics of the original signal, such as short-term fluctuations and periodicity, reducing the prediction difficulty of the model. Bedi and Toshniwal34 used the empirical mode decomposition (EMD)35-LSTM method to forecast electricity demand for a given season, date, and time interval, comparing the forecast results with a recurrent neural network (RNN), LSTM, and EMD-RNN. Their results showed that EMD-LSTM outperformed the other competing models. Wu et al.36 used ensemble EMD (EEMD)37-LSTM to predict crude oil prices. Their empirical results showed that their proposed model remained valid with good robustness and high accuracy when the number of decomposition results changed. EMD-LSTM and EEMD-LSTM are effective in handling nonlinear and nonstationary time series data, allowing better reflection of the intrinsic patterns and trends in the data. Preprocessing data with EMD or EEMD decomposes the data into multiple subsequences with different scales and frequencies, enabling the LSTM network to better learn the multiple features of the data while reducing noise and random fluctuations. However, for some nonlinear, nonstationary time series data such as stock prices, temperature, and energy consumption, EMD or EEMD decomposition may result in modal overlap or modal confusion, which can affect the decomposition results and prediction accuracy.
In summary, deep learning models have demonstrated superior predictive performance in various domains compared with econometric models, but deep learning models also have their own drawbacks. For RNN, the gradients from the later part of the sequence are difficult to back-propagate to the earlier part when the sequence is long, resulting in vanishing gradients. LSTM and the gated recurrent unit (GRU) address this problem by introducing gates with sigmoid functions, combining them with tanh functions, and adding a summation operation, reducing the likelihood of vanishing gradients and balancing short-term and long-term dependencies. Therefore, LSTM and GRU are superior to RNN. Between LSTM and GRU, GRU has fewer parameters and faster convergence speed, and thus its actual computational time is less, which can considerably accelerate the iteration process of the model.38 To improve model performance, researchers have attempted to introduce decomposition algorithms to extract oscillation characteristics in crude oil price sequences. However, some defects must still be addressed. First, when using EMD and EEMD, modal aliasing may occur and the reconstructed sequence may not equal the original, reducing the quality of decomposition and producing time-consuming and error-accumulation problems due to the large number of obtained IMFs. In addition, optimization algorithms play an important role in model training. Common algorithms include GA, ant colony optimization (ACO), and PSO. ACO uses a positive and negative feedback mechanism, but it requires a long search time, is prone to stagnation, and is highly sensitive to the initial parameter settings. Therefore, the application scope of ACO is limited to a certain extent. By comparison, GA and PSO are stochastic search algorithms based on global optimization. GA is based on the idea of natural evolution, and it continuously optimizes individuals in the population through operations such as inheritance, crossover, and mutation to achieve the global optimal solution. However, GA does not have a memory function, and thus previous knowledge is destroyed as the population changes. In addition, although GA's coding and genetic technology are relatively simple compared with PSO, GA is slower than PSO. PSO simulates the process of particles moving in a search space, constantly updating the position and velocity of particles with the aim of reaching the global optimal solution. PSO exhibits excellent robustness and fast convergence, and it is widely used in the structural optimization of neural networks. However, PSO is prone to premature convergence during iteration and may fall into local optima, and thus measures should be taken to avoid this situation.
On the basis of the preceding discussion, the current study proposes a hybrid oil price prediction model by using complete EEMD with adaptive noise (CEEMDAN),39 sample entropy (SE),40 PSO with an adaptive learning strategy (ALS-PSO), and LSTM. First, CEEMDAN is used to decompose the oil price signal by adding pairs of Gaussian white noise to solve the problems of mode mixing and unequal reconstruction after decomposition. Second, SE is used to evaluate the complexity of each IMF, and the IMFs are partitioned and reconstructed on the basis of their complexity to reduce the number of IMFs, computational time, and error accumulation. Subsequently, LSTM is used to model the reconstructed subsequences because of its ability to extract long-term dependencies in time series. ALS-PSO is employed to optimize the parameters of LSTM. It can adaptively perform subgroup division and particle updates, avoiding entrapment in local optima, although its effectiveness in other research fields requires further verification. Finally, the prediction results of all subsequences are combined to obtain the final prediction result.
The innovations and contributions of this study are as follows.
1. A new SE-based method for reconstructing sequences is used. In this process, sequences of higher complexity are kept separate while sequences of lower complexity are combined, effectively reducing the superposition of errors and model time consumption. In Section 5.2, models with this reconstruction approach are compared with models without it to verify its validity.

2. A novel optimization algorithm, ALS-PSO, was employed to adjust the model parameters when predicting nonstationary and nonlinear time series data. This algorithm exhibits faster convergence speed and superior global search capability compared with traditional optimization algorithms and proved to be particularly effective in our research. Empirical results demonstrated that models utilizing ALS-PSO outperformed those employing other optimization algorithms in terms of prediction accuracy and convergence speed. In the future, the application of ALS-PSO in this field has broad research prospects and application value, with the potential to be further explored in other domains.
The remainder of this paper is organized as follows. Section 2 (Methodology) presents the methods covered in this study. Section 3 (Hybrid model framework) describes the basic process of the proposed model. Section 4 (Experimental preparation) provides the data sets, evaluation criteria, description of the parameters, and benchmark models. Section 5 (Empirical analysis) performs predictions and compares the experimental results of single and hybrid models. Section 6 (Conclusions) presents the conclusions and directions for future work.

| CEEMDAN
EMD is a self-adaptive decomposition method that can be applied to linear, nonlinear, stationary, and nonstationary signals41; it is widely used in prediction models. However, when a sudden change occurs in the interference signal, the EMD result may exhibit mode mixing,42 rendering the IMFs meaningless and failing to represent the features of the time series. EEMD43 solves the mode mixing problem by adding a large amount of Gaussian white noise to EMD; however, the reconstructed sequence may not be equal to the original sequence. CEEMDAN44 can not only avoid mode mixing but also considerably reduce, or even eliminate, the residual noise component in the reconstructed signal.
The steps of CEEMDAN are as follows.
Step (1): A white noise series is added to the original time series, and the first EMD mode is averaged over all realizations to obtain the modal component IMF1. Let $x(n)$ be the original time series, $\varepsilon_0$ be the noise coefficient, and $\omega^{(i)}(n)$ be the white noise sequence added at the $i$th realization. The sequence after adding noise for the $i$th time is
$$x^{(i)}(n) = x(n) + \varepsilon_0\,\omega^{(i)}(n), \quad i = 1, 2, \ldots, N.$$
The mean of the first modes obtained from the $N$ EMD runs gives the first mode component:
$$\overline{\mathrm{IMF}}_1(n) = \frac{1}{N}\sum_{i=1}^{N}\mathrm{IMF}_1^{(i)}(n).$$
Step (2): The residual sequence obtained in Step (1) is calculated as $r_1(n) = x(n) - \overline{\mathrm{IMF}}_1(n)$, and IMF2 is then calculated from it:
$$\overline{\mathrm{IMF}}_2(n) = \frac{1}{N}\sum_{i=1}^{N}E_1\!\left(r_1(n) + \varepsilon_1 E_1\!\left(\omega^{(i)}(n)\right)\right),$$
where $E_k(\cdot)$ denotes the $k$th mode extracted by EMD.
Step (3): The calculation of Step (2) is repeated for $k = 2, 3, \ldots$ to obtain the residual sequences $r_k(n) = r_{k-1}(n) - \overline{\mathrm{IMF}}_k(n)$ and the mode components
$$\overline{\mathrm{IMF}}_{k+1}(n) = \frac{1}{N}\sum_{i=1}^{N}E_1\!\left(r_k(n) + \varepsilon_k E_k\!\left(\omega^{(i)}(n)\right)\right).$$
Step (4): The steps above are repeated until the remaining residual can no longer be decomposed. At this point, the intrinsic mode components IMFk and the residual sequence RES have been obtained, and the original time series can be written as
$$x(n) = \sum_{k=1}^{K}\overline{\mathrm{IMF}}_k(n) + \mathrm{RES}(n).$$
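For illustration, this decomposition can be reproduced with the open-source PyEMD package (a minimal sketch, assuming the PyEMD/EMD-signal library is available; the input file name is a placeholder, and the noise settings mirror the configuration described in Section 4.3 rather than the authors' exact implementation):

```python
# Sketch: CEEMDAN decomposition of a crude oil price series.
# Assumes the PyEMD package (pip install EMD-signal) is installed.
import numpy as np
from PyEMD import CEEMDAN

prices = np.loadtxt("wti_close.csv")        # hypothetical file of daily closing prices

ceemdan = CEEMDAN(trials=500, epsilon=0.2)  # 500 noise realizations, noise strength 0.2
imfs = ceemdan(prices)                      # rows are the extracted IMFs, high to low frequency

print(f"Number of extracted components: {imfs.shape[0]}")
```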

| SE
SE45 is a parameter that measures the complexity of a time series more accurately. It improves on approximate entropy and offers advantages such as low computational cost and strong anti-interference ability. Error accumulation occurs when reconstructing IMFs. In previous studies, IMFs were reconstructed on the basis of the similarity of their complexity to reduce error accumulation. However, recombining sequences with high complexity can result in feature mixing, which is not conducive to model learning. The current study proposes a new combination method based on SE that separately retains sequences with high complexity and combines sequences with low complexity to reduce error accumulation and model runtime.
The SE formula is calculated as follows.
Step (1): The sequence is reconstructed. Given the phase-space reconstruction dimension $m$, a set of $m$-dimensional vectors is formed from the time series:
$$X_i^m = \{x(i), x(i+1), \ldots, x(i+m-1)\}, \quad i = 1, 2, \ldots, N-m+1.$$
Step (2): Similarity is measured. Let $d(X_i^m, X_j^m) = \max_{0 \le k \le m-1}\left|x(i+k) - x(j+k)\right|$ denote the distance between $X_i^m$ and $X_j^m$. When this distance is greater than the similarity tolerance $r$, the similarity between the two vectors is 0, and 1 otherwise. The similarity function is
$$S_{ij}(r) = \begin{cases}1, & d(X_i^m, X_j^m) \le r,\\ 0, & d(X_i^m, X_j^m) > r.\end{cases}$$
Step (3): The probability $\varphi_i^m(r)$ that the $i$th vector $X_i^m$ is similar to the other vectors is counted, and the values are averaged over all vectors:
$$\varphi_i^m(r) = \frac{1}{N-m-1}\sum_{j\ne i}S_{ij}(r), \qquad B^m(r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\varphi_i^m(r).$$
Repeating Steps (1)-(3) with dimension $m+1$ yields $B^{m+1}(r)$, and SE is then defined as
$$\mathrm{SE}(m, r) = -\ln\frac{B^{m+1}(r)}{B^m(r)}.$$
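A minimal NumPy sketch of this calculation is given below; it implements the standard sample-entropy definition above, with the embedding dimension m and tolerance factor chosen as common defaults rather than values taken from the original study:

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1-D series: -ln(B^{m+1}(r) / B^m(r))."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)                  # tolerance as a fraction of the series std

    def match_count(dim):
        # Build the dim-dimensional reconstruction X_i described in Step (1).
        n = len(x) - dim + 1
        vectors = np.array([x[i:i + dim] for i in range(n)])
        count = 0
        for i in range(n):
            # Chebyshev distance between X_i and the later vectors (Step (2)).
            dist = np.max(np.abs(vectors[i + 1:] - vectors[i]), axis=1)
            count += np.sum(dist <= r)
        return count

    b_m, b_m1 = match_count(m), match_count(m + 1)
    return -np.log(b_m1 / b_m) if b_m > 0 and b_m1 > 0 else np.inf

# Example: white noise has higher SE (more irregular) than a smoother random walk.
rng = np.random.default_rng(0)
print(sample_entropy(np.cumsum(rng.normal(size=500))))   # random walk, lower SE
print(sample_entropy(rng.normal(size=500)))              # white noise, higher SE
```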

| ALS-PSO
Deep learning has become a mainstream approach for predicting energy prices and demand, with previous studies demonstrating its superior accuracy. To improve the effectiveness and accuracy of a prediction model, PSO46 has been widely used to optimize the combination of hyperparameters. An optimized model frequently exhibits better robustness and convergence speed. However, traditional PSO is prone to being trapped in local optimal solutions. ALS-PSO47 partitions the particle swarm adaptively into several sub-swarms, and within each sub-swarm, each particle is further classified as a normal or a local optimal particle. Two different learning strategies are designed to update the two types of particles and increase population diversity. Finally, the fitness values of all the local optimal particles are compared to obtain the global optimal value, overcoming the tendency of traditional PSO to be trapped in local optimal solutions (Figure 1). The steps of the ALS-PSO calculation are as follows.
Step (1): Adaptive population partitioning. First, the local density ρi and the distance δi of each particle are calculated, and particles with higher values of both are identified as the center particles of the subgroups. Then, each remaining particle is assigned to the closest subgroup with high density. This method differs from the traditional iterative approach and completes population partitioning in one step, improving the efficiency of partitioning. In addition, compared with the traditional truncation allocation method, this method introduces the concept of boundary regions to prevent low-density subgroups from being classified as noise groups. The boundary region is the set of particles assigned to a subgroup that lie within a distance dc of other subgroups. For each subgroup, δb is defined as the particle with the maximum density in its boundary region. If the density of a particle in the subgroup is greater than that of δb, then it is considered to belong to the subgroup; otherwise, it is treated as a separate subgroup.
where I is the set of all particles, dij is the Euclidean distance between particles i and j, and dc is the truncation distance between particles.
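The sketch below shows one way to compute these two quantities, assuming density-peaks-style definitions consistent with the description above (ρi as the number of neighbors within the truncation distance, δi as the distance to the nearest higher-density particle); the exact formulas in Equations (13) and (14) of ALS-PSO may differ:

```python
import numpy as np

def density_and_distance(positions, d_c):
    """rho_i: number of particles within the truncation distance d_c.
    delta_i: distance from particle i to the nearest particle of higher density."""
    diff = positions[:, None, :] - positions[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))           # pairwise Euclidean distances d_ij
    np.fill_diagonal(d, np.inf)

    rho = np.sum(d < d_c, axis=1)                    # local density of each particle

    delta = np.empty(len(positions))
    for i in range(len(positions)):
        higher = np.where(rho > rho[i])[0]
        # The highest-density particle takes the maximum distance by convention.
        delta[i] = d[i, higher].min() if len(higher) else d[i][np.isfinite(d[i])].max()
    return rho, delta

# Particles with both large rho and large delta become sub-swarm centers.
rng = np.random.default_rng(1)
swarm = rng.uniform(size=(30, 3))                    # 30 particles in a 3-D search space
rho, delta = density_and_distance(swarm, d_c=0.3)
```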
Step (2): In each sub-swarm, the particle with the best fitness is selected as the local best particle, while the remaining particles are considered normal particles. The two types of particles have different responsibilities, and thus two different learning strategies are designed for them. If particle i is the local best particle, the learning strategy of Step (3) is applied; otherwise, the learning strategy of Step (4) is used.
Step (3): If particle i is identified as the local best particle, then it is responsible for exploring other potentially better regions and guiding the normal particles in learning. Its learning strategy is given by Equation (15), where ω is the inertia weight, C is the number of subgroups, C1 and C2 are the acceleration coefficients, rand1 and rand2 are two random numbers uniformly distributed on the interval [0, 1], and cgBestc is the locally optimal particle of subgroup c.
As indicated in Equation (15), cgBestc in the social learning part is replaced with the average information of all the locally optimal particles. This change increases the diversity of the population and the speed of convergence. Locally optimal particles can learn more information to enhance their diversity and avoid local optima.
Step (4): If particle i is determined to be a common particle, then its primary responsibility is to exploit the area of its subgroup. Its learning strategy is given by Equation (16), where ω is the inertia weight, C1 and C2 are the acceleration coefficients, rand1 and rand2 are two random numbers uniformly distributed on the interval [0, 1], and cgBestc is the locally optimal particle of subgroup c.
In Equation (16), in contrast with traditional PSO, ALS-PSO replaces the globally optimal particle with the locally optimal particle in the learning strategy of ordinary particles to guide their updates, enabling each subpopulation to learn more and different search information and increasing the diversity of the population.

FIGURE 1 Optimization process of particle swarm optimization with an adaptive learning strategy.
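The two learning strategies can be sketched as follows. This is an illustrative reading of Equations (15) and (16) based on the verbal description above (the locally best particle learns from the average of all local best positions, whereas an ordinary particle learns from its own sub-swarm's local best); it is not the authors' exact formulation:

```python
import numpy as np

def update_local_best(v, x, p_best, cg_best_all, w=0.8, c1=1.5, c2=1.5):
    """Velocity update for a locally best particle (cf. Equation (15)):
    the social term uses the average of all sub-swarms' local best positions."""
    r1, r2 = np.random.rand(2)
    mean_cg = cg_best_all.mean(axis=0)       # average information of the C local bests
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (mean_cg - x)
    return v_new, x + v_new

def update_ordinary(v, x, p_best, cg_best_c, w=0.8, c1=1.5, c2=1.5):
    """Velocity update for an ordinary particle (cf. Equation (16)):
    the global best of standard PSO is replaced by its sub-swarm's local best."""
    r1, r2 = np.random.rand(2)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (cg_best_c - x)
    return v_new, x + v_new
```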

| ALS-PSO-LSTM
LSTM networks are an improvement over RNNs. RNNs suffer from the vanishing-gradient problem and cannot capture long-term dependencies in sequences. LSTM introduces gate mechanisms that alleviate the vanishing-gradient problem and are better suited to time series problems.48 The cell structure of LSTM is shown in Figure 2.
Step (1): The forgetting gate ft determines which information to discard from the cell state. First, ht−1 and xt are combined and used as the input to the forgetting gate. The forgetting gate outputs a number in the interval [0, 1], where 1 represents "complete retention" and 0 represents "complete forgetting." This output is then multiplied element-wise with Ct−1 to forget long-term dependent information.
Step (2): The update is achieved by using the input gate it to decide which new information should be stored in the cell state. The activation function tanh generates a new vector of candidate values, which is filtered by the input gate and then added to the cell state.
Step (3): The final output is determined by the output gate ot. First, the updated cell state is transformed into the interval [−1, 1] by using the tanh function. Then, the output gate determines which parts of the cell state to output by producing a number in the interval [0, 1]. The final output ht is obtained by multiplying the transformed cell state with the output of the output gate.
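For reference, the three steps correspond to the standard LSTM cell equations (written here in their usual form, with W and b denoting the weight matrices and bias vectors of each gate and ⊙ denoting element-wise multiplication):

$$
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right),\\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right),\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,\\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t).
\end{aligned}
$$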
This study employs ALS-PSO to search for the optimal number of neurons, time window size, and learning rate of LSTM. It thereby establishes an ALS-PSO-LSTM prediction model, which enables the LSTM model to quickly and accurately determine the optimal parameters on the basis of the characteristics of crude oil price data, avoiding local optima and improving the accuracy of parameter optimization.
The steps for optimizing LSTM by using ALS-PSO are as follows.
Step (1): The parameters of the particle swarm and the structure of the LSTM network are initialized. The particle swarm parameters include the population size, inertia weight, acceleration factors, and number of sub-swarms. The initialization of the LSTM network structure refers to the batch size, number of hidden layers, and learning rate.
Step (2): The local density and distance of each particle are calculated using Equations (13) and (14), respectively. Adaptive population partitioning is performed to divide the particle swarm into sub-swarms.
Step (3): The global and local best particle positions are determined, and the positions of the ordinary and local best particles are updated.
Step (4): The position information of the local best and ordinary particles is updated using Equations (15) and (16), respectively.
Step (5): If the iteration termination condition is met, then return the optimal hyperparameter values and end the iteration.Otherwise, return to Step (2) to continue the iteration.
Step (6): The obtained optimal hyperparameter values are assigned to LSTM. The model is trained using the training and validation sets. Predictions are then made on the test set, and the prediction results are obtained.
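As an illustration of Steps (1)-(6), the fitness that each particle evaluates can be sketched as follows. The sketch assumes a Keras LSTM; the searched hyperparameters (number of hidden units, batch size, and learning rate) follow the description above, while the single-layer depth and epoch budget are illustrative assumptions rather than the study's settings:

```python
# Sketch of the fitness evaluation used inside ALS-PSO: each particle encodes a
# candidate hyperparameter set, trains an LSTM on the training split, and returns
# the validation RMSE as its fitness value.
import numpy as np
import tensorflow as tf

def fitness(particle, x_train, y_train, x_val, y_val):
    units, batch_size, learning_rate = int(particle[0]), int(particle[1]), float(particle[2])

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(units, input_shape=x_train.shape[1:]),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse")
    model.fit(x_train, y_train, epochs=30, batch_size=batch_size, verbose=0)

    pred = model.predict(x_val, verbose=0).ravel()
    return float(np.sqrt(np.mean((y_val - pred) ** 2)))   # validation RMSE as fitness
```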

| FRAMEWORK FLOW OF THE HYBRID MODEL
The flowchart of the framework of the hybrid prediction model proposed in this study is presented in Figure 3.
In accordance with Figure 3, the major steps consist of the following.
Step (1): Decomposition and reconstruction of crude oil prices. CEEMDAN is used to decompose crude oil prices into several IMFs, each representing different amplitude characteristics of the signal. Consequently, the nonlinearity and nonstationarity of crude oil prices are reduced. Then, SE is used to calculate the complexity of each IMF. IMFs with high complexity are retained, while those with low complexity are combined by adding IMFs with similar SE values (a sketch of this rule is given after these steps).
Step (2): Optimization of the model structure. An LSTM neural network model based on ALS-PSO is constructed. ALS-PSO optimizes the hyperparameters of LSTM, including the batch size, number of hidden layers, and learning rate, which considerably affect the performance of LSTM.49
Step (3): Model prediction and evaluation of accuracy. The optimized LSTM model is used to predict each subsequence, and the following evaluation criteria are used to assess the prediction accuracy of the model: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²). The final prediction results are obtained by summing all the predictions.
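As an illustration of the reconstruction rule in Step (1), the SE-based merging can be written compactly as follows (a sketch; `imfs` denotes the IMF matrix produced by CEEMDAN, `sample_entropy` is the function sketched in the SE section, and the thresholds follow the rule described later in the empirical analysis):

```python
import numpy as np

def reconstruct_by_se(imfs):
    """Keep high-complexity IMFs (SE > 1) separate; merge the mid-complexity
    (0.1 < SE <= 1) and low-complexity (SE <= 0.1) groups into one series each."""
    se = np.array([sample_entropy(imf) for imf in imfs])

    kept = [imf for imf, s in zip(imfs, se) if s > 1]      # retained individually
    mid = imfs[(se > 0.1) & (se <= 1)].sum(axis=0)         # merged mid-complexity group
    low = imfs[se <= 0.1].sum(axis=0)                      # merged low-complexity group
    return kept + [mid, low], se
```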

| EXPERIMENTAL PREPARATION 4.1 | Data description
In this study, 4591 daily closing prices of WTI and Brent futures (data source: Yahoo Finance, https://finance.yahoo.com/) from January 2, 2014, to October 19, 2022, were utilized as sample data for the experiment, as depicted in Figure 4. The training data set comprised the first 70% of the data, the subsequent 10% was utilized as the validation set to optimize the parameters by using ALS-PSO,50,51 and the remaining 20% was reserved for the test set.52

FIGURE 3 Flowchart of this study's proposed model.
FIGURE 4 Daily crude oil price data for both markets.
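The chronological 70%/10%/20% split described above can be expressed as follows (a minimal sketch; the file name is a placeholder):

```python
import numpy as np

prices = np.loadtxt("wti_close.csv")        # hypothetical file of 4591 daily closes

n = len(prices)
n_train = int(0.7 * n)                       # first 70% for training
n_val = int(0.1 * n)                         # next 10% for ALS-PSO validation

train = prices[:n_train]
val = prices[n_train:n_train + n_val]
test = prices[n_train + n_val:]              # remaining 20% held out for testing
```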
To understand the characteristics of the crude oil price data, statistical descriptions were computed for the two data sets, as indicated in Table 1. The mean and median describe the central tendency of the time series data, while the range and variance describe its degree of dispersion. Kurtosis and skewness describe the shape of the distribution, while unit root tests are used to assess the stationarity of the time series data.
As shown in Table 1, the mean values of the two crude oil markets are 61.7363 and 66.4087, and the ranges are 161.33 and 108.65, respectively, indicating that crude oil prices have a large degree of dispersion. The skewness values are 0.7178 and 0.6815, while the kurtosis values are 0.1613 and −0.2003, respectively. The augmented Dickey-Fuller (ADF) test yielded p values > 0.05, indicating the presence of unit roots and nonstationarity in the data. From these statistical descriptions, a conclusion can be drawn that oil prices fluctuate violently and exhibit nonlinearity and nonstationarity.
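These descriptive statistics and the ADF test can be reproduced along the following lines (a sketch assuming pandas, SciPy, and statsmodels; the file and column names are placeholders):

```python
import pandas as pd
from scipy import stats
from statsmodels.tsa.stattools import adfuller

wti = pd.read_csv("wti_close.csv")["Close"]   # hypothetical column of daily closes

summary = {
    "mean": wti.mean(),
    "median": wti.median(),
    "range": wti.max() - wti.min(),
    "variance": wti.var(),
    "skewness": stats.skew(wti),
    "kurtosis": stats.kurtosis(wti),          # Fisher (excess) kurtosis
}
adf_stat, p_value, *_ = adfuller(wti)          # null hypothesis: a unit root is present
print(summary, f"ADF p-value = {p_value:.4f}")  # p > 0.05 indicates nonstationarity
```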

| Evaluation criteria
To measure the performance of the prediction models, four evaluation indicators, namely, MAE, RMSE, MAPE, and R², are selected in this study to verify the prediction accuracy of the proposed model. They are calculated as follows:
$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|Y(t) - \hat{Y}(t)\right|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(Y(t) - \hat{Y}(t)\right)^2},$$
$$\mathrm{MAPE} = \frac{100\%}{T}\sum_{t=1}^{T}\left|\frac{Y(t) - \hat{Y}(t)}{Y(t)}\right|,$$
$$R^2 = 1 - \frac{\sum_{t=1}^{T}\left(Y(t) - \hat{Y}(t)\right)^2}{\sum_{t=1}^{T}\left(Y(t) - \bar{Y}\right)^2},$$
where $Y(t)$ is the true value at time $t$, $\hat{Y}(t)$ is the predicted value at time $t$, $\bar{Y}$ is the mean of the true values, and $T$ is the sample size. MAE, RMSE, and MAPE reveal the difference between the true and predicted values, and R² reflects the proportion of the total variation in the dependent variable that can be explained by the independent variable through the regression relationship.
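Equivalently, the four criteria can be computed directly, as in the following NumPy sketch (y_true and y_pred are placeholder names for the actual and predicted series):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAE, RMSE, MAPE (%), and R² for a pair of series."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, mape, r2
```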

| Parameter description
In this study, CEEMDAN was used to decompose the time series data from January 2, 2014 to October 19, 2022, and ALS-PSO was employed to optimize the LSTM model. The model parameters were selected as indicated in Table 2. The CEEMDAN parameters were set as follows: the signal-to-noise ratio of the added white noise was 0.2, the number of noise additions was 500, and the maximum number of iterations was 5000. The ALS-PSO parameters were set as follows: the number of particles was 30, the number of sub-swarms was 3, the inertia weight was 0.8, the acceleration factor C1 was 1.5, and the acceleration factor C2 was 1.5. This setting of the inertia weight and acceleration factors is a commonly used configuration in the classic PSO algorithm and is widely adopted.53,54 The hyperparameters55,56 of LSTM were optimized by ALS-PSO. The hyperparameters chosen for this study included the batch size, number of hidden layers, and learning rate, with their optimization ranges shown in Table 3.

| Benchmarking model
To verify the superior predictive performance of the proposed hybrid model, this study selected ARIMA, SVR, BPNN, RNN, and LSTM as single prediction models to validate the superiority of LSTM among single models. PSO and ALS-PSO were introduced to optimize the hyperparameter combination of LSTM, proving that the optimized LSTM has higher accuracy than before optimization.

| Decomposition and reconstruction
This section delves into the utilization of EMD and EEMD for the decomposition of crude oil price data, particularly WTI. As illustrated in Figure 5, the EMD results exhibit mode mixing, which is evident from the red box in the figure. This phenomenon arises due to rapid fluctuations in the local extrema of the price sequence over short time intervals during the EMD process. Consequently, the IMF components lack physical meaning and fail to capture the essential features of the time series. Although EEMD resolves the issue of mode mixing by introducing significant amounts of Gaussian white noise, the reconstructed sequence may not match the original sequence. Figures 6 and 7 present the outcomes of the CEEMDAN decomposition of the WTI and Brent time series data, respectively. CEEMDAN decomposes crude oil prices into 14 components. Some of these subsequences are smoother and more regular than the original sequence, representing different features of the time series, such as short-term fluctuations, long-term trends, and seasonal trends. Thus, the CEEMDAN decomposition approach is more conducive to model learning. However, the decomposition of WTI and Brent into 14 IMFs, as shown in Figures 6 and 7, significantly increases computation time and may lead to error accumulation when the prediction results of the individual IMFs are recombined. These IMFs represent vibrational patterns in different frequency ranges, indicating the different information carried by different fluctuations in oil prices. IMFs with higher frequencies correspond to shorter-term fluctuations. For the first few IMFs, the SE values are larger than 1, implying that they have higher complexity and irregularity. These IMFs, which we interpret as the highly volatile component, represent short-term high-frequency fluctuations in oil prices. Such high-frequency fluctuations may be influenced by a variety of factors, such as the short-term impact of changes in the stock and currency markets on oil prices; the interaction between these markets may lead to high-frequency fluctuations in oil prices. In addition, speculators may seek profits through short-term trading activities, which may cause large fluctuations in oil prices within a relatively short period of time. Lower-frequency IMFs correspond to longer-term volatility, which is usually caused by major events. These major events may include geopolitical tensions, macroeconomic factors, and shifts in the global supply and demand balance, which can cause oil prices to fluctuate over a longer time horizon. Therefore, SE is used to quantify the complexity of each component. Table 4 and Figure 8 present the calculation results of SE. Components with a complexity greater than 1 are retained, while low-complexity and similar components are merged: components with 0.1 < SE < 1 and components with SE < 0.1 are merged separately. This approach preserves features while reducing error accumulation. Table 5 and Figures 9 and 10 present the final recombination results. After decomposing the price series into different components of volatility, the regularity of the different frequencies hidden in the price series can be explored, which can improve the performance of subsequent forecasting models.

| Comparison of a single model

To further validate the superiority of the proposed ALS-PSO optimized LSTM model, we employ four individual prediction models and two optimized LSTM models in this section to forecast crude oil futures prices. The prediction results of each model on the test data set are presented in Figures 11 and 12, in which the true data are displayed in gray. The SVM model clearly yields significant errors compared with the ground truth in the WTI and Brent futures markets. In particular, the MAE of SVM is 8.2719 and 3.5169, its RMSE is 9.0202 and 4.2589, and its MAPE is 15.8890% and 6.3848% in the WTI and Brent markets, respectively. These values are considerably higher than those of the other models. The prediction performance of BPNN, RNN, and LSTM is remarkably more accurate than that of SVM. Although RNN performs slightly better than LSTM in terms of R², with a higher R² of 0.129% in the WTI market and 0.0041% in the Brent market, LSTM demonstrates overall better performance in terms of MAE, RMSE, and MAPE. Table 6 provides the evaluation results of the various predictive models on the two data sets, highlighting the superior predictive performance of the ALS-PSO-LSTM model. The optimized LSTM model achieves higher accuracy than the traditional LSTM model in both markets, with the ALS-PSO-LSTM model outperforming the PSO-LSTM model. For example, in the Brent market, the PSO-LSTM model's MAE, RMSE, MAPE, and R² are 1.262, 1.9563, 1.93%, and 99.22%, respectively. Compared with the LSTM model, the PSO-LSTM model reduces MAE, RMSE, and MAPE by 0.0144, 0.0464, and 0.03%, respectively, while increasing R² by 0.01%. Furthermore, the ALS-PSO-LSTM model exhibits smaller errors and outperforms the other competing models in predicting crude oil prices with significant fluctuations. These results suggest that ALS-PSO can optimize the hyperparameters of the LSTM model more accurately by effectively combining the LSTM model and crude oil price data features, improving the accuracy and stability of crude oil price predictions.
On the basis of the aforementioned predictive analysis results, we can conclude that the ALS-PSO-LSTM model exhibits superior predictive performance among single predictive models. In the absence of decomposition and reconstruction, ALS-PSO-LSTM can more accurately extract crude oil price features and generate more precise results when crude oil prices exhibit significant fluctuations. To further verify the optimization effectiveness of ALS-PSO, LSTMs optimized by other commonly used optimization algorithms (GWO-LSTM and GA-LSTM) were also included in this experiment for comparison. As shown in Table 6, ALS-PSO was the best in almost all metrics. However, ALS-PSO-LSTM does not significantly improve R² when processing crude oil prices and even produces a slightly larger RMSE of 1.9715 for the Brent market data compared with PSO-LSTM. Therefore, we believe that the accuracy of a single predictive model is relatively limited.

| Analysis of the hybrid model
To verify the stability and accuracy of our proposed hybrid model, we compare it with five representative hybrid prediction models, namely, EMD-LSTM, EEMD-LSTM, CEEMDAN-LSTM, CEEMDAN-PSO-LSTM, and CEEMDAN-ALS-PSO-LSTM. The comparative evaluation results of the model predictions are presented in Figures 13 and 14 and Table 7. From Table 7, the proposed hybrid model exhibits significant improvements in all the evaluation metrics. Compared with the single prediction models discussed in the previous section, the hybrid prediction models incorporate signal decomposition algorithms and therefore achieve superior predictive performance.
Comparing the LSTM prediction results of the time series decomposed by CEEMDAN, EMD, and EEMD, CEEMDAN-SE-ALS-PSO-LSTM performs the best on both data sets. Using the WTI market as an example, the MAE, RMSE, and MAPE values predicted by CEEMDAN are 0.9504, 1.6427, and 1.86%, respectively, with a significantly higher R² of 99.56%. However, the introduction of PSO does not significantly improve prediction accuracy and may even lead to an increase in MAPE and a decrease in R², because the pure decomposition model cannot completely eliminate the nonstationarity of the modal components while also failing to extract their features. The introduction of SE can remedy this drawback. Therefore, CEEMDAN-SE-ALS-PSO-LSTM is significantly superior to the other hybrid and single models in evaluation metrics such as MAE, RMSE, MAPE, and R².
In conclusion, we believe that the superior predictive performance of CEEMDAN-SE-ALS-PSO-LSTM is mostly due to CEEMDAN's ability to overcome problems such as modal mixing in EMD and the nonreconstruction of the original sequences in EEMD. The hybrid prediction model based on CEEMDAN effectively decomposes crude oil price data into a series of modal components with different complexities and reduces prediction difficulty by calculating SE. Moreover, it preserves high-complexity modal components and combines low-complexity ones, further reducing error accumulation during reconstruction and model computation time. To more intuitively illustrate the superior accuracy and stability of CEEMDAN-SE-ALS-PSO-LSTM compared with the other hybrid and single models, we select the most recent 2 months of data (August 19 to October 19, 2022) and present the prediction errors of the proposed hybrid model and the other competitive models on the two real data sets in Figure 15. As shown in the figure, the prediction error of the single prediction models fluctuates significantly. After introducing the signal decomposition algorithm, the fluctuations are reduced considerably, indicating that the predictive performance of a single model is inferior to that of a prediction model that incorporates signal decomposition. Furthermore, after introducing the signal decomposition algorithm, the proposed hybrid model exhibits significantly smaller fluctuations and is closer to the real curve. In addition, to verify the effectiveness of the SE-based reconstruction method proposed in this study for reducing the error accumulation of modal components, we present a comparison of the prediction results of CEEMDAN-ALS-PSO-LSTM and CEEMDAN-SE-ALS-PSO-LSTM in Figure 16. The figure clearly shows that the introduction of the SE-based reconstruction method reduces prediction error fluctuations and brings the predictions closer to the real values. This finding further confirms the superiority and effectiveness of the SE-based reconstruction method proposed in this study.

| CONCLUSIONS
Given the nonlinear and nonstationary nature of crude oil futures price data, a single prediction model or a purely optimized prediction model cannot fully capture the hidden patterns in prices. In the current study, we propose a novel hybrid prediction model based on CEEMDAN-SE and ALS-PSO for crude oil price prediction. The core of the model consists of CEEMDAN-SE, ALS-PSO, and the LSTM model. The method includes three steps: (1) data decomposition, using a signal decomposition method to decompose the crude oil price time series into modal components of different frequencies; (2) recombination, recombining these components on the basis of different complexities, that is, using CEEMDAN-SE to decompose and recombine the original sequence; and (3) model prediction, predicting the recombined sequence by using the hyperparameters of the LSTM structure optimized by ALS-PSO. We selected daily price data for WTI and Brent crude oil futures from January 2, 2014 to October 19, 2022 for the prediction. To verify its effectiveness, we compared 10 models: BPNN, RNN, SVM, LSTM, EMD-LSTM, EEMD-LSTM, CEEMDAN-LSTM, CEEMDAN-PSO-LSTM, CEEMDAN-ALS-PSO-LSTM, and CEEMDAN-SE-ALS-PSO-LSTM. PSO and ALS-PSO were introduced to optimize the LSTM hyperparameters, verifying that ALS-PSO overcomes the tendency of PSO of getting stuck in local optima. EMD, EEMD, CEEMDAN, and CEEMDAN-SE were introduced to decompose and reconstruct the time series, verifying that CEEMDAN-SE can reduce error accumulation. In summary, 12 benchmark models were selected in the empirical section with logical reasoning to compare and analyze the predictive ability of the proposed hybrid model.
Empirical analysis results show that the novel hybrid model proposed in this study exhibits excellent predictive performance and can effectively improve prediction accuracy. The major conclusions drawn in this work are as follows.
1. The use of ALS-PSO-optimized LSTM models improves crude oil price prediction accuracy. The experimental results demonstrate that, compared with traditional LSTM models and LSTM models optimized with traditional PSO, the ALS-PSO-optimized LSTM model achieves higher prediction accuracy. ALS-PSO avoids the manual determination of hyperparameters, accurately determining the optimal hyperparameters on the basis of the characteristics of crude oil price data. The model effectively combines the LSTM model and crude oil price sequence data, enhancing the accuracy and stability of crude oil price prediction.

2. The SE-based partition and recombination method proposed in this study can preserve the features of each decomposed component while reducing error accumulation. Previous research and the experimental results of the current study demonstrate that the CEEMDAN technique exhibits higher decomposition accuracy than EMD and EEMD. Furthermore, this study proves that the model that uses CEEMDAN-SE and ALS-PSO-optimized parameters achieves more accurate prediction performance than the models that use CEEMDAN, CEEMDAN-PSO, and CEEMDAN-ALS-PSO.

3. The inclusion of decomposition models can improve predictive performance. In this study, models decomposed by EMD, EEMD, and CEEMDAN were all significantly superior to their corresponding single prediction models, indicating that the combination of prediction models with decomposition techniques can significantly improve the accuracy of the original sequence prediction.
In summary, the proposed novel hybrid model demonstrates superior accuracy and robustness in predicting crude oil prices. It is applicable to other high-complexity, nonstationary, nonlinear, and irregular time series data. In addition, the model has potential applications to other energy prices, providing a valuable and versatile tool for energy market analysis and forecasting. With further development and refinement, the model can be extended to other fields, such as finance, environmental monitoring, and healthcare, making it an exciting area for future research and development.
However, as a data-driven model, it possesses certain limitations. First, it heavily relies on the quality and availability of historical data. In the presence of missing or noisy historical data, the model's performance and robustness may be affected. Second, the predictive capabilities of the model may be influenced by external factors such as macroeconomic indicators, political events, and geopolitical situations, which have not been accounted for in this study. Additionally, the use of SE for reconstruction introduces a certain level of subjectivity. Therefore, future research could focus on addressing these limitations and continuously improving the model's design and methodology, with the aim of developing a more reliable, accurate, and practical energy market forecasting model. By incorporating a more comprehensive consideration of the various influencing factors and adopting more scientifically grounded methods, the ability to cope with market uncertainty and fluctuations can be enhanced, providing decision makers with more valuable predictive insights.

TABLE 2 Model parameter settings.

FIGURE 5 Empirical mode decomposition (A) and ensemble empirical mode decomposition (B) results.

FIGURE 6 WTI crude oil market price decomposition results.

FIGURE 7 Brent crude oil market price decomposition results.

FIGURE 8 Sample entropy (SE) calculation results. Abbreviation: IMF, intrinsic mode function.

FIGURE 9 WTI market intrinsic mode function restructuring results.

FIGURE 10 Brent market intrinsic mode function restructuring results.

FIGURE 11 Single model forecast results and errors for the WTI crude oil market. Abbreviations: ALS-PSO, particle swarm optimization with an adaptive learning strategy; BPNN, back-propagation neural network; LSTM, long short-term memory; PSO, particle swarm optimization; RNN, recurrent neural network; SVM, support vector machine.

FIGURE 12 Single model forecast results and errors for the Brent crude oil market. Abbreviations: ALS-PSO, particle swarm optimization with an adaptive learning strategy; BPNN, back-propagation neural network; LSTM, long short-term memory; PSO, particle swarm optimization; RNN, recurrent neural network; SVM, support vector machine.

TABLE 6 Calculated values of evaluation indicators for a single prediction model and a single optimized prediction model. Abbreviations: ALS-PSO, particle swarm optimization with an adaptive learning strategy; BPNN, back-propagation neural network; LSTM, long short-term memory; MAE, mean absolute error; MAPE, mean absolute percentage error; PSO, particle swarm optimization; RMSE, root mean square error; RNN, recurrent neural network; SVM, support vector machine.

FIGURE 13 Hybrid model forecast results and errors for the WTI crude oil market.

FIGURE 14 Hybrid model forecast results and errors for the Brent crude oil market.

FIGURE 15 Error between all models and true values.

FIGURE 16 Individual comparison of the two models.
TABLE 1 Statistical description of price data.
TABLE 4 Sample entropy (SE) calculation results.
TABLE 7 Calculated values of evaluation indicators for the hybrid prediction model and the hybrid optimized prediction model. Abbreviations: ALS-PSO, particle swarm optimization with an adaptive learning strategy; LSTM, long short-term memory; MAE, mean absolute error; MAPE, mean absolute percentage error; PSO, particle swarm optimization; RMSE, root mean square error; SE, sample entropy.