Multiscale ultra‐short‐term wind power prediction model based on GD‐IFEM‐PSO and VMD‐BP

Wind power prediction forecasts the future output of wind farms in advance, which helps to improve production, increase capacity, and reduce costs. However, wind power generation data are highly unstable, which makes high‐precision prediction difficult. Signal decomposition methods and optimization algorithms allow for data smoothing and optimization of parameter settings, but parameter dependency and a slow optimization search process prevent existing research from being applied in practical settings. To address these issues, this article proposes a multiscale ultra‐short‐term wind power prediction model based on particle swarm optimization with greedy dynamic weights and an integrated fitness evaluation method (GD‐IFEM‐PSO) and on variational mode decomposition with a back propagation network (VMD‐BP). First, the velocity update formula of the particle swarm optimization algorithm is improved by the proposed greedy dynamic (GD) weight, and a fitness function built from multiple evaluation metrics is set up by the proposed integrated fitness evaluation method (IFEM), so that the improved particle swarm optimization algorithm searches efficiently while evaluating candidates comprehensively. Second, the variational mode decomposition (VMD) algorithm is used to decompose the historical wind power data and smooth them, and the improved particle swarm optimization algorithm is used to optimize the K and α values of VMD to improve the overall performance of the decomposition. Then, to achieve data reduction and fast model training, the components are divided into trend components, low‐frequency vibration components, high‐frequency vibration components, and random noise components according to their central frequency, so that the model can better grasp the data trend while reducing the number of components and achieving high prediction accuracy.
Finally, the different components are predicted by the BP neural network and the predicted values of the different categories of components are reconstructed into the final wind power prediction values. To demonstrate the rationality and progressiveness of the proposed model, several models are compared on several data sets, and the results show that the proposed model has faster prediction speed and higher prediction performance, and is more suitable for application in real‐world environments.


KEYWORDS
BP neural network, parameter optimization, PSO algorithm, VMD, wind power prediction

| INTRODUCTION
With the gradual depletion of fossil energy sources, wind power has gradually become one of the main modes of global power generation, bringing a new level of security to global electricity consumption. According to statistics, 94 GW of wind power capacity was installed worldwide in 2021-2022, only 1.8% less than in 2020-2021, bringing the global installed wind power capacity to 837 GW. 1 Of this, China added 52 GW, accounting for 55% of the growth in installed wind power capacity in 2021-2022. China has become the world's largest wind power market due to its vast size and emphasis on new energy development. However, the excessive scale of wind power generation has led to serious wind curtailment, wasting large amounts of wind resources and increasing wind farm losses. 2 Wind power prediction provides data support for wind farm production and power dispatch by predicting future wind power output. High-precision, high-efficiency prediction quickly gives the relevant enterprises and departments accurate numerical forecasts, so that power generation strategies can be adjusted to reduce the curtailment rate, 3 thereby reducing enterprise losses and ensuring grid security. However, wind power data are affected by the volatility and randomness of wind and are seriously nonstationary, which makes highly accurate prediction difficult. 4,5,7,8
The empirical mode decomposition (EMD) algorithm is one of the most classical signal decomposition algorithms; however, it suffers from mode mixing and endpoint effects. Wang Tao et al. improved the EMD decomposition process by expanding the range of extreme points, effectively reducing the degree of overshoot or undershoot and improving the subsequent prediction accuracy. 9 This improvement alleviates the endpoint effect, but the mode mixing problem of EMD remains, so the components cannot clearly represent the trend of the data, which affects the prediction accuracy of subsequent models. Wavelet decomposition is another classical signal decomposition algorithm. It achieves a time-frequency decomposition of the original signal by means of a freely chosen mother wavelet, thus avoiding mode mixing and endpoint effects. Li Lingling et al. used wavelet decomposition to decompose the wind power data and then used dynamic sinusoidal adaptive weights to improve the position update equation of the atom search algorithm. They also introduced crossover and mutation operators into the atom search algorithm, which was used to optimize the neural network parameters for predicting wind power generation. 10 Sahra Khazaei et al. decomposed the wind power data by wavelet decomposition to smooth them and used a multilayer perceptron to predict the components, 11 but this approach does not avoid the slow training and inefficient models caused by too many components. Although the above studies avoid the mode mixing and endpoint effects of EMD, wavelet decomposition relies heavily on the choice of the mother wavelet, 12 and its decomposition performance is therefore subject to considerable randomness and uncertainty.
The variational mode decomposition (VMD) algorithm is a relatively advanced signal decomposition algorithm that adaptively decomposes the original signal into a user-defined number of components. It matches the optimal center frequency and finite bandwidth of each component by iteratively solving a variational model, and it is more robust to noise than other decomposition algorithms. However, its disadvantage is that the number of components must be set in advance, which may result in an incomplete decomposition or in too many components. To address this problem, Min Yu et al. optimized the K and α values of VMD with the whale optimization algorithm and predicted the components generated by VMD using a gated recurrent unit and an attention mechanism 13 ; Krishna Rayi Vijaya et al. proposed a meta-heuristic population-based sine cosine integral water cycle algorithm to optimize the K and α values of VMD 14 ; Zhang Yagang et al. selected the mode number of VMD by energy difference 15 ; and Zhang Lin et al. used the gray wolf algorithm to optimize the K and α values of VMD, using the envelope entropy of the components as the fitness function.
16 Optimizing parameters with optimization algorithms is a common approach in current research, and it is a good choice in the wind energy field in particular. For example, Ayman Al-Quraan et al. used the WOA optimization algorithm to optimize the parameters of three probability distribution models, Weibull, Gamma, and Rayleigh, to reduce the error between the estimated and measured wind speeds in a wind energy prediction study for several wind farms in Jordan; Hiba H. Darwish et al. proposed a new probability distribution model and used various optimization algorithms (GA, bacterial foraging optimization algorithm, simulated annealing, etc.) to optimize the parameters of the Weibull distribution and the new model. Although the above optimization algorithms are able to find optimal parameters, the optimization process is slow, and their fitness functions are too narrow to evaluate the performance of the decomposition algorithms comprehensively. This affects the efficiency of the whole prediction process and prevents these methods from yielding a model that is both accurate and efficient.
In summary, although EMD, wavelet decomposition, and VMD decompose the wind power data in the time and frequency domains and make the wind power data more stable, these methods have certain shortcomings: (1) the number of modal components cannot be adjusted, which leads to over-decomposition or under-decomposition 17 ; (2) although the EMD algorithm has been improved, serious mode mixing remains, 18 and subsequent improved algorithms such as ensemble empirical mode decomposition (EEMD) and complementary ensemble empirical mode decomposition (CEEMD) also suffer from incomplete decomposition or too many components 19 ; (3) although VMD allows the number of decomposition levels to be customized and generally outperforms the other two decomposition algorithms, the appropriate number of levels is uncertain, and a value that is either too low or too high degrades the decomposition performance of VMD 20 ; (4) although optimization algorithms can be used to optimize the parameters of VMD, existing optimization algorithms search too slowly and rely on a single fitness function. When used to evaluate the decomposition algorithm, most of them can only optimize the number of components with respect to specific indicators such as envelope entropy, 21 envelope sample entropy, 22 and envelope sample entropy, 23 and cannot find the parameters that make the decomposition both accurate and fast.
Therefore, to achieve high-accuracy and high-efficiency wind power prediction, and considering both the advantages of VMD and the ability of optimization algorithms to tune its parameters, this article proposes a multiscale ultra-short-term wind power prediction model based on particle swarm optimization with greedy dynamic weights and an integrated fitness evaluation method (GD-IFEM-PSO) and on VMD with a back propagation network (VMD-BP). To solve the problem that existing optimization algorithms have a single fitness function and cannot measure performance comprehensively, this article proposes the integrated fitness evaluation method (IFEM), which combines multiple evaluation methods to evaluate the selected parameters comprehensively and obtain the decomposition that is best in all aspects. To solve the problem of low efficiency and a slow search for the optimum, greedy dynamic weights (GD weights) are proposed by borrowing the idea of fastest progress from the greedy algorithm; they improve the velocity update formula of the particle swarm optimization (PSO) algorithm so that suitable K and α values are selected faster. To reduce the dimensionality of the input data of the neural network and improve the training speed of the model, the components are divided, from the perspective of the frequency domain, into trend components, low-frequency fluctuation components, high-frequency fluctuation components, and random components according to the central frequency of each component. To maximize the training efficiency of the model, the classical, simple BP neural network was selected as the prediction model after comparing the prediction performance of different models; it predicts the different types of components, which are then reconstructed to obtain an efficient and accurate wind power prediction. Finally, the model proposed in this article is validated on various data sets, which shows its higher efficiency and prediction accuracy compared with other advanced models.
The main contributions of this article are as follows:
1. GD weights are proposed to improve the velocity update formula of the PSO algorithm, which accelerates the search for the optimal K and α values.
2. The IFEM is proposed to construct the fitness function from multiple evaluation metrics, so that the particles can be evaluated comprehensively and the particle whose VMD decomposition performance is optimal across the board can be selected.
3. To capture the trend of the different frequency components better and thus achieve higher prediction accuracy, the components are divided into trend components, low-frequency vibration components, high-frequency vibration components, and random noise components according to their central frequencies and then reconstructed. This reduces the number of components and improves the speed and efficiency of prediction while ensuring high prediction accuracy.
4. By comparison with various advanced models, it is demonstrated that the multiscale ultra-short-term wind power prediction model based on GD-IFEM-PSO and VMD-BP proposed in this article achieves higher prediction accuracy and faster prediction speed, which makes it more suitable for practical production needs and helps to ensure and improve the safety and stability of wind farm production.

| VMD
Signal decomposition algorithms can smooth the unstable wind power data and provide smoother inputs for the prediction model. The existing EMD algorithm suffers from mode mixing and endpoint effects, and the decomposition performance of wavelet decomposition depends on the choice of the mother wavelet and is therefore highly uncertain. The VMD algorithm, in contrast, allows the number of decomposition layers to be adjusted and adaptively matches the optimal center frequency and finite bandwidth of each component, which makes it one of the more advanced decomposition algorithms at present. Therefore, this article uses the VMD algorithm to decompose the historical wind power data.
The VMD algorithm is a signal decomposition algorithm proposed by Dragomiretskiy and Zosso in 2014. 24 Although it is similar to the EMD family of decomposition algorithms, its core concepts are somewhat different. The EMD family treats the signal as a set of oscillatory components, which Huang defined as intrinsic mode components. VMD follows the concept of intrinsic mode components but modifies its definition: the concept of frequency-domain correlation is introduced into the definition, and the decomposition is tied to frequency-domain analysis. 25 The intrinsic mode components in VMD are defined as follows 26 :
$$u_k(t) = A_k(t)\cos\big(\phi_k(t)\big) \tag{1}$$
where $A_k(t)$ is the amplitude of the intrinsic mode component $u_k(t)$, and $\phi_k(t)$ is the phase function of the intrinsic mode component.
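The core of the VMD algorithm is to construct a variational model, which is solved iteratively to obtain the optimal center frequencies and bandwidths. 27 The variational model is shown in the following equation:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k} u_k = f \tag{2}$$
where $u_k$ is the modal component, $\omega_k$ is the center frequency, $f$ is the raw data, $j$ is the imaginary unit, $\delta$ is the Dirac distribution, and $*$ denotes convolution.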
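The constraint on the variational problem is given in formula (2), and the modal functions and center frequencies are obtained when the sum of the bandwidths around the center frequencies of all modal components is smallest. The decomposition process is shown in Figure 1, and the specific steps are as follows:
1. Initialize $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$, $\hat{\lambda}^1$, and $n$, where $\hat{\lambda}^1$ is the Lagrangian multiplier during the first cycle, $k$ is the current number of modes, and $n$ is the current cycle number.
2. Update $\hat{u}_k$ for all $\omega \geq 0$. The update formula of $\hat{u}_k$ is as follows:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\,(\omega - \omega_k^{n})^2}$$
where $\hat{f}$, $\hat{u}_i$, and $\hat{\lambda}$ are the Fourier transforms of the raw data, the modal components, and the Lagrangian multiplier, and $\alpha$ is the penalty parameter.
3. Update $\omega_k$. The update formula of $\omega_k$ is as follows:
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\,|\hat{u}_k^{n+1}(\omega)|^2\,d\omega}{\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2\,d\omega}$$
4. Determine whether $k$ is equal to $K$, where $K$ is the custom number of intrinsic mode components.
5. If $k < K$, repeat steps 2 to 4.
6. If $k = K$, update $\hat{\lambda}$ for all $\omega \geq 0$. The update formula of $\hat{\lambda}$ is as follows:
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right)$$
where $\tau$ is the update step of the Lagrangian multiplier.
7. Determine whether the condition for stopping the iteration is met. The condition is as follows:
$$\sum_{k} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon$$
where $\varepsilon$ is the convergence tolerance, usually set to 1e-7.
8. If the condition is not met, set $n = n + 1$ and repeat steps 2 to 7.
9. If the condition is met, output the $K$ intrinsic mode components.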
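The two inner updates of the steps above (the mode update and the center-frequency update) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard VMD update rules from the literature, and the function names `update_mode` and `update_center`, the normalized frequency grid, and the single-peak toy spectrum are all hypothetical choices made for the example.

```python
import numpy as np

def update_mode(f_hat, others_hat, lam_hat, omega, omega_k, alpha):
    """Wiener-filter style update of one mode spectrum (standard VMD form)."""
    return (f_hat - others_hat + lam_hat / 2.0) / (1.0 + 2.0 * alpha * (omega - omega_k) ** 2)

def update_center(u_hat, omega):
    """New center frequency: power-weighted mean of the non-negative frequencies."""
    power = np.abs(u_hat) ** 2
    return float(np.sum(omega * power) / np.sum(power))

# Toy one-sided spectrum (assumption): a single narrow peak at normalized frequency 0.1.
omega = np.linspace(0.0, 0.5, 501)
f_hat = np.exp(-((omega - 0.1) ** 2) / (2 * 0.005 ** 2))

omega_k = 0.2                      # deliberately poor initial center frequency
alpha = 2000.0                     # penalty parameter
zeros = np.zeros_like(f_hat)       # no other modes and no multiplier in this toy case
for _ in range(50):
    u_hat = update_mode(f_hat, zeros, zeros, omega, omega_k, alpha)
    omega_k = update_center(u_hat, omega)

print(round(omega_k, 4))
```

In this toy case the center frequency drifts from the initial guess toward the spectral peak, which is the behavior the alternating updates rely on. A full decomposition would loop over all K modes and also update the Lagrangian multiplier.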
Through the above steps, the intrinsic mode components are searched iteratively until the decomposition is complete. The steps show that the K value and α are particularly important for VMD. The K value is usually selected empirically, 28 but this relies on expert experience and is highly subjective, which can easily lead to over-decomposition or under-decomposition 29 ; the value of α is usually set to a common default. With such settings, VMD often fails to achieve its optimal decomposition performance. Therefore, this article uses the improved PSO algorithm to optimize the K value and α.

| Improved particle swarm optimization (IPSO)
Optimization algorithms select the global optimum by imitating certain behaviors or strategies found in nature, for example, genetics, whale predation, wolf predation, and bird foraging. Classical optimization algorithms include the Genetic Algorithm (GA), 30 the Whale Optimization Algorithm (WOA), 31 the Grey Wolf Algorithm (GWO), 32 and the PSO algorithm. In this article, the PSO algorithm is chosen to find the best K and α values for VMD.
PSO 33 is an optimization algorithm inspired by the migration and foraging of bird flocks. The algorithm is often explained through the feeding activity of a flock: the particles correspond to the birds, the search space corresponds to the feeding space, and the function that evaluates the search results serves as the fitness function. The successive iterations of the algorithm are like a flock of birds searching the feeding space until the largest amount of food is found, that is, until the parameter with the highest fitness is found and the iteration ends. According to the PSO algorithm, the velocity update formula and the position update formula are as follows:
$$v_i^{t+1} = \omega v_i^{t} + c_1 r_1 \left( \mathrm{pbest}_i - x_i^{t} \right) + c_2 r_2 \left( \mathrm{gbest} - x_i^{t} \right)$$
$$x_i^{t+1} = x_i^{t} + v_i^{t+1}$$
where $\omega$ is the inertia weight, which determines the effect of the previous velocity on the current velocity. At the beginning of the iteration, PSO needs to search a large range with a large stride, so this effect should be large, and the initial value is usually set to 0.9. As the search progresses, the swarm needs to refine its search, so the weight is reduced to 0.4. $v_i^{t}$ is the velocity of the i-th particle at the previous moment and $x_i^{t}$ is its position; $\mathrm{pbest}_i$ and $\mathrm{gbest}$ are the particle's own optimal position and the global optimal position; $c_1$ and $c_2$ are the acceleration factors, which represent the weights of moving toward the particle's own optimal value and the global optimal value, respectively. Their range is [0, 4], and they are usually set to the same value. $r_1$ and $r_2$ are random values in [0, 1] that guarantee the randomness of the search. The flow of the PSO algorithm is shown in Figure 2.
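As a rough illustration of the classical update described above, the following sketch implements plain PSO with an inertia weight that decreases linearly from 0.9 to 0.4. Several details are assumptions made for the example rather than settings taken from the article: the function name `pso`, the values c1 = c2 = 2.0, the velocity clamp at 20% of the search range, the linear inertia schedule, and the sphere test function. The sketch minimizes a cost, whereas the article maximizes a fitness; the two are equivalent up to a sign.

```python
import numpy as np

def pso(objective, lower, upper, n_particles=30, n_iter=100,
        w_start=0.9, w_end=0.4, c1=2.0, c2=2.0, seed=0):
    """Minimize `objective` with the classical PSO velocity and position updates."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    v_max = 0.2 * (upper - lower)                      # velocity clamp (assumption)
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))
    v = rng.uniform(-v_max, v_max, size=x.shape)
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for t in range(n_iter):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * t / max(n_iter - 1, 1)
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lower, upper)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val                    # update personal bests
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:                   # update global best
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, float(gbest_val)

# Toy usage: minimize the 2-D sphere function on [-5, 5]^2.
best_x, best_val = pso(lambda p: float(np.sum(p ** 2)), [-5, -5], [5, 5])
print(best_x, best_val)
```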
The fitness function determines how the position of a particle is evaluated, and the velocity update formula governs the particle search process. The fitness function is used to decide whether to update the particle's own optimal value and the global optimal value of the swarm; the velocity update formula updates the search direction and step size of each particle. Therefore, the fitness function and the velocity update formula are the core of the PSO algorithm and determine the optimal solution. In this article, the fitness function is constructed from the energy method, 34 mean absolute error (MAE), mean absolute percentage error (MAPE), and running time. The velocity update formula is improved using the fitness function value to speed up the search for the optimal solution.

| Improved fitness function of PSO
The fitness function is the decisive factor in evaluating the particle search results. When VMD parameters are optimized, the fitness function is usually set to an evaluation indicator such as the energy difference or MAPE. The energy difference fitness function can evaluate VMD in two ways. From an overall point of view, it evaluates the difference between the energy of the data recombined after VMD and the energy of the original data; the decomposition with the smallest difference is the best. 35,36 From the point of view of a single decomposition, it focuses on energy volatility, that is, the fluctuation between the energy recombined after the previous decomposition and that after the current decomposition; the decomposition at which the fluctuation starts to stabilize is the best. 37 The MAPE fitness function evaluates the decomposition model directly: the lower the MAPE, the better the model 38 ; the higher the MAPE, the poorer the model.
The above methods evaluate the effect of VMD reasonably well, but none of these fitness functions is the best choice. Models are now required to be not only accurate but also fast, and practical applications need models that are both. Therefore, this article proposes a new fitness function construction method, IFEM, which combines the energy method, MAE, MAPE, and running time into a new fitness function. The evaluation steps are as follows:
1. Calculate the energy of the raw data:
$$Energy_O = \sum_{i=1}^{n} x_i^2$$
where $x_i$ is the i-th raw data point and $n$ is the number of raw data points.
2. Use the parameters corresponding to the position of the particle to perform VMD.
3. Calculate the energy difference, MAE, MAPE, and decomposition time for the recombined data:
$$Energy_C = \sum_{i=1}^{n} \hat{x}_i^2, \qquad Energy_D = \left| Energy_O - Energy_C \right|$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \hat{x}_i \right|, \qquad MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{x_i - \hat{x}_i}{x_i} \right|, \qquad Time = time_E - time_S$$
where $Energy_D$ is the energy difference between the original data and the recombined data after decomposition, $Energy_C$ is the energy of the recombined data, $\hat{x}_i$ is the i-th recombined data point, and $time_S$ and $time_E$ are the decomposition start time and end time, respectively.
4. Apply linear normalization to the energy difference, MAE, MAPE, and Time. A higher fitness value indicates a better solution, whereas lower MAE, MAPE, and time consumption indicate a better solution. Therefore, these three indicators are converted into utility values before they are combined.
By modifying the fitness function in this way, the quality of a particle's position can be evaluated more comprehensively: the decomposition time is considered together with the decomposition accuracy, so the overall decomposition performance is taken into account.
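The evaluation steps above can be sketched as follows. This is only one plausible reading of IFEM, not the authors' formula: the source does not state how the normalized indicators are combined, so the equal-weight sum, the treatment of all four indicators as "lower is better", and the names `indicators` and `ifem_fitness` are assumptions made for this example.

```python
import numpy as np

def indicators(x, x_rec, elapsed):
    """Raw indicators for one candidate: energy difference, MAE, MAPE, time."""
    x, x_rec = np.asarray(x, float), np.asarray(x_rec, float)
    energy_diff = abs(np.sum(x ** 2) - np.sum(x_rec ** 2))
    mae = np.mean(np.abs(x - x_rec))
    mape = np.mean(np.abs((x - x_rec) / x))    # assumes x contains no zeros
    return np.array([energy_diff, mae, mape, elapsed])

def ifem_fitness(raw):
    """raw: (n_candidates, 4) indicator array -> one fitness value per candidate."""
    raw = np.asarray(raw, float)
    span = raw.max(axis=0) - raw.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    norm = (raw - raw.min(axis=0)) / span      # linear (min-max) normalization
    utility = 1.0 - norm                       # lower indicator -> higher utility
    return utility.sum(axis=1)                 # equal weights (assumption)

# Toy usage: two hypothetical reconstructions of the same short series.
x = np.array([1.0, 2.0, 3.0, 4.0])
raw = np.stack([
    indicators(x, x + 0.01, elapsed=0.5),      # accurate and fast candidate
    indicators(x, x + 0.50, elapsed=2.0),      # inaccurate and slow candidate
])
fitness = ifem_fitness(raw)
print(fitness)
```

Because the normalization is relative to the current set of candidates, the fitness values are only comparable within one swarm evaluation; a fixed reference scale would be needed to compare across iterations.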

| Improved speed update formula of PSO
The traditional velocity update formula of the PSO algorithm uses inertia weights, acceleration coefficients, and random numbers to update the velocity of the particles. The inertia weight determines the development capability of the optimization algorithm, while the acceleration coefficients determine the direction and step size of the particles' motion. Traditionally, the two acceleration coefficients are set to the same value, which gives equal importance to the particle's own optimal solution and the swarm's optimal solution; however, this prolongs the search for the optimal solution. This article therefore improves the acceleration coefficients through GD weights to speed up the position update of the particles and accelerate the search. The specific steps are as follows:
1. Calculate the fitness function value of the particle's position:
$$fit_i = Fitness(k_i, alpha_i)$$
where $fit_i$ is the fitness function value of the i-th particle, $Fitness()$ is the fitness function, and $k_i$ and $alpha_i$ are the K value and α value corresponding to the i-th particle, respectively.
2. Calculate the acceleration coefficients $c_1$, $c_2$ of the current particle through the improved acceleration coefficient calculation formula and the GD weights, where $c_{i1}$ is the first acceleration coefficient of the i-th particle's velocity update formula and $c_{i2}$ is the second acceleration coefficient; $\omega_{i1}$ is the individual acceleration GD weight of the i-th particle and $\omega_{i2}$ is its global acceleration GD weight; $\omega_{SP}$ is the individual weight coefficient, whose value decreases from 2 to 0.4.
3. Calculate the velocity of the current particle with the improved acceleration coefficients.
The above steps show that, by adding GD weights to the acceleration coefficient formula, each particle dynamically updates its acceleration coefficients after every iteration according to the fitness function value and decides its future update direction and step size from the current optimal value, which speeds up the search for the global optimal solution. At the same time, this article argues that the global optimal solution must first be an individual optimal solution. Therefore, during the search for the global optimal solution, a weighting factor that decreases from 2.0 to 0.4 is used to strengthen and accelerate the search for the individual optimal solution.

| IPSO-VMD-BP
The BP neural network is one of the most classic neural networks. In the current field of wind power prediction, many studies use BP neural networks as the prediction model for single-feature wind power prediction. 39,40 The core of the BP neural network is "forward prediction, reverse correction," with a three-layer network structure consisting of an input layer, a hidden layer, and an output layer. Each layer has multiple neurons, and each neuron is connected to every neuron in the next layer through a weight. The output of a layer is obtained by multiplying its input by the weights, adding the bias of the layer, and passing the result through the activation function; this output is the input of the next layer. The reverse correction is the most important part of the BP neural network: the weights of the network are updated based on the error between the predicted and true values, so that the predictions approach the true values more closely.
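The "forward prediction, reverse correction" loop can be illustrated with a small three-layer network written in NumPy. This is a minimal sketch under assumptions, not the network used in the article: the hidden size, tanh activation, learning rate, number of epochs, the sliding window of 4 past values, and the sine-wave toy series are all hypothetical choices.

```python
import numpy as np

def train_bp(X, y, hidden=8, lr=0.05, epochs=500, seed=0):
    """Train an input-hidden-output network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1));          b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward prediction.
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Reverse correction: gradients of the mean squared error.
        g_out = 2.0 * err / len(X)
        gW2 = h.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)      # derivative of tanh
        gW1 = X.T @ g_h
        gb1 = g_h.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

# Toy task: predict the next value of a sine wave from the previous 4 values.
series = np.sin(np.linspace(0, 8 * np.pi, 200))
X = np.stack([series[i:i + 4] for i in range(len(series) - 4)])
y = series[4:].reshape(-1, 1)
params, losses = train_bp(X, y)
print(losses[0], losses[-1])
```

In the model described in this article, one such predictor would be trained per component category and the predictions summed to reconstruct the wind power forecast.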
The detailed introduction of these data sets is as follows:
1. Data set of a wind farm in Inner Mongolia, China. This data set contains 5760 pieces of data, with a sampling interval of 15 min and a sampling duration of 3 months.
2. Turkey wind turbine power generation data set. This data set contains 50,530 pieces of data, with a sampling interval of 10 min and a sampling duration of 12 months.
3. Texas wind data set. This data set contains 8760 pieces of data, with a sampling interval of 1 h and a sampling duration of 12 months.
4. Electricity consumption data set (universal time series data set). This data set contains 26,304 pieces of data, with a sampling interval of 1 h and a sampling duration of 3 years.
5. Solar power generation data set (universal time series data set). This data set contains 52,560 pieces of data, with a sampling interval of 10 min and a sampling duration of 12 months.
6. Road occupancy data set (universal time series data set). This data set contains 17,544 pieces of data, with a sampling interval of 1 h and a sampling duration of 2 years.
This article divides each of the six data sets into a training set and a validation set at a 7:3 ratio. The partitioning results of the data sets are shown in Figure 3.
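A 7:3 split of a time series can be sketched as below. The article does not say whether the split is chronological or shuffled; this example assumes a chronological split (the usual choice for forecasting), and the function name `chronological_split` and the stand-in data are hypothetical.

```python
def chronological_split(series, train_parts=7, total_parts=10):
    """Split a time-ordered sequence into leading train and trailing validation parts."""
    # Integer arithmetic avoids float rounding (e.g. int(5760 * 0.7) can give 4031).
    cut = len(series) * train_parts // total_parts
    return series[:cut], series[cut:]

data = list(range(5760))               # stand-in for the 5760-point Inner Mongolia series
train, valid = chronological_split(data)
print(len(train), len(valid))
```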

| Evaluation methods
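To evaluate the prediction performance of the proposed model reasonably and scientifically, and in line with the current national evaluation standards for wind power prediction systems, this article selects the goodness of fit (R2 score), 41 mean absolute error (MAE), 42 mean square error (MSE), 42 root mean square error (RMSE), 42 and mean absolute percentage error (MAPE), together with the running time, as the evaluation criteria. The formulas are as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
where $n$ is the number of data points, $\hat{y}_i$ is the predicted value, $y_i$ is the test value, and $\bar{y}$ is the mean of the test values.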
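The five error metrics named above can be computed with a few lines of NumPy. The sketch uses the standard textbook definitions, which are assumed to match those in the article; the function name `evaluate` and the toy values are hypothetical. Note that MAPE is undefined when a test value is zero, which can occur in wind power data and would need separate handling.

```python
import numpy as np

def evaluate(y, y_hat):
    """Return R2, MAE, MSE, RMSE, and MAPE (in percent) for test values y."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y - y_hat
    mse = float(np.mean(err ** 2))
    return {
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAPE": float(np.mean(np.abs(err / y)) * 100.0),  # assumes y has no zeros
    }

scores = evaluate([1, 2, 3, 4], [1, 2, 3, 5])
print(scores)
```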

| Comparison of different decomposition algorithms
In this article, several single decomposition algorithms are used to stabilize the data. EMD, CEEMD, and VMD are chosen, and the decomposed data are predicted by a BP neural network and an LSTM. First, the data are processed with each algorithm and the decomposition performance is compared. The comparison results on the six data sets are shown in Table 1.
In Table 1, the completeness (the degree of similarity between the reconstructed components and the original data), the time (decomposition time), and the number of components are compared for the different data sets under the different decomposition algorithms. Overall, VMD performs better than CEEMD: although its decomposition completeness is lower than that of CEEMD, it decomposes faster, and its completeness remains relatively high. The predictive performance of the data decomposed by the different algorithms is then compared. Taking the Inner Mongolia data set as an example, the results are shown in Table 2.
Table 2 shows that, compared with EMD, CEEMD, Wavelet, and LMD, the combination of VMD + BP has the lowest MAE, MSE, and RMSE and the highest R2 score, which indicates that VMD stabilizes the data better and that the model predicts future data more accurately. Specifically, in terms of MAE, VMD decreased by 56.7% compared with EMD, and decreased by [...]. From Tables 1 and 2, it can be seen that VMD is better than EMD, CEEMD, and wavelet decomposition in both decomposition and prediction performance.

| Comparison of traditional PSO and improved PSO
Although VMD is better than the other decomposition algorithms, its decomposition performance is not yet optimal, because its parameters are set based on experience, which usually prevents the algorithm from reaching its best performance. Therefore, this article uses PSO to find the best parameters, and improves both the velocity update formula and the fitness function of PSO so that it achieves better optimization performance.
First, a reference number of components is set by CEEMD. As a relatively recent decomposition algorithm, CEEMD achieves relatively good decomposition performance, and obtaining the number of components in this way makes the subsequent optimization more efficient. The reference number of components for every data set is shown in Table 3.
After the reference number of modes is obtained, the search range for K is the interval of radius 5 centered on the reference value, and the search range for α is the interval of radius 1000 centered on 2000. VMD is used to decompose the different data sets. The improved PSO algorithm uses IFEM to construct the fitness function from the energy, MAE, MAPE, and running time, and optimizes the K and α values of VMD. Here, energy represents the information in the data, and a change in energy represents a change in information; MAE represents the difference between the raw data and the reconstructed data; MAPE represents the quality of the decomposition algorithm. The number of particles is set to 30, the initial positions are random numbers within the search range, the initial velocities are random numbers within the maximum velocity range, and energy stability is used as the termination condition. The optimal numbers of components selected by different fitness functions are then compared. Taking the Inner Mongolia data set as an example, the comparison results are shown in Table 4, the energy change is shown in Figure 4, and the MAE, MAPE, and decomposition time are shown in Figure 5. In Figure 4, the horizontal axis is the optimization search process and the vertical axis is the energy.
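The search-space setup described above can be sketched as follows. The ranges (K within radius 5 of the reference value, α within radius 1000 of 2000) and the swarm size of 30 come from the text; everything else is an assumption for illustration, including the names `init_swarm` and `decode`, the reference value `k_ref=9`, the lower bound of 2 modes, the maximum velocity of 20% of the range, and rounding the continuous K coordinate to an integer.

```python
import numpy as np

def init_swarm(k_ref, n_particles=30, k_radius=5, alpha_center=2000.0,
               alpha_radius=1000.0, v_frac=0.2, seed=0):
    """Random initial positions and velocities for the (K, alpha) search."""
    rng = np.random.default_rng(seed)
    # K needs at least 2 modes (assumption); alpha range is 2000 +/- 1000.
    lower = np.array([max(k_ref - k_radius, 2), alpha_center - alpha_radius], float)
    upper = np.array([k_ref + k_radius, alpha_center + alpha_radius], float)
    v_max = v_frac * (upper - lower)               # maximum velocity (assumption)
    pos = rng.uniform(lower, upper, size=(n_particles, 2))
    vel = rng.uniform(-v_max, v_max, size=(n_particles, 2))
    return pos, vel, lower, upper

def decode(position):
    """Map a continuous particle position to VMD parameters (K must be an integer)."""
    return int(round(float(position[0]))), float(position[1])

pos, vel, lower, upper = init_swarm(k_ref=9)       # k_ref=9 is a hypothetical reference value
k, alpha = decode(pos[0])
print(k, alpha)
```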
The energy change of the original data and the reconstructed data.
Table 4 shows that the IFEM proposed in this article is a relatively good fitness function. However, its search cost and time are still very long; compared with the methods based on overall energy difference and energy fluctuation, it saves only a little time. Therefore, this article improves the velocity update formula of PSO to accelerate the search process.
Figures 4 and 5 show that, throughout the search process, the energy of the reconstructed data keeps increasing and its MAE keeps decreasing, but the decomposition time also keeps increasing, while the MAPE is relatively stable in the early stages of the search. If only one indicator is used as the fitness function of PSO, a good value of that indicator comes with poor values of the others, so the best overall decomposition performance cannot be achieved. Therefore, to find the optimal parameters comprehensively, this article combines the four indicators into IFEM.
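To make the idea of combining four indicators concrete, the sketch below computes the energy, MAE, and MAPE of a reconstruction and merges four indicator columns into one score per candidate. It is an illustration only: min-max normalization with equal weights is an assumption standing in for the paper's utility functions, and the names `energy`, `min_max`, and `combined_fitness` are hypothetical.

```python
def energy(x):
    """Signal energy: sum of squared samples."""
    return sum(v * v for v in x)

def mae(y, y_hat):
    """Mean absolute error between original and reconstructed data."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percentage error; zero targets are skipped (an assumption)."""
    pairs = [(a, b) for a, b in zip(y, y_hat) if a != 0]
    return 100.0 * sum(abs((a - b) / a) for a, b in pairs) / len(pairs)

def min_max(values):
    """Scale a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def combined_fitness(energy_gaps, mae_vals, mape_vals, times):
    """One score per candidate (lower is better), equal weights assumed.

    energy_gaps holds |energy(original) - energy(reconstructed)| per candidate,
    so all four columns are costs.
    """
    cols = [min_max(c) for c in (energy_gaps, mae_vals, mape_vals, times)]
    return [sum(col[i] for col in cols) / len(cols) for i in range(len(times))]
```

A candidate (K, α) pair that is best on every indicator receives a score of 0, and a candidate that is worst on every indicator receives 1; mixed cases fall in between, which is the balancing behaviour a single-indicator fitness function lacks.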
Table 4 shows that PSO with the IFEM fitness function gives the best optimization performance. In terms of decomposition performance, its MAPE is relatively good: it is 2.63% higher than with the fitness function based on overall energy difference and 1.32% higher than with the fitness function based on energy fluctuation, but 59.42% lower than with permutation entropy and 90.71% lower than with sample entropy. In addition, in terms of optimization performance and prediction performance, IFEM achieves the most balanced and best results. Its search time is only 1500 s, so it achieves more precise prediction in less time.
To verify that the improved velocity update formula achieves a faster search, this article compares the traditional and improved velocity update formulas; the comparison results are shown in Table 5.
Table 5 shows that, compared with the traditional PSO, the improved PSO optimizes faster and obtains better parameters in fewer rounds.
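For orientation, the traditional velocity update that serves as the baseline in this comparison has the standard form v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x). The sketch below implements that standard form only. The GD weight itself is not reproduced here; `linear_inertia` is a common linearly decreasing inertia schedule used purely as a stand-in, and the defaults `c1 = c2 = 2.0`, `w_start = 0.9`, and `w_end = 0.4` are assumptions.

```python
import random

def update_velocity(v, x, pbest, gbest, w, c1=2.0, c2=2.0, rng=random):
    """Standard PSO velocity update for one particle (equal-length lists)."""
    new_v = []
    for vi, xi, pi, gi in zip(v, x, pbest, gbest):
        r1, r2 = rng.random(), rng.random()  # fresh random factors per dimension
        new_v.append(w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi))
    return new_v

def linear_inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Linearly decreasing inertia weight; a stand-in, NOT the GD weight."""
    return w_start - (w_start - w_end) * t / t_max
```

An improved weight would take the place of `w` in `update_velocity`; the rest of the update is unchanged, which is why the two variants can be compared by the number of rounds needed to reach a given fitness.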
The overall search process is shown in Figure 6, and the best parameters for each data set are shown in Table 6.

| Data decomposition and reconstruction
The optimal K and α values are obtained with the improved PSO algorithm, and the data sets are decomposed by VMD using these values. Taking the data set of a wind farm in Inner Mongolia, China, as an example, the decomposition results are shown in Figure 7. In Figures 7 and 8, the horizontal coordinate is the amplitude and the vertical coordinate is the frequency.
As can be seen in Figure 7, after improved variational mode decomposition (IVMD), the original highly volatile data is decomposed into multiple smooth components. However, too many components place a heavy burden on the subsequent prediction models and prolong training. Moreover, from a frequency domain perspective, to make prediction more accurate, the components are divided into trend components, low-frequency vibration components, high-frequency vibration components, and random noise components based on the central frequency of each component. Taking the data set of a wind farm in Inner Mongolia, China, as an example, the central frequencies are shown in Figure 8.
Components of the Inner Mongolia, China wind power data set after variational mode decomposition (VMD).
As can be seen from Figure 8, the center frequencies of the components differ. According to frequency domain analysis, the higher the frequency, the more noise the data contains; the lower the frequency, the purer the data. Therefore, according to central frequency, the data can be divided into a trend component (mainly the trend characteristics of the data), a vibration component (mainly the changes in the data), and a random noise component (mainly the noise in the data), and the vibration component can be further subdivided into low-frequency and high-frequency vibration components. Taking a wind farm data set in Inner Mongolia, China, as an example, the trend components are components 1, 2, 3, and 4; the low-frequency vibration components are components 5, 6, and 7; the high-frequency vibration components are components 8, 9, and 10; and the random noise components are components 11, 12, and 13. The reconstructed components are shown in Figure 9.
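The grouping for the Inner Mongolia data set can be written down directly. The sketch below assumes that the modes are already ordered by ascending center frequency and that the components in a group are merged by pointwise summation; both are assumptions, and the names `regroup` and `INNER_MONGOLIA_GROUPS` are hypothetical.

```python
# Component indices (1-based) as listed in the text for the Inner Mongolia data set.
INNER_MONGOLIA_GROUPS = {
    "trend": [1, 2, 3, 4],
    "low_freq_vibration": [5, 6, 7],
    "high_freq_vibration": [8, 9, 10],
    "random_noise": [11, 12, 13],
}

def regroup(modes, groups):
    """Merge decomposed modes into grouped components by pointwise summation.

    modes  : list of equal-length lists, one per decomposed component
    groups : mapping from group name to 1-based component indices
    """
    n = len(modes[0])
    merged = {}
    for name, idxs in groups.items():
        merged[name] = [sum(modes[i - 1][t] for i in idxs) for t in range(n)]
    return merged
```

With summation as the merge rule, adding the four grouped series together gives the same signal as adding all 13 modes, so the regrouping reduces the number of series the predictor has to learn from 13 to 4 without discarding any of the decomposed signal.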
The frequency of each component after variational mode decomposition (VMD).
To show that merging the components into trend, low-frequency vibration, high-frequency vibration, and random noise components gives more precise prediction, this article compares different ways of combining them; the results are shown in Table 7.
Table 7 shows that predicting with all components takes a long time to train, although its R2-Score is the highest. Using trend, vibration, and random noise components trains the model quickly, but its R2-Score is the lowest. The combination proposed in this article is the best: its R2-Score is almost the same as the highest value, and its training time is short. Specifically, compared with the first method in Table 7, the proposed method's MAE increased by 4.45%, MSE increased by 12.28%, RMSE increased by 6.1%, R2-Score decreased by 0.05%, and training time decreased by 16 s; compared with the second method in Table 7, its MAE decreased by 18.45%, MSE decreased by 25.58%, RMSE decreased by 13.7%, and training time increased by only 4 s. This shows that the method proposed in this article is the best way to combine these components.
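For reference, the evaluation metrics and the relative changes quoted for Table 7 follow the standard definitions sketched below. The function names are illustrative, and `pct_change` is simply the usual relative difference expressed in percent; it is assumed, not confirmed, that the reported percentages were computed this way.

```python
import math

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    """Mean square error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(mse(y, y_hat))

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def pct_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old
```

A positive `pct_change` for MAE, MSE, or RMSE means the error grew relative to the baseline, whereas a positive value for R2-Score means the fit improved.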

| Data prediction
After reconstructing these components, we use a BP neural network to predict future data. To verify the advantages of the proposed model, a large number of comparative experiments were carried out. The experimental results for the Inner Mongolia, China data set are shown in Table 8 and Figure 10, and those for the Turkey wind power data set, the Texas wind power data set, and the general time series data set are shown in Table 9.
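Before a BP network can be trained on a reconstructed component, the series has to be turned into input and target pairs. The sketch below shows one common way to do this with a sliding window. The window length `n_lags`, the direct multi-step strategy controlled by `horizon`, and the name `make_windows` are assumptions, since the text does not state how the samples were built.

```python
def make_windows(series, n_lags, horizon=1):
    """Build supervised samples from a time series.

    Each input row holds n_lags consecutive past values; the target is the
    value `horizon` steps after the end of that window (horizon=1 is
    single-step prediction).
    """
    X, y = [], []
    for start in range(len(series) - n_lags - horizon + 1):
        X.append(series[start:start + n_lags])
        y.append(series[start + n_lags + horizon - 1])
    return X, y
```

For example, with `n_lags=3` and `horizon=4`, the first sample of the series 0, 1, 2, ... uses the values 0, 1, 2 as input and the value 6 as its target, which corresponds to a four-step-ahead prediction.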
Table 8 shows that the proposed IVMD + BP model gives the best prediction performance, with the lowest MAE, MSE, and RMSE and the highest R2-Score. Although prediction with XGBoost and ELM costs only 0.6 and 0.1 s, respectively, their R2-Score is so low that they cannot achieve high-precision prediction. Specifically, in terms of MAE, compared with other prediction methods, it has decreased 6.75%, 33
Table 9 gives the experimental results for the other data sets. Model1 is the model proposed in this article. Model2 is a wind power prediction model with relatively excellent performance at present; its core is to reconstruct the components decomposed by CEEMD based on gray correlation degree and then make predictions. Model3 is a wind power prediction model that decomposes with VMD and predicts with XGBoost. Time1 is the decomposition time and Time2 is the training time. Table 9 shows that the prediction accuracy of the proposed model is very close to that of Model2, and both meet the standard of high-precision wind power prediction. In decomposition and training time, the proposed model's values are much lower than those of Model2, which shows that it is much more efficient than Model2. Compared with Model3, the proposed model predicts more slowly, but the prediction accuracy of Model3 is much lower and cannot reach the standard of high-precision prediction. In summary, the proposed model has high accuracy and high efficiency, which is particularly important for the practical application of wind power prediction models. Specifically, on the Turkish data set, the prediction accuracy of the proposed model decreased by only 0.22% compared with Model2, while the decomposition time and training time decreased by 99.9% and 80.8%, respectively; compared with Model3, the prediction accuracy increased by 0.98% and the training time increased by 17 s. On the Texas data set, the prediction accuracy only decreased by
The prediction results for each data set are shown in Figure 11.
In addition, multistep prediction experiments were carried out, and their results are shown in Table 10 and Figures 12 and 13.
As can be seen from Figures 12 and 13, the proposed model retains a clear advantage in multistep prediction. Combined with Table 10, it can be seen that in the four-step prediction the proposed model has the lowest MAE, MSE, and RMSE and the highest R2-Score; in the eight-step prediction it has the lowest MSE and RMSE and the highest R2-Score, and its MAE is only slightly higher than that of the LMD + BP combination.
Specifically, in the four-step prediction, compared with the other models, the MAE is reduced by 9.48%, 37.02%, and 39.73%, respectively; the MSE is reduced by 52.50%, 59.96%, and 60.39%, respectively; and the RMSE is reduced by 31.1%, 36.72%, and 37.07%, respectively; in terms of R2-Score, it was prediction accuracy, and prediction performance compared with other prediction methods, both in single-step prediction and multi-step prediction.
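The MSE and RMSE reductions quoted for the four-step prediction can be cross-checked against each other. Because RMSE is the square root of MSE, a fractional MSE reduction r implies an RMSE reduction of 1 − sqrt(1 − r), provided both figures refer to the same baseline model. The short sketch below applies this identity to the three MSE reductions; the function name is illustrative.

```python
import math

def rmse_reduction_from_mse(mse_reduction):
    """Fractional RMSE reduction implied by a fractional MSE reduction."""
    return 1.0 - math.sqrt(1.0 - mse_reduction)

# MSE reductions reported for the four-step prediction (as fractions).
reported_mse = (0.5250, 0.5996, 0.6039)
implied_rmse = [rmse_reduction_from_mse(r) for r in reported_mse]
```

The implied values are roughly 31.1%, 36.7%, and 37.1%, which agree with the reported RMSE reductions of 31.1%, 36.72%, and 37.07% to within rounding.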

| CONCLUSIONS
With the expansion of wind power development, high-precision wind power prediction is becoming increasingly important for the safe and stable production of wind power. This article proposes a multiscale ultra-short-term wind power prediction model based on GD-IFEM-PSO and VMD-BP.
1. Compared with the classical decomposition method, this article uses the VMD method to quickly decompose and stabilize the historical wind power data, which makes it easier for the model to extract the temporal features of the data and thus improves prediction accuracy. At the same time, the optimal K and α values of VMD are searched with the improved PSO algorithm, which gives the model a prediction accuracy very close to that of existing high-performing models and a far higher prediction efficiency.
The prediction model proposed in this article can decompose and analyze historical wind power data in a relatively short time and capture the temporal characteristics of wind power data through neural network models, thereby achieving high-precision wind power prediction. Improving the prediction accuracy of wind power and reducing the training time of wind power prediction models is of great significance for the industrialization of wind power models and also contributes to global sustainable development.

4 | RESULT AND DISCUSSION
4.1 | Data sources description and evaluation methods
4.1.1 | Data sources
This article uses multiple data sets for simulation experiments to validate the proposed model. The data sets include a wind farm data set in Inner Mongolia, China, a wind turbine power generation data set in Turkey, a wind power data set in Texas, and a general time series data set.

2. Compared with the traditional optimization algorithm, this article proposes GD weights and uses them to improve the velocity update formula, which speeds up the search for the optimal solution and reduces the search time without sacrificing existing performance. At the same time, this article proposes the IFEM method for setting the fitness function, which extends the traditional evaluation indicators with additional indicators such as time and combines them to evaluate the search results comprehensively, making the search and evaluation of the optimization algorithm more reasonable.
3. Different from the existing component reconstruction methods, this article further subdivides the vibration components, making it easier and more targeted for the neural network to learn data features. Through frequency domain analysis, the components are reconstructed into four types according to center frequency, namely trend, low-frequency vibration, high-frequency vibration, and random noise components, which lets the neural network learn the features of each type in a more targeted way; at the same time, reconstructing the data lowers the data dimension of the model and improves the training speed.
FIGURE 12 The comparison results of MAE, MSE, and RMSE for multi-step predictions of different models. MAE, mean absolute error; MSE, mean square error; RMSE, root mean square error.
FIGURE 13 The comparison results of R2-Score for multi-step predictions of different models.
$d_{\mathrm{MAE}}$, $d_{\mathrm{MAPE}}$, and $d_{\mathrm{Time}}$ are the utility values of MAE, MAPE, and Time, respectively. MAE′, MAPE′, and Time′ are the normalized values.
5. Combine the above four evaluation indicators as the fitness function value of the particle's current position.
The partition of data sets.
The comparison results of different decomposition algorithms based on four data sets.
TABLE 1 Abbreviations: CEEMD, complementary ensemble empirical mode decomposition; VMD, variational mode decomposition.
TABLE 2 The prediction performance of different decomposition algorithms. Note: The bold values are the best values for each evaluation method. Abbreviations: CEEMD, complementary ensemble empirical mode decomposition; MAE, mean absolute error; MSE, mean square error; RMSE, root mean square error.
The reference number of each data set.
The comparison results of different fitness functions.
TABLE 3 Abbreviations: IFEM, integrated fitness evaluation method; MAPE, mean absolute percentage error.
The best parameters of each data set.
.46%, 23.87%, 35.37%, 83.16%, 72.65%, 39.71%; in terms of MSE, compared with other prediction methods, it has
TABLE 7 The comparison results of different combination methods.
The comparison results of different prediction methods. Note: The bold values are the best values for each evaluation method. Abbreviations: GAN, generative adversarial network; IVMD, improved variational mode decomposition; MAE, mean absolute error; MSE, mean square error; RMSE, root mean square error; VMD, variational mode decomposition; WOA, whale optimization algorithm.
FIGURE 10 The comparison of different prediction models.
magnitude compared to Model2 and was able to maintain a prediction accuracy very close to that of Model2. Compared with Model3, the proposed model achieved higher accuracy; because Model3 uses a machine learning model, its prediction speed is faster than that of the deep model used in this article, which is acceptable, but its prediction accuracy is not high enough to guarantee high-accuracy wind power prediction. Therefore, it can be demonstrated that the model proposed in this article achieves faster and more accurate wind power prediction and is more suitable for practical applications.