Ultra-short-term multi-step wind power forecasting based on CNN-LSTM

The fluctuation and intermittency of large-scale wind power integration pose a serious threat to the stability and security of the power system, so accurate prediction of wind power is of great significance to the safe grid connection of wind power. This study proposes a novel spatio-temporal correlation model (STCM) for ultra-short-term wind power prediction based on a convolutional neural network and long short-term memory (CNN-LSTM). The original meteorological factors at multiple historical time points of different sites throughout the target wind farm are reconstructed into the input window of the model, which constitutes a new data reconstruction method. CNN is used to extract the spatial correlation feature vectors of the meteorological factors of different sites and the ultra-short-term temporal correlation vectors of the meteorological features, which are reconstructed in time series and used as the input data of LSTM. Then, LSTM extracts the temporal feature relationship between the historical time points for multi-step wind power forecasting. The proposed STCM based on CNN-LSTM is suitable for wind farms that can collect meteorological factors at different locations. Taking the measured meteorological factors and wind power dataset of a wind farm in China as an example, four evaluation metrics are obtained for the CNN-LSTM model and for CNN and LSTM used individually for multi-step wind power prediction. The results show that the proposed STCM based on CNN-LSTM has a better ability to extract spatial and temporal characteristics than the traditional structure models and can forecast the power of the wind farm more accurately.


INTRODUCTION
In recent years, wind power has developed rapidly all over the world. The fluctuation and intermittency of wind power output introduce instability into the power system. Improving the forecasting accuracy of wind power is an effective way to reduce the instability of the power system caused by large-scale wind power integration.
Wind power forecasting models are mainly divided into physical, statistical, machine learning and combined models [1]. Physical models convert numerical weather prediction (NWP) data into the wind speed at the height of the wind turbine by means of microscale meteorology and computational fluid dynamics, and then forecast wind power indirectly through a conversion calculation [2]. Physical models can be applied to new wind farms without a large amount of historical data, and they make medium- and long-term forecasting of wind power easy to realise. Their forecasting accuracy mainly depends on the accuracy of the NWP data, the information on the physical environment around the wind farm and the accuracy of the physical model itself [3]. However, physical models are not suitable for short-term wind power forecasting because of their high computational cost. Based on a large amount of historical wind farm data, statistical methods use algorithms including the Kalman filter [4], the autoregressive (AR) model and the autoregressive moving average (ARMA) model [5] to extract the linear relation between input features (NWP, historical measured data) and wind power. Statistical models can achieve short-term wind speed forecasting, but they cannot capture non-linear relationships between the variables [6]. Machine learning models such as the backpropagation network [7], radial basis function [8], extreme learning machine [9], support vector machine (SVM) and Gaussian process [10] establish a black-box model that fits the non-linear relationship between the input characteristics and the output wind power by training on a large amount of historical measured data. However, shallow machine learning models can only extract superficial features and have weak learning ability for multi-dimensional big data.
Compared with shallow machine learning, deep learning models have a stronger ability for computing and complex function fitting. Through non-linear optimisation of a multi-layer network structure, deep learning models can automatically extract the inherent features in data from the lowest to the highest level [11]. Some scholars have applied deep learning models to wind power forecasting based on historical data in order to improve forecasting accuracy [12]. In [13], ultra-short-term and short-term wind speed forecasting models with three hidden layers were established by using the deep Boltzmann machine. In [14], a transfer learning model was applied to transfer wind speed forecasting models trained on wind farms with rich historical data to wind farms with less historical data. In [15], the deep belief network model was applied to short-term wind speed forecasting and obtained high prediction accuracy in practical examples. A new forecasting model based on a neural network (NN) and a novel chaotic shark smell optimisation algorithm was proposed, and its effectiveness was tested on two real-world case studies [16]. The authors in [17] proposed a new wind power prediction approach which included an improved version of the Kriging interpolation method, empirical mode decomposition (EMD), an information-theoretic feature selection method, and a closed-loop forecasting engine. In [18], a prediction approach based on the improved EMD (IEMD) in conjunction with a hybrid framework consisting of the bagging NN (BaNN), the K-means clustering method, and a stochastic optimisation algorithm was proposed.
CNN and LSTM are two main deep learning models [19]. In [20], a probability forecasting model of ultra-short-term wind power based on CNN was proposed, and the accuracy of the model was verified. In [21], CNN and physical model were combined for forecasting, which further reduced the forecasting error of short-term wind power. In [22], LSTM models were applied to short-term forecasting of wind speed and wind power. In [23], the short-term wind power interval prediction based on two typical recurrent NN (RNN) models, Elman network and the nonlinear autoregressive with exogenous inputs (NARX) model and lower upper bound estimation method was investigated. Taking into account the impact of meteorological information data on wind power prediction, the authors in [24] sifted multivariate meteorological information data highly relevant to wind power with distance analysis as the input data of the LSTM model and modelled the time series from the viewpoint of time with LSTM. The authors in [25] combined CNN and LSTM to forecast wind speed and considered the influence of various meteorological factors such as temperature, wind speed, and wind direction on the wind speed in time and space. In [26], the authors investigated the combined performance of the wavelet packet decomposition and the CNN and CNN-LSTM in the wind speed multi-step prediction to extract the hidden features of the wind speed time series.
Existing individual CNN and LSTM models can establish the non-linear correlation between output and input variables from a large amount of historical data to predict wind speed or wind power, and each model has its advantages and disadvantages. The combination of CNN and LSTM can realise the complementary advantages of the two models to further improve the accuracy of forecasting [25]. The power of a wind farm is related to meteorological factors such as wind speed and wind direction at various sites of the wind farm. The meteorological factors are continuous in time and space, resulting in a significant cross-correlation between the factors of a target site and those of its adjacent sites [27]. However, the existing ultra-short-term forecasting methods of wind power usually ignore the influence of the spatio-temporal correlation of meteorological factors at different sites of the wind farm on the wind power.
Thus, this study combines the ability of CNN to extract spatial features with the ability of LSTM to extract time-series features in order to extract the spatio-temporal correlation between multiple meteorological factors and wind power.
The contributions of this study are as follows.
(1) Considering the influence of multiple meteorological factors of different sites throughout the target wind farm on wind power, a novel spatio-temporal correlation model (STCM) based on CNN-LSTM for ultra-short-term wind power prediction is proposed.
(2) A new data reconstruction method is proposed. The input matrix is constructed with the meteorological factors at different sites of the wind farm as the vertical axis and the ultra-short-term historical time as the horizontal axis. CNN is used to extract the spatial correlation of meteorological factors on the vertical axis and the ultra-short-term temporal correlation of the features on the horizontal axis. The correlation vectors of each input matrix extracted by CNN are arranged in a long-term time series and used as the input data of LSTM to extract the long-term historical temporal relationship for multi-step wind power prediction.
(3) Based on the data reconstruction method, the study uses multiple independent models that share the same input matrix to achieve multi-step prediction of wind power and reduce the time for data processing.
The organisation of this study is as follows. Section 2 introduces the structure of the STCM based on CNN-LSTM. In Section 3, the spatio-temporal correlation of multiple meteorological factors (specifically, wind speed and wind direction in this study) at different sites of the wind farm is analysed. The wind speeds and wind directions measured by benchmark wind turbines are reconstructed and used as the input of the model, and the STCM based on CNN-LSTM for multi-step wind power forecasting and the error calculation method are presented. In Section 4, taking a wind farm in China as an example, the calculation results of four evaluation indexes show that the STCM based on CNN-LSTM established in this study can predict the wind farm power more accurately than the individual deep learning models (CNN, LSTM).

The principle of CNN algorithm
CNN is a deep feedforward NN with a convolution structure that uses a supervised learning method [28]. The structure includes a convolutional layer, a pooling layer and a fully connected layer, as shown in Figure 1. The convolutional and pooling layers are the core modules of CNN feature extraction. The input features are convolved in the convolutional layer. The pooling layer samples information from the preceding convolutional layer and reduces the spatial size [29]. Then, the fully connected layer maps the two-dimensional features to one-dimensional data for output.
1. Convolutional layer: The convolution operation of the convolutional layer can reduce the noise and enhance the key information of the original input features. Assume that $v$ is the input feature matrix of the original wind speeds and directions, $w$ is the convolution kernel of order $I \times J$, and $y$ is the spatial correlation characteristic matrix of the wind speeds and directions, of order $M \times N$, output after the activation function. The element of $y$ in row $m$ and column $n$ is defined as [30]

$$y_{mn} = f\left(\sum_{j=0}^{J-1}\sum_{i=0}^{I-1} w_{i,j}\, v_{m+i,n+j} + b\right)$$

where $m = 0, 1, \ldots, M-1$; $n = 0, 1, \ldots, N-1$; $w_{i,j}$ is the element in row $i$ and column $j$ of $w$; $v_{m+i,n+j}$ is the element in row $m+i$ and column $n+j$ of $v$; $b$ is the bias; $f$ is the activation function.

2. Pooling layer: In the pooling layer, the spatial correlation matrix $y$ of wind speeds and directions output by the convolutional layer is further reduced by taking the average value (average pooling) or the maximum value (maximum pooling) of each pooling area, which preserves useful information while reducing the amount of data to be processed. Assuming that the dimension of the pooling area is $S_1 \times S_2$, the dimension of the resulting spatial correlation characteristic matrix $C$ is $(M/S_1) \times (N/S_2)$, and average pooling is defined as [31]

$$C_{ab} = \frac{1}{S_1 S_2}\sum_{i=0}^{S_1-1}\sum_{j=0}^{S_2-1} y_{aS_1+i,\,bS_2+j}$$

where $C_{ab}$ is the element in row $a$ and column $b$ of $C$; $a = 0, 1, \ldots, M/S_1 - 1$; $b = 0, 1, \ldots, N/S_2 - 1$; $y_{aS_1+i,\,bS_2+j}$ is the element in row $aS_1+i$ and column $bS_2+j$ of $y$.

3. Fully connected layer: In the fully connected layer, the spatial correlation characteristic matrix $C$ processed by the pooling layer is expanded into a one-dimensional data output, which is expressed as [31]

$$p = f\left(\sum_{i=1}^{n} k_i c_i + b\right)$$

where $c = [c_1, c_2, \ldots, c_i, \ldots, c_n]$ is the $n$-dimensional input variable; $k = [k_1, k_2, \ldots, k_i, \ldots, k_n]$ is the connection weight; $p$ is the one-dimensional output value of the spatial correlation characteristics of wind speeds and directions; $f$ and $b$ are the activation function and the bias, as above.
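The three layers described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the 2 × 2 kernel, the ReLU activation and the toy input are assumptions chosen only to make the index arithmetic of the convolution, average pooling and fully connected steps concrete.

```python
def conv2d(v, w, b=0.0, f=lambda x: max(x, 0.0)):
    """Valid 2-D convolution y[m][n] = f(sum_ij w[i][j] * v[m+i][n+j] + b)."""
    I, J = len(w), len(w[0])      # kernel rows and columns
    M = len(v) - I + 1            # output rows
    N = len(v[0]) - J + 1         # output columns
    return [[f(sum(w[i][j] * v[m + i][n + j]
                   for i in range(I) for j in range(J)) + b)
             for n in range(N)] for m in range(M)]


def avg_pool(y, s1, s2):
    """Average pooling over non-overlapping s1 x s2 areas."""
    rows, cols = len(y) // s1, len(y[0]) // s2
    return [[sum(y[a * s1 + i][c * s2 + j]
                 for i in range(s1) for j in range(s2)) / (s1 * s2)
             for c in range(cols)] for a in range(rows)]


def fully_connected(c_matrix, k, b=0.0, f=lambda x: x):
    """Flatten the pooled matrix and map it to one output value."""
    c = [x for row in c_matrix for x in row]
    return f(sum(k_i * c_i for k_i, c_i in zip(k, c)) + b)


# Toy 5 x 5 input (assumed values): v[m][n] = m + n + 1.
v = [[m + n + 1 for n in range(5)] for m in range(5)]
w = [[1, 0], [0, 1]]                     # assumed 2 x 2 kernel
y = conv2d(v, w)                         # 4 x 4 feature map
C = avg_pool(y, 2, 2)                    # 2 x 2 pooled map
p = fully_connected(C, [0.25] * 4)       # single output value
print(C, p)                              # [[6.0, 10.0], [10.0, 14.0]] 10.0
```

With this kernel each output element is the sum of a value and its lower-right neighbour, so the printed numbers can be checked by hand.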

The principle of LSTM algorithm
LSTM is a special RNN with a stronger feature extraction ability for processing sequence data [32]. On the basis of RNN, LSTM introduces a memory unit, which is controlled by input, output and forget gates. Under the time feedback mechanism, it can better realise the storage, screening and control of the information flow, effectively avoid information loss and alleviate the problem of vanishing and exploding gradients. The structure of LSTM is shown in Figure 2. The symbol $\sigma$ represents the sigmoid activation function, and tanh is the hyperbolic tangent activation function. The input information flow consists of the output variable $p_{t-1}$ at the previous time and the input variable $v_t$ at the current time. Through the control of the input, output and forget gates, the memory unit $C_{t-1}$ is updated to $C_t$, and the output value at the current time is $p_t$. The output values of the input, output and forget gates are $i_t$, $o_t$ and $f_t$, respectively.
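The transformation equations are defined as [33]

$$
\begin{aligned}
i_t &= \sigma(k_{xi} v_t + k_{hi} p_{t-1} + k_{ci} C_{t-1} + b_i) \\
f_t &= \sigma(k_{xf} v_t + k_{hf} p_{t-1} + k_{cf} C_{t-1} + b_f) \\
C_t &= f_t \cdot C_{t-1} + i_t \cdot \tanh(k_{xc} v_t + k_{hc} p_{t-1} + b_c) \\
o_t &= \sigma(k_{xo} v_t + k_{ho} p_{t-1} + k_{co} C_t + b_o) \\
p_t &= o_t \cdot \tanh(C_t)
\end{aligned}
$$

where $k_{xi}$, $k_{hi}$, $k_{ci}$ are the weight matrices of the input, the output at the previous moment and the memory unit to the input gate, respectively; $k_{xf}$, $k_{hf}$, $k_{cf}$ are the weight matrices of the input, the output at the previous moment and the memory unit to the forget gate, respectively; $k_{xo}$, $k_{ho}$, $k_{co}$ are the weight matrices of the input, the output at the previous moment and the memory unit to the output gate, respectively; $k_{xc}$, $k_{hc}$ are the weight matrices of the input and the output at the previous moment to the memory unit, respectively; $b_i$, $b_f$, $b_o$, $b_c$ are the bias values of the input, forget and output gates and the memory unit, respectively [33]; "⋅" is the Hadamard product.

FIGURE 2
The architecture of the long short-term memory (LSTM) model

FIGURE 3
Output mode of CNN-LSTM multi-step forecasting model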
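The gate mechanism can be sketched for a single memory unit with scalar state. This sketch assumes the common LSTM formulation with peephole connections, which matches the memory-unit-to-gate weights listed above; the exact form used by the authors is not recoverable from the text, and the names `lstm_step`, `k["xc"]` and `k["hc"]` are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(v_t, p_prev, c_prev, k, b):
    """One time step of a single-unit LSTM (assumed peephole formulation).

    v_t: current input, p_prev: previous output, c_prev: previous memory unit.
    k and b are dictionaries of scalar weights and biases.
    """
    i_t = sigmoid(k["xi"] * v_t + k["hi"] * p_prev + k["ci"] * c_prev + b["i"])
    f_t = sigmoid(k["xf"] * v_t + k["hf"] * p_prev + k["cf"] * c_prev + b["f"])
    # Candidate memory content is gated by i_t; old memory is gated by f_t.
    c_t = f_t * c_prev + i_t * math.tanh(k["xc"] * v_t + k["hc"] * p_prev + b["c"])
    o_t = sigmoid(k["xo"] * v_t + k["ho"] * p_prev + k["co"] * c_t + b["o"])
    p_t = o_t * math.tanh(c_t)
    return p_t, c_t


# With all weights and biases at zero every gate outputs 0.5,
# so the memory unit is simply halved at each step.
names = ["xi", "hi", "ci", "xf", "hf", "cf", "xc", "hc", "xo", "ho", "co"]
k0 = {name: 0.0 for name in names}
b0 = {name: 0.0 for name in ["i", "f", "c", "o"]}
p_t, c_t = lstm_step(v_t=1.0, p_prev=0.0, c_prev=1.0, k=k0, b=b0)
print(c_t)   # 0.5
```

In a full layer the scalars become vectors and matrices, and the products with the gate outputs become Hadamard products.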

The STCM based on CNN-LSTM and data reconstruction method
Meteorological factors, including wind speed, wind direction, temperature, air pressure, and humidity, are closely related to wind power forecasting, and the meteorological features of a region are similar to those of its adjacent regions [27]. Making full use of the meteorological information of different sites throughout the target wind farm can improve the accuracy and reliability of wind power forecasting. The CNN-LSTM model combines the advantages of CNN and LSTM, so it can extract local spatial features while modelling the time series. CNN is used to extract the spatial correlation feature vectors of the meteorological factors of different sites, which are arranged in time series and used as the input data of the LSTM network, and the LSTM network is then used for ultra-short-term wind power forecasting. The multi-step forecasting model of wind power based on CNN-LSTM proposed in this study is shown in Figure 3.
Here, $P(p_t, p_{t+1}, \ldots, p_{t+m}, \ldots, p_{t+M-1})$ is the vector of wind power values of the target wind farm at the next $M$ moments, $C(IW(t-m, f_n), p_{t+m})$ is the correlation coefficient between the value of the $n$th meteorological factor at historical time point $t-m$ and the wind power value at moment $t+m$, and the term with subscript $nm$ is the error term.
The CNN-LSTM forecasting model with an output of M steps has M CNN-LSTM structures that share the same input, which simplifies data preprocessing. The M models are trained independently and do not interfere with each other. The structure of a single CNN-LSTM model is shown in Figure 4: N meteorological factors at different positions of the wind farm at M times form an input window. After convolution by multiple two-dimensional convolution kernels, CNN can extract the spa-

Spatio-temporal correlation of wind speeds and wind directions at different positions of the wind farm
A wind farm has dozens to hundreds of wind turbines. Because of the distance between turbines and the eddy current effect of the wind turbine blades, the wind speed and direction at each wind turbine are not exactly the same, but they are correlated. The change of wind speeds with time, measured by four benchmark wind turbines in a wind farm in China, is shown in Figure 5, where v is the wind speed value, t is the time, and the data resolution is 5 min. The distribution of wind directions is represented by wind direction rose maps, and the rose maps of the four benchmark wind turbines based on the wind speed and wind direction data of 2017 are shown in Figure 6. The wind direction in the figure is represented by eight directions, namely east (E), west (W), south (S), north (N), southeast (S-E), southwest (W-S), northeast (N-E) and northwest (N-W). Different colours represent different wind speed ranges, and the number above each colour represents the frequency of the corresponding wind speed range in this direction.
From the figures above, it can be seen that the wind speed and wind direction distributions of the four benchmark wind turbines in the wind farm are clearly similar. Therefore, it is necessary to explore the temporal and spatial correlation of the wind speeds and wind directions of different wind turbines in the wind farm in order to forecast the power of the wind farm.
The wind speeds and wind directions measured at different benchmark wind turbines of the wind farm are taken as the input features, and the power values of the wind farm are taken as the output variables to construct the learning model. In order to extract the spatial correlations of the input features and consider the temporal correlation of the historical input data, the relationship between the wind speeds and wind directions of the benchmark wind turbines in the previous hour and the power values of the wind farm in the next hour is established by the CNN-LSTM algorithm (the data resolution is 5 min, and the input features of the previous hour are used to predict the wind power of the next hour). The input characteristic matrix $V_t$ of each data sample is composed of the elements $v^h_{t-i}$ and $d^h_{t-i}$, where $t$ is the current time, $v^h_{t-i}$ is the wind speed of the $h$-th benchmark wind turbine at $t-i$, and $d^h_{t-i}$ is the wind direction of the $h$-th benchmark wind turbine at $t-i$.
The corresponding output $P_t$ is the vector of wind farm power values in the next hour. Wind power forecasting aims to find the mapping relationship between $V_t$ and $P_t$ while considering the influence of historical sequence data. It is a high-dimensional regression problem, which requires the CNN-LSTM algorithm to mine the temporal and spatial correlation of the features.
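The data reconstruction can be sketched as follows. This is a hedged illustration rather than the authors' code: the row order (speed row then direction row for each turbine), the choice to include the current time $t$ in the 12-point window, and the name `build_sample` are all assumptions, and the series are synthetic.

```python
def build_sample(speeds, directions, power, t, window=12, horizon=12):
    """Return (input_matrix, target) for current time index t.

    speeds[h][k] and directions[h][k] hold the measurements of benchmark
    turbine h at time index k; power[k] is the wind farm power.
    """
    if t - window + 1 < 0 or t + horizon >= len(power):
        raise IndexError("not enough history or future data around t")
    matrix = []
    for v_h, d_h in zip(speeds, directions):
        matrix.append(v_h[t - window + 1 : t + 1])   # wind speed row of turbine h
        matrix.append(d_h[t - window + 1 : t + 1])   # wind direction row of turbine h
    target = power[t + 1 : t + 1 + horizon]          # power values of the next hour
    return matrix, target


# Synthetic demo data: 4 benchmark turbines, 40 time points at 5 min resolution.
T, H = 40, 4
speeds = [[5.0 + h + 0.1 * k for k in range(T)] for h in range(H)]
directions = [[(10.0 * h + k) % 360 for k in range(T)] for h in range(H)]
power = [float(k) for k in range(T)]

V_t, P_t = build_sample(speeds, directions, power, t=11)
print(len(V_t), len(V_t[0]), len(P_t))   # 8 12 12
```

With four turbines and two factors each, the input window is an 8 × 12 matrix whose vertical axis runs over sites and factors and whose horizontal axis runs over time.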

Wind farm power forecasting model and error calculation method based on CNN-LSTM
The construction process of wind power prediction model based on CNN-LSTM is shown in Figure 7.
The specific steps are as follows:
1. Data preprocessing
Normalisation ensures that no feature is weighted more heavily than another merely because its values vary over a larger magnitude. In this scenario, the wind direction would naturally be weighted more, so both wind speeds and wind directions must be normalised. Input features of different attributes have different correlations with wind power, and their orders of magnitude differ. Therefore, the input features are normalised, and the wind power feature data are mapped to 0∼1 to form the initial feature set [34]:

$$v_t' = \frac{v_t - \min_{1 \le t \le k} v_t}{\max_{1 \le t \le k} v_t - \min_{1 \le t \le k} v_t}$$

where $v_t$ is the actual value of the input feature, $v_t'$ is the normalised input feature, and the number of samples is $k$. The final wind power forecasting data are obtained by inverse normalisation according to Equation (9) [34]:

$$p_t^{pre} = p_t^{pre'} \left(\max_{1 \le t \le k} p_t - \min_{1 \le t \le k} p_t\right) + \min_{1 \le t \le k} p_t \tag{9}$$

where $p_t^{pre}$ is the predicted value after inverse normalisation, $p_t^{pre'}$ is the normalised predicted value of wind power, $p_t$ is the actual value of wind power, and the number of sample points is $k$.
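A minimal sketch of this preprocessing step, assuming plain min-max scaling to the range 0∼1 (the function names and sample values are illustrative):

```python
def fit_min_max(values):
    """Return the minimum and maximum used for scaling."""
    return min(values), max(values)


def normalise(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(x - lo) / (hi - lo) for x in values]


def denormalise(values, lo, hi):
    """Inverse of normalise: map values in [0, 1] back to [lo, hi]."""
    return [x * (hi - lo) + lo for x in values]


# Wind directions in degrees span a much larger range than wind speeds in m/s,
# which is why both are scaled before training.
directions = [0.0, 90.0, 360.0]
lo, hi = fit_min_max(directions)
scaled = normalise(directions, lo, hi)
restored = denormalise(scaled, lo, hi)
print(scaled)     # [0.0, 0.25, 1.0]
print(restored)   # [0.0, 90.0, 360.0]
```

In practice the minimum and maximum would be taken from the training data and reused unchanged for the test data and for the inverse normalisation of the forecasts.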

Model training
The normalised data are divided into a training set and a test set at a ratio of 8:2. The training set is used for model training, and the prediction error is obtained by forward propagation. The mean squared error (MSE) is used as the objective function, and the parameters of the whole model are updated from the back to the front by using the Adam optimisation algorithm to find a set of parameters that minimises the MSE. Adam is a first-order gradient-based optimisation algorithm for stochastic objective functions, based on adaptive estimates of lower-order moments [35]. In order to prevent overfitting, $L_2$ parameter regularisation is used in the fully connected layer. The objective function with the regularisation term added is given in Equation (10) [36]. In the error calculation, $N$ is the total number of samples of the forecasting series, $P_N$ is the rated installed capacity, and $p_t$ and $p_t^{pre}$ are the actual and forecasted values of the $t$-th wind power sample, respectively.
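The objective function and the four evaluation metrics used later (MAE, MAPE, RMSE and NRMSE) can be sketched as below. The formulas are the commonly used textbook definitions and are assumptions here, since the exact expressions are not recoverable from the text; in particular, normalising the RMSE by the rated installed capacity and the form `mse + lam * sum(w^2)` of the regularised objective are assumed, and all names and numbers are illustrative.

```python
import math


def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def mape(actual, pred):
    # Percentage error relative to the actual value; assumes no actual value is zero.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def nrmse(actual, pred, rated_capacity):
    # RMSE expressed as a percentage of the rated installed capacity (assumed form).
    return 100.0 * rmse(actual, pred) / rated_capacity


def regularised_mse(actual, pred, weights, lam):
    # MSE plus an L2 penalty on the weights (assumed form of the objective).
    mse = sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)
    return mse + lam * sum(w * w for w in weights)


actual = [10.0, 20.0, 30.0, 40.0]      # illustrative power values in MW
pred = [12.0, 18.0, 33.0, 36.0]
print(mae(actual, pred))               # 2.75
print(round(rmse(actual, pred), 4))    # 2.8723
```

A call such as `nrmse(actual, pred, 49.5)` would use the installed capacity of the wind farm studied here as the normalising constant.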

Sample processing and parameters setting
In order to verify the validity of the model proposed in this study, data samples of a wind farm in China are selected for wind power prediction. The wind farm has 33 wind turbines with a total installed capacity of 49.5 MW. It is located in a middle- and low-mountain landform with undulating terrain, and the wind turbines are distributed at different heights. The historical power values and meteorological factors of the target wind farm were measured throughout 2017, giving a total of 105,120 time points at a resolution of 5 min. After deleting the missing data, 104,800 time points remain. After data reconstruction (12 time points for each sample), there are 8732 samples in total. The first 6732 samples are taken as the training set, and the remaining 2000 samples as the test set. The training set is used to calculate the gradient and update the weights, and the test set gives the final evaluation indexes.
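One reading that reproduces the reported sample count is sketched below: each sample pairs 12 input time points with the 12 target points that follow, and consecutive samples are shifted by 12 points. The stride of 12 and the helper name `count_samples` are assumptions; the text only states the window length and the totals.

```python
def count_samples(n_points, window=12, horizon=12, stride=12):
    """Number of (input window, target window) pairs that fit in the series."""
    return (n_points - window - horizon) // stride + 1


n_samples = count_samples(104_800)
n_train = 6732                       # first samples, in chronological order
n_test = n_samples - n_train
print(n_samples, n_train, n_test)    # 8732 6732 2000
```

Keeping the split chronological, as described above, means that the test set contains only data recorded after the training data.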
The multi-model method is selected for multi-step prediction. Based on input features with a historical sequence length of 12, the 5 min-1 h (1-12 step) ultra-short-term forecasting of the power time series of the wind farm is carried out. Twelve CNN-LSTM models with the same network structure share the same input, and each model is trained independently without interference. The programming platform and the hyperparameter settings are shown in Table 1.
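The multi-model strategy can be illustrated independently of the network itself. In the sketch below a one-variable least-squares line stands in for each CNN-LSTM model, purely as a placeholder, and the shared input is reduced to a single scalar feature; the function names and toy data are assumptions. The point is only that one model is trained per forecasting step while all models see the same input.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a*x + b; a stand-in for training one forecasting model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def train_multi_model(features, targets, steps=12):
    """Train one independent model per forecasting step on the shared input."""
    return [fit_line(features, [row[m] for row in targets]) for m in range(steps)]


def predict_multi(models, feature):
    """Collect the step-1 to step-12 forecasts from the independent models."""
    return [a * feature + b for a, b in models]


# Toy data: the target at step m+1 is (m+1) times the shared input feature.
features = [1.0, 2.0, 3.0, 4.0]
targets = [[x * (m + 1) for m in range(12)] for x in features]
models = train_multi_model(features, targets)
forecast = predict_multi(models, 5.0)
print([round(value, 6) for value in forecast[:3]])   # [5.0, 10.0, 15.0]
```

Because the models do not depend on each other's outputs, forecasting errors do not accumulate from one step to the next, at the cost of training twelve networks.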

Generalisation capability of the model
In order to prevent overfitting, an $L_2$ regularisation coefficient and dropout are set to reduce network complexity and improve the generalisation ability of the model. The generalisation ability of the model is analysed by comparing how the 5 min-1 h (1-12 steps) RMSE ($e_{RMSE}$) of the training and test sets varies with the number of iterations, as shown in Figure 8. The training-set errors of all 12 networks (forecasting steps 1-12) decrease as the number of iterations increases, and the test-set errors also converge well, which indicates that the model has strong generalisation ability.

Comparison of forecasting performance between CNN, LSTM and CNN-LSTM
CNN and LSTM can significantly improve the forecasting accuracy of wind farm power compared with conventional machine learning algorithms such as ANN and SVM [37,38]. Therefore, this study only analyses the advantages of CNN-LSTM over CNN and LSTM in extracting the spatial correlation features of multivariate time series. The CNN and LSTM networks use the same input data to predict the wind farm power for the next hour. The comparison of the prediction errors $e_{MAE}$, $e_{MAPE}$, $e_{RMSE}$ and $e_{NRMSE}$ of the CNN, LSTM and CNN-LSTM models on the test set for the next hour (12 steps) is shown in Table 2.
It can be seen from Table 2 that the $e_{MAE}$, $e_{MAPE}$, $e_{NRMSE}$ and $e_{RMSE}$ of the CNN-LSTM model are lower than those of the CNN and LSTM models at every step in the next hour. The average MAE, MAPE, RMSE and NRMSE over all steps decrease by 33.77%, 30.69%, 25.3% and 23.3%, respectively, compared with CNN, and by 12.0%, 10.6%, 14% and 12.7%, respectively, compared with LSTM. The wind farm power forecasts of the three algorithms and the real power values of the test set are compared in Figure 9. Four samples of next-hour (12-step) prediction results are randomly selected, and the specific power values are shown in Table 3. It can be seen from Figure 9 that CNN-LSTM has a better forecasting effect than CNN and LSTM.
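For reference, a relative decrease of this kind is conventionally computed as shown below. This is the standard definition and is assumed here, since the text does not state the formula; $\bar{e}_{\text{base}}$ (the average error of the baseline model, CNN or LSTM) and $\bar{e}_{\text{CNN-LSTM}}$ are illustrative names.

```latex
\[
\Delta e = \frac{\bar{e}_{\text{base}} - \bar{e}_{\text{CNN-LSTM}}}{\bar{e}_{\text{base}}} \times 100\%
\]
```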

CONCLUSIONS
Considering the temporal and spatial correlation of the meteorological factors of different sites throughout the wind farm, a novel STCM based on CNN-LSTM for ultra-short-term wind power prediction is proposed. In order to extract the temporal and spatial correlations in depth, a new data reconstruction method is proposed, which saves data processing time while extracting spatio-temporal correlations for multi-step wind power prediction. Specifically, the wind speeds and wind directions measured by benchmark wind turbines are reconstructed and used as the input of the model. The wind farm power forecasting model and the error calculation method based on CNN-LSTM are presented. To verify the accuracy and superiority of the proposed model, the wind speeds and wind directions measured by four benchmark wind turbines and the wind power data from a wind farm in China are used in the experiments, and four evaluation metrics are compared with those of CNN and LSTM used individually. The experimental results show that the average MAE, MAPE, RMSE and NRMSE over all steps decrease by 33.77%, 30.69%, 25.3% and 23.3%, respectively, compared with CNN, and by 12.0%, 10.6%, 14% and 12.7%, respectively, compared with LSTM. The proposed STCM based on CNN-LSTM for multi-step wind power forecasting fully considers the spatio-temporal correlation of meteorological factors throughout the wind farm and can forecast the power of the wind farm more accurately. However, some shortcomings remain to be addressed. The different degrees to which the meteorological factors of different sites contribute to the power of the wind farm need to be explored further. Assigning different weights to factors according to their contribution in the learning process will help to further improve the forecasting accuracy of wind power based on deep learning algorithms, which is the next research focus. In addition, it is necessary to improve the optimiser algorithm of the NN.