GCNInformer: A combined deep learning model based on GCN and Informer for wind power forecasting

Wind power is green, clean, and renewable, but also random and volatile. The integration of unstable wind energy severely threatens the security and stable operation of the power system. Enhancing the reliability of wind power grid integration, mitigating the impact of wind power uncertainty, and developing a robust prediction model have therefore become pressing issues. However, few studies have considered the correlations among the power outputs of multiple adjacent wind turbine arrays. In this paper, we propose GCNInformer to model these relationships. Furthermore, we analyze the relationships among multiple features of individual wind turbines. GCNInformer is composed of two main components. The first employs a graph convolutional network (GCN) to establish relationships among multiple wind turbine arrays, enhancing the correlation of the data. The second employs Informer to extract temporal information from the data and predict long-term sequences. For training and testing, GCNInformer uses two data sets: Data_CQ and Data_DL. The model's performance is evaluated using several metrics: mean absolute percentage error, mean absolute error, root mean square error, and mean square error. Extensive experimental results validate the effectiveness of GCNInformer.


| INTRODUCTION
The need to develop renewable energy sources has become more urgent due to rising crude oil prices and continued depletion.1 Wind energy is one of these nonpolluting renewable energy sources: it is abundant, virtually unlimited, widely distributed, and clean, can moderate the greenhouse effect, and has great potential for development. Wind power generation has gained significant momentum and has become a preferred choice for many countries.2 According to the Global Wind Report 2022, 93.6 GW of new wind power was installed worldwide in 2021, and by the end of 2021, the cumulative global installed capacity of wind power reached 837 GW, up 12.4% year-on-year. A record high of almost 21 GW, more than three times the installed capacity of the prior year, was reached for annual offshore wind generating capacity. Global onshore wind power is predicted to add 466 GW of new installations between 2022 and 2026, with an average annual addition of 93.3 GW and a compound annual growth rate (CAGR) of 6.1%, according to the Global Wind Energy Council Market Intelligence platform. Over the same period, the global offshore wind power market is expected to add over 90 GW of new installations, with an average annual addition of 18.1 GW and a CAGR of 8.3%. Nevertheless, as installed capacity increases, the efficient use of wind resources has become an issue that must be addressed.3 Wind uncertainty is a major challenge in wind power generation, as wind availability has a direct impact on the generation process.4 Accurate forecasting of wind power has become a critical task with far-reaching implications, helping to lessen the uncertainty arising from the chaotic nature of wind.
A novel hybrid deep learning model for wind power prediction is presented in this study. To predict future wind power values, the proposed model utilizes past wind power data. Compared to previous research, the model presented in this paper demonstrates significant performance improvements, exhibiting higher accuracy and precision. The key contributions of this study can be summarized as follows:
1. This paper proposes GCNInformer, a novel prediction network combining a graph convolutional network (GCN) and the Informer model for effective wind power prediction. By jointly training the GCN and Informer models, GCNInformer improves the accuracy and reliability of wind power forecasts.
2. By employing the GCN algorithm, we iteratively collect the feature information of each node's topological neighbors from the historical power sequence data of wind turbine units and integrate them to establish connections among the individual wind turbine units. This enables the extraction of additional time-series features and addresses the issue of insufficient time-series information within a single time scale, thereby enhancing the correlation of the data.
3. The data set contains the historical power of several neighboring wind turbines, and there is a high degree of spatiotemporal correlation among these data. We use the Informer wind power forecasting model to comprehensively learn the time-series characteristics and improve model generalization and robustness. This model combines self-attention and multihead attention mechanisms, enabling more accurate capture of relationships between different time scales and variables and the extraction of multiscale features. As a result, it further improves prediction accuracy and efficiency.
This research employs the Informer model for wind power prediction in the long-sequence continuous prediction context. The following outlines the structure and organization of this study. Section 2 outlines the related work. The algorithm structure of GCNInformer, which is based on GCN and Informer, is presented in Section 3. The data set used, the data preprocessing steps, and the partitioning of the wind power forecasting data set are presented in Section 4. Section 5 offers wind power prediction experiments showing that the proposed GCNInformer structure can improve the accuracy of wind power forecasts. A summary of the conclusions drawn in this study is provided in Section 6.

| RELATED WORK
Wind power forecasting models are commonly categorized into physical, statistical, artificial intelligence, and combinatorial models.5 Constructing a physical prediction model requires a set of mathematical equations and physical laws based on multivariate physical parameters, considering the current and future climate conditions at the wind farm, meteorological information such as temperature and pressure, and site-specific conditions such as surface roughness and topography.6 However, the modeling procedure is intricate and takes a lot of time to compute. The most widely used physical models are numerical weather prediction (NWP) techniques.7 NWP indirectly predicts wind power by converting data into wind speed at the hub height of the wind turbine using microscale meteorology and computational fluid dynamics. The physical model is suitable for new wind farms and does not require historical wind farm power data, but it does need extensive and accurate NWP data, which is computationally expensive.8 The statistical model utilizes as much historical wind farm data as possible9 and uses algorithms such as autoregressive (AR),10 AR integrated moving average (ARIMA),11 and Kalman filter12 models to extract a linear relationship between wind power input characteristics, establishing a mapping relationship between them that replaces physical causality analysis. The AR model is considered a representative method for wind power forecasting and was widely used in the early development of the field.13 The ARMA model combines AR and moving average models.14
In the literature,15 the use of the differential ARIMA model for short-term wind power prediction has been proposed to deal with the nonstationarity of wind power time series. This model overcomes the smoothness requirement of the ARMA model, which makes the latter unsuitable for the highly stochastic and unstable nature of wind power. The statistical method is widely used to predict short- and very short-term wind power, especially when meteorological information is unavailable for wind power forecasting. It has high prediction accuracy provided that sufficient historical data are available. However, the forecasting performance of statistical models is limited by nonlinear and nonsmooth data,16 their inability to deal with abrupt changes in the information, and their need for extensive historical data.
With the advancements in computer hardware and software, increased computing power, and the widespread adoption of artificial intelligence theories, statistical methods have been extended with machine learning techniques such as artificial neural networks (ANNs),17 support vector machines (SVMs),18,19 and extreme learning machines (ELMs).20 These approaches learn a mapping model between input data and wind power generation, allowing the prediction of future power generation.21 ANNs simulate the neural functioning of the human brain to describe the nonlinear relationship between wind power and meteorological data. SVMs transform the input space using nonlinear transformations to achieve high-dimensional representations based on statistical theory. SVMs have been introduced into wind energy forecasting and have achieved better forecasting results because they require fewer examples and avoid both dimensional catastrophes and local optima. However, the choice of the kernel function and its associated parameters depends largely on the designer's experience and the wind speed information. The ELM model is popular in various prediction fields since it converges faster and involves less human interference.
Researchers have begun to explore deep learning methods in wind power forecasting due to the rapid development of deep learning theory and its successful application in various domains. Compared to other approaches, deep neural network models have demonstrated superior predictive performance,22 including convolutional neural networks (CNNs),23 recurrent neural networks (RNNs),24 and long short-term memory networks (LSTMs).25 Neural networks are powerful tools for achieving improved predictive performance: increasing the number of hidden layers enhances a network's ability to learn nonlinear relationships. CNN is an efficient feature extraction method that preserves the correlation among features. In recent years, many researchers have incorporated CNNs into deep learning models as feature extraction modules to extract local information for deep data mining. RNN, with its recurrent network structure, can store the output values of hidden layer neurons and reintroduce them in subsequent iterations, enabling effective learning of nonlinear features.26 LSTM, with its unique forget gate structure, adaptively preserves useful information in sequences, compensating for the structural limitations of RNN.27,28 However, these models still have their limitations. For instance, CNN can only capture dependencies among short-term data, whereas RNN lacks long-term storage units and is prone to vanishing gradients.29 While LSTM, as a special type of RNN, can leverage the internal correlations among time-series data, its predictive accuracy is lower when dealing with discontinuous data features.30 The Transformer uses a self-attention mechanism to capture intricate relationships and patterns in time-series data, but it still suffers from high time and space complexity and limits on the length of the input and output sequences.31
While each wind power prediction method has its own advantages, each also has limitations in application. A single prediction model can no longer meet the required prediction accuracy: the large errors currently obtained by single prediction methods are mostly caused by the intermittency and uncertainty of wind energy as well as each method's own limits. As a result, by integrating the benefits of several models, a combination of several forecasting approaches can be employed to overcome the shortcomings of a single algorithm and increase forecasting accuracy. The combined approach considers all influencing factors and builds a prediction model from a broader perspective. By integrating the strengths of each prediction method and overcoming their limitations, the combined model achieves optimal prediction outcomes. While methods based on statistics are commonly used for ultrashort- and short-term wind power forecasting, in realistic scenarios involving complex terrains and larger wind farms, they are often combined with physical prediction methods to achieve more accurate results. Hybrid models that integrate machine learning and deep learning approaches have proven reliable in predicting wind speed32 and have great potential in solving high-accuracy wind speed prediction problems. For wind speed prediction, Li et al.33 incorporated spatial and temporal correlations, using a CNN for spatial information extraction and an LSTM for temporal feature extraction. A general temporal convolutional network that accounts for time-series features and is appropriate for all such tasks was proposed by Bai et al.34 after redesigning the CNN network structure; it has been demonstrated to handle time-series prediction problems exceptionally well. Using a mixture of LSTM and SVM algorithms, Chen et al.35 proposed a deep learning strategy for time-series prediction. This nonlinear learning integration explores and exploits the hidden information in the wind speed time series. To mitigate the risk of overfitting and enhance the learning of the underlying pattern of wind power, Han et al.36 proposed an improved LSTM network topology. This architecture effectively reduces the long-term memory of the stochastic component while preserving its short-term memory. A short-term wind prediction model with stacked noise-reducing autoencoders and a transfer learning strategy was introduced by Hu et al.37 This approach transferred information from data-rich wind farms to new wind farms with limited data, enabling accurate wind speed prediction in the absence of sufficient data. Chen et al.38 introduced a new two-layer nonlinear combination method, known as EEL-ELM. The method first predicts wind speed separately by exploiting the advantages of the ELM, evolutionary neural network, and LSTM models, and then learns the relationship between the single prediction results and the final prediction result using a nonlinear aggregation mechanism based on ELM to obtain better prediction performance. Li et al.39 constructed a Keras-based multivariate LSTM (MV-LSTM) model for short-term wind prediction. The Transformer model for time-series data prediction was first presented by Wu et al.31 Its self-attention mechanism effectively captures intricate dynamics and patterns within time-series data. The Informer model, which accelerates inference for long-sequence prediction tasks, was introduced by Zhou et al.40 A multiview neural network architecture was presented by Xiong et al.41 They employed gated recurrent unit neural networks to extract physical features and analyzed the patterns between these physical features and wind power. Table 1 provides a summary of the combinatorial models.
In conclusion, most recent research advances in wind power prediction are based on artificial intelligence models such as ANN, CNN, deep NN, RNN, and Transformer. These methods can better capture the time dependence of time series, improving wind power prediction. However, many practical applications require predicting long time series, and the power correlation among adjacent wind turbines must also be considered. At the same time, the prediction results of these models could still be improved. To use wind power effectively in the big data era, learning how to construct more precise wind power prediction models is crucial.

| GCNINFORMER
To forecast wind power, this research presents a hybrid network model that blends GCN and Informer. This section first introduces the GCN used to extract data characteristics, then the Informer used to predict wind power, and finally the proposed GCNInformer.

| The proposed model GCNInformer
In the GCNInformer, the time-series data from 10 wind turbines are transformed into 10 graph structures. The latent correlation layer captures the correlations between the data from different wind turbines. The 10 GCN modules are then used to extract feature information from each of the 10 graph structures for fusion learning, and the results are combined as the input to the Informer network. The last fully connected layer obtains the final output. Figure 1 depicts the model's overall structure.

TABLE 1 Summary of the combinatorial models.

| Model | References | Features |
| --- | --- | --- |
| CNN-LSTM | Li et al.33 | Combines time-series forecasting and variable regression to predict the future energy mix |
| TCN | Bai et al.34 | Significantly better performance than general recurrent architectures and longer memory than recurrent architectures of the same capacity |
| EnsemLSTM | Chen et al.35 | Avoids the defects of weak generalization ability and poor robustness |
| ILSTM | Han et al.36 | Lessens the effect of the stochastic component on the patterns in the long-term memory units while preserving the current random component in the network's short-term memory |
| EEL-ELM | Chen et al.38 | Enhances the accuracy and speed of wind speed forecasting, as well as the stability of forecast outcomes |
| MV-LSTM | Li et al.39 | Comprehensively considers multidimensional features and has high prediction accuracy |
| Transformer | Wu et al.31 | A versatile framework that may be used with time-series data for different types of variables |
| Informer | Zhou et al.40 | Handles longer input sequences and accelerates long-sequence prediction while reducing the time and space complexity of Transformer |

Abbreviations: CNN, convolutional neural network; ILSTM, improved long short-term memory; MV-LSTM, multivariate LSTM; TCN, temporal convolutional network.

| Description of latent correlation layer
The latent correlation layer applies a self-attention mechanism to the hidden representation of the turbine data to learn an attention matrix encoding the pairwise correlations among turbines:

W = Softmax(Q_w K_w^T / √d_w),

where d_w is the size of the hidden dimension of Q_w and K_w; Q_w and K_w stand for the query and key, respectively, obtained by linear projections of R, which denotes the most recent hidden state. The overall time complexity of the latent correlation layer is O(Nd).
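As an illustration, the scaled dot-product attention above can be sketched as follows. The array shapes and the projection names `Wq`/`Wk` are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def latent_correlation(R, Wq, Wk):
    """Compute an N x N attention matrix W = Softmax(Q K^T / sqrt(d)).

    R      : (N, h) most recent hidden state, one row per wind turbine
    Wq, Wk : (h, d) assumed learned projections for query and key
    """
    Q = R @ Wq                      # (N, d) queries
    K = R @ Wk                      # (N, d) keys
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (N, N) scaled dot products
    # row-wise softmax: each row is a probability distribution over turbines
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
R = rng.normal(size=(10, 16))       # 10 turbines, hidden size 16 (assumed)
Wq = rng.normal(size=(16, 8))
Wk = rng.normal(size=(16, 8))
W = latent_correlation(R, Wq, Wk)
```

Each row of `W` can then serve as the learned (weighted) adjacency for the corresponding turbine's graph structure.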

| Description of GCN
GCNs42 are a variant of multilayer CNNs capable of learning features from unstructured graphs, operating directly on the network. A GCN functions as a feature extractor, much like a CNN. It learns a representation for each node by iteratively aggregating feature information from the node's topological neighbors and fusing it; that is, a node's information is derived by the GCN from the data of the nodes connected to it. Given a batch of data with N_w nodes, each node has its own features, forming an N_w × D_w feature matrix X_w. The relationships between the nodes form an N_w × N_w adjacency matrix A_w, and X_w and A_w are the inputs of the GCN.
The GCN algorithm relies on the Laplacian matrix, which is commonly used to represent graphs in graph theory. For a graph G = (V, E), the Laplacian matrix is defined as L_w = D_w − A_w, where L_w is the Laplacian matrix, D_w is the degree matrix of the vertices (indicating how many degrees each node has, that is, how many edges are connected to it), and A_w is the adjacency matrix of the graph. The key part of the GCN formula is derived as follows:
1. The core of GCN is based on the spectral decomposition of the Laplacian matrix. For a Laplacian matrix, the spectral decomposition is L_w = U_w Λ U_w^T, where Λ is the diagonal matrix of eigenvalues and U_w is the matrix of eigenvectors.
2. Then, the convolution operation is performed on the graph, where g ⋆ x = U_w g_θ(Λ) U_w^T x denotes the graph convolution of a filter g_θ with the feature vector x:
• First, take the Fourier transform of the input feature x, that is, U_w^T x.
• Then, scale the Fourier transform result by the spectral filter, that is, g_θ(Λ) U_w^T x, and transform it back with U_w. The advantage of defining the filter as a polynomial of the eigenvalues, g_θ(Λ) = Σ_{k=0}^{K} θ_k Λ^k, is that the graph convolution can be rewritten as g ⋆ x = Σ_{k=0}^{K} θ_k L_w^k x, avoiding the explicit eigendecomposition.
• The Chebyshev polynomials are then used to simplify the computation. The Chebyshev recurrence is T_0(x) = 1, T_1(x) = x, and T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), giving the approximation g ⋆ x ≈ Σ_{k=0}^{K} θ_k T_k(L̃_w) x, where L̃_w = (2/λ_max) L_w − E is the rescaled Laplacian.
• Based on empirical values, K is set to 1, that is, the first-order approximation with a single parameter θ.
• Finally, the GCN layer-to-layer propagation is as follows:

H^(l+1) = σ( D̃_w^(−1/2) Ã_w D̃_w^(−1/2) H^(l) Θ^(l) ).

Here, Ã_w = A_w + E denotes the adjacency matrix augmented with the connections of the other nodes to each node plus self-connections, E is the identity matrix, D̃_w is the degree matrix of Ã_w with D̃_wii = Σ_j Ã_wij, and Θ^(l) represents the trainable parameter (weight) matrix of the neural network's lth layer. σ(·) indicates a nonlinear activation function, which in this paper is leakyrelu().
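The layer-to-layer propagation rule can be sketched in a few lines of NumPy; the function name `gcn_layer` and the toy three-node graph are illustrative assumptions, not the paper's code:

```python
import numpy as np

def gcn_layer(X, A, Theta, alpha=0.01):
    """One GCN propagation step: H = LeakyReLU(D~^(-1/2) A~ D~^(-1/2) X Theta).

    X     : (N, D) node feature matrix
    A     : (N, N) adjacency matrix (without self-loops)
    Theta : (D, F) trainable weight matrix
    """
    N = A.shape[0]
    A_tilde = A + np.eye(N)                    # add self-loops: A~ = A + E
    d = A_tilde.sum(axis=1)                    # degrees: D~_ii = sum_j A~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D~^(-1/2)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    H = A_hat @ X @ Theta                      # aggregate neighbors, project
    return np.where(H > 0, H, alpha * H)       # LeakyReLU activation

# Toy path graph 0 - 1 - 2 with one-hot node features (assumed example)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)
Theta = np.ones((3, 2))
H = gcn_layer(X, A, Theta)
```

Stacking several such layers lets each node aggregate information from progressively larger topological neighborhoods, which is how the 10 GCN modules enrich each turbine's time-series features.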

| Description of Informer
Informer is a long-sequence time-series forecasting model based on the Transformer.40 It lowers the computational and spatial complexity of self-attention and optimizes the encoder and decoder components. Informer primarily addresses the problem of long-sequence data prediction. Figure 4 illustrates its general structure.
In the encoder, self-attention is replaced by ProbSparse self-attention, a new attention mechanism used to reduce computation; the number of network parameters and dimensions is decreased through self-attention distilling, while robustness is increased with stacked layer replicas. The decoder receives the long-sequence input with the target part padded with zeros. The generative-style decoder predicts the target part and generates all predictions in a single step instead of dynamic decoding.
ProbSparse self-attention optimizes the time complexity of the self-attention dot-product computation, reducing it from O(L²) to O(L log L). This optimization improves the attention mechanism's efficiency and enhances the model's overall performance. In self-attention, the dominant dot-product pairs bias the attentional probability distribution of the query away from uniformity. This means that a sequence element will have a high correlation with only a few other elements, while the remaining dot-product pairs can be ignored or given lower attention weights. This selective attention mechanism allows the model to focus on the most relevant elements and improves computational efficiency. There must be a few larger probability values where the attention probability p(k_wj | q_wi) of the ith query over all keys significantly deviates from the uniform distribution q(k_wj | q_wi) = 1/L_K. The ith query's sparsity is assessed using the Kullback-Leibler divergence; the relative entropy between the two distributions yields the following sparsity measure for the ith query:

M(q_wi, K_w) = ln Σ_j e^(q_wi k_wj^T / √d) − (1/L_K) Σ_j q_wi k_wj^T / √d.

The first term calculates the Log-Sum-Exp (LSE) of the products of q_wi with all keys, while the second term calculates their arithmetic mean. However, this calculation has a memory consumption of O(L²), and the LSE operation has potential numerical stability problems. Accordingly, the query sparsity evaluation is approximated as follows:

M̄(q_wi, K_w) = max_j { q_wi k_wj^T / √d } − (1/L_K) Σ_j q_wi k_wj^T / √d.   (10)

When the indicator M̄(q_wi, K_w) is higher, the attention probability distribution p is more diversified and more likely to include the dominant dot-product pairs in the head area of the long-tailed self-attention distribution. The N_w queries with the highest sparsity scores are selected, only the dot products of these N_w queries with the keys are computed, and then the
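A minimal sketch of the sparsity measurement and top-query selection, assuming a NumPy implementation and a sampling factor `c` (both illustrative, not the authors' code):

```python
import numpy as np

def sparsity_scores(Q, K):
    """Approximate sparsity M~(q_i, K) = max_j s_ij - mean_j s_ij,
    where s_ij = q_i . k_j / sqrt(d)."""
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)                 # (L_Q, L_K) scaled scores
    return S.max(axis=1) - S.mean(axis=1)    # high score => "active" query

def probsparse_attention(Q, K, V, c=5):
    """Attend with only the u = c * ln(L_Q) most active queries; the
    remaining ("lazy") rows fall back to the mean of V, mimicking the
    near-uniform attention they would produce anyway."""
    L_Q, d = Q.shape
    u = min(L_Q, int(np.ceil(c * np.log(L_Q))))
    top = np.argsort(sparsity_scores(Q, K))[-u:]   # indices of active queries
    out = np.tile(V.mean(axis=0), (L_Q, 1))        # lazy queries: mean of values
    S = Q[top] @ K.T / np.sqrt(d)
    e = np.exp(S - S.max(axis=1, keepdims=True))   # softmax over keys
    out[top] = (e / e.sum(axis=1, keepdims=True)) @ V
    return out

rng = np.random.default_rng(1)
L, dk = 96, 8
out = probsparse_attention(rng.normal(size=(L, dk)),
                           rng.normal(size=(L, dk)),
                           rng.normal(size=(L, dk)))
```

Only u of the L query rows require full dot products, which is the source of the O(L log L) complexity.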
attention results are retrieved:

A(Q_w, K_w, V_w) = Softmax( Q̄_w K_w^T / √d_0 ) V_w,   (11)

where Q̄_w is a sparse matrix of the same size as Q_w containing only the selected queries, N_w = c × ln L_Qw for a constant sampling factor c, and d_0 is the input dimension. Self-attention distilling is employed to reduce the number of network parameters and dimensions. Through the distilling operation, a concentrated self-attention feature map that prioritizes dominant features is generated in the subsequent layer. The distilling operation from layer j to layer j + 1 is as follows:

X_w^t(j+1) = MaxPool( ELU_w( Conv1d_w( [X_w^t(j)]_AwBw ) ) ),   (12)

where [•]_AwBw contains the key operations of the multihead ProbSparse self-attention and attention block; the Conv1d_w operation represents a one-dimensional convolution applied to the time series with a kernel width of 3, and the ELU_w activation function is used in this operation.
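A rough sketch of the distilling step under these definitions; the kernel array `W` and the stride-2 max-pooling (which halves the sequence length between encoder layers) are illustrative assumptions:

```python
import numpy as np

def elu(x):
    """ELU activation: x for x > 0, exp(x) - 1 otherwise."""
    return np.where(x > 0, x, np.exp(x) - 1.0)

def distill(X, W):
    """Self-attention distilling step: Conv1d (kernel width 3) -> ELU ->
    max-pool over time with stride 2, halving the sequence length.

    X : (L, d) attention-block output at layer j
    W : (3, d, d) assumed convolution kernel over the time axis
    """
    L, d = X.shape
    Xp = np.pad(X, ((1, 1), (0, 0)))                     # 'same' padding in time
    conv = np.stack([sum(Xp[t + k] @ W[k] for k in range(3))
                     for t in range(L)])                 # (L, d)
    act = elu(conv)
    # max-pool over time, window 2, stride 2 -> (L // 2, d)
    return np.maximum(act[0::2][: L // 2], act[1::2][: L // 2])

rng = np.random.default_rng(2)
X = rng.normal(size=(48, 4))          # 48 time steps, model width 4 (assumed)
W = rng.normal(size=(3, 4, 4)) * 0.1
Y = distill(X, W)
```

Each distilling step halves the temporal dimension, so stacked encoder layers operate on progressively shorter, more concentrated feature maps.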
The generative-style decoder is optimized so that the model predicts all outputs directly through one forward pass, without the need for step-by-step dynamic decoding. In addition, it alleviates the problem of speed dips in long-horizon prediction. The vector the decoder accepts is X_de = Concat(X_token, X_0) ∈ R^((L_token + L_y) × d_model), where X_token is the start token sequence and X_0 is a zero-padded placeholder for the target sequence.

| DATA SETS FOR PREDICTING WIND POWER
Effective data analysis and processing techniques, in addition to the model itself, are crucial for accurate prediction outcomes, as they can directly or indirectly affect the prediction results. The data sets used in this experiment are thoroughly described in this section. The study employed two data sets, named Data_CQ and Data_DL. Data_CQ is used to study historical wind power prediction across multiple adjacent wind turbines. Data_DL is employed to investigate the prediction of wind power based on wind speed, wind direction, and ambient temperature. In addition, the historical wind power data need to be standardized, missing values need to be filled, and the data set needs to be reasonably divided. These analyses help us better comprehend the data and gather the information needed to obtain our experimental outcomes.

| About Data_CQ
The Data_CQ is derived from historical wind power data collected between September 21, 2020 and October 7, 2020, from a real wind farm in Chongqing, China, with a 1-s data interval. The data set consists of historical data from 10 wind turbines, each with 1,048,575 time points, so the whole data set can be viewed as a 1,048,575 × 10 matrix. Figure 5 displays the historical wind power data of Data_CQ. Because atmospheric motion is continuous, there must be some correlation among the wind resources of nearby places. A wind farm usually contains several wind turbines, and these turbines behave with a high degree of consistency owing to their similar weather, climate, and parameter settings and the operator's influence. Therefore, there is a connection among the wind power generation of neighboring turbines within the same wind farm. Table 2 systematically describes the data; the data distribution for the wind farm is summarized by its minimum, maximum, mean, median, and mode. These values are 0.08, 14.02, 4.7158, 4.36, and 2.72 MW, respectively.
Figure 6 depicts box plots used to visualize the distribution of the original data before preprocessing. It is evident from the figure that the data exhibit a uniform and stable distribution.

| About Data_DL
The Data_DL data set consists of historical wind power data from a wind turbine in Dalian, China. The measurements were taken at 10-min intervals and cover the period from January 1 to December 31, 2014. There are 52,068 records, each containing the wind speed, wind direction, ambient temperature, and wind power during operation. We use these features to investigate wind power prediction. Figure 7 displays the historical wind power data. In addition, Table 3 comprehensively describes the Data_DL data set.
We used the Pearson correlation coefficient (PCC) to further evaluate the impact of wind speed, wind direction, and ambient temperature on wind power. Based on the PCC heatmap analysis of the Data_DL data set in Figure 8, the following conclusions can be drawn. The PCC between wind speed and wind power is 0.81, indicating a highly significant positive correlation between the two variables: wind power exhibits a noticeable increasing trend as wind speed increases. The PCC between wind direction and wind power is 0.73, indicating a significant positive correlation and a relatively strong linear relationship; specific wind directions may be associated with higher wind power. The PCC between ambient temperature and wind power is 0.63, indicating a moderate positive correlation. This suggests some correlation between an increase in ambient temperature and an increase in wind power, although it is weaker than the correlations of wind speed and wind direction with wind power. Therefore, changes in wind speed, wind direction, and ambient temperature have varying degrees of impact on wind power: wind speed exhibits the strongest correlation, followed by wind direction, while the correlation with ambient temperature is relatively moderate.
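The PCC follows directly from its definition as the covariance divided by the product of the standard deviations. The sketch below uses hypothetical speed/power readings, not the Data_DL values:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()          # center both series
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

speed = [3.1, 4.0, 5.2, 6.8, 7.5, 9.0]   # hypothetical wind speeds (m/s)
power = [0.4, 0.9, 1.8, 3.1, 3.9, 5.2]   # hypothetical power outputs (MW)
r = pearson(speed, power)
```

Applying this pairwise to wind speed, wind direction, and ambient temperature against wind power yields the entries of the heatmap in Figure 8.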

| Preprocessing of data sets
Since different wind turbine data have different fluctuations, numerical problems can arise when the data volume is too large. The historical wind power data used in this study are normalized to increase convergence rates and simplify the optimization process of the GCNInformer model. By standardizing the data and creating an initial feature set, the gradient descent algorithm is accelerated, leading to faster convergence toward the optimal solution. This normalization step aids in enhancing the model's performance. In this paper, the Z-score standardization method43,44 is used to normalize the data. This method centers the data by subtracting the mean and scales it by dividing by the standard deviation of the original data. The processed data have a mean value of 0 and a standard deviation of 1 and are distributed normally. This standardization technique brings the data to a common scale and facilitates the comparison and analysis of different variables. The Z-score standardization transformation formula is as follows:

z_w = (m_w − μ_w) / σ_w,

where m_w is the historical power generation data of the wind turbine, μ_w is the mean value of the sample data, calculated as μ_w = (1/n) Σ_i m_wi, and σ_w is the overall sample data's standard deviation, given by

σ_w = √( (1/n) Σ_i (m_wi − μ_w)² ).

Equipment faults or data acquisition failures may occur during the real operation of a wind farm, which can result in missing data from a wind turbine at certain times. To address the problem of missing data, this paper uses the mean-fill method. Since the amount of anomalous data is small in comparison to the entire data set, the impact of data anomalies is negligible.
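The mean-fill and Z-score steps can be combined in a short sketch; the function name and the sample power values are hypothetical:

```python
import numpy as np

def zscore_with_mean_fill(m):
    """Mean-fill missing values, then Z-score normalize: z = (m - mu) / sigma."""
    m = np.asarray(m, dtype=float)
    mu_fill = np.nanmean(m)                  # mean of the observed values
    m = np.where(np.isnan(m), mu_fill, m)    # mean-fill the gaps
    mu = m.mean()
    sigma = m.std()                          # population standard deviation
    return (m - mu) / sigma

# Hypothetical power readings (MW) with two missing samples
power = [4.2, 3.9, np.nan, 5.1, 4.7, np.nan, 4.4]
z = zscore_with_mean_fill(power)
```

The output series has mean 0 and standard deviation 1, which is the scale the GCNInformer model trains on.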

| Division of data sets
The commonly available methods for data set partitioning include hold-out, cross-validation, bootstrapping, and others. Considering the temporal correlation present in the Data_CQ and Data_DL data sets, earlier data contribute significantly to model training, while more recent data are crucial for accurate future predictions. Hence, in this study, the hold-out method was employed to partition the normalized data set into three subsets, namely the training set, validation set, and test set, following temporal order with a ratio of 7:2:1. In comparison to other partitioning methods, the hold-out method enables better simulation of real-world applications, since the data set is partitioned based on temporal order.

| EXPERIMENTS

The evaluation metrics and parameter selection are briefly introduced in this section. Then, we compare the model proposed in this paper with previous models. Finally, we show the power prediction curves of the model on other wind turbines.
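The 7:2:1 temporal hold-out split described above can be sketched as follows; the helper name is hypothetical:

```python
def temporal_split(n, ratios=(0.7, 0.2, 0.1)):
    """Split n time-ordered samples into train/val/test index ranges (7:2:1),
    preserving chronology: train is earliest, test is most recent."""
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n)
    return train, val, test

# e.g. the Data_DL record count
train, val, test = temporal_split(52068)
```

Unlike a random shuffle, this split never lets the model train on samples that occur after the ones it is evaluated on, mimicking deployment conditions.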

| Evaluation metrics
The accuracy of the hybrid model's predictions is evaluated quantitatively via several widely used error criteria, including the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and mean square error (MSE). These errors are calculated with the following formulas:

MAE = (1/n_w) Σ_i | y_wi − ŷ_wi |,

MAPE = (100%/n_w) Σ_i | (y_wi − ŷ_wi) / y_wi |,

MSE = (1/n_w) Σ_i (y_wi − ŷ_wi)²,

RMSE = √( (1/n_w) Σ_i (y_wi − ŷ_wi)² ),

where n_w stands for the total number of predicted wind power values, ŷ_wi is the predicted wind power, and y_wi represents the actual value.
MAE quantifies the average absolute discrepancy between the predicted and true values, measuring the magnitude of errors without considering their direction. MAPE is a modification of MAE that accounts for the range of the data: it assesses the percentage error between the true and predicted values, providing a relative measure. Precision can be evaluated using RMSE, the square root of MSE. MSE is a metric frequently used to gauge how well a regression model performs; it quantifies the average squared difference between the predicted and actual values. Regarding accuracy and precision in forecasting the target variable, the model performs better when all four metrics (MAE, MAPE, RMSE, and MSE) are lower.
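The four metrics can be computed directly from their definitions; this sketch assumes nonzero actual values so that MAPE is well defined:

```python
import numpy as np

def metrics(y_true, y_pred):
    """Return MAE, MAPE (%), RMSE, and MSE between actual and predicted power."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.abs(err).mean()                      # mean absolute error
    mape = 100.0 * np.abs(err / y_true).mean()    # assumes y_true != 0
    mse = (err ** 2).mean()                       # mean square error
    rmse = np.sqrt(mse)                           # root mean square error
    return mae, mape, rmse, mse

# Hypothetical actual vs. predicted power values (MW)
mae, mape, rmse, mse = metrics([4.0, 5.0, 6.0], [4.5, 4.5, 6.5])
```

Note that MSE penalizes large deviations more heavily than MAE, so comparing the two gives a rough sense of whether errors are uniform or dominated by outliers.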

| Experimental environment
The experimental code in this paper is implemented in a Python 3.7 environment using the deep learning framework PyTorch 1.8. The experiments were conducted on a PC running Windows 10 (AMD Ryzen 7 5800H with Radeon Graphics CPU at 3.20 GHz, 16 GB of RAM; NVIDIA GeForce RTX 3070 Laptop GPU).

| Hyperparameter selection
In this experiment, seven hyperparameters are set, as shown in Table 4. Two hundred iterations are performed per epoch throughout training, and the loss for each epoch is taken as the average loss over its 200 iterations. To obtain the final predictions, the model is evaluated on the test set. The model uses the Adam optimizer, the MSE loss, and GELU activation functions. The Adam algorithm adapts the learning rate and does not require a smooth objective function, making it suitable for handling noisy samples. We use a batch size of 64 and apply techniques such as early stopping and learning-rate reduction to prevent overfitting; as a result, the experimental results carry a certain randomness. Early stopping halts training when the model's performance on the validation set declines. Every three epochs form a cycle in which the currently optimal model parameters are saved; training stops once worse validation performance has been observed 15 times, and the parameters from the best earlier checkpoint are taken as the model's final parameters. The hyperparameters are selected using the control variable method, and extensive experiments show that the values above are the optimal choice for the model.
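The early-stopping policy described above can be sketched as follows. This is a minimal illustration rather than the authors' training code; the `EarlyStopper` class is hypothetical, and the demo uses a patience of 3 for brevity where the paper uses 15.

```python
class EarlyStopper:
    """Track validation loss; signal a stop after `patience`
    non-improving checks and keep the best parameters seen."""
    def __init__(self, patience=15):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_checks = 0

    def check(self, val_loss, model_state):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = dict(model_state)  # snapshot best parameters
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience  # True -> stop training

stopper = EarlyStopper(patience=3)
losses = [1.0, 0.8, 0.9, 0.95, 0.97]
stopped = [stopper.check(l, {"w": l}) for l in losses]
# training stops at the last check; best parameters are from loss 0.8
```

After the stop signal, `stopper.best_state` is restored as the model's final parameters, matching the checkpoint-and-restore behavior described above.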

| Comparison of GCNInformer with previous models
To evaluate its performance, we compared the GCNInformer model against the CNNInformer, 45 GCNTransformer, Informer, Transformer, LSTM, and RNN algorithms. This thorough comparison enables us to assess how the proposed GCNInformer model performs relative to these existing models. All algorithms were applied to the Data_CQ and Data_DL data sets.
On the Data_CQ data set, all seven models predict the average wind power 30 s ahead using 12 min of historical wind power data. Figure 10 shows some of the prediction results of the GCNInformer, CNNInformer, GCNTransformer, Informer, Transformer, LSTM, and RNN models applied to the data of wind turbine 1. Among them, the model proposed in this paper performs best, slightly ahead of CNNInformer. In contrast, LSTM and RNN perform poorly, differing significantly from GCNInformer, CNNInformer, GCNTransformer, Informer, and Transformer. On the Data_DL data set, all seven models predict the wind power for the following 6 h using 24 h of historical wind power data. Figure 11 displays the prediction errors of the seven models on Data_DL. The diagram indicates that the GCNInformer model exhibits the smallest fluctuation in prediction error while the RNN model fluctuates the most. Detailed results are documented in Table 6, where the best result for each metric is denoted in bold.
In summary, applying a graph convolutional network to the original wind power generation sequences improves the model's predictive performance. GCN can handle data with a generalized graph topology and mine its features and patterns. This approach allows exploring both the relationships among multiple adjacent wind turbines and the relationships among multiple features of a single turbine, such as wind speed, wind direction, ambient temperature, and wind power. In both cases, a graph structure is constructed, and GCN is used for information transfer and feature fusion on the graph. This process uncovers more associations among the data, which the Informer then uses for wind power prediction, thereby enhancing the predictions' accuracy, reliability, and efficiency.

For better observation, we intercept some of the consecutive prediction results. We can observe from the figure that GCNInformer performs superbly both for wind turbine 1 and for predicting the power of other wind turbines located within the same wind farm, indicating that GCNInformer possesses strong generalization capability. Furthermore, Tables 7-10 present the MAPE, MAE, RMSE, and MSE values for the six turbines (Nos. 2, 4, 6, 8, 9, and 10) under the seven models.
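As a sketch of the information transfer and feature fusion step, the following shows one standard symmetrically normalized GCN propagation layer (in the style of Kipf and Welling) over a toy turbine adjacency; the paper's exact layer, activation, and dimensions may differ.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: H = ReLU(D^-1/2 (A+I) D^-1/2 X W).
    A: adjacency among turbines, X: node features, W: weight matrix."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)    # ReLU

# Toy example: 3 adjacent turbines, 2 input features, 4 output features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.random.rand(3, 2)   # e.g., wind speed and power per turbine
W = np.random.rand(2, 4)
H = gcn_layer(A, X, W)     # fused features, one row per turbine
```

Each turbine's output row mixes its own features with those of its graph neighbors, which is the "information transfer and feature fusion" the text refers to before the fused features are passed to the Informer.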

| Comparison of the proposed model
Studying new methods for wind power prediction is crucial to enhancing its accuracy and reliability, which in turn helps ensure the safe operation of the smart grid. Research on wind power prediction is no longer limited to optimizing and combining models and algorithms; it focuses increasingly on mining and extracting the features and laws contained in multisource, multidimensional, and multimodal wind power data. The wind power of the same wind farm exhibits correlation across multiple adjacent time points on the time axis, and within the same time section, the power of multiple adjacent wind turbines is also correlated. The present study proposes a combined deep learning model, GCNInformer, which integrates GCN and Informer. First, we employ a latent correlation layer to transform the data into a graph structure. The GCN module is then used to establish potential relationships among multiple adjacent wind turbines, or to capture the correlations among multiple features of an individual turbine, before the Informer module performs long-term series forecasting. To further improve wind power prediction, future studies will explore the spatiotemporal correlation of wind power clusters more deeply, including the spatiotemporal correlation of wind turbines across multiple neighboring wind farms. By examining wind power data from various dimensions, we can enhance the effectiveness, accuracy, and reliability of wind power forecasts.

Figure 2 shows the latent correlation layer, which employs a self-attention mechanism to automatically learn correlations among multiple time series, helping capture the underlying relationships and dependencies within the data. The historical data $B \in \mathbb{R}^{N_w \times T}$ from 10 wind farms are first supplied to the latent correlation layer, and the hidden state corresponding to each timestamp $t$ is computed in turn. The correlation weight matrix $W_w \in \mathbb{R}^{N_w \times N_w}$ is obtained from the data, and the graph structure $G = (B, W_w)$ is automatically inferred. The weights follow the standard self-attention formulation:

$$Q = R W^{Q}, \quad K = R W^{K}, \quad W_w = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right)$$

where $R$ denotes the matrix of hidden states, $W^{Q}$ and $W^{K}$ are learnable projection matrices, and $d$ is the hidden dimension.
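A minimal sketch of how a self-attention latent correlation layer can infer a weight matrix from per-series hidden states is shown below; the projection matrices, hidden dimension, and variable names are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_correlation(R, Wq, Wk):
    """Infer a correlation weight matrix among N series via
    self-attention: W = softmax(Q K^T / sqrt(d))."""
    Q = R @ Wq                             # queries, one row per series
    K = R @ Wk                             # keys
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d))   # row-stochastic N x N matrix

rng = np.random.default_rng(0)
R = rng.standard_normal((10, 8))           # 10 turbines, hidden dim 8
Wq = rng.standard_normal((8, 8))
Wk = rng.standard_normal((8, 8))
W = latent_correlation(R, Wq, Wk)
```

The resulting matrix `W` is row-stochastic and serves as a learned, data-driven adjacency over the turbines, which the downstream GCN can use in place of a hand-crafted graph.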

Figure 1 shows the overall structure of the proposed model (GCN, graph convolutional network). Figure 3 displays the network structure diagram for this part of the model, visually representing the architecture and the connections among the network components involved in this section.

The parameters are shared such that $\theta_0 = \theta_1$. Due to various circumstances, including equipment failure, communication interference, and staff operation errors, some wind power data may be missing. Figure 7 shows the historical wind power data of Data_DL, and Table 3 gives a description of Data_DL.
By partitioning the data set based on time, we ensured that the test set contained data from future time periods, allowing for a more accurate assessment of the model's performance in future predictions. The division of the historical wind power data set is shown in Figure 9. In the Data_CQ data set, the training set encompasses 734,002 data points spanning September 21, 2020 to September 29, 2020; the validation set comprises 209,715 data points from September 29, 2020 to October 3, 2020; and the testing set encompasses 104,858 data points from October 3, 2020 to October 7, 2020. In the Data_DL data set, the training set consists of 36,447 data points spanning January 1, 2014, 00:00:00 to September 13, 2014, 12:50:00; the validation set comprises 10,414 data points covering September 13, 2014 to November 25, 2014, 00:40:00; and the test set includes 5207 data points spanning November 25, 2014 to December 31, 2014, 23:50:00. The training set is used to optimize the model's parameters. The validation set is used to evaluate the model during training, and the evaluation results guide model adjustment toward the best performance. The test set is then used to test and validate the model that performed best on the validation set.

Figure 8 shows Pearson's correlation coefficient heatmap analysis of the Data_DL data set.

Figure 9 shows the division of the historical wind power data sets.

Figure 12 displays the forecast curves for the wind power of Data_CQ's wind turbines 2, 4, 6, 8, 9, and 10.
Table 2 gives a description of Data_CQ, and Figure 6 shows the data box diagram of Data_CQ.

Table 5 reports the metrics of the seven models on Data_CQ, with the best result for each metric denoted in bold. According to Table 5, the MAPE, MAE, RMSE, and MSE for GCNInformer are 0.0411, 0.1035, 0.1540, and 0.0237, respectively; for Informer they are 0.0505, 0.1172, 0.1708, and 0.0292.
Figure 10 shows the curves of the prediction results on Data_CQ.

According to Table 6, which reports the metrics of the seven models on Data_DL, the proposed model achieves a MAPE, MAE, RMSE, and MSE of 0.1279, 0.0672, 0.0855, and 0.0073, respectively, whereas the CNNInformer model obtains 0.1564, 0.0737, 0.0965, and 0.0093. Compared with the CNNInformer model published in 2022, the proposed model has lower error metrics and better predictive performance. Additionally, the GCNInformer model outperforms the GCNTransformer, Informer, Transformer, LSTM, and RNN models. The consistency of these results with those on the Data_CQ data set indicates that GCNInformer performs stably and reliably across different data sets, supporting its generalization capability. Moreover, it can predict wind power both for multiple adjacent wind turbines and for a single wind turbine with multiple features.
Figure 11 shows the curve of the forecast results on Data_DL. Table 9 gives the RMSE comparison and Table 10 the MSE comparison of the seven models. Abbreviations: CNN, convolutional neural network; GCN, graph convolutional network; LSTM, long short-term memory network; MAE, mean absolute error; MAPE, mean absolute percentage error; MSE, mean square error; RMSE, root mean square error; RNN, recurrent neural network.