Multivariable neural network to postprocess short‐term, hub‐height wind forecasts

This work introduces a novel error correction method for short-term, hub-height wind speed forecasting systems aimed at power output prediction. We present a multivariable neural network that is trained to reduce the error in wind speed predictions from a numerical weather prediction (NWP) model by exploiting hidden information in additional atmospheric variables, that is, wind direction, temperature, and pressure. The unique layout of the network was influenced by that of denoising autoencoders and their ability to learn mapping functions. The predicted values from the NWP model, which incorporate errors due to numerical discretization, inaccuracies in initial/boundary conditions and parametrizations, complex terrain features, etc., are mapped to a more accurate prediction in which the errors have been reduced. To show the performance of the proposed model, training and validation are carried out with 4 years of forecasted and observed data for fifteen sites in a wind farm on Awaji Island, Japan, a challenging zone with complex topography and therefore complicated, highly fluctuating wind patterns. Moreover, a single-variable (i.e., wind speed) network is also implemented in order to expose the contribution and usefulness of including additional atmospheric variables. The results show a considerable reduction in the root mean square error as well as an increase in the correlation coefficient. As expected, it is found that using multiple meteorological variables as inputs offers a substantial advantage over the equivalent single-variable correction method.


| INTRODUCTION
The negative effects of environmental problems such as climate change, pollution, and energy insecurity have increased the pressure for cleaner and more efficient renewable energy sources. 1 Wind power is a fundamental part of the transition to renewable energy 2 ; in 2017, an estimated 17% of all renewable-generated electricity worldwide came from wind sources, 3 and wind energy corresponded to approximately 23% of all the renewable energy production capacity worldwide. 4 The potential for the exploitation of this resource is enormous; it is estimated that by 2050 it will be able to supply approximately 4.4 TW, which corresponds to roughly 37% of all of the end-use power supply in the world. 1 In Japan, wind energy plays a smaller role. Despite a reported potential of as much as 1900 GW, 5 as of 2017 the installed capacity was only approximately 3.4 GW. 3,4 In that same year, the output from wind power sources was roughly 3.7% of all renewable-generated electricity, 3,4 and merely 0.6% of all generated electricity. 3 In recent years, and particularly after the nuclear power supply was drastically cut due to the Fukushima disaster, the transition to renewable sources has gained momentum and support from both the private and public sectors. 5 However, fossil fuels still represent more than 80% of all energy sources. 3 The daily output of a turbine depends not only on the cube of the wind speed, 6 but also on the periods when the wind speed is between the cut-in and cut-out values. Grid scheduling and electricity trading may also be affected by the uncertainty and volatility of wind and therefore of the generator output. 7,8 Moreover, other aspects of operation, such as estimation of the life span of turbines and scheduling of maintenance, also require reliable wind speed forecasts. 9
That is, optimizing the use of wind energy and integrating it into existing grids relies on accurate knowledge of future wind conditions, especially in regions with complex topographic features. Such is the case of Japan, whose geography presents additional challenges due to steep, mountainous terrain that modifies the wind profile, introduces turbulence, and causes flow separation and/or flow turning. 10,11 In general, wind prediction models can be grouped into statistical and physical models. 12 Physics-based models, or numerical weather prediction (NWP) models, simulate the real state of the atmosphere by using known physical laws. Usually, these models perform better in forecasting upper atmospheric patterns, but errors and biases increase as one gets closer to the surface. 13 On the other hand, pure statistical methods obtain their forecasts from the analysis of recorded historical data and rely mostly on regression techniques. 14 Statistical methods are better at handling local and near-surface weather patterns, but their performance decreases rapidly as the number of simulated hours increases. [14][15][16] Most implementations of statistical models take the form of a postprocessing step, to correct and enhance the results of the dynamical NWP model. 14 Kalman filters, 17 analog schemes, 18,19 Ensemble Model Output Statistics, 20 and quantile regression 21 are among the most common methods for statistical postprocessing of weather forecasts. In recent years, machine learning (ML) and artificial intelligence techniques have also been developed, owing to the abundance of forecasted and observed data. Traditional postprocessing methods have been improved by the inclusion of ML components. In the context of ensemble forecasts, Ref.
22 replaced the link functions in a distributional regression model with a neural network able to predict the distribution parameters from a set of predictors, and applied it to 2-m temperature forecasts. The quantile regression method, originally developed by Bremnes, 21 was enhanced by Cannon 23 with the development of a monotone composite quantile regression neural network, and again by Bremnes 24 by using neural networks to estimate the coefficients of Bernstein basis polynomials, from which the quantile function is then constructed. Taillardat et al 25 instead developed a random forest generalization for quantile regression, first as a nonparametric approach and later extended within a semiparametric model. 26 Finally, Chapman et al 27 introduced the use of convolutional neural networks for the correction of gridded, two-dimensional integrated vapor transport forecasts, based on image processing techniques and denoising autoencoders.
For the particular case of wind speed, most ML-related applications have been on prediction based solely on past time series. A common approach would consist of a regression ML model, 16,[28][29][30] often in combination with an error correction or optimization component. 31,32 These implementations lack an NWP component, and therefore, their performance is limited to ultra-short-term applications, rendering them unusable for reliable wind power assessment or wind farm arrangement.
On the other hand, attempts at employing ML for error correction include the use of nonlinear autoregressive exogenous networks, 33 Markov stochastic processes, 34 sequence transfer algorithms, 35 polynomial neural networks, 15 support vector machines, 9 and bidirectional gated recurrent unit neural networks. 8 All of these methods require the output from an NWP model in either deterministic or ensemble form.
Nonetheless, we consider that the literature regarding ML for wind speed error correction is still very limited and deserves further exploration. Most of the literature does not focus on day-ahead wind speed forecasts at hub height, which are of crucial importance for wind farm arrangement. Moreover, only a few of these techniques have been validated in zones with highly fluctuating wind, an important point for the case of Japan, where complex topography results in complex wind profiles. Finally, to the best of the authors' knowledge, no ML-based correction method has considered the inclusion of other weather variables to improve the correction process, despite this being common practice in classical correction methods such as model output statistics (MOS), and despite the versatility of ML algorithms, especially neural networks.
Given these considerations, we present a novel postprocessing correction method for short-term, hub-height (80 m above ground) wind speed predictions from the Weather Research and Forecasting (WRF) model. We propose a neural network architecture whose inputs are the raw forecasts of multiple variables from the WRF model and whose output is an improved wind speed forecast. The main motivation behind the use of neural networks is their capability to recover nonlinear relationships and dependencies in complex geophysical processes, 36 as well as to learn from historical data, making them ideal for localized prediction and correction. Similarly to Ref. 27, the proposed method in this paper is also inspired by the working idea behind denoising autoencoders.
The main contributions of the framework presented in this paper are summarized in the following points:
• To the best of the authors' knowledge, this method is the first to employ a denoising autoencoder-based architecture to correct short-term wind predictions. No complex decomposition of the data is required, and no ensemble results are needed. The network layout is demand-oriented: scalable and modifiable for time series of different lengths, or for a different number of input variables depending on the available data.
• It is, to the best of the authors' knowledge, the first method to combine multiple weather variables and neural networks to correct wind predictions from an NWP model. We demonstrate that the error correction for a single variable (wind speed) can be improved significantly by providing the network with additional information about the atmosphere's state (wind direction, temperature, and pressure). We show that the network can find hidden information in these additional variables by evaluating the final weights of the trained network.
• The model is validated with data from fifteen sites on an island with complex geographic features and highly fluctuating winds. The sites are at different elevations and/or distances from the shore (as shown in Figure 1B), meaning that each site is influenced by different local wind patterns due to topographic interference. We show that the trained networks are able to learn from these patterns and succeed at reducing the error at all fifteen sites.
The rest of the paper is organized as follows. In Section 2, the information of the physical model and dataset is described. Section 3 shows the details of the proposed multivariable neural network architecture. Section 4 analyzes the performance of our proposed method, and a brief discussion is given. The paper ends with concluding remarks in Section 5.

| PHYSICAL MODEL AND DATA
This section describes the characteristics of the NWP model used to obtain the forecasts. Throughout this paper, we refer to the direct output from the NWP model as the raw or uncorrected forecast. This section also explains the collection of observed data at hub-height that is used to build and validate the performance of the correction models.
The high-resolution meteorological model chosen in this study is the WRF model with Advanced Research WRF (ARW) version 3.9.1. 37 It is a regional model based on a fully compressible and nonhydrostatic dynamic core and can be driven with proper initial/boundary conditions from other large-scale models (e.g., the Global Forecast System [GFS] in this study). The forecasting domain was configured by following the recommendations in 38,39. As shown in Figure 1A, one parent (D01) and three nested domains (D02, D03, and D04) with horizontal resolutions of 13.5, 4.5, 1.5, and 0.5 km, respectively, are chosen. There are 35 vertically stretched eta levels for all domains, 10 of which are within the lowest 1 km. The main physical options we used include the WRF Single-Moment 6-class (WSM6) microphysics parameterization scheme 40 ; the new Kain-Fritsch convective parameterization 41 ; the Noah land surface model 42 ; Dudhia shortwave radiation 43 and Rapid Radiative Transfer Model longwave radiation 44 ; and the Asymmetric Convective Model version 2 planetary boundary layer scheme. 45 GFS forecasting data from the National Centers for Environmental Prediction are chosen as the initial and boundary conditions for the regional high-resolution model. The data period is from 2016-01-03 to 2020-06-01 with a 6-hour interval, and a horizontal resolution of 0.5 × 0.5 degrees for all variables. The topographic data were fetched from the US Geological Survey global 30 arc-s elevation (GTOPO30) dataset for all domains.
Predictions are obtained for the same period (2016-01-03 to 2020-06-01). The WRF model is re-initialized as a "cold start" at 18:00 UTC (03:00 JST) each day, and each re-initialization runs for 30 hours. Outputs are produced at 1-hour intervals; however, the data for the first 6 hours, corresponding to the spin-up time, are discarded. 37 Hence, the final obtained time series covers the remaining 24 hours between 00:00 UTC (09:00 JST) and 23:00 UTC (08:00 JST).
The nacelle-based wind observations from 15 turbines of the wind farm of interest (red triangle in Figure 1), which is located in the south Awaji area, are used to build and evaluate the performance of the proposed model. Observed data are available every 10 minutes for the same period as the obtained GFS data. The data have been quality-controlled to some extent, based on the standards addressed in 46, to flag any unreasonable data points. Daily mean wind speed variations and the annual frequency histogram of wind speeds at the wind farm of interest are presented in Figure 2.

| FORECASTS WITH NEURAL NETWORKS
In this section, we present the architectures of our correction method, the logic behind their design, and the training details. We start with a short explanation of the concept of denoising autoencoders, which inspired our methods.

| Denoising autoencoders
Denoising autoencoders are a particular neural network architecture that can remove noise from the input data. The idea consists of a network that maps a vector of noisy data x̃ into a reconstructed vector x̂. The network learns this mapping during training by optimizing the parameters of the network to minimize the difference, given a cost function L, between the clean, noiseless data x and the reconstruction x̂. The input and output layers of a denoising autoencoder are of the same dimensions, and the hidden layers are usually of reduced dimensionality. A sketch of a denoising autoencoder is shown in Figure 3A.
This mapping from corrupted to uncorrupted data can be visualized as learning a manifold. 47,48 We assume the real information x lies in an abstract mathematical space, or manifold, and that the noisy information x̃ lies outside of this manifold. Hence, the neural network learns the distribution p(x|x̃) and uses it to bring x̃ to the more likely position x, as shown in Figure 3B.
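To make the mapping idea concrete, the following is a minimal numpy sketch of a denoising autoencoder in the spirit of Figure 3A, trained by plain gradient descent. The sine-shaped toy profiles, layer sizes, and learning rate are illustrative assumptions, not the paper's actual data or training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: clean 24-value daily profiles (hypothetical sine shapes)
# corrupted by additive noise; the autoencoder learns the mapping back.
N, D, H = 512, 24, 23                    # samples, input/output dim, hidden dim
t = np.linspace(0, 2 * np.pi, D)
clean = 0.5 + 0.4 * np.sin(t) * rng.uniform(0.5, 1.0, (N, 1))
noisy = np.clip(clean + rng.normal(0, 0.05, clean.shape), 0, 1)

# One hidden layer, as in Figure 3A: ReLU encoder, sigmoid output.
W1 = rng.normal(0, 0.1, (D, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, D)); b2 = np.zeros(D)

def forward(x):
    z = np.maximum(x @ W1 + b1, 0)                 # hidden activations
    return z, 1 / (1 + np.exp(-(z @ W2 + b2)))     # sigmoid reconstruction

lr = 0.5
for _ in range(2000):                              # full-batch gradient descent on MSE
    z, out = forward(noisy)
    err = (out - clean) * out * (1 - out) * (2 / N)  # gradient w.r.t. output logits
    gW2 = z.T @ err; gb2 = err.sum(0)
    dz = (err @ W2.T) * (z > 0)                    # backprop through ReLU
    gW1 = noisy.T @ dz; gb1 = dz.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

mse_before = np.mean((noisy - clean) ** 2)
mse_after = np.mean((forward(noisy)[1] - clean) ** 2)
```

After training, mse_after should fall below mse_before: the network has learned to pull corrupted inputs back toward the clean manifold.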
In this paper, we consider the 24-hour wind speed forecast produced by the WRF model for each day. Therefore, we consider the forecast s̃ to be a noisy or corrupted version of the true wind speed series s, so that by using a network f based on Figure 3A, the corrupted forecast can be mapped to the more accurate estimation ŝ = f(s̃; W), where W are the parameters (weights and biases) of the network. We choose to perform the correction for all 24 values at the same time, so that any temporal relations can be learned by the network. Moreover, if there are any daily periods of high, fluctuating wind where the error might be higher than the daily average, the networks would, in principle, learn to recognize them as well. Finally, since we are training a different network for each site, each network should be able to learn these periods even if they occur at different times of the day in each location.
Although the idea of denoising data with neural networks was originally introduced in 49 and 50, modern denoising autoencoders (presented in 47 and described in this section) were developed with the distinct intent of finding and extracting meaningful features from data. Naturally, modern denoising autoencoders have also been used on actual image and audio denoising tasks, such as in 51 and 52.

| Neural network architecture
We first present a single-variable network, shown in Figure 4A, that directly resembles a denoising autoencoder. The input and output layers consist of 24 neurons for s̃ and ŝ, respectively. The only hidden layer has 23 nodes, since all of the performed tests showed that a slight reduction in dimensionality in the hidden layer contributed to enhancing the correction. All layers are fully connected; thus, the output z_n from the nth node in the hidden layer is

z_n = φ(w_n · s̃ + b),

where φ is an activation function, w_n are the weights connecting the input nodes and node n, and b is a bias term. The chosen activation functions for the hidden layer and the output layer are ReLU and sigmoid, respectively, as shown in Table 1. We denominate this network NN-S, where "S" stands for wind speed.
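A forward pass through a network of this shape (24 inputs, a 23-node ReLU hidden layer, a 24-node sigmoid output, per Table 1) can be sketched as follows; the weights here are random stand-ins, not trained values:

```python
import numpy as np

rng = np.random.default_rng(4)

s_tilde = rng.uniform(0, 1, 24)          # scaled 24-hour raw wind speed forecast
W = rng.normal(0, 0.1, (24, 23))         # fully connected input -> hidden weights
b = np.zeros(23)

# Hidden layer: z_n = ReLU(w_n . s_tilde + b_n), 23 nodes
z = np.maximum(s_tilde @ W + b, 0)

W_out = rng.normal(0, 0.1, (23, 24))
b_out = np.zeros(24)

# Output layer: sigmoid, 24 nodes -> corrected forecast estimate
s_hat = 1 / (1 + np.exp(-(z @ W_out + b_out)))
```

The sigmoid output keeps ŝ in (0, 1), consistent with inputs scaled to that range.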
We hypothesize that feeding additional information about the state of the atmosphere into the network will greatly enhance its correction skill. With this idea in mind, we present a second network whose input consists of the wind speed forecast s̃ together with the forecasts of other atmospheric variables, namely wind direction d̃, pressure p̃, and temperature t̃. These extra input variables were chosen on the assumption that they are correlated with the variable of interest. 18 Moreover, this same set of variables is used in other prediction/correction methods where a more comprehensive state of the atmosphere is needed as input. 18,19 The predictions of these variables are extracted from the same WRF model run as the wind speed prediction, for the same time period.
F I G U R E 3 A, A sketch of a denoising autoencoder. Note that the hidden layers have fewer nodes than the input and output layers. B, The correction process (adapted from 47 ). The manifold is represented by the thick black line and the corrupted inputs x̃ by the red dots. The autoencoder displaces each corrupted input x̃ closer to the learned manifold. x and x̂ stand for the noiseless data and a reconstructed vector, respectively.
The architecture of this network is shown in Figure 4B. It has an input layer with 96 nodes for the 24-hour forecasts of the four variables, two hidden layers with 24 and 23 nodes, respectively, and an output layer with 24 nodes. The most important feature of this network is that the first hidden layer is not fully connected to the input layer, but rather locally connected. Each node in the first hidden layer is connected only to the input nodes that receive the values for a particular hour. In other words, the nth node in the hidden layer is connected only to the input nodes for the predicted wind speed s̃, wind direction d̃, pressure p̃, and temperature t̃ at the nth hour. Thus, the output z_n from the nth node in the hidden layer becomes

z_n = φ(w_{n,s} s̃_n + w_{n,d} d̃_n + w_{n,p} p̃_n + w_{n,t} t̃_n + b_n).

We denominate this network NN-SDPT, where "SDPT" stands for speed, direction, pressure, and temperature. A more detailed description of this network can also be found in Table 1. During training, the network adjusts the connecting weights according to the contribution of each input variable to the correction process. We show in Section 4.2 that the nodes corresponding to the predicted wind speed exhibit the largest weights after training, as is to be expected. Each extra variable contributes to a smaller extent, but their overall contribution results in a significant improvement over network NN-S (as shown in Section 4). The choice of a locally connected layer instead of a fully connected one has the intent of obtaining a regularizing effect, as described in 53. By choosing to connect only certain nodes between the two layers, we are hard-wiring our prior knowledge of the relationships in the data into the network and, at the same time, restricting the weight search space so that the desired weight configuration in the first layer becomes more accessible.
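The per-hour computation of the locally connected first layer, and the parameter reduction it buys relative to a fully connected layer, can be sketched in numpy; the scaled inputs here are random stand-ins for the four WRF forecasts:

```python
import numpy as np

rng = np.random.default_rng(1)
H = 24  # forecast hours

# Stacked inputs: for each hour n, the four WRF forecasts (s, d, p, t),
# here random stand-ins scaled to [0, 1] as in the paper.
x = rng.uniform(0, 1, (H, 4))

# Locally connected first layer: node n sees ONLY hour n's four variables,
# so the layer holds 24 x 4 weights (+ 24 biases) instead of the
# 96 x 24 weights (+ 24 biases) a fully connected layer would need.
W_local = rng.normal(0, 0.1, (H, 4))
b = np.zeros(H)

def relu(v):
    return np.maximum(v, 0)

# z_n = ReLU(w_{n,s}*s_n + w_{n,d}*d_n + w_{n,p}*p_n + w_{n,t}*t_n + b_n)
z = relu((W_local * x).sum(axis=1) + b)

n_local = W_local.size + b.size   # 120 parameters
n_full = 96 * 24 + 24             # 2328 for the fully connected equivalent
```

The roughly 20-fold reduction in first-layer parameters is one way to see the regularizing effect of the local connectivity.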
To prove the advantages of a locally connected layer, we have compiled in Table 2 the summarized results of the validation tests carried out to determine the adequate network architecture. We present network NN-SDPT labeled as (A), an equivalent version using a fully connected layer (B), and also a pair of networks where both locally connected and fully connected versions have 24 nodes in the middle layer (C and D, respectively). Each network was trained for all 15 sites using 10-fold cross-validation, and the values in Table 2 represent the average improvement of each metric across all sites and all folds. In all cases, early stopping was used with a patience parameter of 200 epochs, and the network at the epoch showing the lowest validation loss was chosen.
The results show that for both numbers of nodes, the locally connected versions (A and C) perform better than their fully connected counterparts (B and D), with network A being the overall best. To further corroborate the performance of the locally connected version, we carried out a Wilcoxon signed-rank significance test between networks A and B, using all available 150 pairs (15 sites and 10 folds). The test concluded that there is a significant difference between the values of A and B, with a test statistic z = −2.60 and a P-value P < .01 for the root mean square error (RMSE), as well as for the CC (z = 2.04, P < .05) and the mean absolute error (MAE; z = −2.18, P < .05). Note that the results in Table 2 correspond to the average over the validation sets, and therefore differ from those shown in Section 4.
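For reference, the z statistic of a Wilcoxon signed-rank test can be computed with a simple normal approximation (ignoring the tie and zero-difference corrections that a full statistical package would apply); the arrays below are hypothetical per-fold RMSE pairs, not the paper's 150 values:

```python
import numpy as np

def wilcoxon_z(a, b):
    """Signed-rank z statistic, normal approximation (no tie correction)."""
    d = np.asarray(a, float) - np.asarray(b, float)
    d = d[d != 0]                                    # drop zero differences
    n = d.size
    ranks = np.argsort(np.argsort(np.abs(d))) + 1    # ranks of |d|, 1..n
    w_pos = ranks[d > 0].sum()                       # sum of positive ranks
    mu = n * (n + 1) / 4
    sigma = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_pos - mu) / sigma

# Hypothetical per-fold RMSE pairs: method A consistently below method B
rng = np.random.default_rng(0)
rmse_B = rng.uniform(1.0, 2.0, 150)
rmse_A = rmse_B - rng.uniform(0.0, 0.2, 150)
z = wilcoxon_z(rmse_A, rmse_B)   # strongly negative for these data
```

A strongly negative z here mirrors the sign convention of the paper's reported z = −2.60 for RMSE (lower values for the locally connected network).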
The remaining hyperparameters, such as the total number of layers and the number of nodes in each layer, were determined on an empirical trial-and-error basis until satisfactory performance was obtained.

| Training details
For the training of the networks, the dataset was divided such that 90% is used for training and validation, and 10% is used for testing. This corresponds to the periods of 2016-01-03 to 2019-12-24 for training and validation, and 2019-12-25 to 2020-06-01 for testing. The training was done in a 10-fold cross-validation fashion, and the best fold was taken for each site. We train a different network for each site so that each network can learn the local patterns or intricacies in the wind behavior that are important for the correction process. A total of 30 networks are trained, all of them with Adaptive Moment Estimation (ADAM) as the optimization algorithm 54 and the mean square error as the loss function L. All inputs are scaled to be between 0 and 1 before being fed to the networks.
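The data handling described above (min-max scaling to [0, 1] and 10-fold cross-validation splits) can be sketched as follows. Whether the folds are shuffled or chronological is not specified in the text, so the shuffled variant is an assumption:

```python
import numpy as np

def minmax_scale(x, lo=None, hi=None):
    """Scale columns to [0, 1]; lo/hi fitted on training data can be reused."""
    lo = x.min(axis=0) if lo is None else lo
    hi = x.max(axis=0) if hi is None else hi
    return (x - lo) / (hi - lo), lo, hi

def kfold_indices(n, k=10, seed=0):
    """Yield (train, validation) index pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    chunks = np.array_split(idx, k)
    for i in range(k):
        val = chunks[i]
        train = np.concatenate([chunks[j] for j in range(k) if j != i])
        yield train, val

# Example: split 100 hypothetical daily samples into 10 folds
splits = list(kfold_indices(100))
```

Reusing the training-set lo/hi when scaling the test period avoids leaking test statistics into the model.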

| Performance of NN-SDPT
In this section, we present results that show the performance of the 15 NN-SDPT networks, together with comparisons against the 15 NN-S networks. We assess the correction skill of the networks by comparing statistical parameters, specifically the RMSE, MAE, and correlation coefficient (CC):

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (f_i − o_i)² ),
MAE = (1/N) Σ_{i=1}^{N} |f_i − o_i|,
CC = Σ_{i=1}^{N} (f_i − f̄)(o_i − ō) / sqrt( Σ_{i=1}^{N} (f_i − f̄)² Σ_{i=1}^{N} (o_i − ō)² ),

where i is the time point and N is the total number of verification time points. f and o represent the forecasts (both raw and corrected) and observations, respectively. The bar denotes the mean of the corresponding variable.
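The three verification metrics are straightforward to implement; a numpy version consistent with the definitions above:

```python
import numpy as np

def rmse(f, o):
    """Root mean square error between forecasts f and observations o."""
    return np.sqrt(np.mean((f - o) ** 2))

def mae(f, o):
    """Mean absolute error."""
    return np.mean(np.abs(f - o))

def cc(f, o):
    """Pearson correlation coefficient."""
    fa, oa = f - f.mean(), o - o.mean()
    return (fa * oa).sum() / np.sqrt((fa ** 2).sum() * (oa ** 2).sum())
```

The percentage of improvement reported in the tables would then be, for example, 100 * (rmse_raw − rmse_corrected) / rmse_raw.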
As a baseline to assess the neural network's correction skill, we have also included results from the MOS approach 55,56 in some of the figures. MOS is a family of techniques that corrects the numerical predictions by finding statistical relationships between the NWP's output and observed and/or additional model data. For the MOS implementation shown in this paper, the predictors are the same four weather variables as in NN-SDPT and the observed wind speed values were used as predictands to fit the model. The parameters were determined by using 1 year of past data.
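The exact regression form of the MOS baseline is not detailed in the text; a common minimal variant is a multiple linear regression on the four predictors, fitted by least squares, sketched here with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(2)

# One year of hypothetical hourly training data: predictors are the four
# WRF variables (speed, direction, pressure, temperature), the predictand
# is the observed wind speed. Random stand-ins, scaled to [0, 1].
X = rng.uniform(0, 1, (8760, 4))
y = 0.8 * X[:, 0] + 0.05 * X[:, 3] + rng.normal(0, 0.1, 8760)

# MOS as multiple linear regression: least-squares fit with an intercept.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def mos_correct(x_new):
    """Apply the fitted MOS equation to new predictor rows."""
    return np.column_stack([np.ones(len(x_new)), x_new]) @ coef
```

With enough training data, the fitted coefficients recover the underlying linear relationship, which is the essence of the MOS correction step.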
All results shown in this paper correspond to the testing period from 2019-12-25 to 2020-06-01. Values for RMSE, MAE, and CC are shown in Figure 5 and Tables 3 and 4, respectively. The "Raw" label corresponds to the parameter calculated between the observations and the uncorrected WRF wind speed forecasts, while "NN-S" and "NN-SDPT" correspond to the observations versus the forecasts corrected with each method, respectively. The percentage of improvement relative to the raw forecast is also shown for each parameter. Figure 5 shows the RMSE values before and after correction, as well as the percentage of improvement. For all 15 sites, both networks are able to reduce the RMSE in the testing period. This behavior was to be expected, since the loss function that the networks were trained to minimize was the mean square error. The MAE and CC parameters are also improved at all 15 sites. Furthermore, all metrics for the correction with the multivariable network, that is, NN-SDPT, are superior to those of the single-variable NN-S. This indicates that NN-SDPT is indeed capable of extracting additional information from the wind direction, pressure, and temperature forecasts that helps in correcting the wind speed variable.
Specifically, for the correction with NN-S, the percentage of improvement of the RMSE metric ranges from 6.86% in site 5 to 17.63% in site 1, with an average of 11.55%. By using NN-SDPT, the relative improvement increases, with values ranging from 8.76% in site 4 to 26.55% in site 15 and an average of 16.52%. Despite NN-SDPT performing better than NN-S at all sites, the sites where NN-S performs the worst (sites 4, 5, and 13) are also the sites where NN-SDPT performs the worst. A closer look at Figure 5 and Tables 3 and 4 reveals that the raw predictions for these sites are much better than for the other sites (e.g., site 5 has the lowest raw RMSE and MAE, and the highest CC), so it can be inferred that at these sites there is little room for correction when compared with the other sites. Figure 6 (lower panel) shows the RMSE values calculated for each of the 24 simulated hours, for all sites. The metric for an additional correction with MOS is included for reference (orange dashed line). The RMSE for the raw predictions, displayed with the solid red line, shows small values during the first hours of the simulation, and it grows steadily until its peak at the 12th simulation hour (20:00 JST). The boxplots in the upper panel correspond to the observations at each respective hour, and they show that there is a period of change to strong winds extending from the 9th hour (17:00 JST) to the 12th (20:00 JST), which may be the reason for the increased error. A per-site analysis revealed that this high wind speed period exists at all sites, although it is more marked at some sites than at others. After this period, the error decreases slightly before increasing again during the last hours of the simulation. The reduced RMSE after the correction with NN-S and NN-SDPT is shown with the green and blue lines, respectively. The error is largely reduced for all hours and in similar proportions, resulting in a very similar pattern.
As shown in Figure 6, NN-SDPT outperforms NN-S at all times, and this difference in performance is most evident during the last hours of the simulation. The Taylor diagrams displayed in Figures 7 and 8 also show the improvement in the forecasts after postprocessing with the neural networks. Figure 7 displays the overall improvement at all 15 sites and includes the MOS correction as well for reference. Figure 8B shows the displacement of the points for each site individually. As a general pattern, correcting with NN-S reduces the centered RMSE (CRMSE) and increases the CC slightly. However, the points tend to move toward zero along the standard deviation axis. This means that a great amount of the variability in the data is lost during the correction. On the other hand, correcting with NN-SDPT displaces the points in a direction more toward the reference points (not shown in Figure 8A). This results in an even higher reduction in CRMSE and a higher increase in the CC. Nevertheless, after correction with NN-SDPT there is still a slight reduction in standard deviation.
Finally, in Figure 9 we present five cases of corrected and uncorrected wind speed time series, together with the MOS correction and observed values. Each case lasts a period of 7 days. The metrics for each case and correction method are shown in Table 5. These cases were selected in order to highlight the advantages of the current method when compared with the raw output and MOS, and also to find any weaknesses of the method that can be improved.
A and B are cases where the raw forecast incorrectly predicts brief periods of high-speed wind. These periods last approximately 1 day. During these periods, MOS offers moderate improvement and sits mostly between the raw and proposed methods, although in case B, MOS can get closer to NN-S. On the other hand, NN-SDPT successfully corrects the overestimation in A and is able to bring the values closer to the observations in case B, surpassing both MOS and NN-S. Case A also shows a period of sustained high-speed wind lasting the last 2 days of the sample. The WRF model underestimated this period, and out of all the tested methods, it could only be corrected with NN-SDPT, with MOS degrading the results for this period. The RMSE values in Table 5 show that for both cases A and B, all three correction methods offer significant improvement with NN-SDPT being superior to MOS and NN-S.
Case C presents another high-speed wind period lasting approximately 3 days. It is interesting to see that MOS and NN-SDPT behave similarly; NN-SDPT performs better during the first day, but MOS is clearly superior during the next 2 days. Cases D and E show examples where the uncorrected values are already very close to the observations. In these instances, a correction method could easily move the prediction farther away from the real values and increase the RMSE. The proposed method still reduces the RMSE in all of the selected cases, but the corrected values present less variability and randomness, which is a very important characteristic of the behavior of wind. Quick fluctuations in wind speed that might be correctly predicted by the NWP model might be flattened out by the correction process.

F I G U R E 9 Wind speed observations and forecasts (corrected and uncorrected) for a period of 7 days, for five selected cases. A-E correspond to Cases A-E displayed in Table 5

| Analysis of NN-SDPT
Traditionally, it has been common to treat neural network models as black boxes and to give priority only to the accuracy of their output. However, analyzing the inner workings of a network can help determine which features contribute more to the network's output. This kind of analysis can also provide new insights into complex, nonlinear physical systems. 57 In this section, we study the internal state of the network NN-SDPT, particularly the connections between the input and the first hidden layer. As shown in Equation 4, each node in the first hidden layer contains a weight for each atmospheric variable. The higher the magnitude of a weight, the more the respective variable contributes to the correction process. In order to study the contribution of each variable, we have extracted the final weights from the links between the input and the first hidden layer. These are shown in Figure 10 after normalization and averaging for each site. As expected, wind speed has the highest weight in the correction process at all sites except sites 2 and 4. This means that wind speed is the variable that the network finds most useful for the correction of wind speed. Although this might seem trivial and obvious from a human perspective, it should be mentioned that the network has no previous knowledge of which variable is being fed into each node, and it knows no physical information about the atmospheric processes. The fact that the network precisely singled out the variable that was being corrected suggests that the correction is not happening by pure chance.
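The exact normalization used for Figure 10 is not specified, so one plausible reading of this weight analysis (mean absolute weight per variable, normalized to sum to one) can be sketched as follows; the weights are random stand-ins for a trained locally connected layer:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical trained weights of the locally connected first layer:
# one weight per (hour, variable) pair, columns = (s, d, p, t).
W_local = rng.normal(0, 0.2, (24, 4))
W_local[:, 0] += 0.6   # stand-in: speed weights dominating, as in Figure 10

# Per-variable contribution: mean |weight| across the 24 hourly nodes,
# normalized so the four contributions sum to 1.
contrib = np.abs(W_local).mean(axis=0)
contrib = contrib / contrib.sum()
ranking = np.argsort(contrib)[::-1]   # variable indices, largest first
```

With the stand-in weights above, variable 0 (wind speed) comes out first in the ranking, mirroring the behavior the paper reports for most sites.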
In almost all sites, wind speed is followed by temperature, pressure, and, finally, wind direction. It should be noted that this does not necessarily mean that there is a higher correlation between wind speed and temperature than with the other variables. It simply means that the model has found useful information in the temperature time series that improves the correction of the wind speed time series. Although there might be deeper connections and relations in the subsequent layers that we are not taking into account, this brief analysis shows that the correction in each site depends on each variable with different magnitudes, suggesting that each site is certainly subject to very different wind patterns. However, it should be mentioned that any attempt at understanding the weights of a neural network can only be done speculatively, as there can be different interpretations of their meanings. 57

| CONCLUSIONS
This work focused on the challenges presented when forecasting wind speed in regions of complex geographical features, where the patterns and behavior of wind fluctuate more than usual. Intending to improve wind power output prediction at a wind farm on Awaji Island, Japan, we have proposed a novel ML-based correction method, that is, NN-SDPT, to improve the wind speed forecasts from the WRF model. The correction process executed by the network is based on extracting underlying information from the predictions of additional atmospheric variables (wind direction, temperature, and pressure) forecasted by the same WRF model. It was observed that NN-SDPT has remarkable capabilities to improve the raw forecasts of the WRF model. The 15-site averaged improvements in RMSE, MAE, and CC are 16.52%, 16.32%, and 13.20%, respectively. Moreover, the multivariable network NN-SDPT was shown to be superior to its single-variable counterpart (NN-S) in all tests and metrics, a fact that highlights the existence of useful, hidden information in other atmospheric variables, and the ability of the neural network to find this information.
Although we have presented these networks in the context of wind speed forecasting and wind power generation, they can readily be applied to the correction of other atmospheric variables. The layout of NN-SDPT is easily adaptable and can fit into any weather prediction pipeline as a correction step for the results coming from an NWP model. Finally, since each network is trained locally, this method can be implemented wherever observed and predicted data are available.