Improved Very-short-term Wind Forecasting using Atmospheric Classification

We present a regime-switching vector-autoregressive method for very-short-term wind speed forecasting at multiple locations with regimes based on large-scale meteorological phenomena. Statistical methods short-term wind forecasting out-perform numerical weather prediction for forecast horizons up to a few hours, and the spatio-temporal interdependency between geographically dispersed locations may be exploited to improve forecast skill. Here we show that conditioning spatio-temporal interdependency on `atmospheric modes' can further improve forecast performance. The modes are defined from the atmospheric classification of wind and pressure fields at the surface level, and the geopotential height field at the 500hPa level. The data fields are extracted from the MERRA-2 reanalysis dataset with an hourly temporal resolution over the UK, atmospheric patterns are classified using self-organising maps and then clustered to optimise forecast performance. In a case study based on 6 years of measurements from 23 weather stations in the UK, a set of three atmospheric modes are found to be optimal for forecast performance. The skill in the one- to six-hour-ahead forecasts is improved at all sites compared to persistence and competitive benchmarks. Across the 23 test sites, one-hour-ahead root mean squared error is reduced by between 0.3% and 4.1% compared to the best performing benchmark, and by and average of 1.6% over all sites; the six-hour-ahead accuracy is improved by an average of 3.1%.


Introduction
Wind energy is providing record a share of demand for electricity in power systems around the world and this trend is expected to continue in light of global commitments to de-carbonise society [18]. Operating power systems and participating in electricity markets with high penetrations of wind energy demands continuous improvement in wind power forecasting to reduce the impact of forecast errors and uncertainty on economic cost and reliability [28,10]. Here we are concerned with very-short-term forecasts of the order of minutes to hours ahead which are of particular importance to participants in intra-day markets and the balancing function of power system operators [28,16].
On these time scales statistical methods based on time series analysis are generally superior to those based on post-processing numerical weather prediction due to the easy assimilation of new measurements and low computational cost of producing new forecasts [12]. Many time series methods have been employed for wind speed and power forecasting including autoregressive (AR) [25] and autoregressive moving average (ARMA) [8] and various machine learning methods including neural networks [19] and Markov chains [3,30]. Hybrid methods which combine multiple prediction layers or blend forecasts from multiple methods have also been studied and shown to outperform individual methods [9].
Information from spatially disperse measurements can be used to model spatio-temporal dependency and thereby improve forecast skill at all measurement locations [13,15]. Typically, measurements from multiple locations are embedded in a single vector and the temporal evolution of that vector is modelled in a vector autoregressive (VAR) framework [20]. Furthermore, the spatial dependency structure may itself depend on externalities such as season or wind direction [14,7]. However, the number of parameters to be estimated scales with the square of the number of spatial locations making these methods impractical for large problems. More recently, advances have been made in sparse parameter estimation in order to make large scale problems, those dealing with hundreds or potentially thousands of locations, tractable [6,4] In parallel to the development of spatial models, forecasting schemes based on multiple models, each designed for specific conditions, have been proposed. Regime-switching methods have been applied to forecast offshore wind power fluctuations in [26] where the underlying regime is governed by a hidden Markov process. The number of regimes is chosen to be three by expert judgement to reflect the three distinct regions of the wind farm power curve. An adaptive extension to this approach is presented in [27]. A spatial regime-switching model for wind speed prediction is proposed in [14] with fixed regimes based on wind direction at a target location and selected via a cross-validation procedure. More recently, cyclone detection has been used to predict periods of potentially large forecast error in day-ahead wind power forecasting [29], and the EEM Wind Power Forecasting Competition was won with a regime-switching AR method with regimes identified by clustering the previous day's zonal and meridional wind speed measurements from a single location [2]. The large-scale meteorological situation has a clear bearing on spatial dependency but to our knowledge has not been exploited in spatio-temporal time series models for very-short-term wind or wind power forecasting.
For a given region there may be a wide range of possible large-scale meteorological conditions due to variations in the strength and location of synoptic-scale weather features such as extratropical cyclones or high pressure systems [24]. Classification techniques may be used to codify these large-scale atmospheric cir-culation patterns in terms of a relatively small number of distinct modes [17,23] defined based on the fields of mean sea level pressure and geopotential height, for example, for each time instant of interest. Given the length-scale of the synoptic scale features, modes are typically determined on a daily temporal resolution; however, the same methods can be used to classify the patterns on any temporal scale from sub-daily to seasonal depending on the application. Here we develop a regime-switching time-series model for very-short-term forecasting with regimes defined by the atmospheric mode at each forecast issue time, with forecasts issued every hour on a rolling basis.
In this paper, we introduce a conditional regime-switching VAR model with regimes conditioned explicitly on the observed atmospheric mode at the forecast issue time. In Sections 2.1-2.3 the univariate autoregressive model is introduced and extended to the proposed conditional VAR. The identification of atmospheric modes via self-organising maps is described in 2.4. A UK-based case study comprising six years of measurement data is introduced in Section 3 and results are presented in Section 4. Finally, conclusions are drawn and discussed in Section 5.

Forecasting Framework
Consider a the wind speed time series denoted Y = {y 1 , y 2 , ..., y T }. We aim to forecast y t+τ at time t and in order to do so find some function f τ (·) which maps a vector explanatory variables x t onto y t+τ , while minimising some function of the prediction error, which is given by e t+τ |t = y t+τ |t − y t+τ .

Autoregression
For time-series that exhibit serial correlation, such as wind speed measurements, it is reasonable for the vector of explanatory variables to consist of the recent history of y t ,ŷ and for the function f τ (·) to take the form of a weighted sum of p past values plus a constant ν τ ,ŷ This is the familiar autoregressive model of order p, denoted AR(p). The choice of the model order p and estimation of parameters ν τ , α i,τ , i = 0, ..., p − 1 will be discussed in the next section. For the remainder of this section a number of extensions to the AR(p) model specific to wind speed forecasting are introduced. A natural extension of the autoregressive model is the inclusion of exogenous explanatory variables, sometimes denoted ARX. Since wind speed exhibits diurnal seasonality, the time of day is included as a set of dummy variables, denoted d h (t), h ∈ H where H is the set of discrete measurement times appropriate to the temporal resolution of the data, e.g. H = {0, 1, ..., 23} in the case of hourly measurements, and The autoregressive model with exogenous variables, ARX(p), is written where the intercept ν τ is superseded by β h,τ which may be interpreted as timedependent intercepts. Diurnal cycles may be modelled by a variety of other approaches, notably by estimating a smooth function of the time of day and de-trending the data as a form of pre-processing, as in [15], or retaining them in an additive model [11] as in [31]. Here we proceed with dummy variables as they are flexible and easily interpretable as a time-of-day bias correction. Any categorical exogenous variable may be modelled in this way and we also consider ARX models where dummy variables for the current atmospheric mode m t ∈ S = {1, 2, ..., M } are included, with associated parameters γ τ . In this case (6) becomeŝ where the indicator function 1 s (m t ) = 1 if m t = s and 0 otherwise. We conjecture that the dependence of the process Y on atmospheric mode is more complex than the bias correction modelled by a dummy variable and therefore consider switching between ARX models that are mode-specific, such that (6) becomeŝ where each parameter of the ARX model depends the atmospheric mode m t at the forecast issue time.

Vector Autoregression
It is advantageous to consider multiple locations simultaneously in order to capture interdependency among lagged measurements for spatially dispersed sites. This is achieved by extending the AR time series models described above to vector autoregressive models whereby measurements made at time t and N locations and embedded in the vector y t ∈ R N and considering the vector-valued time series Y = {y 1 , y 2 , ..., y T }. The simplest VAR(p) process is written where A i,τ ∈ R N ×N are matrices of parameters, which is the 'vectorised' form of (4). Parameters on the diagonal of A i,τ capture autocorrelation effects and off-diagonal parameters capture cross-correlation. Exogenous variables incorporated along similar lines to givê with the effect of diurnal dummies parametrised by the vector β h,τ ∈ R N , and with the further addition of atmospheric mode dummies parametrised by γ τ ∈ R N . Finally, the model parameters may themselves be dependent on atmospheric mode resulting in a conditional VAR (CVAR) model which, since m t is discrete, is equivalent to a group of M VAR models, one corresponding to each atmospheric mode.

Parameter Estimation
The model parameters B τ,s = A 1,τ,s . . . A p,τ,s β 0,τ,s . . . β 23,τ,s are estimated by minimising some function of the prediction errors on a static dataset. It is useful to define the data matrices Y τ,s and X τ,s of target and and input data, respectively, for forecast horizon τ and atmospheric mode s. Input data is the vertical concatenation of explanatory variables for which a corresponding target variable exists, written for 1 ≤ i < T − τ subject to m i = s. The corresponding matrix of target data is the vertical concatenation of wind speed vectors given by for p + τ ≤ i ≤ T subject to m i−τ = s. The matrix of prediction errors for all sites and times in the dataset corresponding to mode s with forecast horizon τ is given by The parameter matrix B τ,s may now be estimated by minimising an appropriate function of E τ,s .
Ordinary least squares (OLS) is used here for simplicity though different cost functions may be more appropriate for specific forecasting tasks, such as the quantile loss function for non-parametric probabilistic forecasting. The OLS parameters estimates are the solution to which is popular due to its simple solution by differentiation given bŷ and equivalence to maximum likelihood estimation for the special case that the rows of E τ,s are independent and identically multivariate Normal distributed with zero mean and diagonal covariance matrix. The number of samples available for parameters estimation is an important consideration as insufficient training data will result in noisy parameter estimates [20]. As the parameters of the conditional VAR are estimated using only a subset of the available training data corresponding to a specific mode, the size of each subset may become a factor in the quality of the parameter estimates if a large number of atmospheric modes is considered, or if there is only a small amount of training data available for one or more modes.

Atmospheric Classification
The proposed very-short-term forecasting methodology employs the large-scale atmospheric circulation as an exogenous explanatory variable and requires its classification to M distinct atmospheric modes. Huth et al. showed there is a wide variety of methodologies available to classify circulation patterns which can be divided into three categories; subjective (manual), mixed (hybrid) and objective (automated) [17]. In this study a two-stage automated clustering approach is adopted, where the k-means algorithm is applied to further group the atmospheric patterns identified in the "SOM atmospheric circulation catalogue for wind energy applications over the UK" [22]. A mixed approach is also considered whereby the second-stage grouping is made subjective judgement.
The SOM is a two-layer Artificial Neural Network (ANN) consisting of an input layer and an output two-dimensional lattice of neurons, characterized by their synaptic weights vector, w and their location at the SOM lattice. Learning in SOM is achieved through the processes of competition, cooperation and adaptation. During the competition phase an input pattern is presented to the SOM and a metric distance is calculated for all neurons. The neuron with the smallest distance is the "winner" or Best Matching Unit (BMU), which through a radial basis function determines the topological neighbourhood of the "excited" neurons at the SOM lattice during the cooperation phase. Finally, in the adaptation phase the BMU and the "excited" neurons' weight vectors are updated towards the input vector.
To describe the atmospheric circulation with a higher degree of generalization, a k-means clustering algorithm is performed as a post-processing step for further grouping the SOM patterns. The k-means is one of the most well known unsupervised clustering algorithms that defines the centroids through an iterative procedure and thus associates the input data to the nearest centroid. The k-means algorithm is analogous to the SOM learning process, with non-existing cooperation and adaptation phases. For comparison, a mix approach is tested whereby the SOM patterns are grouped by expert meteorologists through inspection of charts summarising the SOM patterns.

Case Study
The proposed method is tested on measurements of hourly mean wind speed made in the UK at 23 locations in the UK from 2002 to 2007 (as shown in Figure 1), inclusive, from the MIDAS dataset provided by the British Atmospheric Data Centre (BADC). These sites all have greater than 98% 'good' data coverage following quality control by the BADC. Years 2002-2005 are used for model order selection via 10-fold cross-validation, and 2006-2007 are used for out-of-sample testing [21]. Forecasts are produced for 1 to 6 hours ahead for all locations meaning that in total over 2.41 million out-of-sample forecasts have been produced and evaluated. Models (9)-(12) are all implemented for comparison.

Atmospheric Classification
The SOM implementation used here is based on a four-step SOM clustering framework proposed in [23], which has been also applied in a climatological context over Greece to examine the relationship of wintertime meteorological conditions with atmospheric circulation [24]. Philippopoulos et al. used the above framework to examine the association of atmospheric patterns with extreme wind power events in the UK [22]. The novelty of the aforementioned classification is that in addition to large-scale parameters mean sea level pressure (SLP) and geopotential height at 500hPa (Z500), the surface wind speed field (WS) is incorporated as a critical input for wind energy applications. The selected vari- Step 1) the spatio-temporal time series are standardized and the Principal Components Analysis (PCA) is used as a pre-processing step for data reduction purposes (Step 2). The classification is performed by using the SOM algorithm on the first PC scores that explain more than one predefined percent of the initial variance, while the optimum size of the SOM feature map (the number of atmospheric patterns) is determined using qualitative criteria and the Davies-Bouldin index [5] (Step 3). The resulting catalogue of atmospheric circulation patterns may then be visualised and studied as per the user's application. The application here is to condition VAR models on atmospheric modes and that is the focus of the remainder of this paper; for more detail on the the SOM process, please see [23,24,22].
Multiple SOM configurations for the UK were examined in [22] and the optimum size of the SOM feature map identified 21 atmospheric organized in a 7×3 map, visualised in Figure 2. Neighbouring nodes are inter-connected and each one is associated with the composites of the selected variables. An important advantage of the approach is that relative position in the SOM map is associated with specific features, such as seasonality, location of the pressure systems and pressure gradient along with the wind field, enabling the extraction of valuable information regarding the evolution of atmospheric circulation. The results indicate that in some cases rather minor changes in large-scale atmospheric circulation may lead to a different surface wind field over the UK, a finding with important implications for wind energy applications, including forecasting. Furthermore it was observed that many of the 21 modes were very similar and formed natural groups. The k-means algorithm was used to arrange the 21 atmospheric patterns into k atmospheric modes for k = 1, ..., 10. In addition, two groupings were formed by expert judgement, one arranged the 21 patterns into 4 groups, and the other into 9. The grouping used in the final model was chosen via 10-fold cross validation. The autoregressive order of the VAR models (9)-(12) is chosen to be p = 3 after 10-fold cross-validation on the training dataset for values p = 1, ..., 5 showed negligible difference in predictive performance, but analysis of the partial autocorrelation functions for each of the 23 series showed 3 lags to be significant in the majority of cases. While some sites showed that greater than 3 lags were significant, the results of the cross-validation exercise did not support increasing in model complexity any further.

VAR Model Fitting
Summary results from the cross-validation exercise are presented in Table 1. Comparing the candidate models indicates that the conditional VAR based on 3 atmospheric modes has the best predictive performance across forecast horizons from 1-to 6-hours-ahead, showing greater improvement over non-conditional methods for greater forecast horizons. The conditional VAR based on the 21 patterns from the SOM (without further grouping) provides no improvement on the non-conditional VAR model. Each hexagon represents one of the 21 atmospheric patterns identified by the SOM, shading corresponds to membership of the three clusters, or modes, found to be optimal for our forecasting application.

Atmospheric Modes and Grouping
The performance of the conditional VAR model with different numbers of atmospheric modes is plotted in 3. The special case of having 1 mode is equivalent to the standard non-conditional VAR and is outperformed by cases with two to five modes. The data-driven approach for grouping the SOM atmospheric patterns indicates that three atmospheric modes is optimal, and outperforms both groupings formed by expert judgement. The forecasting error of the 10-fold cross-validation of the conditional VAR models on the training dataset increases gradually for greater than three modes. This can be attributed to the degree of weather-related information required for improved wind forecasting skill without reducing the generalization ability of the models due to insufficient training events. Furthermore, the k-means grouping of the SOM atmospheric patterns is consistent with the inherent characteristic of the SOM scheme where the resulting patterns are topographically ordered in a two-dimensional map. In more detail, the first mode consists of patterns located at the second row of the SOM map, the second mode principally from the third row patterns and the third mode groups all the cases from the first row of the SOM atmospheric patterns shown in Figure 2.
The modes correspond to three distinct states of atmospheric circulation and wind speed conditions over the UK. The mode centroids are illustrated in Figure 4. The first mode is associated with anticyclonic circulation and moderate wind speed conditions. The high-pressure centre is located at the south-west of UK at western Atlantic, leading to an easterly component flow with maximum intensity at Scotland and northern England. The second mode consists of low-wind speed cases and according the SLP centroid, the calm conditions observed over the UK result from the combination of the low and high pressure fields at the west and east respectively. The third mode is directly linked with the cyclonic atmospheric circulation patterns and relatively high wind speed conditions. In more detail, the centre of the low-pressure centre can either be at the west, north, east or over the British Isles and represents throughout the year, however there is a higher frequency of events during the summer period, as shown in Figure 5. Approximately 40% of Mode 1 events occur between May and August where the high-pressure is related to the extension of the Azores anticyclone. Similarly, approximately 50% of mode 2 events are between May and August. However, in comparison to Mode 1, Mode 2 occurs less frequently during the winter months, less than 13% of events occur in between December and February. In contrast, Mode 3 is more common during the winter and transitional seasons, less than 5% of events occur between June and August. This is due to the frequent passage of extratropical cyclones associated with the North Atlantic storm track.
The atmospheric modes are defined by the synoptic scale conditions, which have a length scale of the order of 1000km and therefore tend to persist for a time period of days. Figure 6 shows the frequency distribution of the duration for which each mode event persists during the period 2002-2007. The distributions of Modes 2 and 3 are very similar; the mean duration is 86 and 87 hours respectively (which equates to 3.6 days), in comparison the duration of Mode 1 is generally shorter (mean duration of 57 hours or 2.4 days). This difference is largely due to the extremely long duration events which occur more frequently for Modes 2 and 3. The median duration for Mode 1 (38 hours) is actually very  Figure 4: Visualisation of surface level pressure field (SLP), geopotential height at 500hPa (Z500), and wind speed (WS) in units of ms −1 for the three atmospheric mode centroids.  similar to Mode 3 (42 hours), however Mode 3 is much more likely to persist for very long periods (over 10 days). This is generally associated with the passage of consecutive extratropical cyclones which are sufficiently close to prevent a change in the atmospheric mode. For all three modes it is very rare for the duration of an event to be shorter than 6 hours (10.5% for Mode 1, 8.7% for Mode 2 and 8.5% for Mode 3).
The underlying physical process governing the transition between atmospheric modes is clearly non-Markovian as the passage of synoptic-scale weather features is smooth and occurs on a temporal scale much greater than the hourly sample rate under consideration. The probability of transitioning from one mode to another will depend on more than the just present state (the definition of a Markov process) and will depend, for instance, on the amount of time spent in the present state. This supports our efforts to observe the atmospheric mode rather than taking a Markov-switching type approach based on an assumption that the transition between regimes is governed by a hidden-Markov model.

Results
Conditioning the forecast on atmospheric modes increases the accuracy of the wind speed predictions across sites and all lead times. For the 1-hour ahead forecast the RMSE of the predicted wind speed averaged across the 23 sites is reduced by 1.6% relative to VAR with Diurnal Dummies and 7.8% relative to persistence, as illustrated in in Figure 7. The improvement in the forecast  Figure 7: Improvement over persistence for averaged across all sites on test dataset.
skill due to the information provided by the atmospheric modes increases with the lead time. For example, for the 6-hour ahead forecast, CVAR improves the RMSE by 3.1% relative to VAR with diurnal dummies, and 23.9% relative to persistence. Figure 7 also shows that adding mode dummies to the VAR model with diurnal dummies has very little impact on the skill of the model. This indicates that the additional skill provided by the CVAR model is due to the better representation of the spatial structure of winds between the sites provided by the atmospheric modes, it is not simply a bias correction. The CVAR model produces an improvement in the forecast at all 23 sites (see Figure 8); however the magnitude of the reduction in RMSE varies from site to site. For example, for the 1-hour-ahead forecasts the reduction in RMSE varies from only 0.35% for Site 4 to 4.1% for Site 5. This result is also true for all of the atmospheric modes, i.e. for all sites there is a reduction in the RMSE of the wind speed using the CVAR model for each of the atmospheric modes. In general, the CVAR model provides the greatest improvement in the forecast when the atmosphere is determined to be in Mode 3, cyclonic conditions. Averaged across the 23 sites, there is a reduction in the RMSE of 2.4% during Mode 3 events, in comparison to a reduction of 1.1% and 1.4% for Modes 1 and 2, respectively. Furthermore, the reduction in RMSE is greatest for Mode 3 for 16 of the 23 sites. Further analysis of the sites has not revealed a clear relationship between the added value due to the atmospheric modes and the geographical location,  terrain type and the elevation of the sites. The distribution of the forecast errors is approximately Gaussian for all 3 atmospheric modes. Figure 9 shows the distributions for two locations, however similar results were shown for all of the other sites. The spread of the distributions varies for each mode and location. For all sites, the largest errors occur for Mode 3 which is not surprising given it is associated with cyclonic conditions and relatively high wind speeds. For the majority of the sites the distribution of the errors for Modes 1 and 2 are very similar. However, for the a number of the sites in Scotland and northern England the errors are larger for Mode 1, when these sites tend to experience higher wind speeds, as shown in Figure 4.

Conclusion
This paper presents a framework for incorporating the information provided by the large-scale meteorological conditions into a vector-autoregressive method for very-short-term wind forecasting using an atmospheric mode classification. The approach has been applied to a case study based on 6 years of measurements from 23 locations across the UK. As a result of the information provided by the atmospheric modes on the spatial and temporal structure of the wind field, the forecast skill is improved at all sites and lead times compared to competitive benchmarks. For the 1-hour ahead forecast, the RMSE of the predicted wind speed averaged across the 23 sites is reduced by 1.6% relative to VAR with Diurnal Dummies and 7.8% relative to persistence.
An improvement in forecast skill was shown at all 23 sites for all atmospheric modes. However, the model generally provided the greatest improvement in the forecast during cyclonic conditions (Mode 3). Averaged across the 23 sites, there was a reduction in the RMSE of 2.4% during Mode 3 events, in comparison to a reduction of 1.1% and 1.4% for Modes 1 and 2, respectively. For each mode, the distribution of the forecast errors was approximately Gaussian at all 23 sites. The spread of the distributions however varies for each mode and location. Despite, the increased forecast skill, the wind speed errors were typically largest for mode 3 (cyclonic conditions).
While forecast performance is consistently improved when atmospheric conditions were grouped into 3 modes, grouping conditions into 6 or more modes was detrimental to forecast accuracy, compared to non-conditional methods. Given the size of the dataset, it is unlikely that this is due to having insufficient data for parameter estimation alone. The unsupervised learning approach used for atmospheric classification presented here is not fully optimised for forecast performance, rather groupings are formed based on generic distance metrics. Semi-supervised learning methods may enable atmospheric classification to be performed in such a way that groupings are formed to explicitly improve forecast performance.
The framework presented in this study can be applied to any geographical location or combination of sites, however there are several key areas of consideration. Firstly, the atmospheric classification has been performed using reanalysis data. To run operationally, the method could be adapted to determine the atmospheric mode from the analysis of forecasts provided by a Numerical Weather Prediction model. Secondly, the application of this method to a large number of sites should consider sparse VAR approximation. The sparsity structure of such models could provide further insight into the nature or spatio-temporal structures under different atmospheric conditions. Finally, at present the model has only been applied to wind speed forecasting, therefore further work is required to quantify the benefits for wind power forecasting. teorological observations are freely available to bona fide research programmes from [21]. Data and code produced in the course of this work are available to download from the University of Strathclyde KnowledgeBase [1].