Wind speed prediction for small sample dataset using hybrid first‐order accumulated generating operation‐based double exponential smoothing model

Wind power generation has recently emerged in many countries. Therefore, the availability of long‐term historical wind speed data at various potential wind farm sites is limited. In these situations, forecasting models are needed that comprehensively address the uncertainty of raw data using only a small sample. In this study, a hybrid first‐order accumulated generating operation‐based double exponential smoothing (AGO‐HDES) model is proposed for very short‐term wind speed forecasts. Firstly, the problems of the traditional Holt's double exponential smoothing model are highlighted using the wind speed data of Palmerston North, New Zealand. Next, three improvements are suggested for the traditional model with a rolling window of six data points. A mixed initialization method is introduced to improve the model performance. Finally, the superiority of the novel model is discussed by comparing the accuracy of the AGO‐HDES model with other forecasting models. Results show that the AGO‐HDES model increased the performance of the traditional model by 10%. Also, the modified model performed 7% better than the other considered models while running three times faster.

Argentina (1.63 times), and Vietnam (1.6 times) for a remarkable increase in wind power from the preceding year. An overall summary of the total installed wind capacity is visualized in Figure 1.
Although wind energy is one of the rapidly evolving technologies, the main disadvantage of wind energy is its intermittent, stochastic, and non-stationary nature. The randomness in wind speed hinders the stability of wind energy, which leads to additional operating costs. Therefore, effective wind speed forecasts are needed to improve wind energy utilization in power systems and to reduce the additional reserve capacity.
Wind speed forecasting methods are mainly classified into five categories: persistence, physical, statistical, artificial intelligence/machine learning (AI/ML), and hybrid. A brief description of these models is given below, whereas detailed reviews are available in the literature. [3][4][5]
• The persistence method is also known as the naïve predictor. It assumes that the current wind speed is strongly correlated with the immediate future wind speed, so that v(t + Δt) = v(t). This method is only applicable for very short-term wind speed forecasts (0-30 min).
• Physical models are based on numerical weather predictions (NWP) as input. 6 These models require extensive information on various meteorological data and characteristics of wind farms. Therefore, physical models are computationally intensive and are appropriate for medium- (6-24 h) to long-term (>24 h) predictions, such as 12 h, 7 24 h, 8 48 h, 9 and 72 h. 10
• Statistical models map internal relationships among historical data. Such models are fast to calculate and easy to build. In addition, these models have strong linear estimation ability and are best suited for very short- to short-term (30 min to 6 h) wind forecasts. Commonly used statistical models include ARIMA, 11 exponential smoothing, 12 gray predictors, 13 and Markov chain 14 models.
• Artificial intelligence/machine learning (AI/ML) models are motivated by how the human brain would solve the problem. These algorithms learn from past data to predict future wind speeds through computer simulations considering logical thinking, reasoning, and group behavior. Also, AI/ML models have powerful feature extraction and nonlinear mapping abilities. Such models effectively deal with incomplete and/or uncertain data and are suitable for very short- to short-term forecasts. AI/ML models are further subcategorized into traditional and deep learning models. Commonly applied traditional models include ANN, 15 SVM, 14 and Fuzzy Logic, 16 while deep learning models include LSTM, 14 DBN, 17 CNN, 18 ESN, 19 and ELM. 20
• Hybrid models are the combination of two or more methods from other categories. Since a single predictive model does not fully capture the complex behavior of wind speed, a hybrid model can compensate for some of the deficiencies of individual models at the expense of additional complexity. Hybrid models can further be classified into four subcategories: weighted models, feature selection models, decomposition models, and error processing models. Some of the recent architectures include VMD-DE-ESN, 21 MODWT-ARIMA-Markov, 14 and ICEEMDAN-LSTM-GWO. 22

While analyzing the global distribution of wind capacity for the year 2020 (as shown in Figure 1), it is observed that more than 20% of countries have a maximum installed wind capacity of 5 MW. Also, wind power generation has recently emerged in many countries. For example, in Saudi Arabia, the first wind farm with a capacity of 400 MW started generating electricity in August 2021. 23 Hence, the availability of long-term historical data at various potential wind farm sites is limited. In these situations, forecasting models that use only a small sample size are needed to comprehensively address the uncertainty of raw data.
Literature studies of wind speed forecasting showed that grey models are the most applied method for small sample datasets. Satisfactory prediction accuracy can be achieved with a minimum dataset of 5-11. 13 For this reason, many researchers focused on improving the grey models for wind forecasting and proposed improved versions, including FOTP-GM(1,1), NDGM(1,1), and FAGM(1,1). 24 Other statistical models such as ARIMA and Markov chain models are applied for datasets of 15 25 and 200 14 samples, respectively. On the contrary, exponential smoothing (ES) models are rarely considered for small sample datasets.

| Previous studies on exponential smoothing model
The exponential smoothing (ES) forecasting models [26][27][28] have been popular since their development.
Forecasts generated from ES models are a weighted average of past observations, with previous values assigned exponentially decreasing weights. It means that the recent observations are given more weight as compared to the older ones. This simple structure provides fast and reliable forecasts for various applications, including wind speed predictions.
Standard ES models are based on the trend and seasonality of time series. These include simple ES, Holt's linear trend, Brown's linear trend, damped trend, and Holt-Winters ES methods. Most of these methods have been successfully applied in wind speed forecasting. 12,[29][30][31][32][33][34][35][36][37][38] However, as this study aims to analyze wind speed forecasts for a small dataset, only double exponential smoothing models (trend with no seasonality) are considered for further discussion.
Kusiak and Zhang 34 applied Holt's linear trend model for six-step-ahead wind speed forecasts at a resolution of 10 s. An evolutionary strategy algorithm was used to identify the optimal smoothing constants. The results showed that the prediction performance of the double exponential smoothing (DES) model is comparable to that of the artificial neural network. However, a dataset of 80,000 instances was considered for the analysis. Another DES method, Brown's DES model, was applied by Yang et al. 32 with a training dataset of 1500. The smoothing constant was set to 0.9 with no further discussion of the selection criteria. Furthermore, the DES method performed the worst among the compared models. Hybrid exponential smoothing models have also been considered to improve forecasting performance. These include DES-PSO-BPNN-Elman 31 and CEEMDAN-CC-FA-Holt-SVR, 36 which were applied to training datasets of 4500 and 1300, respectively. The optimal smoothing constants were evaluated based on the sum of squared errors of prediction. Results showed that the DES model is a better substitute for other statistical models. Besides custom programming, built-in software functions are also commonly used for the double exponential smoothing forecasting model. These include Minitab, 39 Eviews, 40 the forecast package of R, 41 and SAS. 42 Table 1 summarizes the studies implementing the DES model for wind speed forecasting.

| Limitations of previous studies
From Table 1, three major observations are summarized as follows:
• None of the studies considered the DES model for a small wind speed dataset, and hence, the problems associated with the small dataset are not analyzed.
• There is no specific method to initialize the level and trend. However, several studies suggest that a least-square estimate is the best choice. This is why some software packages, such as Minitab, 39 SAS, 42 and Eviews, 40 rely on least-square estimates. A linear regression model is fit in time to the available data as $\hat{v}_t = \hat{\beta}_{0,T} + \hat{\beta}_{1,T}\,t$. The y-intercept ($\hat{\beta}_{0,T}$) and slope ($\hat{\beta}_{1,T}$) of the trend line are then used as $L_0$ and $T_0$.
• A common approach to finding optimal values of smoothing constants is to search for the combination of parameters that minimizes the prediction errors. The values are constrained between 0 and 1. Furthermore, once the smoothing constants are evaluated, they remain constant for the rest of the dataset without recalibration.
To address the first observation, we analyze the wind speed data of Palmerston North, New Zealand. The summer season data were gathered from NIWA with a temporal resolution of 10 min. The traditional Holt's DES model is applied with a rolling window of 6, and the results are displayed in Figure 2. It is observed that the traditional model can predict negative values for a non-negative time series. For example, for the time series fragment V = {3.6, 2.9, 1.9, 1.8, 1.1, 0.5}, every combination of smoothing constants 0 < (α, γ) < 1 yields a negative prediction when the least-square estimate method is used to initialize the level and trend. This led our focus to the second observation, namely how to initialize the level and trend.
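The least-square initialization discussed here can be illustrated on the quoted fragment. The following is a minimal sketch, not the authors' implementation: the function name `least_squares_init` and the choice of time index t = 1, ..., n are assumptions made for illustration.

```python
def least_squares_init(values):
    """Fit v = b0 + b1 * t by ordinary least squares with t = 1..n.

    Returns (b0, b1), used here as the initial level L0 and trend T0.
    Starting the time index at 1 is an assumption of this sketch.
    """
    n = len(values)
    times = range(1, n + 1)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    s_tt = sum((t - t_mean) ** 2 for t in times)
    slope = s_tv / s_tt
    intercept = v_mean - slope * t_mean
    return intercept, slope


# Time series fragment quoted in the text (wind speeds in m/s).
fragment = [3.6, 2.9, 1.9, 1.8, 1.1, 0.5]
level0, trend0 = least_squares_init(fragment)
print(f"L0 = {level0:.4f}, T0 = {trend0:.4f}")
```

Under these assumptions the fitted line has a slope of about −0.6 m/s per step and falls below zero at t = 7, which suggests why a forecast that starts from this level and trend can become negative for a steadily decreasing window.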
In an early study, Makridakis and Hibon 43 analyzed seven methods to initialize the first forecast. Their major conclusion is that the type of initial values does not affect the accuracy of the prediction. Also, initializing by least squares provides satisfactory results. 44 However, this conclusion does not hold for a small dataset. As seen in Figure 2, the choice of the initial value is critical when the sample size is small. For the time series fragment considered above, the Holt function of the forecast package of R applied a convenient initial value method, whereas Minitab used a least-square estimate. With (α, γ) = (0.72, 0.01), both software packages predicted negative wind speeds, that is, −0.2259 and −0.1002 m/s, respectively. However, if some other combination of smoothing constants is considered with the zero-value initialization method, then the problem is rectified. This brought our attention to the third major observation, that is, calibrating smoothing constants at every instant.
From the analysis of Figure 2, it is observed that uncalibrated smoothing constants increase the chance of negative wind speed forecasts, because adding the latest observation does not change the parameters of the prediction model. Hence, unoptimized smoothing constants impose an arbitrary handicap on the forecasting model. 45 As depicted in the heat map of Figure 2, with the zero-value initialization method, more than half of the combinations of 0 < (α, γ) < 1 predict erroneous values, while the rest are within acceptable limits. Therefore, optimizing at each time origin is required rather than optimizing only once. 46

To the best of the authors' knowledge, these problems are not discussed in any wind forecasting literature. Therefore, in this study, we improve the forecasting performance of the traditional Holt's method by considering the above-discussed observations to avoid erroneous wind speed forecasts. The main contributions of this study are as follows:

TABLE 1 Studies implementing the DES model for wind speed forecasting
Studies: Niu et al., 31 Yang et al., 32 Zhang et al., 33 Kusiak and Zhang, 34 Prema and Rao, 35 Jiang et al. 36
Initial values

The remainder of this paper is organized as follows. In Section 2, the traditional models are defined. In Section 3, three improvements are suggested to enhance the forecasting accuracy. The performance of the novel hybrid model is discussed in Section 4, considering wind speed data of Palmerston North, New Zealand. Finally, conclusions are highlighted in Section 5.

FIGURE 2 Forecasting issues with the traditional Holt's double exponential smoothing model for a rolling window of 6

| Double exponential smoothing model
Double exponential smoothing models are preferred when data exhibit a trend. Two commonly used DES models are Brown's method and Holt's method.
Brown's method is an extension of the simple exponential smoothing model and uses a single smoothing constant α, with S(1) denoting the first-order smoothed series.

Different from Brown's method, the time series is divided into two components in Holt's method: level (L_t) and trend (T_t). The two components are updated using the smoothing constants α and γ. In Holt's method, the trend also updates itself via equation (5), expressed as the difference between the last two smoothed values. 47 Based on level and trend, the k-step-ahead forecast can then be evaluated.

In the damped model, a damping parameter scales the trend. If the damping parameter equals 1, then the damped model is identical to Holt's DES model.
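For reference, the recursions described in this section can be written in their standard textbook form. This is a sketch under assumptions: the symbol φ for the damping parameter, the hat notation for forecasts, and the superscripts S^(1) and S^(2) follow common usage and may differ from the authors' notation and equation numbering.

```latex
% Brown's double exponential smoothing (single constant \alpha)
\begin{align*}
S^{(1)}_t &= \alpha v_t + (1-\alpha)\,S^{(1)}_{t-1}, \\
S^{(2)}_t &= \alpha S^{(1)}_t + (1-\alpha)\,S^{(2)}_{t-1}, \\
\hat{v}_{t+k} &= \left(2S^{(1)}_t - S^{(2)}_t\right)
              + k\,\frac{\alpha}{1-\alpha}\left(S^{(1)}_t - S^{(2)}_t\right).
\end{align*}

% Holt's double exponential smoothing (level L_t, trend T_t)
\begin{align*}
L_t &= \alpha v_t + (1-\alpha)\left(L_{t-1} + T_{t-1}\right), \\
T_t &= \gamma\left(L_t - L_{t-1}\right) + (1-\gamma)\,T_{t-1}, \\
\hat{v}_{t+k} &= L_t + k\,T_t.
\end{align*}

% Damped-trend variant (damping parameter \phi; \phi = 1 recovers Holt)
\begin{align*}
L_t &= \alpha v_t + (1-\alpha)\left(L_{t-1} + \phi\,T_{t-1}\right), \\
T_t &= \gamma\left(L_t - L_{t-1}\right) + (1-\gamma)\,\phi\,T_{t-1}, \\
\hat{v}_{t+k} &= L_t + \left(\sum_{i=1}^{k}\phi^{i}\right) T_t.
\end{align*}
```

Setting φ = 1 in the last group reproduces Holt's recursions term by term, which matches the statement that the damped model then coincides with Holt's DES model.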

EXPONENTIAL SMOOTHING MODEL
For any of the above-mentioned DES models, forecasting is summarized in three major steps: initial value calculation, optimal smoothing constants selection, and applying forecasting equations (as discussed in Section 2). In this section, we discuss the improvements related to the first two steps. Hereafter, the traditional model with the first two improvements is termed as HDES.

| Improvement # 01: optimal smoothing constant
A common approach for finding the optimal values of smoothing constants (α, γ) is to explore the parameter combination that minimizes the sum or mean of squared errors of prediction. However, there are two issues with the traditional approach. Firstly, a combination of (α, γ) with the least MSE might still predict a negative wind speed. As an example, for the time series fragment V = {1.1, 0.8, 0.6, 0.2, 0.2, 0.3}, the best combination based on the least MSE is (α = 0.41, γ = 0.01). However, the predicted value is −0.0506 m/s. The same problem occurred with other rolling windows (N) as well, as shown in Table 2. One solution is to apply the Box-Cox transformation with λ = 0 to impose a positivity constraint.
Secondly, in most cases, the smoothing constants are constrained to 0 ≤ (α, γ) ≤ 1. As discussed for Figure 2, it is possible that none of the combinations is applicable for prediction, and thus, the traditional constraints limit the model applicability. Hyndman et al. 49 proved that the model is forecastable if 0 < α < 2 and 0 < γ < 4 − 2α (see condition 1 of Theorem 10.2 in ref. 49 ); however, the concept of admissible constraints is rarely used for wind speed predictions (see Table 1).
YOUSUF et al.
In this study, instead of identifying a single combination of (α, γ), the model stores a vector of all combinations. Initially, the smoothing constants are considered between 0 < (α, γ) < 1 with a step size of 0.1. Therefore, a vector of 81 combinations is stored in memory. If a particular combination predicts a negative wind speed, it is discarded from the primary vector. The final vector is then sorted by the least MSE value to identify the ascending series of combinations. However, if all the combinations predict erroneous values, then admissible constraints are considered to find the optimal smoothing constants in the next step.
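The bookkeeping described in this paragraph can be sketched as follows. This is an illustrative outline, not the authors' code: the function names are hypothetical, the forecasting model is passed in as a caller-supplied `evaluate` function, and reading the admissible region as 0 < α < 2 and 0 < γ < 4 − 2α is an assumption about which symbols the quoted bounds apply to.

```python
def traditional_grid(step=0.1):
    """All (alpha, gamma) pairs strictly inside (0, 1) on a regular grid."""
    n = round(1 / step)
    values = [round(i * step, 10) for i in range(1, n)]
    return [(alpha, gamma) for alpha in values for gamma in values]


def is_admissible(alpha, gamma):
    """Assumed admissible region: 0 < alpha < 2 and 0 < gamma < 4 - 2 * alpha."""
    return 0 < alpha < 2 and 0 < gamma < 4 - 2 * alpha


def rank_candidates(candidates, evaluate):
    """Keep pairs whose forecast is non-negative, sorted by ascending MSE.

    `evaluate(alpha, gamma)` is supplied by the caller and must return a
    (forecast, mse) tuple for that pair of smoothing constants.
    """
    scored = []
    for alpha, gamma in candidates:
        forecast, mse = evaluate(alpha, gamma)
        if forecast >= 0:          # discard combinations with negative forecasts
            scored.append((mse, alpha, gamma))
    scored.sort()                  # least MSE first
    return [(alpha, gamma) for _, alpha, gamma in scored]


grid = traditional_grid(0.1)
print(len(grid))  # number of stored combinations for a step size of 0.1
```

If `rank_candidates` returns an empty list, a second grid restricted by `is_admissible` could be ranked in the same way, mirroring the fallback to admissible constraints mentioned in the text.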

| Improvement # 02: Initial Value calculation
Consider the simplest form of the exponential smoothing model with no trend. Expanding its recursion shows that the final forecast ($v_t$) is influenced by the initial value ($v_0$) through a factor $(1 - \alpha)^t$. Therefore, for a small sample dataset, the estimation of initial values has high relevance to the final forecast.
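The dependence on the initial value can be made explicit by unrolling the simple exponential smoothing recursion. The derivation below is a sketch in assumed notation, with a hat marking smoothed values and $\hat{v}_0$ denoting the initial value.

```latex
\begin{align*}
\hat{v}_t &= \alpha v_t + (1-\alpha)\,\hat{v}_{t-1} \\
          &= \alpha v_t + \alpha(1-\alpha)\,v_{t-1} + (1-\alpha)^2\,\hat{v}_{t-2} \\
          &\;\;\vdots \\
          &= \alpha \sum_{j=0}^{t-1} (1-\alpha)^{j}\, v_{t-j} + (1-\alpha)^{t}\,\hat{v}_0 .
\end{align*}
```

For a window of six observations, the weight on the initial value is $(1-\alpha)^6$: about 0.53 for α = 0.1 but only about 0.016 for α = 0.5. With so few observations and a small smoothing constant, the initial value therefore still carries a large share of the forecast.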
Different methods are identified in the literature to initialize the level and trend. A general approach is to obtain them by least-square estimates. A linear regression model is fit in time to the available data as $\hat{v}_t = \hat{\beta}_{0,T} + \hat{\beta}_{1,T}\,t$, where $\hat{\beta}_{0,T}$ and $\hat{\beta}_{1,T}$ are taken as the initial values $L_0$ and $T_0$.
Other methods include convenient initial values, zero values, and first data point. In this study, we considered eight different initialization methods, including least-square estimates. Such meth-

This study introduces a mixed initial value method. The first initialization method is applied to identify the optimal smoothing constants, while the second method is used for final forecasts.

| Improvement # 03: first-order accumulated generating operation
The third modification is to apply the first-order accumulated generating operation (1-AGO) to the original time series as:

$v^{(1)}(k) = \sum_{i=1}^{k} v^{(0)}(i), \quad k = 1, 2, 3, \cdots, n$

Once the prediction is completed, the forecasting series can be obtained by inverse 1-AGO as:

$\hat{v}^{(0)}(k) = \hat{v}^{(1)}(k) - \hat{v}^{(1)}(k-1)$

Based on the above improvements, the modified model is summarized as follows:
1. Apply 1-AGO on the original wind speed sequence to obtain V(1).
2. Select the first method to initialize level and trend. This method is used to obtain the optimal smoothing constants.
3. Calculate the mean squared error for all combinations of smoothing constants under the traditional constraints, that is, 0 < (α, γ) < 1 with a step size of 0.1, store the results in a matrix, and sort the matrix in ascending order of MSE. For more optimized results, a smaller step size such as 0.01 can also be used at the expense of higher computational time and storage.
4. Select the second method to initialize level and trend, and then compute the predicted value using the DES model, following the order of the sorted smoothing constant matrix.
5. Apply inverse 1-AGO on the forecasting sequence.
6. If the model forecasts a negative wind speed, discard the combination and use the next combination for prediction.
7. If none of the combinations is applicable, then apply admissible constraints, that is, 1 < α < 2, to find the best order and repeat steps 3 to 6.
The overall methodology of the prediction model is presented in Figure 3.
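Steps 1 to 6 of the procedure can be sketched in code as follows. This is a hedged illustration rather than the authors' MATLAB implementation: Holt's recursions are written in their standard textbook form, the two initializers `both_zero` and `zero_level_first_difference` are hypothetical stand-ins for the paper's initialization methods, the inverse 1-AGO is taken as the difference of consecutive fitted accumulated values, and step 7 is not covered.

```python
from itertools import accumulate


def ago(series):
    """First-order accumulated generating operation (running sum)."""
    return list(accumulate(series))


def holt_run(x, alpha, gamma, level0, trend0):
    """Run Holt's recursions over x.

    Returns (next-step forecast, fitted value for the last point, in-sample MSE).
    """
    level, trend = level0, trend0
    squared_error = 0.0
    fitted = level + trend
    for obs in x:
        fitted = level + trend               # forecast made before seeing obs
        squared_error += (obs - fitted) ** 2
        new_level = alpha * obs + (1 - alpha) * fitted
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
    return level + trend, fitted, squared_error / len(x)


def ago_hdes_forecast(v, rank_init, forecast_init, step=0.1):
    """One-step-ahead forecast following steps 1 to 6 (step 7 is left out)."""
    x = ago(v)                                               # step 1
    n = round(1 / step)
    grid = [(i * step, j * step) for i in range(1, n) for j in range(1, n)]
    level0, trend0 = rank_init(x)                            # step 2
    ranked = sorted(                                         # step 3
        grid, key=lambda p: holt_run(x, p[0], p[1], level0, trend0)[2]
    )
    level0, trend0 = forecast_init(x)                        # step 4
    for alpha, gamma in ranked:
        x_next, x_last, _ = holt_run(x, alpha, gamma, level0, trend0)
        v_next = x_next - x_last                             # step 5: inverse 1-AGO
        if v_next >= 0:                                      # step 6
            return v_next, (alpha, gamma)
    return None, None  # every pair rejected; admissible constraints not sketched


# Hypothetical stand-ins for two initialization methods (level, trend).
def both_zero(x):
    return 0.0, 0.0


def zero_level_first_difference(x):
    return 0.0, x[1] - x[0]


window = [1.1, 0.8, 0.6, 0.2, 0.2, 0.3]   # fragment quoted earlier in the text
forecast, constants = ago_hdes_forecast(window, both_zero, zero_level_first_difference)
print(forecast, constants)
```

The function returns either a non-negative forecast together with the smoothing constants that produced it, or `None` when every pair on the traditional grid is rejected, which is the point where the admissible constraints of step 7 would be applied.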

| Site description
New Zealand has 17 wind farms operating with a total installed capacity of 690 MW. They supply about 6% of New Zealand's annual electricity production, which roughly corresponds to the electricity consumption of 300,000 households per year. More than 40% of this wind capacity is installed near Palmerston North. Tararua wind farm is the largest (161 MW) and is 10 km from Palmerston North. Other operating wind farms near the city are Te Apiti Wind Farm (90 MW) and Te Rere Hau Wind Farm (48.5 MW). In the near future, Turitea wind farm, planned approximately 10 km southeast of Palmerston North, will be the largest wind farm in New Zealand (222 MW). 50

Palmerston North (40.382°S, 175.609°E) experiences frequent strong winds (>8.6 m/s), with west-northwest as the predominant direction. From the analysis of the recorded data, the seasonal percentage of strong winds is 37%, 26%, 19%, and 18% in spring, summer, winter, and autumn, respectively. 51

In this study, wind speed data from December 1, 2017 to February 28, 2018 are considered. 52 The collected data are first preprocessed to identify any anomalies using box-plot analysis. A total of 30 anomalous values were found and imputed using Piecewise Cubic Hermite Interpolating Polynomials (PCHIP). Next, the finalized observations are divided into training and testing datasets with a ratio of 7:3. Thus, the training dataset contains 9072 observations, whereas the testing dataset has 3888 observations.

| Mixed initialization method
The results of the mixed initialization method for HDES, damped HDES, and AGO-HDES are tabulated in Tables 3-5. First initialization is applied to evaluate the optimal smoothing constants, whereas the latter is used for wind speed forecast. Four categories of initial values are considered in the analysis. These include zero level and non-zero trend (IV4 and IV8), non-zero level and zero trend (IV3 and IV7), both zero (IV2) and both non-zero (IV1, IV5, and IV6).
Results show that the mixed method strategy improves the model performance. The best initial values for evaluating the optimal smoothing parameters are IV2 and IV8 for all three models. Similarly, the best initialization methods for the final forecast are IV3 and IV7 for HDES; IV2, IV3, and IV7 for damped HDES; and IV1 and IV8 for AGO-HDES. More specifically, a zero trend is the best choice for the original time series, whereas a zero level provides the best forecast for the AGO time series, because the AGO time series always has an upward trend.
Comparing the HDES and damped HDES models, it is observed that the damped model provides about three times as many suitable initialization combinations as the HDES model. Among 64 combinations, HDES provides four options with the least MSE, while the damped HDES model has 13 appropriate options for the same case study.
Comparing all three models, it is observed that the novel AGO-HDES is less affected by an incorrect selection of the initialization method. The MSE values for AGO-HDES range between 0.3705 and 3.9469, whereas for HDES and damped HDES, the MSE varies from 0.3839 to 371.16 and from 0.3762 to 22.59, respectively. Therefore, it is concluded that the novel AGO-HDES model is a better option than the HDES and damped HDES models.

FIGURE 3 Proposed framework for AGO-HDES forecasting model
It should be noted that this conclusion is only applicable to small sample datasets. For larger rolling windows, the initialization method does not affect the accuracy of the prediction, as shown in Figure 4. This is consistent with the conclusion of Makridakis and Hibon. 43

concludes that forecasts are more responsive to recent levels and less influenced by trend estimates. Figure 5 is a screenshot of the sample real-time forecasting video. The complete video is provided as Video S1.

This study evaluates the forecasting models' performance based on the commonly used mean absolute error (MAE) and root mean square error (RMSE). Furthermore, the enhancement of the traditional model is estimated through the improving ratio. The formulations of the three metrics are given in equations 15 to 17.
where $\hat{v}_t$ and $v_t$ are the predicted and measured wind speeds at instant t, n is the number of observations, and I is a general index for comparison.

Table 6 shows the performance of the forecasting models. Holt's DES method is better than Brown's DES method because Holt's method updates both level and trend. However, the traditional DES methods are less accurate than other statistical models. The proposed modifications enhanced the prediction accuracy. Analysis shows that the mixed initialization method increased the performance of the traditional model by 10%. The MAE and RMSE values of the traditional DES model dropped to 0.452 m/s and 0.608 m/s from 0.509 m/s and 0.678 m/s, respectively. Similarly, the modified models performed 7% better than the GM (1,1) and Markov chain models. The accuracy measures show that AGO-HDES performed the best among all models, with the least MAE and RMSE values of 0.452 m/s and 0.609 m/s. With the lowest RMSE, the AGO-HDES model captures the intermittency of small sample size wind speed in finer detail with fewer outlier errors. Furthermore, Figure 6 shows the scatter plot between measured and predicted wind speed by the AGO-HDES model for the one-step-ahead forecast. The correlation coefficient of 0.96 also confirms the strong correlation between measured and predicted wind speed.
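The three evaluation metrics can be sketched as below. MAE and RMSE follow their standard definitions; the form of the improving ratio (relative reduction of an error index, in percent) is an assumption, as are the function names.

```python
import math


def mae(predicted, measured):
    """Mean absolute error between predicted and measured wind speeds."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


def rmse(predicted, measured):
    """Root mean square error between predicted and measured wind speeds."""
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))


def improving_ratio(index_reference, index_modified):
    """Relative reduction of an error index in percent (assumed form)."""
    return (index_reference - index_modified) / index_reference * 100.0


# Error values quoted in the text for the traditional and modified DES models.
print(round(improving_ratio(0.509, 0.452), 1))  # MAE
print(round(improving_ratio(0.678, 0.608), 1))  # RMSE
```

With the quoted values, this assumed form gives reductions of roughly 11.2% in MAE and 10.3% in RMSE, in line with the improvement of about 10% reported for the traditional model.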
In addition, the best performance is also accompanied by a very short computational time. All models were programmed in MATLAB on a quad-core Intel i5 processor (1.70 GHz) with 16 GB RAM. For each model, the test was run ten times, and the average computational time was considered. Table 7 shows that the AGO-HDES model is almost three times faster than the GM (1,1) model. Therefore, the proposed model is practically applicable for wind speed forecasts.

| CONCLUSION
The performance of most of the existing data-driven wind speed forecasting models decreases considerably for small datasets. Also, long-term wind speed historical data are not always available for newly identified sites. Therefore, in this study, a hybrid first-order accumulated generating operation-based double exponential smoothing model is proposed for small-size wind datasets. The generalized conclusions are as follows:
• The traditional model might predict negative wind speed values for two reasons. Firstly, the most common initialization method is the least-square estimate, which is biased for a small-size dataset. Secondly, a combination of smoothing constants with the least MSE might cause erroneous wind speed predictions.
• In comparison, we proposed three modifications and named the model AGO-HDES. Firstly, the model stores a vector of all combinations of smoothing constants instead of identifying a single combination. If a particular combination predicts a negative wind speed, it is discarded from the primary vector. The final vector is then sorted by the least MSE value to identify the ascending series of combinations. Secondly, a mixed initial value method is introduced. Instead of considering a single initial value throughout the process, the first initialization method is applied to identify the optimal … GM (1,1) and Markov chain models.
• The AGO-HDES model is almost three times faster than the GM (1,1) model with the same rolling window.
The novel AGO-HDES model is well-suited for short-term wind speed forecasts. However, the problem of trend lag is not eliminated entirely, and further study is needed in this regard. Dixit et al. 53 addressed the issue of "prediction lag" or "timing error" in wave height forecasting by proposing a multilevel neuro-wavelet technique. The central idea of the study is to apply wavelet transformation multiple times such that the correlation is removed. Therefore, decomposition-based hybrid models are required to address the same issue in wind speed forecasting. Similarly, a dataset of only six values was considered, with a resolution of 10 min, to predict one-step-ahead wind speed. Hence, the cyclic component is not considered. Therefore, an adaptive higher order exponential smoothing model will also be considered in the future for low-resolution data to address the diurnal effects of wind speed.

FIGURE 5 One step ahead wind speed forecast for Markov Chain, GM (1,1), Traditional DES and AGO-HDES models

TABLE 6 Performance evaluation of the forecasting models
Here α′ is the weighting factor.

Bold indicates best options of initialization method based on MSE criterion.