Fuzzy ARTMAP and GARCH-based hybrid model aided with wavelet transform for short-term electricity load forecasting

With the evolution of the electricity market into a restructured, smart market, load forecasting has emerged as a prominent research domain. Many models have been proposed for electricity price and load forecasting. This paper models a load time series with a hybrid technique that combines GARCH, a conventional hard computing method; Fuzzy ARTMAP, an artificial intelligence-based soft computing technique; and the wavelet transform, which is used to pre-process the load time series. The study investigates the ability of the proposed hybrid model to tackle electricity load time series forecasting problems. It also compares the proposed model with models that use only one or two of these techniques. The results confirm the efficacy of the proposed model over the others.


Introduction
Forecasting [1,2], knowingly or unknowingly, holds an integral place in every person's life. We often predict the stock market, earthquakes, and the weather in our day-to-day life. A business manager forecasts product sales [3]. People make future plans based on these forecasts. It should be understood that making exact forecasts is impossible; we can only work steadily towards higher accuracy. Forecasting generally presumes that future occurrences depend upon past or present observable events; it assumes that some aspects of the past pattern will continue in the future. By observing and studying past data, relationships can be established between the event and the parameters prevailing at that moment [4].
Forecasting with load time series is a challenging application because load time series are inherently nonstationary [5], deterministically chaotic, and highly noisy. Moreover, no technique can determine the future behavior of the electricity market from past information alone. Therefore, to maintain global competitiveness, the market's dependence on advanced computer technologies is increasing day by day.
Forecasting can be either spatial or temporal in nature. While spatial forecasts are based on the area covered, temporal forecasts are based on the time horizon used. Temporal forecasting can be further divided into three subcategories: short term, medium term, and long term [6]. Of these, short-term forecasting is the most important because of its utility: short-term forecasts help plan capacity building, estimate load flows, and prevent overloading, to name a few uses. Forecasting techniques primarily use either statistical tools [7,8] or artificial intelligence-based algorithms [9].
In this study, we present a new method of forecasting electricity loads using Fuzzy ARTMAP (FA) and GARCH along with wavelet transform (WT) for day-ahead forecasting. The approach aims to develop a sturdy, precise, and efficient day-ahead load forecasting tool that combines data filtering, employing WT, with a computational model that jointly implements FA (a soft computing model) and GARCH (a hard computing model). Comparison of the proposed model's performance with that of models utilizing only FA and utilizing both FA and GARCH shows a convincing reduction in mean absolute percentage error (MAPE). A simple artificial neural network (ANN) model using the same data has also been implemented to widen the comparison and highlight the advantage of the hybrid model over ANN.

Modeling and Analysis
The proposed forecasting procedure can be summed up in three steps: (1) decomposition of the past load series using WT; (2) forecasting with an FA model fitted to the approximation series and two of the detail series produced by WT, and with a GARCH model fitted to the remaining detail series; (3) reconstruction of the forecasted load series using the inverse WT.
The rest of the paper is arranged as follows: Section II briefly details the WT, FA, and GARCH techniques. Section III elaborates on the implemented hybrid methodology. Section IV presents numerical and graphical results, and Section V concludes the study. Any load data series embodies numerous spikes, nonlinearities, and fluctuations. The 2010 hourly New South Wales electricity load series is no exception. As illustrated in Figure 1, it is also characterized by chaotic and random changes. This series of 2148 h has been utilized in this study: the hourly data of 92 days (3 months) has been used to train the network, and the data of the next 24 h has been reserved as the validation set, which is also the predicted subset.

Wavelet Transform
Wavelet transform is a mathematical tool which transforms the original load series (in the time domain) into constituent subseries at different scales for processing and analysis. WT is well suited to nonstationary data (series whose mean and autocorrelation are not constant). Most load data series are nonstationary, hence the utility of WT [10]. The WT is used to decompose the original load series into several other series at different resolution levels, which is called multiresolution decomposition [11,12].
Fourier transform (FT) decomposes the original load series into a linear combination of sine and cosine functions, whereas WT decomposes the series into a sum of more flexible functions which are localized in both time and frequency [13].
Wavelet transform can be classified into two types: continuous wavelet transform (CWT) and discrete wavelet transform (DWT).
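The CWT of a continuous time signal x(t) is defined as [4,6]:
$$\mathrm{CWT}(a, b) = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}^{*}(t)\, dt \qquad (1)$$
where $\psi_{a,b}(t)$ is derived from the mother wavelet $\psi(t)$ as given by eq. (2), $*$ denotes complex conjugation, a acts as a scaling parameter, and b as a translating parameter.
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) \qquad (2)$$
Each wavelet is formulated by scaling and translating the mother wavelet. The mother wavelet is an oscillatory function characterized by zero average and finite energy.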
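The DWT of the sampled series x(t), t = 0, ..., T - 1, takes the usual dyadic form:
$$\mathrm{DWT}(c, d) = 2^{-c/2} \sum_{t=0}^{T-1} x(t)\, \psi\!\left(\frac{t - d\,2^{c}}{2^{c}}\right) \qquad (3)$$
where
$$a = 2^{c}, \qquad b = d\,2^{c} \qquad (4)$$
Here, c acts as a scaling coefficient while d acts as a sampling one.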
To implement DWT as a filter, Mallat proposed an algorithm called Mallat multiresolution analysis, or the Mallat algorithm [12]. It is a two-stage algorithm in which decomposition occurs in the first stage, followed by reconstruction in the second. This study implements a three-level decomposition of the original load series, yielding three detail series (D) and one approximation series (A), as illustrated in Figure 2. Both decomposition and reconstruction involve filtering, for which high-pass (HPF) and low-pass (LPF) filters are used. Down-sampling occurs during wavelet decomposition, while up-sampling and filtering are used in wavelet reconstruction. A Daubechies wavelet function of order 5 (db5) has been utilized in this study as the mother wavelet.
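The two-stage Mallat scheme described above can be illustrated with a minimal sketch. For brevity it substitutes the Haar wavelet for db5 (an assumption made only to keep the filters to two taps), and the 48-hour load series is synthetic and hypothetical, not the New South Wales data. The function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One analysis step: low-pass / high-pass filtering, then down-sampling by 2."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """One synthesis step: up-sampling and filtering, merging A and D."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

def decompose(signal, levels=3):
    """Return (A_levels, [D1, ..., D_levels]) for a series whose length is divisible by 2**levels."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

def reconstruct(approx, details):
    """Invert decompose(): rebuild the series from A_levels and the detail series."""
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx

# Hypothetical synthetic hourly load (daily cycle plus a small ripple).
load = [8000 + 500 * math.sin(2 * math.pi * h / 24) + 40 * (h % 5) for h in range(48)]
a3, (d1, d2, d3) = decompose(load, levels=3)
rebuilt = reconstruct(a3, [d1, d2, d3])
```

Each level halves the series length, so D1, D2, D3, and A3 here contain 24, 12, 6, and 6 coefficients, and reconstruction recovers the original series up to rounding error.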

Fuzzy ARTMAP
The FA network is a supervised learning method based on adaptive resonance theory (ART) [14]. The FA network learns without forgetting previously learned information [15]. FA is flexible, adaptive to changes in the environment, and self-organizing by nature [16]. It is a recent technique that has been utilized in forecasting applications, including load forecasting.
Neural networks are another popular artificial intelligence technique used in forecasting applications. Most neural networks struggle with the plasticity-stability dilemma: how a network can remain adaptive, or plastic, toward new inputs while staying unaffected by noisy inputs, and hence stable [17,18]. A general neural network has difficulty preserving previously learned knowledge while learning newer concepts. FA addresses this dilemma with a feedback mechanism between the competitive and input layers that allows fresh concepts to be absorbed without losing previously attained knowledge. This results in a more stable learning environment with faster convergence than traditional soft computing techniques [19], which is also confirmed by the results of this study. These properties of FA can improve load forecasting performance, as load series data are highly stochastic by nature (Figure 1).
The functional layout of the FA network is shown in Figure 3. An ARTMAP system embodies twin ART modules (ART a and ART b) that build stable recognition categories in response to arbitrary input patterns. The original ARTMAP builds these modules from ART 1, a type of ART network which accepts only binary input, whereas FA builds them from fuzzy ART. Each binary operation of ART 1 is replaced by its counterpart in fuzzy ART; for example, the intersection operator (∩) of ART 1 is replaced by the MIN operator (∧) in fuzzy ART. The FA architecture is thus achieved by the synthesis of fuzzy logic and the ART neural network, exploiting a close formal similarity between the computations of fuzzy subsets and those of ART categories. FA also realizes a min-max learning rule that jointly minimizes predictive error and maximizes generalization, or code compression. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units," needed to meet the accuracy criterion. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding [15] leads to a symmetric theory in which the AND operator (∧) and the OR operator (∨) of fuzzy logic play complementary roles. In training, the best matching category is [19]:
$$J = \arg\max_{0 \le j \le N} T_j(I_{tr}) \qquad (5)$$
where
$$T_j(I_{tr}) = \frac{|I_{tr} \wedge W_j|}{\alpha + |W_j|} \qquad (6)$$
Here $I_{tr}$ is the training input vector, $W_j$ the weight vector of category j, $T_j$ the choice function, $\alpha$ the choice parameter, $\wedge$ the fuzzy MIN operator, $|\cdot|$ the sum of the components of a vector, $\rho$ the vigilance parameter, and $|I_{tr} \wedge W_J| / |I_{tr}| \ge \rho$ the vigilance criterion. If the vigilance criterion is satisfied, resonance occurs. During training, the vigilance varies from the baseline vigilance, which is its initial value.
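The category choice and vigilance test described above can be sketched in a few lines. This is a minimal illustration under assumptions: the helper names, the two-category weight set, and the parameter values are hypothetical, and the norm is taken as the sum of components, as is usual for fuzzy ART.

```python
def complement_code(a):
    """Complement coding: a in [0, 1]^M becomes I = (a, 1 - a), so |I| = M for every input."""
    return list(a) + [1.0 - x for x in a]

def fuzzy_and(x, y):
    """Fuzzy MIN operator, applied component-wise."""
    return [min(p, q) for p, q in zip(x, y)]

def norm1(x):
    """Sum of components (inputs and weights are non-negative)."""
    return sum(x)

def choice(i_tr, w_j, alpha=0.001):
    """Choice function T_j = |I ^ W_j| / (alpha + |W_j|)."""
    return norm1(fuzzy_and(i_tr, w_j)) / (alpha + norm1(w_j))

def passes_vigilance(i_tr, w_j, rho):
    """Vigilance criterion |I ^ W_j| / |I| >= rho."""
    return norm1(fuzzy_and(i_tr, w_j)) / norm1(i_tr) >= rho

def select_category(i_tr, weights, rho, alpha=0.001):
    """Return the index of the highest-choice category that passes vigilance, or None."""
    ranked = sorted(range(len(weights)),
                    key=lambda j: choice(i_tr, weights[j], alpha),
                    reverse=True)
    for j in ranked:
        if passes_vigilance(i_tr, weights[j], rho):
            return j
    return None  # no resonance: a new category would have to be created

# Hypothetical two-feature input and two existing categories.
i_tr = complement_code([0.2, 0.7])
weights = [[0.25, 0.65, 0.70, 0.30],
           [0.90, 0.10, 0.05, 0.80]]
winner_loose = select_category(i_tr, weights, rho=0.80)
winner_strict = select_category(i_tr, weights, rho=0.95)
```

With the looser vigilance the first category resonates; with the stricter value neither category matches closely enough, which is the situation in which a new category would be committed.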
If the vigilance criterion is met, category J becomes the representative membership function for the time series, and the weight vector of the winning category, $W_J$, is updated as:
$$W_J^{new} = \beta\,(I_{tr} \wedge W_J^{old}) + (1 - \beta)\,W_J^{old} \qquad (7)$$
Here β represents the learning rate. If the vigilance criterion fails, category J is deactivated for the present load series by setting its choice function to zero. If the prediction made through ART a does not match the correct output at ART b, the vigilance parameter is increased. This is called match tracking, in which the vigilance parameter is raised slightly to a new value [17]:
$$\rho = \frac{|I_{tr} \wedge W_J|}{|I_{tr}|} + \varepsilon \qquad (8)$$
where ε denotes the learning precision.
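As a rough illustration of the learning rule and of match tracking, the sketch below applies both to one hypothetical complement-coded input and one hypothetical weight vector. The function names and numeric values are assumptions chosen for readability.

```python
def fuzzy_and(x, y):
    """Fuzzy MIN operator, applied component-wise."""
    return [min(p, q) for p, q in zip(x, y)]

def update_weight(i_tr, w_old, beta=1.0):
    """Learning rule: W_new = beta * (I ^ W_old) + (1 - beta) * W_old."""
    matched = fuzzy_and(i_tr, w_old)
    return [beta * m + (1.0 - beta) * w for m, w in zip(matched, w_old)]

def match_track(i_tr, w_j, epsilon=0.001):
    """Raise vigilance just above the current match value so that category J is rejected."""
    return sum(fuzzy_and(i_tr, w_j)) / sum(i_tr) + epsilon

# Hypothetical complement-coded input and winning-category weights.
i_tr = [0.2, 0.7, 0.8, 0.3]
w_old = [0.25, 0.65, 0.70, 0.30]

w_fast = update_weight(i_tr, w_old, beta=1.0)    # fast learning
w_slow = update_weight(i_tr, w_old, beta=0.5)    # slower recoding
w_frozen = update_weight(i_tr, w_old, beta=0.0)  # learning switched off
rho_new = match_track(i_tr, w_old, epsilon=0.001)
```

Because the update takes a component-wise minimum, weights can only shrink or stay the same, which is why learned categories remain stable as new inputs arrive. The raised vigilance `rho_new` exceeds the match value of the rejected category by exactly `epsilon`.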
The scheme resizes a category by raising the vigilance parameter ρ by the minimal amount needed to correct the predictive error at ART b. The parameter ρ is inversely related to category size: a lower value leads to broader, more general categories and higher code compression. This parameter sets the minimum confidence that ART a must have in a category in order to accept it, rather than searching for a new cluster through hypothesis testing. A predictive failure raises ρ to the threshold value that triggers a new search in ART a, a process called match tracking. Match tracking thus sacrifices only as much generalization as is needed to correct a predictive error. The combination of these techniques, i.e. the ARTMAP category choice $J = \arg\max_{0 \le j \le N} T_j(I_{tr})$ and match tracking, leads to faster learning and to learning from rare events. Fuzzy ART reduces to ART 1 for a binary input and works as itself for an analog vector. Thus the crisp logic of ART 1 together with its fuzzy counterparts forms a potent module.
Once the training stage is completed, the FA network is used as a classifier of the input load series, which is given to ART a. ART b is not used during classification, and the learning capability of the network is deactivated (i.e. β = 0). In this stage, predicted class labels are obtained at the output of ARTMAP. These class labels are later de-fuzzified to obtain the forecasted loads.
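The classification stage might look like the following sketch. The text does not specify the de-fuzzification rule, so two things here are assumptions: each class label is taken to denote one of several equal-width load intervals and is mapped back to the interval centre, and the vigilance test is skipped at this stage. All names and values are hypothetical.

```python
def fuzzy_and(x, y):
    """Fuzzy MIN operator, applied component-wise."""
    return [min(p, q) for p, q in zip(x, y)]

def classify(i_tr, weights, labels, alpha=0.001):
    """Return the label of the category with the largest choice value; no weights change (beta = 0)."""
    def choice(j):
        return sum(fuzzy_and(i_tr, weights[j])) / (alpha + sum(weights[j]))
    best = max(range(len(weights)), key=choice)
    return labels[best]

def defuzzify(label, low, high, n_bins):
    """Assumed rule: label k denotes the k-th of n_bins equal-width load intervals; return its centre."""
    width = (high - low) / n_bins
    return low + (label + 0.5) * width

# Hypothetical trained categories, each mapped to a load-interval label.
weights = [[0.25, 0.65, 0.70, 0.30],
           [0.90, 0.10, 0.05, 0.80]]
labels = [3, 7]

i_tr = [0.2, 0.7, 0.8, 0.3]  # complement-coded input
predicted_label = classify(i_tr, weights, labels)
forecast_mw = defuzzify(predicted_label, low=6000.0, high=10000.0, n_bins=10)
```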

GARCH
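GARCH stands for Generalized Autoregressive Conditional Heteroskedasticity and is used to model observed time series. GARCH is effective for highly volatile time series driven by unexpected random effects [20,21]. The GARCH(p, q) model is defined as:
$$x_t = \mu + \varepsilon_t \qquad (9)$$
where μ is the offset and $\varepsilon_t = \sigma_t z_t$, with $z_t$ an independent, identically distributed process of zero mean and unit variance.
Considering a time series $x_t$ with a constant mean offset [4]:
$$\sigma_t^2 = \kappa + \sum_{i=1}^{p} G_i\,\sigma_{t-i}^2 + \sum_{j=1}^{q} A_j\,\varepsilon_{t-j}^2 \qquad (10)$$
where p is the order of the GARCH terms $\sigma^2$ and q is the order of the ARCH terms $\varepsilon^2$; κ denotes the constant term, and $G_i$ and $A_j$ denote the GARCH and ARCH coefficients, respectively.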
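The conditional variance recursion of a GARCH(p, q) model can be sketched as follows. The coefficient names (`kappa`, `garch_coefs`, `arch_coefs`), their values, and the short data series are illustrative assumptions; in practice the coefficients would be estimated from the data, for example by maximum likelihood, which this sketch does not attempt.

```python
def garch_variance(residuals, kappa, garch_coefs, arch_coefs, sigma2_init):
    """Conditional variances: sigma2_t = kappa + sum_i G_i * sigma2_{t-i} + sum_j A_j * eps_{t-j}**2."""
    p, q = len(garch_coefs), len(arch_coefs)
    start = max(p, q)
    sigma2 = [sigma2_init] * start  # seed the first values with an initial variance
    for t in range(start, len(residuals)):
        value = kappa
        value += sum(g * sigma2[t - i] for i, g in enumerate(garch_coefs, start=1))
        value += sum(a * residuals[t - j] ** 2 for j, a in enumerate(arch_coefs, start=1))
        sigma2.append(value)
    return sigma2

# Hypothetical series with a constant mean offset mu.
x = [10.0, 10.5, 9.2, 10.1, 11.0, 9.7, 10.4, 10.9]
mu = sum(x) / len(x)
eps = [value - mu for value in x]               # residuals eps_t = x_t - mu
sample_var = sum(e * e for e in eps) / len(eps)

kappa, garch_coefs, arch_coefs = 0.05, [0.8], [0.1]  # assumed GARCH(1, 1) coefficients
sigma2 = garch_variance(eps, kappa, garch_coefs, arch_coefs, sample_var)

# One-step-ahead variance forecast from the last variance and residual.
sigma2_next = kappa + garch_coefs[0] * sigma2[-1] + arch_coefs[0] * eps[-1] ** 2
```

Passing an empty `garch_coefs` list leaves only the squared-residual terms in the recursion, which corresponds to a pure ARCH(q) model.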
As can be seen in eq. (10), if p = 0 the GARCH(p, q) model reduces to an ARCH(q) model. A limitation of the GARCH model is that it can only be specified for stationary time series; hence the following condition must be satisfied:
$$\sum_{i=1}^{p} G_i + \sum_{j=1}^{q} A_j < 1 \qquad (11)$$
where $G_i$ and $A_j$ are the coefficients of the GARCH terms $\sigma^2$ and the ARCH terms $\varepsilon^2$, respectively. The steps for GARCH modeling follow [21,22].

Proposed Methodology
Figure 4 presents the schematic diagram of the proposed hybrid model for day-ahead electricity load forecasting, built on the FA technique combined with GARCH and WT. The procedure for forecasting is as follows:
1. In the proposed hybrid model, the input variables are hourly electricity load, month, day, day of week, hour, previous-week same-hour load, and previous-day same-hour load. Only the load series is passed through WT, which decomposes it into four components: the detail coefficients D1, D2, and D3 (high-frequency components) and the approximation series A3 (low-frequency component), obtained by filtering with the HPF and LPF, respectively, followed by down-sampling.
2. The decomposed load series D1, D2, and A3, along with the other input data, are fed into the FA network, and the decomposed load series D3 is fed into GARCH.
3. The output components of GARCH and the FA network, i.e. the forecasted detail and approximation series, are then processed by wavelet reconstruction to produce the day-ahead electricity loads. This step-by-step summary is shown in Figure 5.
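The input variables listed in step 1 could be assembled as in the sketch below. The start timestamp, the synthetic load values, and the dictionary keys are hypothetical; the text does not state how the calendar variables were encoded, so plain integers are assumed.

```python
from datetime import datetime, timedelta

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 168

def build_features(start, loads):
    """Return (features, target) pairs for every hour that has a full week of history."""
    rows = []
    for t in range(HOURS_PER_WEEK, len(loads)):
        stamp = start + timedelta(hours=t)
        features = {
            "month": stamp.month,
            "day": stamp.day,
            "day_of_week": stamp.weekday(),  # Monday = 0 ... Sunday = 6
            "hour": stamp.hour,
            "prev_week_same_hour": loads[t - HOURS_PER_WEEK],
            "prev_day_same_hour": loads[t - HOURS_PER_DAY],
        }
        rows.append((features, loads[t]))
    return rows

# Hypothetical hourly series starting at midnight on 1 January 2010.
start = datetime(2010, 1, 1, 0)
loads = [float(h) for h in range(200)]  # placeholder values, not real load data
rows = build_features(start, loads)
first_features, first_target = rows[0]
```

The first week of the series yields no rows because the previous-week lag is not yet available for those hours.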

Numerical and Graphical Results
The paper introduces a new hybrid algorithm based on WT, FA, and GARCH, which accounts for the interactions of month, day, day of week, hour, previous-week same-hour load, and previous-day same-hour load. The proposed method has been applied to the electricity load data of New South Wales. To rank the performance of the proposed model, its results have been compared with those of other models, namely FA, FA + WT, and the most widely employed artificial intelligence technique, ANN. A summary table of this comparison is given in the Conclusion section. In addition, the outputs of these models are tabulated below, followed by the respective graphs comparing the forecasted and actual data.
Before turning to these, we briefly describe the error measures used and why they are an appropriate measure of the efficiency of any forecasting model. Error is defined as the difference between the actual value and the forecasted value for the corresponding period [10,23-26]:
$$\varepsilon_t = A_t - F_t \qquad (12)$$
where $\varepsilon_t$ is the error for the period t, $A_t$ is the actual value for the period t, and $F_t$ is the forecasted value for the period t. MAPE, or mean absolute percentage error, is the most widely accepted measure of forecasting error, defined as:
$$\mathrm{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \frac{|A_t - F_t|}{A_t} \qquad (13)$$
In this study, N is set to 24 for daily electricity load forecasts; N should be set to 168 for weekly electricity forecasts. The graphs shown below use N = 24, as they present day-ahead forecasts. Figures 6-9 show the actual versus forecasted data for FA, FA + WT, FA + WT + GARCH, and ANN, respectively. Table 1 presents the actual versus forecasted data for all the techniques, namely FA, FA + WT, ANN, and the proposed model (FA + WT + GARCH).
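The MAPE calculation can be checked with a short sketch. The three actual and forecasted values below are hypothetical and chosen only so that the arithmetic is easy to follow; for a day-ahead forecast the two lists would each hold N = 24 hourly values.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent, over N paired values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Hypothetical values: errors of 10 %, 5 %, and 0 % average to 5 %.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
error_percent = mape(actual, forecast)
```

Because each error is divided by the actual value, MAPE is independent of the load's units, which makes forecasts for systems of different sizes comparable.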

Conclusion
The hybrid model proposed in this paper is for short-term electricity load forecasting. The model results from a fitting combination of FA, GARCH, and WT. While WT handles the ill-behaved load series, FA captures the nonlinear fluctuations by virtue of its handling of the stability-plasticity dilemma [27]. The attributes of FA lend the proposed hybrid method robustness and higher efficiency, enabling more accurate forecasts.
The model has also been compared with FA, FA + WT, and ANN. The results certify the efficacy of the proposed load forecasting hybrid model, as can be seen from Table 2.