Data-driven models for short-term ocean wave power forecasting

In order to integrate wave farms into the grid, the power from wave energy converters (WECs) must be forecast. This study presents a novel data-driven modelling (DDM) method to predict very short-term (15 min–4 h) and short-term (0–72 h) power generation from a WEC. The model is tested using data from an oscillating body converter, alongside several other methods: support vector machines (SVM), neural networks (NN), and recurrent neural networks (RNN). Of these, the best is the long short-term memory (LSTM) network, which is trained and updated on observed values. The experiments demonstrate that both the SVM and the NN forecast well, but the proposed deep learning models predict power generation more accurately. The models work well over short horizons; at horizons longer than three days, accuracy deteriorates and the models cannot fit the data well.


INTRODUCTION
While wind and solar have been the leading sources of renewable energy up to now, waves are increasingly being recognised as a viable source of power for coastal regions [1]. The global potential of wave energy has been estimated at 3.7 terawatts, almost double the world electrical consumption in 2010. Ocean wave energy is more predictable than wind and solar [2]. Further, the energy density of water is greater than that of wind [3]. Although there is no single preferred design for wave energy converters (hereafter WECs), the technologies fall into three main categories: oscillating water columns (OWC), oscillating body systems (OBS), and overtopping converters [4]. One key aspect of integrating wave farms into the grid is power forecasting, which is used for planning and calculating reserves [5][6][7]. Wave energy has some of the same problems as wind and solar. Waves can show non-linear variability and intermittency. In particular, waves are more volatile at night, when wind speed is often higher. The horizons involved in the management of power grids range from as little as a few hours to as long as several days. The models used here are therefore tested for very short-term (15 min–4 h) and short-term (0–72 h) horizons.
The methods for forecasting wave energy comprise two major approaches, physics-based models and data-driven techniques. Physics models have been in operation since the 1960s, with the theoretical basis established in the late 1950s [8]. As the models have become more sophisticated, they have emerged as one of the most powerful tools for the study of surface water waves [9]. The major physics models include WAVEWATCH III in the United States, and the ECMWF (European Centre for Medium-Range Weather Forecasts) wave model [10,11]. In addition, some small-scale models such as SWAN (Simulating Waves Near Shore) have been used to study coastal conditions [12,13]. The physics models predict the standard wave parameters, the significant wave height and the wave period. The wave energy flux, and the power output from WECs can be calculated from these values based on the power matrix of an operating device [14,15].
This study extends the analysis of data-driven methods, focusing on machine learning and artificial intelligence, to forecast power generation from WECs. The remainder of the paper is organised as follows. Section 2 describes the methodology. Section 3 provides the framework for the wave power prediction model. The forecasting experiments are run in Section 4.

METHODOLOGY
In principle, DDM-related approaches belong to machine learning (ML) and data mining. As Figure 1 shows, a DDM algorithm determines the relationship between a system's inputs and outputs using a training data set. DDMs are therefore influential approaches for classification and regression based on historical data [34]. Recent practice shows that DDMs yield good forecast results, in particular for very short-term prediction up to several hours ahead [35]. SVM, NN and LSTM are considered here as the main representative forecasting methods.

Support vector machine
SVM is a linear classifier proposed by Vapnik. To date, SVMs have been widely applied to pattern classification problems as well as non-linear regression because they can handle very large feature spaces and high-dimensional vectors [36]. For a given set of samples $\{(x_i, y_i)\}_{i=1}^{n}$, SVM regression maps $x$ into a high-dimensional feature space via $\phi(x)$ and seeks a linear function there:
$$f(x) = \langle w, \phi(x) \rangle + b,$$
where $w$ and $b$ denote coefficients that have to be estimated from the samples. They are determined from the data by minimising the sum of the empirical risk $R[w]$ and a complexity term $\|w\|^2$, which enforces flatness in feature space:
$$R_{\mathrm{reg}}[f] = \sum_{i=1}^{n} C\,\big|f(x_i) - y_i\big|_{\varepsilon} + \frac{1}{2}\|w\|^2,$$
where $C$ is a regularisation constant and $|f(x_i) - y_i|_{\varepsilon}$ is the cost function. When the $\varepsilon$-insensitive loss function
$$|f(x) - y|_{\varepsilon} = \max\big(0,\, |f(x) - y| - \varepsilon\big)$$
is used, the coefficients $\alpha_i, \alpha_i^*$ are obtained by maximising the dual form
$$\max_{\alpha,\alpha^*}\; -\frac{1}{2}\sum_{i,j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^*)$$
subject to $\sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in [0, C]$. The data points with non-zero $\alpha_i$ or $\alpha_i^*$ are called support vectors. The parameters $C$ and $\varepsilon$ are free and have to be chosen by the user. More details of SVM for regression can be found in [37].
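As a minimal sketch of how an ε-insensitive regressor of this kind can be fitted (an illustration only, not the authors' implementation; in a library such as scikit-learn the equivalent is `sklearn.svm.SVR`), the primal objective above can be minimised directly by subgradient descent on a linear model. All function and parameter names here are hypothetical:

```python
import numpy as np

def svr_subgradient(X, y, C=10.0, eps=0.1, lr=0.01, n_iter=2000):
    """Fit a linear epsilon-insensitive regressor f(x) = w.x + b by
    subgradient descent on the regularised SVR objective:
        0.5 * ||w||^2 + C * mean(max(0, |f(x_i) - y_i| - eps))
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        r = X @ w + b - y                                # residuals f(x_i) - y_i
        g = np.where(np.abs(r) > eps, np.sign(r), 0.0)   # loss subgradient
        w -= lr * (w + C * X.T @ g / n)                  # regulariser + loss terms
        b -= lr * (C * g.mean())                         # bias is not regularised
    return w, b

# Toy usage: recover a noisy linear relationship y = 3x + 0.5
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(0, 0.05, size=200)
w, b = svr_subgradient(X, y)
```

Points with residuals inside the ε-tube contribute nothing to the loss subgradient, which is the mechanism that produces the sparse set of support vectors in the dual formulation.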

Neural network
An artificial neural network is a type of DDM whose structure and calibration procedure have some analogies with biological neural networks. As shown in Figure 2, every neuron represents a single unit of the NN: a non-linear element with multiple inputs and a single output. The coefficients of the linear combination are named synaptic weights, and together with the network biases they represent the free parameters of the model. The input layer receives the input samples and the output layer gives the result. Once the NN is created and trained with input and output data, predictions can be made. Fundamentally, the NN mathematical model is
$$y_j = f\Big(\sum_{i} w_{ij} x_i - \theta_j\Big),$$
where $x_i$ and $y_j$ $(i = 1, 2, \ldots, n)$ are the input and output signals of the network respectively, $w_{ij}$ denotes the weight between neurons $i$ and $j$, $\theta_j$ denotes the threshold, and $f(x)$ is the activation function.
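The neuron model above can be sketched as a small three-layer feed-forward network trained by gradient descent (a toy illustration under assumed sizes and learning rate, not the configuration used in the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerNet:
    """Minimal feed-forward net: input -> sigmoid hidden -> linear output.
    Each hidden neuron computes f(sum_i w_ij * x_i - theta_j)."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.th1 = np.zeros(n_hidden)            # hidden thresholds
        self.W2 = rng.normal(0, 0.5, (n_hidden, 1))
        self.th2 = np.zeros(1)                   # output threshold

    def forward(self, X):
        self.H = sigmoid(X @ self.W1 - self.th1)
        return self.H @ self.W2 - self.th2

    def train_step(self, X, y, lr=0.2):
        out = self.forward(X)[:, 0]
        err = out - y                            # dLoss/dout for 0.5*MSE
        n = len(y)
        dW2 = self.H.T @ err[:, None] / n
        dpre = err[:, None] @ self.W2.T * self.H * (1 - self.H)  # back through sigmoid
        self.W1 -= lr * (X.T @ dpre) / n
        self.th1 += lr * dpre.mean(axis=0)       # preactivation is X@W1 - th1
        self.W2 -= lr * dW2
        self.th2 += lr * np.array([err.mean()])
        return float(np.mean(err ** 2))

# Toy usage: learn y = sin(x) on [-2, 2]
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, (256, 1))
y = np.sin(X[:, 0])
net = ThreeLayerNet(1, 16)
for _ in range(5000):
    mse = net.train_step(X, y)
```

The thresholds enter with a minus sign, matching the model equation, which is why their gradient updates carry the opposite sign to the weight updates.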

Long-short-term memory
The LSTM is a member of the RNN family that uses feedback connections to store representations of recent input events, a property that makes it potentially useful for time series forecasting. Traditional RNN training may lead to oscillating weights or take a prohibitive amount of time. To deal with these issues, the LSTM learns to bridge time intervals exceeding 1000 steps, even for noisy and oscillating input sequences, without loss of short-time-lag capability. This feature makes it suitable for short-horizon prediction [40]. Each LSTM includes input layers, hidden layers and an output layer. Figure 3a illustrates the additional 'cell status' of the LSTM. In the left subplot, an RNN cell A_t receives two inputs, A_{t-1} and x_t, and outputs A_{t+1} and h_t. The LSTM cell in the right subplot additionally adds the parameter C to the RNN cell, saving the cell status from the previous cell. Figure 3b depicts a single LSTM cell at time step t. The LSTM cell gains the ability to add or remove information from the cell status through structures called gates. Three gates are designed into the LSTM architecture: a forget gate (f_t) determines how much memory is retained from c_{t-1} in c_t; an output gate (o_t) decides how much memory is output to h_t; and an input gate (i_t) determines how much of the candidate memory c'_t is retained. As shown in the right subplot of Figure 3b, the long-term memory is c_{t-1} multiplied by f_t, and the short-term memory is c'_t multiplied by the input gate i_t; c_t therefore combines long-term and short-term information. The output h_t is decided by the cell status c_t and the output gate o_t. Through Equations (6)-(8), the LSTM architecture passes information to the next cell via these three functional gates [41,42].
The mathematical representation of the LSTM gates is:
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
Note that $W_n$ and $U_n$ represent the weights applied to the input and to the output from the previous cell, and $b_n$ denotes the bias, with $n \in \{i, f, o\}$.
The cell memory updates itself recursively through the interaction of its previous value with the forget gate and the other gate values:
$$c'_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \qquad c_t = f_t \odot c_{t-1} + i_t \odot c'_t,$$
where $W_c$, $U_c$, $b_c$ denote weights and biases, and $\tanh$ is the hyperbolic tangent function. Finally, the cell output can be computed by the equation below:
$$h_t = o_t \odot \tanh(c_t).$$
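A single LSTM time step following these gate equations can be sketched directly (a self-contained illustration with assumed dimensions and random weights, not the trained network of this study):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: gates from the current input x_t and the previous
    hidden state h_prev; the cell state c_t mixes long- and short-term memory."""
    W, U, b = params["W"], params["U"], params["b"]
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    c_cand = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])   # candidate memory c'_t
    c_t = f_t * c_prev + i_t * c_cand     # long-term + short-term memory
    h_t = o_t * np.tanh(c_t)              # gated cell output
    return h_t, c_t

def init_params(n_in, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    gates = ("i", "f", "o", "c")
    return {
        "W": {g: rng.normal(0, 0.1, (n_hidden, n_in)) for g in gates},
        "U": {g: rng.normal(0, 0.1, (n_hidden, n_hidden)) for g in gates},
        "b": {g: np.zeros(n_hidden) for g in gates},
    }

# Run a short sequence of 4 input signals through an 8-unit cell
params = init_params(n_in=4, n_hidden=8)
h = np.zeros(8)
c = np.zeros(8)
seq = np.random.default_rng(1).normal(0, 1, (20, 4))
for x_t in seq:
    h, c = lstm_step(x_t, h, c, params)
```

Because the forget gate multiplies the previous cell state rather than squashing it through an activation, gradients along c_t can survive many time steps, which is the mechanism behind the long-range bridging described above.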

WAVE POWER FORECASTING FRAMEWORK
Three models are tested: the NN, the SVM, and the LSTM. Figure 4 illustrates the forecasting strategy of this paper. The three DDM models are explained and employed, and their results are given separately. According to the forecasting purposes, the prediction strategy covers horizons ranging from a few hours up to 3 days.

Feature selection
The performance of a DDM depends strongly on its inputs. Several observations are collected from the WEC, such as wave height, rotor speed, and pressure. Training the network with redundant inputs not only reduces prediction accuracy but also increases computation. Hence, for each observation, the lags and features to use need to be determined. The principal component analysis (PCA) method is adopted to select the main predictors; PCA is widely used to filter out redundant factors, and details can be found in ref. [43]. The Granger causality test (GCT) is then conducted to determine the correlations between the remaining variables and the wave parameters; it is used to keep the testing horizon stationary, and details can be found in the literature [44]. The resulting predictors selected from the WEC data are listed in Table 1.
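The PCA screening step can be sketched as follows (a toy illustration of the idea, not the paper's actual pipeline; the signals and the 95% variance threshold are assumptions):

```python
import numpy as np

def pca_components(X, var_threshold=0.95):
    """Return the principal components and the number needed to explain
    var_threshold of the total variance (eigendecomposition of covariance)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]                  # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, var_threshold) + 1) # first k reaching threshold
    return eigvecs[:, :k], k

# Toy data: 5 recorded signals, two of which nearly duplicate others
rng = np.random.default_rng(0)
base = rng.normal(0, 1, (500, 3))                      # 3 independent sources
X = np.column_stack([base,
                     base[:, 0] + 0.01 * rng.normal(size=500),
                     base[:, 1] + 0.01 * rng.normal(size=500)])
components, k = pca_components(X)
```

In this toy case three components recover essentially all the variance of the five signals, which is exactly the redundancy-filtering behaviour the text describes.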

Model architecture
In order to assess the performance of each model, the three proposed networks are applied separately. The SVM uses the wave power time series as training data; the polynomial kernel and the Gaussian radial basis function are used as its kernel functions. Standard three-layer BP networks are also selected as benchmarks.
For the NN modelling, the inputs are the pressure, flow, rotor speed and torque (see Table 1). Lagged input data are used to predict the power; for example, for the 1-hour-ahead prediction, the previous 60 minutes of each input series are used. The inputs are split into training and testing sets. Three-layer feed-forward networks with a non-linear sigmoid transfer function are deployed.
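The construction of such lagged training pairs can be sketched generically (an illustration; the helper name and the demonstration series are assumptions):

```python
import numpy as np

def make_lagged_windows(series, n_lags, horizon=1):
    """Build (X, y) pairs: each row of X holds the previous n_lags values
    of the series, and y holds the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])       # lag window ending just before t
        y.append(series[t + horizon - 1])    # target, horizon steps ahead
    return np.array(X), np.array(y)

# Demo: with one-minute samples, a 1-hour-ahead target would use
# n_lags=60 and horizon=60; here a tiny series keeps the shapes visible.
demo = np.arange(12.0)
X_demo, y_demo = make_lagged_windows(demo, n_lags=4, horizon=1)
```

Each input series in Table 1 would yield one such lag block, and the blocks are concatenated column-wise to form the network's input matrix.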
The LSTM is structured as a two-dimensional network, with the four time-series inputs listed in Table 2 forming the vertical dimension.
The basic LSTM block uses the logistic sigmoid and hyperbolic tangent as activation functions. Here, the ReLU activation function is deployed instead because it is easy to optimise and does not saturate. The traditional optimisation method for a deep network is stochastic gradient descent (SGD), the mini-batch variant of gradient descent; mini-batches can speed up convergence when training the network. Here, the Adagrad optimisation method [45] is adopted, which adapts the learning rate to the parameters and can perform larger updates where needed. To train the LSTM units, the weight matrices and bias vectors (W_n, U_n and b_n) are initialised randomly, and the parameters are then trained by back-propagation with gradient-based optimisation to minimise the cost function. For each observation point, a corresponding LSTM unit is trained on the operating data collected from the WEC.

Evaluation for forecast result
The criteria used to evaluate the models include the mean absolute error (MAE), the mean absolute percent error (MAPE), and the root mean squared error (RMSE) [46].
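These are the standard definitions MAE = (1/n)Σ|y_i − ŷ_i|, MAPE = (100/n)Σ|(y_i − ŷ_i)/y_i|, and RMSE = √((1/n)Σ(y_i − ŷ_i)²); a direct sketch (the example values are illustrative, not the paper's results):

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    """Mean absolute percent error; assumes y contains no zeros."""
    return float(np.mean(np.abs((y - yhat) / y)) * 100.0)

def rmse(y, yhat):
    """Root mean squared error; penalises large errors more than MAE."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

# Illustrative observed vs forecast power values (kW-scale placeholders)
y = np.array([100.0, 200.0, 300.0])
yhat = np.array([110.0, 190.0, 330.0])
```

MAPE is scale-free, which makes it convenient for comparing horizons, but it is undefined at zero power output, so RMSE and MAE remain the safer headline metrics for intermittent generation.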

Data acquisition and preparation
The data for modelling are collected from an operational WEC with 10 kW total capacity. Figure 5 shows the schematic diagram of the WEC. It is an oscillating buoy device (similar to the Wavestar) that extracts wave energy from the ocean using an up-and-down motion absorber. Conceptually, the WEC contains an energy-absorbing system, an energy conversion system and an energy transmission system. The energy-absorbing system captures energy from the wave resource, the energy conversion system converts the kinetic energy to electrical power, and the power is then transmitted from the device to the grid by the energy transmission system. The data span approximately three months, from February to April 2017, in sequence. Quality control procedures were applied to the measurements to fill gaps and to replace unrealistically large values. Some of the rectified observations are shown in Figure 6. The signal types shown in Table 1 are recorded at a one-minute interval.
The power signals represent the predictors of the proposed model. The time series of data are divided into two subsets, for training and testing. Figure 7a illustrates the very short-term multi-step prediction results from 15 min up to approximately 5 h ahead, and Figure 7b the short-term multi-step results from three hours up to 90 hours ahead. Clearly, neither the very short-term nor the short-term predictions (red lines) fit the real values well at most horizons, and the errors increase conspicuously as the forecast horizon grows. This tendency is particularly notable in Figure 7b: the observed and forecast curves have a similar tendency between time points 370 and 380, but the errors between the two curves begin to increase sharply after time point 380, showing that the SVM model performs unsatisfactorily. For very short-term prediction, values less than 4 hours ahead seem believable using the SVM method; for short-term prediction, values less than 3 days ahead seem believable. Figure 8 shows the very short-term and short-term power prediction results from the NN, which appear much better than the SVM results on the same training and testing datasets. The prediction lines fit the observed data well for both very short-term and short-term prediction, and the comparison in Figure 8b demonstrates that the model is highly accurate.

Prediction results
Normally, each LSTM block trains at each time step on the previous period's input signals. In order to achieve a more accurate forecast, the models used here are all updated on actual observations. Figure 9 shows the very short-term and short-term wave power predictions from the LSTM network. As with the NN, each time step in Figure 9a represents 15 minutes and the vertical axis is the power output from the WEC. The blue line represents the observed data and the red line the predictions from the end of the observed data. Time point 600 marks the end of the observed data, and time points 601 to 620 are the very short-term forecasts up to 5 hours ahead (15 min × 20). Similarly, in Figure 9b, time point 370 marks the end of the observed data, and time points 371 to 400 are the short-term results up to 90 hours ahead (30 × 3 h), more than three days. It can be concluded that the proposed deep LSTM performs more accurately than the NN and SVM models; in particular, its forecasting performance remains stable over the entire forecasting period. For comparison, results are also given for a traditional LSTM model that uses its own predictions as the update strategy. The results are shown in Table 3, and the evaluation indices there confirm these conclusions. The 1st, 2nd and 3rd entries represent the model trained with different hyper-parameters. For the NN method, the MAPE values (which reflect model accuracy) for very short-term prediction are slightly better than for short-term prediction. Similarly, the LSTM evaluation values for very short-term prediction significantly surpass the short-term values: the very short-term RMSE values are 15.04, 16.24 and 16.26, compared with 23.08, 29.06 and 24.12 for short-term. Furthermore, all the metrics show that the proposed LSTM model predicts more accurately than the NN.
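The two update strategies contrasted above can be sketched with a generic one-step model (a toy illustration; the model, data and function names are assumptions, not the paper's implementation):

```python
import numpy as np

def rolling_forecast(model, history, observed, n_steps, n_lags,
                     update_with_obs=True):
    """Multi-step forecast with a one-step-ahead model.
    update_with_obs=True  -> after each step, the input window is refreshed
                             with the actual observation (observation update);
    update_with_obs=False -> the model's own prediction is fed back
                             (traditional recursive strategy)."""
    window = list(history[-n_lags:])
    preds = []
    for k in range(n_steps):
        yhat = model(np.array(window))
        preds.append(yhat)
        nxt = observed[k] if update_with_obs else yhat
        window = window[1:] + [nxt]          # slide the window forward
    return np.array(preds)

# Hypothetical one-step "model": persistence plus the window's mean trend
def toy_model(window):
    return window[-1] + (window[-1] - window[0]) / (len(window) - 1)

history = np.arange(10.0)             # past values 0..9 with slope 1
observed = np.arange(10.0, 15.0)      # future truth continues the trend
preds = rolling_forecast(toy_model, history, observed, n_steps=5, n_lags=4)
```

With the recursive strategy, any one-step error is fed back into the next window and compounds over the horizon, which is why updating on actual observations tends to keep multi-step forecasts stable.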

Analysis and discussion
For mid-term prediction, Figure 10 gives comparative results up to 16.5 hours ahead in the form of curves and bar charts; each curve shows a different multi-step forecast and each bar chart shows the errors between the real and forecast values. The SVM model has the smoothest tendency, while the NN and LSTM models fluctuate dramatically. For very short-term prediction, the LSTM fits better than the other methods over the first 10 time steps (2.5 hours). Viewed against the full raw data, the very short-term prediction models cannot fit the data well more than two days ahead. Figure 11 also gives comparable short-term results up to 24 days (each time stamp represents 3 hours). The predictions from the proposed models fit well for the first 36 time points, but none of the models fits the large power-output values well. In other words, the short-term prediction has a reliable confidence interval within the first 72 hours, and predictions beyond that seem suspect. Furthermore, a few abnormal values, such as negative power, are predicted by the models; future work will fix these problems. The calculations were implemented in the Matlab environment on an Intel® Core™ i7-8550U CPU at 1.80 GHz mobile workstation.

CONCLUSION
The analysis has produced several findings. First, the new artificial intelligence methods tested here predict very accurately at extremely short horizons, which are needed for grid integration, but are generally shorter than the horizons at which physics models can forecast. Second, the models work well over slightly longer horizons of up to two days. However, forecast accuracy deteriorates rapidly at horizons of three days or more. Of the methods tested, the neural net and SVM do well. However, the LSTM is consistently more accurate. With the advent of new open source software which makes the LSTM readily available to users, this method can easily be adopted for wave energy forecasting.