Particle swarm grammatical evolution for energy demand estimation

Grammatical Swarm is a search and optimization algorithm that belongs to the more general Grammatical Evolution family, which works with a set of solutions called individuals or particles. It uses the Particle Swarm Optimization algorithm as the search engine in the evolution of solutions. In this paper, we present a Grammatical Swarm algorithm for total energy demand estimation in a country from macroeconomic variables. Each particle in the Grammatical Swarm encodes a different model for energy demand estimation, which is decoded by a predefined grammar. The parameters of the model are also optimized by the proposed algorithm, so that the model is adjusted to a training set of real energy demand data, selecting the most appropriate variables to appear in the model. We analyze the performance of the Grammatical Swarm evolution in two real problems of one-year-ahead energy demand estimation in Spain and France. The proposal is compared with previous approaches, obtaining competitive results.


| INTRODUCTION
Energy demand, which can be estimated from macroeconomic variables, is one of the most important indicators of a country's economic growth. 1 Financial crises are known to deeply affect energy demand, producing sudden reductions in the total energy consumed in a country, 2 as occurred during the severe worldwide crisis of 2008. 3,4 In periods of economic bonanza, on the contrary, energy demand usually soars, mainly pushed by the most energy-intensive sectors such as industry and construction. 5 Energy demand is also closely connected with social and environmental issues: currently, about 80% of the global energy demand is covered by nonrenewable sources such as coal or petroleum. This dependence on nonrenewable resources causes serious environmental problems and contributes to global warming and climate change, which are currently affecting millions of people, mainly in developing and poor countries. 6 The Paris Agreement, signed in 2015, is a global agreement to mitigate climate change that aims to limit the global temperature increase to less than 2 °C. 7 Moreover, the effective application of the Paris Agreement should achieve net zero emissions in the second half of this century. This goal will certainly require deep emission reductions by the signatory countries. Accordingly, the countries in the Paris Agreement publicly outlined the post-2020 climate actions they intended to take under it, known as intended nationally determined contributions (INDCs). Some of these INDCs will inevitably be political decisions affecting key macroeconomic sectors such as industry and transport, among many others. The aim is to reconcile the sound economic development of a country with the emission reductions needed to fulfill the commitments signed in the Paris Agreement.
Managing this balance between economic growth and emission reduction will therefore be the guiding line of policy decisions over the next years. In this context, an accurate prediction of the medium- and long-term energy demand of a country from its macroeconomic indicators will be essential for policy and decision makers. 8 Previous works on total energy demand forecasting from macroeconomic variables have mainly applied machine learning algorithms, including metaheuristic techniques and neural computation approaches. The first work dealing with this problem 5 proposed a Genetic Algorithm (GA) to obtain the parameters of an exponential prediction model for energy demand in Turkey. The model proposed in that work was based on four input macroeconomic variables: Gross Domestic Product (GDP), population, import size, and export size. The same four predictive variables were also used in ref. 9, where a multilayer neural network was applied to a problem of total energy demand estimation in Korea. Other approaches such as Particle Swarm Optimization (PSO) have also been proposed for total energy demand estimation. In ref. 10, a PSO approach is proposed for this problem in Turkey, including two prediction models, a linear and a quadratic one. Five predictive variables are considered in that work: those of ref. 5 plus the country's growth rate in the years of analysis. In ref. 11, a PSO algorithm is also applied, in this case to a problem of electricity demand in Turkey, considering linear and quadratic models. Ant Colony Optimization (ACO) has been hybridized with PSO in ref. 12, also for a problem of electricity demand estimation in Turkey. Other hybrids based on PSO and GA have been proposed in refs. 13-15 and applied to problems of energy demand estimation in China. Specifically, in ref. 15 a PSO-GA has been applied to a problem of energy demand forecasting.
Three prediction models (linear, exponential, and quadratic) are considered, and the predictive variables are economic growth, population, economic structure, urbanization rate, energy structure, and energy price. Moreover, other works 16-18 have combined clustering and neural networks to predict energy consumption, with very good performance on industrial problems. In ref. 14, future projections of energy demand are considered, and in ref. 13, energy demand forecasting in the primary sector is addressed. More recently, a Harmony Search approach with feature selection has been proposed in ref. 3 for a problem of energy demand estimation in Spain, and in ref. 4 a Variable Neighborhood Search (VNS) approach was employed for this problem, also comparing its results with those of an Extreme Learning Machine. In ref. 19, new models for energy demand estimation were proposed based on a Grammatical Evolution approach combined with a Differential Evolution algorithm.
In this paper, we explore the performance of a Grammatical Swarm (GS) approach to estimate the total energy demand of countries from macroeconomic variables. Grammatical Evolution is a modeling technique based on Genetic Programming. It has two main parts. On one side, there is a set of functions which may explain the process to be modeled. On the other side, there is an optimization engine that finds the best combination of these functions in order to minimize the difference between the real data of the studied process and the data obtained by the model provided by Grammatical Evolution. Grammatical Swarm 20 is an implementation of Grammatical Evolution 21 in which the optimization engine is a Particle Swarm Optimization (PSO) 22 algorithm. The exploration and intensification phases of PSO are quite different from those of classical Grammatical Evolution based on genetic algorithms, which has a significant effect on the algorithm's performance. GS has been shown to outperform classical Grammatical Evolution in several problems, 20 including neural network training. 23
In our proposal, each particle of the swarm represents one energy prediction model. Some modifications have been added to the original algorithm to prevent the optimization process from getting stuck in local minima. The PSO engine searches for the best model according to the available training data. Note that in this algorithm, the PSO not only selects the best prediction model, including the best predictive variables, but also tunes its parameters. Thus, the idea of the paper is to use the GS algorithm to explore new models for energy demand estimation. Previous works use existing models, mainly linear and exponential ones, though alternative nonlinear models have also been considered in the literature. 3 The proposed approach goes a step further by trying to obtain completely new models, using a GS algorithm to construct them and tune their parameters, along the same lines as ref. 19 did with a classical GE algorithm. To show the performance of our proposal, we have tackled two different data sets related to annual energy prediction in Spain and France over the last 30 years. The Grammatical Swarm has shown excellent performance, obtaining good results in terms of mean average relative error for both problems and a robust behavior despite the different distribution of the training data in the two problems, avoiding overfitted models. The proposed PSO approach was first tested on a set of benchmark problems, giving better results than other PSO strategies as the difficulty of the problem to be optimized increases.
The remainder of the paper is structured as follows. Section 2 presents the problem definition, and Section 3 presents the Grammatical Swarm algorithm proposed in this paper, including a description of the grammar applied and the PSO search engine. Section 4 details the results obtained by the Grammatical Swarm approach in two real problems of annual energy demand estimation in Spain and France, comparing them with previous algorithms in the literature. Section 5 discusses the obtained results. Section 6 closes the paper with some final conclusions and remarks.

| PROBLEM DEFINITION
Let us consider a time series $E = \{E(t)\}_{t=1}^{n}$ of past energy demands for a given country, with n discrete values corresponding to different years, and a set of m predictive variables $X = \{X_1(t), \ldots, X_m(t)\}$, with $t = 1, \ldots, n$. A model $\mathcal{M}$ provides an estimation $\hat{E}$ of E at a prediction time horizon t + 1 with respect to the predictive variables. The problem tackled in this paper consists of finding a mathematical model $\mathcal{M}$ for total energy demand estimation, and tuning its parameters, in such a way that it estimates the real energy demand E at time t + 1 as accurately as possible, that is, it obtains, from the set of predictive variables at time t, an estimation $\hat{E}(t+1) = \mathcal{M}\big(X_1(t), \ldots, X_m(t)\big)$. The quality of a solution S in the GS (a given prediction model $\mathcal{M}$ and its parameters) is evaluated using an objective function, usually related to the similarity between the model output and the real energy demand values. In this case, the objective function is the root-mean-square error between the observed and the predicted values, which is to be minimized:

$$f(S) = \sqrt{\frac{1}{n^*}\sum_{t=1}^{n^*}\left(E(t) - \hat{E}(t)\right)^2}$$

where n* is the size of a reduced training sample (n* < n). Figure 1 shows an outline of the problem definition and the role played by the GS in the prediction problem, as the generator of the prediction model $\mathcal{M}$.

| 1071
MARTÍNEZ-RODRÍGUEZ ET Al.
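As a minimal illustration of the objective function, the following Python sketch computes the RMSE between observed and predicted demands over the n* training samples. The function name `rmse` and the toy numbers are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between E(t) and E_hat(t) over n* samples."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n_star = len(observed)
    squared = sum((e - e_hat) ** 2 for e, e_hat in zip(observed, predicted))
    return math.sqrt(squared / n_star)


# Toy values: errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5) ≈ 3.5355.
print(rmse([100.0, 110.0], [103.0, 106.0]))
```

Squaring the errors penalizes large deviations more heavily than an average of absolute errors would, which is why a model with a few badly mispredicted years scores poorly under this criterion.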

| PROPOSED GRAMMATICAL SWARM
Swarm intelligence is a computational paradigm that was introduced in the late 1980s and published in the early 1990s. 24 It is based on the assumption that artificial intelligence cannot depend solely on individual behavior, but should also take social influence into account. A swarm is formed by an arbitrary number of particles performing elementary actions. All particles interact with one another and with the environment, without any central control forcing them to perform a given action. Individually, particles have limited problem-solving abilities. Nevertheless, the interaction among the particles that form the swarm improves on individual behavior, yielding good solutions in many different scenarios. 25 In this paper, we apply one of the existing swarm algorithms to a real-life problem. Specifically, we apply GS 20 to the problem of annual energy demand estimation from macroeconomic variables. The rest of this section describes in detail the GS algorithm proposed in this paper.

| Grammar production
Grammatical Swarm is a population-based algorithm in which each individual of the population is a particle representing a solution to a given optimization problem. Specifically, in this case each particle represents a model for energy demand estimation, encoded as a vector of integers that is decoded by means of a given grammar, in the same way as in Grammatical Evolution (GE). 21 Figure 2 shows the production rules of the grammar considered in this work, and Figure 3 shows all the possibilities of the grammar expansion. This grammar is taken from ref. 19, so that the results of GS are comparable to those described in that paper. Table 1 shows an example of decoding a particle with the grammar shown in Figure 2. Assuming that a particle consists of a vector of integers (called codons) such as (7, 10, 4, 5, 21, 52, 12, 77, 25, 31, 27, 15, 44, 89, 112), shown in the first row of the table, the decoding phase takes the first integer, 7, to decode the <func> symbol. The rule in the grammar for that symbol has only one production. Hence, the symbol is translated into <param><op><recExpr>. The next integer to be processed is 10, which is used to decode the first nonterminal symbol, <param>. Again, the rule for this symbol has only one production, 0.<digit><digit>, which substitutes the <param> symbol. Then, the element 4 is processed to decode the leftmost <digit> symbol. The rule for <digit> has 10 productions. Given that 4 mod 10 = 4, the fifth production of the rule is selected, which corresponds to the value 4. This process continues, using the modulus operation, until all nonterminal symbols are decoded. In the example of the table, the last three integers are not used for decoding.
In classical GE implementations, it is very common to use wrapping, which allows the decoder to return to the first codon when the last one has been reached and there are still nonterminal symbols to be decoded. No wrapping is used in the example shown above.
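The mapping procedure can be sketched in Python as follows. Only the rules for <func>, <param>, and <digit> follow the description above; the productions for <op>, <recExpr>, and <var> are hypothetical stand-ins, because the full grammar of Figure 2 is not reproduced here. Each nonterminal expansion consumes one codon, as in the worked example, and the optional `max_wraps` argument implements wrapping. With the example particle, the first steps match the text (7 expands <func>, 10 expands <param>, 4 selects the digit 4), but the final expression depends on the hypothetical rules and therefore differs from Table 1.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# <func>, <param> and <digit> follow the rules described in the text.
# <op>, <recExpr> and <var> are HYPOTHETICAL stand-ins for Figure 2.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<func>": [["<param>", "<op>", "<recExpr>"]],
    "<param>": [["0.", "<digit>", "<digit>"]],
    "<digit>": [[str(d)] for d in range(10)],
    "<op>": [["+"], ["-"], ["*"], ["/"]],
    "<recExpr>": [
        ["<param>", "*", "<var>"],
        ["<param>", "*", "<var>", "<op>", "<recExpr>"],
    ],
    "<var>": [["X1"], ["X2"], ["X3"]],
}


def decode(
    codons: Sequence[int],
    grammar: Dict[str, List[List[str]]] = GRAMMAR,
    start: str = "<func>",
    max_wraps: int = 0,
) -> Tuple[Optional[str], int]:
    """Leftmost GE mapping.

    Every nonterminal expansion consumes one codon (even for rules with a
    single production) and selects production codon % k. Wrapping restarts
    at the first codon, at most max_wraps times. Returns
    (expression, codons_used); expression is None if the codons run out
    before all nonterminals are decoded.
    """
    pending = [start]            # symbols still to process, leftmost first
    output: List[str] = []
    used = 0
    budget = len(codons) * (max_wraps + 1)
    while pending:
        symbol = pending.pop(0)
        if symbol not in grammar:    # terminal: copy it to the output
            output.append(symbol)
            continue
        if used >= budget:           # no codons (or wraps) left: invalid
            return None, used
        productions = grammar[symbol]
        codon = codons[used % len(codons)]
        used += 1
        # Replace the nonterminal by the chosen production, in place.
        pending[:0] = productions[codon % len(productions)]
    return "".join(output), used


particle = [7, 10, 4, 5, 21, 52, 12, 77, 25, 31, 27, 15, 44, 89, 112]
print(*decode(particle))  # 0.45-0.75*X2 10
```

Under these toy rules the particle decodes to 0.45-0.75*X2 after consuming 10 of its 15 codons. A particle that is too short returns None unless wrapping lets it reuse its codons, for example `decode([0] * 5)` fails while `decode([0] * 5, max_wraps=1)` succeeds.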

| Particle swarm optimization engine
As stated before, GS is a GE implementation in which the optimization engine is a Particle Swarm Optimization (PSO) algorithm. PSO is a bio-inspired metaheuristic developed by James Kennedy and Russell Eberhart in 1995. 22 It is based on the social behavior of bird flocks, which coordinate their movements to find food as efficiently as possible.
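Mathematically, given a target function

$$f: \mathbb{R}^D \rightarrow \mathbb{R},$$

the optimization problem consists of finding

$$x^* = \arg\max_{x \in \mathbb{R}^D} f(x)$$

in the maximization case, and

$$x^* = \arg\min_{x \in \mathbb{R}^D} f(x)$$

in the minimization case.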
The D-dimensional domain of f in $\mathbb{R}^D$ is called the search space. Every point in this space is characterized by a D-dimensional vector x, which represents a candidate solution to the problem. These vectors are called particles. 26 Let X be a swarm formed by N particles:

$$X = \{x_1, x_2, \ldots, x_N\}, \qquad x_i = (x_{i1}, x_{i2}, \ldots, x_{iD}).$$

The number N of particles in the swarm depends on the nature of the problem to be optimized, but it typically ranges between 10 and 60. During the search process, the particles update their positions at iteration t, that is, the values of the elements of $x_i(t)$, according to Equation (7):

$$x_i(t) = x_i(t-1) + v_i(t) \tag{7}$$

where $v_i(t)$ is the velocity of particle i at iteration t, computed by Equation (8):

$$v_i(t) = w\,v_i(t-1) + c_1\,u(0,1)\,\big(p_i - x_i(t-1)\big) + c_2\,u(0,1)\,\big(g - x_i(t-1)\big) \tag{8}$$

where each u(0,1) is a sample from a uniform distribution on [0, 1], w is the inertia weight, $c_1$ is the individual weight, and $c_2$ is the social weight. These weights are meta-parameters of the algorithm, which we fixed experimentally. $p_i$ is the best position reached by particle i, and g is the best position reached by any of the N particles that form the swarm.

Some modifications have been made to the original GS algorithm to adapt its behavior to the problem at hand. To prevent the particles of the swarm from getting stuck in local minima, the algorithm deploys two different families of particles. The first family (C) is ruled by Equation (8), and the second family (E) is ruled by a modified velocity equation that repels the particles from the already explored parts of the domain of the function.

TABLE 1 Mapping procedure that converts an individual (second column) into an expression. Each row represents a derivation step. The grammar applied is represented in Figure 2.
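To make Equations (7) and (8) concrete, here is a minimal Python sketch of a basic PSO of the kind that rules the C family, applied to a toy continuous function. It is not the authors' implementation: the values c1 = c2 = 1.5, the constant w = 0.73, the swarm size, and the clamping of positions to the bounds are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 20,
    iterations: int = 200,
    w: float = 0.73,    # inertia weight (assumed constant here)
    c1: float = 1.5,    # individual weight (placeholder value)
    c2: float = 1.5,    # social weight (placeholder value)
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Minimize f with a basic PSO; returns (g, f(g), best value per iteration)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]                  # best position of each particle, p_i
    p_val = [f(xi) for xi in x]
    best = min(range(n_particles), key=lambda i: p_val[i])
    g, g_val = p[best][:], p_val[best]       # best position of the swarm, g
    history = [g_val]
    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(dim):
                # Equation (8): inertia, individual and social terms, each
                # with its own u(0,1) sample.
                v[i][j] = (w * v[i][j]
                           + c1 * rng.random() * (p[i][j] - x[i][j])
                           + c2 * rng.random() * (g[j] - x[i][j]))
                # Equation (7), plus clamping to the bounds (an assumption).
                x[i][j] = min(hi, max(lo, x[i][j] + v[i][j]))
            value = f(x[i])
            if value < p_val[i]:
                p[i], p_val[i] = x[i][:], value
                if value < g_val:            # g is updated asynchronously
                    g, g_val = x[i][:], value
        history.append(g_val)
    return g, g_val, history


def sphere(z: Sequence[float]) -> float:
    """Toy objective with its minimum 0 at the origin."""
    return sum(c * c for c in z)


g, g_val, history = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(g_val)
```

With w below 1 and c1 + c2 below 2(1 + w), the swarm contracts toward g instead of oscillating without bound, which is why this toy run settles near the origin.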
This modification is based on the calculation of the mass center MC(t) at the t-th iteration of all the particles that have been deployed in the function, assuming that the behavior of the mass center is equivalent to the behavior of the system; MC(t) is computed by Equation (9). The velocity equation that rules the E family of particles is then built on this mass center, where $c_3$ is the weight of the repulsion and $x_{j,\max}$, $x_{j,\min}$ are the boundaries of the domain of f.
Finally, since the problem's encoding uses integer values, the result of Equation (7) is rounded to the nearest integer for both families, which makes integer optimization possible.
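Because the exact forms of Equation (9) and of the E-family velocity equation are not given in this text, the following Python sketch is hypothetical: it assumes that MC(t) is the unweighted mean of the particle positions and uses a simple repulsion term c3·u(0,1)·(x - MC(t)). It only illustrates the two mechanisms described above, namely pushing a particle away from the explored region around the mass center and rounding the new position to an integer codon. The codon bounds [0, 255] and all weight values are assumptions.

```python
import random
from typing import List, Sequence, Tuple

CODON_MIN, CODON_MAX = 0, 255   # HYPOTHETICAL codon bounds (x_min, x_max)


def mass_center(positions: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMED form of MC(t): unweighted mean of all particle positions."""
    n = len(positions)
    return [sum(column) / n for column in zip(*positions)]


def step_e_particle(
    x: Sequence[int],
    v: Sequence[float],
    p: Sequence[int],
    mc: Sequence[float],
    rng: random.Random,
    w: float = 0.73,
    c1: float = 1.5,
    c3: float = 1.0,
) -> Tuple[List[int], List[float]]:
    """One HYPOTHETICAL E-family update with a repulsion term c3*u*(x - MC)."""
    new_x: List[int] = []
    new_v: List[float] = []
    for xj, vj, pj, mcj in zip(x, v, p, mc):
        vj = (w * vj
              + c1 * rng.random() * (pj - xj)      # pull toward own best
              + c3 * rng.random() * (xj - mcj))    # push away from MC(t)
        # Equation (7), then round to an integer codon (Python's round()
        # breaks .5 ties to the even integer) and clamp to the codon range.
        new_x.append(min(CODON_MAX, max(CODON_MIN, round(xj + vj))))
        new_v.append(vj)
    return new_x, new_v


rng = random.Random(1)
swarm = [[10, 20, 30], [12, 22, 28], [50, 60, 70]]
mc = mass_center(swarm)
x_new, v_new = step_e_particle(swarm[2], [0.0] * 3, swarm[2], mc, rng)
print(mc, x_new)
```

In this toy call the third particle lies above the mass center in every coordinate and has no velocity or personal-best pull, so the repulsion can only keep it in place or move it further away, and the result is always a list of integer codons.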

| Final grammatical swarm algorithm proposed
As described above, the Grammar Production is the method that automatically creates mathematical models from the individuals by means of a given grammar, and the PSO algorithm searches, by means of swarm intelligence, for the particles that decode into the best models. The combination of both techniques yields the Grammatical Swarm algorithm. 20 Algorithm 1 shows the pseudo-code of the proposed GS, which includes the modifications explained above. The quality of each particle, that is, the objective function to be minimized, is the root-mean-square error (RMSE) between the real energy demand data and the prediction made by the model obtained by decoding the particle. Figure 4 shows a diagram of this process.
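The particle quality described above can be sketched in Python as follows: a particle is decoded, the resulting expression is evaluated on the predictors X(t) of each training sample, and the RMSE against E(t + 1) is returned. The toy grammar, the variable names X1 and X2, the training data, and the choice of returning +inf for particles that cannot be fully decoded are all assumptions for illustration; the decoder is a compact version without wrapping so that the block runs on its own.

```python
import math
from typing import Dict, List, Optional, Sequence

# HYPOTHETICAL toy grammar over two predictors (not the grammar of Figure 2).
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<param>", "*", "<var>"],
               ["<param>", "*", "<var>", "+", "<expr>"]],
    "<param>": [["0.", "<digit>", "<digit>"]],
    "<digit>": [[str(d)] for d in range(10)],
    "<var>": [["X1"], ["X2"]],
}


def decode(codons: Sequence[int], start: str = "<expr>") -> Optional[str]:
    """Compact leftmost GE mapping without wrapping; None if codons run out."""
    pending, out, used = [start], [], 0
    while pending:
        symbol = pending.pop(0)
        if symbol not in GRAMMAR:
            out.append(symbol)
        elif used >= len(codons):
            return None
        else:
            rules = GRAMMAR[symbol]
            pending[:0] = rules[codons[used] % len(rules)]
            used += 1
    return "".join(out)


def particle_fitness(codons: Sequence[int],
                     X: Sequence[Dict[str, float]],
                     E_next: Sequence[float]) -> float:
    """RMSE between E(t+1) and the decoded model evaluated on X(t).

    Invalid particles (incomplete decoding or arithmetic failure) get +inf,
    so a minimizing PSO never prefers them.
    """
    expression = decode(codons)
    if expression is None:
        return math.inf
    # eval() is tolerable here only because the string comes from the fixed
    # grammar above; builtins are disabled and only X1, X2 are visible.
    model = compile(expression, "<model>", "eval")
    try:
        predictions = [eval(model, {"__builtins__": {}}, dict(row)) for row in X]
    except ArithmeticError:
        return math.inf
    squared = [(e - y) ** 2 for e, y in zip(E_next, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Toy training data (not from the paper): two samples of (X(t), E(t+1)).
X_train = [{"X1": 2.0, "X2": 1.0}, {"X1": 4.0, "X2": 3.0}]
E_train = [1.0, 2.0]
print(decode([0, 7, 5, 0, 0]), particle_fitness([0, 7, 5, 0, 0], X_train, E_train))
```

Here the particle (0, 7, 5, 0, 0) decodes to 0.50*X1, which reproduces the toy targets exactly and therefore gets an RMSE of 0.0; the design keeps the PSO itself unaware of the grammar, since it only ever sees this scalar fitness.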

| EXPERIMENTS AND RESULTS
This section presents the results obtained by applying GS to two problems of one-year-ahead energy demand estimation, for Spain and France. We gathered data on several macroeconomic variables for both countries. The energy prediction was carried out separately for each country, so two different experiments were conducted.
The data sets have been randomly divided into two sets: the training set, used to obtain the mathematical model, and the test set, used to check the quality of the model found. This division is made to guard against overfitting, in which the model is only able to describe the data it has been trained with instead of generalizing well, 27 and to verify that the forecasts are correct. We used the same values for the parameters of the PSO in both experiments; these parameters are summarized in Table 2.
To compare with state-of-the-art techniques, we ran two alternative algorithms on the same data sets: a classical implementation of Grammatical Evolution, denoted GE, and a hybridization of GE and Differential Evolution (DE), denoted GE + DE. Both algorithms follow the implementation described in ref. 19, where GE + DE obtained the best results on the data sets studied in that paper.

Table 3 shows the parameter configuration for both GE and GE + DE algorithms in the experiments that we have conducted. Notice that the third column shows the values of the parameters that apply in the DE part of GE + DE.

| Energy demand estimation in Spain
For the estimation of the energy demand in Spain, we gathered data on fourteen macroeconomic variables. The full data set spans the years 1980 to 2011. The training set is formed by 15 years chosen by uniform random sampling, and the test set is formed by the remaining 16 years (each sample pairs the predictive variables of year t with the energy demand of year t + 1, so the 32 years of data yield 31 samples). Figure 5 shows a graphic representation of the data set of actual energy demand in Spain, highlighting the training years (blue points).
All variable values have been scaled with respect to their 1980 values, which are the smallest ones. The following variables have been taken into account:
GDP per unit of energy use (€ per kTOE)
9. Energy imports, net (% of energy use)
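The data preparation for the Spanish experiment can be sketched as follows, under stated assumptions: each variable is scaled by dividing it by its 1980 value (one possible reading of "scaled with respect to their 1980 values"), and samples are (t, t + 1) year pairs, so 32 years of data give 31 samples that are split at random into 15 training and 16 test samples. The function names and the seed are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple


def scale_to_base_year(series: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """ASSUMED scaling: divide each variable by its base-year (first) value."""
    return {name: [value / values[0] for value in values]
            for name, values in series.items()}


def one_year_ahead_pairs(years: Sequence[int]) -> List[Tuple[int, int]]:
    """Pair the predictors of year t with the energy demand of year t + 1."""
    return [(t, t + 1) for t in years[:-1]]


def random_split(samples: Sequence, n_train: int, seed: int = 0):
    """Uniform random train/test split without replacement."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(samples)), n_train))
    train = [s for i, s in enumerate(samples) if i in chosen]
    test = [s for i, s in enumerate(samples) if i not in chosen]
    return train, test


pairs = one_year_ahead_pairs(range(1980, 2012))   # Spain, 1980-2011
train, test = random_split(pairs, n_train=15)
print(len(pairs), len(train), len(test))           # 31 15 16
```

The same helpers cover the French experiment: `one_year_ahead_pairs(range(1990, 2017))` gives 26 samples, which split into 13 training and 13 test samples.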

| Parameter | Value |
|---|---|
| Maximum number of wraps | 1 |
| Chromosome length | 300 |
| Number of C particles | 60 |
| Number of E particles | 30 |
| w | 0-0.73 |
| Maximum number of iterations | 10 000 000 |

| Energy demand prediction in France
As in previous works in the literature, we collected values of nine macroeconomic variables for France. In this case, the data span the years 1990 to 2016. The training set is formed by 13 years chosen by uniform random sampling, and the test set is formed by the remaining 13 years (27 years of data yield 26 one-year-ahead samples). Figure 6 shows a graphic representation of the data set of actual energy demand in France, highlighting the training years. All variable values have been scaled with respect to their 1990 values, which are the smallest ones. The following variables have been taken into account:
1. Energy production (kTOE)
2. Population
3. Gross Domestic Product (€)
4. CO2 emissions (Mton)
5. Electricity production (TWh)
6. Electricity consumption (TWh)
7. Unemployment rate
8. Usage of renewable energy (% of electricity production)
9. Oil demand (Mton)

The grammar settings are the same as in the experiment with the Spanish data. However, since the French data set has only nine variables, the grammar has been modified by removing the X10 to X14 productions from rule V (see Figure 2).

| Results
We ran 32 executions of the optimization process on the training data set, obtaining a set of 32 models for each algorithm, using as objective function the root-mean-square error (RMSE), to be minimized, between the real data and the output of the models obtained during the search process when fed with the input variables. Each model takes the input variables in year t and produces an estimation of the energy demand in year t + 1. Both GE and GS use the grammar shown in Figure 2, while GE + DE uses the same grammar extended with the parameters to be optimized by the DE part, as described in ref. 19. For the test set, we calculated the best average relative error ($\varepsilon_R^*$) and the mean average relative error ($\bar{\varepsilon}_R$). In the calculation of the mean average relative error, the worst and the best performing models were removed. Table 4 shows the results of the energy demand prediction obtained on the training and test data sets by the algorithms under study.
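The two test metrics can be sketched in Python as follows. The text does not give the formula of the average relative error, so the sketch assumes the common definition, the mean of |E - Ê|/E over the test years (as a fraction, not a percentage); the trimming step follows the description above, dropping the best and the worst run before averaging. The per-run error values are toy numbers, not the paper's results.

```python
from statistics import mean
from typing import Sequence, Tuple


def average_relative_error(observed: Sequence[float],
                           predicted: Sequence[float]) -> float:
    """ASSUMED definition: mean of |E - E_hat| / E over the test years."""
    return mean(abs(e - e_hat) / e for e, e_hat in zip(observed, predicted))


def summarize_runs(errors: Sequence[float]) -> Tuple[float, float]:
    """Return (best error, mean error after dropping the best and worst runs)."""
    if len(errors) < 3:
        raise ValueError("need at least three runs to drop the best and worst")
    ordered = sorted(errors)
    return ordered[0], mean(ordered[1:-1])


# Toy per-run errors (not the paper's results).
best, trimmed = summarize_runs([0.05, 0.02, 0.08, 0.03])
print(best, trimmed)
```

With the 32 runs per algorithm used here, the trimmed mean would average the remaining 30 runs, which makes it less sensitive to a single lucky or failed execution than a plain mean.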

| DISCUSSION
As can be seen in Table 4, the results obtained by GS on the test data of Spain outperform those of both the GE and GE + DE methods, reducing the mean average relative error by 50% with respect to GE, the second-best performer. On the test data of France, on the other hand, the behavior of GS and GE + DE is comparable, although the results of GE + DE are slightly better.

FIGURE 8 Actual energy demand compared to the prediction made by GS for the test data set of France
We observe that the swarm intelligence approach is able to find good models in both data sets. In contrast, the GE + DE and GE methods are sensitive to the data selected for the test. As seen in Figures 5 and 6, the training years for Spain are more consecutive, which may have caused overfitting of the GE + DE and GE methods, which obtained good results on the training data. The selected data from France are more sparsely distributed, and GE + DE is able to capture the behavior of the data. GS, on the contrary, performs very similarly on both data sets, showing more robustness than the GE methods and obtaining good results, which are also the best ones in the experiment with the data from Spain. This hypothesis can be confirmed by comparing with the training data used for the same problem in ref. 19, where the data are more consecutively distributed and both algorithms (GE, GE + DE) obtain much better results on the test set. Figure 7 shows a graphic comparison between the estimated and the actual energy demand in Spain for the test data set, obtained with the best model from the GS executions. As can be seen, the model successfully captures the trends of the actual data.
As in the previous experiment, Figure 8 compares the prediction of the best model obtained by GS with the actual energy demand in France for the test data set. Again, the predictions of the GS model are comparable to the actual data.
The GS technique has proven more robust than GE or GE + DE. Its use is recommended when the data provided by the system to be modeled (ie, energy demand) are not uniformly distributed.
In future work, other energy domains should be tested. Electricity consumption from the power grid would be an interesting problem, since predicting future power demand is crucial for the proper functioning of the system.

| CONCLUSIONS
Total energy demand estimation from macroeconomic variables is an important problem, closely related to the measurement of countries' economic growth and with consequences for energy efficiency and environmental protection policies. As such, research on developing accurate new models for energy demand estimation has been intense in recent years. In this paper, we have evaluated the performance of a Grammatical Swarm algorithm for generating new energy demand estimation models from macroeconomic predictors.
The proposed algorithm is based on a predefined grammar, able to decode different prediction models evolved with a particle swarm approach. The algorithm is also able to fit the models to a series of real energy demand data. Once trained, the developed model should be able to reproduce the series of energy demand from the macroeconomic variables with good accuracy, and also to estimate future values if needed. We have tested the proposed Grammatical Swarm algorithm in two real problems of one-year-ahead energy demand estimation from macroeconomic variables, for Spain and France. In both cases, the proposed approach obtains models with excellent performance in terms of prediction error, showing robustness with different distributions of training data, and it improves on the results of previous algorithms in one of the problems.