Application of artificial neural network for forecasting standardized precipitation and evapotranspiration index: A case study of Nigeria

Accurate prediction of future drought characteristics requires a robust and efficient technique that can deduce, from historical data, the stochastic relationship or dependency between past and future conditions. In this study, an artificial neural network (ANN) is applied to forecast the standardized precipitation and evapotranspiration index (SPEI) at a 12-month timescale for five candidate stations in Nigeria, using predictor data from 1981 to 2008 for training and data from 2009 to 2015 for testing. The predictor variables are monthly average precipitation, average air temperature, maximum temperature, minimum temperature, mean wind speed, mean solar radiation, sunshine hours, and two large-scale climate indices (Southern Oscillation Index and North Atlantic Oscillation). From the several combinations of input variables, training algorithms, and hidden and output transfer functions, a total of eight model runs stood out using a three-layer ANN. The most efficient ANN architecture had 9, 8, and 1 neurons in the input, hidden, and output layers, respectively, and was trained using the Levenberg-Marquardt algorithm with tansig as the hidden and output transfer function. Statistical assessment of model efficiency indicates that the coefficient of determination, root mean square error, Nash-Sutcliffe coefficient of efficiency, and mean absolute error range between 0.51 and 0.82; 0.57 and 0.75; 0.28 and 0.79; and 0.44 and 0.56, respectively, during the testing period. These findings show that the ANN modeling technique can play a significant role as a data-driven model in forecasting monthly SPEI time series and drought characteristics in the study area, thereby supporting the development of an early warning system for the country.


INTRODUCTION
Drought is a transient and recurring meteorological phenomenon that starts when rainfall over a period is insufficient to sustain the agricultural and water resources sectors. It is common in most regions of the world and is presently the most drastic natural hazard to contend with because its onset and cessation dates are not easy to detect or predict. The uncertainties in predicting extreme events such as drought and other natural hazards have led to the loss of lives and properties. 1 For example, in 2018, a series of extreme weather events claimed many lives, destroyed crops, livestock, and infrastructure, and brought public engagements to a standstill in parts of India. 2 In addition, some of the notable extreme flood events in Nigeria occurred in July and August 2011 in Lagos and Ibadan, respectively, and more recently, in 2012, nationwide flood events occurred. 3 According to IPCC, 4 extreme events and their related uncertainties may increase due to climate change; therefore, efforts must be channeled toward developing drought early warning systems in Nigeria so that losses from drought disasters can be minimized. To achieve a good drought preparedness and mitigation plan, timely information on the onset, severity as time progresses, spatial extent, and duration of a specific drought event is vital. 5 To acquire this information, up-to-date monitoring of drought using drought indices is necessary. Even though many indices have been developed over the years, the standardized precipitation and evapotranspiration index (SPEI) stands out because it incorporates the effect of climate change into its computation in a simple way, by adding the effect of potential evapotranspiration (PET) to precipitation, which is the most important variable in drought characterization.
Since the world is believed to have entered a period of incessant change in most climate variables, most importantly temperature and precipitation, engineers and researchers responsible for planning and managing the distribution of water resources in a region must understand the spatiotemporal distributions of precipitation and evapotranspiration and their forecasted trends within the region. Therefore, SPEI forecast models will be significantly helpful to the government and stakeholders in evaluating aridity and drought impacts that result from high variability in precipitation, temperature, and evapotranspiration. Developing models that respond to water shortage can significantly help in managing drought risk, developing mitigation plans, forecasting, and feedback mechanisms 5 for a country such as Nigeria, where little information is available on the spatiotemporal extent of this phenomenon.
Drought episodes have seriously affected sustainable agriculture, people's living conditions, and the economies of many developing and under-developed countries. 6,7 Drought impacts under a changing climate have been reported to be more frequent in East Asia, South Asia, and the African continent, 8 and no dependable models have been developed to understand the phenomenon to a great extent, particularly in the West Africa region. Moreover, West Africa is situated in a severe drought risk region of the earth. 9 Several recent notable studies have addressed drought and climate variability in Nigeria and West Africa. 8,10-12 Despite their meaningful contribution to the study of drought, none of these works was aimed at developing a model that can predict a standard drought index in order to develop an effective drought exigency plan for Nigeria and the West Africa region.
In predicting significant variables of drought such as evaporation, temperature, and rainfall, two kinds of models can be considered: physical models, also known as general circulation models, and statistical (data-dependent) models. Both have merits and demerits, but statistical models have been reported to be more effective at predicting crucial parameters (eg, precipitation) that are responsible for drought events. 5 Several statistical models, ranging from regression and Markov models to the more recent machine learning approaches, have been used for forecasting drought and other climate-related parameters 5,13-21 (Anurag et al, 2020). 22 Among these methods, a simple, efficient, and well-established data-driven or statistical model is the artificial neural network (ANN). An ANN works like a black box and, unlike physical models, does not need detailed information about the input variables. It learns from the strength of correlation between input variables by checking previous trends in the datasets, much like a nonlinear regression. ANN also has the potential to handle complex datasets that have several interrelated variables. This attribute, among others, makes ANN very suitable for predicting complex hydrological and climatological phenomena such as drought. Though the technique is not new, applying it is an enlightening research endeavor, especially for the case study selected. Recently, many studies have indicated that ANN can predict drought to a high degree of accuracy. 23,24 Ghumman et al 25 found that an ANN model was better than a conventional model for runoff forecasting even when the available data were of a low standard. Hung et al 26 also reported the superiority of an ANN model over a simple persistence method for rainfall forecasting in Bangkok, Thailand.
Jain et al 27 compared ANN with an autoregressive integrated moving average time series model for forecasting streamflow in India and concluded that the ANN approach performed better. Zealand et al 28 reported the strong performance of an ANN model in predicting short-term streamflow in Canada relative to a stochastic deterministic watershed model. Deo and Sahin 5 showed that ANN models were able to predict the monthly values of SPEI very accurately in Australia. Birikundavyi et al 29 also reported that ANN performed better than a conventional conceptual model in predicting the daily streamflow of the Mistassibi River in Canada. Wang et al 30 found that ANN performed satisfactorily in predicting daily streamflow from historical streamflow data without using any external variable such as precipitation and runoff. Morid et al 31 demonstrated the efficiency of ANN for forecasting drought indices at selected places in Iran for lead times of up to 12 months. Barua et al 32 also showed the ability of ANN to forecast drought efficiently up to 6 months into the future using the nonlinear aggregated drought index (NADI).
In this study, ANN models were applied for forecasting the SPEI using a combination of some large-scale climate indices (El Nino Southern Oscillation (ENSO) and North Atlantic Oscillation (NAO)) and some other meteorological parameters as the predictor variables. The prediction of SPEI will provide stakeholders in the water and agricultural sectors with a useful tool for monitoring and evaluating the condition of dryness or aridness of a region. 33 A model for drought forecasting using evapotranspiration can play a significant role in the evaluation of the severity and magnitude of drought events and a good indicator for aridity (Yao et al, 2011) 34 since most of the study area is either arid or semiarid. The SPEI has been used for several drought studies in Asia, Europe, and Australia 35-39 but its application in Nigeria for predictive or forecasting purpose is yet to be done.
The essence of this study is in three parts; first, ANN models will be developed and applied for SPEI using a varying combination of meteorological data and some large-scale climate indices, training algorithms, and hidden transfer function using input datasets between 1981 and 2008 and tested from 2009 to 2015. Second, an extensive analysis for testing the performance of the ANN model using vital statistical parameters will be done. Finally, the prediction error (PE) yield will be assessed and evaluated for the test period. The remaining aspect of this report is itemized as follows: The succeeding section focuses on the description of the study area and the details of the meteorological and climate indices. Section 3 will provide the detailed methodology required for computing the SPEI, developing the ANN models and its performance evaluation. Section 4 will summarize the results of the model simulation and discussions, while the last section will conclude the article by giving logical recommendations for stakeholders and the government.

STUDY AREA AND DATA USED

Study area
Nigeria is situated between Latitudes 4°N and 14°N and Longitudes 2°E and 15°E. She shares boundaries with Chad and Cameroon in the east, the Benin Republic in the west, Niger in the north, and the Gulf of Guinea in the south. The country has a total land area of 925,796 km². Her strategic location in West Africa and Africa at large gives the country a wide range of climate variations from the north down to the south. 40 The southern part experiences more annual precipitation than the northern region. The rainy period in the south usually covers three-quarters of the year or more, depending on how close the area is to the ocean, while the north experiences a shorter rainy season, usually less than 7 months. Nigeria is a tropical country with only two distinct seasons (dry and wet). The dry season lasts from November until March and is characterized by low humidity and higher temperatures than the wet season because of warm winds from the Sahara Desert. Temperature ranges from 10 °C on the Jos Plateau during the harmattan period to over 40 °C during the dry season (Ogungbenro and Morakinyo, 2014). 41 The rainfall events in the southern part of Nigeria are often convectional in nature due to the region's proximity to the equatorial belt. In the south, the annual average rainfall ranges between 2000 and 4000 mm (Oguntunde et al, 2011). 42 The rainfall is usually characterized by two peaks, with a short dry season occurring between the peaks and a long dry season after the second peak. The first rainy period usually starts in March and terminates in July, with its maximum recorded in June. This period is usually followed by a break of 2 to 3 weeks in August. The second rainy period runs from the end of August until the end of October or early November, with another peak in September. The main dry season runs from early November until early March.
Nigeria can be divided into five agroecological zones based on rainfall distribution pattern (Figure 1). The agroecological zones (from north to south) are Sahel savannah, Sudan savannah, Guinea savannah, Tropical Rain Forest, and Swamp Forest. The annual rainfall ranges between 300 and 700 mm; 700 and 1100 mm; 1100 and 1500 mm; 1500 and 1900 mm; and greater than 1900 mm for the Sahel savannah, Sudan savannah, Guinea savannah, Tropical Rain Forest, and Swamp Forest zones, respectively (Oguntunde et al, 2011). One meteorological station was selected for modeling from each agroecological zone based on the availability of data and the representativeness of the station within the zone. Table 1 shows the detailed geographical and climate characteristics of the study area.

Data acquisition
Daily series of rainfall, maximum and minimum temperature data were obtained from the Nigeria Meteorological Agency (NIMET) Headquarters in Abuja, Nigeria and aggregated into monthly datasets. Data from five meteorological stations, that is, one from each of the five agroecological zones (Sahel savannah, Sudan savannah, Guinea savannah, Tropical rain forest, and Swamp forest) of the study area between January 1981 and December 2015 were used for this study (Table 1).
Before the data were utilized for analysis, they were subjected to quality control and homogeneity tests. The checks covered days with missing values and possible outliers, which might have occurred due to human or measuring equipment errors.
The missing values are generally few; for instance, more than 92% of the stations had a complete 35 years of daily data, while the remaining 8% of the stations had only a few days of data missing. Missing data were generated using artificial rainfall based on cumulative rainfall distributions as described by Haylock and Nicholls. 43 Both the temperature and rainfall datasets have previously been used for climate change studies in Nigeria. 11,40 Since wind speed and sunshine duration datasets were not accessible at most of the stations for the study period, bias-adjusted datasets were obtained from ERA-Interim reanalyzed data, at a reference height of 10 m for wind speed (ftp://ecem.climate.copernicus.eu). The wind speed data consist of the monthly eastward wind component (u) and northward wind component (v) for 35 years, 1981 to 2015. The datasets were extracted on a 0.5° × 0.5° grid. The wind components can be produced by either reanalysis or forecast. 44 For this study, the reanalyzed sets were adopted because they have a strong correlation coefficient (>0.8) with observed data, as indicated by Oluleye and Adeyewa 45 in their study on wind energy density in Nigeria as estimated from the ERA-Interim reanalyzed dataset. The wind speed magnitude (U) was then calculated using Equation (1): $U = \sqrt{u^{2} + v^{2}}$ (1). To ensure that the simulated sunshine duration dataset provided by ERA-Interim is representative of reality, data from some ground meteorological stations were obtained to validate the dataset.
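As a minimal sketch of the wind speed calculation described above, the magnitude can be obtained from the two horizontal components with NumPy. The function name and the sample values are hypothetical and serve only as illustration; they are not taken from the study data.

```python
import numpy as np

def wind_speed_magnitude(u, v):
    """Return the scalar wind speed from the two horizontal wind components."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # hypot computes sqrt(u**2 + v**2) element-wise
    return np.hypot(u, v)

# Hypothetical monthly components in m/s (illustrative values only)
u_monthly = np.array([3.0, -1.5, 0.0])
v_monthly = np.array([4.0, 2.0, -2.5])
speed = wind_speed_magnitude(u_monthly, v_monthly)
```

Because the magnitude is symmetric in the two components, the result does not depend on which component is labeled u or v.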
The strength of ENSO events is usually estimated using the Southern Oscillation Index (SOI). It indicates the level of development or intensity of El Nino or La Nina events in the Pacific Ocean. Though SOI can be computed in a few ways, the method described by the Australian Bureau of Meteorology is given in Equation (2): $\mathrm{SOI} = 10 \times \dfrac{\mathrm{Pdiff} - \mathrm{Pdiffav}}{\mathrm{SD}(\mathrm{Pdiff})}$ (2), where Pdiff is the mean Tahiti MSLP for the month minus the mean Darwin MSLP for the month, Pdiffav is the long-term average of Pdiff for the month under consideration, SD(Pdiff) is the long-term standard deviation of Pdiff for the month under consideration, and MSLP is the mean sea level pressure. The NAO is a weather phenomenon in the North Atlantic Ocean that reflects the difference in the normalized atmospheric pressure measurements at the Azores (high) and Iceland (low). These indices vary from year to year but also tend to remain in one phase for a long period. 46 Egbuawa et al 47 and Marshall et al 48 showed that ENSO-related activities and NAO may have a significant influence on rainfall, temperatures, and storms in the tropics or mid-latitudes. The correlation between SOI and rainfall shows the occurrence of dry years in some parts of Nigeria. Marshall et al 48 also noted that the influence of NAO may reach far beyond the estimated areas of coverage. The monthly values of the SOI and NAO from 1981 to 2015 were obtained from www.cru.uea.ac.uk/cru/data.
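The SOI definition above can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and the long-term mean and standard deviation are taken here from the supplied sample itself, which is an assumption (an operational index uses a fixed climatological base period).

```python
import numpy as np

def soi_for_calendar_month(tahiti_mslp, darwin_mslp):
    """SOI for one calendar month across several years.

    Inputs are the monthly mean sea level pressures (hPa) at Tahiti and
    Darwin for the same calendar month in successive years.
    """
    pdiff = np.asarray(tahiti_mslp, dtype=float) - np.asarray(darwin_mslp, dtype=float)
    # Assumption: climatology is estimated from the sample itself
    pdiff_av = pdiff.mean()
    pdiff_sd = pdiff.std(ddof=0)
    return 10.0 * (pdiff - pdiff_av) / pdiff_sd

# Hypothetical pressures for four Januaries (illustrative values only)
tahiti = [1012.0, 1013.0, 1011.0, 1014.0]
darwin = [1010.0, 1009.0, 1011.0, 1010.0]
soi_values = soi_for_calendar_month(tahiti, darwin)
```

With this scaling, the index for a given calendar month has zero mean and a standard deviation of 10 over the period used for the climatology.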

Computation of SPEI
The computation of SPEI is very similar to that of SPI. The major difference is that SPI requires only monthly precipitation (P) for its computation, while SPEI requires evapotranspiration to be subtracted from the precipitation data to account for losses due to changes in the demand for water in the hydrology of an area. According to Vicente-Serrano et al, 33 SPEI is computed in three steps. The first step is to use an established PET model such as Penman-Monteith (PM), Thornthwaite, Hargreaves, or others to calculate PET. The choice of PET model should depend on the availability of data, as the type of PET model used may not significantly affect the SPEI values. 33 The second step is to find the difference between precipitation (P) and PET for all the months computed. Equation (3) shows the estimation for a given month i: $D_i = P_i - \mathrm{PET}_i$ (3). The calculated $D_i$ values can be aggregated at many timescales depending on the type of drought under consideration. Equation (4) gives the accumulated difference for a particular month j and year i at the selected timescale k. For instance, the summed difference for a month in a particular year i with a 12-month timescale is computed according to Equation (5). The third step is to fit $D_i$ to a three-parameter log-logistic distribution. The probability density function of a three-parameter log-logistic variable can be expressed as given in Equation (6), where α, β, and γ are the scale, shape, and origin parameters, respectively. Even though there are several methods for estimating the parameters of the log-logistic distribution, the L-moment procedure is simple, efficient, and the most robust among them. The procedural steps of the L-moment method can be found in Ahmad et al. 49 Equation (6) can then be integrated to obtain the cumulative distribution in Equation (7), from which the SPEI values can be obtained for the preselected timescale.
In this study, the PET is computed using the popular PM model. 50 The SPEI is computed on a 12-month timescale to evaluate hydrological drought in the study area, although for forecasting, the timescale selection does not significantly affect the output of the ANN model.
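For reference, the steps above can be written out as follows. This is a sketch of the formulation commonly attributed to Vicente-Serrano et al and not a reproduction of this article's Equations (3) to (7); the pairing of expressions with equation numbers, and the closing standardization step with its approximation constants, are assumptions.

```latex
\begin{align}
D_i &= P_i - \mathrm{PET}_i
  && \text{monthly climatic water balance} \\
X_{i,j}^{k} &= \sum_{l=13-k+j}^{12} D_{i-1,l} + \sum_{l=1}^{j} D_{i,l}
  && \text{if } j < k \\
X_{i,j}^{k} &= \sum_{l=j-k+1}^{j} D_{i,l}
  && \text{if } j \ge k \\
f(x) &= \frac{\beta}{\alpha}\left(\frac{x-\gamma}{\alpha}\right)^{\beta-1}
        \left[1+\left(\frac{x-\gamma}{\alpha}\right)^{\beta}\right]^{-2}
  && \text{log-logistic density} \\
F(x) &= \left[1+\left(\frac{\alpha}{x-\gamma}\right)^{\beta}\right]^{-1}
  && \text{log-logistic distribution function} \\
\mathrm{SPEI} &= W - \frac{C_0 + C_1 W + C_2 W^2}{1 + d_1 W + d_2 W^2 + d_3 W^3},
  \quad W = \sqrt{-2\ln P}
  && P = 1 - F(x),\ P \le 0.5
\end{align}
```

In this formulation, $C_0 = 2.515517$, $C_1 = 0.802853$, $C_2 = 0.010328$, $d_1 = 1.432788$, $d_2 = 0.189269$, and $d_3 = 0.001308$; when $P > 0.5$, $P$ is replaced by $1 - P$ and the sign of the resulting SPEI is reversed.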
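The 12-month aggregation of the water balance can be illustrated with a rolling sum. The sketch below assumes monthly precipitation and PET series in the same units; the function name is hypothetical, and the PET values would in practice come from a PET model such as PM rather than the constants used here.

```python
import numpy as np

def accumulate_balance(precip, pet, k=12):
    """Rolling k-month sum of D = P - PET; the first k-1 months are NaN."""
    d = np.asarray(precip, dtype=float) - np.asarray(pet, dtype=float)
    # Prefix sums let each window total be formed as a difference
    csum = np.concatenate(([0.0], np.cumsum(d)))
    out = np.full(d.shape, np.nan)
    out[k - 1:] = csum[k:] - csum[:-k]
    return out

# Hypothetical values in mm (illustrative only), with a 3-month window
precip = [10.0, 20.0, 30.0, 40.0]
pet = [5.0, 5.0, 5.0, 5.0]
balance_3 = accumulate_balance(precip, pet, k=3)
```

The accumulated series is what would then be fitted to the log-logistic distribution and standardized to give the SPEI at the chosen timescale.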

Development of drought forecasting models
The SPEI is adopted for developing the drought prediction model because of its capability for climate change impact assessment. Using SPEI to predict drought events accounts for the overall dryness within the system, as the index takes into consideration the hydrometeorological variables (rainfall, PET, temperature, wind speed, and sunshine hours) that influence drought events.
To develop a drought forecasting model using ANN, the primary step is to determine the model structure and input variables to be adopted. The choice of model structure and input parameters can draw on previous relevant studies, 5,51 while the optimal model structure and input parameters are usually determined using an iterative or trial and error process. 51 The next step is the calibration of the model. This is aimed at setting the model parameters (eg, the number of hidden neurons, connection weights, and so on) that will produce the desired output. The last step is to test the trained model to ensure the applicability of the model to a real-life situation.

Selection of input variables and data preprocessing
Unlike in physically based models, the set of input parameters needed to produce the desired output may not be known initially. 52 The input parameters are selected after conducting a correlation test among the parameters to check the linearity among the variables. However, the models are developed using several combinations of the input variables to establish the best model while avoiding overfitting and underfitting problems. 5 Of the nine input neurons, seven are for the meteorological properties (rainfall, temperature, and others as given in Table 2) and the remaining two neurons are for the climate mode indices (SOI and NAO index). The details of the input parameters are shown in Table 2.
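A correlation test among candidate inputs of the kind mentioned above can be sketched as a pairwise Pearson correlation matrix. The variable names and the synthetic series below are hypothetical and are not the station data; the study does not state which correlation measure was used, so Pearson correlation is an assumption.

```python
import numpy as np

def correlation_table(columns):
    """Return variable names and their pairwise Pearson correlation matrix."""
    names = list(columns)
    data = np.column_stack([np.asarray(columns[n], dtype=float) for n in names])
    # rowvar=False treats each column as one variable
    return names, np.corrcoef(data, rowvar=False)

# Synthetic monthly series for illustration only
rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=50.0, size=120)
tmax = 35.0 - 0.02 * rain + rng.normal(0.0, 1.0, size=120)
soi = rng.normal(0.0, 10.0, size=120)

names, corr = correlation_table({"rain": rain, "tmax": tmax, "soi": soi})
```

Pairs with a correlation close to ±1 carry largely redundant information, which is one consideration when trying different input combinations.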

Selection of ANN model structure
To forecast the SPEI time series, NARX models with only one output neuron are used in this study. NARX is a nonlinear generalization of the autoregressive exogenous (ARX) model, which is a standard tool in linear black-box system identification. NARX models have been used to represent many nonlinear dynamic systems and have been adopted for many applications, including time series modeling. 55 A NARX network has feedback connections that enclose several layers of the network. More details about the NARX neural network can be found in the relevant literature. 55-57 NARX can predict a series y(t) given the p past values of series y and another external series x(t). The NARX network behavior for time series prediction can be modeled using Equation (8).
In this study, a NARX network is adopted that takes the SPEI time series at time t − 1, y(t − 1), and the exogenous data at time t − 1, x(t − 1), as inputs and provides a single output y(t), corresponding to the value of SPEI.
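The NARX input structure can be illustrated by building lagged samples, in which each target y(t) is paired with the previous p values of y and of the exogenous series x, that is, y(t) = f(y(t − 1), ..., y(t − p), x(t − 1), ..., x(t − p)). This form is the usual NARX statement and is assumed here rather than taken from Equation (8); the helper name is hypothetical.

```python
import numpy as np

def make_narx_samples(y, x, p=1):
    """Pair each y(t) with its p past values and the p past exogenous rows."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a single exogenous series as one column
    rows = []
    for t in range(p, len(y)):
        past_y = y[t - p:t][::-1]          # y(t-1), ..., y(t-p)
        past_x = x[t - p:t][::-1].ravel()  # x(t-1), ..., x(t-p), flattened
        rows.append(np.concatenate([past_y, past_x]))
    return np.array(rows), y[p:]

# Tiny illustrative series: one target and two exogenous variables
y_series = [0.0, 1.0, 2.0, 3.0]
x_series = [[10.0, 100.0], [11.0, 101.0], [12.0, 102.0], [13.0, 103.0]]
features, targets = make_narx_samples(y_series, x_series, p=1)
```

With p = 1, as adopted in the study, each sample contains y(t − 1) followed by the exogenous values at t − 1.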

ANN model development and network architecture
More details of the ANN theoretical framework can be found in relevant textbooks or published articles. 5,31 In this study, all simulations are performed on the MATLAB 2013a platform. Figure 2 shows the ANN topological structure adopted for the development of the network. To obtain the most efficient model, the inputs, training algorithm, hidden transfer function, and output transfer function of the three-layer network are systematically varied. However, since no mathematical formula or rigid rule guides the selection of the neuronal structure in the hidden layer of an ANN model, the number of neurons in the hidden layer is determined by trial and error. 53 A maximum of 12 neurons is used for the network architecture. To determine the optimum architecture and avoid overfitting and underfitting, various combinations of input, hidden layer, and output neurons are tried. This resulted in eight model runs, each with a different combination for the model architecture, using two training algorithms and one hidden and output transfer function. The Bayesian regularization backpropagation (trainbr) and Levenberg-Marquardt (trainlm) training algorithms are adopted for the study. The advantage of these algorithms over others is that they determine the combination that produces a network that generalizes well by minimizing a combination of squared errors and weights. The mathematical relationship between inputs and outputs is expressed explicitly in Equation (9).
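The input-output relationship of a three-layer network of this kind can be sketched as a forward pass with a tansig hidden layer and a tansig output layer. The weights below are random placeholders rather than trained values, the 9-8-1 layout follows the best architecture reported in the abstract, and the sketch assumes that the targets have been scaled to the interval [−1, 1], since a tansig output is bounded.

```python
import numpy as np

def tansig(n):
    """Hyperbolic tangent sigmoid; 2 / (1 + exp(-2n)) - 1 equals tanh(n)."""
    return np.tanh(n)

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> tansig hidden layer -> tansig output layer."""
    hidden = tansig(inputs @ w_hidden + b_hidden)
    return tansig(hidden @ w_out + b_out)

# Placeholder (untrained) weights for a 9-8-1 network
rng = np.random.default_rng(42)
n_in, n_hidden, n_out = 9, 8, 1
w_hidden = rng.normal(scale=0.5, size=(n_in, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.5, size=(n_hidden, n_out))
b_out = np.zeros(n_out)

batch = rng.normal(size=(5, n_in))  # five hypothetical scaled input vectors
prediction = forward(batch, w_hidden, b_hidden, w_out, b_out)
```

Training with an algorithm such as Levenberg-Marquardt or Bayesian regularization adjusts the weights and biases; the sketch shows only how a trained network would map inputs to an output.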
From the monthly data of 35 years available, almost 80% (ie, 1981-2008) are utilized for the training and validation phases, while the remaining (ie, 2009-2015) are utilized for testing the ANN model. After training is completed, the final weight matrix obtained is applied to the inputs in the test dataset. After running the model, the outputs are compared with the observed values of the monthly SPEI time series.
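The chronological split described above can be checked with a short sketch: 28 years of monthly data (1981-2008) give 336 training and validation months, and 7 years (2009-2015) give the 84 test months, which is an 80/20 partition. The function name is hypothetical.

```python
import numpy as np

def chronological_split(years, values, last_train_year=2008):
    """Split a monthly series by calendar year without shuffling."""
    years = np.asarray(years)
    values = np.asarray(values)
    train_mask = years <= last_train_year
    return values[train_mask], values[~train_mask]

# One year label per month for 1981-2015, with placeholder values
years = np.repeat(np.arange(1981, 2016), 12)
values = np.arange(years.size)
train, test = chronological_split(years, values)
```

Keeping the split in time order means the test period is never seen during training, which matters for a forecasting model.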

Model performance using statistical evaluation
The efficiency of the models in forecasting the monthly SPEI is statistically assessed using four prediction score metrics for the output during the testing period. The statistical measures are the root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R²), 58 and Nash-Sutcliffe coefficient of efficiency (E). 59,60 The mathematical equations for calculating the efficiency of the models are given in Equations (10) to (13), where SPEI o,i and SPEI p,i are the observed and predicted monthly SPEI in the test period, respectively, i is the month index, and N = 84 (total months in the test dataset).
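The four scores can be sketched as below, assuming their standard definitions: RMSE as the square root of the mean squared difference, MAE as the mean absolute difference, R² as the squared Pearson correlation between observed and predicted values, and E as one minus the ratio of the squared error sum to the variance sum of the observations. Whether these match Equations (10) to (13) term by term is an assumption, and the function name is hypothetical.

```python
import numpy as np

def forecast_scores(observed, predicted):
    """Return RMSE, MAE, R2 and Nash-Sutcliffe E for paired series."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - o
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r = np.corrcoef(o, p)[0, 1]  # Pearson correlation coefficient
    nse = 1.0 - np.sum(err ** 2) / np.sum((o - o.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "R2": r ** 2, "E": nse}

# Illustrative values: a prediction with a constant bias of +0.5
observed = [-1.0, 0.0, 1.0, 2.0]
biased = [-0.5, 0.5, 1.5, 2.5]
scores = forecast_scores(observed, biased)
```

The example shows why R² and E are reported together: a constant bias leaves R² at 1 while E drops below 1, so E penalizes systematic over- or underprediction that R² does not detect.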

Assessment of the drought forecasting models
To carry out a thorough assessment of drought in any region, a notable and accepted drought index must be used. Due to the versatility and efficiency of SPEI in the face of climate change, it is adopted to develop drought forecast models for all five agroecological zones in Nigeria (Figure 1). A 12-month timescale is selected so as to account for hydrological drought. However, it is pertinent to note that the index or timescale may not have a significant effect on the ability of ANN to develop models that can be used for forecasting purposes. This is because ANN considers only the intrinsic trends in the input variable datasets to develop its models and not the values of the datasets. Several studies have used ANN to develop forecasting models either for drought assessment or for other hydrological predictions, but none has reported that one index or timescale is forecasted better than another. 5,31,61 What most of these studies have in common are the input parameters and the ANN architectures adopted. To predict SPEI to a high degree of accuracy, several combinations of input variables are tried before arriving at the eight models listed in Table 3. Table 3 presents the ANN models and network architectures as well as the performance of the models in the training and validation phases. Further assessment is done on the developed models for all five agroecological zones in Nigeria (Table 4). The essence of this is to evaluate the efficiency, performance, and practicality of the developed models. The goodness of fit of the models for evaluating drought at the five stations is verified using a scatter diagram of predicted and observed monthly SPEI during the testing period (2009-2015), fitted with a linear regression equation.
For the five candidate stations under consideration in this study, the gradient (m) of the linear fit ranged between 0.78 (Lokoja) and 0.93 (Nguru). The coefficient of determination (R²) is between 0.51 and 0.82 and the y-intercepts are between −0.121 and 0.197 (Table 4). Thus, there is good agreement between the observed and the predicted values. Comparing the percentage difference in SD between the predicted and observed SPEI values, the lowest magnitude is recorded for Kaduna (0.68%) relative to the remaining four stations. The least performance is indicated for Ikom (−15.87%). The ability of the ANN models to predict SPEI better for Kaduna and Nguru may not be unconnected with the fact that these locations are sited in the Sahel region of West Africa, which is known to have frequent drought episodes when compared with other agroecological zones in West Africa. 8,10 Comparing the forecasting skill of ANN in this study with other relevant studies in which ANN and other machine learning tools were used, we find that the performance of the ANN models developed is within an acceptable limit. A study by Morid et al 31 on the forecasting of drought using ANNs and time series of drought indices reported the R² of the developed models to be between 0.66 and 0.79. In another study by Barua et al 32 on drought forecasting using ANN for the NADI, the R-value of the best models were 0.75 for the recursive multistep neural network and direct multistep neural network, respectively. Fung et al 13 also recorded R² values between 0.86 and 0.91 for SPEI time series as the timescale increases, using improved SVR machine learning models.
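The gradient, y-intercept, and R² of a scatter-plot fit of this kind can be obtained with a least-squares line of predicted against observed SPEI. The sketch below uses made-up values, not the station results, and the function name is hypothetical.

```python
import numpy as np

def linear_fit(observed, predicted):
    """Least-squares line predicted = m * observed + c, plus R2."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    m, c = np.polyfit(observed, predicted, deg=1)
    r2 = np.corrcoef(observed, predicted)[0, 1] ** 2
    return m, c, r2

# Illustrative series lying exactly on the line 0.8 * x + 0.1
obs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
pred = 0.8 * obs + 0.1
m, c, r2 = linear_fit(obs, pred)
```

A gradient close to 1 and an intercept close to 0 indicate that predictions track the observations without a systematic scale or offset error.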
For a visual representation of the developed ANN models, Figure 3A-E shows the plots of the predicted and observed SPEI time series on a monthly basis during the testing period (2009-2015), in conjunction with the PE of the ANN models recorded for each month. PE is estimated as the difference between SPEI p and SPEI o on a monthly basis. When PE is zero, there is no difference between the predicted and the observed SPEI time series. In general, the visual agreement between the predicted and the observed SPEI time series is good during the testing period. Over the 84 months of the testing period, the least PE (0.145) is recorded at the Kaduna station. This observation is in line with the scatter plot analysis between the observed and the predicted time series and the summary statistics described in Table 5. The comparison based on the four prediction score metrics described in Equations (10) to (13), showing the forecasting skill of the ANN model over the five stations, is given in Table 5. In drought forecasting, the onset, duration, and severity of major extreme events are very important. 31 This means that a good forecasting model should be able to capture all or most of these drought characteristics efficiently. Of these characteristics, the onset of drought is the most important to know because it informs the proper timing of the mitigation measures needed. Further observations from Figure 3A-E show that the onset of drought is captured for all five candidate stations. This result further supports the efficiency of the developed models in reflecting the characteristics of drought events.
In line with previous studies on drought forecasting in Asia, Europe, and Australia, 5,24,62 this study shows the strength of ANN, one of the machine learning approaches, in providing efficient models that can be used as an early warning system. From Table 5, the mean of the PE during the test period (2009-2015) shows that the optimum model predictions are obtained for Kaduna. For this station, the smallest values of both the MAE (0.4375) and the RMSE (0.5670), and the highest R² (0.8246), are recorded. The next best performance is recorded for Nguru. The PE terms for Nguru are only a little different from those of the Kaduna station. The least performance of the ANN models in the study area is recorded at the Ikom station, where the MAE, RMSE, and R² values are 0.48, 0.62, and 0.51, respectively. This result is also in line with the values indicated in Table 4, where the difference in the percentage SD of observed and predicted SPEI is the largest (−15.87%). The Nash-Sutcliffe coefficient of efficiency (E) is also computed based on the values of the simulated SPEI. Considering E, the best prediction is also recorded for Kaduna (E = 0.7940) while the least is recorded for Ikom (E = 0.28). The average MAE, RMSE, R², and E values for all five candidate stations are 0.48, 0.63, 0.65, and 0.55, respectively. Generally, the performance of the developed ANN models for all the stations can be categorized as good, because the average values recorded from the error statistical analysis are significant. Figure 4A-E shows the frequency distribution of the PE for each station computed during the testing period. The range of PE for Kaduna and Nguru is narrower (ie, −1.0 ≤ PE ≤ +1.5) than for the other stations, although a few outliers are noted.
For instance, there is a case of PE reaching 2.50 at the Nguru station, but when the frequencies of the overpredicted and underpredicted SPEI values are compared, larger differences are more evident at the Lokoja, Ikeja, and Ikom stations. The model overpredicted the SPEI values at all the stations, most strongly at Ikom, where more than half of the months are overpredicted, while underprediction is most frequent at Nguru. For instance, the percentage of time with overprediction recorded for Nguru, Kaduna, Lokoja, Ikeja, and Ikom is 54.76%, 64.20%, 63.10%, 76.86%, and 82.14%, respectively. Based on the results for the underpredicted and overpredicted values, the accuracy of the ANN model prediction differs among the stations considered. This is in line with some previous studies that considered several geographical locations or regions in their simulations using ANN. 5,31,63 Even though geographical factors may not necessarily influence the performance of ANN models, they may have a significant effect on the predictor variables. This means that before an effective early warning system can be developed for a region, it is necessary to test the effectiveness of the models at several locations or carry out a spatial and temporal analysis in the region using the models.
The spread of the observed and the predicted SPEI time series is shown in Figure 5A-E using boxplot analysis. The boxplot analysis shows the degree of spread in the predicted data by using the quartile values. The lower and upper ends of the outer box capture the lower quartile q1 (25th percentile) and the upper quartile q3 (75th percentile), respectively, while the line that divides the box into two represents the second quartile q2 (50th percentile), which is the median of the data. The median of the predicted and the observed SPEI for the ANN models is nearly identical for all five stations.
The central tendency of the observed and predicted data is close because the ANN models captured most of the characteristics of drought well. Important features of a good and efficient drought forecasting model include its ability to simulate the frequency, duration, and intensity of notable drought events. 5 A good understanding of these characteristics will help in developing an efficient early warning system.
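The error statistics discussed above (MAE, RMSE, R², and E) and the share of overpredicted months can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name forecast_scores, the sign convention PE = predicted − observed, and the definition of R² as the squared Pearson correlation are all assumptions made here for illustration.

```python
import math


def forecast_scores(observed, predicted):
    """Return MAE, RMSE, R2, Nash-Sutcliffe E, and the percentage of overpredicted values.

    Assumptions (not taken from the paper): PE = predicted - observed,
    and R2 is the squared Pearson correlation coefficient.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")

    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_o == 0 or ss_p == 0:
        raise ValueError("series must not be constant")
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    r2 = cov * cov / (ss_o * ss_p)   # squared Pearson correlation
    nse = 1.0 - sse / ss_o           # Nash-Sutcliffe coefficient of efficiency
    over_pct = 100.0 * sum(1 for e in errors if e > 0) / n
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "E": nse, "over_pct": over_pct}


if __name__ == "__main__":
    # Toy series (invented numbers, not station data): a constant overprediction of 0.5.
    scores = forecast_scores([0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5])
    print(scores)
```

In this toy series, a constant overprediction of 0.5 leaves R² at 1 while E drops to 0.8, which illustrates why the two scores can rank a model differently when predictions are systematically biased.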
Despite the overprediction and underprediction of SPEI values at all five candidate stations considered in the study, some significant drought characteristics, such as the onset of the 2009 drought event, are captured at Nguru and Kaduna stations. In addition, the severity/magnitude and the duration of the drought event are adequately captured across all the agroecological zones considered. This shows that the present ANN models have a good forecasting ability for the SPEI time series, and therefore this modeling technique can offer noteworthy advantages to engineers, scientists, and other stakeholders in the water resources sector. Moreover, since the SPEI incorporates the effect of rainfall variability, temperature changes, and losses due to evapotranspiration in its computation, it is a good metric for water resources development/management. Therefore, predictive models that reflect the combined influence of the parameters used in developing the ANN models in this study are important for detecting climate risk events, for general agriculture (of which agricultural engineering is a subset), and for the management of ecosystems. In addition, these models are useful for evaluation purposes by stakeholders who need to take into consideration the current soil moisture status and its likely effect on dry spells, moisture losses from the soil surface to the atmosphere, impacts on drought/desertification, and the general well-being of natural ecosystems.
In comparison with studies on rainfall prediction in northern Nigeria, 64,65 the prediction score metrics (Table 5) obtained from the ANN models exhibit a relatively better performance, as evidenced by the higher R² and smaller RMSE. The better performance of the ANN models may be partially attributed to the use of standardized values of the drought index, unlike the direct simulation of rainfall values in the earlier studies, and to the use of some large-scale climate indices.
This study also showed that large-scale climate indices may have a significant impact on the efficiency of drought forecasting models, because the best model was formulated considering both local meteorological variables and the large-scale climate indices. Indeed, several studies have shown the predictability of drought using some of these large-scale indices. Goddard et al 66 and Roundy and Wood 67 reveal the predictive skill of sea surface temperature (SST) and of some land surface characteristics, for example, soil moisture and snow cover, on seasonal timescales. Hao et al 68 highlighted that the major advancement in severe/extreme drought prediction started with the discovery of teleconnections between SST anomalies and the hydroclimatic variables of regions. The coupled ocean-atmosphere El Niño Southern Oscillation (ENSO) phenomenon, with a usual period of between 2 and 7 years, provides the most important source for seasonal drought prediction. ENSO-related activity affects the seasonal climate of many regions across the globe, including North and South America, West, East, and South Africa, India, Indonesia, southwest Asia, and Australia. 69 The effect of ENSO differs from region to region, so that the correlation between this phenomenon and precipitation is high in some regions, with seasonal differences. 19 The seasonality of ENSO also makes the phenomenon suitable for the seasonal forecast of drought or dryness of an area. The SST anomaly has been recognized as the external driver of large-scale drought conditions in different regions, and this has significantly improved the forecasting capabilities in many areas, especially those with strong teleconnections. 68-70 For illustration, drought forecasts in the United States of America have hinged on the accurate prediction of Pacific SST anomalies associated with ENSO or the Pacific decadal oscillation, which often produce the dominant forcing. In addition, SST anomalies in the Atlantic Ocean and Indian Ocean have been shown to provide good predictive skill 71 (Hoerling et al, 2009). 72 To the best of our knowledge, the use of large-scale climate indices and quantification of important variables has not been applied to forecast drought time series using ANN models in Nigeria or West Africa. We believe this approach can help to enable effective forecasting and early warning systems.

SUMMARY AND CONCLUSION
The ability to correctly predict the future is one of the most important and difficult endeavors in the applied sciences, especially in weather and climate forecasting. The selection of effective predictors from historical datasets requires statistical and computational methods that can better capture the relationships between the past and near-future values of observed variables, as well as support skillful plans to tackle longer-term problems. This study has demonstrated, using a real case, the role of machine learning in dealing with forecasting problems. To forecast drought events, ANN is used to predict the monthly SPEI time series using some climate parameters between 1981 and 2015 for five candidate stations in Nigeria. The ANN is calibrated using some hydrometeorological variables and two large-scale climate mode indices as the input variables or predictors and the SPEI time series as the output variable. The efficiency of the ANN models is evaluated using standard metrics of prediction, such as the difference between the observed and the predicted SPEI time series, the variation of the predicted SPEI, MAE, RMSE, R², and E. The main outcomes of this research are highlighted in the next few paragraphs.
The most efficient model is derived using the Levenberg-Marquardt algorithm (trainlm) for training the network, the tangent sigmoid (tansig) function as the hidden transfer function, and a linear function as the output transfer function. The architecture that yielded the best ANN model has 9, 8, and 1 neurons in the input, hidden, and output layers, respectively (9-8-1). The model is selected based on the lowest RMSE (0.125), smallest MAE (0.089), and highest R² (0.972) values when compared with the results of the remaining models.
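To make the 9-8-1 architecture concrete, the forward pass of such a network (tansig hidden layer, linear output) can be sketched in plain Python. This is only an illustrative sketch: the helper names (init_network, forward, count_parameters), the random weight range, and the seed are assumptions, the weights are untrained, and the Levenberg-Marquardt training step itself is not shown.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 9, 8, 1  # the 9-8-1 layout described in the text


def tansig(x):
    # Tangent sigmoid: 2 / (1 + exp(-2x)) - 1, which is mathematically equal to tanh(x).
    return math.tanh(x)


def init_network(seed=0):
    """Create small random weights and biases (illustrative values, not trained)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    b2 = rng.uniform(-0.5, 0.5)
    return w1, b1, w2, b2


def forward(network, inputs):
    """Map 9 (already scaled) predictor values to one SPEI estimate."""
    w1, b1, w2, b2 = network
    if len(inputs) != N_INPUT:
        raise ValueError(f"expected {N_INPUT} inputs, got {len(inputs)}")
    # Hidden layer: weighted sum plus bias, passed through tansig.
    hidden = [
        tansig(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w1, b1)
    ]
    # Output layer: linear (identity) transfer function.
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def count_parameters(network):
    """Number of adjustable weights and biases in the network."""
    w1, b1, w2, _ = network
    return sum(len(row) for row in w1) + len(b1) + len(w2) + 1


if __name__ == "__main__":
    net = init_network()
    print(count_parameters(net))   # 9*8 + 8 + 8*1 + 1 adjustable parameters
    print(forward(net, [0.1] * N_INPUT))
```

A network of this size has 89 adjustable weights and biases, which is small enough for a second-order method such as Levenberg-Marquardt to be practical.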
Comparison of the observed and predicted SPEI time series during the test period (2009-2015) indicates that the gradient (m) reflects a good agreement, in the range of 0.78 to 0.93; the R² is in the range of 0.51 to 0.82; and the highest deviation of the simulations from the observed SPEI time series is between 0.74 and 1.27 from the scatter plot analysis. In terms of the performance of the ANN model among the stations, Kaduna station is the best while Ikom station has the poorest predictions and also recorded the highest difference in the percentage of SD (−15.87%) between the observed and the predicted SPEI time series.
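The gradient (m) of the scatter plot and the percentage difference in SD can be computed along the following lines. This is a hedged sketch rather than the study's procedure: it assumes m is the least-squares slope of predicted on observed SPEI and that the SD difference is expressed as 100 × (SD of predicted − SD of observed) / SD of observed; the function name slope_and_sd_gap is hypothetical.

```python
import statistics


def slope_and_sd_gap(observed, predicted):
    """Return (m, sd_gap_pct) for two equal-length series.

    Assumptions (not confirmed by the paper):
      m          = least-squares slope of predicted regressed on observed
      sd_gap_pct = 100 * (SD_predicted - SD_observed) / SD_observed
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")

    mean_o = statistics.fmean(observed)
    mean_p = statistics.fmean(predicted)
    sxx = sum((o - mean_o) ** 2 for o in observed)
    if sxx == 0:
        raise ValueError("observed series must not be constant")
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    m = sxy / sxx

    sd_o = statistics.stdev(observed)   # sample standard deviation
    sd_p = statistics.stdev(predicted)
    sd_gap_pct = 100.0 * (sd_p - sd_o) / sd_o
    return m, sd_gap_pct


if __name__ == "__main__":
    # Toy series (invented numbers): predictions damped to 80% of the observed values.
    obs = [-1.0, 0.0, 1.0, 2.0]
    pred = [0.8 * o for o in obs]
    print(slope_and_sd_gap(obs, pred))
```

Under these assumptions, a slope below 1 together with a negative SD difference indicates predictions that vary less than the observations, that is, a model that damps the extremes.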
Considering the error statistical analysis carried out on the mean of all the five stations during the test period, the results show that the ANN model did reasonably well in forecasting SPEI values for all the stations. The mean of the five stations indicates an MAE of 0.49, an RMSE of 0.63, and an R² of 0.65. In addition, the value of the Nash-Sutcliffe coefficient of efficiency (E = 0.55) is above average, which makes the model acceptable, although it may need to be improved.
In conclusion, the models developed in this study using ANN indicate a good forecasting skill for the monthly time series of SPEI for all the agroecological zones in Nigeria. The successful incorporation of historical observations of hydrometeorological variables with some large-scale climate indices as input variables for forecasting SPEI suggests that using these variables for future data-driven drought forecasting can perform well, especially when a longer historical dataset is considered. Despite the PEs detected for all the stations considered, the method adopted for testing various combinations of the training algorithms and the hidden and output transfer functions is efficient, and engineers and other researchers can test its use in many other geographical areas in climate change, weather, and natural hazard-related studies. This study succeeded in optimizing hidden neurons, activation functions, and many unique combinations of training, validation, and testing algorithms. It is, however, recommended that each predictor variable be assessed to determine how much it contributes to the outcome of the model. In addition, performance evaluation of the models based on the architectures of the network and the activation and output functions is necessary. Nevertheless, in terms of drought forecasting, models with simple design, training, testing, and application stages are, when compared with physical models, of better value to stakeholders involved in developing models for predicting available water resources, water demand, supply and planning, agricultural sustainability, and general hydrology. Future studies will focus on developing models that can predict drought conditions some time ahead with a higher degree of accuracy for Nigeria. 
An increase in the number of meteorological stations will also need to be considered for a better spatiotemporal analysis and modeling of various SPEI timescales.

PEER REVIEW INFORMATION
Engineering Reports thanks Charlene Chai Hoon Koo, Abdol Rassoul Zarei, and other anonymous reviewers for their contribution to the peer review of this work.