Uncertainty analysis of streamflow drought forecast using artificial neural networks and Monte-Carlo simulation

ABSTRACT

In this research, two scenarios of drought forecast were studied. In the first scenario, the time series of monthly streamflow was converted into the Standardized Hydrological Drought Index (SHDI), an index similar to the well-known Standardized Precipitation Index (SPI). A multi-layer feed-forward artificial neural network (FFANN) was trained with the SHDI time series to forecast the hydrological drought of the Karun River in southwestern Iran. In the second scenario, the time series of monthly streamflow discharge was forecasted directly and then converted to the SHDI. Principal component analysis (PCA) and forward selection (FS) techniques were applied to remove the dependency among inputs and to reduce the number of input variables, respectively. Moreover, the uncertainty of the SHDI and monthly streamflow discharge forecasts was investigated using a Monte-Carlo simulation approach. Findings indicated that the results of the first scenario were considerably better than those of the second scenario and that the SHDI adequately forecasted hydrological drought. The Monte-Carlo simulations demonstrated that all forecasted values lie within the 95% confidence intervals.

1. Introduction

Drought is a natural disaster imposing significant impact on socio-economic, agricultural, and environmental domains. Although the definition of drought is not universal, its occurrence is usually attributed to a long and sustained period of scarce water availability (Dracup et al., 1980; Redmond, 2002). Drought imposes severe consequences on all affected regions, in particular arid and semi-arid regions. Different drought classifications have been proposed. Wilhite and Glantz (1985) classified droughts in four categories: meteorological, agricultural, hydrological, and socio-economic. Hydrological drought emerges with a lag after meteorological drought and is generally defined as a period of time in which the amount of available water, river discharge, or reservoir level is less than the normal condition.

Hydrological drought forecasting helps decision makers to lay out mitigation measures within the context of water resources management. Traditionally, statistical time series models have been used for drought forecasting. Simple multiple regression and autoregressive moving average (ARMA) models are typical forecasting models. However, such models are basically linear models assuming that data are stationary. These models have limited ability to capture nonlinearities in the hydrologic data. Thus, hydrologists tend to consider alternative forecasting tools when nonlinearity and nonstationarity exist in the data (Kim and Valdes, 2003).

Most reported studies deal with direct streamflow forecasting rather than drought index forecasting. Coulibaly et al. (2000) introduced an early stopped training approach (STA) to train multi-layer feed-forward neural networks (FFNN) for real-time reservoir inflow forecasting. The proposed method takes advantage of both Levenberg–Marquardt back propagation (LMBP) and the cross-validation technique to avoid underfitting/overfitting in FFNN training and to enhance generalization performance. Overall, the results showed that the proposed method is effective for improving prediction accuracy.

Wang et al. (2006) predicted 1–10 d ahead stream discharge using three types of artificial neural networks. They used six preprocessing methods for input data. Results indicated that standardizing the data can improve network performance.

Aqil et al. (2006) evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. For comparison, a multiple linear regression analysis performed by the Citarum River Authority was also examined using various statistical indices. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13 and 10%, respectively.

Modarres (2007) applied a multiplicative seasonal autoregressive integrated moving average (SARIMA) model to forecast monthly streamflow of Zayandehrud River in western Isfahan province, Iran. Observed and forecasted streamflow showed a drought period but with different return periods.

Fernandez et al. (2009) applied an ARIMA model to forecast monthly streamflow in a watershed in Spain. After forecasting streamflow for 12 months ahead, three drought thresholds were chosen: the streamflow mean, the monthly streamflow mean, and the standardized streamflow index. Both observed and forecasted streamflow showed no evidence of drought in this basin.

Raziei et al. (2010) investigated space–time variability of hydrological drought in Iran. They prepared a precipitation dataset for the period 1948–2007 using the NCEP/NCAR and GPCC datasets. The aim was to detect long-term trends in drought/wetness time series. Results indicated good agreement between the two datasets in the southeastern and north-western regions, while discrepancies occurred for the central and Caspian Sea regions.

Tabari et al. (2012) used Streamflow Drought Index (SDI) for assessment of hydrological drought in northwest of Iran and reported that streamflow volume did not follow the normal distribution. Thus, they tested some other distributions and finally applied lognormal distribution to generate the SDI time series. Results indicated that all the stations experienced extreme droughts, especially in the last decade.

In this research, using the concept and methodology of Standardized Precipitation Index (SPI) meteorological drought index, Standardized Hydrological Drought Index (SHDI) is first introduced for analysis of hydrological droughts. Then, as a first scenario, ANN model is trained with the SHDI time series and later applied to forecast the drought index. In the second scenario, ANN model is trained with the streamflow discharge time series to forecast the discharge first. Then, the forecasted discharge time series is converted to the SHDI and the results of the two scenarios are compared. Finally, the uncertainties of discharge and drought index forecasts are investigated using a Monte-Carlo simulation approach.

2. Study area and methodology

2.1. Study area and data set

Karun River basin, located in southwest of Iran, covers an area of 24 202 km2 at Pole-shaloo hydrometric station (Figure 1). Karun River is the longest river in Iran and originates from Zardkouh mountains and flows down to the Persian Gulf. Monthly rainfall and temperature (minimum and average) data as well as discharge data at Pole-shaloo station for 30 years (from 1974 to 2004) were acquired for this study. Pole-shaloo station is located just upstream of Karun 3 dam. The location of meteorological stations used in this study is shown in Figure 2.

Figure 1.

Location of study area in Iran.

Figure 2.

Location of meteorological stations within and around Karun basin.

2.2. Standardized Hydrological Drought Index

There are several indices used for hydrological drought analysis. Surface Water Supply Index (SWSI), Palmer Hydrological Drought Index (PHDI), and Reclamation Drought Index (RDI) are the most well known. However, there are deficiencies with most indices. For example, determining the weight of variables in the SWSI index is subjective. RDI is a suitable index, but it requires some data that are not measured in all meteorological stations. Similarly, PHDI, mostly used in the United States, is data intensive so that required data may not be available in meteorological stations. Furthermore, PHDI is not a normal index and similar values in two locations do not indicate the same drought intensity. For drought analysis and forecasting, a simple index is desirable.

McKee et al. (1993) proposed the SPI for meteorological drought analysis. Calculation of the SPI requires long-term (at least 30 years) monthly precipitation data. The probability distribution function (PDF) of precipitation is assumed to follow the gamma distribution. The cumulative distribution is then transformed, using equal probability, to a normal distribution with a mean of zero and standard deviation of one. A given precipitation for a specified time period corresponds to a particular SPI value consistent with the normal probability. Positive SPI values indicate greater than median precipitation, while negative values indicate less than median precipitation. For more details on the SPI, one may refer to Wu et al. (2007) and Lloyd-Hughes and Saunders (2002).

By replacing precipitation with discharge in the SPI, provided that other conditions and assumptions remain unchanged, the SHDI emerges. Similar to the SPI, the SHDI has a number of advantages. First, it is simple and is based only on discharge data. Second, it can be determined on various timescales. Third, because of its normal distribution, the frequencies of extreme and severe droughts for any location and any timescale are comparable. Fourth, the SHDI and SPI are similar, which eases the conjunctive analysis of meteorological and hydrological droughts.

For classification of SHDI, McKee et al.'s (1993) classification for the SPI is used. In SPI, the PDF of precipitation is assumed to follow the gamma distribution. However, monthly discharge data may not fit the gamma distribution. Log-normal, Pearson, and log-Pearson distributions represent other potential distributions. If log-normal is selected, a simple logarithmic transformation of the data will yield the SHDI series as follows:

\[ \mathrm{SHDI} = \frac{\ln(x) - \mu}{\sigma} \tag{1} \]

where μ and σ are the mean and standard deviation of the log-transformed data (ln x), respectively.
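The log-normal case can be illustrated with a short Python sketch. It assumes that the mean and standard deviation are taken over the log-transformed series and that the sample standard deviation is used; the function name and the discharge values are hypothetical and are not data from this study.

```python
import math
import statistics


def shdi_lognormal(discharge):
    """Standardize a positive discharge series under a log-normal assumption."""
    logs = [math.log(q) for q in discharge]   # requires q > 0
    mu = statistics.fmean(logs)               # mean of the log-transformed data
    sigma = statistics.stdev(logs)            # sample standard deviation of the logs
    return [(y - mu) / sigma for y in logs]


# Hypothetical monthly discharges (m3/s), for illustration only
flows = [120.0, 95.0, 60.0, 210.0, 340.0, 80.0, 45.0, 150.0]
index = shdi_lognormal(flows)
```

By construction the resulting series has zero mean and unit standard deviation, and the lowest discharge receives the most negative index value.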

For Pearson, log-Pearson, gamma, and other distributions, application of appropriate transformation functions to the normal distribution is required. In this research, the approximation proposed by Abramowitz and Stegun (1965) is applied which converts the cumulative probability to the standard normal variable.
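The text does not give the formula number of the approximation. A plausible candidate is the rational approximation 26.2.23 of Abramowitz and Stegun, which is widely used in SPI calculations; the sketch below assumes that formula, and the function name is hypothetical. The stated absolute error of this approximation is below 4.5 × 10⁻⁴.

```python
import math
from statistics import NormalDist

# Constants of the rational approximation (Abramowitz and Stegun, formula 26.2.23)
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308


def normal_quantile_as(p):
    """Approximate the standard normal variable z for cumulative probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    tail = min(p, 1.0 - p)                     # work in the smaller tail
    t = math.sqrt(-2.0 * math.log(tail))
    z = t - (C0 + C1 * t + C2 * t * t) / (1.0 + D1 * t + D2 * t * t + D3 * t ** 3)
    return -z if p < 0.5 else z                # lower tail gives negative z


# Compare with the exact quantile from the standard library
approx = normal_quantile_as(0.1)
exact = NormalDist().inv_cdf(0.1)
```

In an SHDI workflow, `p` would be the cumulative probability of a discharge value under the fitted distribution, and the returned `z` would be the index value.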

For the sake of comparison with other hydrological drought indices, Nalbantis and Tsakiris (2009) abbreviated the SDI to stand for ‘Streamflow Drought Index’. They proposed accumulating discharge at different timescales and later standardizing the values using the mean and standard deviation of the cumulative streamflow volumes. Such standardized values were called SDI. In a different study, Vasiliades et al. (2010) used SDI identical to the Water Balance Derived Drought Index. They simulated the discharge because the study watersheds were ungaged. The procedure to derive SDI was similar to that proposed by Nalbantis and Tsakiris (2009), except that they applied Box–Cox transformation to normalize the discharge time series.

However, for further clarification, the SHDI, standing for ‘Standardized Hydrological Drought Index’, follows the general methodology of the well-known SPI. First, the best PDF is fitted to the discharge time series at different timescales. Contrary to the SPI, however, the best PDF does not have to be limited to the gamma function. For this purpose, five likely PDFs (normal, log-normal, exponential, gamma, and log-gamma) are fitted to the discharge time series and the best PDF is selected based on the Kolmogorov–Smirnov test (described in the ‘Nonparametric tests’ section). Then, the discharge series is transformed to an equal probability normal distribution series with a mean of zero and standard deviation of one.

2.3. Artificial neural networks

The customary ANN architecture is composed of three layers of neurons: input layer, hidden layer, and output layer (Haykin, 1994). A neuron's response is based on the weighted sum of all its inputs, passed through an activation function. A feed-forward network was adopted for this study as feed-forward ANN has been shown to have a computational superiority in comparison to other paradigms (Hornik et al., 1989). The network was trained by the back-propagation algorithm through the split-validation procedure. Available data were divided into three sets: a training set, a validation set, and a test set. The training set is used to fit the ANN model weights, the validation set to select the model variant that provides the best level of generalization, and the test set to evaluate the chosen model against the remaining data. The number of hidden-layer neurons, between two and six, was chosen by trial and error. All input and output variables were scaled to the range 0.1 to 0.9 as follows (Rajurkar et al., 2004):

\[ X_n = 0.1 + 0.8\,\frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{2} \]

where X is the input variable or PCi, Xmin and Xmax are the minimum and maximum values of input variable or PCi and Xn is the standard value.
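A minimal Python sketch of this scaling is given below, assuming the linear mapping in which the minimum becomes 0.1 and the maximum becomes 0.9. The function names and sample values are hypothetical; the inverse function is added only for convenience when converting network outputs back to original units.

```python
def scale_01_09(values):
    """Map values linearly so the minimum becomes 0.1 and the maximum 0.9."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; scaling is undefined")
    return [0.1 + 0.8 * (x - lo) / (hi - lo) for x in values]


def unscale_01_09(scaled, lo, hi):
    """Invert the scaling, given the original minimum and maximum."""
    return [lo + (s - 0.1) * (hi - lo) / 0.8 for s in scaled]


raw = [12.0, 30.0, 48.0]            # hypothetical input variable
scaled = scale_01_09(raw)           # approximately [0.1, 0.5, 0.9]
restored = unscale_01_09(scaled, min(raw), max(raw))
```

Keeping the scaled values away from 0 and 1 leaves headroom for test-period values slightly outside the calibration range and avoids the saturated ends of a sigmoid activation.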

As there is no single definite evaluation criterion, it is important to apply a multi-criteria assessment of ANN skill (Dawson et al., 2002; Kumar et al., 2005). These evaluation statistics are summarized in a recent paper by Dawson et al. (2007) and may be calculated by Hydrotest, a web-based toolbox, on hydrotest website (http://www.hydrotest.org.uk). We applied 13 criteria as listed in Appendix. Moreover, some other graphical and nonparametric statistical tests were applied for evaluation of the model robustness.

2.4. Input selection

There are some 32 meteorological stations within and around the Karun basin. For the purpose of forecast, SHDI is expected to depend on several input variables such as discharge, precipitation, temperature, and snow reserve of previous months. Although the basin holds considerable snow from around November to April, a reliable and continuous snow dataset does not exist. Thus, temperature could be a viable substitute.

Inverse distance method was applied to determine the average precipitation in the basin for each month of the record. For average and minimum temperature averages over the basin, temperature lapse rate was first determined. Then, the temperature maps were produced by applying the lapse rate over the elevation map.

The basin has a long memory. One therefore expects the discharge for each month to depend on the values of influential variables in the previous months. The candidates for input variables are SHDI, precipitation, discharge, and temperature. The auto-correlation function (ACF) was computed in order to determine the lag of each input variable. Correlograms were derived for all input variables using SPSS16 Software. The r values (ACF coefficients) greater than zero point to a possible influence of the lagged input variable. To compare the effect of basin-average data and point data as inputs in the forecast models, the correlation coefficients between SHDI at 1-month timescale (SHDI1) and precipitation and temperature of all stations in the basin were studied.
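The lag analysis can be sketched in Python as follows. This is the standard sample autocorrelation estimator, not the SPSS implementation; the function name and the synthetic series with a 12-month cycle are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


# Hypothetical series with a 12-month cycle, standing in for a monthly record
demo = [math.sin(2 * math.pi * t / 12) for t in range(120)]
r = acf(demo, 15)
```

For this synthetic series the coefficient is strongly positive at lag 12 and strongly negative at lag 6, which is the kind of pattern a correlogram is used to reveal.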

2.5. Principal component analysis

Principal component analysis (PCA) is a multivariate statistical method that may be used to reduce the complexity of input variables when one is confronted with a large volume of information (Camdevyren et al., 2005; Noori et al., 2010a). Input variables are converted into PCs that are independent of one another, and the information in the input variables is retained in the PCs with minimum loss (Helena et al., 2000; Noori et al., 2010b). PCs are represented by the following equation:

\[ Z_i = a_{i1}X_1 + a_{i2}X_2 + \dots + a_{ip}X_p \tag{3} \]

where Zi represents the ith PC, ai is the corresponding eigen vector with elements ai1, …, aip, and X's are input variables. Eigen values are computed by solving the following equation (Johnson and Wichern, 1982):

\[ \left| R - \lambda I \right| = 0 \tag{4} \]

where I is the unit matrix, R is the variance–covariance matrix and λ represents the eigen values. PCA involves the following major steps:

  1. Start by coding the variables X1, X2,…, Xp with zero mean and unit variance.
  2. Calculate the Kaiser–Meyer–Olkin (KMO) measure.
  3. Calculate the variance–covariance matrix R.
  4. Find the eigen values λ1, λ2, …, λp and the corresponding eigen vectors a1, a2, …, ap.

More details on the PCA may be found in Manly (1986), Davis (1986), Noori et al. (2007), and Noori et al. (2011).
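Steps 1, 3, and 4 above can be sketched in plain Python as follows. The eigen decomposition uses cyclic Jacobi rotations, chosen here only because it needs no external library; the study does not state which solver was used. The KMO step is omitted, and all function names and data values are hypothetical.

```python
import math
import statistics


def standardize(columns):
    """Step 1: code each variable with zero mean and unit variance."""
    out = []
    for col in columns:
        m, s = statistics.fmean(col), statistics.stdev(col)
        out.append([(x - m) / s for x in col])
    return out


def covariance_matrix(columns):
    """Step 3: variance-covariance matrix R of the variables."""
    n = len(columns[0])
    means = [statistics.fmean(c) for c in columns]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
             for cb, mb in zip(columns, means)]
            for ca, ma in zip(columns, means)]


def jacobi_eigen(matrix, sweeps=50, tol=1e-22):
    """Step 4: eigen values and eigen vectors of a symmetric matrix."""
    dim = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(dim)] for i in range(dim)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(dim) for j in range(i + 1, dim))
        if off < tol:
            break
        for p in range(dim):
            for q in range(p + 1, dim):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(dim):          # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(dim):          # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(dim):          # accumulate eigen vectors in v
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(dim), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(dim)] for i in order]


def principal_components(columns, n_keep):
    """Return scores of the first n_keep PCs and the explained-variance fractions."""
    z = standardize(columns)
    eigenvalues, eigenvectors = jacobi_eigen(covariance_matrix(z))
    explained = [lam / sum(eigenvalues) for lam in eigenvalues]
    n = len(z[0])
    scores = [[sum(a_ij * z[j][t] for j, a_ij in enumerate(vec)) for t in range(n)]
              for vec in eigenvectors[:n_keep]]
    return scores, explained


# Hypothetical inputs: two correlated variables and one weakly related variable
rain = [10.0, 20.0, 15.0, 30.0, 25.0, 5.0, 40.0, 35.0]
flow = [12.0, 19.0, 17.0, 33.0, 24.0, 7.0, 38.0, 36.0]
temp = [5.0, 3.0, 8.0, 1.0, 6.0, 9.0, 2.0, 4.0]
scores, explained = principal_components([rain, flow, temp], 2)
```

The cumulative sum of `explained` is what supports statements such as "16 PCs explained 94.18% of the total variance"; the score series in `scores` are mutually uncorrelated and would serve as the ANN inputs.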

We applied PCA to identify the main PCs as inputs to the ANN model. Five meteorological stations with highest correlation coefficients were chosen. PCA was applied to the data of these five stations (namely Pole-shaloo, Shahr-kord, Sosan, Sad-Shohada, and Pataveh) in the following manner: Pole-shaloo-PCA, Shahr-kord-PCA, Sosan-PCA, Sad-Shohada-PCA, and Pataveh-PCA. Please note that Pole-shaloo is a location where a hydrometric station is installed on Karun River and a meteorological station is also available nearby. Basin-average-PCA was also studied separately. For SHDI1 forecasting, five ANN architectures were trained with the PCs of five stations as well as with the basin-average PCs.

2.6. Forward selection

Forward selection (FS) has been successfully used by many researchers in order to build robust prediction models (Chen et al., 2004; Eksioglu et al., 2005; Wang et al., 2006; Khan et al., 2007). FS approach is based on linear regression model in the first step that orders the explanatory variables according to their correlation with the dependent variable. Then, the explanatory variable, which is best correlated with the output variable, is selected as the first input. All remaining variables are then added according to their correlation with the output. The variable which most significantly increases the coefficient of determination (R2) is selected as the second input. This step is repeated N−1 times for evaluating the effect of each variable on the model output, where N is the number of input variable candidates. Finally, among N obtained subsets, the subset with optimum R2 is selected as the model input subset (Noori et al., 2010c).
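The procedure can be sketched in Python as a greedy search over ordinary least-squares fits. This is an illustrative reading of the description above, not the authors' implementation: the function names and data are hypothetical, and the normal equations are solved by Gaussian elimination only to keep the example free of external libraries.

```python
def r_squared(columns, y):
    """R2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[t] for col in columns] for t in range(n)]
    m = len(design[0])
    # Augmented normal equations (X'X | X'y)
    a = [[sum(row[i] * row[j] for row in design) for j in range(m)] +
         [sum(row[i] * yt for row, yt in zip(design, y))] for i in range(m)]
    for i in range(m):                                   # forward elimination
        pivot = max(range(i, m), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(i + 1, m):
            f = a[r][i] / a[i][i]
            for c in range(i, m + 1):
                a[r][c] -= f * a[i][c]
    beta = [0.0] * m
    for i in reversed(range(m)):                         # back substitution
        beta[i] = (a[i][m] - sum(a[i][j] * beta[j] for j in range(i + 1, m))) / a[i][i]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(candidates, y):
    """Add, at each step, the variable that raises R2 the most."""
    remaining = dict(candidates)
    chosen, history = [], []
    while remaining:
        best = max(remaining, key=lambda name: r_squared(
            [candidates[c] for c in chosen] + [remaining[name]], y))
        chosen.append(best)
        del remaining[best]
        history.append(r_squared([candidates[c] for c in chosen], y))
    return chosen, history


# Hypothetical candidate inputs and target
inputs = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0],
    "x3": [5.0, 3.0, 8.0, 1.0, 6.0, 9.0, 2.0, 4.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
order, r2_path = forward_selection(inputs, target)
```

The first variable chosen is the one best correlated with the output, as in the description. Because plain R2 cannot decrease when a variable is added to a linear regression, choosing the final subset from `r2_path` needs an additional rule, such as a minimum gain per step; that rule is left to the user here.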

2.7. Nonparametric tests

Nonparametric tests were conducted in order to compare the mean, standard deviation, and cumulative distribution function (CDF) of the observed and forecasted SHDI. Khan et al. (2006) used these statistics to compare different precipitation downscaling methods including ANN, while Modarres (2007) used a nonparametric method to evaluate drought time series forecasts with ARIMA. Modarres (2009) also used these statistics as a means of validating ANN in rainfall-runoff modelling.

Levene's test was used to test whether k samples have equal variances. Equal variance across samples is known as homogeneity of variance. Some statistical tests, e.g. the analysis of variance, assume that variances are equal across groups or samples. Levene's test may be applied to verify this assumption with the following null and alternative hypotheses (Khan et al., 2006):

\[ H_0: \sigma_1 = \sigma_2 = \dots = \sigma_k \tag{5} \]
\[ H_a: \sigma_i \neq \sigma_j \quad \text{for at least one pair } (i, j) \tag{6} \]

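In performing Levene's test, a variable X with sample size N is divided into k subgroups, where Ni is the sample size of the ith subgroup. The Levene test statistic is defined as follows:

\[ W = \frac{(N-k)\sum_{i=1}^{k} N_i\left(\bar{Z}_{i\cdot} - \bar{Z}_{\cdot\cdot}\right)^2}{(k-1)\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(Z_{ij} - \bar{Z}_{i\cdot}\right)^2} \tag{7} \]

where Zij is defined as:

\[ Z_{ij} = \left| X_{ij} - \bar{X}_{i\cdot} \right| \tag{8} \]

Here \(\bar{X}_{i\cdot}\) is the mean of the ith subgroup, \(\bar{Z}_{i\cdot}\) is the mean of the Zij within the ith subgroup, and \(\bar{Z}_{\cdot\cdot}\) is the overall mean of the Zij.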
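The Levene statistic can be computed with a few lines of Python. The sketch assumes the classical variant in which Zij is the absolute deviation from the subgroup mean (median-based variants also exist); the function name and the sample groups are hypothetical.

```python
def levene_statistic(groups):
    """Levene's W for k groups, using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        mean_g = sum(g) / len(g)
        z.append([abs(x - mean_g) for x in g])
    z_bar_i = [sum(zi) / len(zi) for zi in z]            # group means of Z
    z_bar = sum(sum(zi) for zi in z) / n_total           # overall mean of Z
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((val - zb) ** 2 for zi, zb in zip(z, z_bar_i) for val in zi)
    return (n_total - k) * between / ((k - 1) * within)


# Two hypothetical samples with different spread
w = levene_statistic([[1.0, 2.0, 3.0], [0.0, 2.0, 4.0]])
```

The statistic is compared with the upper critical value of the F distribution with k − 1 and N − k degrees of freedom; equal-spread groups that differ only in location give W = 0.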

The Wilcoxon matched-pairs, signed-ranks test is used to examine whether two related groups show a difference in central tendency. The null hypothesis is that the central tendencies of two related populations are not significantly different (McCuen, 2002). Thus, the test is a nonparametric alternative to the parametric paired-sample t-test for computing a hypothesis-test p-value.
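The rank sums behind the signed-ranks test can be sketched in Python as below. Zero differences are dropped and tied absolute differences receive average ranks, which is a common convention and an assumption here; the function name and the paired values are hypothetical. The p-value lookup is not included.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus): rank sums of positive and negative paired differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]      # zero differences are dropped
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        j = i
        while (j + 1 < len(ordered)
               and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1                       # average rank for a tied block
        for m in range(i, j + 1):
            ranks[ordered[m]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical observed and forecasted values
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.5, 1.0, 3.0, 6.0]
w_plus, w_minus = wilcoxon_signed_rank(observed, forecast)
```

The smaller of the two rank sums is the usual test statistic; similar values of `w_plus` and `w_minus` are consistent with no difference in central tendency.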

The Kolmogorov–Smirnov two-sample (KS2) test is used to test the null hypothesis that two independent samples are not different in distribution characteristics. The test can be used for either of the following sampling programs: (1) one population with two independently drawn samples subjected to different treatments, or (2) two samples drawn from different populations and then subjected to the same treatment. The test is sensitive to differences in any distributional characteristic: central tendency or location, dispersion or scale, and shape (McCuen, 2002). Details on nonparametric tests are discussed in McCuen (2002), Modarres (2009), Modarres (2007), and Khan et al. (2006).
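The KS2 statistic is the largest vertical distance between the two empirical CDFs, which the following Python sketch computes directly. The function name and samples are hypothetical, and the conversion of D to a p-value is not shown.

```python
def ks_two_sample(sample_a, sample_b):
    """Maximum vertical distance D between two empirical CDFs."""
    d = 0.0
    for x in list(sample_a) + list(sample_b):    # D is attained at an observed value
        cdf_a = sum(v <= x for v in sample_a) / len(sample_a)
        cdf_b = sum(v <= x for v in sample_b) / len(sample_b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical samples that overlap only partly
d_stat = ks_two_sample([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
```

D ranges from 0 for identical samples to 1 for samples that do not overlap at all.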

2.8. Uncertainty analysis

In order to determine the uncertainty in the SHDI forecast, the ANN modelling procedure was implemented in a Monte-Carlo framework as introduced by Marce et al. (2004). Monte-Carlo simulation involves repeated generation of random parameters from their probability distributions, and then computing the statistics of the output. We randomly resampled the input database without replacement 1000 times, maintaining the ratio between the calibration (training and validation) and test sets. Thus, 1000 SHDI series were generated. The 95% confidence interval of estimation is reported because it provides more information than other statistics about the range of predictions associated with the model (Noori et al., 2010c). The 95% confidence intervals are determined by finding the 2.5th and 97.5th percentiles of the constructed distribution (Noori et al., 2009).
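The resampling loop can be sketched in Python as follows. A least-squares line is used as a hypothetical stand-in for the ANN so that the example stays short; the calibration fraction, function names, and synthetic data are likewise assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line; a hypothetical stand-in for training the ANN."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def percentile(sorted_values, q):
    """Percentile q (0-100) by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def monte_carlo_bands(xs, ys, n_runs=1000, calib_fraction=0.9, seed=0):
    """Refit on random calibration subsets; return 95% bands for every time step."""
    rng = random.Random(seed)
    n = len(xs)
    n_calib = int(round(calib_fraction * n))          # calibration/test ratio kept fixed
    runs = []
    for _ in range(n_runs):
        idx = rng.sample(range(n), n_calib)           # sampling without replacement
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        runs.append([a + b * x for x in xs])          # one forecast series per run
    bands = []
    for t in range(n):
        column = sorted(run[t] for run in runs)
        bands.append((percentile(column, 2.5), percentile(column, 97.5)))
    return bands


# Synthetic data: a linear signal plus noise, for illustration only
noise = random.Random(1)
xs = [float(t) for t in range(40)]
ys = [0.5 * x + noise.gauss(0.0, 1.0) for x in xs]
bands = monte_carlo_bands(xs, ys, n_runs=200)
```

Each element of `bands` holds the 2.5th and 97.5th percentiles of the forecasts for one time step; counting the observations that fall outside these limits gives a simple coverage check.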

3. Results

Different distribution functions were fitted to the discharge series of Karun River. Among all functions, gamma showed the best fit. So 1-month SHDI, labelled here as SHDI1, was calculated by transforming gamma distribution function into a normal distribution. Figures 3 and 4 show the monthly discharge and SHDI1 time series, respectively.

Figure 3.

Monthly discharge time series at Pole-shaloo.

Figure 4.

SHDI1 time series at Pole-shaloo hydrometric station.

Figure 4 indicates that a sustained drought occurred from 1998 to 2001. This period was therefore chosen as the test period and the remaining time was assigned to training and validation of the ANN model. Of the 357 monthly values, 291 were assigned to training, 30 to validation, and 36 to the test stage. Precipitation and temperature input data of five meteorological stations were used based on the correlation between the discharge output and precipitation/temperature inputs among all the stations in the basin. Then the ACF was calculated to determine the lag of each input variable. SHDI1 with 15-month lag, discharge (Q) with 7-month lag, precipitation (P) with 3-month lag, and temperature (T) (minimum, average, and maximum) with 2-month lag were identified (altogether, 31 input variables). The PCA indicated that 16 PCs explained 94.18% of the total variance of the input variables.

3.1. SHDI forecast scenarios

As Table 1 indicates, there is no significant difference between the various types of ANN models. Nonetheless, Shahr-kord station PCs were better than others. Also, the results show that forecasts using point precipitation and temperature data were slightly more accurate than those using basin-average data. For further analysis, we trained the ANN with eight PCs obtained from Pole-shaloo and Shahr-kord meteorological station data. These eight PCs explained 81.3 and 80.1% of the total variance of Pole-shaloo and Shahr-kord input variables, respectively. While there were no significant differences between ANN architectures in the training phase, the differences were pronounced in the testing phase (Table 1).

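Table 1. Results of SHDI1 ANN training and test phases using PCs and FS

| Input data combination | Number of neurons in hidden layer | Number of inputs | Training R | Training RMSE | Training MAE | Testing R | Testing RMSE | Testing MAE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basin-average PCA | 4 | 16 | 0.68 | 0.7 | 0.51 | 0.74 | 0.59 | 0.42 |
| Basin-average FS | 6 | 3 | 0.59 | 0.77 | 0.56 | 0.73 | 0.72 | 0.55 |
| Pataveh PCA | 3 | 16 | 0.66 | 0.72 | 0.52 | 0.78 | 0.56 | 0.45 |
| Sosan PCA | 2 | 16 | 0.61 | 0.76 | 0.53 | 0.73 | 0.6 | 0.45 |
| Sad shohada PCA | 4 | 16 | 0.68 | 0.7 | 0.5 | 0.75 | 0.58 | 0.46 |
| Shahr-kord PCA | 6 | 16 | 0.69 | 0.69 | 0.48 | 0.82 | 0.53 | 0.41 |
| Shahr-kord FS | 6 | 7 | 0.67 | 0.71 | 0.51 | 0.77 | 0.65 | 0.5 |
| Pole-shaloo PCA | 5 | 16 | 0.72 | 0.66 | 0.48 | 0.79 | 0.56 | 0.4 |
| Pole-shaloo FS | 6 | 3 | 0.6 | 0.76 | 0.55 | 0.69 | 0.74 | 0.58 |
| Pole-shaloo PCA | 6 | 8 | 0.7 | 0.68 | 0.49 | 0.68 | 0.69 | 0.51 |
| Shahr-kord PCA | 3 | 8 | 0.61 | 0.75 | 0.54 | 0.71 | 0.75 | 0.67 |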

In the next step, suitable input variables were identified by FS for Pole-shaloo, Shahr-kord, and basin-average input data (Table 2). Then, the ANN was trained with these inputs. Results shown in Table 1 indicate that using PCs as inputs improves model accuracy. Figure 5 shows the observed and forecasted time series of SHDI1 in the test phase with Pole-shaloo and Shahr-kord inputs (16 PCs as input variables). Statistical criteria for the test phase of the SHDI1 forecast are summarized in Table 3. Results again confirmed that, overall, ANN models with Pole-shaloo and Shahr-kord input PCs perform best across the 13 statistical criteria. The bold entries in Tables 1 and 3 show the best value for each criterion among the different input combinations. This is also true for Table 5.

Table 2. Input variables based on forward selection for various ANN models of SHDI1

| Input selection scheme | Input data |
| --- | --- |
| Shahr-kord FS | SHDI1(t − 3), SHDI1(t − 6), SHDI1(t − 9), SHDI1(t − 12), SHDI1(t − 15), P(t − 2), P(t − 3) |
| Pole-shaloo FS | SHDI1(t − 3), SHDI1(t − 9), SHDI1(t − 15) |
| Basin-average FS | SHDI1(t − 2), SHDI1(t − 8), SHDI1(t − 14) |

Figure 5.

Observed and forecasted SHDI1 at Pole-shaloo hydrometric station.

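Table 3. Statistical criteria in test phase of SHDI1 forecast

| Input data combination | AME | PDIFF | MAE | ME | RMSE | R4MS4E | RAE | MARE | MRE | R2 | IoAd | CE | PI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basin-average PCA | 1.45 | 0.55 | 0.46 | −0.16 | 0.6 | 0.8 | 0.61 | 1.18 | 0.66 | 0.58 | 0.8 | 0.51 | 0.15 |
| Basin-average FS | 1.97 | 0.24 | 0.55 | −0.39 | 0.72 | 1.02 | 0.73 | 1 | 0.6 | 0.53 | 0.75 | 0.31 | −0.2 |
| Shahr-kord PCA | 1.58 | 0.51 | 0.41 | −0.2 | 0.53 | 0.74 | 0.55 | 0.99 | 0.41 | 0.68 | 0.87 | 0.62 | 0.33 |
| Pole-shaloo PCA | 1.82 | 0.26 | 0.4 | −0.08 | 0.56 | 0.83 | 0.54 | 1.47 | −0.13 | 0.62 | 0.88 | 0.58 | 0.27 |
| Sosan PCA | 1.97 | 0.24 | 0.43 | −0.18 | 0.59 | 0.88 | 0.57 | 1.39 | 0.28 | 0.57 | 0.84 | 0.53 | 0.18 |
| Sad Shohada PCA | 1.53 | 0.21 | 0.45 | −0.08 | 0.58 | 0.8 | 0.59 | 2.07 | 1.76 | 0.56 | 0.85 | 0.55 | 0.21 |
| Pataveh PCA | 1.72 | 0.49 | 0.45 | −0.16 | 0.56 | 0.79 | 0.6 | 2.94 | −0.98 | 0.61 | 0.86 | 0.58 | 0.26 |
| Pole-shaloo FS | 1.9 | 0.31 | 0.58 | −0.35 | 0.74 | 0.99 | 0.78 | 1.82 | 0.35 | 0.48 | 0.68 | 0.27 | −0.29 |
| Shahr-kord FS | 1.72 | −0.11 | 0.5 | −0.33 | 0.65 | 0.9 | 0.66 | 2.64 | 2.52 | 0.59 | 0.81 | 0.44 | 0.02 |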

3.2. Streamflow forecast scenario

At first, discharge data of seven previous months, temperature (minimum and average) data of three previous months, and precipitation data of three previous months for Pole-shaloo and Shahr-kord meteorological stations as well as for the basin-average were prepared. Then, ANN was trained to simulate the streamflow with PCA and FS methods applied on the input data. The PCA analysis indicated that eight PCs explained 94.8, 95.1, and 94.6% of the total variance of input variables for Pole-shaloo, Shahr-kord and basin-average, respectively. On the basis of the FS analysis, for Pole-shaloo, Q (t − 1), Q (t − 2), Q (t − 4), Q (t − 7), and Tmean (t − 2), for Shahr-kord, Q (t − 1), Q (t − 7), Tmean (t − 1), Tmin (t − 2), Tmin (t − 3), and P (t − 3), and for basin-average, Q (t − 2), Q (t − 7), Tmean (t − 3), and Tmin (t − 3) were identified as the most influential input variables. ANN was trained to model the streamflow with these input variables. Results in Table 4 indicate that forecasts using point precipitation and temperature data were more accurate than using basin-average. Furthermore, using PCA and FS considerably improved the results and the FS was superior to the PCA in input selection. Then forecasted values were converted to the SHDI. Results, shown in Table 5, indicate that forecasting SHDI is considerably more accurate than forecasting the streamflow first and then converting to the SHDI. All statistical indices are better in the first scenario involving direct SHDI forecast.

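Table 4. Statistical criteria in test phase of streamflow forecast

| Input data combination | AME (m3 s−1) | PDIFF (m3 s−1) | MAE (m3 s−1) | ME (m3 s−1) | RMSE (m3 s−1) | R4MS4E (m3 s−1) | RAE | MARE | MRE | R2 | IoAd | CE | PI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basin-average FS | 5.75 | −3.6 | 1.89 | −0.74 | 2.52 | 3.3 | 2.52 | 7.08 | −2.52 | 0.002 | 0.29 | −7.5 | −14 |
| Shahr-kord | 5.33 | −3.18 | 1.55 | −0.8 | 2.15 | 2.94 | 2.07 | 2.84 | 0.55 | 1E−04 | 0.34 | −5.2 | −9.8 |
| Pole-shaloo | 1.93 | −0.92 | 0.78 | −0.08 | 0.94 | 1.17 | 1.03 | 5.26 | −3.56 | 0.15 | 0.63 | −0.2 | −1.1 |
| Pole-shaloo FS | 2.39 | −0.03 | 0.64 | 0.32 | 0.83 | 1.15 | 0.85 | 2.23 | −1.35 | 0.4 | 0.77 | 0.07 | −0.6 |
| Shahr-kord FS | 5.24 | −3.09 | 1.66 | −0.37 | 2.15 | 2.9 | 2.2 | 6.02 | −2.66 | 0.005 | 0.37 | −5.2 | −9.8 |
| Shahr-kord PCA | 1.71 | −0.11 | 0.6 | −0.12 | 0.76 | 1 | 0.79 | 3.83 | −0.5 | 0.34 | 0.75 | 0.23 | −0.4 |
| Pole-shaloo PCA | 2.36 | −0.26 | 0.98 | 0.24 | 1.12 | 1.36 | 1.3 | 5.85 | 0.24 | 0.13 | 0.61 | −0.7 | −1.9 |
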
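Table 5. Statistical criteria for SHDI1 obtained by conversion of forecasted streamflow

| Input data combination | AME | PDIFF | MAE | ME | RMSE | R4MS4E | RAE | MARE | MRE | R2 | IoAd | CE | PI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basin-average FS | 379.86 | 206.1 | 66.8 | −14.4 | 109.9 | 182.2 | 0.56 | 0.31 | −0.2 | 0.59 | 0.86 | 0.58 | 0.45 |
| Shahr-kord | 286.85 | 120 | 79.75 | −60.4 | 103.3 | 142.29 | 0.67 | 0.54 | −0.51 | 0.76 | 0.9 | 0.63 | 0.52 |
| Pole-shaloo | 325.56 | 86.84 | 72.19 | −27.6 | 103.9 | 162.54 | 0.6 | 0.39 | −0.28 | 0.67 | 0.89 | 0.63 | 0.51 |
| Pole-shaloo FS | 347.52 | 106.7 | 58.88 | −8.39 | 105.1 | 169.57 | 0.49 | 0.25 | −0.05 | 0.66 | 0.89 | 0.62 | 0.5 |
| Shahr-kord FS | 258.88 | 170.6 | 51.39 | −9.72 | 86.12 | 137.76 | 0.43 | 0.25 | −0.04 | 0.75 | 0.93 | 0.74 | 0.66 |
| Shahr-kord PCA | 248.11 | 97.6 | 73.74 | −37.7 | 96.32 | 135.51 | 0.62 | 0.42 | −0.36 | 0.73 | 0.89 | 0.68 | 0.58 |
| Pole-shaloo PCA | 349.78 | 79.74 | 93.37 | −7.66 | 131.5 | 187.84 | 0.78 | 0.49 | −0.05 | 0.51 | 0.83 | 0.4 | 0.21 |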

For evaluation of model robustness, we used the box-plot and probability plot of the observed and forecasted SHDI. These two plots are useful for visual comparison of the upper or lower tail of the distribution of the observed and simulated SHDI. The box-plots shown in Figure 6 imply that Pole-shaloo-PCA-ANN and Shahr-kord-FS-ANN closely match the observed SHDI1, although they could not reproduce the very low SHDIs. The probability plots of the observed and forecasted Pole-shaloo-PCA-ANN and Shahr-kord-FS-ANN SHDI1s were fitted by the method of Blom (1958). The probability plots of the observed SHDI and the Pole-shaloo-PCA-ANN and Shahr-kord-FS-ANN SHDI1s for a normal distribution are shown in Figure 7. It is seen that the ANN is not able to reproduce the probability of the observed SHDI; significant differences exist in the lower and median quartiles of the SHDI distribution.

Figure 6.

Comparison of observed and ANN simulated SHDI1 box plots.

Figure 7.

Normal cumulative probability plot of SHDI1 for (a) observed discharge (b) PCA-ANN-Shahr-kord and (c) PCA-ANN-Pole-shaloo.

Statistical characteristics of the observed and forecasted SHDI are shown in Table 6. Results indicate that Pole-shaloo meteorological data improved the ANN performance in reproducing all the characteristics. For better judgment, nonparametric tests were applied to the forecasted values (Table 7). Results of the nonparametric tests indicate that the ANN had acceptable performance in forecasting the SHDI in the test phase. Also, inclusion of Pole-shaloo meteorological data improved the ANN performance. Table 7 indicates that the ANN was capable of reproducing the standard deviation, mean, and probability distribution of observed values using Pole-shaloo meteorological data, while it was not capable of reproducing the mean of observed data based on Shahr-kord meteorological data.

Table 6. Statistical characteristics of observed and forecasted SHDI1

| Basic statistics | Observed SHDI1 | ANN-PCA-Pole-shaloo | ANN-PCA-Shahr-kord |
| --- | --- | --- | --- |
| Mean | −1.09 | −1.00 | −0.89 |
| Median | −0.95 | −0.80 | −0.72 |
| Standard deviation | 0.88 | 0.83 | 0.66 |
| Variance | 0.77 | 0.69 | 0.43 |
| Minimum | −2.88 | −2.51 | −2.06 |
| Maximum | 0.55 | 0.29 | 0.04 |

Table 7. Nonparametric tests results for SHDI1 forecast

| Nonparametric test | ANN-PCA-Pole-shaloo | ANN-PCA-Shahr-kord |
| --- | --- | --- |
| Wilcoxon rank sum | 0.194 | 0.01 |
| Levene's test | 0.66 | 0.0539 |
| Kolmogorov–Smirnov | 0.64 | 0.28 |

For selecting the best forecasted model, a statistic proposed by Noori et al. (2009) was also used. Statistical criteria such as R2 and RMSE are based on the average error in model simulation and they do not give any information about the error distribution. Discrepancy ratio (DR) proposed by White et al. (1973) is an appropriate statistic:

\[ \mathrm{DR} = \log\left(\frac{\text{predicted value}}{\text{measured value}}\right) \tag{9} \]

According to Equation (9), if the DR is zero, the predicted value is identical to the measured value. If the DR is larger (or smaller) than zero, the predicted value is overestimated (underestimated). DR is commonly used as an error measure in the literature (Seo and Cheong, 1998; Kashefipour and Falconer, 2002; Tayfur and Singh, 2005). However, DR cannot be used for negative and zero observed values. To remedy this problem, Noori et al. (2009) proposed the developed discrepancy ratio (DDR) statistic defined as:

\[ \mathrm{DDR} = \frac{\text{predicted value}}{\text{observed value}} - 1 \tag{10} \]

For better judgment and visualization, the Gaussian function of the DDR values can be calculated and illustrated in a standard normal distribution format. DDR values must first be standardized; the normalized values of DDR are then calculated using the Gaussian function. Results are shown in Figure 8. Results indicate that using Shahr-kord station data improves the forecast, with less error in the forecasted values.
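The two ratios and the standardization step can be sketched in Python as follows. The sketch assumes a base-10 logarithm for DR and the form DDR = predicted/observed − 1; both forms, the function names, and the sample values are assumptions made for illustration.

```python
import math
import statistics


def discrepancy_ratio(predicted, measured):
    """DR = log10(predicted / measured); needs non-zero values of the same sign."""
    return math.log10(predicted / measured)


def developed_discrepancy_ratio(predicted, observed):
    """DDR = predicted / observed - 1 (assumed form); observed must be non-zero."""
    return predicted / observed - 1.0


def gaussian_of_standardized(values):
    """Standardize values and evaluate the standard normal density at each one."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    z = [(v - m) / s for v in values]
    return [(zi, math.exp(-0.5 * zi * zi) / math.sqrt(2.0 * math.pi)) for zi in z]


# Hypothetical observed and forecasted SHDI values
observed = [-1.0, -2.0, 0.5, -0.5]
forecast = [-0.8, -1.5, 0.6, -0.7]
ddr = [developed_discrepancy_ratio(f, o) for f, o in zip(forecast, observed)]
curve = gaussian_of_standardized(ddr)
```

Plotting the pairs in `curve` gives a bell-shaped graph of the kind described for Figure 8; a narrower spread of the raw DDR values around zero indicates smaller forecast errors.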

Figure 8.

Standardized normal distribution graph of the DDR values for the PCA-ANN-Pole-shaloo and PCA-ANN-Shahr-kord in testing phase of SHDI forecasting.

3.3. Uncertainty analysis

Besides statistical measures, a practical way of quantifying the accuracy of the forecast is by estimating the confidence interval of prediction, i.e. an interval that contains the observed value with a fixed probability α (e.g. 95%). The wider the interval, the smaller is the accuracy of the forecast and vice versa. After forecasting the SHDI1 time series, the Monte-Carlo uncertainty analysis was conducted. Figure 9 shows the 95% confidence intervals for the SHDI1 time series forecast.

Figure 9.

Confidence intervals for the SHDI1 forecast based on Shahr-kord meteorological input data.

It is seen that all forecasted values lie within the confidence intervals. So it may be inferred that ANN appropriately predicted the SHDI values. However, not all observations lie in the confidence intervals; some 8 of 36 values are out of the intervals. According to Figure 9, the upper bound was predicted properly and none of the observations lie above the upper bound. On the contrary, the lower bound could not contain some of the observed SHDI values and the model has thus overestimated the extreme low values.

4. Conclusions

Hydrological drought forecast is essential for water resources managers. Most hydrological drought indices are difficult to compute or subjective in parameter determination. For this reason, an index similar to the well-known SPI meteorological drought index was proposed in this paper. The index, abbreviated as SHDI, suits quantification of hydrological drought and is relatively easy to determine. Moreover, an ANN framework was trained to forecast the SHDI. The PCA and FS procedures and the ACF analysis were applied to identify effective input variables.

Comparison of various input combinations, including point and basin-average precipitation and temperature input data, indicated that the use of point data is sufficiently accurate. PCA data reduction improved the SHDI forecasts and FS input selection improved the direct streamflow forecasts over other cases. For better judgment, several statistical criteria were also computed. Statistical criteria alone were found to be insufficient for model evaluation because they do not describe the error distribution. Thus, graphical and nonparametric tests were used to evaluate the accuracy of forecasts. One of the most important findings was that forecasting the SHDI drought index directly was considerably more accurate than forecasting streamflow and then converting the values to the SHDI.

Furthermore, Monte-Carlo uncertainty analysis was performed on the forecasted results. It was concluded that the ANN could properly forecast the hydrological wet periods while a few of the observed extreme low droughts fell outside of the lower uncertainty band.

Appendix: A. Abbreviations and formulations of global criteria used in this study

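\(Q_i\): observed value, \(\hat{Q}_i\): modelled value, \(\bar{Q}\): mean of the observed data, \(\tilde{Q}\): mean of the modelled values, \(n\): number of values.

Absolute maximum error (AME):

\[ \mathrm{AME} = \max_i \left| Q_i - \hat{Q}_i \right| \tag{A1} \]

Peak difference (PDIFF):

\[ \mathrm{PDIFF} = \max_i (Q_i) - \max_i (\hat{Q}_i) \tag{A2} \]

Mean absolute error (MAE):

\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| Q_i - \hat{Q}_i \right| \tag{A3} \]

Mean error (ME):

\[ \mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n} \left( Q_i - \hat{Q}_i \right) \tag{A4} \]

RMSE:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( Q_i - \hat{Q}_i \right)^2} \tag{A5} \]

Fourth root mean quadrupled error (R4MS4E):

\[ \mathrm{R4MS4E} = \sqrt[4]{\frac{1}{n}\sum_{i=1}^{n} \left( Q_i - \hat{Q}_i \right)^4} \tag{A6} \]

Relative absolute error (RAE):

\[ \mathrm{RAE} = \frac{\sum_{i=1}^{n} \left| Q_i - \hat{Q}_i \right|}{\sum_{i=1}^{n} \left| Q_i - \bar{Q} \right|} \tag{A7} \]

Mean absolute relative error (MARE):

\[ \mathrm{MARE} = \frac{1}{n}\sum_{i=1}^{n} \frac{\left| Q_i - \hat{Q}_i \right|}{Q_i} \tag{A8} \]

Mean relative error (MRE):

\[ \mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n} \frac{Q_i - \hat{Q}_i}{Q_i} \tag{A9} \]

Coefficient of determination (R2):

\[ R^2 = \left[ \frac{\sum_{i=1}^{n} (Q_i - \bar{Q})(\hat{Q}_i - \tilde{Q})}{\sqrt{\sum_{i=1}^{n} (Q_i - \bar{Q})^2 \sum_{i=1}^{n} (\hat{Q}_i - \tilde{Q})^2}} \right]^2 \tag{A10} \]

Coefficient of efficiency (CE):

\[ \mathrm{CE} = 1 - \frac{\sum_{i=1}^{n} (Q_i - \hat{Q}_i)^2}{\sum_{i=1}^{n} (Q_i - \bar{Q})^2} \tag{A11} \]

Index of agreement (IoAd):

\[ \mathrm{IoAd} = 1 - \frac{\sum_{i=1}^{n} (Q_i - \hat{Q}_i)^2}{\sum_{i=1}^{n} \left( \left| \hat{Q}_i - \bar{Q} \right| + \left| Q_i - \bar{Q} \right| \right)^2} \tag{A12} \]

Coefficient of persistence (PI):

\[ \mathrm{PI} = 1 - \frac{\sum_{i=1}^{n} (Q_i - \hat{Q}_i)^2}{\sum_{i=1}^{n} (Q_i - Q_{i-1})^2} \tag{A13} \]

where \(Q_{i-1}\) is the observed value at the previous time step.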
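Several of these criteria can be computed with the short Python sketch below. It assumes the usual definitions with errors taken as observed minus modelled, covers only a subset of the 13 criteria, and uses hypothetical names and values; it is not a substitute for the HydroTest toolbox.

```python
import math


def evaluation_criteria(obs, mod):
    """Compute a subset of the global criteria for observed and modelled series."""
    n = len(obs)
    err = [o - m for o, m in zip(obs, mod)]          # error = observed - modelled
    mean_obs = sum(obs) / n
    sq_err = sum(e * e for e in err)
    return {
        "AME": max(abs(e) for e in err),
        "PDIFF": max(obs) - max(mod),
        "MAE": sum(abs(e) for e in err) / n,
        "ME": sum(err) / n,
        "RMSE": math.sqrt(sq_err / n),
        "R4MS4E": (sum(e ** 4 for e in err) / n) ** 0.25,
        "CE": 1.0 - sq_err / sum((o - mean_obs) ** 2 for o in obs),
    }


# Hypothetical observed and modelled values
stats = evaluation_criteria([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 2.0])
```

In this example a single large error dominates: RMSE is 1.0 while R4MS4E is about 1.41, which illustrates how the fourth-power criterion weights large errors more heavily.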
