Corresponding author: S. Sadri, Department of Civil and Environmental Engineering, Princeton University, Engineering Quad, Princeton, NJ 08540, USA. (firstname.lastname@example.org)
 The objective in frequency analysis is, given extreme events such as drought severity or duration, to estimate the relationship between that event and the associated return periods at a catchment. Neural networks and other artificial intelligence approaches in function estimation and regression analysis are relatively new techniques in engineering, providing an attractive alternative to traditional statistical models. There are, however, few applications of neural networks and support vector machines in the area of severity quantile estimation for drought frequency analysis. In this paper, we compare three methods for this task: multiple linear regression, radial basis function neural networks, and least squares support vector regression (LS-SVR). The area selected for this study includes 32 catchments in the Canadian Prairies. From each catchment drought severities are extracted and fitted to a Pearson type III distribution, which act as observed values. For each method-duration pair, we use a jackknife algorithm to produce estimated values at each site. The results from these three approaches are compared and analyzed, and it is found that LS-SVR provides the best quantile estimates and extrapolating capacity.
 Reliable at-site estimation of drought quantiles from a small number of observations is important for many engineering projects such as emergency planning, reservoir management, pollution control, and risk calculations. Traditionally, this type of drought quantile estimation is done with parametric methods. In order to do this, one must assume that extreme events are drawn according to a probability density function from some known parameterized family. In practice, there are important problems associated with this assumption. For example, it is clear that drought processes are governed by complex interactions between atmosphere, surface water, soil, vegetation, and groundwater; this suggests it is unlikely that any known parameterized distribution is a true explanation of the drought quantiles.
 Researchers have applied several different nonparametric approaches for quantile estimation in hydrology. Nonparametric approaches build a more complicated regression between input and output data. For example, Apipattanavis et al.  presented a nonparametric approach based on local polynomial regression for estimating a flood quantile function. They applied the method to synthetic data from mixtures of conventional distributions and to flood records that exhibited mixed population characteristics from the Gila River basin of Arizona. Moon and Lall  used a kernel estimator to predict flood quantiles at a gauged site. During the past decades, applications of machine learning theory and neural networks have emerged in function estimation and regression analysis. The support vector machines (SVMs) of Vapnik , based on kernel distributions and supervised learning, are recognized as one of the most powerful tools in machine learning and risk minimization. Shu and Ouarda  used artificial neural networks (ANN) for flood quantile estimation at ungauged sites.
 This paper explores the functionality of nonparametric approaches in drought quantile estimation at ungauged sites. We apply radial basis function (RBF) neural networks and least squares support vector machine regression (SVR) as nonparametric methods and compare the results from these two to the traditional regression model, the most commonly used method for quantile estimation at ungauged sites. Linear regression has been widely applied for this purpose, but it has disadvantages such as fitting the observed data poorly, or deviating from the tails of skewed data [Haghighatjou et al., 2008]. Both RBF and least squares SVR have rarely been used in extreme quantile estimation and, to the authors' knowledge, least squares SVR has never been used in drought quantile estimation. Studies relevant to RBF are those of Shu and Burn, Dawson et al., Reed and Robson, and Ahmad and Simonovic. Examples of research using SVR are given by Tripathi et al., Ghosh, Lin et al., and Asefa et al.
 We use a jackknife experiment to train and test the data for all three methods: least squares SVR, RBF, and multiple linear regression (MLR). In the end, we compare the methods and their effectiveness for quantile estimation at ungauged sites.
2. Multiple Linear Regression With Affine Transformation
 Multiple linear regression is one of the most common approaches used for quantile estimation especially in case of flood quantile estimation [e.g., Shu and Ouarda, 2008]. We review it since it is one of the methods we evaluate and also since it is a subroutine in the neural network approach of section 3.
 Throughout the paper, we will assume we have N sites, each with D characteristics. In linear regression, the assumption is that there exists a near-linear relationship between the N × D input matrix X and the N-dimensional target vector Y. (In our study we use a multiple linear regression with affine transformation to estimate drought quantiles as a function of site characteristics, so Yn will estimate some fixed quantile of drought severity at each site n.) Linear regression has parameters $\beta_0, \beta_1, \ldots, \beta_D$ and models the relationship as
$$Y_n \approx \beta_0 + \sum_{d=1}^{D} \beta_d X_{nd}, \qquad n = 1, \ldots, N,$$
where Yn is the nth target value and Xnd is the dth characteristic value for the nth site.
 Linear regression models are often fitted using the least squares approach. The best fit in the least squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by a model. That is to say, we want to choose the β parameters to
$$\text{minimize} \quad \sum_{n=1}^{N} \Bigl( Y_n - \beta_0 - \sum_{d=1}^{D} \beta_d X_{nd} \Bigr)^2 .$$
It is well known that this optimization problem is easy to solve, with the optimal β given by
$$\beta = (\tilde{X}^{T} \tilde{X})^{-1} \tilde{X}^{T} Y,$$
where $\tilde{X}$ is the N × (D + 1) matrix obtained by adding a column of 1s to the left-hand side of X, and β is the column vector $(\beta_0, \beta_1, \ldots, \beta_D)^{T}$.
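The least squares fit described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `fit_mlr` and `predict_mlr` and the demo data are hypothetical, and `numpy.linalg.lstsq` is used in place of the explicit matrix inverse because it solves the same problem more stably.

```python
import numpy as np

def fit_mlr(X, Y):
    """Least squares fit of Y ~ beta_0 + sum_d beta_d * X[:, d]."""
    # Prepend the all-ones column so beta[0] plays the role of the intercept.
    X_tilde = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes the sum of squared residuals; for a full-rank X_tilde
    # the result equals (X~^T X~)^{-1} X~^T Y.
    beta, *_ = np.linalg.lstsq(X_tilde, Y, rcond=None)
    return beta

def predict_mlr(beta, X):
    """Evaluate the fitted affine model at the rows of X."""
    return beta[0] + X @ beta[1:]

# Made-up data (assumption): 6 sites, 2 characteristics, exactly affine targets.
X_demo = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5],
                   [4.0, 3.0], [5.0, 2.5], [6.0, 1.0]])
Y_demo = 4.0 + 1.5 * X_demo[:, 0] - 2.0 * X_demo[:, 1]
beta_demo = fit_mlr(X_demo, Y_demo)
```

Because the demo targets are exactly affine in the inputs, the recovered coefficients match the ones used to generate them up to rounding.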
 Important examples of linear regression in hydrology include the following. Zrinji and Burn  and Hosking and Wallis  suggested a linear regression approach to relate a quantile of interest to a vector of catchment's physiographic, climatic and geomorphologic characteristics. Apipattanavis et al.  presented a nonparametric approach based on local polynomial regression, a form of linear regression, for estimating a flood quantile function based on synthetic and historical data. Lefebvre  found that the conditional expectation for the future flow value is an affine function of the current flow value, thus justifying the use of linear regression to predict future flow values. Bako et al. used a recursive procedure for the identification of linear models from input-output data. It was also observed that by appropriately choosing the data assignment criterion, the proposed online method could be extended to deal also with the identification of affine models.
3. Artificial Neural Network
 An RBF NN is an artificial neural network that uses radial basis functions (in this study, Gaussian functions) as activation functions. RBF networks are used in function approximation, time series prediction, and control. The NN builds a linear combination of radial basis functions. The basic architecture of an RBF network is an input layer, a single hidden layer with radial activation functions, and an output layer. Figure 1 shows a schematic representation of an RBF network.
 ANN have been used in the domain of regional flood frequency analysis by Shu and Burn  for index flood and flood quantile estimation. Dawson et al.  applied ANN for flood quantile and index flood estimation for 870 sites in the UK and compared the results with those obtained by the Reed and Robson  models. Ahmad and Simonovic applied ANN for predicting peak flow, timing, and runoff hydrographs of the Red River in Manitoba using five meteorological characteristics as inputs. Application of RBFs became popular in the mid-1980s due to their higher interpolation capability in a high-dimensional space [Ghodsi and Schuurmans, 2003]. To our knowledge, RBFs have not yet been applied to drought quantile estimation.
 Let M be the number of neurons we desire in our neural network, using radial basis functions. (There are other common forms of basis functions such as triangular and univariate.) The training will select, for each neuron $m = 1, \ldots, M$, an axis-scaled Gaussian distribution parameterized by center $\mu_m \in R^D$ and width $\nu_m \in R^D$ (standard deviation along each axial direction). These define radial basis functions $\phi_m$ from $R^D$ to R,
$$\phi_m(x) = \exp\Bigl( -\sum_{d=1}^{D} \frac{(x_d - \mu_{md})^2}{2 \nu_{md}^2} \Bigr).$$
 The first step in the implementation of the algorithm is the selection of these parameters μ and ν. In this study, the center and the width parameters of the Gaussian mixture components are estimated via a numerical local search expectation-maximization (EM) algorithm, whose goal is to find the mixture of M Gaussians most likely to have generated all of the data points $X_1, \ldots, X_N$. At this point we are not using any of the target output values; also, we discard the probabilities generated by the EM algorithm and just keep the parameters defining the M Gaussians.
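The unsupervised first half can be illustrated with a small EM loop for a mixture of axis-aligned Gaussians. This is a sketch under stated assumptions rather than the implementation used in the study: the function name `em_diagonal_gmm`, the random initialization from data points, the fixed iteration count, and the variance floor are all choices made here for illustration.

```python
import numpy as np

def em_diagonal_gmm(X, M, n_iter=100, seed=0):
    """Fit M axis-aligned Gaussians to the rows of X; return (mu, nu)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Assumed initialization: centers at random data points, widths at the global spread.
    mu = X[rng.choice(N, size=M, replace=False)].astype(float)
    nu = np.tile(X.std(axis=0) + 1e-6, (M, 1))
    pi = np.full(M, 1.0 / M)  # mixing weights, discarded at the end
    for _ in range(n_iter):
        # E-step: log density of every point under every component
        # (the constant -D/2 * log(2*pi) is the same for all components and is dropped).
        z = (X[:, None, :] - mu[None, :, :]) / nu[None, :, :]
        log_p = (np.log(pi)[None, :]
                 - 0.5 * np.sum(z ** 2, axis=2)
                 - np.sum(np.log(nu), axis=1)[None, :])
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means and per-axis standard deviations.
        Nk = resp.sum(axis=0) + 1e-12
        pi = Nk / N
        mu = (resp.T @ X) / Nk[:, None]
        var = (resp.T @ X ** 2) / Nk[:, None] - mu ** 2
        nu = np.sqrt(np.maximum(var, 1e-6))  # floor keeps the widths strictly positive
    # Only the centers and widths are kept, as in the text; the probabilities are dropped.
    return mu, nu

# Made-up data (assumption): two well-separated clusters in two dimensions.
rng_demo = np.random.default_rng(1)
X_demo = np.vstack([rng_demo.normal(0.0, 0.5, size=(40, 2)),
                    rng_demo.normal(10.0, 0.5, size=(40, 2))])
mu_demo, nu_demo = em_diagonal_gmm(X_demo, M=2)
```

EM is a local search, so a different seed can give a different mixture on less clearly clustered data, which is the run-to-run variability discussed in section 3.1.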
 For the second half of the NN approach, μ and ν will remain fixed. The M Gaussians describe M coordinates $\phi_1(X_n), \ldots, \phi_M(X_n)$ for each site n. For succinctness write Φ, an N × M matrix with entries $\Phi_{nm} = \phi_m(X_n)$, for the collection of these coordinates. Then the second half consists of doing a linear least squares fit from Φ to Y, where $Y = (Y_1, \ldots, Y_N)^{T}$ and $Y_n$ is the target characteristic for site n. In other words, we will select an M × 1 matrix w and a scalar b to have a minimum least squares fit between Φw + b and Y. So for a new site with characteristics x, the model estimates that its output value is $(\phi_1(x), \ldots, \phi_M(x))\, w + b$, or in other words
$$\hat{y}(x) = b + \sum_{m=1}^{M} w_m \phi_m(x).$$
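The second half, with the centers and widths held fixed, reduces to an ordinary least squares problem on the basis-function outputs. The sketch below assumes the Gaussian form with per-axis widths described in this section; the names `rbf_features`, `fit_output_layer`, and `predict_rbf`, the symbols `w` and `b`, and the demo values are hypothetical.

```python
import numpy as np

def rbf_features(X, mu, nu):
    """Phi[n, m] = exp(-sum_d (X[n, d] - mu[m, d])**2 / (2 * nu[m, d]**2))."""
    z = (X[:, None, :] - mu[None, :, :]) / nu[None, :, :]
    return np.exp(-0.5 * np.sum(z ** 2, axis=2))

def fit_output_layer(Phi, Y):
    """Least squares fit of Y ~ b + Phi @ w; returns (b, w)."""
    A = np.column_stack([np.ones(len(Phi)), Phi])  # ones column carries the scalar b
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef[0], coef[1:]

def predict_rbf(X_new, mu, nu, b, w):
    """Model output b + sum_m w_m * phi_m(x) for each row x of X_new."""
    return b + rbf_features(X_new, mu, nu) @ w

# Made-up centers, widths, and sites (assumptions for illustration only).
mu_demo = np.array([[0.0, 0.0], [3.0, 3.0]])
nu_demo = np.ones((2, 2))
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0],
                   [3.0, 3.0], [2.0, 3.0], [4.0, 2.5]])
Phi_demo = rbf_features(X_demo, mu_demo, nu_demo)
Y_demo = 2.0 + 3.0 * Phi_demo[:, 0] - 1.0 * Phi_demo[:, 1]
b_demo, w_demo = fit_output_layer(Phi_demo, Y_demo)
```

Since the demo targets were generated from the same basis functions, the fit recovers the generating bias and weights.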
3.1. Overfitting and Underfitting
 One of the critical issues in using RBF networks is selecting a number M of neurons/basis functions that show good performance on both training and testing data. A set of N training data vectors can be modeled exactly with N RBFs. Although such a model follows the training data perfectly, the model is not likely to be a true representation of unseen testing data. For an optimal training performance of the network, the hidden layer nodes should be optimized [Karray and De Silva, 2004]. The optimal number M of basis functions is usually much less than the number, N, of data points.
 Generally, the mean-square error of training decreases as the number of neurons increases but if we add too many neurons the testing mean-square error starts to increase due to overfitting. The number of basis functions should be chosen so that the mean-square errors (MSEs) of both training and testing errors are close to zero but also close to each other [Ghodsi and Schuurmans, 2003]. In practice, we repeat the RBF algorithm, described above, for a selected range of different values of M (the number of neurons).
 Moreover, one of the qualities of RBF is that the unsupervised learning part of the algorithm (the first half) is not designed to find the global optimum, so every time it is run a different output can be obtained. We ran the test several times with the same training data and averaged the error of the predictions, in order to get a more reliable estimate of the error level for each value of M. Then, we selected the best one and trained one more network with that value of M for use in our model.
4. Least Squares Support Vector Regression
 In support vector regression, the function that we fit to the data points is a linear combination of “kernels” centered at each input point. SVR is based on support vector machines, classifiers introduced by Vapnik : given some input points and their classifications into discrete groups, they predict classifications for further sample points. Both SVM and SVR seek an optimal hyperplane in the feature space; the main difference is that in SVM we try to put each example point on the correct “side” of the hyperplane and maximize the separation, whereas in SVR we try to keep the points close to the hyperplane, minimizing the separation. Figure 2 shows an SVM classification. Least squares SVM and SVR methods are modifications of Vapnik's original approach by Suykens et al. . We used the LS-SVMlab software package [Suykens et al., 2011] as the basis of our implementation.
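 For the LS-SVR method, we define for each site $m = 1, \ldots, N$ a basis kernel function $\phi_m : R^D \to R$. Choosing radial basis functions means this function is Gaussian with center Xm, so
$$\phi_m(x) = \exp\Bigl( -\frac{\lVert x - X_m \rVert^2}{2 \sigma^2} \Bigr),$$
where the standard deviation parameter σ must be fixed prior to performing LS-SVR.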
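A Gaussian kernel centered at each site can be evaluated for all pairs of points at once. The sketch below assumes the exponent is the squared Euclidean distance divided by 2σ², which follows from treating σ as a standard deviation; software packages may scale the exponent differently, and the name `gaussian_kernel_matrix` is hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(X_eval, X_centers, sigma):
    """K[n, m] = exp(-||X_eval[n] - X_centers[m]||^2 / (2 * sigma^2))."""
    diff = X_eval[:, None, :] - X_centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Made-up sites (assumption): three points in two dimensions.
X_demo = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
K_demo = gaussian_kernel_matrix(X_demo, X_demo, sigma=5.0)
```

When the evaluation points and the centers coincide, the matrix is symmetric with ones on the diagonal, since each site is at distance zero from its own kernel center.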
 Recall that M was a fixed parameter much less than N in the neural network approach (section 3). For LS-SVR, the mathematical program that underlies the fitting will actually set the coefficients for most of the basis functions to zero. So implicitly, we will also have M much less than N in the LS-SVR approach. Least squares SVR performs linear regression in the high-dimensional feature space and, at the same time, tries to reduce model complexity by minimizing the norm of the linear map [Vapnik, 2006]. The mathematical program used to compute LS-SVR includes a parameter c, which trades off between model complexity and fitting the training data; we explain it in section 4.2.
4.1. Previous Applications
 SVMs have been used in hydrology to solve several different problems, although not specifically in hydrologic quantile estimation. Research conducted using SVM in hydrology includes the work of Tripathi et al., who proposed a least squares SVR approach for statistical downscaling of precipitation as the output of the General Circulation Model (GCM) at the monthly time scale in India. For this model, extreme rainfalls were extracted and used as input and the contemporaneous observed precipitation was used as output. The results showed the success of SVM over other downscaling techniques such as multilayer back-propagation artificial neural networks. Ghosh used a least squares SVR to estimate the parameters for statistical downscaling to compute the rainfall scenarios for multiple GCMs. Lin et al. presented least squares SVR as a promising method for hydrological prediction and used least squares SVR for time and site specific forecasts. Asefa et al. applied least squares SVR for forecasting of seasonal streamflow volumes and hourly stream flows. Overall, the literature suggests that, compared with other models such as autoregressive moving average (ARMA) and ANN, least squares SVR has been shown to be stronger for prediction of long-term river flow discharges.
4.2. Computing Least Squares Support Vector Regression
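 Recall that we have fixed the standard deviation parameter σ, so that the N × N matrix Φ with entries $\Phi_{nm} = \phi_m(X_n)$ is determined. Then LS-SVR postulates a linear explanation of Y in terms of Φ,
$$Y_n \approx \sum_{m=1}^{N} w_m \phi_m(X_n), \qquad n = 1, \ldots, N.$$
However, it also postulates that the function on the right-hand side above should not be too complicated. In order to capture this, LS-SVM regression is formulated as minimization of the following function of w:
$$\lVert w \rVert^2 + c \sum_{n=1}^{N} \Bigl( Y_n - \sum_{m=1}^{N} w_m \phi_m(X_n) \Bigr)^2 .$$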
 This is a nonlinear problem with a unique optimal solution that is solved by means of convex quadratic programs. Here c is a fixed parameter that determines the model complexity. Notice that the objective function is a linear combination of squared training error and the squared norm of w (the model complexity). At one extreme if c = 0, then the program has only the trivial optimum where w = 0, since this function has minimum model complexity. At the other extreme if c is very large, then the objective is to minimize the training error only, without regard to model complexity, leading to overfitting. In between, we obtain functions where most, but not all, of the wm values are equal to zero. For details on solving this mathematical program and using other choices of kernel functions, see Suykens et al. and Smola and Schölkopf.
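The trade-off controlled by c can be made concrete with a small closed-form solve. The sketch assumes an objective of the form "squared norm of w plus c times the squared training error" with no bias term, which matches the verbal description above but is not taken from LS-SVMlab; that package solves its own formulation, and the function name `fit_regularized_weights` and the demo data are hypothetical.

```python
import numpy as np

def fit_regularized_weights(Phi, Y, c):
    """Minimize ||w||^2 + c * ||Y - Phi @ w||^2 over w."""
    M = Phi.shape[1]
    # Setting the gradient to zero gives (I + c * Phi^T Phi) w = c * Phi^T Y.
    return np.linalg.solve(np.eye(M) + c * Phi.T @ Phi, c * Phi.T @ Y)

def training_error(Phi, Y, w):
    """Sum of squared residuals on the training sites."""
    return float(np.sum((Y - Phi @ w) ** 2))

# Made-up one-dimensional sites and targets (assumptions for illustration only).
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y_demo = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
sigma_demo = 1.0
Phi_demo = np.exp(-(x_demo[:, None] - x_demo[None, :]) ** 2 / (2.0 * sigma_demo ** 2))

w_zero = fit_regularized_weights(Phi_demo, Y_demo, c=0.0)
w_mid = fit_regularized_weights(Phi_demo, Y_demo, c=1.0)
w_large = fit_regularized_weights(Phi_demo, Y_demo, c=1e6)
```

With c = 0 the solve returns the all-zero weight vector, and the training error shrinks as c grows, mirroring the two extremes described in the text.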
 In this study, drought is defined based on the “theory of runs” of Yevjevich  on the basis of differences between the water supply and water demand. Based on this theory, hydrological drought occurs when the magnitude of a discrete series of streamflow variables at a given time is smaller than some demand level called the “truncation level.” A drought event begins when the flow level drops below the truncation level and ends when the flow level exceeds it. The cumulative volume of water deficit (the truncation level minus the actual flow level) during the months that water level was below the truncation level defines the severity of a drought event.
 Due to seasonal variation of the streamflow, we used a truncation level that depended on the month of the year, as suggested by Kjeldsen et al.: the truncation level for a given calendar month was equal to the average (index) of all streamflow values for that month in all years in our data set. Each site gets its own (periodic) truncation level in this way. Figure 3 shows an example of one drought event in a section of the streamflow series of the Athabasca River in Alberta, Canada. The red line shows the truncation level (the long-term average of average monthly flow levels from 1913–2006), while the blue line shows the flow of each month.
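The run-based drought definition with a monthly truncation level can be sketched as follows. The layout of the input (one row per year, twelve monthly values per row), the function name `drought_events`, and the convention that a month exactly at the truncation level ends an event are assumptions made for this illustration.

```python
import numpy as np

def drought_events(monthly_flow):
    """Return a list of (start_index, duration_months, severity) tuples.

    monthly_flow has shape (n_years, 12); severity is the cumulative deficit
    (truncation level minus flow) over the consecutive months below the level.
    """
    flow = np.asarray(monthly_flow, dtype=float)
    truncation = flow.mean(axis=0)                   # one level per calendar month
    deficit = (truncation[None, :] - flow).ravel()   # chronological month order
    events, start, severity = [], None, 0.0
    for t, d in enumerate(deficit):
        if d > 0:                        # flow is below the truncation level
            if start is None:
                start, severity = t, 0.0
            severity += d
        elif start is not None:          # flow recovered: close the event
            events.append((start, t - start, severity))
            start = None
    if start is not None:                # event still open at the end of the record
        events.append((start, len(deficit) - start, severity))
    return events

# Made-up record (assumption): two years of constant flow except for months 3 and 4.
flow_demo = np.full((2, 12), 10.0)
flow_demo[0, 2:4] = 6.0    # year 1: below the monthly average
flow_demo[1, 2:4] = 14.0   # year 2: above the monthly average
events_demo = drought_events(flow_demo)
```

In the demo the monthly averages are 10 throughout, so the only event is the two-month run in the first year with a cumulative deficit of 8.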
5.1. Data Set
 Our data set consists of unregulated streamflow records from monitoring sites in the Canadian Prairies and nearby areas in the provinces of Alberta, Saskatchewan, and Manitoba. We started with an initial pool of monthly records from 59 sites (22 from Alberta, 18 from Saskatchewan, and 19 from Manitoba), taken from Water Survey of Canada's Archived Hydrometric Data Web site (http://www.wsc.ec.gc.ca/applications/H2O/index-eng.cfm) in Canada's National Climate Data and Information Archive (http://www.climate.weatheroffice.gc.ca). We then filtered the stations to keep only those meeting these criteria: (1) At least 20 drought events were recorded, following suggestions of Hosking and Wallis. (2) The site is a nonregulated river, meaning that there has been minimal human-related interference with the flow regime. (3) The data at the gauging station passed tests of homogeneity, stationarity, and independence. There were 36 sites meeting all three criteria; Figure 4 shows locations of the original pool of 59 sites and the 36 selected sites.
 The record lengths of flows for these stations vary from 20 to 105 years, all ending in year 2007. Drought is related to complex interactions between physiographical, meteorological, and hydrological phenomena. Initially, several different characteristics were investigated for correlation with drought severity-duration curves. Three characteristics were found to be very relevant to drought: (1) drainage area (DA; km²), as the physiographical input, (2) mean annual precipitation (MAP; mm yr⁻¹), as the meteorological input, and (3) mean runoff (MRO; mm yr⁻¹), as the hydrological input.
 Thus D = 3 in our studies; i.e., each site was represented by a length 3 vector Xn of characteristics, and an output Yn, the severity of site n for a fixed return period. Table 1 shows the summary statistics of the selected characteristics of the 32 sites from the date of record to the end of 2007. A Pearson type III (PIII) distribution was selected as the parent distribution of severity data for each site. This choice was based on an L-moment parametric approach in which several distributions were fitted and PIII showed the best fit. This finding that negative run sum statistics are best described by a PIII distribution is similar to the findings of other recent studies which have explored the probability distribution of low streamflow series, such as Kroll and Vogel. Three severity levels, corresponding to three desired return periods of 5, 10, and 50 years, were calculated for each site. After obtaining the scatterplots, it was decided to remove sites 1, 2, 3, and 11 from the quantile estimation procedure as they appeared to be outliers, leaving 32 sites (15 from Alberta, 8 from Saskatchewan, and 9 from Manitoba) for the analysis.
Table 1. Summary Statistics of the Study Data
Characteristics summarized: drainage area (km²); mean annual precipitation (mm yr⁻¹); mean runoff (mm yr⁻¹).
Figure 5 shows scatterplots of site characteristics and each corresponding severity quantile (i.e., input and output for training of models). The summary of severity statistics of droughts for these three selected return periods from the 32 remaining catchments is presented in Table 2.
Table 2. Statistics of the Study Data
Severity for assigned return period: S5 (10⁶ m³); S10 (10⁶ m³); S50 (10⁶ m³).
5.2. Experiment Design
 Overall, there were nine experiments, one for each combination of return period (5, 10, or 50 years) and estimation technique (linear least squares, neural network, and support vector machine). For each experiment the input was the N × D matrix X of characteristics, with N = 32 and D = 3, and the length N vector Y of severity quantiles.
 To simulate the modeling performance on ungauged sites, a jackknife cross-validation procedure was used. Jackknifing means that the estimated value at each site n was obtained by temporarily removing site n from the data set, running the model on the remaining N – 1 sites, and then simulating the model at site n.
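The jackknife procedure is independent of the estimation technique, so it can be written once around any fit/predict pair. The sketch below plugs in an affine least squares model purely as a stand-in; the function names and the demo data are hypothetical.

```python
import numpy as np

def jackknife_predictions(X, Y, fit, predict):
    """Estimate each site from a model trained on the other N - 1 sites."""
    N = len(Y)
    estimates = np.empty(N)
    for n in range(N):
        keep = np.arange(N) != n                 # temporarily remove site n
        model = fit(X[keep], Y[keep])            # train on the remaining sites
        estimates[n] = predict(model, X[n:n + 1])[0]  # simulate the model at site n
    return estimates

def fit_affine(X, Y):
    """Stand-in model: least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return beta

def predict_affine(beta, X):
    return beta[0] + X @ beta[1:]

# Made-up data (assumption): 6 sites, 2 characteristics, exactly affine targets.
X_demo = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5],
                   [4.0, 3.0], [5.0, 2.5], [6.0, 1.0]])
Y_demo = 4.0 + 1.5 * X_demo[:, 0] - 2.0 * X_demo[:, 1]
estimates_demo = jackknife_predictions(X_demo, Y_demo, fit_affine, predict_affine)
```

Any model with the same `fit(X, Y)` and `predict(model, X)` shape can be substituted, which is how one loop could serve all method and return period combinations.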
5.3. Evaluation Method
 The performance of each drought frequency analysis model is evaluated based on three indices: RMSE, BIAS, and $R^2$. For them let N be the total number of drought events; $q_i$ the observed (severity) value; $\hat{q}_i$ the estimated value; and $\bar{q}$ the mean of all observed values.
 The root-mean-square error (RMSE) is similar to the standard deviation of a distribution:
$$\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} ( \hat{q}_i - q_i )^2 } .$$
Better fits are indicated by having a lower RMSE value.
 The BIAS of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated:
$$\mathrm{BIAS} = \frac{1}{N} \sum_{i=1}^{N} ( \hat{q}_i - q_i ) .$$
A BIAS close to zero is preferred in a model.
 The $R^2$ criterion provides an overall assessment of the quality of estimation:
$$R^2 = 1 - \frac{ \sum_{i=1}^{N} ( q_i - \hat{q}_i )^2 }{ \sum_{i=1}^{N} ( q_i - \bar{q} )^2 } .$$
$R^2$ values close to 1 show a near-perfect estimation, with values 0.8 and higher generally acceptable.
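The three indices can be computed directly from paired observed and estimated values. The sketch assumes they are a root-mean-square error, a mean signed error (estimated minus observed), and an R²-type criterion defined as one minus the ratio of the squared error to the spread of the observations about their mean; the function names and the demo numbers are hypothetical.

```python
import numpy as np

def rmse(q_obs, q_est):
    """Root-mean-square error between estimated and observed values."""
    return float(np.sqrt(np.mean((q_est - q_obs) ** 2)))

def bias(q_obs, q_est):
    """Mean signed error; positive values indicate overestimation."""
    return float(np.mean(q_est - q_obs))

def r_squared(q_obs, q_est):
    """One minus squared error divided by squared deviation from the observed mean."""
    sse = np.sum((q_obs - q_est) ** 2)
    sst = np.sum((q_obs - np.mean(q_obs)) ** 2)
    return float(1.0 - sse / sst)

# Made-up values (assumption): four sites.
q_obs_demo = np.array([1.0, 2.0, 3.0, 4.0])
q_est_demo = np.array([2.0, 2.0, 4.0, 4.0])
rmse_demo = rmse(q_obs_demo, q_est_demo)
bias_demo = bias(q_obs_demo, q_est_demo)
r2_demo = r_squared(q_obs_demo, q_est_demo)
```

For the demo values the errors are 1, 0, 1, 0, so the RMSE is the square root of 0.5, the bias is 0.5, and the criterion is 1 - 2/5 = 0.6.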
 Overall, the result of SVR training for radial basis functions depends on two parameters, σ and c. We used a logarithmic grid search over c and σ within an assigned range, using the grid search function in the LS-SVMlab package, and found the optimal pair for our data set at c = 323. The grid search is illustrated in Figure 6.
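A logarithmic grid search over the two parameters can be sketched as below. This is not the LS-SVMlab grid search routine: the leave-one-out scoring, the simplified regularized kernel fit without a bias term, the grid ranges, and all function names are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def loo_rmse(X, Y, c, sigma):
    """Leave-one-out RMSE of a regularized kernel fit for one (c, sigma) pair."""
    N = len(Y)
    errors = []
    for n in range(N):
        keep = np.arange(N) != n
        Phi = kernel(X[keep], X[keep], sigma)
        # Minimizer of ||w||^2 + c * ||Y - Phi w||^2 (assumed simplified objective).
        w = np.linalg.solve(np.eye(N - 1) + c * Phi.T @ Phi, c * Phi.T @ Y[keep])
        pred = kernel(X[n:n + 1], X[keep], sigma) @ w
        errors.append(pred[0] - Y[n])
    return float(np.sqrt(np.mean(np.square(errors))))

def log_grid_search(X, Y, c_grid, sigma_grid):
    """Score every (c, sigma) pair and return the pair with the lowest score."""
    scores = np.array([[loo_rmse(X, Y, c, s) for s in sigma_grid] for c in c_grid])
    i, j = np.unravel_index(np.argmin(scores), scores.shape)
    return c_grid[i], sigma_grid[j], scores

# Made-up data and grid ranges (assumptions for illustration only).
X_demo = np.linspace(0.0, 1.0, 12)[:, None]
Y_demo = np.sin(2.0 * np.pi * X_demo[:, 0])
c_grid_demo = np.logspace(-2, 4, 7)       # 0.01 ... 10000, evenly spaced in log10
sigma_grid_demo = np.logspace(-2, 1, 7)   # 0.01 ... 10, evenly spaced in log10
best_c, best_sigma, scores_demo = log_grid_search(
    X_demo, Y_demo, c_grid_demo, sigma_grid_demo)
```

Spacing the candidates evenly in the logarithm lets a small grid cover several orders of magnitude; a finer grid around the best cell is a common refinement.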
 For the neural network method, Figure 7 shows the diagram used to optimize the number of hidden nodes M, which ranges between 1 and 12 for our data, at the return period of 50 years. In this case, once the number of hidden nodes exceeds 4, overfitting of the training data set begins, so M = 4 neurons was chosen as optimal, since it minimizes the testing error and has a training error very close to the testing error. For return periods of 5 and 10 years, M = 7 nodes was optimal. This shows that the design of RBF networks has to change with the change in the input data.
 The summary statistics of the results of our experiment are shown in Table 3, and the scatterplots of the predicted versus actual values are shown in Figures 8 through 10. The results indicate that, in general, least squares SVR might tend to overestimate the targets, especially at the lower quantiles, while MLR has a tendency to underestimate the predicted values at the lower quantiles and overestimate the predicted values at the larger quantiles. Nevertheless, the least squares SVR model showed the best results overall, and the MLR and RBF models show poorer results, although both do an acceptable job in estimation of severity.
Table 3. Regression Results Using Cross Validation
Indices reported: RMSE (10⁶ m³); BIAS (10⁶ m³).
 To compare the extrapolation ability of the models, the quantile located at the outermost part of the sample can be considered. For the highest return period of 50 years (the tail of the distribution), the least squares SVR model showed an excellent ability in extrapolation. Interestingly, RBF shows a competitive ability in extrapolation and, therefore, in prediction capability. Overall, least squares SVR showed a better extrapolation capacity than RBF for all three return periods. MLR showed weaker prediction ability since it overestimated the outermost severity.
7. Conclusions and Summary
 In this paper, we evaluated three quantile estimation techniques for ungauged sites: RBF, least squares SVR, and MLR. Least squares SVR outperformed the other two methods although the other two methods both gave acceptable results. The RBF approach showed a weaker result statistically but good quantile prediction at the tail of the distribution. The least squares SVR seems to be a good tool for quantile estimation of drought at ungauged sites since it guarantees a deterministic solution once c and σ are fixed. Future experiments are required to investigate applications of least squares SVR in drought severity estimation since the power of intelligent algorithms can be enhanced by balancing the number of the training data and the dimensionality of the input data.
 Input characteristics in this paper were selected with care, starting with a larger selection of possible related characteristics, and keeping the ones that showed a good linear correlation with severity quantiles. It was observed that the prediction output got better by eliminating the inputs less linearly relevant to the output.
 Using least squares SVR is an attractive alternative to the standard multivariate regression methods so widely used in regional hydrology. There is a need for further evaluation of least squares SVR in future hydrologic regionalization studies. Although the paper only deals with the drought estimation problem, the methodology could be extended for estimation of quantiles at ungauged sites for a wide range of other problems.