Spatial assessment of precipitation deficits in the Duero basin (central Spain) with multivariate extreme value statistics


Corresponding author: M. Kallache, IMDEA Water (Instituto Madrileño de Estudios Avanzados Agua), Parque Científico Tecnológico Universidad de Alcalá, c/Punto Net 4, 2 planta, Edificio ZYE, 28805 Alcalá de Henares (Madrid), Spain.


[1] Nonirrigated agriculture on the Iberian Peninsula is regularly affected by dry periods that can cause important losses. This paper compares the classical Standardized Precipitation Index (SPI) with a fragility index developed by the multivariate extreme value theory (MEVT) community, which is used to describe monthly precipitation deficits below 30.5 mm (about 1 mm/d) in the Spanish Duero basin. The multivariate extreme value model makes it possible to capture relevant information about the dependence structure among extreme precipitation deficits. Maps of those extremal dependence summaries and of loadings of principal components of the SPI provide quantitative information for water management. In addition, jointly analyzing data from several stations improves the inference of uncertainty. Spatial patterns of extremal dependence emerged with respect to orographic features. The most severe dry spells occur in the southeast of the Duero basin. In the central plain of the Duero basin, a predominantly agricultural area, a strong fragility index for the severity of dry spells is found particularly in eastern regions. Results of the MEVT and SPI analyses point in the same direction. Beyond this, the MEVT assessment gives a quantitative measure of the dependence between stations and regions. Estimates of return periods for extreme dry spell severity are discussed. Deficits below 42.7 mm are also analyzed.

1. Introduction

[2] Dry periods are common in central Spain. They mostly affect the agricultural and tourism sectors. Crop yields on the Iberian Peninsula have been severely reduced during dry years [Vicente-Serrano, 2006]. In the case of extreme droughts, the water supply of the whole region is under question, as happened in the mid-1990s for the region of Madrid. In this paper, rainfall deficits of monthly precipitation totals are analyzed for the Duero basin located in central Spain. High rainfall deficits indicate dry periods and thus potentially adverse conditions for agriculture. The main scope of our paper is the analysis of spatial dependence of extreme rainfall deficits.

[3] Dry periods have many facets, such as spatial extension, severity, and duration. Therefore, diverse definitions of a dry period exist, depending on the scope of a study. Intense research on droughts in recent decades has led to a portfolio of drought concepts and drought classifications. Here droughts are commonly seen as deviations from normal conditions [see, e.g., World Meteorological Organization, 1986; Mishra and Singh, 2010]. Precipitation is commonly used to indicate meteorological droughts, river runoff deficits represent hydrological droughts, and a lack of soil moisture is related to agricultural droughts. An overview is given in Hisdal and Tallaksen [2000], Heim [2002], or Keyantash and Dracup [2002]. Another important branch investigates the characteristics of dry spells. Commonly, a dry spell is seen as a period of abnormally dry weather (normally reserved for less extensive, and therefore less severe, conditions than for droughts). Dry spell definitions are usually derived from the definition of a dry day. In general, a common threshold level is used to define a dry day and thus a dry spell, e.g., 0.1 mm/d or 5 mm/d. The level depends on the application at hand [cf. Mathugama and Peiris, 2011; Lana et al., 2008; Ceballos et al., 2004]. In this study, monthly precipitation deficits are analyzed with the Standardized Precipitation Index (SPI) and with a multivariate extreme value analysis [see, e.g., Coles, 2001; Beirlant et al., 2004; Resnick, 2007] of cumulative precipitation below 30.5 mm. The focus of our work lies in the application of this novel approach to cumulative precipitation deficits and in its comparison with the common SPI. One motivation for the multivariate extreme value analysis is the creation of dependence maps for extreme precipitation deficits. This visualization of spatial dependence complements common frequency maps, which document the frequency of occurrence of past dry periods.

[4] The number of applications of multivariate extreme value theory (MEVT) in the geophysical sciences has grown steadily over the last decade. To name a few, Blanchet et al. [2009] studied snow cover over Switzerland, Ribatet et al. [2012] and Cooley et al. [2007] estimated precipitation return levels, and de Haan and de Ronde [1998] investigated sea level and wind extremes. Besides those references, there exists a large body of work concerning the modeling and the inference of extremes. In this work, we focus our attention on the so-called fragility index (FI), an indicator of extremal dependence that has been studied by Geluk et al. [2007] and Tichy and Falk [2009] for financial applications. This indicator basically counts the expected number of extremes given that another extreme event has already occurred. Section 4.2 provides a precise definition of this probabilistic tool.

[5] The SPI (see equation (1) for details) is a common drought assessment indicator that performs well under various conditions [see, e.g., Heim, 2002; Keyantash and Dracup, 2002]. By applying a principal component analysis (PCA) to the SPI data, regions with similar variability can be identified and corresponding spatial maps produced [cf., e.g., Raziei et al., 2009].

[6] This article is organized as follows. In section 2, the characteristics of the Duero basin region are described. Monthly precipitation deficits and droughts are then defined in section 3. The MEVT analysis method is described in section 4 and applied to the Duero basin in section 5. For the same basin, the SPI approach is applied and then discussed in section 6. Conclusions are given in section 7.

2. The Duero Basin

[7] The Duero watershed has a surface area of 97,290 km2, of which 78,954 km2 lie in Spain. It is the most extensive watershed of the Iberian Peninsula. The topography of the basin is depicted in Figure 1a. Spatially, mean annual precipitation decreases from north to south. The mountain range that surrounds a topographic depression in the middle of the basin has the largest precipitation intensity. The central zone is very dry, contains most of the aquifer formations, and is an important area of agricultural production. Most of the population lives in the central plain, so most water consumption occurs there. The volume of average annual precipitation in the complete Duero basin is around 50,000 hm3, of which the majority (35,000 hm3) evaporates or is directly used by the vegetation [Moratiel et al., 2011]. Precipitation shows a marked seasonality. It peaks roughly in autumn and winter and decreases in spring to its lowest amounts in summer [Morán-Tejeda et al., 2011c]. Precipitation from October to December generates soil water reserves and runoff. The dry period coincides with warm temperatures in summer [Morán-Tejeda et al., 2011c]. Summer drought conditions affect 90% of the surface of the Duero river basin [Moratiel et al., 2011]. Rivers in this basin are highly regulated, which hinders stochastic modeling of river runoff. We therefore concentrate here on the analysis of precipitation. In any case, meteorological and hydrological droughts are often well correlated [Lorenzo-Lacruz et al., 2010].

Figure 1.

(a) Elevation and rivers of the Duero basin in central Spain. (b) Parts of subwaterbasins in the middle of the Duero basin, in which agriculture plays a major role. The available precipitation stations are marked with dots.

[8] During the summer months, precipitation is mostly associated with storms and convective systems that occur with high spatial irregularity. In winter, larger and more systemic events impact precipitation. Various studies show a relationship between high values of the North Atlantic Oscillation (NAO) index and the decrease in winter precipitation in the western part of the Iberian Peninsula [cf., e.g., McCabe et al., 2001; Ceballos et al., 2004; Caramelo and Manso Orgaz, 2007]. Morán-Tejeda et al. [2011b] describe moreover the connection between river runoff and the NAO.

[9] The sectors most vulnerable to water stress in the Duero basin are tourism and agriculture. In 2003, over 50% of the Duero basin area was still used as cropland [Morán-Tejeda et al., 2011a], mainly for summer production (the cultivation of winter crops is less than 5%) [MARM, 2008]. Though dry, the basin has enough water to allow for mostly nonirrigated agriculture. Official statistics indicate that only about 10% of the area is irrigated. The irrigation season lasts from May to October [cf. Gil et al., 2011], owing to the decrease in precipitation and the increase in evapotranspiration during this period [Moneo Laím, 2008].

3. Indication of Dry Periods: The SPI and Cumulative Precipitation Deficits

[10] The SPI was developed by McKee et al. [1993] and indicates standardized precipitation anomalies. To calculate it, precipitation is commonly fitted by a Gamma distribution whose parameters are estimated at each station and for each month [cf., e.g., Keyantash and Dracup, 2002; Vidal and Wade, 2009; Hayes et al., 1999]. To account for dry events, the cumulative distribution function (cdf), say H(x), is represented by a mixture model

$$H(x) = q + (1 - q)\,G(x), \tag{1}$$

where G(x) denotes the Gamma cdf and q corresponds to the probability of a dry event.

[11] To standardize and compare series at different weather stations, H(x) is transformed into a standard Gaussian cdf. The SPI values are quantiles of this standard normal distribution [Wanders et al., 2010]. In other words, the SPI of the precipitation amount x corresponds to $F^{-1}(H(x))$, where $F^{-1}(\cdot)$ denotes the inverse of the standard Gaussian cdf.
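The transformation can be illustrated with a minimal Python sketch. It assumes the usual SPI mixture form H(x) = q + (1 − q) G(x) and, to stay within the standard library, a Gamma distribution with shape 1 (an exponential cdf). The function names `mixture_cdf` and `spi_value`, the scale of 40 mm, and q = 0.1 are hypothetical illustration values and are not taken from the study.

```python
import math
from statistics import NormalDist

def mixture_cdf(x, q, gamma_cdf):
    """H(x) = q + (1 - q) * G(x): mass q for dry events plus a Gamma part."""
    if x <= 0:
        return q
    return q + (1.0 - q) * gamma_cdf(x)

def spi_value(x, q, gamma_cdf):
    """SPI = standard normal quantile of H(x)."""
    h = mixture_cdf(x, q, gamma_cdf)
    # Keep h strictly inside (0, 1) so that the normal quantile is finite.
    h = min(max(h, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(h)

# Hypothetical fitted values for one station and one calendar month.
scale_mm = 40.0   # assumed Gamma scale (shape fixed to 1 in this sketch)
q_dry = 0.1       # assumed probability of a dry event

def exponential_cdf(x):
    # Gamma cdf with shape 1 and the scale above.
    return 1.0 - math.exp(-x / scale_mm)

for amount in (5.0, 30.5, 100.0):
    print(amount, round(spi_value(amount, q_dry, exponential_cdf), 2))
```

In practice the Gamma shape and scale would be estimated separately for each station and month, and a general Gamma cdf would replace the exponential stand-in used here.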

[12] Although no universal drought indicator exists, Keyantash and Dracup [2002] tested the robustness of 18 different drought indices by means of statistical methods and concluded that the SPI represents the best climatic index for drought identification and for quantification of the severity, duration, and spatial extent of droughts. Compared to other indicators, the success of the SPI can be explained by its capacity to cope with sparse data. The SPI considers neither soil moisture nor temperature. Indicators that include soil moisture depend crucially on adequate soil maps with reliable soil textures and associated hydraulic properties [Wanders et al., 2010]. Yet such data are often not available. Improvement of drought indices may also be achieved by the consideration of management and storage effects. Basin managers rather rely on precipitation and runoff variables to determine the onset of droughts [Garrote et al., 2007]. Many complex indices that take storage and management into account cannot easily be interpolated across regions and cannot be validated over wide geographical areas.

[13] Droughts are commonly defined as deviations from normal circumstances. For a humid location, the indication of a drought does, therefore, not necessarily imply the need for irrigation measures for agricultural plants. Dry spells are defined as a set of consecutive days with daily rainfall amounts below a fixed level [Lana et al., 2008]. For extreme events, we focus here on cumulative precipitation deficits below a given precipitation level [Engeland et al., 2004]. This approach was originally called “method of crossing theory” [Rice, 1945]. It was extended by Cramér and Leadbetter [1967] and applied in hydrology by, e.g., Yevjevich [1967]. To allow inference about irrigation needs, fixed levels are used here, e.g., 1 mm per day [Ceballos et al., 2004]. The percentile that is undershot may thus vary from site to site. In order to apply this approach, we need to describe precisely our definition of cumulative precipitation deficits. In particular, we need to choose a level.

[14] Common dry spell levels lie between 0.1 mm/d and 30 mm/d [Ceballos et al., 2004; Lana et al., 2008], and precipitation below 1 mm/d evaporates directly. In this paper, we mainly focus on the level of 30.5 mm/month (i.e., 1 mm/d) to define our cumulative deficit. We have also studied a second level of 42.7 mm/month; see the figures and conclusions. Our level choice makes sense for the rather dry basin of the Duero river, with average precipitation amounts of 1.72 mm/d, about 53 mm/month.
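The conversion between daily and monthly levels can be checked with a short calculation, assuming an average month length of about 30.5 days (an assumption made here to match the stated equivalence of 1 mm/d and 30.5 mm/month):

```latex
\begin{align*}
365.25\ \text{d} / 12 &\approx 30.4\ \text{d per month},
  \quad \text{so } 1\ \text{mm/d} \approx 30.5\ \text{mm/month}, \\
42.7\ \text{mm/month} \,/\, 30.5\ \text{d} &= 1.4\ \text{mm/d}, \\
1.72\ \text{mm/d} \times 30.5\ \text{d} &\approx 52.5\ \text{mm/month} \approx 53\ \text{mm/month}.
\end{align*}
```

Under this assumption, the second level of 42.7 mm/month corresponds to 1.4 mm/d, and both levels lie below the basin-average monthly precipitation.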

[15] Let $p_t$ be the precipitation amount for month t. Our cumulative precipitation deficit event $D_i$ is then defined as the sum of monthly deficits (i.e., when $p_t < 30.5$ mm) as

$$D_i = \sum_{t=\mathrm{start}_i}^{\mathrm{end}_i} (30.5 - p_t), \tag{2}$$

where $\mathrm{start}_i$ and $\mathrm{end}_i$ correspond to the starting and ending month of the ith deficit event during the period of interest, respectively. The cumulative precipitation deficit of an event, that is, a dry spell, indicates its severity. Figure 2 illustrates this computation. In Figure 3, three SPIs (SPI1, SPI3, and SPI6) and the cumulative precipitation deficit are compared for the station “La Parilla” during the time period 1970–1972. The SPIs are derived from monthly precipitation (SPI1) or from running means of 3 months (SPI3) or 6 months (SPI6) of precipitation and are depicted with lines. The horizontal straight lines indicate the standard SPI drought classification from moderate to extreme droughts [Wanders et al., 2010]. Black triangles and diamonds mark cumulative precipitation deficits (they have been standardized to zero mean and unit variance). For the cumulative precipitation deficit, no running mean over several months is taken. Avoiding this smoothing procedure preserves very low deficits, as illustrated in Figure 3. On the other hand, cumulative precipitation deficits result in one single event per dry spell. As precipitation deficits are cumulated for consecutive months, they can get large when a dry period persists. In this example, a dry event lasted about 6 months in autumn/winter 1971 and led to a high cumulative precipitation deficit. The SPI, in contrast, averages over a fixed number of months, so the dry period may be cut into several values of moderate size, depending on the window length chosen for averaging.
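The computation of cumulative deficit events can be sketched in a few lines of Python. The sketch assumes that the monthly deficit is the difference between the level and the monthly precipitation, and that an event is a run of consecutive months below the level; the function name `cumulative_deficits` and the example series are hypothetical.

```python
def cumulative_deficits(monthly_precip, level=30.5):
    """Return (start_index, end_index, deficit) for each run of months below `level`."""
    events = []
    start = None
    deficit = 0.0
    for t, p in enumerate(monthly_precip):
        if p < level:
            if start is None:          # a new dry spell begins
                start = t
                deficit = 0.0
            deficit += level - p       # add this month's deficit
        elif start is not None:        # the dry spell ended last month
            events.append((start, t - 1, deficit))
            start = None
    if start is not None:              # series ends inside a dry spell
        events.append((start, len(monthly_precip) - 1, deficit))
    return events

# Hypothetical monthly precipitation totals in mm.
series = [60.0, 20.5, 10.5, 45.0, 30.0, 80.0, 5.5, 0.5]
print(cumulative_deficits(series))
# [(1, 2, 30.0), (4, 4, 0.5), (6, 7, 55.0)]
```

Each dry spell yields exactly one value, so a long dry period produces a single large deficit rather than several moderate ones.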

Figure 2.

Precipitation at station Valladolid for years 1961–1968 (black line). Connected areas below the levels 30.5 mm/month and 42.7 mm/month indicate dry spells (gray hatched areas).

Figure 3.

SPI1, SPI3, and SPI6 and the cumulative precipitation deficits (standardized, negative) at level 30.5 mm/month and 42.7 mm/month for station La Parilla and years 1970–1972.

[16] Concerning the seasons of interest, we study two time periods, the irrigation period from May to October and the entire year.

[17] An overview of the dry spell characteristics is given in Table 1. The average dry spell lengths are between 2 and 3 months. The number of dry spell occurrences is about the same for the irrigation period and the whole year. Dry spells occur frequently in winter, but they are more severe during the irrigation period.

Table 1. Dry Spell Definition Levels and According Characteristics on Average (Minimum-Maximum) Over all Stations and Years
| Level (mm/Month) | Dry Spell Length (Month) | Dry Spell Number | Dry Spell Severity (mm/Dry Spell Length) |
| --- | --- | --- | --- |
| 30.5 (All year) | 2 (1.3–3.7) | 147 (60–180, 1–3 per year) | 33.73 (17.74–68.13) |
| 30.5 (Irrigation period) | 2.4 (1.3–3.7) | 68 (48–79, 0.8–1.3 per period) | 43.59 (18.05–101.93) |
| 42.7 (All year) | 2.73 (1.4–5.5) | 147 (83–172, 1.3–2.8 per year) | 64.21 (27.13–151.29) |
| 42.7 (Irrigation period) | 3.1 (1.5–5.7) | 67 (46–81, 0.8–1.3 per period) | 84.1 (31.37–217.2) |

[18] Our time series come from the MOPREDAS database [González-Hidalgo et al., 2010], which includes measurements from 1945 to 2005. Those records have been homogenized, gaps have been filled, and outliers have been discarded. To do so, reference series have been calculated from neighboring sites. Details on the procedures are outlined in González-Hidalgo et al. [2010]. A total of 491 stations are available for the whole Duero basin (cf. Figure 1a), and 175 stations from the crop lands in the center of the basin (see Figure 1b). Temporal clustering of dry spells can affect the statistical analysis; shifting algorithms have been used to deal with this issue. For details, see Appendix B.

[19] To conclude this section, we note that a strong correlation between dry spell severity and dry spell duration is found in this data set. This leads us to focus only on dry spell severity. In the past, by contrast, the frequency or duration of dry spells has more commonly been assessed [see, e.g., Mathugama and Peiris, 2011].

4. Modeling Multivariate Extremes

4.1. Defining Extreme Precipitation Deficits

[20] In the previous section, the level of 30.5 mm/month was used to define cumulative precipitation deficits; see equation (2). In this work, we study extreme deficits. This means that another threshold is needed to select a subset of those already low precipitation quantities. In other words, extremes correspond here to very low precipitation amounts that have been thresholded twice: first to define precipitation deficits and second to define extreme cumulative precipitation deficits. As a compromise between sample sizes and modeling considerations, the threshold for defining extreme deficits is set to the 50th percentile of whole-year precipitation deficits, whereas for the irrigation period all deficit events are used. A study of subbasin regions, in which the station series of each region have been joined (see section 5.2), is based on the uppermost 20% of dry spells (for the whole year and for the irrigation period). To explore the suitability of the Generalized Pareto Distribution (GPD) expected from extreme value theory [see, e.g., Coles, 2001], an Anderson-Darling test [cf. Choulakian and Stephens, 2001] has been applied to those extreme deficits. For 1% of the series, the GPD was rejected at a significance level of 0.05, which is less than the 5% of rejections expected under the null hypothesis, so the GPD hypothesis is supported. To complement this test, quantile-quantile plots for the GPD [see Coles, 2001] have been inspected for a few randomly chosen stations. Those graphs seem adequate (results are not shown, but available upon request). As one may expect for precipitation deficits, which have an upper endpoint, most of the estimated GPD shape parameters are negative. This endpoint corresponds to the theoretical event of no precipitation during the whole time period.
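The second thresholding step and the role of a negative shape parameter can be sketched as follows. This is a minimal Python illustration, assuming the standard GPD parameterization with scale σ > 0 and shape ξ; the function names and all numerical values are hypothetical, and no parameter estimation is shown.

```python
import math
from statistics import median

def excesses_over_median(deficits):
    """Keep deficits above the 50th percentile; return (threshold, excesses)."""
    u = median(deficits)
    return u, [d - u for d in deficits if d > u]

def gpd_cdf(y, scale, shape):
    """GPD cdf for an excess y over the threshold (scale > 0)."""
    if y <= 0:
        return 0.0
    if abs(shape) < 1e-12:             # exponential limit for shape -> 0
        return 1.0 - math.exp(-y / scale)
    z = 1.0 + shape * y / scale
    if z <= 0:                         # beyond the upper endpoint (shape < 0)
        return 1.0
    return 1.0 - z ** (-1.0 / shape)

def upper_endpoint(threshold, scale, shape):
    """Finite upper endpoint threshold - scale/shape, defined only for shape < 0."""
    if shape >= 0:
        return math.inf
    return threshold - scale / shape

# Hypothetical cumulative deficits (mm) and hypothetical GPD parameters.
u, exc = excesses_over_median([5.0, 12.0, 20.0, 33.0, 41.0, 58.0])
print(u, exc)
print(upper_endpoint(u, scale=20.0, shape=-0.25))
```

With a negative shape, the fitted distribution assigns no probability to deficits beyond the endpoint, which matches the physical bound of a period without any precipitation.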

[21] A prerequisite of applying the multivariate extreme value model is that extremes at each site are independent and identically distributed in time [cf. Coles, 2001]. No significant temporal trends have been found for the region and time period analyzed [Ceballos et al., 2004]. To assess temporal clustering among extreme deficits, the so-called extremal index that measures the reciprocal of the limiting mean cluster size of extremes has been estimated by using the method of Ferro and Segers [2003]. For our excesses, no significant clusters were found. Consequently, we regard those extreme deficits as temporally independent and identically distributed.
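The estimation of the extremal index can be sketched as follows. This is a minimal Python version of the intervals estimator as it is commonly stated for Ferro and Segers [2003]; the function name and the example exceedance times are hypothetical, and the formula should be checked against the original reference before use.

```python
def intervals_extremal_index(exceedance_times):
    """Intervals estimator of the extremal index from distinct exceedance times."""
    times = sorted(exceedance_times)
    gaps = [b - a for a, b in zip(times, times[1:])]   # interexceedance times
    if not gaps:
        raise ValueError("need at least two exceedances")
    n = len(gaps)
    if max(gaps) <= 2:
        num = 2.0 * sum(gaps) ** 2
        den = n * sum(g * g for g in gaps)
    else:
        num = 2.0 * sum(g - 1 for g in gaps) ** 2
        den = n * sum((g - 1) * (g - 2) for g in gaps)
    return min(1.0, num / den)

# Hypothetical exceedance times (event indices).
print(intervals_extremal_index([0, 10, 20, 30]))                 # well-separated extremes
print(intervals_extremal_index([0, 1, 2, 3, 100, 101, 102, 103]))  # two clusters
```

Values close to one indicate no clustering; the reciprocal of the estimate approximates the mean cluster size.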

[22] Without loss of generality, all precipitation deficits are transformed into unit Fréchet random variables by applying a probability integral transform [cf. Ramos and Ledford, 2009; Cooley et al., 2010]. We recall that the unit Fréchet distribution $F(x) = \exp(-1/x)$ for $x > 0$ is max-stable. In the sequel, $\mathbf{X}$ will correspond to a multivariate random vector with unit Fréchet marginals (other choices for marginals are possible). This framework simplifies the MEVT dependence model and its inference because the marginal behavior can be decoupled from the issue of dependence among extremes [see, e.g., Ledford and Tawn, 1997].
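A probability integral transform to unit Fréchet margins can be sketched as follows. The sketch assumes a purely empirical (rank-based) estimate of the marginal cdf, which is a simplification chosen here for illustration and not necessarily the marginal model used in the study; the function name is hypothetical.

```python
import math

def to_unit_frechet(values):
    """Map a sample to the unit Fréchet scale via ranks: z = -1 / log(F_hat(x))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u = rank / (n + 1.0)           # empirical cdf value, strictly in (0, 1)
        z[i] = -1.0 / math.log(u)      # inverse of F(z) = exp(-1/z)
    return z

# Hypothetical deficits (mm); ties would be ranked in order of appearance.
print(to_unit_frechet([33.0, 5.0, 12.0]))
```

The transform is monotone, so the ordering of the deficits, and hence their dependence structure, is preserved while the margins become identical.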

4.2. The Fragility Index FI Inference

[23] The concept of measuring dependence among extremes lies at the core of the FI. While it is trivial to define independence, it is arduous to describe and infer various degrees of dependence or near independence in MEVT. One particularly delicate point is the subtle case of asymptotic independence. To illustrate this point, suppose that the vector X has only two components and that we are interested in the conditional probability, χ, of observing a large value of X1 given that X2 is also large,

$$\chi = \lim_{x \to \infty} P(X_1 > x \mid X_2 > x). \tag{3}$$

[24] If χ > 0, then X1 and X2 are said to be asymptotically dependent. If χ = 0, then we are in the case of asymptotic independence [Sibuya, 1960]. Another way to interpret χ is to introduce the limiting expected number of extremes given that one extreme event has occurred already. This number is denoted by N and has been studied by Geluk et al. [2007] and Tichy and Falk [2009]. For the bivariate case, N varies between one and two.

[25] In MEVT, it is classical to present all mathematical results in terms of excess above a high threshold or maxima. For our application, we focus on precipitation deficits, and consequently, we study low values under a threshold. Theoretically, it is always possible to multiply by −1. This trick transforms deficits under a low threshold into excesses above a high threshold. For this reason, we follow the conventional way to present MEVT tools, and in practice, those tools will be applied on negative deficits, i.e., excesses.

[26] The asymptotically independent case (χ = 0 or N = 1) is complex because the definition χ = 0 does not capture anything about the rate of convergence toward zero. For example, if the original vector comes from a standardized bivariate Gaussian random vector with a strong correlation coefficient (say 0.99), it is possible to show that χ = 0. But this convergence is extremely slow and can only be inferred from samples of enormous sizes. In other words, it would be of interest to measure some second-order information for the case of asymptotic independence. A few alternatives have been proposed in this context. For example, the coefficient

$$\bar{\chi} = \lim_{x \to \infty} \frac{2 \log P(X_1 > x)}{\log P(X_1 > x, X_2 > x)} - 1 \tag{4}$$

relates the probability of having a joint extreme event to the probability of having any extreme event (joint or not) [see Coles et al., 1999].
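Empirical counterparts of these two dependence measures at a finite quantile level u can be sketched as follows. The sketch uses the threshold-dependent forms usually attributed to Coles et al. [1999], χ(u) = 2 − log P(U1 < u, U2 < u) / log u and χ̄(u) = 2 log(1 − u) / log P(U1 > u, U2 > u) − 1, with rank-transformed uniform margins; the function names and the example data are hypothetical.

```python
import math

def uniform_ranks(values):
    """Rank-transform a sample to approximately uniform margins in (0, 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1.0)
    return u

def chi_and_chibar(x1, x2, level):
    """Empirical chi(u) and chi-bar(u) at quantile level u in (0, 1)."""
    u1, u2 = uniform_ranks(x1), uniform_ranks(x2)
    n = len(u1)
    both_below = sum(1 for a, b in zip(u1, u2) if a < level and b < level) / n
    both_above = sum(1 for a, b in zip(u1, u2) if a > level and b > level) / n
    if both_below <= 0.0 or not (0.0 < both_above < 1.0):
        raise ValueError("too few joint observations at this level")
    chi = 2.0 - math.log(both_below) / math.log(level)
    chibar = 2.0 * math.log(1.0 - level) / math.log(both_above) - 1.0
    return chi, chibar

# Hypothetical, perfectly dependent pair of series: both measures equal one.
x = list(range(100))
print(chi_and_chibar(x, x, 0.8))
```

Plotting both quantities against increasing u indicates whether χ(u) stays away from zero (asymptotic dependence) or whether χ̄(u) stays below one (asymptotic independence).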

[27] Lately, various models that jointly treat asymptotic dependence and independence have been proposed and studied [e.g., Coles and Pauli, 2002]. Here, we pay special attention to the work of Ledford and Ramos, who extensively studied a very general framework to model the joint tail (survival function) defined by

$$P(X_1 > x_1, X_2 > x_2) = \frac{\mathcal{L}(x_1, x_2)}{(x_1 x_2)^{1/(2\eta)}}, \tag{5}$$

where $\mathcal{L}$ represents a bivariate slowly varying function [Ramos and Ledford, 2009; Resnick, 2007]. A fundamental feature of equation (5) is the so-called tail dependence coefficient η ∈ (0, 1], which encapsulates the strength of asymptotic independence. To see this, one can write

$$P(X_1 > x, X_2 > x) = \mathcal{L}(x, x)\, x^{-1/\eta}$$

[28] and deduce from equation (4) that $\bar{\chi} = 2\eta - 1$ [Ramos, 2003]. Definition (5) also allows for the modeling of the asymptotically dependent case (η = 1) and of complete independence (η = 0.5) and, consequently, offers great flexibility. One important parametric example for our precipitation deficit assessment corresponds to the η-asymmetric logistic model studied by Ramos and Ledford [2011] (see Appendix A for its definition within a multivariate context).
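The step from the joint tail to the coefficient of equation (4) can be made explicit. The following short derivation is a sketch that assumes unit Fréchet margins, so that P(X1 > x) behaves like 1/x for large x, and a joint tail on the diagonal of the form $\mathcal{L}(x, x)\, x^{-1/\eta}$ with $\mathcal{L}$ slowly varying, so that $\log \mathcal{L}(x, x) / \log x \to 0$:

```latex
\begin{align*}
\log P(X_1 > x) &\sim -\log x, \\
\log P(X_1 > x, X_2 > x) &= \log \mathcal{L}(x, x) - \tfrac{1}{\eta} \log x
  \;\sim\; -\tfrac{1}{\eta} \log x, \\
\bar{\chi} &= \lim_{x \to \infty}
  \frac{2 \log P(X_1 > x)}{\log P(X_1 > x, X_2 > x)} - 1
  = 2\eta - 1 .
\end{align*}
```

In particular, η = 1 gives a coefficient of 1 and η = 0.5 gives a coefficient of 0, in line with the two limiting cases named above.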

[29] Coming back to N, the limiting expected number of extremes given that one extreme event has already occurred, its definition can also be widened to deal with the asymptotically independent case. This leads to the so-called FI [Geluk et al., 2007; Tichy and Falk, 2009]

$$\mathrm{FI} = \begin{cases} N, & \text{if } \eta = 1, \\ \eta, & \text{if } \eta < 1. \end{cases} \tag{6}$$

[30] An FI = 0.5 thus indicates independent extremes, an FI in (0.5, 1) asymptotic independence, and an FI > 1 asymptotic dependence.

[31] For example, the FI can explicitly be computed for the asymmetric logistic model with parameters α and math formula [Ramos and Ledford, 2009]

display math(7)

cf. Appendices A and C for inference and the extension to math formula.

4.3. Inference From Simulations With the Asymmetric Logistic Model

[32] The relation between N and the model parameters has been assessed by means of simulation studies with artificial bivariate data (results not shown) for the asymptotically dependent case (η = 1). The simulation studies indicate a predictable influence of the other parameter estimates on N: In case math formula, the whole spectrum of asymptotic dependence is possible, that is, N lies in (1,2]. The more asymmetric the data are (that is, the further away math formula is from 1), the less dependent the data can be. This is expected, because strongly asymmetric data have few or no extremes on the diagonal. Moreover, large differences in the thresholds of the (standardized unit Fréchet) data resulted in low dependence of the data. This result is independent of the underlying distribution of the data and underlines the importance of the threshold choice.

[33] The distinction between asymptotically dependent and asymptotically independent data can be made by means of a modified likelihood ratio test in which the complete model is compared to a submodel with η restricted to 1. To test for symmetry, the standard likelihood ratio test can be used, that is, the complete model is compared to submodels with math formula fixed to 1 for all possible combinations of math formula [Ramos and Ledford, 2009]. In simulation studies with artificial data of the same length as the application data, the likelihood ratio test showed a high capability to discriminate between symmetric and asymmetric data (results not shown). We thus applied the test for symmetry and chose the submodel with math formula fixed to 1, when appropriate. The modified likelihood ratio test also revealed a high power to detect asymptotically independent data. However, when the data were actually asymptotically dependent (that is, η = 1), the modified likelihood ratio test too often falsely accepted the hypothesis of asymptotically independent data. Thus, in the following, the FI has been set to N in case η is compatible with being 1 (i.e., 1 lies within the 68% confidence band of η), and to η otherwise.
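The resulting decision rule can be sketched in Python. The sketch assumes that the 68% confidence band of η is approximated by the estimate plus or minus one standard error (a normal approximation introduced here for illustration); the function name is hypothetical, and the example numbers are the estimates reported for the two station pairs discussed with Figure 4.

```python
def fragility_index(n_hat, eta_hat, eta_se):
    """FI = N if 1 lies within eta_hat +/- one standard error, otherwise FI = eta."""
    if eta_hat - eta_se <= 1.0 <= eta_hat + eta_se:
        return n_hat      # eta compatible with 1: treat as asymptotically dependent
    return eta_hat        # otherwise: treat as asymptotically independent

# Aguas de Cabreiroa and Barxa: eta = 0.967 (0.14), N = 1.48.
print(fragility_index(1.48, 0.967, 0.14))
# Aguas de Cabreiroa and Cantimpalos: eta = 0.7 (0.12), N = 1.13 for the submodel.
print(fragility_index(1.13, 0.7, 0.12))
```

Because N exceeds one and η is at most one, the two regimes occupy separate ranges of the FI scale.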

[34] As an example of the estimation of η and N, χ and $\bar{\chi}$ are depicted in Figure 4 for stations Aguas de Cabreiroa and Barxa (Figures 4a and 4b) and Aguas de Cabreiroa and Cantimpalos (Figures 4c and 4d). The estimates shown in black have been calculated from N and η. For comparison, empirical estimates of χ and $\bar{\chi}$, as described in Coles et al. [1999], are added in gray. Aguas de Cabreiroa and Barxa are most likely asymptotically dependent (χ is compatible with being larger than 0 and $\bar{\chi}$ is compatible with being 1). The corresponding estimate of N, 1.48 (0.093), is high, and the corresponding estimate of η, 0.967 (0.14), is compatible with being one (the numbers in parentheses denote the standard errors). Aguas de Cabreiroa and Cantimpalos are most probably asymptotically independent. The estimate for η is 0.7 (0.12). For the submodel with fixed η = 1, N is estimated as 1.13 (0.17), which is also compatible with being one. In both cases, the empirical estimates of χ and $\bar{\chi}$ converge toward the estimates calculated from η and N as the threshold (x axis) gets larger. It is difficult to relate the FIs of different sets of stations to each other. When looking, for example, at the dependence between all three stations, three bivariate dependence measures and one trivariate dependence measure (indicating the dependence between all three stations in their joint tail) can be calculated. However, the latter cannot be used to infer the three bivariate dependence measures.

Figure 4.

(left) χ calculated from N estimates (black line, with 95% confidence bands) and an empirical estimate of χ (gray). (right) $\bar{\chi}$ calculated from η estimates (in black) and an empirical estimate of $\bar{\chi}$. The dependence of (a and b) stations Aguas de Cabreiroa (2978E) and Barxa (2970I) and (c and d) stations Aguas de Cabreiroa (2978E) and Cantimpalos (2199) is measured.

5. Severity of Extreme Dry Spells in the Duero Basin (MEVT Model)

[35] Average precipitation and dry spell severity in the Duero basin are depicted in Figure 5. The highest average precipitation amounts are given in the surrounding mountain range (Figure 5a). The most severe dry spells (on average over the whole time period) occur in the southeast of the basin center, in the crop lands of the Bajo Duero region (Figure 5b). This result is independent of the dry spell level and the season assessed. Accordingly, the (severe) dry spells with level 30.5 mm/month occur more frequently in the topographic depression in the basin center (Figure 5c). For comparison, a level of 42.7 mm/month has also been tested. The dry spells defined with this level happen more frequently in the mountain regions at the edges of the basin (Figure 5d).

Figure 5.

Maps of the dry spell characteristics. (a) Average yearly precipitation, (b) average dry spell severity for level 30.5 mm/month, and average dry spell numbers for (c) 30.5 mm/month and (d) 42.7 mm/month.

5.1. Bivariate Dependence

[36] For the evaluation of the dependence between any two stations in the Duero basin, the threshold for defining extreme deficits is set to the 50th percentile of whole-year precipitation deficits. For comparison, the evaluation is also performed separately for the irrigation period (May to October), for which the threshold has been set to include 100% of the precipitation deficits. Moreover, for those two time periods (whole year and irrigation period, with different thresholds), analyses are carried out for two levels (30.5 and 42.7 mm/month) used to define cumulative precipitation deficits.

[37] The FI values retrieved from fitting the bivariate extreme value model to any of the combinations of two stations in the Duero basin crop lands (cf. Figure 1b) are visualized in Figure 6. The gray dots denote the FI values. The gap between 0.8 and 1 is due to the shortness of the series, which does not allow for a sharp distinction between asymptotically dependent and independent data (0.2 is on average the standard deviation of the η estimates). The FIs measuring bivariate dependence decrease with distance in space. For the quite severe 30.5 mm/month level, the polynomial fit of order 3 (black line) reveals that the decay slows down for very distant stations. For this level, 70% of the stations are asymptotically independent, which is reduced to 60% for the 42.7 mm/month level: These less extreme and longer dry spells are more often asymptotically dependent. For both levels, the asymptotically independent data show a lower dependence-distance slope than the asymptotically dependent data. The distance-dependence relation is frequently exploited in geostatistics to simplify the description of dependence. However, here the FI shows a large variability over all distances.

Figure 6.

FI of bivariate assessment for the stations in the Duero basin crop lands (gray dots). (a) For 30.5 mm/month level and (b) for 42.7 mm/month level. A linear fit and polynomial fit of degree 3 with 68% confidence bands are added in black.

[38] To exemplify the spatial pattern of dependence of extreme dry spells in the Duero basin, maps of the dependence with station Castronuño are shown in Figure 7 (the red dot indicates the location of Castronuño). The FI values have been interpolated with inverse distance weighting. Castronuño lies in the middle of the Bajo Duero crop land region, which is affected by the most severe dry spells. For this station, strong dependence ( math formula) is spatially less extended for the irrigation period than for the whole year. However, in all cases, nearly the whole basin shows an FI > 0.5: The stations are not independent of Castronuño. The dependence of the more severe dry spells (Figures 7a and 7b) is more concentrated in the western part of the Duero basin than for the dry spells at the 42.7 mm/month level.
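Inverse distance weighting can be sketched as follows. This minimal Python version assumes planar coordinates and a distance exponent of 2, neither of which is specified in the text; the function name and the station values are hypothetical.

```python
import math

def idw(x, y, stations, power=2.0):
    """Inverse distance weighted value at (x, y); stations = [(xs, ys, value), ...]."""
    num = 0.0
    den = 0.0
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:                  # exactly at a station: return its value
            return value
        w = d ** -power               # closer stations receive larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical FI values at three station locations (coordinates in km).
fi_stations = [(0.0, 0.0, 1.4), (50.0, 0.0, 0.9), (0.0, 80.0, 0.6)]
print(round(idw(10.0, 10.0, fi_stations), 3))
```

The interpolated surface reproduces the station values exactly and is smoothed in between, so the maps should be read as a visual aid rather than as model-based estimates at unobserved locations.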

Figure 7.

Maps of the fragility index (FI) as a measure of bivariate dependence between Castronuño (red dot) and all other stations. The upper row shows results for the 30.5 mm/month level for (a) the whole year and (b) the irrigation period. The lower row shows the same for the 42.7 mm/month level for (c) the whole year and (d) the irrigation period.

[39] When looking at maps of other stations (results not shown), spatial patterns in the dependence structure become apparent as well: The FI decays with distance. Furthermore, some stations are clearly connected to the surrounding mountain area and others to the central plain, which shows the influence of topography. However, the spatial patterns are too diverse to deduce the dependence of the dry spell severity from elevation and spatial distance only. When looking at severity extremes of the whole year, larger areas are connected through strong dependence ( math formula) than in the irrigation period. This hints at a more diverse behavior of extremely severe dry spells in the irrigation period and at a reduced influence of large-scale patterns (the NAO, for example). A further climatological interpretation of the results would need an in-depth description and analysis of atmospheric circulation patterns and variability in time. This is beyond the scope of this paper; such analyses are presented, for example, in Vicente-Serrano [2006].

5.2. Dependence Between Crop Regions

[40] Here spatial patterns of dry spell severity will be explored in the center of the basin (see Figure 1b), where agriculture is the dominant land use practice. In the following, these regions are thus called crop regions. Watershed borders are used to separate the crop regions. In this way, the water courses and hydrological systems of the regions are separated. The series of dry spell severity of each region have been joined to a single time series. This series thus represents a dry spell happening anywhere in one of the regions. Dependence between the regions is assessed by analyzing these series. Here the threshold excess rates have been set to 20%.

[41] Results for strong bivariate dependence between the regions are shown in Figure 8. Regions exhibiting asymptotic dependence with an math formula are depicted in the same color. A connection of the eastern regions becomes apparent for the 30.5 mm/month level (Figure 8a). The crop land zone of Riaza-Duraton-Alto-Duero is asymptotically dependent with both neighboring regions, but the three regions together are not asymptotically dependent. Therefore, Riaza-Duraton-Alto-Duero is hatched in two colors. For this dry spell definition level, the results for the whole year and the irrigation period are the same. In Figure 8b, results for the 42.7 mm/month level and the irrigation period are depicted. Here, the southern regions exhibit strong bivariate dependence, and even all three southern regions together are asymptotically dependent with an math formula. The northern part is divided into two dependent zones. The same dependence structure is found for the whole year. However, here no trivariate asymptotic dependence with an math formula occurs. All in all, the regions are more connected when looking at the longer and less severe dry spells at the 42.7 mm/month level.

Figure 8.

Strong bivariate dependence ( math formula) between subbasins in the crop zones of the Duero basin for dry spells defined with (a) a level of 30.5 mm/month and (b) a level of 42.7 mm/month. Regions with similar dependence are hatched in the same color. Extremes in the region hatched in two colors are strongly dependent on extremes in both neighboring regions.

[42] In addition, the joint occurrence of dry spells in all six regions has been examined for the irrigation period and dry spells defined with the 42.7 mm/month level. Dry periods with 1 mm or less precipitation per day and station, which last longer than one month and which cover large areas, might cause severe damage to the agricultural sector. In extreme value analysis, the return period T = 1∕p of such an extreme event is commonly calculated as the reciprocal of the probability p that such an event occurs [Coles, 2001]. Different approaches can be used to estimate p and, thus, the length of the return period. In a first attempt, the characteristics of a structure variable X, which is defined as the sum of the dry spell severity time series of the six regions, are examined. A GPD is fitted to the extremes of this variable that exceed the threshold q, which is the sum of the 30.5 mm/month thresholds of the single stations [cf. de Haan and de Ronde, 1998]. The probability of an extreme event is, thus, math formula. The corresponding shape parameter estimate is negative, −0.33 (0.06). For this model, the return period for such a dry spell, with on average less than 1 mm of precipitation per day and station for the whole region of crop lands (cf. Figure 1b), is estimated to be 1.88 irrigation seasons, that is, about 2 years. However, here stations with a lot of precipitation can balance stations with little precipitation. This result can be further refined by using the multivariate extreme value model to describe the joint extremes of the six regions. The FI of the six regions is below 0.5, which indicates negative tail dependence. Nevertheless, there exist 20 joint extreme events, which allows for the examination of the joint tail. For this model, math formula is given, and the return period of a joint extreme event, where in every region precipitation falls on average per station below 30.5 mm/month, is 3.24 irrigation seasons. This return period is longer than the 1.88 irrigation seasons, because here precipitation in the different regions cannot counterbalance each other.
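The relation T = 1∕p and the tail probability of a fitted GPD can be illustrated with a short sketch. It uses the standard threshold-excess formula from Coles [2001]; the function names, the scale parameter, and the exceedance rate are hypothetical, and only the shape value −0.33 and the two return periods are taken from the text.

```python
import math

def gpd_tail_probability(x, u, sigma, xi, rate):
    """P(X > x) for x >= u under a GPD fitted to excesses over u.

    `rate` is the probability of exceeding u. For xi < 0 the
    distribution has the finite upper end point u - sigma / xi.
    """
    if abs(xi) < 1e-12:
        return rate * math.exp(-(x - u) / sigma)  # exponential limit
    z = 1.0 + xi * (x - u) / sigma
    if z <= 0.0:
        return 0.0  # x lies beyond the upper end point (xi < 0)
    return rate * z ** (-1.0 / xi)

def return_period(p):
    """Return period T = 1 / p, in units of the observation period."""
    return 1.0 / p

# Probabilities implied by the two return periods quoted in the text.
p_sum = 1.0 / 1.88    # structure-variable model, about 0.53 per season
p_joint = 1.0 / 3.24  # joint-tail model, about 0.31 per season

# Hypothetical GPD: shape from the text, scale and rate assumed.
p_example = gpd_tail_probability(x=1.0, u=0.0, sigma=1.0, xi=-0.33, rate=0.2)
t_example = return_period(p_example)
```

A negative shape parameter implies a bounded tail, so sufficiently large values of the structure variable receive probability zero and an infinite return period.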

[43] The MEVT model for the six regions also serves to estimate return periods of joint extreme events in subsets of these regions. The three southern crop land regions Bajo Duero, Cega-Eresma-Adaja, and Riaza-Duraton-Alto-Duero are highly dependent. They have an FI larger than 1.5 for dry spells in the irrigation period and at the 42.7 mm/month level (cf. Figure 8b). As expected, the return period for dry spells below 30.5 mm/month in solely these three regions is, at 3.12 irrigation periods, shorter than for extremely severe dry spells in less dependent regions. The regions Bajo Duero, Esla-Valderaduey, and Pisuerga-Arlanza, for example, have a small FI in the trivariate analysis. They are not asymptotically dependent. A simultaneous dry period in these three regions is expected every 3.19 irrigation periods. When fitting a trivariate extreme value model to the three southern regions only, that is, when imposing no constraint on the other three regions, the return period for precipitation deficits larger than 30.5 mm/month in these regions reduces to 2.37 irrigation periods. The different results may be used to tackle different water management problems. The use of the multivariate extreme value model serves in any case to refine the spatial analysis of extremal dependence.

6. Droughts in the Duero Basin Analyzed With the SPI

[44] By construction, the SPI inference procedure does not take any spatial dependence into account. To identify spatial regions with similar variability patterns, a principal component analysis (PCA) can be applied to the calculated SPI fields [see, e.g., Bonaccorso et al., 2003]. As a benchmark for our MEVT approach, we implemented this PCA technique on 3 month running mean deficits (SPI3) in the central plain of the Duero basin (see Figure 1b). To reduce high loadings on several PCs, which hampered the determination of a spatial pattern, a varimax rotation was applied to the loadings [von Storch and Zwiers, 1999], and the rule by North et al. [1982] was used to determine the number of principal components.

[45] The first PC, which explains more than 70% of the variance of the data (cf. Table 2), is similarly related to all stations and thus does not result in a spatial pattern (see Figure 9a). This reflects the findings of Vicente-Serrano [2006], who analyzes the SPI12 from stations of the whole Iberian Peninsula and finds similar variability for the whole center of the peninsula. The second and third PC result in a North-West to South-East and in a North-East to South-West gradient, respectively (see Figures 9b and 9c). Some parts of the crop lands, such as Esla-Valderaduey in the North, cannot be clearly assigned; they show positive loadings for PC2 and PC3. We thus applied an orthogonal varimax rotation to the most important PCs to get clearer spatial patterns [Bonaccorso et al., 2003]. North's rule [North et al., 1982] suggests retaining up to three PCs. When interpreting the scree diagram or concentrating on the PCs that explain more than 80% of the variance, only two PCs are kept. As the number of retained PCs changes the spatial patterns obtained from the varimax rotation, we interpret results from both rotations. When rotating two PCs, a North-West to South-East gradient becomes apparent. The first PC hints at a similar variability of droughts within the Esla-Valderaduey zone. The sign of the PC does not matter for the determination of regions with similar variance. We thus regard stations with high negative loadings also as connected. The second rotated PC indicates a connection of the subbasins Riaza-Duraton-Alto-Duero and Cega-Eresma-Adaja in the South-East (see Figures 9d and 9e). When rotating three PCs, the first PC hints again at a strong connectivity within the Esla-Valderaduey basin. The second PC now indicates a common variability in the southern basins, especially Bajo-Duero and Cega-Eresma-Adaja (cf. Figure 9g), whereas the third PC connects the North-East, namely Pisuerga-Arlanza and Riaza-Duraton-Alto-Duero. It is thus not clearly identifiable whether the subbasin Riaza-Duraton-Alto-Duero is rather connected to its North or to its South-West, which confirms the findings of the MEVT analysis (cf. Figure 8a). By construction, the rotated PCs explain similar amounts of variance, which is about 40% when two PCs are rotated and 28% for three PCs (see Table 2).
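The PCA and varimax steps can be sketched as follows. This is an illustration under stated assumptions, not the analysis behind Figure 9: the data are synthetic (two groups of three hypothetical stations), the function names are invented, and the rotation uses the common SVD-based varimax iteration.

```python
import numpy as np

def pca_loadings(data, n_keep):
    """Loadings and variance fractions of the leading PCs (rows = time steps)."""
    corr = np.corrcoef(data, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_keep]
    explained = eigval[order] / eigval.sum()
    return eigvec[:, order] * np.sqrt(eigval[order]), explained

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation via the standard SVD-based iteration."""
    p, k = loadings.shape
    rot = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        lam = loadings @ rot
        target = lam ** 3 - lam @ np.diag((lam ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rot = u @ vt  # product of orthogonal matrices, hence orthogonal
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
        crit_old = crit
    return loadings @ rot

# Synthetic "SPI3" series: two groups of three stations plus noise (assumed).
rng = np.random.default_rng(0)
n_time, n_station = 240, 6
group_a = rng.standard_normal((n_time, 1))
group_b = rng.standard_normal((n_time, 1))
noise = 0.5 * rng.standard_normal((n_time, n_station))
spi3 = np.hstack([group_a.repeat(3, axis=1), group_b.repeat(3, axis=1)]) + noise

loadings, explained = pca_loadings(spi3, n_keep=2)
rotated = varimax(loadings)
# Variance fraction carried by each rotated component.
rotated_share = (rotated ** 2).sum(axis=0) / n_station
```

Because the rotation is orthogonal, the retained PCs explain the same total variance before and after rotation; only its distribution across components changes, which is why the choice of how many PCs to rotate alters the resulting patterns.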

Table 2. Variance Contributions (%) of the First Four Unrotated PCs and of the Rotated PCs for SPI3 Data^a

Number PCs | SPI3 Variance | SPI3 Varimax (2 PC rot.) | SPI3 Varimax (3 PC rot.)
31.81 27.00

^a Rotations have been performed with the first two or three PCs.

Figure 9.

Loading patterns of the first three unrotated principal components of the SPI3 data (top, Figures 9a–9c). Figures 9d and 9e show the loading patterns after a rotation of the first two PCs, and Figures 9f–9h show the loading patterns after a rotation of the first three PCs.

[46] Comparable results have been obtained when analyzing the SPI derived from monthly precipitation and from running means of 6 months of precipitation (results not shown). The spatial study by means of SPI and PCA illustrates the dependence structure of droughts in the Duero basin. However, the decision on the number of PCs to retain and the classification of the loading values into distinct spatial regions leave some ambiguity. In substance, the results support the findings of the MEVT study in section 5.2.

7. Conclusions

[47] Precipitation deficits in the Duero basin and their spatial dependence have been assessed. Dry periods are a frequent phenomenon in the Duero basin.

[48] A multivariate extreme value model is applied, which captures the dependence structure of extreme severity of dry spells (asymptotically dependent as well as asymptotically independent extremes). Here cumulative precipitation deficits below 42.7 mm/month and 30.5 mm/month have been assessed. In the Duero basin, such dry spells occur between one and three times a year, and they have a length between 2 and 3 months on average. These dry spells emerge during the whole year, but they are more intense in the irrigation period. The most severe dry spells (on average over the whole time period) occur in the Bajo Duero, which is situated in the southeast of the Duero Basin.

[49] The MEVT allows for the assessment of bivariate dependence. The estimated dependence between extreme severity of dry spells at each pair of stations has been visualized in dependence maps, where the dependence of dry spells at a single station with dry spells at all other stations in the region is depicted. It is found that up to 30% of the bivariate dependence measures indicate asymptotic dependence. Thus, dry spells in this basin are strongly connected. The dependence between dry spells at the 42.7 mm/month level is in general spatially more extensive. It became apparent that topography and spatial distance influence the extremal dependence between dry spells. However, no simple law that describes the influence of topography and spatial distance could be deduced. This also showed in a dependence-distance study: As expected, the extremal dependence decreases with distance. However, its large variability hampered an approach to deduce a simple correlation function. Thus, the presented dependence maps are a valuable complement to risk maps, where solely the probability of dry spell occurrence is depicted.

[50] Moreover, the stochastic model has been employed to describe the dependence between six regions in the center of the Duero basin where most of the agricultural activities take place. Bivariate to trivariate dependence between these regions is found. In the irrigation period, the shorter and more severe dry spells (defined at the 30.5 mm/month level) exhibit strong asymptotic dependence ( math formula) in the eastern regions, whereas less severe dry spells at the 42.7 mm/month level are more connected in the South.

[51] These findings are supplemented with a drought assessment of the crop zones by means of the common SPI. It shows that subregions with similar variability can be identified. Esla-Valderaduey in the North-East contains highly connected stations. Furthermore, the subbasins in the South and North-West are connected. Riaza-Duraton-Alto-Duero is connected either with its North or with its South-East.

[52] In summary, the SPI analysis results are well in line with the MEVT findings. They also indicate regions of similar variability in the South and in the North-East, and bipolar characteristics of Riaza-Duraton-Alto-Duero. With respect to the methodology, several similarities and differences between the SPI and MEVT approach arise. The SPI is calculated from running means of precipitation. Droughts of the respective window length, e.g., 1, 3, or 6 months, are thus in focus. Averaging reduces the severity of droughts that are shorter than the window length, and longer droughts are split into several events. The comparison of SPI1, SPI3, and SPIs with a wider window width might be necessary to get a complete overview of drought characteristics in a region. Cumulative precipitation deficits can also be defined with different precipitation levels. These levels are of the same kind as the classification levels that classify the SPI into moderate, severe, and extreme droughts. However, the assignment of one event to each dry spell, whatever length it has, allows for a joint assessment by means of MEVT.

[53] Beyond this, the dependence between stations or regions can be quantified with the MEVT framework by means of the FI. Corresponding confidence bands are provided, which allow for uncertainty assessment. Moreover, the MEVT model allows inference about extreme events that have not yet been observed. This includes the estimation of return periods for extreme dry spell severity in a region. The return period for a large-area dry spell in the crop lands of the Duero basin, with precipitation being on average below 30.5 mm/month in all six subregions, is about 3 years.

[54] The spatial patterns of dry spells are usually complex. It is common for one area to suffer dry conditions, whilst neighboring areas experience normal or even humid conditions. The presented analyses assess dependence at station and subbasin level; thus, more of the spatial heterogeneity of dry periods is captured. However, the presented MEVT approach analyzes joint extremes; thus, the number of analyzable entities is restricted. The extension to the assessment of joint dependence between all stations is envisaged in further work. One way to achieve this goal would be the use of a spatially inhomogeneous dependence measure.

[55] Nonirrigated agriculture is a common practice in the Duero basin. However, average yearly precipitation amounts in this region are close to levels that might cause yield losses. The anticipated future decrease of precipitation [Vicente-Serrano et al., 2011] hints at an aggravation of dry periods in the Duero basin. In addition, temperature is expected to increase and runoff supply to decrease (due to revegetation processes in the mountain areas that surround the Duero basin). An increasing water demand of the population in the center of the Duero basin is anticipated as well. Thus, a greater social and economic vulnerability to dry spells is to be expected [Vicente-Serrano, 2006]. The presented approach may be used for short-term water management planning to face this situation. Up to now, dry periods in the Duero basin have been analyzed mainly with respect to their temporal evolution [see, e.g., MARM, 2007]. However, the temporal drivers for dry periods are not well determined. A probabilistic view and the provision of maps of dry spell probability and dependence thus provide valuable additional information for water management.

Appendix A: η-Asymmetric Logistic Model and the FI

[56] There are infinitely many ways to define a dependence measure math formula for multivariate extremes. We use the η-asymmetric logistic model and define such a measure as presented in Ramos and Ledford [2011]. In the following, this measure will be denoted math formula. The corresponding measure density for multivariate data with dimension d is

display math(A1)

with parameters math formula and math formula. math formula holds, and math formula is math formula. Here B represents the set of all nonempty subsets of math formula and math formula is the number of elements in the set b. The constraints math formula and math formula hold. They determine math formula and math formula. The parameters influence the characteristics of the multivariate extreme value distribution: The limit function of math formula, is concave in ω when math formula, and it is convex in ω when math formula. When math formula, then math formula is flat and thus ray independent. math formula is a measure of symmetry between two variables, e.g., X1 and X2. For math formula, these variables are symmetric.

display math(A2)

holds, with math formula being the number of jointly occurring extreme events, i.e., counting the number of events of the type math formula. Informally, N can be described as 1 (the extreme which has already occurred) + P(a joint extreme event occurs)/P(any extreme event occurs).

[57] For the asymmetric logistic dependence function, we can write

display math(A3)

[58] It is also possible to derive N for a multivariate asymmetric logistic dependence function

display math(A4)

[59] Confidence bands for N can be derived from the parameter estimates and their covariance by using the delta method [cf. Coles, 2001].
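The delta method can be illustrated with a short sketch. It approximates the variance of a function g of the parameter estimates by the quadratic form of its gradient with the parameter covariance. The function `g` below is a made-up placeholder and not the FI expression of the model, the gradient is taken numerically, and all values are hypothetical.

```python
import math

def delta_method_se(g, theta, cov, eps=1e-6):
    """Standard error of g(theta) from Var[g] ~ grad(g)^T * cov * grad(g)."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((g(up) - g(down)) / (2.0 * eps))  # central difference
    var = sum(
        grad[i] * cov[i][j] * grad[j]
        for i in range(len(theta))
        for j in range(len(theta))
    )
    return math.sqrt(var)

# Placeholder function of two parameters (NOT the FI formula).
def g(theta):
    return theta[0] * theta[1]

theta_hat = [2.0, 3.0]                 # hypothetical estimates
cov_hat = [[0.04, 0.0], [0.0, 0.01]]   # hypothetical covariance matrix
se = delta_method_se(g, theta_hat, cov_hat)

# Under a normal approximation, estimate +/- one standard error
# gives roughly a 68% band.
band = (g(theta_hat) - se, g(theta_hat) + se)
```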

Appendix B: Shifting of Dry Spells

[60] Overlapping dry spells are assumed to be dependent. The presented approach does not model the duration of dry spells. Thus, the data are preprocessed to integrate this assumption: Overlapping dry spells are shifted to a common time point. Extremes are defined as threshold excesses. Here the actual time point of occurrence of an extreme is not modeled and thus the shifting does not alter the results of the analysis of single series.

[61] Comparison: To shift the dry spells, they are compared in descending order, i.e., station 1 is compared with stations 2 to d, where d is the number of stations. Let station 1 be the principal station and stations 2 to d be the comparison stations. The comparison is not repeated, so station 2 is compared with stations 3 to d, and so forth.

[62] Eligibility: For each dry spell i of the principal station, dry spells of the comparison stations are only eligible for a shift if they occur during the time period of dry spell i and if they have their longest overlap with dry spell i and not with some other dry spell j of the principal station. Furthermore, they must not have been shifted previously.

[63] New time point: The time point t within the period of dry spell i for which the cumulative dry spell lengths of all eligible dry spells are the highest, is chosen as new time point. If there are several such time points, the time point with the largest number of overlapping dry spells is chosen. Dry spell i and all eligible dry spells, which also cover the new time point, are shifted to the new time point t.
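The choice of the new time point can be sketched as follows. This covers only this step for one principal dry spell and assumes the eligibility check has already been done. Spells are represented as hypothetical (start, end) pairs of inclusive month indices, "cumulative dry spell lengths" is read as the summed lengths of the eligible spells covering a time point, and remaining ties are resolved in favor of the earliest time point, which is an assumption not stated in the text.

```python
def new_time_point(principal, eligible):
    """Pick the common time point for one principal dry spell.

    principal: (start, end) of dry spell i, inclusive month indices.
    eligible:  list of (start, end) spells already judged eligible.
    Returns the chosen time point and the eligible spells covering it.
    """
    start, end = principal
    best_key = None
    best_t = None
    for t in range(start, end + 1):
        covering = [s for s in eligible if s[0] <= t <= s[1]]
        # Primary criterion: summed lengths; tie-break: number of spells.
        key = (sum(e - b + 1 for b, e in covering), len(covering))
        if best_key is None or key > best_key:  # strict: earliest t wins ties
            best_key, best_t = key, t
    shifted = [s for s in eligible if s[0] <= best_t <= s[1]]
    return best_t, shifted

# Hypothetical example: principal spell in months 3..6.
t_new, shifted = new_time_point((3, 6), [(2, 4), (4, 5), (6, 8)])
```

In this example month 4 is chosen because two eligible spells with a combined length of five months cover it; the third spell does not cover month 4 and is left in place.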

[64] The result of the shifting algorithm depends on the (arbitrary) indexing of the stations. To avoid a bias of the results due to the shifting algorithm, it is repeated in reverse order. Here, the principal station is station d, and it is compared to stations 1 to (d – 1). Then, the principal station (d – 1) is compared to stations 1 to (d – 2), and so forth. For illustration, see Figure 10. Here durations for dry spells defined at the 42.7 mm/month level and the whole year are depicted. A total of 20 stations have been selected randomly from the 491 available stations, and the time period January 1975 to December 1978 has been chosen for illustration. Small differences between the shifting and reverse shifting become apparent.

Figure 10.

Dry spell durations for January 1975 to December 1978 for 20 randomly chosen stations (the labels of the y axis are the station IDs) are depicted as black lines and small black dots in case the duration is 1 month. Furthermore, the time points of the shifted dry spells are marked as gray dots for the shifting algorithm and as black circles for the reversed shifting algorithm.

[65] When assessing two stations, shifting is not problematic: Results with standard shifting and reversed shifting are always the same. To eliminate the influence of the shifting algorithm for more stations, the results obtained with both algorithms are compared and only common results are kept, that is, FI estimates whose standard deviations overlap.
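The comparison of results from standard and reversed shifting can be sketched as follows. The overlap criterion is read here as an overlap of the intervals estimate ± standard deviation, which is an interpretation; the function names, dictionary keys, and numbers are hypothetical.

```python
def intervals_overlap(est_a, sd_a, est_b, sd_b):
    """True if [est_a - sd_a, est_a + sd_a] and [est_b - sd_b, est_b + sd_b] overlap."""
    return est_a - sd_a <= est_b + sd_b and est_b - sd_b <= est_a + sd_a

def keep_common(forward, reverse):
    """Keep station groups whose FI estimates agree under both shifting orders.

    forward, reverse: dict mapping a group label to (fi_estimate, sd).
    Returns a dict with both estimates for each retained group.
    """
    common = {}
    for key, (est_f, sd_f) in forward.items():
        if key in reverse:
            est_r, sd_r = reverse[key]
            if intervals_overlap(est_f, sd_f, est_r, sd_r):
                common[key] = ((est_f, sd_f), (est_r, sd_r))
    return common

# Hypothetical FI estimates for two groups of three regions.
forward = {"A-B-C": (1.60, 0.20), "A-D-E": (1.10, 0.10)}
reverse = {"A-B-C": (1.45, 0.15), "A-D-E": (1.50, 0.10)}
common = keep_common(forward, reverse)
```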

Appendix C: Maximum Likelihood Estimation

[66] The Poisson process model is used, so it is assumed that the extremes in the tail region A occur independently from each other [Beirlant et al., 2004]. Let math formula denote the region above thresholds math formula. The likelihood for the Poisson process is modeled as

display math(C1)

[67] Thus, m events occur in the joint tail of d dry spell severity time series at d stations. Here solely the joint tail is examined, so the probability of the occurrence of exactly m extremes in A is set to 1. The estimates math formula are obtained by numerical optimization. Due to the specificities of math formula, the equation differs slightly from the result for the classic EVT model. For this metric, the radius r cannot be neglected in the likelihood equation: It is needed to estimate η. However, r and math formula can still be divided into separate factors.

[68] The likelihood function is given by

display math(C2)
display math(C3)

[69]  math formula holds for equal thresholds math formula.

[70] As initial values α = 0.65 and math formula are chosen [Ramos and Ledford, 2009]. The initial value for η is obtained by means of the structure variable math formula: The shape parameter of the distribution of the excesses of Ti over a high threshold is taken as the initial value [Ledford and Tawn, 1996].

[71] The maximum likelihood estimation is only performed if 20 or more extremes occur in the joint tail.


[72] This work has been realized within the EU Marie Curie COFUND program AMAROUT (PEOPLE-2007-2-3.COFUND). We thank this project for its support. Furthermore, we are very grateful to J. C. González Hidalgo for the uncomplicated and rapid provision of the MOPREDA database. Part of this work has been supported by the EU-FP7 ACQWA Project, by the PEPER-GIS project, by the ANR-MOPERA project, by the ANR-McSim project, and by the MIRACCLE-GICC project.