Modelling of bivariate meteorological drought analysis in Lake Urmia Basin using Archimedean copula functions

The Urmia Lake Basin as the world's second largest salt lake has experienced severe drought during recent years. The purpose of this study is to analyse the bivariate characteristics of drought (i.e., duration and severity) using two indices including SPI (standard precipitation index) and SPImod (modified SPI) associated with copula functions. For this purpose, rainfall data of six stations were used for the period of 1971–2017. At first, the characteristics of drought were extracted using the two indices. Then, through coding in the MATLAB software environment, eight families of Archimedean copula functions were applied. The simultaneous return period and conditional and Kendall returns were also investigated. The result showed that the Joe copula function was the best predictor for the analysis of both intensity and duration of drought for the study area. The correlation coefficients of Spearman, linear correlation and Tau Kendall computed for the SPI of stations were >0.65, >0.72 and >0.48, respectively, while all of them were significant. At a given critical probability level, t, the value of the Kendall return period was much greater than the standard return period, so that this difference increased with increasing t value. The results obtained from the time series of indices indicated that at least 40% of the months were dry, and the severity of droughts in the Urmia station was much higher than other stations during the studied period. Moreover, SPImod to a large extent eliminates the disadvantages of conventional SPI and takes into account seasonal variations of precipitation in the calculation of SPI.


| INTRODUCTION
Drought is known as a natural phenomenon that depending on its characteristics and consequences can be studied in four categories including meteorological, agricultural, hydrological and socio-economic schemes (Dracup et al., 1980a(Dracup et al., , 1980bHeim, 2002;Xu et al., 2015). Planning and improving the management of water resources systems under drought conditions requires estimating the period of combined and conditional drought returns. Typically, simultaneous distributions are obtained analytically by assuming that drought variables are independent random variables or follow the same univariate distribution and have explicit multivariate shapes (such as normal multivariate, exponential multivariate and gamma multivariate). However, these two conditions are not met in most cases because drought variables are highly correlated and may follow different univariate distributions (Wong et al., 2013).
While analysing two or more variables, copula functions incorporate the correlations between them in the calculations. These functions were first proposed by Sklar (1959) to create multivariate distributions. In hydrological studies, copula functions were first used by De Michele and Salvadori (2003) to create a twovariable distribution of storm intensity and duration. Mirabbasi et al. (2012) analysed drought bivariate using copula functions in order to study the severity and continuity of meteorological drought events at Sharafkhaneh station, Iran. They found that the Galambos copula function was the most suitable copula for the station. Amirataee et al. (2018) developed the distribution of the joint probabilities of drought intensity and the percentage of area affected by drought based on the copula functions in the catchment area of Urmia Lake in Iran and used seven different families of copula functions. Ayantobo et al. (2018) also used several copula functions to derive regional and spatial models of drought risk assessment in Mainland China for the period of 1961-2013. Pontes Filho et al. (2019 studied SPI (standard precipitation index) based on copula functions and monitored the probability of consecutive droughts in Portugal.
In the continuation of the research, various studies on droughts and indicators that have been done in the world are presented (e.g., Borji et al., 2016;Bouabdelli et al., 2020;Choubin et al., 2016;Danandeh Mehr et al., 2020;Salvadori & De Michele, 2015;Wang et al., 2020). Achour et al. (2020) performed the timespatial analysis and drought forecasting in the plains of northwestern Algeria using SPI. They used the artificial neural network (ANN) model from different scales of the SPI. The results of their research indicated that the frequency of drought varied depending on time scale. Das et al. (2020) applied the copula function approach using the SPI to assess drought characteristics and climatic indicators in the Himalayan states in India.
Mohseni Saravi et al. (2009) studied intensity-duration-frequency (IDF) and spatial analysis of droughts using the standardized precipitation index for Karun River basin located in Iran. For this purpose, they used standard 3-, 6-and 12-month scales of precipitation index. Choubin et al. (2014) predicted drought in the Maharloo Bakhtegan semi-arid watershed in southeastern Iran using climatic signals and SPI in a fuzzy neural modelling framework. The results of their study showed that Atlantic Ocean surface temperature was inversely related to SPI. Kazemzadeh and Malekian (2016) studied the spatial and temporal characteristics of meteorological and hydrological drought in northwestern Iran and showed that most of the drought events in the base periods in the last 15 years were from 1995-1996-2011. In addition, the driest year was 2007-2008 to the meteorological drought index. Malekian and Kazemzadeh (2016) studied the spatio-temporal analysis of regional trends and the changes of self-correlated temperature series changes in the Urmia Lake Basin located in northwestern Iran. The results of Ljung-Box test showed a positive series correlation between mean temperature (Tmean) and maximum temperature (Tmax) for all stations at a significant level of 0.05. In the monthly series, the significant warm-up trend in the Tmean series was more noticeable than in the Tmax series. However, the Tmax trend was found to be greater than the Tmin series.
The Urmia Lake Basin as the largest interior lake in Iran has experienced severe drought during recent years. The water level in the Lake has decreased dramatically in recent years due to drought, climate change and the overuse of water resources. Up to date, no study has been done in this basin using the copula functions joint to SPI and SPI mod for the analysis of drought. Therefore, the purpose of this study is to evaluate the bivariate analysis of drought in the basin of Urmia Lake located in northwestern Iran using two indices of SPI and SPI mod (modified SPI) by eight categories of copula functions: Gumble Huagard, Galambos, Frank, Clyton, Placket, Farlie Gumble Morgan Stern, Ali Mikhail Haq and Joe. For this purpose, rainfall data of six synoptic stations were used for the statistical period of 1971-2017. The coding at various stages of this research was done in the MATLAB programming environment. The findings of this study will enhance our knowledge on drought vulnerability and drought impact assessment in the Urmia Lake Basin for improving the management of water resources under vulnerable conditions. Moreover, Joint and conditional return periods of drought characteristics (severity and duration) will provide useful information for drought risk assessment and analysis to managers, water resources engineers, and can design criteria for water reservoirs.

| Study area
This study was done in the Urmia Lake Basin, which is located between 37 4 0 to 38 17 0 latitudes and 45 13 0 to 46 longitudes in northwest Iran and covers an area of 51,800 km 2 ( Figure 1). The lake is between two provinces of West Azarbaijan and East Azerbaijan, with an average height of 1276 m above sea level. The study of fluctuations in the water level of lakes in terms of importance, nature and location of these aquifers has become particularly important in recent years. The Urmia Lake is the largest inland lake in Iran. There are 16 wetlands around the Lake with areas ranging from 5 to 120 ha (some of which have dried up), most of them have fresh or salty water and have a high ecosystem value. Six synoptic stations were selected to acquire rainfall database for the region. Table 1 shows the general characteristics of the stations studied.

| SPI and SPI mod indices
SPI is an index that depends on the probability of precipitation for each time and scale. This index is comparable for different time scales, and based on it, the initial warning for monitoring drought and its severity can be evaluated and calculated using the following equation (McKee et al., 1993): where SPI is the standardized precipitation index, P i is the normalized data, P is the mean of normalized data in the desired period and σ is the standard deviation of normalized data in the desired period. If X w represents the total precipitation over a period of W months, by fitting the two-parameter gamma distribution to the time series X w , the cumulative distribution function or marginal CDF u w = F Xw ( x w) is obtained then the SPI, like any observation, is calculated with a normal inverse function or ϕ À1 u w ð Þ. Although the SPI calculation method seems logical, it still has its drawbacks. For example, the SPI in the annual rainfall regime cannot calculate seasonal variations. Therefore, Kao and Govindaraju (2010) suggested that the accumulated precipitation for a given time window be grouped X w (t) based on the final month (Kao & Govindaraju, 2010). On this basis, in the modified SPI (SPI mod ), X w (t) changes to X m w y ð Þ, where m represents the final month of the time window w (m = 1 (Jan) … m = 12 (Dec) and y is the index of the year, where y = 1, 2, …, k (k is the total number of years) as presented in Equation (2): where, g is year index and m is month index (month number) and t is time index and is stated as Þþm. By fitting the inverse distribution for each group means, u m w ¼ F x m w X m w À Á , w ¼ 1, 2, …, 12, m ¼ 1 Jan ð Þ … 12 Jan ð Þ SPI mod can be calculated through conversion of u m w to standard normal variable as follows: where φ À1 u m w À Á is the inverse function of the standard normal distribution with mean zero and standard deviation of 1.

| Theory of copula functions
Copula functions have been of interest to researchers in modelling (non-linear) multivariate relationships in hydrological and meteorological studies such as multivariate frequency analysis, risk assessment, drought modelling, floods and geostatistical interpolations (Schweizer & Wolff, 1981). Copula was used in hydrology because of its flexibility, since it is based on a nonlinear relationship between variables and connects both distribution and marginal functions (Joe, 1997;Schweizer & Wolff, 1981). In this study, copula function was used for the multivariate analysis of drought in the Urmia Lake Basin. The following is a brief description of the distribution functions and the copula functions used in the research.

| Sklar theory
The first application of copula function has been attributed to Sklar (1959). In the Sklar theory, 1D distribution functions can be combined in the form of multivariate distributions (Nelsen, 2006). For continuous random vari-

as follows in Equation 4
: where u j is the cumulative distribution function for the variable j m and H X 1 ,…,X d x 1 , …x d ð Þ is the same CDF with d variables for a set of variables X 1 , …, X d f g . When copula is used to create a bivariate distribution, it is very easy to evaluate the combined probabilities. The combined distribution function defined for the two variables in the following equation indicates the probability that both X and Y are less than or equal to x and y. In this case, the simultaneous probability of them is as Equation (5) (Nelsen, 2006): Because different combinations of F X x ð Þ, F Y y ð Þ can lead to the same value of F X,Y x, y ð Þ, a simple and useful way to present the simultaneous probability F X,Y x,y ð Þ is drawing equal curves of equal probabilities. In practice, the probability of both x and y exceeding their respective thresholds can also be defined in copula as follows (Shiau, 2006): Now the return period is for intensity (S) and duration (D) and the return period of Kendall is in the form of Equations (7) and (8): where E L ð Þ is the average time between the onsets of two consecutive droughts and t∈I is a critical threshold that is defined based on the following equation (Salvadori & De Michele, 2010): where inf : f g is infimum or is the largest lower bound of the set. This is clear because K C is a distribution function, since the condition for using copula functions to analyse multivariate is the correlation between variables  (Nelsen, 2006). Therefore, in this study, the correlation equations of Spearman (Equation 10), Pearson (Equation 11) and Tau Kendall (Equation 12) were used as suggested by Nelsen (2006) as follows respectively: r where d i is the difference between the data ranks, C is the number of coordinated pairs and D is the number of uncoordinated pairs in which x and y, respectively, are the mean of the variables of X and Y, E : ½ is mathematical expectation and Std : ½ is the standard deviation of the events.

| Determining the parameters of the copula functions
In order to determine the copula function parameters, different methods have been proposed. In the present study, the root mean of square error (Equation 13), the Nash-Sutcliffe coefficient (Equation 14) and the Akaike criterion (Equation 15) and the maximum likelihood method are used as follows. For The maximum proof reading of the copula function parameter is estimated based on Equation (16) (Abdi et al., 2017;Akaike, 1974;Ayantobo et al., 2019;Nash & Sutcliffe, 1970): where n is the sample size, C p is the calculated values of the parametric copula, C e is the observational values of the probability obtained from the experimental copula, C e is observational values of probability obtained from the experimental copula, lnML the maximum value of the logarithm function and k is the number of fit parameters.

| SPI and SPI mod time series
Hydrological data are usually obtained from a random process. The randomness of the data allows us to analyse it in probabilistic ways. To use possible methods for data analysis, we must make sure that the data are random. For this reason, using the Run test method, the randomness test of precipitation data for the studied stations was investigated.
In order to investigate the presence of trend in statistical characteristics (such as mean, variance) of rainfall, static test (trend) of Spearman was used. The Spearman method is the most suitable method for static study of hydrological data. In the case of hydrology, the randomness of time series generally means that these data are the result of a series of natural phenomena. Run test is used to check the randomness of rainfall data. The results are presented in Table 3 for all stations indicating that the statistical data of all selected stations are random. According to Table 3, the results obtained for the Run test show that precipitation data in all stations are within the allowable range in terms of randomness testing. In Spearman test, the correlation between the rank of the events is considered as the basis. The time series of SPI mod and SPI for the Urmia station at monthly scale are shown in Figure 2. The comparison of time series between SPI and SPI mod shows that for Urmia station and the study area, 40% of the number of months was dry. The comparison indicates that the range of changes in SPI is greater than in SPI mod , so that the most severe drought according to SPI is À6.07 while for SPI mod this value is À2.76. This can be attributed to the way in which the gamma distribution function is fitted with the precipitation data in the SPI and SPI mod indices. Because in the SPI, only one pair of parameter gamma distribution coefficient parameters (alpha and beta) is fitted on the data for the whole statistical period is adopted, whereas according to SPI mod , this fitting is done separately for each month during the statistical period. Thus, for each month, one pair of gamma distributions is obtained for precipitation data. However, in the SPI, one pair is obtained for the whole statistical period, which in turn makes the amplitude of changes in the SPI mod greater than the SPI. The results of time series analysis for the studied stations on average showed that 45% of the study area had dry months (Table 4).

| Evaluation result of the SPI and SPI mod indices
In this study, the characteristics of drought are evaluated using the SPI and SPI mod indices. The parameters of the fit distribution functions on the drought characteristics were extracted, and the correlation between the characteristics was examined with three criteria of Pearson, Kendall τ and Spearman ρ correlation coefficient. The goodness of fit criteria were then used to determine the copula functions. The results of this section are given in Table 5. The results of Nash-Sutcliffe coefficient for multivariate analysis for all stations showed at least 0.998 values for NSE, indicating the good performance of copula functions in the analysis. In other words, events similar to historical data or more or less of them are likely to occur in future periods. Similar result was found for the values of the maximum likelihood function, RMSE, AIC for different copula-based. Accordingly, any copula function that has the lowest mean squared error and criterion has the highest maximum likelihood (ML) and is known as the superior copula function. On the other hand, any function whose Nash-Sutcliffe efficiency (NSE) coefficient is close to one is known as superior. The results obtained from the goodness of fit criteria showed that the superior copula function over the fitted rainfall data is the prediction of the Joe copula function, since for most stations the optimal parameter value of this copula function was 4.7. Also, the results of statistically superior distributions compared with intensity and duration data revealed that beta and exponential distribution functions were the best, respectively. The results of correlation coefficients showed that the highest correlation coefficients were related to Pearson correlation coefficient followed by Spearman and Kendall correlation coefficients. Moreover, as the condition for the application of copula functions is the existence of correlation between the selected variables; therefore, the results obtained through the correlation coefficients confirm this finding.
T A B L E 2 Relationships of Archimedean copula functions (Nelsen, 2006)  The copula theoretical and experimental values of the coincident on drought intensity and duration data for Urmia station are shown in Figure 3a. As is seen, the points are very close to the 45 angle bisector line (Line 1:1), indicating that the fitted Joe copula values have an acceptable agreement with the values obtained from the empirical copula. Also, the changes of the ML logarithm function against their dependence parameter are shown in Figure 3b. The value of the dependency parameter of the copula function for Urmia station is equal to 4.72 and its ML is 82.98.

Parameter domain
Then, the probabilities and their return period and other characteristics related to droughts were calculated by creating bivariate distribution functions. The results of this section indicated that depending on the copula function and due to the limited nature of different events, it is possible that the appropriate copula function differs among the stations. Also, the values of the studied indices are different from each other, probably due to the coincidence of statistical distributions on rainfall data to extract drought characteristics.

| Evaluation of the joint probabilities
In this study, by replacing the values of the density distribution of the severity margins and duration of drought in the selected copula function, the simultaneous probabilities were calculated. Since different combinations of duration and severity of drought may result in the same F I G U R E 2 SPI and SPI mod time series obtained for Urmia station  T A B L E 4 The percentages of wet and dry months for each station based on SPI mod probability, the simultaneous probabilities are represented by counter lines. Figure 4a shows the counter lines of the probabilities of both the severity and duration of drought at the Urmia station using a Joe superior function and observational values. From a practical point of view, it is possible that both the severity and duration of the drought are simultaneously higher than certain threshold levels, providing useful information for managers and planners to improve water resource management under drought conditions. For example, in Figure 4a, the probability of occurrence of drought with a duration of more than 6 months, that is, D d < 6 months, and intensities of D d < 6 at Urmia station is about 0.5. Also, a 3D representation of the probability of both intensity and duration of drought based on their superior copula function is shown in Figure 4b for Urmia station. Therefore, awareness of the occurrence of drought events can be useful information for planners in various management areas such as water resources, agriculture, environment, hydroelectric, economic and social, as well as preparedness for the destructive risks of drought. It was also found that lower levels of probability are associated with smaller values of the characteristics, while with increasing probability, the values of the F I G U R E 3 Fitted theoretical and empirical copula values of intensity and dry season data and the optimal parameter values obtained based on the copula Joe function for Urmia station characteristics also increase as is shown in Figure 4b. From mathematical point of view, this finding confirms the correct operation of copula functions in drought analysis. For instance, as mentioned above in drought time series for Urmia station, when the duration of drought was longer than 6 months and the intensity was more than 6, the probability of their simultaneous occurrence was equal to 0.5. These diagrams also can be used for the analysis of other drought characteristics such as returns period, joint and conditional probabilities. Given the above mention explanations, it is possible that the intensity values of the drought duration will be more than one threshold level at the same time, providing useful information to water resources managers, to improve water resources management, farmers, the environment and other water-related areas in a state of drought. Figure 5 shows the conditional probability of severe drought in a mode where its duration exceeds a certain threshold, d, for Urmia stations. For Urmia station, the probability of occurrence of droughts for intensities less than 4 and 6, and for the duration of drought more than 3 months is 0.55 and 0.57, respectively. The result also shows that as the duration of droughts decreases, the probability of their occurrence in different intensities increases, so that the probability of occurrence for droughts with a duration of 1 month is the highest, but the probability decreases for droughts lasting 6 months or more. This, in turn, is debatable, the reason for which should be sought in the probabilistic topics of mathematics in hydrological studies. In this case, the view of the F I G U R E 4 Contour map of drought intensity and duration (right) and the 3D (intensity-duration-probability) diagram (left) provided for Urmia station F I G U R E 5 Conditional probability of severe drought in a mode where its duration exceeds a certain threshold, d, for Urmia stations subject will be discrete and a sample space in which any size, the smaller the port of the fraction, the higher the probability of its occurrence. Hence, according to the science of probability of their occurrence, the shorter periods of drought are much longer than the other periods, since meteorological and hydrological phenomena are periodical and there is a possibility of their occurrence in future periods. This would be applicable if we want to save a water reservoir in an area so that this reservoir can provide downstream water storage during water shortages conditions for a variety of drinking, agricultural, environmental, recreational, hydroelectric and industrial conditions. These curves, therefore, provide useful and valuable information to water resource managers and planners.

| Conditional return period evaluation
To analyse the period of drought return in the conditional state, two states can be considered: the first state is when for a certain threshold of the drought period, its intensity value exceeds a certain limit (T DjS > s ); the second case is when for a certain threshold of drought intensity, its duration exceeds a certain limit (T S D ≥ d j ). As is shown in Figure 6a for Urmia station, when intensity is more than 5 (D s < 5) and drought period is more than 4 months (D d < 4), the return period for such conditions is about 50 years, then a water supply reservoir cannot provide enough water for drought conditions. The results for other values of intensity and duration against the return period and vice versa are presented in Figure 6b. In general, it is observed that with increasing intensity and duration of drought, the return period increases as well. In addition to our knowledge and information on water and agricultural resource systems, these diagrams are also used in rangeland or watershed management systems before and after the construction of any flood distribution system.

| Evaluation of Kendall return period
According to studies, the use of the standard return period definition in various studies in hydrology and water resources estimates the amount of return period less than its actual value. To solve this problem, the Kendall return period was calculated based on the Kendall distribution functions (Ayantobo et al., 2018;Mirabbasi et al., 2012;Salvadori & De Michele, 2010), accordingly the two return periods of standard and Kendall were compared for Urmia stations as shown in Figure 7. This diagram is plotted for the critical probability levels (t∈I) of the station. Due to the small number of observational data, the long return period was not estimated based on the Kendall return period. In fact, the fracture in the Kendall return period is due to the small number of observational data. According to Figure 7, the difference between the return period of Kendall and the standard return period for Urmia station is 10 years, while the standard return period for the same probability level is 4.5, respectively. It is revealed that with increasing time, the difference between the Kendall return period and the standard return period increases. In this context, Durante and Salvadori (2010) and Salvadori et al. (2011) reported that the Kendall return periods were significantly different from simultaneous return periods. Such differences of the return period will lead to errors in drought risk analysis. However, using the definition of Kendall return period, the results of risk analysis are more accurate for two-variable cases.

| CONCLUSION
In this study, multivariate analysis of drought in Urmia Lake Basin was performed using Archimedean copula functions and standard precipitation indices (SPI) and modified precipitation index (SPI mod ). Then, using these indicators, bivariate characteristics of drought, namely severity and duration, were extracted and analysed. As drought phenomenon depends on different variables, univariate analysis of drought does not comprehensively demonstrate the influence of drought characteristics. Instead, the bivariate analysis can provide a more accurate description of drought phenomenon by considering the effect of both drought characteristics, that is, severity and duration. Therefore, in order to analyse the droughtrelated variables, eight Archimedean copula functions (i.e., Gamble-Hoguard, Frank, Clyton, Galambos, Placket, Farli-Gumble-Morgenstern, Ali-Mikhail-Hagh and Joe) were used. The results of fitting statistical distributions on drought characteristics showed that the exponential distribution function on drought duration and the gamma distribution function on drought intensity were known as superior margin distribution functions. The results of correlation analysis showed that these characteristics have significant correlation in the studied stations. The correlation coefficients of Spearman, linear correlation and Tau Kendall computed for the SPI of stations were >0.65, >0.72 and >0.48, respectively, while all of them were significant. Also, for all stations, the Nash-Sutcliffe coefficient was 0.99, which confirms the correct operation of the copula functions for bivariate analysis.
The results of determining the best copula functions showed that the evaluated Joe copula function was recognized as the superior copula function for bivariate analysis for the study area. In general, the results of bivariate drought analysis showed that the severity of drought over the period 1971-2017 was worse at Urmia station than at other stations, and was less severe at Tabriz station than at other stations. Also, the results of the return period in different cases indicate that in future periods there is a possibility of more severe drought in the Urmia Lake Basin.
Knowledge of the return period can provide useful information to water resources engineers in order to plan to determine the volume of reservoirs in drought periods in order to meet the water needs downstream of reservoirs as well as hydroelectric needs, environmental, industrial and agricultural. The use of multivariate analysis can also provide information to agricultural engineers and to managers of watersheds and rangelands.