Statistical modeling of monthly rainfall variability in Soummam watershed of Algeria, between 1967 and 2018

The monthly precipitation recorded over 51 years at 24 stations in the Soummam watershed in Algeria was analyzed to describe rainfall trends and the aridity state of the area using statistical modeling. The choice of distribution laws was justified by comparing the fitting results of different distribution laws used in the literature. The p values showed that the Generalized Extreme Value, Weibull (3) and Logistic distribution laws are the most adequate for analyzing rainfall frequencies in different parts of the watershed. The diagnostics given by the Q-Q plot, P-P plot and survival regression curve showed periods of wetness and dryness in the northeastern and southwestern parts of the watershed, respectively. Moreover, the De Martonne index analysis explains the consequences of climate change by a new form of aridity in the watershed between 1994 and 2018.


Recommendations for Resource Managers
• The annual rainfall of the Soummam watershed had a moderate and irregular distribution between 1967 and 2018.
• Applying a distribution function to the monthly rainfall of each bioclimatic floor to analyze the trend of rainfall frequency gives a spatio-temporal description of the climate in the area.
• Fitting with the Kolmogorov-Smirnov test allows us to choose the generalized extreme value, Weibull (3) and Logistic laws for modeling monthly rainfall variability in each part of the watershed.
• The diagnostics obtained from the P-P plot, Q-Q plot and survival regression curve showed a change of aridity in the northeastern and southwestern parts of the watershed between 1994 and 2018.

KEYWORDS
aridity index, bioclimatic floor, fitting test, monthly rainfall, statistical modeling, watershed

| INTRODUCTION
Climate change has become a concrete subject of study; it is measured by changes in temperature, precipitation, and other parameters. According to reports from the scientific community, the global climate is changing markedly: the Intergovernmental Panel on Climate Change (IPCC) showed that the global average temperature trend from 1880 to 2012 is 0.85°C, with an uncertainty range of 0.65°C to 1.06°C (GIEC et al., 2013). During the last decade (2003-2012), the rise in average temperature reached +0.78°C (with a range of 0.72°C to 0.85°C). Moreover, the World Meteorological Organization (WMO) considers 2011-2015 the hottest period on record; 2015 in particular was reported as the hottest year since modern observations began in the late 19th century (WMO, 2016). The evolution of global precipitation depends mainly on spatial and temporal climatic variability. This raises the problem of the acceleration and shift of the hydrological cycle: a simple increase in precipitation followed by a substantial rise in temperature can cause very high evaporation, especially in regions of the world prone to such a climatic evolution (Nouaceur & Murărescu, 2016). Different studies have sought to understand the evolution of rainfall at several sites around the world. They showed that climate change is more pronounced in the wettest conditions (Alexander et al., 2006; Dore, 2005), where an increase in rainfall followed by a repetition of extremes was observed between 1991 and 2010 (New et al., 2001; Planton et al., 2005). Climate change has direct implications for natural disasters: it depends mainly on large temperature differences and rainfall variability, and climate anomalies have consequences for hurricanes and floods in different regions (Hailemariam, 1999).
The drought that affects African countries is a good example of the consequences of climate change and its impact on the availability of water resources, especially water reserves; this issue calls for great efforts to exploit water and improve food security in these countries (Faramarzi et al., 2013). In recent years, Algeria has also suffered from climate change problems. The Algerian authorities reported two natural catastrophes that affected the country in 2001 and 2008, known as the floods of Bab El Oued and Ghardaia, respectively. They were caused by a large change in rainfall variability, but no work has identified the main variable that produced this phenomenon; it can be explained only by modeling rainfall records over the period that showed breaks of frequency and high rainfall variability (Boudrissa et al., 2017). Climate risk assessment depends mainly on the variability of rainfall extremes. A fundamental step in avoiding human and material damage is a good knowledge of the mechanisms that can be used to analyze climate change, followed by an in-depth study of the spatial and temporal distribution of climatic variables (rainfall, temperature); one can then forecast the occurrence of disasters and take appropriate measures to mitigate their negative consequences and reduce the impact of unforeseen events (Stern & Cooper, 2011). In the literature, several works have parameterized mathematical models that express the variability of climatic parameters at several sites around the world. The first models were proposed by Fisher and Gumbel in the 1920s and 1940s, respectively (Fisher & Tippett, 1928; Gnedenko, 1943; Gumbel, 1958); however, the choice of statistical law generally depends on several factors such as the time of observation, the period of the study, the seasonality and the aridity of the study area.
Recently, different studies showed that the Generalized Extreme Value, Gamma, Normal and Weibull distributions are the most applicable for modeling monthly rainfall (Alam et al., 2018; Al-Suhili & Khanbilvardi, 2014). Nevertheless, the best choice is always made according to the p values obtained by the fit test (Oseni & Ayoola, 2013).
The aim of this study was to examine the behavior of climatic variability in the Soummam watershed (Algeria) by applying an adequate distribution law to a 51-year series of monthly rainfall measurements, in order to describe the spatio-temporal state of the climate over the entire surface of the watershed. This paper gives a statistical description of annual and inter-monthly rainfall, followed by a presentation of the distribution models used in our study, selected according to the results of the fit test. The De Martonne aridity index was also used to express the change in the bioclimatic levels of the watershed during the study period.

| STUDY AREA
The Soummam watershed is one of the 17 watersheds in Algeria; its identification number is 15 according to the classification of the National Water Resources Agency (Zouggaghe & Moali, 2009). It is located in the central part of northern Algeria and extends over 9,125 km². The watershed has an irregular shape, lying between latitudes 35.75° and 36.75° and longitudes 3.60° and 5.5° (Figure 1). In the north, the Soummam is delimited mainly by the Djurdjura mountain range, whose highest peak (Lalla Khedidja) reaches 2,308 m, while its southern limit lies near the Hodna Mountains, which are lower than the Djurdjura (maximum altitude 1,862 m). From east to west, the Soummam watershed is open, and the plateaus of Setif and Bouira are located in this part. The climatic conditions of the Soummam watershed are not uniform. The climate is essentially semihumid in the upper part of the Soummam valley (Figure 1). However, some studies showed that in Bejaia the climate is generally humid with slight changes in temperature. On the uplands of Setif and Bouira, the climate is Mediterranean with humid winters and hot summers. The annual rainfall of the Soummam varies enormously, from 400 mm on the Setif upland to 1,000 mm near the Bejaia coast. This variation depends on geographical parameters: rainfall increases with altitude under humid winds in the west direction and decreases away from the coastal zones (Lounaci, 2005). The rainfall frequency also decreases from north to south (Turki et al., 2016). Generally, the climate of the Soummam watershed varies from north to south and from east to west between humid and Mediterranean, with an extension of semihumid conditions.

| Data description
The study of climatic variability in the Soummam watershed (northeastern Algeria) was applied to monthly time series of rainfall and temperature from January 1967 to December 2018 at 24 climatic stations distributed over the whole surface of the watershed. The data were obtained from the National Water Resources Agency (ANRH) and the National Centers for Environmental Information (NCEI-NOAA), https://www.ncdc.noaa.gov/. The data history shows gaps in the record due to measurement difficulties in some years. Table 1 shows that the proportion of missing rainfall data ranged from 1.1% to 14.2%, whereas missing temperature values were less frequent, ranging from 0.2% to 4.1%. The missing data were therefore processed using the approach proposed in our former work to obtain a continuous and reliable time series (Aieb et al., 2019).
Abbreviations (Table 1): IA*, interannual; MD*, missing data.

| Classification of Soummam bioclimatic floors
To determine the bioclimatic floors of the Soummam watershed, which covers a large part of northeastern Algeria, principal component analysis (PCA) and agglomerative hierarchical clustering (AHC), shown in Figure 2, were applied to two climatic parameters for each individual (weather station) used in our study. The measurements were carried out at stations located all over the area to give the best information about the bioclimatic floors of the watershed. The PCA graph showed two components, PC1 and PC2, which are the interannual rainfall and the interannual temperature of the 24 stations over 51 years (from 1967 to 2018), respectively. They are represented from minimum to maximum (from 420 to 663 mm and from 14.02°C to 16.8°C; Table 1), reading from bottom to top and from left to right. PC1 is the dominant component, covering 72.00% of the total information compared to PC2. The PCA and AHC results highlight three clusters with very different characteristics. As shown in the AHC plot, all clusters are well separated. The first cluster corresponds to the highest area of the PCA graph and contains only the Bejaia airport station, characterized by very high annual precipitation and temperature measurements. This area represents the humid bioclimatic floor of the watershed, which overlooks the Mediterranean Sea (Figure 1). The second cluster covers the largest proportion of the watershed, with 16 stations located in the red zone at the center of the PCA graph (Figure 2). It is characterized by highly varied rainfall and temperature, within the intervals [−1.5, 2] and [−0.5, 1] of PC1 and PC2, respectively.

FIGURE 2 Principal component analysis followed by agglomerative hierarchical clustering plot showing the Soummam bioclimatic floors, using interannual rainfall and temperature measurements of 24 weather stations during 51 years
According to the similarity of stations shown in the AHC graph, we determined two sets within the same cluster: (Beni Maouch, Sour Elghozlen, Sidi Aich, Ighil Ali, Seddouk, Boubirek and Sidi yahia) and (Al Asnam, Mahouane, Farmatou, El hachimia, Zammorah, Mechdellah, Bouira Coligny, and Ain abbess). This part of the watershed represents the semihumid bioclimatic floor (Figure 1). The third cluster contains all the meteorological stations with a Mediterranean climate, corresponding to the lowest area of the PCA graph; their climate is characterized by low rainfall and temperature values compared to all other stations. The AHC graph displayed a great similarity between all the stations of the same cluster.
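As an illustration of the clustering step only, the following minimal sketch groups stations by standardized interannual rainfall and temperature with a simple agglomerative procedure. The station values are hypothetical, and average linkage with Euclidean distance is an assumption: the linkage criterion and software used in the study are not stated here, and the PCA step is not reproduced.

```python
import math

def standardize(rows):
    """Center and scale each column (rainfall, temperature) to unit sample SD."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
        out_cols.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*out_cols)]

def ahc(points, n_clusters):
    """Agglomerative clustering with average linkage (assumed criterion).

    Returns a list of clusters, each a list of row indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical (rainfall in mm, temperature in °C) pairs, not values from Table 1.
stations = [(1000, 18.0), (560, 16.2), (540, 15.9), (580, 16.0),
            (430, 14.3), (420, 14.1), (445, 14.5)]
groups = ahc(standardize(stations), n_clusters=3)
print(sorted(sorted(g) for g in groups))
```

With these made-up values the first station stands alone, much as a single coastal station can form its own cluster when its rainfall is far above the rest.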

| Methods
According to the literature, studying extremes of variability and analyzing frequency trends in climatic data series relies mainly on choosing the distribution model that best describes the phenomenon. The Kolmogorov-Smirnov test (D_KS) is a nonparametric test used to assess the goodness of fit of distribution models by evaluating the agreement between the empirical and theoretical cumulative distribution functions, denoted F(x) and G(x), respectively (Justel et al., 1997; Massey, 1951). The D_KS statistic is given by:

$$D_{KS} = \sup_{x} \left| F(x) - G(x) \right|$$

The test is based on the null and alternative hypotheses: H_0: the data follow the specified distribution; H_a: the data do not follow the specified distribution.
The proposed statistical distribution model is rejected when the p value given by the fit test is lower than the significance level alpha, generally set to 0.05 (Simard & L'Ecuyer, 2011). The p value (P) of the D_KS test is:

$$P = 1 - F_n(D_{KS})$$

where F_n denotes the cumulative distribution function of D_KS under the null hypothesis H_0.
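The test described above can be sketched in a few lines. The code below computes the Kolmogorov-Smirnov distance between a sample and a fully specified theoretical CDF, and approximates the p value with the asymptotic Kolmogorov series. That approximation, the logistic parameters and the sample values are all assumptions made for illustration; the study's software may use an exact or differently corrected p value.

```python
import math

def logistic_cdf(x, mu, s):
    """Theoretical CDF G(x) of a logistic law with location mu and scale s."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        g = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x: check both sides.
        d = max(d, i / n - g, g - (i - 1) / n)
    return d

def ks_pvalue_asymptotic(d, n, terms=100):
    """Approximate P(D >= d) by 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    lam2 = n * d * d
    if lam2 < 1e-3:
        return 1.0  # the series is numerically unreliable here; p is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Hypothetical monthly rainfall values (mm) and hypothetical logistic parameters.
rain = [12.0, 25.5, 31.0, 18.2, 40.1, 36.7, 22.9, 28.4, 45.3, 33.0]
d = ks_statistic(rain, lambda x: logistic_cdf(x, mu=30.0, s=6.0))
p = ks_pvalue_asymptotic(d, len(rain))
print(f"D_KS = {d:.4f}, p = {p:.4f}, reject H0 at 0.05: {p < 0.05}")
```

Note that when the distribution parameters are estimated from the same sample, the plain Kolmogorov-Smirnov p value tends to be optimistic.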

| Probability distribution models
In this study, we selected the generalized extreme value (GEV), Weibull (3) and logistic models according to the fitting test results obtained in the third section of this study. In addition, these probability distributions have the advantage of being simple and popular in the analysis of monthly rainfall minima and maxima (Alam et al., 2018; Uwimana & Joseph, 2018).

GEV distribution
The GEV is a family of continuous probability distributions developed within extreme value theory. It has been used in many fields of study, such as climatology, to study maxima of temperature, precipitation and river discharge (Hosking & Wallis, 1987). Three GEV parameters are recommended for the frequency analysis of meteorological data: the shape κ, the scale α and the location ξ; the parameters are estimated by the method of moments (Bhunya et al., 2007). The probability density function (PDF) of GEV is defined as:
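For reference, one common way of writing the GEV density with the symbols κ, α and ξ is sketched below. It assumes the sign convention in which the cumulative distribution function is exp{−[1 − κ(x − ξ)/α]^{1/κ}}; this is an assumption, since some software uses the opposite sign for the shape parameter, so fitted values should be checked against the convention of the tool that produced them.

```latex
% GEV density, assuming the convention F(x) = exp{-[1 - kappa (x - xi)/alpha]^{1/kappa}}
f(x) = \frac{1}{\alpha}
       \left[ 1 - \kappa \frac{x - \xi}{\alpha} \right]^{\frac{1}{\kappa} - 1}
       \exp\left\{ - \left[ 1 - \kappa \frac{x - \xi}{\alpha} \right]^{\frac{1}{\kappa}} \right\},
\qquad \kappa \neq 0, \quad 1 - \kappa \frac{x - \xi}{\alpha} > 0

% Limiting case kappa -> 0 (Gumbel)
f(x) = \frac{1}{\alpha}
       \exp\left\{ - \frac{x - \xi}{\alpha} - \exp\left( - \frac{x - \xi}{\alpha} \right) \right\}
```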

Weibull distribution
The Weibull probability function is the most widely used for fitting the distribution of rainfall (Udomboso et al., 2010). Its general probability density function is given by Equation (6), where x ≥ μ, γ > 0 and α > 0 (Cousineau, 2009).

$$f(x) = \frac{\gamma}{\alpha} \left( \frac{x - \mu}{\alpha} \right)^{\gamma - 1} \exp\left[ - \left( \frac{x - \mu}{\alpha} \right)^{\gamma} \right] \tag{6}$$

where γ is the shape parameter, μ is the location parameter, and α is the scale parameter. Equation (6) is called the standard Weibull distribution when μ = 0 and α = 1, and the 2-parameter Weibull distribution when μ = 0 (Wilks, 1989).
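A minimal sketch of the three-parameter Weibull law described above, using the same roles for γ (shape), α (scale) and μ (location). The function names and the numerical values are illustrative assumptions, not the fitted parameters of the study.

```python
import math

def weibull3_pdf(x, gamma, alpha, mu=0.0):
    """Density (gamma/alpha) * z**(gamma-1) * exp(-z**gamma), z = (x - mu)/alpha."""
    if x <= mu:
        # Outside the support; the boundary point itself is also set to 0,
        # which avoids 0**negative when gamma < 1.
        return 0.0
    z = (x - mu) / alpha
    return (gamma / alpha) * z ** (gamma - 1.0) * math.exp(-(z ** gamma))

def weibull3_cdf(x, gamma, alpha, mu=0.0):
    """Cumulative probability 1 - exp(-z**gamma) for x > mu, else 0."""
    if x <= mu:
        return 0.0
    z = (x - mu) / alpha
    return 1.0 - math.exp(-(z ** gamma))

# Hypothetical parameters: shape 1.5, scale 40 mm, location 2 mm.
shape, scale, loc = 1.5, 40.0, 2.0
print(weibull3_cdf(loc + scale, shape, scale, loc))  # probability of rainfall <= mu + alpha
```

Whatever the shape, the cumulative probability at x = μ + α equals 1 − e⁻¹, which is a convenient sanity check on a fitted scale parameter.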

Logistic distribution
The logistic distribution is a continuous probability distribution. It is defined over all real numbers by two parameters, a scale s and a location μ (Ye et al., 2018). Its PDF is always symmetric about the mean. The logistic model can also be used to analyze monthly rainfall occurrence; this technique is based on assumptions about the mean (Mean) and the variance (Var) of the number of wet days (Buishand et al., 2004). It can be represented by:

$$\mathrm{Mean} = M\pi, \qquad \mathrm{Var} = \phi\, M \pi (1 - \pi)$$

where M stands for the total number of days in the month of interest, ϕ is a dispersion parameter, and π is the wet-day probability. The PDF of the logistic distribution used in the third section to fit the monthly rainfall of the Soummam watershed during 51 years of observation is given by:

$$f(x) = \frac{\exp\left( -\frac{x - \mu}{s} \right)}{s \left[ 1 + \exp\left( -\frac{x - \mu}{s} \right) \right]^{2}}$$

3.2.2 | Climatic statistical tests

De Martonne aridity index
The De Martonne aridity index (I_DM) is one of the best known and most widely used aridity/humidity indices in applied climatology (Croitoru et al., 2013; de Martonne, 1925). This index is very important in arid/humid climate classification. Although it is one of the oldest indices, it is still used with good results worldwide to identify dry/humid conditions in different regions (Adnan & Haider, 2012; Baltas, 2007; Caloiero, 2004; Shahid, 2008; Zarghami et al., 2011). This index is given by Equation (10):

$$I_{DM} = \frac{P}{T + 10} \tag{10}$$

where P is the annual amount of precipitation in millimeters and T is the mean annual surface temperature in degrees Celsius.
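The index is simple enough to compute directly, as in the sketch below. The class limits are a commonly quoted set for the annual De Martonne index and are an assumption here, since the limits applied in the study are not listed in this section; the input values are hypothetical.

```python
def de_martonne_index(annual_precip_mm, mean_annual_temp_c):
    """I_DM = P / (T + 10), with P in mm and T in degrees Celsius."""
    return annual_precip_mm / (mean_annual_temp_c + 10.0)

# Upper class limits (exclusive) and labels: assumed, commonly quoted values.
ARIDITY_CLASSES = [
    (10.0, "arid"),
    (20.0, "semi-arid"),
    (24.0, "Mediterranean"),
    (28.0, "semihumid"),
    (35.0, "humid"),
    (55.0, "very humid"),
]

def classify_aridity(i_dm):
    """Return the label of the first class whose upper limit exceeds i_dm."""
    for upper, label in ARIDITY_CLASSES:
        if i_dm < upper:
            return label
    return "extremely humid"

# Hypothetical station-year: 600 mm of rain at a mean temperature of 15 °C.
i_dm = de_martonne_index(600.0, 15.0)
print(i_dm, classify_aridity(i_dm))
```

Applied year by year, the same two functions give an annual index curve and make a shift from one class to another easy to locate in time.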

Precipitation concentration index
The precipitation concentration index is considered a powerful indicator of the temporal distribution of precipitation. It was proposed by Oliver (1980) and later developed by De Luis, González-Hidalgo and colleagues (De Luis et al., 1997). This index is also very useful for assessing annual and seasonal precipitation change, denoted PCI and SPCI and given by Equations (11) and (12), respectively.

$$PCI = 100 \cdot \frac{\sum_{i=1}^{12} p_i^{2}}{\left( \sum_{i=1}^{12} p_i \right)^{2}} \tag{11}$$

$$SPCI = 25 \cdot \frac{\sum_{i=1}^{3} p_i^{2}}{\left( \sum_{i=1}^{3} p_i \right)^{2}} \tag{12}$$

where p_i represents the precipitation in month i, calculated for each year of the study.
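A short sketch of the annual and seasonal indices follows. The factors 100 (12 months) and 25 (3 months) are the usual ones and make a perfectly even distribution score about 8.3 in both cases. The class boundaries in `pci_class` are an assumption based on the ranges usually attributed to Oliver (1980), with the gaps between ranges closed at the upper value; the rainfall values are hypothetical.

```python
def pci(monthly_precip):
    """Annual PCI = 100 * sum(p_i^2) / (sum(p_i))^2 over 12 monthly totals."""
    if len(monthly_precip) != 12:
        raise ValueError("expected 12 monthly values")
    total = sum(monthly_precip)
    if total == 0:
        raise ValueError("PCI is undefined for a year without precipitation")
    return 100.0 * sum(p * p for p in monthly_precip) / total ** 2

def seasonal_pci(season_precip):
    """Seasonal PCI = 25 * sum(p_i^2) / (sum(p_i))^2 over 3 monthly totals."""
    if len(season_precip) != 3:
        raise ValueError("expected 3 monthly values")
    total = sum(season_precip)
    if total == 0:
        raise ValueError("SPCI is undefined for a season without precipitation")
    return 25.0 * sum(p * p for p in season_precip) / total ** 2

def pci_class(value):
    """Assumed class boundaries: <=10 uniform, <=15 moderate, <=20 irregular."""
    if value <= 10.0:
        return "uniform"
    if value <= 15.0:
        return "moderate"
    if value <= 20.0:
        return "irregular"
    return "strongly irregular"

# Hypothetical year with a wet winter and a dry summer (mm per month, Jan..Dec).
year = [95, 70, 60, 45, 30, 8, 3, 6, 28, 40, 65, 90]
print(round(pci(year), 2), pci_class(pci(year)))
```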

| RESULTS
This section highlights the important findings of this study and describes rainfall variability in the Soummam watershed in Algeria. Both graphical and statistical approaches were applied to examine trends in the monthly rainfall series over 51 years of observations.

4.1 | Trend analysis of rainfall and temperature parameters

4.1.1 | Annual rainfall analysis
Figure 3 shows the histogram of differences from the mean annual rainfall, followed by the annual PCI curve, obtained from the average measurements of 24 meteorological stations in the Soummam watershed over 51 years of observations (from 1967 to 2018). The results show a periodic change in annual rainfall variability between dry and wet. The rainfall went through an abrupt shift from wet years to successive dry years between 1976 and 2009. Then, rainfall rose above the average between 2010 and 2018. The annual precipitation distribution was further investigated with the PCI test to characterize the climate (Figure 3). Over the whole study period, PCI values varied between 10 and 18. The annual rainfall distribution showed that most years have a uniform precipitation regime. However, in 1973, 1988, 1999, 2002, 2004, 2006 and 2017, the annual rainfall showed a moderate distribution. During this period, the watershed probably experienced some months of dryness or wetness, which changed the rainfall regime from moderate to irregular distribution. These results motivated the study of rainfall trends throughout this period.

FIGURE 3 Histogram of difference from the mean, followed by annual PCI curve, obtained during 51 years of observations in Soummam watershed

4.1.2 | Inter-monthly rainfall and temperature analysis
Figure 4 shows the inter-monthly observations of rainfall and temperature over 51 years (from 1967 to 2018), given as a histogram and a curve, respectively. The graphical results show that rainfall observations vary far more than temperature.
Table 2 shows a large variability in the distribution of inter-monthly rainfall, reflected by a coefficient of variation and an SD equal to 3.6 and 12.6, respectively. The monthly rainfall average is 34 mm, with a maximum of 57 mm and a minimum of 5.8 mm, which gives a variability range of (+40.4%, −82.9%).
On the other hand, the temperature was evenly distributed across all the measurements obtained during the 51 years. The table shows that the average temperature is 18.1°C, lying between a maximum of 24.8°C and a minimum of 12.4°C; this even distribution is reflected by a variability interval of (+27%, −31.4%) and a coefficient of variation of 2.5. The PCI index shows that the seasonal concentration of precipitation falls within the uniform and moderate classes: the autumn and summer seasons show a uniform regime, with values of 8.5 and 9.4, respectively. In contrast, the winter and spring seasons are moderate, with PCI values of 11.1 and 12.2, respectively. Table 3 shows the statistical parameters that express the irregular distribution of the inter-monthly rainfall observed in the Soummam watershed during the 51 years. The monthly averages, which lie between 5.7 and 57 mm, show a maximum of precipitation recorded in January, at 154.5 mm. The table shows that 25% of the January rainfall measurements during the 51 years varied between 60.1 and 154.5 mm. The stormy period was recorded in August, with rainfall reaching up to 65.1 mm. The results showed that the winter season has the largest rainfall variability, with values in the range [25.9, 34.5].

.1 | Homogeneity analysis
The interannual rainfall measurements observed in each bioclimatic floor of the Soummam watershed (humid, semihumid, and Mediterranean) over 51 years (from 1967 to 2018) are shown as moving-average curves in Figure 5. The graphs show a change in the trend of rainfall frequencies between 2000 and 2018 in the humid and semihumid bioclimatic floors. The homogeneity analyses by the Pettitt test and the standard normal homogeneity test (SNHT) are shown graphically in Figure 5a1-a3 and 5b1-b3, respectively. The H_0 hypothesis, which states that the statistical series are homogeneous, was rejected for the precipitation measurements of the humid and semihumid bioclimatic floors. According to the results shown in

| Fitting test
The choice of climatic model depends largely on the conditions of the area and the parameter scale. The purpose of the analyses shown in Figure 6 and Table 5 is to evaluate the best distribution model for each bioclimatic floor. Table 5 shows that the monthly rainfall of the humid bioclimatic floor follows the GEV model, with a p value of .1234, the best result among all models. This hypothesis is also verified for the log-normal distribution, according to its p value of .04747. In the case of the semihumid bioclimatic floor, the H_0 hypothesis is accepted for the Gamma (2), logistic and Weibull (3) models. However, the fitting results favored Weibull (3), with a p value of .1686 and a D_KS of 0.0348. On the other hand, the model selected for the monthly rainfall distribution in the Mediterranean bioclimatic floor is the logistic law, with a p value of .7529 and a D_KS of 0.0277. In this case, the fitting tests also accept the normal law, with a p value of .1501. Figure 6 shows a good fit of the GEV, Weibull (3) and Logistic models, using the cumulative distribution curve and the density histogram. In the case of the humid bioclimatic floor, the GEV theoretical model, defined by the parameters (0.0468, 11.1219 and 37.9512), lies below the experimental model when rainfall values are between 40 and 60 (Figure 6b1, b2); the fit is similar for the rainfall distribution of the semihumid climate, given by the Weibull (3) law. Conversely, in the case of the Mediterranean climate, the rainfall distribution showed a very good fit, superimposed on the theoretical model (Figure 6c1, c2).

FIGURE 5 Moving average curves of monthly rainfall in humid, semihumid, and Mediterranean bioclimatic floors of Soummam watershed during 51 years of observations, followed by homogeneity graphs (a1, a2, and a3) and (b1, b2, and b3) obtained by Pettitt and SNHT test

| Time modeling of rainfall variability
The trends of monthly precipitation in each bioclimatic floor were analyzed over time using the P-P plot and Q-Q plot (which compare the empirical and theoretical distributions of monthly rainfall, and the empirical and theoretical quantiles of the same distribution model, respectively) and the standard residual curve as diagnostic tools, to check the distribution of rainfall extremes and to identify breaks of frequency in each bioclimatic floor relative to the theoretical model. The P-P plot in Figure 7 showed that the monthly rainfall distribution of the Mediterranean bioclimatic floor fits the logistic model very well. No large deviations were observed. In this area, the precipitation values are very similar to the theoretical data given by the model, as shown graphically by the Q-Q plot, with a correlation (R²) of 0.9938. According to the standard residual graph, the logistic model does not display any break of frequency or trend in the rainfall series during the 51 years of observation. On the other hand, the Q-Q plot showed a large deviation of the curve from the bisecting line for humid and semihumid precipitation, in the ranges (70, 110) mm and (60, 80) mm, respectively. The differences between theoretical and experimental values appear in the graph of standard residuals as trends between months 400 and 600. The results of Figure 7 complement the homogeneity test given in Figure 5 regarding the break of frequency observed in the last years of analysis. The hypothesis suggested by this comparison is that the humid and semihumid climates are exposed to a change in aridity, which can be temporal or spatial.
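The two diagnostic plots can be reproduced numerically, as sketched below for a logistic model: the P-P points pair theoretical with empirical probabilities, the Q-Q points pair theoretical with observed quantiles, and R² measures how close the Q-Q points lie to a straight line. The plotting position (i − 0.5)/n, the parameters and the sample are assumptions chosen for illustration; plotting software may use a different position formula.

```python
import math

def logistic_cdf(x, mu, s):
    """Logistic CDF with location mu and scale s."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def logistic_quantile(p, mu, s):
    """Inverse of the logistic CDF for 0 < p < 1."""
    return mu + s * math.log(p / (1.0 - p))

def pp_qq_points(sample, mu, s):
    """Return (pp, qq): lists of (theoretical, empirical) pairs."""
    xs = sorted(sample)
    n = len(xs)
    pp, qq = [], []
    for i, x in enumerate(xs, start=1):
        p_emp = (i - 0.5) / n  # assumed plotting position
        pp.append((logistic_cdf(x, mu, s), p_emp))
        qq.append((logistic_quantile(p_emp, mu, s), x))
    return pp, qq

def r_squared(pairs):
    """Squared Pearson correlation between the two coordinates."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy * sxy / (sxx * syy)

# Hypothetical monthly rainfall sample (mm) and hypothetical logistic parameters.
rain = [8.0, 14.5, 19.0, 22.3, 25.1, 27.8, 30.2, 33.0, 36.4, 41.0, 47.5, 58.0]
pp, qq = pp_qq_points(rain, mu=30.0, s=9.0)
print(f"Q-Q R^2 = {r_squared(qq):.4f}")
```

Points that drift away from the diagonal over a particular range of values are the kind of deviation the text interprets as a break of frequency.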
Figure 8 presents the spatial modeling results obtained from the analysis of the monthly precipitation distribution in the semihumid bioclimatic floor of the Soummam watershed over 51 years of observations, given for each meteorological station. The purpose of this analysis was to identify the areas showing a break of rainfall frequency using the Weibull (3) law, knowing that a large deviation of rainfall trends can also cause a change of aridity in the watershed. The results shown in Figure 8 were obtained by the Weibull survival regression test, applied to 16 weather stations, which expresses the rainfall distribution compared to the theoretical model on the time scale. In addition, the statistical description of these diagnostics is summarized in Table 6. The graphs numbered 1-16 show that monthly rainfall at the majority of stations fits the Weibull (3) distribution well over 612 months of observation (from 1967 to 2018), with p values varying between .0858 and .5201 (Table 6). The best fit is that of Ain Abessa station, with a standard deviation and a p value equal to 26.615 and .5201, respectively. On the other hand,

| Aridity analysis
The rainfall variability analyses of the stations that showed breaks of frequency compared to the GEV and Weibull (3) theoretical models are represented in Figures 7 and 8, respectively. They are given by the histogram of differences from the annual rainfall mean, followed by the annual De Martonne curve. Figure 9a shows a large change in rainfall variability at Bejaia Airport station. The histogram demonstrates that the weather there passed through two phases, from dry to wet. The cumulative rainfall was below the average between 1967 and 1989; then an increase in precipitation was observed after 1990, with annual rainfall values above the average. The annual De Martonne curve explains this variability by a change in aridity from humid to very humid during this period. Beni Maouch station was also influenced by the same change of climate, as shown in Figure 9b. The De Martonne curve shows that the bioclimatic floor of this station changed from semihumid to humid after 2000, as the station recorded an enormous increase in precipitation between 2000 and 2018. Conversely, El Hachimia and Sour Elghozlen stations showed precipitation above the average and a semihumid climate during the first period of the analysis. They were then affected by a rainfall deficit between 1991 and 2018, with a decrease in precipitation measurements and a change of climate aridity from semihumid to Mediterranean (Figure 9c,d). Figure 10 describes the aridity of the Soummam climate during different periods of the study, according to the interannual rainfall of 13 years, taken over the following intervals: (1967, 1980), (1981, 1993), (1994, 2006) and (2007, 2018), corresponding to maps a, b, c, and d, respectively. The figure shows a decrease of rainfall observations in the interior and southern parts of the watershed.
This decrease is most visible in maps c and d, in the semihumid and Mediterranean bioclimatic floors, which are characterized by rainfall ranges of [400, 500 mm] and [500, 600 mm], respectively. However, the northern part of the watershed shows an increase of rainfall, varying between 600 and 800 mm. Figure 10 also shows that Soummam

This study aimed to examine the rainfall variability of the Soummam watershed between 1967 and 2018. The PCA showed that the study area is characterized by a humid climate in the northeastern part, a semihumid climate in the interior part and a Mediterranean climate in the southern part. The annual precipitation follows a moderate and irregular distribution. The adjustment analysis with the D_KS test showed that the most suitable statistical laws for modeling monthly rainfall in each bioclimatic floor are the GEV, Weibull (3) and logistic distributions, with p values of .1234, .1686 and .7529, respectively. Moreover, the diagnostics given by the P-P plot, Q-Q plot and survival regression curve showed a deviation of monthly rainfall from the theoretical data given by these models. This expresses a change of aridity, confirmed by the De Martonne index between 1994 and 2018, from humid to very humid at Bejaia Airport station and from semihumid to humid at Beni Maouch station. However, El Hachimia and Sour Elghozlen showed a decrease of rainfall, which changed their bioclimatic floor from semihumid to Mediterranean.
FIGURE 10 Spatial variability of inter annual rainfall in Soummam watershed. (a) 1967-1980; (b) 1981-1993; (c) 1994-2006; and (d) 2007-2018

Our future work aims to study the impact of different natural phenomena on Mediterranean climate change, such as the change that can be caused by the pressures of the North Atlantic Ocean, in order to predict the future consequences that can alter the food diversity of those countries.