Spatial Variation of Extreme Rainfall Observed From Two Century‐Long Datasets

This paper presents the spatial variation of area‐orientated annual maximum daily rainfall (AMDR), represented by well‐fitted generalized extreme value (GEV) distributions, over the last century in Great Britain (GB) and Australia (AU) with respect to three spatial properties: geographic locations, sizes, and shapes of the region‐of‐interest (ROI). The results show that the spatial variation of GEV location‐scale parameters is dominated by geographic locations and area sizes. In GB, there is an eastward‐decreasing banded pattern compared with a concentrically increasing pattern from the middle to coasts in AU. The parameters tend to decrease with increased area sizes in both studied regions. Although the impact of the ROI shapes is insignificant, the round‐shaped regions usually have higher‐valued parameters than the elongated ones. These findings provide a new perspective to understand the heterogeneity of extreme rainfall distribution over space driven by the complex interactions between climate, geographical features, and the practical sampling approaches.

1. How do areal rainfall extremes change over space? 2. How may other factors, such as the size, shapes of the area in question affect such spatial dependencies? 3. How are the spatial patterns and variations linked to the large-scale climatology of rainfall? 4. What is the implication of the spatial variation of the parameters to the applications (e.g., FRM)?
Additionally, a toolbox known as the spatial random sampling for grid-based data analysis (SRS-GDA, Wang & Xuan, 2020), is employed to assist the required spatial sampling with predefined or randomized features, for example, size, location, and dominant orientation of ROIs, from the grid-based datasets. The sampled annual maximum daily rainfall (AMDR) of each ROI is fitted with the widely used and tested Generalized Extreme Value (GEV) distributions whose spatial variation is then analyzed. The associated intensive computation demand is met by the high-performance computing (HPC) resources provided by Supercomputing Wales (https://www.supercomputing.wales).
The remainder of this paper is organized as follows: Section 2 describes the data and methods, then the sampled ROI and the goodness-of-fit (GOF) results. Both the qualitative and quantitative results of the spatial variation of the distribution parameters and further application are discussed in Sections 3 and 4, as well as their linkage to the climatology of rainfall are discussed in Section 5. The conclusions and recommendations of further study are given in Section 6.

Datasets
This study makes use of two century-long datasets which are the "gridded estimates of daily areal rainfall" (GEAR) and the "Australian Data Archive for Meteorology" (ADAM). The GEAR dataset is a grid-based (1 × 1 km 2 ) rainfall estimation that covers the mainland of GB from 01/01/1898 to 31/12/2010. It is derived from the UK Met Office national database of observed precipitation from the UK rain gauge network, using the natural neighbor interpolation method. The coordinates are in the National Grid Reference (Ordnance Survey, 1946) which is a projected map coordinate system with the easting (x-) and northing (y-) expressed in linear kilometers (Tanguy et al., 2016). The ADAM data set is generated using a sophisticated analysis technique described in Jones et al. (2009), which is also grid-based (0.05 × 0.05°, ∼5 × 5 km 2 ) rainfall from January 1, 1900 to December 31, 2018 over AU based on the Geocentric Datum of AU 1994 (GDA94; Collier, 2002) with the origin (44 S, 112 E), that is (0,0), and easting (x-) and northing (y-) transformed to kilometers. The recorded rainfall values are provided as daily rainfall, that is, the total rainfall amount over a predefined 24-h (9:00-9:00 a.m.) period which refers to the 24 h prior to the reporting time for the ADAM data set and the 24 h after for the GEAR data set.

Methodology
The geographical areas of the two data domains, i.e., GB and AU, are sampled into a series of regions-of-interest (ROIs) using the SRS-GDA toolbox. Three predefined spatial features (i.e., geographical locations, sizes, and shapes) are applied during the sampling process to reduce the overall computing time while maintaining the representativeness of the samples. The block maxima series, that is, the AMDR of each sampled ROI is then extracted and further fitted by a proper probability distribution. In this study, the three-parameter GEV distribution is chosen as the candidate distribution whose GOF is evaluated. The parameters ( and  ) of the fitted distributions are then analyzed with regards to their spatial distribution referring to the large-scale climatology of rainfall variations.

ROI Generation and AMDR Extraction
The sampling starts with an initial set of uniformly distributed ROIs whose locations are represented by the coordinates of their geometric centroids. At each location, there are seven ROI shapes produced, indicated by their distinctive spatial indexes (Wang & Xuan, 2020) and reciprocally grouped as 0.2/5.0, 0.5/2.0, 0.8/1.25, and 1.0, respectively. The size of these ROIs is then gradually increased by 10 steps with 20% increment each, while maintaining the same shape and location. In the end, the largest sizes the ROIs are 1,050 km 2 for GB and 9,900 km 2 for AU, respectively.
The SRS-GDA toolbox used to generate the ROIs is set up in a way that only one spatial feature is allowed to vary at a time. For instance, to obtain ROI samples of G2 and A2 in Table 1, the centroid locations and shape are kept unchanged while generating 10 ROIs in different sizes. Geographical location (marked as "") Abbreviations: AU, Australia; GB, Great Britain; ROI, region-of-interest. For each ROI, its daily areal rainfall is calculated by taking arithmetic average over the covered grids (e.g., spatial average); then the block annual maxima are picked up to generate a time series of AMDR. This work was carried out by the HPC, processing around 642.3 GB data with 11,011 areal AMDR series produced.

Fitting the Extracted AMDRs Using GEV Distribution
The GEV distribution is one of the most well-founded distributions for describing normalized maxima from a sequence of independent, identically distributed random variables, for example, the block maxima of annual rainfall series. It has been applied to characterize both gauged (Feng et al., 2007;Westra et al., 2013) and grid rainfall datasets (Overeem et al., 2010). A GEV distribution is controlled by three parameters, namely, the location , the scale  , and the shape  which defines the three limiting types: the Gumbel (  0), the Frechét (  0), and the reversed Weibull (  0). AMDR (denoted as x) of each ROI is fitted by a GEV distribution whose cumulative probability distribution function is as follows: (Hosking, 1985) is employed to estimate the parameters of GEV.
However, the GEV distribution has been chiefly used to fit point rainfall extremes as reported in many studies (Schaefer, 1990;Yoon et al., 2013), very few studies have been done on the suitability of fitting the areal grid-based rainfall extremes with GEV. Thus, in this study, the GOF is tested using a bootstrapped version of Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests, and the L-moment ratio diagrams (Text S1). Out of the AMDRs from all ROIs (1,416 ROIs of GB and 9,595 of AU) tested, the results show that the GEV distribution fits well the AMDR series with a 100% pass of the KS test and more than 97% of the AD test. This is also supported by the L-moment ratio diagrams ( Figure S1c) which demonstrates the fitted GEV distributions compared with the statistical characteristics of the AMDR itself.

Analyzing the Spatial Distribution of the Location-Scale Parameters
The spatial variation of the location and scale parameters of the fitted GEV distributions are analyzed both qualitatively and quantitatively. Instead of using full spatial coordinates to represent the geographical locations, a univariate spatial-location representation is adopted in this study. This procedure is briefly described below: 1. The chosen GEV parameter is aggregated meridionally, for example, over all ROIs that have the same x-direction (easting or longitude) coordinate. 2. The aggregated GEV parameter values are indexed by their x-direction only coordinates which are then used as an input variable to represent the geographical locations. 3. The same procedure is also applied zonally, that is, over the same y-direction coordinate.
With this arrangement, the meridional or zonal average of the GEV parameter in question is taken as the response variable. In AU, a concentric pattern is found where both the meridional and zonal average show similar results ( Figure S6); For the GB case, only a strong west-east pattern exists. Therefore, for the convenience of comparison, the meridional average is chosen for both cases. The spatial variation of  is less significant over space thus is not considered (Figure 1 and S7).
Finally, a generalized linear model (GLM) is fitted to quantify the relationship between the GEV parameters and the associated spatial features of the ROIs, that is, to explicitly model the spatial variation of the GEV parameters with respect to the locations, sizes, and shapes of the underlying ROIs. A k-fold-cross-validation (Efron & Tibshirani, 1997) and variogram (Cressie & Hawkins, 1980) are employed to evaluate the performance of GLMs.  (e) and AU (f) where a symmetric pattern in both cases is observed but the changes with respect to ROI shape is not significant. AU, Australia; GB, Great Britain; GEV, generalized extreme value; ROI, region-of-interest. Figure 1a and 1b present the histograms and spatial variations of the three GEV parameters of all ROIs in both GB and AU where the following patterns can be clearly identified:

GEV Parameter Variation Over Geographical Locations
• Most ROIs are in favor of the Frechét type of distribution (  0). • Both  and  present a similar spatial pattern where a higher  is usually accompanied by a higher  .
In GB, the values of  and  in the western region, especially those in the coastal area, are much larger than those in the east. Such west-east gradient is also strong in the west, indicated by much denser contours. However, there is no remarkable variation from south to north, even though both  and  in Scotland are higher. As such, the meridional average is thought to better reveal such eastward pattern which can be described as "west high, east low" with an apparently nonlinear variation.
Both  and  in AU have a clear increasing trend from the south-middle zone to the coastal regions. This spatial pattern can be seen as a series of concentric circles. It is also noted that the rapid variations are close to the north-eastern coastal regions. The decreases in both  and  with increased ROI sizes have an important implication: the most frequent AMDR (relating to ) becomes smaller for larger ROI alongside an overall decreased extremity (relating to both parameters). Another interesting measure is the rate of such reduction (RR) as the size of ROI increases, which has also shown a clear spatial dependency. In AU, the RR remains low in the central desert zone (e.g., from Easting 300 to 360 km), and increases near the coastal areas where large parameter values are also found. This feature can be explained by the fact that regions receiving more extreme rainfall (e.g., the coastal regions in AU) are not only manifested by the higher  and  ; but also have more heterogenous rainfall than regions with less extreme rainfall (lower  and  ). Therefore, the changes of  and  are more sensitive to geographic locations, as revealed by the RR. GB also shows a similar pattern albeit not as remarkable. Figure 1e and 1f present the changes of  and  of all meridional groups in GB and AU, parameterized by the ROI shape (sp). The variation of the shape starts from west-east orientated (  0.2 sp ), gradually growing into more rounded (  1.0 sp ) , then to north-south orientated (  5.0 sp ) ones. Two shapes with reciprocal sp values will have their major dimension swapped, that is, east-west versus south-north and vice versa. The result is inspected and summarized as follows:

Variation of GEV Parameters Due to the Area Shape
• For the majority of the meridional groups, there is little difference between the location-scale parameters of ROIs with reciprocal shapes, for example, two shapes with sp values of 0.2 and 5.0. This is regarded as a symmetric pattern around  1.0. sp • Generally,  and  of ROIs in an elongated shape are smaller than those of the ROIs in more rounded shapes. This indicates that the rounded-shape ROIs have a better chance to capture more rainfall extremes than the elongated ones. It also leads to that for the same area size, regions with more regular shape tend to have more extreme areal rainfall. • Overall, the effects of ROI shape are insignificant.

Quantification of Spatial Variation
Generalized Linear Models (GLMs) are based on an extension to the classical linear regression model, having been widely applied in hydrometeorology (Coe & Stern, 1982;Stern & Coe, 1984) and shown to be effective in terms of incorporating complex structures (Segond et al., 2006). Since Chandler and Wheater (2002) proposed a GLM-based framework for interpreting historical daily rainfall records and revealing the changes on rainfall occurrence and amount in western Ireland, many more applications have followed, for example, Yan et al. (2002), Yang et al. (2005), and Rashid et al. (2013) with good performance reported.
As the two meridionally averaged parameters  and  which reflect the property of rainfall extremes are shown to follow similar right-skewed gamma distributions ( Figure S2a), we broadly followed Chandler and Wheater (2002) and proposed a GLM with a log-link to quantify the spatial variation. The fitting starts from a simplest prediction form, and then adds other predictors or their combinations successively, for example,  x s,  s sp, or  x sp (James, 2002). To determine whether to keep the newly added predictor/combination, the significance of the new term is evaluated by calculating the log-likelihood at the significant level of 0.05 (detailed in Text S2). Finally, an optimal form of the GLMs is obtained by considering both the log-likelihood and the discrepancy (e.g., root-mean-squared-error) and identified as: where the subscripts of β refer to the study case in question. A maximum likelihood estimator (McCullagh, 2018) was employed to obtain β (see the estimations in Table S1). These fitted GLMs are visualized in Figure 2a and 2b which show the following intriguing spatial features: ). It means that the north-south variation in AU is in general smaller than that of the eastwest direction. Besides, the combined term (  x s) is significant, which means that the RR of  with respect to ROI size varies at different geographic locations. It is manifested by the uneven vertical gaps between contours in the right panel of Figure 2b The performance of the GLMs is evaluated by comparing the parameter values modeled by the GLMs and those from the originally fitted GEVs (Figure 2c), where in the GB case there are slight underestimates for some large values that appear in the western coastal region; and in the AU case, some overestimates happen for the small values which are located in the middle-south dry zone. The GLM model probability structure is checked by a residual analysis (McCullagh, 2018;Pierce & Schafer, 1986;Wang, 1987) whose results are shown in Figure 2d with a theoretical normal distribution shown on the x-axis and the residual quantiles on the y -axis. If the probability assumption (i.e., gamma assumption) is correct, all residuals would have the same distribution which is an approximate normal distribution. The distribution of the residuals of the four GLMs is symmetric with two flat sides. Generally, the approximation fits well except for the upper side which represents only 0.9% of the total data points.
In view of the research aims, this is considered to be acceptable. The locative continuity in the adjacent ROIs was compared using variograms (see Text S4) which show highly similar spatial correlations in the parameters modeled by the GLM and those from the originally fitted GEVs for both the GB and AU cases. The prediction skill of the fitted GLMs is tested using a k-fold-cross-validation (  10 k ) method which produces average Nash-Sutcliffe efficiency (NSE) coefficients across all random partitions of the four GLMs as 0.97, 0.89, 0.86, and 0.62 (see Text S3).
These quantitative findings regarding the spatial variation of the GEV parameters have two import implications to many downstream applications of the areal rainfall maxima, for example, FRM. For one aspect, the traditional approach in FRM makes use of point rainfall maxima to represent the areal one (catchment or a predefined area), where a scaling factor is involved. This simplistic treatment ignores the complexity in spatial distribution, nor can it account for the interplay of the size, location as well as shape of the area in question, as revealed above. For another, the overall quantification of the spatial variation of the GEV parameters (hence, the return values) makes it possible to study the FRM at country level as a single entity instead of looking at individual regions with isolations. It also helps to gain insights into how large scale hydroclimatic (rainfall) variation can affect the local FRM, which is very important for studying FRM under climate change impact.

Link to the Large-Scale Climatology of Rainfall Variation
The GEV distribution parameters that can reveal the characteristics of extreme rainfall in terms of both its amount and occurrence probability, are shown to have a strong spatial dependency as discussed in the previous sections. To understand how such spatial variation of the extreme rainfall is related to the climatology of rainfall variation, areal annual rainfall (AAR) time series from each ROI were obtained. The mean and standard deviation (SD) of the AAR series were then compared with the GEV parameters  and  of the AMDR series extracted from the same ROIs. To visualize the link, the spatial continuity of the corresponding parameters from both AAR and AMDR were represented by their variograms (see Figure s5) which show very little difference on locative continuity. Figure 3 demonstrates a great deal of similarity existing in space between the daily maxima time series (i.e., the AMDR) and the cumulative annual rainfall (i.e., the ARR). For example, regions with higher mean of AAR are not only represented by a higher SD (e.g., the circles located in west Scotland and west Wales of GB and in north-eastern coastal regions of AU, appearing more reddish and larger), they are also associated with higher GEV parameters of AMDR, and appear to be more heterogenous. This feature also exists in regions with low and more even annual rainfall distribution, but works in an opposite way (e.g., circles located in middle and eastern England of GB and middle-north zone of AU are all more bluish and smaller). These findings are consistent with those published in the series of climate reports of GB (M. Kendon

Conclusions
This paper presents a study on the spatial variation of extreme rainfall using two-century long datasets covering GB and AU. The AMDR series extracted from regions-of-interest (ROI, 11,011 in total) with various spatial properties (location, size, and shape), are individually fitted with GEV distributions whose parameters are then analyzed over the space. Four GLMs are developed to quantify these variations by involving the effect from the geographical location, area size, and shapes. From the results discussed previously, the following conclusions can be drawn: 1. The GEV distributions are shown to be able to model well the grid-based areal-AMDR for both GB and AU; more than 90% of the regions are better fitted with the Frechét distribution among the three GEV types. 2. The GEV location () and scale ( ) parameters present similar spatial patterns where a higher  is usually accompanied by a higher  indicating those regions that have higher amount of most frequent rainfall often observe a higher occurrence probability of extremes. 3. Geographic location is the most significant factor affecting the two GEV parameters. The spatial pattern in GB is an eastward-decreasing-banded pattern with no significant difference along north-south direction. In AU, a concentrically increasing pattern from middle-south zone to north-east coasts is found. 4. Increasing the region size will decrease both parameters which means a decrease of the most frequent AMDR amount and the occurrence probability of extremes. However, in AU, the rate of such decrease varies with regions as the combined impact of ROI location and size is also detected to be significant. 5. Compared with other spatial properties, the shape of ROI is detected as insignificant, even though, a symmetric pattern is found for regions with reciprocal spatial indexes. Also, regions of more elongated shapes tend to have small parameter values in contrast with those having regular/rounded shapes.
These findings offer a new quantitative insight in understanding the spatial variation of large-scale climatology of rainfall. The quantification of the extreme rainfall and its spatial dependencies are of great practical value in engineering design, for example, designed rainfall/floods for constructions. The methods employed in this study are specifically designed for large grid-based datasets, and thus can be readily applied to climate projections for evaluating the spatial heterogeneity of climate change impact, such as flooding and droughts. It should be noted that the quality of the underlying datasets, which have undergone a series of quality control measures, may still bring in large amount of uncertainties and should be addressed in further work. Additionally, impact of the density of the underlying data observations, that is, rain-gauges, and its variation over long term also need to be further studied.

Data Availability Statement
The open-source toolbox of spatial random sampling for grid-based data analysis (SRS-GDA toolbox, https://doi.org/10.5281/zenodo.4044626) developed by the authors used in this study.