Understanding the Impact of Precipitation Bias‐Correction and Statistical Downscaling Methods on Projected Changes in Flood Extremes

This study evaluates five bias correction and statistical downscaling (BCSD) techniques for daily precipitation and examines their impacts on the projected changes in flood extremes (i.e., 1%, 0.5%, and 0.2% floods). We use climate model outputs from the Coupled Model Intercomparison Project Phase 6 (CMIP6) to conduct hydrologic simulations across watersheds in Iowa and determine historical and future flood extreme estimates based on generalized extreme value distribution fitting. Projected changes in these extremes are examined with respect to four Shared Socioeconomic Pathways (SSPs) alongside five BCSD techniques. We find the magnitude of the estimates of future annual exceedance probabilities (AEPs) are expected to increase under all SSPs, especially for the emission scenarios with higher greenhouse gases concentrations (i.e., SSP370 and SSP585). Our results also suggest the choice of BCSD impacts the magnitude of the projected changes, with the SSPs that play a more limited role compared to the choice of downscaling method. The variability in projected flood changes across Iowa is similar across the downscaling technique but increases as the AEP increases. Our findings provide insights into the impact of downscaling techniques on flood extremes' projections and useful information for climate planning across the state.


Introduction
Understanding how extreme flood quantiles are projected to change for different future climate scenarios is imperative for optimizing critical infrastructure and minimizing potential impacts on local communities (e.g., Arnell & Gosling, 2014;Lawrence, 2020).Developing flood projections at the regional scale typically requires the use of hydrologic modeling coupled with climate model outputs.However, hydrologic models require high spatial resolution for variables like precipitation and temperature, and downscaling is used to regionally scale climate model outputs with coarse spatial resolutions (e.g., ∼1°or ∼100 km).A common approach to accomplish this is to apply bias correction and statistical downscaling (BCSD) to the climate model outputs (e.g., Hundecha et al., 2016;Prudhomme et al., 2002;Teutschbein et al., 2011).Additionally, there are many BCSD techniques (e.g., quantile mapping, power transformations) and understanding the impact they have on flood projections is important when considering designing for the future.
Numerous studies have examined the performance of statistical downscaling on climate model outputs such as precipitation and shown that it is able to improve the representation of processes at the regional scale (e.g., Golian & Murphy, 2022;Gudmundsson et al., 2012;Maraun et al., 2010;Piani et al., 2009;Sunyer et al., 2015;Tapiador et al., 2020), making it suitable for hydrologic application (e.g., Quintero et al., 2022).However, studies such as Teutschbein et al. (2011) have indicated that the performance can vary among statistical downscaling methods and the choice of method can impact the climate change signal from the native climate model.Furthermore, the hydrologic driver (i.e., snowmelt, rainfall) of floods can play a key role in the impact of downscaling methods on flood projections.For example, Hundecha et al. (2016) showed that basins with a flood response driven by snowmelt and rainfall would have different flood projections depending on the selected downscaling method, whereas rainfalldominated basins tended to have more agreement among methods.Furthermore, they found that the selection of downscaling method accounted for up to 35%-60% of the total variance in some cases.In regard to flood frequency analysis, studies such as Camici et al. (2014) found that delta change projected a decrease in most flood quantile estimates, whereas quantile mapping projected an increase.These studies highlight the importance of considering how the choice of downscaling impacts flood projections.However, many of these studies focus on the previous version of the Coupled Model Intercomparison Project (CMIP5) and not specifically on the upper tail of the flood peak distribution, highlighting a need to understand the role of downscaling method on these extremes.
In Iowa, understanding projected changes in flood frequency extremes is important as communities across the state have experienced numerous flood events over the past few decades, leading to major economic impacts (e.g., Flanagan et al., 2020;Kraft et al., 2023;Smith et al., 2013).One example is the 2008 flood at Cedar Rapids (eastern Iowa), which was exacerbated by climate change (Villarini et al., 2020).Additionally, analysis of observational records highlight that across Iowa there are increasing trends in the number of flood events at the annual scale as well as for the summer season (e.g., Mallakpour & Villarini, 2015).Furthermore, flood projections using CMIP5 and Coupled Model Intercomparison Project 6 (CMIP6) climate models across Iowa indicate increasing trends in annual flood peak magnitudes for communities in the eastern half of the state under higher emission scenarios by the end of the century (Michalek et al., 2023a(Michalek et al., , 2023b)).With both observed and projected flood analysis indicating a changing nature of floods across Iowa, it is critical to assess the projected changes in flood frequency extremes to aid decision makers.However, none of the abovementioned studies examine flood frequency projections and only a brief analysis of the impact of BCSD has been performed for this region in terms of flood projections (Michalek et al., 2023c).Therefore, this study area is an ideal candidate to build up existing literature and examine the impact of BCSD on flood frequency projections.
Our objective is to examine the projected changes in flood extremes across Iowa using CMIP6 model outputs.In particular, we want to determine how the choice of BCSD applied to climate-model precipitation impacts the projected flood frequency changes based on hydrologic modeling results.Furthermore, we compare projected changes in flood extremes across four different Shared Socioeconomic Pathways (SSPs) to understand the role on flood frequency projections for Iowa.We examine projected changes in this hazard by comparing changes in magnitude and return period for basins across Iowa utilizing annual exceedance probabilities (AEPs) of 1% (100year), 0.5% (200-year), and 0.2% (500-year).
We organize the paper as follows.Section 2 describes the study region with relevant geographic information.Section 3 describes the methodology of our study, including the hydrologic model set up, climate models and BCSD procedure, and statistical analysis.Section 4 provides the results related to the projected changes in flood extremes across Iowa with relevant discussion, followed by Section 5, which describes the primary conclusions and future works.

Study Region
We focus on the State of Iowa (United States) whose basins drain into either the Missouri or Mississippi Rivers.The study area is agriculturally dominated with low relief and can be divided into seven distinct landforms.The landscape has a high drainage density in the Southern Iowa Drift Plains, Loess Hills, Iowan Surface, Northwest Iowa Plains, and Paleozoic Plateau, each of which consist of a loess cover (Prior, 1991).Iowa has a seasonally variable climate, with the largest amounts of rainfall occurring in summer (June to September) and snow in winter (December to February).For our analyses, we start with 90 U.S. Geological Survey (USGS) stream gages that have at least 30 years of annual flood peak observations from 1981 to 2014; we then utilize 44 of them for our primary analysis after validation (Section 3; Figure 1 triangles).The drainage areas of the gages range from 10 to 36,000 km 2 .

Hydrologic Modeling
The Iowa Flood Center (IFC) has developed a continuous rainfall-runoff model (i.e., the hillslope-link model (HLM); Mantilla et al., 2022) for the state that has been shown to perform well (Krajewski et al., et al., 2023a;Quintero et al., 2016;Quintero et al., 2020).For our hydrologic assessment, we simulate streamflow in Iowa with the TETIS version of HLM (Quintero & Velasquez, 2022).The HLM is a fully distributed hydrologic model based on the decomposition of the landscape into hillslopes and channels.It estimates runoff generation at the hillslope by simulating different processes.In the TETIS version of the model, snow accumulation and melting rates are derived from precipitation and temperature data based on the degree-day method (Martinec & Rango, 1981;Rango, 1995).The model simulates precipitation losses in vegetation and soil pore macrostructure.Surface infiltration considers the hydraulic soil properties of the hillslope, and the conditions of frozen ground during the cold season.Deep percolation and groundwater losses are also modeled, considering the hydraulic conductivity of soil at the subsurface.Total hillslope runoff is aggregated in the river channel from the contribution of overland flow, interflow, and base flow.A nonlinear hydrologic routing model transports flow in the channels and considers the scaling of the geomorphologic characteristics of the river network.The model equations are written as a system of ordinary differential equations (ODEs).A numerical solver that integrates the HLM benefits from the binary tree structure of the river network to solve the system of differential equations in an asynchronous manner, suitable for high performance computing environments (Small et al., 2013).In our implementation, the HLM solves the system of ODEs at an hourly time step, and we save outputs of streamflow simulations at a daily time scale for all stream links in Iowa.Annual maximum discharge can be determined for all channel links in the river network.Projections of precipitation, temperature, evapotranspiration, and frozen ground conditions are forced into the hydrologic model.The spatial and temporal resolution of these inputs are key for the performance of the model (see Quintero and Velasquez (2022) for more details).For details on model validation and performance with CMIP5 and CMIP6 inputs see Michalek et al. (2023aMichalek et al. ( , 2023b)).

Climate Model Data and Bias-Correction and Statistical Downscaling
We utilize 18 climate models from the CMIP6 project that have precipitation, temperature, and evapotranspiration for the historical, SSP126, SSP245, SSP370, and SSP585 scenarios as input to our hydrologic model (Table S1 in Supporting Information S1).Precipitation and temperature are utilized at a daily time scale, while evapotranspiration is monthly as it is standard for our hydrologic modeling approach described in Section 3.1.We conduct simulations from 1981 to 2100 (historical runs: 1981 to 2014; future projections: 2015 to 2100).Since precipitation's spatial resolution is key for model performance, we focus on the bias correction and statistical downscaling of this variable.For a historical reference, we use PRISM (Parameter-elevation Regressions on Independent Slopes Model) daily 4-km gridded climatological data set due to its high spatial resolution and the product being based on observations from a weighted regression analysis (Daly et al., 2002).We use PRISM products from 1981 to 2014 to conduct bias correction of precipitation, temperature, evapotranspiration, and downscaling of precipitation of the GCM forcings.To downscale and bias correct precipitation, we employ five methods: (a) detrended quantile matching (dqm; Cannon et al., 2015); (b) empirical quantile mapping (eqm; Amengual et al., 2012); (c) mean variance adjustment (mva); (d) power transformation (ptr; Leander & Buishand, 2007); and (e) quantile delta mapping (qdm; Cannon et al., 2015).We utilize functions developed by the Santander Meteorology Group (2015) to accomplish this task.For more details see Michalek et al. (2023c).

Evaluation and Statistical Analysis
We begin by assessing which of the 90 stream gage locations that have 30-years of observed flood peaks are suitable for the analyses of flood extremes based on the evaluation of hydrologic simulations.For a given gage location, we compare the empirical cumulative distribution functions (ecdfs) of the annual maximum discharges obtained from the USGS observations and HLM simulations based on outputs from each of the 18 climate models for the historical period  with the two-sample Kolmogorov-Smirnov (KS) test (e.g., Massey, 1951).The KS test is a non-parametric test with the null hypothesis that the ecdfs are the same, while the alternative hypothesis is that the ecdfs are statistically different from each other.This is performed for each downscaling method.We remove gages which have statistically significant test results for more than six of the 18 climate models across two or more downscaling methods.We apply the false discovery rate (FDR) to control for type I error in our multiple hypothesis testing at a significance level of 5%.Specifically, we apply the generalized FDR approach by Benjamini and Yekutieli (2001), which does not require assumptions about spatial dependence.
After filtering our initial gages based on the KS test mentioned in the previous paragraph, we perform flood frequency analysis for these locations based on the 3-parameter generalized extreme value (GEV) distribution (e.g., Coles, 2001), focusing on flood quantiles with AEPs of 1% (100-year), 0.5% (200-year), and 0.2% (500year).The fitting is conducted with the extRemes package in R assuming stationarity over three sets of 34-year periods: (a) USGS observations ; (b) historical simulations ; and (c) future scenarios .For this flood frequency analysis, we fit the GEV and estimate the quantiles for the three sets of simulations.The estimation is conducted by downscaling method (BCSD) and emission scenario (SSP).Furthermore, we perform a Monte Carlo simulation to estimate the 5th and 95th confidence levels using the quantiles estimated based on USGS observations to filter gage locations where our simulations reproduce observed flood quantiles.To explain, we only select sites where the ensemble median of flood quantiles across climate is within the observed uncertainty bounds for all three AEPs.
To summarize, we utilize a two-step process to select sites in which climate-model based historical simulations of annual flood peaks can reproduce the statistical distribution and extreme flood quantiles of the observations for a historical period (1981-2014) across all downscaling techniques.First, we compare ecdfs with the KS test and select gages where four or more downscaling methods can reproduce the historical ecdf for at least 12 climate models.Next, we fit a GEV distribution for the remaining sites based on each climate model and downscaling combination for the historical simulated annual flood peaks, and determine if the results are within the observed confidence intervals.This provides us with the final selection of gages to use for our primary analysis.
For the primary analysis of the flood quantiles, we project changes and the impact of BCSD methods in two ways.First, we scale each of the three AEP values for a given locations, scenario, and BCSD methods as follows: where Q i is the flood quantile for the 1%, 0.5%, or 0.2% AEP and Q 50 is the 50% AEP of the historical simulations, enabling a comparing among all sites regardless of their drainage area.Second, we compute the future AEPs with respect to the historical ones.For these analyses, no observations are utilized.
Finally, we examine changes in projected precipitation as it is the primary input to our model and see if it aligns with what is observed in flood quantile changes.We apply the precipitation intensity index (SDII) as follows: where RR i is the daily precipitation amount on wet days (W, RR > 1 mm) in a given period.We examine the change between the historical (1981-2014) and future (2066-2096) periods for each emission scenario at a given grid cell across Iowa.

Results and Discussion
Before we present our primary analyses, we begin with the validation of our hydrologic model by examining the performance of the observed flood peak distribution with each climate-model-simulated flood peak distribution.We subset the 90 USGS gage locations based on the percentage of the 18 climate models where the KS test did not produce a statistically significant result.Based on this test, we selected 70 sites that met our criteria.With the initial site screening complete, we focus on selecting sites where the annual exceedance discharge estimates for the historical period (1981-2014) for 1%, 0.5%, and 0.2% values are within the observed confidence intervals at a given location.with 44 locations which are utilized for the rest of our analysis.Finally, Figure 2 highlights that mva and eqm performed the best in reproducing the observed AEP estimates across Iowa, whereas qdm performed the worst.
Following model validation, we proceed to analyze the projection in flood quantile estimates considering probabilities of 1%, 0.5%, and 0.2%.The projections are compared against historical simulations using the AEP ratio derived from Equation 1 (refer to Figure 3) stratified by BCSD method and SSP. Figure 3 provides a clear depiction of the expected rise in median AEP discharge values across Iowa for all AEPs, SSPs, and BCSD techniques.For example, when analyzing the 1% exceedance probability (Figure 3, left column), the historical data show a median value of around 6 for each BCSD.In the future, this ratio is expected to vary between 8 and 12.Likewise, the median historical ratio for the 0.5% exceedance probability (Figure 3, middle column) is approximately 10, whereas future ratios are expected to fall between 12 and 16.Finally, for the 0.2% exceedance probability, the median historical AEP ratio stands at approximately 13, with future values projected to range from 15 to 27.Notably, the 0.2% AEP discharges exhibit the greatest variability across Iowa.It is worth highlighting that the method ptr demonstrates the most substantial increase across all SSPs and AEPs, while mva exhibits the least pronounced projected change.Although there are some inconsistencies observed in the quantile mapping methods, the overall outcomes are largely similar.
Our results agree with other studies such as Chen et al. (2013), who found difference in streamflow projections among statistical downscaling methods.They showed that mean-adjustment based methods of precipitation bias correction do not perform as well in capturing historical behavior of streamflow compared to distribution-based methods for basins across North America.Therefore, projections based on mva-corrected precipitation should be interpreted with caution.Furthermore, Hundecha et al. (2016), who examined the impact of statistical downscaling methods on projections of extreme flow indices in basins across Europe, found that methods such as quantile mapping and mean and/or variance adjustment generally showed agreement in projected change for rainfall-dominated flood regions in Europe.However, the authors showed decreases in snowmelt dominated basins.For Iowa, the maxima of floods typically occur between April and June (e.g., Veatch & Villarini, 2022;Villarini, Smith, Baeck, & Krajewski, 2011), with rainfall maxima between May and August (e.g., Villarini, Smith, Baeck, Vitolo, et al., 2011).Hence, there is a dual influence of snowmelt during the cold season and rainfall in the warm season on flood peaks.The differences observed across BCSD methods in Iowa (Figure 3) could be attributed to the varying roles of rainfall and snowmelt in driving these peaks as the methods may exhibit different projected changes based on whether rainfall or snowmelt is the predominant driver, as previously mentioned (Hundecha et al., 2016).Future works should incorporate seasonal downscaling performance when examining flood frequency projections for Iowa.
Another way to characterize projected changes in AEP discharges is to estimate the return period of the historical AEP discharge from the future distribution.We present these results in Figure 4, which highlights how the 100-, 250-, and 500-year floods are projected to shift at the end of century.For all SSPs and BCSDs, the return period for most sites across Iowa is expected to decrease, with the most extreme shifts found under higher emission scenarios (i.e., SSP370 and SSP585).For the 100-year flood (1% AEP), the return period in the future would range from ∼70-year under SSP126 to ∼40-year under SSP585 for most of Iowa's sites.For the 200-year flood (0.5% AEP), the return period is projected to decrease for most basins to less than 100 years under SSP370 and SSP585.The smallest change would be a shift to a median return period of ∼120 years under SSP126.For the 500-year return period, the median shift across sites ranges from ∼200 years (SSP126) to ∼140 years (SSP585).It should be noted that not all sites show this behavior as the whiskers of the plots in Figure 4 extend quite far in both directions, indicating that some sites are expected to have more extreme projected shifts in return period.This may be due to the variability in downscaling methods at different basin scales as results under the dqm method have the smallest variability across Iowa.Finally, we show that mva-based simulations produce the smallest shift in return period compared to the other methods.
To put this in context we can examine how this compares to other studies for the state of Iowa.For example, Slater et al. ( 2021) examined historical changes in the 20-, 50-, and 100-year floods with information multiple sites across Iowa indicating increases already observed by over 50% and the return period for a 50-year flood (1970) on average recurs every 8.5 years (2019).Furthermore, Villarini et al. (2015) determined that higher emissions scenarios are projected to result in stronger signal of change for the upper tail of the discharge distribution for the Racoon River in Iowa.Combining this information with our projections provides support that changes in flood frequency estimates will likely continue until the end of the 21st century and that, regardless of downscaling method, realistic projected changes are produced.However, these results should not be extrapolated to other regions of the United States or the world as downscaling methods do not necessarily always show agreement in the direction of change (e.g., Camici et al., 2014).
Finally, we shift our focus to examining the impact of downscaling methods on precipitation and how it aligns with our projected changes in flood frequency extremes.In Figure 5, we present the change in AEP compared to the basin-average change in SDII stratified by downscaling method.The changes align in a positive manner, where an increase in SDII corresponds to an increase in AEP discharge.This result can explain how wet day precipitation is one of the drivers of flood projections for our area.Furthermore, this analysis highlights directly how mva applied to precipitation produces a slight increase across all emission scenarios for basins across Iowa with a range of ∼0% (SSP126) to ∼5% (SSP585).Comparable results are noted in Sunyer et al. (2015) for projections of precipitation extremes in Europe, in which mva downscaling produced the smallest projected changes compared to quantile methods.Furthermore, studies such as Fang et al. (2015) indicate that downscaling methods like quantile mapping and power transformations work well for the representation of wet days and their intensity, supporting our results.We see agreement between these two BCSD methods (Figure 5) as those methods are nearly indistinguishable from one another, indicating that the application of mva may not be warranted for use across Iowa from a precipitation and flood perspective.Future work should examine the impact of statistical downscaling methods on individual precipitation events to better understand differences in large flood events.

Conclusions and Future Work
In this study we examined the projected changes in flood extremes at 44 watersheds across Iowa for the 2070-2099 period.We found that, independent of emission or downscaling method, events associated with these AEPs are projected to increase by the end of century for most sites across the state.Simulations utilizing mva downscaled precipitations had the least amount of change compared to other downscaling methods.Furthermore, the qdm downscaling method had the largest variability across sites in Iowa.Finally, for higher emission scenarios (i.e., SSP370 and SSP585), flood frequency extremes are projected to increase the most across BCSD methods.However, our results should be interpreted based on the assumptions utilized to conduct this study.
First, we focused on a single stationary distribution fit for a historical and future period (GEV).Other distributions would likely produce varying results and we encourage exploring this uncertainty similar to Lawrence (2020).
Another limitation of this study is we only focused on a single flood peak value per year.However, it has been shown that downscaling methods impact the seasonal flow regimes (Hundecha et al., 2016) and this could be a reason we see differences in downscaling methods as well.A final primary limitation of this study is that we did not examine the storm mechanisms driving peak flows within Iowa.We only examined the SDII as a proxy to summarize the overall precipitation behavior across the state.However, certain methods of statistical downscaling might do better at capturing processes more important for flood peak behavior.Furthermore, we assume stationarity of the statistical downscaling, meaning that the correction/shift computed during the historical period holds in the future.We encourage a more in depth analysis of statistically downscaled precipitation by comparing the results by the statistical downscaling with those by dynamic downscaling to understand the impact of these assumptions (Maraun et al., 2010).
Overall, our results provide an initial look at the projected changes in flood frequency extremes across Iowa with the most recent climate projections.We encourage stakeholders across the state to consider not only the impact of emission scenarios when incorporating climate projections into water resource planning and design but also the impact of downscaling method.Finally, future research should focus on comparing current climate projections with past climate model projections and other downscaling products to quantify agreement and uncertainty.
pronounced with higher emission scenarios • Projected increases in flood extremes become greater as the annual exceedance probability decreases Supporting Information: Supporting Information may be found in the online version of this article.

Figure 1 .
Figure 1.USGS gage locations (triangles) with major rivers (gray lines) in Iowa, United States.
Figure2shows the percentage of the 70 gage locations where the ensemble median of the climate model projections is within this site-specific confidence level based on the different BCSD methods.We only select locations where the three AEPs for all five BCSDs are within the observed confidence level, providing us

Figure 2 .
Figure 2. The percentage of sites for five BCSD techniques where the climate model ensemble median estimate is within the observed confidence intervals.The columns represent the three AEPs.The black line represents the percentage of sites selected (44 gages or 63% of all locations) for our primary analysis.

Figure 3 .
Figure 3. Boxplots of projected AEP discharge value scaled by the historical median flood quantile (Equation1) for all sites across Iowa.The whiskers represent the 5th and 95th percentiles.The columns represent the AEP quantiles of interest, and the rows are the climate model scenario.Each box plot is used to represent the discharge ratio for all 44 gage locations across Iowa for a given BCSD.

Figure 4 .
Figure 4. Estimates of future return period for given extreme AEPs.Each box plot is used to represent the future return period for all 44 gage locations across Iowa for a given BCSD.The whiskers represent the 5th and 95th percentiles.The red dashed line represents the AEP return period in the historical period for each column.The columns represent the AEP quantiles of interest, and the rows are the climate model scenario.

Figure 5 .
Figure 5.A comparison of the relative difference in historical and future AEPs for a given basin compared to the change in basin average wet day precipitation (SDII).The columns represent the AEP quantiles of interest, and the rows are the climate model scenario.