Can Sub‐Daily Multivariate Bias Correction of Regional Climate Model Boundary Conditions Improve Simulation of the Diurnal Precipitation Cycle?

The diurnal cycle is often poorly reproduced in global climate model (GCM) simulations, particularly in terms of rainfall frequency and amplitude. While improvements in the regional climate model (RCM) with bias‐corrected boundaries have been reported in previous studies, they assumed that diurnal patterns are simulated correctly by the GCM, potentially leading to inaccuracies in the maximum rainfall timing and magnitude within the RCM domain. Here we provide the first examination of improvements to the diurnal cycle, within an RCM domain, achieved through the use of sophisticated bias‐corrected lateral and lower boundary conditions. Results show that the RCMs with bias‐corrected boundaries generally capture both rainfall timing and magnitude better, particularly in northern Australia, where a strong diurnal pattern in rainfall is prevalent. We show that correcting systematic sub‐daily multivariate bias in RCM boundaries improves the diurnal rainfall cycle, which is particularly important in regions where short‐term intense precipitation occurs.


Introduction
Predicting short-term precipitation events and their possible change into the future is of significant interest to water resource managers and stakeholders for evaluating the frequency of extreme precipitation and convective storms. Although Global Climate Models (GCMs) are capable of capturing precipitation patterns at daily or longer time scales (Westra et al., 2014), their ability to represent changes in sub-daily precipitation is questionable because of errors introduced by deficiencies in their discretization in time (Dai, 2006; Rosa & Collins, 2013) and space (Lee et al., 2007; Wang et al., 2007). In terms of the diurnal cycle, GCM simulations have generally reproduced rainfall frequency and amplitude poorly (Stephens et al., 2010). This is due to multiple factors, including limitations of the convective scheme that can make the models produce moist convection several hours early and that can depend on the horizontal model grid spacing (Wang et al., 2007, 2011).
Although GCM-driven RCM simulations generally show diurnal variation similar to that of the observations in the mentioned studies, they unfortunately suffer from inherent limitations of the input boundary conditions derived from the GCM data set, which contain systematic biases (Kim et al., 2020). These improper boundary conditions introduce bias in time and space that can be propagated into RCM outputs (Kim et al., 2021).
To overcome the scale gap and to reduce systematic bias, bias correction approaches have routinely been applied to the boundary conditions. Different techniques are in use for impact assessment, ranging from simple climatological correction (Bruyere et al., 2014; Xu & Yang, 2012) to sophisticated methods that potentially extend to temporal persistence and inter-variable relationships (Kim et al., 2020, 2023b; Rocheta et al., 2017). Rocheta et al. (2017) showed that more complex bias correction techniques generally produced a better representation of the rainfall pattern, and that correcting the sea surface temperature plays a crucial role in improving the simulation of rainfall. Extremes were also investigated by Kim et al. (2020) using several univariate bias corrections. They found that RCM simulation with a complex method represents the seasonal extremes better than the simple methods.
Previous work highlights the need for physical consistency, including inter-variable relationships, in the boundary conditions for RCMs (Kim et al., 2021; Rocheta et al., 2014). Recently, Kim et al. (2023b) investigated the RCM with multivariate bias-corrected boundaries because univariate bias correction techniques can produce physical inconsistencies among the variables. They showed that the complex bias correction approaches represent rainfall variability better than the univariate bias correction approaches. While some degradations were shown after the generation of the boundaries, the physical relationships between the atmospheric variables along the boundaries were preserved inside the domain. They highlighted that RCM with multivariate bias correction generally represents extreme events for three surface variables better than the RCMs with univariate bias correction.
Although improvements in RCM with bias-corrected boundaries have been reported in previous studies, they corrected bias at daily and longer time scales, assuming that diurnal patterns are preserved properly inside the domain. This may impact the RCM simulated maximum rainfall timing and magnitude.
Observational studies that have identified key features of the global diurnal cycle characteristics note that precipitation typically peaks at mid-afternoon, showing a stronger diurnal cycle over land than over the ocean (Dai, 2006;Watters et al., 2021).In addition, the diurnal amplitude is stronger in summer, and the diurnal cycle of precipitation accumulation is related to its occurrence (Watters et al., 2021).
Therefore, the present study evaluates RCMs with corrected boundary inputs, focusing on the simulation of the diurnal cycle of precipitation for summer (DJF) over Australia. The natural resource management Super-clusters (https://www.climatechangeinaustralia.gov.au/en/overview/methodology/nrm-regions/) are used as the regions where results are assessed. In all, four RCM simulations, which differ in the source and bias correction method of their lateral boundary conditions, are assessed.
The rest of the paper is organized as follows. The data sets and methods are described in Section 2. Section 3 presents the results, and Section 4 provides a discussion and conclusions.

Models and Data
In this study, we used the Australian Community Climate and Earth System Simulator Earth System Model Version 1.5 (ACCESS-ESM1.5) GCM simulation contributed to the internationally coordinated Coupled Model Intercomparison Project Phase 6 (CMIP6) (Ziehn et al., 2020). For the RCM simulations, we used the Weather Research and Forecasting model (WRF) with the Advanced Research WRF (ARW) dynamical core, version 4.2.1 (Skamarock et al., 2019). The ACCESS-ESM1.5 has a resolution of approximately 1.875°EW × 1.25°NS with 38 vertical hybrid sigma levels to 40 km from the surface.
The ERA5, the fifth-generation model reanalysis of the global climate from the European Centre for Medium-range Weather Forecasts (ECMWF) (Hersbach et al., 2020) with a resolution of 31 km and 37 pressure levels, was treated as an observation for correcting GCM biases. Here, the ERA5-driven RCM simulation was considered a "perfect" simulation.
The five variables in the RCM lateral boundary conditions (zonal wind u (m/s), meridional wind v (m/s), specific humidity q (g/kg), temperature T (K)) and sea surface temperature SST (K) were corrected toward ERA5. These variables are the only atmospheric and primary surface variables extracted from a GCM and subsequently introduced into the RCM through the lateral and lower boundary conditions, respectively. To correct the variables, the ERA5 variables were first regridded to the ACCESS-ESM1.5 grid, using conservative remapping for the specific humidity and bilinear interpolation for the other four variables. The RCM boundary conditions were built after bias correction, and all other variables remained identical. To ensure consistency, the pressures from the surface to the top were recalculated from the bias-corrected fields using the hypsometric equation. Thus, the bias correction procedure also includes adjustments to the pressure field, including the surface pressure.
Here, the model simulations cover 1982-2012 (31 years), and the first year was considered a spin-up period to remove issues related to the equilibrium state for the soil moisture (Chen et al., 2007; Cosgrove et al., 2003). The spin-up year was not included in the subsequent analysis.
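The pressure recalculation described above can be illustrated with a minimal sketch. The paper does not state how the integration was anchored or discretized, so the following assumes a known pressure at the lowest level, a layer-mean virtual temperature, and level heights in metres; the function names and the three-level column are hypothetical.

```python
import math

R_D = 287.05   # specific gas constant for dry air (J kg^-1 K^-1)
G0 = 9.80665   # standard gravity (m s^-2)


def virtual_temperature(temp_k, q_kgkg):
    """Approximate virtual temperature (K); q is specific humidity in kg/kg."""
    return temp_k * (1.0 + 0.61 * q_kgkg)


def pressure_profile(p_bottom, heights_m, temps_k, q_kgkg):
    """Integrate the hypsometric equation upward, layer by layer.

    Assumption: the pressure at the lowest level (p_bottom, Pa) is known and
    each layer uses the mean virtual temperature of its two bounding levels.
    """
    pressures = [p_bottom]
    for k in range(1, len(heights_m)):
        tv_mean = 0.5 * (virtual_temperature(temps_k[k - 1], q_kgkg[k - 1])
                         + virtual_temperature(temps_k[k], q_kgkg[k]))
        dz = heights_m[k] - heights_m[k - 1]
        pressures.append(pressures[-1] * math.exp(-G0 * dz / (R_D * tv_mean)))
    return pressures


# Hypothetical three-level column with bias-corrected T and q (q converted from g/kg).
profile = pressure_profile(101325.0, [0.0, 1000.0, 2000.0],
                           [288.15, 281.65, 275.15], [0.008, 0.006, 0.004])
```

Because the layer thickness depends on temperature and humidity, any change to T or q from bias correction changes the pressure at every level above it, which is why the pressure field has to be rebuilt after the correction.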

Bias Correction Approaches
Two bias correction approaches were applied to the RCM boundary conditions. The first corrected the multivariate relationship among the atmospheric variables (specific humidity, temperature, and zonal and meridional winds) and the surface variable (sea surface temperature) toward those of the reanalysis data. The second further corrected the distribution of these variables using quantile mapping (QM), which adjusts the RCM input boundaries at a sub-daily time scale to match the reanalysis data.

Multivariate Bias Correction (DMBC)
The daily multivariate recursive bias correction (DMBC) was implemented on a daily GCM data set to correct inter-variable relationships among the three-dimensional atmospheric and surface variables at the RCM input boundaries. This approach aims to correct climatological statistics as well as the lag0 and lag1 auto- and cross-correlation attributes and has been used in previous studies (Kim et al., 2023b; Mehrotra & Sharma, 2015). It has shown improvement in representing mean, variance, persistence, and physical consistency between the atmospheric variables at multiple time scales. DMBC improved the simulation of extreme events compared to the univariate bias corrections and generally represented the rainfall characteristics better (Kim et al., 2023b). The DMBC was applied with respect to ERA5 before downscaling over 31 years.
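The standardized daily GCM and observed vectors with zero mean and unit variance are denoted $\mathbf{Z}^{g}_{t}$ and $\mathbf{Z}^{o}_{t}$, respectively, and bold capital letters indicate matrices. A simplified Multivariate first-order AutoRegressive model (MAR1) that forms the basis for DMBC can be expressed as (Salas, 1980):

$$\mathbf{Z}^{o}_{t} = \mathbf{C}\,\mathbf{Z}^{o}_{t-1} + \mathbf{D}\,\boldsymbol{\varepsilon}_{t} \tag{1a}$$

and

$$\mathbf{Z}^{g}_{t} = \mathbf{E}\,\mathbf{Z}^{g}_{t-1} + \mathbf{K}\,\boldsymbol{\varepsilon}_{t} \tag{1b}$$

where C and D are the coefficient matrices that contain the lag0 and lag1 cross-correlations of the reanalysis data. E and K are the coefficient matrices of the GCM data. A simplified model used here considers C and E as diagonal matrices to avoid many parameters that may cause estimation errors (Kim et al., 2023b). Equation 1b can be rearranged for the random vector $\boldsymbol{\varepsilon}_{t}$ as follows:

$$\boldsymbol{\varepsilon}_{t} = \mathbf{K}^{-1}\mathbf{Z}^{g}_{t} - \mathbf{K}^{-1}\mathbf{E}\,\mathbf{Z}^{g}_{t-1} \tag{2}$$

where $\boldsymbol{\varepsilon}_{t}$ is a standardized vector after removing the correlation attributes from the GCM data series. Equation 2 is then used to obtain a modified $\dot{\mathbf{Z}}^{g}_{t}$ that maintains the observed lag0 and lag1 attributes as follows:

$$\dot{\mathbf{Z}}^{g}_{t} = \mathbf{C}\,\dot{\mathbf{Z}}^{g}_{t-1} + \mathbf{D}\mathbf{K}^{-1}\mathbf{Z}^{g}_{t} - \mathbf{D}\mathbf{K}^{-1}\mathbf{E}\,\mathbf{Z}^{g}_{t-1} \tag{3}$$

Adding back the means and standard deviations of observed data provides bias-corrected attributes with appropriate means, standard deviations, lag1 auto-, and lag0 cross-dependence.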
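Matrices C and D or E and K can be derived as follows (Matalas, 1967):

$$\mathbf{C} = \mathbf{B}_{1}\mathbf{B}_{0}^{-1}$$

and

$$\mathbf{D}\mathbf{D}^{T} = \mathbf{B}_{0} - \mathbf{B}_{1}\mathbf{B}_{0}^{-1}\mathbf{B}_{1}^{T}$$

where $\mathbf{B}_{0}$ and $\mathbf{B}_{1}$ are the lag0 and lag1 cross-correlation matrices, and the elements of D and K can be found by singular value decomposition of $\mathbf{D}\mathbf{D}^{T}$ and $\mathbf{K}\mathbf{K}^{T}$, respectively. The elements of C or E, and D or K of simplified MAR1 corresponding to variables i and j are defined as:

$$c_{ii} = b^{1}_{ii}, \qquad c_{ij} = 0 \;\; (i \neq j)$$

and

$$\left(\mathbf{D}\mathbf{D}^{T}\right)_{ij} = b^{0}_{ij}\left(1 - c_{ii}\,c_{jj}\right)$$

The elements of $\mathbf{B}_{0}$ and $\mathbf{B}_{1}$ corresponding to variables i and j can be derived from the sets of standardized time series as:

$$b^{0}_{ij} = \frac{1}{N}\sum_{t=1}^{N} z_{i,t}\,z_{j,t}$$

and

$$b^{1}_{ij} = \frac{1}{N}\sum_{t=2}^{N} z_{i,t}\,z_{j,t-1}$$

where N is the total number of data points.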
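The non-periodic simplified MAR1 correction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses two synthetic variables, keeps the first time step unchanged, and factors the residual covariance with a Cholesky decomposition rather than the singular value decomposition named in the text (any factor of the same covariance preserves the lag0 attributes). All function names are hypothetical.

```python
import math
import random


def standardize(x):
    """Return the series with zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x]


def lag_corr(a, b, lag):
    """Sample correlation between a[t] and b[t - lag] for standardized series."""
    return sum(a[t] * b[t - lag] for t in range(lag, len(a))) / len(a)


def cholesky(m):
    """Lower-triangular L with L L^T = m (used here instead of the SVD)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def mar1_params(z):
    """Diagonal lag1 coefficients and a factor of the residual covariance."""
    nvar = len(z)
    c = [lag_corr(z[i], z[i], 1) for i in range(nvar)]
    g = [[lag_corr(z[i], z[j], 0) * (1.0 - c[i] * c[j]) for j in range(nvar)]
         for i in range(nvar)]
    return c, cholesky(g)


def forward_solve(low, b):
    """Solve low @ x = b for a lower-triangular matrix."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def mar1_correct(z_gcm, z_obs):
    """Strip the GCM dependence (Eq. 2), then impose the observed one (Eq. 3)."""
    c, d = mar1_params(z_obs)
    e, k = mar1_params(z_gcm)
    nvar, n = len(z_gcm), len(z_gcm[0])
    out = [[z_gcm[i][0]] for i in range(nvar)]  # assumption: first step kept as is
    for t in range(1, n):
        resid = [z_gcm[i][t] - e[i] * z_gcm[i][t - 1] for i in range(nvar)]
        eps = forward_solve(k, resid)
        for i in range(nvar):
            d_eps = sum(d[i][j] * eps[j] for j in range(nvar))
            out[i].append(c[i] * out[i][t - 1] + d_eps)
    return out


def synthetic_pair(n, phi, rho, rng):
    """Two AR(1) series with lag1 coefficient phi and innovation correlation rho."""
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        x.append(phi * x[-1] + a)
        y.append(phi * y[-1] + rho * a + math.sqrt(1 - rho ** 2) * b)
    return [standardize(x), standardize(y)]


rng = random.Random(0)
z_obs = synthetic_pair(5000, 0.8, 0.6, rng)   # stand-in for reanalysis
z_gcm = synthetic_pair(5000, 0.3, 0.1, rng)   # stand-in for raw GCM
z_cor = [standardize(s) for s in mar1_correct(z_gcm, z_obs)]
```

In this toy setting the corrected series should carry lag1 auto- and lag0 cross-correlations close to those of the stand-in observations, while the time sequence of the innovations still comes from the GCM series.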
MAR1 can be defined with periodic parameters that are derived for each period separately.
In this periodic form, matrices C and E (or D and K) are derived for each period, where τ is the time interval and y is the year, and $Z^{g}_{y,\tau}$ indicates the GCM periodic series having zero mean and unit variance. The elements of D or K of the simplified periodic MAR1 are defined for each pair of variables i and j. The outputs were then combined with the nesting approach (Kim et al., 2023b), so that the daily GCM data incorporated the effect of bias correction at longer time scales: monthly, seasonal, and annual. This means that the bias-corrected GCM values exhibit the same persistence-related attributes as the observed values. The bias-corrected variables at multiple time scales can be employed as a form of weighting factor for the raw daily GCM data (Srikanthan & Pegram, 2009). In this weighting, $\dot{X}_{d,m,s,y}$ is the bias-corrected daily value and $X_{d,m,s,y}$ is the original daily value, where the subscripts d, m, s, and y indicate day, month, season, and year, respectively, and $X_{m,s,y}$, $X_{s,y}$, and $X_{y}$ indicate the values aggregated from daily to monthly, seasonal, and yearly, respectively.

Sub-Daily Multivariate Bias Correction (SDMBC)
Bias at the sub-daily time scale, if present beyond what is corrected at the daily level using DMBC, is addressed by correcting the entire frequency distribution of the variables at a 6-hourly time scale after implementing the multivariate bias correction at a daily time scale (Figure D in Supporting Information S1).
We first resampled 6-hourly variables to a daily time scale for multivariate bias correction and calculated sub-daily fractions (SF) to rescale the outputs to a 6-hourly time scale.
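The resample-and-rescale step can be sketched as below. The exact definition of SF is not given in the text, so this sketch assumes SF is the ratio of each 6-hourly value to the mean of its day; the function names and sample values are hypothetical.

```python
def daily_means(six_hourly):
    """Average each consecutive group of four 6-hourly values into one daily value."""
    return [sum(six_hourly[i:i + 4]) / 4.0 for i in range(0, len(six_hourly), 4)]


def subdaily_fractions(six_hourly):
    """Ratio of each 6-hourly value to the mean of its day (assumed SF definition)."""
    means = daily_means(six_hourly)
    return [v / means[i // 4] for i, v in enumerate(six_hourly)]


def rescale_to_six_hourly(daily_values, fractions):
    """Spread daily values back to 6-hourly steps using the fractions."""
    return [daily_values[i // 4] * f for i, f in enumerate(fractions)]


# Hypothetical two days of 6-hourly temperature (K).
raw = [288.0, 291.0, 296.0, 290.0, 287.0, 290.0, 295.0, 289.0]
sf = subdaily_fractions(raw)
corrected_daily = [d + 1.5 for d in daily_means(raw)]  # stand-in for DMBC output
corrected_6h = rescale_to_six_hourly(corrected_daily, sf)
```

With a ratio-based fraction the daily mean of the rescaled series equals the corrected daily value by construction. A ratio is poorly defined when the daily mean is near zero, as can happen for wind components, so a different form of fraction may be needed there.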

The daily corrected variables (Ẋ) were then rescaled to 6 hr using SF, as noted above. For sub-daily bias correction, quantile-mapped bias-corrected SF was applied to the daily corrected variables so that the results can reproduce the observed distributional and dependence attributes (mean, standard deviation, lag1 auto-, and lag0 cross-correlation) at multiple time scales.
Quantile mapping (QM) generally corrects the distribution of modeled data based on observed data. QM was designed to preserve long-term quantile changes based on the model's cumulative distribution functions (CDF). The transformation function used in this study can be defined as follows:

$$\dot{SF} = F_{o}^{-1}\left(F_{g}\left(SF_{g}\right)\right)$$

where $F_{g}$ and $F_{o}$ are the CDFs of the raw GCM and the reanalysis data, respectively, and $F_{o}^{-1}$ is the inverse CDF corresponding to the reanalysis data. $SF_{g}$ and $\dot{SF}$ are the uncorrected and quantile-mapped corrected SF. Thus, the 6-hourly simulations rescaled using $SF_{g}$ and $\dot{SF}$ indicate corrections based on DMBC and SDMBC, respectively.
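An empirical version of this transformation can be sketched as follows. The text does not say whether empirical or fitted CDFs were used, so this sketch assumes empirical CDFs with a mid-point plotting position and linear interpolation between order statistics; `quantile_map` and the sample values are hypothetical.

```python
import bisect


def quantile_map(gcm_sample, obs_sample, values):
    """Map values through F_g, then through the inverse of F_o (empirical CDFs)."""
    gcm_sorted = sorted(gcm_sample)
    obs_sorted = sorted(obs_sample)
    n_g, n_o = len(gcm_sorted), len(obs_sorted)
    mapped = []
    for v in values:
        rank = bisect.bisect_right(gcm_sorted, v)       # number of GCM values <= v
        p = min(max((rank - 0.5) / n_g, 0.0), 1.0)      # plotting-position CDF
        pos = min(max(p * n_o - 0.5, 0.0), n_o - 1.0)   # fractional index into obs
        lo = int(pos)
        hi = min(lo + 1, n_o - 1)
        w = pos - lo
        mapped.append(obs_sorted[lo] * (1.0 - w) + obs_sorted[hi] * w)
    return mapped


# Hypothetical sub-daily fractions for one 6-hourly slot.
sf_gcm = [0.97, 0.98, 0.99, 1.00]
sf_obs = [0.95, 0.97, 0.99, 1.01]
sf_corrected = quantile_map(sf_gcm, sf_obs, sf_gcm)
```

When both samples have the same size, each GCM value is replaced by the reanalysis value of the same rank, so the corrected fractions take on the reanalysis distribution while keeping the GCM ordering in time.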

Performance Assessment
Several statistics were used to assess the RCM simulations with uncorrected and bias-corrected boundary conditions. The mean absolute error (MAE) was defined as:

$$\mathrm{MAE} = \frac{1}{I}\sum_{n=1}^{I}\left|M_{n} - O_{n}\right|$$

where $M_{n}$ and $O_{n}$ are the modeled and observed data at each grid cell, respectively, and I is the total number of grid cells.
The bias in the means between the model and observed data was calculated at each grid cell for each vertical level and can be represented as:

$$\mathrm{Bias} = \overline{M} - \overline{O}$$

To determine whether two samples, model simulations and observations, come from populations with the same distribution, the two-sample Kolmogorov-Smirnov (K-S) test was used. It employs the probability distribution of the quantity L, which is defined as the maximum absolute difference between the cumulative frequency distributions of the modeled ($F_{g}$) and observed ($F_{o}$) data as follows:

$$L = \max_{x}\left|F_{g}(x) - F_{o}(x)\right|$$

The null hypothesis is that both samples come from a population with the same distribution. It is rejected if the test statistic (L) is greater than the critical value ($L_{\mathrm{critical}} = c(\alpha)\sqrt{(n_{1} + n_{2})/(n_{1} n_{2})}$), where $n_{1}$ and $n_{2}$ are the sample sizes of the modeled and observed data, respectively, or if the p-value is lower than the threshold α (here, 0.05).
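The K-S decision rule and the grid-cell agreement percentage reported later can be sketched as follows. This sketch assumes the standard large-sample form of the critical value, with c(α) = sqrt(−ln(α/2)/2); the function names are hypothetical and the authors' own implementation may differ.

```python
import math


def ks_statistic(model, obs):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(model), sorted(obs)
    i = j = 0
    stat = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        stat = max(stat, abs(i / len(a) - j / len(b)))
    return stat


def ks_critical(n1, n2, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n1 + n2) / (n1 * n2))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))


def percent_agreement(model_cells, obs_cells, alpha=0.05):
    """Share of grid cells (%) where the null hypothesis is not rejected."""
    agree = 0
    for model, obs in zip(model_cells, obs_cells):
        if ks_statistic(model, obs) <= ks_critical(len(model), len(obs), alpha):
            agree += 1
    return 100.0 * agree / len(model_cells)
```

A grid cell counts as "in agreement" when its statistic does not exceed the critical value, so the percentage is simply the fraction of cells where the two distributions cannot be distinguished at the chosen α.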

Results
In this section, the bias correction approaches, DMBC and SDMBC, were evaluated along the boundary conditions in comparison to the reanalysis data. The RCM simulations were then compared to the ERA5-driven RCM simulation, considered the "perfect" boundary simulations, based on the K-S test across the Australasian CORDEX domain. Furthermore, we analyzed the magnitude of the range of the diurnal precipitation cycle in northern Australia. The outermost five grid cells were excluded as the relaxation zone.
The four RCM simulations used here are named RCM(ERA5), RCM(GCM), RCM(DMBC), and RCM(SDMBC), and indicate RCM with reanalysis-driven boundary conditions, RCM with uncorrected GCM boundary conditions, RCM with daily multivariate bias-corrected boundary conditions, and RCM with sub-daily multivariate bias-corrected boundary conditions, respectively.

Is Bias Correction in RCM Lateral Boundary Conditions at a Sub-Daily Scale Necessary?
We first assess the performance of the two bias correction approaches, DMBC and SDMBC, along the lateral boundary conditions at a sub-daily time scale. Both approaches correct inter-variable relationships as well as mean, variance, and persistence attributes at multiple time scales, which was previously shown to be effective by Kim et al. (2023b). Figures A1-A4 in Supporting Information S1 present scatter plots comparing the uncorrected and bias-corrected atmospheric variables to the ERA5 data sets at a daily time scale in terms of multivariate relationships for all vertical levels over 30 years for each boundary. The results indicate that both DMBC and SDMBC effectively correct the inter-variable relationship between the three atmospheric variables, with points tending to cluster along the 45° line.
We then evaluate the approaches at a sub-daily time scale (6-hourly) using the K-S test along the boundaries. Figure 1 presents a PDF for temperature (K) along the western boundary of each model for four seasons, DJF, MAM, JJA, and SON. The black dotted vertical line represents the critical value at a significance level of 0.05. If the K-S statistic calculated at a given grid cell is higher than the critical value marked by the black dotted line, the values for that model differ from those of ERA5. As seen in the figure, SDMBC is generally more similar to ERA5, with more than 76% of values in agreement. DMBC also shows improvement compared to the GCM, with over 19% of values in agreement, compared to 0.3%-1.4% for GCM. Although sub-daily correction was applied to the boundary variables, the results do not match those of ERA5 perfectly. The possible reasons for this are discussed in Section 4. Other variables along the east, north, and south boundaries are provided in Figures B1-B11 in Supporting Information S1.

Can the Diurnal Precipitation Pattern Be Improved Inside the RCM Domain?
This section investigates the model performance using the K-S test inside the RCM domain at a sub-daily time scale to assess whether the impact of bias correction can be preserved. Figure 2 shows the percentage of grid cells showing agreement between the models and ERA5-driven RCM simulation for precipitation over the Australasian CORDEX domain at a 3-hourly time scale over 30 years for four seasons, DJF, MAM, JJA, and SON.
The results show that RCMs with bias-corrected boundary conditions better represent the 3-hourly precipitation than RCM(GCM). RCM(GCM) produces relatively low agreement, ranging from 20% to 30% over the domain. This implies that bias corrections need to be used before downscaling to simulate diurnal patterns appropriately.
In contrast, RCM(DMBC) and RCM(SDMBC) improve the percentage, ranging from 50% to 70%. From the result, we see that RCM(SDMBC) generally presents similar performance when compared to RCM(DMBC) in all seasons, suggesting that the impact of correcting the sub-daily distribution is relatively small compared to the impact of bias correcting daily and longer timescales. Of interest here is whether this similarity is consistent across regions and seasons, or becomes more enhanced in seasons and areas where the diurnal cycle is significant.

Can the Timing of the Maximum and the Range of Diurnal Precipitation Be Better Represented?
Precipitation typically peaks at mid-afternoon, showing a stronger diurnal cycle over land. However, the GCM simulations often poorly reproduce the diurnal rainfall frequency and amplitude. This suggests that RCM simulations using boundary conditions generated from GCM data sets may have a bias in the simulation of precipitation at a sub-daily scale. Here we examine whether explicitly correcting this diurnal bias at the boundary improves the RCM simulation within its domain. Figure 3 shows the MAE of maximum 3-hourly precipitation timing and magnitude of the diurnal range (hereafter magnitude range) of 3-hourly precipitation (maximum minus minimum in a day) averaged over 30 years across the regions. The results show that RCMs with bias-corrected boundaries perform similarly in the simulation of maximum precipitation timing.
Although bias still can be seen even after correcting bias in the boundaries, the results show the effectiveness of bias correction for the diurnal precipitation cycle. RCM(DMBC) and RCM(SDMBC) generally show lower bias than RCM(GCM) except for maximum timing averaged over all seasons in northern Australia.
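The two diurnal metrics used in this section can be sketched as follows. The text does not specify whether they are computed per day or on a multi-year composite, nor how timing differences are measured, so this sketch assumes a composite mean diurnal cycle and a circular 24-hr difference; all names and values are hypothetical.

```python
def mean_diurnal_cycle(series, steps_per_day=8):
    """Average all days of a 3-hourly series into one composite day."""
    n_days = len(series) // steps_per_day
    return [sum(series[d * steps_per_day + s] for d in range(n_days)) / n_days
            for s in range(steps_per_day)]


def peak_hour(cycle, step_hours=3):
    """Hour of day (start of the interval) at which the composite cycle peaks."""
    return cycle.index(max(cycle)) * step_hours


def diurnal_range(cycle):
    """Maximum minus minimum of the composite cycle."""
    return max(cycle) - min(cycle)


def hour_difference(a, b):
    """Absolute timing difference on a 24-hr clock (assumed circular)."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def mae(model_values, ref_values, diff=lambda m, r: abs(m - r)):
    """Mean absolute error over grid cells."""
    return sum(diff(m, r) for m, r in zip(model_values, ref_values)) / len(model_values)


# Hypothetical two days of 3-hourly precipitation (mm) at one grid cell.
cell = [0, 0, 0, 0, 1, 4, 2, 0, 0, 0, 0, 0, 3, 6, 2, 0]
cycle = mean_diurnal_cycle(cell)
timing, magnitude = peak_hour(cycle), diurnal_range(cycle)
```

Passing `hour_difference` as the `diff` argument of `mae` gives the timing error, while the default absolute difference gives the error in magnitude range; in both cases the reference would be the ERA5-driven simulation.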

Discussion and Conclusions
Global climate models (GCMs) often exhibit limitations in simulating sub-daily precipitation, in part due to errors introduced by deficiencies of the convection scheme, which lead the models to produce moist convection several hours earlier than the observations. While some improvements in reproducing the diurnal variability of rainfall occurrences using the RCMs have been noted in previous studies, their application is hindered by systematic biases inherent in the GCM data (Kim et al., 2023a). Although various mathematical approaches have been applied to the RCM boundary conditions, no study has yet been conducted to address the sub-daily correction, which is important for accurately predicting the maximum sub-daily rainfall timing and magnitude.
With this in mind, we conducted the first study of how well more sophisticated alternatives for correcting systematic bias at the sub-daily time scale on RCM boundary conditions improve the diurnal precipitation patterns within the domain, particularly in northern Australia, where intense sub-daily rainfall is often present (Guerreiro et al., 2018; Minobe et al., 2020; Westra et al., 2012).
We find that sub-daily bias correction of the lateral boundary conditions is effective at improving the sub-daily representation of the RCM input variables. We also find that RCMs with bias-corrected boundary conditions improved the simulation of sub-daily precipitation, and the findings can be summarized as follows.
RCM with uncorrected boundary conditions, RCM(GCM), exhibited a significant bias in simulating 3-hourly precipitation across the Australasian CORDEX domain, with only 0.3%-1.4% agreement for temperature according to the K-S test compared to the ERA5-driven RCM outputs. This indicates that systematic bias introduced by the GCM data sets through the input boundary conditions was not sufficiently reduced in the relaxation zone (five grid cells from the outermost zones of each boundary), causing a significant bias in simulating sub-daily precipitation within the domain. In contrast, the RCMs with bias-corrected boundary conditions, RCM(DMBC) and RCM(SDMBC), showed improvement with 19.9%-46.8% and 76.6%-84% agreement with the ERA5-driven RCM outputs, respectively. Although biases related to short-term precipitation were reduced, there are still discrepancies between the models and the ERA5-driven RCM outputs. Further investigation is necessary to enhance the model's ability to simulate the diurnal precipitation cycle.
RCM(DMBC) and RCM(SDMBC) showed generally similar performance over the domain, despite the latter being further corrected using sub-daily QM along the boundaries. This similarity indicates that most of the bias exists at daily and longer time scales and is corrected without consideration of the sub-daily biases. Also contributing to this similarity is the tendency for the relaxation zone to moderate the corrections made to the boundary conditions (Rocheta et al., 2020). Since the additional sub-daily correction is often quite small, it may not survive through the relaxation zone. The end result is that SDMBC often provides only a relatively small improvement over DMBC.
With regard to the model simulation for the precipitation maximum timing and magnitude, the RCM(DMBC) showed improvement compared to RCM(GCM), but the RCM(SDMBC) generally showed further improvement, exhibiting the lowest bias for magnitude range. An interesting finding from the results was that RCM(GCM) showed a relatively low bias in simulating the maximum precipitation timing, even during the high rainfall season (DJF). This aligns with a previous study that found that most RCMs tend to present more accurate diurnal range simulations during DJF-SON (Di Virgilio et al., 2019). This implies that the state-of-the-art GCM simulation used in this study may have the ability to capture the diurnal nature of monsoon rainfall such as that present in northern Australia.
The results demonstrated that sub-daily correction on the boundary conditions can improve sub-daily precipitation patterns and generally showed better performance, especially in northern Australia, which experiences a strong diurnal cycle in precipitation.
Although sub-daily correction using QM was applied to the boundary conditions, the distributions of the bias-corrected variables were not perfectly aligned with the reanalysis data (as shown in Figure 1). This suggests that the multivariate bias correction may undermine the effects of sub-daily correction, as the daily bias-corrected variables are converted into 6-hourly variables using a fraction factor determined by QM prior to multivariate bias correction. Moreover, while SDMBC showed notable improvement in the lateral boundary conditions, its impact on the diurnal precipitation cycle was reduced far from the boundaries, indicating that the RCM plays a significant role in driving the output variables. Despite these factors, SDMBC of RCM lateral boundary conditions consistently produced the best simulated climate within the RCM domain and should be considered for use within regional climate projection projects like CORDEX (Evans et al., 2021) or NARCliM (Ji et al., 2022).
Diurnal temperature range (DTR, quantified as the maximum minus minimum in a day) has also been investigated to evaluate the effect of bias correction (Figure E in Supporting Information S1). The result demonstrated that SDMBC generally showed better performance than other simulations.
It is worth noting that the performance of MBC on future climate projections can be influenced both by non-stationarity in the bias of inter-variable correlations (Guo et al., 2019; Kim et al., 2023b; Mehrotra & Sharma, 2016) and by the selection of GCM, as models that are reliable for historical climates do not necessarily predict future climates effectively (Hausfather et al., 2020; Mehrotra & Sharma, 2016; Räisänen, 2007). Additionally, the downscaled information derived from GCM data sets may not mitigate the uncertainty from natural climate variability (Deser et al., 2012; Hawkins & Sutton, 2009). Further investigation into future climate change is therefore required.

Acknowledgments
Youngil Kim is supported by the UNSW Scientia PhD Scholarship Scheme. J.P.E. was supported via the ARC Centre of Excellence for Climate Extremes (CE170100023) and the Australian Government under the National Environmental Science Program. This research was undertaken using the computational cluster Katana supported by Research Technology Services at UNSW Sydney. This research includes computations with the assistance of resources from the National Computational Infrastructure (NCI Australia), an NCRIS-enabled capability supported by the Australian Government. Open access publishing facilitated by University of New South Wales, as part of the Wiley - University of New South Wales agreement via the Council of Australian University Librarians.
RCM(SDMBC) better represents the magnitude range of the diurnal precipitation cycle compared to other RCM simulations, showing the lowest bias, particularly in northern Australia.From the map presented in Figure C in Supporting Information S1, RCM(SDMBC) produces bias close to zero, particularly in the northeast regions.

Figure 1. PDF of each model for temperature (K) along the western boundary over 30 years for all vertical levels for each season. SDMBC and DMBC presented here mean the sub-daily and daily multivariate bias correction, respectively. The black dotted line indicates a critical value calculated using the K-S test at each grid cell. The percentage represents the ratio of the number of grid cells that agree with ERA5 to the total number of grid cells.

Figure 2. The percentage of agreement based on the K-S test for precipitation of each model compared to RCM(ERA5) for four seasons, DJF, MAM, JJA, and SON, at a 3-hourly time scale over 30 years across the Australasian CORDEX domain. The models GCM, DMBC, and SDMBC presented here indicate RCM(GCM), RCM(DMBC), and RCM(SDMBC), respectively.

Figure 3. The mean absolute error of maximum timing and magnitude range of 3-hourly precipitation in a day over 30 years for all seasons and DJF across Australia (first row) and northern Australia (second row), following the natural resource management Super-clusters. The models GCM, DMBC, and SDMBC indicate RCM(GCM), RCM(DMBC), and RCM(SDMBC), respectively.