3.1. Data and methodology
To assess the performance of equiratio CDFm, we rely on the simulations and projections from CMIP5. The 34 GCMs used in this study are, in alphabetical order: (1) ACCESS1-0, (2) ACCESS1-3, (3) bcc-csm1-1, (4) BNU-ESM, (5) CanESM2, (6) CCSM4, (7) CESM1-BGC, (8) CESM1-CAM5, (9) CMCC-CM, (10) CNRM-CM5, (11) CSIRO-Mk-3-6-0, (12) EC-EARTH, (13) FGOALS-g2, (14) FGOALS-s2, (15) FIO-ESM, (16) GFDL-CM3, (17) GFDL-ESM2G, (18) GFDL-ESM2M, (19) GISS-E2-H, (20) GISS-E2-R, (21) HadGEM2-CC, (22) HadGEM2-ES, (23) HadGEM2-AO, (24) inmcm4, (25) IPSL-CM5A-LR, (26) IPSL-CM5A-MR, (27) IPSL-CM5B-LR, (28) MIROC5, (29) MIROC-ESM, (30) MIROC-ESM-CHEM, (31) MPI-ESM-LR, (32) MPI-ESM-MR, (33) MRI-CGCM3 and (34) NorESM1-M. Detailed descriptions and relevant information can be found at http://cmip-pcmdi.llnl.gov/cmip5/. The historical run and the RCP4.5 run (Moss et al., 2010) cover the periods from the mid-19th century to 2005 and from 2006 to the end of the 21st century, respectively. Here we concentrate on the period from 1901 to 2008. The observed precipitation (unit: mm month⁻¹) used in this study is a gridded dataset at 0.5° latitude–longitude resolution produced by the Climatic Research Unit of the University of East Anglia. The version of this high-resolution dataset employed in this study is CRU TS3.10 (Harris et al., 2013), which is among the best long-term gridded observational datasets available for climate change research. Because the observations have a finer resolution than the climate model simulations, we aggregate the observed data to each GCM's own resolution rather than interpolating the model output, which avoids the added uncertainty introduced by an interpolation scheme. In addition, the bias correction procedure using equiratio CDFm is applied only to monthly precipitation data within China (on average, 450 grid cells per GCM), but it can be readily applied to other parts of the world or to daily outputs from GCMs.
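The aggregation of fine-resolution observations onto a coarser grid can be illustrated with a minimal sketch. It assumes, purely for illustration, that the coarse grid is an integer multiple of the 0.5° grid and that cells are weighted by the cosine of latitude; real GCM grids are generally not integer multiples of 0.5°, so the remapping actually used in this study is necessarily more general. The function name `aggregate_to_coarse` and its parameters are hypothetical.

```python
import numpy as np

def aggregate_to_coarse(field, lats, factor):
    """Area-weighted block mean of a (nlat, nlon) field.

    Assumption: the target grid is exactly `factor` times coarser in both
    directions. Missing cells (NaN, e.g. ocean) are ignored in each block.
    """
    field = np.asarray(field, dtype=float)
    nlat, nlon = field.shape
    if nlat % factor or nlon % factor:
        raise ValueError("grid size must be a multiple of factor")

    # Cell area on a regular lat-lon grid is proportional to cos(latitude).
    weights = np.cos(np.deg2rad(np.asarray(lats, dtype=float)))[:, None]
    weights = np.broadcast_to(weights, field.shape)

    valid = ~np.isnan(field)
    weighted = np.where(valid, field * weights, 0.0)
    weights = np.where(valid, weights, 0.0)

    def block_sum(a):
        # Group rows and columns into factor-by-factor blocks and sum each.
        return a.reshape(nlat // factor, factor, nlon // factor, factor).sum(axis=(1, 3))

    num, den = block_sum(weighted), block_sum(weights)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)
```

Averaging the observations up to the model grid keeps the comparison at the scale the model actually resolves, which is the motivation given above for not interpolating.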
Li et al. (2010) used the mixed Bernoulli–gamma function to approximate the precipitation distribution. However, if the precipitation data fail to match the a priori assumptions of the theoretical distribution, a correction based on the mixed Bernoulli–gamma distribution loses its validity. We cannot guarantee that the theoretical distribution is suitable for all grid points and all GCMs, which undermines any automated procedure. Consequently, a more effective approach to bias correction is a nonparametric method, which was found to perform best at reducing biases in regional climate model (RCM) output over Norway (Gudmundsson et al., 2012). In fact, because the operator in Equation (4) maps a quantile of the modeled distribution in the projection period onto the same quantile of the observed distribution in the reference period, the quantile–quantile correspondence, denoted h1, can be constructed directly without introducing a theoretical distribution. A similar explanation applies to h2. Finally, the equiratio CDFm method used in this study can be formulated as:
In Equation (5), two transfer functions are constructed. The first, termed here h1, maps the quantiles of the model in the projection period to those of the observations in the current period; the second, termed here h2, maps them to the quantiles of the model in the current period. Both transfer functions are modeled with cubic smoothing splines. Note that the transfer functions are estimated separately for each grid box and each calendar month of the year.
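The role of the two transfer functions can be illustrated with a minimal sketch for a single grid box and calendar month. It assumes the equiratio form in which each projected model value is scaled by the ratio h1/h2 evaluated at its own quantile; this is one reading of the description above, not a transcription of Equation (5). Empirical quantiles stand in for the cubic smoothing splines, and the function names, the plotting-position formula, and the `eps` guard (ratio set to 1 where the model quantile is close to zero) are all assumptions made for illustration.

```python
import numpy as np

def plotting_positions(sample):
    """Empirical non-exceedance probability of each value within its sample."""
    sample = np.asarray(sample, dtype=float)
    ranks = np.argsort(np.argsort(sample))   # 0-based rank of each value
    return (ranks + 0.5) / sample.size       # strictly inside (0, 1)

def equiratio_cdfm(obs_cur, mod_cur, mod_proj, eps=1e-6):
    """Scale projected model values by the quantile-wise ratio obs/model.

    obs_cur  : observations in the current (training) period
    mod_cur  : model output in the current (training) period
    mod_proj : model output in the projection period (values to correct)
    """
    mod_proj = np.asarray(mod_proj, dtype=float)
    p = plotting_positions(mod_proj)

    h1 = np.quantile(obs_cur, p)   # same quantile in the observations
    h2 = np.quantile(mod_cur, p)   # same quantile in the current-period model

    # Assumption: leave values unchanged where the model quantile is ~0,
    # to avoid dividing by zero in dry months.
    ratio = np.ones_like(h1)
    safe = h2 > eps
    ratio[safe] = h1[safe] / h2[safe]
    return mod_proj * ratio
```

Because the correction is multiplicative, corrected precipitation cannot become negative, which is a natural property for a ratio-based scheme applied to a non-negative variable.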
Split-sample cross-validation is used to evaluate the performance of the equiratio CDFm. Specifically, we use the jackknife method (Lafon et al., 2012), described as follows. First, the period 1901–1954 is chosen as the training period to calibrate the parameters, and the bias correction of precipitation is then performed for the remaining 54 years. The validation is conducted over the same period as the bias correction. This completes one set of cross-validation. The training period is then shifted forward by 10 years at a time, and the bias correction and validation are carried out for the remaining 54 years in each case. In this study, six continuous 54-year periods, namely 1901–1954, 1911–1964, 1921–1974, 1931–1984, 1941–1994 and 1951–2004, are chosen as training periods, with the corresponding remaining years as validation periods. The aim of this experimental design is to minimize the effect of the choice of training period and to ensure the robustness of the assessment. The procedure thus generates a total of six sets of bias-corrected precipitation time series. To assess the performance of EQCDF, a score is required to quantify the remaining bias between observed and modeled statistics. For a given grid box, the statistic of interest is calculated for both the observed and the bias-corrected precipitation over the period in which observational data played no role in inferring the bias-corrected data; the absolute error, defined as the absolute difference between the two values, is then averaged over all six sets of bias-corrected precipitation to create a performance estimator (hereafter referred to as AAE). The same procedure is applied to the raw model output. Consequently, if the AAE derived from bias-corrected data is smaller than that calculated from the raw modeled precipitation, the correction method can be said to improve the quality of the raw model output.
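The cross-validation design and the AAE score can be sketched as follows for one grid box and one calendar month. The function names are hypothetical, and `correct` is a placeholder for any bias correction routine that takes training-period observations, training-period model output and validation-period model output; passing a function that returns the model output unchanged gives the AAE of the raw simulation.

```python
import numpy as np

def training_windows(first_start=1901, length=54, step=10, n_windows=6):
    """Six 54-year training windows, each shifted forward by 10 years."""
    starts = range(first_start, first_start + step * n_windows, step)
    return [(s, s + length - 1) for s in starts]

def aae(obs, model, years, windows, correct, statistic=np.mean):
    """Absolute error of `statistic`, averaged over all training windows.

    The statistic is evaluated only on years outside the training window,
    so the observations used for scoring played no role in the correction.
    """
    obs, model, years = (np.asarray(a) for a in (obs, model, years))
    errors = []
    for start, end in windows:
        train = (years >= start) & (years <= end)
        corrected = correct(obs[train], model[train], model[~train])
        errors.append(abs(statistic(obs[~train]) - statistic(corrected)))
    return float(np.mean(errors))

def no_correction(obs_train, mod_train, mod_valid):
    """Baseline: return the raw model output for the validation years."""
    return mod_valid
```

With `years = np.arange(1901, 2009)`, each of the six windows leaves 54 validation years, matching the design described above; replacing `statistic` with `np.var` gives the score for the second moment.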
Showing the spatial pattern of AAE for every GCM would be impractical, so in this study we show the frequency distribution of AAE as an overall evaluation. Specifically, the AAE associated with the first two moments (mean and variance) is calculated individually for each grid cell, and the frequency distribution is then obtained by sorting the AAE values of all grid cells into equal-sized bins. Figure 2 shows this frequency distribution for the mean, for the GCM simulations before bias correction (first column) and after (second column), while Figure 3 shows the corresponding results for the variance. In December, fewer than 5% of the grid cells, on average, show biases greater than 10 mm in the corrected model data (Figure 2(b)), whereas biases greater than 10 mm account for approximately 45% of all grid cells in the original simulation (Figure 2(a)). The same holds for June, when the number of grid cells with a bias of less than 20 mm increases by 45% after bias correction. Moreover, the probability of a very large bias (>50 mm) in June is 22% for the uncorrected simulation, and this frequency drops to 3% when the equiratio CDFm technique is applied. In addition, the large spread in the bias that occurs in June is significantly damped after correction. After bias correction, the biases of mean precipitation are kept within 5 mm in December and 10 mm in June for most grid cells. Comparison of the two columns in Figure 2 reveals a dramatic reduction in the probability of relatively large biases. The higher probability of relatively small biases suggests that EQCDF is quite effective in reducing the original model biases outside the training period. It is also noteworthy that the performance of equiratio CDFm is comparable across the different GCMs, indicating the robustness of the method.
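The binning step behind these frequency distributions can be sketched as follows for one GCM and one month. The bin width, the upper limit and the treatment of values above that limit (counted in the last bin) are assumptions chosen for illustration, as are the function names.

```python
import numpy as np

def aae_frequency(aae, bin_width, upper):
    """Fraction of grid cells falling in each equal-width AAE bin."""
    aae = np.asarray(aae, dtype=float)
    aae = aae[~np.isnan(aae)]                 # drop cells without data
    n_bins = int(round(upper / bin_width))
    edges = np.linspace(0.0, upper, n_bins + 1)
    clipped = np.clip(aae, 0.0, upper)        # values above `upper` go to the last bin
    counts, _ = np.histogram(clipped, bins=edges)
    return edges, counts / aae.size

def fraction_above(aae, threshold):
    """Share of grid cells whose AAE exceeds `threshold`."""
    aae = np.asarray(aae, dtype=float)
    aae = aae[~np.isnan(aae)]
    return float(np.mean(aae > threshold))
```

Statements such as "fewer than 5% of the grid cells show biases greater than 10 mm" correspond to `fraction_above(aae, 10.0)` evaluated for each GCM and then averaged across GCMs.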
Figure 2. Frequency distribution of AAE (for statistics: mean) over each GCM for raw model output and that after bias correction in December ((a) and (b)) and June ((c) and (d)). The numbers along the x-coordinate represent the corresponding GCMs, as described in Section 3.1. The y-coordinate denotes the AAE (unit: mm).
Figure 3. Frequency distribution of AAE (for statistics: variance) over each GCM for raw model output and that after bias correction in December ((a) and (b)) and June ((c) and (d)). The numbers along the x-coordinate represent the corresponding GCMs, as described in Section 3.1. The y-coordinate denotes the AAE (unit: mm²).
Analogous results are found for the adjustment of variance. As in Figure 2, only the comparisons for December and June are shown in Figure 3. Although the effect of bias correction is less pronounced in December, as shown in Figure 3(a) and (b), closer inspection indicates that the frequency of biases greater than 500 mm² is reduced from 19% to 6% after correction, and approximately 85% of all grid cells show a bias of less than 250 mm², compared with 69% for the raw model output. In June, as displayed in Figure 3(c) and (d), the frequency of biases less than 500 mm² increases by 22% after bias correction, and the remaining biases after applying EQCDF cluster tightly in a narrower interval from 0 to 500 mm². Finally, the equiratio CDFm provides a consistent reduction in the error of variance across all GCMs. Beyond the mean and variance discussed here, the systematic errors of skewness and kurtosis are also markedly reduced after applying equiratio CDFm (not shown). Additionally, given the reduction in absolute error, it can be inferred that the geographical pattern of the bias-corrected fields based on equiratio CDFm will better reproduce the observations.