
Equiratio cumulative distribution function matching as an improvement to the equidistant approach in bias correction of precipitation


  • Lin Wang,

    1. Center for Monsoon System Research, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China
    2. University of Chinese Academy of Sciences, Beijing, China
  • Wen Chen

    Corresponding author
    1. Center for Monsoon System Research, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China
    • Correspondence to: Dr W. Chen, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beiertiao 6, Zhongguancun, Beijing 100190, China.

      E-mail: cw@post.iap.ac.cn



Equidistant cumulative distribution function (CDF) matching has been used frequently in recent studies to bias-correct raw modeled precipitation. However, this brief discussion shows that applying the method can produce negative precipitation. A feasible alternative that avoids this problem is equiratio CDF matching, as proposed in this study. A real-world assessment based on the Coupled Model Intercomparison Project Phase 5 (CMIP5) confirms the effectiveness and robustness of equiratio CDF matching in systematically removing biases in modeled precipitation. Our conclusions call for a re-examination of the relevant literature in which equidistant CDF matching is used to bias-correct precipitation.

1. Introduction

Bias correction is a popular technique for postprocessing the raw output from general circulation models (GCMs), which usually suffers from biases due to uncertainty in parameterizing unresolved processes (Christensen et al., 2008). A realistic and reliable representation of future climate is crucial for impact and vulnerability assessment. Moreover, the importance of bias correction has been described in the special report of the Intergovernmental Panel on Climate Change (IPCC; Seneviratne and Nicholls, 2012). In recent years, much effort has been dedicated to investigating various postprocessing techniques, from simple additive and scaling corrections (Fowler and Kilsby, 2007) to more advanced quantile mapping approaches (e.g. Wood et al., 2002; Wood et al., 2004; Maurer and Hidalgo, 2008). The major advantage of quantile mapping is that it adjusts the whole cumulative distribution function (CDF) of a model simulation to agree with that of observations in a given reference period, rather than merely adjusting the mean and variance of the model output. Quantile mapping is a very efficient bias correction technique with many applications (e.g. Sharma et al., 2007; Piani et al., 2010). This method can be mathematically formulated as

$$x_{m-p}^{\mathrm{adj}} = F_{o-c}^{-1}\left(F_{m-c}(x_{m-p})\right) \tag{1}$$

where $x_{m-p}$ is the raw model value, $x_{m-p}^{\mathrm{adj}}$ is its bias-corrected counterpart, $F_{o-c}^{-1}$ is the quantile function (inverse CDF) corresponding to observations and $F_{m-c}$ is the CDF of GCM outputs in the reference period. Briefly, the quantile of a random variable at probability $p$ is a real number $x_p$ satisfying $F(x_p) = p$, where $F$ is the CDF. Accordingly, the quantile function expresses the quantile values as a function of probabilities (Gibbons and Chakraborti, 2011; Gilchrist, 2000). The underlying assumption of quantile mapping is that the future distribution of a variable of interest will remain similar to that in the reference period. However, this may not hold true, as argued by Li et al. (2010), who recently proposed the equidistant cumulative distribution function matching (EDCDFm) method as an improvement to the traditional method. EDCDFm explicitly considers the change of the distribution in the future. This improved quantile mapping method can be mathematically written as

$$x_{m-p}^{\mathrm{adj}} = x_{m-p} + F_{o-c}^{-1}\left(F_{m-p}(x_{m-p})\right) - F_{m-c}^{-1}\left(F_{m-p}(x_{m-p})\right) \tag{2}$$

where $F_{m-p}$ is the CDF of the model for a future projection period, and $F_{o-c}^{-1}$ and $F_{m-c}^{-1}$ are the quantile functions for observations and model in the reference period, respectively. Figure 1(a) illustrates how EDCDFm works. Though it incorporates the change in distribution, the fundamental assumption of this method is that the difference between modeled and observed values at a given percentile over the reference period will be preserved in a future period. By performing a synthetic experiment, Li et al. (2010) concluded that the EDCDFm method is superior to the traditional method. Though recently proposed, it has been widely cited in scholarly articles. In addition, several other studies have used EDCDFm to bias-correct precipitation (e.g. Sun et al., 2011). However, there are problems with this method when it is applied to precipitation, as described in the following section. We applied the same method to precipitation data from phase five of the Coupled Model Intercomparison Project (CMIP5) and found that numerous negative values result.
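The two mappings in Equations (1) and (2) can be sketched with empirical distributions. The following is a minimal illustration only, not the implementation of Li et al. (2010): the function names, the toy samples and the plotting-position convention (linear interpolation between order statistics) are all assumptions made here for the example.

```python
from bisect import bisect_right

def ecdf(sample, x):
    """Empirical CDF with linear interpolation between order statistics (assumed convention)."""
    s = sorted(sample)
    if x <= s[0]:
        return 0.0
    if x >= s[-1]:
        return 1.0
    i = bisect_right(s, x) - 1          # s[i] <= x < s[i + 1]
    frac = (x - s[i]) / (s[i + 1] - s[i])
    return (i + frac) / (len(s) - 1)

def quantile(sample, p):
    """Empirical quantile function, the inverse of ecdf above."""
    s = sorted(sample)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def quantile_mapping(x, obs_ref, mod_ref):
    """Equation (1): pass x through the reference-period model CDF onto the observations."""
    return quantile(obs_ref, ecdf(mod_ref, x))

def edcdfm(x, obs_ref, mod_ref, mod_fut):
    """Equation (2): add the reference-period obs-minus-model difference at the percentile of x."""
    p = ecdf(mod_fut, x)
    return x + quantile(obs_ref, p) - quantile(mod_ref, p)

# Hypothetical toy samples: a wet-biased model that dries in the future.
obs_ref = [1, 2, 3, 4, 5]
mod_ref = [2, 4, 6, 8, 10]
mod_fut = [0.5, 1, 1.5, 2, 2.5]

qm_value = quantile_mapping(6, obs_ref, mod_ref)
ed_value = edcdfm(1.5, obs_ref, mod_ref, mod_fut)
```

With these toy samples the equidistant correction of the median future value is negative, which is the behaviour examined in the next section.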

Figure 1.

Schematic illustration of equidistant CDFm (top) and equiratio CDFm (bottom). Solid black line represents the CDF of observations. Blue and red dotted lines show the CDF of model simulations over reference and future projection periods, respectively. Arrow origins and ends denote the raw and bias-corrected values, respectively.

2. Limitations of the equidistant approach and an improved method

As mentioned above, EDCDFm has some shortcomings when used for bias-correcting precipitation. For precipitation, each term on the right side of expression (2) is non-negative. However, because one of these terms is subtracted, we cannot guarantee that the bias-corrected output on the left side is non-negative. We give an idealized example to demonstrate this deficiency. To handle months in which precipitation equals zero, Li et al. (2010) used a mixed Bernoulli–gamma distribution. For illustration purposes, we use a pure two-parameter gamma distribution, which can be written as

$$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}, \quad x > 0 \tag{3}$$

where $k$ is a shape parameter, $\theta$ is a scale parameter, $x$ is the precipitation amount, and $\Gamma(k)$ is the gamma function. The distribution of observations over the reference period, and the model output for the reference and future projection periods, are constructed by setting three different pairs of shape and scale parameters in the gamma distribution, respectively, as shown in Figure 1. The solid black curve and the blue dotted curve show a positive (wet) bias in the model output in the reference period, while the red dotted line reveals a sharp decrease in precipitation in the future projection period. Consequently, negative values appear after bias correction, as denoted by the arrows in Figure 1(a).
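The idealized gamma experiment can be reproduced approximately in a few lines. In the sketch below the three parameter pairs are assumptions chosen here (they are not taken from the paper) so that the 80th percentiles fall close to the values 6, 11 and 4.28 used in the worked example of this section. For simplicity the shape parameter is restricted to integers, which gives the gamma CDF a closed form; the quantile function is obtained by bisection.

```python
import math

def gamma_cdf(x, k, theta):
    """CDF of a gamma distribution with integer shape k and scale theta (closed Erlang form)."""
    if x <= 0:
        return 0.0
    z = x / theta
    return 1.0 - math.exp(-z) * sum(z**i / math.factorial(i) for i in range(k))

def gamma_quantile(p, k, theta, tol=1e-10):
    """Invert gamma_cdf by bisection."""
    lo, hi = 0.0, theta
    while gamma_cdf(hi, k, theta) < p:   # expand the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, k, theta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed (shape, scale) pairs, picked for illustration only.
OBS_REF = (2, 2.0)   # observations, reference period
MOD_REF = (4, 2.0)   # model, reference period (wet bias)
MOD_FUT = (3, 1.0)   # model, projection period (drier)

def edcdfm_gamma(x):
    """Equidistant correction, Equation (2), using the three gamma distributions above."""
    p = gamma_cdf(x, *MOD_FUT)
    return x + gamma_quantile(p, *OBS_REF) - gamma_quantile(p, *MOD_REF)

corrected = edcdfm_gamma(4.28)   # negative under these assumed parameters
```

Under these assumed parameters the corrected value for a raw projection of 4.28 is below zero, and the same happens for most other raw values, since the reference-period model quantiles exceed the observed ones at every percentile.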

What is the best way to address the problem with using EDCDFm for precipitation bias-correction? The term $F_{o-c}^{-1}(F_{m-p}(x_{m-p})) - F_{m-c}^{-1}(F_{m-p}(x_{m-p}))$ on the right side of expression (2) can be termed the ‘quantile mapping factor’. For bias correction of temperature, this works well and can be further referred to as the ‘additive factor’. The most natural way to avoid the negative values that the additive factor can produce is to use a multiplicative factor, $F_{o-c}^{-1}(F_{m-p}(x_{m-p})) / F_{m-c}^{-1}(F_{m-p}(x_{m-p}))$, instead. This mirrors common practice for simpler corrections: additive adjustment of temperature is usually applied to correct the mean bias of the model, while scaling adjustment is often utilized for precipitation, as negative values may result from additive correction (Berg et al., 2012; Watanabe et al., 2012). For bias correction of precipitation, the statistical transformation can be written as

$$x_{m-p}^{\mathrm{adj}} = x_{m-p} \times \frac{F_{o-c}^{-1}\left(F_{m-p}(x_{m-p})\right)}{F_{m-c}^{-1}\left(F_{m-p}(x_{m-p})\right)} \tag{4}$$

which can be further referred to as equiratio CDF matching (EQCDF), because the underlying assumption of the multiplicative factor is that the ratio between the observed and modeled values at the same percentile during the reference period will also apply to the projection period. Specifically, for a given value $x_{m-p}$ and its associated percentile $F_{m-p}(x_{m-p})$ in a future projection period, the bias between the modeled and observed values at $F_{m-p}(x_{m-p})$ during the reference period is quantified by the ratio of $F_{o-c}^{-1}(F_{m-p}(x_{m-p}))$ to $F_{m-c}^{-1}(F_{m-p}(x_{m-p}))$. On the basis of the assumption that the ratio is preserved in the future period, the bias-corrected value is immediately obtained by multiplying $x_{m-p}$ by this ratio. In Figure 1(b), for example, suppose $x_{m-p} = 4.28$, so that $F_{m-p}(x_{m-p}) = 0.8$, as denoted by the red dotted line. The values of $F_{o-c}^{-1}(0.8)$ and $F_{m-c}^{-1}(0.8)$ are 6 and 11, respectively, and the bias at $F_{m-p}(x_{m-p}) = 0.8$ in the reference period, expressed as a ratio, is 6/11. Multiplying $x_{m-p} = 4.28$ by this ratio gives a bias-corrected value of 2.3. In contrast, if we apply the equidistant approach shown in Equation (2), the bias quantified by the difference between $F_{o-c}^{-1}(0.8)$ and $F_{m-c}^{-1}(0.8)$ is −5. Consequently, the resultant value after bias correction is −0.72, which is unacceptable for precipitation. Moreover, the equiratio CDFm method is applied to the same case described above, as shown in Figure 1(b). Owing to the wet bias inherent in the model, the equiratio approach reasonably pulls the raw values projected by the model to the left side of the graph in order to offset the positive biases, without producing negative values.
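The arithmetic of the worked example can be checked directly. The short sketch below takes the three numbers quoted in the text (4.28, 6 and 11) as given; the function names are hypothetical and are introduced only for this illustration.

```python
def equidistant(x, q_obs, q_mod):
    """Equation (2) evaluated at one percentile: shift x by the quantile difference."""
    return x + (q_obs - q_mod)

def equiratio(x, q_obs, q_mod):
    """Equation (4) evaluated at one percentile: scale x by the quantile ratio."""
    return x * (q_obs / q_mod)

x = 4.28        # raw projected value at the 0.8 percentile
q_obs = 6.0     # observed quantile at 0.8, reference period
q_mod = 11.0    # modeled quantile at 0.8, reference period

additive_result = equidistant(x, q_obs, q_mod)       # about -0.72
multiplicative_result = equiratio(x, q_obs, q_mod)   # about 2.33
```

Because the ratio of two positive quantiles is positive, the equiratio result keeps the sign of the raw value. A reference-period model quantile of exactly zero would make the ratio undefined, so that case would need separate handling in practice.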

3. Application and validation

3.1. Data and methodology

To assess the performance of equiratio CDFm, we rely on the simulations and projections from CMIP5. The 34 GCMs used in this study are, in alphabetical order: (1) ACCESS1-0, (2) ACCESS1-3, (3) bcc-csm1-1, (4) BNU-ESM, (5) CanESM2, (6) CCSM4, (7) CESM1-BGC, (8) CESM1-CAM5, (9) CMCC-CM, (10) CNRM-CM5, (11) CSIRO-Mk-3-6-0, (12) EC-EARTH, (13) FGOALS-g2, (14) FGOALS-s2, (15) FIO-ESM, (16) GFDL-CM3, (17) GFDL-ESM2G, (18) GFDL-ESM2M, (19) GISS-E2-H, (20) GISS-E2-R, (21) HadGEM2-CC, (22) HadGEM2-ES, (23) HadGEM2-AO, (24) inmcm4, (25) IPSL-CM5A-LR, (26) IPSL-CM5A-MR, (27) IPSL-CM5B-LR, (28) MIROC5, (29) MIROC-ESM, (30) MIROC-ESM-CHEM, (31) MPI-ESM-LR, (32) MPI-ESM-MR, (33) MRI-CGCM3 and (34) NorESM1-M. Detailed descriptions and relevant information can be found at http://cmip-pcmdi.llnl.gov/cmip5/. The historical run and the RCP4.5 run (Moss et al., 2010) cover the period from the mid-19th century to 2005 and from 2006 to the end of the 21st century, respectively. Here we concentrate on the period from 1901 to 2008. The observed precipitation (unit: mm month⁻¹) used in this study is a gridded dataset at 0.5° latitude–longitude resolution produced by the Climatic Research Unit of the University of East Anglia. The version employed in this study is CRU TS3.10 (Harris et al., 2013), the latest release of this high-resolution dataset and one of the best long-term gridded observational datasets for climate change research. Because the fine-resolution observations do not match the coarse resolution of the climate model simulations, we aggregate the observed data to each GCM's own resolution, which avoids the added uncertainty that an interpolation scheme would introduce. In addition, the bias correction procedure using equiratio CDFm is applied only to monthly precipitation data confined within China (on average, 450 grid cells within China), but it can be readily applied to other parts of the world or to daily outputs from GCMs.

Li et al. (2010) used the mixed Bernoulli–gamma function to approximate the precipitation distribution. However, if the precipitation data fail to match the a priori assumptions of the theoretical distribution, a correction based on the mixed Bernoulli–gamma distribution is not meaningful. We cannot guarantee that the theoretical distribution is suitable for all grid points and all GCMs, which will inevitably invalidate the automated procedure. Consequently, a more effective approach to bias correction is a nonparametric method, which has been shown to perform best at reducing biases in regional climate models (RCMs) in Norway (Gudmundsson et al., 2012). In fact, as the operator $F_{o-c}^{-1}(F_{m-p}(\cdot))$ in Equation (4) serves as a mapping from one of the quantiles of the modeled distribution in a projection period onto the same quantile of the observed distribution in the reference period, the quantile–quantile correspondence, denoted as $h_1$, can be constructed directly without introducing a theoretical distribution. A similar argument applies to $h_2$, which replaces $F_{m-c}^{-1}(F_{m-p}(\cdot))$. Finally, the equiratio CDFm method used in this study can be formulated as:

$$x_{m-p}^{\mathrm{adj}} = x_{m-p} \times \frac{h_1(x_{m-p})}{h_2(x_{m-p})} \tag{5}$$

In Equation (5), two transfer functions are constructed: the first, termed here $h_1$, maps the quantiles of the model in the projection period to those of the observations in the current period; the second, termed here $h_2$, maps them to the quantiles of the model in the current period. These two transfer functions are modeled by cubic smoothing splines. Note that the transfer functions are estimated separately for each grid box and each calendar month of the year.
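A nonparametric version of Equation (5) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: piecewise-linear interpolation between empirical quantiles stands in for the cubic smoothing splines, the number of quantile nodes is arbitrary, the toy samples are hypothetical, and returning zero when $h_2$ is not positive is a choice made here for dry cases.

```python
from bisect import bisect_right

def quantiles(sample, probs):
    """Empirical quantiles with linear interpolation between order statistics."""
    s = sorted(sample)
    out = []
    for p in probs:
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out

def make_transfer(src, dst, n_nodes=101):
    """Build h(x): a piecewise-linear map from the quantiles of src to those of dst."""
    probs = [i / (n_nodes - 1) for i in range(n_nodes)]
    xs, ys = quantiles(src, probs), quantiles(dst, probs)

    def h(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x) - 1          # xs[i] <= x < xs[i + 1]
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return h

def eqcdfm(x, h1, h2):
    """Equation (5): scale x by h1(x) / h2(x); zero is returned if h2(x) <= 0 (assumption)."""
    denom = h2(x)
    return x * h1(x) / denom if denom > 0 else 0.0

# Hypothetical toy samples for one grid box and one calendar month.
obs_ref = [1, 2, 3, 4, 5]
mod_ref = [2, 4, 6, 8, 10]
mod_fut = [0.5, 1, 1.5, 2, 2.5]

h1 = make_transfer(mod_fut, obs_ref)   # projection-period model -> observations
h2 = make_transfer(mod_fut, mod_ref)   # projection-period model -> reference-period model
corrected = [eqcdfm(x, h1, h2) for x in mod_fut]
```

In a real application the two transfer functions would be fitted once per grid box and calendar month and then reused for every value in the projection period, as is done here for the toy series.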

Split-sample cross-validation is used to evaluate the performance of the equiratio CDFm. Specifically, we use the jackknife method (Lafon et al., 2012), described as follows. First, the period 1901–1954 is chosen as the training period to calibrate the parameters, and the bias correction of precipitation is then performed for the remaining 54 years. Subsequently, the validation is conducted over the same period as the bias correction. This completes one set of cross-validation. Then, the training period is repeatedly shifted forward by 10 years, and the bias correction and validation are carried out for the remaining 54 years. In this study, six continuous 54-year periods, namely 1901–1954, 1911–1964, 1921–1974, 1931–1984, 1941–1994 and 1951–2004, are chosen as training periods and the corresponding remaining years as validation periods. The aim of this experimental design is to minimize the effects of the choice of training period and to guarantee the robustness of the assessment. As a result, this procedure generates a total of six sets of bias-corrected precipitation time series. To assess the performance of EQCDF, a score is required to quantify the remaining bias between observed and modeled statistics. For a given grid box, the statistics of interest for the observed and the bias-corrected precipitation, denoted as $S_o$ and $S_{bc}$, respectively, are calculated over the period in which observational data played no role in inferring the bias-corrected data; then the absolute error, defined as $|S_{bc} - S_o|$, is averaged over all six sets of bias-corrected precipitation to create a performance estimator (hereafter referred to as AAE). Likewise, the same procedure is applied to raw model output. Consequently, if the AAE derived from bias-corrected data is smaller than that calculated from modeled precipitation, the correction method can be said to improve the quality of raw model output.
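The cross-validation design described above can be written down compactly. The sketch below only enumerates the training and validation years and averages absolute errors; the function names and the example statistics are hypothetical, and the bias correction itself is left out.

```python
def training_windows(first=1901, length=54, step=10, n_sets=6):
    """Six 54-year training windows, each shifted 10 years forward."""
    return [(first + i * step, first + i * step + length - 1) for i in range(n_sets)]

def validation_years(window, start=1901, end=2008):
    """All years of the study period that lie outside the training window."""
    lo, hi = window
    return [year for year in range(start, end + 1) if not lo <= year <= hi]

def aae(stats_obs, stats_model):
    """Absolute error between paired statistics, averaged over the cross-validation sets."""
    errors = [abs(m - o) for o, m in zip(stats_obs, stats_model)]
    return sum(errors) / len(errors)

windows = training_windows()
n_validation = [len(validation_years(w)) for w in windows]

# Hypothetical mean precipitation (mm) for two validation sets at one grid box.
example_aae = aae([10.0, 10.0], [12.0, 7.0])
```

Each validation set contains the 54 years outside its training window, so for the later windows it combines years from both ends of the 1901–2008 record.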

3.2. Results

Showing the spatial pattern of AAE for every GCM would be impractical, so in this study we show the frequency distribution of AAE as an overall evaluation. Specifically, the AAE associated with the first two moments (mean and variance) is calculated individually for each grid cell, and the frequency distribution is subsequently obtained by splitting the AAE over all grid cells into equal-sized bins. Figure 2 shows such a frequency distribution for the mean for the GCM simulations before bias correction (first column) and after (second column), while Figure 3 depicts similar results, but for variance. In December, less than 5%, on average, of the grid cells show biases greater than 10 mm for the corrected model data (Figure 2(b)), while biases greater than 10 mm account for approximately 45% of all grid cells in the original simulation (Figure 2(a)). The same holds for June, when the number of grid cells with a bias of less than 20 mm increases by 45% after bias correction. Moreover, the probability of very large bias (>50 mm) in June is 22% for the uncorrected simulation, and this frequency drops to 3% if the equiratio CDFm technique is applied. In addition, a large spread in the bias occurs in June but is significantly damped after correction. After bias correction, the biases of mean precipitation are kept within 5 and 10 mm in December and June, respectively, for most grid nodes. Comparison between the two columns in Figure 2 reveals a dramatic reduction in the probability of relatively large biases. The higher probability of relatively small biases suggests that the EQCDF is quite effective in reducing the original model biases outside the training period. It is also noteworthy that the performance of equiratio CDFm is comparable across GCMs, indicating the robustness of this method.
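The frequency distributions discussed above amount to a histogram of per-grid-cell AAE values together with exceedance fractions. A minimal sketch, assuming a hypothetical list of AAE values and treating the last bin as an overflow bin (a choice made here, not stated in the text), could look like this:

```python
def frequency_distribution(values, bin_width, n_bins):
    """Fraction of grid cells in each equal-sized AAE bin; the last bin collects overflow."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v // bin_width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

def fraction_above(values, threshold):
    """Fraction of grid cells whose AAE exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)

# Hypothetical AAE values (mm) for six grid cells of one GCM.
aae_values = [1, 4, 6, 12, 30, 55]
freq = frequency_distribution(aae_values, bin_width=10, n_bins=5)
share_over_10mm = fraction_above(aae_values, 10)
```

Computing these fractions separately for raw and bias-corrected output, and for each GCM, gives the kind of before-and-after comparison summarized in the text.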

Figure 2.

Frequency distribution of AAE (for statistics: mean) over each GCM for raw model output and that after bias correction in December ((a) and (b)) and June ((c) and (d)). The numbers along the x-coordinate represent the corresponding GCMs, as described in Section 3.1. The y-coordinate denotes the AAE (unit: mm).

Figure 3.

Frequency distribution of AAE (for statistics: variance) over each GCM for raw model output and that after bias correction in December ((a) and (b)) and June ((c) and (d)). The numbers along the x-coordinate represent the corresponding GCMs, as described in Section 3.1. The y-coordinate denotes the AAE (unit: mm²).

Analogous results can also be found in the adjustment of variance. As in Figure 2, only the comparisons in December and June are shown in Figure 3. Though the effect of bias correction is less pronounced in December, as shown in Figure 3(a) and (b), further inspection indicates that the frequency of biases greater than 500 mm² is reduced from 19% to 6% after correction, and approximately 85% of all grid cells show a bias of less than 250 mm², compared with 69% for the raw model output. In June, as displayed in Figure 3(c) and (d), the frequency of biases less than 500 mm² increases by 22% after bias correction, and the remaining biases after applying EQCDF cluster tightly in a narrower interval from 0 to 500 mm². Finally, the equiratio CDFm provides consistent error reduction of variance across all GCMs. Not only the mean and variance discussed here, but also the systematic errors of skewness and kurtosis are remarkably reduced after performing equiratio CDFm (not shown). Additionally, in view of the absolute error reduction, it can be inferred that the geographical pattern of the bias-corrected fields based on equiratio CDFm will better reproduce the observations.

4. Discussion

The equidistant CDF matching proposed by Li et al. (2010) in the family of quantile mapping has been extensively used to bias-correct raw model output. However, this method can produce negative precipitation, which is not acceptable in the bias correction of precipitation. This situation typically occurs when the model has wet biases and the precipitation is projected to decrease. Furthermore, if the precipitation bias-corrected by EDCDFm is used to force impact models such as hydrological and agricultural models, the negative values will adversely affect the accuracy and reliability of regional climate impact studies. This article provides a feasible alternative, called equiratio CDFm, to solve this problem. To test the effectiveness and fidelity of equiratio CDFm, this technique is applied to bias-correct simulated monthly precipitation from an ensemble of 34 GCMs within the framework of CMIP5. As expected, the method performs well in reducing the bias of not only the mean but also other moments of the distribution. Moreover, even though each GCM has its own intrinsic bias behavior, the overall assessment presented above indicates that the equiratio CDFm method is not model dependent, which confirms the robustness of the proposed procedure. It is worth noting, however, that we do not intend to compare the equiratio CDFm with other bias-correction methods here, which would involve considerable work. In addition, its performance in adjusting other statistics of interest, e.g. extreme values based on daily-scale data (Chen et al., 2012), needs to be further evaluated. Rather, in light of the increasingly wide application of equidistant CDF matching, our purpose here is to point out the deficiencies of this method and the special precautions it requires, and to propose a feasible scheme to address this problem.


We thank the two reviewers for their constructive comments and suggestions that led to significant improvement of the manuscript. This work is supported by the National Natural Science Foundation of China Grants 41025017 and 41230527.

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work. There is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.