Precipitation of Mainland India: Copula‐based bias‐corrected daily CORDEX climate data for both mean and extreme values

Changes in mean and extreme precipitation characteristics with changing climate may lead to an increase in the frequency of hydrological extremes. To study the impacts of the changing climate on hydrological systems, precipitation simulated by General Circulation Models (GCMs) and Regional Climate Models (RCMs) is used. However, these products should be bias-corrected before being used in hydrological simulations to predict hydrological extremes. Most of the existing bias-correction techniques suffer from at least one of two limitations: (a) they only reduce bias in a selected precipitation quantile (either mean or extreme values), and/or (b) they exclude zero values from the analysis, even though their presence is significant in daily precipitation. In this study, a stochastic copula-based bias-correction method (Maity et al., J. Hydrometeorol., 20, 2019, 595), henceforth the RMPH method, is used that corrects the bias in any quantile (mean and/or extreme values) of daily precipitation, including zero values. The RMPH method is applied across the Indian mainland to correct bias in simulated precipitation from the Coordinated Regional Climate Downscaling Experiment (CORDEX). Due to diverse climatic conditions across India, the quality of bias-corrected precipitation is studied separately for different meteorologically homogeneous regions of the country. Despite the non-uniform distribution of raingauge stations for observed precipitation, the superiority of the bias-corrected precipitation (from the RMPH method) in correcting bias and retaining the seasonal variation across the country is evident when compared with a traditional bias-correction approach such as quantile mapping. The new bias-corrected precipitation dataset is particularly suited for hydrological simulations and for formulating extreme event mitigation strategies and climate change adaptation strategies.


| INTRODUCTION
The characteristics of precipitation in South Asia, including India, are expected to change in a changing climate (Ghosh et al., 2012; Kitoh et al., 2013; Krishnan et al., 2016; Mohan & Rajeevan, 2017; Qiu, 2008). The region is considered a climate 'hot spot' due to the combination of expected strong impacts of climate change and socio-economic factors such as high population density, a developing economy and low per capita income (De Souza et al., 2015). Given such challenges, a change in the characteristics of precipitation (whether in the mean or in any other quantile, including extremes) is expected to have extensive implications for the availability and management of water resources. This, in turn, is expected to affect energy and food security for the large population living in the region, which is about one-sixth of the world population. As precipitation is mostly concentrated in the monsoon months, about one-eighth of the Indian mainland is prone to flash floods during this season (NDMA, 2008). Most of north Bihar, Assam, regions of the Western Ghats and Gujarat regularly suffer from flash floods during the monsoon months. Many parts of coastal south India have experienced major floods in recent times, for example the Coromandel Coast flood in 2015, the Kerala floods in 2018, 2019 and 2020, and the East and West Godavari district flood in 2019. Moreover, flash flood events have been increasing throughout the country in the recent past (Cho et al., 2016; Houze Jr et al., 2017; Prasad & Singh, 2005; Seenirajan et al., 2017; Thayyen et al., 2013; Vishnu et al., 2019). Lower than normal rainfall results in meteorological drought, which may further develop into agricultural or hydrological drought (Maity et al., 2016). About one-third of the Indian mainland is prone to droughts (Subramanya, 2013), and both the intensity and areal extent of droughts in India are increasing (Mallya et al., 2016; Sharma & Mujumdar, 2017).
Prediction and hydrological simulation of these extreme events may help in formulating an effective mitigation strategy. Currently, future precipitation estimates for different climate change scenarios, designated by different Representative Concentration Pathways (RCPs), are provided by several Regional Climate Models (RCMs) under the Coordinated Regional Climate Downscaling Experiment (CORDEX) project. However, the RCMs are known to produce systematic under- or over-estimation of precipitation, limiting their applicability in hydrological studies. Moreover, many RCMs tend to show a drizzle effect (they generate too many wet days with light rainfall), underestimate heavy rainfall values and produce an incorrect seasonal variation (Christensen et al., 2008; de Elía et al., 2017; Fowler et al., 2007; Maraun et al., 2017; Schmidli et al., 2006; Teutschbein & Seibert, 2010). Hence, RCM-simulated precipitation requires bias-correction before being utilized in hydrological models. Reviews of different bias-correction methodologies can be found in the literature (Lafon et al., 2013; Pierce et al., 2015; Teutschbein & Seibert, 2010). However, most of the bias-correction methods suffer from at least one of the following two limitations: (a) they reduce bias only in a selected precipitation quantile (e.g., either mean or extreme values) and (b) they exclude zero values from the analysis, even though their presence is significant in daily precipitation. A copula-based bias-correction scheme (the RMPH model) was developed by Maity et al. (2019) that reduces the bias over the entire range of precipitation (including mean and extreme values) and corrects the seasonality of RCM simulations. In this study, a bias-corrected precipitation dataset is generated for the entire Indian mainland using the RMPH model. The quality of the bias-corrected precipitation is investigated over different hydro-meteorological regions across the study area.
When compared with the Quantile Mapping (QM) method, the bias-corrected product from the RMPH model is better for both the mean and extreme precipitation values across most parts of the Indian mainland. In the case of extreme precipitation, QM is found to over-estimate extreme precipitation during the dry period and under-estimate it during the monsoon months. Hence, the bias-corrected dataset from the RMPH model is expected to be more useful for many hydrological modelling and climate change-related studies.

KEYWORDS
copula-based bias-correction, CORDEX, mean and extreme precipitation

| METHODS
The RMPH model uses the entire range of daily precipitation values, including zeros, to correct bias in any quantile of precipitation with proper consideration of local climatic factors. The study area (Indian mainland) is climatologically diverse; hence, the quality of the bias-corrected data products is investigated separately for different meteorologically homogeneous regions of the study area.
An overview of the RMPH model is shown in Figure 1. The model utilizes bivariate copulas to model the association between RCM-simulated downscaled values (SDV) of precipitation from CORDEX and observed precipitation (OBS). The conditional distribution from the copula-based joint distribution is modified to account for the presence of zero precipitation. This is carried out in the following steps. First, the SDV-OBS pairs in the dataset are divided into three categories: (a) pairs in which both SDV and OBS are non-zero, (b) pairs in which OBS is zero and (c) pairs in which SDV is zero. From these categories, three sets of information are obtained during calibration of the RMPH model.
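The three-way split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_pairs` is hypothetical, and the handling of days on which both SDV and OBS are zero (assigned here to the zero-SDV group) is an assumption, since the text does not state it.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (sdv, obs), both in mm/day


def split_pairs(sdv: Sequence[float], obs: Sequence[float]):
    """Split paired daily values into the three calibration groups."""
    if len(sdv) != len(obs):
        raise ValueError("sdv and obs must have the same length")
    both_nonzero: List[Pair] = []  # group (a): SDV > 0 and OBS > 0
    obs_zero: List[Pair] = []      # group (b): SDV > 0 and OBS == 0
    sdv_zero: List[Pair] = []      # group (c): SDV == 0, any OBS (assumption)
    for s, o in zip(sdv, obs):
        if s <= 0.0:
            sdv_zero.append((s, o))
        elif o <= 0.0:
            obs_zero.append((s, o))
        else:
            both_nonzero.append((s, o))
    return both_nonzero, obs_zero, sdv_zero
```

Group (a) would feed the marginal and copula fitting, group (b) the zero-probability curve, and group (c) the distribution of OBS given zero SDV.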
The first set of data is converted to reduced variates (non-exceedance probabilities) using a best-fit marginal probability distribution function. The best-fit distribution is selected from a pool of parametric probability distribution functions on the basis of two criteria: (a) the fitted distribution should pass the chi-square (χ²) test at the 5% level of significance and (b) it should have the lowest Bayesian Information Criterion (Schwarz, 1978; Wit et al., 2012). Nine parametric probability distributions, namely, Exponential, Normal, Log-normal, Inverse Gaussian, Gamma, Beta, Generalized Pareto, Logistic and Log-logistic, are considered when selecting the best-fit distribution. If no probability distribution satisfies both criteria simultaneously, a nonparametric Gaussian kernel-based estimate of the probability distribution (hereinafter, nonparametric distribution) is used as the best-fit probability distribution function. The relationship between the reduced variates of OBS and SDV is modelled using the most suitable copula function out of four bivariate copulas, namely, Clayton, Frank, Gumbel and Gaussian. These copulas are selected as they have different tail dependence characteristics (Mao et al., 2015). The best-fit copula function is selected based on the smallest Cramér-von Mises statistic (Genest et al., 2009). The conditional distribution of observed precipitation given non-zero SDV is obtained from the fitted copula function:

$$F_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{\partial C\left(F_{X_1}(x_1), F_{X_2}(x_2)\right)}{\partial F_{X_2}(x_2)} \qquad (1)$$

where $C\left(F_{X_1}(x_1), F_{X_2}(x_2)\right)$ represents the selected copula function of the reduced variates of OBS, $F_{X_1}(x_1)$, and SDV, $F_{X_2}(x_2)$, obtained from the first set of data. $F_{X_1 \mid X_2}(x_1 \mid x_2)$ is the conditional distribution of OBS ($x_1$) conditioned on SDV ($x_2$), which can be used to predict observed precipitation given non-zero SDV.
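As an illustration of how a conditional distribution follows from a fitted copula, the sketch below uses the Clayton family, one of the four candidates named above, for which the partial derivative has a closed form. The function names and the parameter value used in any call are illustrative assumptions; the other three families would need their own derivatives.

```python
def clayton_cdf(u: float, v: float, theta: float) -> float:
    """Clayton copula C(u, v) for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_conditional_cdf(u: float, v: float, theta: float) -> float:
    """P(U <= u | V = v): partial derivative of C(u, v) with respect to v.

    Here u is the reduced variate of OBS and v the reduced variate of SDV.
    """
    if theta <= 0.0:
        raise ValueError("theta must be positive for the Clayton copula")
    if not (0.0 < u <= 1.0 and 0.0 < v <= 1.0):
        raise ValueError("u and v must lie in (0, 1]")
    base = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * base ** (-1.0 / theta - 1.0)
```

Evaluating `clayton_conditional_cdf` over a grid of `u` for a fixed `v` traces one conditional curve; repeating this for several values of `v` gives a family of such curves.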
However, RCM simulations may generate high simulated precipitation for low observed precipitation, and they may also produce incorrect seasonality (de Elía et al., 2017; Maraun et al., 2017). Further, many RCM outputs show a drizzle effect, i.e., many days with low simulated precipitation when observed precipitation is zero. To correct this type of bias, the obtained conditional distribution needs to be updated to include the probability of no/less precipitation. This information (the probability of no observed precipitation, i.e., zero OBS, given SDV) is obtained from the second set of data (i.e., pairs where OBS is zero). The probability of getting zero OBS is expected to decrease with increasing SDV. An exponentially decaying function is used to model this probability in the following form:

$$Y = a\,e^{bX} \qquad (2)$$

where Y is the probability of zero OBS and X is the value of SDV. The parameters a and b are estimated during calibration. For a decaying curve, b should be negative. For a specific value of X (i.e., SDV), the probability of zero OBS (say p, the value of Y) is obtained from Equation (2). It is then used to update the conditional distribution of OBS given positive SDV as follows:

$$\tilde{F}_{X_1 \mid X_2}(x_1 \mid x_2) = p + (1 - p)\,F_{X_1 \mid X_2}(x_1 \mid x_2) \qquad (3)$$

The expression can be used to generate a family of conditional distribution curves $\tilde{F}_{X_1 \mid X_2}(x_1 \mid x_2)$. These conditional distribution/simulation curves estimated during calibration can be used for simulating OBS given any non-zero SDV. Additionally, for the case when SDV is zero, the mixed distribution of observed values is estimated from the last set of data pairs as follows:

$$F_{X_1 \mid X_2 = 0}(x) = M + (1 - M)\,G_X(x) \qquad (4)$$

where M and $G_X(x)$ are the probability mass for OBS being zero and the cumulative probability distribution of non-zero OBS given zero SDV, respectively. Hence, from Equations (1), (3) and (4), a set of simulation curves for OBS given any value of SDV is obtained. Next, the optimal quantile is ascertained to correct the bias in the best possible way by minimizing the mean absolute error between the same month-wise statistics (mean or extreme) of OBS and the bias-corrected SDV (henceforth, bias-corrected values; BCV). The value of the optimal quantile varies spatio-temporally due to the varying nature of the bias for different regions (stemming from climatic/geographical differences) and different RCMs (due to varying drizzle effect and overestimated extreme precipitation across RCMs). Hence, the estimated optimal quantile differs spatially, and it is considered a model parameter. The optimal quantile is estimated during the calibration period, and it is used for simulating BCV during the validation/future period. Furthermore, the model is run seasonally, as the characteristics of the bias may change seasonally (given that some RCMs are poor in reproducing seasonality).
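The search for the optimal quantile can be sketched as a simple grid search. This is a hedged illustration: the `simulate` callable (which returns the bias-corrected value for a given SDV at quantile `q`), the candidate grid and the function names are all assumptions, and the monthly mean is used as the month-wise statistic; an extreme statistic could be substituted.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, Sequence, Tuple


def monthly_means(values: Sequence[float], months: Sequence[int]) -> Dict[int, float]:
    """Average the daily values that fall in each calendar month."""
    by_month: Dict[int, list] = {}
    for v, m in zip(values, months):
        by_month.setdefault(m, []).append(v)
    return {m: mean(vs) for m, vs in by_month.items()}


def optimal_quantile(
    sdv: Sequence[float],
    obs: Sequence[float],
    months: Sequence[int],
    simulate: Callable[[float, float], float],
    candidates: Iterable[float],
) -> Tuple[float, float]:
    """Return the candidate quantile with the lowest MAE between the
    month-wise statistics of OBS and of the simulated BCV."""
    obs_stat = monthly_means(obs, months)
    best_q, best_mae = None, float("inf")
    for q in candidates:
        bcv = [simulate(s, q) for s in sdv]
        bcv_stat = monthly_means(bcv, months)
        mae = mean(abs(bcv_stat[m] - obs_stat[m]) for m in obs_stat)
        if mae < best_mae:
            best_q, best_mae = q, mae
    return best_q, best_mae
```

In practice `simulate` would invert the calibrated simulation curve for the given SDV; the search would be repeated per location and season, which is why the selected quantile differs spatially.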
Next, the QM method is also applied, and the quality of the bias-corrected products from the RMPH and QM models is compared. QM reduces the bias in different quantiles on the assumption that the probability distribution of bias-corrected precipitation does not change when compared to the probability distribution of the observed precipitation (i.e., the probability distribution remains stationary with time). To apply QM, best-fit probability distribution functions are fitted to OBS and SDV for the calibration period. These are selected in the same way as for the RMPH model. The SDV precipitation bias-corrected by the QM method (hereafter QMC) is also provided in the dataset for comparison.
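For readers unfamiliar with quantile mapping, a minimal sketch is given below. Note the swap: it uses empirical distributions from the calibration samples rather than the fitted parametric distributions described in the text, purely to keep the example self-contained. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def quantile_map(x: float, sdv_sorted: Sequence[float], obs_sorted: Sequence[float]) -> float:
    """Map a simulated value to the observed value at the same empirical
    non-exceedance probability. Both calibration samples must be sorted
    in ascending order."""
    n_sdv, n_obs = len(sdv_sorted), len(obs_sorted)
    count = bisect_right(sdv_sorted, x)   # number of SDV values <= x
    # Integer ceiling of (count / n_sdv) * n_obs gives the nearest-rank index.
    k = -(-count * n_obs // n_sdv)
    k = min(max(k, 1), n_obs)             # clamp to the sample range
    return obs_sorted[k - 1]
```

Because the mapping is a single monotone transfer function per location, it cannot represent different OBS behaviour for the same SDV value, which is the flexibility the conditional simulation curves of the RMPH model are intended to add.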
To explore the quality of the bias-corrected precipitation, the correspondence between SDV and OBS is compared with that between BCV and OBS (or QMC and OBS). Four statistical measures, namely, the coefficient of determination (R²), the refined index of agreement (Dr), the unbiased root-mean-square error (uRMSE) and the mean absolute error/distance (MAE), are utilized to quantify the correspondence of SDV, BCV and QMC with the OBS dataset (Maity et al., 2016). Additionally, Taylor diagrams (Taylor, 2001) are analysed for the correspondence between the variables. The coefficient of determination is a measure of the fraction of variability of OBS explained by the other series, and it ranges from 0 to 1 (1 for the best possible association). The refined index of agreement is a measure of association conceptualized as the mean absolute distance between two series scaled by the mean absolute deviation of OBS from its mean. It varies from −1 to 1. The uRMSE is the root-mean-square error (RMSE) between the 'deviation from the mean' series obtained from two data series. Hence, the uRMSE is a measure of the bias in the variability of two data series. The MAE quantifies the total bias (arising from both bias in mean and bias in variability) between two series. Lower values of uRMSE and MAE indicate better correspondence.
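The four measures can be sketched as below. Two points are assumptions rather than statements from the text: R² is computed as the squared Pearson correlation, and Dr follows the commonly used refined index of agreement with scaling constant c = 2. Both functions that divide by the spread of OBS assume OBS is not constant.

```python
import math
from statistics import mean
from typing import Sequence


def mae(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean absolute error between the two series."""
    return mean(abs(s - o) for o, s in zip(obs, sim))


def urmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """RMSE between the two 'deviation from the mean' series."""
    ob, sb = mean(obs), mean(sim)
    return math.sqrt(mean(((s - sb) - (o - ob)) ** 2 for o, s in zip(obs, sim)))


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed definition of R^2)."""
    ob, sb = mean(obs), mean(sim)
    cov = sum((o - ob) * (s - sb) for o, s in zip(obs, sim))
    var_o = sum((o - ob) ** 2 for o in obs)
    var_s = sum((s - sb) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


def refined_index_of_agreement(obs: Sequence[float], sim: Sequence[float], c: float = 2.0) -> float:
    """Dr in [-1, 1]: absolute error scaled by c times the absolute
    deviation of OBS from its mean."""
    ob = mean(obs)
    abs_err = sum(abs(s - o) for o, s in zip(obs, sim))
    scaled_dev = c * sum(abs(o - ob) for o in obs)
    if abs_err <= scaled_dev:
        return 1.0 - abs_err / scaled_dev
    return scaled_dev / abs_err - 1.0
```

A series that differs from OBS only by a constant offset has uRMSE of zero and R² of one but non-zero MAE, which illustrates why the measures are reported together.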

| DESCRIPTION OF DATA USED
The quality of the bias-corrected products may vary due to diverse climatic conditions throughout the Indian mainland. Hence, to explore the spatial variation, four locations from each of the seven meteorologically homogeneous divisions in India (Kothawale & Rajeevan, 2017) are selected randomly. It should be noted that, due to inadequate data reliability, hilly regions were not considered by Kothawale & Rajeevan (2017), who demarcated five meteorologically homogeneous regions. However, as data are available for the hilly regions in this study, these regions are included in the analysis (hilly regions in north and north-east India are termed Hilly region 1 and 2, respectively). The study region is shown in Figure S2, and the details of the selected points are presented in Table S1 of the Supporting Information. The observed daily precipitation for 1901-2017 at a spatial resolution of 0.25° × 0.25° is obtained from the India Meteorological Department (IMD; Pai et al., 2014). The CORDEX (Giorgi & Gutowski Jr, 2015) precipitation outputs from seven models (Table S2 of the Supporting Information) for the study region are obtained from the Earth System Grid Federation (ESGF; Cinquini et al., 2014). It may be noted that the primary aim of this study is to produce a bias-corrected future daily precipitation dataset for hydrological simulations. Hence, the dataset should have a fine spatial resolution. As GCMs have a coarse resolution, a downscaling technique is required to obtain finer-scale details. Statistical downscaling could be applied; however, as CORDEX is a coordinated downscaling project, the CORDEX products are used. Additionally, though it is not always guaranteed, some RCMs do add value to GCM simulations (Singh et al., 2017). Overall, outputs from RCMs are used to obtain a fine-resolution bias-corrected product that will be useful for further hydrological studies.
The spatial resolution of CORDEX data varies; however, most of them have a resolution close to 50 km × 50 km, and hence, they are regridded to a common resolution of 0.50° × 0.50° using the inverse distance-weighting method. To match the spatial resolution, the observed precipitation data are upscaled from 0.25° × 0.25° to 0.50° × 0.50°. The CORDEX data are available for 1961-2100, of which 1961-2005 is the historical simulation and the future period (2006-2100) is simulated using different RCPs. The CORDEX data from the historical period are divided into a calibration period and a validation period (1991-2005). Bias-corrected estimates of daily precipitation (both most expected and extreme condition) are calculated for different RCPs in the future time period and are provided in the dataset. It should be noted that the 95th percentile of daily precipitation is taken as the threshold for extreme precipitation. Selection of a spatially varying extreme precipitation threshold is desirable in a climatically diverse region like the Indian mainland.

FIGURE 2 Bias in mean and extreme precipitation (in mm/day) throughout India for selected months. (a) Monthly mean observed precipitation (first row) and corresponding bias in ensemble mean CORDEX precipitation (second row) for a selected month during calibration period, (b) Similar to a, but for monthly extreme precipitation (monthly 95th percentile)

FIGURE 3 Spatio-temporal distribution of selected best-fit marginal distribution for OBS and SDV during model calibration for REMO2009 RCM forced by MPI-ESM-LR GCM

FIGURE 4 Spatio-temporal distribution of selected copula functions during model calibration for REMO2009 RCM forced by MPI-ESM-LR GCM. The best-fit copula is selected on the basis of the smallest Cramér-von Mises statistic
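The inverse distance-weighting step used for regridding can be illustrated for a single target grid point. The distance metric (Euclidean in degree space), the power of 2 and the use of all supplied source points are assumptions made for this sketch; the text does not specify them.

```python
import math
from typing import Sequence, Tuple


def idw_interpolate(
    target: Tuple[float, float],
    points: Sequence[Tuple[float, float]],
    values: Sequence[float],
    power: float = 2.0,
) -> float:
    """Inverse-distance-weighted estimate at target = (lon, lat)."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a source point
        w = d ** -power  # closer points receive larger weights
        num += w * v
        den += w
    return num / den
```

Applying this at every node of the 0.50° × 0.50° grid, with the native RCM grid points as `points`, would yield one regridded field per day.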

| RMPH model
The RMPH model is applied seasonally for four selected seasons: Summer (March-May), Rainy (June-August), Autumn (September-November) and Winter (December-February). Bias correction is carried out separately for different CORDEX models. Hence, underlying details of the RMPH model can only be studied with respect to the individual CORDEX model. However, after bias correction, the ensemble mean of bias-corrected CORDEX precipitation (ensemble mean BCV) is calculated and used for assessing the quality of bias correction in terms of both mean and extreme values.
The bias in monthly mean precipitation for every CORDEX simulation is calculated throughout the region for analysis of the spatio-temporal characteristics of the bias. The spatial variation of observed monthly mean and extreme precipitation, and the corresponding bias in ensemble mean CORDEX precipitation for four selected months (January, April, July and October; each representing one season) during the calibration period are shown in Figure 2. For monthly mean precipitation, the ensemble mean CORDEX mostly overestimates during the dry season and underestimates during the wet season. For instance, in the month of April (a comparatively dry month), the ensemble mean CORDEX overestimates the monthly mean precipitation in most of India excluding the north-east. However, during July (a monsoon month), the ensemble mean CORDEX is found to underestimate the precipitation in most parts of north and north-east India. Additionally, the bias is high in areas having higher monthly mean precipitation. In the Himalayan regions of north India, the ensemble mean CORDEX shows a predominantly wet bias, i.e., the ensemble mean CORDEX overestimates precipitation magnitude. Similar but relatively intensified patterns (in both bias magnitude and spatial extent) are observed for monthly extreme precipitation. During July, the Western Ghats show the highest dry bias (underestimation of precipitation by CORDEX models) for extreme precipitation; however, during October, most of the regions showing the highest dry bias for extreme precipitation are situated in the Eastern Ghats.

FIGURE 5 Spatio-temporal distribution of monthly mean OBS, SDV, BCV and QMC during validation period for four representative months. The SDV signifies ensemble mean CORDEX precipitation

TABLE 1 Correspondence of monthly mean most expected BCV, QMC and ensemble mean CORDEX (SDV) with respect to monthly mean OBS precipitation during the (a) calibration period, and (b) validation period.
The spatio-temporal variation of the bias differs between CORDEX models (Figures S3-S5 in the Supporting Information show the spatio-temporal distribution of bias for three CORDEX models: REMO2009 RCM forced by MPI-ESM-LR GCM, and RegCM4 forced by CSIRO-Mk3.6.0 and GFDL-ESM2M GCMs), which can be attributed to design/modelling differences between them.
Before being fed to the model, the observed precipitation at each point is checked for outliers during the calibration period. Daily precipitation magnitudes deviating from the mean by more than five standard deviations are regarded as outliers. These are not extreme events and may have resulted from erroneous observation records. Such instances are fewer than 0.33% for 99.5% of the Indian mainland over the entire calibration period. The OBS and corresponding SDV for those time steps are not used while calibrating the model. Additionally, the IMD observation stations are not distributed uniformly over the Indian landmass, resulting in non-uniform relative error in precipitation records (discussed in detail in Section S1 in the Supporting Information). This information is kept in mind while analysing the quality of the bias-corrected precipitation dataset.
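The outlier screening can be sketched as below. The use of the population standard deviation, and computing the mean and standard deviation over the full calibration series at a grid point, are assumptions; the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def drop_outlier_pairs(
    obs: Sequence[float], sdv: Sequence[float], n_sd: float = 5.0
) -> Tuple[List[float], List[float]]:
    """Remove time steps where OBS deviates from its mean by more than
    n_sd standard deviations, dropping the paired SDV value as well."""
    mu = mean(obs)
    sd = pstdev(obs)  # population standard deviation (assumption)
    kept_obs: List[float] = []
    kept_sdv: List[float] = []
    for o, s in zip(obs, sdv):
        if abs(o - mu) <= n_sd * sd:
            kept_obs.append(o)
            kept_sdv.append(s)
    return kept_obs, kept_sdv
```

Dropping the SDV value together with the flagged OBS keeps the remaining series paired, which the later grouping into three categories relies on.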
As stated earlier, all the OBS-SDV pairs are separated into three groups for each CORDEX model (Figure 1). For the first group of OBS-SDV pairs, where both are non-zero, a suitable marginal distribution is fitted individually to OBS and SDV for converting them to their reduced variates (non-exceedance probabilities). The spatial and seasonal distribution of the selected best-fit marginal distribution for OBS and SDV for the REMO2009 RCM forced by MPI-ESM-LR GCM is shown in Figure 3. The spatial and seasonal distributions of the best-fit marginal distribution for two other CORDEX models are shown in Figure S6 of the Supporting Information. The spatio-temporal distribution of the best-fit marginal distribution varies between CORDEX models. Additionally, the statistical characteristics of pairs of non-zero OBS and SDV change spatially and seasonally for a given CORDEX model. The kernel-based empirical marginal distribution is selected as the best-fit distribution at most locations for non-zero OBS precipitation. Using the selected marginal distributions, pairs of non-zero OBS and SDV are converted to their respective reduced variates. Four different copula functions are used to model the inter-relation between non-zero OBS and SDV using their reduced variates. The best copula function is selected based on the smallest Cramér-von Mises statistic. The spatio-temporal distribution of the selected copula functions for the REMO2009 RCM forced by MPI-ESM-LR GCM for different seasons is shown in Figure 4. Similar figures for two other CORDEX models are shown in Figure S7 of the Supporting Information. The conditional distribution of non-zero OBS, given non-zero SDV, is obtained from the selected copula function. From the second group of OBS-SDV pairs (with zero OBS), the entire range of SDV is divided into different classes, and the frequency of SDV values having corresponding zero OBS in those classes is ascertained.
The frequency of SDV having zero OBS is expected to decrease with increasing SDV. Hence, an exponentially decaying curve of the form of Equation (2) is used to model the frequency of SDV corresponding to zero OBS given the class mark (midpoint of the class) of SDV, as shown in Figure S8a of the Supporting Information. The assumption of decreasing frequency of SDV having zero OBS with increasing SDV holds well for most location and season combinations; however, in some cases the frequency is found to increase with increasing SDV, which means that the RCM is not able to capture the seasonality well. Additionally, it should be noted that the uncertainty in OBS is not uniform across India due to the non-uniform distribution of raingauge stations, as discussed in Section S1 of the Supporting Information. It may also be noted that the performance of an RCM may not be spatially uniform, given the climatological, topographical or other spatially varying factors. Hence, the correspondence between SDV and OBS itself may be poor at some locations. As any bias-correction method depends heavily upon the quality of the output of the driving climate model and of the observed data, bias-correction methods are expected to perform poorly at those locations (Maraun et al., 2017).
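One simple way to estimate the parameters a and b of the decaying curve is a least-squares fit on the logarithm of the class frequencies. This is an assumed estimation method, since the text does not state how the parameters are obtained; the function names are hypothetical, and classes with zero frequency are skipped because their logarithm is undefined.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_decay(
    class_marks: Sequence[float], zero_obs_freq: Sequence[float]
) -> Tuple[float, float]:
    """Fit Y = a * exp(b * X) by linear least squares on ln(Y)."""
    xs, ys = [], []
    for x, y in zip(class_marks, zero_obs_freq):
        if y > 0.0:
            xs.append(x)
            ys.append(math.log(y))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two classes with positive frequency")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of ln(Y) against X
    a = math.exp(ybar - b * xbar)      # intercept transformed back
    return a, b


def zero_obs_probability(sdv: float, a: float, b: float) -> float:
    """Probability of zero OBS for a given SDV, capped at 1."""
    return min(1.0, a * math.exp(b * sdv))
```

A fitted `b` that is not negative flags the location and season combinations mentioned above, where the frequency of zero OBS does not decay with increasing SDV.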
The parametric conditional distribution obtained from the best-fit copula is then modified to include the probability mass of SDV corresponding to zero OBS, resulting in a set of mixed probability distributions of OBS for different positive values of SDV. One such set of conditional distributions for location NE2 is shown in Figure S8b of the Supporting Information. These simulation curves are for non-zero SDV from the REMO2009 RCM forced by MPI-ESM-LR GCM during the autumn season. Furthermore, the probability of OBS given zero SDV is estimated from the third set of OBS-SDV pairs with zero SDV. The BCV is then obtained by using (a) the simulation curves for SDV > 0 and (b) the probability of OBS given zero SDV, as shown in Figure 1.
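How a mixed distribution with a probability mass at zero can be evaluated and inverted is sketched below. The sketch assumes the usual mixed form, with a point mass p at zero and the continuous conditional CDF rescaled by (1 − p). The exponential CDF with a 5 mm/day mean is only a placeholder for the copula-based conditional distribution, and all names are hypothetical.

```python
import math
from typing import Callable


def exp_cdf(x: float, scale: float = 5.0) -> float:
    """Placeholder conditional CDF of non-zero OBS (assumption)."""
    return 1.0 - math.exp(-x / scale)


def exp_inv_cdf(u: float, scale: float = 5.0) -> float:
    """Inverse of the placeholder CDF, valid for 0 <= u < 1."""
    return -scale * math.log(1.0 - u)


def mixed_cdf(x: float, p_zero: float, cond_cdf: Callable[[float], float]) -> float:
    """CDF of OBS with mass p_zero at zero plus a continuous part."""
    if x < 0.0:
        return 0.0
    return p_zero + (1.0 - p_zero) * cond_cdf(x)


def mixed_quantile(q: float, p_zero: float, cond_inv_cdf: Callable[[float], float]) -> float:
    """Value of OBS at quantile q; zero whenever q falls inside the mass at zero."""
    if q <= p_zero:
        return 0.0
    return cond_inv_cdf((q - p_zero) / (1.0 - p_zero))
```

Under this form, any chosen quantile at or below the zero-probability returns a dry day, which is how drizzle in the simulated series can be mapped to zero precipitation.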
The calibrated model is then used to estimate the mean daily BCV. The optimal quantile for the mean daily BCV is obtained by comparing the monthly mean OBS with BCV at different quantiles during the calibration period. The spatio-temporal distribution of the mean monthly BCV along with OBS, ensemble mean CORDEX precipitation (SDV) and corresponding QMC for four representative months during the validation period is shown in Figure 5. The spatio-temporal distribution of BCV matches OBS better than that of SDV does. Despite the spatio-temporal variation of the bias and the change in its characteristics between the calibration and validation periods, the satisfactory quality of the bias-corrected precipitation supports the hypothesis that the model is able to capture and reduce the spatial and seasonal variation of the bias. Location-wise correspondence of BCV and SDV with OBS is presented in Table 1, and the quality of the bias-corrected precipitation is inferred from grouped Taylor diagrams provided in Figures S9 and S10 of the Supporting Information. From the figures, the correlation and variability pattern of BCV are better than those of SDV when compared to OBS in most of the hydro-meteorologically homogeneous regions. Additionally, from Table 1, the BCV is found to have lower MAE and better Dr than SDV. In some locations, the correspondence between SDV and OBS is not satisfactory, for example, the locations falling inside the H1 and H2 regions, NE2, NW3, PE2 and PE3. The low correspondence for locations PE2 and PE3 might be due to topographical and climatic factors. These locations are in the Eastern Ghats and, unlike most of India, receive rain over a longer period (July-November). For the locations NE2, NW3 and those falling inside regions H1 and H2, the small number of stations used for recording observed precipitation (resulting in higher uncertainty; Section S1 of the Supporting Information) might be the reason for the low correspondence. At these locations, the BCV is found to have better correspondence to OBS than SDV, as indicated by higher R² and Dr and lower values of MAE and uRMSE, which shows the efficacy of the RMPH model. However, the BCV at such locations should be used with caution, as a bias-correction method cannot be taken as a substitute for inadequate modelling of seasonality by climate models (Maraun et al., 2017). Even in cases where the correspondence between SDV and OBS changed between the calibration and validation periods (hence, the characteristics of the bias changed; e.g., all the points falling in the H2 region), the RMPH model is found to perform satisfactorily.
The RMPH model is able to reduce the bias for extreme values too. For comparing extreme precipitation, the 95th percentile of daily OBS and SDV is compared with the mean of the 95th-quantile daily BCV. The spatio-temporal distributions of extreme BCV as compared to extreme OBS and SDV (ensemble mean CORDEX) for four representative months are shown in Figure 6. The spatio-temporal distribution of monthly extreme BCV is found to match monthly extreme OBS better than monthly extreme SDV does. This suggests that the RMPH model suitably reduces the bias in extreme precipitation, even for locations where the bias characteristics of extreme precipitation differ from those of monthly mean precipitation. This benefit of the RMPH model is due to the provision of different simulation curves for different values of SDV; hence, it captures the varying nature of the bias at different quantiles. Therefore, for the analysis of extreme events affected by extreme precipitation, BCV is better suited for hydrological simulation than SDV (i.e., CORDEX simulation outputs). Furthermore, location-wise correspondence between OBS and SDV, and between OBS and BCV, for extreme precipitation is shown in Table 2. The quality of the bias-corrected product can be inferred from the grouped Taylor diagrams presented in Figures S11 and S12 of the Supporting Information. Figures S11 and S12 show a reduced root-mean-square distance between BCV and OBS when compared to that between SDV and OBS, barring a few locations with inadequate stations for OBS (Figure S1). Comparing Tables 1 and 2, the correspondence between monthly extreme OBS and SDV is found to be inferior to that for the monthly mean. Hence, the RCMs used in CORDEX do not have the same skill in predicting mean and extreme precipitation, resulting in different spatio-temporal characteristics of the bias in the two cases, as shown in Figure 2.
The correspondence between monthly extreme BCV and OBS is better than that between monthly extreme SDV and OBS, as revealed by lower values of MAE and uRMSE and higher values of R² and Dr (Table 2). Even for locations with very low correspondence between SDV and OBS, such as H12, H14, NE2, PE2 and PE3, the RMPH model is found to perform satisfactorily.

| Comparison with quantile mapping
The BCV is also compared with the bias-corrected precipitation obtained using the QM method (QMC) in Figures 5 and 6, Tables 1 and 2, and Figures S9-S12 of the Supporting Information. The comparison helps in highlighting the quality of bias-corrected precipitation obtained from RMPH compared to that from the QM method. Figures 5 and 6 show that the spatio-temporal variation of BCV (from the RMPH model) matches OBS better than that of QMC does. This difference in quality is due to the assumption of the QM method itself: QM and other methods based on it bias-correct the simulated values by matching their quantiles with those of OBS. This assumption might not be correct, as the bias is affected by multiple factors as outlined before and can also vary spatially and seasonally (Mao et al., 2015). In the case of the RMPH model, the relationship between OBS and SDV is modelled using the generated simulation curves, conditional on the simulated value, and these curves vary spatially. Depending upon the SDV, the conditional distribution of OBS changes, which helps in reducing the bias in different quantiles. Further, the provision of a spatially varying optimal quantile as a calibrated model parameter also helps in capturing spatially varying bias. Hence, the RMPH model is more flexible, and it is expected to be more effective in reducing biases compared to QM. This is also evident in the location-wise comparison provided in Tables 1 and 2. From Tables 1 and 2, the correspondence of monthly mean or extreme BCV with OBS is found to be better than the correspondence of QMC with OBS, as evidenced by lower MAE and uRMSE and higher values of R² and Dr. Additionally, it is found that QMC under-predicts extreme precipitation during the wet period of the year and over-predicts it during the dry period.
Hence, the generated bias-corrected precipitation (BCV; bias-corrected using the RMPH model) is a better choice for hydrological simulations than either the CORDEX precipitation data (SDV) or the CORDEX precipitation data corrected by the QM method (QMC).
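The quantile-matching assumption discussed above can be illustrated with a minimal empirical quantile-mapping sketch. This is a generic textbook form, not the QM configuration used to produce QMC in this study; the function names, the number of quantiles and the clamping of values outside the calibration range are all assumptions made for illustration. Zero-precipitation days receive no special treatment here.

```python
from bisect import bisect_right


def quantile(sorted_sample, p):
    """Linearly interpolated quantile of a sorted sample for probability p in [0, 1]."""
    pos = p * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1.0 - frac) + sorted_sample[hi] * frac


def quantile_map(sim_cal, obs_cal, sim_value, n_quantiles=101):
    """Map one simulated value onto the observed distribution (hypothetical helper).

    sim_cal and obs_cal are calibration-period samples of simulated and
    observed precipitation. The simulated value is replaced by the observed
    value that has the same empirical quantile.
    """
    sim_sorted = sorted(sim_cal)
    obs_sorted = sorted(obs_cal)
    probs = [i / (n_quantiles - 1) for i in range(n_quantiles)]
    sim_q = [quantile(sim_sorted, p) for p in probs]
    obs_q = [quantile(obs_sorted, p) for p in probs]

    # Values outside the calibration range are clamped (a simplifying assumption).
    if sim_value <= sim_q[0]:
        return obs_q[0]
    if sim_value >= sim_q[-1]:
        return obs_q[-1]

    # Find the quantile interval containing sim_value and interpolate within it.
    j = bisect_right(sim_q, sim_value)
    x0, x1 = sim_q[j - 1], sim_q[j]
    y0, y1 = obs_q[j - 1], obs_q[j]
    weight = (sim_value - x0) / (x1 - x0)
    return y0 + weight * (y1 - y0)
```

In this sketch a single transfer function, fixed at calibration, is applied to every simulated value at a location. The correction therefore depends only on the quantile of the simulated value, which is the restriction that the conditional simulation curves of the RMPH model are intended to relax.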
Finally, the assumption of stationary bias in the data is another important point to mention. In general, if the bias is non-stationary, bias-corrected products will be more reliable in the near future than in the far future. Most of the existing bias-correction methods, including the RMPH model, inherently assume stationarity in the bias. Thus, the limitation with respect to model capability in the far future is undeniable. Incorporating time-varying characteristics into the existing bias-correction models may be kept as a future scope of this study.

CONCLUSIONS
In this study, a bias-corrected CORDEX precipitation dataset (bias-corrected using the newly proposed RMPH model based on copula functions) is evaluated for a tropical region, the Indian mainland. Additionally, the bias-correction performance is compared with that of quantile mapping. The observed data provided by the India Meteorological Department suffer from non-uniform relative error due to the non-uniform distribution of raingauge stations across the country. Despite this, for mean precipitation the quality of the bias-corrected product from the RMPH model is found to be satisfactory across the country, and better than that obtained from quantile mapping.
The better quality of bias correction in the dataset, bias-corrected using the RMPH method, can be explained by different design considerations in the model. For instance, the RMPH model is found to capture the regional variation in climatic conditions better than quantile mapping. This better spatial transferability of the model might result from the provision of a location-specific optimal percentile and of different simulation curves for different values of RCM precipitation. The provision of different simulation curves for different values of RCM precipitation is based on the hypothesis that the characteristics of the bias change with the magnitude of RCM-simulated precipitation. These flexibilities result in better modelling of bias for different quantiles of daily precipitation, which in turn results in a better-quality bias-corrected precipitation dataset from the model as compared to other methods such as quantile mapping.
Similar to the case of mean precipitation, the bias-corrected precipitation from the RMPH model is better than that from quantile mapping in reducing the bias in extreme daily precipitation. The correspondence between monthly extreme observed precipitation and monthly extreme RCM precipitation is found to be inferior to that for the monthly mean. This suggests that the climate models used in CORDEX do not have the same skill in predicting extreme precipitation as in predicting mean precipitation, resulting in a different spatio-temporal variation of bias in extreme precipitation compared to the bias in monthly mean precipitation. Even in this case, the RMPH model performed better due to the provision of different simulation curves for different quantiles of precipitation, as mentioned above.
Compared to the RMPH model, quantile mapping is found to generally over-predict extreme precipitation during dry months and to under-predict precipitation during wet months. Hence, the bias-corrected precipitation dataset (obtained from the RMPH model) is better than either the CORDEX outputs or the CORDEX outputs bias-corrected using quantile mapping.

TABLE 2 Correspondence of monthly extreme BCV, QMC and ensemble mean CORDEX (SDV) with respect to monthly extreme OBS precipitation during the (a) calibration period, and (b) validation period.

AND USAGE NOTES
The bias-corrected daily CORDEX precipitation dataset from RMPH model is being released for the research community. As the BCV is found to be better than both SDV and QMC, the developed dataset is expected to be particularly suited for hydrological simulations, formulating extreme event mitigation strategies, and climate change adaptation strategies over India. The dataset is provided in the form of self-documented NetCDF files (*.nc). The file names are formatted as follows:
pr_<domain>_<GCM_name>_<scenario>_r1i1p1_<RCM_name>_day_<corr_method>.nc
Different parts of the file name are described as:
1. domain: Domain name as per CORDEX project. It can be either 'WAS-44' or 'WAS-44i' for Indian mainland.
Files with names ending in 'bias_corrected_expected' should be used for hydrological simulation, as they contain the most expected daily BCV. All files cover the spatial extent from 6.75°N, 66.75°E to 38.25°N, 99.75°E at a spatial resolution of 0.5° (latitude) × 0.5° (longitude) and a daily temporal resolution. In the dataset, the historical period and the future period for the different RCPs span 1961-2005 and 2006-2100, respectively. It should be noted that files with names ending in 'bias_corrected_extreme' contain precipitation estimates made under the assumption that the extreme condition prevails every day. Hence, these files cannot be used for daily simulation; rather, they can be used to assess mean extreme precipitation for weeks, months or years by averaging their daily values.
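The usage notes above can be illustrated with a short sketch that splits a file name into its parts and averages daily values to monthly means, as recommended for the 'bias_corrected_extreme' files. The regular expression, the helper names and the example file name (including the GCM, RCM and scenario strings in it) are hypothetical and serve only to show the pattern; reading the NetCDF files themselves is left to whichever NetCDF library the user prefers.

```python
import re
from collections import defaultdict
from datetime import date, timedelta

# Pattern assumed from the file-name template given above.
FILENAME_RE = re.compile(
    r"^pr_(?P<domain>[^_]+)_(?P<gcm>.+)_(?P<scenario>[^_]+)"
    r"_r1i1p1_(?P<rcm>.+)_day_(?P<corr_method>.+)\.nc$"
)


def parse_filename(name):
    """Split a dataset file name into its documented components."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match.groupdict()


def monthly_means(dates, values):
    """Average daily values into (year, month) means."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in zip(dates, values):
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical file name, for illustration only.
parts = parse_filename(
    "pr_WAS-44_GCM-X_rcp45_r1i1p1_RCM-Y_day_bias_corrected_extreme.nc"
)

# Daily series from an 'extreme' file should be aggregated before use.
if parts["corr_method"].endswith("extreme"):
    days = [date(2006, 1, 1) + timedelta(days=i) for i in range(59)]
    daily = [2.0] * 31 + [4.0] * 28  # placeholder values, not real data
    means = monthly_means(days, daily)
```

The same grouping can be applied with weekly or yearly keys; the point is only that values from 'bias_corrected_extreme' files are interpreted after averaging, not day by day.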