A multi-site statistical downscaling model for daily precipitation using global scale GCM precipitation outputs

Authors

  • D. I. Jeong,

    Corresponding author
    • Centre ESCER (Étude et Simulation du Climat à l'Échelle Régionale), Université du Québec à Montréal, Montreal, Québec, Canada
  • A. St-Hilaire,

    1. INRS-ETE, Université du Québec, Québec city, Québec, Canada
  • T. B. M. J. Ouarda,

    1. INRS-ETE, Université du Québec, Québec city, Québec, Canada
    2. Water & Environmental Engineering, Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates
  • P. Gachon

    1. Atmospheric Science and Technology Directorate, Canadian Centre for Climate Modeling and Analysis (CCCMA) section, Climate Research Division, Environment Canada, Montréal, Québec, Canada

Correspondence to: Dr. D. I. Jeong, Centre ESCER, Université du Québec à Montréal, 201 Avenue President-Kennedy, Montreal, Quebec H3C 3P8, Canada. E-mail: jeong@sca.uqam.ca

ABSTRACT

This study proposes a multi-site statistical downscaling model (MSDM), which can downscale daily precipitation series at multiple sites in a regional study area by utilizing Global Climate Models' (GCMs) precipitation outputs directly. The at-site precipitation occurrences and amount characteristics are reproduced by first-order Markov chain and probability mapping approaches, respectively. The spatial coherence of precipitation series among multiple sites is reproduced by adding correlated random noise series to GCM precipitation outputs. The model is applied for two regional study areas in southern Québec (Canada). The MSDM results are compared to those of the local intensity scaling (LOCI) model, which is a single site downscaling model that uses GCM precipitation outputs. Both models reproduce probabilities of precipitation occurrence and mean wet-day precipitation amounts. However, the MSDM reproduces the observed precipitation occurrence Lag-1 autocorrelation, the standard deviation of the wet-day precipitation amounts, maximum 3-d precipitation total (R3days), and 90th percentile of the rain day amount (PREC90) better than the LOCI model. The MSDM also accurately reproduces cross-site correlations of precipitation occurrence and amount among multiple observation series.

1. Introduction

Precipitation is a critical hydrological cycle variable within the climate system, and the processes responsible for precipitation occurrence, duration and intensity appear on various large and small scales. Changes in precipitation occurrence and amount patterns due to climate change could affect the global and regional hydrological systems, water resource management, agriculture, forestry, and a broad range of natural and human systems (Meehl et al., 2007). Global Climate Models (GCMs) are the basic source of projected future behaviours of various climate variables (e.g. precipitation, wind, temperature, humidity, and air pressure). These models can quite reliably simulate the projected climate at the global or continental scale, especially for the large scale upper-air field (Huth, 2002; Harding et al., 2011). However, the coarse resolution (typically larger than 2° latitude by 2° longitude) of GCM outputs can prevent their direct application at the regional and local scales, as well as for near-surface variables, which are more biased than the upper-air fields, mainly due to the difficulty of incorporating sub-grid scale processes. The near-surface variables strongly affect the regional and local precipitation patterns through processes such as topography or through regional/local sources of moisture that are not fully captured by GCMs (Widmann et al., 2003).

Dynamical and statistical downscaling (SD) methodologies have been applied to generate surface climate variables (e.g. precipitation and temperature) at a finer scale than GCMs. Dynamical downscaling employs regional climate models (RCMs) to generate regional-scale climate variables physically, using GCM outputs as boundary conditions. SD methods are generally categorized into three groups: weather typing, stochastic generators, and regression-based approaches (Wilby et al., 2002; Fowler et al., 2007). Regression-based SDs derive empirical relationships between GCM atmospheric outputs (predictors) and observed climate variables (predictands) with transfer functions. Although this approach can provide results that are coherent with the employed GCM when predictors (e.g. sea level pressure, geopotential height, humidity, and wind fields) are appropriately selected or combined, downscaled values are highly sensitive to this selection (Wilby et al., 2002; Fowler et al., 2007). Moreover, the transfer functions explain only a portion of the observed local predictand variability, especially for precipitation. Therefore, regression-based SDs are followed by variance increase methods (von Storch, 1999). One of the variance increase methods, the randomization procedure, represents unexplained temporal variability by adding random noise (von Storch, 1999; Wilby et al., 2002; Hessami et al., 2008). The spatial variability among multi-site local predictand variables is also not reproduced accurately by the regression mapping from large-scale predictors (Wilby et al., 2003; Harpham and Wilby, 2005; Bürger and Chen, 2005). Spatially correlated random noise can be employed to reproduce the spatial coherence of the multi-site precipitation series (Jeong et al., 2012).

Widmann et al. (2003) and Schmidli et al. (2006) suggested employing GCM precipitation outputs directly as predictors instead of using GCM air-circulation predictors for downscaling regional precipitation series. Their main reason was that the GCM precipitation output is already (in principle) physically connected with various relevant air circulation GCM variables. However, it is well-known that GCM-simulated precipitation generally shows some strong systematic errors with respect to the observed regional precipitation caused in part by the simplified representation of the surface topography or regional-scale conditions and the poor representation of mesoscale processes within the GCMs. Widmann et al. (2003) also expected that the direct use of GCM precipitation as a predictor should be a safer choice than GCM air-circulation variables because differences between the regional precipitation and GCM precipitation might be less variable than the relationships between regional precipitation and GCM air-circulation variables. Directly using GCM precipitation outputs to downscale regional precipitation series might be preferred because this can alleviate the problems associated with multiple predictor selection.

The local intensity scaling (LOCI) model suggested by Schmidli et al. (2006) and applied by Schmidli et al. (2007) can be a good example of the aforementioned approaches. The LOCI model can reproduce exactly the precipitation occurrence frequency and mean wet-day precipitation amounts at a single site. However, this model cannot reproduce Lag-1 autocorrelation of precipitation occurrence (i.e. the correlation between wet- or dry-days separated by one time step) and standard deviation of wet-day precipitation amounts. Weather states, such as continuous wet or dry spell days, could be reproduced by the Lag-1 autocorrelation of precipitation occurrence. Temporal precipitation persistence can be approximately generated by reproducing the Lag-1 autocorrelation of weather states (wet- or dry-days) (Mehrotra et al., 2006). Furthermore, the LOCI needs to be extended to reproduce the spatial coherence of precipitation occurrences and amounts among multiple observation sites.

Reproducing the spatial coherence of surface precipitation series among multiple observation sites in a targeted area is often required in hydrological or agricultural impact analysis (e.g. flood control, water supply, and drought management; Mehrotra and Sharma, 2007). For this reason, various SD methods have been developed to provide multi-site precipitation series based on stochastic weather generation approaches (Wilks, 1998, 1999; Qian et al., 2002; Mehrotra and Sharma, 2007; Khalili et al., 2007, 2009), weather typing approaches (Palutikof et al., 2002; Fowler et al., 2005), regression-based approaches (Harpham and Wilby, 2005), and hybrid approaches that combine regression-based and stochastic generation methods (Wilby et al., 2003; Harpham and Wilby, 2005). None of these methods, however, has used GCM precipitation outputs as input.

Hence, the main objective of our study is to develop and apply a multi-site statistical downscaling model (MSDM) that downscales daily precipitation series directly from GCM precipitation outputs at multiple observation sites in southern Québec (Canada). The MSDM was designed to reproduce the Lag-1 autocorrelation of precipitation occurrence and the standard deviation of wet-day precipitation amounts at a given observation site. The model also reproduces cross-site correlation by adding correlated random noise series, drawn from a multivariate normal distribution, to the GCM precipitation series. The downscaled MSDM results are compared to those of LOCI. The strategies and assumptions used to project future precipitation occurrence and amount series with the MSDM are also provided.

This article is organized as follows. Section 'Methodology' describes the methodologies. Section 'Application' explains the study area and the data used, including the predictand and predictor datasets. Section 'Results' contains the study results, and section 'Summary and conclusion' provides the discussion and conclusions.

2. Methodology

If the physical and/or statistical relationships between the simulated GCM grid point precipitation and the observed local site precipitation series are unknown, one can initially generate the local information using a spatial interpolation approach. In this study, the first step of statistical downscaling is to calculate the initial precipitation series at the observation site ($\tilde{Y}$) as the distance-weighted average of the GCM precipitation series ($G_1, G_2, \ldots, G_g$) at the $g$ grid points around the observation site:

$$\tilde{Y} = \sum_{m=1}^{g} w_m G_m \qquad (1)$$

$$w_m = \frac{1/d_m}{\sum_{j=1}^{g} 1/d_j} \qquad (2)$$

where $d_m$ is the distance between the local site and grid point $m$ ($= 1, 2, \ldots, g$). The weights ($w_1, w_2, \ldots, w_g$) are estimated from the inverse distance between the GCM grid points and the observation site.
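
The distance-weighted averaging step can be illustrated with a minimal sketch. It assumes plain inverse-distance weights (exponent 1) normalised to sum to one; the function names, distances, and grid-point values below are hypothetical and serve only as an illustration.

```python
def inverse_distance_weights(distances):
    """Normalised inverse-distance weights: w_m = (1/d_m) / sum_j (1/d_j)."""
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def initial_series(grid_series, distances):
    """Weighted average of g grid-point series (one list of daily values each)."""
    weights = inverse_distance_weights(distances)
    n_days = len(grid_series[0])
    return [
        sum(w * series[t] for w, series in zip(weights, grid_series))
        for t in range(n_days)
    ]


# Hypothetical example: four grid points, three days, distances in km.
distances = [100.0, 200.0, 200.0, 400.0]
grid = [
    [4.0, 0.0, 1.0],
    [2.0, 0.0, 3.0],
    [0.0, 0.0, 5.0],
    [8.0, 0.0, 1.0],
]
weights = inverse_distance_weights(distances)
y_init = initial_series(grid, distances)
```

With these made-up distances the nearest grid point receives a weight of 4/9, and a day that is dry at all grid points remains dry in the interpolated series.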

2.1. LOCI model

The LOCI model suggested by Schmidli et al. (2006) is calibrated in two steps. The first step determines a wet-day threshold ($\tilde{y}_{th}$) on the daily GCM precipitation series (i.e. the initial precipitation series $\tilde{Y}$ generated by Equation (1)) such that the number of wet-days matches that of the observed series. An initial daily precipitation value is replaced with zero (dry-day) when it is less than the wet-day threshold. In the second step, a precipitation amount scaling factor $s$ is determined from the observed and initially simulated precipitation amounts:

$$s = \frac{E(Y \mid Y \ge y_{th}) - y_{th}}{E(\tilde{Y} \mid \tilde{Y} \ge \tilde{y}_{th}) - \tilde{y}_{th}} \qquad (3)$$

where $Y$ and $\tilde{Y}$ are the daily precipitation series vectors of the observations and of the initial precipitation series, respectively, $y_{th}$ (= 1 mm/d) is the wet-day threshold of the observations, $\tilde{y}_{th}$ is the wet-day threshold of the initial precipitation series, and $E(\cdot)$ indicates the expectation value. The downscaled daily precipitation $Y^{L}_t$ for a day $t$ is obtained by

$$Y^{L}_t = \max\left(y_{th} + s\,(\tilde{Y}_t - \tilde{y}_{th}),\ 0\right) \qquad (4)$$

where $\tilde{Y}_t$ is the precipitation value on day $t$ in the initial precipitation series $\tilde{Y}$.
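
The two LOCI calibration steps can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of Schmidli et al. (2006): it assumes paired calibration series of equal length, takes the threshold on the initial series as the value whose exceedance count equals the observed wet-day count, and sets days below that threshold to zero. The function names and sample values are hypothetical.

```python
def loci_calibrate(obs, init, y_th=1.0):
    """Return (threshold on the initial series, scaling factor s)."""
    obs_wet = [y for y in obs if y >= y_th]
    n_wet = len(obs_wet)
    if n_wet == 0:
        raise ValueError("no observed wet days in the calibration series")
    # Step 1: the n_wet-th largest initial value, so that the number of days
    # at or above it matches the observed wet-day count (ties can make the
    # match approximate).
    init_th = sorted(init, reverse=True)[n_wet - 1]
    init_wet = [x for x in init if x >= init_th]
    # Step 2: ratio of mean wet-day excesses over the respective thresholds.
    # This is undefined when all wet values of the initial series are equal.
    mean_obs = sum(obs_wet) / len(obs_wet)
    mean_init = sum(init_wet) / len(init_wet)
    scale = (mean_obs - y_th) / (mean_init - init_th)
    return init_th, scale


def loci_apply(init, init_th, scale, y_th=1.0):
    """Scale days at or above the threshold; set all other days to zero."""
    return [y_th + scale * (x - init_th) if x >= init_th else 0.0 for x in init]


# Hypothetical calibration data (mm/d).
obs = [0.0, 5.0, 0.0, 2.0, 0.0, 11.0, 0.0, 0.0]
init = [0.5, 3.0, 0.2, 1.5, 0.1, 4.5, 0.8, 0.0]
init_th, scale = loci_calibrate(obs, init)
downscaled = loci_apply(init, init_th, scale)
wet = [y for y in downscaled if y >= 1.0]
```

In this toy case the downscaled series has the same number of wet-days and the same mean wet-day amount as the observations, which are the two properties LOCI is designed to match.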

2.2. Multi-site statistical downscaling model

2.2.1. At-site precipitation occurrence

With the LOCI model, the wet-day precipitation occurrence probability ($p_1$) can be exactly reproduced by applying one threshold (the wet-day threshold $\tilde{y}_{th}$) to the initial precipitation series $\tilde{Y}$. If $\tilde{Y}$ reproduces the Lag-1 autocorrelation adequately, LOCI can also reproduce the Lag-1 autocorrelation (or transition probabilities) of precipitation occurrences. If $\tilde{Y}$ does not reproduce the Lag-1 autocorrelation, LOCI cannot reproduce it either, because applying the single threshold $\tilde{y}_{th}$ to $\tilde{Y}$ cannot usually adjust the Lag-1 autocorrelation of precipitation occurrence. Simply put, the probabilities of a wet-day following a dry-day ($p_{01}$) and a wet-day following a wet-day ($p_{11}$) need to be modelled to reproduce the Lag-1 autocorrelation of the precipitation occurrence series. The MSDM employs the first-order Markov chain model to reproduce the Lag-1 autocorrelation (or transition probabilities) from $\tilde{Y}$:

$$O_t = \begin{cases} 1, & \tilde{Y}_t \ge F^{-1}[1 - p_{01}] \ \text{and}\ O_{t-1} = 0 \\ 1, & \tilde{Y}_t \ge F^{-1}[1 - p_{11}] \ \text{and}\ O_{t-1} = 1 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

where $F^{-1}$ indicates the inverse of the empirical cumulative distribution function of $\tilde{Y}$, and $\tilde{Y}_t$ is the precipitation value on day $t$ of $\tilde{Y}$. $F^{-1}[1 - p_{01}]$ and $F^{-1}[1 - p_{11}]$ are the two thresholds applied to $\tilde{Y}$ to reproduce the Lag-1 autocorrelation of the observed precipitation occurrence series. $O$ is a binary (0 for dry-day and 1 for wet-day) precipitation occurrence vector transformed from $\tilde{Y}$ by the first-order Markov chain approach. For the first-order Markov chain, the precipitation occurrence probability $p_1$ and Lag-1 autocorrelation $r_1$ are defined as

$$p_1 = \frac{p_{01}}{1 + p_{01} - p_{11}} \qquad (6)$$

$$r_1 = p_{11} - p_{01} \qquad (7)$$
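
The two-threshold occurrence rule and the stationary properties of a first-order Markov chain can be sketched as below. The sketch assumes a simple inverse empirical distribution function and a dry day before the first time step, and it is applied to serially uncorrelated synthetic noise rather than to GCM output; all names and values are hypothetical.

```python
import math
import random


def empirical_quantile(values, prob):
    """Inverse empirical CDF: smallest sample value whose CDF is >= prob."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(prob * len(ordered)) - 1))
    return ordered[index]


def markov_occurrence(init, p01, p11, first_state=0):
    """Two-threshold, first-order Markov chain occurrence (0 = dry, 1 = wet)."""
    th_after_dry = empirical_quantile(init, 1.0 - p01)
    th_after_wet = empirical_quantile(init, 1.0 - p11)
    occ, prev = [], first_state
    for value in init:
        threshold = th_after_wet if prev == 1 else th_after_dry
        # Strict ">" so that a fraction p01 (or p11) of the sample exceeds
        # the threshold under this quantile definition.
        prev = 1 if value > threshold else 0
        occ.append(prev)
    return occ


def transition_probabilities(occ):
    """Estimate (p01, p11) from a binary series containing both states."""
    n0 = n1 = n01 = n11 = 0
    for before, after in zip(occ[:-1], occ[1:]):
        if before == 0:
            n0 += 1
            n01 += after
        else:
            n1 += 1
            n11 += after
    return n01 / n0, n11 / n1


def wet_probability(p01, p11):
    """Stationary wet-day probability of a two-state Markov chain."""
    return p01 / (1.0 + p01 - p11)


def lag1_autocorrelation(p01, p11):
    """Lag-1 autocorrelation of a two-state Markov chain."""
    return p11 - p01


# Synthetic, serially uncorrelated series (illustration only).
rng = random.Random(42)
noise = [rng.random() for _ in range(20000)]
occ = markov_occurrence(noise, p01=0.2, p11=0.6)
p01_hat, p11_hat = transition_probabilities(occ)
```

For the hypothetical targets p01 = 0.2 and p11 = 0.6, the stationary wet-day probability is 1/3 and the Lag-1 autocorrelation is 0.4. Because the synthetic input has no serial correlation, the transition probabilities recalculated from `occ` stay close to the targets.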

The first-order Markov chain model, however, can accurately reproduce the Lag-1 autocorrelation (or the transition probabilities) only when it is applied to time series that have no Lag-1 autocorrelation. If the basic first-order Markov chain model is applied to the initial precipitation series $\tilde{Y}$, which already has Lag-1 autocorrelation, the transition probabilities $\hat{p}_{01}$ and $\hat{p}_{11}$ recalculated from the transformed binary occurrence vector $O$ differ from the observed $p_{01}$ and $p_{11}$. One can obtain $\hat{p}_{01}$ and $\hat{p}_{11}$ values equal to the observed $p_{01}$ and $p_{11}$ by replacing the latter in Equation (5) with empirically determined transition probabilities $p^{*}_{01}$ and $p^{*}_{11}$, which can be estimated with the following iteration scheme.

display math(8a)
display math(8b)

where $p_1$ and $p_{11}$ are the observed precipitation occurrence probability and transition probability, respectively, $k$ (= 1, 2, …) is the iteration number, $p^{*(k)}_{01}$ and $p^{*(k)}_{11}$ are the empirical transition probabilities at the $k$th iteration, and $\hat{p}^{(k)}_{11}$ is the transition probability recalculated at the $k$th iteration from the binary occurrence vector $O$ obtained with $p^{*(k)}_{01}$ and $p^{*(k)}_{11}$. When $k = 1$, $p^{*(1)}_{01}$ and $p^{*(1)}_{11}$ are the same as $p_{01}$ and $p_{11}$. The iteration can be stopped when the difference between the observed transition probability ($p_{01}$ or $p_{11}$) and the adjusted transition probability ($\hat{p}^{(k)}_{01}$ or $\hat{p}^{(k)}_{11}$) is less than a given tolerance. In this study, the iteration was stopped when the difference between $p_{11}$ and $\hat{p}^{(k)}_{11}$ was less than 5% of the observed probability $p_{11}$. Finally, the estimated empirical transition probabilities $p^{*}_{01}$ and $p^{*}_{11}$ are employed in Equation (5) to reproduce the Lag-1 autocorrelation and transition probabilities of the observations.

2.2.2. At-site precipitation amount

The LOCI precipitation amount scaling factor $s$ can reproduce mean wet-day precipitation amounts accurately; however, it cannot reproduce the standard deviation of wet-day precipitation amounts. The MSDM employs a probability distribution mapping technique to reproduce the observed precipitation amount distribution. In this method, the empirical cumulative probabilities of all nonzero precipitation amounts in the initial precipitation series $\tilde{Y}$ are calculated first. Adjusted precipitation amounts are then obtained by evaluating, at these cumulative probabilities, the inverse of the gamma distribution fitted to the observed nonzero precipitation amounts at each site.

The gamma distribution has been adopted to simulate or generate daily precipitation amounts in many studies (Stephenson et al., 1999; Wilks, 1999; Yang et al., 2005). The nonzero precipitation amount probability density function is

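$$f(x) = \frac{(x/\beta)^{\alpha - 1} \exp(-x/\beta)}{\beta\,\Gamma(\alpha)}, \qquad x, \alpha, \beta > 0 \qquad (9)$$

where $\alpha$ is the shape parameter, $\beta$ is the scale parameter, and $\Gamma(\cdot)$ is the gamma function. The mean ($\mu$) and variance ($\sigma^2$) of the daily precipitation amounts on wet-days are

$$\mu = \alpha\beta, \qquad \sigma^2 = \alpha\beta^2 \qquad (10)$$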
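
The probability mapping onto a gamma distribution can be sketched with the standard library alone. The sketch makes several assumptions that the text does not specify: the gamma parameters are obtained by the method of moments from the mean and variance relations above, the empirical cumulative probability uses the plotting position rank/(n + 1), and the gamma quantile is found by bisection on a power-series evaluation of the distribution function. All function names and sample values are hypothetical.

```python
import math


def gamma_params_from_moments(mean, variance):
    """Invert mu = alpha*beta and sigma^2 = alpha*beta^2 (method of moments)."""
    return mean ** 2 / variance, variance / mean


def gamma_cdf(x, alpha, beta):
    """Gamma CDF via the power series of the regularised incomplete gamma."""
    if x <= 0.0:
        return 0.0
    z = x / beta
    term = 1.0 / alpha
    total = term
    n = 1
    while term > 1e-15 * total and n < 10000:
        term *= z / (alpha + n)
        total += term
        n += 1
    log_prefactor = alpha * math.log(z) - z - math.lgamma(alpha)
    return min(1.0, total * math.exp(log_prefactor))


def gamma_ppf(p, alpha, beta):
    """Gamma quantile by bisection on gamma_cdf, for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    lo, hi = 0.0, beta * max(alpha, 1.0)
    while gamma_cdf(hi, alpha, beta) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha, beta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def probability_mapping(init_wet, alpha, beta):
    """Map nonzero initial amounts to gamma quantiles of equal probability."""
    n = len(init_wet)
    order = sorted(range(n), key=lambda i: init_wet[i])
    mapped = [0.0] * n
    for rank, i in enumerate(order, start=1):
        p = rank / (n + 1.0)  # plotting position: an assumption of this sketch
        mapped[i] = gamma_ppf(p, alpha, beta)
    return mapped


# Hypothetical example: alpha = 1 reduces the gamma to an exponential law.
alpha_mm, beta_mm = gamma_params_from_moments(mean=6.0, variance=18.0)
mapped = probability_mapping([2.0, 0.5, 7.0], alpha=1.0, beta=2.0)
```

The mapping preserves the rank order of the input amounts and changes only their marginal distribution, which is how the technique can adjust the wet-day standard deviation without altering which days are wet.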

2.2.3. Multi-site coherence

Typically, precipitation amounts at a site should be small or close to zero when nearby stations are all dry. Reproducing the spatial coherence of precipitation series is important for regional analyses. For instance, a streamflow simulation model requires precipitation inputs that are spatially distributed over the drainage basin, and streamflows can be modelled adequately only when the spatial coherences of precipitation occurrence and amounts over the targeted drainage basin are appropriately represented. In this study, the precipitation occurrence and amount series are generated from the initial precipitation series $\tilde{Y}$ at multiple observation sites. Therefore, the initial precipitation series should have an appropriate spatial correlation to reproduce the observed spatial correlation of precipitation occurrence and amounts. If an RCM can generate the appropriate spatial coherence of the initial precipitation series $\tilde{Y}$ from the large-scale precipitation series, $\tilde{Y}$ can be directly used to generate precipitation occurrence and amount series without additional operations. When the spatial coherences are not reproduced adequately for the initial precipitation series, one can adjust the cross-site correlation by adding correlated random series to the initial precipitation series. The detailed methodology to adjust the cross-site correlation coefficient in the initial precipitation series $\tilde{Y}$ at multiple sites based on random noise is provided in . The MSDM employs the correlated random number series to adjust the cross-site correlation among the initial precipitation series that are spatially interpolated from GCM precipitation outputs. Therefore, the adjusted MSDM series are different from the series obtained using weather generators, which usually generate cross-correlated series from random processes only. The approach suggested by Wilks (1998) is adopted to overcome the cross-site correlation differences between continuous series and transformed binary series (see for further details). Figure 1 presents the MSDM procedure to downscale precipitation series for multiple observation sites using GCM precipitation outputs.

Figure 1.

The multi-site statistical downscaling model procedure to downscale precipitation series at multiple observation sites from GCM precipitation outputs.
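
The generation of spatially correlated random noise can be illustrated with a Cholesky factorisation of a target covariance matrix. This is a generic sketch of drawing from a multivariate normal distribution, not the procedure the MSDM uses to determine the noise covariance itself; the two-site covariance matrix, the seed, and the function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def correlated_noise(cov, n_days, rng):
    """Draw n_days vectors from a zero-mean multivariate normal distribution."""
    low = cholesky(cov)
    n = len(cov)
    rows = []
    for _ in range(n_days):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        rows.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return rows


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical two-site case: standard deviations 1 and 2, correlation 0.7.
cov = [[1.0, 1.4], [1.4, 4.0]]
noise = correlated_noise(cov, 20000, random.Random(0))
r = correlation([row[0] for row in noise], [row[1] for row in noise])
```

The sample correlation `r` of the generated noise approaches the target of 0.7 as the series length grows. In a multi-site setting, each column of `noise` would be added to the initial precipitation series of the corresponding site.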

2.2.4. Strategies to project future precipitation series using the MSDM

Although the projection of future precipitation series, which is the ultimate goal of the suggested approach, is beyond the scope of this study and is left to further work, the strategies and assumptions employed to project future precipitation series using the MSDM are provided as follows. To project precipitation occurrence series, the future initial precipitation series $\tilde{Y}^{F}$ should be generated first using the spatial interpolation approach from GCM future precipitation outputs. The future precipitation occurrence series can then be projected by applying the same empirical transition probabilities $p^{*}_{01}$ and $p^{*}_{11}$ to the future initial precipitation series, under the assumption that the two thresholds will not change.

The future precipitation amount series can be projected using the probability mapping technique and the future gamma distribution. The shape and scale parameters of the ‘future’ gamma distributions can be predicted based on differences of the means and standard deviations of the historical and future initial precipitation series ($\tilde{Y}$ and $\tilde{Y}^{F}$). The shape and scale parameters ($\alpha_F$ and $\beta_F$) of the ‘future’ precipitation amount gamma distribution can be predicted as

display math(11a)
display math(11b)

where μF and σF are the predicted mean and standard deviation of the final future precipitation amounts, and αF and βF are the predicted gamma distribution's shape and scale parameters at a site. μO and σO, M and S, and MF and SF are the means and standard deviations estimated from the observed precipitation amount series, historical initial precipitation series and future initial precipitation series, respectively.

For the spatial coherences of future precipitation occurrence and amounts, the covariance matrices estimated from calibration data can be employed under the assumption that the cross-site random series correlations remain stable under future climate conditions. Therefore, the future spatial coherences of precipitation occurrence and amounts differ from the historical spatial coherences only when the cross-site correlations of the historical and future initial precipitation series ($\tilde{Y}$ and $\tilde{Y}^{F}$) are different.

3. Application

3.1. Study area and data

Observed daily precipitation data were obtained from the national archive of Environment Canada with time series from 1961 to 2010. LOCI and the MSDM are calibrated for the period from 1961 to 1990 and validated for the period from 1991 to 2010.

The southern Québec region in eastern Canada is the study area (Figure 2). Southern Québec has a humid continental climate with relatively warm and humid summers and cold and snowy winters. Owing to regular synoptic weather systems coming from the west and the mid-western United States (with local re-development over the Great Lakes area), as well as major systems moving along the Atlantic Ocean coast, the annual precipitation ranges from 887 to 1260 mm (30-year 1971–2000 climatological normal values; Table 1). Precipitation is generally evenly distributed throughout the year, with a summer peak associated with convective rainfall events. Two study areas were selected, and the MSDM was applied separately to each to evaluate its ability to reproduce the spatial coherence over two different precipitation regimes. Area B, which is more affected by various meteorological systems developing along the Gulf of St. Lawrence coastal areas or the Atlantic region (Figure 2), has more heterogeneous precipitation conditions than area A. Twelve and five meteorological observation sites are located in study areas A and B, respectively.

Figure 2.

Locations of global-scale CGCM3 grid points and observation stations of daily precipitations in southern Québec (Canada) study areas A and B. Numbers represent selected meteorological sites (see the names of the respective stations given in Table 1). A1–A4 and B1–B4 represent CGCM3 grid points.

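Table 1. Names and locations of meteorological observation stations and CGCM3.1 grid points, which provide global-scale precipitation outputs for study areas A and B (see their locations in Figure 2). Annual precipitations were calculated from the 30-year climatological period from 1971 to 2000.

| Area | No. | Observation site | Latitude (°N) | Longitude (°W) | Altitude (m) | Annual precipitation (mm) | CGCM3.1 grid point | Grid latitude (°N) | Grid longitude (°W) |
|---|---|---|---|---|---|---|---|---|---|
| A | 1 | Les Cedres | 45.30 | 74.05 | 47.2 | 923.9 | A1 | 46.39 | 75.00 |
| | 2 | St Jerome | 45.80 | 74.05 | 169.5 | 1051.8 | A2 | 42.68 | 75.00 |
| | 3 | Joliette | 46.02 | 73.45 | 56.0 | 998.9 | A3 | 46.39 | 71.25 |
| | 4 | Philipsburg | 45.03 | 73.08 | 53.3 | 1078.7 | A4 | 42.68 | 71.25 |
| | 5 | Vercheres | 45.77 | 73.37 | 21.0 | 1006.0 | | | |
| | 6 | Sorel | 46.03 | 73.12 | 14.6 | 960.0 | | | |
| | 7 | Granby | 45.38 | 72.72 | 175.0 | 1210.3 | | | |
| | 8 | Nicolet | 46.20 | 72.62 | 30.4 | 917.8 | | | |
| | 9 | Drummondville | 45.88 | 72.48 | 82.3 | 1113.5 | | | |
| | 10 | Magog | 45.27 | 72.12 | 274.0 | 1116.3 | | | |
| | 11 | Bromptonville | 45.48 | 71.95 | 130.0 | 1123.2 | | | |
| | 12 | Thetford Mines | 46.10 | 71.37 | 381.0 | 1260.4 | | | |
| B | 13 | Bagotville A | 48.33 | 71.00 | 159.1 | 919.3 | B1 | 50.10 | 71.25 |
| | 14 | Ste Lucie | 46.73 | 70.02 | 373.0 | 1142.3 | B2 | 46.39 | 71.25 |
| | 15 | Grandes Bergeronnes | 48.25 | 69.52 | 61.0 | 1029.6 | B3 | 50.10 | 67.50 |
| | 16 | Rimouski | 48.45 | 68.52 | 35.7 | 886.5 | B4 | 46.39 | 67.50 |
| | 17 | Mont-Joli A | 48.60 | 68.22 | 52.4 | 894.7 | | | |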

The global-scale GCM precipitation outputs originate from the CGCM3 model, which is the third version of the coupled atmosphere-ocean GCM from the Canadian Centre for Climate Modeling and Analysis of Environment Canada. This version uses the same ocean component as CGCM2 (Flato and Boer, 2001), but it makes use of the substantially updated atmospheric component AGCM3 (Atmospheric GCM, version 3). This GCM output is regularly used for downscaling and climate change impact research in Canada and abroad (e.g. see its use for statistical downscaling within the ENSEMBLES project in Europe: http://www.meteo.unican.es/ensembles/). The CGCM3 precipitation outputs were obtained from the DAI (Data Access and Integration) web site http://loki.qc.ec.gc.ca/DAI/. Table 1 provides detailed location information for the local observation sites and the CGCM3 grid points in study areas A and B. The climatological mean annual precipitation amounts (normal values of the 1971–2000 period) at all observation sites are also provided.

3.2. Model application and evaluation indices

LOCI and the MSDM were calibrated separately for each month and each observation site of each study area over 1961–1990. As the first step, the initial precipitation series (i.e. $\tilde{Y}$ in Section 'Methodology') at the meteorological sites of each study area were generated from the CGCM3 precipitation outputs at the four grid points located in the same study area using distance-weighted averaging.

In the calibration process, the MSDM produces random number series from a multivariate normal distribution with the determined standard deviations ($b_i\sigma_i$, $i = 1, 2, \ldots, N$) and cross-site correlations $\rho_{ij}$ ($i, j = 1, \ldots, N$; $i \ne j$) for each site. After adding the multi-site random series to the initial precipitation series, the parameters of the first-order Markov chain and probability distribution mapping methodologies were estimated for each site and for each month to reproduce the observed precipitation occurrence (≥1 mm/d used as the threshold to define wet-days) and amount characteristics. The LOCI model was directly calibrated from the initial precipitation series without adding random number series. The calibrated LOCI and MSDM were applied to CGCM3 precipitation outputs over the independent time period between 1991 and 2010 for validation. For stability and robustness of the MSDM random process, 50 precipitation series instances of equal length to that of the precipitation records were generated.

Basic statistics and extreme indices were employed to evaluate the downscaled daily precipitation series. Basic statistics of the precipitation occurrences and amounts include the daily precipitation occurrence probability and Lag-1 autocorrelation and the mean, standard deviation, and Lag-1 autocorrelation of the mean daily precipitation amount per wet-day. For extreme indices, R3days (maximum 3-d precipitation total) and PREC90 (90th percentile of rain day amount) are employed (as used in a similar study by Hessami et al., 2008). These indices are calculated per season, i.e. over spring (MAM), summer (JJA), autumn (SON), and winter (DJF). The basic series statistics and extreme indices downscaled by LOCI and MSDM are compared to those observed to evaluate the downscaling ability of the two models. The cross-site correlation coefficients among multiple observation sites were used to evaluate the ability to reproduce the spatial coherence of the precipitation occurrence and amount series downscaled by the two models.
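
The two extreme indices can be computed as in the following sketch. It assumes that a rain day is a day with at least 1 mm, that the 90th percentile is obtained by linear interpolation between order statistics (other percentile definitions give slightly different values), and that the input already holds the days of one season; the function names and sample values are hypothetical.

```python
import math


def r3days(precip):
    """Maximum precipitation total over three consecutive days."""
    if len(precip) < 3:
        raise ValueError("at least three days are required")
    return max(sum(precip[i:i + 3]) for i in range(len(precip) - 2))


def prec90(precip, wet_threshold=1.0):
    """90th percentile of rain-day amounts (linear interpolation)."""
    wet = sorted(p for p in precip if p >= wet_threshold)
    if not wet:
        return float("nan")
    position = 0.9 * (len(wet) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(wet) - 1)
    return wet[lower] + (position - lower) * (wet[upper] - wet[lower])


# Hypothetical daily series for one season (mm/d).
season = [0.0, 2.0, 10.0, 0.5, 0.0, 4.0, 6.0, 1.0, 0.0, 20.0, 3.0]
r3 = r3days(season)
p90 = prec90(season)
```

For this made-up series, the largest 3-d total is 23 mm (the last three days) and the interpolated 90th percentile of the seven rain-day amounts is 14 mm.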

4. Results

Downscaled results by LOCI and the MSDM are presented and discussed in this section. The MSDM generates 50 precipitation occurrence and amount series instances of equal length to the analysis period from 1961 to 2010 to test the stability and robustness of the model's stochastic procedure. Therefore, the MSDM results presented in this section were calculated from the 50 generated instances.

4.1. At-site results

As previously mentioned, the MSDM adds random series to the initial precipitation series, which are spatially interpolated from the CGCM3 outputs at four grid points, to reproduce the multi-site spatial coherence of precipitation occurrence and amounts. Figure 3 presents the annual and seasonal percentages of the total precipitation occurrence and amount variances that are explained by the CGCM3-derived initial precipitation series at each observation site during the MSDM calibration period. The remaining variance is supplied by the random series. For daily precipitation occurrence (Figure 3(a)), the variance percentages of the initial precipitation series over the whole annual period are larger in area A (54 to 60%) than in area B (37 to 48%). Therefore, larger variances associated with random series were required for the area B sites than for the area A sites to reproduce the observed natural variance at the observation sites. Seasonally, the variance percentages of the initial precipitation series in spring were larger than those in the other seasons at sites in both areas A and B. Surprisingly, for the large majority of stations, the lowest explained variances appear in winter, when large-scale precipitation events, which are more easily captured by GCM fields, are expected to dominate, rather than in summer, when local or convective events (i.e. not explicitly resolved at the GCM grid scale) take place.

Figure 3.

Annual and seasonal variance percentages of daily precipitation occurrence (≥1 mm) and amount explained by the CGCM3 output initial precipitation series relative to the entire MSDM precipitation occurrence and amount variances per station and per area (see their locations and names in Figure 2 and in Table 1) during the calibration period. The remaining variances at each site are provided by random noise.

For the precipitation amount (Figure 3(b)), the initial precipitation series variance percentages were smaller than those for precipitation occurrence and, over the whole annual period, varied from 34 to 45% for area A and from 25 to 36% for area B. Again, percentages at area A sites were larger than those at area B sites. Percentages in summer were smaller (for certain sites by a factor of nearly two) than in the other seasons at almost all sites, implying that larger random series variances are required in summer. In autumn, the variance percentages originating from the initial precipitation series were larger than those in the other seasons.

Figure 4 presents LOCI and MSDM downscaled versus observed probability and Lag-1 autocorrelation values of the daily precipitation occurrence at all sites in study areas A and B for the calibration and validation periods. Both downscaling models reproduced the observed daily precipitation occurrence probability notably well for the calibration and validation periods (Figure 4(a)). For the validation period, the two models yielded almost the same value for this statistic, although LOCI yielded slightly greater values (i.e. 0.29% on average; 0.8 wet-days per year) than the MSDM. Ranges of the 50 MSDM instances for this statistic are not shown to keep the figure simple. On average, the standard deviations of the 50 MSDM series instances for this statistic were 0.001 (i.e. 0.1%) for the calibration period and 0.006 (i.e. 0.6%) for the validation period at each site. As shown in Figure 4(b), the MSDM reproduced the Lag-1 autocorrelations quite well, but the LOCI model systematically overestimated them during the calibration period. Root mean square errors (RMSEs) of LOCI and the MSDM were 5.4 and 0.4% during the calibration period, respectively. LOCI and the MSDM yielded RMSEs of 4.1 and 2.4% for this statistic during the validation period. On average, standard deviations of the 50 MSDM series instances for this statistic were 0.001 (i.e. 0.1%) for the calibration period and 0.002 (i.e. 0.2%) for the validation period at each site. The strong autocorrelation of the initial precipitation series might be a reason for the overestimated precipitation occurrence Lag-1 autocorrelation of LOCI. Because LOCI employs only one threshold to determine wet- or dry-days from the initial precipitation series, it can reproduce the wet-day precipitation occurrence probability exactly, but it cannot guarantee the reproduction of the precipitation occurrence transition probabilities and Lag-1 autocorrelation.

Figure 4.

Scatter plot of downscaled versus observed (a) probability of occurrence and (b) Lag-1 autocorrelation of daily precipitation occurrence (Pocc) (≥ 1 mm) from the LOCI and MSDM models for the calibration (cal) and validation (val) periods.
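The occurrence statistics compared in Figure 4 can be illustrated with a minimal Python sketch. It assumes a daily series in millimetres and the 1 mm wet-day threshold given in the caption; the function names and the use of a Pearson correlation for the Lag-1 autocorrelation are choices of this sketch, not details taken from the study.

```python
import numpy as np

WET_THRESHOLD_MM = 1.0  # wet day defined as >= 1 mm, as in the Figure 4 caption


def occurrence_stats(precip_mm):
    """Return (wet-day probability, Lag-1 autocorrelation) of the wet/dry series."""
    wet = (np.asarray(precip_mm, dtype=float) >= WET_THRESHOLD_MM).astype(float)
    p_wet = wet.mean()
    # Pearson correlation between the occurrence series and itself shifted by one day
    lag1 = np.corrcoef(wet[:-1], wet[1:])[0, 1]
    return p_wet, lag1


def transition_probs(precip_mm):
    """Return (p01, p11): P(wet | previous day dry) and P(wet | previous day wet)."""
    wet = np.asarray(precip_mm, dtype=float) >= WET_THRESHOLD_MM
    prev, curr = wet[:-1], wet[1:]
    p01 = curr[~prev].mean()
    p11 = curr[prev].mean()
    return p01, p11
```

For a stationary first-order two-state Markov chain, the Lag-1 autocorrelation of occurrence equals p11 − p01. A single wet/dry threshold fixes only the overall wet-day probability, which is why it cannot control the two transition probabilities separately.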

Figure 5 presents the LOCI and MSDM downscaled values versus observed values of daily wet-day precipitation amount mean, standard deviation, and Lag-1 autocorrelation for all sites in study areas A and B during the calibration and validation periods. In Figure 5(a), both downscaling models reproduced the observed daily wet-day precipitation amount mean adequately during the calibration and the validation periods. On average, the standard deviations of the 50 MSDM series instances for this statistic were less than 0.01 mm for the calibration period and 0.25 mm for the validation period at all sites. As shown in Figure 5(b), the MSDM reproduced the daily wet-day precipitation amount standard deviation well, although with a small underestimation, especially during the calibration period (i.e. bias of −0.28 mm). The imperfect representation of the observed precipitation amounts by the gamma distribution is likely the prime source of this systematic difference. During the calibration period, LOCI yielded worse performance for this variable with an RMSE of 0.44 mm, while the MSDM RMSE was 0.28 mm. However, LOCI and the MSDM yielded RMSEs of 0.51 and 0.64 mm, respectively, during the validation period. On average, the standard deviations of the 50 MSDM series instances for this statistic were 0.72 mm for the calibration period and 3.9 mm for the validation period. The MSDM has no component that adjusts the Lag-1 autocorrelation of the wet-day precipitation amount; thus, as shown in Figure 5(c), the model produced nearly constant values of this statistic for all sites. LOCI yielded worse performance for this variable than the MSDM, and the RMSEs of LOCI and the MSDM were 0.045 and 0.033, respectively. For the validation period, LOCI and the MSDM yielded slightly larger values (increase of 0.05 and 0.02, respectively) than for the calibration period. On average, the standard deviations of the 50 MSDM realizations for this statistic were 0.026 (i.e. 2.6%) for the calibration period and 0.032 (i.e. 3.2%) for the validation period at each site.

Figure 5.

Scatter plots of downscaled versus observed (a) mean, (b) standard deviation, and (c) Lag-1 autocorrelation of precipitation amount (Pamount) on wet-days (≥ 1 mm) by LOCI and the MSDM. The LOCI and MSDM values for the validation period are plotted with observed values for the calibration period.

In Figures 6 and 7, seasonal box-and-whisker plots were used to compare the extreme variables (R3days and PREC90) between observations and the downscaling models. The box-and-whisker plots are provided for representative sites: site 8 in study area A and site 15 in study area B. For the box-and-whisker plots, outliers were identified as values lying more than 1.5 × IQR above or below the median, where IQR is the interquartile range. The box-and-whisker plots were generated using values from the whole analysis period from 1961 to 2010. In winter, LOCI overestimated the observed R3days at both sites, and the LOCI medians for this index were 12.8 and 14.4 mm larger than the observed medians at sites 8 and 15, respectively (Figure 6). LOCI could not reproduce the R3days variability in summer at site 15. However, the MSDM reproduced R3days fairly well with respect to the observations. As shown in Figure 7, LOCI tended to overestimate the PREC90 in winter at both sites, while it slightly underestimated it in summer at both sites. The MSDM reproduced the observed PREC90 medians better than LOCI except in autumn at site 15. The MSDM could perform less well for events more extreme than R3days and PREC90 because of the inability of the gamma distribution to reproduce the heavy tail of the precipitation distribution.

Figure 6.

Box-and-whisker plots of R3days from observations (Obs.), LOCI, and the MSDM for site 8 of study area A and site 15 of study area B. A value outside of 1.5 times the interquartile range from the median is defined as an outlier (−).

Figure 7.

Box-and-whisker plots of PREC90 from observation (Obs.), LOCI, and the MSDM for site 8 of study area A and site 15 of study area B. A value outside of 1.5 times the interquartile range from the median is defined as an outlier (−).
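As a rough illustration of the two extreme indices and the outlier rule used in Figures 6 and 7, the following Python sketch computes R3days (maximum 3-d precipitation total) and PREC90 (90th percentile of the rain-day amount) for one daily series. The function names, the 1 mm rain-day threshold, and the linear-interpolation percentile are assumptions of this sketch; grouping by season and year is left to the caller.

```python
import numpy as np


def r3days(precip_mm):
    """Maximum total over any 3 consecutive days."""
    p = np.asarray(precip_mm, dtype=float)
    three_day_totals = np.convolve(p, np.ones(3), mode="valid")
    return three_day_totals.max()


def prec90(precip_mm, wet_threshold_mm=1.0):
    """90th percentile of the amounts on rain days (>= threshold)."""
    p = np.asarray(precip_mm, dtype=float)
    return np.percentile(p[p >= wet_threshold_mm], 90)


def outliers_from_median(values, k=1.5):
    """Values lying more than k * IQR away from the median."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    return v[np.abs(v - np.median(v)) > k * (q3 - q1)]
```

Applying `r3days` and `prec90` to each season of each year would give one sample of index values per season, which is the kind of sample a box-and-whisker plot summarizes.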

4.2. Multi-site result analyses

In Figures 8 and 9, the modelled and observed cross-site correlation coefficients are presented for the precipitation occurrence and amount, respectively, for each pair of stations, for the two regions A and B, and for the calibration and validation periods. The MSDM reproduced the observed precipitation occurrence series cross-site correlations quite well for both study areas (Figure 8(a) and (b)). However, LOCI systematically overestimated the correlations for all pairs in both study areas. The bias of LOCI was 0.22 in area A and 0.36 in area B for the calibration period. For the validation period, both models yielded correlation values similar to those for the calibration period.

Figure 8.

Cross-site correlation coefficients between pairs of precipitation occurrence series from observation (Obs.), LOCI, and the MSDM as a function of inter-station distances for all possible station pair combinations in study areas A and B.

Figure 9.

Cross-site correlation coefficients between pairs of precipitation amount series from observation (Obs.), LOCI, and the MSDM as a function of inter-station distances for all possible station pair combinations in study areas A and B.

As shown for the precipitation amount (Figure 9(a) and (b)), the MSDM reproduced the observed cross-site correlations fairly well for both study areas, whereas LOCI showed difficulty in representing this variable and consistently overestimated the correlation for all pairs of stations in both study areas. The precipitation amount cross-site correlation biases of LOCI, which were 0.43 for area A and 0.54 for area B, were larger than those for the precipitation occurrence during the calibration period. For the validation period, both models yielded values for this statistic similar to those for the calibration period. On average, the standard deviations of the 50 MSDM series instances for the precipitation occurrence and amount cross-site correlations were small (i.e. 0.01 for the calibration period and 0.02 for the validation period for all sites). The initial precipitation series derived directly from global scale CGCM3 precipitation outputs using the spatial interpolation approach already have strong cross-site correlations, which are the main cause of the overestimated LOCI cross-site correlations of precipitation occurrence and amount. Note that the LOCI model does not have any cross-site correlation adjustment component.
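The two cross-site statistics discussed here can be sketched in Python for a single pair of sites. This is only an illustration: the function name is hypothetical, and computing the amount correlation over the full daily series (dry days included) is an assumption, since the exact convention is not restated in this section.

```python
import numpy as np


def cross_site_correlations(precip_a_mm, precip_b_mm, wet_threshold_mm=1.0):
    """Return (occurrence correlation, amount correlation) for two sites."""
    a = np.asarray(precip_a_mm, dtype=float)
    b = np.asarray(precip_b_mm, dtype=float)
    # Occurrence: correlation between the two binary wet/dry series
    occ_a = (a >= wet_threshold_mm).astype(float)
    occ_b = (b >= wet_threshold_mm).astype(float)
    occ_corr = np.corrcoef(occ_a, occ_b)[0, 1]
    # Amount: correlation between the daily amounts (assumed: all days included)
    amt_corr = np.corrcoef(a, b)[0, 1]
    return occ_corr, amt_corr
```

Repeating this for all N(N−1)/2 station pairs and plotting the results against inter-station distance would give plots of the kind shown in Figures 8 and 9.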

Although the MSDM can reproduce the cross-site correlation among the observation sites in a targeted regional area, it is not designed to reproduce the regional precipitation amounts directly. However, the total precipitation amount in a targeted region or drainage basin is an important variable for hydrological analysis. Therefore, the two extreme indices, R3days and PREC90, were evaluated for regional average precipitation series, which are the daily precipitation series averaged over the observation sites in each of study areas A and B. Figure 10 presents the box-and-whisker plots of R3days from observations and the two downscaling models for each season for the A and B regional average precipitation amount series. LOCI overestimated the median R3days values in all seasons in both study areas. Differences between the medians of the LOCI and observation box-and-whisker plots were largest in winter and smallest in autumn. In winter, the LOCI medians were 50.3 and 52.5% greater than the observations for study areas A and B, respectively. The MSDM reproduced the observed medians within 13% for all cases. Figure 11 presents box-and-whisker plots of PREC90 from observations and the two downscaling models for each season for the A and B regional average precipitation amount series. Again, LOCI overestimated the PREC90 in all seasons in both study areas. The LOCI medians were 18.5 to 40.9% and 35.6 to 55.8% larger than those observed for each season in study areas A and B, respectively. The MSDM slightly underestimated the observed medians (by less than 16%) for all cases. As shown in Figure 9, the MSDM underestimates cross-site correlations of precipitation amounts, especially when the cross-site correlations are large and inter-station distances are small. The underestimated cross-site correlation could be a reason for the underestimation of the regional PREC90.
In Figures 7 to 11, it can be observed that the regional average precipitation series for areas A and B yielded smaller median values and ranges of maximum and minimum values than those for site 8 of area A and site 15 of area B, respectively.

Figure 10.

Box-and-whisker plots of R3days from observation (Obs.), LOCI, and the MSDM for regional average precipitation series in study areas A and B. A value outside of 1.5 times the interquartile range from the median is defined as an outlier (−).

Figure 11.

Box-and-whisker plots of PREC90 from observation (Obs.), LOCI, and the MSDM for regional average precipitation series in study areas A and B. A value outside of 1.5 times the interquartile range from the median is defined as an outlier (−).

4.3. Comparison to a weather generator

The MSDM employs weather generating techniques such as the first-order Markov chain and gamma distribution. However, the main difference between the MSDM and a weather generator is that the MSDM adopts GCM precipitation output series, whereas a weather generator uses random noise series to generate precipitation series. Therefore, the MSDM downscaled precipitation series can be more consistent with the results of a host GCM model than those by a weather generator in terms of time-domain variability.

The inter-annual variability of the total annual precipitation from LOCI, the MSDM, and a weather generator approach at site 8 of study area A is compared in Figure 12. In the figure, CGCM3 represents the initial precipitation series spatially interpolated from CGCM3 precipitation outputs. The weather generator has the same first-order Markov chain and probability mapping procedures as the MSDM; however, it uses random series generated by random processes and is independent of the host CGCM3 precipitation output. Precipitation series generated by LOCI and the MSDM follow the temporal variability of the CGCM3 well, and the linear correlation coefficients between LOCI and the CGCM3 and between the MSDM and the CGCM3 are 0.98 and 0.81, respectively. The weather generator, however, showed totally different behaviour from the CGCM3 for the annual total precipitation.

Figure 12.

Annual total precipitation of CGCM3, LOCI, the MSDM, and weather generation at site 8. CGCM3 represents the initial precipitation series spatially interpolated from CGCM3 precipitation outputs, and weather generation is the precipitation series generated by random number generation, first-order Markov chain, and a gamma distribution probability mapping approach.

5. Summary and conclusion

Schmidli et al. (2006) proposed the LOCI model, which can downscale precipitation series at a single observation site directly from GCM (or RCM) precipitation outputs. In this study, the MSDM was proposed to reproduce the Lag-1 autocorrelation of precipitation occurrence, the standard deviation of precipitation amount, and the spatial coherence of precipitation occurrence and amount series among multiple observation sites. LOCI and the MSDM were applied to downscale daily precipitation series in two regional areas in southern Québec (Canada), and CGCM3 (i.e. GCM) precipitation outputs were used as the initial sources to generate local scale precipitation series.

In the single-site result analysis, the MSDM adequately reproduced the precipitation occurrence Lag-1 autocorrelation and the standard deviation of wet-day precipitation amounts. Therefore, the suggested methodologies, i.e. the empirical first-order Markov chain and probability mapping approaches in the MSDM, can be applied to downscale at-site precipitation series from GCM or RCM precipitation output series. The MSDM also yielded more reliable results than LOCI in reproducing the extreme precipitation amount indices (R3days and PREC90), especially in winter and summer. The two extreme indices are affected by the precipitation occurrence and amount Lag-1 autocorrelations and the wet-day precipitation amount standard deviation, which the MSDM reproduced better than LOCI did.

In the MSDM, random noise has been added to the initial precipitation series from CGCM3 precipitation outputs to reproduce the spatial coherence among multiple observation sites. Regression-based statistical downscaling models often employ statistical randomization procedures because regression models with large-scale predictors explain only part of the natural variability of precipitation occurrence and amount. Therefore, the reasons for adding random noise in the MSDM and in regression-based SDs are different. However, regression-based SDs might require as much random noise variance as the MSDM, or more, to reproduce the natural variance of the precipitation series. For example, Hessami et al. (2008) reported that their linear regression-based downscaling model using large-scale reanalysis predictors explained only 10 to 30% of the precipitation series natural variance in southern Québec (Canada). The MSDM might be preferred over a regression-based downscaling approach because it can avoid complicated predictor selection procedures and systematic errors due to differences between reanalysis and host GCM (CGCM3 in this study) predictor variables. Regression-based SDs generally require additional bias and variance adjustment procedures to overcome the systematic differences between reanalyses and GCM outputs (Wilby and Dawson, 2004).

In the multi-site results analysis, the MSDM reproduced the cross-site correlations of the daily precipitation occurrence and amount series among multiple observation sites better than LOCI. LOCI systematically overestimated the observed correlations. Initial precipitation series were generated from CGCM3 global scale precipitation outputs using a spatial interpolation approach, and LOCI and the MSDM then used the initial precipitation series to generate precipitation series at the observation sites. Therefore, strong cross-site correlations already exist among the initial precipitation series and are the main cause of the over-represented LOCI cross-site correlations. If initial precipitation series with more accurate cross-site correlations than those of this study were available, LOCI could yield better performance in terms of spatial coherence. Furthermore, the MSDM would then require smaller random noise series variances to reproduce the spatial coherence. RCMs can provide better initial precipitation series for multiple local sites in terms of cross-site correlation because they can usually reproduce spatial coherence better than GCMs by considering regional-scale physical processes and topography not explicitly incorporated in GCMs.

This study also evaluated the ability of the MSDM to reproduce total precipitation amounts over the targeted regions. The MSDM reproduced the observed R3days and PREC90 values well, while LOCI overestimated these indices in all cases. Regional average precipitation over a local area or drainage basin can be reproduced accurately only when the spatial coherence of precipitation occurrence and amount is accurately reproduced.

In future work, the proposed MSDM will be applied to other predictands, such as the daily maximum and minimum temperatures from multiple observation sites. The MSDM will be applied to RCM precipitation outputs, which most likely provide better spatial coherence for a regional area than a GCM. The downscaled results of this model should be compared to those of regression-based downscaling model outputs or RCM model outputs to obtain insight about the various downscaling approaches and the uncertainties related to using one particular downscaling model.

Acknowledgements

We acknowledge the financial support provided by the Natural Sciences and Engineering Research Council (NSERC) of Canada. We are also grateful to Lucie Vincent and Eva Mekis from Environment Canada for providing observed data sets of homogenized temperature. The authors would also like to acknowledge the DAI (http://quebec.ccsn.ca/DAI/) Team for providing the predictor data and technical support. The DAI data download gateway is made possible through collaboration among the Global Environmental and Climate Change Centre (GEC3), the Adaptation and Impacts Research Division (AIRD) of Environment Canada, and the Drought Research Initiative (DRI).

Appendix

A.1. Cross-site correlation adjustment procedure using random noise

The cross-site correlation ($\rho_{ij}$) between the initial series $P_i$ (with standard deviation $\sigma_i$) and $P_j$ (with standard deviation $\sigma_j$) at sites i and j is changed by adding correlated random series (with correlation coefficient $r_{ij}$) $R_i \sim N(0, b_i\sigma_i)$ and $R_j \sim N(0, b_j\sigma_j)$. The $b_i$ and $b_j$ are constants that determine the standard deviation ratios of $R_i$ and $R_j$ to $P_i$ and $P_j$, respectively. With the random series independent of the initial series, the variances of $\tilde{P}_i$ ($= P_i + R_i$) and $\tilde{P}_j$ ($= P_j + R_j$) and the covariance between them are defined below:

$$\mathrm{Var}(\tilde{P}_i) = (1 + b_i^2)\,\sigma_i^2 \tag{A1a}$$
$$\mathrm{Var}(\tilde{P}_j) = (1 + b_j^2)\,\sigma_j^2 \tag{A1b}$$
$$\mathrm{Cov}(\tilde{P}_i, \tilde{P}_j) = (\rho_{ij} + r_{ij}\, b_i b_j)\,\sigma_i \sigma_j \tag{A1c}$$

The correlation coefficient $\tilde{\rho}_{ij}$ between $\tilde{P}_i$ and $\tilde{P}_j$ can be calculated as below:

$$\tilde{\rho}_{ij} = \frac{\rho_{ij} + r_{ij}\, b_i b_j}{\sqrt{(1 + b_i^2)(1 + b_j^2)}} \tag{A2}$$

In the equations, $\tilde{\rho}_{ij}$ is a function of $b_i$, $b_j$, and $r_{ij}$, which should be determined to make the adjusted correlation coefficient $\tilde{\rho}_{ij}$ equal to the correlation coefficient of the observed series $\rho^{obs}_{ij}$.

One can simply assume that the ratios of standard deviations of the random series $R_i$ and $R_j$ are equal ($b_i = b_j = b$). If one assumes that $b_i = b_j = b$ and replaces $\tilde{\rho}_{ij}$ with $\rho^{obs}_{ij}$, the constant b of Equation (A2) is determined as

$$b = \sqrt{\frac{\rho_{ij} - \rho^{obs}_{ij}}{\rho^{obs}_{ij} - r_{ij}}} \tag{A3}$$

In the equation, one may want to maximize the contributions of the initial precipitation series at sites i and j while minimizing the portion of the random series $R_i$ and $R_j$. One can estimate the smallest value of b by replacing the correlation coefficient of the random series, $r_{ij}$, with zero (when the cross-site correlation of the initial series, $\rho_{ij}$, is larger than that of the observed series, $\rho^{obs}_{ij}$) or one (when $\rho_{ij} < \rho^{obs}_{ij}$) in the equation. It is reasonable that uncorrelated random series should be added to the initial series to decrease $\rho_{ij}$ when $\rho_{ij} > \rho^{obs}_{ij}$, while highly correlated random series should be employed to increase $\rho_{ij}$ when $\rho_{ij} < \rho^{obs}_{ij}$. As examples, Figure A1 presents how large a b value is required to reproduce an observed cross-site correlation coefficient $\rho^{obs}_{ij}$ for three cases: when (1) math formula and math formula, (2) math formula and math formula, and (3) math formula and math formula. In the figure, the range of the observed cross-site correlation coefficient $\rho^{obs}_{ij}$ is from 0.2 to 0.9, which should be a commonly desired correlation coefficient range to reproduce in real applications. It can be expected that case (1) might be more common than the others when the initial precipitation series are generated from global-scale GCM precipitation outputs (see this study's application result). The value of b increases as the difference between $\rho_{ij}$ and $\rho^{obs}_{ij}$ increases. Note that the b values for precipitation occurrence and amount series are different because the cross-site correlations of the precipitation occurrence and amount series are generally different.

Figure A1.

Relationships between b values and observed cross-site correlation coefficients $\rho^{obs}_{ij}$ for three cases when (1) math formula and math formula, (2) math formula and math formula, and (3) math formula and math formula.
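The dependence of b on the observed correlation can be explored with a short Python sketch. It assumes that Equation (A3) takes the form b = sqrt((ρ_init − ρ_obs) / (ρ_obs − r)), which follows from adding independent noise with standard deviation bσ to each initial series; the symbol names, the function names, and the initial correlation of 0.9 used in the loop are illustrative assumptions rather than values from the study.

```python
import math


def smallest_b(rho_init, rho_obs):
    """Smallest noise-to-series standard deviation ratio b (assumed form of A3).

    The noise correlation r is set to 0 to lower the correlation
    (rho_init > rho_obs) and to 1 to raise it (rho_init < rho_obs).
    """
    if rho_init == rho_obs:
        return 0.0  # no adjustment needed
    r_noise = 0.0 if rho_init > rho_obs else 1.0
    denom = rho_obs - r_noise
    if denom == 0.0 or (rho_init - rho_obs) / denom < 0.0:
        raise ValueError("target correlation cannot be reached with this noise")
    return math.sqrt((rho_init - rho_obs) / denom)


def initial_variance_share(b):
    """Fraction of the adjusted-series variance that comes from the initial series."""
    return 1.0 / (1.0 + b * b)


# Hypothetical case: strongly correlated initial series (0.9), various targets
for rho_obs in (0.2, 0.4, 0.6, 0.8):
    b = smallest_b(0.9, rho_obs)
    print(f"rho_obs={rho_obs:.1f}  b={b:.3f}  share={initial_variance_share(b):.2f}")
```

The second function expresses the trade-off described above: the larger the gap between the initial and observed correlations, the larger b becomes and the smaller the share of variance retained from the initial precipitation series.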

For N observation sites, one can estimate N − 1 values of the constant b from the combinations of site i with every other site j (j = 1, …, N, j ≠ i). For each site i, the largest of the calculated b values is taken as $b_i$, so that the correlation coefficients of the random series, $r_{ij}$, can be recalculated to satisfy all combinations while ensuring that all $r_{ij}$ remain positive. After determining $b_i$ and $b_j$, the correlation coefficient $r_{ij}$ is recalculated with the following equation, which is derived from Equation (A2) by replacing the adjusted correlation coefficient $\tilde{\rho}_{ij}$ with the observed one $\rho^{obs}_{ij}$ (here $\rho_{ij}$ is the cross-site correlation of the initial series).

$$r_{ij} = \frac{\rho^{obs}_{ij}\sqrt{(1 + b_i^2)(1 + b_j^2)} - \rho_{ij}}{b_i b_j} \tag{A4}$$

The covariance between the random series at different sites i and j is subsequently calculated from the determined $b_i$, $b_j$, and the random-series correlation coefficient $r_{ij}$. The correlated random number series for all observation sites are generated from a multivariate normal distribution with the estimated covariance matrix among the N observation sites.
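The whole adjustment procedure can be sketched in Python as follows. The sketch assumes the pairwise b of Equation (A3) in the form sqrt((ρ_init − ρ_obs) / (ρ_obs − r)) with r set to 0 or 1, and the recalculated noise correlation of Equation (A4) in the form r = (ρ_obs·sqrt((1 + b_i²)(1 + b_j²)) − ρ_init) / (b_i·b_j). These forms, the function names, and the requirement that every site needs some adjustment (b_i > 0) are assumptions of this illustration, and the resulting noise covariance matrix is not guaranteed to be positive semi-definite in general.

```python
import numpy as np


def noise_parameters(rho_init, rho_obs):
    """Return per-site ratios b_i and the noise correlation matrix r_ij."""
    rho_init = np.asarray(rho_init, dtype=float)
    rho_obs = np.asarray(rho_obs, dtype=float)
    n = rho_init.shape[0]
    b_pair = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j or rho_init[i, j] == rho_obs[i, j]:
                continue
            # Smallest pairwise b: noise correlation 0 to lower, 1 to raise
            r0 = 0.0 if rho_init[i, j] > rho_obs[i, j] else 1.0
            b_pair[i, j] = np.sqrt(
                (rho_init[i, j] - rho_obs[i, j]) / (rho_obs[i, j] - r0)
            )
    b = b_pair.max(axis=1)  # largest pairwise value at each site
    if np.any(b == 0.0):
        raise ValueError("sketch assumes every site needs some adjustment")
    scale = np.sqrt(np.outer(1.0 + b**2, 1.0 + b**2))
    r = np.eye(n)
    off_diag = ~np.eye(n, dtype=bool)
    # Recalculate the noise correlations with the final b_i and b_j
    r[off_diag] = ((rho_obs * scale - rho_init) / np.outer(b, b))[off_diag]
    return b, r


def sample_noise(b, r, sigma, n_days, rng):
    """Draw correlated noise; the standard deviation at site i is b_i * sigma_i."""
    sd = np.asarray(b, dtype=float) * np.asarray(sigma, dtype=float)
    cov = r * np.outer(sd, sd)
    return rng.multivariate_normal(np.zeros(len(sd)), cov, size=n_days)
```

With a generator such as `rng = np.random.default_rng(0)`, adding the output of `sample_noise` to initial series that have standard deviations `sigma` and correlations `rho_init` should give adjusted series whose cross-site correlations are close to `rho_obs`.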

When the continuous series at sites i and j are transformed to binary series, the binary series cross-site correlation coefficient is usually smaller than that of the continuous series. Correlations between binary series (ξij) and continuous series (ωij) at locations i and j can be derived empirically (Wilks, 1998). This study employed a simple power function to derive the empirical relationships as below:

display math(A5)

The parameters c and d have been estimated to yield the smallest RMSE between the N(N−1)/2 cross-site correlation coefficients of the observed binary series and the corresponding N(N−1)/2 cross-site correlation coefficients of the transformed binary series.
