Climate change has resulted in a range of undesirable effects such as sea level rise, global warming, disruptions of water availability and more frequent extreme events. The Fourth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC, 2007) has indicated a global surface temperature increase of 1.8–4.0 °C by the year 2100 as compared to the present climate, with maximum changes in the high latitude regions (Khaliq et al., 2008). Rising temperature will have a major effect on atmospheric processes, and will impact the amount of precipitation a region receives. The rising global temperature will also make extreme events occur more frequently (Barnett et al., 2006; Wilcox et al., 2007; Allan et al., 2008). Precipitation is an important parameter because it is one of the driving factors within the hydrologic cycle. In particular, an increased intensity of extreme precipitation events may have a large impact on society, e.g. in urban areas, mountainous areas, agricultural areas and river basins as a whole (Booij, 2002). Thus, it is important to look at the extremes in precipitation and temperature, as their impact on the population could be much larger than that of the increase in mean temperature alone (Chen et al., 2008).
Atmosphere–ocean coupled general circulation models (AOGCMs) offer simulated current and predicted future time series of climate variables, considering various internal and external climate forces, including various scenarios for increases in greenhouse gases. Unfortunately, the accuracy of AOGCMs decreases at finer spatial and temporal scales; a typical resolution of the AOGCM ranges from 250 to 600 km, but the need for impact studies, conversely, increases at finer scales. The representation of regional precipitation is distorted due to this coarse resolution, which does not provide for the local effects and does not capture the subgrid-scale processes significant for the formation of site-specific precipitation conditions. While some models are parameterized, details of the land–water distribution or topography in others are not represented at all (Widmann et al., 2003).
Statistical and dynamic downscaling are two commonly used techniques for the development of climate scenarios; their accuracy varies with the season, region, time period and variable of interest. In this study, a non-parametric weather generator-based statistical method is applied to generate precipitation from GCM outputs. Weather generators produce sequences of weather variables and can be regarded as complex random number generators, the output of which resembles daily weather data at a particular location. The parameters of the weather generators are conditioned upon a large scale state or upon the relationships between daily weather generator parameters and climatic averages. They can be used to characterize the nature of future days on the basis of more readily available time-averaged climate change information (Wilks and Wilby, 1999). Early work using weather generators as a downscaling tool in climate change studies can be found in Hughes (1993), Hughes and Guttorp (1994), Hughes et al. (1999) and Wilks and Wilby (1999). An overview of stochastic weather generation models is presented by Wilks and Wilby (1999). Examples of non-parametric weather generators that have been successfully employed in climate change studies are LARS-WG (Semenov and Barrow, 1997), K-nearest neighbour (K-NN) (Yates et al., 2003; Sharif and Burn, 2006) and EARWIG (Kilsby et al., 2007).
Considerable research efforts have been undertaken to statistically model high precipitation amounts, with much evidence of their heavy-tailed distribution (Koutsoyiannis, 2004). Weather generators can use this information to model precipitation extremes consistently. However, the use of weather generators for improving the simulation of precipitation extremes has been limited. Furrer and Katz (2008) proposed several possible advanced statistical approaches for improving the treatment of extremes within a parametric generalized linear modeling (GLM)-based stochastic weather generator framework. They found a substantial improvement with a hybrid technique that made use of a gamma distribution for low-to-moderate intensities and a generalized Pareto distribution for high intensities. Sharif and Burn (2006) used the non-parametric K-NN weather generator model for simulating extreme precipitation events and found encouraging results in simulating extreme dry and wet spells.
Different downscaling procedures produce different results from the same AOGCM outputs. The downscaled AOGCM outputs are therefore burdened with uncertainties due to inter-model variability (AOGCM uncertainty), inter-scenario variability (scenario uncertainty) and variability due to the downscaling methods themselves (Ghosh, 2007). The purpose of the study is twofold: first, a principal component (PC)-based weather generator is applied to develop synthetic precipitation data from AOGCM outputs for future climate; second, the propagation of AOGCM and inter-scenario uncertainty is investigated to increase our understanding of the impacts of climate change, with an emphasis on extreme precipitation events.
2. Case study area and database
2.1. The Upper Thames River basin
The Upper Thames River (UTR) basin, located in Southwestern Ontario, Canada, is a 3500 km² area nested between Lakes Erie and Huron. The basin has a population of about 420 000. London, Ontario is the major urban centre with a population of around 350 000. The Thames River is about 273 km long with an average annual discharge of 39.3 m³/s. The Thames River basin consists of two major tributaries of the River Thames: the North branch (1750 km²), flowing southward through Mitchell, St. Marys and eventually into London where it meets the South branch; and the South branch (1360 km²), flowing through Woodstock, Ingersoll and east London. The basin receives about 1000 mm of annual precipitation, 60% of which is lost through evaporation and/or evapotranspiration, stored in ponds and wetlands or recharged as groundwater (Prodanovic and Simonovic, 2006).
For the purpose of analysis the following databases were used:
Daily observed precipitation, maximum and minimum temperature (Tmax and Tmin) data from 15 stations covering the UTR basin (Figure 1) for the period of 1979–2005 has been collected from Environment Canada.
Time series of climate variables for different regions of the world are available at the Canadian Climate Change Scenarios Network (CCCSN) website. For the present study, six AOGCMs' climate data (daily precipitation, Tmax and Tmin), each with two to three emission scenarios have been collected. The AOGCMs used in the study are the third generation Canadian Coupled Global Climate Model at T47 (CGCM3T47) and T63 (CGCM3T63) resolutions, Australia's Commonwealth Scientific and Industrial Research Organization generated MK3 Climate Systems Model (CSIROMK3.5), Goddard Institute for Space Studies provided Atmosphere-Ocean Model (GISS-AOM), the Japanese Model for Interdisciplinary Research on Climate version 3.2 in high (MIROC3.2HIRES) and medium (MIROC3.2MEDRES) resolutions. Three scenarios, A1B, A2 and B1, developed by IPCC's Special Report on Emission Scenarios (SRES) have been used in order to investigate the widest possible range of future climates. Table I lists the AOGCMs used with the available scenarios for each model.
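Table I. List of AOGCM models and emission scenarios used

| GCM model | Sponsor, country | SRES scenarios |
|---|---|---|
| CGCM3T47 | Canadian Centre for Climate Modelling and Analysis | A1B, B1, A2 |
| CGCM3T63 | Canadian Centre for Climate Modelling and Analysis | A1B, B1, A2 |
| CSIROMK3.5 | Commonwealth Scientific and Industrial Research Organization (CSIRO) Atmospheric Research, Australia | A1B, A2, B1 |
| GISS-AOM | National Aeronautics and Space Administration (NASA)/Goddard Institute for Space Studies (GISS), USA | A1B, B1 |
| MIROC 3.2 HIRES, 2004 | Center for Climate System Research (University of Tokyo), National Institute for Environmental Studies and Frontier Research Center for Global Change (JAMSTEC), Japan | A1B, B1 |
| MIROC 3.2 MEDRES, 2004 | Center for Climate System Research (University of Tokyo), National Institute for Environmental Studies and Frontier Research Center for Global Change (JAMSTEC), Japan | A1B, A2, B1 |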
Stochastic weather generators simulate weather data to assist in the formulation of water resource management policies. The basic assumption for producing synthetic sequences is that the past is representative of the future. They are essentially complex random number generators, which can be used to produce a synthetic series of data. This allows the researcher to account for natural variability when predicting the effects of climate change. The initial version of the weather generator (Sharif and Burn, 2006) was developed based on a K-NN resampling strategy proposed by Yates et al. (2003). The major drawback of the K-NN weather generator developed by Yates et al. (2003) was that the synthetic dataset was confined to the maximum–minimum range of the observed data. Sharif and Burn (2007) improved this algorithm by adding a perturbation process that can generate extremes beyond those observed. The original version of the weather generator was later revised to account for leap years (Prodanovic and Simonovic, 2006) and to further incorporate principal component analysis (PCA) to reduce the multicollinearity of the predictor variables (Eum et al., 2009).
4. Model setup
4.1. Data pre-processing
Precipitation, as well as maximum and minimum temperature (Tmax and Tmin, respectively), has been collected for each of the six AOGCMs' emission scenarios at the grid points surrounding the Thames River basin. Data have been obtained for the time slices 1960–1990, 1990–2000, 2001–2010 and 2041–2070 (2050s). Pre-processing of the AOGCM data has been carried out in two steps that are explained in the following sections.
4.1.1. Spatial interpolation of AOGCMs
Climate variables from the nearest grid points have been interpolated to provide a dataset for each of the stations of interest. For the purpose of interpolation, the inverse distance weighting (IDW) method is used. The method takes the AOGCM variables from the four nearest grid points around the station, computes the distance from each grid point to the station of interest, and combines the grid-point values using weights that decrease with that distance.
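The interpolation step can be illustrated with a minimal sketch. The function name `idw_interpolate`, the planar distance and the exponent `power=2.0` are assumptions made for illustration only; the text does not state the exponent or the distance measure actually used.

```python
import math

def idw_interpolate(station, grid_points, power=2.0):
    """Inverse-distance-weighted estimate of a variable at a station.

    station: (x, y) coordinates of the station.
    grid_points: list of ((x, y), value) pairs for the nearest grid points.
    power: distance exponent (assumed; not given in the text).
    """
    weights, values = [], []
    for (gx, gy), value in grid_points:
        # Planar distance is an assumption; a great-circle distance could be used.
        dist = math.hypot(gx - station[0], gy - station[1])
        if dist == 0.0:
            return value  # station coincides with a grid point
        weights.append(1.0 / dist ** power)
        values.append(value)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For a station equidistant from its four surrounding grid points, the estimate reduces to the plain average of the four values.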
4.1.2. Calculation of change factors for future climate
In order to generate future climate data, the difference between the base climate and the AOGCM outputs (2041–2070 or 2050s) has been computed for the precipitation, Tmax and Tmin variables. The resulting change factors are then used to modify the historical dataset collected for each station. Monthly temperature change factors are added to the historical daily temperatures of the corresponding month, and the historical precipitation values are multiplied by the precipitation change factors. These new datasets, modified by the change factors, are then run through the WG-PCA described below to produce a synthetic 54-year-long dataset for each scenario. A synthetic version of the historical dataset, using the observed historical data as input, has also been produced to evaluate the performance of WG-PCA.
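A minimal sketch of the change factor approach follows. It assumes that the temperature factor is the difference of monthly means (future minus base) and that the precipitation factor is their ratio; the function names and the `(month, value)` record layout are hypothetical.

```python
def monthly_mean(records):
    """Mean value per calendar month from (month, value) pairs."""
    sums, counts = {}, {}
    for month, value in records:
        sums[month] = sums.get(month, 0.0) + value
        counts[month] = counts.get(month, 0) + 1
    return {m: sums[m] / counts[m] for m in sums}

def change_factors(base, future, multiplicative):
    """Monthly change factors: ratio for precipitation, difference for temperature."""
    base_mean, future_mean = monthly_mean(base), monthly_mean(future)
    if multiplicative:
        # Assumes a non-zero base-climate mean for every month.
        return {m: future_mean[m] / base_mean[m] for m in base_mean}
    return {m: future_mean[m] - base_mean[m] for m in base_mean}

def apply_factors(observed, factors, multiplicative):
    """Scale or shift each observed daily value by the factor of its month."""
    if multiplicative:
        return [(m, v * factors[m]) for m, v in observed]
    return [(m, v + factors[m]) for m, v in observed]
```

Because precipitation is scaled rather than shifted, dry days in the historical record remain dry in the modified dataset.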
5. Selection of principal components
In order to reduce the multidimensionality and collinearity associated with the large number of input variables, PCA has been integrated within the weather generator. The process requires selecting appropriate PCs that will adequately represent most information of the original dataset. Figure 2 presents the percentage of variances explained by each PC computed for the present study. It has been found that the first PC has been able to explain 81.76% of the variations associated with the inputs. Hence, only the first PC is considered for the weather generator.
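How the share of variance explained by the first PC can be obtained is sketched below. This is an illustrative, dependency-free version that finds the leading eigenvalue of the covariance matrix by power iteration; the function names are hypothetical and the study's own PCA implementation is not described in the text.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix (p x p) of n observations with p variables."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenvalue(cov, iterations=200):
    """Largest eigenvalue of a symmetric positive semi-definite matrix."""
    p = len(cov)
    vec = [1.0 + j for j in range(p)]  # arbitrary non-uniform start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
        eigenvalue = norm  # converges to the leading eigenvalue
    return eigenvalue

def explained_by_first_pc(rows):
    """Fraction of total variance carried by the first principal component."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return leading_eigenvalue(cov) / total
```

A value close to one, such as the 81.76% reported here, indicates that a single component retains most of the information in the inputs.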
The WG-PCA algorithm with p variables and q stations works through the following steps:
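(1) Regional means of p variables for all q stations are calculated for each day t of the observed data:
$$\bar{X}_t = \left[\bar{x}_{1,t}, \bar{x}_{2,t}, \ldots, \bar{x}_{p,t}\right], \qquad \bar{x}_{i,t} = \frac{1}{q}\sum_{j=1}^{q} x_{i,t}^{j}$$
where $x_{i,t}^{j}$ is the value of variable i at station j on day t.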
(2) Potential neighbours, L days long, where L = (w + 1) × N − 1, are selected for each of the p individual variables, with N years of historical record and a temporal window of size w that can be set by the user of the weather generator. The days within the given window are all potential neighbours of the feature vector. The datum that corresponds to the current day is deleted from the potential neighbours so that the value of the current day is not repeated (Eum et al., 2009).
(3) Regional means of the potential neighbours are calculated across all q stations for each day.
(4) A covariance matrix, Ct, is computed for day t from the L × p block of potential neighbours.
(5) The first time step value is randomly selected for each of the p variables from all current day values in the historical record.
(6) Next, using the variance explained by the first PC, the Mahalanobis distance is calculated using Equation 3:
$$d_k = \sqrt{\frac{\left(PC_t - PC_k\right)^2}{\mathrm{Var}(PC)}}, \qquad k = 1, \ldots, L \qquad (3)$$
where PCt is the value of the current day and PCk is the value of the kth neighbour, both transformed by the eigenvector. The variance of the first PC is Var(PC) for all K-NN (Eum et al., 2009).
(7) The number of nearest neighbours, K, to be retained out of the L potential values is selected using $K = \sqrt{L}$.
(8) The Mahalanobis distances dk are arranged in ascending order, and the first K neighbours in the sorted list are selected (the K-NN). A discrete probability distribution that gives the highest weight to the closest neighbours is used to resample from the set of K neighbours. Using Equations 4 and 5, the weights, w, are calculated for each neighbour k:
$$w_k = \frac{1/k}{\sum_{i=1}^{K} 1/i}, \qquad k = 1, \ldots, K \qquad (4)$$
Cumulative probabilities, pj, are given by
$$p_j = \sum_{i=1}^{j} w_i \qquad (5)$$
(9) A random number u, uniformly distributed on (0, 1), is generated and compared with the cumulative probabilities calculated above in order to select the current day's nearest neighbour. If p1 < u < pK, then the day j for which u is closest to pj is selected. However, if p1 > u, then the day that corresponds to d1 is chosen. If u = pK, then the day that corresponds to dK is selected. Upon selecting the nearest neighbour, the K-NN algorithm chooses the weather of the selected day for all stations in order to preserve spatial correlation in the data (Eum et al., 2009).
(10) In order to generate values outside the observed range, perturbation is used. A conditional standard deviation σ for the K-NN is estimated. For choosing the optimal bandwidth λ of a Gaussian distribution function that minimizes the asymptotic mean-integrated square error (AMISE), Sharma et al. (1997) reduced Silverman's (Silverman, 1986; pp 86–87) equation of optimal bandwidth into the following form for a univariate case:
$$\lambda = 1.06\,\sigma K^{-1/5}$$
Using the value $x_{i,t}^{j}$ of weather variable i at station j obtained in step (9) as the mean, and the variance $(\lambda \sigma_{i}^{j})^2$, a new value $y_{i,t}^{j}$ can be achieved through perturbation (Sharma et al., 1997):
$$y_{i,t}^{j} = x_{i,t}^{j} + \lambda\,\sigma_{i}^{j} z_t$$
where zt is a random variable, distributed normally (zero mean, unit variance) for day t. Negative values are prevented from being produced for precipitation by employing the largest acceptable bandwidth:
$$\lambda_a = \frac{x_{*,t}^{j}}{1.55\,\sigma_{*}^{j}}$$
where * refers to precipitation. If again a negative value is returned, a new value for zt is generated (Sharif and Burn, 2006).
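Steps (6) to (10) can be sketched for a single time step as follows. This is a simplified illustration, not the authors' code: it assumes K = √L, rank weights proportional to 1/k, an inverse-CDF draw in place of the selection rule of step (9), a variance computed over all candidates, and a cut-off of 1.55 for the largest acceptable bandwidth. All function names are hypothetical, and the bandwidth is passed in by the caller.

```python
import math
import random

Z_LIMIT = 1.55  # assumed cut-off used to cap the bandwidth for precipitation

def rank_weights(k):
    """Weights proportional to 1/rank, normalised to sum to one."""
    harmonic = sum(1.0 / j for j in range(1, k + 1))
    return [(1.0 / j) / harmonic for j in range(1, k + 1)]

def select_neighbour(pc_today, pc_candidates, rng):
    """Return (chosen index, indices of the K nearest candidates).

    Needs at least two candidates that are not all identical.
    """
    n = len(pc_candidates)
    mean = sum(pc_candidates) / n
    var = sum((x - mean) ** 2 for x in pc_candidates) / (n - 1)
    # One-dimensional Mahalanobis distance on the first PC.
    dist = [math.sqrt((pc_today - x) ** 2 / var) for x in pc_candidates]
    k = max(1, round(math.sqrt(n)))  # assumed heuristic K = sqrt(L)
    nearest = sorted(range(n), key=dist.__getitem__)[:k]
    # Inverse-CDF draw over the rank weights (simplified selection rule).
    u, cumulative = rng.random(), 0.0
    for idx, weight in zip(nearest, rank_weights(k)):
        cumulative += weight
        if u <= cumulative:
            return idx, nearest
    return nearest[-1], nearest

def perturb(value, sigma, bandwidth, rng, non_negative=False, max_tries=100):
    """Add Gaussian noise with standard deviation bandwidth * sigma."""
    if sigma <= 0.0:
        return value
    if non_negative:
        # Cap the bandwidth so that only z < -Z_LIMIT can give a negative value.
        bandwidth = min(bandwidth, value / (Z_LIMIT * sigma))
    for _ in range(max_tries):
        candidate = value + bandwidth * sigma * rng.gauss(0.0, 1.0)
        if not non_negative or candidate >= 0.0:
            return candidate
    return value  # fall back to the unperturbed value
```

With `non_negative=True`, a dry day (value 0) has its bandwidth capped at zero and therefore stays dry, while wet days can be pushed above the largest observed amount.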
7. Results and discussion
This study uses daily precipitation, Tmax and Tmin of 15 stations for the period of 1979–2005 (N = 27) to simulate plausible meteorological scenarios. Employing a temporal window of 14 days (w = 14) and 27 years of historical data (N = 27), this study uses 404 days as potential neighbours (L = (w + 1) × N − 1 = 404) for each variable. For the purpose of comparing performances of AOGCMs, weather generator results for the London station have been chosen. The following sections present the analysis of the obtained results in reproducing present and future climates.
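The count of potential neighbours quoted above can be checked with a short sketch. The helper `window_days` additionally assumes that the window is centred on the current day, wraps around the year boundary and ignores leap years; these details are illustrative and are not specified in the text.

```python
def potential_neighbour_count(window, years):
    """L = (w + 1) * N - 1: all days in the window over N years, minus the current day."""
    return (window + 1) * years - 1

def window_days(day_of_year, window, days_in_year=365):
    """Days of year inside a window of w + 1 days centred on day_of_year."""
    half = window // 2
    return [((day_of_year - 1 + offset) % days_in_year) + 1
            for offset in range(-half, half + 1)]

print(potential_neighbour_count(14, 27))  # 404
```

For w = 14 each year contributes 15 candidate days, so 27 years give 405 days, of which the current day itself is excluded.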
7.1. Performance evaluation
The performance of the weather generator in generating the current climate in the UTR basin is first evaluated to check whether the generated data preserve the statistical attributes of the observed data while the data points are perturbed. A new subset of data is obtained by calling a random integer function N times to generate N years; the function returns integers between the lower and upper bounds, so each year of data has an equal probability of being selected. Box plots have been used to illustrate the results, as they summarize the spread of the dataset's statistics. The top and bottom lines of the box represent the 75th and the 25th percentiles, respectively. The middle line in the box represents the median. The whiskers extend out to 1.5 times the interquartile range of the data (the range of the data between the 25th and 75th percentiles). Values that go beyond those points have been identified as outliers and marked in black.
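The box plot quantities described above can be computed as in the following sketch. The percentile rule (linear interpolation between order statistics) and the convention that whiskers end at the most extreme data point inside the 1.5 × IQR fences are assumptions; plotting packages differ in these details.

```python
def percentile(sorted_values, q):
    """Percentile by linear interpolation between order statistics (0 <= q <= 1)."""
    pos = (len(sorted_values) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def box_stats(values):
    """Quartiles, whisker ends and outliers for one box of a box plot."""
    data = sorted(values)
    q1, median, q3 = (percentile(data, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

Applied to the simulated monthly totals of one calendar month, the `outliers` entry lists the values drawn as individual points beyond the whiskers.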
Total monthly precipitation values for the simulated data are presented as box plots in Figure 3. The historical mean values are represented by the line plot. For all the months, the median of the simulated data remained close to the mean of the observed data. There is a slight underestimation in the total precipitation for the months of August, September, October, November and December. For the rest of the months, the mean total precipitation of the observed data is very close to the median of the simulated data. There are a number of outliers beyond the whiskers, marked as black dots; however, these are representative of the increased variability due to the perturbation process in the weather generator. Precipitation has the greatest variability of all the weather variables, so overall the performance of the WG-PCA algorithm in this aspect is very good.
In flood management models, wet day statistics are also important for checking the precipitation sequence. The number of wet days for the simulated data in London is illustrated as monthly box plots in Figure 4. It is clear that the observed values (shown as a line plot) agree very well with the simulated data. There is a slight overestimation by the simulated data for the month of February and an underestimation for January, but the results are otherwise very satisfactory.
In parametric models, the correlation structure of the historical data is often not well reproduced. In the K-NN model, however, it is important to keep the correlation structure of the observed data intact in the simulated data. To keep the inherent correlation structure unaltered, a constant value of the random normal variate is used for all the variables and all stations at any given time step (Sharif and Burn, 2007). This section investigates the extent to which the correlation structure changed from the observations. Figure 5 presents box plots of the monthly correlations between precipitation and maximum temperature. Observed data have shown positive correlation during winter months; correlations during summer months are very close to zero, which indicates a statistically insignificant correlation for these months.
7.2. Generation of climate change scenarios
A major focus of this study is to evaluate the performance of WG-PCA in simulating future precipitation amounts, which may be larger than the observations. This section presents the performance of WG-PCA in simulating 54 years of future precipitation using the information from the AOGCMs with plausible scenarios.
First, bar graphs are used to illustrate the change factors for the various AOGCMs and emission scenarios. The change factors for precipitation are illustrated as percentage changes in Figure 6. While one scenario has predicted an increase, another may be associated with a decrease in precipitation for any specific month. This is more evident during the summer months and early winter. For example, during November, the difference between the CGCM3T63 scenario A2 and the CSIROMK3.5 scenario A1B is 68.9%. Despite wide variations in predictions, one interesting observation is that all scenarios have predicted an increase in winter precipitation from December through April. The wide range of different future climate projections thus clearly suggests that the obtained results should be interpreted as plausible scenarios rather than as predictions of future climate conditions.
Box plots of the total monthly mean precipitation for all scenarios are presented in Figures 7 through 12. Figure 7 shows the box plots for the model CGCM3T47 scenarios A1B, A2 and B1, respectively. It is clear from the number of outliers that the WG-PCA has been able to produce a dataset adequately. For summer months, especially from May through September, the A1B and B1 scenarios have predicted a decrease in precipitation, while A2 has predicted less precipitation for June and August. In the A2 scenario, November is the only month for which the observed precipitation falls below the 25th percentile value of the generated precipitation.
The monthly precipitation box plots for the CGCM3T63 model are shown in Figure 8. Overall, the months of November and September have the greatest range of monthly precipitation totals among all the emission scenarios. The medians for November were the highest in all scenarios. The interquartile ranges (the boxes) of SRES scenario A1B were much larger than those of A2 and B1. Unlike the CGCM3T47 model, the A1B and B1 scenarios of CGCM3T63 have projected a decrease in precipitation in only two summer months: June and July by A1B, and May and June by B1. However, CGCM3T63 A2 has projected wider variability: a decrease in precipitation during most of the summer and an increase in precipitation during winter. Overall, the performance of the model in producing the changes in monthly precipitation totals has proved satisfactory except for November.
The precipitation produced by CSIROMK3.5 is somewhat different from that of the CGCM models (Figure 9). For all three scenarios, the summer months are projected with an increase in precipitation except for September through November for A1B, June for B1 and August, October and November for A2. A similar trend has been seen for the GISS-AOM A1B and B1 scenario simulations (Figure 10). The SRES scenario A1B predicts an increase in early springtime precipitation and an extended period of summer precipitation, with greater variability than scenario B1.
The high-resolution MIROC3.2 model has predicted results different from those seen in the previous models. While A1B predicted a slight increase in summertime precipitation, the remaining scenarios, including the ones from the medium-resolution model, have predicted a decrease in precipitation for most months (Figures 11 and 12).
As a whole, all the models except the MIROC3.2 high- and medium-resolution models have predicted an increase in wintertime precipitation. The models have shown greater variability in predicting summertime precipitation: in general, the three scenarios of CGCM3T47, CSIROMK3.5 and the B1 scenario of the CGCM3T63 model have predicted an increase in summertime precipitation; the two scenarios of GISS-AOM and the A1B scenario of the MIROC3.2 high-resolution model have shown no change in precipitation when compared with the historically observed precipitation. The remaining models, such as the B1 and A2 scenarios of CGCM3T63 and the three scenarios of the MIROC3.2 medium-resolution model, have predicted a decrease in summertime precipitation. Interestingly, the medium-resolution model has generally predicted a year-round decrease in precipitation, which is in contrast to what has been seen from the rest of the model results. This clearly indicates that a careful and thorough investigation is needed when applying the medium-resolution MIROC model to the study area.
7.3. Simulation of extreme events
A moderate change in the precipitation under a new climate has the capacity to shift the timing and volume of runoff pattern in Canada: reductions in spring and summer runoff, increase in winter runoff and earlier peak runoff (Sharif and Burn, 2007). It is therefore important to assess the changes in extreme precipitation amounts. In this study, three precipitation indices (Table II), proposed by Vincent and Mekis (2006) have been used for comparing the performances of the AOGCMs in generating extreme precipitation amounts.
Table II. List of extreme precipitation indices

| Index | Definition |
|---|---|
| Heavy precipitation days (≥10 mm) | Number of days with precipitation ≥ 10 mm |
| Very wet days (≥95th percentile) | Number of days with precipitation ≥ 95th percentile |
| Highest 5 day precipitation amount | Maximum precipitation sum for 5 day interval |
These indices describe precipitation frequency, intensity and extremes. The highest 5 day precipitation, very wet days and the heavy precipitation days express extreme features of precipitation. For very wet days, the 95th percentile reference value (18.3 mm) has been obtained from all non-zero total precipitation events for 1979–2005. It is better to use indices based on percentile values rather than a fixed threshold in Canada due to large variations of precipitation intensities in various regions.
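The three indices can be computed from a daily precipitation series as in the following sketch. The function names are hypothetical, and the percentile rule (linear interpolation over non-zero amounts) is an assumption; the text does not state which percentile definition was used to obtain the 18.3 mm reference value.

```python
def heavy_precip_days(daily, threshold=10.0):
    """Number of days with precipitation >= threshold (mm)."""
    return sum(1 for p in daily if p >= threshold)

def wet_day_percentile(daily, q=0.95):
    """Percentile of non-zero daily amounts, by linear interpolation."""
    wet = sorted(p for p in daily if p > 0.0)
    pos = (len(wet) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(wet) - 1)
    return wet[lo] + (wet[hi] - wet[lo]) * (pos - lo)

def very_wet_days(daily, reference):
    """Number of days with precipitation >= the reference percentile value."""
    return sum(1 for p in daily if p >= reference)

def highest_5day_total(daily, window=5):
    """Maximum precipitation sum over any run of `window` consecutive days."""
    return max((sum(daily[i:i + window])
                for i in range(len(daily) - window + 1)), default=0.0)
```

In practice the reference value would be computed once from the 1979–2005 observations and then applied to each simulated year, so that counts from different scenarios share the same threshold.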
7.4. Uncertainty estimation
An analysis of the uncertainties is made for a quantitative evaluation of the reliability of statistically downscaled climate data representing the local climate conditions in the UTR basin. The analysis focuses on the monthly and seasonal mean values and variability of precipitation over different climate regimes.
Figure 13 presents probability plots of the heavy precipitation days generated by the AOGCMs with 95% confidence intervals (upper and lower bounds in each set), fitted with the Weibull distribution using maximum likelihood estimates. The parameter estimates are displayed with the Anderson–Darling (AD) goodness-of-fit statistic and the associated p value. The AD statistic measures how well the distributions from the several AOGCMs follow the historical observations. A low value of p (usually < 0.05) indicates that the data do not follow the specified distribution. When several distributions are compared using AD, the smallest AD statistic indicates the closest fit to the data. One common feature of all AOGCMs is that they are positively (rightward) skewed, indicating more data points in the right tail than expected. This clearly suggests an increase in the number of heavy precipitation days. The higher AD and lower p (< 0.05) values indicate that the CGCM3T47 A1B, CGCM3T63 A1B, MIROC3.2MEDRES A1B and B1 scenarios do not follow the same data distribution as the historical data.
Changes in the precipitation indices compared to the historically observed 1979–2005 values are computed and presented in Table III. The mean change in the heavy precipitation is not very significant over the 54 year period; CGCM3T63 A1B shows an increase of 3 days of heavy precipitation events. Interestingly, a few models have shown a decrease in the occurrences of heavy precipitation days. So the deviation of the occurrence of heavy precipitation days between the AOGCMs is 5.3 days.
Table III. Change in precipitation indices compared to 1979–2005 (columns: number of heavy precipitation days; number of very wet days; 5 day maximum precipitation, mm)
Figure 14 and Table III show a comparison of probability plot of the number of days associated with greater than 95th percentile precipitation as predicted by AOGCMs. CGCM3T47 A2, CSIROMK3.5 A2 and CGCM3T63 A1B scenarios have predicted higher occurrence of very wet days with an increase of 2.5 days. However, scenarios, such as MIROC3.2MEDRES A1B, GISS-AOM B1 and CSIROMK3.5 A1B have predicted a decrease of 2 very wet days. So in this case also, the deviation of the occurrence of very wet days between the AOGCMs is 5 days.
Figure 15 and Table III present frequency plots of the highest 5 day maximum precipitation accumulated over each year. The AOGCMs show wide variation in the extent of higher precipitation amounts. The relative positions of the peaks of CGCM3T47 A1B, GISS-AOM B1, MIROC3.2HIRES A1B, MIROC3.2MEDRES A1B and MIROC3.2MEDRES A2 show that these models have captured the highest 5 day precipitations very well. The shorter and wider fitted distributions of CGCM3T47 A2 and CGCM3T63 B1 show greater variability. However, the highest frequency of the 5 day maximum precipitation ranges between 100 and 120 mm for most of the scenarios, which is higher than the historically observed precipitation (around 60 mm). The change in the precipitation amounts, however, does not show any specific pattern.
This study investigates the potential impact of climate change due to increased precipitation events on the UTR basin using the WG-PCA algorithm. Six AOGCMs have been used along with three SRES emission scenarios. Fifty-four years of synthetic data have been created using the WG-PCA algorithm with downscaled AOGCM data. For the purpose of comparing the performances of the AOGCMs, weather generator results for the London station have been chosen. The weather generator has been able to adequately reproduce a historical dataset that is statistically similar to the observed data. The model has also been able to simulate future plausible scenarios presented by the AOGCMs. For a given scenario, the model has produced precipitation amounts beyond the observed range, thus enabling the generation of higher and lower extreme values, which are more appropriate for assessing the flood and drought conditions in the study area under a changed climate. The generated results have preserved the correlation structure of the observed values, which is important for hydrologic modelling at the watershed scale. The climate change scenario simulations indicate wide variability between the plausible scenarios. These discrepancies are not only due to the incorrect reconstruction of local regimes in the GCMs because of their coarse resolution but also due to the physical parameterization in the climate models, which is related to surface processes over heterogeneous conditions such as the frost/thaw characteristics of the soil, including water content and its solid and/or liquid phase distribution during the year. The comparisons of different AOGCMs reveal that some of the scenarios are significantly different from their counterparts. For example, while most scenarios predicted an increase in precipitation during winter months, GISS-AOM B1 predicted a decrease in precipitation for the months of December and February.
MIROC3.2HIRES B1, MIROC3.2MEDRES A2 and A1B have predicted a decrease in precipitation in all months except January to May, while other scenarios predicted different results. However, the models have generally indicated a decrease in summer precipitation and an increase in wintertime precipitation. The performances of the AOGCMs in predicting extreme precipitation are assessed using precipitation indices. No consistent pattern has been found in the highest 5 day maximum precipitation. The variability between the AOGCMs is even wider in the case of extreme precipitation, which increases the difficulty of detecting a significant pattern. This inconsistency clearly indicates the need for regional studies that explore local characteristics of precipitation extremes and for improving model quality by introducing more input variables relevant to precipitation extremes.
The authors wish to thank the Canadian Foundation for Climate and Atmospheric Sciences for providing financial assistance. The constructive comments by anonymous reviewers for improving the manuscript are also gratefully acknowledged.