The asynchronous regional regression model (ARRM) is a flexible and computationally efficient statistical model that can downscale station-based or gridded daily values of any variable that can be transformed into an approximately symmetric distribution and for which a large-scale predictor exists. This technique was developed to bridge the gap between large-scale outputs from atmosphere–ocean general circulation models (AOGCMs) and the fine-scale output required for local and regional climate impact assessments. ARRM uses piecewise regression to quantify the relationship between observed and modelled quantiles and then downscale future projections. Here, we evaluate the performance of three successive versions of the model in downscaling daily minimum and maximum temperature and precipitation for 20 stations in North America from diverse climate zones. Using cross-validation to maximize the independent comparison period, historical downscaled simulations are evaluated relative to observations in terms of three different quantities: the probability distributions, giving a visual image of the skill of each model; root-mean-square errors; and bias in nine quantiles that represent both means and extremes. Successive versions of the model show improved accuracy in simulating extremes, where AOGCMs are often most biased and which are frequently the focus of impact studies. Overall, the quantile regression-based technique is shown to be efficient, robust, and highly generalizable across multiple variables, regions, and climate model inputs.
Atmosphere–ocean general circulation models (AOGCMs) and the new generation of earth system models provide insights into the dynamic nature of possible climate responses to anthropogenic forcing. With spatial scales typically on the order of one half degree or coarser, however, they are unable to simulate climate at the local to regional scale. To compensate for this relatively coarse resolution, a number of dynamical and statistical techniques have been developed to downscale climate model outputs to the impact-relevant spatial and temporal scales at which observations are made.
Despite the plethora of downscaling methods in the literature (Crane and Hewitson, 1998; Wilby et al., 1998; Huth et al., 2001; Stehlik and Bardossy, 2002; Wood et al., 2004; Haylock et al., 2006; Schmidli et al., 2006; Kostopoulou et al., 2007; Hidalgo et al., 2008; to name just a few out of hundreds), relatively few downscaling methods have been applied to quantify potential impacts of climate change at the local to regional scale for a broad cross-section of regions and sectors across North America. The majority of studies of climate change impacts in the United States, for example, rely on one of five methods: a delta approach whereby a change or ‘delta’ is added to observed mean annual, seasonal, or monthly values in order to get future values (Hay et al., 2000; as used in USGCRP, 2000); simulations from a regional climate model (e.g. Mearns et al., 2009; as used in NARCCAP); the Bias Correction-Statistical Downscaling model originally developed as a front end to the hydrological variable infiltration capacity model, which uses a quantile mapping approach to downscale monthly AOGCM-based temperature and precipitation to a regular grid (Wood et al., 2004; as used in Hayhoe et al., 2004, 2008; Luers et al., 2006; USGCRP, 2009); a constructed analogue approach that matches AOGCM-simulated patterns to historical weather patterns (Hidalgo et al., 2008; as used in Luers et al., 2006); and a linear asynchronous regression approach that downscales daily AOGCM-based temperature and precipitation to individual station locations (Dettinger et al., 2004; as used in Hayhoe et al., 2004, 2008, 2010).
Each of these methods has its own benefits, and each can be sufficient for certain applications. For example, the simple and transparent delta approach can yield a nearly identical downscaled annual or seasonal mean temperature value as a more complex statistical model. At the other end of the spectrum, complex regional climate models are computationally demanding, but provide consistent high-resolution projections for a plethora of surface and upper-air variables. None of these five methods, however, allows for using multiple climate models and scenarios as input while downscaling to any spatial scale (including both station-based and gridded), simulating additional impact-relevant variables (such as solar radiation and humidity), and adequately resolving projected changes in daily climate extremes, at the same time.
For that reason, we have developed a new statistical downscaling model, the asynchronous regional regression model (ARRM). ARRM builds on the same statistical technique used by the last downscaling approach listed above (Dettinger et al., 2004), asynchronous quantile regression, to define a quantitative relationship between any daily observed and simulated surface variable that has a symmetric distribution, with particular emphasis on accurately resolving the relationship at the tails of the distribution in order to capture simulated changes in extremes. Asynchronous quantile regression removes the time stamp from historical observations and simulations, reordering each time series by value before matching quantiles of observed data with those from AOGCM output. This is important because coupled AOGCM simulations generate their own patterns of natural variability, meaning that no day-to-day or even year-to-year correspondence with observations should be expected.
The general concept of quantile regression was originally developed in the field of econometrics by Koenker and Bassett (1978) to estimate conditional quantiles of the response variable as opposed to the conditional mean estimated by the orthodox least-squares regression method. The quantile regression approach is of particular utility to geospatial data, in that it can be used to determine relationships between two quantities that are not measured simultaneously, such as an observed and a model-simulated time series. It takes advantage of the hypothesis that although the two time series may be independent, their distributions may be similar.
The general technique of quantile regression has been used in a variety of applications, including by O'Brien et al. (2001) to determine relationships between measurements of relativistic electron conditions measured from two different satellites passing over the same area at different times. Dettinger et al. (2004) were the first to apply this method to downscaling AOGCM output, to examine simulated hydrologic responses to climate change. In this application, the first time series was observations and the second, historical model simulations. The regression model derived from these two distributions was then applied to transform the distribution of, or downscale, future model simulations.
The objective of this study is to build on the foundation of quantile regression to develop a relatively straightforward, flexible, efficient, and robust statistical model that is capable of downscaling any atmospheric variable, measured on a daily or monthly basis, which has, or can be transformed into, an approximately symmetric distribution. Section 'Model development' describes the statistical basis of the model and refinements that improve its ability to downscale global model outputs. Section 'Data and simulations' describes the long-term weather station observations and the AOGCM outputs used to evaluate the downscaling model in terms of its ability to simulate observed temperature and precipitation, using the same variables from the AOGCMs as predictors. Section 'Model evaluation' describes how the model was developed in multiple steps, each of which is successively tested to ensure that the additions improve the model's ability to reproduce historical climate. Section 'Future projections' discusses the results of applying the downscaling model to end-of-the-century temperatures and precipitation and the changes between downscaled and raw AOGCM output compared with present conditions. Finally, Section 'Conclusions' summarizes the findings of this study.
2. Model development
2.1. Statistical basis
The concept of quantile regression was first introduced by Koenker and Bassett (1978), where quantiles refer to values of a cumulative population (i.e. when the data are sorted by increasing value) that divide the population into equal-sized segments. Quantiles are the data values marking the boundaries between consecutive subsets. If the data are divided into q equal-sized subsets, the kth quantile for a variable is the value x such that the probability that the variable will be less than x is no greater than k/q and the probability that the variable will be more than x is no greater than (q − k)/q. A distribution has q − 1 quantiles, one for each integer k satisfying 0 < k < q.
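As a minimal illustration of this definition, the sketch below uses the `statistics.quantiles` function from the Python standard library (Python is used here for illustration only; ARRM itself was built in R). The sample values are arbitrary and are not data from this study.

```python
import statistics

# Arbitrary illustrative sample, not data from the study.
sample = [3.1, 7.4, 1.2, 9.8, 5.5, 4.0, 8.3, 2.6, 6.7]

q = 4  # divide the sorted data into q equal-sized subsets (quartiles)
cut_points = statistics.quantiles(sample, n=q)

# A distribution divided into q subsets has q - 1 quantiles.
print(len(cut_points), cut_points)
```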
In general, regression analysis quantifies covariance between variables, and, if it exists, provides a model to predict one variable on the basis of the other variables used as input to the regression. Quantile regression specifically estimates conditional quantile functions – models in which quantiles of the distribution of the response variable are expressed as functions of observed covariates (Koenker and Hallock, 2001). In other words, quantile regression results in estimates approximating the quantiles of the response variable. For a time series containing N values there are N ranks in each vector. A model can be constructed by regressing the value at rank n_i of the simulated vector onto the value of the same rank of the vector containing observed values, for i = 1…N (as done for example in Dettinger et al., 2004). This regression is asynchronous, i.e. data values that are regressed against each other did not necessarily occur on the same calendar day, but rather correspond by quantile or rank. The regression model derived from historical AOGCM simulations and historical observations can then be applied to future AOGCM simulations, to project downscaled future conditions.
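The rank-by-rank regression can be sketched in a few lines. The following is a simplified Python illustration (not the R code used for ARRM) that assumes equal-length series and a single linear fit; the function names `fit_line` and `asynchronous_fit` and all numerical values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def asynchronous_fit(simulated, observed):
    """Regress rank-ordered observations on rank-ordered simulations."""
    if len(simulated) != len(observed):
        raise ValueError("series must have the same length")
    # Sorting discards the time stamp: values are paired by rank, not by date.
    return fit_line(sorted(simulated), sorted(observed))


# Illustrative values only, not data from the study.
sim_hist = [12.0, 8.0, 10.0, 14.0, 6.0]
obs_hist = [9.0, 13.0, 5.0, 11.0, 7.0]
a, b = asynchronous_fit(sim_hist, obs_hist)

# Apply the historical relationship to (hypothetical) future simulated values.
sim_future = [16.0, 9.0]
downscaled = [a + b * v for v in sim_future]
```

Paired by calendar day, these two toy series are only weakly related; paired by rank, they fall on a straight line.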
Asynchronous regression is an important component of this model, because a coupled AOGCM simulation is free to evolve chaotically, with only the external forcings being prescribed; hence, each simulation represents one out of many possible outcomes and no daily correspondence between the model and observations should be expected.
2.2. Model input
Both theoretical and practical considerations affect the selection of inputs to quantile regression. First, it is important to verify that the two time series (simulated and observed, or predictor and predictand) have somewhat similar distributions; the closer both distributions are to Gaussian, the simpler the function relating the two distributions. Even non-Gaussian distributions can sometimes be manipulated to mimic a Gaussian distribution; here, in the case of precipitation, by taking the natural logarithm of the daily wet day precipitation values.
To train the downscaling model, the observed record must have an adequate length and quality of data. A minimum of 20 consecutive years of daily observations with less than 5% missing data is usually needed in order to appropriately sample from the range of natural climate variability at most of the station locations examined here and to produce robust results without overfitting. To challenge the downscaling model, two stations were selected for this evaluation that had substantially less data available (Bridgeport, WV with 78% and Moosehead Lake, ME with 88% of daily data missing over 50 years).
2.3. Model structure
The structure of the ARRM model is summarized in Figure 1. The first step is to prepare the data by separating it into 12 vectors by month such that a separate statistical model can be built for each month. This accounts for different weather patterns dominating any given region at different times of the year that could alter AOGCM biases relative to observations. Two weeks of overlapping data on either side of each month are included to account for future conditions that may lie outside the range of a typical historical month. This extension also doubles the use of each data point during the training process. Each month's time series is then reordered by rank to create an asynchronous vector. Figure 2 shows AOGCM-simulated (grid cell containing the weather station) versus observed temperature for chronological and for sorted data, illustrating how ranking of the inputs provides a correlation between observations and model simulations whereas matching by calendar date does not.
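The monthly selection with a two-week margin can be sketched as follows. This is an illustrative Python fragment rather than the ARRM implementation; the function name `in_extended_month` and the choice of exactly 14 days for "two weeks" are assumptions.

```python
import calendar
from datetime import date, timedelta


def in_extended_month(day, month, pad_days=14):
    """True if `day` lies in `month` or within `pad_days` on either side of it."""
    pad = timedelta(days=pad_days)
    # Check neighbouring years so that January can draw on the preceding
    # December and December on the following January.
    for year in (day.year - 1, day.year, day.year + 1):
        last_day = calendar.monthrange(year, month)[1]
        start = date(year, month, 1) - pad
        end = date(year, month, last_day) + pad
        if start <= day <= end:
            return True
    return False


# Days contributing to the January model in this sketch: 18 Dec to 14 Feb.
print(in_extended_month(date(2000, 12, 20), month=1))
```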
The second step in the ARRM model is to fit a regression function to the ranked values shown in Figure 2(b). For most station locations and global models, a linear fit (as used in Dettinger et al., 2004) is adequate within at least the 20th–80th percentiles of the distribution (dark-coloured line in Figure 3) with a high coefficient of determination (R²). However, residuals are often large near the tails of the distribution that, depending on the application, can be of greater interest to climate impact studies than values at the centre of the distribution. Polynomials of increasing order result in increasingly better fits to the historical observations (not shown), but run two serious risks: first, of overfitting, and second, of exhibiting unnatural behaviour at the tails of the distribution that could unrealistically predict lower observed temperatures for higher modelled values than for lower modelled values, and vice versa.
Instead, we found that a piecewise linear regression (light-coloured, segmented line, Figure 3) provided the most consistent fit while accounting for biases in model values near the tails of the distribution; biases that can be markedly different from those simulated for values near the centre of the distribution. Adding breakpoints, or knots, allows for different slopes at different parts of the distribution, in particular minimizing the residuals at the tails of the distribution when compared with either a linear or a polynomial fit.
R (R Development Core Team, 2012), the statistical programming language used to build ARRM, has spline-based functions such as bs and ns that can add breakpoints to a regression. However, these functions require the user to set the number of breakpoints manually and then place the points at predetermined, evenly distributed, quantiles. As illustrated in Figure 3, the ideal number of breakpoints can vary broadly, depending on the characteristics of model bias for a given month and/or location. A new function was therefore required that would optimize the regression model for each month by automatically identifying the number and location of up to six independent breakpoints. This piecewise linear regression function is described next.
The third step in the ARRM model is to use the statistical regression models, constructed from observed and historical simulated time series, to downscale future projections. The resulting downscaled values must subsequently be rearranged back into the original order to retrieve the final product, a continuous chronological time series of the downscaled values.
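The bookkeeping needed to return ranked values to chronological order can be sketched as below. This is a hypothetical Python illustration, not the ARRM code: `transfer` stands in for whatever fitted regression model is used, and the numbers are arbitrary.

```python
def downscale_preserving_order(values, transfer):
    """Rank the values, apply `transfer` in rank order, restore original order."""
    # order[pos] is the chronological index of the value with rank `pos`.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranked_out = [transfer(values[i]) for i in order]
    result = [None] * len(values)
    for pos, i in enumerate(order):
        result[i] = ranked_out[pos]
    return result


# Illustrative transfer function and values only.
future = [3.0, 1.0, 2.0]
chronological = downscale_preserving_order(future, lambda v: 2.0 * v + 1.0)
```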
2.4. Piecewise linear regression function
The piecewise linear regression function developed for ARRM is based on linear regression that iterates over a moving window. For the majority of the distribution, the window width remains fixed at a given percentage of the total number of data points for that particular month. As the concentration of data points near the tails of the distribution is much sparser than at the centre, window width at the tails of the distribution decreases linearly to a minimum width by the ends of the distribution.
This function requires four fixed settings: the percentage of data points in the fixed window width, the minimum and maximum probabilities over which a fixed window width is used, the minimum permissible width of the window at the tails of the distribution, and the maximum number of breakpoints allowed. Optimal values for these settings are a function of AOGCM bias, characteristics of which differ from one variable to another. In general, a fixed window width of 5% of the distribution between probabilities of 0.1 < P < 0.9, linearly decreasing to a minimum width of either 2 °C or ten data points (whichever is greater) for P < 0.1 and P > 0.9, is adequate for temperature as the relationship between observed and modelled values tends to be relatively linear over much of the distribution. For precipitation, greater variability in AOGCM bias over the distribution requires a wider fixed window width, on the order of 10%, between probabilities of 0.15 < P < 0.85, linearly decreasing to a minimum of 5% of the mean value or ten data points (again, whichever is greater) for P < 0.15 and P > 0.85.
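One way to express the window-width rule is sketched below in Python. It is a simplified illustration, not the ARRM function: the width is counted in data points only, so the alternative minimum widths of 2 °C (temperature) or 5% of the mean value (precipitation) are omitted, and the name `window_width` and its parameters are hypothetical.

```python
def window_width(p, n_points, fixed_frac=0.05, p_lo=0.1, p_hi=0.9, min_points=10):
    """Window width (in data points) at cumulative probability p."""
    fixed = max(min_points, round(fixed_frac * n_points))
    if p_lo <= p <= p_hi:
        return fixed
    # t runs from 0 at the edge of the fixed-width region to 1 at the end
    # of the distribution, so the width shrinks linearly towards min_points.
    if p < p_lo:
        t = (p_lo - p) / p_lo
    else:
        t = (p - p_hi) / (1.0 - p_hi)
    return max(min_points, round(fixed - t * (fixed - min_points)))


# Temperature-like settings (5% between 0.1 and 0.9) for 1000 ranked values.
widths = [window_width(p, 1000) for p in (0.0, 0.05, 0.5, 1.0)]
```

Precipitation-like settings from the text would correspond to `fixed_frac=0.10, p_lo=0.15, p_hi=0.85` in this sketch.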
Up to six breakpoints are allowed in each regression model. This number was determined on the basis of two factors: first, visual testing by plotting downscaled projections for the historical period for individual months showed that more breakpoints tended to increase the risk of overfitting, such as introducing shorter segments with negative slopes, particularly for months with sparse data or poor model performance, and second, months with dense data rarely required more than six breakpoints and often far fewer. The function begins the piecewise regression at the 40th percentile, where the data point at the 40th percentile is the largest value in the window and moves up (to the highest quantile of the distribution) from there. In other words, the moving window starts with the X% of data points below the 40th percentile, where X equals 5 for temperature and 10 for precipitation. The selection of the 40th percentile is to ensure that the middle part of the distribution is well covered by a moving window. QR matrix decomposition is used to fit a linear regression to the data in the window. The R² value for each regression is recorded and saved in a vector, and the moving window is shifted up one data point towards the end of the distribution until it reaches the 100th percentile. The first breakpoint is defined as the central point of the window with the lowest R² value of the vector, if the value of R² for that window is less than the value for the entire time series. The R² values on either side of that breakpoint are then blocked for the width of the moving window and a new minimum identified, for a total of up to three breakpoints in the upper half of the distribution.
This process is repeated beginning at the lowest found breakpoint, or if no breakpoints are found, at the 40th percentile moving down to the 0th percentile. This time, the moving window trails above the percentile. This allows an R² value to be assigned to each data point in the monthly vector, from the first to the last. Setting a minimum window width of ten points means that breakpoints are not allowed to fall within the first and last five points of the dataset.
Before the statistical model is finalized, slopes between breakpoints are automatically reviewed. Breakpoints that create a negative slope can cause lower AOGCM values to produce higher downscaled values than higher AOGCM values. Breakpoints that create a slope close to zero (−0.1 < slope < 0.1) can create an unrealistic peak of nearly identical values in the downscaled distribution. Removal of a breakpoint causing a negative or ‘flat’ slope will always have a detrimental effect on the R² value of the regression fit, because the segment having the negative or ‘flat’ slope yielded the best fit, but improves the realism and generalizability of the fit. Sometimes, when AOGCM biases are particularly nonlinear, the removal of negative slopes can have a greater impact on the quality of the fit than the impact of having a few data points with downscaled values that decrease rather than increase for a small interval within the distribution. Hence, the function allows for negative or ‘flat’ slopes under two conditions: if they are not the first or last segment in the regression, and if they span less than five points. If these conditions are not met, the breakpoint below the negative slope is removed unless it is the first segment of the regression, in which case the breakpoint to the right is removed. One breakpoint is removed at a time and the process repeated once the regression and new slopes have been recalculated to determine whether a new segment with a negative slope has been introduced. This process is repeated until all negative or flat slopes have been eliminated.
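The core of the breakpoint search, locating the window with the poorest linear fit, can be sketched as follows. This Python fragment is a deliberately simplified illustration and not the ARRM function: it uses a fixed window width, makes a single pass over the whole distribution rather than starting at the 40th percentile, finds only the first breakpoint, and uses plain least-squares sums rather than QR decomposition. The names `r_squared` and `first_breakpoint` are hypothetical, as is the toy data.

```python
def r_squared(x, y):
    """Coefficient of determination of a least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 1.0  # assumption: degenerate windows are never selected
    return (sxy * sxy) / (sxx * syy)


def first_breakpoint(sim_ranked, obs_ranked, width):
    """Index at the centre of the worst-fitting window, or None.

    A breakpoint is reported only if that window fits worse than a single
    line through the entire ranked series.
    """
    best_index = None
    best_r2 = r_squared(sim_ranked, obs_ranked)
    for start in range(len(sim_ranked) - width + 1):
        stop = start + width
        r2 = r_squared(sim_ranked[start:stop], obs_ranked[start:stop])
        if r2 < best_r2:
            best_index, best_r2 = start + width // 2, r2
    return best_index


# Toy ranked series with a change of slope at the value 10.
sim = [float(i) for i in range(20)]
obs = [v if v <= 10 else 10 + 3 * (v - 10) for v in sim]
knot = first_breakpoint(sim, obs, width=6)
```

In this toy case the worst-fitting window straddles the change of slope, so the reported index coincides with the kink.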
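The acceptance rule for an individual segment can be written compactly. The sketch below is a Python illustration of the two conditions stated above, not the ARRM code; the function name and argument names are hypothetical, and the iterative removal and refitting of breakpoints is not shown.

```python
def slope_is_acceptable(slope, n_points, is_first, is_last,
                        flat_threshold=0.1, max_span=5):
    """Decide whether a segment may be kept as it is.

    Slopes of at least `flat_threshold` are always kept. Negative or 'flat'
    slopes are tolerated only for interior segments spanning fewer than
    `max_span` points.
    """
    if slope >= flat_threshold:
        return True
    interior = not is_first and not is_last
    return interior and n_points < max_span


# A short interior segment with a negative slope is tolerated ...
print(slope_is_acceptable(-0.3, n_points=3, is_first=False, is_last=False))
# ... but the same slope on the first segment triggers breakpoint removal.
print(slope_is_acceptable(-0.3, n_points=3, is_first=True, is_last=False))
```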
Once the breakpoints have been finalized, the regressions are used to build a statistical model that performs piecewise linear regressions, with the use of spline interpolations, between the monthly simulated and observed data ordered by rank. This regression model can then be used to downscale future values, similarly ordered by rank, assuming stationarity in climate system feedback mechanisms.
2.5. Bias correction
As ARRM is a statistical model, there is a risk of introducing unrealistic values especially at the tails of the distribution, where data points are sparse and the slope of the initial and/or final regression can be very sensitive to a single extreme point. In some cases, an observational data point may even be in error. An example is the Global Historical Climatology Network (GHCN) dataset for Hialeah, FL, which had a recorded maximum temperature for 8 November 2003 with a value of −17.8 °C, 25 °C lower than the second lowest maximum temperature recorded for this station, and with temperatures for the previous and following days of 29.4 and 30.0 °C, respectively. This erroneous point noticeably affected the magnitude of predicted cold temperature extremes for this location. Unrealistic values in the original observations are therefore removed by the quality control procedure described in Section 'Data and simulations', prior to their use as input to the downscaling model.
Because of this sensitivity, downscaled extremes (defined as lying below the 5th percentile and/or above the 95th percentile of the distribution) that fall outside a realistic range for each station are corrected separately, by calculating the bias in percent difference between the downscaled value and the minimum or maximum observed value for that location. To avoid large biases that can be caused by small differences between low values, temperature is first converted to Kelvin and an arbitrary large number (here, 250) is added to daily precipitation values. Scaling is done by dividing the downscaled value by 1 + the bias. It is applied to temperature when values fall more than 3% below the lowest or above the highest observed value (in Kelvin), and to precipitation when values fall more than 2% above the highest observed value (with 250 added). The downscaling model in some cases also predicts precipitation values below zero. These are reset to zero.
2.6. Variable-specific refinements
Although the downscaling model is purposely designed to be applicable to any variable with a relatively symmetric distribution, predictors must be preselected for each variable and there are some differences in the initial processing of each predictor that can improve the performance of the model in downscaling.
Selection of predictors for temperature and precipitation downscaling has been the subject of several comparative studies (Huth, 1999; Wilby and Wigley, 2000; Widmann et al., 2003; Jeong et al., 2012). ARRM has been designed to allow for user-selected predictors, if desired. For the purposes of model evaluation and comparison, predictors were chosen to be the same variables as the predictands: 2 m maximum and minimum temperature and 24 h cumulative precipitation. These are the most frequently archived daily output from both CMIP3 and CMIP5 AOGCMs; furthermore, comparison with upper-air predictors for the stations in this study showed no consistent improvement that would affect the performance of the downscaling model. For models that archive convective, total, and/or large-scale precipitation, the downscaling model calculates the RMSE for the historical training period between the observations and separate downscaled values using each of the three predictors. The predictor variable and corresponding regression model for the training period with the lowest RMSE for a particular month is used to downscale future precipitation for that month and station. This refinement significantly improved the method's ability to simulate precipitation over regions that tend to experience more convective-type precipitation, including the subtropics and mid-latitude summer.
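The selection among precipitation predictors by training-period RMSE can be sketched as below. This is an illustrative Python fragment, not the ARRM code; the dictionary keys, function names, and numerical values are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def best_predictor(observed, downscaled_by_predictor):
    """Name of the predictor whose downscaled values have the lowest RMSE."""
    return min(downscaled_by_predictor,
               key=lambda name: rmse(observed, downscaled_by_predictor[name]))


# Illustrative values for one month at one station.
obs = [0.0, 2.0, 5.0, 11.0]
candidates = {
    "total": [0.0, 3.0, 6.0, 9.0],
    "convective": [0.0, 2.0, 5.0, 10.0],
    "large_scale": [1.0, 4.0, 4.0, 7.0],
}
chosen = best_predictor(obs, candidates)
```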
Smoothing AOGCM output has been previously recommended (e.g. Raisanen and Ylhaisi, 2011), and it has been suggested that the smoothing that results from averaging may be one of the reasons why ensemble AOGCM projections typically outperform any individual model simulation (Knutti et al., 2010). Here, temperature fields are smoothed using Empirical Orthogonal Function (EOF) analysis, retaining only the EOFs accounting for 97% of the original variance. Root-mean-square errors (RMSEs) identified 97% as a generally appropriate threshold, with both higher and lower thresholds resulting in higher errors. This step improved model performance, especially for inland stations with higher variance.
Compared to temperature, precipitation tends to display a greater amount of smaller scale variability. This is likely one of the reasons why EOF filtering was found to degrade rather than assist precipitation downscaling. Precipitation is also a combination of a binary (wet/dry) and a continuous non-Gaussian distribution that must be transformed into a more symmetrical distribution before it can be ranked by quantile. Dettinger et al. (2004) used the square root of daily precipitation as a predictor, but we found that taking the natural logarithm of precipitation achieves a more symmetric distribution. To address the binary nature of the data, dry days must be omitted from the regression. However, simulated and observed time series of precipitation rarely contain the same number of precipitable days. To correct for any differences in number of rainy days between observations and the simulated time series, the two time series are ordered by rank, extracting the top number of values in each vector corresponding to the number of rainy days in the shorter non-zero time series (usually observations, because AOGCMs tend to ‘drizzle’ or simulate many more low-precipitation days than observed; e.g. Chen et al., 1996; Sun et al., 2006; Perkins et al., 2007). Drizzle is also addressed by setting downscaled precipitation amounts less than trace (typically defined as 0.005 inches or 0.127 mm) to zero.
The fact that the downscaling process can only be applied to precipitable days raises concerns regarding model performance in extremely dry regions. Given the typical variance of precipitation, to have a confidence level of 95% there must be at least 57 samples in the dataset (i.e. at least 57 wet days in each of the 12 monthly time series that span the entire training period). This value was determined by applying a simple sample size calculation for linear and logistic regression following Hsieh et al. (1998). During the dry season in arid regions the sample size can be insufficient, even for 50 years of data including half a month on either side. If the sample is insufficient, the model automatically expands it by including an extra week's data on either side of that month (thus containing 3 weeks each of the prior and subsequent months), repeating the process up to a maximum of eight times until a sample size of at least 57 is reached. If 16 weeks have been added and the sample is still less than 57 but greater than 20, a linear regression is used. If the sample is less than 20 (which, for a training period of 50 years, would mean fewer than 1 day in 2 years with measurable precipitation), all downscaled values are set to zero for that month. This procedure has been tested and produces reasonable downscaling of historical precipitation in regions that are arid or semiarid.
The ARRM model was constructed in three distinct phases to quantify the contribution of specific elements to model performance. All phases build monthly models that incorporate 2 weeks' data on either side of the target month to double the sample size, and all versions prefilter the temperature and precipitation predictors as described above before ranking by value. The difference between the versions is the function used to fit the quantile–quantile relationship between observations and historical simulations. The first version applies a least-squares linear fit (using the function lm in R), similar to that used in the SAR approach of Dettinger et al. (2004). The second version applies the piecewise regression function described above. The third version also uses piecewise regression, but incorporates removal of negative or flat slopes and bias correction near the tails. Removal of negative slopes is not expected to yield significant improvements in model performance, and in some cases it may even degrade initial performance; however, it is necessary to reduce the risk of unrealistic statistical relationships between modelled and observed values. The purpose of the comparison is not primarily to demonstrate the superiority of the final model, but rather to ensure that model performance is not overly degraded by this step.
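The wet-day matching and drizzle handling can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the ARRM code: a wet day is taken to be any day with precipitation above zero, and the function names and values are hypothetical.

```python
import math

TRACE_MM = 0.127  # trace threshold quoted in the text (0.005 inches)


def matched_log_wet_days(observed_mm, simulated_mm):
    """Return rank-ordered log precipitation for an equal number of wet days."""
    obs_wet = sorted(v for v in observed_mm if v > 0)   # assumption: wet = > 0
    sim_wet = sorted(v for v in simulated_mm if v > 0)
    n = min(len(obs_wet), len(sim_wet))
    if n == 0:
        return [], []
    # Keep the n largest values of each series so both vectors have n ranks.
    obs_log = [math.log(v) for v in obs_wet[-n:]]
    sim_log = [math.log(v) for v in sim_wet[-n:]]
    return obs_log, sim_log


def remove_drizzle(downscaled_mm):
    """Set downscaled amounts below trace (including negatives) to zero."""
    return [0.0 if v < TRACE_MM else v for v in downscaled_mm]


# Illustrative values: the simulated series 'drizzles' on more days.
obs_log, sim_log = matched_log_wet_days([0, 0, 5, 0, 20], [0.1, 0.2, 3, 0, 15])
cleaned = remove_drizzle([0.05, 0.127, 2.0, -0.3])
```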
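The decision logic for dry months can be sketched as below. This is a hypothetical Python illustration, not the ARRM code: `count_wet_days` stands in for a routine that returns the number of wet days when a given number of weeks is included on either side of the month, and a sample of exactly 20 is treated here as sufficient for the linear fit, which the text does not specify.

```python
def choose_precip_model(count_wet_days, required=57, minimum=20,
                        base_weeks=2, max_expansions=8):
    """Return (model_type, weeks_each_side) for one month.

    The margin starts at `base_weeks` on either side of the month and grows
    by one week per side, at most `max_expansions` times, until at least
    `required` wet days are available.
    """
    for extra in range(max_expansions + 1):
        weeks = base_weeks + extra
        n_wet = count_wet_days(weeks)
        if n_wet >= required:
            return "piecewise", weeks
    # Still short of `required` after the final expansion.
    if n_wet >= minimum:
        return "linear", weeks
    return "all_zero", weeks


# Toy station where each extra week per side contributes ten wet days.
decision = choose_precip_model(lambda weeks: 10 * weeks)
```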
The three different versions will be referred to as linear, simple piecewise, and full piecewise downscaling models, respectively. The ability of these three versions to downscale daily temperature and precipitation for 20 long-term stations in North America was evaluated using the data and model simulations described next.
3. Data and simulations
Downscaling was conducted and tested using observed daily minimum and maximum temperature and 24-h cumulative precipitation amounts for 20 long-term North American weather stations for the period 1960–2009. Seventeen of the stations are distributed across diverse climatic regions in the continental United States including coastal, central, and mountainous regions; two stations are located in Canada; and one in Mexico (Figure 4). Records were obtained from the GHCN (Peterson and Vose, 1997).
Although GHCN station data have already undergone a standardized quality control (Durre et al., 2008), before using the station data for downscaling they were filtered using a quality control algorithm to identify and remove (replacing with ‘NA’) erroneous values in the GHCN database. This additional quality control step included three tests for errors, removing data on any days where the daily reported minimum temperature exceeds the reported maximum, any temperature values above (below) the highest (lowest) recorded values for North America (−50 to 70 °C) or with precipitation below zero or above the highest recorded value for the continental United States (915 mm in 24 h), and repeated values of more than five consecutive days with identical temperature or non-zero precipitation values to the first decimal. Additionally, an erroneous value was found for Hialeah, FL, of −17.8 °C on 8 November 2003 (with temperatures of 29.4 °C the previous day and 30.0 °C the following day), which was removed.
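The three automated tests can be sketched as follows. This Python fragment is an illustration, not the quality control code used in the study: missing values are represented by `None` rather than 'NA', both temperatures are discarded when the minimum exceeds the maximum (an assumption), and the function names are hypothetical.

```python
T_RANGE_C = (-50.0, 70.0)   # temperature bounds quoted in the text
MAX_PRECIP_MM = 915.0       # 24-h precipitation bound quoted in the text


def range_checks(tmin, tmax, precip):
    """Replace values failing the range and consistency tests with None."""
    lo, hi = T_RANGE_C
    if tmin is not None and not lo <= tmin <= hi:
        tmin = None
    if tmax is not None and not lo <= tmax <= hi:
        tmax = None
    if tmin is not None and tmax is not None and tmin > tmax:
        tmin = tmax = None  # assumption: both values are discarded
    if precip is not None and not 0.0 <= precip <= MAX_PRECIP_MM:
        precip = None
    return tmin, tmax, precip


def flag_repeats(values, max_run=5, ignore_zero=False):
    """Blank runs of more than `max_run` identical values (to one decimal)."""
    out = list(values)
    start = 0
    while start < len(values):
        end = start
        if values[start] is not None:
            key = round(values[start], 1)
            while (end + 1 < len(values) and values[end + 1] is not None
                   and round(values[end + 1], 1) == key):
                end += 1
            too_long = (end - start + 1) > max_run
            # For precipitation, runs of dry days (zeros) are legitimate.
            if too_long and not (ignore_zero and key == 0.0):
                for i in range(start, end + 1):
                    out[i] = None
        start = end + 1
    return out
```

A value such as the Hialeah maximum temperature passes these automated tests, which is why it is described separately.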
3.2. Atmosphere–ocean general circulation models
Model output from four AOGCMs was used to evaluate the downscaling model. The models chosen for this study are all part of the Coupled Model Intercomparison Project version 3 (Meehl et al., 2007): the National Center for Atmospheric Research Community Climate System Model version 3 (CCSM3; Collins et al., 2006), the National Oceanic and Atmospheric Administration/Geophysical Fluid Dynamics Laboratory Climate Model version 2.1 (GFDL CM2.1; Delworth et al., 2006), the United Kingdom Met Office Climate Model version 3 (HadCM3; Pope et al., 2000), and the Department Of Energy/National Center for Atmospheric Research Parallel Climate Model (PCM; Washington et al., 2000). Previous studies (e.g. Gleckler et al., 2008; Stoner et al., 2009; Rusticucci et al., 2010) show that these models are able to represent key features of atmospheric variability including teleconnection patterns, extreme temperature and precipitation, as well as other climate metrics. A description of the models is provided in Table 1.
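Table 1. Summary of key characteristics of the AOGCMs used: acronym, host institution, and atmospheric and oceanic resolution
Model | Host institution | Atmospheric resolution | Oceanic resolution
CCSM3 | National Center for Atmospheric Research | 1.4° × 1.4° | 1.125° × 0.43°
GFDL CM2.1 | National Oceanic and Atmospheric Administration/Geophysical Fluid Dynamics Laboratory | 2.0° × 2.5° | 0.9° × 1.0°
HadCM3 | UK Met Office (UK) | 2.5° × 3.75° | 1.25° × 1.25°
PCM | Department of Energy/National Center for Atmospheric Research | 2.81° × 2.81° | 1.0° × 1.0°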
Historical AOGCM simulations correspond to the CMIP ‘20th Century Climate in Coupled Models’ or 20C3M total forcing scenarios. These scenarios include forcing from anthropogenic emissions of greenhouse gases, aerosols, and reactive species; changes in solar output; particulate emissions from volcanic eruptions; changes in tropospheric and stratospheric ozone; and other influences required to provide a complete picture of the climate over the last century. Where multiple simulations were available, the first was used here (run 1).
To represent a broad range of alternative climate futures, simulations corresponding to the IPCC Special Report on Emission Scenarios (SRES) higher (A1fi) and lower (B1) emission scenarios were used (Nakićenović et al., 2000). These scenarios describe internally consistent pathways of future societal development and corresponding emissions, with atmospheric CO2 concentrations reaching approximately 550 ppm (B1) and 990 ppm (A1fi) by 2100.
The 20C3M simulations used here cover only the period 1960–1999. To obtain a longer historical record, we extended this period by 10 years using 2000–2009 simulated output from the SRES A2 scenario. We consider this a reasonable approach because the inertia of the climate system delays its response to forcing from increased greenhouse gases and the other factors that distinguish the scenarios, so there is little difference between the scenarios over the first decade of the century (Stott and Kettleborough, 2002).
4. Model evaluation
4.1. Creation of independent simulated historical time series
To evaluate the three versions of ARRM (linear, simple piecewise, and full piecewise), 50 years' worth of data and historical total forcing simulations from 1960 to 2009 were used to build downscaling models for daily temperature and precipitation for 20 long-term weather stations across North America. N-fold cross-validation, or jackknifing, was used whereby the downscaling model was trained on all but one of the years, then used to predict values for the remaining year. ARRM builds a separate model for each of the 12 months of the year, so this process was repeated until 600 independent 1-month simulated daily time series had been generated for each location, independent of the observations used to train the statistical model. These were then combined into a single 50-year time series for evaluation.
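The leave-one-year-out procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the ARRM implementation: the function names are hypothetical, the series are assumed to contain no missing values, and a straight-line fit between simulated and observed quantiles stands in for the actual downscaling model.

```python
import numpy as np


def fit_quantile_line(obs, sim, probs=np.linspace(0.01, 0.99, 99)):
    """Stand-in model: straight-line fit of observed on simulated quantiles."""
    slope, intercept = np.polyfit(np.quantile(sim, probs),
                                  np.quantile(obs, probs), 1)
    return slope, intercept


def leave_one_year_out(years, months, obs, sim):
    """Predict each calendar month of each year from a model trained on all other years."""
    years = np.asarray(years)
    months = np.asarray(months)
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)

    pred = np.full(sim.shape, np.nan)
    n_models = 0
    for month in range(1, 13):              # one model per calendar month
        in_month = months == month
        for year in np.unique(years):       # hold out one year at a time
            held_out = in_month & (years == year)
            train = in_month & (years != year)
            if not held_out.any() or not train.any():
                continue
            slope, intercept = fit_quantile_line(obs[train], sim[train])
            pred[held_out] = slope * sim[held_out] + intercept
            n_models += 1
    return pred, n_models
```

With 50 years and 12 calendar months, the loop trains 600 models, each of which predicts one month that was excluded from its own training data; the predictions are returned in their original order as a single series.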
Use of cross-validation in creating the historical simulated time series to be evaluated against observations is a crucial aspect of the evaluation. If the statistical model had been trained on all 50 years and then used to predict those same 50 years, comparing the resulting time series with observations would simply reflect the ability of the regression function to fit the data. The results of such an evaluation would be improved by overfitting, for example, by allowing the piecewise regression function to fit an infinite number of knot points to the quantile–quantile relationship. In contrast, by generating an independent time series, the evaluation instead reflects the ability of the model to recreate observations that were not used to train the model. The results of such an evaluation are degraded by overfitting, which makes the model less generalizable. The split-sample approach, whereby observational data are divided into a training and an evaluation period, is commonly used to evaluate statistical downscaling methods in the literature. The ability of the statistical model to reproduce observed natural variability at a given location, however, depends on the degree to which it is able to sample from that variability in both training and evaluation. The split-sample approach limits the sample size of both the training and evaluation periods (typically, N/2 years each), whereas the jackknifed cross-validation approach used here, with a training period of N − 1 years and an evaluation period of N years, more closely approximates the skill of the full dependent sample model that will be used to downscale future projections. As the purpose of downscaling is to ‘recreate’ future conditions that cannot be used to train the model, we argue that the type of evaluation done here is more relevant to assessing the performance of a downscaling model. This is somewhat similar to a bootstrapping approach (Li et al., 2010).
4.2. Evaluating temperature downscaling
The overall skill of the downscaling models is assessed in terms of their ability to reproduce the observed annual distribution (through comparing probability distribution functions), the RMSEs compared to observations, and the absolute value of the bias in the 0.1th, 1st, 10th, 25th, 50th, 75th, 90th, 99th, and 99.9th quantiles. Model projections are also compared (although not evaluated) for end-of-century under the SRES A1fi (higher) and B1 (lower) emissions scenarios.
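The two numerical measures can be sketched in a few lines of Python. The function names are hypothetical, and we assume that the RMSE is taken between matching quantiles of the observed and simulated distributions rather than between day-to-day pairs; the exact calculation is not spelled out here.

```python
import numpy as np

# The nine quantiles named in the text, in percent.
QUANTILES_PCT = (0.1, 1, 10, 25, 50, 75, 90, 99, 99.9)


def distribution_rmse(obs, sim, n_probs=1000):
    """RMSE between simulated and observed quantiles at evenly spaced probabilities."""
    probs = (np.arange(n_probs) + 0.5) / n_probs
    diff = np.nanquantile(sim, probs) - np.nanquantile(obs, probs)
    return float(np.sqrt(np.mean(diff ** 2)))


def quantile_abs_bias(obs, sim, quantiles_pct=QUANTILES_PCT):
    """Absolute bias of the simulated series in each selected quantile."""
    probs = np.asarray(quantiles_pct, dtype=float) / 100.0
    return np.abs(np.nanquantile(sim, probs) - np.nanquantile(obs, probs))
```

Both functions use `np.nanquantile`, so days marked as missing in the observations are ignored rather than propagated.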
To gain a qualitative perspective on the downscaling, we first compare observed, AOGCM-simulated (nearest grid cell), downscaled (training period), and downscaled (independent evaluation period) maximum and minimum temperature distributions for the coastal location of Half Moon Bay, CA (Figures 5 and 6), for which the simulated and observed temperature distributions differ noticeably. The three rows correspond to the three versions of the downscaling model (linear, simple piecewise, and full piecewise). The three columns show AOGCM predictions, predictions from training the downscaling model on all 50 years of data, and the independent cross-validation predictions, derived by the method described above. Identical figures for the remaining 19 stations and other graphics not included in this publication are available online (http://temagami.ttu.edu/arrm/).
For this location, all AOGCMs simulate a wide distribution for maximum temperature with two peaks near 10 and 28 °C (Figure 5). In contrast, the distribution of observed maximum temperatures is narrow and has only one peak near 17 °C. The HadCM3 distribution is additionally skewed towards lower temperatures. One reason for the large difference between observed and simulated distributions is the land mask in the AOGCMs: coastal grid cells can have a land fraction anywhere from 0 to 100%, and this differs between AOGCMs. Table 2 gives land fraction values for grid cells used to downscale stations near the coast. The grid cell downscaled to Half Moon Bay has only partial land coverage in most models (PCM: 15.2%, CCSM3: 53.8%, and GFDL-CM2.1: 84.2%) and is a complete ocean grid cell in the HadCM3 model. Predictions might be improved by selecting a different AOGCM grid cell; however, the purpose here is not to generate optimal predictions but rather to test the ability of the downscaling method to correct AOGCM output. From that perspective, using a near-shore grid cell to simulate coastal conditions represents a greater challenge for the model, and all three versions of the downscaling model are able to approximate observations for these grid cells, narrowing the simulated distribution and removing the double peaks. The linear model is able to capture the general shape of the observed distribution, but underestimates high temperatures towards the tail of the distribution. This is improved upon by the simple piecewise model and almost completely resolved by the full piecewise model. There is little difference between the results for the training (middle column) and independent (last column) predictions, indicating that the downscaling model does not overfit and is successful at simulating observed conditions outside the training period.
Table 2. Fraction of land (in percent) of the AOGCM grid cell used to downscale each station for the four AOGCMs. Values are given only for stations near a coast, as the land fraction of the grid cells used to downscale inland stations was 100% in all cases
Garden City, NY
Half Moon Bay, CA
Moosehead Lake, ME
There are some differences between maximum and minimum temperature (Figure 6). First, AOGCM distributions for minimum temperature more closely resemble the observed distribution, although they are generally skewed towards cooler rather than warmer values. Second, all three downscaling models perform well at the tails of the distribution, but the peak of the distribution is better resolved by the two downscaling models that apply the piecewise regression technique.
Figure 7 compares the RMSE in maximum temperature across the entire distribution for all 20 stations. Applying any of the three downscaling models greatly reduces RMSEs compared with raw AOGCM outputs, which in most cases are an order of magnitude larger. Moreover, the downscaling process is able to transform a broad range of AOGCM predictions into distributions closely resembling the observed distribution. For all 20 stations, downscaling reduces the RMSE of simulated historical values from 2 to 8 °C down to less than 0.5 °C. Refining the downscaling technique by applying piecewise regression further decreases the residuals. There is little difference between RMSEs of the simple piecewise and full piecewise regression methods, as improvements due to bias removal tend to be offset by removal of negative slopes. Results for minimum temperature (not shown) are similar, except that the RMSE values for AOGCMs tend to be lower, confirming the indication from Figures 5 and 6 that these AOGCMs are generally better at simulating daily minimum than daily maximum temperatures, regardless of location.
The results of this evaluation are summarized by scatter plots of downscaled versus AOGCM RMSE (Figure 8). Applying downscaling reduces the spread of RMSEs noticeably with the linear version of the downscaling model, and even further when piecewise regression is added to the downscaling model, with RMSE values below 0.5 °C for temperature and below 10 mm for precipitation. For both simple piecewise and full piecewise downscaling models, the majority of points are clustered between 0.2 and 0.3 °C for temperature and between 1 and 5 mm for precipitation (the far outlier for precipitation for the simple model is HadCM3 downscaled for Phoenix, AZ, with an RMSE value of 76.2 mm), indicating that this level of bias is most likely the limit to the ability of this particular type of statistical downscaling model, within the range of natural variability represented in the training dataset.
The third measure used to evaluate the downscaling methods is the bias in the 0.1th, 1st, 10th, 25th, 50th, 75th, 90th, 99th, and 99.9th quantiles (Figure 9). Bias in AOGCM output is generally an order of magnitude larger than bias in downscaled output, regardless of downscaling technique. There is no consistent tendency for AOGCM biases to be larger for certain quantiles, but biases in downscaled output tend to be slightly larger for extreme than for median quantiles.
Figure 9 also shows the percentage of missing data in the observations for each station. Even for locations with a very high percentage of missing data (Bridgeport and Moosehead Lake), downscaling is able to improve on AOGCM output, although the resulting biases reflect the uncertainty from the very small sample size of the data used to train the statistical models.
Comparing the reduction in biases in the lowest, middle, and highest quantiles of maximum temperature achieved by downscaling from AOGCM outputs for the cross-validation results shows that using the linear downscaling method noticeably reduces the range in bias relative to AOGCM output for the median quantiles, but not for more extreme quantiles (Figure 10; results for minimum temperature are similar, not shown). Incorporating piecewise regression makes little difference to the 50th quantile when compared with the linear model, but significantly reduces biases in more extreme quantiles. This suggests that the piecewise regression technique's primary improvement for temperature, compared with a linear model, is in downscaling extreme values.
4.3. Evaluating precipitation downscaling
To gain a qualitative perspective on precipitation downscaling, we first compare observed, AOGCM-based, and downscaled distributions of the natural logarithm of precipitation for 1960–2009 for Kentland, IN (Figure 11). The left column of plots shows the tendency of AOGCMs to simulate too much drizzle on the left-hand side of the distribution and to underestimate the magnitude of high-precipitation extremes on the right-hand side of the distribution (e.g. 150 vs 400 mm). The AOGCMs also fail to simulate the double-peaked distribution common to many stations, including Kentland.
The linear version of the downscaling model corrects the lower tail, partly corrects the higher tail (although it introduces some very high-precipitation values), and does not correct for the two peaks in the distribution. Incorporating piecewise regression resolves the peaks but introduces artificially large extreme values that are corrected in the full piecewise method that includes bias correction.
Figure 12 compares RMSE values for all 20 stations between observations, AOGCM output, and downscaled simulations for the evaluation period. For almost all locations, applying the linear downscaling model increases RMSEs relative to AOGCM output. This is most likely because the linear model simulates extreme values that are too high and that carry more weight in the overall RMSE calculations. Piecewise regression corrects the high-end bias and in almost all cases reduces RMSE relative to AOGCM output.
Absolute bias (in percent) in the same nine quantiles as used for temperature (Figure 13) is generally small; for the full model, the bars are barely visible for most stations in all nine quantiles. Plotting real-value quantile biases for the 0.1th, 50th, and 99.9th quantiles (Figure 14) shows again that biases are minimal for the lower and middle quantiles, with larger values for the highest quantile. AOGCM biases in the 99.9th quantile are nearly all negative, i.e. AOGCMs underestimate extreme precipitation accumulation at almost all 20 locations examined here. This is not surprising, given that AOGCM values are averaged over a large area whereas observations are for point sources.
5. Future projections
The purpose of most downscaling models is to generate future projections more representative of individual locations than current AOGCMs can provide with grid cell-sized information. Here, we compare the results of AOGCM simulations with ARRM downscaled future projections using the entire historical period (1960–2009) to train each model.
5.1. Maximum temperature
Figures 15 and 16 show the change in downscaled versus raw AOGCM daily maximum temperature for 2070–2099 relative to the historical period observations (1960–2009) for the three temperature downscaling models (linear, simple piecewise, and full piecewise) and the 0.1th, 50th, and 99.9th quantiles. Under the higher A1fi scenario (Figure 15), the most obvious difference between raw AOGCM and downscaled future changes is that downscaling produces only positive changes (i.e. increases) in all three quantiles illustrated [with the exception of one station (Half Moon Bay, CA) for the linear model and 99.9th quantile], whereas raw AOGCM changes are both positive and negative for these three quantiles, indicating that the raw output projects warming for some locations and cooling for others at the end of the century. For the lower B1 scenario (Figure 16), more stations also show warming at the end of the century after downscaling than in the raw AOGCM results, especially for the middle and upper quantiles. Some cooling is projected for the lowest quantiles, indicating that some stations might see a wider distribution of daily maximum temperature at the end of the century, with more extremes at both ends of the distribution.
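A change of opposite sign in the lowest and highest quantiles is what a widening distribution produces, as a small synthetic example shows. The data, the function name, and the amounts of warming and widening below are invented purely for illustration and are not results from this study.

```python
import numpy as np


def quantile_change(historical, future, quantiles_pct=(0.1, 50.0, 99.9)):
    """Future minus historical value of each selected quantile."""
    probs = np.asarray(quantiles_pct, dtype=float) / 100.0
    return np.nanquantile(future, probs) - np.nanquantile(historical, probs)


rng = np.random.default_rng(42)
# Synthetic "historical" daily maximum temperatures (deg C), 50 years long.
historical = rng.normal(loc=17.0, scale=3.0, size=50 * 365)
# Hypothetical future: the mean warms by 1 deg C and the spread grows by 50%.
future = 18.0 + 1.5 * (historical - 17.0)

low, mid, high = quantile_change(historical, future)
# low is negative (cooling in the coldest days) although mid and high are positive.
```

In this example the median warms by about 1 °C, while the 0.1th quantile cools and the 99.9th quantile warms by considerably more than the median.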
Figure 17 shows the mean AOGCM absolute 2070–2099 daily maximum temperature changes, relative to 1960–2009, in each of the nine quantiles for the A1fi and B1 scenarios for raw (light-coloured bars) and downscaled output (dark-coloured bars). The 20 stations generally agree that a greater change in the 50th quantile is projected for the A1fi scenario than for the B1 scenario (Loreto, MX, and Hialeah, FL, being the only exceptions; note that these stations also have low land fraction in all four AOGCMs). However, there is no general tendency for the mean change to be larger or smaller for downscaled than for raw AOGCM output, with some locations, such as Atlanta, GA, and Bridgeport, WV, showing a larger change in one than in the other. Similarly, projected changes in higher quantiles from raw AOGCM output can be higher than downscaled changes for certain locations and lower for others. This indicates that downscaling produces results specific to each location, as opposed to the more general AOGCM grid cell output.
5.2. Precipitation
Figure 18 shows raw AOGCM versus downscaled precipitation changes for 2070–2099 relative to 1960–2009 in the 0.1th, 50th, and 99.9th quantiles for the three versions of the downscaling model under the A1fi scenario, given as RMSE differences. Unsurprisingly, projected changes in the 0.1th quantile are less than 1 mm for all 20 stations for both raw AOGCM and downscaled projections. The fixed RMSE values for the 0.1th quantile for the raw AOGCMs arise because weather stations do not report trace precipitation, which is set at 0.005 inches (0.127 mm). The higher frequency of low-precipitation events, compared with high-precipitation events, in most locations causes the 0.1th quantile to almost always equal the lowest recorded or simulated precipitation value. The lowest simulated value in AOGCMs, when rounded to two decimal places, is 0.01 mm, because AOGCMs do not allow for ‘trace’. All stations but one have a lowest value of 0.2 mm (when converted from inches), whereas one station (Bridgeport, WV) has 0.1 mm as the lowest recorded value, which is why that station does not agree with the others in the bias plot (Figure 18). Under both scenarios, AOGCM outputs project little to some (up to about 8 mm) decrease in the amount of precipitation comprising the median quantile, whereas when downscaled the same quantile shows less than 2 mm change from current conditions, with few exceptions. The largest change is in the 99.9th quantile for both scenarios, with up to several hundred millimetres change from current extreme conditions. For raw AOGCM projections, future extreme precipitation amounts appear to decrease, whereas when downscaled, the same locations show a large increase in precipitation extremes, especially for the A1fi scenario (Figure 18).
This is most likely due to poor simulation of precipitation at the local scale by AOGCMs and is corrected by applying the statistical downscaling model, which is trained on historical temporal precipitation variability for each location. The linear version of the downscaling model produces very large increases in extreme precipitation events, up to about 1750 mm, whereas the full piecewise downscaling model produces more moderate, but still large, increases of up to about 300 mm. The numbers are very similar, although slightly smaller, for the B1 scenario (not shown here, but available at http://temagami.ttu.edu/arrm/).
Absolute precipitation changes for the 99th and 99.9th quantiles are shown in Figure 19 for both the A1fi and B1 scenarios, averaged across all four AOGCMs. For some stations, such as Hialeah, FL, Loreto, MX, and Vernon, AL, there is a substantial difference between extreme event projections for raw versus downscaled AOGCMs, with the raw AOGCM generally projecting larger absolute changes, compared with present conditions, than downscaled projections. The results of the cross-validation evaluation suggest that more confidence could be placed in the downscaled projections than in raw AOGCM output, because downscaled projections are tailored to each individual location.
6. Conclusions
The ARRM is an empirical statistical downscaling model capable of downscaling local projections of temperature and precipitation to both station-based observations and spatially gridded observations. Quantile regression, the method on which ARRM is based, is unique in that it builds a regression model based on matching the quantiles of the observed and simulated time series as opposed to matching corresponding day-to-day data points, which is the basis for many other regression-based statistical downscaling studies (Wilby et al., 1998; Huth, 1999, 2002; Wilby and Wigley, 2000; Huth et al., 2001; Boé et al., 2007; Kostopoulou et al., 2007). ARRM adds to this by using a piecewise regression model instead of a straight linear regression, which improves its ability to simulate more extreme temperatures and precipitation, one of the major issues with other downscaling methods (Huth, 1999; Goodess et al., 2012).
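The difference between a straight-line and a piecewise fit to the quantile–quantile relationship can be illustrated with a deliberately simplified Python sketch. It is not the ARRM algorithm: the function names are hypothetical, the knots are fixed at assumed probabilities (0.1 and 0.9) rather than chosen by the model, and the segments are fitted independently, so the fitted function may be discontinuous at the knots.

```python
import numpy as np


def ranked_pairs(obs, sim, n_probs=200):
    """Pair simulated and observed values by quantile, not by calendar day."""
    probs = (np.arange(n_probs) + 0.5) / n_probs
    return np.quantile(sim, probs), np.quantile(obs, probs)


def fit_linear(obs, sim):
    """Single straight line through the quantile-quantile pairs: (slope, intercept)."""
    x, y = ranked_pairs(obs, sim)
    return np.polyfit(x, y, 1)


def fit_piecewise(obs, sim, knot_probs=(0.1, 0.9)):
    """Separate straight lines between fixed knots (assumed, not optimised)."""
    x, y = ranked_pairs(obs, sim)
    knots = np.quantile(sim, knot_probs)
    segment = np.searchsorted(knots, x)      # segment index of each pair
    coeffs = [np.polyfit(x[segment == k], y[segment == k], 1)
              for k in range(len(knots) + 1)]
    return knots, coeffs


def predict_piecewise(model, sim_new):
    """Apply the segment-specific line to each new simulated value."""
    knots, coeffs = model
    sim_new = np.asarray(sim_new, dtype=float)
    segment = np.searchsorted(knots, sim_new)
    slopes = np.array([c[0] for c in coeffs])[segment]
    intercepts = np.array([c[1] for c in coeffs])[segment]
    return slopes * sim_new + intercepts
```

Because each segment receives its own slope, the tails of the distribution are no longer forced to follow the relationship that dominates near the median, which is where a single line tends to fit worst.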
The downscaling model was evaluated based on cross-validation of three different (linear, simple piecewise, and full piecewise) versions of both the temperature and precipitation models. Each version was evaluated in terms of three different quantities: the distributions, giving a visual image of the skill of each model; the RMSE; and bias in a range of quantiles.
The addition of piecewise regression, instead of straight linear regression, was found to have the largest impact on the performance of the method. The largest biases were found to be near the tails of the distribution, primarily due to data sparseness. Some sensitivity to station location was found in the linear versions of the downscaling model, but the addition of piecewise regression was able to eliminate much of this.
For future projections, the spread among projected temperature increases is generally narrower for downscaled temperature compared with raw AOGCM projections for the three quantiles shown, with more stations showing positive temperature changes after downscaling than before, for both higher A1fi and lower B1 scenarios. Downscaled projections of precipitation show smaller changes for the 50th quantile than raw AOGCM projections, for both A1fi and B1 scenarios, but slightly larger changes in extreme events, with projected changes being generally greater under the A1fi scenario than the B1 scenario.
Evaluating the ability of ARRM to reproduce observed temperature and precipitation at 20 stations across North America shows that the statistical downscaling model is able to reproduce values from the 0.1th to the 99.9th quantiles with biases generally below 1 °C and 5 mm. Downscaling future projections can alter the sign of AOGCM-simulated changes and usually narrows the range of projected changes across multiple AOGCM simulations.
The ultimate purpose of the ARRM framework is to allow for user selection from a broad range of predictors and predictands to efficiently downscale either point source or gridded observations of any observed climate variable with a Gaussian-like distribution that can be predicted from large-scale AOGCM output fields. Model performance for station-based temperature and precipitation downscaling appears sufficient to support continued development of such a generalized model. Future work will describe the application of this model to gridded datasets and to downscaling solar radiation and relative humidity.
This work was supported by the National Science Foundation, under NSF contract number DMS-0724377, and the U.S. Geological Survey, under contract number G10AC00248. Station data were obtained from the Global Historical Climatology Network, and AOGCM output fields were retrieved from the Earth System Grid.