Statistical downscaling methods (SDMs) are often used to increase the resolution of future climate projections from coupled atmosphere-ocean general circulation models (GCMs). However, SDMs are not able to capture small-scale dynamical changes unresolved by GCMs. For this reason, we propose a two-step generalized validation process to evaluate the performance of any statistical downscaling method relative to regional climate model (RCM) simulations driven by the same GCM fields. First, we compare historical station-based observations with simulations from a statistical model fitted to and driven by reanalysis fields, and then driven by historical GCM fields. Second, the SDM is required to produce future projections consistent with RCM simulations, used as pseudo-observations, under future emissions scenarios. Using the climate extension of the fifth-generation Penn State/NCAR Mesoscale Model (CMM5) driven by NCAR/DOE Parallel Climate Model (PCM) simulations, we apply this method to identify the strengths and weaknesses of a nonhomogeneous stochastic weather typing method.
 To assess the likely magnitude of climate change over the coming century and its resulting impacts, we rely on simulations from GCMs driven by plausible scenarios of future emissions from human activities. Computational constraints currently limit most century-scale GCM simulations to spatial resolutions on the order of a degree or more. However, the spatial scale of the information used to investigate the impacts of climate change on human and natural systems (such as water resources, energy, infrastructure, agriculture and ecosystems) can affect the magnitude and even the sign of the potential impacts [e.g., Hayhoe et al., 2004, 2006]. Hence, high-resolution climate projections at scales appropriate to the impacts being examined are essential for accurately determining the potential impacts of future climate change.
 To obtain high-resolution projections, both dynamical (RCM-based) and statistical downscaling methods are commonly used. Using GCM fields as boundary conditions, RCMs dynamically simulate regional climate processes at scales on the order of 5 to 50 km [e.g., Liang et al., 2006]. However, RCMs are as computationally intensive as GCMs, if not more so, and they often require 6-hourly GCM fields as input, which are rarely available. Hence, RCMs can only be applied to limited regions and time periods. In contrast, SDMs provide relatively fast and generally less computationally intensive simulations of local climate, and can be derived from monthly and/or daily GCM output fields. Owing to their flexibility and lower computational cost, statistically downscaled projections are often used in place of regional modeling in impact assessments [e.g., Wood et al., 2004; Hayhoe et al., 2004, 2006]. However, statistical methods are inherently limited in that they assume that present-day relationships between large-scale patterns and local-scale climate will remain valid under future climate change [e.g., Wilby et al., 1998], an assumption that cannot be directly tested at present.
 Given these inherent limitations, we propose here a generalized validation process to test whether a given SDM can appropriately be applied to produce higher-resolution projections of a climate variable from future GCM simulations. This validation test is explicitly designed to expose the potential weaknesses of statistical downscaling.
 First, we evaluate the ability of the SDM to reproduce observed climatology when driven by reanalysis fields and by historical GCM simulations. This step is based on a number of studies that have already assessed the ability of SDMs to reproduce historical observations when fitted to reanalysis fields and driven by reanalyses [e.g., Huth, 1999; Robertson et al., 2004; Vrac et al., 2007] or historical GCM fields [e.g., Wilby and Wigley, 2000; Charles et al., 2004].
 Second, we test the validity of assuming static dynamical relationships at the regional and local scale by requiring the SDM to produce future projections consistent with RCM simulations, driven by the same GCM under a range of future emissions scenarios and used as pseudo-observations. Several studies have undertaken similar comparisons of SDM behavior under future climate change, against either GCM output directly [e.g., Frias et al., 2006] or RCM fields [e.g., Wood et al., 2004; Busuioc et al., 2006; Haylock et al., 2006].
 In the sections that follow, we first describe the proposed validation process and its data requirements. We then apply this validation process to a specific SDM, a nonhomogeneous stochastic weather typing (NSWT) approach that has proven efficient at generating local precipitation features [Vrac et al., 2007]. Finally, we discuss the implications of the validation process itself for assessing regional climate projections, as well as what it revealed about the NSWT method.
2. A Two-Step Validation Method for Statistical Downscaling
 The validation process that we propose consists of the following two steps:
 (1) SDM performance is assessed in terms of its ability to reproduce the present-day climatological characteristics of the variables of interest at a given location when driven by reanalysis or historical GCM fields:
 (a) Fit the SDM to historical (e.g., 1990–1999) large-scale upper-air atmospheric reanalysis fields and surface weather station observations (obs).
 (b) Drive the fitted SDM with temporally independent reanalysis fields (i.e., from a period not used in step (a)) to generate simulated time series of the same local surface variables that were taken from the obs records to fit the SDM in step (a).
 (c) Compare the statistical properties of the obtained time series with those of local obs. Good agreement implies that the SDM can reconstruct the climatology of observed local variables when driven by large-scale observations.
 (d) Drive the SDM fitted to obs (in step (a)) with historical GCM fields to generate new local time series.
 (e) Compare these series with the observed local-scale time series. Good agreement implies that GCM fields can adequately drive an SDM fitted to obs (step (a)) to simulate local variables, and hence may continue to do so in the future.
 (2) The ability of the SDM to capture future spatial and/or temporal changes at the local scale is assessed. RCM outputs are employed as pseudo-observations, or proxies, of future conditions. Both the RCM and the SDM must be driven by the same GCM simulations.
 (a) Fit the SDM to historical GCM simulation fields and surface variables derived from individual RCM grid-cell outputs for the same time period.
 (b) Drive it with future GCM simulations from multiple emissions scenarios to generate statistically-downscaled time series for each RCM grid-cell.
 (c) Compare the SDM-generated future time series with the future RCM-based time series of surface variables. Satisfactory agreement gives confidence that the SDM can capture a climate change signal similar to that simulated by the RCM, despite its assumption of static dynamical relationships.
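 The two steps above can be summarized as a small driver routine. The sketch below is illustrative only: it assumes a hypothetical SDM interface in which `fit(predictors, local_series)` returns a simulator mapping daily large-scale predictors to a local daily series, and it reduces each comparison to a single hypothetical score (`percentile_mae`, the mean absolute difference over the 10th to 99th percentiles), whereas the comparisons in this paper rely on percentile scatter plots and QQplots. All function and key names are assumptions.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types: the validation method does not prescribe an SDM interface.
Predictors = Sequence[Sequence[float]]   # one row of large-scale predictors per day
Series = Sequence[float]                 # one local value (e.g., daily precipitation) per day
Simulator = Callable[[Predictors], List[float]]
FitFn = Callable[[Predictors, Series], Simulator]

PERCENTILES = (10, 25, 50, 75, 90, 99)


def percentile(values: Series, p: float) -> float:
    """Percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty series")
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_mae(sim: Series, ref: Series) -> float:
    """Mean absolute difference between simulated and reference percentiles (assumed score)."""
    diffs = [abs(percentile(sim, p) - percentile(ref, p)) for p in PERCENTILES]
    return sum(diffs) / len(diffs)


def two_step_validation(
    fit: FitFn,
    rean_train: Predictors, obs_train: Series,  # step 1(a) calibration data
    rean_test: Predictors, obs_test: Series,    # independent period for step 1(b-c)
    gcm_hist: Predictors, obs_hist: Series,     # historical GCM fields and matching obs
    rcm_hist: Series,                           # RCM pseudo-observations, same period as gcm_hist
    gcm_future: Dict[str, Predictors],          # scenario -> future GCM predictors
    rcm_future: Dict[str, Series],              # scenario -> future RCM pseudo-observations
) -> Dict[str, float]:
    scores: Dict[str, float] = {}
    # Step 1(a): fit the SDM to reanalysis predictors and station observations.
    obs_sdm = fit(rean_train, obs_train)
    # Step 1(b-c): drive it with independent reanalysis fields and compare with obs.
    scores["step1c_reanalysis"] = percentile_mae(obs_sdm(rean_test), obs_test)
    # Step 1(d-e): drive the same obs-fitted SDM with historical GCM fields.
    scores["step1e_gcm"] = percentile_mae(obs_sdm(gcm_hist), obs_hist)
    # Step 2(a): refit the SDM to historical GCM fields and RCM pseudo-observations.
    rcm_sdm = fit(gcm_hist, rcm_hist)
    # Step 2(b-c): drive it with each future scenario and compare with the RCM.
    for scenario, fields in gcm_future.items():
        scores["step2c_" + scenario] = percentile_mae(rcm_sdm(fields), rcm_future[scenario])
    return scores
```

 For multiple stations or RCM grid cells, the same scores would simply be computed location by location. Keeping the SDM behind a generic `fit` callable reflects the intent that the procedure apply to any downscaling method.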
 To compare RCM- and obs-fitted SDMs, Charles et al. recommended comparing their respective parameterizations. We omit this step from our generalized validation methodology as overly specific to the SDM being validated. Moreover, good agreement between the parameterizations does not necessarily imply that GCM simulations will be able to accurately drive an SDM fitted to observations if, for example, the GCM outputs differ significantly from the reanalysis data. For this reason, we instead propose that the first step compare observed time series with simulated ones obtained from an obs-fitted SDM driven by historical GCM outputs.
3. Data Requirements and Application to NSWT-Based Precipitation Downscaling
 The generalized validation method presented here requires four primary sources of observational data and model output. First, we require continuous daily upper-air and surface GCM output fields for the historical and future time periods of interest. Second, we require reanalysis output fields for the same variables, covering the same historical period as the GCM simulations. Third, we need regional model simulations driven by these same GCM output fields, covering at minimum one 10-year historical period (here, 1990–1999) and one 10-year future period (here, 2090–2099) for multiple future climate scenarios (here, the SRES A1FI higher and B1 lower emissions scenarios). Lastly, we require a continuous time series of daily observations of the surface climate field of interest for the same historical period as the RCM simulations. The longer the periods available for calibration and validation, the more robust the derived statistical relationships, since longer records span a wider range of climate conditions and reflect both the average climate and some degree of change; hence, 10 years should be viewed as a minimum requirement.
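 The four data requirements above can be encoded as a simple consistency check run before any fitting. This is a minimal sketch under stated assumptions: periods are represented as inclusive year ranges, "multiple scenarios" is read as at least two, and the class and method names (`ValidationInputs`, `problems`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Period = Tuple[int, int]  # inclusive (first_year, last_year)


def n_years(p: Period) -> int:
    return p[1] - p[0] + 1


def covers(outer: Period, inner: Period) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


@dataclass
class ValidationInputs:
    gcm_historical: Period
    gcm_future: Dict[str, Period]   # scenario -> years of daily GCM fields
    reanalysis: Period
    rcm_historical: Period
    rcm_future: Dict[str, Period]   # scenario -> years of GCM-driven RCM output
    observations: Period
    min_years: int = 10             # minimum record length suggested in the text

    def problems(self) -> List[str]:
        issues: List[str] = []
        # Reanalysis must span the same historical period as the GCM run.
        if not covers(self.reanalysis, self.gcm_historical):
            issues.append("reanalysis does not cover the historical GCM period")
        # RCM runs are driven by the GCM, so GCM fields must span them.
        if not covers(self.gcm_historical, self.rcm_historical):
            issues.append("historical RCM period not covered by GCM fields")
        # Station observations must span the historical RCM period.
        if not covers(self.observations, self.rcm_historical):
            issues.append("observations do not cover the historical RCM period")
        if n_years(self.rcm_historical) < self.min_years:
            issues.append("historical RCM period shorter than minimum")
        if len(self.rcm_future) < 2:
            issues.append("fewer than two future scenarios")
        for scen, period in self.rcm_future.items():
            if n_years(period) < self.min_years:
                issues.append(f"{scen}: future RCM period shorter than minimum")
            if scen not in self.gcm_future or not covers(self.gcm_future[scen], period):
                issues.append(f"{scen}: future RCM period not covered by GCM fields")
        return issues
```

 With the configuration used in this study (1990–1999 for all historical inputs and 2090–2099 for the A1FI and B1 scenarios), `problems()` returns an empty list.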
 Here we attempt to downscale daily precipitation at 37 weather stations in Illinois. Daily observations for these locations are provided by the National Weather Service Co-op Observer Program. For RCM simulations, we rely on the CMM5 model, a climate extension of MM5 v3.3 [Liang et al., 2004; J. Dudhia et al., PSU/NCAR mesoscale modeling system tutorial class notes and users guide: MM5 modeling system version 3, available at http://www.mmm.ucar.edu/mm5/, 2000]. RCM simulations for the 1990s and 2090s (under the SRES A1FI and B1 scenarios) are driven by 6-hourly temperature, humidity, wind and other upper-air fields generated by the NCAR/DOE Parallel Climate Model (PCM) [Washington et al., 2000], a low-sensitivity GCM with a spatial resolution of T42, or approximately 2.8° × 2.8°. NCEP/NCAR daily reanalysis fields, originally at 2.5° × 2.5° resolution, were regridded to the 2.8° × 2.8° PCM grid. RCM time series were available only from 1 April to 31 August of each year; hence, for consistency, we confine all analyses to those months.
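 Restricting every data source to the common 1 April to 31 August window is simple but easy to get wrong when series are stored as full calendar years. The sketch below shows one way to build the season calendar and filter a dated series; the function names are hypothetical, and the data are assumed to be daily values paired with `datetime.date` objects.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple

SEASON_MONTHS = range(4, 9)  # April (4) through August (8), inclusive


def season_dates(first_year: int, last_year: int) -> List[date]:
    """All calendar days from 1 April to 31 August in each year of the period."""
    days: List[date] = []
    for year in range(first_year, last_year + 1):
        d = date(year, 4, 1)
        while d <= date(year, 8, 31):
            days.append(d)
            d += timedelta(days=1)
    return days


def restrict_to_season(dates: Sequence[date], values: Sequence[float]) -> List[Tuple[date, float]]:
    """Keep only the (date, value) pairs that fall in the Apr-Aug window."""
    return [(d, v) for d, v in zip(dates, values) if d.month in SEASON_MONTHS]
```

 Because February is excluded, leap years do not matter: each season has 30 + 31 + 30 + 31 + 31 = 153 days, so a 10-year period such as 1990–1999 yields 1530 days per station.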
 Based on these data, we apply the validation method to evaluate the ability of an NSWT method to model local-scale precipitation occurrences and intensities. This NSWT method is based on a nonhomogeneous Markov model (NMM) that represents the transitions between regional daily precipitation states, as described by Vrac et al. Following the two-step validation process described in Section 2, we first fit the NSWT to observed precipitation at the 37 weather stations and to the regridded 2.8° × 2.8° 1990–1999 NCEP reanalyses. A hierarchical ascending clustering method is applied to the observed precipitation [Vrac et al., 2007], yielding four primary Apr–Aug precipitation patterns, or states (see Figure S1a of the auxiliary material). State 2 corresponds to the lowest rainfall intensities, while state 3 is associated with moderate precipitation over the whole of Illinois (with a slight South–North gradient). States 1 and 4 display almost mirror-image structures: moderate intensities in northern (southern) Illinois and strong precipitation in the southern (northern) part of the state, respectively.
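 Hierarchical ascending (agglomerative) clustering starts with every day as its own cluster and repeatedly merges the two closest clusters until the desired number of states remains. The sketch below treats each day as a vector of precipitation at the stations and uses Ward's merge criterion; the linkage choice and the function name `ward_cluster` are assumptions made for illustration, not necessarily what Vrac et al. used. It is O(n³) and meant only to show the mechanics; for real data, SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` followed by `fcluster(..., 4, criterion="maxclust")` is the practical route.

```python
from typing import List, Optional, Sequence, Tuple


def ward_cluster(days: Sequence[Sequence[float]], n_states: int) -> List[int]:
    """Agglomerative clustering with Ward's criterion.

    days: one vector of station precipitation per day (assumed input layout).
    Returns a state label (0..n_states-1) for each day.
    """
    clusters = [[i] for i in range(len(days))]
    centroids = [[float(x) for x in v] for v in days]
    while len(clusters) > n_states:
        best: Optional[Tuple[float, int, int]] = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                dist2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = [(na * x + nb * y) / (na + nb) for x, y in zip(centroids[a], centroids[b])]
        clusters[a].extend(clusters[b])
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    labels = [0] * len(days)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels
```

 Cutting the tree at four clusters corresponds to the four Apr–Aug states described above; how the number of states is chosen is a separate modeling decision not shown here.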
 The NSWT method is then fitted to the reanalysis fields and to the precipitation time series of each of the 37 locations, conditionally on these patterns. Following Vrac et al., we select three large-scale atmospheric variables: geopotential height, specific humidity and dew point temperature depression, all at 850 mb. This level was chosen because, in the summer months, convective overturning dominates the Midwest owing to interactions between the upper (200 mb) and lower (>850 mb) jets, which induce baroclinicity and maximize convective activity.
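 To make the structure of a nonhomogeneous weather-typing model concrete, the sketch below simulates daily states whose transition probabilities depend on the previous state and on the day's three 850-mb predictors, then draws station rainfall conditional on the state. Both the multinomial-logit transition form and the Bernoulli-gamma station model are assumptions chosen for illustration; they are not the exact NMM of Vrac et al. Parameter fitting (e.g., by maximum likelihood) is omitted, and all names and parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence

# Assumed predictor order per day (standardized anomalies at 850 mb):
# [geopotential height, specific humidity, dew point depression]


def transition_probs(prev_state: int, x: Sequence[float],
                     intercepts: Sequence[Sequence[float]],
                     coefs: Sequence[Sequence[float]]) -> List[float]:
    """P(S_t = j | S_{t-1} = prev_state, X_t = x) under an assumed multinomial-logit form.

    intercepts[i][j]: baseline log-odds of moving from state i to state j.
    coefs[j]: weights on the large-scale predictors for destination state j.
    """
    scores = [intercepts[prev_state][j] + sum(c * v for c, v in zip(coefs[j], x))
              for j in range(len(coefs))]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_states(predictors: Sequence[Sequence[float]], intercepts, coefs,
                    start_state: int, rng: random.Random) -> List[int]:
    """Draw a daily sequence of regional precipitation states."""
    states, state = [], start_state
    for x in predictors:
        probs = transition_probs(state, x, intercepts, coefs)
        state = rng.choices(range(len(probs)), weights=probs)[0]
        states.append(state)
    return states


def simulate_station(states: Sequence[int], wet_prob, shape, scale,
                     rng: random.Random) -> List[float]:
    """Daily rainfall at one station given the states (assumed Bernoulli-gamma model)."""
    rain = []
    for s in states:
        if rng.random() < wet_prob[s]:
            rain.append(rng.gammavariate(shape[s], scale[s]))
        else:
            rain.append(0.0)
    return rain
```

 The "nonhomogeneous" part is that the predictors shift the transition probabilities day by day. Driving `simulate_states` with GCM rather than reanalysis predictors is exactly the substitution tested in step 1(d).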
3.1. Validation Step 1: SDM Performances on Present Climate
 The first historical validation steps (step 1(b–c)) have already been performed for this NSWT method by Vrac et al. The results indicate that the NSWT is able to reproduce key temporal characteristics of local precipitation time series over Illinois when driven by NCEP reanalyses. We next use 1990–1999 PCM output fields to drive the NSWT method to produce downscaled precipitation time series at the 37 locations for the same period as the NCEP reanalysis (step 1(d)). The local probability distributions of rainfall intensity from the historical PCM-driven SDM and from the PCM-driven RCM simulations are then compared with observed precipitation distributions in two ways.
 First, the 10th, 25th, 50th, 75th, 90th and 99th percentiles (spanning low, moderate and “extreme” precipitation events) of the PCM-driven NSWT and PCM-driven RCM simulated precipitation are plotted for all 37 stations against observations in Figure 1a.
 This comparison highlights the superior ability of the RCM to translate PCM upper-air fields into local precipitation estimates; the remaining scatter is likely due to the comparison being made between RCM grid cells and station observations, as well as to model limitations. For lower precipitation intensities (i.e., the 10th to 50th percentiles), the SDM tends to overestimate the percentiles; in other words, it appears to perpetuate the known tendency of GCMs to “drizzle”, producing too much precipitation at the lower end of the spectrum. For higher precipitation amounts, the SDM percentiles for the 37 stations are more evenly distributed around the y = x line (i.e., the observed percentiles) than the lower percentiles, but they show a slight underestimation, with slightly larger standard errors than the RCM-simulated percentiles.
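 The Figure 1a comparison amounts to computing, for each station and percentile level, an (observed, simulated) pair, and then asking whether the pairs fall above or below y = x. The sketch below computes those pairs and a mean bias per percentile level; the function names and the dictionary-of-stations data layout are assumptions, and the linear-interpolation percentile is one common convention rather than necessarily the one used for the figure.

```python
from typing import Dict, List, Sequence, Tuple

PERCENTILES = (10, 25, 50, 75, 90, 99)


def percentile(values: Sequence[float], p: float) -> float:
    """Percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_pairs(sim: Dict[str, Sequence[float]],
                     obs: Dict[str, Sequence[float]]) -> Dict[int, List[Tuple[float, float]]]:
    """For each percentile level, one (observed, simulated) point per station."""
    return {p: [(percentile(obs[s], p), percentile(sim[s], p)) for s in obs]
            for p in PERCENTILES}


def mean_bias(pairs: Dict[int, List[Tuple[float, float]]]) -> Dict[int, float]:
    """Average (simulated - observed) per percentile; > 0 means overestimation."""
    return {p: sum(y - x for x, y in pts) / len(pts) for p, pts in pairs.items()}
```

 In these terms, the "drizzle" behavior described above corresponds to positive `mean_bias` values at the 10th to 50th percentiles, and the underestimation of heavy events to negative values at the 90th and 99th.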
 Focusing on individual stations, the quantile-quantile plots (QQplots; Figures S2a–S2c of the auxiliary material) similarly reveal a consistent but slight overestimation of the GCM-driven SDM quantiles. This overestimation is caused by an underestimation of the local “no rain” probabilities and is clearest at low rainfall values (i.e., near 0), where the NSWT method driven by PCM simulates too many wet days (i.e., not enough dry days). This indicates that PCM fields are not able to drive the probabilities of local rain accurately, as compared with historical observations. Hence, the associated model-based wet and dry spell probabilities (Figure S3 of the auxiliary material) are in relatively poor agreement with observations, and this GCM-driven SDM should not be used to estimate wet/dry day distributions. However, except near 0, most of the QQplots of the GCM-driven, obs-fitted SDM against observations are parallel to the y = x line, meaning that the distributions of positive daily rainfall intensities are correctly simulated. This implies that the GCM-driven SDM can produce realistic daily precipitation amounts on wet days.
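 The distinction drawn above, between errors in the dry-day probability and errors in wet-day amounts, can be made explicit by splitting each series into a dry fraction and a wet-day sample before building QQ pairs. In the sketch below, the wet-day threshold (default: any positive amount) and all function names are assumptions; the text does not state the threshold used.

```python
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], p: float) -> float:
    """Percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def dry_fraction(series: Sequence[float], wet_threshold: float = 0.0) -> float:
    """Fraction of days at or below the (assumed) wet-day threshold."""
    return sum(1 for v in series if v <= wet_threshold) / len(series)


def wet_values(series: Sequence[float], wet_threshold: float = 0.0) -> List[float]:
    """Amounts on wet days only."""
    return [v for v in series if v > wet_threshold]


def qq_points(sim: Sequence[float], ref: Sequence[float],
              probs: Sequence[float] = (0.1, 0.3, 0.5, 0.7, 0.9)) -> List[Tuple[float, float]]:
    """(reference, simulated) quantile pairs; points on y = x mean matching distributions."""
    return [(percentile(ref, 100 * q), percentile(sim, 100 * q)) for q in probs]
```

 A simulated series with too few dry days pushes positive values into the lower quantiles of the full distribution, so its low quantiles exceed the observed zeros, even when the wet-day samples themselves are similar. Comparing `dry_fraction` separately from `qq_points` on `wet_values` therefore separates the two effects described in the paragraph above.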
3.2. Validation Step 2: SDM Assessments Under Climate Change
 For the second validation step, the NSWT SDM is fitted to 1990–1999 PCM and RCM outputs. Although the optimal number of precipitation patterns obtained by fitting to historical RCM simulations is four, as in the previous analysis, these patterns (Figure S1b of the auxiliary material) are noticeably different from those based on observations. The mean intensity of each pattern is slightly different, and the S-to-N gradient of rainfall intensity present in obs-based pattern 3 becomes an N-to-S gradient. The boxplots of precipitation intensities for these two sets of patterns (Figure S4 of the auxiliary material) show that, despite similar general structures (indicating the relative quality of CMM5-simulated precipitation), differences do exist in the rainfall intensity distribution within each pattern. However, this does not mean that our SDM, when driven by GCM outputs, will be unable to capture an RCM-simulated climate change signal. To test this point, the NSWT is then driven by 2090–2099 PCM simulations to statistically generate precipitation time series at the 37 (RCM) locations for the SRES A1FI (higher) and B1 (lower) emissions scenarios. This time, the SDM precipitation percentiles for the two scenarios are plotted against the RCM-based percentiles for the same future period (Figure 1b). The SDM is able to capture the RCM-simulated change in precipitation percentiles, particularly for amounts up to the 50th percentile (implying that the RCM simulates a slight increase in smaller summer rainfall events, which the SDM is also able to reproduce). However, there is also a systematic bias in the “extreme” (i.e., 90th percentile or higher) precipitation percentiles from the SDM as compared with the RCM: the RCM simulates higher percentiles for the heaviest precipitation events, while the SDM systematically underestimates the magnitude of these “extreme” events.
One possible reason is the relatively short training period: over ten years, rainfall intensities at or above the 99th percentile are rarely encountered.
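 A short calculation shows why the 99th percentile is poorly constrained by a 10-year Apr–Aug record. Assuming percentiles are computed over all 1530 days per station (if they are computed over wet days only, the sample is smaller and the counts below are smaller still), the number of days exceeding the p-th quantile is approximately binomial with n = 1530 and success probability 1 − p. The helper name `exceedance_count` is hypothetical.

```python
import math
from typing import Tuple

DAYS_PER_SEASON = 30 + 31 + 30 + 31 + 31  # April through August = 153 days


def exceedance_count(n_days: int, p: float) -> Tuple[float, float]:
    """Expected number of days above the p-th quantile and its binomial standard deviation."""
    q = 1.0 - p
    mean = n_days * q
    sd = math.sqrt(n_days * q * (1.0 - q))
    return mean, sd


n_train = 10 * DAYS_PER_SEASON  # a 10-year training period such as 1990-1999
for p in (0.50, 0.90, 0.99):
    mean, sd = exceedance_count(n_train, p)
    print(f"p={p:.2f}: {mean:.1f} +/- {sd:.1f} days above (relative spread {sd / mean:.0%})")
```

 Only about 15 ± 4 days per station lie above the 99th percentile, a relative sampling spread of roughly 25%, compared with 765 ± 20 days (under 3%) above the median. Any model fitted to such a record therefore has very little information about the heaviest events.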
 The associated complete QQplots (Figures S2d–S2i of the auxiliary material) show that, for most stations, the simulated precipitation time series have distributions close to the RCM-based ones up to the 90th quantile. However, they again reveal a slight tendency of the NSWT to underestimate the RCM quantiles under the future scenarios, particularly at the higher quantiles. Moreover, the QQplots are generally in better agreement for the B1 scenario than for A1FI, B1 corresponding to a smaller climate forcing than A1FI. This suggests that, besides the short (10-year) training period, some of the bias, particularly for higher precipitation amounts, may be due to climate-driven changes in smaller-scale dynamical processes that an SDM cannot capture.
 The same is true for the wet and dry spell probabilities (Figure S5 of the auxiliary material), which are closer to the RCM-based spell probabilities for B1 than for A1FI. In particular, a tendency for the SDM to underestimate the probability of longer spells is discernible for some stations, mainly under the A1FI scenario. Overall, however, these results show good agreement between the pseudo-observations (i.e., RCM outputs) and the simulations. Taken together, Figure 1 and the extensive QQplot analysis (auxiliary material) demonstrate that the NSWT approach is able to capture the temporal characteristics of a future climate change signal as simulated by this RCM, particularly for precipitation amounts below the 90th percentile.
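 Wet and dry spell statistics come from run lengths of consecutive wet or dry days. The sketch below computes run lengths per season (so that no spell spans 31 August to 1 April of the following year), pools them across seasons, and reports the empirical probability that a spell lasts at least k days. This "at least k days" definition, the wet-day threshold and the function names are assumptions; the figures referenced above may use a different but related definition.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def spell_lengths(season: Sequence[float], wet_threshold: float = 0.0) -> Tuple[List[int], List[int]]:
    """Lengths of consecutive wet and dry runs within one season."""
    wet: List[int] = []
    dry: List[int] = []
    for is_wet, run in groupby(season, key=lambda v: v > wet_threshold):
        length = sum(1 for _ in run)
        (wet if is_wet else dry).append(length)
    return wet, dry


def pooled_spells(seasons: Sequence[Sequence[float]],
                  wet_threshold: float = 0.0) -> Tuple[List[int], List[int]]:
    """Pool spells over seasons without letting a spell cross a season boundary."""
    wet: List[int] = []
    dry: List[int] = []
    for season in seasons:
        w, d = spell_lengths(season, wet_threshold)
        wet.extend(w)
        dry.extend(d)
    return wet, dry


def spell_exceedance(lengths: Sequence[int], max_len: int) -> List[float]:
    """Empirical P(spell length >= k) for k = 1..max_len."""
    if not lengths:
        return [0.0] * max_len
    n = len(lengths)
    return [sum(1 for x in lengths if x >= k) / n for k in range(1, max_len + 1)]
```

 Comparing `spell_exceedance` curves from SDM and RCM series at large k is one way to quantify the underestimation of longer spells noted above.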
 The two-step validation method presented here encompasses and standardizes many tests performed in the statistical downscaling literature to assess confidence in the ability of any statistical downscaling approach to generate future climate projections relative to both historical observations and future RCM-based simulations.
 The first step in this method assesses SDM performance over a historical period when driven first by reanalyses and then by GCM fields. The question of agreement between the GCM and a statistical model fitted to historical observations applies to any downscaling method, so this step should be viewed as essential whenever GCM outputs are used to downscale future climate variables. It represents an improvement over studies that rely on statistically based projections of regional climate change but do not evaluate the present-day ability of the statistical method to simulate observed climate statistics when driven by historical GCM fields.
 The second step tests whether the SDM is capable of producing future projections consistent with RCM simulations, driven by the same GCM under a range of future emissions scenarios and used as “pseudo-observations”. This step essentially validates whether the SDM can translate a climate change signal simulated by the GCM into the same regional features as the RCM, hence providing a tool that could supply comparable or complementary information to RCM simulations over a given region, but at a much lower computational cost. It is important to note that the SDM is judged here relative to future RCM simulations only, and not relative to “real” future observations. Nevertheless, we consider it unlikely that the SDM would work well when fitted to observed data if it does not work well when fitted to RCM outputs, since the GCM and the RCM are more closely linked than the GCM and the actual climatology.
 As an example, we have applied this method to evaluate the ability of an NSWT approach to downscale daily precipitation distributions at 37 stations in the state of Illinois. For this method, the first validation step confirms that the SDM can reproduce historical precipitation when driven by NCEP reanalyses, and distributions of positive rainfall intensities (although not wet/dry day distributions) when driven by historical PCM simulations. The GCM-driven SDM does tend to produce larger precipitation values than observations at the lower quantiles, perpetuating the tendency of the GCM to produce too many low-rain days.
 Step 2 shows that the NSWT method is capable of capturing the climate change signal as simulated by the RCM, particularly for quantiles below the 90th percentile, i.e., lower to moderate precipitation amounts. The better agreement between SDM- and RCM-based precipitation projections under a smaller climate forcing (represented here by the SRES B1 scenario) than under a larger forcing (represented by the A1FI scenario) suggests that projected changes in summer precipitation over this region may be influenced by smaller-scale dynamical processes not captured at the scale of GCM fields. Based on these results, we recommend that the NSWT method driven by PCM simulations be used only for lower climate forcing and/or for low to moderate quantiles.
 Furthermore, to maximize the utility of the evaluations we propose, the validation should take advantage of RCM simulations based on multiple GCMs of differing sensitivity or, as in this work, on GCM simulations driven by higher and lower future emissions scenarios. Differing degrees of future climate change may affect not only temperature but also precipitation and its extremes [e.g., Tebaldi et al., 2006].
 Although this research has been funded in part by the United States Environmental Protection Agency through STAR Cooperative Agreement R-82940201 to the University of Chicago, and by the US EPA Science to Achieve Results Award RD-83096301-0, it has not been subjected to the Agency's peer and policy review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred. We also acknowledge NOAA/ESRL/GSD and NCSA/UIUC for the supercomputing support.