Reviews of Geophysics

Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user


Abstract

[1] Precipitation downscaling improves the coarse resolution and poor representation of precipitation in global climate models and helps end users to assess the likely hydrological impacts of climate change. This paper integrates perspectives from meteorologists, climatologists, statisticians, and hydrologists to identify generic end user (in particular, impact modeler) needs and to discuss downscaling capabilities and gaps. End users need a reliable representation of precipitation intensities and temporal and spatial variability, as well as physical consistency, independent of region and season. In addition to presenting dynamical downscaling, we review perfect prognosis statistical downscaling, model output statistics, and weather generators, focusing on recent developments to improve the representation of space-time variability. Furthermore, evaluation techniques to assess downscaling skill are presented. Downscaling adds considerable value to projections from global climate models. Remaining gaps are uncertainties arising from sparse data; representation of extreme summer precipitation, subdaily precipitation, and full precipitation fields on fine scales; capturing changes in small-scale processes and their feedback on large scales; and errors inherited from the driving global climate model.

1. INTRODUCTION

[2] Global climate models (GCMs) are the primary tool for understanding how the global climate may change in the future. (Italicized terms are defined in the glossary, after the main text.) However, these currently do not provide reliable information on scales below about 200 km [Meehl et al., 2007] (for an illustration, see Figure 1). Hydrological processes typically occur on finer scales [Kundzewicz et al., 2007]. In particular, GCMs cannot resolve circulation patterns leading to hydrological extreme events [Christensen and Christensen, 2003]. Hence, to reliably assess hydrological impacts of climate change, higher-resolution scenarios are required for the most relevant meteorological variables.

Figure 1.

Average UK winter precipitation (mm/d) for 1961–2000 simulated by the Hadley Centre global climate model (GCM) HadCM3 and the regional climate model (RCM) HadRM3 at 50 and 25 km resolutions compared with gridded observations (E. Buonomo et al., unpublished data, 2009). The GCM does not provide regional precipitation information. The RCM reproduces basic regional structure but is limited in mountain areas (western UK); in addition, this particular RCM exaggerates the rain shadow effect (east Scotland).

[3] Downscaling attempts to resolve the scale discrepancy between climate change scenarios and the resolution required for impact assessment. It is based on the assumption that large-scale weather exhibits a strong influence on local-scale weather but, in general, disregards any reverse effects from local scales upon global scales. Two approaches to downscaling exist. Dynamical downscaling nests a regional climate model (RCM) into the GCM to represent the atmospheric physics with a higher grid box resolution within a limited area of interest. Statistical downscaling establishes statistical links between large(r)-scale weather and observed local-scale weather.

[4] During the last 2 decades, extensive research on downscaling methods and applications has been carried out. For a comprehensive overview of applications, see Christensen et al. [2007]; see also Prudhomme et al. [2002] and Fowler et al. [2007a], who focus on hydrology. Several reviews of downscaling methods have been published [e.g., Hewitson and Crane, 1996; Zorita and von Storch, 1997; Wilby and Wigley, 1997; Xu, 1999a; Hanssen-Bauer et al., 2005]. In addition to updating these methodological reviews, this paper aims to integrate different perspectives on precipitation downscaling, in particular, from meteorologists, climatologists, statisticians, and impact modelers such as hydrologists. As such, we focus on laying out concepts and discussing methodological advances.

[5] In general, the most relevant meteorological variables for hydrological impact studies are precipitation and temperature [Xu, 1999b; Bronstert et al., 2007]. For freshwater resources in particular, precipitation is the most important driver [Kundzewicz et al., 2007], though it is considerably more difficult to model than temperature, mostly because of its high spatial and temporal variability and its nonlinear nature. The overall objective of this paper is to define a set of generic end user needs (in particular, for impact modelers) for downscaled precipitation and then to discuss how these needs are met by the various downscaling approaches and what gaps remain.

[6] Statistical downscaling has received considerable attention from statisticians. Their contributions have, however, gone largely unrecognized in the climate community, although they attempt to address important end user needs. An essential part of this paper is therefore to review recent statistical models that have been developed to improve the representation of spatial-temporal variability and extremes. We attempt to bring these recent approaches together with classical statistical downscaling methods and discuss differences and similarities between individual methods and approaches, as well as their advantages and drawbacks.

[7] Traditionally, statistical downscaling has been seen as an alternative to dynamical downscaling. With the increasing reliability and availability of RCM scenarios, recent work on statistical downscaling has aimed to combine the benefits of these two approaches. Under the name model output statistics (MOS), gridded RCM simulations are statistically corrected and downscaled to point scales. We describe MOS approaches in detail and discuss their relation to other statistical downscaling approaches.

[8] To seriously assess the skill of downscaling approaches in meeting end user needs, a quantitative evaluation is necessary. Therefore, an important part of the paper is a review of validation techniques.

[9] In section 2 we identify a set of generic end user needs. The state of the art in dynamical and statistical downscaling is presented in sections 3 and 4, respectively, and in section 5, validation techniques are introduced. Finally, in section 6 we discuss how the approaches presented in sections 3 and 4 meet the specific needs identified in section 2. In particular, section 6 seeks to address the following questions: How does dynamical downscaling address a particular end user need? How can MOS improve the RCM simulations and close potential gaps? How does statistical downscaling perform as an alternative to dynamical downscaling? What are the remaining gaps? Sections 3–5 are quite technical in nature, while sections 2 and 6 are written to be accessible to the nonexpert.

2. NEEDS OF THE END USER

[10] Downscaling precipitation, in most cases, is not an end in itself but provides a product (in the form of data or information) to an “end user.” Their goal may be, for example, to understand and potentially act upon the impacts that are likely to be caused by a localized climate extreme or by a future change in the climate. End users range from policy makers, through planners and engineers, to impact modelers. As well as the product, the end user might also require a clear statement of the assumptions involved and limitations of the downscaling procedure, a transparent explanation of the method, a description of the driving variables used in the downscaling procedure and their source, a clear statement of the validation method and performance, and some characterization of the uncertainty or reliability of the supplied data. Fowler et al. [2007a] note that very few downscaling studies consider hydrological impacts, and those that do seldom provide any consideration of how results might enable end users to make informed, robust decisions on adaptation in the face of deep uncertainty about the future. To make such decisions successfully, nonspecialist end users (e.g., policy makers) might benefit from the involvement of social scientists with experience in translating between nonspecialists and natural scientists [Changnon, 2004; Gigerenzer et al., 2005; Pennesi, 2007]. This communication process can ensure that the downscaled product can, in fact, be used as intended and is understood correctly. This paper mainly addresses the hydrological impact modeler, but sections 6 and 7, especially, provide useful information for other types of end user.

[11] In hydrological impact studies, whether using observed or simulated precipitation, assumptions about the spatial and temporal distribution of precipitation are required, and the pertinent question is what assumptions are appropriate given the nature of the specific problem being addressed. Hydrological impact analyses can have different objectives and hence focus on different components of the hydrological cycle. They are applied in differing environments (e.g., different climates, land use, and geology), and it is essential that the processes and pathways involved in a particular study area are well understood and represented in the model. Furthermore, they employ models of varying complexity and temporal resolution, depending on their purpose and model availability (e.g., empirical models on an annual basis, “water balance models” on a monthly basis, “conceptual lumped parameter models” on a daily basis, and “process-based distributed parameter models” on an hourly or finer basis [Xu, 1999a]). Therefore, the objective, study area characteristics, and type of model used will determine the sensitivity of the system to different precipitation characteristics (spatial and temporal distribution) and the form of the precipitation required (e.g., continuous time series, seasonal averages, and annual extremes).

[12] It is well established that the minimum standard for any useful downscaling procedure is that the historic (observed) conditions must be reproducible [Wood et al., 1997], but it is also necessary that the simulated conditions are appropriate for the particular hydrological problem being addressed. This can be achieved using a hydrological evaluation step in the downscaling procedure [Bronstert et al., 2007], whereby the usefulness of the climatic data to the hydrological impact analysis is assessed. Fowler et al. [2007a] suggest using a sensitivity study to define the climatic variables that need to be accurately downscaled for each different impact application. This should apply not only to different variables but also to different characteristics of particular variables, i.e., different precipitation indices.

[13] In sections 2.1–2.6, a set of generic end user needs is identified, giving specific examples. The skill of the various downscaling methods to meet these needs is described in section 6.

2.1. Regional and Seasonal Needs

[14] The needs of the end user will vary regionally and seasonally as a function of socioeconomic needs and pressures, land use, and the climatological context. Depending on the particular end user, in some regions it may therefore be important to provide reliable precipitation characteristics for a particular season. In monsoonal climates, such as the Indian subcontinent [Zehe et al., 2006] and West Africa [Laux et al., 2008], the prediction of the onset and strength of monsoon rainfall is critical for management of water resources and agriculture. In temperate climates there is much less of a seasonal pattern in rainfall, though seasonal evaporation can significantly impact the water cycle. For instance, groundwater resources in southeast England are recharged (i.e., replenished with water originating from precipitation, infiltrating and percolating through the overlying rock) primarily in the winter months when precipitation exceeds evaporation, whereas during the summer much of the precipitation is lost to evaporation. Therefore, under current climate conditions, resource availability is considered primarily a function of winter precipitation (see, e.g., the recharge models discussed by Ragab et al. [1997]). Herrera-Pantoja and Hiscock [2008] have suggested that under climate change potential winter recharge will increase, while summer recharge will reduce (reflecting changes in both precipitation and potential evaporation). Therefore, the impact in terms of flood or drought risk will depend on a more complicated balance of these two seasonal components. Mean summer rainfall is an important control on agricultural yield, while extreme rainfall events, especially during the summer, can damage crops, reduce pesticide efficiency, erode soil, and cause flooding, all of which have a negative impact on crop yield [Rosenzweig et al., 2001]. Therefore, agricultural impacts require reliable predictions of summer average and extreme rainfall conditions.

2.2. Event Intensity

[15] Many hydrological applications require continuous simulation and as such have a requirement for reliable precipitation intensities, from light to heavy events. Intensities are often characterized by their return level and return period. The return level is defined as the event magnitude which, in a stationary climate, would be expected to occur on average once within the return period. In this paper we refer to heavy precipitation as events having a return period of the order of months or a few years. Events with return periods of decades or centuries are rarely observed, and their intensities probably exceed the range of the observed record. To correctly assess such rare events, extreme value theory [e.g., Coles, 2001; Katz et al., 2002; Naveau et al., 2005] is necessary. We will refer to such events as extreme precipitation. In particular, extreme precipitation intensities are required for the design of urban drainage networks. The UK Department for Environment, Food and Rural Affairs sets a target of protection against events with a 100 year return period for urban areas, prioritized on cost/benefit grounds [Wheater, 2006].
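To make the return level definition concrete, the following minimal sketch (not from the paper; all data and parameter values are synthetic and illustrative) fits a generalized extreme value (GEV) distribution to annual maxima with SciPy and reads off the T year return level as the (1 − 1/T) quantile:

```python
# Minimal sketch: T-year return levels from a GEV fit to annual maxima.
# Synthetic data; in practice `annual_max` would hold observed annual maximum
# daily precipitation (mm/d) at a gauge.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(42)
annual_max = genextreme.rvs(c=-0.1, loc=30.0, scale=8.0, size=60,
                            random_state=rng)  # placeholder "observations"

# Maximum likelihood fit of the GEV distribution (scipy's shape convention).
shape, loc, scale = genextreme.fit(annual_max)

# The T-year return level is the (1 - 1/T) quantile of the fitted distribution.
for T in (10, 50, 100):
    level = genextreme.ppf(1.0 - 1.0 / T, shape, loc=loc, scale=scale)
    print(f"{T:3d}-year return level: {level:.1f} mm/d")
```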

2.3. Temporal Variability and Time Scales

[16] Different temporal characteristics of precipitation are important depending on the catchment characteristics. The flooding in Boscastle, southwest England, in August 2004 was caused by 181 mm of rain which fell in 5 h [Wheater, 2006]. By contrast, groundwater flooding in Chalk catchments in Hampshire and Berkshire, south England, in 2001 was caused by the highest 8 month total precipitation in a record starting in 1883 [Marsh and Dale, 2002]. Daily precipitation totals during this period were unexceptional and not in themselves “flood producing” [Department for Environment, Food and Rural Affairs, 2001]. These are both examples of different types of extreme precipitation. In the case of Hampshire and Berkshire, it is necessary that the statistics of extremely long duration (up to 8 months) precipitation totals are projected reliably, while daily precipitation totals are much less important. In order to project the statistics of future flood events similar to the 2004 Boscastle flood, the downscaler should be able to supply reliable estimates of daily or even subdaily extreme precipitation. Another example where rainfall intensity over short durations is highly important is urban flooding [Cowpertwait et al., 2007].

2.4. Spatial Coherence and Event Size

[17] In principle, downscaling can provide point scale, areal average, or spatially distributed precipitation fields, though the latter is challenging. Which of these is required by the end user will depend on the extent to which the spatial structure of precipitation is likely to affect the response of a system under study. For example, in the context of rainfall-runoff modeling there is evidence that spatial structure is important for small, rapidly responding catchments and for catchments that are larger than the scale of typical precipitation events [Ngirane Katashaya and Wheater, 1985; Michaud and Sorooshian, 1994; Singh, 1997; Segond et al., 2007; Wheater, 2008], but other factors, such as catchment geology, may serve to damp out the effects [Naden, 1992].

2.5. Physical Consistency

[18] Many hydrological responses are affected by variables other than precipitation, notably evaporation and snowmelt (exceptions are short time scale responses to large rainfall events). Ignoring the coherence of these variables, i.e., treating them as though they were independent, may in some circumstances be inappropriate. In certain regions warmer winters might mean that precipitation falls as rainfall rather than snowfall, leading to lower snowmelt, lower spring-summer runoff, and hence potential drought risk [Rosenzweig et al., 2001].

2.6. Downscaling for Future Climate Change

[19] End user needs for future scenarios fall into two categories: projections of the long-term (several decades to 100 years) trend and predictions of variability over the next 1 or 2 decades. The long-term trend is important for design of flood defenses and general infrastructure, as well as strategic planning regarding agriculture, water resources, and water-related hazards. The prediction of shorter-term climate variability has more immediate applications, such as predicting crop yields.

3. HOW FAR HAVE RCMs COME?

[20] RCMs contain the same representations of atmospheric dynamical and physical processes as GCMs. They have a higher resolution (50 km or less) but cover a subglobal domain (e.g., Europe). Because of their higher resolution, RCMs typically require a reduced model time step (5 min or less) compared to GCMs (typically a 30 min time step) to maintain numerical stability, although semi-Lagrangian semi-implicit RCMs such as the Canadian Regional Climate Model are able to use time steps as large as those of GCMs. RCMs are driven by winds, temperature, and humidity imposed at the lateral boundaries of the domain and by sea surface temperatures, all supplied by the global model; this usually leads to large-scale fields in the RCM that are consistent with the driving GCM.

[21] In general, the larger the domain size, the more the RCM is able to diverge from the driving model [Jones et al., 1995]. The consistency of large-scale features can be further increased by forcing the large-scale circulation within the RCM domain to be in close agreement with the global model [von Storch et al., 2000]. In these one-way nesting approaches there is no feedback from the RCM to the driving GCM [Jones et al., 1995].

[22] Because of their higher spatial resolution, RCMs provide a better description of orographic effects, land-sea contrast, and land surface characteristics [Jones et al., 1995; Christensen and Christensen, 2007]. They also give an improved treatment of fine-scale physical and dynamical processes and are able to generate realistic mesoscale circulation patterns which are absent from GCMs [Buonomo et al., 2007]. They provide data that are coherent both spatially and temporally and across multiple climate variables, consistent with the passage of weather systems. The fact that RCMs can credibly reproduce a broad range of climates around the world [Christensen et al., 2007] further increases our confidence in their ability to realistically downscale future climates.

[23] Climate models need to represent processes at scales below those that they can explicitly resolve, such as radiation, convection, cloud microphysics, and land surface processes. This is done using parameterization schemes, which represent a simplification of the real world and hence lead to inherent modeling uncertainty. For example, the simulation of precipitation in an RCM is divided into a large-scale scheme, accounting for clouds and precipitation which result from atmospheric processes resolved by the models (e.g., cyclones and frontal systems), and a convection scheme describing clouds and precipitation resulting from subgrid-scale convective processes. For instance, a convection scheme may model convective clouds in a grid box as a single updraft, with the amount of convection determined by the rate of uplift at the cloud base. Convective activity is restricted to a single time step, and thus, there is no memory of convection from previous time steps. In addition, there is no horizontal exchange of convective activity between neighboring grid boxes.

[24] There are many different RCMs currently available, for various regions, developed at different modeling centers around the world. The different RCMs produce different high-resolution scenarios for a given boundary forcing [e.g., Buonomo et al., 2007], due to differences in model formulation, but also due to small-scale internal variability generated by the RCM. There has been considerable international effort recently to quantify uncertainty in regional climate change through the intercomparison of multiple RCMs, for example, the Prediction of Regional Scenarios and Uncertainties for Defining European Climate Change Risks and Effects (PRUDENCE) [Christensen and Christensen, 2007] and ENSEMBLES [Hewitt and Griggs, 2004; van der Linden and Mitchell, 2009] projects for Europe and the North American Regional Climate Change Assessment Program project (http://www.narccap.ucar.edu/) [Mearns et al., 2009] for North America. The recent Coordinated Regional Climate Downscaling Experiment (CORDEX) initiative from the World Climate Research Program promotes running multiple RCM simulations at 50 km resolution for multiple regions.

[25] The typical grid size of RCM simulations to date has been 25 or 50 km. However, recently, a few RCM simulations with grid scales below 20 km have become available for Europe: the REMO-UBA (10 km) and the CLM (18 km) simulations of the Max Planck Institute for Meteorology and the HIRHAM (12 km) simulations of the Danish Meteorological Institute [Dankers et al., 2007; Früh et al., 2010; Hollweg et al., 2008; Tomassini and Jacob, 2009]. In addition, RCMs with grid sizes of 5 km or less are being developed at several modeling centers. For example, a 5 km RCM has been developed over Japan [Kanada et al., 2008]. Also, preliminary results using cloud-resolving models on climate time scales spanning small domains are becoming available, e.g., for the Alpine region at a grid scale of 2.2 km [Hohenegger et al., 2008].

3.1. Skill of RCMs to Downscale Precipitation

[26] Precipitation is one of the climate variables most sensitive to model formulation, being strongly dependent on several parameterization schemes and their interplay with the resolved model dynamics. For this variable, it has been shown that RCMs are able to contribute significant added value compared to the driving GCMs [e.g., Durman et al., 2001; Frei et al., 2006; Buonomo et al., 2007].

[27] Compared to the driving GCM, RCMs produce an intensification of precipitation [Durman et al., 2001], leading to an improved representation of the daily precipitation distribution, including extreme events [Christensen and Christensen, 2007]. Also, RCMs can reproduce many features of the precipitation distribution over regions of complex topography not resolved in the GCM [Frei et al., 2006]. Significant biases in the simulation of mean precipitation on large scales can be inherited from the driving GCM [Durman et al., 2001]. To provide a clearer assessment of the performance of an RCM, it can be driven by reanalysis data (see also section 5). These provide quasi-observed boundary conditions and allow RCM downscaling skill to be isolated [Frei et al., 2003]. Reanalysis-driven RCM simulations not only exclude systematic biases in the large-scale climate but, in contrast to standard simulations, also are able to reproduce the actual day-to-day sequence of weather events, which allows for a more comprehensive and exact assessment of the downscaling skill. For instance, the ENSEMBLES project provides a set of European Centre for Medium-Range Weather Forecasts 40 Year Reanalysis (ERA40)-driven RCMs. Recent work within this project has shown that 25 km RCMs driven by ERA40 boundary conditions give a good representation of rainfall extremes over the UK, with model biases of a similar order to the differences between the 25 km ENSEMBLES and 5 km Met Office gridded observational data sets (E. Buonomo et al., manuscript in preparation, 2010).

[28] There is evidence that RCM skill in simulating the spatial pattern and temporal characteristics of precipitation increases with increasing model resolution. Improved skill may result from the improved representation of complex topography and the resolution of fine-scale dynamical and physical processes and also through the sensitivity of physical parameterization to model grid size [Giorgi and Marinucci, 1996]. A recent study by Rauscher et al. [2010] compared the downscaling skill of RCMs at 25 and 50 km grid spacings over Europe. They found improved skill at higher resolution during summer, although not in winter in some regions. However, this apparent geographic dependence in the sensitivity to model resolution may, in part, reflect regional variations in observational station density.

[29] For a given RCM, downscaling skill has been shown to depend on the region, season, intensity, and duration of the precipitation event considered. In general, RCMs show better downscaling skill in winter than in summer and for moderate compared to very heavy precipitation. We will discuss these issues in detail in section 6, where we compare the skill of RCMs with statistical downscaling approaches.

[30] We note that in the context of climate change projections, the effects of model biases may be reduced. In particular, biases in RCM precipitation may, in part, cancel out on taking differences between the control and future scenarios. For example, Buonomo et al. [2007] find that two RCMs give similar precipitation changes, despite significant differences in model biases for the present day. However, recent work by Christensen et al. [2008] suggests that biases may not be invariant in a warming climate. In particular, models tend to show a greater warm bias in those regions that are hot and dry, while wet (dry) months tend to show a greater dry (wet) bias.

3.2. Limitations of RCMs

[31] RCMs only provide meaningful information on precipitation extremes on the scale of a few grid cells, with considerable noise on the grid cell scale [Fowler and Ekström, 2009]. Thus, for RCMs with a typical grid spacing of 25–50 km, this equates to providing information on scales of ∼100 km (although this also depends on other factors such as season and topography). Spatial pooling, whereby daily precipitation data from neighboring grid cells are concatenated to give one long time series, is effective at improving the signal to noise ratio and thus provides improved statistics of local heavy precipitation [Kendon et al., 2008]. We note, however, that this technique is only applicable where neighboring grid cells are effectively sampling from the same precipitation distribution and also that spatial dependence needs to be accounted for when assessing uncertainties. As RCMs with grid scales of less than 20 km become available [e.g., Dankers et al., 2007; Hollweg et al., 2008], the spatial scale on which meaningful information is provided will decrease. Nevertheless, a discrepancy will remain between the spatial scale of RCM precipitation, which should be interpreted as areal average values [Chen and Knutson, 2008], and site-specific data needed for many impacts studies.

[32] Linked to the spatial resolution of RCMs, there is also a minimum temporal scale on which RCMs can provide meaningful information. In particular, current RCMs show skill in capturing statistics of the daily precipitation distribution but do not well represent subdaily precipitation and the diurnal cycle of convection [Brockhaus et al., 2008; Lenderink and van Meijgaard, 2008]. As the spatial resolution of RCMs increases and, in particular, convection-resolving scales are achieved, models give an improved representation of the diurnal cycle [Hohenegger et al., 2008] and may provide meaningful information on hourly time scales. It should be noted, however, that a 30 year RCM integration just represents one possible 30 year realization of the climate and not the actual sequence of weather events. In particular, natural variability on daily to decadal time scales is a key source of uncertainty when estimating precipitation extremes.

[33] A key source of model deficiencies in the simulation of precipitation is the convective parameterization. In particular, many of the parameterization schemes used in RCMs may not be appropriate, having been developed for coarser-resolution GCMs and tropical regions [Hohenegger et al., 2008]. This is particularly likely to be an issue in summer, when rainfall is predominantly convective in nature, and on subdaily time scales, when the highest precipitation intensities are usually related to convective showers [Lenderink and van Meijgaard, 2008].

[34] Moreover, the simulation of precipitation in RCMs is also highly sensitive to other aspects of the model formulation, including the grid resolution, the numerical scheme, and other physical parameterizations [Fowler and Ekström, 2009]. A number of parameters in the model physics are not well constrained, and varying these parameters within reasonable bounds leads to differences in the simulated precipitation [Bachner et al., 2008; Murphy et al., 2009]. RCMs developed at different modeling centers around the world use different formulations, leading to differences in downscaling skill. There is some evidence that regions and seasons showing the greatest model biases in the simulation of precipitation are also those with the greatest intermodel differences [Frei et al., 2006; Fowler et al., 2007b]. Past experience has shown that no single RCM is best for all climate variables and statistics considered [Jacob et al., 2007; Christensen and Christensen, 2007], and it is not trivial to develop an objective scheme for weighting different RCMs. Indeed, it has been argued that when using multiple outputs from climate models, it is necessary to develop methodologies that exploit each model predominantly for those aspects where it performs competitively [Leith and Chandler, 2010].

4. METHODS TO BRIDGE THE GAP: STATISTICAL DOWNSCALING

[35] There are many statistical approaches to bridge the gap between GCM or RCM output and the local-scale weather required to assess impacts. In its simplest form, the idea of statistical downscaling comprises some kind of mapping between a large- (or larger-) scale predictor X and the expected value of a local-scale predictand Y,

$$\mathrm{E}[Y(t)] = f_{\beta}(X(t)) \qquad (1)$$

where β represents a vector of unknown parameters that must be estimated to calibrate the downscaling scheme. More advanced downscaling approaches may also explicitly model variability that is not explained by the dependence of Y upon X, as a random variable η.

[36] Wilby and Wigley [1997] classified statistical downscaling into regression methods, weather type approaches, and stochastic weather generators (WGs). As an alternative classification, Rummukainen [1997] suggested a categorization based on the nature of the chosen predictors, which distinguished between perfect prognosis (PP; also referred to as “perfect prog”) and MOS. To integrate these suggestions, we classify statistical downscaling approaches into PP, MOS, and WGs. This classification should only be seen as a means to sensibly structure sections 4.1–4.3.

[37] Classical statistical downscaling approaches, which include regression models and weather pattern–based approaches, establish a relationship between observed large-scale predictors and observed local-scale predictands (see Figure 2a). Applying these relationships to predictors from numerical models in a weather forecasting context is justified if the predictors are realistically simulated; thus, these methods are known as perfect prognosis downscaling [e.g., Klein et al., 1959; Kalnay, 2003; Wilks, 2006]. In the context of climate change projections, PP methods are based on the assumption that the simulated large-scale predictors represent a physically plausible realization of the future climate. Common to these downscaling approaches is that the weather sequences of the predictors and predictands can be related directly to each other, event by event.

Figure 2.

Statistical downscaling approaches. A two-headed arrow refers to a calibration step, and a regular arrow refers to a downscaling step. (a) Perfect prognosis (PP) is calibrated on large-scale and local-scale observations. For the projection, large-scale predictors are simulated by a GCM or RCM. (b and c) Model output statistics (MOS) calibrates model output against observations. (b) The whole model (GCM+RCM) is corrected; therefore, the same GCM and RCM have to be used in the projection. In this setting, the calibration is based on the distributions of model output and observations only. (c) Only the RCM output is corrected. In the projection, an arbitrary GCM can be used (this is a PP step). This setting allows for a calibration based on the whole time series of model output and observations. MOS can also be applied directly to GCMs, e.g., in a forecasting situation; here the GCM is forced to closely follow observational data for the calibration. Conditional weather generators can be used either in a PP setting (Figure 2a) by using large-scale predictors or in a MOS setting (Figures 2b and 2c) by using change factors.

[38] PP approaches establish statistical relationships between variables at large (synoptic) scales and local scales. Physical processes on intermediate scales are usually ignored. With the increasing skill of RCMs and the availability of RCM scenarios (see section 3), alternative statistical downscaling approaches that make use of simulated mesoscale weather are becoming popular. These approaches are known as MOS. The idea of MOS is to establish statistical relationships between variables simulated by the RCM and local-scale observations to correct RCM errors (see Figures 2b and 2c).

[39] WGs are statistical models that generate local-scale weather time series resembling the statistical properties of observed weather. In their most basic unconditional form, WGs are calibrated against observations on local scales only and are hence not downscaling approaches. Historically, the most common way of using such unconditional WGs in conjunction with climate change scenarios was to apply so-called change factors, derived from regional climate models [e.g., Kilsby et al., 2007]. This approach can be considered as simple MOS (see Figure 2b). Other WGs condition their parameters on large-scale weather [Wilks and Wilby, 1999]. Such weather generators are thus hybrids between unconditional weather generators and PP statistical downscaling (Figure 2a).

4.1. Perfect Prognosis Statistical Downscaling

[40] This section reviews statistical downscaling approaches that establish links between observed large-scale predictors and observed local-scale predictands. The large-scale observations are often replaced by surrogate observational data such as those obtained from reanalysis products. For a discussion of problems related to observational data, refer to section 5. Many state-of-the-art PP approaches are used in a weather generator context. These specific applications will then be discussed in section 4.3.

[41] In a PP framework, equation (1) defines a relationship between a large-scale predictor X and the expected value of a local-scale predictand Y for times t, with some noise η not explained by the predictor. This is often achieved by regression-related methods, in which case the predictors X are also called covariates. Since for every observed large-scale event, there is a corresponding observed local-scale event, the calibration can be done event-wise, i.e., relating the time series of predictors and predictands to each other in sequence rather than only relating the distribution of predictors and predictands to each other.

[42] The model shown by equation (1) can be used to generate local-scale time series by predicting Y(t) from observed or simulated predictors X(t) = (X_1(t), X_2(t), …). Simple PP approaches disregard the residual noise term η, whereas state-of-the-art PP approaches explicitly provide a noise model to represent variability and extremes. The former are often called deterministic, and the latter are often called stochastic.

[43] The construction of the actual downscaling scheme can be divided into two steps: first, the selection of informative large-scale predictors and, second, the development of a statistical model for the link between large-scale predictors and the local-scale predictand (i.e., the f_β(.) in equation (1)). Often, the first step also requires transformation of the raw predictors into a useful form. To avoid either overfitting or ignoring valuable information, model selection according to statistical criteria should be carried out.

4.1.1. Predictor Selection

[44] The selection of suitable predictors is crucial when developing a statistical downscaling model. The most basic requirement for a predictor is that it is informative; that is, it has a high predictive power. Informative predictors can be identified by statistical analyses, typically by correlating possible predictors with the predictands. Various predictors representing the atmospheric circulation, humidity, and temperature have been used to downscale precipitation. According to Charles et al. [1999], measures of relative humidity (e.g., dew point temperature depression) are more useful than measures for specific humidity. In general, the predictor choice depends on the region and season under consideration [Huth, 1996, 1999; Timbal et al., 2008a].

[45] In a climate change context, predictors that capture the effect of global warming [Wilby et al., 1998] are necessary. In particular, measures of humidity are necessary to capture changes in the water-holding capacity of the atmosphere under global warming [Wilby and Wigley, 1997], whereas temperature adds little predictive power to predict long-term changes in precipitation. Suitable predictors need to be reasonably well simulated by the driving dynamical models (PP assumption), and the relationship between predictors and predictands needs to be stationary, i.e., temporally stable.

[46] These requirements are summarized in the Statistical and Regional Dynamical Downscaling of Extremes for European Regions (STARDEX) project [Goodess et al., 2010]. A list of predictors used for precipitation downscaling is given by Wilby and Wigley [2000], along with a comparison of observed and simulated predictors and a stationarity assessment. A comparison of predictors for different regions is given by Cavazos and Hewitson [2005].

4.1.2. Predictor Transformation

[47] Raw predictors are generally high-dimensional fields of grid-based values. Moreover, the information at neighboring grid points is not independent. It is thus common to reduce the dimensionality of the predictor field and to decompose it into modes of variability.

[48] Principal component analysis (PCA) [Preisendorfer, 1988; Hannachi et al., 2007] is the most prominent method for dimensionality reduction. It provides a set of orthogonal basis vectors (empirical orthogonal functions) allowing for a low-dimensional representation of a large fraction of the variability of the original predictor field [e.g., Huth, 1999]. PCA, however, does not account for any information about the predictands, and the predictor/predictand correlation might thus not be optimal. Different in this respect are canonical correlation analysis and maximum covariance analysis. These methods simultaneously seek modes of both the predictor and the predictand field (e.g., a set of rain gauges), such that their temporal correlation or covariance is maximal [Bretherton et al., 1992; Huth, 1999; von Storch and Zwiers, 1999; Widmann, 2005; Tippett et al., 2008].
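As an illustration of this dimension reduction step, the following sketch (synthetic data; the grid size and number of retained modes are arbitrary illustrative choices) computes empirical orthogonal functions and principal component time series of a flattened predictor field via a singular value decomposition:

```python
# Minimal sketch: PCA of a gridded predictor field via SVD.
# Synthetic placeholder data; rows are days, columns are (flattened) grid points.
import numpy as np

rng = np.random.default_rng(0)
field = rng.standard_normal((3650, 500))     # e.g., 10 years of daily SLP values

anom = field - field.mean(axis=0)            # anomalies about the time mean
U, s, Vt = np.linalg.svd(anom, full_matrices=False)

n_modes = 5                                  # retain only the leading modes
eofs = Vt[:n_modes]                          # spatial patterns (EOFs)
pcs = anom @ eofs.T                          # PC time series -> low-dim predictors
explained = (s[:n_modes] ** 2) / (s ** 2).sum()
print("variance fraction per mode:", explained.round(3))
```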

[49] Physically motivated transformations of the raw predictor field can provide predictors that are easily interpretable and influence the predictands in a straightforward way. For instance, Wilby and Wigley [2000] have used airflow strength and direction instead of the zonal and meridional components of the wind field. In a similar manner, airflow indices (strength, direction, and vorticity), derived from sea level pressure [Jenkinson and Collison, 1977; Jones et al., 1993], have been used to downscale and model UK precipitation [Conway and Jones, 1998; Maraun et al., 2010b]. Also, the North Atlantic Oscillation index is a transformation of the North Atlantic pressure field.

[50] Weather types (circulation patterns/regimes) can be considered as another meteorologically motivated predictor transformation. The large-scale atmospheric circulation is mapped to a usually small and discrete set of categories [Michelangeli et al., 1995; Stephenson et al., 2004; Philipp et al., 2007]. Weather types are a straightforward way to allow for nonlinear relations between the raw predictors and predictands; the price paid is a potential loss of information due to the coarse discretization of the predictor field. Typical examples are patterns defined for geopotential heights [Vautard, 1990], sea level pressure [Plaut and Simonnet, 2001; Philipp et al., 2007], or wind fields [Moron et al., 2008a]. The number of types can range from small values (e.g., 4 in the case of North Atlantic circulation patterns [Vautard, 1990; Plaut and Simonnet, 2001]) to almost 30 (Großwetterlagen [Hess and Brezowsky, 1977]). A European cooperation in science and technology action has been initiated to compare different weather types (http://www.cost733.org).

[51] Weather types can be defined subjectively by visually classifying synoptic situations or objectively using clustering and classification algorithms. The latter can be based on ad hoc or heuristic methods such as k means [MacQueen, 1967; Plaut and Simonnet, 2001], hierarchical clustering [Ward, 1963; Casola and Wallace, 2007], fuzzy rules [Bárdossy et al., 2005], or self-organizing maps (SOMs) [Kohonen, 1998; Wehrens and Buydens, 2007; Leloup et al., 2008]. Also, a variant of PCA, the T mode PCA, can be used for weather typing [e.g., Jacobeit et al., 2003]. A relatively new and promising approach is model-based clustering, such as mixtures of Gaussian distributions to model the state space probability density function [Bock, 1996; Fraley and Raftery, 2002; Vrac et al., 2007a; Rust et al., 2010]. Many of these approaches have been compared with respect to circulation clustering by Huth [1996].
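As a sketch of such objective weather typing, the following example clusters synthetic daily circulation anomalies into a small set of types with k-means and computes conditional rainfall statistics at a hypothetical gauge (the number of types and all data are illustrative):

```python
# Minimal sketch: k-means weather typing of daily circulation anomalies and
# conditional rainfall statistics per type. All data are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
circ_anom = rng.standard_normal((3650, 500))       # daily large-scale anomalies
rain = rng.gamma(shape=0.7, scale=5.0, size=3650)  # co-located gauge rainfall

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(circ_anom)
weather_type = km.labels_                          # one categorical type per day

for k in range(km.n_clusters):
    sel = weather_type == k
    print(f"type {k}: {sel.sum():4d} days, mean rain {rain[sel].mean():.2f} mm/d")
```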

4.1.3. Statistical Models for PP

[52] In sections 4.1.3.1–4.1.3.6, we will describe a range of statistical models that are commonly used for PP statistical downscaling.

4.1.3.1. Linear Models

[53] One of the most widely used methods for statistical downscaling is linear regression. Here the relationship in equation (1) between the predictor X and the mean μ of the predictand Y, e.g., local-scale precipitation, is written as a linear model,

$$\mu = \beta_0 + \sum_{i} \beta_i X_i \qquad (2)$$

where β_i represents the strength of the influence of the predictor X_i. In general, the predictors X explain only part of the variability of the predictands Y; thus, early downscaling approaches, which modeled the predictands according to equation (2), generally underrepresented the local-scale variance. Karl et al. [1990] suggested “inflating” (i.e., scaling) the modeled variance to match the observed variance. As noted by von Storch [1999], however, inflation fails to acknowledge that local-scale variation is not completely explained by the predictors; instead, it is preferable to randomize the predictand, i.e., to add an explicit noise term η, as in the methods that follow. In a standard linear regression framework, the unexplained variability η is assumed to be Gaussian distributed. Thus, the predictand Y is itself Gaussian, with mean μ and some variance representing the unexplained variability.
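The contrast between inflation and randomization can be shown in a few lines; the sketch below (synthetic data, ordinary least squares) compares the variance of the deterministic prediction, an inflated prediction, and a randomized prediction:

```python
# Minimal sketch: variance inflation versus randomization for a linear
# downscaling model, on synthetic predictor/predictand data.
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)                  # large-scale predictor
y = 0.6 * x + 0.8 * rng.standard_normal(1000)  # local predictand

slope, intercept = np.polyfit(x, y, 1)
mu = intercept + slope * x                     # deterministic prediction

# Inflation: rescale the predicted anomalies to match the observed variance.
y_infl = mu.mean() + (mu - mu.mean()) * (y.std() / mu.std())

# Randomization: add explicit Gaussian noise with the residual variance.
y_rand = mu + rng.normal(0.0, (y - mu).std(), size=mu.size)

print("sd obs/det/inflated/randomized:",
      y.std().round(2), mu.std().round(2),
      y_infl.std().round(2), y_rand.std().round(2))
```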

4.1.3.2. Generalized Linear and Additive Models

[54] The Gaussian assumption might be feasible for precipitation accumulated to annual totals. However, on shorter time scales, precipitation intensities become more and more skewed, and daily precipitation is commonly modeled with a gamma distribution [e.g., Katz, 1977]. A framework that extends linear regression to handle such situations is the generalized linear model (GLM) [e.g., Dobson, 2001]. Here the predictand Y is no longer assumed to be Gaussian distributed but may follow a wide range of distributions, e.g., a gamma distribution. The conditional mean μ of the chosen distribution, i.e., the expected value of Y, is still modeled as a linear function of a set of predictors, but by contrast to a linear model, μ may be transformed by a link function g(.) to a scale where the influence of the predictors X on μ can be considered linear:

$$g(\mu) = \beta_0 + \sum_{i} \beta_i X_i \qquad (3)$$

Simulation of downscaled time series is achieved by drawing random numbers from the modeled distribution of Y, thus intrinsically representing the unexplained variability.
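A minimal sketch of such a stochastic gamma GLM in the spirit of equation (3), using statsmodels with a log link (all data, predictors, and coefficients are synthetic placeholders, not from any cited study):

```python
# Minimal sketch: gamma GLM with log link for wet-day precipitation intensity,
# then stochastic simulation from the fitted conditional distribution.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
X = sm.add_constant(rng.standard_normal((n, 2)))   # intercept + two predictors
mu_true = np.exp(X @ np.array([1.0, 0.4, -0.2]))   # true conditional mean
y = rng.gamma(shape=2.0, scale=mu_true / 2.0)      # gamma "intensities"

glm = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log()))
fit = glm.fit()
print(fit.params)                                  # estimated coefficients

# Downscaled series: draw from the fitted gamma distribution for each day.
mu_hat = fit.predict(X)
shape_hat = 1.0 / fit.scale     # dispersion ~ 1/shape for a gamma GLM
sim = rng.gamma(shape_hat, mu_hat / shape_hat)     # stochastic realization
```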

[55] In the context of precipitation downscaling, most applications of GLMs are effectively weather generators; see section 4.3. An extension of the GLM is the generalized additive model (GAM) [Hastie and Tibshirani, 1990], where the linear dependence is replaced by nonparametric smooth functions. The nonparametric framework generally requires more data for accurate estimation of relationships, however. GAMs have been employed, in a paleoclimate context, with large-scale data and geographical characteristics as predictors to downscale climatological monthly temperature and precipitation representative of the Last Glacial Maximum [Vrac et al., 2007b]. GAMs in the context of weather generators will be discussed in section 4.3.

4.1.3.3. Vector Generalized Linear Models

[56] GLMs are capable of describing the mean of a wide class of distributions conditional on a set of predictors. In some situations, especially when studying the behavior of extreme events, one is additionally interested in the dependence of the variance or the extreme tail on a set of predictors. For instance, Maraun et al. [2009] and Rust et al. [2009] have shown that the annual cycles of location and scale parameters of monthly maxima of daily precipitation in the UK are slightly out of phase and are better modeled independently. For this purpose, vector generalized linear models (VGLMs) have been developed [Yee and Wild, 1996; Yee and Stephenson, 2007]. Instead of the conditional mean of a distribution only, a vector of parameters p = (p_1, p_2, …) of a distribution is predicted:

$$g_j(p_j) = \beta_{j,0} + \sum_{i} \beta_{j,i} X_i, \qquad j = 1, 2, \ldots \qquad (4)$$

The vector p could, for instance, include the mean p_1 = μ and the standard deviation p_2 = σ of a distribution. In extreme value statistics, such models have long been used to make the extreme value parameters depend on covariates [Coles, 2001]. VGLMs have recently been applied to downscale precipitation occurrence in the United States [Vrac et al., 2007d], and a VGLM developed to model UK precipitation extremes [Maraun et al., 2010a] could easily be adapted for downscaling.
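Such models can be fitted by maximizing the likelihood directly. The sketch below (synthetic monthly maxima; the annual-cycle covariates and starting values are illustrative assumptions) lets the GEV location parameter vary with the annual cycle, in the spirit of the VGLM in equation (4):

```python
# Minimal sketch: GEV model for monthly maxima whose location parameter
# depends on annual-cycle covariates, fitted by maximum likelihood.
# Synthetic data; a full VGLM would also link scale and shape to covariates.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

rng = np.random.default_rng(4)
month = np.arange(600) % 12
cos_t = np.cos(2 * np.pi * month / 12)
sin_t = np.sin(2 * np.pi * month / 12)
loc_true = 20.0 + 5.0 * cos_t + 2.0 * sin_t
maxima = genextreme.rvs(c=-0.1, loc=loc_true, scale=5.0, size=600,
                        random_state=rng)

def nll(theta):
    a0, a1, a2, log_scale, shape = theta
    loc = a0 + a1 * cos_t + a2 * sin_t
    ll = genextreme.logpdf(maxima, shape, loc=loc, scale=np.exp(log_scale))
    return -ll.sum() if np.all(np.isfinite(ll)) else 1e10  # guard invalid region

res = minimize(nll, x0=[20.0, 0.0, 0.0, np.log(5.0), -0.1],
               method="Nelder-Mead")
print(res.x)   # fitted (a0, a1, a2, log scale, shape)
```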

4.1.3.4. Weather Type–Based Downscaling

[57] The popular approach of conditioning local-scale precipitation on weather types can be thought of as a special case of a linear model. Instead of a continuous predictor field, a set of categorical weather types X_k is used to predict the mean of local precipitation:

$$\mu = \mu(X_k) \qquad (5)$$

where k gives the index of the actual weather type and μ(X_k) is the mean rainfall in this weather type. As in the case of standard linear regression, weather type approaches can, in principle, be extended to model an additional noise term η_k, as generalized linear models and vector generalized linear models do. Weather types are mostly applied to condition weather generators; for examples, see section 4.3.

4.1.3.5. Nonlinear Regression

[58] There are also models available that aim to capture nonlinear and nonadditive relationships between the predictors and predictands. For instance, Biau et al. [1999] used a nonlinear regression to model the link between North Atlantic sea level pressure in winter and precipitation across the Iberian peninsula. Another nonlinear regression technique that has been applied in statistical downscaling is the artificial neural network (ANN). ANNs have, for instance, been used to downscale precipitation over South Africa [Hewitson and Crane, 1996], Japan [Olsson et al., 2001], and the UK [Haylock et al., 2006].

4.1.3.6. Analog Method

[59] The analog method was originally developed for short-term weather forecasting [Lorenz, 1969]. In statistical downscaling, the large-scale weather situation at time t is compared with the observational record. According to a selected metric (e.g., Euclidean distance), the most similar large-scale weather situation in the past, at time t′, is identified, and the corresponding local-scale observations Y(t′) are selected as prediction for the desired local-scale weather [Zorita and von Storch, 1999]:

$$\hat{Y}(t) = Y(t'), \qquad t' = \arg\min_{s} \, \lVert X(t) - X(s) \rVert \qquad (6)$$

where s runs over all times in the observational record.

Lall and Sharma [1996] proposed not to select the most similar historic situation but to randomly choose among the k most similar ones. Potential limitations of the resampling scheme have been extensively discussed in the literature [e.g., Young, 1994; Yates et al., 2003; Beersma and Buishand, 2003]. In particular, the standard analog method cannot produce precipitation amounts that have not been observed in the past; Young [1994] therefore proposed perturbing the observed values to overcome this problem. It has also been pointed out that daily standard deviations of variables are underestimated because of the so-called “selection effect,” a systematic underselection of certain days.
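A minimal sketch of the analog method, including the k-nearest variant of Lall and Sharma [1996], on synthetic data with Euclidean distance as the similarity metric:

```python
# Minimal sketch: (k-nearest) analog downscaling. All arrays are synthetic
# placeholders for historical predictor fields, local observations, and new
# (e.g., GCM-simulated) predictor fields.
import numpy as np

rng = np.random.default_rng(5)
X_hist = rng.standard_normal((3000, 50))   # historical large-scale states
y_hist = rng.gamma(0.7, 5.0, size=3000)    # corresponding local rainfall
X_new = rng.standard_normal((10, 50))      # states to be downscaled

k = 5                                      # sample among the k closest analogs
for x in X_new:
    dist = np.linalg.norm(X_hist - x, axis=1)  # Euclidean distance to history
    analog = rng.choice(np.argsort(dist)[:k])  # random pick among k nearest
    print(f"analog day {analog:4d} -> rain {y_hist[analog]:.2f} mm/d")
```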

4.1.4. Model Selection

[60] In general, a range of physically plausible models exists for a given model structure (e.g., linear regression and GLM). For example, multiple variables exist that can be employed as predictors, but in many cases it is a priori not clear which of these are informative and which predictor transformation best conveys the information for the prediction. Taking too many predictors into account would lead to overfitting and would decrease the predictive power. Considering too few predictors would ignore valuable information. To objectively select a model, various statistical criteria have been developed. They are based on the likelihood of the model and assess whether an improvement in likelihood justifies an increased model complexity. Examples are likelihood ratio statistics and information criteria, such as the Bayesian and Akaike information criteria [see, e.g., Davison, 2003]. Once an appropriate model has been selected, a model validation (section 5) assesses the skill of this model to predict certain desired properties of the process under consideration.
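As an illustration, the following sketch compares nested predictor sets for a gamma GLM by the Akaike information criterion (synthetic data; by construction only the first candidate predictor is informative):

```python
# Minimal sketch: predictor selection by AIC among nested gamma GLMs.
# Synthetic data; the smallest AIC indicates the preferred model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 2000
Z = rng.standard_normal((n, 3))                        # candidate predictors
y = rng.gamma(2.0, np.exp(0.5 + 0.4 * Z[:, 0]) / 2.0)  # only Z[:, 0] matters

for cols in ([0], [0, 1], [0, 1, 2]):
    X = sm.add_constant(Z[:, cols])
    fit = sm.GLM(y, X,
                 family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    print(f"predictors {cols}: AIC = {fit.aic:.1f}")
```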

4.2. Model Output Statistics

[61] As precipitation simulated in RCMs and GCMs is partly unrealistic (section 3, see also Figure 1) and represents areal means at the model resolution rather than local values, it cannot be directly used in many impact studies. The potential deviations from real precipitation make it unsuitable as a predictor in a PP context because it does not satisfy the crucial “perfect prognosis” assumption. However, despite potential errors, simulated precipitation may contain valuable information about the real precipitation.

[62] Statistical models that link simulated precipitation to local-scale, real precipitation have been developed recently for applications to RCMs, and there are also some feasibility studies for GCMs. Such methods are a form of so-called MOS models, which have been applied in numerical weather forecasting for a long time [e.g., Glahn and Lowry, 1972; Klein and Glahn, 1974; Carter et al., 1989; Kalnay, 2003; Wilks, 2006]. In contrast to PP methods, the statistical relationship between predictors and predictands is calibrated not using observed predictors and predictands but using simulated predictors and observed predictands. In principle, predictors and predictands can be on the same spatial scale, in which case MOS would constitute a mere correction for a numerical model, but in most applications the predictand is local-scale precipitation, which means that MOS combines a correction and a downscaling step. The MOS corrections are specific to the numerical model for which they have been developed and cannot be used with other numerical models.

[63] Depending on the type of simulations used for MOS calibration the predictors can either be simulated precipitation time series or properties of the simulated intensity distribution (see Figure 2). Similarly, predictands can either be local precipitation series or properties of the local-scale intensity distribution. MOS can be used to transform deterministic predictors into probabilistic predictands (which is also possible with PP, see section 4.1.3). More general versions of MOS that link simulated and observed variables of different types are also conceivable [e.g., Themeßl et al., 2010]; for such versions, the model structure should carefully be selected according to statistical criteria (see section 4.1.4). However, most examples in climate applications employ simulated precipitation to predict precipitation.

[64] If the MOS calibration is based on an RCM driven by a standard GCM simulation for the recent climate, in which the link to the real climate is established only via the external forcings (such as insolation and concentrations of greenhouse gases and aerosols), the observed and simulated day-to-day weather sequences are not related, and thus, MOS can only be used to link distributions of simulated and observed precipitation. The same is true when using standard GCM-simulated precipitation as predictors. In such a setting there is a risk that differences in simulated and observed distributions, for instance, biases, are falsely attributed to model errors and thus falsely modified by the MOS approach, when they are actually caused by the random differences in the simulated and observed distribution of large-scale weather states on long time scales.

[65] If, however, the RCM is driven by an atmospheric reanalysis [Kalnay et al., 1996; Kistler et al., 2001; Uppala et al., 2005] or GCMs forced toward observations are used, there is a direct correspondence between simulated and observed weather states, and thus, simulated and observed precipitation time series can be directly related, for instance, through regression techniques as discussed in section 4.1.3. Regional or global short-range weather forecast simulations also fall into this category, as the synoptic-scale meteorological situation is usually well predicted, and thus, simulated and observed precipitation for individual days can be statistically linked. This setting does not apply to standard GCM simulations. This explains why MOS was first developed in weather forecasting, has recently seen increasing popularity applied to RCMs, and is only in the development phase for GCMs.

4.2.1. Methods for MOS

[66] Most of the examples of MOS applied to RCMs are based on reanalysis-driven RCMs. The simplest method assumes that the scenario precipitation $y_{i+T}^{f}$ at a time $i + T$ in the future can be represented by (observed) precipitation (or corrected RCM simulations [see Lenderink et al., 2007]) $x_{\mathrm{obs},i}^{p}$ at time $i$ in the observational record, corrected by the ratio of the mean simulated future precipitation $\bar{x}_{\mathrm{mod}}^{f}$ and the mean control run (or reanalysis-driven run) precipitation $\bar{x}_{\mathrm{mod}}^{p}$:

$$y_{i+T}^{f} = x_{\mathrm{obs},i}^{p}\,\frac{\bar{x}_{\mathrm{mod}}^{f}}{\bar{x}_{\mathrm{mod}}^{p}} \qquad (7)$$

This method is sometimes misleadingly called the delta method because it was developed for temperature, where the change is additive rather than multiplicative. A mathematically similar but conceptually different approach is the so-called scaling method [e.g., Widmann and Bretherton, 2000; Widmann et al., 2003]. Here the corrected scenario precipitation $y_{i}^{f}$ at a time $i$ in the future is represented by the (modeled) future scenario $x_{\mathrm{mod},i}^{f}$ at time $i$, scaled with the ratio of the mean observed precipitation $\bar{x}_{\mathrm{obs}}^{p}$ and the mean control run (or reanalysis-driven run) precipitation $\bar{x}_{\mathrm{mod}}^{p}$:

$$y_{i}^{f} = x_{\mathrm{mod},i}^{f}\,\frac{\bar{x}_{\mathrm{obs}}^{p}}{\bar{x}_{\mathrm{mod}}^{p}} \qquad (8)$$

This method is sometimes called the direct approach [e.g., Lenderink et al., 2007] and has been applied to GCMs [Widmann and Bretherton, 2000; Widmann et al., 2003] and RCMs [e.g., Leander and Buishand, 2007; Graham et al., 2007b; Engen-Skaugen, 2007]. Schmidli et al. [2006] further extended the approach by using a separate correction for precipitation occurrence and precipitation intensity. The aforementioned methods scale the mean and the standard deviation by the same factor, such that the coefficient of variation (the ratio of the two) is unchanged.
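Both corrections are simple rescalings; the sketch below applies equations (7) and (8) to synthetic daily series (all inputs are placeholders):

```python
# Minimal sketch: the change-factor ("delta") method, equation (7), and the
# scaling (direct) method, equation (8), on synthetic daily precipitation.
import numpy as np

rng = np.random.default_rng(7)
x_obs_p = rng.gamma(0.7, 5.0, size=3650)   # observed, present climate
x_mod_p = rng.gamma(0.7, 4.0, size=3650)   # model, control/present run
x_mod_f = rng.gamma(0.7, 5.5, size=3650)   # model, future scenario run

# Equation (7): perturb the observed series by the simulated mean change.
y_delta = x_obs_p * (x_mod_f.mean() / x_mod_p.mean())

# Equation (8): rescale the simulated future series by the present-day bias.
y_scaled = x_mod_f * (x_obs_p.mean() / x_mod_p.mean())

print("delta-method mean:", y_delta.mean().round(2),
      " scaling-method mean:", y_scaled.mean().round(2))
```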

[67] A generalized approach is quantile mapping, which considers different intensities individually [e.g., Panofsky and Brier, 1968; Hay and Clark, 2003; Dettinger et al., 2004; Wood et al., 1997; Ines and Hansen, 2006; Déqué, 2007; Piani et al., 2009]. For the calibration period, the cumulative distribution function of simulated precipitation is adjusted to match the cumulative distribution function of observed precipitation. The mapping is usually done between empirical quantiles or quantiles of gamma distributions fitted to the observed and modeled precipitation. For modeling values beyond the observed range, Boé et al. [2007] extrapolated the correction function with a constant correction, using the correction of the highest quantile from the control simulation. In general, however, this assumption is not valid for the extreme tail of the precipitation distribution. A possible solution could be to adapt the mixture model of Vrac and Naveau [2007] (first developed by Frigessi et al. [2002] for temperature data), which shifts between a gamma distribution for the core and an extreme value distribution for the tail.
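
The following sketch illustrates empirical quantile mapping with the constant extrapolation of Boé et al. [2007]; the discretization into 100 quantiles and the additive form of the constant correction are our own illustrative assumptions:

```python
import numpy as np

def quantile_map(x_obs_p, x_mod_p, x_new, n_q=100):
    """Empirical quantile mapping: a transfer function between modeled and
    observed calibration-period quantiles, applied to new model values.
    Beyond the calibrated range the correction of the highest quantile is
    held constant (additively here), following Boé et al. [2007]."""
    q = np.linspace(0.0, 1.0, n_q + 1)
    obs_q = np.quantile(x_obs_p, q)        # observed quantiles (wet days assumed)
    mod_q = np.quantile(x_mod_p, q)        # modeled quantiles
    y = np.interp(x_new, mod_q, obs_q)     # piecewise linear transfer function
    high = x_new > mod_q[-1]               # values beyond the calibrated range
    y[high] = x_new[high] + (obs_q[-1] - mod_q[-1])
    return y
```

The sketch assumes arrays of wet-day intensities; for daily series including dry days, the many tied zero values would require precipitation occurrence to be corrected separately before mapping the intensities.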

[68] All of these methods can account for the annual cycle, for example, by applying them to individual months or seasons separately. As they calibrate only distributions but disregard any pairwise relationships between predictor and predictand, we refer to these methods as distribution-wise.

4.2.2. MOS for GCMs

[69] Most publications using MOS in a climate change context are related to correcting RCM output, while MOS for GCM-simulated precipitation is still in the development stage. MOS for GCMs might be very useful in areas where no RCM output is available. The potential usefulness of MOS corrections for GCMs was demonstrated by Widmann et al. [2003], who used the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP-NCAR) reanalysis [Kalnay et al., 1996] as an example of a GCM in which the synoptic-scale circulation agrees with reality because of the assimilation of meteorological data such as pressure, wind speed, and temperature but in which precipitation is still calculated according to the model physics.

[70] The corrections for the NCEP-NCAR reanalysis model cannot be transferred to other GCMs, and thus, the development of MOS corrections for GCMs used for climate change experiments is difficult. The GCM simulations for the 20th and 21st centuries do not represent the real temporal evolution of large-scale weather states in the past. As a consequence, only distribution-wise MOS is possible, but it is difficult to assess whether the simulated precipitation is actually a skillful predictor. For this reason, MOS has so far been applied to nonreanalysis GCMs only in the context of seasonal prediction [Landmann and Goddard, 2002; Feddersen and Andersen, 2005; Shongwe et al., 2006], where the simulated and true atmospheric circulation partly match, and in the simple form of climatology-based local debiasing of precipitation over the Alps for climate change simulations [Schmidli et al., 2007].

[71] In order to provide the foundation for comprehensive MOS for future precipitation, J. Eden et al. (Reassessing the skill of GCM-simulated precipitation, submitted to Journal of Climate, 2010) nudged the European Center/Hamburg (ECHAM5) GCM toward the circulation and temperature in the ERA40 reanalysis and showed that the correlation of simulated and observed monthly mean precipitation over large parts of the Earth is larger than 0.8. This suggests that MOS corrections would provide precipitation estimates with small errors.

4.3. Weather Generators

[72] Weather generators, such as WGEN [Richardson, 1981; Richardson and Wright, 1984] and EARWIG [Kilsby et al., 2007], are statistical models that generate random sequences of (usually several) weather variables, with statistical properties resembling those of observed weather. At the core of most weather generators is a precipitation generator, with any remaining variables usually simulated conditional on the generated precipitation.

[73] The general motivations for using weather generators are their capacity to provide synthetic series of unlimited length [Hulme et al., 2002], the possibility of infilling missing values by imputation (i.e., sampling missing observations from their conditional distribution given the available observations [see Yang et al., 2005]), and their computational efficiency [Semenov et al., 1998], which allows for multimodel probabilistic projections or other impact assessments [Jones et al., 2009]. Early weather generators (e.g., WGEN) were originally developed to provide surrogate climate time series to agricultural and hydrological models in cases where weather observations are too short or have quality deficiencies.

[74] In previous studies [e.g., Fowler et al., 2007a; Wilks and Wilby, 1999; Semenov et al., 1998], weather generators are distinguished on the basis of the implemented parameterization, the assumed distributions, and their suitability for particular applications. Here, however, because of the importance of a proper representation of spatial rainfall [Segond et al., 2007] and the limited spatial consistency of many weather generators [e.g., Jones et al., 2009], we distinguish two groups of precipitation generators: single-station generators and multistation generators. In addition, weather generators have been developed that attempt to model a full precipitation field in continuous space. However, these methods have only recently been extended into a downscaling context, and we therefore present them as a brief outlook.

[75] Pure PP and MOS approaches do not explicitly model either temporal or spatial correlations; any structure is imposed by correlations present in the predictors. Weather generators explicitly aim to generate time series or spatial fields with the observed temporal or spatial structure.

4.3.1. Single-Station Generators

4.3.1.1. Unconditional Weather Generators

[76] Unconditional weather generators are calibrated to local observations only; that is, they do not directly use large-scale conditions from RCMs or GCMs. As discussed in section 4.1.3, at finer (e.g., daily) time scales, the distribution of precipitation tends to be strongly skewed toward low values, with a generally high number of zero values representing dry intervals. Moreover, precipitation sequences usually exhibit temporal dependence, particularly in the sequence of wet and dry intervals. Early weather generators treated single-site precipitation as a two-component process, describing precipitation occurrence and precipitation intensity separately. In the simplest case, introduced by Gabriel and Neumann [1962], wet day occurrence is modeled as a two-state first-order Markov process. This structure implies that the occurrence or nonoccurrence of precipitation is conditioned only on the occurrence of precipitation on the previous day. Letting $I(t)$ denote the binary occurrence event (wet or dry) on day $t$, the transition probabilities $p_{ij}(t)$ are defined as

$$p_{ij}(t) = \Pr\{ I(t) = j \mid I(t-1) = i \}, \qquad i, j \in \{0\ (\mathrm{dry}),\ 1\ (\mathrm{wet})\} \qquad (9)$$

The first-order Markov chain has been widely used as a simple model for rainfall occurrence [Katz, 1977; Wilks, 1998; Wilks and Wilby, 1999]. However, first-order models usually underrepresent long dry spells, and this has led to the use of more complex higher-order models [Mason, 2004; Stern and Coe, 1984].

[77] To model the skewed distribution of rainfall intensities, the two-parameter gamma distribution is often used [Katz, 1977; Vrac et al., 2007d], although this is not the only choice; for example, Wilks [1998] uses a mixture of two exponential distributions. In the simplest daily weather generators, nonzero intensities are sampled independently for each wet day. To incorporate seasonality in these weather generators, parameters are typically estimated separately for each month or season.
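
As a concrete illustration, a minimal Richardson-type single-station generator combining the first-order Markov occurrence process of equation (9) with independent gamma intensities might look as follows; all parameter values are hypothetical and would in practice be estimated separately for each month or season:

```python
import numpy as np

def simulate_precip(p01, p11, shape, scale, n_days, seed=0):
    """Richardson-type single-station generator: two-state first-order
    Markov occurrence (p01 = P(wet | dry), p11 = P(wet | wet), equation (9))
    with independent gamma intensities on wet days."""
    rng = np.random.default_rng(seed)
    wet = np.zeros(n_days, dtype=bool)
    for t in range(1, n_days):
        wet[t] = rng.random() < (p11 if wet[t - 1] else p01)
    return np.where(wet, rng.gamma(shape, scale, size=n_days), 0.0)

# hypothetical parameters for one season
series = simulate_precip(p01=0.25, p11=0.6, shape=0.8, scale=5.0, n_days=365)
```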

[78] As an alternative to the separate modeling of precipitation occurrence and intensity, some authors have proposed modeling the two components together. The most common way of achieving this is to use a power-transformed and truncated normal distribution [e.g., Bárdossy and Plate, 1992]. For example, if $Y_t$ is the rainfall at time $t$, then a common family of transformations is

$$Y_t = \begin{cases} Z_t^{\beta}, & Z_t > 0 \\ 0, & Z_t \le 0 \end{cases} \qquad (10)$$

where $Z_t$ is a Gaussian random variable and $\beta$ is a transformation parameter. Glasbey and Nevison [1997] and Allcroft and Glasbey [2003] use a more complex transformation in an attempt to reproduce the rainfall distribution more closely, but the power transformation in equation (10) is by far the most widely used. More recently, innovative distributions such as those in the Tweedie family have been suggested as an alternative to transformed Gaussian variables [Dunn, 2004].
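
Sampling from the model of equation (10) takes only a few lines; the parameter values in this minimal sketch are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, mu, sigma = 1.5, -0.5, 1.0       # hypothetical transformation parameters
z = rng.normal(mu, sigma, size=1000)   # latent Gaussian variable Z_t
y = np.zeros_like(z)                   # dry intervals where Z_t <= 0
wet = z > 0
y[wet] = z[wet] ** beta                # wet amounts via equation (10)
```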

[79] The weather generators reviewed above take as their starting point a distribution of precipitation in each time interval. An alternative starting point is to consider explicitly the temporal structure of precipitation within a time interval: this forms the basis of cascade models, which are used for subdaily downscaling because they are able to model temporally correlated rain [Olsson, 1998; Marani and Zanetti, 2007]. As with other weather generators, the simplest way to incorporate seasonality is to calibrate the models separately for each month or season [e.g., Furrer and Naveau, 2007].

[80] A final class of precipitation generators is based on Poisson cluster processes [e.g., Rodriguez-Iturbe et al., 1987, 1988; Cowpertwait, 1991]. This class again attempts to characterize the temporal structure of precipitation sequences but now by explicitly considering the mechanisms of precipitation generation in a simplified stochastic framework: a precipitation time series is considered as a sequence of “storms” (rain events), each consisting of a collection of “rain cells” with random intensity and duration. The models are parameterized using physically interpretable quantities such as storm arrival rate, mean cell intensity, and mean number of cells per storm and have been found to provide useful simulations of precipitation sequences at time scales down to hourly. For reviews of these models, see Onof et al. [2000] and Wheater et al. [2005].
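
The following is a deliberately stripped-down sketch of a Neyman-Scott-type rectangular pulse simulation on an hourly grid; the parameter values are hypothetical, cell edges are rounded to the grid, and operational implementations (see the cited reviews) are considerably more careful:

```python
import numpy as np

def neyman_scott(lam, mu_c, beta, eta, xi, t_end, dt=1.0, seed=0):
    """Stripped-down Neyman-Scott rectangular pulses on a regular grid.
    lam: storm arrival rate (1/h); mu_c: mean cells per storm;
    beta: cell displacement rate (1/h); eta: cell duration rate (1/h);
    xi: mean cell intensity (mm/h). Cell edges are rounded to the grid."""
    rng = np.random.default_rng(seed)
    n_steps = int(t_end / dt)
    rain = np.zeros(n_steps)
    origins = rng.uniform(0.0, t_end, size=rng.poisson(lam * t_end))
    for t0 in origins:                                # storms: Poisson process
        for _ in range(rng.poisson(mu_c)):            # random cells per storm
            start = t0 + rng.exponential(1.0 / beta)  # cell start after origin
            dur = rng.exponential(1.0 / eta)          # cell duration
            inten = rng.exponential(xi)               # cell intensity
            i0 = int(start / dt)
            i1 = min(int((start + dur) / dt) + 1, n_steps)
            if i0 < n_steps:
                rain[i0:i1] += inten * dt             # accumulate depth per step
    return rain

hourly = neyman_scott(lam=0.02, mu_c=5.0, beta=0.5, eta=1.0, xi=2.0,
                      t_end=24.0 * 365)               # one year of hourly rain
```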

4.3.1.2. Weather Generators and Downscaling

[81] A simple way to use unconditional weather generators for climate change scenarios is to perturb their parameters by so-called change factors [e.g., Kilsby et al., 2007]: in a pair of RCM simulations, one of the present-day and one of the future climate, the change in the weather generator parameters (e.g., mean temperature or precipitation) from present to future is calculated for the grid box containing the location of the weather station of interest. These change factors (usually differences for temperature and ratios for precipitation) are then used to modify the observed parameters for a future climate. Once the change factors are calculated, no large-scale drivers are needed to generate weather time series. A prominent example of change factor conditioned weather generators is the set of regional scenarios from the UK climate projections (UKCP09) [Jones et al., 2009]. Deriving change factors for the statistical properties between the RCM control and scenario runs and applying them to the statistical properties of the weather generator is mathematically equivalent to deriving a correction factor between the statistical properties of the RCM control run and the weather generator and then applying this correction factor to the statistical properties in the RCM scenario run (as illustrated in the sketch below). In that sense, change factor conditioned weather generators can be seen as a simple MOS (section 4.2).
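
The stated equivalence can be seen in a few lines; all numerical values below are arbitrary illustrations:

```python
import numpy as np

# illustrative mean precipitation values (mm/d); all numbers are assumed
rcm_control, rcm_future = 3.2, 3.6   # RCM grid box mean, control and scenario
wg_observed = 2.9                    # weather generator parameter from station data

change_factor = rcm_future / rcm_control   # perturb the generator parameter ...
correction = wg_observed / rcm_control     # ... or correct the RCM output (MOS view)
assert np.isclose(wg_observed * change_factor, rcm_future * correction)
```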

[82] However, such weather generators often underestimate the interannual variability (overdispersion) and the frequency of extremes [e.g., Katz and Parlange, 1998] because the climatic processes influencing local weather exhibit longer-term variability, which is not captured by stationary low-order Markov models. A possible solution to the overdispersion problem is to condition specific parameters on covariates [Katz and Parlange, 1993; Wilks, 1989] controlling the low-frequency variability of the local weather, e.g., the large-scale atmospheric circulation. Such weather generators can be considered as PP (section 4.1). Besides large-scale weather predictors, transformations of lagged rainfall values, representations of seasonality, and topographic controls may also be used as covariates. Interaction terms can also be used in situations where one covariate modulates the effect of another [e.g., Chandler, 2005].

[83] One way to incorporate covariates into stochastic weather generators is based on GLMs (section 4.1.3). GLMs for rainfall usually use logistic regression to model the changing probability of rainfall occurrence and then consider nonzero rainfall intensities to be drawn from gamma distributions with means that are related (usually via a log link function) to linear combinations of covariates. In their simplest form, such GLMs can be regarded as extensions of the Markov chain [see, e.g., Coe and Stern, 1982; Grunwald and Jones, 2000]. GLMs are being used increasingly for the analysis and downscaling of precipitation sequences [e.g., Fealy and Sweeney, 2007; Furrer and Katz, 2007], as are GAMs [e.g., Hyndman and Grunwald, 2000; Beckmann and Buishand, 2002; Underwood, 2009]. For parameter estimation of these models, software routines are freely available, for example, in the stats package of the R software environment [R Development Core Team, 2008].
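
As an illustration of this occurrence/intensity GLM structure, the sketch below uses Python's scikit-learn (our own choice for illustration; the cited studies typically use R) to fit a logistic occurrence model and a log-link gamma intensity model to synthetic data with known coefficients:

```python
import numpy as np
from sklearn.linear_model import GammaRegressor, LogisticRegression

rng = np.random.default_rng(2)
n = 2000
day = np.arange(n)
X = np.column_stack([rng.normal(size=n),                 # e.g., airflow strength
                     np.sin(2 * np.pi * day / 365.25)])  # seasonal covariate

# synthetic "observations" generated from known coefficients (illustration only)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * X[:, 0] + 0.5 * X[:, 1])))
wet = rng.random(n) < p_true
amount = rng.gamma(0.8, np.exp(1.0 + 0.3 * X[:, 0]) / 0.8)  # gamma, shape 0.8

occ = LogisticRegression().fit(X, wet)                      # occurrence: logistic
inten = GammaRegressor(alpha=0.0).fit(X[wet], amount[wet])  # intensity: gamma, log link

# stochastic simulation of a precipitation series conditional on the covariates
p_wet = occ.predict_proba(X)[:, 1]
mu = inten.predict(X)
sim = np.where(rng.random(n) < p_wet, rng.gamma(0.8, mu / 0.8), 0.0)
```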

[84] Another way of incorporating large-scale information is via weather typing (see section 4.1.3). For example, Hewitson and Crane [2002] used SOMs to define a collection of weather states on the basis of January sea level pressure spatial fields for the northeast United States and, for each state, determined the mean and variance of daily rainfall for a gage in the center of the region. As another example of this kind of approach, Fowler et al. [2000] present a Poisson cluster model (see section 4.3.1.1) in which the parameters for each day are conditional on the particular weather state observed on that day.

4.3.2. Multistation Generators

[85] Multisite generation is challenging, essentially because of the need to model the joint (i.e., multivariate) distribution of precipitation simultaneously at all sites. Relatively few tractable models are available for multivariate distributions; hence, many approaches to multisite precipitation generation are based, at some level, on transformations of the multivariate Gaussian distribution. The use of transformed and truncated Gaussian distributions to model single-site precipitation has been discussed in section 4.3.1; the extension to the multisite setting is accomplished by specifying an intersite correlation structure for the Gaussian variables at each location. Generation of a multisite rainfall sequence therefore proceeds at each time instant by sampling a correlated vector of Gaussian variables (there is a standard algorithm for this [see, e.g., Monahan, 2001, section 11.3]) and back-transforming according to equation (10). The multisite generator of Wilks [1998] operates on a similar principle except that here the transformation to Gaussianity is determined by an assumption that the nonzero rainfall amounts at each site follow mixed exponential distributions.
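
A minimal sketch of this construction is given below; the intersite correlation matrix and transformation parameters are hypothetical, and temporal dependence, which a full generator would include, is omitted for brevity:

```python
import numpy as np

def multisite_precip(n_days, corr, mu, sigma, beta, seed=0):
    """Correlated latent Gaussians at all sites, back-transformed site by
    site with the power transformation of equation (10)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)                         # standard sampling algorithm
    z = rng.normal(size=(n_days, corr.shape[0])) @ L.T   # correlated N(0,1) variables
    z = mu + sigma * z                                   # site mean and spread
    y = np.zeros_like(z)
    wet = z > 0                                          # dry where latent Z <= 0
    y[wet] = z[wet] ** beta
    return y

corr = np.array([[1.0, 0.7, 0.5],
                 [0.7, 1.0, 0.6],
                 [0.5, 0.6, 1.0]])   # illustrative intersite correlations
field = multisite_precip(365, corr, mu=-0.3, sigma=1.0, beta=1.5)
```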

[86] In a downscaling context, dependence on predictors can be incorporated into such models as discussed in section 4.1, either in a regression-like framework as by Sansó and Guenni [2000] or in conjunction with a weather typing scheme whereby different sets of model parameters are used at each time instant, depending on the underlying sequence of weather states [Bárdossy and Plate, 1992; Stehlík and Bárdossy, 2002; Ailliot et al., 2009; Moron et al., 2008b]. In early applications of this type of methodology, weather types were typically defined solely in terms of the predictor variables.

[87] However, the more recent literature tends to focus on model variants employing so-called weather states: here precipitation patterns themselves are allowed to influence the weather state definitions, so that the resulting weather classifications can be interpreted as corresponding to distinct rainfall regimes. This includes a growing body of work based on nonhomogeneous and hidden Markov models, in which the link between weather states and predictors is probabilistic rather than deterministic [e.g., Hughes et al., 1999; Bellone et al., 2000; Charles et al., 2004; Vrac and Naveau, 2007; Vrac et al., 2007d]. For a schematic of such a weather generator, see Figure 3. In early versions of this type of model, the underlying weather states were considered to be entirely responsible for intersite dependence so that precipitation can be sampled independently at each site given the weather state. However, this may be inadequate at smaller spatial scales in particular, and this has led to the development of more complex models [e.g., Ailliot et al., 2009; Vrac et al., 2007c]. Bayesian hierarchical models also open a promising way forward here [e.g., Cooley et al., 2007]. For all of the approaches outlined above, model calibration can be a challenging task that is nowadays accomplished most easily using computationally intensive Bayesian methods (available software packages are WinBUGS [Lunn et al., 2000], OpenBUGS [Thomas et al., 2006] (software available at http://mathstat.helsinki.fi/openbugs), and BayesX (C. Belitz et al., BayesX—Software for Bayesian inference in structured additive regression models, version 2.0.0, available at http://www.stat.uni-muenchen.de/bayesx)). The appropriate use of such methods can require considerable technical expertise, however. Thus, there is arguably a market for simpler methods that are suitable for routine implementation.

Figure 3.

State-of-the-art weather generator using weather states [after Vrac and Naveau, 2007]. Weather time series are generated as follows: at each time step, the weather jumps into a specific weather state (red dots, spatial rain pattern); the transition probability from state to state is given by the state at the previous time step (red arrows, hidden Markov model) and depends on the large-scale atmospheric circulation at the particular time step (magenta arrows; this makes the Markov model nonhomogeneous). Furthermore, the atmospheric circulation determines the probability of having a dry or a wet day (blue arrows, logistic regression). If a wet day is generated, the actual amount of rain is generated from a distribution dependent on the weather state.

[88] One such method uses GLMs to model precipitation sequences at individual sites (see sections 4.1.3 and 4.3.1), in conjunction with appropriately defined spatial dependence structures that enable the simulation of multisite sequences with realistic joint distributional properties. The potential for dependence between sites raises statistical issues when fitting models, however; for a review of these and straightforward ways of overcoming them, see Chandler [2005] and Chandler and Bate [2007]. The GLIMCLIM software package [Chandler, 2002] incorporates all of these features, as well as the possibility to include large-scale atmospheric variables as predictors and to handle missing data. These ideas are illustrated in the applications of GLMs to multisite rainfall simulation by Yang et al. [2005].

[89] A further approach to generating multisite weather is to apply the analog method (see section 4.1.3) in a weather generator context. For instance, Buishand and Brandsma [2001] proposed a nearest-neighbor resampling scheme conditioned on the current large-scale atmospheric circulation pattern to derive local weather observations. To improve the temporal structure, some implementations of the analog method compare not only the large-scale weather situation at one point in time with historical weather but also the weather on preceding days. For a more realistic chronology of events, Orlowsky et al. [2008] suggested resampling time blocks instead of single events. Because multisite time series are sampled simultaneously, spatial correlations between stations are preserved and remain physically consistent, and no assumptions about the distribution or the spatial correlations are necessary. However, as for the resampling of intensities discussed in section 4.1.3, spatial patterns are resampled as a whole, and no unobserved patterns are generated.
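
A minimal sketch of such nearest-neighbor resampling is given below; the array names and the Euclidean distance in a (possibly EOF-reduced) circulation space are our own illustrative assumptions:

```python
import numpy as np

def analog_resample(circ_today, circ_hist, precip_hist, k=5, seed=0):
    """Sample the multisite precipitation of one of the k historical days
    whose large-scale circulation is closest to today's state; the whole
    observed pattern is reused, preserving intersite correlations."""
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(circ_hist - circ_today, axis=1)  # Euclidean distance
    nearest = np.argsort(dist)[:k]                         # k nearest neighbors
    return precip_hist[rng.choice(nearest)]

# toy data: 20 circulation components, 12 stations, ~27 years of days
rng = np.random.default_rng(4)
circ_hist = rng.normal(size=(10000, 20))
precip_hist = rng.gamma(0.8, 4.0, size=(10000, 12))
today = analog_resample(rng.normal(size=20), circ_hist, precip_hist)
```

Comparing lagged circulation states, or resampling whole time blocks as suggested by Orlowsky et al. [2008], would improve the temporal structure of the generated series.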

[90] Most of the papers cited above focus on the generation of rainfall sequences at a daily time scale, which is considered adequate for many climate impact studies. However, in some specialized applications (for example, urban flooding studies and radio telecommunication links), data may be required at finer time scales. Models for the generation of single-site subdaily rainfall have been reviewed in section 4.3.1. At present, there are few extensions of these that provide for the generation of multisite subdaily sequences in a downscaling context. Fowler et al. [2005] describe one possibility in which a spatial-temporal Poisson cluster model is used as the basic multisite generator, with different parameters corresponding to distinct weather states. By contrast, Segond et al. [2006] suggested that subdaily sequences could be generated by first generating multisite daily sequences using one of the many available methods and then disaggregating the daily totals to the time scale of interest.

4.3.3. Full-Field Generators

[91] An important area of investigation in rainfall modeling is the development of models able to simulate a field of precipitation at any required fine scale and thereby provide inputs to distributed hydrological models. Currently, a number of techniques are available for such unconditional full-field simulation. They generally fall into one of three categories (see Ferraris et al. [2003] for a comparison): models based upon transformed Gaussian processes [Guillot and Lebel, 1999], point process models [Wheater et al., 2005; Cowpertwait et al., 2002; Northrop, 1998], and spatial-temporal implementations of multifractal cascade models [Lovejoy and Schertzer, 2006; Marsan et al., 1996; Over and Gupta, 1996].

[92] Currently, aside from the simple scaling model of spatial rainfall fluctuations by Perica and Foufoula-Georgiou [1996], there are no implementations of such approaches for the downscaling of climate model output in the literature, but the potential of the existing methodologies is clear. Multifractal representations of rainfall fields are well suited to downscaling implementations, as they are simulated through cascade models [Deidda, 2000]. A downscaling algorithm based on spectral methods is similar but also allows for nonfractal subgrid-scale structures; an implementation for cloud fields already exists [Venema et al., 2010]. Disaggregation methods using point process approaches [Koutsoyiannis and Onof, 2001] could, in principle, be extended to the spatial dimension, and transformed Gaussian processes can be conditioned on the average areal rainfall [Onibon et al., 2004].

5. EVALUATION TECHNIQUES FOR DOWNSCALING METHODS

[93] Here we review methods that can be used to validate the performance of downscaling approaches in simulating specific characteristics of precipitation. These characteristics are often called metrics and are related to the end user needs introduced in section 2; in principle, they form the basis of the discussion in section 6 of the downscaling skill available to meet these needs.

[94] Any validation method ultimately relies upon the quality and quantity of observational data. Typical quality problems are inhomogeneities, outliers, and biases due to wind-induced undercatch (i.e., precipitation is underestimated because a nonnegligible amount of rain is blown over the gage). Inhomogeneities may induce spurious trends [e.g., Yang et al., 2006], increase uncertainty, and potentially weaken predictor-predictand relationships. Estimates of extreme events are particularly sensitive to outliers and inhomogeneities. For an appropriate signal-to-noise ratio, sufficiently long time series are needed, in particular, to reliably estimate extremes and infer trends. The validation of how natural variability is represented is limited by the length of observational records, typically a few decades. Furthermore, a sparse rain gage network limits the possibility for validation or may even render it impossible. For this reason, high-resolution data sets have been developed in some regions [e.g., Haylock et al., 2008]. For an impression of the global rain gage network, see Figure 4. Data are particularly sparse in the high latitudes, deserts, central Asian mountain ranges, and large parts of South America.

Figure 4.

Rain gages used in the monthly CRU TS data set [e.g., Mitchell and Jones, 2005], which have been in situ for at least 40 years.

[95] Reanalysis data, such as NCEP/NCAR [Kalnay et al., 1996] or ERA40 [Uppala et al., 2005], are frequently used as surrogates for observational data for validation of large-scale processes. Such data are basically interpolations of observational data based on a dynamical model (so-called data assimilation) and are therefore complete and physically consistent. However, they are subject to model biases and can significantly deviate from real weather. Precipitation is a variable which is generally not assimilated but completely generated by the parameterizations in the model, which may induce considerable biases in some locations [Zolina et al., 2004]. Furthermore, the resolution of reanalysis data is too low to resolve local-scale precipitation. Therefore, NCEP/NCAR has developed the North American Regional Reanalysis [Mesinger et al., 2006] that assimilates, among other variables, precipitation in order to provide a more realistic regional hydroclimatology.

[96] Reanalysis data are used to drive RCMs for validation purposes. First, this setting isolates the RCM model bias from any possible GCM bias [e.g., Sanchez-Gomez et al., 2009; Prömmel et al., 2009; Vidale et al., 2003; Jaeger et al., 2008]. Second, this setting accounts for natural variability. As discussed in the context of MOS calibration (section 4.2), the output of a GCM-driven RCM represents just one possible realization of the climate. Discrepancies might simply result from differences between the realization and the observed weather on long time scales rather than model errors. In a reanalysis-driven RCM, however, the sequence of synoptic weather in the RCM will be the same as observed. A remaining issue, though, is small-scale variability generated by the RCM that might be different from observed variability. In particular, if validating precipitation extremes, these may differ between the RCM and observations just because of natural variability.

5.1. Evaluated Metrics

[97] Depending on the application of the impact study, different metrics, or indices, of the downscaled precipitation may be of interest, including intensity metrics and temporal and spatial characteristics as well as metrics characterizing relevant physical processes.

[98] Metrics regarding precipitation intensity are mean, variance, and quantiles (i.e., return levels [Frei et al., 2006; Halenka et al., 2006; May, 2007; Friederichs and Hense, 2007; Fowler and Ekström, 2009; Maraun et al., 2010a]) or parameters of the precipitation distribution. A typical metric for heavy precipitation is the 90th percentile of precipitation on wet days [Goodess et al., 2010; Haylock et al., 2006]. Validation of extreme precipitation intensities (e.g., 50 or 100 year return levels), which are perhaps beyond the range of observed values, should be carried out on the basis of extreme value theory [e.g., Coles, 2001; Katz et al., 2002; Naveau et al., 2005]. Studies applying this framework are still rare; for some notable exceptions in a model intercomparison context, see Frei et al. [2006], Beniston et al. [2007], and Kendon et al. [2008].

[99] Temporal metrics are the autocorrelation function, the annual cycle, interannual and decadal variability [Maraun et al., 2010b] and trends, or metrics focusing on the precipitation occurrence such as wet day probabilities, transition probabilities (wet-wet), and the length of wet and dry spells [e.g., May, 2007; Semenov et al., 1998]. Extremal measures for temporal metrics are, e.g., the maximum number of consecutive dry days. Spatial characteristics are spatial correlations [Rauscher et al., 2010; Achberger et al., 2003], cluster sizes, or spatial patterns [Bachner et al., 2008].

[100] In addition, it is important to assess whether the processes leading to long-term changes in local precipitation are well captured by the models, in order for their projections of future change to be reliable [e.g., Kendon et al., 2009; D. Maraun et al., manuscript in preparation, 2010]. This may be examined through the validation of process-based metrics, for example, relationships of precipitation with the large-scale circulation or with temperature [e.g., Lenderink and van Meijgaard, 2008; Maraun et al., manuscript in preparation, 2010] or the mechanisms of soil-precipitation feedback [Schär et al., 1999].

[101] There have been several attempts to standardize indices; see the Expert Team on Climate Change Detection and Indices [e.g., Peterson et al., 2001; Nicholls and Murray, 1999] and the STARDEX project [Goodess et al., 2010] for a full overview. Furthermore, a set of metrics and criteria has been defined in the ENSEMBLES project [van der Linden and Mitchell, 2009] to evaluate different aspects of the downscaling model: (1) large-scale circulation and weather regimes, (2) the temperature and precipitation mesoscale signal, (3) probability distribution functions of daily precipitation and temperature, (4) temperature and precipitation extremes, (5) temperature trends, and (6) the temperature and precipitation annual cycle for RCMs, as well as, for statistical downscaling, the stability of the predictor-predictand relationships (see a forthcoming special issue in Climate Research (E. Kjellström et al., manuscript in preparation, 2010)).

[102] The simulated and observed characteristics need to enter the validation procedure on comparable spatial scales; that is, point observations may need to be averaged to represent areal means. Scale mismatches, typically occurring when comparing areal model outputs with point measurements, might induce representativeness errors because of the lower variance of averaged values [Ballester and Moré, 2007; Tustison et al., 2001; Ivanov and Palamarchuk, 2007]. This is especially important for RCMs since not only is the grid point average smoothed over a large area, but neighboring grid points are also more correlated than in reality [Déqué, 2007].

5.2. Validation Measures

[103] Downscaling models (either dynamical or statistical) might be driven by GCM simulations or by observational data (often reanalysis data as surrogates). Validation for these two settings is fundamentally different. In the former case, simulated and observed weather are independent. Therefore, validation is limited to evaluating the distribution of precipitation over long periods in a particular grid box or the spatial structure of the climatology (section 5.2.2). In the latter case (called “perfect boundary conditions” in the case of dynamical downscaling), simulated and observed weather events can directly be related to each other. Here, in addition to validating the simulated distributions, validation techniques which have been developed for forecast verification can be applied. These techniques use the simulated time series as a prediction of the observed time series and assess the quality of the prediction (section 5.2.3). First, we will present measures that can be applied for the evaluation of both settings.

5.2.1. General Performance Measures

[104] Simple performance measures that can be applied to time series as well as to distributions and spatial patterns are bias, correlation, mean absolute error, and (root-) mean-square error. To visualize pattern correlation, root-mean-square error, and ratio of standard deviations simultaneously, Taylor diagrams have been introduced [Taylor, 2001] (see Figure 5 for an example where time series are compared). To assess the significance of discrepancies, statistical tests such as Student's t test may be carried out. For precipitation, nonparametric alternatives based on bootstrap resampling [Efron and Tibshirani, 1993; Davison and Hinkley, 1997] might prove useful [e.g., Bachner et al., 2008]. A complex validation diagnostic for spatial characteristics is SAL, which considers aspects of structure (S), amplitude (A), and location (L) of precipitation in a certain region [Wernli et al., 2008].
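
For illustration, the three quantities summarized in a Taylor diagram can be computed as follows (a minimal sketch; `sim` and `obs` are assumed to be matching time series):

```python
import numpy as np

def taylor_stats(sim, obs):
    """Correlation, ratio of standard deviations, and centered
    root-mean-square error: the quantities shown in a Taylor diagram."""
    r = np.corrcoef(sim, obs)[0, 1]
    sd_ratio = np.std(sim) / np.std(obs)
    crmse = np.sqrt(np.mean(((sim - sim.mean()) - (obs - obs.mean())) ** 2))
    return r, sd_ratio, crmse
```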

Figure 5.

Taylor diagram showing the performance of 18 RCMs to simulate annual precipitation over the Thames catchment, UK. The 18 RCMs are driven with ERA40 reanalysis data, such that observed and simulated time series represent the same weather sequence and can be directly compared. The angle is given by the correlation between simulated and observed times series, and the norm is given by the ratio of simulated and observed standard deviation. The distance between the observation point (1, 1) and a model point gives the root-mean-square error between observed and modeled time series, normalized with the observed standard deviation (F. Wetterhall, unpublished data, 2009).

5.2.2. Measures to Validate Distributions

[105] A framework to compare the distributions of simulated and observed precipitation characteristics is provided by statistical tests, such as the χ2 test or the Kolmogorov-Smirnov test [e.g., Semenov et al., 1998; Bachner et al., 2008]. A more graphical technique, especially suitable for validating the extreme tail, is the (quantile-)quantile plot [e.g., Déqué, 2007; Coles, 2001], in which observed and predicted quantiles are plotted against each other. For simple validation methods based on quantiles, see Ferro et al. [2005]. Validation of extremal properties (such as return levels) may be done parametrically, i.e., by fitting a generalized extreme value distribution to block maxima or by fitting a generalized Pareto distribution to threshold excesses [Coles, 2001].
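
Both parametric fits can be obtained with standard routines, here sketched with scipy; the synthetic daily series and the 99% threshold choice are our own illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
daily = rng.gamma(0.8, 5.0, size=40 * 365)        # placeholder daily series (mm/d)
annual_max = daily.reshape(40, 365).max(axis=1)   # block (annual) maxima

# generalized extreme value fit and, e.g., the 50 year return level
c, loc, scale = stats.genextreme.fit(annual_max)
rl_50 = stats.genextreme.ppf(1.0 - 1.0 / 50.0, c, loc=loc, scale=scale)

# generalized Pareto fit to excesses over a high threshold
u = np.quantile(daily, 0.99)
gp_c, _, gp_scale = stats.genpareto.fit(daily[daily > u] - u, floc=0.0)
```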

5.2.3. Measures to Validate Time Series

[106] Typical measures to compare simulated binary events (e.g., wet/dry) with the actual observed outcome are hit rate, false alarm rate, frequency bias, and log odds ratio [e.g., Jolliffe and Stephenson, 2003; Wilks, 2006; Stephenson, 2000]. For these measures, the simulated weather sequence needs to correspond to the observed weather sequence; therefore, the downscaling model needs to be driven by observed (or surrogate) large-scale weather. Continuous variables can be converted into binary events, e.g., by defining a threshold. The underlying event counts can be displayed in 2 × 2 contingency tables. A powerful tool to evaluate these measures graphically is the two-dimensional relative operating characteristic diagram, which displays the hit rate against the false alarm rate.
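
A minimal sketch of these measures, computed from the 2 × 2 contingency table, is given below; inputs are assumed to be boolean arrays of corresponding simulated and observed wet days:

```python
import numpy as np

def binary_scores(sim_wet, obs_wet):
    """Hit rate, false alarm rate, frequency bias, and log odds ratio from
    the 2 x 2 contingency table; assumes all four cells are nonzero."""
    a = np.sum(sim_wet & obs_wet)      # hits
    b = np.sum(sim_wet & ~obs_wet)     # false alarms
    c = np.sum(~sim_wet & obs_wet)     # misses
    d = np.sum(~sim_wet & ~obs_wet)    # correct rejections
    return {"hit_rate": a / (a + c),
            "false_alarm_rate": b / (b + d),
            "frequency_bias": (a + b) / (a + c),
            "log_odds_ratio": np.log((a * d) / (b * c))}
```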

[107] Some of the downscaling approaches discussed in section 4 predict distributions rather than individual values. Here classical measures comparing actual values are not directly applicable. Performance measures for such purpose are probability scores. The classical probability score to validate binary events, e.g., precipitation occurrence, is the Brier score [Brier, 1950]. To validate continuous events (e.g., precipitation amount) the (continuous) ranked probability score [Hersbach, 2000; Jolliffe and Stephenson, 2003] and the quantile verification score (see Friederichs and Hense [2007] and Maraun et al. [2010a] for examples) have been developed.

[108] Absolute values of performance measures are often not meaningful and are therefore compared with scores of reference predictions, such as the climatological mean. When developing a new downscaling method, a sensible reference prediction would be the best previously available downscaling. When assessing the predictive power of a certain predictor, a reference prediction would be the statistical model without this particular predictor. Relative measures of performance are skill scores, which can be derived from all of the aforementioned performance measures. Further skill scores are the Heidke skill score or the equitable threat score [see, e.g., Jolliffe and Stephenson, 2003; Wilks, 2006; Stephenson, 2000].

[109] To assess the performance of a downscaling approach on different time scales, Maraun et al. [2010b] applied the squared coherence [Brockwell and Davis, 1991], investigating a statistical downscaling model on subannual, interannual, and decadal scales.

[110] To ensure robust results, any meaningful validation of time series needs to be carried out as cross validation; that is, the data used for validation need to be independent of the data used for model calibration. To this end, the data set is divided into a training subset and a validation subset. Splitting can be done either in time, by leaving out a certain time period for validation, or in space, i.e., by leaving out a certain rain gage. Often, each of the disjoint subsets is successively left out.

5.3. Pseudorealities for Validation

[111] To overcome limitations in observational data and to better isolate different sources of error, validation in a pseudoreality has been suggested. Often, RCM validation is limited because of too sparse an observational network. Furthermore, it is difficult to isolate the contributions of the different components in the whole simulation to discrepancies between simulated and observed local variables. This problem can partly be overcome by driving the RCM with reanalysis data. However, even in this setting, errors caused by the nesting (i.e., the actual downscaling step) and the imperfection in the RCM itself cannot be discriminated. To address these issues, Denis et al. [2002] suggested what they call the “Big Brother Experiment”: a model world is created by a high-resolution large-area RCM simulation (“Big Brother”). A perfect prognosis large-scale representation of this pseudoreality is then created by spatially filtering the high-resolution field, which is then used as boundary conditions for the same RCM but on a smaller domain (“Little Brother”). Because of the perfect prognosis construction, the discrepancies between the Big Brother (pseudo-observed) and Little Brother (modeled) variables can exclusively be attributed to errors in the downscaling itself.

[112] Given the limited availability of long observational time series, each validation is limited by the simple fact that the time scales of interest are longer than the maximum available calibration period. This is especially crucial for statistical downscaling because stationarity issues are potentially more serious for statistical models than for models based on physical relationships. To address this potentially serious disadvantage, Vrac et al. [2007e] proposed a general method to validate statistical downscaling for future climate change in a model world. In addition to validating the statistical downscaling method against observations, they suggest evaluating whether the GCM-driven statistical method is able to simulate realistic statistics. Furthermore, they suggest calibrating the statistical downscaling method on pseudo-observations from an RCM driven by a GCM control run and evaluating whether this calibrated statistical downscaling model performs well in a future scenario simulated with the same GCM and RCM.

6. SKILL OF DOWNSCALING APPROACHES TO MEET THE END USER'S NEEDS

[113] In sections 3 and 4, we have presented the state of the art in regional climate modeling and statistical downscaling. Here we discuss the extent to which the different approaches are able to meet the end user needs defined in section 2. In each case, we first present the performance of RCMs and then discuss MOS as a method of closing potential gaps between RCM output and the end user need. We then consider the skill of PP approaches and weather generators as stand-alone alternatives to dynamical downscaling.

[114] A recurring element in the discussion of downscaling skill is the difference between frontal and summertime convective precipitation. The former is usually quite homogeneous over large spatial and temporal scales, with moderate intensities. The latter has a fine spatial-temporal structure, often with very high intensities. For an illustration, see Figure 6.

Figure 6.

Radar images of the region around Bonn, Germany. White indicates no rain, green indicates light rain, and red indicates heavy rain. (left) Image from 10 February 2000, 1616 LT. A cold front crosses and causes a wide band of rain of moderate intensity. (right) Image from 22 June 1999, 1043 LT. Many small convective cells, some of high intensity, cross the Rhineland. Reprinted with kind permission from the Meteorological Institute, University of Bonn, Bonn, Germany (http://www.meteo.uni-bonn.de/forschung/gruppen/radar/index_en.htm).

[115] In this section we first discuss the performance of downscaling approaches for different regions and seasons. We then discuss skill to simulate particular characteristics of precipitation related to the end user needs defined in section 2. We finally address the need for approaches to function in a changed climate.

6.1. Dependence of Downscaling on Region and Season

6.1.1. Regional Dependence of Downscaling Skill

[116] When assessing the potential to downscale precipitation, it is important to first assess the performance of GCMs over the region of interest. For example, the GCMs in the latest Intergovernmental Panel on Climate Change report have biases in important large-scale circulation patterns such as the El Niño–Southern Oscillation [e.g., Latif et al., 2001; Leloup et al., 2008]; blocking, which occurs when large-scale high-pressure systems persist in a stable state for several days, effectively "blocking" or redirecting cyclones [e.g., Hinton et al., 2009] (see also Figure 7); monsoonal circulation; and tropical and extratropical cyclones [Meehl et al., 2007]. These deficiencies will affect the ability to downscale precipitation locally. However, even in these areas the value added by downscaling in comparison with precipitation taken directly from GCMs is still substantial [e.g., Christensen et al., 2007; Schmidli et al., 2006]. Global maps of correlations between gridded observations and seasonal precipitation in a GCM (ECHAM5) in which the large-scale atmospheric states have been nudged toward a reanalysis indicate, for all seasons, high skill of rescaled (i.e., MOS corrected) GCM precipitation over most of the Northern Hemisphere midlatitudes, relatively low skill over Africa and parts of South America, and moderate or seasonally dependent skill elsewhere (Eden et al., submitted manuscript, 2010).

Figure 7.

Mean blocking frequency. Black indicates ERA40 reanalysis, and colors indicate GCMs from the Development of a European Multi-model Ensemble System for Seasonal to Interannual Prediction (DEMETER) project. The dots indicate longitudes where the model climatology is not significantly different from the verification data. The underestimation in blocking frequency would, in turn, underestimate the occurrence of, e.g., heat waves or wet spells. Reproduced from Palmer et al. [2008, Figure 3].

[117] RCMs have been developed for many regions of the world and, in principle, are transferable to other regions. However, when transferring RCMs to very different climates, parameterizations may have to be adapted and the validation might be limited by data sparsity. Statistical downscaling can technically be performed in any part of the world, limited only by the requirement for sufficient data to calibrate and validate the model (see Figure 4).

[118] The number of downscaling studies varies regionally; a rough estimate from a search on the Web of Science (20 March 2010, keywords "Statistical Downscaling" and region, and "Dynamical Downscaling" or "Regional Climate Model" and region) indicates that most studies have been carried out for Europe and North America. There is also a difference in the relative number of studies applying dynamical and (in general, PP) statistical downscaling. For Europe and North and South America there are roughly 4 times as many studies using RCMs as PP, whereas for Africa and Asia there are over 10 times as many, and for Australia the ratio is nearly 1. These differences can partly be explained by large initiatives such as PRUDENCE or ENSEMBLES (which also provides simulations for northern Africa) and by the availability of reliable and dense observational data.

[119] An objective assessment of the downscaling skill depending on region is therefore not possible, but we will point out some general conclusions. We will mainly draw on results from the PRUDENCE [Jacob et al., 2007; Graham et al., 2007a] and ENSEMBLES [van der Linden and Mitchell, 2009] model intercomparison projects for Europe and the STARDEX project [Goodess et al., 2010; Haylock et al., 2006; Schmidli et al., 2007] that compared several downscaling techniques in terms of their abilities to downscale high-precipitation events.

[120] Results over Europe show that the skill of RCMs is generally higher in the northern and western, wetter regions than in the drier southern and eastern regions, although this varies from model to model [Murphy, 1999; Jacob et al., 2007]. MOS techniques have the potential to increase the skill of RCM precipitation across Europe [e.g., Boé et al., 2007; Déqué et al., 2007; Lenderink et al., 2007; Yang et al., 2010; Piani et al., 2009]. The STARDEX project [Goodess et al., 2010] indicates similar results for PP statistical downscaling as for RCMs, namely, higher skill over northern Europe than over southern Europe, although the skill strongly depends on the method used.

[121] Over regions with high terrain, RCMs considerably reduce the precipitation bias compared to GCM-simulated precipitation [e.g., Fowler et al., 2005; Buonomo et al., 2007]. Although some of the remaining bias may be inherited from the lateral boundary conditions, a large fraction is likely to be attributable to RCM downscaling error. RCMs over the Alpine region are able to reproduce the most prominent features of the spatial pattern of precipitation, but they show a wet bias along the northwestern windward slopes and a dry bias along the southeastern leeward slopes; precipitation intensity and the frequency of heavy events are underestimated [Frei et al., 2003, 2006].

[122] Salathé [2003] has shown that to reduce the bias to a level that allows a reliable simulation of monthly flow in mountainous catchments, a resolution of 0.125° is needed. Studies by Piani et al. [2009] and Themeßl et al. [2010] suggest that MOS can correct biases in high-elevation regions in Europe, including the Alpine region. Following an idea by Widmann et al. [2003], Schmidli et al. [2006] applied MOS directly to ERA40 precipitation and demonstrated the potential of directly correcting GCM-simulated precipitation.

[123] Regarding the representation of spatial precipitation variability in mountainous terrain, Hellström et al. [2001] and Hanssen-Bauer et al. [2003] concluded that PP statistical downscaling outperforms RCMs (with a spatial resolution of ∼50 km). However, in a study of the Alps, Schmidli et al. [2007] found that RCMs, in general, outperformed PP in winter but were on a par for summer precipitation. With respect to the regional dependence of downscaling, the two major gaps are (1) the limited representation of local-scale precipitation in areas where the large-scale modes of variability are insufficiently represented by GCMs and (2) the limited availability and/or accuracy of downscaled precipitation in data-sparse regions.

6.1.2. Seasonal Dependence of Downscaling Skill

[124] The assessment of 50 km resolution RCMs from the PRUDENCE project has shown that downscaling skill is generally better in winter than in summer across Europe [Frei et al., 2006; Jacob et al., 2007; Fowler and Ekström, 2009]. In winter, models tend to be too wet in northern Europe [Christensen et al., 2007], and in summer, models tend to be too dry over southern and eastern Europe [Jacob et al., 2007]. In the Alpine domain, biases of up to several tens of percent have been reported for mean and, in particular, for extreme precipitation [e.g., Frei et al., 2003, 2006]. Recent work within the ENSEMBLES project, however, has shown that 25 km RCMs driven by ERA40 boundary conditions give a good representation of rainfall extremes over the UK in both winter and summer, indicating that higher model resolution might improve the representation of summer extremes (Buonomo et al., manuscript in preparation, 2010). By applying MOS on a seasonal basis, the representation of the annual cycle can be improved [Boé et al., 2007; Leander and Buishand, 2007].

[125] Like dynamical downscaling, statistical downscaling of precipitation shows greater skill in winter than in summer (for Sweden, see, e.g., Wetterhall et al. [2007]). Results from the STARDEX project [Goodess et al., 2010; Haylock et al., 2006] indicate the same seasonality in the skill to downscale heavy precipitation. However, for the UK Maraun et al. [2010a] found no seasonality in the skill to model the magnitude of monthly maxima of daily precipitation.

[126] Both dynamical and statistical downscaling approaches show less skill in downscaling precipitation in summer, which may relate to the difficulty in modeling convective precipitation. As such, providing accurate downscaled projections of precipitation in this season remains a challenge and potentially represents a remaining gap in meeting end user needs.

6.2. Downscaling Skill to Model Precipitation Characteristics

6.2.1. Event Intensity

[127] Analysis of the PRUDENCE RCMs showed that models generally perform well for moderate precipitation intensities, with the greatest discrepancies for days with either light precipitation (<5 mm/d) or very heavy precipitation (>80 mm/d) [Boberg et al., 2009]. Most RCMs tend to overestimate the occurrence of wet days (“drizzle effect”) but underestimate heavy precipitation [Murphy, 1999; Fowler et al., 2007b]. There is evidence that this tendency is not region specific, although to some extent, it varies between different RCMs [Fowler et al., 2007b]. This tendency is also found to extend to RCMs with grid scales less than 20 km [Früh et al., 2010].

[128] Over the UK, for which there is a dense rain gage network, RCMs have been shown to realistically simulate extreme precipitation on an annual basis for return periods of up to 50 years [Fowler et al., 2005; Buonomo et al., 2007]. However, there is evidence that RCMs tend to underestimate extreme precipitation, in particular, where rainfall is heaviest [Fowler et al., 2007b; Buonomo et al., manuscript in preparation, 2010] and for more intense events [Buonomo et al., 2007]. On the 50 km grid scale model biases are highly spatially variable, ranging from −50% to +50% for 5 year return period events [Fowler et al., 2005; Buonomo et al., 2007], and also model dependent.

[129] In general, high precipitation intensities occur in association with mesoscale convection or orographic enhancement. The tendency of RCMs to underestimate high-intensity events may thus be due to an inadequate representation of convective processes, whereas over high terrain, model biases may be explained by inadequate resolution of the topography at the RCM grid scale.

[130] The main rationale for using MOS is to correct RCM precipitation intensities, in particular, the drizzle effect and the underestimation of heavy precipitation. A simple approach to correct the drizzle effect is to set all modeled precipitation values below a certain threshold to zero [e.g., Hay and Clark, 2003; Schmidli et al., 2006; Piani et al., 2009]. To improve the representation of precipitation intensities, different methods have been proposed (see section 4.2). Scaling corrects the mean and standard deviation of precipitation by the same factor; this is generally a reasonable assumption for the core of the intensity distribution, but scaled precipitation might be biased for light and heavy precipitation. A more flexible tool is quantile mapping, which considers the whole frequency distribution of observed values. However, this approach does not explicitly consider the tail of the distribution, and extreme events might be misrepresented. A solution, which, to our knowledge, has not been applied in this context, might be the mixture model suggested by Vrac and Naveau [2007].

[131] Early attempts at PP statistical downscaling have long been recognized to be oversimplistic in representing the observed intensities: they ignored random variability (either completely or by using inflation; see section 4.1.3) and were generally unsuitable for modeling extremes (see Figure 8). von Storch [1999] therefore suggested randomizing the downscaled time series by adding noise realizations. Haylock et al. [2006] and Goodess et al. [2010] compared the performance of several downscaling approaches regarding the representation of different measures of precipitation intensity and found no single approach to perform systematically better than the others. Not included in these intercomparison studies were approaches based on GLMs. These models, in a simple PP setting (section 4.1) or incorporated into a stochastic weather generator (section 4.3), elegantly model the unexplained variability, commonly using a gamma distribution to generate random variability [e.g., Yang et al., 2005; Furrer and Naveau, 2007] (see also Figure 8).

Figure 8.

(a) Distribution of daily winter precipitation for Cambridge, Botanical Garden, 2 January 1898 to 31 December 2006. Grey histogram shows all observed wet day amounts. Red histogram shows amounts predicted by a simple multiple linear regression using airflow strength, direction, and vorticity as predictors. The variability is greatly underestimated and is not skewed. Blue line indicates gamma distribution, providing a suitable model for the core of the distribution. (b) The tail (>20 mm). Blue line indicates gamma distribution, which considerably underestimates the tail of the distribution. Orange line indicates exponential tail (or short/light tail), and green line indicates generalized Pareto (GP) distribution with a shape parameter of approximately 0.2 (heavy tail). For the plot, both extreme value distributions are rescaled to match the scale of the full distribution. The extreme value distributions suitably model the observed threshold exceedances, although further diagnostic plots (not shown) reveal a better fit of the GP distribution. (c) The exponential tail considerably underestimates the occurrence of extremes beyond the observed values.

[132] Evaluation studies so far have focused on moderately heavy rain. For example, in their study of heavy precipitation over the United Kingdom, Haylock et al. [2006] chose the 90th percentile on wet days, roughly corresponding to subannual return levels. For many impact studies and design settings, however, much higher return levels, of the order of decades or centuries, are relevant. In general, there is no guarantee that statistical models for the core of the distribution will provide an adequate representation of extremes [Wilks and Wilby, 1999] (see also Figure 8). The distribution of precipitation tends to be heavy tailed [Katz, 1977], and statistical downscaling schemes that do not account for this are likely to be heavily biased for high extremes. Recently, statistical models based on extreme value theory have been developed for precipitation [Maraun et al., 2010a, 2010b], which can easily be extended for downscaling. However, as these approaches model only the extreme tail but not the core of moderate precipitation, they are limited in their applicability. Yang et al. [2005] demonstrated that it is possible to obtain heavy-tailed distributions by incorporating nonlinear dependence structures into GLMs based on gamma distributions; at present, however, the conditions under which heavy-tailed distributions can be obtained from this kind of model are poorly understood. A possible alternative could be mixture models such as the one suggested by Vrac and Naveau [2007], who combine a gamma distribution for moderate precipitation with a generalized Pareto distribution for extremes. The performance of these approaches has not yet been compared with standard statistical downscaling schemes. To summarize, downscaling has the potential to reliably simulate event intensities, in particular, when correcting RCM output by MOS or when using PP methods to predict full distributions.

6.2.2. Temporal Variability and Time Scales

[133] Studies for the UK have shown that the extent to which model biases increase or decrease for longer-duration events depends on the region and the RCM [Fowler et al., 2007b; Fowler and Ekström, 2009]. For Hadley Centre RCMs, Buonomo et al. [2007] find greater biases for longer-duration (30 day accumulation) extremes compared to 1 day events in regions of heavy precipitation but quite different behavior where long-duration extremes are strongly influenced by lighter precipitation events.

[134] There are relatively few studies to date examining RCM skill in simulating subdaily precipitation. A recent study by Lenderink and van Meijgaard [2008], however, shows deficiencies in the ability of the 25 km RACMO RCM to capture hourly precipitation for temperatures above 20°C. This deficiency is likely to be particularly important in summer months where convective processes may dominate and temperatures are high. Hohenegger et al. [2008] have shown that very high resolution (grid scale ≤5 km) climate modeling improves the diurnal cycle of convection. The representation of short-duration precipitation extremes is also significantly improved at high resolution [Wakazuki et al., 2008]. These resolutions are now common practice in numerical weather prediction [Roberts and Lean, 2008] but are computationally very expensive and thus are currently limited to either short time periods or small spatial domains. For an illustration of model deficiencies in simulating subdaily precipitation, see Figure 9.

Figure 9.

Intensity-duration plot: precipitation intensities with a 5 year return period for subdaily durations at Stockholm, Sweden. Black line indicates observed data, and blue line indicates the regional climate model RCA driven by ERA40 reanalysis data.

[135] As MOS is designed to correct precipitation intensities, it does not improve the temporal structure. Even adjusting the number of wet days does not guarantee an improved representation of the lengths of dry and wet spells. However, the representation of seasonality can be improved by applying MOS separately to individual seasons [Boé et al., 2007] or months, or even to shorter parts of the year (e.g., 5 day periods [Leander and Buishand, 2007]). If precipitation sums over longer time periods, such as monthly totals, are of interest, MOS could be applied to time-aggregated precipitation.
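A common distribution-wise MOS variant is empirical quantile mapping; the sketch below applies it separately for each calendar month, as a hedged illustration of the season-wise corrections discussed above. The function and array names are hypothetical, and all arrays are assumed to be daily and aligned.

```python
import numpy as np

def quantile_map(model_cal, obs_cal, model_out):
    """Empirical quantile mapping: replace each model value by the observed
    quantile corresponding to its rank in the calibration-period model data."""
    q = np.searchsorted(np.sort(model_cal), model_out) / float(model_cal.size)
    return np.quantile(obs_cal, np.clip(q, 0.0, 1.0))

def monthly_mos(model_cal, obs_cal, model_out, month_cal, month_out):
    """Apply the correction separately for each calendar month (1..12)
    so that the seasonal cycle of the bias is respected."""
    corrected = np.empty_like(model_out, dtype=float)
    for m in range(1, 13):
        c, o = month_cal == m, month_out == m
        corrected[o] = quantile_map(model_cal[c], obs_cal[c], model_out[o])
    return corrected
```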

[136] In PP statistical downscaling the temporal structure is not explicitly modeled. However, the large-scale predictors impose their time structure on the local-scale precipitation. For instance, Haylock et al. [2006] and Goodess et al. [2010] have shown that the maximum number of consecutive dry days is generally better modeled than the intensity of heavy rainfall, indicating that a reasonable fraction of the time dependency is captured by the predictors. Maraun et al. [2010b] found that predictors representing the large-scale atmospheric circulation explain a significant fraction of the monthly, interannual, and decadal variability of high precipitation intensities. Weather generators explicitly model the short-term day-to-day variability (see section 4.3) but require large-scale predictors to correctly simulate long-term variability [Wilks and Wilby, 1999].

[137] Weather generators, such as Poisson cluster models, can provide subdaily precipitation. They can, in principle, be implemented without subdaily data but perform better when calibrated against subdaily data [Cowpertwait et al., 1996]. For a reasonable calibration, at least 10 years of data are needed; to calibrate the models for subdaily extreme precipitation, even longer time series are required. Furthermore, such generators are typically conditioned on daily RCM change factors and thus cannot provide subdaily information on climate change [Jones et al., 2009].
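The following is a deliberately minimal sketch of a Neyman-Scott rectangular pulse generator, one member of the Poisson cluster family; the parameter values are arbitrary placeholders, and a practical implementation would calibrate them against observed subdaily statistics [cf. Cowpertwait et al., 1996].

```python
import numpy as np

rng = np.random.default_rng(1)

def neyman_scott(n_hours, lam, mu_c, beta, eta, xi):
    """Minimal Neyman-Scott rectangular pulse sketch (all parameter values
    hypothetical): storm origins arrive as a Poisson process (rate lam per
    hour); each storm spawns a Poisson number of rain cells (mean mu_c) with
    exponential lags after the origin (rate beta); each cell rains at an
    exponential intensity (mean xi, mm/h) for an exponential duration
    (mean 1/eta, h)."""
    rain = np.zeros(n_hours)
    for t0 in rng.uniform(0.0, n_hours, rng.poisson(lam * n_hours)):
        for _ in range(rng.poisson(mu_c)):
            start = t0 + rng.exponential(1.0 / beta)
            end = start + rng.exponential(1.0 / eta)
            a, b = int(start), int(np.ceil(end))   # crude hourly discretization
            rain[max(a, 0):min(b, n_hours)] += rng.exponential(xi)
    return rain                                    # hourly totals (mm)

hourly = neyman_scott(24 * 360, lam=0.01, mu_c=5.0, beta=0.2, eta=0.5, xi=1.5)
```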

[138] In summary, deficiencies remain in the ability of downscaling methods to generate local precipitation time series with the correct temporal variability. Many of these deficiencies are inherited from the driving GCMs, for example, in the representation of blocking and tropical modes of variability [e.g., Ringer et al., 2006] (see also section 6.1). RCMs and PP weather generators can "add value" in terms of the representation of short-term temporal variability.

6.2.3. Spatial Coherence and Event Size

[139] In terms of spatial variability, two potential problems need to be considered: misrepresentation of event size, structure, and spatial coherence, e.g., by overestimating the extent of convective cells, and misplacement of precipitation events, e.g., due to orographic effects.

[140] RCMs tend to overestimate the spatial coherence of precipitation events. As discussed in section 3, convective events are difficult to model, and therefore, these events are often too low in intensity and extend over too large an area. This problem might be solved in the future with higher resolution and improved numerical schemes. Large-scale frontal precipitation is generally well simulated by RCMs, although the coarse orography, especially in mountainous regions, can cause erroneous spatial distributions of precipitation [Frei et al., 2003]. In addition to improving subdaily precipitation representation, very high resolution climate modeling ensures more accurate localization of rainfall maxima over regions of complex topography [Hohenegger et al., 2008].

[141] Most MOS approaches are not designed to correct errors in spatial correlations, since the predictand still inherits much of the spatial correlation structure of the simulated precipitation [Boé et al., 2007]. However, Widmann et al. [2003] suggested a nonlocal MOS: they applied singular value decomposition to derive coupled spatial patterns of simulated and observed precipitation. These patterns can have different structures, with high values over different locations, so that this approach can, in principle, correct unrealistic aspects of the location and spatial structure of the simulated precipitation, caused, for instance, by an unrealistic topography in the numerical model.
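A sketch of how such an SVD-based coupling can be set up is given below; it follows the general logic of maximum covariance analysis rather than the exact procedure of Widmann et al. [2003], and the anomaly matrices and the number of retained modes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
sim = rng.standard_normal((1000, 50))   # simulated precip anomalies (time x grid)
obs = rng.standard_normal((1000, 30))   # observed precip anomalies (time x sites)

# SVD of the cross-covariance matrix yields pairs of coupled spatial patterns:
# columns of U for the simulated field, rows of Vt for the observed field.
C = sim.T @ obs / (sim.shape[0] - 1)
U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = 5                                   # number of leading coupled modes retained
sim_pc = sim @ U[:, :k]                 # amplitudes of the simulated patterns
obs_pc = obs @ Vt[:k].T                 # amplitudes of the observed patterns

# Calibrate a regression from simulated to observed amplitudes, then reconstruct
# a corrected field from the observed patterns; because paired patterns can peak
# at different locations, the correction is nonlocal.
coef = np.linalg.lstsq(sim_pc, obs_pc, rcond=None)[0]
corrected = (sim_pc @ coef) @ Vt[:k]
```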

[142] Within individual grid boxes, He et al. [2009] have attempted to account for subgrid orography by distributing the simulated precipitation according to observed patterns. There are examples of MOS weather generators (using change factors derived from RCMs to represent climate change) that have been extended to a high-resolution grid (e.g., 5 km [Jones et al., 2009]), but these are run independently for each grid point.

[143] Standard PP statistical downscaling faces a dilemma: in a "deterministic" context, i.e., without explicitly adding noise to the downscaled variables, the predictors impose a strong spatial coherence, yet randomization in the form of adding uncorrelated noise might weaken the spatial coherence too much. The same holds for weather generators based on weather states, which themselves induce intersite correlations. At large spatial scales, it may be reasonable to assume that all of the intersite dependence is captured; often, however, and particularly at smaller spatial scales, the induced dependence is weaker than that found in observations. A way out is the explicit modeling of spatial dependence, i.e., using multisite weather generators (section 4.3.2) or full-field weather generators (section 4.3.3). The analog method, either in a simple PP setting (section 4.1.3) or extended to a weather generator (section 4.3.2), provides an easy way to simulate spatially coherent and realistic fields. However, this method cannot simulate unobserved weather patterns, which might emerge because of changes in the atmospheric circulation. Its use for climate change projections is therefore limited, especially for simulating fields of extreme precipitation. All PP methods, including PP weather generators, can, in principle, correctly represent orographic influences, as their calibration intrinsically accounts for the interplay between the large-scale atmospheric circulation and the orography, such as lee and rain shadow effects.
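As a concrete illustration of the analog method mentioned above, the sketch below selects, for each large-scale state, the closest historical analog and returns the local field observed on that day; all array names are hypothetical, and distance is measured in a plain Euclidean sense for simplicity.

```python
import numpy as np

def analog_downscale(pred_target, pred_archive, local_archive):
    """For each target large-scale state (rows of pred_target), find the
    nearest historical analog in pred_archive (Euclidean distance) and
    return the local precipitation field observed on that day."""
    out = np.empty((pred_target.shape[0], local_archive.shape[1]))
    for i, x in enumerate(pred_target):
        j = np.argmin(np.linalg.norm(pred_archive - x, axis=1))
        out[i] = local_archive[j]
    return out
```

Because entire observed fields are resampled, spatial coherence is preserved by construction, but no field outside the historical archive can ever be produced, which is precisely the limitation noted above for climate change applications.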

[144] The representation of spatial variability is limited by the density of the rain gage network. Still unresolved is the issue of full-field precipitation, i.e., the provision of downscaled precipitation between rain gages. Often, this problem is addressed by interpolation from neighboring sites. However, such techniques are a form of smoothing that leads to underestimation of rainfall variability, especially on short time scales and for extremes [e.g., Hofstra et al., 2008]. This is particularly serious in mountain areas, where the relationships between orography and precipitation are very complex and the rain gage network is generally sparse compared to the high spatial variability (for a notable exception, see Frei and Schär [1998]).

6.2.4. Physical Consistency

[145] RCMs model the full atmospheric state and therefore intrinsically address physical coherence. However, small temperature biases might lead to considerable biases in impact models when both temperature and precipitation are required. Yang et al. [2010] showed that a MOS correction of temperature and precipitation biases could improve the simulation of river discharge in spring. In general, however, it should be noted that MOS may disrupt the internal consistency between weather variables, especially between temperature and precipitation.

[146] Pure PP statistical downscaling does not, in general, explicitly model physical coherence between variables unless, for example, large-scale temperature is used as a predictor for precipitation [e.g., Chun et al., 1999]. This is, however, problematic, since high summer temperatures may be either a consequence of dry conditions (i.e., due to clear skies) or a cause of convective wet conditions, so the correlations are difficult to interpret [Wilby and Wigley, 2000]. Unlike other PP approaches, the analog method intrinsically captures physical coherence.

[147] Most weather generators attempt to model the relationships between relevant variables, mostly by regressing other variables on the generated precipitation [Kilsby et al., 2007; Jones et al., 2009]. A refinement of this approach based on GLMs was developed by Furrer and Naveau [2007]. In these methods, the other variables are derived from the downscaled precipitation without referencing the actual variable (e.g., temperature) in the driving GCM.

6.3. Downscaling for Future Climate Change

[148] Downscaling of climate change scenarios requires the chosen methodology to function in a perturbed climate, i.e., under conditions different from those for which it was developed [Huth and Kyselý, 2000]. Therefore, skill for the present-day climate, although necessary, may not be a sufficient indicator of skill for the future climate [e.g., Charles et al., 1999; Christensen and Christensen, 2007]. It is also difficult to objectively quantify model skill as different models perform better for different variables and processes.

[149] When discussing skill to downscale future climate scenarios, two points affecting the skill have to be addressed for both dynamical and statistical downscaling. First, the stationarity of the physical and statistical relationships has to be established, and second, the driving GCM simulation needs to be informative for the downscaled variable. Closely connected with the downscaling of future scenarios is the question of predictability and uncertainty. Often, model consensus is taken as evidence of robust skill; this assumption will be critically reviewed.

6.3.1. Model Consensus as a Measure of Skill

[150] Model consensus does not imply reliability, since there may be missing processes or deficiencies common to all models. An understanding of the underlying processes and mechanisms of change, and of their evaluation in models, is key to assessing reliability. Modeling, theory, and observational studies suggest that increases in extreme precipitation are reliable, at least on large scales, since they are dominated by increases in atmospheric moisture with warming [Allen and Ingram, 2002; Allan and Soden, 2008; Kendon et al., 2009]. However, for local precipitation extremes, the small-scale dynamics of clouds and the subcloud layer, cloud microphysics, and changes in precipitable water may play an important role [Lenderink and van Meijgaard, 2008]. These small-scale processes are not well represented in current RCMs, as is evident from deficiencies in the simulation of high precipitation intensities for the present-day climate. The same holds for statistical downscaling, as the predictors used in different approaches are often similar, if not identical, and all approaches ultimately rely on a small number of driving GCMs.

[151] Some degree of confidence might be gained from comparing dynamical and statistical downscaling techniques [e.g., Murphy, 1999; Haylock et al., 2006]. For other model comparison examples, see Semenov et al. [1998], Zorita and von Storch [1999], Schmidli et al. [2007], and Timbal et al. [2008b]. In fact, dynamical downscaling and statistical downscaling can be used to mutually validate one another. For instance, an RCM pseudoreality can be used to validate statistical downscaling approaches (section 5.3) [Vrac et al., 2007e], and statistical downscaling can be used to validate physical relationships in the RCM (section 5.1) [Kendon et al., 2009; Maraun et al., manuscript in preparation, 2010].

6.3.2. Stationarity

[152] In the case of dynamical downscaling, assumptions need to be made for RCM parameterizations to be valid in a perturbed climate. This may be a significant issue for RCMs that have been developed for a specific region. For RCMs that have been shown to perform well for multiple regions, there is greater confidence in the applicability of the parameterization schemes in future climates [Christensen et al., 2007].

[153] When correcting the RCM output, the stationarity issue might become more serious. Most MOS methods described in section 4.2 correct the distribution of modeled precipitation, estimated over a long time interval. However, this distribution is, in fact, a mixture of various other distributions, depending on the different weather conditions. Since the relative frequency of different weather conditions might change in a future climate, the resulting mixed distribution might also change, such that the correction function is potentially not valid under climate change. For instance, Christensen et al. [2008] suggest that biases may not be invariant in a warming climate. This argument holds, in particular, for methods that scale observed or control run precipitation, which do not account for possible dynamic changes in temporal variability, for instance, in the frequency of circulation patterns [e.g., Lenderink et al., 2007].

[154] The stationarity issue is also significant for PP statistical downscaling. The more heuristic and less physical the predictor/predictand relationship, the less confident one can be that the relationship will remain stable under climate change. A way to gauge the transferability of statistical relationships into the future is to use a sensitivity analysis when calibrating a statistical downscaling method [Frías et al., 2006]. One approach is to build the model on data from the coldest (driest) years and then validate it on data from the warmest (wettest) years, thus testing the scheme on two different climate situations; a minimal sketch is given below. The model can also be tested against extreme years in order to assess its stability [Wilby, 1994]. If the time series used for calibration are long enough, it is reasonable to believe that they are representative of those situations that will become more frequent in a future climate [Zorita and von Storch, 1999]. Confidence in the approach is highest if it can model such situations and if the range of variability of the large-scale variable in a future climate is of the same order as today.
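The sketch below illustrates this differential split-sample idea with user-supplied fitting and scoring functions; all names are hypothetical, and the coldest/warmest split is only one of the possible stratifications mentioned above.

```python
import numpy as np

def differential_split_sample(X, y, day_year, years, annual_temp, fit, score):
    """Calibrate a downscaling model on days from the coldest half of the
    years and validate it on days from the warmest half, mimicking the
    transfer of a statistical model to a warmer climate. X holds daily
    predictors, y the daily predictand; day_year labels each day with its
    year, and annual_temp is the mean temperature of each entry in years."""
    order = np.argsort(annual_temp)                  # years sorted cold -> warm
    cold_years = years[order[: years.size // 2]]
    cold = np.isin(day_year, cold_years)
    model = fit(X[cold], y[cold])                    # calibrate on cold years
    return score(model, X[~cold], y[~cold])          # validate on warm years
```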

[155] Sometimes, nonstationarity in the relationships is only an artifact because the chosen predictors do not convey enough information about long-term variability. Wilby and Wigley [1997] showed that certain changes in the relationship between weather types and precipitation in the UK could be explained by a modulating effect of the central England temperature. Therefore, it is necessary to identify all predictors informative for climate change and to incorporate them in a multivariate approach. A similar issue is discussed by Wilby et al. [2004] regarding nondynamical shifts of predictors due to climate change. Spurious effects on rainfall could be corrected by subtracting the mean shift from the predictors.

6.3.3. Capturing Climate Change

[156] For reliable simulations of future climate, the mechanisms of future change in precipitation need to be represented [e.g., Kendon et al., 2009]. Thus, it is important that the processes leading to long-term changes in local precipitation, such as relationships of precipitation with the large-scale circulation or with temperature [e.g., Lenderink and van Meijgaard, 2008] or the mechanisms of soil-precipitation feedback [Schär et al., 1999], are well captured by the models.

[157] Biases in the GCM-simulated large-scale atmospheric circulation might considerably bias the RCM simulation. For instance, Leander et al. [2008] noted that the representation of extreme precipitation events is potentially sensitive to the driving GCM, limiting the overall possibility to correctly downscale high-intensity rainfall.

[158] A similar argument holds for MOS applications, only on smaller scales. Any correction yields meaningful results only if the temporal variability or the long-term changes in the simulated precipitation are good predictors for the changes in the real world. For MOS calibrated on reanalysis-driven RCMs, or on GCMs nudged toward reanalyses, this can be assessed directly by comparing simulated and observed changes in the past. In control run calibrated setups, which allow only distribution-wise MOS, it is difficult to judge whether the application of MOS corrections is justified. Where the simulated precipitation simply has no skill, the application of distribution-wise MOS would not be justified, even if the corrected and observed precipitation intensity distributions could be brought into perfect agreement.

[159] In PP statistical downscaling, the choice of predictors is crucial to capture climate change (see section 4.1). Predictors that are informative on relatively short time scales might not capture long-term variability and, in particular, trends induced by global warming. PP statistical downscaling approaches also rely on the skill of the driving GCM to correctly simulate the relevant predictors. A predictor that is characterized as informative might be of little use if it cannot be assumed to be reliably modeled in the GCM/RCM (in particular, moisture-related quantities are generally considered problematic [Cavazos and Hewitson, 2005]).

6.3.4. Uncertainty and Predictability

[160] An important aspect in assessing predictability is the quantification of the total uncertainty of the downscaled result and the sources that contribute to it. For predictability, the main sources of uncertainty are model formulation, which includes the numerical schemes, parameterizations, and resolution; uncertainty in anthropogenic climate forcing factors; and natural variability [Palmer, 1999; Hawkins and Sutton, 2009], which includes internal variability of the chaotic climate system dependent on initial conditions and natural forced variability due to, e.g., solar forcing.

[161] The range of uncertainty due to model formulation in general, parameterizations in particular, and natural variability can be assessed by ensemble simulations based on different GCMs and RCMs (multimodel ensembles), perturbed parameterizations (perturbed physics ensembles), and different initial conditions. Notable initiatives are the PRUDENCE, ENSEMBLES, and CORDEX projects, which study the uncertainty due to structural errors of different GCMs and/or RCMs. For the development of the probabilistic UKCP09 national climate change projections, a large GCM ensemble with perturbed physics parameterizations was used to drive the Hadley Centre regional climate model HadRM3 [Murphy et al., 2009].

[162] The relative roles of these different sources of uncertainty depend on the time scales under consideration. On decadal time scales, the climate change signal is small compared to natural variability, such that uncertainty caused by initial conditions and natural forcing dominates. Memory, and thus predictability, of natural variability on decadal time scales is generated by the oceans. However, because of limited availability of (deep) ocean data to initialize the prediction, predictability is, in practice, limited. Research on decadal climate predictions is just emerging [e.g., Collins et al., 2006; Smith et al., 2007; Keenlyside et al., 2008], and no regional climate predictions on decadal scales exist. As natural decadal variability increases with decreasing spatial scale, the extent to which regional decadal predictions are possible is largely unknown.

[163] On longer time scales, the signal to noise ratio between climate change signal and natural variability increases, and uncertainty due to model formulation becomes dominant. For instance, results from the PRUDENCE project suggest that GCM uncertainty dominates in the case of changes in seasonal mean climate [Rowell, 2006; Déqué et al., 2007], and variations in RCM formulation are important at fine scales and for changes in precipitation extremes, particularly in summer [Frei et al., 2006]. However, recent studies [Kendon et al., 2009; Kendon et al., 2010] suggest a still dominant role of natural variability for summertime precipitation and precipitation extremes, such that a single 30 year climate projection is not robust. It should be noted that a climate projection represents just one possible realization of the future climate, conditional on a given scenario of natural and anthropogenic forcing.

7. CONCLUSIONS AND OUTLOOK

[164] Reliable precipitation downscaling is needed, independent of region and season. Depending on the application, the generic needs are the correct representation of (1) intensities, (2) temporal variability, (3) spatial variability, and (4) consistency between different local-scale variables; all of these are also required for future scenarios.

[165] To meet these needs, there have been considerable efforts to further develop dynamical and statistical downscaling. We reviewed several recent developments in statistical downscaling that have not yet received much attention in the climate community. These developments focus on capturing intensities, especially extremes, and on the representation of spatial-temporal variability. However, major gaps remain that are currently not resolved by downscaling:

[166] 1. Downscaling in regions with sparse data, mainly remote areas and developing countries, is still highly uncertain (see Figure 4). RCMs can, in principle, be set up in these regions, but they may not correctly represent region-specific processes, and with sparse data their validation is limited. Statistical downscaling is even more restricted in such regions, especially for assessing precipitation extremes and spatial variability. This problem makes it harder for end users operating in these countries to make optimal planning decisions in all areas, from water resources and flood risk management to urban design and agriculture.

[167] 2. The performance of both dynamical and statistical downscaling schemes is currently better for synoptic and frontal systems than for convective precipitation. End users adversely affected by this limitation include flood risk managers in arid regions subject to flash flooding or in temperate regions subject to summer flooding. In these cases, improvements in the representation of heavy, localized convective precipitation are needed (see also Figure 6).

[168] 3. The representation of subdaily rainfall is still poor, especially for extremes, both in RCMs and in statistical downscaling. Furthermore, few statistical models are currently available that attempt to capture subdaily information on climate change. The end user community most seriously affected by this limitation is urban planners, since runoff generation from largely impermeable urban areas occurs rapidly and is highly sensitive to the fine temporal scale distribution of precipitation (see also Figures 6 and 9).

[169] 4. Downscaling to a fully distributed spatial field at scales smaller than RCM grid size is still unresolved. Full-field weather generators are under development but have not yet been implemented for downscaling. One end user affected by this limitation is the hydrological impact modeler using a spatially distributed model for areas sensitive to the spatial distribution of precipitation, such as small catchments or catchments with an impermeable underlying geology (see also Figure 6).

[170] 5. Changes in small-scale processes (on sub–RCM grid scales) and their feedback on the large scale are not adequately captured in projections of precipitation change. Currently, it is difficult to identify how significant this shortcoming may be and, indeed, which end users may be more affected. For instance, the importance is likely to be seasonally and regionally dependent. This shortcoming remains a challenge for climate modelers.

[171] 6. All downscaling approaches inherit errors in the representation of temporal variability from the driving GCM. Examples are blocking over Europe (see Figure 7) and tropical modes of variability. The former is especially relevant for agriculture, as blocking strongly influences the length of dry spells; summer drought often coincides with heat waves and thus affects health authorities as well.

[172] These gaps stem from poor data availability, incomplete process understanding, deficiencies in the driving GCMs, and limitations of the downscaling procedures themselves. In the following, we lay out research directions to address the remaining gaps.

[173] In regions with sparse rain gage networks, the installation of new gages will improve the situation in the long run. However, in many regions networks do exist, but the data have not been made available by the responsible institutions such as national weather services. Here efforts should be undertaken to make these data readily available and to assemble high-resolution gridded data sets as input for hydrological models (where these require spatially averaged rainfall inputs) or for climate model validation; see Haylock et al. [2008] for an example in Europe. Furthermore, digitizing handwritten reports can help to extend the daily database [Moberg and Jones, 2005]. Especially in urban areas, denser networks of subdaily data need to be set up.

[174] The quality of GCM climate projections is constantly improving, and the latest generation of models shows better representation of climate variability [Shaffrey et al., 2009].

[175] In terms of future RCM development, there are two competing strands. The first concentrates on developing multimodel ensemble systems, including multiple RCMs as well as multiple GCMs, to quantify modeling uncertainty. Performance-based weighting of different RCMs could add value [e.g., Fowler and Ekström, 2009], although model weighting is a nontrivial task. The second aims to improve the simulation of local processes through the development of RCMs of increasing resolution (including improvements in the parameterizations). This is expected to lead to improvements not only in the spatial scale on which meaningful information is provided but also in the accuracy of subdaily precipitation.

[176] A key feature of statistical downscaling is the ability to generate complete distributions, which can be used to randomize the downscaled result and thus better represent local variability and extremes. These techniques should be used by default, in particular when downscaling of extremes is required. However, most of these methods demand considerable statistical and computational expertise; therefore, especially for multistation downscaling, accessible implementations suitable for routine use by researchers and practitioners are needed.

[177] Because of the still limited understanding of multivariate extreme value statistics [e.g., Coles, 2001], multistation weather generators have not yet been extended to explicitly capture extremes. The characterization and modeling of spatial extremes is currently an active area of statistical research.

[178] A promising direction of research is the application of MOS to correct climate model output. Currently, the proposed methods almost exclusively use modeled precipitation as predictors and mostly correct distributions only. None of the approaches explicitly account for extremes. It has been shown that MOS could be applied to directly correct GCM simulations [Widmann et al., 2003]. This approach might prove useful for regions where no RCM simulations are available.

[179] We presented the potential usefulness of full-field weather generators for hydrological modeling. The complexity of existing full-field spatial-temporal models may suggest that this is not currently a realistic research aim. However, rather than adding complexity to a spatial-temporal model, conditioning upon climate model outputs may provide useful information for the difficult task of representing advection. Research into linking the parameters of such models with climatological information should be seen as a first step in this direction.

[180] Providing probabilistic climate projections is a key challenge. Initiatives such as PRUDENCE, ENSEMBLES, CORDEX, UKCP09, and Climate Prediction Net provide a first step toward probabilistic climate projections. They have generated a wealth of information about uncertainty in model formulation, but they still do not cover the full plausible range of model uncertainty and do not sufficiently address uncertainty due to natural variability. In particular, on decadal time scales, probabilistic predictions are needed because the anthropogenic climate change signal is still low compared to natural variability. We note that while GCM and downscaling uncertainties can partly be reduced in the future, the internal variability leads to fundamental limitations of predictability, which can be expected to strongly depend on the location and on the precipitation properties under consideration.

[181] In almost all forms of downscaling today, the coarse-scale conditions given by the GCM are taken as fixed. However, this does not reflect the reality of the real climate system in which there are feedbacks between coarse and fine scales. This has been noted by Wilby et al. [2004] as a limitation of statistical downscaling schemes, but of course, it applies equally to RCMs. To represent these feedbacks in any climate simulation will require coupled runs of the coarse- and fine-scale models, and although the implications for impacts applications are unknown at present, this represents an exciting challenge for a future generation of downscaling techniques.

GLOSSARY
Climate Prediction Net:

Initiative to enable probabilistic predictions of future climate conditional on a scenario [Stainforth et al., 2005]. A GCM is run on thousands of home computers to create a large ensemble of future projections, each of which is given a certain likelihood given observational data. (See http://climateprediction.net/.)

Coordinated Regional Climate Downscaling Experiment (CORDEX):

Recent initiative from the World Climate Research Program for running multiple RCM simulations at 50 km resolution for multiple regions. (See http://copes.ipsl.jussieu.fr/RCD_CORDEX.html.)

Dynamical downscaling:

Nests a high-resolution regional climate model into a lower-resolution global climate model to represent the atmospheric physics with a higher grid box resolution within a limited area of interest.

ENSEMBLES:

Project of the European Union 6th framework program. The project created ensembles of general circulation models and regional climate models for Europe and North Africa, developed statistical downscaling models and tools, and constructed a high-resolution gridded validation data set. (See http://ensembles-eu.metoffice.com; RCM data are available at http://ensemblesrt3.dmi.dk/.)

European Centre for Medium-Range Weather Forecasts 40 Year Reanalysis (ERA40):

Six hourly reanalysis of the European Centre for Medium-Range Weather Forecasts, September 1957 to August 2002. Basic resolution of 2.5° × 2.5°, full resolution of 1.125° × 1.125°. (See http://www.ecmwf.int/research/era/.)

Global climate model (GCM):

The acronym GCM usually stands for general circulation model, but it is often, as in this paper, also used for global climate model. A general circulation model is a dynamical model that numerically integrates the Navier-Stokes equations for either atmosphere or ocean across the globe, typically of a resolution of 100–200 km. Atmosphere and ocean general circulation models are key components of global climate models, which, in general, additionally include sea ice and land surface components.

Model output statistics (MOS):

A statistical downscaling approach that corrects dynamical model simulations. The statistical model is calibrated against simulated predictors and observed predictands. Therefore, the statistical model is only valid for the dynamical model it was calibrated with.

National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) reanalysis:

Six hourly reanalysis of the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR), 1948 to present. Available at 2.5° × 2.5° and 1.875° × 1.875°. (See http://www.esrl.noaa.gov/psd/data/reanalysis/reanalysis.shtml.)

Perfect prognosis (PP):

A statistical downscaling approach that assumes that the predictor variables are perfectly modeled by the dynamical model used. The statistical model is calibrated against large-scale and local-scale observed data and then is transferred to an arbitrary dynamical model that is assumed to fulfill the PP assumption.

Prediction:

An estimate of a future climate state (or a range of states) that is assigned a certain probability (which might be low or subjective) to occur. Climate predictions are possible only for relatively short time scales (seasons to decades) because beyond these time scales the influence of different emission scenarios begins to dominate (see projection).

Prediction of Regional Scenarios and Uncertainties for Defining European Climate Change Risks and Effects (PRUDENCE):

Project of the European Union 5th framework program. (See http://prudence.dmi.dk/.)

Projection:

A simulation of the response of the future climate to a forcing scenario that is not assigned a certain probability. A projection is therefore only a plausible state of the future climate.

Reanalysis data:

Combination of observational data and the forecast of a high-resolution global climate model to build a best estimate of a consistent global weather state. They fill gaps in observational data and provide estimates of nonobserved variables.

Regional climate model (RCM):

High-resolution dynamical climate model, typically of a resolution of 25–50 km, though some recent models provide a resolution of 10 km or less. Usually a limited area model nested into a GCM over a specific region.

Statistical and Regional Dynamical Downscaling of Extremes for European Regions (STARDEX):

Project of the European Union 5th framework program. (See http://www.cru.uea.ac.uk/projects/stardex/.)

Statistical downscaling:

Establishes statistical links between large-scale weather and observed local-scale weather. Either PP or MOS.

UK climate projections (UKCP09):

A project funded by the Department for Environment, Food and Rural Affairs to create regional probabilistic climate projections and a weather generator for the United Kingdom. (See http://ukclimateprojections.defra.gov.uk/.)

Weather generator:

A stochastic model to create random time series which resemble the observed weather statistics (marginal distribution, short-term temporal variability, and sometimes spatial dependence between multiple sites) at a certain point. To account for variability on longer time scales, weather generators can be run in a downscaling context, either PP or MOS.

Acknowledgments

[182] This paper was inspired by the workshop “Precipitation Downscaling and Modeling” held at the Climatic Research Unit in Norwich, 28–30 April 2009, within the Flood Risk From Extreme Events (FREE) program and was generously funded by the Natural Environment Research Council (NERC, grant PO 1/8292/CS). Fredrik Wetterhall gratefully acknowledges funding from NERC's FREE project (grant NE/E002242/1); E. J. Kendon and R. G. Jones gratefully acknowledge funding from the Joint Department of Energy and Climate Change (DECC) and Department for Environment Food and Rural Affairs (Defra) Integrated Climate Programme (DECC/Defra) (GA01101). We would like to thank all participants of the workshop for stimulating discussions, in particular, Philippe Naveau and Petra Friederichs. We acknowledge the Numerical Modeling and Policy Interface Network (NMPI) for providing a wiki platform to facilitate communication between the authors. Special thanks to Jonas Olsson from the Swedish Meteorological and Hydrological Institute for providing the data for Figure 9 and David Lister from the Climatic Research Unit for providing the list of rain gages used for Figure 4. The land mask in Figure 4 was generated from the National Geophysical Data Center's ETOPO2v2 data set.

[183] The Editor responsible for this paper was Daniel Tartakovsky. He thanks one anonymous reviewer.
