The evaluation of the quality and usefulness of climate modeling systems is dependent upon an assessment of both the limited predictability of the climate system and the uncertainties stemming from model formulation. In this study a methodology is presented that is suited to assess the performance of a regional climate model (RCM), based on its ability to represent the natural interannual variability on monthly and seasonal timescales. The methodology involves carrying out multiyear ensemble simulations (to assess the predictability bounds within which the model can be evaluated against observations) and multiyear sensitivity experiments using different model formulations (to assess the model uncertainty). As an example application, experiments driven by assimilated lateral boundary conditions and sea surface temperatures from the ECMWF Reanalysis Project (ERA-15, 1979–1993) were conducted. While the ensemble experiment demonstrates that the predictability of the regional climate varies strongly between different seasons and regions, being weakest during the summer and over continental regions, important sensitivities of the modeling system to parameterization choices are uncovered. In particular, compensating mechanisms related to the long-term representation of the water cycle are revealed, in which summer dry and hot conditions at the surface, resulting from insufficient evaporation, can persist despite insufficient net solar radiation (a result of unrealistic cloud-radiative feedbacks).
 Modern climate models are highly complex numerical constructs, encoding the laws of dynamics and thermodynamics for relevant geophysical fluids. Also, in these models, because of computational and theoretical limitations, explicitly resolved dynamical mechanisms coexist with parameterized physical processes. As stated by Palmer , “the predictability of weather and climate forecasts is determined by the projection of uncertainties in both initial conditions (ICs) and model formulation onto flow-dependent instabilities of the chaotic climate attractor.” Loss of predictability occurs not only because of uncertainties in initial conditions (usually thought particularly relevant for weather forecasting), but also due to model formulation (particularly relevant to climate modeling). It is difficult to separate these two kinds of sources of error, and this seriously hampers the evaluation of climate modeling systems. In fact, the response of a climate model to parameterization changes can lead to unexpected biases, and the tuning, validation and improvement of these complex tools represents a difficult challenge to climate modelers. Improving validation methodologies is thus an important target of future climate research [Intergovernmental Panel on Climate Change (IPCC), 2001, chapter 10].
 Of particular concern in this context is the compensation between model errors: such compensation may well produce seemingly correct results for incorrect reasons. A recent and still largely unresolved example is the representation of the seasonal water cycle over continental-scale land surfaces. Many atmospheric models currently suffer from an artificial summer drying and warming over major mid-latitude continents. Some investigators [e.g., Machenhauer et al., 1998] suggest that the causes may be ascribed to large-scale biases inducing subsidence; others have focused on physical parameterizations, addressing radiation and land surface processes [e.g., Betts et al., 1996; Wild et al., 1996; Murphy, 1999; Seneviratne et al., 2002; Hagemann et al., 2002]. The range of these investigations suggests that many different physical processes are probably relevant to the problem.
 Regional Climate Models (RCMs) were historically developed as physically based downscaling tools, in which the limited-area climate model was driven by time-dependent lateral boundary data, either from an analysis or from a coarser-meshed general circulation model [e.g., Giorgi, 1990; Jones et al., 1995]. Recently, however, as the critical role played by subgrid-scale processes has become fully appreciated, RCMs have been increasingly applied to study physical processes [e.g., Frei et al., 1998; Giorgi and Mearns, 1999].
 The goal of this paper is to develop an improved methodology for the assessment of the quality of an RCM system in the presence of limited predictability. Here the term predictability is intended as sensitivity to ICs in the context of an RCM, that is, a limited area model (LAM) with prescribed lateral boundary conditions (BCs). In order to pursue our goals, a detailed analysis is undertaken of one RCM's ability to represent the natural interannual variability on monthly and seasonal time-scales within the ERA-15 period 1979–1993. The RCM is driven at its lateral boundaries by the observed synoptic-scale variability, and the model is evaluated for its ability to reproduce climatic fluctuations on monthly and seasonal timescales, within predictability bounds derived from an ensemble experiment. A previous version of this methodology has been used in month-long integrations [Lüthi et al., 1996; Fukutome et al., 1999], and more recently in Giorgi and Shields , Small et al.  and Dutton and Barron .
 The validation of a climate modeling system relative to the interannual variability has two major advantages. Firstly, unlike the validation based on seasonal or yearly climate means, the method is much less permissive with respect to the practice of model tuning and associated misleading effects. In fact, even a hypothetical perfectly tuned model with an excellent representation of the longer-term mean climate may still exhibit deficiencies in representing interannual variations. Secondly, the methodology implicitly assesses simulated climatic differences (such as differences between warm and cold winters), and this may to some extent be taken as a surrogate for climatic changes. A validation based on interannual variability can also assess the role of model biases in the simulation of climatic differences, one of the major open issues when using a modeling system for the simulation of climate change [IPCC, 2001].
 The main disadvantage of our validation methodology is that its applicability is restricted to simulations of interannual variability that contain some degree of determinism. In an RCM, the monthly mean climate is largely controlled by the forcing at the lateral boundaries and by long-term memory effects (such as those associated with soil moisture and snow cover) in the interior [Jones et al., 1995], as well as by model formulation. In contrast, our methodology is not applicable to interannual variations simulated by a coupled atmosphere-ocean GCM, where comparison with observations on a month-to-month basis is not meaningful, due to the lack of deterministic forcing. To some extent, however, our methodology is closely related to AMIP-type [Gates et al., 1999] studies on seasonal predictability in the tropics, driven by prescribed SST conditions.
 The interannual variability methodology of Lüthi et al.  is here extended to cover a set of continuous 15-year simulations and to include a treatment of uncertainty due to both model formulation and to predictability limitations. Special consideration will be given to the processes relating to the water cycle, due to their importance for the climate system (and typical model sensitivity associated with their parameterization), but also due to their potentially considerable influence on climate change. Such ensembles of long experiments, in which different combinations of physical parameterization options are activated, while long-term memories in the climate system are retained, can yield important insights into the underlying physical processes (see also recommendations in Giorgi and Bi ). Ensembles of (short) RCM experiments including different model formulations have also been recently analyzed by Yang and Arritt .
 The outline of the paper is as follows: In section 2, the most recent modeling changes introduced into the CHRM are documented; section 3 discusses the model's mean climate, including its ability to represent current climate variability, followed by a comparative assessment of the model's predictability and sensitivity to model formulation; finally section 4 provides an interpretation of the mechanisms uncovered by the sensitivity studies, together with some concluding considerations.
2.1. CHRM Regional Climate Model
 The CHRM is a climate version of the former mesoscale weather forecasting model of the German and Swiss meteorological services, known as the HRM (High Resolution Model) or formerly EM (Europa-Modell). This model has been used until recently as an operational numerical weather prediction (NWP) model at the Swiss and German weather services [Majewski, 1991; Majewski and Schrodin, 1994] and had been modified by Lüthi et al.  for application as a regional climate model. The model grid is a regular latitude/longitude grid (Arakawa type C) with a rotated pole and a hybrid vertical coordinate [Simmons and Burridge, 1981]. It includes a full package of physical parameterizations, including a mass-flux scheme for moist convection [Tiedtke, 1989]; Kessler-type microphysics [Kessler, 1969; Lin et al., 1983]; a radiation package [Ritter and Geleyn, 1992] including interaction with partial cloud cover (of the type described by Slingo ); a land surface scheme [Dickinson, 1984] with three soil moisture layers; an “extended force-restore” soil thermal model [Jacobsen and Heise, 1982], also capable of interacting with accumulated snow at the soil surface. Vertical diffusion and turbulent fluxes are based on the flux-gradient approach (of the classic Louis et al.  type) in the surface layer, while the parameterization of Mellor and Yamada  is used in the boundary layer and above.
 Recent changes in our regional climate modeling suite, in relation to previous work [Lüthi et al., 1996; Schär et al., 1999; Heck et al., 2001], were inspired by the need to extend simulations beyond the periods typically considered in the NWP context, and have come in three individual areas.
2.1.1. Land Surface and Soil Processes
 The Soil-Vegetation-Atmosphere Transfer Scheme (SVATS) and deep-soil upgrades were motivated by a desire for improved simulation of land surface balances of heat, water and momentum in a realistic and sustainable fashion on decadal timescales and over a very heterogeneous region. In particular, the soil water storage capacity had to be increased from the standard (shallow) NWP profile because soil water evolves freely after initialization of the RCM and is never corrected in the course of the simulation. Three soil moisture levels are therefore used to reach a total depth of 1.7 m. The soil profiles are initialized (never nudged) with ERA-15 data, retaining a “climatological” layer from 1.7 to 3.4 m, which acts as a fixed boundary condition, but is only accessed if the root zone layer dries beyond the air dryness point (ADP). In terms of soil thermal processes, the original “extended force-restore method” of Jacobsen and Heise  was modified in order to control winter soil surface temperatures: we included a representation of soil moisture freezing, similar to Lunardini , which imposes a latent heat release/uptake barrier in a 1 degree interval around the freezing point, as is also done in the BATS 1e [Dickinson et al., 1993] and LSM [Bonan, 1996].
 The Ritter and Geleyn  radiation package is of the delta-two-stream type and was originally conceived as a fairly comprehensive package, useful for both NWP and climate studies. For such a purpose it retains the interaction of three short-wave and five long-wave bands of radiation with cloud droplets (not yet with ice species), gases (H2O, a composite CO2 ensemble and O3) and five aerosol species. The key to the parameterization's interaction with clouds lies in the definition of the model's integrated grid box liquid water content, which is then used inside the radiation module to distinguish eight different cloud types and respective drop-size distributions (thus prescribing relative optical properties), as is described by Ritter and Geleyn  and originally formulated by Stephens . The model is thus capable of representing the cloud-radiation interaction in a variety of climatic conditions, including future climate scenarios, since the abundance and distribution of all radiatively active agents can meaningfully be represented: in particular, for cloud water, different atmospheric conditions will immediately feed back onto radiation and vice versa. In the CHRM suite of models this is accomplished by combining contributions from both grid–scale and subgrid–scale clouds and performing a weighted average in this fashion:
where q_l^RAD is the total liquid water passed to radiation (g kg−1); q_l^SGS is the total liquid water content of subgrid-scale clouds; q_l^GS is the total liquid water content of grid-scale clouds; PCC_SGS and PCC_GS are the fractional covers of subgrid-scale and of grid-scale clouds, respectively.
Equation (1) shows how the cloud-radiation feedbacks depend on the convection and stable precipitation parameterizations. For the grid-scale liquid water content the standard model formulation employs an approach based solely on relative humidity [Slingo, 1987], while the subgrid-scale portion is based on the diagnostic liquid water content provided by the convection scheme. The following is the standard CHRM implementation of the grid-scale Partial Cloud Cover (PCC_GS):
where q_TOT^GS = q_v^GS + q_l^GS is the total water at a grid point; q*_w(T) is the saturation mixing ratio over water at temperature T; the critical relative humidity at model layer σ is RH_crit = 0.95 − RH_r1 σ (1 − σ) (1 + RH_r2 (σ − 0.5)), with RH_r1 = 0.8; RH_r2 = ; RH_r3 = 1.0.
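As a minimal illustration of the vertical profile implied by the critical relative humidity expression above, the following Python sketch evaluates it at a given σ. The function and argument names are hypothetical and are not taken from the CHRM code. Because the value of the second coefficient is not given in the text, it is left as a required argument rather than assigned a default.

```python
def critical_relative_humidity(sigma, rh_r2, rh_r1=0.8):
    """Evaluate 0.95 - rh_r1 * sigma * (1 - sigma) * (1 + rh_r2 * (sigma - 0.5)).

    sigma : normalized vertical coordinate of the model layer, in [0, 1]
    rh_r2 : second coefficient; not given in the text, so it must be supplied
    rh_r1 : first coefficient, 0.8 as stated in the text
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    return 0.95 - rh_r1 * sigma * (1.0 - sigma) * (1.0 + rh_r2 * (sigma - 0.5))
```

Whatever value is supplied for the second coefficient, the expression returns 0.95 at σ = 0 and σ = 1, and 0.95 minus one quarter of the first coefficient at σ = 0.5, because the term containing the second coefficient vanishes there.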
 Recently, as a result of detailed analysis of the CHRM surface energy balance [see Hagemann et al., 2002], it was decided to implement the Xu and Randall  cloud diagnostics parameterization as a CHRM option, so that it could be used to test how a more physically based cloud cover diagnostic scheme (and associated liquid water path) would affect the atmospheric extinction of radiation and thus the surface energy balance. Following Xu and Randall, we have introduced the following alternative definition for PCC_GS:
where q_l^GS is the grid-scale liquid water (g kg−1); RH is the relative humidity; q*_w(T) is the saturation mixing ratio over water at temperature T; p = 0.25, α0 = 100 and γ = 0.49 are the dimensionless coefficients as in Xu and Randall . The two alternative formulations in equations (2) and (3) have been tested in different model realizations and will be contrasted later in the results section.
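A minimal Python sketch of a Xu and Randall type diagnostic is given here for orientation. It assumes the form in which this diagnostic is commonly written, C = RH^p [1 − exp(−α0 q_l / ((1 − RH) q*)^γ)], together with the coefficients quoted above; whether this matches the CHRM implementation term by term is an assumption. The function name, the treatment of saturated and cloud-free cases, and the requirement that the liquid water and saturation mixing ratio be passed in the same units are likewise assumptions of the sketch. Because of the exponent γ, the numerical result depends on the units chosen.

```python
import math


def xu_randall_cloud_cover(q_l, rh, q_sat, p=0.25, alpha0=100.0, gamma=0.49):
    """Diagnose a grid-scale partial cloud cover in [0, 1].

    q_l   : grid-scale cloud liquid water
    rh    : relative humidity as a fraction
    q_sat : saturation mixing ratio over water, in the same units as q_l
    """
    if q_sat <= 0.0:
        raise ValueError("q_sat must be positive")
    if rh >= 1.0:
        # Assumed convention: a saturated grid box is fully cloud covered.
        return 1.0
    if q_l <= 0.0 or rh <= 0.0:
        # Assumed convention: no liquid water (or no humidity) means no cloud.
        return 0.0
    saturation_deficit = (1.0 - rh) * q_sat
    return rh ** p * (1.0 - math.exp(-alpha0 * q_l / saturation_deficit ** gamma))
```

Unlike a diagnostic based solely on relative humidity, this form ties the cloud cover to the liquid water itself: for a fixed relative humidity the diagnosed cover increases monotonically with the liquid water content.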
2.2. Model Setup for the Numerical Experiments
 The experiments were integrated over a standard European domain (Figure 1), already used in earlier studies [Lüthi et al., 1996; Schär et al., 1999; Heck et al., 2001], with a grid spacing of approximately 56 km and a time step of 5 min. Twenty levels were used in the atmosphere and three layers in the soil. Land surface physiography and phenology were imposed every six hours by interpolating in space and time the monthly ISLSCP I [Sellers et al., 1994] climatological fields (e.g., LAI, vegetation cover fraction). The only other substantial data ingestion deviations from the operational NWP modeling system are due to the use of ERA-15 data for the lateral boundaries forcing, with an updating frequency of six hours, using the Davies  relaxation technique for temperature, atmospheric water and wind. Moreover, in the NWP operation of the HRM model, the soil water profile is calculated by the driving GCM and used to initialize and nudge the model, with the intent of controlling 2 m temperatures through a Bowen ratio approach, and not with multiyear soil water conservation objectives in sight. The nature of the climate simulations presented here requires more careful specification of the soil model framework and of the yearly evolution of surface and subsurface parameters and processes. For this purpose the deep soil temperature boundary condition is set to reflect the 1979–1993 surface temperature average at each grid-point, as is recommended for the extended force-restore soil model [Jacobsen and Heise, 1982].
 Three incremental CHRM model formulations are introduced, all based on the common model described so far: the first, SOIL (CHRM 2.1), makes use of an earlier configuration of the original NWP version of the soil model. In this version, as a result, infiltration of precipitation is hindered by the numerics of the vertical grid stretching (with soil layers corresponding to 2, 8 and 190 cm), and by physical limitations in the form of artificial impermeability deriving from an unrealistic treatment of the soil (cold) temperature barrier affecting soil water conductivity. As previous studies had not included the full yearly cycle, the lack of appropriate soil moisture recharge has only been noticed recently.
 The second model formulation, HYD (CHRM 2.2), by contrast, relaxes all artificial in-soil water flux constraints (and applies normal soil grid stretching), resulting in more realistic recharge and latent heat fluxes, including a reasonable seasonal contribution of transpiration originating in the root zone.
 Model version RAD (CHRM 2.3), which is a further development from HYD and addresses climatologically significant negative surface short-wave radiative biases that are present in SOIL and HYD, is used to uncover the mechanisms and feedbacks behind the well-known surface cold bias in the model. The method used here, rather than resorting to the tuning of the liquid water path fed to the radiation scheme, consists of calculating the radiatively active cloud liquid water by using the Xu and Randall  parameterization. The setup of simulation RAD therefore preserves the treatment of soil moisture fluxes in simulation HYD and includes the alternative PCC_GS formulation presented in equation (3). The model version used in the ensemble experiment with different initial conditions (section 2.3) is RAD (CHRM 2.3).
2.3. Predictability in an RCM
 In this study we will conduct and analyze 15-year-long RCM simulations using different model versions driven by the ERA-15 analysis (initialized on 1 January 1979). In order to test the sensitivity to initial conditions, we have in addition designed an ensemble experiment composed of four members, one comprising the first four years of the standard 15-year simulation performed with model RAD, the other three consisting of simulations started on 2, 3 and 6 January 1979 and continued until 31 December 1982, but otherwise identical. The set of initial conditions includes all prognostic variables as well as all land surface (snow/ice cover) and soil (complete temperature and moisture profiles) states at the initial time and at every grid point. The simulation start dates were chosen as would be done in a typical NWP environment, so that individual simulations would be related through a common synoptic situation and very similar in terms of soil moisture and snow cover states. However, our analysis (section 3.3) discards year 1 of the integration and during this time substantial soil and snow anomalies are allowed to develop in response to the spread in atmospheric evolutions in the ensemble.
 This IC-based approach differs from the approach taken by Christensen et al.  for the estimation of model internal variability, where the ensemble was composed of seven 1-year ensemble members (reapplying lateral boundaries from a single year) and only soil moisture was allowed to retain its memory of initial conditions; it also differs from the approach of Giorgi and Bi , where different combinations of initial (but atmosphere-only) and boundary conditions were perturbed in generating sets of seasonal ensemble members. The simulation length of each of our ensemble members (four years) provides a good sample size for the variability imposed by the lateral “perfect boundaries” nudging.
 The spread in model solutions generated by our ensemble (as illustrated in section 3.3) will be used to estimate the predictability limitations in our modeling system, as dependent on uncertainties in initial conditions, and will be presented together with the results from different model formulations in sections 3.4 and 3.5.
2.4. Observational Data
 Data sets used for validation purposes were mainly extracted from the Climatic Research Unit analyses [New et al., 1999] and the Alpine precipitation data set [Frei and Schär, 1998], both at 50 km and available exclusively over land. Additionally, ERA-15 reanalysis data at T106 truncation (excluding of course the fields used for the nudging) were also used for validation purposes in the interior of the domain, although only in instances in which other data from independent origins were not available. All data were available at monthly intervals for the entire simulation period.
 The common integration domain is shown in Figure 1, along with the subdomains to be used for time series calculations. Two letter labels identifying analysis domains of interest are listed in the caption and used in sections 3.4 and 3.5.
3.1. Mean Climate
 Previous month-long integrations with the EM family of models [e.g., Christensen et al., 1997] have shown that monthly precipitation biases were at most 1–2 mm day−1 in winter (EA, FR, AL and SP subdomains) and −2 to −1 mm day−1 in summer (AL, DA subdomains). The corresponding biases in temperature were at most −6 to −2 °C in winter (SW, AL, DA) and +2 to +4 °C in summer (FR, ME), while over the DA region the summer bias was between +2 and +6 °C. Another study by Lüthi et al. , concentrating on ensembles of January and July simulations, reported a similar geographical distribution of errors for precipitation, but the magnitude of the biases was smaller than in Christensen et al. , mostly positive in winter (0.6 to 0.8 mm day−1 over SW, GE and AL) and negative in summer (−0.6 mm day−1 over GE and AL). In these studies the model also showed a clear tendency to produce too much rain in the northern part of the domain, while becoming too dry in the southern part of the domain.
 In the new set of CHRM simulations the model has been run continuously for the 1979–1993 period, so that direct comparison to the aforementioned studies is complicated by spin-up and reinitialization issues: the current model needs to rely much more on the long-term behavior of its physical parameterizations than was the case with month-long and seasonal simulations. This is particularly relevant for root-zone soil moisture, which is characterized by a pronounced seasonal cycle and may contribute to the month-to-month “memory” of precipitation (see also the discussion by Schär et al. ).
Figure 2 shows maps of 15-year mean precipitation. Our simulation results are shown in the right-hand panels and compared with the CRU observed precipitation and the results of the ERA-15 reanalysis. Comparison shows that the yearly precipitation fields produced by simulations SOIL and RAD (which are the two extremes in this set of model formulations) both reproduce the correct distribution and amount of precipitation, especially in the region of localized maxima in northern UK, Scandinavia, the Alps and the Balkans. These maxima are fairly accurately positioned, despite some local differences in magnitude and extent. The north-south distribution tends to point to a slight overestimation in the north and underestimation in the south. The analysis in the ERA-15 data set is of coarser resolution and cannot reproduce some of the narrow regions of precipitation in the CRU and simulation fields, especially near coastlines. The southwards extent of the region with significant precipitation tends to be insufficient in both our simulations and seems to reflect a tendency for Mediterranean dryness. This bias is also present in the ERA-15 data.
Figures 3 and 4 show the precipitation bias for the entire domain, calculated as deviations from CRU data for the 1979–1993 period and averaged for winter and summer. The winter (DJF) precipitation bias maps show that the model has a small positive bias in the north, amounting to less than 1 mm day−1, and an underestimation in coastal areas and over the south, with a typical bias around −1 mm day−1 over Portugal. Some localized regions, over the north of the UK and the Alps (where a north-south dipole is visible), show a locally enhanced pattern, but this is in fact due to shifts of 1–2 grid points in the simulated precipitation relative to the observed precipitation. A large portion of the domain shows a bias contained in the −0.5 to 0.5 mm day−1 interval and differences between model versions are very small and localized. The ERA-15 precipitation is fairly successful over coastal areas (except for Norway) but overestimates precipitation in an extensive region in the northern part of the domain.
 The summer (JJA) bias in Figure 4, on the other hand, shows that SOIL has a drying problem in large portions of central Europe and in particular in the Danube and Alpine regions, with a bias exceeding −1 mm day−1 and −2 mm day−1 respectively. The bias is clearly more pervasive in simulation SOIL, covering most of central Europe, while being geographically more contained in simulations HYD and RAD. A very scattered positive bias of less than 1 mm day−1 is present in some parts of Scandinavia, which is somewhat worse in the model versions HYD and RAD with a larger soil moisture availability and correspondingly larger evapotranspiration, as will be seen later in this section. The ERA-15 analysis also shows discrepancies in relation to CRU data, in particular over the Alps (negative bias) and the UK, but is very successful in the Danube catchment region. In this regard, however, it should be recalled that the ERA-15 assimilation, despite using a fully interactive soil model, applies soil moisture increments which change the seasonal cycle [Douville et al., 2000].
Figure 5 shows the complementary horizontal distributions of winter (DJF) temperature bias at the surface, which was calculated in reference to CRU temperatures, corrected for differences in underlying orography (using CHRM orography as reference), using a 6.5 K km−1 lapse rate, as in Christensen et al. . The same correction was applied to ERA-15 temperature data. The bias is generally negative (especially so over the Alps) and clearly extends to the entire domain, including small portions of the southern regions, with local values as low as −4 K. Northern Scandinavia is an exception, with a local warm bias of about 2 K. The bias over the Alps is smallest in simulation RAD while there are some indications that the bias over the UK and Scandinavia is smallest in simulation HYD. The bias in the ERA-15 analysis (also reported by Viterbo et al. ) is very similar to that found in our model simulations, both in magnitude and in geographical distribution, especially so over the Iberian Peninsula and the Alps.
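The orographic correction described above can be sketched in Python as follows. This is a minimal illustration under the stated constant lapse rate of 6.5 K per km; the function and argument names are hypothetical, and the sign convention (an observation moved to a higher model surface becomes colder) is our reading of the procedure rather than a documented implementation.

```python
LAPSE_RATE_K_PER_KM = 6.5  # constant lapse rate quoted in the text


def adjust_to_model_orography(t_obs, z_obs_m, z_model_m,
                              lapse_rate=LAPSE_RATE_K_PER_KM):
    """Move an observed temperature from its own surface height to the model's.

    t_obs     : observed temperature (K or degrees C)
    z_obs_m   : surface height underlying the observation, in metres
    z_model_m : model surface height at the same location, in metres
    """
    height_difference_km = (z_model_m - z_obs_m) / 1000.0
    return t_obs - lapse_rate * height_difference_km


def temperature_bias(t_model, t_obs, z_obs_m, z_model_m):
    """Model minus observation, after the observation is height-adjusted."""
    return t_model - adjust_to_model_orography(t_obs, z_obs_m, z_model_m)
```

For example, an observation of 10 °C over a 500 m surface is compared with the model as 3.5 °C where the model surface lies at 1500 m, so that a model value of 2 °C there corresponds to a bias of −1.5 K rather than −8 K.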
 For the summer (JJA), Figure 6 shows how the bias is more differentiated between individual simulations: it is most prominent in magnitude and horizontal extent in HYD where there is a large-scale negative bias between −1 and −3 K, especially noticeable over the Iberian Peninsula and Central Europe, while simulation SOIL displays a very large region of positive bias over the SE portion of the domain, often exceeding +2 K. Simulation RAD appears to clearly reduce the warm bias seen in SOIL, while reducing the cold bias over most of the domain in relation to HYD, despite remaining slightly cold over most of the inland regions and over the Iberian Peninsula, when compared to SOIL. ERA-15 data show excellent agreement with CRU data, except over the Iberian Peninsula, where the analysis indicates an underestimation very similar to the one in the CHRM results.
3.2. Soil Moisture Evolution
 The soil moisture evolution in the three model formulation experiments strongly affects the interannual and seasonal variations in the simulations, as will be seen in sections 3.4, 3.5 and 3.6. As an example of a regional soil moisture evolution, Figure 7 compares the soil moisture levels from the first and last year of the three simulations, averaged over the Alps subdomain (see Figure 1). Bearing in mind that 1992 and 1993 were years of extremely low precipitation in this region (see Figures 9–10), while 1978–79 was a very wet winter, the January 1979 initial condition, imposed from ERA-15 analysis, shows all three model versions very near the field capacity level (shown as a weighted average over the domain with continuous lines). Already by inspecting the soil water levels in December 1979, it is clear, however, that the soil moisture in the root zone is not recharged equally in the three simulations, with SOIL recharging least and RAD recharging most. Simulations HYD and RAD achieve a stable, repeating soil moisture cycle between the first and fourth simulation years, depending on location, by exclusively interacting with the atmospheric water cycle. In simulation SOIL, on the contrary, the soil is losing water as a result of underestimating the recharge, despite sizable access to the climatological layer, in order to prevent soil moisture values under the air dryness point (ADP). This behavior has a cumulative effect over the course of the fifteen years: simulation SOIL is clearly achieving a much lower water level by 1993 (about 100 mm less over the domain average, much more pronounced locally), with a smaller amplitude of the yearly cycle, than either HYD or RAD. The difference between simulations RAD and HYD can be ascribed to a slightly more vigorous water cycle in RAD and to the warmer temperatures, which help water infiltration into the soil, due to the less frequent triggering of soil impermeability caused by freezing.
3.3. Predictability of Seasonal Means
 Prior to analyzing the interannual characteristics of the simulations (including biases and sensitivities to different physical parameterization choices), the predictability of seasonal means by a given model version is assessed using an ensemble experiment in which the model formulation is kept fixed (RAD). The ensemble consists of four simulations with durations of 4 years, and these are initialized from slightly different initial conditions (see section 2.3).
 Examples of results are shown in Figure 8, in the form of scatter diagrams, representing the seasonal mean responses of the four ensemble members over one particular subdomain (Alps), and illustrating the uncertainty associated with unforced internal variability induced by varying the initial conditions. Each data point represents a spatial average for a particular ensemble member over a particular season (left panels are for winter, and right panels are for summer), for total precipitation (top, mm day−1) and for 2 m temperature (bottom, °C), respectively. The results in Figure 8 show that, given an initial (January 1979) model spread of about 0.4 mm day−1 (and corresponding soil moisture and snow cover values) together with a 1.6 K temperature spread, the summer precipitation uncertainties arising from the model's predictability are contained in a 0.2 to 0.8 mm day−1 interval, while temperature uncertainties in the summer range from 0.3 to 0.6 K. The behavior of individual model realizations is in no way systematic, be it by variable or by season. The vertical spread between data points is a measure of the limited predictability due to the chaotic nature of the model's dynamics; this uncertainty estimate will be later contrasted with the uncertainties stemming from alternative model formulations.
 To this end, we consider only the years 1980 to 1982, since January 1979 is made special by spin-up issues and also contaminated by different simulation lengths: even a single storm, missed by starting the model on subsequent days, could locally affect the monthly means and prevent comparability with 1980–1982 winter means. After removing the interannual variability, which was done by calculating the yearly anomalies of each ensemble mean by season (with respect to the 1980–1982 mean), the resulting 12 values were used to calculate an anomaly variance at each grid point. Subdomain standard deviations were thus derived and applied to the comparative analysis of model uncertainties stemming from model formulation (Figures 9–12, which are introduced in the next section).
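One possible reading of this procedure is sketched below in Python for a single grid point (or subdomain average) and a single season. It assumes that the interannual signal is removed by subtracting, for each year, the mean over the ensemble members, which leaves 4 members × 3 years = 12 residuals. The function names, the data layout and the normalization of the variance by the number of residuals are assumptions of the sketch, not details taken from the text.

```python
import math


def member_residuals(values):
    """Remove the year-to-year signal common to all ensemble members.

    values[m][y] is the seasonal mean of member m in year y.
    Returns one residual per member and year.
    """
    n_members = len(values)
    n_years = len(values[0])
    residuals = []
    for year in range(n_years):
        year_mean = sum(values[m][year] for m in range(n_members)) / n_members
        for m in range(n_members):
            residuals.append(values[m][year] - year_mean)
    return residuals


def ensemble_spread(values):
    """Standard deviation of the residuals, used as a predictability bound."""
    residuals = member_residuals(values)
    # Assumed normalization: divide by the number of residuals.
    variance = sum(r * r for r in residuals) / len(residuals)
    return math.sqrt(variance)
```

With this construction, members that follow the same interannual sequence yield a spread of zero however large the year-to-year differences are, so the measure isolates the part of the variability attributable to the initial conditions.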
3.4. Interannual Variability of Precipitation
 Previous EM-CHRM studies considering interannual variability [e.g., Christensen et al., 1997; Lüthi et al., 1996; Fukutome et al., 1999; Heck et al., 2001] have focused on ensembles of short, 1–5 month simulations, to establish the statistical significance of the model's skill in representing interannual variability of precipitation and temperature. In particular, Lüthi et al.  have shown that, in general, the model possesses some skill at representing interannual variability of precipitation in winter, when the simulated signal is large, while not being able to achieve the same level of skill in summer, when the performance of the model has to rely more on the quality of its physical parameterizations.
 In this subsection we analyze the interannual variability in precipitation for the 1979–1993 period and the ability of the CHRM model to regionally represent it. The results are presented in the form of scatter diagrams of model and observed (CRU) subdomain seasonal averages. Before proceeding, we use the top left panel of Figure 9 (subdomain SW) to explain their use. On the abscissa are CRU observational data, while on the ordinate are model results. In each panel, the results from the three simulations SOIL, HYD and RAD, as well as the ERA-15 reanalysis, are represented (using different symbols and a common year label), while each of the three simulations, together with the ERA-15 reanalysis, is summarized by its regression line. Perfect simulation data would be located on a diagonal line (bottom left to top right) across each panel. This type of plot allows three different types of error to be distinguished. First, an overall wet or dry bias can be identified from the location of the regression line above or below the diagonal (e.g., ERA-15 and RAD). Second, a systematic bias in representing the interannual variability is present when the slope of the regression line does not match that of the diagonal (e.g., ERA-15 has a tendency to overestimate precipitation more in wet years than in dry years in absolute terms, albeit not necessarily so in relative terms). This behavior will be referred to as a misrepresentation of the “precipitation sensitivity,” and it pinpoints a problem in simulating differences (here between wet and dry seasons). This kind of consideration may be relevant to assessing the suitability of a model for conducting climate change scenarios, as is recommended in the latest IPCC  report. Third, the scatter of individual data points around the regression line represents an unsystematic error contribution. This error contribution may partly be explained by the limited predictability of the system (see the previous section), which is summarized for each variable and region by the grey polygon of height 2 · σ (where σ is the standard deviation of the ensemble results from the previous section) straddling the “perfect simulation” diagonal across each diagram. A “perfect model” (i.e., a model with perfect physics and dynamics), driven by perfect boundary conditions, would produce results contained in this grey area, assuming also perfectly accurate observations.
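The three error types read off the scatter diagrams can also be expressed numerically. The sketch below is illustrative only and is not the analysis code used in this study: the function name, the use of an ordinary least-squares fit, the residual standard error as the scatter measure, and the "fraction inside the band" statistic are all assumptions.

```python
import numpy as np

def scatter_diagnostics(obs, model, sigma):
    """Summarize a model-versus-observation scatter diagram for one subdomain.

    obs, model: 1-D sequences of seasonal means (one value per year).
    sigma: ensemble standard deviation; the grey band of height 2*sigma
           straddling the diagonal is taken here as |model - obs| <= sigma.
    """
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    # Least-squares regression line: model = slope * obs + intercept
    slope, intercept = np.polyfit(obs, model, 1)
    # (1) Overall bias: mean vertical offset from the diagonal
    mean_bias = float(np.mean(model - obs))
    # (2) Sensitivity error: departure of the slope from the diagonal's slope of 1
    slope_error = float(slope - 1.0)
    # (3) Unsystematic error: residual scatter about the regression line
    residuals = model - (slope * obs + intercept)
    scatter = float(np.sqrt(np.sum(residuals ** 2) / (obs.size - 2)))
    # Fraction of years lying inside the predictability band around the diagonal
    within_band = float(np.mean(np.abs(model - obs) <= sigma))
    return {
        "mean_bias": mean_bias,
        "slope": float(slope),
        "slope_error": slope_error,
        "scatter": scatter,
        "within_band": within_band,
    }
```

A slope above 1 would correspond to an overestimated precipitation sensitivity, and a slope below 1 to an underestimated one.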
 The winter precipitation in Figure 9 shows very good skill of the model at reproducing interannual variability, as data lie principally along the diagonal over most regions. The subdomains with the best reproduction of the signal are the Alps and France, for which both precipitation amount and sensitivity are almost perfectly represented for all four data sets. Germany, Spain, SE Mediterranean and the Danube region show good simulation quality, but slightly less so in years of high precipitation, which are overestimated in the north and underestimated in the south; Scandinavia (SW) and the east (EA) domains display the largest errors, with pronounced overestimation in SW (but less so than ERA-15) and poor slope of the regression line for SP and ME. Modeled precipitation regression lines over subdomains SW, EA, GE, DA show some degree of overestimation, but at the same time lie between ERA-15 and CRU estimates. It is of interest to note that, for most of the data sets, the slope of the regression line corresponds very closely to reality (is parallel to the diagonal), but remains poorest in the south. The uncertainty associated with unforced variability, ranging from 0.2 to 0.6 mm day−1, is largest in subdomains further from the entry point of storms (NW), and is generally at least as large as the spread in results among the individual model versions.
 During summer (Figure 10) the grey area is much thicker, in response to the reduced predictability. The results nevertheless show how the SOIL simulation tends to be consistently too dry, especially in the south and southeast, and also how the slope of the regression line (the precipitation sensitivity) is generally underestimated. The dry bias is substantially reduced in simulation HYD and RAD over most domains, while the underestimation of the precipitation sensitivity is at best only marginally improved. The regions displaying the most pronounced dry bias are the Alps and the Danube; Germany and France are relatively better represented, while the representation of interannual variability of precipitation over Spain and the Mediterranean shows surprisingly good agreement with observations, despite the small signal and the identified bias. Simulation RAD is closest to the observational data in the majority of subdomains, except in SW and FR. The uncertainty stemming from alternative initial conditions is larger than in winter, ranging from 0.2 to 1 mm day−1, but in some subdomains is comparable to the differences stemming from model formulation, since individual model versions produce quite different bias and precipitation sensitivity results. The magnitude of the uncertainty is generally larger in the east and near mountain ranges.
 In general, the signal under study displays enough interannual variability that the model can claim skill at representing this variability over most subdomains and over the entire yearly cycle, even though the model errors for 1979–1993 are relatively large. The skill, however, is lowest in the summer period and farther from the principal entry point of storms, at the NW corner of the domain.
3.5. Interannual Temperature Variability
 The winter temperature scatter diagrams in Figure 11 show, for most domains, good skill at representing the temperature sensitivity (the slope of the regression lines), while there is a cold bias as large as −2 K in several domains (e.g., France, Spain and Alps). Differences between individual simulations are quite small, but comparisons to the uncertainty associated with limited predictability (ranging from 0.1 to 0.6 K) indicate that model formulation is a more important source of uncertainty for this variable in winter.
 Summer temperature scatter diagrams in Figure 12 show how most data are roughly aligned parallel to the diagonal (thus correctly representing temperature sensitivity), but the systematic errors are quite large, as much as 2 K. The temperature field displays the largest differences between simulations, with simulation SOIL always much warmer and simulation HYD much colder than the other two. Over the Danube region, simulation SOIL is systematically over 1 K warmer than CRU, while simulation HYD is systematically 1 K colder; simulation RAD has the least bias, well in agreement with ERA-15. The regression line of simulation SOIL is closest to the diagonal in several domains, but this is a clear case of error compensation and occurs at the expense of pronounced underestimation of summer precipitation in most areas (contrast with Figure 10). Simulation HYD is generally the coldest, while simulation RAD is a clear improvement over HYD in all subdomains, being the one with the least bias over the Danube region, and being within 1 K error bars over the Alps, Sweden, Germany, France and the SE Mediterranean, with the exception of Spain. It is also noteworthy that ERA-15 behaves very well over most subdomains, with the exception of Spain, which shows a bias signature very similar in geographic distribution and magnitude to the one in our model.
 The uncertainty stemming from model predictability (0.2 to 0.6 K) is comparatively much less important for a variable and period in which large discrepancies exist between solutions produced by alternative model configurations, and especially so in the south.
3.6. Surface Energy and Water Fluxes Effects
 The soil-atmosphere feedbacks in the water cycle, which affect the land surface temperature and precipitation budgets, can be better understood by considering the surface energy and water fluxes and contrasting them in all three model formulation experiments. The fields which are mostly affected in the three different simulations are the surface net short wave flux and the surface latent heat flux. The three experiments are compared with the ERA-15 fluxes in Figures 13 and 14, this time in the form of the mean seasonal cycle of the 15-year period, again organized by region. The use of ERA-15 solar fluxes as a proxy for observations is justified by Wild et al. , who showed that the incoming solar radiation is in general well reproduced by ERA-15 and appropriate for this type of basic validation in regional climate studies. It must be remembered, however, that the latent heat fluxes in ERA-15 are mostly a model product, despite the continuous data assimilation.
 The absorbed solar radiation of simulation SOIL (Figure 13) is in good agreement with the fluxes of ERA-15, with a maximum local overestimation of 20 Wm−2 in subdomain DA (corresponding to summer positive temperature biases) and an underestimation over subdomain SW (−40 Wm−2 at the peak). Most subdomains exhibit however significant drying (in several regions as much as 40 Wm−2 at the peak of the growing season), as evident from the depressed latent heat flux simulated by the model (Figure 14), also associated with a general attenuation of the soil moisture annual cycle, as was seen in Figure 7.
 The surface latent heat fluxes (seen in Figure 14) in model HYD are in better agreement with those of ERA-15 than those in SOIL (except over SW and GE where some overestimation is present). However, this extra water flux into the atmosphere feeds the almost exclusive growth of low-level clouds (not shown) which have the general effect of depressing the net surface short wave over the growing season by 10–30 Wm−2: in Sweden the June biases of −40 Wm−2 are made to be about −60 Wm−2 in this model formulation.
 Model RAD, with about the same total water content as HYD but a different diagnostic formulation of cloud cover by layer (and corresponding liquid water path), displays solar radiation with the opposite tendency, substantially correcting the bias by almost 40 Wm−2 over Sweden and also over Germany, in eastern Europe and the Alps. The corrections due to RAD are most pronounced in central and northern Europe and are also found (although with slightly smaller magnitude) in the net radiation plots (not shown), so that the response to the introduction of the Xu and Randall  cloud diagnostic is clearly of benefit to the surface energy balance and explains the improved results in the temperature plots. The representation of the surface latent heat fluxes in RAD is very similar to that in HYD.
 The short wave plots for the southern domains (SP, ME) show that radiation is rather well represented in this region and insensitive to model formulation. The latent heat flux systematic difference from ERA-15 for these regions is significant, but also suggestive of spring and fall errors in the initiation and termination of vegetation activity. The static phenology in the ERA-15 land surface parameterization is mostly responsible for these latent heat flux differences, which also helps explain the ERA-15 cold bias during the cold season in the southern portions of the domain (especially Spain), where sufficient energy is available but vegetation should be dormant instead of transpiring.
4. Discussion of the Simulated Water and Energy Cycles
4.1. Comparison of Results From Predictability and Model Formulation Experiments
 Uncertainties originating from limited model predictability have been found to be generally smaller than those originating from alternative model formulations. Unlike the results from experiments with alternative model formulations, no systematic behavior was uncovered in the time frame of the ensemble simulation, with a spread of solutions continuously converging and diverging, depending on location, variable and season, but no defined bias or trend. The summer precipitation field appears to be the one with the greatest sensitivity to initial conditions (although in general of comparable or smaller magnitude than the sensitivity to model formulation), arising from soil moisture and snow cover memory effects, the timescales of which will need to be investigated further.
4.2. Mechanisms Uncovered
 The results of the experiments with alternative model formulations uncovered clear mechanisms associated with the water cycle: a compromised soil moisture recharge (in SOIL) causes systematic early depletion of soil water, leading to a dry warm bias in summer. Comparatively less low-level cloud formation in the growing season and limited latent heat fluxes also contribute to a warmer summer climate. A more realistic, self-sustaining, water cycle (in HYD) also enhances summer precipitation, but allows excessive interplay of low-level cloud-radiation feedbacks, which, together with the enhanced latent heat fluxes over the growing season, produces a climate significantly colder than the observed climate. An alternative cloud-radiation feedback intensity, achieved by altering the cloud diagnostic (in RAD) and the resulting short wave attenuation throughout the troposphere, produces a more reasonable radiative balance at the surface (and associated temperatures) while sustaining a realistic water cycle.
4.3. Biases and Their Sources
 Interpreting these results in terms of biases and related compensation of model errors, it is clear that model SOIL produces realistic surface temperatures during the growing season (except over the DA region) by compromising the soundness of its water cycle. This is characterized, for instance, by the significant dry biases and the serious depletion of the soil moisture reservoir. The surface drying contributes to the substantial precipitation biases, which are largest over the eastern side of the domain and in years of relatively abundant precipitation. The unrealistic representation of the energy and water cycles and their interplay means that many of the apparently good results in model SOIL are in fact due to compensating model errors.
 Model HYD, on the other hand, is capable of representing a sound and self-sustaining water cycle, mostly addressing the precipitation, latent heat flux and soil moisture errors in model SOIL, but at the cost of introducing a severe surface cold bias, partly explained by an underestimation of short wave radiation at the surface.
 Compared with the other two model versions, model RAD more realistically represents both the energy and water cycles, with the smallest net short wave and latent heat flux biases, coexisting with a sustainable soil moisture cycle and one of the best representations of precipitation, in both seasons. The summer temperature bias is still significant, but represents an important improvement over the biases in models SOIL and HYD, while it also derives from a more realistic net surface radiative balance.
 It is particularly interesting to notice that the increase in solar radiation between simulations HYD and RAD and the increase in evapo-transpiration between simulation SOIL and HYD are just about the same and occur over the same regions. This again confirms the diagnostic of error compensation in the treatment of the soil-water and energy cycles in SOIL. The Xu and Randall  corrections are largest in regions of high mean cloudiness (e.g., Scandinavia) and are much smaller in regions of infrequent cloud cover (e.g., Spain), so that temperature biases are virtually unaffected there. The summer positive bias in net short wave, which is present in model RAD over the DA subdomain, corresponds to the largest deficit in latent heat flux over the domain. The same observation applies, with smaller involved amplitudes, to subdomains SP and ME.
 The uncovered mechanisms and related error compensations do not, however, explain all biases. A reasonable interpretation of the winter surface (2m) temperature bias is that it partially reflects the winter error in the ERA-15 data (which consists of a domain-wide −2 K bias [see Viterbo et al., 1999]). A more complete explanation also needs to take into account the characteristics of the force-restore soil model used in the CHRM, which has only two layers, and therefore introduces large phase and amplitude errors at timescales other than diurnal and annual (see also the discussion by Jacobsen and Heise ). The soil model cannot retain sufficient memory of the summer heat storage (and is also influenced by a too cold boundary condition at the lowermost level, corresponding to the 15-year surface temperature average in ERA-15). The model therefore tends to quickly reflect and respond to the cold bias in the driving data traveling through the domain from the lateral boundaries. Moreover, CHRM underestimates the diurnal temperature range (not shown, but confirmed from a comparison to CRU data). The bias is in general most evident in the maximum (daytime) 2m temperature field, both in summer and winter (albeit almost exclusively in the southern extremes of the domain for winter), which is the field affected most by both the evapo-transpiration corrections in HYD and the cloud-radiation alterations in RAD. The impact of the grid-scale liquid water diagnostics scheme, revealed by differences in simulation RAD versus simulation HYD, is also confined almost exclusively to daytime maximum temperatures.
 During the summer, when the ERA-15 temperature bias is relatively small, advection is relatively weak and the model is free to achieve its own surface energy balance. This is much more meaningful under the new conditions imposed by the Xu and Randall cloud diagnostics, despite the fact that its partitioning into sensible and latent heat is locally still favoring too high Bowen ratios.
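For reference, the partitioning mentioned above can be written in its standard textbook form. The symbols are not taken from this paper and sign conventions vary between models: $R_n$ denotes the net surface radiation, $H$ the sensible heat flux, $LE$ the latent heat flux, $G$ the ground heat flux and $B$ the Bowen ratio.

```latex
\begin{align}
  R_n &= H + LE + G, \\
  B   &= \frac{H}{LE}.
\end{align}
```

Under this convention, a Bowen ratio that is too high means that too large a share of the available energy $R_n - G$ goes into sensible heating rather than evaporation.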
 As expected, winter precipitation appears to be well represented, despite some local overestimation, and appears to be unaffected by the physical parameterization changes introduced.
5. Conclusions and Outlook
 Both the predictability of the climate system and the uncertainties related to model formulation must be considered in testing, understanding and improving a climate modeling system. Both factors have been addressed in this study, which expands the interannual variability method already applied by Lüthi et al. . The nature of the methodology, and the involved computational costs, indicate that RCMs can provide sound and affordable test beds for physical parameterization packages in the context of climate studies. Analysis of our simulation results suggests the following:
 1. The model has skill at representing interannual variability in precipitation and surface temperature, more so in winter, despite fairly sizable (but within the state of the art) biases in both precipitation and temperature.
 2. Interannual variations in temperature are generally well represented, while the model better represents precipitation in relatively dry years, especially in summer and in the south.
 3. The comparison of model predictability and uncertainties stemming from different model formulations indicates that the latter are relatively more important over most of the European region, except for precipitation in summer, where some subdomains indicate a moderate loss of predictability. The relevance of local physical processes is enhanced at times when the large scale driving has less influence, most notably in summer, and farther from the entry region of storms.
 4. If the in-soil water flux is not realistically represented, significant drying will result in the root zone after the first few years of simulation, creating corresponding deficits in precipitation and large positive temperature biases in most central European regions, especially in the Danube catchment region.
 5. Correcting the large deficit in surface solar radiation has improved the model's representation of the energy and the water cycles, especially in summer; this is also true, in winter, of elevated regions such as the Alps.
 The new series of simulations that will be undertaken in the course of the next year will use driving data from HadAM3 and ECHAM5 simulations for current climate conditions, and also, as soon as available, from ERA-40 data; this should allow for better understanding of the influence of the lateral boundary forcing on the remaining biases. Tests will also be performed with an expanded domain, in order to study the ability of the model to develop its own solution in a larger interior region. Furthermore, a more advanced and comprehensive SVATS will be coupled, including a multilayer diffusive soil thermal model, which should address the limitations of the force-restore method for this type of long-term study.
 This research was supported by the 5th Framework Program of the European Union (project MERCURE, contract ENV4-CT97-0485), by the Swiss Ministry for Education and Science (BBW contracts 97.008) and by the Swiss National Science Foundation (NCCR Climate Project 2.2). We are extremely grateful to the German Weather Service (DWD) for allowing use of the HRM base model. Special thanks to D. Majewski, B. Ritter and E. Heise for numerous suggestions and practical help with the code. This work could not have been accomplished without the fundamental support of Mr. B. Loepfe and the staff of ID-ETH, who provided assistance in data transfer and guaranteed continuous and reliable supercomputer operations. The authors wish to acknowledge use of the NCAR Graphics and Ferret software for analysis and graphics in this paper. NCAR Graphics is a product of the University Corporation for Atmospheric Research; Ferret is a product of NOAA's Pacific Marine Environmental Laboratory. We are grateful to Dr. K. Taylor of PCMDI, Lawrence Livermore National Laboratory, USA, and to an anonymous reviewer, for their help in improving the manuscript.