An evaluation is undertaken of the accuracy with which the Joint UK Land Environment Simulator (JULES) can simulate snow cover and depth when driven using data from the Hadley Centre regional climate model. The JULES model provides the facility to diagnose the thermal and hydrological state of the land surface and soil given time-varying inputs of air temperature, wind speed, humidity, shortwave and longwave radiation, and precipitation. The observed data set used in this study consists of daily snow depth measurements at 601 climate stations with more than 15 years of observations in the period from January 1976 to December 2000. In this study, the JULES model was driven using two data sets at 25 km horizontal resolution: one produced using the U.K. Met Office Hadley Centre regional climate model HadRM3-P (RCM), the other in which regional climate model precipitation and air temperature data were replaced with observed values (RCM+PT). The results indicate good agreement between the land surface model simulations and observations of snow cover at climate stations. The median snow cover accuracy indices for all 601 stations were 89% and 91% for the RCM and the combined RCM+PT driving data sets, respectively, with only a small interannual variation. In contrast, the differences between modeled and measured snow depth were much larger. The median values of mean snow depth bias were similar, −0.4 cm for RCM and −1.2 cm for RCM+PT; however, the RCM simulation was found to overestimate the observed snow depth at more than 25% of climate stations. The extent to which the results from RCM-driven simulations match observed data is strongly related to the accuracy of the RCM precipitation. The large overestimation has a significant impact on the snow mass simulation and the assessment of extreme values in the mountains. 
We note that even if snow cover can be simulated with a high degree of accuracy, this should not imply a similarly high degree of accuracy in the simulation of snow depth. Model performance was poorest in regions of significant topographic heterogeneity and our findings suggest that the most promising additional model developments should be directed toward computationally efficient representations of subgrid topography.
 Accurate specification of snow cover and depth in land surface models is essential not only for accurate simulation of global climate [e.g., Barnett et al., 1989; Randall et al., 2007], but also because snow cover and depth are increasingly used to initialize land surface models in hydrological and weather forecasting models [Walsh and Ross, 1988], and to provide better information for water and land management, especially under scenarios of climate, land use, and other environmental change [Hall, 2004]. The extent of snow cover can alter energy and moisture fluxes between the land surface and the atmosphere because the albedo of snow covered surfaces is typically 0.6–0.9, while the albedo for snow-free soils and vegetation is generally less than 0.3 [Dingman, 1994]. In addition, the reduced thermal conductivity of snow compared with soil means that snow, when present in significant quantities, acts to insulate the soil surface from air temperature changes [Stieglitz et al., 2003].
 Despite the importance of snow in the Earth system, the representation of snow processes in land surface models has been identified as a key area in which future improvements are required [Dirmeyer et al., 2006; Roesch, 2006]. Indeed, in a comparison of models used in the Intergovernmental Panel on Climate Change's Fourth Assessment Report, Roesch [2006] showed that while most of the land surface models accurately predicted the onset of snow accumulation, they tended to overestimate snow mass in spring, and predicted a delayed onset of spring melting. Part of the difficulty in modeling snow processes in the global climate lies in the discrepancy between spatial and temporal scales at which the physics of snow accumulation and ablation operate; these scales are much finer than those resolved in most models of global climate. This heterogeneity is most pronounced in regions of significant orography, where topographic gradients influence local precipitation [Blöschl and Kirnbauer, 1992; Barros and Lettenmaier, 1994; Daly et al., 1994]. An additional influence on snowpack heterogeneity is land cover, especially forest cover. Results from the Snow Model Intercomparison Project (SNOWMIP2) comparison of 33 snowpack models of varying complexity have shown that “many current land surface models represent a sufficient range of processes that can be calibrated to reproduce the mass balance of forest snow packs well while simultaneously providing reasonable estimates of canopy albedos and temperature” [Essery et al., 2009, p. 1132]. Consequently, while no single best model could be identified in the SNOWMIP2 study [Rutter et al., 2009], the major difficulty in accurately simulating snow processes lay in the estimation of reliable model parameters.
 At larger spatial and temporal scales, regional climate models (RCMs) provide a promising set of tools for dynamical downscaling of precipitation, as well as providing theoretical insight into regional climate processes [e.g., Schär et al., 1999; Frei et al., 2003]. However, to date, few high-resolution data sets of observed snow cover and depth have been available to test the snow components of land surface models over broad geographical regions with complex and heterogeneous topography and land cover. Existing assessments of land surface model simulations of snow cover have generally made only limited comparisons with observations of snow cover, or have presented snow model efficiency statistics calculated over large regions or elevation zones. For example, Sheffield et al.  compared four different land surface models with observed snow cover from the Interactive Multisensor Snow and Ice Mapping System measured during a 3 year period over the conterminous United States. They found that the snow cover accuracy varied with geographic location and elevation, where larger discrepancies in higher elevation regions were related to the inherent difficulties in modeling snow processes over variable topography. Pan et al.  evaluated the same models over the same domain using snow water equivalent (SWE) measured at 110 Snow Telemetry (SNOTEL) sites. They reported a consistent model underestimation of maximum annual SWE and stressed the importance of precipitation and air temperature bias corrections. In a recent study Habets et al.  presented a multiobjective validation of a land surface scheme over France, which includes the evaluation of snow depth bias. Instead of comparing station data with grid box information, they compared the measurements and simulations averaged in five different elevation zones. The results indicate low average bias, which tends to increase with increasing elevation.
 The aim of this paper is to test the Joint UK Land Environment Simulator (JULES) together with input from the Hadley Centre Regional Climate Model and a comprehensive archive of historical observations of precipitation, temperature, and snow cover available at over 600 locations in Austria, over 25 years. Our principal objective is to assess the performance of the JULES land surface model in simulating spatial and temporal dynamics of snow cover and snow depth. We are particularly interested in investigating the effects of topography and intraannual and interannual variability in snow model efficiency and evaluating the benefits of using regional climate model output to drive hydrological models.
 The paper is organized as follows. First, the land surface model is introduced and the methodology used for the snow validation is presented. Next, the study region, in situ snow depth measurements and model driving data are described. We then evaluate the accuracy with which the JULES model simulates spatial and temporal patterns of snow cover and depth over Austria. Finally, the results are discussed in the context of snow subgrid variability and the accuracy of existing snow cover products. In conclusion, we offer some suggestions for further improvements to land surface models of snow processes.
2.1. Land Surface Model
 In this work, we have used the Joint UK Land Environment Simulator (JULES) [Blyth et al., 2006]. This model is based on the Met Office Surface Exchange System (MOSES) [Cox et al., 1999; Essery et al., 2003] and provides the facility to diagnose the thermal and hydrological state of the land surface and soil given time-varying inputs of air temperature and humidity, wind speed, shortwave and longwave radiation, and precipitation. MOSES was originally developed for the Hadley Centre Global Climate Model (GCM) to calculate surface-to-atmosphere fluxes of heat, water, momentum, CO2, and CH4, and to model the surface and subsurface variables that affect them. Within JULES there are four soil layers in the vertical direction, with a temperature and soil moisture content associated with each. In common with most land surface schemes used in climate-modeling applications, JULES assumes that water and heat move in the vertical direction only.
 Snowfall is either supplied directly as a model input or estimated from the total precipitation using a threshold air temperature, Tt. The snowpack is represented by a single model layer which is combined with the surface layer of the soil model [Essery et al., 2003]. When there is snow on the ground, the surface layer has the combined depth and thermal conductivity of the snow layer and the surface soil layer. Snow is given a constant thermal conductivity and a constant density. The heat capacity of snow is neglected, but snow decreases the bulk thermal conductivity of the surface layer due to both the increased layer thickness and the different conductivities of snow and soil. The surface skin temperature is not allowed to exceed 0°C while snow remains on the ground, and the heat flux used to melt snow is diagnosed as a residual in the surface energy balance. Meltwater drains immediately from the snow and is partitioned into soil infiltration and runoff; there is no storage or freezing of liquid water in snow. A spectral albedo scheme calculates separate diffuse and direct-beam albedos in visible and near-infrared bands for vegetation tiles. Snow aging is parameterized using a prognostic grain size. Albedo is weighted by the snow cover fraction, fs = d/(d + 10z0), where d is snow depth and z0 is snow-free tile roughness length. This fraction is used together with the snow-free albedo, α0, and deep snow albedo, αs, to calculate the adjusted albedo, α = fsαs + (1 − fs)α0 [Essery et al., 2003]. Surface heat fluxes are calculated using an implicit scheme with four layers with thicknesses of 0.1, 0.25, 0.65, and 2 m and the surface temperature is diagnosed using a surface energy balance. Snow is melted if this temperature exceeds the melting point (Tm) until either the surface temperature reaches the melting point, or the snow is exhausted [Essery et al., 2003].
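The albedo weighting described above can be illustrated with a minimal sketch. The function names and the example values used in the usage note are assumptions made for illustration only; they are not taken from the JULES source code. The sketch also assumes that snow depth and roughness length are given in the same length unit.

```python
def snow_cover_fraction(depth, z0):
    """Snow cover fraction fs = d / (d + 10 z0).

    depth and z0 must share the same length unit (assumed here: metres).
    """
    if depth <= 0.0:
        return 0.0
    return depth / (depth + 10.0 * z0)


def adjusted_albedo(depth, z0, albedo_snow_free, albedo_deep_snow):
    """Adjusted albedo: alpha = fs * alpha_s + (1 - fs) * alpha_0."""
    fs = snow_cover_fraction(depth, z0)
    return fs * albedo_deep_snow + (1.0 - fs) * albedo_snow_free
```

For example, with a hypothetical snow depth of 0.1 m over a tile with a roughness length of 0.01 m, the snow cover fraction is 0.5, so the adjusted albedo lies halfway between the snow-free and deep snow values.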
 In the current experiment, the JULES model was run offline on a 0.22° grid (approximately 25 km), with an hourly time step. JULES is a tiled model in which each grid box contains a variable fraction of a range of distinct surface types, each of which can be set to have different properties relating to heat and water transport and vegetation. In the present experiment, JULES was configured with the standard nine surface types: broadleaf tree, needleleaf tree, C3 grass, C4 grass (C3 and C4 are alternate metabolic pathways used to fix carbon in photosynthesis), shrub, urban, open water, bare soil, and ice. Surface parameters, including vegetation distributions and values of the leaf area index (LAI), were taken from the standard data sets used by Essery et al. , which are mainly derived from the land cover archive of Wilson and Henderson-Sellers , and Advanced Very High Resolution Radiometer (AVHRR) data. Canopy heights and roughness lengths were calculated from values of LAI using the parameterization of Essery et al. .
 In order to establish the snow model parameters, a sensitivity analysis was performed. We used a combination of Latin hypercube and one-factor-at-a-time sampling methods [van Griensven et al., 2006] and evaluated the snow model sensitivity separately at six climate stations representing different elevation zones of the domain. We ran the JULES model using 3400 parameter combinations selected from the parameters listed in Table 1. As a measure of snow model performance, the root-mean-square error (RMSE) between observed and simulated snow depth in the period 1976–2000 was used. Ranking the sensitivity of the snow parameters showed that at each station the snow model performance was most sensitive to two model parameters: the threshold air temperature (Tt) and the snow density (Rh). Figure 1 shows the RMSE variability for different ranges of these two model parameters and the variability between different locations (elevations). It is clear that for an individual location (climate station) it is possible to find an optimal parameter solution; however, it is difficult, if not impossible, to find a unique combination of model parameter values for the entire spatial domain. Interestingly, the variability in snow model performance between different elevations is significantly larger than that obtained for a particular location by using different model parameters. Given the large variability in RMSE within the selected parameter ranges and between the stations, we decided to fix the snow model parameters to literature values [Essery et al., 2003], setting the constants Tt = 274 K and Rh = 250 kg m−3. The values of other snow model parameters are given in Table 1. We note that in these experiments, we neglect the heat capacity of snow and use the model output from the C3 grass tile to provide appropriate comparison with observed data. A 1 year spin-up period was used to allow the model to equilibrate with the driving data.
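The sampling and scoring steps of such a sensitivity analysis can be sketched as follows. This is a simplified illustration, not the procedure of van Griensven et al.: it shows only a basic Latin hypercube draw and the RMSE score, and it omits the one-factor-at-a-time step. The function names and the parameter ranges in the usage note are hypothetical.

```python
import math
import random


def latin_hypercube(ranges, n_samples, seed=0):
    """Draw n_samples parameter sets with basic Latin hypercube sampling.

    Each (low, high) range is split into n_samples equal strata, one value
    is drawn per stratum, and the strata are shuffled independently for
    each parameter so that every stratum is used exactly once.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in ranges:
        width = (high - low) / n_samples
        values = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(values)
        columns.append(values)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


def rmse(observed, simulated):
    """Root-mean-square error between paired observed and simulated values."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)
```

A call such as `latin_hypercube([(272.0, 276.0), (100.0, 400.0)], 10)` would return ten hypothetical (threshold temperature, snow density) pairs; each pair would then be passed to a model run whose simulated snow depths are scored against observations with `rmse`.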
Table 1. Parameter Values Used in the Snow Component of the Land Surface Model (JULES) and Parameter Ranges (Minimum and Maximum) Used in Sensitivity Analysis
Constant density of lying snow: 250 kg m−3
Threshold air temperature
Thermal capacity of lying snow: 0.3 × 10^6 J K−1 m−3 (minimum 0.2 × 10^6, maximum 0.9 × 10^6)
Thermal conductivity of lying snow: 0.265 W m−1 K−1
Grain size for fresh snow
Maximum snow grain size
Snow grain area growth rates (melting/cold fresh/cold aged snow): 0.6/0.06/0.23 × 10^6 μm2 s−1 (minimum 0.3/0.02/0.20 × 10^6, maximum 0.8/0.08/0.30 × 10^6)
Maximum albedo for fresh snow (visible wavelengths)
Maximum albedo for fresh snow (near infrared wavelengths)
2.2. Snow Simulation Accuracy Assessment
 Quantitative assessment of the accuracy with which the JULES model simulates snow depth is performed in two stages, each using observed data from Austria. In the first stage, JULES' ability to simulate snow cover dynamics is evaluated. In the second stage, we evaluate the model's performance using snow depth observations.
 To assess the accuracy with which the JULES model simulates the presence of snow cover, we define the snow cover accuracy index ka. Snow depth observations at the climate stations are considered as ground truth for the model grid box in which they are located. For the purposes of model evaluation, we remove the effect of very small values of snow depth in the simulated data set by assuming that snow is present in the grid box only when the simulated snow depth exceeds 0.05 cm. The results presented here were found to be insensitive to this threshold. The ka index is then defined as the sum of correctly classified days (i.e., days when snow was both modeled and observed, and days when snow was neither modeled nor observed), expressed as a percentage of the total number of days [cf. Wilks, 2005]:
ka = 100 × (A + D)/(A + B + C + D),
where A, B, C, and D are as defined in Table 2. In order to assess the spatial and temporal variability of snow cover dynamics, the ka index is estimated for each individual climate station using data from different time intervals: the entire long-term period, individual years and seasons (months).
Table 2. Confusion Matrix Relating the Ground Based Snow Depth Observations (Ground) and the Snow Simulation by the Land Surface Model (JULES)
JULES: No Snow
Ground: No snow
 The misclassification of snow cover is evaluated by the model overestimation (MO) and underestimation (MU) errors. The MO error represents the case when the JULES model simulates snow, but there is in fact no snow observed at the ground and conversely, the MU error represents the case when the model simulates no snow, but snow is in fact reported at the climate station. Both types of error are represented by the relative frequency of station days that were misclassified in all climate stations, estimated as follows:
The MO and MU errors are evaluated in different seasons (months), by summing the classification categories for all ground measurements.
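The classification scores described above can be sketched in a few lines. The assignment of the letters A to D to the four categories is an assumption made for this illustration (A: snow both observed and simulated; B: snow observed but not simulated; C: snow simulated but not observed; D: no snow in either). The function names, and the rule that any observed depth above zero counts as snow, are likewise assumptions.

```python
def classify_days(observed_cm, simulated_cm, sim_threshold_cm=0.05):
    """Count the four confusion-matrix categories for paired daily depths.

    Assumed layout: A = both snow, B = observed snow only,
    C = simulated snow only, D = neither.
    """
    a = b = c = d = 0
    for obs, sim in zip(observed_cm, simulated_cm):
        obs_snow = obs > 0                 # assumption: any reported depth is snow
        sim_snow = sim > sim_threshold_cm  # threshold applied to simulated depth
        if obs_snow and sim_snow:
            a += 1
        elif obs_snow:
            b += 1
        elif sim_snow:
            c += 1
        else:
            d += 1
    return a, b, c, d


def snow_cover_scores(a, b, c, d):
    """Return ka, MO, and MU as percentages of all classified days."""
    total = a + b + c + d
    return {
        "ka": 100.0 * (a + d) / total,  # correctly classified days
        "MO": 100.0 * c / total,        # model simulates snow, none observed
        "MU": 100.0 * b / total,        # snow observed, model simulates none
    }
```

By construction the three scores sum to 100%, which is a convenient consistency check when the categories are accumulated over many stations and months.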
 The accuracy of the snow depth simulations is evaluated by the mean snow depth error (ME):
ME = (1/n) Σ (SDsim − SDobs),
where n is the number of days, the sum is taken over all n days, and SDsim and SDobs represent the simulated and observed snow depth in cm, respectively. The ME error is estimated for each climate station for the entire period, for individual years, and for seasons, and is summed up over all stations in particular seasons (months). In addition to the evaluation of snow depth accuracy, we also compare predicted annual maximum snow depth with observed annual maxima.
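The two snow depth measures can be sketched as follows. The sign convention (simulated minus observed, so that positive values indicate overestimation) and the grouping of annual maxima by calendar year are assumptions of this illustration, as are the function names.

```python
from datetime import date


def mean_error(simulated_cm, observed_cm):
    """Mean snow depth error: average of (simulated - observed) over all days."""
    n = len(observed_cm)
    return sum(s - o for s, o in zip(simulated_cm, observed_cm)) / n


def annual_maxima(dates, depths_cm):
    """Maximum snow depth per calendar year (assumed grouping)."""
    maxima = {}
    for day, depth in zip(dates, depths_cm):
        maxima[day.year] = max(depth, maxima.get(day.year, depth))
    return maxima
```

Applying `annual_maxima` to the observed and simulated series separately yields two dictionaries keyed by year, which can then be compared year by year.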
3.1. Study Region and Snow Depth Observations
 The spatial domain with which the present study is concerned is Austria. Austria has an area of about 84,000 km2 and is characterized by flat or undulating topography in the east and north and by steep mountainous terrain in the west and south (Figure 2). Elevations range between 115 m above sea level (asl) in the eastern lowland part of the country and 3797 m asl in the Alps. Climatologically, Austria is situated in the temperate zone at the border between the Atlantic and the continental part of Europe. Mean annual temperature varies from about 10°C in the lowlands to less than −8°C in the Alps. Mean annual precipitation varies from less than 400 mm yr−1 in the east to almost 3000 mm yr−1 in the west. Land use is mainly agricultural in the lowlands. At medium elevations forest dominates, and alpine vegetation and rocks prevail in the highest mountain regions. Austria therefore provides a diverse range of climatic, physiographic, and land cover types in which to perform model validation. These conditions maximize the applicability of our findings to other areas with similar properties.
 The snow data set used in this study consists of daily snow depth measurements at 601 climate stations with more than 15 years of observations in the period from January 1976 to December 2000. The locations of these climate stations are shown in Figure 2. The snow depth readings are taken from permanently installed staff gauges and hence are point measurements at artificially cleared grassy sites. Observations are made daily at 0700 LT, and snow depths are reported in centimeters as integer values [Hydrographischen Zentralbüro, 1992].
 For the quantitative validation of the land surface scheme, the spatial locations of point measurements are important. The spatial arrangement of climate stations for snow validation in Austria has been previously evaluated by Parajka and Blöschl . They demonstrated that the snow depth measurements cover a wide range of elevation zones of the region, but in mountain regions the stations tend to be located at lower elevations, typically in valleys. The highest climate station used in this study is at 2290 m asl which means that the area above that elevation (comprising 6% of Austria) is not represented by any climate station. The comparison of elevations of climate stations and the mean elevations of model grid boxes indicates that the stations tend to be lower than the mean model grid box elevations. Only 10% of stations are located higher than 100 m above their corresponding model grid box, 35% of stations are located within 100 m of the elevation of the equivalent model grid box, 35% of stations are 100–500 m lower than their corresponding grid box elevation, and 20% of stations are more than 500 m lower than the equivalent model grid box.
3.2. Model Driving Data
 In this study, the land surface model is driven by two data sets at a 0.22° (25 km) horizontal resolution: one produced using a regional climate model (RCM), the other in which RCM precipitation and air temperature data were replaced with observed values. In the first data set (hereafter referred to as RCM), meteorological driving data were obtained from a simulation in which an RCM was driven with data from the ERA-40 reanalysis experiment [Uppala et al., 2005; Buonomo et al., 2007]. The regional climate model used to produce these data was HadRM3, which is described by Buonomo et al. [2007]. This model is based on the atmospheric component of the Hadley Centre GCM, HadCM3 [Pope et al., 2000; Gordon et al., 2000]. The RCM was run with a 5 min time step with lateral boundary conditions updated every 6 h from ERA-40 data represented on a 1.125° × 1.125° grid (the highest resolution regular latitude-longitude grid compatible with the spectral resolution of the ERA-40 data set). The RCM was run from 1958 until 2002, with the results from the initial year discarded to account for model spin-up. A number of changes were made by Buonomo et al. [2007] to improve the simulation of precipitation in the regional model. When using output from this RCM simulation, we retained the distinction between large-scale and convective rain and snow. The fact that this RCM simulation is nested within lateral boundaries from a reanalysis climatology rather than a GCM means that the day-to-day variations in RCM meteorological outputs will correspond more closely to observed weather than if the lateral boundary forcing were derived from a free-running GCM; this constraint allows us to compare the RCM outputs directly with observed data.
 In the second data set, RCM precipitation and air temperature were replaced with local observations from Austria. It should be noted that the RCM precipitation diagnostics are split into four components based on the precipitation type and generating mechanism (large-scale rain and snow result from stably stratified clouds produced in midlatitude frontal systems with convergence into lows and upslope flow; whereas convective rain and snow result from convective clouds which form during buoyant ascent). All four of the precipitation components produced by the RCM were replaced with a single measure of observed total precipitation. We assigned this total precipitation to be entirely rain or snow based on a temperature threshold of 274 K. The observed data comprise daily measurements of precipitation at 1091 stations and daily air temperature at 212 climatological stations. The observed data were spatially interpolated onto a 1 km grid mesh using elevation as auxiliary information. External drift kriging [Pebesma, 2001] was used to interpolate precipitation data and a least squares trend prediction [Pebesma, 2001] method was applied to air temperature. The data were then aggregated to the same resolution as the land surface model grid. Temporal disaggregation of precipitation data to an hourly time step was achieved using a simple uniform partition. The temporal disaggregation of daily air temperature data was achieved by scaling the diurnal cycle represented in the RCM to match the observed mean daily air temperature. This combined data set is, in this study, referred to as RCM+PT.
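The temporal disaggregation and phase assignment steps can be sketched as below. Several details are assumptions of this illustration: the diurnal cycle is matched to the observed daily mean by an additive shift (the text does not say whether the scaling is additive or multiplicative), precipitation is treated as snow when the air temperature is strictly below the threshold, and all function names are hypothetical.

```python
def disaggregate_precipitation(daily_total_mm, steps_per_day=24):
    """Uniform partition of a daily precipitation total into equal time steps."""
    return [daily_total_mm / steps_per_day] * steps_per_day


def match_daily_mean(rcm_hourly_temp_k, observed_daily_mean_k):
    """Shift the RCM diurnal cycle so that its daily mean equals the observed one.

    Assumption: an additive offset is used, which preserves the RCM amplitude.
    """
    rcm_mean = sum(rcm_hourly_temp_k) / len(rcm_hourly_temp_k)
    offset = observed_daily_mean_k - rcm_mean
    return [t + offset for t in rcm_hourly_temp_k]


def partition_phase(precip_mm, air_temp_k, threshold_k=274.0):
    """Assign precipitation entirely to rain or snow; returns (rain, snow)."""
    if air_temp_k < threshold_k:  # assumption: strictly below threshold is snow
        return 0.0, precip_mm
    return precip_mm, 0.0
```

A uniform partition conserves the daily total but removes all subdaily intensity structure, which matters less for snow accumulation than it would for rainfall-runoff applications.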
 A comparison of the mean seasonal differences between RCM and gridded observed precipitation and air temperature is shown in Figure 3. The evaluation indicates that the RCM tends to overestimate precipitation when compared with observed values. A difference within the range of ±10 mm was found in 30% of the grid boxes covering Austria in autumn (September–November) and winter (December–February) and in 18% in spring (March–May). A difference larger than ±50 mm was observed in 17%, 33%, and 37% of grid boxes in autumn, winter, and spring, respectively. Spatially, the largest RCM overestimates were observed mainly in the western part of the Alps in winter and spring. The difference between modeled and observed air temperatures is expressed by the difference in mean seasonal degree day totals above 273.15 K. The comparison shows a distinct regional pattern in autumn and winter. In autumn, very good agreement was observed between RCM and observed temperatures in lowland and hilly regions (47% of Austria), where the RCM degree day totals were similar to or slightly lower than the observed totals. The mean absolute air temperature difference was less than 1°C in these regions. In the mountains, the RCM air temperature was lower than observed (mostly within the range 1°C–4°C), which resulted in an underestimation of degree day totals by more than 120 K. In winter, the RCM overestimated air temperature in the lowlands and hence overestimated the mean degree day totals (by more than 120 K). In alpine regions, the mean degree day difference was very small. The RCM slightly underestimated the observed air temperature, but air temperature was mostly negative. In spring, the RCM markedly underestimated air temperature in 87% of Austria, which resulted in a large underestimation of degree day totals. 
Overall, the comparison of mean degree day totals indicated an overestimation of potential snowmelt rates in the lowlands in winter, and an underestimation of snowmelt in alpine regions in autumn and in practically the whole of Austria in spring. In combination with the overestimation of the RCM precipitation, the RCM data set therefore tends to produce longer-lasting and more extensive snow cover than the observed data.
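The degree day comparison can be illustrated with a short sketch. It assumes that degree day totals are accumulated from daily mean air temperatures as the sum of positive excesses above 273.15 K; the function names are hypothetical.

```python
def degree_day_total(daily_mean_temp_k, base_k=273.15):
    """Sum of positive daily temperature excesses above the base temperature."""
    return sum(max(t - base_k, 0.0) for t in daily_mean_temp_k)


def degree_day_difference(rcm_daily_temp_k, observed_daily_temp_k, base_k=273.15):
    """RCM minus observed degree day total over the same period.

    Positive values indicate that the RCM implies more potential snowmelt.
    """
    return (degree_day_total(rcm_daily_temp_k, base_k)
            - degree_day_total(observed_daily_temp_k, base_k))
```

Because days below the base temperature contribute nothing, a cold bias in a season that is mostly below freezing produces only a small degree day difference, which is consistent with the small winter differences reported for alpine regions.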
4.1. Snow Cover Validation
 The evaluation of the snow cover accuracy, ka, over the entire period 1976–2000 is presented in Table 3. The results show good agreement between the land surface model simulations and snow observations at climatological stations. The median values of the accuracy index, ka, for all 601 stations were 89% and 91% for the RCM and the combined RCM+PT driving data sets, respectively. A larger difference between the two runs was observed for the 10% of stations with the lowest ka accuracy. The 10th ka percentile for the simulation driven by the RCM data set was approximately seven percentage points lower (77%) than that obtained using the RCM+PT data set (85%).
Table 3. Snow Cover Accuracy, ka (in %), by Percentile for Simulations Driven by the RCM and RCM+PT Data Sets for 601 Climate Stations in the Period 1976–2000
 The comparison of the accuracy index in individual years revealed only small interannual variability (Figure 4). The median of the ka index varied between 87% (1996) and 90% (2000) for the RCM and between 87% (1996) and 93% (1989) for the RCM+PT. The low accuracies in 1996 were caused mainly by considerable model underestimation of snow cover, which occurred on more than 21 days at 50% of climate stations. The comparison of the two model runs indicates a larger degree of scatter in the ka accuracy index calculated for the RCM-driven data set. The percentile (p) difference (p75% − p25%) varies between 7.1% and 16.2%, in comparison to a range of 4.4%–11.2% obtained for the RCM+PT. The largest scatter was observed in 1989. This year is characterized by low snow coverage, where only 3.5% of stations have mean annual snow depths greater than 10 cm, and 64% of stations have mean annual snow depths less than 1 cm. Interestingly, 1989 is the year with the highest ka agreement for the RCM+PT run.
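The yearly summary statistics used above (the median and the p75% − p25% difference across stations) can be computed as in the following sketch. The percentile interpolation method is not stated in the text; the inclusive method of Python's `statistics.quantiles` is used here as an assumption, and the function name and input layout are hypothetical.

```python
from statistics import median, quantiles


def yearly_summary(ka_by_year):
    """Median and interquartile difference (p75 - p25) of station ka per year.

    ka_by_year maps a year to the list of ka values (in %) of all stations.
    """
    summary = {}
    for year, values in ka_by_year.items():
        q25, _, q75 = quantiles(values, n=4, method="inclusive")
        summary[year] = (median(values), q75 - q25)
    return summary
```

The interquartile difference is expressed in percentage points, since it is the difference between two accuracy values that are themselves percentages.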
 The monthly distribution of snow cover accuracy shows a typical seasonal pattern (Figure 5). The period between May and September is characterized by very little snow occurrence and hence the ka accuracy is close to 100%. In contrast, the main changes in snow cover are observed in winter and spring, resulting in larger discrepancies between simulated and observed snow cover. During the period December–March, the median accuracy, ka, was lower than 80% for both simulations. The agreement for snow simulations based on the RCM data alone is somewhat lower and has larger variability than that obtained by using the combined RCM+PT data set. The largest variability in ka was seen in April, when the RCM snow cover accuracies vary between 66% and 96% at 50% of the climate stations. In April, snow typically covers only the high mountain regions, whereas the flatlands and the valleys are already snow free.
 A detailed examination of the types of snow simulation errors is presented in Table 4. These results indicate that, in general, the land surface model tends to overestimate snow cover in November and December, and underestimate snow cover in February and March. The largest overestimation (MO) errors, 11–16% for the RCM and 11% for the RCM+PT, were observed in November and December, respectively. In this season, the model overestimated the snow cover in the lowlands and alpine valleys. It is interesting to note that stations located south of the main ridge of the Alps showed larger MO errors than stations located north of the ridge. The underestimation errors (MU) in February and March are similar in magnitude to the overestimation errors (MO), but, interestingly, are about 4–5% larger for the RCM+PT simulations. Spatially, the MU errors dominate in flat and hilly regions of Austria where elevations are typically below 800 m, rather than in the high mountains (elevations above 1500 m asl).
Table 4. Seasonal Variations of the Snow Cover Overestimation (MO) and Underestimation (MU) Errors (%) for Land Surface (JULES) Snow Simulations Driven by the RCM and RCM+PT Data Sets (a)
(a) The first value is the median, and the second value is the percentile (p) difference (p75% − p25%) over 601 stations in the period 1976–2000.
 Spatial patterns of the ka accuracy index are displayed for each season in Figure 6. Figure 6 (top) shows ka accuracy for the RCM data set; and Figure 6 (bottom) shows the ka for the combined RCM+PT simulations. Comparison reveals that the spatial pattern of seasonal snow cover accuracy is similar for each simulation, and shows that the detailed spatial pattern of accuracy is related to the topography of the region. Obviously, the best agreement between the model and station measurements is in summer (JJA), when snow is very rare. The exceptions are the highest stations in the Alps, at which the model systematically underestimates the snow cover. In autumn (SON), there is a distinct contrast between the accuracy in lowland areas and that in hilly and alpine regions. The model tends to overestimate snow cover duration, especially at stations located in hilly and mountain regions (600–1500 m asl) resulting in a lower ka accuracy. Underestimates of snow cover duration in autumn are observed only at some of the highest stations, located at about 2000 m asl. In winter (DJF), the highest ka agreement is observed for stations in mountainous areas, where snow cover is typically well developed and continuous. In general, lower agreement was observed for stations in lowlands. The lowest values of the ka accuracy index (less than 70%) were found at stations in the northern and southern parts of Austria, where the model melts snow much more quickly than observed. This difference is responsible for a considerable underestimate of snow cover. In spring (MAM), the lowlands are usually without continuous snow cover and this situation is correctly simulated by the model. The lower accuracies observed in the Alps are caused mainly by the overestimation of snow cover duration in April and May. Such errors are larger in the RCM-driven run, especially in the central and southern part of the Alps. These are most likely caused by overestimation of total precipitation in the RCM data set.
 In winter and spring, a strong relation between observed snow depth and elevation was found, especially in alpine regions. The strong variability in snow cover results in considerable model subgrid variability in the ka agreement, which is seen mainly in the western part of the Alps. A more detailed insight into the model snow cover accuracy in different elevation zones can be obtained from Figure 7. The distinct seasonal pattern in the accuracy index, ka, is strongly related to snow cover duration. The season in which the model accuracy is lowest varies according to the exact timing of the onset of snow cover and the subsequent melting of snow in different elevation zones. It is in these periods that the model is most susceptible to errors. We note that in the elevation zones with more continuous snow cover (above 800 m asl) the model accuracy tends to increase after a drop in late autumn and then decrease again during the snowmelt period. The ka accuracy drops to 75–80% in the lowlands and hilly regions (elevations below 800 m asl) and is even lower in the mountains (ka is only about 60%). The RCM-driven simulations have lower accuracies for the stations in the elevation zone below 300 m asl and the zone between 800 and 1500 m asl. In contrast, the RCM+PT simulation tends to overestimate the snow cover in the highest locations (above 1500 m asl).
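The zone-by-zone comparison above can be illustrated with a short sketch. It assumes that ka is the percentage of station days on which the model and the observation agree on the presence or absence of snow, and that snow is counted as present when depth reaches a threshold of 1 cm. Neither the exact definition nor the threshold is stated in this section, so both are assumptions, as are the function names and the input layout. The elevation zones are those used in the text.

```python
# Hypothetical sketch: pooled snow cover accuracy (ka) per elevation zone.
# Zone limits (m asl) follow the text; how exact boundary values are
# assigned is an assumption.
ZONES = [(300.0, "below 300"), (800.0, "300-800"), (1500.0, "800-1500")]
TOP_ZONE = "above 1500"

SNOW_THRESHOLD_CM = 1.0  # assumed threshold for "snow present"


def elevation_zone(elev_m):
    """Return the label of the elevation zone containing elev_m."""
    for upper, label in ZONES:
        if elev_m < upper:
            return label
    return TOP_ZONE


def agreement_counts(obs_cm, sim_cm, threshold=SNOW_THRESHOLD_CM):
    """Count days with matching snow/no-snow state; skip missing observations."""
    agree = total = 0
    for obs, sim in zip(obs_cm, sim_cm):
        if obs is None:
            continue
        total += 1
        if (obs >= threshold) == (sim >= threshold):
            agree += 1
    return agree, total


def ka_by_zone(stations):
    """Pool station days within each zone and return ka in percent."""
    pooled = {}
    for st in stations:
        zone = elevation_zone(st["elev_m"])
        agree, total = agreement_counts(st["obs_cm"], st["sim_cm"])
        a, t = pooled.get(zone, (0, 0))
        pooled[zone] = (a + agree, t + total)
    return {z: 100.0 * a / t for z, (a, t) in pooled.items() if t > 0}


# Made-up example data, for illustration only.
example = [
    {"elev_m": 200, "obs_cm": [0, 0, 5, None], "sim_cm": [0, 2, 4, 1]},
    {"elev_m": 1000, "obs_cm": [10, 20], "sim_cm": [12, 0]},
]
print(ka_by_zone(example))
```

Pooling station days within a zone, rather than averaging per-station values, is one possible design choice; a per-station median would weight short and long records equally.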
 An indication of how the spatial and temporal patterns of the ka accuracy index translate into the accuracy with which mean annual snow cover duration is simulated is shown in Figure 8. Figure 8 (top) shows the relative snow cover duration in the period 1976–2000, as observed at climate stations. Figures 8 (middle) and 8 (bottom) show the snow cover duration estimated from model simulations driven by the RCM and RCM+PT data sets, respectively. The spatial patterns indicate that the JULES model provides good estimates of snow cover duration over Austria. The differences were small and observed mainly in alpine regions. In order to quantify the agreement statistically, the coefficient of determination R² was estimated for 212 stations situated at approximately the same elevation (±100 m) as the model grid. The R² values were 0.84 (RCM+PT) and 0.71 (RCM), and clearly show that the model is able to explain a large proportion of the variation in snow cover duration at average grid box elevations. The effects of elevation on the subgrid variability in snow model performance are further examined in section 5.
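The station selection and R² calculation described above might be sketched as follows. The text does not say how R² was computed; this sketch assumes the squared Pearson correlation between observed and simulated snow cover duration. The function names and dictionary keys are hypothetical, and the numbers are invented for illustration.

```python
# Hypothetical sketch: R^2 for stations lying within +/-100 m of the
# model grid box elevation. R^2 is assumed here to be the squared
# Pearson correlation; the source does not state the formula.

def select_matched(stations, tol_m=100.0):
    """Keep stations whose elevation is within tol_m of the grid elevation."""
    return [
        st for st in stations
        if abs(st["station_elev_m"] - st["grid_elev_m"]) <= tol_m
    ]


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Made-up snow cover durations (percent of days), for illustration only.
stations = [
    {"station_elev_m": 450, "grid_elev_m": 500, "obs_scd": 10, "sim_scd": 12},
    {"station_elev_m": 900, "grid_elev_m": 950, "obs_scd": 20, "sim_scd": 22},
    {"station_elev_m": 1400, "grid_elev_m": 1350, "obs_scd": 30, "sim_scd": 32},
    {"station_elev_m": 2100, "grid_elev_m": 1600, "obs_scd": 60, "sim_scd": 35},
]
matched = select_matched(stations)
r2 = r_squared([s["obs_scd"] for s in matched], [s["sim_scd"] for s in matched])
print(len(matched), r2)
```

Note that a squared correlation is insensitive to a constant offset between model and observation, so it complements rather than replaces a bias measure.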
4.2. Snow Depth Validation
 A statistical evaluation of mean annual snow depth error (ME) over the entire period 1976–2000 is presented in Table 5. These results indicate much larger differences between the RCM and RCM+PT simulations than were obtained in the snow cover comparison. The medians of ME are similar, −0.4 cm for the RCM and −1.2 cm for the RCM+PT; however, the RCM simulation considerably overestimates the observed snow depth at more than 25% of climate stations. The mean annual ME at 10% of stations is even larger than 40 cm. In contrast, the RCM+PT-driven simulation has a significantly smaller range in ME and, while this data set gives an overall underestimate of snow depth, the mean annual ME lies within the range −0.5 to −3.8 cm at 300 climate stations.
Table 5. Mean Annual Snow Depth Error, ME (in cm), by Percentile for Simulations Driven by the RCM and RCM+PT Data Sets for 601 Climate Stations in the Period 1976–2000
 The same pattern of similar medians, but considerably larger scatter and large overestimation of snow depth at numerous stations for the RCM run, is evident in the evaluation of ME interannual variability (Figure 9). The median of ME is very close to zero for the RCM but up to 2–3 cm below zero for the RCM+PT run. However, the 75th percentile of the mean annual ME is several times higher for the RCM simulation than for the RCM+PT simulation. A comparison with the distribution and variability of observed snow depth indicates that the lowest ME median and percentile difference (scatter) were, in general, observed in years which had lower snow depths (e.g., 1989). We note that there is no obvious relation between the distribution and variability of observed snow depth and mean annual ME.
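The percentile summaries of ME used above can be reproduced in outline as follows. The sketch assumes that ME is the mean of simulated minus observed snow depth, which is consistent with negative values being described as underestimates, and it uses linear interpolation between order statistics for the percentiles. The interpolation rule, the function names and the sample values are assumptions.

```python
# Hypothetical sketch: per-station mean error (ME) and its percentiles
# across stations. ME is assumed to be mean(simulated - observed).

def mean_error(obs_cm, sim_cm):
    """Mean of (sim - obs) over days with a valid observation."""
    diffs = [s - o for o, s in zip(obs_cm, sim_cm) if o is not None]
    return sum(diffs) / len(diffs)


def percentile(values, p):
    """p-th percentile (0-100) using linear interpolation."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


# Made-up station ME values in cm, for illustration only.
station_me = [-2.0, 0.0, 1.0, 5.0, 40.0]
median = percentile(station_me, 50)
scatter = percentile(station_me, 75) - percentile(station_me, 25)
print(median, scatter)
```

The difference between the 75th and 25th percentiles is used here as the scatter measure; the source refers only to a "percentile difference" without naming the percentiles involved.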
 The seasonal evaluation of the model bias is presented in Figure 10. The results show significant variability in the mean monthly ME between stations and between the simulations. The snow depth simulation is essentially unbiased at a large sample of stations; however, at many stations the model overestimates (RCM) or underestimates (RCM+PT) observed snow depth. The largest mean monthly ME variability occurred between January and March. The largest percentile difference was in February, −5 to 38 cm for the RCM and −15 cm to 0 cm for the RCM+PT. For comparison, the observed mean monthly snow depths lie between approximately 5 and 30 cm.
 Spatial patterns of the mean seasonal ME (Figure 11) reveal that in the RCM simulation the model overestimated the observed snow depth in western Austria. In this region, the ME patterns tended to follow the boundaries of model grid boxes (driving data), which indicates that the bias may be caused by inaccuracies in the RCM precipitation data set, although uncertainties introduced by the model structure and parameterization cannot be ruled out. For the RCM+PT simulation, the patterns of snow depth accuracy are more closely linked with the topography; a slight overestimate was found in valleys and a slight underestimate in higher locations. In winter, the model underestimation is more pronounced in central parts of the Alps. In the lowlands, snow depth is simulated well by both driving data sets. The assessment of mean monthly ME in different elevation zones shows that the simulations in lowland areas (below 300 m asl), and for the RCM+PT data set also in hilly regions (300–800 m asl), are unbiased (Figure 12). These regions are, however, characterized by low snow coverage with mean monthly snow depth less than 15 cm. The larger overestimation errors are observed for RCM simulations. In hilly regions the model overestimates the mean snow observations by about 20 cm in February and March; the overestimation is even larger in higher elevation zones. Interestingly, the largest overestimates are observed in elevation zone 800 to 1500 m asl in the RCM experiment. In contrast, the simulation driven by the RCM+PT forcing data tended to underestimate snow depth in higher elevation zones. The largest underestimate was found at stations in high mountains (the elevation zone above 1500 m asl), more than 60 cm in March and April. This is somewhat lower than the mean monthly snow depth observations, which are more than 80 cm in the period February to April.
 The evaluation of model bias indicates the tendency of the model to overestimate or underestimate snow depth at different stations in different time intervals. A more detailed insight into the model's ability to simulate extreme snow depth is given by Figure 13, which shows the mean of annual snow depth maxima in the period 1976–2000. Figure 13 (top) shows the mean of observed annual maxima at climate stations, whereas Figures 13 (middle) and 13 (bottom) show the mean of the annual maxima simulated using the RCM and RCM+PT forcing data, respectively. The mean of the observed annual maxima in lowlands is below 50 cm, and this observation is accurately reproduced in both model simulations. The RCM-driven simulation generally overestimates snow depth extremes in the Alps, especially in the western part. The mean of the annual maxima exceeds 500 cm in 21 model grid boxes. The RCM+PT simulation is much closer to the extreme observations in mountains. Interestingly, the model tends to overestimate the annual maximum, although the bias evaluation indicated a tendency to underestimate the snow depth.
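The statistic mapped in Figure 13, the mean of annual snow depth maxima, can be sketched as below. Grouping by calendar year is an assumption made for simplicity (the source does not say whether calendar or winter-season years were used), and the function name and sample values are hypothetical.

```python
# Hypothetical sketch: mean of annual snow depth maxima at one station
# or grid box. Years are assumed to be calendar years.
from collections import defaultdict


def mean_annual_maximum(records):
    """records: iterable of (year, depth_cm); depth_cm may be None if missing."""
    by_year = defaultdict(list)
    for year, depth_cm in records:
        if depth_cm is not None:
            by_year[year].append(depth_cm)
    maxima = [max(depths) for depths in by_year.values()]
    return sum(maxima) / len(maxima)


# Made-up daily values, for illustration only.
observed = [(1976, 10), (1976, 30), (1977, 50), (1977, None), (1978, 20)]
simulated = [(1976, 5), (1976, 45), (1977, 55), (1977, 10), (1978, 26)]
print(mean_annual_maximum(observed), mean_annual_maximum(simulated))
```

In this invented example the simulated mean annual maximum exceeds the observed one even though some simulated daily values are lower than observed, which is the kind of behaviour noted at the end of the paragraph above.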
5. Discussion and Conclusions
 The main objective of the study was to evaluate the performance of the JULES land surface model in simulating snow cover and depth. The examination of land surface model snow simulations over Austria enables us to analyze the performance of the snow model at a large number of climate stations, situated over a range of elevation zones, ranging from lowlands to high mountains. The model evaluation shows that the land surface model simulates the snow cover dynamics well compared with observed data. The median snow cover accuracy was 89% and 91% for the RCM and the combined RCM+PT driving data sets, respectively, with only a small interannual variation. In the period of major snowmelt, the model tends to underestimate snow cover in the lowlands and, conversely, overestimate snow cover in the mountains. These findings are most likely related to the model structure, particularly to the parameterization of snowfall threshold air temperature and snow density, which were assumed to be constant for the entire period and region. In contrast, the differences between modeled and measured snow depth were much larger. The medians of snow depth bias were similar, −0.4 cm for the RCM and −1.2 cm for the RCM+PT; however, the RCM simulation considerably overestimates the observed snow depth at more than 25% of climate stations.
 Overall, the JULES snow model performance is similar to that obtained in other validation studies. Sheffield et al. reported an average 75–80% snow cover agreement over the United States during the winter months and about 85–90% snow cover accuracy in the spring. This fits well with the 80% and 87% accuracy of JULES over Austria in winter and spring, respectively. The snow depth assessment over France [Habets et al., 2008] reported a low average bias varying between 3 cm at lower elevations and 10 cm in the mountains. For higher elevations, they reported a systematic underestimation of the snow depth in the period from January to February (elevation zone 1250–2000 m), and an overestimation of the snow depth from September to January (elevations above 2000 m). In this study, we found a similar underestimation by the land surface model at elevations above 1500 m asl. Biases larger than 5 cm were observed at more than 15% of climate stations only in winter (December–February). A remarkable overestimation of the annual maximum snow depth was observed in the Alps (Figure 13), although the bias evaluation indicated a tendency to underestimate the overall snow depth. This finding differs from the results of Pan et al., who presented a consistent underestimation of maximum annual snow water equivalent (SWE), as simulated by four different land surface schemes. This bias, however, was explained to a large extent by biases in forcing precipitation data, which are clearly strongly related to the selected region and forcing data set.
 The snow observation data set used for model validation is very similar to that which was used by Parajka and Blöschl to assess the accuracy with which the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite snow cover product can be used to estimate snow cover. The comparison of snow cover accuracy ka between the two studies shows that the median accuracy of 90% is somewhat lower than the 95% obtained in the MODIS validation for cloud-free days; however, the median accuracy is much larger than the accuracy index, ka, obtained taking into account all weather conditions (38.5%) [Parajka et al., 2010]. This finding indicates that, for regional studies, the RCM data and land surface simulations may prove to be a valuable alternative source of data for snow cover simulation in ungauged regions. Of further note in the context of such studies is the relation between snow cover and snow depth performance measures, shown in Figure 14. We note that even if snow cover can be simulated with a high degree of accuracy, this does not imply a similarly high degree of accuracy in the simulation of snow depth. For example, Figure 14 shows that in situations where the annual snow cover accuracy, ka, is below approximately 90% the bias in snow depth may fall within the range −20 to +80 cm.
 One of the objectives of this study was to compare two model simulations: one driven by the RCM, the second by the RCM combined with local observations (RCM+PT). The main benefit of the RCM is the availability of different meteorological variables at a subdaily temporal scale and its larger spatial extent. Examination of snow simulations showed that the extent to which the results from RCM-driven simulations match observed data is strongly related to the initial accuracy of the RCM precipitation. The present assessment has demonstrated that the RCM simulation of snow cover corresponds well with observed data in the lowlands (elevations below 300 m asl) and hilly regions (elevations between 300 and 800 m asl) in the eastern part of Austria. However, larger discrepancies were found in the western part of the Alps. We suggest that these differences are caused by overestimates in the RCM precipitation, mainly in the large-scale (frontal) snowfall precipitation component. The overestimation of precipitation by the RCM has a remarkable impact on the snow mass balance simulation, especially in the western Alps, and is even more significant for the assessment of extreme values. The overestimation in the Alps in winter and spring highlights the potential utility of a future bias correction to the RCM precipitation simulated in the mountains. We found that, in general, use of observed precipitation data led to a greater improvement in snow model performance than use of observed temperature data. Based on these findings, we suggest that correction of systematic biases in driving data is likely to prove important, especially for hydrologic applications, which rely on quantitative measures of snow-mass dynamics [cf. Horton et al., 2006; Leander and Buishand, 2007].
 One key advantage of a detailed data set of regional snow observations is that it enables the investigation of spatial and temporal patterns in snow cover and depth dynamics. The present study reveals the importance of subgrid variability. The simulation driven with observed precipitation and air temperature data (RCM+PT) was essentially unbiased in lowlands and hilly regions. The snow depth bias was largest in the mountains; this finding is strongly related to the subgrid variability of snow depth within each grid box. In general, the simulated mean grid box snow depth was larger than observed in valleys and lower than observed at stations situated above the mean grid box elevation. Figure 15 shows the relation between the snow model performance and the elevation difference between the stations and model grid boxes. Figures 15 (top) and 15 (bottom) give the mean annual snow cover and snow depth efficiency, respectively. Figure 15 demonstrates that the best model performance occurs when the grid box mean elevation is approximately equal to the station elevation. This finding holds both for snow cover accuracy and snow depth error. Moreover, the deterioration in model performance when topographic heterogeneity is poorly resolved in the model is greater in mountainous topography than in lowland regions. Our analysis indicates that a fairly simple resolution of subgrid topography may lead to significant improvements in the model's ability to predict snow cover and depth. Our future research will be directed toward investigation and development of computationally efficient representation of subgrid topography (e.g., as proposed by Blöschl, Essery, or Liston). We also plan to include sensitivity analyses and validation of the new snow multilayer scheme and to evaluate snow cover dynamics by using remote sensing products.
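The relation between model performance and the station-to-grid elevation difference could be explored along the following lines. The sketch groups stations by elevation difference and averages the snow depth ME in each group. The 200 m bin width, the function name, the dictionary keys and the sample values are all assumptions and are not taken from the study.

```python
# Hypothetical sketch: mean snow depth ME as a function of the elevation
# difference between station and model grid box (station minus grid).
import math


def me_by_elevation_difference(stations, bin_width_m=200.0):
    """Return {bin lower edge (m): mean ME (cm)} for equal-width bins."""
    sums = {}
    for st in stations:
        diff_m = st["station_elev_m"] - st["grid_elev_m"]
        lower = math.floor(diff_m / bin_width_m) * bin_width_m
        total, count = sums.get(lower, (0.0, 0))
        sums[lower] = (total + st["me_cm"], count + 1)
    return {lower: total / count for lower, (total, count) in sorted(sums.items())}


# Made-up values, for illustration only: valley stations (negative
# difference) with positive ME, high stations with negative ME.
example = [
    {"station_elev_m": 750, "grid_elev_m": 1000, "me_cm": 8.0},
    {"station_elev_m": 650, "grid_elev_m": 1000, "me_cm": 12.0},
    {"station_elev_m": 1050, "grid_elev_m": 1000, "me_cm": -1.0},
    {"station_elev_m": 1450, "grid_elev_m": 1000, "me_cm": -30.0},
]
print(me_by_elevation_difference(example))
```

A table of this kind is one simple way to check whether errors grow with the mismatch between station and grid box elevation before investing in a subgrid topography scheme.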
 The authors would like to thank the Austrian Science Foundation (FWF project P18993-N10), the Austrian Academy of Sciences (predictability of runoff), and the United Kingdom Natural Environment Research Council (NE/E011969/1). The British Council Researcher Exchange Program (RXP 397) provided financial support and the Austrian Hydrographic Service (HZB) provided hydrological data. This work was undertaken while the lead author was a recipient of a Visiting Fellowship to the Joint Centre for Hydrometeorological Research, which is funded by the Centre for Ecology and Hydrology and the UK Met Office. We thank Richard Jones and Erasmo Buonomo for providing Regional Climate Model output, and we thank Douglas Clark and Martin Best for making the JULES model available. We thank Vicky Bell, Douglas Clark, and Richard Harding for providing constructive feedback on the manuscript.