Simulations of snow-covered area (SCA) over Northern Hemisphere lands by a suite of general circulation models (GCMs) are evaluated. Results from GCM experiments submitted by an international array of research groups participating in the second phase of the Atmospheric Model Intercomparison Project (AMIP-2) are compared to a data set derived primarily from visible band satellite imagery provided by the United States National Oceanic and Atmospheric Administration. At continental to hemispheric scales we find improvements over AMIP-1 models, including the elimination of temporal and spatial biases in simulations of the seasonal cycle of SCA, as well as improved simulations of the magnitude of interannual variability. At regional spatial scales, while no consistent model biases are identified over North America, regions over Eurasia are identified where models consistently either underestimate or overestimate SCA at the southern boundary of the seasonal snowpack. The region of greatest model bias is eastern Asia. While SCA biases are associated with temperature and precipitation biases, over only one region do we find a relationship between the magnitudes of SCA biases and the magnitudes of temperature and/or precipitation biases.
1. Introduction: Cryospheric/Snow Cover Fluctuations in Perspective
 Substantial changes in Northern Hemisphere high latitude climate and the cryosphere have been observed during the second half of the twentieth century. Serreze et al.  synthesized evidence of high latitude environmental changes from a number of reports across an array of scientific disciplines. Between 1920 and 1940 and again between 1966 and 1995 significant annual mean warming of up to 1°C per decade occurred over the Eurasian as well as western and central North American landmasses, while a much smaller area covering eastern North America, the North Atlantic Ocean, and southern Greenland apparently cooled by a comparable amount. These trends are most apparent in winter, and to a lesser extent spring. Between around 1940 and 1970 a cooling trend is apparent. Observed changes in various components of the cryosphere, including sea ice, snow cover, glacier mass balances, permafrost, and lake and river ice, are consistent with observed temperature changes. Twentieth century arctic temperatures were higher than at any time during the last four centuries, and this warming can probably be attributed to a combination of natural and anthropogenic causes [Overpeck et al., 1997; Serreze et al., 2000]. Whatever the cause, because of the unique nature of the cryosphere and of the arctic, climate fluctuations are likely to have a significant impact on ecosystems in that region [Walker, 1999], and large-scale cryospheric variability is likely to be a sensitive indicator of climate change.
 Analysis of remotely sensed imagery, available since 1967, shows that Northern Hemisphere snow-covered area (SCA) peaks in January or February at around 45 × 10⁶ km² (over 60% of which is over the Eurasian continent), and reaches minimum values in August at around 4 × 10⁶ km² (more than half of which is over Greenland) (Figure 1a) [Robinson and Frei, 2000]. Interannual variations of SCA can also result in significant year-to-year differences in surface characteristics. For example, mean winter (December–February) SCA between 1967 and 2002 varied between around 41.2 × 10⁶ km² and 48.6 × 10⁶ km², with similar ranges (maximum–minimum value) observed during other seasons. SCA peaked during the mid- to late-1970s, and reached minimum values during the late-1980s through the early 1990s. With the exception of a few years during the mid-1990s, SCA after around 1987 has tended to be lower than during the earlier part of the satellite era, particularly during spring (Figure 1b). SCA variations during the last few decades are consistent with expected results of global warming, and with the timing and phase of observed modulations of seasonal cycles in temperature, atmospheric CO₂ concentrations, and the terrestrial biosphere [Thomson, 1995; Mann and Park, 1996; Myneni et al., 1997; Dettinger and Ghil, 1998; Keeling et al., 1996; Robinson et al., 2001; Groisman et al., 1994].
 In order to view satellite era variations within a broader context, historical variations of SCA have been estimated. Frei and Robinson , Frei et al. , and Brown  used station observations in conjunction with satellite data to estimate continental scale SCA fluctuations over North America back to the early twentieth century. SCA was relatively low during the 1920s and 1930s, and tended to increase during subsequent decades. During the mid- to late-twentieth century seasonal differences began to emerge. November and December SCA continued to increase until around the 1970s, after which no trend is apparent. January and February SCA peaked around 1980, and subsequently decreased. In March and April, SCA apparently peaked, and began to decline, during the 1950s and 1960s. Brown  was able to detect recent decreases in springtime SCA over the Eurasian continent as well. Studies of historical snow depth variations over North America [Brown and Braaten, 1998] and Eurasia [Ye et al., 1998] corroborate these results. Thus, during the latter half of the twentieth century we have observed a trend toward earlier spring snowmelt.
 The Atmospheric Model Intercomparison Project (AMIP) was initiated in 1989 under the auspices of the World Climate Research Programme. Its mission is to systematically compare and evaluate atmospheric GCMs that have been developed by an international array of research institutes for investigation of climate change issues [Gates, 1992]. AMIP modeling groups run experiments for designated years with identically specified boundary conditions, including observed sea surface temperatures, so that discrepancies in model results are attributable to internal differences between atmospheric models. One of the diagnostic subprojects designated by AMIP has as its goal the evaluation of modeled snow cover.
 The first phase of AMIP (AMIP-1) has been completed [Gates et al., 1999]. Frei and Robinson  evaluated snow simulations from 27 AMIP-1 GCMs, finding that at continental to hemispheric scales models provided a reasonable simulation of the mean annual snow cycle, although underestimations of fall and winter SCA were found over North America, while overestimations of spring SCA were typical over Eurasia. The models failed to reproduce observed interannual variability in two respects. First, in terms of dispersion, the ranges and interquartile ranges of seasonal and annual mean SCA displayed by almost all models were less than half of observed values. Second, modeled SCA had no year-to-year correlation to observed values, indicating that SCA in the models was not driven by sea surface temperature. In addition, AMIP-1 GCMs displayed inconsistent abilities to reproduce observed relationships between synoptic scale circulation features and SCA. (For background information on AMIP, and an extensive list of references, see the AMIP website at http://www-pcmdi.llnl.gov/amip/.)
 Earlier evaluations of GCM snow simulations [e.g., Foster et al., 1996; Zhong, 1996; Yang et al., 1999] found mixed results with regard to SCA and snow water equivalent (SWE), depending on the models being evaluated, the types of experiments being performed, and the data set used for validation. AMIP, which was developed precisely to overcome the difficulties in interpreting GCM evaluations performed under such disparate conditions, represents the most ambitious and comprehensive evaluation effort to date. AMIP includes longer observed time series, a larger suite of GCMs, and more similar boundary conditions in comparison with previous studies. AMIP-2 contains improvements over AMIP-1, including ensemble experiments using a more recent generation of models, improved parameterizations, longer integration periods, and increased spatial resolution. Here we present the first results from the diagnostic subproject for snow cover for the second phase of AMIP (AMIP-2).
2. AMIP-2 Models
 AMIP-2 includes GCMs that were developed by research groups around the world. A number of them are adaptations of identical earlier models. Table 1 identifies the fifteen modeling groups whose results are available at the time of this writing and are included in this analysis. A variety of numerical schemes are employed, including both finite differences and spherical harmonics. All model experiments include the same time interval (1979–1995).
Table 1. Fifteen General Circulation Models Evaluated in This Analysis
Institution                                               Northern Hemisphere grid (latitude × longitude cells)
Canadian Centre for Climate Modeling and Analysis         24 × 96
Center for Climate System Research                        32 × 128
Centre National de Recherches Meteorologiques             32 × 128
Department of Numerical Mathematics                       23 × 72
European Centre for Medium-Range Weather Forecasts        46 × 180
Goddard Laboratory for Atmospheres                        23 × 72
Japanese Meteorological Agency                            48 × 192
Meteorological Research Institute                         32 × 128
National Center for Atmospheric Research                  32 × 128
Pacific Northwest National Laboratory                     32 × 128
State University of New York, Albany                      32 × 128
UK Universities' Global Atmospheric Modeling Programme    36 × 96
University of Illinois at Urbana-Champaign                23 × 72
United Kingdom Meteorological Office                      36 × 96
(institution not listed)                                  23 × 72
 The treatments by the models of the physical processes that affect snow cover take on a variety of permutations. In general, models accumulate surface snow during precipitation events when the temperature of the lowest atmospheric level is at or below freezing. Snowmelt typically occurs as a result of the energy balance of the snowpack, including terms for sensible and latent heat fluxes, with some models including a term for the latent heat flux of nonfrozen precipitation. Most models also include a term for sublimation, which is then added to the evaporative flux from the surface to the atmosphere. Many models parameterize fractional snow coverage in a grid box using a critical threshold of SWE: for SWE values below the critical threshold, fractional coverage is proportional to SWE; above the critical threshold, fractional coverage is usually 1 (0.9 in one case). Fractional coverage sometimes depends on surface vegetation characteristics. Snow cover usually affects the surface albedo and surface thermal properties (i.e., heat conduction and heat capacity). In some models the parameterizations for these effects depend on surface roughness, and surface roughness is sometimes parameterized as a function of snow cover. Many of the models employ the SiB [Sellers et al., 1986], BATS [Dickinson et al., 1993], or LSM [Bonan, 1996] land surface biosphere routines, which include modules for handling snow. None of the models include the recently released Common Land Model (CLM) [Zeng et al., 2002], which was developed as a cooperative effort among seven institutions. The CLM provides improved treatment of land surface processes including snow cover, resulting in improved simulations of many aspects of land surface-atmospheric interactions. Zeng et al.  find that the inclusion of CLM results in improved snow cover simulations during the accumulation season.
 A complete discussion of the characteristics of individual models is not possible here. The grid resolution of each model included in this analysis is indicated in Table 1 by showing the number of latitude/longitude grid cells covering the Northern Hemisphere. The reader is referred to the AMIP model documentation web site (www.pcmdi.llnl.gov/modeldoc/amip2/index.html) for more details, but is advised that at the time of this writing model documentation remains incomplete.
3.1. SCA and SWE Observations
 The principal data set used for estimating historical large-scale SCA is based primarily on visible-band imagery. This weekly data set, produced by the National Oceanic and Atmospheric Administration (NOAA), covers the period from around 1967 to present, constituting the longest remotely sensed environmental time series that has been derived in a consistent fashion [Robinson and Frei, 2000; Robinson, 1993] (also see climate.rutgers.edu/snowcover). The use of visible-band imagery to identify snow-covered surfaces is problematic under cloudy conditions, over heavily forested areas, and during low winter solar illumination. With regard to climatological analysis, the prime region of significant size with questionable accuracy is the Tibetan Plateau during the 1970s. During this time we believe that this product overestimates snow extent. However, as the AMIP-2 time period begins in 1979, this problem should be minimal. Other areas with lesser potential inaccuracies include winter over Europe (which covers a relatively small area) and early fall over Siberia.
 Passive microwave based satellite sensors, which have been used to monitor SCA since 1978, provide an alternative and avoid some of the pitfalls inherent in visible-band data. However, microwave sensors have problems identifying snow cover that is thin or patchy; snow that contains liquid water at or below the surface; or snow that has developed ice lenses, hoarfrost, or other solid ice features. The visible and passive microwave data sets provide comparable estimates of hemispheric interannual variability and long-term trends, but the microwave product tends to underestimate SCA [Armstrong and Brodzik, 2001; Basist et al., 1996]. Passive microwave has been used to track SCA fluctuations [Chang et al., 1990; Grody and Basist, 1996; Sun et al., 1997], often in regional applications [Tait and Armstrong, 1996; Tait, 1998; Walker et al., 1995]. Passive microwave offers the additional possibility of obtaining spatially complete information on snow mass, in addition to SCA, but current algorithms tend to underestimate snow mass, and are not always transferable between different geographic regions [Armstrong and Brodzik, 2002]. While each type of data has its own strengths and weaknesses, in this analysis we rely on the NOAA visible-band based data set since it is still considered the most accurate and consistent for large-scale climatological studies of SCA.
 The third potential source of snow data includes traditional station-based observations. Station based data sets do not provide adequate spatial coverage to directly estimate continental scale fluctuations, although they have been used in studies over large regions [e.g., Kripalani and Kulkarni, 1999; Onuchin and Burenina, 1996; Ye, 2000; Ye and Bao, 2001; Serreze et al., 1999; Ye, 2001]. The other primary disadvantage to station observations is that typically only snow depth, not snow water equivalent (SWE), is recorded. Nevertheless, we are currently collaborating with two groups who are using station observations in conjunction with snowpack models to derive gridded data sets of estimated snow water equivalent for use in model evaluation. Results from those analyses will be reported separately.
3.2. Temperature and Precipitation Observations
 In the regional analysis section we use gridded surface temperature and precipitation estimates from version 1.02 of Terrestrial Air Temperature and Precipitation: Monthly and Annual Time Series [Willmott and Matsuura, 2001] to investigate the causes of model SCA errors over three Asian regions identified as problem regions later in this analysis. This data set includes over 7000 stations for air temperature and over 20,000 for precipitation that were interpolated onto a 0.5 by 0.5 degree global latitude/longitude grid using Climatologically Aided Interpolation [Willmott and Robeson, 1995], an enhanced version of a traditional distance-weighting method. Enhancements include adjustments for lapse rates using digital elevation model information and adjustments for climatological means.
Table 2. Observed Mean Temperature, Precipitation, and Snow-Covered Area (SCA) During the AMIP-2 Time Period Averaged Over Several Regions(a)
(a) SCA values calculated from visible-band satellite imagery. Temperature, precipitation, and their respective cross-validation error estimates are taken from Willmott and Matsuura  (W&M, see text for explanation). For comparison purposes, results from the Northern Great Plains of the United States and precipitation estimates from Xie and Arkin  (Xi-Arkin) for all regions are also provided.
Quantities listed for each region (the three Asian regions, followed by the Northern Great Plains): W&M Temp, °C; W&M Median Temp CV, °C; W&M Pcp, mm/day; W&M Median Pcp CV, mm/day; Xi-Arkin Pcp, mm/day.
 An additional advantage to the Willmott and Matsuura  data set is provided by the accompanying cross-validated interpolation errors (CV), which are also shown in Table 2. Cross validation involves the removal of one station at a time, and interpolation of temperature or precipitation using surrounding stations. Willmott and Matsuura  provide absolute values of cross-validation errors that have been interpolated to the same spatial resolution as the temperature and precipitation fields. In Table 2 we include median CV values for each of the three regions. These were calculated by taking the mean regional CV for all grid points for each year of the AMIP-2 simulation period, resulting in a single time series for each region. The median value of that time series is provided on the table. In addition, for comparative purposes we provide the equivalent information for the U.S. Northern Great Plains, which has a dense station network.
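The two-step reduction described above (regional mean per year, then the median of the yearly series) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the dictionary layout are hypothetical, and the regional mean is taken as an unweighted average over grid points, which is an assumption because the text does not say whether cells were area-weighted.

```python
from statistics import mean, median
from typing import Dict, Sequence


def regional_median_cv(cv_by_year: Dict[int, Sequence[float]]) -> float:
    """Median over years of the regional-mean absolute CV error.

    cv_by_year maps a year to the absolute cross-validation errors at all
    grid points inside one region (hypothetical input layout).
    """
    if not cv_by_year:
        raise ValueError("no years supplied")
    # Step 1: one regional mean per year gives a single time series.
    yearly_means = [mean(cv_by_year[year]) for year in sorted(cv_by_year)]
    # Step 2: the median of that series is the value reported per region.
    return median(yearly_means)
```

For example, three years with regional means of 2.0, 3.0 and 1.0 give a median CV of 2.0.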
 Temperature CV errors vary from region to region. The Tibetan Plateau has the most varied terrain and the sparsest station network, and as expected has the largest temperature CV errors. The western Asian region has little topographic variation, but has relatively few stations, and has the second largest mean CV error. Eastern China includes quite varied terrain but a denser station network, and has the smallest mean CV errors of the three. By comparison, the U.S. Great Plains, which has minimal topographic variability and a dense station network, has smaller CV errors than any of the three Eurasian regions.
 Precipitation CV errors are much more consistent in the three Eurasian regions, all of which have smaller median CV errors than the Great Plains. We cannot provide an explanation here for the lack of inter-region variability in precipitation CV errors, but this (in conjunction with the agreement with results from Xie and Arkin ) indicates that the Willmott and Matsuura  data set provides as accurate an estimate of regional precipitation as we are likely to obtain from available sources.
 The standard suite of GCM output provided to AMIP-2 includes gridded fields of monthly mean SWE expressed in kg m⁻². Our methodology for converting from SWE to SCA includes several steps. First, an assumption of snow density is required. We assume a snow density of 250 kg m⁻³ because several modeling groups assume this value for their snowpack. Analyses conducted for AMIP-1 SCA evaluations indicated that results were insensitive to the density values between 200 kg m⁻³ and 400 kg m⁻³ [Frei and Robinson, 1998]. We performed similar sensitivity analyses on AMIP-2 model output for density values between 200 kg m⁻³ and 300 kg m⁻³. The assumed density does affect estimated SCA values in AMIP-2, but before explaining why that occurs and discussing the magnitude of the effect, we must first describe the remaining steps in the methodology.
 To convert from mean monthly snow depth to fractional coverage, grid cells with depths in excess of a critical snow depth are considered 100% snow covered, while those with smaller depths are assigned a fractional coverage. A critical depth value of 2.5 cm is chosen because this is generally the minimum snow depth that the visible satellite can detect over open areas [Kukla and Robinson, 1981]. For grid cells with <2.5 cm of depth, SCA is defined as a linear function of depth (i.e., a depth of 1.25 cm would result in fractional coverage of 50%). We justify this on the grounds that monthly mean values represent a distribution of values of daily and weekly means that would, if observed by the satellites, result in a nonzero estimate of monthly mean fractional coverage. This is the only part of the methodology that differs from the AMIP-1 SCA evaluations of Frei and Robinson , who did not include a fractional grid box estimate. Allowance for fractional SCA raises mean winter SCA values by approximately 3.0 × 10⁶ km² compared to a “hard” cutoff value of 2.5 cm with no fractional coverage.
 To evaluate continental to hemispheric scale results, we estimate the fractional land coverage north of 20°N for each month by summing, over all cells within the region of interest, the product of each cell's fractional coverage and its land area, and normalizing by the total land area. The conversion from SWE to fractional coverage is summarized by the following equations:
\[ d_i = \frac{\mathrm{SWE}_i}{\sigma} \]
\[ f_i = \begin{cases} d_i / d_c, & d_i < d_c \\ 1, & d_i \ge d_c \end{cases} \]
\[ f_R = \frac{\sum_{i \in R} f_i \, a_i}{\sum_{i \in R} a_i} \]
where $d_i$ = snow depth (m) for grid cell $i$; $d_c$ = critical snow depth (m) = 2.5 × 10⁻²; $\sigma$ = snow density (kg m⁻³); $\mathrm{SWE}_i$ = snow water equivalent for grid cell $i$ (kg m⁻²); $f_i$ = fractional snow coverage for grid cell $i$; $f_R$ = fractional snow coverage for region $R$; and $a_i$ = land area in grid cell $i$ (m²). Fractional coverage, rather than absolute values in units of area, is preferred because models with different grid schemes have different total land areas; normalizing by total land area therefore provides a fairer comparison.
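The conversion described in this section can be sketched in a few lines of Python. This is a minimal illustration under the stated definitions (depth as SWE divided by density, linear coverage below the 2.5 cm critical depth, and land-area weighting); the function names and the list-based inputs are hypothetical and are not taken from the AMIP-2 processing code.

```python
from typing import Sequence

CRITICAL_DEPTH_M = 2.5e-2   # critical snow depth from the text (2.5 cm)
DEFAULT_DENSITY = 250.0     # assumed snow density from the text (kg m^-3)


def snow_depth(swe: float, density: float = DEFAULT_DENSITY) -> float:
    """Snow depth (m) from SWE (kg m^-2) and snow density (kg m^-3)."""
    return swe / density


def cell_fraction(swe: float, density: float = DEFAULT_DENSITY,
                  critical_depth: float = CRITICAL_DEPTH_M) -> float:
    """Fractional coverage: linear in depth below the critical depth, else 1."""
    depth = max(snow_depth(swe, density), 0.0)
    return min(depth / critical_depth, 1.0)


def region_fraction(swe: Sequence[float], land_area: Sequence[float],
                    density: float = DEFAULT_DENSITY) -> float:
    """Land-area-weighted fractional snow coverage of a region."""
    if len(swe) != len(land_area):
        raise ValueError("swe and land_area must have the same length")
    total_area = sum(land_area)
    if total_area <= 0:
        raise ValueError("region has no land area")
    covered = sum(cell_fraction(s, density) * a
                  for s, a in zip(swe, land_area))
    return covered / total_area
```

With the default density, an SWE of 3.125 kg m⁻² corresponds to a depth of 1.25 cm and hence a cell fraction of 0.5, matching the example given in the text.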
 The effect of the density assumption on total SCA, which was mentioned above, is due to the critical depth value: greater density values result in smaller snow depths, which cause more grid boxes to fall under the critical value, resulting in fewer grid boxes with 100% snow coverage. For all models except one (model 6) the effect is seasonally dependent and decreases as the snow season progresses. In fall, a change in assumed density from 200 to 300 kg m⁻³ decreases Northern Hemisphere SCA by <2.5% of the Northern Hemisphere land area; the effect decreases in winter to under 2%, and in spring to under 1.5%. Model 6 is more sensitive than other models, with density changes affecting Northern Hemisphere SCA by ∼5% during all seasons, because this model has a large area covered by a very shallow snowpack when compared to other models.
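As a hedged illustration of this mechanism, consider a hypothetical grid cell with an SWE of 6 kg m⁻² (a value chosen only to make the arithmetic clear, not taken from any model), using depth = SWE/σ and the 2.5 cm critical depth described above:

```latex
\begin{align*}
\sigma = 200\ \mathrm{kg\,m^{-3}}:&\quad d = \frac{6}{200} = 0.030\ \mathrm{m} \ge d_c = 0.025\ \mathrm{m} \;\Rightarrow\; f = 1 \\
\sigma = 300\ \mathrm{kg\,m^{-3}}:&\quad d = \frac{6}{300} = 0.020\ \mathrm{m} < d_c \;\Rightarrow\; f = \frac{0.020}{0.025} = 0.8
\end{align*}
```

Only cells whose depth is near the critical value respond in this way, which is consistent with the larger sensitivity reported for a model with extensive shallow snowpack.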
 Greenland is included with the North American calculations. This hardly affects the conclusions, as almost every grid cell over Greenland is snow covered during most months. Note that in the land masks associated with each model, the grid cells over Greenland for only some, but not all, models are identified as land. We ignore these designations and include the appropriate grid cells in the calculations for all models.
5. Continental to Hemispheric Scale SCA Results
 In this section we evaluate the seasonal cycle and interannual variability of modeled SCA compared to observations by spatially averaging over the entire land area of the Northern Hemisphere (NH), as well as over the Eurasian (EU) and North American (NA) continents separately. Results are shown for individual models at the seasonal timescale and for summarized statistics from all fifteen models at the monthly timescale.
5.1. Seasonal Cycle
 Most AMIP-2 models produce seasonal mean SCA within 5% of the observed mean during fall, winter, and spring over the entire NH, as well as over NA and EU separately (Figure 2). (Use of the median, rather than mean, provides comparable results.) Each panel in the figure shows a bar chart of observed and modeled SCA, with horizontal lines indicating observed values plus or minus 0.05. Applying the 5% criterion, only three models display a tendency to underestimate SCA (particularly so over North America): models 4 and 8 during all seasons, and model 13 during spring. Only four models overestimate SCA for at least one season/continent (models 1, 3, 5, and 6). All other models simulate mean SCA within 5% for all cases. These are greatly improved results compared to AMIP-1 simulations, half of which underestimated NH winter SCA by >5% (Figure 1a of Frei and Robinson ).
 The success of AMIP-2 models in simulating the seasonal SCA cycle is apparent at the monthly timescale as well (Figure 3a). During January and February almost 60% of Northern Hemisphere land areas north of 20°N are snow covered (top panel, asterisks). Median model results (shown by box and whisker plots) fall within 2% of this value, and the interquartile spread is under 10%. During the shoulder seasons, models perform adequately but are less accurate than during winter.
 To emphasize seasonal dependencies in model simulations we express the results of Figure 3a as anomalies (Figure 3b). Median model results show little seasonal dependency, as they are within 5% of observed values during all months except October, when the median value overestimates observations by >5% over Eurasia. However, model results do exhibit seasonal dependencies, as the spread of model values tends to be greater during the shoulder seasons than during winter.
5.2. Interannual Variability
 AMIP-2 models underestimate the range of NH winter SCA (Figure 4). This figure is similar to Figure 2, except that the bars indicate “range” rather than “median,” and the horizontal lines indicate 100% and 50% of observed values. During the shoulder seasons the underestimation of NH interannual variability is worse than during winter: five models during spring and five during fall have ranges below 50% of observed. Only one model is below 50% in winter; only one model is below 50% during all three seasons; only one other model is below 50% in two seasons (spring and fall). There is no single model that stands out from the others by exhibiting closer-to-observed variability in all three seasons. Despite this obvious mismatch between simulations and observations, these results demonstrate great improvements over AMIP-1 when the underestimation of interannual variability was more severe.
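The range comparison used here can be sketched as follows. This is an illustrative reading of the text, in which the range is the maximum minus the minimum of a seasonal-mean SCA series and each model is compared against 50% of the observed range; the function names and the dictionary of model series are hypothetical.

```python
from typing import Dict, List, Sequence


def value_range(series: Sequence[float]) -> float:
    """Range (maximum minus minimum) of a seasonal-mean SCA series."""
    return max(series) - min(series)


def range_ratio(model: Sequence[float], observed: Sequence[float]) -> float:
    """Model range expressed as a fraction of the observed range."""
    observed_range = value_range(observed)
    if observed_range == 0:
        raise ValueError("observed series has zero range")
    return value_range(model) / observed_range


def models_below_half(models: Dict[str, Sequence[float]],
                      observed: Sequence[float]) -> List[str]:
    """Names of models whose range is below 50% of the observed range."""
    return [name for name, series in models.items()
            if range_ratio(series, observed) < 0.5]
```

Applied once per season, a check of this kind yields the counts of models falling below the 50% line.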
 Over NA and EU individually some models overestimate interannual variability. During winter five models overestimate the range of values for each continent. During spring only one or two models overestimate the observed range. During fall, half the models overestimate interannual variability over North America. Only during fall over Eurasia do all models underestimate interannual variability.
 It is apparent from Figure 4 that the range of values for the entire Northern Hemisphere is not greater than the range for either continent individually (in both observations and model results). This is because there is little correlation between extreme seasons on the two continents: for example, a large SCA year over North America does not usually correspond to a high year over Eurasia [Frei and Robinson, 1999]. Thus anomalies in SCA over the two continents will tend to compensate each other, resulting in smaller anomalies for the hemisphere as a whole.
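A short derivation may make the compensation argument concrete. It is a sketch under two assumptions that go beyond the text: the hemispheric fraction is treated as a land-area-weighted average of the two continental fractions (the weights $w_{NA}$ and $w_{EU}$ are introduced here for illustration), and variance stands in for the range statistic shown in the figure.

```latex
\begin{align*}
f_{NH} &= w_{NA} f_{NA} + w_{EU} f_{EU}, \qquad w_{NA} + w_{EU} = 1,\quad 0 < w_{NA}, w_{EU} < 1 \\
\operatorname{Var}(f_{NH}) &= w_{NA}^2 \operatorname{Var}(f_{NA}) + w_{EU}^2 \operatorname{Var}(f_{EU}) + 2 w_{NA} w_{EU} \operatorname{Cov}(f_{NA}, f_{EU}) \\
\operatorname{Cov}(f_{NA}, f_{EU}) \approx 0 \;&\Rightarrow\; \operatorname{Var}(f_{NH}) \approx w_{NA}^2 \operatorname{Var}(f_{NA}) + w_{EU}^2 \operatorname{Var}(f_{EU}) < \max\{\operatorname{Var}(f_{NA}), \operatorname{Var}(f_{EU})\}
\end{align*}
```

The final inequality follows because $w_{NA}^2 + w_{EU}^2 < 1$ whenever both weights lie strictly between 0 and 1, so weakly correlated continental anomalies cannot produce hemispheric variability larger than that of the more variable continent.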
6. Regional SCA Anomalies
6.1. SCA Anomalies
 To evaluate subcontinental scale regions, we assess simulations of SCA in each of 24 15-degree wide longitudinal bands around the Northern Hemisphere. Within each band the fraction of land area north of 20°N covered by snow is calculated in the same manner as described earlier for the continental scale analysis. Figure 5 shows results from the 15 AMIP-2 models (indicated by plus signs) and from observations (diamonds) for each season. For example, mean results for the fall season show that most models, as well as the model ensemble mean (solid line), lie above the observed values over Eurasia (between 90° and 180°E longitude).
 To better identify regions of overestimation and underestimation, we express the ensemble mean of the 15 AMIP-2 models (solid line in Figure 5) as anomalies (the differences between ensemble mean and observed values) in units of fractional area north of 20°N over each longitudinal band. When ensemble-mean anomalies are plotted as a function of longitude (Figure 6, solid line, left-hand axis), the models appear to have consistent anomalies over four particular regions. These regions include western Asia (30°–60°E) and Greenland/Iceland (300°–345°E), where SCA tends to be underestimated; and, eastern Asia (80°–120°E) and the Bering Strait region (∼180°E), where SCA tends to be overestimated. However, when expressed in fractional area units, the magnitudes of the model anomalies do not indicate which regions have errors that are significant in the context of global variability.
 To emphasize those regions over which model anomalies are significant in a global context, we identify regions with anomalies that are large in terms of absolute values of SCA (rather than fractional land areas) by expressing the anomalies in units of actual area (Figure 6, dashed line, right-hand axis). In two of the four regions (Greenland/Iceland and the Bering Strait) very little land area exists, and absolute anomalies are insignificant. The remaining two regions, eastern Asia and western Asia, have significant and consistent model biases in SCA. Eastern Asia has the largest anomalies, with SCA overestimated by more than 10⁶ km² in January. Over western Asia model biases are of the opposite sign and approximately half the magnitude. There are no biases over North America that are consistent among most or all models.
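The step from fractional to absolute anomalies can be sketched as below. This is a minimal illustration, assuming that the absolute anomaly in each longitudinal band is the fractional anomaly multiplied by that band's land area north of 20°N; the function names and input layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def ensemble_mean(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average per-band snow fractions across models (one row per model)."""
    n_models = len(per_model)
    if n_models == 0:
        raise ValueError("need at least one model")
    return [sum(band) / n_models for band in zip(*per_model)]


def band_anomalies(per_model: Sequence[Sequence[float]],
                   observed: Sequence[float],
                   band_land_area_km2: Sequence[float]
                   ) -> List[Tuple[float, float]]:
    """Per band: (fractional anomaly, absolute anomaly in km^2)."""
    mean_fraction = ensemble_mean(per_model)
    if not (len(mean_fraction) == len(observed) == len(band_land_area_km2)):
        raise ValueError("all inputs must cover the same bands")
    return [(m - o, (m - o) * area)
            for m, o, area in zip(mean_fraction, observed, band_land_area_km2)]
```

Two bands with the same fractional anomaly can differ by an order of magnitude in absolute terms when their land areas differ, which is why bands with little land, such as those over the Bering Strait, drop out of the absolute comparison.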
 To illustrate in more detail the spatial distribution of these anomalies, anomaly maps of mean SCA from three representative models are depicted in Figure 7. (The three models chosen for display are illustrative only; any set of models would have sufficed.) Over western Eurasia (the region bounded by approximately 44°–61°E, 35°–50°N) we find a tendency for models to have less frequent snow cover than observed. Over eastern Asia (the region bounded by approximately 80°–120°E, 28°–40°N), which includes the Tibetan Plateau and northeastern China, the predominance of positive anomalies indicates that models have too frequent snow cover. Although Figure 7 also shows large regions with anomalies over North America for individual models, the geographic locations and signs of those anomalies were not consistent between most or all models.
Tables 3a–3c show monthly and seasonal SCA anomalies for AMIP-2 models in three subregions over Eurasia. The eastern Asian region identified in the analysis above (bounded by approximately 80°–120°E, 28°–40°N) includes the Tibetan Plateau on its western half (with elevations over 4000 m) and the mountains and lowlands of northeastern China on its eastern half (with elevation range 0–3000 m). As these two areas have markedly different climates, we thought it logical to evaluate anomalies over each individually. Therefore we evaluate SCA anomalies over the Tibetan Plateau region (80°–100°E, 28°–40°N) (Table 3a) and the eastern China region (100°–120°E, 28°–40°N) (Table 3b) separately. The third region is the one identified above as western Asia (44°–61°E, 35°–50°N) (Table 3c). These three regions, the boundaries of which are shown in Figure 7 and labeled as numbers 1 through 3, are evaluated in more detail in the remainder of this analysis.
Table 3a. Model SCA (%) Anomalies (Model–Observed) Compared to Visible Band Satellite Imagery Averaged Over the Tibetan Plateau Region 80°–100°E, 28°–40°N
Mean values include models 1 through 14 only. See text for explanation.
 Over the Tibetan Plateau and northeastern China SCA tends to be overestimated. Anomalies for all models during all months are positive (with the exception of three models over the Tibetan Plateau during October, which had small negative anomalies <10%). The overestimation is greater over the Plateau (October through April mean model anomaly of 38%) than over eastern China (October through April mean model anomaly of 25%). Over both regions the overestimation is greatest between December and March (mean model values of 45% over the Plateau and 36% over eastern China).
 In contrast, over western Asia the majority of models underestimate SCA during all months. Only three models have positive anomalies during any month. These include model 1, which has a tendency to slightly overestimate SCA throughout the season; model 5, which overestimates during the latter part of the snow season; and model 6, which overestimates during the first half of the snow season. As a result, the mean model anomaly from October through April is −14%. As over eastern Asia, the magnitudes of biases are greatest during the December through March period (−20%).
6.2. Temperature and Precipitation Anomalies
 In this section we attempt to explain the SCA biases by evaluating temperature and precipitation simulations over the three Asian regions. As we will see below, model 15 has unusual results over eastern Asia in a number of respects, and is considered an outlier in this analysis. Therefore when calculating mean values for all models, and when performing multiple regression analyses, model 15 is excluded.
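 The effect of excluding an outlier from a multi-model mean can be illustrated with a minimal Python sketch. The anomaly values below are hypothetical placeholders, not values taken from the tables of this paper, and the function name `multi_model_mean` and the dictionary layout are assumptions made for illustration only.

```python
def multi_model_mean(anomalies, exclude=()):
    """Mean anomaly across models, skipping any model numbers listed in `exclude`."""
    kept = [value for model_id, value in anomalies.items() if model_id not in exclude]
    if not kept:
        raise ValueError("no models left after exclusion")
    return sum(kept) / len(kept)


# Hypothetical seasonal anomalies (model minus observed), keyed by model number.
# Model 15 is given an atypical value to mimic an outlier.
example_anomalies = {1: 30.0, 2: 40.0, 3: 50.0, 15: -20.0}

mean_all = multi_model_mean(example_anomalies)
mean_without_outlier = multi_model_mean(example_anomalies, exclude={15})
print(mean_all, mean_without_outlier)
```

 In this toy case a single atypical model shifts the mean noticeably, which is the motivation for reporting mean values over models 1 through 14 only.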
 The climates of these three regions display interesting contrasts (Table 2). While all three are arid, by far the coldest and driest is the Tibetan Plateau, with mean temperature for the season approximately −5°C and mean precipitation approximately 0.35 mm per day. Seasonal mean SCA for the Plateau is ∼23%.
 Regionally averaged climate data for the remaining two regions (western Asia and eastern China) present an apparent inconsistency. Both have comparable mean temperatures and precipitation rates (seasonal mean temperatures are between 4°C and 5°C, and precipitation rates approximately 1 mm/day). However, SCA over the two regions is quite different. Western Asia has over 30% seasonal mean SCA, with the mid winter months having SCA of almost 60%; in contrast, eastern China has SCA under 7% in every month. This difference is due to topographically induced within-region climate variability that is masked by regional averaging. The western Asian region includes mostly steppes with relatively small topographic, and therefore climatic, variation. The eastern China region, however, includes steep west-to-east topographic, and therefore temperature, gradients as the terrain drops from the Plateau in the west to the lowlands in the east. For example, mean January temperatures rise by approximately 10°C from the western to the eastern side of the region. Zonal precipitation gradients over eastern China are also strong, and run in the same west-to-east direction: mean January precipitation in this region increases from 5 mm in the highlands to around 50 mm in the lowlands. Thus the eastern China region experiences significant precipitation over the warmer lowlands, and little precipitation over the colder uplands, resulting in minimal snow cover compared to the western Asian region.
 Over the Tibetan Plateau region, the models tend to be too cold and wet, which is consistent with the overestimation of SCA. Temperature anomalies in this region (Table 4a) are negative for all models except three. Models 4 and 8 have moderate positive temperature anomalies, while model 15 has unusually large positive temperature anomalies and is considered an outlier. The mean model temperature anomaly for October through April is −2.9°C, with particularly large departures in February and March. Precipitation anomalies over the Plateau (Table 5a) are positive for every model during each month, with mean October through April departures of over 1 mm per day. This is a large departure for such an arid region, which has mean precipitation of only 0.35 mm per day (Table 2).
Table 4a. Model Temperature (°C) Anomalies (Model–Observed) Compared to Willmott and Matsuura Averaged Over the Tibetan Plateau Region 80°–100°E, 28°–40°N
Mean values include models 1 through 14 only. See text for explanation.
 Over eastern China results are similar (Tables 4b and 5b). The cold bias is less pronounced than over the Plateau, with mean seasonal anomalies of −2.2°C. Precipitation anomalies are comparable to the Plateau when expressed in mm/day. However, since observed precipitation over this region is about three times greater than over the Plateau during the snow season (Table 2), when expressed relative to normal precipitation, departures here are less extreme than over the Plateau. Nevertheless, the models do run relatively cold and wet.
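 The distinction between absolute and relative precipitation departures can be made concrete with a short Python sketch. The inputs are rounded from values quoted in the text (a wet bias of roughly 1 mm/day in both regions, and observed means of about 0.35 mm/day over the Plateau and about 1 mm/day over eastern China); treating both biases as exactly 1.0 mm/day is an assumption made for illustration, and the function name `relative_anomaly` is hypothetical.

```python
def relative_anomaly(anomaly_mm_day, observed_mm_day):
    """Anomaly expressed as a fraction of the observed mean precipitation."""
    if observed_mm_day <= 0:
        raise ValueError("observed precipitation must be positive")
    return anomaly_mm_day / observed_mm_day


# Rounded, approximate inputs (assumptions, see the paragraph above).
plateau = relative_anomaly(1.0, 0.35)
eastern_china = relative_anomaly(1.0, 1.0)
print(f"Plateau: {plateau:.0%} of observed; eastern China: {eastern_china:.0%} of observed")
```

 Under these rounded inputs, the same absolute wet bias amounts to nearly three times the observed mean over the Plateau, but only about the observed mean itself over eastern China.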
 In contrast, over western Asia SCA tends to be underestimated by AMIP-2 models (although these results are less consistent from model to model). Simulated climates over western Asia tend to be too warm and wet (Tables 4c and 5c). The mean seasonal temperature anomaly for all models is 1.3°C, with particularly large departures during February and March. Four of the fifteen models have negative temperature anomalies over this region. Precipitation is overestimated, with a mean seasonal departure for all models of 0.21 mm per day compared to a mean observed value of 1 mm per day (Table 2).
 Only in the western Asian region does there appear to be a significant relationship between the magnitude of the SCA anomaly and the magnitudes of temperature and/or precipitation anomalies. Scatterplots of SCA versus temperature and precipitation anomalies were evaluated, and a series of band-regression analyses and least squares multiple linear regressions were performed using SCA as the dependent variable and temperature and precipitation as the independent variables, for each month between October and April; for seasonal mean (October through April) values; and for December through March values. In addition, the analyses were performed using natural log transformations of the independent as well as the dependent variables. For the Tibetan Plateau and eastern China regions, few significant results are apparent. Over western Asia the magnitude of the temperature anomaly appears to be inversely related to the magnitude of the SCA anomaly between November and March. For models with temperature anomalies greater than about 2°C, we find more negative SCA anomalies. This is probably related to the fact that mean January and February temperatures in this region are approximately −2°C, so that a warm bias of more than about 2°C raises simulated midwinter temperatures above freezing. This is the only one of the three regions for which a relationship between the magnitudes of SCA, temperature, and precipitation anomalies is apparent.
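 A least squares multiple linear regression of the kind described above, with the SCA anomaly as the dependent variable and the temperature and precipitation anomalies as independent variables, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in this study: the function name `fit_multiple_regression` is hypothetical, the per-model anomalies are invented for the example, and the band-regression and log-transformed variants are not shown. In practice a library routine such as `numpy.linalg.lstsq` would normally be used instead of hand-written elimination.

```python
def fit_multiple_regression(temp_anom, precip_anom, sca_anom):
    """Ordinary least squares fit of sca = b0 + b1*temp + b2*precip.

    Solves the 3x3 normal equations by Gaussian elimination with partial
    pivoting and returns ([b0, b1, b2], r_squared).
    """
    rows = [(1.0, float(t), float(p)) for t, p in zip(temp_anom, precip_anom)]
    y = [float(s) for s in sca_anom]
    n = 3
    # Augmented normal-equation matrix [X'X | X'y].
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(n)]
        + [sum(row[i] * yk for row, yk in zip(rows, y))]
        for i in range(n)
    ]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[k][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    # Coefficient of determination.
    fitted = [sum(b * x for b, x in zip(coeffs, row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    if ss_tot == 0:
        raise ValueError("dependent variable has no variance")
    return coeffs, 1.0 - ss_res / ss_tot


# Hypothetical anomalies for six models (not values from Tables 3c, 4c, or 5c).
temp = [-0.5, 0.4, 1.1, 1.9, 2.6, 3.2]         # temperature anomaly, degrees C
precip = [0.10, 0.25, 0.15, 0.30, 0.20, 0.35]  # precipitation anomaly, mm/day
sca = [-2.0, -6.0, -12.0, -19.0, -25.0, -31.0]  # SCA anomaly, %

coeffs, r_squared = fit_multiple_regression(temp, precip, sca)
print("intercept, temperature slope, precipitation slope:", coeffs)
print("R^2:", r_squared)
```

 A negative temperature slope in such a fit corresponds to the inverse relationship described above, in which warmer-biased models have more negative SCA anomalies.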
7. Discussion and Conclusions
 AMIP-2 models display little seasonal bias in their simulations of the seasonal cycle of SCA at continental scales. This represents a significant improvement over AMIP-1 models. All AMIP-1 models underestimated North American SCA between September and December and overestimated Eurasian SCA during April and May [Figure 3 of Frei and Robinson, 1998].
 AMIP-2 models also display improvement with regard to interannual variability. One conspicuous deficiency in AMIP-1 simulations was their universal underestimation of interannual variability of SCA [Figure 1b of Frei and Robinson, 1998]. In fact, the range of SCA values in almost half of all AMIP-1 simulations was below 50% of observed values. AMIP-2 simulations have partially rectified this problem: the tendency to underestimate interannual variability, while not eliminated, is far less pronounced in AMIP-2. Improvements in model results are associated with increased resolution, improved parameterizations, and ensemble experiments. However, detailed analyses of individual models to identify specific causes are beyond the scope of this analysis.
 Regional scale biases, on the other hand, are apparent in AMIP-2 models. At the southern boundary of the seasonal snowpack the models consistently overestimate SCA over eastern Eurasia (positive biases on the order of +1 × 10⁶ km²) and underestimate SCA over western Eurasia (negative biases on the order of −0.5 × 10⁶ km²). While these biases are associated with temperature and precipitation anomalies, over eastern Asia the magnitude of the modeled SCA anomaly does not seem to be related to the magnitude of the temperature and/or precipitation anomalies. Over western Asia, SCA anomalies seem to be driven by temperature but not precipitation. More detailed diagnostic studies are left for the individual modeling groups.
 Room for improved simulations of SCA by GCMs remains, particularly in regard to simulating the climate over interior continental Asia and to the parameterization of precipitation and probably sublimation processes over cold, dry, high elevation regions. Nevertheless, improved snow simulations found in recent-generation GCMs should lend more credibility to the results of climate change experiments, especially if other diagnostic subprojects find similar improvements in other aspects of model performance.
 Evaluation of SWE, in addition to SCA, as simulated by GCMs is now becoming possible using gridded data sets of SWE over North America that have recently become available. These data sets combine fairly dense networks of snow depth observations with sparser networks of SWE observations and snowpack models. Detailed analyses of AMIP-2 SWE using such data sets are currently underway, and the results will be reported separately. Future analyses will also include evaluations of modes of variability in snow cover patterns, and relationships between snow and atmospheric circulation in AMIP-2 models.
 This research was supported by grants from NSF (grant ATM-9818098), NOAA (grant NA06GP0566), and NASA (grant NAG5-11403). We thank two anonymous reviewers for helpful comments.