Snow models such as SNOW-17 may estimate past snow water equivalent (SWE) using either a forward configuration based on spatial extrapolation of measured precipitation, such as with the parameter-elevation regressions on independent slopes model (PRISM), or a reconstruction configuration based on snow disappearance timing and back-calculated snowmelt. However, little guidance exists as to which configuration is preferable. Because the two approaches theoretically have opposite sensitivities to model forcing, combining (averaging) their SWE estimates may be advantageous. Using 154 snow pillow sites located in maritime mountains of the western United States, we compared forward, reconstruction, and combined configurations of a simplified SNOW-17. We evaluated model errors in annual precipitation, peak SWE, and SWE during the accumulation and ablation seasons. We also conducted a separate analysis to assess the sensitivity of peak SWE to biased forcing data and model parameters. The forward model had the greatest precipitation accuracy, while the combined model had the greatest accuracy in peak SWE and SWE during the accumulation and ablation seasons. In determining peak SWE, the forward and reconstruction models demonstrated opposite sensitivities to errors in air temperature and model parameters, and the combined model minimized errors due to temperature bias and parameter uncertainty. In basins with precipitation gages, we recommend PRISM for precipitation estimation and the combined model for SWE estimation. In areas with high precipitation uncertainty, reconstruction is more viable. Accurate model parameters dramatically improved reconstruction, so more work is needed to advance parameter estimation techniques in complex terrain.
 Snow hydrologists often ask the fundamental questions, “What is the snow water equivalent (SWE) at an ungaged location in a mountainous basin, and how does it change in time?” These questions are important to hydrologists because understanding spatial distributions of SWE is essential for constructing depletion curves [Luce et al., 1999; Homan et al., 2010] which may be used to forecast seasonal runoff [Rango and Martinec, 1982]. The questions are also important for understanding seasonal snowpack interactions with fine-scale ecology. The magnitude of peak SWE (and snow depth) is positively related to snowpack persistence [Liston, 1999], and these variables impact ecology, such as vegetation distributions [Barbour et al., 1991], tree growth [Littell et al., 2008], and wildlife habitat [Millar and Westfall, 2010].
 The answers to the above questions remain elusive because of extreme spatial variability in snow properties [Elder et al., 1989] and because observational networks are sparse in many mountainous basins worldwide (e.g., British Columbia, Canada [M. Miles and Associates, 2003], California [Lundquist et al., 2003], and New Zealand [Weingartner and Pearson, 2001]). Furthermore, observation stations are typically located in flat clearings [Farnes, 1967], which often report systematically higher SWE than the surrounding area [Molotch and Bales, 2005; Lee et al., 2005; Grünewald and Lehning, 2011]. Thus, the available observations are not likely to represent the true spatial distribution of SWE, especially in mountainous basins with complex terrain and heterogeneous vegetation [Blöschl, 1999]. Researchers must either collect more snow data (e.g., field surveys, remote sensing) or use models forced by other information (e.g., terrain characteristics, precipitation, air temperature) to estimate SWE.
 To expand SWE observations, intensive ground-based field surveys [e.g., Cline et al., 2003; Molotch and Bales, 2005] have been conducted in relatively small areas over discontinuous time periods. These surveys are uncommon because they rely on intense manual labor operating in challenging terrain. Observations of SWE or snow depth have also been provided by remote sensing, such as laser scanning technology [Grünewald et al., 2010; Prokop et al., 2008] or scanning microwave radiometers [e.g., Dahe et al., 2006]. Despite the promise of these remote sensing instruments, limitations remain. Laser scanning technology is not routinely employed in most basins, and microwave radiometers observe SWE in large footprints (e.g., 0.5°), which are too coarse to resolve SWE variability in many basins. Passive remote sensing instruments (e.g., USGS/NASA Landsat, NASA MODIS, and ESA MERIS) provide global observations of snow cover but do not observe SWE or snow depth.
 Because of these challenges, researchers are left to model SWE. To interpolate and extrapolate SWE observations, studies around the northern hemisphere have utilized many techniques, such as multivariate statistical methods [e.g., Anderton et al., 2004], probabilistic approaches [e.g., Skaugen, 2007], masked interpolation methods [e.g., Fassnacht et al., 2003], global interpolation models [López-Moreno and Nogués-Bravo, 2006], and regression trees based on terrain characteristics [e.g., Elder et al., 1998]. Methods that rely on snow observations alone may not accurately model SWE in areas above the highest observations [Rice et al., 2011], so a deterministic snow model [e.g., Anderson, 1976] that simulates SWE time series based on meteorological data (e.g., air temperature, precipitation) may be preferred. Two different configurations of the same snow model may be employed to simulate SWE with time.
 In the first configuration, estimates of precipitation are input into the snow model, which partitions the precipitation at each time step into rain and snow (typically with a threshold air temperature) and stores snowfall accumulation as SWE. The model reduces SWE when environmental conditions favor snowmelt. This precipitation-driven approach is common, and we refer to it herein as the “forward” model (Figure 1a). If a precipitation gage network exists nearby, precipitation can be estimated with analytic mapping models, such as the parameter-elevation regressions on independent slopes model (PRISM) [Daly et al., 1994], multivariate regression [Marquínez et al., 2003], kriging [Garen and Marks, 2005], inverse-distance weighting [Gemmer et al., 2004], or truncated Gaussian filters [Thornton et al., 1997]. When few precipitation gages are available in a mountainous basin, which is not uncommon, uncertainty in precipitation inputs increases [e.g., Tsintikidis et al., 2002], and this uncertainty will propagate through the forward model [Shamir and Georgakakos, 2006].
 With the second configuration, SWE is reconstructed by summing modeled snowmelt backward in time starting from the date of snow disappearance, in order to estimate how much SWE must have existed before snowmelt commenced (Figure 1b). Herein we refer to this method as “reconstruction” [Molotch and Bales, 2005], although it has also been called the “depletion method” [Cline et al., 1998; Rice et al., 2011]. Several members of the snow science community (see Table A1 in Appendix A) have selected reconstruction over forward modeling for various reasons. First, reconstruction does not require precipitation observations to estimate peak SWE in years when no snow accumulates after peak SWE. At a minimum, reconstruction requires air temperature to calculate snowmelt, and air temperature may be more reliably estimated than precipitation [Ninyerola et al., 2000]. Second, reconstruction incorporates observed snow disappearance timing (Figure 1b), which provides additional information because it is correlated with peak SWE and ablation season melt rates [Liston, 1999]. Third, snow disappearance timing observations are available for most basins worldwide from snow covered area (SCA) products derived from passive remote sensing imagery (e.g., Landsat, MODIS, and MERIS). Thus, reconstruction is possible in most basins worldwide, regardless of the availability of precipitation observations.
 When estimating past SWE in locations that have nearby precipitation gages, one must decide to use either a forward model or reconstruction, but there has been little research to guide this decision. Studies have independently examined the accuracy and sensitivity of forward models [Shamir and Georgakakos, 2006; He et al., 2011a, 2011b] and reconstruction [Rice et al., 2011; A. G. Slater, et al., Uncertainty in seasonal snow reconstruction: Relative impacts of model forcing and image availability, submitted to Adv. Water Resourc., 2011]. However, no study has examined these together. The critical premise of reconstruction is that snowmelt can be modeled more accurately than snowfall, but this hypothesis has not yet been tested. Additionally, if these models have opposite sensitivities to model inputs or parameters (see section 2), then a “combined model” that averages SWE from forward and reconstruction models may reduce the likelihood of SWE errors. No prior study has examined the hypothetical benefits of this type of combined model.
 The purpose of this study is to compare the accuracy and sensitivity of three configurations (forward, reconstruction, and combined) of the same snow model, in order to provide guidance for selecting a model configuration to estimate SWE and precipitation. Two specific questions are addressed: Which model configuration (forward, reconstruction, or combined) is likely to produce the most accurate estimates of annual precipitation, peak SWE, and SWE during the accumulation and ablation seasons? And how sensitive are the model configurations to biases in model data and parameters when estimating SWE?
 To answer these questions, we employ a simplified version of SNOW-17 [Anderson, 1976] to calculate snow accumulation and ablation. We select a temperature index model because it only requires air temperature and precipitation data; an energy-balance approach requires additional data (e.g., radiation, wind, humidity), which are not widely available in most mountainous basins. We use air temperature, precipitation, and SWE data from 154 snow pillows in the western U.S. to calibrate SNOW-17 and test the three model configurations. Although these flat, clearing sites may not be representative of zonal or basin SWE, we use them because they readily provide a large pool of SWE, air temperature, and precipitation data that allow testing and comparison between the model configurations. In contrast, observations are rarely available along sloped terrain [Pomeroy et al., 2003].
 In testing each model configuration, we only assume that air temperature and snow disappearance timing data are known; the SWE data at each snow pillow site are used only for cross-validation and do not inform model calibration or the mass balance simulation at that site. For the forward model, we estimate precipitation using a PRISM precipitation map because PRISM has been shown to produce smaller errors and less bias than other techniques [Daly et al., 1994]. The methodology was developed here to reflect the likely data available to a researcher who needs to estimate past SWE at ungaged locations.
2. Forward and Reconstruction Models: Theory, Limitations, and Opportunities
 Daily SWE time series with a forward snow model (Figure 1a) can be generalized as
$$\mathrm{SWE}_n = \sum_{t=1}^{n} \left( A_t - M_t \right) \tag{1}$$
where n is the nth day of the water year (1 October–30 September), At is estimated daily snowfall accumulation, Mt is estimated daily snowmelt, and t is the time step (d). At a minimum, forward models require precipitation and air temperature data, as well as model parameters for rain-snow threshold temperatures, snowmelt temperatures, and snowmelt factors.
 In contrast, a SWE time series from a reconstruction model (Figure 1b) is
$$\mathrm{SWE}_n = \sum_{t=n+1}^{d} \left( M_t - A_t \right) \tag{2}$$
where d is the date of snow disappearance (elsewhere referred to as the snow depletion or snow-free date, or the final day of seasonal snow cover), and n is the day of interest, which, by definition, must be before date d. The reconstruction model runs in reverse from date d to date n (Figure 1b). Because the model runs in reverse during this time domain, calculated snowmelt increases reconstructed SWE while snow accumulation decreases reconstructed SWE. At a minimum, reconstruction requires air temperature data and the snow disappearance date, as well as model parameters for snowmelt temperatures and snowmelt factors. Precipitation data and rain-snow threshold parameters are required by reconstruction when snowstorms occur after day n (to the right of t = n in Figure 1b). Many reconstruction studies have assumed no snowfall (i.e., At = 0) after a specified date in March or April (see Table A1 in Appendix A). This assumption was not made in this study because total observed snow accumulation during the melt season was typically 10% of peak SWE at the study sites, and in extreme cases, exceeded 35% of peak SWE.
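The two configurations described above can be sketched in a few lines of Python. This is a minimal illustration and not the study's implementation: the function names, the zero floor on SWE, the end-of-day indexing convention, and the synthetic accumulation and melt values are all assumptions made here. With error-free inputs, summing forward from the start of the record and summing backward from the snow disappearance day give the same series.

```python
def forward_swe(accum, melt):
    """Forward configuration: running sum of (A_t - M_t), floored at zero."""
    swe, series = 0.0, []
    for a, m in zip(accum, melt):
        swe = max(swe + a - m, 0.0)
        series.append(swe)
    return series


def reconstruct_swe(accum, melt, d):
    """Reconstruction: start from SWE = 0 at the end of day index d and step
    backward in time, adding melt and subtracting accumulation."""
    series = [0.0] * len(accum)
    swe = 0.0
    for t in range(d, 0, -1):
        swe = max(swe + melt[t] - accum[t], 0.0)
        series[t - 1] = swe  # SWE at the end of the previous day
    return series


# Synthetic daily values (mm), chosen only for illustration.
accum = [10.0, 20.0, 0.0, 5.0, 0.0, 0.0]
melt = [0.0, 0.0, 5.0, 0.0, 15.0, 15.0]
d = 5  # index of the first snow-free day after peak SWE

fwd = forward_swe(accum, melt)
rec = reconstruct_swe(accum, melt, d)
```

In this sketch the late snowfall on the fourth day lowers the reconstructed SWE for earlier days, which mirrors the statement that accumulation decreases reconstructed SWE when the model runs in reverse.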
 Forward models are intuitive because they operate in the same direction as time, and are practical because they can simulate past, current, or future conditions. Reconstruction models are retrospective and can only simulate past SWE because they first require the snow to disappear. Despite this limitation, reconstruction may yield the only viable estimate of past SWE when precipitation distributions are unknown because it does not require precipitation data. When using a temperature-index model, reconstruction will have comparable or improved accuracy relative to a forward model, given accurate air temperature data, low uncertainty in snow disappearance dates, and appropriate model parameters (see section 4).
 In practice, both models are impacted by errors in data and parameters, and these are likely to impact the SWE estimation. At ungaged locations, air temperature is commonly estimated with a lapse rate, which may introduce error when not based on regional observations [Minder et al., 2010]. Air temperature estimation errors of 1°C are typical across lower elevations, and errors exceeding 1.5°C are common when estimating temperature at higher elevations [Hasenauer et al., 2003]. Precipitation biases of 10%–50% may occur when windy conditions cause gage undercatch [e.g., Goodison et al., 1998], while additional errors may arise during interpolation or extrapolation of gage data across a basin (e.g., PRISM or kriging). Snow disappearance dates inferred from remote sensing may be inaccurate for various reasons, such as cloud cover, missing scenes, concealment by forest canopy, or if the spatial scale of interest is smaller than the satellite instrument's footprint [e.g., Dozier et al., 2008]. For temperature index models, uncertainty in model parameters at ungaged locations may be large [He et al., 2011a].
 In theory, forward and reconstruction models should exhibit varying sensitivities to model data and parameters when estimating peak SWE (Table 1). Notably, the models have opposite sensitivities to air temperature and to the model parameters, though the magnitudes of these sensitivities should vary. Bias in air temperature may impact the forward model through rain-snow partitioning [e.g., Moore and Owens, 1984; Minder et al., 2010], and reconstruction through snowmelt rates [e.g., Richard and Gratton, 2001; Minder et al., 2010]. Minder et al. [2010] found that air temperature estimates with a warm bias may result in less snowfall and greater snowmelt rates. The impact of bias in model parameters depends on location and seasonal climate. For example, forward model estimates of peak SWE are only sensitive to the snowmelt parameters in years when melt occurs before peak SWE, while reconstruction is sensitive to rain-snow threshold parameters only when storms bring rain and snow after peak SWE.
Table 1. Correlation of Forward and Reconstruction Model Biases to Biases in Model Inputs and Parameters When Estimating Peak SWE^a
Column headings: Forward Bias in Peak SWE; Reconstruction Bias in Peak SWE
^a We assume that midwinter melt may occur before peak SWE (impacting the forward model) while some snow accumulation may occur after peak SWE (impacting the reconstruction model).
The rain-snow threshold temperatures in this study are Train and Tsnow.
The snowmelt threshold temperature in this study is MBASE.
The minimum and maximum snowmelt factors in this study are MFMIN and MFMAX, respectively.
 Therefore, forward and reconstruction models should have opposite sensitivities to accumulation and ablation processes, presenting an opportunity where averaging their peak SWE estimates may minimize the impact of biased data and/or parameters. We hypothesize that this type of combined model is more likely to produce smaller SWE errors than either forward or reconstruction models because averaging the opposing sensitivities will reduce the overall error.
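A short worked example may help make this hypothesis concrete. The error symbols $\varepsilon_f$ and $\varepsilon_r$ and the numerical values below are introduced here for illustration only and are not results from the study. If the forward and reconstruction estimates of peak SWE differ from the observed value by $\varepsilon_f$ and $\varepsilon_r$, then averaging the two estimates gives a combined error of

```latex
\varepsilon_{\mathrm{comb}} = \tfrac{1}{2}\left(\varepsilon_f + \varepsilon_r\right),
\qquad
\left|\varepsilon_{\mathrm{comb}}\right|
\le \tfrac{1}{2}\left(\left|\varepsilon_f\right| + \left|\varepsilon_r\right|\right)
\le \max\left(\left|\varepsilon_f\right|, \left|\varepsilon_r\right|\right)
```

For instance, $\varepsilon_f = +20\%$ and $\varepsilon_r = -12\%$ give $\varepsilon_{\mathrm{comb}} = +4\%$, which is smaller in magnitude than either individual error. Errors of opposite sign partly cancel, whereas errors of the same sign do not. The combined error never exceeds the larger of the two individual errors, but it can exceed the smaller one (e.g., $\varepsilon_f = +2\%$ and $\varepsilon_r = -20\%$ give $-9\%$).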
3.1. Observational Sites and Quality Control
 The study was conducted at 154 snow pillows (Figure 2a) in the western U.S. maritime ranges, including the Cascades of Washington and Oregon, the California Sierra Nevada, the Blue Mountains of Oregon, the Pacific Northwest Coastal Range, and the California Klamath Range. In these ranges, 85%–95% of annual precipitation arrives between October and May [Baker, 1944]. In the subalpine areas of these ranges, 50%–67% of annual precipitation is snowfall [Serreze et al., 1999], while 90%–100% of alpine precipitation falls as snow [Smith and Berg, 1982; Kattelmann and Elder, 1991]. Site elevations ranged from 685 m (snow transition zone) to 3475 m (alpine zone); 42% of the study snow pillows were located below 1600 m, which is the typical rain-snow transition zone in our study areas.
 All of the study sites in Washington, Oregon, and California (east of the Sierra Nevada crest) were drawn from the Natural Resources Conservation Service (NRCS) snowpack telemetry (SNOTEL) network (available at http://www.wcc.nrcs.usda.gov/snow/). All California sites west of the Sierra Nevada crest were selected from the California Department of Water Resources (CDWR) network (available at http://cdec.water.ca.gov/), managed by the California Cooperative Snow Surveys. Observations between water years 1996–2004 were considered in Washington and Oregon, and between water years 1996–1998 in California; the selection of these years was arbitrary. Peak SWE observations ranged from 75 to 2540 mm, and annual precipitation ranged from 330 to 5275 mm over the study. All sites had daily observations of mean air temperature and SWE. Precipitation data were available at all analysis sites except at 26 of the CDWR sites. Eight additional CDWR precipitation gages (Figure 2a, red stars) augmented the California data pool. At sites with precipitation data, undercatch correction was not attempted because wind speed data (not typically available) are required to correct undercatch for storage gages [Sevruk, 1983], which are standard at SNOTEL sites.
 Daily air temperature, precipitation, and SWE data were quality controlled following the framework of Meek and Hatfield. Quality control flags were placed when values exceeded limits (Table 2), and flagged data were either accepted or rejected based on visual inspection. Temperature and precipitation time series from individual stations were also compared to observations at neighboring stations to find anomalies. A station year (i.e., one water year of data at a single station) was discarded if there were four or more consecutive days of missing or flagged data of any one variable during the observed snow season. Station years were also discarded if the snow disappearance date could not be determined from the SWE data. Data gaps of <4 days in the temperature and SWE data were filled by interpolation from data immediately before and after each gap, while no precipitation was assumed during these gaps.
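The gap-handling rule described above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, missing or rejected values are represented by `None`, and linear interpolation is an assumption (the text states only that gaps were filled by interpolation from adjacent data). Precipitation gaps would instead be set to zero.

```python
def longest_gap(series):
    """Length of the longest run of consecutive missing (None) values."""
    longest = run = 0
    for value in series:
        run = run + 1 if value is None else 0
        longest = max(longest, run)
    return longest


def fill_short_gaps(series, max_gap=3):
    """Linearly interpolate interior gaps of up to max_gap days.

    Returns None when a gap is longer than max_gap (the station year would
    be discarded) or when a gap touches either end of the record.
    """
    if longest_gap(series) > max_gap:
        return None
    filled = list(series)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            j = i
            while j < len(filled) and filled[j] is None:
                j += 1
            if i == 0 or j == len(filled):
                return None  # no bounding observation to interpolate from
            left, right = filled[i - 1], filled[j]
            for k in range(i, j):
                filled[k] = left + (right - left) * (k - i + 1) / (j - i + 1)
            i = j
        else:
            i += 1
    return filled
```

For example, `fill_short_gaps([100.0, None, None, 130.0])` fills the two missing days with 110.0 and 120.0, while a record with four consecutive missing days returns `None`.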
Table 2. Data Thresholds Used to Flag Potentially Erroneous Data in the Quality Control Process^a
^a All flagged values were visually inspected to make quality control decisions. The rate-of-change (ROC) limit was used to detect jumps in the data series while the no-observed-change (NOC) limit was used to detect constant data.
Limit values surviving from the table body (variables not identified): 150 mm d−1; 150 mm d−1; 300 mm d−1
 After quality control, 388 station years remained. Fifty-four station years (i.e., 18 stations, three water years each) were isolated for snow model calibration (see section 4) and were not used in evaluation statistics. This left 334 station years (136 snow pillow sites) of data available for the evaluation. Forty of these station years were at the CDWR sites which lacked precipitation data.
3.2. PRISM Data
 Output data from the parameter-elevation regressions on independent slopes model (PRISM) [Daly et al., 1994] were used to estimate precipitation for the forward model (see section 5). PRISM was selected because it is used widely to map precipitation in hydrologic and ecological models. PRISM divides a digital elevation model into topographic facets based on slope orientation and coastal proximity. For each topographic facet in a region, PRISM develops elevation-based regressions with gage observations, and estimates monthly and annual grids of precipitation based on those regressions. Daly et al. [1994] reported mean absolute errors (MAE) of 10%–17% in PRISM annual precipitation when using a 52-station network in Oregon.
 To estimate the 1971–2000 climatology of annual precipitation in the conterminous United States, the PRISM climate group (available at www.prism.oregonstate.edu) created a 30-arcsec (800 m) “normals” product (Figure 2b) from observations at over 13,000 stations [Daly et al., 1994, 2008]. This product incorporates historical precipitation data or snow course data from most major observational networks (e.g., SNOTEL, CDWR). Data from 114 of the 136 analysis sites were used to produce the normals product [C. Daly, personal communication, 2010]. An annually varying 2.5-arcmin (4 km) PRISM “analysis” product is also available; we compared the two PRISM products but found that the normals product yielded improved results in our study. Therefore, we only used the mean annual precipitation data from the 800 m normals product (henceforth called PRISM).
4. Snow Accumulation and Melt Model
 The snow accumulation and melt model used in this study was SNOW-17 [Anderson, 1976]. SNOW-17 is a single-layer, temperature-index snow model used operationally by the National Weather Service (NWS) for flood forecasting. SNOW-17 estimates SWE and outflow (snowmelt plus rain) at each time step. Although many past reconstruction studies (Appendix A) used snow models that required net radiation to calculate snowmelt, our study sites generally lacked radiation observations. Accordingly, we selected SNOW-17 because it required data inputs (e.g., air temperature) that were available at our study sites, and because it has been shown to simulate snowmelt as well as energy balance methods in some studies [Franz et al., 2008].
 We simplified the SNOW-17 model for the sake of computational efficiency. Whereas the full NWS SNOW-17 has 10 model parameters, our simplified version has only five. In the simplified version, the rain-on-snow, light rain, and ground heat melts, liquid water holding capacity, and heat deficit components of the full NWS SNOW-17 were deactivated. If the heat deficit and liquid water holding routines were activated, reconstruction would have required an iterative solution [Raleigh, 2009] because these routines are dependent on snowpack conditions before that time step. With these routines deactivated, the simplified SNOW-17 could reconstruct SWE without multiple iterations. We repeated the analysis using iterations with the full NWS model, but found the results of the two versions were not significantly different at our study sites. Thus, only the results from the simplified SNOW-17 were included here.
 The five parameters used in the simplified SNOW-17 are listed in Table 3. Whereas many SNOW-17 studies have used a single threshold temperature to distinguish rain and snow, we used two threshold temperature parameters (Tsnow and Train) because mixed rain-snow storms are not uncommon at many of our maritime sites. In conjunction with mean daily air temperature (Tt), these two parameters were used to estimate the snowfall fraction (ft) of daily precipitation at each snow pillow. All daily precipitation was assumed to be snow when Tt was less than or equal to the Tsnow parameter, all was assumed to be rain when Tt was greater than or equal to the Train parameter, and the precipitation was a linear rain-snow mixture between those two parameters [United States Army Corps of Engineers, 1956]. Snowfall was accumulated in the modeled snowpack while rainfall and melt water passed through the snowpack without being stored.
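The two-threshold partitioning described above can be written as a short function. This is a sketch only: the function name is hypothetical, and the threshold values used in the example (Tsnow = −1°C, Train = 3°C) are placeholders rather than calibrated values from the study.

```python
def snowfall_fraction(t_air, t_snow, t_rain):
    """Fraction of daily precipitation falling as snow.

    All snow at or below t_snow, all rain at or above t_rain, and a linear
    rain-snow mixture in between (temperatures in degrees C).
    """
    if t_air <= t_snow:
        return 1.0
    if t_air >= t_rain:
        return 0.0
    return (t_rain - t_air) / (t_rain - t_snow)


# Placeholder thresholds, for illustration only.
f_cold = snowfall_fraction(-2.0, -1.0, 3.0)   # all snow
f_mixed = snowfall_fraction(1.0, -1.0, 3.0)   # halfway between thresholds
f_warm = snowfall_fraction(5.0, -1.0, 3.0)    # all rain
```

Daily snowfall accumulation is then the product of this fraction and the (adjusted) daily precipitation.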
Table 3. Descriptions and Ranges of the Calibrated Snow Model Parameters^a
Tsnow: All precipitation is snow when Tair ≤ Tsnow
Train: All precipitation is rain when Tair ≥ Train
MBASE: Minimum temperature required for snowmelt
MFMIN: Minimum temperature index snowmelt factor, 0.36–3.6 mm °C−1 d−1
MFMAX: Maximum temperature index snowmelt factor, 2.8–6.8 mm °C−1 d−1
^a Parameters were optimized at the 18 snow pillow sites (Figure 2a) that were isolated for regional calibration.
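 The three remaining model parameters (MBASE, MFMIN, and MFMAX) were used to calculate snowmelt. Snowmelt (Mt) on day t was calculated as [Anderson, 1976]:
$$M_t = \begin{cases} MF_t \left( T_t - \mathrm{MBASE} \right), & T_t > \mathrm{MBASE} \\ 0, & T_t \le \mathrm{MBASE} \end{cases} \tag{3}$$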
where MFt is the daily-varying melt factor (mm °C−1 d−1), Tt is mean daily air temperature (°C), and MBASE is the minimum air temperature (°C) for snowmelt (no melt when Tt ≤ MBASE). MFt is varied daily with a sinusoidal curve to reflect seasonal changes in net solar radiation, such that MFt equals MFMIN on 21 December and MFMAX on 21 June [Anderson, 1976].
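The melt calculation described above can be sketched as follows. The cosine form, the 365-day period, and the function names are assumptions made for illustration; the operational SNOW-17 formulation may differ in detail. The sketch only reproduces the stated behavior that the melt factor equals MFMAX on 21 June and is approximately MFMIN on 21 December.

```python
import math
from datetime import date


def melt_factor(day, mfmin, mfmax):
    """Seasonally varying melt factor (mm per degree C per day).

    Assumed cosine curve peaking on 21 June of the same calendar year.
    """
    days_from_peak = (day - date(day.year, 6, 21)).days
    seasonal = 0.5 * (1.0 + math.cos(2.0 * math.pi * days_from_peak / 365.0))
    return mfmin + seasonal * (mfmax - mfmin)


def daily_melt(t_air, mf, mbase):
    """Temperature-index snowmelt (mm/d); no melt when t_air <= mbase."""
    return mf * (t_air - mbase) if t_air > mbase else 0.0


# Placeholder parameter values, for illustration only.
mf_june = melt_factor(date(2001, 6, 21), 1.0, 5.0)
mf_december = melt_factor(date(2001, 12, 21), 1.0, 5.0)
melt_warm_day = daily_melt(4.0, 3.0, 1.0)
```

Note that this gives potential melt; in a forward run the melt actually removed on a given day cannot exceed the SWE available.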
 All five model parameters required calibration. Eighteen snow pillows were designated as calibration sites and were selected on a regional basis (every 1° of latitude, with separate stations on the east and west slopes, Figure 2a). The 18 calibration sites were excluded from the rest of the study. Each site was independently calibrated using three water years of data. At each calibration site, an optimization algorithm was used to find the single values of Tsnow and Train that produced the lowest root-mean-square error (RMSE) in snowfall accumulation, and the single values of MBASE, MFMIN, and MFMAX that produced the lowest RMSE in snowmelt. The resulting calibrated parameters (Table 3) were comparable to values reported in other SNOW-17 studies [e.g., Shamir and Georgakakos, 2006; Franz et al., 2008; He et al., 2011a]. The study results exhibited sensitivity to the model calibration. This is demonstrated in section 6.4 and discussed further in section 7.
 Forward, reconstruction, and combined configurations of the simplified SNOW-17 model were used to estimate annual precipitation and SWE at each study site (i.e., point scale). All sites had snow pillows, and most had precipitation gages, which allowed an evaluation of each model configuration at each study site; local observations of SWE and precipitation were not used as input into the model. Figure 3 shows the assumptions made regarding data availability at the study site (hereafter referred to as site X) when estimating precipitation (PX) and snow water equivalent (SWEX). These assumptions are described below.
 1. To assess the models' applicability in locations without precipitation gages, we assumed each study site (site X) lacked a precipitation gage, so precipitation had to be estimated from observations (PY) at the closest gage, site Y. To estimate daily precipitation at site X, a precipitation multiplier (S) was used to uniformly increase or decrease PY, to account for accumulation differences between locations, due to effects such as orographic enhancement of precipitation [Roe, 2005].
 2. Observations of air temperature and snow disappearance timing were assumed available at site X. The annual snow disappearance date was provided by the snow pillow at site X and was assumed to be the first date with SWE = 0 after peak SWE. We assumed these two observations were available because point values may be observed easily with distributed temperature sensors in applications outside of this study [e.g., Lundquist and Huggett, 2008; Lundquist and Lott, 2008].
 3. We assumed that the five snow model parameters from the nearest calibration station (section 4) could be transferred to site X and were constant from year-to-year. This assumption is tested in section 6.5 to evaluate the errors associated with transferring model parameters from regional calibration stations to study sites.
 4. We assumed sublimation, wind transfer, and avalanches were negligible at site X and thus did not require simulation. Model simulations in western Idaho suggest that sublimation is a minor component of the mass balance, with an expected magnitude of 3% of peak SWE during wet years and 10% during dry years [Reba et al., 2011]. Snow pillows (i.e., site X) are typically located in flat clearings where mechanical redistributions are minimal or nonexistent [Farnes, 1967].
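The snow disappearance rule in assumption 2 (the first date with SWE = 0 after peak SWE) can be sketched as below. The function name is hypothetical, and taking the first occurrence of the maximum as the peak is an assumption made here for cases where the maximum value is reached more than once.

```python
def snow_disappearance_index(swe):
    """Index of the first day with SWE == 0 after peak SWE, or None if the
    snowpack never disappears within the record."""
    peak = max(range(len(swe)), key=lambda i: swe[i])  # first maximum
    for i in range(peak + 1, len(swe)):
        if swe[i] == 0:
            return i
    return None


# Synthetic daily SWE (mm) with a brief late-season snowfall after melt-out.
swe_obs = [0, 10, 30, 25, 30, 15, 0, 0, 5, 0]
d_index = snow_disappearance_index(swe_obs)
```

In this synthetic record the late snowfall after melt-out does not change the detected date, because the search stops at the first snow-free day after the peak.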
 While the daily snowfall fraction and potential snowmelt were the same for each model configuration, the key differences were the precipitation multiplier (S), the direction of the model simulation (Figure 1), and the starting point of the forward and reconstruction models. These differences impacted modeled precipitation and SWE, and are described further below, in section 5.1.
5.1. Forward Model
 In the forward model, the PRISM mean annual precipitation map (Figure 2b) was used to estimate the mean precipitation ratio between sites X and Y (Figure 3). For example, if PRISM showed mean annual precipitation of 1500 mm at the site X pixel and 1000 mm at the site Y pixel, then the multiplier used (SPRISM,XY) to map daily observations from site Y to X would be 1.5. This common methodology is used in distributed models [e.g., Smith et al., 2004; Shamir and Georgakakos, 2006] and mountain microclimate models [e.g., Running et al., 1987].
 Snowfall accumulation at site X on day t was estimated with the forward model as
$$A_{X,t} = S_{\mathrm{PRISM},XY} \, f_{X,t} \, P_{Y,t}$$
where PY,t was observed daily precipitation (mm) at the nearest offsite gage (site Y, Figure 3); fX,t was the snowfall fraction of precipitation at site X, based on the transferred Tsnow and Train parameters (see section 4); SPRISM,XY was the PRISM precipitation multiplier, PPRISM,X/PPRISM,Y. SPRISM,XY was constant between years, as it was the mean precipitation difference between sites.
 Each year's precipitation (Pann) at site X was estimated with the PRISM multiplier as
$$P_{\mathrm{ann},X} = S_{\mathrm{PRISM},XY} \sum_{t \,\in\, \text{water year}} P_{Y,t}$$
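The forward precipitation scaling can be illustrated with the PRISM values quoted in the example above (1500 mm at the site X pixel and 1000 mm at the site Y pixel). The function names and the three-day precipitation record are hypothetical and serve only to show the arithmetic.

```python
def prism_multiplier(prism_x, prism_y):
    """Ratio of PRISM mean annual precipitation at site X to that at site Y."""
    return prism_x / prism_y


def forward_accumulation(p_y, f_x, s_prism):
    """Daily snowfall at site X: multiplier * snowfall fraction * gage precipitation."""
    return [s_prism * f * p for p, f in zip(p_y, f_x)]


s_prism = prism_multiplier(1500.0, 1000.0)

# Hypothetical daily gage precipitation at site Y (mm) and snowfall fractions at site X.
p_y = [10.0, 20.0, 0.0]
f_x = [1.0, 0.5, 1.0]

accum_x = forward_accumulation(p_y, f_x, s_prism)
p_total_x = s_prism * sum(p_y)  # precipitation at site X over the same days
```

Because the multiplier is a ratio of long-term means, it is applied unchanged in every year, so year-to-year differences in the spatial precipitation pattern are not captured by this step.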
5.2. Reconstruction Model
 We modeled daily snowfall accumulation (At) with the reconstruction model based on the mass balance across the snow season. Every year at site X, total snowfall must equal total snowmelt (equation (3)) over the course of the snow season (neglecting other mass transfers):
$$S_{\mathrm{recon},XY} \sum_{t=c}^{d} f_{X,t} \, P_{Y,t} = \sum_{t=c}^{d} M_{X,t} \tag{7}$$
where Srecon,XY is an annually and spatially varying multiplier that relates mass outflow (i.e., snowmelt) to mass inflow (i.e., unadjusted snow accumulation) at site X, t = c denotes the first day of continuous snow cover modeled at site X, and d is the observed snow disappearance date.
 During each year, Srecon,XY was solved in equation (7) and then used to model snowfall accumulation at site X on day t with the reconstruction model:
$$A_{X,t} = S_{\mathrm{recon},XY} \, f_{X,t} \, P_{Y,t} \tag{8}$$
This addition to the reconstruction model permitted SWE modeling across the entire snow season and allowed estimation of annual precipitation with SWE reconstruction:
$$P_{\mathrm{ann},X} = S_{\mathrm{recon},XY} \sum_{t \,\in\, \text{water year}} P_{Y,t} \tag{9}$$
While SPRISM could have been used in the SWE reconstruction model (equation (10)) in place of Srecon, we found that SPRISM did not improve SWE reconstruction (results not shown). We include Srecon here to assess the accuracy of a basic approach of backing out precipitation from the snowpack mass balance (equations (7) and (9)).
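Solving the seasonal mass balance for the reconstruction multiplier can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the synthetic daily values are hypothetical, and the melt series is taken as already modeled between the first day of continuous snow cover (index c) and the snow disappearance date (index d).

```python
def recon_multiplier(p_y, f_x, melt_x, c, d):
    """Multiplier that makes total adjusted snowfall equal total modeled melt
    between day indices c and d (inclusive)."""
    unadjusted_snowfall = sum(f * p for p, f in zip(p_y[c:d + 1], f_x[c:d + 1]))
    if unadjusted_snowfall == 0:
        raise ValueError("no snowfall at the gage during the snow season")
    return sum(melt_x[c:d + 1]) / unadjusted_snowfall


# Synthetic daily values (mm), for illustration only.
p_y = [10.0, 20.0, 0.0, 0.0]      # gage precipitation at site Y
f_x = [1.0, 0.5, 1.0, 1.0]        # snowfall fraction at site X
melt_x = [0.0, 0.0, 15.0, 15.0]   # modeled snowmelt at site X

s_recon = recon_multiplier(p_y, f_x, melt_x, c=0, d=3)
accum_x = [s_recon * f * p for p, f in zip(p_y, f_x)]
p_total_x = s_recon * sum(p_y)
```

Any bias in modeled melt passes directly into the multiplier, and therefore into the precipitation estimate, which is one way to see why this approach depends heavily on the transferred melt parameters.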
5.3. Combined Model
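 Annual precipitation and SWE were estimated with the combined model as the averages of the forward and reconstruction estimates:
$$P_{\mathrm{ann,comb}} = \tfrac{1}{2} \left( P_{\mathrm{ann,forward}} + P_{\mathrm{ann,recon}} \right)$$
$$\mathrm{SWE}_{\mathrm{comb},t} = \tfrac{1}{2} \left( \mathrm{SWE}_{\mathrm{forward},t} + \mathrm{SWE}_{\mathrm{recon},t} \right)$$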
5.4. Sensitivity Analysis
 The study was designed to represent a best-case scenario for the model configurations. In practice, it is common that air temperature at site X (Figure 3) is not observed and must be estimated from data at other stations. Snow disappearance timing is most readily observed with remote sensing, which is also subject to various errors (see section 2). The study was further idealized because the PRISM map was trained by past data at 114 of the 136 study sites, so PRISM accuracy was likely maximized. Because biases in model inputs and parameters should impact the models differently (Table 1), a sensitivity analysis (section 6.4) was conducted to quantify the impact of biases in model input data (air temperature, precipitation, and snow disappearance timing) and model parameters (rain/snow delineation, melt threshold temperature, melt factors) on peak SWE. This was accomplished by introducing independent, artificial biases in each data input and model parameter, and observing the changes in peak SWE accuracy.
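The idea behind the sensitivity analysis can be illustrated with a toy experiment that biases air temperature and compares the resulting peak SWE from each configuration. Everything in this sketch is an assumption made for illustration: the parameter values, the constant melt factor, the 12-day synthetic record, and the use of a precipitation multiplier of 1 in the reconstruction. It is not the procedure or the data used in the study.

```python
T_SNOW, T_RAIN, MBASE, MF = -1.0, 3.0, 1.0, 3.75  # hypothetical parameters


def snow_fraction(t_air):
    if t_air <= T_SNOW:
        return 1.0
    if t_air >= T_RAIN:
        return 0.0
    return (T_RAIN - t_air) / (T_RAIN - T_SNOW)


def potential_melt(t_air):
    return MF * (t_air - MBASE) if t_air > MBASE else 0.0


def forward_peak(temps, precip):
    """Peak SWE from a forward run (accumulate snowfall, subtract melt)."""
    swe = peak = 0.0
    for t_air, p in zip(temps, precip):
        swe = max(swe + snow_fraction(t_air) * p - potential_melt(t_air), 0.0)
        peak = max(peak, swe)
    return peak


def recon_peak(temps, precip, d):
    """Peak SWE from back-summing melt minus snowfall from day index d."""
    swe = peak = 0.0
    for i in range(d, 0, -1):
        swe = max(swe + potential_melt(temps[i]) - snow_fraction(temps[i]) * precip[i], 0.0)
        peak = max(peak, swe)
    return peak


# Synthetic season: six cool, wet days followed by six warm, dry days.
temps = [0.0] * 6 + [5.0] * 6
precip = [20.0] * 6 + [0.0] * 6
d = 11  # "observed" snow disappearance index from the unbiased forward run
true_peak = forward_peak(temps, precip)


def peak_errors(bias):
    """Peak SWE errors (mm) for forward, reconstruction, and their average."""
    biased = [t + bias for t in temps]
    fwd = forward_peak(biased, precip)
    rec = recon_peak(biased, precip, d)
    return fwd - true_peak, rec - true_peak, 0.5 * (fwd + rec) - true_peak


warm = peak_errors(1.0)   # (-30.0, 22.5, -3.75)
cold = peak_errors(-1.0)  # (30.0, -22.5, 3.75)
```

In this toy case a warm bias lowers the forward peak (less snowfall) and raises the reconstructed peak (more melt summed backward), and the averaged estimate has the smallest error. The size of the cancellation here is a property of the synthetic inputs and should not be read as a general result.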
6.1. Annual Precipitation
 Estimates of annual precipitation were compared to the uncorrected, on-site precipitation observations. The most accurate estimates of annual precipitation were associated with the forward model (i.e., PRISM). While median errors (Table 4) in annual precipitation from the three models were similar, the reconstruction and combined models had a higher frequency of larger errors as seen in Figure 4a. The reconstruction and combined models had larger errors in precipitation because of errors associated with transferring the five model parameters (see section 6.5). Figure 4b presents the results as cumulative probability distributions and demonstrates that the forward model generally yielded smaller errors than the other two models. The reconstruction approach of estimating annual precipitation (equation (9)) was twice as likely as the forward model to produce an annual precipitation error exceeding 10%, and nearly 12 times as likely to produce an error exceeding 50% (Figure 4b). Not surprisingly, the combined model results fell between the extremes of the forward and reconstruction models.
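The "times as likely" comparisons read from the cumulative distributions can be computed as ratios of exceedance probabilities. The sketch below shows one way to do this; the function names and the error values are made up for illustration and are not data from the study.

```python
import numpy as np

def exceedance_probability(errors_pct, threshold_pct):
    """Fraction of cases whose absolute percent error exceeds the threshold."""
    abs_errors = np.abs(np.asarray(errors_pct, dtype=float))
    return float(np.mean(abs_errors > threshold_pct))

def relative_likelihood(errors_a, errors_b, threshold_pct):
    """How many times as likely model A is than model B to exceed the threshold."""
    p_b = exceedance_probability(errors_b, threshold_pct)
    if p_b == 0.0:
        return float("inf")  # model B never exceeds the threshold
    return exceedance_probability(errors_a, threshold_pct) / p_b

# Hypothetical percent errors in annual precipitation for eight station years.
recon_errors = [-30, 12, 55, -8, 20, -15, 60, 5]
forward_errors = [-12, 4, 8, -6, 11, -3, 52, 2]

ratio_10 = relative_likelihood(recon_errors, forward_errors, 10)
ratio_50 = relative_likelihood(recon_errors, forward_errors, 50)
print(ratio_10, ratio_50)
```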
Table 4. Summary Statistics of Errors in Seasonal SWE With the Forward, Reconstruction, and Combined Models^a
^a Statistics are errors, in percent, in annual precipitation (n = 294) and peak SWE (n = 334), and the mean absolute error (MAE, %) in seasonal SWE. SWE errors are relative to observed peak SWE. All error distributions were nonnormal, and thus the nonparametric statistics are of prime interest. Parametric statistics are shown for reference only. Seasonal MAE was taken during the observed accumulation and ablation seasons. In the nonparametric statistics, the interquartile range (IQR) is the difference between the 75th and 25th percentiles.
MAE Accumulation SWE
MAE ablation SWE
6.2. Peak SWE
 Median peak SWE errors (Table 4) from the three models were not significantly different and had negative biases (Figure 4c). The negative bias for the forward model may be indicative of the median measurement error due to undercatch, an error not corrected by PRISM. The combined model was more likely to produce smaller peak SWE errors than either the forward or reconstruction models (Figures 4c and 4d). The forward and reconstruction models consistently demonstrated similar probabilities of absolute errors (Figure 4d); this indicated that neither approach was statistically preferable for modeling SWE at the study sites. The forward and reconstruction models were each 1.2 times as likely as the combined model to produce a peak SWE error exceeding 10%, and 3.2 times as likely to produce an error exceeding 50% (Figure 4d).
6.3. SWE During the Accumulation and Ablation Seasons
 To understand the accuracy of SWE estimation during specific seasons, mean absolute errors (MAE) in modeled SWE were recorded during each accumulation and ablation season. Seasonal errors were assessed because peak SWE errors (Figure 4d) may not be useful for season-specific applications, such as the development of snow depletion curves for the ablation season. During the accumulation season, the forward and reconstruction models produced similar MAE (Figure 5 and Table 4), while the combined model produced lower MAE. During the ablation season, the forward model had significantly higher MAE, while the reconstruction and combined models had similar MAE (Figure 5). Forward model errors were greater during the ablation season because errors from the accumulation season were carried over to the ablation season. This caused major errors in estimated snow disappearance timing with the forward model; 65% of the forward simulations had at least a 7-day error in snow disappearance.
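The seasonal MAE calculation can be sketched as below. Splitting the seasons at the day of observed peak SWE and normalizing by observed peak SWE follow the description of Table 4; the function name, the handling of the peak day, and the SWE series are assumptions for illustration only.

```python
import numpy as np

def seasonal_mae_pct(obs_swe, mod_swe):
    """MAE of modeled SWE in each season, as a percent of observed peak SWE.

    Assumption: the accumulation season runs through the day of observed peak
    SWE and the ablation season covers the remaining days of the series.
    """
    obs = np.asarray(obs_swe, dtype=float)
    mod = np.asarray(mod_swe, dtype=float)
    peak_idx = int(np.argmax(obs))
    if peak_idx == obs.size - 1:
        raise ValueError("series has no ablation season after observed peak SWE")
    abs_err = np.abs(mod - obs)
    peak = obs[peak_idx]
    mae_acc = 100.0 * abs_err[: peak_idx + 1].mean() / peak
    mae_abl = 100.0 * abs_err[peak_idx + 1:].mean() / peak
    return mae_acc, mae_abl

# Hypothetical SWE series (mm): the model carries surplus snow into ablation.
observed = [0, 100, 200, 300, 150, 0]
modeled = [0, 90, 180, 270, 180, 60]

mae_accumulation, mae_ablation = seasonal_mae_pct(observed, modeled)
print(mae_accumulation, mae_ablation)
```

In this made-up series the modeled snowpack persists after the observed snow disappearance, so the ablation-season MAE is larger than the accumulation-season MAE.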
6.4. Sensitivity of Results to Model Inputs and Parameters
 The sensitivity analysis (Figure 6) confirmed the expectations of Table 1. As seen in Figure 6a, the forward (reconstruction) model SWE error was negatively (positively) correlated with air temperature bias. The combined model was significantly less sensitive to air temperature errors because averaging over- and underestimation errors from the forward and reconstruction models resulted in median SWE errors closer to zero (Figure 6a). This implied that some SWE errors in the original analysis (no artificial bias) may have been the result of errors in the observational air temperature data and/or the model calibration. As hypothesized, these errors tended to cancel out in the combined model.
 The forward model SWE errors (Figure 6b) were directly proportional to the introduced precipitation bias. Although our formulation of the SWE reconstruction method (equation (10)) included precipitation to allow modeling across the entire snow season, it was insensitive to precipitation bias. This was explained by the multiplier development. For example, a precipitation bias of −50% (i.e., 0.5 PY) would increase Srecon by a factor of 2 in equation (7), which would cancel the same −50% PY bias in equation (10). Therefore, reconstructed SWE was not impacted by precipitation bias.
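The cancellation can be checked numerically. The sketch below assumes that equation (7) amounts to the ratio of total modeled snowmelt to total unadjusted snowfall, as the description of Srecon suggests; the function name, the snow-fraction input, and all numbers are hypothetical.

```python
import numpy as np

def reconstruct_snowfall(precip, snow_fraction, total_melt):
    """Return (multiplier, adjusted daily snowfall) for one site and year.

    Assumed form: multiplier = total melt / total unadjusted snowfall, and
    adjusted snowfall = multiplier * precipitation * snow fraction.
    """
    unadjusted = np.asarray(precip, dtype=float) * np.asarray(snow_fraction, dtype=float)
    multiplier = total_melt / unadjusted.sum()
    return multiplier, multiplier * unadjusted

precip = np.array([10.0, 0.0, 25.0, 5.0])      # mm/day, made-up values
snow_fraction = np.array([1.0, 1.0, 0.8, 0.0])  # fraction falling as snow
total_melt = 45.0                               # mm of back-calculated snowmelt

s_unbiased, snow_unbiased = reconstruct_snowfall(precip, snow_fraction, total_melt)
s_biased, snow_biased = reconstruct_snowfall(0.5 * precip, snow_fraction, total_melt)

print(s_biased / s_unbiased)                    # multiplier doubles under a -50% bias
print(np.allclose(snow_unbiased, snow_biased))  # adjusted snowfall is unchanged
```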
 The forward model was independent of biases in snow disappearance timing by definition. When reconstructing peak SWE, there was an average additional error of 4.3% for every 1 day of snow disappearance date bias (Figure 6c). This error is similar in magnitude to the results of Slater et al. [A. G. Slater, A. P. Barrett, M. P. Clark, J. D. Lundquist, and M. S. Raleigh, Uncertainty in seasonal snow reconstruction: Relative impacts of model forcing and image availability, submitted to Adv. Water Resourc., 2011].
 When estimating peak SWE, the forward and reconstruction models had opposite sensitivities to the model parameters (Figures 6d, 6e, and 6f), but the magnitudes of their sensitivities varied. The forward model was more sensitive to the rain-snow threshold temperatures (Tsnow and Train) than the reconstruction model (Figure 6d) because the forward model peak SWE is dependent on snowfall accumulation (equation (1)). Likewise, the reconstruction model had greater sensitivity to bias in the snowmelt threshold temperature (MBASE) and the snowmelt factors (MFMIN and MFMAX), because reconstructed peak SWE was primarily a function of calculated snowmelt (equation (2)). The reconstruction model's sensitivity to snowmelt factors (Figure 6f) was comparable to the forward model's sensitivity to precipitation bias (Figure 6b).
6.5. Parameter Transfer Accuracy
 The assumption that model parameters could be transferred (section 5) was checked by running the calibration optimization routine (section 4) at each study snow pillow during all available water years, such that a unique set of five parameters was developed on-site for each station year. These on-site parameters were considered as the best-case calibration for each station year and eliminated any errors associated with transferring parameters from the regional calibration stations. Figure 7 displays the results of this analysis, and generally shows that reconstruction benefited the most from improved model parameters. For annual precipitation (Figures 7a and 7c), the reconstruction and combined models demonstrated increased accuracy with on-site parameters, while the forward model accuracy was unchanged because the forward model's estimates of annual precipitation were independent of the snow model parameters. Peak SWE accuracy dramatically increased for reconstruction when on-site parameters were used (Figures 7b and 7d). With the forward and combined models, SWE accuracy increased only slightly with on-site calibration (Figures 7b and 7d). Consequently, reconstruction became the most accurate peak SWE estimator when improved model parameters were available (Figure 7b).
7.1. Summary of Key Findings
 This study answered the two study questions: When transferring model parameters (Figure 3), the PRISM-forced forward model generally estimated annual precipitation with the greatest accuracy, while the combined model typically estimated SWE with the greatest accuracy (Figures 4 and 5). And as expected for peak SWE estimation, the combined model was least sensitive to air temperature errors (Figure 6a), and reconstruction was least sensitive to precipitation errors (Figure 6b). The forward model was most sensitive to the rain-snow parameters (Figure 6d), while the reconstruction model was most sensitive to the melt threshold temperature (Figure 6e) and the melt factors (Figure 6f); the combined model was less sensitive in those cases. In practice, uncertainty will exist in all model parameters and data, and the results here suggest that the combined model may yield the lowest overall sensitivity to uncertainty.
 When transferring model parameters, the likelihood of errors in peak SWE was nearly identical for the forward and reconstruction models (Figure 4d), and thus the accuracy of estimating snowfall during the accumulation season was comparable to the accuracy of estimating snowmelt during the ablation season. While this result did not support the implicit premise of reconstruction, that snowmelt can be estimated more accurately than snowfall, it implies that both approaches may be equally viable, given reasonable input data (Figure 3). With similar distributions of peak SWE errors, the combined model can improve accuracy at locations where one model (forward or reconstruction) overestimates peak SWE and the other underestimates peak SWE. This overestimation-underestimation situation, which is characteristic of errors in air temperature and model parameters (Figures 6a, 6d–6f), occurred in 58% of the station years (n = 334). The combined model improved SWE estimation (Figure 4c) in 62% of those cases. By construction, the combined model could never produce the largest error at any given station year because it was the average of the other two model estimates. This guarantees that the combined model will always estimate peak SWE at least as accurately as the less accurate of the two models, a useful feature when uncertainty exists in the forcing data and parameters of both forward and reconstruction models. However, the combined model will estimate SWE distributions more reliably than both the forward and reconstruction models only if those models have similar overall accuracy, which is the case here (Figures 4c and 4d).
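The averaging property can be illustrated with a short sketch. Because the combined estimate is the mean of the two estimates, its signed error is the mean of the two signed errors; the function names and error values below are hypothetical.

```python
def combined_error(err_forward, err_recon):
    """Signed error of the averaged estimate: the mean of the two signed errors."""
    return 0.5 * (err_forward + err_recon)

def compare(err_forward, err_recon):
    """Check the combined error against the better and worse of the two models."""
    err_comb = combined_error(err_forward, err_recon)
    worst = max(abs(err_forward), abs(err_recon))
    best = min(abs(err_forward), abs(err_recon))
    return {
        "combined": err_comb,
        "never_worst": abs(err_comb) <= worst,
        "beats_both": abs(err_comb) < best,
    }

# Hypothetical peak SWE errors (%) as (forward, reconstruction) pairs.
cases = {
    "opposite signs, similar size": (-20.0, 30.0),
    "opposite signs, unequal size": (-5.0, 45.0),
    "same sign": (-20.0, -40.0),
}
results = {name: compare(*errors) for name, errors in cases.items()}
for name, result in results.items():
    print(name, result)
```

The combined error never exceeds the larger of the two errors, but it beats both models only when the opposing errors are of comparable size. This is consistent with the combined model improving SWE estimation in some, but not all, of the overestimation-underestimation cases.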
 Because the forward and reconstruction models produced over- and underestimation errors at various station years, it was evident that errors in observed air temperature and/or transferred model parameters must have corrupted the models' estimates of peak SWE. This was demonstrated with the forward model, which had high accuracy in annual precipitation in the original analysis (Figure 4a), and lower accuracy when estimating peak SWE (Figure 4c). The forward model underestimation of peak SWE may be partially attributed to gage undercatch, as the −8.9% median error was in the range of the −4.8% to −9.5% mean undercatch errors reported in the study area [Serreze et al., 1999]. Errors in air temperature and model parameters limited the accuracy of reconstruction. The accuracy of the reconstruction model dramatically increased with on-site calibration (Figure 7d), which corrected errors in model parameters and compensated for bias (if any) in observed temperature.
 Given improved (on-site) calibration parameters, the initial premise of reconstruction was supported, as the ablation season was simulated more accurately than the accumulation season (Figure 7b). Because the forward model gained little improvement in peak SWE accuracy with on-site calibration, we found that the rain-snow partitioning parameters (Train and Tsnow) were difficult to calibrate with accuracy, but placed a major control on model performance. Optimal parameters for snowmelt (MBASE, MFMIN, MFMAX) improved reconstructions of peak SWE (Figure 7d), but in practice, the problem of deriving these optimal parameters remains [He et al., 2011a]. Until research demonstrates improved parameter transferability and estimation, errors in the model parameters are likely to be comparable to or larger than those in the original analysis, where the combined model produced more accurate peak SWE estimates (Figures 4c and 4d).
 Parameter transferability presents a significant obstacle to temperature-index models, and energy balance models are often advocated as an alternative. However, energy balance models may have large uncertainties in data inputs and also have multiple parameters which must be estimated or transferred. A complete energy balance model might have 25 or more terms of potential uncertainty [see Table 7 by Marks and Dozier, 1992], whereas the simplified SNOW-17 reconstruction model had a total of seven terms of uncertainty (air temperature, snow disappearance timing, and the five model parameters). Even if a simple energy balance model is employed, uncertainty in the radiative terms alone (on the order of 10–40 W m−2) may exceed the data uncertainty for a model like SNOW-17 [A. G. Slater, A. P. Barrett, M. P. Clark, J. D. Lundquist, and M. S. Raleigh, Uncertainty in seasonal snow reconstruction: Relative impacts of model forcing and image availability, submitted to Adv. Water Resourc., 2011]. Most energy balance models simulate the required inputs (e.g., radiation, humidity) through empirical relationships [e.g., Waichler and Wigmosta, 2003] that are defined by parameters. These parameterized empirical equations must be transferred as well, and therefore parameter transfer is an inescapable issue for all types of snow models.
7.2. Guidelines for Model Selection
 The purpose of this study was to evaluate the models' accuracy and sensitivity, in order to understand which should be employed in practice. The guidelines derived from the study's results are summarized below and depend on data availability and the user's objectives.
 1. If a reliable precipitation gage network exists near the study basin, then PRISM (or a comparable data interpolation method) should be used to estimate precipitation (Figures 4a and 4b). However, the combined model should be employed to estimate peak SWE (Figures 4c and 4d), especially in the context of data and parameter uncertainty (Figure 6). Both the combined and reconstruction models are preferred over the forward model when developing snow depletion curves that relate SCA and SWE in the basin during the snowmelt season (Figure 5).
 2. If a basin with a precipitation gage network has high uncertainty in remotely sensed snow disappearance timing (e.g., cloudy conditions during the snowmelt season or infrequent sampling), then the forward model should be considered to estimate peak SWE (Figure 6c).
 3. If the precipitation gage network is sparse or nonexistent, then SWE reconstruction should be employed because it is likely to produce errors similar to those of a forward model driven by PRISM (Figures 4c and 4d). A crude estimate of spatially distributed precipitation (Figures 4a and 4b) may be backed out using the reconstruction method with equations (7) and (9) if at least one precipitation gage exists in the area.
 4. In all cases, the accuracy of SWE estimation will vary with the spatial scale of interest. For example, when estimating peak SWE across all station years (n = 334), the median bias was relatively small for all three models (Table 4). This implies that when aggregating SWE estimates over a large spatial scale (e.g., a basin), any of the three models might have skill in estimating mean areal SWE, but at any one specific point location (e.g., an ecological study site), there is a high probability of producing an error that exceeds the median bias. The median bias was less than 10% for all three models, but 65% of the combined model simulations and 77% of the forward and reconstruction simulations exceeded 10% error (Figure 4d).
7.3. Representativeness of the Results
 The accuracy and sensitivity of model configurations were evaluated here at flat clearings located at midelevations in maritime mountain ranges, so the results are most applicable to sites with similar characteristics and climate. Such sites can be found worldwide (e.g., New Zealand, Japan, eastern Russia, northwest Europe, Chile, and southwestern British Columbia). Modeling SWE in other areas may demand inclusion of other (scale-dependent) processes, discussed further below, which did not require representation at the study sites.
 Forested and sloped regions present additional uncertainties in all three models. The forward model may need to represent canopy interception and sublimation dynamics, and this could introduce additional uncertainty in SWE estimation. Likewise, the forest canopy may introduce uncertainty in the date of snow disappearance, which increases reconstruction uncertainty. The forest canopy also places a major control on snowmelt dynamics by reducing incident shortwave radiation and reducing turbulent energy transfer, and the representation of these processes in a model requires the estimation of many additional model parameters [e.g., Storck, 2000]. Terrain aspect also controls snowmelt energy, although to a lesser extent than forest canopy [Coughlan and Running, 1997]. Additional work is needed to develop techniques for estimating model parameters at sloped and forested sites [Rutter et al., 2009].
 Because of the study's location (maritime mountains of the western U.S.), 10% of the SWE simulations were at sites located below 1000 m, and nearly 50% were at sites below 1600 m. Thus, the results presented here are most representative of sites in or just above the snow transition zone, where winter air temperatures are mild, and mixed rain/snowstorms and midwinter snowmelt events are common [Marks et al., 1998]. In this zone, the forward model's sensitivities to air temperature and rain-snow threshold parameters are high, which provides more incentive to use the combined model. At higher elevations, in alpine regions, and in colder continental climates, nearly all winter precipitation falls as snow. In these locations the forward model may be less sensitive to air temperature, and estimates of snowfall will be impacted more by precipitation extrapolation errors. However, wind transfer and sublimation have larger impacts on the mass balance in these regions [Marks and Dozier, 1992] and may require representation.
 Because the analysis focused on snow pillow sites, this study showed results under an ideal scenario for data inputs (Figure 4) and model parameters (Figure 7). In reality, the data inputs, precipitation multipliers, and model parameters will have increased uncertainty when estimating SWE at ungaged locations. With heightened uncertainty in all model forcings (Figure 6), the results suggest that the combined model should reduce the magnitude of bias in peak SWE because of compensating errors (Table 1).
 When estimating precipitation and SWE at study locations (assuming only air temperature and snow disappearance timing are known), the selection of a model configuration (i.e., forward, reconstruction, or combined) depends on the density and quality of the precipitation gage network, the uncertainty of the model inputs and parameters, and the user's objectives. Precipitation-based studies should be guided by PRISM or similar off-line precipitation modeling (no snow model necessary). Ablation-specific studies (e.g., the snowmelt depletion curves of Homan et al. and Lee et al.) should utilize either reconstruction or a combined model (Figure 5).
 A snow model yields different estimates of peak SWE depending on the model configuration (forward or reconstruction), partly because each configuration is uniquely sensitive to errors in model data and parameters (Figure 6). The quality of the forward and reconstruction estimates cannot be known when estimating peak SWE at an ungaged point location (e.g., an ecological plot) because the magnitude and sign of the errors in the data and parameters are unknowable. However, the simple averaging of the forward and reconstruction estimates of peak SWE in the combined model may yield the most reliable estimate at point locations because the contrasting sensitivities of the models tend to minimize overall sensitivity to errors in data and parameters. On the basis of our evaluation at snow pillow locations in the maritime region of the mountainous western U.S., forward and reconstruction configurations have comparable accuracy, and so we recommend the combined configuration in this region.
 When estimating SWE over larger domains (e.g., zonal areas of a basin) for subsequent streamflow analysis, the three configurations may yield similar SWE estimates when averaged across the basin, but may produce contrasting SWE estimates in each zone. Similar mean SWE estimates will not change the seasonal flow volume, but the distribution of SWE across the zones will impact streamflow timing. The combined model acts to reduce the likelihood of large SWE errors (Figure 4c) in these zones, and therefore may improve estimation of streamflow timing.
 Improving snow model parameterization and transfer remains a challenging research endeavor, but is nevertheless important because model parameterization places a fundamental control on model accuracy, as demonstrated in this study. Accurate snow model parameters may dramatically improve estimates of annual precipitation and SWE with reconstruction (Figure 7).
 Because the study was restricted to flat clearings in the maritime zone, additional investigation is needed to compare the model configurations in areas with varying slope, aspect, and forest cover and in different snow regimes and climates. If snow models are to represent fine-scale spatial variations in SWE in these areas, they must accurately model the associated accumulation and ablation processes. Snow and meteorological data are not routinely collected along slopes and under forest canopies, but are nevertheless required for testing model performance in these environments.
 A variety of studies have applied snow water equivalent (SWE) reconstruction to derive distributed SWE estimates. Characteristics and validation efforts of these studies are listed in Table A1.
Table A1. SWE and Snow Depth Reconstruction Studies to Date
Study locations: (A) Alaska, (H) Himalayas, (I) Idaho, (RM) Rocky Mountains, (SA) Swiss Alps, (SN) Sierra Nevada.
Initial SWE refers to the earliest date of reconstructed SWE, and generally corresponds to the timing of peak SWE.
Denotes whether snowfall (if any) during the ablation season (i.e., after peak SWE) was included in SWE reconstruction.
Denotes whether SWE estimated with reconstruction was compared directly to ground-based observations. Values of n are in station-years and are all at the point scale, unless otherwise noted.
Regression trees [Elder et al., 1998] have been used in many studies to check SWE reconstruction. Regression trees quantify the variability of distributed observations based on terrain characteristics and estimate gridded values of SWE, which can be compared to gridded estimates from SWE reconstruction. The reported statistics in this column show the differences between regression trees and reconstruction for distributed SWE estimates.
 We would like to thank Courtney Moore, Nic Wayand, Laura Hinkelman, Brian Henn, Shara Feld, Kael Martin, Jenna Forsyth, Jeff Deems, and Alan Hamlet for their review and comments. We also thank Water Resources Research Editors John Selker and Michi Lehning and several anonymous reviewers for helping improve the manuscript. This publication was partially funded by the Joint Institute for the Study of the Atmosphere and Ocean (JISAO) under NOAA Cooperative Agreements NA17RJ1232 and NA10OAR4320148, contribution 1882. This work was also supported by NASA Headquarters under the NASA Earth and Space Science Fellowship Program grant NNX09AO22H.