Simulation of long‐term soil water dynamics at Reynolds Creek, Idaho: implications for rangeland productivity

Plant productivity, forage availability and soil carbon dynamics are all strongly controlled by soil water in semi‐arid rangelands. Sagebrush steppe ecosystems are among the most extensive in the western USA. We used the soil ecohydrology model (SEM) to simulate soil water dynamics and estimate plant growth at three different sagebrush steppe sites. SEM is a capacitance parameter model that uses a water balance approach to simulate daily changes in soil water and a modification of the de Wit equation to estimate plant yield. Model‐simulated soil water results were evaluated using long‐term (27–34 years) measured data. We found that SEM accurately simulated soil water dynamics and total soil water storage at all three sites, with R2 values greater than 0.8, Nash–Sutcliffe efficiencies near 0.8 and mean absolute errors of about 1.5 cm. Model‐estimated yield indices indicated a high degree of interannual variability, with values ranging more than fivefold and no long‐term trend in plant production due to water availability. We also found that seasonal yield could be estimated with reasonable accuracy at the outset of the growing season (March 1) for about half of the years simulated due to either relatively high or low pre‐growing season precipitation. Published 2015. This article is a U.S. Government work and is in the public domain in the USA. Ecohydrology published by John Wiley & Sons, Ltd.


INTRODUCTION
Interest in soil water has grown in recent years with the increased recognition that it is an important modulator of the global energy balance (Knapp et al., 2008). On a more local level, soil water partially controls overland flow and erosion, groundwater recharge, soil development and critical biological processes related to plant productivity and carbon cycling. These biological processes, which are ecologically fundamental (Whittaker, 1975), are typically strongly controlled by the availability of soil water in arid and semiarid regions (Le Houerou et al., 1988;Sala et al., 1988). A better understanding of the relationship between available water and plant productivity is critical for optimal management of arid and semi-arid rangelands (Hobbs et al., 1994).
Because precipitation is the only source of water for most upland soils, a close correlation between precipitation and plant production is expected. In fact, a general relationship between plant productivity and precipitation has been established both globally (Leith, 1973) and regionally (Sneva and Hyder, 1962;Hanson et al., 1983). These relationships have two major limitations, however. First, not all precipitation is equally effective for plant growth. Summer thunderstorms may produce overland flow or be largely evaporated relative to the same amount of gentle spring rain. On the other hand, major rainfall events may overwhelm the soil storage capacity, especially in shallow soils, causing a loss of water through the root zone (Fernandez-A et al., 1991;Seyfried and Wilcox, 2006). In addition, where there is snow, the timing of melt and spatial distribution of snow may greatly affect the amount of water available to vegetation. A second limitation is that evaporative demand, which also affects productivity, may vary considerably within a given precipitation regime. This may be expressed in variations in yield within a precipitation regime related to exposure to solar radiation on differing slopes and aspects.
A more mechanistic approach, based on the fact that water is lost from plants as carbon is assimilated, is to relate transpiration to productivity. Rosenzweig (1968) demonstrated a strong correlation between the actual evapotranspiration (AET) and net primary productivity. De Wit (1958) expanded on the pioneering work of Briggs and Shantz (1914) and developed the following equation that defines the relationship between plant growth and transpiration in arid and semi-arid climates: where Y is total dry matter produced expressed as mass per unit area (kg ha À1 ), T is total transpiration during the growing season (mm), T max is potential transpiration (mm) for the same period and m is a crop factor dependent on variety and species. The basic assumption behind this equation is that, in water limited environments, vegetative yield is directly proportional to the amount of transpiration and inversely proportional to the evaporative demand. The difficulty of measuring T directly has lead to a number of variations of Equation (1) that are based on the same principles (Kirkham, 2005). Considerable empirical research supports the validity of Equation (1) (Hanks, 1983;Hobbs et al., 1994;Raes et al., 2006), and some form of Equation (1) is used to estimate the effects of water stress on plant productivity in most agronomic models (Saseendran et al., 2008). In arid and semi-arid regions, the primary reason that T is substantially less than T max is that T is limited by the amount of plant available soil water. Thus, knowledge of the dynamics of soil water storage (S) is critical for estimating plant production. These dynamics are generally described by the soil water balance equation, which, for a specified time interval, can be written as follows: where ΔS is the change in soil water storage; P is the amount of water deposited on the soil surface (including litter) as rain or snow; E amount of water that returns to the atmosphere directly (not transpired) as evaporation or sublimation; T is transpiration; F o is the net overland flow; and D d is the deep drainagewater that passes through the root zone and is assumed to eventually become groundwater recharge and/or streamflow. Units for all terms are cm 3 of water per cm 2 of soil surface. The soil ecohydrology model (SEM) uses a combination of the de Wit equation (Equation (1)) and the one dimensional water balance equation (Equation (2)) to calculate soil water dynamics and plant production. It is based on the Ekalaka Range Hydrology and Yield Model (ERHYM-II) developed by Wight and Hanks (1981). ERYHM-II was used extensively in a variety of climates to model evapotranspiration (ET), simulate soil water dynamics and estimate yield of rangeland vegetation (Cooley and Robertson, 1984;Wight and Hanson, 1990;Wight and Hanson, 1991;Conner, 1994;Weltz and Blackburn, 1995).
It is an example of a capacity parameter-based model (Addiscott, 1995;Portoghese et al., 2007) that uses relatively low variability parameters and tends to be best suited for relatively large-scale modelling. It was modified by Seyfried (2003) to include exponential drainage with time and other modifications. These modifications have been incorporated in the SEM model and have been recently applied to a similar sagebrush rangeland environment where the water balance aspect of the model proved effective .
This approach to the estimation of plant yield ignores potentially controlling variables such as nutrient availability, phenology effects, herbivory, disease, and so on. More sophisticated models have been developed to account for some of these processes, particularly those related to above and below ground nutrient partitioning and phenology (e.g. Hanson et al., 1988;Kiniry et al., 1992;Flanagan and Nearing, 1995;Boote et al., 1998;Nouvellon et al., 2000). In the context of rangeland productivity, a limitation of these models is that detailed, plant species or variety specific information is required to accurately include those processes (Raes et al., 2006). This kind of information is rarely available for rangeland vegetation, which has received much less research than agricultural crops. This problem is exacerbated by the fact that most native rangelands are composed of an assortment of different plant species, which vary both between and within seasons. In addition, the detailed soil and climatic information more complex models usually require are frequently absent in extensive semi-arid rangelands.
Regardless of the modelling approach or quality of data required, the best way to assess model utility for field applications is to evaluate model simulations relative to field data. Actual T measurements are difficult, so measurements of soil water are often used instead to evaluate models (Raes et al., 2006;Boote et al., 2008). Although this approach does not guarantee accurate simulation of plant productivity, soil moisture simulation accuracy is a requirement for accurate plant productivity simulation in water limited environments . A robust test of a model should include a wide range of conditions. This can be accomplished by model testing in a variety of environments or over a period of many years (i.e. a range of wet, dry, cool and warm years).
The sagebrush steppe is the largest semi-arid vegetation type in North America (West, 1983) and is characterized by an overstory dominated by various species of Artemisia. Long-term (27-34 years) soil water data were collected from three different sagebrush rangeland sites at the Reynolds Creek Experimental Watershed (RCEW) in southwest Idaho, USA (Seyfried et al., 2001, Seyfried et al., 2011. The sites are located along an elevational gradient of 500 m within the RCEW, providing a range of environments. With this unique data set, SEM soil water simulations can be evaluated under a variety of soil and climatic conditions over relatively long time periods. These simulations may have important implications for rangeland production in the sagebrush steppe. In particular, they may provide important insights into the interannual variability and long-term trends of plant production and the withinyear patterns of soil water use. The interannual variability of plant production has important implications for grazing management as it provides insight into rangeland carrying capacity and sensitivity to weather variations. It also has important implications for soil carbon flux, because the vast majority of plant production carbon eventually enters the soil (Bardgett and Wardle, 2010). In both cases, the long-term data are useful because there are very few studies of either plant yield or carbon flux that extend beyond a very few years and therefore fail to characterize the full sensitivity of those processes to weather variations. The determination of long-term trends also has obvious implications regarding the impact of climate change on rangeland productivity. Previous research has established that average annual temperature has been increasing during the previous 45 years at all elevations within the RCEW and, while total annual precipitation has shown no temporal trend, the proportion of rain (as opposed to snow) has increased (Nayak et al., 2010). Over a somewhat shorter time period (32 years), no temporal trend in soil water has been detected (Seyfried et al., 2011).
The within-year trends, specifically the trends of soil water storage during the year, have important implications for grazing management. The climate in most of the sagebrush region is characterized by an asynchronous relationship between the supply of water via precipitation and evaporative demand (Schlaepfer et al., 2011). The majority of effective precipitation is usually received in fall, winter and spring (Sneva and Hyder, 1962), when plants are dormant or just entering the active stages of plant growth. Evaporative demand is greatest in late spring and summer. For this reason, it may be possible to estimate the eventual growing season plant production from the soil water storage accumulated during the fall and winter months prior to the growing season. This information could improve management decisions, especially those related to stocking rates.
If the assumptions behind SEM are appropriate and soil water simulations are accurate, we can reasonably apply the model calculated yield index to the long-term data. Our objectives in this study were to (1) evaluate the accuracy of SEM soil water simulations using long-term soil water data collected in varied sagebrush steppe environments; (2) determine the interannual variability and long-term trends of model-estimated yield index; and (3) explore the potential use of model-calculated pre-growing season S to predict end-of-season productivity.

Site description
Soil water content (θ, cm 3 cm À3 ) was measured at three sites on the RCEW in Owyhee County of southwestern Idaho (Figure 1). The three sites, located at different elevations, represent differing climatic conditions, soils and vegetative communities. Soil classification and parent material information were obtained from the Natural Resources Conservation Service website (http:// websoilsurvey.sc.egov.usda.gov) as part of the Owyhee County soil survey conducted in 2003. At an elevation of 1190 m, the Flats site (FL) has a mean annual precipitation of 240 mm and a mean annual temperature of 9°C. The soil at FL is mapped as a Hardtrigger-Enko complex (2-15% slopes) and classified as a fine-loamy, mixed, superactive, mesic Xeric Haplargid with a parent material of volcanic ash and loamy alluvium. The second site, Nancy Gulch (NG), is at an elevation of 1400 m, has a mean annual precipitation of 300 mm and a mean annual temperature of 9°C. Soils at NG are mapped as the Arbidge-Owsel-Gariper complex (1-15% slopes) and classified as a fine-silty, mixed, superactive, mesic Durinodic Xeric Haplargid with a parent material of volcanic ash, mixed alluvium or loess. Vegetation at NG is characterized by Wyoming big sagebrush, bluebunch wheatgrass (Pseuodroegneria spicata Pursh), bottlebrush squirreltail and Sandberg bluegrass as the dominant species. The third site, Lower Sheep Creek (LSC), is at 1627 m and has a mean annual precipitation of 350 mm and a mean annual temperature of 8°C. The soil is mapped as a Vitale-Itca-Rubble land complex (2-60% slopes) and is classified as a loamy-skeletal, mixed, superactive, frigid Typic Argixeroll with a parent material of tephra, alluvium or colluvium over bedrock. Dominant species at LSC are low sagebrush (Artemisia arbuscula Nutt.), lupine (Lupinus L. sp.), milkvetch (Astragalus L. sp.) and Sandberg bluegrass. Slopes are very gentle at all sites so that topographic effects, which can be critical in semi-arid rangelands (Gutierrez-Jurado et al., 2006), were minimal. The temporal distribution of precipitation and temperature is illustrated with long-term data from the NG site ( Figure 2). All three sites follow the same seasonal trends. Precipitation during summer months is low to very low and approximately the same at all sites. The summer time period has the greatest evaporative demand, as exemplified by the temperature (solar radiation follows a similar trend). Vegetative growth during that period is dependent on the accumulation of soil water during the previous months, especially November, December, January, February and March (April is transitional), when evaporative demand is low and incoming precipitation is retained in the soil. Winter precipitation at FL and NG is mostly in the form of rain, with occasional snows melting soon after the event, making snow water inputs effectively a delayed rain. At LSC, snow accumulation can be significant in some years, resulting in a spring snowmelt pulse of input water as well as the potential for snow redistribution by wind.

Data collection
Soil water content was measured using neutron moisture metres over a 27-32-year period, depending on the site. The method was first developed in the 1950s (Gardner and Kirkham, 1952) and has been a well-accepted method for measuring soil water content for many years (Chanasyk andNaeth, 1996, Hignett andEvett, 2002). Seyfried et al. (2001) describe the details of instrumentation and calibration used in this study.
The advantages of the neutron probe for measuring θ are that, being non-destructive, it allows for repeated monitoring of the same site and that it has a relatively large measurement volume (sphere of about 15 cm radius). This latter feature makes the neutron probe especially good for estimating the total profile soil water storage S, as opposed to θ at a specific depth. Aside from the use of a radioactive source, the primary disadvantage of the instrument is that it requires a site visit for each measurement. With the exception of a 1 year hiatus (1996)(1997), data were collected biweekly, averaging about 23 readings per year, year round. The implications of biweekly as opposed to hourly data commonly collected with more recently developed electronic instruments are described in detail by Seyfried et al. (2011).
Measurements were made at depths of 15, 30, 61 and 91 cm and, at LSC, 122 cm. We present data collected at all depths, but our focus is on the total profile S, which is calculated as follows: where S is the total soil water stored (cm of water) between the soil surface and the bottom of the soil profile, which is designated with the subscript, and θ is the neutron probemeasured soil water content centred at the subscripted depth (cm). In order to facilitate comparisons across sites, years and different instruments, the data are presented as relative values such that the long-term minimum values at each depth are subtracted as described in Seyfried et al. (2011). Relative values are represented by a subscript 'r' so that S r (cm) and θ r (cm 3 cm À3 ) refer to the relative soil water storage and relative soil water content, respectively. The recurring minimum water content is widely observed in the region and the result of extended, dry summers (Obrist et al., 2004;McNamara et al., 2005).

Model description
The SEM model uses weather, soil and vegetation data to calculate a daily water balance as described in Equation (2), which is then used to calculate a soil water dependent yield index. In addition, initial conditions are required that are easily estimated in this very dry summer climate as θ returns to a minimum value each year and the native vegetation returns to a predictable, low leaf area index (LAI). We describe the model briefly in the succeeding paragraphs. See Seyfried et al. (2009) and Seyfried (2003) for more complete descriptions.
The weather data required are the daily minimum and maximum temperature, precipitation and solar radiation. The data used were collected at long-term meteorological stations located close to the measured soil profiles using methods described by Hanson (2001) and Hanson et al. (2001). From these data, daily water input (rainfall and snowmelt) is determined and the evaporative demand, quantified as the potential evapotranspiration (PET) is calculated using the Priestley-Taylor equation (Priestley and Taylor, 1972). The primary vegetation data required are the maximum LAI, the maximum rooting depth and the relative growth curve, which is the temporal progression of leaf area under very high available water conditions (Seyfried, 2003). The LAI data were estimated from previously reported data collected at the sites (Seyfried, 2003). The maximum rooting depth is generally assumed to be the depth to bedrock in these soils.
Soil parameters control the redistribution of water within the soil profile and the ultimate amount of D d . The required parameters are the θ values at saturation, field capacity and the plant extraction limit (analogous to the permanent wilting point for agronomic crops) for each layer simulated and were determined for each site from field measurements of θ as described by Seyfried et al. (2009). These values take into account the coarse fragment content, which is assumed to contribute nothing to the soil water storage dynamics. In addition, the soil evaporation limit for the surface layer (assumed to be 0.02 cm 3 cm À3 ) and the soil depth are required as input.
Given these inputs, PET is partitioned between potential transpiration (T p ) and potential surface evaporation (E p ), which are dependent on the growth curve-derived LAI for that date. Actual (simulated) T is calculated for each soil layer as a function of θ, relative to field capacity and root density. Calculation of actual evaporation, E s , is based on the time from input (e.g. rainfall) and θ. In this environment, F o is assumed to be zero based on numerous years of runoff data collection (Pierson et al., 2001). Thus, the 'losses' of water to the system are T, E s and D d . The plant yield index, which is the ratio between the simulated, soil water dependent yield (Y) and the potential yield given a very wet year (Y p ), is calculated as which is a modification of Equation (1) (Wight and Hanks, 1981) in which Y estimation is scaled to the estimated 'potential' yield for the site.

Model evaluation
We used the long-term soil water data to evaluate the accuracy of the SEM model simulations. Three quantitative metrics were used, linear regression of the measured and simulated values, the Nash-Sutcliffe efficiency index (NSE) and the mean absolute error (MAE). With linear regression, a perfect agreement between measured and simulated values yields an R 2 value of 1.0 with a slope of 1.0 and a Y-intercept of 0.0. Values of R 2 between 1 and 0 indicate the degree of agreement between modelled and simulated values described by a linear model. Slopes and intercepts differing from 1.0 to 0.0 indicate bias in the relationship between measured and modelled values. The NSE index (Nash and Sutcliffe, 1970) is a widely used statistical metric for evaluating model performance. It is calculated for S as follows: Where S m is the measured soil water storage, S s is the simulated storage and S avg is the average measured S. An NSE value of 1.0 indicates perfect agreement between simulated and measured values, while an NSE value of 0 indicates that the average S value over the study period matches the measured data as well as the model. NSE values less than 0 indicate that the overall average value matches the measured data more closely than the model.
The MAE uses the absolute value of differences between measured and simulated values as a metric (Moriasi et al., 2007). The closer to 0, the better the agreement between the two sets of data. The MAE has the advantage over many metrics in that it is expressed in the same units as the data and is linearly related to the level of model agreement with the evaluation data.
Although these metrics provide important information concerning the quality of the agreement between the simulated and measured values, none are definitive (Krause et al., 2005). This is particularly true regarding the dynamics of S, which are important to capture with a model but can be obscured in the averaging process associated with the evaluation metrics. It is also important to understand the timing or circumstances associated with discrepancies between the measured and simulated values in order to properly evaluate the results. We therefore supplement the quantitative metrics with a qualitative description of SEM performance.

Soil water modelling
Data description. For many applications, the primary output from the SEM model is the S or S r , which can be related to plant production and deep percolation. It is important, however, to understand the linkage between the θ r outputs used to calculate S r for at least three reasons. First, the partitioning of soil water between the surface and the remainder of the profile strongly affects the amount of E s , as opposed to T. Second, it is important to understand within-profile changes because they affect losses to D d . And third, accurate representation of the within-profile soil water dynamics supports the basic assumptions behind the model (e.g. one dimensional flow with negligible lateral loss). We present 7 years of S r data, both simulated (S r,s ) and measured (S r,m ) from LSC in Figure 3A along with corresponding depth profiles of relative soil water content, θ r , again both simulated (θ r,s ) and measured (θ r,m ), data from LSC in Figure 3B.
The very strong seasonal wetting and drying is evident at all depths with θ r,s values approaching zero each year in the summer and early fall as plant available water is lost to ET. At 15 cm, θ r,s values rise to similar levels each year, as the surface is fully wetted even in dry years. This layer is 'spiky' relative to the others as small individual events register near the surface and tend to be dampened at greater depths. The wetting up of greater depths invariably follows a vertical progression from the surface downward. In the driest years (e.g. 1988), the simulated wetting front did not penetrate below the 30 cm layer while, in very wet years, it penetrated to a depth of 120 cm (e.g. 1993).
The measured data, being biweekly, do not capture the rapid, short term increases and decreases portrayed by simulation due to specific events; however, previous analysis has shown that biweekly data accurately represent seasonal soil water dynamics in this environment (Seyfried et al., 2011). In general, θ r,m values follow θ r,s closely, both in terms of maximum values and in the timing of wet up and dry down. Some substantial deviations are apparent as well. For example, in 1988, the maximum θ r,s at the 30cm layer was 0.04 cm 3 cm À3 , while θ r,m exceeded 0.13 cm 3 cm À3 . In that year, no simulated water penetrated to the 61cm depth while there was a distinct θ r,m rise at that depth. On the other hand, θ r,s values at 30 cm exceeded θ r,m by about 0.06 cm 3 cm À3 in 1992. Note that very little change in either measured or simulated values was registered at 120-cm depth, indicating that, at the wettest site in the study, little or no water passed vertically through the profile. Thus, virtually, all input water is subject to the process of ET. The total water storage determined from the data represented in Figure 3A, or the dynamic water storage, is plotted for both measured and simulated values in Figure 3B. The curves tend to be smoother than for individual layers because storage is large relative to any given event. The level of agreement between simulated (S r,s ) and measured (S r,m ) soil water storage at LSC is generally better than that between θ r,s and θ r,m . Even so, the discrepancies pointed to earlier are still apparent. Thus, S r,s values in 1988 were substantially below S r,m while S r,s values exceeded S r,m in 1992. Within the series of years presented, the amount of available water at the outset of the growing season ranges about threefold, from about 6 to 18 cm, suggesting large differences in plant growth for those years. Also important but not so apparent is the impact of spring rains. In 1993, spring rains extended the growing season considerably so that S r,m in 1989 decreased below 5 cm on June 6, while that did not occur until July 8 in 1993.
Soil water dynamics. A somewhat unique challenge of this project is to capture variations in soil water over many years with a corresponding wide range of conditions. We compare θ r,s and θ r,m for the 27-year period of record at different depths at the NG site. NG is intermediate among the sites in most respects and therefore used as the example. As the statistical analysis will demonstrate, the level of agreement between measured and simulated values was similar at all three sites.
The basic seasonal and interannual trends described in Figure 3A are apparent in Figure 4A-D but with a greater variety of conditions and notably less deep percolation at the drier, NG site. The 15 cm θ r,s values exhibit the same 'spikiness' seen at LSC and a consistency of maximum values, indicating a range in θ r,s of about 0.27 cm 3 cm À3 ( Figure 4A). There is general agreement with θ r,m in terms of maximum values and dynamics except during the years 1998-2004, during which θ r,s was much greater than θ r,m . At 30 cm, the maximum θ r,s values were a little lower than at 15 cm and the θ r,m data are generally much smoother, as small events were dampened ( Figure 4B). The level of agreement between θ r,m and θ r,s was generally quite good except in 1988, when θ r,m greatly exceeded θ r,s . During the very dry years of 1991 and 1992, θ r,m and θ r,s remained near 0 cm 3 cm À3 all year as the wetting front failed to penetrate to the 30-cm depth. Otherwise, most years had a peak value of about 0.2 cm 3 cm À3 .
The peak values at 61 cm are highly variable but generally much lower than for the 15 and 30 cm data, with several years remaining near 0 cm 3 cm À3 ( Figure 4C). It is clear that the wetting front often does not pass through 61 cm. There was general agreement between θ r,m and θ r,s in terms of peaks and dynamics, although in 1988 θ r,s was much lower than θ r,m , as it was at 30 cm. At 91 cm, very low values indicate that the wetting front never really got to 91 cm ( Figure 4D). (Recall that measurement volume is roughly a 15-cm radius). Small discrepancies between measured and modelled values are apparent, but both are almost always close to 0.0 cm 3 cm À3 .
The profile dynamics at all three sites are similar, with strong seasonal effects of wetting up during the winter and spring and drying out in the summer, evident each year ( Figure 5A-C). From visual inspection, the level of agreement between measured and simulated S r was similar for each site. In addition, the same years tend to be relatively wet or dry at each site because they are subject to the same synoptic weather patterns. Thus, the early 1990s tend to have low values and the late 1980s tend to have high values at all three sites. Differences between the sites, consistent with the precipitation and temperature gradient, are also evident. For example, S r was relatively high in 1986 at all three sites, but the maximum at FL was just over 14 cm, at NG about 18 cm and at LSC about 20 cm. In the dry years, LSC was rarely less than 10 cm, while at FL several years are below 6 cm and NG was intermediate between those two sites. (Note that S r for LSC was  calculated to a depth of 106 cm as it was for Fl and NG, for purposes of comparison). The overall averages of S r for each site during the period that NG was monitored were as follows: 5.78 cm at LSC, 4.58 cm at NG and 2.99 cm at FL.
Statistics. The regression statistics (Table I) indicate a similar level of agreement between modelled and measured values for the three sites. In all cases, S r (0-106 cm) R 2 values are slightly greater than 0.8, indicating that the SEM model effectively explains more than 80% of the measured variance. The slope for the Flats is close to 1, indicating almost no bias, whereas the slope of 1.11 for NG indicates a slight bias towards overestimation and the slope of 0.86 at LSC indicates a slight negative bias. Offsets are practically 0.0 cm at all three sites. When applied to θ r , R 2 values are slightly lower than those for S r at the 15-and 30-cm depths and then tend to deteriorate at greater depths and are very low at the greatest depth measured for each site. Roughly the same can be said for the slopes at different depths, which are reasonably close to 1.0 for the 15-and 30-cm data and then deviate substantially from that value as the depth increases. Offsets are very near 0.0 cm 3 cm À3 at all depths except the 15 cm.
Nash-Sutcliffe efficiency values for S r are well above 0 for all three sites, indicating that the model was more effective than simply using the overall average. With respect to θ r , NSE values were almost as high as those for S r for the 15-and 30-cm depths and then decline rapidly to the point that model simulations are either about the same or worse than the overall average. The S r MAE values range from 0.9 to 1.6 cm with only small biases. The θ r MAE are also similar for the three sites, with FL generally a little lower than NG or LSC. At all three sites, there is a strong MAE decline with depth, indicating much smaller deviations, on average, between the measured and simulated values deeper in the soil profile.

Yield index
Long-term YI. As might be expected, given the high degree of interannual variability of precipitation in semiarid regions, the yield index (YI) varies dramatically from year to year (Figure 6). At LSC, for example, there is more than a fivefold range in YI, from about 0.15 to greater than 0.8, implying a similar range in site productivity. The three sites follow the same sequence of high and low YI years as they respond to the same synoptic weather patterns. LSC, Table I. Goodness of fit metrics for the three sites in terms of profile storage (from 0 to 106 cm) and soil water content at each depth.
NSE refers to the Nash-Sutcliffe efficiency index and mean absolute error (MAE) is the average absolute value of the difference between the measured and simulated value.  which is cooler and has greater winter precipitation than the other sites, consistently has the highest YI, followed by NG and FL, which are similar. There is no significant (α = 0.05) long-term temporal YI trend over the measurement period at any of the sites. Regression of YI versus time over the period of record at each site yield R 2 values less than 0.01 or less with slopes less than 0.006, which is not detectable over a 30-year time span (<0.035 YI units in 30 years). This result is in agreement with the lack of temporal trend at RCEW in annual precipitation (Nayak et al., 2010) and soil water storage (Seyfried et al., 2011). This lack of temporal trend means that all the data at a given site can be regarded as part of the same population (e.g. stationary) for the purposes of investigating within-year trends.
Within year YI. We used linear regression to assess the potential for using S r determined prior to the growing season to predict the subsequent seasonal YI. A high correlation between S r on a given date and the seasonal YI supports the use of pre-growing season S r to estimate YI for that season. The results were similar for all three sites (Table II). There was some degree of correlation with all dates and, in general, the degree of correlation was poor in winter (January) and improved as the season progressed. These results indicate that there is a definite impact of winter soil water storage on the seasonal YI but also that relatively late, spring precipitation is critical.
The R 2 data alone does not provide much practical guidance in terms of what management action might be undertaken. A plot of S r versus YI with the data segregated into three groups, the lower 25% of years, the upper 25% of years and the middle 50% of years (Figure 7), is informative. The data from all sites for March 1 indicate that relatively dry winters, with S r values in the lower 25% of all years, had YI values that were at highest, the average YI, and were usually much lower. On the other hand, years with S r values in the upper 25% on March 1 had, at worst, an average YI and were mostly higher than average. For years with intermediate S r values, YI values were both well below and well above the average, and S r had little utility as a predictor.

Soil water modelling
Storage. For the purposes of this study, S r is the most critical parameter because it is directly related to the YI. The statistical metrics of R 2 and NSE both indicate a reasonably good agreement between the simulated and measured S r values at all three sites (Table I). The reasonably high R 2 , slopes near 1.0 and very low offsets support this. Although some soil water modelling studies have produced higher R 2 values, values similar to those reported in Table II are common (Raes et al., 2006;Kelleners et al., 2009). Direct comparisons with other studies are not straightforward partly because the length of record we analysed is roughly an order of magnitude greater than that for most studies. This leads to a relatively  wide range of environmental conditions the model must address, as well as issues related to data quality over time.
As an example, Seyfried et al. (2009) obtained substantially higher R 2 values using neutron probe data and the SEM model at a different site in the RCEW but with only 2 years of data to analyse. The NSE values all indicate levels of agreement between S r,m and S r,s that are much better than simply using the overall average. According to Moriasi et al. (2007), the model fit for LSC and FL is 'very good' and that for NG is 'good'.
The MAE values of 1-2 cm seem high considering that the dynamic range of S is 15-20 cm at the sites. Consider, however, that a difference of 0.01 cm 3 cm À3 translates to an error of 1 cm over a depth of 1 m. In other words, very small discrepancies, possibly due to measurement error, if consistent over the entire profile, could result in MAE values similar to those obtained. In general, the largest discrepancies occur during periods of relatively rapid decrease in S r during the spring drying when the entire profile tends to be either under or over represented in the model. At those times, small errors in the timing of plant activity (e.g. 'green up') or transpiration rate can result in relatively large, but short lived, discrepancies between measured and simulated values.
Water content with depth. Depth simulations of θ r are critical because they determine the amount of water lost to deep drainage and how surface evaporation and transpiration are partitioned, both of which ultimately affect YI. In general, the fact that θ r statistics tend to indicate worse agreement between measured and simulated values than for S r (Table I) is to be expected, because θ r,s requires accurate representation of infiltration and redistribution with depth as well as an accurate partitioning of ET among each depth.
In this context, we also note that the R 2 and NSE statistics are reasonably high near the soil surface and then deteriorate at greater depths while the MAE statistics show nearly the opposite trend. The large MAE values occur during periods of rapid transition. These contradictory trends can be explained by the annual soil water dynamics and illustrate the importance of evaluating the meaning of each metric. Most of the discrepancies between θ r,m and θ r,s that contribute to the MAE occur during the dry-down period when small errors in the simulation of transpiration timing can lead to large discrepancies and therefore large MAE. As the soil depth increases, these periods of rapid change become less frequent with smaller changes (e.g. Figures 3B and 4D), which both tend to reduce MAE. At the greatest depths (and drier soils), there is very little variation in θ r,m and MAE is very low.
These dynamics have the opposite impact on the R 2 and NSE statistics. The R 2 may be thought of as the ratio of the model explained variance to the total variance subtracted from one. Other things being equal, a decrease in total variance results in an increase in the ratio and therefore a decrease in R 2 . The extremely low R 2 values deep in the soil profile (Table I) occur at depths that have very little variation of θ r over time (e.g. Figures 3B and 4D) and there is a small MAE. That is, even though the actual discrepancies are small, the overall variability is even smaller so that the ratio term increases and the overall R 2 decreases. The situation is similar for the NSE. The denominator in Equation (5) reflects the level of variability relative to the measured average measured value. For depths that exhibit low variability, the average is a very good representative of the overall θ r and NSE is low. At NG, for example, the greatest θ r,m at 91 cm was only about 0.05 cm 3 cm À3 and almost all data were less than 0.02 cm 3 cm À3 so that the average actually describes almost all the data fairly well.
Error sources. Despite the relatively high statistical metrics for S r , there are many obvious differences between the model and measured values. These may be attributed to inaccurate input or evaluation data, inappropriate model structure or poor parameterization. In this study, we are fortunate to have excellent input data in that the climate data, which were collected at the site, have been carefully collected and checked. There are some issues with the evaluation data, however. This is not surprising given that several different instruments and operators were used over a 35-year period to collect data year round every 2 weeks under a wide range of conditions. One specific problem is that the neutron probe cable stop was placed at the wrong location for the 15-cm readings between 2000 and 2004, resulting in erroneously low data during that time ( Figure 4A). There are probably other data problems we have not identified. In general, however, the consistency of the data over time and the lack of a strong bias at any of the sites indicate that the NP data quality is good.
There are a number of ways the model structure may lead to inaccurate S r estimation that have been discussed at length elsewhere . The most basic structural assumption is that there is no net lateral flow or vertical bypassing in the soil profile if the water content is less than field capacity. While this has not been rigorously tested, there is considerable data from within the RCEW indicating that lateral overland flow is rare (Pierson et al., 2001). Regarding bypassing, all of the profile soil water data collected in this study and others (e.g. Grant et al., 2004;Seyfried et al., 2009;Seyfried et al., 2011) are consistent with a lack of bypassing in that they invariably follow a downward vertical progression when water contents increase.
It is expected that, because the progression of soil wetting is vertical, parameterization errors will tend to accumulate with depth. For example, a systematic underestimation of field capacity will tend to produce an overestimation of wetting depth and θ r,s as depth increases.
This may explain some of the small θ r,s spikes at NG ( Figure 4C and D) and some of the deterioration of model fit with depth described earlier.
In general, it is difficult to ascribe various discrepancies to a single source because they are interrelated. For example, there are many years when ET was either over or underestimated, resulting in most of the observed discrepancies. These could be due to poor parameterization (e.g. LAI) or a model issue such that the plant response to water stress was not properly represented, or some combination of the two. One exception to this is exemplified by the simulations at LSC in 1979-1981 ( Figure 5C). In those years, peak S r,m was much greater than peak S r,s . The S r,m values were moderately high in those years even though the precipitation was relatively low. We suspect that the actual water inputs to the site were enhanced by snow redistribution (blowing) during those relatively cold winters. Because the SEM model only uses water derived from precipitation, the modelled water inputs are low relative to the actual inputs under those conditions. Addressing this problem requires either a direct snow measurement campaign, insertion of a snow blowing routine in the SEM model or some kind of coupling to a different model. Overall, while we can attribute some of the observed discrepancies to specific sources like the cable stop or snow redistribution, we do not know the source of most of the discrepancies but it is safe to say that they may be attributed, to some extent, to variability of the soil water system.

Yield index
Despite the discrepancies described earlier, it seems clear that the model captures both the interannual variability and the within-year dynamics of S r with reasonable accuracy and little bias. Because both are strongly controlled by the vegetation (Walvoord et al., 2002;Seyfried et al., 2005), these results suggest that some inferences concerning the interannual variability of plant production are justified. We should be clear that YI is only an index. To the extent that Equation (1) is appropriate for these sites and that we have adequately described S r dynamics with the SEM model, we can expect that the YI accurately reflects trends in actual vegetation yield. Note that we assume no carryover of production from one growing season to the next, mostly due to a lack of quantitative data to the contrary, but this is also consistent with our observations. Factors related to soil fertility, disease or CO 2 'fertilization' are also not accounted for. Ultimately, extensive field monitoring is required to link the YI to actual production. To our knowledge, this has not been carried out in the sagebrushsteppe. Our interpretation of the YI data is that they are an indicator of relative production at these sites in terms of interannual variability and within-year patterns.
Interannual variability. With respect to the interannual variability, two results seem clear. First, there is roughly a fivefold range in YI, indicating a similar range in production. This has important management implications because current management generally imposes a single rate of grazing on any particular allotment. If that rate is chosen to minimize the negative impacts of heavy grazing, in most years, large amounts of forage produced on the land will not be available for domestic livestock. On the other hand, selecting stocking rates that more effectively utilize forage production in average and above-average years creates the potential for severe overuse in below-average years. Clearly, a more flexible approach, as discussed by Holecheck et al. (2001), has merit.
The second result is that, despite the documented increase in temperature at the RCEW during the study period (Nayak et al., 2010), there was no temporal trend in YI. This is consistent with research indicating no temporal trend in S r over a similar time frame (Seyfried et al., 2011) and research indicating contradictory effects of warming and CO 2 increase (Polley et al., 2013). It should be noted that Nayak et al. (2010) also found no temporal trend in precipitation, which is the primary determinant of plant water use. Given the relatively small (1-2°C) temperature increase over the time period and the extreme variability in YI (Figure 6), it will be difficult to detect a temperature effect 'signal'.
Within-year variability. The correlations documented from the within-year analyses reflect the importance of fallwinter precipitation for the subsequent YI in this environment. While the correlations are significant as early as January, the considerable scatter in the S r -YI relationship indicates that spring conditions substantially influence the eventual YI. As the year progresses, R 2 values improve and YI can be predicted with more accuracy (Table II). Eventually, R 2 values decline as stored soil water is 'used' and therefore no longer effects YI. From a management perspective, the value of predictions diminishes as time progresses because there is insufficient time to alter management (e.g. stocking rate) of the site. On March 1, the growing season has progressed to the point that useful predictions of YI can be made for about half of the years. That is, for dry winters with a low (lower 25%) March 1 S r ,, the eventual YI is, at best (given a wet spring), average and usually much lower than average. This suggests that lower stocking rates would be justified. For a wet winter, with a high (upper 25%) S r , a better than average YI is assured, even with a dry spring. This indicates that greater stocking rates are justified. In the remaining 50% of years, spring rains determine the eventual YI, and S r offers no special guidance in terms of stocking.
Extrapolation. We should note that both the interannual and within-year results need to be interpreted in the context of the soil water storage capacity (S c ) and the amount of input water supplied as rainfall and snowmelt. This was discussed by Seyfried and Wilcox (2006) in the context of recharge water generation in the same climate, and it is similarly critical for water availability for plant production. They defined the soil water storage capacity as where P s is the porosity (dimensionless), R d is the effective plant rooting depth and A w is the water holding capacity of the fine-earth soil fraction. Any or all of these factors may limit S c at different sites. High coarse fragment content limits P s , bedrock often limits R d and highly sandy soils limit A w . The soils in this study have sufficient S c to absorb and store virtually all incoming water before it is returned to the atmosphere so that essentially all incoming cold-season precipitation can be stored for future use. Where S c is small relative to the magnitude of water inputs, both the interannual and within-year YI dynamics will be affected. Interannual variability would be reduced because water lost to deep percolation in relatively wet years would not be available for plant consumption, thereby lowering many of the higher YI years. The ability to predict YI from S r would be reduced because large accumulations in the winter months would not be possible resulting in an elevated importance of spring precipitation. This is consistent with the finding of Smith et al. (2011) who, working in relatively low S c soils, found that spring precipitation was the primary determinant of seasonal plant production in site near the RCEW.

SUMMARY
The accuracy of simulated soil water output from the SEM model was evaluated by comparing 27-34 years of measured soil water data collected biweekly at three sagebrush dominated rangeland sites in the RCEW. The combination of dataset longevity and variety of conditions makes this a unique and powerful test of the model. From visual inspection, it is clear that, with some exceptions, the basic dynamics of S r and θ r were captured by the model at all three sites year after year. The quantitative metrics support this. Regression analysis (R 2 = 0.81-0.82) and the NSE index (NSE = 0.68-0.87) both indicate a high degree of agreement between simulated and measured S r . The MAE of 1 to 2 cm of water with low bias also indicates a good fit of the modelled and simulated data. At the specific depth increments, R 2 and NSE statistics indicate strong agreement near the soil surface that deteriorate with increasing soil depth. This was partly due to the loss of soil water variability over time at greater depths as was apparent by visual inspection and by the very low MAE observed at those depths. Taken as a whole, SEM simulations of soil water dynamics exhibited reasonably good agreement with measured data over a very wide range of conditions justifying the extension of the model results to the yield index.
The long-term YI data exhibited large interannual fluctuations, about fivefold, implying a similar range in plant production. We detected no long-term trends in YI for any of the sites over the analysis period in spite of the documented 1-2°C temperature increase at the RCEW during that time. Looking at within-year variability, we found that winter precipitation was significantly correlated with the seasonal YI. Furthermore, over the time period analysed, the March 1 S r could be related to the seasonal YI on about half of the years. That is, when the March 1 S r was in the lower 25%, the eventual YI was always average or less, while if it was in the upper 25%, it was always average or higher. In the remaining 50% of years, there was no clear correlation. This could improve the livestock stocking rates by providing managers with critical forage availability information prior to the growing season.