The study reported here employed the physics-based Integrated Hydrology Model (InHM) to conduct continuous hydrologic-response simulation from 1990 through 1996 for the Coos Bay experimental catchment. The uniqueness of the boundary-value problem used to simulate three sprinkling experiments was assessed, via model performance evaluation against observed piezometric and discharge data, for 33 events extracted from the continuous record. The InHM simulations could not adequately reproduce the distributed observed pore water pressures, suggesting that detailed characterization of the locations and connectivities of bedrock fractures is critical for future efforts designed to simulate distributed hydrologic response at the field scale for locations where bedrock fracture flow is important. The simulations presented here suggest the potential for interaction between the deeper water table and near-surface hydrologic response. The results reported herein suggest that while uniqueness can be reasonably achieved with respect to the integrated response (i.e., discharge), the integrated response uniqueness is no guarantee that the distributed response (i.e., pressure head) is either unique or well simulated.
 Hydrologic-response models based on numerical solutions to the coupled partial differential equations governing surface and subsurface flow are commonly described as “physically-based” or “physics-based” [see Freeze and Harlan, 1969; Loague and VanderKwaak, 2004]. One of the purported attractions of physics-based models is that (in theory) the governing equations, boundary conditions, and parameter values calibrated to a brief hydrologic record should then apply to most hydrologic conditions, even those conditions beyond the successfully tested range [see Abbott et al., 1986; Bathurst and O'Connell, 1992]. The question of whether hydrologic response can be simulated as well for a validation period outside of a calibration period addresses the issue of uniqueness. The definition of uniqueness [Neuman, 1973; Carrera and Neuman, 1986] requires that only one set of parameter values can be estimated from a given set of observations and that this parameter set must also represent the observed behavior for other hydrologic conditions. When employing a distributed model, the simulated hydrologic response should be compared to distributed observations of state variables in the watershed [see Dunne, 1983; Beven, 1989; Grayson et al., 1992; O'Connell and Todini, 1996]. A focus on the distributed response is of particular importance for physics-based simulation because of the many degrees of freedom (e.g., parameters, boundary conditions), which can give rise to equifinality when only the integrated response (i.e., discharge or solute concentrations in discharge) is used for evaluation [e.g., Beven, 1989, 2006; Ebel and Loague, 2006]. An additional question is whether uniqueness with regard to simulation of the integrated response indicates if uniqueness for the distributed response will also be achieved (or if the distributed response will be simulated correctly for any storm, much less all storms).
 Despite the importance of employing both integrated (e.g., discharge) and distributed (e.g., piezometric, soil-water content, or surface water depth) observations when evaluating the simulated hydrologic response during a validation period, there are relatively few studies addressing this issue. Refsgaard  reported an application of a physically based hydrologic model where discharge and piezometric response were shown to be nonunique for a validation period following a calibration period. Feyen et al.  found that discharge at the catchment outlet was well simulated during a validation period after calibration, but internal discharges and water table levels were not well simulated during the validation period. Anderton et al.  reported physically based simulation results in which discharge, soil-water content, and water table levels were all simulated worse during the validation period following a calibration period. Bathurst et al.  found that discharge, internal water table, and pressure head values were correctly simulated during a blind validation test. Heppner et al. , employing a characterization developed from event-based simulations for long-term continuous simulation with a physics-based model, found that water balance components, peak discharge, and sediment discharge were each well simulated while distributed soil-water contents were poorly simulated in their continuous hydrologic modeling.
 Clearly no consensus has been reached as to how robust physics-based hydrologic-response models are (or can be) with respect to uniqueness, especially with regard to simulation of distributed hydrologic response. Yet the question of uniqueness is of fundamental importance if physics-based hydrologic simulations are to provide a reliable foundation for investigations of physical processes where the distributed hydrologic response is important, for example, hydrogeomorphology [e.g., Loague et al., 2006; Mirus et al., 2007]. This study examines the uniqueness of physics-based simulation of the integrated and distributed hydrologic response for a well characterized field site.
1.2. Objectives and Study Design
 The principal goal of the study presented here is to employ continuous hydrologic-response simulation for a nearly 7-year period to assess the uniqueness of the hydrologic parameterization and boundary conditions (BCs) used by Ebel et al. [2007b] to simulate three sprinkling experiments at the Coos Bay 1 (CB1) catchment. Hydrologic-response simulation of high-intensity storms at the CB1 site is of particular interest because the field site experienced slope failure during a storm with a high rainfall intensity (i.e., 40 mm h−1) and large rainfall depth (i.e., total accumulation of 221 mm) in November 1996 (D. R. Montgomery et al., Instrumental record of debris flow initiation during natural rainfall, manuscript in preparation, 2008). An additional objective of this study is to further illustrate the potential benefits and limitations of using 3-D, transient, variably saturated hydrologic-response models to drive simulation-based hydrogeomorphology process investigations [see Loague et al., 2006].
 The study reported here builds upon the three separate week-long sprinkling experiments at the CB1 catchment conducted in 1990 and 1992 [Anderson et al., 1997a, 1997b; Anderson and Dietrich, 2001; Montgomery et al., 1997, 2002; Montgomery and Dietrich, 2002; Torres et al., 1998]. Important conclusions from the CB1 field study include the critical role of unsaturated flow in governing the hydrologic response at CB1 [Torres et al., 1998] and the contributions from fracture flow through the weathered bedrock for pore water pressure development and runoff generation [Anderson et al., 1997b; Montgomery and Dietrich, 2002; Montgomery et al., 1997, 2002]. Data from the three sprinkler experiments and site characterization from the long-term monitoring effort at CB1 were further analyzed by Ebel et al. [2007a] to define the boundary-value problem (BVP) for event-based hydrologic-response simulations using the physics-based Integrated Hydrology Model (InHM) [see Ebel et al., 2007b]. The CB1 simulations reported by Ebel et al. [2007b] focus on runoff, pressure heads, soil-water contents, and solute transport during the sprinkler experiments and led to the conclusions that (1) characterization of layered permeability contrasts was important for hydrologic-response simulation, (2) neglecting fracture flow through the weathered bedrock precluded simulating the locations of some pore water pressure “hotspots,” and (3) in situ measurements of soil-water retention provided superior parameterization, with regard to simulation of hydrologic response, than soil-texture based estimates for the CB1 soil.
 It should be noted that there are several important differences between the study reported here and the sprinkling experiment simulations from Ebel et al. [2007b]. For example, the week-long sprinkling experiments were designed to bring the system to a hydrologic steady state, while the variable duration storms considered in the continuous simulations reported here are transient in nature (i.e., shorter event durations with large temporal variability in rainfall intensities). The sprinkler experiments were conducted at relatively low intensities (i.e., 1.5–3.0 mm h−1) while the intensities considered in this effort range up to 40 mm h−1 (only 10 min in duration prior to the 1996 slope failure). Unsaturated zone observations (e.g., pressure heads, soil-water contents, deuterium concentrations) were collected during the sprinkling experiments while only piezometric and runoff data exist for the long-term effort reported here. The event-based sprinkling experiment initial conditions (ICs) were more closely controlled to match the observed ICs including the deeper water table location, while the ICs for storms during the long-term effort are the result of months to years of previously simulated continuous hydrologic response.
 The sprinkling experiment BVP [see Ebel et al., 2007a, 2007b], referred to hereinafter as the “Base Case,” is used to simulate the 7 years of hydrologic response at CB1 from 1990 through the 1996 failure. The Base Case BVP is evaluated against runoff and piezometric data to assess model performance and uniqueness. Alternate BVPs, based on alternative viable parameterizations, are assessed in the context of a sensitivity analysis to improve the uniqueness of the CB1 BVP. It is worth pointing out that it is not the intention of this study to meticulously calibrate the InHM BVP to match the observed data, “declare victory,” and move on. Instead, problems with the CB1 BVP uniqueness and mismatches between the observed and simulated hydrologic response are employed to suggest hydrologic-response observations and measurements that would improve the uniqueness of the BVP and could be of particular importance for simulating the hydrologic conditions preceding slope failure.
Figure 1 shows the locations of the long-term (i.e., 1990–1996) hydrologic-response observations collected at CB1. The long-term data set consisted of discharge from two weirs and automated piezometers installed into soil, saprolite, and weathered bedrock [see Montgomery et al., 1997, 2002; Montgomery and Dietrich, 2002]. The upper weir was installed in November 1989, and monitored the soil runoff contribution; the lower weir was installed in October 1991, and monitored the weathered and unweathered bedrock runoff contribution that flowed underneath the upper weir. Soil and saprolite piezometric response was monitored from 22 automated piezometers installed in nine nests prior to the first sprinkling experiment in 1990. Weathered bedrock piezometric response was monitored using 12 piezometers installed after the second sprinkling experiment in 1990. Five of the soil/saprolite piezometers (5–3, 5–3A, 7–6, 9–5, and 9–5D; see Figure 1) recorded pressure head to a precision of 0.1 m from 1 December 1993 through the landslide and therefore are only useful for evaluating saturation during that period. Four of the weathered bedrock piezometers (B4, B4A, B5, and B5A; see Figure 1) were outside the area simulated and two of the weathered bedrock piezometers (B1A and B9A; see Figure 1) had negative values because of pressure transducer problems and are consequently not used for simulated pressure head evaluation. The deep well at the ridge crest shown in Figure 1 was completed to a depth of 35 m (see Anderson et al.  for further information). Figure 2a shows the monthly rainfall at North Bend airport, and Figure 2b illustrates the continuity of the discharge and piezometer records during the simulated period. The temporal resolution of both the discharge and piezometric records varied between 600 s (in the winter rainy season) and 1200 s (in the summer dry season). 
The weir and piezometer observations were recorded by data loggers until the 1996 landslide, which destroyed most of the monitoring equipment.
 Subsurface flow in 3-D variably saturated porous media is estimated in InHM using

$$-\nabla \cdot f^{a}\,\vec{q} \pm q^{b} \pm q^{e} = f^{v}\,\frac{\partial (\phi S_{w})}{\partial t} \tag{1}$$

where $\vec{q}$ is the Darcy flux [LT−1], $q^{b}$ is a specified rate source/sink [T−1], $q^{e}$ is the rate of water exchange between the subsurface and surface continua [T−1], $\phi$ is porosity [L3 L−3], $S_{w}$ is water saturation [L3 L−3], $t$ is time [T], $f^{a}$ is the area fraction associated with each continuum [-], and $f^{v}$ is the volume fraction associated with each continuum [-]. The Darcy flux is estimated as

$$\vec{q} = -k_{rw}\,\frac{\rho_{w}\, g}{\mu_{w}}\,\vec{k}\,\nabla(\psi + z) \tag{2}$$

where $k_{rw}$ is the relative permeability [-], $\rho_{w}$ is the density of water [ML−3], $g$ is the gravitational acceleration [LT−2], $\mu_{w}$ is the dynamic viscosity of water [ML−1T−1], $\vec{k}$ is the intrinsic permeability vector [L2], $z$ is the elevation head [L], and $\psi$ is the pressure head [L].
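To make the role of each term in the Darcy flux relation concrete, the following minimal Python sketch evaluates a one-dimensional (vertical) flux. It is an illustration only, not InHM code: the function name, the default fluid properties, and the example permeability are assumptions introduced here, and the standard Darcy-Buckingham form is assumed.

```python
def darcy_flux_vertical(k_rw, k_intrinsic, dpsi_dz,
                        rho_w=1000.0, g=9.81, mu_w=1.0e-3):
    """Vertical Darcy flux [m/s], positive upward.

    Assumed form: q = -k_rw * (rho_w * g / mu_w) * k * d(psi + z)/dz.
    k_rw        relative permeability [-]
    k_intrinsic intrinsic permeability [m^2]
    dpsi_dz     vertical pressure-head gradient [-]
    """
    hydraulic_conductivity = k_rw * rho_w * g / mu_w * k_intrinsic  # [m/s]
    total_head_gradient = dpsi_dz + 1.0  # d(psi)/dz + d(z)/dz
    return -hydraulic_conductivity * total_head_gradient


# Hypothetical permeability, chosen only for illustration.
K_EXAMPLE = 1.0e-12  # [m^2]

# Hydrostatic profile: pressure head falls 1 m per metre of height, no flow.
q_hydrostatic = darcy_flux_vertical(1.0, K_EXAMPLE, dpsi_dz=-1.0)

# Unit-gradient drainage: uniform pressure head, flux equals -K (downward).
q_unit_gradient = darcy_flux_vertical(1.0, K_EXAMPLE, dpsi_dz=0.0)
```

The two calls bracket familiar limiting cases: a hydrostatic profile yields zero flux, and a uniform pressure-head profile drains downward at a rate equal to the hydraulic conductivity.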
4.2. Surface Flow
 Surface flow is estimated using the diffusion-wave approximation of the depth-integrated shallow water equations. The 2-D surface flow is conceptualized as a second continuum linked with the subsurface via a thin soil layer of thickness $a_{s}$ [L], with fluxes between the continua determined by dynamic pressure-head gradients [VanderKwaak, 1999]. The equation for conservation of water on the land surface is

$$-\nabla \cdot \left(\psi_{s}^{mobile}\,\vec{q}_{s}\right) \pm a_{s}\, q^{b} \pm a_{s}\, q^{e} = \frac{\partial \left(S_{w_{s}} h_{s} + \psi_{s}^{store}\right)}{\partial t} \tag{3}$$

where $\psi_{s}$ [L] is the water depth with the superscripts mobile and store denoting mobile versus stored water, $\vec{q}_{s}$ is the surface water velocity [LT−1], $q^{b}$ is the source/sink rate (that is, rainfall/evapotranspiration) [T−1], $q^{e}$ is the surface-subsurface water exchange rate [T−1], $S_{w_{s}}$ is the surface saturation [-], and $h_{s}$ is the average height of nondiscretized surface microtopography [L]. A 2-D form of the Manning water depth/friction discharge equation is used to estimate surface water velocities as

$$\vec{q}_{s} = -\frac{\left(\psi_{s}^{mobile}\right)^{2/3}}{\vec{n}\,\Phi^{1/2}}\,\nabla(\psi_{s} + z) \tag{4}$$

where $\vec{n}$ is the Manning's surface roughness tensor [TL−1/3] and $\Phi$ is the friction (or energy) slope [-].
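The Manning-type velocity relation can be illustrated with a short one-dimensional Python sketch. This is not InHM code; the function name and the example depth, roughness, and slope are hypothetical, and the friction slope is taken equal to the magnitude of the water-surface gradient, which is the usual diffusion-wave assumption.

```python
import math


def manning_velocity_1d(depth_mobile, manning_n, head_gradient):
    """Surface water velocity [m/s] along x from a Manning-type relation.

    Assumed form: q_s = -(psi^(2/3) / (n * Phi^(1/2))) * d(psi_s + z)/dx,
    with the friction slope Phi = |d(psi_s + z)/dx| (diffusion-wave assumption).
    depth_mobile   mobile water depth [m]
    manning_n      Manning roughness [s m^(-1/3)]
    head_gradient  gradient of water-surface elevation, d(psi_s + z)/dx [-]
    """
    friction_slope = abs(head_gradient)
    if depth_mobile <= 0.0 or friction_slope == 0.0:
        return 0.0  # no mobile water or a flat water surface: no flow
    conveyance = depth_mobile ** (2.0 / 3.0) / (manning_n * math.sqrt(friction_slope))
    return -conveyance * head_gradient


# Hypothetical values: 8 mm of mobile water, n = 0.04, surface falling 0.25 m per m.
v_example = manning_velocity_1d(0.008, 0.04, head_gradient=-0.25)
```

A negative gradient (water surface falling in the positive x direction) gives a positive velocity, so flow is directed down the water-surface slope.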
5. CB1 Boundary-Value Problem: Base Case
 The BVP used in this study consists of equations (1)–(4), the finite element method of numerical solution, the BCs, the ICs, and a spatial parameterization of the surface/near-surface hydraulic properties. The Base Case BVP [see Ebel et al., 2007a, 2007b] for the CB1 simulations reported here is summarized in the following sections.
5.1. Finite Element Discretization
 The CB1 finite element mesh consists of 264,220 prism elements (138,544 nodes) in the subsurface and 4804 triangular elements (2474 nodes) for the surface. The vertical mesh discretization (Δz) varies from 0.04 m (near surface) to 1.67 m (at depth) and the horizontal mesh discretization (Δx, Δy) varies from 0.4 m (down-gradient areas, along the measurement platforms, and near the hollow axis) to 2.0 m (near the up-gradient boundaries). An adaptive time step (Δt) that ranged from several seconds to a few hours was used.
5.2. Boundary Conditions
 The lettering in Figure 3 identifies the points of interest for the BC description below. Impermeable BCs are specified for the drainage divides at the side boundaries for the surface (AE and BH) and subsurface (ADJE and BCIH), the up-gradient surface drainage divide (AB) at the ridge crest, and the basal subsurface BC (DCIJ). The depth of the basal boundary was set deep enough so that the BC did not impact simulated near-surface hydrologic response. The CB1 upper weir (see Figures 1 and 3) consists of sheet metal sealed to the bedrock using concrete and is represented using an impermeable subsurface BC (along FG from the surface to the top of the weathered bedrock) and a critical depth [see Chow, 1959] surface flow BC (FG). The up-gradient (ABCD) and down-gradient (EHIJ) subsurface BCs (except for the upper weir) use a local head BC, as described by Heppner et al. . The local head BC represents a known hydraulic head value (that can be time variable) at a point outside the boundaries of the finite element mesh. The volumetric boundary flux (Qb,l) [L3T−1] is calculated for a boundary node as
where hl is the total head at the local head point [L], hb,pm is the total head for the porous medium equation at the boundary node [L], A is the nodal area [L2], and dl is the (positive) distance in the x-y plane between the node and the regional sink point [L]. It is worth noting the local head point is treated as a vertical line of constant potential below the specified local head point, which results from using the two-dimensional distance (dl) in equation (5).
 For the Base Case long-term simulations, the up-gradient local sink is parameterized using the level of the deep well at the up-gradient ridge crest (see Figure 1). The time variable well data are used for the period from 7 October 1992 (the start of the automated deep well record) through failure on 19 November 1996 and the average water level height (3.69 m) during the period of record is used for 1 January 1990 through 7 October 1992, when there are no deep well data. It should be pointed out that the up-gradient (ABCD) subsurface BC in the event-based simulations by Ebel et al. [2007b] was impermeable, but the local sink BC used in the effort reported here attempts to better represent the long-term (i.e., multiple simulated years) deeper water table dynamics. The down-gradient local head point is a location of consistent observed surface saturation in the channel downslope of CB1 just below the lower weir.
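The piecewise definition of the up-gradient local head series can be sketched in Python as follows. The dates and the 3.69 m mean level come from the description above; the function name, the data layout, and the choice to hold the most recent well reading between observations are assumptions made for illustration. Converting the water level to total head would require a datum that is not given here.

```python
from bisect import bisect_right
from datetime import datetime

WELL_RECORD_START = datetime(1992, 10, 7)  # start of automated deep well record
MEAN_WELL_LEVEL_M = 3.69                   # mean level over the period of record


def upgradient_well_level(when, well_record):
    """Water level [m] used for the up-gradient local head point at time `when`.

    well_record: list of (datetime, level_m) tuples sorted by time (hypothetical layout).
    Before the record starts, the period-of-record mean is used; afterwards the
    most recent observation is held constant (an assumption of this sketch).
    """
    if when < WELL_RECORD_START or not well_record:
        return MEAN_WELL_LEVEL_M
    times = [t for t, _ in well_record]
    idx = bisect_right(times, when) - 1
    if idx < 0:
        return MEAN_WELL_LEVEL_M
    return well_record[idx][1]


# Hypothetical readings, for illustration only.
example_record = [(datetime(1992, 10, 7), 3.50), (datetime(1992, 10, 8), 3.80)]
level_1991 = upgradient_well_level(datetime(1991, 6, 1), example_record)
level_midday = upgradient_well_level(datetime(1992, 10, 7, 12), example_record)
level_1993 = upgradient_well_level(datetime(1993, 1, 1), example_record)
```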
 A specified flux BC is employed for the surface (ABHGFE in Figure 3) to represent precipitation fluxes. The mean rainfall rate from three onsite automated rain gages (see Figure 1) provides the specified precipitation flux for the surface BC. The temporal resolution of the rain gages is either 600 s (during the winter rainy season) or 1200 s (during the summer dry season). Evapotranspiration is neglected during the entire simulation period; the accuracy of this assumption is evaluated in section 7.1.6.
5.3. Initial Conditions
 Because the long-term hydrologic-response simulations are continuous, the ICs (i.e., pressure head at every subsurface node and the water depth at every surface node) are specified only at 12:00 A.M., 1 January 1990. The ICs are the outputs from a yearlong warm-up simulation driven by the 1990 CB1 rainfall record, which had a total rainfall depth of 2150 mm (including the two sprinkling experiments that added 463 mm of water). The total rainfall depth at the North Bend, Oregon, Airport for 1990 was 1598 mm. It should be noted that the mean and median annual rainfall depths at the North Bend Airport are 1595 mm and 1596 mm, respectively, on the basis of 104 years of record.
5.4. Parameterization of Hydraulic Properties
Figure 3 shows a fence diagram of the hydrogeologic units used to parameterize the CB1 subsurface. Table 1 contains the thicknesses and hydraulic parameter estimates for each of the CB1 subsurface formations/layers. The soil and saprolite depths (see Figure 3) were estimated, using ordinary Kriging, on the basis of the 100 measurements reported by Montgomery et al.  and Schmidt . The weathered bedrock layer thickness decreases moving downslope from the ridge crest [Anderson, 1995] to near-zero thickness near the upper weir. Soil porosity was estimated as the mean water content at saturation from the six retention curve experiments of Torres et al. . The saprolite, weathered bedrock, and unweathered bedrock porosities (see Table 1) were estimated from the CB1 drill core data contained in the work of Anderson et al. .
 The saturated hydraulic conductivity values in Table 1 are based on piezometer slug tests [see Ebel et al., 2007a] and incipient-ponding sprinkling rates at the surface during the retention-curve experiments of Torres et al. . The uniform hydraulic conductivity values used for each of the hydrogeologic layers (see Figure 3 and Table 1) are the result of insufficient data for characterizing the spatial structure for a meaningful 3-D interpolation [see Ebel et al., 2007a]. Because of the large range in estimates of saturated hydraulic conductivity (4 orders of magnitude) in the weathered bedrock and no discernible spatial pattern [Montgomery et al., 2002; Ebel et al., 2007a], the unweathered bedrock conductivity and weathered bedrock conductivity were set to the same value for the Base Case. Slug tests were not conducted in the unweathered CB1 bedrock, so the unweathered bedrock saturated hydraulic conductivity was estimated (via calibration) to be 5.0 × 10−7 m s−1. The calibration was constrained to the range of saturated hydraulic conductivity estimates calculated from the rate of water table decline in the deep well during the summer dry season, which was 4.0 × 10−7 m s−1 to 5.0 × 10−7 m s−1.
Figure 4a shows the measured hysteretic soil-water retention data [see Torres et al., 1998] and the estimated soil-water retention curve generated using the van Genuchten  method. The hysteresis representation [Scott et al., 1983; Kool and Parker, 1987] incorporated by Ebel et al. [2007b] into InHM is subject to pumping effects with repeated cycles of wetting and drying, making the method ill-suited for long-term continuous simulation. While alternate methods of representing hysteresis that do not suffer from pumping effects exist [e.g., Huang et al., 2005], the large memory requirements needed to store the many hysteretic reversals make these methods impractical for long-term simulation [see Werner and Lockington, 2007]. Consequently, the nonhysteretic hydrologic-response simulations reported here use a “mean” curve between the delimiting primary wetting and drying soil-water retention curves. Stauffer and Kinzelbach  found that the mean curve between the primary wetting and drying soil-water retention curves was a better approximation for nonhysteretic flow simulation than either the wetting or drying curve. Figure 4b shows the hydraulic conductivity function estimated using the van Genuchten  method. The −2 m limit to the pressure head axes in Figures 4a and 4b solely represents the limit of the experimental data collected by Torres et al.  and is not a “hard limit” to the simulated pressure heads (i.e., the simulated pressure heads decline beyond −2 m). The soil-water retention and hydraulic conductivity functions for the saprolite, weathered bedrock, and unweathered bedrock layers were not measured at CB1 and are parameterized using nonhysteretic characteristic curve values from Wu et al.  and the van Genuchten  method.
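One way to construct a mean curve between primary wetting and drying retention curves is sketched below in Python, using the standard van Genuchten saturation function with the Mualem constraint m = 1 − 1/n. The parameter values are hypothetical and are not the CB1 estimates. Averaging the two saturations at each pressure head is an assumption of this sketch; the text does not say whether the mean was taken over saturations or over fitted parameters.

```python
def van_genuchten_saturation(psi, alpha, n, s_res=0.0):
    """Water saturation [-] at pressure head psi [m].

    Standard van Genuchten form with m = 1 - 1/n:
        Se = [1 + (alpha * |psi|)^n]^(-m)   for psi < 0
        Se = 1                              for psi >= 0
    alpha [1/m] and n [-] are shape parameters; s_res is residual saturation [-].
    """
    if psi >= 0.0:
        return 1.0
    m = 1.0 - 1.0 / n
    effective_saturation = (1.0 + (alpha * abs(psi)) ** n) ** (-m)
    return s_res + (1.0 - s_res) * effective_saturation


def mean_curve_saturation(psi, wetting_params, drying_params):
    """Arithmetic mean of wetting and drying saturations at the same psi."""
    s_wet = van_genuchten_saturation(psi, **wetting_params)
    s_dry = van_genuchten_saturation(psi, **drying_params)
    return 0.5 * (s_wet + s_dry)


# Hypothetical parameters for illustration (not the CB1 values).
WETTING = {"alpha": 4.0, "n": 2.0}
DRYING = {"alpha": 2.0, "n": 2.0}

s_wet = van_genuchten_saturation(-0.5, **WETTING)
s_dry = van_genuchten_saturation(-0.5, **DRYING)
s_mean = mean_curve_saturation(-0.5, WETTING, DRYING)
```

At any negative pressure head the mean curve lies between the wetting curve (drier) and the drying curve (wetter), and all three coincide at saturation.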
6. Continuous Long-Term Simulation Results
6.1. Model Performance Evaluation
 The performance of InHM was quantitatively and graphically evaluated relative to discharge and piezometric data. The performance evaluations were conducted for 33 selected storms from 1990 through 1996. The selected storms were chosen because the upper weir peak discharge exceeds the upper weir peak discharge from the third sprinkling experiment in 1992. The storm characteristics (i.e., mean and maximum rainfall intensity, total depth, and duration) are presented in Table 2. Figure 5a shows the frequency histogram, which serves as an approximation of the probability distribution function, of rainfall rates from the automated rain gages for the nearly 7-year period of record. Figure 5b is an enlargement of the section of Figure 5a showing the probabilities of larger (i.e., 10 to 45 mm h−1) rainfall rates. The observed CB1 rainfall record has a mean rainfall rate of 1.98 mm h−1, a median rate of 1.524 mm h−1, a standard deviation of 2.29 mm h−1, and a skewness of 4.52. Table 2 and Figures 5a and 5b provide a framework for analyzing model performance relative to storm characteristics. It should be noted that the model performance from the highest-intensity sprinkling experiment (i.e., experiment 2) simulated by Ebel et al. [2007b] was not as good as the model performance for the lower-intensity sprinkling experiments (i.e., experiments 1 and 3). A more comprehensive range of storms (see Table 2) will better assess uniqueness of the Base Case BVP [Ebel et al., 2007a, 2007b].
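The storm-selection rule and the rainfall summary statistics quoted above can be sketched in Python as follows. The function names and the example numbers are hypothetical. The population (moment-based) standard deviation and skewness are used here as an assumption, because the estimators behind the reported values are not stated.

```python
import statistics


def rainfall_summary(rates_mm_h):
    """Mean, median, standard deviation, and skewness of rainfall rates [mm/h].

    Uses population moments: skewness = m3 / m2^(3/2) (an assumed estimator).
    """
    mean = statistics.fmean(rates_mm_h)
    std = statistics.pstdev(rates_mm_h)
    third_moment = sum((r - mean) ** 3 for r in rates_mm_h) / len(rates_mm_h)
    skewness = third_moment / std ** 3 if std > 0.0 else 0.0
    return {
        "mean": mean,
        "median": statistics.median(rates_mm_h),
        "std": std,
        "skewness": skewness,
    }


def select_storms(peak_discharge_by_storm, reference_peak):
    """Keep storms whose peak discharge exceeds a reference peak discharge."""
    return [storm for storm, peak in peak_discharge_by_storm.items()
            if peak > reference_peak]


# Hypothetical data for illustration only.
summary = rainfall_summary([1.0, 2.0, 3.0, 4.0, 10.0])
selected = select_storms({"A": 0.5, "B": 1.2, "C": 0.9}, reference_peak=0.9)
```

A strongly positive skewness, as reported for the CB1 record, means that most rates are low and a few rare intervals carry the high intensities.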
Table 2. Rainfall Characteristics of the 33 Selected Storms at CB1
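 The first measure of model performance used in this study is the EF statistic, calculated as

$$EF = \frac{\sum_{i=1}^{n}\left(O_{i}-\bar{O}\right)^{2} - \sum_{i=1}^{n}\left(P_{i}-O_{i}\right)^{2}}{\sum_{i=1}^{n}\left(O_{i}-\bar{O}\right)^{2}}$$

where $P_{i}$ are the predicted values, $O_{i}$ are the observed values, $n$ is the number of samples, and $\bar{O}$ is the mean of the observed data. The EF statistic ranges from 1.0 to –∞, with 1.0 indicating a perfect match between $P_{i}$ and $O_{i}$ and EF less than zero indicating that $\bar{O}$ is a better model than $P_{i}$ for simulating $O_{i}$. A second measure of model performance used in this study is the mean absolute bias (MAB) (M. Kirkby, personal communication, 2005), also called the mean absolute error [Willmott, 1982], computed with the relation

$$MAB = \frac{1}{n}\sum_{i=1}^{n}\left|P_{i}-O_{i}\right|$$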
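 The third measure of model performance used in this effort is the percentage of error in the simulated variable, calculated using

$$\text{error}\ (\%) = \frac{P - O}{O} \times 100$$

where $P$ and $O$ are the simulated and observed values of the variable (e.g., peak discharge), so that positive values indicate overestimation.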
 Many additional performance statistics exist [see Willmott, 1982; Loague and Green, 1991], but EF, MAB (or the similar root mean square error), and errors in the timing, magnitude, and volumes of simulated and observed discharge are the most frequently employed in hydrologic-response simulation at the catchment scale.
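The three performance measures can be computed as in the following minimal Python sketch. The function names are introduced here for illustration. The EF form assumed is the standard one consistent with the stated range (1.0 for a perfect match, below zero when the observed mean outperforms the model), and the percentage error is assumed to be signed so that positive values indicate overestimation.

```python
def modeling_efficiency(observed, predicted):
    """EF = [sum (O - Obar)^2 - sum (P - O)^2] / sum (O - Obar)^2."""
    mean_obs = sum(observed) / len(observed)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_err = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return (ss_obs - ss_err) / ss_obs


def mean_absolute_bias(observed, predicted):
    """MAB = (1/n) * sum |P - O|, in the units of the variable."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def percent_error(simulated, observed):
    """Signed percentage error in a single simulated quantity."""
    return 100.0 * (simulated - observed) / observed


# Hypothetical observed series and three candidate "simulations".
obs = [1.0, 2.0, 3.0]
ef_perfect = modeling_efficiency(obs, [1.0, 2.0, 3.0])   # exact match
ef_mean = modeling_efficiency(obs, [2.0, 2.0, 2.0])      # observed mean as model
ef_reversed = modeling_efficiency(obs, [3.0, 2.0, 1.0])  # worse than the mean
mab_reversed = mean_absolute_bias(obs, [3.0, 2.0, 1.0])
peak_error = percent_error(simulated=3.0, observed=2.0)
```

Using the observed mean as the prediction gives EF = 0, which is why negative EF values indicate that the simulation performs worse than the mean of the observations.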
6.2. Simulated Discharge Evaluation: Base Case
Table 3 presents the Base Case simulated discharge performance statistics for the 33 storms and the individual years from 1990 through 1996. Perusal of Table 3 reveals that most of the individual events are not well simulated. Only five of the 33 storms have positive discharge EF values and only one of the years (1993) has a positive discharge EF value. It is worth noting that sprinkling experiments 1 and 3 (storms 5 and 17 in Table 3) were not as well simulated as in the event-based simulations reported by Ebel et al. [2007b] for three principal reasons: (1) the automated rain gage record used in the simulations reported here underestimates the sprinkling rates during the experiments because of wind-driven undercatch related to the sprinkler heights [see Ebel et al., 2007a], (2) the event-based simulations allow more control over the ICs at the start of the sprinkling experiment simulation, and (3) the event-based simulations incorporate hysteresis in the characteristic curves for the soil.
Table 3. Model Performance for the Base Case InHM Simulated Discharges at the CB1 Upper Weir
 The best simulated storms (based on EF values) are 2, 7, 8, 9, 10, and 17. The mean precipitation rates observed during these storms (approximately 2.0 mm h−1; see Table 2) are slightly higher than those observed during the sprinkling experiments, and these storms (with the exception of storm 17, which is sprinkling experiment 3) have small total rainfall depths. Examination of Figure 5a shows that the majority of the rainfall rates observed at CB1 are approximately 2 mm h−1. The storms with the smallest total rainfall depths are also, in general, the storms with the smallest errors in total simulated discharge (e.g., storms 2, 7, 8, 10, and 27). The smallest simulated errors in peak discharge are also for the storms with the smaller rainfall depths (e.g., storms 2, 7, 8, 9, 10, 14, and 23). Storms 11, 15, 18, and 19, which all have peak rainfall rates larger than 20 mm h−1, are not well simulated on the basis of EF, MAB, and errors in peak discharge. Examination of Figure 5b shows that rainfall rates exceeding 20 mm h−1 are rare at CB1, but that observed rates range up to the 40 mm h−1 value observed prior to the 1996 slope failure. Storms 20, 22, and 29 all have mean rainfall rates that are approximately equal to the mean irrigation rates applied during sprinkling experiments 1 and 3, but these storms are not well simulated relative to EF and total discharge. The events simulated worst, on the basis of the EF values in Table 3, overestimate the peak magnitude by a factor of 2 or more. The peak simulated discharges are consistently larger than the observed (i.e., 28 out of the 33 storms). The simulated timing of peak discharge is also consistently later (slower) than the observed (i.e., 32 out of the 33 storms). The simulated total discharge follows the same trend as the peak magnitude and timing statistics, with 30 out of the 33 storms having higher simulated, relative to observed, total discharges.
Figure 6 shows the observed rainfall, observed and InHM Base Case simulated upper weir discharge, and observed and InHM simulated cumulative discharge for three of the seven simulated years. The three years shown in Figure 6 were chosen to represent the worst (1991), the median (1992), and the best (1993) simulated discharges based on EF values for discharge (see Table 3). Gaps in the observed discharge record shown in Figure 2b are included in the simulated cumulative discharge shown in Figure 6 and Table 3. All of the cumulative simulated discharges are larger than the observed cumulative discharges (i.e., 1991 is more than twice the observed, 1992 overestimates by approximately 40%, and 1993 overestimates by approximately 60%). The simulated hydrograph shown in Figure 6 for the worst simulated discharge year (1991) supports the same conclusions reached from Table 3 (i.e., simulated discharge magnitude is generally overpredicted, and peak simulated discharge is slower than peak observed discharge). Storms 9 and 10 (see Table 3) in 1991 have discharge EF values greater than zero, but storms 11, 12, and 13 are poorly simulated. The year 1992 includes sprinkling experiment 3 (storm 17) and one of the three largest observed storms (see storm 15 in Table 3), with respect to peak discharge, of the 7-year simulated period. All of the simulated storms in 1992, except the third sprinkling experiment (storm 17) have oversimulated peak discharges. The year 1993 is the best simulated annual discharge because of data gaps during the larger storms (see Figures 2b and 6), which results in the EF statistic being weighted toward the dry season when both simulated and observed discharges are near zero.
 The trends to be gleaned, relative to improving the InHM Base Case BVP at CB1 on the basis of the simulated versus observed discharge statistics in Table 3 and hydrographs in Figure 6, are clear in some respects and ambiguous in others. The best simulated EF values (i.e., storms 2, 7, 8, 9, 10, and 17 in Table 3) are for events with small storm depths and peak observed discharges of the same magnitude as the two lower-intensity sprinkling experiments simulated by Ebel et al. [2007b]. Storms with maximum rainfall rates greater than 20 mm h−1 (e.g., 11, 15, 18, and 19) are poorly simulated, and storms with rainfall rates less than or nearly equal to the irrigation rates used in sprinkling experiments 1 and 3 are also not well simulated. Searching for further trends employing measures of IC, such as the antecedent precipitation index [see Kohler and Linsley, 1951], yielded little in terms of identifiable trends. One clear trend, based on the peak discharge timing and magnitude errors and the total discharge error, is that the InHM simulations consistently overestimate discharge magnitude. Another obvious trend is that the Base Case simulated peak discharge response is slow relative to the observed. It appears that the simulated storage in the unsaturated zone allows InHM to simulate storms with a small rainfall depth and peak discharge approximately equal to sprinkling experiments 1 and 3 reasonably well. Storms with larger rainfall depths highlight an error in the Base Case BVP related to either the way water is transmitted in the weathered bedrock (thus bypassing the upper weir that collects soil-water flow) or additional storage that is not well simulated (perhaps in the fractured weathered and unweathered bedrock).
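For readers unfamiliar with the antecedent precipitation index, a common recursive form is sketched below in Python: each day's index is the previous day's index multiplied by a recession constant, plus that day's rainfall. The recession constant, the daily time step, and the example rainfall values are hypothetical. The text does not specify how the index was computed for CB1, so this illustrates the general idea only.

```python
def antecedent_precipitation_index(daily_rain_mm, recession_k=0.9, initial_api=0.0):
    """Running antecedent precipitation index [mm] for a daily rainfall series.

    Recursive form assumed here: API_t = recession_k * API_(t-1) + P_t,
    where recession_k (0 < k < 1) controls how quickly past rain is forgotten.
    """
    api_series = []
    api = initial_api
    for rain in daily_rain_mm:
        api = recession_k * api + rain
        api_series.append(api)
    return api_series


# Hypothetical series: one 10 mm day followed by two dry days.
api_example = antecedent_precipitation_index([10.0, 0.0, 0.0], recession_k=0.9)
```

The index decays geometrically during dry spells, so it acts as a simple proxy for the wetness of the initial conditions preceding a storm.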
6.3. Simulated Piezometric Response Evaluation: Base Case
 The CB1 piezometers were poorly simulated, in terms of simulated saturation and pressure head magnitudes and dynamics, during the continuous simulations using the Base Case BVP. Nearly all the simulated piezometers remained below the 0.03 m detection threshold of the observed piezometers (owing to a 0.03 m PVC cap height) throughout the entire 7-year simulation period, with the exception of piezometer nest 0–1 (see Figure 1). These simulations support the observation-based conclusions from Montgomery et al. [1997, 2002] that pore water pressure generation in the CB1 soil is principally controlled by bedrock fracture flow exfiltrating into the overlying soil, not perched saturation at the soil-bedrock interface. The extremely high saturated hydraulic conductivity of the soil, the large topographic component to the hydraulic gradient (owing to the steep slope), and the hydraulically conductive bedrock prevent perched saturation at the soil-bedrock interface from contributing significantly to pore water pressure generation in the soil. This is consistent with the simulated piezometric response from the event-based sprinkling experiment simulations from Ebel et al. [2007b], which led to the conclusion that capturing the pore water pressure dynamics at the soil-bedrock interface would be difficult without incorporating spatially variable fracture flow through the weathered bedrock.
7.1. Why Does the Base Case BVP Poorly Simulate the CB1 Hydrologic Response?
 It is clear from the simulated upper weir runoff and piezometric response that the Base Case BVP did not perform well over all the 33 storms evaluated. Numerous potential reasons for this nonuniqueness exist. For example, there is some uncertainty in the subsurface hydraulic parameterization (e.g., effective hydraulic conductivities). It is also clear from the previous field data analysis [Anderson et al., 1997b; Montgomery et al., 1997, 2002] that bedrock fracture flow is critical for developing localized regions of elevated pore water pressure and saturation in the soil. Montgomery et al.  noted that the deeper water table position at CB1 is high and may interact with the soil to influence runoff production and piezometric response immediately upslope of the upper weir. One possibility is that a rise in the deeper water table, on a seasonal and storm-driven basis, may connect the soil with the saturated subsurface via bedrock fracture flow pathways driven by the large gradient associated with the steep slope [Montgomery et al., 2002]. If this is the case, the poor simulation of the deeper water table position (i.e., the simulated water table position is commonly several meters below the observed water table position in the bedrock piezometers) combined with not representing fractures in the weathered and unweathered bedrock may preclude accurate simulation of the piezometric response.
 A suite of hydrologic-response simulations, in the context of a sensitivity analysis, was conducted for the year 1990 to investigate problems with the Base Case BVP and uncertainty in the InHM simulations. These alternate BVPs were selected to represent parameterizations that could be deemed viable on the basis of observations and measurements at CB1, rather than arbitrarily adjusting the Base Case parameters (e.g., ±20%). All the sensitivity analyses employ the same ICs as the Base Case, with the exception of some of the BC sensitivity analyses. While using the same ICs facilitates direct comparisons with the Base Case, it is possible that the low simulated water table position for the IC may affect the outcome of the alternative simulations. The sensitivity analyses are summarized in the following sections. Selected simulations that improved aspects of runoff generation are shown in Figure 7. EF statistics for the selected simulations shown in Figure 7, compared to the Base Case, are given in Table 4.
Table 4. Model Performances for InHM Simulated Discharges at the CB1 Upper Weir
This BVP is the same as the Base Case, except the hydraulic conductivity for the unweathered bedrock (see Figure 3) is twice as high.
This BVP is the same as the Base Case, except the hydraulic conductivity for the weathered bedrock (see Figure 3) is set to the arithmetic mean of the slug tests [see Ebel et al., 2007a].
This BVP is the same as the Base Case, except the hydraulic conductivity for the soil (see Figure 3) is set to the arithmetic mean of the slug tests, not including the surface hydraulic conductivity estimates [see Ebel et al., 2007a].
7.1.1. Saturated Hydraulic Conductivity of the Unweathered Bedrock
 As neither pump nor slug tests were conducted in the CB1 deep well, the water table decline during the dry season was used to estimate the hydraulic conductivity of the unweathered bedrock for the Base Case BVP. To investigate the sensitivity of the simulated hydrologic response to the unweathered bedrock saturated hydraulic conductivity, a simulation was conducted with the unweathered bedrock saturated hydraulic conductivity equal to twice the value (i.e., 1.0 × 10−6 m s−1) used in the Base Case simulation. It is important to note that the weathered and unweathered bedrock hydraulic conductivities were set to the same value in this simulation (as in the Base Case). Simulated runoff generation is moderately sensitive to doubling the unweathered bedrock hydraulic conductivity (see Figure 7 and Table 4). The timing of the simulated discharge response is essentially unaffected by doubling the unweathered bedrock conductivity. Table 4 shows some improvement in the simulated discharge EF values owing to the reduction in simulated discharge magnitudes because the higher unweathered bedrock conductivity facilitates more leakage to the deeper groundwater system. The simulated piezometric response from doubling the unweathered bedrock hydraulic conductivity is as poor as for the Base Case.
 In addition to fractures, another parameterization uncertainty is lithologic heterogeneity of the CB1 bedrock. Lithologic analyses of the unweathered bedrock core from the deep well installation described by Anderson et al.  revealed nearly flat bedded, primarily sandstone lithology with some shale beds. The shale beds were principally concentrated at 277 m elevation and a fault was observed at 275 m, coinciding with the minimum water table height observed in the deep well [Anderson et al., 2002]. A simulation was conducted incorporating a planar low-permeability zone located at 277 m elevation within the unweathered bedrock layer to represent the shale beds using a hydraulic conductivity of 1.0 × 10−11 m s−1 [Freeze and Cherry, 1979]. The shale layer is not present in the weathered bedrock layer in the simulations and therefore does not intersect the soil-bedrock interface. The effect of the low-permeability zone on the simulated piezometric response in the soil and the simulated upper weir discharge was not significant.
7.1.2. Saturated Hydraulic Conductivity of the Weathered Bedrock
 Simulations investigating the control of weathered bedrock saturated conductivity on runoff and piezometric responses were conducted using the geometric mean (1.7 × 10−6 m s−1) and arithmetic mean (1.7 × 10−5 m s−1) of the weathered bedrock slug test estimates [see Ebel et al., 2007a]. The timing of simulated discharge was insensitive to the weathered bedrock hydraulic conductivity parameterization. The simulated discharge magnitude was moderately sensitive to the hydraulic conductivity of the weathered bedrock. As most of the Base Case simulated discharges are overestimated, increasing the weathered bedrock conductivity results in less simulated discharge (through the soil) captured at the upper weir. The discharge EF values are improved considerably for the larger storms (e.g., storms 1, 3, 4, and 6 in Table 4 and Figure 7) and are slightly worse for the smaller storms (e.g., 2, 5, 7, and 8 in Table 4 and Figure 7) for the arithmetic mean parameterization. With respect to storm rainfall depth, the storms with smaller depths (e.g., 2 and 7 in Table 2) are simulated worse and the storms with larger rainfall depths (e.g., 1 and 3 in Table 2) are better simulated with respect to EF for the arithmetic mean parameterization. The discharge EF for all of 1990 is positive for the arithmetic mean parameterization. The simulated piezometers are as poor as in the Base Case.
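The order-of-magnitude gap between the geometric and arithmetic means of the slug test estimates reflects how each average weights the largest values. A minimal sketch, assuming a small set of conductivity estimates spanning several orders of magnitude, shows the effect. Only the minimum value (5.4 × 10−8 m s−1) is taken from the text; the other three values are hypothetical and are not the CB1 slug test data.

```python
import math


def arithmetic_mean(values):
    """Plain average; dominated by the largest values when the data span orders of magnitude."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Exponential of the mean natural log; requires strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Conductivity estimates in m/s; all but the first value are hypothetical
k_estimates = [5.4e-8, 1.0e-6, 2.0e-6, 8.0e-5]

k_arith = arithmetic_mean(k_estimates)
k_geom = geometric_mean(k_estimates)
print(f"arithmetic mean = {k_arith:.1e} m/s, geometric mean = {k_geom:.1e} m/s")
```

Because the arithmetic mean is controlled by the most conductive locations, choosing it over the geometric mean is effectively a choice to let the most permeable parts of the weathered bedrock set the bulk parameter value.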
7.1.3. Saturated Hydraulic Conductivity of the Soil
 Two new simulations were conducted using the arithmetic (i.e., 1.1 × 10−4 m s−1) and geometric (i.e., 6.2 × 10−5 m s−1) mean estimates of saturated hydraulic conductivity from the piezometer slug tests. It should be noted that the Base Case soil saturated hydraulic conductivity value (see Table 1) included the surface saturated hydraulic conductivity estimates from the in situ retention curve experiments by Torres et al. . The simulated runoff generation was extremely sensitive to variations in saturated hydraulic conductivity for the soil, which is consistent with previous simulation-based studies [e.g., Freeze, 1972a, 1972b; Rogers et al., 1985]. Not surprisingly, the timing of simulated runoff generation was considerably worse, relative to the Base Case, when the soil hydraulic conductivity was decreased (see all the storms in Figure 7). Decreasing the hydraulic conductivity of the soil did improve the EF values for the larger storms (e.g., storms 1, 3, and 4 in Table 4 and Figure 7) as well as for the entire year of 1990 owing to the reduction in simulated discharge magnitude. The simulated piezometer response remained as poor as for the Base Case.
7.1.4. Role of Bedrock Fractures
Montgomery et al. [1997, 2002] determined that piezometric response in the soil and saprolite at CB1 was largely controlled by fracture discharge from the weathered bedrock. Unfortunately, insufficient data are available to characterize the locations, connectivities, and hydraulic properties of the fractures in the CB1 weathered and unweathered bedrock. However, estimates of saturated hydraulic conductivity from the piezometer slug tests at the interface of the soil/saprolite with the bedrock may serve as a proxy for the locations of hydraulically active fractures. Figure 8 shows a Kriged map of the base-10 logarithm of estimates of the saturated hydraulic conductivity from slug tests in the deepest piezometers (i.e., those at the colluvium/saprolite and bedrock interface). Obviously, collapsing the 3-D saturated hydraulic conductivity data set into 2-D (plan view) poses serious limitations on the accuracy of any spatial estimation method, and therefore Figure 8 is best considered in a qualitative sense. Figure 8 reveals that the saturated hydraulic conductivities in the soil are generally high [also see Montgomery et al., 2002; Ebel et al., 2007a]. In Figure 8, the areas of lower conductivity are labeled with the piezometer numbers (also see Figure 1). Table 5 lists all the piezometers in the soil layer that exhibited a large pressure head response during sprinkling experiment 3 as well as the saturated hydraulic conductivity estimates for the soil/saprolite and weathered bedrock at the same locations. It should be noted that what was considered a large pressure head response is greater than 0.1 m, which is still relatively small owing to the high hydraulic conductivity of the soil and large topographic component of the hydraulic gradient. Inspection of Figure 8 and Table 5 shows a correlation between higher piezometric responses and low saturated hydraulic conductivity estimates in the soil, saprolite, and weathered bedrock. 
Notable exceptions to the correlation between piezometric response and saturated hydraulic conductivity estimates shown in Table 5 include some piezometers at the down-gradient end of CB1 (0–1, 0–1A, 0–3, and 1–3), where convergent flow and saturation backing up behind the impermeable wing walls of the upper weir and exfiltration owing to the steep slope may cause high pore water pressures. The correlation between piezometric response and saturated hydraulic conductivity estimates suggests that the areas of low estimated saturated conductivity may serve as a proxy for fracture locations, which is supported by partial field mapping of exposed weathered bedrock fracture locations (D. R. Montgomery et al., manuscript in preparation, 2008), after the 1996 landslide removed portions of the overlying soil, showing fractures near piezometer nests 7–6, 6–3, 5–3, and 2–2 (see Figure 1).
Table 5. Piezometers Exhibiting Greater Than 0.1 m of Pressure Head During Experiment 3 and Slug Test Estimates of Saturated Hydraulic Conductivity in Those Piezometers
 It is unclear whether the low saturated hydraulic conductivity values presented in Table 5 are representative of the porous media or are a byproduct of hydrologic conditions during the slug tests. The slug tests were conducted during the third sprinkling experiment, and exfiltrating gradients from the weathered bedrock fractures into the soil (see the analysis of Montgomery et al. ) could have prevented the entry of the water “slug” into the soil through the piezometer screen, thereby lowering the estimate of saturated hydraulic conductivity. There is, however, no independent evidence that exfiltrating gradients significantly impacted the slug testing at CB1. Exfiltrating gradients from the weathered bedrock into soil are present at some of the piezometers exhibiting low values of saturated hydraulic conductivity (for example, piezometer nest 5–3) whereas infiltrating gradients from the soil into the weathered bedrock are present at other piezometers exhibiting low values of saturated hydraulic conductivity (for example, piezometer nest 7–6). It is also possible that the weathered bedrock fractures are hydraulically active because of emplacement in a low saturated hydraulic conductivity matrix.
 Two new simulations were conducted to investigate whether the low-conductivity areas in Figure 8 represent actual low-conductivity zones or areas of high conductivity that inhibit slug testing owing to exfiltrating weathered bedrock fracture flow. Localized hydraulic conductivity zones (either high or low saturated hydraulic conductivity values) determined from Figure 8 were embedded into the Base Case parameterization. The low- or high-conductivity zones fully penetrate the saprolite and weathered bedrock and extend 10 m into the unweathered bedrock. For the low-conductivity case, a saturated hydraulic conductivity of 5.4 × 10−8 m s−1 was used, which corresponds to the minimum slug test estimate in the weathered bedrock [see Ebel et al., 2007a]. For the high-conductivity case, the estimated saturated hydraulic conductivity of the weathered bedrock fractures, based on the bromide tracer tests of Anderson et al. [1997b], was 2.0 × 10−3 m s−1. Neither the high- nor low-conductivity simulation produced major differences in either simulated discharge magnitude or timing relative to the Base Case. Not surprisingly, the simulated piezometric response for both the high- and low-conductivity zone simulations was different than the Base Case only for the localized regions near the specified conductivity anomalies. The high-conductivity zone simulation better represents the dynamics (both timing and magnitude) of observed piezometric response than either the Base Case or the low-conductivity zone simulation. However, the high-conductivity zone simulation still does not capture the magnitudes of piezometric response to an acceptable degree of accuracy, consistently underestimating the observed magnitude of piezometric response. 
While using the spatial patterns of saturated hydraulic conductivity estimates shown in Figure 8 as a proxy for weathered bedrock fracture locations shows some promise, it is likely that the poor simulated water table position combined with the isolated nature of the specified conductivity variations (i.e., the connectivity is unknown and unrepresented) are overwhelming limiting factors. While other research efforts have had success simulating fracture flow in deterministically defined fracture networks [e.g., Lapcevic, 1997], there are seldom sufficient data to employ such an approach at the field scale [see Novakowski et al., 2007], as is the case at CB1.
7.1.5. Role of Subsurface Boundary Conditions
 BCs can significantly influence simulated hydrologic response, and there is some uncertainty regarding the specification of the subsurface BCs at CB1. As noted previously, using a no-flow up-gradient subsurface BC (as in the work of Ebel et al. [2007b]) causes the water table at the ridge crest to drop too low in the long-term simulations conducted here, which is why a local head BC was used to attempt to better replicate the deeper water table position. However, the local head (i.e., specified head) BC used as the up-gradient BC (ABCD in Figure 3) changes the water balance by adding water into the subsurface. A water balance for the 1990 simulated hydrologic response shows that 1.7 times the volume of total rainfall is added into the subsurface by the up-gradient local head BC. However, this addition of water does not contribute to the upper weir discharge, based on the water balance, as the up-gradient local head BC increases the 1990 cumulative simulated discharge by only 0.2 m3 or approximately 0.2 mm of runoff depth, relative to a simulation conducted using the Base Case parameterization with an impermeable up-gradient subsurface BC rather than a local head BC. The water added by the up-gradient local head BC flows out of the down-gradient local head BC (EHIJ in Figure 3) rather than contributing to runoff. The largest difference, other than the increase in the down-gradient BC flux, between having the up-gradient BC being parameterized as a local head rather than a no flow BC is the change in the unweathered bedrock storage (i.e., the deeper water table height). The simulation with the no flow up-gradient BC loses nearly 12 times as much volume in storage from the unweathered bedrock as the simulation with the up-gradient local head BC does.
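The water balance comparisons above move between volumes and equivalent depths. A minimal sketch of that conversion follows. The plan area of 860 m2 is an assumption introduced here (it is not stated in this section), and the 2150 mm annual input is the combined 1990 rainfall and sprinkling total reported for CB1, used here as a stand-in for total rainfall; the function names are hypothetical.

```python
def depth_mm_from_volume(volume_m3, area_m2):
    """Convert a water volume (m3) to an equivalent depth (mm) over a plan area (m2)."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return volume_m3 / area_m2 * 1000.0


def volume_m3_from_depth(depth_mm, area_m2):
    """Convert an equivalent depth (mm) over a plan area (m2) to a volume (m3)."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return depth_mm / 1000.0 * area_m2


ASSUMED_AREA_M2 = 860.0  # assumption: nominal catchment plan area, not given in this section

# 0.2 m3 increase in 1990 cumulative discharge attributed to the up-gradient local head BC
runoff_depth_mm = depth_mm_from_volume(0.2, ASSUMED_AREA_M2)

# Annual water input expressed as a volume, and the BC inflow at 1.7 times that volume
rain_volume_m3 = volume_m3_from_depth(2150.0, ASSUMED_AREA_M2)
bc_inflow_m3 = 1.7 * rain_volume_m3

print(f"runoff depth = {runoff_depth_mm:.2f} mm")
print(f"rain volume = {rain_volume_m3:.0f} m3, BC inflow = {bc_inflow_m3:.0f} m3")
```

Under these assumptions the boundary inflow is three to four orders of magnitude larger than the change in weir discharge, which is consistent with the statement that the added water leaves through the down-gradient boundary rather than contributing to runoff.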
 To test whether other down-gradient BCs would produce simulated deep water table positions that are more similar to observed values (at the ridge crest), two simulations were conducted for 1990 employing a no flow up-gradient BC combined with (1) a “radiation” down-gradient BC that employs an explicit boundary flux based on the upstream hydraulic gradient of the boundary node and (2) a no-flow down-gradient BC. All of the simulations run for the BC sensitivity analyses in this section use the 1990 record to maintain consistency with the other sensitivity analyses in this effort. Both the radiation and no-flow downstream BCs produce a higher water table near the downstream BC, but the upstream deeper water table is poorly simulated on the basis of comparison with the observations from the ridge crest well in other years (both simulations underestimate the water table height; note that there are no deep well data in 1990). Changing the IC to set the initial water table above the level typically observed in the ridge crest well at the beginning of January does not maintain a high water table in the up-gradient portion of CB1 through the spring months. It should be noted that there are few physically realistic choices of a subsurface BC for the down-gradient boundary. This is because the catchment boundaries at CB1 were chosen for the original observational study on the basis of surface features, such as the surface topography and a seepage face (where the upper weir was constructed). Essentially all of the subsurface instrumentation and subsurface characterization is located within these specified boundaries.
7.1.6. Consideration of Evapotranspiration
 Two new simulations were conducted to examine the effect of considering evapotranspiration, which was not considered in the Base Case, for the simulated hydrologic response during 1990. Potential evapotranspiration for one of the simulations was estimated using the Thornthwaite method [Thornthwaite and Mather, 1955] corrected for latitude (see Table 5–2 from Dunne and Leopold ) using daily temperature data from the North Bend Municipal Airport (from National Climatic Data Center, where daily weather data for the North Bend Municipal Airport are available at http://cdo.ncdc.noaa.gov/ulcd/ULCD) located 15 km away and 300 m lower in elevation relative to CB1. While the Thornthwaite method is not a robust estimation technique for potential evapotranspiration, it is consistent with the temporal discretization and type of meteorological data available for CB1 during the simulation period. The second simulation estimated potential evapotranspiration using the method of Hargreaves and Allen ; also see the work of Hargreaves and Samani  and Hargreaves et al. . The same meteorological data were used for the Hargreaves and Allen estimates as for the Thornthwaite estimates. Potential evapotranspiration is converted to actual evapotranspiration by scaling the potential estimate by the InHM-simulated soil saturation. The actual evapotranspiration is removed from the subsurface (during the simulation) distributed linearly over an average vegetation rooting depth of 0.5 m, based on the rooting depth measurements of Schmidt  at CB1. Evapotranspiration is set to zero during rainfall events.
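The conversion from potential to actual evapotranspiration described above can be sketched as follows. The potential estimate uses the widely published 1985 temperature-based Hargreaves form, which is assumed here to be representative of the Hargreaves and Allen method; the exact variant and the source of the extraterrestrial radiation term used for the CB1 simulations are not given in this section. The function names and input values are hypothetical, and the distribution of the sink over the 0.5 m rooting depth is not included.

```python
import math


def hargreaves_pet(t_max_c, t_min_c, ra_mm_per_day):
    """Potential ET (mm/day) from daily temperature extremes (deg C) and
    extraterrestrial radiation expressed as equivalent evaporation (mm/day).
    Assumed form: 0.0023 * Ra * (Tmean + 17.8) * sqrt(Tmax - Tmin)."""
    t_mean = 0.5 * (t_max_c + t_min_c)
    t_range = max(t_max_c - t_min_c, 0.0)
    pet = 0.0023 * ra_mm_per_day * (t_mean + 17.8) * math.sqrt(t_range)
    return max(pet, 0.0)


def actual_et(pet_mm_per_day, saturation, raining):
    """Scale potential ET by soil saturation (0 to 1); zero during rainfall."""
    if raining:
        return 0.0
    s = min(max(saturation, 0.0), 1.0)
    return pet_mm_per_day * s


# Hypothetical daily inputs, for illustration only
pet = hargreaves_pet(t_max_c=18.0, t_min_c=8.0, ra_mm_per_day=10.0)
aet_dry_day = actual_et(pet, saturation=0.3, raining=False)
aet_wet_day = actual_et(pet, saturation=0.9, raining=True)
print(f"PET = {pet:.2f} mm/day, AET (dry day) = {aet_dry_day:.2f} mm/day, AET (rain) = {aet_wet_day:.2f} mm/day")
```

Scaling by saturation and suppressing evapotranspiration during rainfall both reduce the actual flux well below the potential estimate, which helps explain why the annual simulated totals are small relative to rainfall.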
 Simulated actual evapotranspiration amounts for both the Thornthwaite and Hargreaves methods are small, totaling only 24 mm for the Thornthwaite method and 76 mm for the Hargreaves method for the entire year of 1990. These small ET amounts are 1% (for the Thornthwaite method) or 3% (for the Hargreaves method) of the 2150 mm of rainfall and sprinkling that fell on CB1 in 1990. Analysis of the effects of inclusion of ET will focus on comparing the simulation using the larger Hargreaves potential evapotranspiration estimates with the Base Case. Comparison of the peak simulated discharge for the eight storms in 1990 yields a difference between the Base Case with and without evapotranspiration of 0.008 L s−1 on average, with a maximum of 0.03 L s−1 for storm 4. The percent differences in timing of peak discharge are indistinguishable, at the 600 s observed discharge resolution, between the Base Case with and without evapotranspiration. Incorporation of evapotranspiration improved the EF values relative to the Base Case, but only by 0.06 [-] on average with a maximum improvement of 0.21 [-] for storm 4 (improving from −2.08 [-] for the Base Case to −1.87 [-]). Inclusion of evapotranspiration had a minimal effect on simulated piezometric response during the eight storms evaluated in 1990, which is a result of the Base Case simulated piezometric response undersimulating the observed piezometric response. Incorporating evapotranspiration causes drier simulated ICs in the soil leading to further undersimulated piezometric response when evapotranspiration is included.
 While the 76 mm actual evapotranspiration estimate for the Hargreaves method for the entire year of 1990 may seem small, these estimates are consistent with the values reported by other researchers for young saplings in clear-cut areas. For example, Livingston and Black  measured transpiration rates from 1- to 3-year-old Douglas Fir seedlings on a south facing clear-cut slope and found that transpiration rates varied from 0.1–1.1 mm d−1, with a cluster of values in the 0.3–0.4 mm d−1 range. It should be noted that the CB1 seedlings were 1 year old in 1990 and that broadleaf vegetation, primarily consisting of alder (Alnus) and blackberry (Rubus), was periodically trimmed from 1990 to 1992. The north facing Coos Bay aspect (shown in Figure 1) combined with the 43° slope also reduces evapotranspiration. Additional water inputs, such as fog drip from coastal fog, are not considered in this study and may supply water to plants at CB1 in the summer months. The water balance analyses by Montgomery et al.  at the catchment adjacent to CB1 found that total runoff accounted for 87–93% of the rainfall, leaving 7–13% of the water balance partitioned between leakage to a deep groundwater system and evapotranspiration, which provides observational evidence that evapotranspiration is a minimal component of the water balance. On the basis of the simulations reported here, neglecting evapotranspiration during long-term hydrologic-response simulation at CB1 is a reasonable hydrologic assumption for investigations focused on storm-scale hydrologic response.
7.2. BVP Improvements Based on Sensitivity Analyses
 The objective of this section is to incorporate some of the BVP aspects from the sensitivity analyses that positively affect simulated discharge and piezometric response in an attempt to improve the uniqueness of the Base Case BVP. The differences between the Base Case BVP and the Improved Case BVP are summarized below. The BCs are the same as the Base Case except that, owing to the lack of continuous deep well data for the period from 1990 through 1993 (see Figure 2b), the 1995 well data are used to parameterize the up-gradient local head BC. The 1995 well data are chosen because, of the three complete years of well records (i.e., 1994–1996), they are intermediate: higher in magnitude than the 1994 data but lower in magnitude than the 1996 data. The unweathered bedrock was changed to incorporate the low saturated hydraulic conductivity shale layer, as described in section 7.1.1. The weathered bedrock saturated hydraulic conductivity is parameterized using the arithmetic mean of the piezometer slug test estimates. The saprolite saturated hydraulic conductivity was not specifically analyzed in the sensitivity analysis in this effort; however, the event-based simulations of Ebel et al. [2007b] suggested that the saturated hydraulic conductivity value used for the saprolite in the Base Case was too high. Consequently, the geometric mean of the piezometer slug test estimates in the saprolite [see Ebel et al., 2007a] was used to parameterize the saturated hydraulic conductivity of the saprolite layer, which is now 2.0 × 10−5 m s−1. The ICs used for the 1990 Improved Case simulations are the same as those employed for the 1990 Base Case simulations.
Table 6 presents the performance statistics for the Improved Case simulation results for the 4-year period from the start of 1990 through 1993. Figure 9 shows observed and simulated Base Case and Improved Case hydrographs from eight selected storms from Table 6. Figure 10 presents observed and simulated cumulative discharges for 1990 through 1993. The Improved Case results are only shown for 1990 through 1993 because CB1 was more carefully monitored and maintained during this period, which includes the sprinkling experiments. The observed discharge and piezometer records are also more complete between 1990 and 1993 (see Figure 2b). Because the period from 1990 through 1993 includes more than half the simulated Base Case storms (i.e., 21 out of 33) and an adequate range of storm magnitudes, generalized conclusions can be made about the Improved Case BVP. Examination of Table 6 and Figure 9 reveals that the Improved Case simulated discharges are generally better than the Base Case simulated discharges, especially for storms with larger rainfall depths (i.e., storms 1, 3, 4, 6, 12, 13, 15, 16, 21). The EF and total discharge error values are worse for the smaller rainfall depth events for the Improved Case. Examination of the initial discharges for storms 4, 12, 14, 20, and 21 in Figure 9 suggests that the Improved Case is better than the Base Case for simulating hydrologic processes between storms. Figure 10 and Table 6 show that the cumulative simulated discharge is better for the Improved Case for 1990 and 1991. The cumulative simulated discharge for the Improved Case oversimulates cumulative discharge in 1990 because of the gap in the observed discharge from mid-October through mid-November (see Figures 2b and 7). The Improved Case appears to undersimulate the cumulative discharge in 1992 and 1993. 
However, in 1992 there is a period from 4 March through 8 April where the observed discharge record holds steadily at 0.1 L s−1 when there is little rainfall (see Figure 6), which adds 227 m3 to the observed cumulative discharge record. There is another similar period from 12 December 1992 through 19 January 1993 (see Figure 6), which adds 133 m3 in 1992 and 162 m3 in 1993. Another period of sustained discharge data near 0.1 L s−1 exists from 23 April to 11 May 1993, which adds 271 m3. The steep slope and highly hydraulically conductive soil at CB1 should prevent significant (i.e., near 0.1 L s−1) discharges from being sustained for weeks at a time, which draws the observed discharge data during the aforementioned periods into question; discounting these periods could bring the observed cumulative discharges during 1992 and 1993 closer to the Improved Case simulated values. Another interesting observation from the Improved Case hydrographs shown in Figure 9 is the asymmetry in time of the observed hydrographs relative to the more symmetrical simulated hydrographs. It is possible that rapid fracture flow through the unweathered bedrock contributes to the rapid rise in discharge while the hydrograph recession is slower, reflecting drainage from the unsaturated zone (in agreement with the CB1 data analysis by Montgomery and Dietrich ).
Table 6. Model Performance for the Improved Case InHM Simulated Discharges at the CB1 Upper Weir
 While the Improved Case appears to represent the bulk behavior of weathered bedrock, with respect to better simulation of runoff for large storms, this equivalent porous medium approach does not capture the localized piezometric response well. The simulated piezometric response is slightly better for the Improved Case, relative to the Base Case, but still not of acceptable accuracy to drive process-based hydrogeomorphology investigations. Novakowski et al.  point out that the equivalent porous media approach will not work well when the study domain is small enough that individual fractures influence the flow system, which seems to be the case at CB1. A simulation combining the Improved Case parameterization with the high saturated conductivity zones used to represent bedrock fractures (see section 7.1.4) improves the simulated piezometric response, relative to both the simulated piezometric response from the Base Case with high conductivity zones to represent fractures and the Improved Case with no high-conductivity fracture zones, but still consistently underestimates piezometric response relative to the observed values.
7.3. Insights and Future Directions
 While the Improved Case simulations represent a clear improvement over the Base Case simulations relative to simulated discharge and piezometric response, the Improved Case simulation results are far from perfect. Relative to the simulated piezometric response, better representation of the weathered and unweathered fracture flow in the CB1 hydrologic-response simulations is needed to accurately simulate pore water pressure development in the soil. This is not surprising, given that previous studies have demonstrated that spatial variations in hydraulic conductivity can control pore water pressure hot spots [e.g., Pierson, 1977; Wilson and Dietrich, 1987; Wilson et al., 1989; Johnson and Sitar, 1990; Reid and Iverson, 1992; Montgomery et al., 1997, 2002]. It would likely be beneficial to complete a “post-mortem” at CB1 by removing all the soil and saprolite and inspecting the locations of exposed fractures. As noted by Novakowski et al. , characterization of the spatial distribution and correlation of fractures at the field scale is prohibitively expensive using traditional techniques. However, recent advances in near surface geophysical techniques have shown promising results in detecting the locations [e.g., Holden et al., 2002] and connectivities [e.g., Holden, 2004] of subsurface preferential flow paths and may prove useful for future studies at locations similar to CB1. Hydraulic tomography [see Gottlieb and Dietrich, 1995; Yeh and Liu, 2000] has also shown to prove instructive for detailed characterization of the unsaturated zone [Yeh and Šimůnek, 2002] and fractures in rock [McDermott et al., 2003; Renshaw, 1996]. Techniques such as hydraulic tomography may prove useful for future studies that employ hydrologic-response models similar to InHM in locations where accurate representation of fracture locations and connectivities are important.
 The CB1 results presented here suggest that the deeper water table plays a larger role in pore water pressure development than would be expected. Figure 11 shows time series of observed pressure head in selected weathered bedrock piezometers and the deep well (see Figure 1 for locations) from 1993 through the slope failure. Figure 11 illustrates the impact of the water table decline during the dry season (i.e., July through October) on observed weathered bedrock piezometric response in piezometers B1, B9, and B13 in 1995; note that the summer period data are missing for 1993, 1994, and 1996 (see also Figure 2). Correlation coefficients between deep well pressure head and weathered bedrock piezometers are 0.002 for piezometer B16, 0.27 for piezometer B12, 0.64 for piezometer B13, 0.53 for piezometer B9, and 0.52 for piezometer B1. The correlation coefficients suggest that, for the down-gradient weathered bedrock piezometers (i.e., B13, B9, and B1), a higher water table position (i.e., larger pressure head in the deep well) produces larger magnitude piezometric response. The up-gradient weathered bedrock piezometers (i.e., B16 and B12) appear to have pressure head dynamics that are independent of the water table position. It should be noted, however, that a high water table position is common in the winter months, when the large precipitation events that increase the magnitude of observed piezometric response occur at CB1, suggesting that the correlations are not without uncertainty. The conclusions drawn from Figure 11 echo those of Montgomery et al. , who found that for the sprinkling experiments and natural storms, piezometric response in the soil and weathered bedrock was dependent on both position in the CB1 catchment and seasonal water table position. For example, Montgomery et al. noted that for soil piezometers in nest 5–3 and the nearby weathered bedrock piezometer B13, which occur in the midslope section of CB1, early season storms exhibited infiltrating gradients from the soil into the bedrock, while midwinter storms, when the water table was several meters higher, exhibited exfiltrating gradients with water flowing from the weathered bedrock into the overlying soil. Montgomery et al.  also found that piezometers up gradient (further toward the ridge crest) of soil piezometer 5–3 and bedrock piezometer B13 exhibited consistently infiltrating gradients regardless of water table position. On the basis of the data-based and simulation-based conclusions of previous CB1 efforts and this research, it seems that bedrock fracture flow, heterogeneity in the unweathered bedrock, and wetting front propagation through the unsaturated zone all complicate simulation of pore water pressure at CB1 and may all need to be well characterized (and correctly represented in the BVP) to accurately simulate the distributed hydrologic response.
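As an illustration of how correlation coefficients like those reported above can be obtained from records with gaps (e.g., the missing summer data), the following is a minimal sketch. It assumes a Pearson correlation computed only over time steps where both the deep well and the piezometer have observations; the text does not state which estimator was used. The function name `pearson_r` and the pressure head values are hypothetical and are not CB1 data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation using only time steps where both series have data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only paired observations (NaN marks a missing record).
    valid = ~(np.isnan(x) | np.isnan(y))
    if valid.sum() < 2:
        raise ValueError("need at least two paired observations")
    return float(np.corrcoef(x[valid], y[valid])[0, 1])

# Synthetic, purely illustrative pressure heads (m); NOT observed CB1 values.
deep_well = np.array([1.0, 1.5, np.nan, 2.5, 3.0, 2.0])
piezometer = np.array([0.10, 0.18, 0.25, np.nan, 0.41, 0.22])

r = pearson_r(deep_well, piezometer)
print(f"r = {r:.2f}")
```

Dropping unpaired time steps keeps the estimate from being distorted by gaps, but, as noted above, a coefficient computed this way cannot by itself separate the effect of water table position from the seasonal timing of large storms.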
Anderson et al. [1997b], Montgomery et al. , and Montgomery and Dietrich  concluded that runoff generation at the CB1 upper weir during the sprinkling experiments was controlled by a “subsurface variable source area” at the soil-bedrock interface. The work by Montgomery et al. [1997, 2002] demonstrated that the patchy subsurface saturation occurring at the soil-bedrock interface (i.e., the subsurface variable source area that feeds runoff generation) is produced by flow exfiltrating from weathered bedrock fractures. If, as suggested in the work by Montgomery et al.  and this effort, exfiltrating fracture flow from the weathered bedrock into the overlying soil depends on the seasonal dynamics of the deeper water table, and these factors in turn control the extent of the subsurface variable source area, then correct simulation of both the deeper water table and the fracture flow would be critical for correctly simulating runoff generation at CB1. If this is the case at CB1, then simulations that reproduce the observed runoff without reproducing the fracture-flow-dominated piezometric response or the deeper water table position may be “right for the wrong reasons.”
 The issue then becomes what information is needed to get the simulated hydrologic response “right for the right reasons.” It is clear that despite the immense effort invested in characterizing the CB1 hydrologic BVP with fieldwork and observations, information on the unweathered bedrock layer is insufficient. No information exists regarding the water-retention curves and hydraulic conductivity functions of the unweathered bedrock, other than lithology. Pump tests at various depths within the deep well would have been useful for estimating storage parameters and for parameterizing the magnitude and variability with depth of the saturated hydraulic conductivity of the unweathered bedrock to account for heterogeneity and anisotropy. The aforementioned hydraulic information for the unweathered bedrock would likely improve the simulated deeper water table and pore water pressures at CB1. Spatial lithologic characterization, including the presence of permeability contrasts at bedding planes such as the shale interbeds at CB1, may be critical for slope failure simulation at locations similar to CB1. For example, Iverson and Major  noted that horizontal seepage at a permeability contrast increases the probability of slope failure. Long-term monitoring in the unsaturated zone (i.e., soil-water content or tensiometric response) would also have helped to evaluate the simulated state variables in the unsaturated zone. There is also little information on the distributed position of the deeper water table throughout CB1. Such information could assist in finding the correct BCs, unweathered bedrock saturated conductivities, and unweathered bedrock storage parameters needed to correctly simulate the dynamics of the deeper water table. 
On the basis of the findings of this study, it appears that the characterization of the deeper subsurface may be needed for accurate simulation of near-surface hydrologic response for field sites like CB1 with steep slopes and hydrologically active deeper water tables.
8. Summary and Conclusions
Ebel et al. [2007b] had reasonable success simulating the hydrologic response to three sprinkling experiments using the physics-based InHM. The study reported here assessed the uniqueness of the BVP used by Ebel et al. [2007b], called the Base Case herein, for continuous InHM hydrologic-response simulation from 1990 through 1996. The Base Case BVP poorly reproduced the piezometric response (i.e., undersimulated pore water pressure magnitudes) and only the discharges from small magnitude storms (i.e., of similar peak discharge magnitude as the three sprinkling experiments) were well simulated. Sensitivity analyses of the BVP parameterization indicated that during the long-term CB1 hydrologic-response simulations, (1) the saturated hydraulic conductivity of the soil strongly impacts simulated discharge, (2) representation of layered permeability contrasts can moderately affect simulated discharge, (3) inclusion of evapotranspiration had a minimal effect on simulated integrated hydrologic response, and (4) simple representations of heterogeneity in saturated hydraulic conductivity attempting to mimic bedrock fractures improved simulated pore water pressures but still underestimated pore water pressure magnitudes.
 Improvements to the Base Case employing insights from the sensitivity analyses conducted in this study did improve the uniqueness of simulated discharge but did not improve the simulated piezometric response to an acceptable level. The inability of any of the simulations presented in this effort to reproduce the observed pore water pressure magnitudes and dynamics suggests that more information is needed to characterize the locations and connectivities of bedrock fractures for models like InHM to accurately simulate the hydrologic effects of fracture flow at locations like CB1. The results shown here also indicate the potential role of the deeper water table position in influencing the CB1 hydrologic response. Care should be exercised when generalizing the simulation-based conclusions from this study to other locations. Relative to using models like InHM for simulation of hydrologically driven landslide initiation, this study shows that uniqueness can be a problem when a BVP used successfully for smaller magnitude storms is employed to simulate a failure-causing storm. This study further supports the conclusion of Ebel and Loague  that simulating an integrated hydrologic response (i.e., discharge) reasonably well in no way guarantees that distributed hydrologic responses (e.g., pore water pressure) will be correctly simulated. Integrated hydrologic responses are potentially the least useful performance evaluation data relative to simulating hydrologically driven slope failure owing to the importance of the distributed pore water pressures. The results reported here indicate that further studies conducting detailed comparisons between observed and simulated hydrologic responses are needed before physics-based hydrologic-response models, similar to InHM, can be used reliably for operational purposes (e.g., landslide hazard assessment).
 The work reported here was supported by National Science Foundation grant EAR-0409133. The data collection efforts of people from UC Berkeley, including Ray Torres and Suzanne Anderson, as well as the Weyerhaeuser Company facilitated this study. Kevin Schmidt supplied additional data on the soil and saprolite depths. The presentation benefited from the thoughtful comments of Ben Mirus on an earlier manuscript.