Transient storage models are widely used in combination with tracer experiments to characterize stream reaches via calibrated parameter estimates. These parameters quantify the main transport and storage processes. However, it is implicitly assumed that calibrated parameters are uniquely identifiable and hence provide a unique characterization of the stream. We investigate parameter identifiability along with the stream conditions that control identifiability for 10 breakthrough curves (BTC) for 100 m pulse injections along Stringer Creek, Montana, USA. Identifiability is assessed through global, variance-based sensitivity analysis of the one-dimensional transport with inflow and storage model (OTIS). Results indicate that the main channel area parameter A and the dispersion coefficient D were the most sensitive parameters and, therefore, likely to be identifiable across all timescales and reaches. Identifiability of transient storage zone size As fell into two categories along Stringer Creek. As was identifiable for lower elevation regions, corresponding to a constrained valley, higher stream slopes, and in-channel roughness, but not for upper stream regions, corresponding to a wider valley floor, flatter stream slopes, and low roughness. The storage zone exchange parameter α was nonidentifiable across all study reaches. Our results suggest that only some of the processes represented in the model will be relevant and, therefore, identifiable for pulse injection data. As such, calibrated parameter estimates should be accompanied by an assessment of parameter sensitivity or uncertainty. We also show that parameter identifiability varies with stream setting along Stringer Creek, suggesting that physical characteristics directly influence the identification of dominant stream processes.
 Transient storage models (TSM) are conceptually simple, widely used tools that have been applied to analyze solute transport in streams around the world [e.g., Bencala and Walters, 1983; Runkel, 1998; Edwardson et al., 2003; Gooseff et al., 2003a; Martinez and Wise, 2003; Keefe et al., 2004]. These models route mass in a one-dimensional framework via main-channel processes and exchange with a temporally static, spatially uniform transient storage zone. The transient storage zone is a conceptual representation of slower flow paths in the stream, which includes in-channel areas of slow flow, dead zones, and the hyporheic zone [Bencala and Walters, 1983]. The hyporheic zone is a spatially and temporally variable region where channel water interacts with groundwater [Findlay, 1995; Boulton et al., 1998]. Transient storage models can be used to predict mass transport [Runkel, 1998], though recent research has focused on the relationship between model parameters and physical stream characteristics to understand nutrient cycling [Edwardson et al., 2003], ecological functioning [Tate et al., 1995; Mulholland et al., 1997], geomorphological setting [Wondzell, 2006; Gooseff et al., 2007], and hydrological functioning [Harvey et al., 1996; Gooseff et al., 2003a]. Specifically, model parameters that characterize transient storage, including hyporheic exchange, are used to assess stream disturbance and biogeochemical functioning [Edwardson et al., 2003; Lautz et al., 2006; Lautz and Siegel, 2007], and to interpret flow path timescales [Harvey et al., 1996; Gooseff et al., 2003a]. Though multiple formulations of transient storage models exist, the one-dimensional transport with inflow and storage (OTIS) model [Runkel, 1998] is the most commonly applied transient storage model, and the focus of this study.
 Transient storage models represent stream processes through conceptual parameters which must be inferred indirectly from other data sources because they cannot be feasibly measured at reach scales [Wagener and Gupta, 2005]. Within OTIS, model parameters are tied to three processes affecting mass transport: advection, dispersion, and transient storage (Figure 1a). To quantify these processes, reach-representative parameters are estimated by optimizing the model to fit solute concentration data from tracer experiments [StreamSoluteWorkshop, 1990]. Stream tracer injection experiment data and flow measurements are used as inputs and boundary conditions to the transient storage model. The model parameters are calibrated to the observed downstream tracer concentrations, referred to as the solute breakthrough curve (BTC) [Runkel, 1998]. The resulting parameter estimates quantify the magnitude of processes inferred for a given stream [e.g., D'Angelo et al., 1993; Runkel, 2002; Edwardson et al., 2003].
 The crux of transient storage model calibration is that the parameter estimates are assumed to be uniquely identified from the experimental data (i.e., a unique single set of parameters can be determined that characterizes reach response for a given tracer experiment). This assumption is rarely tested. Lack of parameter identifiability is a problem common to all types of environmental models [Beven, 1989; Beven and Binley, 1992]. Past studies have suggested that TSM parameter nonidentifiability is a result of equifinality, as different combinations of parameters can reproduce observed BTCs with the same degree of accuracy [Harvey and Bencala, 1993; Harvey et al., 1996; Harvey and Wagner, 2000; Wagener et al., 2002]. Despite this recognized problem, researchers and practitioners commonly interpret calibrated parameter values without testing their uniqueness. Only unique, identifiable parameters should be used to formulate reliable conclusions relating parameter estimates from transient storage models to the characteristics of a stream [Harvey et al., 1996; Harvey and Wagner, 2000].
 The aim of this study is twofold: (1) to demonstrate a method for quantifying TSM parameter identifiability and interactions and (2) to use these quantities to link physical stream attributes to the dominance of flow processes. We, therefore, analyzed TSM parameter uniqueness for multiple reaches with varying physical settings along a single stream. The uniqueness of parameter estimates was tested via global, variance-based sensitivity analysis for four OTIS model parameters: the dispersion coefficient D, the main channel area A, the storage zone area As, and the exchange rate between the main channel and storage zone α. Global sensitivity analysis, both applied to the entire BTC and at each time step across the BTC, quantifies the individual effects of the parameters and their interactions.
 Experiments were performed in Stringer Creek, located within the Tenderfoot Creek Experimental Forest in Montana, USA. We analyzed pulse injection tracer test data from ten 100 m reaches selected along the valley from the origin of Stringer Creek to its confluence with Tenderfoot Creek, to test how stream characteristics relate to parameter sensitivity. Tracer tests were performed under relatively constant flow conditions and base flow contributions. The 10 reaches selected along Stringer Creek also exhibit variability in vegetation, geology, and hydrology. A change in valley structure at 1200 m separates the upper from the lower portions of Stringer Creek and corresponds to general differences in physical setting (stream slope, valley morphology, riparian vegetation, and the presence of woody debris in the stream channel) [Payn et al., 2009, 2012; Patil et al., 2013; Ward et al., 2013]. Upstream of 1200 m, the valley is less constrained and characterized by wide valley floors and meadows. Downstream of this point, stream slopes and hillslope slopes are greater, the stream channel has more woody debris, and the valley is much more constrained than in the upper region. Because the magnitude of stream storage and transport processes is a function of the physical setting within a given stream reach, we hypothesize that physical setting influences our ability to identify the parameters describing these processes, thereby highlighting which processes are dominant in a given stream. We seek to identify the conditions (e.g., stream characteristics) associated with identifiable parameters for this series of reaches. Understanding parameter identifiability across these reaches will help clarify whether stream characteristics (e.g., stream slope, amount of in-stream vegetation) are associated with identifiable transient storage parameters.
2. The OTIS Model
 The one-dimensional transport with inflow and storage (OTIS) model, the most commonly used transient storage model [Runkel, 1998], estimates solute concentrations via mass balance equations for an advective main channel with an adjacent, static storage zone [Thackston and Schnelle, 1970; Bencala and Walters, 1983]. Within the main channel, solute transport is governed by advection, dispersion, and exchange with the transient storage zone; within the storage zone, solute transport is a function of the rate and quantity of water exchanged with the main channel and the concentration gradient between the transient storage zone and main channel (Figure 1b) [Runkel, 1998]. The storage zone represents the lumped effects of surface and subsurface water exchange, including exchange with the hyporheic zone, channel dead zones, eddies or pools, and groundwater [Harvey and Wagner, 2000; Payn et al., 2009]. Readers should refer to Bencala and Walters for the detailed derivation of model equations and Runkel for details on the solution scheme.
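 The OTIS model operates from the following equations

$$\frac{\partial C}{\partial t} = -\frac{Q}{A}\frac{\partial C}{\partial x} + \frac{1}{A}\frac{\partial}{\partial x}\left(A D \frac{\partial C}{\partial x}\right) + \frac{q_{Lin}}{A}\left(C_L - C\right) + \alpha\left(C_s - C\right) \qquad (1)$$

$$\frac{dC_s}{dt} = \alpha\frac{A}{A_s}\left(C - C_s\right) \qquad (2)$$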
where t is time (s), x is distance (m), C is the solute concentration in the stream (mg L−1), Q is the volumetric flow rate (m3 s−1), A is the cross-sectional area of the stream channel (m2), D is the dispersion coefficient (m2 s−1), qLin is the lateral volumetric inflow rate per unit length of stream (m2 s−1), CL is the solute concentration in the lateral inflow (mg L−1), Cs is the solute concentration in the storage zone (mg L−1), As is the cross-sectional area of the storage zone (m2), and α is the stream-storage exchange coefficient (s−1) [StreamSoluteWorkshop, 1990; Runkel, 1998]. A, As, D, and α are typically calibration parameters (Table 1).
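To make the roles of A, D, As, and α concrete, the following is a minimal sketch of one explicit time step of the main-channel and storage-zone mass balances in the standard OTIS form described by Runkel [1998]. It is an illustration only: it assumes constant Q, A, and D along the reach, no lateral inflow, and a first-order upwind scheme chosen for brevity rather than the implicit solution scheme used by OTIS itself. The function name and all numerical values are hypothetical.

```python
def otis_explicit_step(c, cs, c_upstream, dt, dx, q, area, disp, area_s, alpha):
    """Advance channel (c) and storage (cs) concentrations by one time step.

    Illustrative assumptions: constant Q, A, D; no lateral inflow;
    explicit upwind advection (stable only for small dt).
    """
    u = q / area  # mean advective velocity (m/s)
    n = len(c)
    c_new, cs_new = [0.0] * n, [0.0] * n
    for i in range(n):
        left = c[i - 1] if i > 0 else c_upstream   # upstream boundary value
        right = c[i + 1] if i < n - 1 else c[i]    # zero-gradient outflow
        advection = -u * (c[i] - left) / dx
        dispersion = disp * (right - 2.0 * c[i] + left) / dx ** 2
        exchange = alpha * (cs[i] - c[i])          # channel gains from storage
        c_new[i] = c[i] + dt * (advection + dispersion + exchange)
        # Storage zone responds to the channel, scaled by A / As
        cs_new[i] = cs[i] + dt * alpha * (area / area_s) * (c[i] - cs[i])
    return c_new, cs_new


# Hypothetical reach: 20 cells of 5 m, tracer entering at the upstream end
c, cs = [0.0] * 20, [0.0] * 20
for _ in range(50):
    c, cs = otis_explicit_step(c, cs, c_upstream=1.0, dt=2.0, dx=5.0,
                               q=0.01, area=0.05, disp=0.1,
                               area_s=0.02, alpha=1e-3)
```

In this sketch, A sets the advective velocity, D spreads the pulse, and As and α act only through the exchange terms, which is one way to see why the storage parameters can be hard to separate when exchange is weak.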
Table 1. Model Parameter Names, Abbreviations, and Units^a
^a The values for both the wide and constrained runs are included. Refer to Table 3 for the constrained channel area parameter ranges for each reach.
 Though OTIS only has four calibration parameters, interactions will likely occur among all parameters based on the forms of equations (1) and (2). Parameters are said to interact when their values are interdependent, meaning that their effects on a given model output are not additive and, therefore, not independent [Saltelli et al., 2008]. For OTIS parameters, the level and cause of interactions will differ for each experiment. Parameter interactions for the OTIS model are acknowledged in early publications, where stepwise, manual calibration recommends iterative parameter adjustment because parameter values (e.g., A and D) are interrelated [StreamSoluteWorkshop, 1990]. High correlations between model parameters found by later studies also corroborate this point [Gooseff et al., 2005; Wagener et al., 2002]. In terms of equations (1) and (2), A, present in every term of both equations, is expected to interact with all parameters. As and α, which are inversely proportional in equation (2), are also expected to interact. Past work also shows that D and the storage zone parameters As and α will interact, either for well-mixed conditions when either process can simulate concentration in the tail or for cases when storage zone processes are insensitive [Harvey and Wagner, 2000].
2.1. OTIS Calibration
 OTIS parameters are usually estimated by calibration to an experimentally obtained BTC [Edwardson et al., 2003; Harvey and Wagner, 2000; Ryan et al., 2004; Wondzell, 2006]. The most widely used optimization method for calibration of the OTIS model is OTIS-P, an adaptive, nonlinear least squares algorithm based on the NL2SOL software package [Dennis et al., 1981] and implemented within the Standards Time Series and Regressions Package (STAR-PAC) [Donaldson and Tryon, 1987]. In this package, error is quantified by the residual sum of squares RSS, which is calculated according to
$$RSS = \sum_{k=1}^{n} w_k \left(O_k - S_k\right)^2 \qquad (3)$$
where Ok and Sk are the observed and simulated kth values, n is the total number of observations, and wk is the weight for the kth observation [Wagner and Gorelick, 1986]. Weights are not required, but can be included to remove the emphasis on fitting to larger values at the BTC peak, given that the squared error values are minimized [Wagner and Gorelick, 1986].
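As a small illustration of the weighted residual sum of squares described here, the sketch below computes RSS for observed and simulated series with optional weights. The function name and the example values are hypothetical and are not taken from OTIS-P.

```python
def weighted_rss(observed, simulated, weights=None):
    """Weighted residual sum of squares: sum of w_k * (O_k - S_k)^2."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if weights is None:
        weights = [1.0] * len(observed)  # unweighted case
    if len(weights) != len(observed):
        raise ValueError("weights must match the number of observations")
    return sum(w * (o - s) ** 2 for w, o, s in zip(weights, observed, simulated))


# Hypothetical values: down-weighting the last point halves its contribution
rss_plain = weighted_rss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
rss_weighted = weighted_rss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [1.0, 1.0, 0.5])
```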
 RSS is minimized iteratively within the OTIS-P framework. Parameter estimates are updated based on approximations of the Hessian (second derivative) matrix for the residual sum of squares and partial derivatives of model-predicted concentrations, which are constrained by a parameter trust region that grows and shrinks based on the level of RSS fit [Donaldson and Tryon, 1987]. The algorithm converges to a final parameter set when there is a minimal change in (1) parameter value or (2) objective function value, based on user-defined thresholds for both criteria [Donaldson and Tryon, 1987; Runkel, 1998].
 Like all local search algorithms, OTIS-P is sensitive to the choice of an initial parameter set and can converge to a local instead of a global minimum [Dennis et al., 1981]. OTIS-P may not converge to a final solution for all applications. Singular convergence can occur when the data or complexity of an application do not support estimation of all transient storage parameters [Donaldson and Tryon, 1987; Ryan et al., 2004]. When OTIS-P does not converge, the estimated parameter values are not reliable and should not be reported. Beyond assessing convergence, OTIS-P provides statistical information on parameter certainty calculated within the neighborhood of the parameter estimate. These values, though available as output from an OTIS-P run, are rarely used or reported in the literature. Statistics from OTIS-P are based on the variance-covariance matrix for the solution achieved by the nonlinear least squares optimization algorithm. The variance-covariance matrix represents an approximation of parameter uncertainty within the neighborhood of the solution [Donaldson and Tryon, 1987]. Though correlation coefficients from the variance-covariance matrix may give an indication of uniqueness, they are a local, not global, assessment [Hill and Tiedeman, 2007]. Therefore, these values cannot be used to infer global behavior or uniqueness across the feasible parameter space. Parameter uniqueness for similar models is usually tested by running the least-squares optimization multiple times from different initial parameter values; if the algorithm converges to the same parameter set, the set is said to be unique [Hill and Tiedeman, 2007]. While this method has a better chance of assessing parameter uniqueness than a single optimization run, it still does not guarantee complete exploration of the parameter space, and any statistics will be local values.
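The multi-start test can be sketched with a toy objective. The example below is not OTIS-P or NL2SOL: it uses a simple compass (pattern) search as a stand-in local optimizer and a hypothetical two-parameter objective in which only the product of the parameters is constrained by the data. Two different starting points reach an equally good fit with different parameter values, which is the signature of a nonunique solution.

```python
def compass_search(objective, start, step=0.5, tol=1e-8, max_iter=10000):
    """Minimal derivative-free local search (a stand-in for a local optimizer)."""
    x = list(start)
    fx = objective(x)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                f_trial = objective(trial)
                if f_trial < fx:          # accept any improving move
                    x, fx = trial, f_trial
                    improved = True
        if not improved:
            step *= 0.5                   # refine the search radius
    return x, fx


def toy_objective(p):
    """Hypothetical misfit: only the product p[0] * p[1] is constrained."""
    return (p[0] * p[1] - 2.0) ** 2


fit_1, err_1 = compass_search(toy_objective, [1.0, 1.0])
fit_2, err_2 = compass_search(toy_objective, [4.0, 0.2])
print(fit_1, err_1)
print(fit_2, err_2)
```

Both runs drive the misfit to essentially zero, yet they return different parameter pairs, so agreement in fit alone says nothing about uniqueness.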
2.2. OTIS Parameter Uncertainty and Sensitivity
 Environmental models are simplified mathematical representations of physical systems used to describe observed phenomena [Wagener and Gupta, 2005]. Parameter values in these models are often conceptualized physical processes or system properties that cannot be directly measured, and therefore, must be estimated via calibration. Ideally, calibration produces a single set of parameter values that best reproduce the observed data [Cobelli and DiStefano, 1980; Sorooshian and Gupta, 1983]. However, it is also possible that more than one parameter set will reproduce the observations equally well with respect to a particular performance measure [Cobelli and DiStefano, 1980]. This condition is sometimes referred to as equifinality [Beven, 1989; Beven and Binley, 1992]. When equifinality occurs, it is usually the result of parameters that are insensitive [Johnston and Pilgrim, 1976] and/or interactive [Ibbitt and O'Donnell, 1971], and therefore nonidentifiable. Parameter equifinality and identifiability are especially problematic for complex models with many parameters [Beven and Binley, 1992], though equifinality can also occur for simple models with fewer parameters [Ibbitt and O'Donnell, 1971; Pickup, 1977; Beven, 1989; Wagener et al., 2002]. Parameter identifiability is affected by the model structure, data uncertainty, and the combination thereof [Sorooshian and Gupta, 1983; Beck, 1987; Beven, 2008]. The portion of the parameter space being sampled can also influence the interpretation of identifiability and sensitivity [Saltelli et al., 2006].
 Past studies have shown that OTIS model parameters can be nonidentifiable [Harvey and Bencala, 1993; Harvey et al., 1996; Wagner and Harvey, 1997; Harvey and Wagner, 2000; Wagener et al., 2002]. Sensitivity analyses addressing nonidentifiability have concluded that sensitivity for each of the parameters occurs for different parts of the BTC. In these studies, the main channel area parameter A is usually well identified, while identifiability for D varies [Wagener et al., 2002]. As and α are typically the least identifiable parameters [Harvey et al., 1996; Wagner and Harvey, 1997].
 Beyond testing parameter identifiability, these studies have attempted to answer the question, What conditions control identifiability? As A and D are typically well identified, research to assess controls on identifiability has focused on the storage zone parameters As and α. These controls can be explained in terms of the Damköhler number (DaI), defined as the ratio of average advective transit time to the transient storage zone interaction timescale and given by Bahr and Rubin as
$$DaI = \frac{\alpha\left(1 + A/A_s\right)L}{v} \qquad (4)$$
 Average advective transit time is a function of reach length (L), which depends on the experimental design, and the average reach water velocity (v), both of which can be determined prior to performing a tracer experiment [Harvey and Wagner, 2000; Wondzell, 2006]. Storage zone interaction time is a function of calibrated parameter values for A, As, and α, which must be estimated after performing a tracer experiment. Wagner and Harvey  previously investigated parameter sensitivities via constant rate injections. They found that As and α were best identified for DaI values near 1 and 0.1 respectively, representative of a ratio where the timescale of advection is reasonably approximated by the timescale of transient storage zone reactivity. Nonidentifiability occurs when one process overwhelms the other, governed by the following conditions:
 1. DaI >> 1, which occurs when storage zone exchange rates are much greater than advective velocities. This can occur for slow moving streams with rapid transient storage exchange. For this condition, the tracer mixes quickly in the stream and storage zone, generating a tail signal that can be matched using D or using As and α [Harvey and Wagner, 2000]. The influences of the parameters are indistinguishable from one another, resulting in parameter interaction and nonidentifiability.
 2. DaI << 1, which occurs when advective velocities are much larger than storage zone exchange rates. This occurs for streams with high velocities and lower exchange rates, resulting in long time scales of transient storage exchange [Wagner and Harvey, 1997]. For this condition, the model is more sensitive to the main-channel processes and less sensitive to the transient storage zone processes and their parameters As and α [Harvey et al., 1996].
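For reference, the Damköhler number can be evaluated directly from reach length, velocity, and the three parameters it depends on. The sketch below assumes the commonly used form DaI = α(1 + A/As)L/v; the function name and the parameter values are hypothetical and are not taken from the Stringer Creek data.

```python
def damkohler_number(alpha, area, area_s, length, velocity):
    """DaI = alpha * (1 + A / As) * L / v (dimensionless)."""
    if area_s <= 0 or velocity <= 0:
        raise ValueError("area_s and velocity must be positive")
    return alpha * (1.0 + area / area_s) * length / velocity


# Hypothetical reach: alpha = 1e-3 1/s, A = 0.5 m2, As = 0.1 m2,
# L = 100 m, v = 0.2 m/s
dai = damkohler_number(alpha=1e-3, area=0.5, area_s=0.1,
                       length=100.0, velocity=0.2)
print(dai)
```

Because A, As, and α enter the calculation, any uncertainty in those calibrated values carries through to DaI.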
 Both conditions can also be influenced by the experimental design, either by constraining the reach length according to the DaI recommendations [Wagner and Harvey, 1997; Harvey and Wagner, 2000], or through the sampling frequency and the duration of the experiment, which control the length and timescales of flowpaths that can be detected [Harvey et al., 1996; Harvey and Wagner, 2000; Gooseff et al., 2003a; Scott et al., 2003; Wondzell, 2006]. There is no recommendation for a specific DaI range that constitutes certain versus uncertain parameter sets; these guidelines instead offer an interpretation of how transport processes and experimental design will influence parameter uncertainty. The DaI recommendations from Wagner and Harvey are applicable to high gradient streams for either constant rate or pulse injections, though the authors show that parameter sensitivity (quantified in terms of the coefficient of variation) is greater for a constant rate injection with sampling on the rise, plateau, and fall of the breakthrough curve as compared to a breakthrough curve generated from a pulse injection.
 Other researchers have applied the DaI recommendations from Wagner and Harvey  to assess parameter reliability and sensitivity in their own studies [Fellows et al., 2001; Martinez and Wise, 2003]. However, this approach again assumes that calibrated parameter values are unique. If the DaI value indicates high uncertainty in parameter estimates, the estimated parameter values cannot be used, despite the fact that a tracer experiment has already been performed. Confidence in this value is especially limited when it is used with uncertain parameters, as it implies that the DaI value is also uncertain because it is based on uncertain parameter estimates. For these reasons, DaI is not used directly in this study. However, we do believe that the conditions described above, where DaI is much greater or much less than 1, should represent cases where certain transient storage parameters will not be identifiable.
3. Study Area and Experimental Data
 This study uses BTC data from stream tracer tests along Stringer Creek, which drains a subcatchment of the Tenderfoot Creek Experimental Forest (TCEF), Montana, USA (Figure 2). TCEF has a predominantly continental climate, with an average annual precipitation of 840 mm. Precipitation occurs mainly as snow, and the majority (70%) falls between November and May [Farnes et al., 1995; Jencso et al., 2009]. Streamflow is snowmelt dominated, reaching a peak between late April and early June during the melt period. Flow generally declines through the summer and early fall [Jencso et al., 2009; Nippgen et al., 2011].
 Stringer Creek is a generally gaining, montane headwater stream. We refer to locations along Stringer Creek by valley distance from the gauge near the confluence with Tenderfoot Creek, where 0 m is at the gauge and 2700 m is near the origin of flow during base flow conditions. Physical characteristics of the stream vary along the valley, predominantly due to the gain in stream flow and to a change in underlying bedrock at approximately 1600 m [Payn et al., 2009, 2012]. With reference to this location, granite-gneiss bedrock underlies the valley downstream and sandstone bedrock underlies the valley upstream. The transition in bedrock corresponds to a substantial change in valley structure near 1200 m. Upstream of this location, down-valley slope averages 6%. Relative to downstream regions, the upstream valley is less constrained (wider valley floor) and hillslopes have lower relief. Meadows dominate the riparian landscape in the upstream valley, and there is less large woody debris in the stream channel compared to downstream regions. In contrast, the down-valley slope downstream of 1200 m averages 9%. Downstream valleys are more constrained (narrower valley floor) with higher relief hillslopes, and there are more riparian trees and more large woody debris in the stream channel.
 Our analysis was based on tracer test data collected in August 2005 using instantaneous tracer releases. Payn et al.  conducted individual tracer tests every 100 m along the valley. Salt (NaCl) was used as a conservative tracer, and tracer concentrations were measured by in situ electrical conductivity measurements. Mixing lengths across the reaches varied between a valley distance of 5–30 m and experiments were planned and conducted to limit the possibility of incomplete mixing by considering stream channel structure as well as repeating experiments in which incomplete mixing was observed in the breakthrough curve [Payn et al., 2009]. Full details of the experimental design are available in Payn et al.  and Payn et al. , and other analysis with the 100 m data sets are reported in Ward et al. .
 We selected 10 of the 26 BTCs for 100 m reaches collected along the 2700 m length of valley (Figure 2). Reaches were selected to span the entire length of the valley, to include a characteristic set of BTC shapes and durations, and to include a representative set of varying physical features along the valley (Figure 3). Each study reach is referenced by the valley distance at the downstream end of the reach, and reaches include: 2500, 2400, 2100, 1300, 700, 600, 400, 300, 200, and 100 m. These locations reflect the longitudinal spatial variability in stream flow, velocity, channel area, elevation, slope, gross hydrologic loss/gain, and net loss/gain (Figure 2b). Physical stream setting encompasses the variability in physical characteristics displayed in Figure 2b. These characteristics were quantified for each reach from the tracer tests or from a light detection and ranging (LIDAR) derived 1 m digital elevation model (DEM) of TCEF (Table 2). We compare model analyses from BTCs among these sites to explore how parameter identifiability changes with the variability in these characteristics. It is generally accepted that physical stream setting, including the influence of differences in geology [Morrice et al., 1997], geomorphology [Harvey and Bencala, 1993; Gooseff et al., 2007], streamflow [Harvey et al., 1996], and obstructions to sediment transport [Gooseff et al., 2003b; Ensign and Doyle, 2005], has some influence on transient storage parameter estimates. We seek to move beyond the scope of these past studies, to understand the relationship between physical setting and parameter identifiability. The Stringer Creek system provides a range of settings to test this relationship.
Table 2. Names, Abbreviations, Units, and Calculation Methods for Site Characteristics^a
Averaged streamflow at beginning and end of the reach
The difference between the flow at the beginning and end of each reach
Reach length divided by the time between the pulse peak at the injection point and the pulse peak at the downstream measurement location
Average flow divided by average velocity
Average value from the DEM across the reach
The difference in DEM elevation from the point of injection to the point of measurement, divided by the reach length
The average of the minimum and maximum loss, calculated from [Payn et al., 2009]
The average of the minimum and maximum gain, calculated from [Payn et al., 2009]
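The calculation methods listed in Table 2 can be sketched in a few lines. The example below is illustrative only: the function and key names are hypothetical, the sign convention for net change (downstream minus upstream flow) is an assumption, and the input values are invented rather than measured.

```python
def reach_characteristics(q_up, q_down, length, t_peak_up, t_peak_down,
                          z_injection, z_measurement):
    """Derive reach-scale characteristics following the Table 2 descriptions."""
    q_avg = 0.5 * (q_up + q_down)                    # averaged streamflow (m3/s)
    q_net = q_down - q_up                            # net change (assumed sign)
    velocity = length / (t_peak_down - t_peak_up)    # peak-to-peak velocity (m/s)
    channel_area = q_avg / velocity                  # average flow / velocity (m2)
    slope = (z_injection - z_measurement) / length   # elevation drop per length
    return {
        "q_avg": q_avg,
        "q_net": q_net,
        "velocity": velocity,
        "channel_area": channel_area,
        "slope": slope,
    }


# Hypothetical 100 m reach
site = reach_characteristics(q_up=0.010, q_down=0.012, length=100.0,
                             t_peak_up=0.0, t_peak_down=500.0,
                             z_injection=2100.0, z_measurement=2093.0)
```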
4.1. Sobol' Sensitivity Analysis
 Sensitivity analysis assesses how variability in model output is ascribed to variability in model factors [Saltelli et al., 2004]. Model factors include, but are not limited to, model inputs, initial states, parameters, and other user-defined properties that will influence the model output. We quantify sensitivity using Sobol' sensitivity analysis, a global, variance-based technique [Sobol', 2001, 1993]. Past studies indicate that Sobol's method effectively characterizes sensitivities for models of low to intermediate levels of complexity [Fieberg and Jenkins, 2005; Tang et al., 2007; Saltelli et al., 2008]. Sobol's method [Sobol', 2001, 1993] ascribes the total variance in model output to variance in model factors. Model output is often measured by an objective function which computes the difference between observed and simulated output, though other model output can be used. The total variance in the model output D(y) is divided into contributions from individual factors and factor interactions, represented as
$$D(y) = \sum_{i} D_i + \sum_{i<j} D_{ij} + \cdots + D_{12 \ldots m} \qquad (5)$$
where y is the distribution of model output, Di is the measure of the sensitivity to model output y due to the ith component of a given input factor pi, Dij is the portion of output variance due to the interaction of parameters pi and pj, and m represents the total number of factors being investigated. From equation (5), first- and total-order Sobol' sensitivity indices are calculated for each model factor in terms of the amount of total output variance reduced by that factor plus its interactions with other factors. This can be written as
$$S_i = \frac{D_i}{D(y)}, \qquad S_{Ti} = 1 - \frac{D_{\sim i}}{D(y)}$$
where D∼i represents the average variance from all factors but pi. STi represents the variance in the output due to the factor pi and its interactions. It represents the sum of an interactions index and the first-order index Si, which measures the effect of factor pi alone. The ratio of first-order to total-order index is important to factor identifiability, while the magnitude of the total-order index determines sensitivity. Factors with high first- and total-order indices are likely to be identifiable. Factors with a low first-order index but high total-order index are dominated by interactions and are likely nonidentifiable. The magnitude of the total-order index is high for a sensitive parameter and low for an insensitive parameter. Insensitive parameters, defined as having a low total-order index, are nonidentifiable.
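A minimal Monte Carlo sketch of first- and total-order indices is given below for a toy model. It is an illustration under stated assumptions, not the procedure used in this study: it draws plain pseudo-random samples instead of a Sobol' quasi-random sequence, uses widely cited estimators (the Saltelli et al. (2010) first-order form and the Jansen (1999) total-order form), and requires n(m + 2) model runs. The function names and the toy model are hypothetical.

```python
import random


def sobol_indices(model, bounds, n, seed=0):
    """Estimate first-order (S_i) and total-order (S_Ti) indices by Monte Carlo."""
    rng = random.Random(seed)
    m = len(bounds)

    def draw():
        return [[lo + (hi - lo) * rng.random() for lo, hi in bounds]
                for _ in range(n)]

    base, alt = draw(), draw()              # two independent sample matrices
    y_base = [model(x) for x in base]
    y_alt = [model(x) for x in alt]
    pooled = y_base + y_alt
    mean = sum(pooled) / len(pooled)
    variance = sum((y - mean) ** 2 for y in pooled) / len(pooled)

    first, total = [], []
    for i in range(m):
        # Base sample with only factor i taken from the alternate sample
        y_mix = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(base, alt)]
        v_i = sum(yb * (ym - ya)
                  for ya, yb, ym in zip(y_base, y_alt, y_mix)) / n
        v_ti = sum((ya - ym) ** 2 for ya, ym in zip(y_base, y_mix)) / (2 * n)
        first.append(v_i / variance)
        total.append(v_ti / variance)
    return first, total


# Toy additive model: the third factor has no influence on the output
first, total = sobol_indices(lambda p: p[0] + 2.0 * p[1],
                             [(0.0, 1.0)] * 3, n=20000)
```

For this additive toy model the first- and total-order indices coincide for each factor (no interactions), and the unused third factor has a total-order index of zero, which is the insensitive, nonidentifiable case described above.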
 Factor sets are sampled from user-defined ranges according to the Sobol' quasi-random sampling sequence [Sobol', 1967; Bratley and Fox, 1988; Sobol', 1994]. Sobol' indices are calculated for a total of N model runs and corresponding model outputs for each run. Saltelli  recommends
where m is the number of factors being investigated and s is the quasi-random sample size. Sobol' suggests values of s that satisfy s = 2^b, where b is an integer, to achieve stability in Sobol' indices.
4.2. Objective Functions
 The Sobol' method quantifies indices based on the variance in model output. The root-mean-square error (RMSE) is calculated to assess sensitivity to the entire simulated and observed time series. RMSE is calculated as
$$RMSE = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(O_k - S_k\right)^2}$$
where Ok and Sk are the observed and model simulated values at each time step k and n is the total number of time steps. RMSE is a slightly different but equivalent form of RSS (equation (3)), used commonly in OTIS model calibration [Runkel, 1998]. As RMSE is not a normalized error metric, we also report model performance in terms of the coefficient of determination r2. The r2 value, which assesses the fraction of variability in the observed BTC that is described by variability in the simulated BTC, is directly comparable across reaches. This is calculated as
$$r^2 = 1 - \frac{\sum_{k=1}^{n}\left(O_k - S_k\right)^2}{\sum_{k=1}^{n}\left(O_k - \bar{O}\right)^2}$$
where \bar{O} is the mean of the observed time series [Devore, 2000].
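Both performance measures are straightforward to compute. The sketch below assumes that r2 takes the form 1 minus the ratio of the residual sum of squares to the total sum of squares about the observed mean; the function names and example values are hypothetical.

```python
import math


def rmse(observed, simulated):
    """Root-mean-square error between observed and simulated series."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def r_squared(observed, simulated):
    """Coefficient of determination, assumed here as 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


obs = [1.0, 2.0, 3.0]
perfect = (rmse(obs, obs), r_squared(obs, obs))
mean_only = r_squared(obs, [2.0, 2.0, 2.0])  # simulating the mean explains nothing
```

RMSE carries the units of concentration and so depends on the magnitude of each BTC, whereas r2 is dimensionless, which is why it is the measure compared across reaches.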
4.3. Fuzzy Performance
 Sensitivity analysis sampling strategies can include combinations of unreasonable model factors which generate poorly performing simulations that would normally be removed from an uncertainty analysis or calibration. When these solutions are included, the model factors which produce the largest errors have the highest sensitivity indices. This obscures the influence of other model factors, leading to a false assessment of model controls. Fuzzy metrics are alternative objective functions which allow users to determine which simulations should be included or excluded for an analysis [Pappenberger and Beven, 2004]. Applying a fuzzy performance metric evaluates the sensitivity of parameters to simulations which approximate the observations within an envelope of error, minimizing the effect of simulations outside of this envelope.
 The concept of fuzzy sets was first introduced by Zadeh  as a method for expanding binary to gradational membership. Values for this objective function M vary between 0 (nonmember) and 1 (member). Previous applications of this method [Aronica et al., 1998; Pappenberger and Beven, 2004; Cloke et al., 2008] determine values for M by comparing simulation values to two error widths E1 and E2 from the observed values. As seen in Figure 4, if the simulation value
 1. is within a width of E1 from the observations, M = 1,
 2. is outside a width of E2 from the observations, M = 0,
 3. is between a width of E1 and E2, M is linearly interpolated between 0 and 1 based on the distance of the simulation value from E1, normalized to the difference between E1 and E2.
 Typically, values and widths are selected based on measurement error/uncertainty or user-specific knowledge [Pappenberger and Beven, 2004; Cloke et al., 2008]. In this study, fuzzy metric performance was calculated at each time step for each model simulation (Figure 4). Because the observations span multiple orders of magnitude, we constructed the membership envelope in log10 space. We assigned values of E1 = 0.5 and E2 = 1, so that any simulated value more than an order of magnitude above or below the observed value is rejected. These thresholds are wide so that only very poor solutions are excluded. Before calculating the fuzzy metric, we additionally removed all unstable model simulations, identified as oscillating simulations with more than one peak. Any peaks beyond the maximum concentration were classified as oscillating solutions if the difference between the peak value and the value at the previous and next time step was greater than 0.0048 mg/L, the smallest value of detection for concentration across the 10 reaches. Fuzzy metric values for an unstable simulation were set to M = 0 across the entire BTC.
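 The membership rules and the stability screen described above can be sketched as follows. This is an illustrative reading, not the implementation used in this study: the function names are hypothetical, nonpositive concentrations are assumed to receive M = 0 because they cannot be log-transformed, and a secondary peak is assumed to be any local maximum other than the global maximum that exceeds both neighbors by more than the detection limit.

```python
import math

E1 = 0.5  # inner envelope width in log10 units (value from the text)
E2 = 1.0  # outer envelope width in log10 units (value from the text)
DETECTION_LIMIT = 0.0048  # mg/L (value from the text)


def fuzzy_membership(observed, simulated, e1=E1, e2=E2):
    """Membership M in [0, 1] for a single time step, in log10 space."""
    if observed <= 0 or simulated <= 0:
        return 0.0  # assumption: values that cannot be log-transformed are nonmembers
    dist = abs(math.log10(simulated) - math.log10(observed))
    if dist <= e1:
        return 1.0
    if dist >= e2:
        return 0.0
    # Linear interpolation between E1 (M = 1) and E2 (M = 0).
    return (e2 - dist) / (e2 - e1)


def is_unstable(simulated, tol=DETECTION_LIMIT):
    """Flag a simulation that has a secondary peak larger than the tolerance."""
    peak = simulated.index(max(simulated))
    for k in range(1, len(simulated) - 1):
        if k == peak:
            continue
        rise = simulated[k] - simulated[k - 1]
        fall = simulated[k] - simulated[k + 1]
        if rise > tol and fall > tol:
            return True
    return False


def fuzzy_series(observed, simulated):
    """Per-time-step membership; all zeros for an unstable simulation."""
    if is_unstable(simulated):
        return [0.0] * len(observed)
    return [fuzzy_membership(o, s) for o, s in zip(observed, simulated)]
```

 Working in log10 space makes the envelope symmetric in relative terms, so a simulated value of one tenth or ten times the observation sits at the same distance (E2 = 1) from the observed value.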
4.4. Model Setup for Current Study
 The OTIS model was implemented for a conservative solute, steady flow, and a constant dispersion coefficient for each river reach. Solute mass input was a measured time series, which was recorded during the tracer injection. The model was run at a 2 second time step and Sobol' analysis was performed at 6 s intervals. Qin was set to the flow at the location of the solute injection, which was calculated via dilution gaging following Payn et al. . Lateral inflow (qLin) and lateral outflow (qLout) were calculated from water balance data from Payn et al.  based on the net change in flow over the reach and tracer mass recovery. Lateral inflow can be treated as a calibration parameter [Wagner and Harvey, 1997; Gooseff et al., 2003a]. In this application, these values were constrained with independent estimates based on the evidence of both gross gains and losses in the study reaches. CL was set to zero, as NaCl concentrations were determined using electrical conductivity measurements which were corrected for background values [Payn et al., 2009]. A description of the tracer experiments can be found in Payn et al. .
 In this study, parameter identifiability is determined based on parameter sensitivities. A parameter is likely to be nonidentifiable in two cases:
 1. It has a low total-order index, in which case the parameter is insensitive, or,
 2. It has a high total-order index but a low first-order index, indicating that the parameter influences model output through interactions with other parameters.
 A parameter is likely to be identifiable if it has a high first-order and total-order index, indicating that it influences model output through changes in its value alone. Tang et al.  used a threshold index of 0.1 and 0.01 for a complex, distributed watershed model to distinguish highly sensitive and sensitive parameters. We distinguish between high and low sensitivities as those indices greater or less than 0.25, given that our model is parsimonious with only four parameters. It is also important to note that a model parameter can be best identified for a value of zero, indicating that it is the absence of a process that is important [Wagener et al., 2003]. For an application where this does occur, the parameter would be identifiable, corresponding to a high first-order sensitivity index.
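 The decision rule above can be written compactly. The sketch below is illustrative only; the function name, the labels, and the example index values are hypothetical, and an index exactly equal to the 0.25 threshold is assumed to count as high, a case the text does not specify.

```python
THRESHOLD = 0.25  # boundary between low and high sensitivity (value from the text)


def classify(first_order, total_order, threshold=THRESHOLD):
    """Map a pair of Sobol' indices to a likely identifiability class."""
    if total_order < threshold:
        # Case 1: the parameter barely influences the output at all.
        return "likely nonidentifiable (insensitive)"
    if first_order < threshold:
        # Case 2: the influence arises mainly through interactions.
        return "likely nonidentifiable (interactions)"
    return "likely identifiable"


# Hypothetical index pairs (first-order, total-order), for illustration only.
examples = {"A": (0.60, 0.85), "As": (0.05, 0.40), "alpha": (0.01, 0.08)}
for name, (s1, st) in examples.items():
    print(name, classify(s1, st))
```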
 Sensitivity indices were computed for A, As, D, and α (Table 1). Results are reported for three analyses, each with varying parameter ranges and objective functions. For the first analysis, wide ranges were used for all parameters. Wide ranges were initially used because sensitivity analyses are dependent on the associated parameter ranges. Sensitivity indices were calculated based on RMSE for the entire BTC. Ranges, specified in Table 1, were defined from an understanding of the system and from values used in previously performed sensitivity analysis studies [Wagner and Harvey, 1997].
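 For readers unfamiliar with how first- and total-order indices are obtained from sampled parameter ranges, the following sketch shows one common Monte Carlo approach. It is not the code used in this study: it assumes uniform random sampling within the bounds, a first-order estimator of the Saltelli type, and a total-order estimator of the Jansen type, and it uses a hypothetical two-parameter toy function in place of OTIS. In practice the model output would be an objective function value such as RMSE.

```python
import random


def sobol_indices(model, bounds, n=20000, seed=0):
    """Monte Carlo estimates of first- and total-order Sobol' indices.

    model  : function mapping a list of parameter values to a scalar output
    bounds : list of (low, high) tuples, one per parameter, sampled uniformly
    """
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Two independent sample matrices, A and B.
    A = [draw() for _ in range(n)]
    B = [draw() for _ in range(n)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]

    # Center the outputs: the indices are unaffected by a constant shift,
    # and centering reduces the noise of the first-order estimator.
    mean = (sum(fA) + sum(fB)) / (2 * n)
    fA = [y - mean for y in fA]
    fB = [y - mean for y in fB]
    var = (sum(y * y for y in fA) + sum(y * y for y in fB)) / (2 * n)

    first, total = [], []
    for i in range(len(bounds)):
        # Evaluate the model on A with column i replaced by column i of B.
        fAB = [model(a[:i] + [b[i]] + a[i + 1:]) - mean for a, b in zip(A, B)]
        v_first = sum(yb * (yab - ya) for ya, yb, yab in zip(fA, fB, fAB)) / n
        v_total = sum((ya - yab) ** 2 for ya, yab in zip(fA, fAB)) / (2 * n)
        first.append(v_first / var)
        total.append(v_total / var)
    return first, total


def toy_model(x):
    """Hypothetical additive test function with no interactions."""
    return x[0] + 2.0 * x[1]


# Analytically, the indices for this toy function are 0.2 and 0.8.
first, total = sobol_indices(toy_model, [(0.0, 1.0), (0.0, 1.0)])
print(first, total)
```

 Because the estimates depend on the sampled ranges, widening or narrowing the bounds for one parameter changes the share of output variance attributed to every parameter.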
 For the second analysis, parameter bounds were kept wide for As, D, and α (Table 1), but the ranges for A were narrowed. The results from the first analysis were used to constrain A, based on the best 1000 runs (quantified by RMSE and r2) (Table 3). In the first analysis, sensitivity indices for A were much higher than for any other parameter, so A was constrained to test how the sensitivities for the other three parameters would change. The top 1000 runs for RMSE and r2 produced similar ranges for A for all reaches except at location 2500 m. Again, indices were calculated from RMSE of the entire BTC. For the final analysis, fuzzy metric values were calculated from the simulations produced by the second analysis. Sobol' indices were computed for the fuzzy metric values at each time step in the BTC.
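 The step of narrowing the range for A from the best-performing runs can be illustrated as below. This is a simplified sketch under stated assumptions: the function name is hypothetical, runs are ranked by a single error measure (RMSE) rather than by both RMSE and r2, and the constrained range is taken as the minimum and maximum of the parameter among the retained runs.

```python
def constrain_range(samples, errors, top_n=1000):
    """Return (min, max) of a parameter among the top_n lowest-error runs.

    samples : sampled values of one parameter, one per model run
    errors  : error measure for each run (lower is better), same order
    """
    if len(samples) != len(errors) or not samples:
        raise ValueError("samples and errors must be non-empty and aligned")
    ranked = sorted(zip(errors, samples))[:top_n]
    values = [value for _, value in ranked]
    return min(values), max(values)


# Hypothetical channel areas (m2) and RMSE values, for illustration only.
areas = [0.1, 0.2, 0.3, 0.4, 0.5]
rmse_values = [5.0, 1.0, 0.5, 2.0, 9.0]
print(constrain_range(areas, rmse_values, top_n=3))
```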
Table 3. Model Fits for the Top 1000 Runs for the First, Second, and Third Analyses^a
[Table values not recoverable from the source. Column headers, in order: Analysis 2 and 3; r2, Top 1000; RMSE, Top 1000; Channel Area, A (m2); r2, Top 1000; RMSE, Top 1000.]
^a The constrained ranges for the main-channel area parameter A were taken from the top 1000 runs by RMSE and r2 for the first analysis. These values were computed for each reach.
5.1. Sensitivity Indices for the Complete BTC
 Sensitivity indices for the entire BTC were computed for all 10 reaches (Figure 5). Main channel area A was the most sensitive parameter in the first two analyses across all 10 reaches. For the first analysis, where wide bounds were used for all parameters, the total-order indices for A were between 0.83 and 1.12 (Figure 5a). D was slightly sensitive, and As and α had total-order sensitivities below 0.1 for all reaches. The differences between first- and total-order indices also indicate that interactions accounted for between 34 and 100% of the total-order indices for D. A had the highest first-order indices across all parameters. First-order values for A represented 63–89% of the total-order index. For the second analysis, A and D were most sensitive, with total-order indices between 0.38 and 1.17 for A and between 0.25 and 0.70 for D (Figure 5b). Total-order indices for As and α were much smaller than values for A and D, but larger than the total-order values for As and α in the first analysis. As was slightly sensitive and had total-order index values between 0.08 and 0.19 across all reaches. Interactions accounted for nearly all of the total-order indices for As (72–89%) and α (87–100%), and for variable portions of the total-order indices for D (29–82%) and A (19–64%).
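 The interaction percentages quoted above follow from the gap between the total-order and first-order indices. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the example applies it to the reported case in which first-order values for A represent 63–89% of the total-order index.

```python
def interaction_share(first_order, total_order):
    """Fraction of a total-order index that is attributable to interactions."""
    if total_order <= 0:
        raise ValueError("total-order index must be positive")
    return (total_order - first_order) / total_order


# If the first-order index is 63% or 89% of the total-order index,
# interactions account for the remaining 37% or 11%, respectively.
for first_fraction in (0.63, 0.89):
    share = interaction_share(first_fraction, 1.0)
    print(f"{share:.0%}")
```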
 To illustrate that the parameter ranges generated parameter sets that approximated the observations, we report the level of model fit for the top 1000 runs for both the wide and constrained analyses in Table 3. The top 1000 runs achieved high levels of fit for all reaches for both analyses. The best r2 values for both analyses and across all reaches were 0.99. The lowest r2 values for the top 1000 runs ranged from 0.578 to 0.853 for the first analysis and from 0.693 to 0.981 for the second analysis. These levels of fit demonstrate that the model runs provide reasonable predictions of the observed data.
5.2. Fuzzy Performance Sensitivity Indices
 Though indices for RMSE were computed for the first and second analysis in a moving window framework, the resulting indices were similar through time and across all parameters and locations. Thus, the fuzzy metric was implemented to filter poor simulations and to avoid misrepresentation of parameter sensitivities. In general, first-order fuzzy metric sensitivity indices were small, indicating that parameter interactions control output variance (Figure 6). Total-order indices across all reaches were highest for A and D. Though the patterns of periods when parameters were most sensitive differ between reaches, D and A were more sensitive than As and α in the rising limb, the peak of the BTC, and occasionally through the falling limb. In contrast, As and α were more sensitive on the falling limb and in the tail of the BTC. Applying the fuzzy metric also removed all unstable solutions, which occurred for only three reaches (100, 200, and 2500 m) and represented less than 0.3% of simulations.
 We classify high sensitivities as those greater than 0.25, and low sensitivities as those less than 0.25, corresponding to sensitive and insensitive parameters. Across all reaches, first- and total-order sensitivities indicated high sensitivity to A and D. As had high total-order sensitivities for all reaches except 2500 m, and α had high total-order sensitivities for all reaches except 2400 and 2500 m. In contrast, As had high first-order sensitivities only for reaches near the bottom of the stream (100, 200, 300, 400, 600, and 700 m) and the reach at 2100 m. First-order sensitivities for α were low across all reaches, indicating that α influences model output through interactions. Given these patterns, first-order indices were grouped into two different types of responses. The reaches near the top of the stream (1300, 2100, 2400, and 2500 m) had higher first-order indices for D and A, whereas As, when sensitive, influenced model output through interactions. The exception is at location 2100 m, where As was sensitive through the tail. For the reaches toward the bottom of the stream (100, 200, 300, 400, 600, and 700 m), first-order sensitivities were highest for As and A in the BTC tail, and for D on the rising limb. For these reaches, interactions were largely responsible for all parameter sensitivities.
6.1. The Importance of Parameter Ranges and Performance Objectives in Assessing Parameter Sensitivities
 Though the magnitudes of indices for the entire BTC varied slightly across the four parameters, general controls across all reaches remained the same regardless of whether A was constrained. The main channel area parameter controlled by far the largest portion of model output variance for both analyses, indicating that it is the main control on output variance in advection-dominated systems (Figure 5). A also had a high first-order index, indicating that the parameter has a strong independent influence on model output, though it still interacted to some extent with other parameters. The sensitivity indices of other parameters increased when A was constrained (Figure 5b). D and A still controlled the majority of model output variance across all reaches. Sensitivity of the output to As was limited to interaction with other parameters.
 These results suggest that the model output for the entire BTC provided the most information about the cross-sectional area of the advective channel. A is likely identifiable and will be accurately estimated using optimization tools such as OTIS-P. These results are a function of the performance metric RMSE, which amplifies large errors and is, therefore, primarily influenced by errors in BTC peak timing and magnitude. Calibration with RMSE, or by extension RSS, will therefore be sensitive to A, and will fit the BTC peak. High sensitivity to A across all reaches is also a function of stream setting, given the moderate to steep stream slope gradient (average stream slopes from a 1 m DEM ranging from 2.3 to 6.3%). The first- and total-order sensitivity indices for As and α were low. As such, As and α are likely to be nonidentifiable. Calibration to the entire BTC will provide little unique information about transient storage for pulse injections in advective systems.
 Sensitivity indices calculated for the fuzzy metric indicated that A and D controlled the majority of output variance through the entire BTC for all reaches. For reaches at the bottom of the stream, A exhibited lower first-order sensitivities than As or D, though A is still likely to be identifiable, with first-order index values greater than 0.25 across all reaches for variable portions of the BTC. Constraining parameter bounds for A resulted in parameter sets that approximate the BTC well, such that changes in A may not lead to much variability in the fuzzy metric. Instead, A influenced model output across a constrained range through interactions with other parameters. As and α had larger total-order sensitivity indices than in the previous application to the entire BTC and variable first-order indices across the reaches.
 All parameters across all reaches largely influence model output through interactions. This supports conclusions of past research that OTIS parameters are highly interactive [Wagener et al., 2002; Gooseff et al., 2005]. To our knowledge, this level of interaction has not been previously quantified. This high level of interaction indicates that OTIS parameters for the reaches in this study are most likely nonidentifiable, except when first-order indices are high (greater than 0.25). Nonidentifiability does not imply a value of zero for a parameter estimate; it indicates that the value is interacting with other parameters, such that its value depends on the value of another parameter, or that the value is unimportant with respect to a given objective function.
6.2. Parameter Controls Across Reaches
 Reach parameter sensitivities were generalized into two types of responses, which were grouped based on similarities in sensitivities to the fuzzy metric through time and changes in physical setting in Stringer Creek. The discontinuous change in valley structure near 1200 m discussed previously corresponds to similarities in fuzzy metric sensitivities, which divided the reaches into two stream types, Type I (below 1200 m) and Type II (above 1200 m). Among many differences in valley, riparian, and channel structure, this change corresponds to differences in the down-valley slope, the amount of in-stream woody debris in the active channel, and the topography of hillslope terrain [Payn et al., 2012]. To explore how this discontinuity related to parameter sensitivities, fuzzy metric sensitivity indices were averaged across reaches and different periods of the BTC for each stream type and visualized in Circos plots (Figures 7 and 8; [Krzywinski et al., 2009]).
 Type I streams included locations 100, 200, 300, 400, 600, and 700 m. In these reaches, A and D had high (>0.25) total-order sensitivity indices on the rising limb, peak, and falling limb (Figures 8 and 9a). Additionally, interactions between A and D dominated all parts of the BTC but the tail. Dispersion played a greater role than advection, with larger total-order sensitivity indices for D versus A in all but the falling limb and tail. As had high total-order sensitivity indices through the falling limb and tail, and model output was most sensitive to As on the falling limb. In the falling limb, As and α interacted primarily with D but also with A. Across these reaches, D, A, and As are likely identifiable parameters given their high first-order indices.
 Type II streams include BTCs from 1300, 2100, 2400, and 2500 m (Figures 8 and 9b). In contrast to Type I streams, model output was sensitive to A and D through all parts of the BTC. Advection dominated dispersion, with larger sensitivity indices for A compared to D. A and D may be identifiable, given their high first-order indices. Transient storage parameters As and α had low first-order indices for Type II streams. As such, they are likely nonidentifiable for this stream type. The only exception is the reach at 2100 m, which had a high first-order index for As through the tail of the BTC.
6.3. Controls on Parameter Sensitivities
 Differences in model sensitivities for the two generalized stream types along Stringer Creek are likely related to the physical characteristics of the stream systems (Figure 10). The relationships between sensitivity and physical setting also point to controls on parameter identifiability. We expect sensitive parameters to be tied to dominant reach processes and use the physical characteristics of Stringer Creek to interpret this.
 The highly advective nature of the Stringer system and results from Figure 5 indicate that A should be identifiable for all reach types. The differences in sensitivity between A and D and the influence of As and α appear to be variable and organized by the physical context of the stream channel. For Type I streams, model output is more sensitive to D than A in the range of the observed tracer data. These reaches are subject to greater dispersion in more tortuous flow through step-pool sequences created by fallen trees and boulders. In contrast, model output for Type II streams is more sensitive to A than D. Flow in Type II streams is much less tortuous than in Type I streams, likely indicating that dispersion is less important to solute transport, resulting in lower sensitivity of output to D.
 Sensitivity of the storage zone parameters appears to be related to both stream size and stream type. As and α are not identifiable in Type II reaches. Transient storage in these reaches is most likely present but occurs only to a small extent, and parameters related to this process are, therefore, insensitive. This is supported by the stream setting in these locations. The riparian zone in upper portions of the watershed is almost entirely meadow across a wide valley floor, resulting in far fewer impediments to water flow or sediment transport in the active channel of Type II reaches. As may be identifiable for Type I streams. In contrast to Type II streams, boulders and large woody debris are common in the active channel of the lower portions of the watershed (Figure 10). The presence of these features has the potential to result in transient storage. Identifiability of As appears to be a precursor for the identifiability of α for instantaneous tracer release data, due to the strong interactions between these two parameters through the falling limb and into the tail.
 Overall, these results indicate that transient storage models can be used to characterize a subset of processes in a given reach for pulse tracer data. In this context, it is unlikely that all four parameters will be identifiable from instantaneous releases. Instead, dominant processes related to physical setting will correspond to the most identifiable parameters, and only these values can be used to formulate reach-level conclusions. However, we do demonstrate that pulse data can inform estimates of As, although this information is contained within the tail of the BTC and cannot be recovered from a calibration to the entire curve. This suggests that there is still valuable information on transient storage processes contained within a pulse release BTC for an advection-dominated system.
6.4. Comparison to Previous Studies
 Several studies have suggested both similar and different conclusions regarding the identifiability of transient storage parameters from instantaneous release tracer data (Figure 9). Wagner and Harvey  (Figure 9c), Wagener et al.  (Figure 9d), and Scott et al.  (Figure 9e) have all examined time-varying parameter sensitivities or surrogate measures of parameter information content for tracer experiments in various stream settings. All three stream studies found sensitivity to As and α along either the falling limb and/or tail of the BTC. This corroborates results for Type I streams from this analysis. Additionally, sensitivity to D and A along the rising limb and falling limb for all three studies was similar to the responses found for Type I and Type II streams. The most apparent difference between these past studies and current results occurs at the peak of the BTC. All three studies found sensitivity at this peak to either As or α, which was not consistent with our analysis. Interestingly, none of the studies found sensitivity to A at the BTC peak, which occurs for both stream types from this study.
 Differences in peak controls may be caused by a number of different factors. While the stream system studied by Wagner and Harvey  was similar to the Stringer system, their study used different methods for characterizing sensitivity, in terms of the number of parameter sets, the algorithm used to generate parameter sets, the type of data investigated, and the method used to calculate sensitivity indices. The United Kingdom stream studied by Wagener et al.  differed markedly from the Stringer Creek system, with flat terrain and stream slopes. Also important to note is that the generalized response from Wagener et al.  does not show that A is the most sensitive parameter through all parts of the BTC except the tail. This is again similar to the results of the Stringer Creek sensitivity analysis. Of these three studies, reaches along the Stringer Creek valley would be most similar to the reaches in Uvas Creek investigated by Scott et al. . Results of Scott et al.  have similarities to both Type I and Type II streams. Overall, results of previous studies corroborate findings for the Stringer Creek system that different parameters will be sensitive across different parts of the BTC, and that this will vary across stream reaches. These differences can be attributed, at least in part, to differences in physical setting among these sites.
 While parameter estimates from the transient storage model have been used to characterize streams in terms of the magnitude of these processes, this approach can be flawed when estimated parameters are not uniquely identifiable. The sensitivity analysis performed in this study corroborates this point, with parameters A and D generally dominating across 10 reaches in an advective mountain stream. Storage zone parameter α, which other studies have shown to be much less certain than A and D, was not sensitive across the analyzed reaches. This suggests that α cannot be used to characterize high gradient mountain streams similar to Stringer Creek. Model output was sensitive to As through the BTC tail for models of reaches with stream features generating in-stream transient storage, suggesting that pulse injections have the potential to identify this parameter. Estimates for As should be reported with caution, as stream characteristics will influence whether or not this parameter is identifiable.
 Given that the concentration data from this study were observed for a particular experimental setup and for particular conditions, there are certain limitations to the study conclusions. Past synthetic work from Wagner and Harvey  suggests that parameter uncertainty is also a function of the type of tracer release and study reach length. Wlostowski et al.  also show for experimental BTCs that the type of tracer release will influence which parameters can be identified as well as where along the BTC this information is contained. For the experiments analyzed in this study, all data were taken from instantaneous tracer releases and for 100 m reach lengths. Future work with this approach should test how parameter identifiability varies for constant rate versus instantaneous tracer injections, as well as how it varies across different reach lengths [e.g., Wlostowski et al., 2013]. This study also uses reach level data from a single stream; all conclusions are given within the context of the physical characteristics and setting which govern this stream. These results represent one application to a particular type of stream in one environment. We expect that streams with more variable stream characteristics than those seen in Stringer Creek will exhibit different patterns and levels of parameter identifiability. For instance, transient storage zone parameters may be identifiable for braided streams or streams known to experience shorter time or length scales for hyporheic exchange, as transient storage may be expected to be a dominant process in these types of streams. While these limitations merit future investigation, this study, to our knowledge, represents the most comprehensive sensitivity analysis and investigation of parameter identifiability for the application of transient storage models to date.
 The one-dimensional form of the transient storage model makes a number of approximations to the stream system. Assuming a constant value for D and As along the reach length simplifies a two-dimensional flow field into a one-dimensional application. For reaches where significant variation in physical characteristics along a given length of stream exists, crude approximations to the velocity flow field may compromise characterization of patterns of hyporheic exchange [Cardenas et al., 2004]. The alternative to the one-dimensional TSM is a two- or three-dimensional model, e.g., MODFLOW, MODPATH [e.g., Gooseff et al., 2006], or COMSOL [e.g., Cardenas, 2009]. While efficient numerical solvers exist for two-dimensional applications [Hill and Tiedeman, 2007], the applicability of a 2-D or 3-D model to a given site depends on whether there is enough information about a field site to constrain the model [Harvey and Wagner, 2000]. Aside from experimental characterization of the field site, equifinality becomes an even greater problem as the number of parameters in a model increases [Beven, 2006].
 The wider implications of this study are that modeling of instantaneous tracer data may not be capable of characterizing all stream processes related to the model parameters for all stream types. Given that this is an application to a series of reaches along a single stream, these results do not demonstrate that storage zone parameters are always nonidentifiable. Instead, we recommend that model parameter estimates should be reported along with sensitivity estimates as provided here or other types of uncertainty estimates. Only identifiable parameter values should be used to infer reach functioning.
 While these results suggest that care should be used in the reporting of these parameters, they highlight that the transient storage model is an excellent tool for identifying the dominant processes across stream reaches. They also show that there is still much information that can be gained through a pulse injection, and that the amount of information that can be extracted will depend on the characteristics of a given reach. The relationship between parameter identifiability and physical setting suggests that detection of important stream processes may be better framed in terms of parameter sensitivity and identifiability. Parameter identifiability indicates that a given process is important for a given reach, while a lack of identifiability can be attributed to nonsensitivity or interactions with other parameters (which can be distinguished). In this application, the difference in dominant controls across reaches suggests that there is also an opportunity for an a priori classification of streams if we can relate sensitive processes to physical reach characteristics. Such a classification system would allow researchers to determine which transient storage model parameters are likely identifiable prior to performing a tracer test. In a broader sense, such a classification system will undoubtedly improve our understanding of transient storage processes and their relationship to physical setting. Broad characterization of stream reaches and the processes governing solute transport will be especially important as the anthropogenic footprint on ecosystems continues to grow, especially within small stream systems [Freeman et al., 2007] where transient storage models are often applied [Stanley and Jones, 2000].
 The authors would like to acknowledge Seth Kurt-Mason for discussions that helped contribute to the manuscript. Partial support for this project was provided by an EPA STAR Early Career Award (R834196) to Thorsten Wagener, and by an EPA STAR Graduate Fellowship Award to Christa Kelleher. The authors would also like to thank Rob Payn, Kelsey Jencso, and other field researchers from Montana State University for preparing and sharing tracer tests and terrain metrics for Stringer Creek. Sobol' analysis code was generated by Y. Tang, and generously shared by the Patrick Reed group at Penn State. Data for this study were provided by NSF grants EAR 03–37650, EAR-0943640, and EAR-0837937. The authors would like to thank the U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station for providing runoff data and logistical support. Any opinions, findings, and conclusions or recommendations expressed in this study are those of the authors and do not necessarily reflect the views of the funding institution.