A systematic approach is described for determining the minimum level of model complexity required to predict runoff in New Zealand catchments, with minimal calibration, at decreasing timescales. Starting with a lumped conceptual model representing the most basic hydrological processes needed to capture water balance, model complexity is systematically increased in response to demonstrated deficiencies in model predictions until acceptable accuracy is achieved. Sensitivity and error analyses are performed to determine the dominant physical controls on streamflow variability. It is found that dry catchments are sensitive to a threshold storage parameter, producing inaccurate results with little confidence, while wet catchments are relatively insensitive, producing more accurate results with more confidence. Sensitivity to the threshold parameter is well correlated with climate and timescale, and in combination with the results of two previous studies, this allowed the postulation of a qualitative relationship between model complexity, timescale, and the climatic dryness index (DI). This relationship can provide an a priori understanding of the model complexity required to accurately predict streamflow with confidence in small catchments under given climate and timescales and a conceptual framework for model selection. The objective of the paper is therefore not to present a perfect model for any of the catchments studied but rather to present a systematic approach to modeling based on making inferences from data that can be applied with respect to different model designs, catchments and timescales.
 The paper is concerned with the choice of appropriate model complexity for prediction of streamflow responses of ungauged catchments at different timescales. At present there is inadequate guidance available to a priori choose a model of appropriate complexity for predictions of streamflow of a specified accuracy, given only the climatic, vegetative, topographic and soil characteristics, and a given timescale. Provided such a method exists, and a model of appropriate complexity can be chosen, the dominant physical controls on streamflow variability can then be properly investigated. These results can subsequently be used to concentrate the modelers' efforts on the best approaches for estimating the required parameter values, with a view to improving the accuracy of predictions and to reducing predictive uncertainty.
 The motivation for this paper is therefore not the development of a “perfect model”, but a systematic method of model selection that makes tradeoffs between model complexity, accuracy and predictive uncertainty. Although integral to modeling strategies in catchment hydrology [Garen and Burges, 1981], these tradeoffs have not previously been explored in a systematic manner. A plethora of models of widely ranging complexity have been developed [Chiew et al., 1993; Singh, 1995], but there is no consistent method available to select the most appropriate model for a given set of catchment conditions. Chiew et al. [1993] compared the relative accuracy of six predefined calibration-based models of increasing complexity, tested on a number of catchments with different climatic conditions, and concluded that the most accurate model was the most complex model, regardless of timescale and catchment characteristics. Although in some cases simpler models were reported to make sufficiently accurate predictions, the links between climate, timescale and catchment characteristics on the one hand, and model accuracy and required model complexity on the other, were not explored in any detail.
 Conceptual models dependent on calibration for their parameter values, such as the ones studied by Chiew et al. [1993], are used extensively in catchment hydrology. However, Klemes suggests that a common weakness of many calibration-based models is structural arbitrariness and overparameterization. Because they require estimation of many parameters with little or no physical meaning, there is often no means to estimate them all a priori. Consequently, there is a large associated parameter uncertainty which, when propagated through the model, produces output errors too large to place confidence in the predictions. Because these calibrated model parameters often could not be linked to physically meaningful or measurable climate and landscape properties, the links between catchment characteristics, climate and timescale, and predictive uncertainty, could not be established, and hence the tradeoff between model complexity, accuracy and predictive uncertainty could not be fully investigated. This has also inhibited the identification of the dominant physical controls on streamflow variability at various time and space scales.
 The approach to model development adopted in this paper is inspired by the downward approach outlined by Klemes. Starting with the simplest model possible, model complexity is systematically increased in direct response to demonstrated deficiencies in the model predictions, and the process is continued until the required accuracy is achieved. For water balance modeling, the modeler often finds that a relatively simple model is sufficient to make predictions at large timescales, such as the annual timescale, whereas the model complexity needs to increase as the timescale decreases. The starting model used in this paper is a variant of the Manabe bucket model [Manabe, 1969; Milly, 1994]. It incorporates simple but physically meaningful representations of hydrological processes, and requires parameter values and climatic inputs, most of which can be realistically derived from a priori field measurements. Minimal calibration against selected streamflow records from a number of storm events (fewer than 10), each of approximately 5–10 days duration, is required to estimate two of these parameters before simulations are run and predictions made.
 The ultimate extent of the model evolution reached in the downward approach presented here is dependent upon the tradeoff between model complexity, accuracy, and predictive uncertainty, which are assessed using systematic sensitivity and Monte Carlo error analyses. These analyses identify the dominant physical controls on streamflow variability at each timescale, and produce estimates of predictive accuracy and uncertainty, and hence the degree of model complexity required/available to capture observed streamflow variability. Between-catchment differences in model predictions are correlated to physically measurable catchment properties and climatic conditions. Although the model applied in this paper is specific to New Zealand catchments, the approach used has also been applied with considerable success to catchments in other climatic and physiographic regions [Jothityangkoon et al., 2001; Eder et al., 2002; Farmer et al., 2002], and to other types of models as well. Based on the correlations obtained in all of these studies (including the present one), a qualitative relationship is formulated between climatic indices, timescale and required model complexity, which can provide useful guidance for model selection, development and inter-comparison in other regions.
2. Study Sites
 Four New Zealand catchments were selected for this study because the climate is relatively moderate (rather than extreme), and the extensive data required to model streamflow was available through the National Institute of Water and Atmospheric Research (NIWA). The four catchments are Moutere, Waihopai, Mahurangi, and Ngahere, and as seen in Figure 1, these are well spaced to represent the different soil, climatic, vegetative and topographic conditions found in New Zealand. The relevant climatic, soils, vegetative and topographic information is presented in Table 1 including the length and period of datasets used.
Table 1. Mean Annual Rainfall, Potential Evaporation, and Streamflow as well as Relevant Soils, Vegetative, and Topographic Information
 Our model development in fact commenced with the application of the classic Manabe [1969] “bucket” model, with the representation of the catchment as a single “bucket”, as depicted in Figure 2a. Preliminary simulations with this rather simple model produced acceptable predictions of annual runoff in all four catchments (not presented here for brevity). The most accurate results were obtained in the very wet catchment with high annual rainfall (i.e., Ngahere). However, monthly and daily flow predictions using the model did not match the observations in any of the catchments. It was therefore decided to include subsurface flow in addition to saturation excess runoff. In addition, the amount of vegetation cover has been observed to vary between catchments, so it was decided to include an explicit treatment of vegetation cover to enable between-catchment inferences.
3.2. Bucket Representation
 Given these considerations, the Manabe bucket model was adapted to incorporate vegetation cover and a subsurface runoff generation mechanism. Figure 2b illustrates the operation of the new “leaky bucket” model, which represents the starting point of the analyses presented here. It introduces two storage thresholds governing water movement from the catchment. Soil moisture below the permanent wilting point is assumed to be immobile with respect to both evaporation and subsurface drainage, and “active water” is water that is stored above this threshold. Similarly, when the moisture content is below a field capacity threshold, water does not drain through the subsurface flow pathway, but can still be lost by evapotranspiration. The five parameters required to operate the “leaky bucket” model are fc and Sbc (soil properties), M (vegetative property), and a and b (conceptual soil and topographic parameters related to shallow subsurface flow). Their descriptions are provided in Appendix A.
3.3. Bucket Capacity and Threshold Storage Parameter
 The bucket capacity (Sbc) is simply the storage capacity of the bucket, while the threshold storage parameter (fc) determines the “threshold storage” (storage at field capacity, Sfc), which controls both runoff and evapotranspiration. The definitions of the bucket capacity, threshold storage parameter and threshold storage are provided in (A1–A3), along with equations describing these in terms of basic soil and vegetation properties. These show a dependence on porosity (ϕ), moisture content at field capacity (θfc) and at the permanent wilting point (θpwp), and depth to an impervious clay layer (D).
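Appendix A is not reproduced in this section. One set of definitions consistent with the description above, offered here as an assumption rather than as the exact form of (A1)–(A3), treats only the water held above the permanent wilting point as active storage:

```latex
S_{bc} = D\,(\phi - \theta_{pwp}), \qquad
f_c = \frac{\theta_{fc} - \theta_{pwp}}{\phi - \theta_{pwp}}, \qquad
S_{fc} = f_c\, S_{bc} = D\,(\theta_{fc} - \theta_{pwp})
```

With purely illustrative values D = 1000 mm, ϕ = 0.45, θfc = 0.30 and θpwp = 0.10 (hypothetical, not taken from Table 1), these expressions give Sbc = 350 mm, fc ≈ 0.57 and Sfc = 200 mm.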
3.4. Actual Evapotranspiration
 Actual evapotranspiration is a function of the “available water” (storage), and atmospheric conditions. Transpiration (Eveg) and bare soil evaporation (Ebs) are modeled explicitly and separately, as functions of soil properties and vegetative cover, as well as the dynamic value of soil moisture storage in the bucket. The rates of bare soil evaporation and transpiration are assumed to depend on the parameter M, representing the fraction of the catchment area covered by forests of any kind. Transpiration is modeled as occurring over a fraction M of the catchment, and bare soil evaporation occurs over the remaining fraction 1 − M. Assuming all species have approximately the same transpiration rate, then M = 1 and M = 0.1 represent completely forested and completely pasture catchments, respectively. Pasture catchments are assumed to have 10% effective vegetation cover, due to the fact that the grasses in New Zealand are tall and relatively more deep rooted than usual, so not all evapotranspiration is the result of bare soil evaporation under pasture conditions. A catchment with short pasture would be modeled as M = 0.05. Equations (A4) and (A5) contain all of the relevant equations for estimating bare soil evaporation and transpiration. The combined effects of bare soil evaporation and transpiration are assumed to also implicitly include the effects of interception.
 Daily discharge is produced via two mechanisms: surface runoff or quick flow via the saturation excess mechanism (Qse), and subsurface runoff or delayed flow (Qss). Saturation excess runoff is produced when the storage exceeds the bucket capacity (Sbc). Similarly, subsurface flow is produced when the storage in the bucket exceeds the threshold storage Sfc; it is expressed through a nonlinear storage-discharge relationship governed by two parameters, “a” and “b”, as described in Wittenberg and Sivapalan [1999]. Equations (A6) and (A7) are the relevant equations.
3.6. Water Balance
 The resulting water balance equation for the catchment is given by:
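A form of the water balance consistent with the fluxes defined in sections 3.4 to 3.6 is sketched below. The notation is an assumption (S denotes bucket storage and t time); it is not necessarily the authors' exact expression.

```latex
\frac{dS}{dt} = P - Q_{se} - Q_{ss} - E_{bs} - E_{veg} \tag{1}
```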
Equation (1) is used to simulate the temporal variation of water storage and the various outgoing fluxes, at a daily time step, given daily inputs of rainfall (P) and potential evaporation (Ep), as well as estimates of the parameters M, fc, Sbc, a, and b. Appendix A contains all of the relevant definitions. To determine the initial storage (S0) for each catchment, the model was first run for one year, and the soil storage at the end of that run was taken as S0.
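The daily bookkeeping described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `simulate_leaky_bucket` is hypothetical, the order in which fluxes are removed within a day is an assumption, and the flux formulas are assumed stand-ins for (A4)–(A7), which are not reproduced here (subsurface flow from S − Sfc = a·Q^b, bare soil evaporation scaled by S/Sbc, transpiration at the potential rate above Sfc and scaled by S/Sfc below it).

```python
def simulate_leaky_bucket(rain, pot_evap, M, fc, Sbc, a, b, S0=0.0):
    """Daily leaky-bucket sketch; depths in mm, fluxes in mm/day.

    Assumes 0 < fc <= 1, Sbc > 0, a > 0 and b > 0. The flux formulas are
    assumed stand-ins for (A4)-(A7), not the paper's exact equations.
    Returns (runoff, evap, S_end).
    """
    Sfc = fc * Sbc  # threshold storage at field capacity
    S = S0
    runoff, evap = [], []
    for P, Ep in zip(rain, pot_evap):
        S += P
        # Saturation excess: spill whatever exceeds the bucket capacity.
        Qse = max(S - Sbc, 0.0)
        S -= Qse
        # Subsurface flow: nonlinear release of the storage held above Sfc,
        # capped so that it cannot drain the bucket below Sfc in one day.
        excess = max(S - Sfc, 0.0)
        Qss = min((excess / a) ** (1.0 / b), excess)
        S -= Qss
        # Bare soil evaporation over the fraction (1 - M), transpiration over M.
        Ebs = (1.0 - M) * Ep * (S / Sbc)
        Eveg = M * Ep * min(S / Sfc, 1.0)
        ET = min(Ebs + Eveg, S)  # cannot evaporate more than is stored
        S -= ET
        runoff.append(Qse + Qss)
        evap.append(ET)
    return runoff, evap, S
```

The spin-up described above corresponds to calling the function once on a one-year record and passing the returned final storage as `S0` to the main run.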
3.7. Parameter Estimation
 The estimation of parameters M, fc and Sbc is relatively straightforward, given the relevant equations in Appendix A and the information presented in Table 1. The estimation of the nonlinear recession parameters a and b is less straightforward. It involves assembling individual recession curves, extracted from observed streamflow data at various times of the year, to produce a master recession curve (or mean curve). A recession in this instance is defined as the period shortly after a runoff event (>1 day after a rainfall event). The theoretical functional form of the recession curve corresponding to the nonlinear storage-discharge relationship (A7) is given by [Wittenberg and Sivapalan, 1999]:
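Assuming that (A7) has the power-law form S − Sfc = a Q^b used by Wittenberg and Sivapalan [1999], and that outflow is the only loss during a recession (dS/dt = −Q), integrating from Qo at t = 0 gives the following expression. It is a reconstruction under those assumptions, valid for b ≠ 1:

```latex
Q_t = Q_o \left[ 1 + \frac{(1 - b)\, Q_o^{\,1-b}}{a\, b}\; t \right]^{\frac{1}{b-1}} \tag{2}
```

For b = 0.5 this reduces to Q_t = Q_o (1 + Q_o^{1/2} t / a)^{-2}.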
where Qt is recession flow (mm/d) at time t, Qo is the initial runoff in the mean time series, in mm/d. This theoretical relationship is fitted to the empirical master recession curve described above, through minimization of an appropriate objective function (calibration). The objective function we use, res, measures the quality of fit (a value of zero indicates perfect fit), and is given by (3) below:
where t is the time in days, and Qpred and Qmean are the predicted and measured (master or mean) recession curves.
 When using (3) to minimize res and to solve for a and b, we found that there were numerous optimal solutions that produced equally accurate fits, in all four catchments. A sensitivity analysis was performed and the results (Figure 3) suggested that for b values between 0.4 and 0.6, all four catchments produced equally accurate recessions (low res). Wittenberg and Sivapalan [1999] produced similar results for arid and semi-arid catchments of southwest Western Australia, and concluded that it was reasonable to fix the exponent b at a mean or dominant value. A value of 0.5 was suggested, agreeing with the empirically estimated values for unconfined aquifers presented in previous work, e.g., Werner and Sundquist, and Fukushima. We have therefore, for convenience, adopted a value of 0.5 for the parameter b, and have assumed it to be the same for all four catchments. Although fixing b at 0.5 is a pragmatic way to reduce parameter interdependence and improve identifiability, it does not represent the optimum performance [Campbell and Bates, 2001]. Given that b = 0.5, (3) was subsequently used to estimate a for all catchments by minimization of the function res. Note that the effect of evaporation can be considered negligible during the recession periods in zones of moderate climate such as those found in New Zealand [Wittenberg and Sivapalan, 1999]. Our sensitivity analyses have also shown that evaporation has negligible impact on the estimated recession parameters in these catchments.
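The one-parameter calibration described above (b fixed at 0.5, a found by minimizing res) can be sketched as follows. Several pieces are assumptions: the recession formula presumes the power-law storage-discharge form S = a·Q^b, the misfit `res` is a stand-in (summed absolute error normalized by total flow) because the exact form of (3) is not reproduced here, and the golden-section search and its bounds are simply one convenient choice. All function names are hypothetical.

```python
import math

def recession(q0, a, b, t):
    """Recession flow (mm/d) at time t (days) for S = a * Q**b, with 0 < b < 1."""
    return q0 * (1.0 + (1.0 - b) * q0 ** (1.0 - b) * t / (a * b)) ** (1.0 / (b - 1.0))

def res(a, b, q_mean):
    """Assumed misfit (zero for a perfect fit); q_mean[t] is the flow on day t."""
    q0 = q_mean[0]
    err = sum(abs(recession(q0, a, b, t) - q) for t, q in enumerate(q_mean))
    return err / sum(q_mean)

def fit_a(q_mean, b=0.5, lo=0.1, hi=1000.0, tol=1e-6):
    """Golden-section search for a on [lo, hi], assuming res is unimodal there."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - g * (hi - lo)
    d = lo + g * (hi - lo)
    while hi - lo > tol:
        if res(c, b, q_mean) < res(d, b, q_mean):
            hi, d = d, c              # minimum lies in [lo, d]
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d              # minimum lies in [c, hi]
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)
```

Calling `fit_a(master_curve)` on a list of daily master-recession flows returns the value of a that minimizes the assumed misfit for b = 0.5; repeating the call over a range of b values is one way to reproduce the kind of sensitivity check summarized in Figure 3.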
Table 2 presents estimates of the four parameters of the model for the four study catchments: the first three (M, fc, Sbc) were estimated from field data (as described before), and the last (namely a) was estimated from the recession curve analyses (as described above). All of the parameters, including the parameters a and b, were estimated prior to the application of the leaky bucket model to the four catchments.
Table 2. Parameter Values for Moutere, Waihopai, Mahurangi, and Ngahere
 The results of model simulations are presented in the form of signature plots [Jothityangkoon et al., 2001; Farmer et al., 2002], and also in terms of the hydrographs of observed versus predicted streamflows, and through statistical summaries. Signature plots are statistical representations of the inter-annual, intra-annual (monthly) and daily streamflow variability of the catchment runoff response. They enable us to assess how well the model is able to predict the streamflow response of the catchment without the reliance on hydrograph fitting, and moreover, can give us considerable insight into catchment response.
 However, the hydrographs of observed and predicted streamflow are also presented together, to confirm whether the matching of the signature plots is indeed reflected in an accurate reproduction of the observed hydrographs. This is because visual examination of the signature plots chosen provides little quantitative information about model performance in capturing individual storms, especially streamflow recessions. Two numerical measures of fit, the correlation coefficient (ρx,y) and the Runoff Ratio (RR), are introduced to enable the comparison of the timing and magnitude (respectively) of the observed and predicted hydrographs. These are estimated using (4a) and (4b) given below:
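In standard notation the two measures can be written as below. Equation (4a) is the usual definition of the correlation coefficient; the direction of the ratio in (4b), predicted over observed, is an assumption.

```latex
\begin{align}
\rho_{x,y} &= \frac{\mathrm{cov}_{x,y}}{\sqrt{\mathrm{var}_x\,\mathrm{var}_y}} \tag{4a} \\
RR &= \frac{E[Q_{pred}]}{E[Q_{obs}]} \tag{4b}
\end{align}
```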
where ρx,y and covx,y are the correlation coefficient and covariance, respectively, between the observed and predicted time series (x,y), and varx and vary are the variances of the individual time series. ρx,y is used to measure how well the model mimics the time variability of the catchment. RR is the dimensionless runoff ratio and E[Qobs], E[Qpred] are the expected mean annual observed and predicted runoff respectively. RR is used to measure the bias in model estimates. Although there are many measures of model performance available, those selected here are deemed adequate to assess the general performance over the entire length of record.
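A minimal sketch of how the two measures might be computed at different timescales is given below. The function names are hypothetical, RR is taken as predicted over observed mean flow (an assumption about the direction of the ratio), and aggregation uses fixed-length blocks rather than calendar months or years for simplicity.

```python
import math

def aggregate(daily, block):
    """Sum a daily series over consecutive blocks of `block` days (e.g. 30 or 365)."""
    n = len(daily) // block
    return [sum(daily[i * block:(i + 1) * block]) for i in range(n)]

def correlation(obs, pred):
    """Correlation coefficient between observed and predicted series."""
    n = len(obs)
    mx, my = sum(obs) / n, sum(pred) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(obs, pred)) / n
    varx = sum((x - mx) ** 2 for x in obs) / n
    vary = sum((y - my) ** 2 for y in pred) / n
    return cov / math.sqrt(varx * vary)

def runoff_ratio(obs, pred):
    """Ratio of mean predicted to mean observed runoff (assumed direction)."""
    return (sum(pred) / len(pred)) / (sum(obs) / len(obs))
```

For example, `correlation(aggregate(q_obs, 30), aggregate(q_pred, 30))` would give an approximate monthly value, where `q_obs` and `q_pred` are daily series supplied by the user.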
4.1. Results from Preliminary Model Simulations
Figures 4a, 4b, 4c, and 4d and Table 3 contain the signature plots, the observed versus predicted streamflow hydrographs, and the streamflow statistics, for all four catchments (E[P] is the mean annual rainfall and E[Q] is the mean daily runoff for the observed and predicted time series). The results suggest that the model predicts streamflow with reasonable accuracy at the annual and monthly timescales but fails to reproduce daily streamflows, especially during low flows. Failure to capture low flows is exemplified by the apparent “plunging” of the tail of each catchment's modeled flow duration curve and by the recession limbs of their streamflow hydrographs (plots iii and iv in each of Figures 4a–4d), as well as by the low correlation coefficients corresponding to daily flows (Table 3).
 Low flows have little effect on the timing of annual and monthly flow volumes, but can have a significant effect on the timing of daily predictions (especially in summer). Correlation between observed and predicted daily flow is noticeably higher at Moutere than at the remaining three catchments. This can be attributed to the very small catchment area and consequent lack of base flow. Waihopai, Mahurangi and Ngahere are all larger catchments with significant volumes of base flow, which should be represented in the model to improve predictions of daily flow. The Runoff Ratio (RR) was close to unity for all catchments (RR ≈ 1) except Moutere, which showed significant over-predictions at all timescales. Given the small area of Moutere, the rainfall and pan evaporation measured on site are good estimates of the catchment average rainfall and evaporation, so they are unlikely to be the sources of the over-predictions. These are more likely caused by one or more of the catchment parameters used in simulations. Poor timing and poor low flow predictions do not seem to affect RR because it is essentially an annual statistic.
 Therefore it is suggested that the model as it stands could be used to predict annual and monthly flows, but would be inappropriate at the daily time step. Failure to capture low flows at the daily time step can be attributed to the operation of the bucket as depicted in Figure 2b and equation (A7), in which streamflow reduces to zero when storage falls below Sfc. Since observed streamflow records suggest there is a persistent base flow component throughout the year at Waihopai, Mahurangi and Ngahere, the introduction of a base flow process, its parameterization, and the means to estimate the corresponding parameter value(s) are needed to enhance the model's capability to predict daily runoff accurately. With negligible base flow at Moutere, the model must be extended so that base flow can be turned on and off as required.
4.2. Introducing Base Flow
 A simple base flow mechanism is introduced by which water is slowly released from the bucket, but in such a way as to persist throughout the year. To limit model complexity we chose not to introduce an additional deep groundwater store, since this would require knowledge of the rates of leakage to the deeper store and the rate of discharge from this store, both of which will require considerable soil testing to estimate. Rather, we lump the effects of such leakage and discharge and, for simplicity, introduce the base flow discharge as a linear function of bucket storage. Being a linear response it introduces only one additional parameter, namely the base flow response time (Tcbf). Figure 2c illustrates the operation of the revised model.
Tcbf is used in the same way as the parameters controlling subsurface discharge (a and b), and the base flow relationship takes the same form as (A7) with b = 1 (linear response). Base flow, calculated via (A8), is assumed not to be limited by field capacity, and operates on the total bucket storage. A calibration procedure similar to that described for parameters a and b is used to determine Tcbf. The only difference is that estimation of Tcbf is based on the recession limbs of hydrographs below a predefined summer low-flow threshold or during “dry spells”, and it is allowed to include effects of evapotranspiration [Wittenberg and Sivapalan, 1999]. Evapotranspiration is included because the focus is now on summer low flows, during which potential evaporation is as high as measured streamflow. The estimated base flow parameter values, Tcbf, for the four catchments are presented in Table 2. The base flow parameter for Moutere was set to 10,000 to simulate the lack of base flow in very small catchments. Although the volume of base flow produced is minimal, it is still significant since it is able to generate low flows that were not captured by the previous model.
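Under the linear-store reading of the text, and assuming Tcbf is expressed in days (the units are not stated here), the base flow relationship and the recession it implies when base flow is the only loss from storage can be sketched as:

```latex
Q_{bf} = \frac{S}{T_{cbf}}, \qquad
Q_t = Q_o \exp\!\left(-\frac{t}{T_{cbf}}\right)
```

This is an assumed form of (A8), not a quotation of it, and the exponential recession ignores the evapotranspiration that the calibration is allowed to include. For a hypothetical storage of S = 200 mm, Tcbf = 10,000 gives Qbf = 0.02 mm/d, which illustrates why such a large value effectively switches base flow off.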
4.3. Results from Improved Model Simulations
 With the extended parameter sets for each catchment, model simulations were carried out, and the results are presented in Figures 5a, 5b, 5c, and 5d and Table 4. The signature plots, the observed versus predicted streamflow hydrographs, and the summary streamflow statistics all show marked improvements in the prediction of low flows, for all catchments except Moutere (which does not have base flow). On the other hand, the statistics contained in Table 4 suggest that the introduction of base flow has made very little change to the timing and magnitude of the predictions, at all three timescales. This is to be expected since the magnitude of low flows is negligible compared to flood peaks and the annual volumes of base flow would be negligible compared to annual volumes of total runoff.
 Although the addition of base flow has helped to improve the accuracy of model predictions, this additional complexity will increase the uncertainty of the predictions. Hence it is not clear whether the improved accuracy gained through additional complexity justifies the increased predictive uncertainty. The aim of this paper is to capture streamflow variability with confidence and sufficient accuracy, so that the model can be used to identify the dominant sources of variability; hence, it is important to keep the model as simple as possible so as to limit predictive uncertainty. At this stage, the current model predicts streamflow sufficiently accurately at all timescales, so the addition of further complexity is not deemed warranted. Sufficient accuracy is defined in this paper as predictions for which ρx,y is approximately ≥0.8, RR is close to 1, and there is little difference between the observed and predicted signature plots (as described by Farmer et al. [2002]). Sensitivity and error analyses were performed with the current version of the model to investigate the sensitivity of predictions to parameter variations, timescale and climatic conditions, and to investigate the tradeoffs between model accuracy and predictive uncertainty.
5. Sensitivity and Error Analyses
5.1. Sensitivity Analysis
 Sensitivity analysis was identified as an effective means of determining the dominant physical controls on streamflow variability, as it provides insight into the operation of the model and explores its response to parameter variations. The sensitivity of annual, monthly and daily flow predictions was assessed at decreasing timescales, as well as between-catchment differences in model sensitivity, with a view to establishing a link between parameter sensitivity, timescale and climate. This of course presupposes that the mechanisms controlling the water balance in all catchments at all timescales have been reasonably well represented in the model.
 It is important to first give a definition of “dominant physical control”, which in this paper refers to a model parameter to which model predictions are extremely sensitive when it is varied over a realistic range of values (estimated from field measurements, observation and experience). The sensitivity analyses performed in this paper used a range of ±50% for all parameters shown in Table 2. Where ±50% of the measured parameter value is infeasible (as can occur for fc and M), the range of values is set so that it does not exceed the maximum and minimum allowable limits (1 and 0, respectively). The model was run repeatedly, with the starting nominal value for each parameter in Table 2 multiplied, in turn, by 0.5, 0.75, 1, 1.25 and 1.5, while keeping all other parameters constant at their nominal starting values. The resulting streamflow time series are assessed for the timing of annual, monthly and daily streamflows using the correlation coefficient ρx,y, and for their magnitudes using RR. When each parameter is varied in turn over the selected range, there is a spread of values for both ρx,y and RR. Since we require some measure of this spread (which is a measure of the model sensitivity to the parameter values), the differences between the maximum and minimum estimates of ρx,y and RR (Diff ρx,y and Diff RR) are also estimated. These are presented in Figure 6 in separate plots.
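The one-at-a-time procedure described above can be sketched generically as follows. The function name, the dictionary-based parameter interface and the simple clipping of fc and M to [0, 1] are illustrative assumptions; `model` and `metric` are user-supplied callables (for example, a streamflow simulator and a function returning ρx,y or RR against observations).

```python
def one_at_a_time(model, nominal, metric,
                  factors=(0.5, 0.75, 1.0, 1.25, 1.5),
                  bounded=("fc", "M")):
    """Return {parameter: max(metric) - min(metric)} over the perturbation factors.

    model:   callable taking a dict of parameter values, returning predictions
    nominal: dict of nominal parameter values (as in Table 2)
    metric:  callable mapping predictions to a score such as rho or RR
    bounded: parameters that must stay within [0, 1] (clipping is an assumption)
    """
    spread = {}
    for name, value in nominal.items():
        scores = []
        for f in factors:
            params = dict(nominal)   # all other parameters stay at nominal values
            v = value * f
            if name in bounded:
                v = min(max(v, 0.0), 1.0)
            params[name] = v
            scores.append(metric(model(params)))
        spread[name] = max(scores) - min(scores)
    return spread
```

Running this once with a correlation metric and once with a runoff ratio metric yields the Diff ρx,y and Diff RR values for each parameter.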
 Although sensitivity to variations in rainfall and evaporation was also investigated (presented in a later section), we are more interested in the model response to variations in the physical parameter values.
5.2. Results of Sensitivity Analysis
 The results have been separated into the effects that variations in parameter values have on (1) the timing of annual, monthly and daily streamflow (ρx,y), and (2) the average magnitude of streamflow (RR). If the timing or magnitude of flow is highly sensitive to variations in parameter values, we would expect this to be reflected in large values of Diff ρx,y or Diff RR, respectively.
5.2.1. Timing of Discharge
 For all parameters and all catchments, Figures 6a–6c show that Diff ρx,y grows, and hence that the timing of flow becomes more sensitive to parameter values, as the timescale decreases. This trend needs to be explained physically. At annual timescales, the timing of streamflow is not important, as almost all effective rainfall falling on a catchment in one year is discharged within the year, regardless of parameter values. At monthly timescales, timing appears to become more important in water-limited catchments such as Moutere, Waihopai and Mahurangi, where there are distinctive months of high and low flows. As the timing of these high and low flows is dependent on antecedent soil moisture conditions, the correlation between observed and predicted monthly flows is sensitive to the parameters controlling storage (fc and Sbc). This is reflected by the large values of Diff ρx,y for these parameters in Figure 6b. In Figure 6c, the large scatter at daily timescales confirms that timing is very important in all catchments. The timing and magnitude of daily discharge are dependent upon the interaction of all physical processes governed by bucket storage; therefore runoff production and the correlation between observed and predicted hydrographs are sensitive to all of the parameters. However, of the five parameters, Figure 6c suggests that the timing of daily streamflow is again most sensitive to fc and Sbc.
5.2.2. Magnitude of Discharge
 In Figure 6d the magnitude of flow (Diff RR) in each catchment is found to be either most sensitive to fc or relatively insensitive to all five parameters. For example, in Ngahere and Mahurangi Diff RR is insensitive to all five parameters while in Moutere and Waihopai it is extremely sensitive to fc. In the catchments where Diff RR is extremely sensitive to fc, there is a large risk of under-/over-prediction of streamflow because, according to (A2), (A3), (A6) and (A7), fc controls the bulk of the discharge. Therefore sensitivity of Diff RR to variations in fc can be used to help explain between-catchment differences in runoff response. To explain these between-catchment differences we use a measure of catchment dryness called the climatic dryness index (DI), presented in (5) below as the dimensionless ratio of mean annual potential evaporation to the mean annual rainfall.
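Written out, with E[·] denoting a mean annual value as elsewhere in the paper (the notation is an assumption; the definition follows directly from the sentence above):

```latex
DI = \frac{E[E_p]}{E[P]} \tag{5}
```

Values of DI above 1 indicate water-limited (dry) conditions, in which potential evaporation exceeds rainfall, and values below 1 indicate energy-limited (wet) conditions.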
 The between-catchment differences can then be explained through a link between DI and the sensitivity of Diff RR to fc. Presented in Table 1 are the DI values for each catchment, which classify Moutere as a dry catchment (high DI) and Ngahere as a wet catchment (low DI). There is a strong correlation between the DI and the sensitivity of the Runoff Ratio (RR) to fc. The correlation is also supported by evidence that the timing of discharge is dictated by fc, especially in dry catchments. This trend between catchment dryness and sensitivity to fc can be explained with respect to moisture storage.
 In wet, energy-limited catchments, effective rainfall dominates the water balance, so storage usually lies between Sfc and Sbc, evaporation remains close to or equal to the potential rate for most of the year, and runoff is generated continuously via surface and subsurface processes. With such a high storage, fc has little control on runoff production, most of which is generated via the saturation excess mechanism. This is the case for Ngahere, where storage fluctuates between Sfc and Sbc for the majority of the year, as depicted in Figure 7a. Drier catchments are water limited, receiving less rainfall than potential evaporation. Storage controls runoff production in these catchments, since it tends to fluctuate above and below Sfc for most of the year (Figure 7b). As described in (A7), this limits discharge, resulting in streamflow predictions that are highly sensitive to fc. Therefore this parameter can, in dry catchments, control both the timing and magnitude of monthly high/low flows and the total annual runoff, and can account for the differences between observed and predicted signature plots shown in Figures 5a, 5b, 5c, and 5d.
 Although not an integral part of this study, sensitivity of the timing and magnitude of flow to variations in rainfall and evaporation was also investigated. The results in drier catchments showed that although there was noticeable sensitivity to ±10% variations in rainfall, the timing and magnitude of flow were still most sensitive to soil parameters. In addition, the use of a ±50% range of values in the sensitivity analysis results in the routing parameters (a, Tcbf) changing over a much wider range than the remaining parameters. Additional sensitivity analysis using smaller error bounds for the routing parameters was performed and found to have little effect in terms of importance of parameters. However, the choice of error bounds is something that requires further investigation.
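 A one-at-a-time sensitivity analysis of this kind can be sketched as follows. Only the ±50% perturbation range is taken from the text. The function `one_at_a_time_spread`, the stand-in `toy_model`, and all parameter values are hypothetical, and the spread is assumed to be the range (maximum minus minimum) of the model output over the perturbed runs.

```python
def one_at_a_time_spread(model, best_guess, fraction=0.5, steps=11):
    """Vary each parameter alone over +/-fraction of its best-guess value and
    return, per parameter, the range (max - min) of the model output."""
    spread = {}
    for name, value in best_guess.items():
        outputs = []
        for i in range(steps):
            # Multiplier runs linearly from (1 - fraction) to (1 + fraction).
            factor = 1.0 - fraction + 2.0 * fraction * i / (steps - 1)
            params = dict(best_guess)  # all other parameters stay at best guess
            params[name] = value * factor
            outputs.append(model(params))
        spread[name] = max(outputs) - min(outputs)
    return spread


def toy_model(p):
    # Hypothetical stand-in for a model output such as RR; not from the paper.
    return 1.0 + 2.0 * (0.4 - p["fc"]) + 0.01 * (p["a"] - 10.0)


best = {"fc": 0.4, "a": 10.0}
spread = one_at_a_time_spread(toy_model, best)
ranking = sorted(spread, key=spread.get, reverse=True)
```

 Passing a smaller `fraction` for selected parameters would correspond to the additional analysis with tighter bounds on the routing parameters.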
 The sensitivity analyses have thus shown that accuracy generally decreases with decreasing timescale, and that the accuracy of estimated soil parameters (namely, fc) would dictate the accuracy of model simulations. In particular, the accuracy of predictions of streamflow at all timescales is less dependent on the accuracy of soil parameters (namely, fc) in wet catchments (Ngahere) than in drier catchments (Moutere). To compare these trends in model accuracy against trends in predictive uncertainty (which is a measure of confidence in model predictions), systematic error analyses were performed using carefully assessed bounds of uncertainty for each of the parameter values.
5.3. Error Analysis
 The parameter values used in the model simulations are “best guessed” estimates, so all of them possess their own error bounds, based on the reliability of the sources of information. It is possible that the correct but unknown values may be, to some degree, different from those used in the simulations.
 The error bounds for the parameter values for each New Zealand catchment are presented in Table 5, and within these bounds the parameters are assumed to follow a uniform distribution. The error bounds used in simulations were based on the quality of the sources of information, for each catchment, from which they were derived. The uniform distribution was selected because it is easy to use and because we are interested only in notional estimates of predictive uncertainty. The error bounds for M, D, Tcbf and a are estimated directly, while error bounds for the parameters fc and Sbc were estimated via the error estimates of the basic catchment properties ϕ, θfc, θpwp, and D. It should also be noted that the error bounds for soil properties are much greater at Moutere, Ngahere and Waihopai than they are for Mahurangi, because the source of information for the latter is much more comprehensive.
Table 5. Error Bounds [rows: Field Capacity (θfc); Permanent Wilting Point (θpwp); Soil Depth (D); a (b = 0.5)]
 The effect of these parameter uncertainties on model outputs was tested using the Monte Carlo technique. A random number generator was used to generate numerous combinations of randomly chosen parameter values within the chosen error bounds, and these were used with the model to produce multiple realizations of model predictions for the same climatic inputs. (This is distinct from the sensitivity analysis of the previous section, where one parameter was varied at a time.) Specifically, 100 simulations were performed for each catchment, and the results are presented in Figure 8 and Tables 6a and 6b. The errors in the timing and magnitude of flow were assessed using ρx,y and RR. Crosses indicate model predictions using the “best guessed” parameter estimates, while the bars represent the error bars for each catchment. The larger the error, the less confidence we can place in the results. Effects of uncertainties in the rainfall and potential evaporation inputs were also tested, using error bounds of ±10% (above and below the mean) for each input. However, once again these results will not be discussed in detail because we are mainly interested in parameter uncertainty and the consequent impact on model predictive uncertainty.
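 A minimal sketch of this Monte Carlo procedure is given below, assuming that each parameter is sampled independently from a uniform distribution within its error bounds. Only the sample size of 100 and the uniform distribution are taken from the text. The function `monte_carlo_error_bars`, the stand-in `toy_model`, and the bounds are hypothetical and are not the values of Table 5.

```python
import random


def monte_carlo_error_bars(model, bounds, n_runs=100, seed=0):
    """Sample every parameter uniformly within its (low, high) bounds,
    run the model n_runs times, and return (min, max) of the output."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outputs = []
    for _ in range(n_runs):
        # All parameters are drawn together, unlike one-at-a-time sensitivity.
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in bounds.items()}
        outputs.append(model(params))
    return min(outputs), max(outputs)


def toy_model(p):
    # Hypothetical stand-in for a model output such as RR; not from the paper.
    return 1.0 + 2.0 * (0.4 - p["fc"]) + 0.01 * (p["a"] - 10.0)


# Hypothetical error bounds around best guesses fc = 0.4 and a = 10.
bounds = {"fc": (0.3, 0.5), "a": (8.0, 12.0)}
low, high = monte_carlo_error_bars(toy_model, bounds)
best = toy_model({"fc": 0.4, "a": 10.0})
```

 The pair (`low`, `high`) plays the role of an error bar, and `best` the role of the cross marking the best-guess prediction.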
Table 6a. Results of Error Analyses With Respect to the Runoff Ratio (RR)
Figures 8a–8c and Table 6b suggest that the error associated with the timing of predicted streamflow becomes progressively larger as we move from annual to daily timescales. These results suggest that, for a given model, we can place progressively less confidence in model predictions as we move to shorter timescales. This was to be expected, since the sensitivity of flow to variations in parameter values also became progressively larger at decreasing timescales.
5.3.2. Magnitude of Discharge
 To assess the level of confidence in the magnitude of flow predictions, the error bars for RR are presented graphically and numerically in Figure 8d and Table 6a. Larger error bars tell us that we can have less confidence in the results of simulations when using the best-guessed parameter values. Assuming a reasonable correlation between observed and predicted streamflow has been achieved for all four catchments (ρx,y ≈ 0.8), our results imply that we can have relatively little confidence in the results of simulations at Moutere (RR ≫ 1), moderate confidence in the results at Waihopai (RR > 1), and good confidence in the results at Mahurangi and Ngahere (RR ≈ 1). However, smaller error bars do not by themselves imply more accurate results; accuracy is indicated instead by RR being close to one. When ranking all catchments in order of decreasing accuracy and in order of decreasing confidence, we find that the two orders are the same, and that this order corresponds to the order of the dryness index (DI). Table 6a and Figure 8d present these results and show the correlation between accuracy, confidence and DI.
 The effect of the ±10% uncertainty in rainfall and potential evaporation on model predictions was found to be minimal, increasing the error bars for each catchment by roughly equal amounts. Based on the results of sensitivity analyses, the largest source of error in model simulations remains the uncertainty surrounding the value of fc. The results of this analysis suggest that it is possible to establish a link between accuracy, confidence and DI. Dry catchments produced relatively “less accurate” results with “little confidence”, while relatively “more accurate” results were produced with “good confidence” in wetter catchments.
 Combined sensitivity and error analyses have assessed model sensitivity to parameter variations, timescales and catchment dryness. The results have shown that, with decreasing timescales, the simple single bucket water balance model produces progressively less accurate results with progressively lower confidence. This is a consequence of the model becoming increasingly sensitive to the estimated parameter values.
 Of all the parameters, sensitivity and error analyses have identified fc as a dominant control on streamflow variability in these New Zealand catchments at annual, monthly and daily timescales. The results of sensitivity and error analysis suggest that the degree of dominance is well correlated with the level of confidence in the results (size of error bars), accuracy (assessed by ρx,y and RR), and climatic dryness index (DI). Simulations using the derived model have shown that streamflow prediction in dry catchments is sensitive to the soil parameter fc, with relatively less accurate results and with less confidence, while simulations in wet catchments are less sensitive to fc, producing more accurate results and with greater confidence.
 Therefore accurate measurements of soil properties (namely, fc) are required for predictions of flow in dry catchments, but this is less important for predictions in wet catchments. A slight exception to this trend was observed for the Waihopai and Mahurangi catchments, whose DI values are similar, yet which failed to display the same RR, error bounds, or sensitivity to fc. Analysis of Figures 6d and 8d shows Waihopai to have less accurate predictions (indicated by RR), larger output errors and greater sensitivity to fc. The most likely cause of this is the difference in the sources of soil information. The source at Mahurangi is extensive and more reliable than the information used at Waihopai, resulting in reduced parameter uncertainty, and reduced predictive uncertainty and sensitivity to parameter variations. Therefore the link established between DI, accuracy, error/confidence and sensitivity only applies when the sources of soil information are the same or of a similar quality. When all catchments have measured soil properties to the accuracy required to achieve similar levels of confidence in the results, the necessity of further increases of model complexity can be investigated. Detailed soil information was not available at all catchments in this case, so additional model complexity was not introduced, although this can be hypothesized.
 The results of simulations, sensitivity analysis and error analysis can also be used to identify the conditions under which additional model complexity is warranted. Simulations in wet catchments such as Ngahere suggest that model parameters have very little influence on the magnitude of streamflow produced at annual, monthly and daily timescales. Accurate predictions with confidence can be made at annual and monthly timescales using very basic models (such as the Manabe bucket), because the parameters and the processes they control have little effect on the water balance. The influence of model parameters (namely, fc) does become important in the timing of daily streamflow and in capturing low-flow events, so the current model complexity is required to predict daily flows. Conversely, simulations in dry catchments such as Moutere suggest that the soil parameters have a stronger influence on the timing and magnitude of streamflow produced at annual, monthly and daily timescales. Therefore accurate predictions, with confidence, require the maintenance of the current model complexity at all three timescales, provided accurate soils information is available. Although detailed soils information may help improve the results, further complexity may be required to achieve daily predictions of acceptable accuracy.
 The combined results suggest that there are clear differences in the complexity required to model streamflow to a predefined accuracy and confidence, between wet and dry catchments, and with changing timescales. To summarize this trend, a series of tables is presented listing the model complexity required for accurate predictions at annual (Table 7a), monthly (Table 7b), and daily (Table 7c) timescales for catchments with varying catchment dryness. All four New Zealand catchments have been considered in these tables, along with an additional five Australian catchments studied by Farmer et al., who found a similar relationship between model complexity and timescale but did not investigate its connection with catchment dryness in detail. The models of increasing complexity compared in Tables 7a, 7b, and 7c are: model A, the simple bucket model (Figure 2a) described by Manabe; model B, a single storage model with a nonlinear subsurface discharge mechanism (Figure 2b); model C, a single storage model with a nonlinear subsurface discharge mechanism and linear base flow (Figure 2c); and model D, more complex models. The complexity required for model D has not been defined, but it is needed for accurate predictions. Suggested complexities may include an unsaturated zone, routing, distributed representation, interception, hillslope representation and nonlinear base flow.
 “Appropriate model complexity” was identified when simulations produced predictions with ρx,y of approximately 0.8 or greater, RR close to 1, and little difference between the observed and predicted signature plots (as described by Farmer et al.). Analysis of the tables suggests that the required model complexity increases with decreasing timescales and with increasing catchment dryness (DI). This trend has been expressed schematically in Figure 9 in the form of a 2-D picture linking required model complexity to the climatic dryness index (DI) and timescale. At this stage, the contours drawn into the schematic plot are merely qualitative indicators. Types of model complexity are added in, where appropriate, to indicate the type of processes that are considered important under the different combinations of DI and timescale. The admittedly qualitative relationship shown in Figure 9 assumes homogeneity of catchment characteristics and roughly similar sized catchments. Although the model complexity needed for accurate predictions of hourly streamflow has been investigated recently by Atkinson using data from the Mahurangi catchment, it has been left out of Figure 9 for reasons of brevity.
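 The numerical part of this criterion can be sketched as follows. Here ρx,y is computed as the Pearson correlation between observed and predicted flows, and RR is assumed to be the ratio of total predicted to total observed flow, an interpretation inferred from the text treating RR ≈ 1 as accurate. The function names, the tolerance on RR, and the flow values are hypothetical, and the comparison of signature plots is not covered.

```python
import math


def correlation(obs, pred):
    """Pearson correlation between observed and predicted flows."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def runoff_ratio(obs, pred):
    """Assumed definition: total predicted flow over total observed flow."""
    return sum(pred) / sum(obs)


def is_acceptable(obs, pred, min_corr=0.8, rr_tol=0.1):
    """Check timing (correlation) and magnitude (RR near 1).
    rr_tol is a hypothetical tolerance; the text only says 'close to 1'."""
    timing_ok = correlation(obs, pred) >= min_corr
    magnitude_ok = abs(runoff_ratio(obs, pred) - 1.0) <= rr_tol
    return timing_ok and magnitude_ok


# Invented monthly flows (mm) for illustration only.
observed = [10.0, 40.0, 25.0, 5.0]
close_fit = [12.0, 38.0, 27.0, 6.0]   # good timing, totals nearly equal
biased = [20.0, 80.0, 50.0, 10.0]     # good timing, flow overpredicted twofold
```

 The `biased` series shows why both measures are needed: it is perfectly correlated with the observations yet fails the check because its RR is 2.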
 The qualitative relationship shown in Figure 9 is a useful guideline that can aid the a priori selection of model complexity. It can help the modeler to choose a simple but effective model, avoiding the use of unnecessary parameterizations that may eventually cloud the physical meaning of the model predictions. With a model thus selected, the dominant controls on flow variability can then be identified and measures taken to accurately measure the appropriate model parameters to ensure acceptably accurate predictions are achieved, with sufficient confidence.
 Using a downward approach, and starting with a simple single bucket water balance model with physically meaningful parameter values, we have systematically added further process complexity (as required), with a view to making accurate predictions of streamflow at the annual, monthly and daily timescales in four New Zealand catchments. The results of sensitivity and error analyses suggest that the dominant control on streamflow variability is the soil parameter fc, which is related to field capacity. For this model, it was found that more accurate predictions can be made with better confidence in wetter catchments, using relatively simple models, since such catchments tend to be less sensitive to fc under the prevailing energy-limited conditions. Under water-limited conditions, such as those found in dry or arid catchments, it is much harder to accurately predict streamflow with confidence, because the model is much more sensitive to fc, so further model complexity may be required. More detailed soils information is therefore required to achieve accurate predictions.
 The results of sensitivity and error analyses enabled a qualitative conceptual relationship to be established between model complexity, a measure of climatic dryness (the dryness index, DI), and the timescale. This relationship, presented in Figure 9, suggests that required model complexity increases with decreasing timescale and with increasing dryness index. It is suggested that the development of this relationship, and its physical interpretation, is a step toward a consistent method of a priori model selection that incorporates just the minimum level of complexity needed to predict streamflow with good confidence and acceptable accuracy. In this way, unnecessary model complexity and parameterization are avoided, allowing us to explore the physical controls on streamflow variability in natural catchments, and then to direct more of our effort toward estimating the critical model parameters, either in the field or from other data sources.
Appendix A: Model Equations
Bucket capacity
Threshold storage parameter and threshold storage
Bare soil evaporation
bucket capacity, the maximum storage of the bucket model, (mm).
soil depth, measured as the depth to an impervious layer promoting lateral subsurface runoff, (mm).
field capacity (dimensionless).
permanent wilting point (dimensionless).
threshold storage parameter (dimensionless) (0 < fc < 1).
threshold storage (mm).
bare soil evaporation (mm/d).
potential evaporation (mm/d).
forest cover (dimensionless) (0 ≤ M ≤ 1).
S(t + 1): storage the next day (mm/d).
subsurface runoff (mm/d).
saturation excess runoff (mm/d).
nonlinear discharge parameter (mm^(1−b) day^b).
nonlinear discharge parameter (dimensionless).
base flow (mm/d).
base flow response time (days).
correlation between observed and predicted streamflow (dimensionless).
runoff ratio (dimensionless).
measure of spread in ρx,y for variations in parameter values (dimensionless).
measure of spread in RR for variations in parameter values (dimensionless).
 The authors are grateful to the National Institute of Water and Atmospheric Research (NIWA), New Zealand, for partial financial support, which enabled the first author to travel to New Zealand at the beginning of this study. The authors thank the many staff of NIWA for their valuable support and for the provision of all necessary data and information. Special thanks are due to Maurice Duncan for his assistance in accessing and interpreting the raw data. The study was funded through an Australian Postgraduate Award (APA) awarded to the first author, and through an Australian Research Council Small Grant awarded to the third author through the University of Western Australia. CWR Reference ED 1557 SA.