A Bayesian approach was used to fit a conceptual transpiration model to half-hourly transpiration rates for a sugar maple (Acer saccharum) stand collected over a 5-month period and probabilistically estimate its parameter and prediction uncertainties. The model used the Penman-Monteith equation with the Jarvis model for canopy conductance. This deterministic model was extended by adding a normally distributed error term. This extension enabled using Markov chain Monte Carlo simulations to sample the posterior parameter distributions. The residuals revealed approximate conformance to the assumption of normally distributed errors. However, minor systematic structures in the residuals at fine timescales suggested model changes that would potentially improve the modeling of transpiration. Results also indicated considerable uncertainties in the parameter and transpiration estimates. This simple methodology of uncertainty analysis would facilitate the deductive step during the development cycle of deterministic conceptual models by accounting for these uncertainties while drawing inferences from data.
 Conceptual or mechanistic models are developed as mathematical representations of the underlying mechanisms governing the modeled processes based on available scientific knowledge or hypotheses [Swartzman and Kaluzny, 1987]. Therefore such models are more suited to Earth systems science applications, such as developing quantitative and scientific understanding of Earth systems in terms of components and processes or making informed decisions regarding long-term management of these systems [Wainwright and Mulligan, 2004], compared with purely empirical models [Box et al., 1978; Beven, 1989]. Conceptual models are also easier to interpret and extend by incorporating additional information. However, such models are not completely mechanistic and necessarily simpler than the real system due to constraints such as the availability of scientific knowledge, data, and computational resources [Oreskes et al., 1994; Ellner et al., 1998; Kendall et al., 1999; Turner et al., 2001]. The simplifications in the model, natural variability in system response, and measurement errors lead to mismatches between modeled and observed responses. For most applications, the mismatches are minimized by estimating the model parameter values through calibration [Klemeš, 1986; Janssen and Heuberger, 1995; Sorooshian and Gupta, 1995]. However, uncertainties inherent in such estimation processes are traditionally not quantified. Lack of this information compromises the ability to statistically test hypotheses, compare model structures for suitability for specific applications, compare parameter values for different systems, or provide an estimate of expected errors in the predictions obtained using the calibrated model.
2. A Discussion of Uncertainty Estimation Methodologies
 Uncertainties associated with conceptual hydrologic models may be represented using several distinct approaches, which lead to the various uncertainty estimation methodologies currently used for this purpose. One frequently used representation stems from the concept of Pareto optimality proposed by Gupta et al. . The derived methodologies [e.g., Gupta et al., 1999; Boyle et al., 2000; Madsen, 2000; Wagener et al., 2001; Vrugt et al., 2003a] estimate the parameter uncertainty in the form of a Pareto solution set identified through multiobjective optimization and quantify the prediction uncertainty from the set of predictions obtained using the entire Pareto solution set or one of its subsets. Another commonly used methodology is the generalized likelihood uncertainty estimation procedure (GLUE) [Beven and Binley, 1992], which adopts the concept of equifinality [Beven, 1993] to represent the uncertainties. In GLUE, the definition of the likelihood function in a Bayesian framework is generalized to include functions that are traditionally used as model performance measures, which are rescaled to resemble probability distribution functions. Although probability distribution functions can also be used within GLUE, rescaled performance measures are most commonly used [e.g., Freer et al., 1996; Franks and Beven, 1997; Franks et al., 1999; Cameron et al., 1999; Page et al., 2003; Candela et al., 2005; McCabe et al., 2005; McMichael et al., 2006] due to the reasons explained by Beven and Freer . In such situations, the likelihood value “may be treated as a fuzzy measure that reflects the degree of belief of the modeler” [Beven and Binley, 1992, p. 287]. 
Therefore it may not be possible to interpret such likelihood values strictly in terms of probability [e.g., Thiemann et al., 2001; Engeland and Gottschalk, 2002; Christensen, 2003; Montanari, 2005], and they may have alternative interpretations, such as in terms of fuzzy set theory [e.g., Franks and Beven, 1999; Samanta and Mackay, 2003]. As numerous examples in the hydrologic literature show, methodologies based on the above representations of uncertainty are very useful in various contexts, such as where the errors are difficult to describe in terms of a probability density function. However, an explicitly probabilistic representation of the uncertainties facilitates the use of well-established statistical methods for checking assumptions and drawing inferences. Therefore this representation was considered to be the most compatible with the objectives of this study.
 Several uncertainty estimation methodologies based on probability theory are also available in the hydrologic literature. These methodologies usually recast a deterministic conceptual model in a probabilistic form and use either analytical techniques [e.g., Kuczera, 1983; Krzysztofowicz, 1999; Montanari and Brath, 2004] or simulation techniques based on Bayesian statistics [e.g., Kuczera and Parent, 1998; Bates and Campbell, 2001; Engeland and Gottschalk, 2002; Thiemann et al., 2001; Balakrishnan et al., 2003; Vrugt et al., 2003b] to estimate the uncertainties. One advantage of using a Bayesian approach is that many powerful computational techniques are available for Bayesian analysis [Gelman et al., 1995], e.g., Markov chain Monte Carlo (MCMC) simulations, which are not available for traditional statistical analysis. High-dimensional nonlinear models common in the environmental sciences are difficult to analyze without such techniques. In addition, a Bayesian approach allows for the explicit introduction of assumptions and prior knowledge of the system into the model in the form of prior distributions [Box and Tiao, 1973]. Therefore this approach is suitable for combining knowledge from diverse sources commonly used for building models of natural systems. A Bayesian methodology using the MCMC simulation technique was adopted here with the above considerations.
 Inferences from a Bayesian analysis depend on the choice of prior distributions and the model, which determines the form of the likelihood function. Therefore the applicability of the assumptions that determine the form of the likelihood function may vary from system to system. As seen in the results of the studies cited above, various problems may be encountered during the analysis, e.g., underestimation of uncertainty, nonconvergence of Markov chains to stationary distributions, and residuals not conforming to assumptions. Therefore there is a need for further methodological developments and application studies in order to assess the usefulness of the Bayesian approach in various conceptual modeling contexts.
3. Transpiration Simulation Model
 The conceptual model used in this study calculates the rate of canopy transpiration from environmental data using the Penman-Monteith equation [Monteith, 1965]. The mathematical functions associated with the estimated parameters and the adopted process simplifications in the model are briefly described below. A complete description of the model is provided by Samanta . The model input was a sequence of environmental measurements, x. Each element in x, xi (i = 1, 2, …, n; in the temporal order of acquisition), was a vector of values for the environmental variables described in section 5. The rate of total canopy transpiration per unit ground area, Ecanopy(xi, β), corresponding to the input xi and the vector of calibrated parameters, β, was calculated as
where s is the slope of the saturation mole fraction function, Rabs is the absorbed global radiation per unit ground area, es is the surface emissivity, σb is the Stefan-Boltzmann constant, Ta is the air temperature above canopy, G is the ground heat flux, γ* is the apparent psychrometric constant, λ is the latent heat of vaporization of water, Da is the vapor pressure deficit above canopy, pa is the atmospheric pressure, and gv is the vapor conductance of the transpiring surface. The radiation environment within the canopy was modeled from incident photosynthetic photon flux density [Spitters et al., 1986] with the canopy subdivided into two classes of leaves, one sunlit and the other shaded [Campbell and Norman, 1998]. The boundary layer vapor conductance, gva, and the vapor conductance of the canopy surface, gvcanopy, were combined in series to calculate gv. The mechanism of turbulent transport with diabatic corrections was used to calculate gva [Campbell and Norman, 1998]. For this study, only the parameters associated with the calculation of gvcanopy were calibrated, as the transpiration estimates from a model of this type have very low sensitivity to aerodynamic conductance parameters [Dekker et al., 2001]. The remaining model parameters were held constant at the values shown in Table 1, which were adopted based on the recommendations by Campbell and Norman .
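As a sketch of the form that equation (1) takes, assuming it follows the Penman-Monteith formulation of Campbell and Norman [1998] with the symbols defined above, the expression below may be useful. It is a reconstruction under that assumption and not necessarily the exact expression used; writing the surface emissivity as ε_s and taking Ta in kelvin in the emitted-radiation term are also assumptions.

```latex
E_{\mathrm{canopy}}(x_i, \beta) =
  \frac{s\,\bigl(R_{\mathrm{abs}} - \varepsilon_s\,\sigma_b\,T_a^{4} - G\bigr)
        + \gamma^{*}\,\lambda\,g_v\,D_a / p_a}
       {\lambda\,\bigl(s + \gamma^{*}\bigr)}
```

In this form the calibrated parameters β enter only through gv, via the canopy surface conductance gvcanopy.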
Table 1. List of Uncalibrated Transpiration Model Parameters [a]
[a] These parameters are dimensionless and were held constant at the values shown.
leaf angle distribution parameter
fraction of total incident solar energy in photosynthetically active radiation band
leaf absorptivity in photosynthetically active radiation band
leaf absorptivity in near-infrared band
ratio of zero plane displacement and canopy height
ratio of momentum roughness length and canopy height
ratio of heat and momentum roughness lengths
 The value of gvcanopy was calculated as the sum of the surface vapor conductances for the sunlit and shaded classes of leaves [Norman, 1993; Campbell and Norman, 1998]. The surface vapor conductance for each class of leaves, gvc, was calculated as
where gS is the stomatal conductance per unit leaf area and Lc is the single-sided leaf area index. The value of Lc was calculated by assuming a spherical distribution of leaves in the canopy using the following equation [Campbell, 1990; Norman, 1993; Campbell and Norman, 1998]:
where L is the single-sided leaf area index of the entire canopy and Kb is the canopy extinction coefficient for beam radiation. The stomatal conductance model proposed by Jarvis  (subsequently referred to as the Jarvis model) was used for calculating gS as
where gSmax is the highest conductance for fully developed leaves per unit leaf area, Dc is the vapor pressure deficit within the canopy, δ is the linear rate of reduction in gS with increasing Dc, Qp is the average photosynthetic photon flux density specific to the class of leaves, and A is the ratio of the asymptotic value of gS at infinite Qp in the absence of other constraints and the value of the derivative (dgS/dQp) at Qp = 0. Effects of air temperature and soil moisture on gS were expected to be minor based on other analyses of surface flux data from this region [e.g., Ewers et al., 2002; Cook et al., 2004; Desai et al., 2005] and therefore were not included in the model.
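The conductance calculations described above can be sketched in a few lines. The functional forms below are inferred from the symbol definitions in the text (a linear reduction with Dc at rate δ, a hyperbolic light response whose asymptote-to-initial-slope ratio is A, and the usual sunlit leaf area expression for a canopy with extinction coefficient Kb); they are assumptions, not a transcription of equations (2) to (4). The function names and the values of A, Kb, Dc, and Qp in the example are hypothetical.

```python
import math


def sunlit_shaded_lai(L, Kb):
    """Split the canopy leaf area index L into (sunlit, shaded) parts.

    Assumed form: L_sun = (1 - exp(-Kb * L)) / Kb, L_shade = L - L_sun.
    """
    L_sun = (1.0 - math.exp(-Kb * L)) / Kb
    return L_sun, L - L_sun


def jarvis_gs(gSmax, delta, A, Dc, Qp):
    """Stomatal conductance per unit leaf area (assumed Jarvis-type form).

    gS = gSmax * (1 - delta * Dc) * Qp / (A + Qp)
    The light term tends to 1 as Qp grows and has initial slope 1 / A,
    so the ratio of asymptote to initial slope is A, as in the text.
    """
    vpd_factor = 1.0 - delta * Dc
    light_factor = Qp / (A + Qp)
    return gSmax * vpd_factor * light_factor


def canopy_conductance(gSmax, delta, A, Dc, Qp_sun, Qp_shade, L, Kb):
    """Sum of the sunlit and shaded surface conductances, gvc = gS * Lc."""
    L_sun, L_shade = sunlit_shaded_lai(L, Kb)
    return (jarvis_gs(gSmax, delta, A, Dc, Qp_sun) * L_sun
            + jarvis_gs(gSmax, delta, A, Dc, Qp_shade) * L_shade)


# gSmax = 0.1045 and delta = 0.2169 are the posterior estimates quoted in
# the text and L = 4.6 is the measured leaf area index. A, Kb, Dc and the
# two Qp values are hypothetical inputs chosen only for illustration.
g_example = canopy_conductance(0.1045, 0.2169, A=100.0, Dc=1.0,
                               Qp_sun=800.0, Qp_shade=150.0, L=4.6, Kb=0.5)
```

With this light response, gS reaches half of its light-saturated value when Qp equals A, which gives a direct way to read the calibrated value of A.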
 The transpiration data used in this study were obtained between 5 May 2001 and 19 September 2001 and are shown in Figure 1 plotted against ordinal day of year, DOY. The data points shown as solid circles were used for estimation of parameter values, while the entire data set was used for the leaf area adjustment explained below. The transpiration values were generally higher during the middle of this period compared with those at either end. However, the model above does not include any mechanism capable of capturing this gradual change and therefore is likely to underestimate the transpiration during the middle of the simulation period and overestimate it during the beginning and the end. This gradual change in transpiration over the 4.5-month period is likely to be the combined effect of a number of processes, e.g., leaf aging and changes in leaf, sapwood, and root areas [e.g., Janecek et al., 1989; Kikuzawa, 1995; Morecroft and Roberts, 1999; Wilson et al., 2000; Wang et al., 2004]. However, modeling the effects of the individual processes explicitly was not possible here due to the unavailability of requisite data, e.g., frequent monitoring of L. Therefore a simple semiempirical approach was adopted by representing the combined effect of all the relevant processes by an effective L dynamic derived from observed transpiration values. In this approach, which is similar in principle to but different in its mathematical formulation from the approach adopted by Dekker et al. , L in equation (3) was replaced with LDOY, an adjusted value of the leaf area index as a function of DOY. LDOY calculations were carried out in two steps. First, a second-degree polynomial was obtained by fitting the observed transpiration to DOY using the method of local polynomial regression by weighted least squares (loess) [Cleveland et al., 1992] and the open source statistical language R [R Development Core Team, 2004]. 
The resulting curve is shown superimposed on the observed transpiration values in Figure 1. Then the fitted transpiration values, lfDOY, obtained from this curve for each DOY, were used to calculate LDOY using the following equation:
where lfmax is the maximum value of lfDOY during the period and lfscl is a calibrated parameter that determines the rate of change of LDOY with respect to lfDOY. The value of lfDOY is constant over any one day and therefore does not influence the modeling of transpiration at half-hourly time steps within a single day. The parameters associated with the gvc model component described above, namely, gSmax, δ, A, and lfscl, were components of the parameter vector, β, estimated using the MCMC methodology described in the next section.
4. MCMC Simulation Method for Model Calibration and Uncertainty Estimation
 A Bayesian approach [Bayes, 1763], with the assumption that the errors are independent and normally distributed with a constant but unknown variance, σ2, was used to fit the transpiration model to a series of observed transpiration data, E. With the addition of the normal error term, the transpiration model could be expressed probabilistically as
where Ecanopy(xi, β) is the transpiration rate modeled using equation (1), Ei is the element in E temporally corresponding to xi, and ɛi is the error. Therefore, from the properties of the normal distribution and the assumption of independent errors, the likelihood function for the entire series E, containing n observations, is given by
The noninformative prior distribution used in this analysis was
which assumes that β is distributed uniformly within a specified interval and the prior distribution of σ is uniform over logσ [Box and Tiao, 1973; Gelman et al., 1995]. The upper and lower limits used for the prior distributions of β, which are shown in Table 2, were based on the following considerations. In order to ensure a positive value of gvc for all xi, the lower limit of gSmax was placed at 0.001 mol m−2 s−1, a very small positive value for tree species. The upper limit for gSmax was placed at 0.5 mol m−2 s−1, which was more than twice the maximum stomatal conductance for sugar maple reported by Ellsworth and Reich [1992a]. To ensure positive gvc, upper limits for the parameters δ and lfscl were set at 0.5 kPa−1 and 1.2, respectively. Conceptually, gvc decreases with increasing Dc and increases with increasing Qp, and therefore the lower limits for δ and A were set at zero to preserve the nature of these relationships. The remaining limits were placed well beyond the expected values of the parameters so that the limits would not influence the posterior parameter distributions. These limits were determined from the results of preliminary simulations and were checked against the final results.
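For reference, the probabilistic model, likelihood, and prior described in this section can be written out as below. This is a sketch built from the stated assumptions (independent normal errors with constant variance, a uniform prior on β within its limits, and a prior on σ that is uniform over log σ); the notation is an assumption and may differ from that of the numbered equations.

```latex
\begin{aligned}
E_i &= E_{\mathrm{canopy}}(x_i, \beta) + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma^2), \\
p(E \mid \beta, \sigma^2) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left\{ -\frac{\bigl[E_i - E_{\mathrm{canopy}}(x_i, \beta)\bigr]^2}{2\sigma^2} \right\}, \\
p(\beta, \sigma^2) &\propto \frac{1}{\sigma^2}
  \quad \text{for } \beta \text{ within the prior limits, and } 0 \text{ otherwise.}
\end{aligned}
```

A prior that is uniform over log σ corresponds to a density proportional to 1/σ when expressed in σ, or 1/σ² when expressed in σ².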
Table 2. Posterior Estimates and Uncertainties for the Parameter Vector, β = (gSmax, δ, A, lfscl), and the Error Standard Deviation, σ
Limits on Prior Distribution
95% Posterior Interval
mol m−2 s−1
μmol m−2 s−1
8.7020 × 10−6
8.3190 × 10−6, 9.1159 × 10−6
 Using the above prior, the joint posterior distribution, also called the target distribution, is defined as
The posterior distribution was sampled using a Markov chain Monte Carlo (MCMC) simulation method based on the Metropolis algorithm [Metropolis and Ulam, 1949]. Detailed descriptions of the algorithm may be found in texts on Bayesian statistics [e.g., Gelman et al., 1995], as well as in the hydrologic literature [e.g., Kuczera and Parent, 1998; Vrugt et al., 2003b]. In MCMC simulation, draws from the joint posterior distribution are iteratively simulated by first generating a candidate parameter value from a proposal distribution (candidate-generating density). Next, the decision to accept or reject the candidate parameter value is made based on the ratio of the posterior density at the candidate parameter value to that at the currently accepted parameter value. After a chain has converged to its stationary distribution, subsequent draws are considered to be samples from the posterior distribution.
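The accept/reject step can be illustrated with a minimal random-walk Metropolis sampler. This is a sketch under stated assumptions, not the code used in the study: the parameter vector is taken as (β, σ), the log posterior combines the normal likelihood with a uniform prior on β within limits and a 1/σ prior on σ, and the straight-line `toy_model`, its data, the step sizes, and all function names are hypothetical stand-ins for the transpiration model.

```python
import math
import random


def log_posterior(theta, data, model, lower, upper):
    """Log posterior (up to a constant) for theta = (*beta, sigma)."""
    *beta, sigma = theta
    inside = all(lo <= b <= hi for b, lo, hi in zip(beta, lower, upper))
    if sigma <= 0.0 or not inside:
        return -math.inf  # zero prior density outside the limits
    n = len(data)
    sse = sum((e - model(x, beta)) ** 2 for x, e in data)
    log_lik = (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
               - sse / (2.0 * sigma ** 2))
    log_prior = -math.log(sigma)  # uniform over log(sigma)
    return log_lik + log_prior


def metropolis(log_post, start, step, n_iter, rng):
    """Random-walk Metropolis with independent normal proposals."""
    current = list(start)
    lp = log_post(current)
    chain, accepted = [], 0
    for _ in range(n_iter):
        cand = [c + rng.gauss(0.0, s) for c, s in zip(current, step)]
        lp_cand = log_post(cand)
        # Accept with probability min(1, posterior ratio); 1 - random()
        # lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_cand - lp:
            current, lp = cand, lp_cand
            accepted += 1
        chain.append(list(current))
    return chain, accepted / n_iter


# Hypothetical toy problem: recover the slope of y = 2x from noisy data.
rng = random.Random(1)
toy_model = lambda x, beta: beta[0] * x
toy_data = [(float(x), 2.0 * x + rng.gauss(0.0, 0.5)) for x in range(1, 21)]
target = lambda th: log_posterior(th, toy_data, toy_model, [0.0], [10.0])

chain, rate = metropolis(target, start=[1.0, 1.0], step=[0.02, 0.1],
                         n_iter=5000, rng=rng)
kept = chain[len(chain) // 2:]  # discard the first half as burn-in
b_mean = sum(th[0] for th in kept) / len(kept)
```

Working with log densities avoids numerical underflow when the likelihood is a product over many observations.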
 For the present analysis, four Markov chains, initialized with randomly generated starting parameter values, were run for 200,000 iterations each. The number of iterations necessary for chain convergence was determined visually by plotting traces of sampled parameter values against iterations for all the chains [Kass et al., 1998] and quantitatively by monitoring the potential scale reduction factor [Gelman and Rubin, 1992], requiring that its value not exceed 1.2 for any parameter over the second halves of the chains, as recommended by Gelman et al. .
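The quantitative convergence check can be sketched as follows for a single parameter. The function uses a commonly quoted simplified form of the potential scale reduction factor (the square root of the pooled variance estimate divided by the mean within-chain variance); the exact variant used in the study is not stated here, so this form and the function name are assumptions.

```python
import math
import random
import statistics


def potential_scale_reduction(chains):
    """Simplified potential scale reduction factor for one parameter.

    chains: list of m equal-length sequences of draws (for example, the
    second halves of m Markov chains).
    """
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between_over_n = statistics.variance(means)  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


rng = random.Random(0)
# Four well-mixed chains drawn from the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Two chains stuck in different regions of parameter space.
stuck = [[0.0, 1.0] * 50, [10.0, 11.0] * 50]

r_mixed = potential_scale_reduction(mixed)
r_stuck = potential_scale_reduction(stuck)
```

Values close to 1 indicate that the chains are sampling the same distribution, while values well above the 1.2 threshold indicate that they have not yet mixed.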
 The proposal distribution used in the Metropolis algorithm is symmetric and centered at the currently accepted parameter value. However, the choice of the proposal distribution can greatly affect the efficiency of sampling from the target distribution [Chib and Greenberg, 1995; Gelman et al., 1995; Vrugt et al., 2003b]. A low proportion of accepted jumps to candidate parameter values (i.e., a low acceptance rate) and high autocorrelation in the chains often indicate an inefficient proposal distribution. Chib and Greenberg  show that many different forms of proposal distributions may be used, as long as reasonable acceptance rates are achieved and the resulting chains cover the parameter space. Gelman et al.  recommend tuning the scale of the proposal distribution to obtain an acceptance rate of about 0.23 for five or more parameters through experimentation and run-time adjustments. In this study, the proposal distribution for each parameter was normal with the mean at the current value of that parameter and independent of the other parameters. The variance was updated every 1000 iterations using the parameter values from the 20 most recently accepted jumps to achieve acceptance rates between 0.20 and 0.21. Because the acceptance rates were slightly lower than the recommended value, the issue of autocorrelation in the parameter samples was further addressed by subsampling, as obtaining reasonable samples was considered to be more important than computational efficiency for this study [Geyer, 1992]. From autocorrelation plots of the chains, a suitable lag value was determined beyond which the autocorrelation was negligible for all parameters compared with a 95% confidence interval for an uncorrelated series. This lag value was used to systematically subsample each Markov chain. The subsampled sequences of parameter values from the second halves of all the chains were combined and stored as an array of simulated draws from the posterior distributions. 
This array is referred to as the posterior array and was used for posterior inferences regarding parameters and predictions.
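The lag selection and subsampling can be sketched as below. The rule used here, taking the first lag at which the sample autocorrelation falls inside the approximate 95% band for an uncorrelated series (±1.96/√n), is a simplification of the visual inspection described above, and the AR(1) test chain and function names are hypothetical.

```python
import math
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def thinning_lag(series, max_lag=500):
    """First lag whose autocorrelation is inside the 95% band."""
    band = 1.96 / math.sqrt(len(series))
    for lag in range(1, max_lag + 1):
        if abs(autocorrelation(series, lag)) < band:
            return lag
    return max_lag


def thin(chain, lag):
    """Keep every lag-th draw of a chain."""
    return chain[::lag]


# Hypothetical strongly autocorrelated chain: AR(1) with coefficient 0.9.
rng = random.Random(0)
chain = [0.0]
for _ in range(19999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

lag = thinning_lag(chain)
thinned = thin(chain, lag)

# Keeping every 100th draw from the last 100,000 iterations of each of
# four chains gives 4 * 1000 = 4000 draws in the combined array.
posterior_array_size = 4 * len(thin(list(range(100000)), 100))
```

For several parameters, the same check would be applied to each one and the largest lag used for all of them.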
 Posterior estimates and uncertainties of β and σ2 were summarized by their expected values and posterior intervals. The expected value of a parameter was estimated by the mean of its posterior distribution. The posterior interval corresponding to probability 1 − α is defined as the range of values such that exactly α/2 of the posterior probability lies above this range and α/2 lies below it [Gelman et al., 1995]. The posterior intervals for β and σ2 were computed from the frequency of occurrence of values in the posterior array using the above definition. The posterior estimate and uncertainty for E, used for checking the model, were computed by simulating its replicates [Gelman et al., 1995]. Each sample in the posterior array of parameter values was used once to simulate one replicate. Therefore the number of simulated transpiration series in each replicate was equal to the number of samples in the posterior array. Ten such replicates were used to derive the estimated mean and the posterior intervals for E in order to reduce the effects of chance occurrences of unlikely values, although using fewer replicates did not lead to visibly different posterior intervals in this case.
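The summaries described above can be computed directly from the draws in the posterior array. The sketch below returns the posterior mean and a central interval that leaves equal probability in each tail; the linear interpolation between order statistics and the function names are assumptions made for illustration.

```python
def quantile(sorted_values, p):
    """Quantile of pre-sorted values by linear interpolation."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def posterior_mean(samples):
    """Expected value estimated by the mean of the posterior draws."""
    return sum(samples) / len(samples)


def posterior_interval(samples, prob=0.95):
    """Central interval holding `prob`, with (1 - prob) / 2 in each tail."""
    tail = (1.0 - prob) / 2.0
    ordered = sorted(samples)
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


# Hypothetical draws 0, 1, ..., 100 make the result easy to check by hand:
# the 95% interval runs from about 2.5 to about 97.5.
draws = [float(v) for v in range(101)]
mean = posterior_mean(draws)
low, high = posterior_interval(draws, 0.95)
```

The same functions apply to each column of the posterior array and to the simulated transpiration replicates at each time step.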
 The data used for this study were collected as part of the Chequamegon Ecosystem-Atmosphere Study (ChEAS) [Bakwin et al., 1998; Davis et al., 2003], a collaborative research effort that maintains multiple data collection sites located in and around the Chequamegon-Nicolet National Forest in northern Wisconsin. The above-canopy micrometeorological data, incident photosynthetic photon flux density, and ground heat flux data were from the Willow Creek site [Cook et al., 2004] with small gaps filled in using data from the WLEF TV eddy flux tower at Park Falls. The gaps were possibly due to equipment malfunctions in the field and constituted less than 10% of the used data set. The sap flux and midcanopy micrometeorological data were from the Hay Creek site within the adjacent Hay Creek Wildlife Management Area [Ewers et al., 2002, 2007a, 2007b]. The study sites at Willow Creek and Hay Creek are a little over 21 km apart and have similar sandy loam soils. The forests at both sites consist of upland hardwoods dominated by sugar maple (Acer saccharum Marsh.) and basswood (Tilia americana L.).
 The canopy transpiration data were average half-hourly transpiration rates per unit ground area (mm s−1) for eight sugar maple trees at Hay Creek obtained from measurements of sap flux and sapwood area per unit ground area using methodologies described by Oren et al.  and Ewers et al. . The average canopy height of the sugar maple trees at this site was 18.6 m. The value of L was 4.6, calculated from litter-fall data collected in 2001 [Ewers et al., 2007b]. The simulated period was from 9:00 A.M. to 6:00 P.M. Central Standard Time (CST) on days between 5 May 2001 and 19 September 2001 (DOY 125 and 262, respectively). However, only 83 observed days out of the 138 days within the above period could be used due to the limitations described later in this section. The maximum gap between two successive simulated days was 8 days.
 The model input vector, xi, consisted of Dc (kPa), G (W m−2) at 7.5 cm soil depth, and above-canopy data measured at 29.6 m, namely, Ta (°C), Da (kPa), pa (kPa), incident photosynthetic photon flux density (μmol m−2 s−1), and wind speed (m s−1). The midcanopy measurements were made at two thirds of the canopy height. A sequence of 2579 half-hourly transpiration values was available within the above period. This entire sequence (both solid and open circles in Figure 1) was used to calculate the lfDOY values. However, xi values corresponding to only 1899 transpiration values in the above sequence were available for running the model. Out of these 1899 measurements, 708 data points corresponding to Da values less than 0.6 kPa were discarded due to the potential for large errors in transpiration estimates obtained from sap flux [Ewers and Oren, 2000]. Moreover, the process of evaporation of intercepted precipitation or dew at the leaf surface was not incorporated in the model for the sake of simplicity. Therefore 182 additional data points, coincident with precipitation or temperature inversions, were removed from the data, as the omitted evaporation process could be important at these points. Finally, an additional 37 data points corresponding to very low incident photosynthetic photon flux density, less than 200 μmol m−2 s−1, were removed from the data. The data eliminated by this last condition were measurements made on a few of the observed days after 4:00 P.M. After eliminating the measurements based on the above considerations, the number of elements, n, in the sequences E and x was 972. Because the goal of this study was evaluation of the fit of the model and uncertainty analysis, all of E and x were used for that purpose without setting aside any data for split sample validation.
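The screening steps and the resulting sample size can be checked with a short sketch. The counts are those reported above; the record layout (dictionary keys) and the function name are hypothetical, since the structure of the original data files is not described.

```python
def keep_observation(rec):
    """Apply the three screening rules to one half-hourly record.

    rec is a dict with hypothetical keys:
      "Da_kPa"           above-canopy vapor pressure deficit (kPa)
      "wet_or_inversion" True if coincident with precipitation or a
                         temperature inversion
      "Qp_umol"          incident photosynthetic photon flux density
                         (umol m-2 s-1)
    """
    if rec["Da_kPa"] < 0.6:
        return False  # sap flux estimates unreliable at low Da
    if rec["wet_or_inversion"]:
        return False  # evaporation of intercepted water not modeled
    if rec["Qp_umol"] < 200.0:
        return False  # very low light
    return True


# Counts reported in the text, applied in sequence.
available = 1899
removed = {"low_Da": 708, "precipitation_or_inversion": 182, "low_light": 37}
n = available - sum(removed.values())  # 1899 - 708 - 182 - 37 = 972
```

Because the rules are applied in sequence, each reported count refers to the records still present after the preceding rule.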
 All four Markov chains showed visual indication of convergence within the initial 100,000 iterations, which were treated as the “burn in” period. All the chains were similar to the one shown in Figure 2, although with different starting values for β and σ2. Convergence to stationary distributions for all parameters and similarity among chains were also supported by the values of the potential scale reduction factor, which were between 1.0001 and 1.0016. The final 100,000 iterations from each chain were subsampled (refer to section 4) by retaining every 100th sample to obtain a total of 4000 samples of β and σ2 values in the posterior array. The posterior distributions of β and σ are shown as histograms in Figure 3 and summarized in Table 2 by posterior estimates and 95% posterior intervals. The posterior distributions were approximately symmetric with single modes and well within the upper and lower bounds placed on the prior distribution of β. The much smaller spread of the posterior distributions for β and σ, compared with their prior distributions, indicated that these parameters were identifiable within the Bayesian framework [Kass et al., 1998]. The parameters within β show evidence of correlation in the two-dimensional contour plots of the posterior distributions (Figure 4). These correlations were consistent with equations (2) and (4), as different combinations of parameter values may generate the same value of gS. However, from the contour plots shown in Figure 5, the value of σ did not have any obvious correlation to β. Therefore the prediction uncertainty appeared to be independent of the transpiration estimates themselves, which depend on β.
 The plot of estimated against observed transpiration values (Figure 6a) shows an approximately linear relationship. However, substantial estimation errors are indicated by the wide spread of points around the superimposed one-to-one line. Moreover, the low observed values of transpiration appeared to be overestimated by the model. The residuals (estimated transpiration subtracted from the observed values) did not show pronounced bias or obvious indications of unequal variance when plotted against estimated transpiration (Figure 6b). The overall structure of the residuals showed reasonable consistency with the assumption of normality. However, the highest 5%, approximately, of the estimated transpiration values overestimated the observations. The very low transpiration estimates were also generally associated with small amounts of overestimation; however, the most severe cases of underestimation were also associated with the low transpiration estimates.
 The posterior density regions corresponding to 99, 95, 90, 75, and 50% posterior probabilities, which characterized the uncertainty in estimated transpiration, bounded 98.8, 94.1, 89.5, 74.6, and 50.0% of the observations, respectively. Therefore the posterior density regions provided acceptable probabilistic estimates of prediction uncertainty for the transpiration sequence as a whole. The posterior density region corresponding to 95% posterior probability is shown in Figure 7, superimposed on estimated and observed transpiration values. In general, the transpiration estimates followed the observations closely and the observations were well within the 95% posterior density region. However, relatively large differences between estimated and observed transpiration occurred periodically, e.g., DOY periods 125–138, 217–219, and 232–234. Nearly 75% of the observations that lie outside this posterior density region were from only seven out of the 83 observed days, namely, DOY 135, 136, 214, 232, 233, 234, and 254. Even for small residual values, instances of underestimation or overestimation were usually clustered along the sequence. Therefore the errors may not be considered strictly independent. On the basis of the above results, the estimates of transpiration and prediction uncertainty obtained from this model may not be accurate for short periods, e.g., a week, but might provide acceptable estimates for longer periods.
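The coverage check reported above can be sketched as follows: for each time step, an interval is computed from the simulated replicates and the fraction of observations falling inside it is counted. Central (equal-tail) intervals are assumed here as a stand-in for the posterior density regions, and the standard-normal toy data and function names are hypothetical.

```python
import random


def central_interval(draws, prob):
    """Equal-tail interval from simulated draws (linear interpolation)."""
    ordered = sorted(draws)
    last = len(ordered) - 1

    def q(p):
        pos = p * last
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

    tail = (1.0 - prob) / 2.0
    return q(tail), q(1.0 - tail)


def coverage(observed, predictive_draws, prob):
    """Fraction of observations inside their per-step interval."""
    inside = 0
    for obs, draws in zip(observed, predictive_draws):
        low, high = central_interval(draws, prob)
        if low <= obs <= high:
            inside += 1
    return inside / len(observed)


# Hypothetical check: when observations and replicates come from the same
# distribution, the 95% intervals should bound roughly 95% of the data.
rng = random.Random(42)
observed = [rng.gauss(0.0, 1.0) for _ in range(300)]
predictive = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(300)]
cov95 = coverage(observed, predictive, 0.95)
```

Coverage close to the nominal probability at several levels, as reported for the 99, 95, 90, 75, and 50% regions, supports the prediction uncertainty estimates for the sequence as a whole, but it does not reveal clustering of misses in time.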
 On the basis of the evidence of chain convergence and the smaller spread of the posterior distributions compared with the noninformative priors (Table 2), the parameters for the conceptual transpiration model were identifiable within this Bayesian framework [Kass et al., 1998]. However, large uncertainties were associated with β, σ, and the transpiration estimates. The estimates of β and σ were not correlated. Therefore the use of optimum β and σ, without accounting for their uncertainties, would lead to an underestimation of prediction uncertainty.
 Because noninformative priors were used for this analysis, the parameter estimates could be considered as solely determined by the information in the transpiration data within the framework of the model used, i.e., the conceptual transpiration model with normally distributed errors [Gelman et al., 1995]. However, the parameters gSmax, δ, and A also have conceptual interpretations, as they describe the dependence of stomatal conductance on environmental conditions. Therefore their values might also be determined using approaches other than calibration of the transpiration model using the same underlying conceptualizations. As an example, the implications of parameterizing this model by the alternative approach of regressing directly observed stomatal conductance on environmental conditions are discussed below by comparing the values of similar parameters obtained by the two approaches. Analyses of experimental data for sugar maple in Wisconsin by Ellsworth and Reich [1992a, 1992b] and Tjoelker et al.  are used for the following comparison. The values of maximum stomatal conductance, 0.15 mol m−2 s−1 (clearing) and 0.095 mol m−2 s−1 (understory), compare reasonably well with the gSmax estimate of 0.1045 mol m−2 s−1, considering the associated uncertainties. The slightly lower value of gSmax might be due to the calibration effect of the Penman-Monteith equation [Baldocchi et al., 1991]. The difference between the sensitivity of midafternoon stomatal conductance to daily maximum leaf-to-air vapor pressure deficit (0.284 for clearings and insignificant for understory) and the estimated value of δ (0.2169) appears to be significant. This difference might be due partially to the lower mean value of gSmax compared with the average maximum stomatal conductance for sugar maple, an interdependence noted by Oren et al.  and also seen in the contour plot between gSmax and δ (Figure 4). 
The photosynthetic capacity was found by Ellsworth and Reich [1992b] to reach 95% of the maximum at a photosynthetic photon flux density of 255 μmol m−2 s−1. In contrast, only about 73% of the maximum conductance is reached at the Qp value of 255 μmol m−2 s−1 with the calibrated A value, which amounts to a significant difference if gS is considered to be proportional to the photosynthetic capacity [Leuning, 1995]. The higher dependence of gS on Qp in the model might be the result of additional constraints on gS present in reality, e.g., the effects of temperature and soil moisture, but not imposed in the model. The interdependence between lfscl and A (Figure 4) suggests that a different model of the L dynamic might also influence the estimate of A through lfscl and help to resolve this difference. Resolving the various possible underlying causes of these differences, e.g., simplifications in the model that affect the conceptualization, differences between the two compared systems, and holding some of the parameters fixed (Table 1) instead of estimating them from data, is difficult without further data collection and modeling directed specifically at this goal. However, the differences noted above suggest that parameterizing this transpiration model solely on the basis of conductance measurements may lead to bias in the transpiration estimates, and therefore this model may need calibration with transpiration data when unbiased estimates are required.
 When the entire sequence of transpiration is considered, the residuals do not appreciably deviate from the assumption of normally distributed errors with constant variance (Figure 6b). However, a comparison between the observed and estimated transpiration sequences (Figure 7) shows that the assumption of random errors was violated at scales of the order of 1 or 2 days. No physical or conceptual explanation of these observed patterns was readily obtainable from direct correlations between the residuals and the data available for this modeling exercise. Although these violations did not lead to underestimation or overestimation of prediction uncertainty for the whole sequence, as shown by the posterior density regions, their systematic structure indicates deficiencies in the model and suggests the possibility of more accurate and reliable modeling of transpiration, particularly at small timescales, through appropriate modifications to the model.
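One simple way to look for the kind of short-timescale structure described above is to inspect the sample autocorrelation of the residuals at lags of 1 to 2 days. The following is a minimal sketch only, not the diagnostic used in this study: the function name, the synthetic residual series, and the 2-day sinusoid standing in for systematic structure are all hypothetical, and 48 half-hourly steps per day is assumed.

```python
import numpy as np

def residual_autocorrelation(residuals, max_lag):
    """Sample autocorrelation of a residual series for lags 0..max_lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    denom = np.dot(r, r)
    return np.array([np.dot(r[: r.size - k], r[k:]) / denom
                     for k in range(max_lag + 1)])

# Synthetic residuals (hypothetical, for illustration only).
steps_per_day = 48                 # assumed half-hourly sampling
n = steps_per_day * 150            # roughly five months of values
rng = np.random.default_rng(0)
white = rng.normal(0.0, 1.0, n)    # residuals consistent with random errors
t = np.arange(n)
# Add an artificial pattern with a 2-day period to mimic systematic structure.
structured = white + np.sin(2.0 * np.pi * t / (2 * steps_per_day))

max_lag = 2 * steps_per_day
acf_white = residual_autocorrelation(white, max_lag)
acf_structured = residual_autocorrelation(structured, max_lag)

# Approximate 95% band for the autocorrelation of purely random residuals.
band = 1.96 / np.sqrt(n)
for lag in (steps_per_day, 2 * steps_per_day):
    print(f"lag {lag}: white={acf_white[lag]:+.3f}, "
          f"structured={acf_structured[lag]:+.3f}, band=+/-{band:.3f}")
```

Autocorrelations that fall well outside the band at lags of 1 or 2 days would point to structure that a normality check over the whole sequence can miss.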
 Identification of such modifications that improve the model may be achieved by following the iterative data-induction/model-deduction sequence illustrated by Box . One possible approach to inducing new models for further evaluation based on data is to use alternative functional representations of existing processes or to incorporate missing processes in the conceptual part of the transpiration model. Examples of such modifications for this transpiration model include incorporating the dependence of gs on leaf temperature and leaf water potential, modeling gs using a different approach [e.g., Ball et al., 1987; Leuning, 1995], using process-based models of leaf phenology and other causes of long-term variation in gvc [e.g., Janecek et al., 1989; Kikuzawa, 1995; Morecroft and Roberts, 1999; Wilson et al., 2000; Wang et al., 2004], distributed parameterization of the canopy [e.g., Ellsworth and Reich, 1992b; Oren et al., 1999; Sellin and Kupper, 2005], and modeling the lag between transpiration and sap flux [e.g., Schulze et al., 1985; Granier et al., 1996; Goldstein et al., 1998; Herzog et al., 1998; Ewers and Oren, 2000; Kumagai, 2001; Meinzer et al., 2003]. Alternatively, or in conjunction with the above, the independent and normally distributed error model might be replaced by an autoregressive (AR) model [Box et al., 1994]. However, Bayesian analyses of conceptual hydrologic models indicate that the use of the AR error model might not lead to improvements in the residuals, but might result in nonconvergence of Markov chains or inappropriate parameterization of the conceptual part of the model and thereby prevent proper evaluation of the model's ability to provide acceptable estimates [e.g., Bates and Campbell, 2001; Engeland and Gottschalk, 2002].
The results obtained here show that convergent chains, reasonable estimates of the parameters, and consistent overall errors for the simple conceptual transpiration model were possible to obtain with the normal error model. Therefore the normal error model, recommended by Kuczera  and Engeland and Gottschalk  as a general and consistent basis for conceptual model parameterization, appears to be useful for analyzing model performance and identifying modifications required in the conceptual model to improve the modeling of transpiration.
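The contrast between the independent normal error model and an autoregressive error model can be made concrete with a minimal sketch. It assumes a first-order AR process and a likelihood conditioned on the first residual, which is only one of several possible formulations; the function names and the synthetic residuals are hypothetical and are not taken from this study.

```python
import numpy as np

def loglik_iid_normal(residuals, sigma):
    """Log-likelihood of residuals under independent N(0, sigma^2) errors."""
    e = np.asarray(residuals, dtype=float)
    return (-0.5 * e.size * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum(e**2) / sigma**2)

def loglik_ar1(residuals, phi, sigma):
    """Conditional log-likelihood for e[t] = phi * e[t-1] + eta[t],
    with eta[t] ~ N(0, sigma^2), conditioning on the first residual."""
    e = np.asarray(residuals, dtype=float)
    innovations = e[1:] - phi * e[:-1]
    return loglik_iid_normal(innovations, sigma)

# Synthetic AR(1) residuals (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
n, phi_true, sigma_true = 2000, 0.8, 1.0
eta = rng.normal(0.0, sigma_true, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = phi_true * e[t - 1] + eta[t]

ll_independent = loglik_ar1(e, 0.0, sigma_true)   # phi = 0: independent errors
ll_autoregressive = loglik_ar1(e, phi_true, sigma_true)
print("log-likelihood, independent errors:", ll_independent)
print("log-likelihood, AR(1) errors:      ", ll_autoregressive)
```

In a full analysis phi would be sampled jointly with the conceptual model parameters, which is where the convergence and parameterization difficulties reported for AR error models would arise.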
 However, modifications to the conceptual model, such as those mentioned above, make the model more complex and increase the number of parameters requiring calibration, besides increasing the requirements for observed data. Usually, a complex model with a larger number of calibrated parameters is more susceptible to over-fitting under calibration compared with a simpler one [Akaike, 1974; Gaganis and Smith, 2001]. Estimating a large number of parameters may also pose difficulties due to limited information content in the data commonly available for calibrating surface flux models [e.g., Dekker et al., 2001; Wang et al., 2001]. Therefore, to identify the most useful structural changes, a quantitative model comparison methodology accounting for both parameter uncertainty and model complexity, using statistical model comparison metrics [e.g., Akaike, 1974; Schwarz, 1978; Spiegelhalter et al., 2002], might be adopted within this Bayesian approach to continue the model development cycle.
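As an illustration of how such metrics penalize complexity, the criteria of Akaike [1974] and Schwarz [1978] can be computed from the maximized log-likelihood, the number of calibrated parameters, and the number of observations. The sketch below uses the standard definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L; the log-likelihood values, parameter counts, and sample size are hypothetical and do not come from this study.

```python
import math

def aic(max_loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is preferred)."""
    return 2.0 * n_params - 2.0 * max_loglik

def bic(max_loglik, n_params, n_obs):
    """Schwarz (Bayesian) information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * max_loglik

# Hypothetical comparison: a simpler model and an extended model that
# fits slightly better but has four more calibrated parameters.
n_obs = 7200
candidates = {
    "simple": {"max_loglik": -1200.0, "n_params": 5},
    "extended": {"max_loglik": -1197.0, "n_params": 9},
}
scores = {
    name: (aic(m["max_loglik"], m["n_params"]),
           bic(m["max_loglik"], m["n_params"], n_obs))
    for name, m in candidates.items()
}
for name, (a, b) in scores.items():
    print(f"{name}: AIC={a:.2f}, BIC={b:.2f}")
```

With these invented numbers the small gain in fit does not offset the added parameters under either criterion. The criterion of Spiegelhalter et al. [2002] would instead be computed from the posterior samples and is not shown here.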
 The study presented here illustrates the use of a Bayesian methodology for the statistical analysis of a deterministic and conceptual model of canopy transpiration required in the deductive step of the iterative model development cycle [Box, 2001]. The assumption of independent, homoscedastic, and normally distributed errors led to uncertainty estimates that are consistent with the probability assignments overall, but also showed appreciable deviations at small temporal scales. However, the uncertainties were found to be considerable and therefore important to take into account when drawing inferences from data, where this methodology would be useful. Because the uncertainty estimates were obtained by the addition of only one error term to a single model output, the underlying mechanistic structure of the model could be preserved. Therefore it might be possible to use this simple approach to obtain realistic estimates of prediction and parameter uncertainties associated with the use of physically based conceptual models in general and facilitate the inductive step in the model development cycle by retaining the ease of structural modification and interpretation associated with such models. However, its potential scope of application, as well as the quality of the uncertainty estimates in other applications, remains to be fully explored. Further research on the use of this framework in other contexts, e.g., use of multiple model outputs for calibration, use of longer data sequences for split-sample validation, and use of more complex models, would be needed to address the above issues. Further research regarding the use of informative priors and other random error models within this general framework might provide additional insight into this methodology and its applications.
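The general idea summarized above, a deterministic model extended by a single normally distributed error term and sampled by Markov chain Monte Carlo, can be sketched in a few lines. This is a toy illustration under stated assumptions and not the implementation used in this study: a one-parameter linear function stands in for the transpiration model, flat priors on the parameter and on log(sigma) are assumed, a plain random-walk Metropolis sampler is used, and all names, data, and tuning values are hypothetical.

```python
import numpy as np

def toy_model(theta, x):
    """Hypothetical deterministic model standing in for the conceptual model."""
    return theta * x

def log_posterior(params, x, y):
    """Log posterior for y = toy_model(theta, x) + N(0, sigma^2) errors,
    assuming flat priors on theta and log(sigma)."""
    theta, log_sigma = params
    sigma = np.exp(log_sigma)
    resid = y - toy_model(theta, x)
    return -y.size * log_sigma - 0.5 * np.sum(resid**2) / sigma**2

def metropolis(log_post, start, step, n_iter, rng):
    """Random-walk Metropolis sampler with independent normal proposals."""
    current = np.array(start, dtype=float)
    current_lp = log_post(current)
    samples = np.empty((n_iter, current.size))
    for i in range(n_iter):
        proposal = current + rng.normal(0.0, step, current.size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples

# Synthetic observations (hypothetical): true theta = 2.0, true sigma = 0.5.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 200)
y = toy_model(2.0, x) + rng.normal(0.0, 0.5, x.size)

chain = metropolis(lambda p: log_posterior(p, x, y),
                   start=[1.5, 0.0], step=np.array([0.02, 0.1]),
                   n_iter=8000, rng=rng)
posterior = chain[3000:]                      # discard burn-in
theta_mean = posterior[:, 0].mean()
sigma_mean = np.exp(posterior[:, 1]).mean()
theta_interval = np.percentile(posterior[:, 0], [2.5, 97.5])
print("posterior mean of theta:", theta_mean)
print("95% credible interval for theta:", theta_interval)
print("posterior mean of sigma:", sigma_mean)
```

Because the error term is attached only to the model output, the deterministic function can be exchanged for a modified conceptual model without changing the sampler, which is the property that preserves ease of structural modification.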
 Funding for this research was provided by the Graduate School and the Department of Forest Ecology and Management at the University of Wisconsin-Madison, NASA Land Surface Hydrology grant (NAG5-8554) to D. S. Mackay, and USDA Hatch formula funds to D. S. Mackay. Additional support was provided by NSF Hydrological Sciences grants EAR-0405306 to D. S. Mackay, EAR-0405318 to E. L. Kruger, and EAR-0405381 to B. E. Ewers. These contributions are gratefully acknowledged. Critical reviews by A. Montanari and the anonymous reviewers greatly improved the manuscript, for which we are thankful.