Water Resources Research

A Bayesian methodological framework for accommodating interannual variability of nutrient loading with the SPARROW model

Authors

  • Christopher Wellen (corresponding author)
    1. Ecological Modeling Laboratory, Department of Physical and Environmental Sciences, University of Toronto, Toronto, Ontario, Canada
      Corresponding author: C. Wellen, Ecological Modeling Laboratory, Department of Physical and Environmental Sciences, University of Toronto, Toronto, ON M1C 1A4, Canada. (christopher.wellen@utoronto.ca)
  • George B. Arhonditsis
    1. Ecological Modeling Laboratory, Department of Physical and Environmental Sciences, University of Toronto, Toronto, Ontario, Canada
  • Tanya Labencki
    1. Great Lakes Unit, Water Monitoring and Reporting Section, Environmental Monitoring and Reporting Branch, Ontario Ministry of the Environment, Toronto, Ontario, Canada
  • Duncan Boyd
    1. Great Lakes Unit, Water Monitoring and Reporting Section, Environmental Monitoring and Reporting Branch, Ontario Ministry of the Environment, Toronto, Ontario, Canada

Abstract

[1] Regression-type, hybrid empirical/process-based models (e.g., SPARROW, PolFlow) have assumed a prominent role in efforts to estimate the sources and transport of nutrient pollution at river basin scales. However, almost no attempts have been made to explicitly accommodate interannual nutrient loading variability in their structure, despite empirical and theoretical evidence indicating that the associated source/sink processes are quite variable at annual timescales. In this study, we present two methodological approaches to accommodate interannual variability with the Spatially Referenced Regressions on Watershed attributes (SPARROW) nonlinear regression model. The first strategy uses the SPARROW model to estimate a static baseline load and climatic variables (e.g., precipitation) to drive the interannual variability. The second approach allows the source/sink processes within the SPARROW model to vary at annual timescales using dynamic parameter estimation techniques akin to those used in dynamic linear models. Model parameterization is founded upon Bayesian inference techniques that explicitly consider calibration data and model uncertainty. Our case study is the Hamilton Harbor watershed, a mixed agricultural and urban residential area located at the western end of Lake Ontario, Canada. Our analysis suggests that dynamic parameter estimation is the more parsimonious of the two strategies tested and can offer insights into the temporal structural changes associated with watershed functioning. Consistent with empirical and theoretical work, model estimated annual in-stream attenuation rates varied inversely with annual discharge. Estimated phosphorus source areas were concentrated near the receiving water body during years of high in-stream attenuation and dispersed along the main stems of the streams during years of low attenuation, suggesting that nutrient source areas are subject to interannual variability.

1. Introduction

[2] There is a pressing demand for watershed models that can support our efforts to effectively quantify nonpoint source pollution [Rode et al., 2010]. While a suite of process-based models does exist to address this need, they have data requirements which cannot always be met (e.g., detailed subsurface properties) and are typically applied in well-studied catchments [Borah and Bera, 2004]. Hybrid empirical/process-based models, generally founded upon nonlinear regression equations, have been developed for large scales where a process-based model would become unwieldy and a priori knowledge about dominant biogeochemical process rates may not be available. They have been applied extensively in the United States [Alexander et al., 2004], the United Kingdom [Grizzetti et al., 2005], and continental Europe [de Wit, 2001]. The Spatially Referenced Regressions on Watershed attributes (SPARROW) model is a parsimonious hybrid empirical/process-based model developed by the United States Geological Survey to estimate nutrient loads, yields, and deliveries at landscape and regional scales. Despite its nonlinear regression structure, the inputs can be chosen according to a mechanistic understanding of nutrient source and sink dynamics. The SPARROW model has been applied at a variety of sites and scales, including New Zealand's Waikato River Basin [Alexander et al., 2002], the Neuse River Estuary [McMahon et al., 2003], the continental United States [Alexander et al., 2004], the Mississippi River Basin [Alexander et al., 2008], the Southeastern United States [García et al., 2011], the United States drainage to the Laurentian Great Lakes [Robertson and Saad, 2011], the Pacific Northwest, the Missouri River Basin, the Lower Mississippi River Basin, and the New England and Mid-Atlantic drainage [Preston et al., 2011].
SPARROW applications primarily focus on either nitrogen or phosphorus loadings, but models have also been developed for organic carbon [Shih et al., 2010], suspended sediment [Brakebill et al., 2010], and E. coli [Puri et al., 2009]. Several difficulties arise in effectively accommodating spatial and temporal variability when models such as SPARROW are used in a nested basin context. The spatial difficulties have been examined in some depth and are briefly discussed below, followed by the less examined temporal difficulties, which are the subject of this paper.

[3] A direct ramification of SPARROW's distributed structure is the propagation of the model (process) error in space, which in turn poses a major statistical challenge. Like all distributed regression models of mass loading, the model considers upstream stations as point sources to downstream stations. This introduces a potential serial correlation of model residuals. Most SPARROW applications overcome this problem by using the measured upstream load as the input to downstream sites [e.g., McMahon et al., 2003]. However, this practice is prone to essentially the same problem, as it ignores the (possibly substantial) imperfections of measured annual loads of watersheds and propagates the measurement error downstream. SPARROW model applications may also exhibit a spatial structure of the model residuals that does not stem from serial autocorrelation alone [McMahon et al., 2003]. For the sake of parsimony, SPARROW by default assumes uniform values of the model parameters across the study watershed, an assumption that is likely another source of residual spatial autocorrelation. That is, the use of a single export coefficient for all the agricultural land uses clearly overestimates the intensity of the agricultural practices in certain (neighboring) sites and underestimates it in others. Though some applications of SPARROW do feature some type of spatial variability of the model coefficients [Alexander et al., 2004; García et al., 2011], the spatial delineation of these coefficient zones is often done in an ad hoc manner. Qian et al. [2005] presented a formidable framework, founded upon Bayesian inference techniques, for accommodating the serial and spatial autocorrelation of residuals in SPARROW.
In addition to the classical independent model error, this study introduced an error which applies only to stations receiving loading from upstream stations (the so-called state space or STSP model) and error terms that account for the spatial correlation of neighboring sites regardless of the drainage network (the so-called conditional autoregressive or CAR model). Qian et al. [2005] showed that for the SPARROW application at the Neuse River Estuary watershed [McMahon et al., 2003], the serially autocorrelated error contributes little to the total error, while most of the overall mismatch between model predictions and measurements could be explained by the spatially autocorrelated errors.

[4] Despite the significant progress in explicitly considering the various forms of spatial correlation, there are still no attempts in the published literature to accommodate the interannual variability of loading with either of the most commonly used hybrid empirical/process-based models (SPARROW and PolFlow [de Wit, 2001]). The typical SPARROW approach thus far has been to de-trend time series of annual loading estimates at each water quality monitoring station to a common base year [Alexander et al., 2002]. This base year represents the nutrient load that would have been observed at each station if average hydrological conditions had prevailed. This strategy is a pragmatic means to focus exclusively on spatial variability, while “factoring out” both the temporal variability of loading as well as the effects of different observation periods across sampling stations. Yet, using estimated nutrient source areas generated with this approach to inform policy or target management interventions postulates that there is insignificant interannual variability of source areas, an assumption which has not been examined and most likely oversimplifies the dynamics typically experienced within the watershed context. The PolFlow model simply averages nutrient flux over a 5 year time period [de Wit, 2001]. While conceptually and mathematically simpler than de-trending, this approach requires a very similar data record across sites and lumps interannual variability due to both nutrient sources and climate forcing into nutrient flux estimate uncertainty.

[5] The aim of this paper is to present a methodological framework for incorporating temporal nutrient loading variability into the SPARROW model. We subsequently apply this approach to the Hamilton Harbor drainage basin, a mesoscale catchment of about 450 km², much smaller than those typically represented with the SPARROW model. While we focus exclusively on phosphorus in this study, our methods could be applied to the modeling of any mass flux. We employ a repeated measures approach; that is, the loading at a station for a year is treated as a datum in the regression. This time-for-space substitution allows us to estimate source areas and loads for each year. We adapt Bayesian configurations to accommodate the temporal correlation of model residuals and the uncertainty of the calibration data, and conduct a number of numerical experiments to test two methodological approaches for incorporating temporal variability. The first approach postulates that the watershed characteristics as modeled by SPARROW represent a static, baseline level of nutrient loading associated with average conditions, while climatic predictors (e.g., precipitation) are used to describe the temporal variability around that mean. The second strategy assumes that in addition to the temporal variability associated with climatic forcing factors, there is also year-to-year variation in the source and sink processes modeled by SPARROW. We adapt methods of estimating time-varying parameters used with dynamic linear models (DLMs) to the nonlinear context of SPARROW. Our presentation will examine model realizations that incorporate a number of temporal predictors and different assumptions about the temporal distribution of model residuals.

2. Methodology

2.1. Description of the SPARROW Model

[6] The SPARROW model has been extensively described elsewhere [Alexander et al., 2002; McMahon et al., 2003; Qian et al., 2005; García et al., 2011], so only a basic introduction is given here. SPARROW is a hybrid empirical/process-based model designed to be applied to a network of water quality monitoring stations. SPARROW consists of a two-level hierarchical spatial structure. Watersheds are first divided into subwatersheds, each of which drains to a water quality monitoring station. Each subwatershed is then disaggregated into reach catchments draining to a particular stream segment. Mean annual watershed export of any constituent is expressed as a function of watershed attributes.

[7] The model considers source and sink processes over annual timescales. Source processes are described with export coefficients, which predict constituent mobilization; delivery factors predict how landscape attributes modulate the delivery of the mobilized constituent to streams; and attenuation coefficients predict the amount of the delivered constituent remaining in transit per length of stream or per reservoir. The SPARROW model is typically calibrated to a particular base year to describe the transport of nutrient inputs occurring in that particular time frame, while incorporating the interannual variability in hydrology that occurs over a series of years. The SPARROW model is formulated as:

$$
\mu_i = \ln\left\{ \sum_{n=1}^{N} \sum_{j=1}^{J_i} \beta_n S_{n,j}\, e^{-\alpha Z_j}\, H^{S}_{i,j}\, H^{R}_{i,j} \right\} \tag{1}
$$

where the subscripts i and j refer to subwatersheds and reach catchments, respectively; μi refers to the natural logarithm of the mean annual total phosphorus load measured at station i in metric tons per year; n = 1, …, N indexes the sources, where N is the total number of sources (diffuse and point sources); Ji refers to the number of reaches in subwatershed i; βn refers to the estimated source coefficient for source n (tons P km⁻² yr⁻¹ for nonpoint sources); Sn,j refers to the quantity of source n in reach j in units of km² of agricultural or urban land use for nonpoint sources, and metric tons yr⁻¹ for point sources; α refers to a vector of land to water delivery coefficients; Zj is a vector of the land-surface characteristics associated with drainage in reach j; $H^{S}_{i,j}$ refers to the fraction of nutrient mass originating in reach j remaining at station i as a function of first-order loss processes in streams; and $H^{R}_{i,j}$ refers to the fraction of nutrient mass originating in reach j remaining at station i as a function of first-order loss processes in lakes and reservoirs.

[8] First-order loss processes in streams are expressed as

$$
H^{S}_{i,j} = \exp\left( -\sum_{m} k_{s,m}\, L_{i,j,m} \right) \tag{2}
$$

where ks,m refers to the first-order loss coefficient for stream class m (km⁻¹), and Li,j,m refers to the class m stream length in kilometers between reach j and station i. First-order loss processes in lakes and reservoirs are expressed as:

$$
H^{R}_{i,j} = \prod_{l} \exp\left( -k_r\, q_l^{-1} \right) \tag{3}
$$

where l refers to any lakes or reservoirs between reach j and station i, kr refers to the first-order loss coefficient or settling velocity (m yr⁻¹), and ql refers to the areal hydraulic loading of lake/reservoir l (m yr⁻¹). Table 1 contains a list of all parameters included in the SPARROW model.
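
To make the structure of equations (1)–(3) concrete, the following Python sketch evaluates the log load at a single station. It is only an illustration: the data layout (a list of reach dictionaries), the function names, and every numerical value are hypothetical, and the negative sign on the delivery term follows the usual SPARROW convention, which is assumed here.

```python
import math

def stream_fraction(k_s, lengths_km):
    """Equation (2): fraction remaining after first-order in-stream loss.

    k_s maps stream class m to its loss coefficient (km^-1);
    lengths_km maps stream class m to the length travelled in that class (km).
    """
    return math.exp(-sum(k_s[m] * lengths_km.get(m, 0.0) for m in k_s))

def reservoir_fraction(k_r, hydraulic_loads):
    """Equation (3): fraction remaining after every lake/reservoir on the path.

    k_r is the settling velocity (m/yr); hydraulic_loads lists the areal
    hydraulic loading q_l (m/yr) of each water body.
    """
    fraction = 1.0
    for q_l in hydraulic_loads:
        fraction *= math.exp(-k_r / q_l)
    return fraction

def log_load(reaches, beta, alpha, k_s, k_r):
    """Equation (1): natural log of the load delivered to one station."""
    total = 0.0
    for reach in reaches:
        mobilized = sum(beta[n] * reach["sources"].get(n, 0.0) for n in beta)
        delivery = math.exp(-sum(a * z for a, z in zip(alpha, reach["z"])))
        total += (mobilized * delivery
                  * stream_fraction(k_s, reach["lengths_km"])
                  * reservoir_fraction(k_r, reach["q"]))
    return math.log(total)

# Hypothetical two-reach subwatershed (all numbers are illustrative only).
beta = {"agriculture": 0.05, "urban": 0.10}   # tons P km^-2 yr^-1
alpha = [0.3]                                  # one delivery coefficient
k_s = {1: 0.05, 2: 0.01}                       # km^-1, two stream classes
k_r = 10.0                                     # m yr^-1
reaches = [
    {"sources": {"agriculture": 10.0, "urban": 2.0}, "z": [0.5],
     "lengths_km": {1: 3.0, 2: 5.0}, "q": []},
    {"sources": {"agriculture": 4.0, "urban": 6.0}, "z": [0.2],
     "lengths_km": {1: 1.0, 2: 2.0}, "q": [20.0]},
]
mu_i = log_load(reaches, beta, alpha, k_s, k_r)
print(round(mu_i, 3))
```

Because every attenuation term is a fraction between 0 and 1, the delivered load can never exceed the mobilized load, which is a quick sanity check on any implementation.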

Table 1. Stochastic Nodes of the Different Model Configurations Examined
Parameter | Description | Units
α | Land to water delivery coefficient. |
β1 | Export coefficient for agricultural land. | tons P km⁻² yr⁻¹
β2 | Export coefficient for urban land. | tons P km⁻² yr⁻¹
kr | Reservoir settling velocity. | m yr⁻¹
ks1 | Stream attenuation coefficient for first and second-order streams. | km⁻¹
ks2 | Stream attenuation coefficient for third and higher-order streams. | km⁻¹
γv | Temporal coefficient for predictor v. |
σ | Standard model error. | ln[tons P yr⁻¹]
ψ | Standard model error specific to WALK. | ln[tons P yr⁻¹]
ασ | Initial SD of the prior for the α parameter for the dynamic parameter estimation framework. |
β1σ | Initial SD of the prior for the β1 parameter for the dynamic parameter estimation framework. | tons P km⁻² yr⁻¹
β2σ | Initial SD of the prior for the β2 parameter for the dynamic parameter estimation framework. | tons P km⁻² yr⁻¹
ks1σ | Initial SD of the prior for the ks1 parameter for the dynamic parameter estimation framework. | km⁻¹
ks2σ | Initial SD of the prior for the ks2 parameter for the dynamic parameter estimation framework. | km⁻¹

2.2. Introduction of Temporal Variability to the SPARROW Model

[9] Our framework introduces temporal variability to the SPARROW model by applying a repeated measures approach to a network of water quality monitoring stations. Rather than selecting a single base year to factor out the variability in time and subsequently focusing on the spatial variability, we calibrate the model to annual loads measured repeatedly at a subset of intensively monitored sites in the studied watershed. Henceforth, we refer to this temporal augmentation as the SPARROW with annual loads of watersheds (SWALLOW) model. With this statistical configuration, the SPARROW model is used to estimate a static baseline level of nutrient loading (μi) over the study period, and forcing factors are employed to explain the temporal variability around that baseline:

$$
Y_{i,t} = \mu_i + \sum_{v} \gamma_v W_{v,t} + \varepsilon_{i,t} \tag{4}
$$

where Yi,t refers to the natural logarithm of the measured annual load at subwatershed monitoring station i during year t, μi refers to a prediction of the natural logarithm of a baseline annual load at monitoring station i estimated by the SPARROW equation, Wv,t denotes a matrix of v temporal forcing factors across years t, γv denotes the corresponding vector of coefficients, and εi,t represents an independent spatiotemporal error. All errors are assumed independent, normally distributed, and with zero mean. We refer to equation (4) as the SWALLOW I model throughout this paper.

[10] We specified a second version of the SWALLOW model designed to accommodate the variability of watershed functioning in time:

$$
Y_{i,t} = \mu_{i,t} + \sum_{v} \gamma_v W_{v,t} + \varepsilon_{i,t} \tag{5}
$$

where μi,t refers to a prediction of the natural logarithm of the annual load at monitoring station i for year t independent of the effects of the temporal covariates represented by the Wv,t matrix. All of the other variables in equation (5) are identical to their counterparts in equation (4). We refer to equation (5) as the SWALLOW II model throughout this paper. The SWALLOW model makes use of nonspatial temporal forcing factors to accommodate interannual variability of watershed loads. While any time series data could be included as a forcing factor, we focused on local climatic characteristics due to their importance and availability. Table 1 presents all model parameters examined.
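
The difference between the two SWALLOW variants can be illustrated with a short Python sketch of the mean structure in equations (4) and (5), leaving the error term aside. The function names, the single precipitation-anomaly predictor, and all values are hypothetical; in practice the baseline terms would come from the SPARROW equation rather than being typed in.

```python
def climate_effect(gamma, w_t):
    """Sum over predictors v of gamma_v * W_{v,t} for a single year t."""
    return sum(g * w for g, w in zip(gamma, w_t))

def swallow_predictions(mu_by_year, gamma, W):
    """Predicted log loads, indexed [year][station].

    mu_by_year[t][i] is the SPARROW baseline for station i in year t;
    W[t] lists the temporal forcing factors for year t.
    """
    return [[mu + climate_effect(gamma, W[t]) for mu in mu_by_year[t]]
            for t in range(len(W))]

gamma = [0.6]                     # coefficient of one climatic predictor
W = [[-0.5], [0.0], [0.8]]        # hypothetical precipitation anomalies, 3 years

# SWALLOW I: one static baseline per station, reused every year (equation 4).
mu_static = [1.2, 0.4]
pred_one = swallow_predictions([mu_static] * len(W), gamma, W)

# SWALLOW II: the baseline itself changes from year to year (equation 5),
# as it would if the SPARROW parameters were re-estimated for each year.
mu_dynamic = [[1.1, 0.3], [1.2, 0.4], [1.4, 0.5]]
pred_two = swallow_predictions(mu_dynamic, gamma, W)

print(pred_one)
print(pred_two)
```

Under SWALLOW I every station shifts by the same amount in a given year, so the difference between any two stations is constant over time; under SWALLOW II that difference is free to change.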

2.3. Bayesian Inference Framework

[11] Bayesian inference was used as a means of model calibration due to its ability to include prior information in the modeling exercise and to explicitly handle uncertainties stemming from what we assumed were the main sources of uncertainty in this modeling exercise: model parameters, calibration data, and model structure. From the Bayesian perspective, statistical inference is treated as a quantitative update of prior beliefs after taking measurements into account. Beliefs are expressed as probability distributions (i.e., random variables), with the dispersion of these distributions corresponding to the degree of certainty that the expected value of the distribution is correct [Gelman et al., 2004]. Mathematically, Bayesian inference is founded upon Bayes' Theorem, expressed as

$$
\pi(\theta \mid \mathrm{data}) = \frac{\pi(\theta)\, L(\mathrm{data} \mid \theta)}{\int_{\theta} \pi(\theta)\, L(\mathrm{data} \mid \theta)\, d\theta} \tag{6}
$$

where π(θ) represents our prior statements regarding the probability distribution that depicts the existing knowledge of the model parameters (θ), L(data|θ) corresponds to the likelihood of observing the data given the different θ values, and π(θ|data) is the posterior probability that expresses our updated beliefs on the θ values after the existing data from the system are considered. The denominator in equation (6) is the expected value of the likelihood function, and acts as a scaling constant that normalizes the integral of the area under the posterior probability distribution. Sequences of realizations from the model posterior distributions were obtained using Markov chain Monte Carlo (MCMC) simulations. We used the general normal-proposal Metropolis algorithm as implemented in the WinBUGS software [Lunn et al., 2000]. This algorithm is based on a symmetric normal proposal distribution, whose standard deviation is adjusted over the first 4000 iterations such that the acceptance rate ranges between 20% and 40%. We collected 40,000 samples each from two chains for each model realization. The first 10,000 samples of each chain were discarded and posterior statistics were calculated using a thinning interval of 10, yielding a sample size of 6000 for all the model realizations considered. We assessed convergence qualitatively by visually inspecting plots of the posterior Markov chains for mixing and stationarity and by inspecting density plots of the pooled posterior Markov chains for unimodality. We also assessed convergence quantitatively using the modified Gelman–Rubin convergence statistic [Brooks and Gelman, 1998]. The accuracy of the posterior parameter values was inspected by assuring that the Monte Carlo error for all parameters was less than 5% of the sample standard deviation.
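
The sampling scheme described above can be sketched in a few lines of Python. This is a toy stand-in, not the WinBUGS implementation: the posterior is a one-parameter normal model invented for illustration, the proposal standard deviation is fixed by hand rather than adapted over the first 4000 iterations, and the convergence check is the basic Gelman–Rubin ratio rather than the modified statistic of Brooks and Gelman [1998]. What it does reproduce is the bookkeeping: two chains of 40,000 iterations, a burn-in of 10,000, and a thinning interval of 10 leave 2 × 3000 = 6000 retained samples.

```python
import math
import random

def metropolis(log_post, start, proposal_sd, n_iter, rng):
    """Random-walk Metropolis with a symmetric normal proposal."""
    chain = [start]
    current, current_lp = start, log_post(start)
    for _ in range(n_iter - 1):
        candidate = rng.gauss(current, proposal_sd)
        candidate_lp = log_post(candidate)
        # Metropolis rule: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate_lp - current_lp)):
            current, current_lp = candidate, candidate_lp
        chain.append(current)
    return chain

def r_hat(chains):
    """Basic Gelman-Rubin potential scale reduction factor."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    return math.sqrt(((n - 1) / n * within + between / n) / within)

# Toy posterior: vague normal prior on theta, unit-variance normal likelihood.
data = [1.8, 2.4, 2.1, 1.6, 2.6]

def log_post(theta):
    log_prior = -0.5 * (theta / 10.0) ** 2
    log_lik = -0.5 * sum((y - theta) ** 2 for y in data)
    return log_prior + log_lik

N_ITER, BURN_IN, THIN = 40_000, 10_000, 10
rng = random.Random(1)
chains = [metropolis(log_post, s, 1.5, N_ITER, rng) for s in (-5.0, 5.0)]
retained = [c[BURN_IN::THIN] for c in chains]   # drop burn-in, then thin
pooled = retained[0] + retained[1]
print(len(pooled), round(sum(pooled) / len(pooled), 2), round(r_hat(retained), 3))
```

Starting the two chains from widely separated values (here −5 and 5) is what gives the between-chain comparison its diagnostic value.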

[12] Wherever possible we opted for informative priors. Priors for the export coefficients, settling velocity, and in-stream attenuation were log-normally distributed, owing to the SPARROW parameterization of Qian et al. [2005] using total nitrogen loads from three large river basins in eastern North Carolina, which presented evidence that these parameters tend to be positively skewed (see their Figure 7). The values of the β coefficients represented literature-based estimates of total phosphorus export [Beaulac and Reckhow, 1982]. The upper limit found for total phosphorus in the database was specified as the 70th percentile of our distributions; thus, the corresponding priors were relatively wide, thereby allowing more of the information contained in the posterior distributions to come directly from the data. The distribution for kr was drawn from work by Cheng et al. [2010]. We based the prior distributions for ks1 and ks2, the stream attenuation coefficients, loosely on values from previous models; that is, we assigned a higher median to ks1 than to ks2 along with standard deviations that are fairly large compared to the range of ks between 0 and 1 [Alexander et al., 2004]. The priors are presented in Table S1 of the auxiliary material.

[13] The typical SPARROW practice uses the measured upstream load for input to downstream subwatersheds during calibration, which conceptually undermines the usefulness of the model for predictive purposes [McMahon et al., 2003; Qian et al., 2005]. Using the measured loads as inputs into downstream watersheds has two major problems. First, it overestimates the confidence in the loading data. Being mere estimates of the actual nutrient fluxes, the so-called “measured” loads are associated with a substantial error and failure to account for their uncertainty can result in a misleading model calibration. Second, relying on the measured loads as upstream input means that predictions at stations with most of their watershed area monitored by an upstream station may strongly depend on the measured inputs, which in turn results in a very optimistic assessment of the model error. All of the statistical formulations explored in this paper use the modeled load as input to downstream stations. As established by Qian et al. [2005], some representation of the uncertainty of the calibration data when using modeled loads as inputs to downstream subwatersheds is necessary to avoid a misleading model calibration. We describe our approach to do so in section 2.3.1.

2.3.1. Calibration Data Uncertainty

[14] The importance of explicitly accommodating calibration data uncertainty has been acknowledged in the literature [e.g., Renard et al., 2010], though it is typically ignored in the context of SPARROW-type models. This is a significant omission, considering that annual loads are typically estimated using rating curve models or estimation approaches applied to measurements collected bi-weekly or less frequently, and are subject to substantial uncertainty [Richards and Holloway, 1987; Cohn et al., 1992]. Two approaches have been discussed for representing measurement error in models. In the context of estimates of annual loads, the classical approach assumes that the observed values of a variable Yi,t are drawn from a distribution which has as its expected value Loadi,t, the “true” value of the variable being sampled [Carroll et al., 2006]. The classical approach is appropriate when the uncertainty is assumed to come from deficiencies in sampling or measurement and has been used to model the uncertainty of point rainfall estimates [Balin et al., 2010]. The Berkson model takes the opposite approach, assuming that the true value is drawn from a distribution with expected value equal to the observed value. The Berkson approach is appropriate when the uncertainty is assumed to stem from a lack of commensurability between what has been measured and the variable one is interested in, and has been applied to estimate mean areal rainfall from point measurements [Ajami et al., 2007]. Mathematically, the key difference between the two resides in whether the observed values vary about the true values (classical) or the true values vary about the observed (Berkson). We assumed that the uncertainty in load estimates stems from a combination of sampling and analytic errors rather than a lack of commensurability, so we opted for the classical representation of measurement error for annual loads.
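
A small simulation may help fix the distinction between the two error models. The Python sketch below uses invented values throughout (a standard normal spread of log loads and an error standard deviation of 0.3); it only illustrates which quantity varies about which.

```python
import random
import statistics

rng = random.Random(0)
ERR_SD = 0.3     # hypothetical measurement error SD on the log scale
N = 20_000

# Classical model: the observation scatters about the true value.
true_c = [rng.gauss(0.0, 1.0) for _ in range(N)]
obs_c = [x + rng.gauss(0.0, ERR_SD) for x in true_c]

# Berkson model: the true value scatters about the observation.
obs_b = [rng.gauss(0.0, 1.0) for _ in range(N)]
true_b = [w + rng.gauss(0.0, ERR_SD) for w in obs_b]

var_obs_c = statistics.pvariance(obs_c)
var_true_c = statistics.pvariance(true_c)
var_obs_b = statistics.pvariance(obs_b)
var_true_b = statistics.pvariance(true_b)

print("classical: var(observed) - var(true) =", round(var_obs_c - var_true_c, 3))
print("Berkson:   var(true) - var(observed) =", round(var_true_b - var_obs_b, 3))
```

Under the classical model the observations are more dispersed than the truth, whereas under the Berkson model the truth is more dispersed than the observations; in both cases the excess variance is approximately the error variance.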

[15] In our case, the classical measurement error model consists of three components: (1) the (log-transformed) measurements Yi,t, (2) the (log-transformed) true values Loadi,t, and (3) the measurement error standard deviation, denoted here by δi,t. These variables are arranged in a hierarchical framework, which has as its first level the relation of the observed to true loading values:

$$
Y_{i,t} \sim N\left( \mathrm{Load}_{i,t},\ \delta_{i,t}^{2} \right) \tag{7}
$$

Note that because we are working with log-transformed data, this postulates multiplicative measurement error. For this paper, the values of δi,t are prespecified and are not part of the model calibration process. In section 2.5 we detail how they are calculated. The second level of the hierarchy introduces a model for the “true” log-transformed loads:

$$
\mathrm{Load}_{i,t} \sim N\left( \mu_i + \sum_{v} \gamma_v W_{v,t},\ \sigma^{2} \right) \tag{8}
$$

Because the mean of this distribution is equal to the SWALLOW model prediction (with μi,t in place of μi for SWALLOW II), this framework essentially postulates that the model is an unbiased estimator of the “true” annual loads with structural (or process) error drawn from a normal distribution with variance σ². The likelihood of the loading estimate i in year t, given the model, is then the product of the likelihood of the two levels of our hierarchical configurations:

$$
L\left( Y_{i,t} \right) = N\left( Y_{i,t} \mid \mathrm{Load}_{i,t},\ \delta_{i,t}^{2} \right) \times N\left( \mathrm{Load}_{i,t} \mid \mu_i + \sum_{v} \gamma_v W_{v,t},\ \sigma^{2} \right) \tag{9}
$$

To summarize, our calibration error framework seeks to minimize both the differences between the measured and “true” loading data and those between the “true” and modeled loading. To do so, we must estimate the “true” loading as part of the model calibration. This adds one stochastic node for every station-year loading estimate, considerably increasing the complexity of the calibration exercise but realistically accommodating the measurement errors as well as the model process error.
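
The two-level likelihood described in this section can be written compactly in Python. The sketch below is a minimal illustration for a single station-year; the function and argument names (`meas_sd` for the prespecified measurement error standard deviation, `sigma` for the process error standard deviation) and the numbers are hypothetical.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def log_likelihood(y_obs, load_true, prediction, meas_sd, sigma):
    """Sum of the two hierarchical levels on the log scale.

    Level 1: observed log load scatters about the latent "true" log load.
    Level 2: latent "true" log load scatters about the model prediction.
    """
    level_one = normal_logpdf(y_obs, load_true, meas_sd)
    level_two = normal_logpdf(load_true, prediction, sigma)
    return level_one + level_two

# Hypothetical station-year: the measurement says 1.0, the model says 2.0.
y_obs, prediction, meas_sd, sigma = 1.0, 2.0, 0.2, 0.4

# The latent load that maximizes the likelihood is the precision-weighted
# average of the observation and the model prediction.
w_obs, w_pred = 1.0 / meas_sd ** 2, 1.0 / sigma ** 2
best_load = (w_obs * y_obs + w_pred * prediction) / (w_obs + w_pred)
print(round(best_load, 3), round(log_likelihood(y_obs, best_load, prediction, meas_sd, sigma), 3))
```

The latent load is pulled toward whichever of the two sources is less uncertain, which is how the framework balances confidence in the loading estimates against confidence in the model.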

2.3.2. Introducing Interannual Variability With Climatic Predictors—SWALLOW I

[16] The simplest approach we investigated used the classical SPARROW model in conjunction with climatic forcing factors to estimate annual loads (equation (4)). This formulation, called Markov-Chain Monte Carlo (MCMC), assumes that the annual log-transformed nutrient loading is a draw from a normal distribution with a mean defined by the model and a constant model (process) error variance. Model residuals are assumed to be independent both in space and time. When we also consider the calibration data uncertainty, we can express this approach mathematically as follows:

$$
\begin{aligned}
Y_{i,t} &\sim N\left( \mathrm{Load}_{i,t},\ \delta_{i,t}^{2} \right) \\
\mathrm{Load}_{i,t} &\sim N\left( \mu_i + \sum_{v} \gamma_v W_{v,t},\ \sigma^{2} \right) \\
\sigma^{-2} &\sim \mathrm{gamma}(0.001,\ 0.001)
\end{aligned} \tag{10}
$$

where Yi,t refers to the log-transformed measured load of subwatershed i at time t, Loadi,t is a latent variable that represents the “true” loading values when accounting for the measurement error δi,t, μi refers to the base loading calculated from the SPARROW equation, σ represents the model (process) error, and gamma(0.001, 0.001) is the gamma distribution with shape and scale parameters of 0.001, representing a “noninformative” or vague prior assigned to the error precision (the inverse of variance).

[17] Even at annual timescales, watershed processes are dynamic, yet the parameters used in equation (4) are static. This may cause a temporal autocorrelation of model residuals. Temporal autocorrelation could stem from systematic trends in either the nutrient export dynamics or the spatial patterns of the different land uses. We use a temporal first-order random-walk function (WALK) to account for the temporal correlation of residuals [Arhonditsis et al., 2008a, 2008b; Sadraddini et al., 2011a]. We posit a random effect vt for each year t represented by a first-order random walk prior [Shaddick and Wakefield, 2002; Arhonditsis et al., 2008a, 2008b]. When we also account for the calibration data uncertainty, the WALK formulation is as follows:

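$$
\begin{aligned}
Y_{i,t} &\sim N\left( \mathrm{Load}_{i,t},\ \delta_{i,t}^{2} \right) \\
\mathrm{Load}_{i,t} &\sim N\left( \mu_i + \sum_{v} \gamma_v W_{v,t} + v_t,\ \sigma^{2} \right) \\
v_t \mid v_{-t} &\sim
\begin{cases}
N\left( v_{t+1},\ \psi^{2} \right) & t = 1 \\
N\left( \dfrac{v_{t-1} + v_{t+1}}{2},\ \dfrac{\psi^{2}}{2} \right) & t = 2, \ldots, T-1 \\
N\left( v_{t-1},\ \psi^{2} \right) & t = T
\end{cases} \\
\sigma^{-2} &\sim \mathrm{gamma}(0.001,\ 0.001), \qquad \psi^{2} \sim \text{inverse-gamma}(0.001,\ 0.001)
\end{aligned} \tag{11}
$$

where δi,t and σ are defined as for equation (10), −t denotes the previous and subsequent years of t, T denotes the total number of years of the study period, and ψ² is the conditional variance of the vt terms, whose prior density was based on a conjugate inverse-gamma(0.001, 0.001) distribution. Our statistical approach reflects prior beliefs that these systematic trends in the watershed functioning are smooth and that sudden jumps between consecutive years are unlikely to occur.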
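
For readers unfamiliar with first-order random-walk priors, the Python sketch below computes the conditional mean and variance of one yearly random effect given its neighbors. It assumes the standard form of this prior (interior years are centered on the average of their two neighbors with half the conditional variance; the first and last years are centered on their single neighbor), which is an assumption about the formulation rather than a transcription of it. The function name and the example values are hypothetical.

```python
def walk_conditional(v, t, psi):
    """Conditional mean and variance of v[t] given the adjacent years.

    v is the list of yearly random effects, t is a 0-based year index,
    and psi is the conditional standard deviation.
    """
    T = len(v)
    if T < 2:
        raise ValueError("a random walk needs at least two years")
    if t == 0:                      # first year: only a subsequent neighbor
        return v[1], psi ** 2
    if t == T - 1:                  # last year: only a previous neighbor
        return v[T - 2], psi ** 2
    # interior year: average of both neighbors, half the variance
    return 0.5 * (v[t - 1] + v[t + 1]), psi ** 2 / 2.0

effects = [0.0, 1.0, 4.0]           # hypothetical yearly random effects
for year in range(len(effects)):
    print(year, walk_conditional(effects, year, psi=2.0))
```

Centering each year on its neighbors is what encodes the belief that trends are smooth: a value far from the adjacent years is penalized in proportion to 1/ψ².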

2.3.3. Introducing Interannual Variability With Time Varying Parameters—SWALLOW II

[18] The rest of the formulations, referred to as SWALLOW II, considered time-variant watershed parameters and differed in the nature of the parameters allowed to vary. It is very unlikely that the landscape processes inherent in nonlinear regression models of nutrient loading (e.g., export rates, stream attenuation) operate identically from year to year, and therefore allowing these parameters to vary can be an effective means to accommodate interannual variability. The use of time variant parameters to overcome model structural deficiencies has been investigated for some time [e.g., Beck and Young, 1976]. Perhaps the most widely known approach is the Kalman filter [Kalman, 1960], a sequential model estimation approach that uses a gain function to combine model predictions with system measurements at each time step, inversely weighting each by their uncertainties. This method also postulates an error covariance structure that is estimated along with the rest of the model parameters. Kalman-type approaches to fusing model predictions and measurements online are ubiquitous in many disciplines, including watershed modeling, and have been used as a technique to estimate the values of time-varying parameters [e.g., Moradkhani et al., 2005; Lin and Beck, 2007].

[19] Other approaches in the watershed modeling literature reproduce the temporal variability of parameter values with some type of stochastic process. Reichert and Mieleitner [2009], for instance, used the Ornstein-Uhlenbeck process to accommodate parametric variability in time. Although the underlying assumption of stationarity is a valuable way to account for structural uncertainty while constraining the added complexity of temporally varying parameters, we may not expect a parameter to be stationary with respect to the (always somewhat arbitrary) study time frame, e.g., the intensity of in-stream attenuation would be expected to vary with dry and wet years. Lin and Beck [2007] used a first-order random walk, a nonstationary process, in a dissolved oxygen model of a managed pond. Their analysis shows how time varying parameters can be used to identify structural improvements to models of environmental systems.

[20] For this paper we adapted approaches often used in the context of dynamic linear models (DLMs), which recognize the temporal structure in the data time series with the assumption that the level of the response variable at each time step is influenced by past levels [West and Harrison, 1989; Prado and West, 2010]. Two key points distinguish the DLM approach we employ in this paper from standard regression approaches. First, the DLM approach posits that some or all model parameters vary with time, and that their time series is autocorrelated – the closer in time, the more similar are parameter values. Second, in contrast with regression analysis, where parameters are conditioned on the entire time series, the dynamic parameter estimation is influenced only by prior and current information, not by subsequent data [Stow et al., 2004; Sadraddini et al., 2011a, 2011b]. In principle, the DLM approach is equivalent to the Kalman-type strategies, although the focus here is on a full probabilistic treatment of the underlying uncertainty, instead of a sequential updating of the mean prediction. Further, the sampling of parameter values is not done sequentially through the time series, but rather follows the standard MCMC approach of sampling a proposal point in the parameter space for the entire time series, evaluating that point, and applying the Metropolis Rule in deciding whether to add that point to the Markov chain [Lunn et al., 2000].

[21] In this study, we introduce nonconstant and data-driven variances (with respect to time) using a discount factor on the prior of the first year [Congdon, 2001]. Based on experience from recent work [Azim et al., 2011; Sadraddini et al., 2011a, 2011b] and preliminary trials, we used discount factor values ranging from 0.95 to 0.98. Our dynamic parameter estimation framework is thus

display math

where θt represents any of the SPARROW parameters at time t, ϕt is the corresponding error term for year t sampled from normal distributions with zero mean and variance inline image, and ζ represents the discount factor. θmean, θmin, and θmax represent, respectively, the mean, minimum, and maximum of the literature priors used with the SWALLOW I formulations. The values of θ1 as well as subsequent values θt were constrained within the range θmin to θmax. The gamma distribution assigned to the parameter inline image was constructed such that its mean was equal to the variance of the SWALLOW I informative priors, while the uncertainty of the same distribution reflects our confidence in that mean estimate. To achieve commensurability between the SWALLOW I and SWALLOW II formulations, we assumed a very high level of confidence (i.e., coefficient of variation <5%). Our statistical configuration essentially postulates that between 95% and 98% of the information is carried forward from time t to t + 1; thus, the influence of the original priors decreases as time progresses and is gradually superseded by the influence of the data [Azim et al., 2011; Sadraddini et al., 2011a]. In this study, we examined different combinations of time-variant parameters to identify the most parsimonious structure. In particular, we selected the following four parameter combinations: (1) delivery coefficient (α) alone; (2) the two export coefficients (β1, β2); (3) the two stream attenuation coefficients (ks1, ks2); and (4) the two export coefficients (β1, β2) along with the two stream attenuation coefficients (ks1, ks2). Table 2 presents all statistical formulations examined.
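The discounted random-walk prior described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' implementation: it assumes that the precision of the yearly error term is discounted as ζ^(t−1) (so the variance grows as var_1 / ζ^(t−1)), which is one common convention in the DLM literature, and every numerical value and function name below is hypothetical.

```python
import random


def simulate_discounted_walk(theta_mean, theta_min, theta_max,
                             var_1, zeta, n_years, seed=0):
    """Draw one trajectory theta_1..theta_T from a truncated random-walk prior.

    Assumption (not confirmed by the text): the precision of the yearly
    error term is discounted as zeta**(t - 1), so its variance is
    var_1 / zeta**(t - 1).
    """
    rng = random.Random(seed)

    def truncated_normal(mu, sd):
        # Rejection sampling keeps every draw inside [theta_min, theta_max].
        while True:
            x = rng.gauss(mu, sd)
            if theta_min <= x <= theta_max:
                return x

    # Year 1 is centered on the literature prior mean.
    thetas = [truncated_normal(theta_mean, var_1 ** 0.5)]
    for t in range(2, n_years + 1):
        var_t = var_1 / zeta ** (t - 1)
        # Each later year is centered on the previous year's value.
        thetas.append(truncated_normal(thetas[-1], var_t ** 0.5))
    return thetas


# Hypothetical values chosen only for illustration.
path = simulate_discounted_walk(theta_mean=0.15, theta_min=0.0, theta_max=0.5,
                                var_1=0.05 ** 2, zeta=0.95, n_years=22)
```

With ζ close to 1 the year-to-year variance inflates slowly, which is one way to read the statement that 95% to 98% of the information is carried forward from one year to the next.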

Table 2. Bayesian Statistical Formulations of the SWALLOW Models Examined

| Model Notation | Description |
| --- | --- |
| MCMC | Model residuals are assumed independent. All parameters are static through time. |
| WALK | Random walk of model residuals through time. All prior parameters are independent and model residuals in space are assumed independent. All parameters are static through time. |
| MCMC - αDYN | Model residuals are assumed independent. The α parameter (delivery to streams) varies each year, while all other parameters are static through time. |
| MCMC - βDYN | Model residuals are assumed independent. The β parameters (export coefficients) vary each year, while all other parameters are static through time. |
| MCMC - ksDYN | Model residuals are assumed independent. The ks parameters (stream attenuation) vary each year, while all other parameters are static through time. |
| MCMC - β,ksDYN | Model residuals are assumed independent. The β (export coefficients) and ks parameters (stream attenuation) vary each year, while all other parameters are static through time. |

2.4. Case Study

[22] Hamilton Harbor is a large embayment at the western end of Lake Ontario. The Harbor is designated as one of 17 Canadian Areas of Concern in the Great Lakes Basin under the International Joint Commission due to a history of eutrophication problems manifested as nuisance algal blooms, turbid water, prevalence of toxic cyanobacteria, and low hypolimnetic oxygen concentrations toward the end of the summer stratified period [Hiriart-Baer et al., 2009; Ramin et al., 2011]. The Hamilton Harbor Remedial Action Plan (RAP), a consortium of government, private sector, and community actors, is mandated with restoring and protecting environmental quality and beneficial uses. RAP consultations with local stakeholders have identified a warm water fishery as a priority use for the Harbor [Charlton, 2001]. While earlier work highlighted the critical role of the sewage treatment plants in governing total phosphorus and chlorophyll a concentrations in the Harbor, substantial uncertainty regarding the water quality conditions exists due to the poorly defined nutrient loadings from the drainage basin [Gudimov et al., 2010, 2011].

[23] Hamilton Harbor's drainage basin is about 450 km² in areal extent and consists of watersheds dominated by agricultural (Grindstone and Spencer Creeks) or urban land use (Redhill and Indian Creeks; see Figure 1). Urban and agricultural land together account for 80% of the watershed's surface area. Population in Hamilton has been increasing and urban areas have been expanding, largely at the expense of agricultural land uses (Southern Ontario Land Resource Information System (SOLRIS), Ontario Ministry of Natural Resources, 2008, available from http://lioapp.lrc.gov.on.ca). The soils of the Harbor basin are mainly loams (73%), while organic soils, silty clay loams, and clay loams together make up about 10% of the basin soils. Most of the remainder is composed of rocky outcroppings and ravines. Soils are spread relatively evenly among the four soil hydrologic runoff groups: groups A and B, those least runoff prone, each have 23% coverage, group C has 29% coverage, and group D, the most runoff prone, has 24% coverage. The slopes of the Harbor basin are mild, with the exception of the Niagara Escarpment. The average slope of the entire basin is 4.4%, and ignoring all slopes greater than 30% the average is 3.8%.

Figure 1.

Map of the Hamilton Harbor watershed, western end of Lake Ontario, Ontario, Canada.

2.5. Data Sets

2.5.1. Spatial Data Sets

[24] We provide an extensive description of the spatial data sets used as inputs to the SPARROW model in the auxiliary material and a brief overview here. We used a 10-m digital elevation model to delineate the subwatersheds. Our calibration data set had 6 subwatersheds. Their areas ranged from 25.5 to 75.8 km², with a mean of 49.3 km² and a standard deviation of 24.1 km². There are a total of 118 reach catchments, and each reach catchment discharges into a confluence, reservoir, or water quality monitoring station. Reach catchment areas ranged from 0.02 to 12.3 km², with a mean of 2.5 km² and an interquartile range of 3.5–1.3 = 2.2 km². Each reach is drained by a single stream. The mean stream length is 2.4 km with an interquartile range of 3.2–1.2 = 2.0 km. Two stream classes are included in the model, one for streams of Strahler order 1 or 2, and one for streams of Strahler order 3 or higher [Strahler, 1952]. Four reservoirs were used during the parameter estimation of the SWALLOW models (Figure 1). Nonpoint nutrient sources included in the model were agricultural land and urban land, together representing 80% of the basin area. A single wastewater treatment plant, the Waterdown plant, drained into one of the streams. The mean loading for this plant between 1996 and 2007 was 0.3 tons of phosphorus per year, with an interquartile range of 0.4–0.2 = 0.2 tons per year (Hamilton Harbor Remedial Action Plan Technical Team, Contaminant Loadings and Concentrations to Hamilton Harbor: 2003–2007 Update, Hamilton Harbor Remedial Action Plan Office, Burlington, Ontario, Canada). Nutrient delivery was parameterized as a function of the proportion of each reach covered by wetlands due to their role in moderating nutrient fluxes to receiving waterbodies [Krieger, 2003]. Proportions of wetland ranged from 0 to 1 with a mean of 0.06 and an interquartile range of 0.06 – 0 = 0.06.

2.5.2. Nutrient Loads

[25] We estimated phosphorus loads for each year from 1988 to 2009 at each station using station-specific rating curves, each of which expressed log-transformed daily nutrient loading as a function of log-transformed daily flow:

$$\mathrm{Ln}(L) = a_0 + a_1\,\mathrm{Ln}(Q)$$

where a0 and a1 are regression coefficients, and Ln(Q) refers to log-transformed daily streamflow. All concentration measurements available for each station between 1988 and 2009 were employed in fitting the rating curve. We should stress that this approach to estimating loads accommodates the annual variability in loading associated with variations in the hydrograph, while variations of annual loading due to other factors such as changes in fertilizer application intensity are not captured. By assuming a single response of loading to flow throughout the study time period we may be underestimating the true temporal variability of annual loading, though the estimated log-transformed loads did show significant interannual variability, with coefficients of variation ranging from 0.27 to 0.34. The number of concentration measurements employed at each station ranged from 23 to 161 with a mean of 58. The r² values for the rating curves ranged from 0.71 to 0.92 with a mean of 0.82. Each rating curve was used in conjunction with daily flow records for each year to estimate average daily loading, which was multiplied by 365 to yield total annual loads for each year from 1988 to 2009. We included between 13 and 22 load estimates for each station for a total of 102 load estimates. One annual loading estimate was based on 346 days of estimated loads, while the rest were based on 365 (or 366 for leap years). Annual loads ranged from 0.2 to 6.3 tons yr⁻¹ with a mean of 2.4 tons yr⁻¹. Log-transformed total phosphorus values ranged from −1.6 to 1.8 Ln(tons yr⁻¹) with a mean of 0.6 Ln(tons yr⁻¹). All rating curve calculations were carried out with the U.S. Geological Survey's LOADEST program [Runkel et al., 2004]. 
The concentration measurements were supplied by the Ontario Ministry of the Environment's Provincial Water Quality Monitoring Network, while the daily flows were supplied by the Water Survey of Canada (Ontario Provincial Water Quality Monitoring Network, 2011, unpublished data available from: http://www.ene.gov.on.ca/environment/; Water Survey of Canada, 2011, unpublished data available from: http://www.wsc.ec.gc.ca/applications/H2O/).
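The rating-curve procedure described above can be sketched as an ordinary least squares fit on log-transformed data, followed by scaling the mean predicted daily load to an annual total. This is only an illustration: LOADEST uses its own estimation and retransformation bias-correction methods, which this sketch omits, and the function names and synthetic flow and load values are hypothetical.

```python
import math


def fit_rating_curve(flows, loads):
    """Least squares fit of Ln(L) = a0 + a1 * Ln(Q) from paired daily samples."""
    x = [math.log(q) for q in flows]
    y = [math.log(load) for load in loads]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


def annual_load(a0, a1, daily_flows):
    """Mean predicted daily load multiplied by 365 (no bias correction)."""
    daily = [math.exp(a0 + a1 * math.log(q)) for q in daily_flows]
    return 365 * sum(daily) / len(daily)


# Synthetic samples that follow L = 2 * Q**1.5 exactly (illustration only).
sample_flows = [1.0, 2.0, 4.0, 8.0]
sample_loads = [2.0 * q ** 1.5 for q in sample_flows]
a0, a1 = fit_rating_curve(sample_flows, sample_loads)
load_constant_flow = annual_load(a0, a1, [4.0] * 10)
```

Because a single pair of coefficients is applied to every year, the only source of interannual variability in the estimated loads is the daily flow record, which is the limitation noted in the text.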

[26] Our data quality submodel postulates that the log-transformed loadings are random variables drawn from normal distributions with mean values equal to the unknown true load and variances ($\delta_{i,t}^2$) representing the associated uncertainty at each site for each year. We derive the $\delta_{i,t}$ terms from the 95% confidence intervals of the calculated mean daily loads at each station for each year as provided by the LOADEST program [Runkel et al., 2004], which estimates the variance of the mean predicted load as the sum of the covariances of all daily load estimates [Gilroy et al., 1990, equation (16)]:

$$\mathrm{MSE}(X) = \sum_{i}\sum_{j}\mathrm{Cov}\left[L(i),\,L(j)\right]$$

where MSE(X) is the variance about the mean load prediction for a particular station at a particular year, i and j denote arbitrary days, and L(i) and L(j) denote the loads on days i and j predicted by the rating curve model. The covariance terms are estimated using equations given by Gilroy et al. [1990, equations (17)–(25)], and account for the residual variance of the rating curve model in addition to its parametric uncertainty. The 95% confidence intervals of the mean daily load are calculated using MSE(X); we multiplied the interval limits by 365 and log-transformed them to obtain the width of the 95% confidence interval on the log scale. In keeping with our assumption of log-normality, the values of δi,t were estimated as one quarter of this width. Values of δi,t ranged from 0.09 to 0.55 Ln(tons yr⁻¹) with a mean of 0.19 Ln(tons yr⁻¹).
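The conversion from a 95% confidence interval on the mean daily load to a log-scale standard deviation can be written out as a short calculation. The sketch below follows the steps in the preceding paragraph; the function name and the interval limits (in tons per day) are hypothetical and serve only as an illustration.

```python
import math


def log_scale_sd(ci_lower_daily, ci_upper_daily, days=365):
    """Approximate SD of the log annual load from a 95% CI on mean daily load."""
    lower = math.log(ci_lower_daily * days)
    upper = math.log(ci_upper_daily * days)
    width = upper - lower
    # A 95% normal interval spans roughly 4 standard deviations (2 * 1.96).
    return width / 4.0


# Hypothetical 95% confidence limits on mean daily load, in tons per day.
delta = log_scale_sd(ci_lower_daily=0.004, ci_upper_daily=0.009)
```

Note that the factor of 365 cancels in the difference of logarithms, so the log-scale width depends only on the ratio of the upper to the lower confidence limit.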

2.5.3. Temporal Forcing Factors

[27] We use two climatic forcing factors in this paper: annual precipitation and annual potential evapotranspiration. Both forcing factors were calculated from data collected at Environment Canada's Hamilton Airport station (WMO ID 71263) between the years 1988 and 2009. Total annual precipitation ranged from 677 mm to 1115 mm, with a mean of 901 mm and an interquartile range of 1023–786 = 237 mm.

[28] Potential evapotranspiration serves as an estimate of the annual variability of atmospheric flux of water out of the basin. Evaporation is a pathway for precipitation to exit the basin without contributing to nutrient loading. We estimated daily potential evapotranspiration with the FAO's Penman-Monteith method and then summed to yearly intervals [Allen et al., 1998]. While using potential evapotranspiration as a surrogate for actual evapotranspiration implicitly assumes that the main limitations to atmospheric-water fluxes are related to atmospheric conditions and energy supply and not related to water supply at the surface or stomatal/soil resistance to evapotranspiration, the inclusion of potential evapotranspiration as a temporal forcing factor may nonetheless offer some insights into the annual functioning of the Hamilton Harbor basin. All the details of the calculation are presented in the auxiliary material. Both temporal forcing factors were subjected to a nonparametric standardization ((value-median)/interquartile range) before their inclusion into the model.
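The nonparametric standardization mentioned above, (value − median) / interquartile range, can be sketched as follows. The quartile convention used here (Python's "inclusive" method) is an assumption, since the text does not state which convention was applied, and the five input values simply reuse the reported precipitation summary statistics as stand-ins rather than the actual annual record.

```python
import statistics


def robust_standardize(values):
    """Return (value - median) / interquartile range for each value."""
    med = statistics.median(values)
    # Quartile convention is an assumption; other conventions shift the IQR slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - med) / iqr for v in values]


# Stand-in values in mm (not the actual annual precipitation series).
precip_mm = [677, 786, 901, 1023, 1115]
z = robust_standardize(precip_mm)
```

Unlike a mean and standard deviation scaling, this standardization is not dominated by a single unusually wet or dry year, which matters for a record of only 22 annual values.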

2.6. Overview of Numerical Experiments and Model Evaluation

[29] The flexible framework provided by an empirical model allows many possible realizations, or combinations of inputs and statistical formulations. We here conceptualize the model realization space of SWALLOW as two-dimensional, where the dimensions correspond to the statistical formulation and the temporal (climate) forcing complexity. We employed two W matrices: one that included only annual precipitation and one that included annual precipitation and total potential evapotranspiration. We also examined model realizations that omitted a W matrix altogether, which was only possible with the SWALLOW II formulation.

[30] We used two types of measures to evaluate the different model realizations examined. First, we used the deviance information criterion (DIC), a Bayesian measure of parsimony that rewards model fit but penalizes model complexity [Spiegelhalter et al., 2002]. The DIC is the Bayesian analog of Akaike's Information Criterion [Akaike, 1974]. The DIC is defined as follows:

$$\mathrm{DIC} = \bar{D} + p_D$$

where $\bar{D}$ refers to the posterior mean of the deviance and pD is a measure of the effective number of model parameters. The deviance is defined as the residual information in data Y conditional on a parameter vector θ and is calculated as −2 log{p(Y|θ)} or −2 log{likelihood}. The effective number of parameters is calculated as the posterior mean deviance of the model ($\bar{D}$) minus the estimate of the model deviance calculated when using the posterior means of the parameters ($D(\bar{\theta})$), which corresponds to the trace of the product of Fisher's information and the posterior covariance. A smaller DIC value indicates a more parsimonious, and hence “better,” model. Second, model realizations were evaluated for fit alone using two metrics: (1) the Root Mean Squared Error (RMSE), calculated using the medians of the posterior predictive distributions of the yearly log-transformed loads:

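$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i,t}\left[Y_{i,t} - \left(\mu_{i,t} + W_{\nu,t}\gamma_{\nu}\right)\right]^{2}}$$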

and (2) a Weighted Root Mean Squared Error (WRMSE) calculated using as weights the precision (inverse of variance) of the loading estimates:

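$$\mathrm{WRMSE} = \sqrt{\frac{\sum_{i,t}\delta_{i,t}^{-2}\left[Y_{i,t} - \left(\mu_{i,t} + W_{\nu,t}\gamma_{\nu}\right)\right]^{2}}{\sum_{i,t}\delta_{i,t}^{-2}}}$$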

where Yi,t refers to the measured log-transformed load for subwatershed i at year t, μi,t+ Wν,tγν refers to the median of the posterior predictive distribution of the log-transformed loads from subwatershed i at year t, and n represents the number of total nutrient loading measurements.
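The three evaluation measures described above can be computed from MCMC output along the following lines. This is a hedged sketch: the function names and example numbers are hypothetical, and normalizing the weighted error by the sum of the weights is an assumption about how the WRMSE is scaled.

```python
import math


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, pD) given sampled deviances and D evaluated at posterior means."""
    d_bar = sum(deviance_samples) / len(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


def rmse(observed, predicted):
    """Root mean squared error of log-transformed loads."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


def wrmse(observed, predicted, sds):
    """RMSE weighted by precision (1 / variance); normalization is an assumption."""
    weights = [1.0 / s ** 2 for s in sds]
    num = sum(w * (y - p) ** 2 for w, y, p in zip(weights, observed, predicted))
    return math.sqrt(num / sum(weights))


# Hypothetical numbers for illustration only.
dic_value, p_d = dic([-30.0, -28.0, -26.0], deviance_at_posterior_mean=-31.0)
unweighted = rmse([0.5, 1.0], [0.7, 0.8])
weighted = wrmse([0.5, 1.0], [0.7, 0.8], sds=[0.2, 0.2])
```

With equal standard deviations the weighted and unweighted errors coincide; the two diverge only when poorly constrained load estimates also carry large residuals.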

3. Results

3.1. Evaluation of Model Performance

[31] The DIC values of the different models parameterized with the total phosphorus data are presented in Table 3. The corresponding RMSE and WRMSE values are also presented in the auxiliary material. The highly favorable DIC values of the WALK formulations suggest systematic changes in the phosphorus exports from the watershed unaccounted for by the SPARROW model and the climatic covariates considered herein. For any number of climatic predictors though, the RMSE value of WALK is usually higher than the corresponding RMSE of the SWALLOW II formulations, indicating that the favorable parsimony score of WALK is likely driven by its lower number of stochastic nodes more than its goodness of fit.

Table 3. Deviance Information Criterion for All Statistical Formulations Used to Model Total Phosphorus Loading^a

| Formulation | Pred0 | Pred1 | Pred2 |
| --- | --- | --- | --- |
| SWALLOW I | | | |
| MCMC | | −26.1 | −25.8 |
| WALK | | −80.0 | −79.7 |
| SWALLOW II | | | |
| MCMC - αDYN | −18.5 | −29.9 | −30.2 |
| MCMC - βDYN | −44.6 | −45.2 | −45.7 |
| MCMC - ksDYN | −78.4 | −80.3 | −79.9 |
| MCMC - β,ksDYN | −38.1 | −37.8 | −37.1 |

^a Pred0 refers to the sole use of SPARROW to accommodate interannual loading variability; Pred1 refers to the use of SPARROW along with the total annual precipitation; Pred2 refers to the use of SPARROW along with the total precipitation and the total annual potential evapotranspiration.

[32] Despite the complexity entailed by the use of time-variant export and/or attenuation coefficients, the SWALLOW II formulations were generally found to be more parsimonious than the MCMC SWALLOW I configuration and were comparable to the WALK SWALLOW I configuration. For total phosphorus, when the export (β1, β2) or stream attenuation coefficients (ks1, ks2) were allowed to vary with time (SWALLOW II), they provided comparatively better results than the static MCMC SWALLOW I configuration. The statistical formulation that allowed both the export and stream attenuation coefficients to vary was not supported by the DIC. While our data set consists of 102 measurements, the SWALLOW II statistical formulation sequentially fits each year with six or fewer points. Yet, although the parameterization of a particular year is conditional upon the information contained in the preceding ones (see equation (12)), it is still likely that allowing four parameters to vary in time is simply too complex for our data set, despite the likelihood that the processes represented by both export coefficients and stream attenuation coefficients vary annually. SWALLOW II model realizations that included total precipitation, alone or together with potential evapotranspiration, as temporal predictors were typically characterized by only minor improvements of their DIC values relative to those derived without temporal predictors (Pred0), suggesting that temporal variability of the SPARROW model parameters may be sufficient to describe the interannual nutrient loading variability.

[33] We include quantile-quantile and autocorrelation plots of posterior mean residuals in Figures 2 and 3 in order to assess the likelihood assumptions of normality and temporal independence of residuals made by all formulations except WALK. We present residuals from the formulations MCMC - Pred1 and MCMC - Pred1 - ksDYN, the most parsimonious SWALLOW I and SWALLOW II formulations. In accordance with our likelihood assumptions, the two components of the likelihood function (equation (9)) are assessed individually. Generally, the quantile-quantile plots show that the residual distributions were centered around the 1:1 line, although the residuals of measured from estimated “true” loads are characterized by somewhat leptokurtic patterns. Interestingly, the latter deviation patterns from the normality assumption were mainly associated with the substantial uncertainty characterizing the loading estimates from Redhill Creek (see auxiliary material Figures S1–S5). The two significant deviations at the lower range of the residuals of “true” from modeled load for SWALLOW I formulation MCMC - Pred1 both correspond to the year 1999, which as we detail in section 3.2.2 was characterized by significantly different parameter values by the SWALLOW II framework. The spatially averaged residuals of the SWALLOW I formulation are relatively independent in time, while the SWALLOW II spatially averaged residuals between “true” and modeled loads manifest some dependence on time (Figure 3, bottom-right). Interestingly, the negative correlation coefficient suggests an oscillatory pattern of the residuals, instead of the expected grouping of over and under predictions with each other, which a positive correlation would indicate. A time series plot showed oscillatory behavior of the residuals in the final 3 years of the study. 
We omitted these 3 years and calculated a lag-1 correlation coefficient of only −0.35, whose magnitude was below the critical correlation coefficient of ±2/√22 = ±0.43 expected from a random process generating 22 time steps of data. The three omitted years correspond to a time period when only three of the stations were active, the sparsest period of our data record.
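The temporal independence check described above amounts to comparing a lag-1 autocorrelation coefficient with the approximate critical value 2/√n. A minimal sketch follows; the function names are hypothetical and the alternating series is synthetic, chosen only to show what a strongly oscillatory residual pattern looks like.

```python
import math


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence of residuals."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    return num / denom


def critical_value(n):
    """Approximate 95% bound for the autocorrelation of n independent values."""
    return 2.0 / math.sqrt(n)


# Synthetic, perfectly alternating residuals over 22 years (illustration only).
oscillating = [1.0 if t % 2 == 0 else -1.0 for t in range(22)]
r1 = lag1_autocorrelation(oscillating)
bound = critical_value(22)
```

A strongly negative coefficient, as in this synthetic series, signals year-to-year oscillation rather than runs of consecutive over- or underpredictions.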

Figure 2.

(top) Quantile-quantile plots and (bottom) autocorrelation function plots for SWALLOW I formulation MCMC - Pred1. Autocorrelation plots show average of residuals across stations. (left) Residuals of measured from estimated “true” load ( inline image, see equation (7)) and (right) residuals of “true” from modeled load ( inline image, see equation (8)). Circles represent posterior mean residuals, gray lines the 95% credible interval, and black lines the 1:1 line. Gray lines in the autocorrelation plots represent 95% confidence interval for correlation given sample size.

Figure 3.

(top) Quantile-quantile plots and (bottom) autocorrelation function plots for SWALLOW II MCMC - ksDYN formulation with the Pred1 climate forcing complexity. Autocorrelation plots show average of residuals across stations. (left) Residuals of measured from estimated “true” load ( inline image, see equation (7)) and (right) residuals of “true” from modeled load ( inline image, see equation (8)). Circles represent posterior mean residuals, gray lines the 95% credible interval, and black lines the 1:1 line. Gray lines in the autocorrelation plots represent 95% confidence interval for correlation given sample size.

3.2. Posterior Parameter Distributions

[34] Table 4 shows the differences of the posterior parameter means and standard deviations among the various models when considering the total precipitation as the sole temporal predictor. The reported values of the time-varying parameters are averages of the mean and standard deviation values across all years examined. The parameter distributions are generally consistent across the formulations and a careful inspection of their values offers insights into the watershed functioning. We found that the consideration of the proportion of each reach covered by wetlands led to well-identified delivery coefficients (α). For a reach with areal wetland coverage of 6%, the mean of our data set, the delivery coefficient values predict stream deliveries of about 57% of the total phosphorus export predicted by the corresponding export coefficients. The total phosphorus export coefficients from agriculture (β1) and urban land (β2) were well-identified and broadly in agreement with previous SPARROW applications [Alexander et al., 2002; García et al., 2011].
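The figure of about 57% quoted above can be checked with a short calculation, assuming the exponential land-to-water delivery form exp(−α × wetland proportion) that is standard in SPARROW applications (the delivery equation itself is not restated in this section, so the functional form is an assumption here). The α values are the posterior means reported in Table 4; the function name is hypothetical.

```python
import math

# Posterior mean delivery coefficients (alpha) as reported in Table 4.
alpha_means = {
    "MCMC": 8.32,
    "WALK": 9.45,
    "MCMC - aDYN": 10.59,
    "MCMC - bDYN": 9.81,
    "MCMC - ksDYN": 9.23,
    "MCMC - b,ksDYN": 10.50,
}
wetland_fraction = 0.06  # mean wetland proportion of a reach catchment


def delivered_fraction(alpha, wetland):
    # Assumed functional form: exp(-alpha * wetland proportion).
    return math.exp(-alpha * wetland)


fractions = {name: delivered_fraction(a, wetland_fraction)
             for name, a in alpha_means.items()}
mean_fraction = sum(fractions.values()) / len(fractions)
```

Under this assumption the delivered fractions fall between roughly 0.53 and 0.61 across formulations, with an average near 0.56, in line with the value stated in the text.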

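Table 4. Markov Chain Monte Carlo Estimates of the SWALLOW Models Parameterized With Total Phosphorus Data^a

| Parameters | SWALLOW I, MCMC, Mean | SWALLOW I, MCMC, SD | SWALLOW I, WALK, Mean | SWALLOW I, WALK, SD | SWALLOW II, MCMC - αDYN, Mean | SWALLOW II, MCMC - αDYN, SD | SWALLOW II, MCMC - βDYN, Mean | SWALLOW II, MCMC - βDYN, SD | SWALLOW II, MCMC - ksDYN, Mean | SWALLOW II, MCMC - ksDYN, SD | SWALLOW II, MCMC - β,ksDYN, Mean | SWALLOW II, MCMC - β,ksDYN, SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| α | 8.32 | 1.41 | 9.45 | 1.25 | 10.59 | 1.77 | 9.81 | 1.07 | 9.23 | 1.32 | 10.50 | 1.36 |
| β1 | 0.17 | 0.04 | 0.17 | 0.04 | 0.18 | 0.04 | 0.20 | 0.05 | 0.13 | 0.02 | 0.16 | 0.07 |
| β2 | 0.08 | 0.02 | 0.08 | 0.03 | 0.07 | 0.02 | 0.10 | 0.07 | 0.06 | 0.02 | 0.07 | 0.06 |
| kr | 14.08 | 4.13 | 15.07 | 4.16 | 14.52 | 4.40 | 14.80 | 3.68 | 15.39 | 3.65 | 15.58 | 3.70 |
| ks1 | 0.19 | 0.07 | 0.18 | 0.07 | 0.17 | 0.06 | 0.19 | 0.05 | 0.13 | 0.07 | 0.13 | 0.11 |
| ks2 | 0.04 | 0.02 | 0.04 | 0.01 | 0.04 | 0.02 | 0.05 | 0.01 | 0.03 | 0.01 | 0.03 | 0.02 |
| γ1 | 0.35 | 0.05 | 0.29 | 0.08 | 0.35 | 0.05 | 0.39 | 0.31 | 0.29 | 0.08 | 0.12 | 0.31 |
| σ | 0.21 | 0.02 | 0.05 | 0.02 | 0.17 | 0.02 | 0.04 | 0.01 | 0.03 | 0.01 | 0.03 | 0.01 |
| ψ | | | 0.31 | 0.06 | | | | | | | | |
| ασ | | | | | 3.30 | 0.17 | | | | | | |
| β1σ | | | | | | | 1.64 | 0.08 | | | 1.64 | 0.08 |
| β2σ | | | | | | | 1.82 | 0.09 | | | 1.82 | 0.09 |
| ks1σ | | | | | | | | | 0.71 | 0.01 | 0.71 | 0.01 |
| ks2σ | | | | | | | | | 1.00 | 0.02 | 1.00 | 0.02 |
| DIC | −26.1 | | −80.0 | | −29.9 | | −45.2 | | −80.3 | | −37.8 | |
| RMSE | 0.26 | | 0.14 | | 0.23 | | 0.10 | | 0.14 | | 0.11 | |
| WRMSE | 0.25 | | 0.09 | | 0.19 | | 0.07 | | 0.08 | | 0.05 | |

^a All statistical formulations refer to the Pred1 level of climate forcing complexity. Reported values of β and ks are averages of the mean and SD across all years of the study period. Units of β are tons P km⁻² yr⁻¹. Units of ks are km⁻¹. Units of kr are m yr⁻¹.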

[35] The reservoir (kr) and stream (ks) attenuation coefficients are generally in agreement with previous SPARROW applications [Alexander et al., 2002; García et al., 2011]. The stream attenuation rates were higher for smaller (first and second order) streams than for larger (third and higher order) streams, reflecting the greater contact of water and streambed as well as the longer hydraulic residence time in smaller streams [Stream Solute Workshop, 1990]. Our parameter results indicate that on average around 16% of phosphorus is lost per kilometer of small stream transit and only 4% per kilometer of large stream transit. The precipitation coefficient (γ1) has a value of roughly 0.3 and is well identified for most of the formulations examined. A positive value of this coefficient indicates that greater precipitation results in greater loading. We also note that no significant relationships were found between the time series of the dynamic parameters and the corresponding annual precipitation inputs when the precipitation was used as a covariate (r² < 0.2). On a final note, the SWALLOW II formulations were generally characterized by much lower model structural error (σ) than their SWALLOW I counterparts, reinforcing the model improvement with the dynamic watershed parameters.
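The per-kilometer losses quoted above can be related to the attenuation coefficients with a brief calculation, assuming the first-order decay form exp(−ks × length) commonly used for in-stream attenuation in SPARROW (the attenuation equation is not restated in this section, so this form is an assumption). The coefficients are the posterior means listed in Table 4; the function name is hypothetical.

```python
import math


def fraction_lost(k_per_km, length_km=1.0):
    # Assumed first-order decay: fraction remaining is exp(-k * length).
    return 1.0 - math.exp(-k_per_km * length_km)


# Posterior mean attenuation coefficients (km^-1) across the six formulations.
ks1_means = [0.19, 0.18, 0.17, 0.19, 0.13, 0.13]  # first and second order streams
ks2_means = [0.04, 0.04, 0.04, 0.05, 0.03, 0.03]  # third and higher order streams

ks1_avg = sum(ks1_means) / len(ks1_means)
ks2_avg = sum(ks2_means) / len(ks2_means)
loss_small = fraction_lost(ks1_avg)
loss_large = fraction_lost(ks2_avg)
```

Under this assumption the average loss is about 0.15 per kilometer in small streams and about 0.04 per kilometer in large streams, close to the figures in the text; for coefficients this small, the fraction lost per kilometer is approximately equal to the coefficient itself.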

[36] An interesting systematic effect was observed with the parameter estimates of the SWALLOW II formulation that were allowed to vary with time. While the posterior parameter distributions were fairly consistent across the different models examined, the SWALLOW II formulations resulted in posterior mean values for the dynamic parameters that could differ substantially compared to their static counterparts. One plausible explanation for this discrepancy may be the nature of the parameter estimation process along with the functional role of the priors with the two strategies. Namely, the SWALLOW I formulations use the literature-derived priors to update our knowledge about the average value of the different parameters for the entire time period, while the SWALLOW II formulations with sequential parameter estimation use the prior information solely for the first year, after which the estimate for the previous year supplies the most likely value for the next year's prior.

3.2.1. Effects of Temporal Predictors on Parameter Values

[37] The posterior parameter means and standard deviations for the formulation that considers dynamic stream attenuation coefficients (ks1, ks2) as well as the data uncertainty are provided in Table 5. The total potential evapotranspiration is a poor predictor of loading and its coefficient was not well identified. This result was observed even with the much simpler SWALLOW I formulations. Yet, it is unclear whether the lack of support for potential evapotranspiration as a predictor of annual loading stems from its weak causal link with the nutrient export in Hamilton Harbor or its inability to appreciably capture the dynamics of actual evapotranspiration. While the parameter estimates are fairly consistent among the three levels of climate forcing complexity, the largest differences were found between Pred0 and the rest of the realizations that considered temporal predictors. Adding temporal predictors tended to increase the importance of wetlands in modulating delivery to streams and decreased both the export coefficients and the small-stream attenuation coefficient (ks1). We also note that no significant relationship exists between the annual ks estimates and annual precipitation for the Pred1 realizations (r² < 0.05), whereas the annual precipitation appears to be a significant predictor of the small-stream attenuation coefficient (r² = 0.32, slope = −0.6, p < 0.01) with the Pred0 realization. The latter finding highlights the tradeoffs when using forcing factors and time-varying parameters, in that the inclusion of significant forcing factors may reduce the variability of the time-varying watershed parameters.

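Table 5. Markov Chain Monte Carlo Estimates of the MCMC - ksDYN Formulation Parameterized With Total Phosphorus Data Across Different Levels of Climate Forcing Complexity^a

| Parameters | Pred0, Mean | Pred0, SD | Pred1, Mean | Pred1, SD | Pred2, Mean | Pred2, SD |
| --- | --- | --- | --- | --- | --- | --- |
| α | 8.73 | 0.88 | 9.23 | 1.32 | 9.17 | 1.28 |
| β1 | 0.16 | 0.02 | 0.13 | 0.02 | 0.13 | 0.02 |
| β2 | 0.08 | 0.02 | 0.06 | 0.02 | 0.06 | 0.02 |
| kr | 15.94 | 3.77 | 15.39 | 3.65 | 15.32 | 3.56 |
| ks1 | 0.19 | 0.07 | 0.13 | 0.07 | 0.13 | 0.07 |
| ks2 | 0.03 | 0.02 | 0.03 | 0.01 | 0.03 | 0.02 |
| γ1 | | | 0.29 | 0.08 | 0.32 | 0.11 |
| γ2 | | | | | 0.04 | 0.10 |
| σ | 0.03 | 0.01 | 0.03 | 0.01 | 0.04 | 0.01 |
| ks1σ | 0.71 | 0.01 | 0.71 | 0.01 | 0.71 | 0.01 |
| ks2σ | 1.00 | 0.02 | 1.00 | 0.02 | 1.00 | 0.02 |
| DIC | −75.0 | | −77.0 | | −76.6 | |
| RMSE | 0.14 | | 0.14 | | 0.15 | |
| WRMSE | 0.07 | | 0.08 | | 0.08 | |

^a Reported values of ks are averages of the mean and SD across all years of the study period.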

3.2.2. Temporal Variability of Walk Errors and SWALLOW II Parameters

[38] We found plausible mechanisms to explain the interannual variability of the WALK correlated errors (vt terms) as well as the posterior medians of the SWALLOW II parameters from various formulations. We first consider the vt terms, as these empirical autocorrelated error terms encapsulate the “missing signal” from the static parameterization. Figure 4 shows that annual streamflow explains most of the variability of the vt autocorrelated error terms of the WALK - E1 - Pred1 realization. Likewise, the vt terms appear to covary with the large-stream attenuation parameter estimates of the most parsimonious SWALLOW II - E1 realization, MCMC - ksDYN - E1 - Pred1. Further, the vt terms explain the majority of the variability in the agricultural export terms in the MCMC - βDYN - E1 - Pred0 realization (r² = 0.60) and the delivery terms in the MCMC - αDYN - E1 - Pred1 realization (r² = 0.79). On the other hand, there were no significant relationships of the vt terms with the runoff ratio (total runoff/total flow; r² < 0.15), nor with the annual population of the Hamilton Census Metropolitan Area (r² < 0.01). Interestingly, precipitation was also not correlated with the vt terms (r² < 0.02), so some aspect of the system related to flow other than total precipitation is being captured by the vt terms.

Figure 4.

Scatterplots of random walk correction factor (vt), average annual flow, and large stream attenuation (ks2) for total phosphorus. Calculations were carried out using the MCMC - ksDYN formulation with the Pred1 climate forcing complexity.

[39] Figure 5 shows the stream attenuation coefficients as a function of average annual streamflow measured at Grindstone Creek, the largest watershed with an unmanaged flow regime. Little of the variability of the small-stream attenuation estimates can be explained by the flows at Grindstone Creek. This counterintuitive result is likely due to the deficiency of the calibration data set in headwater sites, as further described in section 3.4. More than half of the variability of the large-stream attenuation is explained by the average streamflow. This suggests that the attenuation parameter values could partially compensate for the lack of information about the rainfall-runoff process in the model. We also note that the lower attenuation values during periods of higher flow are plausible and in agreement with previous theoretical and empirical work on stream ecology, as the biotic (uptake) and abiotic (settling) processes responsible for attenuation have much less time to exert control on the nutrient load en route to the receiving water body when the streamflow rate is higher [Stream Solute Workshop, 1990; Donner et al., 2004; Basu et al., 2011]. The emergence of this pattern from an empirical model is a very interesting result. Figure 6 presents time series plots of the posterior values of the stream attenuation coefficients for one SWALLOW I realization (MCMC - E1 - Pred1) and one SWALLOW II realization (MCMC - ksDYN - E1 - Pred1). The spike in large-stream attenuation in 1999 corresponds with the year of the lowest average and maximum flows during the study for Grindstone (station 3) and Spencer Creeks (station 6) and the third (fourth) lowest average (maximum) streamflow values for Redhill Creek (station 2). As previously mentioned, these two formulations have different relationships to the literature prior, and therefore complete agreement of the resulting parameterizations is unlikely. 
However, it is clear that there is significant interannual variability of the stream attenuation coefficients. This variability is important to take into account when a temporally static SPARROW implementation is used to estimate the locations of nutrient source areas, as discussed in section 3.3.
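The dependence of attenuation on flow can be examined with an ordinary log-log regression of the annual attenuation estimates on mean annual flow. The following is a minimal sketch of that calculation, not the analysis behind Figure 5: the yearly values are made up for illustration, and the helper name `ols` is hypothetical.

```python
import math

def ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

# Hypothetical annual values, NOT taken from the study
flow = [0.4, 0.6, 0.8, 1.0, 1.3, 1.6]             # mean annual flow
ks2 = [0.060, 0.045, 0.035, 0.030, 0.024, 0.020]  # large-stream attenuation

# Regress log attenuation on log flow
log_flow = [math.log(q) for q in flow]
log_ks2 = [math.log(k) for k in ks2]
intercept, slope, r2 = ols(log_flow, log_ks2)
print(f"slope = {slope:.2f}, r2 = {r2:.2f}")
```

A negative slope in log-log space corresponds to attenuation that declines as flow increases; the r2 value gives the share of variability in the attenuation estimates explained by flow.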

Figure 5.

Scatterplots of yearly stream attenuation coefficient values for the total phosphorus model against average streamflow (ks1 refers to attenuation in first- and second-order streams, ks2 to attenuation in third- and higher-order streams). These results correspond to the MCMC - ksDYN formulation with the Pred1 climate forcing complexity.

Figure 6.

Time series plots of ks for the total phosphorus model. Black and gray lines refer to parameters from the MCMC - ksDYN and MCMC formulations, respectively, with the Pred1 climate forcing complexity. Dashed lines indicate the upper and lower limits of the 95% credible interval; solid lines indicate the medians of the distributions.

3.3. Spatio-Temporal Identification of Source Areas

[40] The spatially distributed nature of the SPARROW model offers estimates of the sources and movement of contaminant masses within the basin, and the dynamic augmentation allows this analysis to be extended in time to gain an understanding of how contaminant source and sink processes and source areas change through time. Of course, the inference drawn regarding the temporal variability of source areas in this particular exercise is subject to the ability of our calibration data set to represent temporal trends in loading. As we document in section 2.5.2, we may be underestimating the true interannual variability of our load estimates. An exploration of how the annual loading estimates propagate through the model to the year-to-year contributions of source areas can therefore offer insights into the credibility of any modeling exercise that aims to accommodate interannual variability in a watershed context. We used the posterior parameter distributions from the MCMC - ksDYN - E1 - Pred1 realizations to estimate annual basin loads and source areas. Estimated basin load of total phosphorus ranged from 6.6 ± 2.2 to 18.1 ± 4.7 tons per year with a mean of 11.0 ± 3.3 tons per year (the errors are in units of 1 standard deviation). While we recognize that these whole-basin load estimates are subject to the caveat of applying the model coefficients to areas smaller than the calibration subwatersheds, it should be noted that less than 15% of the total basin area falls into this category. The annual precipitation alone explained a substantial portion of the temporal variability of whole basin estimates of total phosphorus (r² = 0.61, p < 0.01).

[41] Our year-specific estimates of watershed parameters offer insights into the nutrient delivery to the Harbor for each year in addition to the static estimates typically made with SPARROW. Figure 7 shows the spatial and temporal variability of total phosphorus yield delivered to Hamilton Harbor at both the subwatershed and the reach scale. The subwatershed scale maps show that proximity to the Harbor is an important factor in determining the load levels, but the reach scale maps reveal that proximity to the large (third order and higher) streams is also a significant predictor of high areal delivery, likely because the small-stream attenuation coefficients were consistently higher than the large-stream attenuation coefficients. The coefficient of variability of interannual phosphorus delivery appears to increase upstream from the Harbor, where the effect of the variability of the stream coefficients is the highest. Figure 8 presents the estimated per area deliveries at the subwatershed and reach scale for the years 1999 and 2006, i.e., the years of the highest and lowest values of large stream attenuation, respectively (see ks2 values in Figure 6). It is clear that the temporal variability of the watershed parameters affects the spatial variability of estimated watershed per area deliveries for total phosphorus. The estimated whole-basin delivery of total phosphorus in 1999 was 6.7 ± 2.1 tons and in 2006 was 15.0 ± 4.1 tons.
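To see why years of high and low large-stream attenuation yield different source-area maps, consider a first-order loss of load along the flow path, the general form of in-stream decay used in SPARROW-type models. The sketch below is illustrative only: the source loads, reach lengths, and attenuation rates are hypothetical, attenuation is assumed to act per kilometre of channel, and the function names are introduced here for illustration.

```python
import math

def delivered_fraction(path_km, k_per_km):
    """Fraction of a load that survives first-order in-stream loss.

    path_km: lengths of the reaches between the source and the outlet.
    k_per_km: attenuation rate applied to each of those reaches.
    """
    return math.exp(-sum(k * length for k, length in zip(k_per_km, path_km)))

def percent_contribution(loads, fractions):
    """Share of the delivered load attributable to each source, in percent."""
    delivered = [load * frac for load, frac in zip(loads, fractions)]
    total = sum(delivered)
    return [100.0 * d / total for d in delivered]

# Two hypothetical sources with equal loads on a large stream:
# one 2 km from the outlet, the other 30 km upstream.
loads = [1.0, 1.0]
paths = [[2.0], [30.0]]

# Hypothetical large-stream attenuation rates for two contrasting years
for label, ks2 in [("high-attenuation year", 0.06), ("low-attenuation year", 0.02)]:
    fractions = [delivered_fraction(p, [ks2] * len(p)) for p in paths]
    near, far = percent_contribution(loads, fractions)
    print(f"{label}: near source {near:.0f}%, far source {far:.0f}%")
```

With these made-up numbers, the source close to the outlet accounts for a larger share of the delivered load in the high-attenuation year, which is the qualitative contrast between the 1999 and 2006 maps.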

Figure 7.

Spatial variability of total phosphorus delivered yield at the (top) watershed and (bottom) reach scales. (left) Mean percent contribution of total load to the Harbor for all years per square kilometer. (right) The coefficients of variability of mean percent contribution across years. These results correspond to the MCMC - ksDYN formulation with the Pred1 climate forcing complexity.

Figure 8.

Spatial-temporal variability of total phosphorus delivered yield at the (top) watershed and (bottom) reach scales. (left) The percent contribution of total load to the Harbor per square kilometer for 2006, the year with the lowest value of ks2. (right) The percent contribution of total load to the Harbor per square kilometer for 1999, the year with the highest value of ks2. These results correspond to the MCMC - ksDYN formulation with the Pred1 climate forcing complexity.

3.4. Jackknife Model Evaluation

[42] While the time for space substitution allowed us to parameterize the model, the spatial sampling intensity of the calibration data set was admittedly low. To evaluate whether our data contain sufficient information to impartially draw inferences about the relative contribution of different source areas as well as the interplay between temporal and spatial variability, we performed a jackknife experiment in which the most parsimonious model realization (MCMC - ksDYN - Pred1 - E1) was parameterized against a set of data without the load measurements from one of the six stations. The exercise was carried out six times, each time omitting a different station. Our hypothesis was that if the calibration data set does not have enough spatial detail, the parameter values should change significantly when the data from any particular station are omitted. This is of course subject to the caveat that we do not learn how well the model performs in areas which are not explicitly represented in the calibration data set, e.g., small streams with drainage basins less than 25 km², which include headwater areas and areas along the shore of the Harbor. The total phosphorus parameter posteriors were fairly consistent across the six station omissions (Table 6). The station omission realization with the least correspondence to the posteriors obtained with the full data set is the one without the headwater station for Spencer Creek (station 4), which is the only headwater station of the entire study catchment (Figure 1). The largest discrepancy of phosphorus export coefficients occurs when one of the two urban stations is omitted (station 1).

Table 6. Jackknife Experiment: Markov Chain Monte Carlo Estimates of the MCMC - ksDYN Formulation With the Pred1 Climate Forcing Complexity Parameterized With Total Phosphorus Data^a

Parameter  Station omitted
           0             1             2             3             4             5             6
           Mean   SD     Mean   SD     Mean   SD     Mean   SD     Mean   SD     Mean   SD     Mean   SD
α          9.23   1.32   8.33   1.14   8.74   1.21   9.72   1.50   6.12   2.18   9.04   1.47   9.40   1.27
β1         0.13   0.02   0.14   0.02   0.13   0.02   0.12   0.02   0.09   0.02   0.13   0.02   0.13   0.02
β2         0.06   0.02   0.06   0.02   0.08   0.03   0.05   0.02   0.05   0.01   0.06   0.02   0.06   0.02
kr         15.39  3.65   15.97  3.79   15.84  3.69   15.43  3.65   10.90  3.14   13.95  3.99   16.32  4.96
ks1        0.13   0.07   0.18   0.10   0.15   0.08   0.10   0.07   0.07   0.05   0.12   0.08   0.11   0.07
ks2        0.03   0.01   0.02   0.01   0.03   0.01   0.03   0.02   0.03   0.01   0.03   0.02   0.03   0.02
γ1         0.29   0.08   0.30   0.09   0.29   0.08   0.30   0.07   0.29   0.06   0.27   0.08   0.26   0.08
σ          0.03   0.01   0.03   0.01   0.04   0.01   0.04   0.01   0.04   0.01   0.04   0.02   0.04   0.02
ks1σ       0.71   0.01   0.71   0.01   0.71   0.01   0.71   0.01   0.71   0.01   0.71   0.01   0.71   0.01
ks2σ       1.00   0.02   1.00   0.02   1.00   0.03   1.00   0.02   1.00   0.02   1.00   0.02   1.00   0.02

^a Each column indicates which station was omitted. The first column is taken from Table 4 and is intended for comparison purposes.

[43] We also used the jackknife experiment to gauge the strength of the time for space substitution. We wanted to ascertain whether the model was able to reproduce the values of the omitted stations. More specifically, from the realization omitting station 1 (column 1 in Table 6), we took the predictions of the (logged) load at station 1; from the realization omitting station 2 (column 2), we took the loads predicted at station 2; and so on. These predictions were used as the independent variable in a regression against the measured (logged) loading data. This regression was significant (p < 0.001, r² = 0.91, slope = 0.85). Even with less information in space, the model is able to reasonably predict loads at locations not used in calibration, provided those locations are comparable to those included in the calibration data set.
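The leave-one-station-out check can be sketched as follows. This is a simplified illustration under stated assumptions: a straight-line stand-in for the model (log load against log drainage area), one value per station, and hypothetical station data. It is not the SWALLOW calibration itself, and the function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def leave_one_out_predictions(x, y):
    """Predict each station from a fit that excludes that station."""
    predictions = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        predictions.append(a + b * x[i])
    return predictions

# Hypothetical log drainage areas and log loads for six stations
log_area = [3.3, 4.1, 4.4, 4.9, 5.2, 5.5]
log_load = [0.2, 1.1, 1.2, 1.9, 2.0, 2.6]

held_out = leave_one_out_predictions(log_area, log_load)
# Regress the observations on the held-out predictions (predictions as x)
intercept, slope = fit_line(held_out, log_load)
print(f"slope = {slope:.2f}, r2 = {r_squared(held_out, log_load):.2f}")
```

A slope near 1 and a high r2 in the final regression would indicate that stations left out of the calibration are reproduced about as well as those kept in.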

4. Discussion

[44] In this paper, we presented a methodological framework that aims to facilitate SPARROW application on scales of relevance to local management and to study the relative contribution of loading source areas over time. Our analysis offers a new perspective into the SPARROW modeling practice by shifting the focus toward an examination of the interplay between time and space. We adopted a repeated measures approach that enables the model to be parameterized in relatively small areas with comparatively few monitoring sites, and subsequently examined two strategies to accommodate temporal variability of the nutrient loading estimates. The first approach (SWALLOW I) assumes that the SPARROW model provides a time-invariant baseline estimate of watershed loading, while weather-related forcing factors describe the temporal variability. The second one (SWALLOW II) assumes that the processes described by the SPARROW model are dynamic and are further modulated by temporal predictors. We integrated this framework with Bayesian calibration schemes, founded upon informative prior parameter distributions and statistical formulations that can explicitly consider the data uncertainty and/or the temporal structure of model residuals. Our results show that the SWALLOW framework is able to accommodate the interannual variability of the nutrient loading estimates. Importantly, the dynamic SWALLOW II approach appears to effectively balance performance and complexity. We also found that the temporal changes of SPARROW model parameters can be significant, thereby driving year-to-year variability of model-estimated total phosphorus source areas. The remainder of the discussion addresses the factors comprising the study design (statistical formulation and temporal predictors), the role of the spatial sampling protocol, and the plausibility of the model parameterization obtained.

4.1. Role of Statistical Formulations and Temporal Predictors

[45] Previous research has considered time-varying parameters in the context of conceptual rainfall-runoff models [e.g., Reichert and Mieleitner, 2009] as well as models of other environmental systems (e.g., a managed pond, Lin and Beck [2007]). While some of these efforts have significantly improved our predictive capacity, the resulting time series of parameter values does not always give clear ideas about the structural model deficiencies. In this study, we provided two pieces of evidence that corroborate the mechanistic basis of our time-varying stream attenuation coefficients. First, we showed that the annual stream attenuation estimates of phosphorus are inversely proportional to the error terms of our WALK formulation, suggesting that the assumption of a static attenuation parameter may be responsible for much of the error variability [Sadraddini et al., 2011a]. Second, consistent with empirical findings and ecological theory [Stream Solute Workshop, 1990], model estimated (log transformed) stream attenuation is inversely proportional to the (log transformed) mean annual flow. The latter finding may partly indicate that the values of in-stream attenuation compensate for the structural inadequacy of the SWALLOW I model in describing the transformation of precipitation into runoff. Earlier work postulated a resemblance between time-varying parameters and mean-reverting statistical processes [Reichert and Mieleitner, 2009; Tomassini et al., 2009], whereas we here adopt a formulation akin to that used in dynamic linear modeling [Sadraddini et al., 2011a, 2011b]. In contrast to a mean-reverting process, which is intended to control the uncertainty of posterior parameter values, our approach led to a minor broadening of the 95% credible intervals of the posterior distributions of the dynamic parameters (Figure 6). Yet, our dynamic approach represents a way to relax the assumption of stationarity that a mean-reverting process makes, and therefore depicts systematic trends that cannot be otherwise accommodated, such as the effect of in-stream nutrient attenuation on the interannual variability of the nutrient source areas of the Harbor.
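The contrast between a random-walk update, of the kind used in dynamic linear models, and a mean-reverting process can be illustrated with a short simulation. This is a generic sketch under stated assumptions: Gaussian innovations, a first-order autoregressive form for the mean-reverting case, and arbitrary parameter values on an arbitrary scale. It does not reproduce the prior specification used in this study, and all function names are hypothetical.

```python
import random
import statistics

def random_walk(k0, sigma, n_years, rng):
    """k_t = k_(t-1) + v_t, with v_t ~ Normal(0, sigma)."""
    path = [k0]
    for _ in range(n_years - 1):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path

def mean_reverting(k0, long_run_mean, phi, sigma, n_years, rng):
    """k_t = m + phi * (k_(t-1) - m) + v_t, with |phi| < 1."""
    path = [k0]
    for _ in range(n_years - 1):
        pull = long_run_mean + phi * (path[-1] - long_run_mean)
        path.append(pull + rng.gauss(0.0, sigma))
    return path

def spread_at_end(simulate, n_paths=2000):
    """Standard deviation of the final-year value across simulated paths."""
    return statistics.pstdev([simulate()[-1] for _ in range(n_paths)])

rng = random.Random(1)  # fixed seed so the sketch is repeatable
walk_sd = spread_at_end(lambda: random_walk(0.0, 1.0, 20, rng))
ar1_sd = spread_at_end(lambda: mean_reverting(0.0, 0.0, 0.5, 1.0, 20, rng))
print(f"random walk spread: {walk_sd:.2f}, mean-reverting spread: {ar1_sd:.2f}")
```

The random walk places no long-run anchor on the parameter, so its spread grows with the number of years, whereas the mean-reverting process keeps the spread bounded at the cost of assuming a stationary level.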

[46] Our study identified the total annual precipitation as the key predictor variable to accommodate the interannual variability of the total nutrient loading into Hamilton Harbor. In particular, a preliminary exploratory analysis showed that precipitation accounts for a substantial portion of the variability of the log-transformed phosphorus loads of Redhill, Grindstone, and Spencer Creeks (r² = 0.41, p < 0.001). Yet, should the SWALLOW model be applied elsewhere, we recommend that a variety of temporal predictors be examined. In urban areas, predictors related to population or population density could augment the land cover data typically used to infer impervious surface cover. Categorical variables related to local management practices, such as upgrades to storm water management systems or passage of stricter land use controls in agricultural systems, could also be incorporated to model their effects on watershed functioning. Further, if the SWALLOW framework is applied to broader spatial scales, the spatial variability of the temporal predictors may also need to be taken into account, i.e., our W matrix could vary in space as well as in time and each entry would correspond to the value of a predictor at a specific subwatershed or reach for a particular year.

4.2. Role of Watershed Spatial Sampling Protocol on the Model Parameterization

[47] The modeling of phosphorus was resilient to the station omissions of the jackknife experiment. By far the greatest discrepancy occurred when station 4, the only nonurban station draining a predominantly headwater catchment, was omitted. Station 4 also drains the subwatershed with the largest density of wetlands, so it is no surprise that the delivery coefficient and the small-stream attenuation parameter vary the most with respect to the full data set when station 4 is omitted. More surprisingly, the jackknife experiment showed that when information from one station is omitted, the model is able to reasonably reproduce the posterior median loads from the omitted stations. In other words, the model is able to borrow enough strength from the included sites to model the load at the omitted site. It is not clear that this would be the case if more than one station were to be omitted, but this result does bolster the argument that the spatial information we have is able to produce some (albeit uncertain) inference. Of course, the loads at the “missing stations” as estimated by the model when calibrated to only five stations are subject to more uncertainty than when the same loads are estimated over the entire data set. The average value of the posterior standard deviation of the log-transformed loads was 0.11 for the model calibrated to the full data set, while for the loads estimated by the model calibrated to only five stations it was 0.14. Another caveat of the present exercise is that we were not able to condition model parameterization upon areas not explicitly represented in the calibration data set, such as small streams with drainage basins less than 25 km², which include many headwater areas and areas along the shore of the Harbor.

4.3. Plausibility of the Parameterization

[48] We compare the understanding of the functioning of the Hamilton Harbor watershed obtained from our modeling work with results from the SPARROW literature as well as other empirical evidence from the study area. This comparison will allow us to gauge the plausibility of the model parameterization, while enriching our understanding of the functioning of this site of intense management interest. Our model parameterization suggests that agricultural land uses result in higher phosphorus export than urban land uses. This is consistent with some previous SPARROW studies [Moore et al., 2004] and empirical work [Law et al., 2004; Soldat and Petrovic, 2008; Soldat et al., 2009]. However, other SPARROW applications [Alexander et al., 2004; García et al., 2011] and empirical literature [Beaulac and Reckhow, 1982] have found the opposite – urban land exports more phosphorus than agricultural land. Some studies in Southern Ontario tend to agree with the latter assertion [e.g., Winter and Duthie, 2000]. Both agricultural and urban nutrient export fluxes are highly variable and contingent upon a number of regulatory factors, including soil type, urban storm water management, agricultural intensity and conservation practices [Beaulac and Reckhow, 1982]. Our estimates of urban phosphorus export are slightly higher than estimates obtained from the Great Lakes region in the United States [Robertson and Saad, 2011], but comparable to those from the American Southeast [García et al., 2011].

[49] Following empirical work, previous SPARROW applications have generally used soil parameters as delivery variables for phosphorus [Beaulac and Reckhow, 1982]. Wetlands have been shown to attenuate the loadings of phosphorus through processes such as particle settling, denitrification, and biotic uptake [Reddy et al., 1999; Krieger, 2003], but have not been explicitly included in SPARROW models as a delivery factor. While some SPARROW model applications have considered soil properties that would implicitly address wetlands, such as soil organic matter and soil pH [García et al., 2011], these factors would describe delivery from both wetland and upland areas and may not reflect processes unique to wetlands [Reddy et al., 1999; Krieger, 2003]. Working in the Laurentian Great Lakes, Robertson and Saad [2011] included a land use class intended to describe phosphorus from background sources, into which they combined wetlands, forest and scrubland. Our present results suggest that wetlands do not necessarily act as a source but may also act as a sink for phosphorus at the landscape scale. In this regard, one of the lessons learned from our analysis is that SPARROW applications should consider wetland coverage as a candidate delivery variable.

[50] While we did not allow reservoir attenuation processes to vary through time, they were nonetheless an important aspect of the spatial variability of phosphorus delivery to the Harbor. The posterior mean settling velocity (kr) for total phosphorus was about 15.4 m yr⁻¹ with a 95% credible interval of 9.2–23.4 m yr⁻¹. The total phosphorus settling velocity is close to that of 14.3 m yr⁻¹ obtained by Alexander et al. [2004] for the continental United States, but substantially greater than the value of 4.8 m yr⁻¹ obtained by Robertson and Saad [2011] for the United States' Laurentian Great Lakes and Midwest. Notably, empirical research conducted in Cootes Paradise, a coastal wetland draining about half of Hamilton Harbor's basin, corroborates our phosphorus settling velocity results. Prescott and Tsanis [1997] review the net settling velocity estimates for Cootes Paradise and report values ranging from 10 to 16 m yr⁻¹. We used our posterior settling velocities to estimate 95% credible intervals for the retention of phosphorus for Cootes Paradise. Total phosphorus retention ranged from 16% to 36%, with a median of 25%. These values are in agreement with those reported by Krieger [2003] for a coastal wetland in the Lake Erie basin. This implies that Cootes Paradise plays a major part in reducing nutrient loading to Hamilton Harbor and, not surprisingly, the water quality in Cootes Paradise itself is degraded [Prescott and Tsanis, 1997].

[51] The posterior means of small-stream phosphorus attenuation were somewhat lower than those reported in previous SPARROW work in New Zealand [Alexander et al., 2002], but nonetheless reasonably commensurate. Although our separation of stream classes was based on Strahler's [1952] stream order and not the discharge or travel time, our results are consistent with other SPARROW studies in that the values of small stream attenuation (ks1) were larger than those for large stream attenuation (ks2), reflecting the higher attenuation rates of smaller streams [Stream Solute Workshop, 1990; Alexander et al., 2002, Figure 7]. It should again be stressed that our database is deficient in headwater sampling sites, so our estimates of small-stream attenuation and its variation in time are subject to substantial uncertainty. Nonetheless, the SWALLOW II framework is a promising one for accommodating interannual variability into SPARROW models.

[52] Estimated large stream attenuation coefficients proved to be quite variable in time for most statistical formulations. The mechanisms that modulate the variability of nutrient attenuation across stream size are fairly well established in the literature. They generally refer to the tighter coupling of smaller streams with their streambeds, whereby biological and chemical removal processes in the sediments have greater access to the nutrients in the water column [Stream Solute Workshop, 1990; Alexander et al., 2002; Alexander et al., 2004]. The longer hydraulic residence time of smaller streams also allows these processes to operate for longer times. Recent work suggests that stream stage explains the interannual variation of nutrient attenuation at a particular site over time [Basu et al., 2011], implying that the coupling between the streambed and water column changes from year to year. Consistent with Basu et al.’s [2011] findings, we here show that the interannual variability of the average discharge, a function of stream stage, can explain more than half of the variability of stream attenuation estimates from the SPARROW model.

[53] An interesting implication of this study is that for Hamilton Harbor's basin, the interannual variability of the contribution of phosphorus source areas may be strongly affected by the capacity of stream reaches to attenuate nutrient loads (Figure 8). Empirical studies of nutrient uptake in rivers indicate significant variability of nutrient attenuation rates at annual timescales for phosphorus [Doyle et al., 2003] and nitrogen [Claessens et al., 2009]. Donner et al. [2004] found that nutrient attenuation rates varied nearly two-fold between wet and dry years in the Mississippi River, with wet years exhibiting lower attenuation. Basu et al. [2011] also showed an inverse relationship between stream stage and nutrient attenuation that was consistently manifested across spatial and temporal scales. This finding implies that fluctuations in stage (and discharge) may indeed affect the spatial location of significant nutrient source areas at a variety of scales and is not an artifact of the present analysis. While previous research has documented the variability of in-stream attenuation at annual timescales, the SWALLOW framework allows us to estimate how this variability impacts basin-scale nutrient source areas.

[54] In conclusion, SPARROW is a spatially distributed, empirical model that can be used to identify areas of unusually high delivery of nutrient loads to water bodies and prioritize the allocation of scarce management resources accordingly. Yet, nutrient loads, source/sink processes, and source areas are subject to significant interannual variability, and thus a temporally static approach can oversimplify the broad range of dynamics typically experienced in a watershed context. As an alternative to employing complex, process-based models to understand the mechanisms of this variability, our SWALLOW modeling framework offers a parsimonious representation of watershed functioning through time that builds upon the SPARROW foundation. Consistent with empirical and theoretical work, our model parameterization suggests that in-stream attenuation rates varied inversely with streamflow, which also affects the location of nutrient source areas. While we found little support for the use of time-varying export coefficients and stream attenuation coefficients, it is most likely that nutrient export and delivery to streams, like in-stream attenuation processes, vary at annual timescales. The SWALLOW II framework we present in this paper is a promising approach to arrive at a balanced depiction of the interaction of nutrient export, landscape delivery, attenuation, and climate when applied to larger datasets. By quantifying the interannual variability of nutrient delivery to the receiving water body, we believe that the modeling framework proposed can meaningfully assist long-term watershed management planning. The Bayesian nature of our approach allows the estimation of critical nutrient loads that could result in acceptable probabilities of compliance with different water quality criteria, while accounting for the different sources of uncertainty (model structure imperfection, measurement error, model input uncertainty) as well as natural system variability.
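One simple way to read a probability of compliance from a Bayesian analysis is as the fraction of posterior draws that satisfy a criterion. The sketch below assumes a hypothetical set of posterior draws of annual load and hypothetical target values; neither comes from this study, and the function name is illustrative.

```python
import random

def probability_of_compliance(posterior_draws, target):
    """Fraction of posterior draws at or below the target value."""
    draws = list(posterior_draws)
    return sum(1 for d in draws if d <= target) / len(draws)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical posterior draws of annual load (tons per year), lognormal shape assumed
draws = [rng.lognormvariate(2.4, 0.3) for _ in range(5000)]

for target in (10.0, 12.0, 15.0):
    p = probability_of_compliance(draws, target)
    print(f"target {target:.0f} t/yr: P(load <= target) = {p:.2f}")
```

Scanning a range of targets in this way shows how strict a load target can be before the probability of meeting it falls below a level that managers consider acceptable.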

[55] On a final note, we believe that modeling is a worthwhile scientific activity and a sound basis for the policy-making process only if the underlying assumptions are acknowledged and impartially communicated [Zhang and Arhonditsis, 2008]. For example, our jackknife experiment showed that the watershed sampling protocol is deficient in headwater sampling sites. Model development is a dynamic, iterative process similar to the policy practice of adaptive management. The model parameterization/structure can be sequentially refined as new knowledge is obtained from the system, and this gradual model evolution should provide the basis for revised (and improved) management actions.

Acknowledgments

[56] This project has received funding support from the Ontario Ministry of the Environment (Canada Ontario grant Agreement 120,808). Such support does not indicate endorsement by the Ministry of the contents of this material. Christopher Wellen has also received support from the Ontario Graduate Scholarships.
