## Introduction

A long-standing challenge in the study of water quality is estimating the flux of a suspended or dissolved substance in a river, averaged over a period such as a year or a decade. This is commonly known as the load estimation problem. The problem setting is as follows. At a monitoring location on a river there is a set of instantaneous measurements of concentration. These are sampled fairly sparsely in time, for example, 12-36 observations per year over a study period of about a decade. They are accompanied by a complete record of daily mean discharge for the full study period at a location at, or very near, the sample collection site.

A common method for estimating average flux for monthly, annual, or multiyear periods is to use multiple regression to estimate daily flux from daily discharge, time, season, and in some cases other variables derived from these. These daily flux estimates are then summed over the period of interest and divided by its length to estimate the average flux. These types of models are generally referred to as “rating curve” or “regression-based” approaches. There is an extensive literature about these models that includes the following: Dolan *et al*. (1981), Ferguson (1986, 1987), Cohn *et al*. (1989, 1992), Preston *et al*. (1989), Crawford (1991), Robertson and Roerish (1999), Runkel *et al*. (2004), Cohn (2005), Crowder *et al*. (2007), Hirsch *et al*. (2010), Stenback *et al*. (2011), Verma *et al*. (2012), and Richards *et al*. (2012). This study examines three examples of such regression-based estimates: the seven-parameter LOADEST model (L7), the five-parameter LOADEST model (L5), and Weighted Regressions on Time, Discharge, and Season (WRTDS). The LOADEST models are perhaps the most commonly used approaches, and they have been used in many U.S. Geological Survey (USGS) applications over the past two decades, including the estimation of mean flux values used in Spatially Referenced Regressions on Watershed Attributes (see Smith *et al*., 1997; Preston *et al*., 2011). The WRTDS method was introduced more recently (Hirsch *et al*., 2010) in response to some of the limitations of LOADEST, and it has been applied in studies of the Mississippi River, Lake Champlain, and Chesapeake Bay watersheds (see Sprague *et al*., 2011; Medalie *et al*., 2012; Zhang *et al*., 2013). Other variations on the LOADEST model also address some of its weaknesses, including a variety of models with additional explanatory variables (e.g., Hirsch, 1988; Vecchia *et al*., 2009; Garrett, 2012).
All of these use explanatory variables beyond the three types used in LOADEST and WRTDS (time, discharge, and season), but they are not considered in this study. Reliable parameter estimation may be difficult for such models when sample sizes are small; however, they may offer potential solutions to the bias problem. The diagnostic approaches proposed in this study for the L5, L7, and WRTDS models would also apply to these more complex models.
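The generic regression-based workflow (fit a regression to the sampled days, estimate concentration on every day, convert to flux, then average) can be sketched as follows. This is a minimal illustration, not the LOADEST or WRTDS code. It assumes a simple log-linear model with an intercept, ln(Q), annual sine and cosine terms, and a linear time trend, fit by ordinary least squares. The function and type names are hypothetical. The retransformation from log space uses Duan's smearing estimator as a stand-in for the bias-correction methods those programs actually use. Units are assumed to be mg/L for concentration and m³/s for discharge, so that c × Q × 86.4 gives flux in kg/day.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record formats (assumptions, not from the study):
#   sample: (t, q, c) with t = decimal time in years, q = discharge (m^3/s), c = concentration (mg/L)
#   day:    (t, q)    one entry per day of the study period
Sample = Tuple[float, float, float]
Day = Tuple[float, float]


def design_row(q: float, t: float) -> List[float]:
    """Assumed explanatory variables: intercept, ln(Q), annual sine and cosine, linear trend."""
    return [1.0, math.log(q), math.sin(2 * math.pi * t), math.cos(2 * math.pi * t), t]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_log_concentration(samples: Sequence[Sample]) -> Tuple[List[float], List[float]]:
    """Ordinary least squares of ln(c) on the design row, via the normal equations."""
    X = [design_row(q, t) for t, q, _ in samples]
    y = [math.log(c) for _, _, c in samples]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    return beta, resid


def mean_daily_flux(samples: Sequence[Sample], days: Sequence[Day]) -> float:
    """Estimate concentration on every day, convert to flux, and average over the period."""
    beta, resid = fit_log_concentration(samples)
    # Duan smearing factor: offsets the low bias of exp(predicted ln c).
    smear = sum(math.exp(e) for e in resid) / len(resid)
    total = 0.0
    for t, q in days:
        c_hat = math.exp(sum(b * x for b, x in zip(beta, design_row(q, t)))) * smear
        total += c_hat * q * 86.4  # mg/L * m^3/s * 86.4 = kg/day
    return total / len(days)
```

With monthly samples over a decade the fit uses about 120 observations. The daily loop is the key step: the sparse samples only calibrate the regression, while the long-term mean comes from the complete discharge record.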

All three of these models share the same motivation: use some version of multiple regression to estimate the concentration (and hence flux) on unsampled days. In each case, the regression assumes that the log of concentration is the sum of four components: discharge, season, long-term trend, and random unexplained variation. These models should not be thought of as simple causative models. Rather, they are used because in many situations they can provide efficient and unbiased estimates of concentrations on unsampled days. If this formulation based on discharge, season, and time does not remove substantial amounts of variance from the data, then other methods, such as interpolation or ratio estimators (see Richards and Holloway (1987) for a description of the Beale Ratio Estimator), may be better estimators. Purely deterministic models are also available for estimating mean flux, but they are not considered here. The datasets considered in this study all have characteristics suggesting that this type of multiple regression procedure is potentially useful for estimating fluxes on all days, and hence for estimating long-term mean fluxes.
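Written generically, the four-component structure can be expressed as in the first line below. Here $c$ is concentration, $Q$ is daily mean discharge, $T$ is time in decimal years, $f_Q$, $f_S$, and $f_T$ are the discharge, seasonal, and trend components, and $\varepsilon$ is the random unexplained variation. The second line is one simple parametric instance, included only as an illustration under that assumption. It is not the exact specification of L5, L7, or WRTDS, whose terms differ; WRTDS in particular estimates its coefficients by weighted regression rather than a single global fit.

$$
\begin{aligned}
\ln c &= f_Q(\ln Q) + f_S(T) + f_T(T) + \varepsilon \\
\ln c &= \beta_0 + \beta_1 \ln Q + \beta_2 \sin(2\pi T) + \beta_3 \cos(2\pi T) + \beta_4 T + \varepsilon
\end{aligned}
$$

In the illustrative form, $\beta_1 \ln Q$ plays the role of the discharge component, the sine and cosine terms the seasonal component, and $\beta_4 T$ the long-term trend.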

Several recent articles have explored the bias of these types of flux estimates, including Stenback *et al*. (2011), Garrett (2012), Moyer *et al*. (2012), and Richards *et al*. (2012). A 2013 update to the USGS LOADEST code also discusses this issue and provides diagnostics for the bias problems (at http://water.usgs.gov/software/loadest/doc/loadest_update.pdf). An important point made in all of these studies is that, in some cases, applying certain of these regression-based approaches can produce long-term average flux estimates that are biased by many tens of percent, with either positive or negative sign. These studies also show that in many cases one or more of these models provide estimates that are virtually unbiased. The studies that identify the potential for severe bias discuss its possible causes only briefly. Within these discussions, heteroscedastic residuals are mentioned in the first three of the four articles listed above, lack of fit in the last two, and failure to properly capture the seasonal pattern only in the first.

The goal of this study was to advance the understanding of the problem of large biases in these three regression-based flux estimates. This includes discussion of both the statistical issues and some of the hydrologic processes that may give rise to these problems; the study also suggests some methods that practitioners can use to help identify cases in which these problems may exist. This is accomplished through the analysis of a limited collection of datasets that are sufficiently rich in samples that the true fluxes for a decade-long period can be approximated directly from the data. Small samples are then selected from these datasets to evaluate the bias in the estimates that would have been computed by using each of these regression-based models. The analysis shows that these bias problems arise from severe violations of the assumptions on which the L5 and L7 models are based. It also shows that these problems may be reduced or eliminated under the much less restrictive assumptions of the WRTDS model. The study describes some tools for diagnosing the problem and suggests approaches to reducing the risk of producing highly biased results.
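The evaluation design (treat a sample-rich record as the reference, draw small subsamples, and compare each subsample's flux estimate with the reference) can be sketched as below. This is a minimal illustration under stated assumptions, not the study's procedure. The mean of the rich record stands in for the "true" long-term mean flux. Subsamples are assumed to be systematic (every `interval`-th observation at each possible offset), a choice made here for reproducibility since the selection scheme is not described in this section. `estimator` is a hypothetical callable that would wrap one of the regression models.

```python
from statistics import mean
from typing import Callable, List, Sequence


def percent_bias(estimate: float, true_value: float) -> float:
    """Bias of an estimate relative to the reference value, in percent."""
    return 100.0 * (estimate - true_value) / true_value


def subsample_biases(reference_flux: Sequence[float], interval: int,
                     estimator: Callable[[List[int]], float]) -> List[float]:
    """Percent bias of `estimator` for each systematic subsample of the rich record.

    The mean of `reference_flux` is treated as the 'true' long-term mean flux.
    Each subsample keeps every `interval`-th observation starting at a different
    offset; `estimator` receives the retained indices and returns a mean-flux estimate.
    """
    true_mean = mean(reference_flux)
    biases = []
    for offset in range(interval):
        kept = list(range(offset, len(reference_flux), interval))
        biases.append(percent_bias(estimator(kept), true_mean))
    return biases
```

In practice the estimator would fit a regression model to the retained samples only, together with the complete discharge record. Plugging in the plain sample mean of the retained fluxes is a useful sanity check: averaged over all offsets, its percent bias should be near zero when the record length is a multiple of `interval`.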

This study considers only datasets that have 120 or more observations, are representative of the full distribution of discharges for the site, and contain no censored values. All three models include appropriate computational schemes for analyzing censored data, but to limit the complexity of the analysis, censored cases were not considered here.