Evaluation of streamflow as a covariate in models for predicting daily pesticide concentrations

Several models have been developed with streamflow as a covariate for predicting daily pesticide concentrations in surface water systems. Among these models, the SEAWAVE‐QEX model has been proposed by the United States Environmental Protection Agency for regulatory assessments. In this paper, the model was modified to include alternative transformations of streamflow data, and to include no streamflow covariates. The predictive performance of the modified models was evaluated and compared with the original SEAWAVE‐QEX model using a high frequency sampling dataset that includes 9 sites with 10 years of data from the Atrazine Ecological Monitoring Program. Streamflow transformations evaluated included those in the SEAWAVE‐QEX model (short‐term and mid‐term flow anomalies), reduced models with only short‐term flow anomaly or without any flow covariates, normalized Box‐Cox transformation of flow, and combinations of normalized Box‐Cox and flow anomalies. Loglinear interpolation was also evaluated. The normalized Box‐Cox transformation provided best predictive performance and significantly better predictive performance than that of the SEAWAVE‐QEX model for a target quantity of regulatory interest, such as the maximum 1‐day rolling average (similarly for the maximum 60‐day rolling average, but not significantly so). The no‐flow covariate model was only slightly worse than Box‐Cox. Significant differences in predictive performance of the SEAWAVE‐QEX model were detected across sites.

Two general approaches to imputing the data are interpolation and model-based methods.Interpolation methods (e.g., linear interpolation) are simple to use but have the drawback of being downwardly biased for the estimation of peak concentrations due to the frequent failure to sample peak concentrations.Statistical model-based methods are more complicated but can reduce the bias through use of daily measured covariates.They also can provide confidence intervals for target quantities of interest.
Model-based methods include regression models that assume uncorrelated errors (e.g., Hirsch et al., 2010;Vecchia et al., 2008;Zhang et al., 2019), or regression models that assume serially correlated errors (hereafter referred to as "time-series models") (e.g., Lee et al., 2019;Mosquin et al., 2016Mosquin et al., , 2018;;Shumway, 2001;Vecchia, 2018;Zhang & Hirsch, 2019).The regression models that assume uncorrelated errors focus on the mean function and should only be used if there is little or no serial correlation, or if the data spacing stretches beyond the range of serial correlation.The time-series models, in allowing for serial correlation, are more suitable for the estimation of target quantities that are functions of consecutive day readings (e.g., maximum 21-day rolling average), if the data exhibit serial correlation (which is usually the case for chemical concentration data).In this paper, modifications of the SEAWAVE-QEX time-series model (Vecchia, 2018) are considered for atrazine, an herbicide where the maximum m-day rolling averages have been evaluated as a target quantities of interest (USEPA, 2019).
The most common daily measured covariate used by model-based methods is co-located streamflow (or flow).Although the relationship is sometimes mixed, flow generally positively correlates with pesticide concentration since storm events can induce runoff from treated areas where pesticide residue may exist, often increasing flow and concentration simultaneously (although dilution may also occur when pesticide residue available for runoff is low).
This paper presents a study of the contribution of flow to the predictive performance of the SEAWAVE-QEX model, using different transformation options of the streamflow data as investigated by Mosquin et al. (2016Mosquin et al. ( , 2018)).The SEAWAVE-QEX model was developed by the United States Geological Survey (USGS) for prediction of daily concentration values and has recently been proposed by the United States Environmental Protection Agency for Tier III and Tier IV regulatory use (USEPA, 2019).The model is an extension of the earlier SEAWAVE-Q regression model, which has covariates for linear trend, seasonality (e.g., pesticide usage season), and two transformations of flow data representing the short-term and mid-term effects of flow.
The two flow covariates used in the SEAWAVE-QEX model are the short-term flow anomaly (STFA) and the mid-term flow anomaly (MTFA).
These are, respectively, a deviation of flow on Day t from the average of that of the previous 30 days (including Day t) and a deviation of a 30-day average of flow (up to Day t) from the overall average of the flow over the duration of the entire time period examined for the site in question.
Other transformations of flow can be considered; Mosquin et al. (2018) used a type of Box-Cox transformation (Box & Cox, 1964), motivated by the observation that log-transformed flow tends to retain a high degree of skewness.The Box-Cox transformation considerably reduces the skewness of flow, and when used as a covariate in a linear prediction model, Box-Cox-transformed flow was found to provide superior predictive performance to either the STFA covariate or a no-flow model (i.e., the SEAWAVE-QEX model with the STFA and MTFA terms omitted).This work examines the role of flow in the predictive performance of the SEAWAVE-QEX model using the Atrazine Ecological Monitoring Program (AEMP) as an example dataset and using a variety of covariate models based on either flow anomalies or Box-Cox transformation (Mosquin et al., 2018).Also considered for comparison is an interpolation approach (loglinear interpolation; see Mosquin et al., 2016) and a no-flow covariate model.Evaluations are made using a very high frequency sampling dataset (i.e., nearly daily in usage season) of 9 sites each with 10 years of data (2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019) from the targeted AEMP, allowing calculation of actual (true) values for comparison.Sampling designs are systematic 7-and 14-day sampling and target quantities are the maximum 1-and 60-day rolling averages.Point and interval estimation performance of each model is evaluated.

| Data
Atrazine data used in these analyses were sampled from nine sites of the AEMP each having 10 years of data over the 10-year period of 2010- The first two letters of each site identification represent the state abbreviation (e.g., IA for Iowa), followed by numbers and sometimes letters.
The usage season of atrazine products varies by state due to different crop growing seasons: March 1 to July 31 for Louisiana, January 16 to May 31 for Texas, and April 1 to July 31 for the other three states.
Sites were chosen for their high frequency of sampling during the usage season, having atrazine sampled on a daily (or almost daily) basis during that time.Of the 32,868 days (i.e., 9 sites × 10 years) in the dataset, atrazine samples were available for 9510 days (29%), almost exclusively concentrated in the usage season (8880 days sampled within usage season, giving 80% sampling frequency within usage season).This is illustrated by Figure 1, which provides a plot of the sampled days at each site-year.The horizontal axis in Figure 1 represents the 365 (or 366) days in a year, the vertical axis represents each of the 90 site-years, starting with IA-04 in 2010 at the bottom and ending with TX-01 in 2019 at the top (space limitations forced only every second site-year to be labeled, although all 90 are represented).Each sampled day for each site-year is represented by a black dot, and the beginning and ending of the usage season for each site-year is represented by a red line.
For example, Figure 1 indicates that IA-04 in 2010 was sampled almost every day during the usage season (indicated by red lines) but almost not at all outside the usage season.
Due to the usage season sampling, most samples were detected above the program limit of detection (LOD) of 0.05 μg/L for atrazine, with only 223 (2.3%) readings being below the LOD.Readings below the LOD were flagged as such for subsequent modeling analyses.Non-detects tended to be distributed at low count among the site-years; the site-year with the greatest number of values below the LOD was IA-052013 with 27 measurements below the LOD, which was 28% of the total sampled measurements for that site-year.See Table S1 for counts of values below detection limits by site-year.
Average daily flow readings were obtained that were co-located with the atrazine sampling sites.With rare exceptions (61 days) each atrazine reading had a paired flow measurement.Of the 61 days with atrazine readings and no flow, 33 occurred at NE-042013: all others were distributed at low frequency across the site-years.No days had a flow measurement but no atrazine measurement.
F I G U R E 1 Distribution of sampled days in dataset.Each sampled day for each site-year is represented by a black dot, and the beginning and ending of the usage season for each site-year is represented by a red line.The horizontal axis represents the 365 (or 366) days in a year, the vertical axis represents each of the 90 site-years, starting with IA-04 in 2010 at the bottom and ending with TX-01 in 2019 at the top (space limitations forced only every second site-year to be labeled, although all 90 are represented).
Study sites are located at small and highly seasonal headwater systems vulnerable to hydrological runoff events, occasionally having days of zero flow (due to stream dry out).A total of 1289 of the sampled flow readings were of flow = 0, approximately 14% of the total sampled days.Spatially, these flow = 0 readings were most common at the Texas site (463/1289 or 36%) and temporally in 2019 (339/1289 or 26%).
Counts of zero-flow days by site-year are provided in Table S2.In order for log-transformations of flow to be defined, zero flow values were set the minimum flow value allowable in the data of 0.001 m 3 /s.
Atrazine readings on non-sampled days were imputed using loglinear interpolation (i.e., linear interpolation after logarithmic transformation) after first setting measurements on January 1 and December 31 equal to one half of the LOD = 0.05 μg/L (because no measurements ever occurred on those days outside of the usage season).Interpolation to one half LOD at the year beginning and end is justified by the strong seasonality of the atrazine yearly concentration profile (i.e., concentrations were mostly not detectable outside of the usage season).Flow readings on non-sampled days were imputed similarly by loglinear interpolation except measurements on January 1 and December 31 were set equal to the first and last observed flow values within that year.

| The SEAWAVE-QEX model
The SEAWAVE-Q model (Vecchia et al., 2008) is a linear regression model developed for the analysis of pesticide trends over several years.For the purpose of modeling long-term trends, it was considered reasonable to assume that the error term in the model consisted of independent normal random variables with constant variance.
The SEAWAVE-QEX model (Vecchia, 2018) is an "extended" version of the SEAWAVE-Q model.It includes the same fixed terms as those of the SEAWAVE-Q model but has the error term modified to allow seasonal heteroscedasticity and serial correlation among errors from observations occurring several days apart.The change in the error term was considered necessary for the changed purpose of the SEAWAVE-QEX model, which is to simulate daily concentrations (i.e., "fill in" simulated concentration data for unsampled days).A requirement of both the Seawave-Q and SEAWAVE-QEX models is that each study site should include at least 3 years (not necessarily consecutive) of data.Note that if more than one site is being modeled, separate models are fitted for each site.
The SEAWAVE-QEX model is expressed as where • logC(t): Base-10 log concentration (typically, in μg/L) at time t (expressed in decimal years).
Model fixed terms are as follows: • W(t): Seasonal wave allowing for one or two usage seasons per year.The seasonal wave is expressed with a single application season, or with two distinct application seasons: where WaveCL1 is wave class 1, SW1 is a seasonal wave from class 1, m denotes the pulse input model, h is the decay rate, s is the phase shift, WaveCL2 is wave class 2, and SW2 is a seasonal wave from class 2 (see Figure S1 for an illustration, and Vecchia, 2018, for further details).
• STFA(t): STFA, which is a deviation of daily streamflow (in the log scale) from the past 30-day mean.Defining Q(t) as the measured flow on day t, the STFA for day t is: • MTFA(t): MTFA, which is a deviation of the past 30-day mean streamflow (in the log scale) from the overall mean: where M is the average sampled log flow for the entire time period analyzed.
• t − t m : Mean-corrected linear trend over the entire time period analyzed (i.e., t m is the midpoint of the time period being analyzed).
The error term has two components: • SSD(t): Seasonal heteroscedasticity (non-constant variance) as follows: where  > 0 and 0 ≤  < 2 are seasonal standard deviation parameters.
• Z(t): Normalized model error, assumed to have a normal distribution with mean zero, variance one, and serial correlation function: where t is the integer day associated with decimal time t, k is the time lag in days, and CTS > 0 is the correlation time scale in days.This function implicitly assumes a nugget of zero, as identified by Mosquin et al. (2016).
The SEAWAVE-QEX model is implemented using R code (R Core Team, 2022) modified from USGS code (swaveqexFunctions_V2.R at https:// pubs.usgs.gov/sir/2017/5159/downloads/).Code was modified to remove flow truncation at 1 ft 3 /s (line 88 of swaveqexFunctions_V2.R).The study dataset has many flow values below that threshold (n = 1289 (14%) of zero flow, n = 4511 (48%) between 0 and 1 ft 3 /s), and so to preserve them, the truncation was removed.An explanation for the 1 ft 3 /s truncation in the code or documentation was not provided by the model developers.To examine the effect of the truncation, the model was run both with and without the truncation, if without then zero-flow values were set to the minimum flow observed in the dataset, 0.001 m 3 /s (0.035 ft 3 /s).

| SEAWAVE-QEX conditional simulations
Once the regression model (1) has been fit at a site, then the fitted model is used to generate K conditional simulations (or traces) of daily concentrations across all site-years of the site (in this study, K = 250).In each trace: for sampled days, the simulated values are set equal to the observed values; for unsampled days, the simulated values are randomly generated from the model (i.e., they possess the same statistical properties of the fitted model) conditional on the observed values (hence the term conditional simulation).Thus, each conditional simulation is anchored to the observed data.An illustration of a conditional simulation (first of 250 generated) of site IA-04 is provided in Figure S2.
The next step is to back-transform all K traces to the original scale (μg/L) and, for each trace, estimate the desired target quantity (e.g., maximum 60-day rolling average) for each site-year within the site.For example, if a site has three site-years, then three annual estimates of the target quantity will be obtained from each trace.The final point estimate for each site-year is then obtained from the mean of the corresponding K trace estimates.Vecchia (2018) suggested that corresponding 80% confidence interval estimates can be obtained from the 10th and 90th percentiles of the distribution of K trace estimates.Vecchia (2018) also suggests that if higher-level confidence intervals are desired, then the number of traces should be set so that C ≥ 20 ∕ (1 − p ∕ 100), where p is the desired percentage level; for example, 95% confidence intervals would require C ≥ 400.

| Skewness
Skewness is an important consideration in the choice of suitable covariates for a model.As noted in Mosquin et al. (2016), excess skewness of flow covariates can lead to extreme overestimation of right-tail target quantities after back-transformation of log-scale predictions to the original scale. (4) The skewness of a distribution is a measure of its asymmetry.A value of zero indicates perfect symmetry, a positive value indicates a distribution skewed to the right, and a negative value indicates a distribution skewed to the left.Common rules of thumb for interpreting skewness appear to be as follows: if the skewness lies between −0.5 and 0.5, then the distribution is assumed to be fairly symmetric; if the skewness lies between −1 and −0.5 or between 0.5 and 1, then the distribution is assumed to be moderately skewed; and if the skewness is less than −1 or greater than 1, then the distribution is assumed to be highly skewed.
Skewness of atrazine and various flow transformations were calculated for the purpose of deriving alternative flow covariates.

| Alternative flow covariates
Motivated by the results of Mosquin et al. (2018) alternative flow covariates were also considered.The first of these is a normalizing transformation based on the method of Box and Cox (1964).The normalized Box-Cox flow at day t is defined as where ̂ is the Box-Cox parameter as estimated using the Box-Cox method for non-zero measured flow values for that site-year, and M and S are the site-year mean and standard deviation, respectively, of either Q(t) ̂ − 1 ̂ or logQ(t) depending on the value of ̂ .
We also considered transformations that combine the Box-Cox approach with the idea of deviation from historical averages as implemented by the SEAWAVE-QEX flow anomaly covariates.Two types of transformation were considered: 1.A short-term Box-Cox flow transformation: 2. A normalized Box-Cox STFA, defined as in Equation ( 6) but replacing Q(t) with antilog STFA(t).
Durations of  = 2, 5, 10, 20, and 30 were used in this study, with  = 30 matching the duration of the STFA.Calculation of the above flow covariates for the study dataset showed extremely high values for the normalized Box-Cox flow in site-year TX-012011 (Box-Cox flow values are plotted in Figure S3).Flow values at the site-year were found to be highly unusual: of 102 sampled values, 93 were less than or equal to 0.001 m 3 /s (this represents stream dry-out) and the three largest were 0.036, 0.063, and 0.369 m 3 /s.This siteyear was dropped from the analysis.Other years at that site were retained, not showing unusual Box-Cox flow.7)

| Models, target quantities, and sampling designs
Two target quantities were used in this evaluation: (1) maximum 1-day rolling average (max1RA) and ( 2) maximum 60-day rolling average (max60RA).The max1RA target quantity is the same as the annual maximum concentration within a site-year and hence is the most extreme target quantity; max60RA lies somewhere between the annual maximum (i.e., max1RA) and the annual mean (which could be expressed as max365RA, or max366RA for leap years).
Two sampling designs were used in this evaluation: (1) 7-day and (2) 14-day systematic sampling.For example, within each site-year a 7-day sampling design generated 7 distinct samples corresponding to 7 different starting days (i.e., a starting day of 1 generated sampled days 1, 8, 15, etc.; a starting day of 2 generated sampled days 2, 9, 16, etc.; and so on); similarly, a 14-day sampling design generated 14 distinct samples.
These designs were implemented using common usage season start days across multiple years at the site, resulting in seven and fourteen different samples at the site for the two designs.Within a particular site-year, there are only seven or 14 possible samples and so this approach provides all possible within site-year samples at equal frequency.The actual 7-or 14-day samples were collected only within the defined usage season of each site-year.This was done to limit the number of imputed values (which would occur outside of the usage season) in the samples, allowing model fits to almost exclusively be based on sampled data.Furthermore, concentration levels in the "trough" of the non-usage season would have a negligible impact on the estimation of peak concentrations.

| Performance measures
Five performance measures were evaluated for each of the two sample designs and target quantities.These included (1) bias, (2) relative bias, (3) standard deviation (or root variance), (4) root mean squared prediction error (RMSPE), and (5) relative RMSPE (RRMSPE).Actual coverage of confidence intervals was also evaluated for the SEAWAVE-QEX model and modified versions thereof.Because loglinear interpolation is not a model-based method, no model-based confidence intervals exist for it.
For each of the 9 sites, the size of the population (of days) covering all 10 years is 3652 (i.e., two leap years are included).Thus, for each site i, this population can be expressed as where N = 3652 (this exposition ignores the removal of TX-012011 for clarity).
For a 7-day systematic sampling design, U i can be subdivided into 7 exhaustive and mutually exclusive samples simply by constructing each sample by the selection of a different start day (from among the first 7 days), and thereafter systematically selecting every 7th day.The 7 samples can be expressed as s 1 = {1, 8, 15, … , N − 4}, …, s 7 = {7, 14, 21, … , N − 5}.Samples for a 14-day systematic sampling design can be similarly constructed, except that in this case 14 exhaustive and mutually exclusive samples will be constructed by the selection of a different start day from among the first 14 days.Note that different samples are defined by different common usage season start days applied across multiple years at each site.
For model-based methods, a trace estimator of ij from a particular sample can be expressed as The final estimator of ij from a particular sample is obtained from the mean of the trace estimators The loglinear estimator of ij from a particular sample can also be expressed as ̂ ijs k (i.e., it is based on a single "trace").
The mean of the estimators of ij across all samples is where K = 7 or 14 for 7-or 14-day sampling, respectively.
For each site-year, the Bias was calculated as the difference between the sample mean of the estimators and the true value: Relative bias was calculated as the bias divided by the true value: The root variance (or standard deviation) is the square root of the average of the squared deviations of the sample estimators from the mean of the sample estimators: where K = 7 or 14 for 7-or 14-day sampling, respectively.
The RMSPE is the square root of the mean squared prediction error (MSPE), which is the average of the squared deviations of the sample estimators from the true value: where K = 7 or 14 for 7-or 14-day sampling, respectively.
The RMSPE is a measure of total prediction error, and it plays an important role in the understanding of estimator performance due to the algebraic identity: which shows that the MSPE of an estimator can be decomposed into the two contributing components of bias and variance.
The RRMSPE was calculated as the RMSPE divided by the true value: The relative performance measures (i.e., relative bias and RRMSPE) standardize site-year quantities by division by true site-year values.
This adjustment is motivated by the fact that a few site-years have large true values, and without the adjustment those site-years tend to have a dominating effect on the (unadjusted) performance measures.
Vecchia (2018) suggests the following method for calculating 80% nominal site-year confidence intervals: generate 100 conditional simulations (or traces) from the model, estimate the desired target quantity from each of those traces, and then use the 10th and 90th percentiles of the distribution of 100 estimated values as the 80% confidence interval for the site-year.Nominal coverage was compared to the actual coverage, which is defined as the number of intervals (out of 7 or 14) containing the true value of the target quantity expressed as a percentage of the total number of intervals.For example, in a 7-day sampling design, if for 6 of the 7 sample realizations the true target quantity lies within the corresponding nominal 80% confidence interval, then the actual coverage for that site-year is 6/7 = 86%.
The Kruskal-Wallis (i.e., nonparametric one-way ANOVA) test was used to assess between-site differences of selected performance measures among selected models for each sampling design and target quantity.The Kruskal-Wallis test was used instead of the parametric ANOVA test because of its robustness to violations of normal-theory and homoscedasticity assumptions.

| Skewness and flow transformations
The skewness of study flow covariates was calculated using values on sampled flow days (i.e., no imputed values were included).The results are given in Table 1, which also shows the absence of skewness in the dependent variable log(Atrazine) with a value of 0.1, and the moderate skewness in log(flow) with a value of 0.5 (if the flow variable is truncated, then skewness increases to 1.0). (13) Short-term variability in flow within the SEAWAVE-QEX model is provided by the STFA covariate, which Table 1 indicates has a skewness value of 1.3 (i.e., highly skewed to the right).In comparison, the Box-Cox transformed flow covariate, FBXCX, has a skewness value of 0.3 (i.e., relatively symmetric).
The short-term Box-Cox flow variables, STFBXCX(m), showed appreciable skewness in Table 1, decreasing as the duration m of the anomaly increased, ranging from 1.7 (highly right-skewed) for m = 2 to 0.6 (moderately right-skewed) for m = 30.A similar trend was observed for the Box-Cox flow anomalies, FBXCXSTFA(m), although for these variables the skewness was almost completely removed at larger values of m, with values ranging from −0.3 for m = 2 to 0.1 for m = 30.The moderate to high degree of skewness found in the STFBXCX(m) variables suggests that they will perform poorly on predictive measures.This was in fact found to be the case and these models are not examined further in this paper.
Figure 2 presents a distributional representation (i.e., mean, median, 5th, 25th, 75th, and 95th percentiles) of ̂ ijs k ∕ ij (i.e., Ratio of site-year sample estimate and true value), where a value of 1 indicates a perfect estimate, a value less than 1 represents an underestimate, and a value greater than 1 represents an overestimation.The distributional representations are presented for all models (except the STFBXCX(m) models), and both target quantities and sample designs.Loglinear interpolation consistently underestimates, leading to negative bias (discussed below).
All model-based methods tend to overestimate when the target quantity is the max1RA, except FBXCX, which has a distribution centered very close to 1.These characteristics are more pronounced for 14-day sampling.When the target quantity is the max60RA, then all model-based methods have narrower distributions centered close to 1, but FBXCX appears to have the narrowest distribution.Also note that the no-flow model appears to have distributions suggesting better predictive performance than the corresponding distributions of the SEAWAVE-QEX model.

| Site-year level performance
Performance of methods at the site-year level is examined closely for three of the models: Loglinear interpolation, SEAWAVE-QEX, and FBXCX.Loglinear interpolation and SEAWAVE-QEX are considered baseline comparable models, as is FBXCX to a lesser degree.Evaluations with other models give results similar to those provided here.
Maximum 1-day rolling average biases at individual site-years for three models (loglinear interpolation, SEAWAVE-QEX, and FBXCX) and the two sampling designs are plotted in Figure 3a.Site-year biases vary considerably and show possible site-level variation by absolute bias, for example NE-04 being least variable, IA-05 more variable.For all models, biases at 7-and 14-day sampling appear positively correlated, with 14-day sampling biases being more pronounced than 7-day sampling for the log-linear model, but not necessarily for the other two models.
Across models, loglinear interpolation is, as expected, consistently negatively biased, while for SEAWAVE-QEX, the biases show no obvious downward or upward tendency for 7-day sampling, perhaps a slight upward tendency for 14-day sampling.The Box-Cox approach has no clear upward or downward bias.
Some site-years had considerable bias, in particular IA-052014 for both the SEAWAVE-QEX (407 μg/L (7-day) and 351 μg/L (14-day)) and the FBXCX method (216 μg/L (7-day) and 168 μg/L (14-day)).IA-052014 had the largest observed instantaneous atrazine concentrations in the dataset and the largest target quantity values (344 and 90 μg/L for the max1RA and max60RA, respectively).Its concentration profile is characterized by an abrupt increase to the yearly maximum followed by a more gradual tapering (see Figure S4).
Plots of site-year bias for the max60RA (Figure 3b) show absolute bias to be much reduced.Neither SEAWAVE-QEX nor FBXCX showed a clear tendency to positive or negative bias, although loglinear remained negatively biased.Again, IA-052014 tended to show large absolute bias, with large values now being negative in magnitude: SEAWAVE-QEX (3 μg/L (7-day) and −22 μg/L (14-day)) and FBXCX method (−4 μg/L (7-day) and −26 μg/L (14-day)).The scatterplots in Figure 4a The Kruskal-Wallis test was used to assess between-site differences in bias among loglinear interpolation, SEAWAVE-QEX, and FBXCX for each sampling design and target quantity.For loglinear interpolation, differences in bias across all sites were statistically significant (i.e., p < 0.05) for both sampling designs and target quantities; for SEAWAVE-QEX, differences in bias across all sites were statistically significant for both target quantities under 14-day sampling; and for FBXCX, differences in bias across all sites were statistically significant in all cases except for the max1RA under 7-day sampling (where p = 0.0875).
The scatterplots in Figure 4b,e indicate a high degree of correlation in the standard deviation of the SEAWAVE-QEX and Box-Cox methods at the site-year level.Most points lie close to the diagonal line (and slightly above for max1RA), but there are several points that lie well to the left of the diagonal line, indicating in those cases that the standard deviation of SEAWAVE-QEX is much greater than that of FBXCX for the site-years represented by those points.
The Kruskal-Wallis test was used to assess between-site differences in standard deviation among loglinear interpolation, SEAWAVE-QEX, and FBXCX for each sampling design and target quantity.For all three models, differences in standard deviation across all sites were statistically significant for both sampling designs and target quantities.
The scatterplots in Figure S5 indicate a high degree of correlation between bias and standard deviation for loglinear interpolation, but only moderate correlation for SEAWAVE-QEX or FBXCX.The high correlation observed for loglinear interpolation with respect to the max1RA may possibly be explained by the fact that exactly one realization of the 7-or 14-day sample design within each site-year will include the true annual maximum value.If other realizations include estimated maximum values much lower than the true value, then this will simultaneously elevate the bias and standard deviation.The scatterplots in Figure 4c,f indicate a high degree of correlation in the RMSPE of the SEAWAVE-QEX and Box-Cox methods at the site-year level.Most points lie close to but above the diagonal line, but there are several points that lie well to the left of the diagonal line, indicating in those cases that the RMSPE of SEAWAVE-QEX is much greater than that of FBXCX for the site-years represented by those points.
The Kruskal-Wallis test was used to assess between-site differences in RMSPE among loglinear interpolation, SEAWAVE-QEX, and FBXCX for each sampling design and target quantity.For all three models, differences in RMSPE across all sites were statistically significant for both sampling designs and target quantities.

| Overall predictive performance
Overall median predictive performance by model, target quantity, and sample design is provided in Tables 2-6.Median value of the performance measures assessed over all site-years are provided in the tables.Mean values of the performance measures were also assessed and were larger than median values suggesting right skew of site-year performance measures.Mean results were otherwise largely consistent with median results and are provided in Supporting Information (Tables S3-S8).The results are now described for each performance measure separately.

| Bias
Table 2 indicates that if the target quantity is max60RA, then the median bias (in absolute terms) is consistently small (<1 μg/L) across all prediction methods, except loglinear interpolation, and both sampling designs.The median bias is negative and slightly larger (in absolute terms) for loglinear and is slightly larger for 14-day sampling.
Table 2 also indicates that if the target quantity is max1RA, then the median bias is larger (in absolute terms) and more variable across prediction methods when compared with max60RA, and there is a further consistent increase for 14-day sampling compared with 7-day sampling.Loglinear has the largest negative median bias (in absolute terms): −18.0 and −24.1 μg/L for 7-and 14-day sampling, respectively.For 7-day sampling, FBXCXSTFA(m) for m = 5, 10, 20, 30, FBXCX, and no-flow models all indicate a median (absolute) bias <0.5 μg/L; and for 14-day sampling, FBXCXSTFA(m) for m = 10, 20, 30, FBXCX, and no-flow models all indicate a median bias <7.2 μg/L.In contrast, the SEAWAVE-QEX model has a median bias of 4.4 and 11.8 μg/L for 7-and 14-day sampling, respectively.
Mean bias results (Table S3) indicate similar patterns to those of median bias shown in Table 2.For max60RA, all mean bias values were almost identical to the corresponding median bias values; and for max1RA, most mean bias values were consistently larger (by a factor of 1.5 or more) than the corresponding median bias values.
The median and mean bias results of the SEAWAVE-QEX model with the flow variable truncated (designated as SEAWAVE-QEX* in Table 2 and Table S3) are greater than or equal to those of the SEAWAVE-QEX model with the flow variable not truncated.

TA B L E 2
Overall median bias of model estimates by sampling design and target quantity.

| Relative bias
The median relative bias results shown in Table 3 are similar to the median bias results shown in Table 2.In contrast to bias which is expressed in the units μg/L, relative bias is expressed as a percentage of the target quantity in question.
Table 3 indicates that if the target quantity is max60RA, then the median relative bias (in absolute terms) is consistently small (<10%) across all prediction methods, except loglinear interpolation, and both sampling designs.The median relative bias is negative and much larger (in absolute terms) for loglinear interpolation and is somewhat larger for 14-day sampling.
Table 3 also indicates that if the target quantity is max1RA, then the median relative bias is larger (in absolute terms) and more variable across prediction methods when compared with max60RA, and there is a further consistent increase for 14-day sampling compared with 7-day sampling.Loglinear interpolation has the largest negative median relative bias (in absolute terms): −36.3% and −54.6% for 7-and 14-day sampling, respectively.For 7-day sampling, FBXCXSTFA(m) for m = 5, 10, 20, 30, FBXCX, and no-flow models all indicate a median (absolute) relative bias ≤3.3%; and for 14-day sampling, FBXCXSTFA(m) for m = 10, 20, 30, FBXCX, and no-flow models all indicate a median relative bias ≤26.0%.In contrast, the SEAWAVE-QEX model has a median relative bias of 14.1% and 34.0% for 7-and 14-day sampling, respectively.
Mean relative bias also was assessed over all site-years (Table S4), and these results indicate similar patterns to those of median relative bias shown in Table 3.For max60RA, values were almost identical to the corresponding median values although somewhat larger at 14-day sampling; and for max1RA, most values were consistently larger (by a factor of 1.5 or more) than the corresponding median values.
The median and mean relative bias results of the SEAWAVE-QEX model with the flow variable truncated (designated as SEAWAVE-QEX* in Table 3 and Table S4) are greater than those of the SEAWAVE-QEX model with the flow variable not truncated.

| Standard deviation
Table 4 indicates that if the target quantity is max60RA, then the median standard deviation is consistently small (<2 μg/L) across all prediction methods for 7-day sampling; the median standard deviation is approximately doubled for 14-day sampling.Median standard deviation is smallest for loglinear interpolation.
Table 4 also indicates that if the target quantity is max1RA, then the median standard deviation is larger across all prediction methods when compared with max60RA (by a factor of around 10), and it is approximately twice that for 14-day sampling.Loglinear interpolation has the smallest median standard deviation at approximately half of that of most other prediction methods, with values of 10.1 and 12.2 μg/L for 7-and 14-day sampling, respectively.FBXCX has the smallest median standard deviation among all model-based prediction methods, with values of 12.7 and 25.2 μg/L for 7-and 14-day sampling, respectively.
Mean standard deviation also was assessed over all site-years ( With respect to estimates based on 14-day sampling, the median and mean standard deviation results of the SEAWAVE-QEX model with the flow variable truncated (designated as SEAWAVE-QEX* in Table 4 and Table S5) are greater than those of the SEAWAVE-QEX model with the flow variable not truncated.

| RMSPE
Table 5 indicates that if the target quantity is max60RA, then the median RMSPE is consistently small (≤2.1 μg/L) across all prediction methods for 7-day sampling; the median RMSPE is approximately doubled for 14-day sampling.
Table 5 also indicates that if the target quantity is max1RA, then the median RMSPE is larger across all prediction methods when compared with max60RA (by a factor of around 10).Median RMSPE of 14-day sampling is larger than that of 7-day sampling, being approximately twice as large for methods other than loglinear interpolation.FBXCX has the smallest median RMSPE, with values of 16.6 and 32.9 μg/L for 7-and 14-day sampling, respectively.
Mean RMSPE also was assessed over all site-years (Table S6), and these results indicate similar patterns to those of median RMSPE shown in Table 5.Most mean RMSPE values were fairly consistently larger (by a factor of around 1.5) than the corresponding median RMSPE values.
The median and mean RMSPE results of the SEAWAVE-QEX model with the flow variable truncated (designated as SEAWAVE-QEX* in Table 5 and Table S6) are greater than or similar to those of the SEAWAVE-QEX model with the flow variable not truncated.

| RRMSPE
The median RRMSPE results shown in Table 6 mimic the median RMSPE results shown in Table 5, but with a notable exception described below.In contrast to RMSPE which is expressed in the units μg/L, RRMSPE is expressed as a percentage of the target quantity in question.
Table 5 indicates that the RMSPE values for max60RA (ranging from 1.8 to 3.7 μg/L) are consistently smaller than they are for max1RA (ranging from 16.6 to 45.0 μg/L) by a factor of around 10; in contrast, Table 6 indicates that the RRMSPE values for max60RA (ranging from 19.8% to 43.3%) are consistently smaller than they are for max1RA (ranging from 35.4% to 85.2%) but by a much smaller factor of around 2. In all other respects, the patterns shown in Table 6 mimic those shown in Table 5.
Mean RRMSPE also was assessed over all site-years (Table S7), and these results indicate similar patterns to those of median RRMSPE shown in Table 6.Most mean RRMSPE values were fairly consistently larger (by a factor of around 1.5) than the corresponding median RRMSPE values.A 2-sided t-test indicated that the mean RRMSPE was statistically significantly smaller for FBXCX than for SEAWAVE-QEX for the max1RA, irrespective of sample design.
The median and mean RRMSPE results of the SEAWAVE-QEX model with the flow variable truncated (designated as SEAWAVE-QEX* in Table 6 and Table S7) are greater than or similar to those of the SEAWAVE-QEX model with the flow variable not truncated.The Kruskal-Wallis test was used to assess between-site differences in actual coverage between SEAWAVE-QEX and FBXCX for each sampling design and target quantity.For both models, differences in actual coverage across all sites were statistically significant for both target quantities at 14-day sampling.

| Prediction intervals
Table 7 indicates that for max1RA with 7-day sampling, the median actual coverage slightly exceeds the nominal 80% level for all prediction methods except for the FBXCX and no-flow models; for the latter models, the median actual coverage is 71% and it is 86% for the remaining models.For max1RA with 14-day sampling, the median actual coverage is 79% for all prediction methods.For max60RA with both sampling designs, the median actual coverage is 71% for all prediction methods.
Mean actual coverage also was assessed over all site-years (Table S8), and these results indicate fairly similar patterns to those of median actual coverage shown in Table 7, except that most mean values were fairly consistently smaller (a reduction of around 5-10 percentage points) than the corresponding median values (e.g., the maximum mean actual coverage value was 76.0%, which is less than the nominal 80%).
Performance differences were more pronounced at reduced duration of the moving average (i.e., more extreme value) and with less frequent sampling.For the RRMSPE comparison of FBXCX and SEAWAVE-QEX these differences were significantly different for the max1RA but not the max60RA, although for the max60RA the effects were in similar direction and lack of significance may be due to inadequate statistical power.As was described in Mosquin et al. (2018), excess skewness in covariates can lead to overestimation of extremes on the log-scale, and the effects are magnified after back transformation to the original scale (e.g., figure 1 of Mosquin et al., 2018).The negative association between skewness and predictive performance was observed again in this paper (e.g., compare Tables 1 and 7).
A variety of modified SEAWAVE-QEX models was evaluated, the simplest consisting of the removal of the MTFA.The reduced STFA-only model showed improved performance, suggesting that the simpler model is to be preferred.Further simplification by removal of both STFA and MTFA showed the no-flow model to have the best performance of the three approaches.The approach of Mosquin et al. (2018) of applying a normalized Box-Cox transformation to flow was found to provide the best performance of all methods examined.
Attempts to combine the flow-anomaly and Box-Cox approaches were only partially successful.Flow anomaly transformations of Box-Cox transformed flow had substantial skewness (Table 1) and poor performance, while Box-Cox transformations of flow anomaly transformations removed skewness but did not perform as well as the Box-Cox transformation alone.Based on these results, it appears that simple flow anomaly transformations are ineffective.Historical flow values may still prove useful if other sources of information are considered (e.g., in combination with application timings and amounts); further study may be productive in this area.
In comparison to the statistical models, loglinear interpolation had lowest median RRMSPE at 14-day sampling among all approaches and had comparable 7-day sampling performance.It is by far the simplest approach to implement and understand but is limited in its application by its negative bias and not having variance or interval estimates.FBXCX also had a significantly smaller mean RRMSPE than that of SEAWAVE-QEX for the max1RA, irrespective of sample design.
Evaluation of the performance of SEAWAVE-QEX confidence intervals showed average actual coverage across site-years to be close to nominal for the max1RA and somewhat below nominal for the max60RA.Coverage at individual site-years could vary wildly, potentially as low as 0%.An understanding of reasons for model failure with respect to confidence interval estimation at these individual site-years is a topic for future work.
In its original form, the SEAWAVE-QEX model used the flow variable truncated at 1 ft 3 /s, with no explanation given for this flow rate threshold limit.This truncation was removed in all analyses represented in this paper, except for those where the original SEAWAVE-QEX model with truncated flow was compared against the same model without truncated flow.In almost all cases, the performance measures were improved for the SEAWAVE-QEX model without truncated flow compared with those for the same model with truncated flow.The truncated flow variable also exhibited greater skewness than the untruncated variable.
In addition to performance differences across site-years, significant site-level differences were detected for bias, variance, RMSPE, and actual coverage.Whether or not these differences can be associated with site characteristics is another topic for future work.
The conclusions of this study are specific to the monitoring data that were analyzed but are expected to generalize for atrazine to sampling designs with similar frequencies and other extreme target quantities (e.g., upper percentiles) due to the impact of any excess skew of transformed flow covariates.For other compounds and monitoring data in different surface water systems the relationship between covariate skewness and that of the response variable may differ and should be evaluated separately (see Rathjens et al. (2022) for an evaluation of SEAWAVE-QEX on flufenacet, flufenacet sulfonic acid, diflufenican, and isoproturon).

| CON CLUS IONS
• For estimation of the Max1RA all model-based methods performed at least as well as SEAWAVE-QEX, including the no-flow model.The FBXCX was best, having lowest absolute bias, and mean RRMSPE significantly smaller than for SEAWAVE-QEX.
• Flow anomaly transformations based on deviations of current flow from historic flow were not found to improve predictive performance.
• For estimation of the Max60RA, all model-based methods perform similarly.FBXCX had best estimates, but not significantly so.Given the max1RA results, this may be due to inadequate power.
• Predictive performance of flow covariates was consistent with expectations based on covariate skewness.
• The loglinear interpolation method was negatively biased with median RRMSPE at 14-day sampling slightly lower than FBXCX.
• Prediction intervals had median actual coverage similar to nominal for the max1RA and somewhat lower than nominal for the max60RA.
Actual coverage varied greatly at the site-year level.
• Truncation of the flow variable at 1 ft 3 /s should be removed, because almost all performance measures were improved when compared to those where the truncation was not removed.
• These analyses suggested topics that might be useful for further study: (1) examine site attributes such as stream hydrological vulnerability to runoff that might explain the significant site-level variation in predictive performance observed for bias, variance, RMSPE, and actual coverage levels; (2) extend analyses to other pesticides with different environmental fate properties; (3) the approach to develop flow covariates in this paper has been statistical-an alternative approach might use deterministic (e.g., hydrological) models.
2019.These sites were located in five states: • Iowa (2 sites): IA-04, IA-05 | 1461 EVALUATION OF STREAMFLOW AS A COVARIATE IN MODELS FOR PREDICTING DAILY PESTICIDE CONCENTRATIONS • Missouri (4 sites) MO-01, MO-02, MO-07N, MO-08 • Nebraska (1 site): NE-04 • Texas (1 site): TX-01 (m), for m = 2, 5, 10, 20, 30 5. FBXCXSTFA(m), for m = 2, 5, 10, 20, 30 6.The no-flow covariate model Model 1 follows the form of the SEAWAVE-QEX model as presented by Vecchia (2018); model 2 is identical to the SEAWAVE-QEX model, but with MTFA removed; models 3-5 replace the STFA and MTFA covariates with the indicated Box-Cox covariate; and model 6 excludes all flow covariates.All models listed above were run with the flow truncation removed.An additional analysis was conducted that compared the SEAWAVE-QEX model with the truncation removed versus the same model with the truncation in place (this model is referred to as the SEAWAVE-QEX* model).

(
,d indicate a high degree of correlation in the bias of the SEAWAVE-QEX and Box-Cox methods at the siteyear level.The points in Figure 4a,d represent a paired bias relationship between the two methods for each site-year.Most points lie close to but above the diagonal line (which indicates equality among pairs), but there are several points that lie well to the left of the diagonal line, indicating in those cases that the bias of SEAWAVE-QEX is much greater than that of FBXCX for the site-years represented by those points.FI G U R E 2 Summary statistics of distribution of ratios of sample estimates ̂ ijs k to true target quantity values ij for all sampling designs, target quantities, and models.The horizontal line at 1.0 indicates an estimate that perfectly matches its target quantity.F I G U R E 3 Site-year performance statistics: (a) Maximum 1-day rolling average bias, (b) maximum 60-day rolling average bias, (c) maximum 1-day rolling average root variance, (d) maximum 1-day rolling average root mean squared prediction error (RMSPE).Models are loglinear interpolation (black), SEAWAVE-QEX (red), and FBXCX (blue) with 7-day sampling (open points) and 14-day sampling (filled points).

Figure 5a ,
Figure5a,b present site-year actual coverage values for maximum 1-and 60-day rolling averages, respectively.The plots include actual coverages only for the SEAWAVE-QEX and FBXCX models, for both sample designs.Actual coverage is highly variable by site-year, with values ranging from 0% to 100%.
Skewness of atrazine and study flow covariates.Highly skewed values are in bold, and moderately skewed values are in italics.
TA B L E 1 Overall median relative bias of model estimates by sampling design and target quantity.Overall median standard deviation of model estimates by sampling design and target quantity.Overall median RMSPE of model estimates by sampling design and target quantity.

Table
S5), and these results indicate similar patterns to those of median standard deviation shown in Table 4.Most mean standard deviation values were fairly consistently larger (by a factor of around 1.5) than the corresponding median standard deviation values.Overall median relative RMSPE of model estimates by sampling design and target quantity.