### Abstract

- Top of page
- Abstract
- 1. Introduction
- 2. Seasonal Kendall's Slope
- 3. An Improved Approach Through GAMS
- 4. Inference for GAMS With Correlated Errors
- 5. Robust Trend Estimation
- 6. Comparison of Estimation Methods
- 7. Case Study: Salinity Trend at Balranald
- 8. Discussion
- Appendix A:: Weighted Analysis of Variance
- Acknowledgments
- References
- Supporting Information

[1] This paper advocates the use of Generalized Additive Models (GAMs) for the estimation of nonlinear trends in water quality in the presence of serially correlated errors. The GAM methodology is applicable to a range of physical, chemical and biological water quality variates. Comparison with the estimate based on Seasonal Kendall's Slope and robust regression is discussed. An example is given concerning log-transformed stream electrical conductivity, which is adjusted for flow. The monthly data have first order autocorrelation exceeding 0.5 and the trend is markedly nonlinear. Seasonal effects are shown to have been changing over time.

### 1. Introduction

[2] Trends over time in meaningful physical, chemical and biological water quality variates may usefully inform the decisions of water managers. They identify important changes in the river health and allow the effectiveness of key management interventions to be assessed. It is vital that there are techniques available that adequately address important features of the monitoring data in determining any trend. In this paper we focus on the need for methods that accommodate the following.

[4] • serial correlation.

[5] • adjustment for season and covariate effects such as flow.

[6] Since water quality data are often not Normally distributed, *Hirsch et al.* [1982] argue for the use of nonparametric methods. For the detection of trends, they proposed a variation of Kendall's Tau that eliminates the constant seasonal effects. Seasonal Kendall's Tau (SKT) has become a very popular method of trend detection for monotonic trends. The test statistic can be significant if the change in the water quality between any two time points tends to be in the same direction (increasing or decreasing). Much effort has been made to extend the methodology. The technique has been modified to cope with serial correlation [*Hirsch and Slack*, 1984] and with allowance for covariates [*Alley*, 1988; *Libiseller and Grimvall*, 2002 and references therein]. There is a price to pay for adopting a nonparametric approach as some statistical power is lost when it is modified to allow for serial correlation, and methods for handling covariates are not entirely satisfactory. Among the reviews of trend analyses are *Hirsch et al.* [1991], *Esterby* [1996] and *Dixon and Chiswell* [1996]. In all of these, much of the discussion concerns nonparametric tests and their modifications.

[7] Trend estimation is distinct from trend detection in that the objective is to quantify the changes and investigate models that provide interpretation as to the processes possibly causing them. For our purpose, we consider trend as being the progress over time in the mean (arithmetic or geometric) of a variable of interest. Kendall's Tau corresponds to an estimate of linear trend [*Theil*, 1950; *Sen*, 1968], and the same correspondence holds for the seasonal modification, between SKT and the Seasonal Kendall's Slope (SKS) [*Hirsch et al.*, 1982]. For series of sufficient length, the trend can be nonlinear or even nonmonotone, in which case SKS would be inappropriate. An extension of Kendall's Tau to detect nonlinearity of a trend is given by *El-Shaarawi and Niculescu* [1993]. It assumes independent and identically distributed errors and is based on Kendall's Tau applied to first differences of the data. This would provide an estimate of trend linear in first differences but would lose sensitivity if the differences were not monotonic. We consider other approaches based on estimation of a more general trend that also incorporates serially correlated errors.

[8] Water quality variates often change with discharge, season, management interventions and other anthropogenic activities. The adjustment of the water quality trend for these sources of variation is often desirable for it reveals the underlying trend not attributable to known causes. These sources of variation are represented by covariate terms in a regression model. Seasonal effects may be represented by sinusoidal terms or by a seasonally dependent intercept (factor). Step changes in the trend and covariates may be included in the regression model.

[9] We first discuss the pros and cons of Seasonal Kendall's Slope. We then demonstrate the use of Generalized Additive Models (GAMs). GAMs have been used extensively in the analysis of air pollution; “nearly ubiquitous” according to the *Health Effects Institute* [2003], and “a standard tool in time series studies of air pollution and health in the past decade” [*He et al.*, 2006]. We contend that GAMs should be more frequently used in the analysis of water quality trends. They have been used to analyze numerous sequences of salinity data in Australia where trends are often nonlinear, [e.g., *Nathan et al.*, 1999; *Jolly et al.*, 2001], and for streamflow [*Letcher et al.*, 2001], but less frequently elsewhere. GAMs allow the nonlinearity of a trend to be of arbitrary shape. Examples of nonlinear trends occur in the work of *Brillinger* [1994], *Miller and Hirst* [1998], *Schreider et al.* [2002], *Cox et al.* [2005], *Giannitrapani et al.* [2005], and *McMullan et al.* [2007]. With GAMs there is great scope for more complex modeling. For example, the parameters in the regression could change over time, and we estimate a changing seasonal effect in the example below. We show how GAMs can be analyzed in the presence of serial correlation. As a rule of thumb, first order autocorrelation exceeding 0.2 can, if ignored, seriously distort the inference about the trend by leading to liberal tests and confidence intervals that are too narrow [*Hirsch and Slack*, 1984]. In our Australian experience from two large studies, 80 out of 117 sites had first order autocorrelation *ρ* ≥ 0.2 for log electrical conductivity (EC) in *μ*S/cm. For a smaller set of 26 sites, 6 of the 8 other chemical log-variates analyzed had *ρ* ≥ 0.2 for at least 20 sites. Autocorrelation was consistently high for log-counts of 16 phytoplankton species at 9 sites, where 94 of the 139 species by site combinations analyzed had *ρ* ≥ 0.4. 
This paper gives a brief discussion of robust methods because they may be considered as being intermediate between SKS and the GAM methodology. A comparison of the three broad approaches to trend analysis is provided.

[10] For simplicity of exposition, we shall assume that monthly data are available. If the data are collected at a higher resolution than monthly, we assume that they have been reduced to monthly values by taking a mean or median, and that the reduced data are visualized as being arranged in yearly rows and monthly columns, January to December. Note that the methodologies described in this paper readily extend to data means at any frequency, e.g., weekly, monthly or quarterly; in many papers, these are referred to as “seasons”.
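
The reduction to a year-by-month layout can be sketched as follows. This is only an illustration under stated assumptions: the function name `monthly_table`, the `(year, month, value)` record format and the sample readings are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from statistics import mean


def monthly_table(records, reducer=mean):
    """Reduce (year, month, value) records to {year: [12 monthly values]}.

    Months with no observations are left as None.
    """
    buckets = defaultdict(list)
    for year, month, value in records:
        buckets[(year, month)].append(value)
    years = sorted({year for year, _ in buckets})
    table = {year: [None] * 12 for year in years}
    for (year, month), values in buckets.items():
        table[year][month - 1] = reducer(values)  # column 0 is January
    return table


# Hypothetical sub-monthly readings: (year, month, value)
records = [(1978, 1, 100.0), (1978, 1, 120.0), (1978, 2, 90.0), (1979, 1, 80.0)]
table = monthly_table(records)
```

Passing `reducer=statistics.median` gives the median-based reduction instead of the mean.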

### 2. Seasonal Kendall's Slope

[11] Nonparametric estimation methods are widely used in water resources and refer to methods that do not require an explicit assumption of a distribution. This is often seen to add robustness and extend the validity of the inference. For instance, nonparametric methods may be used to estimate the linear trend parameter so that the inference is valid without the assumption that the data are Normally distributed. While nonparametric estimation methods alleviate the need for specific distributional assumptions, outliers should still be examined and a decision made as to whether they should be omitted before assessing the underlying trend.

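[13] Under the extra assumption of linear trend, where, for year *i* and month *j*,

$$y_{ij} = \alpha_j + \gamma\,\mathit{time}_{ij} + \varepsilon_{ij}$$

and the errors *ɛ*_{ij} are assumed to be independent and identically distributed so that any permutation of the yearly rows is equally likely, we may estimate the magnitude of the trend. The SKS linear trend estimate as given by *Hirsch et al.* [1982] is the median of the within-month pairwise slopes,

$$\hat{\gamma} = \operatorname*{median}_{j;\; i<k} \left\{ \frac{y_{kj} - y_{ij}}{\mathit{time}_{kj} - \mathit{time}_{ij}} \right\}$$

[14] The SKT and SKS are mathematically connected. Consider the function

$$S(\gamma) = \sum_{j} \sum_{i<k} \operatorname{sgn}\left\{ (y_{kj} - \gamma\,\mathit{time}_{kj}) - (y_{ij} - \gamma\,\mathit{time}_{ij}) \right\}$$

and note that the SKT statistic is *S*(0) and the SKS estimate $\hat{\gamma}$ is the solution of *S*(*γ*) = 0. Under the assumption of a linear trend, confidence intervals for *γ* can be obtained from inverting the SKT applied to *y*_{ij}−*γ*time_{ij}.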

[15] Thus SKS is the estimate of linear trend due to *Theil* [1950] and *Sen* [1968] modified by removing seasonal effects. Note that the SKT test is invariant under monotone transformation of *y*, but that the SKS estimate is not.
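
As a rough illustration of the seasonal modification of the Theil-Sen estimate, the sketch below computes the Seasonal Kendall statistic and slope for data arranged in yearly rows and seasonal columns. The function name `seasonal_kendall` and the toy data are assumptions made for the example, and time is measured in whole years so that the slope is per year.

```python
import numpy as np


def seasonal_kendall(y):
    """y: 2-D array, rows = years, columns = seasons (NaN = missing).

    Returns (S, slope): the sum of signs of within-season differences
    and the median of within-season pairwise slopes (per year).
    """
    y = np.asarray(y, dtype=float)
    n_years, n_seasons = y.shape
    s_stat = 0
    slopes = []
    for j in range(n_seasons):              # seasons are never compared with each other
        for i in range(n_years - 1):
            for k in range(i + 1, n_years):
                d = y[k, j] - y[i, j]
                if np.isnan(d):             # skip pairs involving a missing value
                    continue
                s_stat += int(np.sign(d))
                slopes.append(d / (k - i))
    return s_stat, float(np.median(slopes))


# Toy data: 5 years, 4 seasons, constant seasonal offsets plus 0.5 units per year
season_offsets = np.array([1.0, 2.0, 3.0, 4.0])
y = season_offsets[None, :] + 0.5 * np.arange(5)[:, None]
s_stat, slope = seasonal_kendall(y)
```

Because differences are taken only within a column, the constant seasonal offsets cancel, which is the sense in which the estimate removes seasonal effects.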

[16] In the *Hirsch and Slack* [1984] treatment of SKT, serial correlation is allowed between months in the same row but not between rows, so that the permutation argument still holds. Any correlation between December one year and January the next year is ignored. The correlations between columns within each row are assumed constant and are estimated in nonparametric fashion. The test is not robust against highly persistent processes (*ρ* > 0.6) and some power is lost due to estimating binary correlations; a series of at least 10 years of monthly data is recommended by *Hirsch and Slack* [1984] and *Harcum et al.* [1992] before the advantage of using SKS over parametric estimation of linear trend would start to be realized. An alternative approach is to remove the autocorrelation by prewhitening. For this an estimate of *ρ* is required. *Yue et al.* [2002] recommended that where a linear trend exists, the required autocorrelation is estimated by eliminating the trend. The differences *y*_{ij}−*ρy*_{ij−1} can then be analyzed by SKS with trend *γ*(1 − *ρ*) from which *γ* could be estimated. *Wang and Swail* [2001] propose an iterative prewhitening and trend estimation procedure (see their Appendix A). *Zhang and Zwiers* [2004] compare the performance of different prewhitening and trend estimation procedures, including the procedure of *Wang and Swail*. If the trend is nonlinear, the estimated autocorrelation could be misleading and the method unreliable. For longer series, it may be unreasonable to assume that the trend is linear, so we conclude that SKS is best for fairly short series where serial correlation may be assumed to be negligible.
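
The prewhitening idea can be sketched for a single equally spaced series as below. This is a simplified illustration, not the procedure of any of the cited papers: it uses the plain Theil-Sen slope rather than SKS, ignores seasonality, and the names `theil_sen` and `prewhitened_slope` and the simulated data are hypothetical.

```python
import numpy as np


def theil_sen(t, y):
    """Median of all pairwise slopes (y[k] - y[i]) / (t[k] - t[i]), i < k."""
    i, k = np.triu_indices(len(y), k=1)
    return float(np.median((y[k] - y[i]) / (t[k] - t[i])))


def prewhitened_slope(t, y):
    """Estimate a linear trend gamma and lag-one autocorrelation rho.

    Steps: (1) detrend with a first slope estimate, (2) estimate rho from
    the detrended residuals, (3) form w_i = y_i - rho * y_{i-1}, whose trend
    is gamma * (1 - rho) for equally spaced t, (4) rescale the slope of w.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    resid = y - theil_sen(t, y) * t
    resid = resid - resid.mean()
    rho = float(resid[1:] @ resid[:-1] / (resid @ resid))
    w = y[1:] - rho * y[:-1]
    gamma = theil_sen(t[1:], w) / (1.0 - rho)
    return gamma, rho


# Simulated example: linear trend 0.05 per step plus AR(1) noise with rho = 0.5
rng = np.random.default_rng(0)
n = 240
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.5 * noise[i - 1] + rng.normal()
t = np.arange(n, dtype=float)
y = 0.05 * t + noise
gamma_hat, rho_hat = prewhitened_slope(t, y)
```

If the true trend were nonlinear, the residuals in step (1) would retain trend structure and the estimate of *ρ* would be inflated, which is the weakness noted in the text.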

[17] Water quality data often come with a number of covariates and we now introduce them into the model. Let *x*_{ijk} be the value of the *k*-th covariate at *time*_{ij}. For a linear time trend, the regression model with covariates is

$$y_{ij} = \sum_{k} \beta_k x_{ijk} + \alpha_j + \gamma\,\mathit{time}_{ij} + \varepsilon_{ij} \qquad (5)$$

where *ɛ*_{ij} are residual errors. Seasonal effects *α*_{j} form a factor at 12 levels for monthly data. Excluding seasonal effects, the correction for covariates expressed in the first term may be estimated by Ordinary Least Squares (OLS) regression, but with *γ* = 0. Having adjusted for the covariates, the residuals are then subjected to a SKS analysis. With this approach, any trends in significant regressors may distort the trend in the response and so reduce the power. Ideally, the effects of the covariates and the response trend should be estimated simultaneously, but this upsets the permutation argument. Another possibility is to consider columns for every combination of month and covariate, then reduce each column to its SKS estimate. The idea of *Libiseller and Grimvall* [2002] is to treat the vector of estimates as multivariate Normal. However, the conditional distribution of the response statistic given the covariate statistics does not adjust the original response data for the covariates in the same way as in a linear regression.

### 3. An Improved Approach Through GAMs

[18] Correlated errors are most easily handled through standard time series models. *Hipel and McLeod* [1994] give a very full account of time series applications to water resources, but do not include semiparametric models. *Brillinger* [1994] includes semiparametric models and point processes. Nonlinear trends could be represented by *g*(*t*) as a polynomial in time *t* in model (5) and its coefficients estimated accordingly. However, polynomials do not have straight segments and tend to be most curved and prone to high variation at the extremes of the period, making them unsuitable for prediction. Furthermore, by their global nature, the estimate of a polynomial trend at any one point can be greatly influenced by data from distant points in time. These comments indicate that there is a need for the more flexible curves that we consider next.

[19] Semiparametric models are regression models where some of the covariates occur as terms in a parametric linear model and others are represented by arbitrary smooth functions that are nonparametric. When the terms are additive, they are known as Generalized Additive Models (GAMs). GAMs with correlated errors are commonly used for analyzing repeated measures data. There is a vast literature on repeated measures, but much of the methodology utilizes the replication due to the existence of parallel time series to estimate the correlation [cf. *Diggle et al.*, 1994]. This feature is generally not available for water quality data, and the correlation model must be estimated from within the single series.

[20] Software for fitting GAMs with correlated errors is not available in the major statistical packages as far as we are aware. We show how this can be done in the simplest case where there is one nonlinear term. Where the effects of another regression term are substantial, it is possible to fit it as another nonlinear term. We have routinely fitted log(*flow*) as a spline to other data although the nonlinearity was seldom very great. This has not been done here in order to keep the exposition simple. The backfitting algorithm below may easily be extended, but explicit algebraic form for the weighted Anova is more involved.

[21] For the *i*-th observation, *y*_{i} = **x**_{i}^{T}*β* + *g*(*t*_{i}) + *ɛ*_{i} (*i* = 1…*n*), where **x**_{i} is a vector of covariates which includes seasonal terms and *g*(*t*_{i}) is the underlying smooth trend at time *t*_{i} after allowing for those covariate effects. Note that we are now representing time by *t*_{i} where *i* is an observation index because GAMs can analyze data with irregular times. There is no longer any need to have suffix *i* for years and *j* for months as in SKS. In vector notation, and across all time points, this model is written as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon}$$

where the errors are assumed to have the correlation matrix **V**. So var(*ɛ*) = *σ*^{2}**V**, with *σ*^{2} to be estimated from the weighted residual sum of squares. Given the trend, Weighted Least Squares (WLS) would be used to estimate *β*. There are various methods that may be used to estimate **g**. Our preference is to use penalized smoothing splines. Locally weighted polynomial regression (“loess”) is sometimes used in the analysis of water quality to present a graph of the trend, but has not been commonly used in conjunction with regression. Loess or kernel smoothers have the disadvantage that the window width for the smoother needs to vary according to the autocorrelation in *ɛ*; with positive autocorrelation, the windows should be wider. Penalty methods are preferred because the same penalty criterion should apply whatever **V** is. The penalty is almost invariably chosen to have a quadratic form so that *β* and **g** are estimated by minimizing the penalized sum of squares

$$(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{g})^{T}\,\mathbf{V}^{-1}\,(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{g}) + \lambda\,\mathbf{g}^{T}\mathbf{K}\,\mathbf{g} \qquad (6)$$

where the first term is the weighted sum of squared errors and the second term is the penalty for a matrix **K** that is nonnegative definite and symmetric. The smoothing parameter *λ* is to be chosen and governs the trade-off between the fidelity to the observed data and the smoothness of the curve. A large value of *λ* makes *g*(*t*) very smooth, almost linear, while a small value allows *g*(*t*) to be rough. Choice of *λ* depends on the amount of detail in the trend that is of interest. *λ* may be chosen automatically by cross-validation or by Akaike's information criterion (AIC), though these methods tend to lead to curves with substantial undersmoothing, particularly when positive autocorrelation is present. *Diggle and Hutchinson* [1989] and *Opsomer et al.* [2001] exhibit this phenomenon and explain why it is so. They also review kernel smoothers, splines and wavelets under correlation.

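[22] There are many kinds of penalized smoothers. We focus on smoothing splines. The construction of **K**, as given by *Green and Silverman* [1994], is

$$\mathbf{K} = \mathbf{Q}\,\mathbf{R}^{-1}\,\mathbf{Q}^{T} \qquad (7)$$

where **Q**_{n×(n−2)} and **R**_{(n−2)×(n−2)} are tri-diagonal matrices as follows. Let *h*_{i} = *t*_{i+1} − *t*_{i}. When the index runs over only *n*−2 values, we index it from 2 to *n*−1. Then, for *i* = 1…*n*, *j* = 2…*n*−1,

$$q_{j-1,j} = h_{j-1}^{-1}, \qquad q_{jj} = -h_{j-1}^{-1} - h_{j}^{-1}, \qquad q_{j+1,j} = h_{j}^{-1}, \qquad q_{ij} = 0 \ \text{ for } |i-j| \ge 2 \qquad (8)$$

For *i* = 2…*n*−1, *j* = 2…*n*−1,

$$r_{ii} = \tfrac{1}{3}(h_{i-1} + h_{i}), \qquad r_{i,i+1} = r_{i+1,i} = \tfrac{1}{6}h_{i}, \qquad r_{ij} = 0 \ \text{ for } |i-j| \ge 2 \qquad (9)$$

Define the matrix **S** = (**I** + *λ***VK**)^{−1}. Then minimization of (6) leads to the pair of equations to be solved for the estimators

$$\hat{\mathbf{g}} = \mathbf{S}\,(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \qquad (10)$$

$$\hat{\boldsymbol{\beta}} = \mathbf{G}\,(\mathbf{y} - \hat{\mathbf{g}}) \qquad (11)$$

where **G** = (**X**^{T}**V**^{−1}**X**)^{−1}**X**^{T}**V**^{−1}.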

[23] Solution of equations (10) and (11) may be achieved by iterative substitution, also known as backfitting, as discussed by *Hastie and Tibshirani* [1990] and adopted by standard software for the case of uncorrelated errors. Start with **g** = **0**, solve (11) for *β*; with this value *β*, solve (10) for **g** and repeat the process until the estimates converge.
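
The backfitting iteration can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' software: it assumes the standard *Green and Silverman* [1994] form of **Q** and **R**, a known AR(1) correlation matrix, and a design matrix **X** without intercept or linear time column (the spline carries those). The function names, the simulated data and the value *λ* = 1000 are all hypothetical.

```python
import numpy as np


def spline_penalty(t):
    """Penalty matrix K = Q R^{-1} Q^T for a cubic smoothing spline with knots t."""
    t = np.asarray(t, dtype=float)
    n = len(t)
    h = np.diff(t)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for c in range(n - 2):                  # column c belongs to interior knot c + 1
        Q[c, c] = 1.0 / h[c]
        Q[c + 1, c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2, c] = 1.0 / h[c + 1]
        R[c, c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[c + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)


def ar1_corr(n, rho):
    """AR(1) correlation matrix with entries rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def backfit(y, X, t, V, lam, tol=1e-10, max_iter=500):
    """Alternate beta = G (y - g) and g = S (y - X beta) until g stabilizes."""
    n = len(y)
    K = spline_penalty(t)
    S = np.linalg.inv(np.eye(n) + lam * V @ K)
    Vinv_X = np.linalg.solve(V, X)
    G = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T)   # (X' V^-1 X)^-1 X' V^-1
    g = np.zeros(n)
    for _ in range(max_iter):
        beta = G @ (y - g)
        g_new = S @ (y - X @ beta)
        converged = np.max(np.abs(g_new - g)) < tol
        g = g_new
        if converged:
            break
    beta = G @ (y - g)
    return beta, g


# Simulated example: 10 years of monthly points, sinusoidal season, smooth trend
rng = np.random.default_rng(1)
n = 120
t = np.arange(n, dtype=float)
X = np.column_stack([np.sin(2 * np.pi * t / 12), np.cos(2 * np.pi * t / 12)])
y = 0.3 * X[:, 0] + 1e-4 * (t - 60.0) ** 2 + 0.1 * rng.normal(size=n)
V = ar1_corr(n, 0.5)
beta_hat, g_hat = backfit(y, X, t, V, lam=1000.0)
```

For small problems the same estimates can be obtained by solving the joint normal equations directly; backfitting is shown because it extends naturally to more than one smooth term.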

[24] It is common to assume that the residuals are from a stationary autoregressive moving average (ARMA) process of low order. Often an autoregressive process of order 1 works well, and has the attraction of the dependence being Markovian, i.e., conditional on the previous month, a residual is independent of the earlier past. For estimation, one may first take **V** = **I**, the identity matrix, in order to obtain residuals by fitting **X**; from these one can get a reasonable estimate of **V** as in *Diggle and Hutchinson* [1989]. The process may be refined by iteratively refitting **X** by WLS and updating **V**. As with WLS, the minimization of (6) does not imply that we are assuming that the distribution of the residuals is multivariate Normal. The essential assumption is that the mean and variance specifications are true.
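
The iterative refinement of **V** can be illustrated for the parametric part of the model alone. The sketch below assumes an AR(1) error process and a simple lag-one moment estimator of *ρ*; the function names, the fixed number of iterations and the simulated data are hypothetical choices, and the spline term is omitted for brevity.

```python
import numpy as np


def ar1_corr(n, rho):
    """AR(1) correlation matrix with entries rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def gls_ar1(y, X, n_iter=5):
    """Start from V = I, then alternate WLS fitting and re-estimation of rho."""
    n = len(y)
    rho = 0.0                                   # rho = 0 gives V = I (ordinary least squares)
    for _ in range(n_iter):
        V = ar1_corr(n, rho)
        Vinv_X = np.linalg.solve(V, X)
        beta = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)
        r = y - X @ beta
        rho = float(r[1:] @ r[:-1] / (r @ r))   # lag-one autocorrelation of residuals
    return beta, rho


# Simulated example: intercept 2, slope 1, AR(1) errors with rho = 0.6
rng = np.random.default_rng(2)
n = 300
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.6 * e[i - 1] + 0.1 * rng.normal()
t = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), t])
y = 2.0 + 1.0 * t + e
beta_hat, rho_hat = gls_ar1(y, X)
```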

[25] For models where the errors are independent, an excellent account of GAMs is given by *Hastie and Tibshirani* [1990]. Nonnormal data are included in the set-up provided that the variance of *y* has a known relationship to its mean. This could be used for data that are counts or where a variance-stabilizing transformation is unsuitable. Then **V** is the diagonal matrix **V**_{0} with elements {*v*(*μ*_{i})}. Correlation can be introduced by assuming that the scaled data have correlation **W** usually containing further parameters to be estimated. That is **V** = **V**_{0}^{1/2}**WV**_{0}^{1/2}.

### 5. Robust Trend Estimation

[27] SKS is advocated mainly because it is robust, that is, estimation is not unduly influenced by observations that are outliers. Robust regression [*Huber*, 1981] refers to methods of parameter estimation in linear regression models that retain high efficiency when the errors have heavy tailed distributions. It is based on the minimization of ∑_{i} *ϕ*(*ɛ*_{i}) where *ϕ* is a known function. *ϕ*(*u*) = *u*^{2} gives the familiar least squares criterion. Robust methods, however, choose *ϕ* to increase less rapidly than the quadratic for large values. If *ϕ*(*u*) = ∣*u*∣, the method is known as median regression, and quantile regression is an extension of this. An attraction of using robust estimation, as with GAMs, is that the trend and the regression coefficients *β*_{k} are estimated simultaneously since there is no need to separate season and trend from the covariates in the model.
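
To make the role of *ϕ* concrete, the sketch below fits a straight line by iteratively reweighted least squares using Huber's function, which is quadratic for small residuals and linear for large ones. The tuning constant 1.345, the MAD-based scale estimate, the function name `huber_irls` and the toy data are conventional or hypothetical choices that are not taken from the paper, and errors are treated as independent.

```python
import numpy as np


def huber_irls(X, y, c=1.345, n_iter=50):
    """M-estimation with Huber's phi via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # least squares start
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745   # robust scale (MAD)
        if scale == 0.0:
            break
        u = np.abs(r) / scale
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))    # weight 1 inside, c/|u| outside
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta


# Toy data: line with intercept 1 and slope 2, small noise, one gross outlier
rng = np.random.default_rng(3)
x = np.arange(20.0)
X = np.column_stack([np.ones(20), x])
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=20)
y[-1] += 100.0
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_irls(X, y)
```

In this toy case the least squares slope is pulled well away from 2 by the single outlier, whereas the Huber fit stays close to it.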

[28] Robust methods can accommodate a linear regression model satisfactorily and incorporate seasonal effects in the regression terms. One might think of robust regression as being intermediate between SKS and Normal regression. Robust methods have been used by *Cox et al.* [2005] to fit data with censoring and assuming independent errors.

[29] Where correlation is present, all robust methods experience difficulty in estimating serial dependence. *Jung* [1996] derives a method for performing median regression with autocorrelation. Autocorrelation between successive sgn(*ɛ*_{i}) can be estimated empirically. *Koenker et al.* [1994] consider penalized robust estimation that combines quantile regression with spline fitting for independent data. Autoregression could be added analogously to Jung, provided that one can surmount the computational obstacles and identify a numerical algorithm that performs satisfactorily. There could be a limitation if the sample size were insufficient to estimate the autocorrelation accurately enough.

### 7. Case Study: Salinity Trend at Balranald

[32] The data set consists of salinity and flow measurements taken at Balranald (station code 410130), New South Wales, Australia over the period 1978-2006. Balranald is downstream of all the major irrigation areas on the Murrumbidgee River, and so trends in salinity are of interest because they may reflect changes in irrigation practices. Groundwater flows to the river could also be influenced by changes in groundwater extraction.

[33] Sampling was more or less weekly up to 1989 and thereafter with increasing frequency up to daily from 1994. Observations were reduced to monthly means of EC (electrical conductivity in *μ*S/cm) and flow (discharge in ML per day), which were complete except for April and May in 1979. Medians were considered instead of means, but did not give an improved fit. The means were log-transformed. Natural logarithms were used throughout. Diagnostic plots of the residuals showed that there were 5 outliers. These could be deleted on the basis that they presumably were responses to temporary factors and not representative of the long-term trend we sought to estimate. A Q-Q plot of the remainder indicated that Normality was very satisfactory.

[34] For convenience of output, we redefined the timescale *time*_{i} = 1978 + *t*_{i}/12 (month *t*_{i} from the start). Season was represented as a single cycle sinusoid by defining regression terms *sin*_{i} = sin(2*π* *time*_{i}) and *cos*_{i} = cos(2*π* *time*_{i}) that capture the Autumn versus Spring and Summer versus Winter contrasts respectively. These allow an arbitrary sinusoidal season effect where the corresponding amplitude and phase reflect the magnitude and timing of the seasonal peak. We considered a possible interaction between seasonal effects and time, so multiplicative terms were included with time centered. The model fitted to *log*(*flow*) was

$$\log(\mathit{flow}_i) = s(\mathit{time}_i;\mathrm{df}) + \alpha_1\,\mathit{sin}_i + \alpha_2\,\mathit{cos}_i + \alpha_3\,t_{\mathrm{adj},i}\,\mathit{sin}_i + \alpha_4\,t_{\mathrm{adj},i}\,\mathit{cos}_i + \varepsilon_i \qquad (15)$$

where *t*_{adj,i} = *time*_{i} − 1992.5 and s(*time*_{i};df) means a spline trend with formal degrees of freedom df and includes the constant. Figure 1 shows the log-flow observations and the fitted values of this model. For the purpose of data summary, autocorrelation was not taken into account. It is noticeable there was a large seasonal component at the start which diminished as time progressed. The *cos* component of season was negligible. From Table 2, the *sin* coefficients were *α*_{1} = −0.6949 and *α*_{3} = 0.04744 indicating a strong Spring minus Autumn contrast. At 1978, 14.5 years before the midpoint of the time period, the estimated *sin* effect was −0.6949 − 14.5 × 0.04744 = −1.383 so that Spring flow was about 4 times the geometric mean and Autumn flow about 0.25 times the geometric mean. Because of the simplicity of the model and these calculations being at the extremes of the period these factors are possibly a little unrealistic, but the feature of changing seasonal effect is unmistakable.
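
The arithmetic behind the factors of about 4 and 0.25 can be checked directly. The snippet uses the two *sin* coefficients quoted in the text; the variable names are illustrative. Since *sin*_{i} equals +1 a quarter of the way through the year (Autumn) and −1 three quarters of the way through (Spring), the Autumn effect on the log scale is the *sin* coefficient itself and the Spring effect is its negative.

```python
import math

alpha1 = -0.6949            # sin coefficient at the midpoint of the period
alpha3 = 0.04744            # change in the sin coefficient per year
t_adj_1978 = 1978 - 1992.5  # years from the midpoint, i.e. -14.5

sin_effect = alpha1 + alpha3 * t_adj_1978   # log-scale sin coefficient at 1978
autumn_factor = math.exp(sin_effect)        # multiplier of the geometric mean in Autumn
spring_factor = math.exp(-sin_effect)       # multiplier of the geometric mean in Spring

print(round(sin_effect, 3), round(spring_factor, 2), round(autumn_factor, 2))
```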

Table 2. Estimated Regression Coefficients From Model (15) With Autocorrelation 0.595 and Spline With 10 df

| Term | Estimate | s.e. |
|---|---|---|
| sin, *α*_{1} | −0.6949 | 0.0139 |
| cos, *α*_{2} | 0.0134 | 0.0138 |
| sin × t_{adj}, *α*_{3} | 0.0474 | 0.0018 |
| cos × t_{adj}, *α*_{4} | 0.0064 | 0.0017 |
| linear trend component | −0.0387 | 0.0009 |

[35] Figure 1 is consistent with our understanding of the system. In 1994 river regulation was tightened and a cap in the flow was put in place. This reduced some of the fluctuations in flow, with fewer high flows in particular. Greater irrigation has also contributed to the general reduction in flow. The increased river management of “environmental flow” has meant that the minimum flows are higher than occurred before 1994.

[36] Since 2004 a new water sharing plan plus a widespread drought have meant that the minimum flows have been held at almost the same level as since around 1996, but the maximum flows are much reduced. Overall the flow was greatly reduced. Better regulation and much less rain have resulted in a strong reduction in the fluctuations in the flow regime, with minimum flows and peak flows both being much steadier, and the gap between the two decreasing.

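[37] A similar model, but including a log-flow term, was fitted to *log*(*EC*):

$$\log(\mathit{EC}_i) = s(\mathit{time}_i;\mathrm{df}) + \beta_1 \log(\mathit{flow}_i) + \beta_2\,\mathit{sin}_i + \beta_3\,\mathit{cos}_i + \beta_4\,t_{\mathrm{adj},i}\,\mathit{sin}_i + \beta_5\,t_{\mathrm{adj},i}\,\mathit{cos}_i + \varepsilon_i \qquad (16)$$

[38] The errors {*ɛ*_{i}} were assumed to be AR(1) with autocorrelation *ρ*. The time trend term s(*time*_{i};df) can be expressed as a linear component *α* + *β*_{6} *time*_{i}, with the remainder being a nonlinear component that captures the deviations from linearity.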

[39] Necessarily, any departure of the true trend from the spline curve is absorbed into the errors. The degrees of freedom may be thought of as roughly comparable to the order of a polynomial fit of similar smoothness. If the curve is allowed to be more wiggly, df increases and *λ* decreases; the estimated correlation decreases, but is still substantial even for an over-generous 30 df in Table 3.

Table 3. Values of the Formal Degrees of Freedom of the Spline, tr(S), the Smoothing Parameter *λ* and the Autocorrelation *ρ* for the Balranald log(EC) Data

| Formal df | *λ* | *ρ* |
|---|---|---|
| 4 | 253,200 | 0.572 |
| 5 | 105,400 | 0.568 |
| 7 | 28,755 | 0.555 |
| 10 | 7,571 | 0.527 |
| 15 | 1,688 | 0.492 |
| 30 | 138 | 0.420 |
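
The inverse relationship between formal df and *λ* can be explored numerically. The sketch below takes tr(**S**) with **S** = (**I** + *λ***VK**)^{−1} as the formal degrees of freedom and finds, by bisection on the log scale, the *λ* that gives a requested df. It is illustrative only: the function names, the search bounds, the 60 equally spaced time points and the choice **V** = **I** are assumptions, so the numbers it produces are not those of Table 3.

```python
import numpy as np


def spline_penalty(t):
    """Penalty matrix K = Q R^{-1} Q^T for a cubic smoothing spline with knots t."""
    t = np.asarray(t, dtype=float)
    n = len(t)
    h = np.diff(t)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for c in range(n - 2):
        Q[c, c] = 1.0 / h[c]
        Q[c + 1, c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2, c] = 1.0 / h[c + 1]
        R[c, c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[c + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)


def formal_df(lam, V, K):
    """Formal degrees of freedom tr(S), with S = (I + lam V K)^{-1}."""
    n = K.shape[0]
    return float(np.trace(np.linalg.inv(np.eye(n) + lam * V @ K)))


def lambda_for_df(target_df, V, K, lo=1e-6, hi=1e10, n_steps=60):
    """Bisection on log(lambda); tr(S) decreases from n towards 2 as lambda grows."""
    for _ in range(n_steps):
        mid = np.sqrt(lo * hi)
        if formal_df(mid, V, K) > target_df:
            lo = mid        # curve still too wiggly: increase lambda
        else:
            hi = mid
    return float(np.sqrt(lo * hi))


t = np.arange(60.0)         # hypothetical equally spaced time points
K = spline_penalty(t)
V = np.eye(60)              # independent errors, for illustration only
lam10 = lambda_for_df(10.0, V, K)
```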

[40] Table 4 gives the weighted Anova for the case where df = 10 (including the linear component); the estimates and their standard errors are given in Table 5. The effect of log-flow was nonsignificant after eliminating effects that could be attributed to season. Seasonal effects (eliminating log-flow effects) are quite large and the interactions of season with time are clearly significant. The *cos* coefficient (*β*_{3} + *β*_{5}*t*_{adj}) increased over time from −0.0101 at the start of 1978 to 0.3802 at the end of 2006; thus there was initially little difference between Summer and Winter, but the difference increased over time. At the end, the ratio of Summer EC to Winter EC was exp(2 × 0.3802) = 2.14. The *sin* coefficient (*β*_{2} + *β*_{4}*t*_{adj}) showed a smaller change, from −0.1071 to −0.2756; thus Spring EC was higher than Autumn EC and the ratio increased. The standard error of the coefficient of *sin* is larger than that for *cos* (Table 5) because it is confounded with the log-flow effect, which has a large *sin* component. Thus in interpreting Table 4, statistical significance does not necessarily indicate the magnitude of an effect but the ability to detect it. The estimated amplitude of the sinusoid was 0.1076 at the start with phase (maximum) about 20 May; at the end the amplitude was about 4 times larger at 0.4696 with phase about 21 March.
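
The quoted amplitudes and the Summer to Winter ratio follow from the time-varying *sin* and *cos* coefficients, as the snippet below illustrates. It evaluates the coefficients at *t*_{adj} = −14.5 and +14.5 from the rounded values in Table 5, so the results agree with the figures in the text only to about three decimal places, and it then computes the amplitudes from the values reported in the text. The helper names are hypothetical, and the phase dates are not recomputed here.

```python
import math

# Rounded coefficients from Table 5
beta2, beta3, beta4, beta5 = -0.1913, 0.1851, -0.0058, 0.0135


def seasonal_coefs(t_adj):
    """Return the (sin, cos) coefficients at centered time t_adj."""
    return beta2 + beta4 * t_adj, beta3 + beta5 * t_adj


def amplitude(sin_coef, cos_coef):
    """Amplitude of a*sin(x) + b*cos(x) is sqrt(a^2 + b^2)."""
    return math.hypot(sin_coef, cos_coef)


sin_start, cos_start = seasonal_coefs(-14.5)   # start of the record
sin_end, cos_end = seasonal_coefs(14.5)        # end of the record

# Amplitudes from the coefficient values reported in the text
amp_start = amplitude(-0.1071, -0.0101)
amp_end = amplitude(-0.2756, 0.3802)

# Summer (cos = +1) versus Winter (cos = -1) ratio on the natural scale
summer_winter_ratio = math.exp(2 * 0.3802)
```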

Table 4. Weighted Anova Table for Balranald Salinity Data Assuming First Order Autocorrelation of 0.527 for the Errors

| Terms | df | Weighted SS | MSS | F | P |
|---|---|---|---|---|---|
| Linear terms (log-flow, season, season × time, linear trend) | 6 | 7.386 | 1.231 | 19.74 | <0.001 |
| Nonlinear spline trend | 9 | 4.118 | 0.458 | 7.34 | <0.001 |
| Residual | 330 | 20.580 | 0.062 | | |
| Total | 345 | 32.083 | | | |
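
The mean squares and F ratios in Table 4 can be reproduced from the degrees of freedom and weighted sums of squares. The snippet is a simple arithmetic check with illustrative variable names; the P values are not recomputed.

```python
# (df, weighted SS) for each line of Table 4
terms = {
    "linear": (6, 7.386),
    "nonlinear spline trend": (9, 4.118),
}
resid_df, resid_ss = 330, 20.580

resid_ms = resid_ss / resid_df                       # residual mean square
mean_squares = {name: ss / df for name, (df, ss) in terms.items()}
f_ratios = {name: ms / resid_ms for name, ms in mean_squares.items()}

total_df = resid_df + sum(df for df, _ in terms.values())
total_ss = resid_ss + sum(ss for _, ss in terms.values())
```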

Table 5. Estimated Coefficients and Standard Errors for Balranald Salinity Model (16) With df = 10 and Autocorrelated Errors *ρ* = 0.527

| Term | Coefficient | Standard Error | t-Value |
|---|---|---|---|
| log-flow, *β*_{1} | −0.0262 | 0.0268 | −0.98 |
| sin, *β*_{2} | −0.1913 | 0.0232 | −8.23 |
| cos, *β*_{3} | 0.1851 | 0.0138 | 13.44 |
| sin × t_{adj}, *β*_{4} | −0.0058 | 0.0021 | −2.70 |
| cos × t_{adj}, *β*_{5} | 0.0135 | 0.0017 | 7.86 |
| linear trend, *β*_{6} | −0.0104 | 0.0014 | −7.44 |

[41] Figure 2 shows the estimated trend and seasonal effects superimposed on the adjusted data. The adjusted log(*EC*) is obtained by subtracting *β*_{1}log(*flow*). Residuals showed reasonably constant variance over time, though two high values occur in 1985 and three low values of adjusted log(*EC*) are to be seen in 2004 and have slightly depressed the trend for the adjacent data. Apart from these, the scatter in the data is fairly constant. As the flow adjustment is small, the change in seasonal effect is not attributable to the change in seasonal flow. We have noted separately that the regression coefficients differ very little from those in Table 5 if df = 7, and indeed were found to be fairly insensitive to a wider range of trend smoothing.

[42] For any site where the nonlinear component of the trend is found to be significant, the linear component is an insufficient summary of the nature of the trend. In Table 4 the F-value for the nonlinear spline trend is highly significant. This does not imply that the estimate of the linear trend in Table 5 is invalid, but does suggest that other features could be equally important. The coefficient of the linear component of trend in adjusted log(*EC*) was −0.0104, which corresponded to a decrease of 1.05% p.a. on the natural scale. This was statistically highly significant (t = −7.43). However, this number gives quite the wrong impression, for the trend in Figure 2 is markedly nonlinear; it is shown in more detail in Figure 3 on the natural scale. We first visited the Balranald data for 1966 to 1995 [cf. *Jolly et al.*, 2001]. Without the subsequent data, the estimated linear trend was significantly positive, but the spline curve had already detected the downturn. This reinforces the point that a linear trend can be an inadequate summary and a poor predictor when nonlinear trends occur; it is sensitive to the start and the end of the sequence.

[43] In Figure 3, the curve and 95% confidence bands on the log scale have been transformed back to the natural scale. For a more familiar interpretation, the trend curve has been scaled to have the same mean as the original EC. The spline trend curve shows a marked increase in salinity from mid-1983. This occurred after a severe drought broke in April and May 1983 and reflects the increased salinization in the upper tributaries. This rise ceased around the start of 1985. A steady decline started around 2001, coinciding with the start of another drought, and the adjusted EC level has since dropped below its 1978 level. This observed decrease in salinity is driven by reduced saline inflows to the system from the saline tributaries, reduced irrigation drainage, and reduced groundwater levels proximal to the river and hence reduced groundwater inflow to the river.
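
The back-transformation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the input values and the choice to apply one common scale factor to both the curve and its band are assumptions made for the example.

```python
import numpy as np

def back_transform_trend(log_trend, log_se, ec_mean, z=1.96):
    """Move a log-scale trend and its pointwise 95% band to the natural scale.

    The curve is rescaled so that its mean equals ec_mean (assumed here to be
    the mean of the original EC series); the same factor is applied to the band.
    """
    log_trend = np.asarray(log_trend, dtype=float)
    log_se = np.asarray(log_se, dtype=float)
    trend = np.exp(log_trend)
    lower = np.exp(log_trend - z * log_se)
    upper = np.exp(log_trend + z * log_se)
    scale = ec_mean / trend.mean()
    return scale * trend, scale * lower, scale * upper

# Made-up values for illustration only; they are not taken from the case study.
log_trend = np.array([-0.10, 0.00, 0.15, 0.05])
log_se = np.array([0.04, 0.03, 0.03, 0.05])
trend, lower, upper = back_transform_trend(log_trend, log_se, ec_mean=250.0)
```

Because the exponential is monotone, the band limits keep their order after transformation, but the band is no longer symmetric about the curve on the natural scale.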

[44] If df = 7 is chosen, the trend curve is smoother (Figure 4). This suggests that the rise may have continued upwards more slowly past 1985 to about 1990. The smaller and more local peaks and troughs in Figure 3 between 1985 and 2001 may or may not have an interpretation, but whether Figure 3 or Figure 4 is preferred, the big features are unmistakable.

[45] If the data are sufficiently informative, it is clear that more complex models could be analyzed. Variates such as log-flow could be lagged as in transfer models, seasonal effects could include two-cycle sinusoid terms, management or remediation measures could be represented, etc. It is possible to model the coefficients as varying smoothly over time, as proposed by *Hastie and Tibshirani* [1993]. However, if the response to flow is not constant (for example, *β*_{1} might vary with time as a result of the building of a weir), then it is not clear what the appropriate method for correcting the trend in water quality for flow effects should be, and the solution may have to be specific to the circumstances. In their analyses of log-sulphate concentration, *Miller and Hirst* [1998] and *Hirst* [1998] use product terms of log-flow by sin and cos, with smoothly varying coefficients over time, but assume uncorrelated errors. The model (16) is similar to that of *Potts et al.* [2003] for nitrate concentration, except that they use time-varying spline coefficients for log-flow and season; the errors are AR(1). They base their inference on bootstrapping.

### 8. Discussion

[46] Water quality data are often highly variable. This is compounded by the fact that they can be subject to errors that result from sampling anomalies, contamination, laboratory imprecision, transcription and data storage issues. Many of these anomalies present themselves as “outliers” and typically are identified by large standardized regression residuals. We have generally not noticed that outliers seriously distort the trend estimate unless they occur at the extremes of the period. The main advantage of giving outliers less weight in the analysis is in the reduction of the residual variance and consequently the narrower confidence intervals. If observations are apparently aberrant, they should be inspected and considered for exclusion from the analysis, whatever trend method is used.
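
As a rough illustration of how aberrant observations might be flagged for inspection, the sketch below standardizes residuals using a robust scale (the median absolute deviation) and marks those beyond a threshold. The MAD scaling, the threshold of 3 and the residual values are assumptions for the example; the text refers only to large standardized regression residuals and does not prescribe this rule.

```python
import numpy as np

def flag_outliers(residuals, threshold=3.0):
    """Standardize residuals with a MAD-based scale and flag large ones."""
    r = np.asarray(residuals, dtype=float)
    centre = np.median(r)
    mad = np.median(np.abs(r - centre))
    scale = 1.4826 * mad  # consistent with the standard deviation for normal data
    if scale == 0.0:
        raise ValueError("MAD is zero; residuals are too concentrated to standardize")
    standardized = (r - centre) / scale
    return standardized, np.abs(standardized) > threshold

# Hypothetical regression residuals; the sixth value is deliberately aberrant.
residuals = [0.10, -0.20, 0.05, -0.10, 0.15, 2.50, -0.05, 0.00]
standardized, flags = flag_outliers(residuals)
```

Flagged points would then be inspected, and excluded or downweighted only if there is reason to regard them as erroneous.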

[47] It is always important to consider the frequency of observations required in relation to what we are trying to detect. For instance, if we are looking for trends or changes over several seasons, there is substantial redundancy in data collected daily or to the nearest minute. Averaging over a wider time window will have a negligible effect on the information content but may make the interpretation easier and reduce the computational burden. The resulting means lessen the need to model the autocorrelation in more detail, yet are frequent enough to identify seasonal effects. However, to determine rapid changes in water quality, for example those due to rainfall events, a daily or hourly timescale would be needed.
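
Averaging to a wider time window, as described above, can be sketched with the standard library alone. The function name and the synthetic daily series are hypothetical; calendar months are used as the window because the case study works with monthly data.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

def monthly_means(dates, values):
    """Average readings into one value per calendar month, keyed by (year, month)."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[(d.year, d.month)].append(v)
    return {key: mean(vals) for key, vals in sorted(groups.items())}

# Hypothetical daily series: 90 days from 1 January 2001 (illustrative values only).
start = date(2001, 1, 1)
dates = [start + timedelta(days=i) for i in range(90)]
values = [5.0 + 0.01 * i for i in range(90)]
means = monthly_means(dates, values)
```

The same grouping works for irregular sampling dates, although months with very few readings will then have noisier means than months with many.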

[48] As was noted in the Balranald example, the sampling frequency changed over time. The variance and autocorrelation of the residuals of the monthly means might well have changed during the period. Changes in the monitoring procedure could in principle be incorporated into a GAM. This would be very difficult to tackle with any generality, and it is beyond the scope of this paper, which has introduced this case study as an example to illustrate some of the possibilities. The point is that the GAM set-up has a much greater capacity to increase the complexity of the model than have nonparametric or robust methods.

[49] The appropriate length of the monitoring data sequence will depend on the need and ability to separate effects that are contributing to the data observed at different scales. When strong seasonal components are present, several years are often needed so as to adequately separate the broader general trend from the shorter term seasonal variation. Longer sequences are also generally needed when there is substantial serial correlation because it reduces the effective sample size.
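
The loss of information caused by serial correlation can be gauged with a common rule of thumb for an AR(1) series, n_eff ≈ n(1 − ρ)/(1 + ρ). This is a standard large-sample approximation for the variance of a sample mean, not a result derived in this paper, and it is only a rough guide for trend estimation. The series length below is hypothetical.

```python
def effective_sample_size(n, rho):
    """Approximate effective sample size for the mean of an AR(1) series."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)

n_months = 360  # hypothetical: 30 years of monthly values
sizes = {rho: effective_sample_size(n_months, rho) for rho in (0.0, 0.3, 0.5, 0.7)}
for rho, n_eff in sizes.items():
    print(f"rho = {rho:.1f}: effective n is about {n_eff:.0f}")
```

With a first-order autocorrelation of 0.5, the level mentioned in the abstract for the monthly data, the approximation gives an effective sample size of about one third of the nominal length.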

[50] Irregularly spaced or missing data are naturally accounted for within the GAM methodology, though irregularly spaced data usually mean that we have to adjust or average the data to a regular time interval, or model the serial correlation using a continuous AR(1) process. If the proportion of missing data is large, it may not be possible to estimate the autocorrelation parameter. If the missing data occur in large blocks, the trend during that period will be poorly estimated.
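
For a continuous AR(1) process, the correlation between two errors depends only on the time separating them: corr(e_i, e_j) = φ^|t_i − t_j|. The sketch below builds that correlation matrix for irregular sampling times. The function name, the sampling times and the value of φ are assumptions for illustration; estimating φ from data is not shown.

```python
import numpy as np

def car1_correlation(times, phi):
    """Correlation matrix of a continuous AR(1) process observed at given times."""
    times = np.asarray(times, dtype=float)
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must lie in [0, 1)")
    lags = np.abs(times[:, None] - times[None, :])  # pairwise time separations
    return phi ** lags

# Hypothetical sampling times in months, with a gap and an off-grid observation.
times = np.array([0.0, 1.0, 2.0, 4.5, 5.0])
R = car1_correlation(times, phi=0.5)
```

On a regular monthly grid this reduces to the usual discrete AR(1) structure, with φ equal to the lag-one autocorrelation.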

[51] The choice of the smoothing parameter *λ* is problematic. *Hirst* [1998] recommends 1 degree of freedom for each year, but that would clearly be too much for our example. Data-driven methods such as cross-validation or AIC tend to undersmooth, particularly where positive autocorrelation is present. If automatic methods are used, any inference should take the selection process into account, but in practice this is often ignored. We advocate that the degree of smoothing should depend upon the purpose of the enquiry: at what scale are the phenomena of interest, and which features are interpretable? This depends on knowledge of floods, droughts, interventions or changes in land management. Once these have been identified, additional regression terms may be included in the model and a smoother residual trend may be satisfactory. For prediction beyond the end of the series, a fairly smooth curve is required. *Morton et al.* [2008] investigate the choice of smoothing parameter on that basis.
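
The link between *λ* and the degrees of freedom of a linear smoother can be illustrated with a simple stand-in. The sketch below uses a discrete second-difference penalty (a Whittaker-type smoother) rather than the cubic smoothing spline of the paper, takes the effective degrees of freedom as the trace of the smoother matrix, and searches for the *λ* that gives a chosen df. The series length and target values are hypothetical.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Smoother matrix S = (I + lam * D'D)^(-1) with D the second-difference operator."""
    D = np.diff(np.eye(n), n=2, axis=0)  # shape (n - 2, n)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def effective_df(n, lam):
    """Effective degrees of freedom, taken as the trace of the smoother matrix."""
    return float(np.trace(smoother_matrix(n, lam)))

def lambda_for_df(n, target_df, lo=1e-6, hi=1e10, iters=60):
    """Bisect on log(lambda); df falls monotonically from n towards 2 as lambda grows."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if effective_df(n, mid) > target_df:
            lo = mid  # still too rough: increase the penalty
        else:
            hi = mid
    return float(np.sqrt(lo * hi))

n = 120                                 # hypothetical: ten years of monthly values
lam_smooth = lambda_for_df(n, 7.0)      # smoother curve
lam_rough = lambda_for_df(n, 12.0)      # more flexible curve
```

Fixing df in this way sets the smoothness in advance from the purpose of the enquiry, rather than leaving it to a data-driven criterion.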

[52] The use of a smoothing spline to estimate the trend is effective when the underlying trend is indeed smooth. It will however fail to pick up rapid changes in slope, and always underestimates the peak of a local maximum or the trough of a local minimum. At an extreme of the period, the spline is straight and may be useful for short term prediction, but it would not be sensible to predict the future very far by extending a spline curve that has recently changed direction. Indeed, the degree of nonlinearity of the trend curves means that it is unreliable to predict the future without some hydrological insight as to why the peaks and troughs have occurred.

[53] GAMs have many advantages for analyzing trends. All the facilities that are available in regression modeling can be incorporated to represent any desired complexity. There is great flexibility in determining the shape of the trend and its confidence bands.