Keywords:

  • Robust regression;
  • Seasonal Kendall's slope;
  • smoothing splines;
  • time series;
  • water quality trend

Abstract

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. Seasonal Kendall's Slope
  5. 3. An Improved Approach Through GAMs
  6. 4. Inference for GAMs With Correlated Errors
  7. 5. Robust Trend Estimation
  8. 6. Comparison of Estimation Methods
  9. 7. Case Study: Salinity Trend at Balranald
  10. 8. Discussion
  11. Appendix A: Weighted Analysis of Variance
  12. Acknowledgments
  13. References
  14. Supporting Information

[1] This paper advocates the use of Generalized Additive Models (GAMs) for the estimation of nonlinear trends in water quality in the presence of serially correlated errors. The GAM methodology is applicable to a range of physical, chemical and biological water quality variates. Comparison with the estimate based on Seasonal Kendall's Slope and robust regression is discussed. An example is given concerning log-transformed stream electrical conductivity, which is adjusted for flow. The monthly data have first order autocorrelation exceeding 0.5 and the trend is markedly nonlinear. Seasonal effects are shown to have been changing over time.

1. Introduction


[2] Trends over time in meaningful physical, chemical and biological water quality variates may usefully inform the decisions of water managers. They identify important changes in the river health and allow the effectiveness of key management interventions to be assessed. It is vital that there are techniques available that adequately address important features of the monitoring data in determining any trend. In this paper we focus on the need for methods that accommodate the following.

[3] • nonlinear trends.

[4] • serial correlation.

[5] • adjustment for season and covariate effects such as flow.

[6] Since water quality data are often not Normally distributed, Hirsch et al. [1982] argue for the use of nonparametric methods. For the detection of trends, they proposed a variation of Kendall's Tau that eliminates the constant seasonal effects. Seasonal Kendall's Tau (SKT) has become a very popular method of trend detection for monotonic trends. The test statistic can be significant if the change in the water quality between any two time points tends to be in the same direction (increasing or decreasing). Much effort has been made to extend the methodology. The technique has been modified to cope with serial correlation [Hirsch and Slack, 1984] and with allowance for covariates [Alley, 1988; Libiseller and Grimvall, 2002 and references therein]. There is a price to pay for adopting a nonparametric approach as some statistical power is lost when it is modified to allow for serial correlation, and methods for handling covariates are not entirely satisfactory. Among the reviews of trend analyses are Hirsch et al. [1991], Esterby [1996] and Dixon and Chiswell [1996]. In all of these, much of the discussion concerns nonparametric tests and their modifications.

[7] Trend estimation is distinct from trend detection in that the objective is to quantify the changes and investigate models that provide interpretation as to the processes possibly causing them. For our purpose, we consider trend as being the progress over time in the mean (arithmetic or geometric) of a variable of interest. Kendall's Tau may also be presented as an estimate of linear trend [Theil, 1950; Sen, 1968], and the same relationship links SKT to the Seasonal Kendall's Slope (SKS) for the seasonal modification [Hirsch et al., 1982]. For series of sufficient length, the trend can be nonlinear or even nonmonotone, in which case SKS would be inappropriate. An extension of Kendall's Tau to detect nonlinearity of a trend is given by El-Shaarawi and Niculescu [1993]. It assumes independent and identically distributed errors and is based on Kendall's Tau applied to first differences of the data. This would provide an estimate of trend linear in first differences but would lose sensitivity if the differences were not monotonic. We consider other approaches based on estimation of a more general trend that also incorporates serially correlated errors.

[8] Water quality variates often change with discharge, season, management interventions and other anthropogenic activities. The adjustment of the water quality trend for these sources of variation is often desirable for it reveals the underlying trend not attributable to known causes. These sources of variation are represented by covariate terms in a regression model. Seasonal effects may be represented by sinusoidal terms or by a seasonally dependent intercept (factor). Step changes in the trend and covariates may be included in the regression model.

[9] We first discuss the pros and cons of Seasonal Kendall's Slope. We then demonstrate the use of Generalized Additive Models (GAMs). GAMs have been used extensively in the analysis of air pollution; “nearly ubiquitous” according to the Health Effects Institute [2003], and “a standard tool in time series studies of air pollution and health in the past decade” [He et al., 2006]. We contend that GAMs should be more frequently used in the analysis of water quality trends. They have been used to analyze numerous sequences of salinity data in Australia where trends are often nonlinear, [e.g., Nathan et al., 1999; Jolly et al., 2001], and for streamflow [Letcher et al., 2001], but less frequently elsewhere. GAMs allow the nonlinearity of a trend to be of arbitrary shape. Examples of nonlinear trends occur in the work of Brillinger [1994], Miller and Hirst [1998], Schreider et al. [2002], Cox et al. [2005], Giannitrapani et al. [2005], and McMullan et al. [2007]. With GAMs there is great scope for more complex modeling. For example, the parameters in the regression could change over time, and we estimate a changing seasonal effect in the example below. We show how GAMs can be analyzed in the presence of serial correlation. As a rule of thumb, first order autocorrelation exceeding 0.2 can, if ignored, seriously distort the inference about the trend by leading to liberal tests and confidence intervals that are too narrow [Hirsch and Slack, 1984]. In our Australian experience from two large studies, 80 out of 117 sites had first order autocorrelation ρ ≥ 0.2 for log electrical conductivity (EC) in μS/cm. For a smaller set of 26 sites, 6 of the 8 other chemical log-variates analyzed had ρ ≥ 0.2 for at least 20 sites. Autocorrelation was consistently high for log-counts of 16 phytoplankton species at 9 sites, where 94 of the 139 species by site combinations analyzed had ρ ≥ 0.4. 
This paper gives a brief discussion of robust methods because they may be considered as being intermediate between SKS and the GAM methodology. A comparison of the three broad approaches to trend analysis is provided.

[10] For simplicity of exposition, we shall assume that monthly data are available. If the data are collected at a higher resolution than monthly, we assume that they have been reduced to monthly values by taking a mean or median, and that the reduced data are visualized as being arranged in yearly rows and monthly columns, January to December. Note that the methodologies described in this paper readily extend to data summarized at any frequency, e.g., weekly, monthly or quarterly; in many papers, these are referred to as “seasons”.

2. Seasonal Kendall's Slope


[11] Nonparametric estimation methods are widely used in water resources and refer to methods that do not require an explicit assumption of a distribution. This is often seen to add robustness and extend the validity of the inference. For instance, nonparametric methods may be used to estimate the linear trend parameter so that the inference is valid without the assumption that the data are Normally distributed. While nonparametric estimation methods alleviate the need for specific distributional assumptions, outliers should still be examined and a decision made as to whether they should be omitted before assessing the underlying trend.

[12] We discuss only SKS, as this is the most commonly used nonparametric method for estimating trends in water quality. Consider a response $y_{ij}$ in the $i$-th year, $j$-th month and define $\mathrm{time}_{ij} = i - 1 + j/12$. The SKT test assumes the $y_{ij}$ are independent and identically distributed, and is based on the statistic $S = \sum_{j=1}^{12} \sum_{i<i'} \operatorname{sgn}\{y_{i'j} - y_{ij}\}$, where $\operatorname{sgn}\{a\}$ is −1, 0 or 1 according as $a$ is <0, 0 or >0.

[13] Under the extra assumption of linear trend, where

  $$y_{ij} = \alpha_j + \gamma\,\mathrm{time}_{ij} + \varepsilon_{ij} \tag{1}$$

and the errors ɛij are assumed to be independent and identically distributed so that any permutation of the yearly rows is equally likely, we may estimate the magnitude of the trend. The SKS linear trend estimate as given by Hirsch et al. [1982] is

  $$\hat{\gamma} = \operatorname{median}_{j;\; i<i'} \left\{ \frac{y_{i'j} - y_{ij}}{i' - i} \right\} \tag{2}$$

[14] The SKT and SKS are mathematically connected. Consider the function

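  $$S(\gamma) = \sum_{j=1}^{12} \sum_{i<i'} \operatorname{sgn}\left\{ (y_{i'j} - \gamma\,\mathrm{time}_{i'j}) - (y_{ij} - \gamma\,\mathrm{time}_{ij}) \right\} \tag{3}$$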

and note that the SKT statistic equals $S(0)$ and the SKS estimate $\hat{\gamma}$ is the solution of $S(\gamma) = 0$. Under the assumption of a linear trend, confidence intervals for $\gamma$ can be obtained from inverting the SKT applied to $y_{ij} - \gamma\,\mathrm{time}_{ij}$.

[15] Thus SKS is the estimate of linear trend due to Theil [1950] and Sen [1968] modified by removing seasonal effects. Note that the SKT test is invariant under monotone transformation of y, but that the SKS estimate is not.
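The connection between the SKT statistic and the SKS estimate can be illustrated numerically. The following is a minimal sketch, assuming the data are held in a years by months array with NaN for missing values; the function name `seasonal_kendall` and the simulated data are illustrative and not taken from the paper.

```python
import numpy as np

def seasonal_kendall(y):
    """Seasonal Kendall statistic and slope for a (years x seasons) array.

    Only pairs of years within the same season (column) are compared,
    which is what removes the constant seasonal effects.
    """
    y = np.asarray(y, dtype=float)
    n_years = y.shape[0]
    s_total = 0.0
    slopes = []
    for j in range(y.shape[1]):              # one season at a time
        for i in range(n_years - 1):
            for k in range(i + 1, n_years):
                d = y[k, j] - y[i, j]
                if np.isnan(d):
                    continue                 # skip pairs with a missing value
                s_total += np.sign(d)
                slopes.append(d / (k - i))   # same season, so the gap is k - i years
    return s_total, float(np.median(slopes))

# Illustrative data (not from the paper): 10 years, trend of 0.3 units per year
years = np.arange(10)[:, None]               # plays the role of i - 1
months = np.arange(1, 13)[None, :]
time = years + months / 12.0
season = 2.0 * np.sin(2 * np.pi * months / 12.0)
y_demo = 5.0 + season + 0.3 * time
S_demo, slope_demo = seasonal_kendall(y_demo)
```

With a strictly increasing trend every within-season pair contributes +1, so the statistic reaches its maximum, and the median of the pairwise slopes recovers the trend regardless of the seasonal pattern.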

[16] In the Hirsch and Slack [1984] treatment of SKT, serial correlation is allowed between months in the same row but not between rows, so that the permutation argument still holds. Any correlation between December one year and January the next year is ignored. The correlations between columns within each row are assumed constant and are estimated in nonparametric fashion. The test is not robust against highly persistent processes (ρ > 0.6) and some power is lost due to estimating binary correlations; a series of at least 10 years of monthly data is recommended by Hirsch and Slack [1984] and Harcum et al. [1992] before the advantage of using SKS over parametric estimation of linear trend would start to be realized. An alternative approach is to remove the autocorrelation by prewhitening. For this an estimate of ρ is required. Yue et al. [2002] recommended that where a linear trend exists, the required autocorrelation is estimated by eliminating the trend. The differences $y_{ij} - \rho\,y_{i,j-1}$ can then be analyzed by SKS with trend $\gamma(1 - \rho)$, from which $\gamma$ could be estimated. Wang and Swail [2001] propose an iterative prewhitening and trend estimation procedure (see their Appendix A). Zhang and Zwiers [2004] compare the performance of different prewhitening and trend estimation procedures, including the procedure of Wang and Swail. If the trend is nonlinear, the estimated autocorrelation could be misleading and the method unreliable. For longer series, it may be unreasonable to assume that the trend is linear, so we conclude that SKS is best for fairly short series where serial correlation may be assumed to be negligible.
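The reason the prewhitened series carries trend γ(1 − ρ) can be seen in one step. The following is a sketch, assuming a linear trend with seasonal intercepts and AR(1) errors; the single month index t, the season label j(t) and the white noise symbol η are notation introduced here for illustration only.

```latex
\begin{aligned}
y_t &= \alpha_{j(t)} + \gamma\,\mathrm{time}_t + \varepsilon_t,
\qquad \varepsilon_t = \rho\,\varepsilon_{t-1} + \eta_t,
\qquad \mathrm{time}_{t-1} = \mathrm{time}_t - \tfrac{1}{12}, \\
y_t - \rho\,y_{t-1}
  &= \bigl(\alpha_{j(t)} - \rho\,\alpha_{j(t-1)}\bigr)
     + \gamma\bigl(\mathrm{time}_t - \rho\,\mathrm{time}_{t-1}\bigr)
     + \bigl(\varepsilon_t - \rho\,\varepsilon_{t-1}\bigr) \\
  &= \bigl(\alpha_{j(t)} - \rho\,\alpha_{j(t-1)} + \gamma\rho/12\bigr)
     + \gamma(1-\rho)\,\mathrm{time}_t + \eta_t .
\end{aligned}
```

The first bracket depends only on the month, so it acts as a new seasonal intercept; the slope is γ(1 − ρ) and the errors η are uncorrelated, which is the setting SKS requires.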

[17] Water quality often comes with a number of covariates and we now introduce them into the model. Let xijk be the value of the k-th covariate at timeij. For a linear time trend, the regression model with covariates is

  $$y_{ij} = \sum_k \beta_k x_{ijk} + \alpha_j + \gamma\,\mathrm{time}_{ij} + \varepsilon_{ij} \tag{4}$$

where $\varepsilon_{ij}$ are residual errors. Seasonal effects $\alpha_j$ form a factor at 12 levels for monthly data. Excluding seasonal effects, correction for covariates expressed in the first term may be estimated by Ordinary Least Squares (OLS) regression, but with $\gamma = 0$. Having adjusted for the covariates, the residuals are then subjected to a SKS analysis. With this approach, any trends in significant regressors may distort the trend in the response and so reduce the power. Ideally, the effects of the covariates and the response trend should be estimated simultaneously, but this upsets the permutation argument. Another possibility is to consider columns for every combination of month and covariate, then reduce each column to its SKS estimate. The idea of Libiseller and Grimvall [2002] is to treat the vector of these estimates as multivariate Normal. However, the conditional distribution of the response estimate given the covariate estimates does not adjust the original response data for the covariates in the same way as in a linear regression.

3. An Improved Approach Through GAMs


[18] Correlated errors are most easily handled through standard time series models. Hipel and McLeod [1994] give a very full account of time series applications to water resources, but do not include semiparametric models. Brillinger [1994] includes semiparametric models and point processes. Nonlinear trends could be represented by g(t) as a polynomial in time t in model (5) and its coefficients estimated accordingly. However, polynomials do not have straight segments and tend to be most curved and prone to high variation at the extremes of the period, making them unsuitable for prediction. Furthermore, by their global nature, the estimate of a polynomial trend at any one point can be greatly influenced by data from distant points in time. These comments indicate that there is a need for the more flexible curves that we consider next.

[19] Semiparametric models are regression models where some of the covariates occur as terms in a parametric linear model and others are represented by arbitrary smooth functions that are nonparametric. When the terms are additive, they are known as Generalized Additive Models (GAMs). GAMs with correlated errors are commonly used for analyzing repeated measures data. There is a vast literature on repeated measures, but much of the methodology utilizes the replication due to the existence of parallel time series to estimate the correlation [cf. Diggle et al., 1994]. This feature is generally not available for water quality data, and the correlation model must be estimated from within the single series.

[20] Software for fitting GAMs with correlated errors is not available in the major statistical packages as far as we are aware. We show how this can be done in the simplest case where there is one nonlinear term. Where the effects of another regression term are substantial, it is possible to fit it as another nonlinear term. We have routinely fitted log(flow) as a spline to other data although the nonlinearity was seldom very great. This has not been done here in order to keep the exposition simple. The backfitting algorithm below may easily be extended, but explicit algebraic form for the weighted Anova is more involved.

[21] For the $i$-th observation, $y_i = x_i^T\beta + g(t_i) + \varepsilon_i$ ($i = 1, \ldots, n$), where $x_i$ is a vector of covariates which includes seasonal terms and $g(t_i)$ is the underlying smooth trend at time $t_i$ after allowing for those covariate effects. Note that we are now representing time by $t_i$, where $i$ is an observation index, because GAMs can analyze data with irregular times. There is no longer any need to have suffix $i$ for years and $j$ for months as in SKS. In vector notation, and across all time points, this model is written as

  $$y = X\beta + g + \varepsilon \tag{5}$$

where the errors are assumed to have the correlation matrix $V$. So $\operatorname{var}(\varepsilon) = \sigma^2 V$, with $\sigma^2$ to be estimated from the weighted residual sum of squares. Given the trend, Weighted Least Squares (WLS) would be used to estimate $\beta$. There are various methods that may be used to estimate $g$. Our preference is to use penalized smoothing splines. Locally weighted polynomial regression (“loess”) is sometimes used in the analysis of water quality to present a graph of the trend, but has not been commonly used in conjunction with regression. Loess or kernel smoothers have the disadvantage that the window width for the smoother needs to vary according to the autocorrelation in $\varepsilon$; with positive autocorrelation, the windows should be wider. Penalty methods are preferred because the same penalty criterion should apply whatever $V$ is. The penalty is almost invariably chosen to have a quadratic form so that $\beta$ and $g$ are estimated by minimizing the penalized sum of squares

  $$(y - X\beta - g)^T V^{-1} (y - X\beta - g) + \lambda\, g^T K g \tag{6}$$

where the first term is the weighted sum of squared errors and the second term is the penalty for a matrix K that is nonnegative definite and symmetric. The smoothing parameter λ is to be chosen and governs the trade-off between the fidelity to the observed data and the smoothness of the curve. A large value of λ makes g(t) very smooth, almost linear, while a small value allows g(t) to be rough. Choice of λ depends on the amount of detail in the trend that is of interest. λ may be chosen automatically by cross-validation or by Akaike's information criterion (AIC), though these methods tend to lead to curves with substantial undersmoothing, particularly when positive autocorrelation is present. Diggle and Hutchinson [1989] and Opsomer et al. [2001] exhibit this phenomenon and explain why it is so. They also review kernel smoothers, splines and wavelets under correlation.

[22] There are many kinds of penalized smoothers. We focus on smoothing splines. The construction of K, as given by Green and Silverman [1994], is

  $$K = Q R^{-1} Q^T \tag{7}$$

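where $Q_{n\times(n-2)}$ and $R_{(n-2)\times(n-2)}$ are tri-diagonal matrices as follows. Let $h_i = t_{i+1} - t_i$. When the index runs over only $n-2$ values, we index it from 2 to $n-1$. Then, for $i = 1, \ldots, n$, $j = 2, \ldots, n-1$,

  $$q_{j-1,j} = h_{j-1}^{-1}, \quad q_{jj} = -h_{j-1}^{-1} - h_j^{-1}, \quad q_{j+1,j} = h_j^{-1}, \quad q_{ij} = 0 \ \text{for}\ |i-j| \ge 2 \tag{8}$$

For $i = 2, \ldots, n-1$, $j = 2, \ldots, n-1$,

  $$r_{ii} = \tfrac{1}{3}(h_{i-1} + h_i), \quad r_{i,i+1} = r_{i+1,i} = \tfrac{1}{6} h_i, \quad r_{ij} = 0 \ \text{for}\ |i-j| \ge 2 \tag{9}$$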

Define the matrix $S = (I + \lambda V K)^{-1}$. Then minimization of (6) leads to the pair of equations to be solved for the estimators

  $$\hat{g} = S\,(y - X\hat{\beta}) \tag{10}$$
  $$\hat{\beta} = G\,(y - \hat{g}) \tag{11}$$

where $G = (X^T V^{-1} X)^{-1} X^T V^{-1}$.
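The construction of the penalty matrix K from the band matrices Q and R can be sketched in a few lines. This is a minimal sketch following the definitions in Green and Silverman [1994]; the function name `spline_penalty` and the example times are assumptions made for illustration, and dense matrices are used for clarity rather than efficiency.

```python
import numpy as np

def spline_penalty(t):
    """Return K = Q R^{-1} Q^T for strictly increasing times t (length n >= 3)."""
    t = np.asarray(t, dtype=float)
    n = t.size
    h = np.diff(t)                           # h[i] = t[i+1] - t[i]
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for c in range(n - 2):                   # column c belongs to interior point c + 1
        Q[c, c] = 1.0 / h[c]
        Q[c + 1, c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2, c] = 1.0 / h[c + 1]
        R[c, c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[c + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)

# Irregularly spaced example times (illustrative only)
t_demo = np.array([0.0, 1.0, 2.5, 3.0, 4.2, 5.0])
K_demo = spline_penalty(t_demo)
```

A useful check is that K annihilates constants and straight lines, so a linear trend incurs no penalty; this is why a large λ drives g(t) toward a straight line rather than toward zero.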

[23] Solution of equations (10) and (11) may be achieved by iterative substitution, also known as backfitting, as discussed by Hastie and Tibshirani [1990] and adopted by standard software for the case of uncorrelated errors. Start with g = 0, solve (11) for β; with this value β, solve (10) for g and repeat the process until the estimates converge.
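The backfitting iteration just described can be sketched as follows. This is a minimal sketch under stated assumptions: V is taken as known, the data are simulated rather than real, and a squared second-difference matrix is used as a stand-in for the spline penalty K (any symmetric nonnegative definite K fits the same algebra). The function name `backfit` is hypothetical.

```python
import numpy as np

def backfit(y, X, V, K, lam, tol=1e-10, max_iter=500):
    """Alternate between the WLS step for beta and the smoothing step for g."""
    n = y.size
    Vinv = np.linalg.inv(V)
    S = np.linalg.inv(np.eye(n) + lam * V @ K)          # smoother matrix
    G = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)     # WLS operator
    g = np.zeros(n)                                     # start with g = 0
    for _ in range(max_iter):
        beta = G @ (y - g)                  # regression step given the trend
        g_new = S @ (y - X @ beta)          # smoothing step given the coefficients
        converged = np.max(np.abs(g_new - g)) < tol
        g = g_new
        if converged:
            break
    return G @ (y - g), g

# Simulated monthly series (illustrative, not the Balranald data)
rng = np.random.default_rng(0)
n_demo = 120
idx = np.arange(n_demo)
t_demo = idx / 12.0
X_demo = np.column_stack([np.sin(2 * np.pi * t_demo), np.cos(2 * np.pi * t_demo)])
y_demo = 0.5 * X_demo[:, 0] + 0.05 * (t_demo - 5.0) ** 2 + 0.1 * rng.standard_normal(n_demo)

V_demo = 0.5 ** np.abs(idx[:, None] - idx[None, :])     # AR(1) correlation, rho = 0.5
D = np.diff(np.eye(n_demo), n=2, axis=0)                # second-difference operator
K_demo = D.T @ D                                        # stand-in penalty (assumption)
lam_demo = 100.0
beta_hat, g_hat = backfit(y_demo, X_demo, V_demo, K_demo, lam_demo)
```

Note that X here contains no constant or linear time column; the intercept and linear component are carried by g, which avoids the trend and the regression terms competing for the same component.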

[24] It is common to assume that the residuals are from a stationary autoregressive moving average (ARMA) process of low order. Often an autoregressive process of order 1 works well, and has the attraction of the dependence being Markovian, i.e., conditional on the previous month, a residual is independent of the earlier past. For estimation, one may first take V = I, the identity matrix, in order to obtain residuals by fitting X; from these one can get a reasonable estimate of V as in Diggle and Hutchinson [1989]. The process may be refined by iteratively refitting X by WLS and updating V. As with WLS, the minimization of (6) does not imply that we are assuming that the distribution of the residuals is multivariate Normal. The essential assumption is that the mean and variance specifications are true.
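For an AR(1) process on equally spaced, complete monthly data, the correlation matrix V has entries ρ raised to the absolute lag. The sketch below uses the simple lag-1 sample autocorrelation of the residuals as the estimate of ρ; this is a swapped-in illustrative estimator, not necessarily the procedure of Diggle and Hutchinson [1989], and the function names and simulated residuals are assumptions.

```python
import numpy as np

def ar1_correlation(n, rho):
    """AR(1) correlation matrix with entries rho ** |i - j| (equal spacing assumed)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def lag1_autocorrelation(resid):
    """Sample lag-1 autocorrelation of a residual series."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    return float(np.sum(r[1:] * r[:-1]) / np.sum(r * r))

# Simulated AR(1) residuals with known rho (illustrative only)
rng = np.random.default_rng(42)
n_demo, rho_true = 2000, 0.6
shocks = rng.standard_normal(n_demo)
resid_demo = np.empty(n_demo)
resid_demo[0] = shocks[0]
for t in range(1, n_demo):
    resid_demo[t] = rho_true * resid_demo[t - 1] + shocks[t]

rho_hat = lag1_autocorrelation(resid_demo)
V_hat = ar1_correlation(5, rho_hat)
```

In an iterative scheme, residuals from a fit with V = I would supply the first estimate of ρ, the model would be refitted with the implied V, and the two steps repeated until ρ stabilizes.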

[25] For models where the errors are independent, an excellent account of GAMs is given by Hastie and Tibshirani [1990]. Nonnormal data are included in the set-up provided that the variance of $y$ has a known relationship to its mean. This could be used for data that are counts or where a variance-stabilizing transformation is unsuitable. Then $V$ is the diagonal matrix $V_0$ with elements $\{v(\mu_i)\}$. Correlation can be introduced by assuming that the scaled data have correlation $W$, usually containing further parameters to be estimated. That is, $V = V_0^{1/2} W V_0^{1/2}$.

4. Inference for GAMs With Correlated Errors


[26] The solution of (10) and (11) may be written as

  $$\hat{g} = (I - SH)^{-1} S\,(I - H)\, y \tag{12}$$
  $$\hat{\beta} = G\,(y - \hat{g}) = G\,\{I - (I - SH)^{-1} S\,(I - H)\}\, y \tag{13}$$

where $H = XG$ is commonly known as the “hat” matrix in the absence of smooth terms. It was found that estimation of $g$ and $\beta$ using (12) and (13) was numerically unstable and did not agree precisely with estimates provided by standard GAM backfitting software in the case where the autocorrelation ρ in an AR(1) model was assumed to be zero. Consequently, backfitting was preferred for estimation, and convergence was achieved within 10 iterations. However, standard errors were calculated from (12) and (13) as these express the estimators as linear functions of $y$. Thus

  $$\operatorname{var}(\hat{g}) = \sigma^2 A V A^T, \qquad A = (I - SH)^{-1} S\,(I - H) \tag{14}$$

The square roots of the diagonal elements give the standard errors of the elements of $g$, which may be used to display confidence bands around the estimated trend curve. A test for the difference in the trend between two prescribed time points can be based on the t-statistic for the difference between $g$ at those points. Note that it is beneficial to retain any data outside the interval of interest because they are informative in “anchoring” the endpoints and help to estimate $\sigma^2 V$. Naturally, to avoid bias, the endpoints must not be selected with hindsight after seeing the results. A test of significance for the nonlinearity of the trend may be performed based on the weighted Anova, leading to an approximate F test (see Appendix). Similarly, standard errors and t-statistics for the regression coefficients may be calculated.
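The standard error calculation can be sketched once the trend estimate is written as a linear function A y of the data. In the sketch below, A is obtained by eliminating β from the two estimating equations, giving A = (I − SH)⁻¹ S (I − H); this algebraic form is derived here and may differ in presentation from the paper's own expression. The value of σ² is supplied by the user, a squared second-difference matrix stands in for the spline penalty, and all names are hypothetical.

```python
import numpy as np

def trend_covariance(X, V, K, lam, sigma2):
    """Return A (with g_hat = A y) and the covariance sigma2 * A V A^T of g_hat."""
    n = V.shape[0]
    I = np.eye(n)
    Vinv = np.linalg.inv(V)
    S = np.linalg.inv(I + lam * V @ K)                      # smoother matrix
    H = X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)     # hat matrix H = X G
    A = np.linalg.solve(I - S @ H, S @ (I - H))
    return A, sigma2 * A @ V @ A.T

# Illustrative set-up (not the Balranald data)
n_demo = 96
idx = np.arange(n_demo)
t_demo = idx / 12.0
X_demo = np.column_stack([np.sin(2 * np.pi * t_demo), np.cos(2 * np.pi * t_demo)])
V_demo = 0.5 ** np.abs(idx[:, None] - idx[None, :])         # AR(1), rho = 0.5
D = np.diff(np.eye(n_demo), n=2, axis=0)
K_demo = D.T @ D                                            # stand-in penalty (assumption)
A_demo, cov_demo = trend_covariance(X_demo, V_demo, K_demo, lam=100.0, sigma2=0.01)

se_demo = np.sqrt(np.diag(cov_demo))                        # pointwise standard errors

# Standard error of the difference in trend between two prescribed time points
a, b = 12, 84
se_diff = np.sqrt(cov_demo[a, a] + cov_demo[b, b] - 2.0 * cov_demo[a, b])
```

The difference between the trend at two time points uses the off-diagonal covariance as well as the two variances; ignoring it would overstate the standard error when neighbouring trend values are positively correlated. The matrix I − SH becomes singular if X contains a column, such as a constant or linear time term, that the smoother reproduces exactly, which is one reason such columns are left to g in this sketch.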

5. Robust Trend Estimation


[27] SKS is advocated chiefly because it is robust, that is, estimation is not unduly influenced by observations that are outliers. Robust regression [Huber, 1981] refers to parameter estimation in linear regression models that has high efficiency when the errors have heavy tailed distributions. It is based on the minimization of $\sum_i \phi(\varepsilon_i)$ where $\phi$ is a known function. $\phi(u) = u^2$ gives the familiar least squares criterion. Robust methods, however, choose $\phi$ to increase less rapidly than the quadratic for large values. If $\phi(u) = |u|$, the method is known as median regression, and quantile regression is an extension of this. An attraction of using robust estimation, as with GAMs, is that the trend and the regression coefficients $\beta_k$ are estimated simultaneously since there is no need to separate season and trend from the covariates in the model.
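The effect of choosing a criterion that grows more slowly than the quadratic can be seen in a small example. The sketch below uses Huber's function (quadratic near zero, linear in the tails) fitted by iteratively reweighted least squares; this is one illustrative choice, and the tuning constant 1.345, the MAD scale estimate, the function name `huber_irls` and the data are all assumptions rather than anything specified in the paper.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber-type robust regression by iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]             # start from least squares
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745    # MAD scale estimate
        scale = max(scale, 1e-12)
        u = np.abs(r) / scale
        # weight 1 for small residuals, c/u for large ones
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-300))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

# Straight line with small wiggles and one gross outlier (illustrative data)
x_demo = np.arange(20.0)
X_demo = np.column_stack([np.ones_like(x_demo), x_demo])
y_demo = 2.0 + 0.5 * x_demo + 0.1 * np.sin(x_demo)
y_demo[-1] += 50.0                                          # outlier at the end of the series
beta_ols = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]
beta_huber = huber_irls(X_demo, y_demo)
```

An outlier at the end of a series has high leverage on a linear trend, so the least squares slope is pulled well away from 0.5, whereas the reweighted fit stays close to it.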

[28] Robust methods can accommodate a linear regression model satisfactorily and incorporate seasonal effects in the regression terms. One might think of robust regression as being intermediate between SKS and Normal regression. Robust methods have been used by Cox et al. [2005] to fit data with censoring and assuming independent errors.

[29] Where correlation is present, all robust methods experience difficulty in estimating serial dependence. Jung [1996] derives a method for performing median regression with autocorrelation. Autocorrelation between successive sgn(ɛi) can be estimated empirically. Koenker et al. [1994] consider penalized robust estimation that combines quantile regression with spline fitting for independent data. Autoregression could be added analogously to Jung, provided that one can surmount the computational obstacles and identify a numerical algorithm that performs satisfactorily. There could be a limitation if the sample size were insufficient to estimate the autocorrelation accurately enough.

6. Comparison of Estimation Methods


[30] The pros and cons of these three broad approaches to trend estimation are summarized in Table 1. There appears to be little advantage of SKS over median regression and it has the substantial disadvantage that trends need to be linear. Polynomial trends can be used with robust methods, but can produce curves that are unreliable, particularly with high curvature at the extremes. GAM methods produce more reliable curves in this respect and have the advantage that the variance structure can be estimated more accurately.

Table 1. Pros and Cons of Different Methods to Trend Estimation (a)

| Feature | Seasonal Kendall's Slope | Robust Regression | Generalized Additive Models (GAMs) |
| --- | --- | --- | --- |
| Trend forms available | Linear only. | Linear, polynomial. | Linear, polynomial, spline. |
| Short-term prediction | Misleading if trend is not linear. | Extrapolation of a polynomial. | Spline better; it has straight extension. |
| Adjustment methods for x-variates | Parametric form; not estimated simultaneously to trend. | Parametric form of regression. | Parametric form or spline regression. |
| Varying seasonal effects | Not available. | Interaction term in a regression model. | Interaction term in a regression model. |
| Efficiency with heavy tailed errors | Good. | Good. | Some loss, unless errors are Normal. |
| Outliers | Implicitly down-weighted assuming symmetric distribution. Should still be examined. | Implicitly down-weighted assuming symmetric distribution. Should still be examined. | Analyst must examine them and deal with them directly. |
| Correlated errors | Some reduction in efficiency for nonparametric estimation. Prewhitening is an alternative. | Available for median or quantile regression; AR estimation may require a large sample size. | Good for linear or polynomial models, extends to GAMs. |
| Missing data, irregular spacing | Some loss of efficiency if many data missing. | OK. | OK. |
| Censored data | Linear trend OK. | Trend OK for median or quantile regression. | Must substitute values for censored data. |

(a) Along with splines, GAMs include linear, polynomial and interaction terms.

[31] Censored data (e.g., below the minimum level of detection) can be troublesome. If there are many censored data, the SKS estimate is slightly biased [Hirsch et al., 1991]. GAMs do not handle censored data easily, but one might take the view that inserting values for the censored data will not interfere with the estimation of trend when the water quality variate is high enough to be of concern for decision-making.

7. Case Study: Salinity Trend at Balranald


[32] The data set consists of salinity and flow measurements taken at Balranald (station code 410130), New South Wales, Australia over the period 1978-2006. Balranald is downstream of all the major irrigation areas on the Murrumbidgee River, and so trends in salinity are of interest because they may reflect changes in irrigation practices. Groundwater flows to the river could also be influenced by changes in groundwater extraction.

[33] Sampling was more or less weekly up to 1989 and thereafter with increasing frequency up to daily from 1994. Observations were reduced to monthly means of EC (electrical conductivity in μS/cm) and flow (discharge in ML per day), which were complete except for April and May in 1979. Medians were considered instead of means, but did not give an improved fit. The means were log-transformed. Natural logarithms were used throughout. Diagnostic plots of the residuals showed that there were 5 outliers. These could be deleted on the basis that they presumably were responses to temporary factors and not representative of the long-term trend we sought to estimate. A Q-Q plot of the remainder indicated that Normality was very satisfactory.

[34] For convenience of output, we redefined the timescale $\mathrm{time}_i = 1978 + t_i/12$ (month $t_i$ from the start). Season was represented as a single cycle sinusoid by defining regression terms $\sin_i = \sin(2\pi\,\mathrm{time}_i)$ and $\cos_i = \cos(2\pi\,\mathrm{time}_i)$ that capture the Autumn versus Spring and Summer versus Winter contrasts respectively. These allow an arbitrary sinusoidal season effect where the corresponding amplitude and phase reflect the magnitude and timing of the seasonal peak. We considered a possible interaction between seasonal effects and time, so multiplicative terms were included with time centered. The model fitted to log(flow) was

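  $$\log(\mathrm{flow}_i) = s(\mathrm{time}_i; df) + \alpha_1 \sin_i + \alpha_2 \cos_i + \alpha_3 \sin_i\, t_{adj,i} + \alpha_4 \cos_i\, t_{adj,i} + \varepsilon_i \tag{15}$$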

where $t_{adj,i} = \mathrm{time}_i - 1992.5$ and $s(\mathrm{time}_i; df)$ means a spline trend with formal degrees of freedom $df$ and includes the constant. Figure 1 shows the log-flow observations and the fitted values of this model. For the purpose of data summary, autocorrelation was not taken into account. It is noticeable that there was a large seasonal component at the start which diminished as time progressed. The cos component of season was negligible. From Table 2, the sin coefficients were $\alpha_1 = -0.6949$ and $\alpha_3 = 0.04744$, indicating a strong Spring minus Autumn contrast. At 1978, 14.5 years before the midpoint of the time period, the estimated sin effect was −0.6949 − 14.5 × 0.04744 = −1.383, so that Spring flow was about 4 times the geometric mean and Autumn flow about 0.25 times the geometric mean. Because of the simplicity of the model and these calculations being at the extremes of the period, these factors are possibly a little unrealistic, but the feature of changing seasonal effect is unmistakable.
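The arithmetic behind the Spring and Autumn factors can be reproduced directly from the quoted coefficients. The sketch below assumes, as the seasonal definitions imply, that the sin term equals +1 a quarter of the way through the year (Autumn) and −1 three quarters through (Spring); the function and variable names are illustrative.

```python
import math

alpha1 = -0.6949      # sin coefficient at the centred time 1992.5
alpha3 = 0.04744      # sin x t_adj interaction coefficient, as quoted in the text

def sin_effect(year):
    """Coefficient of the sin term at a given calendar time."""
    t_adj = year - 1992.5
    return alpha1 + alpha3 * t_adj

effect_1978 = sin_effect(1978.0)           # 14.5 years before the midpoint

# On the log scale the seasonal term is effect * sin(2*pi*time), so the
# multiplicative factor relative to the geometric mean is exp(effect * sin).
spring_factor = math.exp(effect_1978 * -1.0)   # sin = -1 in Spring
autumn_factor = math.exp(effect_1978 * +1.0)   # sin = +1 in Autumn
effect_mid = sin_effect(1992.5)                # reduces to alpha1 at the midpoint
```

Evaluating `sin_effect` at later years shows the coefficient shrinking toward zero, which is the diminishing seasonal component visible in Figure 1.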


Figure 1. Summary of log(flow) at Balranald; the curve shows the trend and seasonal effects sin and cos varying linearly in time.


Table 2. Estimated Regression Coefficients From Model (15) With Autocorrelation 0.595 and Spline With 10 df

| Term | Estimate | s.e. |
| --- | --- | --- |
| sin, α1 | −0.6949 | 0.0139 |
| cos, α2 | 0.0134 | 0.0138 |
| sin × tadj, α3 | 0.0474 | 0.0018 |
| cos × tadj, α4 | 0.0064 | 0.0017 |
| linear trend component | −0.0387 | 0.0009 |

[35] Figure 1 is consistent with our understanding of the system. In 1994 river regulation was tightened and a cap in the flow was put in place. This reduced some of the fluctuations in flow, with fewer high flows in particular. Greater irrigation has also contributed to the general reduction in flow. The increased river management of “environmental flow” has meant that the minimum flows are higher than occurred before 1994.

[36] Since 2004 a new water sharing plan plus a widespread drought have meant that the minimum flows have been held at almost the same level since around 1996, but the maximum flows are much reduced. Overall the flow was greatly reduced. Better regulation and much less rain have resulted in a strong reduction in the fluctuations in the flow regime, with minimum flows and peak flows both being much steadier, and the gap between the two decreasing.

[37] A similar model, but including a log-flow term, was fitted to log(EC):

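$$\log(\mathrm{EC}_i) = \beta_1 \log(\mathrm{flow}_i) + \beta_2 \sin_i + \beta_3 \cos_i + \beta_4 \sin_i\, t_{\mathrm{adj},i} + \beta_5 \cos_i\, t_{\mathrm{adj},i} + s(\mathrm{time}_i; \mathrm{df}) + \varepsilon_i \qquad (16)$$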

[38] The errors $\{\varepsilon_i\}$ were assumed to be AR(1) with autocorrelation ρ. The time trend term $s(\mathrm{time}_i; \mathrm{df})$ can be expressed as a linear component $\alpha + \beta_6\,\mathrm{time}_i$, with the remainder being a nonlinear component that captures the deviations from linearity.
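The seasonal and interaction terms of the model above can be assembled directly from the month index. The following is a minimal sketch rather than the authors' code: it assumes the month index starts at 0 in January 1978 and that the record covers 29 full years, and the function name and dictionary keys are hypothetical.

```python
import math

def seasonal_terms(t_month, origin=1978.0, midpoint=1992.5):
    """Regression terms for month index t_month (assumed 0 = first month of the record)."""
    time = origin + t_month / 12.0          # time in decimal years
    t_adj = time - midpoint                 # centered time, used in the interactions
    s = math.sin(2 * math.pi * time)        # single-cycle seasonal sinusoid
    c = math.cos(2 * math.pi * time)
    return {
        "time": time,
        "t_adj": t_adj,
        "sin": s,
        "cos": c,
        "sin_x_tadj": s * t_adj,
        "cos_x_tadj": c * t_adj,
    }

# Assumed record length: 1978 through 2006, one row per month.
rows = [seasonal_terms(t) for t in range(12 * 29)]
```

Centering time at the midpoint keeps the main seasonal coefficients interpretable as the seasonal effect at 1992.5, with the interaction coefficients giving the change per year.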

[39] Necessarily, any departure of the true trend from the spline curve is absorbed into the errors. The formal degrees of freedom may be thought of as giving roughly the smoothness of a polynomial fit of that order. If the curve is allowed to be more wiggly, df increases and λ decreases; the estimated autocorrelation decreases, but is still substantial even for an over-generous 30 df (Table 3).

Table 3. Values of the Formal Degrees of Freedom of the Spline, tr(S), the Smoothing Parameter λ and the Autocorrelation ρ for the Balranald log(EC) Data
| Formal df | λ | ρ |
|---|---|---|
| 4 | 253,200 | 0.572 |
| 5 | 105,400 | 0.568 |
| 7 | 28,755 | 0.555 |
| 10 | 7571 | 0.527 |
| 15 | 1688 | 0.492 |
| 30 | 138 | 0.420 |
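The inverse relation between λ and the formal degrees of freedom tr(S) can be illustrated with a simple stand-in smoother. The sketch below is not the cubic smoothing spline used in the paper: it uses a discrete second-difference penalty on equally spaced points (a Whittaker-type smoother), so its λ values are not comparable with those in Table 3. Only the qualitative behavior carries over.

```python
import numpy as np

def effective_df(n, lam):
    """Trace of S = (I + lam * D'D)^{-1}, a discrete analogue of tr(S) for a spline."""
    D = np.diff(np.eye(n), n=2, axis=0)          # (n-2) x n second-difference operator
    S = np.linalg.inv(np.eye(n) + lam * D.T @ D)  # smoother ("hat") matrix
    return float(np.trace(S))

# Larger lambda gives a smoother curve and fewer formal degrees of freedom.
df_values = {lam: effective_df(120, lam) for lam in (1.0, 100.0, 10000.0)}
for lam, df in df_values.items():
    print(lam, round(df, 2))
```

As λ grows the trace approaches 2, the dimension of the unpenalized straight line (constant plus linear component); as λ shrinks it approaches n, which corresponds to interpolating the data.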

[40] Table 4 gives the weighted Anova for the case where df = 10 (including the linear component); the estimates and their standard errors are given in Table 5. The effect of log-flow was nonsignificant after eliminating effects that could be attributed to season. Seasonal effects (eliminating log-flow effects) are quite large and the interactions of season are clearly significant. The cos coefficient (β3 + β5tadj) increased over time from −0.0101 at the start of 1978 to 0.3802 at the end of 2006; thus there was little Summer minus Winter difference initially, but it increased over time. At the end, the ratio of Summer EC to Winter EC was exp(2 × 0.3802) = 2.14. The sin coefficient showed a smaller change, from −0.1071 to −0.2756; thus Spring EC was higher than Autumn EC and the ratio increased. The standard error of the coefficient of sin is larger than that for cos because it is confounded with the log-flow effect, which has a large sin component. Thus in interpreting Table 4, statistical significance does not necessarily indicate the magnitude of an effect but the ability to detect it. The estimated amplitude of the sinusoid was 0.1076 at the start with phase (maximum) about 20 May; at the end the amplitude was about 4 times larger at 0.4696 with phase about 21 March.
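The time-varying seasonal coefficients and the amplitudes quoted above follow from the Table 5 estimates evaluated at the two ends of the record (tadj = −14.5 and +14.5). The sketch below uses the rounded tabulated values, so it agrees with the quoted figures only to about three decimal places; the function names are illustrative.

```python
import math

# Rounded estimates from Table 5.
beta2, beta3 = -0.1913, 0.1851    # sin and cos coefficients at the midpoint 1992.5
beta4, beta5 = -0.0058, 0.0135    # sin x t_adj and cos x t_adj (change per year)

def seasonal_coefficients(t_adj):
    """Return (sin coefficient, cos coefficient) at t_adj years from the midpoint."""
    return beta2 + beta4 * t_adj, beta3 + beta5 * t_adj

def amplitude(sin_coef, cos_coef):
    """Amplitude of the sinusoid sin_coef * sin(x) + cos_coef * cos(x)."""
    return math.hypot(sin_coef, cos_coef)

start = seasonal_coefficients(-14.5)   # start of the record
end = seasonal_coefficients(14.5)      # end of the record
amp_start = amplitude(*start)
amp_end = amplitude(*end)

# Summer (cos = +1) versus Winter (cos = -1) on the natural scale at the end.
summer_winter_ratio = math.exp(2 * end[1])

print(start, end, amp_start, amp_end, summer_winter_ratio)
```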

Table 4. Weighted Anova Table for Balranald Salinity Data Assuming First Order Autocorrelation of 0.527 for the Errors
| Terms | df | Weighted SS | MSS | F | P |
|---|---|---|---|---|---|
| Linear terms (log-flow, season, season × time, linear trend) | 6 | 7.386 | 1.231 | 19.74 | <0.001 |
| Nonlinear spline trend | 9 | 4.118 | 0.458 | 7.34 | <0.001 |
| Residual | 330 | 20.580 | 0.062 | | |
| Total | 345 | 32.083 | | | |

Table 5. Estimated Coefficients and Standard Errors for Balranald Salinity Model (16) with df = 10 and Autocorrelated Errors ρ = 0.527

| Term | Coefficient | Standard Error | t-Value |
|---|---|---|---|
| log-flow, β1 | −0.0262 | 0.0268 | −0.98 |
| sin, β2 | −0.1913 | 0.0232 | −8.23 |
| cos, β3 | 0.1851 | 0.0138 | 13.44 |
| sin × tadj, β4 | −0.0058 | 0.0021 | −2.70 |
| cos × tadj, β5 | 0.0135 | 0.0017 | 7.86 |
| linear trend, β6 | −0.0104 | 0.0014 | −7.44 |
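The F ratios in Table 4 are the mean weighted sums of squares divided by the residual mean square. A brief check using the tabulated degrees of freedom and weighted SS (the dictionary layout and names are illustrative) might look like this:

```python
# (df, weighted SS) as reported in Table 4.
table4 = {
    "linear terms": (6, 7.386),
    "nonlinear spline trend": (9, 4.118),
    "residual": (330, 20.580),
}

def mean_square(df, ss):
    """Mean weighted sum of squares for one row of the table."""
    return ss / df

resid_ms = mean_square(*table4["residual"])

# Ratio of each term's mean square to the residual mean square.
f_ratios = {
    term: mean_square(df, ss) / resid_ms
    for term, (df, ss) in table4.items()
    if term != "residual"
}
print(round(resid_ms, 3), {k: round(v, 2) for k, v in f_ratios.items()})
```

Using the unrounded residual mean square (20.580/330) rather than the tabulated 0.062 is what reproduces the reported F values.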

[41] Figure 2 shows the estimated trend and seasonal effects superimposed on the adjusted data. The adjusted log(EC) is obtained by subtracting β1log(flow). Residuals showed reasonably constant variance over time, though two high values occur in 1985, and three low values of adjusted log(EC) occur in 2004 and have slightly depressed the trend for the adjacent data. Apart from these, the scatter in the data is fairly constant. As the flow adjustment is small, the change in seasonal effect is not attributable to the change in seasonal flow. We have noted separately that the regression coefficients differ very little from those in Table 5 if df = 7, and indeed were found to be fairly insensitive to a wider range of trend smoothing.

Figure 2. Log(EC) at Balranald adjusted for log(flow) showing the trend and seasonal effects in Table 5.

[42] For any site where the nonlinear component of the trend is found to be significant, the linear component is an insufficient summary of the nature of the trend. In Table 4 the F-value for the nonlinear spline trend is highly significant. This does not imply that the estimate of the linear trend in Table 5 is invalid, but it does suggest that other features may be equally important. The coefficient of the linear component of trend in adjusted log(EC) was −0.0104, which corresponded to a decrease of 1.05% p.a. on the natural scale. This was statistically highly significant (t = −7.43). However, this number gives quite the wrong impression, for the trend in Figure 2 is markedly nonlinear; it is shown in more detail in Figure 3 on the natural scale. We first visited the Balranald data for 1966 to 1995 [cf. Jolly et al., 2001]. Without the subsequent data, the estimated linear trend was significantly positive, but the spline curve had already detected the downturn. This reinforces the point that a linear trend can be an inadequate summary and a poor predictor when nonlinear trends occur; it is sensitive to the start and the end of the sequence.

Figure 3. Estimated trend (linear plus nonlinear) with 95% confidence limits for salinity at Balranald. Formal df for the spline is 10, autocorrelation 0.527.

[43] In Figure 3, the curve and 95% confidence bands on the log scale have been transformed back to the natural scale. For a more familiar interpretation, the trend curve has been scaled to have the same mean as the original EC. The spline trend curve shows a marked increase in salinity from mid 1983. This occurred after a severe drought broke in April and May 1983 and reflects the increased salinization in the upper tributaries. This rise ceased around the start of 1985. A steady decline started around 2001 which coincided with the start of another drought and the adjusted EC level has dropped below what it was in 1978. This observed decrease in salinity is driven by reduced saline inflows to the system from the saline tributaries, reduced irrigation drainage, and reduced groundwater levels proximal to the river and hence reduced groundwater inflow to the river.
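One way to read the back-transformation described above is sketched here. It assumes pointwise 95% limits of the form trend ± 1.96 × s.e. on the log scale and a single multiplicative scale factor chosen so that the back-transformed curve has the same mean as the original EC; the paper does not spell out these details, and the function name, arguments and toy numbers are hypothetical.

```python
import numpy as np

def back_transform(trend_log, se_log, ec, z=1.96):
    """Back-transform a log-scale trend and its limits, rescaled to the mean of ec."""
    trend_log = np.asarray(trend_log, dtype=float)
    se_log = np.asarray(se_log, dtype=float)
    curve = np.exp(trend_log)
    lower = np.exp(trend_log - z * se_log)
    upper = np.exp(trend_log + z * se_log)
    # Assumed scaling: one factor applied to the curve and both limits.
    scale = np.mean(ec) / np.mean(curve)
    return scale * curve, scale * lower, scale * upper

# Toy illustration only; these are not the Balranald data.
toy_trend = np.log([100.0, 200.0, 400.0])
toy_se = [0.1, 0.1, 0.1]
toy_ec = [150.0, 250.0, 350.0]
curve, lower, upper = back_transform(toy_trend, toy_se, toy_ec)
```

Because the limits are exponentiated, the band is asymmetric about the curve on the natural scale even though it is symmetric on the log scale.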

[44] If df = 7 is chosen, the trend curve is smoother (Figure 4). This suggests that the rise may have continued more slowly upwards past 1985 to about 1990. The smaller and more local peaks and troughs in Figure 3 between 1985 and 2001 may or may not have an interpretation, but whether Figure 3 or Figure 4 is preferred, the big features are unmistakable.

Figure 4. Estimated trend with 95% confidence limits for salinity at Balranald. Formal df for the spline is 7, autocorrelation 0.555.

[45] If the data are sufficiently informative, it is clear that more complex models could be analyzed. Variates such as log-flow could be lagged as in transfer models, seasonal effects could include two-cycle sinusoid terms, management or remediation measures could be represented, etc. It is possible to model the coefficients as varying smoothly over time, as in Hastie and Tibshirani [1993]. However, if the response to flow is not constant (for example, β1 might vary with time as the result of the building of a weir), then it is not clear what the appropriate method for correcting the trend in water quality for flow effects should be, and the solution may have to be specific to the circumstances. In their analyses of log-sulphate concentration, Miller and Hirst [1998] and Hirst [1998] use product terms of log-flow by sin and cos, with coefficients varying smoothly over time, but assume uncorrelated errors. Model (16) is similar to that of Potts et al. [2003] for nitrate concentration, except that they use time-varying spline coefficients for log-flow and season; their errors are AR(1), and they base their inference on bootstrapping.

8. Discussion

[46] Water quality data are often highly variable. This is compounded by errors that result from sampling anomalies, contamination, laboratory imprecision, transcription and data storage issues. Many of these anomalies present themselves as “outliers” and are typically identified by large standardized regression residuals. In our experience, outliers do not seriously distort the trend estimate unless they occur at the extremes of the period. The main advantage of giving outliers less weight in the analysis is the reduction in the residual variance and consequently the narrower confidence intervals. If observations are apparently aberrant, they should be inspected and considered for exclusion from the analysis, whatever trend method is used.

[47] It is always important to consider the frequency of observations required in relation to what we are trying to detect. For instance, if we are looking for trends or changes over several seasons there is substantial redundancy in data collected daily or to the nearest minute. Averaging to a wider time window will have a negligible effect on the information content but may make the interpretation easier and reduce the computational burden. Such means lessen the need to model the autocorrelation in more detail and are frequent enough to identify seasonal effects. However, to determine the rapid changes in water quality, for example due to rainfall events, a daily or hourly timescale would be needed.
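Averaging to a wider time window is straightforward to carry out. The following minimal sketch collapses daily readings to calendar-month means using only the standard library; the function name and the synthetic series are illustrative and not part of the study.

```python
from collections import defaultdict
from datetime import date, timedelta

def monthly_means(dates, values):
    """Average observations within each (year, month) window."""
    sums = defaultdict(lambda: [0.0, 0])       # (year, month) -> [total, count]
    for d, v in zip(dates, values):
        acc = sums[(d.year, d.month)]
        acc[0] += v
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}

# Synthetic daily series: 60 consecutive days starting 1 January 2000.
days = [date(2000, 1, 1) + timedelta(days=k) for k in range(60)]
daily = [float(k) for k in range(60)]
means = monthly_means(days, daily)
print(means)
```

Months with few readings produce noisier means than months with many, so when the sampling frequency varies it may be worth carrying the per-month counts forward as weights.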

[48] As was noted in the Balranald example, the sampling frequency changed over time. The variance and autocorrelation of the residuals of the monthly means might well have changed during the period. Changes in the monitoring procedure could in principle be incorporated into a GAM. This would be very difficult to tackle with any generality and is beyond the scope of this paper, which has introduced the case study to illustrate some of the possibilities. The point is that the GAM set-up has a much greater capacity for increased model complexity than have nonparametric or robust methods.

[49] The appropriate length of the monitoring data sequence will depend on the need and ability to separate effects that are contributing to the data observed at different scales. When strong seasonal components are present, several years are often needed so as to adequately separate the broader general trend from the shorter term seasonal variation. Longer sequences are also generally needed when there is substantial serial correlation because it reduces the effective sample size.
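The loss of information caused by serial correlation can be gauged with a standard large-sample approximation for the mean of an AR(1) series, n_eff ≈ n(1 − ρ)/(1 + ρ). This formula is a textbook result brought in here for illustration and is not derived in the paper; n = 346 is inferred from the total df of 345 in Table 4, and ρ = 0.527 is the df = 10 value from Table 3.

```python
def effective_sample_size(n, rho):
    """Approximate number of independent observations equivalent to n AR(1) values.

    Large-sample approximation for estimating a mean; assumes -1 < rho < 1.
    """
    return n * (1.0 - rho) / (1.0 + rho)

n_eff = effective_sample_size(346, 0.527)
print(round(n_eff, 1))
```

On this rough measure the monthly series carries about as much information about the mean level as a little over one hundred independent observations.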

[50] Irregularly spaced or missing data are naturally accounted for within the GAM methodology, though with irregularly spaced data we usually have to adjust or average the data to a regular time interval, or model the serial correlation using a continuous AR(1) process. If the proportion of missing data is large, it may not be possible to estimate the autocorrelation parameter. If the missing data occur in large blocks, the trend during that period will be poorly estimated.
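For a continuous AR(1) process the correlation between two observations depends only on the time separating them, corr(ε_i, ε_j) = ρ^|t_i − t_j|, where ρ is the correlation at unit lag. A minimal sketch of the resulting correlation matrix for irregular sampling times follows; the function name is hypothetical and times are assumed to be expressed in the unit at which ρ is defined (for example months).

```python
import numpy as np

def car1_correlation(times, rho):
    """Correlation matrix rho ** |t_i - t_j| for possibly irregular sampling times."""
    t = np.asarray(times, dtype=float)
    lags = np.abs(t[:, None] - t[None, :])   # pairwise time separations
    return rho ** lags

# Observations at months 0, 1 and 3 (month 2 missing), unit-lag correlation 0.5.
V = car1_correlation([0.0, 1.0, 3.0], 0.5)
print(V)
```

With equally spaced times this reduces to the usual discrete AR(1) correlation matrix, so a missing month simply widens the gap between its neighbors.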

[51] The choice of the smoothing parameter λ is problematic. Hirst [1998] recommends 1 degree of freedom for each year, but that would clearly be too much for our example. Data-driven methods such as cross-validation or AIC tend to undersmooth, particularly where positive autocorrelation is present. If automatic methods are used, any inference should take the selection process into account, but in practice this is often ignored. We advocate that the degree of smoothing should depend upon the purpose of the enquiry (at what scale are the phenomena of interest, and which features are interpretable?). This depends on knowledge of floods, droughts, intervention or changes in land management. Once these have been identified, additional regression terms may be included in the model and a smoother residual trend may be satisfactory. For prediction beyond the end of the series, a fairly smooth curve is required. Morton et al. [2008] investigate the choice of smoothing parameter on that basis.

[52] The use of a smoothing spline to estimate the trend is effective when the underlying trend is indeed smooth. It will however fail to pick up rapid changes in slope, and always underestimates the peak of a local maximum or the trough of a local minimum. At an extreme of the period, the spline is straight and may be useful for short term prediction, but it would not be sensible to predict the future very far by extending a spline curve that has recently changed direction. Indeed, the degree of nonlinearity of the trend curves means that it is unreliable to predict the future without some hydrological insight as to why the peaks and troughs have occurred.

[53] GAMs have many advantages for analyzing trends. All the facilities that are available in regression modeling can be incorporated to represent any desired complexity. There is great flexibility in determining the shape of the trend and its confidence bands.

Appendix A: Weighted Analysis of Variance

[54] Overall tests of significance of the trend may be performed. The linear component is estimated in the usual way by Weighted Least Squares (WLS). The remaining nonlinear component can be tested as follows.

[55] Denote by $r_0 = (I - H)y$ the residual ignoring trend, by $r_1$ the residual upon adding the linear trend, and by $r = y - X\hat{\beta} - \hat{g}$ the final residual. Form the weighted analysis of variance table (Table A1), where $\mathbf{1}$ is the vector of ones and $\hat{\mu} = y^T V^{-1}\mathbf{1} / \mathbf{1}^T V^{-1}\mathbf{1}$ is the estimated mean. The ratio of the mean weighted sum of squares for a term to that for the residual may be referred to the F-distribution, in the usual manner, to give an approximate test. For the nonlinear trend, the F-test tends to be too liberal (i.e., it rejects the null hypothesis more frequently than the formal significance level). An improvement to this test can be achieved by scaling the weighted SS term and adjusting its formal df, so as to make the first two moments agree with the corresponding chi-square distribution, in a similar manner to Hastie and Tibshirani [1990, section 3.9].

Table A1. Weighted Analysis of Variance Table

| Term | df | Weighted SS |
|---|---|---|
| β (ignoring g) | p | $(Hy)^T V^{-1} Hy$ |
| Linear g (eliminating β) | 1 | $r_0^T V^{-1} r_0 - r_1^T V^{-1} r_1$ |
| Nonlinear g (eliminating β) | tr(S) − 2 | $r_1^T V^{-1} r_1 - r^T V^{-1} r$ |
| Residual | n − p − tr(S) | $r^T V^{-1} r$ |
| Total | n − 1 | $y^T V^{-1} y - \hat{\mu}^2\, \mathbf{1}^T V^{-1} \mathbf{1}$ |
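The trend rows of Table A1 can be computed once the three residual vectors are available. The sketch below assumes an AR(1) correlation matrix for V and takes the residuals, p and tr(S) as given inputs (it does not fit the model); all function names are hypothetical, and the plain F ratio it returns is the uncorrected approximate test, without the moment-matching adjustment.

```python
import numpy as np

def ar1_correlation(n, rho):
    """AR(1) correlation matrix with entries rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def weighted_ss(r, V):
    """Quadratic form r' V^{-1} r, using a linear solve instead of an explicit inverse."""
    r = np.asarray(r, dtype=float)
    return float(r @ np.linalg.solve(V, r))

def trend_anova(r0, r1, r, V, p, tr_S):
    """Return {term: (df, weighted SS)} and approximate F ratios for the trend terms."""
    n = len(r)
    ss0, ss1, ss = (weighted_ss(x, V) for x in (r0, r1, r))
    rows = {
        "linear g": (1, ss0 - ss1),
        "nonlinear g": (tr_S - 2, ss1 - ss),
        "residual": (n - p - tr_S, ss),
    }
    resid_ms = rows["residual"][1] / rows["residual"][0]
    f_ratios = {k: (rows[k][1] / rows[k][0]) / resid_ms
                for k in ("linear g", "nonlinear g")}
    return rows, f_ratios

# Small check of the quadratic form: for n = 2 and r = (1, 0) it equals 1 / (1 - rho^2).
check = weighted_ss([1.0, 0.0], ar1_correlation(2, 0.5))
print(check)
```

With ρ = 0 the matrix V is the identity and every weighted SS reduces to an ordinary sum of squares, which gives a convenient sanity check on any implementation.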

[56] A companion table can be formed whereby g is fitted first. However, the weighted SS terms for β eliminating g can be unreliable and the change in weighted SS when adding this term may even be negative. This paradox is due to the fact that the estimates are derived from minimizing T not the weighted SS. We recommend that this companion table not be used, but that inference about β be based on the multivariate Normal distribution

  • equation image

[57] Estimation of V is based on the residuals. The process starts by estimating β and g under the assumption of independent errors; V is then estimated from the residuals, and the two sets of estimates are updated alternately until convergence is satisfactory. It may be sufficient to do this only once, as was done by Diggle and Hutchinson [1989]. The estimate of σ² is the mean residual weighted SS.
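The alternating scheme can be illustrated in a deliberately simplified setting. The sketch below drops the spline term g and fits only a linear regression with AR(1) errors, estimating ρ by the lag-1 autocorrelation of the residuals and β by weighted (generalized) least squares; it is an assumption-laden illustration of the iteration, not the authors' procedure, and the simulated data and names are hypothetical.

```python
import numpy as np

def lag1_autocorrelation(resid):
    """Moment estimate of the AR(1) parameter from a residual series."""
    r = np.asarray(resid, dtype=float)
    return float(r[1:] @ r[:-1] / (r @ r))

def fit_ar1_regression(X, y, n_iter=5):
    """Alternate between WLS for beta (given rho) and re-estimating rho from residuals."""
    n = len(y)
    idx = np.arange(n)
    lag = np.abs(idx[:, None] - idx[None, :])
    rho = 0.0                                   # start from independent errors
    for _ in range(n_iter):
        V = rho ** lag                          # AR(1) correlation matrix (identity if rho = 0)
        Vinv_X = np.linalg.solve(V, X)
        beta = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)   # (X'V^-1 X)^-1 X'V^-1 y
        rho = lag1_autocorrelation(y - X @ beta)
    return beta, rho

# Simulated example: linear trend plus AR(1) errors with rho = 0.6.
rng = np.random.default_rng(0)
n, true_rho = 500, 0.6
shocks = rng.normal(size=n)
e = np.zeros(n)
for i in range(1, n):
    e[i] = true_rho * e[i - 1] + shocks[i]
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
y = X @ np.array([1.0, 0.5]) + e
beta_hat, rho_hat = fit_ar1_regression(X, y)
print(beta_hat, rho_hat)
```

Setting `n_iter=1` corresponds to the single update mentioned above: the coefficients come from the independent-errors fit and ρ from its residuals.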

Acknowledgments

[58] The authors thank the Murray Darling Basin Commission for providing the Balranald salinity data. The data were collected as part of the MDBC River Murray Water Quality Monitoring Program. We are grateful to our colleagues Eddy Campbell, Peter Hairsine, and Evan Christen for helpful comments on an earlier version of this paper.

References

  • Alley, W. M. (1988), Using exogenous variables in testing for monotonic trends in hydrological time series, Water Resour. Res., 24, 1955–1961.
  • Brillinger, D. R. (1994), Trend analysis: Time series and point process problems, Environmetrics, 5, 1–19.
  • Cox, M. E., A. Moss, and G. K. Smyth (2005), Water quality condition and trend in North Queensland waterways, Mar. Pollut. Bull., 51, 89–98.
  • Diggle, P. J., and M. F. Hutchinson (1989), On spline smoothing with autocorrelated errors, Aust. J. Stat., 31, 166–182.
  • Diggle, P. J., K.-Y. Liang, and S. L. Zeger (1994), Analysis of Longitudinal Data, Oxford Univ. Press, Oxford, U.K.
  • Dixon, W., and B. Chiswell (1996), Review of aquatic monitoring program design, Water Res., 30, 1935–1948.
  • El-Shaarawi, A. H., and S. P. Niculescu (1993), A simple test for detecting non-linear trends, Environmetrics, 4, 233–242.
  • Esterby, S. R. (1996), Review of methods for the detection of trends with emphasis on water quality applications, Hydrol. Process., 10, 127–149.
  • Giannitrapani, M., A. W. Bowman, and E. M. Scott (2005), Additive models for correlated data with applications to air pollution monitoring, technical report, Univ. of Glasgow. (Available at www.gla.ac.uk/adrian/).
  • Green, P. J., and B. W. Silverman (1994), Nonparametric Regression and Generalized Linear Models, CRC Press, London.
  • Harcum, J. B., J. C. Loftis, and R. C. Ward (1992), Selecting trend tests for water quality series with serial correlations and missing values, Water Resour. Bull., 28, 469–478.
  • Hastie, T. J., and R. J. Tibshirani (1990), Generalized Additive Models, CRC Press, London.
  • Hastie, T. J., and R. J. Tibshirani (1993), Varying-coefficient models (with discussion), J. R. Stat. Soc., Ser. B, 55, 757–796.
  • He, S., S. Mazumdar, and V. C. Arena (2006), A comparative study of the use of GAM and GLM in air pollution research, Environmetrics, 17, 81–93.
  • Health Effects Institute (2003), Revised analyses of time-series studies of air pollution and health: Special Report, Health Effects Institute, Boston, Mass.
  • Hipel, K. W., and A. I. McLeod (1994), Time Series Modelling of Water Resources and Environmental Systems, Elsevier, Amsterdam.
  • Hirsch, R. M., and J. R. Slack (1984), A nonparametric trend test for seasonal data with serial dependence, Water Resour. Res., 20, 727–732.
  • Hirsch, R. M., J. R. Slack, and R. A. Smith (1982), Techniques of trend analysis for monthly water quality data, Water Resour. Res., 18, 107–121.
  • Hirsch, R. M., R. B. Alexander, and R. A. Smith (1991), Selection of methods for the detection and estimation of trends in water quality, Water Resour. Res., 27, 803–813.
  • Hirst, D. (1998), Estimating trends in stream water quality with a time-varying flow relationship, Aust. J. Stat., 27, 39–48.
  • Huber, P. J. (1981), Robust Statistics, John Wiley, New York.
  • Jolly, I. D., et al. (2001), Historical stream salinity trends and catchment salt balances in the Murray-Darling Basin, Australia, Mar. Freshwater Res., 52, 53–63.
  • Jung, S.-H. (1996), Quasi-likelihood for median regression models, J. Am. Stat. Assoc., 91, 251–257.
  • Koenker, R., P. Ng, and S. Portnoy (1994), Quantile smoothing splines, Biometrika, 81, 673–680.
  • Letcher, R. A., S. Yu. Schreider, A. J. Jakeman, B. P. Neal, and R. J. Nathan (2001), Methods for the analysis of trends in streamflow response due to changes in catchment condition, Environmetrics, 12, 613–630.
  • Libiseller, C., and A. Grimvall (2002), Performance of partial Mann-Kendall tests for trend detection in the presence of covariates, Environmetrics, 13, 71–84.
  • McMullan, A., A. W. Bowman, and E. M. Scott (2007), Water quality in the River Clyde: A case study of additive and interaction models, Environmetrics, 18, 527–539, doi:10.1002/env.823.
  • Miller, J. D., and D. Hirst (1998), Trends in concentrations of solutes in an upland catchment in Scotland, Sci. Total Environ., 216, 77–88.
  • Morton, R., E. L. Kang, and B. L. Henderson (2008), Smoothing splines for trend estimation and prediction in time series, Environmetrics, in press.
  • Nathan, R. J., N. Nandekumar, and W. E. Smith (1999), On the application of Generalised Additive Models to the detection of trends in hydrologic time series data, in Water 99 Joint Congress; 25th Hydrology and Water Resources Symposium, 2nd International Conference on Water Resources and Environment Research, pp. 165–172, Inst. of Eng., Barton, ACT.
  • Opsomer, J., Y. Wang, and Y. Yang (2001), Nonparametric regression with correlated errors, Stat. Sci., 16, 134–153.
  • Potts, J. M., D. Hirst, A. C. Edwards, and D. A. Elston (2003), Comparison of trends in stream water quality, Hydrol. Process., 17, 2449–2462.
  • Schreider, S. Yu., A. J. Jakeman, R. A. Letcher, R. J. Nathan, B. P. Neal, and S. G. Beavis (2002), Detecting changes in streamflow response to changes in non-climatic conditions: Farm dam development in the Murray-Darling basin, Australia, J. Hydrol., 262, 84–98.
  • Sen, P. K. (1968), Estimates of the regression coefficient based on Kendall's Tau, J. Am. Stat. Assoc., 63, 1379–1389.
  • Theil, H. (1950), A rank-invariant of linear and polynomial regression analysis, Konikl. Ned. Akad. Wetenschap. Proc., 53, 386–392, 521–525, and 1397–1412.
  • Wang, X. L., and V. R. Swail (2001), Changes of extreme wave height in northern hemisphere oceans and related atmospheric circulation regimes, J. Clim., 14, 2201–2204.
  • Yue, S., P. Pilon, B. Phinney, and G. Cavadias (2002), The influence of autocorrelation on the ability to detect trend in hydrological series, Hydrol. Process., 16, 1807–1829.
  • Zhang, X., and F. W. Zwiers (2004), Comment on “Applicability of pre-whitening to eliminate the influence of serial correlation on the Mann-Kendall test” by Sheng Yue and Chun Yuan Wang, Water Resour. Res., 40, W03805, doi:10.1029/2003WR002073.

Supporting Information

| Filename | Format | Size | Description |
|---|---|---|---|
| wrcr11413-sup-0001-t01.txt | plain text document | 2K | Tab-delimited Table 1. |
| wrcr11413-sup-0002-t02.txt | plain text document | 0K | Tab-delimited Table 2. |
| wrcr11413-sup-0003-t03.txt | plain text document | 0K | Tab-delimited Table 3. |
| wrcr11413-sup-0004-t04.txt | plain text document | 0K | Tab-delimited Table 4. |
| wrcr11413-sup-0005-t05.txt | plain text document | 0K | Tab-delimited Table 5. |
| wrcr11413-sup-0006-taA01.txt | plain text document | 0K | Tab-delimited Table A1. |
