2.1. Simple and Local Linear Regression
[5] Many climatic series consist of real-valued data that are regularly spaced in time. A commonly used approach to trend estimation is the linear model

Y_{t} = μ_{t} + ε_{t},  t = 1, …, n,  (1)
where Y_{1}, …, Y_{n} is the series of interest, the trend function μ_{t} = β_{0} + β_{1}t is a linear function of the time index t and the errors ε_{t} form a sequence of independent, normally distributed random variables with zero mean and a common variance.
[6] In contrast with the linear trend model, nonparametric techniques impose minimal assumptions on the form of μ_{t} and hence allow the data to speak freely about any underlying changes in the mean. Local linear regression [Bowman and Azzalini, 1997] is one such technique, in which the trend function is assumed to be smooth, so that μ_{t} can be approximated by a straight line in the neighborhood of any time point τ: μ_{t} ≈ μ_{τ} + β_{τ}(t − τ), say. Estimates of the coefficients μ_{τ}, β_{τ} are found by solving the weighted least squares problem

min_{μ_{τ}, β_{τ}} Σ_{t=1}^{n} {Y_{t} − μ_{τ} − β_{τ}(t − τ)}^{2} w(t − τ; h),
where the weight function w(·; h) is the normal probability density function with mean 0 and standard deviation h, so that observations distant from τ are downweighted. The estimate of μ_{τ} (μ̂_{τ}, say) is then taken as an estimate of the underlying mean at time τ. This procedure is repeated over a range of values of τ to build up the estimated trend function. A variability band, indicating the size of two standard errors above and below the estimate, can be constructed and used to provide an informal assessment of uncertainty in the estimation. Reference bands, used to assess the suitability of particular models such as a constant mean or a linear trend, can also be produced.
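To make the computation concrete, the following numpy sketch solves the weighted least squares problem at a target time τ with Gaussian weights and repeats it over a grid of τ values. This is our own minimal implementation (function names and the illustrative series are ours), not the sm package routine.

```python
import numpy as np

def local_linear(t, y, tau, h):
    """Local linear estimate of the mean at time tau: solve the
    weighted least squares problem with Gaussian weights of
    standard deviation h (the normalising constant cancels)."""
    w = np.exp(-0.5 * ((t - tau) / h) ** 2)          # w(t - tau; h), up to a constant
    X = np.column_stack([np.ones_like(t), t - tau])  # design matrix [1, t - tau]
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)       # (mu_tau, beta_tau)
    return beta[0]                                   # intercept = estimate of mu_tau

# Repeat over a grid of tau values to build the estimated trend function
t = np.arange(1.0, 101.0)
y = 0.05 * t + np.sin(t / 10.0)                      # illustrative smooth series
trend = np.array([local_linear(t, y, tau, h=7.5) for tau in t])
```

Because the intercept of the locally fitted line is the estimate of μ_{τ}, only `beta[0]` is retained at each τ.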
[7] The smoothing parameter (or bandwidth) h controls the smoothness of the resulting trend estimate and is expressed in the same units as t. The sm package [Bowman and Azzalini, 1997] (also their R package ‘sm’: Nonparametric smoothing methods (version 2.2–4), http://www.stats.gla.ac.uk/∼adrian/sm, http://azzalini.stat.unipd.it/Book_sm) in the R programming environment [R Development Core Team, 2010] offers three automatic methods for the selection of h: cross-validation (h_{cv}), an approximate degrees of freedom criterion (h_{df}) and a corrected Akaike information criterion (h_{aicc}) [Hurvich et al., 1998]. Alternatively, local linear regression can be viewed as a linear filter, and its squared gain function used to guide the final selection of h [Chandler and Scott, 2011, chapter 2]. If, for example, features on temporal scales of less than a few years are of little interest, then h could be selected so that the squared gain is effectively zero at frequencies corresponding to these scales and close to 1 at frequencies corresponding to longer time scales.
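The cross-validation criterion can be sketched as a generic leave-one-out computation in numpy. This is not the sm package's own `hcv` routine; the candidate bandwidth grid and the simulated series are illustrative assumptions.

```python
import numpy as np

def loo_fit(t, y, h):
    """Leave-one-out local linear fit at each observation time."""
    n = len(t)
    fits = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # drop observation i
        ti, yi = t[keep], y[keep]
        w = np.exp(-0.5 * ((ti - t[i]) / h) ** 2)     # Gaussian weights, sd = h
        X = np.column_stack([np.ones_like(ti), ti - t[i]])
        WX = X * w[:, None]
        fits[i] = np.linalg.solve(X.T @ WX, WX.T @ yi)[0]
    return fits

def h_cv(t, y, grid):
    """Bandwidth minimising the leave-one-out prediction error."""
    scores = [np.mean((y - loo_fit(t, y, h)) ** 2) for h in grid]
    return grid[int(np.argmin(scores))]

rng = np.random.default_rng(1)
t = np.arange(1.0, 101.0)
y = 0.05 * t + np.sin(t / 10.0) + rng.normal(0.0, 0.3, 100)
grid = np.array([2.5, 5.0, 7.5, 10.0, 15.0])
h_selected = h_cv(t, y, grid)
```

In practice the selected value would be compared with h_{df}, h_{aicc} and the squared gain function rather than used alone.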
[8] A key issue is whether a fitted regression curve represents a real long-term trend or whether it can be attributed to random variation. The sm package contains routines that enable a formal test of either no change or a linear trend in the null hypothesis, against the local linear regression model in the alternative hypothesis (for the theory underlying this test see Bowman and Azzalini [1997, section 5.2] or Chandler and Scott [2011, section 4.1.7]). Under a given null hypothesis, the observed significance level (p-value) for the test can be obtained from the quantiles of a scaled and shifted chi-square distribution. With respect to the selection of h, it is important to reflect on the difference between inference (where the aim is to detect the presence of some effect of interest) and estimation (where interest is focused on the best possible estimate of the regression function). The literature is replete with automatic bandwidth selection procedures, all of which have strengths and weaknesses; critically, it is usually extremely difficult to tell from the data alone whether the conditions are met for one method to be preferred over another. There are also the open questions of whether the bandwidths for inference and estimation should be taken as the same, and whether error criteria for the determination of an optimal bandwidth for estimation are appropriate for inference [Gijbels, 2003]. Therefore, rather than relying on a single bandwidth, we encourage the use of h_{cv}, h_{df}, h_{aicc} and the squared gain function as references to establish a ‘reasonable’ range of values for h. A plot of the p-value as a function of h is called the significance trace; it assesses the sensitivity of the test results to the choice of h.
[10] Large values of the above test statistic suggest that one or more discontinuities are present, and p-values can be calculated under the assumption of independent normal errors with zero mean and constant variance. As before, a significance trace can be constructed to assess the sensitivity of test results to the choice of h. Bowman et al. [2006] show via simulation that the procedure has good power to detect genuine discontinuities when they are present, and apply it to detect an abrupt change in the flow of the Nile River in 1899.
[11] This approach allows evaluation of the evidence for discontinuities in a data-driven way, without imposing specific functional forms (such as linearity, piecewise linearity, or step functions) on the underlying regression function (other techniques offering a similar approach include those of Grégoire and Hamrouni [2002] and Gijbels and Goderniaux [2004]). Moreover, the approach acknowledges that a series may contain both discontinuities and a smooth trend. This is a substantial advantage over many popular techniques for discontinuity detection, which implicitly assume that any long-term change must be due to a discontinuity (this is true for several of the techniques reviewed in the Introduction to this paper). The approach is also open to the possibility that a series contains only discontinuities and no additional trend; in this case, the ‘smooth trend’ is a constant function.
[12] The reference distributions for the test statistics above are derived under the assumption that the error terms (ε_{t}) in (1) are independent and normally distributed with constant variance. In practice, normality is not a critical requirement in large samples; however, failure of the constant variance assumption can lead to incorrect p-values. This is potentially problematic for the analysis of count data (i.e., data taking only nonnegative integer values), such as the numbers of occurrences of rare events in specified time intervals. The Poisson process is often used as a starting point for modeling such series. The variance of a Poisson distribution is equal to its mean: thus, in the presence of trends in the mean of a count series, the variance is unlikely to be constant. However, the square root of a Poisson-distributed random variable has a variance that is approximately independent of the mean [Davison, 2003, p. 59]. Thus, to apply the above regression framework to the typhoon count series in Section 4 below, we first take square roots of the counts (for an earlier example of this approach, see Dobson and Stewart [1974]). More recent techniques are available that can handle heteroscedasticity directly [e.g., Grégoire and Hamrouni, 2002]; in this article, however, we retain the simpler approach of transforming the data because it can be implemented more easily using existing software. For all of our analyses, we use residual diagnostics to check for violations of the modeling assumptions [see, e.g., Draper and Smith, 1998].
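The variance-stabilising effect of the square-root transform is easy to check by simulation: Var(X) grows in proportion to the mean, while Var(√X) stays close to 1/4. The sketch below uses arbitrary Poisson means chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
raw_var, sqrt_var = {}, {}
for lam in (4.0, 9.0, 16.0):
    x = rng.poisson(lam, size=200_000)
    raw_var[lam] = x.var()            # approximately equal to lam
    sqrt_var[lam] = np.sqrt(x).var()  # approximately 1/4, whatever the mean
```

The raw variances differ by a factor of four across these means, while the variances of the transformed counts are all close to 0.25, which is why the transformed series can be treated as approximately homoscedastic.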
[13] The final critical assumption underlying the test is that the (ε_{t}) in (1) are independent. With time series data, correlation may be present; if this is ignored, then results regarding the automatic selection of h and the calculation of p-values and reference bands will be incorrect. Where necessary, a good strategy is to fit a simple time series model, such as a first-order autoregression, to the residuals from a local linear estimate with a carefully chosen smoothing parameter. The corresponding covariance matrix can then be incorporated into the distributional calculations. The technical details are beyond the scope of the present paper, where the examples do not require this amendment. For more discussion, see Chandler and Scott [2011, section 4.2.4] and references therein.
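The strategy can be sketched as follows: estimate the AR(1) coefficient from the residuals (here with a simple lag-1 moment estimator, our own choice) and form the implied covariance matrix for use in the distributional calculations. Function names are ours, and the simulated residuals are purely illustrative.

```python
import numpy as np

def ar1_phi(resid):
    """Lag-1 autocorrelation (moment) estimate of phi in the
    first-order autoregression e_t = phi * e_{t-1} + z_t."""
    e = resid - resid.mean()
    return np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)

def ar1_cov(n, phi, sigma2_z):
    """Covariance matrix implied by a stationary AR(1):
    cov(e_t, e_s) = sigma2_z * phi**|t - s| / (1 - phi**2)."""
    idx = np.arange(n)
    return sigma2_z * phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi ** 2)

# Illustration on simulated AR(1) residuals
rng = np.random.default_rng(7)
n, phi_true = 5000, 0.6
e = np.empty(n)
e[0] = rng.normal()
for s in range(1, n):
    e[s] = phi_true * e[s - 1] + rng.normal()
phi_hat = ar1_phi(e)
```

The matrix returned by `ar1_cov` is what would replace the identity covariance in the calculations behind the p-values and reference bands.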
2.2. Pettitt Change Point Test
[14] For comparative purposes we use the nonparametric approach developed by Pettitt [1979] to detect change points in the winter NAO index, relative humidity and typhoon count series. The null hypothesis for this test is that the observations are independent and identically distributed. Suppose R_{1}, …, R_{t} are the ranks of the t observations Y_{1}, …, Y_{t} in the complete sample of n observations. For a two-sided test (i.e., one in which the alternative hypothesis does not specify the direction of change), the test statistic is defined by

K = max_{1 ≤ t < n} |U_{t,n}|,

where U_{t,n} = 2W_{t} − t(n + 1) and W_{t} = Σ_{j=1}^{t} R_{j}. When the observations are continuous-valued, the p-value for K = k, p_{k} say, is approximated by p_{k} ≈ 2 exp[−6k^{2}/(n^{3} + n^{2})] if p_{k} ≤ 0.5. While this approximation is useful for the analyses of the winter NAO index and relative humidity data, the typhoon count series considered herein contains only seven unique values and hence cannot be considered as continuous-valued. Thus we used bootstrap resampling [Efron and Tibshirani, 1993] to compute p_{k} for the typhoon series.
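The statistic and the large-sample p-value approximation above are straightforward to compute; the numpy sketch below (our own function names) assumes continuous data with no ties. For a discrete series such as the typhoon counts, the final p-value line would be replaced by a bootstrap comparison, as described above.

```python
import numpy as np

def pettitt(y):
    """Two-sided Pettitt statistic K = max_t |U_{t,n}| and the
    approximate p-value 2*exp(-6 K^2 / (n^3 + n^2))."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    r = np.empty(n)
    r[np.argsort(y)] = np.arange(1, n + 1)        # ranks R_1, ..., R_n (assumes no ties)
    w = np.cumsum(r)                              # W_t = R_1 + ... + R_t
    u = 2 * w - np.arange(1, n + 1) * (n + 1)     # U_{t,n} = 2 W_t - t(n + 1)
    k = np.abs(u).max()
    p = min(1.0, 2.0 * np.exp(-6.0 * k**2 / (n**3 + n**2)))
    return k, p

# Illustration: an abrupt upward shift at mid-series is detected easily
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 100)
y[50:] += 3.0
k, p = pettitt(y)
```

Note that U_{n,n} = 0 by construction, so including t = n in the maximisation is harmless.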
2.3. Poisson Modeling
[15] Tu et al. [2009] applied two Poisson models to their typhoon count series: one in which the rate parameter was assumed to be time-invariant, and the other in which a single change point was assumed with different rate parameters before and after the change point. Denoting by X_{t} the number of typhoons in year t, the first of these models has

P(X_{t} = x) = e^{−λ_{0}} λ_{0}^{x} / x!

for x = 0, 1, 2, …; and the second has

P(X_{t} = x) = e^{−λ_{1}} λ_{1}^{x} / x! for t ≤ δ,  and  P(X_{t} = x) = e^{−λ_{2}} λ_{2}^{x} / x! for t > δ,
where λ_{i} > 0 (i = 0, 1, 2) are unknown rate parameters (the expected number of typhoons per year) and δ is an unknown change point. We used maximum likelihood to estimate the rate parameters, and the likelihood ratio (deviance) test to compare the fit of the Poisson models. The usual chi-square reference distribution for likelihood ratio test statistics fails in change point problems, however [Davison, 2003, section 4.6]; thus we used bootstrap resampling to compute the p-value of the deviance test statistic [see Chandler and Scott, 2011, sections 3.4.3 and 3.6].
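A minimal numpy sketch of this procedure follows (function names are ours; segment log-likelihoods are computed up to an additive constant, which cancels in the deviance, and the change point is profiled out by maximising over all split points).

```python
import numpy as np

def deviance_stat(x):
    """Deviance (likelihood ratio) statistic comparing the constant-rate
    Poisson model with the single-change-point model; the rates and the
    change point delta are estimated by maximum likelihood."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def loglik(seg):
        # Poisson log-likelihood at the MLE lam = mean(seg),
        # up to an additive constant that cancels in the deviance
        lam = seg.mean()
        return 0.0 if lam == 0 else seg.sum() * np.log(lam) - len(seg) * lam

    ll0 = loglik(x)                                                 # constant rate
    ll1 = max(loglik(x[:d]) + loglik(x[d:]) for d in range(1, n))   # profile over delta
    return 2.0 * (ll1 - ll0)

def bootstrap_pvalue(x, n_boot=500, seed=0):
    """Parametric bootstrap: simulate from the fitted constant-rate
    model and compare the simulated deviances with the observed one."""
    rng = np.random.default_rng(seed)
    obs = deviance_stat(x)
    lam0 = np.mean(x)
    sims = [deviance_stat(rng.poisson(lam0, len(x))) for _ in range(n_boot)]
    return float(np.mean([s >= obs for s in sims]))

# Illustration: an artificial series with a clear change in rate after year 30
x = np.concatenate([np.full(30, 1), np.full(30, 6)])
d_obs = deviance_stat(x)
p_boot = bootstrap_pvalue(x, n_boot=200)
```

Simulating under the fitted null (constant-rate) model is what avoids the failure of the chi-square reference distribution noted above.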
2.4. Statistical Power Analysis
[16] We performed a statistical power analysis (via simulation) to compare the abilities of the discontinuity and Pettitt tests to detect change points under various conditions. Power is defined here as the probability of detecting a discontinuity when it is present. To make the study realistic, we base our simulations upon an analysis of the winter NAO index series. This series exhibits a complex nonlinear trend (section 3), which we estimate nonparametrically. The fitted trend function is then used to determine the residual variance estimate σ̂^{2}. The simulated series are generated by adding independent Gaussian errors with zero mean to the fitted trend function, for three different levels of error variance cσ̂^{2}, where c ∈ {0.25, 1, 2.5}. We then insert jumps in the series. Several jump locations are considered, with three nonzero jump sizes at each location. One of the locations is in the middle of the series; the others are at t ∈ {15, 25, 49, 98, 125, 135}. The jump sizes used, namely Δ ∈ {0, 1, 2, 3}, encompass zero and the root mean square successive difference defined by

d = [Σ_{t=2}^{n} (Y_{t} − Y_{t−1})^{2} / (n − 1)]^{1/2}.
[17] Three values of h were chosen to represent small, moderate and large amounts of smoothing (section 3). For each of the above settings, 1000 experimental series are generated and the number of rejections of the null hypothesis at the 0.05 level is recorded. An accurate test procedure is expected to reject the null hypothesis 5% of the time in the absence of a jump, that is, when Δ = 0.
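The structure of such a simulation can be sketched as follows: a simplified, self-contained version using the Pettitt test with its large-sample p-value approximation, a constant underlying mean, and illustrative settings (our own) for the series length, jump location and jump size.

```python
import numpy as np

def pettitt_pvalue(y):
    # Two-sided Pettitt p-value via the large-sample approximation
    n = len(y)
    r = np.empty(n)
    r[np.argsort(y)] = np.arange(1, n + 1)            # ranks (no ties: continuous data)
    u = 2 * np.cumsum(r) - np.arange(1, n + 1) * (n + 1)
    k = np.abs(u).max()
    return min(1.0, 2.0 * np.exp(-6.0 * k**2 / (n**3 + n**2)))

def power(n=150, t_jump=75, delta=2.0, sigma=1.0, n_sim=500, seed=1):
    """Monte Carlo power: the fraction of simulated series (constant
    mean plus a jump of size delta at t_jump, Gaussian errors) for
    which the Pettitt test rejects at the 0.05 level."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        y = rng.normal(0.0, sigma, n)
        y[t_jump:] += delta                           # insert the jump
        rejections += pettitt_pvalue(y) < 0.05
    return rejections / n_sim

power_null = power(delta=0.0, n_sim=400)  # should stay near the nominal 0.05
power_alt = power(delta=2.0, n_sim=200)   # large mid-series jump: high power
```

The study reported in Tables 1 and 2 follows the same logic but simulates around the fitted NAO trend, varies c, Δ, the jump location and h, and applies the discontinuity test alongside the Pettitt test.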
[18] The presence of trends in the experimental series could cause violations of the null hypothesis of the Pettitt test [see, e.g., Vincent et al., 2011]. Since our interest here is the detection of sharp discontinuities, one approach is to detrend the series prior to application of the Pettitt test [see, e.g., Rodionov, 2005]. In practice, the true form of the underlying trend is unknown, and trend removal methods such as differencing and the fitting of low-order polynomials can change the error structure. For illustrative purposes, we assume that the underlying trend can be represented adequately by a linear regression model and assess the consequences of detrending on the power properties of the test.
[19] We also investigate the ability of the discontinuity and Pettitt tests to detect changes in a hypothetical series for which the conditions of the null hypothesis of the Pettitt test are met. Here, data are simulated from the trend function defined by
where t_{c} is the change point, Δ is the jump size at t = t_{c} and m = −2Δ/(n − 1). The settings used for the error variance, Δ and h are the same as those described above. The results are presented in Tables 1 and 2 and discussed in section 3 below.
Table 1. Power Properties of the Discontinuity Test^{a}

                     Experimental Series            Hypothetical Series
Jump     h      c = 0.25  c = 1.0  c = 2.5    c = 0.25  c = 1.0  c = 2.5
  0     5.0      0.044     0.048    0.046      0.048     0.050    0.048
  0     7.5      0.064     0.049    0.052      0.047     0.060    0.057
  0    10.0      0.170     0.074    0.064      0.064     0.052    0.044
  1     5.0      0.070     0.053    0.048      0.049     0.060    0.060
  1     7.5      0.108     0.060    0.052      0.070     0.053    0.067
  1    10.0      0.317     0.091    0.073      0.115     0.058    0.059
  2     5.0      0.141     0.065    0.052      0.134     0.068    0.057
  2     7.5      0.336     0.099    0.062      0.222     0.086    0.049
  2    10.0      0.685     0.164    0.101      0.397     0.094    0.068
  3     5.0      0.298     0.098    0.065      0.247     0.107    0.059
  3     7.5      0.726     0.173    0.089      0.570     0.150    0.097
  3    10.0      0.945     0.315    0.136      0.853     0.235    0.096
Table 2. Power Properties of the Pettitt Test^{a}

          Experimental Series         Detrended Experimental Series      Hypothetical Series
Jump   c = 0.25  c = 1.0  c = 2.5    c = 0.25  c = 1.0  c = 2.5    c = 0.25  c = 1.0  c = 2.5
  0     0.765     0.205    0.091      0.177     0.017    0.000      0.039     0.047    0.041
  1     1.000     0.891    0.498      0.038     0.006    0.000      0.339     0.093    0.058
  2     1.000     0.999    0.960      0.001     0.002    0.001      0.896     0.336    0.147
  3     1.000     1.000    1.000      0.241     0.001    0.001      1.000     0.637    0.285