Detecting abrupt change on the basis of skewness: numerical tests and applications

Authors


Correspondence to: Wenping He, National Climate Center, China Meteorological Administration, Beijing, China. E-mail: wenping_he@163.com

ABSTRACT

An abrupt change occurs when a dynamical system suddenly shifts from one stable state to a new state, which can take place in many complex systems, such as the climate, ecosystems, and social systems. In order to detect abrupt change, this article presents a novel method, the sliding transformation parameter (STP), on the basis of skewness change and the Box–Cox transformation. Tests on model time series and 1000 simulated daily precipitation series show the ability of the present method to identify and detect abrupt changes in the probability density function. Applications of STP to daily precipitation data show that there is an abrupt climate change between 1979 and 1980 at the selected observational stations, which is almost the same as the result obtained by approximate entropy (ApEn). Furthermore, it is found that the sample size of the sliding window has some influence on the lambda parameter of the Box–Cox transformation, but it does not significantly affect the varying trend of the parameter or the identification of the change point on annual or interannual time scales. Comparing STP with the coefficients of skewness and kurtosis, ApEn, and some statistical approaches (e.g. percentiles and annual maxima), we find that the present method performs much better than these methods.

Introduction

Continuous and gradually varying processes are common phenomena in nature. However, many discontinuous processes with abrupt changes also occur in all kinds of natural phenomena, such as seasonal abrupt changes in atmospheric circulation and climate (Ye et al., 1958; Xiao and Li, 2007). Ecological systems occasionally undergo rapid shifts from one stable state to an alternative stable state with dramatically different properties. Well-studied examples of such drastic changes include eutrophication of lakes and coastal oceans, changes in the states of coral reefs, collapse of vegetation in semi-arid ecosystems, and catastrophic shifts in rangelands, fish populations, or wildlife populations, all of which may threaten ecosystem services (Scheffer et al., 2001; Folke et al., 2005). In medicine, we have spontaneous systemic failures such as asthma attacks (Venegas et al., 2005) or epileptic seizures (Litt et al., 2001); in global finance, there is a concern about systemic market crashes (May et al., 2008); in the Earth system, abrupt shifts in ocean circulation may occur (Marotzke, 2000; Lenton et al., 2008).

A general definition of abrupt climate change is that the climate system suddenly transitions from one stable state to a new stable state, as in the Younger Dryas and Dansgaard–Oeschger events. Compared with these two events, there has been a series of less dramatic abrupt climate shifts since 1976. The circulation shift in the western Pacific in the winter of 1976–1977 proved to have much wider impacts (Miller et al., 1994). Land temperatures had remained relatively trendless from 1950 to 1976, despite CO2 rising from 310 to 332 ppm as fossil fuel emissions tripled. In contrast, there was a marked shift in 1977, after which the observed global mean surface temperature rose at about 2 °C/century (Thompson et al., 2008). Abrupt changes also often occur in the signals recorded during disturbances in electrical power networks. In recent years, abrupt change has become an increasingly important scientific problem in many complex systems, such as ecosystems (Scheffer and Carpenter, 2003; Brock et al., 2006), social systems (Brock et al., 1999), mechanical signals, engineering (Holling, 1996), Internet Protocol networks (Thottan and Ji, 2003), financial markets (May et al., 2008), and the climate system (Alley et al., 2003; Alley, 2004; Lenton et al., 2008; Xiao and Li, 2011; Xiao et al., 2012). Obviously, it is important to identify and detect abrupt changes in all kinds of signals.

A lot of research on detection methods for abrupt changes has been done during the past decades. Traditional methods include the moving t-test (MTT) (Ducré-Robitaille et al., 2003; Xiao and Li, 2007), the Cramer method, the Mann–Kendall test (Mann, 1945; Kendall et al., 1975), the Yamamoto method (Yamamoto et al., 1985), the standard normal homogeneity test (SNHT) (Alexandersson, 1986), the Lepage test (Yonetani, 1993), the multiple linear regression method (Vincent, 1998), cumulative sum methods (Buishand, 1982; Rebstock, 2002), sequential methods (Rodionov, 2004), and the two-phase regression technique (Lund and Reeves, 2002; Reeves et al., 2007; Solow, 1987). These traditional methods monitor direct indicators of the process observables, such as the mean or trend of a process measurement.

In recent years, many new methods have been presented. Ilin et al. (2004) applied nonlinear dynamical factor analysis (NDFA), an algorithm based on a nonlinear state-space model, to the problem of state change detection, and found that NDFA outperforms classical methods such as the cumulative sum method and the discrete cosine transform. The main drawback of NDFA is the large number of iterations needed for offline learning of the process model, although the actual change detection can then be done online. Using long-range power-law correlations in a dynamical system, He et al. (2008) proposed moving detrended fluctuation analysis (MDFA), and they later developed a new method, moving-cut detrended fluctuation analysis (MC-DFA), to detect abrupt dynamic changes in correlated time series (He et al., 2012). Because detrended fluctuation analysis is a method for quantifying the correlation property in non-stationary time series, MDFA and MC-DFA are both only suitable for detecting abrupt change in time series which exhibit complex behaviour characterized by a power law. Approximate entropy (ApEn) is a measure to quantify system complexity (Pincus, 1991; Pincus and Goldberger, 1994), which reflects the likelihood that ‘similar’ patterns of observations will not be followed by additional ‘similar’ observations. A time series containing many repetitive patterns has a relatively small ApEn, whereas a less predictable (i.e. more complex) process has a higher ApEn. In recent years, ApEn has been used to detect abrupt change in meteorological observational data (Wang and Zhang, 2008).

The methods above provide various ways to detect different types of abrupt change. The traditional methods can mainly be used to detect abrupt changes in means, variations, trends, and frequency. However, no method for abrupt change detection is a panacea, and these traditional methods cannot detect an abrupt change in the probability density function (PDF). In order to deal with this problem, it is crucial to develop new abrupt change detection techniques. Using model simulations that mimic field measurements and a simple analysis of real data from abrupt climate change in the Sahara, Guttal and Jayaprakash (2008) studied the feasibility of the coefficient of skewness (CS) as an early warning signal of regime shifts in ecosystems by using data available from routine monitoring. They found that the change in asymmetry in the distribution of time series data, quantified by changing skewness, is a model-independent and reliable early warning signal for abrupt change. Inspired by Guttal and Jayaprakash (2008), skewness is applied to detect abrupt PDF changes in this article. For a stable dynamical system, the PDF is relatively stable; however, the PDF will vary with a change in the dynamic structure of the system. The Box–Cox transformation of a time series provides a transformation parameter (Box and Cox, 1964, 1982; Sakia, 1992; Qian et al., 2010a, 2010b), which quantitatively indicates the skewness.

On the basis of the characteristics of the PDF and using the Box–Cox transformation, this article presents a new method, namely the sliding transformation parameter (STP), to detect abrupt changes in time series by identifying small changes in the PDF. Tests on model time series and 1000 simulated daily precipitation series show the ability of the present method to identify and detect abrupt PDF change, such as that caused by a change in a parameter of an equation or by an abrupt dynamical change. Further studies show that the STP results are almost independent of the size of the subseries. Applications to daily precipitation records verify the validity of STP. Comparing STP with CS, the coefficient of kurtosis (CK), ApEn, and some statistical approaches (e.g. percentiles and annual maxima), we find that the present method performs better than these methods.

This article is organized as follows: Section 'Methods and data' first briefly outlines the definitions of CS and CK, and then describes the Box–Cox transformation method. In Section 'Results', the performance of STP is tested on simulated time series and on observational data, and STP is compared with other approaches. Section 'Discussion and conclusions' summarizes the main results and conclusions of the present study with a brief discussion.

Methods and data

Definitions of CS and CK

Skewness is a measure of the degree of asymmetry of a distribution. If the left tail (tail at small end of the distribution) is more pronounced than the right tail (tail at the large end of the distribution), the probability density distribution function is said to have negative skewness. If the reverse is true, it has positive skewness. If the two are equal, it has zero skewness.

Coefficient of skewness is often denoted by γ. Given a probability function P(x), with mean μ and standard deviation σ, CS is defined as the scaled third moment about the mean:

$$ \gamma = \frac{1}{\sigma^{3}} \int (x - \mu)^{3} P(x)\,\mathrm{d}x \tag{1} $$

Kurtosis is the degree of peakedness of a distribution, defined as a normalized form of the fourth central moment of the distribution. The coefficient of kurtosis (CK), often denoted by k, is three for a standard normal distribution. For this reason, the following definition of kurtosis (often referred to as ‘excess kurtosis’) is often used:

$$ k = \frac{1}{\sigma^{4}} \int (x - \mu)^{4} P(x)\,\mathrm{d}x - 3 \tag{2} $$

This definition is used so that the standard normal distribution has a CK of zero. The positive CK indicates a ‘peaked’ distribution and negative CK indicates a ‘flat’ distribution.
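
For a finite sample x1, x2, …, xm, the integrals in Equations (1) and (2) are usually replaced by sample moments. As a sketch (the symbols $\bar{x}$ and $m_r$ are introduced here only for illustration, and these are the simple biased moment estimators, which are not necessarily the ones used in this article):

$$ m_r = \frac{1}{m} \sum_{i=1}^{m} (x_i - \bar{x})^{r}, \qquad \hat{\gamma} = \frac{m_3}{m_2^{3/2}}, \qquad \hat{k} = \frac{m_4}{m_2^{2}} - 3 $$

As a worked example, the sample {0, 0, 0, 0, 10} has $\bar{x} = 2$, $m_2 = 16$, $m_3 = 96$, and $m_4 = 832$, so that $\hat{\gamma} = 96/64 = 1.5$ (a pronounced right tail) and $\hat{k} = 832/256 - 3 = 0.25$.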

The Box–Cox transformation

Many statistical tests and intervals are based on the assumption of normality. This assumption often leads to tests that are simple, mathematically tractable, and powerful compared with tests that do not make the normality assumption. Unfortunately, many real data sets are not approximately normal. However, an appropriate transformation of a data set can often yield a data set that does follow an approximately normal distribution. This increases the applicability and usefulness of statistical techniques based on the normality assumption.

The Box–Cox transformation is a particularly useful family of transformations. It is defined as:

$$ Y_i = \begin{cases} \dfrac{x_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln x_i, & \lambda = 0 \end{cases} \tag{3} $$

where Yi is the data transformed by the Box–Cox transformation and λ is the transformation parameter. For λ = 0, the natural log of the data is taken instead of the power formula. The transformation in Equation (3) is valid only for xi > 0; therefore, modifications have to be made for negative observations. Box and Cox (1964) proposed the shifted power transformation with the form

$$ Y_i = \begin{cases} \dfrac{(x_i + \delta_x)^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln (x_i + \delta_x), & \lambda = 0 \end{cases} \tag{4} $$

where δx is chosen such that xi + δx > 0. Box and Cox (1964) proposed maximum likelihood as well as Bayesian methods for the estimation of the parameter λ. In this article, maximum likelihood estimation (MLE) is used to find the value of λ, and we use an MLE routine from the standard library of MATLAB 7.0. The parameter settings of the maximization process are determined automatically by the MLE routine.

In order to allow readers to apply this transformation readily in other compiled languages, let us briefly outline the MLE steps. For a time series, we assume that the parameter λ ranges from A to B, and divide the range B − A into n boxes of fixed width w. First, let the parameter λ = A; then Equation (4) can be written as

$$ Y_i = \frac{(x_i + \delta_x)^{A} - 1}{A} \tag{5} $$

Next, the geometric mean of Yi (i = 1, 2, …, m) can be calculated, and the corresponding average residual sum of squares is denoted by Sj(λ)/m (j = 1, 2, …, n). Then, we increase the parameter λ by one box width w, and calculate the new time series Yi and the new average residual sum of squares Sj(λ)/m. This procedure is repeated until λ = B. Finally, we obtain the optimum parameter λ0, which minimizes the average residual sum of squares Sj(λ)/m. We tested this MLE algorithm in FORTRAN code, in which the box width w is 0.01 and the parameters A and B are −5 and 5, respectively. The estimates of the parameter λ are almost identical to those from the MLE routine in the standard library of MATLAB 7.0. The detailed MLE algorithm can be found in Box and Cox (1964).
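
The grid search outlined above can be sketched in Python as follows. This is only an illustrative sketch, not the FORTRAN or MATLAB code used in this article: the function names `boxcox` and `boxcox_lambda` are hypothetical, the data are assumed to be positive already (any shift δx must be applied beforehand), and the score being maximised is the standard Box–Cox profile log-likelihood, which should be equivalent to minimising the residual sum of squares of the geometric-mean-normalised transform.

```python
import math

def boxcox(x, lam):
    """Box-Cox transform of positive values x for a given lambda (Equation (3))."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def boxcox_lambda(x, a=-5.0, b=5.0, w=0.01):
    """Grid-search estimate of lambda on [a, b] with box width w.

    Assumption: the score is the profile log-likelihood
    -(m/2) * ln(var(Y)) + (lambda - 1) * sum(ln x).
    """
    if min(x) <= 0:
        raise ValueError("data must be positive; apply a shift first")
    m = len(x)
    sum_log = sum(math.log(v) for v in x)
    n_boxes = int(round((b - a) / w))
    best_lam, best_ll = None, -math.inf
    for j in range(n_boxes + 1):
        lam = a + j * w
        try:
            y = boxcox(x, lam)
            mean_y = sum(y) / m
            var_y = sum((v - mean_y) ** 2 for v in y) / m
            ll = -0.5 * m * math.log(var_y) + (lam - 1.0) * sum_log
        except (OverflowError, ValueError):
            continue  # this lambda is numerically unusable for these data
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam
```

For log-normally distributed data the estimate should lie close to λ = 0, because the logarithm is then the normalizing transformation.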

If λ > 1, the left tail of the distribution is more pronounced than the right tail, and the PDF is said to have negative skewness. If λ < 1, the reverse is true and the PDF has positive skewness. If λ = 1, it has zero skewness.

The STP method

Guttal and Jayaprakash (2008) investigated the qualitative changes in the shape of the distribution that are observed in the time series of a state variable as one approaches a threshold point. They illustrated this using numerical simulations of a one-variable vegetation model, and found that the skewness increases almost monotonically as the model parameter increases with a fixed external noise distribution, or as the variance of the external noise increases with the other parameters held fixed. The parameter λ in the Box–Cox transformation is a qualitative and quantitative indicator of the skewness. On the basis of these characteristics of skewness and the Box–Cox transformation, STP is defined as follows. For a time series, first select a sliding window and calculate the transformation parameter λ of the data in the window by using the Box–Cox transformation. Next, move the window forward while keeping its length unchanged, and repeat the computation until the end of the analysed time series is reached. If there is no abrupt PDF change in the time series, the transformation parameter will show only relatively small changes, which are mainly caused by the limited sample size; whereas if there is an abrupt change in the PDF, the transformation parameter will show a corresponding change in the vicinity of the abrupt change point. If the change of the transformation parameter is statistically significant, an abrupt PDF change in the time series can be detected by using STP.
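
The sliding-window procedure can be sketched as below. Several details are assumptions made only for this illustration and are not specified in the text: the names `boxcox_lambda` and `stp`, the default of non-overlapping windows (`step = window`), the global shift δx = 1 − min(x) used to make the data positive, and the use of a grid search over the profile log-likelihood as the λ estimator.

```python
import math

def boxcox_lambda(x, grid):
    """Pick the lambda in `grid` that maximises the Box-Cox profile log-likelihood."""
    m = len(x)
    logs = [math.log(v) for v in x]
    sum_log = sum(logs)
    best_lam, best_ll = None, -math.inf
    for lam in grid:
        try:
            y = logs if abs(lam) < 1e-12 else [(v ** lam - 1.0) / lam for v in x]
            mean_y = sum(y) / m
            var_y = sum((v - mean_y) ** 2 for v in y) / m
            ll = -0.5 * m * math.log(var_y) + (lam - 1.0) * sum_log
        except (OverflowError, ValueError):
            continue  # skip lambdas that are numerically unusable for this window
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam

def stp(series, window, step=None, grid=None):
    """Sliding transformation parameter: one lambda per window position."""
    step = window if step is None else step            # assumed: non-overlapping windows
    if grid is None:
        grid = [-5.0 + 0.01 * j for j in range(1001)]  # A = -5, B = 5, w = 0.01
    delta = 1.0 - min(series)                          # assumed shift: x + delta >= 1
    shifted = [v + delta for v in series]
    return [boxcox_lambda(shifted[s:s + window], grid)
            for s in range(0, len(shifted) - window + 1, step)]
```

Applied to a series whose first half is right-skewed and whose second half is left-skewed, the returned λ values should jump from small to large values at the junction, which is the kind of signature that STP looks for.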

Data

In this article, the model time series are generated by the Logistic map (May, 1976). Mathematically, the Logistic map is written as

$$ x_{n+1} = u\,x_n (1 - x_n) \tag{6} $$

where xn is a number between 0 and 1, representing the population at year n, and x0 represents the initial population (at year 0). The parameter u is a positive number varying from 0 to 4, and represents a combined rate for reproduction and starvation. The onset of chaos is at u ≈ 3.57. Slight variations in the initial population yield dramatically different results over time, a prime characteristic of chaos. Here, x0 = 0.8, u = 3.8, and v = 1.0. Unless otherwise stated, the parameters of Equation (6) are kept unchanged in the text below.
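
As an illustration, the classic Logistic map of May (1976), x(n+1) = u·x(n)·(1 − x(n)), can be iterated as follows. The function name `logistic_map` is hypothetical, and the sketch assumes the classic form only; the parameter v mentioned above is not modelled.

```python
def logistic_map(n, u=3.8, x0=0.8):
    """Iterate x_{k+1} = u * x_k * (1 - x_k) and return n values starting from x0."""
    x = [x0]
    for _ in range(n - 1):
        x.append(u * x[-1] * (1.0 - x[-1]))
    return x

# Two runs whose initial populations differ by 1e-9 drift apart quickly,
# which illustrates the sensitivity to initial conditions described above.
a = logistic_map(200, x0=0.8)
b = logistic_map(200, x0=0.8 + 1e-9)
largest_gap = max(abs(p - q) for p, q in zip(a, b))
```

With u = 3.8 every iterate stays inside the interval (0, 0.95], since u/4 is the maximum of the map.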

In order to show the performance of STP on meteorological time series which contain two artificial abrupt PDF changes, and to produce a thorough comparison with some current techniques, 1000 simulated daily precipitation series have been generated by a stochastic weather generator (Richardson, 1981; Srikanthan and McMahon, 2001; Liao et al., 2004).

The observational data used in this article are daily precipitation records (from 1 January 1961 to 31 December 2010) provided by National Meteorological Information Center, China Meteorological Administration.

Results

Performance tests of STP on model time series

In this section, we first test the performance of STP on model time series generated by the Logistic map. Two abrupt change cases have been studied. In the first case, the evolution of the population is dominated by the classic Logistic model at first, and then there is an abrupt decrease in the parameter u at a certain time instant. In the second case, the evolution of the population is also dominated by the classic Logistic model at first, but the evolutionary dynamic equation suddenly displays random behaviour at a certain time instant because of an abrupt disaster such as disease or abrupt climate change.

On the basis of the first case, the first model time series (FMTS) is generated by Equation (6) with a sample size of 20 000. An abrupt parameter change occurs at time instant n = 10 001, at which the parameter u decreases abruptly from 3.8 to 3.7 (Figure 1(a)). It is easy to see that the standard deviation of the segment with u = 3.8 is clearly larger than that of the segment with u = 3.7 in FMTS. To avoid this pitfall, the two segments of FMTS are standardized separately by the following formulas:

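$$ x'_i = \frac{x_i - \bar{x}_1}{\sigma_1}, \quad i = 1, 2, \ldots, 10\,000 \tag{7} $$

$$ x'_i = \frac{x_i - \bar{x}_2}{\sigma_2}, \quad i = 10\,001, \ldots, 20\,000 \tag{8} $$
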
Figure 1.

The model time series FMTS, in which there is an abrupt parameter change from 3.8 to 3.7 at time instant n = 10 001. (a) The model time series FMTS generated by the Logistic map on the basis of the first abrupt change case; the sample size is 20 000. (b) The normalized FMTS.

Here, $\bar{x}_1$ and $\bar{x}_2$ are the means before and after the abrupt parameter change, respectively, and σ1 and σ2 are the corresponding standard deviations. Figure 1(b) shows the normalized FMTS. The following tests on FMTS are all based on the normalized time series, which is still denoted FMTS in the text below. It should be noted that the standardization procedure does not change the shape of the distribution of the model time series. Obviously, it is difficult to identify the abrupt change in FMTS without using an abrupt change detection method.
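
The segment-wise standardization can be sketched as follows. The function name `standardize_segments` and the zero-based index `change` (the position of the first sample after the change) are assumptions made for this illustration, as is the use of the population standard deviation.

```python
import math

def standardize_segments(x, change):
    """Standardise x[:change] and x[change:] separately to zero mean and unit std."""
    out = []
    for segment in (x[:change], x[change:]):
        mean = sum(segment) / len(segment)
        std = math.sqrt(sum((v - mean) ** 2 for v in segment) / len(segment))
        out.extend((v - mean) / std for v in segment)
    return out

# Two segments with the same shape but different level and spread
# become identical after standardization.
z = standardize_segments([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], change=3)
```

Because each segment is only shifted and rescaled, its skewness and kurtosis are unchanged, so differences in mean and variance are removed while the shape of each distribution is kept.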

We calculate STP from the model time series FMTS by using the Box–Cox transformation, and find that most values of the transformation parameter λ are greater than 1 for sliding window lengths L = 100, 200, and 500, as well as for other values of L. On the basis of the physical meaning of the transformation parameter, the PDF of the model time series FMTS has a negative skewness. The evolution of STP can be divided into two distinct phases. Generally, the transformation parameter before n = 10 001 is smaller than that after n = 10 000 (Figure 2). The fluctuation range of the transformation parameter decreases as the length of the sliding window increases, mainly because a larger sample size reduces the statistical uncertainty. Many tests have been repeated for different sizes of sliding windows, and the results are consistent. So, the sample size of the sliding window has only a small effect on the detection results of STP.

Figure 2.

The STP results for FMTS under different sample sizes of the sliding window. (a) The STP results for FMTS with sliding window size L = 100; (b) same as (a), but for L = 200; (c) same as (a), but for L = 500.

The successful application to the single case above is not enough to prove the effectiveness of STP. In order to further test the performance of STP, we design another abrupt dynamic change case. The second model time series (SMTS) is generated by the classic Logistic model and a random number generator (uniform distribution), and the sample size of SMTS is 20 000. An abrupt dynamic change occurs at time instant n = 10 001, at which the dynamic equation abruptly changes from the Logistic map to a random one (Figure 3(a)). As in the pretreatment of FMTS, the standardization procedure has been applied to SMTS, and the following tests are all based on the normalized time series. The normalized SMTS is shown in Figure 3(b) and is still denoted SMTS.

Figure 3.

The model time series SMTS, in which there is an abrupt dynamic change at time instant n = 10 001. (a) The model time series SMTS generated on the basis of the second abrupt change case; the sample size is 20 000. (b) The normalized SMTS.

The STP curves for SMTS are shown in Figure 4 for L = 100, 200, and 500. We find that the fluctuation range of the transformation parameter is relatively steady both before and after n ≈ 10 001. There is an abrupt change of the transformation parameter from a relatively large value to a relatively small value at n ≈ 10 001. The results obtained under the three different window sizes are similar, and the time instants of abrupt change given by STP almost coincide with the real one in SMTS. This means that the STP method can be used to identify and detect an abrupt PDF change caused by an abrupt dynamic change. It is easy to see that the larger the window size, the easier it is to identify the abrupt change point. Similar results were observed in numerous similar experiments on model time series.

Figure 4.

The STP results for SMTS under different sample sizes of the sliding window. (a) The STP results for SMTS with sliding window size L = 100; (b) same as (a), but for L = 200; (c) same as (a), but for L = 500.

Testing the performance of the STP method on meteorological time series which contain two or more abrupt change points is greatly conducive to the application of STP in climate research. To this end, a set of daily precipitation data with two abrupt PDF changes, occurring between the 10th and 11th years and between the 20th and 21st years, was simulated by a stochastic weather generator in this article. Each precipitation series is 30 years long. To investigate the performance of the STP method on these precipitation data thoroughly, 1000 series with identical statistical features (Table 1) are generated.

Table 1. Statistical features of the simulated daily precipitation data
Year    Mean   Standard deviation   Standard error   CS     CK
1–10    1.13   4.75                 0.046            7.59   82.69
11–20   1.45   5.25                 0.051            7.27   86.25
21–30   1.38   5.83                 0.057            8.21   102.8

Two examples of the simulated daily precipitation data with two artificial abrupt PDF changes are presented in Figure 5(a) and (b). It can be seen that there are two abrupt changes in the mean of the transformation parameter (Figure 5(c) and (d)) for both of the series shown in Figure 5(a) and (b). The identification of the change point is left to the judgement of the analyst, based on a visual analysis of the lambda statistics graph. Used in this way, STP is not an automatic method, but only a subjective inspection tool. In order to identify the change points in the lambda statistics graph quantitatively, the MTT procedure (Ducré-Robitaille et al., 2003; Xiao and Li, 2007) can be applied to provide an objective identification of the change point position.

Figure 5.

Two simulated daily precipitation series and the corresponding detection results by STP. (a) and (b) are the simulated series with two artificial abrupt PDF changes: one occurs between the 10th and 11th years, and the other between the 20th and 21st years; (c) the STP results for the series in (a), with a sliding window size of one year; (d) same as (c), but for (b); (e) and (f) the MTT results for the STP series shown in (c) and (d), respectively, at the significance level 0.01. The solid line is the critical t-statistic and the square dots represent the t-statistics; if an extremum of the t-statistic exceeds the critical line, then there is an abrupt change in the mean.

MTT detects abrupt change by examining whether the difference between the mean values of two subsamples is significant. For a time series of length n, {Xi, i = 1, 2, …, n}, a cutting point is moved along the series, and at each position the two subseries (x1 and x2) before and after it are compared.

The t-test is defined as follows:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \tag{9} $$

where $\bar{x}_1$ and $\bar{x}_2$ are the averages over n1 and n2 points before and after a potential discontinuity, $s = \sqrt{(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2 - 2)}$, n1 and n2 are the numbers of points, and $s_1^2$ and $s_2^2$ are the variances of the two subseries. The significance level of the t-test is 0.01 in this article: if an extremum of the t-statistic exceeds the critical t-statistic, then there is an abrupt mean change in the series. The MTT results indicate that there are two abrupt changes in the mean of each of the lambda series shown in Figure 5(c) and (d) at the significance level α = 0.01 (Figure 5(e) and (f)). The detection results by MTT are robust for the subseries n1 = n2 = 10.
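
A minimal sketch of the MTT is given below. It assumes the common pooled-variance form of the statistic, t = (mean1 − mean2) / (s·sqrt(1/n1 + 1/n2)) with s² = (n1·var1 + n2·var2)/(n1 + n2 − 2); the function names `moving_t_test` and `change_points` are hypothetical, and the critical value must be supplied by the user (for n1 = n2 = 10, i.e. 18 degrees of freedom, the two-sided critical value at α = 0.01 is about 2.878).

```python
import math

def moving_t_test(x, n1=10, n2=10):
    """Return a list of (i, t), where t compares x[i-n1:i] with x[i:i+n2]."""
    results = []
    for i in range(n1, len(x) - n2 + 1):
        before, after = x[i - n1:i], x[i:i + n2]
        m1, m2 = sum(before) / n1, sum(after) / n2
        v1 = sum((v - m1) ** 2 for v in before) / n1   # variance before the cut
        v2 = sum((v - m2) ** 2 for v in after) / n2    # variance after the cut
        s = math.sqrt((n1 * v1 + n2 * v2) / (n1 + n2 - 2))
        if s == 0.0:
            continue                                   # both subseries constant: t undefined
        t = (m1 - m2) / (s * math.sqrt(1.0 / n1 + 1.0 / n2))
        results.append((i, t))
    return results

def change_points(results, t_crit):
    """Cut positions where |t| is a local extremum exceeding the critical value."""
    hits = []
    for k in range(1, len(results) - 1):
        i, t = results[k]
        if (abs(t) > t_crit
                and abs(t) >= abs(results[k - 1][1])
                and abs(t) >= abs(results[k + 1][1])):
            hits.append(i)
    return hits

# Toy series with a jump in the mean at index 20.
toy = [0.0, 0.1] * 10 + [1.0, 1.1] * 10
toy_results = moving_t_test(toy)
toy_hits = change_points(toy_results, t_crit=2.878)
```

In this toy series the largest |t| occurs at the cut position i = 20, which is the first sample after the jump.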

The STP detection results for 1000 daily precipitation series simulated by a stochastic weather generator are all shown in Figure 6(a), and the corresponding MTT results are presented in Figure 6(b) with the significance level α = 0.01. It is easy to see that almost all of the lambda series shown in Figure 6(a) have two abrupt changes in the mean. The change point positions identified in these lambda series by MTT are located in the 10th, 11th, 12th, 18th, 19th, and 20th years, and the percentages of these change point positions are 97.8, 2, 0.2, 0.1, 2.1, and 97.7, respectively (Table 2). Because there are two artificial abrupt PDF changes in the simulated daily precipitation data, one between the 10th and 11th years and the other between the 20th and 21st years, the detected change points in the 10th, 11th, and 12th years are very close to the first artificial one, and those in the 19th and 20th years are close to the second artificial one. Thus, the percentage of correct detections by STP is almost 100 (97.8 + 2 + 0.2) for the first artificial change point, and 99.8 (2.1 + 97.7) for the second. There is only one complete detection failure for the second artificial change point in the 1000 simulated series, namely a change point detected in the 18th year. So, the STP method has a relatively low rate of false detection. The results indicate that STP can be used not only to identify a single abrupt PDF change in model time series, but also to identify two abrupt PDF changes in simulated daily precipitation data.

Figure 6.

The STP detection results for 1000 daily precipitation series simulated by a stochastic weather generator, and the change point positions detected in the STP results by MTT. (a) The STP results for the 1000 simulated daily precipitation series, with a sliding window size of 1 year; (b) change point positions in the STP results detected by MTT with subseries n1 = n2 = 10 at the significance level α = 0.01.

Table 2. The identified change point positions in lambda series by MTT and the percentage of these detected change point for 1000 simulated daily precipitation data
Position (year)           10     11    12    18    19    20
Number of change points   978    20    2     1     21    977
Percentage                97.8   2     0.2   0.1   2.1   97.7

Comparing STP with other methods

Similar to the STP method, CS and CK can also be used to detect abrupt change in time series. For a time series, we select a sliding window and calculate the CS and CK for the first subseries. Then, the CS and CK for the second subseries are calculated by moving the window forward while keeping the sample size of the subseries unchanged. This calculation continues until all the subseries are covered. Thus, an abrupt PDF change in a time series may be detected on the basis of the evolution of the CS and CK.
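
The sliding CS and CK computation can be sketched as follows. The function names `cs_ck` and `sliding_cs_ck`, the `step` argument, and the use of simple (biased) sample moments are assumptions made for this illustration.

```python
def cs_ck(x):
    """Sample coefficient of skewness and excess kurtosis from central moments."""
    m = len(x)
    mean = sum(x) / m
    m2 = sum((v - mean) ** 2 for v in x) / m
    m3 = sum((v - mean) ** 3 for v in x) / m
    m4 = sum((v - mean) ** 4 for v in x) / m
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def sliding_cs_ck(series, window, step=1):
    """Return one (CS, CK) pair per window position along the series."""
    return [cs_ck(series[s:s + window])
            for s in range(0, len(series) - window + 1, step)]

# A right-skewed block followed by its mirror image (left-skewed):
# CS flips sign across the change while CK stays the same.
demo = [0.0, 0.0, 0.0, 0.0, 10.0] * 4 + [10.0, 10.0, 10.0, 10.0, 0.0] * 4
(cs_first, ck_first), (cs_second, ck_second) = sliding_cs_ck(demo, window=20, step=20)
```

The example also shows why CK alone can miss a change that CS detects: mirroring a distribution reverses its skewness but leaves its kurtosis unchanged.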

The performance of CS and CK on the model time series is shown in Figure 7. The results indicate that CS and CK can both detect the time instant of abrupt change in the model time series FMTS (Figure 7(a) and (b)), and the detection results are almost independent of the size of the sliding window (not shown in figures). The CS curve for the model time series SMTS can clearly be divided into two distinct states before and after the abrupt change (Figure 7(c)). Moreover, the CS results for SMTS are robust to different sizes of sliding windows, such as L = 100, 200, 400, and 500. The CK results for SMTS differ from those of CS: it is difficult to identify and detect an abrupt change by CK when the sliding window is relatively small, such as L = 100 (not shown in figures). As the sliding window size increases, CK can also identify the abrupt change in SMTS to a certain extent (Figure 7(d)), but it is difficult to detect the exact time instant of the abrupt change by CK. This is mainly because of the relatively small difference in kurtosis between the Logistic map and uniform random numbers. With even larger sliding windows, such as a sample size of 1000 or 2000, it becomes easy to identify an abrupt change in SMTS by using CK (not shown in figures).

Figure 7.

The CS and CK results for model time series FMTS and SMTS respectively, and the sample size of sliding window L = 500. (a) The CS results for FMTS; (b) the CK results for FMTS; (c) the CS results for SMTS; (d) the CK results for SMTS.

The MTT detections for the CS, CK, and ApEn results of the 1000 simulated daily precipitation series are presented in Figure 8, with subseries n1 = n2 = 10 in MTT and the significance level α = 0.01. It is easy to see that most of the change point positions detected by MTT in the CS and CK series are zero, which means that CS and CK cannot identify the two abrupt PDF changes in most of the 1000 simulated precipitation series. The percentages of correct detections are 2.2 (CS) and 0.6 (CK) for the first artificial change point, and 2.5 (CS) and 1.3 (CK) for the second (Figure 8(a) and (b)). CS detects both artificial change points simultaneously in only two of the 1000 simulated series, and CK in none. For ApEn, the percentage of correct detections is 57.2 for the first artificial change point and 79.3 for the second, and both artificial change points are detected simultaneously in 52.7% of the 1000 simulated series (Figure 8(c)).

Figure 8.

The MTT detections for the CS, CK, and ApEn results of 1000 daily precipitation series simulated by a stochastic weather generator, with subseries n1 = n2 = 10 in MTT and the significance level α = 0.01. (a) Change point positions in the 1000 CS series detected by MTT; position zero represents no change point detected; (b) same as (a), but for CK; (c) same as (a), but for ApEn.

Application of STP and several other methods in observational data

In this section, we examine the performance of STP on daily precipitation records. Six observational stations in China were randomly selected as examples. The STP results for all six daily precipitation records display an almost identical trend (Figure 9). As in the detection results for the model time series, the curve of the transformation parameter can be divided into two distinct states, before and after a time around 1979, on the basis of the fluctuation range of the transformation parameter. The results indicate that the transformation parameter undergoes a rapid shift between 1979 and 1980 from one stable state to an alternative stable state with a clearly different mean, as shown by the MTT results for the lambda series (Figure 10). Whether the subseries length is 5 or 10 years, MTT detects the abrupt change in mean between 1979 and 1980 in all six lambda series (shown in Figure 9) at the significance level α = 0.01. Thus, there is an abrupt increase in the mean of the transformation parameter, which implies an abrupt climate change between 1979 and 1980 at the six selected stations; the time detected by STP is the same as that detected by ApEn (Wang et al., 2008). All of the transformation parameters are far less than zero (Figure 9), which indicates that these daily precipitation records have positive skewness. From the behaviour of these lambda series, it can be concluded either that the number of rainy days with relatively light precipitation decreased while the number of rainy days with relatively heavy precipitation increased, or that the number of dry days decreased while the number of rainy days increased.
To clarify this, the average annual numbers of dry days and of rainy days in different precipitation classes for the six selected daily precipitation records are presented in Table 3. The number of dry days is clearly lower after 1979 than before (for example, it falls from 284.21 to 250.06 days per year at Tonghe), which results in the abrupt PDF change between 1979 and 1980. When the sliding window is enlarged to 2 years, the abrupt precipitation change detected by STP still falls between 1979 and 1980 (not shown), which indicates that the sample size of the sliding window has no significant effect on the identification of the change point.
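
The sliding estimation of lambda described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: lambda is estimated in each window by maximising the standard Box–Cox profile log-likelihood with a golden-section search on [-5, 5], the windows are consecutive and non-overlapping unless a step is given, and the names boxcox_loglik, estimate_lambda, and sliding_lambda are hypothetical. Because the Box–Cox transformation requires strictly positive data, the optional offset added to every value (needed for days with zero precipitation) is also an assumption and is not taken from this section.

```python
import math


def boxcox_loglik(lmbda, x):
    """Box-Cox profile log-likelihood of lmbda for strictly positive data x."""
    n = len(x)
    logs = [math.log(v) for v in x]
    if abs(lmbda) < 1e-12:
        y = logs  # limiting case lambda = 0: y = log(x)
    else:
        y = [(math.exp(lmbda * lv) - 1.0) / lmbda for lv in logs]
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    if var_y <= 0.0:
        raise ValueError("window is constant; lambda is undefined")
    return (lmbda - 1.0) * sum(logs) - 0.5 * n * math.log(var_y)


def estimate_lambda(x, lo=-5.0, hi=5.0, tol=1e-6):
    """Maximise the profile log-likelihood by golden-section search.

    Assumes the likelihood is unimodal on [lo, hi] (hypothetical bounds).
    """
    if min(x) <= 0.0:
        raise ValueError("Box-Cox requires strictly positive data")
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = boxcox_loglik(c, x), boxcox_loglik(d, x)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = boxcox_loglik(c, x)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = boxcox_loglik(d, x)
    return 0.5 * (a + b)


def sliding_lambda(series, window, step=None, offset=0.0):
    """Return one lambda estimate per window of `window` samples."""
    step = window if step is None else step
    lambdas = []
    for start in range(0, len(series) - window + 1, step):
        chunk = [v + offset for v in series[start:start + window]]
        lambdas.append(estimate_lambda(chunk))
    return lambdas
```

For daily records, a call such as sliding_lambda(daily_pr, window=365, offset=0.1) would give roughly one lambda value per year; the 0.1 mm offset is purely illustrative, and leap days are ignored in this sketch.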

Figure 9.

The STP results for daily precipitation records (from 1 January 1961 to 31 December 2010) at six randomly selected observational stations; the sample size of the sliding window is 1 year. (a) Tonghe observational station in Heilongjiang Province; (b) Hailun observational station in Heilongjiang Province; (c) Qiqihaer observational station in Heilongjiang Province; (d) Zhalantun observational station in Inner Mongolia; (e) Sunwu observational station in Heilongjiang Province; (f) Nenjiang observational station in Heilongjiang Province.

Figure 10.

MTT detection for the STP results in Figure 9. (a) Subseries n1 = n2 = 5 at the significance level 0.01; the solid line is the critical t-statistic and the square dots are the t-statistics; (b) same as (a), but for n1 = n2 = 10.

Table 3. Average annual number of dry days and rainy days of different precipitation for six selected daily precipitation records
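| Year | Precipitation (Pr) (mm) | Tonghe | Hailun | Qiqihaer | Zhalantun | Sunwu | Nenjiang |
|---|---|---|---|---|---|---|---|
| 1961–1979 | Pr = 0 | 284.21 | 291.68 | 307.37 | 300.63 | 289.79 | 296.84 |
| | 0 < Pr ≤ 10 | 64.11 | 57.95 | 46.79 | 51.47 | 60.47 | 55.47 |
| | 10 < Pr ≤ 30 | 14.68 | 12.32 | 9.05 | 10.42 | 12.26 | 10.89 |
| | 30 < Pr ≤ 50 | 1.74 | 2.47 | 1.42 | 1.84 | 2.32 | 1.47 |
| | 50 < Pr ≤ 100 | 0.47 | 0.74 | 0.58 | 0.84 | 0.37 | 0.53 |
| | 100 < Pr | 0 | 0.05 | 0 | 0 | 0 | 0 |
| 1980–2010 | Pr = 0 | 250.06 | 262.48 | 281.22 | 279 | 251.13 | 265.97 |
| | 0 < Pr ≤ 10 | 97.84 | 86.74 | 70.77 | 71.74 | 98.22 | 84.58 |
| | 10 < Pr ≤ 30 | 14.61 | 12.74 | 10.48 | 11.13 | 13.10 | 12.35 |
| | 30 < Pr ≤ 50 | 2.13 | 2.61 | 2.03 | 2.71 | 2.29 | 2.03 |
| | 50 < Pr ≤ 100 | 0.58 | 0.68 | 0.68 | 0.58 | 0.45 | 0.29 |
| | 100 < Pr | 0.03 | 0 | 0.06 | 0.09 | 0.06 | 0.03 |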
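
Class counts of the kind listed in Table 3 could be obtained along the following lines. This is only a sketch: the function name average_annual_counts and the input layout (a dictionary mapping each year to its list of daily precipitation values in mm) are assumptions made for illustration, and the class edges are those of Table 3.

```python
def average_annual_counts(daily_by_year, edges=(10.0, 30.0, 50.0, 100.0)):
    """Average number of days per year in each precipitation class.

    daily_by_year: dict mapping year -> list of daily precipitation (mm).
    Classes: Pr = 0, then (0, 10], (10, 30], (30, 50], (50, 100], > 100.
    """
    labels = ["Pr = 0"]
    lower = 0.0
    for upper in edges:
        labels.append(f"{lower:g} < Pr <= {upper:g}")
        lower = upper
    labels.append(f"{lower:g} < Pr")

    totals = {label: 0 for label in labels}
    for values in daily_by_year.values():
        for pr in values:
            if pr == 0:
                totals["Pr = 0"] += 1  # dry day
            else:
                # Number of edges exceeded selects the wet-day class.
                idx = sum(1 for upper in edges if pr > upper)
                totals[labels[idx + 1]] += 1

    n_years = len(daily_by_year)
    return {label: totals[label] / n_years for label in labels}
```

Applying the function separately to the years 1961–1979 and 1980–2010 would give the two blocks of one station's column.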

The ApEn results for the daily precipitation records at the six selected stations are shown in Figure 11; the sample size of the sliding window is 1 year. As with the STP results, the ApEn curve can be divided into two distinct states, roughly before and after 1979, on the basis of the fluctuation range of ApEn; that is, there is a shift in the mean of ApEn. To identify the change points quantitatively, MTT is applied to detect abrupt changes in mean in these ApEn series (Figure 12). When the subseries are n1 = n2 = 5, the change points detected by MTT are 1979 at Tonghe and 1980 at Sunwu and Nenjiang, and none of the t-statistics exceeds the critical threshold in the other three ApEn series (Figure 12(a)). When the subseries are increased to n1 = n2 = 10, the change points are 1978 at Tonghe, 1979 at Hailun and Sunwu, and 1981 at Qiqihaer and Nenjiang, and no change point is detected at Zhalantun at the significance level α = 0.01 (Figure 12(b)). However, a change point can still be identified there when the subseries size is varied; for example, the detected change point at Zhalantun is 1979 for both n1 = n2 = 11 and n1 = n2 = 12.
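
For reference, approximate entropy in its usual definition can be computed as in the sketch below. The embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation of the window are commonly used defaults and are assumptions here, not values taken from this article; the function name approximate_entropy is likewise hypothetical.

```python
import math


def approximate_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy ApEn(m, r) with r = r_factor * std(x).

    Self-matches are counted, so every logarithm is well defined.
    """
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * sd

    def phi(dim):
        count = n - dim + 1  # number of templates of length dim
        total = 0.0
        for i in range(count):
            matches = 0
            for j in range(count):
                # Chebyshev distance between templates i and j within r
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(dim)):
                    matches += 1
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)
```

Evaluating this function on each 1-year window gives an ApEn series to which MTT can then be applied; the cost grows quadratically with the window length, which is acceptable for windows of a few hundred days.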

Figure 11.

The ApEn results for daily precipitation records (from 1 January 1961 to 31 December 2010) in six observational stations, and the sample size of sliding window is one year. (a) The ApEn results for Tonghe observational station in Heilongjiang Province; (b) Hailun observational station in Heilongjiang Province; (c) Qiqihaer observational station in Heilongjiang Province; (d) Zhalantun observational station in Inner Mongolia; (e) Sunwu observational station in Heilongjiang Province; (f) Nenjiang observational station in Heilongjiang Province.

Figure 12.

MTT detection of the ApEn results in Figure 11. (a) Subseries n1 = n2 = 5 at the significance level 0.01; the solid line is the critical t-statistic and the square dots are the t-statistics; (b) same as (a), but for n1 = n2 = 10.

To demonstrate the advantages of the lambda parameter more robustly, additional statistics (e.g. percentiles, annual maxima, and parameters of the distribution function) should be included in the comparison for daily precipitation data. The annual maxima and percentiles of the daily precipitation records at the six selected stations have therefore been calculated. However, it is difficult to identify the abrupt PDF change between 1979 and 1980 from the annual maxima and percentiles. The CS and CK methods also fail to detect this abrupt change.
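
The comparison statistics mentioned here are simple to compute for each yearly window, as the following sketch suggests. The exact definitions used in the article are not stated in this section, so the choices below are assumptions: the percentile uses the nearest-rank rule, the coefficient of skewness is the third standardised moment, and the coefficient of kurtosis is the fourth standardised moment without subtracting 3. All function names are hypothetical.

```python
import math


def annual_maximum(values):
    """Largest daily value in one year."""
    return max(values)


def percentile_nearest_rank(values, p):
    """p-th percentile (0 < p <= 100) by the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def _central_moment(values, order):
    """Central moment of the given order with a 1/n normalisation."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def skewness(values):
    """Coefficient of skewness: m3 / m2^(3/2)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Coefficient of kurtosis: m4 / m2^2 (equal to 3 for a normal PDF)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2
```

Each function returns one number per window, so the resulting yearly series can be examined with MTT in the same way as the lambda series.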

Discussion and conclusions

On the basis of the changing skewness of a time series, a novel method, STP, is developed in this article for abrupt change detection. Tests on model time series and on 1000 daily precipitation series simulated by a stochastic weather generator demonstrate the ability of STP to identify abrupt PDF changes in time series. The sample size of the sliding window has some influence on the value of the Box–Cox transformation parameter (lambda), but it does not affect the trend of the lambda series and does not significantly affect the identification of the change point. It must be pointed out that STP by itself only provides a visual analysis of the lambda series, and a traditional change point detection method (e.g. MTT) has to be applied to the lambda series to identify the change point position objectively.
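
A moving t-test of the kind applied to the lambda series can be sketched as follows. This is a minimal illustration that assumes the standard two-sample t-statistic with pooled variance; the function name moving_t_test and the convention that the reported index is the first sample of the second subseries are choices made here for illustration. The critical value is left to the caller: for a two-sided test at α = 0.01 it is about 3.355 for n1 = n2 = 5 (8 degrees of freedom) and about 2.878 for n1 = n2 = 10 (18 degrees of freedom).

```python
import math


def moving_t_test(series, n1, n2):
    """Return (index, t) pairs for every admissible split point.

    index is the position of the first sample of the 'after' subseries;
    t compares the n2 samples from index onward with the n1 samples before.
    """
    results = []
    for i in range(n1, len(series) - n2 + 1):
        before = series[i - n1:i]
        after = series[i:i + n2]
        mean1 = sum(before) / n1
        mean2 = sum(after) / n2
        ss1 = sum((v - mean1) ** 2 for v in before)
        ss2 = sum((v - mean2) ** 2 for v in after)
        pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
        denom = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
        if denom > 0.0:
            t = (mean2 - mean1) / denom
        elif mean2 == mean1:
            t = 0.0
        else:
            t = math.copysign(math.inf, mean2 - mean1)
        results.append((i, t))
    return results


def detect_change_point(series, n1, n2, t_critical):
    """Index with the largest |t| if it exceeds t_critical, else None."""
    index, t = max(moving_t_test(series, n1, n2), key=lambda r: abs(r[1]))
    return index if abs(t) > t_critical else None
```

For a yearly lambda series that starts in 1961, a returned index of 19 would correspond to 1980, that is, to a change between 1979 and 1980.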

Comparing STP with CS, CK, and ApEn, we find that the percentage of correct detections by CS and CK does not exceed 2.5% for the 1000 simulated series. ApEn detects both artificial change points simultaneously in 52.7% of the 1000 simulated series, which shows that ApEn can identify an abrupt PDF change to a certain extent. In contrast, the percentage of correct detections by STP is almost 100% for the first artificial change point and 99.8% for the second. Clearly, STP outperforms CS, CK, and ApEn on the simulated time series.

The application of STP to daily precipitation data shows that an abrupt precipitation change occurred between 1979 and 1980 at the selected stations in China. ApEn can also identify an abrupt change in daily precipitation records, but relatively long subseries (for example, n1 = n2 = 11 or n1 = n2 = 12) must be selected when MTT is applied to the ApEn series to identify the change point. In contrast, whether the subseries length is 5 or 10 years, MTT detects the abrupt change in mean between 1979 and 1980 in all six lambda series. The STP method therefore performs better than ApEn in detecting an abrupt PDF change in daily precipitation data. CS, CK, percentiles, and annual maxima cannot identify the abrupt PDF change in daily precipitation data. Overall, the present method performs better than CS, CK, ApEn, and the two other statistics (percentiles and annual maxima).

Extracting the parameter lambda by the Box–Cox transformation projects the shape of an empirical PDF smoothly onto a single number. For the simulated time series, it must be pointed out that a small change in lambda, even between time series with identical skewness, is mainly caused by insufficient sample size. As the sample size increases, the change in lambda for identical skewness is largely reduced. Lambda is not a universal statistic that can detect all kinds of abrupt change in time series; the present method mainly deals with an abrupt PDF change. Indeed, if the mean and the variance change without a change in the distribution type, a change in lambda occurs if the window used to compute lambda contains the abrupt change in mean and/or variance. Whether this change is large enough to be detected in a real case depends on the significance of the difference in means and/or variances. In such cases, methods designed to detect abrupt changes in mean and/or variance should be used: MTT, the multiple linear regression method (Vincent, 1998), SNHT (Alexandersson, 1986), and two-phase regression (Reeves et al., 2007) can detect abrupt changes in mean, and the Downton–Katz test (Karl and Williams, 1987) can detect abrupt changes in variance. Furthermore, following the idea of Guttal and Jayaprakash (2008), the changing skewness obtained from the Box–Cox transformation could also be used as an early warning signal. This could be addressed in a future study.

Acknowledgements

The authors thank the anonymous reviewers and editors for their helpful suggestions on this manuscript. This research was jointly supported by the National Basic Research Program of China (973 Program) (2012CB955301 and 2012CB955902), the National Natural Science Foundation of China (41275074, 40905034, 41175067, and 41005041), and the Special Scientific Research Fund of Meteorological Public Welfare Profession of China (GYHY201106015 and GYHY201106016).
