Robustness Assessment of the RSD t‐Test for Detecting Trend Turning in a Time Series

Trend turning (or trend change) is a type of structural change that is common in climate data, and methods for detecting it in time series with multiple turning-points still need to be developed. A recently developed method for this purpose, the running slope difference (RSD) t-test, examines trend differences between sub-series of the sample time series to identify trend turning-points. In this paper, we use Monte Carlo simulation to evaluate the method's detection ability. The evaluation shows that the method is an effective tool for detecting trend turning in time series and identifies three major advantages of the RSD t-test: the ability to detect multiple turning-points, the capacity to detect all three types of trend turning, and a low false alarm rate.

Trend turning is a commonly observed phenomenon in climate data. It can be found in variables such as surface air temperature (SAT; Xing et al., 2017; Yao et al., 2017; Yu & Lin, 2018), precipitation (Alexander et al., 2006), tropical cyclone frequency (Cheung et al., 2015), and haze days (Zhao et al., 2016). Some ecoclimate studies have also reported trend turning (Liang et al., 2015; Peng et al., 2011; Piao et al., 2011). Detecting climate trend turning helps to reveal the mechanisms of climate change and the internal relationships between different climate variables. A well-known example is the trend turning in global SAT. Since the 20th century, global SAT has followed a multidecadal pattern of warming, then a warming slowdown, then warming again (Li et al., 2013; Xing et al., 2017; Yao et al., 2017; Yu & Lin, 2018). Studies have shown that the North Atlantic Oscillation (NAO) has a strong impact on multidecadal SAT variability (Li et al., 2013; Li et al., 2019; Sun et al., 2015; Wang et al., 2017). A positive NAO strengthens the Atlantic meridional overturning circulation (AMOC), which produces a positive phase of the Atlantic Multidecadal Oscillation (AMO). The AMO can affect SAT not only in pan-Atlantic regions but also over the Northern Hemisphere and even globally (Kravtsov & Spannagle, 2008; Sun et al., 2015; Sun et al., 2017; Sun et al., 2019a; Sun et al., 2019b; Tung & Zhou, 2013; Wyatt & Curry, 2014; Wyatt & Peters, 2012).
Methods for detecting trend turning currently require further development. Existing methods are often restricted to a single turning-point (Andrews, 1993; Chu & White, 1992; Perron & Zhu, 2005; Ploberger & Kramer, 1996; Toms & Lesperance, 2003), and methods for multiple turning-points are relatively few and inadequate. Optimal piecewise linear regression (OPLR; Liu et al., 2010; Tomé & Miranda, 2004) and the running trend test (RTT; Meehl et al., 2011; Thanasis et al., 2011) are the two methods commonly used to detect multiple turning-points in time series. However, both OPLR and RTT struggle to detect turning between trends of the same sign but different magnitude (TD type; Zuo et al., 2019).
Trend turning can be divided into three main types according to the combination of trends before and after the turning-point:

- Trend Reversal (TR) turning is a change from a significant positive trend to a significant negative trend, or vice versa.
- Trend/No-trend (TN) turning is a change from a significant linear trend to no significant trend, or vice versa.
- Same sign but Different Trend magnitude (TD) turning occurs between two significant trends of the same sign (both positive or both negative) but different magnitude.

TD turning is as important as TN and TR turning in climate research. It is quite common in observational data closely related to climate change, for example global atmospheric CO₂ concentration (Zuo et al., 2019) and Arctic seasonal sea ice extent (Meehl et al., 2018). However, the existing OPLR and RTT methods cannot effectively detect TD turning.

OPLR finds the piecewise regression of the sample time series that minimizes the residual sum of squares. It does so under the condition that the trends of two adjacent sub-series have opposite signs (Tomé & Miranda, 2004). This condition rejects any trend turning in which both sub-series have the same sign, that is, every TD turning.

RTT applies a trend significance test in running windows along the sample time series. Each test has one of three outcomes: significant positive trend, significant negative trend, or no significant trend. If the outcome changes after a certain point, RTT returns that point as a turning-point (Thanasis et al., 2011). Both sides of a TD turning have a significant trend of the same sign, so the outcome does not change, and RTT cannot effectively detect this type of turning either.
We recently proposed the running slope difference (RSD) t-test as a novel method for detecting trend turning (Zuo et al., 2019). It can detect multiple turning-points of all three types, and it uses a principle different from that of either OPLR or RTT. The method applies a t-test to the slope difference between two adjacent sub-series to identify turning-points. This works because a difference in sub-series slopes on either side of the turning-point is the basic characteristic shared by all three turning types.
Our previous study was based on an idealized time series and several climate time series. It reported that the RSD t-test performs better than either OPLR or RTT in identifying TD trend turning, and that it has a lower probability of false alarms than the two existing methods (Zuo et al., 2019). However, those results came from case studies, so the robustness of the RSD t-test has not yet been assessed. The factors that may influence its detection ability have not been discussed either. In this article, we assess the detection ability of the RSD t-test and study the factors that influence its performance through several Monte Carlo simulation experiments (Tomé & Miranda, 2005; Yue et al., 2002). These factors include the trend turning amplitude, the noise amplitude, and interference from nearby turning-points in cases with multiple turning-points (see Table 1 for details). This paper is intended to give researchers a useful reference when choosing a trend turning detection method.
The remainder of this article is structured as follows. Section 2 describes the RSD t-test method and how its detection ability is evaluated. Sections 3 and 4 discuss its ability to avoid false acceptance and false rejection errors, respectively. Section 5 discusses a simple and useful pretreatment for small sample series. Finally, section 6 presents the conclusions.

The RSD t-Test Method
The statistical model of trend turning of Miao (1988) suggests that its basic feature is a significant trend difference between the two sub-series around the turning-point (β₁ ≠ β₂). Accordingly, the RSD t-test uses a statistical t-test of slope differences to identify turning-points.
Let Y: {y_i = β_Y i + α_Y + ε_i | 1 ≤ i ≤ n} and Z: {z_j = β_Z j + α_Z + ε_j | 1 ≤ j ≤ m} be two sample sub-series. Assume that the error terms ε_i and ε_j are independent, normally distributed random variables with zero mean and the same variance σ². The t-test statistic for the slope difference has the form

$$t_{slope} = \frac{\hat{\beta}_Y - \hat{\beta}_Z}{S\sqrt{1/N + 1/M}} = \frac{\sqrt{C}\,(\hat{\beta}_Y - \hat{\beta}_Z)}{S}, \qquad C = \frac{NM}{N + M},$$

$$S^2 = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \sum_{j=1}^{m}(z_j - \hat{z}_j)^2}{n + m - 4}, \qquad N = \sum_{i=1}^{n}(i - \bar{i})^2, \qquad M = \sum_{j=1}^{m}(j - \bar{j})^2,$$

where:

- β̂_Y and β̂_Z are the least squares linear slopes of Y and Z, respectively;
- ŷ_i and ẑ_j are the least squares regression values of y_i and z_j, respectively;
- N and M are two known constants determined by n and m, respectively.

The null hypothesis of no slope difference between Y and Z is rejected at significance level α if |t_slope| ≥ t_{f,1−α/2}. Here f = n + m − 4 is the number of degrees of freedom of t_slope.
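The statistic above can be computed directly. The following minimal sketch uses only the Python standard library, and `ols_fit`, `t_slope`, and `t_critical` are hypothetical helper names. The critical value t_{f,1−α/2} is obtained here by numerically integrating the Student-t density. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
import math


def ols_fit(values):
    """Least squares fit of values against the time index 1..n.

    Returns (slope, residual sum of squares, N), where N = sum((i - mean_i)**2)
    depends only on n and equals n * (n**2 - 1) / 12.
    """
    n = len(values)
    x_mean = (n + 1) / 2
    y_mean = sum(values) / n
    big_n = sum((i - x_mean) ** 2 for i in range(1, n + 1))
    slope = sum((i - x_mean) * (y - y_mean)
                for i, y in enumerate(values, start=1)) / big_n
    rss = sum((y - y_mean - slope * (i - x_mean)) ** 2
              for i, y in enumerate(values, start=1))
    return slope, rss, big_n


def t_slope(y, z):
    """Slope-difference t statistic and its degrees of freedom f = n + m - 4."""
    beta_y, rss_y, big_n = ols_fit(y)
    beta_z, rss_z, big_m = ols_fit(z)
    dof = len(y) + len(z) - 4
    s = math.sqrt((rss_y + rss_z) / dof)   # pooled residual standard deviation S
    c = big_n * big_m / (big_n + big_m)     # C = NM / (N + M)
    return math.sqrt(c) * (beta_y - beta_z) / s, dof


def t_critical(dof, confidence=0.95, steps=2000):
    """Two-sided critical value t_{f, 1 - alpha/2} (stdlib only).

    Integrates the Student-t density with Simpson's rule and bisects for the
    point where P(0 <= T <= x) = confidence / 2; adequate for moderate dof.
    """
    log_k = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))

    def pdf(t):
        return math.exp(log_k - (dof + 1) / 2 * math.log1p(t * t / dof))

    def mass(x):  # P(0 <= T <= x), composite Simpson rule with an even step count
        h = x / steps
        total = pdf(0.0) + pdf(x)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * pdf(k * h)
        return total * h / 3

    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if mass(mid) < confidence / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For two windows of 15 points, f = 26 and `t_critical(26)` comes out close to the tabulated value of about 2.056. Swapping Y and Z only flips the sign of t_slope, which is why the test uses |t_slope|.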
If Y and Z are two adjacent sub-series of the sample time series, the statistic t_slope can be used to test whether a trend turning occurs between them. Let L be the sample time series and τ the chosen sliding window. A trend turning-point of L is identified by applying the t_slope test to the sub-series of length τ before and after each point.

Figure 1. Trend turning detection results (purple line) for two regional surface air temperature (SAT) time series (solid black line), obtained with the RSD t-test method (detection timescale 15 years, confidence level 95%). The two regions are Europe (40-60°N, 0-40°E) and the southeast tropical Pacific (10-30°S, 100-150°W). "Tr" and "D" denote the least squares trend and the duration of each sub-series, respectively. Trends are in °C per decade and durations in years.
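The running procedure can be sketched as follows. This is a hypothetical reading, not the authors' code. It assumes three things:

- the windows before and after point k both contain k;
- both windows have length τ;
- each run of consecutive significant points yields a single turning-point, placed at its largest |t_slope|.

The critical value is passed in by the caller. It is about 2.056 for τ = 15 (f = 26) at the 95% level.

```python
import math
import random


def _fit(values):
    """OLS slope, residual sum of squares and sum((i - mean)**2) of one window."""
    n = len(values)
    x_mean, y_mean = (n - 1) / 2, sum(values) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values)) / sxx
    rss = sum((v - y_mean - slope * (i - x_mean)) ** 2 for i, v in enumerate(values))
    return slope, rss, sxx


def running_slope_difference(series, tau, t_crit):
    """Hypothetical RSD scan: t_slope at every point k with a full window on
    each side, then one turning-point per run of consecutive |t| >= t_crit."""
    t_values = {}
    for k in range(tau - 1, len(series) - tau + 1):
        beta_y, rss_y, n_y = _fit(series[k - tau + 1:k + 1])  # window ending at k
        beta_z, rss_z, n_z = _fit(series[k:k + tau])          # window starting at k
        s = math.sqrt((rss_y + rss_z) / (2 * tau - 4))
        c = n_y * n_z / (n_y + n_z)
        t_values[k] = math.sqrt(c) * (beta_y - beta_z) / s

    turning, run = [], []
    for k in sorted(t_values) + [None]:   # None flushes the final run
        if k is not None and abs(t_values[k]) >= t_crit:
            run.append(k)
        elif run:
            turning.append(max(run, key=lambda j: abs(t_values[j])))
            run = []
    return turning, t_values


# Usage: a 45-point TR series turning at index 22 (the 23rd point), light noise.
rng = random.Random(1)
true = [-0.25 * i if i <= 22 else -5.5 + 0.25 * (i - 22) for i in range(45)]
noisy = [v + rng.gauss(0.0, 0.1) for v in true]
found, _ = running_slope_difference(noisy, tau=15, t_crit=2.056)
print(found)
```

Keeping only the peak of each significant run is one simple way to turn a band of significant points into a single location. Other peak-picking rules are possible.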

Evaluation of Detection Ability
Monte Carlo simulation experiments are used to evaluate the detection ability of the RSD t-test method. First, a large number of sample series are generated by Monte Carlo simulation; these are hereinafter called SS series (Synthetic Sample series). Each SS series is designed either with or without trend turning-points. All SS series are then analyzed with the RSD t-test. The method's detection ability is evaluated from how well it identifies the pre-designed turning-points and how often it returns false turning-points.
To evaluate the method's ability to avoid false turning (i.e., to avoid the false acceptance type of statistical error), the SS series are generated as independent, normally distributed series with zero mean (white noise series). A white noise series has neither a significant overall trend nor trend turning. Therefore, any turning-point returned by the detection method is a false positive. The ability of the method to avoid false acceptance is defined by

$$A_{fa} = \frac{N_{fa}}{N} \times 100\%,$$

where N is the total number of SS series in the simulation experiment and N_fa is the number of correctly detected SS series (i.e., those in which no false turning-point is identified).

Figure 3. SS series (dashed lines) for the three types of trend turning used in MC2, with their detection results (solid black two-section line; τ = 15). "No noise" denotes the true value series of the trend turning, and "noise SD" is the standard deviation of the superposed white noise. Purple lines are the SS series after de-noising pretreatment with a 5-point Gaussian filter.

Figure 4. Ability of the RSD t-test to avoid false rejection errors (A_fr1 and A_fr2) in MC2, as a function of the standard deviation of the superposed white noise, for different sizes of sliding window τ. A_fr1 represents the ability of the method to locate the turning-point correctly, meaning an error between the detection result and its true value of less than 10% of the length of the two sub-series beside the turning-point (bottom bar only). A_fr2 represents the ability of the method to judge correctly that a turning-point exists, without necessarily locating it accurately, meaning an error of less than 25% of the length of the two sub-series (bottom bar plus top bar).
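A_fa is the percentage of white-noise SS series for which no turning-point is returned. It can be estimated with a Monte Carlo loop such as the one below. In this sketch the detector is any function that returns the indices of detected turning-points, for example a wrapper around an RSD t-test implementation. The name `estimate_a_fa` and its parameters are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence

# A detector maps a series to the indices of the turning-points it returns.
Detector = Callable[[Sequence[float]], List[int]]


def estimate_a_fa(detect: Detector, size: int, sd_noise: float,
                  reps: int = 10_000, seed: int = 0) -> float:
    """A_fa = N_fa / N * 100%, where N_fa counts white-noise SS series for
    which the detector returns no turning-point (every detection is false)."""
    rng = random.Random(seed)
    n_fa = 0
    for _ in range(reps):
        series = [rng.gauss(0.0, sd_noise) for _ in range(size)]
        if not detect(series):
            n_fa += 1
    return 100.0 * n_fa / reps
```

A detector that never fires scores 100%, and one that always fires scores 0%. A real detector falls in between, depending on its confidence level P.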

Earth and Space Science
To evaluate the method's ability to identify turning-points (i.e., to avoid the false rejection type of statistical error), each SS series is generated as the combination of two parts: a true value series and a superimposed white noise series. The true value series has two linear sections on either side of a pre-designed trend turning-point. The superimposed white noise series is an independent, normally distributed series with zero mean and the same sample size as the true value series. Detection ability is evaluated by whether the method identifies the pre-designed turning-points.

The superimposed noise may change the statistical properties of the true value series, so small errors between the detection results and the pre-designed true value are allowed. Let l₁ and l₂ be the lengths of the two sub-series beside the turning-point. If the error between the detection result and its true value is less than 10% of min(l₁, l₂), the detection result is considered correct. Correctly judging that a turning-point exists is also important, even if it is located less accurately. We therefore apply a second index that allows an error of less than 25% of min(l₁, l₂). The ability of the RSD t-test to avoid false rejection is defined by both

$$A_{fr1} = \frac{N_{fr1}}{N} \times 100\%, \qquad A_{fr2} = \frac{N_{fr2}}{N} \times 100\%,$$

where N_fr1 is the number of SS series whose error between the detection result and the true value is less than 10% of min(l₁, l₂), and N_fr2 is the number with an error of less than 25%. In summary, A_fr1 is the proportion of SS series in the Monte Carlo simulation experiment in which the method correctly locates the turning-point. A_fr2 (A_fr2 ≥ A_fr1) is the proportion in which it correctly determines that a turning-point exists, allowing for less accurate location.

Figure 5. Ability of the RSD t-test to estimate the partial trend of a sub-series (A_slope) in MC2, as a function of the standard deviation of the superposed white noise, for different sizes of sliding window τ. The lines show the median of the three results obtained with different sliding windows τ, and shading indicates the range of variation.
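Scoring A_fr1 and A_fr2 can be sketched as below. Two details are assumptions of this sketch: a series with several returned points is scored by the point closest to the true turning-point, and both tolerances use a strict inequality.

```python
def false_rejection_abilities(detections, true_tp, l1, l2):
    """Return (A_fr1, A_fr2) in percent.

    detections: one list of returned turning-point indices per SS series.
    A series counts toward N_fr1 (N_fr2) if its detection closest to true_tp
    is within 10% (25%) of min(l1, l2); series with no detection are misses.
    """
    base = min(l1, l2)
    n_fr1 = n_fr2 = 0
    for found in detections:
        if not found:
            continue
        error = min(abs(k - true_tp) for k in found)
        n_fr1 += error < 0.10 * base   # bool adds as 0 or 1
        n_fr2 += error < 0.25 * base
    total = len(detections)
    return 100.0 * n_fr1 / total, 100.0 * n_fr2 / total
```

With l₁ = l₂ = 23, the tolerances are 2.3 and 5.75 points. So a detection 5 points away counts toward A_fr2 but not toward A_fr1, which is why A_fr2 ≥ A_fr1 always holds.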
For sample series containing trend turning, estimating the trend of each segment is as important as locating the turning-point. Another parameter, A_slope, evaluates the ability of the method to estimate the partial trend of each sub-series. It is defined by

$$A_{slope} = \frac{N_{slope}}{N_{fr1}} \times 100\%,$$

where N_slope is the number of SS series whose pre-designed partial trends do not differ significantly (at the 99% confidence level) from the estimated ones. Accurate estimation of the partial trend of each sub-series relies on locating the turning-point. Therefore only SS series with correctly located turning-points (N_fr1) are included.

Figure 6. Abilities A_fr1 and A_fr2 in MC3, as a function of the standard deviation of the superposed white noise, for different sizes of sliding window τ. The conditions of MC3 are the same as those of MC2, except that the trend turning amplitude is 1.0 (double that in MC2, left) or 1.5 (triple that in MC2, right).
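The text does not spell out the 99% consistency check behind N_slope. One plausible reading, assumed in this sketch, is a t-test of each estimated slope against its pre-designed value: |b − β|/SE(b) < t_{n−2,0.995}. The critical value is supplied by the caller. For a 23-point sub-series (21 degrees of freedom) it is about 2.83.

```python
import math


def slope_consistent(segment, true_slope, t_crit):
    """True if the OLS slope of `segment` is not significantly different from
    the pre-designed `true_slope`: |b - beta| / SE(b) < t_crit, df = n - 2."""
    n = len(segment)
    x_mean, y_mean = (n - 1) / 2, sum(segment) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(segment)) / sxx
    rss = sum((v - y_mean - slope * (i - x_mean)) ** 2 for i, v in enumerate(segment))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return abs(slope - true_slope) / std_err < t_crit


def ability_slope(n_slope, n_fr1):
    """A_slope in percent, restricted to series whose turning-point was located."""
    return 100.0 * n_slope / n_fr1
```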

False Acceptance
Two factors may affect the detection ability with respect to false acceptance: the size of the sample series and the magnitude of the white noise (SD_noise). The first Monte Carlo simulation experiment (MC1) uses three sample sizes (45, 91, and 135) and nine SD_noise levels from 0.4 to 3.6 in increments of 0.4. It generates 10,000 independent white noise series for each combination of sample size and noise level.
The RSD t-test has two detection parameters, the confidence level P and the sliding window τ, which are set before trend turning detection.

- The confidence level P is the threshold of the slope difference test. Commonly used values are 90%, 95%, and 99%. A large P (near 100%) increases the method's ability to avoid false acceptance but also increases the probability of false rejection. To evaluate both types of statistical error appropriately, P is set to 95% here.
- The sliding window τ should, in practice, be chosen according to the timescale of the specific study. Here we aim to show the detection ability of the RSD t-test in an idealized situation with appropriate parameters. Therefore τ is set to about 15, 30, and 45 for SS series of sample size 45, 91, and 135, respectively. This study also aims to show how small changes in the sliding window influence detection ability, so each SS series is analyzed three times using three close but distinct τ values.
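The MC1 design can be written down as a parameter grid. In this sketch the three "close but distinct" τ values are taken, hypothetically, as the central value ±1. The degrees of freedom assume two windows of τ points each, so f = 2τ − 4.

```python
from itertools import product

SAMPLE_SIZES = (45, 91, 135)
TAU_CENTRES = {45: 15, 91: 30, 135: 45}                 # "around 15, 30, and 45"
NOISE_SDS = [round(0.4 * k, 1) for k in range(1, 10)]   # 0.4, 0.8, ..., 3.6
CONFIDENCE = 0.95
REPLICATES = 10_000                                     # series per cell


def mc1_design(tau_offsets=(-1, 0, 1)):
    """One cell per (sample size, SD_noise). Every series generated for a cell
    is analysed once with each of its three nearby tau values; the offsets
    are a hypothetical choice."""
    return [
        {
            "size": size,
            "sd_noise": sd,
            "taus": tuple(TAU_CENTRES[size] + d for d in tau_offsets),
            "dof": tuple(2 * (TAU_CENTRES[size] + d) - 4 for d in tau_offsets),
        }
        for size, sd in product(SAMPLE_SIZES, NOISE_SDS)
    ]
```

The grid has 27 cells. Reusing the same noise series for all three τ values isolates the effect of the window change from sampling variability.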

10.1029/2019EA001042
Figure 2 shows the detection ability A_fa in experiment MC1. The top panel shows the results for SS series of sample size 45. In every case A_fa fluctuates around 94%, and neither the white noise level nor small changes of τ affect it greatly (influence less than ±0.50%). These results indicate that the RSD t-test performs well in avoiding false acceptance, which is a major advantage of the method.
The middle and bottom panels of Figure 2 show A_fa for SS series of sample sizes 91 and 135, respectively. The A_fa values fall within a narrow range between 92% and 94%, showing that the sample size of the SS series has no notable effect on this ability. The next section shows that sample size does significantly influence the abilities A_fr1 and A_fr2, and it explains why this differs from the result for A_fa.

False Rejection
False rejection may be affected by other factors besides the size of the sample series and the magnitude of the white noise. Examples include the trend turning amplitude |β₁ − β₂| and interference from nearby turning-points in cases with multiple turning-points.
This section first studies how three factors influence the ability of the RSD t-test: the white noise amplitude SD_noise, the trend turning amplitude |β₁ − β₂|, and the SS series sample size. These are examined in single turning-point simulations (experiments MC2-MC4). The subsequent experiment (MC5) examines interference from nearby turning-points in multiple turning-point simulations.

Single Turning-Point
The false rejection tests consider all three main types of turning-point, with each series having a single turning-point of type TR, TN, or TD. The SS series in Monte Carlo simulation experiments MC2-MC4 are generated by superimposing white noise series on the true value series. Take MC2 (Figure 3) as an example. Its true value series is a two-section line of sample size 45 with one pre-designed trend turning-point at the 23rd data point (Figure 3, "no noise"). The trend turning amplitude |β₁ − β₂| is 0.5 for all three types, with the section trends set as follows:

- TR: β₁ = −0.25, β₂ = 0.25;
- TN: β₁ = 0, β₂ = 0.50;
- TD: β₁ = 0.25, β₂ = 0.75.

The superimposed white noise series in MC2 has the same sample size as the true value series. It is an independent, normally distributed series with zero mean and a standard deviation from 0.4 to 3.6 (nine levels at increments of 0.4). As in MC1, MC2 generates 10,000 independent white noise series for each combination of turning type and noise level.
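The MC2 true value series and SS series can be generated as below. The text does not specify two details, so this sketch assumes them: the broken line starts at 0, and it is continuous at the turning-point (0-based index 22, i.e., the 23rd data point).

```python
import random

# Section trends (beta1, beta2) used in MC2; |beta1 - beta2| = 0.5 for all types.
MC2_TRENDS = {"TR": (-0.25, 0.25), "TN": (0.0, 0.50), "TD": (0.25, 0.75)}


def two_section_line(size, tp_index, beta1, beta2):
    """Continuous broken line starting at 0 (the intercept is an assumption),
    with slope beta1 up to tp_index and beta2 after it."""
    return [beta1 * i if i <= tp_index else beta1 * tp_index + beta2 * (i - tp_index)
            for i in range(size)]


def ss_series(true_values, sd_noise, rng):
    """Superimpose independent N(0, sd_noise**2) white noise on a true value series."""
    return [v + rng.gauss(0.0, sd_noise) for v in true_values]


# Usage: one noisy TD series at the highest MC2 noise level.
rng = random.Random(0)
truth = two_section_line(45, 22, *MC2_TRENDS["TD"])
sample = ss_series(truth, sd_noise=3.6, rng=rng)
```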
In experiment MC2, the RSD t-test is applied to each series with the same confidence level and sliding window as in the MC1 cases with sample size 45 (i.e., P = 95% and τ around 15). Figure 4 shows the detection abilities A_fr1 and A_fr2 in MC2. As expected, the RSD t-test detects TD turning about as well as TR and TN turning, indicating that it can detect all three types. For all three turning types, both A_fr1 and A_fr2 approach 100% at low noise and decrease as the noise increases. Even at the highest white noise level considered here, the method correctly judges the presence of a turning-point in 65% of cases and correctly locates it in 40%. Small variations in the sliding window τ have little effect on detection: their influence on A_fr1 ranges from approximately 0.20% to 2.5% as the noise level rises.
For sample series with a trend turning-point, a detection method that correctly locates the turning-point has a high probability of correctly estimating the trend of each sub-series. Figure 5 shows the sub-series trend estimation ability A_slope in MC2. All values of A_slope are above 90%, and most are above 95%. This indicates that the trend of a sub-series can be estimated well once the turning-point has been correctly located. These results further emphasize the importance of correctly identifying the turning-point during trend turning detection.
Simulation experiment MC3 (Figure 6) considers the influence of the trend turning amplitude on detection ability. Its design is similar to that of MC2, except that the trend turning amplitude is 1.0 or 1.5 (compared with 0.5 in MC2). A larger amplitude improves detection. At high white noise levels, doubling the amplitude to 1.0 increases A_fr1 by about 25%, and tripling it to 1.5 increases A_fr1 by about 35%. A larger trend turning amplitude also reduces the influence of small changes of τ on A_fr1 at high noise levels, from about 2.5% to 1.5%.
Simulation experiment MC4 (Figure 7) is also based on MC2 and considers the influence of the SS series sample size on detection ability. The series lengths are 91 and 135 (compared with 45 in MC2), and the turning-point is moved to the 46th and 68th data points, respectively. The trends β₁ and β₂ before and after the turning-point are the same as in MC2. The sliding window τ is doubled and tripled in proportion to the series length. The results of MC4 (Figure 7) show that A_fr1 and A_fr2 increase to nearly 75% and 97%, respectively, when the sample size is doubled. When it is tripled, they increase to about 90% and 100%, respectively.
Changing the SS series sample size clearly influences the abilities A_fr1 and A_fr2. A larger sample includes more points in the t_slope test than a small one. If the test sample contains a trend turning-point, the slope difference on either side of it is then more significant than for a small sample. If the test sample contains no trend turning, however, changing the sample size has only a limited influence on the detection results.

This behavior also follows from the formula for t_slope. From formula (3), the numerator of t_slope is the square root of C multiplied by the slope difference β̂_Y − β̂_Z, where C is a constant determined by the sample sizes of the two sub-series Y and Z. When the slope difference is not near zero, a larger sample size increases t_slope and thereby improves the ability of the RSD t-test to identify the turning-point. When there is no turning, the estimated slope difference itself shrinks in proportion to 1/√C, so the null distribution of t_slope is still Student's t with f degrees of freedom. This is why changing the SS series sample size has only a limited influence on A_fa but notably affects A_fr1 and A_fr2.
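The sample-size argument can be made concrete with a short calculation. For a window of n points, N = n(n² − 1)/12, so C grows roughly as the cube of the window length.

- With a turning-point: when the windows are split exactly at the turning-point, S stays near the noise level, so t_slope grows with √C.
- With no turning: the slope difference has variance σ²(1/N + 1/M) = σ²/C, so the √C factor cancels.

`idealized_t` is a hypothetical helper that fixes S at a given residual standard deviation to illustrate the first case.

```python
import math


def time_sum_of_squares(n):
    """N = sum_{i=1}^{n} (i - mean)^2 = n (n^2 - 1) / 12."""
    return n * (n * n - 1) / 12


def c_constant(n, m):
    """C = N M / (N + M) for windows of n and m points."""
    big_n, big_m = time_sum_of_squares(n), time_sum_of_squares(m)
    return big_n * big_m / (big_n + big_m)


def idealized_t(n, m, slope_diff, residual_sd):
    """sqrt(C) * slope difference / S, with S held at residual_sd (idealization
    for windows split exactly at the turning-point)."""
    return math.sqrt(c_constant(n, m)) * slope_diff / residual_sd


# Usage: tau = 15, 30, 45 as in MC2 and MC4, slope difference 0.5, noise SD 3.6.
for tau in (15, 30, 45):
    print(tau, c_constant(tau, tau), round(idealized_t(tau, tau, 0.5, 3.6), 2))
```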

Multiple Turning-Points
This section discusses samples with multiple turning-points (experiment MC5). As in the previous experiments, the SS series are generated by superimposing white noise series on the true value series. The true value series has sample size 95 and consists of four linear sections with three turning-points (Figure 8, "no noise"). To make the experiment comprehensive, all three turning-point types are included:

- the first turning-point (TP1, at data point 27) is type TR;
- the second (TP2, at data point 48) is type TD;
- the third (TP3, at data point 76) is type TN.

The trends of the four sub-series are 0.3, −0.2, −0.6, and 0. The superimposed white noise series are generated independently at nine standard deviation levels from 0.4 to 3.6, with 10,000 independent samples at each level. The detection parameters P and τ in MC5 are the same as in MC2.

Figure 9 shows A_fr1 and A_fr2 for all three turning-points in MC5. Comparison with Figure 4 shows that each turning-point in a series with multiple turning-points is detected about as well as the single turning-point in MC2. In MC5, the RSD t-test detects all three types of trend turning. For all three turning-points, both A_fr1 and A_fr2 are near 95-100% at low white noise and decrease as the noise level rises.
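The MC5 true value series can be sketched in the same way. The zero starting value and the 0-based breakpoint indices (26, 47, and 75 for data points 27, 48, and 76) are assumptions of this sketch.

```python
from bisect import bisect_left

MC5_BREAKPOINTS = (26, 47, 75)       # data points 27, 48 and 76 as 0-based indices
MC5_SLOPES = (0.3, -0.2, -0.6, 0.0)  # TR at TP1, TD at TP2, TN at TP3


def piecewise_line(size, breakpoints, slopes):
    """Continuous piecewise-linear series starting at 0 (assumed intercept).
    The step from i-1 to i uses slopes[b], where b is the number of breakpoints
    strictly below i, so each breakpoint is a vertex shared by two sections."""
    if len(slopes) != len(breakpoints) + 1:
        raise ValueError("need one more slope than breakpoints")
    values = [0.0]
    for i in range(1, size):
        section = bisect_left(breakpoints, i)  # count of breakpoints < i
        values.append(values[-1] + slopes[section])
    return values


truth = piecewise_line(95, MC5_BREAKPOINTS, MC5_SLOPES)
```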
In a multiple turning-point sample, the ability to detect each turning-point is not greatly affected by the other nearby turning-points. Table 2 shows this more clearly: it lists A_fr1 for TP2 and TP3 depending on whether TP1 is correctly located (τ = 15). Whether TP1 is correctly located barely influences the detection of TP3. TP2 lies closer to TP1, and the detection of TP1 does influence its detection. However, failure to detect TP1 does not necessarily lead to failed detection of TP2.
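The conditional abilities in Table 2 can be computed from per-series correctness flags. This sketch assumes that Table 2 reports two ratios built from the N() counts defined in its note. The first is N(TP1, TP2)/N(TP1). The second is the analogous ratio for series in which TP1 is not correctly located. The function name is hypothetical.

```python
import math


def conditional_a_fr1(target_ok, condition_ok):
    """A_fr1 of a target turning-point, split by whether a conditioning
    turning-point was correctly located, e.g. for TP2 given TP1:
    (100 N(TP1, TP2) / N(TP1), 100 N(not TP1, TP2) / N(not TP1))."""
    pairs = list(zip(target_ok, condition_ok))
    n_cond = sum(1 for _, c in pairs if c)
    n_not = len(pairs) - n_cond
    hit_cond = sum(1 for t, c in pairs if t and c)
    hit_not = sum(1 for t, c in pairs if t and not c)
    with_cond = 100.0 * hit_cond / n_cond if n_cond else math.nan
    without_cond = 100.0 * hit_not / n_not if n_not else math.nan
    return with_cond, without_cond
```

If the two returned values are close, the detection of the target turning-point is largely independent of the conditioning one. This is the pattern reported for TP3.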
Because each turning-point in a multiple turning-point sample is detected largely independently, the RSD t-test shows similar abilities to detect single and multiple turning-points. Figure 10 shows A_slope for each sub-series in MC5. It indicates that the trend estimate of a sub-series is likely to be close to the true value if the turning-point has been correctly located.

Note (Table 2). N() is the number of SS series (synthetic sample series) that meet the conditions in the brackets. TP1, TP2, and TP3 represent correct identification of each turning-point, and an overbar represents incorrect identification. "Correct" here means that the error between the returned turning-point and its true value is less than 10% of the length of the two sub-series beside the turning-point.

Figure 10. Ability A_slope in MC5, as a function of the standard deviation of the superposed white noise, for different sizes of sliding window τ. The lines show the median of the three results obtained with different sliding windows τ, and shading indicates the range of variation.

De-noising Pretreatment
As discussed in the previous section, a larger sample size leads to better detection results. However, small samples (such as those in MC2) are unavoidable in practice. We therefore introduce a de-noising pretreatment to improve the detection ability of the RSD t-test for small samples.
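The de-noising pretreatment used in this section is a 5-point Gaussian filter. A minimal sketch follows. The filter coefficients are not given in the text, so two choices here are assumptions: the binomial weights (1, 4, 6, 4, 1)/16, a common discrete approximation to a Gaussian kernel, and a truncated, renormalized kernel at the series ends.

```python
def gaussian_filter_5(series, weights=(1, 4, 6, 4, 1)):
    """5-point smoothing with binomial weights approximating a Gaussian kernel.
    Near the ends the kernel is truncated and renormalized (an assumption)."""
    half = len(weights) // 2
    smoothed = []
    for i in range(len(series)):
        acc = total = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(series):
                acc += w * series[j]
                total += w
        smoothed.append(acc / total)
    return smoothed
```

Each SS series would be smoothed with `gaussian_filter_5` before it is passed to the detector. The kernel is symmetric, so a linear trend passes through unchanged away from the ends, while point-to-point alternation such as high-frequency noise is strongly damped.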
Climate variability is often characterized by multiple timescales (Ji et al., 2014; Wang et al., 2009), and small-timescale disturbances act as noise relative to a long-timescale trend. Filtering the sample time series to remove noise is a very common technique in climatological trend analysis (Xie et al., 2019; Xing et al., 2017), and it can better highlight the trend of the series. For trend turning detection, the previous experiments also showed that excessive random noise is usually the cause of failed detection. A small filter may therefore improve the detection ability of the RSD t-test. In addition, the RTT method proposed by Thanasis et al. (2011) uses a small filter to remove glitches while retaining the signature of the data, which demonstrates that filtering is effective in practice. We therefore added a pretreatment step using a 5-point Gaussian filter to each SS series in MC2 (single turning-point) and MC5 (multiple turning-points).

The left part of Figure 11 shows A_fr1 and A_fr2 in MC2 after pretreatment. At a high white noise level, pretreatment improved A_fr1 by approximately 5% and A_fr2 by 10%. The right part of Figure 11 shows the percentage of series in which no turning-point is detected (i.e., failed trend turning detection); this percentage clearly drops after de-noising. These results indicate that adding the de-noising pretreatment step can effectively improve the detection ability of the RSD t-test. Figure 12 shows A_fr1 and A_fr2 in MC5 after pretreatment. The results are similar to those for MC2 and further support the effectiveness of de-noising pretreatment. An important question is whether this pretreatment undermines the relative independence of the detection of each turning-point in series with multiple turning-points.
This is assessed by considering A_fr1 for TP2 and TP3 depending on whether TP1 is correctly located after de-noising pretreatment. The results in Table 3 show that

Conclusions
This paper evaluates the detection ability of the RSD t-test through several Monte Carlo simulation experiments. The evaluation criteria are the abilities to avoid false acceptance (A_fa), to locate a turning-point correctly (A_fr1), to judge correctly that a turning-point is present (A_fr2), and to estimate the trend of each segment (A_slope).
The results show that the RSD t-test performs well in avoiding false acceptance, regardless of the sample size or the magnitude of white noise in the sample series. It can also detect all three types of trend turning. The method's detection ability increases as the trend turning amplitude increases and decreases as the white noise magnitude increases. In samples with multiple turning-points, its ability to detect each turning-point is relatively independent of whether the other nearby turning-points are detected. The method therefore performs similarly on samples with single and multiple turning-points. Its estimates of sub-series trends are generally accurate if the turning-point has been correctly located.
Overall, the RSD t-test is an effective tool for detecting trend turning in time series, with three major advantages: it has a high probability of avoiding false acceptance, it can detect all three types of trend turning, and it is applicable to samples with multiple turning-points.
Although a larger sample size leads to better detection results, working with small samples is unavoidable in practice. We therefore propose a de-noising pretreatment based on Gaussian filtering for small samples. This relatively simple technique improves detection ability for small samples.
Finally, this paper considers only the most common form of random perturbation, normally distributed noise. Normally distributed perturbations are widely applicable and worth examining, but they are not necessarily universal. When using the RSD t-test to detect trend turning, researchers should also consider the actual stochastic perturbations in their specific data.