Robust Discrimination between Long-Range Dependence and a Change in Mean

In this paper we introduce an outlier-robust Wilcoxon change-point testing procedure for distinguishing between short-range dependent time series with a change in mean at an unknown time and stationary long-range dependent time series. We establish the asymptotic distribution of the test statistic under the null hypothesis for $L_1$ near epoch dependent processes and show its consistency under the alternative. The Wilcoxon-type testing procedure, like the CUSUM-type testing procedure of Berkes, Horv\'ath, Kokoszka and Shao (2006), requires estimating the location of a possible change-point and then using the pre- and post-break subsamples to discriminate between short- and long-range dependence. A simulation study examines the empirical size and power of the Wilcoxon-type testing procedure in standard cases and under contamination by outliers. It shows that in standard cases the Wilcoxon-type testing procedure performs as well as the CUSUM-type testing procedure, but outperforms it in the presence of outliers.


Introduction
Since the pioneering work of Hurst (1951), Mandelbrot and Van Ness (1968) and Mandelbrot and Wallis (1968), the phenomenon of long-range dependence, or Hurst effect, has been observed in many data sets, e.g. in hydrology, geophysics and economics. A lively debate continues over whether an observed Hurst effect is due to long-range dependence or to nonstationarity. Bhattacharya et al. (1983) showed that the Hurst effect detected by R/S statistics can be explained not only by long-range dependence, but also by the presence of a deterministic trend in short-range dependent data. Giraitis et al. (2001) showed that some modified R/S statistics reject the hypothesis of short-range dependence in favour of long-range dependence not only for long-range dependent data, but also for short-range dependent data in the presence of a trend or change-points. The phenomenon of spurious long-range dependence has also been discussed in many other papers, see e.g. Granger and Hyung (2004). A first attempt to distinguish between long-range dependence and short-range dependence with a monotonic trend was made by Künsch (1986), who showed that the periodogram behaves differently in these two cases. A test allowing to distinguish between a stationary long-range dependent process and a short-range dependent process with a change in mean was introduced by Berkes et al. (2006) and is based on the CUSUM statistic
$$C_{1,n}(k) = \sum_{i=1}^{k} X_i - \frac{k}{n}\sum_{i=1}^{n} X_i, \quad 1 \le k \le n. \qquad (1)$$
It is well known that the CUSUM statistic is sensitive to outliers, since it sums up the observations. In this paper we introduce a new outlier-robust testing procedure, which is based on the Wilcoxon change-point test statistic
$$W_{m,n}(k) = \sum_{i=m}^{k}\sum_{j=k+1}^{n} \big(1_{\{X_i \le X_j\}} - 1/2\big), \quad m \le k < n. \qquad (2)$$

Date: April 4, 2018.
* Fakultät für Mathematik, Ruhr-Universität Bochum, 44780 Bochum, Germany.
Dehling et al. (2013b, 2015) used this test statistic for testing for changes in the mean of long-range dependent and short-range dependent processes, respectively. In both papers the simulation studies indicate that the Wilcoxon test statistic (2) is more robust to outliers than the CUSUM statistic (1). Recently, Gerstenberger (2018) showed that the Wilcoxon-type change-point location estimator for a change in mean of short-range dependent data based on the test statistic (2) is also robust against outliers.
The new Wilcoxon-type testing procedure suggested in this paper is based on the idea of Berkes et al. (2006). First, given a sample $X_1, \ldots, X_n$, one estimates the location $\hat k$ of a possible change in mean. Then the test statistic is defined as the maximum of the Wilcoxon change-point statistic (2) applied to the subsamples $X_1, \ldots, X_{\hat k}$ and $X_{\hat k + 1}, \ldots, X_n$.

Wilcoxon-type testing procedure
Assuming that a sample $X_1, \ldots, X_n$ is given, we want to test the hypothesis
H$_0$: $X_i = Y_i + \mu_i$, $i = 1, \ldots, n$, is generated by a stationary zero-mean short-range dependent process $(Y_j)$ and has a change in mean $\mu_1 = \ldots = \mu_{k^*} \ne \mu_{k^*+1} = \ldots = \mu_n$ at unknown time $k^*$,
against the alternative
H$_1$: $X_1, \ldots, X_n$ is a sample from a stationary long-range dependent process.
To construct the test statistic, we first estimate the location $k^*$ of a change-point by the Wilcoxon-type change-point location estimator
$$\hat k = \min\big\{k : \max_{1 \le l < n} |W_{1,n}(l)| = |W_{1,n}(k)|\big\}, \qquad (3)$$
which is the smallest $k$ for which $|W_{1,n}(k)|$ attains its maximum.
Finally, we define the test statistic $M_n = \max\{T_{n,1}, T_{n,2}\}$.
We show that the statistic $T(X_1, \ldots, X_n)$ allows us to discriminate whether a sample has been generated by a short- or long-range dependent stationary process. Since $\hat k / k^* \to_p 1$, the estimate $\hat k$ is asymptotically close to the true change-point $k^*$; hence, if we split the sample at time $\hat k$, we can treat $X_1, \ldots, X_{\hat k}$ and $X_{\hat k + 1}, \ldots, X_n$ as samples from a stationary sequence with a constant mean, see Lemma 4.1 in Section 4. Subsequently, $M_n$ can be used to test whether the samples $X_1, \ldots, X_{\hat k}$ and $X_{\hat k + 1}, \ldots, X_n$ have been generated by a short-range or long-range dependent stationary process.
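The two-step procedure described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the exact definitions (4)-(5) of $T_{n,1}$, $T_{n,2}$ are not reproduced in this text, so the normalization $\max_k |W(k)| / (n^{3/2}\hat\sigma)$ used below is an assumption, chosen to match the Brownian-bridge scaling of the Wilcoxon statistic.

```python
import numpy as np

def wilcoxon_stat(x):
    """W_{1,n}(k) = sum_{i<=k} sum_{j>k} (1{x_i <= x_j} - 1/2), k = 1..n-1; statistic (2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lt = (x[:, None] <= x[None, :]).astype(float) - 0.5  # lt[i, j] = 1{x_i <= x_j} - 1/2
    return np.array([lt[:k, k:].sum() for k in range(1, n)])

def changepoint_estimate(x):
    """Smallest k at which |W_{1,n}(k)| attains its maximum (estimator (3))."""
    w = np.abs(wilcoxon_stat(x))
    return int(np.argmax(w)) + 1          # np.argmax returns the first maximizer

def wilcoxon_test_stat(x, sigma_hat):
    """max_k |W(k)| / (n^{3/2} sigma_hat): an assumed normalization (see lead-in)."""
    n = len(x)
    return np.max(np.abs(wilcoxon_stat(x))) / (n ** 1.5 * sigma_hat)

def m_n(x, sigma_hat=1.0):
    """Two-step statistic M_n = max{T_{n,1}, T_{n,2}} on the pre-/post-break samples."""
    k = changepoint_estimate(x)
    return max(wilcoxon_test_stat(x[:k], sigma_hat),
               wilcoxon_test_stat(x[k:], sigma_hat))
```

On a noiseless sample with a single level shift, `changepoint_estimate` recovers the break point exactly; the quadratic-memory comparison matrix limits this sketch to moderate sample sizes.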
The outline of the paper is as follows. Section 2 specifies assumptions allowing us to establish the asymptotic distribution of $M_n$ under H$_0$ and its consistency under H$_1$. Section 3 compares the finite sample performance of the Wilcoxon-type and the CUSUM-type testing procedures. All proofs are given in Section 4.

Definitions, assumptions and main results
In this section we present the main assumptions, definitions and main results. Throughout the paper, $C$ denotes a generic non-negative constant, which may vary from line to line. The notation $a_n \sim b_n$ means that the sequences $a_n$ and $b_n$ of real numbers satisfy $a_n / b_n \to c$ as $n \to \infty$, where $c \ne 0$. $\to_d$ and $\to_p$ stand for convergence in distribution and in probability, respectively. By $=_d$ we denote equality in distribution. $\|g\|_\infty = \sup_x |g(x)|$ denotes the supremum norm of a function $g$.

Null hypothesis: short-range dependence with a change in mean
Under the null hypothesis we assume that the random variables $X_1, \ldots, X_n$ follow the change-point model
$$X_i = \mu + \Delta_n 1_{\{i > k^*\}} + Y_i, \quad i = 1, \ldots, n, \qquad (7)$$
where $k^*$ denotes the unknown location of the change-point in the mean, $\Delta_n$ the magnitude of the change, and $(Y_j)$ is a zero-mean stationary short-range dependent process.
To cover a wide range of processes, we assume that the underlying process (Y j ) can be written as Y j = f (Z j , Z j−1 , Z j−2 , . . .), j ∈ Z, where f : R Z → R is a measurable function, and (Z j ) is an absolutely regular (weakly dependent) process.
Definition 2.1. A stationary process $(Z_j)$ is called absolutely regular (or β-mixing) if
$$\beta_k = \sup_{m \ge 1} E\Big[\sup\big\{|P(A \mid \mathcal{G}_1^m) - P(A)| : A \in \mathcal{G}_{m+k}^{\infty}\big\}\Big] \to 0 \qquad (8)$$
as $k \to \infty$, where $\mathcal{G}_k^m$ is the σ-field generated by the random variables $Z_k, \ldots, Z_m$, $k < m$. Absolute regularity (β-mixing) implies the weaker property of α-mixing, see e.g. Bradley (2007). In addition, we will assume that $(Y_j)$ satisfies a near epoch dependence condition, i.e. $Y_j$ depends mainly on the near past of $(Z_j)$.
Specifically, $(Y_j)$ is called $L_1$ near epoch dependent ($L_1$ NED) on $(Z_j)$ with approximation constants $a_k$ if
$$E\big|Y_0 - E(Y_0 \mid \mathcal{G}_{-k}^{k})\big| \le a_k, \quad k \ge 0, \qquad (9)$$
where $\mathcal{G}_{-k}^{k}$ is the σ-field generated by the random variables $Z_{-k}, \ldots, Z_k$, and $a_k \to 0$ as $k \to \infty$.
Notice that a linear process or an AR process need not be absolutely regular, but it is $L_1$ near epoch dependent; see Example 2.1 in Gerstenberger (2018) for linear processes and Hansen (1991) for GARCH(1,1) processes. More examples of $L_1$ NED processes can be found in Borovkova et al. (2001), who also discuss more general $L_r$ NED processes, $r \ge 1$.
We need further assumptions on the distribution function $F$ of $Y_1$, the mixing coefficients $\beta_k$ in (8) and the approximation constants $a_k$ in (9).
Assumption 1. The process $(Y_j)$ in (7) is $L_1$ NED on some absolutely regular process $(Z_j)$ with mixing coefficients $\beta_k$ and approximation constants $a_k$ satisfying the rate condition (10). Moreover, $Y_1$ has a continuous distribution function $F$ with bounded second derivative, and the variables $Y_1 - Y_k$, $k \ge 1$, satisfy
$$P(x \le Y_1 - Y_k \le y) \le C(y - x) \quad \text{for all } x \le y, \qquad (11)$$
where $C$ does not depend on $k$, $x$ and $y$.
We suppose that both the unknown change-point $k^*$ and the magnitude of change $\Delta_n$ in (7) depend on the sample size $n$.
b) The magnitude of change $\Delta_n$ in (7) depends on $n$ and satisfies condition (12).
An important step of our testing procedure is the estimation of the location $k^*$ of the change in mean. Gerstenberger (2018) showed that under Assumptions 1 and 2 the Wilcoxon-type change-point location estimator $\hat k$ in (3) is consistent: $\hat k / k^* \to_p 1$.

Alternative: long-range dependence

Under the alternative H$_1$, the sample $X_1, \ldots, X_n$ is generated by a stationary long-range dependent process:
$$X_j = \mu + G(\xi_j), \quad j = 1, \ldots, n, \qquad (13)$$
where $\mu$ is the unknown mean and $(\xi_j)$ is a stationary long memory Gaussian process with $E(\xi_1) = 0$, $\mathrm{Var}(\xi_1) = 1$ and (non-summable) autocovariances $\gamma_k = \mathrm{Cov}(\xi_1, \xi_{1+k}) \sim c_0 k^{2d-1}$, where $c_0 > 0$ and $d \in (0, 1/2)$. Furthermore, we assume that $G : \mathbb{R} \to \mathbb{R}$ is a measurable, strictly monotone function such that $E(G(\xi_1)) = 0$.
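A Gaussian process with the autocovariance decay $\gamma_k \sim c_0 k^{2d-1}$ can be simulated as fractional Gaussian noise with Hurst index $H = d + 1/2$. The sketch below (an illustration, not part of the paper) uses an exact Cholesky factorization of the Toeplitz covariance matrix, which is reliable for moderate $n$ but costs $O(n^3)$; the helper name `lrd_gaussian_sample` is ours.

```python
import numpy as np
from scipy.linalg import toeplitz

def lrd_gaussian_sample(n, d, rng):
    """Sample xi_1..xi_n from a stationary zero-mean, unit-variance Gaussian
    process with gamma_k ~ c0 * k^{2d-1} (fractional Gaussian noise, H = d + 1/2),
    via Cholesky factorization of the exact covariance matrix."""
    H = d + 0.5
    k = np.arange(n, dtype=float)
    # exact fGn autocovariance: 0.5*(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})
    gamma = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))
    L = np.linalg.cholesky(toeplitz(gamma))
    return L @ rng.standard_normal(n)

# Under H1 the data are X_j = mu + G(xi_j) for a strictly monotone G;
# the simplest choice G(x) = x gives a Gaussian long-range dependent sample.
rng = np.random.default_rng(0)
xi = lrd_gaussian_sample(500, d=0.4, rng=rng)
```

For large $n$, circulant-embedding (Davies-Harte) methods are the usual $O(n \log n)$ alternative.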

Main results
The following theorem gives the limit distribution of the test statistic under the null hypothesis H$_0$. Below, $B(t) = W(t) - tW(1)$ denotes a standard Brownian bridge, where $W(t)$ is a standard Brownian motion.
Theorem 2.1. Let $(X_j)$ follow the model in (7). Then, under Assumptions 1 and 2,
$$M_n \to_d \sigma \max\Big\{\sup_{0 \le t \le 1} |B^{(1)}(t)|, \; \sup_{0 \le t \le 1} |B^{(2)}(t)|\Big\},$$
where $B^{(1)}$ and $B^{(2)}$ are two independent Brownian bridges, $\sigma^2 = \sum_{k=-\infty}^{\infty} \mathrm{Cov}(F(Y_1), F(Y_{1+k}))$ is the long-run variance in (15), and $F$ denotes the distribution function of $Y_1$.
Since the limit distribution of $M_n$ depends on the long-run variance $\sigma^2$, we need to estimate $\sigma^2$ in order to compute critical values for the test; see Section 3.
We will compare the performance of our test with the CUSUM-type test of Berkes et al. (2006), defined as
$$\tilde M_{C,n} = \max\{\tilde T_C(X_1, \ldots, X_{\hat k_C}), \; \tilde T_C(X_{\hat k_C + 1}, \ldots, X_n)\}, \qquad (16)$$
where $\tilde T_C$ is based on the CUSUM statistic $C_{1,n}(k)$ in (1),
$$\hat k_C = \min\big\{k : \max_{1 \le l \le n} |C_{1,n}(l)| = |C_{1,n}(k)|\big\}$$
is a CUSUM-type estimator of $k^*$, and $\hat s_n^2$ is the long-run variance estimator in (21). Berkes et al. (2006) showed that, under their assumptions, $\tilde M_{C,n} \to_d Z$ under the null hypothesis.
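For comparison with the Wilcoxon sketch above, the CUSUM statistic and the associated change-point estimator can be coded in a few lines (again an illustration; the form of $C_{1,n}(k)$ as a demeaned partial sum is assumed from context, since display (1) is standard):

```python
import numpy as np

def cusum_stat(x):
    """C_{1,n}(k) = sum_{i<=k} x_i - (k/n) * sum_i x_i, for k = 1..n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    cs = np.cumsum(x)
    return cs[:-1] - (np.arange(1, n) / n) * cs[-1]

def cusum_changepoint(x):
    """Smallest k at which |C_{1,n}(k)| attains its maximum (estimator of k*)."""
    return int(np.argmax(np.abs(cusum_stat(x)))) + 1
```

Because `cusum_stat` sums the raw observations, a single large outlier shifts every partial sum, which is the sensitivity the paper's Wilcoxon procedure is designed to avoid.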
The next theorem establishes consistency of the test M n , i.e. that the test will detect long-range dependence with probability tending to 1.
Theorem 2.2. Let $(X_j)$ be as in (13). Then, as $n \to \infty$, $M_n \to_p \infty$; in particular, $P(M_n > c) \to 1$ for every $c > 0$.
Proofs of Theorems 2.1 and 2.2 are given in Section 4.

Simulation Study
In this simulation study we compare the finite sample performance (size and power) of the Wilcoxon-type testing procedure $M_n$ in (6) with the CUSUM-type testing procedure $\tilde M_{C,n}$ of Berkes et al. (2006), given in (16).

Simulation set up
To calculate the empirical size we generate a sample of random variables $X_1, \ldots, X_n$ from the change-point model
$$X_i = Y_i + \Delta 1_{\{i > k^*\}}, \quad i = 1, \ldots, n, \qquad (17)$$
where $Y_i = \rho Y_{i-1} + \varepsilon_i$ is an AR(1) process with $\rho = 0.4$ and i.i.d. standard normal innovations $\varepsilon_i$. We set $k^* = [n\theta]$, $\theta = 0.25, 0.5, 0.75$, and $\Delta = 0.5, 1, 2$.
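The null-model data can be generated as follows (a minimal sketch; the burn-in length of 100 is our choice, used to bring the AR(1) recursion close to stationarity):

```python
import numpy as np

def simulate_null(n, rho=0.4, delta=1.0, theta=0.5, rng=None):
    """AR(1) noise Y_i = rho*Y_{i-1} + eps_i with N(0,1) innovations,
    plus a level shift of size delta after k* = [n*theta] (model (17))."""
    rng = rng or np.random.default_rng()
    burn = 100
    eps = rng.standard_normal(n + burn)
    y = np.empty(n + burn)
    y[0] = eps[0]
    for i in range(1, n + burn):
        y[i] = rho * y[i - 1] + eps[i]
    y = y[burn:]                         # discard burn-in
    kstar = int(n * theta)
    return y + delta * (np.arange(1, n + 1) > kstar)
```

Replacing a small fraction of the innovations by heavy-tailed draws produces the outlier-contaminated designs studied below.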

Critical values
To analyse the empirical size and power, we need the critical values of the tests $M_n$ and $\tilde M_{C,n}$. By Theorem 2.1, under the null hypothesis, $M_n \to_d \sigma Z$ with $Z = \max\{\sup_{0 \le t \le 1}|B^{(1)}(t)|, \sup_{0 \le t \le 1}|B^{(2)}(t)|\}$. Hence, if $\hat\sigma^2(X_1, \ldots, X_k)$ is a consistent estimator of the long-run variance $\sigma^2$ based on the sample $X_1, \ldots, X_k$, then the studentized statistic converges in distribution to $Z$. The same asymptotics holds for the CUSUM test: $\tilde M_{C,n} \to_d Z$, see Corollary 2.1 of Berkes et al. (2006). Thus, the critical value $c_\alpha$ for a given significance level $\alpha$ is obtained from
$$P(Z > c_\alpha) = \alpha. \qquad (19)$$
Since $B^{(1)}$ and $B^{(2)}$ are independent Brownian bridges, (19) reduces to
$$P\Big(\sup_{0 \le t \le 1}|B^{(1)}(t)| \le c_\alpha\Big)^2 = 1 - \alpha, \qquad (20)$$
where $\sup_{0 \le t \le 1}|B^{(1)}(t)|$ has the well-known Kolmogorov-Smirnov distribution, whose quantiles can be found in statistical tables. For $\alpha = 5\%$, (20) implies $c_{5\%} = 1.478$.
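Instead of statistical tables, the critical value can be computed directly from the asymptotic Kolmogorov-Smirnov distribution, available in SciPy as `scipy.stats.kstwobign`:

```python
from scipy.stats import kstwobign

def critical_value(alpha):
    """Solve P(sup|B(t)| <= c)^2 = 1 - alpha, i.e. equation (20), using the
    asymptotic Kolmogorov-Smirnov distribution of sup_{0<=t<=1} |B(t)|."""
    return float(kstwobign.ppf((1.0 - alpha) ** 0.5))
```

For $\alpha = 5\%$ this reproduces $c_{5\%} \approx 1.478$ quoted above.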

Estimation of long-run variance
The choice of the long-run variance estimator $\hat\sigma$ in $M_n$ has a strong impact on the size and power properties of the tests in finite samples.
To estimate the long-run variance $\sigma_C^2$ in (16), Berkes et al. (2006) suggested using the Bartlett estimator
$$\hat s_n^2 = \hat\gamma_0 + 2\sum_{j=1}^{q(n)} \Big(1 - \frac{j}{q(n)+1}\Big)\hat\gamma_j, \quad \hat\gamma_j = \frac{1}{n}\sum_{i=1}^{n-j}(X_i - \bar X_n)(X_{i+j} - \bar X_n), \qquad (21)$$
where $\bar X_n = n^{-1}\sum_{i=1}^{n} X_i$, with bandwidth $q(n) = C \log_{10}(n)$. Table 1 reports the empirical size (for $\theta = 0.5$, $\Delta = 1$) and power (for $d = 0.4$) in % at significance level 5% of the $\tilde M_{C,n}$ test, with $\hat s_n^2$ as in (21) computed with bandwidth $15 \log_{10}(n)$. It shows that $\tilde M_{C,n}$ with the Bartlett estimator $\hat s_n^2$ is too conservative and has low power against the alternative, which has also been pointed out by Baek and Pipiras (2012) and Preuß et al. (2017). To improve the performance of the $\tilde M_{C,n}$ test in our simulation study, we proceed as follows.
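The Bartlett estimator is a weighted sum of sample autocovariances with triangular kernel weights; a direct implementation (assuming the standard form of (21)) is:

```python
import numpy as np

def bartlett_lrv(x, q):
    """Bartlett long-run variance estimate:
    gamma_hat_0 + 2 * sum_{j=1}^{q} (1 - j/(q+1)) * gamma_hat_j."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    s2 = np.dot(xc, xc) / n                           # gamma_hat_0
    for j in range(1, q + 1):
        gamma_j = np.dot(xc[:-j], xc[j:]) / n         # lag-j sample autocovariance
        s2 += 2.0 * (1.0 - j / (q + 1.0)) * gamma_j
    return s2
```

For i.i.d. data the long-run variance equals the variance, which gives a quick sanity check on the implementation.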
To estimate $\sigma_C^2$ we use, instead of $\hat s_n^2$, the non-overlapping subsampling estimator of $\sigma_C^2$ by Carlstein (1986) with block length $l_n$, which yields a better size and power balance for $\tilde M_{C,n}$, as seen from Tables 2 and 4. This estimator has also been used by Dehling et al. (2015) for a CUSUM-type test for changes in the mean of a short-range dependent process. In turn, for our test $M_n$, to estimate $\sigma$ we use the Carlstein-type estimator of the long-run variance proposed by Dehling et al. (2013a), given in (23), where $F_n(x) = n^{-1}\sum_{i=1}^{n} 1_{\{X_i \le x\}}$ denotes the empirical distribution function. Note that $\hat\sigma_W$ estimates $\sigma$, not $\sigma^2$. The Carlstein estimator $\hat\sigma_C^2$ as well as the estimator $\hat\sigma_W$ in (23) are subsampling-type estimators and require the choice of a suitable block length $l_n$, which is widely discussed in the literature. For AR(1) processes Carlstein (1986) suggests the block length given in (24), where $\rho$ denotes the autocorrelation coefficient at lag 1. In our simulation study we use this block length with $\rho$ estimated by the sample autocorrelation coefficient $\hat\rho$, since it yields good results for the empirical size and power.
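A common non-overlapping subsampling variant of Carlstein's estimator scales the empirical variance of block means by the block length. This sketch is an assumption on our part: the exact forms (22) and (23) used in the paper (Dehling et al. employ mean-absolute-deviation versions with a $\sqrt{\pi/2}$ correction) are not reproduced in this text.

```python
import numpy as np

def carlstein_lrv(x, l):
    """Non-overlapping subsampling long-run variance estimate:
    l * Var(block means) over complete blocks of length l."""
    x = np.asarray(x, dtype=float)
    m = len(x) // l                       # number of complete blocks
    blocks = x[:m * l].reshape(m, l)
    bmeans = blocks.mean(axis=1)
    return l * np.mean((bmeans - x[:m * l].mean()) ** 2)
```

As with the Bartlett estimator, the i.i.d. case (long-run variance = variance) serves as a sanity check; the block length $l_n$ plays the role of the bandwidth $q(n)$.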
In the presence of outliers, the choice of the block length needs to be robustified as well.
Since the sample autocorrelation is highly sensitive to outliers, we use in (24) a robust estimator of $\rho$ proposed by Ma and Genton (2000),
$$\hat\rho_Q(1) = \frac{Q_n^2(u + v) - Q_n^2(u - v)}{Q_n^2(u + v) + Q_n^2(u - v)},$$
where $u = (X_1, \ldots, X_{n-1})$, $v = (X_2, \ldots, X_n)$, and $Q_n$, the $k$-th order statistic of the interpoint distances $|X_i - X_j|$, $i < j$, with $k \approx n^2/4$, is a robust scale estimator introduced by Rousseeuw and Croux (1993). Figure 1 contains the histograms of the estimates $\hat\rho(1)$ and $\hat\rho_Q(1)$ based on 10,000 replications of a sample $X_1, \ldots, X_{500}$ with outliers, generated by an AR(1) model with $\rho = 0.4$ and i.i.d. standard normal innovations $\varepsilon_i$. For a further discussion on robust estimation of the autocorrelation function see Dürre et al. (2015).

Table 2 reports the empirical size at the 5% significance level, based on 10,000 replications of the $\tilde M_{C,n}$ and $M_n$ tests, for the model (17) without outliers. The empirical sizes of $M_n$ and $\tilde M_{C,n}$ slightly exceed the 5% level for large sample size $n$ for $\theta = 0.5$ and $\Delta = 0.5, 1, 2$. The size of the tests is more distorted if the change-point is located close to the beginning or end of the sample, i.e. for $\theta = 0.25, 0.75$. We also consider the situation of no change, i.e. $\Delta = 0$, for which the empirical size of both testing procedures is close to the nominal size. The empirical sizes of $M_n$ and $\tilde M_{C,n}$ are comparable in the absence of outliers. Table 3 reports the empirical size of $M_n$ and $\tilde M_{C,n}$ in the presence of outliers. While the test $M_n$ is robust to the outliers, the test $\tilde M_{C,n}$ becomes too conservative.

Tables 4 and 5 report the empirical power of the tests $\tilde M_{C,n}$ and $M_n$, for $X_i$ in (18) without outliers and with outliers, respectively. Table 4 shows that the power of both tests increases with increasing sample size and dependence parameter $d$ (except the power of $M_n$ for $n = 200$, $d = 0.4$). It shows that in the absence of outliers $M_n$ and $\tilde M_{C,n}$ have similar power properties. Table 5 shows that the empirical power of $M_n$ is practically not affected by the outliers, whereas $\tilde M_{C,n}$ suffers a loss of power.
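The Ma-Genton robust autocorrelation $\hat\rho_Q(1)$ described above can be sketched directly from its definition. The rank $k \approx n^2/4$ follows the text; the Rousseeuw-Croux consistency constant is omitted, since it cancels in the ratio (the function names are ours).

```python
import numpy as np

def qn_order_stat(x):
    """Order statistic of the pairwise distances |x_i - x_j|, i < j, at rank
    roughly n^2/4; the Rousseeuw-Croux scale constant is omitted (it cancels
    in the correlation ratio below). Naive O(n^2) memory, fine for moderate n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = np.abs(x[:, None] - x[None, :])[np.triu_indices(n, k=1)]
    k = min(n * n // 4, d.size)
    return np.partition(d, k - 1)[k - 1]

def rho_q(x):
    """Ma-Genton robust lag-1 autocorrelation based on the Q_n scale:
    (Q^2(u+v) - Q^2(u-v)) / (Q^2(u+v) + Q^2(u-v)), u = x[:-1], v = x[1:]."""
    u, v = x[:-1], x[1:]
    qp, qm = qn_order_stat(u + v), qn_order_stat(u - v)
    return (qp**2 - qm**2) / (qp**2 + qm**2)
```

The construction is easy to check on extreme cases: a strictly increasing series gives $u - v$ constant, hence $Q_n(u-v) = 0$ and $\hat\rho_Q = 1$; a perfectly alternating series gives $u + v = 0$ and $\hat\rho_Q = -1$.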

Simulation results
Since the numerator of the CUSUM-type test is based on partial sums, outliers in the data have a strong impact on the test statistic $\tilde M_{C,n}$, and hence one would expect it to over-reject the true hypothesis H$_0$. However, since the presence of outliers also increases the long-run variance estimate in (22), the overall effect is a reduction of size and a loss in power.

Table 2: Empirical size of the $\tilde M_{C,n}$ and $M_n$ tests at the 5% significance level, 10,000 replications. $X_i$ follows the model (17) without outliers.
Table 3: Empirical size of the $\tilde M_{C,n}$ and $M_n$ tests at the 5% significance level, 10,000 replications. $X_i$ follows the model (17) with outliers.
Table 5: Empirical power of the $\tilde M_{C,n}$ and $M_n$ tests at the 5% significance level, 10,000 replications. $X_i$ follows the model (18) with outliers.
In general, we conclude that the Wilcoxon test $M_n$ allows discrimination between long-range dependence and short-range dependence with a change in mean that is robust to outliers. In the absence of outliers it performs as well as the CUSUM test $\tilde M_{C,n}$, but it outperforms it in the presence of outliers.

Proofs
This section contains the proofs of Theorem 2.1, Theorem 2.2 and auxiliary lemmas.

Proof of Theorem 2.1
Suppose that $X_1, \ldots, X_n$ follow the model in (7) and that Assumptions 1 and 2 are satisfied. Throughout the proofs, without loss of generality, we assume $\mu = 0$ and $\Delta_n > 0$.
Proof. To prove Lemma 4.2 we will use the idea of the proof of Theorem 3 of Dehling et al. (2015).
Therefore, by the continuity of the Brownian motion $W_n$ and the continuous mapping theorem, $W_n(\hat k / n) - W_n(\theta) = o_P(1)$. Hence, the claim follows since Brownian motions have stationary increments and $W_n(0) = 0$. Finally, since Brownian motions are scale invariant, i.e. $\theta^{-1/2} W_n(t) =_d W_n(t/\theta)$, the limit can be expressed via two standard Brownian bridges. The increments of Brownian motions over disjoint intervals are independent, thus $B^{(1)}$ and $B^{(2)}$ are independent. This proves the lemma.
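For completeness, the standard construction behind the last step can be spelled out: splitting $W$ at $\theta$ and rescaling the two pieces yields

```latex
B^{(1)}(t) = \theta^{-1/2}\big(W(\theta t) - t\,W(\theta)\big), \qquad 0 \le t \le 1,
\\[4pt]
B^{(2)}(t) = (1-\theta)^{-1/2}\big(W(\theta + (1-\theta)t) - W(\theta) - t\,(W(1) - W(\theta))\big), \qquad 0 \le t \le 1.
```

Each is a standard Brownian bridge by the scaling and stationary-increment properties; $B^{(1)}$ is a functional of $(W(s))_{s \le \theta}$ and $B^{(2)}$ of the increments of $W$ after $\theta$, so independence of the two bridges follows from the independent increments of $W$.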

Concept of 1-continuity
Before we state the auxiliary results, we recall the concept of 1-continuity, which was introduced by Borovkova et al. (2001).
To study the asymptotic behaviour of the Wilcoxon test we need to show that the function $h(x, y) = 1_{\{x \le y\}}$ is 1-continuous. Then the variables $h(Y_i, Y_j)$ retain some of the dependence characteristics of the variables $(Y_i, Y_j)$.
Definition 4.1 (Borovkova et al. (2001)). The kernel $h(x, y)$ is 1-continuous with respect to the distribution of a stationary process $(Y_j)$ if there exists a function $\phi(\epsilon)$, $\epsilon \ge 0$, with $\phi(\epsilon) \to 0$ as $\epsilon \to 0$, such that for all $\epsilon > 0$ and $k \ge 1$ the conditions (32) and (33) hold, where $Y_2'$ is an independent copy of $Y_1$ and $Y_1'$ is any random variable that has the same distribution as $Y_1$.
For a univariate function g(x), the 1-continuity property is defined as follows.
Definition 4.2. The function $g(x)$ is 1-continuous with respect to the distribution of a stationary process $(Y_j)$ if there exists a function $\phi(\epsilon)$, $\epsilon \ge 0$, with $\phi(\epsilon) \to 0$ as $\epsilon \to 0$, such that for all $\epsilon > 0$ the condition (34) holds, where $Y_1'$ is any random variable that has the same distribution as $Y_1$.
Remark 4.1. Let $(Y_j)$ be a stationary process such that $Y_1$ has a continuous distribution function $F$ with bounded second derivative and the variables $Y_1 - Y_k$, $k \ge 1$, satisfy (11).
ii) Lemma 2.15 of Borovkova et al. (2001) yields that if a general function $h(x, y)$ satisfies (32) and (33) with some function $\phi(\epsilon)$, then $E[h(x, Y_2')]$, where $Y_2'$ is an independent copy of $Y_1$, satisfies the condition (34) with the same function $\phi(\epsilon)$.

Auxiliary results
The following lemma yields maximum inequalities used in the proofs of Lemma 4.1 and Lemma 4.2.
The following lemma derives the functional central limit theorem for partial sum processes of (h 1 (Y j )).
Corollary 4.1. Suppose that the assumptions of Lemma 4.2 hold. Then
$$n^{-1/2} \sum_{i=1}^{[nt]} h_1(Y_i) \to_d \sigma W(t),$$
where $W(t)$ is a Brownian motion and $\sigma$ is given in (15).

Proof. Wooldridge and White (1988) established in their Corollary 3.2 a functional central limit theorem for the partial sum process $\sum_{i=1}^{k} \tilde Y_i$, $k \ge 1$, for a process $(\tilde Y_j)$ which is $L_2$ NED on a strongly mixing process $(Z_j)$. Therefore, Corollary 4.1 is proved by showing that $(h_1(Y_j))$ is $L_2$ NED on a strongly mixing process.

By Proposition 2.11 of Borovkova et al. (2001), if $(Y_j)$ is $L_1$ NED on a stationary absolutely regular process $(Z_j)$ with approximation constants $a_k$, and $g(x)$ is 1-continuous with function $\phi$, then $(g(Y_j))$ is also $L_1$ NED on $(Z_j)$ with suitable approximation constants. Moreover, since $h_1$ is bounded,
$$E\big|h_1(Y_1) - E(h_1(Y_1) \mid \mathcal{G}_{-k}^{k})\big|^2 \le C\, E\big|h_1(Y_1) - E(h_1(Y_1) \mid \mathcal{G}_{-k}^{k})\big| \le C a_k.$$
The last inequality holds because, by the $L_1$ NED property of $(h_1(Y_j))$, $E|h_1(Y_1) - E(h_1(Y_1) \mid \mathcal{G}_{-k}^{k})| \le a_k$. Therefore, the process $(h_1(Y_j))$ is also $L_2$ NED on $(Z_j)$ with approximation constants $a_k' = C a_k^{1/2}$. Moreover, absolute regularity of $(Z_j)$ implies that $(Z_j)$ is also strongly mixing. Assumption (10) yields $a_k' = O(k^{-1/2})$ and $\beta_k = O(k^{-2})$. Thus, $(h_1(Y_j))$ satisfies the conditions of Corollary 3.2 of Wooldridge and White (1988), which gives the convergence above with $W(t)$ a Brownian motion and $\sigma^2 = \sum_{k=-\infty}^{\infty} \mathrm{Cov}(F(Y_1), F(Y_{1+k}))$.
Next we show that the contribution of the term $g(x, y)$ in the Hoeffding decomposition (31) is negligible.
In the following we state auxiliary results to deal with the terms appearing in the proof of Lemma 4.1. Note that the terms $\tilde U_{1,\hat k}(k^*)$ and $\tilde U_{\hat k + 1, n}(k^*)$ can be written as second order U-statistics with kernel function $h_n(x, y) = 1_{\{y < x \le y + \Delta_n\}}$. Applying Hoeffding's decomposition of U-statistics (Hoeffding (1948)) to $\tilde U_{a,b}(k)$ decomposes the kernel function $h_n$ into the sum
$$h_n(x, y) = \Theta_{\Delta_n} + h_{1,n}(x) + h_{2,n}(y) + g_n(x, y),$$
where the components are the Hoeffding projections and $Y_1'$, $Y_2'$ denote independent copies of $Y_1$.
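For completeness, the Hoeffding projections of $h_n$ take the standard form (using the continuity of $F$ to evaluate the expectations):

```latex
\Theta_{\Delta_n} = E\,h_n(Y_1', Y_2'), \qquad
h_{1,n}(x) = E\,h_n(x, Y_2') - \Theta_{\Delta_n} = F(x) - F(x - \Delta_n) - \Theta_{\Delta_n},
\\[4pt]
h_{2,n}(y) = E\,h_n(Y_1', y) - \Theta_{\Delta_n} = F(y + \Delta_n) - F(y) - \Theta_{\Delta_n},
\\[4pt]
g_n(x, y) = h_n(x, y) - \Theta_{\Delta_n} - h_{1,n}(x) - h_{2,n}(y).
```

By construction $E\,h_{1,n}(Y_1') = E\,h_{2,n}(Y_2') = 0$ and the degenerate remainder $g_n$ has vanishing conditional expectations, which is what makes its contribution negligible.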
Lemma 4.5. Suppose that the assumptions of Lemma 4.1 hold. Then the two stated bounds hold, where $\Theta_{\Delta_n} = E\big[1_{\{Y_2' < Y_1' \le Y_2' + \Delta_n\}}\big]$ and $Y_1'$ and $Y_2'$ are independent copies of $Y_1$.
In the proof of Lemma 4.6 we apply the empirical process non-central limit theorem of Dehling and Taqqu (1989), which uses the Hermite expansion of $1_{\{G(\xi_j) \le x\}} - F(x)$. Before proceeding to the proof, we briefly review this concept.
Hermite process: The limit process $Z_m(t)$ in Theorem 1.1 of Dehling and Taqqu (1989) is called the $m$-th order Hermite process; it is defined e.g. in Taqqu (1978). If $m = 1$, $Z_1(t)$ is the standard Gaussian fractional Brownian motion.
Proof of Lemma 4.6. Dehling et al. (2013b) have shown in their Theorem 1 the convergence used below. Since $F$ is a continuous distribution function, $\int_{\mathbb{R}} F(x)\,dF(x) = 1/2$. Denote $F_k(x) = k^{-1}\sum_{i=1}^{k} 1_{\{X_i \le x\}}$. Then the convergence holds almost surely, uniformly in $0 < s \le t < 1$.