A new approach for open-end sequential change point monitoring

We propose a new sequential monitoring scheme for changes in the parameters of a multivariate time series. In contrast to procedures proposed in the literature, which compare an estimator from the training sample with an estimator calculated from the remaining data, we suggest dividing the sample at each time point after the training sample. Estimators from the samples before and after all separation points are then continuously compared by calculating the maximum of the norms of their differences. For open-end scenarios our approach yields an asymptotic level $\alpha$ procedure that is consistent under the alternative of a change in the parameter.


Introduction
Nowadays, nearly all fields of application require sophisticated statistical modelling and statistical inference to draw scientific conclusions from the observed data. In many cases the data is time dependent, and the involved model parameters or even the model itself may not necessarily be stable. In such situations it is of particular importance to detect changes in the processed data as soon as possible and to adapt the statistical analysis accordingly.
These changes are usually called change points or structural breaks in the literature. Due to their universality, methods for change point analysis have a vast field of possible applications, ranging from the natural sciences [for example biology and meteorology] to the humanities [economics, finance, social sciences]. Since the seminal papers of Page (1954, 1955) the problem of detecting change points in time series has received substantial attention in the statistical literature. The contributions to this field can be roughly divided into the areas of retrospective and sequential change point analysis.
In the retrospective case, historical data sets are examined with the aim of testing for changes and identifying their position within the data. In this setup, the data is assumed to be completely available before the statistical analysis starts (a-posteriori analysis). A comprehensive overview of retrospective change point analysis can be found in Aue and Horváth (2013). In many practical applications, however, data arrives consecutively and breaks can occur at any new data point. In such cases the statistical analysis of the processed data has to start immediately, with the target of detecting changes as soon as possible.
This field of statistics is called sequential change point detection [sometimes also: online change point detection].
In the major part of the 20th century the problem of sequential change point detection was tackled using procedures [mostly called control charts; see Lai (1995, 2001) for comprehensive reviews], which are optimized to have a minimal detection delay but do not control the probability of a false alarm (type I error). A new paradigm was then introduced by Chu et al. (1996), who use initial data sets and therefrom employ invariance principles to also control the type I error. The methods developed under this paradigm [see below] can again be subdivided into closed-end and open-end approaches. In closed-end scenarios monitoring is stopped at a fixed pre-defined point in time, while in open-end scenarios monitoring can, in principle, continue forever if no change point is detected.
In the paper at hand we develop a new approach for sequential change point detection in an open-end scenario. To be more precise, let $\{X_t\}_{t\in\mathbb{Z}}$ denote a $d$-dimensional time series and let $F_t$ be the distribution function of the random variable $X_t$ at time $t$. We study monitoring procedures for detecting changes of a parameter $\theta_t = \theta(F_t)$, where $\theta = \theta(F)$ is a $p$-dimensional parameter of a distribution function $F$ on $\mathbb{R}^d$ (such as the mean, variance, correlation, etc.). In particular, we will develop a decision rule for the hypothesis of a constant parameter, that is
$$H_0 : \theta_1 = \dots = \theta_m = \theta_{m+1} = \theta_{m+2} = \dots, \qquad (1.1)$$
against the alternative that the parameter changes (once) at some time $m + k^*$ with $k^* \ge 1$, that is
$$H_1 : \exists\, k^* \in \mathbb{N} : \theta_1 = \dots = \theta_{m+k^*-1} \ne \theta_{m+k^*} = \theta_{m+k^*+1} = \dots. \qquad (1.2)$$
In this setup, which was originally introduced by Chu et al. (1996), the first $m$ observations are assumed to be stable and serve as an initial training set.

The problem of sequential change point detection in the hypotheses paradigm pictured above has received substantial interest in the literature. Since the seminal paper of Chu et al. (1996) several authors have worked in this area. Aue et al. (2006), Aue et al. (2009), Fremdt (2014b) and Aue et al. (2014) developed methodology for detecting changes in the coefficients of a linear model, while Wied and Galeano (2013) and Pape et al. (2016) considered sequential monitoring schemes for changes in special functionals such as the correlation or the variance. A MOSUM approach was employed by Leisch et al. (2000), Horváth et al. (2008) and Chen and Tian (2010) to monitor the mean and linear models.
Recently, Hoga (2017) used an $\ell^1$-norm to detect changes in the mean and variance of a multivariate time series, Kirch and Weber (2018) defined a unifying framework for detecting changes in different parameters with the help of several statistics, and Otto and Breitung (2019) considered a Backward CUSUM, which monitors changes based on recursive residuals in a linear model.
A helpful but not exhaustive overview of different sequential procedures can be found in Section 1, in particular Table 1, of Anatolyev and Kosenok (2018). A common feature of all procedures in the cited literature is the comparison of estimators from different subsamples of the data. To be precise, let $X_1, \dots, X_m$ denote an initial training sample, let $X_1, \dots, X_m, \dots, X_{m+k}$ be the available data at time $m+k$, and let $\hat\theta_i^j$ denote the estimator of the parameter from the sample $X_i, \dots, X_j$. Several authors propose to investigate the differences
$$\hat\theta_1^m - \hat\theta_{m+1}^{m+k} \qquad (1.3)$$
(in dependence of $k$). In the sequential change point literature monitoring schemes based on the differences (1.3) are usually called (ordinary) CUSUM procedures and have been considered by Horváth et al. (2004), Aue et al. (2006, 2009, 2014), Schmitz and Steinebach (2010) and Hoga (2017). Other authors suggest using a function of the differences
$$\big(\hat\theta_1^m - \hat\theta_{m+j+1}^{m+k}\big)_{j=0,\dots,k-1} \qquad (1.4)$$
(in dependence of $k$), and the corresponding procedures are usually called Page-CUSUM tests [see Fremdt (2014b), Aue et al. (2015), or Kirch and Weber (2018) among others]. As an alternative we propose, following ideas of Dette and Gösmann (2018), a monitoring scheme based on a function of the differences
$$\big(\hat\theta_1^{m+j} - \hat\theta_{m+j+1}^{m+k}\big)_{j=0,\dots,k-1}. \qquad (1.5)$$
The intuitive advantage of (1.5) over (1.3) is the screening for all possible positions of the change point, which takes into account that the change does not necessarily occur with observation $X_{m+1}$, so that $\hat\theta_{m+1}^{m+k}$ may be 'corrupted' by pre-change observations. This issue is also partially addressed by (1.4), where different positions are examined and compared with the estimator of the parameter from the training sample. We will demonstrate in Section 4 that sequential monitoring schemes based on the differences (1.5) yield a substantial improvement in power compared to the commonly used methods based on (1.3) and (1.4).
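To make the three families of differences concrete, the following sketch (our own illustration, not taken from the paper) computes (1.3), (1.4) and (1.5) for the mean functional on simulated data; all weighting and normalisation is omitted here, and only the raw differences of sub-sample means are formed.

```python
import numpy as np

def mean_est(x):
    """Canonical estimator of the mean parameter from a sub-sample."""
    return np.mean(x)

def cusum_diff(x, m, k):
    """Ordinary CUSUM difference (1.3): training sample vs. all new data."""
    return mean_est(x[:m]) - mean_est(x[m:m + k])

def page_diffs(x, m, k):
    """Page-CUSUM differences (1.4): training sample vs. every tail of the new data."""
    return [mean_est(x[:m]) - mean_est(x[m + j:m + k]) for j in range(k)]

def new_diffs(x, m, k):
    """Differences (1.5): the sample is split at every point after the training sample."""
    return [mean_est(x[:m + j]) - mean_est(x[m + j:m + k]) for j in range(k)]

rng = np.random.default_rng(0)
m, k, delta = 100, 60, 1.0
x = rng.normal(size=m + k)
x[m + 30:] += delta  # the change occurs 30 observations after monitoring starts

print(abs(cusum_diff(x, m, k)))
print(max(abs(d) for d in page_diffs(x, m, k)))
print(max(abs(d) for d in new_diffs(x, m, k)))
```

Because (1.5) also moves the left endpoint of the first sub-sample, the comparison at the split matching the true change time uses only pre-change data on one side and only post-change data on the other.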
To avoid misunderstandings, the reader should note that a (total) comparison based on differences of the form (1.5) is typically also called a CUSUM approach in retrospective change point analysis [see Aue and Horváth (2013) for a comprehensive overview of (retrospective) change point analysis].
The present paper is devoted to a rigorous statistical analysis of sequential monitoring based on the differences defined in (1.5) in the context of an open-end scenario. In Section 2 we introduce the new procedure and develop a corresponding asymptotic theory to obtain critical values such that monitoring can be performed at a controlled type I error. The theory is broadly applicable to detect changes in a general parameter $\theta$ of a multivariate time series. Like all monitoring schemes in this context, the method depends on a threshold function, and we also discuss the choice of this function. In particular we establish an interesting result regarding this choice and a connection to corresponding ideas of Horváth et al. (2004) and Fremdt (2014b), which may also be of interest in closed-end scenarios. In Section 3 we discuss several special cases and demonstrate that the new methodology is applicable to detect changes in the mean and in the parameters of a linear model. Finally, we present a small simulation study in Section 4, where we compare our approach to those developed by Horváth et al. (2004) and Fremdt (2014b). In particular we demonstrate that the monitoring scheme based on the differences (1.5) yields a test with a controlled type I error and a smaller type II error than the procedures in the cited references.

Asymptotic properties
Throughout this paper let $F$ denote a $d$-dimensional distribution function and $\theta = \theta(F)$ a $p$-dimensional parameter of $F$. We denote by $\hat F_i^j$ the empirical distribution function of the observations $X_i, \dots, X_j$ (here the inequality defining the empirical distribution function is understood component-wise) and consider the canonical estimator $\hat\theta_i^j = \theta(\hat F_i^j)$ of the parameter $\theta$ from the sample $X_i, \dots, X_j$.
To test the hypothesis (1.1) against the alternative (1.2) in the described online setting in an open-end scenario, we propose a monitoring scheme defined by the detector
$$\hat E_m(k) = \max_{j=0,\dots,k-1} \frac{k-j}{\sqrt{m}}\, \big\| \hat\theta_1^{m+j} - \hat\theta_{m+j+1}^{m+k} \big\|_{\hat\Sigma_m^{-1}}, \qquad (2.2)$$
where the statistic $\hat\Sigma_m$ denotes an estimator of the long-run variance matrix $\Sigma$ (defined in Assumption 2.2) and the symbol $\|v\|_A^2 = v^\top A v$ denotes a weighted norm of the vector $v$ induced by the positive definite matrix $A$. The monitoring is then performed as follows.
With observation $X_{m+k}$ arriving, one computes $\hat E_m(k)$ and compares it to an appropriate threshold function, which is also called a weighting function in the literature, say $w$. If
$$\hat E_m(k) > c_\alpha\, w(k/m), \qquad (2.3)$$
monitoring is stopped and the null hypothesis (1.1) is rejected in favor of the alternative (1.2). If the inequality (2.3) does not hold, monitoring is continued with the next observation $X_{m+k+1}$. We will derive the limiting distribution of $\sup_{k \ge 1} \hat E_m(k)/w(k/m)$ in Theorem 2.6 below to determine the constant $c_\alpha$ involved in (2.3), such that the test keeps a nominal level of $\alpha$ (asymptotically as $m \to \infty$).
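The monitoring rule can be sketched as a simple loop. In the following illustration (our own sketch, univariate case) the detector uses the normalisation factor $(k-j)/\sqrt{m}$, which is our reading of the construction described above; the threshold function, critical value and variance estimate are illustrative placeholders.

```python
import numpy as np

def detector(x, m, k, sigma2):
    """Max-type detector over all split points j = 0,...,k-1 (cf. (2.2)).
    In this univariate sketch the weighted norm reduces to |.|/sigma."""
    stats = []
    for j in range(k):
        diff = np.mean(x[:m + j]) - np.mean(x[m + j:m + k])
        stats.append((k - j) / np.sqrt(m) * abs(diff) / np.sqrt(sigma2))
    return max(stats)

def monitor(x, m, w, c_alpha, sigma2):
    """Sequentially compare detector values with c_alpha * w(k/m);
    return the stopping time k, or None if no change is flagged."""
    for k in range(1, len(x) - m + 1):
        if detector(x, m, k, sigma2) > c_alpha * w(k / m):
            return k
    return None

rng = np.random.default_rng(1)
m = 100
x = np.concatenate([rng.normal(size=m + 50), rng.normal(loc=2.0, size=200)])
w = lambda t: 1.0 + t  # a simple threshold with limsup t/w(t) < infinity
print(monitor(x, m, w, c_alpha=3.0, sigma2=1.0))
```

In practice the critical value would be the constant $c_\alpha$ determined from the limiting distribution, and the variance would be estimated from the training sample.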
Remark 2.1 The statistic (2.2) is related to a detection scheme recently proposed by Dette and Gösmann (2018) for the closed-end case, where monitoring ends with observation $mT$ for some $T \in \mathbb{N}$. These authors considered the statistic defined in (2.4) and showed the weak convergence (2.5), where $W$ denotes a $p$-dimensional Brownian motion; throughout this paper the symbol $\overset{\mathcal{D}}{\Longrightarrow}$ denotes weak convergence (in the space under consideration). However, this statistic cannot be used in an open-end scenario with the typical threshold functions considered in the literature, which satisfy $\limsup_{t\to\infty} t/w(t) < \infty$ (in this case the limit on the right-hand side of (2.5) would be almost surely infinite for $T = \infty$). As threshold functions satisfying $\limsup_{t\to\infty} t^2/w(t) < \infty$ cause a loss in power, as demonstrated in an unpublished simulation study, we propose to replace the factor $(m + j)$ in (2.4) by the size $m$ of the initial sample, which leads to the monitoring scheme defined by (2.2).
To discuss the asymptotic properties of our approach, we require the following notation.
The influence function of the functional $\theta$ at the distribution function $F$ is defined by
$$\mathrm{IF}(x, \theta, F) = \lim_{\epsilon \searrow 0} \frac{\theta\big((1-\epsilon)F + \epsilon\,\delta_x\big) - \theta(F)}{\epsilon},$$
where $\delta_x(z) = I\{x \le z\}$ is the distribution function of the Dirac measure at the point $x \in \mathbb{R}^d$ and the inequality in the indicator is again understood component-wise. We will focus on functionals that allow for an asymptotic linearization in terms of the influence function, that is
$$\hat\theta_i^j = \theta(F) + \frac{1}{j-i+1}\sum_{t=i}^{j} \mathrm{IF}(X_t, \theta, F) + R_{i,j}, \qquad (2.7)$$
with asymptotically negligible remainder terms $R_{i,j}$. Finally, for the sake of readability we introduce the abbreviation $\mathrm{IF}_t = \mathrm{IF}(X_t, \theta, F_t)$, where $F_t$ is again the distribution function of $X_t$. Under the null hypothesis (1.1) we will impose the following assumptions on the underlying time series.
Assumption 2.2 (Approximation) The time series $\{X_t\}_{t\in\mathbb{Z}}$ is (strictly) stationary, such that $F_t = F$ for all $t \in \mathbb{Z}$. Further, for each $m \in \mathbb{N}$ there exist two independent, $p$-dimensional standard Brownian motions $W_{m,1}$ and $W_{m,2}$ such that for some positive constant $\xi < 1/2$ the approximations (2.8) and (2.9) hold as $m \to \infty$, where
$$\Sigma = \sum_{t\in\mathbb{Z}} \mathrm{Cov}\big(\mathrm{IF}_0, \mathrm{IF}_t\big) \in \mathbb{R}^{p\times p}$$
denotes the long-run variance matrix of the process $\{\mathrm{IF}_t\}_{t\in\mathbb{Z}}$, which we assume to exist and to be non-singular.

Assumption 2.3 (Threshold function)
The threshold function $w : \mathbb{R}_{\ge 0} \to \mathbb{R}_{+}$ is uniformly continuous, has a positive lower bound and satisfies $\limsup_{t\to\infty} t/w(t) < \infty$.

Assumption 2.4 (Linearization)
The remainder terms in the linearization (2.7) satisfy the negligibility condition (2.10) as $k \to \infty$ with probability one.
Remark 2.5 Let us give a brief explanation of the assumptions stated above.
(i) Assumption 2.2 is a uniform invariance principle and frequently used in the (sequential) change point literature [see for example Aue et al. (2006) or Fremdt (2014b) among others]. In the one-dimensional case, Assumption (2.8) was verified by Aue and Horváth (2004) for different classes of time series including GARCH or strongly mixing processes and can be easily extended to the multivariate case considered here.
Assumption 2.2 is stronger than a functional central limit theorem, which is usually sufficient to work in a closed-end setup [see for example Wied and Galeano (2013), Pape et al. (2016) or Dette and Gösmann (2018)].
(ii) Assumption 2.3 gives necessary restrictions on the feasible set of threshold functions, which are required for the existence of the (weak) limit derived in Theorem 2.6. It is also worth mentioning that the assumption of a lower bounded threshold can be relaxed to $\lim_{t\to 0} t^{\gamma}/w(t) = 0$ for a constant $0 \le \gamma < 1/2$. In this case, the assumption on the remainders in (2.10) has to be strengthened accordingly. For the sake of a transparent presentation we use the assumption of a lower bound here, as this also simplifies the technical arguments in the proofs later on.
(iii) Assumption 2.4 is crucial for the proof of our main theorem. Note that in the location model $\theta(F) = \mathbb{E}_F[X]$ we have $R_{i,j} = 0$, so that (2.10) obviously holds. In general, however, Assumption 2.4 is highly non-trivial and crucially depends on the structure of the functional $\theta$ and the time series $\{X_t\}_{t\in\mathbb{Z}}$. For a comprehensive discussion the reader is referred to Dette and Gösmann (2018), where the estimate (2.10) has been verified in probability for different functionals including quantiles and the variance.
The following result is the main theorem of this section.
Theorem 2.6 Assume that the null hypothesis (1.1) and Assumptions 2.2 - 2.4 hold. If, in addition, $\hat\Sigma_m$ is a consistent and non-singular estimator of the long-run variance matrix $\Sigma$, then the weak convergence (2.11) holds, where $W$ is a $p$-dimensional Brownian motion with independent components.
For the sake of completeness, the reader should note that due to Assumption 2.3 the asymptotic behaviour of the threshold guarantees that the random variable on the right-hand side of (2.11) is finite (with probability one).
In light of Theorem 2.6 one can choose a constant $c(\alpha)$ such that inequality (2.12) holds. The following corollary then states that our approach leads to a level $\alpha$ detection scheme.
Corollary 2.7 Grant the assumptions of Theorem 2.6 and further let $c(\alpha)$ satisfy inequality (2.12). Then the monitoring scheme defined by (2.2) and (2.3) with $c_\alpha = c(\alpha)$ has asymptotic level $\alpha$.

The limit distribution obtained in Theorem 2.6 strongly depends on the threshold under consideration. A special family of thresholds that has received considerable attention in the literature [see Horváth et al. (2004), Fremdt (2014b), Kirch and Weber (2018) among many others] is given by
$$w_\gamma(t) = \max\Big\{ (1+t)\Big(\frac{t}{1+t}\Big)^{\gamma},\ \varepsilon \Big\}, \qquad \gamma \in [0, 1/2), \qquad (2.13)$$
where the cutoff $\varepsilon > 0$ can be chosen arbitrarily small in applications and only serves to reduce the assumptions and technical arguments in the proof [see also Wied and Galeano (2013)]. With these functions the limit distribution in (2.11), with the threshold function as the denominator, can be simplified to an expression that is more easily tractable via simulations. Straightforward calculations show that Assumption 2.3 is satisfied by the function $w_\gamma$, and the limit distribution in Theorem 2.6 simplifies as follows.
Corollary 2.8 For a $p$-dimensional Brownian motion $W$ with independent components, the limit distribution in Theorem 2.6 corresponding to the threshold function $w_\gamma$ admits a simplified representation, whose distribution we denote by $L_{1,\gamma}$ in what follows.

For the investigation of the consistency of the monitoring scheme (2.2) we require the following assumption.
Assumption 2.9 Under the alternative $H_1$ defined in (1.2), let $m + k^*$ denote the time of the change. Further assume that $k^*$ is independent of $m$ and that the process $\{\mathrm{IF}_t\}$ is of the following order before and after the change, respectively, see (2.14). Additionally assume that the remainders defined in (2.7) satisfy corresponding negligibility conditions.

Remark 2.10 The assumptions stated above are substantially weaker than those used to investigate the asymptotic properties of $\sup_{k\ge 1} \hat E_m(k)/w(k/m)$ under the null hypothesis. Basically, we only assume reasonable behavior of the time series before and after the change point and can drop the uniform approximation in Assumption 2.2 and the uniform negligibility of the remainders in Assumption 2.4. It is easy to see that (2.14) is already satisfied if both phases, before and after the change, fulfill a central limit theorem. Finally, it is worth mentioning that one can also derive the subsequent results when replacing the $2m$ by $cm$ for an arbitrary constant $c > 1$; however, for the sake of better readability, we will work with this (minimally stricter) assumption.
The next theorem yields consistency under the alternative hypothesis.
Theorem 2.11 Assume that the alternative hypothesis (1.2) and Assumptions 2.3 and 2.9 hold. If, in addition, $\hat\Sigma_m$ is a consistent and non-singular estimator of the long-run variance matrix, then $\sup_{k\ge 1} \hat E_m(k)/w(k/m)$ exceeds any constant $c \in \mathbb{R}$ with probability converging to one.

Some specific change point problems
In this section we briefly illustrate how the theory developed in Section 2 can be employed to construct monitoring schemes for a specific parameter of the distribution function. For the sake of brevity we restrict ourselves to the mean and the parameters in a linear model.
Other examples such as the variance or quantiles can be found in Dette and Gösmann (2018).

Changes in the mean
The sequential detection of changes in the mean has been discussed extensively in the literature [see Horváth et al. (2004), Fremdt (2014b) or Hoga (2017) among many others].
It is easy to verify (and well known) that the influence function for the mean $\theta(F) = \mathbb{E}_F[X]$ is given by $\mathrm{IF}(x, \theta, F) = x - \mathbb{E}_F[X]$, and Assumption 2.4 as well as the second part of Assumption 2.9 are obviously satisfied in this case, since $R_{i,j} = 0$ for all $i, j$. For the remaining assumptions of Section 2 it now suffices that the centered time series $\{X_t - \mathbb{E}[X_t]\}_{t\in\mathbb{Z}}$ fulfills Assumption 2.2, which also implies the remaining part of Assumption 2.9 [see also the discussion in Remark 2.5].
In this situation, both Theorem 2.6 and Theorem 2.11 are valid, provided that the chosen threshold fulfills Assumption 2.3.

Changes in linear models
Consider the time-dependent linear model
$$Y_t = \beta_t^\top P_t + \varepsilon_t, \qquad (3.1)$$
where the random variables $\{P_t\}_{t\in\mathbb{N}}$ are the $\mathbb{R}^p$-valued predictors, $\beta_t \in \mathbb{R}^p$ is a $p$-dimensional parameter and $\{\varepsilon_t\}_{t\in\mathbb{N}}$ is a centered random sequence independent of $\{P_t\}_{t\in\mathbb{N}}$. The identification of changes in the parameter vector of a linear model represents the prototype problem in sequential change point detection, as it has been studied extensively in the literature [see Chu et al. (1996) among many others]. This situation is covered by the general theory developed in Sections 2 and 3. To be precise, let $X_t = (Y_t, P_t^\top)^\top$ be the joint vector of response and predictor with (joint) distribution function $F_t$, such that the marginal distributions of $Y_t$ and $P_t$ are given by $F_{t,Y}$ and $F_{t,P}$, respectively, where we will assume that the predictor sequence is stationary, that is $F_{t,P} = F_P$. In a first step we will consider the case where the moment matrix $M = \mathbb{E}[P_t P_t^\top]$ is known (we will discuss later why this assumption is non-restrictive) and non-singular.
In this setup, the parameter $\beta_t$ can be represented as a functional of the distribution function via $\beta_t = M^{-1}\mathbb{E}[P_t Y_t]$, which leads to the estimators $\hat\beta_i^j$ from the samples $X_i, \dots, X_j$. The corresponding influence function (for $\beta$) in the linear model stated above is given by $\mathrm{IF}(x, \beta, F) = M^{-1} p\,(y - p^\top \beta)$ for $x = (y, p^\top)^\top$ [see for example Hampel et al. (1986) for a comprehensive discussion of influence functions]. In the following, we will again use the notation $\mathrm{IF}_t = \mathrm{IF}(X_t, \beta, F_t)$. Note that $\mathrm{IF}_t = M^{-1} P_t \varepsilon_t$ under the model (3.1), which directly gives $\mathbb{E}[\mathrm{IF}_t] = 0$. Under the null hypothesis the random sequence $\{X_t\}_{t\in\mathbb{N}}$ is stationary and the linearization defined in (2.7) holds exactly. Consequently, the remainders in (2.7) vanish and Assumption 2.4 is obviously satisfied.
Next, note that the long-run variance matrix of $\{\mathrm{IF}_t\}_{t\in\mathbb{Z}}$ is given by $\Sigma = M^{-1}\,\Gamma\, M^{-1}$, where $\Gamma = \sum_{t\in\mathbb{Z}} \mathrm{Cov}(P_0\varepsilon_0, P_t\varepsilon_t)$, and let $\hat\Gamma_m$ denote an estimator for $\Gamma$. Observing (3.5), it is now easy to see that in the resulting statistic $\hat E_m$ the matrix $M$ cancels out, see (3.7), so that the statistic does not depend on the matrix $M$. We therefore obtain the following result, which describes the asymptotic properties of the monitoring scheme based on the statistic $\hat E_m$ for a change of the parameter in the linear regression model (3.1); the proof is a direct consequence of Theorems 2.6 and 2.11. (i) Under the null hypothesis $H_0$, the weak convergence of Theorem 2.6 applies to the statistic $\hat E$ in (3.7). (ii) Under the alternative hypothesis $H_1$, assume that $\{\mathrm{IF}_t\}_{t\in\mathbb{N}}$ fulfills (2.14) of Assumption 2.9. Then the monitoring based on the statistic $\hat E$ in (3.7) is consistent.
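As an illustration of the procedure for the linear model, the following sketch (our own, not the paper's implementation) compares ordinary least squares estimates before and after every admissible split point; the weighting matrix of the norm is replaced by the identity, so this is a simplified stand-in for the statistic in (3.7).

```python
import numpy as np

def beta_hat(P, Y):
    """OLS estimator of the regression parameter from a sub-sample."""
    return np.linalg.lstsq(P, Y, rcond=None)[0]

def detector_lm(P, Y, m, k):
    """Compare parameter estimates before and after every split point
    (cf. (1.5)); the weighting matrix is replaced by the identity purely
    for illustration, and splits leaving fewer than p observations after
    the split are skipped to keep the OLS problems well-posed."""
    p = P.shape[1]
    vals = []
    for j in range(k - p + 1):
        before = beta_hat(P[:m + j], Y[:m + j])
        after = beta_hat(P[m + j:m + k], Y[m + j:m + k])
        vals.append((k - j) / np.sqrt(m) * np.linalg.norm(before - after))
    return max(vals)

# toy design: intercept plus trend, exact responses with a coefficient change
P = np.column_stack([np.ones(100), np.arange(100.0)])
Y = P @ np.array([1.0, 2.0])
Y[50:] = P[50:] @ np.array([1.0, 3.0])  # slope changes at observation 51
print(detector_lm(P, Y, 40, 50))
```

In the full procedure the difference of the estimates would be weighted by the inverse of the estimated long-run variance matrix, in which the matrix $M$ cancels as described above.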

Finite sample properties
In this section we investigate the finite sample properties of our monitoring procedure and demonstrate its superiority over the available methodology. As benchmarks we choose the ordinary CUSUM statistic $\hat Q$ based on the differences (1.3) and the Page-CUSUM statistic $\hat P$ based on the differences (1.4), see (4.1). The procedure based on $\hat Q$ was originally proposed by Horváth et al. (2004) for detecting changes in the parameters of linear models and has since been reconsidered, for example, by Aue et al. (2012), Wied and Galeano (2013) and Pape et al. (2016) for the detection of changes in the CAPM model, the correlation and variances, respectively. A statistic of the type $\hat P$ was recently proposed by Fremdt (2014b) and has already been reconsidered by Kirch and Weber (2018). In the simulation study we restrict ourselves to the commonly used class of threshold functions $w_\gamma$ defined in (2.13), where we set the involved technical cutoff constant to $\varepsilon = 0$ (cf. Table 1). Under the assumptions made in Section 2, it can be shown by similar arguments that weak convergence results analogous to Theorem 2.6 hold for $\hat Q$ and $\hat P$, see (4.2) and (4.3), where $W$ denotes a $p$-dimensional Brownian motion. For detailed proofs (under slightly different assumptions) of (4.2) and (4.3), the reader is referred to Horváth et al. (2004) and Fremdt (2014b), where procedures of these types are considered in the special case of a linear model.
Recall the notation $L_{1,\gamma}$ introduced in Corollary 2.8. By (4.2), (4.3) and Corollary 2.7 the necessary critical values for the procedures $\hat E$, $\hat Q$ and $\hat P$ combined with the threshold $w_\gamma$ are given as the $(1-\alpha)$-quantiles of the distributions $L_{1,\gamma}$, $L_{2,\gamma}$ and $L_{3,\gamma}$, respectively, and can easily be obtained by Monte Carlo simulations. The quantiles are listed in Table 1 for dimensions $p = 1$ and $p = 2$ and have been calculated by 10000 runs simulating the corresponding distributions, where the underlying Brownian motions have been approximated on a grid of 5000 points. In Sections 4.1 and 4.2 below, we will examine the finite sample properties of the three statistics for the detection of changes in the mean and in the regression coefficients of a linear model, respectively. All subsequent results presented in these sections are based on 1000 independent simulation runs and a fixed test level of $\alpha = 0.05$.

Table 1: $(1-\alpha)$-quantiles of the distributions $L_{1,\gamma}$, $L_{2,\gamma}$ and $L_{3,\gamma}$ for different choices of $\gamma$ and different dimensions of the parameter. The cutoff constant was set to $\varepsilon = 0$. The results for $L_{2,\gamma}$ and $L_{3,\gamma}$ for $p = 1$ are taken from Fremdt (2014b) and Horváth et al. (2004), respectively.
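The Monte Carlo recipe for such critical values can be sketched as follows. Since the exact form of the limit functionals $L_{i,\gamma}$ is specified in the corresponding corollaries, we use the classical weighted Brownian bridge functional $\sup_{0 < t \le 1} |W(t) - tW(1)|/t^{\gamma}$ as an illustrative stand-in, with a reduced number of runs and grid points.

```python
import numpy as np

def simulate_quantile(gamma, alpha=0.05, runs=2000, grid=1000, seed=0):
    """Approximate the (1 - alpha)-quantile of
    sup_{0 < t <= 1} |W(t) - t W(1)| / t**gamma
    by simulating Brownian motion on a grid (an illustrative stand-in
    for the limit distributions denoted L_{i,gamma} in the text)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, grid + 1) / grid
    sups = np.empty(runs)
    for r in range(runs):
        # Brownian motion on the grid via cumulated Gaussian increments
        W = np.cumsum(rng.normal(scale=np.sqrt(1.0 / grid), size=grid))
        sups[r] = np.max(np.abs(W - t * W[-1]) / t**gamma)
    return np.quantile(sups, 1 - alpha)

print(simulate_quantile(gamma=0.0))
```

For $\gamma = 0$ this reproduces (up to discretization and Monte Carlo error) the well-known quantiles of the supremum of a reflected Brownian bridge; larger grids and run counts, as used in the paper, reduce both error sources.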

Changes in the mean
In this section we will compare the finite sample properties of the procedures based on the statistics $\hat E$, $\hat P$ and $\hat Q$ for changes in the mean, as outlined in Section 3.1. Here we test the null hypothesis of a constant mean, while the alternative is that the mean $\mu_t$ changes beyond the initial data set. We will consider two different data generating models, a white noise process (M1) and an autoregressive process (M2). For the AR(1) process specified in model (M2), we create a burn-in sample of 100 observations in the first place. To simulate the alternative hypothesis, changes of the desired amount are added to the mean of the data after the change point. For the necessary covariance estimation we employ the well-known quadratic spectral estimator [see Andrews (1991)] with its implementation in the R package 'sandwich' [see Zeileis (2004)]. To take into account the possible appearance of changes, only the initial stable segment $X_1, \dots, X_m$ is used for this estimate, which is standard in the literature [see for example Horváth et al. (2004), Wied and Galeano (2013), or Dette and Gösmann (2018) among many others].
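The data generation and the variance estimation step can be sketched as follows (our own illustration). The AR(1) recursion with burn-in mimics model (M2), with model (M1) corresponding to an autoregressive coefficient of zero, and a simple Bartlett-kernel long-run variance estimator computed from the training segment serves as a stand-in for the quadratic spectral estimator of the R package 'sandwich' used in the paper.

```python
import numpy as np

def simulate_ar1(n, phi, burn_in=100, seed=0):
    """AR(1) process X_t = phi * X_{t-1} + e_t with a burn-in period;
    phi = 0 gives a white noise process."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn_in:]

def bartlett_lrv(x, bandwidth=None):
    """Long-run variance of a univariate series via the Bartlett kernel,
    a simple stand-in for the quadratic spectral estimator; in line with
    the text it should be applied to the stable segment only."""
    n = len(x)
    if bandwidth is None:
        bandwidth = int(np.floor(4 * (n / 100) ** (2 / 9)))
    xc = x - np.mean(x)
    lrv = np.mean(xc**2)
    for lag in range(1, bandwidth + 1):
        weight = 1 - lag / (bandwidth + 1)
        lrv += 2 * weight * np.mean(xc[:-lag] * xc[lag:])
    return lrv

m, delta, kstar = 100, 1.0, 50
x = simulate_ar1(600, phi=0.3)
x[m + kstar:] += delta  # change of size delta, kstar steps into monitoring
sigma2_hat = bartlett_lrv(x[:m])  # estimated from the training data only
print(sigma2_hat)
```

Restricting the estimate to the training segment avoids contamination of the variance estimate by a possible change, as discussed above.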
In Table 2 we display the approximation of the nominal level. For small training samples the observed rejection probabilities can deviate from the nominal level; this effect may be caused by a less precise estimation of the long-run variance for small sample sizes. Accordingly, this effect is weaker for the case $m = 100$.

Changes in linear models
In this section we present some simulation results for the detection of changes in the linear model (3.1). We aim to detect changes in the unknown parameter vector $\beta_t \in \mathbb{R}^p$ by testing the null hypothesis of a constant parameter against the alternative that $\beta_t$ changes beyond the initial data set. To be precise, we consider the model (3.1) with $p = 2$ and two choices of predictors, (LM1) and (LM2), where $Z_t$ denotes an i.i.d. sequence of $N(0, 0.5)$ random variables in both models. The parameter vector is fixed at $\beta_t = (1, 1)^\top$ under the null hypothesis, and to examine the alternative hypothesis, changes are added to its second component. For both scenarios we simulated the residuals $\varepsilon_t$ in model (3.1) as i.i.d. $N(0, 0.5)$ sequences.
Note that the models specified above have already been considered by Fremdt (2014b). As pointed out in Section 3.2, the asymptotic variance that needs to be estimated within our procedures is the matrix $\Gamma$ introduced there. We estimate this quantity based on the stable segment of observations $(Y_1, P_1), \dots, (Y_m, P_m)$ using the well-known quadratic spectral estimator [see Andrews (1991)] with its implementation in the R package 'sandwich' [see Zeileis (2004)].
The problem of detecting changes in the parameter of a linear model has also been addressed using partial sums of the residuals $\hat\varepsilon_t = Y_t - P_t^\top\hat\beta_I$ in statistics similar to (4.1), where $\hat\beta_I$ is an initial estimate of $\beta$ computed from the initial stable segment [see for example Chu et al. (1996) or Fremdt (2014a) among many others].
Our approach directly compares estimators of the vector $\beta_t$, which are derived using the general methodology introduced in Sections 2 and 3. The resulting statistics are obtained by replacing $\hat\theta$ by $\hat\beta$ in equation (4.1). We also refer to Leisch et al. (2000) for a comparison of residual-based methods with methods using the estimators directly (these authors consider a statistic similar to $\hat Q$).
In Table 3 we display the approximation of the nominal level for the three statistics with different values of the parameter $\gamma$ in the threshold function, where monitoring was stopped after 1500 observations. We observe a reasonable approximation of the nominal level 5% in the case $\gamma = 0$, while the rejection probabilities for $\gamma = 0.25$ or $\gamma = 0.45$ slightly exceed the desired level of 5%. The fact that larger values of $\gamma \in [0, 1/2)$ can lead to a worse approximation of the desired type I error has also been observed by other authors [see, for example, Wied and Galeano (2013)] and can be explained by a more sensitive threshold function at the start of monitoring if $\gamma$ is chosen close to $1/2$. Overall, the approximation is slightly better for the independent case in model (LM1).
In Figure 5 we compare the power with respect to the change amount for different change positions, where we restrict ourselves to the case $\gamma = 0$ for the sake of brevity. The results are very similar to those obtained for the mean functional in Section 4.1. Again the monitoring scheme based on $\hat E$ outperforms the procedures based on $\hat Q$ and $\hat P$, and the superiority is larger for a later change. We omit a detailed discussion and summarize that the empirical findings indicate superiority (with respect to testing power) of the monitoring scheme based on the statistic $\hat E$.

Closed-end scenarios
It is worthwhile to mention that the theory developed so far also covers the case of closed-end scenarios [sometimes also called finite time horizon in the literature]. In this section we briefly discuss this situation and present a small batch of simulation results, which also indicate the superiority of the statistic $\hat E$ in closed-end scenarios. In this setup the null hypothesis of a constant parameter is tested against the alternative that the parameter changes (once) at some time $m + k^*$ with $k^* \le mT$. Here the factor $T \in \mathbb{N}$ controls the length of the monitoring period relative to the size of the initial data set. Under the assumptions stated in Section 2, we can prove a statement corresponding to Theorem 2.6 and Corollary 2.8 (Theorem 5.1), where $W$ is again a $p$-dimensional Brownian motion with independent components.
The proof of Theorem 5.1 follows by a straightforward adaptation of the proofs of Theorem 2.6 and Corollary 2.8 given in Appendix A. Corresponding results hold for the tests based on the statistics $\hat Q$ and $\hat P$ defined in (4.1). To complete the discussion of closed-end scenarios, we display a small batch of simulation results for the detection of changes in the mean as described in Section 3.1. For the sake of brevity, only the choice $T = 4$ is examined here [unpublished simulation results show similar outcomes for other choices of $T$]. The remaining simulation settings are the same as in the simulation study presented in Section 4.1, and in Table 4 we display the necessary critical values defining the rejection regions for the different procedures.

The approximation of the nominal level under the null hypothesis is displayed in Table 5 and in Figures 6 and 7.

Table 4: $(1-\alpha)$-quantiles of the distributions $L_{1,\gamma}(4)$, $L_{2,\gamma}(4)$ and $L_{3,\gamma}(4)$ for different choices of $\gamma$. The cutoff constant was set to $\varepsilon = 0$ and the dimension is $p = 1$.

Two applications
In this section, we discuss some practical issues that arise when our methodology is applied to monitor for changes in linear models. As a summary of our recommendations, we provide Algorithm 1 and two elaborated examples related to the United Kingdom European Union membership referendum of 2016. We consider the linear model
$$Y_t = \beta_{1,t} P_{1,t} + \beta_{2,t} P_{2,t} + \varepsilon_t, \qquad (6.1)$$
where $Y_t$ is a real-valued response and $(P_{1,t}, P_{2,t})$ is a two-dimensional predictor; this is a special case of the linear model considered in Section 3.2.
Like most monitoring procedures, our approach requires a stable segment of $m$ observations in which no changes have yet happened. To ensure that a candidate for a stable segment is suitable, we suggest making use of well-known retrospective change point test methodology. A comprehensive introduction to and summary of this topic can, for example, be found in Csörgö and Horváth (1997). For the sake of completeness, details of the retrospective test employed as part of our analysis are provided in Section B of the appendix.
In light of our asymptotic arguments in Section 2, we require that m will not be chosen smaller than a minimum segment size, say m min , to be specified by the user. In practice, the observations of times t = 1, . . . , m are then used as (a first candidate for) a stable segment and monitoring starts not earlier than at time m + 1; see line 2 of Algorithm 1.
Before monitoring begins, we apply the retrospective test to the candidate for a stable segment and use it as such only if the test does not reject the null hypothesis of no change.
Otherwise, if there is evidence of a change, we record the change point as 'found during setup', and continue in this manner with the next m min observations onwards from the change that we identified until a segment is found where the retrospective test does not reject. Obtaining the stable segment for monitoring is described in lines 4-10 of Algorithm 1.
Once a stable segment is available monitoring starts: The detector is updated for every incoming observation, namely (Y t , P 1,t , P 2,t ), and a decision is made whether to reject the null hypothesis and stop the procedure or to continue monitoring with the subsequent observation. This is described in lines 11-13 of Algorithm 1.
After a change has been detected, the monitoring phase ends and an estimate for the time of the change is obtained via formula (6.2), where P t = (P 1,t , P 2,t ) denotes the vector of predictors. The objective function in (6.2) resembles the one in the detector defined in (3.7), but we use the weights of the retrospective test defined in (B.1). Note that, once monitoring has stopped, the problem of estimating the change time is of a retrospective nature, with the constraint that the first m observations are known to be stable. The estimated time of change is then labeled 'found during monitoring' and recorded.
For the next monitoring phase, the observations from the change point just identified up to the last observation previously monitored are considered as a candidate for the next stable segment. If this set would contain fewer than m min elements, it is enlarged by adding incoming observations until m min observations are available. This is described in lines 14-16 of Algorithm 1. The procedure then returns to line 4, where the retrospective test is applied to assess the suitability of the candidate as the next stable segment. Note that, if a change is identified in the candidate for the next stable segment, a new candidate is chosen: it consists of the observations from that identified change point up to the time where monitoring last stopped, or of the m min observations following the identified change in case the candidate would otherwise contain fewer than m min observations; see line 8 of Algorithm 1.
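The overall flow just described can be sketched as the following outer loop. This is our reading of Algorithm 1, not a reproduction of it: the three callables are hypothetical stand-ins for the retrospective test of Appendix B, the detector of (3.7), and the change-time estimator of (6.2), and for brevity the next candidate segment simply consists of the m min observations from the estimated change onwards.

```python
def monitor(n, m_min, retrospective_change, detector_rejects, estimate_change):
    """Sketch of the outer loop of the monitoring scheme (our reading).

    Hypothetical stand-ins:
      retrospective_change(segment) -> index of a change in `segment`, or None
      detector_rejects(segment, t)  -> True if the sequential detector rejects at time t
      estimate_change(segment, t)   -> estimated change time after a rejection at t
    Returns the recorded change points and labels 'R' (found during setup)
    or 'S' (found during monitoring).
    """
    changes, labels = [], []
    start = 0
    while start + m_min <= n:
        segment = range(start, start + m_min)      # candidate stable segment
        cp = retrospective_change(segment)
        if cp is not None:                         # change found during setup
            changes.append(cp); labels.append("R")
            start = cp                             # next candidate starts at the change
            continue
        t = start + m_min                          # monitoring phase
        while t < n and not detector_rejects(segment, t):
            t += 1
        if t == n:                                 # data exhausted without rejection
            break
        cp = estimate_change(segment, t)           # change found during monitoring
        changes.append(cp); labels.append("S")
        start = cp
    return changes, labels
```

Plugging in toy callables (e.g. a detector that always rejects at a fixed time) reproduces the alternation between setup and monitoring phases described above.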
In the remaining part of this section, we present the outcomes of the statistical analysis by Algorithm 1 for two data sets related to the United Kingdom (UK) European Union (EU) membership referendum, which took place on 23 June 2016. For our analysis we chose the significance level α = 0.1 and the threshold function w 0 , as defined in (2.13). All data used was obtained from https://www.ariva.de on 6 April 2019.
As our first example, we consider the relation of the UK's currency, Pound Sterling (GBP), to the Eurozone's currency, the Euro (EUR), and Switzerland's currency, the Swiss franc (CHF). More precisely, we consider daily log returns of the exchange rate of GBP to the United States dollar (USD) as the response Y t of a linear model as described in (6.1). As predictors we consider the log returns of the exchange rates of EUR to USD (P 1,t ) and CHF to USD (P 2,t ).
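The daily log returns entering the model are simply first differences of log prices, r t = log p t − log p t−1 . A minimal sketch (the function name is ours):

```python
import numpy as np

def log_returns(prices):
    """Daily log returns r_t = log(p_t) - log(p_{t-1}) from a price series."""
    p = np.asarray(prices, dtype=float)
    return np.diff(np.log(p))
```

Applied to each of the three exchange-rate series, this yields the response and predictor series used in the analysis.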
A graphical representation of the exchange rates and associated log returns for the period from January 2016 to March 2019 can be seen in Figure 8. Nov 2016 is considered as a candidate for a stable segment and no evidence against this is found. The 71 observations in this segment are used for the third iteration of monitoring, which treats these observations as a stable set without stopping until the end of the available data.

Algorithm 1. Data: univariate responses Y t and bivariate predictors (P 1,t , P 2,t ). Input: minimum size m min for the stable segment, γ to be used for the threshold function w γ , significance level α. Output: a sequence of change points c 1 , c 2 , . . . and labels T 1 , T 2 , . . . that indicate whether the changes were identified by the retrospective test during setup (R) or by the sequential procedure during monitoring (S).
As our second example, we consider the relation of the UK's market to that of the United States (US) and the EU. More precisely, we consider daily log returns of the FTSE 100, a share index of the 100 companies listed on the London Stock Exchange with the highest market capitalization, as the response Y t of the linear model described in (6.1). As predictors we consider the log returns of two similarly constructed indices that are related to the US and EU markets, namely the S&P 500 (P 1,t ) and the EuroStoxx 50 (P 2,t ). A graphical representation of the prices and log returns for the period from January 2016 to March 2019 is provided as well.

Summary and outlook
In this paper we developed a new monitoring scheme for change point detection in a parameter of a multivariate time series which is applicable in an open-end scenario. Compared to the commonly used methods, we replace the estimator of the parameter from the initial sample X 1 , . . . , X m by an estimator from the sample X 1 , . . . , X m+j . We then compare this estimator with the estimator from the sample X m+j+1 , . . . , X m+k for every j = 0, . . . , k − 1. For the new statistic, the asymptotic distribution under the null hypothesis and the consistency of a corresponding test, which controls the type I error, are established. By considering a common class of thresholds w γ defined in (2.13), the limit reduces to an elementary distribution, for which quantiles can be obtained by straightforward Monte Carlo simulations.
Finally, we demonstrate by a comprehensive simulation study that the new monitoring scheme is superior (in terms of testing power) to a benchmark consisting of common methods proposed in the literature. The new statistic can also be used in closed-end scenarios, for which the same superiority in power is observed.
A very challenging subject for future research is the characterization of the asymptotic distribution of the stopping times based on the statistic Ê defined in (2.2). Corresponding results are already known for the methods based on Q̂ and P̂, see Aue and Horváth (2004) and Fremdt (2014a), respectively. Moreover, as the finite sample properties depend sensitively on the efficient estimation of the long-run variance, it is of interest to develop a concept of self-normalization [see Shao and Zhang (2010)] which is applicable in an open-end scenario.

A Proofs of the results in Section 2
Proof of Theorem 2.6: In the proof we use the following additional notation. Define E m (k) as the statistic obtained by replacing the long-run variance estimator Σ̂ in definition (2.2) by the (unknown) true long-run variance Σ. Further define Ẽ m (k) by replacing the estimators θ̂ j i by the corresponding averages of the influence function in (A.1). Finally, the triple (Ω, A, P) denotes the underlying probability space.
The proof itself is now split into several lemmas, A.1-A.5. Lemma A.1 shows that E m (k) and Ẽ m (k) are (asymptotically) equivalent. Lemma A.2 approximates Ẽ m (k) by Brownian motions, while Lemma A.3 then yields a limit for this approximation. Lemma A.4 finishes the proof by plugging in the covariance estimator, showing that Ê m (k) and E m (k) are asymptotically equivalent. Finally, Lemma A.5 establishes the alternative representation of the limit distribution in (2.11). In each lemma, we suppose that the assumptions of Theorem 2.6 are valid.
Proof (of Lemma A.1). By the (reverse) triangle inequality and the linearization in (2.7) we obtain, where we used that θ t is constant for the last equality. Next, using Assumption 2.3 for the threshold w, we obtain, and by similar arguments it holds that. Further note that due to Assumption 2.4 with probability one.

Proof (of Lemma A.2). For the remainder of the proof let W Σ m,i := √Σ W m,i for i = 1, 2 and note that the last display and the (reverse) triangle inequality then yield. We will treat the three summands of the last display separately. Using the definition of the operator norm, we derive the following bound for the first summand. By Assumption 2.2 and the estimate ‖x‖ Σ −1 ≤ ‖Σ −1 ‖ op 1/2 |x| for all x ∈ R p , the second factor is of order O P (1). Since w has a lower bound w, we obtain for the first factor that. Next, we can bound the second summand of the right-hand side in (A.8) by, where we used that (m + j) = (m + j) ξ (m + j) 1−ξ ≥ j ξ m 1−ξ . Using again Assumption 2.2, the second factor in the last display is of order O P (1). Moreover, following the idea of the proof of Lemma 3 in Aue et al. (2006), it holds that, and it remains to treat the third summand of the right-hand side in (A.8), which can be bounded by. Using Assumption 2.2 and the arguments in (A.9), this term is of order o P (1), which finishes the proof of Lemma A.2.
Lemma A.3 (Obtain limit process) The following weak convergence holds as m → ∞, where W 1 and W 2 denote independent, p-dimensional standard Brownian motions.
Proof. First note that, due to the scaling properties of Brownian motion [see for example Chapter 2 of Shorack and Wellner (1986)], it holds in distribution that, and so within the proof we will, without loss of generality, only consider P m (k). Additionally, we define the processes. We will show that L (i) (s, t), i = 1, 2, are uniformly continuous on R + ∆ = {(s, t) ∈ R 2 | 0 ≤ s ≤ t} with probability one.
Case L (1) : In the following let ε > 0 be fixed but arbitrary. For almost every ω ∈ Ω, the law of the iterated logarithm implies that we can choose a C = C(ε, ω) > 0 sufficiently large such that. Depending on C, we can split R + ∆ into the (overlapping) sets, where we use the definitions. Further, let d denote the maximum distance, that is, d((s 1 , t 1 ), (s 2 , t 2 )) = max(|s 1 − s 2 |, |t 1 − t 2 |).
Set M 1 (C): As this set is compact, the (ordinary) almost sure continuity of L (1) already implies that (A.13) holds for j = 1 and δ > 0 sufficiently small.
Set M 2 (C): We have the following bound sup d((s 1 ,t 1 ),(s 2 ,t 2 ))<δ (s 1 ,t 1 ),(s 2 ,t 2 )∈M 2 (C) (A.14). We will treat both summands of the last display individually. The first one can again be bounded as follows sup |t 1 −t 2 |<δ 0≤s≤C+1, t 1 ,t 2 >C and again we will treat both terms separately. For the first term note that, since w is uniformly continuous, this expression is smaller than ε/3 for sufficiently small δ. For the second term of the right-hand side of (A.15), note that we have the bound, and by the choice of C in (A.11) and for sufficiently small δ this is bounded by ε/3. To complete the treatment of M 2 (C), it only remains to examine the second term on the right-hand side of (A.14). We obtain that sup |s 1 −s 2 |<δ 0≤s 1 ,s 2 ≤C+1, t>C which can be bounded by ε/3 for sufficiently small δ, since the first factor is a constant and the function f (s) = W (s + 1)/(s + 1) is uniformly continuous on the compact set [0, C + 1] with probability one.
This completes the third case and so the almost sure uniform continuity of L (1) on the set R + ∆ is established.
Combining Lemmas A.1, A.2 and A.3, we have already proven that, and it only remains to investigate the impact of the covariance estimator. Therefore, the following lemma finishes the proof of Theorem 2.6.
and it remains to show the equality in distribution stated in (2.11).