A state-dependent linear recurrent formula with application to time series with structural breaks

An underlying assumption in Multivariate Singular Spectrum Analysis (MSSA) is that the time series are governed by a linear recurrent continuation. However, in the presence of a structural break, multiple series can be transferred from one homogeneous state to another over a comparatively short time, breaking this assumption. As a consequence, forecasting performance can degrade significantly. In this paper, we propose a state-dependent model that incorporates the movement of states into the linear recurrent formula, called a State-Dependent Multivariate SSA (SD-MSSA) model. The proposed model is examined for its reliability in the presence of a structural break by conducting an empirical analysis covering both synthetic and real data. Comparison with standard MSSA, BVAR, VAR and VECM models shows the proposed model outperforms all of these models significantly.


| INTRODUCTION
This paper examines an adaptation of the Singular Spectrum Analysis (SSA) model in the presence of a structural break. Time series are well known to exhibit abnormal behaviour, such as clustering of outliers, non-linearities, and the particular focus here, structural breaks. Structural breaks may be conceptualised in two ways: either the time series fluctuates between a state of equilibrium and a state of recovering from a shock, or breaks arise from natural limit cycles within the economic system, as proposed in Beaudry et al. (2015). Real-world time series are also known to exhibit non-stationary and heteroscedastic behaviour which can render standard linear time series analysis methods (e.g., ARIMA) inappropriate (Kantz & Schreiber, 2004). This has led to the development of several non-linear and embedding techniques and motivated the development of SSA (Broomhead & King, 1986). SSA is a non-parametric modelling technique which makes no underlying assumptions about the distribution or stationarity of the time series and so can handle non-linearity by removing noise from the time series using eigendecomposition methods (Golyandina et al., 2001).
SSA assumes that a time series evolves according to an underlying state space system, with the movement in that state space described by a linear recurrent formula (LRF) (Golyandina et al., 2001). In addition, an underlying assumption in SSA is that an observed time series is one realisation from a single LRF. However, this usually does not account for the presence of a structural break.
Either the structural break dynamics are not apparent due to an insufficient history, or the structural break is part of another process that also evolves the state space, albeit at a rapid pace. To account for the structural break while continuing to use SSA, one must therefore assume an LRF exists that determines the evolution before the break; that during the break itself evolution occurs due to a different process/shock; and that the evolution after the break is described either by a new LRF or by the old LRF (starting from a new point in the state space). Ignoring the structural break and estimating the LRF from the entire data incurs forecast performance degradation, for reasons described in Theorem 1. The solution proposed here is to change the LRF from a fixed recurrent equation to an adaptive recurrent equation. Such a model is known as a state-dependent model (SDM). The SDM was introduced by Priestley (1980) to target shifts in the dynamics of a time series from one state to another in the presence of a shock and, to the best of our knowledge, its use in SSA is novel.
The aim of this paper is to incorporate the SDM into the SSA framework to prevent shifts in a time series state from introducing a persistent bias in model forecasts (as will be shown later). In addition, we present results in the multivariate setting in which several inter-related economic time series are to be forecast together. The resulting model is shown to greatly improve Multivariate SSA (MSSA) (Golyandina et al., 2001) in the presence of a structural break, while still maintaining MSSA's ability to model a time series in the embedded time domain with noise rejection.
The proposed model is tested on synthetic data, on the monthly Industrial Production Indicators (IPI) of France, Germany and the UK, and on monthly Unemployment Rate (UR) series for the EU and USA. To provide a better understanding of its performance with respect to both standard MSSA and classical autoregressive models, the results are benchmarked against the standard VAR, Bayesian VAR (BVAR), VECM and MSSA models. The remainder of the paper is organised as follows: Section 2 presents related work, Section 3 extends MSSA and proposes the SD-MSSA (State-Dependent Multivariate Singular Spectrum Analysis) model. Section 4 presents the empirical results and discusses the findings. Finally, Section 5 draws conclusions and suggests avenues for future work.

| RELATED WORK
In this section we give a brief overview of SSA and focus on common models and features derived from time series in the presence of a structural break. In the last two paragraphs, we discuss research related to the core addition: the state-dependent model. SSA was first introduced in the 1980s by Broomhead and King (1986) and Fraedrich (1986) as a time series approach to model dynamic behaviour through embedding delays, as inspired by Takens' theorem. Since then, the theoretical foundations of SSA for time series analysis and forecasting have been established in a series of papers (Allan & Smith, 1996; Danilov & Zhigljavsky, 1997; Golyandina et al., 2001; Groth & Ghil, 2011; Kapl & Mueller, 2010; Patterson et al., 2011; Oropeza & Sachchi, 2011; Vautard et al., 1992; Yiou et al., 2000; Zhang & Hui, 2012; Zhao et al., 2011), and the references therein.
The embedded representation of a time series used in SSA decomposes it into a set of data-adaptive orthonormal components. These components can be projected into a lower dimension and then reconstructed to form a smoother time series, which can then be used for explaining structure and forecasting (Viljoen & Nel, 2010). Of particular interest, this paper focuses on an extension of SSA called Multivariate SSA. MSSA seeks commonalities between the embedding delay covariance matrices of several time series, which are called common modes. Essentially, the applicability of MSSA centres on the existence (or not) of these common modes. It is assumed that common modes exist which explain simultaneous variations in a system of multiple time series (Viljoen & Nel, 2010). For example, when the modes in the time series are well-matched to each other, MSSA performance can improve significantly; especially when a single time series alone is not sufficient to obtain good (univariate) estimates of the modes. As noted in Viljoen and Nel (2010), two or more time series may share common modes but may also exhibit specific modes (specific to just one time series).
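The embedding-and-reconstruction step described above can be sketched in a few lines; this is an illustrative univariate SSA sketch (the multivariate case stacks several such trajectory matrices), not the authors' implementation, and the window length and rank below are arbitrary choices:

```python
import numpy as np

def ssa_reconstruct(y, L, r):
    """Embed a series with window length L, truncate the SVD of the
    trajectory matrix to rank r, and reconstruct a smoothed series
    by anti-diagonal (Hankel) averaging."""
    N = len(y)
    K = N - L + 1
    # Trajectory (Hankel) matrix: column i is the window y[i:i+L]
    X = np.column_stack([y[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the r leading eigentriples only
    Xr = (U[:, :r] * s[:r]) @ Vt[:r, :]
    # Anti-diagonal averaging back to a length-N series
    rec = np.zeros(N)
    counts = np.zeros(N)
    for j in range(K):
        rec[j:j + L] += Xr[:, j]
        counts[j:j + L] += 1
    return rec / counts

# Noisy monthly-style sinusoid: two eigentriples capture the oscillation
t = np.arange(200)
y = np.sin(2 * np.pi * t / 12) + 0.3 * np.random.default_rng(0).standard_normal(200)
smooth = ssa_reconstruct(y, L=24, r=2)
```

Projecting onto the two leading components rejects most of the additive noise, which is the "noise rejection" property the paper relies on.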
Given the common modes characteristic, MSSA has been found to perform well in some domains such as noise reduction in EEG signals (Groth & Ghil, 2011), anomaly detection and traffic patterns in computer networks (Babaie et al., 2014), spatial statistics (Awichi & Müller, 2013), reconstruction of shared oscillations and analysis of oscillatory modes of several time series (Groth & Ghil, 2011), phase synchronisation analysis (Groth et al., 2015), and many others.
Although in all the examples mentioned above MSSA is used as a filtering and reconstruction technique, it can also be used as a powerful forecasting technique. For example, Patterson et al. (2011) show that MSSA, as a non-parametric tool, can be useful for forecasting realtime data subject to a revision process, when compared to those time series models which involve restrictive assumptions such as linearity, normality and stationarity. Moreover, applying MSSA on the UK's IPI time series was found to improve forecasting accuracy over standard parametric models like VAR and ARMA. More comparisons between MSSA (and SSA) forecasting and other classical models like autoregressive models (ARIMA and GARCH) and random walk models, through a wide range of financial and economic time series, can be found in previous works (Hassani et al., 2013;Hassani et al., 2012;Heravi et al., 2004;Kapl & Mueller, 2010;Osborn et al., 1999;Patterson et al., 2011;Skare & Buterin, 2015;Zhang & Hui, 2012).
MSSA has also been combined with other methods to target specific features of particular datasets. For example, Menezes et al. (2014) combine MSSA with Periodic AutoRegressive models (PAR) to forecast the monthly average wind speed of two regions of Northeast Brazil (hydrological time series). Their results show PAR(P)-MSSA is far superior to the other two models (PAR(P) and PAR(P)-SSA), since MSSA takes into account the spatial dependence between the two stations. Multi-scale SSA (MS-MSSA) is yet another improvement on MSSA, proposed by Yiou et al. (2000). They define a moving window proportional to the series length and use a wavelet transform to detect the regular part of the lag-covariance matrix. Despite the fact that empirical results show similar behaviour for MS-MSSA and wavelet analysis, MS-MSSA is faster at tracking sharp regime transitions in the evolution of a non-linear system than either MSSA or wavelet analysis (Yiou et al., 2000).
MSSA forecasting is conducted via the recurrent coefficients of an LRF (Formula 2 in Section 3). There exist several variations of the LRF, such as the nearest subspace LRF approach when there is an approximate separability in which the selected linear space cannot be the trajectory space of the time series anymore (Golyandina et al., 2001) and the minimal recurrent formula, which removes the extraneous roots of the characteristic polynomial of the LRF (reduction method).
Previous works (Gelman et al., 2013; Hendry & Mizon, 2005; Kim et al., 2004; Pesaran et al., 2006) discuss different approaches to dealing with structural breaks in several econometric models. Hendry and Mizon (2005) point out that when a structural break occurs, forecasts invariably become quite bad, as the underlying assumptions about the data generating process (DGP) usually become invalid. It is then suggested that a reset-type approach works best, in which one re-estimates the model states given only the new data (e.g., an intercept can be reset as it can be estimated from the data directly). However, for SSA one has a problem, as there is no clear relationship between the LRF coefficients and the data (the SVD clouds the relationship, Section 3.2) and, in addition, we would like to retain pre-break information, especially as it relates to the seasonal pattern of the time series. For the LRF, Golyandina et al. (2001) find that in the presence of abrupt changes the number of perturbed extraneous roots increases significantly, which results in a less precise forecast. Furthermore, while the time series before and after the break may locally follow an LRF (or two separate LRFs), the combined time series may not. To detect such changes, Golyandina et al. (2001) defined a heterogeneity function and an eigentriple rearrangement by measuring the discrepancy between two (or more) different states of a time series. They have also examined the root function of the characteristic polynomial of the LRF. This gives a dynamic description of the sequence of the homogeneous parts of the series, which is affected by the eigentriple rearrangement and structural changes. Although the system post-break evolves according to the same dynamics, MSSA is unable to estimate these parameters correctly (as we shall prove in Theorem 1).
Note that it is not that the data pre- and post-structural break are necessarily samples from a different process, but rather that estimating the LRF does not take the break into account, and this leads to a biased estimate of the model. Therefore, we require not a fixed state space system as in SSA but one which adapts to the changes in the system, as in state-dependent models (SDMs).
The SDM (Priestley, 1982) uses a sequential structure to distinguish between state transitions and can be used for forecasting and for detecting non-linearity in time series. According to Priestley (1980), state-dependent models have two advantages. First, they can be used directly for prediction and forecasting; second, since they can be fitted to data without specific prior assumptions about the form of the non-linearity, they can provide an overview of the inherent non-linear characteristics of a given series (Priestley, 1982). An SDM operates by splitting the state space into a large number of small segments and modelling the process as locally linear inside each segment.
In this paper, the movement of the LRF parameters is tracked by using an SDM in such a way that the parameters can change not only over time but also over the states of the system. In other words, the LRF's behaviour in the presence of a structural break is explained by a non-linear function connecting the current state to its previous states (and the previous states of other series) and random shocks. The whole process is called SD-MSSA, since the parameters are no longer fixed for all time series and are updated recursively via the Extended Kalman Filter (EKF). In the next section, we briefly describe the MSSA model and then present the theoretical foundations of SD-MSSA.

| METHODOLOGY
In this section, we first present MSSA and the LRF. Then, after having shown that SSA is biased in the presence of a structural break, we propose the state-dependent LRF to alleviate the bias.¹

| MSSA model
Let Y⁽ᵐ⁾ = (y₁⁽ᵐ⁾, …, y_N⁽ᵐ⁾), m = 1, …, M, denote a system of M time series of length N. Here, L is called the embedding dimension or window length and is the maximum delay applied to each series to form the embeddings.² Each series is embedded as a Hankel matrix X⁽ᵐ⁾ whose columns are the lagged vectors (y_i⁽ᵐ⁾, …, y_{i+L−1}⁽ᵐ⁾)ᵀ, i = 1, …, N − L + 1, and these embeddings are then stacked to form the block trajectory matrix X_M = [X⁽¹⁾ : … : X⁽ᴹ⁾]. The next step involves estimating a lower-dimensional reconstruction of X_M via the singular value decomposition, retaining the r leading eigentriples (√λ_j, U_{M_j}, V_{M_j}). The first L − 1 rows of X⁽ᵐ⁾ form a linear projection onto the Lth row,

ŷ_{i+L−1}⁽ᵐ⁾ = φ₁ y_{i+L−2}⁽ᵐ⁾ + … + φ_{L−1} y_i⁽ᵐ⁾,    (1)

which is also defined as an L-continuation of the series, where the coefficient vector (φ₁, …, φ_{L−1}) is applied to the linear space 𝓛_r = span{U_{M₁}, …, U_{M_r}}. Note that the choice of L depends on the structure of the data and no single rule for L can cover all applications. However, in general L should be proportional to the periodicity of the data and large enough to obtain sufficiently separated components, but not greater than N/2. A full discussion of the choice of this parameter is given in Golyandina et al. (2001).
To obtain the coefficient vector (φ₁, …, φ_{L−1}), let U▽_{M_j} denote the vector of the first L − 1 components of the eigenvector U_{M_j}, and let π_{M_j} denote its last component. Then, if e_L ∉ 𝓛_r, it can be simply proved that ν² = π²_{M₁} + π²_{M₂} + … + π²_{M_r} < 1; see Golyandina et al. (2001). It is also shown that there exists a unique L-continuation, which means the LRF coefficients, Φ = (φ_{L−1}, …, φ₁)ᵀ, can be expressed as

Φ = (1 / (1 − ν²)) Σ_{j=1}^{r} π_{M_j} U▽_{M_j}.    (2)

¹ We caution the reader that although SSA and AR/VAR models share similarities, they are fundamentally quite different and in our experience drawing comparisons can hinder understanding.
² In principle, because of the embedding theorem (Takens, 1981), the time-delayed version of a time series is generically sufficient to reconstruct the dynamics of the underlying system if enough delayed coordinates are used (Cao et al., 1998).
Equation (2) is the core of this paper and has many interesting properties (see Golyandina & Zhigljavsky, 2013). First, it can be used with the chain rule of forecasting time series to give multi-step-ahead forecasts. Second, it is not a typical AR(L − 1) model, as the residual is not iid.³ More importantly, Φ captures the dynamics of the time series (specifically, it defines the linear recurrence; Golyandina & Zhigljavsky, 2013) and gives an optimal multi-step-ahead forecast, while an AR model typically gives the optimal one-step-ahead forecast (this is the major advantage of SSA as a forecasting tool). An example, to be used later, is shown in Figure 1a, which shows three time series with very different characteristics, yet all have been generated from the exact same LRF.
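A minimal single-series sketch of Equation (2) and the recurrent forecast it drives (the multivariate case stacks the trajectory matrices); this assumes the standard coefficient formula from Golyandina et al. (2001) and is our illustration, not the authors' code:

```python
import numpy as np

def lrf_coefficients(y, L, r):
    """Estimate the LRF coefficients Phi from the r leading left
    singular vectors of the trajectory matrix (Equation (2))."""
    N = len(y)
    X = np.column_stack([y[i:i + L] for i in range(N - L + 1)])
    U = np.linalg.svd(X, full_matrices=False)[0][:, :r]
    pi = U[-1, :]             # last components of the eigenvectors
    nu2 = np.sum(pi ** 2)     # verticality coefficient, must be < 1
    assert nu2 < 1.0
    # Coefficient for the oldest lag comes first, newest lag last
    return (U[:-1, :] @ pi) / (1.0 - nu2)

def lrf_forecast(y, phi, steps):
    """Recurrent multi-step forecast: each new value is a linear
    combination of the previous L-1 values (the chain rule)."""
    ext = list(y)
    for _ in range(steps):
        window = np.array(ext[-len(phi):])   # chronological order
        ext.append(float(phi @ window))
    return np.array(ext[len(y):])

# A series that exactly satisfies an LRF: y_n = 1.1 * y_{n-1}
y = 1.1 ** np.arange(50)
phi = lrf_coefficients(y, L=2, r=1)
fc = lrf_forecast(y, phi, 3)
```

For an exactly recurrent series the estimated coefficient recovers the recurrence, and the multi-step forecast continues it without error, which is the sense in which Φ "captures the dynamics".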
Before turning to the theoretical effect of a structural break on the estimation of an LRF, we give a brief intuitive illustration of an LRF's behaviour in the presence of a structural break. Figure 1b shows a cloud map of 100 bootstrapped LRF estimates of φ₁(t), estimated at time t, for the French Vehicle time series (used later in Section 4.2.2). It is evident that, before the structural break (at time 226), the LRF's distribution has a lower variance and is more concentrated around the mean (white area). One would expect the variance to increase temporarily after a break; however, the increased variance appears to persist, and this is worrying. It would seem to imply that once a structural break has occurred we should discard pre-break data, which is obviously an unsatisfactory outcome.

| Linear recurrent formula in the presence of a structural break
As shown above, the LRF is the main vehicle through which future forecasts are obtained. In the following theorem we assume a single LRF dictates the evolution of a time series. We then examine the effect of a structural break which affects only the mean of the time series. It is mathematically convenient to assume the break affects the series before the break instead of after, which is justified as both segments follow the same LRF.

Theorem 1 (LRF Theorem). Let Y ∈ ℝ^{N×M} be a multivariate time series which follows an LRF without any structural breaks. Assume that the observed time series Ỹ is a back-shifted version of Y in which each time series experiences a structural shift in the mean, δ_m, at time τ_m. Let C_X and C_X̃ denote the covariance matrices of the trajectories of Y and Ỹ, respectively. Then the deviation between the first eigenvalues of C_X̃ and C_X is λ̃₁ − λ₁ ≈ T L (Q² − 2Q ȳ). Furthermore, this deviation decreases as O(k/N).
Proof. First note that for t > max(τ_m) all structural shifts have occurred and the observed time series (by assumption) is a linear continuation of Ỹ, not Y. Thus, C_X̃ contains the correct covariance structure for the time series going forward. Note that C_X̃ differs from C_X only by a shift in the mean for the first τ_m elements, and so can be expressed in terms of C_X with correction terms indexed by l = min(i, j) and the indicator 1_{A_m}, a step function with 1_{A_m} = 1 for A_m = {t < τ_m} and zero otherwise. Note that a structural break by definition takes the system from one mean to another, such that δ_m ≫ δy_k.⁴ Expanding these terms relates C_X̃ to C_X, and the expected value of the covariance matrix can then be written in terms of T, a row vector of the numbers of samples prior to the structural breaks; Q, a column vector of the amplitudes of the structural breaks; 1_L, a vector of ones with L elements; and J_{L×L}, a square matrix of ones. Thus, the net effect of a structural break is that the first eigenvalue of the covariance matrix has a bias of approximately T L (Q² − 2Q ȳ). From Lemma 2, below, this bias can be seen to tend to zero with N as O(k/N), where k is some constant.

³ A trivial example is when r = 1, that is, the residual has seasonal components which are auto-correlated, that is, not iid.
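The effect stated in Theorem 1 is easy to reproduce numerically. The sketch below (our illustration, with an arbitrary shift amplitude and a sinusoidal base series) keeps a fixed-length pre-break segment at a shifted mean and measures the deviation of the first eigenvalue of the trajectory covariance as the total sample length N grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def leading_eig(y, L):
    """First eigenvalue of the lag-covariance matrix of the trajectory."""
    X = np.column_stack([y[i:i + L] for i in range(len(y) - L + 1)])
    C = (X @ X.T) / X.shape[1]
    return np.linalg.eigvalsh(C)[-1]   # eigvalsh sorts ascending

L = 12
base = np.sin(2 * np.pi * np.arange(600) / 12) + 0.1 * rng.standard_normal(600)

biases = []
for N in (200, 400, 600):
    y = base[:N]
    shifted = y.copy()
    shifted[:100] -= 5.0    # pre-break segment sits at a different mean
    biases.append(leading_eig(shifted, L) - leading_eig(y, L))
```

The bias is large and positive just after the break and shrinks only as the post-break sample accumulates, consistent with the O(k/N) decay in the theorem (and with the slow decay of the LRF coefficients in Figure 2b).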
Theorem 1 states that the first eigenvalue of C X will contain a bias which slowly decays as new samples arrive (see Figure 2b for an empirical example of how this affects the LRF coefficients). For this reason a dynamic LRF is proposed based on a state dependent model.

| State-dependent format of linear recurrent formula parameters
In this section, the LRF parameters are viewed as state parameters which are allowed to evolve recursively based on the observations. Note that, in contrast to the LRF in MSSA, which assumes the same LRF coefficients for all series, the SD-LRF allocates different LRF coefficients to each series.⁴ In addition, the coefficients {φ_u(Y_N⁽ᵐ⁾)} = {φ_{u,N}⁽ᵐ⁾}, u = 1, …, L − 1, are assumed to be analytic functions of y_{N+1}⁽ᵐ⁾ which change smoothly over time and so can be expressed via a Taylor series expansion, whose gradient terms Δy are unknown hyper-parameters assumed to change slowly following a random walk, while Γ_{N+1} is a sequence of independent matrix-valued random variables such that V_{N+1}⁽ᵐ⁾ ~ N(0, Σ_{V_{N+1}}). With some modifications, equation (1) can be rewritten in state-space form, in which the state vector is no longer Y_N⁽ᵐ⁾ but is replaced by the state-dependent coefficients augmented with the unknown gradients, θ_{N+1}⁽ᵐ⁾, the vector of all unknown parameters of the model (and of course a function of Y_N⁽ᵐ⁾). Finally, replacing φ_u with its Taylor expansion in the measurement equation results in a state-space model in which H*_{N+1} = (y_N, …, y_{N−L+2}, 0, …, 0) is an inherent characteristic of the system capturing its motion.

⁴ By definition, a structural break in the mean takes the system from one mean to another rapidly and is clearly larger than any noise or trend in the time series.
The state transition matrix is characterised by a corresponding block-diagonal form and is open to modification over time. W_{N+1} is the measurement noise and ε_t is the process noise. The state vector is then estimated using the Extended Kalman Filter (see Appendix A1 for the full algorithm).
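The filtering idea can be illustrated with a deliberately simplified sketch: here the gradient-augmented state of the full SD-MSSA model is dropped, so the observation equation is linear in the coefficients and a plain Kalman filter suffices (the paper's model needs the EKF). All parameter values below are our own illustrative choices:

```python
import numpy as np

def track_lrf_kf(y, L, q=1e-4, r=0.04):
    """Track time-varying LRF coefficients with a Kalman filter,
    assuming they follow a random walk (simplified SD-LRF sketch)."""
    d = L - 1
    theta = np.zeros(d)          # coefficient state
    P = np.eye(d)                # state covariance
    Q = q * np.eye(d)            # random-walk (process) noise
    history = []
    for t in range(d, len(y)):
        h = y[t - d:t][::-1]     # regressors: most recent value first
        P = P + Q                # predict step (theta unchanged)
        S = h @ P @ h + r        # innovation variance
        K = P @ h / S            # Kalman gain
        theta = theta + K * (y[t] - h @ theta)
        P = P - np.outer(K, h) @ P
        history.append(theta.copy())
    return np.array(history)

# AR(1) whose coefficient drops from 0.9 to 0.5 at the break (t = 100):
# the tracked coefficient adapts instead of decaying slowly
rng = np.random.default_rng(2)
y = np.zeros(400)
for t in range(1, 400):
    a = 0.9 if t < 100 else 0.5
    y[t] = a * y[t - 1] + 0.2 * rng.standard_normal()
coefs = track_lrf_kf(y, L=2)
```

The random-walk process noise q controls the adaptation speed, playing the role of the smoothing factor discussed in Section 4.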
FIGURE 2 (a and d) Synthetic data generated from data generating processes DGP 1 and DGP 2 in Section 4.1, with L = p = 2; (b and e) LRF coefficients; (c and f) SD-LRF coefficients.

| EMPIRICAL RESULTS
In this section, we perform experiments on (i) synthetic and (ii) real time series. In the first set of experiments, we begin with a bivariate time series generated from a VAR(2) model and examine the forecasting performance of the LRF in the presence of structural breaks in detail. The analysis is then extended to a more complicated Data Generating Process (DGP).
The second set of experiments is conducted on three sets of real time series. IPI: from I.N.S.E.E. (Institut National de la Statistique et des Études Économiques) for France, from Statistisches Bundesamt Wiesbaden for Germany, and from the Office for National Statistics (ONS) for the UK. UR: from Eurostat for 30 countries in Europe and from the Bureau of Labor Statistics for the 48 states of the USA.

| Synthetic data experiments
In the first set of experiments, we simulate a multivariate time series from the VAR model, which has been found in several studies to efficiently model the dynamical behaviour of multivariate series (Lütkepohl, 2005).
Let Y_k = (y_k⁽¹⁾, …, y_k⁽ᴹ⁾)ᵀ denote an (M × 1) vector of time series; the basic p-lag VAR(p) model can then be expressed as

Y_k = δ + A₁ Y_{k−1} + … + A_p Y_{k−p} + ε_k,

where A₁, …, A_p ∈ ℝ^{M×M} are coefficient matrices, ε_k ∈ ℝ^{M×1} is a zero-mean white noise vector process with covariance matrix Σ, and δ is an intercept. Next, we insert a structural break as a shift in the mean level of the multivariate time series simulated by this VAR model. The length of each time series is 250, with the shift occurring at time 150. Our DGP consists of three models: (1) two time series have been simulated and shifted to the same level (Figure 2a); (2) two time series have been simulated but only one has been shifted (Figure 2d); (3) six time series have been simulated and each pair has been shifted to a different level (Figure 3a). For simplicity we used p = 2, which means an embedding dimension of two for both MSSA and SD-MSSA.⁵ The goal of this experiment is to show the exponential decay in the empirical LRF coefficients and to show how the SD-LRF can recursively track the changes and alleviate that decay.

| DGP 1
When a structural break hits both time series at the same time, as shown in Figure 2a, the LRF coefficient suddenly jumps from 1.005 to 1.03 and then starts decaying slowly back to its pre-break value (Figure 2b). On the other hand, the SD-LRF's coefficient grows comparatively faster, from 1.00 to 1.20, but decays exponentially as time moves on, as can be seen from Figure 2c. Moreover, the response time for the SD-LRF (10% to 90% of its final value) is 20 steps, while for the LRF this is far greater, at 70 steps.
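DGP 1 can be sketched as follows; the coefficient matrices, intercept and shift amplitude below are our own choices (picked only for stationarity), not the paper's exact parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

# A stationary bivariate VAR(2): Y_k = delta + A1 Y_{k-1} + A2 Y_{k-2} + eps_k
A1 = np.array([[0.5, 0.1], [0.1, 0.5]])
A2 = np.array([[0.2, 0.0], [0.0, 0.2]])
delta = np.array([1.0, 1.0])

N, M, tau, shift = 250, 2, 150, 5.0
Y = np.zeros((N, M))
for t in range(2, N):
    Y[t] = delta + A1 @ Y[t - 1] + A2 @ Y[t - 2] + 0.1 * rng.standard_normal(M)

# DGP 1: a structural break shifts both series to the same new level
Y[tau:] += shift
```

With these matrices the stationary mean is (I − A₁ − A₂)⁻¹ δ = (5, 5)ᵀ before the break and (10, 10)ᵀ after it, so the LRF estimated from the full sample averages over two regimes.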

| DGP 2
In the case that only one of the two time series has been shifted (Figure 2d), the LRF coefficient decays faster than in DGP 1 and, therefore, the response time drops from 70 steps to 60 steps (Figure 2e). We emphasise once again that MSSA finds one set of coefficients for both time series, which is a trade-off between computational cost and forecast accuracy. In contrast, the SD-LRF finds two sets of coefficients, and only the coefficient corresponding to the shifted time series decays, while the other one remains stable (Figure 2e).

| DGP 3
The third simulation consists of six time series with different shift levels: zero, moderately inflated and highly inflated (Figure 3a). Once again the LRF coefficients display a slow decay over time (Figure 3b), whereas the SD-LRF assigns coefficients specifically to the time series affected by the changes, which in almost all cases decay significantly faster than the LRF. As shown in Theorem 1, the presence of a structural break in the mean, δ, is related to the eigenvector [1, 1, …, 1]ᵀ/√L in U_i. Consequently, the empirical LRF coefficients show an exponential decay and die out very slowly as we move away from the shift, proportionally to the linear continuation of the pre-break series.

| Real time series experiments
In this section, we examine the performance of SD-LRF to forecast real time series with structural breaks.

| Data
The first datasets are the IPI time series for France, Germany and the UK. Apart from structural changes, IPI series share dynamical characteristics which make them quite suitable for multivariate modelling. The data represent eight major components of real industrial production in France, Germany and the UK. These are seasonally adjusted series which measure real output for all facilities located in the manufacturing of Electric and Gas (E&G) utilities, Chemicals, Fabricated Metals (F.M.), Vehicle, Food Products (F.P.), Basic Metals (B.M.), Electrical Machinery (E.M.) and Machinery. These indicators show movements in production output and highlight structural developments as well as short-term changes in the economy. Although we consider only these eight of the two-digit industries in this study, they account for more than 50% of the total industrial production in each country. The same eight industries have been considered by Heravi et al. (2004) and Osborn et al. (1999). Table 1 shows a descriptive analysis of the IPI for France, Germany and the UK.

⁵ We tested other embedding dimensions and the results show that the first LRF coefficient always has an exponential decay following the break.
In all cases, our sample period ends in February 2014. However, the data for France start from January 1990, for Germany from January 1991 and for the UK from January 1998. Most series (France and Germany) present a long period of growth in the 1990s and up to the last recession of 2008–2009. For the UK, however, most series show a period of stagnation in the early 2000s and recession in 2008. For almost all the series, the steepest drop can be seen around 2008–2009, which is attributed to the banking crisis and subsequent recession.⁶ To test collinearity and cointegration between the series, the Variance Inflation Factor (VIF)⁷ has been measured. Based on Table 1, almost half of the series are shown to be strongly correlated and a quarter of them are moderately correlated. In particular, the French and German economies show higher correlation, possibly due to a larger industrial component than the UK's.

⁶ We used the Bai and Perron test (Bai & Perron, 2003) for structural breaks. It tests the null hypothesis that there is no structural break in the data against the alternative that there is a known structural break at time t. We have reported the last break point in Table 1. As can be seen, in all cases the estimated break date occurs during the 2008 economic downturn.
⁷ VIF regresses an explanatory variable on the rest of the variables and subsequently measures the square of the multiple correlation coefficient via VIF = 1/(1 − R²). A rule of thumb is that if VIF goes above 10 then multi-collinearity is high, while for 5 < VIF < 10 multi-collinearity is moderate.

FIGURE 3 (a) Synthetic data generated from data generating process DGP 3 in Section 4.1, with L = p = 2; (b and c) LRF coefficients; (d and e) SD-LRF coefficients for zero, moderately and highly mean-inflated shifts, respectively.
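The VIF computation described in footnote 7 can be sketched as follows (an illustrative helper, not the authors' code):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: regress it on
    the remaining columns and return 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(len(Z)), Z])   # intercept term
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Two nearly collinear columns and one independent column
rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
x3 = rng.standard_normal(200)
x2 = x1 + 0.01 * rng.standard_normal(200)
v = vif(np.column_stack([x1, x2, x3]))
```

By the rule of thumb in footnote 7, the first two columns flag high multi-collinearity while the independent third column does not.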
The second datasets are the EU Unemployment Rate (UR) series, a collection of quarterly series based on the quarterly results of the EU Labour Force Survey and taken from Eurostat. Unemployment is among a number of critical variables whose evolution has been continuously subject to close analysis by economic authorities and academics alike. Indeed, there is a large body of literature dealing with the estimation of (econometric and time series) models aimed at understanding the determinants of this variable and evaluating their ability to produce accurate forecasts; a few recent examples include, inter alia, Milas and Rothman (2008), Lahiani and Scaillet (2009), Fendel et al. (2011), Trendle (2002) and Ball et al. (2015). Table 2 shows a descriptive analysis of the UR over the period 1998 to 2017 (79 quarters for 30 EU Member States). Note that the UR is expressed in percentages. The first half of the data are used for training and the remainder are used as a hold-out set for forecast performance evaluation. As can be seen from Table 2, there are some missing periods for most of the countries. In this study, we used a cubic interpolation function in MATLAB to impute the missing values. Note that these quarterly time series are fairly short, which makes them suitable for evaluating SD-LRF performance on forecasting short time series. Most of the unemployment series present a short period of decline in the 1990s and up to 2008–2009. Depending on the country, the UR was seen to grow considerably during the 2008 recession. The third datasets are the UR series for the 48 states of the USA, in several of which unemployment also grew, including New Mexico. However, states like North and South Carolina and Nevada display a 10% reduction in their unemployment rate.
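The missing-value imputation step can be sketched in Python; `interp1d(kind='cubic')` from scipy is the closest analogue of the MATLAB cubic interpolation mentioned above, though the spline details of the two routines differ, and this helper handles interior gaps only:

```python
import numpy as np
from scipy.interpolate import interp1d

def impute_cubic(y):
    """Fill interior NaN gaps in a series by cubic interpolation
    over the observed points (sketch of the imputation step)."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    ok = ~np.isnan(y)
    f = interp1d(t[ok], y[ok], kind='cubic')
    filled = y.copy()
    filled[~ok] = f(t[~ok])
    return filled

# A smooth quarterly-style series with two missing observations
truth = np.sin(np.linspace(0, 3, 40))
y = truth.copy()
y[10] = np.nan
y[25] = np.nan
out = impute_cubic(y)
```

For smooth series such as the UR, cubic interpolation recovers interior gaps almost exactly; gaps at the series ends would require extrapolation and are better handled separately.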
Descriptive analysis in Table 3 shows that the null hypothesis of a unit root cannot be rejected at the 1% level for any of the states. Also, the normality test (SW) confirms that these time series are not normally distributed. Last but not least, the collinearity test confirms that the USA unemployment series are strongly correlated. This can be mainly related to spatial dependency and geographical homogeneity, which determine an association between the commuting flow variables and spatial interdependencies between states.

| Forecasting results
In this section, we evaluate the forecasting performance of SD-MSSA relative to standard MSSA and different time series models, namely Vector AutoRegressive (VAR⁹), Bayesian VAR (BVAR¹⁰) and Vector Error Correction Model (VECM¹¹) models, for the IPI and UR time series in Tables 4-6.
Our algorithm settings are as follows for all experiments. The window length is set to values between 2 and 50, and in the process we evaluate all possible values of r for r < L. For the monthly series, in almost all cases we found that values of L between 12 and 36 result in better forecasting accuracy. As mentioned before, we used multivariate bootstrap SSA (Golyandina et al., 2001) to generate initial values for the SD-LRF. The maximum number of epochs is set to 1000. The selected range for the smoothing factor in this study is between 10⁻³ and 10⁻⁵.
Forecast accuracy is measured using the prediction root mean square error (RMSE) and the mean absolute error (MAE). However, since these measures give quantitatively similar results, to conserve space we only report the RMSE, as this is the most frequently quoted measure in forecasting. For assessing the statistical significance of the forecasting methods we use the Diebold-Mariano test (DM test) (Diebold & Mariano, 2002). The forecasting accuracy is also assessed for four different horizons: one step ahead, three and six steps ahead, and one year ahead (12-step).

⁹ All VAR models were found to have a regressor of lag 1. All models were tested for the presence of a constant term, a trend, and both together, but only the presence of a trend was found to be significant.
¹⁰ A Bayesian approach to estimating the parameters of a VAR model is presented in Del Negro and Primiceri (). Among the advantages of this approach is the ability of the model to adapt to time-varying changes in the data, including shocks/structural breaks. An implementation may be found in Krüger (2015).
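The two evaluation tools can be sketched as follows; `dm_test` is our illustrative helper implementing a simplified Diebold-Mariano statistic under squared-error loss, with a truncated long-run variance for h-step forecasts:

```python
import numpy as np

def rmse(err):
    """Root mean square error of a vector of forecast errors."""
    return np.sqrt(np.mean(np.square(err)))

def dm_test(e1, e2, h=1):
    """Simplified DM statistic for equal predictive accuracy under
    squared-error loss; |DM| > 1.96 rejects equality at the 5% level.
    Negative values favour the first forecast."""
    d = np.square(e1) - np.square(e2)   # loss differential
    T = len(d)
    # Long-run variance: autocovariances of d up to lag h-1
    gamma = [np.cov(d[k:], d[:T - k])[0, 1] if k else d.var()
             for k in range(h)]
    lrv = gamma[0] + 2 * sum(gamma[1:])
    return d.mean() / np.sqrt(lrv / T)

# Forecast 1 has half the error dispersion of forecast 2
rng = np.random.default_rng(5)
e1 = rng.standard_normal(500)
e2 = 2.0 * rng.standard_normal(500)
stat = dm_test(e1, e2)
```

With clearly unequal error variances the statistic is far below −1.96, so the test rejects equal predictive accuracy in favour of the first forecast, which is how the significance counts in Table 7 are obtained.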
The results of the forecasting competition are shown in Tables 4-6. These tables present the RMSE values obtained from recursive forecasts for 1, 3, 6 and 12 months. Table 4 shows RMSE values for IPI, while Tables 5 and 6 show those for UR. While the out-of-sample forecast performance of the MSSA models differs between horizons, the SD-MSSA model shows significantly lower forecasting errors than its counterpart, MSSA. In Table 7 we summarise the number of cases of statistically significant predictive accuracy, based on the DM test between each pair of models, for one-month to twelve-month-ahead forecasts. Among the autoregressive models, the BVARSV model shows a higher percentage of significant forecast accuracy for horizons greater than one month. MSSA performs better for one-step-ahead forecasts, while its counterpart, SD-MSSA, outperforms MSSA, BVARSV, VAR and VECM for horizons of 3, 6 and 12 months. These differences are statistically significant in 75%, 79% and 83% of the cases, respectively.
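The recursive (expanding-window) evaluation behind these tables can be sketched generically. Here `forecaster` is any function mapping a history and a horizon to an h-step forecast path; the function name and the naive benchmark below are our own illustrations, not the paper's code.

```python
import numpy as np

def recursive_rmse(series, forecaster, horizons=(1, 3, 6, 12), n_test=24):
    """Expanding-window evaluation: at each forecast origin, refit on all
    data up to that point, forecast h steps ahead, and collect the
    h-step-ahead error. Returns {horizon: RMSE}."""
    series = np.asarray(series, dtype=float)
    out = {}
    for h in horizons:
        errs = []
        for origin in range(len(series) - n_test, len(series) - h + 1):
            path = forecaster(series[:origin], h)   # h-step forecast path
            errs.append(path[-1] - series[origin + h - 1])
        out[h] = float(np.sqrt(np.mean(np.square(errs))))
    return out

# naive "last value" benchmark, for illustration only
naive = lambda hist, h: np.repeat(hist[-1], h)
```

On a deterministic linear trend, the naive benchmark's h-step RMSE is exactly h, which is a convenient sanity check for the evaluation loop.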
To better understand the forecasting gain, the cumulative squared out-of-sample forecast errors (for all French Vehicle series) are shown in Figure 4. Most noticeably, the SD-MSSA forecasting errors are significantly smaller than those of the MSSA, VAR and VECM models for both horizons, h = 3 and h = 12. In addition, the forecast errors estimated by the SD-MSSA models are much smaller than those of their Bayesian counterparts. Table 5 shows RMSE values for MSSA and SD-MSSA for UR in the European states. Not surprisingly, the RMSE values of the SD-MSSA model are much smaller than those of MSSA for horizons up to three months. Nevertheless, MSSA performs better for forecasting Ireland and Spain for horizons up to six months; these are two of the five countries with significant growth in their unemployment rate. Comparison between the two forecasting models via the DM test (Table 7) shows that MSSA's forecasts are statistically significant in 10% and 23% of cases for h = 1 and h = 3, respectively. On the other hand, SD-MSSA is again the more accurate model for horizons 6 and 12, and the differences between the two models are statistically significant in 63% and 93% of the cases.
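Cumulative-squared-error curves of the kind plotted in Figure 4 can be computed from any two series of forecast errors; the curve that grows more slowly identifies the better model over that stretch of the sample. A one-function sketch:

```python
import numpy as np

def cumulative_sse(errors):
    # running sum of squared forecast errors; plotting two of these
    # curves together shows where one model starts to dominate
    return np.cumsum(np.asarray(errors, dtype=float) ** 2)
```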
For longer horizons (3-12 months), SD-MSSA reports the lowest RMSE in nearly all cases. Table 7 also confirms that SD-MSSA returns a higher percentage of forecast accuracy than MSSA. The fact that SD-MSSA outperforms MSSA for forecasting the IPI and UR time series is related to the fact that all of these series were affected by the global economic and financial crisis of 2008. However, the impact of the shock varies between countries, and the strong correlation between these inter-related economic variables suggests that multivariate time series analysis improves forecasting performance. Our results also suggest that there is a trade-off between the computational cost of updating the LRF coefficients for each series and the forecasting accuracy obtained with the SD-LRF.

| CONCLUSION AND SUMMARY
In this paper, we developed a state-dependent model for the LRF in multivariate singular spectrum analysis. We proved theoretically that when a structural break occurs in a time series, the LRF, an underlying assumption of MSSA (and SSA), no longer coincides with a recurrent continuation of the pre-break series. Accordingly, in our model, called SD-MSSA, we propagate the coefficients as a function of the state vector to capture their state-dependent movement in the LRF. The propagation coefficients can then mathematically describe their time evolution as well as their state position. Empirical evidence shows the proposed model to be superior to MSSA in terms of the model parameters recovering from a shock.
The performance of the SD-MSSA model was evaluated using both synthetic and real time series containing a structural break. Of the five methods compared (SD-MSSA, MSSA, VAR, Bayesian VAR and VECM), SD-MSSA was found to be the most accurate for forecasting horizons of up to a year. Our results showed that MSSA performs better for one-step-ahead forecasts, while its counterpart, SD-MSSA, outperforms MSSA, BVAR, VAR and VECM significantly for horizons of 3, 6 and 12 months. SD-MSSA was also shown to have a higher percentage of forecast accuracy than MSSA.
The SDM model examined assumes an approximately Gaussian posterior for the coefficients of the LRF model (an underlying assumption of the EKF), but further work involving a non-Gaussian posterior would be of interest, such as using the unscented Kalman filter or the particle filter to estimate the parameters. This might also provide a means of rejecting noise in the time series, which can have an adverse effect on forecasts. It would also be of interest to examine different kinds of non-linearity in the SDM (i.e., changing the random walk model in Equation (10)), which again might align well with the use of the unscented Kalman filter or particle filter.

DATA AVAILABILITY STATEMENT
Sample data and codes for Bootstrap MSSA and SD-MSSA may be found online (https://github.com/drahmani/SD-MS-SA).
ORCID
Donya Rahmani https://orcid.org/0000-0003-1044-1005

…learning and applied graph theory. He has published over 40 peer-reviewed papers and was awarded a commercialisation award at UCC for his research and patents into time-of-arrival prediction techniques.

APPENDIX A
In this section, we show how the Extended Kalman Filter is used to estimate the state vector representation of the LRF, $\hat{\theta}$, with transition matrix $F^{*}_{N} \in \mathbb{R}^{2(L-1)\times 2(L-1)}$. Here $K^{*}_{N+1}$ is the "Kalman gain" matrix, $K^{*}_{N+1} = \Phi_{N+1}\,(H^{*}_{N+1})^{T}\,\sigma_e^{-2}$, where $\Phi_{N+1}$ is the variance-covariance matrix of the one-step prediction error of $\theta_{N+1}$, and $\sigma_e^{2}$ is the variance of the one-step-ahead prediction error of $y_{N+1}$, that is, the variance of $e_{N+1} = y_{N+1} - H^{*}_{N+1} F^{*}_{N} \hat{\theta}_{N}$. Let $C_{N+1}$ be the variance-covariance matrix of $(\hat{\theta}_{N+1} - \theta_{N+1})$; then successive values of $\hat{\theta}_{N+1}$ can be estimated using the standard Kalman filter. To provide initial values for the SD-LRF model, $\theta_{N+1}$, $\sigma_{\epsilon}^{2}$ and $C_{N+1}$, Priestley (1980) recommends fitting a standard AR model to the data, which provides good starting points for the SDM. In this paper, we use bootstrap SSA (Golyandina et al., 2001; Rahmani, 2017) and set $\hat{\theta}_{N+1,0\mid -1} = (\hat{\varphi}_{N+1,0,1}, \ldots, \hat{\varphi}_{N+1,0,L-1}, 0, \ldots, 0)$, $\sigma_{\epsilon}^{2}$ and $C_{N+1,0\mid -1} = \begin{pmatrix} \hat{\Omega} & 0 \\ 0 & 0 \end{pmatrix}$, where $\hat{\Omega}$ is the estimated variance-covariance matrix of $(\hat{\varphi}_{N+1,0,1}, \ldots, \hat{\varphi}_{N+1,0,L-1})$ obtained via bootstrapping. Note that as soon as the Kalman filtering recursion starts, the effect of the chosen starting values diminishes quickly. As the Kalman filter is an optimal state estimator for a dynamical system perturbed by random noise, the estimates obtained within the sample period for the LRF are also optimal.
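A single step of the Kalman recursion described above can be sketched as follows. This is a minimal illustrative implementation for a scalar observation with a generic linear state transition; the function name, `sigma_w2` (state-noise variance), and the abstraction of the LRF-specific structure of F* and H* into plain arrays are our own simplifications, not the paper's code.

```python
import numpy as np

def kalman_update(theta, C, F, H, y, sigma_w2, sigma_e2):
    """One predict-correct step for state theta with scalar observation y.
    theta: state estimate (n,); C: its covariance (n, n);
    F: state transition (n, n); H: observation row vector (n,)."""
    # prediction step
    theta_pred = F @ theta
    Phi = F @ C @ F.T + sigma_w2 * np.eye(len(theta))
    # innovation and its variance (scalar observation)
    e = y - H @ theta_pred
    s2 = H @ Phi @ H + sigma_e2
    # Kalman gain and correction
    K = Phi @ H / s2
    theta_new = theta_pred + K * e
    C_new = Phi - np.outer(K, H) @ Phi   # (I - K H) Phi
    return theta_new, C_new
```

Run on a simulated AR(1) series with H set to the lagged observation, the filter recovers the autoregressive coefficient, mirroring how the LRF coefficients are tracked through time in the SD-LRF.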