Detecting Co‐Movements in Non‐Causal Time Series

This paper introduces the notion of common non-causal features and proposes tools to detect them in multivariate time series models. We argue that the existence of co-movements might not be detected using the conventional stationary vector autoregressive (VAR) model, as the common dynamics are present in the non-causal (i.e. forward-looking) component of the series. We show that the presence of a reduced rank structure allows one to identify purely causal and purely non-causal VAR processes of order p > 1 even in the Gaussian likelihood framework. Hence, usual test statistics and canonical correlation analysis can be applied, where either lags or leads are used as instruments to determine whether the common features are present in the backward- or in the forward-looking dynamics of the series. The proposed definitions of co-movements are also valid for the mixed causal-non-causal VAR, with the exception that a non-Gaussian maximum likelihood estimator is then necessary. This means, however, that one loses the benefits of the simple tools proposed. An empirical analysis of Brent and West Texas Intermediate oil prices illustrates the findings. No short run co-movements are found in a conventional causal VAR, but they are detected when considering a purely non-causal VAR.


I. Introduction
A 'feature' is defined as a dominant characteristic observed in a univariate time series. One might be interested in testing whether several time series contain the same feature. We say that a feature is common if it is present in each of the series individually, while a linear combination of the series no longer has the feature. There are ample examples of common features in the literature, e.g. cointegration, common ARCH, co-breaking and common nonlinearity. In this paper, we focus on the presence of common cyclical features as introduced by Engle and Kozicki (1993) and Vahid and Engle (1993). That is, we are interested in extracting commonalities in the dynamics of multivariate time series models. This is commonly done by imposing short run restrictions, which are known to have direct implications for forecasting accuracy and parameter efficiency, as fewer coefficients have to be estimated. Common feature restrictions are also directly implied by economic theories such as the present value model or the permanent income hypothesis (Guillén et al., 2015; Issler and Vahid, 2001). Impulse response functions are collinear when common transmission mechanisms are present, and economies with strong links generally show statistical similarities in the phases of their business cycle fluctuations.
This paper contributes to this literature by studying the existence of common cyclical features in economic and financial variables for both causal and non-causal vector autoregressive (VAR) models. In our terminology, the conventional vector autoregressive model popularized by Sims (1980) is called causal, as the variables of interest only depend on their own past values. The non-causal VAR has been investigated in the literature to a much lesser extent and is defined as the analogous model in reverse time (see e.g. Davis and Song, 2012; Lanne and Saikkonen, 2013; Gouriéroux and Jasiak, 2017). In this framework, variables are modelled to depend on their future values. From an empirical point of view, non-causal VAR models are worth investigating as they are simple linear models that are able to generate dynamics different from those of their causal counterparts. For instance, Gouriéroux and Zakoïan (2016) show that non-causal models are able to mimic speculative bubbles, i.e. processes that experience (locally) a rapid increase or decrease followed by a sudden crash or recovery. A more general model that depends on both past and future values is called a mixed causal-non-causal model¹ and combines the specific characteristics of both the backward- and the forward-looking VAR. It is able to generate even richer dynamics and therefore captures characteristics of macroeconomic and financial variables that could previously only be generated using highly nonlinear and complex models (see Hecq, Lieb and Telg, 2016; Gouriéroux and Jasiak, 2017).
¹ More precisely, a non-causal model is defined as a model that has a unique stationary solution in terms of current and future error terms. For the mixed causal-non-causal model, this is the two-sided moving average representation, i.e. in terms of past, current and future disturbances.

Serial correlation common features (SCCF hereafter, see Engle and Kozicki, 1993) is a well-known approach to test for co-movements in an n-dimensional stationary time series process $Y_t$ generated by a causal VAR model of order p:

$$\Pi(L)Y_t = \varepsilon_t, \qquad \Pi(L) = I_n - \Pi_1 L - \dots - \Pi_p L^p,$$

where L is the lag operator, and $\varepsilon_t$ are i.i.d. innovations with mean vector $E(\varepsilon_t) = 0$ and positive definite covariance matrix $E(\varepsilon_t\varepsilon_t') = \Sigma$. The presence of SCCF implies that each VAR(p) coefficient matrix has reduced rank and can be written as the product

$$\Pi_j = \delta_\perp A_j', \qquad j = 1, \dots, p,$$

where the full rank $n \times (n-k)$ matrix $\delta_\perp$ (0 < k < n) is the orthogonal complement of the full rank $n \times k$ matrix $\delta$, such that $\delta'\delta_\perp = 0$, and the $A_j$ are $n \times (n-k)$ matrices. It is clear that there exist k linear combinations $\delta'Y_t = \delta'\varepsilon_t$ that annihilate the entire dynamics. A reduced rank structure in the matrix $[\Pi_1, \dots, \Pi_p]$ is typically employed to detect the commonalities among the series $Y_t$. This paper extends this methodology to the case of mixed causal-non-causal VAR models.
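A minimal simulation sketch of the SCCF restriction, assuming a hypothetical bivariate causal VAR(2) (n = 2, k = 1) with $\delta = [1, -1]'$, $\delta_\perp = [1, 1]'$ and illustrative loadings $A_1, A_2$ chosen by us rather than taken from the paper. Because every $\Pi_j = \delta_\perp A_j'$, the combination $\delta'Y_t$ reproduces $\delta'\varepsilon_t$ exactly, up to floating-point rounding.

```python
import random

# Hypothetical bivariate example (n = 2, k = 1), not from the paper.
DELTA = (1.0, -1.0)             # cofeature vector delta
DELTA_PERP = (1.0, 1.0)         # orthogonal complement: delta' delta_perp = 0
A = [(0.3, 0.2), (0.2, -0.1)]   # illustrative A_1, A_2, so Pi_j = delta_perp A_j'


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def simulate_causal_sccf(T, burn=200, seed=0):
    """Simulate Y_t = sum_j delta_perp A_j' Y_{t-j} + eps_t with Gaussian eps_t."""
    rng = random.Random(seed)
    p = len(A)
    ys = [(0.0, 0.0)] * p       # pre-sample values; ys[-1] is Y_{t-1}
    eps = [(0.0, 0.0)] * p
    for _ in range(T + burn):
        e = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        # Scalar common component A_1' Y_{t-1} + A_2' Y_{t-2}
        common = sum(dot(A[j], ys[-1 - j]) for j in range(p))
        ys.append(tuple(d * common + ei for d, ei in zip(DELTA_PERP, e)))
        eps.append(e)
    return ys[-T:], eps[-T:]    # drop pre-sample values and burn-in


Y, E = simulate_causal_sccf(500)
# delta' Y_t - delta' eps_t should vanish: the combination has no dynamics.
gap = max(abs(dot(DELTA, y) - dot(DELTA, e)) for y, e in zip(Y, E))
print(f"max |delta'Y_t - delta'eps_t| = {gap:.2e}")
```

The loadings imply an AR(2) common component with coefficients 0.5 and 0.1, so the simulated process is stationary.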
Various papers successfully identify the presence of co-movements in the cyclical fluctuations of economic time series (see e.g. Centoni and Cubadda, 2015 and references therein). Macroeconomic variables like the growth rates of consumption, investment and gross domestic product tend to co-move (Hecq, Palm and Urbain, 2006); economic indicators (industrial production index, unemployment or inflation) for several countries, or for regions within a country, display some co-movements when a certain degree of convergence is reached. In contrast, there are also cases with well-motivated and graphically visible commonalities that are not detected by formal test statistics. We consider such a case in our empirical application, which examines the monthly growth rates of two oil price series whose graphs seem to move together almost identically. However, usual test statistics, which we review later, strongly reject the presence of co-movements. Taken at face value, this would imply that an exogenous shock affects the two oil price indicators differently, which is highly unlikely from an economic point of view. One explanation for the failure to detect co-movements is that cycles do not have to be synchronous; several less restrictive statistical models have been introduced to account for adjustment delays.² Additional reasons have been suggested in the literature, among which the impact of misspecification: the presence of conditional heteroscedasticity (Candelon, Hecq and Verschoor, 2005), the impact of seasonal adjustment (Hecq, 1998; Cubadda, 1999), outlying observations and nonlinear phenomena.
In this paper, we investigate whether co-movements in time series can be identified when they are modelled by means of a mixed causal-non-causal VAR rather than the conventional purely causal VAR. We show that for purely causal or purely non-causal VAR models with more than one lag or lead, the presence of a reduced rank structure in the VAR coefficient matrices allows one to distinguish causal from non-causal systems, even in the Gaussian framework. Using either lags or leads within a canonical correlation analysis or a Generalized Method of Moments (GMM) approach, the reduced rank restrictions help to identify the correct model. This result stems from the fact that, except for VAR models with only one lag or one lead, the existence of SCCF implies that the autocovariance matrices of the series $Y_t$ have a common left null space either at all nonzero lags or at all nonzero leads. However, this important result does not extend to mixed causal-non-causal models. In these cases, estimation methods that do not solely depend on second-order properties have to be employed to detect the presence of common features.
The rest of the paper is organized as follows. Section II summarizes results about the mixed causal-non-causal autoregressive model with common features. Section III introduces our testing strategy for identifying purely causal and non-causal models with SCCF. The finite sample behaviour of reduced rank regressions and GMM orthogonality tests when non-causality is present is studied in Section IV. Section V applies our new modelling procedure to investigate the presence of commonalities in Brent and WTI price series. Section VI summarizes and concludes. Additional simulation results are collected in an online Appendix.

² Those extensions include the polynomial serial correlation common feature specification (Cubadda and Hecq, 2001), the weak form common feature model (Hecq et al., 2006) and the co-dependent cycle approach (Vahid and Engle, 1997). See Cubadda (2007) for a unifying framework of the various forms of common features.

II. Causal and non-causal models
Mixed causal-non-causal autoregressive models have recently become increasingly popular because of their appealing properties. At a theoretical level, they offer the possibility of rewriting a process with explosive roots in direct time as a process in reverse time with roots outside the unit circle. In the applied econometric literature, there is a growing interest in mixed causal-non-causal models as they (i) might improve forecast performance, (ii) can be solutions to rational expectations models which take forward-looking behaviour into account and (iii) are able to model in a simple way some nonlinear features observed in macroeconomic and financial data (see e.g. Lanne and Saikkonen, 2011; Lanne, Luoto and Saikkonen, 2012; Hencic and Gouriéroux, 2014; Gouriéroux and Zakoïan, 2016; Hecq et al., 2016). In this paper, we focus on the multivariate aspects.

Multivariate models
Consider the mixed vector autoregressive model of order (r, s), denoted as MVAR(r, s). For an n-vector time series $Y_t$, the MVAR(r, s) is given by

$$\Pi(L)\Phi(L^{-1})Y_t = \varepsilon_t, \qquad (1)$$

where $\varepsilon_t$ is a zero mean i.i.d. innovation process with positive definite covariance matrix $E(\varepsilon_t\varepsilon_t') = \Sigma$, and $\Pi(L) = I_n - \Pi_1 L - \dots - \Pi_r L^r$ and $\Phi(L^{-1}) = I_n - \Phi_1 L^{-1} - \dots - \Phi_s L^{-s}$ are $n \times n$ polynomial matrices of order r and s, respectively. The roots of both $\det\{\Pi(z)\}$ and $\det\{\Phi(z)\}$ are located strictly outside the unit circle. It follows that the series $Y_t$ have a strictly stationary solution with the two-sided MA representation

$$Y_t = \Psi(L, L^{-1})\varepsilon_t = \sum_{j=-\infty}^{\infty}\Psi_j\varepsilon_{t-j}, \qquad (2)$$

where $\Psi(L, L^{-1}) \equiv [\Pi(L)\Phi(L^{-1})]^{-1}$, with $\Psi_j$ converging to zero at a geometric rate as $j \to \pm\infty$. This model appears to be a natural extension of the univariate case. However, differently from the univariate case, representation (1) is not unique, since matrix multiplication is generally not commutative. The two polynomial matrices in (1) are ordered such that the causal component comes first and the non-causal component second, but one may find an alternative model with this order swapped, which might represent the process $Y_t$ equally well:

$$\Phi^*(L^{-1})\Pi^*(L)Y_t = \varepsilon_t, \qquad (3)$$

where again both $\det\{\Pi^*(z)\}$ and $\det\{\Phi^*(z)\}$ have roots outside the unit circle. Lanne and Saikkonen (2013) provide reasons, based on economic intuition, to consider the polynomial matrices in the order $\Pi(L)\Phi(L^{-1})$. However, for the sake of completeness, we do not rule out the alternative representation (3). Note that Davis and Song (2012) propose a different way to characterize the MVAR, which does not involve the multiplicative structure of lag and lead polynomial coefficient matrices from the outset. Even though Gouriéroux and Jasiak (2017) prove that this alternative representation is more general than the multiplicative one, we adhere to the representations in (1) and (3) as they offer a more explicit and natural way to disentangle the 'causal' and 'non-causal' parts of the process.
As shown by Lanne and Saikkonen (2013), the series $Y_t$ in (1) also admit the following pseudo-causal VAR(p) representation

$$Y_t = \Theta_1 Y_{t-1} + \dots + \Theta_p Y_{t-p} + \eta_t, \qquad (4)$$

where the errors $\eta_t$ are a multivariate white noise but, in general, not i.i.d. innovations with respect to the past. Representations (1), (3) and (4) have the same spectral density matrix and hence cannot be distinguished by means of the Gaussian likelihood. As in the univariate case, it is therefore common to first estimate VAR(p) models of the form (4) in order to determine the total autoregressive order p. The p chosen in this pseudo-causal model is then fixed, and all MVAR(r, s) models with p = r + s are estimated in a second step. The model yielding the highest log-likelihood at the estimated parameters constitutes the final model. Consequently, the choice of p is crucial, as it determines the set of candidate models considered in the second step. See Fries and Zakoïan (2017) for an alternative approach to model selection using Portmanteau-type tests and extremes clustering.
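The two-step selection described above can be sketched as follows. The likelihood is a hypothetical placeholder (`loglik`): in practice it would be a non-Gaussian log-likelihood maximised for each candidate (r, s), which is not implemented here, and the values in the usage lines are made up.

```python
from typing import Callable, List, Tuple


def candidate_mvar_orders(p: int) -> List[Tuple[int, int]]:
    """All MVAR(r, s) orders with r + s = p, from purely non-causal to purely causal."""
    return [(r, p - r) for r in range(p + 1)]


def select_mvar(p: int, loglik: Callable[[int, int], float]) -> Tuple[int, int]:
    """Second step: keep p fixed and pick the (r, s) with the highest log-likelihood.

    `loglik` is a hypothetical user-supplied function returning the maximised
    (non-Gaussian) log-likelihood of MVAR(r, s); it is not defined here.
    """
    return max(candidate_mvar_orders(p), key=lambda rs: loglik(*rs))


# Usage with made-up log-likelihood values, for illustration only.
toy = {(0, 2): -510.2, (1, 1): -515.7, (2, 0): -520.3}
print(candidate_mvar_orders(2))                   # [(0, 2), (1, 1), (2, 0)]
print(select_mvar(2, lambda r, s: toy[(r, s)]))   # (0, 2)
```

Since p is fixed in the first step, the second step only compares p + 1 candidates, which is why a wrong p cannot be repaired later.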

Non-causal common cyclical features: definition
To extend the SCCF property to the MVAR(r, s) case, we search for the existence of a full rank matrix $\delta$ such that $E(\delta'Y_t \mid Y_{t+s}, \dots, Y_{t+1}, Y_{t-1}, \dots, Y_{t-r}) = 0$. This gives rise to the following definition.
Definition 1. The series $Y_t$ have k common features, CFs henceforth, if there exists an $n \times k$ (0 < k < n) full rank matrix $\delta$ such that $\delta'\Pi(L)\Phi(L^{-1}) = \delta'$ or, equivalently, $\delta'Y_t = \delta'\varepsilon_t$. We also have that $\delta'\Phi^*(L^{-1})\Pi^*(L) = \delta'$, although the product of matrices is not commutative.
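As a minimal worked instance of Definition 1, assume (purely for illustration) an MVAR(1, 1) with $\Pi(L) = I_n - \delta_\perp a'L$ and $\Phi(L^{-1}) = I_n - \delta_\perp b'L^{-1}$, where $a$ and $b$ are hypothetical $n \times (n-k)$ matrices and $\delta'\delta_\perp = 0$. Premultiplying by $\delta'$ removes both polynomials, so that $\delta'Y_t = \delta'\varepsilon_t$:

```latex
\begin{aligned}
\delta'\Pi(L)\Phi(L^{-1})
  &= \bigl(\delta' - \delta'\delta_\perp a' L\bigr)\Phi(L^{-1})
   = \delta'\Phi(L^{-1}) \\
  &= \delta' - \delta'\delta_\perp b' L^{-1}
   = \delta'.
\end{aligned}
```

The same two steps apply to the reversed ordering $\Phi^*(L^{-1})\Pi^*(L)$ whenever both factors share the left factor $\delta_\perp$.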
The definition of common features in the MVAR is thus a straightforward extension of the conventional VAR case. Our goal is to determine whether such combinations exist and to estimate their number, namely the column rank of $\delta$. We would like to remind the reader of the distinction made by Vahid and Engle (1993) between the notion of common cyclical features, which refers to a rank reduction in the dynamics of the VAR or the vector error-correction model (VECM), and the expression common cycles. The latter refers to the implication of common cyclical features in the Beveridge-Nelson decomposition of the level of the series into common trends and common cycles (which is not considered here).
Remark 2. By premultiplying both sides of the MA representation (2) by $\delta'$ and using $\delta'Y_t = \delta'\varepsilon_t$ from the definition of CFs, we have that

$$\delta'\Psi_0 = \delta', \qquad \delta'\Psi_j = 0 \quad \text{for all } j \neq 0, \qquad (5)$$

which indicates that the parameters in the moving average representation of the series $Y_t$ are collinear at any lag/lead different from zero. It is straightforward to check that the reverse implication applies as well.
As discussed in the previous section, the chosen MVAR specification does not need to be unique due to the multiplicative structure of the autoregressive polynomial matrices. The fact that matrices generally do not commute implies the existence of a different model with the roles of the backward- and forward-looking components reversed. For completeness, Definition 1 takes this alternative model into account. We provide below a simple example of how the same full rank matrix can eliminate the common features in both specifications.
Example 3. Consider the following specification for the causal and non-causal polynomials: … such that both $\det\{\Pi(z)\}$ and $\det\{\Phi(z)\}$ have roots outside the unit circle, and which is the same as $\Phi^*(L^{-1})\Pi^*(L)$ if we define $\Pi^*(L) = \Pi(L)$ and $\Phi^*(L^{-1}) = \Phi(L^{-1})$. In both models, $\delta$ annihilates the dynamics. Note that, even in cases when the product $\Pi(L)\Phi(L^{-1})$ is not commutative, it is generally still possible to find $\Phi^*(L^{-1})$ and $\Pi^*(L)$ such that their product $\Phi^*(L^{-1})\Pi^*(L)$ is the same.

Co-movements in purely causal or non-causal models
We first focus on the two polar cases in which data are generated by either a purely causal MVAR(r, 0) or a purely non-causal MVAR(0, s) model. These are interesting cases as we have to look at linear combinations of series that are orthogonal to respectively the past or the future of the variables. We show that usual common cyclical feature test statistics (e.g. canonical correlations) can be applied to non-causal models as well. We prove that the presence of commonalities in non-causal dynamics is not preserved when reversing time (i.e. into a causal framework), except for the case of the MVAR(0,1) model. Consequently, detecting a reduced rank structure in one direction directly identifies the dynamics of the whole underlying system even under Gaussianity. Finally, we show that the mixed model necessitates a non-Gaussian likelihood framework even when a reduced rank structure is present. Accordingly, our findings for pure models cannot be extended to mixed models.
In a sense, this makes the applicability of the results somewhat restrictive. However, to the best of our knowledge, the existing literature has solely focused on conventional (causal) VAR models, which makes a straightforward way of distinguishing causal from non-causal co-movements in purely causal or non-causal VAR models important for applied research.
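Definition 4. In purely causal [non-causal] models, i.e. MVAR(r, 0) [MVAR(0, s)] models, a full rank matrix $\delta_c$ [$\delta_{nc}$] satisfying Definition 1 defines causal common features, CCF [non-causal common features, NCCF].

Remark 5. The usual SCCF in a VAR(r) is obviously equivalent to the CCF in an MVAR(r, 0).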
We start the analysis by concentrating on the case in which the data are generated by a purely non-causal MVAR(0, s) model, i.e. s > 0. Assume that the matrix $\delta_{nc}$ exists (we drop the indexes nc and c to simplify notation when no confusion is possible). The non-causal VAR model then reads

$$Y_t = \delta_\perp\sum_{j=1}^{s}A_j'Y_{t+j} + \varepsilon_t, \qquad (6)$$

where $\delta_\perp$ is an $n \times (n-k)$ full rank matrix such that $\delta'\delta_\perp = 0$, and at least one of the $n \times (n-k)$ matrices $A_j$, j = 1, …, s, has full rank. We are now in a position to state and prove the main result of this section.
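A minimal sketch of model (6), assuming a hypothetical bivariate MVAR(0, 2) with $\delta = [1, -1]'$, $\delta_\perp = [1, 1]'$ and illustrative $A_1, A_2$ chosen by us. Since a purely non-causal VAR is a causal VAR in reverse time, the recursion starts at the end of the sample and runs backwards. The sample moments then show that $\delta'Y_t$ is uncorrelated with the leads of $Y_t$ but not with its lags; under these assumed values the population value of $E(\delta'Y_t Y_{t-1}')$ is $0.6\,\delta_\perp'$.

```python
import random

# Hypothetical MVAR(0, 2) parameters (n = 2, k = 1), not from the paper.
DELTA = (1.0, -1.0)
DELTA_PERP = (1.0, 1.0)
A = [(0.5, -0.1), (-0.1, 0.3)]   # A_1, A_2 in Y_t = delta_perp sum_j A_j' Y_{t+j} + eps_t


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def simulate_noncausal_sccf(T, burn=200, seed=1):
    """Simulate the purely non-causal VAR backwards in time."""
    rng = random.Random(seed)
    s = len(A)
    ys = [(0.0, 0.0)] * s        # terminal values; ys[-1] plays the role of Y_{t+1}
    eps = [(0.0, 0.0)] * s
    for _ in range(T + burn):
        e = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        common = sum(dot(A[j], ys[-1 - j]) for j in range(s))
        ys.append(tuple(d * common + ei for d, ei in zip(DELTA_PERP, e)))
        eps.append(e)
    # Reverse so that index 0 is the earliest date; drop burn-in and terminal values.
    return ys[::-1][:T], eps[::-1][:T]


def cross_moment(z, Y, j):
    """Sample mean of z_t * Y_{t+j}' (j > 0: leads, j < 0: lags)."""
    idx = [t for t in range(len(Y)) if 0 <= t + j < len(Y)]
    return tuple(sum(z[t] * Y[t + j][i] for t in idx) / len(idx) for i in range(2))


Y, E = simulate_noncausal_sccf(20000)
z = [dot(DELTA, y) for y in Y]     # delta' Y_t, identical to delta' eps_t
lead1 = cross_moment(z, Y, 1)      # estimates delta' Gamma_y(-1): close to (0, 0)
lag1 = cross_moment(z, Y, -1)      # estimates delta' Gamma_y(1): close to (0.6, 0.6)
print(lead1, lag1)
```

The lag moment is nonzero because $Y_{t-1}$ loads on $\varepsilon_t$ through $A_1'Y_t$, which is exactly why lags are invalid instruments for a non-causal cofeature vector.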
Proposition 6. If the series $Y_t$ are generated by a stationary MVAR(0, s) [MVAR(r, 0)] with s > 1 [r > 1], the existence of k common feature relationships does not imply the presence of any CF in the pseudo-causal [pseudo-non-causal] representation of the series $Y_t$.
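Proof. We start by defining the autocovariances $\Gamma_y(-j) = E(Y_tY_{t+j}')$ and noting that, since $\delta'Y_t = \delta'\varepsilon_t$ and, in a purely non-causal model, $\varepsilon_t$ is independent of $Y_{t+j}$ for $j > 0$,

$$\delta'\Gamma_y(-j) = E(\delta'\varepsilon_t Y_{t+j}') = 0, \qquad j = 1, 2, \dots$$

Hence, we can factorize these autocovariances as the product $\Gamma_y(-j) = \delta_\perp C_j'$, where the $C_j$ are $n \times (n-k)$ matrices. We now rewrite the non-causal VAR in (6) under reduced rank restrictions as

$$Y_t = \delta_\perp A'X_t + \varepsilon_t, \qquad (7)$$

where $X_t = [Y_{t+1}', \dots, Y_{t+s}']'$ and $A = [A_1', \dots, A_s']'$. The coefficient matrix is linked to the autocovariance function $\Gamma_y(-j)$ through the relation

$$\delta_\perp A' = \Gamma_{yx}\Gamma_{xx}^{-1}, \qquad (8)$$

where $\Gamma_{yx} = E(Y_tX_t') = [\Gamma_y(-1), \dots, \Gamma_y(-s)] = \delta_\perp[C_1', \dots, C_s']$ and $\Gamma_{xx} = E(X_tX_t')$, and hence $\delta'\Gamma_{yx} = 0$. We now write the pseudo-causal representation of the series $Y_t$ as

$$Y_t = \Theta Z_t + \eta_t, \qquad (9)$$

where $Z_t = [Y_{t-1}', \dots, Y_{t-s}']'$, $\Theta = [\Theta_1, \dots, \Theta_s]$ and, since $\Gamma_y(j) = \Gamma_y(-j)'$,

$$\Theta = \Gamma_{yz}\Gamma_{zz}^{-1}, \qquad \Gamma_{yz} = [\Gamma_y(1), \dots, \Gamma_y(s)] = [C_1\delta_\perp', \dots, C_s\delta_\perp'], \qquad (10)$$

which implies that the matrices $C_j\delta_\perp'$ in $\Gamma_{yz}$ do not, in general, have a common left null space and thus, as $\Gamma_{zz}$ is nonsingular, that there is generally no full rank matrix $\gamma$ with $\gamma'\Theta = 0$. In view of (10), we conclude that the pseudo-causal VAR in (9) generally does not have any (pseudo) CFs. It is straightforward to see that the reverse is also true, i.e. the non-causal representation of a causal VAR with CFs generally does not exhibit (pseudo) CFs. However, a special case occurs when p = s = 1. Indeed, equations (8) and (10) then reduce to

$$\delta_\perp A_1' = \Gamma_y(-1)\Gamma_y(0)^{-1}, \qquad \Theta = \Gamma_y(1)\Gamma_y(0)^{-1} = \Gamma_y(0)A_1\delta_\perp'\Gamma_y(0)^{-1},$$

from which we see that the rank reduction $\delta'\Gamma_y(-1) = 0$ carries over to $\Gamma_y(1)$: with $\gamma = \Gamma_y(0)^{-1}A_{1\perp}$, where $A_{1\perp}$ is the $n \times k$ orthogonal complement of $A_1$, we have $\gamma'\Gamma_y(1) = 0$ and hence $\gamma'\Theta = 0$. Hence, the existence of k CFs in an MVAR(0, 1) implies the presence of k CFs in the pseudo-causal representation of the series $Y_t$, although the CF vectors will be different in the two representations. It is easy to check that the reverse is true as well.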
Remark 7. With regard to statistical inference, a relevant implication of Proposition 6 is that, except for VAR models with either one lead or one lag, the Gaussian ML approach is able to identify the unique representation of the series $Y_t$ that exhibits the common features. It is well known that the Gaussian likelihood analysis of the reduced rank regression model (7) is based on the canonical correlations between $Y_t$ and $X_t$, denoted CanCor($Y_t$, $X_t$); see e.g. Johansen (2008) and the references therein.
Remark 8. In the case of the MVAR(1, 0) and MVAR(0, 1) models, it is impossible to discriminate between causal and non-causal CFs using the Gaussian ML analysis. Indeed, by the assumption of stationarity, the autocovariance function $\Gamma_y(j) = E(Y_tY_{t-j}')$ satisfies $\Gamma_y(1) = \Gamma_y(-1)'$, so that the squared canonical correlations in CanCor($Y_t$, $Y_{t+1}$) and CanCor($Y_t$, $Y_{t-1}$), i.e. the eigenvalues of

$$\Gamma_y(0)^{-1}\Gamma_y(-1)\Gamma_y(0)^{-1}\Gamma_y(1) \quad \text{and} \quad \Gamma_y(0)^{-1}\Gamma_y(1)\Gamma_y(0)^{-1}\Gamma_y(-1),$$

coincide. Consequently, even if there exists a non-causal common feature vector, we will also detect a causal common feature relationship using canonical correlations, and we cannot identify whether the system is an MVAR(0, 1) or an MVAR(1, 0). However, although the eigenvalues from the solutions of CanCor($Y_t$, $Y_{t+1}$) and CanCor($Y_t$, $Y_{t-1}$) are identical, the eigenvectors, and hence the relationships that link the series, will be different. This might explain why 'strange' estimates of common cyclical feature relationships are sometimes obtained in empirical work.
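The following sketch illustrates Remark 8 numerically, assuming a hypothetical bivariate MVAR(0, 1) with $\delta = [1, -1]'$ and an illustrative loading vector $A_1 = [0.5, 0.1]'$ chosen by us. Using leads or lags as the second set of variables yields the same smallest squared canonical correlation, while the associated vectors differ: the lead-based one estimates $\delta$, the lag-based one a different (pseudo-causal) vector. The 2 × 2 linear algebra is written out by hand to keep the example dependency-free.

```python
import math
import random


def cov(a, b):
    """2 x 2 sample covariance matrix between two equally long lists of 2-vectors."""
    n = len(a)
    ma = [sum(x[i] for x in a) / n for i in range(2)]
    mb = [sum(y[i] for y in b) / n for i in range(2)]
    return [[sum((x[i] - ma[i]) * (y[j] - mb[j]) for x, y in zip(a, b)) / n
             for j in range(2)] for i in range(2)]


def mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inv(P):
    d = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / d, -P[0][1] / d], [-P[1][0] / d, P[0][0] / d]]


def smallest_cancor(y, w):
    """Smallest squared canonical correlation of CanCor(y, w) and its y-side vector [1, -beta]."""
    M = mul(mul(inv(cov(y, y)), cov(y, w)), mul(inv(cov(w, w)), cov(w, y)))
    tr, det = M[0][0] + M[1][1], M[0][0] * M[1][1] - M[0][1] * M[1][0]
    lam = tr / 2 - math.sqrt(max(tr * tr / 4 - det, 0.0))
    # Solve (M - lam I) v = 0 from the first row and normalise v[0] to one.
    return lam, (1.0, (lam - M[0][0]) / M[0][1])


def simulate_mvar01(T, a=(0.5, 0.1), burn=200, seed=2):
    """Hypothetical MVAR(0, 1): Y_t = [1, 1]' a' Y_{t+1} + eps_t, simulated backwards."""
    rng = random.Random(seed)
    y_next, out = (0.0, 0.0), []
    for _ in range(T + burn):
        c = a[0] * y_next[0] + a[1] * y_next[1]
        y_next = (c + rng.gauss(0.0, 1.0), c + rng.gauss(0.0, 1.0))
        out.append(y_next)
    return out[::-1][:T]


Y = simulate_mvar01(20000)
lam_lead, v_lead = smallest_cancor(Y[:-1], Y[1:])   # CanCor(Y_t, Y_{t+1})
lam_lag, v_lag = smallest_cancor(Y[1:], Y[:-1])     # CanCor(Y_t, Y_{t-1})
print(lam_lead, lam_lag)   # equal up to rounding
print(v_lead, v_lag)       # v_lead is close to delta = [1, -1]'; v_lag differs
```

Both calls use the same pairs of observations with the roles of the two variable sets swapped, and canonical correlations are symmetric in the two sets, which is the sample counterpart of the argument in Remark 8.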

Co-movements in mixed models
Now consider the general MVAR(r, s) case where both r and s are larger than zero. Also in this case, the series $Y_t$ admit a pseudo-causal VAR(p) representation (4), where p = r + s (Lanne and Saikkonen, 2013). Our main result on the general case is given by the following proposition.
Proposition 10. If the series $Y_t$ are generated by a stationary MVAR(r, s) with r > 0 and s > 0, the existence of k CFs does not imply the presence of any CFs in the pseudo-causal [pseudo-non-causal] representation of the series $Y_t$.
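Proof. In view of equation (2), we see that when both r and s are positive, the autocovariance function of the series $Y_t$ reads

$$\Gamma_y(j) = E(Y_tY_{t-j}') = \sum_{i=-\infty}^{\infty}\Psi_i\Sigma\Psi_{i-j}',$$

which, in combination with equation (5), implies that $\delta'\Gamma_y(j) = \delta'\Sigma\Psi_{-j}'$, a quantity that is generally nonzero for both positive and negative j. Hence, the coefficient matrix of the corresponding pseudo-causal VAR, i.e.

$$\Theta = [\Theta_1, \dots, \Theta_p] = \Gamma_{yz}\Gamma_{zz}^{-1}, \qquad \Gamma_{yz} = [\Gamma_y(1), \dots, \Gamma_y(p)],$$

with $Z_t = [Y_{t-1}', \dots, Y_{t-p}']'$, will generally be such that $\delta'\Theta \neq 0$. It is straightforward to see that the same conclusion applies to the pseudo-non-causal representation as well.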
Remark 11. Notice that a simple solution to detecting CFs in mixed models, such as CanCor($Y_t$, $U_t$) with $U_t = [X_t', Z_t']'$, cannot be adopted. The reason is that in the associated reduced rank regression model, in which C is a full rank $np \times (n-k)$ matrix, the regressors are correlated with the errors: since $Y_t$ has a two-sided MA representation, both its leads and its lags depend on $\varepsilon_t$.
To sum up, we have shown that the presence of a reduced rank structure allows one to identify purely causal and non-causal systems even under Gaussianity (for lag/lead orders higher than one), since the true data generating process has either a backward- or a forward-looking MA representation. This implies that causal processes are independent of future errors, while non-causal processes are independent of past errors. As the common features are present either in the causal or in the non-causal part, one can always find suitable instruments (i.e. either the lags or the leads) to draw inference based on the first two moments. In a mixed causal-non-causal model, the processes $Y_t$ possess a two-sided moving average representation, and thus such instruments cannot be found. Whereas the detection of commonalities in mixed causal-non-causal models is clearly an interesting topic, it is outside the scope of this paper, because there is a price to pay if the researcher wants to go ahead with the mixed model. In that case, one would need to identify a pseudo-causal VAR(p) and then find the MVAR(r, s) that maximizes the log-likelihood of a particular non-Gaussian distribution. Imposing the reduced rank restriction to perform either a likelihood ratio (LR) test or a Wald test requires the derivation of their asymptotic distributions and an evaluation of their finite sample properties.

III. Common feature test statistics
In order to test for common cyclical features in either purely causal or purely non-causal models, we first determine the lag order p in the pseudo-causal VAR model by using information criteria. Even in the presence of a non-causal component, we obtain a consistent estimate of the pseudo-causal order. In Section IV, we investigate the performance of LR and GMM tests for the presence of co-movements in both purely causal and purely non-causal models. We propose to perform all tests twice: once with lags and once with leads as instruments. Since for p > 1 a reduced rank can only be present in either the causal or the non-causal dynamics, the proposed tests for detecting common features also offer a useful way to identify whether the model is causal or non-causal.

LR tests
The Gaussian LR test is based on the partial canonical correlations between $Y_t$ and $W_t$, where $W_t = [Y_{t-1}', \dots, Y_{t-p}']'$ for the purely causal VAR and $W_t = [Y_{t+1}', \dots, Y_{t+p}']'$ for the purely non-causal VAR, having concentrated out the effects of the deterministic terms $D_t$ (e.g. intercepts, deterministic trends, dummies). This procedure is denoted CanCor($Y_t$, $W_t \mid D_t$). The LR test, whose null hypothesis is that there exist at least k common feature vectors, is based on the statistic

$$\mathrm{LR} = -(T-p)\sum_{i=1}^{k}\ln(1-\hat\lambda_i),$$

where $\hat\lambda_i$ is the i-th smallest sample squared canonical correlation. More specifically, $\hat\lambda_i$ is the i-th smallest eigenvalue of

$$\hat\Sigma_{YY}^{-1}\hat\Sigma_{YW}\hat\Sigma_{WW}^{-1}\hat\Sigma_{WY},$$

where $\hat\Sigma_{AB}$ denotes the sample covariance matrix of two vector time series $A_t$ and $B_t$. Under the null hypothesis, LR follows asymptotically a $\chi^2(\upsilon)$ distribution with $\upsilon = knp - k(n-k)$ (see Vahid and Engle, 1993). The advantage of the LR test based on canonical correlations is the possibility of extracting more than one common feature vector, which is more complicated in the GMM framework reviewed in the next subsection. On the other hand, one of the main advantages of the GMM approach is that it is relatively easy to make it robust to heteroscedasticity.
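A small helper for the LR statistic and its degrees of freedom, given the sample squared canonical correlations however they were computed. The $(T - p)$ scaling, i.e. the number of usable observations, is our assumption; other small-sample scalings are in use. The eigenvalues in the usage line are made up.

```python
import math


def lr_cofeature(sq_cancors, T, p, k):
    """LR = -(T - p) * sum of ln(1 - lambda_i) over the k smallest squared canonical correlations."""
    smallest = sorted(sq_cancors)[:k]
    return -(T - p) * sum(math.log1p(-lam) for lam in smallest)


def lr_cofeature_df(n, p, k):
    """Degrees of freedom k*n*p - k*(n - k) of the limiting chi-square distribution."""
    return k * n * p - k * (n - k)


# Bivariate case (n = 2), p = 2, testing for k = 1 common feature vector.
print(lr_cofeature_df(2, 2, 1))                        # 3
stat = lr_cofeature([0.002, 0.31], T=500, p=2, k=1)   # made-up eigenvalues
# 7.815 is (approximately) the 5% critical value of a chi-square with 3 df.
print(stat, stat > 7.815)
```

Using `math.log1p` keeps the logarithm accurate when the smallest squared canonical correlations are close to zero, which is exactly the case under the null.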

GMM tests
We split the time series $Y_t$ into two subsets as $Y_t = [y_{1t}, y_{(2:n)t}']'$, in which $y_{1t}$ denotes the dependent variable, while $y_{(2:n)t} = [y_{2t}, y_{3t}, \dots, y_{nt}]'$ collects the remaining n − 1 series. A GMM test for a single common feature can now be conducted. First, we write $\delta = [1, -\beta']'$ and estimate the $(n-1) \times 1$ vector $\beta$ by the instrumental variable (IV) estimator using $W_t$ as instruments. The IV estimator of $\beta$ is given by

$$\hat\beta_{IV} = \left[y_{(2:n)}'W(W'W)^{-1}W'y_{(2:n)}\right]^{-1}y_{(2:n)}'W(W'W)^{-1}W'y_1,$$

where $[y_1, y_{(2:n)}] \equiv Y$ and $W$ denote the matrices of T − p observations of the variables $Y_t$ and $W_t$, respectively, after the linear influence of the deterministic terms $D_t$ has been removed. Second, we run an overidentification J-test (Hansen, 1982) for the validity of the orthogonality conditions, i.e. the existence of a common feature relationship, using the statistic

$$J_1(\hat\beta_{IV}) = (T-p)\,\frac{\hat u'W(W'W)^{-1}W'\hat u}{\hat u'\hat u}, \qquad \hat u = y_1 - y_{(2:n)}\hat\beta_{IV}.$$

The tests presented so far embed the assumption of homoscedasticity. In a Monte Carlo study, Candelon et al. (2005) illustrate that $J_1(\hat\beta_{IV})$ has large size distortions in the presence of GARCH disturbances. Therefore, we use the White heteroscedasticity robust estimator (see, e.g., Hamilton, 1994):

$$\hat\beta_{IV-W} = \left[y_{(2:n)}'W(W'BW)^{-1}W'y_{(2:n)}\right]^{-1}y_{(2:n)}'W(W'BW)^{-1}W'y_1,$$

where $B = \mathrm{diag}(u_1^2, u_2^2, \dots, u_T^2)$ and $u_t = y_{1t} - y_{(2:n)t}'\hat\beta_{IV}$. Then we run a J-test robust to heteroscedasticity using the statistic

$$J_2(\hat\beta_{IV-W}) = u^{*\prime}W(W'BW)^{-1}W'u^*,$$

where $u^* = y_1 - y_{(2:n)}\hat\beta_{IV-W}$. Under the null hypothesis, both $J_1(\hat\beta_{IV})$ and $J_2(\hat\beta_{IV-W})$ follow asymptotically a $\chi^2(\upsilon)$ distribution with $\upsilon = knp - k(n-k)$ degrees of freedom. For example, in the bivariate case with k = 1 this reduces to a $\chi^2(2p-1)$.
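A dependency-free sketch of the bivariate GMM cofeature tests (n = 2, so $\beta$ is a scalar), applied to a hypothetical non-causal VAR(2) whose true cofeature vector is $[1, -1]'$. The data generating process, the seed and the simple demeaning used to remove an intercept are our assumptions; with leads as instruments the J statistics should behave like $\chi^2(3)$ draws, whereas with lags they should land far in the upper tail, in line with the instrument-choice logic of Section II.

```python
import random

# Hypothetical non-causal VAR(2): Y_t = [1, 1]' (A_1' Y_{t+1} + A_2' Y_{t+2}) + eps_t,
# so that delta = [1, -1]' (beta = 1) is a non-causal cofeature vector.
A = [(0.5, -0.1), (-0.1, 0.3)]


def simulate(T, burn=200, seed=3):
    rng = random.Random(seed)
    ys = [(0.0, 0.0), (0.0, 0.0)]   # terminal values, simulated backwards
    for _ in range(T + burn):
        c = sum(a[0] * y[0] + a[1] * y[1] for a, y in zip(A, (ys[-1], ys[-2])))
        ys.append((c + rng.gauss(0.0, 1.0), c + rng.gauss(0.0, 1.0)))
    return ys[::-1][:T]


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def gmm_cofeature(y1, y2, W):
    """IV estimate of beta in y1_t = beta * y2_t + u_t with instruments W_t, plus J_1 and J_2."""
    N, m = len(y1), len(W[0])

    def wt(v):  # W'v
        return [sum(w[i] * x for w, x in zip(W, v)) for i in range(m)]

    def iv_beta(S):  # [y2'W S^-1 W'y2]^-1 y2'W S^-1 W'y1
        x = solve(S, g2)
        return sum(a * b for a, b in zip(x, g1)) / sum(a * b for a, b in zip(x, g2))

    def quad(g, S):  # g' S^-1 g
        return sum(a * b for a, b in zip(g, solve(S, g)))

    g1, g2 = wt(y1), wt(y2)
    S = [[sum(w[i] * w[j] for w in W) for j in range(m)] for i in range(m)]
    beta_iv = iv_beta(S)
    u = [a - beta_iv * b for a, b in zip(y1, y2)]
    j1 = N * quad(wt(u), S) / sum(e * e for e in u)
    # White weighting W'BW with B = diag(u_t^2) from the first-step IV residuals.
    SB = [[sum(e * e * w[i] * w[j] for e, w in zip(u, W)) for j in range(m)]
          for i in range(m)]
    beta_w = iv_beta(SB)
    u_star = [a - beta_w * b for a, b in zip(y1, y2)]
    j2 = quad(wt(u_star), SB)
    return beta_iv, j1, j2


Y = simulate(10000)
mu = [sum(y[i] for y in Y) / len(Y) for i in range(2)]
Yc = [(y[0] - mu[0], y[1] - mu[1]) for y in Y]   # concentrate out the intercept
p = 2
rows = range(p, len(Yc) - p)                      # dates with both p leads and p lags
y1 = [Yc[t][0] for t in rows]
y2 = [Yc[t][1] for t in rows]
leads = [[*Yc[t + 1], *Yc[t + 2]] for t in rows]
lags = [[*Yc[t - 1], *Yc[t - 2]] for t in rows]
res_lead = gmm_cofeature(y1, y2, leads)   # (beta_IV, J_1, J_2) with leads
res_lag = gmm_cofeature(y1, y2, lags)     # (beta_IV, J_1, J_2) with lags
print("leads:", res_lead)
print("lags: ", res_lag)
```

With n = 2 and p = 2 there are four instruments and one estimated parameter, hence the three degrees of freedom used in the Monte Carlo section.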

IV. Monte Carlo results
In this section, we investigate by simulation the empirical size and power of the LR, $J_1(\hat\beta_{IV})$ and $J_2(\hat\beta_{IV-W})$ test statistics in finite samples. We restrict our attention to either purely causal or purely non-causal VARs. More precisely, we consider a bivariate causal VAR and its non-causal counterpart. Under the null hypothesis, the causal [non-causal] VAR has a CCF [NCCF] relationship with cofeature vector $\delta = [1, -1]'$. We remark that, according to the theoretical results in Section II, non-rejection should be interpreted as evidence that the process is causal [non-causal] and that a CCF [NCCF] exists. Consequently, rejection by both the test for a CCF and the test for an NCCF indicates that there is neither the former nor the latter.
Tables 1 and 2 report the frequencies with which each common feature test rejects the null hypothesis at the 5% significance level. Under the null hypothesis, all tests have an asymptotic $\chi^2(3)$ distribution. An entry in the vicinity of 5% denotes an absence of size distortion. Regardless of the distribution chosen to simulate the processes, neither the LR nor the GMM tests suffer from severe size distortions. To obtain this result, the 'correct' set of instruments must be used when performing those tests, namely the orthogonality conditions for the causal [non-causal] model. This means that purely non-causal common features can easily be found by reversing the timeline of the instruments. The only test that faces a problem is the robust GMM test when the Cauchy distribution is used. This is not surprising, because a heteroscedasticity robust correction à la White assumes that the unconditional variance exists. In practice, in the Cauchy case the squared residuals entering the matrix $B = \mathrm{diag}(u_1^2, u_2^2, \dots, u_T^2)$ introduce large 'outlying' observations. When the wrong set of instruments is used, namely only lags for the non-causal model or only leads for the causal specification, the presence of a reduced rank structure is rejected. The rejection frequencies increase with the sample size, indicating that we are indeed under the alternative hypothesis.
According to these findings, a recommendation for empirical work is to first determine the lag order p and then to apply the common cyclical feature test statistics in both directions. It has to be mentioned that in the proposed two-step procedure, the detection of CFs in either the causal or the non-causal dynamics is conditional on the correct determination of p. This step is important and should be performed with caution, as overestimating p decreases the power of the tests, while choosing p too low might lead to a failure to detect common features at all. In an online appendix to this paper, we present additional simulation results on the determination of the order p, the performance of the three proposed test statistics when the DGP is of a mixed form, and some misspecification tests for the null hypothesis that the errors are i.i.d. when a pseudo-causal model is estimated while the data are generated by a non-causal model.
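The recommendation above can be encoded as a simple decision rule. This is a sketch under our own assumptions: the p-values are assumed to come from the same statistic (LR or robust GMM) computed once with lags and once with leads as instruments, and the handling of the case in which neither direction is rejected for p > 1 is our choice, not a rule stated in the text.

```python
def classify_cofeature(pval_lags: float, pval_leads: float, p: int,
                       alpha: float = 0.05) -> str:
    """Hypothetical decision rule combining a test run with lags and one run with leads."""
    causal_ok = pval_lags > alpha        # reduced rank not rejected with lags
    noncausal_ok = pval_leads > alpha    # reduced rank not rejected with leads
    if causal_ok and noncausal_ok:
        if p == 1:
            # MVAR(1, 0) and MVAR(0, 1) give identical canonical correlations.
            return "common feature, but causal vs non-causal is not identified"
        return "inconclusive: check the choice of p and test power"
    if causal_ok:
        return "causal VAR with CCF"
    if noncausal_ok:
        return "non-causal VAR with NCCF"
    return "neither CCF nor NCCF"


print(classify_cofeature(pval_lags=0.01, pval_leads=0.40, p=2))
```

The rule is only as good as the first step: if p is misspecified, both the lag-based and the lead-based tests can be misleading.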

V. Co-movements in oil prices
Now, we illustrate how to use the tools proposed in this paper in practice. We study the presence of short-run co-movements between two oil price series, namely the price of a barrel of Brent crude oil in Europe ($\mathrm{Brent}_t$) and of West Texas Intermediate for delivery at Cushing, Oklahoma ($\mathrm{WTI}_t$). Our sample runs from May 1987 to September 2017, which amounts to T = 365 monthly observations. The data are available from the U.S. Energy Information Administration. We first consider the monthly growth rates of the series, as plotted in the second panel of Figure 1, and we define the bivariate vector $Y_t = [\mathrm{Brent}_t, \mathrm{WTI}_t]'$. Using information criteria to determine the VAR order for $\Delta\ln Y_t$, we obtain p = 1 with BIC, p = 2 with HQ and p = 4 with AIC. Table 3 reports the P-values of three misspecification tests for the three VAR models considered. More specifically, we use a multivariate LM test for the null of no autocorrelation from lag 1 to 2, the multivariate version of the Jarque-Bera test, and the multivariate heteroscedasticity test (no cross terms). Both the information criteria and the residual test statistics can be obtained with several statistical packages; in this study, we use EViews 10.
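A small sketch of the data preparation, assuming monthly price levels stored in plain lists: it checks the sample size quoted above and computes the monthly growth rates as first differences of log prices. The prices in the usage line are invented; the actual series come from the U.S. Energy Information Administration.

```python
import math


def months_inclusive(start, end):
    """Number of monthly observations from (year, month) start to end, both included."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1]) + 1


def log_growth(prices):
    """Monthly growth rates ln P_t - ln P_{t-1}; the first observation is lost."""
    return [math.log(cur) - math.log(prev) for prev, cur in zip(prices, prices[1:])]


print(months_inclusive((1987, 5), (2017, 9)))   # 365
# Hypothetical prices (USD per barrel), for illustration only.
print(log_growth([20.0, 22.0, 19.8]))
```

Differencing costs one observation, so the growth-rate VARs are estimated on at most 364 observations before lags or leads are taken into account.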
From the LM test, it emerges that the VAR(1) and, to a lesser extent, the VAR(2) are misspecified, as we reject the null of no autocorrelation at the 5% significance level. In contrast, we fail to reject the null of no autocorrelation at 5% for the VAR(4). From the Jarque-Bera test, we reject the null of normally distributed residuals for all three models considered. This provides an additional motivation to distinguish between backward- and forward-looking behaviour. Given that the presence of heteroscedasticity can have an impact on the tests of autocorrelation, we continue the investigation with all three models. In particular, we keep the VAR(1) in the set of considered models, as it helps to illustrate that, even when a reduced rank structure is found, it is not possible to discriminate between causal and non-causal models (see Remark 8).
The last four columns of Table 3 report, for both causal and non-causal models, the P-values of the LR canonical correlation test and of the robust GMM test. Concerning the VAR(1), i.e. the row associated with the BIC criterion, neither the LR test nor $J_2(\hat\beta_{IV-W})$ rejects the null of a reduced rank. It is also seen that the P-values for the causal and the non-causal specification are identical by construction. The eigenvector from canonical correlations gives $\delta_c = [1, -0.928]'$ for the causal relationship and $\delta_{nc} = [1, -1.118]'$ for the non-causal counterpart. The GMM relationships are very similar and yield $\delta_c = [1, -0.930]'$ and $\delta_{nc} = [1, -1.102]'$, respectively. In this case, the coefficient vectors do not make it obvious whether to favor the causal or the non-causal relationship.
For the VAR(2), both tests fail to reject the null of common cyclical features only for the non-causal model. For the VAR(4), a common feature vector is found in the non-causal framework only with the GMM test. Simulations in Candelon et al. (2005) show that the latter test is more robust to heteroscedasticity. Since the results in Table 3 suggest heteroscedastic residuals, this test is likely to be the more appropriate one in this empirical application. As far as the relationships are concerned, we find very similar eigenvectors in all cases. That is, the VAR(2) …
From these results, we conclude that a purely non-causal VAR has generated the dynamics of $\Delta \ln Y_t$. Note that we identified this process using simple tools that require neither the imposition of a particular non-Gaussian distribution nor the estimation of models by complex nonlinear iterative algorithms.
Finally, we consider the co-movements in the log levels of the series. The two prices display commonalities in the trend component as well as strong co-movements in the bubble feature, except over 2011-2014, when Brent is systematically higher than WTI. Several economic explanations have been put forward for this deviation, e.g. the extraction of more expensive oil sands in the United States until new pipelines were installed to transport it. It is not easy to determine the order of integration of this type of data. Are the series I(1), as we have assumed,⁸ or are they globally stationary with some temporary explosive roots generating the bubbles? Several authors (Cavaliere, Nielsen and Rahbek, 2018; Hencic and Gouriéroux, 2014; Fries and Zakoïan, 2017) detrend similar series with a polynomial trend function before working with the 'stationary' detrended series. While we see the appeal of this step, as it keeps the bubble pattern intact, the procedure is rather ad hoc.
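To make the detrending step concrete, a generic least-squares polynomial detrending is sketched below in NumPy. It is an assumption that this matches the procedures of the authors cited above, which may differ in the degree, the sample or the treatment of logs. The function name is hypothetical and the series is simulated (a quartic trend plus noise), not the oil prices.

```python
import numpy as np

def polynomial_detrend(x, degree=4):
    """Remove a least-squares polynomial time trend of the given degree from x.
    Time is rescaled to [-1, 1] so that high powers stay well conditioned."""
    T = len(x)
    s = np.linspace(-1.0, 1.0, T)
    basis = np.vander(s, degree + 1)               # columns s^degree, ..., s, 1
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    trend = basis @ coef
    return x - trend, trend

# Simulated series: hypothetical quartic trend plus white noise
rng = np.random.default_rng(0)
T = 365
tt = np.arange(T) / T
trend_true = 3.0 + 2.0 * tt - 1.5 * tt**2 + 4.0 * tt**3 - 2.0 * tt**4
x = trend_true + 0.1 * rng.standard_normal(T)

detrended, fitted_trend = polynomial_detrend(x, degree=4)
exact, _ = polynomial_detrend(trend_true, degree=4)   # a pure quartic is removed entirely
print(detrended[:5])
```

Whatever bubble-like deviations sit in `x` survive in `detrended`, which is the appeal of the step; the choice of the degree, however, remains a modelling decision.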
An alternative way to model oil prices would be to estimate a step function, with a kind of natural price level of roughly 20 dollars per barrel until 2000 and around 60-80 dollars per barrel from 2005 onwards. In retrospect, however, the story of a natural price has proved incorrect. When the barrel reached its peak in June 2008, several commentators predicted that the price would be as high as 200 dollars per barrel before the end of 2000. This is not what happened. That said, a fourth-order polynomial trend seems to best capture the trending pattern of our series in levels, a result that is at odds with the stationary features we observe in first differences.
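A short derivation shows why a fourth-order polynomial trend in levels is hard to reconcile with stationary first differences. Writing a generic quartic trend as $\tau_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4$ (illustrative notation introduced here), its first difference is

$$\Delta\tau_t = \tau_t - \tau_{t-1} = a_1 + a_2(2t - 1) + a_3(3t^2 - 3t + 1) + a_4(4t^3 - 6t^2 + 4t - 1),$$

so with $a_4 \neq 0$ the differenced series would still carry a cubic deterministic trend rather than fluctuate around a constant mean.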
If we assume that the series are I(1), Johansen's approach also supports the presence of a cointegrating relationship within a VAR(3) in levels: the trace statistic for the null of zero cointegrating vectors is 28.31, against a 5% critical value of 15.5. The cointegrating vector is $\beta = [1, -1.102]'$. To check to what extent our common feature results are robust to the presence of cointegration, we use the weak-form common feature approach proposed by Hecq et al. (2006). That is, in the VECM with two lagged differences implied by the VAR(3) in levels,
$$\Delta \ln Y_t = \mu + \alpha\beta' \ln Y_{t-1} + \Gamma_1 \Delta \ln Y_{t-1} + \Gamma_2 \Delta \ln Y_{t-2} + v_t,$$
we look for a vector $\delta_c$ that annihilates only the short-run dynamics. Under the null hypothesis $\delta_c'\Gamma_1 = \delta_c'\Gamma_2 = 0$, we have $\delta_c'(\Delta \ln Y_t - \mu - \alpha\beta' \ln Y_{t-1}) = \delta_c' v_t$. Although the implications of this VECM in reverse time are outside the scope of this paper, we also look at co-movements in
$$\Delta \ln Y_t = \mu^* + \alpha^*\beta' \ln Y_{t+1} + \Gamma_1^* \Delta \ln Y_{t+1} + \Gamma_2^* \Delta \ln Y_{t+2} + v_t^*,$$
such that $\delta_{nc}$ only annihilates the lead short-run dynamics. Similarly, under the null hypothesis $\delta_{nc}'\Gamma_1^* = \delta_{nc}'\Gamma_2^* = 0$, we have $\delta_{nc}'(\Delta \ln Y_t - \mu^* - \alpha^*\beta' \ln Y_{t+1}) = \delta_{nc}' v_t^*$. The P-values are 0.004 and 0.020 for the LR and the robust GMM tests, respectively, in the causal model, and 0.173 and 0.368 in the non-causal specification: the null of a weak-form common feature is rejected at the 5% level in the causal model but not in the non-causal one. The common feature vectors are similar to those obtained without cointegration. Hence, the results appear to be robust to the presence of a long-run relationship.
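The weak-form test can be sketched in the same canonical correlation spirit: partial out a constant and the error-correction term from both $\Delta y_t$ and the lagged (or lead) differences, then use the smallest squared canonical correlation. This is a minimal sketch under stated assumptions: $\beta$ is treated as known (here the true simulation value, whereas in the application it would come from the Johansen step), the common feature vector is normalised as $[1, -w]$, and the scaling $-(N - p - 1)$ is our choice. Function names are hypothetical and the data come from an assumed cointegrated VECM with a weak-form common feature, not from the oil prices.

```python
import numpy as np

def partial_out(A, C):
    """Residuals from an OLS regression of each column of A on the columns of C."""
    coef, *_ = np.linalg.lstsq(C, A, rcond=None)
    return A - C @ coef

def weak_form_test(y, beta, p=2, lead=False):
    """Smallest squared canonical correlation between Delta y_t and p lagged (or lead)
    differences, after partialling out a constant and beta' y_{t-1} (or beta' y_{t+1})."""
    T, n = y.shape
    D = np.vstack([np.full((1, n), np.nan), np.diff(y, axis=0)])   # D[t] = y_t - y_{t-1}
    ec = y @ beta                                                   # ec[t] = beta' y_t
    sign = 1 if lead else -1
    shifts = [sign * j for j in range(1, p + 1)]
    ts = np.array([t for t in range(1, T) if all(1 <= t + h <= T - 1 for h in shifts)])
    C = np.column_stack([np.ones(len(ts)), ec[ts + sign]])
    X = partial_out(D[ts], C)
    W = partial_out(np.hstack([D[ts + h] for h in shifts]), C)
    Sxx, Sww, Sxw = X.T @ X, W.T @ W, X.T @ W
    Linv = np.linalg.inv(np.linalg.cholesky(Sxx))                   # Sxx = L L'
    lam, U = np.linalg.eigh(Linv @ Sxw @ np.linalg.solve(Sww, Sxw.T) @ Linv.T)
    v = Linv.T @ U[:, 0]                                            # weights on Delta y_t
    stat = -(len(ts) - p - 1) * np.log(1.0 - lam[0])
    return stat, v / v[0]

# Hypothetical VECM: Gamma_2 = 0, [1, -2] @ Gamma_1 = 0 but [1, -2] @ alpha != 0
rng = np.random.default_rng(2)
alpha = np.array([-0.2, 0.1])
beta = np.array([1.0, -1.0])
Gamma1 = np.outer([0.4, 0.2], [1.0, 0.5])
burn, T = 200, 20000
y = np.zeros((burn + T, 2))
for t in range(2, burn + T):
    dy_prev = y[t - 1] - y[t - 2]
    y[t] = y[t - 1] + alpha * (beta @ y[t - 1]) + Gamma1 @ dy_prev + rng.standard_normal(2)
y = y[burn:]

stat_c, vec_c = weak_form_test(y, beta, p=2, lead=False)   # lags, as in the causal VECM
stat_nc, vec_nc = weak_form_test(y, beta, p=2, lead=True)  # leads, reverse-time analogue
print(stat_c, vec_c, stat_nc, vec_nc)
```

Because $\delta'\alpha \neq 0$ in this design, a strong-form test that did not condition on $\beta' y_{t-1}$ would not find the common feature; conditioning on the error-correction term is what makes the weak-form version work.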

VI. Conclusion
This paper provides a novel way to detect common features in a VAR framework that may include a lead component. We show that common features that cannot be detected in the causal (i.e. backward-looking) representation might be revealed in the non-causal (i.e. forward-looking) dynamics of the series. Although data generated from purely causal, purely non-causal and mixed causal-non-causal VAR models do not always differ visibly in graphs, the latter two are well known to generate much richer dynamics than the conventional causal VAR. This makes it possible, for example, to detect speculative bubbles and asymmetric cycles that are common to a set of series.
We propose tools to test for the potential presence of non-causal co-movements. Using sets of either lags or leads as instruments within a canonical correlation or a GMM framework, we show how additional relationships between series can be uncovered, both in Monte Carlo simulations and in oil price data. Interestingly, we find that a reduced rank structure in the dynamics of purely causal and purely non-causal systems with more than one lag/lead permits identification in the usual Gaussian framework. This result cannot be extended to mixed causal-non-causal models because, by construction, the instruments are correlated with the error term of the reduced rank model.