On the Sensitivity of Granger Causality to Errors‐In‐Variables, Linear Transformations and Subsampling

This article studies the sensitivity of Granger causality to the addition of noise, the introduction of subsampling, and the application of causal invertible filters to weakly stationary processes. Using canonical spectral factors and Wold decompositions, we give general conditions under which additive noise or filtering distorts Granger‐causal properties by inducing (spurious) Granger causality, as well as conditions under which it does not. For the errors‐in‐variables case, we give a continuity result, which implies that a ‘small’ noise‐to‐signal ratio entails ‘small’ distortions in Granger causality. On filtering, we give general necessary and sufficient conditions under which ‘spurious’ causal relations between (vector) time series are not induced by linear transformations of the variables involved. This also yields transformations (or filters) which can eliminate Granger causality from one vector to another. In a number of cases, we clarify results in the existing literature, with a number of calculations streamlining some existing approaches.


INTRODUCTION
Granger causality is one of the most important concepts for the analysis of the structure of multivariate time series. Accordingly, the original article of Granger (1969) triggered a substantial number of publications; see, for example, Sims (1972), Pierce and Haugh (1977), Granger (1980, 1988), Geweke (1982, 1984a, 1984b), Boudjellaba et al. (1992), Dufour and Tessier (1993), Dufour and Renault (1998), Al-Sadoon (2014) and the references therein. Here we deal with one aspect of Granger causality, namely the sensitivity of Granger causality relations to measurement errors (or errors-in-variables) in the observations. In particular, we study the effect of additive noise on Granger causality in the context of a general weakly stationary multivariate model, especially with a view to finding when spurious causality could appear, and when properties of non-causality are unaffected by measurement errors.
The problem of measurement errors is a classical issue in statistical theory; see for example the reviews of Fuller (1987), Wansbeek and Meijer (2000), Carroll et al. (2006), Gustafson (2003), and Buonaccorsi (2010). However, [...] or equivalently (2). Here $E[X_B(t) \mid X_A(s), X_B(s) : s < t]$ denotes the conditional expectation of $X_B(t)$ given the variables $X_A(s)$, $X_B(s)$ such that $s < t$ (and similarly elsewhere), and Var the variance of the one-step-ahead forecast error. If inequality holds in (1) and (2), one says that $X_A$ (Granger) causes $X_B$. Granger (1969) in addition introduced the notion of 'instantaneous causality', meaning that the approximation of $X_B(t)$ can be more accurately achieved if $X_A(t)$ is known; for further discussion of this notion, see Pierce and Haugh (1977) and Granger (1988). The assumption of second-order stationarity is clearly restrictive, but is standard in the Granger-causality literature. Further, general characterizations of non-causality are typically little affected when common forms of non-stationarity (such as deterministic time trends and integration) are allowed; see, for example, Dufour and Renault (1998) and Dufour et al. (2006).
It is clear from the above definitions that Granger causality depends on the vector X considered and on the way X is split into subvectors X A and X B . Such choices (which are of course finite in number) depend on the context: which variables are of interest, and the objectives of the analysis. For example, X A can represent policy instruments (e.g., fiscal and monetary variables) or leading indicators of economic activity, and X B economic outcomes (e.g., national income, unemployment, etc.): the nature of the variables often provides a natural criterion for splitting X into subvectors. Clearly, the causal structure of a time series should in general depend on such choices. However, the question remains whether apparently less fundamental features, such as contamination by noise and various linear transformations, including filtering and subsampling, can affect the causal properties of a time series.
This article studies the sensitivity of Granger causality to the addition of noise, the application of causal invertible filters, and subsampling in weakly stationary processes. We give general conditions under which additive noise or filtering creates distortions by inducing (spurious) Granger causality, as well as conditions under which it does not. Even though additive noise and filtering can in general produce spurious Granger causality, there is a remarkably wide range of cases where they do not. For example, if the 'caused variable' $X_B$ is not noisy, noise added to the 'causal variable' $X_A$ cannot induce spurious Granger causality from $X_A$ to $X_B$. This covers cases where lagged values of $X_A$ are contaminated by noise, and $X_B$ does Granger-cause $X_A$. We also give a continuity result which shows that a 'small' noise-to-signal ratio in measurement errors entails 'small' distortions in Granger causality. In a number of cases, we clarify results in the existing literature, with a number of calculations streamlining some existing approaches.
We also consider the effects of linear transformations, filtering and subsampling. In particular, we give general necessary and sufficient conditions under which 'spurious' causal relations between (vector) time series are not induced by linear transformations of the variables involved. This also yields linear transformations (or filters) which can eliminate Granger causality from one vector to another.
Section 2 summarizes a collection of known results available for the characterization of Granger causality, using canonical spectral factors, Wold decompositions and spectra. In Section 3, we establish some connections not clearly stated in the earlier literature, which are useful for studying causality in the presence of measurement errors. These include a general lower bound on the conditional variance of the sum of two processes, and some general relations between Granger causality and instantaneous causality. In Section 4, we study the effect of measurement errors on Granger non-causality. Section 5 provides the continuity result in terms of the signal-to-noise ratio. The effects of linear transformations, filtering and subsampling are studied in Sections 6 and 7. Section 8 offers some concluding remarks. Proofs appear in the Appendix.

CHARACTERIZATIONS OF GRANGER CAUSALITY
We review some classical characterizations of Granger causality which will be useful for studying the effect of errors-in-variables. We first record some notational conventions associated with rational (matrix) transfer functions (see e.g. Rozanov, 1967; Hannan and Deistler, 2012). We emphasize the use of spectral methods, for which Geweke (1982, 1984a, 1984b) was an early promoter in the context of analyzing Granger-Wiener causality. A rational transfer function is called stable if its poles are outside the unit circle, and it is called miniphase or minimum phase if its zeros are outside the unit circle. If we commence from a rational spectral density $\Phi_{XX}(z)$, $z \in \mathbb{C}$, which is positive definite everywhere on the unit circle, there is a spectral factorization
$$\Phi_{XX}(z) = W(z)\, Q\, W^\top(z^{-1}), \qquad (4)$$
in which the canonical spectral factor $W(z)$ is stable and miniphase and $Q$ is the innovations covariance matrix.
Assumption 1. (Full rank stationary process with no spectral zero on the unit circle) $X = (X_A^\top\ X_B^\top)^\top$ is a real full-rank stationary stochastic process in $\mathbb{R}^d$, with rational spectrum $\Phi_{XX}(z)$ having no zero on the unit circle, such that (4) is satisfied, $W(0) = I_d$, and [...]
The above assumption entails that $X(t)$ has both a moving average (Wold) representation $X(t) = W(L)\varepsilon(t)$ and an autoregressive representation $W(L)^{-1}X(t) = \varepsilon(t)$, where $\varepsilon(t)$ denotes the innovation process. The following theorems provide characterizations of Granger causality; see Sims (1972), Pierce and Haugh (1977), Geweke (1982, 1984a, 1984b), Boudjellaba et al. (1992), Dufour and Tessier (1993), Dufour and Renault (1998). The first one is based on the structure of the spectral factor matrix $W(z)$.
Theorem 1. (Canonical spectral factor characterization of Granger causality) Suppose Assumption 1 holds. Then the following two conditions are equivalent: (i) $X_A$ does not Granger cause $X_B$; (ii) $W_{21}(z) = 0$. The following conditions are also equivalent: (i) $X_A$ neither Granger causes $X_B$, nor does it cause $X_B$ instantaneously; (ii) $W_{21}(z) = 0$ and $Q$ is block diagonal (i.e. $Q_{12} = Q_{21}^\top = 0$).
The intuition behind the above claim is the following. Let the innovation process be denoted by $\varepsilon(t) = [\varepsilon_A(t)^\top\ \varepsilon_B(t)^\top]^\top$ with $\varepsilon_A$ and $\varepsilon_B$ two independent white noise processes. When $W_{21}(z) = 0$, we have:
$$X_A(t) = W_{11}(L)\varepsilon_A(t) + W_{12}(L)\varepsilon_B(t), \qquad X_B(t) = W_{22}(L)\varepsilon_B(t).$$
It is intuitively reasonable to conclude from these equations that knowledge of the $X_A$ process up to time $t - 1$ will not be of help in determining the $\varepsilon_B$ process and thus the $X_B$ process. Spectral approaches for Granger causality analysis were emphasized in the seminal work of Geweke (1982, 1984a, 1984b).
For completeness, we note a further characterization of Granger causality, which follows from the above.
Theorem 2. (AR characterization of Granger causality) Suppose Assumption 1 holds, and $X(t)$ has the (possibly infinite) autoregressive representation
$$X(t) = \sum_{i \ge 1} A_i X(t-i) + \varepsilon(t),$$
where the $A_i$ and the covariance matrix $\Sigma = \mathrm{Var}[\varepsilon(t)]$ of the innovations sequence $\varepsilon(t)$ are partitioned conformably with $X = (X_A^\top\ X_B^\top)^\top$. Then $X_A$ does not Granger cause $X_B$ if and only if $A_{i21} = 0$ for all $i \ge 1$. In addition, $X_A$ neither Granger causes $X_B$, nor does it cause $X_B$ instantaneously, if and only if $A_{i21} = 0$ for all $i \ge 1$ and $\Sigma_{12} = \Sigma_{21}^\top = 0$.
Theorems 1 and 2 give characterizations of the absence of causality based on the spectral factor and infinite AR representations (the latter is obtained from the inverse of the spectral factor). Sims (1972) gave an additional characterization (for $d = 2$), based on Wiener filtering ideas, where no factorization is required. Let the spectral density $\Phi_{XX}$ be partitioned conformably with $X = (X_A^\top\ X_B^\top)^\top$:
$$\Phi_{XX} = \begin{bmatrix} \Phi_{AA} & \Phi_{AB} \\ \Phi_{BA} & \Phi_{BB} \end{bmatrix}. \qquad (10)$$
Then we have the following spectral characterization of non-causality.
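Theorem 3. (Transfer function characterization of Granger causality) Suppose Assumption 1 holds, and let $\Phi_{XX}$ be partitioned as in (10). Then, the following conditions are equivalent: (i) $X_A$ does not Granger cause $X_B$; (ii) $\Phi_{AB}(z)\Phi_{BB}^{-1}(z)$ is a stable transfer function. The following conditions are also equivalent: (i) $X_A$ neither Granger causes $X_B$, nor does it cause $X_B$ instantaneously; (ii) $\Phi_{AB}(z)\Phi_{BB}^{-1}(z)$ is a stable transfer function assuming the value 0 at $z = 0$.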
Remark 1. The above theorem can be viewed as an extension of the corresponding theorem given by Sims (1972, Theorem 2) in the special case where $d = 2$. Theorem 3 allows for $d \ge 2$, and covers instantaneous causality as well. We are not contending that the characterization of this theorem is necessarily attractive from a computational point of view. As later parts of the article show, though, the result is of theoretical interest, in that it can be applied to give rapid derivations of the sensitivity properties associated with Granger causality.
Remark 2. The transfer function Φ AB (z)Φ −1 BB (z) is the transfer function of the optimum two-sided Wiener filter for approximating the process X A from the process X B ; the two-sided aspect refers both to the fact that the transfer function has a Laurent series expansion with both negative and positive powers of z, and to the related fact that X A (t) is being approximated from X B (s), −∞ < s < ∞, that is, from the past and future of X B . If the two-sided transfer function in a particular case is causally one-sided, then future values of X B are irrelevant in approximating current values of X A . This will be the case if past values of X A do not affect present or future values of X B .
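The one-sidedness described in Remark 2 can be checked numerically in a scalar case. The sketch below is only an illustration under stated assumptions: the spectral factor (with entries $W_{11} = 1$, $W_{12}(z) = z$, $W_{21} = 0$, $W_{22}(z) = 1 + z/2$ and $Q = I$) is hypothetical, and the helper `anticausal_energy` is our own name. It samples the ratio $\Phi_{AB}\Phi_{BB}^{-1}$ on the unit circle, recovers its Laurent coefficients by FFT, and sums the squared coefficients at negative powers of $z$; this sum vanishes exactly when the two-sided filter is causally one-sided.

```python
import numpy as np

def anticausal_energy(f, n=1024):
    """Sum of squared Laurent coefficients of f at negative powers of z on |z| = 1."""
    z = np.exp(2j * np.pi * np.arange(n) / n)   # n points on the unit circle
    c = np.fft.fft(f(z)) / n                    # c[k] approximates the coefficient of z**k
    k = np.fft.fftfreq(n, d=1.0 / n)            # matching integer exponents (wrapped)
    return float(np.sum(np.abs(c[k < 0]) ** 2))

# Hypothetical scalar factor: W11 = 1, W12(z) = z, W21 = 0, W22(z) = 1 + z/2, Q = I.
def W12(z):
    return z

def W22(z):
    return 1 + 0.5 * z

def ratio_A_to_B(z):
    """Phi_AB / Phi_BB, the ratio that decides whether X_A Granger causes X_B."""
    phi_AB = W12(z) * W22(1 / z)
    phi_BB = W22(z) * W22(1 / z)
    return phi_AB / phi_BB

def ratio_B_to_A(z):
    """Phi_BA / Phi_AA, the same criterion with the roles of A and B exchanged."""
    phi_BA = W22(z) * W12(1 / z)
    phi_AA = 1 + W12(z) * W12(1 / z)
    return phi_BA / phi_AA

energy_A_to_B = anticausal_energy(ratio_A_to_B)
energy_B_to_A = anticausal_energy(ratio_B_to_A)
print(energy_A_to_B, energy_B_to_A)
```

In this example the first ratio reduces to $z/(1 + z/2)$, so its anticausal energy is zero up to rounding and $X_A$ does not Granger cause $X_B$. The second ratio reduces to $0.5 z^{-1} + 0.25$, with anticausal energy close to 0.25, so $X_B$ does Granger cause $X_A$.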
Remark 3. It is important to note that the characterizations given in this section hold for series in discrete time observed at a given frequency. They are directly applicable to continuous time series, and modifications arise typically when the series are transformed or filtered. The effect of such transformations will be considered in Sections 6 and 7 below.

DIRECTIONS OF GRANGER CAUSALITY
In the literature, one finds remarkable similarity between conditions said to capture '$X_A$ does not cause $X_B$' and '$X_B$ causes $X_A$' and similar pairings. To study the effect of errors-in-variables on causality, we establish in this section some connections not clearly stated in the earlier literature. We start with the following preliminary result.
Lemma 1. Let $X$ and $Y$ be two independent stationary stochastic processes with spectral densities. Let $Z = X + Y$. Then the covariance matrix of the one-step prediction error in approximating $Z(t + 1)$ from $Z(s)$, $s \le t$, is bounded from below by the sum of the covariance matrices of the one-step prediction errors in approximating $X(t + 1)$ from $X(s)$, $s \le t$, and in approximating $Y(t + 1)$ from $Y(s)$, $s \le t$.
Now we spell out the following relations between Granger causality and instantaneous causality.
Theorem 4. Adopt the same hypothesis as in Theorem 1. Suppose X A does not Granger cause X B nor does it cause X B instantaneously. Then either the two processes are independent, or X B Granger causes X A . Further, suppose alternatively that X A does not cause X B . Then, either the two processes are independent, or X B Granger causes X A , or X B causes X A instantaneously.
Note that neither claim of the theorem goes in the reverse direction. This is because it is possible that $X_A$ Granger causes $X_B$ and, simultaneously, $X_B$ Granger causes $X_A$. Such a situation will generally arise when the canonical spectral factor $W$ is not triangular (or diagonal), as in the following example: [...] Here, $\varepsilon_A$, $\varepsilon_B$ are independent white noise processes with variances $Q_A$, $Q_B$. One can verify that [...] and the transfer function matrix is easily verified to be stable and minimum phase, assuming the value $I$ when [...] by a similar argument to that used in the proof of Theorem 4.
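For scalar processes, the lower bound of Lemma 1 can be illustrated with the Kolmogorov-Szegő formula, which gives the one-step prediction error variance as the exponential of the average of the log spectrum over the unit circle. The sketch below is a numerical illustration rather than a proof; the two spectra (an MA(1) process and an independent white noise of variance 3/4) are hypothetical choices, and `one_step_error_variance` is our own helper name.

```python
import numpy as np

def one_step_error_variance(phi, n=4096):
    """Kolmogorov-Szego formula for a scalar spectrum phi(w), normalised so that
    the average of phi over the unit circle equals the process variance."""
    w = 2 * np.pi * np.arange(n) / n
    return float(np.exp(np.mean(np.log(phi(w)))))

def phi_X(w):
    """Spectrum of a hypothetical MA(1) process X(t) = e(t) + 0.5 e(t-1), Var e = 1."""
    return np.abs(1 + 0.5 * np.exp(1j * w)) ** 2

def phi_Y(w):
    """Spectrum of a hypothetical white noise Y with variance 3/4."""
    return 0.75 * np.ones_like(w)

def phi_Z(w):
    """Spectrum of Z = X + Y when X and Y are independent."""
    return phi_X(w) + phi_Y(w)

var_X = one_step_error_variance(phi_X)
var_Y = one_step_error_variance(phi_Y)
var_Z = one_step_error_variance(phi_Z)
print(var_X, var_Y, var_Z, var_Z >= var_X + var_Y)
```

Here the individual prediction error variances are 1 and 3/4, while that of the sum is $(2 + \sqrt{3})/2 \approx 1.866$, which exceeds their sum, as the lemma requires.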

ADDITIVE NOISE AND GRANGER CAUSALITY
We consider the effect of additive noise on Granger causality (compare with Anderson and Deistler (1984) and Anderson (1985)). Our starting point, again, is the full-rank stationary process X = [X ⊤ A X ⊤ B ] ⊤ with rational spectral density.
Suppose that $X_A$ does not Granger cause $X_B$. Suppose further that the processes $X_A$, $X_B$ are both contaminated by stationary colored additive noise processes $N_A$, $N_B$ with rational spectral densities, which are independent of each other and of the processes $X_A$, $X_B$. Then one can ask whether it is now true that the process $\bar X_A = X_A + N_A$ does not Granger cause the process $\bar X_B = X_B + N_B$. Perhaps of equal if not greater interest is the associated question: suppose that $\bar X_A$, $\bar X_B$ are regarded as noisy measurements of underlying processes $X_A$, $X_B$ and that analysis of measurement data reveals that $\bar X_A$ does not cause $\bar X_B$. Can one conclude then that $X_A$ does not Granger cause $X_B$?
In the next section, we will construct an example showing that the answer to the first question is generally no, a conclusion that is perhaps not counterintuitive since non-causality corresponds to zero restrictions. In the following section, we show how the Sims (1972) characterization of the absence of Granger causality summarized in Theorem 3 reveals that the claim remains valid if the contaminating noise $N_B$ is zero, and this is generically a necessary condition for the claim to hold. There is no similar requirement on the noise $N_A$. In an article by Solo (2007), several important questions are raised about the sensitivity of Granger causality (or its absence) to changes in the underlying assumptions. We consider one of these, namely the effect of additive noise. Our results differ from those obtained in Solo (2007). We first study the stationary full-rank vector process $X = [X_A^\top\ X_B^\top]^\top$, which can be regarded as the juxtaposition of two subprocesses $X_A$ and $X_B$. Suppose that $X_A$ does not Granger cause $X_B$ nor does it cause $X_B$ instantaneously.

Noise-induced Granger Causality
We will now introduce the promised example. To define the $X_A$, $X_B$ processes where $X_A$ does not Granger cause $X_B$ nor does it cause $X_B$ instantaneously, following Theorem 1 we shall choose an upper triangular canonical spectral factor. The two processes are scalar, and we assume [...] and we further assume the innovations covariance $Q$ is the identity matrix. An easy calculation delivers [...]
Now assume that additive noise with a white spectrum of intensity $3/4$ is added to $X_B$, to produce a new process $\bar X_B$, while no noise is added to $X_A$. The cross spectrum between $X_A$ and $X_B$ is unaffected. So the new joint spectral matrix is [...]
If it were true that $\bar X_A$ does not Granger cause $\bar X_B$, nor cause $\bar X_B$ instantaneously, then this matrix would need to have a canonical spectral factor, $\bar W(z)$ say, which like $W(z)$ is upper triangular with $\bar W(0) = I$, and an associated innovations covariance matrix $\bar Q$ which is diagonal. To derive a contradiction, let us assume this to be the case and find $\bar W(z)$. The upper triangularity implies that the (2, 2) term $\bar W_{22}$ of $\bar W(z)$ must satisfy $\bar W_{22}(0) = I$ and $\Phi_{\bar B\bar B}(z) = \bar W_{22}(z)\,\bar Q_{22}\,\bar W_{22}(z^{-1})$, which means that $\bar W_{22}(z)$ itself is a canonical spectral factor for $\Phi_{\bar B\bar B}(z)$. One can easily verify that [...] so we see that [...]
Now consider the (1, 2) entry $\Phi_{AB}(z)$ of the spectrum. From the fact that [...] when $\bar W(z)$ is triangular, we have that $\Phi_{AB}(z) = \bar W_{12}(z)\,\bar Q_{22}\,\bar W_{22}(z^{-1})$, from which we obtain [...] It is easy to see that $\bar W_{12}(z)$ has a pole at $-1/(2 + \sqrt{3})$, which is inside the unit circle. This contradicts the requirement that all poles of a canonical spectral factor lie outside the unit circle.
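The pole location quoted above can be reproduced numerically. The sketch below rests on an assumption that is ours: we take $W_{22}(z) = 1 + z/2$ with unit innovation variance, a choice made only because, combined with white noise of intensity $3/4$, it yields a pole at $-1/(2 + \sqrt{3})$. The code factorizes the noisy spectrum $\Phi_{BB}(z) + 3/4$, keeps the zero outside the unit circle for the canonical factor $\bar W_{22}(z) = 1 + cz$, and reads off the zero of $\bar W_{22}(z^{-1})$, which becomes a pole of $\bar W_{12}(z)$ whenever $\Phi_{AB}$ does not vanish at that point.

```python
import numpy as np

# Assumption (ours): W22(z) = 1 + z/2 with unit innovation variance, so that
# Phi_BB(z) = (1 + z/2)(1 + 1/(2z)) = 5/4 + (z + 1/z)/2.
noise_intensity = 0.75

# z * (Phi_BB(z) + 3/4) = 0.5 z^2 + 2 z + 0.5; its roots are the zeros of the noisy spectrum.
zeros = np.roots([0.5, 1.25 + noise_intensity, 0.5])

# The canonical (miniphase) factor keeps the zero lying outside the unit circle.
zero_outside = float(np.real(zeros[np.abs(zeros) > 1][0]))
c = -1.0 / zero_outside          # W22_bar(z) = 1 + c z
q_bar = 0.5 / c                  # innovations variance, from q_bar * c = 1/2

# W12_bar(z) = Phi_AB(z) / (q_bar * W22_bar(1/z)), and W22_bar(1/z) = (z + c) / z.
pole_of_W12_bar = -c
print(pole_of_W12_bar, abs(pole_of_W12_bar) < 1)
```

Under this assumption the pole sits at $-(2 - \sqrt{3}) = -1/(2 + \sqrt{3}) \approx -0.268$, inside the unit circle, which is the instability used to reach the contradiction.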

Spectral Characterization of Noise-induced Granger Causality
It is now straightforward to understand the effect of adding noise to the processes $X_A$, $X_B$ on the property that $X_A$ does not Granger cause $X_B$. Suppose as before that $N_A$, $N_B$ are two processes, independent of $X_A$, $X_B$ and each other, and set $\bar X_A = X_A + N_A$, $\bar X_B = X_B + N_B$, so that $\Phi_{\bar A\bar B} = \Phi_{AB}$ and $\Phi_{\bar B\bar B} = \Phi_{BB} + \Phi_{N_B N_B}$. The absence of Granger causality will carry over, that is, $\bar X_A$ will not Granger cause $\bar X_B$, if and only if (by Theorem 3) $\Phi_{\bar A\bar B}\Phi_{\bar B\bar B}^{-1}$ is a stable transfer function. If there is noise on the process $X_A$ but not on the process $X_B$, the result is immediate that absence of causality continues to hold; the same transfer function fraction in fact arises. On the other hand, if there is noise on the process $X_B$, then for 'almost all' spectra $\Phi_{N_B N_B}$, including certainly a white spectrum, unless $\Phi_{BB}$ is itself white, the zeros of $\Phi_{BB} + \Phi_{N_B N_B}$ will differ from those of $\Phi_{BB}$ and not be the same as the poles of $\Phi_{\bar A\bar B} = \Phi_{AB}$. So the cancellation of unstable pole-zero pairs in forming the fraction will no longer occur and the absence of Granger causality will then be lost.
Now let us postulate that processes $\bar X_A$, $\bar X_B$ are measured and found to have the property that $\bar X_A$ does not Granger cause $\bar X_B$; these processes are assumed to be noisy versions of underlying processes $X_A$, $X_B$, with the additive noise processes being independent of each other and of the underlying $X_A$, $X_B$ processes. Ultimate interest lies in saying whether or not $X_A$ Granger causes $X_B$. Then the above argument shows that if we knew that there was no noise perturbing $X_B$, processing of the noisy measurements would allow the question to be answered. On the other hand, if there is noise perturbing $X_B$, one could not infer from the presence or absence of a causality property involving $\bar X_A$, $\bar X_B$ the corresponding property for $X_A$, $X_B$. The noise process $N_B$ would need to have a specialized spectrum for absence of causality in the noisy case to imply it in the noiseless case.
Note that there is no adjustment to the conclusions which arises in the special case of the noise process N B being white.
The results above are summed up in Theorem 5.
Theorem 5. Adopt the same hypothesis as in Theorem 1. Let $N_A$, $N_B$ be two stationary processes with rational spectra, with the same dimensions as $X_A$, $X_B$ respectively, where $X$, $N_A$, $N_B$ are mutually independent, and set $\bar X_A = X_A + N_A$, $\bar X_B = X_B + N_B$. [...]
A case of interest is provided by the situation where the two processes are actually independent. Then $\Phi_{AB} = 0$, and so the relevant transfer function $\Phi_{AB}\Phi_{BB}^{-1}$, with or without noise added, remains zero and there is no causality introduced through the addition of noise.
We comment that our conclusions are at variance with those of Solo (2007), who asserts that addition of both noise sequences $N_A$, $N_B$ to $X_A$, $X_B$, where $X_A$ does not Granger cause $X_B$, means that $\bar X_A$ does not Granger cause $\bar X_B$. There appears to be an unjustified assumption in his work (as confirmed in private communication): he constructs a triangular spectral factor for the $\bar X$ process but does not ensure that the off-diagonal term is stable; stability is simply assumed. Such stability would be a necessary condition for asserting that $\bar X_A$ does not Granger cause $\bar X_B$.
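The asymmetry between noise on $X_A$ and noise on $X_B$ can also be seen in a small simulation. The sketch below is a finite-sample illustration under assumptions of our own: $X_B$ is a hypothetical AR(1) process with coefficient 0.8, $X_A(t) = 2X_B(t-1) + \varepsilon_A(t)$, all noises are unit-variance Gaussian white noise, and causality is measured by the log ratio of restricted to unrestricted OLS residual variances with five lags (a Geweke-type measure). None of these choices comes from the text above.

```python
import numpy as np

def residual_variance(y, regressors, p):
    """OLS residual variance when y(t) is regressed on p lags of each series in `regressors`."""
    n = len(y)
    columns = [np.ones(n - p)]
    for x in regressors:
        columns += [x[p - k:n - k] for k in range(1, p + 1)]   # lag k of x
    Z = np.column_stack(columns)
    target = y[p:]
    beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return float(np.mean((target - Z @ beta) ** 2))

def granger_measure(cause, effect, p=5):
    """log(restricted / unrestricted residual variance); near 0 when `cause` does not help."""
    restricted = residual_variance(effect, [effect], p)
    unrestricted = residual_variance(effect, [effect, cause], p)
    return float(np.log(restricted / unrestricted))

rng = np.random.default_rng(0)
n = 50_000
eps_A, eps_B, N_A, N_B = rng.standard_normal((4, n))

# Hypothetical system in which X_A does not Granger cause X_B.
X_B = np.zeros(n)
for t in range(1, n):
    X_B[t] = 0.8 * X_B[t - 1] + eps_B[t]
X_A = eps_A.copy()
X_A[1:] += 2.0 * X_B[:-1]

F_clean = granger_measure(X_A, X_B)
F_noise_on_A = granger_measure(X_A + N_A, X_B)
F_noise_on_B = granger_measure(X_A, X_B + N_B)
print(F_clean, F_noise_on_A, F_noise_on_B)
```

With these settings the measure stays at the level of sampling error when only $X_A$ is noisy, while noise on $X_B$ produces a clearly positive value (a population value of roughly 0.014 in this design), in line with the discussion above.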

SIGNAL-TO-NOISE RATIO AND GRANGER CAUSALITY
We now establish a form of continuity result. If there is additive noise perturbing an arrangement where there is absence of causality, then although generically absence of causality will be lost, we shall show that, in a sense made more precise below, the degree of causality introduced is small. The practical effect of this result is that small amounts of noise in a particular situation may well be tolerable.
Our starting point is the following observation.

Lemma 2. Consider a complex matrix function $M(z)$, analytic in [...] and define the causal and anticausal parts by [...] Then the matrix function $L(z) := I + \lambda M(z)$ is analytic in $\rho < |z| < \rho^{-1}$, with $L(z) = L^\top(z^{-1})$, and positive definite on $|z| = 1$. Further, to first order in $\lambda > 0$, there holds [...] (25), with $I + \lambda M_+$ stable and miniphase.
We remark that the terminology 'to first order in $\lambda$' is shorthand for saying that the $L_2$ norm of the error between $L$ above and the approximation of it on the right-hand side of (25), call it $\Delta(z)$, is of order $\lambda^2$. The square of this $L_2$ norm can be computed with the aid of an integration around the unit circle, as $\operatorname{trace}\,\frac{1}{2\pi}\int [\Delta(\exp(j\omega))]^2\, d\omega$, or by taking the sum of the squared coefficients in the Laurent series of the error, that is, [...]
We will use this result to show that small perturbations in a spectrum give small perturbations in the associated spectral factors, and thence conclude that Granger causality is in a sense continuously dependent on the noise spectrum, it being absent when there is no noise. Accordingly we consider the arrangement studied in the previous section, with the introduction of a scaling parameter on the noise: thus $X = [X_A^\top\ X_B^\top]^\top$ and $X_A$ does not Granger cause $X_B$ nor does it cause $X_B$ instantaneously. The canonical factor $W(z)$ for the noise-free spectrum $\Phi_{XX}(z)$ is upper block triangular and the innovations covariance matrix $Q$ is block diagonal, and they obey the fundamental spectral factorization (4). Assume that $\lambda^{1/2} N_B$ for some $\lambda > 0$ is a noise process additively perturbing $X_B$, thus $\bar X_B = X_B + \lambda^{1/2} N_B$.

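(We have effectively already dealt with the effect of having a noise process $N_A$ perturbing $X_A$: the noisy process $X_A + N_A$ is known to inherit the property of not Granger causing $X_B$, and so no further consideration is given to $N_A$ and for convenience we take it as zero.) Now note that $\Phi_{\bar X\bar X}(z) = \Phi_{XX}(z) + \lambda\Phi_{NN}(z)$, where the only nonzero block of $\Phi_{NN}$ is its (2, 2) block $\Phi_{N_B N_B}$. The spectrum $\Phi_{\bar X\bar X}$ gives rise to a canonical spectral factor, call it $\bar W(z)$, and an associated covariance matrix, call it $\bar Q$, satisfying
$$\Phi_{\bar X\bar X}(z) = \bar W(z)\,\bar Q\,\bar W^\top(z^{-1}). \qquad (27)$$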
Our first result follows.
Theorem 6. Adopt the same hypothesis as in Theorem 1 and let $N_B$ be a stationary process with rational spectrum, with the same dimension as $X_B$, and with $X$, $N_B$ independent. For fixed positive $\lambda$, define $\bar X_B = X_B + \lambda^{1/2} N_B$ so that $\Phi_{\bar X\bar X} = \Phi_{XX} + \lambda\Phi_{NN}$, where the (1, 1), (1, 2), (2, 1) blocks of $\Phi_{NN}$ are zero, and the (2, 2) block is $\Phi_{N_B N_B}$. Let $W(z)$, $Q$ (with $W(z)$ upper block triangular and $Q$ block diagonal) and $\bar W(z)$, $\bar Q$ define canonical spectral factorizations of $\Phi_{XX}(z)$ and $\Phi_{\bar X\bar X}(z)$ as in (4) and (27) respectively. Then [...]
We remark that the first and third bounds imply bounds on the $L_2$ norms of the quantities which are also $O(\lambda)$. Evidently, the $\bar X$ process is 'close to' a process in which $X_A$ does not cause $\bar X_B$ in two senses: the canonical spectral factor is close to upper block triangular with the innovations covariance matrix being block diagonal, and (separately) the anti-causal part of the two-sided Wiener filter associated with predicting $X_A$ from $\bar X_B$ has small magnitude on $|z| = 1$ and in $L_2$ norm.
In the above theorem, we focused on the changes to transfer functions and to the innovations covariance caused by the introduction of noise. It is also relevant to compare the prediction error variances when $X_A(s)$, $s \le t$, $X_B(s)$, $s < t$ and $X_A(s)$, $s \le t$, $\bar X_B(s)$, $s < t$ are used to predict $X_B$ and $\bar X_B$ respectively. The results are summarized in Theorem 7. It shows that the prediction error 'measure' of Granger causality is $O(\lambda^2)$.
Theorem 7. Adopt the same hypothesis as in Theorem 6 and assume that $\lambda > 0$ is sufficiently small that $\bar W_{22}$ is minimum phase. Then there exist positive $R$, $R'$ of $O(\lambda^2)$ for which there hold the upper and lower bounds: [...]
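The continuity property can be illustrated on a toy scalar model in which the prediction error variances are available in closed form. Everything in the sketch below is an assumption of ours: $X_B(t) = 0.8\,X_B(t-1) + \varepsilon_B(t)$, $X_A(t) = 2X_B(t-1) + \varepsilon_A(t)$, unit-variance white noises, and $\bar X_B = X_B + \lambda^{1/2}N_B$ with $N_B$ white. The steady-state variances follow from a scalar Riccati (Kalman filter) equation, and the causality measure is the log ratio of the one-step error variance of $\bar X_B$ given its own past to that given the past of both $\bar X_B$ and $X_A$.

```python
import numpy as np

PHI, C = 0.8, 2.0   # hypothetical AR coefficient of X_B and loading of X_A on X_B(t-1)

def predicted_state_variance(r):
    """Positive root of the scalar Riccati equation P = PHI^2 * P * r / (P + r) + 1:
    steady-state variance of X_B(t) given observations of X_B(s), s < t, with noise variance r."""
    b = 1.0 - (1.0 - PHI**2) * r
    return 0.5 * (b + np.sqrt(b * b + 4.0 * r))

def granger_measure(lam):
    """log(var_restricted / var_full) for predicting the noisy X_B one step ahead."""
    # Own past only: each X_B(s) is seen once, with noise variance lam.
    P_own = predicted_state_variance(lam)
    var_restricted = PHI**2 * P_own * lam / (P_own + lam) + 1.0 + lam
    # Own past plus past of X_A: X_B(s), s <= t-2, is seen twice (noise variances lam and 1/C^2),
    # while X_B(t-1) is seen only through the noisy X_B(t-1).
    r_both = 1.0 / (1.0 / lam + C**2)
    P_both = predicted_state_variance(r_both)
    var_full = PHI**2 * P_both * lam / (P_both + lam) + 1.0 + lam
    return float(np.log(var_restricted / var_full))

lams = [1.0, 0.1, 0.01]
measures = [granger_measure(lam) for lam in lams]
ratios = [m / lam**2 for m, lam in zip(measures, lams)]
print(measures)
print(ratios)
```

In this toy model the measure is positive for every $\lambda > 0$, shrinks as $\lambda$ decreases, and remains bounded after division by $\lambda^2$, which is consistent with (though of course no substitute for) the $O(\lambda^2)$ statement of Theorem 7.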

Consider a stationary full-rank process $X = [X_A^\top\ X_B^\top]^\top$ and the filtered process $\bar X(t) = T(L)X(t)$, where $T(L)$ is a causal transfer function, partitioned conformably with $[X_A^\top\ X_B^\top]^\top$. We consider the question: if $X_A$ does not Granger cause $X_B$, will $\bar X_A$ not Granger cause $\bar X_B$? Conversely, and on occasions more importantly, if one observes that $\bar X_A$ does not Granger cause $\bar X_B$, can one conclude that $X_A$ does not Granger cause $X_B$? Questions of this type go back some time; see, for example, Pierce and Haugh (1977), Solo (2007, 2016), Florin et al. (2013), Seth (2011), and Seth et al. (2013).
In the following theorem, we give a general necessary and sufficient condition forX A not to causeX B in the sense of Granger.
Provided $\Pi_{22}(z)$ is miniphase, condition (31) shows it is generally possible to choose the filter $T(L)$ so that $\bar X_A$ does not Granger cause $\bar X_B$, irrespective of the causal relation between $X_A$ and $X_B$. Filtering can make Granger causality invisible. In general, Granger causality from $X_A$ to $X_B$ is not invariant to the application of a multivariate causal miniphase filter to $X(t)$. When condition (31) [...] here $X(t) = \varepsilon(t)$ and [...] Barnett and Seth (2011, Abstract, Sections 1 and 3) claim that 'G-causality for a stationary vector autoregressive (VAR) process is fully invariant under the application of an arbitrary invertible filter'. The above counterexample shows clearly that this claim should be qualified. Conditions under which this type of invariance holds are given by Theorem 8.
In the following corollary, we consider the important case where T(L) has a block triangular form.
[...] values of both $X_A(t)$ and $X_B(t)$. Further, the fact that (35) is an equivalence means that non-causality between the filtered series does allow one to conclude that the unfiltered series have the same property. Solo (2016) recently considered the special case where $T_{AB}(L) = 0$ and $T_{BA}(L) = 0$, and showed only sufficiency ($X_A$ does not Granger cause $X_B$ implies that $\bar X_A$ does not Granger cause $\bar X_B$); see also Florin et al. (2013), Barnett and Seth (2011) and Seth et al. (2013) for related results.
Remark 6. Condition (36) of Theorem 8 gives a general condition under which Granger causality from $X_A$ to $X_B$ can be suppressed through filtering. In particular, if $\Pi_{21}(z)$ is miniphase, we can always choose $T(L)$ so that $\bar X_A$ does not Granger cause $\bar X_B$. For example, this is achieved by taking $T_B(L)$ invertible and [...] In particular, by taking $T_B(L) = I$ and $T_{BA}(L) = \Pi_{22}(L)^{-1}\Pi_{21}(L)$, we find that the filter [...] eliminates Granger causality from $X_A$ to $\bar X_B$: $\bar X_B$ is 'orthogonal' to $X_A$ in the Granger sense.

Remark 7. The assumption that $W(L)$ is miniphase entails that both $W_A(L)$ and $W_B(L)$ are miniphase when [...] There is a heuristic explanation of why the introduction of a non-miniphase but stable $W_B(z)$ might produce causality where previously it did not hold. Suppose $X_A$ does not cause $X_B$, but $X_B$ does cause $X_A$. For a specific example, suppose that $X_A(t) = X_B(t-2) + \varepsilon_A(t)$, where $X_B$ is driven by $\varepsilon_B$, with $\varepsilon_A$, $\varepsilon_B$ independent white noise sequences. Now suppose that $X_B$ is subject to processing by a filter which delays it. It is well understood in signal processing theory that any non-minimum phase filter introduces a delay; a particularly evident example is the filter with transfer function $z^p$, which produces a delay of $p$ units. Suppose $p$ is, say, 3, so that $\bar X_B(t) = X_B(t-3)$. Then $X_A(t) = \bar X_B(t + 1) + \varepsilon_A(t)$, so future values of $\bar X_B$ are correlated with past values of $X_A$, which means precisely that $X_A$ does cause $\bar X_B$.
Remark 8. Consider the case where X A defines a filter which is simply a delay. Then the theorem says that if X A does not Granger cause X B , the same will be true if X A is replaced by a delayed version of itself. This can of course be argued from first principles also.

EFFECT OF SUBSAMPLING ON GRANGER CAUSALITY
A further question raised in the work of Solo (2007) deals with subsampling. Suppose that a process X A does not Granger cause X B nor does it cause X B instantaneously. Suppose the two processes are subsampled. Will the absence of Granger causality continue to hold for the subprocesses? As we show here, again appealing to Theorem 3, absence of Granger causality may be lost, and Granger causality may arise in the subsampled processes.
By way of brief comment, we note a refinement of the question. Subsampling of two processes at the same rate may not occur synchronously. Thus, for example, samples of X A with even time index might be considered along with samples of X B with odd time index. Also, subsampling may occur at different rates. In fact, it could be that X A is not subsampled, while X B is. We will not consider these variants in this section.
We first note a very easily established theorem on the transformation of spectra under subsampling (see Hannan, 1970).
Theorem 9. Let $\Phi_{XX}(z)$ denote the spectrum of a process $X$, and suppose the process is sampled (synchronously in the case of a multivariate process) every $M$ time units for some positive integer $M$, to generate a process $\bar X$, defined for time instants $\ldots, -M, 0, M, 2M, \ldots$. Then the spectrum of $\bar X$, defined using covariance data with lags which are integer multiples of $M$ as [...], is expressible as [...]
The proof is an immediate consequence of the fact that [...] and the addition of these equations for $i = 0, \ldots, M - 1$.
In our counterexample, we shall take $M = 2$. This means that $\Phi_{\bar X\bar X}(z) = \Phi_{XX}(z) + \Phi_{XX}(-z)$. In detail, consider a process $X$ defined by a unit matrix for the innovations covariance, and a canonical spectral factor given by [...] with the particular expression for the (1, 1) entry not being provided, since it turns out to be irrelevant to the calculation. From this expression, it follows that the (1, 2) and (2, 2) entries of $\Phi_{XX}$ are: [...] Based on the formula for subsampled spectra, we now have: [...] The denominator polynomial has four zeros on the unit circle (at $\pm 0.81 \pm 0.58\sqrt{-1}$) and there are no cancellations with the zeros of the numerator polynomial (at $0, 0, \pm 0.876\sqrt{-1}$). (The presence of unit circle zeros rather than zeros inside the unit circle is not of particular note. One can also verify that if the (2, 2) term of the spectral factor in the example is changed to be $(1 - z/3)(1 - z/2)^{-1}$, then the denominator polynomial has two zeros inside and two outside the unit circle, at $\pm 1.382$ and $\pm 0.869$.) So in both cases the transfer function is not causal. This means that the downsampling destroys the absence of Granger causality.
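The aliasing relation behind Theorem 9 can be checked numerically for a scalar process. The sketch below assumes a hypothetical AR(1) process with coefficient 0.6 and unit innovation variance, and adopts the convention $\Phi(z) = \sum_k R_k z^k$; under that convention, averaging $\Phi$ over the $M$ rotations $z \mapsto e^{2\pi j i/M} z$ reproduces the series built from the covariances at lags that are multiples of $M$. The average differs from the plain sum used for $M = 2$ above only by the constant factor $M$, which does not move zeros or poles.

```python
import numpy as np

phi, M = 0.6, 2   # hypothetical AR(1) coefficient and subsampling factor

def spectrum(z):
    """Spectrum sum_k R_k z^k of the AR(1) process with unit innovation variance."""
    return 1.0 / ((1 - phi * z) * (1 - phi / z))

def autocov(k):
    """Autocovariance R_k of the same AR(1) process."""
    return phi ** abs(k) / (1 - phi**2)

# A few test points on the unit circle.
z = np.exp(1j * np.linspace(0.1, 3.0, 7))

# Average of the spectrum over the M rotated arguments.
omega = np.exp(2j * np.pi / M)
averaged = sum(spectrum(omega**i * z) for i in range(M)) / M

# Series using only the lags kept by subsampling (truncated; phi**400 is negligible).
direct = sum(autocov(k * M) * z ** (k * M) for k in range(-200, 201))

max_gap = float(np.max(np.abs(averaged - direct)))
print(max_gap)
```

The two expressions agree to rounding error, so the zeros of the subsampled spectrum can be read off $\Phi_{XX}(z) + \Phi_{XX}(-z)$ when $M = 2$.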
This result is not consistent with the claim of Solo (2007). It would appear that in seeking to construct a canonical spectral factorization for the subsampled process, he forces triangularity (which is needed to conclude absence of Granger causality) but in the process cannot assure that the proposed spectral factor remains stable, that is, the simultaneous requirements on the spectral factor for it to be triangular and stable cannot both be met. The fact that an underlying process may have unidirectional Granger causality, while a time aggregated version of it (a form of subsampling) has bidirectional Granger causality, has also been observed in Chambers and McCrorie (2004).

CONCLUSION
In this article, we have sought to explain how intuition can be misleading in determining the effect of certain changes made to an underlying model on the Granger causality properties associated with that model. For the important case of additive noise, we have established general conditions under which additive noise does not affect non-causality properties, as well as conditions under which noise induces spurious Granger causality. It is clear from these that additive noise generally distorts causal relations, even though there are interesting cases where it does not. For example, in the case of measurement errors, if $X_B$ is not noisy, noise on $X_A$ does not induce spurious Granger causality from $X_A$ to $X_B$. To derive our results, we mostly rely on generating function methods, canonical spectral factors, Wold decompositions and spectra. These approaches considerably simplify many proofs. Besides providing rigorous demonstrations, our findings extend and qualify the early results of Newbold (1978), which focused on special cases. Further, in our discussion of noise, we have drawn attention to the usefulness of thinking of an 'amount' of causality: if noise is 'low', the distortion to the predictability and Granger causal properties of the process is also 'low'.
We have also revisited the effects of linear transformations, filtering and subsampling on Granger causality properties. These results include general necessary and sufficient conditions under which 'spurious' causal relations between time series are not induced by linear transformations of the variables involved. This characterization also yields linear transformations (or filters) which can eliminate Granger causality from one vector to another one. The various results presented allow one to correct some erroneous statements in the earlier literature [as in Solo (2007) and Barnett and Seth (2011)] on Granger causality in the presence of measurement errors and variable transformations.
The fact that additive noise can distort Granger causality relations means that empirical findings on such properties become more delicate to interpret. Indeed, inferences on 'noisy variables' remain perfectly valid as long as the causal properties are ascribed to observed variables. If we have no theory (such as an economic or a physical model) that can lead one to distinguish between a 'true' latent variable and a 'measured' variable, there is no error-in-variables problem. Otherwise, one must be aware that the causal structure of the original 'uncontaminated' variables can be different. In this context, it is quite interesting to observe that no 'spurious' causality can arise when noise only affects the output variable (see Theorem 5). Further, when the signal-to-noise ratio is large, distortions of apparent causality will be small (see Theorem 6).
Similarly, the fact that aggregation and subsampling can distort dynamic relations has been widely discussed; see Tiao and Wei (1976), Wallis (1974), Sims (1974), Wei (1982), Hylleberg (1986), Marcellino (1999), Kaiser and Maravall (2001), Breitung and Swanson (2002), Gong et al. (2015), and the references in the survey of Silvestrini and Veredas (2008). For a specific example showing that causal relations are modified by changing observation frequency, see Dufour et al. (2012). However, Theorem 8 and Corollary 1 give general necessary and sufficient conditions under which 'spurious' causal relations between (vector) time series will not be induced by linear transformations of the variables involved. This also yields linear transformations (or filters) which can eliminate Granger causality from one vector to another one.
Unaddressed to this point is a corresponding result on subsampling. It is well known that a continuous-time band-limited process and a sampled version of it with sampling frequency in excess of twice the approximate 'cut-off' frequency of the continuous-time process have more or less the same information. This would suggest that Granger causality properties of continuous-time band-limited processes would be 'almost' preserved if they were sampled frequently enough, that is, in these circumstances the loss of Granger causality would be small. On a possible avenue in this direction, see Pollock (2012).
Due to dependence on an information set, Granger causality properties are not generally invariant to observation frequency. In contrast with contamination by 'noise', which is typically problematic, non-invariance to observation frequency may not be a 'problem'. Indeed, it can have considerable practical meaning and usefulness: different mechanisms may matter for short and long-run predictability as well as decisions, in the same way that different 'laws' apply to micro-phenomena and macro-phenomena in physics. For economic and financial decisions, short-run and long-run decisions may depend on different factors and require different rules. High-frequency decisions require prediction for data observed at high frequency, while longer-run decisions require predictability for data observed at low frequencies. Granger causality analysis at different observation frequencies can provide information on this. It would certainly be of interest to develop a systematic framework for exploring and exploiting such features. The use of mixed frequency data [as proposed in Ghysels et al. (2016)] could also be relevant in this context. However, these problems clearly go beyond the scope of the present article.
A different area worthy of examination is that of networked systems. The underlying structure could be viewed using a directed graph. An edge from vertex $i$ to vertex $j$ would mean that process $i$ is caused by process $j$. One could begin by examining whether, in a path graph with vertex set $\{v_1, v_2, \dots, v_N\}$ and edge set $\{e_{12}, e_{23}, \dots, e_{(N-1)N}\}$, Granger causality properties of nonadjacent nodes could be predicted from the corresponding properties of adjacent nodes.
The results given in this article rely on the common assumption of (second-order) stationarity. Even though there is no reason to think distortions induced by measurement errors would suddenly cease to exist in nonstationary setups, it would be of interest to extend our results to nonstationary setups, for example, through Hilbert space techniques, especially to evaluate whether non-stationarity tends to increase or decrease the distortions. Note that nonstationary models are explicitly allowed in Dufour and Renault (1998) and Dufour et al. (2006); for other articles which apply Hilbert space methods to Granger causality, see Hosoya (1977), Mouchart (1982, 1985), Florens et al. (1993), Florens and Fougère (1996), Triacca (1998, 2000) and Al-Sadoon (2014). To study measurement errors, careful consideration of the type of nonstationarities is needed, and different proof methods may be required. Another possibly more general approach to incorporate non-stationarity would consist in working with the underlying probability measures, along the lines suggested by Mykland (1986). We leave such extensions to future work.
Accordingly, using the fact that the space spanned by $Z(s) = X(s) + Y(s)$ for $s < t$ is a subspace of the space spanned by $X(s)$, $Y(s)$ for $s < t$, we have as claimed. Of course, the independence of the processes $X$, $Y$ is used at the second-to-last step.
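The property for which Lemma 1 is invoked later in the appendix (the one-step prediction error variance of a sum of two independent processes overbounds the sum of the individual prediction error variances) can be checked numerically in the scalar case. The sketch below relies on the Kolmogorov-Szegő formula, which is not part of the article's argument: for a scalar spectrum $f(\omega) = \sigma^2 |W(e^{j\omega})|^2$ with $W$ canonical, the innovation variance $\sigma^2$ is the exponential of the average of $\log f$. The AR(1) and MA(1) spectra and all names are illustrative assumptions.

```python
import numpy as np

def innovation_variance(spec, n_grid=4096):
    """Kolmogorov-Szego formula: exp of the mean of log f over a uniform frequency grid."""
    w = 2.0 * np.pi * np.arange(n_grid) / n_grid
    return float(np.exp(np.mean(np.log(spec(w)))))

def ar1_spectrum(w, a=0.6, sigma2=1.0):
    """Spectrum sigma2 / |1 - a e^{jw}|^2 of an AR(1) process (illustrative)."""
    return sigma2 / np.abs(1.0 - a * np.exp(1j * w)) ** 2

def ma1_spectrum(w, theta=0.5, sigma2=2.0):
    """Spectrum sigma2 * |1 + theta e^{jw}|^2 of an MA(1) process (illustrative)."""
    return sigma2 * np.abs(1.0 + theta * np.exp(1j * w)) ** 2

v_x = innovation_variance(ar1_spectrum)   # innovation variance of X alone
v_y = innovation_variance(ma1_spectrum)   # innovation variance of Y alone
# Independence of X and Y means the spectrum of Z = X + Y is the sum of the spectra.
v_z = innovation_variance(lambda w: ar1_spectrum(w) + ma1_spectrum(w))
```

Here `v_x` and `v_y` recover the innovation variances used to build the two spectra, and `v_z` exceeds their sum, in line with the inequality.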

Proof of Theorem 3
First, consider the question of absence of Granger causality and instantaneous causality and assume that condition 2 of Theorem 1 holds. Using (4), (10) and the fact that $W_{21} = 0$, it is easily checked that $\Phi_{AB}\Phi_{BB}^{-1} = W_{12}W_{22}^{-1}$. Also, the triangular structure of $W$ and its minimum phase character imply that $W_{22}^{-1}$ is stable. Hence $\Phi_{AB}\Phi_{BB}^{-1}$ is also stable. It also assumes the value 0 at $z = 0$ because of the normalization condition $W(0) = I$.
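The calculation behind the stability of $\Phi_{AB}\Phi_{BB}^{-1}$ can be sketched as follows. It assumes that (4) has the usual form $\Phi_{XX} = W Q W^{*}$, with $W^{*}$ the para-Hermitian conjugate of $W$, that $W_{21} = 0$, and that $Q = \mathrm{diag}(Q_{11}, Q_{22})$; this notation is an assumption of the sketch and may differ from the article's.

```latex
\begin{aligned}
\Phi_{AB} &= W_{12}\, Q_{22}\, W_{22}^{*}, \qquad
\Phi_{BB} = W_{22}\, Q_{22}\, W_{22}^{*}, \\
\Phi_{AB}\Phi_{BB}^{-1}
  &= W_{12}\, Q_{22}\, W_{22}^{*}\,\bigl(W_{22}^{*}\bigr)^{-1} Q_{22}^{-1}\, W_{22}^{-1}
   = W_{12}\, W_{22}^{-1}.
\end{aligned}
```

Since $W(0) = I$ forces $W_{12}(0) = 0$, the product vanishes at $z = 0$, and its stability follows from that of $W_{12}$ and $W_{22}^{-1}$.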
To prove the converse, consider a canonical spectral factorization of the type of (4) (with no a priori restriction on the block triangular structure of $W$). Suppose that $\Phi_{AB}\Phi_{BB}^{-1} := T(z)$ is stable and zero at $z = 0$. We first assert that this will imply the condition $W_{21} = 0$ in the minimum phase, stable spectral factor normalized to be $I$ at $z = 0$. To see this, suppose first that $V_B(z)$ is a minimum phase, stable transfer function with $V_B(0) = I$ and $R_B$ is a positive definite matrix such that Note that $V_{AB}(z)$ is then stable and zero at $z = 0$. Note also that the last two equations yield By the fact that $\Phi_{XX}$ is positive definite for all $z = e^{j\omega}$, it follows that $\Phi_{AA} - \Phi_{AB}\Phi_{BB}^{-1}\Phi_{BA}$ is also positive definite for all $z = e^{j\omega}$, and accordingly, we can define a minimum phase, stable transfer function $V_A(z)$ with $V_A(0) = I$ and a positive definite $R_A$ such that Observe that Now from equations (A9), (A6) and (A4), we have Now make the definitions and observe that $V(z)$ is stable, minimum phase and has $V(0) = I$. Accordingly, $V(z)$ is the unique minimum phase stable spectral factor with $V(0) = I$, and condition 2 of Theorem 1 is evidently fulfilled.
The argument for Granger causality only is similar but not identical, and so we present the details. Assume then there is an absence of Granger causality, which means that there is a spectral factorization with canonical spectral factor $W(z)$ and not necessarily block diagonal innovations covariance matrix of the form It then follows that and this is evidently stable.
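For the case of Granger causality only, the stability claim can be sketched in the same way. The sketch again assumes the form $\Phi_{XX} = W Q W^{*}$ with $W_{21} = 0$, now with a general positive definite $Q$ having blocks $Q_{11}$, $Q_{12}$, $Q_{22}$; the notation is an assumption and may differ from the article's.

```latex
\begin{aligned}
\Phi_{AB} &= \bigl(W_{11}\, Q_{12} + W_{12}\, Q_{22}\bigr)\, W_{22}^{*}, \qquad
\Phi_{BB} = W_{22}\, Q_{22}\, W_{22}^{*}, \\
\Phi_{AB}\Phi_{BB}^{-1}
  &= \bigl(W_{11}\, Q_{12}\, Q_{22}^{-1} + W_{12}\bigr)\, W_{22}^{-1}.
\end{aligned}
```

Every factor on the right is stable, so the product is stable. At $z = 0$, with $W(0) = I$, it takes the value $Q_{12} Q_{22}^{-1}$, which is generally nonzero; this is why the normalization at $z = 0$ no longer holds when instantaneous causality is allowed.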
For the converse, we follow the proof applying for the case where the aim was to conclude an absence of Granger causality and instantaneous causality. The proof applies with the first change that $V_{AB}(z)$ is no longer guaranteed to be zero at $z = 0$. Equation (A10) holds, including the fact that $R$ is block diagonal, but $V(0) \neq I$, due to the generally nonzero nature of $V_{AB}(0)$. Now by setting equation (A12) arises, and Granger causality is then proved.

Proof of Theorem 4
For the first claim, let $W_{11}(z)$, $W_{12}(z)$, $W_{22}(z)$, $Q_{11}$, $Q_{12}$ and $Q_{22}$ be the matrices of the canonical spectral factor description of the joint spectrum of $X_A$, $X_B$. If $W_{12} = 0$, then it is evident that the two processes are independent. So we must prove that if $W_{12} \neq 0$, then $X_B$ Granger causes $X_A$. This is equivalent to showing that and the two conditional covariance matrices are unequal. Now observe that from the canonical factorization of the joint process $X_A$, $X_B$, we have immediately Further, we know that $X_A(t) = W_{11}(L)\varepsilon_A(t) + W_{12}(L)\varepsilon_B(t)$. This expresses $X_A$ as a sum of two independent processes and the lemma above applies. Note that the variance of the one step ahead prediction for the process $W_{11}(L)\varepsilon_A(t)$ is precisely $Q_1$, since $W_{11}(z)$ is a canonical spectral factor. Hence the claim (A15) is established.
For the second part, we are required to show that when $X_A$ does not cause $X_B$, either the processes are independent or where equality does not hold. Let $Q_{12}$ be the (1, 2) block of the matrix $Q$ in the canonical spectral factorization; in general, it is not zero. Then it is easily seen that 11Q 12 . We can now use Lemma 1 as we did in the proof of Theorem 4 to conclude that From the canonical factorization for $\Phi_{\tilde{X}\tilde{X}}$ we have that the second term on the right in the above equation is given by
We now turn to establishing the bound on $R$. $\tilde{X}_B(t)$ from $\tilde{X}_A(s)$, $s \le t$, $\tilde{X}_B(s)$, $s < t$. This is, as is well known, precisely $\tilde{Q}_{22} - \tilde{Q}_{12}^{\top}\tilde{Q}_{11}^{-1}\tilde{Q}_{12}$. That for predicting $\tilde{X}_B(t)$ from $\tilde{X}_B(s)$ is more complicated. To make progress observe that Consider the transfer function acting on $\bar{\varepsilon}_A$ in (A26). From Theorem 6, $\tilde{Q}_{12}$ and $\tilde{W}_{21}(z)$ are of first order in the small parameter of that theorem, and so the transfer function multiplying $\bar{\varepsilon}_A$ is of the same order. Hence we see that $R$, being the prediction error variance in estimating a variable whose spectrum is proportional to the square of that parameter, must itself be proportional to its square.
Hence the increase in prediction error covariance when $\tilde{X}_A$ ceases to be available for estimating $\tilde{X}_B$ is bounded from below by a quantity proportional to the square of the small parameter of Theorem 6. We now derive the overbound of the same order. Choose a constant $R'$ so that for all $z = \exp(j\omega)$, there holds This is possible since for all $z = \exp(j\omega)$ the left side is positive definite. Since the right side is of second order in that parameter, it is clear that $R'$ can also be taken to be of second order. Now $\tilde{X}_B$ is written in (A26) as the sum of two orthogonal processes. Hence the spectrum of $\tilde{X}_B$ will be the sum of the spectra of these two processes, that is, again for $z = \exp(j\omega)$. Hence there exists a process, call it $X_D$, which is independent of $\tilde{X}_B$, for which $X_C = \tilde{X}_B + X_D$ and whose spectrum is $\Phi_{CC}(z) - \Phi_{\tilde{X}\tilde{X}}(z)$. By Lemma 1, the variance of the one step prediction estimate using its own past of the process $X_C$ with spectrum $\Phi_{CC}$ overbounds the sum of the variances of the one step prediction estimates of each of $\tilde{X}_B$ and $X_D$. A fortiori it overbounds the variance of the one step prediction estimate of $\tilde{X}_B$: thus