Keywords:

  • Bootstrap test;
  • Causality in variance;
  • Dimension reduction;
  • Extended GARCH(1,1) model;
  • Financial returns;
  • Portfolio volatility;
  • Quasi-maximum-likelihood estimator;
  • Time series

Abstract

Summary.  We propose to model multivariate volatility processes on the basis of the newly defined conditionally uncorrelated components (CUCs). The model provides a parsimonious representation for matrix-valued processes and is flexible in the sense that each CUC may be fitted separately with any appropriate univariate volatility model. Computationally it splits one high dimensional optimization problem into several lower dimensional subproblems. Consistency of the estimated CUCs is established. A bootstrap method is proposed for testing the existence of CUCs. The proposed methodology is illustrated with both simulated and real data sets.


1. Introduction


One of the most prolific areas of research in the financial econometrics literature in the last two decades has been the modelling of the time varying volatility of financial returns. Many statistical models, most designed for univariate data, have been proposed for this purpose. From a practical point of view, there are at least two incentives to model several financial returns jointly. First, time varying correlations between different securities are important and useful information for portfolio optimization, asset pricing and risk management. Secondly, the model for a single security may be improved by incorporating the relevant information in other related series. The quest for modelling multivariate volatility processes, which are often represented by conditional covariance matrices, has motivated attempts to extend univariate volatility models to the multivariate case, aiming for practical and/or statistical effectiveness. We list some of these endeavours below.

Let {Xt} be a vector-valued (return) time series with

  • E(Xt | ℱt−1) = 0,   var(Xt | ℱt−1) = Σt,

where ℱt is the σ-algebra that is generated by {Xt, Xt−1, …}, and Σt is an ℱt−1-measurable d×d positive semidefinite matrix. One of the most general multivariate generalized auto-regressive conditional heteroscedasticity (GARCH) models is the BEKK representation (Engle and Kroner, 1995)

  • Σt = C + ∑i=1,…,m ∑j=1,…,q AijT Xt−j Xt−jT Aij + ∑i=1,…,m ∑j=1,…,p BijT Σt−j Bij,    (1.1)

where C, Aij and Bij are d×d matrices, and C is positive definite. Although the form of this model is quite general especially when m is reasonably large (proposition 2.2 of Engle and Kroner (1995)), it suffers from overparameterization. Similar to multivariate auto-regressive moving average models, not all parameters in model (1.1) are necessarily identifiable even when m=1. Overparameterization will also lead to a flat likelihood function, making statistical inference intrinsically difficult and computationally troublesome (Engle and Kroner, 1995; Jerez et al., 2001).

To overcome the difficulties due to overparameterization, a dynamic conditional correlation (DCC) model (Engle, 2002; Engle and Sheppard, 2001) has been proposed. It is based on the decomposition

  • Σt = Dt Rt Dt,    (1.2)

where Dt = diag(√σt,11, …, √σt,dd), σt,ii is the conditional variance of the ith component of Xt and Rt ≡ (ρt,ij) is the conditional correlation matrix. A simple way to specify such a model is to fit each σt,ii with a univariate volatility model and to model the conditional correlation by a rolling exponential smoothing as follows:

  • ρt,ij = st,ij/√(st,ii st,jj),

where st,ij = ∑k≥1 (λiλj)^{k/2} Xt−k,i Xt−k,j and λi, λj ∈ (0,1) are constants. Even with such a simple specification, the estimation typically involves solving a high dimensional optimization problem as, for example, the Gaussian likelihood function cannot be factorized into several lower dimensional functions. To overcome the computational difficulty, Engle (2002) proposed a two-step estimation procedure as follows: first fit each σt,ii in equation (1.2) with a univariate GARCH(1,1) model, and then model the conditional correlation matrix Rt by the simple GARCH(1,1) form

  • Rt = (1 − θ1 − θ2)R + θ1 ɛt−1ɛt−1T + θ2 Rt−1,    (1.3)

where ɛt is the (d×1)-vector of the standardized residuals that are obtained in the separate GARCH(1,1) fittings for the d components of Xt, and R is the sample correlation matrix of ɛt. There are only two unknown parameters θ1 and θ2 in model (1.3), so it can easily be implemented even for large or very large d. However, it may not provide an adequate fit when the components of Xt exhibit different dynamic correlation structures; see the real data examples in Section 4 later. Indeed, the conditional correlation matrix in model (1.3) is a linear combination of the static sample correlation matrix R and the exponential smoothing of ɛt−1ɛt−1T, which is a non-parametric estimator. When θ1+θ2=1, it is a pure non-parametric (exponential smoothing) estimator. Biases are inevitable in such an estimation of the conditional correlation.
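A recursion of the form (1.3) is straightforward to iterate. The sketch below (in Python with NumPy; the function name and the choice of R as the starting value are ours, and the renormalization step that Engle's full DCC model applies at each step is deliberately omitted) updates the conditional correlation matrix from the standardized residuals:

```python
import numpy as np

def dcc_recursion(R_bar, eps, theta1, theta2):
    """Iterate R_t = (1 - th1 - th2) R_bar + th1 e_{t-1} e_{t-1}' + th2 R_{t-1},
    starting from the static correlation matrix R_bar.  The renormalisation
    of Engle's full DCC model is omitted in this sketch."""
    R = R_bar.copy()
    path = []
    for e in eps:                      # eps holds the standardized residuals
        R = (1 - theta1 - theta2) * R_bar + theta1 * np.outer(e, e) + theta2 * R
        path.append(R)
    return path

rng = np.random.default_rng(0)
eps = rng.standard_normal((100, 2))    # stand-in residual series
Rs = dcc_recursion(np.eye(2), eps, theta1=0.05, theta2=0.90)
assert len(Rs) == 100 and np.allclose(Rs[-1], Rs[-1].T)
```

With θ1 + θ2 close to 1, the recursion behaves essentially as the exponential smoothing discussed above.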

Alexander (2001) proposed an orthogonal GARCH model which fits each principal component with a univariate GARCH model separately and treats all principal components as conditionally uncorrelated random variables. Since principal components are only unconditionally uncorrelated, such a misspecification may lead to non-negligible errors in the fitting; see the first example in Section 4.

Other multivariate volatility models include, for example, the vectorized multivariate GARCH models of Bollerslev et al. (1988), the constant conditional correlation multivariate GARCH models of Bollerslev (1990), a multivariate stochastic volatility model of Harvey et al. (1994), a generalized orthogonal GARCH model of van der Weide (2002), an easy-to-fit ad hoc approach of Wang and Yao (2005) and a hidden Markov switching model of Pelletier (2006); see also a survey in Bauwens et al. (2006) and the references therein.

In this paper, we propose a new alternative for modelling multivariate volatilities. The basic idea is to assume that Xt is a linear combination of a set of conditionally uncorrelated components (CUCs); see Section 2.1. One fundamental difference from the orthogonal GARCH model is that we use CUCs, instead of PCs, which are genuinely conditionally uncorrelated. The advantages of the new approach include

  • (a)
    the CUC decomposition leads to a parsimonious and identifiable representation, and the number of parameters in the model is significantly reduced compared with, for example, the BEKK representation or the vectorized multivariate GARCH models,
  • (b)
    it has the flexibility to model each CUC separately with any appropriate univariate volatility model,
  • (c)
    computationally it splits a high dimensional optimization problem into several lower dimensional subproblems and
  • (d)
    it allows the volatility model for one CUC to depend on the lagged values of the other CUCs.

However, the estimation of CUCs involves solving a non-linear optimization problem with d(d−1)/2 variables, where d is the dimension of Xt. This places some limitation on the dimensionality d with the available computing capacity. We view the CUC model as one that is capable of catching sophisticated dynamic correlation structures, but its potential may be fully realized only with further developments in computing power and/or high dimensional optimization algorithms.

The idea of using CUCs is similar to the so-called independent component analysis (Hyvärinen et al., 2001). However, instead of requiring that all the component series are independent of each other, we impose only a weaker condition that the component series are conditionally uncorrelated; see condition (2.1) below. This relaxation is critical for the problem that is of concern in this paper. Of course, like independent components, CUCs may not always exist. We propose a bootstrap test to assess the existence of CUCs. Our empirical experience indicates that, for a large number of practical examples with small or moderately large d, there is no significant evidence to reject the hypothesis on the existence of CUCs.

Literature on applying independent components analysis to financial and economic time series includes, for example, Back and Weigend (1997), Kiviluoto and Oja (1998), Mălăroiu et al. (2000) and van der Weide (2002). Although our basic idea is quite similar to that of van der Weide (2002) which dealt with Gaussian innovation models only, our approach is completely different; we separate the estimation for the CUCs from fitting the volatility models for the CUCs. In fact, fitting each CUC becomes a univariate volatility modelling problem.

The rest of the paper is organized as follows. Section 2 contains a detailed description of the new methodology proposed and the associated theoretical results. Simulation results are reported in Section 3. Illustration with two real data examples of dimension d=4 and d=10 is presented in Section 4. Applicability of the CUC method beyond its standard setting is discussed in Section 5. Technical proofs are relegated to Appendix A.

The data that are analysed in the paper and the program that was used can be obtained from

2. Methodology


2.1. Basic setting

To simplify matters, we assume that var(Xt)=Id — the d×d identity matrix. In practice, this amounts to replacing Xt by S−1/2Xt, where S is the sample covariance matrix of Xt. We assume that each component of Xt is a linear combination of d CUCs Zt1,…,Ztd which satisfy the conditions E(Zti|ℱt−1)=0, var(Zti)=1 and

  • cov(Zti, Ztj | ℱt−1) = E(ZtiZtj | ℱt−1) = 0  almost surely, for all 1 ≤ i < j ≤ d.    (2.1)

Put Zt=(Zt1,…,Ztd)T. This setting implies that

  • Xt = AZt    (2.2)

for a constant matrix A. Necessarily, var(Xt) = A var(Zt)AT = AAT = Id. Hence, A is a d×d orthogonal matrix with d(d−1)/2 free elements and Zt = ATXt. Put

  • σtj² = var(Ztj | ℱt−1),    (2.3)

i.e. Σt ≡ var(Xt | ℱt−1) = A diag(σt1², …, σtd²)AT. It is easy to see that, once we have specified σtj² — the volatility of the jth CUC, for j=1,…,d — the volatility of any portfolio can be deduced accordingly. For example, for any two portfolios ξ1TXt and ξ2TXt it holds that

  • cov(ξ1TXt, ξ2TXt | ℱt−1) = ∑j=1,…,d b1j b2j σtj²,

where (bj1, …, bjd)T = ATξj (j=1,2). Hence, the CUC decomposition (2.2) facilitates parsimonious modelling of a d-dimensional multivariate volatility process via d univariate volatility models. In this way, the number of parameters involved is reduced substantially.
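The portfolio calculation above can be checked numerically. The helper below (an illustration under our own naming, not code from the paper) maps portfolio weights into CUC loadings ATξ and sums the products of the loadings with the component volatilities; the toy check confirms that this agrees with forming the full conditional covariance matrix A diag(σt1², …, σtd²)AT:

```python
import numpy as np

def portfolio_cov(A, sigma2_t, c1, c2):
    """Conditional covariance of the portfolios c1'X_t and c2'X_t under the
    CUC decomposition X_t = A Z_t, where sigma2_t holds the current
    conditional variances of the CUCs (illustrative helper)."""
    b1 = A.T @ c1          # loadings of c1'X_t on the CUCs
    b2 = A.T @ c2
    return float(np.sum(b1 * b2 * sigma2_t))

# toy check with a 2x2 rotation as the orthogonal matrix A
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
sigma2 = np.array([2.0, 0.5])            # current CUC volatilities
Sigma_t = A @ np.diag(sigma2) @ A.T      # full conditional covariance of X_t
c1 = np.array([1.0, 0.0])
c2 = np.array([0.0, 1.0])
assert np.isclose(portfolio_cov(A, sigma2, c1, c2), Sigma_t[0, 1])
```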

The assumption that var(Xt)=Id is not essential. It is introduced to reduce the free parameters in A from d2 to d(d−1)/2. This is similar to the independent component analysis which performs a principal component analysis to reduce a d2-dimensional optimization problem to a d(d−1)/2-dimensional problem; see, for example, section 7.4 of Hyvärinen et al. (2001), and also Section 2.2.1 below.

2.2. Estimation of conditionally uncorrelated components

2.2.1. Estimation for A

By equation (2.2), Zt = ATXt, i.e. Ztj = ajTXt, and a1,…,ad are d orthonormal vectors. The goal is to estimate the orthogonal matrix A=(a1,…,ad). Condition (2.1) is equivalent to

  • E{(aiTXt)(ajTXt) I(B)} = 0  for any B ∈ ℬt and 1 ≤ i < j ≤ d    (2.4)

for any π-class ℬt⊂ℱt−1 such that the σ-algebra that is generated by ℬt is equal to ℱt−1 (theorem 7.1.1 of Chow and Teicher (1997)). In practice, we use some simple ℬt for tractability. This leads to choosing an orthogonal matrix A=(a1,…,ad)T which minimizes

  • Ψn(A) = ∑1≤i<j≤d ∑k=1,…,k0 ∑B∈ℬ w(B) [n−1 ∑t (aiTXt)(ajTXt) I(Xt−k ∈ B)]²,    (2.5)

where k0 ≥ 1 is a prescribed integer, ℬ consists of countable subsets in ℛd and w(·) is a weight function such that ∑B∈ℬ w(B) < ∞. We denote by Â the resulting estimator.

The order of a1,…,ad is arbitrary and ai may be replaced by −ai. Therefore we measure the estimation error by

  • D(A, B) = 1 − (1/d) ∑i=1,…,d max1≤j≤d |aiTbj|.    (2.6)

Note that, for any orthogonal matrices A and B, D(A,B) ≥ 0. Furthermore, if the columns of A are obtained from a permutation of the columns of B or their reflections, D(A,B)=0.

In practice, we may let ℬ be the collection of all the balls that are centred at the origin in ℛd. Note that E(Xt)=0 and var(Xt)=Id. When the distribution of Xt is spherically symmetric and unimodal, ℬ is the collection of the minimum volume sets which determine the distribution of Xt (Polonik, 1997). With any given n observations, effectively such a ℬ consists of {x ∈ ℛd: ‖x‖ ≤ ‖Xt‖} for t=1,…,n and therefore has at most n different members. Hence we may let w(B)=1/n.
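With this choice of ℬ and w(B)=1/n, a sample criterion of the type (2.5) can be computed directly. The sketch below is one plausible implementation under our own conventions (squared sample averages of (aiTXt)(ajTXt) over the indicator sets {‖Xt−k‖ ≤ ‖Xs‖}); it is not the authors' code, and details such as the normalization may differ:

```python
import numpy as np

def Psi_n(A, X, k0=1):
    """Criterion of the type (2.5): for each pair i < j and each lag k,
    average (a_i'X_t)(a_j'X_t) over every ball {x: ||x|| <= ||X_s||},
    square, and sum with weights w(B) = 1/n.  Illustrative sketch only."""
    n, d = X.shape
    Z = X @ A                                    # columns are a_i' X_t
    norms = np.linalg.norm(X, axis=1)
    total = 0.0
    for k in range(1, k0 + 1):
        # indicator[t, s] = I(||X_{t-k}|| <= ||X_s||) for t = k, ..., n-1
        ind = (norms[:-k, None] <= norms[None, :]).astype(float)
        for i in range(d):
            for j in range(i + 1, d):
                prod = Z[k:, i] * Z[k:, j]
                means = (prod @ ind) / (n - k)   # sample mean per ball
                total += np.sum(means ** 2) / n  # weight w(B) = 1/n
    return total

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
assert Psi_n(np.eye(3), X) >= 0.0
```

Note that the criterion is invariant to permuting the columns of A, in line with the discussion of the D-distance below.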

To overcome the difficulties in handling the constraint ATA=Id in solving the above optimization problem, we parameterize A as

  • A = ∏1≤i<j≤d Γij(ϕij),    (2.7)

where Γij(ϕij) is obtained from the identity matrix Id with the following replacements: both the (i,i)th and the (j,j)th elements are replaced by  cos (ϕij); the (i,j)th and the (j,i)th elements are replaced respectively by  sin (ϕij) and − sin (ϕij) (Vilenkin, 1968; van der Weide, 2002). Obviously Γij(ϕij) is an orthogonal matrix; so is A given in equation (2.7). Writing A in equation (2.2) in the form of equation (2.7), the constrained minimization of expression (2.5) over orthogonal A is transformed to an unconstrained minimization problem over a d(d−1)/2×1 vector ϕ=(ϕ12,ϕ13,…,ϕ1d,ϕ23,…,ϕd−1,d)T. This minimization problem is typically solved by iterative algorithms. We stop the iteration when D(Ak,Ak+1) is smaller than a prescribed small constant, where Ak denotes the value of A in the kth iteration. Note that Ψn(A)=Ψn(B) for any orthogonal A and B with D(A,B)=0.
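The parameterization (2.7) is easy to reproduce: each Γij(ϕij) is a Givens rotation, and their product is automatically orthogonal, so an unconstrained optimizer can work directly on the d(d−1)/2 angles. A minimal sketch (function names ours):

```python
import numpy as np
from itertools import combinations

def givens(d, i, j, phi):
    """Gamma_ij(phi) as in equation (2.7): the identity matrix except that
    entries (i,i) = (j,j) = cos(phi), (i,j) = sin(phi), (j,i) = -sin(phi)."""
    G = np.eye(d)
    G[i, i] = G[j, j] = np.cos(phi)
    G[i, j], G[j, i] = np.sin(phi), -np.sin(phi)
    return G

def build_A(d, phis):
    """Product of the d(d-1)/2 rotations, one angle per pair i < j."""
    A = np.eye(d)
    for (i, j), phi in zip(combinations(range(d), 2), phis):
        A = A @ givens(d, i, j, phi)
    return A

d = 4
phis = np.linspace(0.1, 1.0, d * (d - 1) // 2)   # arbitrary test angles
A = build_A(d, phis)
assert np.allclose(A @ A.T, np.eye(d))           # orthogonal by construction
```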

2.2.2. Asymptotic properties

Let

  • Ψ(A) = ∑1≤i<j≤d ∑k=1,…,k0 ∑B∈ℬ w(B) [E{(aiTXt)(ajTXt) I(Xt−k ∈ B)}]².    (2.8)

Theorem 1 below states that the estimator Â is consistent under the regularity conditions (a)–(e) that are listed in Appendix A. Theorem 1 does not require the condition that the CUCs exist. Instead condition (c) assumes only that there is an A0 which is a unique minimizer, under the D-distance, of Ψ(A). Since the function Ψn cannot tell any difference between orthogonal A and B as long as D(A,B)=0, we call Â a consistent estimator of A0 if the D-distance between Â and A0 converges to 0 in probability.

Theorem 1.  Let k0 ≥ 1 be a fixed integer. Under conditions (a)–(c) in Appendix A, D(Â, A0) → 0 in probability as n → ∞. If, in addition, condition (d) holds, it holds that for any orthogonal A

  • image

Furthermore, inline image provided that, in addition, condition (e) also holds.

When the CUCs exist, Ψ(A0)=0. In contrast, when the CUCs do not exist, Ψ(A0)≠0 and A0 may now depend on the choice of ℬ. In this case, we naturally seek an orthogonal transform such that the resulting components are the least conditionally correlated. Note that Ψ(·) defined in expression (2.8) may be written as

  • image( (2.9))

We view Ψ(A) as a collective conditional correlation measure among the d directions a1,…,ad. Thus, our criterion may be seen as finding an orthogonal transform A to minimize Ψ(A). (See also the discussion in Section 5.) Theorem 2 indicates that, asymptotically, the transformed components along any other orthogonal matrix B̃ lead to a higher collective conditional correlation, in terms of Ψ(·), than those along Â.

Theorem 2.  Let k0 ≥ 1 be a fixed integer, and let conditions (a) and (b) in Appendix A hold. Then, for any other orthogonal transform B̃,

  • image

The proof of theorem 1 is more involved and is presented in Appendix A. We omit the proof of theorem 2 for brevity.

2.3. Modelling volatilities for conditionally uncorrelated components

Once the CUCs have been identified, we may fit each inline image with an appropriate univariate volatility model such as a GARCH or stochastic volatility model; see the survey by Shephard (1996). As a simple illustration, we establish below an extended GARCH(1,1) model for each inline image that is given in equation (2.3).

2.3.1. Extended GARCH(1,1) models

We assume, for the jth CUC, j=1,…,d,

  • Ztj = σtj ɛtj,   σtj² = γj + ∑i=1,…,d αji Zt−1,i² + βj σt−1,j²,    (2.10)

where {ɛtj, −∞<t<∞} is a sequence of independent and identically distributed random variables with mean 0 and variance 1, ɛtj is independent of ℱt−1, γj > 0 and αji, βj ≥ 0. To ensure that var(Ztj)=1, we set γj = 1 − βj − ∑1≤i≤d αji. This model contains d−1 extra terms Zt−1,i² (i ≠ j) relative to the standard GARCH(1,1) model, which incorporate the possible association between the jth CUC and the other CUCs, while the conditional zero-correlation condition (2.1) still holds. When αji ≠ 0 for i ≠ j, the ith component is said to be causal in variance to the jth component (Granger et al., 1984). Note that, under the specification (2.10), the CUC model becomes a restricted form of the BEKK representation.
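The recursion (2.10) can be simulated directly. The sketch below assumes Gaussian innovations and a stationary start σ0j² = 1 (both our choices for illustration) and enforces the unit-variance constraint through γj = 1 − βj − ∑i αji:

```python
import numpy as np

def simulate_cuc_garch(n, beta, alpha, rng):
    """Simulate d components from the extended GARCH(1,1) recursion (2.10)
    with Gaussian innovations:
        Z_tj = sigma_tj * eps_tj,
        sigma_tj^2 = gamma_j + sum_i alpha[j, i] Z_{t-1,i}^2 + beta_j sigma_{t-1,j}^2,
    where gamma_j is fixed by the unit-variance constraint."""
    gamma = 1.0 - beta - alpha.sum(axis=1)   # gamma_j = 1 - beta_j - sum_i alpha_ji
    d = len(beta)
    Z = np.zeros((n, d))
    sig2 = np.ones(d)                        # stationary start: var(Z_tj) = 1
    for t in range(n):
        Z[t] = np.sqrt(sig2) * rng.standard_normal(d)
        sig2 = gamma + alpha @ (Z[t] ** 2) + beta * sig2
    return Z

rng = np.random.default_rng(2)
beta = np.array([0.85, 0.70])
alpha = np.array([[0.05, 0.02],          # component 1 also loads on lagged Z_2^2,
                  [0.00, 0.10]])         # a causality-in-variance effect
Z = simulate_cuc_garch(5000, beta, alpha, rng)
assert Z.shape == (5000, 2) and np.isfinite(Z).all()
assert 0.5 < Z.var(axis=0).mean() < 2.0  # unit variance, up to sampling error
```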

In practice, we expect that σtj² may depend on Zt−1,i² only for a small number of indices i, including i=j, i.e. many coefficients αji (for i ≠ j) may be 0. Section 2.3.3 outlines a data analytic approach for building such a component-dependent model.

Model (2.10) may be viewed as a special case of the vectorized auto-regressive moving average–GARCH model of which the conditions for stationarity and ergodicity may be found in, for example, Ling and McAleer (2003). When βj ∈ [0,1), expression (2.10) admits the representation

  • σtj² = γj/(1 − βj) + ∑k≥1 βj^(k−1) ∑i=1,…,d αji Zt−k,i².    (2.11)
2.3.2. Quasi-maximum-likelihood estimation

To facilitate a likelihood estimation, let us assume hypothetically that ɛtj in model (2.10) is standard normal. The implied (negative) twice log-likelihood function for θj≡(αj1,…,αjd,βj)T is

  • ∑t=ν+1,…,n {log σtj(θj)² + Ztj²/σtj(θj)²},    (2.12)

for a given integer ν ≥ 1, where σtj(θj)² = var(Ztj|ℱt−1) is given by equation (2.11). The quasi-maximum-likelihood estimator (QMLE) θ̂j minimizes expression (2.12). In practice, we let Zti ≡ 0 for all t ≤ 0 on the right-hand side of equation (2.11). The sum in expression (2.12) is taken from t=ν+1 to alleviate the effect of this truncation.
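The objective (2.12) is cheap to evaluate: one pass through the data updates the volatility recursion and accumulates log σtj² + Ztj²/σtj². A sketch (function name ours; the initial value σ0j² = 1 and the truncation handling are illustrative choices) is

```python
import numpy as np

def neg2_quasi_loglik(theta, Z, j, nu=10):
    """Negative twice Gaussian quasi-log-likelihood in the spirit of (2.12)
    for component j.  theta = (alpha_j1, ..., alpha_jd, beta_j); gamma_j is
    implied by the unit-variance constraint, and sigma_0j^2 = 1 is an
    illustrative initialisation for the truncated recursion."""
    n, d = Z.shape
    alpha, beta = np.asarray(theta[:d], dtype=float), float(theta[d])
    gamma = 1.0 - beta - alpha.sum()
    sig2 = np.empty(n)
    sig2[0] = 1.0
    for t in range(1, n):
        sig2[t] = gamma + alpha @ (Z[t - 1] ** 2) + beta * sig2[t - 1]
    return float(np.sum(np.log(sig2[nu:]) + Z[nu:, j] ** 2 / sig2[nu:]))

rng = np.random.default_rng(3)
Z = rng.standard_normal((500, 2))
val = neg2_quasi_loglik([0.05, 0.00, 0.90], Z, j=0)
assert np.isfinite(val)
```

Minimizing this function over θj under the positivity constraints, e.g. with a general-purpose optimizer, yields the QMLE.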

2.3.3. Selection of causal components

To obtain a parsimonious representation for σtj², we may select only the significant terms Zt−1,i on the right-hand side of the second equation in expression (2.10). This is particularly important when the number of components d is large. It may be achieved by using the ideas for variable selection in regression analysis. Below we outline an algorithm based on a combination of the stepwise addition method and the Bayes information criterion BIC, which is computationally efficient. An obvious alternative is to adopt a forward search algorithm based on statistical tests for causality in variance (Cheung and Ng, 1996; Hafner and Herwartz, 2006).

We start with the standard GARCH(1,1) model (i.e. αjj ≠ 0 and αji = 0 for i ≠ j). We then add one term Zt−1,i at a time, chosen to maximize the (quasi-)likelihood. More precisely, suppose that the model already contains the k−1 terms Zt−1,j1, …, Zt−1,jk−1. We choose an additional term Zt−1,l with l ∉ {j, j1, …, jk−1} which maximizes the quasi-likelihood function. Note that this is a two-step maximization problem: for each given l ∉ {j, j1, …, jk−1}, we compute the QMLE θ̂j under the constraints αji = 0 for i ∉ {j, j1, …, jk−1, l}. We then choose the l ∉ {j, j1, …, jk−1} which minimizes the resulting value of expression (2.12); we denote the minimum value by lj(k) and the index of the selected variable by jk. Put

  • image

We choose rj which minimizes BICj(k) over 0 ≤ k ≤ d. Note that k=0 corresponds to the standard GARCH(1,1) model for Ztj.
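The stepwise search can be written generically, with the component-j fit abstracted into a function that returns the minimized value of (2.12) for a given set of extra terms. In the sketch below the per-added-term penalty log n is one conventional BIC choice and an assumption on our part, as are all names:

```python
import math

def stepwise_bic(candidates, fit, n):
    """Stepwise addition guided by BIC: repeatedly add the term that most
    reduces the minimised -2 log-likelihood fit(selected), then return the
    step with the smallest BIC.  The penalty log(n) per added term is a
    conventional choice assumed here."""
    selected, path = [], []
    path.append((list(selected), fit(selected)))      # k = 0: no extra terms
    remaining = list(candidates)
    while remaining:
        score, best = min((fit(selected + [l]), l) for l in remaining)
        selected.append(best)
        remaining.remove(best)
        path.append((list(selected), score + len(selected) * math.log(n)))
    return min(path, key=lambda p: p[1])[0]

# toy check: a fake fit in which only term 2 improves the likelihood
fit = lambda s: 100.0 - (60.0 if 2 in s else 0.0) + 1.0 * len(s)
assert stepwise_bic([1, 2, 3], fit, n=200) == [2]
```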

2.3.4. Least absolute deviation estimator

It is well documented that the QMLE θ̂j suffers from complicated asymptotic distributions and slow convergence rates if ɛtj is heavy tailed in the sense that E(|ɛtj|⁴)=∞ (Hall and Yao (2003) and section 7.3 of Straumann (2005)). In contrast, a least absolute deviation estimator that is based on a log-transformation is always asymptotically normal with the standard root n convergence rate provided that E(ɛtj²) < ∞; see Peng and Yao (2003).

To construct the least absolute deviation estimator under the constraint var(Ztj)=1, we write ɛtj = v0 etj in the first equation in expression (2.10), where the median of etj² is equal to 1 and v0 = 1/STD(etj). With σtj(θj)² expressed as in equation (2.11), the parameters θj and v0 are (jointly) identifiable. Now

  • log Ztj² = log{v0² σtj(θj)²} + log etj².

Since the median of log etj² is 0, the true values of the parameters minimize

  • E|log Ztj² − log{v0² σtj(θj)²}|.

Therefore we may estimate the parameters by minimizing

  • ∑t=ν+1,…,n |log Ztj² − log{v0² σtj(θj)²}|,    (2.13)

where σtj(θj)² is given in equation (2.11), with αji = 0 for the components that are not causal in variance. So far θj and v0 have been treated as free parameters. The resulting estimators are root-n consistent.

To make explicit use of the condition that var(ɛtj)=1, we may estimate the parameters θj as follows. With the initial estimate θ̂j, let v̂0 be the reciprocal of the sample standard deviation of the residuals êtj = Ztj/σtj(θ̂j). With the given θ̂j and v̂0, we can minimize

  • image

where inline image. We may update inline image and iterate further until the estimated θj converges. Note that we have used a weighted L2 loss function to approximate the L1-loss to expedite the computation.

2.4. Inference based on bootstrapping

A natural question for the proposed approach is whether the CUCs Zt1,…,Ztd exist or not, although the minimizer Â of expression (2.5) always exists. To address this issue statistically, we may construct a test for the null hypothesis

  • H0: Xt = AZt with Ztj = σtj ɛtj (j = 1, …, d),    (2.14)

where ATA=Id, ɛt=(ɛt1,…,ɛtd)T, {ɛt1},…,{ɛtd} are d independent series and each of them is a sequence of independent and identically distributed random variables with mean 0 and variance 1. Note that the null hypothesis above is a sufficient but not necessary condition for the existence of CUCs. The independence condition is required to construct a bootstrap estimation of the null distribution. Also note that Zt1,…,Ztd may not be independent of each other.

When Zti and Ztj are not conditionally uncorrelated, the left-hand side of equation (2.4) is equal to a positive constant instead of 0. Therefore, large values of Ψn(Â) indicate that the CUCs do not exist. We adopt a bootstrap method below to assess how large is sufficiently large to reject hypothesis H0.

If the null hypothesis H0 cannot be rejected, we may also construct confidence sets for the coefficients aj (i.e. the columns of A) of the CUCs, and for the parameters θj, on the basis of the same bootstrap scheme. Formally, confidence sets for θj could also be constructed from the asymptotic distributions of, for example, the least absolute deviation estimator, which may be derived in a similar manner to that of Peng and Yao (2003). However, such an approach assumes that the CUCs are known (i.e. the vectors aj are known) and, therefore, fails to take into account the errors due to the estimation of aj.

Let Â = (â1, …, âd) be the estimator that is derived from minimizing expression (2.5). Let Ẑtj = âjTXt, and let θ̂j be an estimator for θj.

The bootstrap sampling scheme consists of the three steps below.

  • (a)
    For j=1,…,d, draw ɛtj*, for −∞ < t ≤ n, by sampling randomly with replacement from the standardized residuals êtj, which are obtained by standardizing the raw residuals
    • ɛ̂tj = Ẑtj/σ̂tj,  t = 1, …, n.
  • (b)
    For j=1,…,d, draw Ztj* = σtj* ɛtj*, for −∞ < t ≤ n, where
    • σtj*² = γ̂j + ∑i=1,…,d α̂ji Zt−1,i*² + β̂j σt−1,j*².
  • (c)
    Let Xt* = ÂZt* for t=1,…,n.
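Steps (a)–(c) can be sketched as follows; the finite burn-in period standing in for "−∞ < t ≤ n", and all names, are our illustrative choices:

```python
import numpy as np

def bootstrap_sample(A_hat, resid, beta, alpha, rng, burn=200):
    """One bootstrap replicate following steps (a)-(c): resample each
    column of the standardised residuals with replacement, regenerate Z*
    from the fitted extended GARCH(1,1) recursion, and set X*_t = A_hat Z*_t.
    A finite burn-in stands in for the infinite past."""
    n, d = resid.shape
    gamma = 1.0 - beta - alpha.sum(axis=1)
    eps = np.column_stack(
        [rng.choice(resid[:, j], size=n + burn) for j in range(d)])
    Z = np.zeros((n + burn, d))
    sig2 = np.ones(d)
    for t in range(n + burn):
        Z[t] = np.sqrt(sig2) * eps[t]
        sig2 = gamma + alpha @ (Z[t] ** 2) + beta * sig2
    return Z[burn:] @ A_hat.T          # X*_t = A_hat Z*_t

rng = np.random.default_rng(4)
resid = rng.standard_normal((300, 2))  # stand-in standardised residuals
beta = np.array([0.80, 0.70])
alpha = np.diag([0.10, 0.10])
Xstar = bootstrap_sample(np.eye(2), resid, beta, alpha, rng)
assert Xstar.shape == (300, 2) and np.isfinite(Xstar).all()
```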
2.4.1. A test for the existence of the conditionally uncorrelated components

Let Ψn* be defined as in expression (2.5) with {Xt} replaced by {Xt*}, and let the bootstrap estimator Â* be computed in the same manner as Â with Ψn replaced by Ψn*. Note that the bootstrap sample {Xt*} is drawn from a model with {Ẑt} as its genuine CUCs. Hence the conditional distribution of Ψn*(Â*) (given the original sample {Xt}) may be taken as an approximation to the distribution of Ψn(Â) under hypothesis H0. Thus we reject H0 if Ψn(Â) is greater than the (Bα)th largest value of Ψn*(Â*) obtained from B replications of the above bootstrap resampling, where α ∈ (0,1) is the size of the test and B is a large integer.

2.4.2. Confidence sets for A

A bootstrap approximation for a 1−α confidence set of the transformation matrix A can be constructed as

  • {A: ATA = Id, D(A, Â) ≤ c},    (2.15)

where c is the (Bα)th largest value of D(Â*, Â) obtained from B replications of the bootstrap resampling. When A is in the confidence set, so is B if the columns of B form a permutation of the (reflected) columns of A; see equation (2.6).

2.4.3. Interval estimators for the components of inline image

A bootstrap confidence interval for any component βj, say, of θj may be obtained as follows. Repeat the above bootstrap sampling B times for some large integer B, resulting in bootstrap estimates β̂j,1*, …, β̂j,B*. An approximate 1−α confidence interval for βj is [β̂j,(b1)*, β̂j,(b2)*], where β̂j,(i)* denotes the ith smallest value among β̂j,1*, …, β̂j,B*, and b1=[Bα/2] and b2=[B(1−α/2)].
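The percentile interval is a one-liner once the B bootstrap estimates are collected; the sketch below uses the order statistics b1 = [Bα/2] and b2 = [B(1−α/2)] as in the text (the rounding and 1-based indexing conventions are ours):

```python
import numpy as np

def percentile_ci(boot, alpha=0.05):
    """Percentile interval from B bootstrap estimates, using the order
    statistics b1 = [B*alpha/2] and b2 = [B*(1-alpha/2)]; rounding guards
    against floating-point truncation, and the -1 shifts convert 1-based
    order statistics to 0-based array indices."""
    s = np.sort(np.asarray(boot, dtype=float))
    B = len(s)
    b1 = max(int(round(B * alpha / 2)), 1)
    b2 = int(round(B * (1 - alpha / 2)))
    return s[b1 - 1], s[b2 - 1]

boot = np.arange(1.0, 401.0)   # 400 artificial bootstrap estimates
lo, hi = percentile_ci(boot, alpha=0.05)
assert (lo, hi) == (10.0, 390.0)
```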

We have adopted the standard bootstrap procedure above. However, the wild bootstrap method has proved to be effective for the inference for mean functions in the presence of heteroscedastic noise (Wu, 1986; Mammen, 1993; Hafner and Herwartz, 2000). It is an interesting open question how to adapt the wild bootstrap idea for the inference on conditional second moments.

3. Simulation


We conduct a Monte Carlo experiment to illustrate the CUC approach proposed. In particular we check the accuracy of the estimation for the transformation matrix A in equation (2.2).

We consider a CUC extended GARCH(1,1) model with d=3:

  • Xt = AZt,   Zti = σti ɛti,   σti² = γi + ∑j=1,2,3 αij Zt−1,j² + βi σt−1,i²,    (3.1)

where the ɛti are independent standard normal variables, i=1,2,3, and the parameter values are given in Table 1. It is easy to see that ATA=I3 and γi = 1 − αi1 − αi2 − αi3 − βi. Thus the variances of the CUCs are 1. Since α11+α12+α13+β1=0.98, the volatility of the first CUC is highly persistent. In contrast, the volatility persistence of the third component is less pronounced, as α31+α32+α33+β3=0.72 only.

Table 1.   Parameter values

  Row of A              i    γi     βi     αi1    αi2    αi3
  (0, 0.500, 0.866)     1    0.02   0.90   0.04   0      0.04
  (0, 0.866, −0.500)    2    0.10   0.80   0      0.10   0
  (−1, 0, 0)            3    0.28   0.60   0      0      0.12

For each of 800 samples with size n=500 or n=1000 generated from the above model, we estimated A by minimizing Ψn(A) defined in expression (2.5). As far as the estimation of A is concerned, two orthogonal matrices are treated as identical if the D-distance between them is 0; see equation (2.6). The coefficients αij, βi and γi were estimated by using QMLE based on Gaussian likelihood. The estimates are summarized in Table 2 and Fig. 1. Estimation errors for α12, α21, α23, α31 and α32 are all very close to 0 and are not reported here for brevity.

Table 2.   Summary statistics of the estimation errors in simulation

  n     Statistic                  D(Â,A)   β̂1      α̂11     α̂13     β̂2      α̂22     β̂3      α̂33
  500   Mean                       0.130    0.842   0.035   0.041   0.761   0.076   0.616   0.084
        Median                     0.128    0.884   0.030   0.036   0.803   0.072   0.668   0.076
        Standard deviation         0.080    0.147   0.030   0.029   0.175   0.045   0.257   0.058
        Bias                       —        −0.058  −0.005   0.001  −0.039  −0.024   0.016  −0.036
        Root-mean-squared error    —        0.158   0.031   0.029   0.180   0.052   0.258   0.068
  1000  Mean                       0.114    0.869   0.037   0.037   0.782   0.077   0.616   0.089
        Median                     0.102    0.885   0.036   0.035   0.804   0.076   0.641   0.087
        Standard deviation         0.077    0.078   0.019   0.019   0.119   0.033   0.214   0.043
        Bias                       —        −0.031  −0.003  −0.003  −0.018  −0.023   0.016  −0.032
        Root-mean-squared error    —        0.084   0.020   0.019   0.120   0.041   0.215   0.054
Figure 1.  Boxplots of the estimation errors for the CUC-GARCH(1,1) model (3.1) with (a) Â estimated and (b) the true A; the sample size is n=1000

Both the means and the standard deviations of D(Â, A) are small. This indicates that the estimation of A is reasonably accurate. The coefficients in each CUC model were also estimated accurately. The estimators are almost unbiased, as the biases are negligible in comparison with the corresponding variances. The errors in estimation decrease as the sample size increases from 500 to 1000, roughly by a factor of √2.

Since most of the biases that are reported in Table 2 are negative (see also Fig. 1), the coefficients in the GARCH models for CUCs were slightly underestimated. Also note that the estimation errors decrease when the volatility persistence (which is measured by αi1+αi2+αi3+βi) increases; see Fig. 1(a) with the sample size 1000. Fig. 1(b) presents the estimation errors of the GARCH coefficients with A given. The difference between the estimation errors of the two cases is small.

4. Real data examples


In this section we illustrate the proposed method with two real data examples with d=4 and d=10. First we analyse the 2527 daily log-returns (in percentages) of the Standard & Poor's 500 index (S&P500) and the stock prices of Cisco Systems, Intel Corporation and Sprint in the period from January 2nd, 1991, to December 31st, 2000. This data set was downloaded from Yahoo!Finance. The closing prices, adjusted for dividends and splits, were used to produce the return series that are plotted in Fig. 2. We use the first 2275 observations (i.e. the data up to the end of 1999) for estimating the parameters in the models, and we leave the last 252 data points (i.e. the data in 2000) for checking the post-sample forecasting performance.

Figure 2.  Plots of daily log-returns of (a) the S&P500-index, (b) Cisco Systems, (c) Intel Corporation and (d) Sprint in the period January 2nd, 1991–December 31st, 1999

To account for the conditional mean of the return series, a vector AR(2) model, which was selected by both M(i) (Tiao and Box, 1981) and the Akaike information criterion, was first fitted to the data. We denote by Yt, t=1,2,…,2273, the residuals that resulted from this fitting. In what follows, we focus on modelling the conditional covariance matrix process of Yt.

Let S be the sample covariance matrix of Yt, and Xt = S−1/2Yt. The estimator Â was obtained by minimizing Ψn(A). For comparison, the estimator Ã obtained by maximizing the likelihood function of the GO-GARCH(1,1) model (van der Weide, 2002) was also computed. We applied the bootstrap test that was described in Section 2.4, with the bootstrap sampling repeated 400 times, to test for the existence of the CUCs, and we obtained the P-value 0.34. This indicates that there is no significant evidence against the hypothesis that the CUCs exist for this data set. The 95% bootstrap confidence set for the transformation matrix A is of the form (2.15). Since D(I4, Â) exceeds the critical value c, I4 is not contained in the confidence set. Thus the principal components cannot be taken as the CUCs. Also D(Ã, Â) > c; therefore Ã is not contained in the confidence set either. This suggests that the MLE that is based on the GO-GARCH(1,1) model does not lead to CUCs and, therefore, it would be inappropriate to assume that the conditional covariance matrix of ÃTXt is diagonal, as implied by the GO-GARCH(1,1) approach.

Table 3 lists the estimated extended GARCH(1,1) models for the estimated CUCs. The models were selected by the algorithm that was specified in Section 2.3.3. There is a causality-in-variance relationship from the fourth CUC to the second CUC. Also the last two CUCs are highly persistent, as the sum of all the GARCH and ARCH coefficients is close to 1 for both of them. On the basis of the fitted volatilities σ̂ti² (i=1,2,3,4) for the CUCs, the conditional covariance matrix for the original residuals Yt is of the form

  • image( (4.1))
Table 3. Extended GARCH model for CUCs of the S&P500–Cisco Systems–Intel Corporation–Sprint data

j   i   Fitted model
1   —   inline image
2   4   inline image
3   —   inline image
4   —   inline image

where inline image.
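The form of equation (4.1) is not reproduced above; assuming the CUC construction Yt = S1/2 A Zt with conditionally uncorrelated components Zt of fitted variances inline image, the implied conditional covariance of Yt can be sketched as:

```python
import numpy as np

def cuc_covariance(S_half, A, sigma2_t):
    """Conditional covariance of Y_t implied by a CUC fit: with
    Y_t = S^{1/2} A Z_t and conditionally uncorrelated Z_t of variances
    sigma2_t, Var(Y_t | F_{t-1}) = S^{1/2} A diag(sigma2_t) A^T S^{1/2}.
    (This factorized form is assumed from the whitening construction.)"""
    return S_half @ A @ np.diag(sigma2_t) @ A.T @ S_half.T

# toy illustration with d = 2
S_half = np.array([[1.0, 0.2],
                   [0.0, 1.0]])
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])          # orthogonal
Sigma_t = cuc_covariance(S_half, A, np.array([1.5, 0.7]))
assert np.allclose(Sigma_t, Sigma_t.T)                   # symmetric
assert np.all(np.linalg.eigvalsh(Sigma_t) > 0)           # positive definite
```

Positive definiteness holds automatically as long as every fitted CUC variance is positive, which is one attraction of building the matrix process from univariate volatility models.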

For comparison, we also computed the estimated volatility processes for Yt based on the O-GARCH(1,1) model of Alexander (2001), the DCC-GARCH(1,1) model of Engle (2002) and the GO-GARCH(1,1) model of van der Weide (2002). We also included the CUC-GARCH(1,1) model in the comparison, i.e. we fitted a standard GARCH(1,1) model to each CUC without incorporating the lagged values of the other CUCs. As pointed out earlier, the GO-GARCH(1,1) model does not fit this data set well. In fact the estimated conditional correlation process between the S&P500-return and the Intel return based on the GO-GARCH(1,1) model is negatively correlated with its counterpart based on any of the other models mentioned above. We therefore exclude the GO-GARCH(1,1) results from the comparison below.

Fig. 3 displays the time plots of the estimated conditional variance processes of the S&P500-return under the O-GARCH(1,1), DCC-GARCH(1,1) and CUC-GARCH(1,1) models. Whereas the estimates from the DCC-GARCH(1,1) and CUC-GARCH(1,1) models look similar, the O-GARCH(1,1) model leads to a very different volatility profile. A comparison with the original return series in Fig. 2(a) shows that the two peaks around t=850 should not be there: they were caused by the extreme negative returns of the Cisco Systems price in the same period; see Fig. 2(b). This misleading phenomenon resulted from treating the principal components as CUCs in the O-GARCH(1,1) model. The estimated conditional correlation processes between the S&P500-return and the Intel price return are plotted in Fig. 4. The conditional correlation estimated by the CUC-GARCH(1,1) model is more volatile than those estimated by the O-GARCH(1,1) and DCC-GARCH(1,1) models. In particular, the CUC-estimated conditional correlation is small in the middle period before peaking twice towards the end; those two peaks correspond to the two peaks in the volatility process of the S&P500-return. Note that the correlations estimated by the DCC and CUC models differ considerably from each other numerically.

Figure 3. Estimated volatility processes for the S&P500-return

Figure 4. Estimated conditional correlation processes between the S&P500-return and the Intel Corporation return

We now apply two diagnostic checking statistics to assess the various fitted models. Following the lead of Tse and Tsui (1999), we use the Box–Pierce statistic to check the cross-product of the standardized residuals. For this, let inline image be the standardized residual for the ith component, where inline image is the (i,i)th element of the fitted conditional variance of Yt. Put

  • image( (4.2))

where inline image is the estimated conditional correlation between Yti and Ytj. If the model is correctly specified, there is no auto-correlation in {Ct,ij, t ≥ 1} for any fixed i and j. Define

  • image( (4.3))

where rij,k is the sample auto-correlation of Ct,ij at lag k. It is intuitively clear that large values of Q(i,j;M) indicate a lack of fit for the conditional correlation between the ith and jth components of Yt when i≠j, and for the conditional variance of the ith component when i=j. We also employ a multivariate portmanteau statistic (section 5.5 of Reinsel (1997)) to test for auto-correlation in the vectorized cross-products of residuals inline image, where inline image. Let inline image be the auto-covariance matrix of ξt at lag l. The multivariate portmanteau statistic is defined as

  • image( (4.4))

This may be seen as a multivariate extension of the test of McLeod and Li (1983), who applied a univariate portmanteau test to squared residuals.
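The two diagnostics can be sketched as follows. Since equations (4.3) and (4.4) are not reproduced above, the classical Box–Pierce form Q = n Σ r² and a Hosking-type portmanteau form are assumed here; both are hypothetical reconstructions, not necessarily the exact statistics of the paper:

```python
import numpy as np

def box_pierce(C, M):
    """Box-Pierce statistic on a scalar series C_t (here C_{t,ij}):
    Q = n * sum_{k=1}^{M} r_k^2, with r_k the lag-k sample
    autocorrelation; under a correctly specified model Q is referred to a
    chi-squared distribution with M degrees of freedom (form assumed)."""
    C = np.asarray(C, dtype=float)
    n = len(C)
    c = C - C.mean()
    denom = np.sum(c * c)
    r2 = [(np.sum(c[k:] * c[:-k]) / denom) ** 2 for k in range(1, M + 1)]
    return n * np.sum(r2)

def portmanteau(xi, k):
    """Multivariate portmanteau statistic on the vectorised cross-products
    xi_t (an n x m array); a Hosking-type form is assumed:
    P(k) = n * sum_{l=1}^{k} tr(G_l' G_0^{-1} G_l G_0^{-1}),
    where G_l is the lag-l sample autocovariance matrix of xi_t."""
    xi = np.asarray(xi, dtype=float)
    n = len(xi)
    z = xi - xi.mean(axis=0)
    G0_inv = np.linalg.inv(z.T @ z / n)
    P = 0.0
    for l in range(1, k + 1):
        Gl = z[l:].T @ z[:-l] / n
        P += np.trace(Gl.T @ G0_inv @ Gl @ G0_inv)
    return n * P

rng = np.random.default_rng(1)
Q = box_pierce(rng.standard_normal(2000), M=5)     # white noise input
P5 = portmanteau(rng.standard_normal((1500, 3)), k=5)
assert Q >= 0.0 and P5 >= 0.0
```

For white-noise input both statistics stay near their null expectations, whereas residual autocorrelation left by a badly fitting volatility model inflates them.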

Table 4 lists the values of Q(i,j;M), 1 ≤ i ≤ j ≤ 4 and M=5, for five different models. Significance levels of Q(i,j;M) were computed according to the inline image-distribution; see Tse and Tsui (1999). Table 5 lists the values of P(k) for 1 ≤ k ≤ 5. Although the asymptotic distribution of P(k) is unavailable for conducting a formal test, it is intuitively clear that large values of P(k) indicate a lack of fit of the model concerned.

Table 4. Q(i, j; M) with M=5 for the S&P500–Cisco Systems–Intel Corporation–Sprint data

i,j   O-GARCH   DCC      GO-GARCH   CUC-GARCH   CUC-Ex GARCH
1,1   69.37†     5.00      5.23       5.19        5.19
2,2   10.38      9.05      8.91       8.22        8.12
3,3    2.11      4.67      5.88       1.55        1.59
4,4    1.29      1.08      0.97       0.46        0.41
1,2   48.11†    10.91‡    10.31‡      8.36        8.31
1,3   54.44†    15.79†    10.67‡      4.73        4.55
1,4   18.69§     1.86      1.51       1.40        1.38
2,3    1.05      5.15      7.72       4.34        4.25
2,4    6.99      3.04      3.35       3.11        2.93
3,4    2.15      4.11      2.31       2.83        2.82

†Significant at level 0.01; ‡significant at level 0.1; §significant at level 0.05.
Table 5. P(k) for the S&P500–Cisco Systems–Intel Corporation–Sprint data

k   O-GARCH   DCC      GO-GARCH   CUC-GARCH   CUC-Ex GARCH
1   182.76    117.32    99.83      96.69       96.75
2   307.64    210.99   190.85     186.95      184.49
3   439.22    325.91   302.53     302.87      295.92
4   523.74    412.77   392.74     395.79      387.39
5   634.51    507.46   486.91     494.16      489.16

Tables 4 and 5 indicate that the O-GARCH(1,1) model provided overall the poorest fit among the five models according to both Q(i,j;M) and P(k); in particular, four of its Q-statistics are significant at the 0.05 level. However, the tests with Q(1,2;5) and Q(1,3;5) for both the DCC-GARCH(1,1) and GO-GARCH(1,1) models are significant at least at the 10% level, whereas both the CUC-GARCH(1,1) and CUC-extended GARCH(1,1) models passed all the tests based on Q(i,j;M). Note that the values of P(k) for the two CUC-based models are smaller than those for the O-GARCH(1,1) and DCC-GARCH(1,1) models. Overall both diagnostic statistics indicate that the CUC-extended GARCH(1,1) model is the best for this particular data set.

To make a post-sample comparison between these models, we need to construct proxies for the unobserved conditional covariance matrices by using the daily returns. Let inline image be the p-days-ahead forecast of the covariance matrix at time t. Following the lead of Pelletier (2006) and Fan et al. (2007), we gauge the forecasting quality on the basis of the adaptive mean absolute deviation:

  • image( (4.5))

where v is a non-negative integer and the sum over t runs over the n* post-sample points. When v=0, AMAD reduces to the mean absolute deviation used in Pelletier (2006), and the proxy for the covariance matrix at time t+p is just the cross-product of the return vector on that day; when v>0, the returns on the adjacent 2v+1 days are used to average out the stochastic error in the proxy.
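Since equation (4.5) itself is not reproduced above, the following sketch assumes a plausible form: the proxy at day t+p is the average of the return cross-products over the 2v+1 adjacent days, and the absolute deviations between forecast and proxy are averaged elementwise (all names are illustrative):

```python
import numpy as np

def amad(forecasts, Y, times, p, v):
    """Adaptive mean absolute deviation (form of equation (4.5) assumed).
    The proxy for the covariance matrix at day t+p is the average of
    Y_s Y_s^T over s = t+p-v, ..., t+p+v (a single cross-product when
    v = 0); AMAD averages |forecast - proxy| over matrix elements and
    over the post-sample time points."""
    total = 0.0
    count = 0
    for Sigma_hat, t in zip(forecasts, times):
        days = range(t + p - v, t + p + v + 1)
        proxy = np.mean([np.outer(Y[s], Y[s]) for s in days], axis=0)
        total += np.abs(Sigma_hat - proxy).mean()
        count += 1
    return total / count

rng = np.random.default_rng(3)
Y = rng.standard_normal((300, 4))          # toy returns with covariance I
times = range(250, 295)                    # post-sample forecast origins
forecasts = [np.eye(4) for _ in times]     # "forecasts" equal to the truth
a0 = amad(forecasts, Y, times, p=1, v=0)
a1 = amad(forecasts, Y, times, p=1, v=1)
# averaging over 2v+1 days reduces the noise in the proxy
assert 0.0 < a1 < a0
```

The comparison of a0 and a1 illustrates why the paper considers v>0: the single-day cross-product is an unbiased but very noisy proxy, and local averaging shrinks its deviation from the true covariance.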

We use the last 252 observations in the data to compute the AMADs. On the basis of the p-step-ahead forecasts of the univariate GARCH models (see, for example, page 94 of Tsay (2002)) for each component and the transformation matrix, inline image can be constructed in a straightforward manner for the O-GARCH, GO-GARCH and CUC-GARCH models. The forecast for the DCC model follows the procedure of Pelletier (2006). The lengths of the samples used for parameter estimation are 500 and 1000; the estimates are updated every 5 days, and no causal component is considered for the CUCs. Table 6 lists the results for p=1 and p=5.
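The p-step-ahead variance forecast of a univariate GARCH(1,1) model follows the standard recursion (see, for example, Tsay (2002)); a minimal sketch:

```python
def garch11_forecast(omega, alpha, beta, sigma2_next, p):
    """p-step-ahead variance forecast of a GARCH(1,1) model, given the
    1-step forecast sigma2_next = sigma^2_{t+1|t}.  Standard recursion:
        sigma^2_{t+h|t} = omega + (alpha + beta) * sigma^2_{t+h-1|t},
    which converges geometrically to omega / (1 - alpha - beta)."""
    s = sigma2_next
    for _ in range(p - 1):
        s = omega + (alpha + beta) * s
    return s

omega, alpha, beta = 0.05, 0.08, 0.90        # illustrative parameters
long_run = omega / (1 - alpha - beta)        # unconditional variance = 2.5
f1 = garch11_forecast(omega, alpha, beta, 1.0, 1)
f500 = garch11_forecast(omega, alpha, beta, 1.0, 500)
assert f1 == 1.0                             # 1-step forecast is unchanged
assert abs(f500 - long_run) < 1e-2           # long horizons revert to 2.5
```

Applying this recursion to each CUC and mapping the forecast variances back through the (fixed) transformation matrix yields the matrix forecast inline image for the CUC-based models.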

Table 6. AMAD for the S&P500–Cisco Systems–Intel Corporation–Sprint data

T      p   O-GARCH   DCC      GO-GARCH   CUC-GARCH
v=0
500    1   7.3095    7.2187   7.9227     7.2104
       5   7.2869    7.2383   7.7048     7.1909
1000   1   7.2094    7.1636   7.8361     7.0859
       5   7.1781    7.1646   7.6988     7.1436
v=1
500    1   4.5650    4.5897   4.8360     4.6894
       5   4.9543    4.9634   5.4663     4.8978
1000   1   4.5856    4.5921   4.7972     4.6747
       5   4.9158    4.9343   5.7790     4.9142

The AMAD for the CUC-GARCH model is always the smallest when v=0. When v=1, this is still true for the 5-day-ahead forecasts, but for the 1-day-ahead forecasts the AMADs for the O-GARCH and DCC models are both smaller than that for the CUC-GARCH model. In contrast, the GO-GARCH model provides the worst forecasts for this data set. Overall the CUC-GARCH model outperforms the other three models in this forecasting comparison.

Our second example concerns the daily log-returns of the exchange rates of 10 European currencies against the US dollar during January 2nd, 1990–December 31st, 1998, immediately before the introduction of the euro. The currencies are those of Austria, Belgium, Finland, France, Germany, Ireland, Italy, the Netherlands, Portugal and Spain. For this data set, n=2263 and d=10. Diagnostic checks similar to those in the first example were carried out. For brevity, Table 7 lists only the multivariate portmanteau statistics for the various models. It is not surprising that P(k) for the extended CUC-GARCH model is the smallest in each row, and the values of P(k) for the CUC-GARCH model are smaller than those for the O-GARCH, DCC and GO-GARCH models for k>3. The DCC model may be too simple to capture the dynamic structure of a 10-dimensional volatility process. Extending the DCC structure to incorporate more flexibility would present an interesting line for further development.

Table 7. P(k) for the exchange rates data

k    O-GARCH   DCC     GO-GARCH   CUC-GARCH   CUC-Ex GARCH
1     6783    10271     6352       7297        6316
2    11224    16792    10918      11562        9871
3    15736    23530    15043      15288       12706
4    20538    30701    19871      19448       16205
5    23655    37077    23313      22839       19022
10   41631    62943    41770      41197       35928

Again the comparison based on post-sample forecasting was in favour of the CUC approach. We reserved the whole year of data in 1998 (252 observations) for checking the post-sample forecasting performance. Forecasts from 1 day ahead to 5 days ahead were made on the basis of the fitted models using the 500 observations in the immediate past. Table 8 lists the AMAD-values (see equation (4.5)) for the forecasts based on the four different models. Except for one case with v=1 and p=1, the CUC-GARCH model provides the best forecasts among the models concerned. On the basis of Tables 7 and 8, we conclude that the CUC provides an alternative parsimonious representation for the dynamics of conditional covariance processes which is more accommodating than, for example, the simple DCC model when the dimension of the underlying process is large.

Table 8. AMAD for the exchange rates data

T     p   O-GARCH   DCC      GO-GARCH   CUC-GARCH
v=0
500   1   0.3097    0.3098   0.4074     0.2978
      2   0.3112    0.3109   0.4034     0.2987
      3   0.3089    0.3083   0.3885     0.2958
      4   0.3097    0.3089   0.3502     0.2974
      5   0.3097    0.3087   0.3752     0.2991
v=1
500   1   0.2061    0.2038   0.2475     0.2088
      2   0.2127    0.2115   0.3439     0.2108
      3   0.2117    0.2102   0.3193     0.2087
      4   0.2138    0.2121   0.3145     0.2086
      5   0.2150    0.2133   0.3043     0.2094

5. Concluding remarks


For analysing multivariate time series, it is extremely effective to find an appropriate linear transformation such that the components of the transformed series exhibit a certain 'unrelatedness'. There are at least three types of unrelatedness. For modelling conditional covariance processes, conditional uncorrelatedness is the correct measure which serves the purpose adequately, whereas the unconditional uncorrelatedness required in the orthogonal GARCH model (Alexander, 2001) is too weak and the independence in independent component analysis is too strong.

Modelling multivariate volatility processes is a practically important and methodologically challenging problem. The CUC-based method proposed in this paper attempts to capture sophisticated conditional heteroscedasticity structures while maintaining a parsimonious representation for matrix processes. One natural question arises: do the CUCs so defined exist? Empirical experiments with various real data sets indicate that the P-value of the bootstrap test described in Section 2.4 tends to decrease as d increases. However, with small or moderately large d, the hypothesis of the existence of CUCs has rarely been rejected in our empirical experiments.

In the event that the CUCs do not exist, we argue that it is natural to find the linear transformation such that the resulting components are the least conditionally correlated, especially if we take the viewpoint that any statistical model is merely an approximation to reality. In this sense, our CUC estimation leads to the least conditionally correlated directions, and we build an (approximate) volatility model by assuming that the conditional correlations between those directions are 0. The least conditionally correlated directions are the directions which minimize Ψ(·) defined in equation (2.8); see also equation (2.9). Theorem 1 indicates that the columns of inline image are consistent estimators of the least conditionally correlated directions. Note that both theorem 1 and theorem 2 still apply when the CUCs do not exist (i.e. Ψ(A0)≠0); see condition (c) in Appendix A. Even if the CUCs do not exist, a CUC-GARCH(1,1) model, for example, still provides a more relevant fit than the O-GARCH(1,1) and GO-GARCH(1,1) models.

Finally we point out that, for any multivariate time series Xt, there is always an ℱt−1-measurable orthogonal matrix At−1 for which the components of inline image are conditionally uncorrelated. The CUC approach imposes the further constraint At−1 ≡ A. It is reasonable to assume that At−1 varies smoothly in t (see also equation (1.3)). Therefore we may assume that the CUCs exist over any short time period in which At−1 ≈ A. This further extends the scope of applicability of our method.

Acknowledgements


We thank the reviewers for very helpful comments and suggestions. Jianqing Fan was supported partially by National Science Foundation grants DMS-0355179 and DMS-0704337 and Chinese National Science Foundation grant 10628104; Mingjin Wang was supported partially by Engineering and Physical Sciences Research Council grant GR/R97436 and Chinese National Science Foundation grant 70201007; and Qiwei Yao was supported partially by Engineering and Physical Sciences Research Council grants GR/R97436 and EP/C549058.

References

  • Alexander, C. (2001) Orthogonal GARCH. In Mastering Risk, vol. 2, pp. 21–38. London: Financial Times–Prentice Hall.
  • Arcones, M. A. and Yu, B. (1994) Central limit theorems for empirical processes and U-processes of stationary mixing sequences. J. Theoret. Probab., 7, 47–71.
  • Back, A. and Weigend, A. S. (1997) A first application of independent component analysis to extracting structure from stock returns. Int. J. Neur. Syst., 8, 473–484.
  • Bauwens, L., Laurent, S. and Rombouts, J. V. K. (2006) Multivariate GARCH models: a survey. J. Appl. Econometr., 21, 79–109.
  • Bollerslev, T. (1990) Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH model. Rev. Econ. Statist., 72, 498–505.
  • Bollerslev, T., Engle, R. F. and Wooldridge, J. M. (1988) A capital asset pricing model with time varying covariances. J. Polit. Econ., 96, 116–131.
  • Cheung, Y. W. and Ng, L. K. (1996) A causality in variance test and its application to financial market prices. J. Econometr., 72, 33–48.
  • Chow, Y. S. and Teicher, H. (1997) Probability Theory, 3rd edn. New York: Springer.
  • Engle, R. (2002) Dynamic conditional correlation: a simple class of multivariate GARCH models. J. Bus. Econ. Statist., 20, 339–350.
  • Engle, R. F. and Kroner, K. F. (1995) Multivariate simultaneous generalized ARCH. Econometr. Theory, 11, 122–150.
  • Engle, R. F. and Sheppard, K. (2001) Theoretical and empirical properties of dynamic conditional correlation multivariate GARCH. Working Paper W8554. National Bureau of Economic Research, Cambridge. (Available from http://www.econ.ucsd.edu/papers/files/2001-15.pdf.)
  • Fan, J., Fan, Y. and Lv, J. (2007) Aggregation of nonparametric estimators for volatility matrix. J. Finan. Econometr., 5, 321–357.
  • Fan, J. and Yao, Q. (2003) Nonlinear Time Series: Nonparametric and Parametric Methods. New York: Springer.
  • Granger, C. W. J., Robins, R. P. and Engle, R. F. (1984) Wholesale and retail prices: bivariate time series modeling with forecastable error variances. In Model Reliability (eds D. Belsley and E. Kuh). Cambridge: Massachusetts Institute of Technology Press.
  • Hafner, C. M. and Herwartz, H. (2000) Testing for linear autoregressive dynamics under heteroscedasticity. Econometr. J., 3, 177–197.
  • Hafner, C. M. and Herwartz, H. (2006) A Lagrange multiplier test for causality in variance. Econ. Lett., 93, 137–141.
  • Hall, P. and Yao, Q. (2003) Inference for ARCH and GARCH models. Econometrica, 71, 285–317.
  • Harvey, A., Ruiz, E. and Shephard, N. (1994) Multivariate stochastic variance models. Rev. Econ. Stud., 61, 247–264.
  • Hyvärinen, A., Karhunen, J. and Oja, E. (2001) Independent Component Analysis. New York: Wiley.
  • Jerez, M., Casals, J. and Sotoca, S. (2001) The likelihood of multivariate GARCH models is ill-conditioned. Technical Report. Universidad Complutense de Madrid, Madrid. (Available from http://www.ucm.es/info/icae/e4download.htm.)
  • Kiviluoto, K. and Oja, E. (1998) Independent component analysis for parallel financial time series. In Proc. Int. Conf. Neural Information Processing, vol. 2, pp. 895–989. Tokyo.
  • Ling, S. and McAleer, M. (2003) Adaptive estimation in non-stationary ARMA models with GARCH noises. Ann. Statist., 31, 642–674.
  • Mălăroiu, S., Kiviluoto, K. and Oja, E. (2000) Time series prediction with independent component analysis. Technical Report. Helsinki University of Technology, Helsinki. (Available from http://www.cis.hut.fi/kkluoto/publications/ait99ica.pdf.)
  • Mammen, E. (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann. Statist., 21, 255–285.
  • McLeod, A. I. and Li, W. K. (1983) Diagnostic checking ARMA time series models using squared-residual autocorrelations. J. Time Ser. Anal., 4, 269–273.
  • Pelletier, D. (2006) Regime switching for dynamic correlations. J. Econometr., 131, 445–473.
  • Peng, L. and Yao, Q. (2003) Least absolute deviations estimation for ARCH and GARCH models. Biometrika, 90, 967–975.
  • Polonik, W. (1997) Minimum volume sets and generalized quantile processes. Stoch. Processes Appl., 69, 1–24.
  • Reinsel, G. C. (1997) Elements of Multivariate Time Series Analysis, 2nd edn. New York: Springer.
  • Shephard, N. (1996) Statistical aspects of ARCH and stochastic volatility. In Time Series Models in Econometrics, Finance and Other Fields (eds D. R. Cox, D. V. Hinkley and O. E. Barndorff-Nielsen), pp. 1–67. London: Chapman and Hall.
  • Straumann, D. (2005) Estimation in Conditionally Heteroscedastic Time Series Models. Heidelberg: Springer.
  • Tiao, G. C. and Box, G. E. P. (1981) Modeling multiple time series with applications. J. Am. Statist. Ass., 76, 802–816.
  • Tsay, R. (2002) Analysis of Financial Time Series. New York: Wiley.
  • Tse, Y. K. and Tsui, A. K. C. (1999) A note on diagnosing multivariate conditional heteroscedasticity models. J. Time Ser. Anal., 20, 679–691.
  • van der Vaart, A. W. and Wellner, J. A. (1996) Weak Convergence and Empirical Processes. New York: Springer.
  • Vilenkin, N. (1968) Special Functions and the Theory of Group Representations. Providence: American Mathematical Society.
  • Wang, M. and Yao, Q. (2005) Modelling multivariate volatilities: an ad hoc approach. In Contemporary Multivariate Analysis and Experimental Designs (eds J. Fan, G. Li and R. Li). Singapore: World Scientific.
  • van der Weide, R. (2002) GO-GARCH: a multivariate generalized orthogonal GARCH model. J. Appl. Econometr., 17, 549–564.
  • Wu, C. F. J. (1986) Jackknife, bootstrap and other resampling methods in regression analysis (with discussion). Ann. Statist., 14, 1261–1350.
  • Yu, B. (1994) Rates of convergence for empirical processes of stationary mixing sequences. Ann. Statist., 22, 94–116.

Appendix


Appendix A: Conditions and proof of theorem 1

We first introduce two concepts: mixing, which measures the speed of decay of the auto-dependence of a time series over an increasing time span, and the Vapnik–Červonenkis (VC) index, which measures the complexity of a collection of sets.

Let inline image be the σ-algebra generated by {Xt, i ≤ t ≤ j}. The β-mixing coefficients are defined as

  • image

(See section 2.6.1 of Fan and Yao (2003).)

For an arbitrary set of n points {x1,…,xn}, there are 2^n possible subsets. Say that ℬ picks out a certain subset of {x1,…,xn} if that subset can be formed as B∩{x1,…,xn} for some set B in ℬ. The collection ℬ shatters {x1,…,xn} if each of its 2^n subsets can be picked out by ℬ. The VC index of ℬ is the smallest n for which no set of size n is shattered by ℬ. A collection of sets ℬ is called a VC class if its VC index is finite. The collections of rectangles, of balls and of their unions are VC classes. See section 2.6 of van der Vaart and Wellner (1996) for further discussion of VC classes.
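For small examples, shattering can be checked directly by brute force; a sketch (the collection of 'initial segments' used here is illustrative, not taken from the paper):

```python
from itertools import chain, combinations

def shatters(collection, points):
    """Check whether a collection of sets shatters a finite point set:
    every subset of `points` must be picked out as B & points for some
    B in the collection."""
    points = frozenset(points)
    picked = {frozenset(B) & points for B in collection}
    subsets = chain.from_iterable(combinations(points, r)
                                  for r in range(len(points) + 1))
    return all(frozenset(s) in picked for s in subsets)

# initial segments {1,...,k-1} on the integers: any single point is
# shattered (we can pick it out or leave it out) ...
intervals = [frozenset(range(1, k)) for k in range(1, 6)]
assert shatters(intervals, {2})
# ... but no two points are: the larger point alone can never be picked
# out without the smaller one, so the VC index of this collection is 2
assert not shatters(intervals, {2, 3})
```

This makes the definition concrete: the VC index is the first sample size at which the collection's cutting power fails for every configuration of points.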

Under the regularity conditions that are listed below, the estimator inline image is consistent; see theorem 1 in Section 2.2.2.

  • (a)
    The collection ℬ consists of countable subsets of ℛd and is a VC class. Furthermore, ∑B∈ℬ w(B)<∞.
  • (b)
    The process {Xt} is strictly stationary with E‖Xt‖^2<∞, where ‖·‖ denotes the Euclidean norm. Furthermore, the β-mixing coefficients of {Xt} satisfy β(n)=O(n^{−b}) for some b>0.
  • (c)
    There is a d×d orthogonal matrix A0 which minimizes Ψ(·) defined in equation (2.8). Furthermore, the minimum value of Ψ is attained at an orthogonal matrix A if and only if D(A,A0)=0.
  • (d)
    E‖Xt‖^{2p}<∞ for some p>2, and b>p/(p−2), where b is given in condition (b).
  • (e)
    Ψ(A0)−Ψ(A) ≤ a D(A,A0) for any orthogonal matrix A such that D(A,A0) is smaller than a small but fixed constant, where a>0 is a constant.

Remark 1.  Let ℋ be the set of all d×d orthogonal matrices. Then ℋ may be partitioned into equivalence classes defined by the distance D in equation (2.6): the D-distance between any two elements within an equivalence class is 0, and the D-distance between any two elements from different classes is greater than 0. Let ℋD be the quotient space ℋ/D consisting of these equivalence classes, i.e. we treat A and B as the same element of ℋD if and only if D(A,B)=0. Condition (c) ensures that A0 is the unique minimizer of Ψ(A) on ℋD.

We introduce some notation. Let

  • image

Lemma 1 shows that both Ψ(·) and Ψn(·) are Lipschitz continuous on ℋD with D-distance.

Lemma 1.  For any U,V ∈ ℋD, it holds that

  • image

and

  • image

almost surely, where c>0 is a constant and tr(A) is the trace of a matrix A.

Proof.  We prove lemma 1 only for Ψ(·). The result for Ψn(·) may be shown in the same manner. Let U=(u1,…,ud)T, V=(v1,…,vd)T, inline image and inline image. We assume that the orders and the directions of ui and vj are arranged such that inline image for all i, and

  • image( (A.1))

See equation (2.6). Put the spectral decomposition for Ck(B) as

  • image

where μ1(B,k) ≥ … ≥ μd(B,k) ≥ 0 are the eigenvalues of Ck(B), and γ1,…,γd are their corresponding (orthonormal) eigenvectors. It is easy to see that μl(B,k) ≤ μl for all k and B, where μ1 ≥ … ≥ μd are the eigenvalues of the matrix inline image. Consequently, by noticing that inline image and inline image, we have

  • image

By the Cauchy–Schwarz inequality, the right-hand side above is further bounded by

  • image( (A.2))

For x≠0, it holds that

  • image( (A.3))

Hence,

  • image( (A.4))

where

  • image
  • image

On the set B1∩B2,

  • image

This, combined with equations (A.2) and (A.4), implies that

  • image( (A.5))

Now lemma 1 follows from inequality (A.5) and the inequality

  • image

see also equation (A.1). This completes the proof.

A.1. Proof of theorem 1

Since Cn,k(B)−Ck(B) is a real symmetric matrix, it holds for any unit vectors a and b that

  • image

where ‖Cn,k(B)−Ck(B)‖ denotes the sum of the absolute values of the eigenvalues of Cn,k(B)−Ck(B). This may be obtained by using the spectral decomposition of Cn,k(B)−Ck(B). Consequently it holds uniformly for any orthogonal matrix A that

  • image( (A.6))

where c>0 is a constant. Note that the (i,j)th element of Cn,k(B)−Ck(B) is

  • image

where Xti denotes the ith element of Xt. Since E|XtiXtj|<∞ and ℬ is a VC class, the covering number for the set of functions {XtiXtjI(Xtk ∈ B),B ∈ ℬ} has a polynomial rate of growth for any underlying probability measure (theorem 2.6.4, van der Vaart and Wellner (1996)). Hence, it is a Glivenko–Cantelli class. It follows now from theorem 3.4 of Yu (1994) that

  • image

Consequently,

  • image

where λmax(B,k) and λmin(B,k) denote respectively the maximum and the minimum eigenvalues of Cn,k(B)−Ck(B). Thus

  • image

for k=1,…,k0. Now it follows from inequality (A.6) that

  • image

Combining this with lemma 1 above and the continuity of the ‘argmax’ mapping (theorem 3.2.2 and corollary 3.2.3, van der Vaart and Wellner (1996)), it holds that inline image. This completes the proof of the first part of theorem 1.

Under the additional condition E|XtiXtj|^{2p}<∞ and the mixing condition given in condition (d), theorem 1 of Arcones and Yu (1994) implies that the set of functions {XtiXtjI(Xtk ∈ B), B ∈ ℬ} is a Donsker class; hence the process {Δn,k(B), B ∈ ℬ} indexed by B ∈ ℬ converges weakly to a Gaussian process, where Δn,k(B)=n^{1/2}{Cn,k(B)−Ck(B)}. It follows from equation (A.3) that

  • image( (A.7))

where

  • image
  • image

The last equality in expression (A.7) follows from the fact that, on B3∩B4,

  • image

It follows from equation (A.7) and condition (e) that

  • image( (A.8))

Now, substituting inline image for A, the left-hand side of expression (A.8) must be non-negative by the definition of inline image. The right-hand side of expression (A.8) would be negative unless

  • image

This completes the proof.