Multivariate GARCH models: a survey



This paper surveys the most important developments in multivariate ARCH-type modelling. It reviews the model specifications and inference methods, and identifies likely directions of future research. Copyright © 2006 John Wiley & Sons, Ltd.


Understanding and predicting the temporal dependence in the second-order moments of asset returns is important for many issues in financial econometrics. It is now widely accepted that financial volatilities move together over time across assets and markets. Recognizing this feature through a multivariate modelling framework leads to more relevant empirical models than working with separate univariate models. From a financial point of view, it opens the door to better decision tools in various areas, such as asset pricing, portfolio selection, option pricing, hedging and risk management. Indeed, unlike at the beginning of the 1990s, several institutions have now developed the necessary skills to use the econometric theory in a financial perspective.

Since the seminal paper of Engle (1982), traditional time series tools such as autoregressive moving average (ARMA) models (Box and Jenkins, 1970) for the mean have been extended to essentially analogous models for the variance. Autoregressive conditional heteroscedasticity (ARCH) models are now commonly used to describe and forecast changes in the volatility of financial time series. For a survey of ARCH-type models, see Bollerslev et al. (1992, 1994), Bera and Higgins (1993), Pagan (1996), Palm (1996) and Shephard (1996), among others.

The most obvious application of MGARCH (multivariate GARCH) models is the study of the relations between the volatilities and co-volatilities of several markets.1 Is the volatility of a market leading the volatility of other markets? Is the volatility of an asset transmitted to another asset directly (through its conditional variance) or indirectly (through its conditional covariances)? Does a shock on a market increase the volatility on another market, and by how much? Is the impact the same for negative and positive shocks of the same amplitude? A related issue is whether the correlations between asset returns change over time.2 Are they higher during periods of higher volatility (sometimes associated with financial crises)? Are they increasing in the long run, perhaps because of the globalization of financial markets? Such issues can be studied directly by using a multivariate model, and raise the question of the specification of the dynamics of covariances or correlations. In a slightly different perspective, a few papers have used MGARCH models to assess the impact of volatility in financial markets on real variables like exports and output growth rates, and the volatility of these growth rates.3

Another application of MGARCH models is the computation of time-varying hedge ratios. Traditionally, constant hedge ratios are estimated by OLS as the slope of a regression of the spot return on the futures return, because this is equivalent to estimating the ratio of the covariance between spot and futures over the variance of the futures. Since a bivariate MGARCH model for the spot and futures returns directly specifies their conditional variance–covariance matrix, the hedge ratio can be computed as a byproduct of estimation and updated by using new observations as they become available. See Lien and Tse (2002) for a survey on hedging and additional references.
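As a minimal sketch of this computation, assume the conditional variance matrices of the (spot, futures) returns have already been produced by some bivariate MGARCH model. The array `H_path` below is made up for illustration and the function name is hypothetical. The time-varying hedge ratio is then the conditional covariance divided by the conditional variance of the futures return:

```python
import numpy as np

def hedge_ratios(H):
    """Conditional hedge ratio cov(spot, futures) / var(futures) for each date.

    H has shape (T, 2, 2); index 0 is the spot return, index 1 the futures return.
    """
    H = np.asarray(H, dtype=float)
    return H[:, 0, 1] / H[:, 1, 1]

# Made-up conditional variance matrices for two dates (illustration only).
H_path = np.array([
    [[1.0, 0.8], [0.8, 1.6]],
    [[2.0, 0.9], [0.9, 1.2]],
])
ratios = hedge_ratios(H_path)
print(ratios)
```

Each new observation updates the conditional variance matrix and hence the ratio, without any additional regression.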

Asset pricing models relate returns to ‘factors’, such as the market return in the capital asset pricing model. A specific asset excess return (in excess of the risk-free return) may be expressed as a linear function of the market return. If the slope, or β coefficient, is assumed constant, it may be estimated by OLS. As in the hedging case, since the β is the ratio of a covariance to a variance, an MGARCH model can be used to estimate time-varying β coefficients. See Bollerslev et al. (1988), De Santis and Gérard (1998), Hafner and Herwartz (1998) for examples.

Given an estimated univariate GARCH model on a return series, one knows the conditional distribution of the return, and one can forecast the value-at-risk (VaR) of a long or short position. When considering a portfolio of assets, the portfolio return can be computed directly from the asset shares and returns. A GARCH model can be fitted to the portfolio returns for given weights. If the weight vector changes, the model has to be estimated again. By contrast, if a multivariate GARCH model is fitted, the multivariate distribution of the returns can be used directly to compute the implied distribution of any portfolio. There is no need to re-estimate the model for different weight vectors. In the present state of the art, it is probably simpler to use the univariate framework if there are many assets, but we conjecture that using a multivariate specification may become a feasible alternative. Whether the univariate ‘repeated’ approach is more adequate than the multivariate one is an open question. The multivariate approach is illustrated by Giot and Laurent (2003) using a trivariate example with a time-varying correlation model.
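The point about weight vectors can be sketched as follows. The sketch assumes conditionally Gaussian returns, which is an assumption of the example and not a requirement of MGARCH models; the function name and all numbers are made up. A single forecast of the conditional mean and variance matrix yields the VaR of any portfolio:

```python
import numpy as np
from statistics import NormalDist

def portfolio_var(mu_t, H_t, weights, level=0.01):
    """One-step VaR of a long position, reported as a positive loss.

    Assumes conditionally Gaussian returns (an assumption of this sketch).
    """
    mu_t, H_t, weights = (np.asarray(x, dtype=float) for x in (mu_t, H_t, weights))
    mean = weights @ mu_t
    std = np.sqrt(weights @ H_t @ weights)   # portfolio conditional std. deviation
    return float(-(mean + NormalDist().inv_cdf(level) * std))

mu_t = np.zeros(3)
H_t = np.array([[1.0, 0.3, 0.2],
                [0.3, 2.0, 0.5],
                [0.2, 0.5, 1.5]])            # made-up one-step forecast
var_equal = portfolio_var(mu_t, H_t, [1/3, 1/3, 1/3])
var_tilted = portfolio_var(mu_t, H_t, [0.6, 0.3, 0.1])
```

Both calls reuse the same `H_t`; only the quadratic form in the weights changes.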

MGARCH models were initially developed in the late 1980s and the first half of the 1990s, and after a period of tranquillity in the second half of the 1990s, this area seems to be expanding rapidly again. MGARCH models are partly covered in Franses and van Dijk (2000), Gourieroux (1997) and most of the surveys on ARCH models cited above, but none of them offers, as this paper does, a comprehensive and up-to-date survey of the field, including the most recent findings.

The paper is organized in the following way. In Section 2, we review existing MGARCH specifications. Section 3 is devoted to estimation problems and Section 4 to diagnostic tests. Finally, we offer our conclusions and ideas for further developments in Section 5.


Consider a vector stochastic process {yt} of dimension N × 1. As usual, we condition on the sigma field, denoted by It−1, generated by the past information (here the yt's) until time t − 1. We denote by θ a finite vector of parameters and write:

$$y_t = \mu_t(\theta) + \epsilon_t \tag{1}$$

where µt(θ) is the conditional mean vector and

$$\epsilon_t = H_t^{1/2}(\theta)\, z_t \tag{2}$$

where $H_t^{1/2}(\theta)$ is an N × N positive definite matrix. Furthermore, we assume the N × 1 random vector zt to have the following first two moments:

$$E(z_t) = 0, \qquad \mathrm{Var}(z_t) = I_N \tag{3}$$

where IN is the identity matrix of order N. We still have to explain what $H_t^{1/2}$ is (for convenience we leave out θ in the notation). To make this clear we calculate the conditional variance matrix of yt:

$$\mathrm{Var}(y_t \mid I_{t-1}) = \mathrm{Var}_{t-1}(y_t) = \mathrm{Var}_{t-1}(\epsilon_t) = H_t^{1/2}\, \mathrm{Var}_{t-1}(z_t)\, (H_t^{1/2})' = H_t \tag{4}$$

Hence $H_t^{1/2}$ is any N × N positive definite matrix such that Ht is the conditional variance matrix of yt, e.g. $H_t^{1/2}$ may be obtained by the Cholesky factorization of Ht. Both Ht and µt depend on the unknown parameter vector θ, which can in most cases be split into two disjoint parts, one for µt and one for Ht.4 A case where this is not true is that of GARCH-in-mean models, where µt is functionally dependent on Ht. In this section, we take no account of the conditional mean vector for notational ease. It is usually specified as a function of the past, through a vectorial autoregressive moving average (VARMA) representation for the level of yt.
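A small numerical sketch of the role of the square-root factor, using the Cholesky factorization mentioned above. The matrix `H_t` is made up, and Gaussian draws for zt are an assumption of the example; only the first two moments of zt matter here:

```python
import numpy as np

H_t = np.array([[4.0, 1.2],
                [1.2, 1.0]])           # made-up conditional variance matrix
L_t = np.linalg.cholesky(H_t)          # lower triangular, L_t @ L_t.T equals H_t

rng = np.random.default_rng(0)
z = rng.standard_normal((2, 100_000))  # columns are draws of z_t with Var(z_t) = I_2
eps = L_t @ z                          # eps_t = H_t^{1/2} z_t
sample_cov = np.cov(eps)               # should be close to H_t
```

The sample variance matrix of the simulated errors is close to `H_t`, as the variance calculation above implies.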

In the following subsections we review different specifications of Ht. They differ in various aspects. We distinguish three non-mutually exclusive approaches for constructing multivariate GARCH models: (i) direct generalizations of the univariate GARCH model of Bollerslev (1986); (ii) linear combinations of univariate GARCH models; (iii) nonlinear combinations of univariate GARCH models. In the first category we have VEC, BEKK and factor models. Related models like the flexible MGARCH, Riskmetrics, Cholesky and full factor GARCH models are also in this category. In the second category we have (generalized) orthogonal models and latent factor models. The last category contains constant and dynamic conditional correlation models, the general dynamic covariance model and copula-GARCH models. To keep the notational burden low, we present the models in their ‘(1,1)’ form rather than in their general ‘(p,q)’ form.

2.1. Generalizations of the Univariate Standard GARCH Model

The models in this category are multivariate extensions of the univariate GARCH model. When we consider VARMA models for the conditional mean of several time series the number of parameters increases rapidly. The same happens for multivariate GARCH models as straightforward extensions of the univariate GARCH model. Furthermore, since Ht is a variance matrix, positive definiteness has to be ensured. To make the model tractable for applied purposes, additional structure may be imposed, for example in the form of factors or diagonal parameter matrices. This class of models lends itself to relatively easy theoretical derivations of stationarity and ergodicity conditions, and unconditional moments (see e.g. He and Teräsvirta, 2002a).

VEC and BEKK Models

A general formulation of Ht has been proposed by Bollerslev et al. (1988). In the general VEC model, each element of Ht is a linear function of the lagged squared errors and cross-products of errors and lagged values of the elements of Ht.

Definition 1 The VEC(1, 1) model is defined as:

$$h_t = c + A\, \eta_{t-1} + G\, h_{t-1} \tag{5}$$

where

$$h_t = \mathrm{vech}(H_t) \tag{6}$$
$$\eta_t = \mathrm{vech}(\epsilon_t \epsilon_t') \tag{7}$$

and vech(·) denotes the operator that stacks the lower triangular portion of a N × N matrix as a N(N + 1)/2 × 1 vector. A and G are square parameter matrices of order (N + 1)N/2 and c is a (N + 1)N/2 × 1 parameter vector.

The number of parameters is N(N + 1)(N(N + 1) + 1)/2 (e.g. for N = 3 it is equal to 78), which implies that in practice this model is used only in the bivariate case. To overcome this problem some simplifying assumptions have to be imposed. Bollerslev et al. (1988) suggest the diagonal VEC (DVEC) model in which the A and G matrices are assumed to be diagonal, each element hijt depending only on its own lag and on the previous value of ϵitϵjt. This restriction reduces the number of parameters to 3N(N + 1)/2 (e.g. for N = 3 it is equal to 18, and for N = 2 to 9). But even under this diagonality assumption, large-scale systems are still highly parameterized and difficult to estimate in practice.

Necessary and sufficient conditions on the parameters to ensure that the conditional variance matrices in the DVEC model are positive definite almost surely are most easily derived by expressing the model in terms of Hadamard products (denoted by ⊙).5 In particular, let us define the symmetric N × N matrices A°, G° and C° as the matrices implied by the relations A = diag[vech(A°)],6 G = diag[vech(G°)] and c = vech(C°). The diagonal model can thus be written as follows:

$$H_t = C^\circ + A^\circ \odot (\epsilon_{t-1}\epsilon_{t-1}') + G^\circ \odot H_{t-1} \tag{8}$$

It is straightforward to show (see Attanasio, 1991) that Ht is positive definite for all t provided that C°, A°, G° and the initial variance matrix (H0) are positive definite. Moreover, these conditions are easily imposed through a Cholesky decomposition of the parameter matrices in (8). Note that even simpler versions of the DVEC model constrain the A° and G° matrices to be rank one matrices, or a positive scalar times a matrix of ones, also called a scalar model (see Ding and Engle, 2001).
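The positivity argument can be checked numerically. The sketch below assumes the Hadamard form of the diagonal model, Ht = C° + A° ⊙ (ϵt−1ϵ′t−1) + G° ⊙ Ht−1, builds the three parameter matrices from triangular factors (so that they are positive definite by construction), and tracks the smallest eigenvalue of Ht along a simulated path. The helper names, the scales and the Gaussian shocks are choices of this illustration:

```python
import numpy as np

def dvec_step(C, A, G, eps_prev, H_prev):
    """One step of H_t = C + A * (eps eps') + G * H_{t-1}, with * elementwise."""
    return C + A * np.outer(eps_prev, eps_prev) + G * H_prev

def pd_from_cholesky(rng, n, scale):
    """Positive definite matrix F F' built from a lower triangular factor F."""
    F = np.tril(rng.uniform(0.5, 1.0, size=(n, n)))
    return scale * (F @ F.T)

rng = np.random.default_rng(1)
N = 3
C = pd_from_cholesky(rng, N, 0.10)
A = pd_from_cholesky(rng, N, 0.05)
G = pd_from_cholesky(rng, N, 0.20)

H = C.copy()                           # positive definite initial matrix H_0
min_eigs = []
for _ in range(200):
    eps = np.linalg.cholesky(H) @ rng.standard_normal(N)
    H = dvec_step(C, A, G, eps, H)
    min_eigs.append(np.linalg.eigvalsh(H).min())
```

Every entry of `min_eigs` is strictly positive, in line with the sufficient condition stated above.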

Riskmetrics (1996) uses the exponentially weighted moving average model (EWMA) to forecast variances and covariances. Practitioners who study volatility processes often observe that their model is very close to the unit root case. To take this into account, Riskmetrics defines the variances and covariances as IGARCH-type models (Engle and Bollerslev, 1986):

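$$h_{ij,t} = (1-\lambda)\, \epsilon_{i,t-1}\epsilon_{j,t-1} + \lambda\, h_{ij,t-1} \tag{9}$$

In terms of the VEC model in (5) we have

$$h_t = (1-\lambda)\, \eta_{t-1} + \lambda\, h_{t-1} \tag{10}$$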

which is a scalar VEC model. The decay factor λ proposed by Riskmetrics is equal to 0.94 for daily data and 0.97 for monthly data. The decay factor is not estimated but suggested by Riskmetrics. In this respect, the model is easy to work with in practice. However, imposing the same dynamics on every component in a multivariate GARCH model, no matter which data are used, is difficult to justify.
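A minimal sketch of the EWMA recursion in matrix form, Ht = (1 − λ)ϵt−1ϵ′t−1 + λHt−1, with the decay factor fixed and not estimated. The function name and the choice of the initial matrix are assumptions of the example:

```python
import numpy as np

def ewma_covariance(eps, lam=0.94, H0=None):
    """EWMA path: H[t+1] = (1 - lam) * eps[t] eps[t]' + lam * H[t].

    eps has shape (T, N). If H0 is not given, the sample covariance is used
    as the starting value (an arbitrary choice made for this sketch).
    """
    eps = np.asarray(eps, dtype=float)
    T, N = eps.shape
    H = np.empty((T + 1, N, N))
    H[0] = np.cov(eps, rowvar=False) if H0 is None else np.asarray(H0, dtype=float)
    for t in range(T):
        H[t + 1] = (1 - lam) * np.outer(eps[t], eps[t]) + lam * H[t]
    return H

H_path = ewma_covariance(np.array([[1.0, 2.0]]), lam=0.94, H0=np.eye(2))
```

Since the same scalar λ multiplies every element, all variances and covariances share identical dynamics, which is the restriction criticized above.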

Because it is difficult to guarantee the positivity of Ht in the VEC representation without imposing strong restrictions on the parameters,7 Engle and Kroner (1995) propose a new parametrization for Ht that easily imposes its positivity, i.e. the BEKK model (the acronym comes from synthesized work on multivariate models by Baba, Engle, Kraft and Kroner).

Definition 2 The BEKK(1, 1, K) model is defined as:

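$$H_t = C^{*\prime} C^* + \sum_{k=1}^{K} A_k^{*\prime}\, \epsilon_{t-1}\epsilon_{t-1}'\, A_k^* + \sum_{k=1}^{K} G_k^{*\prime}\, H_{t-1}\, G_k^* \tag{11}$$

where C*, $A_k^*$ and $G_k^*$ are N × N matrices but C* is upper triangular.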

The summation limit K determines the generality of the process. The parameters of the BEKK model do not represent directly the impact of the different lagged terms on the elements of Ht, as in the VEC model. The BEKK model is a special case of the VEC model. We refer to Engle and Kroner (1995) for propositions and proofs about VEC and BEKK models. For example, to avoid observationally equivalent structures they provide sufficient conditions to identify BEKK models with K = 1. These conditions are that $A^*_{1,11}$, $G^*_{1,11}$ (the upper-left elements of $A_1^*$ and $G_1^*$) and the diagonal elements of C* are restricted to be positive.
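To illustrate why the quadratic-form structure delivers positivity, the following sketch performs one BEKK(1, 1, 1) step, Ht = C*′C* + A*′ϵt−1ϵ′t−1A* + G*′Ht−1G*, with unrestricted (made-up) A* and G* that contain negative entries. The function name and all numbers are chosen for illustration only:

```python
import numpy as np

def bekk_step(C, A, G, eps_prev, H_prev):
    """One step of BEKK(1,1,1): H_t = C'C + A' eps eps' A + G' H_{t-1} G."""
    outer = np.outer(eps_prev, eps_prev)
    return C.T @ C + A.T @ outer @ A + G.T @ H_prev @ G

C = np.array([[1.0, 0.3],
              [0.0, 0.5]])             # upper triangular, nonsingular
A = np.array([[0.30, -0.10],
              [0.05,  0.20]])          # unrestricted, may contain negative entries
G = np.array([[0.90,  0.02],
              [-0.03, 0.85]])
H_next = bekk_step(C, A, G, eps_prev=np.array([1.0, -2.0]), H_prev=np.eye(2))
eigenvalues = np.linalg.eigvalsh(H_next)
```

The result is symmetric with strictly positive eigenvalues: the first term is positive definite when C* is nonsingular, and the other two are positive semidefinite whatever the values in A* and G*.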

The number of parameters in the BEKK(1,1,1) model is N(5N + 1)/2. To reduce this number, and consequently to reduce the generality, one can impose a diagonal BEKK model, i.e. $A_k^*$ and $G_k^*$ in (11) are diagonal matrices. This model is also a DVEC model but it is less general, although it is guaranteed to be positive definite while the DVEC is not. This can be easily checked in the bivariate model: the DVEC model contains 9 parameters while the diagonal BEKK model contains only 7 parameters. This happens because the parameters governing the dynamics of the covariance equation in the BEKK model are the products of the corresponding parameters of the two variance equations in the same model. Another way to reduce the number of parameters is to use a scalar BEKK model, i.e. $A_k^*$ and $G_k^*$ are equal to a scalar times an identity matrix.
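The parameter counts quoted in this subsection can be reproduced by counting free elements directly. The function names are hypothetical; the counts assume the ‘(1,1)’ forms with K = 1:

```python
def n_params_vec(n):
    """Full VEC(1,1): c plus two square matrices of order n(n+1)/2."""
    n_star = n * (n + 1) // 2
    return n_star + 2 * n_star ** 2

def n_params_dvec(n):
    """Diagonal VEC(1,1): c plus two diagonal matrices of order n(n+1)/2."""
    return 3 * (n * (n + 1) // 2)

def n_params_bekk(n):
    """BEKK(1,1,1): triangular C* plus two full n x n matrices."""
    return n * (n + 1) // 2 + 2 * n * n

def n_params_diagonal_bekk(n):
    """Diagonal BEKK(1,1,1): triangular C* plus two diagonal n x n matrices."""
    return n * (n + 1) // 2 + 2 * n

for n in (2, 3, 5, 10):
    print(n, n_params_vec(n), n_params_dvec(n),
          n_params_bekk(n), n_params_diagonal_bekk(n))
```

For N = 3 the full VEC count is 78, and for N = 2 the diagonal VEC and diagonal BEKK counts are 9 and 7, as stated in the text.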

For the VEC model in Definition 1 to be covariance-stationary it is required that the eigenvalues of A + G are less than one in modulus. The unconditional variance matrix Σ, equal to E(Ht), is given by $\mathrm{vech}(\Sigma) = (I_{N^*} - A - G)^{-1} c$, where N* = N(N + 1)/2. Similar expressions can be obtained for the BEKK model. Hafner (2003) provides analytical expressions of the fourth-order moments of the general VEC model; see also Nijman and Sentana (1996).
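A short sketch of this calculation, using the expression vech(Σ) = (I − A − G)⁻¹c obtained by taking expectations on both sides of the VEC recursion. The function name and the parameter values are made up for the example:

```python
import numpy as np

def vec_unconditional_vech(c, A, G):
    """Return vech(Sigma) = (I - A - G)^{-1} c after checking covariance stationarity."""
    c, A, G = (np.asarray(x, dtype=float) for x in (c, A, G))
    spectral_radius = np.abs(np.linalg.eigvals(A + G)).max()
    if spectral_radius >= 1:
        raise ValueError("eigenvalues of A + G must be less than one in modulus")
    return np.linalg.solve(np.eye(len(c)) - A - G, c)

# Made-up bivariate example (N* = 3); the vech order is (h11, h21, h22).
c = np.array([0.05, 0.01, 0.10])
A = 0.05 * np.eye(3)
G = 0.90 * np.eye(3)
vech_sigma = vec_unconditional_vech(c, A, G)
```

Here every element of c is divided by 1 − 0.05 − 0.90 = 0.05, giving unconditional variances 1 and 2 and an unconditional covariance of 0.2.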

Besides the BEKK model, another option to guarantee the positivity of Ht in the VEC representation is given by Kawakatsu (2003), who proposes the Cholesky factor GARCH model. Instead of specifying a functional form for Ht, he specifies a model on Lt where Ht = LtL′t. The advantage of this specification is that Ht is always positive definite without any restrictions on the parameters. The disadvantage is that identification restrictions are needed, which implies that the order of the series in yt is relevant and that the interpretation of the parameters is difficult. A similar model based on the Cholesky decomposition can be found in Gallant and Tauchen (2001) and in Tsay (2002).

The difficulty when estimating a VEC or even a BEKK model is the high number of unknown parameters, even after imposing several restrictions. It is thus not surprising that these models are rarely used when the number of series is larger than 3 or 4. Factor and orthogonal models circumvent this difficulty by imposing a common dynamic structure on all the elements of Ht, which results in less parameterized models.

Factor Models

Engle et al. (1990b) propose a parameterization of Ht using the idea that co-movements of the stock returns are driven by a small number of common underlying variables, which are called factors. Bollerslev and Engle (1993) use this parametrization to model common persistence in conditional variances. The factor model can be seen as a particular BEKK model. We take the definition of Lin (1992).

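Definition 3 The BEKK(1, 1, K) model in Definition 2 is a factor GARCH model, denoted by F-GARCH(1, 1, K), if for each k = 1, …, K, $A_k^*$ and $G_k^*$ have rank one and have the same left and right eigenvectors,8 λk and wk, i.e.

$$A_k^* = \alpha_k\, w_k \lambda_k' \quad \text{and} \quad G_k^* = \beta_k\, w_k \lambda_k' \tag{12}$$

where αk and βk are scalars, and λk and wk (for k = 1, …, K) are N × 1 vectors satisfying

$$w_k' \lambda_i = \begin{cases} 0 & \text{for } k \neq i \\ 1 & \text{for } k = i \end{cases} \tag{13}$$
$$\sum_{n=1}^{N} w_{kn} = 1 \tag{14}$$

If we substitute (12) and (13) into (11) and define Ω = C*′C*, we get

$$H_t = \Omega + \sum_{k=1}^{K} \lambda_k \lambda_k' \left( \alpha_k^2\, w_k' \epsilon_{t-1}\epsilon_{t-1}' w_k + \beta_k^2\, w_k' H_{t-1} w_k \right) \tag{15}$$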

Restriction (14) is an identification restriction. The K-factor GARCH model implies that the time-varying part of Ht has reduced rank K, but Ht remains of full rank because Ω is assumed positive definite. The vector λk and the scalar w′kϵt (denoted by fkt hereafter) are also called the kth factor loading and the kth factor, respectively. The number of parameters in the F-GARCH(1, 1, 1) is N(N + 5)/2. In (15), the expression between brackets can be replaced by other univariate GARCH specifications.
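A numerical sketch of the one-factor case may help. It assumes the recursion Ht = Ω + λλ′(α²(w′ϵt−1)² + β²w′Ht−1w) with w′λ = 1 and the elements of w summing to one, and checks that the conditional variance of the factor, w′Htw, follows a univariate GARCH(1, 1) recursion. All numerical values are made up:

```python
import numpy as np

def fgarch_step(Omega, lam, w, alpha, beta, eps_prev, H_prev):
    """One-factor F-GARCH step: H_t = Omega + lam lam' (alpha^2 f^2 + beta^2 w'Hw)."""
    f_prev = w @ eps_prev                        # factor f_{t-1} = w' eps_{t-1}
    scale = alpha**2 * f_prev**2 + beta**2 * (w @ H_prev @ w)
    return Omega + np.outer(lam, lam) * scale

lam = np.array([0.5, 1.5])                       # factor loading
w = np.array([0.5, 0.5])                         # w'lam = 1 and sum(w) = 1
Omega = np.array([[0.20, 0.05],
                  [0.05, 0.30]])
alpha, beta = 0.3, 0.9
eps_prev = np.array([0.8, -0.4])
H_prev = np.array([[1.0, 0.4],
                   [0.4, 2.0]])

H_t = fgarch_step(Omega, lam, w, alpha, beta, eps_prev, H_prev)
sigma2_t = w @ H_t @ w                           # conditional variance of the factor
garch_rhs = (w @ Omega @ w + alpha**2 * (w @ eps_prev)**2
             + beta**2 * (w @ H_prev @ w))
```

The two quantities `sigma2_t` and `garch_rhs` coincide because w′λλ′w = (w′λ)² = 1.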

For example, the conditional variance matrix of the F-GARCH (1, 1, 2) model is:

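$$H_t = \Omega + \lambda_1\lambda_1' \left[ \alpha_1^2\, w_1' \epsilon_{t-1}\epsilon_{t-1}' w_1 + \beta_1^2\, w_1' H_{t-1} w_1 \right] + \lambda_2\lambda_2' \left[ \alpha_2^2\, w_2' \epsilon_{t-1}\epsilon_{t-1}' w_2 + \beta_2^2\, w_2' H_{t-1} w_2 \right] \tag{16}$$

where the parameter vectors λk = (λk1, λk2, …, λkN)′ and wk are of dimension N × 1 while $\alpha_k^2$ and $\beta_k^2$ are scalar parameters. Denoting $\sigma_{kt}^2 = w_k' H_t w_k$ we can write (16) in a more familiar way as:

$$h_{ijt} = \tau_{ij} + \lambda_{1i}\lambda_{1j}\, \sigma_{1t}^2 + \lambda_{2i}\lambda_{2j}\, \sigma_{2t}^2, \qquad i, j = 1, \ldots, N \tag{17}$$
$$\sigma_{kt}^2 = \omega_k + \alpha_k^2\, f_{k,t-1}^2 + \beta_k^2\, \sigma_{k,t-1}^2, \qquad k = 1, 2 \tag{18}$$

where τij = ωij − λ1iλ1jω1 − λ2iλ2jω2 and ωk = w′kΩwk. Hence $\sigma_{kt}^2$ is defined as a univariate GARCH(1, 1) model. The persistence of the conditional variance in (16) is measured by $\alpha_k^2 + \beta_k^2$ and can also be interpreted as common persistence. In other words, the dynamics of the elements of Ht is the same. We can write Ht as:

$$H_t = \Omega^* + \lambda_1\lambda_1'\, \sigma_{1t}^2 + \lambda_2\lambda_2'\, \sigma_{2t}^2 \tag{19}$$

where Ω* = Ω − λ1λ′1ω1 − λ2λ′2ω2. Note that Et−1(f1tf2t) = w′1Ωw2 because w′kλl = 0 for k ≠ l, see (13). This implies that in the case of more than one factor we have the result that any pair of factors has a time-invariant conditional covariance.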

Alternatively, the two-factor model described in (19) can be obtained from

$$\epsilon_t = \lambda_1 f_{1t} + \lambda_2 f_{2t} + e_t \tag{20}$$

where et represents an idiosyncratic shock with constant variance matrix and uncorrelated with the two factors. Each factor fkt has zero conditional mean and conditional variance like a GARCH(1,1) process, see (18). The K-factor model can be written as

$$\epsilon_t = \Lambda f_t + e_t \tag{21}$$

where Λ is a matrix of dimension N × K and ft is a K × 1 vector. A factor is observable if it is specified as a function of ϵt, like in (16). See Section 2.6 for a brief discussion of latent factor models.

Several variants of the factor model are proposed in the literature. For example, Vrontos et al. (2003) introduce the full-factor multivariate GARCH model.

Definition 4 The FF-MGARCH model is defined as

$$H_t = W \Sigma_t W' \tag{22}$$

where W is a N × N triangular parameter matrix with ones on the diagonal and the matrix $\Sigma_t = \mathrm{diag}(\sigma_{1t}^2, \ldots, \sigma_{Nt}^2)$ where $\sigma_{it}^2$ is the conditional variance of the ith factor, i.e. the ith element of W−1ϵt, which can be separately defined as any univariate GARCH model.

By construction, Ht is always positive definite. Note that Ht has a structure that depends on the ordering of the time series in yt, because of the triangular structure of W. The restriction of having ones on the diagonal of W avoids superfluous parameters if each $\sigma_{it}^2$ has a free constant term.

Rigobon and Sack (2003) start from a system of simultaneous equations in structural form where the conditional variances of the innovations are jointly specified. By deriving the reduced form model one obtains innovations with a conditional variance matrix that can be compared with other unrestricted reduced-form MGARCH models. The structural model imposes a number of restrictions on the functional form of the conditional variance of the reduced-form innovations, resulting in fewer parameters than in a VEC model.

2.2. Linear Combinations of Univariate GARCH Models

In this category, we consider models, like orthogonal models and latent factor models (briefly discussed in Section 2.6), that are linear combinations of several univariate models, each of which is not necessarily a standard GARCH (e.g. the EGARCH model of Nelson (1991), the APARCH model of Ding et al. (1993), the fractionally integrated GARCH of Baillie et al. (1996), the contemporaneous asymmetric GARCH model of El Babsiri and Zakoian (2001) or the quadratic ARCH model of Sentana (1995)).

In the orthogonal GARCH model, the observed data are assumed to be generated by an orthogonal transformation of N (or a smaller number of) univariate GARCH processes. The matrix of the linear transformation is the orthogonal matrix (or a selection) of eigenvectors of the population unconditional covariance matrix of the standardized returns. In the generalized version, this matrix must only be invertible. The orthogonal models can also be considered as factor models, where the factors are univariate GARCH-type processes.

In the orthogonal GARCH model of Kariya (1988) and Alexander and Chibumba (1997), the N × N time-varying variance matrix Ht is generated by m ≤ N univariate GARCH models.

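Definition 5 The O-GARCH(1, 1, m) model is defined as:

$$V^{-1/2} \epsilon_t = u_t = \Lambda_m f_t \tag{23}$$

where V = diag(v1, v2, …, vN), with vi the population variance of ϵit, and Λm is a matrix of dimension N × m given by:

$$\Lambda_m = P_m\, \mathrm{diag}(l_1^{1/2}, \ldots, l_m^{1/2}) \tag{24}$$

l1 ≥ … ≥ lm > 0 being the m largest eigenvalues of the population correlation matrix of ut, and Pm the N × m matrix of associated (mutually orthogonal) eigenvectors. The vector ft = (f1t, …, fmt)′ is a random process such that:

$$E_{t-1}(f_t) = 0 \quad \text{and} \quad \mathrm{Var}_{t-1}(f_t) = \Sigma_t = \mathrm{diag}(\sigma_{f_1 t}^2, \ldots, \sigma_{f_m t}^2) \tag{25}$$
$$\sigma_{f_i t}^2 = (1 - \alpha_i - \beta_i) + \alpha_i f_{i,t-1}^2 + \beta_i \sigma_{f_i, t-1}^2, \qquad i = 1, \ldots, m \tag{26}$$

Consequently,

$$H_t = \mathrm{Var}_{t-1}(\epsilon_t) = V^{1/2} V_t V^{1/2}, \quad \text{where } V_t = \mathrm{Var}_{t-1}(u_t) = \Lambda_m \Sigma_t \Lambda_m' \tag{27}$$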

The parameters of the model are V, Λm and the parameters of the GARCH factors (αi's and βi's). The number of parameters is N(N + 5)/2 (if m = N). In practice, V and Λm are replaced by their sample counterparts, and m is chosen by principal component analysis applied to the standardized residuals ût. Alexander (2001, section 7.4.3) illustrates the use of the O-GARCH model. She emphasizes that using a small number of principal components compared to the number of assets is the strength of the approach (in one example, she fixes m at 2 for 12 assets). However, note that the conditional variance matrix has reduced rank (if m < N), which may be a problem for applications and for diagnostic tests which depend on the inverse of Ht.
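The construction and the reduced-rank remark can be sketched as follows, assuming the form Ht = V^{1/2}ΛmΣtΛ′mV^{1/2} with Λm built from the m largest eigenvalues and associated eigenvectors of the correlation matrix. The function name, the variances, the correlation matrix and the factor variances are all made up:

```python
import numpy as np

def ogarch_cov(v, corr, sigma2_f):
    """H_t = V^{1/2} Lam_m Sigma_t Lam_m' V^{1/2}, with m = len(sigma2_f) factors."""
    m = len(sigma2_f)
    eigval, eigvec = np.linalg.eigh(corr)              # ascending order
    keep = np.argsort(eigval)[::-1][:m]                # indices of the m largest eigenvalues
    Lam_m = eigvec[:, keep] * np.sqrt(eigval[keep])    # P_m diag(l_1^{1/2}, ..., l_m^{1/2})
    V_t = Lam_m @ np.diag(sigma2_f) @ Lam_m.T
    scale = np.sqrt(np.asarray(v, dtype=float))
    return V_t * np.outer(scale, scale)

v = np.array([1.0, 4.0, 2.25])                         # made-up unconditional variances
corr = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
H_full = ogarch_cov(v, corr, sigma2_f=np.ones(3))      # m = N, unit factor variances
H_reduced = ogarch_cov(v, corr, sigma2_f=[1.3, 0.8])   # m = 2 < N
rank_reduced = np.linalg.matrix_rank(H_reduced, tol=1e-10)
```

With m = N and unit factor variances the unconditional variance matrix is recovered, while with m = 2 the matrix has rank 2 and cannot be inverted.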

In van der Weide (2002) the orthogonality condition assumed in the O-GARCH model is relaxed by assuming that the matrix Λ in the relation ut = Λft is square and invertible, rather than orthogonal. The matrix Λ has N² parameters and is not restricted to be triangular like in the model of Vrontos et al. (2003), see Definition 4.

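Definition 6 The GO-GARCH(1, 1) model is defined as in Definition 5, where m = N and Λ is a nonsingular matrix of parameters. The implied conditional correlation matrix of ϵt can be expressed as:

$$R_t = J_t^{-1} V_t J_t^{-1}, \quad \text{where } V_t = \Lambda \Sigma_t \Lambda' \text{ and } J_t = (V_t \odot I_N)^{1/2} \tag{28}$$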

In van der Weide (2002), the singular value decomposition of the matrix Λ is used as a parametrization, i.e. Λ = PL^{1/2}U, where the matrix U is orthogonal, and P and L are defined as above (from the eigenvectors and eigenvalues). The O-GARCH model (when m = N) corresponds then to the particular choice U = IN. More generally, van der Weide expresses U as the product of N(N − 1)/2 rotation matrices:

$$U = \prod_{i<j} G_{ij}(\delta_{ij}), \qquad -\pi \le \delta_{ij} \le \pi \tag{29}$$

where Gij(δij) performs a rotation in the plane spanned by the ith and jth vectors of the canonical basis of ℝ^N over an angle δij. For example, in the trivariate case

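$$G_{12} = \begin{pmatrix} \cos\delta_{12} & \sin\delta_{12} & 0 \\ -\sin\delta_{12} & \cos\delta_{12} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad G_{13} = \begin{pmatrix} \cos\delta_{13} & 0 & -\sin\delta_{13} \\ 0 & 1 & 0 \\ \sin\delta_{13} & 0 & \cos\delta_{13} \end{pmatrix} \tag{30}$$

and G23 has the block with cos δ23 and sin δ23 functions in the lower right corner. The N(N − 1)/2 rotation angles are parameters to be estimated.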
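The parametrization of U by rotation angles can be sketched as below. The sign convention inside each plane rotation is an assumption of this sketch and may differ from the one used in the paper; since each angle ranges over [−π, π], a different convention only relabels δij. Function names and angles are made up:

```python
import numpy as np
from itertools import combinations

def plane_rotation(n, i, j, delta):
    """Rotation by angle delta in the plane of canonical basis vectors i and j."""
    G = np.eye(n)
    G[i, i] = G[j, j] = np.cos(delta)
    G[i, j] = np.sin(delta)
    G[j, i] = -np.sin(delta)
    return G

def rotation_product(n, deltas):
    """U as the product of the n(n-1)/2 plane rotations G_ij(delta_ij), i < j."""
    pairs = list(combinations(range(n), 2))
    if len(deltas) != len(pairs):
        raise ValueError("need n(n-1)/2 rotation angles")
    U = np.eye(n)
    for (i, j), delta in zip(pairs, deltas):
        U = U @ plane_rotation(n, i, j, delta)
    return U

U = rotation_product(3, [0.3, -1.1, 2.0])        # made-up angles in [-pi, pi]
U_zero = rotation_product(3, [0.0, 0.0, 0.0])    # the O-GARCH special case U = I_N
```

Any choice of angles yields an orthogonal matrix, so the angles can be estimated without further constraints, and setting all of them to zero gives back the O-GARCH model.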

For estimation, van der Weide (2002) replaces in a first step P and L by their sample counterparts and the remaining parameters (those of U) are estimated together with the parameters of the GARCH factors in a second step. Note that such a two-step estimation method is not applicable if an MGARCH-in-mean effect is included (this is also the case for the O-GARCH model). More generally, as pointed out by a referee, the elements in the matrix Λ could be estimated together with the GARCH parameters of the factors, in a single step.

The orthogonal models are particular F-GARCH models and thus are nested in the BEKK model. As a consequence, their properties follow from those of the BEKK model. In particular, it is obvious that the (G)O-GARCH model is covariance-stationary if the m univariate GARCH processes are themselves stationary.

2.3. Nonlinear Combinations of Univariate GARCH Models

This section collects models that may be viewed as nonlinear combinations of univariate GARCH models. This allows for models where one can specify separately, on the one hand, the individual conditional variances, and on the other hand, the conditional correlation matrix or another measure of dependence between the individual series (like the copula of the conditional joint density). For models of this category, theoretical results on stationarity, ergodicity and moments may not be so straightforward to obtain as for models presented in the preceding sections. Nevertheless, they are less greedy in parameters than the models of the first category, and therefore they are more easily estimable.

Conditional Correlation Models

The conditional variance matrix for this class of models is specified in a hierarchical way. First, one chooses a GARCH-type model for each conditional variance. For example, some conditional variances may follow a conventional GARCH model while others may be described as an EGARCH model. Second, based on the conditional variances one models the conditional correlation matrix (imposing its positive definiteness ∀t).

Bollerslev (1990) proposes a class of MGARCH models in which the conditional correlations are constant and thus the conditional covariances are proportional to the product of the corresponding conditional standard deviations. This restriction greatly reduces the number of unknown parameters and thus simplifies the estimation.

Definition 7 The CCC model is defined as:

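$$H_t = D_t R D_t = \left( \rho_{ij} \sqrt{h_{iit} h_{jjt}} \right) \tag{31}$$

where

$$D_t = \mathrm{diag}(h_{11t}^{1/2}, \ldots, h_{NNt}^{1/2}) \tag{32}$$

hiit can be defined as any univariate GARCH model, and

$$R = (\rho_{ij}) \tag{33}$$

is a symmetric positive definite matrix with $\rho_{ii} = 1,\ \forall i$.

R is the matrix containing the constant conditional correlations ρij. The original CCC model has a GARCH(1, 1) specification for each conditional variance in Dt:

$$h_{iit} = \omega_i + \alpha_i \epsilon_{i,t-1}^2 + \beta_i h_{ii,t-1}, \qquad i = 1, \ldots, N \tag{34}$$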

This CCC model contains N(N + 5)/2 parameters. Ht is positive definite if and only if all the N conditional variances are positive and R is positive definite. The unconditional variances are easily obtained, as in the univariate case, but the unconditional covariances are difficult to calculate because of the nonlinearity in (31). He and Teräsvirta (2002b) use a VEC-type formulation for (h11t, h22t, …, hNNt)′, to allow for interactions between the conditional variances. They call this the extended CCC model.
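A minimal sketch of the two layers of the CCC model, assuming GARCH(1, 1) conditional variances and the decomposition Ht = DtRDt. The function names and parameter values are made up so that the arithmetic is easy to follow:

```python
import numpy as np

def garch11_variances(omega, alpha, beta, eps_prev, h_prev):
    """Elementwise GARCH(1,1) update: h_iit = omega_i + alpha_i eps^2 + beta_i h_{ii,t-1}."""
    return omega + alpha * eps_prev**2 + beta * h_prev

def ccc_cov(h, R):
    """H_t = D_t R D_t with D_t = diag(sqrt(h))."""
    d = np.sqrt(h)
    return R * np.outer(d, d)

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])                        # constant conditional correlations
omega = np.array([0.2, 0.9])
alpha = np.array([0.1, 0.1])
beta = np.array([0.8, 0.8])
h_t = garch11_variances(omega, alpha, beta,
                        eps_prev=np.array([2.0, 3.0]),
                        h_prev=np.array([4.25, 9.0]))
H_t = ccc_cov(h_t, R)
```

The conditional variances are 4 and 9, so the conditional covariance is 0.5 × 2 × 3 = 3: only the standard deviations move over time.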

The assumption that the conditional correlations are constant may seem unrealistic in many empirical applications. Christodoulakis and Satchell (2002), Engle (2002) and Tse and Tsui (2002) propose a generalization of the CCC model by making the conditional correlation matrix time-dependent. The model is then called a dynamic conditional correlation (DCC) model. An additional difficulty is that the time-dependent conditional correlation matrix has to be positive definite ∀t. The DCC models guarantee this under simple conditions on the parameters.

The DCC model of Christodoulakis and Satchell (2002) uses the Fisher transformation of the correlation coefficient. The specification of the correlation coefficient is $\rho_{12,t} = (e^{2r_t} - 1)/(e^{2r_t} + 1)$, where rt can be defined as any GARCH model using $\epsilon_{1,t-1}\epsilon_{2,t-1}/\sqrt{h_{11,t-1}h_{22,t-1}}$ as innovation. This model is easy to implement because the positive definiteness of the conditional correlation matrix is guaranteed by the Fisher transformation. However, it is only a bivariate model. The DCC models of Tse and Tsui (2002) and Engle (2002) are genuinely multivariate and are useful when modelling high-dimensional data sets.
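The role of the transformation can be seen in a few lines, assuming the inverse Fisher transformation ρ = (e^{2r} − 1)/(e^{2r} + 1), which is the hyperbolic tangent of r. The function name and the input values are made up:

```python
import math

def correlation_from_fisher(r):
    """Inverse Fisher transformation (e^{2r} - 1)/(e^{2r} + 1), i.e. tanh(r)."""
    return math.tanh(r)

rhos = [correlation_from_fisher(r) for r in (-50.0, -1.0, 0.0, 0.5, 50.0)]
```

Whatever real value the unrestricted process rt takes, the implied correlation lies between −1 and 1, which in the bivariate case is what positive definiteness of the correlation matrix requires.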

Definition 8 The DCC model of Tse and Tsui (2002) or DCCT(M) is defined as:

$$H_t = D_t R_t D_t \tag{35}$$

where Dt is defined in (32), hiit can be defined as any univariate GARCH model and

$$R_t = (1 - \theta_1 - \theta_2) R + \theta_1 \Psi_{t-1} + \theta_2 R_{t-1} \tag{36}$$

In (36), θ1 and θ2 are non-negative parameters satisfying θ1 + θ2 < 1, R is a symmetric N × N positive definite parameter matrix with ρii = 1 and Ψt−1 is the N × N correlation matrix of ϵτ for τ = t − M, t − M + 1, …, t − 1. Its i,jth element is given by:

$$\psi_{ij,t-1} = \frac{\sum_{m=1}^{M} u_{i,t-m}\, u_{j,t-m}}{\sqrt{\left( \sum_{m=1}^{M} u_{i,t-m}^2 \right) \left( \sum_{m=1}^{M} u_{j,t-m}^2 \right)}} \tag{37}$$

where $u_{it} = \epsilon_{it}/\sqrt{h_{iit}}$. The matrix Ψt−1 can be expressed as:

$$\Psi_{t-1} = B_{t-1}^{-1} L_{t-1} L_{t-1}' B_{t-1}^{-1} \tag{38}$$

where Bt−1 is a N × N diagonal matrix with ith diagonal element given by $\left( \sum_{h=1}^{M} u_{i,t-h}^2 \right)^{1/2}$ and Lt−1 = (ut−1, …, ut−M) is a N × M matrix, with ut = (u1t, u2t, …, uNt)′.

A necessary condition to ensure the positivity of Ψt−1, and therefore also of Rt, is that M ≥ N.9 Then Rt is itself a correlation matrix if Rt−1 is also a correlation matrix (notice that $\psi_{ii,t-1} = 1\ \forall i$).
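The condition on M can be illustrated numerically. The sketch assumes the matrix form Ψ = B⁻¹LL′B⁻¹, where L is the N × M matrix of lagged standardized residuals and B is diagonal with the row norms of L. The residuals are simulated and the function name is hypothetical:

```python
import numpy as np

def tse_tsui_psi(L_lags):
    """Psi_{t-1} = B^{-1} L L' B^{-1} from the N x M matrix of lagged standardized residuals."""
    L_lags = np.asarray(L_lags, dtype=float)
    b = np.sqrt((L_lags**2).sum(axis=1))          # diagonal of B_{t-1}
    return (L_lags @ L_lags.T) / np.outer(b, b)

rng = np.random.default_rng(2)
N = 3
psi_short = tse_tsui_psi(rng.standard_normal((N, 2)))   # M = 2 < N: singular
psi_long = tse_tsui_psi(rng.standard_normal((N, 5)))    # M = 5 >= N
min_eig_short = np.linalg.eigvalsh(psi_short).min()
min_eig_long = np.linalg.eigvalsh(psi_long).min()
```

Both matrices have a unit diagonal, but with M < N the rank of LL′ is at most M, so the smallest eigenvalue is zero up to rounding error; with M ≥ N the matrix is positive definite for generic data.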

Alternatively, Engle (2002) proposes a different DCC model (see also Engle and Sheppard, 2001).

Definition 9 The DCC model of Engle (2002) or DCCE(1, 1) is defined as in (35) with

$$R_t = \mathrm{diag}(q_{11,t}^{-1/2}, \ldots, q_{NN,t}^{-1/2})\; Q_t\; \mathrm{diag}(q_{11,t}^{-1/2}, \ldots, q_{NN,t}^{-1/2}) \tag{39}$$

where the N × N symmetric positive definite matrix Qt = (qij,t) is given by:

$$Q_t = (1 - \alpha - \beta) \bar{Q} + \alpha\, u_{t-1} u_{t-1}' + \beta\, Q_{t-1} \tag{40}$$

with ut as in Definition 8. Q̄ is the N × N unconditional variance matrix of ut, and α and β are non-negative scalar parameters satisfying α + β < 1.

The elements of Q̄ can be estimated or alternatively set to their empirical counterpart to render the estimation even simpler (see Section 3). To show more explicitly the difference between DCCT and DCCE, we write the expression of the correlation coefficient in the bivariate case: for the DCCT(M)

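$$\rho_{12,t} = (1 - \theta_1 - \theta_2)\, \rho_{12} + \theta_2\, \rho_{12,t-1} + \theta_1 \frac{\sum_{m=1}^{M} u_{1,t-m}\, u_{2,t-m}}{\sqrt{\left( \sum_{m=1}^{M} u_{1,t-m}^2 \right) \left( \sum_{m=1}^{M} u_{2,t-m}^2 \right)}} \tag{41}$$

and for the DCCE(1, 1)

$$\rho_{12,t} = \frac{(1 - \alpha - \beta)\, \bar{q}_{12} + \alpha\, u_{1,t-1} u_{2,t-1} + \beta\, q_{12,t-1}}{\sqrt{\left( (1 - \alpha - \beta)\, \bar{q}_{11} + \alpha\, u_{1,t-1}^2 + \beta\, q_{11,t-1} \right) \left( (1 - \alpha - \beta)\, \bar{q}_{22} + \alpha\, u_{2,t-1}^2 + \beta\, q_{22,t-1} \right)}} \tag{42}$$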

Unlike in the DCCT model, the DCCE model does not formulate the conditional correlation as a weighted sum of past correlations. Indeed, the matrix Qt is written like a GARCH equation, and then transformed to a correlation matrix. However, for both the DCCT and DCCE models, one can test θ1 = θ2 = 0 or α = β = 0, respectively to check whether imposing constant conditional correlations is empirically relevant.
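The two-stage construction of the DCCE model (a GARCH-type recursion for Qt, followed by a rescaling to a correlation matrix) can be sketched as follows. The sketch assumes the scalar recursion Qt = (1 − α − β)Q̄ + αut−1u′t−1 + βQt−1; the function name and the numbers are made up:

```python
import numpy as np

def dcc_engle_step(Q_bar, alpha, beta, u_prev, Q_prev):
    """One DCC_E(1,1) step: update Q_t, then rescale it to the correlation matrix R_t."""
    Q_t = (1 - alpha - beta) * Q_bar + alpha * np.outer(u_prev, u_prev) + beta * Q_prev
    d = 1.0 / np.sqrt(np.diag(Q_t))
    return Q_t, Q_t * np.outer(d, d)

Q_bar = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
Q_t, R_t = dcc_engle_step(Q_bar, alpha=0.05, beta=0.90,
                          u_prev=np.array([1.0, -1.0]), Q_prev=Q_bar)
_, R_const = dcc_engle_step(Q_bar, alpha=0.0, beta=0.0,
                            u_prev=np.array([1.0, -1.0]), Q_prev=Q_bar)
```

In this example a pair of standardized residuals with opposite signs lowers the conditional correlation from 0.3 to 0.235, while α = β = 0 returns the constant correlation matrix, which is the restriction tested above.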

A drawback of the DCC models is that θ1, θ2 in DCCT and α, β in DCCE are scalars, so that all the conditional correlations obey the same dynamics. This is necessary to ensure that Rt is positive definite ∀t through sufficient conditions on the parameters. If the conditional variances are specified as GARCH(1,1) models then the DCCT and DCCE models contain (N + 1)(N + 4)/2 parameters.

Interestingly, DCC models can be estimated consistently in two steps (see Section 3.2), which makes this approach feasible when N is high. Of course, when N is large, the restriction of common dynamics gets tighter, but for large N the problem of maintaining tractability also gets harder. In this respect, several variants of the DCC model are proposed in the literature. For example, Billio et al. (2003) argue that constraining the dynamics of the conditional correlation matrix to be the same for all the correlations is not desirable. To solve this problem, they propose a block-diagonal structure where the dynamics is constrained to be identical only within each block. The price to pay for this additional flexibility is that the block members have to be defined a priori, which may be cumbersome in some applications. Pelletier (2003) proposes a model where the conditional correlations follow a switching regime driven by an unobserved Markov chain so that the correlation matrix is constant in each regime but may vary across regimes. Another extension proposed by Engle (2002) consists of changing (40) into

$$Q_t = \bar{Q} \odot (i i' - A - B) + A \odot u_{t-1} u_{t-1}' + B \odot Q_{t-1} \tag{43}$$

where i is a vector of ones and A and B are N × N matrices of parameters. This increases the number of parameters considerably, but the matrices A and B could be defined to depend on a small number of parameters (e.g. A = aa′).

To conclude, DCC models open the door to using flexible GARCH specifications in the variance part. Indeed, as the conditional variances (together with the conditional means) can be estimated using N univariate models, one can easily extend the DCC-GARCH models to more complex GARCH-type structures (as mentioned at the beginning of Section 2.2). One can also extend the bivariate CCC FIGARCH model of Brunetti and Gilbert (2000) to a model of the DCC family.

General Dynamic Covariance Model

A model somewhat different from the previous ones but that nests several of them is the general dynamic covariance (GDC) model proposed by Kroner and Ng (1998). They illustrate that the choice of a multivariate volatility model can lead to substantially different conclusions in an application that involves forecasting dynamic variance matrices. We extend the definition of Kroner and Ng (1998) to cover models with dynamic conditional correlations.

Definition 10 The GDC model is defined as:

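$$H_t = D_t R_t D_t + \Phi \odot \Theta_t \tag{44}$$

where Dt = (dijt) with diit = √θiit for all i and dijt = 0 for all i ≠ j, Θt = (θijt), Φ = (φij) with φii = 0 for all i and φij = φji, Rt is specified as DCCT or DCCE, and

$$\theta_{ijt} = \omega_{ij} + a_i'\epsilon_{t-1}\epsilon_{t-1}'a_j + g_i'H_{t-1}g_j \tag{45}$$

for all i, j, where ai and gi, i = 1, …, N, are N × 1 vectors of parameters and Ω = (ωij) is positive definite and symmetric.

Elementwise we have:

$$h_{iit} = \theta_{iit}, \qquad h_{ijt} = \rho_{ijt}\sqrt{\theta_{iit}\theta_{jjt}} + \phi_{ij}\theta_{ijt} \quad (i \neq j) \tag{46}$$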

where the θijt are given by the BEKK formulation in (45). The GDC model contains several MGARCH models as special cases. To show this we adapt a proposition from Kroner and Ng (1998). Consider the following set of conditions:

  • (ia) θ1 = θ2 = 0 (DCCT) or α = β = 0 (DCCE);
  • (ib) R = IN (DCCT) or Q̄ = IN (DCCE);
  • (ii) ai = αili and gi = βili, where li is the ith column of an (N × N) identity matrix, and αi and βi, i = 1, …, N are scalars;
  • (iii) φij = 0 for all i ≠ j;
  • (iv) φij = 1 for all i ≠ j;
  • (v) A = α(wλ′) and G = β(wλ′) where A = [a1, …, aN] and G = [g1, …, gN] are N × N matrices, w and λ are N × 1 vectors, and α and β are scalars.

The GDC model reduces to different multivariate GARCH models under different combinations of these conditions. Specifically, the GDC model becomes:

  • the DCCT or the DCCE(1, 1) model with GARCH(1,1) conditional variances under conditions (ii) and (iii);

  • the CCC model with GARCH(1,1) conditional variances under conditions (ia), (ii) and (iii);

  • a restricted DVEC(1,1) model under conditions (i) and (ii);

  • the BEKK(1,1,1) model under conditions (i) and (iv);

  • the F-GARCH(1,1,1) model under conditions (i), (iv) and (v).

Condition (ib) serves as an identification restriction for the VEC, BEKK and F-GARCH models. As we can see, the GDC model is an encompassing model. This requires a large number of parameters (i.e. [N(7N − 1) + 4]/2). For example, in the bivariate case there are 11 parameters in θt, 3 in Rt and 1 in Φ, which makes a total of 15. This is less than for an unrestricted VEC model (21 parameters), but more than for the BEKK model (11 parameters).
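The parameter counts quoted above can be checked with a few lines of Python. The DCC and GDC formulas are those given in the text. The VEC(1,1) and BEKK(1,1,1) formulas are the standard counts for unrestricted models, stated here as an assumption that is consistent with the 21 and 11 parameters quoted for N = 2. The function name `n_params` is hypothetical.

```python
def n_params(N):
    """Number of parameters for N series with (1,1) orders."""
    n_vech = N * (N + 1) // 2                   # distinct elements of a symmetric N x N matrix
    return {
        "DCC": (N + 1) * (N + 4) // 2,          # GARCH(1,1) variances plus correlation part
        "GDC": (N * (7 * N - 1) + 4) // 2,      # formula quoted in the text
        "VEC": n_vech + 2 * n_vech ** 2,        # assumed: c, A and G unrestricted
        "BEKK": n_vech + 2 * N * N,             # assumed: triangular constant, full A and G
    }
```

For N = 2 this gives 9, 15, 21 and 11 parameters respectively, and the gap between the VEC count and the others widens quickly as N grows.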

Copula-MGARCH Models

Another approach for modelling the conditional dependence is known as the copula-GARCH model. This approach makes use of the theorem due to Sklar (1959) stating that any N-dimensional joint distribution function may be decomposed into its N marginal distributions, and a copula function that completely describes the dependence between the N variables. See Nelsen (1999) for a comprehensive introduction to copulas. Patton (2000) and Jondeau and Rockinger (2001) have proposed copula-GARCH models. These models are specified by GARCH equations for the conditional variances (possibly with each variance depending on the lag of the other variances and of the other shocks), marginal distributions for each series (e.g. t-distributions) and a conditional copula function. Both papers highlight the need to allow for time-variation in the conditional copula, extending in some sense the DCC models to other specifications of the conditional dependence. The copula function is rendered time-varying through its parameters, which can be functions of past data. In this respect, like the DCC model of Engle (2002), copula-GARCH models can be estimated using a two-step maximum likelihood approach (see Section 3.2) which solves the dimensionality problem. An interesting feature of copula-GARCH models is the ease with which very flexible joint distributions may be obtained in the bivariate case. Their application to higher dimensions is a subject for further research.

2.4. Leverage Effects in MGARCH Models

For stock returns, negative shocks may have a larger impact on their volatility than positive shocks of the same absolute value (this is most often interpreted as the leverage effect unveiled by Black, 1976). In other words, the news impact curve, which traces the relation between volatility and the previous shock, is asymmetric. Univariate models that allow for this effect are the EGARCH model of Nelson (1991), the GJR model of Glosten et al. (1993) and the threshold ARCH model of Zakoian (1994), among others. For multivariate series the same argument applies: the variances and covariances may react differently to a positive than to a negative shock. In the multivariate case, a shock can be defined in terms of ϵt or zt. Note that the signs of ϵit and zit do not necessarily coincide, see (2).10

The MGARCH models reviewed in the previous subsections define the conditional variance matrix as a function of lagged values of ϵtϵ′t. For example, each conditional variance in the VEC model is a function of its own squared error but it is also a function of the squared errors of the other series as well as the cross-products of errors. A model that takes explicitly the sign of the errors into account is the asymmetric dynamic covariance (ADC) model of Kroner and Ng (1998). The only difference with Definition 10 is an extra term based on the vector vt = max[0, −ϵt] in θijt to take into account the sign of ϵit:

equation image(47)

The ADC model nests some natural extensions of MGARCH models that incorporate the leverage effect. Kroner and Ng (1998) apply the model to large and small firm returns. They find that bad news about large firms can cause additional volatility in both small-firm and large-firm returns. Furthermore, this bad news increases the conditional covariance. Small firm news has only minimal effects.
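The sign-dependent ingredient of the ADC model, vt = max[0, −ϵt], is simple to compute. The following minimal sketch builds vt and the outer product vtv′t that drives the extra term; the function names are hypothetical and the parameter vectors that weight this term in (47) are deliberately left out.

```python
import numpy as np

def negative_part(eps):
    """v_t = max[0, -eps_t], applied element by element."""
    return np.maximum(0.0, -np.asarray(eps, dtype=float))

def leverage_outer(eps):
    """Outer product v_t v_t', nonzero only for series hit by a negative shock."""
    v = negative_part(eps)
    return np.outer(v, v)
```

For ϵt = (0.8, −1.5)′ only the second series contributes, so the only nonzero element of vtv′t is the (2, 2) entry, equal to 2.25.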

Hansson and Hordahl (1998) add the term D ⊙ vt−1v′t−1 in a DVEC model like (8), where D is a diagonal matrix of parameters. To incorporate the leverage effect in the (bivariate) BEKK model, Hafner and Herwartz (1998) add the terms D′1ϵt−1ϵ′t−1D1 1{ϵ1,t−1<0} + D′2ϵt−1ϵ′t−1D2 1{ϵ2,t−1<0}, where D1 and D2 are 2 × 2 matrices of parameters and 1{…} is the indicator function. This generalizes the univariate GJR specification.

2.5. Transformations of MGARCH Models

Not all MGARCH models are invariant with respect to linear transformations. By invariance of a model, we mean that it stays in the same class if a linear transformation is applied to yt, say ỹt = Fyt, where F is a matrix of constants (for simplicity we assume F is square). If yt is a vector of returns, a linear transformation corresponds to new assets (portfolios combining the original assets). It seems sensible that a model should be invariant; otherwise the question arises of which basic assets should be modelled. In some cases (stocks), these are naturally defined; in other cases, like exchange rates, they are not, since a reference currency must be chosen (see Gourieroux and Jasiak, 2001, p. 140). Lack of invariance of a model does not imply that the model is not suitable at all for use in empirical work. Implications of invariance, or lack of invariance, are an open issue. For example, if the model is invariant, one can estimate it with some number of basic assets, as well as with a smaller number of portfolios of the basic assets. Estimates of the larger model imply estimates of the smaller models, which could be compared to the direct estimates of the latter. Very different estimates may lead us to question the specification.

Lack of invariance occurs whenever a diagonal matrix in the equation defining Ht is premultiplied by the matrix F defined above. The general VEC and BEKK models are invariant, but their diagonal versions are not. Conditional correlation models are not invariant, since FDt is not diagonal when Dt is diagonal, see (31).
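The argument for conditional correlation models can be illustrated numerically. In the sketch below (all numbers are arbitrary illustrative values), Ht = DtRtDt with Dt diagonal, and the transformed variance matrix FHtF′ = (FDt)Rt(FDt)′ involves FDt, which is no longer diagonal for a general F.

```python
import numpy as np

def is_diagonal(M):
    """True if all off-diagonal elements of M are (numerically) zero."""
    return np.allclose(M, np.diag(np.diag(M)))

D = np.diag([1.0, 2.0])                      # conditional standard deviations
R = np.array([[1.0, 0.5], [0.5, 1.0]])       # conditional correlation matrix
H = D @ R @ D                                # H_t = D_t R_t D_t

F = np.array([[0.5, 0.5], [1.0, -1.0]])      # two portfolios of the original assets
H_new = F @ H @ F.T                          # variance matrix of F y_t
FD = F @ D                                   # plays the role of D_t after transformation
```

Here `is_diagonal(D)` holds but `is_diagonal(FD)` does not, so the transformed process is not written in the form of a diagonal matrix times a correlation matrix times a diagonal matrix with the same Rt.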

A related question is the marginalization of MGARCH processes: starting from a strong MGARCH model for yt, can we characterize the implied marginal process of a subvector of yt, in particular of the scalar yit? Nijman and Sentana (1996) provide an answer to that question.11 To take a simple case, for a bivariate VEC(1,1) model, the implied process for y1t is at most a weak GARCH (3,3) process.12 In the DVEC(1,1) case, the marginal process of y1t remains a strong GARCH process. In proving such results, they use the VARMA(1,1) representation of the VEC(1,1) model ht = c + Aηt−1 + Ght−1, given by ηt = c + (A + G)ηt−1 + ωt − Gωt−1, where ωt = ηt − Et−1(ηt) is a martingale difference. Hence it is clear that this approach cannot be applied to the conditional correlation models and the GDC model. Marginalization results for the latter models are not known.

Another question is that of temporal aggregation of MGARCH processes. Hafner (2004) shows that, like Drost and Nijman (1993) in the univariate case, the class of weak multivariate GARCH processes is closed under temporal aggregation. Weak multivariate GARCH models are characterized by a weak VARMA structure of ηt in (7). Fourth moment characteristics turn out to be crucial for deriving the low-frequency dynamics. The issue of estimation of the parameters of the low-frequency model is difficult because the probability law of the innovation vector is unknown, since it is only assumed to be a weak white noise. See Hafner and Rombouts (2003) for more details.

2.6. Alternative Approaches to Multivariate Volatility

There are at least two other approaches to multivariate volatility than MGARCH models: stochastic volatility (SV) models and realized volatility.

Multivariate stochastic volatility models (see e.g. Harvey et al., 1994) specify that the conditional variance matrix depends on some unobserved or latent processes rather than on past observations. A multivariate SV model is typically specified as N univariate SV models for the conditional variances (see Ghysels et al., 1996, for a survey of SV models):

$$y_{it} = \sigma_i z_{it} \exp(0.5\, h_{it}), \quad i = 1, \ldots, N \tag{48}$$

where σi is a parameter. The innovation vector zt = (z1t, …, zNt)′ has E(zt) = 0 and Var(zt) = Σz, while the vector of volatilities ht = (h1t, …, hNt)′ follows a VAR(1) process ht = Φht−1 + ηt where ηt is i.i.d. ∼N(0, Ση). In this model, the dynamics of the covariances depends on the dynamics of the corresponding conditional variances, in other words, there is no direct specification of changing covariances or correlations. A drawback of SV models is the complexity of estimation.
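A short simulation makes the structure of this model concrete. The sketch assumes the usual form yit = σi zit exp(0.5 hit) for (48), Gaussian innovations zt (the text only fixes their first two moments), and a zero starting value for the volatility vector. The function name `simulate_msv` is hypothetical.

```python
import numpy as np

def simulate_msv(T, sigma, Phi, Sigma_eta, Sigma_z, seed=0):
    """Simulate y_it = sigma_i * z_it * exp(0.5 * h_it) with h_t = Phi h_{t-1} + eta_t."""
    rng = np.random.default_rng(seed)
    N = len(sigma)
    y = np.zeros((T, N))
    h = np.zeros((T, N))
    h_prev = np.zeros(N)                                      # assumed starting value
    for t in range(T):
        eta = rng.multivariate_normal(np.zeros(N), Sigma_eta)  # latent volatility shock
        h[t] = Phi @ h_prev + eta
        z = rng.multivariate_normal(np.zeros(N), Sigma_z)      # assumed Gaussian innovations
        y[t] = sigma * z * np.exp(0.5 * h[t])
        h_prev = h[t]
    return y, h
```

Because ht is driven by its own shock ηt and is never observed, the likelihood of yt requires integrating over the whole path of ht, which is the source of the estimation difficulty mentioned above.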

Because the main emphasis of this survey is on ‘data-driven’ MGARCH models, a thorough discussion of the vast literature on latent factor models is beyond the scope of this paper. The factor model in (21) becomes a latent model if Ft is latent, which means that it is not included in It, implying that the conditional variance matrix, see for example (19), is not measurable any more. This is in contrast with Section 2.1, where the conditional variance of the factors is specified as a function of the past data (ϵt). Therefore, latent factor models can be classified as stochastic volatility models as mentioned in Shephard (1996). The elements of Ft typically follow dynamic heteroscedastic processes, for example Diebold and Nerlove (1989) use ARCH models. The fact that the factor is considered as nonobservable complicates inference considerably, since the likelihood function must be marginalized with respect to it (see Gourieroux, 1997, section 6.3). The conditional covariance between the factors is usually assumed to be equal to zero. See Sentana and Fiorentini (2001) and Fiorentini et al. (2004) for more details on identification and estimation of factor models. Sentana (1998) shows that the observed factor model is observationally equivalent (up to conditional second moments) to a class of conditionally heteroscedastic factor models including latent factor models. Doz and Renault (2003) elaborate on this result and draw the conclusions in terms of model specification and identification, and in terms of inference methodologies.

The second alternative has been proposed by Andersen et al. (2003). In this case, a daily measure of variances and covariances is computed as an aggregate measure from intraday returns. More specifically, a daily realized variance for day t is computed as the sum of the squared intraday equidistant returns for the given trading day and a daily realized covariance is obtained by summing the products of intraday returns. Once such daily measures have been obtained, they can be modelled, e.g. for a prediction purpose. A nice feature of this approach is that unlike MGARCH and multivariate stochastic volatility models, the N(N − 1)/2 covariance components of the conditional variance matrix (or, rather, the components of its Choleski decomposition) can be forecasted independently, using as many univariate models. As shown by Andersen et al. (2003), although the use of the realized covariance matrix facilitates rigorous measurement of conditional volatility in much higher dimensions than is feasible with MGARCH and multivariate SV models, it does not allow the dimensionality to become arbitrarily large. Indeed, to ensure the positive definiteness of the realized covariance matrix, the number of assets (N) cannot exceed the number of intraday returns for each trading day. The main drawback is that intraday data remain relatively costly and are not readily available for all assets. Furthermore, a large amount of data handling and computer programming is usually needed to retrieve the intraday returns from the raw data files supplied by the exchanges or data vendors. On the contrary, working with daily data is relatively simple and the data are broadly available.
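The construction of the realized covariance matrix and the dimensionality constraint can be sketched as follows. The input is assumed to be an M × N array holding M equidistant intraday return vectors for N assets on one trading day; the function name is hypothetical.

```python
import numpy as np

def realized_covariance(intraday_returns):
    """Sum over the day of the outer products r r' of intraday return vectors.
    Diagonal elements are realized variances, off-diagonal elements realized covariances."""
    r = np.asarray(intraday_returns, dtype=float)   # shape (M, N)
    return r.T @ r
```

The result is a sum of M rank-one matrices, so its rank cannot exceed M. With fewer intraday returns than assets the matrix is singular, which is why N cannot exceed the number of intraday returns per day.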

Which approach is best, for example in terms of forecasting, is beyond the scope of the paper and an interesting topic for future theoretical and empirical research.


In the previous section we have defined existing specifications of conditional variance matrices that enter the definition either of a data generating process (DGP) or of a model to be estimated. In Section 3.1 we discuss maximum likelihood (ML) estimation of these models, and in Section 3.2 we explain a two-step approach for estimating conditional correlation models. Finally, we review briefly various issues related to practical estimation in Section 3.3.

3.1. Maximum Likelihood

Suppose the vector stochastic process {yt} (for t = 1, …, T) is a realization of a DGP whose conditional mean, conditional variance matrix and conditional distribution are respectively µt(θ0), Ht(θ0) and p(yt|ζ0, It−1), where ζ0 = (θ0, η0) is an r-dimensional parameter vector and η0 is the vector that contains the parameters of the distribution of the innovations zt (there may be no such parameter). Importantly, to justify the choice of the estimation procedure, we assume that the model to be estimated encompasses the true formulations of µt(θ0) and Ht(θ0).

The procedure most often used in estimating θ0 involves the maximization of a likelihood function constructed under the auxiliary assumption of an i.i.d. distribution for the standardized innovations zt. The i.i.d. assumption may be replaced by the weaker assumption that zt is a martingale difference sequence with respect to It−1, but this type of assumption does not translate into the likelihood function. The likelihood function for the i.i.d. case can then be viewed as a quasi-likelihood function.

Consequently, one has to make an additional assumption on the innovation process by choosing a density function, denoted g(zt(θ)|η), where η is a vector of nuisance parameters. The problem to solve is thus to maximize the sample loglikelihood function LT(θ, η) for the T observations (conditional on some starting values for µ0 and H0), with respect to the vector of parameters ζ = (θ, η), where

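$$L_T(\theta, \eta) = \sum_{t=1}^{T} \log f(y_t \mid \zeta, I_{t-1}) \tag{49}$$

with

$$f(y_t \mid \zeta, I_{t-1}) = |H_t|^{-1/2}\, g\left(H_t^{-1/2}(y_t - \mu_t) \mid \eta\right) \tag{50}$$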

and the dependence with respect to θ occurs through µt and Ht. The term |Ht|^(−1/2) is the Jacobian that arises in the transformation from the innovations to the observables. Note that unless g(·) belongs to the class of elliptical distributions, i.e. is a function of z′tzt, the ML estimator depends on the choice of decomposition of Ht^(1/2), since zt = Ht^(−1/2)(yt − µt).

The most commonly employed distribution in the literature is the multivariate normal, uniquely determined by its first two moments (so that ζ = θ since η is empty). In this case, the sample loglikelihood is (up to a constant):

$$L_T(\theta) = -\frac{1}{2}\sum_{t=1}^{T}\left[\log|H_t| + (y_t - \mu_t)' H_t^{-1} (y_t - \mu_t)\right] \tag{51}$$
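Once the sequences µt and Ht have been computed for a given θ, evaluating this Gaussian loglikelihood is straightforward. The sketch below assumes the standard form, minus one half of the sum over t of log|Ht| plus the quadratic form in the residuals, with the constant omitted. The function name and array layout are hypothetical.

```python
import numpy as np

def gaussian_quasi_loglik(y, mu, H):
    """Gaussian loglikelihood up to a constant.
    y, mu: arrays of shape (T, N); H: array of shape (T, N, N)."""
    total = 0.0
    for y_t, mu_t, H_t in zip(y, mu, H):
        e = y_t - mu_t
        _, logdet = np.linalg.slogdet(H_t)                   # log|H_t|, stable for small determinants
        total += -0.5 * (logdet + e @ np.linalg.solve(H_t, e))  # solve avoids forming the inverse
    return total
```

In practice this function would be wrapped in a numerical optimizer over θ, with µt and Ht recomputed by the model recursion at each evaluation.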

It is well known that the normality of the innovations is rejected in most applications dealing with daily or weekly data. In particular, the kurtosis of most financial asset returns is larger than three, which means that they have too many extreme values to be normally distributed. Moreover, their unconditional distribution often has fatter tails than what is implied by a conditional normal distribution: the increase of the kurtosis coefficient brought by the dynamics of the conditional variance is not usually sufficient to match adequately the unconditional kurtosis of the data.

However, as shown by Bollerslev and Wooldridge (1992), a consistent estimator of θ0 may be obtained by maximizing (51) with respect to θ even if the DGP is not conditionally Gaussian. This estimator, called (Gaussian) quasi-maximum likelihood (QML) or pseudo-maximum likelihood (PML) estimator, is consistent provided the conditional mean and the conditional variance are specified correctly. Jeantheau (1998) proves the strong consistency of the Gaussian QML estimator of multivariate GARCH models. He also provides sufficient identification conditions for the CCC model. See Gourieroux (1997) for a detailed description of the QML method in an MGARCH context and its asymptotic properties. For these reasons and as far as the purpose of the analysis is to estimate consistently the first two conditional moments, estimating MGARCH models by QML is justified.

Nevertheless, in certain situations it is desirable to search for a better distribution for the innovation process. For instance, when one is interested in obtaining density forecasts (see Diebold et al., 1998, in the univariate case and Diebold et al., 1999, in the multivariate case) it is natural to relax the normality assumption, keeping in mind the risk of inconsistency of the estimator (see Newey and Steigerwald, 1997).

A natural alternative to the multivariate Gaussian density is the Student density, see Harvey et al. (1992) and Fiorentini et al. (2003). The latter has an extra scalar parameter, the degrees of freedom parameter, denoted ν hereafter. When this parameter tends to infinity, the Student density tends to the normal density. When it tends to zero, the tails of the density become thicker and thicker. The parameter value indicates the order of existence of the moments, e.g. if ν = 2, the second-order moments do not exist, but the first-order moments exist. For this reason, it is convenient (although not necessary) to assume that ν > 2, so that Ht is always interpretable as a conditional covariance matrix. Under this assumption, the Student density can be defined as:

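$$g(z_t \mid \theta, \nu) = \frac{\Gamma\left(\frac{\nu+N}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)[\pi(\nu-2)]^{N/2}}\left[1 + \frac{z_t'z_t}{\nu-2}\right]^{-\frac{N+\nu}{2}} \tag{52}$$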

where Γ(·) is the Gamma function. Note that in this case η = ν. The density function of yt is easily obtained by applying (50).
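The log of the standardized Student density can be evaluated as in the following sketch. It assumes the usual parametrization with ν > 2 in which zt has identity covariance matrix, so the kernel is [1 + z′tzt/(ν − 2)] raised to the power −(N + ν)/2. A Gaussian counterpart is included to illustrate the limiting case; both function names are hypothetical.

```python
import math
import numpy as np

def student_log_density(z, nu):
    """Log density of a standardized N-variate Student vector, nu > 2."""
    z = np.asarray(z, dtype=float)
    N = z.size
    log_const = (math.lgamma((nu + N) / 2) - math.lgamma(nu / 2)
                 - 0.5 * N * math.log(math.pi * (nu - 2)))
    return log_const - 0.5 * (N + nu) * math.log1p(z @ z / (nu - 2))

def gaussian_log_density(z):
    """Log density of a standard N-variate normal vector, for comparison."""
    z = np.asarray(z, dtype=float)
    return -0.5 * z.size * math.log(2 * math.pi) - 0.5 * (z @ z)
```

For very large ν the two functions agree closely, while for small ν the Student density assigns much more mass to extreme values of zt.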

The relevance of the Student distribution may be questioned when the innovations are found to be skewed. To account for both the skewness and the excess kurtosis in returns, an MGARCH model can be combined with a multivariate density for the innovations, which is skewed and has fat tails. Densities used in this context are mixtures of multivariate normal densities (see Vlaar and Palm, 1993), the generalized hyperbolic distribution (see Barndorff-Nielsen and Shephard, 2001, for the density and Mencía and Sentana, 2003, for a recent application to an MGARCH context) and a multivariate skew-Student density (see Bauwens and Laurent, 2002). The latter authors show, in applications to several portfolios of stocks and currencies, that the multivariate skew-Student density improves the quality of out-of-sample Value-at-Risk forecasts, by comparison with a symmetric density.

Alternatively, Hafner and Rombouts (2004) propose a semi-parametric estimation technique, extending the previous work of Engle and González-Rivera (1991) and Drost and Klaassen (1997) to MGARCH models. This consists of first estimating the model by QML, which provides consistent estimates of the innovations. In a second step, these are used to estimate the function g(·) nonparametrically. Finally, the parameters of the GARCH model are estimated using ĝ(·) to define the likelihood function.

The asymptotic properties of ML and QML estimators in multivariate GARCH models are not yet firmly established, and are difficult to derive from low level assumptions. As mentioned previously, consistency has been shown by Jeantheau (1998). Asymptotic normality of the QMLE is not established generally. Gourieroux (1997, section 6.3) proves it for a general formulation using high level assumptions. Comte and Lieberman (2003) prove it for the BEKK formulation. Since F-GARCH and (G)O-GARCH models are special cases of the BEKK model, this result holds also for these two models (see van der Weide, 2002). Researchers who use MGARCH models have generally proceeded as if asymptotic normality holds in all cases. Asymptotic normality of the MLE and QMLE has been proved in the univariate case under low level assumptions, one of which is the existence of moments of order four or higher of the innovations (see Lee and Hansen, 1994; Lumsdaine, 1996; Ling and McAleer, 2003). However, Hall and Yao (2003) show that the asymptotic distribution of the QMLE in the univariate GARCH(p, q) model is not normal, but is a multivariate stable distribution (with fatter tails than the normal) if the innovations are in the domain of attraction of a stable law with exponent smaller than two (implying nonexisting fourth moments). Extension of this result to the multivariate case is a subject for further research.

Finally, it is worth mentioning that the conditional mean parameters may be consistently estimated in a first stage, prior to the estimation of the conditional variance parameters, for example for a VARMA model, but not for a GARCH-in-mean model. Estimating the parameters simultaneously with the conditional variance parameters would increase the efficiency at least in large samples (unless the asymptotic covariance matrix is block diagonal between the mean and variance parameters), but this is computationally more difficult. For this reason, one usually takes either a very simple model for the conditional mean or one considers yt − µ̂t as the data for fitting the MGARCH model. A detailed investigation of the consequences of such a two-step procedure on properties of estimators has still to be conducted. Conditions for block diagonality of the asymptotic covariance matrix also have to be worked out (generalizing results of Engle, 1982 for the univariate case).

3.2. Two-Step Estimation

A useful feature of the DCC models presented earlier is that they can be estimated consistently using a two-step approach. Engle and Sheppard (2001) show that in the case of a DCCE model, the loglikelihood can be written as the sum of a mean and volatility part (depending on a set of unknown parameters θ1*) and a correlation part (depending on θ2*).

Indeed, recalling that the conditional variance matrix of a DCC model can be expressed as Ht = DtRtDt, an inefficient but consistent estimator of the parameter θ1* can be found by replacing Rt by the identity matrix in (51). In this case the quasi-loglikelihood function corresponds to the sum of loglikelihood functions of N univariate models:

$$QL1_T(\theta_1^*) = -\frac{1}{2}\sum_{t=1}^{T}\sum_{i=1}^{N}\left[\log h_{iit} + \frac{(y_{it}-\mu_{it})^2}{h_{iit}}\right] \tag{53}$$

Given θ̂1* and under appropriate regularity conditions, a consistent, but inefficient, estimator of θ2* can be obtained by maximizing:

$$QL2_T(\theta_2^* \mid \hat\theta_1^*) = -\frac{1}{2}\sum_{t=1}^{T}\left(\log|R_t| + u_t' R_t^{-1} u_t\right) \tag{54}$$

where ut = Dt^(−1)(yt − µt). The sum of the likelihood functions in (53) and (54), plus half of the total sum of squared standardized residuals (∑t u′tut/2, which is almost equal to NT/2), is equal to the loglikelihood in (51). It is thus possible to compare the loglikelihood of the two-step approach with that of the one-step approach and of other models.
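The decomposition can be verified numerically. The sketch below assumes that (53) is minus one half of the sum of log hiit plus the squared standardized residuals, that (54) is minus one half of the sum of log|Rt| plus u′tRt^(−1)ut with ut the residuals divided by their conditional standard deviations, and that (51) is the Gaussian loglikelihood up to a constant. The function name and array layout are hypothetical.

```python
import numpy as np

def loglik_parts(eps, D, R):
    """eps: (T, N) residuals y_t - mu_t; D: (T, N) conditional standard deviations;
    R: (T, N, N) conditional correlation matrices.
    Returns the volatility part, the correlation part, the full loglikelihood
    and the total sum of squared standardized residuals."""
    ql1 = ql2 = full = uu = 0.0
    for e, d, R_t in zip(eps, D, R):
        u = e / d                                    # standardized residuals u_t
        H_t = np.outer(d, d) * R_t                   # H_t = D_t R_t D_t
        ql1 += -0.5 * (np.sum(np.log(d ** 2)) + u @ u)
        ql2 += -0.5 * (np.linalg.slogdet(R_t)[1] + u @ np.linalg.solve(R_t, u))
        full += -0.5 * (np.linalg.slogdet(H_t)[1] + e @ np.linalg.solve(H_t, e))
        uu += u @ u
    return ql1, ql2, full, uu
```

For any admissible inputs, `ql1 + ql2 + 0.5 * uu` coincides with `full` up to rounding error, because log|Ht| = 2 log|Dt| + log|Rt| and the quadratic form in Ht equals the quadratic form in Rt evaluated at ut.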

Engle and Sheppard (2001) explain that the estimators θ̂1* and θ̂2*, obtained by maximizing (53) and (54) separately, are not fully efficient (even if zt is normally distributed) since they are limited information estimators. However, one iteration of a Newton–Raphson algorithm applied to the total likelihood (51), starting at θ̂ = (θ̂1*, θ̂2*), provides an estimator that is asymptotically efficient.

Another two-step approach for the diagonal VEC model is proposed by Ledoit et al. (2003). To avoid estimating c, A and G jointly, they estimate each variance and covariance equation separately. The resulting estimates do not necessarily guarantee positive semi-definite Ht's. Therefore, in a second step, the estimates are transformed in order to achieve the requirement, keeping the disruptive effects as small as possible. The transformed estimates are still consistent with respect to the parameters of the DVEC model.

3.3. Various Issues

Analytical vs. Numerical Score

Typically, for conditionally heteroscedastic models, numerical techniques are used to approximate the derivatives of the loglikelihood function (the score) with respect to the parameter vector. As shown by Fiorentini et al. (1996) and McCullough and Vinod (1999), in a univariate framework, using analytical scores in the estimation procedure improves the numerical accuracy of the resulting estimates and speeds up ML estimation. According to Hafner and Rombouts (2004), the score vector corresponding to a term of (49) takes the form

equation image(55)

where st(ζ) = ∂logf(yt|ζ, It−1)/∂ζ, vec(·) is the operator that stacks the columns of an M × N matrix into an MN × 1 vector, DN is the duplication matrix defined so that DNvech(A) = vec(A) for every symmetric matrix A of order N and DN+ is its generalized inverse. As pointed out by a referee, one sees from (55) that the choice of the square root matrix Ht^(1/2) has consequences for the exact form of the score.

In this respect, Lucchetti (2002) proposes a closed-form expression of the score vector for the BEKK model with a Gaussian loglikelihood. Hafner and Herwartz (2003) also provide analytical formulae for the score and the Hessian of a general MGARCH model in a QML framework and propose two methods to estimate the expectation of the Hessian. The authors show in a simulation study that analytical derivatives clearly outperform numerical methods.

Variance Targeting

We have seen that what renders most MGARCH models difficult for estimation is their high number of parameters. A simple trick to ensure a reasonable value of the model-implied unconditional covariance matrix, which also helps to reduce the number of parameters in the maximization of the likelihood function, is referred to as variance targeting by Engle and Mezrich (1996). For example, in the VEC model (and all its particular cases), the conditional variance matrix may be expressed in terms of the unconditional variance matrix (see earlier) and other parameters. Doing so one can reparametrize the model using the unconditional variance matrix and replace it by a consistent estimator (before maximizing the likelihood). When doing this, one should correct the covariance matrix of the estimator of the other parameters for the uncertainty in the preliminary estimator. In DCC models, this can also be done with the constant matrix of the correlation part, e.g. in (40). In this case, the two-step estimation procedure explained in Section 3.2 becomes a three-step procedure.
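The idea can be sketched for a deliberately simple case. The recursion Ht = (1 − a − b)S + aϵt−1ϵ′t−1 + bHt−1 used below is an illustrative scalar special case chosen for this sketch, not a model defined in this section. S is the sample covariance matrix of the residuals, which replaces the constant term, so that only a and b remain to be estimated.

```python
import numpy as np

def targeted_filter(eps, a, b):
    """Variance-targeted filter for an assumed scalar recursion.
    eps: (T, N) array of mean-zero residuals; requires a, b >= 0 and a + b < 1."""
    eps = np.asarray(eps, dtype=float)
    T, N = eps.shape
    S = eps.T @ eps / T                       # consistent estimator of the unconditional variance
    H = np.empty((T, N, N))
    H[0] = S                                  # assumed starting value
    for t in range(1, T):
        H[t] = (1 - a - b) * S + a * np.outer(eps[t - 1], eps[t - 1]) + b * H[t - 1]
    return S, H
```

With S positive definite and the stated restrictions on a and b, every Ht is positive definite by construction. Standard errors for a and b computed as if S were known would ignore the first-step uncertainty mentioned above.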

Imposing or Not the Positivity Constraints

A key problem in MGARCH models is that the conditional variance matrix has to be positive definite almost surely for all t. As shown in the previous section this is done by constraining the parameter space (for instance by using a constrained optimization algorithm), assuming that the constraints are known. However, these constraints are usually sufficient but not necessary. For instance, we know since Nelson and Cao (1992) that imposing ωi > 0 and αi, βi ≥ 0 in (34) is overly restrictive and that negative values of αi and βi are not incompatible with a positive conditional variance. If one imposes positivity restrictions to facilitate estimation, one incurs the risk of rejecting θ0 from the parameter space.


Brooks et al. (2003) review the relatively small number of software packages that are currently available for estimating MGARCH models. It is obvious that the development of MGARCH models in standard econometric packages is still in its infancy, and that further developments would greatly help applied researchers who cannot afford to program the estimation of a particular model, but who would rather try several models and distributions.


Since estimating MGARCH models is time-consuming, both in terms of computations and their programming (if needed), it is desirable to check ex ante whether the data present evidence of multivariate ARCH effects. Ex post, it is also of crucial importance to check the adequacy of the MGARCH specification. However, compared to the huge body of diagnostic tests devoted to univariate models, few tests are specific to multivariate models.

In the current literature on MGARCH models, one can distinguish two kinds of specification tests, namely univariate tests applied independently to each series and multivariate tests applied to the vector series as a whole. We deliberately leave out the first kind of tests and refer interested readers to surveys of univariate ARCH processes (see Section 1). As emphasized by Kroner and Ng (1998), the existing literature on multivariate diagnostics is sparse compared to the univariate case. However, although univariate tests can provide some guidance, contemporaneous correlation of disturbances entails that statistics from individual equations are not independent. As a result, combining test decisions over all equations raises size control problems, so the need for joint testing naturally arises (Dufour et al., 2003).

Since the dynamics of the series is assumed to be captured by the model (at least in the first two conditional moments), the standardized error term ẑt = Ĥt^(−1/2)ϵ̂t should obey the following moment conditions (see Ding and Engle, 2001):13

  • (A) E(ztz′t) = IN
  • (B) Cov(z²it, z²jt) = 0, for all i ≠ j
  • (C) Cov(z²it, z²j,t−k) = 0, for k > 0

While testing A has power to detect misspecification in the conditional mean, testing B is suited to check if the conditional distribution is Gaussian, which could be false even if Ht is correctly specified. In contrast, testing C aims at checking the adequacy of the dynamic specification of Ht, regardless of the validity of the assumption about the distribution of zt. Ding and Engle (2001) show that if the true conditional distribution is the multivariate Student described in (52), Cov(z²it, z²jt) = 2/(ν − 4), for i ≠ j, which is different from 0 when 1/ν ≠ 0 (the non-Gaussian case). Moreover, starting from a conditionally homoscedastic multivariate regression model (i.e. Ht = H), testing C is equivalent to testing the presence of ARCH effects in the data. Provided that a sufficient number of moments exist (which is not always the case), testing conditions A–C could be done using the conditional moment test principle of Newey (1985) and Tauchen (1985).

A quite different approach aims at checking the overall adequacy of a model, i.e. the coincidence of the assumed density f(yt|θ, η, It−1) and the true density p(yt|ζ0, It−1). Diebold et al. (1998) (in the univariate case) and Diebold et al. (1999) (in the multivariate case) propose an elegant and practical procedure based on the concept of density forecasts. For more details about density forecasts and their applications in finance, see the special issue of the Journal of Forecasting (Timmermann, 2000).

As mentioned by Tse (2002), diagnostics for conditional heteroscedasticity models applied in the literature can be divided into three categories: portmanteau tests of the Box–Pierce–Ljung type, residual-based diagnostics and Lagrange multiplier tests.

4.1. Portmanteau Statistics

The most widely used diagnostics to detect ARCH effects are probably the Box–Pierce/Ljung–Box portmanteau tests. Following Hosking (1980), a multivariate version of the Ljung–Box test statistic is given by:

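$$HM(M) = T^2 \sum_{j=1}^{M} (T - j)^{-1}\, \mathrm{tr}\left\{ C_{Y_t}^{-1}(0)\, C_{Y_t}(j)\, C_{Y_t}^{-1}(0)\, C_{Y_t}'(j) \right\} \qquad (56)$$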

where $Y_t = \mathrm{vech}(y_t y_t')$ and $C_{Y_t}(j)$ is the sample autocovariance matrix of order $j$. Under the null hypothesis of no ARCH effects, $HM(M)$ is distributed asymptotically as $\chi^2(K^2 M)$. Duchesne and Lalancette (2003) generalize this statistic using a spectral approach and obtain higher asymptotic power by using a different kernel than the truncated uniform kernel used in $HM(M)$. This test is also used to detect misspecification in the conditional variance matrix $H_t$, by replacing $y_t$ by $\hat z_t = \hat H_t^{-1/2} \hat\epsilon_t$. The asymptotic distribution of the portmanteau statistics is, however, unknown in this case since $\hat z_t$ has been estimated. Furthermore, ad hoc adjustments of degrees of freedom for the number of estimated parameters have no theoretical justification. In such a case, portmanteau tests should be interpreted with care even if simulation results reported by Tse and Tsui (1999) suggest that they provide a useful diagnostic in many situations.
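A minimal sketch of a Hosking-type multivariate portmanteau statistic, applied to the vech of the outer products of the data, may help fix ideas. It assumes the standard form $T^2 \sum_{j=1}^{M} (T-j)^{-1} \mathrm{tr}\{C^{-1}(0) C(j) C^{-1}(0) C'(j)\}$ with mean-centered sample autocovariances, and it takes $K$ to be the dimension of $Y_t$. The function names and the simulated input are hypothetical.

```python
import numpy as np

def vech_outer(y):
    """Rows are the lower-triangular elements of y_t y_t' for each t.

    The element ordering differs from column-stacking vech, but the
    statistic below is invariant to a permutation of the components.
    """
    y = np.asarray(y, dtype=float)
    rows, cols = np.tril_indices(y.shape[1])
    return y[:, rows] * y[:, cols]

def hosking_statistic(Y, M):
    """Hosking-type portmanteau statistic and its nominal degrees of freedom."""
    Y = np.asarray(Y, dtype=float)
    T, K = Y.shape
    Yc = Y - Y.mean(axis=0)
    C0_inv = np.linalg.inv(Yc.T @ Yc / T)
    total = 0.0
    for j in range(1, M + 1):
        Cj = Yc[j:].T @ Yc[:-j] / T          # sample autocovariance of order j
        total += np.trace(C0_inv @ Cj @ C0_inv @ Cj.T) / (T - j)
    return T ** 2 * total, K * K * M

# Hypothetical input: i.i.d. Gaussian data, i.e. no ARCH effects.
rng = np.random.default_rng(1)
y = rng.standard_normal((2000, 2))
Y = vech_outer(y)
stat, df = hosking_statistic(Y, M=5)
```

The returned statistic would be compared with a $\chi^2$ critical value with `df` degrees of freedom when applied to raw data. As noted above, that reference distribution is not justified when the input consists of estimated standardized residuals.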

Ling and Li (1997) propose an alternative portmanteau statistic for multivariate conditional heteroscedasticity. They define the sample lag-h (transformed) residual autocorrelation as:

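$$\tilde R(h) = \frac{\sum_{t=h+1}^{T} \left(\hat\epsilon_t' \hat H_t^{-1} \hat\epsilon_t - N\right)\left(\hat\epsilon_{t-h}' \hat H_{t-h}^{-1} \hat\epsilon_{t-h} - N\right)}{\sum_{t=1}^{T} \left(\hat\epsilon_t' \hat H_t^{-1} \hat\epsilon_t - N\right)^2} \qquad (57)$$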

Their test statistic is given by $LL(M) = T \sum_{h=1}^{M} \tilde R^2(h)$ and is asymptotically distributed as $\chi^2(M)$ under the null of no conditional heteroscedasticity. In the derivation of the asymptotic results, normality of the innovation process is not assumed. The statistic is thus robust with regard to the distribution choice. Tse and Tsui (1999) show that there is a loss of information in the transformation $\hat\epsilon_t' \hat H_t^{-1} \hat\epsilon_t$ of the residuals and the test may suffer from a power reduction. Furthermore, Duchesne and Lalancette (2003) argue that if $M$ is chosen inappropriately, the resulting test statistic may be quite inefficient (the same comment applies to the residual-based tests presented below). For these reasons, these authors propose a more powerful version of the $LL(M)$ test based on the spectral density of the stochastic process $\hat\epsilon_t' \hat H_t^{-1} \hat\epsilon_t$, which is i.i.d. under the null of homoscedasticity. Interestingly, since their test is based on a spectral density estimator, a data-dependent choice of $M$ is available.
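The following sketch shows how a statistic of this type could be computed from residuals and fitted conditional covariance matrices. It assumes that the transformed residual is the quadratic form $\hat\epsilon_t' \hat H_t^{-1} \hat\epsilon_t$ centered at $N$, that the lag-$h$ autocorrelation is the usual ratio of sums, and that the statistic is $T$ times the sum of the first $M$ squared autocorrelations. The function name and the inputs are hypothetical.

```python
import numpy as np

def ling_li_statistic(resid, H, M):
    """Portmanteau statistic on transformed residuals (illustrative sketch).

    resid : T x N array of residuals.
    H     : T x N x N array of fitted conditional covariance matrices.
    M     : number of lags.
    """
    resid = np.asarray(resid, dtype=float)
    H = np.asarray(H, dtype=float)
    T, N = resid.shape
    # Solve H_t x_t = e_t for each t, then q_t = e_t' H_t^{-1} e_t - N.
    x = np.linalg.solve(H, resid[..., None])[..., 0]
    q = np.einsum("ti,ti->t", resid, x) - N
    denom = np.sum(q ** 2)
    # Lag-h autocorrelations of the centered quadratic forms.
    R = np.array([np.sum(q[h:] * q[:-h]) / denom for h in range(1, M + 1)])
    return T * np.sum(R ** 2), M

# Hypothetical input: homoscedastic Gaussian residuals with H_t = I_N.
rng = np.random.default_rng(2)
T, N = 3000, 2
resid = rng.standard_normal((T, N))
H = np.broadcast_to(np.eye(N), (T, N, N)).copy()
stat, df = ling_li_statistic(resid, H, M=4)
```

Solving the linear systems rather than inverting each $H_t$ explicitly is a numerical convenience and does not change the statistic.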

4.2. Residual-Based Diagnostics

These tests involve running regressions of the cross-products of the standardized residuals ($\hat u_t$) on some explanatory variables and testing for the statistical significance of the regression coefficients. The key problem is that the regressors (transformed residuals) are obtained after estimating a first model, and so depend on estimated parameters with their own uncertainty; the usual OLS theory therefore does not apply. The contribution of Tse (2002) is to establish the asymptotic distribution of the OLS estimator in this context. Let us define $\hat u_{it}$ as the $i$th ($i = 1, \dots, N$) standardized residual at time $t$ and $\hat\rho_{ijt}$ as the estimated conditional correlation between $y_{it}$ and $y_{jt}$. Tse (2002) proposes to run the following regressions:

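$$\hat u_{it}^2 - 1 = \hat d_{it}'\, \delta_i + \xi_{it} \qquad (58)$$
$$\hat u_{it} \hat u_{jt} - \hat\rho_{ijt} = \hat d_{ijt}'\, \delta_{ij} + \xi_{ijt} \qquad (59)$$

where $\hat d_{it}$ and $\hat d_{ijt}$ are the estimated counterparts of, respectively, $d_{it} = (u_{i,t-1}^2, \dots, u_{i,t-M}^2)'$ and $d_{ijt} = (u_{i,t-1} u_{j,t-1}, \dots, u_{i,t-M} u_{j,t-M})'$, and $\delta_i$ and $\delta_{ij}$ are the regression coefficients. The choice of the regressors may be changed depending on the particular type of model inadequacy one wants to investigate. An advantage of the residual-based diagnostics is that they focus on several distinctive aspects of possible causes of ‘remaining’ ARCH effects.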

Tse (2002) shows that under reasonable assumptions the statistics equation image and equation image are each asymptotically distributed as χ2(M) under the null of correct specification of the first two conditional moments, where

equation image(60)
equation image(61)
equation image(62)

and assuming that under certain conditions, the MLE estimate of θ satisfies the condition equation image. Naturally, to compute these statistics, one has to replace the unobservable components by their estimated counterparts.

4.3. Lagrange Multiplier Tests

Lagrange multiplier tests usually have a higher power than portmanteau tests when the alternative is correct (although they can be asymptotically equivalent in certain cases), but they may have low power against other alternatives. Bollerslev et al. (1988) and Engle and Kroner (1995), among others, have developed LM tests for MGARCH models. Recently, Sentana and Fiorentini (2001) have developed a simple preliminary test for ARCH effects in common factor models.

To reduce the number of parameters in the estimation of MGARCH models, it is usual to introduce restrictions. For instance, the CCC model of Bollerslev (1990) assumes that the conditional correlation matrix is constant over time. It is then desirable to test this assumption afterwards. Tse (2000) proposes a test for constant correlations. The null is $h_{ijt} = \rho_{ij} \sqrt{h_{iit} h_{jjt}}$, where the conditional variances are GARCH(1,1); the alternative is $h_{ijt} = \rho_{ijt} \sqrt{h_{iit} h_{jjt}}$. The test statistic is an LM statistic which under the null is asymptotically $\chi^2(N(N-1)/2)$. Bera and Kim (2002) also develop a test for constancy of the correlation parameters in the CCC model of Bollerslev (1990). It is an information matrix-type test that, besides constant correlations, examines at the same time various features of the specified model. An alternative test has been proposed by Longin and Solnik (1995).

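Engle and Sheppard (2001) propose another test of the constant correlation hypothesis, in the spirit of the DCC models presented earlier. The null $H_0: R_t = \bar R \;\; \forall t$ is tested against the alternative $H_1: \mathrm{vech}(R_t) = \mathrm{vech}(\bar R) + \beta_1^* \mathrm{vech}(R_{t-1}) + \dots + \beta_p^* \mathrm{vech}(R_{t-p})$. The test is easy to implement since $H_0$ implies that the coefficients in the regression $X_t = \beta_0^* + \beta_1^* X_{t-1} + \dots + \beta_p^* X_{t-p} + u_t^*$ are equal to zero, where $X_t = \mathrm{vech}^u(\hat u_t \hat u_t' - I_N)$, $\mathrm{vech}^u$ is like the vech operator but selects only the elements under the main diagonal, $\hat u_t = \hat{\bar R}^{-1/2} \hat D_t^{-1} \hat\epsilon_t$ is the $N \times 1$ vector of standardized residuals (under the null), and $D_t = \mathrm{diag}(h_{11t}^{1/2}, \dots, h_{NNt}^{1/2})$.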
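The regression idea can be sketched as follows. This is a simplified pooled version, not necessarily the exact procedure of Engle and Sheppard (2001): it assumes that the standardized residuals are already available, stacks the below-diagonal cross-products, and regresses them on a constant and $p$ of their own lags with scalar coefficients. The Wald-type statistic and its $p + 1$ degrees of freedom are assumptions of this sketch, and all names are hypothetical.

```python
import numpy as np

def vechu_products(u):
    """Below-diagonal elements of u_t u_t' - I_N, i.e. u_it * u_jt for i > j."""
    u = np.asarray(u, dtype=float)
    rows, cols = np.tril_indices(u.shape[1], k=-1)
    return u[:, rows] * u[:, cols]

def constant_correlation_regression(u, p):
    """Pooled OLS of the cross-products on a constant and p own lags."""
    X = vechu_products(u)                      # T x N(N-1)/2
    T = X.shape[0]
    y = X[p:].ravel()                          # dependent variable, t = p, ..., T-1
    lags = [X[p - l:T - l].ravel() for l in range(1, p + 1)]
    Z = np.column_stack([np.ones_like(y)] + lags)
    delta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ delta
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    # Wald-type statistic for "all coefficients are zero" (assumed form).
    stat = delta @ (Z.T @ Z) @ delta / sigma2
    return delta, stat, p + 1

# Hypothetical input: i.i.d. Gaussian residuals, so correlations are constant.
rng = np.random.default_rng(3)
u = rng.standard_normal((2000, 3))
delta, stat, df = constant_correlation_regression(u, p=2)
```

Under constant correlations the estimated coefficients should all be close to zero, so a large statistic relative to a $\chi^2$ reference with `df` degrees of freedom would point towards time-varying correlations.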


The main purpose of this paper is to review MGARCH models. Since the seminal paper of Engle (1982), much progress has been made in understanding GARCH models and their multivariate extensions. As mentioned in Section 1, these models are increasingly used in applied financial econometrics. Given the large and increasing variety of existing models, an applied econometrician is confronted with the issue of choosing among them for each particular application. A related question is ‘which model is most appropriate under which circumstances?’

Applied research has naturally followed theoretical research and used existing models. Very few papers compare different MGARCH models on the same problem and data.14 The discussion paper version of this article (see Bauwens et al., 2003) contains a detailed review of the application fields and gives an idea of which models have been used for each field. This reveals that, to a large extent, when a model came out in the literature, applied researchers soon started to try it, discarding the previous models. This partly reflects the fact that parsimonious models were introduced progressively to overcome the difficulty of estimating the VEC model.

In our opinion, the crucial point in MGARCH modelling is to provide a realistic but parsimonious specification of the variance matrix ensuring its positivity. There is a dilemma between flexibility and parsimony. BEKK models are flexible but require too many parameters for multiple time series of more than four elements. Diagonal VEC and BEKK models are much more parsimonious but very restrictive for the cross-dynamics. They are not suitable if volatility transmission is the object of interest, but they usually do a good job in representing the dynamics of variances and covariances. This may be sufficient for some applications like asset pricing models.

In contrast, factor GARCH models allow the conditional variances and covariances to depend on the past of all variances and covariances, but they imply common persistence in all these elements. In this respect, the DCC models allow for different persistence between variances and correlations, but impose common persistence in the latter (although this may be relaxed). They open the door to handling more than a very small number of series. They are an extension of the CCC model which is relatively easy to estimate.

One way to deal with flexible but heavily parametrized models is to keep a flexible functional form and to reduce the number of parameters by imposing restrictions. An example of this approach is provided in Engle et al. (1990a), who impose restrictions in a VEC model. We conjecture that researchers will propose new ways to impose restrictions. An idea is to base the restrictions on preliminary (easy to obtain) estimates, in the spirit of Ledoit et al. (2003) or Bauwens and Rombouts (2003).

Finally, here is a list of open issues/research topics (stated in a condensed way since they have been discussed in this survey):

  • 1. Improving software for inference (this is a prerequisite for progress in applications).
  • 2. Comparing the performance and assessing the financial value of different specifications in applications.15
  • 3. Implications of stability or not of a model class with respect to linear transformations.
  • 4. More flexible specifications for the dynamics of correlations of DCC models.
  • 5. Unconditional moments of correlations/covariances, marginalization and temporal aggregation in DCC models.
  • 6. Development of a copula tool for specification and inference.
  • 7. Impact of choice of the square root decomposition of $H_t$ on statistical procedures.
  • 8. Conditions for two-step efficient estimation (MGARCH on residuals of the mean model).
  • 9. Asymptotic properties of MLE (in particular low-level, easy-to-check, sufficient conditions for asymptotic normality when it holds).
  • 10. Further developments of multivariate diagnostic tests.

There is little doubt that progress on these issues would greatly contribute to the theory and practice of MGARCH models.


The authors would like to thank Christian Hafner and Roy van der Weide for useful comments. They are especially grateful to three referees and the editor (T. Bollerslev) for their detailed comments and numerous suggestions that helped to improve the paper. This text presents research results of the Belgian Programme on Interuniversity Poles of Attraction initiated by the Belgian State Prime Minister's Office, Science Policy Programming. Scientific responsibility is assumed by the authors.

  • 1

    Kearney and Patton (2000) and Karolyi (1995) exemplify such studies.

  • 2

    See Bollerslev (1990) and Longin and Solnik (1995).

  • 3

    See Kim (2000).

  • 4

    Note that although the GARCH parameters do not affect the conditional mean, the conditional mean parameters generally enter the conditional variance specification through the residuals.

  • 5

    If $A = (a_{ij})$ and $B = (b_{ij})$ are both $m \times n$ matrices, then $A \odot B$ is the $m \times n$ matrix containing the elementwise products $(a_{ij} b_{ij})$.

  • 6

    If v is a vector of dimension m then diag(v) is the m × m diagonal matrix with v on the main diagonal.

  • 7

    Gourieroux (1997, section 6.1) gives sufficient conditions for the positivity of Ht. These conditions are obtained by writing the model for the matrix Ht itself rather than for its vectorized version.

  • 8

    A 1 × m vector v ≠ 0 satisfying vA = λv for an m × m matrix A and a complex number λ is a left eigenvector of A corresponding to the eigenvalue λ, see Lütkepohl (1996, p. 256).

  • 9

    Note that when M = 1, Ψt−1 is equal to a matrix of ones.

  • 10

    Remember that $H_t^{1/2}$, and hence $z_t$, is not unique.

  • 11

    Nijman and Sentana (1996) and Meddahi and Renault (1996) study the issue of contemporaneous aggregation, i.e., the aggregation of independent univariate GARCH processes.

  • 12

    In a weak GARCH process, the dynamic equation for $h_t$ defines the best linear predictor of $\epsilon_t^2$ given the past of $\epsilon_t$. In a strong GARCH, $h_t$ is the conditional variance. See Drost and Nijman (1993).

  • 13

    The definition of the exact form of the square root matrix $H_t^{1/2}$ is deliberately left unspecified by Ding and Engle (2001). It is not known to what extent a particular choice has a consequence on the tests presented in this section.

  • 14

    Examples are Karolyi (1995), Bera et al. (1997), Kroner and Ng (1998), Engle and Sheppard (2001).

  • 15

    See Rombouts and Verbeek (2004) and Fleming et al. (2003) for recent examples.