Volume 35, Issue 1
RESEARCH ARTICLE
Open Access

Two are better than one: Volatility forecasting using multiplicative component GARCH‐MIDAS models

Christian Conrad

Department of Economics, Heidelberg University, Heidelberg, Germany

Onno Kleen

Corresponding Author

E-mail address: onno.kleen@awi.uni-heidelberg.de

Department of Economics, Heidelberg University, Heidelberg, Germany

Correspondence

Onno Kleen, Department of Economics, Heidelberg University, Bergheimer Strasse 58, 69115 Heidelberg, Germany.

Email: onno.kleen@awi.uni-heidelberg.de

First published: 02 November 2019
Citations: 4

Summary

We examine the properties and forecast performance of multiplicative volatility specifications that belong to the class of generalized autoregressive conditional heteroskedasticity–mixed‐data sampling (GARCH‐MIDAS) models suggested in Engle, Ghysels, and Sohn (Review of Economics and Statistics, 2013, 95, 776–797). In those models volatility is decomposed into a short‐term GARCH component and a long‐term component that is driven by an explanatory variable. We derive the kurtosis of returns, the autocorrelation function of squared returns, and the R2 of a Mincer–Zarnowitz regression and evaluate the QMLE and forecast performance of these models in a Monte Carlo simulation. For S&P 500 data, we compare the forecast performance of GARCH‐MIDAS models with a wide range of competitor models such as HAR (heterogeneous autoregression), realized GARCH, HEAVY (high‐frequency‐based volatility) and Markov‐switching GARCH. Our results show that the GARCH‐MIDAS based on housing starts as an explanatory variable significantly outperforms all competitor models at forecast horizons of 2 and 3 months ahead.

1 INTRODUCTION

The idea of modeling volatility as consisting of multiple components has a long tradition in financial econometrics (see, e.g., Ding & Granger, 1996; Engle & Lee, 1999). Early models typically featured additive volatility components and did not allow for explanatory variables in the conditional variance. More recently, the focus has shifted to multiplicative component models (see, e.g., Amado & Teräsvirta, 2013, 2017; Engle, Ghysels, & Sohn, 2013; Engle & Rangel, 2008; Han & Kristensen, 2015). In particular, the class of generalized autoregressive conditional heteroskedasticity–mixed‐data sampling (GARCH‐MIDAS) models proposed in Engle et al. (2013) has been proven to be useful for analyzing the link between financial volatility and the macroeconomic environment (see Asgharian, Hou, & Javed, 2013; Conrad & Loch, 2015; Dorion, 2016). In GARCH‐MIDAS, a unit‐variance GARCH component fluctuates around a time‐varying long‐term component that is a function of (macroeconomic or financial) explanatory variables. By allowing for a mixed‐frequency setting, this approach bridges the gap between daily stock returns and low‐frequency (e.g., monthly, quarterly) explanatory variables. For further applications of GARCH‐MIDAS‐type models see, for example, Conrad, Loch, and Rittler (2014), Opschoor, van Dijk, and van der Wel (2014), Dominicy and Vander Elst (2015), Lindblad (2017), Amendola, Candila, and Scognamillo (2017), Pan, Wang, Wu, and Yin (2017), Conrad, Custovic, and Ghysels (2018), and Borup and Jakobsen (2019). For a recent survey on multiplicative component models see Amado, Silvennoinen, and Teräsvirta (2019). Throughout this paper, the GARCH‐MIDAS model will be our leading example for a multiplicative component GARCH (M‐GARCH) model. 
However, we will also discuss how the class of M‐GARCH models nests other specifications such as the Markov‐switching GARCH (MS‐GARCH) of Haas, Mittnik, and Paolella (2004), the spline‐GARCH of Engle and Rangel (2008), and the multiplicative time‐varying GARCH (MTV‐GARCH) of Amado and Teräsvirta (2008).

Our contribution to this recent strand of literature is twofold. In the first part of this paper, we analyze several statistical properties of the GARCH‐MIDAS model that have not received much attention so far. In the second part of the paper, we compare the out‐of‐sample (OOS) forecast performance of GARCH‐MIDAS with the performance of various competitor models such as the heterogeneous autoregression (HAR) of Corsi (2009), the realized GARCH of Hansen, Huang, and Shek (2012), the high‐frequency‐based volatility (HEAVY) of Shephard and Sheppard (2010), and the MS‐GARCH.

Our main theoretical findings can be summarized as follows. In the GARCH‐MIDAS model, the kurtosis of the returns is always bigger than the kurtosis of the returns in the nested GJR‐GARCH (see Glosten, Jagannathan, & Runkle, 1993) component. If the long‐term component is sufficiently persistent, the autocorrelation function (ACF) of the squared returns as well as the ACF of the conditional variances is more persistent than the corresponding ACFs in the nested GJR‐GARCH. Both findings suggest a multiplicative component structure in the volatility of stock returns as a potential explanation for the common failure of simple one‐component GARCH models to adequately capture the stylized facts of returns and realized variances. It should also be noted that our results are remarkably similar to findings in Han (2015) on GARCH‐X models, even though Han considers models with an additive explanatory variable in the conditional variance and focuses on the asymptotic limit of the sample kurtosis and the sample ACF. Further, we derive an upper bound for the population R2 in the k‐step‐ahead Mincer and Zarnowitz (1969) regression (henceforth MZ regression) of the squared return on the volatility forecast. We show that the population R2 decreases monotonically in the forecast horizon but increases monotonically in the variability of the long‐term component. The latter feature leads to the unpleasant property that the goodness‐of‐fit is particularly high in situations in which the squared error loss is also high. Clearly, this finding questions the usefulness of the MZ R2 for comparing forecast accuracy across volatility regimes. In this context, we derive an explicit expression for the one‐step‐ahead R2 of the GARCH‐MIDAS specification and obtain the results from Andersen and Bollerslev (1998) for the simple GARCH(1,1) as a special case.

Empirically, we first evaluate the quasi‐maximum likelihood estimator (QMLE) of GARCH‐MIDAS models by means of a Monte Carlo simulation. We show that the QMLE is unbiased and that the asymptotic standard errors based on Wang and Ghysels (2015) are valid in the presence of exogenous explanatory variables. Further, we show that measurement error in the explanatory variable or a misspecification of the lag structure has only minor effects. We also confirm our theoretical result that the R2 of a MZ regression is highest in regimes with high volatility, although in those regimes forecast performance is the worst. Following the arguments put forth in Patton and Sheppard (2009) and Patton (2011), we use the QLIKE to evaluate the OOS forecast performance of the GARCH‐MIDAS model against the MS‐GARCH and the nested GARCH. We find that the correctly specified and, in most settings, even the misspecified GARCH‐MIDAS models beat the competitor models.

Finally, we apply the GARCH‐MIDAS model to a long time series of S&P 500 returns combined with data on US macroeconomic and financial conditions. We consider GARCH‐MIDAS models with one or two explanatory variables and, for the OOS forecast evaluation, estimate all models on a rolling window using the appropriate real‐time vintage data. Because macroeconomic time series are revised substantially after the first release, we avoid a “look‐ahead‐bias” by using real‐time data. In the OOS forecast evaluation, we compare the GARCH‐MIDAS with eight competitor models: Among those competitor models are the realized GARCH, the HEAVY, the MS‐GARCH, and HAR models with and without leverage. We evaluate all models jointly by constructing model confidence sets (MCS) as introduced in Hansen, Lunde, and Nason (2011). For forecast horizons of 2 weeks and 1 month, the MCS consists of the realized GARCH, the HAR, and GARCH‐MIDAS models with the CBOE Volatility Index (VIX) (or the VIX combined with another explanatory variable). That is, at these forecast horizons the GARCH‐MIDAS is on a par with those models but beats the HEAVY as well as MS‐GARCH models. At longer forecast horizons of 2 and 3 months ahead, only GARCH‐MIDAS models are included in the MCS. At both horizons the GARCH‐MIDAS based on housing starts achieves the lowest QLIKE. This finding is remarkable because our OOS period begins in 2010 and hence does not include the financial crisis and the collapse of the housing bubble.

To facilitate the replication of our results, we provide R packages for downloading real‐time data from the ALFRED database of the Federal Reserve Bank of St. Louis (see Kleen, 2017), as well as for estimating GARCH‐MIDAS models (see Kleen, 2018). [Footnote 1: The packages are available at https://cran.r-project.org/package=alfred and https://cran.r-project.org/package=mfGARCH.]

Our paper is organized as follows: In Section 2, the M‐GARCH model and our theoretical results are presented. In Section 3, we perform a simulation study and, in Section 4, we apply the GARCH‐MIDAS model to S&P 500 return data. The conclusion follows in Section 5. All proofs are contained in Supporting Information Appendix A of the Online Appendix. Additional material can be found in Appendices B–H.

2 THE MULTIPLICATIVE COMPONENT GARCH MODEL

In this section, the M‐GARCH model is introduced and its theoretical properties are derived. In particular, we show that the M‐GARCH model inherits certain time series properties that are in line with stylized facts typically observed for financial return data but cannot be captured by simple GARCH models.

2.1 Model specification

We denote daily log‐returns by ri,t, whereby the index t=1,…,T refers to a certain period (e.g., a week or a month) and the index i=1,…,It to days within that period. For simplicity, we model the returns as ri,t=μ+εi,t. [Footnote 2: It would be straightforward to allow for richer dynamics in the conditional mean. However, for daily return data a constant conditional mean is usually sufficient.] In the following, we refer to εi,t as the (demeaned) return. The M‐GARCH model assumes that the scaled (demeaned) returns can be written as
$$\frac{\varepsilon_{i,t}}{\sqrt{\tau_t}} = \sqrt{g_{i,t}}\, Z_{i,t} \tag{1}$$
where τt is specified as a function of a (low‐frequency) explanatory variable Xt, gi,t follows a GARCH equation, and Zi,t is an i.i.d. innovation process with mean zero and variance one. Let urn:x-wiley:jae:media:jae2742:jae2742-math-0004 denote the information set up to day i in period t and define urn:x-wiley:jae:media:jae2742:jae2742-math-0005. If τt depends on lagged values of Xt only, then
$$\sigma_{i,t}^2 = \tau_t\, g_{i,t} \tag{2}$$
is the conditional variance of the daily returns; that is, urn:x-wiley:jae:media:jae2742:jae2742-math-0007. We refer to gi,t as the short‐term component of volatility and to τt as the long‐term component of volatility. Whereas gi,t varies daily, τt is constant across all days within period t and thus changes at the lower frequency only. The short‐term component is intended to describe the well‐known day‐to‐day clustering of volatility and is assumed to follow a mean‐reverting unit‐variance GJR‐GARCH(1,1) process:
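$$g_{i,t} = \left(1-\alpha-\frac{\gamma}{2}-\beta\right) + \left(\alpha + \gamma\,\mathbb{1}_{\{\varepsilon_{i-1,t}<0\}}\right)\frac{\varepsilon_{i-1,t}^2}{\tau_t} + \beta\, g_{i-1,t} \tag{3}$$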

Remark 1. We use the convention that urn:x-wiley:jae:media:jae2742:jae2742-math-0009 and urn:x-wiley:jae:media:jae2742:jae2742-math-0010. Similarly, we can write the long‐term component as τi,t=τt for i=1,…,n and urn:x-wiley:jae:media:jae2742:jae2742-math-0011. That is, for It>1, τt is piecewise constant. If It=1, then both components vary at the same frequency. In this case we can write ε1,t=εt, g1,t=gt, ε0,t=ε1,t−1=εt−1, and g0,t=g1,t−1=gt−1. Thus we can drop the index i.
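To make the recursion concrete for the daily case It=1, the following Python sketch simulates returns from a two‐component model with a unit‐variance GJR‐GARCH(1,1) short‐term component. It is an illustration under stated assumptions rather than the authors' implementation: the function name `simulate_m_garch`, the Gaussian innovations, the start‐up values, and the deterministic switching pattern for τt (which is not a stationary process and is used only for illustration) are all choices made here.

```python
import math
import random


def simulate_m_garch(tau, alpha, beta, gamma=0.0, seed=0):
    """Simulate eps_t = sqrt(tau_t * g_t) * Z_t for I_t = 1 with Gaussian Z_t.

    g_t is a unit-variance GJR-GARCH(1,1) driven by the scaled return
    eps_{t-1} / sqrt(tau_{t-1}); `tau` is any sequence of positive values.
    """
    rng = random.Random(seed)
    omega = 1.0 - alpha - gamma / 2.0 - beta  # intercept implying E[g_t] = 1
    g_prev = 1.0       # start-up value (assumption): unconditional mean of g_t
    scaled_prev = 0.0  # start-up value (assumption) for eps_{t-1}/sqrt(tau_{t-1})
    eps, g = [], []
    for tau_t in tau:
        # Asymmetry term: gamma is added only after a negative return.
        leverage = gamma if scaled_prev < 0.0 else 0.0
        g_t = omega + (alpha + leverage) * scaled_prev ** 2 + beta * g_prev
        scaled = math.sqrt(g_t) * rng.gauss(0.0, 1.0)
        eps.append(math.sqrt(tau_t) * scaled)
        g.append(g_t)
        g_prev, scaled_prev = g_t, scaled
    return eps, g


# Hypothetical long-term component: alternates between 1 and 2 every 250 days.
n = 200_000
tau = [1.0 if (t // 250) % 2 == 0 else 2.0 for t in range(n)]
eps, g = simulate_m_garch(tau, alpha=0.06, beta=0.91, gamma=0.0, seed=1)

mean_g = sum(g) / n
var_eps = sum(e * e for e in eps) / n  # returns have mean zero by construction
mean_tau = sum(tau) / n
print(round(mean_g, 3), round(var_eps, 3), mean_tau)
```

In a long simulated sample the average of g should be close to one and the sample variance of the returns close to the average of τt, in line with the short‐term component having unit variance.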

A characteristic of the two‐component M‐GARCH model defined in Equation 1 is that the scaled returns, urn:x-wiley:jae:media:jae2742:jae2742-math-0012, are assumed to follow a GARCH process. Hence the forcing variable in Equation (3) is urn:x-wiley:jae:media:jae2742:jae2742-math-0013. This feature distinguishes the two‐component M‐GARCH specification from standard GARCH models. In those models it is assumed that τt=1 and hence the returns themselves follow a GARCH process. Similarly, additive component GARCH models, such as the model of Engle and Lee (1999), assume that τt=1 and decompose gi,t into two or more GARCH components (with forcing variable urn:x-wiley:jae:media:jae2742:jae2742-math-0014). We make the following assumptions regarding the innovation process Zi,t and the parameters of the short‐term component.

Assumption 1. Let Zi,t be i.i.d. with E[Zi,t]=0, $E[Z_{i,t}^2]=1$, and 1<κ<∞, where $\kappa = E[Z_{i,t}^4]$.

Assumption 2. We assume that α>0, α+γ>0, β ≥ 0, and α+γ/2+β<1. Moreover, the parameters satisfy the condition (α+γ/2)²κ+2(α+γ/2)β+β²<1.

Assumptions 1 and 2 imply that urn:x-wiley:jae:media:jae2742:jae2742-math-0017 is a covariance stationary GJR‐GARCH(1,1) process. The first‐ and second‐order moments of gi,t are given by E[gi,t]=1,
urn:x-wiley:jae:media:jae2742:jae2742-math-0018(4)
and the fourth moment of urn:x-wiley:jae:media:jae2742:jae2742-math-0019 is finite. The role of the second component, τt, is to describe smooth movements in the conditional variance. In general, we specify τt as a measurable, positive‐valued function, f(·), of the present and K ≥ 1 lagged values of an explanatory variable Xt:
$$\tau_t = f(X_t, X_{t-1}, \ldots, X_{t-K}) \tag{5}$$

The appropriate choice of the explanatory variable Xt and of the function f(·) is up to the researcher and will depend on the specific application at hand. [Footnote 3: While we focus on multiplicative GARCH models, Han and Park (2014) and Han (2015) analyze the properties of a GARCH‐X specification with an explanatory variable that enters additively into the conditional variance equation. See also Francq and Thieu (2019).] The explanatory variable can either vary at the daily frequency (i.e., It=1) or at a lower frequency (i.e., It>1). Thus, the choice of Xt defines the low frequency t. In GARCH‐MIDAS‐type models τt depends on lagged values of Xt only. By explicitly allowing τt to depend on Xt in Equation 5, we ensure that our setting also covers MS‐GARCH models (see Section 2.2 for details). We make the following assumption about the explanatory variable Xt and the function f(·):

Assumption 3. Let f(·)>0 be a measurable function and Xt be a strictly stationary and ergodic time series with E[|Xt|^q]<∞, where q is sufficiently large to ensure that urn:x-wiley:jae:media:jae2742:jae2742-math-0021. Xt is independent of Zi,t−j for all t, i, and j.

Note that Assumption 3 implies that τt is strictly stationary (see Billingsley, 1995, p. 495), covariance stationary, and independent of the ‘GARCH part’ (i.e., urn:x-wiley:jae:media:jae2742:jae2742-math-0022) of the model. In empirical applications the function f(·)>0 is often chosen as being linear in the lagged Xt:
$$\tau_t = m + \sum_{l=1}^{K} \pi_l X_{t-l} \tag{6}$$
The linear specification requires m>0 and πl ≥ 0, for l=1,…,K, and is feasible only if Xt is a nonnegative variable. If Xt can take positive as well as negative values, it is natural to opt for an exponential specification:
$$\tau_t = \exp\left(m + \sum_{l=1}^{K} \pi_l X_{t-l}\right) \tag{7}$$

The assumption that Xt is independent of Zi,t−j for all t, i, and j might appear to be rather strong. However, without imposing any restrictions on the functional form of f(·), it greatly simplifies the analysis when discussing the statistical properties of M‐GARCH models in Section 2.3. From an empirical perspective, we believe that it is reasonable to assume that a low‐frequency explanatory variable Xt, such as monthly industrial production growth, is (close to being) independent of the daily innovations Zi,t−j. For daily explanatory variables (e.g., measures of realized volatility) the independence assumption might appear to be restrictive. However, even if there is a dependence between the innovation to the daily returns and the daily explanatory variable, the dependence between τt and Zi,t−j is likely to be negligible. This is because τt is a rather smooth function that is obtained as a weighted average of many lags of the daily Xt. Indeed, in Section 3 and Supporting Information Appendix D we illustrate in simulations that a mild violation of the independence assumption does not affect our main results.

It should also be noted that the same independence assumption has been previously made in related literature on M‐GARCH models (see Han & Kristensen, 2015). Nevertheless, it clearly imposes a limitation that should be overcome in future work. Two examples in this direction are the estimation of GARCH‐MIDAS models employing lagged values of realized variances (Wang & Ghysels, 2015) and testing for an omitted long‐term component in one‐component GARCH models (Conrad & Schienle, 2018).

Assumptions 1, 2, and 3 imply that the εi,t have mean zero, are uncorrelated, and have an unconditional variance given by var(εi,t)=E[τt]. Moreover, the unconditional variance of the squared returns is well defined: urn:x-wiley:jae:media:jae2742:jae2742-math-0025. If the long‐term component is constant and chosen as τt=ω/(1−α−γ/2−β), our model reduces to the GJR‐GARCH with intercept ω.

A measure that is often used to quantify the relative importance of the long‐term component is the following variance ratio (see Engle et al., 2013):
urn:x-wiley:jae:media:jae2742:jae2742-math-0026(8)
where urn:x-wiley:jae:media:jae2742:jae2742-math-0027. The ratio measures how much of the total variation in the (log) conditional variance can be explained by the variation in the (log) long‐term component.

2.2 Nested and related specifications

We first discuss two models that are directly nested in the M‐GARCH setting. The two models are the GARCH‐MIDAS of Engle et al. (2013) and (a restricted version of) the MS‐GARCH model of Haas et al. (2004). Closely related are the Spline‐GARCH of Engle and Rangel (2008) and the MTV‐GARCH of Amado and Teräsvirta (2008). For further models that have a multiplicative component structure see Amado et al. (2019).

2.2.1 GARCH‐MIDAS

In the GARCH‐MIDAS the long‐term component is defined as in Equation 6 or 7, whereby the weights πl are parsimoniously specified via a weighting scheme. The most common choice of long‐term component is based on the exponential specification with πl=θ·φl(w1,w2). Here, the parameter θ determines the sign of the effect of the lagged Xt on the long‐term component and the weights φl(w1,w2) ≥ 0 are parametrized via the Beta weighting scheme
urn:x-wiley:jae:media:jae2742:jae2742-math-0028(9)

By construction, the weights sum to one; that is, urn:x-wiley:jae:media:jae2742:jae2742-math-0029. It directly follows that urn:x-wiley:jae:media:jae2742:jae2742-math-0030. Engle et al. (2013) use monthly industrial production growth and monthly inflation as explanatory variables, whereas Conrad and Loch (2015) employ quarterly macroeconomic variables such as gross domestic product (GDP) growth. For further applications of this model see Asgharian et al. (2013), Opschoor et al. (2014), and Dorion (2016). Wang and Ghysels (2015) consider the special case that f(·) is linear, It=1 and urn:x-wiley:jae:media:jae2742:jae2742-math-0031. That is, Xt is the realized variance based on the last J daily returns. Note that for this specification Xt and Zt are dependent and hence Assumption 3 would be violated.
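As an illustration of how a Beta weighting scheme turns two parameters into K lag weights, the sketch below builds normalized weights and plugs them into the exponential specification with πl=θ·φl(w1,w2). The exact functional form is an assumption made here: the weights are taken proportional to (l/(K+1))^(w1−1)·(1−l/(K+1))^(w2−1), which is one common parametrization and may differ in detail from Equation 9. The function names and parameter values are hypothetical.

```python
import math


def beta_weights(K, w1, w2):
    """Normalized Beta-type lag weights phi_1, ..., phi_K (assumed form)."""
    raw = [
        (l / (K + 1)) ** (w1 - 1.0) * (1.0 - l / (K + 1)) ** (w2 - 1.0)
        for l in range(1, K + 1)
    ]
    total = sum(raw)
    return [r / total for r in raw]  # dividing by the total makes them sum to one


def long_term_component(x_lags, m, theta, weights):
    """Exponential specification: tau_t = exp(m + theta * sum_l phi_l * X_{t-l}).

    x_lags[0] is X_{t-1}, x_lags[1] is X_{t-2}, and so on.
    """
    return math.exp(m + theta * sum(w * x for w, x in zip(weights, x_lags)))


# Illustrative values only: w1 = 1 with w2 > 1 gives monotonically decaying weights.
phi = beta_weights(K=12, w1=1.0, w2=5.0)
theta = 0.3
pi = [theta * w for w in phi]  # the pi_l sum to theta because the phi_l sum to one
tau_example = long_term_component([0.5] * 12, m=-0.1, theta=theta, weights=phi)
print([round(w, 3) for w in phi[:3]], round(tau_example, 4))
```

Restricting w1=1 leaves a single shape parameter, so the weight on recent lags is controlled by w2 alone; this keeps the number of estimated parameters small even when K is large.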

2.2.2 MS‐GARCH

In MS‐GARCH the returns are given by urn:x-wiley:jae:media:jae2742:jae2742-math-0032, where {Xt} is a Markov chain with finite state space S={1,2,…,s} and transition matrix P with typical element pi,j=P(Xt=j|Xt−1=i). A restricted version of the MS‐GARCH model of Haas et al. (2004) is nested in our setting with It=1. This is best illustrated in the case of s=2: We assume that the conditional variances in the regimes differ in the intercepts but have the same ARCH and GARCH parameters; for example, urn:x-wiley:jae:media:jae2742:jae2742-math-0033, k∈S. Defining τt=[(2−Xt)ω1+(Xt−1)ω2]/(1−α−β), we can rewrite the returns as urn:x-wiley:jae:media:jae2742:jae2742-math-0034, where urn:x-wiley:jae:media:jae2742:jae2742-math-0035. Thus the conditional variance has a multiplicative structure. In the following, we will refer to this model as MS‐GARCH with time‐varying intercept (MS‐GARCH‐TVI). Stationarity conditions for MS‐GARCH models can be found in Haas et al. (2004).

2.2.3 Spline‐GARCH and multiplicative time‐varying (MTV) GARCH

In both models it is assumed that It=1. The spline‐GARCH model specifies the long‐term component as a spline function and chooses Xt=t. Similarly, in the MTV‐GARCH f(·) is specified in terms of logistic transition functions and Xt=t/T is the rescaled time. Thus in both models the long‐term component is a deterministic function of time and hence Assumption 3 is violated.

2.3 Properties of the M‐GARCH

In the following, we derive properties of M‐GARCH models for which Assumptions 1, 2, and 3 are satisfied.

2.3.1 Kurtosis and autocorrelation function

Financial returns are often found to be leptokurtic. Hence a desirable feature of a volatility model is that it generates returns with a kurtosis that is similar to the one empirically observed for financial returns. Under Assumptions 1, 2, and 3, the kurtosis of the returns defined in Equation 1 is given by
urn:x-wiley:jae:media:jae2742:jae2742-math-0036
Thus the kurtosis of the M‐GARCH process is larger than the kurtosis of the innovation Zi,t. This is a well‐known feature of GARCH‐type processes. The following proposition relates the kurtosis urn:x-wiley:jae:media:jae2742:jae2742-math-0037 of the M‐GARCH to the kurtosis urn:x-wiley:jae:media:jae2742:jae2742-math-0038 of the nested GARCH(1,1).

Proposition 1. Under Assumptions 1, 2, and 3, the kurtosis urn:x-wiley:jae:media:jae2742:jae2742-math-0039 of an M‐GARCH process is given by

urn:x-wiley:jae:media:jae2742:jae2742-math-0040
where urn:x-wiley:jae:media:jae2742:jae2742-math-0041 is the kurtosis of the nested GARCH process and where the equality holds if and only if τt is constant.

Hence, for nonconstant τt, the kurtosis urn:x-wiley:jae:media:jae2742:jae2742-math-0042 is the product of urn:x-wiley:jae:media:jae2742:jae2742-math-0043 and the ratio urn:x-wiley:jae:media:jae2742:jae2742-math-0044. When τt=ω/(1−α−γ/2−β) is constant, Proposition 1 nests the kurtosis of the GJR‐GARCH model. Thus, for volatile long‐term components the kurtosis of an M‐GARCH process can be much larger than the kurtosis of the nested GARCH model. [Footnote 4: Han (2015) obtains a similar result for the sample kurtosis of the returns from a GARCH‐X model with a covariate that can either be stationary or nonstationary.] Specifically, Proposition 1 holds for the GARCH‐MIDAS and for the MS‐GARCH‐TVI defined in Section 2.2.
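The product structure behind Proposition 1 can be traced in a few lines. The following is a sketch of the moment calculation, using only what the assumptions state: τt is independent of the GARCH part, gi,t depends on past innovations only and is therefore independent of Zi,t, E[gi,t]=1, and κ is the fourth moment of Zi,t. The labels $K^{MG}$ and $K^{GA}$ for the two kurtoses are chosen here and need not match the paper's notation.

```latex
\begin{align*}
E[\varepsilon_{i,t}^2] &= E[\tau_t]\,E[g_{i,t}]\,E[Z_{i,t}^2] = E[\tau_t],\\
E[\varepsilon_{i,t}^4] &= E[\tau_t^2]\,E[g_{i,t}^2]\,\kappa,\\
K^{MG} &= \frac{E[\varepsilon_{i,t}^4]}{\left(E[\varepsilon_{i,t}^2]\right)^2}
        = \underbrace{\kappa\,E[g_{i,t}^2]}_{K^{GA}}
          \cdot \frac{E[\tau_t^2]}{\left(E[\tau_t]\right)^2}
        \;\ge\; K^{GA}.
\end{align*}
```

The inequality is Jensen's inequality, $E[\tau_t^2] \ge (E[\tau_t])^2$, with equality only for constant τt. As a hypothetical example, if τt takes the values 0.5 and 2 with equal probability, then E[τt]=1.25 and E[τt²]=2.125, so the kurtosis is inflated by the factor 2.125/1.5625=1.36 relative to the nested GARCH.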

The empirical ACFs of volatility proxies such as squared returns or realized variances are known to be very persistent (see, e.g., Andersen, Bollerslev, Diebold, & Labys, 2003; Ding, Granger, & Engle, 1993). In particular, squared returns are often found to decay more slowly than the exponentially decaying ACF implied by the simple GARCH(1,1) model. In the literature on GARCH models, this is usually interpreted as either evidence for long memory (see, e.g., Baillie, Bollerslev, & Mikkelsen, 1996), structural breaks (see, e.g., Hillebrand, 2005), or an omitted persistent covariate (see Han & Park, 2014) in the conditional variance.

The following propositions show that the theoretical ACFs of the M‐GARCH process have a much slower decay than the ACF of the nested GARCH component if the long‐term component is sufficiently persistent. Hence the multiplicative structure provides an alternative explanation for the empirical observation of highly persistent ACFs of squared returns or realized variances. For Propositions 2 and 3, we consider the case that both components are varying at the same frequency; that is, the length of the period t is one day (It=1).

Proposition 2. If It=1 and Assumptions 1, 2, and 3 are satisfied, the ACF, urn:x-wiley:jae:media:jae2742:jae2742-math-0045, of the squared returns from an M‐GARCH process is given by

urn:x-wiley:jae:media:jae2742:jae2742-math-0046(10)
with urn:x-wiley:jae:media:jae2742:jae2742-math-0047 and
urn:x-wiley:jae:media:jae2742:jae2742-math-0048
being the ACF of the GJR‐GARCH component. [Footnote 5: Note that urn:x-wiley:jae:media:jae2742:jae2742-math-0049 reduces to the ACF of a (symmetric) GARCH(1,1) when γ=0 (see Karanasos, 1999).]

Proposition 2 shows that the ACF of the squared returns is given by the sum of two terms: the first term corresponds to the ACF of the long‐term component urn:x-wiley:jae:media:jae2742:jae2742-math-0050 times a constant, whereas the second term equals the exponentially decaying ACF of the nested GARCH model urn:x-wiley:jae:media:jae2742:jae2742-math-0051 times a term that depends again on urn:x-wiley:jae:media:jae2742:jae2742-math-0052. Hence, if τt is sufficiently persistent, urn:x-wiley:jae:media:jae2742:jae2742-math-0053 will essentially behave as urn:x-wiley:jae:media:jae2742:jae2742-math-0054 for k large. [Footnote 6: Again, Han (2015) also obtains a bicomponent structure for the sample ACF of the squared returns from a GARCH‐X model with a fractionally integrated covariate. Similarly, Han and Kristensen (2015) show that the empirical ACF in a multiplicative model can display long‐memory‐type behavior.] For τt being constant, the first term in Equation 10 is equal to zero and the second term reduces to the ACF of an asymmetric GARCH(1,1). Also, note that the ratio urn:x-wiley:jae:media:jae2742:jae2742-math-0055 is closely related to the variance ratio defined in Equation 8 and measures how much of the variation in the squared returns can be attributed to the variation in the long‐term component; that is, it measures the importance of the long‐term component.
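The two‐term structure can be reproduced numerically. The sketch below does not transcribe Equation 10; it re‐derives a two‐term decomposition directly from the independence of τt and the GARCH part for It=1 and a symmetric GARCH(1,1) (γ=0), so its arrangement of terms may differ from the paper's. With y_t = g_t·Z_t², the covariance of squared returns at lag k equals cov_τ(k) + cov_y(k)·(cov_τ(k) + E[τ]²). The lognormal AR(1) long‐term component and all parameter values are hypothetical.

```python
import math


def garch_moments(alpha, beta, kappa):
    """E[g^2] and var(g * Z^2) for a unit-variance symmetric GARCH(1,1)."""
    e_g2 = (1.0 - (alpha + beta) ** 2) / (
        1.0 - kappa * alpha ** 2 - 2.0 * alpha * beta - beta ** 2
    )
    return e_g2, kappa * e_g2 - 1.0


def acf_garch_sq(k, alpha, beta):
    """ACF of squared returns of a GARCH(1,1) at lag k."""
    if k == 0:
        return 1.0
    rho1 = alpha * (1.0 - alpha * beta - beta ** 2) / (
        1.0 - 2.0 * alpha * beta - beta ** 2
    )
    return rho1 * (alpha + beta) ** (k - 1)


def acf_m_garch_sq(k, rho_tau_k, e_tau, var_tau, alpha, beta, kappa):
    """Two-term ACF of squared returns when tau is independent of the GARCH part."""
    e_g2, var_y = garch_moments(alpha, beta, kappa)
    var_eps2 = (var_tau + e_tau ** 2) * kappa * e_g2 - e_tau ** 2
    term_tau = rho_tau_k * var_tau / var_eps2
    term_garch = (
        acf_garch_sq(k, alpha, beta)
        * (rho_tau_k * var_tau + e_tau ** 2) * var_y / var_eps2
    )
    return term_tau + term_garch


# Hypothetical long-term component: tau = exp(x), x a Gaussian AR(1) with
# autoregressive coefficient 0.98 and unconditional variance s2 (mean zero).
s2, ar = 0.25, 0.98
e_tau = math.exp(s2 / 2.0)
var_tau = math.exp(s2) * (math.exp(s2) - 1.0)


def rho_tau(k):
    return (math.exp(s2 * ar ** k) - 1.0) / (math.exp(s2) - 1.0)


alpha, beta, kappa = 0.06, 0.91, 3.0
for k in (1, 10, 100):
    mg = acf_m_garch_sq(k, rho_tau(k), e_tau, var_tau, alpha, beta, kappa)
    print(k, round(mg, 4), round(acf_garch_sq(k, alpha, beta), 4))
```

At long lags the first term dominates, so the ACF of the two‐component model stays well above that of the nested GARCH; setting var_tau to zero recovers the GARCH ACF exactly.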

Haas et al. (2004, p. 503) make a similar observation for the MS‐GARCH‐TVI model that we discussed in Section 2.2. For this model, they show that the autocorrelations of the squared returns decay at a rate of urn:x-wiley:jae:media:jae2742:jae2742-math-0056, where ϖ=p1,1+p2,2−1 is the degree of persistence due to the Markov effects. [Footnote 7: Haas et al. (2004) consider a symmetric GARCH. Hence the persistence in the GARCH component is α+β.] If ϖ is close to one (that is, if the long‐term component is very persistent), the decay rate of this component dominates the decay of the autocorrelation function.

A standard misspecification test for GARCH models is the Ljung–Box statistic applied to the squared deGARCHed residuals, urn:x-wiley:jae:media:jae2742:jae2742-math-0057. The result in Proposition 2 may explain why in empirical applications the null hypothesis of this test is often rejected. In the multiplicative model, the ACF of the squared deGARCHed residuals is given by urn:x-wiley:jae:media:jae2742:jae2742-math-0058, which follows the rate of decay of the long‐term component and hence is still persistent. Using similar arguments to those in the proof of Proposition 2, we can derive the ACF of urn:x-wiley:jae:media:jae2742:jae2742-math-0059.

Proposition 3. If It=1 and Assumptions 1, 2, and 3 are satisfied, the ACF, urn:x-wiley:jae:media:jae2742:jae2742-math-0060, of urn:x-wiley:jae:media:jae2742:jae2742-math-0061 is given by

urn:x-wiley:jae:media:jae2742:jae2742-math-0062(11)
with urn:x-wiley:jae:media:jae2742:jae2742-math-0063 as before and urn:x-wiley:jae:media:jae2742:jae2742-math-0064 being the ACF of the gt component.

Again, Assumption 3 holds for the GARCH‐MIDAS and the MS‐GARCH‐TVI.

The implications of Proposition 3 are depicted in Figure 1. The bars in light gray display the empirical ACF of the daily S&P 500 realized variances for the 2000:M1 to 2018:M4 period. [Footnote 8: The underlying data will be described in detail in Section 4.1. The autocorrelations were estimated using the instrumental variables estimator suggested in Hansen and Lunde (2014). We employ their preferred specification, a two‐stage least squares estimator in which lagged realized variances of order 4–10 are used as instrumental variables (see Hansen & Lunde, 2014, p. 82).] By choosing appropriate parameter values for a GARCH‐MIDAS process, we obtain an ACF of urn:x-wiley:jae:media:jae2742:jae2742-math-0065 (dashed red line), which behaves very similarly to the empirical ACF of the realized variances. The figure shows that the second term, that is, the ACF of gt (dot‐dashed blue line), determines the decay behavior of $\rho_k^{MG}(\sigma^2)$ when k is small, whereas the first term, that is, the ACF of τt (solid green line), dominates when k is large. Finally, it is important to note that although our results on the kurtosis and the ACFs are presented for a GJR‐GARCH(1,1) short‐term component, they directly extend to a covariance stationary GJR‐GARCH(p,q) component.

Figure 1. Autocorrelation function of the volatility process in a GARCH‐MIDAS model. We depict the ACF of the volatility process in a GARCH‐MIDAS model (red, dashed) and its components: the first (green, solid) and second term (blue, dot‐dashed) in Equation 11. The long‐term component is defined as in Equations 7 and 9 with m=−0.1, θ=0.3, w1=1, w2=5, and K=264. The explanatory variable is given by urn:x-wiley:jae:media:jae2742:jae2742-math-0066, where ϕ=0.98 and urn:x-wiley:jae:media:jae2742:jae2742-math-0067. The GARCH(1,1) parameters are α=0.06, β=0.91, and γ=0. Moreover, we set κ=3. Bars in light gray display the empirical autocorrelation of S&P 500 daily realized variances between 2000:M1 and 2018:M4 as measured by Hansen and Lunde (2014). For details see Section 4.

2.3.2 Forecast evaluation with MZ regression

In empirical applications, the coefficient of determination from an MZ regression is often used as a measure of forecast accuracy. In this section, we will argue against using this measure when comparing forecast performance across volatility regimes. We now exclusively focus on the case of a GARCH‐MIDAS. We assume that forecasts are produced on the last day It of period t and denote the k‐step‐ahead volatility forecast by hk,t+1|t with k ≤ It+1. The optimal forecast from the GARCH‐MIDAS is urn:x-wiley:jae:media:jae2742:jae2742-math-0068, where urn:x-wiley:jae:media:jae2742:jae2742-math-0069. When evaluating the volatility forecast, one has to deal with the problem that the true conditional variance, urn:x-wiley:jae:media:jae2742:jae2742-math-0070, is unobservable. Patton (2011) discusses the situation in which the forecast evaluation is based on some conditionally unbiased volatility proxy urn:x-wiley:jae:media:jae2742:jae2742-math-0071 instead. He defines a loss function urn:x-wiley:jae:media:jae2742:jae2742-math-0072 as robust if the expected loss ranking of two competing forecasts is preserved when replacing urn:x-wiley:jae:media:jae2742:jae2742-math-0073 by urn:x-wiley:jae:media:jae2742:jae2742-math-0074. In the MZ regression urn:x-wiley:jae:media:jae2742:jae2742-math-0075 is often replaced by the conditionally unbiased but noisy proxy urn:x-wiley:jae:media:jae2742:jae2742-math-0076. [Footnote 9: To illustrate the severity of the noise, consider an example with urn:x-wiley:jae:media:jae2742:jae2742-math-0077. Then urn:x-wiley:jae:media:jae2742:jae2742-math-0078 will either over‐ or underestimate the true urn:x-wiley:jae:media:jae2742:jae2742-math-0079 by more than 50% with a probability of about 74%.]
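The figure of roughly 74% for the noisiness of the squared‐return proxy can be checked directly under the assumption, made here, that the innovation is standard normal. In that case the ratio of the squared return to the true conditional variance is Z², and the proxy is off by more than 50% whenever Z² falls below 0.5 or above 1.5. The helper names below are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_proxy_off_by_more_than(share):
    """P(|Z^2 - 1| > share) for Z ~ N(0, 1), with 0 < share < 1."""
    lower = math.sqrt(1.0 - share)
    upper = math.sqrt(1.0 + share)
    p_under = 2.0 * std_normal_cdf(lower) - 1.0   # P(Z^2 < 1 - share)
    p_over = 2.0 * (1.0 - std_normal_cdf(upper))  # P(Z^2 > 1 + share)
    return p_under + p_over


p_50 = prob_proxy_off_by_more_than(0.5)
print(round(p_50, 2))
```

Most of this probability comes from underestimation: Z² is below 0.5 a little more than half of the time.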

The MZ regression for evaluating the k‐step‐ahead volatility forecast is given by
urn:x-wiley:jae:media:jae2742:jae2742-math-0080(12)
We denote the respective coefficient of determination by urn:x-wiley:jae:media:jae2742:jae2742-math-0081. As shown in Hansen and Lunde (2006), the ranking of competing one‐step‐ahead volatility forecasts based on the urn:x-wiley:jae:media:jae2742:jae2742-math-0082 of the MZ regression is robust to using the proxy urn:x-wiley:jae:media:jae2742:jae2742-math-0083 instead of the latent conditional variance urn:x-wiley:jae:media:jae2742:jae2742-math-0084 as the dependent variable. For hk,t+1|t=τt+1gk,t+1|t, the population parameters of the MZ regression are given by δ0=0 and δ1=1 and hence the population urn:x-wiley:jae:media:jae2742:jae2742-math-0085 can be written as
urn:x-wiley:jae:media:jae2742:jae2742-math-0086(13)
where we use that the variance of ηk,t+1 equals the expected squared error (SE) loss of the forecast evaluated against urn:x-wiley:jae:media:jae2742:jae2742-math-0087; that is, urn:x-wiley:jae:media:jae2742:jae2742-math-0088. Using that urn:x-wiley:jae:media:jae2742:jae2742-math-0089, it follows that
urn:x-wiley:jae:media:jae2742:jae2742-math-0090(14)

That is, the expected SE based on the noisy proxy equals the expected SE based on the latent volatility plus a term that depends on the fourth moment, κ, of Zi,t and the expected value of the squared conditional variance. Hence using a noisy proxy for forecast evaluation can lead to a substantially higher expected SE than the expected SE based on the latent volatility. Patton (2011, p. 248) makes essentially the same point by arguing that “although the ranking obtained from a robust loss function will be invariant to noise in the proxy, the actual level of expected loss obtained using a proxy will be larger than that which would be obtained when using the true conditional variance.”
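The algebra behind this decomposition can be retraced in a few lines. The sketch below assumes that the proxy is the squared return, r² = σ²Z², that Z is independent of σ² and of the forecast h, and that E[Z²] = 1 and E[Z⁴] = κ. Time and horizon indices are suppressed, and the notation may differ from the paper's.

```latex
\begin{align*}
\mathbf{E}\big[(r^2 - h)^2\big]
  &= \mathbf{E}\big[(\sigma^2 - h)^2\big]
     + 2\,\mathbf{E}\big[(\sigma^2 - h)\,\sigma^2\big]\,\mathbf{E}\big[Z^2 - 1\big]
     + \mathbf{E}\big[\sigma^4\big]\,\mathbf{E}\big[(Z^2 - 1)^2\big] \\
  &= \mathbf{E}\big[(\sigma^2 - h)^2\big] + (\kappa - 1)\,\mathbf{E}\big[\sigma^4\big].
\end{align*}
```

The cross term vanishes because E[Z² − 1] = 0, and E[(Z² − 1)²] = κ − 2 + 1 = κ − 1.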

Using the insight from Equation 14 that the expected SE loss based on the noisy proxy is at least urn:x-wiley:jae:media:jae2742:jae2742-math-0091, we obtain the following bound:
urn:x-wiley:jae:media:jae2742:jae2742-math-0092(15)

The upper bound for urn:x-wiley:jae:media:jae2742:jae2742-math-0093 given by Equation 15 nicely illustrates that a low urn:x-wiley:jae:media:jae2742:jae2742-math-0094 is not necessarily evidence for model misspecification but can simply be due to using a noisy volatility proxy. This point has been made before by Andersen and Bollerslev (1998), but for the special case of a one‐step‐ahead forecast from a GARCH(1,1). [Footnote: See Andersen, Bollerslev, and Meddahi (2005) for a model‐free adjustment procedure for the predictive R2.] Note that the result in Equation 15 does not depend on the two‐component structure of the model but is true for any conditionally heteroskedastic process.
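A short derivation shows why a bound of this kind cannot exceed 1/κ. The sketch assumes the squared return r² = σ²Z² as the proxy, Z independent of σ² with E[Z²] = 1 and E[Z⁴] = κ > 1, and takes (κ − 1)E[σ⁴] as the lower bound on the expected SE loss, which is our reading of the text. Indices are suppressed.

```latex
\begin{align*}
\mathbf{Var}(r^2) &= \kappa\,\mathbf{E}[\sigma^4] - \big(\mathbf{E}[\sigma^2]\big)^2, \\
R^2 &\le 1 - \frac{(\kappa - 1)\,\mathbf{E}[\sigma^4]}{\mathbf{Var}(r^2)}
      = \frac{\mathbf{E}[\sigma^4] - \big(\mathbf{E}[\sigma^2]\big)^2}
             {\kappa\,\mathbf{E}[\sigma^4] - \big(\mathbf{E}[\sigma^2]\big)^2}
      = \frac{\mathbf{Var}(\sigma^2)}{\mathbf{Var}(r^2)}
      < \frac{1}{\kappa}.
\end{align*}
```

The last inequality holds because κ(E[σ⁴] − E[σ²]²) < κE[σ⁴] − E[σ²]² whenever κ > 1 and E[σ²] > 0. For normally distributed innovations (κ = 3) the R² therefore stays below one‐third, however good the forecast.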

Next, we derive an explicit expression for the MZ urn:x-wiley:jae:media:jae2742:jae2742-math-0095 of the GARCH‐MIDAS model.

Proposition 4. If urn:x-wiley:jae:media:jae2742:jae2742-math-0096 follows a GARCH‐MIDAS process, Assumptions 1 are satisfied, and hk,t+1|t=τt+1gk,t+1|t, then the population urn:x-wiley:jae:media:jae2742:jae2742-math-0097 of the MZ regression is given by

urn:x-wiley:jae:media:jae2742:jae2742-math-0098(16)
with urn:x-wiley:jae:media:jae2742:jae2742-math-0099 as in Equation 4 and
urn:x-wiley:jae:media:jae2742:jae2742-math-0100(17)

We obtain the following two properties:

  1. urn:x-wiley:jae:media:jae2742:jae2742-math-0101 decreases monotonically with increasing forecast horizon k and, in the limit, converges to urn:x-wiley:jae:media:jae2742:jae2742-math-0103. [Footnote: Although by assumption k ≤ It in our setting, we can think of, for example, a semiannual period and daily volatility forecasts. In this case k can be at most 132 (=6·22). For such a large k and under reasonable assumptions on the GARCH parameters, we have urn:x-wiley:jae:media:jae2742:jae2742-math-0102.]
  2. urn:x-wiley:jae:media:jae2742:jae2742-math-0104 increases monotonically in urn:x-wiley:jae:media:jae2742:jae2742-math-0105.

The first property rests on the insight that the forecast of the GARCH component converges to one (as k → ∞) and hence the MZ regression reduces to a regression of urn:x-wiley:jae:media:jae2742:jae2742-math-0106 on a constant and τt+1. Thus urn:x-wiley:jae:media:jae2742:jae2742-math-0107 can be interpreted as the fraction of the total variation in daily returns that can be attributed to the variation in the long‐term component. Note that urn:x-wiley:jae:media:jae2742:jae2742-math-0108 corresponds to the weight that is attached to the ACF of τt in the first term in Equation 10.

Second, the result that urn:x-wiley:jae:media:jae2742:jae2742-math-0109 increases when τt+1 gets more volatile implies that for the very same model the urn:x-wiley:jae:media:jae2742:jae2742-math-0110 will be higher in high‐volatility regimes (i.e., when the squared error loss is high) than in low‐volatility regimes (i.e., when the squared error loss is low). This can be misleading when calculating urn:x-wiley:jae:media:jae2742:jae2742-math-0111 for different regimes. The intuition is best illustrated when looking at one‐step‐ahead forecasts. Equations 13 and 14 imply
urn:x-wiley:jae:media:jae2742:jae2742-math-0112(18)

When urn:x-wiley:jae:media:jae2742:jae2742-math-0113 is increasing, the unconditional variance of returns rises at a faster rate than the expected squared error and hence the MZ urn:x-wiley:jae:media:jae2742:jae2742-math-0114 is increasing. We can express urn:x-wiley:jae:media:jae2742:jae2742-math-0115 directly as a function of the model parameters:

Lemma 1. If urn:x-wiley:jae:media:jae2742:jae2742-math-0116 follows a GARCH‐MIDAS process, Assumptions 1 are satisfied, and h1,t+1|t=τt+1g1,t+1, then the population urn:x-wiley:jae:media:jae2742:jae2742-math-0117 of the MZ regression is given by

urn:x-wiley:jae:media:jae2742:jae2742-math-0118(19)

For τt+1 being constant and γ=0, Equation 19 is reduced to the expression in Andersen and Bollerslev (1998, p. 892) for the symmetric GARCH(1,1); that is, urn:x-wiley:jae:media:jae2742:jae2742-math-0119.
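For reference, the symmetric GARCH(1,1) benchmark is easy to evaluate numerically. The sketch below assumes the familiar form of the Andersen and Bollerslev (1998) result, R² = α²/(1 − β² − 2αβ), and applies it to the parameter values used in this paper. The function name is illustrative.

```python
def garch11_mz_r2(alpha: float, beta: float) -> float:
    """Population one-step-ahead MZ R^2 for a symmetric GARCH(1,1).

    Assumed form: alpha^2 / (1 - beta^2 - 2*alpha*beta), evaluated against
    squared returns as the volatility proxy.
    """
    denominator = 1.0 - beta**2 - 2.0 * alpha * beta
    if denominator <= 0.0:
        raise ValueError("parameters imply a non-positive denominator")
    return alpha**2 / denominator


# Parameter values of Figure 2 and of the simulation design in Section 3.1.
r2_figure = garch11_mz_r2(alpha=0.05, beta=0.92)
r2_simulation = garch11_mz_r2(alpha=0.06, beta=0.91)
print(f"alpha=0.05, beta=0.92: R^2 = {r2_figure:.4f}")
print(f"alpha=0.06, beta=0.91: R^2 = {r2_simulation:.4f}")
```

Even for a correctly specified GARCH(1,1), these values are only a few percent, far below one.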

The effect of an increase in urn:x-wiley:jae:media:jae2742:jae2742-math-0120 on urn:x-wiley:jae:media:jae2742:jae2742-math-0121, urn:x-wiley:jae:media:jae2742:jae2742-math-0122 and urn:x-wiley:jae:media:jae2742:jae2742-math-0123 is illustrated in Figure 2. We set E[τt+1]=1, α=0.05,β=0.92,γ=0, and κ=3. As expected, the left‐hand panel shows that the expected squared error increases when we move from a low‐volatility regime (say urn:x-wiley:jae:media:jae2742:jae2742-math-0124) to a high‐volatility regime (say urn:x-wiley:jae:media:jae2742:jae2742-math-0125). However, it also shows that the variance of the returns is increasing even faster (as evident from the larger slope coefficient). The right‐hand panel of Figure 2 shows that this translates into an increase of urn:x-wiley:jae:media:jae2742:jae2742-math-0126. That is, although the expected squared error increases, the “forecast accuracy” as measured by urn:x-wiley:jae:media:jae2742:jae2742-math-0127 increases as well. In this regard, the R2 of an MZ regression should be interpreted as a measure of relative forecast accuracy; that is, forecast accuracy is measured relative to the unconditional variance of the process. In contrast, the squared error loss is a measure of absolute forecast accuracy. Note that for rather moderate values of urn:x-wiley:jae:media:jae2742:jae2742-math-0128 the coefficient of determination is already close to its upper bound of 1/3.
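The mechanism described here can be reproduced with a few lines of arithmetic. The sketch below is not the paper's Equation 19. It assumes γ = 0, squared returns as the proxy, a long‐term component τ that is independent of the unit‐variance GARCH(1,1) component g, and the standard fourth‐moment formula E[g²] = (1 − (α+β)²)/(1 − (α+β)² − (κ−1)α²). All function and variable names are illustrative.

```python
def one_step_mz_quantities(var_tau, alpha=0.05, beta=0.92, kappa=3.0, mean_tau=1.0):
    """Return (Var(r^2), expected SE, MZ R^2) for a one-step-ahead forecast.

    Assumptions of this sketch: sigma^2 = tau * g with tau independent of g,
    E[g] = 1, gamma = 0, and the forecast equals the conditional variance.
    """
    persistence = alpha + beta
    denominator = 1.0 - persistence**2 - (kappa - 1.0) * alpha**2
    if denominator <= 0.0:
        raise ValueError("fourth moment of the GARCH component does not exist")
    e_g2 = (1.0 - persistence**2) / denominator  # E[g^2]
    e_tau2 = var_tau + mean_tau**2               # E[tau^2]
    e_sigma4 = e_tau2 * e_g2                     # E[sigma^4] under independence
    var_r2 = kappa * e_sigma4 - mean_tau**2      # Var(r^2)
    expected_se = (kappa - 1.0) * e_sigma4       # E[(r^2 - sigma^2)^2]
    return var_r2, expected_se, 1.0 - expected_se / var_r2


grid = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]
results = [one_step_mz_quantities(v) for v in grid]
for v, (var_r2, expected_se, r2) in zip(grid, results):
    print(f"Var(tau)={v:5.1f}  Var(r^2)={var_r2:7.3f}  E[SE]={expected_se:7.3f}  R^2={r2:.4f}")
```

Under these assumptions both Var(r²) and the expected SE are linear in Var(τ), with slopes κE[g²] and (κ − 1)E[g²], so the variance grows faster and the R² rises towards 1/κ.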

Figure 2. urn:x-wiley:jae:media:jae2742:jae2742-math-0129, urn:x-wiley:jae:media:jae2742:jae2742-math-0130, and MZ urn:x-wiley:jae:media:jae2742:jae2742-math-0131 as a function of urn:x-wiley:jae:media:jae2742:jae2742-math-0132. The left‐hand panel shows urn:x-wiley:jae:media:jae2742:jae2742-math-0133 (red, solid) and urn:x-wiley:jae:media:jae2742:jae2742-math-0134 (blue, dashed) as a function of urn:x-wiley:jae:media:jae2742:jae2742-math-0135 (see Equation 18). The right‐hand panel depicts the corresponding population Mincer‐Zarnowitz urn:x-wiley:jae:media:jae2742:jae2742-math-0136 as a function of urn:x-wiley:jae:media:jae2742:jae2742-math-0137. We set E[τt+1]=1, α=0.05, β=0.92, γ=0, and κ=3.

Although the previous results are derived under the assumption that squared daily returns are used as the volatility proxy, the main insights still hold when using a better volatility proxy. For example, consider the hypothetical case of observing urn:x-wiley:jae:media:jae2742:jae2742-math-0138 ex post. Then, for k → ∞, we obtain urn:x-wiley:jae:media:jae2742:jae2742-math-0139. Hence urn:x-wiley:jae:media:jae2742:jae2742-math-0140 would still vary across volatility regimes and increase in the variance of the long‐term component. In the simulation in Section 3, we will consider the case in which the realized variance is used as a proxy for urn:x-wiley:jae:media:jae2742:jae2742-math-0141.

Finally, we consider cumulative volatility forecasts. The MZ regression for evaluating the cumulative k‐day‐ahead volatility forecast is given by
urn:x-wiley:jae:media:jae2742:jae2742-math-0142
where the latent variance is proxied by the realized variance urn:x-wiley:jae:media:jae2742:jae2742-math-0143 (purely based on daily return data) and urn:x-wiley:jae:media:jae2742:jae2742-math-0144. The corresponding urn:x-wiley:jae:media:jae2742:jae2742-math-0145 is given by
urn:x-wiley:jae:media:jae2742:jae2742-math-0146(20)

As before, one can show that urn:x-wiley:jae:media:jae2742:jae2742-math-0147 increases monotonically in urn:x-wiley:jae:media:jae2742:jae2742-math-0148.
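In a sample, the MZ regression for cumulative forecasts amounts to an ordinary least squares fit of the cumulative proxy on the cumulative forecast. The following sketch assumes that the realized variance over the first k days is the sum of the daily squared returns and that the cumulative forecast is the sum of the daily forecasts. The helper names and the toy numbers are illustrative and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def cumulative_sums(daily_values_by_period: Sequence[Sequence[float]], k: int) -> List[float]:
    """Sum the first k daily values within each low-frequency period."""
    return [sum(period[:k]) for period in daily_values_by_period]


def mz_regression(proxy: Sequence[float], forecast: Sequence[float]) -> Tuple[float, float, float]:
    """OLS of proxy on a constant and forecast; returns (delta0, delta1, R^2)."""
    n = len(proxy)
    if n != len(forecast) or n < 3:
        raise ValueError("need two series of equal length with at least 3 observations")
    mean_x = sum(forecast) / n
    mean_y = sum(proxy) / n
    sxx = sum((x - mean_x) ** 2 for x in forecast)
    syy = sum((y - mean_y) ** 2 for y in proxy)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(forecast, proxy))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("both series must vary")
    delta1 = sxy / sxx
    delta0 = mean_y - delta1 * mean_x
    return delta0, delta1, sxy**2 / (sxx * syy)


# Toy data: four periods with three daily observations each (made up).
squared_returns = [[1.0, 0.5, 2.0], [0.2, 0.1, 0.4], [3.0, 2.5, 1.0], [0.8, 0.9, 0.7]]
daily_forecasts = [[1.2, 1.1, 1.0], [0.3, 0.4, 0.5], [2.0, 1.8, 1.6], [0.9, 0.9, 0.9]]
rv = cumulative_sums(squared_returns, k=3)
h = cumulative_sums(daily_forecasts, k=3)
delta0, delta1, r2 = mz_regression(rv, h)
print(f"delta0={delta0:.3f}, delta1={delta1:.3f}, R^2={r2:.3f}")
```

Because each period contributes one observation, the cumulative forecasts in this setup do not overlap.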

2.4 Forecasting long‐term volatility

In the empirical application and in the simulation in Section 3 we also consider forecasting volatility for horizons that are beyond one low‐frequency period. The optimal forecast hk,t+s|t with s>1 is then given by urn:x-wiley:jae:media:jae2742:jae2742-math-0149. It is straightforward to obtain urn:x-wiley:jae:media:jae2742:jae2742-math-0150. Because we do not explicitly model the dynamics of Xt, we are unable to obtain urn:x-wiley:jae:media:jae2742:jae2742-math-0151. Instead, based on the information set urn:x-wiley:jae:media:jae2742:jae2742-math-0152, we forecast τt+s by τt+1. Holding the long‐term component constant when forecasting is reasonable if τt changes smoothly and the forecast horizon is not “too large.” Otherwise, one may use predictions of Xt—for example, survey or time series forecasts—for calculating predictions of τt (see Conrad & Loch, 2015).
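The forecasting rule described above can be sketched as follows. The code assumes a unit‐variance GJR‐type GARCH component with symmetric innovations, so that its forecast reverts to one at rate α + γ/2 + β, and it holds the long‐term component fixed at τt+1 for all horizons. Function names, the horizon convention (days counted from the first day of period t+1), and the default parameter values are assumptions of this sketch.

```python
def garch_component_forecast(g_next: float, horizon: int,
                             alpha: float, beta: float, gamma: float = 0.0) -> float:
    """Forecast of the short-term component `horizon` days ahead (horizon >= 1).

    Assumes mean reversion to one with persistence alpha + gamma/2 + beta.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    persistence = alpha + gamma / 2.0 + beta
    return 1.0 + persistence ** (horizon - 1) * (g_next - 1.0)


def variance_forecast(tau_next: float, g_next: float, k: int, s: int = 1,
                      days_per_period: int = 22,
                      alpha: float = 0.06, beta: float = 0.91, gamma: float = 0.0) -> float:
    """Forecast for day k of period t+s, keeping tau fixed at tau_{t+1}."""
    horizon = (s - 1) * days_per_period + k
    return tau_next * garch_component_forecast(g_next, horizon, alpha, beta, gamma)


# Example: elevated short-term volatility (g = 2) around a long-term level of 1.5.
for s, k in [(1, 1), (1, 22), (2, 22), (3, 22)]:
    print(f"s={s}, k={k:2d}: forecast = {variance_forecast(1.5, 2.0, k=k, s=s):.3f}")
```

As the horizon grows, the forecast approaches the fixed long‐term level, which is why the quality of the long‐term component matters most at the 2‐ and 3‐month horizons.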

3 SIMULATION

In this section, we mainly focus on M‐GARCH models from the GARCH‐MIDAS class. Since asymptotic theory for the QMLE is available only for the special case of a GARCH‐MIDAS with realized volatility as the explanatory variable (see Wang & Ghysels, 2015), we first evaluate the finite‐sample performance of the QMLE in a Monte Carlo simulation. Second, we compare the QMLE of the correctly specified model with the QMLE of misspecified models. We consider misspecification in terms of (i) lag length K, (ii) the explanatory variable being measured with noise, (iii) both, or (iv) omitting the long‐term component completely. Finally, within the Monte Carlo simulation we evaluate the OOS forecast performance and provide empirical support for the theoretical results in Section 2.3.2. For each model specification, we perform 2,000 Monte Carlo replications.

3.1 Data generating process

We simulate an intraday version of the two‐component GARCH model as
urn:x-wiley:jae:media:jae2742:jae2742-math-0153(21)
where the index n=1,…,N now denotes the intraday frequency. The Zn,i,t are assumed to be i.i.d. and follow either a standard normal or a standardized Student t distribution with five degrees of freedom. We generate N=48 intraday returns. Hence, by aggregating returns to a daily frequency, urn:x-wiley:jae:media:jae2742:jae2742-math-0154, the model in Equation 21 is consistent with our daily model. [Footnote: Alternatively, we simulated the intraday returns using a stochastic volatility model that is consistent with our GARCH‐MIDAS setting. The corresponding results, which are very similar to those based on the specification in Equation 21, are presented in Supporting Information Appendix E.] Simulating intraday returns allows us to calculate the daily realized variance, urn:x-wiley:jae:media:jae2742:jae2742-math-0155, as a precise measure of the daily variance. Similarly, we obtain the realized variance over the first k days of month t as urn:x-wiley:jae:media:jae2742:jae2742-math-0156. We simulate data for a period of 40 years of intradaily returns, from which we construct 10,560 daily return and realized variance observations. The parameters of the GARCH component, gi,t, are given by α=0.06, β=0.91, and γ=0. We consider two alternative specifications of the long‐term component:
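Why the realized variance is a far more precise proxy than the squared daily return can be illustrated with a small experiment. The sketch below does not reproduce Equation 21. It assumes that, within a day with variance σ², each of the N = 48 intraday returns equals sqrt(σ²/N) times a standard normal draw, and it holds σ² fixed at one for simplicity. Names and the seed are arbitrary.

```python
import random
import statistics


def simulate_day(sigma2: float, n_intraday: int, rng: random.Random):
    """Return (daily return, realized variance) for one simulated day."""
    scale = (sigma2 / n_intraday) ** 0.5
    intraday = [scale * rng.gauss(0.0, 1.0) for _ in range(n_intraday)]
    daily_return = sum(intraday)                     # aggregate to daily frequency
    realized_variance = sum(r * r for r in intraday)
    return daily_return, realized_variance


rng = random.Random(12345)
days = [simulate_day(sigma2=1.0, n_intraday=48, rng=rng) for _ in range(5000)]
squared_returns = [r * r for r, _ in days]
realized_variances = [rv for _, rv in days]

print("mean r^2:", round(statistics.mean(squared_returns), 3),
      " variance:", round(statistics.variance(squared_returns), 3))
print("mean RV: ", round(statistics.mean(realized_variances), 3),
      " variance:", round(statistics.variance(realized_variances), 3))
```

Both proxies are unbiased for σ², but under these assumptions the variance of the realized variance is about 2σ⁴/N, compared with 2σ⁴ for the squared daily return.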

Monthly τt. The first specification assumes a mixed‐frequency setting with τt fluctuating at a monthly frequency. We assume that each month consists of It=22 days. As in Equation 7, we choose an exponential specification for the long‐term component and specify the MIDAS weights according to the Beta weighting scheme in Equation 9 with m=0.1, θ=0.3, w1=1, w2=4, and K=36. The choice of 3 years as MIDAS lag length follows Conrad and Loch (2015). Setting w2=4 implies a monotonically decaying weighting scheme with weights close to zero for lags greater than two‐thirds of K. The explanatory variable Xt is assumed to follow an AR(1) process, Xt=ϕXt−1+ξt, urn:x-wiley:jae:media:jae2742:jae2742-math-0157, with ϕ=0.9 and urn:x-wiley:jae:media:jae2742:jae2742-math-0158. When averaged over the 2,000 Monte Carlo simulations, these parameter values lead to an empirical VR of 18.60%/18.09% for normally/Student t distributed innovations (recall that the VR was defined in Equation 8).

Daily τt. The second specification assumes that both components fluctuate at a daily frequency (i.e., It=1). The parameters of the long‐term component are chosen as m=−0.1, θ=0.3, w1=1, w2=5, and K=264. Choosing a lag length of roughly 1 year is motivated by our empirical results in Section 4 when estimating a GARCH‐MIDAS model using RVoli,t as the explanatory variable. In addition, we choose ϕ=0.98 and urn:x-wiley:jae:media:jae2742:jae2742-math-0159. In the simulations, the former choice leads to an average VR of 32.49%/31.66% for normally/Student t distributed innovations.
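The shape of the two weighting schemes can be inspected directly. The sketch below assumes the common Beta weighting form with weights proportional to (l/(K+1))^(w1−1) (1 − l/(K+1))^(w2−1), normalized to sum to one, and an exponential long‐term component τ = exp(m + θ Σ φl Xt−l). Equations 7 and 9 are not reproduced in this section, so their exact form may differ, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def beta_weights(K: int, w1: float = 1.0, w2: float = 4.0) -> List[float]:
    """Normalized Beta weights for lags l = 1, ..., K (assumed functional form)."""
    raw = [(l / (K + 1)) ** (w1 - 1.0) * (1.0 - l / (K + 1)) ** (w2 - 1.0)
           for l in range(1, K + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def long_term_component(m: float, theta: float,
                        weights: Sequence[float], x_lags: Sequence[float]) -> float:
    """tau = exp(m + theta * weighted sum of lagged X); x_lags[0] is lag 1."""
    if len(weights) != len(x_lags):
        raise ValueError("weights and x_lags must have the same length")
    return math.exp(m + theta * sum(w * x for w, x in zip(weights, x_lags)))


monthly = beta_weights(K=36, w1=1.0, w2=4.0)
daily = beta_weights(K=264, w1=1.0, w2=5.0)
tail_share = sum(monthly[24:])  # weight on lags beyond two-thirds of K
print(f"monthly scheme: first weight {monthly[0]:.4f}, weight on lags 25-36: {tail_share:.4f}")
print(f"daily scheme:   first weight {daily[0]:.4f}, last weight {daily[-1]:.2e}")
```

With w1 = 1 the weights decay monotonically, and in the monthly design only a small fraction of the total weight falls on lags beyond 24 months, which matches the description above.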

3.2 Parameter estimates

3.2.1 Correctly specified models: Bias and asymptotic standard errors

We use the first 20 years of simulated data as the “in‐sample” period to obtain QML estimates of the model parameters. Table 1 reports the average bias of the QMLE across the 2,000 Monte Carlo simulations. In panels A/B the innovations Zn,i,t are normally/Student t distributed. First, we focus on panel A. In this case the density is correctly specified and the QMLE is the maximum likelihood estimator. Note that for all parameters except w2 the average bias is close to zero when the conditional variance is correctly specified (i.e., with MIDAS lag length of K=36 (monthly) and K=264 (daily) respectively). For w2 we clearly observe an upward bias. [Footnote: Figure C.1 in the Supporting Information Appendix compares the histogram of the standardized parameter estimates over the 2,000 Monte Carlo replications with a standard normal distribution. The figure shows that for all parameters except w2 the empirical distribution of the parameter estimates is very well approximated by the normal distribution.] Based on the 2,000 Monte Carlo replications, we also calculate the empirical standard deviation of the estimated parameters. In Table 1 these figures are presented in curly brackets. The numbers in parentheses are the average asymptotic standard errors based on the results in Wang and Ghysels (2015). A comparison of these numbers shows that the asymptotic standard errors are close to the empirical standard deviation of estimated parameters. The only exception is the specification with monthly τt where the asymptotic standard errors of w2 appear to be too big. Nevertheless, the overall performance of the asymptotic standard errors is very satisfactory. That is, the Wang and Ghysels (2015) asymptotic standard errors that were derived under the assumption that urn:x-wiley:jae:media:jae2742:jae2742-math-0160 are applicable more generally.

Table 1. Monte Carlo parameter estimates

Model | α | β | m | θ | w2 | κ−3

Panel A: Zn,i,t normally distributed

Monthly τt
GARCH‐MIDAS (36) | -0.000 | -0.004 | -0.007 | 0.036 | 1.959 | -0.010
 | {0.008} | {0.014} | {0.071} | {0.145} | {6.494} |
 | (0.009) | (0.015) | (0.070) | (0.137) | (12.240) |
GARCH‐MIDAS (12) | -0.000 | -0.003 | -0.006 | -0.029 | -0.470 | -0.009
GARCH‐MIDAS (36, urn:x-wiley:jae:media:jae2742:jae2742-math-0161) | 0.000 | -0.003 | -0.006 | 0.000 | 0.788 | -0.009
GARCH‐MIDAS (12, urn:x-wiley:jae:media:jae2742:jae2742-math-0162) | 0.000 | -0.002 | -0.005 | -0.075 | -0.869 | -0.008
GARCH | 0.000 | 0.003 | 0.009 | | | 0.001

Daily τt
GARCH‐MIDAS (264) | -0.000 | -0.003 | -0.003 | 0.010 | 1.030 | -0.006
 | {0.008} | {0.014} | {0.063} | {0.078} | {5.020} |
 | (0.008) | (0.014) | (0.062) | (0.075) | (4.786) |
GARCH‐MIDAS (66) | -0.000 | -0.002 | -0.001 | -0.053 | -3.247 | -0.004
GARCH‐MIDAS (264, urn:x-wiley:jae:media:jae2742:jae2742-math-0163) | -0.000 | -0.003 | -0.002 | 0.002 | 0.332 | -0.005
GARCH‐MIDAS (66, urn:x-wiley:jae:media:jae2742:jae2742-math-0164) | 0.000 | -0.002 | 0.000 | -0.066 | -3.414 | -0.003
GARCH | 0.003 | 0.003 | 0.031 | | | 0.020

Panel B: Zn,i,t Student t distributed

Monthly τt
GARCH‐MIDAS (36) | -0.000 | -0.004 | -0.008 | 0.040 | 1.491 | 0.108
 | {0.008} | {0.014} | {0.075} | {0.152} | {5.983} |
 | (0.008) | (0.015) | (0.071) | (0.141) | (11.033) |
GARCH‐MIDAS (12) | -0.000 | -0.003 | -0.006 | -0.030 | -0.589 | 0.109
GARCH‐MIDAS (36, urn:x-wiley:jae:media:jae2742:jae2742-math-0165) | -0.000 | -0.003 | -0.006 | 0.003 | 0.715 | 0.110
GARCH‐MIDAS (12, urn:x-wiley:jae:media:jae2742:jae2742-math-0166) | -0.000 | -0.002 | -0.004 | -0.073 | -0.797 | 0.111
GARCH | -0.000 | 0.003 | 0.011 | | | 0.122

Daily τt
GARCH‐MIDAS (264) | -0.000 | -0.003 | -0.002 | 0.012 | 1.136 | 0.112
 | {0.008} | {0.014} | {0.065} | {0.082} | {5.896} |
 | (0.008) | (0.014) | (0.063) | (0.075) | (6.039) |
GARCH‐MIDAS (66) | 0.000 | -0.002 | 0.000 | -0.052 | -2.730 | 0.114
GARCH‐MIDAS (264, urn:x-wiley:jae:media:jae2742:jae2742-math-0167) | 0.000 | -0.003 | -0.001 | 0.003 | 0.341 | 0.114
GARCH‐MIDAS (66, urn:x-wiley:jae:media:jae2742:jae2742-math-0168) | 0.000 | -0.002 | 0.001 | -0.064 | -3.372 | 0.116
GARCH | 0.003 | 0.003 | 0.034 | | | 0.141
  • Note. The table reports the average bias of parameter estimates and the corresponding standard errors across 2,000 Monte Carlo simulations. We provide results for both daily and monthly long‐term components. In curly brackets, empirical standard deviations of parameter estimates are reported. Entries in parentheses correspond to the square root of average Wang and Ghysels (2015) asymptotic variances. The parameter estimates are based on (the first) 20 years of observations (i.e. the in‐sample period). In both long‐term components (see Equations 7 and 9), we choose θ=0.3 and w1=1. We use m=0.1 and w2=4 in the monthly τt and m=−0.1 and w2=5 in the daily τt. The long‐term component is assumed to depend on K=36 monthly or K=264 daily observations. The covariate Xt is modeled as an AR(1) process; that is, urn:x-wiley:jae:media:jae2742:jae2742-math-0169, with ϕ=0.9, urn:x-wiley:jae:media:jae2742:jae2742-math-0170 for a monthly, and ϕ=0.98, urn:x-wiley:jae:media:jae2742:jae2742-math-0171 for a daily τt. The parameters of the short‐term component are in both cases given by α=0.06, β=0.91 and γ=0. For each model that is estimated based on the true value of Xt, we also incorporate estimations in which Xt is replaced by a noisy proxy urn:x-wiley:jae:media:jae2742:jae2742-math-0172. It is modeled as urn:x-wiley:jae:media:jae2742:jae2742-math-0173 in the case of the monthly varying τt and urn:x-wiley:jae:media:jae2742:jae2742-math-0174 in the case of a daily varying τt. The column “κ−3” presents the mean excess kurtosis of the standardized residuals from each model.

3.2.2 Misspecified models: Bias

Next, we investigate the effect of model misspecification. First, we consider specifications with a smaller lag length than the true one. [Footnote: We do not report results for K being chosen too large as the Beta weighting scheme is flexible enough to downweight uninformative lags to almost zero.] Choosing a lag length that is too small (K=12 for monthly τt or K=66 for daily τt) does not lead to a bias in the parameter estimates, with the exception of w2, whose QMLE is now downwardly biased. As the estimated weighting schemes in Figure 3 show, the downward bias in w2 translates into biased weighting schemes. Second, we consider the case of observing the explanatory variable Xt with measurement error. This is a reasonable scenario because in practice the true Xt is either unknown to or unobservable for the researcher, who will base the analysis on a reasonable proxy. We denote the proxy by urn:x-wiley:jae:media:jae2742:jae2742-math-0175 and specify it as Xt plus conditionally heteroskedastic noise. In the case of monthly τt the noise is given by urn:x-wiley:jae:media:jae2742:jae2742-math-0176 and in the case of daily τt by urn:x-wiley:jae:media:jae2742:jae2742-math-0177. The average correlation between Xt and urn:x-wiley:jae:media:jae2742:jae2742-math-0178 is 68.79%/62.71% for monthly/daily τt. As before, only the QML estimates of w2 appear to be biased when Xt is replaced with urn:x-wiley:jae:media:jae2742:jae2742-math-0179. Last, we estimate a misspecified one‐component GARCH model that is obtained when restricting τt to be constant. Despite the omitted long‐term component, the parameter estimates of α and β are essentially unbiased.

Figure 3. Weighting schemes implied by mean parameter estimates. Estimated Beta weighting schemes (see Equation 9) as implied by the mean parameter estimates reported in Table 1. The green (solid) line corresponds to the case of a correctly specified model, whereas the red (dot‐dashed) line corresponds to a model with K being too small. With the brown (long dashed) and purple (short dashed) line, the corresponding cases of a GARCH‐MIDAS with measurement error are reported. The black line shows the true weighting scheme.

Note that the numbers in panel B of Table 1 are very similar to those in panel A. When replacing the normally distributed innovations with Student t distributed innovations, the density in the maximum likelihood estimation is misspecified and the estimator is truly QMLE. Nevertheless, this change hardly affects our findings. The only notable difference can be seen in the last column of Table 1, which shows the average excess kurtosis of the fitted standardized residuals. Those residuals are given by urn:x-wiley:jae:media:jae2742:jae2742-math-0180 for the GARCH‐MIDAS models and by urn:x-wiley:jae:media:jae2742:jae2742-math-0181 for the GARCH model. While the excess kurtosis is essentially zero in panel A, in panel B there is still excess kurtosis, reflecting the fact that the innovations are Student t distributed.

3.3 Forecast evaluation

Next, we evaluate the forecast performance of the different specifications. Based on the in‐sample parameter estimates, we construct OOS volatility forecasts for the remaining 20 years. Keeping the parameter estimates fixed is usually referred to as a “fixed (forecasting) scheme.” [Footnote: In contrast, in the empirical forecast evaluation in Section 4.4 we apply a “rolling scheme.” As we will discuss below, this is important because it takes into account the real‐time nature of the data and allows for changes in the model parameters.] The forecast performance of the different models will be evaluated over the 2,000 Monte Carlo replications.

We compare the forecast performance of the correctly specified GARCH‐MIDAS with all the misspecified models presented in Table 1. In addition, we consider the two‐state MS‐GARCH‐TVI model that was introduced in Section 2.2. [Footnote: In‐sample parameter estimates for the MS‐GARCH‐TVI model can be found in the Supporting Information Appendix, Table B.1. The median estimates of α and β are close to the true values. The estimates of ω1 and ω2 represent a low‐ and a high‐volatility regime. As measured by ϖ=p1,1+p2,2−1, the degree of persistence in the long‐term component is very high.]

3.3.1 MZ regression

We first present the outcomes of MZ regressions. Figure 4 shows the urn:x-wiley:jae:media:jae2742:jae2742-math-0182 of MZ regressions for volatility forecasts, hk,t+1|t, with k=1,…,22 (i.e., for up to 1 month ahead). Forecast evaluation is based on the noisy proxy urn:x-wiley:jae:media:jae2742:jae2742-math-0183, whereby the data generating process is the GARCH‐MIDAS with monthly τt and normally distributed innovations. The forecasts are generated from the correctly specified GARCH‐MIDAS model. We present the urn:x-wiley:jae:media:jae2742:jae2742-math-0184 for the full OOS period as well as for three different volatility regimes: low, normal, and high. Volatility regimes are defined as follows. We consider the empirical distribution of daily realized variances during the OOS period. A forecast falls into the low/normal/high‐volatility regime if the level of the realized variance on the day the forecast has been issued is below the 25% quantile, between the 25% and 75% quantile, or above the 75% quantile of the empirical distribution. In line with our theoretical result in Proposition 4, the urn:x-wiley:jae:media:jae2742:jae2742-math-0185s for the full sample are decreasing with increasing forecast horizon. As expected, urn:x-wiley:jae:media:jae2742:jae2742-math-0186 is below the upper bound of one‐third (see Equation 15). Among the three regimes, we observe the highest urn:x-wiley:jae:media:jae2742:jae2742-math-0187s in the high‐volatility regime. Clearly, the high urn:x-wiley:jae:media:jae2742:jae2742-math-0188s in the high‐volatility regime do not reflect an improved absolute forecast performance but rather an improved relative forecast performance. Further, note that for almost all forecast horizons the urn:x-wiley:jae:media:jae2742:jae2742-math-0189s in the full sample are higher than in each subsample.
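The regime classification described above can be written down in a few lines. The sketch assigns each forecast to a regime according to the realized variance on the day the forecast was issued, using the 25% and 75% empirical quantiles. The function name is illustrative, and the quantile interpolation rule is that of Python's `statistics.quantiles`, which may differ slightly from the one used in the paper.

```python
import statistics
from typing import List, Sequence


def classify_regimes(realized_variances: Sequence[float]) -> List[str]:
    """Label each day as 'low', 'normal', or 'high' volatility."""
    q25, _, q75 = statistics.quantiles(realized_variances, n=4)
    labels = []
    for rv in realized_variances:
        if rv < q25:
            labels.append("low")
        elif rv > q75:
            labels.append("high")
        else:
            labels.append("normal")
    return labels


# Toy example with 100 made-up realized variances.
example = [float(i) for i in range(1, 101)]
labels = classify_regimes(example)
print({name: labels.count(name) for name in ("low", "normal", "high")})
```

MZ regressions can then be run separately on the forecasts carrying each label, which is how regime‐specific coefficients of determination are obtained.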

Figure 4. MZ R2, monthly τt, evaluation based on urn:x-wiley:jae:media:jae2742:jae2742-math-0190. The figure shows the average urn:x-wiley:jae:media:jae2742:jae2742-math-0191 of MZ regressions based on the predictions from the correctly specified GARCH‐MIDAS model over all 2,000 Monte Carlo replications. The true volatility is proxied by urn:x-wiley:jae:media:jae2742:jae2742-math-0192. Besides the full out‐of‐sample period, we consider low‐, normal‐, and high‐volatility regimes. For a definition of the regimes see Section 3.3.1.

For empirical applications, cumulative volatility forecasts are of greater importance than k‐step‐ahead forecasts. Hence in Figure 5 we present the urn:x-wiley:jae:media:jae2742:jae2742-math-0193 of MZ regressions for cumulative volatility forecasts, h1:k,t+1|t, with k=1,…,22. Note that, by construction, the volatility forecasts are nonoverlapping. We now present forecasts from the correctly specified and the misspecified GARCH‐MIDAS models as well as from the MS‐GARCH‐TVI and the nested GARCH. Forecast evaluation is based on the precise proxy RV1:k,t+1. Panels (a)/(b) show the results for monthly/daily τt. Based on Figure 5, we are able to rank the different models' forecast performance. While the performance of all GARCH‐MIDAS models is essentially indistinguishable, the one‐component GARCH and the MS‐GARCH‐TVI models lead to a lower urn:x-wiley:jae:media:jae2742:jae2742-math-0194. Differences between models are most pronounced in the low and normal regime.

Figure 5. MZ urn:x-wiley:jae:media:jae2742:jae2742-math-0195, monthly and daily τt, evaluation based on RV1:k,t+1: (a) monthly τt; (b) daily τt. For each model the figure shows the average urn:x-wiley:jae:media:jae2742:jae2742-math-0196 of the MZ regressions over the 2,000 Monte Carlo replications. The true volatility is proxied by RV1:k,t+1. The upper/lower panels display the case of monthly/daily long‐term components. Besides the full out‐of‐sample period, we consider low‐, normal‐, and high‐volatility regimes. For a definition of the regimes see Section 3.3.1.

3.3.2 Model confidence sets

Next, we formally test for superior predictive ability. We base our analysis on the MCS approach introduced by Hansen et al. (2011). Following the arguments in Patton (2011), we use the QLIKE loss as the evaluation criterion. For a k‐step‐ahead volatility forecast, the QLIKE is defined as
urn:x-wiley:jae:media:jae2742:jae2742-math-0197(22)

The QLIKE is the only robust loss function that depends solely on the standardized forecast error, urn:x-wiley:jae:media:jae2742:jae2742-math-0198. As discussed in Patton (2011), the QLIKE is less sensitive with respect to extreme observations than the squared error loss. Further, it can be shown that the moment conditions required for Diebold and Mariano (1995) or Giacomini and White (2006) type tests are weaker under QLIKE than under squared error loss (see Patton, 2006).
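A small numerical example makes the properties of the QLIKE concrete. The sketch assumes the normalized form discussed in Patton (2011), QLIKE = σ²/h − ln(σ²/h) − 1, which depends on the proxy and the forecast only through their ratio. Whether Equation 22 uses exactly this normalization is an assumption here, and the function name is illustrative.

```python
import math


def qlike(proxy: float, forecast: float) -> float:
    """QLIKE loss in the normalized form ratio - log(ratio) - 1 (assumed)."""
    if proxy <= 0.0 or forecast <= 0.0:
        raise ValueError("proxy and forecast must be strictly positive")
    ratio = proxy / forecast
    return ratio - math.log(ratio) - 1.0


loss_exact = qlike(1.0, 1.0)   # forecast equals the proxy
loss_under = qlike(1.0, 0.5)   # forecast half as large as the proxy
loss_over = qlike(1.0, 2.0)    # forecast twice as large as the proxy
print(f"exact: {loss_exact:.4f}, under-prediction: {loss_under:.4f}, over-prediction: {loss_over:.4f}")
```

In this form the loss is zero for a perfect forecast, is unchanged when proxy and forecast are rescaled by the same factor, and penalizes under‐prediction more heavily than over‐prediction of the same proportional size.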

We consider the following forecasting schemes. Based on the information available at the last day of the current month, cumulative volatility forecasts are computed for horizons of 1 day (1d), 2 weeks (2w), and 1 month (1m), as well as forecasts of volatility in 2 months (2m) and 3 months (3m). Whenever the forecast horizon is longer than the frequency of the long‐term component, the optimal forecast requires predicting the long‐term component. Instead, we simply fix the long‐term component at its current level (see Section 2.4). Forecast evaluation is now based on the precise proxy RV1:k,t+1. Next, we explain how the MCS is obtained. Denote by urn:x-wiley:jae:media:jae2742:jae2742-math-0199 the set of all competing models. We define
urn:x-wiley:jae:media:jae2742:jae2742-math-0200
as the difference in the QLIKE loss of models i and j. For example, when s=1 and k∈{1,5,22} the forecast urn:x-wiley:jae:media:jae2742:jae2742-math-0201 denotes the cumulative forecast for the first (1d), the first 5 (1w), or all 22 (1m) days in the following month, while for s∈{2,3} and k=22 we obtain the forecast for 2 (2m) and 3 (3m) months in the future. We compute the average loss difference, urn:x-wiley:jae:media:jae2742:jae2742-math-0202, and calculate the test statistic:
urn:x-wiley:jae:media:jae2742:jae2742-math-0203(23)

The MCS test statistic is then given by urn:x-wiley:jae:media:jae2742:jae2742-math-0204 and has the null hypothesis that all models have the same expected loss. Under the alternative, there is some model i that has an expected loss greater than the expected loss of all other models urn:x-wiley:jae:media:jae2742:jae2742-math-0205. If the null hypothesis is rejected, the worst‐performing model is eliminated. The test is performed iteratively, until no further model can be eliminated. We denote the final set of surviving models by urn:x-wiley:jae:media:jae2742:jae2742-math-0206. This final set contains the best forecasting model with confidence level 1−ν. We set ν=0.1. This choice is common practice in the literature. See, for example, Laurent, Rombouts, and Violante (2013) and Liu, Patton, and Sheppard (2015).

Since the asymptotic distribution of the test statistic urn:x-wiley:jae:media:jae2742:jae2742-math-0207 is nonstandard, we approximate it by block‐bootstrapping as proposed by Hansen et al. (2011), where the block length is determined by fitting an AR(p) process to the series of loss differences. In our analysis, 8,000 bootstrap replications at each stage were sufficient in order to obtain stable results. [Footnote: For implementing the MCS procedure, we use the R package rugarch (Ghalanos, 2018), which includes the implementation used in the MFE Matlab Toolbox by Kevin Sheppard. See https://www.kevinsheppard.com/MFE_Toolbox.]

Table 2 reports how often a certain model is included in the MCS across the 2,000 replications. Panel A provides results for normally distributed innovations and panel B for Student t distributed innovations. For example, for normally distributed innovations, monthly τt, and a forecast horizon of 1 day, the correctly specified GARCH‐MIDAS (36) is included in the MCS in 85% of the replications. The table clearly shows that the misspecified one‐component GARCH model is included less often in the MCS than the GARCH‐MIDAS models. In particular, this is the case for daily τt. Further, for daily τt and forecast horizons of up to 2 months the MS‐GARCH‐TVI is less often part of the MCS than all GARCH‐MIDAS models. Additionally, among the GARCH‐MIDAS models the correctly specified one has the highest inclusion rates in the MCS when the forecast horizon is up to 1 month. At least for monthly τt, it appears that a misspecification of the lag length is less severe than observing the explanatory variable with measurement error. Finally, at the longest forecast horizon (3m) all forecasts suffer from a misspecified forecast of the long‐term component and hence it becomes increasingly difficult to distinguish between models.

Table 2. Model confidence set inclusion rates
1d 2w 1m 2m 3m
Panel A: Zn,i,t normally distributed
Monthly τt GARCH‐MIDAS (36) 0.850 0.758 0.770 0.795 0.792
GARCH‐MIDAS (12) 0.852 0.745 0.762 0.818 0.827
GARCH‐MIDAS (36, urn:x-wiley:jae:media:jae2742:jae2742-math-0208) 0.723 0.559 0.589 0.650 0.661
GARCH‐MIDAS (12, urn:x-wiley:jae:media:jae2742:jae2742-math-0209) 0.696 0.539 0.560 0.648 0.684
MS‐GARCH‐TVI 0.765 0.560 0.603 0.664 0.673
GARCH 0.477 0.221 0.216 0.260 0.310
Daily τt GARCH‐MIDAS (264) 0.946 0.893 0.861 0.784 0.743
GARCH‐MIDAS (66) 0.850 0.796 0.836 0.890 0.878
GARCH‐MIDAS (264, urn:x-wiley:jae:media:jae2742:jae2742-math-0210) 0.843 0.672 0.646 0.663 0.688
GARCH‐MIDAS (66, urn:x-wiley:jae:media:jae2742:jae2742-math-0211) 0.763 0.614 0.664 0.778 0.831
MS‐GARCH‐TVI 0.376 0.100 0.138 0.467 0.765
GARCH 0.257 0.043 0.050 0.244 0.493
Panel B: Zn,i,t Student t distributed
Monthly τt GARCH‐MIDAS (36) 0.912 0.790 0.772 0.761 0.764
GARCH‐MIDAS (12) 0.922 0.808 0.785 0.812 0.818
GARCH‐MIDAS (36, urn:x-wiley:jae:media:jae2742:jae2742-math-0212) 0.842 0.656 0.640 0.652 0.650
GARCH‐MIDAS (12, urn:x-wiley:jae:media:jae2742:jae2742-math-0213) 0.841 0.636 0.622 0.668 0.683
MS‐GARCH‐TVI 0.875 0.666 0.654 0.675 0.664
GARCH 0.734 0.331 0.267 0.280 0.309
Daily τt GARCH‐MIDAS (264) 0.968 0.912 0.866 0.792 0.742
GARCH‐MIDAS (66) 0.918 0.839 0.862 0.885 0.854
GARCH‐MIDAS (264, urn:x-wiley:jae:media:jae2742:jae2742-math-0214) 0.927 0.769 0.712 0.694 0.685
GARCH‐MIDAS (66, urn:x-wiley:jae:media:jae2742:jae2742-math-0215) 0.877 0.726 0.731 0.812 0.822
MS‐GARCH‐TVI 0.690 0.222 0.206 0.501 0.758
GARCH 0.602 0.112 0.093 0.276 0.485
  • Note. The numbers are the empirical frequencies of a model being included in the 90% model confidence set at different forecast horizons: 1 day (1d), 2 weeks (2w), 1 month (1m), 2 months (2m), and 3 months (3m). Panel A corresponds to the simulation with normally distributed intraday returns and Panel B to standardized Student t distributed intraday returns with five degrees of freedom. The averages are taken across 2,000 Monte Carlo replications.

In summary, independently of whether the long‐term component is specified at a daily or monthly frequency, the correctly specified GARCH‐MIDAS model as well as the GARCH‐MIDAS with misspecified lag length clearly outperform the one‐component GARCH as well as the MS‐GARCH‐TVI in terms of forecast performance. For models with daily long‐term components this result also holds when the explanatory variable is observed with measurement error. Only for monthly long‐term components and measurement error in Xt, we find that the MS‐GARCH‐TVI performs slightly better.

Remark 2. As discussed in Section 2.1, Assumption 3 is likely to hold for explanatory variables that are observed at a lower frequency than the daily returns. For certain daily explanatory variables (e.g., the VIX index) Assumption 3 might be violated. However, under reasonable assumptions the correlation between the innovations to the daily returns and Xt itself can be expected to be small. The correlation with future τt will be even smaller. For a more detailed discussion see Supporting Information Appendix D, which also provides additional simulations. The simulations show that even if Assumption 3 is mildly violated all the previous findings still hold.

4 EMPIRICAL ANALYSIS

Last, we turn to an empirical application of the GARCH‐MIDAS models to S&P 500 return data. In Section 4.1 we introduce our data set. Full sample estimation results for various GARCH‐MIDAS models are reported in Section 4.2. Thereafter, in Section 4.3 we explain how real‐time volatility forecasts can be constructed when taking into account the release schedule of macroeconomic variables. The forecast comparison is carried out in Section 4.4, where we evaluate the GARCH‐MIDAS volatility forecasts against forecasts from eight competitor models.

4.1 Data

4.1.1 Stock market data

We consider daily log‐returns on the S&P 500, calculated as urn:x-wiley:jae:media:jae2742:jae2742-math-0216, for the 1971:M1 to 2018:M4 period. For evaluating the volatility forecasts, we employ daily realized variances, RVi,t, defined as the sum of the squared 5‐minute intraday log‐returns on day t plus the squared overnight log‐return. The latter is defined as the log of the open price on day t minus the log of the close price on day t−1. This approach follows Bollerslev, Hood, Huss, and Pedersen (2018), among others. The data for constructing RVi,t were obtained from the Realized Library of the Oxford‐Man Institute of Quantitative Finance and are available from the year 2000 onwards (see Heber, Lunde, Shephard, & Sheppard, 2009).
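As a minimal sketch of the realized-variance definition above, assume a list of 5‐minute prices for day t (first entry the open, last entry the close) and the close of day t−1. The function and argument names are illustrative and not taken from the paper.

```python
import math


def realized_variance(intraday_prices, prev_close):
    """Sum of squared intraday log-returns plus the squared overnight log-return."""
    # Overnight return: log open on day t minus log close on day t-1.
    overnight = math.log(intraday_prices[0]) - math.log(prev_close)
    # Intraday returns between consecutive 5-minute prices.
    intraday = [math.log(b) - math.log(a)
                for a, b in zip(intraday_prices, intraday_prices[1:])]
    return sum(r * r for r in intraday) + overnight ** 2
```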

4.1.2 Explanatory variables

As explanatory variables we use daily measures of financial risk, a weekly measure of financial conditions and monthly macroeconomic variables. We employ backward‐ and forward‐looking measures of daily volatility. The former is proxied by a rolling window of the average realized volatility (based on squared daily returns) over the previous 22 days, urn:x-wiley:jae:media:jae2742:jae2742-math-0217, and the latter by the VIX index (converted to a daily level by dividing it by urn:x-wiley:jae:media:jae2742:jae2742-math-0218). In addition, we consider the difference between the VIX (divided by urn:x-wiley:jae:media:jae2742:jae2742-math-0219) and RVol(22) as a proxy for the (square root of the) variance risk premium (VRP). [Footnote: Note that the conventional definition of the variance risk premium is the squared VIX minus realized variance. We are interested in expressing the quantity in volatility units. Because the realized VRP takes positive as well as negative values, we take the square root of both quantities before we take the difference.]
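The three daily measures can be sketched as follows. The formulas are assumptions made for illustration: RVol(22) is taken to be the square root of the average squared daily return over the previous 22 days, and the VIX is brought to a daily level by dividing by the square root of 252 trading days. The paper's exact expressions are not reproduced here.

```python
import math

TRADING_DAYS = 252  # assumption: annualization constant, not confirmed by the text


def rvol22(daily_returns):
    """Square root of the mean squared daily return over the last 22 days."""
    window = daily_returns[-22:]
    if len(window) < 22:
        raise ValueError("need at least 22 daily returns")
    return math.sqrt(sum(r * r for r in window) / 22)


def vix_daily(vix_annualized):
    """Annualized VIX expressed in daily volatility units."""
    return vix_annualized / math.sqrt(TRADING_DAYS)


def vrp_proxy(vix_annualized, daily_returns):
    """Difference in volatility units: daily-level VIX minus RVol(22)."""
    return vix_daily(vix_annualized) - rvol22(daily_returns)
```

Taking the difference in volatility units keeps the proxy well defined when the realized premium is negative.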

We use the weekly National Financial Conditions Index (NFCI) as a measure for the tightness of financial conditions in the USA. The NFCI is a weighted average of 105 standardized financial indicators of risk, credit and leverage derived by dynamic factor analysis. Monthly macroeconomic conditions are measured by the Chicago Fed National Activity Index (NAI) and growth rates of industrial production (IP) and housing starts, both calculated as urn:x-wiley:jae:media:jae2742:jae2742-math-0220. While the macroeconomic variables are included from 1971 onwards, the NFCI series begins in 1973 and the VIX is available from 1990 onwards. [Footnote: Table B.2 in the Supporting Information Appendix provides summary statistics for the stock returns and the seven explanatory variables. Figure C.2 in the Supporting Information Appendix shows the evolution of the corresponding time series. Further details on the data set are provided in Supporting Information Appendix F.]

Before we estimate GARCH‐MIDAS models, we employ the Conrad and Schienle (2018) Lagrange multiplier (LM) test for an omitted multiplicative component in one‐component GARCH models. This test checks whether a simple GJR‐GARCH(1,1) is misspecified in the sense of neglecting a second component that is driven by an explanatory variable X. Since the test is of the LM type, it requires estimation of the model under the null hypothesis only. Assuming that under the alternative there is a second component which is driven by K lags of the variable X, the test statistic can be shown to be asymptotically χ2 distributed with K degrees of freedom. An appealing property of the test is that it can be applied in settings where X is observed at the same frequency as the returns but also when X is observed at a lower frequency. Intuitively, the test checks whether the squared standardized residuals from the GJR‐GARCH are predictable using (functions of) past values of X. Table 3 shows the outcome of the test when applied to each of our explanatory variables. When choosing either K=1 or K=2, the test clearly rejects the null hypothesis that a GJR‐GARCH is correctly specified for all variables except housing starts. Thus the LM test results suggest using GARCH‐MIDAS models instead. The estimates for a GARCH‐MIDAS model based on housing starts in Section 4.2.1 will show that housing starts are a leading indicator with respect to financial volatility. This implies that a choice of K=1 or K=2 is too small for this variable. When redoing the LM test for a lag length of up to K=12 the LM test indeed rejects the null hypothesis also for housing starts.
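To show how an LM statistic maps into a p‐value under a χ2 reference distribution with K degrees of freedom, the following sketch evaluates the upper‐tail probability in closed form for integer K. It does not compute the Conrad and Schienle (2018) statistic itself, and the function name is illustrative.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(chi2_k > x) for integer k >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{j < k/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, k // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd k: normal tail part plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total
```

For instance, `chi2_sf(lm_stat, 12)` would give the p‐value for the K=12 variant of the test, where `lm_stat` is a hypothetical variable holding the statistic.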

Table 3. LM test for misspecification of GJR‐GARCH(1,1)
Xt VIX RVol NFCI NAI Δ IP Δ Hous
K=1 urn:x-wiley:jae:media:jae2742:jae2742-math-0221 urn:x-wiley:jae:media:jae2742:jae2742-math-0222 urn:x-wiley:jae:media:jae2742:jae2742-math-0223 urn:x-wiley:jae:media:jae2742:jae2742-math-0224 urn:x-wiley:jae:media:jae2742:jae2742-math-0225 urn:x-wiley:jae:media:jae2742:jae2742-math-0226
K=2 urn:x-wiley:jae:media:jae2742:jae2742-math-0227 urn:x-wiley:jae:media:jae2742:jae2742-math-0228 urn:x-wiley:jae:media:jae2742:jae2742-math-0229 urn:x-wiley:jae:media:jae2742:jae2742-math-0230 urn:x-wiley:jae:media:jae2742:jae2742-math-0231 urn:x-wiley:jae:media:jae2742:jae2742-math-0232
  • Note. The table reports the test statistics and corresponding p‐values of the Conrad and Schienle (2018) misspecification test for one‐component GJR‐GARCH(1,1) models. The test is implemented using either one (K=1) or two (K=2) lags of the explanatory variable Xt. For VIX and RVol(22) the test is based on daily data from 1990 onwards; for the NFCI, NAI, Δ IP, and Δ Housing starts the test is based on weekly/monthly data from 1974 onwards.

We can also apply the LM test jointly to several variables at the same time. However, all variables need to be observed at the same frequency. When including the NAI, industrial production and housing starts and selecting an appropriate lag length, the NAI and housing starts are individually significant, whereas industrial production is not. This suggests that among the macroeconomic variables the NAI and housing starts are most informative. We also aggregated the VIX and the NFCI to a monthly frequency and performed the LM test jointly for all variables. While the overall LM statistic is highly significant, the VIX, the NFCI and housing starts are the only variables that are individually significant.

4.2 Full‐sample parameter estimates

4.2.1 One explanatory variable

We first estimate a GARCH‐MIDAS model for each explanatory variable for the full sample. We include a constant in the mean equation; that is, returns are modeled as ri,t=μ+εi,t. After visual inspection of the estimated weighting schemes for alternative choices of K, we err on the side of a lag length that is too large rather than too small. As discussed in Section 3, the data will identify the optimal weighting scheme as long as K is chosen large enough. We choose K=264 for RVol(22), K=3 for the VIX/VRP and K=52 for the NFCI. [Footnote: For all variables, Figure C.3 in the Supporting Information Appendix shows the estimated weighting schemes for selected choices of K. The figure illustrates that the estimated weighting schemes no longer change once the selected lag length is sufficiently large. In all cases, our choice of the lag length is rather conservative.] Thus, for the forward‐looking VIX/VRP, only the most recent information appears to drive long‐term volatility, while the backward‐looking RVol(22) is smoothed over many lags. As in Conrad and Loch (2015), we choose K=36 for the monthly macroeconomic variables. The estimates for the parameters in the conditional variance are reported in Table 4. For all variables except housing starts, we find that a restricted Beta weighting scheme with w1=1 is the best choice; that is, the optimal weights are declining from the beginning. For housing starts, an unrestricted scheme that allows for “hump‐shaped” weights is required. This confirms the finding in Conrad and Loch (2015) that housing starts are leading with respect to long‐term volatility. [Footnote: Figure C.4 in the Supporting Information Appendix shows the estimated weighting schemes.] Note that the GARCH‐MIDAS models based on the NFCI and the three macroeconomic variables employ return data for the 1974:M1 to 2018:M4 period, while the models with daily τt employ data from 1990:M1 onwards. Hence models based on daily τt cannot be compared to models based on weekly/monthly τt in terms of log‐likelihood or Bayesian information criterion (BIC).
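The difference between restricted and unrestricted Beta weights can be sketched as follows. The sketch assumes the common two‐parameter Beta lag polynomial evaluated at k/(K+1); the paper's own definition appears in an earlier section and may differ in detail. The parameter values are the point estimates for the VIX and housing starts from Table 4.

```python
def beta_weights(K, w1, w2):
    """Normalized Beta weights for lags k = 1, ..., K (assumed functional form)."""
    raw = [(k / (K + 1)) ** (w1 - 1.0) * (1.0 - k / (K + 1)) ** (w2 - 1.0)
           for k in range(1, K + 1)]
    total = sum(raw)
    return [r / total for r in raw]


# Restricted scheme (w1 = 1): weights decline from the first lag onwards.
vix_weights = beta_weights(3, 1.0, 3.470)

# Unrestricted scheme: weights first rise and then fall (hump shape).
housing_weights = beta_weights(36, 1.695, 2.586)
peak_lag = housing_weights.index(max(housing_weights)) + 1
```

Under this assumed form, a peak lag greater than one corresponds to the leading property of housing starts described above.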

Table 4. Full‐sample estimation results: GARCH‐MIDAS with one explanatory variable
α β γ m θ w1 w2 K LLH BIC VR(X)
Daily τt
RVol(22) 0.000 0.843*** 0.192*** −1.261*** 1.177*** 1 3.049*** 264 −9,201 18,465 42.78
(0.008) (0.012) (0.015) (0.112) (0.096) (0.675)
VIX 0.000 0.853*** 0.095*** −2.129*** 1.524*** 1 3.470** 3 −9,138 18,339 76.14
(0.010) (0.021) (0.015) (0.086) (0.067) (1.371)
VRP 0.017** 0.902*** 0.128*** −0.384*** 1.084*** 1 5.571** 3 −9,174 18,410 10.92
(0.007) (0.007) (0.011) (0.137) (0.096) (2.591)
Weekly τt
NFCI 0.017*** 0.902*** 0.115*** −0.101 0.252*** 1 2.892 52 −15,103 30,271 11.42
(0.006) (0.005) (0.007) (0.073) (0.048) (2.314)
Monthly τt
NAI 0.019*** 0.900*** 0.116*** −0.058 −0.359*** 1 9.066*** 36 −14,569 29,202 14.14
(0.006) (0.005) (0.007) (0.079) (0.073) (3.312)
Δ IP 0.019*** 0.903*** 0.113*** 0.074 −0.650*** 1 5.271*** 36 −14,573 29,211 10.63
(0.006) (0.005) (0.007) (0.089) (0.161) (1.782)
Δ Housing 0.019*** 0.897*** 0.119*** −0.079 −0.237*** 1.695*** 2.586*** 36 −14,559 29,192 19.63
(0.005) (0.005) (0.007) (0.076) (0.034) (0.383) (0.770)
GARCH 0.021*** 0.911*** 0.103*** −0.073 −15,355 30,757
(0.005) (0.005) (0.007) (0.098)
  • Note. Estimation results for GARCH‐MIDAS models are reported for seven explanatory variables. Estimation using the NFCI, NAI, IP, and housing starts begins in 1974:M1 based on low‐frequency observations reaching as far back as 1971:M1 in line with the lag length K. Estimation of the GARCH‐MIDAS models using RVol(22) and VIX as an explanatory variable employs daily return data starting in 1990:M1. For all explanatory variables except housing starts a restricted weighting scheme is chosen (w1=1). Bollerslev–Wooldridge standard errors are reported in parentheses, where asterisks indicate significance at the ***1%, **5%, and *10% level. LLH is the value of the maximized log‐likelihood function and BIC is the Bayesian information criterion. The variance ratio urn:x-wiley:jae:media:jae2742:jae2742-math-0233 is calculated on monthly aggregates. Estimates for μ are omitted.

Concerning the parameter estimates, it is interesting to observe that the GARCH‐MIDAS models with daily τt lead to lower estimates of β than models with weekly or monthly τt. While for the models with daily τt the estimates of α are close to zero, there is strong evidence for asymmetry (as indicated by the highly significant γ parameter). These parameter estimates imply that the deviations of the short‐term component from the long‐term component are more short lived for GARCH‐MIDAS models with daily τt. [Footnote: This behavior is also evident from Figure C.5 in the Supporting Information Appendix, which shows the evolution of the annualized long‐term components and the conditional volatilities.] The signs of the estimated θs for realized volatility, the VIX, and the macroeconomic variables are in line with findings in the previous literature. Higher levels of financial volatility tend to increase long‐term volatility, whereas an improvement in macroeconomic conditions decreases long‐term volatility. The finding that a higher variance risk premium and tighter financial conditions (i.e., an increase in the NFCI) predict higher volatility is new. While the positive relation between realized/expected measures of volatility and long‐term volatility might be viewed as “mechanical,” the NFCI as well as the macroeconomic variables can be considered fundamental drivers of financial volatility.

We gauge the importance of the variation in the long‐term component for the overall expected variation in return volatility by the variance ratio introduced in Equation 8. To facilitate comparison across models, we focus on the monthly variation of volatility. That is, for all models we denote the monthly aggregate volatility by urn:x-wiley:jae:media:jae2742:jae2742-math-0234. For models with monthly long‐term components, we have that urn:x-wiley:jae:media:jae2742:jae2742-math-0235. For models with daily or weekly long‐term components, urn:x-wiley:jae:media:jae2742:jae2742-math-0236 refers to monthly aggregates of the daily/weekly long‐term component. We then calculate urn:x-wiley:jae:media:jae2742:jae2742-math-0237, where X indicates that the variance ratio is based on a specific explanatory variable. As Table 4 shows, the models with daily τt achieve much higher variance ratios than the models with a weekly/monthly long‐term component. Among the models with daily long‐term components, the variance ratio of 76.14% for the VIX‐based model is by far the highest and implies that three quarters of the expected variation in return volatility can be traced back to variation in the VIX. In Section 4.4 we will investigate whether a high variance ratio necessarily translates into good OOS predictive performance.
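A minimal sketch of the variance‐ratio calculation on monthly aggregates follows. It assumes the ratio is the variance of the log long‐term component divided by the variance of the log of the total conditional variance (long‐term times short‐term component), expressed in percent. Equation 8 is not reproduced in this section, so the functional form and all names are assumptions.

```python
import math
import statistics


def variance_ratio(tau_monthly, g_monthly):
    """Share (in percent) of the variation in log total volatility due to log tau."""
    log_tau = [math.log(t) for t in tau_monthly]
    log_total = [math.log(t * g) for t, g in zip(tau_monthly, g_monthly)]
    return 100.0 * statistics.pvariance(log_tau) / statistics.pvariance(log_total)
```

If the short‐term component were constant, the ratio would equal 100 under this definition.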

4.2.2 Two explanatory variables

The GARCH‐MIDAS setting allows us to include two or more explanatory variables in the long‐term component. Based on the results in the previous section, the VIX appears to be better suited to capture daily movements in the long‐term component than RVol(22) or the VRP. Since the NFCI and, in particular, the macroeconomic variables capture lower frequency movements, it is natural to estimate GARCH‐MIDAS models with the VIX and one of those variables jointly in the long‐term component. This allows us to formally check whether the NFCI and the three macroeconomic variables contain information that is complementary to the VIX. The long‐term component for those models is given by
urn:x-wiley:jae:media:jae2742:jae2742-math-0238(24)
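Since the displayed equation is not legible in this version, a plausible reading of Equation 24 is sketched below. It assumes the usual log‐linear GARCH‐MIDAS long‐term component, Beta weights φ_k, a restricted scheme for the VIX (w1=1, consistent with the columns of Table 5), and a simplified time index that ignores the mixed frequencies of the two variables.

```latex
\log \tau_t = m
  + \theta^{VIX} \sum_{k=1}^{K_{VIX}} \varphi_k\bigl(1, w_2^{VIX}\bigr)\, VIX_{t-k}
  + \theta^{X} \sum_{k=1}^{K_X} \varphi_k\bigl(w_1^{X}, w_2^{X}\bigr)\, X_{t-k}
```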

Estimation results are presented in Table 5. Note that KVIX and KX are chosen as in Table 4. For all models the estimation period is now determined by the availability of the VIX. When controlling for the VIX, the θX parameter turns out to be significant for the NAI and housing starts. Thus macroeconomic variables appear to contain information that is complementary to that contained in the VIX. However, none of the models that include two variables achieves a higher VR than the model based on the VIX alone.

Table 5. Full‐sample estimation results: VIX combined with second explanatory variable
α β γ m θX urn:x-wiley:jae:media:jae2742:jae2742-math-0239 urn:x-wiley:jae:media:jae2742:jae2742-math-0240 θVIX urn:x-wiley:jae:media:jae2742:jae2742-math-0241 KX LLH BIC VR(VIX,X)
Daily τt
VIX 0.000 0.853*** 0.095*** −2.129*** 1.524*** 3.470** 3 −9,138 18,339 76.14
(0.010) (0.021) (0.015) (0.086) (0.067) (1.371)
Weekly τt
NFCI 0.000 0.852*** 0.099*** −1.993*** 0.118 1 2.252 1.451*** 3.617** 52 −9,110 18,300 75.84
(0.010) (0.020) (0.016) (0.143) (0.085) (4.152) (0.093) (1.518)
Monthly τt
NAI 0.000 0.870*** 0.092*** −2.032*** −0.108** 1 119.372 1.431*** 3.775** 36 −9,133 18,346 75.06
(0.009) (0.018) (0.015) (0.100) (0.046) (326.330) (0.079) (1.594)
Δ IP 0.000 0.876*** 0.084*** −2.133*** −0.043 1 8.960 1.528*** 3.806** 36 −9,139 18,357 75.91
(0.009) (0.018) (0.014) (0.096) (0.089) (34.803) (0.072) (1.520)
Δ Housing 0.000 0.863*** 0.097*** −2.035*** −0.061** 1.001 2.139 1.446*** 3.605** 36 −9,135 18,359 74.99
(0.009) (0.019) (0.015) (0.094) (0.024) (0.743) (2.462) (0.074) (1.503)
  • Note. Estimation results for GARCH‐MIDAS models are reported, in which the daily VIX is combined with the low‐frequency variables reported in Table 4, that is, the NFCI, NAI, and changes in industrial production and housing starts. The estimates are based on daily return data from 1990:M1 to 2018:M4. For comparison, the estimation results using only the VIX as a covariate from Table 4 are included in the first row. All parameters with a superscript X relate to the second explanatory variable. KVIX is always equal to 3. Bollerslev–Wooldridge standard errors are reported in parentheses, where asterisks indicate significance at the ***1%, **5%, and *10% level. LLH is the value of the maximized log‐likelihood function and BIC is the Bayesian information criterion. The variance ratio urn:x-wiley:jae:media:jae2742:jae2742-math-0242 is calculated on monthly aggregates. Estimates for μ are omitted.

4.2.3 More than two explanatory variables

As an extension to Section 4.2.2, one could employ more than two covariates. We experimented with combining three variables in the long‐term component but found no further improvements in terms of model fit. Moreover, GARCH‐MIDAS models including more than two variables in the long‐term component are difficult to estimate because the likelihood is relatively insensitive with respect to changes in the weighting parameters. Instead, in Section 4.4 on OOS forecasting, we will aggregate the information in the different variables by simply calculating the average forecast across all GARCH‐MIDAS models with one explanatory variable.

4.3 Real‐time estimates

In the following, we make use of vintage data. This allows for a realistic evaluation of the GARCH‐MIDAS models' ability to describe the behavior of long‐term financial volatility in real time. [Footnote: To the best of our knowledge, Lindblad (2017) appears to be the only other paper that makes use of real‐time data when estimating GARCH‐MIDAS models.] In order to compare full‐sample estimates of the long‐term component with corresponding real‐time estimates, we reestimate all GARCH‐MIDAS models from Table 4 on a daily basis. Estimation is performed on a rolling window. For each explanatory variable, the window size is determined by the length of the first estimation period ending in 2009:M12. The period 2010:M1 to 2018:M4 will be used as the OOS period for the forecast evaluation in Section 4.4. In order to ensure that our estimates of the long‐term component are feasible in real time, we employ vintage data that are available for the NFCI, the NAI, IP, and housing starts from the ALFRED database hosted by the St. Louis Fed. [Footnote: For more details on real‐time data availability see Supporting Information Appendix F.] When using real‐time data, the long‐term component no longer changes its value at the beginning of a week/month but whenever a new data release becomes available.
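The real‐time logic can be illustrated with a small as‐of lookup: on a given forecast date, only values whose release date has passed may enter the long‐term component. The record layout, dates, and values are hypothetical and do not reflect the ALFRED data format.

```python
from datetime import date


def latest_available(releases, as_of):
    """Return the most recent vintage of the latest reference month known on `as_of`.

    `releases` is a list of (release_date, reference_month, value) tuples, with
    reference_month given as an ISO-style string such as "2010-01".
    """
    known = [r for r in releases if r[0] <= as_of]
    if not known:
        return None
    latest_ref = max(r[1] for r in known)
    # Among all vintages for that reference month, use the newest one.
    return max((r for r in known if r[1] == latest_ref), key=lambda r: r[0])


# Hypothetical vintages: the January value is first released in February
# and revised in March, when the February value is also published.
example_releases = [
    (date(2010, 2, 17), "2010-01", 0.5),
    (date(2010, 3, 16), "2010-02", -0.2),
    (date(2010, 3, 16), "2010-01", 0.6),
]
```

With such a lookup, the input to the long‐term component changes on release dates, not at calendar month boundaries.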

Figure 6 shows the estimated long‐term components based on the full‐sample estimates (as reported in Table 4, dotted lines) and based on the rolling window real‐time estimates (solid lines). For RVol(22), VIX, and VRP the long‐term component estimates in the full sample might differ from the rolling window estimates, because they are based on distinct sample periods (rolling window vs. full sample). For the NFCI, NAI, IP, and housing starts, the two long‐term components are not only based on distinct sample periods but also on different data vintages (real‐time vs. final). Figure 6 shows that for RVol(22), VIX, and VRP the rolling window estimate of the long‐term component is often somewhat higher than the full‐sample estimate. For the macroeconomic variables the real‐time estimates of the long‐term component are occasionally below or above the full‐sample estimates. However, the average absolute differences are quite sizable. For example, the average absolute difference between the full‐sample and real‐time estimates based on industrial production is 6.80%. To put this into context, for industrial production the mean absolute revision from the initial release to the latest available data was 2.18% during the 1965:Q3 to 2006:Q4 period (see Croushore, 2011). Among the variables considered in Croushore (2011), this is the highest value (even higher than for GDP). Similar numbers in terms of changes in the long‐term component are obtained for the other variables: 9.35% for housing starts, 4.78% for the NAI, and 2.68% for the NFCI. In summary, these figures highlight the importance of using real‐time instead of final data releases for the macroeconomic variables for a realistic forecast evaluation.

Figure 6. Comparison of rolling window and full‐sample long‐term components. For each explanatory variable, the monthly averaged long‐term volatility components, urn:x-wiley:jae:media:jae2742:jae2742-math-0243, are depicted for the period 2010:M1 to the end of 2018:M1, the last month of issuing forecasts and hence real‐time estimation. The long‐term component obtained from the full‐sample estimates is given in green (dotted). Real‐time estimates of the most recently fitted urn:x-wiley:jae:media:jae2742:jae2742-math-0244 are depicted in red (solid). Volatilities are presented on an annualized scale.

4.4 Forecast evaluation

Finally, we evaluate the predictive performance of the GARCH‐MIDAS models in the 2010:M1 to 2018:M4 OOS period. As before, we consider cumulative volatility forecasts for horizons up to 3 months. When computing the forecasts, we keep the long‐term component fixed at its current level. Volatility forecasts are based on the real‐time rolling window parameter estimates as obtained in Section 4.3 (i.e., we apply a “rolling (forecasting) scheme”).
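A cumulative forecast with the long‐term component held fixed can be sketched as follows. The sketch assumes a unit‐variance GJR‐GARCH(1,1) short‐term component whose expected value reverts geometrically to one at rate α+β+γ/2 (which presumes symmetric innovations). Function and argument names are illustrative.

```python
def cumulative_variance_forecast(tau, g_next, alpha, beta, gamma, horizon_days):
    """Sum of daily variance forecasts over `horizon_days`, with tau held fixed.

    `g_next` is the one-step-ahead short-term component, known at the forecast origin.
    """
    persistence = alpha + beta + gamma / 2.0
    total_g = 0.0
    for s in range(1, horizon_days + 1):
        # Expected short-term component s days ahead reverts towards its mean of one.
        total_g += 1.0 + persistence ** (s - 1) * (g_next - 1.0)
    return tau * total_g
```

At long horizons the forecast is dominated by the level of the long‐term component, so a misjudged long‐term component affects all horizons.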

4.4.1 Competitor models

For forecast comparison, we use an extensive range of competitor models which are either extensions of the simple GARCH specification or which model the realized variance directly.

First, we consider the simple one‐component GARCH(1,1) model and a no‐change (or random‐walk) forecast, which simply scales the realized variance on the last day of period t to the appropriate horizon: h1:k,t+s|t=k·RVn,t. Second, we use the MS‐GARCH‐TVI model that we employed in Section 3.3. The only difference is that we now use a GJR‐GARCH specification in both regimes. In addition, we use an MS‐GARCH model that consists of two GARCH equations with individual intercepts and individual ARCH and GARCH parameters. We incorporate asymmetric effects in the low‐volatility regime only. [Footnote: Initially, we estimated a GJR‐GARCH specification in both regimes. However, it turned out that the asymmetry term was only significant in the component that represents the low‐volatility regime. In addition, we select this specification because it is much more stable in the rolling window estimation than the one with two GJR‐GARCH regimes.] We refer to this model as MS‐GARCH with time‐varying coefficients (MS‐GARCH‐TVC). Further, we use the HEAVY model by Shephard and Sheppard (2010) and the realized GARCH model by Hansen et al. (2012). The specifications of the HEAVY and realized GARCH models employ a measure of pure intraday realized variance, urn:x-wiley:jae:media:jae2742:jae2742-math-0245 (defined as the sum of squared intraday returns). Third, we consider two specifications that directly model the realized variance, RVi,t (including squared overnight returns), and allow us to compute direct (as compared to iterated) volatility forecasts. We employ the HAR model of Corsi (2009) and the HAR model with leverage effect proposed in Corsi and Reno (2012).
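Two of the benchmark forecasts are simple enough to sketch directly. The no‐change forecast follows the formula above. The HAR part assumes the standard daily/weekly/monthly regressors (averages over 1, 5, and 22 days) with coefficients supplied by the user. The exact specifications used in the paper are described in its Supporting Information, and all names here are illustrative.

```python
def no_change_forecast(rv_last_day, horizon_days):
    """Scale the last observed daily realized variance to a k-day horizon."""
    return horizon_days * rv_last_day


def har_regressors(rv, t):
    """Daily, weekly (5-day) and monthly (22-day) average RV up to and including index t."""
    if t < 21:
        raise ValueError("need at least 22 observations up to index t")
    daily = rv[t]
    weekly = sum(rv[t - 4:t + 1]) / 5
    monthly = sum(rv[t - 21:t + 1]) / 22
    return daily, weekly, monthly


def har_forecast(coefs, rv, t):
    """Linear HAR prediction from (intercept, daily, weekly, monthly) coefficients."""
    b0, b_d, b_w, b_m = coefs
    d, w, m = har_regressors(rv, t)
    return b0 + b_d * d + b_w * w + b_m * m
```

In a direct forecasting setup, the coefficients would be estimated separately for each horizon by regressing the k‐day cumulative realized variance on these regressors.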

For more details on the specification of the competitor models, their estimation, and volatility forecasting see Supporting Information Appendix G. [Footnote: Table B.3 in the Supporting Information Appendix shows the full‐sample parameter estimates for the competitor models.] For the OOS forecast evaluation all competitor models are reestimated on a rolling window basis.

4.4.2 Forecast error statistics and model confidence set

As in Section 3.3.2, we base the comparison of the forecast performance of the different models on the QLIKE loss. Table 6 reports the average QLIKE loss for each model and forecast horizons of 1 day (1d), 2 weeks (2w), 1 month (1m), 2 months (2m), and 3 months (3m). We use the MCS approach to test whether there are one or several models that significantly outperform the others. As in Section 3.3.2, we rely on 90% model confidence sets. [Footnote: As a robustness check, we present the corresponding results for a 95% MCS in Supporting Information Appendix H. Essentially all findings remain unaffected.]
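For orientation, a commonly used normalized form of the QLIKE loss is sketched below, together with the equal‐weighted forecast average used for the "Avg." row. The paper's own definition is given in Section 3.3.2 and may differ by a constant or scaling, so the formula should be read as an assumption.

```python
import math


def qlike(realized, forecast):
    """Normalized QLIKE: zero for a perfect forecast, positive otherwise."""
    ratio = realized / forecast
    return ratio - math.log(ratio) - 1.0


def average_qlike(realized_series, forecast_series):
    """Average QLIKE loss over an evaluation sample."""
    losses = [qlike(rv, h) for rv, h in zip(realized_series, forecast_series)]
    return sum(losses) / len(losses)


def average_forecast(forecasts_by_model):
    """Equal-weighted mean forecast across models, period by period."""
    per_period = zip(*forecasts_by_model.values())
    return [sum(values) / len(values) for values in per_period]
```

The loss is asymmetric: under this form, underpredicting variance is penalized more heavily than overpredicting it by the same factor.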

Table 6. QLIKE losses and model confidence sets: full out‐of‐sample period
1d 2w 1m 2m 3m
GARCH‐MIDAS
RVol(22) 0.306 0.246 0.271 0.387 0.428
VIX 0.275 0.215 0.240 0.359 0.414
VRP 0.291 0.227 0.260 0.384 0.430
NFCI 0.324 0.248 0.264 0.363 0.393
NAI 0.343 0.266 0.283 0.391 0.424
Δ IP 0.345 0.267 0.285 0.395 0.438
Δ Housing 0.328 0.252 0.264 0.347 0.380
VIX and NFCI 0.274 0.213 0.236 0.349 0.399
VIX and NAI 0.275 0.215 0.241 0.358 0.409
VIX and Δ IP 0.274 0.214 0.239 0.355 0.409
VIX and Δ Housing 0.275 0.218 0.243 0.351 0.405
Avg. 0.317 0.246 0.264 0.364 0.400
Competitor models
GARCH 0.342 0.263 0.282 0.395 0.434
MS‐GARCH‐TVI 0.362 0.292 0.315 0.426 0.488
MS‐GARCH‐TVC 0.355 0.271 0.283 0.387 0.421
Real GARCH 0.260 0.206 0.233 0.356 0.390
HEAVY 0.277 0.238 0.299 0.539 0.662
HAR 0.254 0.210 0.243 0.368 0.419
HAR (lev.) 0.238 0.207 0.245 0.371 0.419
No‐change 0.358 0.498 0.636 1.157 1.292
  • Note. Numbers reported are the average out‐of‐sample QLIKE losses for each model for 1‐day‐ (1d), 2‐week‐ (2w), 1‐month‐ (1m), 2‐month‐ (2m), and 3‐month‐ahead (3m) variance forecasts. Bold entries indicate the model with the lowest average QLIKE loss per horizon. Shaded entries indicate that the respective model is included in the 90% model confidence set. The average forecast (avg.) is the mean forecast across all GARCH‐MIDAS models employing one explanatory variable. The out‐of‐sample evaluation period runs from 2010:M1 to 2018:M4.

MCS for full OOS period. Shaded areas in Table 6 indicate that for the corresponding forecast horizon the respective model is included in the final set, urn:x-wiley:jae:media:jae2742:jae2742-math-0246. For example, for a forecast horizon of 1 day the only model that is included in the final MCS is the HAR model with leverage. Thus, at the very short horizon of 1 day, the HAR with leverage dominates all other models. At forecast horizons of 2 weeks the MCS includes both HAR models, the realized GARCH, and GARCH‐MIDAS specifications that either include the VIX alone or in combination with the NFCI/NAI/IP. At the 1‐month horizon only the realized GARCH and the GARCH‐MIDAS that combines the VIX and the NFCI are included. The picture changes at horizons of 2 and 3 months. At these horizons GARCH‐MIDAS models that either combine the VIX with the NFCI/housing starts or models based on housing starts alone are included in the MCS. These results illustrate that the performance of a GARCH‐MIDAS model strongly depends on choosing the best horizon‐specific explanatory variable. In summary, the HAR model with leverage and the realized GARCH achieve the lowest QLIKE at forecast horizons of 1 day and 2 weeks/1 month, respectively. In contrast, the GARCH‐MIDAS model based on housing starts performs best at horizons of 2 and 3 months ahead (see the bold entries).

MCS for volatility regimes. In addition to the results for the full OOS period, we also provide MCS for subsamples of low, normal, and high volatility. We define these regimes in the same way as outlined in Section 3.3. Quantiles are now computed based on the empirical distribution of full‐sample realized variances. In total, we have 764 observations in the low, 961 in the normal, and 304 in the high regime. Table 7 presents the regime‐specific analysis.

Table 7. QLIKE losses and model confidence sets: low/normal/high‐volatility regimes
Low‐volatility regime Normal‐volatility regime High‐volatility regime
1d 2w 1m 2m 3m 1d 2w 1m 2m 3m 1d 2w 1m 2m 3m
GARCH‐MIDAS
RVol(22) 0.364 0.264 0.305 0.399 0.409 0.271 0.232 0.260 0.400 0.463 0.273 0.241 0.217 0.313 0.365
VIX 0.332 0.210 0.250 0.367 0.354 0.233 0.204 0.231 0.355 0.454 0.262 0.259 0.243 0.347 0.437
VRP 0.349 0.237 0.288 0.405 0.424 0.252 0.215 0.245 0.375 0.440 0.266 0.238 0.237 0.360 0.414
NFCI 0.400 0.274 0.304 0.389 0.402 0.272 0.228 0.248 0.364 0.417 0.292 0.244 0.217 0.293 0.297
NAI 0.438 0.308 0.338 0.432 0.460 0.284 0.240 0.260 0.384 0.427 0.292 0.241 0.216 0.309 0.322
Δ IP 0.441 0.313 0.343 0.437 0.468 0.287 0.241 0.262 0.389 0.447 0.288 0.236 0.213 0.310 0.335
Δ Housing 0.402 0.277 0.300 0.386 0.406 0.279 0.234 0.249 0.331 0.385 0.295 0.249 0.219 0.300 0.298
VIX and NFCI 0.331 0.212 0.250 0.364 0.351 0.235 0.203 0.228 0.345 0.440 0.254 0.249 0.229 0.321 0.391
VIX and Δ IP 0.332 0.211 0.251 0.363 0.351 0.234 0.204 0.230 0.354 0.450 0.257 0.253 0.237 0.338 0.424
VIX and NAI 0.333 0.212 0.254 0.367 0.354 0.234 0.205 0.231 0.358 0.449 0.259 0.254 0.237 0.334 0.417
VIX and Δ Housing 0.330 0.213 0.254 0.369 0.359 0.237 0.209 0.234 0.343 0.439 0.260 0.261 0.242 0.333 0.409
Avg. 0.396 0.273 0.303 0.391 0.403 0.269 0.226 0.247 0.362 0.418 0.272 0.240 0.217 0.306 0.335
Competitor models
GARCH 0.430 0.296 0.325 0.419 0.452 0.285 0.241 0.263 0.394 0.441 0.300 0.252 0.232 0.340 0.370
MS‐GARCH‐TVI 0.468 0.338 0.370 0.452 0.519 0.303 0.270 0.298 0.429 0.488 0.286 0.246 0.233 0.348 0.405
MS‐GARCH‐TVC 0.461 0.318 0.335 0.414 0.437 0.290 0.245 0.263 0.386 0.432 0.295 0.239 0.218 0.324 0.350
Real GARCH 0.237 0.182 0.239 0.380 0.409 0.256 0.208 0.229 0.358 0.408 0.331 0.261 0.229 0.287 0.289
HEAVY 0.272 0.223 0.326 0.591 0.759 0.262 0.228 0.273 0.498 0.593 0.339 0.305 0.313 0.535 0.642
HAR 0.234 0.189 0.254 0.359 0.385 0.243 0.212 0.238 0.374 0.430 0.340 0.257 0.230 0.371 0.470
HAR (lev.) 0.226 0.187 0.258 0.362 0.387 0.232 0.211 0.240 0.378 0.429 0.286 0.245 0.227 0.373 0.470
No‐change 0.418 0.821 1.143 2.213 2.310 0.304 0.297 0.336 0.532 0.715 0.382 0.320 0.314 0.481 0.555
  • Note. Numbers reported are the average out‐of‐sample QLIKE losses for each model for 1‐day‐ (1d), 2‐week‐ (2w), 1‐month‐ (1m), 2‐month‐ (2m), and 3‐month‐ahead (3m) variance forecasts across three different volatility regimes; forecasts are issued on a day for which the daily realized volatility is below the empirical 25% quantile (low regime), between the 25% and 75% quantiles (normal regime), or above the 75% quantile (high regime). Bold entries indicate the model with the lowest average QLIKE loss per regime and horizon. Shaded entries indicate that the respective model is included in the 90% model confidence set. The average forecast (Avg.) is the mean forecast across all GARCH‐MIDAS models employing one explanatory variable. The out‐of‐sample evaluation period runs from 2010:M1 to 2018:M4.

Interestingly, in the low‐volatility regime the realized GARCH and the two HAR models are the only models in the MCS for short horizons of 1 day and 2 weeks. For a forecast horizon of 1 month, various GARCH‐MIDAS models are included in the MCS. For 3 months ahead, two GARCH‐MIDAS specifications based on the VIX are the only models in the MCS. The results for the normal‐volatility regime are even more in favor of the GARCH‐MIDAS models. At essentially all horizons GARCH‐MIDAS models based on the VIX are included in the MCS. As for the full OOS period, GARCH‐MIDAS based on housing starts is the only model in the 3‐month MCS. Finally, in the high‐volatility regime and for horizons of 2 weeks and 1 month, essentially all models are included in the MCS. This result may be driven by the fact that the intermediate‐term forecast performance of all models substantially deteriorates during the high‐volatility regime and, therefore, it becomes increasingly difficult to distinguish between models. Nevertheless, even in the high‐volatility regime the GARCH‐MIDAS models are very competitive for longer forecast horizons. Specifically, GARCH‐MIDAS models based on the NFCI and housing starts are included in the MCS.

In summary, we find that the informative content of the explanatory variables depends on the volatility regime. While in low‐ and normal‐volatility regimes GARCH‐MIDAS models based on the VIX or VIX combined with another variable perform well, in high‐volatility regimes models purely based on macroeconomic variables are very competitive. Because recessions typically coincide with regimes of high volatility, our results are consistent with the finding from the previous literature that macroeconomic variables are particularly useful to predict financial volatility during the onset of recessions (see, e.g., Paye, 2012). At the longest forecast horizons, housing starts and the NFCI become increasingly important. Among the competitor models it is again the realized GARCH which performs very well across volatility regimes.

4.4.3 MZ regressions

Lastly, we consider the outcome of MZ regressions. As Table 8 shows, for forecast horizons of 1 day and 2 weeks the highest R2 is achieved by GARCH‐MIDAS‐type models. This is in sharp contrast to the results from the previous section. However, for longer forecast horizons (1m–3m) the winning models according to the R2 are exactly the same as when using the MCS approach. Thus, at forecast horizons at which the correct modeling of the long‐term component pays off, the R2 selects the same model as the MCS. Again, the last three columns of Table 8 show that the highest R2s are obtained in the high‐volatility regime.
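For readers who want to see the mechanics, a Mincer–Zarnowitz regression can be sketched in Python as an ordinary least squares fit of realized variance on a constant and the forecast, with the R2 of that fit as the evaluation measure. The sketch assumes a regression in levels; the function name `mz_regression` and the example numbers are hypothetical and only serve the illustration.

```python
def mz_regression(realized, forecasts):
    """OLS fit of realized = a + b * forecast + error.

    Returns (intercept, slope, r_squared).
    """
    n = len(realized)
    if n != len(forecasts) or n < 2:
        raise ValueError("need two equally long series with n >= 2")
    mean_y = sum(realized) / n
    mean_x = sum(forecasts) / n
    sxx = sum((x - mean_x) ** 2 for x in forecasts)
    syy = sum((y - mean_y) ** 2 for y in realized)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(forecasts, realized))
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical realized variances and forecasts, for illustration only.
realized = [1.1, 0.9, 2.4, 1.6, 3.0]
forecasts = [1.0, 1.0, 2.0, 1.5, 2.5]
print(mz_regression(realized, forecasts))
```

Note that the R2 is unchanged when the forecast is rescaled or shifted by a constant, so it measures how well a forecast tracks realized variance rather than how close it is in level; this is one reason to read it alongside a loss such as QLIKE.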

Table 8. Mincer–Zarnowitz R2
Panel A: Full out‐of‐sample period | Panel B: Volatility regimes (Low, Normal, High)
1d 2w 1m 2m 3m | 1m 1m 1m
GARCH‐MIDAS
RVol(22) 0.312 0.367 0.340 0.086 0.008 0.037 0.061 0.314
VIX 0.347 0.346 0.321 0.145 0.047 0.071 0.099 0.297
VRP 0.343 0.404 0.354 0.128 0.030 0.041 0.083 0.324
NFCI 0.295 0.375 0.354 0.146 0.062 0.030 0.073 0.341
NAI 0.294 0.373 0.352 0.143 0.062 0.025 0.071 0.339
Δ IP 0.296 0.374 0.348 0.124 0.029 0.017 0.065 0.341
Δ Housing 0.293 0.372 0.355 0.168 0.102 0.031 0.077 0.334
VIX and NFCI 0.353 0.359 0.333 0.147 0.050 0.072 0.100 0.302
VIX and NAI 0.348 0.349 0.323 0.146 0.048 0.067 0.099 0.297
VIX and Δ IP 0.348 0.347 0.321 0.145 0.047 0.070 0.099 0.296
VIX and Δ Housing 0.347 0.346 0.321 0.153 0.056 0.064 0.099 0.295
Avg. 0.322 0.380 0.357 0.149 0.057 0.036 0.078 0.341
Competitor models
GARCH 0.288 0.373 0.353 0.138 0.051 0.027 0.068 0.343
MS‐GARCH‐TVI 0.316 0.357 0.288 0.118 0.016 0.005 0.015 0.339
MS‐GARCH‐TVC 0.311 0.390 0.368 0.142 0.052 0.030 0.066 0.374
Real GARCH 0.318 0.394 0.377 0.146 0.070 0.076 0.112 0.303
HEAVY 0.297 0.322 0.272 0.061 0.004 0.028 0.084 0.173
HAR 0.312 0.394 0.374 0.125 0.052 0.058 0.087 0.315
HAR (lev.) 0.342 0.392 0.366 0.122 0.053 0.056 0.088 0.303
No‐change 0.254 0.227 0.189 0.060 0.020 0.046 0.044 0.088
  • Note. We report coefficients of determination derived from MZ regressions. Bold entries indicate the models with the highest R2 for a specific forecast horizon. The last three columns correspond to the forecast evaluation divided into three volatility regimes; forecasts are issued on a day for which the daily realized volatility is below the empirical 25% quantile (low regime), between the 25% and 75% quantiles (normal regime), or above the 75% quantile (high regime). The out‐of‐sample evaluation period runs from 2010:M1 to 2018:M4.

5 CONCLUSION

We introduce and discuss the properties of a class of multiplicative volatility models. This class of models includes the GARCH‐MIDAS but also a variant of the MS‐GARCH. We show that multiplicative volatility models can generate an autocorrelation structure in the conditional variance that mimics the long‐memory‐type behavior that is often observed for realized variances. We also argue that the R2 of an MZ regression can be a misleading measure of forecast accuracy across volatility regimes because the R2 will be highest in the regime with the highest squared error loss. In a Monte Carlo simulation, we investigate the properties of the QMLE of the GARCH‐MIDAS model and show that the estimator is unbiased and that the Wang and Ghysels (2015) asymptotic standard errors are valid in the presence of exogenous explanatory variables. We also show that forecast performance is relatively insensitive to moderate misspecification of the explanatory variable and of the lag length.

In an empirical application to S&P 500 stock returns, we compare the forecast performance of the GARCH‐MIDAS model with a wide range of competitor models. As expected, relative forecast performance depends on the forecast horizon. Among all models, the HAR with leverage performs best at a one‐day horizon. For longer forecast horizons the realized GARCH is very competitive and performs best at forecast horizons of 2 weeks and 1 month. The performance of GARCH‐MIDAS models depends on the choice of explanatory variable. The best GARCH‐MIDAS specifications generate volatility forecasts that are comparable to or improve upon the forecasts from the realized GARCH. Specifically, GARCH‐MIDAS specifications that combine the VIX with the NFCI are included in the MCS for forecast horizons of 2 weeks up to 2 months. Most importantly, the GARCH‐MIDAS based on housing starts achieves the lowest QLIKE at forecast horizons of 2 and 3 months ahead. Thus our results are useful for selecting the appropriate horizon‐specific explanatory variable and suggest that models based on low‐frequency information can be more useful than models that exploit high‐frequency intraday data.

ACKNOWLEDGMENTS

We would like to thank the co‐editor, Eric Ghysels, and two anonymous referees for comments, which greatly improved our paper. We also thank Richard Baillie, Matei Demetrescu, Markus Haas, Karin Stürmer, Timo Teräsvirta, and Peter Winker for helpful comments and suggestions. A preliminary version of this paper was circulated under the title “On the statistical properties of multiplicative GARCH models.”

    OPEN RESEARCH BADGES

    This article has earned an Open Data Badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at [http://qed.econ.queensu.ca/jae/datasets/conrad002/].

    • 1 The packages are available at: https://cran.r-project.org/package=alfred and https://cran.r-project.org/package=mfGARCH
    • 2 It would be straightforward to allow for richer dynamics in the conditional mean. However, for daily return data a constant conditional mean is usually sufficient. For simplicity, in the following we refer to εi,t as the (demeaned) return.
    • 3 While we focus on multiplicative GARCH models, Han and Park (2014) and Han (2015) analyze the properties of a GARCH‐X specification with an explanatory variable that enters additively into the conditional variance equation. See also Francq and Thieu (2019).
    • 4 Han (2015) obtains a similar result for the sample kurtosis of the returns from a GARCH‐X model with a covariate that can either be stationary or nonstationary.
    • 5 Note that urn:x-wiley:jae:media:jae2742:jae2742-math-0049 reduces to the ACF of a (symmetric) GARCH(1,1) when γ=0 (see Karanasos, 1999).
    • 6 Again, Han (2015) also obtains a bicomponent structure for the sample ACF of the squared returns from a GARCH‐X model with a fractionally integrated covariate. Similarly, Han and Kristensen (2015) show that the empirical ACF in a multiplicative model can display long‐memory‐type behavior.
    • 7 Haas et al. (2004) consider a symmetric GARCH. Hence the persistence in the GARCH component is α+β.
    • 8 The underlying data will be described in detail in Section 4.1.
    • 9 To illustrate the severeness of the noise, consider an example with urn:x-wiley:jae:media:jae2742:jae2742-math-0077. Then urn:x-wiley:jae:media:jae2742:jae2742-math-0078 will either over‐ or underestimate the true urn:x-wiley:jae:media:jae2742:jae2742-math-0079 by more than 50% with a probability of about 74%.
    • 10 See Andersen, Bollerslev, and Meddahi (2005) for a model‐free adjustment procedure for the predictive R2.
    • 11 Although by assumption k ≤ It in our setting, we can think of, for example, a semiannual period and daily volatility forecasts. In this case k can be at most 132 (=6·22). For such a large k and under reasonable assumptions on the GARCH parameters, we have urn:x-wiley:jae:media:jae2742:jae2742-math-0102.
    • 12 Alternatively, we simulated the intraday returns using a stochastic volatility model that is consistent with our GARCH‐MIDAS setting. The corresponding results, which are very similar to those based on the specification in Equation 21, are presented in Supporting Information Appendix E.
    • 13 Figure C.1 in the Supporting Information Appendix compares the histogram of the standardized parameter estimates over the 2,000 Monte Carlo replications with a standard normal distribution. The figure shows that for all parameters except w2 the empirical distribution of the parameter estimates is very well approximated by the normal distribution.
    • 13 We do not report results for K being chosen too large as the Beta weighting scheme is flexible enough to downweight uninformative lags to almost zero.
    • 14 In contrast, in the empirical forecast evaluation in Section 4.4 we apply a “rolling scheme.” As we will discuss below, this is important because it takes into account the real‐time nature of the data and allows for changes in the model parameters.
    • 15 In‐sample parameter estimates for the MS‐GARCH‐TVI model can be found in the Supporting Information Appendix, Table B.1. The median estimates of α and β are close to the true values. The estimates of ω1 and ω2 represent a low‐ and a high‐volatility regime. As measured by ϖ=p1,1+p2,2−1, the degree of persistence in the long‐term component is very high.
    • 16 For implementing the MCS procedure, we use the R package rugarch (Ghalanos, 2018), which includes the implementation used in the MFE Matlab Toolbox by Kevin Sheppard. See https://www.kevinsheppard.com/MFE_Toolbox.
    • 16 Note that the conventional definition of the variance risk premium is the squared VIX minus realized variance. We are interested in expressing the quantity in volatility units. Because the realized VRP takes positive as well as negative values, we take the square root of both quantities before we take the difference.
    • 17 Table B.2 in the Supporting Information Appendix provides summary statistics for the stock returns and the seven explanatory variables. Figure C.2 in the Supporting Information Appendix shows the evolution of the corresponding time series. Further details on the data set are provided in Supporting Information Appendix F.
    • 17 For all variables, Figure C.3 in the Supporting Information Appendix shows the estimated weighting schemes for selected choices of K. The figure illustrates that the estimated weighting schemes no longer change once the selected lag length is sufficiently large. In all cases, our choice of the lag length is rather conservative.
    • 18 Figure C.4 in the Supporting Information Appendix shows the estimated weighting schemes.
    • 18 This behavior is also evident from Figure C.5 in the Supporting Information Appendix, which shows the evolution of the annualized long‐term components and the conditional volatilities.
    • 18 To the best of our knowledge, Lindblad (2017) appears to be the only other paper that makes use of real‐time data when estimating GARCH‐MIDAS models.
    • 19 For more details on real‐time data availability see Supporting Information Appendix F.
    • 20 Initially, we estimated a GJR‐GARCH specification in both regimes. However, it turned out that the asymmetry term was only significant in the component that represents the low‐volatility regime. In addition, we select this specification because it is much more stable in the rolling window estimation than the one with two GJR‐GARCH regimes.
    • 21 Table B.3 in the Supporting Information Appendix shows the full‐sample parameter estimates for the competitor models.
    • 22 As a robustness check, we present the corresponding results for a 95% MCS in Supporting Information Appendix H. Essentially all findings remain unaffected.
    • 21 For brevity, we now focus on a forecast horizon of 1 month.
