2.1. Assumed Data Structure
Our analysis will be based on a sequence of daily financial returns:
r1, r2, …, rT,
and a corresponding sequence of daily realised measures:
RM1, RM2, …, RMT.
Realised measures are theoretically sound, nonparametric estimators, based on high-frequency data, of the variation of the price path of an asset during the times at which the asset trades frequently on an exchange. Realised measures ignore the variation of prices overnight and sometimes the variation in the first few minutes of the trading day, when recorded prices may contain large errors. The background to realised measures can be found in the survey articles by Andersen et al. (2009) and Barndorff-Nielsen and Shephard (2007).
The simplest realised measure is realised variance:
RVt = Σj x²j,t,  xj,t = Xtj,t − Xtj−1,t,
where X is the log price and the tj,t are the normalised times of trades or quotes (or a subset of them) on the tth day. The theoretical justification of this measure is that if prices are observed without noise then, as maxj|tj,t − tj−1,t|↓0, it consistently estimates the quadratic variation of the price process on the tth day. It was formalised econometrically by Andersen et al. (2001a) and Barndorff-Nielsen and Shephard (2002). In practice market microstructure noise plays an important part, and the above authors use 1- to 5-minute return data, or a subset of trades or quotes (e.g. every 15th trade), to mitigate the effect of the noise. Hansen and Lunde (2006) systematically study the impact of noise on realised variance. If a subset of the data is used to compute the realised variance, then it is possible to average across many such estimators, each using a different subset. This is called subsampling. When we report RV estimators we always subsample them to the maximum degree possible from the data, as this averaging is always theoretically beneficial, especially in the presence of modest amounts of noise.
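To fix ideas, here is a minimal Python sketch of the subsampled realised variance. The function names, the 5-minute default grid and the last-tick sampling scheme are our choices for illustration, not a prescription of the paper:

```python
import numpy as np

def realised_variance(log_prices, times, grid):
    """RV from log prices sampled at the grid times by last-tick interpolation."""
    idx = np.searchsorted(times, grid, side="right") - 1
    idx = idx[idx >= 0]                    # drop grid points before the first trade
    sampled = log_prices[idx]
    return np.sum(np.diff(sampled) ** 2)   # sum of squared intraday returns

def subsampled_rv(log_prices, times, interval=300.0, n_offsets=30):
    """Average RV over shifted sampling grids: the subsampling estimator."""
    grid = np.arange(times[0], times[-1], interval)
    offsets = np.linspace(0.0, interval, n_offsets, endpoint=False)
    return np.mean([realised_variance(log_prices, times, grid + o) for o in offsets])
```

Averaging over the shifted grids is the subsampling step described above: each offset grid gives a consistent estimator under noise-free prices, so averaging them can only help.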
Three classes of estimators which are somewhat robust to noise have been suggested in the literature: pre-averaging (Jacod et al., 2009), multiscale (Zhang, 2006; Zhang et al., 2005) and realised kernel (Barndorff-Nielsen et al., 2008).2 Here we focus on the realised kernel in the case where we use a Parzen weight function. It has the familiar form of a HAC type estimator (except that there is no adjustment for the mean and the sums are not scaled by their sample size):
Kt = Σ{h=−H..H} k(h/(H + 1))γh,  γh = Σj xj,t xj−|h|,t,
where k(x) is the Parzen kernel function:
k(x) = 1 − 6x² + 6x³ for 0 ≤ x ≤ 1/2;  k(x) = 2(1 − x)³ for 1/2 < x ≤ 1;  k(x) = 0 for x > 1.
It is necessary for H to increase with the sample size in order to consistently estimate the increments of quadratic variation in the presence of noise. We follow precisely the bandwidth choice of H spelt out in Barndorff-Nielsen et al. (2009a), to which we refer the reader for details. This realised kernel is guaranteed to be non-negative, which is quite important as some of our time series methods rely on this property.3
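For concreteness, a sketch of this estimator in Python. The bandwidth rule of Barndorff-Nielsen et al. (2009a) and their end-point treatment are not reproduced here, so H is simply an input:

```python
import numpy as np

def parzen(x):
    """Parzen weight function k(x)."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def realised_kernel(intraday_returns, H):
    """Parzen realised kernel: gamma_0 + 2 * sum_h k(h/(H+1)) * gamma_h."""
    x = np.asarray(intraday_returns, dtype=float)
    n = len(x)
    gamma = lambda h: np.sum(x[h:] * x[: n - h])   # h-th realised autocovariance
    return gamma(0) + 2.0 * sum(parzen(h / (H + 1.0)) * gamma(h)
                                for h in range(1, H + 1))
```

Because the Parzen weight function is positive semi-definite, the resulting estimate is non-negative, the property stressed above.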
We will write a sequence of daily returns as r1, r2, …, rT, while we will use F^LF_t = σ(rt, rt−1, …) to denote the low-frequency past data. A benchmark model for time-varying volatility is the GARCH model of Engle (1982) and Bollerslev (1986), where we assume that
var(rt | F^LF_{t−1}) = ht = ωG + αGr²t−1 + βGht−1.
This can be extended in many directions, for example allowing for statistical leverage. The persistence of this model, αG + βG, can be seen through the representation
r²t = ωG + (αG + βG)r²t−1 + vt − βGvt−1,  vt = r²t − ht,
since vt is a martingale difference with respect to F^LF_{t−1}.
Our focus is on additionally using some daily realised measures. The models we will analyse will be called ‘HEAVY models’ (High-frEquency-bAsed VolatilitY models) and are made up of the system
var(rt | F^HF_{t−1}) = ht,  E(RMt | F^HF_{t−1}) = µt,
where F^HF_t is used to denote the past of rt and RMt, that is, the high-frequency dataset. The most basic example of this is the linear model
ht = ω + αRMt−1 + βht−1,   (3)
µt = ωR + αRRMt−1 + βRµt−1.   (4)
These semiparametric models could be extended to include on the right-hand side of both equations the lagged squared return r²t−1 (see the discussion above (5) in a moment), but we will see that these variables typically test out. Hence it is useful to focus directly on the above model.4 Other possible extensions include adding a more complicated dynamic to (4), such as a component structure with short- and long-term components, a fractional model, allowing for statistical leverage type effects, or a Corsi (2009) type approximate long-memory model.
Note that (3) models the close-to-close conditional variance, while (4) models the conditional expectation of the open-to-close variation.
It will be convenient to have labels for the two equations in the HEAVY model. We call (3) the HEAVY-r model and (4) the HEAVY-RM model. Econometrically it is important to note that GARCH and HEAVY models are non-nested.
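In code the linear HEAVY recursions (3)–(4) are only a few lines. A minimal Python sketch; the initialisation at sample moments is our assumption, not a prescription from the paper:

```python
import numpy as np

def heavy_filter(r, rm, omega, alpha, beta, omega_r, alpha_r, beta_r):
    """Filter h_t = omega + alpha*RM_{t-1} + beta*h_{t-1} and
       mu_t = omega_r + alpha_r*RM_{t-1} + beta_r*mu_{t-1}."""
    r, rm = np.asarray(r, dtype=float), np.asarray(rm, dtype=float)
    T = len(r)
    h, mu = np.empty(T), np.empty(T)
    h[0], mu[0] = np.mean(r ** 2), np.mean(rm)   # assumed initial conditions
    for t in range(1, T):
        h[t] = omega + alpha * rm[t - 1] + beta * h[t - 1]
        mu[t] = omega_r + alpha_r * rm[t - 1] + beta_r * mu[t - 1]
    return h, mu
```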
It is helpful to solve out explicitly the stationary HEAVY-r and GARCH models as
ht = ω/(1 − β) + α(RMt−1 + βRMt−2 + β²RMt−3 + ⋯)
and
ht = ωG/(1 − βG) + αG(r²t−1 + βGr²t−2 + β²Gr²t−3 + ⋯),
respectively.
In applied work we will typically estimate β to be around 0.6–0.7 and ω to be small. Thus the HEAVY-r conditional variance is roughly a small constant plus a rapidly decaying weighted sum of very recent realised measures. In the estimated GARCH models in our later empirical work βG is usually around 0.91 or above, so the GARCH model has much more memory and thus averages many more data points.
Note that, unlike GARCH models, the HEAVY-r model has no feedback, and so the properties of the realised measures determine the properties of ht.
The predictive model for the time series of realised measures is not novel. The work of Andersen et al. (2001a,b, 2003, 2007) typically used least squares estimation of autoregressive cousins of (4), or their log-transformed versions. These authors also emphasised the evidence for long memory in these time series and studied various ways of carrying out inference for such processes. Some of this work uses the model of Corsi (2009), which is easy to estimate and mimics some aspects of long memory.
Engle (2002) estimated GARCHX type models, which specialise to (3), based on realised variances computed using 5-minute returns. He found the coefficient on r²t−1 to be small. He also fitted models like (4), but again including lagged squared daily returns. He argues that the squared daily return helps forecast the realised variance, although there is some uncertainty over whether the effect is statistically significant (see his footnote 2). He did not, however, express (3)–(4) as a simple basis for a multistep-ahead forecasting system. Lu (2005) looked at extensions of GARCH models allowing the inclusion of lagged realised variance, and provides extensive empirical analysis of these GARCHX models.
Engle and Gallo (2006) extended Engle (2002) to look at multiple volatility indicators, trying to pool information across many indicators, including daily ranges, rather than focusing solely on theoretically sound high-frequency-based statistics. They then relate this to the VIX. In that paper they do study multistep-ahead forecasting, using a trivariate system of daily absolute returns, the daily range and realised variance (computed using 5-minute returns for the S&P 500). Their estimated models are quite sophisticated with, again, daily returns playing a large role in predicting each series. These results are at odds with our own empirical experience reported in Section 4. Some clues as to why this might be the case can be seen from their Table I, which shows realised volatility having roughly the same average level as absolute returns and the daily range, but being massively more variable, with a very long right-hand tail. Further, their out-of-sample comparison was based on only 217 observations, which makes their analysis somewhat noisy. Perhaps these two features distracted from the power and simplicity of using realised measures in HEAVY type models.
Table I. A description of the ‘OMI's realised library’, version 0.1. The table shows how each measure is built and the length of the time series available, denoted T. ‘Med dur’ denotes the median duration in seconds between price updates during September 2008 in our database; ‘Start’ is the first day of each series, given as day-month-year. All data series stop on 27 March 2009
| Asset | Med dur | Start | T | Asset | Med dur | Start | T |
|---|---|---|---|---|---|---|---|
| Dow Jones Industrials | 2 | 2-1-1996 | 3278 | MSCI Australia | 60 | 2-12-1999 | 2323 |
| Nasdaq 100 | 15 | 2-1-1996 | 3279 | MSCI Belgium | 60 | 1-7-1999 | 2442 |
| S&P 400 Midcap | 15 | 2-1-1996 | 3275 | MSCI Brazil | 60 | 4-10-2002 | 1587 |
| S&P 500 | 15 | 2-1-1996 | 3284 | MSCI Canada | 60 | 12-2-2001 | 2013 |
| Russell 3000 | 15 | 2-1-1996 | 3279 | MSCI Switzerland | 60 | 9-6-1999 | 2434 |
| Russell 1000 | 15 | 2-1-1996 | 3279 | MSCI Germany | 60 | 1-7-1999 | 2448 |
| Russell 2000 | 15 | 2-1-1996 | 3281 | MSCI Spain | 60 | 1-7-1999 | 2423 |
| CAC 40 | 30 | 2-1-1996 | 3322 | MSCI France | 60 | 1-7-1999 | 2455 |
| FTSE 100 | 15 | 20-10-1997 | 2862 | MSCI UK | 60 | 8-6-1999 | 2451 |
| German DAX | 15 | 2-1-1996 | 3317 | MSCI Italy | 60 | 1-7-1999 | 2437 |
| Italian MIBTEL | 60 | 3-7-2000 | 2194 | MSCI Japan | 15 | 2-12-1999 | 2240 |
| Milan MIB 30 | 60 | 2-1-1996 | 3310 | MSCI South Korea | 60 | 3-12-1999 | 2263 |
| Nikkei 250 | 60 | 5-1-1996 | 3177 | MSCI Mexico | 60 | 4-10-2002 | 1612 |
| Spanish IBEX | 5 | 2-1-1996 | 3288 | MSCI Netherlands | 60 | 1-7-1999 | 2454 |
| S&P TSE | 15 | 31-12-1998 | 2546 | MSCI World | 60 | 11-2-2001 | 2101 |
| British pound | 2 | 3-1-1999 | 2584 | | | | |
| Euro | 1 | 3-1-1999 | 2600 | | | | |
| Swiss franc | 3 | 3-1-1999 | 2579 | | | | |
| Japanese yen | 2 | 3-1-1999 | 2599 | | | | |
Brownlees and Gallo (2009) look at risk management in the context of exploiting high-frequency data. Their model, in Section 5 of their paper, links the conditional variance of returns to an affine transform of the predicted realised measure. In particular, their model has a HEAVY type structure, but instead of using ht = ω + αRMt−1 + βht−1 they model ht = ωB + αBµt. That is, they place in the HEAVY-r equation a smoothed version µt of the lagged realised measures, where the smoothing is chosen to perform well in the HEAVY-RM equation, rather than the raw version, which is then smoothed through the role of the momentum parameter β (which is optimally chosen to perform well in the HEAVY-r equation). Although these models are distinct, they share quite a lot of common thinking in their structure. Maheu and McCurdy (2009) is similar in spirit to Brownlees and Gallo (2009), but focuses on an even more tightly parameterised model working with open-to-close daily returns (i.e., ignoring overnight effects), where the realised variance captures much of the variation of the asset price. Giot and Laurent (2004) look at some similar types of models. Bollerslev et al. (2009) model multiple volatility indicators and daily returns, where the return model has a conditional variance which is the contemporaneous realised variance.
Finally, for some data the realised measure is not enough to entirely crowd out the lagged squared daily returns. In that case it makes sense to augment the HEAVY-r model into its extended version:
ht = ω + αRMt−1 + βht−1 + γXr²t−1.   (5)
This could be thought of as a GARCHX type model, but that name suggests it is the squared returns which drive the model, whereas in our empirical work it is the lagged realised measure which does almost all the work in moving the conditional variance around, even on the rare occasions when γX is estimated to be statistically significant. There seems little point in extending the HEAVY-RM model in the same way.
2.3. Representations and Dynamics
2.3.1. Multiplicative Representation
The vector multiplicative representation of HEAVY models rewrites (3) and (4) as
r²t = εtht,  RMt = ηtµt,
where εt and ηt are non-negative innovations with unit conditional means. Such representations are the key behind the work of Engle (2002) and Engle and Gallo (2006). They are powerful as (εt, ηt)′ − (1, 1)′ is a martingale difference with respect to F^HF_{t−1}.5
The dynamic structure of the bivariate model can be gleaned from writing
ht+1 = ω + βht + αµt + α(RMt − µt),
µt+1 = ωR + (αR + βR)µt + αR(RMt − µt).
Hence this process is driven by a common factor RMt − µt, which is itself a martingale difference sequence with respect to F^HF_{t−1}.
The memory in the HEAVY model is governed by the matrix
( β  α )
( 0  αR + βR ).
This has two eigenvalues (e.g. Golub and Van Loan, 1989, p. 333): β, which we call a momentum parameter (a justification for this name will be given shortly), and αR + βR, which is the persistence parameter of the realised measure. In empirical work we will typically see β around 0.6 and the persistence parameter close to, but slightly less than, one, so αR + βR governs the implied memory of the volatility at longer lags. The persistence parameter will be close to the estimated αG + βG seen in GARCH models.
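A quick numerical check of the two eigenvalues, using illustrative parameter values in the range just described (the particular numbers are hypothetical):

```python
import numpy as np

alpha, beta = 0.45, 0.60         # HEAVY-r parameters (illustrative values)
alpha_r, beta_r = 0.40, 0.58     # HEAVY-RM parameters (illustrative values)

P = np.array([[beta, alpha],
              [0.0,  alpha_r + beta_r]])
print(np.linalg.eigvals(P))      # [0.6, 0.98]: momentum and persistence
```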
The role of β is interesting. In typical GARCH models the main feature is that the current value of the conditional variance monotonically mean reverts to the long-run average value as the forecast horizon increases. In HEAVY models this is not the case: because of β, the forecasts can initially move away from the long-run level before reverting to it.
2.3.2. Dynamics of the r²t Process
The HEAVY model can be solved out to imply the autocovariance function of the squared returns. This seems of little practical interest but allows some theoretical insights.
Assume that αR, βR, β ∈ [0, 1) and αR + βR < 1. Define ut = r²t − ht and uRt = RMt − µt, which under the model are martingale difference sequences with respect to F^HF_{t−1}. We can write out the process for the r²t from a HEAVY model as
(1 − βL)r²t = ω + αLRMt + (1 − βL)ut,
where L is the lag operator. Therefore
{1 − (αR + βR)L}RMt = ωR + (1 − βRL)uRt.
Combining delivers the result
(1 − βL){1 − (αR + βR)L}r²t = ω(1 − αR − βR) + αωR + ξt,  ξt = α(1 − βRL)uRt−1 + (1 − βL){1 − (αR + βR)L}ut.
If we assume that the covariance matrix of (ut, uRt)′ exists, then ξt has a zero-mean weak MA(2) representation and r²t is weak GARCH(2,2) in the sense of Drost and Nijman (1993). The autoregressive roots of r²t are β and αR + βR, so are real and positive. A by-product of the derivation of these results is the VARMA(1,1) representation
r²t = ω + βr²t−1 + αRMt−1 + ut − βut−1,
RMt = ωR + (αR + βR)RMt−1 + uRt − βRuRt−1,
and the equilibrium correction form (see Hendry, 1995):
Δr²t = ω − (1 − β)r²t−1 + αRMt−1 + ut − βut−1,
ΔRMt = ωR − (1 − αR − βR)RMt−1 + uRt − βRuRt−1.
An important aspect of the above result is that the memory parameters in the MA(2) depend upon the covariance matrix of (ut, uRt).
The weak GARCH(2,2) representation has some similarities with the component model of Engle and Lee (1999, equations (2.4) and (2.5)), which models
ht = qt + αC(r²t−1 − qt−1) + βC(ht−1 − qt−1),
qt = ωC + ρCqt−1 + φC(r²t−1 − ht−1).
The qt process is called the long-term component and ht − qt the transitory component of the conditional variance. Thus we expect ρC to be close to one and αC + βC to be substantially less than one.
An important aspect of the marginal r²t process is that its autoregressive polynomial satisfies
(1 − βL){1 − (αR + βR)L} = 1 − (β + αR + βR)L + β(αR + βR)L².
This makes plain the role of β in generating momentum. It can push β + αR + βR above one, heightening significant moves in the volatility, while αR + βR < 1 causes it to mean revert. If β = 0 then r²t becomes a weak GARCH(1,2) and has no momentum, although the realised measure still drives the volatility. The component model of Engle and Lee (1999) is also a weak GARCH(1,2) if ρC = 0. The sophisticated model of Engle and Gallo (2006) is capable of generating momentum effects, of course.
If βR = β then
(1 − βL){1 − (αR + βR)L}r²t = ω(1 − αR − βR) + αωR + α(1 − βL)uRt−1 + (1 − βL){1 − (αR + βR)L}ut,
so we can divide through by (1 − βL) to produce
{1 − (αR + βR)L}r²t = {ω(1 − αR − βR) + αωR}/(1 − β) + αuRt−1 + {1 − (αR + βR)L}ut.
Hence under that constraint r²t is a weak GARCH(1,1) model.
2.3.4. Integrated HEAVY Models
The marginal r²t process (8) can be rewritten in equilibrium correction form as
Δr²t = {ω(1 − αR − βR) + αωR} − (1 − β)(1 − αR − βR)r²t−1 + β(αR + βR)Δr²t−1 + ξt,
where Δ is the difference operator. In practice the coefficient on the level is likely to be slightly negative and the coefficient on the lagged difference close to β.
Clements and Hendry (1999) have argued that most economic forecasting failure is due to shifts in long-run relationships, and so this can be mitigated by imposing unit roots on the model. In this context this means setting (1 − β)(1 − αR − βR) to zero. In order to avoid β being set to one, this is achieved by setting αR + βR = 1 and removing the intercept ωR (otherwise the intercept becomes a trend slope). The resulting forecasting model would then be based around
Δr²t = βΔr²t−1 + ξt,
which has momentum but no mean reversion. This type of model would not be upset by structural changes in the level of the process. Imposing the unit root in GARCH type models is usually associated with the work of RiskMetrics, but that analysis does not have any momentum effects. Hence such a suggestion looks novel in the context of volatility models. It would imply using a HEAVY model of the type, for example,
ht = ω + αRMt−1 + βht−1,  µt = αRRMt−1 + (1 − αR)µt−1.
We call this the ‘integrated HEAVY model’. We will see later that this very simple model can generate reliable multiperiod forecasts.
2.3.5. Iterative Multistep-Ahead Forecasts
Multistep-ahead forecasts of volatility are very important for asset allocation and risk assessment, since these tasks are usually carried out over multiple days. For one-step-ahead forecasts of volatility we only need (3), but for multistep forecasts equation (4) plays a central role.
For s ≥ 0, from the martingale difference representation, we have
E{(ht+s+1, µt+s+1)′ | F^HF_{t−1}} = (ω, ωR)′ + ϑE{(ht+s, µt+s)′ | F^HF_{t−1}},
where we write
ϑ = ( β  α )
    ( 0  αR + βR ).
It has two eigenvalues, β and αR + βR. Further, provided β ≠ αR + βR,
ϑ^s = ( β^s  α{(αR + βR)^s − β^s}/(αR + βR − β) )
      ( 0   (αR + βR)^s ).
Of course, of interest is the integrated variance prediction over the next s + 1 days. We will assume this can be simplified to
Σ{j=0..s} E(r²t+j | F^HF_{t−1}) = Σ{j=0..s} ht+j|t−1,
which would mean (11) could be used to compute it.
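A sketch of the resulting iteration in Python (variable names are ours); cumulating the first column delivers the integrated variance prediction:

```python
import numpy as np

def heavy_forecasts(h_t, mu_t, omega, alpha, beta, omega_r, alpha_r, beta_r, S):
    """E(h_{t+s}) and E(mu_{t+s}) for s = 1..S via the companion form."""
    c = np.array([omega, omega_r])
    theta = np.array([[beta, alpha],
                      [0.0,  alpha_r + beta_r]])
    state = np.array([h_t, mu_t])
    path = np.empty((S, 2))
    for s in range(S):
        state = c + theta @ state        # one-step update of the expectations
        path[s] = state
    return path                          # column 0: h forecasts; column 1: mu

# integrated variance over the next S days: np.cumsum(heavy_forecasts(...)[:, 0])
```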
2.3.6. Targeting Reparameterisation
In the case of a stationary HEAVY model there are some advantages in reparameterising the equations in the HEAVY model so the intercepts are explicitly related to the unconditional means of squared returns and realised measures. In the HEAVY-RM model this is easy to do, as
µt = µR(1 − αR − βR) + αRRMt−1 + βRµt−1,
so that E(RMt) = µR. For the HEAVY-r equation it is less clear, since the realised measure is likely to be a downward-biased measure of the daily squared return (due to overnight effects). Writing κ = E(r²t)/E(RMt), so that E(r²t) = µ = κµR, we can set
ht = µR{κ(1 − β) − α} + αRMt−1 + βht−1.
Taken together we call (13) and (12) the ‘targeting parameterisation’ for the HEAVY model.
This parameterisation of the HEAVY model has the virtue that it is possible to use the estimators6
µ̂R = T⁻¹Σ{t=1..T}RMt,  µ̂ = T⁻¹Σ{t=1..T}r²t,  κ̂ = µ̂/µ̂R
of µR, µ and κ. Thus this reparameterisation is the HEAVY extension of variance targeting, introduced by Engle and Mezrich (1996). When these estimators are plugged into the quasi-likelihood functions optimisation becomes easier, as the dimension is smaller, but it does alter the resulting asymptotic standard errors. This is discussed in the next subsection.
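The three moment estimators are immediate to compute; a sketch:

```python
import numpy as np

def targeting_moments(r, rm):
    """Sample moments used by the targeting parameterisation."""
    r, rm = np.asarray(r, dtype=float), np.asarray(rm, dtype=float)
    mu_r_hat = np.mean(rm)             # estimates E(RM_t)
    mu_hat = np.mean(r ** 2)           # estimates E(r_t^2)
    kappa_hat = mu_hat / mu_r_hat      # corrects for the overnight-effect bias
    return mu_r_hat, mu_hat, kappa_hat
```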
2.4. Inference for HEAVY Based Models
2.4.1. Quasi-likelihood Estimation
Inference for HEAVY models is a simple application of the theory of multiplicative error models discussed by Engle (2002), using standard quasi-likelihood asymptotic theory.
The HEAVY model has two equations:
var(rt | F^HF_{t−1}) = ht(λ),  E(RMt | F^HF_{t−1}) = µt(λR),
where λ = (ω, ψ′)′ collects the parameters of the HEAVY-r equation and λR = (ωR, ψR′)′ those of the HEAVY-RM equation.
We will estimate each equation separately, which makes optimisation straightforward. No attempt will be made to pool information across the two equations, although more information would potentially be available if this were attempted (see the analysis of Cipollini et al., 2007).
The first equation will initially be estimated using a Gaussian quasi-likelihood:
lr(λ) = −(1/2)Σ{t=1..T}(log ht + r²t/ht),
where we take the initial value h1 to be the sample variance of the returns.
The second equation will be estimated using the same structure, with
lRM(λR) = −(1/2)Σ{t=1..T}(log µt + RMt/µt),
where we take the initial value µ1 to be the sample mean of the realised measures.
In inference we will regard the parameters as having no link between the HEAVY-r and HEAVY-RM models, i.e. (ω, ψ) and (ωR, ψR) are variation free (e.g. Engle et al., 1983), which we will see in the next subsection is important for inference. It then follows that equation-by-equation optimisation is all that is necessary to maximise the quasi-likelihood. This is convenient, as existing GARCH type code can simply be used in this context. We will write θ = (ω, ψ′, ωR, ψR′)′ and the resulting maximiser of the quasi-likelihoods as θ̂.
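A minimal sketch of this equation-by-equation Gaussian QML in Python. The starting values, parameter restrictions and the initialisation of the recursions are our assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def _filter(driver, omega, a, b, init):
    """One HEAVY-style recursion: m_t = omega + a*driver_{t-1} + b*m_{t-1}."""
    m = np.empty(len(driver))
    m[0] = init
    for t in range(1, len(driver)):
        m[t] = omega + a * driver[t - 1] + b * m[t - 1]
    return m

def neg_qlik(params, y, driver, init):
    """Negative Gaussian quasi-likelihood for one equation of the system."""
    omega, a, b = params
    if omega <= 0 or a < 0 or not (0 <= b < 1):
        return np.inf                      # crude positivity/stationarity guard
    m = _filter(driver, omega, a, b, init)
    return 0.5 * np.sum(np.log(m) + y / m)

def fit_heavy(r, rm):
    """Equation-by-equation QML: HEAVY-r (y = r^2), then HEAVY-RM (y = RM)."""
    r, rm = np.asarray(r, dtype=float), np.asarray(rm, dtype=float)
    lam = minimize(neg_qlik, [0.1 * np.var(r), 0.3, 0.6],
                   args=(r ** 2, rm, np.var(r)), method="Nelder-Mead").x
    lam_r = minimize(neg_qlik, [0.1 * np.mean(rm), 0.3, 0.6],
                     args=(rm, rm, np.mean(rm)), method="Nelder-Mead").x
    return lam, lam_r
```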
The alternative targeting parameterisation has
µt = µR(1 − αR − βR) + αRRMt−1 + βRµt−1,  ht = µR{κ(1 − β) − α} + αRMt−1 + βht−1,
so that E(RMt) = µR and E(r²t) = κµR. This has the virtue that we can employ a two-step approach, first setting
µ̂R = T⁻¹Σ{t=1..T}RMt,  κ̂ = T⁻¹Σ{t=1..T}r²t/µ̂R,
and then computing the quasi-likelihood estimators of the remaining parameters with µ̂R and κ̂ plugged in.
This reduces the dimension of the optimisations by one each time; this has the disadvantage that the two equations are no longer variation-free, which complicates the asymptotic distribution.
2.4.2. Quasi-likelihood Based Asymptotic Distribution
Inference using robust standard errors is standard in this context of (14) and (15). We stack the scores so that
mt(θ) = (∂lrt/∂λ′, ∂lRMt/∂λR′)′,
where θ = (λ′, λR′)′ and lrt, lRMt denote the contributions of the tth observation to the two quasi-likelihoods. Then, if we denote the point in the parameter space where the model (3) and (4) holds as θ*, then under the model
E{mt(θ*) | F^HF_{t−1}} = 0;
that is, mt(θ*) is a martingale difference sequence with respect to F^HF_{t−1}. Under standard quasi-likelihood conditions we have
√T(θ̂ − θ*) →d N(0, I⁻¹JI⁻¹),
where the Hessian is
I = −E{∂mt(θ*)/∂θ′} = diag(Iλ, IλR)   (16)
and J is estimated by the outer product of the scores:
Ĵ = T⁻¹Σ{t=1..T}mt(θ̂)mt(θ̂)′.   (17)
The block diagonality of (16) is due to the variation-free property of the parameters, while it is not necessary to use an HAC estimator in (17) due to the martingale difference property of the stacked scores. This is a straightforward application of quasi-likelihood theory; it can be viewed as an extension of Bollerslev and Wooldridge (1992) and is discussed extensively in Cipollini et al. (2007).
The most important implication of the block diagonality of the Hessian (16) is that the equation-by-equation standard errors for the HEAVY-r and HEAVY-RM are correct, even when viewing the HEAVY model as a system. This means that standard software can be used to compute them.
When the two-step approach is used on the targeting parameterisation, the moment conditions change: the scores of the concentrated quasi-likelihoods are stacked together with the moment conditions defining the first-step estimators, RMt − µR and r²t − κµR. The moment conditions are no longer martingale difference sequences, but they do have a zero mean for all values of t at the true parameter point:
E{mt(θ*E)} = 0,
while the estimator of J now needs to be an HAC estimator applied to the time series of mt(θE).
2.4.3. Non-nested Tests
One natural way to assess the forecasting power of the HEAVY model is to compare it to that generated by the GARCH model. This can be assessed at distinct horizons by comparing the performance using the QLIK loss function:
L(σ̂²t+s, ht+s|t−1) = log ht+s|t−1 + σ̂²t+s/ht+s|t−1,
where σ̂²t+s is the proxy used for the time t + s (latent) variance and ht+s|t−1 is some predictor made at time t − 1. This loss function has been shown to be robust to certain types of noise in the proxy in Patton (2009) and Patton and Sheppard (2009a). It will later be used to compare the forecast performance of non-nested volatility models. Also important is the cumulative loss function, which we take as
L(Σ{j=0..s}σ̂²t+j, Σ{j=0..s}ht+j|t−1),
which is distinct from the cumulative sum of losses. This uses the s-period realised variance as the observation.
The temporal average (s + 1)-step-ahead relative loss between a HEAVY and a GARCH model will be
L̄s = T⁻¹Σt Lt,s,  Lt,s = log f(rt+s | 0, hGt+s|t−1) − log f(rt+s | 0, ht+s|t−1).
Here ht+s|t−1 is the forecast from the HEAVY model, hGt+s|t−1 is the corresponding GARCH forecast and f(x | µ, σ²) denotes a Gaussian density with mean µ and variance σ², evaluated at x. The framework allows both the HEAVY and GARCH models to be estimated using QML techniques. The HEAVY model will be favoured if L̄s is negative.
L̄s estimates Ls = E(Lt,s), s = 0, 1, …, S, the unconditional average likelihood ratio between the two models at each horizon. The HEAVY model will be favoured at s steps if Ls < 0 and the GARCH model if Ls > 0. We will say that the HEAVY model forecast-dominates the GARCH model if Ls < 0 for all s = 1, 2, …, S. ‘Weakly forecast-dominates’ means that Ls ≤ 0 for all s = 1, 2, …, S, with at least one of the inequalities being strict. This approach follows the ideas of Cox (1961b) on non-nested testing, using the Vuong (1989) and Rivers and Vuong (2002) implementation.7
The above scheme can be implemented if Lt,s (evaluated at the pseudo-true parameter values) is sufficiently weakly dependent to allow the parameter estimates of the HEAVY and GARCH models to obey a standard Gaussian central limit theorem (e.g. Rivers and Vuong, 2002). Then
√T(L̄s − Ls) →d N(0, Vs),
where Vs is the long-run variance of the Lt,s. The scale Vs has to be estimated by an HAC estimator (e.g. Andrews, 1991).
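A sketch of the resulting test statistic, with a Bartlett-kernel (Newey–West style) estimator standing in for the HAC step; the lag choice is ours:

```python
import numpy as np

def bartlett_lrv(x, lags):
    """Bartlett-kernel long-run variance estimator of a scalar series."""
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x)
    T = len(x)
    v = np.sum(x * x) / T
    for l in range(1, lags + 1):
        v += 2.0 * (1.0 - l / (lags + 1.0)) * np.sum(x[l:] * x[:-l]) / T
    return v

def relative_loss_test(loglik_heavy, loglik_garch, lags=22):
    """t-statistic for mean(L_t,s) with L_t,s = log f_GARCH - log f_HEAVY.
       Large negative values favour the HEAVY model."""
    L = np.asarray(loglik_garch) - np.asarray(loglik_heavy)
    T = len(L)
    return np.mean(L) / np.sqrt(bartlett_lrv(L, lags) / T)
```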
2.4.4. Horizon-Tuned Estimation and Evaluation
Having multistep-ahead loss functions suggests separately estimating the model at each forecast horizon by minimising the expected loss at that horizon. This way of tuning the model to produce multistep-ahead forecasts is called ‘direct forecasting’ and has been studied by, for example, Marcellino et al. (2006) and Ghysels et al. (2009). The former argue that direct forecasting may be more robust to model misspecification than iterating one-period-ahead models, although in practice they find iterative methods more effective for forecasting macroeconomic variables. Direct forecasting dates at least to Cox (1961a). Marcellino et al. (2006) provide an extensive discussion of the literature.
Minimising the QLIK multistep-ahead loss can be thought of as maximising a distinct quasi-likelihood for each value of s:
lr,s(λ) = −(1/2)Σt{log ht+s|t−1 + r²t+s/ht+s|t−1},  lRM,s(λR) = −(1/2)Σt{log µt+s|t−1 + RMt+s/µt+s|t−1},
where the quasi-likelihood is the Gaussian likelihood based on multistep-ahead forecasts. This delivers the sequence of horizon-tuned estimators λ̂(0), λ̂(1), …, λ̂(S) and λ̂R(0), λ̂R(1), …, λ̂R(S), whose standard errors can be computed using the usual theory of quasi-likelihoods. In practice, because of the structure of our HEAVY model, by far the most important of these equations is the second one, which allows horizon tuning for the HEAVY-RM forecasts.8 The same exercise can be carried out for a GARCH model.
Like GARCH models, a drawback of HEAVY models is that they only specify the conditional means of r²t and RMt given F^HF_{t−1}. It is sometimes helpful to give the entire forecast distributions
rt+s | F^HF_{t−1}  and  RMt+s | F^HF_{t−1},  s = 0, 1, …, S.
A simple way of carrying this out is via a model-based bootstrap. We use the representation rt = ζt√ht, RMt = ηtµt, and then assume that (ζt, ηt)′ is i.i.d. with joint distribution Fζ,η. Typically these bivariate variables will be contemporaneously correlated. For equities we would expect a sharp negative correlation, reflecting statistical leverage. If we had knowledge of Fζ,η it would be a trivial task to carry out model-based simulation from (19) or (20).
We can estimate the joint distribution function Fζ,η by simply taking the filtered (ht, µt)′ and computing the devolatilised9 quantities
ζ̂t = rt/√ht,  η̂t = RMt/µt,
and computing the empirical distribution function F̂ζ,η. Then we can sample pairs with replacement from this population,10 which can then be used to drive a simulated joint path of the pairs (rt, RMt)′, (rt+1, RMt+1)′, …, (rt+s, RMt+s)′. Discarding the drawn realised measures gives us paths of daily returns rt, rt+1, …, rt+s. Carrying out this simulation many times approximates the predictive distributions.
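A sketch of this bootstrap in Python. The names are ours; h_t and mu_t are the filtered values at the forecast origin, and zeta_hat, eta_hat are the devolatilised residual pairs defined above:

```python
import numpy as np

def simulate_predictive_paths(h_t, mu_t, zeta_hat, eta_hat,
                              omega, alpha, beta, omega_r, alpha_r, beta_r,
                              S, n_paths=10000, seed=0):
    """Resample (zeta, eta) pairs jointly, preserving their contemporaneous
       correlation, and push them through the HEAVY recursions."""
    rng = np.random.default_rng(seed)
    zeta_hat, eta_hat = np.asarray(zeta_hat), np.asarray(eta_hat)
    T = len(zeta_hat)
    r_paths = np.empty((n_paths, S))
    for i in range(n_paths):
        h, mu = h_t, mu_t
        for s in range(S):
            j = rng.integers(T)               # draw one pair with replacement
            r = np.sqrt(h) * zeta_hat[j]      # r = zeta * h^{1/2}
            rm = mu * eta_hat[j]              # RM = eta * mu
            r_paths[i, s] = r
            h = omega + alpha * rm + beta * h
            mu = omega_r + alpha_r * rm + beta_r * mu
    return r_paths                            # simulated daily return paths
```

Drawing the pair indices jointly is what preserves the negative (ζ, η) correlation, so the simulated paths inherit the statistical leverage effect mentioned above.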