4.2.1. SPA Tests of Option-Implied Forecasts for Individual Stocks
In this section we present the SPA test results for all three individual stocks, IBM, MSFT and GE, with both MF and ATM used as respective benchmarks. Comparative results for the S&P500 index are reported in Section 4.2.5. In the spirit of Hansen and Lunde (2006a) and Patton (2006) we use a ‘robust’ criterion to measure the accuracy of forecast j, namely the mean squared forecast error (MSFE) for variance quantities, with MSFE_{j} = N^{−1}∑_{t=1}^{N}(σ̂_{t}^{2} − f_{j,t})^{2}. We provide results for one and 22 days ahead in Tables I and II respectively, with the maturity of the options used to construct MF and ATM matching the forecast horizon in the second case only. In both tables, the results for IBM, MSFT and GE are given respectively in Panels A, B and C. Across the columns of each table we order the eight measures, and associated results, according to the extent to which each measure accommodates noise and/or jumps. Specifically, we report results for measures that do not formally adjust for noise or jumps: RV(5) and RVA(5); measures that adjust for noise only: TSRV1, TSRV2, RKERN, OSRV and ALTM; and the measure that adjusts for both noise and jumps: BV. We annotate the results in the following way: (i) if a benchmark is rejected at the 5% level, the SPA p-value appears in bold; (ii) in the case where a benchmark is rejected, the ‘most significant’ forecast model according to the pairwise ‘t statistics’ is indicated in parentheses in the line below;^{26} (iii) if a benchmark is not rejected and its MSFE loss is the smallest of that of all m + 1 models in the choice set, the p-value is allocated a # superscript.^{27}
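The SPA test compares the benchmark's average loss against all m alternatives jointly, with a bootstrap that accounts for the search across alternatives. The sketch below is a deliberately simplified rendering of Hansen's procedure, using a stationary bootstrap, a crude (non-HAC) variance estimate and the standard recentring rule; the function names, block length and number of replications are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def stationary_bootstrap_indices(n, mean_block, rng):
    """Politis-Romano stationary bootstrap: random-length blocks of time indices."""
    p = 1.0 / mean_block
    idx = np.empty(n, dtype=np.int64)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        idx[t] = rng.integers(n) if rng.random() < p else (idx[t - 1] + 1) % n
    return idx

def spa_pvalue(loss_bench, loss_alts, B=500, mean_block=10, seed=0):
    """Simplified SPA p-value; H0: no alternative improves on the benchmark.

    loss_bench : (N,) per-period losses (e.g. squared forecast errors) of the benchmark
    loss_alts  : (N, m) per-period losses of the m alternative forecasts
    """
    rng = np.random.default_rng(seed)
    d = loss_bench[:, None] - loss_alts        # d_k > 0 when alternative k beats the benchmark
    n, m = d.shape
    dbar, omega = d.mean(axis=0), d.std(axis=0, ddof=1)
    t_obs = max(0.0, np.max(np.sqrt(n) * dbar / omega))
    # Recentring: clearly inferior alternatives keep their (negative) mean under H0
    thresh = -omega * np.sqrt(2.0 * np.log(np.log(n)) / n)
    mu = np.where(dbar < thresh, dbar, 0.0)
    exceed = 0
    for _ in range(B):
        db = d[stationary_bootstrap_indices(n, mean_block, rng)].mean(axis=0)
        t_b = max(0.0, np.max(np.sqrt(n) * (db - dbar + mu) / omega))
        exceed += t_b >= t_obs
    return exceed / B
```

With this convention a small p-value rejects the benchmark; a benchmark with the lowest loss in the set yields t_obs = 0 and a p-value of one.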
Table I. SPA p-values: forecasts based on a one-day-ahead forecast horizon. An option-implied volatility forecast is used as benchmark: MF (model-free) and ATM (at-the-money). The SPA test is based on a mean squared forecast error (MSFE) loss criterion, for variance quantities. For each dataset the number of models against which the benchmark model is compared (m), plus the number of observations in the forecast evaluation period from which the p-values and sample loss are calculated (N), are as follows: IBM: m = 67, N = 1149; MSFT: m = 63, N = 1154; GE: m = 66, N = 1147. The model set always includes the option-implied forecast that is the alternative to the one being tested as the benchmark. p-values that are associated with rejection of the benchmark forecast at the 5% level are highlighted in bold font. In the case of rejection, the ‘most significant’ alternative forecast, according to the pairwise ‘t statistics’, is reported in parentheses below the p-value. The acronym LM_{own(cross)} denotes a long-memory ARFIMA own (cross) forecast, while the acronym SM_{own(cross)} denotes a short-memory ARMA own (cross) forecast. In the case where a benchmark is not rejected, the superscript # indicates that the forecast also has the smallest MSFE loss of all m + 1 forecasts in the choice set. Each column corresponds to the measure to be forecast.^{a}

| Benchmark | RV(5) | RVA(5) | OSRV | RKERN | TSRV1 | TSRV2 | ALTM | BV |
|---|---|---|---|---|---|---|---|---|
| *Panel A: IBM* |||||||||
| MF | **0.000** | **0.000** | **0.000** | **0.000** | **0.000** | **0.000** | **0.011** | **0.000** |
| (most sig.) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) |
| ATM | 0.301 | 0.207 | 0.233 | 0.346 | 0.238 | 0.217 | 0.764^{#} | **0.002** |
| (most sig.) | | | | | | | | (SM_{cross}) |
| *Panel B: MSFT* |||||||||
| MF | **0.014** | **0.012** | **0.006** | **0.014** | **0.001** | **0.001** | 0.364 | **0.025** |
| (most sig.) | (ATM) | (ATM) | (ATM) | (LM_{own}) | (SM_{cross}) | (LM_{cross}) | | (ATM) |
| ATM | 0.472 | 0.474 | 0.485 | 0.471 | 0.306 | 0.337 | 0.382 | 0.502 |
| *Panel C: GE* |||||||||
| MF | **0.003** | **0.000** | **0.000** | 0.075 | **0.041** | **0.015** | 0.121 | **0.000** |
| (most sig.) | (ATM) | (ATM) | (ATM) | | (LM_{cross}) | (ATM) | | (ATM) |
| ATM | 0.934^{#} | 0.941^{#} | 0.935^{#} | 0.740 | 0.560 | 0.626 | 0.404 | 0.804 |
Table II. SPA p-values: forecasts based on a 22-day-ahead forecast horizon. An option-implied volatility forecast is used as benchmark: MF (model-free) and ATM (at-the-money). The SPA test is based on a mean squared forecast error (MSFE) loss criterion, for variance quantities. For each dataset the number of models against which the benchmark model is compared (m), plus the number of observations in the forecast evaluation period from which the p-values and sample loss are calculated (N), are as follows: IBM: m = 67, N = 1149; MSFT: m = 63, N = 1154; GE: m = 66, N = 1147. The model set always includes the option-implied forecast that is the alternative to the one being tested as the benchmark. p-values that are associated with rejection of the benchmark forecast at the 5% level are highlighted in bold font. In the case of rejection, the ‘most significant’ alternative forecast, according to the pairwise ‘t statistics’, is reported in parentheses below the p-value. The acronym LM_{cross} denotes a long-memory ARFIMA cross forecast. In the case where a benchmark is not rejected, the superscript # indicates that the forecast also has the smallest MSFE loss of all m + 1 forecasts in the choice set. Each column corresponds to the measure to be forecast.^{a}

| Benchmark | RV(5) | RVA(5) | OSRV | RKERN | TSRV1 | TSRV2 | ALTM | BV |
|---|---|---|---|---|---|---|---|---|
| *Panel A: IBM* |||||||||
| MF | **0.000** | **0.000** | **0.000** | **0.000** | **0.000** | **0.000** | **0.000** | **0.000** |
| (most sig.) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) |
| ATM | 0.117 | **0.045** | **0.043** | 0.208 | 0.060 | 0.052 | 0.844^{#} | **0.000** |
| (most sig.) | | (LM_{cross}) | (LM_{cross}) | | | | | (LM_{cross}) |
| *Panel B: MSFT* |||||||||
| MF | **0.000** | **0.000** | **0.000** | **0.000** | **0.000** | **0.000** | **0.038** | **0.000** |
| (most sig.) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) |
| ATM | 0.571^{#} | 0.961^{#} | 0.934^{#} | 0.585^{#} | 0.870^{#} | 0.940^{#} | 0.579^{#} | 0.592^{#} |
| *Panel C: GE* |||||||||
| MF | **0.000** | **0.000** | **0.000** | **0.004** | **0.001** | **0.000** | 0.057 | **0.000** |
| (most sig.) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | (ATM) | | (ATM) |
| ATM | 0.964^{#} | 0.960^{#} | 0.946^{#} | 0.984^{#} | 0.956^{#} | 0.927^{#} | 0.993^{#} | 0.816^{#} |
The results in Table I provide little evidence that the MF implied volatility is an accurate forecast of actual volatility one day ahead. For IBM the SPA test rejects at the 5% level for all eight measures of volatility. In all cases, ATM is the most ‘significant’ alternative, based on the individual pairwise ‘t statistics’. For MSFT and GE there is support for MF using the ALTM measure, and a small amount of support in the case of GE using the RKERN measure also; however, in all other cases the MF benchmark is rejected, with ATM again the most ‘significant’ alternative in many instances. Both long-memory and short-memory direct forecasts also feature as the most significant alternatives in some cases.
While the lack of support for the MF benchmark may, superficially, be unsurprising, given the mismatch between option maturity (22 trading days) and forecast horizon (one day), the results for the ATM benchmark provide a startling refutation of the maturity explanation. In all but one case (the BV measure for IBM) ATM is not rejected as the superior forecast, with the p-values all exceeding 0.2, usually by a comfortable margin. In four cases ATM is not only not rejected as benchmark, but also has the smallest MSFE loss of all models considered (as indicated by the ^{#} superscript).
Most importantly, given one of the main focuses of this paper, these qualitative results (strong support for ATM, lack of support for MF) are almost completely invariant to the measure used to proxy future volatility. This result is consistent with the robustness results reported by Ghysels and Sinko (2006), in the context of a more limited forecasting analysis of direct intraday returns-based forecasts. The only result that really stands out here is the inability of ATM to forecast the ‘jump-free’ BV measure for IBM, a result that contrasts with all other results in Table I related to this benchmark.
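The contrast turns on what BV measures: bipower variation is built from products of adjacent absolute returns, so a single jump enters only two cross-products and is asymptotically screened out, whereas realized variance squares it. A small illustrative simulation (the sampling frequency, volatility level and jump size are assumptions chosen for the example, not the paper's data):

```python
import numpy as np

def realized_variance(r):
    """RV: sum of squared intraday returns; estimates integrated variance plus squared jumps."""
    return np.sum(r ** 2)

def bipower_variation(r):
    """BV: scaled sum of products of adjacent absolute returns; robust to (rare) jumps."""
    mu1 = np.sqrt(2.0 / np.pi)                      # E|Z| for Z ~ N(0,1)
    return np.sum(np.abs(r[1:]) * np.abs(r[:-1])) / mu1 ** 2

rng = np.random.default_rng(3)
n = 78                                              # 5-minute returns over a 6.5-hour day
sigma = 0.01
r = sigma * rng.standard_normal(n)                  # pure-diffusion day
r_jump = r.copy()
r_jump[40] += 0.10                                  # add one large jump

# RV picks up the squared jump; BV stays much closer to the diffusive variance n*sigma^2
rv, bv = realized_variance(r_jump), bipower_variation(r_jump)
```

In the diffusive limit both estimate integrated variance; their difference is the usual nonparametric jump statistic.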
Given the particular maturity associated with the option-implied forecasts (22 trading days) one would anticipate an improved performance when the forecast horizon matches that maturity. As indicated by the results reported in Table II, for the ATM forecast of MSFT and GE volatility this is indeed the case, with the p-values for the ATM benchmark uniformly higher for the 22-day forecast horizon than the corresponding p-values for the one-day horizon, and close to one in many cases. Moreover, the ATM forecast has the lowest MSFE in the forecast set (again, as indicated by the ^{#} superscript) for all eight forecast variables, for both the MSFT and GE series. The results for IBM are less clear-cut, although there is still support for the benchmark ATM for the majority of forecast measures. In contrast, the results for the MF benchmark are even weaker at the longer horizon, with only a single failure to reject MF as the superior forecast, across all series and all measures, and that support for MF being only marginal (p-value = 0.057). Once again, both option-implied volatilities fail to successfully predict the BV measure for IBM. The p-value for the ATM forecast of the BV measure, in the case of GE, although very supportive of the ATM benchmark, is the smallest across the alternative measures. The corresponding p-value is amongst the smallest in the case of MSFT.
As with the one-day-ahead predictions, there is some support for direct forecasts, in that for the three instances in which ATM is rejected as the benchmark model, a long-memory direct forecast is the ‘most significant’ according to the pairwise test. For the longer forecast horizon, short-memory direct forecasts do not feature at all. For neither forecast horizon is any support given to the GARCH-type forecasts based on daily returns. Indeed, although these figures are not reported here, this category of model is consistently ranked amongst the worst performers in terms of MSFE, for all series and measures, and for both forecast horizons.
In the following section we attempt to shed some light on the contrast between the support for the ATM benchmark and the (overall) lack of support for the MF benchmark, by examining the option market information from which the forecasts have been extracted. In Section 4.2.3 we shed further light on the issue via reference to the analysis in Bollerslev and Zhou (2006) of the volatility risk premium.
4.2.2. Implied Volatility Curves
In Figure 1(a), (c) and (e) we plot one particular volatility measure, OSRV, for each series, against MF.^{28} In the right-hand panels, (b), (d) and (f) respectively, we plot MF against ATM for each series. The intraday measure reported is for the 22-day-ahead forecast horizon and all volatility measures (both realized and option-implied) are graphed as annualized standard deviation figures.^{29} Four features in Figure 1, common to all three series, are immediately apparent: (i) there are two distinct subperiods: a high-volatility period from 30 August 2001 to (approximately) 30 July 2004, and a lower-volatility period from 2 August 2004 to 31 May 2006;^{30} (ii) the MF forecast tends to exceed realized volatility (overall), and by a greater amount in the high- than in the low-volatility period; (iii) the MF forecast tends to exceed the ATM forecast, again by a greater amount in the high-volatility period; (iv) the MF forecast is excessively noisy, relative to realized volatility, and more so than is the ATM forecast, again in the high-volatility period in particular.
The empirical features of OSRV, MF and ATM, for all three series, and for the full sample period and both subperiods identified here, are summarized in Table III. Using σ̂_{t}^{2} to represent OSRV, setting f_{t} = MF, ATM (as variance quantities), and using the decomposition of the MSFE as MSFE = [bias(f_{t})]^{2} + var(e_{t}), where e_{t} = σ̂_{t}^{2} − f_{t} is the forecast error, we report sample estimates of the forecast bias and forecast error variance, bias(f_{t}) and var(e_{t}) respectively, as well as the sample variance of the forecast itself, var(f_{t}). The numerical results clearly support the informal graphical evidence: MF is both a more biased forecast and has a larger forecast error variance than ATM, in particular over the high-volatility period. Most notably, the (magnitude of the) bias of MF is approximately twice as large as that for ATM in the high-volatility period, in the case of IBM and MSFT, and more than five times larger for the GE dataset. In the low-volatility period, however, the corresponding bias and variance figures for both forecasts are much more similar, for MSFT and GE in particular. Both options-based forecasts overestimate actual volatility in both the high- and low-volatility sample periods.
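The decomposition used in Table III is the standard identity MSFE = bias² + error variance, computed here with the population (1/N) variance so that the identity holds exactly in sample; the simulated series are placeholders for the realized measure and an option-implied forecast:

```python
import numpy as np

def msfe_decomposition(actual, forecast):
    """Split MSFE into squared bias and forecast-error variance.

    actual, forecast : arrays of variance quantities (e.g. OSRV and MF or ATM)
    """
    e = actual - forecast                 # forecast error
    bias = e.mean()                       # negative when the forecast overshoots
    err_var = e.var()                     # 1/N variance, so msfe == bias**2 + err_var exactly
    msfe = np.mean(e ** 2)
    return msfe, bias, err_var

rng = np.random.default_rng(7)
actual = 0.04 + 0.005 * rng.standard_normal(1000)
forecast = actual + 0.02 + 0.01 * rng.standard_normal(1000)   # upward-biased, noisy forecast
msfe, bias, err_var = msfe_decomposition(actual, forecast)
```

With an overshooting forecast the estimated bias is negative, matching the sign convention of the entries in Table III.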
Table III. Summary statistics for the two option-implied forecasts, over the full sample and the high- and low-volatility subperiods; realized volatility measured by OSRV

| Forecast (f_{t}): | IBM MF | IBM ATM | MSFT MF | MSFT ATM | GE MF | GE ATM |
|---|---|---|---|---|---|---|
| *Full sample period (30 August 2001 to 31 May 2006)* |||||||
| bias(f_{t}) | −0.0343 | −0.0190 | −0.0313 | −0.0115 | −0.0244 | −0.0060 |
| var(e_{t}) | 0.0021 | 0.0012 | 0.0040 | 0.0023 | 0.0026 | 0.0020 |
| var(f_{t}) | 0.0061 | 0.0038 | 0.0107 | 0.0060 | 0.0079 | 0.0046 |
| *High-volatility sample period (30 August 2001 to 30 July 2004)* |||||||
| bias(f_{t}) | −0.0503 | −0.0271 | −0.0518 | −0.0194 | −0.0372 | −0.0073 |
| var(e_{t}) | 0.0026 | 0.0018 | 0.0049 | 0.0030 | 0.0038 | 0.0032 |
| var(f_{t}) | 0.0068 | 0.0044 | 0.0108 | 0.0060 | 0.0080 | 0.0049 |
| *Low-volatility sample period (2 August 2004 to 31 May 2006)* |||||||
| bias(f_{t}) | −0.0095 | −0.0066 | −0.0065 | −0.0061 | −0.0045 | −0.0042 |
| var(e_{t}) | 1.52e−04 | 1.32e−04 | 9.27e−05 | 8.76e−05 | 4.36e−05 | 4.13e−05 |
| var(f_{t}) | 1.12e−04 | 8.81e−05 | 7.58e−05 | 7.12e−05 | 3.14e−05 | 2.89e−05 |
From the high- and low-volatility subperiods we reproduce, in turn, a representative sequence of implied volatility curves from which both MF and ATM have been constructed, as per the explanation in Section 4.1. In Figure 2, all three curves, on each of four representative days from the high-volatility period, give higher implied volatility figures for each moneyness ratio, when compared with the comparable curves for the low-volatility period in Figure 3. Moreover, the former also exhibit a much more pronounced curvature than the latter, with the volatilities associated with very low values of X/P_{t} (and, in some instances, those associated with very high values of X/P_{t}) exceeding the near-the-money volatilities (X/P_{t} ≈ 1) by a large amount. This pattern reflects, in turn, both the existence of quotes for OTM put options (X/P_{t} low) and OTM calls (X/P_{t} high), plus the assignment of high values to some of those options. In a high-volatility state the market thus places high value on options that pay off only if the asset price either rises or falls by a large amount, i.e., only if the present high-volatility state persists. A positive liquidity premium, associated with the relative lack of liquidity in far-from-the-money options, may also contribute to some of the high volatilities observed at the extreme ends of the moneyness spectrum. Only on one of the chosen days (17 May 2002) do all three implied volatility curves display the downward-sloping skew pattern that is often a feature of equity option data.
Given that ATM is equated to the ordinate of the volatility curve at X/P_{t} = 1, while MF is constructed from a formula that uses all ordinates, the reason why MF tends to exceed ATM by a large amount in the high-volatility period is clear. In addition, an examination of the sequence of implied volatility curves over the entire high-volatility period, of which the graphs in Figure 2 provide a snapshot, highlights a large degree of variation in the away-from-the-money volatilities in particular, a feature that contributes to the large variation in MF reported in Table III, which contributes, in turn, to the large forecast error variance. Again, this noise in the away-from-the-money volatilities is likely to be exacerbated by the lack of liquidity in options far from the money.
In contrast to the rather distinct smile shape that characterizes some of the curves in Figure 2, during the low-volatility period highlighted in Figure 3 the curves tend to be skewed, the majority having the negative slope that typifies equity option graphs, and all exhibiting much less variation across the moneyness spectrum than the curves in Figure 2. The flat curves beyond certain narrow ranges around X/P_{t} = 1 indicate that no quotes on away-from-the-money options are made at the end of the relevant day, with the implied volatilities at these boundary points simply being extrapolated to the outer boundaries of 0.5 and 1.5 (see Jiang and Tian, 2005). In the low-volatility state, options that have positive payoffs only if P_{t} varies substantially from its current value, i.e., if volatility is high over the maturity of the option, are not traded. In this case, there is much less difference between the MF and ATM values, plus much less variation in the MF values, than during the high-volatility state.
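The mechanics behind this comparison can be sketched with the model-free formula, in which risk-neutral expected integrated variance equals twice the strike integral of intrinsic-value-adjusted call prices. The sketch below is a hypothetical illustration (zero interest rate, Black-Scholes prices generated from assumed smiles, trapezoidal quadrature over a moneyness grid truncated at 0.5 and 1.5, in the spirit of the Jiang-Tian implementation described above); it shows why curvature in the smile pushes MF above the level implied by the at-the-money ordinate alone:

```python
import numpy as np
from math import erf, sqrt, log

def Phi(x):
    """Standard normal cdf via erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, vol, T):
    """Black-Scholes call price, zero interest rate."""
    d1 = (log(S / K) + 0.5 * vol ** 2 * T) / (vol * sqrt(T))
    return S * Phi(d1) - K * Phi(d1 - vol * sqrt(T))

def mf_implied_variance(S, T, smile, lo=0.5, hi=1.5, n_grid=801):
    """2 * integral of [C(K) - max(S-K,0)] / K^2 over strikes K in [lo*S, hi*S]."""
    K = np.linspace(lo * S, hi * S, n_grid)
    calls = np.array([bs_call(S, k, smile(k / S), T) for k in K])
    g = (calls - np.maximum(S - K, 0.0)) / K ** 2
    return 2.0 * np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(K))   # trapezoid rule

S, T = 100.0, 22 / 252                        # 22-trading-day maturity
flat = lambda m: 0.20                         # flat smile: MF variance close to 0.2^2 * T
smiley = lambda m: 0.20 + 0.4 * (m - 1) ** 2  # curved smile: high away-from-the-money vols
```

Here `mf_implied_variance(S, T, flat)` approximately recovers 0.04·T, while the curved smile yields a strictly larger value even though the at-the-money ordinate (and hence the ATM-style forecast) is 0.20 in both cases.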
In summary, close examination of the volatility smile information from which MF and ATM are extracted provides some explanation both for the discrepancy between the two measures and for the added variability in the MF measure, in particular in times of high volatility.^{31} In the following section we draw upon the insights of Bollerslev and Zhou (2006) in order to provide an explanation for the positive bias in both measures and for the fact that the magnitude of that bias is larger in the high-volatility period.
4.2.3. Forecasting Bias: Implied Volatility Risk Premium
Bollerslev and Zhou (2006) demonstrate that, under the assumption of the square-root stochastic volatility model of Heston (1993), the coefficients in the regression

σ̂_{t}^{2} = ϕ_{0} + ϕ_{1}f_{t} + u_{t},  (16)

where σ̂_{t}^{2} denotes actual volatility (as a variance quantity) and f_{t} the option-implied forecast,
are functions of the parameters of the risk-neutralized version of the distribution with respect to which the implied variance in (16) is defined. We refer readers to Bollerslev and Zhou for details of the objective and risk-neutral distributions in question and the links between them. It is sufficient to note here that, for standard values of the objective parameters, the negative market price of volatility risk that is observed empirically (e.g., Guo, 1998; Eraker, 2004; Forbes et al., 2007) leads unambiguously to ϕ_{1} < 1. Translated into the option context, the negative price means that the risk-neutralized distribution for volatility reverts more slowly to a higher long-run mean, in comparison with the objective distribution. That is, option prices have a positive premium factored in, as a consequence of stochastic volatility. It is this positive premium that leads to the implied volatility measure exceeding, on average, the objective measure of volatility, with the bias in the forecasting regression in (16) being a manifestation of the deviation between the two forms of volatility. As Bollerslev and Zhou demonstrate via simulation experiments, this qualitative result is unaffected by the estimation of actual volatility using observed intraday returns. The empirical results reported in the previous section, in which both option-implied forecasts have positive bias with respect to one particular estimate of actual volatility, namely OSRV, support this finding.^{32}
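A small simulation makes the mechanism concrete: if a (hypothetical) multiplicative risk premium inflates the option-implied forecast relative to latent objective variance, regressing a noisy realized measure on the implied forecast mechanically yields a slope below one. All parameter values below are illustrative assumptions, not estimates from the paper's data:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1500

# Latent objective variance: persistent AR(1) in logs around 0.04
log_v = np.empty(n)
log_v[0] = np.log(0.04)
for t in range(1, n):
    log_v[t] = 0.05 * np.log(0.04) + 0.95 * log_v[t - 1] + 0.15 * rng.standard_normal()
v = np.exp(log_v)

premium = 1.3                                       # assumed risk premium inflating the forecast
implied = premium * v
# Realized measure: unbiased but noisy proxy of the latent variance
realized = v * np.exp(0.2 * rng.standard_normal(n) - 0.02)

# OLS of the realized measure on the implied forecast, in the spirit of regression (16)
X = np.column_stack([np.ones(n), implied])
(phi0, phi1), *_ = np.linalg.lstsq(X, realized, rcond=None)
```

In this setting phi1 is close to 1/premium < 1, and the implied forecast exceeds the realized measure on average, mirroring the ϕ_{1} < 1 prediction and the negative bias entries in Table III.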
The assumption of an underlying stochastic volatility process for returns is completely consistent with the implied volatility patterns observed in practice, including for the data analysed here. That is, implied volatility smiles/skews can be linked to the fat tails (and/or skewness) that characterize empirical returns, characteristics that, in turn, can be associated with a stochastic volatility process (see, for example, Heston, 1993; Bakshi et al., 1997; Bates, 2000). The particular shape of the implied volatility curve can be linked to features of the underlying stochastic volatility process, most notably the degree of volatility and the magnitude (and sign) of the instantaneous correlation between volatility and returns. The varying shapes observed over the sample period considered are suggestive of an underlying stochastic volatility model with time-varying parameters, although we attempt no formal investigation of that observation here.^{33} Certainly, the varying degree of bias, in particular between the high- and low-volatility periods, is indicative of a time-varying risk premium that is a positive function of the level of actual volatility. This empirical feature is consistent with the linear (in volatility) risk premium that is adopted in the Heston stochastic volatility model, along with a negative value for the relevant risk premium parameter (see Carr and Wu, 2004; Bollerslev et al., 2008).
It is the MF measure that is formally consistent with an underlying stochastic volatility model for returns and, hence, legitimately affected by any volatility risk premium via its method of calculation, whereby all available smile information is used. The ATM forecast, on the other hand, approximated by an implied volatility at a single point in the moneyness spectrum, does not formally factor in a risk premium and, as a consequence, exhibits less bias as a forecast of actual volatility, as attested to by the results in Table III.^{34}
In summary, then, any potential additional forecast accuracy associated with the added flexibility of the assumptions underlying the MF forecast appears to be offset by the bias and noise which beset its calculation in practice. As such, it is of interest to ascertain whether or not a truncated version of MF, which retains some of the smile information, but not all, manages to outperform ATM. We investigate this in the following section by reporting SPA test results for three modified versions of MF.
4.2.4. SPA Tests of Truncated MF Forecasts
In Table IV we present the SPA p-values associated with the 22-day-ahead forecasts using five benchmarks: MF and ATM, plus three truncated versions of MF, denoted by MF_{1.5}, MF_{2.0} and MF_{2.5}. The benchmark MF_{1.5}, for example, is the estimate of MF produced from implied volatilities within a correspondingly truncated moneyness range, with MF_{2.0} and MF_{2.5} defined analogously.^{35} We produce the test results for the full sample period (Panel A), as well as results for the low-volatility period identified in Section 4.2.2 (Panel B), the idea here being that the reduced bias and variation in all MF estimates in this latter period may lead to these benchmarks being given more support by the SPA test. The results for benchmarks MF and ATM are reproduced under the expanded model set in which MF_{1.5}, MF_{2.0} and MF_{2.5} are included as alternatives. Hence, the results in the rows headed MF and ATM in Table IV differ in some cases from the corresponding results reported in Table II. In order to reduce the number of results reported, we focus on only three measures for each series: RKERN, ALTM and BV.
Table IV. SPA p-values: forecasts based on a 22-day-ahead forecast horizon. Alternative option-implied volatility forecasts are used as benchmark: MF, MF_{2.5}, MF_{2.0}, MF_{1.5} and ATM. The SPA test is based on a mean squared forecast error (MSFE) loss criterion, for variance quantities, with three alternative measures used for the actual volatility: RKERN, ALTM and BV. Results are produced for the full sample and low-volatility periods. p-values that are associated with rejection of the benchmark forecast at the 5% level are highlighted in bold font. In the case of rejection, the ‘most significant’ alternative forecast, according to the pairwise ‘t statistics’, is reported in parentheses below the p-value. The model set always includes all of the option-implied forecasts that are alternatives to the one being tested as the benchmark. Hence, the model set for each series underlying the results in Tables I and II is augmented by three here, to cater for the three additional versions of MF. The acronym LM_{cross} denotes a long-memory ARFIMA cross forecast. In the case where a benchmark is not rejected, the superscript # indicates that the forecast also has the smallest MSFE loss of all m + 1 forecasts in the choice set. Each column corresponds to the measure to be forecast.^{a}

| Benchmark | IBM RKERN | IBM ALTM | IBM BV | MSFT RKERN | MSFT ALTM | MSFT BV | GE RKERN | GE ALTM | GE BV |
|---|---|---|---|---|---|---|---|---|---|
| *Panel A: Full sample period (30 August 2001 to 31 May 2006)* ||||||||||
| MF | **0.000** | **0.000** | **0.000** | **0.000** | **0.017** | **0.000** | **0.000** | **0.000** | **0.000** |
| (most sig.) | (MF_{2.0}) | (MF_{2.0}) | (MF_{1.5}) | (ATM) | (MF_{1.5}) | (ATM) | (MF_{2.5}) | (MF_{2.5}) | (MF_{1.5}) |
| MF_{2.5} | **0.000** | **0.000** | **0.000** | **0.000** | **0.017** | **0.000** | **0.001** | **0.004** | **0.000** |
| (most sig.) | (MF_{1.5}) | (MF_{2.0}) | (MF_{1.5}) | (ATM) | (MF_{1.5}) | (ATM) | (MF_{2.0}) | (MF_{2.0}) | (MF_{2.0}) |
| MF_{2.0} | **0.000** | **0.000** | **0.000** | **0.000** | **0.003** | **0.000** | **0.004** | 0.088 | **0.000** |
| (most sig.) | (MF_{1.5}) | (MF_{1.5}) | (MF_{1.5}) | (MF_{1.5}) | (MF_{1.5}) | (MF_{1.5}) | (MF_{1.5}) | | (MF_{1.5}) |
| MF_{1.5} | **0.000** | **0.000** | **0.000** | **0.001** | 0.316 | **0.001** | 0.595 | 1.000^{#} | **0.023** |
| (most sig.) | (ATM) | (ATM) | (ATM) | (ATM) | | (ATM) | | | (ATM) |
| ATM | 0.208 | 0.844^{#} | **0.000** | 0.649^{#} | 0.753^{#} | 0.634^{#} | 0.945^{#} | 0.216 | 0.855^{#} |
| (most sig.) | | | (LM_{cross}) | | | | | | |
| *Panel B: Low-volatility period (2 August 2004 to 31 May 2006)* ||||||||||
| MF | **0.000** | **0.000** | **0.000** | **0.001** | 0.147 | **0.001** | **0.009** | **0.002** | **0.000** |
| (most sig.) | (ATM) | (ATM) | (ATM) | (MF_{1.5}) | | (LM_{cross}) | (MF_{1.5}) | (LM_{cross}) | (LM_{cross}) |
| MF_{2.5} | **0.000** | **0.000** | **0.000** | **0.001** | 0.153 | **0.001** | **0.013** | **0.003** | **0.000** |
| (most sig.) | (ATM) | (ATM) | (ATM) | (MF_{1.5}) | | (LM_{cross}) | (MF_{1.5}) | (LM_{cross}) | (LM_{cross}) |
| MF_{2.0} | **0.000** | **0.000** | **0.000** | **0.001** | 0.075 | **0.001** | **0.009** | **0.002** | **0.000** |
| (most sig.) | (MF_{1.5}) | (MF_{1.5}) | (MF_{1.5}) | (MF_{1.5}) | | (LM_{cross}) | (MF_{1.5}) | (LM_{cross}) | (LM_{cross}) |
| MF_{1.5} | **0.000** | **0.000** | **0.000** | **0.005** | 0.163 | **0.001** | **0.008** | **0.003** | **0.000** |
| (most sig.) | (ATM) | (ATM) | (ATM) | (ATM) | | (LM_{cross}) | (LM_{cross}) | (LM_{cross}) | (LM_{cross}) |
| ATM | **0.028** | 0.287 | **0.000** | **0.007** | 0.140 | **0.000** | **0.009** | **0.002** | **0.000** |
| (most sig.) | (LM_{cross}) | | (LM_{cross}) | (LM_{cross}) | | (LM_{cross}) | (LM_{cross}) | (LM_{cross}) | (LM_{cross}) |
For the full sample period, the truncation of the smile used to estimate the MF implied volatility does nothing to improve its forecast performance in the case of IBM. The MF_{1.5} benchmark is given limited support for GE and MSFT (for the ALTM volatility measure in particular). However, overall, the ATM forecast remains dominant, even when the model set is expanded to include the added variants of MF.^{36} For the low-volatility period, as would be anticipated from the results recorded in Table III, the performance of both forms of option-implied forecasts (ATM, plus all variants of MF) is more similar, overall, than is their performance for the full period. However, rather than the performance of the MF forecasts improving when assessed over the low-volatility period, both the ATM and MF-type forecasts are now rejected as benchmarks in virtually all cases. Only for a single measure (ALTM for the IBM and MSFT series) is there any support for an option-implied forecast in the low-volatility period.
As is consistent with earlier results, it is the BV measure which has the smallest p-values overall in Table IV, with the majority being zero to three decimal places. As was also the case for the earlier results, a long-memory direct forecast sometimes features as the most significant alternative according to a pairwise test. This is most notable in the low-volatility period. However, the superiority of any particular long-memory forecast, taking into account the multiple alternative forecasts, would need to be formally verified by conducting SPA tests of long-memory benchmarks.
4.2.5. SPA Tests for the S&P500 Index
The small amount of work that has assessed the forecasting performance of the MF implied volatility has done so without formal account being taken of multiple alternative forecasts; see Jiang and Tian (2005) and Bollerslev and Zhou (2006). That analysis has also focused on the volatility of the S&P500 index, with the MF implied volatility being proxied by the VIX in the case of Bollerslev and Zhou. The results reported in Jiang and Tian, in which the MF method is compared with the BS method, give some support to MF. This result is thus in conflict with our SPA test results, which cast doubt on the usefulness of the MF method in forecasting the volatility of individual stocks. It is of interest, therefore, to assess the robustness of our SPA-based conclusions to the shift from individual equities to the index, in particular given that the MF formula is designed for the European-style option data associated with the index. Given that the different forms of noise adjustment used in this paper have their prime motivation in the case of data on traded assets, rather than observations on a constructed index, we conduct SPA tests of the S&P500 implied volatility measures for the case where actual volatility is measured by RV(5) and BV only.^{37}
In Figure 4 we plot, respectively, RV(5) and MF, RV(5) and BS, MF and VIX, and MF_{2.5} and VIX, for the 22-day-ahead forecast horizon. As is evident from panels (a) and (b), both implied volatility forecasts are very biased, even more so than was the case with the individual stocks. This is consistent with a substantial risk premium being factored into the index options. Panel (c) demonstrates the accuracy with which the VIX reproduces the MF method, with the truncated MF_{2.5} being virtually indistinguishable from the CBOE measure in panel (d). SPA-based tests of all five benchmarks used in the previous section were conducted, in addition to the test for the VIX benchmark. The tests were conducted over both the full and the low-volatility periods. The results (not reported here) provide a resounding rejection of all implied volatility benchmarks, with all p-values (to several decimal places) equal to zero.