In this section, we illustrate an alternative approach which does not require assuming the existence of an option observed without measurement error. Assuming that each observed NP is affected by observation noise is less arbitrary, and seems more natural, but it complicates the evaluation of the sample loglikelihood: when there are more sources of uncertainty than observed quantities, the likelihood cannot be computed using the Jacobian formula, but requires the evaluation of a high-dimensional integral. In some special cases, e.g. affine term structure models, the problem can be simplified by casting it in a Gaussian state space model and exploiting the Kalman filter recursions, as in De Jong (2000), but in general its solution requires Importance Sampling (IS) techniques. This is the avenue followed, e.g., by Brandt and He (2005) in their analysis of affine term structure models.

In this paper, we show how to evaluate the loglikelihood by combining an IS scheme and a Particle Filter algorithm, along the lines suggested by Durham and Gallant (2002, sect. 7); we present it in detail in Section 4.1. In particular, we highlight that including option prices in the observation sample significantly improves the performance of the SML estimator because of the huge amount of information they convey about the latent state variable, i.e. the volatility.

#### 4.1. *Likelihood evaluation*

Let $\mathcal{F}_t$ be the filtration generated by the variables observed up to time $t$, i.e. $\mathcal{F}_t=\sigma\{(P_s,\mathbf{H}_s)\colon s\le t\}$, and let $\theta$ denote the vector of model parameters. The likelihood function is given by

$$\mathcal{L}(\theta)=\prod_{t=1}^{T} f(\mathbf{H}_t,P_t\mid\mathcal{F}_{t-1})=\prod_{t=1}^{T}\iint f(\mathbf{H}_t\mid P_t,V_t)\,f(P_t,V_t\mid P_{t-1},V_{t-1})\,f(V_{t-1}\mid\mathcal{F}_{t-1})\,\mathrm{d}V_t\,\mathrm{d}V_{t-1}\qquad(4.1)$$

The second equality derives from the Markov property of the diffusion and the independence of the measurement errors. The initial condition $P_0$ is known, and $V_0$ will be integrated out (see below). Consider the time $t$ contribution to the likelihood:

$$\ell_t(\theta)=\iint f(\mathbf{H}_t\mid P_t,V_t)\,f(P_t,V_t\mid P_{t-1},V_{t-1})\,f(V_{t-1}\mid\mathcal{F}_{t-1})\,\mathrm{d}V_t\,\mathrm{d}V_{t-1}\qquad(4.2)$$

This two-dimensional integral can be interpreted as an expected value w.r.t. $V_t$, $V_{t-1}$ under the distribution implicitly defined by the integrand. Its value can be approximated using an IS scheme by specifying a sampling density for the integration variables. However, the transition pdf $f(P_t,V_t\mid P_{t-1},V_{t-1})$ is unknown, and must be approximated using the MBB strategy outlined in Appendix B. Luckily, the two IS schemes can be merged into a single one. To see how, partition $(t-1,t)$ into $M$ subintervals, and let $\tilde V_t=(V_{t-1+1/M},\dots,V_{t-1+(M-1)/M})$ collect the corresponding 'intermediate' volatility states. Following (B.1), the integral on the right-hand side (rhs) of (4.2) can be approximated with

$$\ell_t(\theta)\simeq\int f(\mathbf{H}_t\mid P_t,V_t)\,\hat f(P_t,V_t,\tilde V_t\mid P_{t-1},V_{t-1})\,f(V_{t-1}\mid\mathcal{F}_{t-1})\,\mathrm{d}\tilde V_t\,\mathrm{d}V_t\,\mathrm{d}V_{t-1}\qquad(4.3)$$

where $M$ and $\hat f$ are defined in Appendix B. According to (4.3), the likelihood evaluation requires the numerical approximation of $T$ integrals whose dimension equals $M+1$. This can be done using a single IS scheme. Let $q(V_t,\tilde V_t\mid V_{t-1})$ be a pdf on the support of $(V_t,\tilde V_t)$, and rewrite (4.3) as follows:

$$\ell_t(\theta)\simeq\int\frac{f(\mathbf{H}_t\mid P_t,V_t)\,\hat f(P_t,V_t,\tilde V_t\mid P_{t-1},V_{t-1})}{q(V_t,\tilde V_t\mid V_{t-1})}\,q(V_t,\tilde V_t\mid V_{t-1})\,f(V_{t-1}\mid\mathcal{F}_{t-1})\,\mathrm{d}\tilde V_t\,\mathrm{d}V_t\,\mathrm{d}V_{t-1}\qquad(4.4)$$

(4.4) highlights that $\ell_t(\theta)$ can be seen as the expected value of the ratio in the integrand under the joint distribution defined by the product of densities $q(V_t,\tilde V_t\mid V_{t-1})\,f(V_{t-1}\mid\mathcal{F}_{t-1})$. Let $(V_{t-1}^{(l)},V_t^{(l)},\tilde V_t^{(l)})$, $l=1,\dots,L$, be $L$ independent draws from this joint distribution. The IS estimate of (4.4) is given by

$$\hat\ell_t(\theta)=\frac{1}{L}\sum_{l=1}^{L}\frac{f(\mathbf{H}_t\mid P_t,V_t^{(l)})\,\hat f(P_t,V_t^{(l)},\tilde V_t^{(l)}\mid P_{t-1},V_{t-1}^{(l)})}{q(V_t^{(l)},\tilde V_t^{(l)}\mid V_{t-1}^{(l)})}\qquad(4.5)$$

To implement (4.5), we need to specify (i) which density to choose as $q(V_t,\tilde V_t\mid V_{t-1})$, and how to draw from it; and (ii) how to draw from $f(V_{t-1}\mid\mathcal{F}_{t-1})$. The next sections consider these points in turn.
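As an illustration, the generic IS calculation in (4.5) can be sketched as follows. The target and sampling densities below are toy Gaussian stand-ins, not the model's actual densities; they are chosen so that the true value of the integral is 1, which the IS average should recover:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(v):
    # toy stand-in for the log of the numerator in (4.5): a N(1, 1) log-pdf
    return -0.5 * (v - 1.0) ** 2 - 0.5 * np.log(2 * np.pi)

def log_sampler(v):
    # toy auxiliary density q: a standard Gaussian log-pdf
    return -0.5 * v ** 2 - 0.5 * np.log(2 * np.pi)

L = 100_000
draws = rng.standard_normal(L)          # L independent draws from q
log_w = log_target(draws) - log_sampler(draws)

# numerically stable average of the importance weights
m = log_w.max()
is_estimate = np.exp(m) * np.mean(np.exp(log_w - m))
```

Since the toy target is a normalized density, `is_estimate` should be close to 1; the same average of weighted ratios, with the model's densities substituted in, gives $\hat\ell_t(\theta)$.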

*4.1.1. The auxiliary density.*

To keep the MC variance of (4.5) low, the sampling density $q(V_t,\tilde V_t\mid V_{t-1})$ should be as close as possible to proportional to $f(\mathbf{H}_t\mid P_t,V_t)\,\hat f(P_t,V_t,\tilde V_t\mid P_{t-1},V_{t-1})$ over the whole support of $V_t$ and $\tilde V_t$. This product is informative about the uncertainty surrounding $V_t$ and $\tilde V_t$ in two ways: it reflects (i) the information about $V_t$ in the observed cross-section of NPs $\mathbf{H}_t$ through the measurement errors density $f(\mathbf{H}_t\mid P_t,V_t)$ and (ii) the information about both $V_t$ and $\tilde V_t$ contained in the approximate transition density. In our framework, the second source of information is clearly dominated by the information in the option prices, and this remark suggests that, instead of the usual recursive factorization, it is more convenient to factorize the auxiliary sampling density as

$$q(V_t,\tilde V_t\mid V_{t-1})=q(V_t\mid V_{t-1})\,q(\tilde V_t\mid V_{t-1},V_t)$$

Consider first $q(V_t\mid V_{t-1})$. Ideally, this density should equal $f^{a}(V_t\mid\mathbf{H}_t,P_t,P_{t-1},V_{t-1})$, which is unavailable, but can be approximated by noting that

$$f^{a}(V_t\mid\mathbf{H}_t,P_t,P_{t-1},V_{t-1})\propto f(\mathbf{H}_t\mid P_t,V_t)\,f^{a}(V_t\mid P_t,P_{t-1},V_{t-1})\qquad(4.6)$$

where the two densities on the rhs correspond to the two sources of information about $V_t$ discussed above.

In this paper, we use as $q(V_t\mid V_{t-1})$ the Laplace approximation to (4.6). The Laplace approximation is a powerful and accurate strategy widely used in mathematics and statistics to represent unknown densities; see Gelman et al. (1995) for a general presentation, and Durham (2006) and Huber et al. (2009) for two applications in financial econometrics. In a nutshell, it consists of a Gaussian pdf centred at the mode of the target density, with dispersion given by minus the inverse of the Hessian matrix of the log of the target, evaluated at the mode. In practice, we proceed as follows. Let us approximate $f^{a}(P_t,V_t\mid P_{t-1},V_{t-1})$ with the Gaussian distribution derived from the Euler discretization over the whole interval $(t-1,t)$, i.e. ignoring the subintervals defined above. We first compute

$$\hat V_t=\arg\max_{V_t}\,\log\bigl[f(\mathbf{H}_t\mid P_t,V_t)\,f^{a}(V_t\mid P_t,P_{t-1},V_{t-1})\bigr]$$

using Newton's method, and

$$\hat\sigma_t^2=-\Bigl[\frac{\partial^2}{\partial V_t^2}\log\bigl[f(\mathbf{H}_t\mid P_t,V_t)\,f^{a}(V_t\mid P_t,P_{t-1},V_{t-1})\bigr]\Big|_{V_t=\hat V_t}\Bigr]^{-1}$$

The Laplace sampling density for $V_t$ is then Gaussian, with mean $\hat V_t$ and variance $\hat\sigma_t^2$. Notice that both $\hat V_t$ and $\hat\sigma_t^2$ depend on $V_{t-1}$. In practice, this implies that the Laplace approximation must be computed for each simulated value of the lagged volatility. While this might seem complicated, the whole procedure amounts to solving a large number of straightforward univariate maximization problems, given the availability of good initial points and of analytical expressions for the derivatives of the function to be maximized. Usually (see e.g. Durham, 2006, 2007) the Laplace approximation is computed w.r.t. the whole trajectory of the volatility state, because the likelihood is not sequentially factorized as in (4.1), but rather defined as a single integral w.r.t. the volatility trajectory, whose dimension is equal to $T$. In this paper we prefer to work with the factorized loglikelihood for several reasons. The whole-trajectory strategy is well suited to discrete-time models, but becomes much more complicated in a continuous-time setting, in which there are multiple 'intermediate' volatility values to integrate out. Moreover, the sequential strategy naturally provides a way to compute the generalized residuals that we will use later to conduct a specification analysis.
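The Laplace step can be sketched numerically as follows. The quadratic log-target below is a toy stand-in for the log of the product in (4.6) (the two variances, 0.1 and 0.05, and the lagged-volatility value are purely illustrative), so that Newton's method, the mode, and the curvature-based variance can all be checked against closed-form values:

```python
import numpy as np

V_LAG = 0.2  # hypothetical value of V_{t-1}

def grad(v):
    # first derivative of the toy log-target
    # -(v - 0.5)^2 / (2 * 0.1) - (v - V_LAG)^2 / (2 * 0.05)
    return -(v - 0.5) / 0.1 - (v - V_LAG) / 0.05

def hess(v):
    # second derivative (constant, since the toy log-target is quadratic)
    return -1.0 / 0.1 - 1.0 / 0.05

# Newton's method for the mode of the log-target
v = 0.0
for _ in range(50):
    step = grad(v) / hess(v)
    v -= step
    if abs(step) < 1e-12:
        break

mu_laplace = v                # Laplace mean: the mode of the target
var_laplace = -1.0 / hess(v)  # minus the inverse Hessian at the mode
```

For this toy target the mode is 0.3 and the Laplace variance is 1/30; with the model's actual measurement-error and Euler transition densities, the same two quantities give the mean and variance of the Gaussian sampling density for each simulated value of the lagged volatility.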

A couple of remarks about this result are in order. First, the usefulness of our sampling density depends on the validity of two simplifying approximations: the use of the Euler discretization, instead of the true transition density, to derive $\hat V_t$ and $\hat\sigma_t^2$, and the assumption of at most one jump between $t-1$ and $t$. The impact of these steps, however, can easily be checked ex post by examining the MC variance of (4.5), and verifying that this estimate has finite variance. We show in Section 5 that this variance is actually very low in all our applications.
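An ex post check of this kind can be sketched as follows: from the log importance weights of a run such as (4.5), compute the effective sample size and the squared coefficient of variation of the normalized weights. The log weights below are simulated placeholders, not output of the actual estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
# placeholder log importance weights; in practice, the log of the ratios in (4.5)
log_w = rng.normal(-0.1, 0.4, size=10_000)

w = np.exp(log_w - log_w.max())  # subtract the max before exponentiating, for stability
w /= w.sum()                     # normalize the weights

ess = 1.0 / np.sum(w ** 2)            # effective sample size
cv2 = len(w) * np.sum(w ** 2) - 1.0   # squared coefficient of variation of the weights
```

A very low `ess` relative to the number of draws (equivalently, a large `cv2`) signals that a few weights dominate the average, i.e. that the sampling density is a poor match for the integrand and the variance of the IS estimate may be large or infinite.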

Second, a similar approach can also be used to approximate the pdf of $V_0$, which is needed in order to integrate out $V_0$ in (4.4) for $t=1$. To this end, given the lack of a lagged volatility, we use a Gaussian density computed as the Laplace approximation above, but based on a target density that neglects the transition density and focuses exclusively on the measurement errors density $f(\mathbf{H}_0\mid P_0,V_0)$.

It remains to discuss our choice of $q(\tilde V_t\mid V_{t-1},V_t)$, the pdf of the 'intermediate' volatility states given $V_{t-1}$ and $V_t$. The ideal pdf would be $f^{a}(\tilde V_t\mid\mathbf{H}_t,P_t,P_{t-1},V_t,V_{t-1})$, which is unknown. However, given the density used to draw $V_t$, we argue that no information about $\tilde V_t$ is lost if we drop the conditioning on $\mathbf{H}_t$, $P_t$ and $P_{t-1}$. This allows us to factorize $q(\tilde V_t\mid V_{t-1},V_t)$ as

$$q(\tilde V_t\mid V_{t-1},V_t)=\prod_{m=1}^{M-1}q\bigl(V_{t-1+m/M}\mid V_{t-1+(m-1)/M},V_t\bigr)$$

We set each pdf in the product on the rhs as a Gaussian density with moments computed in the same way as in the MBB strategy discussed in Appendix B. Notice, however, that, unlike in the 'pure' MBB approach, the simulated trajectories do not start from the same volatility state $V_{t-1}$, and do not end up in the same volatility state $V_t$, as these values are themselves simulated, from $f(V_{t-1}\mid\mathcal{F}_{t-1})$ and $q(V_t\mid V_{t-1})$, respectively.
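The sequential construction of one such trajectory can be sketched with simple Brownian-bridge-type conditional moments, in which each intermediate state is Gaussian with mean drifting linearly toward the fixed endpoint. The constant diffusion coefficient `sigma` is illustrative only; the moments actually used in the paper come from the MBB strategy of Appendix B:

```python
import numpy as np

def draw_intermediate_states(v_start, v_end, M, sigma=0.3, rng=None):
    """Draw the M-1 intermediate volatility states between v_start (= V_{t-1})
    and v_end (= V_t), one at a time, each from a Gaussian whose mean moves
    linearly toward the endpoint (bridge-style moments; sigma is illustrative)."""
    rng = rng or np.random.default_rng()
    delta = 1.0 / M
    states = [v_start]
    v = v_start
    for m in range(1, M):
        remaining = (M - m + 1) * delta   # time left to reach v_end from v
        mean = v + (v_end - v) * delta / remaining
        var = sigma ** 2 * delta * (remaining - delta) / remaining
        v = rng.normal(mean, np.sqrt(var))
        states.append(v)
    states.append(v_end)
    return np.array(states)

path = draw_intermediate_states(0.2, 0.25, M=8, rng=np.random.default_rng(2))
```

By construction the trajectory is pinned at `v_start` and `v_end`; in the scheme of this section those endpoints differ across the $L$ draws, since they are themselves simulated.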

#### 4.2. *Diagnostic testing and filtered (generalized) residuals*

To assess the validity of the model's specification, we use simulation-based techniques to estimate sequences of filtered estimates of the latent volatility $V_t$ and of functions of $V_t$.

It should be noted that pricing some options relative to one observed option is also possible in the approach of Section 3, in which one option at each date was assumed to be free of measurement error. The two procedures, however, are fundamentally different: in the approach of Section 3, volatility is filtered by inverting an observed option price; in ours, it is filtered by estimating a conditional expectation given an option which is observed with error. If such an error actually exists, neglecting it is likely to induce biased estimates of volatility and option prices. Furthermore, in our approach there is no need to condition on just one option at each date; we might as well condition on all but one of the observed options, and compute the predicted price of the contract left out. This should further enhance the accuracy of the predictions, and is of course impossible under the assumptions of the approach of Section 3.

Case (c) considers the widest information set, comprising the log stock price and all the options observed at each date. We label the values predicted in this way as 'fully updated'. Notice that their computation can be carried out using the volatility trajectories employed in the likelihood evaluation discussed in Section 4.1. In all cases, 100,000 trajectories were used to approximate the above expressions using Monte Carlo integration techniques.

In our set-up, predicted values can be computed for the options NPs and the log stock index price. In the case of options, they allow us to compute residuals that, according to our hypothesis about measurement errors, should follow a Gaussian distribution, independently across dates. This can be checked using standard test procedures, such as the Box–Pierce test, applied either to the residuals or to their squares, and the Jarque–Bera test.
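For concreteness, the two test statistics can be computed from a residual series as follows. The implementation below uses plain NumPy and the residuals are simulated placeholders; p-values would be obtained from the $\chi^2$ reference distributions ($\chi^2_h$ for Box–Pierce with $h$ autocorrelations, $\chi^2_2$ for Jarque–Bera):

```python
import numpy as np

def box_pierce(x, h=10):
    """Box-Pierce portmanteau statistic on the first h sample autocorrelations."""
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    denom = np.sum(x ** 2)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = np.sum(x[k:] * x[:-k]) / denom  # lag-k autocorrelation
        q += rho_k ** 2
    return n * q  # ~ chi-squared with h d.o.f. under the i.i.d. null

def jarque_bera(x):
    """Jarque-Bera statistic for skewness and excess-kurtosis departures."""
    x = np.asarray(x)
    n = len(x)
    m = np.mean(x)
    s2 = np.mean((x - m) ** 2)
    skew = np.mean((x - m) ** 3) / s2 ** 1.5
    kurt = np.mean((x - m) ** 4) / s2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

rng = np.random.default_rng(3)
resid = rng.standard_normal(5_000)  # placeholder for the option-price residuals
q_stat = box_pierce(resid)
jb_stat = jarque_bera(resid)
```

Applying `box_pierce` to the squared residuals checks for neglected conditional heteroscedasticity in the measurement errors.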

In the case of the log stock prices, the assumed distribution is not Gaussian; rather, it is a mixture of conditionally heteroscedastic Gaussian densities. To perform diagnostic checking, we computed the associated generalized residuals as follows. Consider the first kind of filtering rule discussed above, and evaluate at the observed log stock prices the conditional cdf corresponding to the predictive pdf derived in the appendix. If the model specification is correct, these transformed values should be i.i.d. uniformly distributed on [0,1]. If we further transform these uniform generalized residuals using the inverse standard Gaussian cdf, we obtain generalized residuals that, under the null hypothesis of correct specification, should be i.i.d. standard Gaussian. This can be tested using, as in the case of options, the Box–Pierce or the Jarque–Bera tests.
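The two-step transformation can be sketched as follows, with placeholder uniform draws standing in for the evaluated conditional cdf values (the probability integral transforms):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
# placeholder PITs u_t = F(P_t | conditioning set);
# i.i.d. U[0,1] under correct model specification
u = rng.uniform(size=10_000)

# second step: Gaussian generalized residuals,
# i.i.d. N(0,1) under the null of correct specification
z = norm.ppf(u)
```

The series `z` can then be passed to the same Box–Pierce and Jarque–Bera tests used for the option residuals.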