## 1. Introduction

Diffusion processes are extensively used for modelling continuous time phenomena in many scientific areas; an incomplete list with some indicative references includes economics (Black and Scholes, 1973; Chan *et al.*, 1992; Cox *et al.*, 1985; Merton, 1971), biology (McAdams and Arkin, 1997), genetics (Kimura and Ohta, 1971; Shiga, 1985), chemistry (Gillespie, 1976, 1977), physics (Obuhov, 1959) and engineering (Pardoux and Pignol, 1984). Their appeal lies in the fact that the model is built by specifying the instantaneous mean and variance of the process through a stochastic differential equation (SDE). Specifically, a diffusion process *V* is defined as the solution of an SDE of the type

$$
\mathrm{d}V_t = b(V_t;\theta)\,\mathrm{d}t + \sigma(V_t;\theta)\,\mathrm{d}B_t, \tag{1}
$$

driven by the scalar Brownian motion *B*. The functionals *b*(·;*θ*) and *σ*(·;*θ*) are called the *drift* and the *diffusion coefficient* respectively and are allowed to depend on some parameters *θ* ∈ *Θ*. They are presumed to satisfy the regularity conditions (locally Lipschitz, with a linear growth bound) that guarantee a weakly unique, global solution of equation (1); see chapter 4 of Kloeden and Platen (1995). In this paper we shall consider only one-dimensional diffusions, although multivariate extensions are possible.

For sufficiently small time increment d*t* and under certain regularity conditions (see Kloeden and Platen (1995)), *V*_{t+dt}−*V*_{t} is *approximately* Gaussian with mean and variance given by the so-called *Euler* (or Euler–Maruyama) approximation

$$
V_{t+\mathrm{d}t} \approx V_t + b(V_t;\theta)\,\mathrm{d}t + \sigma(V_t;\theta)\,Y, \qquad Y \sim N(0, \mathrm{d}t),
$$

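though higher order approximations are also available. The *exact* dynamics of the diffusion process are governed by its transition density

$$
p_t(v, w; \theta) = \mathrm{P}[V_t \in \mathrm{d}w \mid V_0 = v; \theta] / \mathrm{d}w, \qquad t > 0,\ v, w \in \mathbb{R}. \tag{2}
$$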

We shall assume that the process is observed without error at a given collection of time instances,

$$
\mathbf{v} = \{V_{t_0}, V_{t_1}, \ldots, V_{t_n}\}, \qquad t_0 < t_1 < \cdots < t_n;
$$

this justifies the notion of a *discretely observed diffusion process*. The time increments between consecutive observations will be denoted Δ*t*_{i}=*t*_{i}−*t*_{i−1} for 1 ≤ *i* ≤ *n*.
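As an illustration of the Euler approximation and of what a discretely observed path looks like, the following is a minimal Python sketch that simulates the process on a fine grid and keeps only the values at the observation times. The function name `euler_observations`, the `substeps` argument and the example drift and diffusion coefficient are assumptions made for this sketch; they are not taken from the paper, and the output is an approximate draw that carries discretization error.

```python
import math
import random


def euler_observations(b, sigma, v0, obs_times, substeps=100, rng=None):
    """Approximate values of V at obs_times via Euler-Maruyama on a finer grid.

    b, sigma  : callables v -> drift and diffusion coefficient (theta held fixed)
    v0        : value of the process at obs_times[0]
    obs_times : increasing observation instants t_0 < t_1 < ... < t_n
    substeps  : Euler steps between consecutive observations (hypothetical knob)
    """
    rng = rng or random.Random()
    v, path = v0, [v0]
    for t_prev, t_next in zip(obs_times, obs_times[1:]):
        dt = (t_next - t_prev) / substeps
        for _ in range(substeps):
            # Gaussian increment with mean b(v) dt and variance sigma(v)^2 dt
            v += b(v) * dt + sigma(v) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(v)
    return path


# Example coefficients chosen only for illustration: sine drift, unit diffusion.
times = [0.5 * i for i in range(11)]
obs = euler_observations(lambda v: math.sin(v), lambda v: 1.0, 0.0, times,
                         rng=random.Random(1))
```

Increasing `substeps` reduces the discretization error at a proportional computational cost, which is the trade-off that exact simulation methods aim to avoid.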

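The log-likelihood of the data set **v** is

$$
l(\theta \mid \mathbf{v}) = \sum_{i=1}^{n} l_i(\theta), \qquad l_i(\theta) := \log p_{\Delta t_i}(V_{t_{i-1}}, V_{t_i}; \theta).
$$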

Unfortunately, in all except a few special cases the transition density of the diffusion process and thus its likelihood are not analytically available. Consequently, deriving maximum likelihood estimates (MLEs) for discretely observed diffusion processes is well documented to be a very challenging problem. Nonetheless, the theoretical properties of such MLEs are now well understood, in particular under ergodicity assumptions; see for example Kessler (1997) and Gobet (2002).
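To make the structure of the log-likelihood concrete, the sketch below evaluates it for one of the special cases in which the transition density is known in closed form. The Ornstein–Uhlenbeck process d*V* = −*ρ*(*V* − *μ*)d*t* + *σ*d*B*, the parameter names and the toy data are assumptions chosen for this illustration; they do not appear in the text above.

```python
import math


def ou_log_transition(v, w, dt, rho, mu, sigma):
    """log p_dt(v, w; theta) for dV = -rho (V - mu) dt + sigma dB, with rho > 0."""
    mean = mu + (v - mu) * math.exp(-rho * dt)
    var = sigma ** 2 * (1.0 - math.exp(-2.0 * rho * dt)) / (2.0 * rho)
    return -0.5 * math.log(2.0 * math.pi * var) - (w - mean) ** 2 / (2.0 * var)


def log_likelihood(obs, times, log_transition, **theta):
    """Sum of log transition densities over consecutive observation pairs."""
    total = 0.0
    for i in range(1, len(obs)):
        dt = times[i] - times[i - 1]
        total += log_transition(obs[i - 1], obs[i], dt, **theta)
    return total


# Toy data, invented for the example.
times = [0.0, 0.5, 1.0, 2.0]
obs = [1.0, 0.8, 0.9, 0.4]
ll = log_likelihood(obs, times, ou_log_transition, rho=1.0, mu=0.5, sigma=0.3)
```

For a general diffusion no such closed-form `log_transition` exists, so each term of the sum has to be approximated or estimated.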

Inference for discretely observed diffusions has been pursued in three main directions. One direction considers estimators that are alternative to the MLE. Established methods within this paradigm include techniques that are based on estimating functions (Bibby *et al.*, 2002), indirect inference (Gourieroux *et al.*, 1993) and efficient methods of moments (Gallant and Long, 1997). Another direction involves numerical approximations to the unknown likelihood function. Aït-Sahalia (2002) advocated the use of closed form analytic approximations to the unknown transition density; see Aït-Sahalia (2004) for multidimensional extensions. An alternative strategy has been to estimate an approximation to the likelihood by using Monte Carlo (MC) methods. The approximation is given by Euler-type discretization schemes, and the estimate is obtained by using importance sampling. The strategy was put forward by Pedersen (1995) and Santa-Clara (1995) and was considerably refined by Durham and Gallant (2002). The third direction employs Bayesian imputation methods. The idea is to augment the observed data with values at additional time points so that a satisfactory complete-data likelihood approximation can be written down and to use the Gibbs sampler or alternative Markov chain Monte Carlo (MCMC) schemes; see Roberts and Stramer (2001), Elerian *et al.* (2001) and Eraker (2001). An excellent review of several methods of inference for discretely observed diffusions is given in Sørensen (2004).
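The Monte Carlo strategy based on Euler-type discretization can be sketched as follows. This is a plain forward-simulation version written for illustration, not the refined importance samplers of Durham and Gallant (2002); the function names, the default `substeps` and `n_paths` values and the Brownian motion test case are assumptions of this sketch.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def simulated_transition_density(b, sigma, v, w, dt, substeps=8,
                                 n_paths=20000, rng=None):
    """Monte Carlo estimate of the Euler approximation to p_dt(v, w).

    Each path takes substeps - 1 Euler steps from v; the Gaussian Euler
    density of the final step, evaluated at w, is averaged over paths.
    """
    rng = rng or random.Random()
    h = dt / substeps
    total = 0.0
    for _ in range(n_paths):
        x = v
        for _ in range(substeps - 1):
            x += b(x) * h + sigma(x) * math.sqrt(h) * rng.gauss(0.0, 1.0)
        total += normal_pdf(w, x + b(x) * h, sigma(x) ** 2 * h)
    return total / n_paths


# Check on standard Brownian motion, where the Euler scheme has no bias.
est = simulated_transition_density(lambda x: 0.0, lambda x: 1.0,
                                   0.0, 0.5, 1.0, rng=random.Random(0))
exact = normal_pdf(0.5, 0.0, 1.0)
```

For a general diffusion the estimate targets the Euler approximation rather than the true transition density, so it carries a discretization bias in addition to Monte Carlo error.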

The approach that is introduced in this paper follows a different direction, which exploits recent advances in simulation methodology for diffusions. Exact simulation of diffusion sample paths has become feasible since the introduction of the exact algorithm (EA) in Beskos *et al.* (2004a). The algorithm is reviewed in Section 2 and relies on a technique called *retrospective sampling* which was developed originally in Papaspiliopoulos and Roberts (2004). To date there are two versions of the algorithm: EA1, which can be applied to a rather limited class of diffusion processes, which we call $\mathcal{D}_{1}$, and EA2, which is applicable to the much more general $\mathcal{D}_{2}$-class; all definitions are given in Section 2. The greater applicability of EA2 over EA1 comes at the cost of higher mathematical sophistication in its derivation, since certain results and techniques from stochastic analysis are required. However, its computer implementation is similar to that of EA1.

In this paper we show how to use the EA to produce a variety of methods that can be used for maximum likelihood and Bayesian inference. We first discuss three unbiased MC estimators of the transition density (2) for a fixed value of *θ*: the bridge method (Section 4; first proposed in Beskos *et al.* (2004a)), the acceptance method (AM) (Section 5) and the Poisson estimator (Section 6; first proposed in Wagner (1988a)). The last two estimators are developed further in Sections 5.1 and 6 to yield unbiased estimators of the transition density *simultaneously* for all *θ* ∈ *Θ*. Thus, the simultaneous estimators can readily be used in conjunction with numerical optimization routines to estimate the MLE and other features of the likelihood surface.

We proceed by introducing a Monte Carlo expectation–maximization (MCEM) algorithm in Section 8. The construction of the algorithm crucially depends on whether there are unknown parameters in the diffusion coefficient *σ*. The simpler case where only drift parameters are to be estimated is treated in Section 8.1, whereas the general case necessitates the path transformations of Roberts and Stramer (2001) and it is handled in Section 8.2.

Section 9 presents an MCMC algorithm which samples from the joint posterior distribution of the parameters and of appropriately chosen latent variables. Unlike currently favoured methods, our algorithm is not based on imputation of diffusion paths but instead on what we call a *hierarchical simulation model*. In that way, our MCMC method circumvents computing the likelihood function.

Therefore, all our methodology is simulation based, but it has advantages over existing methods of this type for two reasons.

- (a) The methods are *exact* in the sense that no discretization error exists, and the MC estimation provides the only source of error in our calculations. Specifically, as the number of MC samples increases, the estimated MLE converges to the true MLE and, as the number of iterations in our MCMC algorithm increases, the samples converge to the true posterior distribution of the parameters.
- (b) Our methods are computationally efficient. Whereas approximate methods require rather fine discretizations (and consequently a number of imputed values which greatly exceeds the observed data size) to guarantee sufficient accuracy, our methodology suffers from no such restrictions.

A limitation of the methods that are introduced here is that their applicability is generally attached to that of the EA. However, on-going advances on the EA itself (Beskos *et al.*, 2005a) will weaken further the required regularity conditions so that a much larger class of diffusions than $\mathcal{D}_{2}$ can be effectively simulated. It is expected that these enhanced simulation algorithms will be of immediate use to the methods that are presented in this paper.

Our methods are illustrated on three different diffusion models. The first is the periodic drift model, which belongs to $\mathcal{D}_{1}$ and, although it is quite interesting in its own right since its transition density is unavailable, it is used primarily for exposition. However, we also consider two more substantial and well-known applications: the logistic diffusion model for population growth and the Cox–Ingersoll–Ross (CIR) model for interest rates. The former belongs to the $\mathcal{D}_{2}$-class, whereas the latter is a diffusion process that is outside the $\mathcal{D}_{2}$-class, and it is used to illustrate how our exact methods can be extended for processes for which the EA2 algorithm is not applicable. Moreover, since we can calculate analytically the likelihood for this model, we have a bench-mark to test the success of our approach. We fit the CIR model to a well-studied data set, which contains euro–dollar rates (recorded every 10 days) between 1973 and 1995, to allow for comparisons with existing methods.

All the algorithms that are presented in this paper are coded in C and have been executed on a Pentium IV 2.6 GHz processor. We note that our methods are not computationally demanding by modern statistical computing standards, and in the examples that we have considered the computing times (which are reported explicitly in the following sections) were of the order of seconds, or at worst minutes.

The structure of the paper is as follows. Section 2 reviews the EA. Section 3 sets up the context of transition density estimation, Sections 4–6 present the three different estimators and Section 7 compares them theoretically and empirically. Section 8 introduces the MCEM algorithm and Section 9 the MCMC algorithm. We finish with some general conclusions and directions for further research in Section 10. Background material and proofs are collected in a brief appendix.