1. Introduction
Diffusion processes are extensively used for modelling continuous time phenomena in many scientific areas; an incomplete list with some indicative references includes economics (Black and Scholes, 1973; Chan et al., 1992; Cox et al., 1985; Merton, 1971), biology (McAdams and Arkin, 1997), genetics (Kimura and Ohta, 1971; Shiga, 1985), chemistry (Gillespie, 1976, 1977), physics (Obuhov, 1959) and engineering (Pardoux and Pignol, 1984). Their appeal lies in the fact that the model is built by specifying the instantaneous mean and variance of the process through a stochastic differential equation (SDE). Specifically, a diffusion process V is defined as the solution of an SDE of the type
 (1)
driven by the scalar Brownian motion B. The functionals b(·;θ) and σ(·;θ) are called the drift and the diffusion coefficient respectively and are allowed to depend on some parameters θ ∈ Θ. They are presumed to satisfy the regularity conditions (locally Lipschitz, with a linear growth bound) that guarantee a weakly unique, global solution of equation (1); see chapter 4 of Kloeden and Platen (1995). In this paper we shall consider only one-dimensional diffusions, although multivariate extensions are possible.
For sufficiently small time increment dt and under certain regularity conditions (see Kloeden and Platen (1995)), V_{t+dt}−V_{t} is approximately Gaussian with mean and variance given by the so-called Euler (or Euler–Maruyama) approximation,
V_{t+dt} − V_{t} | V_{t} ≈ N{b(V_{t};θ) dt, σ²(V_{t};θ) dt},
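As a concrete illustration, the Euler–Maruyama scheme can be sketched as follows (a minimal sketch; the function name and signature are ours, not from the paper):

```python
import numpy as np

def euler_maruyama(b, sigma, v0, theta, T, n_steps, rng):
    """Simulate one path of dV = b(V; theta) dt + sigma(V; theta) dB
    on [0, T] with the Euler-Maruyama scheme."""
    dt = T / n_steps
    v = np.empty(n_steps + 1)
    v[0] = v0
    for i in range(n_steps):
        # Gaussian increment with mean b(V;theta) dt and variance sigma^2(V;theta) dt
        dB = rng.normal(0.0, np.sqrt(dt))
        v[i + 1] = v[i] + b(v[i], theta) * dt + sigma(v[i], theta) * dB
    return v
```

As the discussion below notes, the scheme is only an approximation: its transition is exactly Gaussian over each step, whereas the true transition density (2) generally is not.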
though higher-order approximations are also available. The exact dynamics of the diffusion process are governed by its transition density
 (2)
We shall assume that the process is observed without error at a given collection of time instances,
this justifies the notion of a discretely observed diffusion process. The time increments between consecutive observations will be denoted by Δt_{i}=t_{i}−t_{i−1} for 1 ≤ i ≤ n.
The log-likelihood of the data set v is
Unfortunately, in all but a few special cases the transition density of the diffusion process, and thus its likelihood, are not analytically available. It is therefore well documented that deriving maximum likelihood estimates (MLEs) for discretely observed diffusion processes is a very challenging problem. Nonetheless, the theoretical properties of such MLEs are now well understood, in particular under ergodicity assumptions; see for example Kessler (1997) and Gobet (2002).
Inference for discretely observed diffusions has been pursued in three main directions. One direction considers estimators that are alternative to the MLE. Established methods within this paradigm include techniques that are based on estimating functions (Bibby et al., 2002), indirect inference (Gourieroux et al., 1993) and efficient methods of moments (Gallant and Long, 1997). Another direction involves numerical approximations to the unknown likelihood function. Aït-Sahalia (2002) advocated the use of closed form analytic approximations to the unknown transition density; see Aït-Sahalia (2004) for multidimensional extensions. An alternative strategy has been to estimate an approximation to the likelihood by using Monte Carlo (MC) methods. The approximation is given by Euler-type discretization schemes, and the estimate is obtained by using importance sampling. The strategy was put forward by Pedersen (1995) and Santa-Clara (1995) and was considerably refined by Durham and Gallant (2002). The third direction employs Bayesian imputation methods. The idea is to augment the observed data with values at additional time points so that a satisfactory complete-data likelihood approximation can be written down and to use the Gibbs sampler or alternative Markov chain Monte Carlo (MCMC) schemes; see Roberts and Stramer (2001), Elerian et al. (2001) and Eraker (2001). An excellent review of several methods of inference for discretely observed diffusions is given in Sørensen (2004).
The approach that is introduced in this paper follows a different direction, which exploits recent advances in simulation methodology for diffusions. Exact simulation of diffusion sample paths has become feasible since the introduction of the exact algorithm (EA) in Beskos et al. (2004a). The algorithm is reviewed in Section 2 and relies on a technique called retrospective sampling which was developed originally in Papaspiliopoulos and Roberts (2004). To date there are two versions of the algorithm: EA1, which can be applied to a rather limited class of diffusion processes, which we call 𝒟_{1}, and EA2, which is applicable to the much more general class 𝒟_{2}; all definitions are given in Section 2. The greater applicability of EA2 over EA1 comes at the cost of higher mathematical sophistication in its derivation, since certain results and techniques from stochastic analysis are required. However, its computer implementation is similar to that of EA1.
In this paper we show how to use the EA to produce a variety of methods that can be used for maximum likelihood and Bayesian inference. We first discuss three unbiased MC estimators of the transition density (2) for a fixed value of θ: the bridge method (Section 4; first proposed in Beskos et al. (2004a)), the acceptance method (AM) (Section 5) and the Poisson estimator (Section 6; first proposed in Wagner (1988a)). The last two estimators are extended in Sections 5.1 and 6 to yield unbiased estimators of the transition density simultaneously for all θ ∈ Θ. Thus, the simultaneous estimators can readily be used in conjunction with numerical optimization routines to estimate the MLE and other features of the likelihood surface.
We proceed by introducing a Monte Carlo expectation–maximization (MCEM) algorithm in Section 8. The construction of the algorithm crucially depends on whether there are unknown parameters in the diffusion coefficient σ. The simpler case where only drift parameters are to be estimated is treated in Section 8.1, whereas the general case necessitates the path transformations of Roberts and Stramer (2001) and is handled in Section 8.2.
Section 9 presents an MCMC algorithm which samples from the joint posterior distribution of the parameters and of appropriately chosen latent variables. Unlike currently favoured methods, our algorithm is not based on imputation of diffusion paths but instead on what we call a hierarchical simulation model. In that way, our MCMC method circumvents computing the likelihood function.
Therefore, all our methodology is simulation based, but it has advantages over existing methods of this type for two reasons.
 (a)
The methods are exact in the sense that no discretization error exists, and the MC estimation provides the only source of error in our calculations. Specifically, as the number of MC samples increases, the estimated MLE converges to the true MLE and, as the number of iterations in our MCMC algorithm increases, the samples converge to the true posterior distribution of the parameters.
 (b)
Our methods are computationally efficient. Whereas approximate methods require rather fine discretizations (and consequently a number of imputed values which greatly exceeds the observed data size) to guarantee sufficient accuracy, our methodology suffers from no such restrictions.
A limitation of the methods that are introduced here is that their applicability is generally tied to that of the EA. However, ongoing advances to the EA itself (Beskos et al., 2005a) will further weaken the required regularity conditions, so that a much larger class of diffusions than 𝒟_{2} can be simulated effectively. It is expected that these enhanced simulation algorithms will be of immediate use to the methods that are presented in this paper.
Our methods are illustrated on three different diffusion models. The first is the periodic drift model, which belongs to 𝒟_{1} and, although it is quite interesting in its own right since its transition density is unavailable, it is used primarily for exposition. However, we also consider two more substantial and well-known applications: the logistic diffusion model for population growth and the Cox–Ingersoll–Ross (CIR) model for interest rates. The former belongs to the 𝒟_{2}-class, whereas the latter is a diffusion process that is outside the 𝒟_{2}-class, and it is used to illustrate how our exact methods can be extended to processes for which the EA2 algorithm is not applicable. Moreover, since we can calculate the likelihood for this model analytically, we have a benchmark to test the success of our approach. We fit the CIR model to a well-studied data set, which contains euro–dollar rates (recorded every 10 days) between 1973 and 1995, to allow for comparisons with existing methods.
All the algorithms that are presented in this paper are coded in C and have been executed on a Pentium IV 2.6 GHz processor. We note that our methods are not computationally demanding by modern statistical computing standards; in the examples that we have considered, the computing times (which are reported explicitly in the following sections) were of the order of seconds, or at worst minutes.
The structure of the paper is as follows. Section 2 reviews the EA. Section 3 sets up the context of transition density estimation, Sections 4–6 present the three different estimators and Section 7 compares them theoretically and empirically. Section 8 introduces the MCEM algorithm and Section 9 the MCMC algorithm. We finish with some general conclusions and directions for further research in Section 10. Background material and proofs are collected in a brief appendix.
2. Retrospective exact sampling of diffusions
Understanding the statistical methodology to be presented in later sections presupposes introducing the simulation techniques that are central to our approach. In particular, we need to understand the form of the output that is provided by the EA. The main references for the material of this section are Beskos et al. (2004a) and Beskos and Roberts (2005). For this paper it suffices to analyse the EA for simulating diffusion paths conditional on their ending point (also known as diffusion bridges). In particular we shall show how to simulate a diffusion path starting from V_{0}=v and ending at V_{t}=w, for any t > 0 and v,w ∈ R. Simulation of unconditioned paths follows along the same lines and is sketched in Section 2.3. When necessary we characterize the EA as conditional or unconditional to emphasize the type of diffusion simulation for which it is used.
The EA performs rejection sampling by proposing paths from processes that we can simulate and accepting them according to appropriate probability density ratios. The novelty lies in the fact that the paths proposed are unveiled only at finite (but random) time instances and the decision whether to accept the path or not can be easily taken.
It is essential that we first transform the diffusion process (1) into an SDE of unit diffusion coefficient by applying the 1–1 transformation V_{s} ↦ η(V_{s};θ) =: X_{s}, where
 (3)
is any antiderivative of σ^{−1}(·;θ). Assuming that σ(·;θ) is continuously differentiable, we apply Itô’s rule to find that the SDE of the transformed process reads
 (4)
where
η^{−1} denotes the inverse transformation and σ^{′} denotes the derivative with respect to the space variable. In what follows we shall make the following standard assumptions for any θ ∈ Θ.
We define A(·;θ) to be any antiderivative of α(·;θ). The transition density of X is defined as
 (5)
Before proceeding, we require some preliminary notation. Let C≡C([0,t],R) be the set of continuous mappings from [0,t] to R, 𝒞 the corresponding cylinder σ-algebra and ω=(ω_{s},s ∈ [0,t]) a typical element of C. Let ℚ_{θ}^{(t,x,y)} denote the distribution of the process X conditioned to start at X_{0}=x and to finish at X_{t}=y, for some fixed x and y, and 𝕎^{(t,x,y)} be the probability measure for the corresponding Brownian bridge (BB). The notation highlights the dependence of the measure on θ.
The objective is to construct a rejection sampling algorithm to draw from ℚ_{θ}^{(t,x,y)}. The following lemma, proved in Appendix B.1, is central to the methodology. By 𝒩_{t}(u) we denote the density of the normal distribution with mean 0 and variance t evaluated at u ∈ R.
Lemma 1. Under conditions (a)–(c) above, ℚ_{θ}^{(t,x,y)} is absolutely continuous with respect to 𝕎^{(t,x,y)} with density
It is now clear that
 (7)
Thus we have managed in expression (7) to bound the density ratio. The key to proceed to a feasible rejection sampler is to recognize expression (7) as a specific Poisson process probability.
Theorem 1. Let ω ∈ C, Φ be a homogeneous Poisson process of intensity r(ω,θ) on [0,t]×[0,1] and N be the number of points of Φ below the graph s↦φ(ω_{s};θ). Then
Theorem 1 suggests rejection sampling by means of an auxiliary Poisson process as follows.
 Step 1:
simulate a sample path ω∼𝕎^{(t,x,y)}.
 Step 2:
calculate r(ω,θ); generate a marked Poisson process Φ={Ψ,ϒ}, with points Ψ={ψ_{1},…,ψ_{κ}} that are uniformly distributed on [0,t] and marks ϒ={υ_{1},…,υ_{κ}} that are uniformly distributed on [0,1], where κ∼Po{r(ω,θ)t}.
 Step 3:
compute the acceptance indicator
 (8)
 Step 4:
if I=1, i.e. {N=0} has occurred, then accept ω; otherwise return to step 1.
Unfortunately, this ‘algorithm’ is impossible to implement since it requires the simulation of complete BBs on [0,t]. However, it might be possible to determine I on the basis of only partial information about the path proposed. For instance, when r is only a function of θ (see Section 2.1) we can actually reverse the order in which steps 1 and 2 are carried out. Specifically, we would first simulate Φ, and afterwards, retrospectively, we would realize ω at the time instances that are determined by Ψ, since this is sufficient for determining I. The technique of exchanging the order of simulation, so that infinite-dimensional random variables can be simulated in finite time, has been termed retrospective sampling in Papaspiliopoulos and Roberts (2004).
The general framework under which the EA operates assumes that it is possible to write
where k(ω) has the following properties.
 (a)
k(ω) is finite dimensional.
 (b)
The law of k(ω) can be simulated (under 𝕎^{(t,x,y)}).
 (c)
The finite dimensional distributions of 𝕎^{(t,x,y)} given k(ω) can be simulated.
Under these conditions, the following retrospective implementation of the rejection sampler can be carried out in finite time: the EA.
 Step 1:
simulate k(ω).
 Step 2:
generate a realization of the marked Poisson process Φ={Ψ,ϒ} of rate r{k(ω),θ}.
 Step 3:
simulate the skeleton {ω_{ψ1},…,ω_{ψκ}}, conditionally on k(ω).
 Step 4:
compute the acceptance indicator I.
 Step 5:
if I=1, then accept the skeleton proposed, and return k(ω) and S(ω):={(0,x),(ψ_{1},ω_{ψ_{1}}),…,(ψ_{κ},ω_{ψ_{κ}}),(t,y)}; otherwise return to step 1.
S(ω) is an exact draw from a finite dimensional distribution of ℚ_{θ}^{(t,x,y)}, which can be filled in at any required times afterwards simply by using BB interpolation (see Section 2.3). The technical difficulty of finding or simulating k(ω), and consequently simulating the process at some time points given k(ω), imposes some restrictions on the applicability of the EA. We now describe the two cases where the algorithm can be easily applied.
2.1. Exact algorithm 1 (EA1)
Implementation of the EA is straightforward when r does not depend on ω. This will be true within the following diffusion class.
Definition 1. We say that a diffusion process V with SDE (1) belongs to 𝒟_{1}, and write V ∈ 𝒟_{1}, if the drift of the transformed process X_{s}=η(V_{s};θ), s ∈ [0,t], satisfies conditions (a)–(c) below equation (4) and (α^{2}+α^{′})(·;θ) is bounded above.
Within 𝒟_{1}, step 1 of the EA is unnecessary, since no information about the path is a priori needed for determining the Poisson rate. Moreover, step 3 entails simulation from a finite dimensional distribution of the BB (see Appendix A).
2.1.1. Example 1 (periodic drift)
Though apparently simple, the SDE cannot be solved analytically. However, the EA1 algorithm can be applied since X ∈ 𝒟_{1}, with r(θ)=9/8. Our proposed methods will be tested on the SINE data set simulated from X with the unconditional version of EA1 (Beskos and Roberts, 2005) under the specifications n=1000, Δt_{i}=1, X_{0}=0 and θ=π; see also Fig. 1. When θ is to be estimated, we take Θ=[0,2π] for identifiability.
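A sketch of the conditional EA1 algorithm for this example follows. It assumes (consistently with r(θ)=9/8) drift α(u;θ)= sin (u−θ), and takes φ(u;θ)={ sin²(u−θ)+ cos (u−θ)+1}/2 ∈ [0,9/8] as the shifted functional; the function names are ours, and the skeleton is returned as (time, value) pairs:

```python
import numpy as np

def phi(u, theta):
    # shifted (alpha^2 + alpha')/2 for alpha(u; theta) = sin(u - theta);
    # assumed to lie in [0, 9/8], matching r(theta) = 9/8 in the text
    return 0.5 * (np.sin(u - theta) ** 2 + np.cos(u - theta)) + 0.5

def ea1_bridge_skeleton(x, y, t, theta, rng, r=9.0 / 8.0):
    """Propose a BB from (0, x) to (t, y); accept if no Poisson point of
    rate r on [0, t] x [0, 1] falls below the graph of phi / r (N = 0)."""
    while True:
        kappa = rng.poisson(r * t)
        psi = np.sort(rng.uniform(0.0, t, size=kappa))   # Poisson times
        ups = rng.uniform(0.0, 1.0, size=kappa)          # marks
        # Brownian-bridge values at the times psi, simulated sequentially
        omega = np.empty(kappa)
        prev_t, prev_w = 0.0, x
        for j in range(kappa):
            s = psi[j]
            mean = prev_w + (y - prev_w) * (s - prev_t) / (t - prev_t)
            var = (s - prev_t) * (t - s) / (t - prev_t)
            omega[j] = rng.normal(mean, np.sqrt(var))
            prev_t, prev_w = s, omega[j]
        if np.all(phi(omega, theta) <= r * ups):  # acceptance indicator I = 1
            return list(zip([0.0] + list(psi) + [t], [x] + list(omega) + [y]))
```

The accepted skeleton can afterwards be filled in at any further times by BB interpolation, as described in Section 2.3.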
2.2. Exact algorithm 2 (EA2)
The EA2 algorithm applies to the wider class of diffusion processes that is defined below.
Definition 2. We say that a diffusion process V with SDE (1) belongs to the class 𝒟_{2}, and write V ∈ 𝒟_{2}, if the drift of the transformed process X_{s}=η(V_{s};θ), s ∈ [0,t], satisfies conditions (a)–(c) below equation (4) and
 (9)
or lim sup_{u→−∞}{(α^{2}+α^{′})(u;θ)}<∞.
Owing to symmetry, we shall study only case (9). We define the following elements of the path space C=C([0,t],R):
thus m is the minimum value of a path ω, and τ is the time at which the minimum is attained. Within 𝒟_{2}, k(ω)=m(ω) and
As shown in Beskos et al. (2004a) and summarized in Appendix A, m satisfies the three requirements that are stated just above the EA. Simulation of (τ,m) can be done by using simple transformations of elementary random elements. It is known (Asmussen et al., 1995) that the BB, conditionally on (τ,m), can be derived in terms of two independent Bessel bridges, each operating on one side of (τ,m). The Bessel bridge is defined as a BB that is constrained to be positive, and its simulation can be carried out by means of independent BBs; see Appendix A for details.
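The minimum m alone can be drawn by inverting the well-known BB minimum law P(m ≤ z)= exp {−2(a−z)(b−z)/t}, z ≤ min(a,b); a minimal sketch (the function name is ours; sampling τ and the Bessel bridge reconstruction are omitted):

```python
import numpy as np

def bb_minimum(a, b, t, rng):
    """Sample the minimum m of a standard Brownian bridge from (0, a)
    to (t, b) by inverting P(m <= z) = exp(-2 (a - z)(b - z) / t)."""
    u = rng.uniform()
    # solving exp(-2 (a - m)(b - m) / t) = u for m gives the root below min(a, b)
    return 0.5 * (a + b - np.sqrt((a - b) ** 2 - 2.0 * t * np.log(u)))
```

Since −2t log u ≥ 0, the square root always dominates |a−b|, so the returned value never exceeds min(a,b), as a minimum must not.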
2.2.1. Example 2 (the logistic growth model)
A popular model for describing the dynamics of a population which grows at a geometric rate in an environment with limited feeding resources is given by the SDE
Here R is the growth rate per individual, Λ is the maximal population that can be supported by the resources of the environment and σ is a noise parameter; for more details see chapter 6 of Goel and Richter-Dyn (1974). Related models have been investigated in the context of financial economics; see for example Gourieroux and Jasiak (2003). The transition density of V is not known analytically. The modified process X_{s}=− log (V_{s})/σ solves the SDE
It can be verified that V ∈ 𝒟_{2} and that the EA2 algorithm is applicable with l(θ)=σ^{2}/8−R/2 and
Our proposed methods will be tested on the LOGGROWTH data set simulated from V by first simulating from X by using the unconditional EA2 algorithm and then transforming X_{s} ↦ V_{s} (Beskos et al., 2004a). We took n=1000, Δt_{i}=1, V_{0}=700 and (R,Λ,σ)=(0.1,1000,0.1); see Fig. 1.
2.3. Unconditional exact algorithm and path reconstruction
The EA can be applied in a similar fashion when the diffusion is conditioned only on its initial point X_{0}=x. The main difference lies in that the final value of the path proposed is distributed according to the density
(which is assumed to be integrable). Simulation from h can be done efficiently by using an adaptive rejection sampling approach. Conditionally on this final point, the rest of the path is a BB, and the EA proceeds as already described.
The output of the EA is a skeleton S(ω) and possibly the collection k(ω) of variables related to the path. However, we can afterwards fill in this finite representation of an accepted path ω according to the dynamics of the proposed BB and without further reference to the target process. For the EA1 algorithm, the path reconstruction requires the simulation of the BBs that connect the currently unveiled instances of an accepted path ω. For the EA2 algorithm, the extra conditioning of the proposed BB on its minimum m implies that the filling in will be driven by two independent Bessel bridges; see Appendix A.
6. The Poisson estimator
Corollary 1 relates the transition density of the diffusion to an expectation over the BB measure. Thus, any unbiased estimator of the expectation on the right-hand side of equation (11) corresponds to an unbiased estimator of the transition density. Generalizing, suppose that the expectation
 (18)
is to be estimated for arbitrary continuous function f and diffusion bridge measure ℙ^{(t,x,y)}. For any c ∈ R, λ > 0, and path ω, we write
where the expectation is taken with respect to κ∼Po(λt), conditionally on ω. If ψ∼Un[0,t], then {c−f(ω_{ψ})}/λ is an unbiased estimator of {c−t^{−1}∫_{0}^{t}f(ω_{s}) ds}/λ, conditionally on ω. Let Ψ={ψ_{1},…,ψ_{κ}} be a Poisson process of rate λ on [0,t], and ω∼ℙ^{(t,x,y)}. Then we obtain the following simple unbiased estimator of expectation (18), which we call the Poisson estimator:
 (19)
This estimator was introduced in the context of statistical physics by Wagner (1988a, 1989). It can be derived from first principles that its second moment is
 (20)
which is not guaranteed to be finite. The choice of c and λ with a view to improving the efficiency of the algorithm is discussed in Section 7.
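A single draw of the estimator can be sketched as follows. The sketch assumes the sign convention that expectation (18) is E[ exp {−∫_{0}^{t}f(ω_{s}) ds}] and that expression (19) is exp {(λ−c)t} ∏_{j}{c−f(ω_{ψ_{j}})}/λ; the function names are ours:

```python
import numpy as np

def poisson_estimator(f, bridge_sampler, t, lam, c, rng):
    """One draw of the Poisson estimator of E[exp(-int_0^t f(omega_s) ds)]:
    exp((lam - c) t) * prod_j (c - f(omega_{psi_j})) / lam."""
    kappa = rng.poisson(lam * t)                 # number of Poisson points
    psi = rng.uniform(0.0, t, size=kappa)        # Poisson times on [0, t]
    omega = bridge_sampler(psi, rng)             # bridge values at those times
    return np.exp((lam - c) * t) * np.prod((c - f(omega)) / lam)
```

Only finitely many bridge values are ever needed, which is what makes the estimator implementable; its variance, per expression (20), depends on the choice of c and λ.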
Taking ℙ^{(t,x,y)}≡𝕎^{(t,x,y)} and f=(α^{2}+α^{′})/2, the Poisson estimator can be used with equation (11) to estimate the transition density. This transition density estimator will also be referred to as the Poisson estimator. Simultaneous estimation over all θ ∈ Θ for any data points v and w merely requires decoupling of ω in expression (19) from x and y, since Ψ is clearly independent of θ. Since in the current context ℙ^{(t,x,y)}≡𝕎^{(t,x,y)}, it suffices to exploit the relocation invariance property of BBs and to rewrite expression (19) as
 (21)
This estimator will be referred to as the simultaneous Poisson estimator. For more general diffusion bridge measures, decoupling of ω from x and y can be cumbersome, since the intuitive relocation invariance property does not hold in general. Such a case is treated in the following section.
6.1. Inference for the Cox–Ingersoll–Ross model
We apply the Poisson estimator to draw inference for a diffusion process outside 𝒟_{2}: the CIR diffusion model, which solves
It is assumed that all parameters are positive and 2ρμ > σ^{2}, which guarantees that V does not hit zero (see page 391 of Cox et al. (1985)). We fit the CIR model to a well-studied data set (used among others by Aït-Sahalia (1996), Elerian et al. (2001) and Roberts and Stramer (2001)) which contains daily euro–dollar rates between 1973 and 1995, to allow for comparisons with existing methods. We take a subsample of 550 values, corresponding to time intervals of 10 days, since the diffusion model seems more appropriate on that scale. We call this subsample the EURODOLLAR data set and plot it in Fig. 3(a).
In this case η(u;θ)=2√u/σ; thus the transformation X_{s}=η(V_{s};θ) solves the SDE
Thus, we can readily use the Poisson estimator to estimate the transition density of the CIR model for a fixed θ ∈ Θ. However, simultaneous estimation for all θ ∈ Θ is non-trivial, since the decoupling of ω∼ℝ^{(t,x,y)} from x and y is not straightforward. The appropriate construction is in fact a variation of the approach that we devised for the SAM for diffusions in 𝒟_{2} and is contained in Beskos et al. (2005b). It stems from the characterization of the Bessel bridge as a BB that is conditioned to remain positive. As in the EA2 algorithm, we consider first the minimum m of the Bessel bridge and subsequently reconstruct the path given m. The distribution of m is the restriction to (0,∞) of the distribution of the minimum of the corresponding BB. Given m, the path can be reconstructed exactly as if it were a BB with known minimum m.
We applied the simultaneous Poisson estimator to the EURODOLLAR data set taking, after some pilot tuning, λ=c=1. Fig. 3(b) shows the true and the estimated profile log-likelihood for σ, the latter obtained by using the downhill simplex optimization algorithm. Even with 100 MC samples the estimated curve coincides with the true curve, although it can be shown that the variance of the estimator is infinite in this case.
7. Comparison of different transition density estimators
We have introduced three different methods which can be used to estimate the diffusion transition density. Whereas the AM (and SAM) were devised solely for this purpose, it is important to recognize that the other two methods have greater scope. The Poisson estimator can estimate expectations of general diffusion exponential functionals (18). The bridge method is based on equation (12), which is a form of Rao–Blackwellization that can be used to estimate other conditional probabilities for diffusions, e.g. hitting probabilities.
In the context of transition density estimation, the weakness of the AM and the bridge method relative to the Poisson estimator is that their applicability is determined by that of the EA. In contrast, the Poisson estimator can be used without assuming that the drift α is suitably bounded.
A significant advantage of the AM over the competitors is that it is guaranteed to have finite polynomial moments. For the bridge method little is known, since we have not yet derived analytic expressions for its variance. Checking whether the variance of the Poisson estimator is finite is in general tedious. A notable exception is when the diffusion is in 𝒟_{1}, where, since f=(α^{2}+α^{′})/2 is bounded, it is easy to see from expression (20) that the variance will be finite for any finite c and λ. Two examples outside 𝒟_{1} for which we know that this will not be the case are the CIR and the logistic growth models.
A feature of the bridge method which is likely to make it generally less efficient than the competitors is that the simulated paths ignore the final data point, since the method is founded on the unconditional EA.
When the aim is to explore features of the likelihood surface, it is imperative that the estimators yield estimates of the transition density simultaneously for all θ ∈ Θ. The bridge method is the only method for which we have been unable to achieve that.
Although derived from very different perspectives, the SAM and the simultaneous Poisson estimator are related. Taking λ=r_{max} and c=λ+l(θ), and contrasting equation (17) with expression (21), reveals that, in 𝒟_{1}, the SAM is a special case of the simultaneous Poisson estimator. Moreover, this choice has certain optimality properties. For any c such that c>r(θ)+l(θ), expression (20) is bounded above by exp ([−2c+λ+{c−l(θ)}^{2}/λ]t). This quantity is minimized by any pair (λ,c) such that c=λ+l(θ), where λ ≥ r(θ). Requiring that the Poisson estimator yields estimates simultaneously for all θ ∈ Θ, the computationally most efficient bound on the variance is achieved by the choice λ=r_{max} and c=λ+l(θ), under which the Poisson estimator and the SAM coincide. Note that, with this choice, λ is the range and c the maximum of the functional (α^{2}+α^{′})(u;θ)/2 over all u ∈ R, θ ∈ Θ. It is not obvious whether choosing c > r(θ)+l(θ) is optimal. The connection between the two methods is less transparent outside 𝒟_{1}. The rate of the Poisson process that is used in the SAM for diffusions in 𝒟_{2} depends on the minimum of the path proposed, and it is precisely because of this dependence that the estimator is almost surely bounded. In contrast, the Poisson process and the path proposed in the Poisson estimator are inherently independent. Guided by our findings for 𝒟_{1}, we propose to choose λ according to an estimate of the range of (α^{2}+α^{′})(u;θ)/2 over all u ∈ R, θ ∈ Θ, and to take c=λ+l(θ). This choice has proved successful empirically.
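The minimization claim can be verified directly by differentiating the exponent of the variance bound with respect to c:

```latex
% Exponent of the variance bound exp([-2c + \lambda + \{c - l(\theta)\}^2/\lambda] t)
% as a function of c, for fixed \lambda:
g(c) = -2c + \lambda + \frac{\{c - l(\theta)\}^2}{\lambda},
\qquad
g'(c) = -2 + \frac{2\{c - l(\theta)\}}{\lambda} = 0
\;\Longleftrightarrow\; c = \lambda + l(\theta).
% Substituting c = \lambda + l(\theta):
g\{\lambda + l(\theta)\} = -2\{\lambda + l(\theta)\} + \lambda + \lambda = -2\,l(\theta).
```

The minimized bound exp {−2l(θ)t} therefore does not depend on λ, which is why any pair with c=λ+l(θ) attains it.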
A brief study of the performance of the SAM and the simultaneous Poisson estimator in estimating the MLE of the LOGGROWTH data set is summarized in Table 1. The parameters λ and c were chosen as suggested above. Taking into account its smaller computational cost, the simultaneous Poisson estimator is more efficient in this example. A feature that is not depicted in Table 1 is the sensitivity of the simultaneous Poisson estimator to the choice of c and λ. The method can produce unreliable estimates for certain choices of the tuning parameters, especially with few MC samples. In contrast, the SAM is fully automatic.
Table 1. Summary of 500 independent estimations of the MLE of the LOGGROWTH data set†

Parameter  Estimator             Minimum    1st quartile  Median     3rd quartile  Maximum
R          SAM                   0.1051     0.1083        0.1092     0.1100        0.1130
           Simultaneous Poisson  0.1030     0.1072        0.1087     0.1099        0.1139
Λ          SAM                   1010.13    1013.91       1014.89    1016.07       1021.07
           Simultaneous Poisson  1008.88    1013.48       1014.93    1016.50       1022.80
σ          SAM                   0.10045    0.10053       0.10056    0.10058       0.10065
           Simultaneous Poisson  0.10040    0.10050       0.10054    0.10058       0.10069
9. Hierarchical simulation models and inference using Markov chain Monte Carlo methods
In this section we develop an MCMC algorithm for Bayesian inference for discretely observed diffusions. The stationary distribution of our MCMC algorithm is the exact posterior distribution of the parameters. However, it is not only the exactness which distinguishes our approach from competitive existing MCMC algorithms. Existing methods follow a path augmentation approach, as we did in the EM algorithm in Section 8. The missing continuous paths are approximated by a fine discrete time Markov chain, whose transition is assumed to follow an Euler-type approximation to the true diffusion transition. The joint distribution of the observed and missing data is given by an appropriate approximation of Girsanov's formula. Then, a Gibbs sampler (or more general componentwise updating MCMC algorithm) is used to sample from the approximate posterior distribution of the parameters and the missing paths. Often, for any two data points several points need to be imputed in between them to obtain a good approximation to the true posterior of θ. However, the performance of basic MCMC schemes can severely deteriorate as the amount of imputation increases; see Roberts and Stramer (2001) for details.
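For concreteness, the Euler-type complete-data log-likelihood that such augmentation schemes evaluate over the imputed path can be sketched as follows (a minimal sketch; the function name is ours):

```python
from math import log, pi

def euler_loglik(path, times, b, sigma, theta):
    """Euler-type approximation to the log-density of a discretely imputed
    diffusion path: each increment is treated as Gaussian with mean
    b(V; theta) dt and variance sigma^2(V; theta) dt."""
    ll = 0.0
    for i in range(1, len(path)):
        dt = times[i] - times[i - 1]
        mu = path[i - 1] + b(path[i - 1], theta) * dt
        var = sigma(path[i - 1], theta) ** 2 * dt
        ll += -0.5 * log(2.0 * pi * var) - (path[i] - mu) ** 2 / (2.0 * var)
    return ll
```

The approximation improves as more points are imputed between observations, which is precisely the source of the tension described above: finer imputation improves accuracy but can degrade the mixing of basic MCMC schemes.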
The approach that is introduced here is not based on augmentation of paths. Instead, we construct a graphical model which involves the variables that are used in the EA, and we show that the posterior distribution of the parameters is obtained as a marginal in this graphical model. Thus, we use an appropriate Metropolis–Hastings algorithm to sample from the joint posterior of all variables in the graph. One of the steps of the sampler involves running the conditional EA. The state space of our MCMC algorithm typically has much smaller dimension than the competing augmentation methods and as a result it can be computationally more efficient. However, a comparison between alternative MCMC methods is not carried out in this paper and will be reported elsewhere.
We describe in detail the MCMC algorithm when it is assumed that V ∈ 𝒟_{1}. The derivation of the algorithm when V ∈ 𝒟_{2} is technically much more difficult, and it can be found in Beskos et al. (2004b). Nevertheless, we present simulation results for both cases. An essential ingredient of the algorithm is the following lemma which derives the density of the output of the conditional EA1 algorithm.
Lemma 3. Consider any two fixed points x and y. Let Φ={Ψ,ϒ} be the marked Poisson process on [0,t]×[0,1] with rate r(θ) and number of points κ∼Po{r(θ)t}, which is used in EA1 for simulating from ℚ_{θ}^{(t,x,y)}. Let ω∼𝕎^{(t,0,0)}, and I be the acceptance indicator (16) which decides whether (ω_{s}+(1−s/t)x+(s/t)y, s ∈ [0,t]) is accepted as a path from ℚ_{θ}^{(t,x,y)}. Then the conditional density of ω and Φ given {I=1}, π(ω,Φ|θ,x,y,I=1), is
 (26)
with respect to the product measure Φ×𝕎^{(t,0,0)}, where Φ is the measure of a unit rate Poisson process on [0,t]×[0,1], and a(x,y,θ) is the acceptance probability of the EA1 algorithm.
To clarify the notation: in the remainder of the section, (ω^{*},Φ^{*}) represent the accepted random elements in EA1, with density π(ω^{*},Φ^{*}|θ,x,y) given in expression (26); Φ^{*}, and ω^{*} at any collection of times, can easily be sampled by using the conditional EA1 algorithm.
9.1. Example: periodic drift
We implemented the algorithm for the SINE data set, using a uniform prior on [0,2π]. It took 47 s to run the algorithm for 10000 iterations, and some summaries are shown in the rightmost column of Fig. 4. The posterior mean is estimated as 3.1127 and the posterior standard deviation as 0.04. We used a Metropolis step to update θ, which had acceptance probability 0.49. The algorithm mixes very rapidly; essentially, the autocorrelation in the θ-series arises because a Metropolis step is used rather than direct simulation from the conditional distribution of θ. The dependence between θ and the latent variables is very weak.
When the assumed model is in the EA2 class, the construction of the MCMC algorithm is more complicated. In particular, derivation of the joint density of (θ,ω^{*},Φ^{*}) is challenging owing to the more complex structure of the EA2 algorithm. This is done in Beskos et al. (2004b), where other important issues related to the implementation of the MCMC algorithm are also tackled. The algorithm requires recent non-centring reparameterization methodology for hierarchical models, as described in Roberts et al. (2004) and Papaspiliopoulos et al. (2003).
9.2. Example: logistic growth
Using the MCMC algorithm that is described in Beskos et al. (2004b), we obtained 50000 samples from the posterior distribution of the parameters (R,Λ,σ) for the LOGGROWTH data set. The computing time was 20 min, and a summary of the results is given in Fig. 4. The posterior means were estimated as (0.1075, 1017.4, 0.1007) and the posterior precision matrix as
from which the posterior standard deviations are read off as (0.015, 31.13, 0.002).
10. Conclusions
In this paper we have introduced a variety of methods for likelihood-based inference for discretely observed diffusions. The methods rely on recent advances in the exact simulation of diffusions, and their computational efficiency was illustrated in a collection of examples. However, an exhaustive simulation study testing the relative performance of our methods and existing approaches, under various model specifications and parameter settings, is not given here; such a detailed empirical investigation is currently under way. Some qualitative remarks are made below.
In general, the computing time that is required by our methods depends critically on the rate of the Poisson process. Ceteris paribus, this rate is a function of the time increment Δt between the observations. The performance (measured in either computing time or Monte Carlo error) of our methods will be very good for small Δt (high frequency data). However, the performance of the methods as presented here deteriorates for sparser data sets (Δt large). Specifically, the problem stems from the following two characteristics of the conditional EA:
The computing time that is required by the AM and SAM increases because of (a), and that for the MCEM and MCMC algorithms because of (a) and (b). In the AM and SAM, the variance also increases with Δt, since the Monte Carlo averages become heavily dominated by the terms (of very small probability) corresponding to Poisson configurations with very few points. Similarly, expression (20) suggests that the variance of the Poisson estimator increases exponentially with Δt. In MCMC sampling, the computing cost can be made linear in Δt by augmenting the data with additional imputed points between any two consecutive observations; this is implemented by applying the conditional EA on intervals whose length is a fraction of Δt. Of course, the additional augmentation will affect the MCMC mixing.
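The behaviour of the Poisson estimator can be illustrated directly. For a functional of the form exp{−∫₀ᵗ φ(ω_s)ds}, the estimator draws κ∼Po(λt) and uniform times ψ₁,…,ψ_κ, and returns e^{(λ−c)t}∏_j{c−φ(ω_{ψ_j})}/λ. The sketch below checks unbiasedness on a deterministic toy integrand φ(s)=s², for which the true value e^{−1/3} is known in closed form; the integrand and the tuning constants c and λ are illustrative choices, not values used in the paper.

```python
import math
import random

def poisson_estimator(phi, t=1.0, c=1.0, lam=1.0):
    """One unbiased draw of the Poisson estimator of exp(-int_0^t phi(s) ds):
    draw kappa ~ Po(lam * t) and psi_j ~ U(0, t), then return
    exp((lam - c) * t) * prod_j (c - phi(psi_j)) / lam."""
    # Knuth's method for the Poisson draw
    threshold, kappa, p = math.exp(-lam * t), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            break
        kappa += 1
    est = math.exp((lam - c) * t)
    for _ in range(kappa):
        psi = random.uniform(0.0, t)
        est *= (c - phi(psi)) / lam
    return est

# toy check: phi(s) = s^2, so exp(-int_0^1 s^2 ds) = exp(-1/3)
random.seed(0)
draws = [poisson_estimator(lambda s: s * s) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)
```

Here `mc_mean` is close to e^{−1/3}≈0.717. On a diffusion path, φ would instead be evaluated at retrospectively simulated bridge values, and a longer interval t forces a larger λt, which is the source of the exponentially growing variance discussed above.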
Some extensions of our methodology are possible. One important direction is the extension of our methods to diffusions outside the EA2 class. As we have shown, the Poisson estimator can readily be used for likelihood inference for more general diffusion processes. Moreover, current progress on the EA itself, which attempts to remove the boundedness conditions on the drift (Beskos et al., 2005a), is expected to broaden the applicability of the estimation methods. It is worth noting that the MCEM and the AM remain unaltered irrespective of which version of the EA is used; thus they can readily accommodate extensions of the EA. The other methods make explicit use of the structure of the EA and will have to be modified appropriately.
We have concentrated here on the case where the diffusion is observed without error at a finite set of times. However, data often occur in different forms. For instance, data might be subject to observation error; all of the methods for evaluating MLEs are difficult to extend to this case, whereas the MCMC approach extends in a straightforward manner. Another interesting data form arises when we observe a one-dimensional component of a higher-dimensional diffusion, as for instance in continuous time filtering models. We are currently working on a collection of such filtering problems, and we have found that our methods can be extended to this case, although there are significant additional implementation challenges.
Our methodology extends some way to time inhomogeneous and multivariate diffusions. In carrying out these extensions, two steps in the arguments of this paper need to be generalized. Firstly, it is necessary to generalize the transformation (3) which eliminates the diffusion coefficient. This is straightforward under mild smoothness conditions on σ in the time inhomogeneous extension. For multivariate extensions, however, the generalized version of transformation (3) involves the solution of an appropriate vector differential equation, which is often intractable or has no solution (see for example Aït-Sahalia (2004)). This imposes restrictions on the class of multivariate diffusions to which the methods presented here can currently be applied. Secondly, we need to eliminate the stochastic integral from Girsanov's formula (32) to derive lemma 1. Again this is fairly routine in the time inhomogeneous case, whereas an extra condition is needed in the multivariate case, requiring that the multivariate drift be the gradient of a suitable potential function. This is a well-known condition in stochastic analysis and, for ergodic diffusions with unit diffusion coefficient, corresponds to reversibility.
It is natural to ask whether the ideas of this paper extend to SDEs that are driven by Lévy processes. However, the Cameron–Martin–Girsanov formula, which provides a closed form likelihood function, is critical to all our methodology, and there is no analogous expression for the Radon–Nikodym derivative of an infinite activity Lévy-driven SDE with respect to a measure that is tractable and easy to simulate from. Nevertheless, it is straightforward to extend our methods to SDEs with jumps driven by a finite activity Lévy process (as considered, for example, in Roberts et al. (2004)).
We hope that the collection of techniques that are described in this paper will have further applications. We are currently working on MC estimation of derivative hedge ratios in finance. The methodology builds on Rao–Blackwellization techniques such as those devised for the bridge method.
Although we recognize that the mathematical details of this work are complex for those without a working knowledge of diffusion theory, we firmly believe that our methods have the potential to influence applied statistical work. This is because the algorithms that we use are relatively simple and easy to code, and the methods are not computationally demanding and are capable of handling long time series. In addition, motivated by the desire to make our work as accessible as possible, we have started to develop generic software for the implementation of some of our methods, beginning with the EA.
Dan Crisan (Imperial College London)
The paper extends classical results on the exact simulation of random variables to an infinite dimensional set-up. Here random processes, in particular solutions of one-dimensional stochastic differential equations, are sampled exactly, and various maximum likelihood and Bayesian inference methods are developed on the basis of the two versions, EA1 and EA2, of the exact sampling algorithm.
It is very easy to obtain a sample path from a Brownian motion or from its conditioned version, the Brownian bridge. This observation is the basis of the exact simulation of diffusions: if a diffusion X has an absolutely continuous distribution with respect to that of a Brownian motion or a Brownian bridge, then the retrospective sampling method that was developed by Papaspiliopoulos and Roberts can generate an exact simulation of X by using the perfect Brownian sampling.
Unfortunately the class of one-dimensional stochastic differential equations whose solutions are absolutely continuous with respect to a Brownian motion is very small. In effect we can only have equations of the form
(I shall not make explicit the coefficients' dependence on the unknown parameter θ as it is not relevant to the arguments presented.) X is simply a Brownian motion plus a drift term. This stems from the fact that the Brownian path t ↦ B_{t} has quadratic variation 〈B〉_{t}=t, and X must have the same property if it is to have an absolutely continuous distribution with respect to that of B. In what follows I shall call these processes ℬ-diffusions.
The methods presented in the paper are applicable to a larger class of processes: under additional assumptions, all diffusions that can be reduced to a ℬ-diffusion via a suitable change of space coordinates can be simulated exactly. These are the diffusions V for which there is a diffeomorphism η:ℝ→ℝ such that η(V) is a ℬ-diffusion. From a pathwise perspective, the change of coordinates amounts to stretching or squashing the paths so that they obtain the right quadratic variation. This can be done, more often than not, at the expense of ending up with unbounded or, even worse, exploding drifts α. As a result, the simulation procedures and the inference methodology become either more difficult or not possible at all.
It is far more natural to start with a diffusion X that has an explicit distribution and the same quadratic variation structure as V, and then to check whether it has an absolutely continuous distribution with respect to that of V. There is a generic method for finding such diffusions X, and it stems from a classical result of Doss (1977) and Sussmann (1978). The result states that, if b is Lipschitz continuous and σ is twice differentiable with bounded first and second derivatives, then the one-dimensional stochastic differential equation
has a unique strong solution which can be written in the form
where u:ℝ^{2}→ℝ is the solution of the ordinary differential equation
 (33)
and the function t ↦ Y_{t}(ω) solves an ordinary differential equation for every ω ∈ Ω. If, however, we choose a deterministic or a constant Y, the resulting process
will satisfy the equation
where
In particular, if Y is constant, then
where the last integral is a Stratonovich integral. Applying this strategy to the logistic growth model (the second example in the paper), the corresponding process X can be chosen to be the log-normal diffusion
with the explicit solution
The density of the (unconditional) distribution of V with respect to the distribution of X (on the path space) will then be proportional to
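This construction can be checked numerically in its simplest instance, σ(v)=σv with constant Y: the solution of equation (33) is then u(y,v)=v e^{σy}, so X_t=V_0 exp(σB_t), which in Itô form satisfies dX=½σ²X dt+σX dB. The sketch below (with illustrative values of σ, the starting point and the time grid) compares this explicit solution with an Euler–Maruyama path driven by the same Brownian increments.

```python
import math
import random

sigma, x0, T, n = 0.3, 1.0, 1.0, 10000
h = T / n

random.seed(42)
dB = [random.gauss(0.0, math.sqrt(h)) for _ in range(n)]

# Explicit Doss-Sussmann solution: u(y, v) = v * exp(sigma * y) with Y constant,
# so X_t = x0 * exp(sigma * B_t) solves dX = (sigma^2 / 2) X dt + sigma X dB.
B = 0.0
x_euler = x0
for db in dB:
    # Euler-Maruyama step for the Ito form of the same equation
    x_euler += 0.5 * sigma ** 2 * x_euler * h + sigma * x_euler * db
    B += db
x_exact = x0 * math.exp(sigma * B)
rel_err = abs(x_euler - x_exact) / x_exact
```

With this fine grid the two paths agree closely, illustrating that the change of measure can be based on an explicit process X rather than on a transformed ℬ-diffusion.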
There are three apparent advantages of this method over the change of spatial coordinates approach:
 (a)
it applies to a wide class of sets of coefficients (b,σ);
 (b)
it usually leads to simpler expressions for the Radon–Nikodym derivative, hence perhaps simplifying the ensuing estimation procedures;
 (c)
it extends naturally to multivariate diffusions and time inhomogeneous diffusions; we need only consider the multidimensional and, respectively, the time inhomogeneous version of expression (33).
For the Cox–Ingersoll–Ross model (the third example in the paper) though, the resulting process X is not absolutely continuous with respect to V. However, in the same vein, the Cox–Ingersoll–Ross diffusion is absolutely continuous with respect to the squared Bessel process
I have two further comments on the paper. It would be desirable to see a theoretical analysis of the computational effort of the various inference methods presented, as a function of the number of time instances at which data are acquired. Also, no comparison or analogy is made between the methods presented and sequential Monte Carlo methods; however, the authors say that a comparison study between their methods and other existing methods is in progress.
In conclusion, the authors have already done a large amount of work on the subject, though I would venture to say that there is still much to be done along this line of research, and I look forward to seeing the follow-ups. It gives me great pleasure to congratulate the authors on their paper and to second the vote of thanks.
The vote of thanks was passed by acclamation.
N. H. Bingham (University of Sheffield)
I congratulate the authors on this interesting paper. I confine myself to a few comments on theoretical and financial aspects.
The multidimensional extensions that are mentioned in the final section are intriguing from the point of view of mathematical finance. There, by Markowitzian diversification, one holds a large number of assets; the danger here is the curse of dimensionality. Methods are available to counter this which reduce the effective dimensionality from the number of assets to the number of industrial sectors; see for example Bingham et al. (2003).
An alternative approach to estimating diffusion coefficients by discrete methods arises where we have (or can obtain) an ensemble of paths, or diffusing particles say, rather than one. We can then extract information from the counts of the number of diffusing particles in some window of observation, as a function of time. The technique is called number fluctuation spectroscopy. It was used by Smoluchowski, following Einstein's work, to estimate Avogadro's number, and by Rothschild to study mobility of spermatozoa. For details and references, see for example Bingham and Dunham (1997) and Bingham and Pitts (1998).
As observed in Section 2, granted a sample path over a time interval of positive length, we can determine the diffusion coefficient with certainty (e.g. in the Brownian case from the quadratic variation). But this involves (uncountably) infinite sample sizes, and we can obtain and handle only finite samples in reality.
Nevertheless, the suggestion is that in principle we can discriminate between different diffusion coefficients with certainty; if both are the same, we can form likelihood ratios and, for example, test hypotheses comparing the drifts.
Since diffusions, or Itô processes, are locally Gaussian, the dichotomy between mutual absolute continuity and singularity of Gaussian processes is relevant here; see for example Ibragimov and Rozanov (1978), chapter III.
The use of random times such as when the minimum is (last) attained—which is far from being a stopping time—to split the path into independent fragments is a powerful technique. Such times are called splitting times; for general theory, see Rogers and Williams (1994), section III.49.
Multidimensional diffusions
The irreducibility condition that was used by Aït-Sahalia (2004) to reduce to the case of unit diffusion coefficient predates the standard work on multidimensional diffusions by Stroock and Varadhan (1979), which does not contain it. It is good to see practice driving theory here.
Sara Pasquali and Fabrizio Ruggeri (Istituto di Matematica Applicata e Tecnologie Informatiche, Milan)
We express our gratitude to the authors for an enlightening paper which provides a groundbreaking new method for likelihood-based inference for discretely observed diffusions. The paper provides many details about the method and discusses many of its aspects, leaving a thorough analysis of some of them to forthcoming papers. We look forward to reading those papers, since they will address some of our concerns. Here we just want to comment on possible extensions of the method.
The most general stochastic differential equation is given by
as in the following case about population dynamics:
where θ=(α,β,σ), w_{t} is Gaussian white noise, T is the temperature (possibly dependent on the time s) and R(s) is a known recruitment function.
In the paper, the time s does not appear directly in the drift and the diffusion coefficient, but only through V_{s}. We wonder whether the proposed method is also valid in this case.
As a second comment, we think of practical problems in which parallel, multivariate diffusion processes are to be analysed all together. We suppose that the diffusion processes might slightly differ in drift and/or diffusion and/or the prior on the parameters. Is there a way to estimate all of them together, sharing as many simulated points as possible, or should they be analysed separately?
Another related comment is about evolving systems in which new data are made available as time goes by. Is it possible to analyse the process on [0,s+t], starting from estimation in [0,s] and adding the data that are observed in (s,s+t], or should the estimation on [0,s+t] restart from scratch?
It would be interesting to explore the relationship between parameter estimation and stochastic stability, e.g. in the logistic growth model (example 2) where the equilibrium state 0 is stable for σ^{2}>2R and unstable otherwise. In the latter case, which is considered in the paper, it is known that the process V_{s} fluctuates around the value Λ. Once data have been simulated from a logistic model, we would like to know what happens about the stability properties of the equilibrium, especially when parameters are close to the threshold σ^{2}=2R.
Mathias Rousset (University of Toulouse) and Arnaud Doucet (University of British Columbia, Vancouver)
The authors are to be congratulated for this impressive paper which solves many problems and opens many avenues of investigation. We present here a direct application of their methodology to time discretization errorfree filtering of partially observed diffusions. Consider the following diffusion where X_{0}∼π_{0} and for t>0:
This diffusion is partially observed at times {t_{k}}_{k⩾1} (where t_{k}>t_{k−1}), and the conditional density of the kth observation Y_{t_k} given X_{t_k}, g_{k}(y_{t_k} | x_{t_k}), is known analytically. We are interested in estimating sequentially the distributions
where t_{0}=0, x_{t_0:t_k}=(x_{t_0},x_{t_1},…,x_{t_k}) and y_{t_1:t_k}=(y_{t_1},y_{t_2},…,y_{t_k}). To achieve this we propose to use a sequential Monte Carlo (SMC) algorithm (Doucet et al., 2001). The distributions are approximated by a large number N of weighted random samples termed particles. The particles are sampled by using
with
where {q_{n}(x_{t_n} | y_{t_n},x_{t_{n−1}})} are importance distributions known pointwise. In the standard SMC framework, these particles should be reweighted according to normalized weights proportional to
where
with
and
To be efficient, the SMC method requires the design of ‘good’ importance distributions and estimates of the importance weights with low variance. To design the importance distributions, one can approximate the transition density analytically by a Gaussian distribution using a local linearization technique (Durham and Gallant, 2002) and combine this approximate prior with g_{k}(y_{t_k} | x_{t_k}), or a linearized version of it, to obtain q_{k}(x_{t_k} | y_{t_k},x_{t_{k−1}}).
It is also crucial to reduce the variance of the estimates of the weights given by
To achieve this, if the Poisson estimator of Section 6 is used, we could sample, for each particle, P Poisson random variables but use the same Brownian bridge, sampled retrospectively, for computational savings. However, a large Poisson parameter λ and a large P may be needed to obtain a reasonable variance.
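The random-weight idea can be illustrated on a toy state space model: as long as each particle's weight is replaced by an unbiased, positive estimate, the filter remains valid. The sketch below uses a Gaussian random walk state, Gaussian observations and an artificial mean-one gamma multiplier standing in for the Poisson-estimator noise; the model, the noise scales and the particle number are all illustrative choices rather than anything from the discussion.

```python
import math
import random

def random_weight_filter(ys, n_particles=2000, sig_x=0.3, sig_y=0.2, seed=1):
    """Bootstrap particle filter with unbiased random weights.

    State:       x_k = x_{k-1} + N(0, sig_x^2)
    Observation: y_k = x_k + N(0, sig_y^2)
    Each true likelihood weight is multiplied by an independent mean-one gamma
    variable, mimicking an unbiased (e.g. Poisson-estimator) evaluation of the
    weight.  Returns the filtering means E(x_k | y_{1:k})."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # propagate from the prior (bootstrap proposal)
        xs = [x + rng.gauss(0.0, sig_x) for x in xs]
        # unbiased random weights: true likelihood times mean-one noise
        ws = [math.exp(-0.5 * ((y - x) / sig_y) ** 2)
              * rng.gammavariate(20.0, 1.0 / 20.0) for x in xs]
        total = sum(ws)
        means.append(sum(w * x for w, x in zip(ws, xs)) / total)
        # multinomial resampling
        xs = rng.choices(xs, weights=ws, k=n_particles)
    return means

means = random_weight_filter([1.0] * 10)
```

With ten observations all equal to 1.0, the filtering mean settles near 1; the extra variance injected by the random weights only inflates the Monte Carlo error, exactly as a high-variance Poisson estimator would.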
Osnat Stramer (University of Iowa, Iowa City)
This paper proposes a new, thorough and interesting approach to calculating the likelihood of discretely observed diffusions. The log-likelihood of the data set v is
where p_{Δt_i}(V_{t_{i−1}},V_{t_i};θ) is the transition density of the diffusion process V. I thank the authors for providing a mine of ideas for continuous time modelling.
In my comment, I shall also refer to two other existing approaches for estimating the transition density. One approach is the simulation methods that were proposed by Pedersen (1995) and improved substantially by Durham and Gallant (2002). This approach involves two types of approximations or errors:
Increasing the number of time intervals per Δt_{i} reduces the bias, but at the cost of increasing the number of simulations (see Stramer and Jun (2005a)). The second error will not disappear even if exact simulation is used. The methodology that is introduced in this paper avoids the first error, but its efficiency depends heavily on the efficiency of the exact simulation of conditional diffusions. As pointed out by the authors, the acceptance rate of the Poisson process typically decreases exponentially to 0 with the length of the interval Δt_{i} between adjacent observations.
Another approach to estimating the transition probability is the closed form method that was introduced in Aït-Sahalia (2002a, 2004). It has been shown in Aït-Sahalia (2002b) and Stramer and Jun (2005b) that the closed form methods can be much faster and more accurate than the Durham and Gallant (2002) method in many interesting examples, though in theory it is assumed that Δt_{i} is small. It would be interesting to see a numerical comparison of the methods presented in this paper with the simulation-based methods and the closed form methods.
The methodology presented in this paper requires that the diffusion process be transformed to one with unit volatility. The simulation approach is in general more amenable when this transformation can be carried out (see Durham and Gallant (2002)). As noted by the authors, a multivariate extension of their methods is therefore possible only for a limited class of models. More investigation in this direction will be necessary.
The authors replied later, in writing, as follows.
Firstly we thank all the discussants for their many contributions, insights and thoughtprovoking questions. The area of inference for partially observed diffusions is rapidly developing and hopefully our paper and the subsequent discussion will add further momentum to this exciting field.
The scope of the exact algorithm
A number of discussants (including Ball and his colleagues, Bingham, Chopin, Crisan, Kalogeropoulos, Moulines, Pasquali and Ruggeri, and Robert) asked about the scope of the exact algorithm (EA) methodology. Our paper introduces a general framework for inference which is illustrated mainly within the quite restrictive EA1 context and to some extent the more general EA2 situation.
We recall that the application of algorithm EA2 requires conditions (a)–(c) just below equation (4) to be satisfied, together with the additional condition (9). Whereas (a)–(c) are largely innocuous, condition (9) is a serious constraint on the applicability of the method; many natural diffusion models, including for instance the Ornstein–Uhlenbeck process, violate it. There are two ways in which the applicability of our methodology can be extended significantly beyond these constraints.
 (a)
As outlined in the paper, the Poisson estimator permits unbiased estimation for models outside the EA2 class, which in turn permits the construction of a Monte Carlo EM algorithm for such diffusions. The potential problem with this approach is that finite variances of the estimators are not guaranteed to exist.
 (b)
We have recently developed an extension of the EA, called the EA3 algorithm (see Beskos et al. (2005a)), which is applicable to all diffusions satisfying only conditions (a)–(c). Since this class is not constrained by restriction (9), it covers a very broad class of models.
Algorithm EA3 is mathematically more complex than either EA1 or EA2, though not necessarily less efficient. It is based on the idea that, once upper and lower bounds for the trajectory are known, the usual rejection sampling algorithm is easy to implement. For this, it introduces a different decomposition of Brownian motion, which we call layered Brownian motion. This decomposition permits upper and lower bounds on the diffusion sample path to be imposed once the sample path layer has been constructed. What makes algorithm EA3 more complicated than algorithm EA2 is the fact that the joint law of Brownian motion and its maximum modulus up to a fixed time t is not easily tractable, and exact simulation requires the construction of specific retrospective simulation techniques for probabilities that are expressible as infinite sums of alternating terms.
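The retrospective treatment of such alternating series rests on the classical bracketing property: if S = Σ_{k⩾0}(−1)^k a_k with a_k decreasing to 0, the even partial sums bound S from above and the odd ones from below, so an event of probability S can be simulated by refining the partial sums only until they separate a uniform draw. Below is a minimal sketch, checked on the toy series e^{−1} = Σ(−1)^k/k!, which is an illustrative choice rather than one of the EA3 probabilities.

```python
import math
import random

def alternating_bernoulli(term, rng=random):
    """Simulate an event of probability S = sum_{k>=0} (-1)^k * term(k),
    where term(k) decreases to 0, without ever computing S itself.
    Even partial sums are upper bounds for S and odd partial sums are lower
    bounds, so we refine only until the uniform draw u is separated from S."""
    u = rng.random()
    s, k, sign = 0.0, 0, 1.0
    while True:
        s += sign * term(k)
        if sign > 0 and u > s:      # u above an upper bound: event fails
            return False
        if sign < 0 and u < s:      # u below a lower bound: event occurs
            return True
        k += 1
        sign = -sign

# toy check: exp(-1) = sum_k (-1)^k / k!
random.seed(3)
freq = sum(alternating_bernoulli(lambda k: 1.0 / math.factorial(k))
           for _ in range(20000)) / 20000.0
```

Only finitely many terms are evaluated for each draw, which is what makes the approach feasible when, as for the law of the maximum modulus of Brownian motion, S is available only as an infinite alternating sum.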
A minor modification of algorithm EA3 can also be used to cover other cases. For instance, by a judicious choice of the layers that are used in the construction, diffusions with finite entrance boundaries (and which are therefore not reached with probability 1) such as the shape space Ornstein–Uhlenbeck model of Ball and his colleagues can be simulated exactly. However, we also appreciate that the approximate approach that is adopted in their contribution represents sound methodology, since it seems easy to ensure that the approximation that is involved is arbitrarily accurate.
Several discussants enquired about other extensions. Beskos et al. (2005a) also show that the extension of the EA to the time inhomogeneous case (as queried by Pasquali and Ruggeri) is straightforward, although the function φ must be modified accordingly. Moreover, the extension to the multivariate case (as mentioned by Kalogeropoulos and Bingham) is routine provided that the following two (non-trivial) conditions are met.
 (a)
The multivariate version of the transformation given in equation (3) needs to be possible. Although equation (3) is always possible for scalar diffusions, its multivariate counterpart (translating a diffusion with arbitrary diffusion coefficient to a diffusion with unit diffusion coefficient) may not exist. Conditions for this are given in Aït-Sahalia (2004), as mentioned in the contributions of Kalogeropoulos and Bingham.
 (b)
Assuming that transformation (3) is possible, the resulting multivariate drift function must be of gradient type, i.e. there is a function V(x) such that the drift α(x) satisfies
This is a very common and natural condition for diffusions which corresponds (in the stationary case) to the reversibility of the diffusion.
However, when these two conditions are not met, it seems difficult to see how the EA can be applied at all.
Kalogeropoulos mentions the interesting extension to non-Markov models. Stochastic differential equations are very naturally formulated in non-Markov settings, and existing data augmentation techniques can in principle handle this generalization. Furthermore, this generalization is important for certain application areas, such as finance, where Markovianity can be doubtful (see Dellaportas et al. (2004) for an example of a Bayesian analysis of a non-Markov model by data augmentation). Except in special cases, the EA cannot be applied to non-Markov models.
Pavlopoulos discusses two generalizations. Firstly, the case of regular accessible boundaries for diffusions, for instance with reflection at the boundary, is complicated. Where the diffusion drift is locally bounded and converges to 0 at the boundary, and the volatility is bounded away from 0, a pure reflecting boundary can easily be dealt with by unfolding the reflecting boundary, producing mirror-image dynamics on either side of it. This approach also works for two-sided boundaries. However, it is not clear how to extend it to the case where the drift does not converge to 0 at the boundary, since the unfolded diffusion then has a discontinuous drift.
Secondly, Pavlopoulos asks about varying Δt_{i}. In fact this causes no problem for our methodology, since all the Monte Carlo procedures are carried out independently between each pair of consecutive observations.
Klüppelberg and Mamon and Yu ask the intriguing question of whether our methodology can be extended to deal with processes driven by Lévy process noise. To date, we have no idea how this can be accomplished in the infinite activity Lévy case. However, we certainly agree that this is an important open problem arising from our work here.
Numerical investigations and comparisons
Several discussants (Clifford, Crisan, Lyons and Stramer) request further comparisons with existing methodology and we agree completely that this is needed.
There are two (not entirely mutually exclusive) stages to this work. The first is numerical investigation into the effectiveness of the EA for different diffusions, in comparison with competing discretization schemes. Important progress towards this goal has been achieved in Casella's comment, and in more detail in his doctoral thesis.
Furthermore, there are tight analytic bounds on the computational cost of the EA in Beskos and Roberts (2005) and Beskos et al. (2004). For instance, for the recast von Mises example in the interesting discussion of Kent, Beskos and Roberts (2005) demonstrate that simulating the diffusion over a time interval of length T requires computational effort of order λ^{2}T/σ^{2}.
The second stage in a comparison study will involve comparing the methods of this paper for likelihood-based inference with competing methodologies. One important issue here is that, although we would expect most (or perhaps all) methods to deteriorate for increasingly sparse data sets, it is of great interest to know which methods show the greatest robustness to this phenomenon. For the Markov chain Monte Carlo data augmentation methodology, for instance, this involves studying the effects of block updating strategies for data sets where imputing the entire missing data between observations is inefficient. Some interesting conclusions in this direction appear in Chib et al. (2004).
For this comparison work, we emphasize again the important distinction between methods that can smoothly estimate whole likelihood surfaces and those which are focused on pointwise estimation. The former are highly likely to be the more effective in most statistical contexts. More precisely, smoothness of the estimated likelihood surface is required for the proof of consistency (for large Monte Carlo samples) of maximum likelihood estimates, as demonstrated in Beskos et al. (2005b). Lyons suggests comparisons with numerical methods for solving the partial differential equations that are satisfied by the diffusion transition density. This approach was explored in Lo (1988), although it appears difficult to estimate entire likelihood surfaces simultaneously by using this method.
Kessler asks interesting questions about the properties of estimators that are produced by our Monte Carlo methodology. We have only partial answers to this question, showing for instance in Beskos et al. (2005b) that, as the Monte Carlo sample size increases, the Monte Carlo maximum likelihood estimate converges almost surely to the true maximum likelihood estimate. Furthermore, the Monte Carlo EM method can give an unbiased estimate of the score function, and thus we would expect that, even for fixed Monte Carlo sample size per observation, the Monte Carlo EM maximum likelihood estimate would be consistent in the limit as the sample size increases. These are both important consequences of unbiasedness in the estimators. We certainly concur that more quantitative rate results would be desirable.
However, although our methods are, as Robert points out, based on Monte Carlo, their transparency readily permits variance reduction ideas: in the examples that we have considered so far, these yield very robust estimates of whole likelihood surfaces with minimal computational effort.
Filtering and the Poisson estimator
Robert and Stramer ask for guidance about the optimal implementation of the Poisson estimator. In recent work (Fearnhead et al., 2005) we have generalized the Poisson estimator to permit a general distribution for the number of time points at which the diffusion is evaluated, and we have produced optimality criteria for this distribution. Motivated by these criteria, a sensible proposal distribution appears to be a negative binomial distribution whose mean is close to . In the EA1 setting of the SINE example, such a choice can lead to a reduction in variance by two or three orders of magnitude over the Poisson estimator with parameters that are chosen as in Section 7.
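A scalar toy version of this generalization (our own illustration, not the path-space estimator itself) makes the optimality issue concrete: an unbiased estimator of e^a can be built by drawing the number of "evaluation points" κ from an arbitrary distribution {p_k} on the non-negative integers and returning a^κ/(κ! p_κ), since its expectation is Σ_k p_k a^k/(k! p_k) = e^a for any {p_k} with full support. The variance, however, depends strongly on the choice of {p_k}; here a negative binomial with mean matched to a is compared with a deliberately mismatched Poisson.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def gpe(a, pmf, sampler, n):
    """Generalized Poisson-type estimator of exp(a): draw kappa from pmf and
    return a**kappa / (kappa! * pmf(kappa)); unbiased for any full-support pmf."""
    ks = sampler(n)
    return np.array([a**k / (math.factorial(k) * pmf(k)) for k in ks])

a, n = 3.0, 200_000

# Poisson(lam) number of points, deliberately mismatched to a
lam = 2.0
pois = gpe(a,
           lambda k: math.exp(-lam) * lam**k / math.factorial(k),
           lambda m: rng.poisson(lam, m), n)

# negative binomial with mean r*(1-p)/p = 3, i.e. matched to a
r, p = 6, 2.0 / 3.0
nb = gpe(a,
         lambda k: math.comb(k + r - 1, k) * p**r * (1.0 - p)**k,
         lambda m: rng.negative_binomial(r, p, m), n)

print(np.exp(a), pois.mean(), nb.mean())  # both are unbiased for exp(3)
print(pois.var(), nb.var())               # the matched proposal has much smaller variance
```

In the full estimator the role of a is played by a path integral of the drift functional, but the variance trade-off between choices of {p_k} is of the same nature.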
There are several comments concerning how our work can be applied to filtering problems, where, for example, we make partial observations of the underlying diffusion at discrete time points. Chopin points out that the EA enables a simple implementation of the basic particle filter (Gordon et al., 1993) as the EA enables us to simulate exactly the value of the state at the next time point given the current value of the state. Künsch suggests that the rejection-sampling-based particle filter of Hürzeler and Künsch (1998) could be applied, and Doucet suggests a more general particle filter where the Poisson estimator is used to generate random weights for each particle, with the mean of these random weights being equal to the true (analytic) weight.
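Chopin's point can be sketched concretely as follows. Since reproducing the EA in a few lines is impractical, an Ornstein–Uhlenbeck state process (whose exact transition is Gaussian) stands in below for the SINE diffusion; an EA1 sampler would be substituted in its place. All parameter values are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for exact simulation of the state diffusion: for an
# Ornstein-Uhlenbeck process dX = -theta*X dt + sigma dB, the transition
# over a gap dt is exactly Gaussian.
theta, sigma = 1.0, 1.0

def exact_transition(x, dt):
    mean = x * np.exp(-theta * dt)
    var = sigma**2 * (1.0 - np.exp(-2.0 * theta * dt)) / (2.0 * theta)
    return mean + np.sqrt(var) * rng.standard_normal(x.shape)

def bootstrap_filter(ys, dt, tau, n_particles=2000):
    """Bootstrap filter (Gordon et al., 1993): propagate each particle with
    the exact transition, weight by the N(x, tau^2) observation density,
    then resample."""
    x = np.sqrt(sigma**2 / (2.0 * theta)) * rng.standard_normal(n_particles)
    means = []
    for y in ys:
        x = exact_transition(x, dt)
        logw = -0.5 * (y - x)**2 / tau**2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))
        x = x[rng.choice(n_particles, size=n_particles, p=w)]  # multinomial resampling
    return np.array(means)

# simulate a latent path and noisy observations, then filter
dt, tau, T = 0.5, 0.3, 40
state = np.array([0.0])
truth, ys = [], []
for _ in range(T):
    state = exact_transition(state, dt)
    truth.append(state[0])
    ys.append(state[0] + tau * rng.standard_normal())
means = bootstrap_filter(np.array(ys), dt, tau)
rmse = np.sqrt(np.mean((means - np.array(truth))**2))
print(rmse)
```

Because the propagation step is an exact draw from the transition, no Euler discretization bias enters the filter; this is precisely the property that the EA supplies for diffusions without tractable transitions.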
We have independently been investigating how the EA and related ideas can be applied to particle filters, and in Fearnhead et al. (2005) we have developed a general framework for unbiased particle filters. Here we give a brief summary of results in the case of observing the SINE data set with normal error (variance σ^{2}). For this application, the rejection sampling idea of Künsch can be applied directly.
This algorithm has the same overall acceptance probability as a simple implementation of algorithm EA1 for simulating from the SINE state diffusion, but it has the important advantage of simulating from the particle filter's target distribution at time t_{i+1}, avoiding the need for an importance sampling correction by the likelihood. This is particularly useful if σ^{2} is small relative to t_{i+1}−t_{i}. However, an efficient random-weight particle filter (similar to the ideas of Rousset and Doucet) has computational advantages over even this rejection algorithm. The advantages depend on the values of the state at times t_{i} and t_{i+1}, but in extreme cases (where both state values are close to a multiple of 2π) computational gains of over a factor of 10 are possible.
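A toy version of this rejection idea (our illustration, with a tractable Gaussian step standing in for the EA transition draw) shows how the likelihood term is absorbed into the acceptance probability, so that accepted draws come exactly from the filtering target and no importance weighting is needed afterwards.

```python
import numpy as np

rng = np.random.default_rng(2)

tau = 0.5   # observation noise standard deviation (illustrative)

def propose(x):
    # stand-in transition f(. | x): a unit Gaussian step; the EA would
    # supply an exact draw from the diffusion transition here
    return x + rng.standard_normal()

def rejection_step(x, y):
    """Draw exactly from p(x' | x, y) ∝ f(x' | x) N(y; x', tau^2) by
    proposing from f and accepting with probability
    N(y; x', tau^2) / N(y; y, tau^2) <= 1."""
    while True:
        xp = propose(x)
        if rng.random() < np.exp(-0.5 * (y - xp)**2 / tau**2):
            return xp

draws = np.array([rejection_step(0.0, 1.0) for _ in range(50_000)])
print(draws.mean(), draws.var())  # the target here is N(0.8, 0.2) by Gaussian conjugacy
```

The overall acceptance rate falls as τ shrinks, which is the trade-off behind the remark that a random-weight filter can outperform even this exact rejection scheme.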
As Rousset and Doucet point out, implementing a random-weight particle filter requires the random weights to be positive with probability 1. Although this is straightforward to achieve in EA1 situations, we have adapted EA3 ideas to ensure this for general diffusions. The methodology that we have developed extends trivially to inference for diffusion-driven Cox processes (Møller and Clifford), and it should be possible to extend it to stochastic volatility models (Kalogeropoulos) and velocity models where positions are observed (Kent). For full details see Fearnhead et al. (2005).
Crisan suggests that it is more natural to avoid the step which transforms the observed diffusion to one of unit diffusion coefficient. This is mathematically equivalent to our approach, and the Radon–Nikodym derivatives obtained by the two approaches are almost surely equal. In fact, the solution to the differential equation that he gives in his equation (32) turns out to define our unit-diffusion-coefficient transformation (3) in the time-homogeneous case; the time-inhomogeneous case yields a simple generalization of transformation (3), which we describe in detail in Beskos et al. (2005a).
In fact, from a statistical perspective, it seems to us to be more natural to work with the transformed unit volatility process since this can be thought of as an infinitesimal standardization of the missing data.
Implicit in Crisan's comment is the useful idea that different dominating proposal measures can be used as the basis for the rejection sampling approach. There are several tractable unit-diffusion-coefficient proposals which could be used in this way: for example, in the Cox–Ingersoll–Ross example we used a Bessel process, and in more recent work we have used Ornstein–Uhlenbeck processes. However, Crisan's approach offers no additional generality, since any general diffusion with explicit finite dimensional distributions corresponds (through transformation (3)) to one with unit diffusion coefficient.
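For concreteness, the time-homogeneous form of the unit-diffusion-coefficient transformation (3), and its effect on the Cox–Ingersoll–Ross model mentioned above, can be written out as follows (the CIR parameterization κ, μ, σ is our own notation for this illustration):

```latex
% time-homogeneous unit-diffusion-coefficient transformation
X_t = \eta(V_t;\theta) = \int^{V_t} \frac{\mathrm{d}u}{\sigma(u;\theta)} .

% Cox-Ingersoll-Ross example:
%   dV_t = \kappa(\mu - V_t)\,dt + \sigma\sqrt{V_t}\,dB_t
% gives \eta(v) = 2\sqrt{v}/\sigma, and Ito's formula yields
\mathrm{d}X_t
  = \left\{ \left(\frac{2\kappa\mu}{\sigma^{2}} - \frac{1}{2}\right)\frac{1}{X_t}
            - \frac{\kappa X_t}{2} \right\}\mathrm{d}t + \mathrm{d}B_t .
```

The 1/X_t drift term coincides with that of a Bessel process of dimension 4κμ/σ², which is the connection exploited when a Bessel process is used as the proposal for this example.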
We concur with the view of Clifford, Künsch, and Mamon and Yu, who note both that diffusions used in statistical modelling are often rather approximate, and that precise macroscopic properties of models are likely to be unimportant in many practical modelling contexts. For instance, for financial data, stylized facts about the power variation of asset prices are at odds with the finite non-zero quadratic variation which diffusions are bound to exhibit.
Clifford points out that the exact likelihood in the Cox–Ingersoll–Ross model is explicit. This is precisely why we used this example: the analytic solution can be compared explicitly with our results.
Chopin raises the possibility of substantially increasing efficiency at the cost of a certain bias by truncating drifts in regions which the diffusion is unlikely to visit in the time period of interest. This seems a sensible practical suggestion. It turns out that algorithm EA3 using the layered Brownian motion construction can be motivated in a similar way, and moreover allows unbiasedness to be retained.
Pasquali and Ruggeri and Genon-Catalot ask about the relationship between parameter estimation and diffusion stability. In the asymptotic case much is known about this through the elegant theory of mixed asymptotic normality (see Basawa and Rao (1980)). In the finite sample case it is difficult to make general statements. However, as the discussants point out, even our simple logistic growth model is sufficiently simple for interesting structure to emerge. In this example, for instance, for σ^{2}>2R the diffusion will eventually converge to 0, so presumably K will be badly estimated in this case.
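The σ^{2}>2R effect can be seen directly by simulation. The Euler–Maruyama sketch below assumes the logistic growth form dV = RV(1 − V/K) dt + σV dB with illustrative parameter values; for this form the drift of log V is bounded above by R − σ²/2, which is negative whenever σ² > 2R, so paths are pulled towards 0.

```python
import numpy as np

rng = np.random.default_rng(3)

# Euler-Maruyama for the logistic growth diffusion (parameter names assumed):
#   dV = R*V*(1 - V/K) dt + sigma*V dB
# Here sigma^2 = 2.0 > 2R = 1.0, the regime in which the process drifts to 0.
R, K, sigma = 0.5, 1000.0, np.sqrt(2.0)
v0, dt, T, n_paths = 500.0, 1e-3, 50.0, 20
n_steps = int(T / dt)

v = np.full(n_paths, v0)
for _ in range(n_steps):
    dB = np.sqrt(dt) * rng.standard_normal(n_paths)
    v = v + R * v * (1.0 - v / K) * dt + sigma * v * dB
    v = np.maximum(v, 0.0)   # guard against the (rare) negative Euler overshoot

print(np.median(v))   # paths collapse towards 0 when sigma^2 > 2R
```

With the carrying capacity K effectively never reached on such paths, the data carry almost no information about it, which is the estimation difficulty anticipated by the discussants.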
Mamon and Yu ask about comparisons of our methods with nonparametric methods for stationary diffusions. Kent mentions a similar issue, pointing out that serial correlation of the data can be taken into account by standard time series methods. Although this idea requires stationarity, an appealing feature of such methodology is that it provides an increasingly good approximation to an exact likelihood-based approach for sparse data, which is exactly the situation that is problematic for all currently existing exact likelihood-based methods.