### Abstract

- Top of page
- Abstract
- 1 Introduction
- 2 Nonlinear state-space models
- 3 Filter approximations based on Taylor expansion
- 4 Filter approximations based on numerical integration
- 5 Discussion
- 6 Example: bifurcation system
- 7 Example: ordinal measurements
- 8 Conclusion
- Acknowledgement
- References

Stochastic differential equations (SDE) are used as dynamical models for cross-sectional discrete time measurements (panel data). Thus causal effects are formulated on a fundamental infinitesimal time scale. Cumulated causal effects over the measurement interval can be expressed in terms of fundamental effects which are independent of the chosen sampling intervals (e.g. weekly, monthly, annually). The nonlinear continuous–discrete filter is the key tool in deriving a recursive sequence of time and measurement updates. Several approximation methods including the extended Kalman filter (EKF), higher order nonlinear filters (HNF), the local linearization filter (LLF), the unscented Kalman filter (UKF), the Gauss–Hermite filter (GHF) and generalizations (GGHF), as well as simulated filters (functional integral filter FIF) are compared.

### 1 Introduction

Continuous time models are natural, as time is a continuously flowing quantity without steps. On the other hand, empirical data in the social sciences and economics are mostly available only at certain time points, e.g. daily, weekly, quarterly or at arbitrary times. Only physical quantities such as voltages, pressures, levels of rivers, etc. may be measured on a continuous basis. Therefore, there has been a tendency to formulate dynamical models in discrete time (time series and panel analysis). Thus, the causal relations are specified between the arbitrary discrete measurement times. Bartlett (1946) argues as follows:

> It will have been apparent that the discrete nature of our observations in many economic and other time series does not reflect any lack of continuity in the underlying series. Thus theoretically it should often prove more fundamental to eliminate this imposed artificiality. *An unemployment index does not cease to exist between readings, nor does Yule's pendulum cease to swing*. (emphasis H.S.)

Indeed there are many disadvantages of discrete time models. One of the most basic defects is that the dynamics are modeled between the (arbitrarily sampled) measurements and not between the dynamically relevant system states. For example, a physical system like a pendulum (cf. the citation above) fulfils a simple linear relation (Newton's equation for small amplitudes) between the state and its velocity change (acceleration), whereas the relationship between sampled measurements (e.g. daily) is very complicated and nonlinearly dependent on the parameters (mass, length of the pendulum, etc.) and the sampling interval. Moreover, the velocity cannot be measured with discrete time data (latent variable).

Discrete time studies with different sampling intervals cannot be compared, because the causal parameters relate to the chosen interval. Moreover, if the same dataset is analyzed with different intervals (e.g. a weekly or monthly dataset selected from daily measurements), one gets estimates corresponding to these intervals, which can contradict one another.
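The dependence of discrete time parameters on the sampling interval can be illustrated with a linear SDE, for which the exact discrete model is known in closed form. The following is a minimal sketch, assuming a scalar model d*y* = *a y* d*t* + *g* d*W*; the function name `exact_discrete_model` and all numerical values are hypothetical and serve only as an illustration.

```python
import math

def exact_discrete_model(a, g, dt):
    """Exact discrete-time (AR(1)) parameters implied by dy = a*y dt + g dW.

    Returns the autoregressive coefficient and the innovation variance
    for the sampling interval dt. Assumes a != 0.
    """
    phi = math.exp(a * dt)
    q = g**2 * (math.exp(2.0 * a * dt) - 1.0) / (2.0 * a)
    return phi, q

a, g = -0.5, 1.0  # hypothetical continuous-time parameters
for label, dt in [("weekly", 1.0), ("monthly", 4.0)]:
    phi, q = exact_discrete_model(a, g, dt)
    # The AR coefficient changes with the interval, but log(phi)/dt
    # recovers the same continuous-time parameter a in both cases.
    print(f"{label}: phi={phi:.4f}, q={q:.4f}, a={math.log(phi) / dt:.4f}")
```

The two discrete time coefficients differ, although both are generated by the same drift parameter *a*; comparing them directly across studies would therefore be misleading.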

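Nevertheless, the *continuous–discrete state-space model* is able to combine both points of view: a continuous time dynamical model for the state and a measurement model for the discrete time data.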

This hybrid model first appeared in engineering (Jazwinski, 1970), but is now well known in econometrics, sociology, and psychology. One can estimate the parameters of the continuous time model from time series or panel measurements. This is achieved by computing the conditional probability density between the measurement times. In the linear Gaussian case, only the time-dependent conditional mean and autocovariance are needed. More generally, in the presence of latent states and errors of measurement, a measurement model can be defined, mapping the continuous time state to observable discrete time data.

### 2 Nonlinear state-space models

Whereas the linear continuous–discrete state-space model can be treated completely and efficiently by using the Kalman filter algorithm (Harvey and Stock, 1985; Jones and Ackerson, 1990; Jones and Boadi-Boateng, 1991; Jones, 1993; Singer, 1993, 1995, 1998; Hamerle, Nagl and Singer, 1993) or by structural equation models (SEM) with nonlinear parameter restrictions (Oud and Jansen, 1996, 2000; Singer, 2007; but cf. Hamerle, Nagl and Singer, 1991), there are many issues and competing approaches in the nonlinear field. It is presently an area of very active research because of the growing interest in finance models. The option price model of Black and Scholes (1973) relies on a stochastic differential equation (SDE) model for the underlying stock variable, and Merton's (1990) monograph on continuous finance has given the field a strong ‘continuous’ flavor. This is in contrast to econometrics, where time-series methods still dominate, and also to sociology, despite the tradition of Bergstrom (1976), Coleman (1968) and others.

In the measurement model (2), *h*:ℝ^{p}×ℝ^{q}×ℝ^{u}→ℝ^{k} is a measurement function mapping the latent state *y*_{n}(*t*) onto discrete time measurements *z*_{in}, *i*=0,…,*T*_{n}, *n*=1,…,*N*. The free parameters in *h* may be interpreted as nonlinear factor loadings. The error terms in (1) and (2) are mutually independent Gaussian white-noise processes with zero means and covariance *E*[(d*W*_{n}(*t*)/d*t*)(d*W*_{m}(*s*)/d*s*)^{′}]=*I*_{r}*δ*(*t*−*s*)*δ*_{nm}, where *I*_{r} is the *r*-dimensional unit matrix.
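For orientation, a nonlinear continuous–discrete state-space model of the kind referred to as (1, 2) is commonly written in the following form. This is a sketch consistent with the symbols used in this section, not a verbatim statement of the model: the exogenous variables *x*_{n}(*t*) ∈ ℝ^{q}, the diffusion coefficient *g* (with Ω = *g g*^{′}), the measurement times *t*_{in} and the symbol *ε*_{in} for the measurement error are assumptions inferred from the domain of *h* and from the later references to *f*, Ω and *ε*.

```latex
\begin{align}
  \mathrm{d}y_n(t) &= f\bigl(y_n(t), x_n(t), \psi\bigr)\,\mathrm{d}t
                     + g\bigl(y_n(t), x_n(t), \psi\bigr)\,\mathrm{d}W_n(t), \\
  z_{in} &= h\bigl(y_n(t_{in}), x_n(t_{in}), \psi\bigr) + \epsilon_{in},
  \qquad i = 0,\dots,T_n,\quad n = 1,\dots,N .
\end{align}
```

The first line is the continuous time dynamic equation for the latent state, the second the measurement equation at the discrete sampling times.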

As the panel units are independent, the panel index is dropped in the sequel for simplicity of notation. For maximum likelihood estimation, one only has to sum the *N* likelihood contributions of each panel unit. Alternatively, using Bayesian estimation, the parameter vector is filtered with the other states, and one has to use the extended state vector *η*(*t*)={*y*_{1}(*t*),…,*y*_{N}(*t*),*ψ*(*t*)}.

In case of random time effects *γ*d*V*(*t*) acting on each panel unit in the same way, the panel units are correlated and one can filter the extended state *η*(*t*)={*y*_{1}(*t*),…,*y*_{N}(*t*)} with random process error d*W*(*t*)={d*W*_{1}(*t*),…,d*W*_{N}(*t*),d*V*(*t*)}. This works both for maximum likelihood (ML) and Bayesian estimation of *ψ* (cf. section 6.2).

In the nonlinear case it is important to interpret the SDE (1) correctly. We use the Itô interpretation yielding simple moment equations (for a thorough discussion of the system theoretical aspects, see Arnold, 1974, ch. 10; Van Kampen, 1981; Singer, 1999, ch. 3). A strong simplification occurs when the state is completely measured at times *t*_{i}, i.e. *z*_{i}=*y*_{i}=*y*(*t*_{i}). Then, only the transition density *p*(*y*_{i+1},*t*_{i+1}|*y*_{i},*t*_{i}) must be computed in order to obtain the likelihood function (cf. Aït-Sahalia, 2002; Singer, 2006d). Unfortunately, the transition density can be computed analytically only in some special cases (including the linear one), so that in general approximation methods must be employed. As the transition density fulfils a partial differential equation (PDE), the so-called Fokker-Planck equation (cf. 6), approximation methods for PDEs, e.g. finite difference methods, can be used (cf. Jensen and Poulsen, 2002).

A large class of approximations rests on linearization methods which can be applied to the exact moment equations [extended Kalman filter (EKF); second-order nonlinear filter (SNF); cf. Jazwinski, 1970] or directly to the nonlinear differential equation using Itô’s lemma [local linearization (LL); Shoji and Ozaki, 1997, 1998]. As linearity is only approximate in the vicinity of a measurement or of a reference trajectory, the conditional Gaussian schemes are valid only for short measurement intervals Δ*t*_{i}=*t*_{i+1}−*t*_{i}. Other linearization methods relate to the diffusion term, but are interpretable in terms of the EKF (Nowman, 1997).

A different class of approximations relates to the filter density. In the unscented Kalman filter (UKF; cf. Julier and Uhlmann, 1997; Julier, Uhlmann and Durrant-Whyte, 2000), the true density is replaced by a singular density with correct first and second moment, whereas the Gaussian filter (GF) assumes a normal density. Integrals in the update equations may be obtained using Gauss–Hermite quadrature [Gauss–Hermite filter (GHF); Ito and Xiong, 2000]. More generally, the density can be approximated by Gaussian sums (Gaussian sum filter) and the expectations in the moment equations are computed using the EKF, UKF or GHF (cf. Alspach and Sorenson, 1972; Ito and Xiong, 2000). Alternatively, the Monte Carlo method can be employed to obtain approximate transition densities (Pedersen, 1995; Andersen and Lund, 1997; Elerian, Chib and Shephard, 2001; Singer, 2002, 2003).
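The two density-based approximations of an expectation *E*[*f*(*y*)] can be sketched for a scalar state as follows. This is an illustration only, assuming a Gaussian density N(*μ*, *σ*^{2}) for the quadrature rule; the function names, the default order and the scaling parameter `kappa` are hypothetical choices and are not taken from the cited papers.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gauss_hermite_expectation(func, mu, sigma, order=10):
    """Approximate E[func(y)] for y ~ N(mu, sigma^2) by Gauss-Hermite quadrature."""
    # Nodes and weights for the weight function exp(-x^2 / 2).
    nodes, weights = hermegauss(order)
    # The weights sum to sqrt(2*pi); dividing turns them into probabilities.
    return np.sum(weights * func(mu + sigma * nodes)) / np.sqrt(2.0 * np.pi)

def unscented_expectation(func, mu, sigma, kappa=2.0):
    """Approximate E[func(y)] with the three sigma points of a scalar unscented transform."""
    spread = np.sqrt(1.0 + kappa) * sigma
    points = np.array([mu, mu + spread, mu - spread])
    # Discrete (singular) density with the correct mean and variance.
    weights = np.array([kappa, 0.5, 0.5]) / (1.0 + kappa)
    return np.sum(weights * func(points))

# Both rules reproduce the second moment mu^2 + sigma^2 of a Gaussian.
print(gauss_hermite_expectation(lambda y: y**2, 1.0, 2.0))
print(unscented_expectation(lambda y: y**2, 1.0, 2.0))
```

With `kappa=2` the scalar sigma points coincide with the three-point Gauss–Hermite rule, which shows how closely the two approaches are related in one dimension.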

More recently, Hermite expansions of the transition density have been utilized by Aït-Sahalia (2002). In this approach, the expansion coefficients are expressed in terms of conditional moments and computed analytically by using computer algebra programs. The computations comprise the repeated application of the backward operator *L* to polynomials [*L*=*F*^{†} is the adjoint of the Fokker-Planck operator (6)]. Alternatively, one can use systems of moment differential equations (Singer, 2006d) or numerical integration (Challa, Bar-Shalom and Krishnamurthy, 2000; Singer, 2006b,c). The Hermite expansion approach seems to be the most efficient both in accuracy and computing time (cf. Aït-Sahalia, 2002, Figure 1; Jensen and Poulsen, 2002).

Nonparametric approaches attempt to estimate the drift function *f* and the diffusion function Ω without assumptions about a certain functional form. They typically involve kernel density estimates of conditional densities (cf. Bandi and Phillips, 2003). Other approaches utilize Taylor series expansions of the drift function and estimate the derivatives (expansion coefficients) as latent states using the LL method (similar to the SNF; Shoji, 2002). Finally, the Daum filter, an *exact* nonlinear continuous–discrete filtering approach, must be mentioned (Daum, 2005).

#### 2.1 Exact continuous–discrete filter

The exact time and measurement updates of the continuous–discrete filter are given by the recursive scheme (Jazwinski, 1970) for the conditional density *p*(*y*,*t*|*Z*^{i}) (again dropping panel index *n*).

##### 2.1.1 Time update

- (3)

#### 2.2 Exact moment equations

Instead of solving the time update equations for the conditional density (3), the moment equations for the first, second and higher order moments are considered. The general vector case is discussed in Singer (2006c). Using the Euler approximation for the SDE (1), one obtains in a short time interval *δt* (with *δW*(*t*):=*W*(*t*+*δt*)−*W*(*t*))

- (9)

Taking the expectation *E*[⋯|*Z*^{i}] one gets the moment equation

- (10)

or in the limit *δt*→0

- (11)
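The steps from the Euler approximation to the first moment equation can be sketched as follows. The notation is an assumption chosen to match the surrounding text: *g* denotes the diffusion coefficient of the SDE (with Ω = *g g*^{′}), *μ*(*t*|*t*_{i}) := *E*[*y*(*t*)|*Z*^{i}] is the conditional mean, and the argument lists of *f* and *g* are abbreviated.

```latex
\begin{align}
  y(t+\delta t) &= y(t) + f\bigl(y(t),t\bigr)\,\delta t
                   + g\bigl(y(t),t\bigr)\,\delta W(t), \\
  \mu(t+\delta t \mid t_i) &= \mu(t \mid t_i)
                   + E\bigl[f(y(t),t) \mid Z^i\bigr]\,\delta t, \\
  \dot{\mu}(t \mid t_i) &= E\bigl[f(y(t),t) \mid Z^i\bigr].
\end{align}
```

The noise term drops out of the second line because, in the Itô interpretation, the increment *δW*(*t*) is independent of *y*(*t*) and has zero mean.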

The higher order central moments

- (12)

fulfil (scalar notation, dropping the condition)

- (13)

Inserting the first moment (10) and setting *a*:=(*y*−*E*(*y*))+(*f*−*E*(*f*))*δt*:=*y*^{*}+*f*^{*}*δt* one obtains

- (18)

In general, up to *O*(*δt*) we have (*M*_{k}:=(*y*−*μ*)^{k})

- (19)

The exact moment equations (11, 19) are not differential equations, as they depend on the unknown conditional density *p*(*y*,*t*|*Z*^{i}). Using Taylor expansions or approximations of the conditional density one obtains several filter algorithms.
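For the second moment, the derivation can be sketched in scalar notation, dropping the condition *Z*^{i} as above. The sketch assumes that *g* is the diffusion coefficient with Ω = *g*^{2} and uses *a* = *y*^{*} + *f*^{*}*δt* as defined earlier, so that *y*(*t*+*δt*) − *μ*(*t*+*δt*) = *a* + *g* *δW*.

```latex
\begin{align}
  m_2(t+\delta t)
    &= E\bigl[(a + g\,\delta W)^2\bigr]
     = E[a^2] + 2\,E[a\,g\,\delta W] + E[g^2\,\delta W^2] \\
    &= E[y^{*2}] + 2\,E[y^{*} f^{*}]\,\delta t + E[\Omega]\,\delta t + O(\delta t^2), \\
  \dot{m}_2(t) &= 2\,\operatorname{Cov}(y, f) + E[\Omega].
\end{align}
```

The mixed term vanishes because *δW* is independent of *y*(*t*) with zero mean, and *E*[*δW*^{2}] = *δt* produces the diffusion contribution. Both expectations on the right-hand side still involve the unknown conditional density, which is why the system is not closed.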

#### 2.3 Continuous-discrete filtering scheme

Using only the first and second moment equation (11, 18), and the optimal linear update (normal correlation) one obtains the recursive scheme (*A*^{−} is the generalized inverse of *A*)

##### 2.3.1 Initial condition: *t*=*t*_{0}

##### 2.3.2 Time update: *t* ∈ [*t*_{i},*t*_{i+1}]

##### 2.3.3 Measurement update: *t*=*t*_{i+1}

**Remarks.**

1. The time update for the interval *t* ∈ [*t*_{i},*t*_{i+1}] was written using time slices of width *δt*. They must be chosen small enough to yield a good approximation for the moment equations (11, 18).

2. The measurement update is written using the theorem on normal correlation (Liptser and Shiryayev, 1977, 1978, ch. 13, Thm 13.1, Lem. 14.1)

   - (20)

   - (21)

   Inserting the measurement equation (2) one obtains the measurement update of the filter. The formula is exact for Gaussian variables and the optimal *linear* estimate for *μ*(*t*_{i+1}|*t*_{i+1}), Σ(*t*_{i+1}|*t*_{i+1}) in the non-Gaussian case. It is natural to use, if only two moments are considered. Despite the linearity in *z*_{i+1}, it still contains the measurement nonlinearities in the expectations involving *h*(*y*,*t*). Alternatively, the Bayes formula (4) can be evaluated directly. This is necessary, if strongly nonlinear measurements are taken (e.g. the threshold mechanism for ordinal data; see section 7).
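The theorem on normal correlation used in the measurement update has the following generic form, given here as a sketch in standard notation rather than as a transcription of (20, 21). All moments are understood as conditional on the past measurements *Z*^{i}; the symbol *R* for the measurement error variance and the assumption that the error is independent of the state are used in the second pair of lines.

```latex
\begin{align}
  E[y \mid z] &= E[y] + \operatorname{Cov}(y,z)\,\operatorname{Var}(z)^{-}\,\bigl(z - E[z]\bigr), \\
  \operatorname{Var}(y \mid z) &= \operatorname{Var}(y)
      - \operatorname{Cov}(y,z)\,\operatorname{Var}(z)^{-}\,\operatorname{Cov}(z,y), \\
  \text{with } z = h(y) + \epsilon:\quad
  E[z] &= E[h(y)], \qquad \operatorname{Cov}(y,z) = \operatorname{Cov}\bigl(y, h(y)\bigr), \\
  \operatorname{Var}(z) &= \operatorname{Var}\bigl(h(y)\bigr) + R .
\end{align}
```

The update is linear in the observed *z*, while the nonlinearity of *h* enters only through the three expectations in the last two lines.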

The approximation of the expectation values containing the unknown filter density leads to several well-known algorithms:

1. Taylor expansion of *f*, Ω and *h* (section 3): EKF, SNF, higher order nonlinear filter, HNF(2,*L*) (Jazwinski, 1970; Singer, 2006d). Direct linearization in the SDE (1) using the Itô formula yields the LL approach of Shoji and Ozaki (1997, 1998); cf. Singer (2002).

2. Approximation of the filter density (section 4): using sigma points: UKF (Julier and Uhlmann, 1997; Julier *et al.*, 2000), Gaussian density using Gauss–Hermite quadrature: GHF (Ito and Xiong, 2000), Hermite expansion of filter density: GGHF (Singer, 2006b,c).
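As an illustration of the simplest entry in this list, the recursive scheme of section 2.3 can be sketched for a scalar state with EKF-type approximations, i.e. *E*[*f*(*y*)] ≈ *f*(*μ*) and Cov(*f*, *y*) ≈ *f*′(*μ*)Σ. This is a minimal sketch under these stated assumptions, not the algorithm as printed in the paper; the function names and the test model are hypothetical.

```python
import numpy as np

def ekf_time_update(mu, var, f, df, omega, dt_total, n_slices):
    """Propagate mean and variance over one measurement interval using Euler slices."""
    delta = dt_total / n_slices
    for _ in range(n_slices):
        # First-order Taylor expansion around mu:
        # E[f] ~ f(mu), Cov(f, y) ~ f'(mu) * var, E[Omega] ~ Omega(mu).
        mu_new = mu + f(mu) * delta
        var = var + (2.0 * df(mu) * var + omega(mu)) * delta
        mu = mu_new
    return mu, var

def normal_correlation_update(mu, var, z, h, dh, r):
    """Optimal linear measurement update with h linearized around the prior mean."""
    z_pred = h(mu)                      # approximate E[z]
    cov_yz = var * dh(mu)               # approximate Cov(y, z)
    var_z = dh(mu) * var * dh(mu) + r   # approximate Var(z)
    gain = cov_yz / var_z
    return mu + gain * (z - z_pred), var - gain * cov_yz

# Hypothetical linear test model dy = -0.5*y dt + dW, measured directly with error variance 0.1.
mu, var = ekf_time_update(1.0, 0.5, lambda y: -0.5 * y, lambda y: -0.5,
                          lambda y: 1.0, dt_total=1.0, n_slices=1000)
mu, var = normal_correlation_update(mu, var, z=0.8, h=lambda y: y,
                                    dh=lambda y: 1.0, r=0.1)
print(mu, var)
```

For a linear model the Taylor approximations are exact, so the only error left is the Euler discretization, which shrinks as `n_slices` grows.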

### 7 Example: ordinal measurements

The nonlinear state-space model (1, 2) is flexible enough to model ordinal measurements via the threshold model

- (65)

where *θ* is the Heaviside step function and *c*_{j} are thresholds contained in the parameter vector *ψ*. The variance of the measurement error *R*=var(*ε*) is taken small (10^{−6} here), so that the measurement density *p*(*z*|*y*)=*φ*(*z*;*h*(*y*),*R*) is proportional to the indicator function *χ*_{h^{−1}(z)}(*y*). Now the measurements are strongly nonlinear and the *a posteriori* density is proportional to the *a priori* density truncated by the windows *C*_{j}=(*c*_{j},*c*_{j+1}] defined from the thresholds *c*={−∞,*c*_{1},…,*c*_{J},∞}. Figure 7 shows the trajectory of panel unit *n*=10 together with the thresholds {*c*_{1},*c*_{2}}={−2,2} and the ordinal data *z*(*t*) ∈ {−1,0,1} (setting *z*_{0}=−1). The data were generated as in section 6 using an Euler–Maruyama scheme with *δt*=0.1. The data were filtered using the GGHF, comparing the normal correlation and the Bayes update. As explained in section 4.3, the filter density is represented by the Hermite expansion *φ*(*y*;*μ*,*σ*^{2})*H*(*y*,*K*) and the measurement update is obtained either by the normal correlation (20, 21) or by the Bayes formula (4). In both cases, Gauss–Hermite integration can be used. Denoting the linear estimates (20, 21) by *μ*_{0} and Σ_{0}, the normal correlation update is given by the product

- (66)

- (67)

- (68)

In the equations above, the *a posteriori* distribution is non-Gaussian due to the Hermite part *H*(*y*;{*μ*,*m*_{2},…,*m*_{K}}), where {*μ*,*m*_{2},…,*m*_{K}} are the *a priori* moments. For strongly nonlinear measurements, the Bayes formula yields the exact expression

- (69)

- (70)

The likelihood integral contains the *a priori* Gauss part *φ*(*y*;*μ*,*σ*^{2}), but the efficiency of the Gauss–Hermite quadrature can be improved by integrating over the normal correlation update *φ*(*y*;*μ*_{0},Σ_{0}) (analogously to importance sampling). Figures 8 and 9 compare the updates in the case *K*=2 (Gaussian filter). The densities are always Gaussian, but the *a posteriori* moments are either the linear estimates or are computed using the Bayes formula. The latter method yields better updates which more closely approximate the truncated Gaussian *a posteriori* densities. Note that the measurement density *p*(*z*|*y*)=*φ*(*z*;*h*(*y*),*R*), which does not integrate to unity as a function of *y*, was scaled by 10^{−3} in the graphics.
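The difference between the two updates in the Gaussian case (*K*=2) can be reproduced numerically. The following sketch assumes a Gaussian *a priori* density and evaluates all moments on a fine grid instead of by Gauss–Hermite quadrature; the function names, the grid size and the prior parameters are hypothetical, and the likelihood is replaced by the indicator of the observed window, which is what the small error variance amounts to.

```python
import numpy as np

def ordinal_h(y, thresholds, z0=-1.0):
    """Threshold measurement function: z0 plus the number of thresholds below y."""
    return z0 + np.sum(y[:, None] > np.asarray(thresholds)[None, :], axis=1)

def compare_updates(mu, sigma, z, thresholds, r=1e-6, n_grid=200001):
    """Return (mean, var) of the linear update and of the Bayes update."""
    y = np.linspace(mu - 10.0 * sigma, mu + 10.0 * sigma, n_grid)
    prior = np.exp(-0.5 * ((y - mu) / sigma) ** 2)
    prior /= prior.sum()                       # discrete prior weights on the grid
    h = ordinal_h(y, thresholds)

    # Normal correlation: only first and second moments of (y, h(y)) enter.
    eh = np.sum(prior * h)
    cov_yh = np.sum(prior * (y - mu) * (h - eh))
    var_z = np.sum(prior * (h - eh) ** 2) + r
    lin_mean = mu + cov_yh / var_z * (z - eh)
    lin_var = sigma**2 - cov_yh**2 / var_z

    # Bayes formula: the prior is truncated to the window where h(y) equals z.
    # Assumes the window carries some prior mass on the grid.
    post = prior * (h == z)
    post /= post.sum()
    bayes_mean = np.sum(post * y)
    bayes_var = np.sum(post * (y - bayes_mean) ** 2)
    return (lin_mean, lin_var), (bayes_mean, bayes_var)

# Hypothetical prior N(0, 4), thresholds {-2, 2}, observed category z = 1.
linear, bayes = compare_updates(0.0, 2.0, 1.0, (-2.0, 2.0))
print("linear:", linear)
print("Bayes: ", bayes)
```

In this symmetric setting the two posterior means agree, but the Bayes variance is smaller, since it is the variance of the truncated density rather than an average over all possible categories.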

Using more terms (e.g. *K*=20) in the Hermite expansion yields a more realistic modeling of the bimodal *a priori* density (Figures 10 and 11). Note that the normal correlation update is non-Gaussian as well, because of the Hermite term *H*(*y*,*K*). In some cases the Bayes update tends to produce unrealistic oscillations in the *a posteriori* density. This is due to locally negative values of the Hermite series.
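The data-generating step described at the beginning of this section, an Euler–Maruyama simulation followed by the threshold mechanism, can be sketched as follows. The double-well drift and its coefficients are hypothetical stand-ins and are not the parameter values of section 6; the function names are likewise assumptions, while the thresholds {−2, 2}, the categories {−1, 0, 1} and the step width 0.1 follow the text.

```python
import numpy as np

def euler_maruyama(f, g, y0, t_end, delta, rng):
    """Simulate dy = f(y) dt + g(y) dW on [0, t_end] with step width delta."""
    n_steps = int(round(t_end / delta))
    y = np.empty(n_steps + 1)
    y[0] = y0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(delta))   # Wiener increment with variance delta
        y[k + 1] = y[k] + f(y[k]) * delta + g(y[k]) * dw
    return y

def to_ordinal(y, thresholds, z0=-1):
    """Map latent values to categories by counting the thresholds strictly below y."""
    # side="left" places y == c_j in the lower window, matching C_j = (c_j, c_{j+1}].
    return z0 + np.searchsorted(np.asarray(thresholds), y, side="left")

rng = np.random.default_rng(0)
# Hypothetical bistable drift with wells near +/- sqrt(10) and unit diffusion.
path = euler_maruyama(lambda y: y - 0.1 * y**3, lambda y: 1.0,
                      y0=0.0, t_end=50.0, delta=0.1, rng=rng)
z = to_ordinal(path, thresholds=(-2.0, 2.0))
print(np.unique(z))
```

Only the ordinal sequence `z` would be passed to the filter, while the latent path stays unobserved.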