Keywords:

  • Itô calculus;
  • sampling;
  • continuous–discrete state-space models;
  • nonlinear Kalman filtering;
  • Hermite orthogonal expansion;
  • numerical integration;
  • Monte Carlo simulation.

Abstract

  1. Top of page
  2. Abstract
  3. 1 Introduction
  4. 2 Nonlinear state-space models
  5. 3 Filter approximations based on Taylor expansion
  6. 4 Filter approximations based on numerical integration
  7. 5 Discussion
  8. 6 Example: bifurcation system
  9. 7 Example: ordinal measurements
  10. 8 Conclusion
  11. Acknowledgement
  12. References

Stochastic differential equations (SDE) are used as dynamical models for cross-sectional discrete time measurements (panel data). Thus causal effects are formulated on a fundamental infinitesimal time scale. Cumulated causal effects over the measurement interval can be expressed in terms of fundamental effects which are independent of the chosen sampling intervals (e.g. weekly, monthly, annually). The nonlinear continuous–discrete filter is the key tool in deriving a recursive sequence of time and measurement updates. Several approximation methods including the extended Kalman filter (EKF), higher order nonlinear filters (HNF), the local linearization filter (LLF), the unscented Kalman filter (UKF), the Gauss–Hermite filter (GHF) and generalizations (GGHF), as well as simulated filters (functional integral filter FIF) are compared.


1 Introduction

Continuous time models are natural, as time is a continuously flowing quantity without steps. On the other hand, empirical data in the social sciences and economics are mostly available only at certain time points, e.g. daily, weekly, quarterly or at arbitrary times. Only physical quantities such as voltages, pressures, levels of rivers, etc. may be measured on a continuous basis. Therefore, there has been a tendency to formulate dynamical models in discrete time (time series and panel analysis). Thus, the causal relations are specified between the arbitrary discrete measurement times. Bartlett (1946) argues as follows:

It will have been apparent that the discrete nature of our observations in many economic and other time series does not reflect any lack of continuity in the underlying series. Thus theoretically it should often prove more fundamental to eliminate this imposed artificiality. An unemployment index does not cease to exist between readings, nor does Yule's pendulum cease to swing. (emphasis H.S.)

Indeed there are many disadvantages of discrete time models. One of the most basic defects is that the dynamics are modeled between the (arbitrarily sampled) measurements and not between the dynamically relevant system states. For example, a physical system like a pendulum (cf. the citation above) fulfils a simple linear relation (Newton's equation for small amplitudes) between the state and its velocity change (acceleration), whereas the relationship between sampled measurements (e.g. daily) is very complicated and nonlinearly dependent on the parameters (mass, length of the pendulum, etc.) and the sampling interval. Moreover, the velocity cannot be measured with discrete time data (latent variable).

Discrete time studies with different sampling intervals cannot be compared, because the causal parameters relate to the chosen interval. Moreover, if the same dataset is analyzed with different intervals (select a weekly or monthly dataset from daily measurements), one gets estimates corresponding to these intervals which can be in contradiction.
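The interval dependence can be made concrete with a minimal sketch. It assumes a scalar linear SDE dy = a·y dt + g dW, which is an illustrative example and not a model from this paper; the function name `discrete_ar1` and the parameter values are hypothetical. The exact discrete-time (AR(1)) coefficient is exp(a·Δt), so it changes with the sampling interval although the continuous time parameter a stays fixed.

```python
import math

def discrete_ar1(a, g, dt):
    """Exact AR(1) parameters implied by dy = a*y dt + g dW sampled every dt.

    Returns (phi, q) with y[i+1] = phi * y[i] + u[i] and Var(u[i]) = q.
    Assumes a != 0.
    """
    phi = math.exp(a * dt)
    q = g**2 * (math.exp(2.0 * a * dt) - 1.0) / (2.0 * a)
    return phi, q

a, g = -0.5, 1.0  # assumed continuous time parameters (hypothetical values)
for label, dt in [("interval 1", 1.0), ("interval 4", 4.0)]:
    phi, q = discrete_ar1(a, g, dt)
    # The discrete coefficient phi differs by interval; log(phi)/dt recovers a.
    print(f"{label}: phi={phi:.4f}, q={q:.4f}, recovered a={math.log(phi) / dt:.4f}")
```

Both sampling schemes lead back to the same a, which is the sense in which the continuous time parameter is independent of the chosen interval.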

Nevertheless, the continuous–discrete state-space model is able to combine both points of view:

  1. a continuous time dynamical model;
  2. discrete time (sampled) measurements.

This hybrid model first appeared in engineering (Jazwinski, 1970), but is now well known in econometrics, sociology, and psychology. One can estimate the parameters of the continuous time model from time series or panel measurements. This is achieved by computing the conditional probability density between the measurement times. In the linear Gaussian case, only the time-dependent conditional mean and autocovariance are needed. More generally, in the presence of latent states and errors of measurement, a measurement model can be defined, mapping the continuous time state to observable discrete time data.

2 Nonlinear state-space models

Whereas the linear continuous–discrete state-space model can be treated completely and efficiently by using the Kalman filter algorithm (Harvey and Stock, 1985; Jones and Ackerson, 1990; Jones and Boadi-Boateng, 1991; Jones, 1993; Singer, 1993, 1995, 1998; Hamerle, Nagl and Singer, 1993) or by structural equation models (SEM) with nonlinear parameter restrictions (Oud and Jansen, 1996, 2000; Singer, 2007; but cf. Hamerle, Nagl and Singer, 1991), there are many issues and competing approaches in the nonlinear field. It is presently an area of very active research because of the growing interest in finance models. The option price model of Black and Scholes (1973) relies on a stochastic differential equation (SDE) model for the underlying stock variable, and Merton's (1990) monograph on continuous finance has given the field a strong ‘continuous’ flavor. This contrasts with econometrics, where time-series methods still dominate, and with sociology, despite the tradition of Bergstrom (1976), Coleman (1968) and others.

We define the nonlinear continuous–discrete state-space model (Jazwinski, 1970, ch. 6.2) for the panel units $n = 1,\dots,N$ and the unit-specific measurement times $t_{in}$

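  • $dy_n(t) = f(y_n(t), x_n(t), \psi)\,dt + g(y_n(t), x_n(t), \psi)\,dW_n(t)$ (1)
  • $z_{in} = h(y_n(t_{in}), x_n(t_{in}), \psi) + \epsilon_{in}$ (2)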

with nonlinear drift and diffusion functions $f: \mathbb{R}^p \times \mathbb{R}^q \times \mathbb{R}^u \to \mathbb{R}^p$ and $g: \mathbb{R}^p \times \mathbb{R}^q \times \mathbb{R}^u \to \mathbb{R}^p \times \mathbb{R}^r$, respectively. $\psi \in \mathbb{R}^u$ is a u-dimensional parameter vector. The state vector $y_n(t) \in \mathbb{R}^p$ is a continuous time random process and the $x_n(t) \in \mathbb{R}^q$ are deterministic exogenous (control) variables. As usual, stochastic controls are treated by extending the state $y_n(t) \to \{y_n(t), x_n(t)\}$. The dependence on $x_n(t)$ includes the nonautonomous case $x_n(t) = t$. Person-specific random effects $\pi_n$ can be included by extending the state according to $y_n(t) \to \{y_n(t), \pi_n(t)\}$ and defining the trivial dynamics $d\pi_n = 0$.

In the measurement model (2), $h: \mathbb{R}^p \times \mathbb{R}^q \times \mathbb{R}^u \to \mathbb{R}^k$ is a measurement function mapping the latent state $y_n(t)$ onto discrete time measurements $z_{in}$, $i = 0,\dots,T_n$, $n = 1,\dots,N$. The free parameters in h may be interpreted as nonlinear factor loadings. The error terms in (1) and (2) are mutually independent Gaussian white-noise processes with zero means and covariances $E[(dW_n(t)/dt)(dW_m(s)/ds)'] = I_r\,\delta(t-s)\,\delta_{nm}$ and $\mathrm{var}(\epsilon_{in}) = R$, where $I_r$ is the r-dimensional unit matrix.

As the panel units are independent, the panel index is dropped in the sequel for simplicity of notation. For maximum likelihood estimation, one only has to sum the N likelihood contributions of each panel unit. Alternatively, using Bayesian estimation, the parameter vector is filtered with the other states, and one has to use the extended state vector η(t)={y1(t),…,yN(t),ψ(t)}.

In case of random time effects γdV(t) acting on each panel unit in the same way, the panel units are correlated and one can filter the extended state η(t)={y1(t),…,yN(t)} with random process error dW(t)={dW1(t),…,dWN(t),dV(t)}. This works both for maximum likelihood (ML) and Bayesian estimation of ψ (cf. section 6.2).

In the nonlinear case it is important to interpret the SDE (1) correctly. We use the Itô interpretation yielding simple moment equations (for a thorough discussion of the system theoretical aspects, see Arnold, 1974, ch. 10; Van Kampen, 1981; Singer, 1999, ch. 3). A strong simplification occurs when the state is completely measured at times $t_i$, i.e. $z_i = y_i = y(t_i)$. Then, only the transition density $p(y_{i+1}, t_{i+1} \mid y_i, t_i)$ must be computed in order to obtain the likelihood function (cf. Aït-Sahalia, 2002; Singer, 2006d). Unfortunately, the transition probability can be computed analytically only in some special cases (including the linear one), so in general approximation methods must be employed. As the transition density fulfils a partial differential equation (PDE), the so-called Fokker–Planck equation (cf. equation 6), approximation methods for PDEs, e.g. finite difference methods, can be used (cf. Jensen and Poulsen, 2002).

A large class of approximations rests on linearization methods which can be applied to the exact moment equations [extended Kalman filter (EKF); second-order nonlinear filter (SNF); cf. Jazwinski, 1970] or directly to the nonlinear differential equation using Itô’s lemma [local linearization (LL); Shoji and Ozaki, 1997, 1998]. As linearity is only approximate in the vicinity of a measurement or of a reference trajectory, the conditional Gaussian schemes are valid only for short measurement intervals $\Delta t_i = t_{i+1} - t_i$. Other linearization methods relate to the diffusion term, but are interpretable in terms of the EKF (Nowman, 1997).

A different class of approximations relates to the filter density. In the unscented Kalman filter (UKF; cf. Julier and Uhlmann, 1997; Julier, Uhlmann and Durrant-Whyte, 2000), the true density is replaced by a singular density with correct first and second moment, whereas the Gaussian filter (GF) assumes a normal density. Integrals in the update equations may be obtained using Gauss–Hermite quadrature [Gauss–Hermite filter (GHF); Ito and Xiong, 2000]. More generally, the density can be approximated by Gaussian sums (Gaussian sum filter) and the expectations in the moment equations are computed using the EKF, UKF or GHF (cf. Alspach and Sorenson, 1972; Ito and Xiong, 2000). Alternatively, the Monte Carlo method can be employed to obtain approximate transition densities (Pedersen, 1995; Andersen and Lund, 1997; Elerian, Chib and Shephard, 2001; Singer, 2002, 2003).

More recently, Hermite expansions of the transition density have been utilized by Aït-Sahalia (2002). In this approach, the expansion coefficients are expressed in terms of conditional moments and computed analytically by using computer algebra programs. The computations comprise the multiple action of the backward operator L on polynomials [$L = F^*$ is the adjoint of the Fokker–Planck operator (6)]. Alternatively, one can use systems of moment differential equations (Singer, 2006d) or numerical integration (Challa, Bar-Shalom and Krishnamurthy, 2000; Singer, 2006b,c). It seems that this approach is most efficient both in accuracy and computing time (cf. Aït-Sahalia, 2002, Figure 1; Jensen and Poulsen, 2002).

Figure 1.  Potential Φ(y) as a function of y for several parameter values α=−3,−2,…,1.

Nonparametric approaches attempt to estimate the drift function f and the diffusion function Ω without assumptions about a certain functional form. They typically involve kernel density estimates of conditional densities (cf. Bandi and Phillips, 2003). Other approaches utilize Taylor series expansions of the drift function and estimate the derivatives (expansion coefficients) as latent states using the LL method (similar to the SNF; Shoji, 2002). Finally, the Daum filter, an exact nonlinear continuous–discrete filtering approach, must be mentioned (Daum, 2005).

2.1 Exact continuous–discrete filter

The exact time and measurement updates of the continuous–discrete filter are given by the recursive scheme (Jazwinski, 1970) for the conditional density p(y,t|Zi) (again dropping panel index n).

2.1.1 Time update
  • image(3)
2.1.2 Measurement update
  • image(4)
  • image(5)

i=0,…,T−1, where F in

  • image(6)

is the Fokker–Planck operator, $Z^i = \{z(t_j) \mid t_j \le t_i\}$ are the observations up to time $t_i$, $y_i := y(t_i)$, and $p(z_{i+1} \mid Z^i)$ is the likelihood function of observation $z_{i+1}$. Equation (3) describes the time evolution of the conditional density $p(y,t \mid Z^i)$, given information up to the last measurement, and the measurement update is a discontinuous change caused by new information using the Bayes formula. The above scheme is exact, but can be solved explicitly only for the linear case where the filter density is Gaussian with conditional moments

  • image(7)
  • image(8)

or under conditions in the Daum filter (Daum, 2005).
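For orientation, the recursion can be written out in a standard form. The following is a sketch consistent with the definitions in this section and with the usual presentation of the continuous–discrete filter (Jazwinski, 1970); the exact typesetting and index conventions of equations (3) to (6) are an assumption here, and $\Omega := g g'$ denotes the diffusion matrix.

```latex
\begin{align}
\frac{\partial p(y,t \mid Z^i)}{\partial t}
  &= F(y,t)\, p(y,t \mid Z^i), \qquad t \in [t_i, t_{i+1}], \\
p(y_{i+1} \mid Z^{i+1})
  &= \frac{p(z_{i+1} \mid y_{i+1}, Z^i)\, p(y_{i+1} \mid Z^i)}{p(z_{i+1} \mid Z^i)}, \\
p(z_{i+1} \mid Z^i)
  &= \int p(z_{i+1} \mid y_{i+1}, Z^i)\, p(y_{i+1} \mid Z^i)\, dy_{i+1}, \\
F(y,t)\, p
  &= -\sum_{k} \frac{\partial}{\partial y_k}\bigl[f_k\, p\bigr]
     + \frac{1}{2} \sum_{k,l} \frac{\partial^2}{\partial y_k\, \partial y_l}\bigl[\Omega_{kl}\, p\bigr].
\end{align}
```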

2.2 Exact moment equations

Instead of solving the time update equations for the conditional density (3), the moment equations for the first, second and higher order moments are considered. The general vector case is discussed in Singer (2006c). Using the Euler approximation for the SDE (1), one obtains in a short time interval δt, with $\delta W(t) := W(t+\delta t) - W(t)$,

  • image(9)

Taking the expectation E[⋯|Zi] one gets the moment equation

  • image(10)

or in the limit δt[RIGHTWARDS ARROW]0

  • image(11)

The higher order central moments

  • image(12)

fulfil (scalar notation, dropping the condition)

  • image(13)

Using the binomial formula we obtain, utilizing the independence of y(t) and δW(t)

  • image(14)
  • image(15)

as $c := \delta W(t)$ is a Gaussian process. For example, the second moment (variance) $m_2 = \sigma^2$ fulfils

  • image(16)
  • image
  • image(17)

Inserting the first moment (10) and setting $a := (y - E(y)) + (f - E(f))\,\delta t := y^* + f^*\,\delta t$ one obtains

  • image(18)

In general, up to O(δt) we have ($M_k := (y - \mu)^k$)

  • image(19)

The exact moment equations (11, 19) are not differential equations, as they depend on the unknown conditional density p(y,t|Zi). Using Taylor expansions or approximations of the conditional density one obtains several filter algorithms.

2.3 Continuous-discrete filtering scheme

Using only the first and second moment equation (11, 18), and the optimal linear update (normal correlation) one obtains the recursive scheme ($A^-$ is the generalized inverse of $A$)

2.3.1 Initial condition: t=t0
  • image

i=0,…,T−1.

2.3.2 Time update: t ∈ [ti,ti+1]
  • image
2.3.3 Measurement update: t=ti+1
  • image
  • image

Remarks. 

  1. The time update for the interval $t \in [t_i, t_{i+1}]$ was written using time slices of width δt. They must be chosen small enough to yield a good approximation for the moment equations (11, 18).
  2. The measurement update is written using the theorem on normal correlation (Liptser and Shiryayev, 1977, 1978, ch. 13, Thm 13.1, Lem. 14.1)
    • image(20)
    • image(21)
    Inserting the measurement equation (2) one obtains the measurement update of the filter. The formula is exact for Gaussian variables and the optimal linear estimate for μ(ti+1|ti+1),Σ(ti+1|ti+1) in the non-Gaussian case. It is natural to use, if only two moments are considered. Despite the linearity in zi+1, it still contains the measurement nonlinearities in the expectations involving h(y,t). Alternatively, the Bayes formula (4) can be evaluated directly. This is necessary, if strongly nonlinear measurements are taken (e.g. the threshold mechanism for ordinal data; see section 7).
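The normal correlation update in Remark 2 can be sketched in code. The block below assumes the standard form of the theorem (conditional mean = prior mean + gain × innovation, with gain Cov(y,z)·Var(z)⁻); the function name and argument names are hypothetical, and the Moore–Penrose pseudoinverse stands in for the generalized inverse.

```python
import numpy as np

def normal_correlation_update(mu_y, S_yy, mu_z, S_yz, S_zz, z):
    """Optimal linear update of the mean and covariance of y given a measurement z.

    mu_y, S_yy : a priori mean and covariance of the state
    mu_z, S_zz : predicted mean and covariance of the measurement
    S_yz       : cross covariance Cov(y, z)
    """
    K = S_yz @ np.linalg.pinv(S_zz)      # gain, using a generalized inverse
    mu_post = mu_y + K @ (z - mu_z)      # a posteriori mean
    S_post = S_yy - K @ S_yz.T           # a posteriori covariance
    return mu_post, S_post

# Linear Gaussian check: z = y + eps with y ~ N(0, 1) and eps ~ N(0, R), R = 1
mu_y = np.array([0.0])
S_yy = np.array([[1.0]])
R = np.array([[1.0]])
mu_post, S_post = normal_correlation_update(
    mu_y, S_yy, mu_y, S_yy, S_yy + R, np.array([2.0])
)
print(mu_post, S_post)
```

In a nonlinear filter the inputs mu_z, S_yz and S_zz would be the expectations involving h(y,t), computed by whichever approximation (Taylor expansion, sigma points, quadrature) is in use.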

The approximation of the expectation values containing the unknown filter density leads to several well-known algorithms:

3 Filter approximations based on Taylor expansion

3.1 Extended Kalman filter

Using Taylor expansions around the conditional mean μ(τj|ti) for the nonlinear functions in the filtering scheme, one obtains

  • image(22)
  • image(23)
  • image(24)

Expanding around μ(ti+1|ti), the measurement update is approximately

  • image(25)
  • image(26)
  • image(27)
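A minimal sketch of an EKF time update for a scalar state follows. Since equations (22) to (24) are not reproduced here, the block assumes the usual first-order form, dμ/dt = f(μ) and dΣ/dt = 2 f′(μ) Σ + Ω(μ), integrated with Euler slices of width δt; the function and argument names are hypothetical.

```python
def ekf_time_update(mu, var, f, df, omega, t0, t1, dt=0.01):
    """Propagate the conditional mean and variance from t0 to t1 (scalar state).

    f, df : drift and its derivative; omega : diffusion function g**2.
    Uses Euler slices; dt must be small relative to the dynamics.
    """
    n = max(1, round((t1 - t0) / dt))
    h = (t1 - t0) / n
    for _ in range(n):
        mu_new = mu + f(mu) * h                            # first-order mean equation
        var = var + (2.0 * df(mu) * var + omega(mu)) * h   # linearized variance equation
        mu = mu_new
    return mu, var

# Example with a cubic drift f(y) = -(alpha*y + beta*y**3) and constant diffusion
alpha, beta, sigma = -1.0, 0.1, 2.0   # illustrative parameter values
mu1, var1 = ekf_time_update(
    1.0, 0.5,
    f=lambda y: -(alpha * y + beta * y**3),
    df=lambda y: -(alpha + 3.0 * beta * y**2),
    omega=lambda y: sigma**2,
    t0=0.0, t1=0.5,
)
print(mu1, var1)
```

Because the Jacobian is re-evaluated at the current mean in every slice, the mean and variance equations stay coupled, which is what distinguishes this update from the local linearization approach.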

3.2 Second-order nonlinear filter

Expanding up to second order one obtains (using short notation and dropping third moments)

  • image(28)
  • image(29)
  • image(30)
  • image(31)
  • image(32)
  • image(33)

where

  • image

etc. Expanding to higher orders in the HNF yields moments of order k>2 on the right-hand side, which must be dropped or factorized by the Gaussian assumption $m_k = (k-1)!!\,\sigma^k$, k even, $m_k = 0$, k odd. (For details, see Jazwinski, 1970, or Singer, 2006d.)

3.3 Local linearization

A related algorithm occurs if the drift is expanded directly in SDE (1). Using Itô’s lemma one obtains

  • image(34)

Freezing the coefficients at $(y_i, t_i)$ and using a state-independent diffusion coefficient Ω(s), Shoji and Ozaki (1997, 1998) obtained the linearized SDE ($t_i \le t \le t_{i+1}$)

  • image

The corresponding moment equations are

  • image(35)
  • image(36)

In contrast to the EKF and SNF moment equations, which form a system of nonlinear differential equations, the Jacobians are evaluated once at the measurements $(y_i, t_i)$ and the differential equations are linear and not coupled (for details, cf. Singer, 2002).

4 Filter approximations based on numerical integration

The traditional way of nonlinear filtering has been the expansion of the system functions f, Ω and h. Another approach is the approximation of the filtering density p(y|Zi).

4.1 Unscented Kalman filtering

The idea of Julier and Uhlmann (1997) was the definition of so-called sigma points with the property that the weighted mean and variance over these points coincide with the true parameters. According to Julier et al. (2000) one can take the 2p+1 points

  • image(37)
  • image(38)
  • image(39)

with weights

  • image(40)
  • image(41)

where $\Gamma_{.l}$ is the lth column of the Cholesky root of $\Sigma = \Gamma\Gamma'$, κ a scaling factor and p the dimension of the random vector X. For example, in the univariate case p=1 one obtains the three points $\mu,\ \mu \pm \sqrt{1+\kappa}\,\sigma$.

The unscented transform (UT) may be interpreted in terms of the singular density

  • image(42)

Then, however, only non-negative weights αl are admissible. Generally, the expectation

  • image

and

  • image(43)
  • image(44)
  • image(45)

yields the correct first and second moment. Nonlinear expectations are easily evaluated as sums

  • image(46)
  • image(47)
  • image(48)

Using large κ, the EKF formula ETaylor[f(X)]=f(μ) is recovered.
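The sigma point construction can be sketched as follows. The block assumes the usual parameterization of Julier et al. (2000), i.e. points μ and μ ± √(p+κ)·Γ.l with weights κ/(p+κ) and 1/(2(p+κ)); since equations (37) to (41) are not reproduced here, this matching is an assumption, and the function names are hypothetical.

```python
import numpy as np

def sigma_points(mu, Sigma, kappa):
    """Return the 2p+1 sigma points (as rows) and their weights."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    p = mu.size
    Gamma = np.linalg.cholesky(Sigma)          # Sigma = Gamma @ Gamma.T
    offsets = np.sqrt(p + kappa) * Gamma.T     # row l = sqrt(p+kappa) * column l of Gamma
    points = np.vstack([mu, mu + offsets, mu - offsets])
    weights = np.full(2 * p + 1, 1.0 / (2.0 * (p + kappa)))
    weights[0] = kappa / (p + kappa)
    return points, weights

def ut_expectation(fun, points, weights):
    """Approximate E[fun(X)] as a weighted sum over the sigma points."""
    values = np.array([fun(y) for y in points])
    return weights @ values

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
pts, w = sigma_points(mu, Sigma, kappa=1.0)
mean = ut_expectation(lambda y: y, pts, w)
cov = (w[:, None] * (pts - mean)).T @ (pts - mean)
print(mean)   # reproduces mu
print(cov)    # reproduces Sigma
```

For negative κ the central weight becomes negative, which is the case excluded by the singular density interpretation above.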

All expectations in the filter are evaluated using the sigma points computed from the conditional moments μ(τj|ti),Σ(τj|ti). To display the dependence on the moments, the notation yl=yl(μ,Σ) will be used. For example, the terms in the time update are (short notation dropping arguments)

  • image(49)
  • image(50)
  • image(51)

with sigma points $y_l = y_l(\mu(\tau_j|t_i), \Sigma(\tau_j|t_i))$. With the new moments $\mu(\tau_{j+1}|t_i)$, $\Sigma(\tau_{j+1}|t_i)$, updated sigma points are computed.
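One time slice of such an update might look as follows for a scalar state. This is a sketch under assumptions: it uses the scalar moment equations in the form μ ← μ + E[f]δt and Σ ← Σ + (2 cov[f, y] + E[Ω])δt, with all expectations taken over the three sigma points; the function name and signature are hypothetical, and equations (49) to (51) are not reproduced here.

```python
import math

def ukf_time_step(mu, var, f, omega, kappa, dt):
    """One Euler slice of the first two moment equations (scalar state)."""
    s = math.sqrt((1.0 + kappa) * var)
    pts = [mu, mu + s, mu - s]                                  # sigma points for p = 1
    w = [kappa / (1.0 + kappa), 0.5 / (1.0 + kappa), 0.5 / (1.0 + kappa)]
    Ef = sum(wl * f(y) for wl, y in zip(w, pts))                # E[f(y)]
    cov_fy = sum(wl * (f(y) - Ef) * (y - mu) for wl, y in zip(w, pts))  # cov[f(y), y]
    Eomega = sum(wl * omega(y) for wl, y in zip(w, pts))        # E[Omega(y)]
    return mu + Ef * dt, var + (2.0 * cov_fy + Eomega) * dt

# Propagate over one measurement interval with a cubic drift (illustrative values)
alpha, beta, sigma = -1.0, 0.1, 2.0
mu, var = 1.0, 0.5
for _ in range(100):
    # New sigma points are recomputed from the updated moments in each slice.
    mu, var = ukf_time_step(mu, var,
                            f=lambda y: -(alpha * y + beta * y**3),
                            omega=lambda y: sigma**2,
                            kappa=2.0, dt=0.01)
print(mu, var)
```

No derivative of the drift appears; the nonlinearity enters only through function evaluations at the sigma points.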

4.2 Gauss–Hermite filtering

For the Gaussian filter, one may assume that the true p(x) is approximated by a Gaussian distribution φ(x;μ,σ2) with the same mean μ and variance σ2. Then, the Gaussian integral

  • image(52)
  • image(53)

may be approximated by Gauss–Hermite quadrature (cf. Ito and Xiong, 2000). The ζl and wl are quadrature points and weights, respectively for the standard Gaussian distribution φ(z;0,1). If such an approximation is used, one obtains the GHF. Generally, filters using Gaussian densities are called GF. The GHF can be interpreted in terms of the singular density

  • image

concentrated at the quadrature points $\xi_l$. The Gauss–Hermite quadrature rule with m points is exact up to order $O(x^{2m-1})$. Multivariate Gaussian integrals can be computed by transforming to the standard normal distribution and p-fold application of (52). The Gaussian filter is equivalent to an expansion of f to higher orders L

  • image(54)

(HNF(2,L)) and factorization of the moments according to the Gaussian assumption

  • image

This leads to an exact computation of (52) for $L \to \infty$. In this limit, HNF and GF coincide. In EKF = HNF(2,1) and SNF = HNF(2,2), the higher order corrections are neglected. Moreover, third and higher order moments could be used [HNF(K,L); cf. Singer, 2006d]. It is interesting that κ=2, p=1 in the UT corresponds to a Gauss–Hermite rule with m=3 sample points (Ito and Xiong, 2000).
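A Gaussian expectation by Gauss–Hermite quadrature can be sketched with NumPy. The block uses `numpy.polynomial.hermite_e.hermegauss`, whose nodes and weights refer to the weight function exp(−z²/2), so the weights are divided by √(2π) to obtain weights for the standard normal density; the function name `gauss_hermite_expectation` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gauss_hermite_expectation(fun, mu, sigma, m):
    """Approximate E[fun(X)] for X ~ N(mu, sigma**2) with an m-point rule.

    fun must accept a NumPy array of evaluation points.
    """
    zeta, w = hermegauss(m)           # nodes and weights for exp(-z**2 / 2)
    w = w / np.sqrt(2.0 * np.pi)      # normalize: weights now sum to one
    return float(np.sum(w * fun(mu + sigma * zeta)))

# The m = 3 rule integrates polynomials up to order 2m - 1 = 5 exactly.
print(gauss_hermite_expectation(lambda x: x**4, 0.0, 1.0, 3))   # fourth moment of N(0,1)
print(gauss_hermite_expectation(lambda x: x**2, 1.0, 2.0, 3))   # mu**2 + sigma**2

# The m = 3 nodes and weights coincide with the UT sigma points for kappa = 2, p = 1.
zeta3, w3 = hermegauss(3)
print(zeta3, w3 / np.sqrt(2.0 * np.pi))
```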

4.3 Generalized Gauss–Hermite filtering

The Gaussian filter assumed a Gauss density φ(y;μ,σ2) for the filter distribution p(y). More generally, one can use a Hermite expansion

  • image(55)

with Fourier coefficients c0=1,c1=0,c2=0,

  • image(56)
  • image(57)

$Z := (Y - \mu)/\sigma$ and orthogonal polynomials $H_0 = 1$, $H_1 = y$, $H_2 = y^2 - 1$, $H_3 = y^3 - 3y$, $H_4 = y^4 - 6y^2 + 3$, etc. Expectation values occurring in the update equations are again computed by Gauss–Hermite integration, including the non-Gaussian term

  • image(58)

As $H(y; \{\mu, m_2, \dots, m_K\})$ depends on higher order moments, one must use K moment equations (19). The choice K=2 recovers the usual Gaussian filter, as $c_0 = 1$, $c_1 = 0$, $c_2 = 0$. The Hermite expansion can model bimodal, skewed, and leptokurtic distributions. (For details, see Singer, 2006b,c,d and Section 7.)

Related algorithms have been developed by Srinivasan (1970) and Challa et al. (2000), but we formulate the time update as integro-differential equations, solved stepwise by using Gauss–Hermite integration. Moreover, computation of the measurement update (Bayes formula) is improved. We use the normal correlation update as Gaussian weight function in the Gauss–Hermite quadrature to achieve higher numerical accuracy. The a posteriori moments are obtained directly without an iterative procedure.

In contrast, Challa et al. (2000) use truncated moment equations in the time update (higher order moments are set to zero) and (iterated) EKF measurement updates as approximate means and variances for the posterior Hermite series (p. 3400). Moreover, only linear system equations are considered.

In the proposed approach, the time update equations are closed by explicit integration over the given Hermite expansion of order K. Therefore, for K=2 we obtain the GHF as a special case, whereas Challa et al. (2000; p. 3399) obtain the EKF. Thus, as in the GHF, higher order moments are not neglected but approximated through the approximate Hermite density.

5 Discussion

  1. The density-based filters UKF and GHF have the strong advantage that no derivatives of the system functions must be computed. This is no problem for the EKF and SNF, but for higher orders in the HNF(K,L) complicated tensor expressions arise. Moreover, higher order moments must be dropped or factorized in order to obtain closed equations. In the multivariate case, the formulas for Gaussian moments are involved. The fourth moment is
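    • $E[y^*_i y^*_j y^*_k y^*_l] = \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}$, with $y^* := y - \mu$ (59)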
    For a general formula, see Gardiner (1996, p. 36).
  2. Apart from the implementation point of view [see (1)], the low-order EKF and SNF suffer from problems such as filter divergences, especially when the sampling intervals are large. Simulation studies suggest that the UKF and GHF are more stable and yield smaller filtering error in the mean (Singer, 2006a).
  3. The moment equations and measurement updates as derived in section 2.3 involve expectations with respect to the filter density p(y), but not with respect to the noise processes. Their statistics are already included in these updates [the terms $E[\Omega]\,dt = E[g\,dW (g\,dW)']$ and $R = \mathrm{var}(\epsilon)$ stem from the noise sequences]. Thus no sigma points w.r.t. the noises must be computed, as suggested in the literature on the UKF (Julier et al., 2000; Sitz, Schwarz and Kurths, 2002a; Sitz et al., 2000b). Such noise sigma points arise only if the system is first modeled deterministically and afterwards extended by the noises, which is neither necessary nor efficient.

6 Example: bifurcation system

6.1 Model without time effect

Several filtering algorithms can be used to compute the likelihood for each panel unit and the sum of all likelihood contributions is maximized. We study the nonlinear system

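  • $dy_n(t) = -[\alpha\, y_n(t) + \beta\, y_n(t)^3]\,dt + \sigma\, dW_n(t)$ (60)

with measurement equation

  • $z_{in} = y_n(t_{in}) + \epsilon_{in}$ (61)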

measured at times $t_i \in \{0, 4, 6, 8, 10, 11, 12, 13.5, 13.7, 15, 15.1, 17, 19, 20\}$. The measurement times could be different for each panel unit. The random initial condition was $y_n(t_0) \sim N(0, 10)$. The nonlinear drift $f(y) = -[\alpha y + \beta y^3]$ is the negative gradient of a potential $\Phi(y) = (1/2)\alpha y^2 + (1/4)\beta y^4$ and the motion may be visualized as a Brownian motion in the landscape defined by Φ(y) (Figure 1). The stationary density is given by $p_{stat} \propto \exp[-(2/\sigma^2)\Phi(y)]$ (Figure 2). For β>0, the potential has two minima and one maximum (α<0) or one minimum (α≥0). Such a qualitative change following a continuous variation of a parameter is called a bifurcation (Figure 3).
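The bifurcation can be checked with a few lines of code. The sketch below uses the potential and drift as given in the text; the function names and the dictionary layout of the result are hypothetical. For α<0 and β>0 the stationary points follow from Φ′(y) = αy + βy³ = 0, i.e. y = 0 and y = ±√(−α/β).

```python
import math

def potential(y, alpha, beta):
    """Phi(y) = (1/2) alpha y^2 + (1/4) beta y^4."""
    return 0.5 * alpha * y**2 + 0.25 * beta * y**4

def drift(y, alpha, beta):
    """f(y) = -dPhi/dy = -(alpha y + beta y^3)."""
    return -(alpha * y + beta * y**3)

def extrema(alpha, beta):
    """Minima and maxima of Phi for beta > 0."""
    if alpha < 0:
        y_min = math.sqrt(-alpha / beta)
        return {"minima": [-y_min, y_min], "maxima": [0.0]}
    return {"minima": [0.0], "maxima": []}

for alpha in (-3, -2, -1, 0, 1):     # parameter values shown in Figures 1 and 2
    print(alpha, extrema(alpha, beta=0.1))
```

With the true values α=−1, β=0.1 used in the simulation study, the two stable states lie at ±√10.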

Figure 2.  Stationary density pstat∝ exp [−(2/σ2)Φ(y)] as a function of y for several parameter values α=−3,−2,…,1.

Figure 3.  Minima and maxima of Φ(y) as a function of α (bifurcation diagram).

The model is interesting from both a theoretical and an application point of view: The density function strongly deviates from Gaussian behavior, at least in the bimodal state α<0, β>0. Thus it is a good test for filters relying on only two moments. For applications, it can model systems with two stable states with a sudden transition to only one equilibrium. It has been used for phase transitions (Ginzburg–Landau theory; cf. Haken, 1977, chap. 6.4, 6.7–8), stability of engineering systems (Frey, 1996), and equilibrium states in economics (Herings, 1996).

Figure 4 shows the true trajectory, the measurements and approximate 67% highest probability density (HPD) confidence intervals (conditional mean ± standard deviation) for all panel units n=1,…,10 using the UKF(κ=2) = GHF(m=3) and the true parameter vector ψ = {α, β, σ, R} = {−1, 0.1, 2, 1}. Figure 5 displays the true trajectory, the measurements, and 67% HPD confidence intervals for panel unit n=10 using several filter algorithms. It can be seen that some algorithms such as the SNF exhibit divergences in the first measurement interval [0,4] when the conditional mean approaches zero. Simulation studies demonstrate that the EKF, SNF, and LL are prone to such numerical instabilities (cf. Singer, 2006a). Higher order expansion of the drift can avoid such singularities. As noted, the Gaussian filter corresponds to an infinite Taylor expansion with Gaussian factorization of the moments (cf. eqn. 54). Generally, the density-based UKF and GHF are numerically more stable than EKF and SNF and lead to smaller filtering error.

Figure 4.  UKF(κ=2) = GHF(m=3) for all N=10 panel units. True trajectory, measurements (dots), and 67% HPD band.

Figure 5.  Panel unit n=10: comparison of several filter algorithms.

The performance of the filters was compared in a simulation study (see Table 1), where the ML estimates of ψ were computed for M=100 replications and N=10 panel units. The data were simulated by using an N-fold repetition of an Euler–Maruyama scheme with time step δt=0.1, i.e. $y_{j+1} = y_j + f(y_j)\,\delta t + g(y_j)\,\delta W_j$; $j = 0,\dots,t_T/\delta t - 1$; $\delta W_j \sim N(0, \delta t)$ i.i.d., $y_0 \sim N(0, 10)$ i.i.d., and the data $z_i$ were selected at the indices $j_i = t_i/\delta t$ according to $z_i = y_{j_i} + \epsilon_i$; $\epsilon_i \sim N(0, R)$ i.i.d.
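The data generation for one panel unit can be sketched as follows. The block is an illustration, not the code used for the study: the function name and signature are hypothetical, N(0,10) is read as mean 0 and variance 10 (an assumption), and the measurement indices are rounded to the nearest integer to guard against floating point error in t_i/δt.

```python
import numpy as np

def simulate_unit(alpha, beta, sigma, R, times, dt=0.1, rng=None):
    """Simulate one latent trajectory and its sampled, noisy measurements."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(times[-1] / dt))
    y = np.empty(n_steps + 1)
    y[0] = rng.normal(0.0, np.sqrt(10.0))             # assumed: variance 10
    for j in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))             # Wiener increment, variance dt
        y[j + 1] = y[j] - (alpha * y[j] + beta * y[j] ** 3) * dt + sigma * dW
    idx = np.rint(np.asarray(times) / dt).astype(int)  # indices j_i = t_i / dt
    z = y[idx] + rng.normal(0.0, np.sqrt(R), size=idx.size)
    return y, z

times = [0, 4, 6, 8, 10, 11, 12, 13.5, 13.7, 15, 15.1, 17, 19, 20]
y, z = simulate_unit(-1.0, 0.1, 2.0, 1.0, times, rng=np.random.default_rng(0))
print(y.shape, z.shape)
```

A panel with N units would call the function N times with independent random streams.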

Table 1.   Simulation study for bifurcation model.

Parameter   Value   Mean          SD            Bias          RMSE

EKF, M=100
α           −1.0    −0.254169     0.413799      0.745831      0.852932
β           0.1     0.0658808     0.0730564     −0.0341192    0.080631
σ           2.0     2.01777       0.638036      0.017767      0.638283
R           1.0     0.951857      0.444057      −0.0481435    0.446659

SNF, M=84
α           −1.0    −0.119926     0.0953236     0.880074      0.885222
β           0.1     0.0231572     0.00795955    −0.0768428    0.0772539
σ           2.0     1.46765       0.215205      −0.53235      0.574203
R           1.0     1.1505        0.399459      0.150497      0.426869

LL, M=65
α           −1.0    −0.0885385    0.117338      0.911461      0.918983
β           0.1     0.0163748     0.00790206    −0.0836252    0.0839977
σ           2.0     1.48073       0.246138      −0.519273     0.574655
R           1.0     1.25062       0.370842      0.250619      0.447586

UKF, κ=0, M=99
α           −1.0    −0.712736     0.95255       0.287264      0.994923
β           0.1     0.076122      0.0853785     −0.023878     0.0886547
σ           2.0     1.79875       0.403944      −0.201246     0.451299
R           1.0     1.07417       0.442         0.0741727     0.44818

UKF, κ=1, M=96
α           −1.0    −1.09344      1.00893       −0.0934443    1.01325
β           0.1     0.102081      0.086685      0.0020806     0.08671
σ           2.0     2.11663       0.439907      0.116632      0.455106
R           1.0     0.984557      0.450917      −0.0154427    0.451182

UKF, κ=2, M=93
α           −1.0    −1.51408      1.19571       −0.514084     1.30154
β           0.1     0.124142      0.0898191     0.024142      0.093007
σ           2.0     2.46263       0.489341      0.462625      0.673407
R           1.0     0.873614      0.437204      −0.126386     0.455105

GHF, m=4, M=96
α           −1.0    −1.4521       1.14993       −0.4521       1.23561
β           0.1     0.122561      0.0894279     0.0225606     0.0922298
σ           2.0     2.39497       0.440323      0.394971      0.591511
R           1.0     0.891953      0.433076      −0.108047     0.446351

Notes: Distribution of ML estimates in M=100 replications. M = number of converged samples.

The likelihood was maximized using a quasi-Newton algorithm with numerical score function and Broyden, Fletcher, Goldfarb, Shanno (BFGS) secant updates (Dennis and Schnabel, 1983). In terms of root mean square error (RMSE), the algorithms are comparable. There is a tradeoff between bias and variance of the estimates. For example, the estimates for α in the Taylor-based methods EKF, SNF and LL are strongly biased, but the standard deviation is smaller than that for the density-based algorithms. However, because of the long sampling intervals, EKF, SNF, and LL tend to diverge, and large conditional means $|\mu(t|Z^i)| > Y_{MAX} = 1000$ were reset to zero. Overall, there is no algorithm with clear advantages, although UKF and GHF are more stable. The UKF furthermore has the problem of choosing the scaling parameter κ. It seems, in this example, that κ=0 yields the smallest RMSE, whereas the bias is minimized for κ=1.
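The bias–variance tradeoff rests on the decomposition RMSE² = bias² + SD². A minimal sketch of the summary statistics follows; the function name is hypothetical, the toy estimates are made up for illustration, and the use of the population standard deviation (divisor M) is an assumption under which the decomposition holds exactly.

```python
import numpy as np

def summarize(estimates, true_value):
    """Mean, SD, bias and RMSE of a set of parameter estimates."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    sd = est.std()                                   # population SD (ddof=0), assumed
    bias = mean - true_value
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return mean, sd, bias, rmse

# Toy estimates (hypothetical numbers, not from the study)
mean, sd, bias, rmse = summarize([-1.5, -0.5, 0.0], true_value=-1.0)
print(rmse**2, bias**2 + sd**2)                      # the two sides agree

# The reported EKF row for alpha is consistent with the decomposition:
print(np.hypot(0.745831, 0.413799))                  # compare with RMSE 0.852932
```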

6.2 Model with time effects

As mentioned in section 2, models with random time effects of the form gdWn(t)+γdV(t) can be estimated using the extended state η(t) = {y1(t),…,yN(t)} with random process error dW(t) = {dW1(t),…,dWN(t),dV(t)}. The specification is

  • image(62)

where

  • image(63)
  • image(64)

$I_N$: N×N is the unit matrix and $1_N$: N×1 is a vector of ones. The diffusion matrix Ω yields an equicorrelation structure between the panel units, but more general specifications are possible. On the other hand, the filtering problem is of dimension p·N, which is feasible only for small panels. For linear systems, simplifications are possible as the variance equation of the Kalman filter preserves the equicorrelation form of the conditional variances $\Sigma(t_i|t_i)$ and $\Sigma(t_{i+1}|t_i)$. For nonlinear systems, the equicorrelation form is not preserved as the drift terms $\mathrm{cov}[F(\eta(\tau_j),\tau_j), \eta(\tau_j) \mid Z^i]$ in the variance equation (cf. section 2.3) contain different elements. Figure 6 depicts the correlated movement of two panel units caused by the joint random shocks γdV(t). Individual, time-independent random effects $\pi_n$ could be joined to the state η as well.
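The equicorrelation structure can be illustrated numerically. The sketch assumes scalar units (p=1) and a diffusion coefficient of the stacked system of the form G = [σ·I_N, γ·1_N], so that Ω = GG′ = σ²I_N + γ²1_N1_N′; this form is inferred from the description above rather than copied from equations (62) to (64), and the function name is hypothetical.

```python
import numpy as np

def equicorrelated_diffusion(N, sigma, gamma):
    """Diffusion matrix of N scalar units with a common time effect."""
    G = np.hstack([sigma * np.eye(N), gamma * np.ones((N, 1))])  # N x (N+1)
    return G @ G.T                                               # sigma^2 I + gamma^2 11'

Omega = equicorrelated_diffusion(3, sigma=1.0, gamma=1.0)
print(Omega)
# Correlation between any two units: gamma^2 / (sigma^2 + gamma^2)
print(Omega[0, 1] / Omega[0, 0])
```

The off-diagonal elements are all equal to γ², which is the equicorrelation form referred to in the text.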


Figure 6.  Panel units n=1,2: correlated movement due to joint time effects dV(t).


In a second simulation study (Table 2), the error term was split into the effects σ dWn(t) + γ dV(t) with true values σ=1 and γ=1. Thus, the diagonal of Ω is σ² + γ² = 2. The main difference from the previous study is that many more divergences occurred for the Taylor-based methods. In particular, the SNF diverged in all cases and LL converged in only M=26 samples. These instabilities can be traced to the moment equations (11, 18), where second-order corrections in the μ equation lead to divergences in the variance equation. These can be compensated by a third-order Taylor expansion and subsequent Gauss factorization of the higher order moments. The equicorrelation part of the moment equations caused by the time effect seems to amplify the divergence problems. In the particle filters UKF and GHF, these problems are avoided by numerical integration of the moment equations (Sections 4.1 and 4.2). Here, numerical problems in some samples result from an indefinite covariance matrix. The UKF performs best for the parameters α and β. Unfortunately, the good results for LL are based on only 26 converged replications.
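The data-generating mechanism with a joint time effect can be sketched as follows. This is an illustration under stated assumptions, not the code used for Table 2: the drift is assumed to have the Ginzburg–Landau form −(αy + βy³) with the true values α=−1, β=0.1, the discretization is an Euler–Maruyama scheme with δt=0.1, and the function name, the number of steps, and the seed are hypothetical.

```python
import numpy as np

def simulate_panel(n_units=2, n_steps=2000, dt=0.1,
                   alpha=-1.0, beta=0.1, sigma=1.0, gamma=1.0, seed=0):
    """Euler-Maruyama paths of dy_n = -(alpha*y_n + beta*y_n^3)dt + sigma dW_n + gamma dV."""
    rng = np.random.default_rng(seed)
    y = np.zeros((n_steps + 1, n_units))
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_units)  # unit-specific shocks
        dV = rng.normal(0.0, np.sqrt(dt))                # shock shared by all units
        drift = -(alpha * y[k] + beta * y[k] ** 3)       # assumed drift form
        y[k + 1] = y[k] + drift * dt + sigma * dW + gamma * dV
    return y

paths = simulate_panel()
increments = np.diff(paths, axis=0)
corr = np.corrcoef(increments[:, 0], increments[:, 1])[0, 1]
print("correlation of increments between units 1 and 2:", corr)
```

Setting `gamma=0.0` removes the joint shock, so the increments of different units become uncorrelated apart from sampling error.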

Table 2.   Simulation study for bifurcation model with time effect γ dV(t).
Parameter   Value   Mean         SD           Bias             RMSE

EKF, M=91
α           −1.0    −0.740402    0.667606     0.259598         0.716302
β           0.1     0.103961     0.0977744    0.00396058       0.0978545
σ           1.0     1.58316      0.843068     0.583157         1.0251
R           1.0     0.551574     0.571612     −0.448426        0.726517
γ           1.0     0.918124     0.42866      −0.0818756       0.436409

LL, M=26
α           −1.0    −0.282938    0.123682     0.717062         0.72765
β           0.1     0.0281144    0.00915889   −0.0718856       0.0724667
σ           1.0     0.777595     0.216964     −0.222405        0.310705
R           1.0     1.00824      0.233077     0.00823683       0.233223
γ           1.0     0.685345     0.243837     −0.314655        0.398076

UKF, κ=0, M=98
α           −1.0    −1.28558     0.822361     −0.285579        0.870536
β           0.1     0.109212     0.0695766    0.00921181       0.0701838
σ           1.0     1.71971      0.490824     0.71971          0.871143
R           1.0     0.74936      0.292858     −0.25064         0.385469
γ           1.0     1.06027      0.554891     0.0602654        0.558154

UKF, κ=1, M=98
α           −1.0    −1.21012     0.784112     −0.210116        0.811776
β           0.1     0.10304      0.066603     0.00304034       0.0666724
σ           1.0     1.69034      0.521632     0.690339         0.865256
R           1.0     0.759783     0.299567     −0.240217        0.383985
γ           1.0     1.07246      0.547039     0.0724573        0.551817

UKF, κ=2, M=97
α           −1.0    −1.17315     0.75184      −0.173153        0.771521
β           0.1     0.0994064    0.0632362    −0.000593581     0.063239
σ           1.0     1.69412      0.522888     0.69412          0.869031
R           1.0     0.76752      0.296705     −0.23248         0.376936
γ           1.0     1.04245      0.612931     0.0424489        0.6144

GHF, m=4, M=94
α           −1.0    −1.74723     1.04536      −0.747233        1.28496
β           0.1     0.15293      0.0868995    0.0529299        0.10175
σ           1.0     1.58309      0.515688     0.583086         0.778411
R           1.0     0.766745     0.262719     −0.233255        0.351325
γ           1.0     1.15134      0.501923     0.151341         0.524243

Notes: Distribution of ML estimates in M=100 replications. M = number of converged samples. The SNF diverged in all cases.

For the error parameters σ, γ and R, the GHF performs best. Most surprising is the good performance of the EKF (lowest RMSE for α). Overall, the time effect leads to a higher RMSE of the estimates for the variance components and to more convergence problems.
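The columns of Table 2 are linked by the usual decomposition RMSE² = SD² + Bias², with Bias = Mean − Value. As a worked check (a sketch; the helper name is hypothetical), the α rows for the EKF and the GHF can be recomputed from the reported mean and standard deviation:

```python
import math

def bias_and_rmse(true_value, mean, sd):
    """Bias and RMSE of an estimator from its simulated mean and SD."""
    bias = mean - true_value
    rmse = math.sqrt(sd ** 2 + bias ** 2)
    return bias, rmse

# alpha, EKF (M=91): value -1.0, mean -0.740402, SD 0.667606
print(bias_and_rmse(-1.0, -0.740402, 0.667606))
# alpha, GHF (m=4, M=94): value -1.0, mean -1.74723, SD 1.04536
print(bias_and_rmse(-1.0, -1.74723, 1.04536))
```

The recomputed values agree with the Bias and RMSE columns up to rounding, which makes the bias–variance tradeoff between the EKF (smaller SD) and the GHF (larger SD and bias for α) explicit.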

7 Example: ordinal measurements

The nonlinear state-space model (1, 2) is flexible enough to model ordinal measurements via the threshold model

  • image(65)

where θ is the Heaviside step function and cj are thresholds contained in the parameter vector ψ. The variance of the measurement error R=var(ε) is taken to be small (10⁻⁶ here), so that the measurement density

  • image

is proportional to the indicator function χ_{h⁻¹(z)}(y). Now the measurements are strongly nonlinear and the a posteriori density is proportional to the a priori density truncated by the windows C_j = (c_j, c_{j+1}] defined from the thresholds c = {−∞,c1,…,cJ,∞}. Figure 7 shows the trajectory of panel unit n=10 together with the thresholds {c1,c2}={−2,2} and the ordinal data z(t) ∈ {−1,0,1} (setting z0=−1). The data were generated as in section 6 using an Euler–Maruyama scheme with δt=0.1. They were filtered using the GGHF, comparing the normal correlation update with the Bayes update. As explained in section 4.3, the filter density is represented by the Hermite expansion φ(y;μ,σ²)H(y,K), and the measurement update is obtained either by the normal correlation (20, 21) or by the Bayes formula (4). In both cases, Gauss–Hermite integration can be used. Denoting the linear estimates (20, 21) by μ0 and Σ0, the normal correlation update is given by the product


Figure 7.  Panel unit n=10: latent trajectory (grey) and ordinal measurements (black) with thresholds {c1,c2}={−2,2}. The values of z(t) are −1,0,1.


  • image(66)
  • image(67)
  • image(68)

In the equations above, the a posteriori distribution is non-Gaussian due to the Hermite part H(y;{μ,m2,…,mK}), where {μ,m2,…,mK} are the a priori moments. For strongly nonlinear measurements, the Bayes formula yields the exact expression

  • image(69)
  • image(70)

The likelihood integral contains the a priori Gauss part φ(y;μ,σ²), but the efficiency of the Gauss–Hermite quadrature can be improved by integrating over the normal correlation update φ(y;μ0,Σ0) (analogously to importance sampling). Figures 8 and 9 compare the updates in the case K=2 (Gaussian filter). The densities are always Gaussian, but the a posteriori moments are either the linear estimates or are computed using the Bayes formula. The latter method yields better updates, which approximate the truncated Gaussian a posteriori densities more closely. Note that the measurement density p(z|y)=φ(z;h(y),R), which does not integrate to unity as a function of y, was scaled by 10⁻³ in the graphics.
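For the Gaussian case K=2, the two updates can be sketched for a scalar state as follows. The code is an illustration under assumptions, not the implementation behind Figures 8 and 9: the measurement function is taken as h(y) = z0 + Σ θ(y − cj) with z0=−1 and thresholds {−2,2}, the linear update uses the standard normal correlation formulas (assumed to correspond to (20, 21)), the Bayes moments are integrated against the a priori density rather than the importance density φ(y;μ0,Σ0), and all function names and the a priori moments μ=1, σ²=4 are hypothetical. Since the integrand is discontinuous, the quadrature is only approximate.

```python
import math
import numpy as np

def h(y, thresholds=(-2.0, 2.0), z0=-1.0):
    """Ordinal measurement function: z0 plus the number of thresholds below y."""
    y = np.asarray(y, dtype=float)
    return z0 + sum((y > c).astype(float) for c in thresholds)

def gh_nodes(mu, var, m):
    """Gauss-Hermite nodes and normalized weights for expectations under N(mu, var)."""
    x, w = np.polynomial.hermite_e.hermegauss(m)
    return mu + math.sqrt(var) * x, w / math.sqrt(2.0 * math.pi)

def normal_correlation_update(mu, var, z, R=1e-6, m=40):
    """Linear (normal correlation) update with quadrature-based moments of h."""
    y, w = gh_nodes(mu, var, m)
    hy = h(y)
    h_mean = np.sum(w * hy)
    cov_yh = np.sum(w * (y - mu) * (hy - h_mean))
    var_h = np.sum(w * (hy - h_mean) ** 2) + R
    gain = cov_yh / var_h
    return mu + gain * (z - h_mean), var - gain * cov_yh

def bayes_update(mu, var, z, R=1e-6, m=40):
    """A posteriori mean and variance from the Bayes formula, plus the likelihood."""
    y, w = gh_nodes(mu, var, m)
    lik = np.exp(-0.5 * (z - h(y)) ** 2 / R) / math.sqrt(2.0 * math.pi * R)
    L = np.sum(w * lik)                    # likelihood of the measurement z
    post_mean = np.sum(w * lik * y) / L
    post_var = np.sum(w * lik * (y - post_mean) ** 2) / L
    return post_mean, post_var, L

mu, var, z = 1.0, 4.0, 0.0                 # illustrative a priori moments, z = 0
print("normal correlation:", normal_correlation_update(mu, var, z))
print("Bayes formula:     ", bayes_update(mu, var, z))
```

With R this small, only the quadrature nodes inside the window belonging to z contribute to the Bayes moments, so the a posteriori mean necessarily lies within that window. The linear update carries no such guarantee.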


Figure 8.  GGHF: measurement updates of threshold model; normal correlation, K=2 (GHF); a priori (black), a posteriori (dark grey) and measurement density p(z|y) (light grey). Note that p(z|y) does not integrate to unity as a function of y.


Figure 9.  GGHF: measurement updates of threshold model; Bayes formula, K=2 (GHF). Note that the update yields a better approximation of p(y|z); a priori (black), a posteriori (dark grey) and measurement density p(z|y) (light grey).


Using more terms (e.g. K=20) in the Hermite expansion models the bimodal a priori density more realistically (Figures 10 and 11). Note that the normal correlation update is non-Gaussian as well, because of the Hermite term H(y,K). In some cases, the Bayes update tends to produce unrealistic oscillations in the a posteriori density. This is due to locally negative values of the Hermite series.
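That a truncated Hermite series can become locally negative is easily seen in a small example. The sketch below assumes a standardized variable and a series in the probabilists' Hermite polynomials He_k with a single third-order term (coefficient skewness/6 with skewness 1). The paper's parametrization of H(y,K) by moments may differ, and the function name is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def hermite_density(y, coeffs):
    """Standard normal density times the Hermite series sum_k c_k He_k(y)."""
    phi = np.exp(-0.5 * y ** 2) / np.sqrt(2.0 * np.pi)
    return phi * hermeval(y, coeffs)

coeffs = [1.0, 0.0, 0.0, 1.0 / 6.0]    # 1 + He_3(y)/6, illustrative choice
y = np.linspace(-5.0, 5.0, 1001)
p = hermite_density(y, coeffs)
print("series value at y=-3:", hermeval(-3.0, coeffs))
print("minimum of the approximate density:", p.min())
print("integral (Riemann sum):", p.sum() * (y[1] - y[0]))
```

The approximation still integrates to one, since the higher Hermite polynomials are orthogonal to the constant, but it is negative in the left tail. A multiplicative Bayes update can amplify such regions.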


Figure 10.  GGHF: measurement updates of threshold model; normal correlation, K=20; a priori (black), a posteriori (dark grey) and measurement density p(z|y) (light grey).


Figure 11.  GGHF: measurement updates of threshold model; Bayes formula, K=20; a priori (black), a posteriori (dark grey) and measurement density p(z|y) (light grey). Oscillation at t=13.7 (see text).


8 Conclusion

We compared filtering algorithms for nonlinear panel models in continuous time with discrete time measurements. The classical algorithms EKF and SNF are based on Taylor expansions of the moment equations (or the Itô formula in the case of LL). In contrast, UKF and GHF use numerical integration for the expectation values. The unscented transform (UT) is applied directly to the moment equations, which avoids extending the system state and doubling the dimension. ML estimation of a Ginzburg–Landau model did not yield a uniformly best method, but in terms of bias, the UKF with κ=1 was best (model without time effects). Including time effects leads to an N*p-dimensional filtering problem because of the correlated panel units. Again, there were advantages for the particle filters UKF and GHF. Finally, ordinal data were treated with a threshold model using the GHF and the GGHF (K=20). The Bayes update is superior to the linear normal correlation update. As the measurement function is not differentiable, the EKF-type algorithms cannot be used here. The Hermite expansion yields a more realistic approximation of the truncated a posteriori density, but the Gaussian case K=2 already gives satisfactory results.

References

  • Aït-Sahalia, Y. (2002), Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach, Econometrica 70, 223–262.
  • Alspach, D. and H. Sorenson (1972), Nonlinear Bayesian estimation using Gaussian sum approximations, IEEE Transactions on Automatic Control 17, 439–448.
  • Andersen, T. and J. Lund (1997), Estimating continuous-time stochastic volatility models of the short-term interest rate, Journal of Econometrics 77, 343–377.
  • Arnold, L. (1974), Stochastic differential equations, John Wiley, New York.
  • Bandi, F. and P. Phillips (2003), Fully nonparametric estimation of scalar diffusion models, Econometrica 71, 241–283.
  • Bartlett, M. (1946), On the theoretical specification and sampling properties of autocorrelated time-series, Journal of the Royal Statistical Society (Supplement) 7, 27–41.
  • Bergstrom, A. (1976), Non recursive models as discrete approximations to systems of stochastic differential equations, in: A. Bergstrom (ed.), Statistical inference in continuous time models, North Holland, Amsterdam, 15–26.
  • Black, F. and M. Scholes (1973), The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–654.
  • Challa, S., Y. Bar-Shalom and V. Krishnamurthy (2000), Nonlinear filtering via generalized Edgeworth series and Gauss-Hermite quadrature, IEEE Transactions on Signal Processing 48, 1816–1820.
  • Coleman, J. (1968), The mathematical study of change, in: H. Blalock and A. B. Blalock (eds.), Methodology in social research, McGraw-Hill, New York, 428–478.
  • Daum, F. (2005), Nonlinear filters: beyond the Kalman filter, IEEE A&E Systems Magazine 20, 57–69.
  • Dennis, J. Jr. and R. Schnabel (1983), Numerical methods for unconstrained optimization and nonlinear equations, Prentice Hall, Englewood Cliffs, NJ.
  • Elerian, O., S. Chib and N. Shephard (2001), Likelihood inference for discretely observed nonlinear diffusions, Econometrica 69, 4, 959–993.
  • Frey, M. (1996), A Wiener filter, state-space flux-optimal control against escape from a potential well, IEEE Transactions on Automatic Control 41, 216–223.
  • Gardiner, C. (1996), Handbook of stochastic methods, 2nd edn, Springer, Berlin, Heidelberg, New York.
  • Haken, H. (1977), Synergetics, Springer, Berlin.
  • Hamerle, A., W. Nagl and H. Singer (1991), Problems with the estimation of stochastic differential equations using structural equations models, Journal of Mathematical Sociology 16, 201–220.
  • Hamerle, A., W. Nagl and H. Singer (1993), Identification and estimation of continuous time dynamic systems with exogenous variables using panel data, Econometric Theory 9, 283–295.
  • Harvey, A. and J. Stock (1985), The estimation of higher order continuous time autoregressive models, Econometric Theory 1, 97–112.
  • Herings, J. (1996), Static and dynamic aspects of general disequilibrium theory, Kluwer, Boston, London, Dordrecht.
  • Ito, K. and K. Xiong (2000), Gaussian filters for nonlinear filtering problems, IEEE Transactions on Automatic Control 45, 910–927.
  • Jazwinski, A. (1970), Stochastic processes and filtering theory, Academic Press, New York.
  • Jensen, B. and R. Poulsen (2002), Transition densities of diffusion processes: numerical comparison of approximation techniques, Institutional Investor, Summer 2002, 18–32.
  • Jones, R. (1993), Longitudinal data with serial correlation: a state space approach, Chapman and Hall, New York.
  • Jones, R. and L. Ackerson (1990), Serial correlation in unequally spaced longitudinal data, Biometrika 77, 721–731.
  • Jones, R. and F. Boadi-Boateng (1991), Unequally spaced longitudinal data with AR(1) serial correlation, Biometrics 47, 161–175.
  • Julier, S. and J. Uhlmann (1997), A new extension of the Kalman filter to nonlinear systems, The 11th International Symposium on Aerospace/Defense Sensing, Simulation and Control, Orlando, FL.
  • Julier, S., J. Uhlmann and H. F. Durrant-Whyte (2000), A new method for the nonlinear transformation of means and covariances in filters and estimators, IEEE Transactions on Automatic Control 45, 477–482.
  • Liptser, R. and A. Shiryayev (1977, 1978), Statistics of random processes, Volumes I and II, Springer, New York, Heidelberg, Berlin.
  • Merton, R. (1990), Continuous-time finance, Blackwell, Cambridge, MA; Oxford UK.
  • Nowman, K. (1997), Gaussian estimation of single-factor continuous time models of the term structure of interest rates, Journal of Finance 52, 1695–1703.
  • Oud, J. and R. Jansen (1996), Nonstationary longitudinal LISREL model estimation from incomplete panel data using EM and the Kalman smoother, in: U. Engel and J. Reinecke (eds.), Analysis of change, de Gruyter, Berlin, New York, 135–159.
  • Oud, J. and R. Jansen (2000), Continuous time state space modeling of panel data by means of SEM, Psychometrika 65, 199–215.
  • Pedersen, A. (1995), A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations, Scandinavian Journal of Statistics 22, 55–71.
  • Shoji, I. (2002), Nonparametric state estimation of diffusion processes, Biometrika 89, 451–456.
  • Shoji, I. and T. Ozaki (1997), Comparative study of estimation methods for continuous time stochastic processes, Journal of Time Series Analysis 18, 485–506.
  • Shoji, I. and T. Ozaki (1998), A statistical method of estimation and simulation for systems of stochastic differential equations, Biometrika 85, 240–243.
  • Singer, H. (1993), Continuous-time dynamical systems with sampled data, errors of measurement and unobserved components, Journal of Time Series Analysis 14, 527–545.
  • Singer, H. (1995), Analytical score function for irregularly sampled continuous time stochastic processes with control variables and missing values, Econometric Theory 11, 721–735.
  • Singer, H. (1998), Continuous panel models with time dependent parameters, Journal of Mathematical Sociology 23, 77–98.
  • Singer, H. (1999), Finanzmarktökonometrie. Zeitstetige Systeme und ihre Anwendung in Ökonometrie und empirischer Kapitalmarktforschung, Physica-Verlag, Heidelberg.
  • Singer, H. (2002), Parameter estimation of nonlinear stochastic differential equations: simulated maximum likelihood vs. extended Kalman filter and Itô-Taylor expansion, Journal of Computational and Graphical Statistics 11, 972–995.
  • Singer, H. (2003), Simulated maximum likelihood in nonlinear continuous-discrete state space models: importance sampling by approximate smoothing, Computational Statistics 18, 79–106.
  • Singer, H. (2006a), Continuous-discrete unscented Kalman filtering, Diskussionsbeiträge Fachbereich Wirtschaftswissenschaft 384, FernUniversität in Hagen, http://www.fernuni-hagen.de/FBWIWI/forschung/beitraege/pdf/db384.pdf.
  • Singer, H. (2006b), Generalized Gauss-Hermite filtering, Diskussionsbeiträge Fachbereich Wirtschaftswissenschaft 391, FernUniversität in Hagen, http://www.fernuni-hagen.de/FBWIWI/forschung/beitraege/pdf/db391.pdf.
  • Singer, H. (2006c), Generalized Gauss-Hermite filtering for multivariate diffusion processes, Diskussionsbeiträge Fachbereich Wirtschaftswissenschaft 402, FernUniversität in Hagen, http://www.fernuni-hagen.de/FBWIWI/forschung/beitraege/pdf/db402.pdf.
  • Singer, H. (2006d), Moment equations and Hermite expansion for nonlinear stochastic differential equations with application to stock price models, Computational Statistics 21, 385–397.
  • Singer, H. (2007), Stochastic differential equation models with sampled data, in: K. Van Montfort, H. Oud and A. Satorra (eds.), Longitudinal models in the behavioral and related sciences, The European Association of Methodology (EAM) Methodology and Statistics series, vol. II, Lawrence Erlbaum Associates, Mahwah, London, 73–106.
  • Sitz, A., U. Schwarz and J. Kurths (2002a), The unscented Kalman filter, a powerful tool for data analysis, International Journal of Bifurcation and Chaos 14, 6, 2093–2105.
  • Sitz, A., U. Schwarz, J. Kurths and H. Voss (2002b), Estimation of parameters and unobserved components for nonlinear systems from noisy time series, Physical Review E 66, 016210-1–016210-9.
  • Srinivasan, K. (1970), State estimation by orthogonal expansion of probability distributions, IEEE Transactions on Automatic Control 15, 3–10.
  • Van Kampen, N. (1981), Itô vs. Stratonovich, Journal of Statistical Physics 24, 175–187.