Nonlinear continuous time modeling approaches in panel research

Authors


hermann.singer@fernuni-hagen.de

Abstract

Stochastic differential equations (SDE) are used as dynamical models for cross-sectional discrete time measurements (panel data). Thus causal effects are formulated on a fundamental infinitesimal time scale. Cumulated causal effects over the measurement interval can be expressed in terms of fundamental effects which are independent of the chosen sampling intervals (e.g. weekly, monthly, annually). The nonlinear continuous–discrete filter is the key tool in deriving a recursive sequence of time and measurement updates. Several approximation methods including the extended Kalman filter (EKF), higher order nonlinear filters (HNF), the local linearization filter (LLF), the unscented Kalman filter (UKF), the Gauss–Hermite filter (GHF) and generalizations (GGHF), as well as simulated filters (functional integral filter FIF) are compared.

1 Introduction

Continuous time models are natural, as time is a continuously flowing quantity without steps. On the other hand, empirical data in the social sciences and economics are mostly available only at certain time points, e.g. daily, weekly, quarterly or at arbitrary times. Only physical quantities such as voltages, pressures, levels of rivers, etc. may be measured on a continuous basis. Therefore, there has been a tendency to formulate dynamical models in discrete time (time series and panel analysis). Thus, the causal relations are specified between the arbitrary discrete measurement times. Bartlett (1946) argues as follows:

It will have been apparent that the discrete nature of our observations in many economic and other time series does not reflect any lack of continuity in the underlying series. Thus theoretically it should often prove more fundamental to eliminate this imposed artificiality. An unemployment index does not cease to exist between readings, nor does Yule's pendulum cease to swing. (emphasis H.S.)

Indeed there are many disadvantages of discrete time models. One of the most basic defects is that the dynamics are modeled between the (arbitrarily sampled) measurements and not between the dynamically relevant system states. For example, a physical system like a pendulum (cf. the citation above) fulfils a simple linear relation (Newton's equation for small amplitudes) between the state and its velocity change (acceleration), whereas the relationship between sampled measurements (e.g. daily) is very complicated and nonlinearly dependent on the parameters (mass, length of the pendulum, etc.) and the sampling interval. Moreover, the velocity cannot be measured with discrete time data (latent variable).

Discrete time studies with different sampling intervals cannot be compared, because the causal parameters relate to the chosen interval. Moreover, if the same dataset is analyzed with different intervals (select a weekly or monthly dataset from daily measurements), one gets estimates corresponding to these intervals which can be in contradiction.
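The interval dependence can be made concrete with a scalar linear model dy = a·y dt + g dW, chosen here purely for illustration (the names `a`, `ar_coefficient` and `drift_from_ar` are hypothetical and do not appear in the source). Its exact discrete-time autoregressive coefficient over an interval Δt is exp(aΔt), so weekly and monthly analyses report different coefficients although both stem from the same continuous-time parameter a:

```python
import math

def ar_coefficient(a, dt):
    """Exact AR(1) coefficient of dy = a*y dt + g dW over a sampling interval dt."""
    return math.exp(a * dt)

def drift_from_ar(phi, dt):
    """Map an AR(1) coefficient back to the interval-free drift parameter a."""
    return math.log(phi) / dt

a = -0.5                                  # assumed drift, time unit: one month
phi_week = ar_coefficient(a, 0.25)        # a week taken as 1/4 month (assumption)
phi_month = ar_coefficient(a, 1.0)

print(phi_week, phi_month)                # the two discrete-time effects differ
print(drift_from_ar(phi_week, 0.25), drift_from_ar(phi_month, 1.0))  # both recover a
```

The discrete-time coefficients are only comparable after they are mapped back to the common parameter a.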

Nevertheless, the continuous–discrete state-space model is able to combine both points of view:

  • 1. a continuous time dynamical model;
  • 2. discrete time (sampled) measurements.

This hybrid model first appeared in engineering (Jazwinski, 1970), but is now well known in econometrics, sociology, and psychology. One can estimate the parameters of the continuous time model from time series or panel measurements. This is achieved by computing the conditional probability density between the measurement times. In the linear Gaussian case, only the time-dependent conditional mean and autocovariance are needed. More generally, in the presence of latent states and errors of measurement, a measurement model can be defined, mapping the continuous time state to observable discrete time data.

2 Nonlinear state-space models

Whereas the linear continuous–discrete state-space model can be treated completely and efficiently by using the Kalman filter algorithm (Harvey and Stock, 1985; Jones and Ackerson, 1990; Jones and Boadi-Boateng, 1991; Jones, 1993; Singer, 1993, 1995, 1998; Hamerle, Nagl and Singer, 1993) or by structural equation models (SEM) with nonlinear parameter restrictions (Oud and Jansen, 1996, 2000; Singer, 2007; but cf. Hamerle, Nagl and Singer, 1991), there are many issues and competing approaches in the nonlinear field. It is presently an area of very active research because of the growing interest in finance models. The option price model of Black and Scholes (1973) relies on a stochastic differential equation (SDE) model for the underlying stock variable, and Merton's (1990) monograph on continuous finance has given the field a strong ‘continuous’ flavor. This is in contrast to econometrics, where time-series methods still dominate, and also to sociology, despite the tradition of Bergstrom (1976), Coleman (1968) and others.

We define the nonlinear continuous–discrete state-space model (Jazwinski, 1970, ch. 6.2) for the panel units n=1,…,N and the unit-specific measurement times t_in

dy_n(t) = f(y_n(t), x_n(t), ψ) dt + g(y_n(t), x_n(t), ψ) dW_n(t)   (1)
z_in = h(y_n(t_in), x_n(t_in), ψ) + ε_in   (2)

with nonlinear drift and diffusion functions f: ℝ^p×ℝ^q×ℝ^u→ℝ^p and g: ℝ^p×ℝ^q×ℝ^u→ℝ^p×ℝ^r, respectively. ψ ∈ ℝ^u is a u-dimensional parameter vector. The state vector yn(t) ∈ ℝ^p is a continuous time random process and the xn(t) ∈ ℝ^q are deterministic exogenous (control) variables. As usual, stochastic controls are treated by extending the state yn(t)→{yn(t),xn(t)}. The dependence on xn(t) includes the nonautonomous case xn(t)=t. Person-specific random effects πn can be included by extending the state according to yn(t)→{yn(t),πn(t)} and defining the trivial dynamics dπn=0.

In the measurement model (2), h: ℝ^p×ℝ^q×ℝ^u→ℝ^k is a measurement function mapping the latent state yn(t) onto discrete time measurements zin, i=0,…,Tn, n=1,…,N. The free parameters in h may be interpreted as nonlinear factor loadings. The error terms in (1) and (2) are mutually independent Gaussian white-noise processes with zero means and covariances E[(dWn(t)/dt)(dWm(s)/ds)′]=Ir δ(t−s)δnm and var(εin)=R, where Ir is the r-dimensional unit matrix.

As the panel units are independent, the panel index is dropped in the sequel for simplicity of notation. For maximum likelihood estimation, one only has to sum the N likelihood contributions of each panel unit. Alternatively, using Bayesian estimation, the parameter vector is filtered with the other states, and one has to use the extended state vector η(t)={y1(t),…,yN(t),ψ(t)}.

In case of random time effects γdV(t) acting on each panel unit in the same way, the panel units are correlated and one can filter the extended state η(t)={y1(t),…,yN(t)} with random process error dW(t)={dW1(t),…,dWN(t),dV(t)}. This works both for maximum likelihood (ML) and Bayesian estimation of ψ (cf. section 6.2).

In the nonlinear case it is important to interpret the SDE (1) correctly. We use the Itô interpretation yielding simple moment equations (for a thorough discussion of the system theoretical aspects, see Arnold, 1974, ch. 10; Van Kampen, 1981; Singer, 1999, ch. 3). A strong simplification occurs when the state is completely measured at times ti, i.e. zi=yi=y(ti). Then, only the transition density p(yi+1,ti+1|yi,ti) must be computed in order to obtain the likelihood function (cf. Aït-Sahalia, 2002; Singer, 2006d). Unfortunately, the transition probability can be computed analytically only in some special cases (including the linear), but in general approximation methods must be employed. As the transition density fulfils a partial differential equation (PDE), the so-called Fokker-Planck equation (cf. 6), approximation methods for PDE, e.g. finite difference methods can be used (cf. Jensen and Poulsen, 2002).

A large class of approximations rests on linearization methods which can be applied to the exact moment equations [extended Kalman filter (EKF); second-order nonlinear filter (SNF); cf. Jazwinski, 1970] or directly to the nonlinear differential equation using Itô’s lemma [local linearization (LL); Shoji and Ozaki, 1997, 1998]. As linearity is only approximate in the vicinity of a measurement or of a reference trajectory, the conditional Gaussian schemes are valid only for short measurement intervals Δti=ti+1−ti. Other linearization methods relate to the diffusion term, but are interpretable in terms of the EKF (Nowman, 1997).

A different class of approximations relates to the filter density. In the unscented Kalman filter (UKF; cf. Julier and Uhlmann, 1997; Julier, Uhlmann and Durrant-Whyte, 2000), the true density is replaced by a singular density with correct first and second moment, whereas the Gaussian filter (GF) assumes a normal density. Integrals in the update equations may be obtained using Gauss–Hermite quadrature [Gauss–Hermite filter (GHF); Ito and Xiong, 2000]. More generally, the density can be approximated by Gaussian sums (Gaussian sum filter) and the expectations in the moment equations are computed using the EKF, UKF or GHF (cf. Alspach and Sorenson, 1972; Ito and Xiong, 2000). Alternatively, the Monte Carlo method can be employed to obtain approximate transition densities (Pedersen, 1995; Andersen and Lund, 1997; Elerian, Chib and Shephard, 2001; Singer, 2002, 2003).

More recently, Hermite expansions of the transition density have been utilized by Aït-Sahalia (2002). In this approach, the expansion coefficients are expressed in terms of conditional moments and computed analytically by using computer algebra programs. The computations comprise the multiple action of the backward operator L on polynomials [L=F* is the adjoint of the Fokker–Planck operator (6)]. Alternatively, one can use systems of moment differential equations (Singer, 2006d) or numerical integration (Challa, Bar-Shalom and Krishnamurthy, 2000; Singer, 2006b,c). It seems that this approach is most efficient both in accuracy and computing time (cf. Aït-Sahalia, 2002, Figure 1; Jensen and Poulsen, 2002).

Figure 1.

 Potential Φ(y) as a function of y for several parameter values α=−3,−2,…,1.

Nonparametric approaches attempt to estimate the drift function f and the diffusion function Ω without assumptions about a certain functional form. They typically involve kernel density estimates of conditional densities (cf. Bandi and Phillips, 2003). Other approaches utilize Taylor series expansions of the drift function and estimate the derivatives (expansion coefficients) as latent states using the LL method (similar to the SNF; Shoji, 2002). Finally, the Daum filter, an exact nonlinear continuous–discrete filtering approach must be mentioned (Daum, 2005).

2.1 Exact continuous–discrete filter

The exact time and measurement updates of the continuous–discrete filter are given by the recursive scheme (Jazwinski, 1970) for the conditional density p(y,t|Zi) (again dropping panel index n).

2.1.1 Time update

image(3)

2.1.2 Measurement update

image(4)
image(5)

i=0,…,T−1, where F in

image(6)

is the Fokker–Planck operator, Zi={z(tj)|tj≤ti} are the observations up to time ti, yi:=y(ti), and p(zi+1|Zi) is the likelihood function of observation zi+1. Equation (3) describes the time evolution of the conditional density p(y,t|Zi), given information up to the last measurement, and the measurement update is a discontinuous change caused by new information using the Bayes formula. The above scheme is exact, but can be solved explicitly only for the linear case where the filter density is Gaussian with conditional moments

image(7)
image(8)

or under conditions in the Daum filter (Daum, 2005).

2.2 Exact moment equations

Instead of solving the time update equations for the conditional density (3), the moment equations for the first, second and higher order moments are considered. The general vector case is discussed in Singer (2006c). Using the Euler approximation for the SDE (1), one obtains in a short time interval δt (with δW(t):=W(t+δt)−W(t))

image(9)

Taking the expectation E[⋯|Zi] one gets the moment equation

image(10)

or in the limit δt→0

image(11)

The higher order central moments

image(12)

fulfil (scalar notation, dropping the condition)

image(13)

Using the binomial formula we obtain, utilizing the independence of y(t) and δW(t)

image(14)
image(15)

as c:=δW(t) is a Gaussian process. For example, the second moment (variance) m2=σ2 fulfils

image(16)
image
image(17)

Inserting the first moment (10) and setting a:=(y−E(y))+(f−E(f))δt:=y*+f*δt one obtains

image(18)

In general, up to O(δt) we have (Mk:=(y−μ)^k)

image(19)

The exact moment equations (11, 19) are not differential equations, as they depend on the unknown conditional density p(y,t|Zi). Using Taylor expansions or approximations of the conditional density one obtains several filter algorithms.

2.3 Continuous-discrete filtering scheme

Using only the first and second moment equations (11, 18) and the optimal linear update (normal correlation), one obtains the recursive scheme (A⁻ is the generalized inverse of A)

2.3.1 Initial condition: t=t0

image

i=0,…,T−1.

2.3.2 Time update: t ∈ [ti,ti+1]

image

2.3.3 Measurement update: t=ti+1

image
image

Remarks. 

  • 1. The time update for the interval t ∈ [ti,ti+1] was written using time slices of width δt. They must be chosen small enough to yield a good approximation for the moment equations (11, 18).
  • 2. The measurement update is written using the theorem on normal correlation (Liptser and Shiryayev, 1977, 1978, ch. 13, Thm 13.1, Lem. 14.1)
    image(20)
    image(21)
    Inserting the measurement equation (2) one obtains the measurement update of the filter. The formula is exact for Gaussian variables and the optimal linear estimate for μ(ti+1|ti+1),Σ(ti+1|ti+1) in the non-Gaussian case. It is natural to use, if only two moments are considered. Despite the linearity in zi+1, it still contains the measurement nonlinearities in the expectations involving h(y,t). Alternatively, the Bayes formula (4) can be evaluated directly. This is necessary, if strongly nonlinear measurements are taken (e.g. the threshold mechanism for ordinal data; see section 7).
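The scheme of section 2.3 can be sketched for a scalar state as follows. This is a minimal illustration under stated assumptions, not the author's implementation: all function names are hypothetical, the expectations are approximated by a three-point Gauss–Hermite rule (one of several possible choices), the time update uses Euler slices of width δt, and the measurement update uses the normal correlation formula.

```python
import math

def expect(func, mu, var):
    """E[func(y)] for y ~ N(mu, var) with a 3-point Gauss-Hermite rule (assumed choice)."""
    s = math.sqrt(3.0 * var)
    return (2.0 * func(mu) + 0.5 * (func(mu + s) + func(mu - s))) / 3.0

def time_update(mu, var, f, g, t0, t1, dt=0.01):
    """Euler slices of d(mu)/dt = E[f] and d(var)/dt = 2 cov(f, y) + E[g^2]."""
    n = max(1, round((t1 - t0) / dt))
    step = (t1 - t0) / n
    for _ in range(n):
        mean_f = expect(f, mu, var)
        cov_fy = expect(lambda y: f(y) * (y - mu), mu, var)
        mean_omega = expect(lambda y: g(y) ** 2, mu, var)
        mu, var = mu + mean_f * step, var + (2.0 * cov_fy + mean_omega) * step
    return mu, var

def measurement_update(mu, var, z, h, R):
    """Normal correlation update; also returns the Gaussian log-likelihood of z."""
    mean_h = expect(h, mu, var)
    cov_yh = expect(lambda y: (y - mu) * (h(y) - mean_h), mu, var)
    var_h = expect(lambda y: (h(y) - mean_h) ** 2, mu, var)
    S = var_h + R                      # prediction error variance
    gain = cov_yh / S
    loglik = -0.5 * (math.log(2.0 * math.pi * S) + (z - mean_h) ** 2 / S)
    return mu + gain * (z - mean_h), var - gain * cov_yh, loglik

# Usage with a linear test model dy = -y dt + dW, z = y + eps (illustrative values)
mu, var = time_update(1.0, 0.5, lambda y: -y, lambda y: 1.0, 0.0, 1.0, dt=0.001)
mu_post, var_post, loglik = measurement_update(0.0, 1.0, 2.0, lambda y: y, 1.0)
```

Summing `loglik` over all measurement times and panel units would give the log-likelihood to be maximized; swapping `expect` for another approximation changes the filter type without changing the recursion.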

The approximation of the expectation values containing the unknown filter density leads to several well-known algorithms:

3 Filter approximations based on Taylor expansion

3.1 Extended Kalman filter

Using Taylor expansions around the conditional mean μ(τj|ti) for the nonlinear functions in the filtering scheme, one obtains

image(22)
image(23)
image(24)

Expanding around μ(ti+1|ti), the measurement update is approximately

image(25)
image(26)
image(27)
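For a scalar state, the first-order expansion leads to the familiar EKF recursions: the mean follows the drift evaluated at the mean, and the variance equation uses the Jacobian f′(μ). The following sketch assumes exactly this standard scalar form; the function names and the Euler discretization are illustrative choices, not taken from the source.

```python
def ekf_time_update(mu, var, f, df, omega, t0, t1, dt=0.01):
    """Euler slices of d(mu)/dt = f(mu), d(var)/dt = 2 f'(mu) var + omega(mu)."""
    n = max(1, round((t1 - t0) / dt))
    step = (t1 - t0) / n
    for _ in range(n):
        mu, var = mu + f(mu) * step, var + (2.0 * df(mu) * var + omega(mu)) * step
    return mu, var

def ekf_measurement_update(mu, var, z, h, dh, R):
    """Linearize h around the prior mean and apply the Kalman update."""
    H = dh(mu)
    S = H * var * H + R
    K = var * H / S
    return mu + K * (z - h(mu)), (1.0 - K * H) * var

# Bifurcation drift f(y) = -(alpha*y + beta*y**3) with alpha = -1, beta = 0.1, sigma = 2
alpha, beta, sigma = -1.0, 0.1, 2.0
f = lambda y: -(alpha * y + beta * y ** 3)
df = lambda y: -(alpha + 3.0 * beta * y ** 2)
omega = lambda y: sigma ** 2

mu, var = ekf_time_update(0.0, 10.0, f, df, omega, 0.0, 4.0)
print(mu, var)
```

In this sketch the mean stays at zero because f(0)=0, while the variance grows quickly since f′(0)=−α>0 for α<0.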

3.2 Second-order nonlinear filter

Expanding up to second order one obtains (using short notation and dropping third moments)

image(28)
image(29)
image(30)
image(31)
image(32)
image(33)

where

image

etc. Expanding to higher orders in the HNF yields moments of order k>2 on the right-hand side, which must be dropped or factorized by the Gaussian assumption mk=(k−1)!!σ^k, k even, mk=0, k odd. (For details, see Jazwinski, 1970, or Singer, 2006d.)
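The effect of the second-order term can be checked on the cubic drift f(y)=−[αy+βy³] used in section 6. For a Gaussian filter density, E[y³]=μ³+3μσ², so the second-order approximation f(μ)+½f″(μ)σ² reproduces E[f(y)] exactly, whereas the first-order (EKF) value f(μ) does not. The sketch below only illustrates this point; the function names are hypothetical.

```python
ALPHA, BETA = -1.0, 0.1   # parameter values of the simulation study in section 6

def drift(y):
    return -(ALPHA * y + BETA * y ** 3)

def drift_second_derivative(y):
    return -6.0 * BETA * y

def mean_drift_ekf(mu, var):
    """First-order approximation: E[f(y)] ~ f(mu)."""
    return drift(mu)

def mean_drift_snf(mu, var):
    """Second-order approximation: E[f(y)] ~ f(mu) + 0.5 f''(mu) var."""
    return drift(mu) + 0.5 * drift_second_derivative(mu) * var

def mean_drift_gaussian(mu, var):
    """Exact E[f(y)] for y ~ N(mu, var), using E[y^3] = mu^3 + 3 mu var."""
    return -(ALPHA * mu + BETA * (mu ** 3 + 3.0 * mu * var))

print(mean_drift_ekf(1.5, 2.0), mean_drift_snf(1.5, 2.0), mean_drift_gaussian(1.5, 2.0))
```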

3.3 Local linearization

A related algorithm occurs if the drift is expanded directly in SDE (1). Using Itô’s lemma one obtains

image(34)

Freezing the coefficients at (yi,ti) and using a state-independent diffusion coefficient Ω(s), Shoji and Ozaki (1997, 1998) obtained the linearized SDE (ti≤t≤ti+1)

image

The corresponding moment equations are

image(35)
image(36)

In contrast to the EKF and SNF moment equations, which form a system of nonlinear differential equations, the Jacobians are evaluated once at the measurements (yi,ti), and the differential equations are linear and not coupled (for details, cf. Singer, 2002).
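Because the linearized moment equations have constant coefficients, they can be solved in closed form. The sketch below assumes a scalar autonomous drift, a constant diffusion Ω, and moment equations of the form dμ/dt=f(yi)+J(μ−yi)+M(t−ti) and dΣ/dt=2JΣ+Ω with J=f′(yi) and M=½f″(yi)Ω. This form is an assumption about (35, 36), which are not reproduced in the text; the function names are hypothetical.

```python
import math

def ll_mean(y_i, dt, f, df, ddf, omega):
    """Conditional mean after dt for the drift frozen and linearized at y_i."""
    J = df(y_i)
    M = 0.5 * ddf(y_i) * omega          # Ito correction term (assumed form)
    if abs(J) < 1e-12:                  # limit J -> 0
        return y_i + f(y_i) * dt + 0.5 * M * dt ** 2
    e = math.exp(J * dt)
    return y_i + f(y_i) * (e - 1.0) / J + M * (e - 1.0 - J * dt) / J ** 2

def ll_variance(var_i, dt, J, omega):
    """Solution of d(var)/dt = 2 J var + omega with initial value var_i."""
    if abs(J) < 1e-12:
        return var_i + omega * dt
    e2 = math.exp(2.0 * J * dt)
    return e2 * var_i + omega * (e2 - 1.0) / (2.0 * J)

# Bifurcation drift with alpha = -1, beta = 0.1, Omega = sigma^2 = 4 (section 6)
f = lambda y: -(-1.0 * y + 0.1 * y ** 3)
df = lambda y: -(-1.0 + 0.3 * y ** 2)
ddf = lambda y: -0.6 * y

print(ll_mean(1.0, 0.5, f, df, ddf, 4.0), ll_variance(0.0, 0.5, df(1.0), 4.0))
```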

4 Filter approximations based on numerical integration

The traditional way of nonlinear filtering has been the expansion of the system functions f, Ω and h. Another approach is the approximation of the filtering density p(y|Zi).

4.1 Unscented Kalman filtering

The idea of Julier and Uhlmann (1997) was the definition of so-called sigma points with the property that the weighted mean and variance over these points coincide with the true parameters. According to Julier et al. (2000) one can take the 2p+1 points

image(37)
image(38)
image(39)

with weights

image(40)
image(41)

where Γ.l is the lth column of the Cholesky root of Σ=ΓΓ′, κ a scaling factor and p the dimension of the random vector X. For example, in the univariate case p=1 one obtains the three points μ, μ±√(1+κ)σ.

The unscented transform (UT) may be interpreted in terms of the singular density

image(42)

Then, however, only non-negative weights αl are admissible. Generally, the expectation

image

and

image(43)
image(44)
image(45)

yields the correct first and second moment. Nonlinear expectations are easily evaluated as sums

image(46)
image(47)
image(48)

Using large κ, the EKF formula ETaylor[f(X)]=f(μ) is recovered.

All expectations in the filter are evaluated using the sigma points computed from the conditional moments μ(τj|ti),Σ(τj|ti). To display the dependence on the moments, the notation yl=yl(μ,Σ) will be used. For example, the terms in the time update are (short notation dropping arguments)

image(49)
image(50)
image(51)

with sigma points yl=yl(μ(τj|ti),Σ(τj|ti)). With the new moments μ(τj+1|ti), Σ(τj+1|ti), updated sigma points are computed.
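For the univariate case p=1, the sigma points and weights can be written down directly. The sketch below assumes the commonly used form of the unscented transform (points μ and μ±√((1+κ)σ²), weights κ/(1+κ) and 1/(2(1+κ))); the function names are hypothetical.

```python
import math

def sigma_points(mu, var, kappa):
    """Three sigma points and weights of the scalar unscented transform (p = 1)."""
    s = math.sqrt((1.0 + kappa) * var)
    points = [mu, mu + s, mu - s]
    weights = [kappa / (1.0 + kappa), 0.5 / (1.0 + kappa), 0.5 / (1.0 + kappa)]
    return points, weights

def ut_expect(func, mu, var, kappa):
    """Approximate E[func(X)] as a weighted sum over the sigma points."""
    points, weights = sigma_points(mu, var, kappa)
    return sum(w * func(x) for x, w in zip(points, weights))

mu, var = 1.0, 4.0
mean = ut_expect(lambda x: x, mu, var, kappa=2.0)
variance = ut_expect(lambda x: (x - mu) ** 2, mu, var, kappa=2.0)
fourth = ut_expect(lambda x: (x - mu) ** 4, mu, var, kappa=2.0)
print(mean, variance, fourth)
```

Mean and variance are reproduced for any κ>−1, while the fourth central moment equals the Gaussian value 3σ⁴ only for κ=2 in this scalar case.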

4.2 Gauss–Hermite filtering

For the Gaussian filter, one may assume that the true p(x) is approximated by a Gaussian distribution φ(x;μ,σ2) with the same mean μ and variance σ2. Then, the Gaussian integral

image(52)
image(53)

may be approximated by Gauss–Hermite quadrature (cf. Ito and Xiong, 2000). The ζl and wl are quadrature points and weights, respectively for the standard Gaussian distribution φ(z;0,1). If such an approximation is used, one obtains the GHF. Generally, filters using Gaussian densities are called GF. The GHF can be interpreted in terms of the singular density

image

concentrated at the quadrature points ξl. The Gauss–Hermite quadrature rule is exact up to order O(x^(2m−1)). Multivariate Gaussian integrals can be computed by transforming to the standard normal distribution and p-fold application of (52). The Gaussian filter is equivalent to an expansion of f to higher orders L

image(54)

(HNF(2,L)) and factorization of the moments according to the Gaussian assumption

image

This leads to an exact computation of (52) for L→∞. In this limit, HNF and GF coincide. In EKF = HNF(2,1) and SNF = HNF(2,2), the higher order corrections are neglected. Moreover, third and higher order moments could be used [HNF(K,L); cf. Singer, 2006d]. It is interesting that κ=2,p=1 in the UT corresponds to a Gauss–Hermite rule with m=3 sample points (Ito and Xiong, 2000).
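The quadrature points for small m can be obtained in closed form as the roots of the Hermite polynomials H3=y³−3y and H4=y⁴−6y²+3. The sketch below hard-codes the rules for m=3 and m=4 with respect to the standard normal weight; it is an illustration only (hypothetical function names), and a general m would require a numerical root finder.

```python
import math

def gauss_hermite(m):
    """Nodes and weights for the standard normal density, m = 3 or m = 4 only."""
    if m == 3:
        r = math.sqrt(3.0)                       # roots of y^3 - 3y
        return [-r, 0.0, r], [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]
    if m == 4:
        a = 3.0 - math.sqrt(6.0)                 # roots of y^4 - 6y^2 + 3 in y^2
        b = 3.0 + math.sqrt(6.0)
        ra, rb = math.sqrt(a), math.sqrt(b)
        wa, wb = 1.0 / (4.0 * a), 1.0 / (4.0 * b)
        return [-rb, -ra, ra, rb], [wb, wa, wa, wb]
    raise ValueError("only m = 3 and m = 4 are implemented in this sketch")

def gh_expect(func, mu, var, m):
    """E[func(y)] for y ~ N(mu, var) via y = mu + sigma * zeta_l."""
    nodes, weights = gauss_hermite(m)
    sigma = math.sqrt(var)
    return sum(w * func(mu + sigma * z) for z, w in zip(nodes, weights))

print(gh_expect(lambda y: y ** 4, 0.0, 1.0, 3))   # 4th moment, exact for m = 3
print(gh_expect(lambda y: y ** 6, 0.0, 1.0, 3))   # 6th moment, beyond degree 2m-1 = 5
print(gh_expect(lambda y: y ** 6, 0.0, 1.0, 4))   # 6th moment, exact for m = 4
```

The m=3 rule coincides with the scalar unscented transform for κ=2 (points 0, ±√3 with weights 2/3, 1/6, 1/6).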

4.3 Generalized Gauss–Hermite filtering

The Gaussian filter assumed a Gauss density φ(y;μ,σ2) for the filter distribution p(y). More generally, one can use a Hermite expansion

image(55)

with Fourier coefficients c0=1,c1=0,c2=0,

image(56)
image(57)

Z:=(Y−μ)/σ and orthogonal polynomials H0=1, H1=y, H2=y²−1, H3=y³−3y, H4=y⁴−6y²+3, etc. Expectation values occurring in the update equations are again computed by Gauss–Hermite integration, including the non-Gaussian term

image(58)

As H(y;{μ,m2,…,mK}) depends on higher order moments, one must use K moment equations (19). The choice K=2 recovers the usual Gaussian filter, as c0=1,c1=0,c2=0. The Hermite expansion can model bimodal, skewed, and leptokurtic distributions. (For details, see Singer, 2006b,c,d and Section 7.)
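A Hermite expansion of order K=4 can be sketched as follows. The normalization ck=E[Hk(Z)]/k! is assumed here (it gives c0=1, c1=c2=0 as stated above, c3=skewness/6 and c4=(kurtosis−3)/24); whether this matches (56, 57) exactly cannot be verified from the text, and all function names are hypothetical.

```python
import math

def hermite_coefficients(skewness, kurtosis):
    """c3 and c4 of the standardized variable, assuming c_k = E[H_k(Z)] / k!."""
    return skewness / 6.0, (kurtosis - 3.0) / 24.0

def hermite_density(y, mu, sigma, skewness, kurtosis):
    """Gaussian density times the Hermite correction 1 + c3 H3(z) + c4 H4(z)."""
    z = (y - mu) / sigma
    c3, c4 = hermite_coefficients(skewness, kurtosis)
    h3 = z ** 3 - 3.0 * z
    h4 = z ** 4 - 6.0 * z ** 2 + 3.0
    phi = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return phi * (1.0 + c3 * h3 + c4 * h4)   # may become negative for large c3, c4

def integrate(func, a, b, n=4000):
    """Plain trapezoidal rule, sufficient for this smooth integrand."""
    step = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * step)
    return total * step

p = lambda y: hermite_density(y, 1.0, 2.0, 0.5, 3.5)   # illustrative moments
mass = integrate(p, -19.0, 21.0)
third = integrate(lambda y: (y - 1.0) ** 3 * p(y), -19.0, 21.0)
print(mass, third)   # total mass and third central moment (skewness * sigma^3)
```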

Related algorithms have been developed by Srinivasan (1970) and Challa et al. (2000), but we formulate the time update as integro-differential equations, solved stepwise by using Gauss–Hermite integration. Moreover, computation of the measurement update (Bayes formula) is improved. We use the normal correlation update as Gaussian weight function in the Gauss–Hermite quadrature to achieve higher numerical accuracy. The a posteriori moments are obtained directly without iterative procedure.

In contrast, Challa et al. (2000) use truncated moment equations in the time update (higher order moments are set to zero) and (iterated) EKF measurement updates as approximate means and variances for the posterior Hermite series (p. 3400). Moreover, only linear system equations are considered.

In the proposed approach, the time update equations are closed by explicit integration over the given Hermite expansion of order K. Therefore, for K=2 we obtain the GHF as special case, whereas Challa et al. (2000; p. 3399) obtain the EKF. Thus, like the GHF, higher order moments are not neglected but approximated through the approximate Hermite density.

5 Discussion

  • 1. The density-based filters UKF and GHF have the strong advantage that no derivatives of the system functions must be computed. This is no problem for the EKF and SNF, but for higher orders in the HNF(K,L) complicated tensor expressions arise. Moreover, higher order moments must be dropped or factorized in order to obtain closed equations. In the multivariate case, the formulas for Gaussian moments are involved. The fourth moment is
    image(59)
    For a general formula, see Gardiner (1996, p. 36).
  • 2. Apart from an implementation point of view [see (1)], the low-order EKF and SNF suffer from problems such as filter divergences, especially when the sampling intervals are large. Simulation studies suggest that the UKF and GHF are more stable and yield smaller filtering error in the mean (Singer, 2006a).
  • 3. The moment equations and measurement updates as derived in section 2.3 involve expectations with respect to the filter density p(y), but not for the noise processes. Their statistics are already included in these updates [the terms E[Ω] dt=E[g dW(g dW)′] and R=var(ε) stem from the noise sequences]. Thus no sigma points with respect to the noises need be computed, although this is suggested in the literature on the UKF (Julier et al., 2000; Sitz, Schwarz and Kurths, 2002a; Sitz et al., 2000b). Such noise sigma points are needed only if the system is first modeled deterministically and afterwards extended by the noises, a detour that is neither necessary nor efficient.

6 Example: bifurcation system

6.1 Model without time effect

Several filtering algorithms can be used to compute the likelihood for each panel unit and the sum of all likelihood contributions is maximized. We study the nonlinear system

dy = −[αy + βy³] dt + σ dW(t)   (60)

with measurement equation

z_i = y(t_i) + ε_i   (61)

measured at times ti ∈ {0, 4, 6, 8, 10, 11, 12, 13.5, 13.7, 15, 15.1, 17, 19, 20}. The measurement times could be different for each panel unit. The random initial condition was yn(t0)∼N(0,10). The nonlinear drift f(y)=−[αy+βy³] is the negative gradient of a potential Φ(y)=(1/2)αy²+(1/4)βy⁴ and the motion may be visualized as a Brownian motion in the landscape defined by Φ(y) (Figure 1). The stationary density is given by pstat ∝ exp[−(2/σ²)Φ(y)] (Figure 2). For β>0, the potential has either two minima and one maximum (α<0) or one minimum (α≥0). Such a qualitative change following a continuous variation of a parameter is called a bifurcation (Figure 3).
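The bifurcation can be checked numerically from the potential. The stationary points of Φ solve αy+βy³=0, which for β>0 gives y=0 and, if α<0, y=±√(−α/β). The following sketch uses hypothetical function names and the true parameter values of the simulation study.

```python
import math

def potential(y, alpha, beta):
    """Phi(y) = (1/2) alpha y^2 + (1/4) beta y^4."""
    return 0.5 * alpha * y ** 2 + 0.25 * beta * y ** 4

def extrema(alpha, beta):
    """Minima and maxima of Phi for beta > 0."""
    if alpha >= 0.0:
        return {"minima": [0.0], "maxima": []}
    r = math.sqrt(-alpha / beta)
    return {"minima": [-r, r], "maxima": [0.0]}

def stationary_density_unnormalized(y, alpha, beta, sigma):
    """exp[-(2 / sigma^2) Phi(y)], without the normalizing constant."""
    return math.exp(-2.0 / sigma ** 2 * potential(y, alpha, beta))

print(extrema(-1.0, 0.1))      # two minima at +-sqrt(10), maximum at 0
print(extrema(1.0, 0.1))       # single minimum at 0
```

For α=−1, β=0.1 and σ=2 the unnormalized stationary density is larger at the two minima than at y=0, i.e. it is bimodal.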

Figure 2.

 Stationary density pstat∝ exp [−(2/σ2)Φ(y)] as a function of y for several parameter values α=−3,−2,…,1.

Figure 3.

 Minima and maxima of Φ(y) as a function of α (bifurcation diagram).

The model is interesting from both a theoretical and an application point of view: The density function strongly deviates from Gaussian behavior, at least in the bimodal state α<0,β>0. Thus it is a good test for filters relying on only two moments. For applications, it can model systems with two stable states with a sudden transition to only one equilibrium. It has been used for phase transitions (Ginzburg–Landau theory; cf. Haken, 1977, chap. 6.4, 6.7–8), stability of engineering systems (Frey, 1996), and equilibrium states in economics (Herings, 1996).

Figure 4 shows the true trajectory, the measurements and approximate 67% highest probability density (HPD) confidence intervals (conditional mean ± standard deviation) for all panel units n=1,…,10 using the UKF(κ=2) = GHF(m=3) and the true parameter vector ψ={α,β,σ,R}={−1,0.1,2,1}. Figure 5 displays the true trajectory, the measurements, and 67% HPD confidence intervals for panel unit n=10 using several filter algorithms. It can be seen that some algorithms such as the SNF exhibit divergences in the first measurement interval [0,4] when the conditional mean approaches zero. Simulation studies demonstrate that the EKF, SNF, and LL are prone to such numerical instabilities (cf. Singer, 2006a). Higher order expansion of the drift can avoid such singularities. As noted, the Gaussian filter corresponds to an infinite Taylor expansion with Gaussian factorization of the moments (cf. eqn. 54). Generally, the density-based UKF and GHF are numerically more stable than EKF and SNF and lead to smaller filtering error.

Figure 4.

 UKF(κ=2) = GHF(m=3) for all N=10 panel units. True trajectory, measurements (dots), and 67% HPD band.

Figure 5.

 Panel unit n=10: comparison of several filter algorithms.

The performance of the filters was compared in a simulation study (see Table 1), where the ML estimates of ψ were computed for M=100 replications and N=10 panel units. The data were simulated by using an N-fold repetition of an Euler–Maruyama scheme with time step δt=0.1, i.e. yj+1=yj+f(yj)δt+g(yj)δWj; j=0,…,tT/δt−1; δWj∼N(0,δt) i.i.d., y0∼N(0,10) i.i.d., and the data zi were selected at the indices ji=ti/δt according to zi=yji+εi; εi∼N(0,R) i.i.d.
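The data-generating step for one panel unit can be sketched as follows. The code is an illustration, not the program used in the study: the function name and the `rng` argument are hypothetical, and N(0,10) and N(0,R) are read as mean and variance.

```python
import math
import random

TIMES = [0, 4, 6, 8, 10, 11, 12, 13.5, 13.7, 15, 15.1, 17, 19, 20]

def simulate_unit(alpha, beta, sigma, R, times=TIMES, dt=0.1, rng=random):
    """Euler-Maruyama path of dy = -(alpha*y + beta*y^3) dt + sigma dW plus noisy samples."""
    y = rng.gauss(0.0, math.sqrt(10.0))           # y0 ~ N(0, 10), 10 taken as variance
    n_steps = round(times[-1] / dt)
    path = [y]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))        # Wiener increment, variance dt
        y = y + (-(alpha * y + beta * y ** 3)) * dt + sigma * dW
        path.append(y)
    # select the state at the indices j_i = t_i / dt and add measurement error
    z = [path[round(t / dt)] + rng.gauss(0.0, math.sqrt(R)) for t in times]
    return path, z

rng = random.Random(1)                            # fixed seed for reproducibility
panel = [simulate_unit(-1.0, 0.1, 2.0, 1.0, rng=rng) for _ in range(10)]  # N = 10 units
print(len(panel), len(panel[0][1]))
```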

Table 1.   Simulation study for bifurcation model.

Parameter   Value   Mean         SD           Bias         RMSE

EKF, M=100
α           −1.0    −0.254169    0.413799     0.745831     0.852932
β           0.1     0.0658808    0.0730564    −0.0341192   0.080631
σ           2.0     2.01777      0.638036     0.017767     0.638283
R           1.0     0.951857     0.444057     −0.0481435   0.446659

SNF, M=84
α           −1.0    −0.119926    0.0953236    0.880074     0.885222
β           0.1     0.0231572    0.00795955   −0.0768428   0.0772539
σ           2.0     1.46765      0.215205     −0.53235     0.574203
R           1.0     1.1505       0.399459     0.150497     0.426869

LL, M=65
α           −1.0    −0.0885385   0.117338     0.911461     0.918983
β           0.1     0.0163748    0.00790206   −0.0836252   0.0839977
σ           2.0     1.48073      0.246138     −0.519273    0.574655
R           1.0     1.25062      0.370842     0.250619     0.447586

UKF, κ=0, M=99
α           −1.0    −0.712736    0.95255      0.287264     0.994923
β           0.1     0.076122     0.0853785    −0.023878    0.0886547
σ           2.0     1.79875      0.403944     −0.201246    0.451299
R           1.0     1.07417      0.442        0.0741727    0.44818

UKF, κ=1, M=96
α           −1.0    −1.09344     1.00893      −0.0934443   1.01325
β           0.1     0.102081     0.086685     0.0020806    0.08671
σ           2.0     2.11663      0.439907     0.116632     0.455106
R           1.0     0.984557     0.450917     −0.0154427   0.451182

UKF, κ=2, M=93
α           −1.0    −1.51408     1.19571      −0.514084    1.30154
β           0.1     0.124142     0.0898191    0.024142     0.093007
σ           2.0     2.46263      0.489341     0.462625     0.673407
R           1.0     0.873614     0.437204     −0.126386    0.455105

GHF, m=4, M=96
α           −1.0    −1.4521      1.14993      −0.4521      1.23561
β           0.1     0.122561     0.0894279    0.0225606    0.0922298
σ           2.0     2.39497      0.440323     0.394971     0.591511
R           1.0     0.891953     0.433076     −0.108047    0.446351

Notes: Distribution of ML estimates in M=100 replications. M= number of converged samples.

The likelihood was maximized using a quasi-Newton algorithm with numerical score function and Broyden, Fletcher, Goldfarb, Shanno (BFGS) secant updates (Dennis and Schnabel, 1983). In terms of root mean square error (RMSE), the algorithms are comparable. There is a tradeoff between bias and variance of the estimates. For example, the estimates for α in the Taylor-based methods EKF, SNF and LL are strongly biased, but the standard deviation is smaller than that for the density-based algorithms. However, because of the long sampling intervals, EKF, SNF, and LL tend to diverge, and large conditional means |μ(t|Zi)|>YMAX=1000 were reset to zero. Overall, there is no algorithm with clear advantages, although UKF and GHF are more stable. The UKF furthermore has the problem of choosing the scaling parameter κ. In this example, κ=0 seems to yield the smallest RMSE, whereas the bias is minimized for κ=1.
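The bias–variance tradeoff can be read off the table through the decomposition RMSE²=SD²+Bias², with Bias=Mean−Value. The short check below uses the α rows for the EKF and the UKF (κ=1) from Table 1; it assumes that SD was computed with divisor M, in which case the decomposition holds exactly.

```python
import math

def bias(mean, true_value):
    return mean - true_value

def rmse(sd, bias_value):
    """Root mean square error from standard deviation and bias."""
    return math.sqrt(sd ** 2 + bias_value ** 2)

# (mean, SD) of the estimates of alpha, true value -1.0, taken from Table 1
rows = {"EKF": (-0.254169, 0.413799), "UKF, kappa=1": (-1.09344, 1.00893)}

for name, (mean, sd) in rows.items():
    b = bias(mean, -1.0)
    print(name, round(b, 6), round(rmse(sd, b), 6))
```

The EKF estimate has the larger bias but the smaller standard deviation, so both methods end up with an RMSE of similar size.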

6.2 Model with time effects

As mentioned in section 2, models with random time effects of the form gdWn(t)+γdV(t) can be estimated using the extended state η(t) = {y1(t),…,yN(t)} with random process error dW(t) = {dW1(t),…,dWN(t),dV(t)}. The specification is

image(62)

where

image(63)
image(64)

IN:N×N is the unit matrix and 1N:N×1 is a vector of ones. The diffusion matrix Ω yields an equicorrelation structure between the panel units, but more general specifications are possible. On the other hand, the filtering problem is of dimension p·N, which is feasible only for small panels. For linear systems, simplifications are possible as the variance equation of the Kalman filter preserves the equicorrelation form of the conditional variances Σ(ti|ti) and Σ(ti+1|ti). For nonlinear systems, the equicorrelation form is not preserved as the drift terms cov[F(η(τj),τj),η(τj)|Zi] in the variance equation (cf. section 2.3) contain different elements. Figure 6 depicts the correlated movement of two panel units caused by the joint random shocks γdV(t). Individual, time-independent random effects πn could be joined to the state η as well.

Figure 6.

 Panel units n=1,2: correlated movement due to joint time effects dV(t).

In a second simulation study (Table 2), the error term was split into the effects σ dWn(t)+γ dV(t) with true values σ=1 and γ=1. Thus, the diagonal of Ω is σ²+γ²=2. The main difference from the previous study is that many more divergences occurred for the Taylor-based methods. In particular, the SNF diverged in all cases and LL converged only in M=26 samples. These instabilities can be traced to the moment equations (11, 18), where second-order corrections in the μ equation lead to divergences in the variance equation. These can be compensated by third-order Taylor expansion and subsequent Gauss factorization of higher order moments. The equicorrelation part of the moment equations caused by the time effect seems to amplify the divergence problems. In the particle filters UKF or GHF these are avoided by numerical integration of the moment equations (Sections 4.1 and 4.2). Here numerical problems in some samples result from an indefinite covariance matrix. The UKF performs best for parameters α and β. Unfortunately, the good results for LL are based on only 26 converged replications.
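The equicorrelation structure can be illustrated for scalar states per unit (p=1). The sketch assumes that the diffusion matrix has the form Ω=σ²·IN+γ²·1N1N′, which is consistent with the diagonal value σ²+γ²=2 stated above but is not spelled out in the text; the function name is hypothetical.

```python
def diffusion_matrix(n_units, sigma, gamma):
    """Omega = sigma^2 * I + gamma^2 * 1 1' (assumed equicorrelation form)."""
    return [[(sigma ** 2 if i == j else 0.0) + gamma ** 2 for j in range(n_units)]
            for i in range(n_units)]

omega = diffusion_matrix(3, sigma=1.0, gamma=1.0)
correlation = omega[0][1] / omega[0][0]      # common off-diagonal / diagonal

for row in omega:
    print(row)
print(correlation)
```

With σ=γ=1, every pair of panel units shares the same correlation γ²/(σ²+γ²)=0.5 in the process errors.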

Table 2.   Simulation study for bifurcation model with time effect γ dV(t).

Parameter   Value   Mean         SD           Bias           RMSE

EKF, M=91
α           −1.0    −0.740402    0.667606     0.259598       0.716302
β           0.1     0.103961     0.0977744    0.00396058     0.0978545
σ           1.0     1.58316      0.843068     0.583157       1.0251
R           1.0     0.551574     0.571612     −0.448426      0.726517
γ           1.0     0.918124     0.42866      −0.0818756     0.436409

LL, M=26
α           −1.0    −0.282938    0.123682     0.717062       0.72765
β           0.1     0.0281144    0.00915889   −0.0718856     0.0724667
σ           1.0     0.777595     0.216964     −0.222405      0.310705
R           1.0     1.00824      0.233077     0.00823683     0.233223
γ           1.0     0.685345     0.243837     −0.314655      0.398076

UKF, κ=0, M=98
α           −1.0    −1.28558     0.822361     −0.285579      0.870536
β           0.1     0.109212     0.0695766    0.00921181     0.0701838
σ           1.0     1.71971      0.490824     0.71971        0.871143
R           1.0     0.74936      0.292858     −0.25064       0.385469
γ           1.0     1.06027      0.554891     0.0602654      0.558154

UKF, κ=1, M=98
α           −1.0    −1.21012     0.784112     −0.210116      0.811776
β           0.1     0.10304      0.066603     0.00304034     0.0666724
σ           1.0     1.69034      0.521632     0.690339       0.865256
R           1.0     0.759783     0.299567     −0.240217      0.383985
γ           1.0     1.07246      0.547039     0.0724573      0.551817

UKF, κ=2, M=97
α           −1.0    −1.17315     0.75184      −0.173153      0.771521
β           0.1     0.0994064    0.0632362    −0.000593581   0.063239
σ           1.0     1.69412      0.522888     0.69412        0.869031
R           1.0     0.76752      0.296705     −0.23248       0.376936
γ           1.0     1.04245      0.612931     0.0424489      0.6144

GHF, m=4, M=94
α           −1.0    −1.74723     1.04536      −0.747233      1.28496
β           0.1     0.15293      0.0868995    0.0529299      0.10175
σ           1.0     1.58309      0.515688     0.583086       0.778411
R           1.0     0.766745     0.262719     −0.233255      0.351325
γ           1.0     1.15134      0.501923     0.151341       0.524243

Notes: Distribution of ML estimates in M=100 replications. M= number of converged samples. The SNF diverged in all cases.

For the error parameters σ,γ, and R, the GHF performs best. Most surprising is the good performance of the EKF (lowest RMSE for α). Overall, the time effect leads to a higher RMSE of the estimates for the variance components and more convergence problems.

7 Example: ordinal measurements

The nonlinear state-space model (1, 2) is flexible enough to model ordinal measurements via the threshold model

image(65)

where θ is the Heaviside step function and cj are thresholds contained in the parameter vector ψ. The variance of the measurement error R=var(ε) is taken small (10−6 here), so that the measurement density

image

is proportional to the indicator function χ_{h⁻¹(z)}(y) of the preimage h⁻¹(z). The measurements are therefore strongly nonlinear, and the a posteriori density is proportional to the a priori density truncated by the windows Cj=(cj,cj+1] defined by the thresholds c={−∞,c1,…,cJ,∞}. Figure 7 shows the trajectory of panel unit n=10 together with the thresholds {c1,c2}={−2,2} and the ordinal data z(t) ∈ {−1,0,1} (setting z0=−1). The data were generated as in section 6 using an Euler–Maruyama scheme with δt=0.1. They were filtered using the GGHF, comparing the normal correlation update with the Bayes update. As explained in section 4.3, the filter density is represented by the Hermite expansion φ(y;μ,σ²)H(y,K), and the measurement update is obtained either by the normal correlation (20, 21) or by the Bayes formula (4). In both cases, Gauss–Hermite integration can be used. Denoting the linear estimates (20, 21) by μ0 and Σ0, the normal correlation update is given by the product

Figure 7.

 Panel unit n=10: latent trajectory (grey) and ordinal measurements (black) with thresholds {c1,c2}={−2,2}. The values of z(t) are −1,0,1.
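The data-generating step behind Figure 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the drift −(αy + βy³) is an assumed Ginzburg–Landau form, the parameter values are the true values listed in Table 2, and the time effect is left out. The initial condition, the time horizon `t_end`, the seed and the function name are hypothetical. The ordinal value is z0 plus the number of thresholds lying strictly below y, so that each window is (cj, cj+1].

```python
import numpy as np

def simulate_ordinal_panel(n_units=10, t_end=20.0, dt=0.1,
                           alpha=-1.0, beta=0.1, sigma=1.0,
                           thresholds=(-2.0, 2.0), z0=-1, seed=0):
    """Euler-Maruyama paths of dy = -(alpha*y + beta*y^3) dt + sigma dW
    (assumed drift form) and their ordinal codes z in {z0, ..., z0 + J}."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    y = np.zeros((n_steps + 1, n_units))
    y[0] = rng.normal(0.0, 1.0, size=n_units)  # hypothetical initial condition
    for k in range(n_steps):
        drift = -(alpha * y[k] + beta * y[k] ** 3)
        noise = sigma * np.sqrt(dt) * rng.normal(size=n_units)
        y[k + 1] = y[k] + drift * dt + noise
    # Count the thresholds strictly below y; y == c_j stays in the lower window.
    z = z0 + np.sum(y[..., None] > np.asarray(thresholds), axis=-1)
    return y, z

y_latent, z_ordinal = simulate_ordinal_panel()
print(y_latent.shape, np.unique(z_ordinal))
```

Here every Euler step is treated as a measurement time; sparser sampling would simply keep every few rows of `z_ordinal`.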

image(66)
image(67)
image(68)

In the equations above, the a posteriori distribution is non-Gaussian due to the Hermite part H(y;{μ,m2,…,mK}), where {μ,m2,…,mK} are the a priori moments. For strongly nonlinear measurements, the Bayes formula yields the exact expression

image(69)
image(70)

The likelihood integral contains the a priori Gauss part φ(y;μ,σ²), but the efficiency of the Gauss–Hermite quadrature can be improved by integrating over the normal correlation update φ(y;μ0,Σ0) (analogously to importance sampling). Figures 8 and 9 compare the updates in the case K=2 (Gaussian filter). In this case the filter densities are always Gaussian, but the a posteriori moments are either the linear estimates or are computed using the Bayes formula. The latter method yields better updates, which approximate the truncated Gaussian a posteriori densities more closely. Note that the measurement density p(z|y)=φ(z;h(y),R), which does not integrate to unity as a function of y, was scaled by 10⁻³ in the graphics.
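The two measurement updates for the Gaussian case K=2 can be compared in a small sketch. It assumes a scalar Gaussian a priori density N(μ,σ²), the threshold function with {c1,c2}={−2,2} and z0=−1, and the standard normal correlation formulas (gain = Cov(y,h)/(Var(h)+R)), which are taken here to correspond to (20, 21). The function names, the number of nodes and the example values μ=1, σ²=4, z=0 are illustrative. The quadrature runs over the a priori density, not over the normal correlation update suggested above, so it is crude for this discontinuous integrand.

```python
import numpy as np

def h_threshold(y, thresholds=(-2.0, 2.0), z0=-1):
    """Ordinal measurement function: z0 + number of thresholds below y."""
    y = np.asarray(y, dtype=float)
    return z0 + np.sum(y[..., None] > np.asarray(thresholds), axis=-1)

def gh_prior(mu, sigma2, n_nodes=40):
    """Gauss-Hermite nodes and normalized weights for N(mu, sigma2)."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    return mu + np.sqrt(2.0 * sigma2) * x, w / np.sqrt(np.pi)

def normal_correlation_update(mu, sigma2, z, R=1e-6):
    """Linear update from quadrature estimates of Cov(y, h) and Var(h)."""
    y, w = gh_prior(mu, sigma2)
    hy = h_threshold(y)
    h_mean = np.sum(w * hy)
    cov_yh = np.sum(w * (y - mu) * (hy - h_mean))
    var_z = np.sum(w * (hy - h_mean) ** 2) + R
    gain = cov_yh / var_z
    return mu + gain * (z - h_mean), sigma2 - gain * cov_yh

def bayes_update(mu, sigma2, z, R=1e-6):
    """A posteriori mean and variance from prior weights times phi(z; h(y), R)."""
    y, w = gh_prior(mu, sigma2)
    lik = np.exp(-0.5 * (z - h_threshold(y)) ** 2 / R) / np.sqrt(2.0 * np.pi * R)
    evidence = np.sum(w * lik)  # zero if no node falls into the window
    post = w * lik / evidence
    mean = np.sum(post * y)
    return mean, np.sum(post * (y - mean) ** 2), evidence

mean_nc, var_nc = normal_correlation_update(1.0, 4.0, z=0)
mean_b, var_b, evidence = bayes_update(1.0, 4.0, z=0)
print(mean_nc, var_nc, mean_b, var_b)
```

Because the Bayes weights vanish outside the window, the Bayes mean always lies inside (c1,c2], whereas the linear estimate carries no such guarantee.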

Figure 8.

 GGHF: measurement updates of threshold model; normal correlation, K=2 (GHF); a priori (black), a posteriori (dark grey) and measurement density p(z|y) (light grey). Note that p(z|y) does not integrate to unity as a function of y.

Figure 9.

 GGHF: measurement updates of threshold model; Bayes formula, K=2 (GHF). Note that the update yields a better approximation of p(y|z); a priori (black), a posteriori (dark grey) and measurement density p(z|y) (light grey).

Using more terms (e.g. K=20) in the Hermite expansion yields a more realistic model of the bimodal a priori density (Figures 10 and 11). Note that the normal correlation update is then non-Gaussian as well, because of the Hermite term H(y,K). In some cases the Bayes update tends to produce unrealistic oscillations in the a posteriori density. This is due to locally negative values of the Hermite series.
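That a truncated Hermite series can become locally negative is easy to see in a small example. Since section 4.3 is not reproduced here, the sketch uses a generic Gram–Charlier-type series φ(y;μ,σ²) Σ c_k He_k((y−μ)/σ) as a stand-in for φ(y;μ,σ²)H(y,K). The coefficient 1/6 on He_3 (a standardized skewness of 1) and the function name are hypothetical choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def hermite_series_density(y, mu, sigma, coeffs):
    """phi(y; mu, sigma^2) * sum_k coeffs[k] * He_k((y - mu) / sigma)."""
    u = (np.asarray(y, dtype=float) - mu) / sigma
    gauss = np.exp(-0.5 * u ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return gauss * hermeval(u, coeffs)

grid = np.linspace(-8.0, 8.0, 3201)
dy = grid[1] - grid[0]

# Hypothetical coefficients: 1 + (1/6) * He_3(u), i.e. skewness 1.
coeffs = [1.0, 0.0, 0.0, 1.0 / 6.0]
p = hermite_series_density(grid, 0.0, 1.0, coeffs)

total_mass = np.sum(p) * dy   # Riemann sum of the series density
min_value = p.min()           # most negative value on the grid
print(total_mass, min_value, grid[np.argmin(p)])
```

The series still integrates to one, yet it dips below zero in the left tail. When such a region is cut out by a narrow measurement window, the renormalized a posteriori density can oscillate.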

Figure 10.

 GGHF: measurement updates of threshold model; normal correlation, K=20; a priori (black), a posteriori (dark grey) and measurement density p(z|y) (light grey).

Figure 11.

 GGHF: measurement updates of threshold model; Bayes formula, K=20; a priori (black), a posteriori (dark grey) and measurement density p(z|y) (light grey). Oscillation at t=13.7 (see text).

8 Conclusion

We compared filtering algorithms for nonlinear panel models in continuous time with discrete time measurements. The classical algorithms EKF and SNF are based on Taylor expansions of the moment equations (or of the Itô formula in the case of the LL). In contrast, UKF and GHF use numerical integration to compute the expectation values. The unscented transformation (UT) is applied directly to the moment equations, which avoids extending the system state and doubling its dimension. ML estimation of a Ginzburg–Landau model did not yield a uniformly best method, but in terms of bias, the UKF with κ=1 was best (model without time effects). Including time effects leads to an Np-dimensional filtering problem because the panel units become correlated. Again there were advantages for the integration-based filters UKF and GHF. Finally, ordinal data were treated with a threshold model, using the GHF (K=2) and the GGHF (K=20). As the measurement function is not differentiable, the EKF-type algorithms cannot be used here. The Bayes update is superior to the linear normal correlation update. The Hermite expansion yields a more realistic approximation of the truncated a posteriori density, but the Gaussian case K=2 already gives satisfactory results.

Acknowledgement

I would like to thank an anonymous referee for detailed comments which helped me improve the presentation of the paper.
