#### 5.1. Dynamic Probit and Ordered Probit Models

In addition to equation (1), assume that

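$$
c_i \mid y_{i0}, \mathbf{z}_i \sim \mathrm{Normal}\left(\alpha_0 + \alpha_1 y_{i0} + \mathbf{z}_i\boldsymbol{\alpha}_2,\ \sigma_a^2\right) \tag{15}
$$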

where **z**_{i} is the row vector of all (nonredundant) explanatory variables in all time periods. If **z**_{it} contains a full set of time period dummy variables, these elements would be dropped from **z**_{i}. The presence of **z**_{i} in equation (15) means that we cannot identify the coefficients on time-constant covariates in **z**_{it}, although time-constant covariates can be included in **z**_{i} in equation (15).

Given equation (1), we can write

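$$
f(y_1, \ldots, y_T \mid y_0, \mathbf{z}, c; \boldsymbol{\beta}) = \prod_{t=1}^{T} \left\{ \Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + c)^{y_t} \left[1 - \Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + c)\right]^{1-y_t} \right\} \tag{16}
$$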

where **β** = (**γ**′, ρ)′. When we integrate equation (16) with respect to the normal distribution in equation (15), we obtain the density of D(*y*_{i1}, …, *y*_{iT}|*y*_{i0}, **z**_{i}).

It turns out that we can specify the density in such a way that standard random effects probit software can be used for estimation. If we write

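$$
c_i = \alpha_0 + \alpha_1 y_{i0} + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i \tag{17}
$$

where *a*_{i}|(*y*_{i0}, **z**_{i}) ∼ Normal(0, σ_{a}^{2}), then *y*_{it} given (*y*_{i, t−1}, …, *y*_{i0}, **z**_{i}, *a*_{i}) follows a probit model with response probability

$$
\Phi(\mathbf{z}_{it}\boldsymbol{\gamma} + \rho y_{i,t-1} + \alpha_0 + \alpha_1 y_{i0} + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i) \tag{18}
$$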

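This is easy to derive by writing the latent variable version of the model as $y_{it}^{*} = \mathbf{z}_{it}\boldsymbol{\gamma} + \rho y_{i,t-1} + c_i + u_{it}$ and plugging in for *c*_{i} from equation (17):

$$
y_{it}^{*} = \mathbf{z}_{it}\boldsymbol{\gamma} + \rho y_{i,t-1} + \alpha_0 + \alpha_1 y_{i0} + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i + u_{it} \tag{19}
$$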

where *u*_{it}|(**z**_{i}, *y*_{i, t−1}, …, *y*_{i0}, *a*_{i}) ∼ Normal(0, 1); equation (18) follows. Thus, the density of (*y*_{i1}, …, *y*_{iT}) given (*y*_{i0} = *y*_{0}, **z**_{i} = **z**, *a*_{i} = *a*) is

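$$
\prod_{t=1}^{T} \left\{ \Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + \alpha_0 + \alpha_1 y_0 + \mathbf{z}\boldsymbol{\alpha}_2 + a)^{y_t} \left[1 - \Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + \alpha_0 + \alpha_1 y_0 + \mathbf{z}\boldsymbol{\alpha}_2 + a)\right]^{1-y_t} \right\} \tag{20}
$$

and integrating equation (20) against the Normal(0, σ_{a}^{2}) density gives the density of (*y*_{i1}, …, *y*_{iT}) given (*y*_{i0} = *y*_{0}, **z**_{i} = **z**):

$$
\int_{-\infty}^{\infty} \left( \prod_{t=1}^{T} \left\{ \Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + \alpha_0 + \alpha_1 y_0 + \mathbf{z}\boldsymbol{\alpha}_2 + a)^{y_t} \left[1 - \Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + \alpha_0 + \alpha_1 y_0 + \mathbf{z}\boldsymbol{\alpha}_2 + a)\right]^{1-y_t} \right\} \right) \frac{1}{\sigma_a}\,\phi\!\left(\frac{a}{\sigma_a}\right) da \tag{21}
$$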

Interestingly, the likelihood in equation (21) has exactly the same structure as in the standard random effects probit model, except that the explanatory variables at time period *t* are

$$
\mathbf{x}_{it} \equiv (1, \mathbf{z}_{it}, y_{i,t-1}, y_{i0}, \mathbf{z}_i) \tag{22}
$$

Importantly, we are not saying that *a*_{i} is independent of *y*_{i, t−1}, which is impossible. (Dependence between *a*_{i} and *y*_{i, t−1} means that a pooled probit analysis of *y*_{it} on **x**_{it} is inconsistent for the parameters and the APEs.) Further, the density in equation (21) is not the joint density of (*y*_{i1}, …, *y*_{iT}) given (**x**_{i1}, …, **x**_{iT}), as happens in the case with strictly exogenous **x**_{it}. Nevertheless, the way random effects probit works is by forming the products of the densities of *y*_{it} given (**x**_{it}, *a*_{i}), and then integrating out using the unconditional density of *a*_{i}, and this is what equation (21) calls for. So we add *y*_{i0} and **z**_{i} as additional explanatory variables in each time period and use standard random effects probit software to estimate **γ**, ρ, α_{0}, α_{1}, **α**_{2} and σ_{a}^{2}.
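As a rough illustration of what the random effects probit likelihood computes here, the sketch below evaluates one unit's contribution to equation (21) in plain Python. The function names (`build_regressors`, `unit_likelihood`), the coefficient ordering, and the trapezoid-rule integration over *a* are assumptions made for this example only; packaged estimators typically rely on Gauss-Hermite quadrature instead.

```python
import math


def norm_cdf(x):
    """Standard normal cdf, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def build_regressors(z_it, y_lag, y0, z_i):
    """Regressors at time t as in (22): (1, z_it, y_{i,t-1}, y_i0, z_i)."""
    return [1.0] + list(z_it) + [float(y_lag), float(y0)] + list(z_i)


def unit_likelihood(y, z, y0, coefs, sigma_a, n_grid=401, width=8.0):
    """One unit's likelihood: (20) integrated against Normal(0, sigma_a^2).

    y     : binary outcomes y_1..y_T
    z     : T covariate lists z_1..z_T
    y0    : initial condition y_0
    coefs : (alpha_0, gamma, rho, alpha_1, alpha_2), ordered like build_regressors()
    """
    z_i = [v for z_t in z for v in z_t]      # covariates from all periods
    lags = [y0] + list(y[:-1])               # y_{t-1} for t = 1..T
    index = [
        sum(c * x for c, x in zip(coefs, build_regressors(z_t, lag, y0, z_i)))
        for z_t, lag in zip(z, lags)
    ]

    def integrand(a):
        prod = 1.0
        for y_t, xb in zip(y, index):
            p = norm_cdf(xb + a)
            prod *= p if y_t == 1 else 1.0 - p
        return prod * norm_pdf(a / sigma_a) / sigma_a

    # Trapezoid rule on [-width*sigma_a, width*sigma_a] (illustrative choice).
    lo, hi = -width * sigma_a, width * sigma_a
    step = (hi - lo) / (n_grid - 1)
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + k * step) for k in range(1, n_grid - 1))
    return total * step
```

With *T* = 1 the integral has the closed form Φ[index/(1 + σ_{a}^{2})^{1/2}], which gives a convenient check on the numerical integration.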

Under the assumptions made, we can easily obtain estimated partial effects at interesting values of the explanatory variables. The average partial effects are based on

$$
E\left[\Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + c_i)\right] \tag{23}
$$

where the expectation is with respect to the distribution of *c*_{i}. The general formula in equation (14) turns out to be easy to obtain. Again, replace *c*_{i} with *c*_{i} = α_{0} + α_{1}*y*_{i0} + **z**_{i}**α**_{2} + *a*_{i}, so that expression (23) is

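$$
E\left[\Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + \alpha_0 + \alpha_1 y_{i0} + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i)\right] \tag{24}
$$

where the expectation is over the distribution of (*y*_{i0}, **z**_{i}, *a*_{i}); **z**_{t} and *y*_{t−1} are fixed values here. Now, as in Section 4, we use iterated expectations:

$$
E\left[\Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + \alpha_0 + \alpha_1 y_{i0} + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i)\right] = E\left\{ E\left[\Phi(\mathbf{z}_t\boldsymbol{\gamma} + \rho y_{t-1} + \alpha_0 + \alpha_1 y_{i0} + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i) \mid y_{i0}, \mathbf{z}_i\right] \right\} \tag{25}
$$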

The conditional expectation inside equation (25) is easily shown to be

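$$
\Phi(\mathbf{z}_t\boldsymbol{\gamma}_a + \rho_a y_{t-1} + \alpha_{a0} + \alpha_{a1} y_{i0} + \mathbf{z}_i\boldsymbol{\alpha}_{a2}) \tag{26}
$$

where the ‘a’ subscript denotes the original parameter multiplied by (1 + σ_{a}^{2})^{−1/2}. Now, we want to estimate the expected value of expression (26) with respect to the distribution of (*y*_{i0}, **z**_{i}). A consistent estimator is

$$
N^{-1}\sum_{i=1}^{N} \Phi(\mathbf{z}_t\hat{\boldsymbol{\gamma}}_a + \hat{\rho}_a y_{t-1} + \hat{\alpha}_{a0} + \hat{\alpha}_{a1} y_{i0} + \mathbf{z}_i\hat{\boldsymbol{\alpha}}_{a2}) \tag{27}
$$

where the ‘a’ subscript now denotes multiplication by $(1 + \hat{\sigma}_a^2)^{-1/2}$, and $\hat{\boldsymbol{\gamma}}$, $\hat{\rho}$, $\hat{\alpha}_0$, $\hat{\alpha}_1$, $\hat{\boldsymbol{\alpha}}_2$ and $\hat{\sigma}_a^2$ are the MLEs. We can compute changes or derivatives of expression (27) with respect to **z**_{t} or *y*_{t−1} to obtain APEs. Thus, we can determine the economic importance of any state dependence.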
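The averaging in expression (27) is simple to carry out once the random effects probit estimates are in hand. The following is a minimal sketch in plain Python; the function names (`average_response`, `state_dependence_ape`) and argument layout are hypothetical, and any parameter values passed in would have to come from the user's own estimation.

```python
import math


def norm_cdf(x):
    """Standard normal cdf, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def average_response(z_t, y_lag, gamma, rho, alpha0, alpha1, alpha2,
                     sigma_a, y0, z):
    """Sample average in (27) at fixed values (z_t, y_{t-1}).

    y0 : list of the N initial conditions y_i0
    z  : list of the N covariate vectors z_i (all periods stacked)
    """
    # The 'a'-subscript scaling: every parameter times (1 + sigma_a^2)^(-1/2).
    scale = 1.0 / math.sqrt(1.0 + sigma_a ** 2)
    fixed = sum(g * v for g, v in zip(gamma, z_t)) + rho * y_lag + alpha0
    total = 0.0
    for y_i0, z_i in zip(y0, z):
        hetero = alpha1 * y_i0 + sum(a * v for a, v in zip(alpha2, z_i))
        total += norm_cdf(scale * (fixed + hetero))
    return total / len(y0)


def state_dependence_ape(z_t, **params):
    """Change in (27) when y_{t-1} moves from 0 to 1, holding z_t fixed."""
    return (average_response(z_t, 1, **params)
            - average_response(z_t, 0, **params))
```

Evaluating `state_dependence_ape` at several values of **z**_{t} gives a sense of how the estimated state dependence varies with the covariates; standard errors would require the delta method or a bootstrap, which this sketch does not attempt.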

Allowing for a more flexible conditional mean in equation (15) is straightforward, provided the mean is linear in parameters. For example, including interactions between *y*_{i0} and **z**_{i} is simple, and would be warranted if we included interactions between the elements of **z**_{it} and *y*_{i, t−1} in the structural model. Allowing for heteroscedasticity in Var(*c*_{i}|*y*_{i0}, **z**_{i}) is more complicated and would probably require special programming. Still, specifying, say, Var(*c*_{i}|*y*_{i0}, **z**_{i}) = σ_{a}^{2}exp(γ_{1}*y*_{i0} + **z**_{i}**γ**_{2}), where γ_{1} and **γ**_{2} are variance parameters distinct from **γ** in equation (1), leads to a tractable log-likelihood function: simply replace σ_{a} in equation (21) with σ_{a}[exp(γ_{1}*y*_{i0} + **z**_{i}**γ**_{2})]^{1/2}. With *a*_{i}|(*y*_{i0}, **z**_{i}) ∼ Normal[0, σ_{a}^{2}exp(γ_{1}*y*_{i0} + **z**_{i}**γ**_{2})], the conditional expectation in equation (25) is still easy to obtain: Φ{(**z**_{t}**γ** + ρ*y*_{t−1} + α_{0} + α_{1}*y*_{i0} + **z**_{i}**α**_{2})/[1 + σ_{a}^{2}exp(γ_{1}*y*_{i0} + **z**_{i}**γ**_{2})]^{1/2}} (see Wooldridge [2002, problem 15.18(c)] for the static case), and so APEs would be readily computable by averaging across *i*.

Certain specification tests are easy to compute. For example, after estimating the basic model, terms such as $(\hat{\alpha}_1 y_{i0} + \mathbf{z}_i\hat{\boldsymbol{\alpha}}_2)^2$ and $(\hat{\alpha}_1 y_{i0} + \mathbf{z}_i\hat{\boldsymbol{\alpha}}_2)^3$ could be added and their joint significance tested using a standard likelihood ratio test. Obtaining score tests for exponential heteroscedasticity in Var(*c*_{i}|*y*_{i0}, **z**_{i}) or for nonnormality in D(*c*_{i}|*y*_{i0}, **z**_{i}) are good topics for future research.

The binary probit model extends in a straightforward way to a dynamic ordered probit model. If *y*_{it} takes on values in {0, 1, …, *J*} then we can specify an ordered probit model with *J* lagged indicators, 1[*y*_{i, t−1} = *j*], *j* = 1, …, *J*, and strictly exogenous explanatory variables, **z**_{it}. The underlying latent variable model would be $y_{it}^{*} = \mathbf{z}_{it}\boldsymbol{\gamma} + \mathbf{r}_{i,t-1}\boldsymbol{\rho} + c_i + e_{it}$, where **r**_{i, t−1} is the vector of *J* indicators, and *e*_{it} has a conditional standard normal distribution. The observed value, *y*_{it}, is determined by $y_{it}^{*}$ falling into a particular interval, where the cut-points must be estimated. If we specify D(*c*_{i}|*y*_{i0}, **z**_{i}) as having a homoscedastic normal distribution, standard random effects ordered probit software can be used. Probably we would allow *h*(*c*|*y*_{0}, **z**;**δ**) to depend on a full set of indicators 1[*y*_{i0} = *j*], *j* = 1, …, *J*.
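To make the ordered probit structure concrete, the sketch below builds the vector of *J* lagged indicators and the implied response probabilities for a given value of the heterogeneity. The names `lag_indicators`, `ordered_probit_probs` and the cut-point list `cuts` are assumptions of this example; the text does not fix a notation for the cut-points.

```python
import math


def norm_cdf(x):
    """Standard normal cdf, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lag_indicators(y_lag, J):
    """r_{t-1}: indicators 1[y_{t-1} = j], j = 1..J (category 0 is the base)."""
    return [1.0 if y_lag == j else 0.0 for j in range(1, J + 1)]


def ordered_probit_probs(z_t, y_lag, gamma, rho, c, cuts):
    """P(y_t = j | z_t, y_{t-1}, c) for j = 0..J.

    cuts : J increasing cut-points (hypothetical notation)
    rho  : J coefficients, one per lagged indicator
    """
    J = len(cuts)
    r = lag_indicators(y_lag, J)
    index = (sum(g * v for g, v in zip(gamma, z_t))
             + sum(p * v for p, v in zip(rho, r)) + c)
    # P(y_t <= j) = Phi(cut_{j+1} - index) for j < J, and 1 for j = J.
    cdf = [norm_cdf(k - index) for k in cuts] + [1.0]
    return [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, J + 1)]
```

Replacing `c` with α_{0} + (indicators of *y*_{i0}) + **z**_{i}**α**_{2} + *a*_{i} and integrating over *a*_{i} would then proceed exactly as in the binary case.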

Certainly there are some criticisms that one can make about the conditional MLE approach for dynamic probit models. First, suppose that there are no covariates. Then, unless α_{1} = 0, equation (15) implies that *c*_{i} has a mixture-of-normals distribution, rather than a normal distribution, as would be a standard assumption. But *c*_{i} given *y*_{i0} has some distribution, and it is unclear why an unconditional normal distribution for *c*_{i} is *a priori* better than a conditional normal distribution. For cross-sectional binary response models, Geweke and Keane (1999) find that, empirically, mixture-of-normals probit models fit significantly better than the standard probit model. Granted, the mixing probability here is tied to *y*_{0}, and the variance is assumed to be constant. But often in econometrics we assume that unobserved heterogeneity has a conditional normal distribution rather than an unconditional normal distribution.
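The mixture claim can be written out explicitly. As a sketch, assume there are no covariates and let π_{0} ≡ P(*y*_{i0} = 1), a symbol introduced here only for illustration. Averaging the conditional normal density in equation (15) over the two values of *y*_{i0} gives the unconditional density of *c*_{i}:

$$
h(c) = (1 - \pi_0)\,\frac{1}{\sigma_a}\,\phi\!\left(\frac{c - \alpha_0}{\sigma_a}\right) + \pi_0\,\frac{1}{\sigma_a}\,\phi\!\left(\frac{c - \alpha_0 - \alpha_1}{\sigma_a}\right)
$$

This is a two-component mixture of normals with common variance σ_{a}^{2} and means α_{0} and α_{0} + α_{1}; it collapses to a single normal density only when α_{1} = 0 (or when π_{0} equals zero or one).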

Related to the previous criticism is that, in models without covariates, equation (15) implies a distribution D(*y*_{i0}|*c*_{i}) different from the steady-state distribution. This is not ideal—in the linear model, one can allow for a non-steady-state distribution while including the steady-state distribution as a special case—but it is only relevant in models without covariates. Plus, even if there are no covariates, it is not clear why imposing a steady-state distribution is better than that implied by equation (15). Dynamic panel data models are really about modelling the conditional distributions in Assumption 1. One can take issue with any set of auxiliary assumptions.

Another criticism is that if ρ = 0 then, because *c*_{i} given **z**_{i} cannot be normally distributed unless α_{1} = 0, the model is not compatible with Chamberlain's (1980) static random effects probit model. That the model here does not encompass Chamberlain's is true, but it is unclear why normality of *c*_{i} given **z**_{i} is necessarily a better assumption than normality of *c*_{i} given (*y*_{i0}, **z**_{i}). Both are only approximations to the truth and, when estimating a dynamic model, it is much more convenient to use equation (15). Plus, Chamberlain's static model does not allow estimation of either **ρ** or the amount of state dependence, as measured by the average partial effect.

Assumption (15) is also subject to the same criticism as Chamberlain's (1980) random effects probit model with strictly exogenous covariates. Namely, if we want the same model to hold for any number of time periods *T*, the normality assumption in equation (15) imposes distributional restrictions on the **z**_{it}. For example, suppose α_{1} = 0. Then, for equation (15) to hold for *T* and *T* − 1, **z**_{iT}**α**_{2T} given (**z**_{i1}, …, **z**_{i, T−1}) would have to have a normal distribution. While theoretically this is a valid criticism, it is hardly unique to this setting. For example, every time an explanatory variable is added in a cross-sectional probit analysis, the probit model can no longer hold unless the new variable is normally distributed. Yet researchers regularly use probit models on different sets of explanatory variables.

#### 5.2. Dynamic Tobit Models

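For the Tobit model the density in Assumption 2 is

$$
f_t(y_t \mid \mathbf{z}_t, y_{t-1}, c; \boldsymbol{\theta}) =
\begin{cases}
1 - \Phi\left[(\mathbf{z}_t\boldsymbol{\gamma} + \mathbf{g}_{t-1}\boldsymbol{\rho} + c)/\sigma_u\right], & y_t = 0 \\
(1/\sigma_u)\,\phi\left[(y_t - \mathbf{z}_t\boldsymbol{\gamma} - \mathbf{g}_{t-1}\boldsymbol{\rho} - c)/\sigma_u\right], & y_t > 0
\end{cases}
$$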

To implement the conditional MLE, we need to specify a density in Assumption 3. Again, it is convenient for this to be normal, as in equation (15). For the Tobit case, we might replace *y*_{i0} with a more general vector of functions, **r**_{i0}≡**r**(*y*_{i0}), which allows *c*_{i} to have a fairly flexible conditional mean. Interactions between elements of **r**_{i0} and **z**_{i} may be warranted. We can use an argument very similar to the probit case to show that the log-likelihood has a form that can be maximized by standard random effects Tobit software, where the explanatory variables at time *t* are **x**_{it}≡(**z**_{it}, **g**_{i, t−1}, **r**_{i0}, **z**_{i}) and **g**_{i, t−1}≡**g**(*y*_{i, t−1}). In particular, the latent variable model can be written as $y_{it}^{*} = \mathbf{z}_{it}\boldsymbol{\gamma} + \mathbf{g}_{i,t-1}\boldsymbol{\rho} + \alpha_0 + \mathbf{r}_{i0}\boldsymbol{\alpha}_1 + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i + u_{it}$, where *u*_{it} given (**z**_{i}, *y*_{i, t−1}, …, *y*_{i0}, *a*_{i}) has a Normal(0, σ_{u}^{2}) distribution. Again, we estimate σ_{a}^{2} rather than σ_{c}^{2}, but σ_{a}^{2} is exactly what appears in the average partial effects.

Denote E(*y*_{it}|**w**_{it} = **w**_{t}, *c*_{i} = *c*) as

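$$
m(\mathbf{w}_t\boldsymbol{\beta} + c, \sigma_u^2) = \Phi\left[(\mathbf{w}_t\boldsymbol{\beta} + c)/\sigma_u\right](\mathbf{w}_t\boldsymbol{\beta} + c) + \sigma_u\,\phi\left[(\mathbf{w}_t\boldsymbol{\beta} + c)/\sigma_u\right] \tag{28}
$$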

where **w**_{t} = (**z**_{t}, **g**_{t−1}). As in the probit case, for estimating the APEs it is useful to substitute for *c*_{i}:

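$$
E\left[m(\mathbf{w}_t\boldsymbol{\beta} + c_i, \sigma_u^2)\right] = E\left[m(\mathbf{w}_t\boldsymbol{\beta} + \alpha_0 + \mathbf{r}_{i0}\boldsymbol{\alpha}_1 + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i, \sigma_u^2)\right] = E\left\{ E\left[m(\mathbf{w}_t\boldsymbol{\beta} + \alpha_0 + \mathbf{r}_{i0}\boldsymbol{\alpha}_1 + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i, \sigma_u^2) \mid \mathbf{r}_{i0}, \mathbf{z}_i\right] \right\} \tag{29}
$$

where the first expectation is with respect to the distribution of *c*_{i} and the second expectation is with respect to the distribution of (*y*_{i0}, **z**_{i}, *a*_{i}). The second equality follows from iterated expectations. Since *a*_{i} and (**r**_{i0}, **z**_{i}) are independent, and *a*_{i} ∼ Normal(0, σ_{a}^{2}), the conditional expectation in equation (29) is obtained by integrating over *a*_{i} with respect to the Normal(0, σ_{a}^{2}) distribution. Since $m(\mathbf{w}_t\boldsymbol{\beta} + \alpha_0 + \mathbf{r}_{i0}\boldsymbol{\alpha}_1 + \mathbf{z}_i\boldsymbol{\alpha}_2 + a_i, \sigma_u^2)$ is obtained by integrating max(0, **w**_{t}**β** + α_{0} + **r**_{i0}**α**_{1} + **z**_{i}**α**_{2} + *a*_{i} + *u*_{it}) with respect to *u*_{it} over the Normal(0, σ_{u}^{2}) distribution, it is easily seen that the conditional expectation in equation (29) is

$$
m(\mathbf{w}_t\boldsymbol{\beta} + \alpha_0 + \mathbf{r}_{i0}\boldsymbol{\alpha}_1 + \mathbf{z}_i\boldsymbol{\alpha}_2,\ \sigma_a^2 + \sigma_u^2) \tag{30}
$$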

A consistent estimator of the expected value of expression (30) is simply

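$$
N^{-1}\sum_{i=1}^{N} m(\mathbf{w}_t\hat{\boldsymbol{\beta}} + \hat{\alpha}_0 + \mathbf{r}_{i0}\hat{\boldsymbol{\alpha}}_1 + \mathbf{z}_i\hat{\boldsymbol{\alpha}}_2,\ \hat{\sigma}_a^2 + \hat{\sigma}_u^2) \tag{31}
$$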
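The Tobit average response can be sketched in a few lines of plain Python. The helper names (`tobit_mean`, `tobit_average_response`) and the argument layout are hypothetical, chosen for this example; `tobit_mean` implements the standard censored-normal mean Φ(*k*/σ)*k* + σφ(*k*/σ), and the second function averages it across units with the two variances added together, as in expression (31).

```python
import math


def norm_cdf(x):
    """Standard normal cdf, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def tobit_mean(index, variance):
    """m(index, variance) = E[max(0, index + e)] with e ~ Normal(0, variance)."""
    s = math.sqrt(variance)
    return norm_cdf(index / s) * index + s * norm_pdf(index / s)


def tobit_average_response(w_beta, alpha0, alpha1, alpha2,
                           sigma_a2, sigma_u2, r0, z):
    """Sample average of m(.) at a fixed value of w_t * beta.

    r0 : list of the N vectors r_i0 = r(y_i0)
    z  : list of the N covariate vectors z_i
    """
    total = 0.0
    for r_i0, z_i in zip(r0, z):
        index = w_beta + alpha0 + dot(r_i0, alpha1) + dot(z_i, alpha2)
        # Integrating a_i out adds sigma_a^2 to the variance argument.
        total += tobit_mean(index, sigma_a2 + sigma_u2)
    return total / len(r0)
```

Differences of `tobit_average_response` across two values of `w_beta` (for example, two values of the lagged outcome) then give the corresponding average partial effect.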

Other corner solution responses can be handled similarly. For example, suppose *y*_{it} is a fractional variable that can take on the values zero and one with positive probability. Then we can define *y*_{it} as a doubly-censored version of the latent variable introduced earlier. Standard software that estimates two-limit random effects Tobit models is readily applied.

#### 5.3. Dynamic Poisson Model

As in Section 2, we assume that *y*_{it} given (*y*_{i, t−1}, …, *y*_{i0}, **z**_{i}, *c*_{i}) has a Poisson distribution with mean given in equation (4). For Assumption 3, write

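$$
c_i = \exp(\alpha_0 + \mathbf{r}_{i0}\boldsymbol{\alpha}_1 + \mathbf{z}_i\boldsymbol{\alpha}_2)\, a_i \tag{32}
$$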

where **r**_{i0} is a vector of functions of *y*_{i0}. Assume that *a*_{i} is independent of (*y*_{i0}, **z**_{i}) and *a*_{i} ∼ Gamma(η, η), which is analogous to Hausman *et al.* (1984). Then, for each *t*, *y*_{it}|(*y*_{i, t−1}, …, *y*_{i0}, **z**_{i}, *a*_{i}) has a Poisson distribution with mean

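$$
a_i \exp(\mathbf{z}_{it}\boldsymbol{\gamma} + \mathbf{g}_{i,t-1}\boldsymbol{\rho} + \alpha_0 + \mathbf{r}_{i0}\boldsymbol{\alpha}_1 + \mathbf{z}_i\boldsymbol{\alpha}_2) \tag{33}
$$

Call the mean in expression (33) *a*_{i}*m*_{it}. Then the density of (*y*_{i1}, …, *y*_{iT}) given (**z**_{i}, *y*_{i0}, *a*_{i}) is obtained, as usual, by the product rule:

$$
\prod_{t=1}^{T} \left[ \frac{\exp(-a_i m_{it})(a_i m_{it})^{y_t}}{y_t!} \right] = \left[ \prod_{t=1}^{T} \frac{m_{it}^{y_t}}{y_t!} \right] a_i^{n} \exp\left( -a_i \sum_{s=1}^{T} m_{is} \right) \tag{34}
$$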

where *n* = *y*_{1} + · · · + *y*_{T}. When we integrate out *a*_{i} with respect to the Gamma(η, η) density, we obtain a density that has the usual random effects Poisson form with Gamma(η, η) heterogeneity, as in Hausman *et al.* [1984, equation (2.3)]. The difference is that the explanatory variables are (**z**_{it}, **g**_{i, t−1}, **r**_{i0}, **z**_{i}). This makes estimation especially convenient in software packages that estimate random effects Poisson models with Gamma heterogeneity. The Chamberlain (1992) and Wooldridge (1997) moment estimators are compatible with this MLE analysis in the sense that the moment estimators only use the conditional mean assumption (4).
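Integrating *a*_{i} out of the product of Poisson densities against the Gamma(η, η) density has a closed form, which the sketch below evaluates on the log scale. The function names (`poisson_means`, `re_poisson_loglik`) and argument layout are assumptions of this example; the closed form itself is the standard Gamma integral, ∫ *a*^{n+η−1} exp[−*a*(*M* + η)] d*a* = Γ(*n* + η)/(*M* + η)^{n+η}, with *M* the sum of the *m*_{it}.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def poisson_means(z, g_lag, r0, z_i, gamma, rho, alpha0, alpha1, alpha2):
    """m_it = exp(z_it*gamma + g_{i,t-1}*rho + alpha0 + r_i0*alpha1 + z_i*alpha2).

    z, g_lag : lists of T vectors z_it and g(y_{i,t-1})
    r0, z_i  : the unit's r(y_i0) vector and stacked covariates
    """
    const = alpha0 + dot(r0, alpha1) + dot(z_i, alpha2)
    return [math.exp(dot(z_t, gamma) + dot(g_t, rho) + const)
            for z_t, g_t in zip(z, g_lag)]


def re_poisson_loglik(y, m, eta):
    """Log density of (y_1..y_T) after integrating a_i ~ Gamma(eta, eta) out."""
    n = sum(y)   # total count over the T periods
    M = sum(m)   # sum of the m_it
    ll = sum(y_t * math.log(m_t) - math.lgamma(y_t + 1)
             for y_t, m_t in zip(y, m))
    ll += (eta * math.log(eta) + math.lgamma(n + eta) - math.lgamma(eta)
           - (n + eta) * math.log(M + eta))
    return ll
```

With *T* = 1 the result reduces to a negative binomial probability, so the implied probabilities over *y* = 0, 1, 2, … should sum to one; this is an easy sanity check before summing `re_poisson_loglik` across units inside an optimizer.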