## 1. INTRODUCTION

In dynamic panel data models with unobserved effects, the treatment of the initial observations is an important theoretical and practical problem. Much attention has been devoted to dynamic linear models with an additive unobserved effect, particularly the simple AR(1) model without additional covariates. As is well known, the usual within estimator is inconsistent, and can be badly biased. [See, for example, Hsiao (1986, section 4.2).]

For linear models with an additive unobserved effect, the problems with the within estimator can be solved by using an appropriate transformation, such as differencing, to eliminate the unobserved effects. Then, instrumental variables (IV) can usually be found for implementation in a generalized method of moments (GMM) framework. Anderson and Hsiao (1982) proposed IV estimation on a first-differenced equation, while several authors, including Arellano and Bond (1991), Arellano and Bover (1995), and Ahn and Schmidt (1995), improved on the Anderson–Hsiao estimator by using additional moment restrictions in GMM estimation. More recently, Blundell and Bond (1998) and Hahn (1999) have shown that imposing restrictions on the distribution of initial conditions can greatly improve the efficiency of GMM over certain parts of the parameter space.
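The bias of the within estimator and the differencing-plus-IV remedy can be illustrated with a small simulation. The following is a minimal sketch, not any cited author's implementation: the function names (`simulate_panel`, `within_estimator`, `anderson_hsiao_estimator`), the parameter values, and the way the initial condition is generated are assumptions made for illustration. It simulates a linear AR(1) panel with an additive unobserved effect, then compares the within estimator with a just-identified IV estimator on the first-differenced equation that uses the level dated two periods back as the instrument, in the spirit of Anderson and Hsiao (1982).

```python
import random


def simulate_panel(n_units, n_periods, rho, seed=0):
    """Simulate y_it = rho * y_{i,t-1} + c_i + u_it for t = 1..n_periods.

    Assumption for illustration: the initial value y_i0 is centred on
    c_i / (1 - rho), so it is correlated with the unobserved effect c_i.
    Returns a list of per-unit lists [y_i0, y_i1, ..., y_iT].
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        c = rng.gauss(0.0, 1.0)
        y = [c / (1.0 - rho) + rng.gauss(0.0, 1.0)]
        for _ in range(n_periods):
            y.append(rho * y[-1] + c + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel


def within_estimator(panel):
    """Regress y_it on y_{i,t-1} after subtracting unit-specific time means."""
    num = den = 0.0
    for y in panel:
        lagged, current = y[:-1], y[1:]
        lag_mean = sum(lagged) / len(lagged)
        cur_mean = sum(current) / len(current)
        for x_lag, x_cur in zip(lagged, current):
            num += (x_lag - lag_mean) * (x_cur - cur_mean)
            den += (x_lag - lag_mean) ** 2
    return num / den


def anderson_hsiao_estimator(panel):
    """IV on the differenced equation, with y_{i,t-2} instrumenting
    for the lagged difference y_{i,t-1} - y_{i,t-2}."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            instrument = y[t - 2]
            num += instrument * (y[t] - y[t - 1])
            den += instrument * (y[t - 1] - y[t - 2])
    return num / den


if __name__ == "__main__":
    data = simulate_panel(n_units=5000, n_periods=6, rho=0.5, seed=1)
    print("within estimate:", within_estimator(data))
    print("Anderson-Hsiao estimate:", anderson_hsiao_estimator(data))
```

With a short panel such as this one, the within estimate should fall well below the true autoregressive coefficient, while the IV estimate on the differenced equation should be close to it, although noisier.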

Solving the initial conditions problem is notably more difficult in nonlinear models. Generally, there are no known transformations that eliminate the unobserved effects and result in usable moment conditions, although special cases have been worked out. Chamberlain (1992) finds moment conditions for dynamic models with a multiplicative effect in the conditional mean, and Wooldridge (1997) considers transformations for a more general class of multiplicative models. Honoré (1993) obtains orthogonality conditions for the unobserved effects Tobit model with a lagged dependent variable. For the unobserved effects logit model with a lagged dependent variable, Honoré and Kyriazidou (2000) find an objective function that identifies the parameters under certain assumptions on the strictly exogenous covariates.

The strength of semiparametric approaches is that they allow estimation of parameters without specifying a distribution for the unobserved effect. Unfortunately, semiparametric identification hinges on some strong assumptions concerning the strictly exogenous covariates; for example, time dummies are not allowed in the Honoré and Kyriazidou (2000) approach. Honoré and Kyriazidou also reduce the sample to cross-sectional units with no change in any discrete covariates over the last two time periods.

Another practical limitation of the Honoré (1993) and Honoré and Kyriazidou (2000) estimators—and one that often goes unnoticed—is that partial effects on the response probability or conditional mean are not identified. Therefore, the absolute importance of covariates, or the amount of state dependence, cannot be determined from semiparametric approaches.

In this paper I reconsider the initial conditions problem in a parametric framework for nonlinear models. A parametric approach has its usual drawbacks because I specify an auxiliary conditional distribution for the unobserved heterogeneity; misspecification of this distribution generally results in inconsistent parameter estimates. Nevertheless, in some leading cases the approach I take leads to some remarkably simple maximum likelihood estimators. Further, I show that the assumptions are sufficient for uncovering the quantities that are usually of interest in nonlinear applications: partial effects on the mean response, averaged across the population distribution of the unobserved heterogeneity.

Previous research in parametric, dynamic nonlinear models has focused on three different ways of handling initial conditions; these are summarized by Hsiao (1986, section 7.4). The first approach is to treat the initial conditions for each cross-sectional unit as nonrandom variables. Unfortunately, nonrandomness of the initial conditions, $\mathbf{y}_{i0}$, implies that $\mathbf{y}_{i0}$ is independent of unobserved heterogeneity, $\mathbf{c}_{i}$. Even when we observe the entire history of the process $\{\mathbf{y}_{it}\}$, the assumption of independence between $\mathbf{c}_{i}$ and $\mathbf{y}_{i0}$ is very strong. For example, suppose we are interested in modelling earnings of college graduates once they leave college, and $\mathbf{y}_{i0}$ is earnings in the first post-school year. That we observe the start of this process is logically distinct from the strong assumption that unobserved ‘ability’ and ‘motivation’ are independent of initial earnings.

A better approach is to allow the initial condition to be random, and then to use the joint distribution of *all* outcomes on the response, including that in the initial time period, conditional on unobserved heterogeneity and observed strictly exogenous explanatory variables. The main complication with this approach is specifying the distribution of the initial condition given unobserved heterogeneity. Some authors insist that the distribution of the initial condition represent a steady-state distribution. While the steady-state distribution can be found in special cases, such as the first-order linear model without exogenous variables [see Bhargava and Sargan (1983) and Hsiao (1986, section 4.3)] and the unobserved effects probit model without additional conditioning variables [see Hsiao (1986, section 7.4)], it cannot be found for even modest extensions.

For the dynamic probit model with covariates, Heckman (1981) proposed approximating the conditional distribution of the initial condition. This avoids the practical problem of not being able to find the conditional distribution of the initial value. But, as we will see, it is computationally more difficult than necessary for obtaining both parameter estimates and estimates of averaged effects in nonlinear models.

The approach I suggest in this paper is to model the distribution of the unobserved effect conditional on the initial value and any exogenous explanatory variables. This suggestion has been made before for particular models. For example, Chamberlain (1980) mentions this possibility for the linear AR(1) model without covariates, and Blundell and Smith (1991) study the conditional maximum likelihood estimator of the same model; see also Blundell and Bond (1998). For the binary response model with a lagged dependent variable, Arellano and Carrasco (2003) study a maximum likelihood estimator conditional on the initial condition, where the distribution of the unobserved effect given the initial condition is taken to be discrete. When specialized to the binary response model, the approach here is more flexible, and computationally much simpler: the response probability can have the probit or logit form, strictly exogenous explanatory variables are easily incorporated along with a lagged dependent variable, and standard random effects software can be used to estimate the parameters and averaged effects.
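In symbols, the idea can be sketched as follows. The notation is an assumption introduced here for illustration: $f_t$ denotes the density of the response in period $t$ given the lagged response, the strictly exogenous variables $\mathbf{z}_{it}$, and the unobserved effect; $h$ denotes the auxiliary density of the unobserved effect given the initial value and $\mathbf{z}_i = (\mathbf{z}_{i1}, \ldots, \mathbf{z}_{iT})$; and $\theta$ and $\delta$ are the corresponding parameter vectors. Assuming the dynamics are correctly specified with one lag and $\mathbf{z}_i$ is strictly exogenous conditional on the unobserved effect, the density of the outcomes conditional on the initial value is obtained by integrating out the unobserved effect:

$$
f(\mathbf{y}_{i1}, \ldots, \mathbf{y}_{iT} \mid \mathbf{y}_{i0}, \mathbf{z}_i; \theta, \delta)
= \int \left[ \prod_{t=1}^{T} f_t(\mathbf{y}_{it} \mid \mathbf{z}_{it}, \mathbf{y}_{i,t-1}, \mathbf{c}; \theta) \right] h(\mathbf{c} \mid \mathbf{y}_{i0}, \mathbf{z}_i; \delta) \, d\mathbf{c}.
$$

Because the integration is against the distribution of the unobserved effect given the initial value, no model for the initial value given the unobserved effect is required.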

Specifying a distribution of heterogeneity conditional on the initial condition has several advantages. First, we can choose the auxiliary distribution to be flexible, and view it as an alternative approximation to Heckman's (1981). Second, in several leading cases—probit, ordered probit, Tobit and Poisson regression—an auxiliary distribution can be chosen that leads to straightforward estimation using standard software. Third, partial effects on mean responses, averaged across the distribution of unobservables, are identified and can be estimated without much difficulty. I show how to obtain these partial effects generally in Section 4, and Section 5 covers the probit and Tobit models.
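To make the probit case concrete, the following is a minimal sketch of the likelihood contribution of one cross-sectional unit, conditional on its initial value. Everything specific in it is an assumption made for illustration rather than a specification taken from the text above: the response probability is taken to be the standard normal cdf evaluated at `rho * y_{t-1} + c`, the unobserved effect is taken to be `c = alpha0 + alpha1 * y_0 + a` with `a` normally distributed with mean zero and standard deviation `sigma_a`, covariates are omitted for brevity, and the integral over `a` is approximated on a simple equally spaced grid. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def unit_likelihood(y, rho, alpha0, alpha1, sigma_a, n_grid=201, width=6.0):
    """Approximate the likelihood of (y_1, ..., y_T) given y_0 for one unit.

    y is a list of 0/1 outcomes whose first element is the initial value y_0.
    Illustrative model: P(y_t = 1 | y_{t-1}, c) = Phi(rho * y_{t-1} + c),
    with c = alpha0 + alpha1 * y_0 + sigma_a * z and z standard normal.
    The integral over z uses an equally spaced grid on [-width, width]
    with normal-density weights normalised to sum to one.
    """
    step = 2.0 * width / (n_grid - 1)
    nodes = [-width + k * step for k in range(n_grid)]
    weights = [STD_NORMAL.pdf(z) for z in nodes]
    total_weight = sum(weights)

    value = 0.0
    for z, w in zip(nodes, weights):
        c = alpha0 + alpha1 * y[0] + sigma_a * z
        prod = 1.0
        for t in range(1, len(y)):
            p = STD_NORMAL.cdf(rho * y[t - 1] + c)
            prod *= p if y[t] == 1 else 1.0 - p
        value += w * prod
    return value / total_weight


def log_likelihood(panel, rho, alpha0, alpha1, sigma_a):
    """Sum of log likelihood contributions over independent units."""
    return sum(
        math.log(unit_likelihood(y, rho, alpha0, alpha1, sigma_a))
        for y in panel
    )


if __name__ == "__main__":
    toy_panel = [[1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
    print(log_likelihood(toy_panel, rho=0.8, alpha0=0.2, alpha1=0.5, sigma_a=1.0))
```

Under these assumptions the function has the same structure as a random effects probit likelihood in which the initial value enters as an additional regressor in every period, which is one way to see why standard software could be used. In practice the grid would normally be replaced by Gauss-Hermite quadrature or simulation.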