SUMMARY

  1. Top of page
  2. SUMMARY
  3. 1 INTRODUCTION
  4. 2 THE MODEL
  5. 3 NLS ESTIMATION
  6. 4 GMM ESTIMATION
  7. 5 TESTING FOR SELECTION BIAS
  8. 6 EMPIRICAL APPLICATION
  9. 7 CONCLUSIONS
  10. ACKNOWLEDGEMENTS
  11. REFERENCES
  12. Supporting Information

We propose a new method for estimating dynamic panel data models with selection. The method uses backward substitution for the lagged dependent variable, which leads to an estimating equation that requires correcting for contemporaneous selection only. The estimator is valid under relatively weak assumptions about errors and permits avoiding the weak instruments problem associated with differencing. We also propose a simple test for selection bias that is based on the addition of a selection term to the first-difference equation and subsequent testing for significance of this term. The methods are applied to estimating dynamic earnings equations for women. Copyright © 2011 John Wiley & Sons, Ltd.

1 INTRODUCTION

Recently developed methods for estimating dynamic unobserved effects panel data models have become widely used in applied economics research. In the present paper, we contribute to this literature by developing a new estimation method for models in which the panel is unbalanced because of nonrandom selection.

In the absence of selection, the traditional approach to estimating dynamic panel data models is to remove the unobserved effect by first-differencing and then use instrumental variables methods for estimating the differenced equation. This approach was initially proposed by Anderson and Hsiao (1981) and was later considered within a more efficient generalized method of moments (GMM) framework by Holtz-Eakin et al. (1988), Arellano and Bond (1991), Ahn and Schmidt (1995), Blundell and Bond (1998), and others.

Several previous studies considered estimation of dynamic panel data models with selectivity.1 Arellano et al. (1999) consider autoregressive panel data models with sample selection. They model the conditional expectation of the unobserved effect as a linear function of the past values of the dependent variable and consider the distribution of the dependent variable conditional on its past. For each t, the resulting reduced-form equation is estimated on a subsample of data, which includes cross-section units without missing past values. Arellano et al. assume normality of the error terms in both primary and selection equations and use the inverse Mills ratio to account for the fact that only the subsamples with observed past values are used. The structural autoregressive coefficient is then recovered from the reduced-form coefficients using the restrictions imposed on parameters.

Another solution to the incidental truncation problem in dynamic panel data models was proposed by Kyriazidou (2001), who suggested taking differences between any two periods in which the selection index for the given unit is the same or ‘similar’. Under the assumption that the vector of errors is independent and identically distributed over time conditional on the exogenous variables, differencing eliminates both the unobserved effect and selection effect. For consistency, it is crucial that the assumptions of strict stationarity and conditional serial independence of the errors hold. Moreover, the estimator converges at a rate that is slower than the usual square root of the cross-section sample size.

Another semiparametric estimator was proposed by Gayle and Viauroux (2007), who consider a three-step sieve estimator. In the first step the selection probabilities in each period are estimated nonparametrically by a kernel estimator. In the second step the inverse probability function is linearized, the unobserved effect is removed by differencing, and the parameters in the linearized specification of the inverse probability function are estimated using a sieve minimum distance estimator (a GMM estimator with series used to approximate unknown functions). In the third step the GMM estimator is used to estimate the differenced primary equation augmented by the correction term, where the differenced correction term is again approximated by series estimators.

As seen from the discussion above, most earlier studies use differencing to remove the unobserved effect. While this is a natural estimation approach when estimating models with nonrandom attrition (Ziliak and Kniesner, 1998; Wooldridge, 2002), in dynamic panel data models with arbitrary selection patterns the use of first-differencing, or otherwise conditioning on observability of the dependent variable in multiple consecutive periods, implies that much of the data is lost. Moreover, as noted by Blundell and Bond (1998), differencing may also lead to a weak instruments problem. This problem arises when the series are highly persistent, which happens in a simple AR(1) model with the autoregressive coefficient close to unity.2

In this paper we consider an alternative method. One of the key assumptions is that the initial condition is observed for all cross-section units. To account for unobserved heterogeneity, rather than using differencing we follow Blundell and Bond (1998) and Chamberlain (1980, 1982, 1984), and model the conditional expectation of the unobserved effect as a linear function of the exogenous variables and initial condition. Then, backward substitution for the lagged dependent variable is used to obtain the equation that contains the lags of the exogenous explanatory variables (which are assumed to be always observed) and the initial condition, but no lags of the dependent variable. As a result, selection correction reduces to a contemporaneous selection problem of the type studied in Wooldridge (1995) with strictly exogenous variables. The ability to focus on selection period by period greatly simplifies the derivation of the correction term while allowing general serial correlation in the error of the selection equation. The simplest approach relies on the assumption that the error terms in the selection equation are normally distributed, but we also briefly discuss the possibility of semiparametric estimation. Once the correction term is obtained, the augmented equation can be consistently estimated by nonlinear least squares (NLS) or GMM.

The new estimation methods have several important advantages. Modeling the unobserved effects allows us to estimate the equation of interest in levels, thereby avoiding the weak instruments problem often associated with the estimators that use differencing. In the discussed context the error terms in both primary and selection equations may be heterogeneously distributed over time, and the error in the selection equation may be arbitrarily serially dependent. We also discuss how estimation can be modified, so that the observability of the initial condition is not required, and serial dependence in the error terms is permitted in both equations. Additionally, the approach proposed here makes use of all cross-section units observed at least once after the initial period, which helps to avoid losing data.

2 THE MODEL

Consider a dynamic panel data model with unobserved heterogeneity:

  • $y_{it} = \rho y_{i,t-1} + x_{it}\beta + c_{i1} + u_{it1}, \quad t = 1, \ldots, T$ (1)

where xit is a 1 × K vector of time-varying variables, β is a K × 1 vector of parameters, ρ is a scalar parameter, ci1 is a time-constant unobserved effect, and uit1 is an idiosyncratic error. Variables in xit are assumed to be strictly exogenous conditional on the unobserved effect, but may be correlated with ci1.

Selection occurs because of the partial observability of the dependent variable, yit. This is modeled by specifying a selection rule:

  • $s_{it} = 1[z_{it}\delta_{2t} + c_{i2} + u_{it2} > 0]$ (2)

where sit is a selection indicator that equals one if yit is observed and zero otherwise, ci2 is a time-constant unobserved effect, uit2 is an idiosyncratic error, zit is a 1 × L (L > K) vector of variables that are strictly exogenous conditional on the unobserved effect, and δ2t is an L × 1 vector of parameters. In what follows, it is assumed that zit contains all of the regressors from the primary equation and at least one additional time-varying variable. The additional variables may be factors that affect selection but not the dependent variable in the primary equation. Alternatively, if selection is partly determined by the lagged values of yit (as in some labor supply models, for example), vector zit would include lagged values of xit.

Given the selection problem, estimation of equation (1) by differencing is complicated for several reasons. First, we need to observe the dependent variable and explanatory variables in the current and previous periods. Because of the lagged dependent variable, we would only be able to use observations where yit is observed in three consecutive periods. Moreover, any selection correction term would involve conditioning on observability in three different periods, making its derivation and estimation difficult.

We can avoid these problems by substituting back for yi,t − 1 and expressing yit through the current and lagged values of the explanatory variables and the initial condition, yi0:

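  • $y_{it} = \rho^{t} y_{i0} + \sum_{j=0}^{t-1} \rho^{j} x_{i,t-j}\beta + c_{i1} \sum_{j=0}^{t-1} \rho^{j} + \sum_{j=0}^{t-1} \rho^{j} u_{i,t-j,1}$ (3)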
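The backward substitution can be checked numerically. The following is a minimal sketch, assuming a scalar regressor and made-up parameter values (the function names and numbers are illustrative and do not come from the paper). It iterates the model in (1) forward from yi0 and compares the result with the expression in terms of yi0, current and lagged x, the unobserved effect, and the accumulated errors.

```python
def y_recursive(rho, beta, c, y0, x, u):
    """Iterate y_t = rho*y_{t-1} + x_t*beta + c + u_t for t = 1..T and return y_T."""
    y = y0
    for x_t, u_t in zip(x, u):
        y = rho * y + x_t * beta + c + u_t
    return y


def y_backward_substituted(rho, beta, c, y0, x, u):
    """Return y_T written through y_0 and the current and lagged x and u only."""
    t = len(x)
    total = rho ** t * y0
    for j in range(t):
        # x[t - 1 - j] holds x_{t-j}: list index 0 corresponds to period 1.
        total += rho ** j * (x[t - 1 - j] * beta + c + u[t - 1 - j])
    return total


# Hypothetical values chosen only for illustration.
x = [0.4, -1.2, 0.7, 2.0]
u = [0.1, -0.3, 0.05, 0.2]
for rho in (0.5, 1.0):
    a = y_recursive(rho, 1.5, 0.8, 2.0, x, u)
    b = y_backward_substituted(rho, 1.5, 0.8, 2.0, x, u)
    print(rho, abs(a - b) < 1e-9)
```

Writing the unobserved-effect term as a sum of powers of ρ, rather than as a ratio, keeps the expression well defined when ρ = 1.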

Denote zi ≡ (zi1, zi2, …, ziT). Given (3), the estimating equation can be derived under the following assumption.

Assumption 1.

  (i) yi0 and zi are always observed, while yit, t = 1,…,T, are observed only for sit = 1.
  (ii) E(uit1|xit, yi,t − 1, xi,t − 1, …, yi0, ci1) = 0, so that cov(uit1, uis1) = 0, for all s ≠ t.
  (iii) display math
  (iv) display math
  (v) display math
  (vi) display math
  (vii) display math

According to part (ii) of Assumption 1, the conditional mean in equation (1) is assumed to be dynamically complete, which is a rather standard assumption in the literature. This part of the assumption ensures that yi0 is exogenous with respect to the final error in (3). At the end of this section we discuss an alternative set of assumptions and the corresponding estimating equation, where the dynamic completeness assumption is dropped, so that {uit1} may be serially correlated.

Part (iv) of Assumption 1 uses Chamberlain's (1980, 1982, 1984) device to model the conditional mean of the unobserved effect, ci1, as a linear function of exogenous variables (see also Blundell and Bond, 1998). This approach was used by Wooldridge (2005) in the context of nonlinear dynamic panel data models with balanced panels. In general, zit may contain time-constant variables; of course, the leads and lags of such variables would not be included in the conditional mean of ci1. A non-zero correlation between the time-constant variables and ci1 implies that the effect of these variables cannot be distinguished from that of the unobserved heterogeneity. However, it may still be useful to include the time-invariant characteristics in zit because controlling for more variables can help to improve on the precision of the estimator.

Under Assumption 1, parts (i)–(iv), the primary equation can be written as

  • display math(4)

where inline image, are the new error terms, which will be serially correlated even though the initial idiosyncratic errors were not.

Equation (4) can be used to estimate the parameters when the panel is balanced.3 Estimating equation (4) by NLS or GMM can serve as an alternative to traditional estimators that combine first-differencing with instrumental variables methods. As mentioned in the Introduction, a GMM estimator that uses first-differenced data suffers from the weak instruments problem when the series are highly persistent. Specifically, for a sequentially exogenous variable ωit, such as a lagged dependent variable, we can write the data-generating process as ωit = ρωi,t − 1 + εit, where cov(εis, εit) = 0 for s ≠ t. In the extreme case, where ρ = 1, ∆ωit = εit, so that past values (ωi,t−1,…,ωi1) are not correlated with ∆ωit and hence, cannot be used as instruments. When ρ is close to one, the lagged values are correlated with ∆ωit, but the correlation is weak, which results in the weak instruments problem. It is important to note, however, that this problem arises only when the estimation method is GMM. Binder et al. (2005) proposed a quasi-maximum likelihood estimator that uses differencing to remove unobserved heterogeneity, but does not suffer from the weak instruments problem. Similarly, Hsiao et al. (2002) propose a transformed likelihood approach and show that their maximum likelihood estimator that uses differenced data performs better than the GMM estimator.
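The weakening of the instrument as ρ approaches one can be illustrated by simulation. The sketch below assumes a stationary AR(1) process with unit-variance innovations and no unobserved effect, which is a simplification of the setting in the text; the function names, sample size, and seed are illustrative. Under these assumptions the population correlation between ωi,t−1 and ∆ωit works out to −sqrt((1 − ρ)/2), which shrinks to zero as ρ tends to one.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def instrument_strength(rho, n=20000, seed=0):
    """Correlation between the lagged level and the first difference of an AR(1)."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(1.0 - rho ** 2)  # stationary std. dev., requires |rho| < 1
    lagged, diff = [], []
    for _ in range(n):
        w_prev = rng.gauss(0.0, sd)
        w_curr = rho * w_prev + rng.gauss(0.0, 1.0)
        lagged.append(w_prev)
        diff.append(w_curr - w_prev)
    return pearson(lagged, diff)


for rho in (0.5, 0.9, 0.99):
    print(rho, round(instrument_strength(rho), 3), round(-math.sqrt((1 - rho) / 2), 3))
```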

In equation (4), the weak instruments problem does not arise. Because all variables in (4) are in levels, all of them are exogenous under Assumption 1 parts (ii)–(iv) and hence are used as their own instruments. Although the estimator relies on time variation in the variables, the source of this variation does not matter. Even if ρ = 1, the parameters in (4) can be consistently estimated by NLS or GMM, as long as var(εit) ≠ 0. As is true for all panel data models with large N and fixed T, the autoregressive coefficient can be identified from the cross-sectional variation in the data.

In the context of an unbalanced panel, under Assumption 1 parts (v) and (vi), the selection equation can be written as

  • display math(5)
  • display math(6)

where Chamberlain's modeling device is used to model the distribution of the time-constant unobserved effect, ci2. Note that owing to the presence of the unobserved effect, the composite errors, vit2 = uit2 + ai2, t = 1,…, T, are necessarily serially correlated. Also, error variances are allowed to vary over time. The normality assumption is not crucial for estimating the selection equation. As long as vit2 is independent of (zi, yi0) and the appropriate regularity conditions hold, parameters in (5) can be consistently estimated using a semiparametric estimator (see, for example, Klein and Spady, 1993; Ichimura, 1993). However, as discussed below, the derivation of the selection correction term is substantially simplified if Assumption 1 part (vi) holds.

To correct for the selection bias, we consider a two-step estimator and use assumptions similar to those in the standard selection literature for the cross-sectional context; see, for example, Wooldridge (2002, Ch. 17). Specifically, from Assumption 1 part (vii) it follows that

  • display math(7)

where inline image and ht (·) is an unknown function.

From (7), it follows that for sit = 1, equation (4) can be written as

  • display math(8)

It is possible to estimate equation (8) semiparametrically. A semiparametric estimator would be appropriate if either the error distribution in the selection equation is not normal, or E(vit1|zi, yi0, vit2) is a nonlinear function of vit2, or both. However, it is also useful to consider a fully parametric approach that would lead to a simple estimation routine and would help to avoid computational difficulties typically associated with semiparametric methods. Therefore, in what follows we focus on the parametric case.

Under Assumption 1, parts (vi) and (vii), function ht is given by

  • display math(9)

where ϕ(·) and Φ(·) are the standard normal probability density function (pdf) and cumulative distribution function (cdf), respectively, and λ(·) is the inverse Mills ratio. Thus, with some abuse of notation, we can write the primary equation for the selected sample as

  • display math(10)

where inline image. Under Assumption 1, equation (10) is the final estimating equation that can be consistently estimated by NLS or GMM.
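For reference, the inverse Mills ratio for the selected sample is the standard ratio λ(a) = ϕ(a)/Φ(a). The short sketch below computes it with the Python standard library; the function name is illustrative, and the index passed to it stands in for whatever argument the selection equation supplies.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def inverse_mills_ratio(a):
    """Return phi(a) / Phi(a), the selection correction term for s = 1."""
    return STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)


for a in (-1.0, 0.0, 1.0):
    print(a, inverse_mills_ratio(a))
```

The ratio is positive and decreasing in its argument, so units with a high estimated probability of selection receive a small correction.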

As an alternative approach, one could treat the initial condition as an unobserved effect and model its conditional expectation as a linear function of exogenous variables, as suggested by Chamberlain (1984).4 In this case, the dynamic completeness of the conditional mean in equation (1) is not needed (and most likely will not hold), so that the idiosyncratic errors in (1) may be serially correlated. Formally, the set of assumptions can be summarized as follows.

Assumption 2.

  1. yi0 is not observed, zi is always observed, and yit, t = 1,…,T, are observed only for sit = 1.
    • display math
    • display math
    • display math
    • display math
    • display math
    • display math

Under Assumption 2, for sit = 1, the primary equation can be written as

  • display math(11)

where inline image. Similarly to (10), parameters in equation (11) can be consistently estimated by NLS or GMM, as discussed in the following two sections. Alternatively, one can estimate the reduced-form equation and then obtain structural coefficients, ρ and β, using nonlinear restrictions on parameters. In (11), it is possible to test the presence of the observed dynamics. If only the unobserved dynamics are present, the lags of the exogenous variables would not appear in equation (11), i.e. ρ would be zero.

Specifying the estimating equation as in (11) has the advantage of allowing serial correlation in the idiosyncratic errors in equation (1). However, it also requires that the model contain exogenous time-varying explanatory variables, and it ignores the dynamics that are due to unobserved factors not included in the model. In what follows, we focus on the approach where the initial condition appears in the conditioning set, and the conditional mean in (1) is assumed to be dynamically complete, so that uit1 are serially uncorrelated. We emphasize, however, that equation (11) can also be estimated using the proposed methods.

3 NLS ESTIMATION

A simple way to obtain a consistent estimator of the parameters in equation (10) is to replace λit2 with its consistent estimator and estimate the parameters in two steps. Under Assumption 1 parts (v) and (vi), equation (5) can be consistently estimated by probit after the error variance is normalized to equal unity. Since error variances may differ across time periods, it is most appropriate to estimate the selection equation separately for each time period. Denote the first-step estimators inline image and the first-step vector of regressors inline image. These can be used to obtain inline image and then inline image can be used instead of λit2 in equation (10).

Denote the 1 × [K + LT + T + 3] vector of the parameters θ ≡ (ρ, β, η1, ξ1, …, ξT, γ1, φ21, …, φ2T). Parameters in θ can be consistently estimated by pooled NLS on the selected sample.

Define the conditional expectation of yit:

  • display math(12)

where

  • display math(13)

The correction term, λit2, is not available, but it can be replaced by a consistent estimator mentioned above. In general, let inline image be a conditional expectation obtained using the estimators of the parameters in the selection equation. Then, the pooled NLS estimator of θ is the solution to the minimization problem

  • display math(14)

where one half is used as a multiplier for convenience. The first-order condition for this problem is

  • display math(15)

which can be solved for inline image using iterative procedures. As is standard in panel data models, for identification it is necessary that T ≥ 2.

In summary, if Assumption 1 holds, a consistent estimator of θ can be obtained from the following two-step procedure.

Procedure 1.

  1. For each t = 1,…,T, estimate separate probit models:

    • display math

    and compute the inverse Mills ratios, inline image.

  2. For sit = 1, estimate equation (10) with λit2 replaced by inline image by pooled NLS. Estimate the asymptotic variance (formulae are provided in the Supplement to the paper, available online as supporting information).
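The first step of Procedure 1 can be sketched as follows. This is a deliberately simplified illustration, not the paper's implementation: it assumes a single scalar selection regressor plus an intercept in each period (the paper's first step uses the full vector of regressors, including the initial condition), it fits each probit by Fisher scoring written out by hand, and it uses simulated data with made-up coefficients. All function names are hypothetical. The second step (pooled NLS on the selected sample) is not shown.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # pdf is phi, cdf is Phi


def probit_fit(z, s, iters=25):
    """Fisher scoring for P(s = 1 | z) = Phi(d0 + d1*z); returns (d0, d1)."""
    d0, d1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, si in zip(z, s):
            a = d0 + d1 * zi
            p = min(max(STD_NORMAL.cdf(a), 1e-10), 1.0 - 1e-10)
            f = STD_NORMAL.pdf(a)
            w = f / (p * (1.0 - p))
            g0 += w * (si - p)          # score, intercept
            g1 += w * (si - p) * zi     # score, slope
            h00 += w * f                # expected information matrix entries
            h01 += w * f * zi
            h11 += w * f * zi * zi
        det = h00 * h11 - h01 * h01
        d0 += (h11 * g0 - h01 * g1) / det
        d1 += (h00 * g1 - h01 * g0) / det
    return d0, d1


def first_step(z_by_period, s_by_period):
    """Separate probit per period; inverse Mills ratios only where s = 1."""
    out = []
    for z, s in zip(z_by_period, s_by_period):
        d0, d1 = probit_fit(z, s)
        lam = [STD_NORMAL.pdf(d0 + d1 * zi) / STD_NORMAL.cdf(d0 + d1 * zi)
               if si == 1 else None
               for zi, si in zip(z, s)]
        out.append(((d0, d1), lam))
    return out


def simulate(true_coefs, n=2000, seed=1):
    """Simulated selection data, one (intercept, slope) pair per period."""
    rng = random.Random(seed)
    zs, ss = [], []
    for d0, d1 in true_coefs:
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = [1 if d0 + d1 * zi + rng.gauss(0.0, 1.0) > 0 else 0 for zi in z]
        zs.append(z)
        ss.append(s)
    return zs, ss


true_coefs = [(0.5, 1.0), (0.0, 0.8)]  # hypothetical values for two periods
z_by_period, s_by_period = simulate(true_coefs)
results = first_step(z_by_period, s_by_period)
for t, ((d0, d1), _) in enumerate(results, start=1):
    print(f"t={t}: intercept={d0:.2f}, slope={d1:.2f}")
```

Fitting each period separately mirrors the point made in the text that error variances may differ across periods, so the normalized coefficients are period specific.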

From Procedure 1 it is apparent that one needs at least one additional exogenous variable in the selection equation (L > K). Although the inverse Mills ratio, inline image, is a nonlinear function of its argument, it is approximately linear on most of its range, which may lead to multicollinearity. Thus it is necessary to have at least one exclusion restriction in order to make the estimation convincing.
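The near-linearity of the inverse Mills ratio is easy to see numerically. The sketch below evaluates λ(a) = ϕ(a)/Φ(a) on an evenly spaced grid and reports its linear correlation with a; the grid range of −3 to 3 is an arbitrary choice made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_mills(a):
    """phi(a) / Phi(a) for the standard normal distribution."""
    return STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


grid = [i / 10 for i in range(-30, 31)]  # -3.0, -2.9, ..., 3.0
lam = [inverse_mills(a) for a in grid]
r = pearson(grid, lam)
print(f"corr(a, lambda(a)) on [-3, 3]: {r:.3f}")
```

A correlation this close to −1 means that, without an excluded variable moving the selection index independently of the primary-equation regressors, the correction term is hard to separate from a linear function of those regressors.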

Even though the resulting estimator is consistent, it is not efficient. From equations (3) and (4) it is seen that the error terms in (10) are serially correlated. Besides, the errors will be heteroskedastic because of selection. A nonlinear analog of the seemingly unrelated regressions estimator (see Wooldridge, 2002, Problem 12.7) cannot be used in this context because selection is not strictly exogenous in the estimating equation (10). However, one can improve efficiency by using a GMM estimator, as discussed in the next section.

4 GMM ESTIMATION

The efficiency of the two-step estimator can be improved by using GMM at the second step. Equation (10) is linear in regressors, but nonlinear in parameters, which results in overidentification and permits obtaining a more efficient estimator than pooled NLS.

To specify a GMM estimator, define a 1 × (LT + 3) vector of instruments inline image, t = 1,…,T, and a T × T (LT + 3) matrix of instruments inline image:

  • display math(16)

where 0 denotes a 1 × (LT + 3) vector of zeros.

Define a T × 1 vector inline image where

  • display math(17)

From equation (10) it follows that the following moment conditions are available:

  • display math(18)

Since the conditional expectation of yit is different in each time period, equation (18) implies T (LT + 3) moment conditions. Moreover, because inline image is nonlinear in θ, these conditions are not redundant and can be used to enhance efficiency.
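The degree of overidentification follows from the two counts given in the text: T(LT + 3) moment conditions against K + LT + T + 3 parameters in θ. The sketch below does the arithmetic for one set of dimensions; the values of T, L, and K are hypothetical and are not taken from the empirical application.

```python
def n_moments(T, L):
    """Moment conditions implied by (18): T blocks of LT + 3 instruments each."""
    return T * (L * T + 3)


def n_parameters(T, L, K):
    """Dimension of theta as stated in Section 3."""
    return K + L * T + T + 3


T, L, K = 3, 4, 3  # illustrative dimensions only
moments = n_moments(T, L)
params = n_parameters(T, L, K)
print(moments, params, moments - params)
```

Because the number of moments grows with T squared while the number of parameters grows linearly in T, the number of overidentifying restrictions rises quickly as the panel gets longer.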

The GMM estimator of θ is the solution to the minimization problem

  • display math(19)

where inline image is a consistent estimator of a T(LT + 3) × T(LT + 3) positive semidefinite weighting matrix Ω⁻¹. The first-order condition for this problem is given by

  • display math(20)

Then, θ can be consistently estimated using a procedure similar to Procedure 1, where the GMM estimator is used instead of the pooled NLS estimator.

Note that the pooled NLS estimator is identical to a GMM estimator that exploits the moment conditions

  • display math(21)

and uses the weighting matrix

  • display math(22)

Thus, in the NLS estimation, the instruments are ‘stacked’ on top of each other, and each time period receives an equal weight. In contrast, a general GMM estimator that uses a block-diagonal matrix of instruments, as in equation (16), assigns different weights to each time period, which can be used to improve efficiency. In the discussion below, ‘the GMM estimator’ refers to the solution to the minimization problem (19).

The proposed GMM estimator will be consistent for any positive definite matrix Ω; however, a particular form is preferred. Specifically, we formulate an additional assumption, as follows.

Assumption 3.

  1. Λ is the asymptotic variance of inline image
  2. Ω = Λ.
    • display math

The online supplement to the paper provides a formula for Ω̂ that satisfies Assumption 3. Following a standard argument for the relative efficiency of the GMM estimator, the GMM estimator that employs weighting matrix Ω̂ as specified in Assumption 3 is asymptotically more efficient than pooled NLS and results in a relatively simple expression for the asymptotic variance of θ̂. Specifically, denote G ≡ E[Wi(π)′ ∇θ gi(θ, π)]. If Ω satisfies Assumption 3, then the asymptotic variance of the described GMM estimator is

  • display math(23)

which can be estimated as inline image, using the formulae provided in the online supplement to the paper.

We can now summarize a two-step estimation procedure. Let Assumptions 1 and 3 hold. Then, an estimator of θ that is asymptotically more efficient than the estimator discussed in Section 3 can be obtained using the following procedure.

Procedure 2.

  1. For each t = 1,…,T, estimate separate probit models,

    • display math

    and compute the inverse Mills ratios, inline image.

  2. In equation (10), replace λit2 with inline image. For sit = 1, estimate the equation by GMM that uses moment conditions (18) and the weighting matrix that satisfies Assumption 3. Estimate the asymptotic variance (formulae are provided in the online supplement to the paper).

It is important to note that there are more moment conditions available in addition to those specified in equation (18). Equation (10) implies that eit1 is uncorrelated with any function of zi and yi0. Therefore any nonlinear functions of the exogenous variables and the initial condition should be valid instruments and can be used to obtain additional moment conditions.

The proposed two-step estimator can also be formulated as a joint GMM estimator of (θ, π). As suggested by Newey and McFadden (1994, Section 6.1), such an estimator can be obtained by ‘stacking’ the moment conditions from the two steps. The moment conditions from the second step are given in (18), while the first-order conditions from the first-step estimation generate the additional moment conditions:

  • display math(24)

The conditions in (18) and (24) can be used to form a vector of moment conditions for the joint GMM estimation. In that way the additional conditions can be used for estimating θ, which can help to improve efficiency. However, since the first-step equations are exactly identified, the efficiency gain may be modest or even not present at all. Moreover, the two-step GMM estimator appears to be computationally more tractable than the joint GMM estimator in applications where the number of the first-step moment conditions is large, for example, due to T being relatively large.

To study the properties of the proposed estimators in finite samples we performed Monte Carlo experiments.5 In the experiments, among the three estimators that account for the selection bias (two-step NLS, two-step GMM, and joint GMM that uses the moment conditions for both equations) the two-step NLS estimator has the smallest standard deviations and root mean square errors (RMSEs) in small samples (N = 200), which is likely due to the fact that the GMM estimators use estimated weighting matrices, Ω̂, that cannot be precisely estimated in small samples. However, in large samples (N = 4000) both GMM estimators are more efficient than the two-step NLS estimator. The joint GMM estimator tends to have slightly smaller standard deviations and RMSEs than the two-step GMM estimator, but the differences are minor and virtually disappear when N is large (N = 4000).

The two-step NLS, two-step GMM and joint GMM estimators also perform reasonably well when testing simple hypotheses concerning parameters. Although for all three estimators the true null is rejected too often in small samples (with the over-rejection being most severe for the two-step GMM estimator), the computed size gets closer to the nominal size as N grows. Both the two-step GMM and joint GMM estimators outperform the two-step NLS estimator in terms of the power of the tests.

5 TESTING FOR SELECTION BIAS

It is possible to test for selection bias by testing the hypothesis H0: φ2t = 0 in equation (10). A variety of tests for GMM estimators described in Newey and McFadden (1994, Section 9) can be used for this purpose. However, such tests require estimation of either the restricted or unrestricted model, or both, prior to testing. Since estimation of equation (10) may be computationally costly due to nonlinearity in the parameters, it is useful to have a simple alternative.

A simple test can be developed based on the initial linear model (1). To construct a test, introduce a new selection indicator which identifies observability of yit in three consecutive periods, and nominally assume that this new indicator follows an index model with unobserved heterogeneity:

  • display math(25)

where ci3 is the unobserved effect and uit3 is the idiosyncratic error. Moreover, (nominally) assume that uit3 is normally distributed and independent of the explanatory variables and unobserved effect:

  • display math(26)

Using Chamberlain's approach and assuming normality, write the unobserved effect as

  • display math(27)

Combining (25), (26), and (27) together gives

  • display math(28)

where vit3 ≡ ai3 + uit3 is a new composite error term. With regard to the error terms in the primary equation, assume

  • display math(29)

which, when combined with the normality assumption, gives

  • display math(30)

After applying first differencing to equation (1), with some abuse of notation we can write the differenced primary equation for dit = 1 as

  • display math(31)

Thus the unobserved effect is removed by first differencing and φ3t λit3 captures the selection effect. Naturally, time-constant variables drop out from the equation. The test is then performed using the following procedure.

Procedure 3.

  1. For each of t = 3,…,T, run a probit regression

    • display math

    and compute the inverse Mills ratios, inline image.

  2. For dit = 1, augment the first-differenced primary equation by inline image and its interactions with time dummies and estimate the augmented equation by pooled two-stage least squares or GMM using yi,t − 2 and leads and lags of zit as instruments for ∆yi,t − 1 (∆xit, inline image and the interaction terms should be used as their own instruments). Use the Wald test to test the hypothesis φ31 = … = φ3T = 0.
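The indicator used in Procedure 3 can be built directly from the period-by-period selection indicators. The sketch below assumes that "three consecutive periods" means periods t − 2, t − 1, and t, so that dit = sit · si,t−1 · si,t−2 for t = 3,…,T; the function name and the example selection history are illustrative.

```python
def three_period_indicator(s):
    """Map t -> d_it for t = 3..T, where s[0] is s_i1 and d_it = s_it*s_i,t-1*s_i,t-2."""
    T = len(s)
    return {t: s[t - 1] * s[t - 2] * s[t - 3] for t in range(3, T + 1)}


# One hypothetical unit: observed in periods 1-3 and 5-6, missing in period 4.
s_i = [1, 1, 1, 0, 1, 1]
d_i = three_period_indicator(s_i)
print(d_i)
print(sum(s_i), "periods observed,", sum(d_i.values()), "usable after differencing")
```

A single missing period removes three values of dit, which illustrates why conditioning on observability in consecutive periods discards much of the data.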

As an extension to the proposed procedure, it is possible to impose a restriction of equal variances in the selection equation and estimate equation (28) by pooled probit. Similarly, one may assume that the effect of selection is the same in all time periods and omit the interaction terms in the second-step estimation. A test for selection bias in that case is a usual t-test of the significance of the coefficient on inline image. Note that the usual variance–covariance matrix should be used for testing; there is no need to adjust for the first-step estimation.

If, in some period, t − j (for j = 3,…,t − 1), yi,t − j is observed for all cross-section units, then yi,t − j can be used as an additional instrument in the second-step estimation. Otherwise, if there are missing values for at least some i, then the observable variable is (si,t − j·yi,t − j), and this is not a valid instrument, since we did not account for selection in period t − j when constructing inline image.

Importantly, the proposed test is valid regardless of whether or not the model in (25) is correct and whether or not the normality assumption holds. All we need for testing is a reasonable proxy for the selection effect, and the correct specification of the selection term is not essential. If a selection problem is present, it is likely to be captured by a non-zero coefficient on the inverse Mills ratio in the differenced equation. As with the estimators discussed above, having additional variables in zit that are not also in xit helps to make the test more reliable.

When the hypothesis of no selection bias is not rejected, the pooled two-stage least squares or GMM estimation of the first-differenced equation with ∆xit, yi,t − 2, and leads and lags of strictly exogenous variables used as instruments will produce consistent estimators. More distant lags can be used as additional instruments if observed for all cross-section units. However, if the null is rejected, Procedure 3 will be a valid correction procedure only if all the assumptions specified in this section are correct. Given that the model for dit in equation (25) is quite restrictive, Procedure 3 is unlikely to perform well as a correction method. Therefore, the methodology described in the previous two sections should be used instead.

6 EMPIRICAL APPLICATION

This section illustrates the proposed methodology with an empirical example by applying the new methods to the estimation of dynamic earnings equations for females. This example is appropriate because earnings are largely determined by different historical factors and tend to be correlated over time.

The data come from the Panel Study of Income Dynamics (PSID), years 1980–1992. The sample consists of white females who were followed over the considered period.6 Because estimating equation (10) requires that the initial condition be observed, we keep only those females for whom 1980 earnings are available. The final sample consists of 579 women, or 6948 observations over the 12-year period (1981–1992). For this period, the earnings sample comprises 5891 observations. Thus about 15% of earnings data are missing due to non-participation.
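The sample counts quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the figures are those reported in the text.

```python
n_women = 579
n_periods = 1992 - 1981 + 1            # 12 annual observations per woman
n_person_years = n_women * n_periods   # full panel, participants or not
n_earnings_obs = 5891                  # person-years with observed earnings

# Share of person-years with no earnings observed (non-participation).
missing_share = (n_person_years - n_earnings_obs) / n_person_years
print(n_person_years, round(100 * missing_share, 1))  # prints: 6948 15.2
```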

Because we define the population as women working in 1980, this exercise should be viewed as an evaluation of the effects of movement in and out of the labor force on estimated earnings equations. Such a question is of considerable interest in labor economics.

The dependent variable in the primary equation is the natural logarithm of the average annual hourly earnings, while the independent variables include age, age squared and time dummies. We assume that age is strictly exogenous and is not correlated with the unobserved effect. This assumption implies that the mean ability of women born in different years is about the same. Our sample is restricted to women who have completed their education (i.e. years of schooling do not vary over time); hence the effect of education is not separable from unobserved heterogeneity. Therefore we only include education as part of the unobserved effect. Additionally, to control for unobserved heterogeneity, we include the number of children in all time periods (i.e. the number of children is assumed to belong to z_it, but not x_it).

The selection rule is for labor force participation. A woman is considered to be a participant if she reports positive work hours in a given year. When estimating the selection equations, the probit regression for each time period includes education, age, age squared, and the number of children in all time periods, where the number of children may have a direct effect on labor force participation. Log of hourly earnings in 1980 is included depending on whether the methodology of Sections 2–4 or the methodology of Section 5 is used for the analysis.

Before applying the more advanced methods developed in Sections 2–4, we first estimate equation (1) using the simple approach of Section 5. From the total 1980–1992 sample we keep observations for which earnings data are available in three consecutive periods and use first-differencing to remove the unobserved effect. As a result, the sample size falls to 5033 observations, and age and education drop out of the equation. We then estimate the first-differenced equation by pooled instrumental variables, using the log of hourly earnings in t − 2 as an instrument for Δy_{i,t−1}. We call this estimator the first difference instrumental variables (FD-IV) estimator.
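As a rough illustration of the FD-IV step, the sketch below simulates a balanced AR(1) panel with an unobserved effect and no selection, then applies the just-identified pooled IV estimator with y_{i,t−2} as the instrument for Δy_{i,t−1}. The function names, parameter values, and simulated data are assumptions made for illustration. The regressions in the paper also include time dummies and age squared, which are omitted here.

```python
import random


def simulate_panel(n_units, n_periods, rho, seed=0):
    """Simulate y_it = rho * y_i,t-1 + c_i + u_it on a balanced panel."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        c = rng.gauss(0.0, 0.5)  # unobserved effect c_i
        # Draw the initial value from the stationary distribution given c_i.
        y = c / (1.0 - rho) + rng.gauss(0.0, 1.0) / (1.0 - rho**2) ** 0.5
        series = [y]
        for _ in range(n_periods - 1):
            y = rho * y + c + rng.gauss(0.0, 1.0)
            series.append(y)
        panel.append(series)
    return panel


def fd_iv(panel):
    """Pooled IV estimate of rho in the first-differenced equation.

    Differencing removes c_i; y_{t-2} instruments the lagged difference,
    which is correlated with the differenced error.
    """
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy_t = y[t] - y[t - 1]
            dy_lag = y[t - 1] - y[t - 2]
            num += y[t - 2] * dy_t
            den += y[t - 2] * dy_lag
    return num / den


panel = simulate_panel(n_units=5000, n_periods=7, rho=0.5, seed=1)
print(f"FD-IV estimate of rho: {fd_iv(panel):.3f}")
```

With a highly persistent series (rho close to one) the denominator in `fd_iv` becomes small relative to its sampling noise, which is the weak instruments problem the paper's alternative estimators are designed to avoid.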

The estimates for the log earnings equations are reported in Table 1. The first column of the table contains the estimates from FD-IV regressions without inverse Mills ratios. The second column contains the test of selection bias in the first-differenced equation using the results in Section 5. The estimate of ρ is rather similar in the two columns; it is about 0.15–0.17 and is statistically significant at the 1% level. However, the test suggests that selection bias may be present. The null of no selection is rejected at the 7% significance level. Thus one might conclude from the test using the FD equation that selection into the work force may be systematically related to idiosyncratic shocks to earnings.
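The significance statements for ρ can be checked from the reported estimates and robust standard errors. The sketch below assumes a standard normal approximation for the test statistic; the helper name is hypothetical, and the inputs are the rounded values from columns (1) and (2) of Table 1.

```python
from statistics import NormalDist


def two_sided_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, normal approximation."""
    z = abs(estimate / std_error)
    return 2.0 * (1.0 - NormalDist().cdf(z))


# (label, estimate of rho, robust standard error) as reported in Table 1.
reported = (
    ("FD-IV without inverse Mills ratios", 0.153, 0.049),
    ("FD-IV with inverse Mills ratios", 0.172, 0.047),
)
for label, est, se in reported:
    print(f"{label}: z = {est / se:.2f}, p = {two_sided_p_value(est, se):.4f}")
```

Both p-values fall below 0.01, in line with the three asterisks attached to these estimates.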

Table 1. Estimates for the dynamic log(hourly earnings) equation

| | (1) FD-IV, no IMR | (2) FD-IV with IMR | (3) NLS, no IMR | (4) NLS with IMR | (5) GMM with IMR |
|---|---|---|---|---|---|
| Lagged log of hourly earnings | 0.153*** (0.049) | 0.172*** (0.047) | 0.576*** (0.056) | 0.586*** (0.056) | 0.574*** (0.040) |
| Education | | | 0.033*** (0.005) | 0.032*** (0.005) | 0.029*** (0.004) |
| Age | | | 0.013** (0.006) | 0.012** (0.007) | 0.009** (0.004) |
| Age squared | −0.0002 (0.0002) | −0.0001 (0.0002) | −0.00018** (0.00007) | −0.00016* (0.00009) | −0.00013*** (0.000046) |
| Wald test of joint significance of the inverse Mills ratios | | … (0.065) | | … (0.036) | … (0.000) |

Note: IMR denotes the inverse Mills ratios. Time-specific intercept is used in all regressions. FD-IV means the first difference instrumental variables estimation, where the log of hourly earnings in t − 2 is used as an instrument for the differenced log of hourly earnings in t − 1. Standard errors robust to serial correlation and heteroskedasticity are in parentheses next to the coefficient estimates; robust p-values are in parentheses next to the test statistics. Standard errors in the NLS regression with IMR are corrected for the first-step estimation.

Asterisks indicate significance at the *10% level; **5% level; ***1% level.

The estimates obtained using the methods discussed in Sections 2–4 are reported in the remaining three columns of Table 1. Columns (3) and (4) show estimation results from regressions where the NLS estimator is used at the second step. Column (5) contains the estimates obtained using Procedure 2, which employs GMM at the second step. The estimates for the augmented log earnings equation are reported in columns (4) and (5). Based on the Wald tests of the joint significance of the selection terms, the hypothesis of no selection bias is rejected at the 5% level in both cases. Thus we again find evidence of selection bias.

The NLS and GMM estimates of ρ are very similar in all three regressions. The estimate is about 0.6 and is significant at the 1% level, which provides evidence of state dependence in earnings offers. This estimate is rather different from the one obtained using first-differencing. Interestingly, similar results were obtained in Monte Carlo simulations, where the FD-IV estimator had substantially larger biases than the NLS estimator that did not account for selection. For all coefficient estimates, standard errors are smaller when the GMM estimator is used at the second step.

Columns (3)–(5) show an estimated effect of another year of schooling of about 3%, which is statistically significant at the 1% level. We emphasize, however, that this effect is not distinguishable from unobserved heterogeneity. Moreover, the coefficient on years of schooling in these regressions is not a true return to education because education has an additional effect on earnings through the autoregressive earnings term.
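To illustrate the additional effect that operates through the autoregressive term, the sketch below computes a steady-state multiplier, β/(1 − ρ), from the rounded estimates in Table 1. This assumes a stable AR(1) process and a permanent unit change in the regressor; the function name is hypothetical. The caveat in the text still applies: the education coefficient is not distinguishable from unobserved heterogeneity, so these numbers are illustrative only.

```python
def long_run_effect(beta: float, rho: float) -> float:
    """Steady-state effect of a permanent unit change: beta / (1 - rho)."""
    if not abs(rho) < 1.0:
        raise ValueError("rho must lie strictly inside (-1, 1)")
    return beta / (1.0 - rho)


# (column, education coefficient, rho) as reported in Table 1.
for column, beta, rho in ((3, 0.033, 0.576), (4, 0.032, 0.586), (5, 0.029, 0.574)):
    print(f"column ({column}): {long_run_effect(beta, rho):.3f}")
```

With ρ near 0.6, the implied steady-state figure is roughly two and a half times the contemporaneous coefficient.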

The coefficients on age and age squared reveal the usual inverted U-shaped (concave) profile, with a positive coefficient on age and a negative coefficient on age squared, although the corresponding estimates are less precise, particularly in the NLS regressions.
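With a positive coefficient on age and a negative coefficient on age squared, the contemporaneous age terms are maximized at −b_age/(2·b_age_sq). The sketch below applies this to the rounded estimates in columns (3) to (5) of Table 1. The function name is hypothetical, and because the inputs are rounded and the dynamics through ρ are ignored, the resulting ages are only approximate.

```python
def peak_age(b_age: float, b_age_sq: float) -> float:
    """Age at which b_age * age + b_age_sq * age**2 reaches its maximum."""
    if b_age_sq >= 0:
        raise ValueError("profile is not concave; no interior maximum")
    return -b_age / (2.0 * b_age_sq)


# (column, coefficient on age, coefficient on age squared) from Table 1.
for column, b1, b2 in ((3, 0.013, -0.00018), (4, 0.012, -0.00016), (5, 0.009, -0.00013)):
    print(f"column ({column}): peak at about {peak_age(b1, b2):.1f} years")
```

All three implied peaks fall in the mid-thirties, inside the 18–65 age range of the sample.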

As a robustness check, we re-estimated the earnings equation using the data from years 1981–1992. The sample was restricted to include only women who reported earnings in 1981 (583 women).7 The resulting coefficient estimates and standard errors were very similar to those reported in Table 1. The only noticeable change was observed for the two-step GMM estimates of the coefficients on age and age squared, which became somewhat smaller and statistically insignificant. Based on the results of the joint Wald tests, the null of no selection bias could not be rejected; however, several selection correction terms were individually significant. Specifically, in the FD-IV regression the inverse Mills ratios for years 1984, 1985, and 1991 were significant at the 5% level. The correction term for year 1991 was also significant at the 5% level in the two-step NLS and two-step GMM regressions. The table with detailed estimation results is available from the authors upon request.

Returning to the discussion of the estimating equation in Section 2, we note that one could also estimate the parameters using equation (11). In such a case, identification would rely on time variation in the strictly exogenous variables, age and age squared. Moreover, the autoregressive coefficient, ρ, would only capture the observed dynamics. In applications where there are no time-varying strictly exogenous variables in the model (i.e. x_it is empty), the data would not provide a distinction between the observed and unobserved dynamics.8

7 CONCLUSIONS

In this paper, we proposed new methods for estimating dynamic panel data models with selectivity. A distinctive feature of the new estimators is that they do not rely on differencing when treating unobserved heterogeneity. This feature allows one to avoid the weak instruments problem, which arises in the context of differencing if the series are highly persistent or close to a unit root. The proposed correction is relatively simple because the method requires correcting for selection in the current period only. The errors in both the selection and primary equations may be heterogeneously distributed. The errors in the selection equation may also be serially dependent, and a general form of heteroskedasticity is allowed in the primary equation. Additionally, this paper develops a simple test for sample selection bias.

The proposed methods are applied to the estimation of dynamic earnings equations for females using the PSID data. Evidence of selection bias is found in both the first-differenced equation and the equation obtained after back-substitution. The NLS and GMM estimation based on the new methodology produces an estimate of the stability parameter of about 0.6, which is rather different from the one obtained from the instrumental variables estimation of the first-differenced equation.

The proposed correction procedure is parametric and assumes normality of the errors in the selection equation. An important topic for future research is developing a semiparametric estimator, which would not require parametric assumptions regarding the error distributions. Such an estimator can be implemented within the framework of this paper using methods similar to those considered in Semykina and Wooldridge (2010).

  • 1

    Dynamic panel data models with censoring are considered, for example, by Honore and Hu (2004), Hu (2002), and Labeaga (1999). See also Bover and Arellano (1997).

  • 2

    Binder et al. (2005) show that the same problem arises in panel vector autoregressive models.

  • 3

    We thank an anonymous referee for bringing this fact to our attention. The referee also noted that an interesting question is whether our approach is less efficient than the Blundell and Bond (1998) approach. This is difficult to say, as the two approaches make different assumptions about the initial condition.

  • 4

    We thank an anonymous referee for suggesting that we consider this approach.

  • 5

    A detailed description of the experiments and all results are summarized in the online supplement to the paper.

  • 6

    We consider working-age women (ages 18–65) who were either household heads or ‘wives’, have completed their education and are neither self-employed nor agricultural workers. The woman was excluded from the analysis if her self-reported age exceeded the age constructed using information on the year of birth by more than 2 years or self-reported age was smaller than the constructed age by more than 1 year, or if the woman reported positive work hours and zero earnings.

  • 7

    The cross-section sample size increased because more women were working in 1981 than in 1980.

  • 8

    We thank an anonymous referee for suggesting that we include the discussion of this issue.

REFERENCES

  • Ahn SC, Schmidt P. 1995. Efficient estimation of models for dynamic panel data. Journal of Econometrics 68: 5–27.
  • Anderson TW, Hsiao C. 1981. Estimation of dynamic models with error components. Journal of the American Statistical Association 76: 598–606.
  • Arellano M, Bond SR. 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies 58: 277–297.
  • Arellano M, Bover O, Labeaga JM. 1999. Autoregressive models with sample selectivity for panel data. In Analysis of Panels and Limited Dependent Variable Models, in honour of Maddala G. S., Hsiao C, Lahiri K, Lee L, Pesaran MH (eds.). Cambridge University Press: Cambridge, UK, 23–48.
  • Binder M, Hsiao C, Pesaran MH. 2005. Estimation and inference in short panel vector autoregressions with unit roots and cointegration. Econometric Theory 21: 795–837.
  • Blundell RW, Bond SR. 1998. Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87: 115–143.
  • Bover O, Arellano M. 1997. Estimating dynamic limited dependent variable models from panel data. Investigaciones Economicas 21: 141–165.
  • Chamberlain G. 1980. Analysis with qualitative data. The Review of Economic Studies 47: 225–238.
  • Chamberlain G. 1982. Multivariate regression models for panel data. Journal of Econometrics 18: 5–46.
  • Chamberlain G. 1984. Panel data. In Handbook of Econometrics, Vol. 2, Griliches Z, Intriligator MD (eds.). North-Holland: Amsterdam, 1248–1318.
  • Gayle GL, Viauroux C. 2007. Root-N consistent semiparametric estimators of a dynamic panel-sample-selection model. Journal of Econometrics 141: 179–212.
  • Holtz-Eakin D, Newey WK, Rosen HS. 1988. Estimating vector autoregressions with panel data. Econometrica 56: 1371–1395.
  • Honore BE, Hu L. 2004. Estimation of cross sectional and panel data censored regression models with endogeneity. Journal of Econometrics 122: 293–316.
  • Hsiao C, Pesaran MH, Tahmiscioglu AK. 2002. Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics 109: 107–150.
  • Hu L. 2002. Estimation of a censored dynamic panel data model. Econometrica 70: 2499–2517.
  • Ichimura H. 1993. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics 58: 71–120.
  • Klein RL, Spady RH. 1993. An efficient semiparametric estimator for binary response models. Econometrica 61: 387–421.
  • Kyriazidou E. 2001. Estimation of dynamic panel data sample selection models. The Review of Economic Studies 68: 543–572.
  • Labeaga JM. 1999. A double-hurdle rational addiction model with heterogeneity: estimating the demand for tobacco. Journal of Econometrics 93: 49–72.
  • Newey WK, McFadden D. 1994. Large sample estimation and hypothesis testing. In Handbook of Econometrics, Vol. 4, Engle RF, McFadden D (eds.). North-Holland: Amsterdam, 2111–2245.
  • Semykina A, Wooldridge JM. 2010. Estimating panel data models in the presence of endogeneity and selection. Journal of Econometrics 157: 375–380.
  • Wooldridge JM. 1995. Selection corrections for panel data models under conditional mean independence assumptions. Journal of Econometrics 68: 115–132.
  • Wooldridge JM. 2002. Econometric Analysis of Cross Section and Panel Data. MIT: Cambridge, MA.
  • Wooldridge JM. 2005. Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. Journal of Applied Econometrics 20: 39–54.
  • Ziliak JP, Kniesner TJ. 1998. The importance of sample attrition in life cycle labor supply estimation. Journal of Human Resources 22: 507–530.

Supporting Information

The JAE Data Archive directory is available at http://qed.econ.queensu.ca/jae/datasets/semykina001/

| Filename | Format | Size | Description |
|---|---|---|---|
| jae_1266_sm_appendix.pdf | PDF document | 176K | Supporting Information |
| jae_1266_sm_appendix.tex | application/unknown | 29K | Supporting Information |
