TESTING FOR CORRELATION IN ERROR-COMPONENT MODELS

This paper concerns linear models for grouped data with group-specific effects. We construct a portmanteau test for the null of no within-group correlation beyond that induced by the group-specific effect. The approach allows for heteroskedasticity and is applicable to models with exogenous, predetermined, or endogenous regressors. The test can be implemented as soon as three observations per group are available and is applicable to unbalanced data. A test with such general applicability is not available elsewhere. We provide theoretical results on size and power under asymptotics where the number of groups grows but their size is held fixed. Extensive power comparisons with other tests available in the literature for special cases of our setup reveal that our test compares favorably. In a simulation study we find that, under heteroskedasticity, only our procedure yields a test that is both size-correct and powerful. In a large data set on mothers with multiple births we find that infant birthweight is correlated across children even after controlling for mother fixed effects and a variety of prenatal care factors. This suggests that such a strategy may be inadequate to take care of all confounding factors that correlate with the mother’s decision to engage in activities that are detrimental to the infant’s health.


Introduction
The standard linear model for stratified observations on many small independent groups is $y_{g,i} = x_{g,i}'\beta + \eta_{g,i}$, $g = 1, \ldots, n$, $i = 1, \ldots, m$.
Although we do not make it explicit in the notation, everything to follow extends readily to unbalanced data; a discussion of this follows. The errors are likely to be correlated within groups. A standard approach (Moulton, 1986) is to model such dependence through the error-component form $\eta_{g,i} = \alpha_g + \varepsilon_{g,i}$, where $\alpha_g$ is a group-specific effect and the errors $\varepsilon_{g,i}$ are assumed uncorrelated within each group. We will allow for arbitrary dependence between the $\alpha_g$ and the $x_{g,i}$ within the groups. This formulation restricts the pairwise within-group covariances to be constant, a restriction that is seldom tested.
There are several tests for the presence of a group effect (Moulton and Randolph 1989, Akritas and Arnold 2000, Akritas and Papadatos 2004, Orme and Yamagata 2006) as well as direct estimators of the variance of group effects (Kline, Saggio and Sølvsten 2018) that are interesting in their own right (for example, in the teacher value-added literature; see Hanushek and Rivkin 2010) and can be used to construct within-between variance decompositions that are standard in the random-effect model. These procedures all break down when the $\varepsilon_{g,i}$ are correlated within groups, however. Likewise, the validity of many estimators of $\beta$ hinges on the absence of serial correlation; examples include the popular instrumental-variable estimators of Anderson and Hsiao (1981), Holtz-Eakin, Newey and Rosen (1988), and Arellano and Bond (1991). Furthermore, even if the estimator of $\beta$ is robust to serial correlation, the use of cluster-robust standard errors (Liang and Zeger 1986, Arellano 1987) for inference can lead to substantial power loss and much inflated confidence regions when the errors are, in fact, uncorrelated; see Wooldridge (2003) and Stock and Watson (2008) for discussion and numerical illustrations and Berk, Brown, Buja, Zhang and Zhao (2013) and McCloskey (2017) for inference after model selection of this kind. Of course, the presence of correlation in the errors may also be a reflection of model misspecification that is of interest to detect, as in the time-series literature (using, say, the test of Ljung and Box 1978).
In this paper we develop a test of the null of no within-group correlation beyond that induced by the group-specific effect that has good sampling properties in settings where $n$ is large but $m$ is small. This fixed-$m$ framework is the suitable asymptotic paradigm for micro data and is complicated by the inability to estimate the group-specific effects well. This is a manifestation of the incidental-parameter problem (Neyman and Scott, 1948) that causes the standard correlation tests from the time-series literature to be inapplicable here. Using a portmanteau test is of interest if no strong stand can be taken on the particular form of correlation that should serve as the alternative. This is relevant in many applications, especially when the observations for a given group do not have a natural ordering (such as time, for example).
The test statistic we construct uses (estimators of) all linearly-independent differences between pairwise within-group covariances. Linear combinations of a subset of the moment conditions underlying our test statistic yield the test statistics of Arellano and Bond (1991), which can be used to test against non-zero $q$th-order autocorrelation in the first-differenced errors, as well as the joint test for correlation at multiple lags as discussed in Yamagata (2008). Because first-differencing introduces first-order autocorrelation also under the null, such a test can only be constructed for $q \geq 2$. Furthermore, at least $q + 2$ observations per group are needed to construct a meaningful test for $q$th-order autocorrelation. Hence, a four-wave panel is needed to construct the statistic for $q = 2$, and a five-wave panel is needed for a joint test. In contrast, our test can be applied as soon as three observations per group are available. Inoue and Solon (2006) proposed a portmanteau test for our null under the additional assumptions that the covariates are strictly exogenous and the errors are homoskedastic. Their test statistic depends on a regularization parameter that can severely affect power.
The available tests against specific alternatives (still in the context of static models and under the maintained assumption of homoskedasticity) are discussed in Born and Breitung (2016). The approach proposed here allows for heteroskedasticity of arbitrary form and only requires the estimator of β used to be asymptotically linear. against the first-order moving-average and autoregressive alternatives for which it was designed (see the simulations also below).
We provide asymptotic power calculations for three-wave and four-wave data. They give further insight into the behavior of our test and are subsequently used to compare its power to the regression-based test of Wooldridge (2002), the portmanteau test of Inoue and Solon (2006), and the $m_2$-statistic of Arellano and Bond (1991).

Testing in the one-way analysis of variance model. We initially consider the model
$$\eta_{g,i} = \alpha_g + \varepsilon_{g,i}, \tag{2.1}$$
where $\eta_{g,i}$ is directly observable, as in Cox and Solomon (1988), for example. Later we will replace $\eta_{g,i}$ by a suitable estimator. In (2.1), $\alpha_g$ represents a group-specific unobserved effect while $\varepsilon_{g,i}$ is a latent idiosyncratic error that varies both across and within groups.
The standard error-component formulation assumes that all variables are independent and identically distributed, both across and within groups (as in Arellano 2003, Chapter 3). We will maintain this assumption across groups but will only impose $E(\varepsilon_{g,i} \mid \alpha_g) = 0$ for each group.¹ Our aim is to test the (composite) null hypothesis
$$E(\varepsilon_{g,i_1}\varepsilon_{g,i_2}) = 0 \quad \text{for all } i_1 \neq i_2, \tag{2.2}$$
which states that there is no within-group correlation beyond the correlation induced by the group-specific effect.
The presence of $\alpha_g$ implies that a test of (2.2) based on covariances of the levels of $\eta_{g,i}$ will not be suitable. However, when iterating expectations using $E(\varepsilon_{g,i} \mid \alpha_g) = 0$ we see that
$$E\big(\eta_{g,i_1}(\eta_{g,i_2} - \eta_{g,i_3})\big) = E(\varepsilon_{g,i_1}\varepsilon_{g,i_2}) - E(\varepsilon_{g,i_1}\varepsilon_{g,i_3}). \tag{2.3}$$
For any $i_1 \neq i_2 \neq i_3$ this is the difference between two covariances. There are $m(m-1)/2$ different covariances and, hence, $r := m(m-1)/2 - 1$ linearly-independent differences. These differences are all zero if and only if we have that $E(\varepsilon_{g,i_1}\varepsilon_{g,i_2}) = \delta$ for all $i_1 \neq i_2$ and some constant $\delta$. Aside from this, testing (2.2) is equivalent to testing the alternative null that all $r$ linearly-independent differences of the form in (2.3) are equal to zero. Lack of power against constant within-group covariance is unavoidable in the presence of group-specific effects and is shared by the other available tests, including those by Wooldridge (2002) and Drukker (2003) and by Inoue and Solon (2006).

¹ Random sampling at the group level can be relaxed. It suffices to assume that the $\eta_{g,i}$ are independent but not identically distributed across groups for our approach to go through, under suitable strengthening of the assumptions required for a law of large numbers and central limit theorem to apply. We refrain from such a sampling assumption here for ease of exposition.
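The count of linearly-independent covariance differences can be checked numerically. The sketch below is illustrative only: the function name is hypothetical, and it assumes the count is $m(m-1)/2 - 1$, which agrees with the two moment conditions tested for three-wave data and the five tested for four-wave data. It stacks every difference of the form in (2.3) as a row over the $m(m-1)/2$ distinct covariances and computes the rank.

```python
from itertools import combinations, permutations

import numpy as np


def covariance_difference_rank(m):
    """Rank of all differences c_{i1,i2} - c_{i1,i3} over distinct i1, i2, i3."""
    pairs = list(combinations(range(m), 2))       # unordered index pairs (i, j), i < j
    column = {p: k for k, p in enumerate(pairs)}  # one column per distinct covariance
    rows = []
    for i1, i2, i3 in permutations(range(m), 3):
        row = np.zeros(len(pairs))
        row[column[tuple(sorted((i1, i2)))]] += 1.0
        row[column[tuple(sorted((i1, i3)))]] -= 1.0
        rows.append(row)
    return int(np.linalg.matrix_rank(np.array(rows)))


for m in (3, 4, 5):
    # Compare the numerical rank with m(m-1)/2 - 1.
    print(m, covariance_difference_rank(m), m * (m - 1) // 2 - 1)
```

The rank falls one short of the number of covariances because adding a common constant to every covariance leaves all differences unchanged, which is the lack of power against constant within-group covariance noted in the text.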
A convenient way to re-write the null that all the differences between covariances are zero is as follows.² Introduce the $(m-1) \times r$ matrix $H_g$, built from the entries of $\eta_g$, and collect all errors for a given group in the vector $\eta_g := (\eta_{g,1}, \ldots, \eta_{g,m})'$. Let $D$ denote the $(m-1) \times m$ first-difference operator, so $D\eta_g = (\Delta\eta_{g,2}, \ldots, \Delta\eta_{g,m})'$, where $\Delta\eta_{g,i} := \eta_{g,i} - \eta_{g,i-1}$. We then test the $r$-dimensional null
$$E(v_g) = 0, \qquad v_g := H_g' D \eta_g. \tag{2.4}$$
This approach delivers testable moments as soon as more than two observations per group are available.³

² Our null is equivalent to the set of moment conditions $E(\eta_{g,i_1}(\eta_{g,i_2} - \eta_{g,i_3})) = 0$, for $i_1 \neq i_2 \neq i_3$, but only $r$ of these equations are linearly independent. Our formulation in (2.4) is not the only way of selecting $r$ such moments but is notationally convenient. Note that any other way would yield (numerically) the same test statistic.
Observe that moments of the form $E(\Delta\eta_{g,i}\,\Delta\eta_{g,i-q})$ are linear combinations of a subset of those in (2.4). These are $q$th-order autocovariances of $\Delta\varepsilon_{g,i}$. Arellano and Bond (1991) suggested testing for $q$th-order autocorrelation by evaluating whether the corresponding sample moment can be considered large relative to its standard error. The resulting test statistic is known as the $m_q$-statistic. Yamagata (2008) proposed to combine all available $m_q$-statistics into a single test procedure. By consequence, his moment conditions are also nested in (2.4). Notice that, as first-differencing introduces autocorrelation of order one under the null, a sensible $m_q$-statistic can only be constructed for $q \geq 2$. Furthermore, the $m_2$-statistic requires $m \geq 4$ observations per group. The $m_q$-statistic is available if $m \geq q + 2$. Hence, for the joint approach of Yamagata (2008) to be different from the $m_2$-statistic one needs panel data that consist of at least five waves.
Our test statistic for the null (2.4) is the quadratic form
$$s_n := \Big(\sum_{g=1}^n v_g\Big)' \Big(\sum_{g=1}^n v_g v_g'\Big)^{-1} \Big(\sum_{g=1}^n v_g\Big),$$
and its large-sample behavior, as the number of groups $n$ grows, is summarized in Theorem 1 below. If desired, a centered version of the weight matrix can be used in the construction of $s_n$.
In the theorem we consider sequences of local alternatives where
$$E(\varepsilon_{g,i_1}\varepsilon_{g,i_2}) = \frac{\sigma_{i_1,i_2}}{\sqrt{n}}$$
and $\sigma_{i_1,i_2}$ is non-zero for at least one pair of indices $i_1 \neq i_2$ (and non-constant). We write the resulting Pitman drift in the moment condition as
$$E(v_g) = \frac{\delta}{\sqrt{n}}. \tag{2.5}$$
Here and later we denote the non-central $\chi^2$-distribution with $q$ degrees of freedom and non-centrality parameter $c$ by $\chi^2(q, c)$.

³ An alternative way to arrive at (2.4) is by noting that, under the null, $E(\eta_{g,i_1}\eta_{g,i_2}) = E(\alpha_g^2)$ for all $i_1 \neq i_2$. Because the distribution of $\alpha_g$ is left unrestricted this equation, in itself, is not of direct use. However, the panel dimension allows one to difference out the second moment of the group-specific effect, yielding differences of the form $E(\eta_{g,i_1}\eta_{g,i_2}) - E(\eta_{g,i_1}\eta_{g,i_3}) = 0$, which lead to (2.4).
Proof. See the Appendix.
The result implies that a test that has size α ∈ (0, 1) in large samples can be constructed by comparing s n to the (1 − α)th quantile of the χ 2 (r, 0) distribution, rejecting the null if the statistic is larger than the quantile in question. Such a test is asymptotically unbiased, consistent against any fixed alternative, and has non-trivial asymptotic power against any Pitman sequence.
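The decision rule above can be sketched for three-wave data with directly observed errors. This is a minimal illustration, not the paper's implementation: all function names are hypothetical, the two moments used (products of one level with a first difference that does not involve it) are one illustrative choice of $r = 2$ linearly-independent moments, the weight matrix is the uncentered one, and the simulated design (standard-normal innovations, first-order moving-average errors with parameter `theta`) is an assumption made for the example. With two degrees of freedom the $\chi^2$ critical value has the closed form $-2\log\alpha$.

```python
import math

import numpy as np


def moment_vectors(eta):
    """n x 2 array of moments for m = 3 (an illustrative choice of r = 2 moments).

    Columns: eta_1 * (eta_3 - eta_2) and eta_3 * (eta_2 - eta_1).
    """
    eta = np.asarray(eta, dtype=float)
    if eta.ndim != 2 or eta.shape[1] != 3:
        raise ValueError("this sketch handles three observations per group only")
    d2 = eta[:, 1] - eta[:, 0]  # first difference at i = 2
    d3 = eta[:, 2] - eta[:, 1]  # first difference at i = 3
    return np.column_stack([eta[:, 0] * d3, eta[:, 2] * d2])


def portmanteau_stat(eta):
    """Quadratic form in the summed moments with an uncentered weight matrix."""
    v = moment_vectors(eta)
    total = v.sum(axis=0)
    weight = v.T @ v
    return float(total @ np.linalg.solve(weight, total))


def chi2_2df_critical(alpha=0.05):
    """(1 - alpha) quantile of the central chi-square with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


def simulate_errors(n, theta=0.0, gamma=1.0, rng=None):
    """Hypothetical design: eta = alpha + eps, eps an MA(1) with parameter theta."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = gamma * rng.standard_normal((n, 1))
    u = rng.standard_normal((n, 4))        # innovations u_0, ..., u_3
    eps = u[:, 1:] + theta * u[:, :-1]     # eps_i = u_i + theta * u_{i-1}
    return alpha + eps


rng = np.random.default_rng(0)
for theta in (0.0, 0.5):
    stat = portmanteau_stat(simulate_errors(1000, theta=theta, rng=rng))
    print(f"theta={theta}: s_n={stat:.2f}, reject={stat > chi2_2df_critical()}")
```

The group effect drops out of both moments because each multiplies a level by a first difference, so the statistic is unaffected by the variance of the group effect under the null.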
When the panel is unbalanced some of the entries of the vector v g will be missing for some groups. Setting such entries to zero (i.e., retaining only the non-missing data), our test remains consistent provided that the number of groups for which we observe η g,i 1 ∆η g,i 2 grows large for each pair (i 1 , i 2 ) that features in (2.4) (of course, under the assumption that the missingness is at-random).
Testing in the one-way regression model. We now generalize (2.1) to the regression setting
$$y_{g,i} = x_{g,i}'\beta + \eta_{g,i}, \qquad \eta_{g,i} = \alpha_g + \varepsilon_{g,i},$$
where $y_{g,i}$ and $x_{g,i}$ are an observable outcome and $p$-vector of covariates, respectively, and $\eta_{g,i}$ is now the latent error term. Suppose that an estimator $\hat\beta$ of the coefficient vector $\beta$ is available. Then we may use the residuals $e_{g,i} := y_{g,i} - x_{g,i}'\hat\beta$ as estimators of the $\eta_{g,i}$ and construct the test statistic based on these residuals. We will require the estimator $\hat\beta$ to be asymptotically linear under the null and under local alternatives of the form in (2.5), i.e., that
$$\sqrt{n}\,(\hat\beta - \beta) = \frac{1}{\sqrt{n}}\sum_{g=1}^n \omega_g + o_P(1) \tag{2.6}$$
for a random variable $\omega_g$ that has finite variance and zero mean under our null, but may have non-zero mean under local deviations from our null. This is a very mild requirement as all commonly-used estimators satisfy it (of course, under suitable regularity conditions, see Newey 1985). When the covariates are strictly exogenous, for example, a natural estimator of $\beta$ would be the within-group least squares estimator. This estimator is robust to within-group correlation. In contrast, when the covariates are only pre-determined, the instrumental-variable estimators described in Holtz-Eakin, Newey and Rosen (1988), which are based on moment conditions of the form $E(z_{g,i}\,\Delta\eta_{g,i}) = 0$ for $z_{g,i} := (x_{g,i-2}', \ldots, x_{g,1}')'$ (or a subvector thereof, as in Anderson and Hsiao 1981) and all $1 < i \leq m$, will generally be asymptotically biased under local alternatives as in (2.5).
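To set up our test statistic based on residuals we first define $E_g$ in analogy to $H_g$ (with each $\eta_{g,i}$ replaced by $e_{g,i}$) and $e_g := (e_{g,1}, \ldots, e_{g,m})'$. These allow us to construct $\hat v_g := E_g' D e_g$, which is the plug-in estimator of $v_g = H_g' D \eta_g$. The use of residuals requires modifying the weight matrix in the quadratic form of the test statistic, however. In the proof of Theorem 2 we show that
$$\frac{1}{\sqrt{n}}\sum_{g=1}^n \hat v_g = \frac{1}{\sqrt{n}}\sum_{g=1}^n (v_g + \Omega\,\omega_g) + o_P(1), \tag{2.7}$$
where the $r \times p$ Jacobian matrix
$$\Omega := E\big(\dot H_g (I_p \otimes D\eta_g) - H_g' D X_g\big)$$
features $\dot H_g := (\partial H_g'/\partial\beta_1, \ldots, \partial H_g'/\partial\beta_p)$ and $X_g := (x_{g,1}, \ldots, x_{g,m})'$, and we use $I_p$ to denote the $p \times p$ identity matrix. Each of the matrices $\partial H_g/\partial\beta_q$ is of the same form as $H_g$, only with $\eta_{g,i}$ replaced by the $q$th entry of $\partial\eta_{g,i}/\partial\beta = -x_{g,i}$. A plug-in estimator of $\Omega$ is easily constructed as
$$\hat\Omega := \frac{1}{n}\sum_{g=1}^n \big(\dot H_g (I_p \otimes D e_g) - E_g' D X_g\big).$$
Thus, with $\hat\omega_g$ denoting an estimator of the influence function $\omega_g$, our test statistic based on residuals is
$$\hat s_n := \Big(\sum_{g=1}^n \hat v_g\Big)' \Big(\sum_{g=1}^n (\hat v_g + \hat\Omega\,\hat\omega_g)(\hat v_g + \hat\Omega\,\hat\omega_g)'\Big)^{-1} \Big(\sum_{g=1}^n \hat v_g\Big).$$
Here, $\hat\omega_g$ will depend on the problem at hand. If residuals are constructed using the within-group estimator, for example, then $\hat\omega_g = (n^{-1}\sum_{g=1}^n X_g' M X_g)^{-1} X_g' M e_g$, where $M := D'(DD')^{-1}D$ is the matrix that transforms observations into deviations from their within-group mean; equivalently, $M = I_m - \iota_m\iota_m'/m$ with $\iota_m$ the $m$-vector of ones.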
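The link between the first-difference operator $D$ and within-group demeaning can be verified numerically. The sketch below (function names are hypothetical) builds $D$ for a given $m$ and checks that $D'(DD')^{-1}D$, the projector onto the space spanned by the rows of $D$, coincides with the matrix $I_m - \iota_m\iota_m'/m$ that maps observations into deviations from their within-group mean; its complement $I_m - D'(DD')^{-1}D$ is then the averaging matrix $\iota_m\iota_m'/m$.

```python
import numpy as np


def first_difference_operator(m):
    """(m-1) x m matrix D with D @ x = (x_2 - x_1, ..., x_m - x_{m-1})."""
    D = np.zeros((m - 1, m))
    for i in range(m - 1):
        D[i, i] = -1.0
        D[i, i + 1] = 1.0
    return D


def within_projector(m):
    """Projector D'(DD')^{-1}D onto the row space of D."""
    D = first_difference_operator(m)
    return D.T @ np.linalg.solve(D @ D.T, D)


m = 4
M = within_projector(m)
demeaning = np.eye(m) - np.ones((m, m)) / m

# The projector demeans: applying it to a vector subtracts the vector's average.
x = np.array([1.0, 4.0, 2.0, 9.0])
print(np.allclose(M, demeaning), np.allclose(M @ x, x - x.mean()))
```

Because $D\iota_m = 0$, the rows of $D$ span exactly the vectors that sum to zero, which is why the two matrices agree.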
Theorem 2 summarizes the large-sample properties of the test statistic $\hat s_n$. We let $\tilde\delta := \delta + \Omega\,E(\omega_g)$ and use $\|\cdot\|$ to denote both the Euclidean norm and the Frobenius norm.
Proof. See the Appendix.
Theorem 2 differs from Theorem 1 in the local-power result. Estimation noise in $\hat\beta$ changes the weight matrix in the non-centrality parameter. This change is independent of the alternative under consideration. Local power will be further affected if $\hat\beta$ suffers from asymptotic bias under the alternative. The extent to which this happens depends on the alternative in question. The degree to which both channels matter is governed by the Jacobian matrix $\Omega$. Estimation of $\beta$ will have no (asymptotic) impact on the properties of our test if $\Omega$ is equal to the zero matrix. This would happen, for example, when the covariates are strictly exogenous and the effect $\alpha_g$ is orthogonal to all the $\Delta x_{g,i}$, as in the random-effect model (Arellano, 2003, Chapter 3). In this case, Theorem 2 collapses to Theorem 1.

Power comparisons
In this section we provide power comparisons in the random-effect model for three-wave and four-wave data. The appendix provides detailed power calculations for a dynamic model as an additional example.
Power calculations. We first calculate asymptotic power in specific cases for three-wave panels. In this case we test two moment conditions. Suppose that α g ∼ independent (0, γ 2 ) and that the errors are generated according to the (non-stationary) moving-average process of order one which is zero if and only if the errors are uncorrelated, i.e., θ = 0. Further, under the null, The non-centrality parameter of the limit distribution under local alternatives then equals which is independent of σ 2 0 but otherwise complicated. When the errors are homoskedastic the non-centrality parameter becomes 2 3 Consequently, power is monotone increasing in |θ| and decreasing in the ratio γ 2 /σ 2 . Both these findings are intuitive. When we set γ 2 = 0 but allow the errors to be heteroskedastic the non-centrality parameter equals Power is sensitive to the variances of the innovations. For example, an increase in σ 2 3 , ceteris paribus, will always cause power loss while changes in σ 2 1 and σ 2 2 can be both power enhancing and power reducing.
Given an expression for the non-centrality parameter we can approximate the power function of our test for any given sample size n. The left plot in Figure 1 shows the power of a 5%-level test, plotted as a function of θ, for 100 observations, no group-specific effects, and three choices of the variance parameters. The horizontal dotted line marks the size of the test. The solid line corresponds to the stationary case where σ 2 = 1 and shows substantial power. The dashed line is for a case where the variances increase, with σ 2 1 = 1, σ 2 2 = 2, and σ 2 3 = 3. In this case power is slightly lower than in the stationary case. The dashed-dotted line is for the reverse case where the cross-sectional variances decrease from three down to one. This yields a uniformly-superior power curve relative to the stationary case. Now suppose that the errors are generated via the first-order autoregressive specification where ε g,0 ∼ independent (0, σ 2 0 ). Then This will be non-zero for any value of the autoregressive parameter except zero. This includes the random-walk alternative. The non-centrality parameter in Theorem 1 can again be computed using the same matrix V as was used above. However, the resulting expression is long and difficult to interpret. When the process is stationary the expression reduces to with := ρ/(1 + ρ), which lives on (−∞, 1 2 ). Then the non-centrality parameter is equal to 2 3 An alternative where = h/ √ n for some h corresponds to an autoregressive parameter that satisfies ρ = h/ √ n+O(n −1 ). This result implies that the moving-average and autoregressive alternatives are locally asymptotically equivalent. Godfrey (1981) and Yamagata (2008) have reached the same conclusion, albeit for different tests.
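Once a non-centrality parameter is in hand, approximate power is the probability that a non-central $\chi^2$ variable exceeds the central critical value. The sketch below approximates this by Monte Carlo using only NumPy. It is illustrative only: the function name, the number of draws, and the non-centrality values passed in are assumptions, not the values underlying Figure 1.

```python
import numpy as np


def approx_power(noncentrality, df=2, level=0.05, draws=200_000, seed=0):
    """Monte Carlo approximation to P(chi2(df, noncentrality) > critical value)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((draws, df))
    # Critical value: (1 - level) quantile of the simulated central chi-square.
    critical = np.quantile((z ** 2).sum(axis=1), 1.0 - level)
    # A non-central chi-square is a sum of squared normals with one shifted mean.
    shifted = z.copy()
    shifted[:, 0] += np.sqrt(noncentrality)
    return float(((shifted ** 2).sum(axis=1) > critical).mean())


for c in (0.0, 2.0, 5.0, 10.0):
    print(c, approx_power(c))
```

At a non-centrality of zero the approximation returns the nominal level, and power rises with the non-centrality parameter, which for a fixed alternative grows in proportion to the number of groups.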
The right plot in Figure 1 provides power approximations for three configurations of the autoregressive process (again for $n = 100$ and no group-specific effects), plotted as a function of $\rho$. The solid line is again for the stationary setting. This power curve is asymmetric, with power against a given $\rho > 0$ being lower than against $-\rho < 0$. This can be explained by the fact that | | is asymmetric in $\rho$. It takes on large (negative) values for $\rho < 0$ and small values for $\rho > 0$. Non-stationarity can be a source of power against autoregressive alternatives. The dashed power curve corresponds to the case where $\sigma_0^2 = 0$ and all other variances are equal to unity. Setting all initial conditions to zero yields power against near unit-root alternatives. In the unit-root case the Pitman drift depends only on $\sigma_2^2$ and so a larger value would be expected to yield more power close to unity. The dashed-dotted curve illustrates this by showing power when $\sigma_1^2 = \sigma_3^2 = 1$, $\sigma_2^2 = 2$, and

Power comparisons. Under the null of no within-group correlation the first-differenced errors are autocorrelated at first order (but not beyond). When errors are homoskedastic this correlation equals $-1/2$. A simple and popular alternative approach in this setting is to test whether $t := \mathrm{corr}(\Delta\eta_{g,i}, \Delta\eta_{g,i-1})$ is different from $-1/2$. This test was introduced in Wooldridge (2002, p. 282-283) and further discussed in Drukker (2003), where $\hat t$ denotes the first-order sample correlation coefficient. Inspection of how $t$ varies with $\theta$ shows that power is asymmetric, with power being higher against $\theta > 0$ than against the corresponding $-\theta$. This asymmetry does not arise with autoregressive alternatives. Figure 2 presents power comparisons for sample sizes of $n = 100$. Power is plotted for our test with $\gamma^2/\sigma^2 = 0$ (solid line), $\gamma^2/\sigma^2 = 1$ (dashed line), and $\gamma^2/\sigma^2 = 2$ (dashed-dotted line) and for the test of Wooldridge (2002) and Drukker (2003) (dotted line). The curves clearly illustrate our findings.
The Wooldridge-Drukker test is not a portmanteau test. It will have no power when serial correlation manifests itself only at higher order. It will also have no power against alternatives where $E(\varepsilon_{g,i}\varepsilon_{g,i-1}) = E(\varepsilon_{g,i}\varepsilon_{g,i-2})$, for example. The latter situation arises, for example, in the moving-average process of order two, $\varepsilon_{g,i} = u_{g,i} + \theta_1 u_{g,i-1} + \theta_2 u_{g,i-2}$, $u_{g,i} \sim \text{independent}(0, \sigma^2)$, when $\theta_1 = \theta_2/(1 + \theta_2)$. Indeed, here, $t = -1/2$ for any value of $\theta_2$. The portmanteau test of Inoue and Solon (2006) can be seen as a generalization of the Wooldridge-Drukker test.
It evaluates whether the correlations between $\varepsilon_{g,i_1} - \bar\varepsilon_g$ and $\varepsilon_{g,i_2} - \bar\varepsilon_g$ are different from $-1/(m-1)$, where $\bar\varepsilon_g$ is the within-group average of the errors. The Inoue and Solon (2006) test depends on a regularization parameter that serves to handle the fact that the demeaned errors sum to zero within each group. With three waves their procedure yields three possible test statistics, say $t_{(1,2)}$, $t_{(1,3)}$, and $t_{(2,3)}$. The statistic $t_{(i_1,i_2)}$ tests the null that $\mathrm{corr}((\varepsilon_{g,i_1} - \bar\varepsilon_g), (\varepsilon_{g,i_2} - \bar\varepsilon_g)) = -1/(m-1)$.
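The second-order moving-average example given earlier for the Wooldridge-Drukker test can be checked from the autocovariances of the process. The sketch below is a worked calculation under the assumption of homoskedastic innovations; the function name is hypothetical. It computes the first-order autocorrelation of the first-differenced errors and shows that it equals $-1/2$ whenever $\theta_1 = \theta_2/(1 + \theta_2)$, so that a test based on this correlation alone cannot detect the departure from the null.

```python
def first_diff_autocorr_ma2(theta1, theta2, sigma2=1.0):
    """corr(d_eps_i, d_eps_{i-1}) for eps_i = u_i + theta1*u_{i-1} + theta2*u_{i-2}."""
    g0 = sigma2 * (1.0 + theta1 ** 2 + theta2 ** 2)  # variance of eps
    g1 = sigma2 * theta1 * (1.0 + theta2)            # first-order autocovariance
    g2 = sigma2 * theta2                             # second-order autocovariance
    covariance = 2.0 * g1 - g0 - g2                  # cov(eps_i - eps_{i-1}, eps_{i-1} - eps_{i-2})
    variance = 2.0 * (g0 - g1)                       # var(eps_i - eps_{i-1})
    return covariance / variance


for theta2 in (0.3, -0.4, 1.5):
    theta1 = theta2 / (1.0 + theta2)  # makes first- and second-order covariances equal
    print(theta2, first_diff_autocorr_ma2(theta1, theta2))

# A first-order moving average, by contrast, moves the correlation away from -1/2.
print(first_diff_autocorr_ma2(0.5, 0.0))
```

When the first- and second-order autocovariances coincide, the numerator reduces to minus one half of the denominator regardless of their common value, which is the source of the blind spot.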
In stationary designs t (1,2) and t (2,3) will have the same limit behavior. Calculations reveal that t (1,2) where d = θ for moving average alternatives and d = for autoregressive alternatives.
In both cases, t (1,3) is revealed to be much more powerful than t (1,2) and t (2,3) . Figure 3 provides a power comparison between our test and the Inoue and Solon (2006) test. The power curves for our test co-incide with those in Figure 2. The power curves for the Inoue and Solon (2006)  and has a χ 2 (1, 0) limit distribution under the null. Under local moving-average alternatives of the same type as before the non-centrality parameter is , which reduces to θ 2 /4 under homoskedasticity. In contrast, our procedure tests five moment conditions, and the non-centrality parameter of its limit distribution equals 3 2 The left plot in Figure 4 provides the power curves for our test statistic (solid line) and for the m 2 -statistic (dashed line), again for a sample of size n = 100 and γ 2 = 0. Our approach is seen to be uniformly more powerful. As γ 2 /σ 2 increases relative power decreases, in line with the left plot in Figure 2 Under stationary autoregressive alternatives, E(∆η g,2 ∆η g,4 ) = − (1 − ρ)σ 2 , and the non-centrality parameter for violations of the moment condition is This expression is non-monotone in ρ. Furthermore, it approaches zero as ρ → 1 and so the m 2 -statistic will have low power against near unit-root alternatives. The non-centrality parameter for our test statistic in this context equals 3 2 This is strictly larger than the non-centrality parameter for three-wave data for all ρ. The right plot in Figure 4 -again for n = 100 and γ 2 = 0-confirms the poor power of the m 2 -statistic against positive autocorrelation patterns and shows favorable power for our test.
Simulations. We next provide results from a simulation experiment on a heteroskedastic regression model. We generated outcomes via $y_{g,i} = i - .05\,i^2 + \alpha_g + \varepsilon_{g,i}$, where $\alpha_g \sim N(0, 1)$ and the errors follow a moving-average process. A motivation for this specification is a canonical regression of (log) wage on a quadratic polynomial in age/experience. Expected wages exhibit decreasing returns to age and the variance of wages decreases with age. In terms of our general model this corresponds to $x_{g,i} = (i, i^2)'$, $\beta = (1, -.05)'$, and heteroskedasticity. We estimate $\beta$ by the within-group estimator. Figure 5 presents power curves for our test based on $\hat s_n$ (solid line), the test of Inoue and Solon (2006) (dashed line) implemented with the regularization parameter set as in Inoue and Solon (2006, p. 841) (which is also how it is implemented in Stata), and the heteroskedasticity-robust version of the Wooldridge-Drukker test due to Born and Breitung (2016). These would appear the most relevant comparisons here. Clockwise, the plots deal with $m = 3, 6, 9, 12$ and $n = 250$.
The test of Born and Breitung (2016) requires m ≥ 4, which explains why a power curve for this test is absent from the upper-left plot.
The simulation results confirm that our approach delivers a test that is both size correct and powerful. The Inoue and Solon (2006) test suffers from size distortion (which worsens as n grows), which arises as a consequence of the heteroskedasticity. The test of Born and Breitung (2016), although size correct, suffers from very low power against virtually all alternatives, and does so for all sample sizes considered. Power does increase with m but does so very slowly. This suggests that this test is not well suited to detect within-group correlation in micropanels.
We also present power against autoregressive alternatives of the form where we initialize each process with $\varepsilon_{g,0} \sim N(0, 1)$. The results are presented in Figure 6 and are in line with those obtained for moving-average alternatives. The Inoue and Solon (2006) test is size distorted while the Born and Breitung (2016) test is incapable of detecting moderate deviations from the null. Furthermore, here, its power curve displays quite erratic behavior to the right of zero which, in addition, depends heavily on the length of the panel. Our test controls size in all cases and has good power against all deviations from the null.

Empirical illustration
Inferring the effect of smoking on birth outcomes is complicated by latent characteristics that are likely to be correlated with the mother's decision to smoke. The early literature has aimed to tackle this problem via instrumental variables; see Permutt and Hebel (1989), and Currie and Gruber (1996) and Reichman and Florio (1996) for related work. An alternative approach, taken by Abrevaya (2006), is to fit a fixed-effect model to data on mothers with multiple children. Exploiting repeated measurements is a powerful device in this context as it allows one to control for all unobserved characteristics of the mother that are constant across births.
As data on mothers with two children are uninformative for our purposes we restrict the sample of Abrevaya (2006) to mothers with three children (a larger number was not observed in these data). This yields three observations on 12,360 mothers. Such a dimension fits our asymptotic approximation very well. Table 1 provides a summary of the variables in the data. weight is the newborn's weight (in grams). smokes is a binary variable indicating whether or not the mother smokes and n cigarettes is the number of cigarettes smoked (per day). The control variables that vary across births for a given mother are the age of the mother (in years) (age), the newborn's gender (male) together with several variables that aim to measure the extent to which the mother took adequate prenatal care. These are novisit, a dummy that switches on if no prenatal visit occurred, pretri2 and pretri3, which state that prenatal visits occurred in the second and third trimester, respectively, and adeqcode2 and adeqcode3, which indicate whether the so-called Kessner index equals two or three, respectively. The Kessner index is a categorical measure for adequacy of prenatal care which is based upon length of gestation, number of prenatal visits and date of initial prenatal visit. The three categories of the Kessner index are 'adequate' (a value of one), 'intermediate' (a value of two) and 'inadequate' (a value of three). Table 2 summarizes the results of fitting a standard random-effect and a fixed-effect model to these data. The signs of the coefficients are as expected. Most notably, smoking has a substantially negative impact on birthweight. The table also provides p-values for the null that there is no unobserved mother heterogeneity. Under both specifications this null is strongly rejected. Furthermore, both the random-effect and the fixed-effect model attribute a large fraction of the variation in birthweight to latent heterogeneity at the level of the mother, 39% and 49%, respectively.
In these data our test statistic for the null of no within-mother correlation tests two moment restrictions. The statistic takes the value 112.64, giving very strong evidence against our null (the test of Inoue and Solon (2006), on the other hand, gives a p-value of .059). Consequently, it is very likely that additional unobserved heterogeneity that is not captured by the inclusion of mother-specific effects is present in these data. One implication of this is that the attribution of half of the variation in birthweight to mother heterogeneity is not theoretically justified. A second consequence is that the random-effect model is misspecified. The fixed-effect estimator (with clustered standard errors) is robust to deviations from the null provided that the errors are mean independent of the regressors.
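The strength of the evidence can be made concrete with a short calculation. With two moment restrictions the reference distribution is the central $\chi^2$ with two degrees of freedom, whose upper tail probability at $x$ is $\exp(-x/2)$. The sketch below (the function name is hypothetical) evaluates this for the reported statistic.

```python
import math


def chi2_2df_pvalue(statistic):
    """Upper tail probability of a central chi-square with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


p_value = chi2_2df_pvalue(112.64)
print(f"{p_value:.3e}")  # far below any conventional significance level
```

For comparison, the 5% critical value with two degrees of freedom is $-2\log(0.05)$, roughly 5.99, so the reported statistic exceeds it by a wide margin.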
Given the endogeneity concerns that are central to this literature, another interpretation of our test result is that the use of a fixed-effect strategy to identify the causal effect of smoking on infant health need not be sufficient to solve the omitted-variable problem described by Abrevaya (2006).

Appendix
Proof of Theorem 1. The proof is standard. Consider first the limit result under the null. The moment conditions stated in the theorem imply that $z := (\sum_{g=1}^n v_g v_g')^{-1/2} \sum_{g=1}^n v_g \overset{L}{\to} N(0, I_r)$, and so $s_n = z'z \overset{L}{\to} \chi^2(r, 0)$. This is Part (i) of the theorem. Under the sequence of local alternatives $E(v_g) = \delta/\sqrt{n}$ the limit distribution of $z$ contains an asymptotic bias term.
Proof of Theorem 2. The main difference with the proof of Theorem 1 is accounting for the estimation noise in $\hat\beta$. We first prove (2.7). Because $\|\hat\beta - \beta\|^2 = O_P(n^{-1})$ by (2.6), and the covariates have finite second moments, the expansion holds. Further, as $\partial v_g/\partial\beta' = \dot H_g (I_p \otimes D\eta_g) - H_g' D X_g$, the moment conditions postulated in the theorem allow the application of a law of large numbers to establish that $n^{-1}\sum_{g=1}^n \partial v_g/\partial\beta' \overset{P}{\to} \Omega$; (2.7) then follows. Next, again by the moment requirements on $\eta_{g,i}$ and $\omega_g$, the law of large numbers yields $n^{-1}\sum_{g=1}^n (v_g + \Omega\omega_g)(v_g + \Omega\omega_g)' \overset{P}{\to} \tilde V$.

Consequently, with
z := n g=1 (v g + Ωω g )(v g + Ωω g ) −1/2 n g=1v g z z will follow a (non-central) χ 2 -distribution with r degrees of freedom. The non-centrality parameter is zero under the null of no within-group correlation andδ Ṽ −1δ under local alternatives. It remains only to show that so that the distributional approximations carry over to the feasible statisticŝ n . Given that n −1 n g=1 ω g − ω g 2 P → 0 by assumption it suffices to prove that O P → Ω and that From above, e g = η g − X g (β − β), from which, up to O P (n −1 ), g D X g n follows by re-arrangement. Then is obtained on noting that the sample averages on the righ-hand side of the expression all converge in probability to finite quantities and that β − β With i q denoting the p-dimensional unit vector in direction q = 1, . . . , p we can conveniently write (∂v g /∂β ) 2 = trace (∂v g /∂β ) (∂v g /∂β ) = For all q = 1, . . . , p we have by Cauchy-Schwarz because ε g,i , α g and x g,i all have finite fourth-order moments. We then have that n g=1 (∂v g /∂β ) 2 n = O P (1) from which n g=1 v g − v g 2 /n P → 0 follows. The proof is thus complete.
Power calculations for a dynamic regression model. Here we examine a stationary first-order autoregression and estimate the slope coefficient by the instrumental-variable estimator of Anderson and Hsiao (1981). This is the optimal generalized method-of-moments estimator constructed from the moments
$$E\begin{pmatrix} y_{g,0}\,\Delta\eta_{g,2} \\ y_{g,1}\,\Delta\eta_{g,3} \end{pmatrix} = 0.$$
In the calculations to follow we assume that $\alpha_g = 0$ for all groups. The Jacobian and covariance matrix of the two Anderson and Hsiao (1981) moments are available in closed form when the errors are uncorrelated and have variance $\sigma^2$. These matrices can be combined to find that
\[ \omega_g = -\frac{1+\beta}{\sigma^2}\, \frac{y_{g,0}\,\Delta\eta_{g,2} + y_{g,1}\,\Delta\eta_{g,3}}{2}. \]
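The way the two matrices combine can be sketched as follows. This is a reconstruction under the stated assumptions (stationarity, $\alpha_g = 0$, uncorrelated errors with variance $\sigma^2$) rather than a reproduction of the original display. The labels $G$, $W$ and $g_g$ for the Jacobian, the covariance matrix and the vector of moment functions are ours, and the sign convention for $G$ (differentiating the moment functions with respect to $\beta$) is an assumption.

```latex
% Stationarity gives var(y_{g,i}) = \sigma^2/(1-\beta^2)
% and E(y_{g,i} y_{g,i+1}) = \beta \sigma^2/(1-\beta^2).
\begin{align*}
g_g &= \begin{pmatrix} y_{g,0}\,\Delta\eta_{g,2} \\ y_{g,1}\,\Delta\eta_{g,3} \end{pmatrix},
\qquad \Delta\eta_{g,i} = \Delta y_{g,i} - \beta\,\Delta y_{g,i-1}, \\
G &= -E\begin{pmatrix} y_{g,0}\,\Delta y_{g,1} \\ y_{g,1}\,\Delta y_{g,2} \end{pmatrix}
   = \frac{\sigma^2}{1+\beta}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \\
W &= E(g_g g_g')
   = \frac{\sigma^4}{1-\beta^2}\begin{pmatrix} 2 & -\beta \\ -\beta & 2 \end{pmatrix},
\qquad
W^{-1} = \frac{1-\beta^2}{\sigma^4(4-\beta^2)}\begin{pmatrix} 2 & \beta \\ \beta & 2 \end{pmatrix}, \\
\omega_g &= -(G'W^{-1}G)^{-1} G'W^{-1} g_g
   = -\frac{1+\beta}{\sigma^2}\,\frac{y_{g,0}\,\Delta\eta_{g,2} + y_{g,1}\,\Delta\eta_{g,3}}{2}.
\end{align*}
```

Since $G'W^{-1}$ is proportional to $(1, 1)$, the optimal weighting places equal weight on the two moments, which is why their simple average appears.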
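When the moment conditions hold this is a mean-zero random variable. The moments typically fail to hold when our null of no within-group correlation is violated. When $\varepsilon_{g,i} = u_{g,i} + \theta\, u_{g,i-1}$ with $u_{g,i} \sim$ independent $(0, \sigma^2)$,
\[ E\begin{pmatrix} y_{g,0}\,\Delta\eta_{g,2} \\ y_{g,1}\,\Delta\eta_{g,3} \end{pmatrix} = -\theta\sigma^2 \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \]
so that $E(\omega_g) = (1+\beta)\,\theta$. This is indeed non-zero for all $\theta \neq 0$. When the errors, instead, follow the autoregressive process $\varepsilon_{g,i} = \rho\, \varepsilon_{g,i-1} + u_{g,i}$ with $u_{g,i} \sim$ independent $(0, \sigma^2)$, the bias equals
\[ E\begin{pmatrix} y_{g,0}\,\Delta\eta_{g,2} \\ y_{g,1}\,\Delta\eta_{g,3} \end{pmatrix} = -\frac{\rho\,\sigma^2}{(1+\rho)(1-\rho\beta)} \begin{pmatrix} 1 \\ 1 \end{pmatrix}; \]
this can be verified using that $y_{g,i} = \sum_{j=0}^{\infty} \beta^j \varepsilon_{g,i-j}$, which itself follows from backward substitution. Hence, $E(\omega_g) = (1+\beta)\,\rho / \big((1+\rho)(1-\rho\beta)\big)$ in this case. This again fails to be zero whenever $\rho$ is non-zero.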
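The two bias calculations can be checked numerically from the moving-average representation $y_{g,i} = \sum_j \beta^j \varepsilon_{g,i-j}$. The sketch below is illustrative only: the function names are hypothetical, the infinite sum is truncated, and $\omega_g$ is assumed to equal $-(1+\beta)/\sigma^2$ times the average of the two Anderson and Hsiao (1981) moments, a scaling that reproduces $E(\omega_g) = (1+\beta)\theta$ in the moving-average case.

```python
def autocov_ma1(theta, sigma2):
    """Autocovariance function of eps_i = u_i + theta * u_{i-1}."""
    def gamma(k):
        k = abs(k)
        if k == 0:
            return (1.0 + theta ** 2) * sigma2
        return theta * sigma2 if k == 1 else 0.0
    return gamma


def autocov_ar1(rho, sigma2):
    """Autocovariance function of the stationary eps_i = rho * eps_{i-1} + u_i."""
    def gamma(k):
        return sigma2 * rho ** abs(k) / (1.0 - rho ** 2)
    return gamma


def expected_omega(beta, sigma2, gamma, terms=200):
    """E(omega_g) with alpha_g = 0, so that eta = eps (assumed scaling of omega_g)."""
    # E[y_0 * eps_t] = sum_j beta^j * gamma(t + j), truncated after `terms` terms.
    e_y0_eps1 = sum(beta ** j * gamma(j + 1) for j in range(terms))
    e_y0_eps2 = sum(beta ** j * gamma(j + 2) for j in range(terms))
    # E[y_0 * delta eta_2]; by stationarity it equals E[y_1 * delta eta_3].
    moment = e_y0_eps2 - e_y0_eps1
    return -(1.0 + beta) / sigma2 * moment
```

For instance, `expected_omega(0.5, 2.0, autocov_ma1(0.3, 2.0))` can be compared with $(1+\beta)\theta$, and the autoregressive counterpart with $(1+\beta)\rho/((1+\rho)(1-\rho\beta))$.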
The Jacobian of $E(v_g)$ with respect to $\beta$ has a simple form. Bias expressions for our moment conditions follow for the cases where the errors follow a moving-average process and an autoregressive process, respectively. These bias expressions show that our test will have no power when $\beta = 0$. When the errors follow a first-order autoregressive process, trivial power will also occur when $\rho = \beta$. The bias in the Anderson and Hsiao (1981) estimator effectively cancels the bias in our moment conditions in these cases. This result is not general, in that it is specific to the setting of a three-wave panel, stationary data, and the use of the Anderson and Hsiao (1981) estimator.
Lengthy but standard calculations show that the covariance matrix of $v_g + \Omega\,\omega_g$ under the null, $\tilde{V}$, can be decomposed into several matrices, the first of which corresponds to $V$ from above (by virtue of stationarity). With $|\beta| < 1$ it follows that $(\tilde{V})_{11} < (V)_{11}$ and $(\tilde{V})_{12} < (V)_{12}$ (in magnitude) for any $\beta$. The non-centrality parameter in Theorem 2(ii) is then found in closed form for moving-average and autoregressive alternatives, respectively, using the shorthand
\[ \lambda_\beta := \frac{4\beta^2}{2\beta^3 + 3\beta^2 - 2\beta + 3}. \]
We note that $\lambda_\beta$ is roughly U-shaped on $(-1, 1)$, reaching its minimum of zero at zero and its maximum of $2/3$ at the boundary. This implies, for example, that the test is uniformly less powerful against moving-average alternatives than in the case where the errors are directly observed.
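The stated shape of $\lambda_\beta$ is easy to check numerically. The short sketch below reads the shorthand as the ratio $4\beta^2/(2\beta^3 + 3\beta^2 - 2\beta + 3)$, which is our reading of the definition; the function name `lam` and the grid are illustrative choices.

```python
def lam(beta):
    """lambda_beta, read as 4 b^2 / (2 b^3 + 3 b^2 - 2 b + 3)."""
    return 4.0 * beta ** 2 / (2.0 * beta ** 3 + 3.0 * beta ** 2 - 2.0 * beta + 3.0)


# Evaluate on an evenly spaced grid strictly inside (-1, 1).
grid = [i / 1000.0 for i in range(-999, 1000)]
values = [lam(b) for b in grid]

# Location of the smallest value on the grid, and the two boundary limits.
argmin = grid[values.index(min(values))]
left_limit, right_limit = lam(-1.0), lam(1.0)
```

Under this reading the function vanishes at $\beta = 0$, stays below $2/3$ on the interior grid, and equals $2/3$ at both endpoints, in line with the description above.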
We illustrate our power calculations for this example in Figure 7, again for n = 100.
The power curves are for $\beta = -\tfrac{1}{2}$ (dashed line) and $\beta = \tfrac{1}{2}$ (dashed-dotted line). The solid line corresponds to the power curve for the oracle test where $\beta$ is known (and so Theorem 1 applies); these curves coincide with those reported in Figure 1. The plots illustrate the power loss relative to the oracle and the dependence of power on the value of the autoregressive coefficient. In the autoregressive case they also show the loss of power against alternatives where $\rho$ is close to $\beta$.
Under the null the Anderson and Hsiao (1981) estimator is inefficient, as we equally have $E(y_{g,0}\,\Delta\eta_{g,3}) = 0$ in that case. Combining both sets of moments would yield the Arellano and Bond (1991) estimator. To show the sensitivity of our test to the first-step estimator used, we next evaluate local power when using the (just-identified) estimator based on this additional moment condition alone. It turns out that the rank condition for this estimator fails when $\beta = 0$ and so, in what follows, we presume that $\beta \neq 0$. The intermediate calculations are similar to before and omitted for brevity. An expression for $\tilde{V}$ is again available. For moving-average alternatives the result is simple; indeed, $E(y_{g,0}\,\Delta\eta_{g,3}) = 0$ remains valid under such alternatives, and so $E(\omega_g) = 0$ here. For autoregressive alternatives, on the other hand, the result is again more complicated. Here, our test will have no power when $\rho = 2\beta/(1 + \beta^2)$.
The plots in Figure 8 compare the power curves of this alternative implementation of our test to the former as well as its oracle version. The power gains are substantial in the case of moving-average alternatives. There is a small power loss relative to the oracle as the estimator of $\beta$ introduces additional sampling noise. The variance increase depends on $\beta$ only through its square and so the power curves for $\beta = -\tfrac{1}{2}$ in the upper plot and for β = 1