Keywords:

  • Model selection tests;
  • Nonnested hypotheses;
  • Nonlinear dynamic models;
  • Goodness-of-fit;
  • Mean square prediction error

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Tests statistics and general results
  5. Model selection tests without lack-of-fit minimization
  6. Model selection tests with lack-of-fit minimization
  7. Consistent variance estimation
  8. On the positive asymptotic variance
  9. Conclusion
  10. Acknowledgements
  11. References
  12. Appendix

This paper generalizes Vuong's (1989) asymptotically normal tests for model selection in several important directions. First, it allows for incompletely parametrized models such as econometric models defined by moment conditions. Second, it allows for a broad class of estimation methods that includes most estimators currently used in practice. Third, it considers model selection criteria other than the models’ likelihoods such as the mean squared errors of prediction. Fourth, the proposed tests are applicable to possibly misspecified nonlinear dynamic models with weakly dependent heterogeneous data. Cases where the estimation methods optimize the model selection criteria are distinguished from cases where they do not. We also consider the estimation of the asymptotic variance of the difference between the competing models’ selection criteria, which is necessary for our tests. Finally, we discuss conditions under which our tests are valid. It is seen that the competing models must be essentially nonnested.


Introduction

Most procedures for choosing between two competing econometric models take the following form: each econometric model is estimated by a method that solves some optimization problem; the models are then compared by defining an appropriate goodness-of-fit or selection criterion for each model; and the better-fitting model according to this criterion is selected. In some cases the method of estimation for each model maximizes the goodness-of-fit criterion used for model selection. For instance, when the competing models are fully parametrized and estimated by maximum likelihood, some popular procedures for model selection are based on Akaike (1973), Akaike (1974) information criterion (AIC), Schwarz (1978) information criterion (SIC), or Hannan and Quinn (1979) criterion. In other cases, a different goodness-of-fit criterion is used for model selection. This arises when the competing models are estimated using the same sample and compared on their out-of-sample mean squared errors of prediction (MSEP). See Linhart and Zucchini (1986) for various other model selection criteria and procedures.

These model selection procedures are not entirely satisfactory. Since model selection criteria depend on sample information, their actual values are subject to statistical variation. As a consequence, a model with a higher model selection criterion value may not significantly outperform its competitor. When the competing models are fully parametrized, nonnested and estimated by maximum likelihood (ML), and when the observations are independent and identically distributed (i.i.d.), Linhart (1988) and Vuong (1989) independently proposed a general testing procedure that takes statistical variation into account and relies on convenient asymptotically standard normal tests for model selection based on the familiar likelihood ratio (LR) test statistic.2 Such tests test the null hypothesis that the competing models are equally close to the data generating process (DGP) against the alternative hypotheses that one model is closer to the DGP, where closeness of a model is measured according to the Kullback–Leibler (1951) information criterion (KLIC). Thus, as in classical nested hypothesis testing, outcomes of the tests provide information on the strength of the statistical evidence for the choice of a model based on its goodness-of-fit.3

Though quite general, the applicability of Vuong's model selection tests is currently limited for various reasons. First, because they are based on the likelihood function, these tests require that the competing models be completely parametrized. For instance, this implies that error terms in competing nonlinear regression models or simultaneous equations models must be specified to belong to some parametric family of distributions. As a consequence, Vuong's tests cannot be used to discriminate between two econometric models defined by moment conditions, or more generally, between two competing models that are incompletely specified.

The second limitation arises from the method of estimation. While ML estimation is quite natural when the model selection criterion is KLIC (see e.g. White (1982)), there are various reasons that may lead a researcher to use an estimation method other than ML. For instance, for computational simplicity, for robustness reasons, or by necessity because the competing models are incompletely specified, one may use an instrumental variable (IV) estimator or more generally a generalized method of moments (GMM) estimator (see Hansen (1982)), a robust estimator (see Huber (1981), Hampel et al. (1986)) or other extremum estimators (see Amemiya (1985), Gallant and White (1988)), a semiparametric estimator (see Andrews (1994), Newey and McFadden (1994), Powell (1994)), etc. Thus it is useful to provide a model selection testing framework that allows for a wide variety of estimation techniques.

Third, the maximum value of the likelihood (possibly adjusted) is not the only model selection criterion used in practice. For instance, when dealing with qualitative dependent variables models, alternative model selection criteria are Pearson-type goodness-of-fit statistics (see e.g. Moore (1978), Heckman (1981)). In linear regression models, criteria based on the in-sample MSEP are widely used (see e.g. Mallows (1973), Amemiya (1980)). When comparing the relative performance of macroeconometric models, a frequent criterion is the out-of-sample MSEP (see e.g. Meese and Rogoff (1983), Fair and Shiller (1990)). Another approach to the use of out-of-sample forecast performance for time series models is illustrated in Findley et al. (1998). Along these lines, recent contributions on model selection based on out-of-sample predictability are Diebold and Mariano (1995), West (1994), West (1996), Granger and Pesaran (2000), and White (2000).

Moreover, when the models are incompletely specified, the use of criteria other than the ML values becomes necessary. For instance, Sargan (1958) and Pesaran and Smith (1994) have proposed using the value of the IV criterion function when the competing models are simultaneous equations models estimated by IV methods. Recently, for robustness reasons, Martin (1980), Ronchetti (1985) and Machado (1993) have proposed robust versions of the AIC and SIC criteria by replacing the likelihood part by the extremal value of the sample objective function defining the robust estimator used. See also Konishi and Kitagawa (1996), who propose a generalized information criterion that can be used with robust parameter estimates. Though the preceding criteria have a goodness-of-fit appeal, a model selection criterion need not have such a property. For instance, the precision or MSE of the parameter estimates of interest in the competing models can be a criterion for model selection (see e.g. Torro-Vizcarrondo and Wallace (1968)). This list of criteria is clearly not exhaustive. It suggests, however, that the choice of a model selection criterion depends on the researcher and the purpose of the econometric modelling.

Fourth, Vuong's tests are derived for competing models that are completely static and for observations that are i.i.d. Clearly, the i.i.d. assumption is restrictive when considering time series data. Moreover, dynamic models are frequently considered in empirical work. For instance, a classical question is to determine the order of an ARMA process (see e.g. Hannan and Quinn (1979)). Some generalizations of Vuong's tests to time series models have been undertaken recently and independently from the present work. Findley (1990), Findley (1991) considers essentially the case of competing Gaussian ARMA models when the true DGP is a strictly stationary process. Findley and Wei (1993) provide a generalization to some dynamic regression models.

The goal of the present paper is thus clear. It is to generalize Vuong's (1989) model selection tests in several important directions. First, the present paper allows for incompletely specified models such as econometric models defined by moment conditions. Second, it allows for a broad class of estimation methods that includes most estimators used in practice such as the ML estimator, minimum chi-square estimators, GMM estimators, as well as other extremum estimators and some semiparametric estimators. In particular, we shall require that the estimators be √n-consistent. See Gallant and White (1988) and Newey and McFadden (1994) for the class of parametric and semiparametric estimators considered here. Third, the present paper allows for model selection criteria other than the models’ likelihoods. An important example is the out-of-sample MSEP. Lastly, our tests are obtained for weakly dependent heterogeneous data. This permits the application of our tests to the selection of nonlinear dynamic models in time series situations.

At the outset, it is useful to stress that a distinctive feature of our approach is that both competing models may be misspecified. In particular, our approach does not require that either competing model be correctly specified under the null hypothesis under test. Such a feature reflects the observation that, in empirical work, one can seldom specify a statistical model that accurately describes the data, especially in the social sciences. See e.g. Nakamura et al. (1990) for a similar point of view. As we shall see, however, this does not prevent model comparisons.

Not requiring correct specification of the competing models also contrasts with the tests of Cox (1961), Cox (1962), which led to the development of a vast econometric literature on testing nonnested hypotheses (see Pesaran (1974) and the surveys by MacKinnon (1983) and McAleer (1987)).4 Indeed, under the null hypothesis under test in Cox's approach, one of the competing models is correctly specified. Moreover, Cox's tests are more difficult to compute than ours because they require a consistent estimate of the asymptotic mean of the test statistic under the null hypothesis. This also raises some theoretical difficulties when the competing models are incompletely specified. See e.g. Ghysels and Hall (1990) and Smith (1992).

The paper is divided into six more sections. Section 2 describes in some detail the general model selection framework for nonlinear dynamic models. A series of hypotheses about the asymptotic fit of the models is put forth. A basic result on the asymptotic properties of our test statistics is established under general conditions on the model selection criteria and the estimators used. The next two sections seek more primitive conditions ensuring that the conditions of this basic result hold. As the preceding examples of model selection procedures suggest, we distinguish two cases. Section 3 considers the case where model selection criteria can be viewed as optimands of the Gallant and White (1988) type although the estimators employed are not those that maximize these criteria. In contrast, Section 4 specializes to the case where models are estimated by maximizing some criteria that are used subsequently for model selection. This section also covers the case where models are estimated by means of two-stage procedures in which the second stage involves optimizing goodness-of-fit conditional upon preliminary estimates of the nuisance parameters. Section 5 considers estimation of the asymptotic variance of the difference in goodness-of-fit despite possible misspecification of the competing models. This step is necessary for the construction of our test statistics. Section 6 discusses a critical condition for our test statistics to be asymptotically normal. Essentially, this condition requires that the estimated models be nonnested. Throughout, our theoretical results are illustrated with the comparison between two nonnested autoregressive models based on their in-sample and out-of-sample MSEP. Section 7 concludes. An appendix collects the proofs of our results.

Tests statistics and general results

Two econometric models, inline image and inline image, are estimated using data generated by an unknown stochastic process. The DGP satisfies

Assumption 1.

{Xt}t=−∞+∞ is a p-dimensional stochastic process on a complete probability space inline image. To simplify the notation, we use ω to indicate the whole sequence {Xt}t=−∞+∞. Hereafter, there will be various functions indexed by n where ω appears as an argument. In most cases, these functions depend on ω through the vector (X1,…, Xn) corresponding to the period for which the competing models are compared. This need not always be the case, for instance when estimation of the competing models uses a different sample (see below).

For j = 1, 2, let γj denote the features of interest for model inline image and let Γj denote the set of possible values for γj. As usual, we require

Assumption 2.

For j = 1, 2, Γj is a compact subset of R^{kj}. Thus γj and Γj can be viewed as the parameter vector of interest and the parameter space associated with model inline image, respectively.

Let inline image denote an estimator of γj. At this point, the method of estimation plays no role. The estimator inline image can be obtained from the same sample (X1,…, Xn) used for model comparison, i.e. the competing models are compared within sample. Alternatively, one may use a prior sample for estimation and the sample (X1,…, Xn) to assess the performance of the estimated competing models as in West (1994). One can also sequentially update inline image with increasing estimation samples and compare (say) the one-period ahead predictability over the sample (X1,…, Xn) of the sequentially updated models as in West (1996). We require, however, an assumption about the asymptotic behavior of the estimators, which is satisfied in general.

Assumption 3.

For j = 1, 2, inline image is a sequence of random vectors on inline image such that inline image for each n and there exists a nonstochastic sequence inline image uniformly interior to Γj for which inline image as n → ∞. Note that inline image need not have a limit as n → ∞. In the literature, the inline image are referred to as pseudo-true values. See Domowitz and White (1982), Bates and White (1985) and Gallant and White (1988) among others. The i.i.d. case considered in White (1982) and Vuong (1989) is considerably simpler because the pseudo-true values do not vary with the sample size. For dynamic model selection problems, it can be useful to allow for parameter drift. This can occur because the data generating process is nonstationary. But it may also occur with stationary data because of the sequence of specified models or estimating methods used.

Next, most if not all known model selection procedures are defined by model selection criteria (see e.g. Linhart and Zucchini (1986)). Typically, such criteria are goodness-of-fit criteria. Hereafter, we view them as lack-of-fit criteria, i.e. the opposite of goodness-of-fit criteria. As seen in the introduction, however, selection criteria may emphasize other aspects such as the precision of parameter estimates of interest (see e.g. Torro-Vizcarrondo and Wallace (1968)). To be general, we consider selection criteria of the form Qnj (ω, γj) for each competing model. Note that the criterion typically depends on the features of interest γj for model inline image. It may include some Akaike (1973) type correction factor for parsimony reasons as in Sin and White (1996). It is generally random because of its dependence on ω. This is the case when selection criteria use sample information, i.e. are statistics.

A first assumption on selection criteria covered by our theory is

Assumption 4.

For j = 1, 2, let {Qnj : Ω × Γj → R}n=1∞ be a sequence of functions such that inline image is measurable inline image for every γj ∈ Γj. There exists an equicontinuous sequence of nonstochastic functions inline image such that

  • image

as n → ∞. Typically, inline image is the expectation of Qnj (ω, γj) with respect to ω. Thus Assumption 4 generally follows from a uniform strong law of large numbers (see e.g. Jennrich (1969), Andrews (1987)). As a matter of fact, many model selection criteria satisfy Assumption 4 under suitable regularity conditions (see Section 3). The best known is minus the model log-likelihood −(1/n) log fnj (X1,…, Xn, γj) where fnj is the joint density of the first n observations associated with the parameter value γj of model inline image. Another common criterion is the MSEP based on n predictions inline image where Xt = (Y ′t, W ′t)′ and {ftj} is a sequence of predictor functions known up to γj as in West (1994), West (1996).

Given lack-of-fit criteria Qn1 (ω, γ1) and Qn2 (ω, γ2), and estimates inline image and inline image, which may come from the comparison sample (X1,…, Xn) or from another sample such as a prior one, a frequent procedure is to select the model with the smallest lack-of-fit measure inline image. Such a model selection procedure can be given an asymptotic interpretation. Given Assumptions 1–4, it follows from Domowitz and White (1982, Theorem 2.3) that inline image, for j = 1, 2. Hence

  • image(1)

where ΔQn (ω, γ1, γ2) = Qn1 (ω, γ1) − Qn2 (ω, γ2) and inline image. The quantity inline image can be interpreted as the difference between the asymptotic lack-of-fit of the competing models.

In view of the preceding remark, we shall consider various hypotheses comparing the (asymptotic) lack-of-fit of the competing models. Following Vuong (1989), the null hypothesis H0 is that inline image and inline image are asymptotically equivalent when

  • image

The first alternative hypothesis is that inline image is asymptotically better than inline image when

  • image

Similarly, the second alternative hypothesis is that inline image is asymptotically better than inline image:

  • image

Note that the limit of inline image as n → ∞ may not exist under the alternative hypotheses H1 and H2. Consequently, in the case of dynamic models with time series data, there is a third alternative, which is that inline image and inline image are asymptotically incomparable:

  • image

with at least one inequality being strict. In this case, there is some asymptotically nonnegligible region of the data for which inline image fits better than inline image or vice versa without the models being asymptotically equivalent.5 Such a situation contrasts with the i.i.d. case considered by Vuong (1989) where the hypotheses H0, H1 and H2 are exhaustive. This is because inline image does not depend on n so that its limit necessarily exists and H3 is void. More generally, H3 is void under strict stationarity of the DGP as inline image and inline image will typically not depend on n.

As argued in the introduction, a simple numerical comparison of the sample values of the respective lack-of-fit criteria is not entirely satisfactory for it does not take into account sample variability. Significance of the difference in lack-of-fit needs to be assessed. To do so, we propose some tests with good asymptotic properties. We require additional regularity conditions. The first condition bears on lack-of-fit criteria and is similar to Assumption 4.

Assumption 5.

For j = 1, 2, inline image and inline image are continuously differentiable, and

  • image

as n → ∞. Moreover, inline image is equicontinuous and inline image is bounded.

The next one is a joint assumption on estimation methods and lack-of-fit criteria. Typically, it follows from a multivariate Central Limit Theorem.

Assumption 6.

For some integer s > 0, there exists a bounded sequence of nonstochastic s × (k + 2) matrices {Cn} and a sequence of s × 1 random vectors {Zn} on inline image with inline image such that

  • image(2)

where k = k1+ k2 and

  • image(3)

In particular, Assumption 6 requires that estimators be √n-asymptotically normal. This is satisfied by a large class of common econometric estimators such as extremum estimators (see e.g. Amemiya (1985), Gallant and White (1988) and Section 4 below) as well as some semiparametric estimators (see Newey and McFadden (1994, Section 8)). Assumption 6 also requires that lack-of-fit criteria evaluated at pseudo-true values be √n-asymptotically normal. The latter condition will be verified in Section 3 for a large class of model selection criteria.

Let σn2= L′n C′n Cn Ln where

  • image(4)

Assumption 7.

lim infn σn2 > 0. It turns out that σn2 is the asymptotic variance of the difference in lack-of-fit. The condition lim infn σn2 > 0 is crucial and is discussed in Section 6.

Lastly, we need a consistent estimator of this variance.

Assumption 8.

There exists a sequence of random variables inline image on inline image such that inline image as n → ∞. In Section 5, we shall propose some consistent estimators of σn2.

We are now in a position to define our test statistic as a suitably normalized difference of the sample lack-of-fit criteria:

  • image(5)

Our model selection test involves comparing values of Tn with critical values of a standard normal distribution. Let α denote the desired (asymptotic) size of the test and zα/2 the value of the inverse standard normal distribution function evaluated at 1 − α/2. If Tn < −zα/2, we reject H0 in favor of H1; if Tn > zα/2, we reject H0 in favor of H2; otherwise, we accept H0.

An asymptotic justification of the proposed test is given in the next theorem. Define the hypotheses

  • image

Under H0*, the competing models are √n-asymptotically equivalent. This constitutes a strengthening of H0. Note also that H1* (say) contains the important case where inline image has a finite and strictly negative limit as n → ∞. We have

Theorem 1.

Given Assumptions 1–8, then σn2 is bounded and

  • (i)
    under H0*, Tn ⇒ N(0, 1),
  • (ii)
    under H1*, inline image,
  • (iii)
    under H2*, inline image.

Since H0*⊂ H0, Theorem 1 shows that our test has correct asymptotic size on a subset of the null hypothesis of asymptotic equivalence H0. The set H0*, however, contains important situations such as inline image for all n sufficiently large. Theorem 1 also shows that the test is consistent against H1* and H2*. Since H1 implies H1* and H2 implies H2*, the test is consistent against a larger set of alternatives than H1 and H2. Moreover, if inline image and inline image do not depend on n, as is typically the case under strict stationarity of the DGP, then Hj*= Hj for j = 0, 1, 2, and our proposed test has the desired asymptotic size under the null hypothesis of interest H0 and is consistent against the alternatives H1 and H2.

Model selection tests without lack-of-fit minimization

Theorem 1 is derived under general assumptions. In this section and the next we seek more primitive assumptions on estimators and lack-of-fit criteria that will imply Assumptions 3–6. We postpone the discussion of Assumptions 7 and 8 to Sections 6 and 5, respectively. Here we focus on the assumptions on the model selection criteria, namely Assumptions 4–6.

The framework of the preceding section is too general for our purpose. First, we need to be more precise on how the data are generated, i.e. on the DGP. We complement Assumption 1 by

Assumption 9.

For some r > 2, {Xt} is a φ- or α-mixing sequence such that φm is of size −r/(r − 1) or αm is of size −2r/(r − 2), respectively. Definitions of φ-mixing, α-mixing and size can be found in Gallant and White (1988, Chapter 3) among others. Assumption 9 allows quite general time dependence and heterogeneity of the data generating process, though extensions of our results to even more general processes such as mixingales can be entertained as in Gallant and White (1988).

Second, since we focus here on the assumptions concerning the model selection criteria, we retain the assumptions on the estimators, namely Assumption 3 and the appropriate part of Assumption 6. Specifically, the part of Assumption 6 concerning the estimators inline image is replaced by

Assumption 10.

For j = 1, 2, there exist kj× 1 random vectors {Yntj ; t = 1,…, n, n = 1, 2,…} on inline image with mean zero, bounded r th absolute moments, and near-epoch dependent upon {Xt} of size −1 such that

  • image(6)

where {Ajn+ ; n = 1, 2,…} are bounded nonstochastic symmetric kj × kj matrices. A definition and properties of near-epoch dependence can be found in Gallant and White (1988, Definition 3.13 and Chapter 4). Many econometric estimators defined by optimizing a stochastic function satisfy Assumption 10 (see Gallant and White (1988)). Some √n-asymptotically normal semiparametric estimators also possess such an asymptotically linear representation as shown in Newey and McFadden (1994, Section 8).

Now, we turn to model selection criteria. Many common criteria actually take the form of an optimand of the type considered by Gallant and White (1988). For instance, this is the case for the log-likelihood and the MSEP seen earlier as well as for other criteria cited in the introduction and discussed below. Hence, it is natural to restrict the class of criteria by imposing the following primitive assumption.

Assumption 11.

For j = 1, 2,

  • image(7)

where inline image and

  • (i)
    mtj : Ω × Γj → R^{qj} is measurable inline image for each γj and continuously differentiable on Γj. Also, dj : R^{qj} × Γj → R is continuously differentiable on R^{qj} × Γj,
  • (ii)
    {mtj (ω, γj)} and {∂mtj (ω, γj)/∂γj′} are almost surely Lipschitz-L1 on Γj,
  • (iii)
    {mtj (ω, γj)} and {∂mtj (ω, γj)/∂γj′} are r-dominated on Γj uniformly in t,
  • (iv)
    {mtj (ω, γj)} and {∂mtj (ω, γj)/∂γj′} are near-epoch dependent upon {Xt} of size −1 and −1/2, respectively, where the first dependence is uniform on Γj.

The form (7) of the model selection criteria and the requirements (i)–(iv) are reminiscent of the optimands and the regularity conditions placed on them that are considered by Gallant and White (1988), though to simplify, we assume that inline image is not indexed by n. Definitions of a random function that is almost surely Lipschitz-L1 and r-dominated can be found in Gallant and White (1988, Definitions 3.5 and 3.16).

As in the previous section, however, estimation plays no role. Specifically, we have in mind situations where a researcher has estimated two competing models via some estimation methods satisfying the regularity Assumption 10 and wishes to compare the estimated models according to their criterion values inline image. In contrast to the next section, the estimation methods used in this section need not optimize the selection criteria. Such situations are actually frequent. A time-series example using the out-of-sample MSEP as a model selection criterion is fully worked out after Theorem 2. Many other examples can be given. For instance, an econometric example is provided by the comparison of two competing nonlinear simultaneous equations models based on their out-of-sample MSEP, where each model is defined by a set of orthogonality conditions and, as in Andrews and Fair (1988) and Gallant and White (1988), estimated by GMM. Other examples also arise when estimating competing models by ML or other methods on ungrouped data and evaluating their lack-of-fit by some Pearson type chi-square statistics on grouped data (see e.g. Heckman (1981)). Such situations are considered in Vuong and Wang (1993b).

Let q = q1+ q2 and k = k1+ k2. Define the (q + k) × (q + k) matrix

  • image(8)

Let Rn= (Rn1′, −Rn2′)′ where Rnj is the (qj+ kj)-dimensional vector

  • image(9)

and inline image and inline image are the partial derivatives of dj (mj, γj) with respect to mj and γj evaluated at inline image.

The next result specializes Theorem 1 to model selection criteria of the form (7). It replaces the general Assumptions 4–6 by the more primitive Assumptions 9–11, and gives an expression for the asymptotic variance σn2.

Theorem 2.

Given Assumptions 1–3 and 7–11, suppose that Vn is uniformly of rank s for some s > 0. The conclusions of Theorem 1 then hold with σn2= R′n Vn Rn where Vn= O(1) and Rn= O(1).

From (8), the asymptotic variance σn2 depends generally on how inline image is estimated through the asymptotic variance of inline image. However, when dj (mj, γj) does not depend on γj and inline image, we have inline image (see also Section 4). The asymptotic variance σn2 then depends only on the asymptotic variance of the (m1, m2) components in (8). That is, inline image can be treated as if it is known since sampling uncertainty due to estimation of inline image becomes asymptotically irrelevant for testing the null hypothesis H0* of √n-asymptotic equivalence of the competing models. In particular, under weak stationarity of the DGP, the condition inline image appears in West (1994), West (1996) and is shown to be satisfied by the out-of-sample MSEP in (non)linear regressions estimated by (non)linear least squares.
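A small numerical sketch may help fix ideas about the quadratic form σn2 = R′n Vn Rn in Theorem 2. The matrix and vector below are made-up toy values with q1 = q2 = k1 = k2 = 1, and the ordering of components as (m1, γ1, m2, γ2) is an assumption for illustration, since the displayed definitions (8) and (9) are not reproduced here. The point illustrated is only the arithmetic one: if the entries of Rn attached to the parameter estimates are zero, the quadratic form reduces to the one built from the (m1, m2) block alone.

```python
def quadratic_form(r, v):
    """Return r' V r for a vector r and a square matrix V given as nested lists."""
    n = len(r)
    if len(v) != n or any(len(row) != n for row in v):
        raise ValueError("V must be square and conformable with r")
    return sum(r[a] * v[a][b] * r[b] for a in range(n) for b in range(n))


# Toy covariance matrix, assumed ordering (m1, gamma1, m2, gamma2).
V_TOY = [
    [1.0, 0.2, 0.3, 0.0],
    [0.2, 2.0, 0.1, 0.4],
    [0.3, 0.1, 1.5, 0.2],
    [0.0, 0.4, 0.2, 3.0],
]

# R = (R1', -R2')' with zero entries in the gamma positions (hypothetical values).
R_TOY = [1.0, 0.0, -1.0, 0.0]

# Same computation restricted to the (m1, m2) block of V.
V_M_BLOCK = [[V_TOY[0][0], V_TOY[0][2]], [V_TOY[2][0], V_TOY[2][2]]]
R_M_BLOCK = [R_TOY[0], R_TOY[2]]

sigma2_full = quadratic_form(R_TOY, V_TOY)
sigma2_m_only = quadratic_form(R_M_BLOCK, V_M_BLOCK)
```

With these toy numbers both quantities coincide, so the variance and covariance terms involving the estimators drop out of the computation.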

The following example illustrates the preceding result as well as the verification of its assumptions.6

Example

Consider the problem of choosing between the following autoregressive (AR) models for the univariate process {Yt}−∞+∞:

  • image

where it is specified that γj∈Γj= [−1, 1] and that εjt are uncorrelated with mean zero and variance τj2∈ϒj⊂ [0, +∞), for j = 1, 2.

A model selection procedure frequently used in macroeconomic modelling is to compare the out-of-sample MSEP of the estimated competing models. Specifically, suppose one has a sample of n = 2n* observations (Y1,…, Yn), and the first half is used for estimation while the second half is reserved for model comparison.7 For j = 1, 2 the out-of-sample MSEP for the AR (j) model is inline image where inline image and inline image is some estimator of γj based on the first half of the sample. Here, we take inline image to be the Yule–Walker estimator of the j th-order autocorrelation coefficient ρjo, namely inline image (see e.g. Fuller (1976, p. 327)). In particular, inline image belongs to Γj though it does not minimize the out-of-sample MSEP criterion Qnj (ω, γj).
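The split-sample procedure of this example can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact construction: the helper names are hypothetical, the Yule–Walker estimator is computed as an uncentered sample autocorrelation on the first half (the models have mean zero), and the MSEP is a plain average of squared prediction errors over the second half, without the particular normalization used in the text. The simulated AR(1) series is there only to make the sketch runnable.

```python
import random


def yule_walker_autocorr(y, lag):
    """Lag-`lag` sample autocorrelation of a zero-mean series (no centering)."""
    num = sum(y[t] * y[t - lag] for t in range(lag, len(y)))
    den = sum(v * v for v in y)
    return num / den


def out_of_sample_msep(y, gamma, lag, start):
    """Mean squared error of predicting y[t] by gamma * y[t - lag] for t >= start."""
    errors = [(y[t] - gamma * y[t - lag]) ** 2 for t in range(start, len(y))]
    return sum(errors) / len(errors)


def compare_ar_models(y):
    """Estimate each model on the first half, evaluate it on the second half."""
    n_star = len(y) // 2
    first_half = y[:n_star]
    result = {}
    for j in (1, 2):  # model j predicts Y_t from Y_{t-j} only
        gamma_hat = yule_walker_autocorr(first_half, j)
        result[j] = (gamma_hat, out_of_sample_msep(y, gamma_hat, j, n_star))
    return result


def simulate_ar1(n, phi=0.6, seed=0):
    """Illustrative data only: Gaussian AR(1) started at zero."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


results = compare_ar_models(simulate_ar1(400))
```

By the Cauchy–Schwarz inequality the estimate always lies in [−1, 1], in line with the parameter space Γj, even though it is not chosen to minimize the out-of-sample MSEP.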

We now verify that Assumptions 1–3 and 9–11 of Theorem 2 are satisfied. Assumptions 7–8 will be discussed in Sections 6 and 5, respectively. Hereafter, we assume that {Yt} is generated by a finite ARMA with roots outside the unit circle and i.i.d. Gaussian innovations.8 In particular, {Yt} is strictly stationary (see e.g. Hayashi (2000, Propositions 6.1 and 6.5)) and α-mixing of arbitrary size from Ibragimov and Linnik (1971).9 For reasons seen below, define Xt = (Yt, Yt*)′ where Yt* = 0 if |t| is odd, and Yt* = Yt/2 if |t| is even or zero. Given the definition of {Xt} and Γj, Assumptions 1, 2 and 9 are thus trivially satisfied.

Turning to Assumption 11, which bears on the out-of-sample MSEP, let Mnj (ω, γj) = (M1nj (ω, γj), M2nj (ω, γj))′, where

  • image

Note that inline image from the definition of Yt*. Let dj (m1, m2) = 2(m1− m2). It is easy to see that the out-of-sample MSEP criterion Qnj (ω, γj) is of the form (7) with mtj (ω, γj) = (m1tj (ω, γj), m2tj (ω, γj))′= ((Yt−γj Yt−j)2, (Yt*−γj Yt−2j*)2)′. Moreover, inline image and inline image clearly satisfy Assumption 11 (i). Regarding Assumption 11(ii)–(iv), we verify them for {m1t (ω, γj)}. Similar arguments apply to {m2t (ω, γj)}.
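The role of the auxiliary series Yt* can be checked numerically: with dj (m1, m2) = 2(m1− m2), the second moment removes the first-half terms, so the criterion equals the average squared prediction error over the second half. The sketch below assumes that both sample moments average over t = 1,…, n and that j pre-sample values are available; these summation conventions and all function names are assumptions for the illustration, not taken from the displayed equations.

```python
import numpy as np

def value_at(y_ext, j, t):
    """Y_t for a 1-based time index t, where y_ext stores Y_{1-j}, ..., Y_n."""
    return y_ext[t + j - 1]

def y_star(y_ext, j, t):
    """Auxiliary series: Y*_t = Y_{t/2} if t is even (or zero), and 0 if t is odd."""
    return value_at(y_ext, j, t // 2) if t % 2 == 0 else 0.0

def msep_from_moments(y_ext, j, gamma, n):
    """2 * (M1 - M2), with both moments averaged over t = 1, ..., n."""
    m1 = sum((value_at(y_ext, j, t) - gamma * value_at(y_ext, j, t - j)) ** 2
             for t in range(1, n + 1)) / n
    m2 = sum((y_star(y_ext, j, t) - gamma * y_star(y_ext, j, t - 2 * j)) ** 2
             for t in range(1, n + 1)) / n
    return 2.0 * (m1 - m2)

def msep_direct(y_ext, j, gamma, n):
    """Average squared prediction error over t = n*+1, ..., n with n = 2n*."""
    n_star = n // 2
    return sum((value_at(y_ext, j, t) - gamma * value_at(y_ext, j, t - j)) ** 2
               for t in range(n_star + 1, n + 1)) / n_star

rng = np.random.default_rng(1)
n, gamma = 200, 0.3
checks = []
for j in (1, 2):
    y_ext = rng.standard_normal(n + j)      # hypothetical data, incl. pre-sample values
    checks.append((msep_from_moments(y_ext, j, gamma, n),
                   msep_direct(y_ext, j, gamma, n)))
print(checks)
```

The two numbers in each pair agree up to rounding, because for even t the term built from Yt* reproduces the prediction error at time t/2, and for odd t it is zero.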

Now, {m1tj (ω, γj)} is almost surely Lipschitz-L1 on Γj from Gallant and White (1988, pp. 21–22). We have ∂m1tj (ω, γj)/∂γj=−2(Yt−γj Yt−j)Yt−j. Hence, |∂m1tj (ω, γj)/∂γj−∂m1tj (ω, γoj)/∂γj | = 2Yt−j2 |γj−γoj |, showing that {∂m1tj (ω, γj)/∂γj} is almost surely Lipschitz-L1 on Γj since E(Yt−j2) is constant and finite. Hence Assumption 11 (ii) is satisfied. Next, using |γj | ≤ 1 together with the Minkowski and Cauchy–Schwarz inequalities, we have

  • image

whenever r > 1, where inline image is the Lr norm. This establishes Assumption 11 (iii) since E(Yt2r) is constant and finite (see Gallant and White (1988, p. 33)).10 Lastly, because they involve at most j lags of Xt, {m1t (ω, γj)} and {∂m1t (ω, γj)/∂γj} are near-epoch dependent on {Xt} of any size. Such near-epoch dependences are clearly uniform on Γj, establishing Assumption 11 (iv).

It remains to verify Assumptions 3 and 10, which bear on the estimators inline image, j = 1, 2. Following the argument in Gallant and White (1988, p. 49), inline image is a strongly consistent estimator of inline image, for j = 1, 2. Let inline image be n* /(n*− j) times the latter quantity. Because n* /(n*− j) → 1 as n → ∞, it follows that inline image. Moreover, because the process {Yt} is weakly stationary, it is easy to see that inline image, the j th-order autocorrelation coefficient of {Yt}, which is constant and in the interior of Γj. Hence Assumption 3 is satisfied. Moreover, we have

  • image

since (1/n*)inline image and inline image by a variety of Laws of Large Numbers and Central Limit Theorems (see e.g. Theorems 3.15 and 5.3 in Gallant and White (1988) for near-epoch dependent functions of mixing processes), and inline imageinline image. Hence, the estimator inline image satisfies the asymptotic linear representation (6) with Ajn+= 2/E(Yt2) and Yntj= (ρjo Yt2− Yt Yt−j)I(t ≤ n*), where inline image is the indicator of the event in parentheses. Thus, because {Ajn+} is constant and {Yntj} is near-epoch dependent on {Yt} of any size and hence near-epoch dependent on {Xt} of size −1, Assumption 10 is satisfied.

Provided Assumptions 7 and 8 are satisfied and Vn is of uniform rank (see Sections 6 and 5), Theorem 2 applies to the out-of-sample MSEP for comparing the above AR(1) and AR(2) models. That is, the quantity

  • image(10)

can be used as a model selection statistic for testing the null hypothesis H0* of inline image-asymptotic equivalence. In particular, from inline image (see the proof of Theorem 2) and weak stationarity of {Yt}, we have inline image, where inline image is the autocovariance function of {Yt}. Hence, inline image, which is independent of n. Therefore, the hypotheses H0, H1 and H2 are identical to H0*, H1* and H2*, respectively, which reduce to

  • image(11)

Moreover, because inline image and inline image does not depend on γj, the asymptotic variance σn2 does not depend on the sampling variability of inline image so that ρjo can be treated as known in the computation of σn2, as noted after Theorem 2. See also Section 5.
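The way the hypotheses reduce to a statement about autocorrelations can be traced through a short computation. The sketch below assumes that the limiting criterion of model j is the population mean squared prediction error evaluated at ρjo; this is inferred from the surrounding text (weak stationarity and the autocovariance function γY) rather than copied from the displayed equations.

```latex
\begin{aligned}
E\bigl(Y_t - \rho_j^{o} Y_{t-j}\bigr)^2
  &= \gamma_Y(0) - 2\rho_j^{o}\,\gamma_Y(j) + (\rho_j^{o})^2\,\gamma_Y(0) \\
  &= \gamma_Y(0)\,\bigl\{1 - (\rho_j^{o})^2\bigr\},
  \qquad\text{since } \gamma_Y(j) = \rho_j^{o}\,\gamma_Y(0).
\end{aligned}
```

Under this reading, the two models have the same limiting MSEP exactly when (ρ1o)2 = (ρ2o)2, and otherwise the model with the larger squared autocorrelation has the smaller limiting MSEP. Neither quantity depends on n, which is why the starred and unstarred hypotheses coincide.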

Model selection tests with lack-of-fit minimization

Up to now, the methods used to estimate the competing models need not optimize the criteria for model selection. It frequently happens, however, that estimators minimize the chosen lack-of-fit measures. The most common situation arises when the competing models are fully parametric and estimated by ML methods. A frequent criterion then is the model log-likelihood possibly adjusted (see e.g. Akaike (1973), Akaike (1974)). When the observations are i.i.d. this situation is analyzed in Vuong (1989) for general parametric models and in Lien and Vuong (1987) for normal linear regressions. Other examples arise when estimating fully parametric models by minimum chi-square methods and using Pearson type statistics as a criterion for model selection (see Vuong and Wang (1991), Vuong and Wang (1993a)).

When the competing models are not fully parametrized, an important econometric example is given by nonlinear simultaneous equation models, where each competing model is defined by a set of implicit simultaneous equations and a set of orthogonality conditions. Each model is then estimated by nonlinear IV or GMM (see Amemiya (1985), Hansen (1982) and Gallant and White (1988) among others). Following Sargan (1958), Newey and West (1987a) and, more recently, Pesaran and Smith (1994), when both competing models are overidentified, the value of the GMM optimand evaluated at the GMM estimator can be the basis for hypothesis testing and more generally model selection in nested and nonnested situations.

The simultaneous equation example is interesting because the lack-of-fit criterion depends on some nuisance parameters associated with the weighting matrix used in GMM estimation. To include such situations in our analysis, we partition the parameter vector γj into the parameter vector of interest θj and the vector of nuisance parameters τj. Similarly to Assumption 11, model selection criteria are now assumed to satisfy

Assumption 12.

For j = 1, 2, Γj=Θj×ϒj where Θj and ϒj are compact subsets of Rmath image and Rmath image. Moreover,

  • image(12)

where γj= (θj′, τj′)′, inline image and

  • (i)
    mtj : Ω×Θj → Rmath image is measurable inline image for each θj and twice continuously differentiable on Θj. Also, dj : Rmath image×Γj → R is twice continuously differentiable on Rmath image×Γj,
  • (ii)
    {mtj (ω, θj)}, {∂mtj (ω, θj)/∂θj′} and {∂2 mtj (ω, θj)/∂θj∂θj′} are almost surely Lipschitz-L1 on Θj,
  • (iii)
    {mtj (ω, θj)}, {∂mtj (ω, θj)/∂θj′} and {∂2 mtj (ω, θj)/∂θj∂θj′} are r-dominated on Θj uniformly in t,
  • (iv)
    {mtj (ω, θj)}, {∂mtj (ω, θj)/∂θj′} and {∂2 mtj (ω, θj)/∂θj∂θj′} are near-epoch dependent upon {Xt} of size −1, −1 and −1/2, respectively, where the first two dependencies are uniform on Θj.

Conditions (i)–(iv) are similar to those used by Gallant and White (1988) and strengthen conditions (i)–(iv) of Assumption 11. These authors consider the case where optimands are of the form Qnj (ω, θj) = dj {Mnj (ω, θj)}. Bates and White (1985) consider the case where Qnj (ω, γj) = dj {Mnj (ω, θj), τj}. These are special cases of (12). On the other hand, Andrews and Fair (1988) consider the case where Qnj (ω, γj) = dj {Mnj (ω, θj, τj), τj}. The present formulation was preferred because it includes minimum Pearson chi-square estimation (see Vuong and Wang (1991), Vuong and Wang (1993a)).11

In this section, estimators inline image of the parameters of interest are obtained by minimizing model selection criteria conditional upon some preliminary estimates inline image. That is,

  • image(13)

for j = 1, 2. The lack-of-fit associated with model inline image is then inline image. Conditions on the asymptotic behavior of the nuisance parameter estimators are required. We assume

Assumption 13.

For j = 1, 2, let inline image be such that there exists a nonstochastic sequence inline image uniformly interior to ϒj for which inline image as n → ∞. Moreover, there exist hj× 1 random vectors {Y2ntj ; t = 1,…, n, n = 1, 2,…} on inline image with mean zero, bounded r th absolute moments, and near-epoch dependent upon {Xt} of size −1 such that

  • image(14)

where {A2jn+ ; n = 1, 2,…} are bounded nonstochastic symmetric hj× hj matrices. Assumption 13 is standard and is satisfied by many optimization estimators (see Gallant and White (1988)). Unlike Bates and White (1985) and Andrews and Fair (1988), however, we do not impose conditions on cross partial derivatives with respect to θj and τj of the lack-of-fit criterion or optimand (12) so that estimation of the nuisance parameters may affect the asymptotic distribution of the estimators inline image. This extension is useful for minimum chi-square estimation and model selection tests based on Pearson type chi-square statistics (see Vuong and Wang (1991), Vuong and Wang (1993a)).

We need an identification condition similar to those used in the literature.

Assumption 14.

Let inline image and let inline image∂θj′.

  • (i)
    The sequence inline image has identifiably unique minimizers inline image uniformly interior to Θj.
  • (ii)
    The sequence of matrices A1jn is uniformly positive definite.

For a definition of identifiable uniqueness, see Domowitz and White (1982) or Gallant and White (1988).

Let h = h1+ h2. Define the (q + h) × (q + h) matrix

  • image(15)

Let Rn= (Rn1′, −Rn2′)′ where Rnj is the (qj+ hj)-dimensional vector

  • image(16)

and inline image and inline image are the partial derivatives of dj (mj, θj, τj) with respect to mj and τj evaluated at inline image.

The next theorem gives the basic result for model selection when the estimators used minimize (possibly in two steps) the lack-of-fit criteria. Relative to Theorem 1, it replaces the general Assumptions 2–6 by the more primitive Assumptions 9 and 12–14. The theorem also gives the corresponding expression for the asymptotic variance σn2.

Theorem 3.

Given Assumptions 1, 7–9 and 12–14, suppose that Vn is uniformly of rank s for some s > 0. The conclusions of Theorem 1 then hold with σn2= R′n Vn Rn where Rn and Vn are now defined by (16) and (15).

Note that the asymptotic variance σn2 does not depend on the first partial derivative of dj (mj, θj, τj) with respect to θj. More importantly, it is easy to see that σn2 is the same as if inline image, j = 1, 2 were known. That is, when estimators of θj minimize the chosen model selection criteria, sampling uncertainty due to estimation of inline image becomes asymptotically irrelevant for testing the null hypothesis H0* of inline image-asymptotic equivalence of the competing models. This is the case when the competing models are estimated by ML and compared on the basis of their likelihood values possibly adjusted as in Sin and White (1996).12 This is also the case for competing (non)linear regressions estimated by (non)linear least squares and compared via their in-sample MSEP, and hence their out-of-sample MSEP under covariance stationarity of the DGP, as in West (1994), West (1996). See the example below.

In contrast, when inline image, then σn2 and the asymptotic distribution of the difference in lack-of-fit inline image depend on how the nuisance parameters τj are estimated. This is so whether or not inline image, i.e. whether or not estimation of the nuisance parameters τj affects the asymptotic distributions of the estimators inline image of the parameters of interest. Such a result is surprising, but in fact agrees with Theorem 2 for the special case where inline image does not depend on θj and hj= kj so that τj=γj.

Example

[Continued] Consider again the problem of choosing between the AR(1) and AR(2) models of Section 3. Instead of the out-of-sample MSEP, we use here the in-sample MSEP inline image, where inline image minimizes inline image over Γj for j = 1, 2. Note that inline image is the least-squares estimator of γj constrained to the compact Γj, and is in general not equal to the j th autocorrelation estimator inline image. As before, to verify easily the assumptions of Theorem 3, we assume that {Yt} is generated by a finite ARMA with roots outside the unit circle and i.i.d. Gaussian innovations. Thus Assumptions 1 and 9 are satisfied with Xt= Yt.
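The constrained least-squares estimator and the in-sample criterion can be illustrated as follows. This is a sketch under stated assumptions: the function names and the simulated AR(1) series are hypothetical, and the criterion is taken to be the sum of squared residuals for t ≥ j + 1 divided by n. Because the objective is a convex quadratic in a scalar parameter, the minimizer over [−1, 1] is obtained by clipping the unconstrained least-squares solution.

```python
import numpy as np

def constrained_ls(y, j, lower=-1.0, upper=1.0):
    """Minimise sum_{t>j} (y_t - g*y_{t-j})^2 over g in [lower, upper]."""
    y = np.asarray(y, dtype=float)
    g_unconstrained = np.dot(y[j:], y[:-j]) / np.dot(y[:-j], y[:-j])
    # A convex quadratic in one variable is minimised on an interval by clipping.
    return float(np.clip(g_unconstrained, lower, upper))

def in_sample_msep(y, j, g):
    """(1/n) * sum_{t=j+1}^{n} (y_t - g*y_{t-j})^2; the first j terms are zero."""
    y = np.asarray(y, dtype=float)
    resid = y[j:] - g * y[:-j]
    return float(np.sum(resid ** 2) / len(y))

# Hypothetical AR(1) data with coefficient 0.6.
rng = np.random.default_rng(2)
e = rng.standard_normal(500)
y = np.empty(500)
y[0] = e[0]
for t in range(1, 500):
    y[t] = 0.6 * y[t - 1] + e[t]

g1 = constrained_ls(y, 1)
q1 = in_sample_msep(y, 1, g1)
print(g1, q1)
```

The denominator of the least-squares solution omits the last j observations, whereas the autocorrelation estimator divides by the full sum of squares, which is one way to see why the two estimators generally differ in finite samples.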

Let inline image. Hence, inline image also minimizes Qnj (ω, γj) over Γj for j = 1, 2. Because there are no nuisance parameters, Assumption 13 holds trivially. Moreover, Qnj (ω, γj) is of the form (12) with γj=θj, dj (m) = m, and mtj (ω, θj) = (Yt−γj Yt−j)2 if t ≥ j + 1 and equal to zero if t ≤ j. Thus, as in Section 3, inline image, {mtj (ω, θj)} and {∂mtj (ω, θj)/∂θj} satisfy Assumptions 12 (i)–(iv). Since ∂2 mtj (ω, θj)/∂θj∂θj′= 2Yt−j2, it follows that Assumption 12 is satisfied. Next, consider Assumption 14. We have inline image using the weak stationarity of {Yt}. Thus inline image is minimized uniquely at inline image, which is independent of n and belongs to (−1, +1). Also, A1jn is uniformly positive whenever n > j as A1jn= 2{(n − j)/n}γY (0). Hence, Assumption 14 is satisfied.

Provided Assumptions 7 and 8 are satisfied and Vn is of uniform rank (see Sections 5 and 6), Theorem 3 applies to the criterion inline image for comparing the AR(1) and AR(2) models. Because inline image, Theorem 3 also applies to the in-sample MSEP, i.e. the quantity

  • image(17)

can be used as a model selection statistic for testing the null hypothesis H0* of inline image-asymptotic equivalence. Moreover, because inline image is the same as for the out-of-sample MSEP evaluated at the autocorrelation estimator (see Section 3), the same remarks apply.13 Namely, the hypotheses H0, H1 and H2 are identical to H0*, H1* and H2*, respectively, and reduce to (11). Moreover, as noted after Theorem 3, because inline image minimizes Qnj (ω, γj), the asymptotic variance σn2 does not depend on the sampling variability of the (constrained) least-squares estimators inline image, and hence can be computed as if ρjo is known. See also Section 5.

Consistent variance estimation

For our proposed tests to be operational, it is necessary to have a consistent estimator of the asymptotic variance σn2 (see Assumption 8). The next results are derived for situations where estimators do not necessarily optimize the selection criteria (see Section 3). Situations where estimators do optimize the selection criteria (see Section 4) are studied similarly. From Theorem 2 we know that σn2= R′n Vn Rn where Vn and Rn are given by (8) and (9), respectively. Thus it suffices to construct some consistent estimators of Vn and Rn. A consistent estimator of the (q + k)-dimensional vector Rn is obtained as usual by its sample analog evaluated at inline image (see (19) below).

The difficulty is to obtain a consistent estimator of the (q + k) × (q + k) variance covariance matrix Vn. When observations are i.i.d. and the competing models are nondynamic, constructing a consistent estimator of the asymptotic variance σn2 is straightforward (see Vuong (1989), Vuong and Wang (1991), Vuong and Wang (1993a), Vuong and Wang (1993b)). This is because inline image and Yntj do not depend on n and because the (q + k)-dimensional vectors appearing in (8) are i.i.d. Thus the matrix Vn is just the population variance covariance matrix of this (q + k)-dimensional vector. Hence a simple consistent estimator of Vn is its sample analog evaluated at the estimates inline image.

When observations are dependent and heterogeneous, consistent estimation of asymptotic variances is more complex but has been solved under general conditions (see Newey and West (1987b), Gallant and White (1988, Chapter 6), and Andrews (1991) among others). In particular, an important condition is that the estimated model be correctly specified or that the DGP be stationary. However, even under our null hypothesis H0*, both competing models can be misspecified, while stationarity has not been assumed. The contribution of this section is to show that consistent estimation of the asymptotic variance of our test statistic is still possible in some important situations for weakly dependent and heterogeneous DGPs.

First, we strengthen Assumption 10 on the estimators inline image.

Assumption 15.

For j = 1, 2, there exists a sequence of kj-dimensional functions {δtj : Ω×Γj → Rmath image}t=1+∞ satisfying

  • (i)
    inline image is measurable inline image for each γj and continuously differentiable on Γj,
  • (ii)
    {δtj (ω, γj)} and {∂δtj (ω, γj)/∂γj′} are almost surely Lipschitz-L1 on Γj,
  • (iii)
    {δtj (ω, γj)} and {∂δtj (ω, γj)/∂γj′} are 2r-dominated on Γj uniformly in t,
  • (iv)
    {δtj (ω, γj)} is near-epoch dependent upon {Xt} of size −2(r − 1)/(r − 2) uniformly on Γj,

such that

  • image(18)

where {Ajn+ ; n = 1, 2,…} are bounded nonstochastic symmetric kj× kj matrices. Moreover, there exists a sequence of random matrices inline image such that inline image. A comparison of (6) and (18) gives inline image. Note that we allow tj, inline image. Extremum estimators that are inline image-asymptotically normal typically satisfy Assumption 15 (see Gallant and White (1988)).

Second, we strengthen Assumption 11 on model selection criteria.

Assumption 16.

Assumption 11 holds with (iii) and (iv) strengthened to

  • (iii)
    {mtj (ω, γj)} and {∂mtj (ω, γj)/∂γj′} are 2r-dominated on Γj uniformly in t,
  • (iv)
    {mtj (ω, γj)} and {∂mtj (ω, γj)/∂γj′} are near-epoch dependent upon {Xt} of size −2(r − 1)/(r − 2) and −1/2, respectively, where the first dependence is uniform on Γj.

Next, we follow Newey and West (1987b) and Gallant and White (1988) and we introduce a truncation lag and some weights.

Assumption 17.

{mn} is a sequence of integers such that mn → +∞ as n → +∞ and mn= o(n1/4).

Assumption 18.

Given a sequence {mn}, define inline image where {ant ; t = 1,…, mn+ 1, n = 1, 2,…} is a triangular array such that |wnτ | ≤Δ for some Δ < ∞ and all n = 1, 2,… and τ= 0, 1,…, mn. Moreover, for each τ, wnτ → 1 as n → ∞. Assumptions 17 and 18 are identical to assumptions TL and WT of Gallant and White (1988). See also Andrews (1991) for weaker assumptions.

We are now in a position to define the class of variance estimators that are considered. Using (9), a consistent estimator of Rn is, as usual, inline image where inline image is the (qj+ kj)-dimensional vector

  • image(19)

and inline image and inline image are the partial derivatives of dj (mj, γj) with respect to mj and γj evaluated at inline image. Define the (q + k) × (q + k) matrix

  • image(20)

where inline image is the (q + k)-dimensional vector

  • image(21)

We then define inline image.

The next theorem is the main result of this section. Define the (q + k) × (q + k) matrix

  • image(22)

where μnt is the (q + k)-dimensional vector

  • image(23)

Lastly, let inline image. Note that inline image in general. Indeed, inline image, which composes the first qj elements of inline image, is typically nonzero. In contrast, the last kj elements of inline image are typically equal to zero as inline image satisfies inline image in general.

Theorem 4.

Given Assumptions 1–3, 9 and 15–18, suppose that, under H0*, (i) inline image and (ii) there exists a sequence {dn} such that inline image for every t = 1,…, n with dn= O(n−1/8). Then inline image under H0*.14

As noted by Gallant and White (1988, Chapter 6), in the presence of heterogeneous observations and misspecifications, inline image overestimates in general Vn since inline image, where Λn is positive semidefinite (see step 2 in the proof). Moreover, as noted by these authors, Λn is not guaranteed to be bounded. The contribution of Theorem 4 is thus to provide some sufficient conditions that ensure the consistency of inline image for σn2 so that asymptotically valid inferences based on Tn can be performed. Because Rn= O(1), condition (i) requires that a linear combination of inline image vanishes, while condition (ii) controls the fluctuations of the individual means μnt around the overall mean inline image.15

Condition (i) is satisfied in important situations such as selecting models estimated by ML based on their possibly adjusted likelihood values, or selecting models by GMM based on their GMM criteria. Specifically in ML estimation, we have dj (mj, γj) = mj so that inline image. Moreover, inline image and inline image satisfies inline image= 0. Therefore Rnj is zero except for its first component which is equal to one. Hence inline imageinline image. It follows that condition (i) is satisfied under H0*. In GMM estimation we have dj(mj, γj) =mj′ Pj mj where Pj is a qj× qj matrix so that inline imageinline image. Moreover, inline image satisfies inline image. Thus inline image. Hence inline image so that condition (i) again holds under H0*. The latter case includes selecting (non)linear regressions estimated by (non)linear least squares based on their in-sample or out-of-sample MSEP under covariance stationarity as in West (1994), West (1996). See the example below. More generally, when the last kj elements of inline image are zero, as noted before Theorem 4, then condition (i) reduces to inline image, which must hold under H0*.

Regarding condition (ii), note that Assumptions 15 (iii) and 16 (iii) already imply inline image for every n, t so that inline image is bounded uniformly in n, t because Rn= O(1). Thus condition (ii) strengthens this requirement. For instance, in the above ML case, this condition requires that the deviation of inline imageinline image from its mean inline imageinline image, for t = 1,…, n not only remains bounded but decreases (at the rate n−1/8) to zero as n → ∞. Second, condition (ii) is clearly satisfied if Emtj (ω, γj) = Emsj (ω, γj) and Eδtj (ω, γj) = Eδsj (ω, γj), for every t, s = 1, 2,…, γj∈Γj, and for j = 1, 2. This holds when {mtj (ω, γj), t = 1, 2,…} and {δtj (ω, γj), t = 1, 2,…} are first-order stationary processes.

Example

[Continued] We consider the case where the out-of-sample MSEP is used as a model selection criterion. Because Assumptions 1–3 and 9 have been already verified in Section 3, it remains to verify Assumptions 15–18 and conditions (i)–(ii) in order to apply Theorem 4. Assumptions 17 and 18 are satisfied by letting mn grow at a rate slower than n1/4 and by choosing appropriate weights wnτ such as the Bartlett weights wnτ= 1 − {τ/(mn+ 1)} (see also footnote 13). Assumption 16 is verified by using an argument similar to that used for verifying Assumption 11 in Section 3.
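The truncation-lag and weight requirements can be made concrete with a small sketch. The rule mn = ⌊n^0.2⌋ used below is only one hypothetical choice that grows without bound while remaining o(n1/4); the function names are likewise illustrative.

```python
import math

def truncation_lag(n, exponent=0.2):
    """m_n = floor(n**exponent); any exponent in (0, 1/4) yields m_n -> infinity
    with m_n = o(n**(1/4))."""
    return int(math.floor(n ** exponent))

def bartlett_weights(m_n):
    """Bartlett weights 1 - tau/(m_n + 1) for tau = 0, 1, ..., m_n."""
    return [1.0 - tau / (m_n + 1) for tau in range(m_n + 1)]

for n in (1_000, 1_000_000):
    m_n = truncation_lag(n)
    w = bartlett_weights(m_n)
    # The weights are bounded by 1, and the weight at a fixed lag tau
    # moves towards 1 as n (and hence m_n) grows.
    print(n, m_n, round(w[1], 4))
```

The weights are bounded in absolute value by Δ = 1, and for each fixed τ the weight 1 − τ/(mn+ 1) tends to one as mn increases, which are the two properties required of the weights.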

To verify Assumption 15, we use the asymptotic linear representation of inline image obtained in Section 3 and the definition of Yt*. These give

  • image

Let δtj (ω, γj) =γj Yt*2− Yt* Yt−2j*= (γj Yt/22− Yt/2 Yt/2−j)I(|t| even). Note that inline image because inline image and {Yt} is stationary (see Section 3). Using an argument similar to that used for verifying Assumption 10 in Section 3, it is easy to see that Assumption 15 (i)–(iv) hold, where Xt= (Yt, Yt*)′. Moreover, let Ajn+= 2/E(Yt2) and inline image, where inline image is any consistent estimator of E(Yt2), such as inline image by any Law of Large Numbers for stationary ARMA or stationary and ergodic processes (see e.g. Hayashi (2000, p. 101)). It follows that Assumption 15 holds.

We now turn to conditions (i)–(ii). From Section 3 recall that mtj (ω, γj) = ((Yt−γj Yt−j)2, (Yt*−γj Yt−2j*)2)′. Combining this with the above definition of δtj (ω, γj), the definition of Yt*, and the weak stationarity of {Yt} gives μntj′=γY (0)(1 −ρjo2)(1, I(|t| even), 0). Moreover, from the definitions of M1nj (ω, γj) and M2nj (ω, γj) given in Section 3, it is easy to see that E∂Mknj (ω, ρjo)/∂γj= 0 for k = 1, 2. Hence, Rnj′= (2, −2, 0) since dj (m, γj) = 2(m1− m2). Therefore, R′nμnt= Rn1′μnt1− Rn2′μnt2= 2γY (0)(ρ2o2−ρ1o2)(1 − I(|t| even)) = 0 under H0*= H0 (see (11)). It follows that conditions (i) and (ii) are satisfied under H0*.

Theorem 4 thus applies and delivers a consistent estimator inline image of σn2. Specifically, simple algebra shows that inline image is given by (20), where inline image is replaced by the scalar inline image with

  • image

and Yt*= Yt/2 I(|t| even). As a matter of fact, we can propose a simpler consistent estimator of σn2 by exploiting the fact that Rnμnt= 0 under H0. Namely, from the alternative expression for σn2 given in footnote 14, we have R′n Unt= Rn1′ Unt1− Rn2′ Unt2, where Rnj′ Untj= 2{(Yt−ρjo Yt−j)2− (Yt*−ρjo Yt−2j*)2}. Hence, inline image using the definition of Yt*. Thus

  • image(24)

As expected from the remark after Theorem 2, the asymptotic variance σn2 can be computed by neglecting the estimation uncertainty arising from inline image, i.e. as if ρjo is known (see (10)). Moreover, because Rnμnt= 0 under H0, the expectation of the term in braces is zero under H0. Hence from Newey and West (1987b) and Gallant and White (1988), a simpler consistent estimator inline image of σn2 is given by four times the expression (20), where inline image is replaced by the difference in squared prediction errors inline image with the first and last sums starting from t = n*+ 1 and t =τ+ n*+ 1.16
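The simpler estimator described above, a Bartlett-weighted long-run variance of the difference in squared prediction errors over the second half of the sample, can be sketched as follows. The exact form of (20) and the normalisation of the statistic are not reproduced here, so the uncentred autocovariance sums and the scaling √n* × mean / √(long-run variance) are assumptions made for this illustration, as are the function names and the simulated AR(1) data.

```python
import numpy as np

def autocorr(y, j):
    """Sample j-th order autocorrelation without demeaning."""
    return float(np.dot(y[j:], y[:-j]) / np.dot(y, y))

def long_run_variance(d, m_n):
    """Bartlett-weighted sum of uncentred sample autocovariances of d."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    total = np.dot(d, d) / n
    for tau in range(1, m_n + 1):
        weight = 1.0 - tau / (m_n + 1)
        total += 2.0 * weight * np.dot(d[tau:], d[:-tau]) / n
    return float(total)

def msep_difference_statistic(y, m_n):
    """Studentised difference in out-of-sample MSEP between the lag-1 and
    lag-2 models, with coefficients estimated on the first half of y."""
    y = np.asarray(y, dtype=float)
    n_star = len(y) // 2
    g1, g2 = autocorr(y[:n_star], 1), autocorr(y[:n_star], 2)
    t = np.arange(n_star, 2 * n_star)
    # Difference in squared prediction errors, treating g1 and g2 as known.
    d = (y[t] - g1 * y[t - 1]) ** 2 - (y[t] - g2 * y[t - 2]) ** 2
    lrv = long_run_variance(d, m_n)
    return float(np.sqrt(n_star) * d.mean() / np.sqrt(lrv)), lrv

# Hypothetical AR(1) data with coefficient 0.8, for which the lag-1 model
# has the smaller population MSEP.
rng = np.random.default_rng(3)
n = 4000
e = rng.standard_normal(n)
y = np.empty(n)
y[0] = e[0]
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + e[t]

stat, lrv = msep_difference_statistic(y, m_n=int((n // 2) ** 0.2))
print(stat, lrv)
```

Because the estimated coefficients enter only through the prediction errors, the sketch reflects the remark that estimation uncertainty can be neglected here; a negative value of the statistic points towards the lag-1 model under the sign convention used in this block.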

On the positive asymptotic variance

It remains to discuss Assumption 7, namely lim  infnσn2 > 0. Similar assumptions appear in Vuong (1989) for likelihood-based criteria in the static case and West (1996) for out-of-sample MSEP-based criteria. The purpose of this section is to characterize situations for which this assumption is violated. More precisely, we consider cases when limnσn2= 0. By considering subsequences of σn2, our results can be modified to obtain necessary and sufficient conditions for lim  infnσn2 > 0. Hereafter, we maintain H0* since we are interested in the asymptotic distribution of our test statistic Tn under the null hypothesis. Moreover, we adopt the general framework of Section 2.

Our first result shows the importance of Assumption 7.

Lemma 1.

Given Assumptions 1–6, suppose that H0* holds.

  • (i)
    Then σn2= o(1) if and only if inline image.
  • (ii)
    In addition, assume that inline image for j = 1, 2. Then σn2= o(1) if and only if inline image.

Lemma 1 extends Vuong (1989, Lemma 4.1) to dynamic situations and general model selection criteria. Part (i) shows that the inline image-asymptotic normality of our test statistic holds only if σn2≠ o(1). Part (ii) specializes to the case where estimation methods for both competing models (whether nested or nonnested) optimize their respective model selection criteria. As seen in Section 4, examples are ML estimation with log-likelihood-type criteria and GMM estimation with GMM criterion functions. The necessary and sufficient condition (ii) can then be interpreted as requiring that the estimated models are inline image-asymptotically identical. This condition is, of course, satisfied when inline image almost surely for n sufficiently large.

The last remark suggests that Assumption 7 is violated when the competing models are nested and estimated by optimizing the same selection criterion. This is confirmed by the next result. We define

Definition

Model 1 is nested in model 2 according to the chosen selection criteria if there exists a sequence of continuously differentiable functions inline image from Γ1 to Γ2 such that, for n sufficiently large, Qn1 (ω, γ1) = Qn2 {ω, hn (γ1)}, for all (ω, γ1) ∈Ω×Γ1. In particular, this definition applies when the competing models are nested in the usual sense and the same criterion is used to compare these models. Note, however, that it is not sufficient to consider only nested models.

Theorem 5.

Given Assumptions 1–6, suppose that model 1 is nested in model 2 according to the selection criteria. Moreover, suppose that (i) inline image for j = 1, 2, (ii) inline image for any nonstochastic sequence inline image such that inline image, and (iii) inline image is the identifiably unique minimizer of inline image on Γ2. Then σn2= o(1) under H0*. As in Lemma 1 (ii), we consider the case where estimators inline image optimize the corresponding selection criterion so their limits inline image satisfy condition (i). Conditions (ii) and (iii) are typically satisfied by extremum estimators. In particular, inline image is in general asymptotically normal under sequences of local alternatives inline image converging to inline image.

Theorem 5 shows that our model selection testing procedure based on Tn is not valid when the competing models are nested according to the chosen selection criteria. For instance, this is the case for nested regression models estimated by nonlinear least squares and compared using their in-sample or out-of-sample MSEPs. This result is in agreement with previous work in nested situations (see e.g. Hansen (1982), Andrews and Fair (1988), Gallant and White (1988)), which indicates that inline image is asymptotically chi-square distributed under the null hypothesis that the smaller model is correctly specified. More generally, when σn2= o(1), Marcellino (2000) has shown that inline image follows a weighted chi-square distribution (see Vuong (1989, Definition 1)).

In view of the importance of Assumption 7, a natural question is whether one can test σn2= o(1) using an extension of the variance test proposed in Vuong (1989) for the static likelihood case. When the selection criteria are of the likelihood type, i.e. inline image(ω, γj), and inline image minimizes Qnj (ω, γj), recent work by Golden (2000) for the strictly stationary case and Marcellino (2000) for the case where inline image is a martingale difference sequence has shown that inline image follows asymptotically a weighted chi-square distribution under the null hypothesis σn2= o(1). This result will continue to hold in our general framework.

Example

[Continued] When the out-of-sample MSEP is used as a model selection criterion, we have inline image. Because Assumptions 1–6 hold by the verification of Assumptions 1–3 and 9–11 in Section 3, Lemma 1 applies. Hence, from inline image and Lemma 1 (i), σn2= o(1) if and only if inline image under H0*= H0. Moreover, because inline image is consistent for inline image and inline image, as shown in Section 3, then inline image, i.e. inline image minimizes asymptotically the out-of-sample MSEP. Hence, by Lemma 1 (ii), σn2= o(1) if and only if inline image under H0. These results agree with the direct computation of the asymptotic variance σn2 given in (24).

As indicated above, one might want to test σn2= o(1), i.e. that the so-called long-run variance of the stationary process {dt} ≡ {(Yt−ρ1o Yt−1)2− (Yt−ρ2o Yt−2)2} is zero. This is obviously the case when the first and second autocorrelations ρ1o and ρ2o of the stationary process {Yt} are zero, though σn2= o(1) may hold for other DGPs under H0*= H0 given in (11). From Section 5, inline image can be used to test σn2= o(1) as the former is a consistent estimator of the latter, which is equal to inline image when the autocovariances {γd (τ)} of the process {dt} are summable (see e.g. Hayashi (2000, p. 401)). In particular, recent results obtained by Golden (2000) and Marcellino (2000) under some additional assumptions indicate that inline image follows asymptotically a weighted chi-square distribution under σn2= o(1) with weights that can be estimated consistently.17

Conclusion

This paper offers a general testing framework for assessing the statistical significance of the difference in model selection criterion values for two competing models under weak assumptions on the data generating process. This framework encompasses the static likelihood-based situations studied in Vuong (1989) as well as the out-of-sample prediction-based criteria considered in West (1994, 1996). The competing models must be essentially nonnested but can be dynamic and incompletely specified. Our results allow for a wide class of √n-asymptotically normal estimators and model selection criteria. Moreover, the methods used to estimate the competing models need not optimize the criteria used for model selection. Thus different samples can be used for model estimation and comparison. We stress situations where sampling uncertainty due to parameter estimation is asymptotically irrelevant for testing model equivalence. In particular, this is the case when the employed estimators optimize (possibly asymptotically) the model selection criteria used for model comparison.

To conclude, we make three remarks. First, our testing framework allows for the comparison of two competing models only, which is restrictive in practice. Extension to more than two models raises a problem of multiple comparison. Recently, Shimodaira (1998) has extended Vuong's (1989) setting to multiple competing models through the use of confidence intervals, and White (2000) has extended West's (1994, 1996) prediction framework while establishing the validity of bootstrapping in such a multiple testing situation. Second, our results do not apply to the comparison of nonparametric models. Extension of our testing framework to nonparametric situations is possible, as shown by Lavergne and Vuong (1996, 2000) for nonparametric regressions. For a survey of selection of regressors in parametric and nonparametric regressions, see Lavergne (1998).

Third, we have not attempted to address the choice of model selection criteria. Indeed no single index may be universally superior as each reflects the particular features of interest to a researcher. As Amemiya (1980) wrote ‘…all of the criteria considered are based on a somewhat arbitrary assumption which cannot be fully justified, and that by slightly varying the loss function and the decision strategy one can indefinitely go on inventing new criteria’. Recently, however, Granger and Pesaran (2000) have argued for a closer link between forecast evaluation and decision theory. Moreover, though our framework allows for a large class of criteria, one should be cautious in the choice of such criteria as systematic applications of our results may lead to nonsensical outcomes. This can be the case if different criteria are used across competing models.

Acknowledgements

We thank E. Guerre, P. Lavergne, A. Monfort, two referees and the Editor as well as seminar participants at the Université de Montréal, Stanford University, Université de Toulouse, Université d’Aix–Marseille, the World Congress of the Econometric Society, Barcelona, August 1990 and the Malinvaud econometric seminar, Paris, November 1990. The second author is grateful to J. M. Dufour and E. Ghysels for a visit at the Université de Montréal in April 1990, which led to an early version (Rivers and Vuong 1991) containing the basic results of the current paper. Financial support from the National Science Foundation under Grant SBR-9631212 is gratefully acknowledged.

References

  • 1
    Akaike H. (1973); Information theory and an extension of the maximum likelihood principle. In Proceedings of the Second International Symposium of Information Theory (Petrov B.N. & Csaki F., eds), pp. 257–81. Akademiai Kiado, Budapest.
  • 2
    Akaike H. (1974); A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19: 716–23.
  • 3
    Amemiya T. (1980); Selection of regressors. International Economic Review 21: 331–54.
  • 4
    Amemiya T. (1985); Advanced Econometrics, Harvard University Press, Cambridge.
  • 5
    Andrews D.W.K. (1987); Consistency in nonlinear econometric models: a generic uniform law of large numbers. Econometrica 55: 1465–71.
  • 6
    Andrews D.W.K. (1991); Heteroskedasticity and autocorrelation consistent matrix estimation. Econometrica 59: 817–58.
  • 7
    Andrews D.W.K. (1994); Asymptotics for semiparametric econometric models via stochastic equicontinuity. Econometrica 62: 43–72.
  • 8
    Andrews D.W.K. & Fair R.C. (1988); Inferences in econometric models with structural change. Review of Economic Studies 55: 615–40.
  • 9
    Bates C. & White H. (1985); A unified theory of consistent estimation for parametric models. Econometric Theory 1: 151–78.
  • 10
    Caballero R.J. & Engel E.M.R.A. (1999); Explaining investment dynamics in U.S. manufacturing: a generalized (S, s) approach. Econometrica 67: 783–826.
  • 11
    Cox D.R. (1961); Tests of separate families of hypotheses. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 105–23.
  • 12
    Cox D.R. (1962); Further results on tests of separate families of hypotheses. Journal of the Royal Statistical Society, Series B 24: 406–24.
  • 13
    Davidson R. & MacKinnon J.G. (1981); Several tests for model specification in the presence of alternative hypotheses. Econometrica 49: 781–93.
  • 14
    Diebold F.X. & Mariano R.S. (1995); Comparing predictive accuracy. Journal of Business and Economic Statistics 13: 253–63.
  • 15
    Domowitz I. & White H. (1982); Misspecified models with dependent observations. Journal of Econometrics 20: 35–58.
  • 16
    Ericsson N.R. (1983); Asymptotic properties of instrumental variables statistics for testing non-nested hypotheses. Review of Economic Studies 50: 287–304.
  • 17
    Fair R.C. & Shiller R.J. (1990); Comparing information in forecasts from econometric models. American Economic Review 80: 375–89.
  • 18
    Findley D.F. (1990); Making Difficult Model Comparisons, mimeo, U.S. Bureau of the Census.
  • 19
    Findley D.F. (1991); Convergence of finite multistep predictors from incorrect models and its role in model selection. Note di Matematica XI: 145–55.
  • 20
    Findley D.F., Monsell B.C., Bell W.R., Otto M.C. & Chen B.C. (1998); New capabilities and methods of the X-12-ARIMA seasonal adjustment program. Journal of Business and Economic Statistics 16: 127–77.
  • 21
    Findley D.F. & Wei C.Z. (1993); Moment bound for deriving time series CLT's and model selection procedures. Statistica Sinica 3: 453–80.
  • 22
    Fuller W.A. (1976); Introduction to Statistical Time Series, Wiley, New York.
  • 23
    Gallant A.R. & White H. (1988); A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models, Basil Blackwell, New York.
  • 24
    Gasmi F., Laffont J.J. & Vuong Q. (1992); Econometric analysis of collusive behavior in a soft drink industry. Journal of Economics and Management Strategy 1: 277–311.
  • 25
    Ghysels E. & Hall A. (1990); Testing nonnested Euler conditions with quadrature-based methods of approximation. Journal of Econometrics 46: 273–308.
  • 26
    Godfrey L.G. (1983); Testing non-nested models after estimation by instrumental variables or least squares. Econometrica 51: 355–65.
  • 27
    Golden M. (2000); Discrepancy Risk Model Selection Test Theory for Comparing Possibly Misspecified or Nonnested Models, mimeo, University of Texas, Dallas.
  • 28
    Gourieroux C., Monfort A. & Trognon A. (1983); Testing nested or non-nested hypotheses. Journal of Econometrics 21: 83–115.
  • 29
    Granger C.W.J. & Pesaran M.H. (2000); Economic and statistical measures of forecast accuracy. Journal of Forecasting 19: 537–60.
  • 30
    Hampel F.R., Ronchetti E.M., Rousseeuw P.J. & Stahel W.A. (1986); Robust Statistics: The Approach Based on Influence Functions, Wiley, New York.
  • 31
    Hannan E.J. & Quinn B.G. (1979); The determination of the order of an autoregression. Journal of the Royal Statistical Society, Series B 41: 190–5.
  • 32
    Hansen L.P. (1982); Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–54.
  • 33
    Hayashi F. (2000); Econometrics, Princeton University Press, Princeton.
  • 34
    Heckman J. (1981); Heterogeneity and state dependence. In Studies in Labor Markets (Rosen S., ed.), pp. 91–139. University of Chicago Press, Chicago.
  • 35
    Huber P.J. (1981); Robust Statistics, Wiley, New York.
  • 36
    Ibragimov I. & Linnik Y. (1971); Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff, Groningen.
  • 37
    Jennrich R.I. (1969); Asymptotic properties of non-linear least squares estimators. Annals of Mathematical Statistics 40: 633–43.
  • 38
    King M.L. & McAleer M. (1987); Further results on testing AR(1) against MA(1) disturbances in the linear regression model. Review of Economic Studies 54: 649–63.
  • 39
    Konishi S. & Kitagawa G. (1996); Generalized information criterion in model selection. Biometrika 83: 875–90.
  • 40
    Kullback S. & Leibler R.A. (1951); On information and sufficiency. Annals of Mathematical Statistics 22: 79–86.
  • 41
    Lavergne P. (1998); Selection of regressors in econometrics: parametric and nonparametric methods. Econometric Reviews 17: 227–73.
  • 42
    Lavergne P. & Vuong Q. (1996); Nonparametric selection of regressors: the nonnested case. Econometrica 64: 207–19.
  • 43
    Lavergne P. & Vuong Q. (2000); Nonparametric significance testing. Econometric Theory 16: 576–601.
  • 44
    Lien D. & Vuong Q. (1987); Selecting the best linear regression model: a classical approach. Journal of Econometrics, Annals 35: 3–23.
  • 45
    Linhart H. (1988); A test whether two AIC's differ significantly. South African Statistical Journal 22: 153–61.
  • 46
    Linhart H. & Zucchini W. (1986); Model Selection, Wiley, New York.
  • 47
    Machado J.A.F. (1993); Robust model selection and M-estimation. Econometric Theory 9: 478–93.
  • 48
    MacKinnon J.G. (1983); Model specification tests against non-nested alternatives. Econometric Reviews 2: 85–110.
  • 49
    Mallows C.L. (1973); Some comments on Cp. Technometrics 15: 661–76.
  • 50
    Marcellino M. (2000); Model Selection for Non-linear Dynamic Models, mimeo, Universita Bocconi.
  • 51
    Martin R.D. (1980); Robust estimation of autoregressive models. In Directions in Time Series (Brillinger D.R. & Tiao G.C., eds), pp. 228–62. Institute of Mathematical Statistics, Hayward.
  • 52
    McAleer M. (1987); Specification tests for separate models: a survey. In Specification Analysis in the Linear Model (King M.L. & Giles D.E.A., eds), pp. 146–95. Routledge and Kegan Paul, London.
  • 53
    Meese R.A. & Rogoff K. (1983); Empirical exchange rate models of the seventies: do they fit out of sample? Journal of International Economics 14: 3–24.
  • 54
    Mizon G.E. & Richard J.F. (1986); The encompassing principle and its applications to testing non-nested hypotheses. Econometrica 54: 657–78.
  • 55
    Moore D. (1978); Chi-square tests. In Studies in Statistics, vol. 19 (Hogg R.V., ed.). The Mathematical Association of America.
  • 56
    Nakamura A.O., Nakamura M. & Orcutt–Duleep H. (1990); Alternative approaches to model choice. Journal of Economic Behavior and Organization 14: 97–125.
  • 57
    Newey W.K. & McFadden D. (1994); Large sample estimation and hypothesis testing. In Handbook of Econometrics, vol. 4 (Engle R.F. & McFadden D., eds), pp. 2111–245. North-Holland, Amsterdam.
  • 58
    Newey W.K. & West K.D. (1987a); Hypothesis testing with efficient method of moments estimators. International Economic Review 28: 777–87.
  • 59
    Newey W.K. & West K.D. (1987b); A simple positive semi-definite heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55: 703–8.
  • 60
    Paarsch H.J. (1997); Deriving an estimate of the optimal reserve price: an application to British Columbian timber sales. Journal of Econometrics 78: 333–57.
  • 61
    Pesaran M.H. (1974); On the general problem of model selection. Review of Economic Studies 41: 153–71.
  • 62
    Pesaran M.H. & Smith R.J. (1994); A generalized R2 criterion for regression models estimated by the instrumental variables method. Econometrica 62: 705–10.
  • 63
    Pham T. & Tran L. (1980); The Strong Mixing Properties of the Autoregressive Moving Average Time Series Models, Seminaire de Statistique, Grenoble.
  • 64
    Powell J.L. (1994); Estimation of semiparametric models. In Handbook of Econometrics, vol. 4 (Engle R.F. & McFadden D., eds), pp. 2443–521. North-Holland, Amsterdam.
  • 65
    Rivers D. & Vuong Q. (1991); Model selection tests for nonlinear dynamic models, Working Paper 9108, INRA-ESR, Toulouse.
  • 66
    Ronchetti E. (1985); Robust model selection in regression. Statistics and Probability Letters 3: 21–3.
  • 67
    Sargan J.D. (1958); The estimation of economic relationships using instrumental variables. Econometrica 26: 393–415.
  • 68
    Schwarz G. (1978); Estimating the dimension of a model. Annals of Statistics 6: 461–4.
  • 69
    Shimodaira H. (1998); An application of multiple comparison techniques to model selection. Annals of the Institute of Statistical Mathematics 50: 1–15.
  • 70
    Sin C.Y. & White H. (1996); Information criteria for selecting possibly misspecified parametric models. Journal of Econometrics 71: 207–25.
  • 71
    Smith R.J. (1992); Non-nested tests for competing models estimated by generalized method of moments. Econometrica 60: 973–80.
  • 72
    Toro-Vizcarrondo C. & Wallace T.D. (1968); A test of the mean square error criterion for restrictions in linear regression. Journal of the American Statistical Association 63: 558–72.
  • 73
    Vuong Q. (1989); Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57: 307–33.
  • 74
    Vuong Q. & Wang W. (1991); Tests for model selection using power divergency statistics, Working Paper 9106, INRA–ESR, Toulouse.
  • 75
    Vuong Q. & Wang W. (1993a); Minimum chi-square estimation and tests for model selection. Journal of Econometrics 56: 141–68.
  • 76
    Vuong Q. & Wang W. (1993b); Selecting estimates using chi-square statistics. Annales d’Economie et de Statistique 30: 143–64.
  • 77
    Walker A.M. (1967); Some tests of separate families of hypotheses in time series analysis. Biometrika 54: 39–68.
  • 78
    West K.D. (1994); Asymptotic inference about predictive ability, mimeo, University of Wisconsin.
  • 79
    West K.D. (1996); Asymptotic inference about predictive ability. Econometrica 64: 1067–84.
  • 80
    White H. (1982); Maximum likelihood estimation of misspecified models. Econometrica 50: 1–25.
  • 81
    White H. (1984); Asymptotic Theory for Econometricians, Academic, New York.
  • 82
    White H. (2000); A reality check for data snooping. Econometrica 68: 1097–126.
  • 83
    Wolak F. (1994); An econometric analysis of the asymmetric information regulator utility interaction. Annales d’Economie et de Statistique 34: 13–69.

Appendix

Proof of Theorem 1

A Taylor expansion of inline image around inline image gives

  • image

where inline image is in the line segment inline image for j = 1, 2. Multiplying by inline image and adding and subtracting a term, we obtain

  • image

Now, inline image because inline image by Assumption 3. Thus, given Assumptions 1, 2 and 5, it follows from Domowitz and White (1982, Theorem 2.3) that inline image. Moreover, by Assumption 6, inline image. Hence we obtain

  • image (A.1)

Subtracting inline image from both sides, and then subtracting the resulting equations for j = 1, 2 from each other, we obtain in matrix notation

  • image (A.2)

where the first equality follows from (3) and (4), and the second equality uses Assumption 6 and the boundedness of Ln, which is implied by Assumption 5.

Now we apply White (1984, Corollary 4.24) with inline image, Vn= Is and An= Cn Ln. Since Ln= OP (1) and Cn= OP (1) by Assumptions 5–6, then An= OP (1). Moreover, An is of (full column) rank one for all n sufficiently large by Assumption 7. Thus, if σn2= L′n C′n Cn Ln, then σn2= O(1) and inline image. Since σn−1= O(1) by Assumption 7, multiplying (A.2) by σn−1 gives

  • image (A.3)

Finally, we have

  • image

Since σn= O(1) and σn−1= O(1), it follows from Assumption 8 that inline imageinline image. Moreover, because 1/M < σn−1 < M for n sufficiently large and some M > 0, statements (i)–(iii) follow immediately from (5) and (A.3).
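
To make the use of this result concrete, the following Python sketch implements a generic decision rule of this type. It assumes that the statistic is the √n-scaled difference in average lack-of-fit between the two models divided by an estimate of σ_n, that smaller criterion values indicate better fit, and that a two-sided standard normal comparison is used. The function name, the plain sample standard deviation (suitable only without serial dependence) and the default level are hypothetical choices for the illustration, not the paper's definitions.

```python
import math

import numpy as np


def model_selection_test(loss1, loss2, level=0.05):
    """Normal test on per-observation lack-of-fit contributions (smaller is better)."""
    d = np.asarray(loss1, dtype=float) - np.asarray(loss2, dtype=float)
    n = d.size
    sigma_hat = d.std(ddof=0)   # i.i.d. variance estimate, an assumption of this sketch
    if sigma_hat == 0.0:
        raise ValueError("degenerate variance: the normal test is not applicable")
    stat = math.sqrt(n) * d.mean() / sigma_hat
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    if p_value >= level:
        decision = "equivalent"
    elif stat < 0:
        decision = "model 1"    # model 1 has the smaller average lack-of-fit
    else:
        decision = "model 2"
    return stat, p_value, decision
```

With dependent data, the sample standard deviation would be replaced by a consistent estimate of the long-run standard deviation. The explicit check for a zero variance reflects the role of Assumption 7.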

Proof of Theorem 2

We shall show that the conditions of Theorem 2 imply those of Theorem 1. It suffices to show that Assumptions 4–6 are implied by the conditions of Theorem 2. We then shall show that σn2= R′n Vn Rn as required.

Step 1: Verification of Assumptions 4 and 5

Consider the quantities Mnj (ω, γj), ∂Mnj (ω, γj)/∂γj′ and their expectations. From Assumptions 1, 2, 9 and 11, it follows from Gallant and White (1988, Theorem 3.18) that

  • image (A.4)–(A.5)

where EMnj (ω, γj) and E∂Mnj (ω, γj)/∂γj′ are continuous on Γj uniformly in n, i.e. equicontinuous. Moreover, by Assumption 11 (iii), the latter two functions are uniformly bounded, i.e. there exists a finite M such that E | Mnj (ω, γj) |< M and E | ∂Mnj (ω, γj)/∂γj′ |< M for all γj and all n. Now define the nonstochastic function inline image from Γj to R for j = 1, 2 where inline image. Because inline image is continuous and EMnj (ω, γj) is uniformly bounded and equicontinuous, then inline image is equicontinuous. Moreover, because Γj is compact, (A.4) implies that inline image. This completes the verification of Assumption 4.

Turning to Assumption 5, from the LDC theorem we have ∂Emtj (ω, γj)/∂γj′= E∂mtj(ω, γj)/∂γj′, for t = 1, 2,…. Thus inline image is continuously differentiable on Γj, and we have by the chain rule

  • image (A.6)

In fact, inline image is equicontinuous because each term of (A.6) is equicontinuous. The latter statement follows from the continuous differentiability of inline image and the uniform boundedness and equicontinuity of EMnj (ω, γj) and E∂Mnj (ω, γj)/∂γj′. In particular, inline image is bounded as required. Lastly, to prove uniform convergence, note that

  • image (A.7)

Since inline image is continuously differentiable and Γj is compact, (A.4) and (A.5) imply

  • image

Thus, because dj {EMnj (ω, γj), γj}/∂mj′ and E∂Mnj (ω, γj)/∂γj′ are both O(1), we obtain from (A.4)–(A.7) that inline image uniformly on Γj. This completes the verification of Assumption 5.

Step 2: Verification of Assumption 6

A Taylor expansion gives

  • image (A.8)

where inline image is in the line segment inline image. Now inline image because of (A.4), Assumption 3 and Domowitz and White (1982, Theorem 2.3). Hence inline image. Since inline image and Γj is compact, then inline image. Thus, provided inline image, which is proved later, (A.8) becomes

  • image

Hence, stacking up and using Assumption 10, we obtain

  • image (A.9)

where Un is defined in Assumption 6, and inline image.

We shall show that inline image satisfies Assumption 6. To do so, define

  • image

First, by Assumptions 10 and 11 with r > 2, we have Wjn < ∞ and Bjn < ∞. Moreover, Wjn= O(1) and Bjn= O(1). The proof of the latter follows the proof that Bno= O(1) in Gallant and White (1988, pp. 86–87). Specifically, define inline image (or Znt=λ′Yntj) where λ∈ Rmath image (or λ∈ Rmath image), λ′ λ= 1 so that inline image (or λ′Bjnλ). Now, since EZnt= 0,

  • image

But, given Assumptions 1, 9 and 11 (or 10), it follows from Gallant and White (1988, Lemma 3.14) that Znt is a mixingale of size −1 and a fortiori of size −1/2 with cnt≤Δ < ∞ for all n, t. Therefore, applying McLeish's inequality (Gallant and White (1988, Theorem 3.11)), we have inline image. Thus, λ′Wjnλ (or λ′Bjnλ) is O(1) for arbitrary λ∈ Rmath image (or λ∈ Rmath image), λ′λ= 1, implying that Wjn= O(1) (or Bjn= O(1)).

Next, define Vnt= (Vnt1′, Vnt2′)′ where inline image so that Vn is the covariance matrix of inline image by (8). Thus, by the Cauchy–Schwarz inequality, it follows from the preceding properties of Wjn and Bjn that Vn < ∞ and Vn= O(1). Moreover, by assumption, Vn is uniformly of rank s > 0. Hence, for every n = 1, 2,…, there exists a (k + q) × s matrix Pn that is uniformly of full column rank such that Vn= Pn P ′n. Since Vn= O(1), then Pn= O(1) also. Let inline image so that, almost surely,

  • image (A.10)
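
The factorization Vn= Pn P ′n of a covariance matrix of rank s can be illustrated numerically. The Python sketch below obtains one such full-column-rank factor from an eigendecomposition and applies the left inverse (P ′n Pn)−1 P ′n that appears in this step. The eigendecomposition route, the function names and the tolerance are assumptions made for the illustration; the proof only requires that some factor of this kind exists.

```python
import numpy as np


def rank_factor(v, tol=1e-10):
    """Return P with full column rank such that P @ P.T reproduces the PSD matrix v."""
    v = np.asarray(v, dtype=float)
    eigval, eigvec = np.linalg.eigh((v + v.T) / 2.0)   # symmetrize before decomposing
    keep = eigval > tol * max(eigval.max(), 1.0)       # drop numerically zero eigenvalues
    return eigvec[:, keep] * np.sqrt(eigval[keep])


def left_inverse_apply(p, x):
    """Compute (P'P)^{-1} P' x without forming the inverse explicitly."""
    return np.linalg.solve(p.T @ p, p.T @ x)


# A 3 x 3 covariance matrix of rank 2 built from two outer products.
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, -1.0])
v = np.outer(a, a) + np.outer(b, b)
p = rank_factor(v)   # p has 3 rows and 2 columns
```

Any vector of the form P u is mapped back to u by the left inverse, which is why the standardized vector has s components rather than k + q.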

We shall show that inline image. The proof is similar to the proof that inline image Bno−1/2 Mnto ⇒ N(0, I) in Gallant and White (1988, p. 87).

Specifically, define Znt=λ′(P ′n Pn)−1 P ′n Vnt, where λ∈ Rs, λ′ λ= 1. Thus, EZnt= 0. By Assumptions 10 and 11, we have ‖Vntj‖r≤Δ < ∞, r > 2. Moreover, because Pn is O(1) and Pn is uniformly of full-column rank, then (P ′n Pn)−1= O(1) and (P ′n Pn)−1 P ′n= O(1). Thus ‖Znt‖r≤Δ′ for r > 2. In addition, by Assumptions 10 and 11, {Znt} is near-epoch dependent on {Xt} of size −1 where {Xt} is mixing with φm of size −r/(r − 1) or αm of size −2r/(r − 2) by Assumption 9. Define

  • image

so that vn−2= O(n−1). Hence, by Gallant and White (1988, Theorem 5.3),

  • image

Thus, inline image by the Cramér–Wold device. In particular, because Pn= O(1), (A.10) implies that inline image, a condition which was needed to obtain (A.9).
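
The role of the variance of the normalized sum in a central limit theorem for dependent data can be seen in a small simulation. The Python sketch below uses a stationary Gaussian AR(1) process with unit innovation variance as a stand-in for a weakly dependent sequence and compares the variance of the √n-scaled sample mean with the long-run variance 1/(1 − φ)^2. The AR(1) design, the function names and the simulation sizes are assumptions chosen for the illustration and are not taken from the paper.

```python
import numpy as np


def ar1_long_run_variance(phi):
    """Long-run variance of a stationary AR(1) with unit innovation variance."""
    return 1.0 / (1.0 - phi) ** 2


def scaled_mean_variance(phi, n, reps, seed=0):
    """Monte Carlo variance of sqrt(n) times the sample mean of an AR(1) path."""
    rng = np.random.default_rng(seed)
    # Start each replication from the stationary distribution.
    y = rng.standard_normal(reps) / np.sqrt(1.0 - phi**2)
    total = y.copy()
    for _ in range(n - 1):
        y = phi * y + rng.standard_normal(reps)
        total += y
    scaled_means = total / np.sqrt(n)
    return float(scaled_means.var())


simulated = scaled_mean_variance(phi=0.5, n=1000, reps=4000)
theoretical = ar1_long_run_variance(0.5)   # equals 4.0 for phi = 0.5
```

In this design the marginal variance is 1/(1 − φ^2), so standardizing by it instead of the long-run variance would not deliver a unit-variance limit.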

Combining (A.9) and (A.10), we obtain inline image, where

  • image

Note that Cn= O(1) because of Assumptions 10 and 11, inline image, inline image compact, and Pn= O(1). Thus Assumption 6 is verified.

Step 3: Computation of σn2

We have

  • image

Using Pn P ′n= Vn and formula (A.6) for inline image, we obtain the desired result from σn2= L′n C′n Cn Ln.

Proof of Theorem 3

We shall show that the conditions of Theorem 3 imply those of Theorem 2. Assumptions 2 and 11 are trivially implied by Assumption 12. Thus, we need to prove that Assumptions 3 and 10 are implied by the conditions of Theorem 3. The desired result will follow by application of Theorem 2.

Step 1: Verification of Assumption 3

Let inline image. The existence and measurability of inline image follows from Gallant and White (1988, Lemma 2.1) or Jennrich (1969) because inline image is measurable-inline image for every θj∈Θj and is continuous in θj for almost all ω by Assumption 12 (i).

Define inline image. The fact that inline image belongs to the interior of Γj uniformly in n follows from Assumptions 13 and 14 (i). It remains to verify that inline image. In view of Assumption 13, it suffices to prove that inline image. By Assumption 12 (iii) and r > 2, we have inline image(ω, θj)‖≤Δ, for some Δ < ∞, and all θj∈Θj and n = 1, 2,…. Also, by Assumptions 12 (ii)–(iv), it follows from Gallant and White (1988, Theorem 3.18) that EMnj (ω, θj) is continuous on Θj uniformly in n, and inline image uniformly on Θj. Hence, using Assumption 13, we have

  • image

The first vector is measurable on Ω and, for each ω∈Ω, is continuous in θj. The second vector is also continuous in θj. Since EMnj (ω, θj) is bounded on Θj uniformly in n, θj∈Θj compact, and inline image compact, it follows from Gallant and White (1988, Lemma 3.4) or Bates and White (1985, Lemma 2.4) that

  • image

where inline image. Therefore, by Gallant and White (1988, Theorem 3.3) or Domowitz and White (1982, Theorem 2.2), it follows from Assumption 14 (i) that inline image.

Step 2: Verification of Assumption 10

First, we consider the asymptotic distribution of inline image. Because inline image belongs to the interior of Θj uniformly in n by Assumption 14, and inline image, we have inline image. From a Taylor expansion around inline image we obtain

  • image (A.11)

where inline image and inline image. Now

  • image (A.12)

where ∂2 Mnj (ω, θj)/∂θj∂θj′≡(∂2 Mn1j (ω, θj)/∂θj∂θj′,…, ∂2 Mmath imagej (ω, θj)/∂θj∂θj′)′ using the same notation as in Gallant and White (1988, Lemma 5.2). Also

  • image

On the other hand, from Assumption 12 (iii) and the LDC theorem,

  • image

Thus, from inline image, we obtain

  • image

Note that every function in the right-hand side of any of the above three equations is bounded on Θj or Θj×ϒj uniformly in n because of Assumption 12 (iii). Hence inline image and its derivatives are bounded on Θj×ϒj uniformly in n.

Now, given Assumptions 1, 9 and 12, it follows from Gallant and White (1988, Theorem 3.18) that

  • image

where EMnj (ω, θj), E∂Mnj (ω, θj)/∂θj, and E∂2 Mnj (ω, θj)/∂θj∂θj′ are continuous on Θj uniformly in n. Since these three functions are bounded on Θj uniformly in n because of Assumption 12 (iii), and since inline image and its first two partial derivatives are continuous and hence uniformly continuous on compact sets, it follows from White (1984, Proposition 2.16) that

  • image

Moreover, inline image and inline image are continuous on Θj×ϒj uniformly in n.

Using the preceding results with inline image and inline image because inline image and inline image, it follows from Domowitz and White (1982, Theorem 2.3) that

  • image

where inline image and inline image are both O(1). Hence, Assumption 14 (ii) implies that inline image is nonsingular for n sufficiently large almost surely and A1jn−1= O(1). Thus, from (A.11) we obtain

  • image

But, from A1jn= O(1) uniformly positive definite and A1jn−1= O(1), we have

  • image

using White (1984, Proposition 2.16) and inline image= O(1). Thus, provided inline image inline image, inline image and inline image, we obtain

  • image (A.13)

Consider now the first term inside the braces. From (A.12) we obtain

  • image (A.14)

From a Taylor expansion around inline image we have

  • image

where inline image. Thus, multiplying (A.14) by inline image, we obtain

  • image (A.15)

because inline image, which follows from Assumption 14 (i), i.e.

  • image

But inline image from uniform convergence established earlier. Hence, because EMnjinline image, inline image compact, inline image compact, we have

  • image

where the second term is O(1). Moreover, since inline image, we have

  • image

where the second terms are O(1). Since inline image from Assumption 12 (iii), (A.15) becomes

  • image (A.16)

using the notation defined in the text, provided inline image and inline image are both OP (1). The first OP (1) condition follows from Assumptions 1, 9 and 12 by the argument used by Gallant and White (1988, pp. 85–86) for proving that inline image. The second OP (1) condition follows similarly. Note that (A.16) implies that inline image, as required for deriving (A.13) because inline image, inline image, inline image, and inline image are all O(1) as mentioned earlier.

Collecting results, as in Gallant and White (1988, p. 75) let

  • image

where inline image. Hence, from (A.16) we have

  • image

where EY1ntj= 0. Thus, from (A.13) and Assumption 13, we obtain

  • image (A.17)

which is in the form of Assumption 10. Now recall that inline image and A1jn−1= O(1), as shown earlier, and A2jn+= O(1) by Assumption 13. Hence Ajn+= O(1) as required. Second, E(Yntj) = 0 where Yntj= (Y1ntj′, Y2ntj′)′. The r-integrability of Yntj uniformly in n, t follows from Assumptions 12 and 13, and the boundedness of the nonstochastic matrices appearing in Yntj. Finally, {Yntj} is near-epoch dependent on {Xt} of size −1 because of Assumptions 12 and 13. This establishes Assumption 10.

Step 3: Computation of σn2

From Theorem 2, we have σn2= R′n  Vn  Rn where inline image,

  • image

and Ajn+ is the upper block triangular matrix in (A.17). Thus, matrix algebra gives inline image. Let Vnt= (Vnt1′, Vnt2′)′, where inline image. Hence, we obtain

  • image

where the last term has a covariance matrix Vn with uniform rank s > 0, as assumed in Theorem 3 and required by Theorem 2. The desired result follows.

Proof of Theorem 4

We prove three properties, and then the result.

Step 1

The first property is that inline image. From the uniform strong convergence of Mnj (ω, γj) and ∂Mnj (ω, γj)/∂γj′ to EMnj (ω, γj) and E∂Mnj (ω, γj)/∂γj′, and from the equicontinuity of the two limit functions (see the proof of Theorem 2), it follows from Domowitz and White (1982, Theorem 2.3) that inline image and inline image. Since EMnj (ω, γnj) and E∂Mnj (ω, γnj)/∂γj′ are both O(1), and continuous functions are uniformly continuous on compact sets, it follows that inline image and inline image, where inline image and inline image are both O(1). Using expressions (A.6) and (A.7), it follows from White (1984, Proposition 2.16) that inline image. Since inline image and Ajn+= O(1) by assumption, we obtain the desired property.

Step 2

The second property is that inline image. Its proof follows the proof of Theorem 5.6 in Gallant and White (1988). First, we have inline image where

  • image

and Vnt= (Vnt1′, Vnt2′)′ is the t th element of the sum (8). To see this, note that from (18) we have

  • image

Since inline image, it follows from Gallant and White (1988, Lemma 6.6) and Assumptions 1, 9, 15 (iii)–(iv) and 16 (iii)–(iv) that inline image.

Second, we have inline image where

  • image (A.18)

Unt= (Unt1′, Unt2′)′, and inline image. To see this, note that

  • image (A.19)

because of (22) and EUntnt. Consider the leading term of the difference between (A.18) and (A.19). From Assumptions 15 (iii)–(iv) and 16 (iii)–(iv), we have by Gallant and White (1988, Corollary 4.3) that the elements of {Unt U′nt} are r-integrable uniformly in n, t and near-epoch dependent on {Xt} of size −1 and hence of size −1/2. Then, applying Gallant and White (1988, Theorem 6.2) on the elements of {Unt U′nt− E(Unt U′nt)}, we obtain

  • image

because |wn0 | ≤Δ by Assumption 18. Moreover, from Gallant and White (1988, Lemma 6.7) and Assumptions 1, 9, 15 (iii)–(iv) and 16 (iii)–(iv), we have

  • image

Hence inline image as claimed.

Third, we have inline image. This is proved by taking a Taylor expansion around inline image and by using an argument identical to that used by Gallant and White (1988, p. 106 and p. 118) for proving that inline image. It is here where we use the assumption that the elements of {∂mtj (ω, γj)/∂γj′} and {∂δtj (ω, γj)/∂γj′} are 2r-dominated uniformly on Γj. Collecting the preceding three facts, it follows that inline image. Note that Λn is positive semidefinite, which immediately follows from Assumption 18 and Gallant and White (1988, Lemma 6.5) applied to λ′μnt for arbitrary λ.

Step 3

The third property is that R′nΛn Rn → 0 under H0*. Let inline image. Thus, from (22) we have

  • image

Consider the first term and more specifically

  • image

Since inline image under H0* by condition (i), and mn= o(n1/4) by Assumption 17, it follows that R′nΛ1n Rn → 0 under H0*.

Consider the second term and more specifically

  • image

where the second inequality follows from ‖ρ′nt‖≤ K < ∞ because of Assumptions 15 (iii) and 16 (iii). Since inline image and mn= o(n^{1/4}) by assumption, and Rn= O(1), it follows that R′nΛ2n Rn → 0 and R′nΛ′2n Rn → 0 under H0*.

Finally, consider the last term. We have

  • image

Hence, because |w | ≤Δ and |R′nρnt | ≤ dn by assumption, we obtain

  • image

Since dn²= O(n^{−1/4}) by condition (ii) and mn= o(n^{1/4}) by Assumption 17, it follows that R′nΛ3n Rn → 0. Therefore under H0*, we have R′nΛn Rn → 0.
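The rate argument in this last step can be spelled out as follows, assuming that the bound just obtained is of order m_n d_n^2 up to a constant factor (an assumption about the form of the bound, with C denoting a hypothetical finite constant):

```latex
\left| R_n' \Lambda_{3n} R_n \right|
  \le C \, m_n \, d_n^{2}
  = o\!\left(n^{1/4}\right) \cdot O\!\left(n^{-1/4}\right)
  = o(1).
```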

Step 4

We are now in a position to prove the theorem. From Steps 1 and 2 combined with Rn= O(1), we obtain inline image, i.e. inline image. Because Vn= O(1), Rn= O(1), and inline image, we have inline image. Thus, using the definition of σn², we have inline image, i.e. inline image because of Step 3. Therefore the proof is complete if the term in parentheses converges in probability to zero under H0*.

To see the latter, we note that

  • image

while inline image is given by a similar expression with inline image replacing Rn R′n. From Step 1 and Rn= O(1) we have inline image. Hence inline image under H0* by Step 3.

Proof of Lemma 32

  • (i) Under H0*, (A.2) gives inline image since inline image. Since inline image, it follows that inline image if and only if L′n C′n= o(1), i.e. if and only if σn²= o(1) since σn²= L′n C′n Cn Ln.
  • (ii) From (A.1) we have inline image since inline image and inline image for j = 1, 2 by assumption. The desired result follows from Part (i).

Proof of Theorem 5

First, because model 1 is nested in model 2 according to the chosen selection criterion, we can take inline image in view of Assumption 4. Thus H0* is equivalent to inline image. Hence, because inline image is the identifiably unique minimizer of inline image on Γ2, it is easily shown by contradiction that inline image under H0*.

Next, we show that σn²= o(1) by verifying condition (ii) of Lemma 32. Because model 1 is nested in model 2 according to the chosen selection criterion, this condition is equivalent to inline image. Taking a Taylor expansion around inline image gives inline image, where inline image. The condition is satisfied because of assumption (ii) and inline image. The desired result follows.