^{1}Corresponding author. E-mail: qvuong@usc.edu

# Model selection tests for nonlinear dynamic models

Version of Record online: 4 NOV 2002

DOI: 10.1111/1368-423X.t01-1-00071


#### How to Cite

Rivers, D. and Vuong, Q. (2002), Model selection tests for nonlinear dynamic models. The Econometrics Journal, 5: 1–39. doi: 10.1111/1368-423X.t01-1-00071

^{2}As noted in Vuong (1989), the LR statistic can also be adjusted by correction factors such as those proposed by , Schwarz (1978), and Hannan and Quinn (1979) to reflect the parsimony of each competing model. For a recent contribution on penalizing the LR statistic, see Sin and White (1996).

^{3}Applications of Vuong's test, as it is called in the econometric literature, have appeared in empirical work. For instance, it has been used to test for the presence of collusion in Gasmi *et al.* (1992), for the presence of asymmetric information in Wolak (1994), for distributional assumptions in Paarsch (1997), and for discriminating a structural nonlinear model from linear counterparts in Caballero and Engel (1999).

^{4}It is worth noting that extensions of Cox's tests followed the lines described previously, namely extensions to time series models and to incompletely specified models estimated by methods other than ML. See Walker (1967), Davidson and MacKinnon (1981), Ericsson (1983), Godfrey (1983), Gourieroux *et al.* (1983) and Mizon and Richard (1986), among others.

^{5}Findley (1990) proposes an interesting graphical procedure that addresses this issue when the competing models are Gaussian ARMA or ARIMA models.

^{6}We are grateful to a referee for suggesting this example. Other model selection problems, such as choosing between an AR(1) model and an MA(1) model, can be worked out similarly. In particular, the latter problem has been treated differently using some Cox-type tests for nonnested hypotheses (see e.g. Walker (1967), King and McAleer (1987)).

^{7}To simplify, we assume that the sample size used for estimation is equal to the out-of-sample size used for model selection. Appropriate changes can accommodate an out-of-sample size p that increases at the same rate as n. See also for other situations such as lim_{n→∞} p/n = 0 or ∞.

^{8}This assumption is stronger than necessary, but greatly facilitates the verification of the assumptions. Whenever possible, we indicate when it can be weakened.

^{9}Gaussianity can be relaxed, as non-Gaussian ARMA(p, q) processes are also α-mixing of arbitrary size under appropriate conditions. See Pham and Tran (1980).

^{10}The preceding argument shows that stationarity and Gaussianity can be weakened for Assumption 15 (ii), (iii) to hold, as it suffices that EY_{t}^{2r} be uniformly bounded for some r > 1.

^{11}The general case where Q_{n}^{j}(ω, γ^{j}) = d_{j}{M_{n}^{j}(ω, θ^{j}, τ^{j}), θ^{j}, τ^{j}} was not treated to economize on proofs and notation, but follows similarly. Moreover, to simplify, is again assumed independent of n.

^{12}Sin and White (1996) provide conditions on the penalty functions ensuring weak or strong consistency of the adjusted likelihood criterion. Thus, combining their results with ours delivers a likelihood-based procedure that is consistent both as a model selection criterion and as a model selection test of H_{0}^{*}.

^{13}Findley (1990) notes that comparing the (in-sample) log-likelihood values is also equivalent to comparing the one-step MSEP when the competing models are Gaussian ARMA or ARIMA models. Diebold and Mariano (1995) allow for more general losses than the MSEP, though their results require either that the parameters of the competing models be known or that lim_{n→∞} p/n = 0, as noted by West (1996).

^{14}The rates n^{−1/4} and n^{−1/8} arise from the rate of m_{n} in Assumption 27. As its proof shows, Theorem 4 actually holds for any rate of m_{n} that guarantees the consistency of for V_{n}, provided in (i) and in (ii). In particular, Andrews (1991) shows that the optimal rate of m_{n} for the Bartlett weights w_{nτ} used by Newey and West (1987b) is O(n^{1/3}), and hence does not satisfy Assumption 27. See also Andrews (1991) for optimal weights and data-dependent automatic determination of m_{n}.

^{15}Note that , where U_{nt} is defined as in (21) but with replacing . Hence, from E(U_{nt}) = μ_{nt} it can easily be shown that , if . The latter condition, however, is not sufficient to ensure the consistency of to σ_{n}^{2} because the near-epoch dependence of R′_{n}U_{nt} on X_{t} does not guarantee that the *raw* moment E(R′_{n}U_{nt}R′_{n}U_{n,t−τ}) vanishes as τ increases when E(R′_{n}U_{nt}) ≠ 0. On the other hand, E(R′_{n}U_{nt}) = 0 for all n, t trivially implies conditions (i)–(ii).

^{16}Similarly, for the in-sample MSEP studied in Section 4, the estimator appearing in (17) can be taken to be given by (20), where is replaced by the difference in squared prediction errors .

^{17}Similar results hold when using the in-sample MSEP for choosing between the two competing AR models.

#### Publication History

- Issue online: 4 NOV 2002
- Received on November 1999


### Keywords:

- Model selection tests;
- Nonnested hypotheses;
- Nonlinear dynamic models;
- Goodness-of-fit;
- Mean square prediction error

### Abstract

- Abstract
- Introduction
- Tests statistics and general results
- Model selection tests without lack-of-fit minimization
- Model selection tests with lack-of-fit minimization
- Consistent variance estimation
- On the positive asymptotic variance
- Conclusion
- Acknowledgements
- References
- Appendix

This paper generalizes Vuong's (1989) asymptotically normal tests for model selection in several important directions. First, it allows for incompletely parametrized models such as econometric models defined by moment conditions. Second, it allows for a broad class of estimation methods that includes most estimators currently used in practice. Third, it considers model selection criteria other than the models’ likelihoods, such as the mean squared errors of prediction. Fourth, the proposed tests are applicable to possibly misspecified nonlinear dynamic models with weakly dependent heterogeneous data. Cases where the estimation methods optimize the model selection criteria are distinguished from cases where they do not. We also consider the estimation of the asymptotic variance of the difference between the competing models’ selection criteria, which is necessary for our tests. Finally, we discuss conditions under which our tests are valid. It is seen that the competing models must be essentially nonnested.

### Introduction


Most procedures for choosing between two competing econometric models take the following form: each econometric model is estimated by a method that solves some optimization problem; the models are then compared by defining an appropriate goodness-of-fit or selection criterion for each model; and the better-fitting model according to this criterion is selected. In some cases, the method of estimation for each model maximizes the goodness-of-fit criterion used for model selection. For instance, when the competing models are fully parametrized and estimated by maximum likelihood, some popular procedures for model selection are based on Akaike's (1973, 1974) information criterion (AIC), Schwarz's (1978) information criterion (SIC), or the Hannan and Quinn (1979) criterion. In other cases, a different goodness-of-fit criterion is used for model selection. This arises, for instance, when the competing models are estimated using the same sample and compared on their out-of-sample mean squared errors of prediction (MSEP). See Linhart and Zucchini (1986) for various other model selection criteria and procedures.

These model selection procedures are not entirely satisfactory. Since model selection criteria depend on sample information, their actual values are subject to statistical variation. As a consequence, a model with a higher model selection criterion value may not *significantly* outperform its competitor. When the competing models are fully parametrized, nonnested and estimated by maximum likelihood (ML), and when the observations are independent and identically distributed (i.i.d.), Linhart (1988) and Vuong (1989) independently proposed a general testing procedure that takes statistical variation into account and relies on convenient asymptotically standard normal tests for model selection based on the familiar likelihood ratio (LR) statistic.^{2} These tests test the null hypothesis that the competing models are equally close to the data generating process (DGP) against the alternative hypotheses that one model is closer to the DGP, where the closeness of a model is measured by the Kullback–Leibler (1951) information criterion (KLIC). Thus, as in classical nested hypothesis testing, the outcomes of the tests indicate the strength of the statistical evidence for choosing a model based on its goodness-of-fit.^{3}
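To make the i.i.d. benchmark concrete, here is a minimal Python sketch of the Vuong (1989) statistic as it is usually stated: the LR statistic divided by √n times the sample standard deviation of the pointwise log-likelihood ratios. The function name `vuong_statistic` is hypothetical, the pointwise log-likelihoods are assumed to be computed by the user at the ML estimates, and no parsimony correction (footnote 2) is applied. Because it works with likelihoods rather than lack-of-fit criteria, a large positive value favors the first model here.

```python
import math
from statistics import NormalDist, pstdev


def vuong_statistic(loglik1, loglik2):
    """Vuong (1989)-style statistic for i.i.d. data and ML-fitted models.

    loglik1[t] and loglik2[t] are the log-densities of observation t under
    the two fitted models. Returns (z, two_sided_p_value).
    """
    if len(loglik1) != len(loglik2) or not loglik1:
        raise ValueError("both models must be evaluated on the same, nonempty sample")
    n = len(loglik1)
    # Pointwise log-likelihood ratios; their sum is the LR statistic.
    ratios = [a - b for a, b in zip(loglik1, loglik2)]
    omega = pstdev(ratios)  # standard deviation with 1/n normalization
    if omega == 0.0:
        raise ValueError("zero variance of the ratios: the normal approximation does not apply")
    z = sum(ratios) / (math.sqrt(n) * omega)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up pointwise log-likelihoods, for illustration only.
z, p = vuong_statistic([0.0, -1.0, -0.5, -0.2], [-0.1, -1.2, -0.4, -0.6])
```

Swapping the two arguments flips the sign of z and leaves the p-value unchanged, which mirrors the symmetric roles of the two alternatives.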

Though quite general, the applicability of Vuong's model selection tests is currently limited for various reasons. First, because they are based on the likelihood function, these tests require that the competing models be completely parametrized. For instance, this implies that error terms in competing nonlinear regression models or simultaneous equations models must be specified to belong to some parametric family of distributions. As a consequence, Vuong's tests cannot be used to discriminate between two econometric models defined by moment conditions, or more generally, between two competing models that are incompletely specified.

The second limitation arises from the method of estimation. While ML estimation is quite natural when the model selection criterion is KLIC (see e.g. White (1982)), there are various reasons that may lead a researcher to use an estimation method other than ML. For instance, for computational simplicity, for robustness, or by necessity because the competing models are incompletely specified, one may use an instrumental variable (IV) estimator or, more generally, a generalized method of moments (GMM) estimator (see Hansen (1982)), a robust estimator (see Huber (1981), Hampel *et al.* (1986)), other extremum estimators (see Amemiya (1985), Gallant and White (1988)), a semiparametric estimator (see Andrews (1994), Newey and McFadden (1994), Powell (1994)), etc. Thus it is useful to provide a model selection testing framework that allows for a wide variety of estimation techniques.

Third, the maximum value of the likelihood (possibly adjusted) is not the only model selection criterion used in practice. For instance, when dealing with qualitative dependent variable models, alternative model selection criteria are Pearson-type goodness-of-fit statistics (see e.g. Moore (1978), Heckman (1981)). In linear regression models, criteria based on the in-sample MSEP are widely used (see e.g. Mallows (1973), Amemiya (1980)). When comparing the relative performance of macroeconometric models, a frequent criterion is the out-of-sample MSEP (see e.g. Meese and Rogoff (1983), Fair and Shiller (1990)). Another approach to the use of out-of-sample forecast performance for time series models is illustrated in Findley *et al.* (1998). Along these lines, recent contributions on model selection based on out-of-sample predictability are Diebold and Mariano (1995), West (1994, 1996), Granger and Pesaran (2000), and White (2000).

Moreover, when the models are incompletely specified, the use of criteria other than the ML values becomes necessary. For instance, Sargan (1958) and Pesaran and Smith (1994) have proposed using the value of the IV criterion function when the competing models are simultaneous equations models estimated by IV methods. For robustness reasons, Martin (1980), Ronchetti (1985) and Machado (1993) have proposed robust versions of the AIC and SIC criteria that replace the likelihood part by the extremal value of the sample objective function defining the robust estimator used. See also Konishi and Kitagawa (1996), who propose a generalized information criterion that can be used with robust parameter estimates. Though the preceding criteria have a goodness-of-fit appeal, a model selection criterion need not have such a property. For instance, the precision or MSE of the parameter estimates of interest in the competing models can be a criterion for model selection (see e.g. Toro-Vizcarrondo and Wallace (1968)). This list of criteria is clearly not exhaustive. It suggests, however, that the choice of a model selection criterion depends on the researcher and the purpose of the econometric modelling.

Fourth, Vuong's tests are derived for competing models that are completely static and for observations that are i.i.d. Clearly, the i.i.d. assumption is restrictive when considering time series data. Moreover, dynamic models are frequently considered in empirical work. For instance, a classical question is to determine the order of an ARMA process (see e.g. Hannan and Quinn (1979)). Some generalizations of Vuong's tests to time series models have been undertaken recently and independently of the present work. Findley (1990, 1991) essentially considers the case of competing Gaussian ARMA models when the true DGP is a strictly stationary process. Findley and Wei (1993) provide a generalization to some dynamic regression models.

The goal of the present paper is thus clear: to generalize Vuong's (1989) model selection tests in several important directions. First, the present paper allows for incompletely specified models such as econometric models defined by moment conditions. Second, it allows for a broad class of estimation methods that includes most estimators used in practice, such as the ML estimator, minimum chi-square estimators, GMM estimators, as well as other extremum estimators and some semiparametric estimators. In particular, we shall require that the estimators be √n-consistent. See Gallant and White (1988) and Newey and McFadden (1994) for the class of parametric and semiparametric estimators considered here. Third, the present paper allows for model selection criteria other than the models’ likelihoods. An important example is the out-of-sample MSEP. Lastly, our tests are obtained for weakly dependent heterogeneous data. This permits the application of our tests to the selection of nonlinear dynamic models in time series situations.

At the outset, it is useful to stress that a distinctive feature of our approach is that both competing models may be misspecified. In particular, our approach does not require that either competing model be correctly specified under the null hypothesis. Such a feature reflects the observation that one can seldom specify a statistical model that accurately describes the data in empirical work, especially in the social sciences. See e.g. Nakamura *et al.* (1990) for a similar point of view. As we shall see, however, this does not prevent model comparisons.

Not requiring correct specification of the competing models also contrasts with the tests of Cox (1961, 1962), which led to the development of a vast econometric literature on testing nonnested hypotheses (see Pesaran (1974) and the surveys by MacKinnon (1983) and McAleer (1987)).^{4} Indeed, under the null hypothesis in Cox's approach, one of the competing models is correctly specified. Moreover, Cox's tests are more difficult to compute than ours because they require a consistent estimate of the asymptotic mean of the test statistic under the null hypothesis. This also raises some theoretical difficulties when the competing models are incompletely specified. See e.g. Ghysels and Hall (1990) and Smith (1992).

The paper is divided into six more sections. Section 2 describes in some detail the general model selection framework for nonlinear dynamic models. A series of hypotheses about the asymptotic fit of the models are put forth. A basic result on the asymptotic properties of our test statistics is established under general conditions on the model selection criteria and the estimators used. The next two sections seek more primitive conditions ensuring that the conditions of this basic result hold. As the preceding examples of model selection procedures suggest, we distinguish two cases. Section 3 considers the case where model selection criteria can be viewed as optimands of the Gallant and White (1988) type, although the estimators employed are not those that optimize these criteria. In contrast, Section 4 specializes to the case where models are estimated by optimizing the criteria that are subsequently used for model selection. This section also covers the case where models are estimated by means of two-stage procedures in which the second stage involves optimizing goodness-of-fit conditional upon preliminary estimates of the nuisance parameters. Section 5 considers estimation of the asymptotic variance of the difference in goodness-of-fit despite possible misspecification of the competing models. This step is necessary for the construction of our test statistics. Section 6 discusses a critical condition for our test statistics to be asymptotically normal. Essentially, this condition requires that the estimated models be nonnested. Throughout, our theoretical results are illustrated with the comparison between two nonnested autoregressive models based on their in-sample and out-of-sample MSEP. Section 7 concludes. An appendix collects the proofs of our results.

### Tests statistics and general results


Two econometric models, and , are estimated using data generated by an unknown stochastic process. The DGP satisfies

#### Assumption 1.

{X_{t}}_{t=−∞}^{∞} is a p-dimensional stochastic process on a complete probability space . To simplify the notation, we use ω to indicate the whole sequence {X_{t}}_{t=−∞}^{+∞}. Hereafter, there will be various functions indexed by n in which ω appears as an argument. In most cases, these functions depend on ω through the vector (X_{1},…, X_{n}) corresponding to the period over which the competing models are compared. This need not always be the case, for instance when estimation of the competing models uses a different sample (see below).

For j = 1, 2, let γ^{j} denote the features of interest for the jth model and let Γ_{j} denote the set of possible values for γ^{j}. As usual, we require

#### Assumption 2.

For j = 1, 2, Γ_{j} is a compact subset of R^{k_{j}}. Thus γ^{j} and Γ_{j} can be viewed as the parameter vector of interest and the parameter space associated with the jth model, respectively.

Let denote an estimator of γ^{j}. At this point, the method of estimation plays no role. The estimator can be obtained from the same sample (X_{1},…, X_{n}) used for model comparison, i.e. the competing models are compared within sample. Alternatively, one may use a prior sample for estimation and the sample (X_{1},…, X_{n}) to assess the performance of the estimated competing models as in West (1994). One can also sequentially update with increasing estimation samples and compare (say) the one-period ahead predictability over the sample (X_{1},…, X_{n}) of the sequentially updated models as in West (1996). We require, however, an assumption about the asymptotic behavior of the estimators, which is satisfied in general.

#### Assumption 3.

For j = 1, 2, is a sequence of random vectors on such that for each n and there exists a nonstochastic sequence uniformly interior to Γ_{j} for which as n → ∞. Note that need not have a limit as n → ∞. In the literature, the are referred to as pseudo-true values. See Domowitz and White (1982), Bates and White (1985) and Gallant and White (1988), among others. The i.i.d. case considered in White (1982) and Vuong (1989) is considerably simpler because the pseudo-true values do not vary with the sample size. For dynamic model selection problems, it can be useful to allow for parameter drift. This can occur because the data generating process is nonstationary. But it may also occur with stationary data because of the sequence of specified models or estimation methods used.
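A standard illustration of a pseudo-true value, added here as a hypothetical example: if an AR(1) without intercept is fitted by least squares to data from a stationary Gaussian AR(2), y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + e_t, the estimator converges to the first autocorrelation φ_1/(1 − φ_2) (from the Yule–Walker equations) rather than to φ_1. The Python sketch below checks this by simulation; the function names and parameter values are illustrative assumptions.

```python
import random


def simulate_ar2(phi1, phi2, n, seed=0, burn=500):
    """Simulate y_t = phi1*y_{t-1} + phi2*y_{t-2} + e_t with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    lag1 = lag2 = 0.0
    path = []
    for t in range(n + burn):
        y = phi1 * lag1 + phi2 * lag2 + rng.gauss(0.0, 1.0)
        lag2, lag1 = lag1, y
        if t >= burn:  # drop start-up values
            path.append(y)
    return path


def ols_ar1_slope(y):
    """Least-squares slope of y_t on y_{t-1}, no intercept (a misspecified AR(1))."""
    num = sum(cur * prev for cur, prev in zip(y[1:], y[:-1]))
    den = sum(prev * prev for prev in y[:-1])
    return num / den


phi1, phi2 = 0.5, 0.3
pseudo_true = phi1 / (1.0 - phi2)  # first autocorrelation of the AR(2)
estimate = ols_ar1_slope(simulate_ar2(phi1, phi2, 200_000, seed=1))
print(f"pseudo-true {pseudo_true:.4f}, estimate {estimate:.4f}")
```

Here the pseudo-true value does not depend on n because the simulated DGP is stationary; with a nonstationary DGP, or with models that change with n, it could drift as described above.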

Next, most if not all known model selection procedures are defined by model selection criteria (see e.g. Linhart and Zucchini (1986)). Typically, such criteria are goodness-of-fit criteria. Hereafter, we view them as lack-of-fit criteria, i.e. the opposite of goodness-of-fit criteria. As seen in the introduction, however, selection criteria may emphasize other aspects such as the precision of parameter estimates of interest (see e.g. Toro-Vizcarrondo and Wallace (1968)). To be general, we consider selection criteria of the form Q_{n}^{j} (ω, γ^{j}) for each competing model. Note that the criterion typically depends on the features of interest γ^{j} for the jth model. It may include an Akaike (1973)-type correction factor for parsimony, as in Sin and White (1996). It is generally random because of its dependence on ω, which is the case when selection criteria use sample information, i.e. are statistics.

A first assumption on selection criteria covered by our theory is

#### Assumption 4.

For j = 1, 2, let {Q_{n}^{j} : Ω×Γ_{j} → R}_{n=1}^{∞} be a sequence of functions such that is measurable for every γ^{j}∈Γ_{j}. There exists an equicontinuous sequence of nonstochastic functions such that

as n → ∞. Typically, is the expectation of Q_{n}^{j} (ω, γ^{j}) with respect to ω. Thus Assumption 4 generally follows from a uniform strong law of large numbers (see e.g. Jennrich (1969), Andrews (1987)). As a matter of fact, many model selection criteria satisfy Assumption 4 under suitable regularity conditions (see Section 3). The best known is minus the average model log-likelihood −(1/n) log f_{n}^{j} (X_{1},…, X_{n}, γ^{j}), where f_{n}^{j} is the joint density of the first n observations associated with the parameter value γ^{j} of the jth model. Another common criterion is the MSEP based on n predictions, where X_{t} = (Y′_{t}, W′_{t})′ and {f_{t}^{j}} is a sequence of predictor functions known up to γ^{j}, as in West (1994, 1996).
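A minimal Python sketch of the two criteria just mentioned, assuming a scalar Y_t and a user-supplied predictor f_t(W_t, γ). The log-likelihood version additionally assumes that Y_t given W_t is normal with known variance and independent over t, which is only a simplifying assumption for the illustration; the names `msep`, `neg_avg_gaussian_loglik` and `linear` are hypothetical.

```python
from math import log, pi


def msep(y, w, predictor, gamma):
    """Q_n(gamma) = (1/n) * sum_t (y_t - f_t(w_t, gamma))**2 for scalar y_t."""
    n = len(y)
    return sum((y_t - predictor(t, w_t, gamma)) ** 2
               for t, (y_t, w_t) in enumerate(zip(y, w))) / n


def neg_avg_gaussian_loglik(y, w, predictor, gamma, sigma2):
    """-(1/n) * log-likelihood when y_t | w_t ~ N(f_t(w_t, gamma), sigma2),
    independently over t (an assumption made only for this sketch)."""
    return 0.5 * log(2.0 * pi * sigma2) + msep(y, w, predictor, gamma) / (2.0 * sigma2)


def linear(t, w_t, gamma):
    """Hypothetical predictor f_t(w, gamma) = gamma * w (the same for every t)."""
    return gamma * w_t


y, w = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
print(msep(y, w, linear, 0.5))  # residuals 0.5, 1.0, 1.5, so 3.5 / 3
```

In this Gaussian case the two criteria are monotone transformations of each other for fixed σ², so they rank values of γ identically; they differ once the variance is also a parameter.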

Given lack-of-fit criteria Q_{n}^{1} (ω, γ^{1}) and Q_{n}^{2} (ω, γ^{2}), and estimates and , which may come from the comparison sample (X_{1},…, X_{n}) or from another sample such as a prior one, a frequent procedure is to select the model with the smallest lack-of-fit measure . Such a model selection procedure can be given an asymptotic interpretation. Given Assumptions 1–4, it follows from Domowitz and White (1982, Theorem 2.3) that , for j = 1, 2. Hence

- (1)

where ΔQ_{n} (ω, γ^{1}, γ^{2}) = Q_{n}^{1} (ω, γ^{1}) − Q_{n}^{2} (ω, γ^{2}) and . The quantity can be interpreted as the difference between the asymptotic lack-of-fit of the competing models.

In view of the preceding remark, we shall consider various hypotheses comparing the (asymptotic) lack-of-fit of the competing models. Following Vuong (1989), the null hypothesis H_{0} is that the two models are *asymptotically equivalent* when

The first alternative hypothesis is that the first model is *asymptotically better* than the second when

Similarly, the second alternative hypothesis is that the second model is asymptotically better than the first:

Note that the limit of as n → ∞ may not exist under the alternative hypotheses H_{1} and H_{2}. Consequently, in the case of dynamic models with time series data, there is a third alternative, namely that the two models are *asymptotically incomparable*:

with at least one inequality being strict. In this case, there is some asymptotically nonnegligible region of the data for which the first model fits better than the second, or vice versa, without the models being asymptotically equivalent.^{5} Such a situation contrasts with the i.i.d. case considered by Vuong (1989), where the hypotheses H_{0}, H_{1} and H_{2} are exhaustive. This is because does not depend on n, so that its limit necessarily exists and H_{3} is void. More generally, H_{3} is void under strict stationarity of the DGP, as and will typically not depend on n.
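For reference, the hypotheses can be written compactly as follows. The notation is assumed for this sketch: \bar Q_n^j stands for the nonstochastic counterpart in Assumption 4, \gamma_n^{j*} for the pseudo-true values in Assumption 3, and \Delta\bar Q_n for the first model's asymptotic lack-of-fit minus the second's, so that negative values favor the first model. Written this way, H_0 to H_3 exhaust all possibilities.

```latex
% Assumed notation: \Delta\bar Q_n = \bar Q_n^1(\gamma_n^{1*}) - \bar Q_n^2(\gamma_n^{2*}).
\begin{aligned}
H_0 &: \lim_{n\to\infty} \Delta\bar Q_n = 0,\\
H_1 &: \limsup_{n\to\infty} \Delta\bar Q_n < 0,\\
H_2 &: \liminf_{n\to\infty} \Delta\bar Q_n > 0,\\
H_3 &: \liminf_{n\to\infty} \Delta\bar Q_n \le 0 \le \limsup_{n\to\infty} \Delta\bar Q_n,
      \ \text{with at least one inequality strict.}
\end{aligned}
```

When \Delta\bar Q_n does not depend on n, the lim inf and lim sup coincide, so H_3 cannot hold, in line with the remark on the i.i.d. case.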

As argued in the introduction, a simple numerical comparison of the sample values of the respective lack-of-fit criteria is not entirely satisfactory for it does not take into account sample variability. Significance of the difference in lack-of-fit needs to be assessed. To do so, we propose some tests with good asymptotic properties. We require additional regularity conditions. The first condition bears on lack-of-fit criteria and is similar to Assumption 4.

#### Assumption 5.

For j = 1, 2, and are continuously differentiable, and

as n → ∞. Moreover, is equicontinuous and is bounded.

The next one is a joint assumption on estimation methods and lack-of-fit criteria. Typically, it follows from a multivariate Central Limit Theorem.

#### Assumption 6.

For some integer s > 0, there exists a bounded sequence of nonstochastic s × (k + 2) matrices {C_{n}} and a sequence of s × 1 random vectors {Z_{n}} on with such that

- (2)

where k = k_{1}+ k_{2} and

- (3)

In particular, Assumption 6 requires that the estimators be √n-asymptotically normal. This is satisfied by a large class of common econometric estimators such as extremum estimators (see e.g. Amemiya (1985), Gallant and White (1988) and Section 4 below) as well as some semiparametric estimators (see Newey and McFadden (1994, Section 8)). Assumption 6 also requires that the lack-of-fit criteria evaluated at the pseudo-true values be √n-asymptotically normal. The latter condition will be verified in Section 3 for a large class of model selection criteria.

Let σ_{n}^{2}= L′_{n} C′_{n} C_{n} L_{n} where

- (4)

#### Assumption 7.

lim inf_{n}σ_{n}^{2} > 0. It turns out that σ_{n}^{2} is the asymptotic variance of the (√n-normalized) difference in lack-of-fit. The condition lim inf_{n}σ_{n}^{2} > 0 is crucial and is discussed in Section 6.

Lastly, we need a consistent estimator of this variance.

#### Assumption 8.

There exists a sequence of random variables on such that as n → ∞. In Section 5, we shall propose some consistent estimators of σ_{n}^{2}.
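Section 5 develops the estimators actually used in the paper. As a hedged illustration only: when sampling error in the estimates is asymptotically negligible (as in the case discussed after Theorem 2), σ_n^2 reduces to the long-run variance of the per-period differences in lack-of-fit, and a common choice is the Newey and West (1987b) estimator with Bartlett weights. The sketch below assumes that setting, demeans the series, and leaves the truncation lag to the user; the function name is hypothetical, and consistency requires the lag to grow with n at a suitable rate.

```python
def bartlett_long_run_variance(d, lags):
    """Newey-West (Bartlett-kernel) estimate of the long-run variance of d_t.

    d    : per-period differences in lack of fit, e.g. m_t^1 - m_t^2
    lags : truncation lag (must grow slowly with n for consistency)
    """
    n = len(d)
    mean = sum(d) / n
    u = [x - mean for x in d]  # demeaned series

    def autocov(tau):
        return sum(u[t] * u[t - tau] for t in range(tau, n)) / n

    s2 = autocov(0)
    for tau in range(1, lags + 1):
        weight = 1.0 - tau / (lags + 1.0)  # Bartlett weight
        s2 += 2.0 * weight * autocov(tau)
    return s2


print(bartlett_long_run_variance([1.0, -1.0, 1.0, -1.0], lags=1))  # 0.25
```

The Bartlett weights keep the estimate nonnegative, which matters here because its square root enters the denominator of the test statistic.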

We are now in a position to define our test statistic as a suitably normalized difference of the sample lack-of-fit criteria:

- (5)

Our model selection test compares the value of T_{n} with critical values of a standard normal distribution. Let α denote the desired (asymptotic) size of the test and z_{α/2} the value of the inverse standard normal distribution function evaluated at 1 − α/2. If T_{n} < −z_{α/2}, we reject H_{0} in favor of H_{1}; if T_{n} > z_{α/2}, we reject H_{0} in favor of H_{2}; otherwise, we accept H_{0}.
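The decision rule can be sketched in a few lines of Python. We assume, as the surrounding text suggests, that the statistic in (5) is T_n = √n ΔQ_n / σ̂_n, with ΔQ_n the difference in sample lack-of-fit at the estimates and σ̂_n the square root of the variance estimator of Assumption 8; the function name and its arguments are hypothetical.

```python
import math
from statistics import NormalDist


def model_selection_test(delta_q, sigma_hat, n, alpha=0.05):
    """Two-sided model selection test based on T_n = sqrt(n) * delta_q / sigma_hat.

    delta_q   : Q_n^1 - Q_n^2 at the estimates (lack of fit, so negative favors model 1)
    sigma_hat : consistent estimate of sigma_n (a standard deviation, not a variance)
    Returns (T_n, decision) with decision in {"H0", "H1", "H2"}.
    """
    if sigma_hat <= 0.0:
        raise ValueError("sigma_hat must be positive (cf. Assumption 7)")
    t_n = math.sqrt(n) * delta_q / sigma_hat
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    if t_n < -z:
        return t_n, "H1"  # model 1 significantly better
    if t_n > z:
        return t_n, "H2"  # model 2 significantly better
    return t_n, "H0"      # no significant difference


print(model_selection_test(-0.1, 1.0, 1000))  # T_n is about -3.16, so "H1"
```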

An asymptotic justification of the proposed test is given in the next theorem. Define the hypotheses

Under H_{0}^{*}, the competing models are √n-*asymptotically equivalent*. This constitutes a strengthening of H_{0}. Note also that H_{1}^{*} (say) contains the important case where has a finite and strictly negative limit as n → ∞. We have

#### Theorem 1.

Given Assumptions 1–8, σ_{n}^{2} is bounded and

- (i) under H_{0}^{*}, T_{n} → N(0, 1) in distribution,
- (ii) under H_{1}^{*}, T_{n} → −∞,
- (iii) under H_{2}^{*}, T_{n} → +∞.

Since H_{0}^{*}⊂ H_{0}, Theorem 1 shows that our test has correct asymptotic size on a subset of the null hypothesis of asymptotic equivalence H_{0}. The set H_{0}^{*}, however, contains important situations such as for all n sufficiently large. Theorem 1 also shows that the test is consistent against H_{1}^{*} and H_{2}^{*}. Since H_{1} implies H_{1}^{*} and H_{2} implies H_{2}^{*}, the test is consistent against a larger set of alternatives than H_{1} and H_{2}. Moreover, if and do not depend on n, as is typically the case under strict stationarity of the DGP, then H_{j}^{*}= H_{j} for j = 0, 1, 2, and our proposed test has the desired asymptotic size under the null hypothesis of interest H_{0} and is consistent against the alternatives H_{1} and H_{2}.
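In assumed notation (\bar Q_n^j for the nonstochastic counterparts of Assumption 4, \gamma_n^{j*} for the pseudo-true values, \Delta\bar Q_n = \bar Q_n^1(\gamma_n^{1*}) - \bar Q_n^2(\gamma_n^{2*}), and hats for the estimates), one reading of the statistic in (5) and of the strengthened hypotheses that is consistent with the discussion above is the following. It is a sketch under these assumptions rather than the paper's own displays.

```latex
\begin{aligned}
T_n &= \frac{\sqrt{n}\,\bigl[Q_n^1(\omega,\hat\gamma_n^1) - Q_n^2(\omega,\hat\gamma_n^2)\bigr]}{\hat\sigma_n},\\[4pt]
H_0^* &: \lim_{n\to\infty} \sqrt{n}\,\Delta\bar Q_n = 0,\\
H_1^* &: \lim_{n\to\infty} \sqrt{n}\,\Delta\bar Q_n = -\infty,\\
H_2^* &: \lim_{n\to\infty} \sqrt{n}\,\Delta\bar Q_n = +\infty.
\end{aligned}
```

Under this reading, a lim sup of \Delta\bar Q_n strictly below zero is magnified by √n, so H_1 implies H_1^*, while \sqrt{n}\,\Delta\bar Q_n \to 0 forces \Delta\bar Q_n \to 0, so H_0^* is contained in H_0.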

### Model selection tests without lack-of-fit minimization


Theorem 1 is derived under general assumptions. In this section and the next, we seek more primitive assumptions on estimators and lack-of-fit criteria that will imply Assumptions 3–6. We postpone the discussion of Assumptions 7 and 8 to Sections 6 and 5, respectively. Here we focus on the assumptions on the model selection criteria, namely Assumptions 4–6.

The framework of the preceding section is too general for our purpose. First, we need to be more precise on how the data are generated, i.e. on the DGP. We complement Assumption 1 by

#### Assumption 9.

For some r > 2, {X_{t}} is a φ-or α-mixing sequence such that φ_{m} is of size −r/(r − 1) or α_{m} is of size −2r/(r − 2), respectively. Definitions of φ-mixing, α-mixing and size can be found in Gallant and White (1988, Chapter 3) among others. Assumption 9 allows quite general time dependence and heterogeneity of the data generating process, though extensions of our results to even more general processes such as mixingales can be entertained as in Gallant and White (1988).
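Assumption 9 trades moments against dependence. A size that is larger in absolute value requires faster decay of the mixing coefficients. The following small Python sketch (function name hypothetical) tabulates the required sizes −r/(r − 1) and −2r/(r − 2) for a few values of r: as r grows they approach −1 and −2, while as r approaches 2 the α-mixing requirement becomes arbitrarily stringent.

```python
from fractions import Fraction


def mixing_sizes(r):
    """Sizes required by Assumption 9 for a moment index r > 2.

    Returns (phi_size, alpha_size) = (-r/(r-1), -2r/(r-2)) as exact fractions.
    """
    r = Fraction(r)
    if r <= 2:
        raise ValueError("Assumption 9 requires r > 2")
    return -r / (r - 1), -2 * r / (r - 2)


for r in (3, 4, 10):
    phi_size, alpha_size = mixing_sizes(r)
    print(r, phi_size, alpha_size)
```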

#### Assumption 10.

For j = 1, 2, there exist k_{j}× 1 random vectors {Y_{nt}^{j} ; t = 1,…, n, n = 1, 2,…} on with mean zero, bounded r th absolute moments, and near-epoch dependent upon {X_{t}} of size −1 such that

- (6)

where {A_{jn}^{+} ; n = 1, 2,…} are bounded nonstochastic symmetric k_{j}× k_{j} matrices. A definition and properties of near-epoch dependence can be found in Gallant and White (1988, Definition 3.13 and Chapter 4). Many econometric estimators defined by optimizing a stochastic function satisfy Assumption 10 (see Gallant and White (1988)). Some √n-asymptotically normal semiparametric estimators also possess such an asymptotically linear representation, as shown in Newey and McFadden (1994, Section 8).
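To illustrate the kind of asymptotically linear representation required in (6), consider a hypothetical correctly specified AR(1), y_t = φ y_{t−1} + e_t, estimated by least squares. Then √n(γ̂ − φ) equals (1 − φ^2) n^{−1/2} Σ y_{t−1}(y_t − φ y_{t−1}) up to a term that vanishes in probability, so the matrix playing the role of A_{jn}^+ is 1/E(y_{t−1}^2) = 1 − φ^2 and the mean-zero summands are y_{t−1}e_t. The Python sketch compares the two sides numerically; the setting and names are our assumptions, not an example from the paper.

```python
import math
import random


def linear_representation_gap(phi=0.6, n=100_000, seed=3):
    """Return (sqrt(n)*(gamma_hat - phi), A * n**-0.5 * sum_t Y_t) for OLS in an AR(1).

    Hypothetical setting: y_t = phi*y_{t-1} + e_t with e_t ~ N(0, 1), so the
    pseudo-true value is phi, Y_t = y_{t-1}*(y_t - phi*y_{t-1}) has mean zero,
    and A = 1 / E[y_{t-1}^2] = 1 - phi**2.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi ** 2)]  # stationary start
    for _ in range(n):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    lagged, current = y[:-1], y[1:]
    gamma_hat = sum(c * l for c, l in zip(current, lagged)) / sum(l * l for l in lagged)
    lhs = math.sqrt(n) * (gamma_hat - phi)
    scores = [l * (c - phi * l) for c, l in zip(current, lagged)]  # the Y_t terms
    rhs = (1.0 - phi ** 2) * sum(scores) / math.sqrt(n)
    return lhs, rhs


lhs, rhs = linear_representation_gap()
print(f"{lhs:.4f} vs {rhs:.4f}")  # close: the gap is o_p(1)
```

The only difference between the two sides is that the left uses the sample average of y_{t−1}^2 where the right uses its expectation, which is why the gap shrinks as n grows.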

Now, we turn to model selection criteria. Many common criteria actually take the form of an optimand of the type considered by Gallant and White (1988). For instance, this is the case for the log-likelihood and the MSEP seen earlier as well as for other criteria cited in the introduction and discussed below. Hence, it is natural to restrict the class of criteria by imposing the following primitive assumption.

#### Assumption 11.

For j = 1, 2,

- (7)

where and

- (i) m_{t}^{j}: Ω×Γ_{j} → R^{q_{j}} is measurable for each γ^{j} and continuously differentiable on Γ_{j}. Also, d_{j}: R^{q_{j}}×Γ_{j} → R is continuously differentiable on R^{q_{j}}×Γ_{j},
- (ii) {m_{t}^{j}(ω, γ^{j})} and {∂m_{t}^{j}(ω, γ^{j})/∂γ^{j′}} are almost surely Lipschitz-L_{1} on Γ_{j},
- (iii) {m_{t}^{j}(ω, γ^{j})} and {∂m_{t}^{j}(ω, γ^{j})/∂γ^{j′}} are r-dominated on Γ_{j} uniformly in t,
- (iv) {m_{t}^{j}(ω, γ^{j})} and {∂m_{t}^{j}(ω, γ^{j})/∂γ^{j′}} are near-epoch dependent upon {X_{t}} of size −1 and −1/2, respectively, where the first dependence is uniform on Γ_{j}.

The form (7) of the model selection criteria and the requirements (i)–(iv) are reminiscent of the optimands and the regularity conditions placed on them that are considered by Gallant and White (1988), though to simplify, we assume that is not indexed by n. Definitions of a random function that is almost surely Lipschitz-L_{1} and r-dominated can be found in Gallant and White (1988, Definitions 3.5 and 3.16).
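A minimal Python sketch of a criterion of the form (7), assuming (as the notation in (i) suggests) that Q_n^j is obtained by applying d_j to the sample average of the m_t^j. In the hypothetical example, m_t is the squared residual of a regression of y_t on w_t and d_j(m, γ) = {log(2πm) + 1}/2, which is the Gaussian negative average log-likelihood with the variance concentrated out; taking d_j(m, γ) = m instead gives the MSEP. All names are illustrative.

```python
import math


def sample_moments(m_t, gamma, data):
    """(1/n) * sum_t m_t(x_t, gamma), where m_t returns a tuple of length q_j."""
    values = [m_t(x, gamma) for x in data]
    n = len(values)
    return tuple(sum(v[i] for v in values) / n for i in range(len(values[0])))


def criterion(m_t, d, gamma, data):
    """Q_n(gamma) = d(sample moments, gamma), the form assumed for Assumption 11."""
    return d(sample_moments(m_t, gamma, data), gamma)


# Hypothetical example with x_t = (y_t, w_t) and q_j = 1.
def squared_residual(x, gamma):
    y_t, w_t = x
    return ((y_t - gamma * w_t) ** 2,)


def concentrated_gaussian(m, gamma):
    """Gaussian -(1/n) log-likelihood with sigma^2 replaced by the mean squared residual."""
    return 0.5 * (math.log(2.0 * math.pi * m[0]) + 1.0)


data = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
print(criterion(squared_residual, concentrated_gaussian, 2.0, data))
```

Separating m_t from d_j mirrors the assumption's structure: the averaging step carries the dependence and moment conditions, while d_j only needs to be smooth.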

As in the previous section, however, estimation plays no role. Specifically, we have in mind situations where a researcher has estimated two competing models *via* some estimation methods satisfying the regularity Assumption 10 and wishes to compare the estimated models according to their criterion values evaluated at the estimates. In contrast to the next section, the estimation methods used in this section need not optimize the selection criteria. Such situations arise frequently. A time-series example using the out-of-sample MSEP as a model selection criterion is fully worked out after Theorem 2. Many other examples can be given. For instance, an econometric example is provided by the comparison of two competing nonlinear simultaneous equations models based on their out-of-sample MSEP, where each model is defined by a set of orthogonality conditions and, as in Andrews and Fair (1988) and Gallant and White (1988), estimated by GMM. Other examples arise when competing models are estimated by ML or other methods on ungrouped data and their lack-of-fit is evaluated by Pearson-type chi-square statistics on grouped data (see e.g. Heckman (1981)). Such situations are considered in Vuong and Wang (1993b).

Let q = q_{1}+ q_{2} and k = k_{1}+ k_{2}. Define the (q + k) × (q + k) matrix

- (8)

Let R_{n}= (R_{n}^{1′}, −R_{n}^{2′})′ where R_{n}^{j} is the (q_{j}+ k_{j})-dimensional vector

- (9)

and and are the partial derivatives of d_{j} (m^{j}, γ^{j}) with respect to m^{j} and γ^{j} evaluated at .

#### Theorem 2.

Given Assumptions 1–3 and 7–11, suppose that V_{n} is uniformly of rank s for some s > 0. The conclusions of Theorem 1 then hold with σ_{n}^{2}= R′_{n} V_{n} R_{n} where V_{n}= O(1) and R_{n}= O(1).

From (8), the asymptotic variance σ_{n}^{2} generally depends on how γ^{j} is estimated, through the asymptotic variance of its estimator. However, when d_{j}(m^{j}, γ^{j}) does not depend on γ^{j} and the expected derivative of M_{n}^{j} with respect to γ^{j} vanishes at the limiting value of the estimator, the last k_{j} elements of R_{n}^{j} are zero (see also Section 4). The asymptotic variance σ_{n}^{2} then depends only on the asymptotic variance of the (m^{1}, m^{2}) components in (8). That is, γ^{j} can be treated as if it were *known*, since sampling uncertainty due to its estimation becomes asymptotically irrelevant for testing the null hypothesis H_{0}^{*} of √n-asymptotic equivalence of the competing models. In particular, under weak stationarity of the DGP, this condition on the expected derivative appears in West (1994), West (1996) and is shown to be satisfied by the out-of-sample MSEP in (non)linear regressions estimated by (non)linear least squares.
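The mechanism behind this remark is elementary linear algebra. As an illustrative assumption (the actual ordering of components is fixed by (8)), reorder the (q + k)-vector whose asymptotic variance is V_{n} so that the q criterion components come first and the k estimation components last, and partition R_{n} and V_{n} conformably; the block labels r_{m,n}, r_{γ,n}, V_{mm,n}, etc. are ours:

$$
R_n=\begin{pmatrix} r_{m,n}\\ r_{\gamma,n}\end{pmatrix},\qquad
V_n=\begin{pmatrix} V_{mm,n} & V_{m\gamma,n}\\ V_{\gamma m,n} & V_{\gamma\gamma,n}\end{pmatrix},
$$

$$
\sigma_n^2=R_n'V_nR_n=r_{m,n}'V_{mm,n}\,r_{m,n}+2\,r_{m,n}'V_{m\gamma,n}\,r_{\gamma,n}+r_{\gamma,n}'V_{\gamma\gamma,n}\,r_{\gamma,n}.
$$

When r_{γ,n} = 0, only the first term survives, so every block of V_{n} involving the estimation components drops out of σ_{n}^{2}.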

The following example illustrates the preceding result as well as the verification of its assumptions.^{6}

#### Example

Consider the problem of choosing between the following autoregressive (AR) models for the univariate process {Y_{t}}_{−∞}^{+∞}:

$$Y_t=\gamma^{j}Y_{t-j}+\varepsilon_{jt},\qquad j=1,2,$$

where it is specified that γ^{j}∈Γ_{j}= [−1, 1] and that the ε_{jt} are uncorrelated with mean zero and variance τ_{j}^{2}∈ϒ_{j}⊂ [0, +∞), for j = 1, 2.

A model selection procedure frequently used in macroeconomic modelling is to compare the out-of-sample MSEP of the estimated competing models. Specifically, suppose one has a sample of n = 2n^{*} observations (Y_{1},…, Y_{n}), and the first half is used for estimation while the second half is reserved for model comparison.^{7} For j = 1, 2, the out-of-sample MSEP of the AR(j) model is Q_{n}^{j}(ω, γ̂_{n}^{j}), where

$$Q_n^{j}(\omega,\gamma^{j})=\frac{1}{n^{*}}\sum_{t=n^{*}+1}^{n}\bigl(Y_t-\gamma^{j}Y_{t-j}\bigr)^2$$

and γ̂_{n}^{j} is some estimator of γ^{j} based on the first half of the sample. Here, we take γ̂_{n}^{j} to be the Yule–Walker estimator of the jth-order autocorrelation coefficient ρ_{j}^{o}, i.e. the sample jth-order autocorrelation computed from the first half of the sample (see e.g. Fuller (1976, p. 327)). In particular, γ̂_{n}^{j} belongs to Γ_{j}, although it does not minimize the out-of-sample MSEP criterion Q_{n}^{j}(ω, γ^{j}).
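To make the criterion concrete, here is a minimal Python/NumPy sketch of the out-of-sample MSEP with the Yule–Walker fit on the first half. It assumes a zero-mean series stored in a NumPy array whose first element is Y_{1}, and it uses the uncentred sample autocorrelation with divisor Σ_{t=1}^{n*}Y_{t}^{2}; this normalization is one common Yule–Walker convention and is our assumption. The function names are hypothetical.

```python
import numpy as np

def yule_walker_rho(y_est: np.ndarray, j: int) -> float:
    """Sample j-th order autocorrelation of the estimation half.

    No demeaning, since the AR(j) models have no intercept (an assumption
    of this sketch). By Cauchy-Schwarz the result lies in [-1, 1].
    """
    num = np.dot(y_est[j:], y_est[:-j])  # sum_{t=j+1}^{n*} Y_t Y_{t-j}
    den = np.dot(y_est, y_est)           # sum_{t=1}^{n*} Y_t^2
    return float(num / den)

def out_of_sample_msep(y: np.ndarray, j: int) -> float:
    """Q_n^j at the Yule-Walker estimate: fit on the first half, then
    average squared one-step errors Y_t - g Y_{t-j} over the second half."""
    n = len(y)
    if n % 2 != 0:
        raise ValueError("the example assumes n = 2 n* observations")
    n_star = n // 2
    g = yule_walker_rho(y[:n_star], j)
    t = np.arange(n_star, n)      # 0-based positions of t = n*+1, ..., n
    errors = y[t] - g * y[t - j]  # Y_{t-j} may lie in the first half
    return float(np.mean(errors ** 2))

# Usage on a simulated Gaussian AR(1) with coefficient 0.5 (illustrative only).
rng = np.random.default_rng(0)
e = rng.standard_normal(2_100)
y = np.zeros_like(e)
for s in range(1, len(e)):
    y[s] = 0.5 * y[s - 1] + e[s]
y = y[100:]  # drop burn-in, leaving n = 2000
print(out_of_sample_msep(y, 1), out_of_sample_msep(y, 2))
```

Splitting the sample this way keeps the estimation and evaluation periods disjoint, which is what distinguishes Q_{n}^{j} from the in-sample criterion used later in the paper.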

We now verify that Assumptions 1–3 and 9–11 of Theorem 2 are satisfied. Assumptions 7 and 8 will be discussed in Sections 6 and 5, respectively. Hereafter, we assume that {Y_{t}} is generated by a finite-order ARMA process with roots outside the unit circle and i.i.d. Gaussian innovations.^{8} In particular, {Y_{t}} is strictly stationary (see e.g. Hayashi (2000, Propositions 6.1 and 6.5)) and, by Ibragimov and Linnik (1971), α-mixing of arbitrary size.^{9} For reasons that become clear below, define X_{t}= (Y_{t}, Y_{t}^{*})′, where Y_{t}^{*}= 0 if |t| is odd and Y_{t}^{*}= Y_{t/2} if |t| is even or zero. Given the definitions of {X_{t}} and Γ_{j}, Assumptions 1, 2 and 9 are trivially satisfied.

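Turning to Assumption 11, which bears on the out-of-sample MSEP, let M_{n}^{j}(ω, γ^{j}) = (M_{1n}^{j}(ω, γ^{j}), M_{2n}^{j}(ω, γ^{j}))′, where

$$M_{1n}^{j}(\omega,\gamma^{j})=\frac{1}{n}\sum_{t=1}^{n}\bigl(Y_t-\gamma^{j}Y_{t-j}\bigr)^2,\qquad M_{2n}^{j}(\omega,\gamma^{j})=\frac{1}{n}\sum_{t=1}^{n}\bigl(Y_t^{*}-\gamma^{j}Y_{t-2j}^{*}\bigr)^2.$$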

Note that $M_{2n}^{j}(\omega,\gamma^{j})=\frac{1}{n}\sum_{t=1}^{n^{*}}(Y_t-\gamma^{j}Y_{t-j})^2$ from the definition of Y_{t}^{*}. Let d_{j}(m_{1}, m_{2}) = 2(m_{1}− m_{2}). Since n = 2n^{*}, it is easy to see that the out-of-sample MSEP criterion Q_{n}^{j}(ω, γ^{j}) = d_{j}{M_{n}^{j}(ω, γ^{j})} is of the form (7) with m_{t}^{j}(ω, γ^{j}) = (m_{1t}^{j}(ω, γ^{j}), m_{2t}^{j}(ω, γ^{j}))′= ((Y_{t}−γ^{j}Y_{t−j})^{2}, (Y_{t}^{*}−γ^{j}Y_{t−2j}^{*})^{2})′. Moreover, d_{j} and m_{t}^{j} clearly satisfy Assumption 11(i). Regarding Assumption 11(ii)–(iv), we verify them for {m_{1t}^{j}(ω, γ^{j})}; similar arguments apply to {m_{2t}^{j}(ω, γ^{j})}.
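The identity Q_{n}^{j}(ω, γ^{j}) = 2{M_{1n}^{j}(ω, γ^{j}) − M_{2n}^{j}(ω, γ^{j})} can be checked numerically. The sketch below builds Y_{t}^{*} exactly as defined above and compares both sides for arbitrary γ^{j}. The helper names are hypothetical, and the simulated pre-sample values Y_{1−j},…,Y_{0} (needed for the lags at t ≤ j) are an assumption of the illustration.

```python
import numpy as np

def make_series(y_pre, y):
    """Return Y(t) for integer t >= 1 - len(y_pre): Y(1..n) is the sample,
    earlier dates are pre-sample values (needed for the lags at t <= j)."""
    data = np.concatenate([np.asarray(y_pre, float), np.asarray(y, float)])
    offset = len(y_pre) - 1  # so that Y(1) is data[len(y_pre)]
    return lambda t: data[t + offset]

def y_star(Y, t):
    """Y*_t = Y_{t/2} if |t| is even (including t = 0), and 0 if |t| is odd."""
    return Y(t // 2) if t % 2 == 0 else 0.0

def criterion_via_M(Y, n, j, g):
    """d_j(M_1n, M_2n) = 2 (M_1n - M_2n), with M_kn = (1/n) sum_{t=1}^n m_kt."""
    m1 = np.mean([(Y(t) - g * Y(t - j)) ** 2 for t in range(1, n + 1)])
    m2 = np.mean([(y_star(Y, t) - g * y_star(Y, t - 2 * j)) ** 2
                  for t in range(1, n + 1)])
    return 2.0 * (m1 - m2)

def criterion_direct(Y, n, j, g):
    """(1/n*) sum_{t=n*+1}^{n} (Y_t - g Y_{t-j})^2 with n = 2 n*."""
    n_star = n // 2
    return float(np.mean([(Y(t) - g * Y(t - j)) ** 2
                          for t in range(n_star + 1, n + 1)]))

rng = np.random.default_rng(1)
Y = make_series(rng.standard_normal(2), rng.standard_normal(200))
print(criterion_via_M(Y, 200, 2, 0.3), criterion_direct(Y, 200, 2, 0.3))
```

The odd dates contribute nothing to M_{2n}^{j}, and the even dates t = 2s reproduce the first-half terms (Y_{s} − γ^{j}Y_{s−j})^{2}, which is why the difference isolates the evaluation half.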

Now, {m_{1t}^{j}(ω, γ^{j})} is almost surely Lipschitz-L_{1} on Γ_{j} by Gallant and White (1988, pp. 21–22). We have ∂m_{1t}^{j}(ω, γ^{j})/∂γ^{j}= −2(Y_{t}−γ^{j}Y_{t−j})Y_{t−j}. Hence, |∂m_{1t}^{j}(ω, γ^{j})/∂γ^{j}−∂m_{1t}^{j}(ω, γ_{o}^{j})/∂γ^{j}| = 2Y_{t−j}^{2}|γ^{j}−γ_{o}^{j}|, showing that {∂m_{1t}^{j}(ω, γ^{j})/∂γ^{j}} is almost surely Lipschitz-L_{1} on Γ_{j}, since E(Y_{t−j}^{2}) is constant and finite. Hence Assumption 11(ii) is satisfied. Next, using |γ^{j}| ≤ 1 together with the Minkowski and Cauchy–Schwarz inequalities, we have

whenever r > 1, where ‖·‖_{r} denotes the L_{r} norm. This establishes Assumption 11(iii), since E(Y_{t}^{2r}) is constant and finite (see Gallant and White (1988, p. 33)).^{10} Lastly, because they involve at most j lags of X_{t}, {m_{1t}^{j}(ω, γ^{j})} and {∂m_{1t}^{j}(ω, γ^{j})/∂γ^{j}} are near-epoch dependent on {X_{t}} of any size. This near-epoch dependence is clearly uniform on Γ_{j}, establishing Assumption 11(iv).

It remains to verify Assumptions 3 and 10, which bear on the estimators γ̂_{n}^{j}, j = 1, 2. Following the argument in Gallant and White (1988, p. 49), is a strongly consistent estimator of , for j = 1, 2. Let be n^{*}/(n^{*}− j) times the latter quantity. Because n^{*}/(n^{*}− j) → 1 as n → ∞, it follows that . Moreover, because the process {Y_{t}} is weakly stationary, it is easy to see that , the jth-order autocorrelation coefficient of {Y_{t}}, which is constant and in the interior of Γ_{j}. Hence Assumption 3 is satisfied. Moreover, we have

since (1/n^{*}) and by a variety of Laws of Large Numbers and Central Limit Theorems (see e.g. Theorems 3.15 and 5.3 in Gallant and White (1988) for near-epoch dependent functions of mixing processes), and . Hence, the estimator satisfies the asymptotically linear representation (6) with A_{jn}^{+}= 2/E(Y_{t}^{2}) and Y_{nt}^{j}= (ρ_{j}^{o}Y_{t}^{2}− Y_{t}Y_{t−j})I(t ≤ n^{*}), where I(·) denotes the indicator of the event in parentheses. Note that Y_{nt}^{j} has mean zero because E(Y_{t}Y_{t−j}) = ρ_{j}^{o}E(Y_{t}^{2}). Thus, because {A_{jn}^{+}} is constant and {Y_{nt}^{j}} is near-epoch dependent on {Y_{t}} of any size, and hence near-epoch dependent on {X_{t}} of size −1, Assumption 10 is satisfied.
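As a sanity check of the asymptotically linear representation, the following sketch simulates a Gaussian AR(1), which is one instance of the finite ARMA setting assumed here, and compares √n(ρ̂_{j} − ρ_{j}^{o}) with −A_{jn}^{+}n^{−1/2}Σ_{t}Y_{nt}^{j}. The minus sign reflects our reading of (6) and is an assumption; with it, the two quantities differ only because the sample second moment replaces E(Y_{t}^{2}), so the gap shrinks at rate n^{−1/2}. All names are illustrative.

```python
import numpy as np

def simulate_ar1(n, phi, rng, burn=500):
    """Gaussian AR(1) Y_t = phi Y_{t-1} + e_t with unit innovation variance."""
    e = rng.standard_normal(n + burn)
    y = np.empty(n + burn)
    y[0] = e[0] / np.sqrt(1.0 - phi ** 2)  # draw from the stationary law
    for s in range(1, n + burn):
        y[s] = phi * y[s - 1] + e[s]
    return y[burn:]

def linearization_check(y, j, rho, gamma0):
    """Return (lhs, rhs) with lhs = sqrt(n)(rho_hat - rho) and
    rhs = -A^+ n^{-1/2} sum_t Y_nt, where A^+ = 2/gamma0 and
    Y_nt = (rho Y_t^2 - Y_t Y_{t-j}) I(t <= n*); cross products that would
    need pre-sample values are dropped, as in the estimator itself."""
    n = len(y)
    n_star = n // 2
    first = y[:n_star]
    cross = np.dot(first[j:], first[:-j])  # sum_{t=j+1}^{n*} Y_t Y_{t-j}
    sq = np.dot(first, first)              # sum_{t=1}^{n*} Y_t^2
    lhs = np.sqrt(n) * (cross / sq - rho)
    rhs = -(2.0 / gamma0) * (rho * sq - cross) / np.sqrt(n)
    return float(lhs), float(rhs)

rng = np.random.default_rng(123)
phi = 0.5
y = simulate_ar1(200_000, phi, rng)
gamma0 = 1.0 / (1.0 - phi ** 2)  # Var(Y_t) for unit innovation variance
for j in (1, 2):
    print(j, linearization_check(y, j, phi ** j, gamma0))  # rho_j = phi^j
```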

Provided Assumptions 7 and 8 are satisfied and V_{n} is of uniform rank (see Sections 6 and 5), Theorem 2 applies to the out-of-sample MSEP for comparing the above AR(1) and AR(2) models. That is, the quantity

- (10)

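can be used as a model selection statistic for testing the null hypothesis H_{0}^{*} of √n-asymptotic equivalence. In particular, from the proof of Theorem 2 and the weak stationarity of {Y_{t}}, the expected squared prediction error of the AR(j) model at ρ_{j}^{o} is E(Y_{t}−ρ_{j}^{o}Y_{t−j})^{2} = γ_{Y}(0)(1 − ρ_{j}^{o2}), where γ_{Y}(·) is the autocovariance function of {Y_{t}}. Hence, the difference between the expected criteria of the two models is γ_{Y}(0)(ρ_{2}^{o2}− ρ_{1}^{o2}), which is independent of n. Therefore, the hypotheses H_{0}, H_{1} and H_{2} are identical to H_{0}^{*}, H_{1}^{*} and H_{2}^{*}, respectively, which reduce to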

- (11)
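For completeness, the expected squared prediction error follows from a one-line expansion, assuming (as in the ARMA setting above) that {Y_{t}} has mean zero, so that E(Y_{t}Y_{t−j}) = γ_{Y}(j), and using ρ_{j}^{o} = γ_{Y}(j)/γ_{Y}(0):

$$
\begin{aligned}
E\bigl(Y_t-\rho_j^{o}Y_{t-j}\bigr)^2
&=\gamma_Y(0)-2\rho_j^{o}\gamma_Y(j)+(\rho_j^{o})^2\gamma_Y(0)\\
&=\gamma_Y(0)-2(\rho_j^{o})^2\gamma_Y(0)+(\rho_j^{o})^2\gamma_Y(0)
=\gamma_Y(0)\bigl\{1-(\rho_j^{o})^2\bigr\}.
\end{aligned}
$$

The AR(1)-minus-AR(2) difference is therefore γ_{Y}(0){(ρ_{2}^{o})^{2} − (ρ_{1}^{o})^{2}}. Since γ_{Y}(0) > 0, it is zero exactly when (ρ_{1}^{o})^{2} = (ρ_{2}^{o})^{2}, and negative, favouring the AR(1) model, exactly when |ρ_{1}^{o}| > |ρ_{2}^{o}|.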

Moreover, because E∂M_{n}^{j}(ω, ρ_{j}^{o})/∂γ^{j}= 0 and d_{j} does not depend on γ^{j}, the asymptotic variance σ_{n}^{2} does not depend on the sampling variability of the Yule–Walker estimators, so that ρ_{j}^{o} can be treated as known in the computation of σ_{n}^{2}, as noted after Theorem 2. See also Section 5.

### Model selection tests with lack-of-fit minimization


Up to now, the methods used to estimate the competing models need not optimize the criteria used for model selection. It frequently happens, however, that the estimators minimize the chosen lack-of-fit measures. The most common situation arises when the competing models are fully parametric and estimated by ML. A frequent criterion is then the model log-likelihood, possibly adjusted for parsimony (see e.g. Akaike (1973), Akaike (1974)). When the observations are i.i.d., this situation is analyzed in Vuong (1989) for general parametric models and in Lien and Vuong (1987) for normal linear regressions. Other examples arise when fully parametric models are estimated by minimum chi-square methods and Pearson-type statistics are used as a criterion for model selection (see Vuong and Wang (1991), Vuong and Wang (1993a)).

When the competing models are not fully parametrized, an important econometric example is given by nonlinear simultaneous equation models, where each competing model is defined by a set of implicit simultaneous equations and a set of orthogonality conditions. Each model is then estimated by nonlinear IV or GMM (see Amemiya (1985), Hansen (1982) and Gallant and White (1988), among others). Following Sargan (1958), Newey and West (1987a) and, more recently, Pesaran and Smith (1994), when both competing models are overidentified, the value of the GMM optimand evaluated at the GMM estimator can serve as the basis for hypothesis testing and, more generally, for model selection in nested and nonnested situations.

The simultaneous equation example is interesting because the lack-of-fit criterion depends on some nuisance parameters associated with the weighting matrix used in GMM estimation. To include such situations in our analysis, we partition the parameter vector γ^{j} into the parameter vector of interest θ^{j} and the vector of nuisance parameters τ^{j}. Similarly to Assumption 11, model selection criteria are now assumed to satisfy

#### Assumption 12.

For j = 1, 2, Γ_{j}=Θ_{j}×ϒ_{j}, where Θ_{j} and ϒ_{j} are compact subsets of finite-dimensional Euclidean spaces. Moreover,

- (12)

where γ^{j}= (θ^{j′}, τ^{j′})′, and

- (i) m_{t}^{j}: Ω×Θ_{j}→ R^{q_{j}} is measurable for each θ^{j} and twice continuously differentiable on Θ_{j}. Also, d_{j}: R^{q_{j}}×Γ_{j}→ R is twice continuously differentiable on R^{q_{j}}×Γ_{j},
- (ii) {m_{t}^{j}(ω, θ^{j})}, {∂m_{t}^{j}(ω, θ^{j})/∂θ^{j′}} and {∂^{2}m_{t}^{j}(ω, θ^{j})/∂θ^{j}∂θ^{j′}} are almost surely Lipschitz-L_{1} on Θ_{j},
- (iii) {m_{t}^{j}(ω, θ^{j})}, {∂m_{t}^{j}(ω, θ^{j})/∂θ^{j′}} and {∂^{2}m_{t}^{j}(ω, θ^{j})/∂θ^{j}∂θ^{j′}} are r-dominated on Θ_{j} uniformly in t,
- (iv) {m_{t}^{j}(ω, θ^{j})}, {∂m_{t}^{j}(ω, θ^{j})/∂θ^{j′}} and {∂^{2}m_{t}^{j}(ω, θ^{j})/∂θ^{j}∂θ^{j′}} are near-epoch dependent upon {X_{t}} of size −1, −1 and −1/2, respectively, where the first two dependencies are uniform on Θ_{j}.

Conditions (i)–(iv) are similar to those used by Gallant and White (1988) and strengthen conditions (i)–(iv) of Assumption 11. These authors consider the case where optimands are of the form Q_{n}^{j} (ω, θ^{j}) = d_{j} {M_{n}^{j} (ω, θ^{j})}. Bates and White (1985) consider the case where Q_{n}^{j} (ω, γ^{j}) = d_{j} {M_{n}^{j} (ω, θ^{j}), τ^{j}}. These are special cases of (12). On the other hand, Andrews and Fair (1988) consider the case where Q_{n}^{j} (ω, γ^{j}) = d_{j} {M_{n}^{j} (ω, θ^{j}, τ^{j}), τ^{j}}. The present formulation was preferred because it includes minimum Pearson chi-square estimation (see Vuong and Wang (1991), Vuong and Wang (1993a)).^{11}

In this section, estimators of the parameters of interest are obtained by minimizing the model selection criteria conditional upon some preliminary estimates of the nuisance parameters τ^{j}. That is,

- (13)

for j = 1, 2. The lack-of-fit associated with model j is then the criterion evaluated at these estimates. Conditions on the asymptotic behavior of the nuisance parameter estimators are required. We assume the following.

#### Assumption 13.

For j = 1, 2, let be such that there exists a nonstochastic sequence uniformly interior to ϒ_{j} for which as n → ∞. Moreover, there exist h_{j}× 1 random vectors {Y_{2nt}^{j}; t = 1,…, n, n = 1, 2,…}, defined on the underlying probability space, with mean zero, bounded rth absolute moments, and near-epoch dependent upon {X_{t}} of size −1, such that

- (14)

where {A_{2jn}^{+}; n = 1, 2,…} are bounded nonstochastic symmetric h_{j}× h_{j} matrices. Assumption 13 is standard and is satisfied by many optimization estimators (see Gallant and White (1988)). Unlike Bates and White (1985) and Andrews and Fair (1988), however, we do not impose conditions on the cross partial derivatives with respect to θ^{j} and τ^{j} of the lack-of-fit criterion or optimand (12), so that estimation of the nuisance parameters may affect the asymptotic distribution of the estimators of θ^{j}. This extension is useful for minimum chi-square estimation and for model selection tests based on Pearson-type chi-square statistics (see Vuong and Wang (1991), Vuong and Wang (1993a)).

We need an identification condition similar to those used in the literature.

#### Assumption 14.

Let and let ∂θ^{j′}.

- (i) The sequence has identifiably unique minimizers uniformly interior to Θ_{j}.
- (ii) The sequence of matrices A_{1jn} is uniformly positive definite.

For a definition of identifiable uniqueness, see Domowitz and White (1982) or Gallant and White (1988).

Let h = h_{1}+ h_{2}. Define the (q + h) × (q + h) matrix

- (15)

Let R_{n}= (R_{n}^{1′}, −R_{n}^{2′})′ where R_{n}^{j} is the (q_{j}+ h_{j})-dimensional vector

- (16)

and and are the partial derivatives of d_{j} (m^{j}, θ^{j}, τ^{j}) with respect to m^{j} and τ^{j} evaluated at .

The next theorem gives the basic result for model selection when the estimators used minimize (possibly in two steps) the lack-of-fit criteria. Relative to Theorem 1, it replaces the general Assumptions 2–6 by the more primitive Assumptions 9 and 12–14. The theorem also gives the corresponding expression for the asymptotic variance σ_{n}^{2}.

#### Theorem 3.

Given Assumptions 1, 7–9 and 12–14, suppose that V_{n} is uniformly of rank s for some s > 0. The conclusions of Theorem 1 then hold with σ_{n}^{2}= R′_{n} V_{n} R_{n} where R_{n} and V_{n} are now defined by (16) and (15).

Note that the asymptotic variance σ_{n}^{2} does not depend on the first partial derivative of d_{j}(m^{j}, θ^{j}, τ^{j}) with respect to θ^{j}. More importantly, it is easy to see that σ_{n}^{2} is the same as if the limiting values of the estimators of θ^{j}, j = 1, 2, were *known*. That is, when the estimators of θ^{j} minimize the chosen model selection criteria, sampling uncertainty due to the estimation of θ^{j} becomes asymptotically irrelevant for testing the null hypothesis H_{0}^{*} of √n-asymptotic equivalence of the competing models. This is the case when the competing models are estimated by ML and compared on the basis of their likelihood values, possibly adjusted as in Sin and White (1996).^{12} This is also the case for competing (non)linear regressions estimated by (non)linear least squares and compared *via* their in-sample MSEP, and hence their out-of-sample MSEP under covariance stationarity of the DGP, as in West (1994), West (1996). See the example below.

In contrast, when , then σ_{n}^{2} and the asymptotic distribution of the difference in lack-of-fit depend on how the nuisance parameters τ^{j} are estimated. This is so whether or not , i.e. whether or not estimation of the nuisance parameters τ^{j} affects the asymptotic distributions of the estimators of the parameters of interest. Such a result is surprising, but in fact agrees with Theorem 2 for the special case where does not depend on θ^{j} and h_{j}= k_{j} so that τ^{j}=γ^{j}.

#### Example

[Continued] Consider again the problem of choosing between the AR(1) and AR(2) models of Section 3. Instead of the out-of-sample MSEP, we now use the in-sample MSEP , where minimizes over Γ_{j} for j = 1, 2. Note that this minimizer is the least-squares estimator of γ^{j} constrained to the compact set Γ_{j}, and is in general not equal to the jth-order autocorrelation (Yule–Walker) estimator. As before, to verify the assumptions of Theorem 3 easily, we assume that {Y_{t}} is generated by a finite-order ARMA process with roots outside the unit circle and i.i.d. Gaussian innovations. Thus Assumptions 1 and 9 are satisfied with X_{t}= Y_{t}.
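A short sketch contrasting the two estimators: because the in-sample criterion is a convex quadratic in γ^{j}, its minimizer over Γ_{j} = [−1, 1] is the unconstrained least-squares slope clipped to that interval, whereas the Yule–Walker estimator divides the same cross-product by the full sum of squares. The function names and the toy series are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def constrained_ls(y, j, lo=-1.0, hi=1.0):
    """Minimize (1/n) sum_{t=j+1}^{n} (Y_t - g Y_{t-j})^2 over g in [lo, hi].

    The objective is a convex quadratic in g, so its minimizer over an
    interval is the unconstrained least-squares solution clipped to it."""
    num = np.dot(y[j:], y[:-j])
    den = np.dot(y[:-j], y[:-j])  # sum_{t=1}^{n-j} Y_t^2 (regressor only)
    return float(np.clip(num / den, lo, hi))

def yule_walker(y, j):
    """Sample j-th autocorrelation: same numerator, full sum of squares."""
    return float(np.dot(y[j:], y[:-j]) / np.dot(y, y))

y = np.array([1.0, 2.0, 3.0, 4.0])
print(constrained_ls(y, 1), yule_walker(y, 1))  # the two generally differ
```

On this trending toy series the unconstrained slope exceeds one, so the constraint binds, while the Yule–Walker value stays strictly inside the interval.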

Let . Hence, also minimizes Q_{n}^{j}(ω, γ^{j}) over Γ_{j} for j = 1, 2. Because there are no nuisance parameters, Assumption 13 holds trivially. Moreover, Q_{n}^{j}(ω, γ^{j}) is of the form (12) with γ^{j}=θ^{j}, d_{j}(m) = m, and m_{t}^{j}(ω, θ^{j}) = (Y_{t}−γ^{j}Y_{t−j})^{2} if t ≥ j + 1 and m_{t}^{j}(ω, θ^{j}) = 0 if t ≤ j. Thus, as in Section 3, d_{j}, {m_{t}^{j}(ω, θ^{j})} and {∂m_{t}^{j}(ω, θ^{j})/∂θ^{j}} satisfy the corresponding requirements of Assumption 12 (i)–(iv). Since ∂^{2}m_{t}^{j}(ω, θ^{j})/∂θ^{j}∂θ^{j′}= 2Y_{t−j}^{2} does not depend on θ^{j}, it is trivially almost surely Lipschitz-L_{1}, r-dominated and near-epoch dependent of any size, so Assumption 12 is satisfied. Next, consider Assumption 14. Using the weak stationarity of {Y_{t}}, we have EQ_{n}^{j}(ω, γ^{j}) = {(n − j)/n}{γ_{Y}(0) − 2γ^{j}γ_{Y}(j) + (γ^{j})^{2}γ_{Y}(0)}. Thus EQ_{n}^{j}(ω, γ^{j}) is minimized uniquely at ρ_{j}^{o}= γ_{Y}(j)/γ_{Y}(0), which is independent of n and belongs to (−1, +1). Also, A_{1jn}= 2{(n − j)/n}γ_{Y}(0) is positive whenever n > j and bounded below by γ_{Y}(0) > 0 whenever n ≥ 2j, so it is uniformly positive. Hence, Assumption 14 is satisfied.

Provided Assumptions 7 and 8 are satisfied and V_{n} is of uniform rank (see Sections 5 and 6), Theorem 3 applies to the criterion for comparing the AR(1) and AR(2) models. Because , Theorem 3 also applies to the in-sample MSEP, i.e. the quantity

- (17)

can be used as a model selection statistic for testing the null hypothesis H_{0}^{*} of √n-asymptotic equivalence. Moreover, because is the same as for the out-of-sample MSEP evaluated at the autocorrelation estimator (see Section 3), the same remarks apply.^{13} Namely, the hypotheses H_{0}, H_{1} and H_{2} are identical to H_{0}^{*}, H_{1}^{*} and H_{2}^{*}, respectively, and reduce to (11). Moreover, as noted after Theorem 3, because minimizes Q_{n}^{j}(ω, γ^{j}), the asymptotic variance σ_{n}^{2} does not depend on the sampling variability of the (constrained) least-squares estimators, and hence can be computed as if ρ_{j}^{o} were known. See also Section 5.

### Consistent variance estimation


For our proposed tests to be operational, a consistent estimator of the asymptotic variance σ_{n}^{2} is needed (see Assumption 8). The next results are derived for situations where the estimators do not necessarily optimize the selection criteria (see Section 3). Situations where the estimators do optimize the selection criteria (see Section 4) can be studied similarly. From Theorem 2 we know that σ_{n}^{2}= R′_{n}V_{n}R_{n}, where V_{n} and R_{n} are given by (8) and (9), respectively. Thus it suffices to construct consistent estimators of V_{n} and R_{n}. A consistent estimator of the (q + k)-dimensional vector R_{n} is obtained, as usual, by its sample analog evaluated at the estimates (see (19) below).

The difficulty is to obtain a consistent estimator of the (q + k) × (q + k) variance covariance matrix V_{n}. When observations are i.i.d. and the competing models are nondynamic, constructing a consistent estimator of the asymptotic variance σ_{n}^{2} is straightforward (see Vuong (1989), Vuong and Wang (1991), Vuong and Wang (1993a), Vuong and Wang (1993b)). This is because and Y_{nt}^{j} do not depend on n and because the (q + k)-dimensional vectors appearing in (8) are i.i.d. Thus the matrix V_{n} is just the population variance covariance matrix of this (q + k)-dimensional vector. Hence a simple consistent estimator of V_{n} is its sample analog evaluated at the estimates.
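In the i.i.d. case the plug-in estimate R̂_{n}′V̂_{n}R̂_{n} is simply the sample variance of the scalar R̂_{n}′z_{t}, where z_{t} denotes the (q + k)-vector for observation t. A minimal sketch, assuming the rows of a matrix Z hold these vectors already evaluated at the estimates (Z and R below are hypothetical placeholders, not quantities from the paper):

```python
import numpy as np

def sigma2_iid(Z, R):
    """Plug-in estimate R' V_hat R, where V_hat is the sample covariance
    (divisor n) of the rows of Z, one row per i.i.d. observation."""
    Z = np.asarray(Z, dtype=float)
    V_hat = np.atleast_2d(np.cov(Z, rowvar=False, bias=True))
    return float(R @ V_hat @ R)

rng = np.random.default_rng(0)
Z = rng.standard_normal((500, 4))   # hypothetical stand-in for the vectors in (8)
R = np.array([1.0, 0.0, -1.0, 0.0])  # hypothetical stand-in for R_hat
print(sigma2_iid(Z, R))
```

Working with the quadratic form rather than the scalar projection keeps the code aligned with σ_{n}^{2} = R′_{n}V_{n}R_{n}; numerically the two coincide.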

When observations are dependent and heterogeneous, consistent estimation of asymptotic variances is more complex but has been solved under general conditions (see Newey and West (1987b), Gallant and White (1988, Chapter 6), and Andrews (1991), among others). In particular, an important condition is that the estimated model be correctly specified or that the DGP be stationary. However, even under our null hypothesis H_{0}^{*}, both competing models can be misspecified, while stationarity has not been assumed. The contribution of this section is to show that consistent estimation of the asymptotic variance of our test statistic is still possible in some important situations for weakly dependent and heterogeneous DGPs.

First, we strengthen Assumption 10 on the estimators of γ^{j}.

#### Assumption 15.

For j = 1, 2, there exists a sequence of k_{j}-dimensional functions {δ_{t}^{j}: Ω×Γ_{j}→ R^{k_{j}}}_{t=1}^{+∞} satisfying

- (i) δ_{t}^{j}(ω, γ^{j}) is measurable for each γ^{j} and continuously differentiable on Γ_{j},
- (ii) {δ_{t}^{j}(ω, γ^{j})} and {∂δ_{t}^{j}(ω, γ^{j})/∂γ^{j′}} are almost surely Lipschitz-L_{1} on Γ_{j},
- (iii) {δ_{t}^{j}(ω, γ^{j})} and {∂δ_{t}^{j}(ω, γ^{j})/∂γ^{j′}} are 2r-dominated on Γ_{j} uniformly in t,
- (iv) {δ_{t}^{j}(ω, γ^{j})} is near-epoch dependent upon {X_{t}} of size −2(r − 1)/(r − 2) uniformly on Γ_{j},

such that

- (18)

where {A_{jn}^{+}; n = 1, 2,…} are bounded nonstochastic symmetric k_{j}× k_{j} matrices. Moreover, there exists a sequence of random matrices such that . A comparison of (6) and (18) gives . Note that we allow Eδ_{t}^{j} (ω, . Extremum estimators that are √n-asymptotically normal typically satisfy Assumption 15 (see Gallant and White (1988)).

Second, we strengthen Assumption 11 on model selection criteria.

#### Assumption 16.

Assumption 11 holds with (iii) and (iv) strengthened to

- (iii) {m_{t}^{j}(ω, γ^{j})} and {∂m_{t}^{j}(ω, γ^{j})/∂γ^{j′}} are 2r-dominated on Γ_{j} uniformly in t,
- (iv) {m_{t}^{j}(ω, γ^{j})} and {∂m_{t}^{j}(ω, γ^{j})/∂γ^{j′}} are near-epoch dependent upon {X_{t}} of size −2(r − 1)/(r − 2) and −1/2, respectively, where the first dependence is uniform on Γ_{j}.

Next, following Newey and West (1987b) and Gallant and White (1988), we introduce a truncation lag and some weights.

#### Assumption 17.

{m_{n}} is a sequence of integers such that m_{n}→ +∞ as n → +∞ and m_{n}= o(n^{1/4}).

#### Assumption 18.

Given a sequence {m_{n}}, define $w_{n\tau}=\sum_{t=\tau+1}^{m_n+1}a_{nt}a_{n,t-\tau}$ for τ = 0, 1,…, m_{n}, where {a_{nt}; t = 1,…, m_{n}+ 1, n = 1, 2,…} is a triangular array such that |w_{nτ}| ≤ Δ for some Δ < ∞ and all n = 1, 2,… and τ = 0, 1,…, m_{n}. Moreover, for each τ, w_{nτ}→ 1 as n → ∞. Assumptions 17 and 18 are identical to assumptions TL and WT of Gallant and White (1988). See also Andrews (1991) for weaker assumptions.
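The weights can be generated mechanically from the triangular array. The sketch below assumes the Gallant–White-style construction w_{nτ} = Σ_{t=τ+1}^{m_{n}+1} a_{nt}a_{n,t−τ}, which makes the weights the autocorrelation sequence of (a_{n1},…,a_{n,m_{n}+1}); with the constant choice a_{nt} = (m_{n} + 1)^{−1/2} it reproduces the Bartlett weights 1 − τ/(m_{n} + 1) used in the example of this section. Function names are hypothetical.

```python
import numpy as np

def weights_from_array(a):
    """w_tau = sum_{t=tau+1}^{m+1} a_t a_{t-tau} for tau = 0, ..., m,
    given a = (a_1, ..., a_{m+1}); this is the autocorrelation of a."""
    a = np.asarray(a, dtype=float)
    m1 = len(a)  # m + 1
    return np.array([np.dot(a[tau:], a[:m1 - tau]) for tau in range(m1)])

def bartlett_array(m):
    """a_t = (m + 1)^{-1/2} for t = 1, ..., m + 1."""
    return np.full(m + 1, 1.0 / np.sqrt(m + 1))

print(weights_from_array(bartlett_array(4)))  # approximately 1 - tau/5, tau = 0..4
```

Building the weights as an autocorrelation of a fixed array is what keeps the resulting kernel-weighted variance matrix positive semidefinite, which is why this kind of construction is convenient in practice.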

We are now in a position to define the class of variance estimators that are considered. Using (9), a consistent estimator of R_{n} is, as usual, where is the (q_{j}+ k_{j})-dimensional vector

- (19)

and and are the partial derivatives of d_{j} (m^{j}, γ^{j}) with respect to m^{j} and γ^{j} evaluated at . Define the (q + k) × (q + k) matrix

- (20)

where is the (q + k)-dimensional vector

- (21)

We then define .
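For concreteness, here is a sketch of a kernel-weighted matrix of the kind defined in (20) and of the resulting plug-in σ̂_{n}^{2} = R̂_{n}′V̂_{n}R̂_{n}. The exact display (20) should be consulted. The form used below (uncentred sample autocovariances with divisor n, weighted by w_{nτ} for τ = 0,…,m_{n}) is an assumption modelled on Newey and West (1987b) and Gallant and White (1988). U and R are hypothetical placeholders for the vectors in (21) and for R̂_{n}.

```python
import numpy as np

def hac_matrix(U, w):
    """V_hat = w_0 Omega_0 + sum_{tau=1}^{m} w_tau (Omega_tau + Omega_tau'),
    with Omega_tau = (1/n) sum_{t=tau+1}^{n} u_t u_{t-tau}' and u_t the
    t-th row of U; the rows are not re-centred (an assumption here)."""
    U = np.asarray(U, dtype=float)
    n = U.shape[0]
    V = w[0] * (U.T @ U) / n
    for tau in range(1, len(w)):
        omega = U[tau:].T @ U[:-tau] / n
        V = V + w[tau] * (omega + omega.T)
    return V

def sigma2_hat(U, R, w):
    """Plug-in sigma_n^2 estimate R_hat' V_hat R_hat."""
    return float(R @ hac_matrix(U, w) @ R)

m = 3
w = 1.0 - np.arange(m + 1) / (m + 1)  # Bartlett weights, tau = 0..m
rng = np.random.default_rng(0)
U = rng.standard_normal((50, 3))      # hypothetical stand-in for the vectors in (21)
R = np.array([1.0, -2.0, 0.5])        # hypothetical R_hat
print(sigma2_hat(U, R, w))
```

Because R̂_{n} enters only through a quadratic form, the same number is obtained by applying the scalar version of the kernel estimator to the projected series R̂_{n}′u_{t}.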

The next theorem is the main result of this section. Define the (q + k) × (q + k) matrix

- (22)

where μ_{nt} is the (q + k)-dimensional vector

- (23)

Lastly, let . Note that in general. Indeed, , which composes the first q_{j} elements of , is typically nonzero. In contrast, the last k_{j} elements of are typically equal to zero as satisfies in general.

#### Theorem 4.

Given Assumptions 1–3, 9 and 15–18, suppose that, under H_{0}^{*}, (i) and (ii) there exists a sequence {d_{n}} such that for every t = 1,…, n with d_{n}= O(n^{−1/8}). Then under H_{0}^{*}.^{14}

As noted by Gallant and White (1988, Chapter 6), in the presence of heterogeneous observations and misspecification, in general *overestimates* V_{n} since , where Λ_{n} is positive semidefinite (see step 2 in the proof). Moreover, as noted by these authors, Λ_{n} is not guaranteed to be bounded. The contribution of Theorem 4 is thus to provide some sufficient conditions that ensure the consistency of to σ_{n}^{2}, so that asymptotically valid inferences based on T_{n} can be performed. Because R_{n}= O(1), condition (i) requires that a linear combination of vanishes, while condition (ii) controls the fluctuations of the individual means μ_{nt} around the overall mean .^{15}

Condition (i) is satisfied in important situations such as selecting models estimated by ML based on their possibly adjusted likelihood values, or selecting models estimated by GMM based on their GMM criteria. Specifically, in ML estimation, we have d_{j}(m^{j}, γ^{j}) = m^{j} so that . Moreover, and satisfies = 0. Therefore R_{n}^{j} is zero except for its first component, which is equal to one. Hence . It follows that condition (i) is satisfied under H_{0}^{*}. In GMM estimation, we have d_{j}(m^{j}, γ^{j}) = m^{j′}P_{j}m^{j}, where P_{j} is a q_{j}× q_{j} matrix, so that . Moreover, satisfies . Thus . Hence so that condition (i) again holds under H_{0}^{*}. The latter case includes selecting (non)linear regressions estimated by (non)linear least squares based on their in-sample or out-of-sample MSEP under covariance stationarity, as in West (1994), West (1996). See the example below. More generally, when the last k_{j} elements of are zero, as noted before Theorem 4, condition (i) reduces to , which must hold under H_{0}^{*}.

Regarding condition (ii), note that Assumptions 15 (iii) and 16 (iii) already imply for every n, t, so that is bounded uniformly in n, t because R_{n}= O(1). Condition (ii) thus strengthens this requirement. Two remarks follow. First, in the above ML case, this condition requires that the deviation of from its mean , for t = 1,…, n, not only remains bounded but also decreases to zero (at the rate n^{−1/8}) as n → ∞. Second, condition (ii) is clearly satisfied if Em_{t}^{j}(ω, γ^{j}) = Em_{s}^{j}(ω, γ^{j}) and Eδ_{t}^{j}(ω, γ^{j}) = Eδ_{s}^{j}(ω, γ^{j}) for every t, s = 1, 2,…, γ^{j}∈Γ_{j}, and j = 1, 2. This holds when {m_{t}^{j}(ω, γ^{j}), t = 1, 2,…} and {δ_{t}^{j}(ω, γ^{j}), t = 1, 2,…} are first-order stationary processes.

#### Example

[Continued] We consider the case where the out-of-sample MSEP is used as a model selection criterion. Because Assumptions 1–3 and 9 have already been verified in Section 3, it remains to verify Assumptions 15–18 and conditions (i)–(ii) in order to apply Theorem 4. Assumptions 17 and 18 are satisfied by letting m_{n}→ ∞ at a rate slower than n^{1/4} and by choosing appropriate weights w_{nτ}, such as the Bartlett weights w_{nτ}= 1 − {τ/(m_{n}+ 1)} (see also footnote 13). Assumption 16 is verified by an argument similar to that used for verifying Assumption 11 in Section 3.

To verify Assumption 15, we use the asymptotically linear representation of the Yule–Walker estimator obtained in Section 3 and the definition of Y_{t}^{*}. These give

Let δ_{t}^{j}(ω, γ^{j}) = γ^{j}Y_{t}^{*2}− Y_{t}^{*}Y_{t−2j}^{*}= (γ^{j}Y_{t/2}^{2}− Y_{t/2}Y_{t/2−j})I(|t| even). Note that because and {Y_{t}} is stationary (see Section 3). Using an argument similar to that used for verifying Assumption 10 in Section 3, it is easy to see that Assumption 15 (i)–(iv) hold, where X_{t}= (Y_{t}, Y_{t}^{*})′. Moreover, let A_{jn}^{+}= 2/E(Y_{t}^{2}) and , where is any consistent estimator of E(Y_{t}^{2}), such as the sample mean of Y_{t}^{2}, whose consistency follows from a Law of Large Numbers for stationary ARMA or stationary and ergodic processes (see e.g. Hayashi (2000, p. 101)). It follows that Assumption 15 holds.

We now turn to conditions (i)–(ii). From Section 3, recall that m_{t}^{j}(ω, γ^{j}) = ((Y_{t}−γ^{j}Y_{t−j})^{2}, (Y_{t}^{*}−γ^{j}Y_{t−2j}^{*})^{2})′. Combining this with the above definition of δ_{t}^{j}(ω, γ^{j}), the definition of Y_{t}^{*}, and the weak stationarity of {Y_{t}} gives μ_{nt}^{j′}= γ_{Y}(0)(1 − ρ_{j}^{o2})(1, I(|t| even), 0). Moreover, from the definitions of M_{1n}^{j}(ω, γ^{j}) and M_{2n}^{j}(ω, γ^{j}) given in Section 3, it is easy to see that E∂M_{kn}^{j}(ω, ρ_{j}^{o})/∂γ^{j}= 0 for k = 1, 2. Hence, R_{n}^{j′}= (2, −2, 0), since d_{j}(m, γ^{j}) = 2(m_{1}− m_{2}). Therefore, R_{n}^{′}μ_{nt}= R_{n}^{1′}μ_{nt}^{1}− R_{n}^{2′}μ_{nt}^{2}= 2γ_{Y}(0)(ρ_{2}^{o2}− ρ_{1}^{o2}){1 − I(|t| even)} = 0 under H_{0}^{*}= H_{0} (see (11)). It follows that conditions (i) and (ii) are satisfied under H_{0}^{*}.

Theorem 4 thus applies and delivers a consistent estimator of σ_{n}^{2}. Specifically, simple algebra shows that is given by (20), where is replaced by the scalar with

and Y_{t}^{*}= Y_{t/2}I(|t| even). In fact, we can propose a simpler consistent estimator of σ_{n}^{2} by exploiting the fact that R_{n}^{′}μ_{nt}= 0 under H_{0}. Namely, from the alternative expression for σ_{n}^{2} given in footnote 14, we have R′_{n}U_{nt}= R_{n}^{1′}U_{nt}^{1}− R_{n}^{2′}U_{nt}^{2}, where R_{n}^{j′}U_{nt}^{j}= 2{(Y_{t}−ρ_{j}^{o}Y_{t−j})^{2}− (Y_{t}^{*}−ρ_{j}^{o}Y_{t−2j}^{*})^{2}}. Hence, using the definition of Y_{t}^{*}. Thus

- (24)

As expected from the remark after Theorem 2, the asymptotic variance σ_{n}^{2} can be computed by neglecting the estimation uncertainty arising from , i.e. as if ρ_{j}^{o} is known (see (10)). Moreover, because R_{n}^{′}μ_{nt}= 0 under H_{0}, the expectation of the term in braces is zero under H_{0}. Hence from Newey and West (1987b) and Gallant and White (1988), a simpler consistent estimator of σ_{n}^{2} is given by four times the expression (20), where is replaced by the difference in squared prediction errors with the first and last sums starting from t = n^{*}+ 1 and t =τ+ n^{*}+ 1.^{16}
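Putting the pieces of the example together, the sketch below computes the simpler estimator just described for the AR(1)-versus-AR(2) comparison, using Bartlett weights and Yule–Walker fits on the first half, and forms the studentized statistic √n(Q_{n}^{1} − Q_{n}^{2})/σ̂_{n}. Both the normalization (divisor n in the weighted sums, following our reading of (20)) and the exact form of the statistic in (10) are assumptions; the function name and the simulated data are illustrative.

```python
import numpy as np

def ar_msep_test(y, m):
    """Return (T_n, sigma2_hat) for the AR(1)-versus-AR(2) out-of-sample MSEP
    comparison, with Yule-Walker fits on the first half of y (length n = 2 n*).

    sigma2_hat = 4 * [(1/n) sum_t u_t^2
                      + 2 sum_{tau=1}^{m} w_tau (1/n) sum_t u_t u_{t-tau}],
    where u_t is the AR(1) minus AR(2) squared prediction error over the
    evaluation half and w_tau = 1 - tau/(m + 1) are Bartlett weights."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n % 2 != 0:
        raise ValueError("the example assumes n = 2 n* observations")
    n_star = n // 2
    first = y[:n_star]
    t = np.arange(n_star, n)  # 0-based positions of t = n*+1, ..., n
    sq_err = []
    for j in (1, 2):
        g = np.dot(first[j:], first[:-j]) / np.dot(first, first)  # Yule-Walker
        sq_err.append((y[t] - g * y[t - j]) ** 2)
    u = sq_err[0] - sq_err[1]
    s = u @ u / n
    for tau in range(1, m + 1):
        s += 2.0 * (1.0 - tau / (m + 1)) * (u[tau:] @ u[:-tau]) / n
    sigma2 = 4.0 * s
    t_stat = np.sqrt(n) * u.mean() / np.sqrt(sigma2)  # u.mean() = Q^1 - Q^2
    return float(t_stat), float(sigma2)

# Gaussian AR(1) with coefficient 0.5: the AR(1) model has the smaller
# expected MSEP, so T_n should come out clearly negative here.
rng = np.random.default_rng(2)
e = rng.standard_normal(4_200)
y = np.zeros_like(e)
for k in range(1, len(e)):
    y[k] = 0.5 * y[k - 1] + e[k]
y = y[200:]                   # drop burn-in, n = 4000
m_n = int(len(y) ** 0.2)      # grows more slowly than n^(1/4)
print(ar_msep_test(y, m_n))
```

Because the summands are not re-centred, the estimator relies on R_{n}′μ_{nt} = 0 holding under H_{0}, exactly as in the argument above; away from H_{0} it still yields a positive scale, so the statistic moves away from zero in the direction of the better-fitting model.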

### On the positive asymptotic variance


It remains to discuss Assumption 7, namely lim inf_{n}σ_{n}^{2} > 0. Similar assumptions appear in Vuong (1989) for likelihood-based criteria in the static case and West (1996) for out-of-sample MSEP-based criteria. The purpose of this section is to characterize situations for which this assumption is violated. More precisely, we consider cases when lim_{n}σ_{n}^{2}= 0. By considering subsequences of σ_{n}^{2}, our results can be modified to obtain necessary and sufficient conditions for lim inf_{n}σ_{n}^{2} > 0. Hereafter, we maintain H_{0}^{*} since we are interested in the asymptotic distribution of our test statistic T_{n} under the null hypothesis. Moreover, we adopt the general framework of Section 2.

Our first result shows the importance of Assumption 7.

#### Lemma 1.

Given Assumptions 1–6, suppose that H_{0}^{*} holds.

- (i) Then σ_{n}^{2}= o(1) if and only if .
- (ii) In addition, assume that for j = 1, 2. Then σ_{n}^{2}= o(1) if and only if .

Lemma 1 extends Vuong (1989, Lemma 4.1) to dynamic situations and general model selection criteria. Part (i) shows that the asymptotic normality of our test statistic holds only if σ_{n}^{2}≠ o(1). Part (ii) specializes to the case where the estimation methods for both competing models (whether nested or nonnested) optimize their respective model selection criteria. As seen in Section 4, examples are ML estimation with log-likelihood-type criteria and GMM estimation with GMM criterion functions. The necessary and sufficient condition (ii) can then be interpreted as requiring that the estimated models be √n-asymptotically *identical*. This condition is, of course, satisfied when almost surely for n sufficiently large.

The last remark suggests that Assumption 7 is violated when the competing models are nested and estimated by optimizing the same selection criterion. This is confirmed by the next result. We define

#### Definition

Model 1 is nested in model 2 according to the chosen selection criteria if there exists a sequence {h_{n}} of continuously differentiable functions from Γ_{1} to Γ_{2} such that, for n sufficiently large, Q_{n}^{1}(ω, γ^{1}) = Q_{n}^{2}{ω, h_{n}(γ^{1})} for all (ω, γ^{1}) ∈ Ω×Γ_{1}. In particular, this definition applies when the competing models are nested in the usual sense and the same criterion is used to compare these models. Note, however, that it is not sufficient to consider only nested models.

#### Theorem 5.

Given Assumptions 1–6, suppose that model 1 is nested in model 2 according to the selection criteria. Moreover, suppose that (i) for j = 1, 2, (ii) for any nonstochastic sequence such that , and (iii) is the identifiably unique minimizer of on Γ_{2}. Then σ_{n}^{2}= o(1) under H_{0}^{*}.

As in Lemma 1 (ii), we consider the case where the estimators optimize the corresponding selection criteria, so that their limits satisfy condition (i). Conditions (ii) and (iii) are typically satisfied by extremum estimators. In particular, is in general asymptotically normal under sequences of local alternatives converging to .

Theorem 5 shows that our model selection testing procedure based on T_{n} is not valid when the competing models are nested according to the chosen selection criteria. For instance, this is the case for nested regression models estimated by nonlinear least squares and compared using their in-sample or out-of-sample MSEPs. This result agrees with previous work on nested situations (see e.g. Hansen (1982), Andrews and Fair (1988), Gallant and White (1988)), which indicates that is asymptotically chi-square distributed under the null hypothesis that the smaller model is correctly specified. More generally, when σ_{n}^{2}= o(1), Marcellino (2000) has shown that follows a weighted chi-square distribution (see Vuong (1989, Definition 1)).
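A small Monte Carlo sketch of our own can make the degeneracy concrete. Assume two nested linear regressions, both estimated by OLS and compared by in-sample MSEP, with Gaussian errors and the smaller model correct. Then n(Q_{1} − Q_{2}) = RSS_{1} − RSS_{2} is exactly σ²χ²_{q} distributed (q extra regressors), so the √n-scaled difference collapses to zero, in line with Lemma 1 (i) and Theorem 5, while the n-scaled difference stays O_{P}(1) with mean qσ². The function names and design below are hypothetical.

```python
import numpy as np

def in_sample_msep(y, X):
    """Residual mean square of the OLS fit of y on X (the in-sample MSEP)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid / len(y)

def simulate_nested_gap(n, q=2, reps=1000, sigma=1.0, seed=0):
    """Draws of Q1 - Q2 when the smaller (nested) model 1 is correctly specified."""
    rng = np.random.default_rng(seed)
    gaps = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=(n, 1 + q))
        X1 = np.column_stack([np.ones(n), x[:, 0]])  # model 1: intercept + x1
        X2 = np.column_stack([X1, x[:, 1:]])          # model 2 adds q regressors
        y = 1.0 + 0.5 * x[:, 0] + sigma * rng.normal(size=n)
        gaps[r] = in_sample_msep(y, X1) - in_sample_msep(y, X2)
    return gaps

for n in (100, 400, 1600):
    g = simulate_nested_gap(n)
    # sqrt(n)-scaled gap shrinks like 1/sqrt(n); the n-scaled gap has mean q * sigma^2 = 2.
    print(n, np.mean(np.sqrt(n) * g), np.mean(n * g))
```

Because the √n-scaled difference has a degenerate limit here, a normal approximation for T_{n} is not available, which is exactly why nested comparisons call for the chi-square-type results cited above.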

In view of the importance of Assumption 7, a natural question is whether one can test σ_{n}^{2}= o(1) using an extension of the variance test proposed in Vuong (1989) for the static likelihood case. When the selection criteria are of the likelihood type, i.e. (ω, γ^{j}), and minimizes Q_{n}^{j} (ω, γ^{j}), recent work by Golden (2000) for the strictly stationary case and Marcellino (2000) for the case where is a martingale difference sequence has shown that follows asymptotically a weighted chi-square distribution under the null hypothesis σ_{n}^{2}= o(1). This result will continue to hold in our general framework.

#### Example

[Continued] When the out-of-sample MSEP is used as a model selection criterion, we have . Because Assumptions 1–6 hold by the verification of Assumptions 1–3 and 9–11 in Section 3, Lemma 1 applies. Hence, from and Lemma 1 (i), σ_{n}^{2}= o(1) if and only if under H_{0}^{*}= H_{0}. Moreover, because is consistent for and , as shown in Section 3, we have , i.e. minimizes asymptotically the out-of-sample MSEP. Hence, by Lemma 1 (ii), σ_{n}^{2}= o(1) if and only if under H_{0}. These results agree with the direct computation of the asymptotic variance σ_{n}^{2} given in (24).

As indicated above, one might want to test σ_{n}^{2}= o(1), i.e. that the so-called *long-run* variance of the stationary process {d_{t}} ≡ {(Y_{t}−ρ_{1}^{o} Y_{t−1})^{2}− (Y_{t}−ρ_{2}^{o} Y_{t−2})^{2}} is zero. This is obviously the case when the first and second autocorrelations ρ_{1}^{o} and ρ_{2}^{o} of the stationary process {Y_{t}} are zero, though σ_{n}^{2}= o(1) may hold for other DGPs under H_{0}^{*}= H_{0} given in (11). From Section 5, the consistent variance estimator developed there can be used to test σ_{n}^{2}= o(1), since it estimates this long-run variance, which equals Σ_{τ=−∞}^{+∞} γ_{d}(τ) when the autocovariances {γ_{d}(τ)} of the process {d_{t}} are summable (see e.g. Hayashi (2000, p. 401)). In particular, recent results obtained by Golden (2000) and Marcellino (2000) under some additional assumptions indicate that this estimator, suitably normalized, follows asymptotically a weighted chi-square distribution under σ_{n}^{2}= o(1), with weights that can be estimated consistently.^{17}
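The following sketch computes d_{t} and a kernel estimate of its long-run variance Σ_{τ} γ_{d}(τ) for this example. Several choices are ours rather than the paper's: {Y_{t}} is assumed to have mean zero (so sample autocorrelations are not demeaned); ρ_{1} and ρ_{2} are estimated on the first half of the sample and d_{t} is formed on the second half, matching the equal estimation and out-of-sample sizes of footnote 7; the weights are Bartlett (Newey-West) weights w_{τ} = 1 − τ/(m + 1), one admissible instance of the general weights w_{nτ} of Section 5; and the bandwidth rule m = ⌊p^{1/5}⌋ is a hypothetical choice that grows more slowly than p^{1/4}. Following the text, parameter-estimation uncertainty is ignored because the estimated autocorrelations minimize the out-of-sample MSEP asymptotically. All function names are hypothetical.

```python
import numpy as np

def sample_autocorr(y, lag):
    """Sample autocorrelation at `lag`, assuming E[Y_t] = 0 (no demeaning)."""
    return (y[lag:] @ y[:-lag]) / (y @ y)

def bartlett_lrv(d, m):
    """Bartlett-kernel (Newey-West) estimate of sum_tau gamma_d(tau) with bandwidth m."""
    e = np.asarray(d, dtype=float)
    e = e - e.mean()
    p = e.size
    lrv = (e @ e) / p                                  # gamma_d(0)
    for tau in range(1, m + 1):
        weight = 1.0 - tau / (m + 1.0)                 # Bartlett weight w_tau
        lrv += 2.0 * weight * (e[tau:] @ e[:-tau]) / p  # 2 * w_tau * gamma_d(tau)
    return lrv

def out_of_sample_test(y, m):
    """Estimate rho_1, rho_2 on the first half of y, form d_t on the second half,
    and return (d, T) with T = sqrt(p) * mean(d) / sqrt(long-run variance)."""
    y = np.asarray(y, dtype=float)
    half = y.size // 2
    est, pred = y[:half], y[half:]
    r1, r2 = sample_autocorr(est, 1), sample_autocorr(est, 2)
    yt = pred[2:]
    d = (yt - r1 * pred[1:-1]) ** 2 - (yt - r2 * pred[:-2]) ** 2
    t_stat = np.sqrt(d.size) * d.mean() / np.sqrt(bartlett_lrv(d, m))
    return d, t_stat

# Usage: an AR(1) with coefficient 0.5, so rho_1 = 0.5, rho_2 = 0.25 and the lag-1 predictor should win.
rng = np.random.default_rng(0)
eps = rng.normal(size=4000)
y = np.zeros(4000)
for t in range(1, 4000):
    y[t] = 0.5 * y[t - 1] + eps[t]
m = int(2000 ** 0.2)  # hypothetical bandwidth rule
d, t_stat = out_of_sample_test(y, m)
print(m, d.mean(), t_stat)
```

If {Y_{t}} were white noise, d_{t} evaluated at the true values ρ_{1}^{o} = ρ_{2}^{o} = 0 would be identically zero, which is the degenerate case σ_{n}^{2}= o(1) discussed above; the Bartlett weights are used here mainly because they keep the variance estimate nonnegative.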

### Conclusion

This paper offers a general testing framework for assessing the statistical significance of the difference in model selection criterion values for two competing models under weak assumptions on the data generating process. Such a testing framework encompasses the static likelihood-based situations studied in Vuong (1989) as well as the out-of-sample prediction-based criteria considered in West (1994, 1996). The competing models must be essentially nonnested but can be dynamic and incompletely specified. Our results allow for a wide class of √n-asymptotically normal estimators and model selection criteria. Moreover, the methods used to estimate the competing models need not optimize the selection criteria used for model selection, and different samples can be used for model estimation and comparison. We have stressed situations where sampling uncertainty due to parameter estimation is asymptotically irrelevant for testing model equivalence. In particular, this is the case when the employed estimators optimize (possibly asymptotically) the model selection criteria used for model comparison.

To conclude, we make three remarks. First, our testing framework allows for the comparison of two competing models only. This is restrictive in practice. Extension to more than two models raises a problem of multiple comparisons. Recently, Shimodaira (1998) has extended Vuong's (1989) setting to multiple competing models through the use of confidence intervals, and White (2000) has extended West's (1994, 1996) prediction framework while establishing the validity of bootstrapping in such a multiple testing situation. Second, our results do not apply to the comparison of nonparametric models. Extension of our testing framework to nonparametric situations is possible, as shown by Lavergne and Vuong (1996, 2000) for nonparametric regressions. For a survey of selection of regressors in parametric and nonparametric regressions, see Lavergne (1998).

Third, we have not attempted to address the choice of model selection criteria. Indeed no single index may be universally superior as each reflects the particular features of interest to a researcher. As Amemiya (1980) wrote ‘…all of the criteria considered are based on a somewhat arbitrary assumption which cannot be fully justified, and that by slightly varying the loss function and the decision strategy one can indefinitely go on inventing new criteria’. Recently, however, Granger and Pesaran (2000) have argued for a closer link between forecast evaluation and decision theory. Moreover, though our framework allows for a large class of criteria, one should be cautious in the choice of such criteria as systematic applications of our results may lead to nonsensical outcomes. This can be the case if different criteria are used across competing models.

### Acknowledgements

We thank E. Guerre, P. Lavergne, A. Monfort, two referees and the Editor as well as seminar participants at the Université de Montréal, Stanford University, Université de Toulouse, Université d’Aix–Marseille, the World Congress of the Econometric Society, Barcelona, August 1990 and Malinvaud econometric seminar, Paris, November 1990. The second author is grateful to J. M. Dufour and E. Ghysels for a visit at the Université de Montréal in April 1990, which led to an early version (Rivers and Vuong 1991) containing the basic results of the current paper. Financial support from the National Science Foundation under Grant SBR-9631212 is gratefully acknowledged.

### References

- (1973) Information theory and an extension of the likelihood ratio principle. In *Proceedings of the Second International Symposium of Information Theory* (B. N. Petrov and F. Csaki, eds), pp. 257–81. Akademiai Kiado, Budapest.
- (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19: 716–23.
- (1985) *Advanced Econometrics*. Harvard University Press, Cambridge.
- (1987) Consistency in nonlinear econometric models: a generic uniform law of large numbers. Econometrica 55: 1465–71.
- (1991) Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59: 817–58.
- (1994) Asymptotics for semiparametric econometric models via stochastic equicontinuity. Econometrica 62: 43–72.
- (1988) Inferences in econometric models with structural change. Review of Economic Studies 55: 615–40.
- (1985) A unified theory of consistent estimation for parametric models. Econometric Theory 1: 151–78.
- (1999) Explaining investment dynamics in U.S. manufacturing: a generalized (S, s) approach. Econometrica 67: 783–826.
- (1961) Tests of separate families of hypotheses. In *Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability*, vol. 1, pp. 105–23.
- (1962) Further results on tests of separate families of hypotheses. Journal of the Royal Statistical Society, Series B 24: 406–24.
- (1981) Several tests for model specification in the presence of alternative hypotheses. Econometrica 49: 781–93.
- (1995) Comparing predictive accuracy. Journal of Business and Economic Statistics 13: 253–63.
- (1982) Misspecified models with dependent observations. Journal of Econometrics 20: 35–58.
- (1983) Asymptotic properties of instrumental variables statistics for testing non-nested hypotheses. Review of Economic Studies 50: 287–304.
- (1990) Comparing information in forecasts from econometric models. American Economic Review 80: 375–89.
- (1990) Making Difficult Model Comparisons, mimeo, U.S. Bureau of the Census.
- (1991) Convergence of finite multistep predictors from incorrect models and its role in model selection. Note di Matematica XI: 145–55.
- (1998) New capabilities and methods of the X-12-ARIMA seasonal adjustment program. Journal of Business and Economic Statistics 16: 127–77.
- (1993) Moment bound for deriving time series CLT's and model selection procedures. Statistica Sinica 3: 453–80.
- (1976) *Introduction to Statistical Time Series*. Wiley, New York.
- (1988) *A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models*. Basil Blackwell, New York.
- (1992) Econometric analysis of collusive behavior in a soft drink industry. Journal of Economics and Management Strategy 1: 277–311.
- (1990) Testing nonnested Euler conditions with quadrature-based methods of approximation. Journal of Econometrics 46: 273–308.
- (1983) Testing non-nested models after estimation by instrumental variables or least squares. Econometrica 51: 355–65.
- (2000) Discrepancy Risk Model Selection Test Theory for Comparing Possibly Misspecified or Nonnested Models, mimeo, University of Texas, Dallas.
- (1983) Testing nested or non-nested hypotheses. Journal of Econometrics 21: 83–115.
- (2000) Economic and statistical measures of forecast accuracy. Journal of Forecasting 19: 537–60.
- (1986) *Robust Statistics: The Approach Based on Influence Functions*. Wiley, New York.
- (1979) The determination of the order of an autoregression. Journal of the Royal Statistical Society, Series B 41: 190–5.
- (1982) Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–54.
- (2000) *Econometrics*. Princeton University Press, Princeton.
- (1981) Heterogeneity and state dependence. In *Studies in Labor Markets* (S. Rosen, ed.), pp. 91–139. University of Chicago Press, Chicago.
- (1971) *Independent and Stationary Sequences of Random Variables*. Wolters-Noordhoff, Groningen.
- (1969) Asymptotic properties of non-linear least squares estimators. Annals of Mathematical Statistics 40: 633–43.
- (1987) Further results on testing AR(1) against MA(1) disturbances in the linear regression model. Review of Economic Studies 54: 649–63.
- (1996) Generalized information criterion in model selection. Biometrika 83: 875–90.
- (1951) On information and sufficiency. Annals of Mathematical Statistics 22: 79–86.
- (1998) Selection of regressors in econometrics: parametric and nonparametric methods. Econometric Reviews 17: 227–73.
- (1996) Nonparametric selection of regressors: the nonnested case. Econometrica 64: 207–19.
- (1987) Selecting the best linear regression model: a classical approach. Journal of Econometrics, Annals 35: 3–23.
- (1988) A test whether two AIC's differ significantly. South African Statistical Journal 22: 153–61.
- (1986) *Model Selection*. Wiley, New York.
- (1993) Robust model selection and M-estimation. Econometric Theory 9: 478–93.
- (1983) Model specification tests against non-nested alternatives. Econometric Reviews 2: 85–110.
- (2000) Model Selection for Non-linear Dynamic Models, mimeo, Università Bocconi.
- (1980) Robust estimation of autoregressive models. In *Directions in Time Series* (D. R. Brillinger and G. C. Tiao, eds), pp. 228–62. Institute of Mathematical Statistics, Hayward.
- (1987) Specification tests for separate models: a survey. In *Specification Analysis in the Linear Model* (M. L. King and D. E. A. Giles, eds), pp. 146–95. Routledge and Kegan Paul, London.
- (1983) Empirical exchange rate models of the seventies: do they fit out of sample? Journal of International Economics 14: 3–24.
- (1986) The encompassing principle and its applications to testing non-nested hypotheses. Econometrica 54: 657–78.
- (1978) Chi-square tests. In *Studies in Statistics*, vol. 19 (R. V. Hogg, ed.). The Mathematical Association of America.
- (1990) Alternative approaches to model choice. Journal of Economic Behavior and Organization 14: 97–125.
- (1994) Large sample estimation and hypothesis testing. In *Handbook of Econometrics*, vol. 4 (R. F. Engle and D. McFadden, eds), pp. 2111–245. North-Holland, Amsterdam.
- (1987a) Hypothesis testing with efficient method of moments estimators. International Economic Review 28: 777–87.
- (1987b) A simple positive semi-definite heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55: 703–8.
- (1997) Deriving an estimate of the optimal reserve price: an application to British Columbian timber sales. Journal of Econometrics 78: 333–57.
- (1974) On the general problem of model selection. Review of Economic Studies 41: 153–71.
- (1994) A generalized R^{2} criterion for regression models estimated by the instrumental variables method. Econometrica 62: 705–10.
- (1980) The Strong Mixing Properties of the Autoregressive Moving Average Time Series Models, Seminaire de Statistique, Grenoble.
- (1994) Estimation of semiparametric models. In *Handbook of Econometrics*, vol. 4 (R. F. Engle and D. McFadden, eds), pp. 2443–521. North-Holland, Amsterdam.
- (1991) Model selection tests for nonlinear dynamic models, Working Paper 9108, INRA-ESR, Toulouse.
- (1985) Robust model selection in regression. Statistics and Probability Letters 3: 21–3.
- (1958) The estimation of economic relationships using instrumental variables. Econometrica 26: 393–415.
- (1998) An application of multiple comparison techniques to model selection. Annals of the Institute of Statistical Mathematics 50: 1–15.
- (1996) Information criteria for selecting possibly misspecified parametric models. Journal of Econometrics 71: 207–25.
- (1992) Non-nested tests for competing models estimated by generalized method of moments. Econometrica 60: 973–80.
- (1968) A test of the mean square error criterion for restrictions in linear regression. Journal of the American Statistical Association 63: 558–72.
- (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57: 307–33.
- (1991) Tests for model selection using power divergency statistics, Working Paper 9106, INRA–ESR, Toulouse.
- (1993a) Minimum chi-square estimation and tests for model selection. Journal of Econometrics 56: 141–68.
- (1993b) Selecting estimates using chi-square statistics. Annales d’Economie et de Statistique 30: 143–64.
- (1994) Asymptotic inference about predictive ability, mimeo, University of Wisconsin.
- (1996) Asymptotic inference about predictive ability. Econometrica 64: 1067–84.
- (1982) Maximum likelihood estimation of misspecified models. Econometrica 50: 1–25.
- (1984) *Asymptotic Theory for Econometricians*. Academic, New York.
- (2000) A reality check for data snooping. Econometrica 68: 1097–126.
- (1994) An econometric analysis of the asymmetric information regulator utility interaction. Annales d’Economie et de Statistique 34: 13–69.

### Appendix

##### Proof of Theorem 1

A Taylor expansion of around gives

where is in the line segment for j = 1, 2. Multiplying by and adding and subtracting a term, we obtain

Now, because by Assumption 3. Thus, given Assumptions 1, 2 and 5, it follows from Domowitz and White (1982, Theorem 2.3) that . Moreover, by Assumption 6, . Hence we obtain

- (25)

Subtracting from both sides, and then subtracting the resulting equations for j = 1, 2 from each other, we obtain in matrix notation

- (26)

where the first equality follows from (3) and (4), and the second equality uses Assumption 6 and the boundedness of L_{n}, which is implied by Assumption 5.

Now we apply White (1984, Corollary 4.24) with , V_{n}= I_{s} and A_{n}= C_{n} L_{n}. Since L_{n}= O_{P}(1) and C_{n}= O_{P}(1) by Assumptions 5–6, A_{n}= O_{P}(1). Moreover, A_{n} is of (full column) rank one for all n sufficiently large by Assumption 7. Thus, letting σ_{n}^{2}= L′_{n} C′_{n} C_{n} L_{n}, we have σ_{n}^{2}= O(1) and . Since σ_{n}^{−1}= O(1) by Assumption 7, multiplying (A.2) by σ_{n}^{−1} gives

- (27)

Finally, we have

Since σ_{n}= O(1) and σ_{n}^{−1}= O(1), it follows from Assumption 8 that . Moreover, because 1/M < σ_{n}^{−1} < M for n sufficiently large and some M > 0, statements (i)–(iii) follow immediately from (5) and (A.3).

##### Proof of Theorem 2

##### Step 1: Verification of Assumptions 4 and 5

Consider the quantities M_{n}^{j} (ω, γ^{j}), ∂M_{n}^{j} (ω, γ^{j})/∂γ^{j′} and their expectations. From Assumptions 1, 2, 9 and 11, it follows from Gallant and White (1988, Theorem 3.18) that

- (28)

where EM_{n}^{j} (ω, γ^{j}) and E∂M_{n}^{j} (ω, γ^{j})/∂γ^{j′} are continuous on Γ_{j} uniformly in n, i.e. equicontinuous. Moreover, by Assumption 11 (iii), the latter two functions are uniformly bounded, i.e. there exists a finite M such that E | M_{n}^{j} (ω, γ^{j}) |< M and E | ∂M_{n}^{j} (ω, γ^{j})/∂γ^{j′} |< M for all γ^{j} and all n. Now define the nonstochastic function from Γ_{j} to R for j = 1, 2 where . Because is continuous and EM_{n}^{j} (ω, γ^{j}) is uniformly bounded and equicontinuous, then is equicontinuous. Moreover, because Γ_{j} is compact, (A.4) implies that . This completes the verification of Assumption 4.

Turning to Assumption 5, from the LDC theorem we have ∂Em_{t}^{j} (ω, γ^{j})/∂γ^{j′}= E∂m_{t}^{j}(ω, γ^{j})/∂γ^{j′}, for t = 1, 2,…. Thus is continuously differentiable on Γ_{j}, and we have by the chain rule

- (29)

In fact, is equicontinuous because each term of (A.6) is equicontinuous. The latter statement follows from the continuous differentiability of and the uniform boundedness and equicontinuity of EM_{n}^{j} (ω, γ^{j}) and E∂M_{n}^{j} (ω, γ^{j})/∂γ^{j′}. In particular, is bounded as required. Lastly, to prove uniform convergence, note that

- (30)

Since is continuously differentiable and Γ_{j} is compact, (A.4) and (A.5) imply

Thus, because ∂d_{j} {EM_{n}^{j} (ω, γ^{j}), γ^{j}}/∂m^{j′} and E∂M_{n}^{j} (ω, γ^{j})/∂γ^{j′} are both O(1), we obtain from (A.4)–(A.7) that uniformly on Γ_{j}. This completes the verification of Assumption 5.

##### Step 2: Verification of Assumption 6

A Taylor expansion gives

- (31)

where is in the line segment . Now because of (A.4), Assumption 3 and Domowitz and White (1982, Theorem 2.3). Hence . Since and Γ_{j} is compact, then . Thus, *provided*, which is proved later, (A.8) becomes

Hence, stacking up and using Assumption 10, we obtain

- (32)

where U_{n} is defined in Assumption 6, and .

We shall show that satisfies Assumption 6. To do so, define

First, by Assumptions 10 and 11 with r > 2, we have W_{jn} < ∞ and B_{jn} < ∞. Moreover, W_{jn}= O(1) and B_{jn}= O(1). The proof of the latter follows the proof that B_{n}^{o}= O(1) in Gallant and White (1988, pp. 86–87). Specifically, define (or Z_{nt}=λ′Y_{nt}^{j}) where λ∈ R (or λ∈ R), λ′ λ= 1 so that (or λ′B_{jn}λ). Now, since EZ_{nt}= 0,

But, given Assumptions 1, 9 and 11 (or 10), it follows from Gallant and White (1988, Lemma 3.14) that Z_{nt} is a mixingale of size −1 and *a fortiori* of size −1/2 with c_{nt}≤Δ<∞ for all n, t. Therefore, applying McLeish's inequality (Gallant and White (1988, Theorem 3.11)), we have . Thus, λ′W_{jn}λ (or λ′B_{jn}λ) is O(1) for arbitrary λ∈ R (or λ∈ R), λ′λ= 1, implying that W_{jn}= O(1) (or B_{jn}= O(1)).

Next, define V_{nt}= (V_{nt}^{1′}, V_{nt}^{2′})′ where so that V_{n} is the covariance matrix of by (8). Thus, by the Cauchy–Schwarz inequality, it follows from the preceding properties of W_{jn} and B_{jn} that V_{n} < ∞ and V_{n}= O(1). Moreover, by assumption, V_{n} is uniformly of rank s > 0. Hence, for every n = 1, 2,…, there exists a (k + q) × s matrix P_{n} that is uniformly of full column rank such that V_{n}= P_{n} P′_{n}. Since V_{n}= O(1), P_{n}= O(1) as well. Let so that, almost surely,

- (33)

We shall show that . The proof is similar to the proof that B_{n}^{o−1/2} M_{nt}^{o} N(0, I) in Gallant and White (1988, p. 87).

Specifically, define Z_{nt}=λ′(P′_{n} P_{n})^{−1} P′_{n} V_{nt}, where λ∈ R^{s}, λ′ λ= 1. Thus, EZ_{nt}= 0. By Assumptions 10 and 11, we have ‖V_{nt}^{j}‖_{r}≤Δ < ∞, r > 2. Moreover, because P_{n} is O(1) and uniformly of full column rank, (P′_{n} P_{n})^{−1}= O(1) and (P′_{n} P_{n})^{−1} P′_{n}= O(1). Thus ‖Z_{nt}‖_{r}≤Δ′ for r > 2. In addition, by Assumptions 10 and 11, {Z_{nt}} is near-epoch dependent on {X_{t}} of size −1 where {X_{t}} is mixing with φ_{m} of size −r/(r − 1) or α_{m} of size −2r/(r − 2) by Assumption 9. Define

so that v_{n}^{−2}= O(n^{−1}). Hence, by Gallant and White (1988, Theorem 5.3),

Thus, by the Cramér–Wold device. In particular, because P_{n}= O(1), (A.10) implies that , a condition which was needed to obtain (A.9).

##### Step 3: Computation of σ_{n}^{2}

We have

Using P_{n} P ′_{n}= V_{n} and formula (A.6) for , we obtain the desired result from σ_{n}^{2}= L′_{n} C′_{n} C_{n} L_{n}.
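Step 2 rests on two linear-algebra facts: a positive semidefinite V_{n} of rank s can be written V_{n}= P_{n}P′_{n} with P_{n} of full column rank s, so that (P′_{n}P_{n})^{−1}P′_{n} is a left inverse of P_{n}; and any quadratic form R′V_{n}R then equals ‖P′_{n}R‖², which is the form σ_{n}^{2}= R′_{n}V_{n}R_{n} used in the proof of Theorem 3. The numerical sketch below checks both facts. Building P_{n} from an eigendecomposition is our choice (any rank-revealing factorization would do), and the function name `low_rank_factor` and its tolerance are hypothetical.

```python
import numpy as np

def low_rank_factor(V, tol=1e-10):
    """Factor a symmetric PSD matrix V as P @ P.T with P of full column rank s = rank(V)."""
    eigval, eigvec = np.linalg.eigh(V)
    keep = eigval > tol * max(eigval.max(), 1.0)    # drop numerically zero eigenvalues
    return eigvec[:, keep] * np.sqrt(eigval[keep])  # scale each kept eigenvector

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 2))
V = A @ A.T                        # a 5 x 5 covariance matrix of rank s = 2
P = low_rank_factor(V)
R = rng.normal(size=5)

left_inverse = np.linalg.solve(P.T @ P, P.T)            # (P'P)^{-1} P'
print(P.shape)                                          # (5, 2)
print(np.allclose(P @ P.T, V))                          # True
print(np.allclose(left_inverse @ P, np.eye(2)))         # True
print(np.isclose(R @ V @ R, np.sum((P.T @ R) ** 2)))    # True
```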

##### Proof of Theorem 3

##### Step 1: Verification of Assumption 3

Let . The existence and measurability of follows from Gallant and White (1988, Lemma 2.1) or Jennrich (1969) because is measurable- for every θ^{j}∈Θ_{j} and is continuous in θ^{j} for almost all ω by Assumption 12 (i).

Define . The fact that belongs to the interior of Γ_{j} uniformly in n follows from Assumptions 13 and 14 (i). It remains to verify that . In view of Assumption 13, it suffices to prove that . By Assumption 12 (iii) and r > 2, we have (ω, θ^{j})‖≤Δ, for some Δ < ∞, and all θ^{j}∈Θ_{j} and n = 1, 2,…. Also, by Assumptions 12 (ii)–(iv), it follows from Gallant and White (1988, Theorem 3.18) that EM_{n}^{j} (ω, θ^{j}) is continuous on Θ_{j} uniformly in n, and uniformly on Θ_{j}. Hence, using Assumption 13, we have

The first vector is measurable on Ω and, for each ω∈Ω, is continuous in θ^{j}. The second vector is also continuous in θ^{j}. Since EM_{n}^{j} (ω, θ^{j}) is bounded on Θ_{j} uniformly in n, θ^{j}∈Θ_{j} compact, and compact, it follows from Gallant and White (1988, Lemma 3.4) or Bates and White (1985, Lemma 2.4) that

where . Therefore, by Gallant and White (1988, Theorem 3.3) or Domowitz and White (1982, Theorem 2.2), it follows from Assumption 14 (i) that .

##### Step 2: Verification of Assumption 10

First, we consider the asymptotic distribution of . Because belongs to the interior of Θ_{j} uniformly in n by Assumption 14, and , we have . From a Taylor expansion around we obtain

- (34)

where and . Now

- (35)

where ∂^{2} M_{n}^{j} (ω, θ^{j})/∂θ^{j}∂θ^{j′}≡(∂^{2} M_{n1}^{j} (ω, θ^{j})/∂θ^{j}∂θ^{j′},…, ∂^{2} M^{j} (ω, θ^{j})/∂θ^{j}∂θ^{j′})′ using the same notation as in Gallant and White (1988, Lemma 5.2). Also

On the other hand, from Assumption 12 (iii) and the LDC theorem,

Thus, from , we obtain

Note that every function in the right-hand side of any of the above three equations is bounded on Θ_{j} or Θ_{j}×ϒ_{j} uniformly in n because of Assumption 12 (iii). Hence and its derivatives are bounded on Θ_{j}×ϒ_{j} uniformly in n.

Now, given Assumptions 1, 9 and 12, it follows from Gallant and White (1988, Theorem 3.18) that

where EM_{n}^{j} (ω, θ^{j}), E∂M_{n}^{j} (ω, θ^{j})/∂θ^{j}, and E∂^{2} M_{n}^{j} (ω, θ^{j})/∂θ^{j}∂θ^{j′} are continuous on Θ_{j} uniformly in n. Since these three functions are bounded on Θ_{j} uniformly in n because of Assumption 12 (iii), and since and its first two partial derivatives are continuous and hence uniformly continuous on compact sets, it follows from White (1984, Proposition 2.16) that

Moreover, and are continuous on Θ_{j}×ϒ_{j} uniformly in n.

Using the preceding results with and because and , it follows from Domowitz and White (1982, Theorem 2.3) that

where and are both O(1). Hence, Assumption 14 (ii) implies that is nonsingular for n sufficiently large almost surely and A_{1jn}^{−1}= O(1). Thus, from (A.11) we obtain

But, from A_{1jn}= O(1) uniformly positive definite and A_{1jn}^{−1}= O(1), we have

using White (1984, Proposition 2.16) and = O(1). Thus, *provided*, and , we obtain

- (36)

Consider now the first term inside the braces. From (A.12) we obtain

- (37)

From a Taylor expansion around we have

where . Thus, multiplying (A.14) by , we obtain

- (38)

because , which follows from Assumption 14 (i), i.e.

But from uniform convergence established earlier. Hence, because EM_{n}^{j}, compact, compact, we have

where the second term is O(1). Moreover, since , we have

where the second terms are O(1). Since from Assumption 12 (iii), (A.15) becomes

- (39)

using the notation defined in the text, *provided* and are both O_{P } (1). The first O_{P } (1) condition follows from Assumptions 1, 9 and 12 by the argument used by Gallant and White (1988, pp. 85–86) for proving that . The second O_{P } (1) condition follows similarly. Note that (A.16) implies that , as required for deriving (A.13) because , , , and are all O(1) as mentioned earlier.

Collecting results, as in Gallant and White (1988, p. 75) let

where . Hence, from (A.16) we have

where EY_{1nt}^{j}= 0. Thus, from (A.13) and Assumption 13, we obtain

- (40)

which is in the form of Assumption 10. Now recall that and A_{1jn}^{−1}= O(1), as shown earlier, and A_{2jn}^{+}= O(1) by Assumption 13. Hence A_{jn}^{+}= O(1) as required. Second, E(Y_{nt}^{j}) = 0 where Y_{nt}^{j}= (Y_{1nt}^{j′}, Y_{2nt}^{j′})′. The r-integrability of Y_{nt}^{j} uniformly in n, t follows from Assumptions 12 and 13, and the boundedness of the nonstochastic matrices appearing in Y_{nt}^{j}. Finally, {Y_{nt}^{j}} is near-epoch dependent on {X_{t}} of size −1 because of Assumptions 12 and 13. This establishes Assumption 10.

##### Step 3: Computation of σ_{n}^{2}

From Theorem 2, we have σ_{n}^{2}= R′_{n} V_{n} R_{n} where ,

and A_{jn}^{+} is the upper block triangular matrix in (A.17). Thus, matrix algebra gives . Let V_{nt}= (V_{nt}^{1′}, V_{nt}^{2′})′, where . Hence, we obtain

where the last term has a covariance matrix V_{n} with uniform rank s > 0, as assumed in Theorem 3 and required by Theorem 2. The desired result follows.

##### Proof of Theorem 4

We prove three properties, and then the result.

##### Step 1

The first property is that . From the uniform strong convergence of M_{n}^{j} (ω, γ^{j}) and ∂M_{n}^{j} (ω, γ^{j})/∂γ^{j′} to EM_{n}^{j} (ω, γ^{j}) and E∂M_{n}^{j} (ω, γ^{j})/∂γ^{j′}, and from the equicontinuity of the two limit functions (see the proof of Theorem 2), it follows from Domowitz and White (1982, Theorem 2.3) that and . Since EM_{n}^{j} (ω, γ_{n}^{j}) and E∂M_{n}^{j} (ω, γ_{n}^{j})/∂γ^{j′} are both O(1), and continuous functions are uniformly continuous on compact sets, it follows that and , where and are both O(1). Using expressions (A.6) and (A.7), it follows from White (1984, Proposition 2.16) that . Since and A_{jn}^{+}= O(1) by assumption, we obtain the desired property.

##### Step 2

The second property is that . Its proof follows the proof of Theorem 5.6 in Gallant and White (1988). First, we have where

and V_{nt}= (V_{nt}^{1′}, V_{nt}^{2′})′ is the t-th term of the sum in (8). To see this, note that from (18) we have

Since it follows from Gallant and White (1988, Lemma 6.6) and Assumptions 1, 9, 15 (iii)–(iv) and 16 (iii)–(iv) that .

Second, we have where

- (41)

U_{nt}= (U_{nt}^{1′}, U_{nt}^{2′})′, and . To see this, note that

- (42)

because of (22) and EU_{nt}=μ_{nt}. Consider the leading term of the difference between (A.18) and (A.19). From Assumptions 15 (iii)–(iv) and 16 (iii)–(iv), we have by Gallant and White (1988, Corollary 4.3) that the elements of {U_{nt} U′_{nt}} are r-integrable uniformly in n, t and near-epoch dependent on {X_{t}} of size −1 and hence of size −1/2. Then, applying Gallant and White (1988, Theorem 6.2) on the elements of {U_{nt} U′_{nt}− E(U_{nt} U′_{nt})}, we obtain

because |w_{n0} | ≤Δ by Assumption 18. Moreover, from Gallant and White (1988, Lemma 6.7) and Assumptions 1, 9, 15 (iii)–(iv) and 16 (iii)–(iv), we have

Hence as claimed.

Third, we have . This is proved by taking a Taylor expansion around and by using an argument identical to that used by Gallant and White (1988, p. 106 and p. 118) for proving that . It is here where we use the assumption that the elements of {∂m_{t}^{j} (ω, γ^{j})/∂γ^{j′}} and {∂δ_{t}^{j} (ω, γ^{j})/∂γ^{j′}} are 2r-dominated uniformly on Γ_{j}. Collecting the preceding three facts, it follows that . Note that Λ_{n} is positive semidefinite, which immediately follows from Assumption 18 and Gallant and White (1988, Lemma 6.5) applied to λ′μ_{nt} for arbitrary λ.

##### Step 3

The third property is that R′_{n}Λ_{n} R_{n} → 0 under H_{0}^{*}. Let . Thus, from (22) we have

Consider the first term and more specifically

Since under H_{0}^{*} by condition (i), and m_{n}= o(n^{1/4}) by Assumption 17, it follows that R′_{n}Λ_{1n} R_{n} → 0 under H_{0}^{*}.

Consider the second term and more specifically

where the second inequality follows from ‖ρ′_{nt}‖≤ K < ∞ because of Assumptions 15 (iii) and 16 (iii). Since and m_{n}= o(n^{1/4}) by assumption, and R_{n}= O(1), it follows that R′_{n}Λ_{2n} R_{n} → 0 and R′_{n}Λ′_{2n} R_{n} → 0 under H_{0}^{*}.

Finally, consider the last term. We have

Hence, because |w_{nτ} | ≤Δ and |R′_{n}ρ_{nt} | ≤ d_{n} by assumption, we obtain

Since d_{n}^{2}= O(n^{−1/4}) by condition (ii) and m_{n}= o(n^{1/4}) by Assumption 17, it follows that R′_{n}Λ_{3n} R_{n} → 0. Therefore under H_{0}^{*}, we have R′_{n}Λ_{n} R_{n} → 0.

##### Step 4

We are now in a position to prove the theorem. From Steps 1 and 2 combined with R_{n}= O(1), we obtain , i.e. . Because V_{n}= O(1), R_{n}= O(1), and , then . Thus, using the definition of σ_{n}^{2}, we have , i.e. because of Step 3. Therefore the proof is complete if the term in parentheses converges in probability to zero under H_{0}^{*}.

To see the latter, we note that

while is given by a similar expression with replacing R_{n} R′_{n}. From Step 1 and R_{n}= O(1) we have . Hence under H_{0}^{*} by Step 3.

##### Proof of Lemma 1

- (i) Under H_{0}^{*}, (A.2) gives since . Since , it follows that if and only if L′_{n}C′_{n}= o(1), i.e. if and only if σ_{n}^{2}= o(1) since σ_{n}^{2}= L′_{n}C′_{n}C_{n}L_{n}.
- (ii) From (A.1) we have since and for j = 1, 2 by assumption. The desired result follows from Part (i).

##### Proof of Theorem 5

First, because model 1 is nested in model 2 according to the chosen selection criterion, we can take in view of Assumption 4. Thus H_{0}^{*} is equivalent to . Hence, because is the identifiably unique minimizer of on Γ_{2}, it is easily shown by contradiction that under H_{0}^{*}.

Next, we show that σ_{n}^{2}= o(1) by verifying condition (ii) of Lemma 1. Because model 1 is nested in model 2 according to the chosen selection criterion, such a condition is equivalent to . Taking a Taylor expansion around gives , where . This is satisfied because of assumption (ii) and . The desired result follows.