## Introduction

Most procedures for choosing between two competing econometric models take the following form: each econometric model is estimated by a method that solves some optimization problem; the models are then compared by defining an appropriate goodness-of-fit or selection criterion for each model; and the better-fitting model according to this criterion is selected. In some cases the method of estimation for each model maximizes the goodness-of-fit criterion used for model selection. For instance, when the competing models are fully parametrized and estimated by maximum likelihood, some popular procedures for model selection are based on the Akaike (1973), Akaike (1974) information criterion (AIC), the Schwarz (1978) information criterion (SIC), or the Hannan and Quinn (1979) criterion. In other cases, a different goodness-of-fit criterion is used for model selection. This arises, for example, when the competing models are estimated using the same sample and compared on their out-of-sample mean squared errors of prediction (MSEP). See Linhart and Zucchini (1986) for various other model selection criteria and procedures.

These model selection procedures are not entirely satisfactory. Since model selection criteria depend on sample information, their actual values are subject to statistical variation. As a consequence, a model with a higher model selection criterion value may not *significantly* outperform its competitor. When the competing models are fully parametrized, nonnested, and estimated by maximum likelihood (ML), and when the observations are independent and identically distributed (i.i.d.), Linhart (1988) and Vuong (1989) independently proposed a general testing procedure that takes statistical variation into account and relies on convenient asymptotically standard normal tests for model selection based on the familiar likelihood ratio (LR) test statistic.^{2} These tests evaluate the null hypothesis that the competing models are equally close to the data generating process (DGP) against the alternative hypotheses that one model is closer to the DGP, where closeness of a model is measured according to the Kullback–Leibler (1951) information criterion (KLIC). Thus, as in classical nested hypothesis testing, outcomes of the tests provide information on the strength of the statistical evidence for the choice of a model based on its goodness-of-fit.^{3}
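To make the idea concrete, the following is a minimal sketch of a Vuong-type LR statistic for two strictly nonnested, fully parametrized models under i.i.d. sampling. It assumes the per-observation log-likelihoods have already been evaluated at each model's ML estimate; the function name `vuong_statistic`, the simulated gamma data, and the exponential and log-normal candidates are illustrative assumptions, not part of the original text.

```python
import math
import numpy as np

def vuong_statistic(loglik_f, loglik_g):
    """Normalized LR statistic for two nonnested models (i.i.d. case).

    loglik_f, loglik_g: per-observation log-likelihoods of models f and g,
    each evaluated at that model's own ML estimate (assumed given).
    Returns the statistic and a two-sided standard normal p-value.
    """
    m = np.asarray(loglik_f, dtype=float) - np.asarray(loglik_g, dtype=float)
    n = m.size
    lr = m.sum()                      # log-likelihood ratio
    omega = math.sqrt(np.var(m))      # sample std. dev. of the pointwise log-ratios
    if omega == 0.0:
        raise ValueError("zero variance: the models look observationally identical")
    z = lr / (math.sqrt(n) * omega)   # asymptotically N(0, 1) under the null
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Illustration on simulated data (an assumption): both candidates are misspecified.
rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.0, size=500)

# Model f: exponential, ML estimate of the rate is 1 / sample mean.
rate = 1.0 / y.mean()
ll_exp = np.log(rate) - rate * y

# Model g: log-normal, ML estimates are the mean and variance of log(y).
logy = np.log(y)
mu, s2 = logy.mean(), logy.var()
ll_lognorm = -logy - 0.5 * np.log(2 * np.pi * s2) - (logy - mu) ** 2 / (2 * s2)

z, p = vuong_statistic(ll_exp, ll_lognorm)
print(f"statistic = {z:.3f}, two-sided p-value = {p:.4f}")
```

A large positive value favors the first model, a large negative value favors the second, and a value inside the usual standard normal critical bounds means the data cannot discriminate between the two. The sketch omits any AIC- or SIC-type adjustment of the likelihoods.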

Though quite general, the applicability of Vuong's model selection tests is currently limited for various reasons. First, because they are based on the likelihood function, these tests require that the competing models be completely parametrized. For instance, this implies that error terms in competing nonlinear regression models or simultaneous equations models must be specified to belong to some parametric family of distributions. As a consequence, Vuong's tests cannot be used to discriminate between two econometric models defined by moment conditions, or more generally, between two competing models that are incompletely specified.

The second limitation arises from the method of estimation. While ML estimation is quite natural when the model selection criterion is KLIC (see e.g. White (1982)), there are various reasons that may lead a researcher to use an estimation method other than ML. For instance, for computational simplicity, robustness reasons, or by necessity because the competing models are incompletely specified, one may use an instrumental variable (IV) estimator or more generally a generalized method of moments (GMM) estimator (see Hansen (1982)), a robust estimator (see Huber (1981), Hampel *et al.* (1986)) or other extremum estimators (see Amemiya (1985), Gallant and White (1988)), a semiparametric estimator (see Andrews (1994), Newey and McFadden (1994), Powell (1994)), etc. Thus it is useful to provide a model selection testing framework that allows for a wide variety of estimation techniques.

Third, the maximum value of the likelihood (possibly adjusted) is not the only model selection criterion used in practice. For instance, when dealing with qualitative dependent variables models, alternative model selection criteria are Pearson-type goodness-of-fit statistics (see e.g. Moore (1978), Heckman (1981)). In linear regression models, criteria based on the in-sample MSEP are widely used (see e.g. Mallows (1973), Amemiya (1980)). When comparing the relative performance of macroeconometric models, a frequent criterion is the out-of-sample MSEP (see e.g. Meese and Rogoff (1983), Fair and Shiller (1990)). Another approach to the use of out-of-sample forecast performance for time series models is illustrated in Findley *et al.* (1998). Along these lines, recent contributions on model selection based on out-of-sample predictability are Diebold and Mariano (1995), West (1994), West (1996), Granger and Pesaran (2000), and White (2000).

Moreover, when the models are incompletely specified, the use of criteria other than the ML values becomes necessary. For instance, Sargan (1958) and Pesaran and Smith (1994) have proposed using the value of the IV criterion function when the competing models are simultaneous equations models estimated by IV methods. Recently, for robustness reasons, Martin (1980), Ronchetti (1985) and Machado (1993) have proposed robust versions of the AIC and SIC criteria by replacing the likelihood part by the extremal value of the sample objective function defining the robust estimator used. See also Konishi and Kitagawa (1996) who propose a generalized information criterion that can be used with robust parameter estimates. Though the preceding criteria have a goodness-of-fit appeal, a model selection criterion need not have such a property. For instance, the precision or MSE of the parameter estimates of interest in the competing models can be a criterion for model selection (see e.g. Torro-Vizcarrondo and Wallace (1968)). This list of criteria is clearly not exhaustive. It suggests, however, that the choice of a model selection criterion depends on the researcher and the purpose of the econometric modelling.

Fourth, Vuong's tests are derived for competing models that are completely static and for observations that are i.i.d. Clearly, the i.i.d. assumption is restrictive when considering time series data. Moreover, dynamic models are frequently considered in empirical work. For instance, a classical question is to determine the order of an ARMA process (see e.g. Hannan and Quinn (1979)). Some generalizations of Vuong's tests to time series models have been undertaken recently and independently from the present work. Findley (1990), Findley (1991) considers essentially the case of competing Gaussian ARMA models when the true DGP is a strictly stationary process. Findley and Wei (1993) provide a generalization to some dynamic regression models.

The goal of the present paper is thus clear: to generalize Vuong's (1989) model selection tests in several important directions. First, the present paper allows for incompletely specified models such as econometric models defined by moment conditions. Second, it allows for a broad class of estimation methods that includes most estimators used in practice such as the ML estimator, minimum chi-square estimators, GMM estimators, as well as other extremum estimators and some semiparametric estimators. In particular, we shall require that the estimators be $\sqrt{n}$-consistent. See Gallant and White (1988) and Newey and McFadden (1994) for the class of parametric and semiparametric estimators considered here. Third, the present paper allows for model selection criteria other than the models' likelihoods. An important example is the out-of-sample MSEP. Lastly, our tests are obtained for weakly dependent heterogeneous data. This permits the application of our tests to the selection of nonlinear dynamic models in time series settings.

At the outset, it is useful to stress that a distinctive feature of our approach is that both competing models may be misspecified. In particular, our approach does not require that either competing model be correctly specified under the null hypothesis under test. Such a feature reflects the observation that one can seldom specify a statistical model that accurately describes the data in empirical work, especially in the social sciences. See e.g. Nakamura *et al.* (1990) for a similar point of view. As we shall see, however, this does not prevent model comparisons.

Not requiring correct specification of the competing models also contrasts with the tests of Cox (1961), Cox (1962), which led to the development of a vast econometric literature on testing nonnested hypotheses (see Pesaran (1974) and the surveys by MacKinnon (1983) and McAleer (1987)).^{4} Indeed, under the null hypothesis under test in Cox's approach, one of the competing models is correctly specified. Moreover, Cox's tests are more difficult to compute than ours because they require a consistent estimate of the asymptotic mean of the test statistic under the null hypothesis. This also raises some theoretical difficulties when the competing models are incompletely specified. See e.g. Ghysels and Hall (1990) and Smith (1992).

The paper is divided into six more sections. Section 2 describes in some detail the general model selection framework for nonlinear dynamic models. A series of hypotheses about the asymptotic fit of the models is put forth. A basic result on the asymptotic properties of our test statistics is established under general conditions on the model selection criteria and the estimators used. The next two sections seek more primitive conditions ensuring that the conditions of this basic result hold. As the preceding examples of model selection procedures suggest, we distinguish two cases. Section 3 considers the case where model selection criteria can be viewed as optimands of the Gallant and White (1988) type although the estimators that are employed are not those that maximize these criteria. In contrast, Section 4 specializes to the case where models are estimated by maximizing some criteria that are used subsequently for model selection. This section also covers the case where models are estimated by means of two-stage procedures in which the second stage involves optimizing goodness-of-fit conditional upon preliminary estimates of the nuisance parameters. Section 5 considers estimation of the asymptotic variance of the difference in goodness-of-fit despite possible misspecification of the competing models. This step is necessary for the construction of our test statistics. Section 6 discusses a critical condition for our test statistics to be asymptotically normal. Essentially, this condition requires that the estimated models be nonnested. Throughout, our theoretical results are illustrated with the comparison between two nonnested autoregressive models based on their in-sample and out-of-sample MSEP. Section 7 concludes. An appendix collects the proofs of our results.
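As a preview of the kind of comparison used as a running illustration, the following minimal sketch computes in-sample and out-of-sample MSEP for two nonnested autoregressive specifications, one using only the first lag and one using only the second lag. The simulated AR(2) process, the sample split, the OLS estimator, and all function names are assumptions chosen for illustration and are not taken from the paper. The sketch stops at the raw criterion values; whether their difference is statistically significant is the question the tests developed in the following sections address.

```python
import numpy as np

def fit_ar_lag(y, lag, start, end):
    """OLS of y[t] on a constant and y[t - lag] for t in [start, end)."""
    t = np.arange(start, end)
    X = np.column_stack([np.ones(t.size), y[t - lag]])
    beta, *_ = np.linalg.lstsq(X, y[t], rcond=None)
    return beta

def squared_errors(y, lag, beta, start, end):
    """Squared one-step prediction errors for t in [start, end)."""
    t = np.arange(start, end)
    pred = beta[0] + beta[1] * y[t - lag]
    return (y[t] - pred) ** 2

def msep_comparison(y, split, max_lag=2):
    """In-sample and out-of-sample MSEP for the lag-1 and lag-2 models.

    Both models are estimated on t in [max_lag, split) and then used,
    with parameters held fixed, to predict t in [split, len(y)).
    """
    out = {}
    for lag in (1, 2):
        beta = fit_ar_lag(y, lag, max_lag, split)
        out[lag] = {
            "in": squared_errors(y, lag, beta, max_lag, split).mean(),
            "out": squared_errors(y, lag, beta, split, y.size).mean(),
        }
    return out

# Simulated AR(2) data (an assumption): neither single-lag model is correct.
rng = np.random.default_rng(0)
n = 400
e = rng.standard_normal(n)
y_demo = np.zeros(n)
for t in range(2, n):
    y_demo[t] = 0.5 * y_demo[t - 1] + 0.3 * y_demo[t - 2] + e[t]

res_demo = msep_comparison(y_demo, split=300)
for lag, crit in res_demo.items():
    print(f"lag {lag}: in-sample MSEP = {crit['in']:.3f}, "
          f"out-of-sample MSEP = {crit['out']:.3f}")
print("out-of-sample difference (lag 1 minus lag 2):",
      round(res_demo[1]["out"] - res_demo[2]["out"], 3))
```

Because the true process depends on both lags, both candidate models are misspecified, which matches the setting emphasized above in which neither competing model needs to be correct.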