## Introduction

There is a large literature on Bayesian model comparison for nonspatial regression models, where the alternative models differ in their matrices of explanatory variables. For example, Zellner (1971) sets forth the basic Bayesian theory behind model comparison for the case where a discrete set of *m* alternative models is under consideration. The approach involves specifying prior probabilities for each model as well as prior distributions for the regression parameters. Posterior model probabilities are then calculated and used for inference about the alternative models based on different sets of explanatory variables.
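As a minimal sketch of the last step, posterior model probabilities follow from Bayes' rule once each model's marginal likelihood and prior probability are available. The function name `posterior_model_probs`, the equal-prior default, and the numerical values below are illustrative assumptions, not taken from the works cited.

```python
import numpy as np

def posterior_model_probs(log_marginals, prior_probs=None):
    """Posterior model probabilities from log marginal likelihoods log p(y | M_i)."""
    log_marginals = np.asarray(log_marginals, dtype=float)
    m = log_marginals.size
    if prior_probs is None:
        # Assumption: equal prior probability 1/m for each of the m models
        prior_probs = np.full(m, 1.0 / m)
    log_post = log_marginals + np.log(np.asarray(prior_probs, dtype=float))
    log_post -= log_post.max()  # subtract the maximum so exp() does not underflow
    weights = np.exp(log_post)
    return weights / weights.sum()  # normalize over the discrete set of models

# Hypothetical log marginal likelihoods for m = 3 candidate models
probs = posterior_model_probs([-120.3, -118.9, -125.0])
uniform = posterior_model_probs([-10.0, -10.0])
```

Working on the log scale matters in practice because marginal likelihoods for even moderately sized samples are often too small to represent directly.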

More recent works such as those by Fernández, Ley, and Steel (2001a, b) consider cases where the number of possible models *m* is large enough that calculating posterior probabilities for all models is difficult or infeasible. A Markov chain Monte Carlo model composition methodology known as *MC*^{3}, proposed by Madigan and York (1995), has gained popularity in the mathematical statistics and econometrics literature (e.g., Denison, Mallick, and Smith 1998; Raftery, Madigan, and Hoeting 1997; Fernández, Ley, and Steel 2001a, b). The popularity of *MC*^{3} arises in part from its ability to provide a theoretically justifiable approach to a question that often arises in regression modeling: which explanatory variables are most important in explaining variation in the dependent variable vector?
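The general idea can be illustrated for an ordinary nonspatial regression with a short sketch. It is not the sampler of any of the cited papers: it assumes a uniform prior over models, proposes a move by adding or deleting one randomly chosen variable, and uses −BIC/2 as a stand-in for the exact log marginal likelihood. The function names and simulated data are hypothetical.

```python
import numpy as np

def log_marginal_bic(y, X, include):
    """Approximate log p(y | M) by -BIC/2 for a regression with intercept.

    Assumption: the BIC approximation replaces the exact marginal likelihood.
    """
    n = y.size
    Z = np.column_stack([np.ones(n), X[:, include]])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    sigma2 = resid @ resid / n
    return -0.5 * (n * np.log(sigma2) + Z.shape[1] * np.log(n))

def mc3(y, X, n_iter=2000, burn_in=200, seed=0):
    """Metropolis sampler over models; returns variable inclusion frequencies."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    include = np.zeros(k, dtype=bool)  # start from the intercept-only model
    current = log_marginal_bic(y, X, include)
    visits = np.zeros(k)
    for it in range(n_iter):
        # Propose a neighboring model: add or delete one variable chosen uniformly
        proposal = include.copy()
        j = rng.integers(k)
        proposal[j] = not proposal[j]
        candidate = log_marginal_bic(y, X, proposal)
        # Symmetric proposal and uniform model prior: accept with
        # probability min(1, p(y | M') / p(y | M))
        if np.log(rng.uniform()) < candidate - current:
            include, current = proposal, candidate
        if it >= burn_in:
            visits += include
    return visits / (n_iter - burn_in)

# Simulated data: only the first two of five candidate variables matter
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = 1.0 + 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(200)
freq = mc3(y, X)
```

The inclusion frequencies in `freq` estimate how often each variable appears in the models the chain visits, which is one way to summarize variable importance without enumerating every possible model.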

This article develops the *MC*^{3} methodology for two important spatial econometric regression models that have received widespread application (LeSage and Pace 2004). A host of additional complications arise when attempting to extend regression-based approaches to these spatial regression estimators. We provide theoretical details regarding these issues as well as computationally efficient solutions.

The models for which we provide an *MC*^{3} modeling approach are members of a class of spatial regression models introduced in Ord (1975) and elaborated in Anselin (1988), shown in (1) and (2). The sample of *n* observations in the vector *y* represents a cross-section of regions located in space, for example, counties, states, or countries.

$$y = \rho W y + \alpha \iota_n + X\beta + \varepsilon \tag{1}$$

$$y = \alpha \iota_n + X\beta + u, \qquad u = \lambda W u + \varepsilon \tag{2}$$

The *n* by *k* matrix *X* contains explanatory variables as in ordinary least-squares (OLS) regression, β is an associated *k* by 1 parameter vector, and ɛ is an *n* by 1 disturbance vector, which we assume takes the form ɛ ∼ *N*(0, σ^{2}*I*_{n}). The scalar α represents the intercept parameter and ι_{n} an *n* by 1 vector of ones. The *n* by *n* matrix *W* specifies the structure of spatial dependence between observations (*y*) or disturbances (*u*), with a common specification having elements *W*_{ij}>0 for observations *j*=1, …, *n* sufficiently close (as measured by some distance metric) to observation *i*. As noted above, observations reflect geographical regions, and so distances might be calculated on the basis of the centroids of the regions/observations. The expressions *Wy* and *Wu* produce vectors that are often called spatial lags, and ρ and λ denote scalar parameters to be estimated along with β and σ^{2}. Nonzero values for the scalars ρ and λ indicate that the spatial lags exert an influence on *y* or *u*. In this study, we refer to the model in (1) as the spatial autoregressive model (SAR) and that in (2) as the spatial error model (SEM). We note that setting ρ=0 in the SAR model or λ=0 in the SEM model produces a least-squares regression model.
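To make the two data-generating processes concrete, the sketch below simulates both models. It assumes the standard forms implied by the description, *y* = ρ*Wy* + αι_{n} + *X*β + ɛ for the SAR and *y* = αι_{n} + *X*β + *u* with *u* = λ*Wu* + ɛ for the SEM, together with normal disturbances. The ring-shaped weight matrix and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2

# Hypothetical W: each observation's neighbors are the two adjacent
# positions on a ring, with equal weights and a zero diagonal
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

X = rng.standard_normal((n, k))
beta = np.array([1.0, -0.5])   # illustrative parameter values
alpha, sigma = 2.0, 1.0
rho, lam = 0.6, 0.4
iota = np.ones(n)
eps = sigma * rng.standard_normal(n)
I_n = np.eye(n)

# SAR: solve (I - rho W) y = alpha iota + X beta + eps for y
y_sar = np.linalg.solve(I_n - rho * W, alpha * iota + X @ beta + eps)

# SEM: solve (I - lam W) u = eps for u, then add the regression mean
u = np.linalg.solve(I_n - lam * W, eps)
y_sem = alpha * iota + X @ beta + u

# Setting rho = 0 collapses the SAR to an ordinary regression model
y_ols = np.linalg.solve(I_n - 0.0 * W, alpha * iota + X @ beta + eps)
```

In the SAR the spatial lag *Wy* enters the mean of *y*, so a change in *X* for one region spills over to its neighbors, whereas in the SEM the dependence is confined to the disturbances.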

There are a number of alternative ways to specify the spatial connectivity structure for the matrix *W*, where a particular specification selects elements *i*, *j* that are nonzero, reflecting spatial dependence between observations *i* and *j*. By convention, the diagonal elements of the spatial weight matrix *W* are set to zero to preclude an observation or disturbance term from dependence on its own value. We do not focus on specification issues related to the weight matrix *W*, a potentially important aspect of SAR and SEM model specification. We focus exclusively on model specification issues regarding the choice of explanatory variables that appear in the matrix *X*. An implication of this is that estimates for β, ρ, λ, and σ are conditional on the particular weight structure used, but this is implicitly assumed by conventional maximum likelihood estimation algorithms for the models in (1) and (2), which treat the weight matrix *W* as given or fixed.
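One common specification can be sketched as follows: treat the nearest few regions, measured by distance between centroids, as neighbors. The helper `knn_weight_matrix`, the choice of four neighbors, the row normalization, and the random centroids are all assumptions made for illustration; the text does not commit to any particular weight structure.

```python
import numpy as np

def knn_weight_matrix(centroids, k=4):
    """Row-normalized weight matrix linking each region to its k nearest neighbors."""
    centroids = np.asarray(centroids, dtype=float)
    n = centroids.shape[0]
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)  # a region is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nearest] = 1.0
    # Assumption: normalize rows to sum to one, so W @ y averages neighbors' values
    return W / W.sum(axis=1, keepdims=True)

# Hypothetical centroids for 49 regions on the unit square
rng = np.random.default_rng(0)
coords = rng.uniform(size=(49, 2))
W = knn_weight_matrix(coords, k=4)
```

Note that a nearest-neighbor matrix of this kind is generally not symmetric: region *j* can be among the nearest neighbors of region *i* without the reverse holding.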

A related model specification issue that arises in applied work is selection of the appropriate model, SAR or SEM in our case.^{1} Numerous Monte Carlo studies of alternative systematic or sequential specification search approaches that address this facet of spatial regression model specification have been published in the literature, and a review of these can be found in Florax, Folmer, and Rey (2003).

The literature on the specific question of which explanatory variables should be included in the *X* matrix for the class of models that includes the SAR and SEM is much sparser. Hepple (1995a, b) sets forth conventional Bayesian methods for the case involving a small set of *m* models, where it is feasible to calculate *m* posterior model probabilities. To our knowledge, the topic of *MC*^{3} methodology for this class of models has not yet been discussed.

In the next section, we review a unified Bayesian approach to model specification in the context of the SAR and SEM models. We extend the discussion set forth in Hepple (1995a, b) by including informative priors for the parameters β, λ, ρ, and σ in these models. Our presentation in the third section focuses on the *MC*^{3} methodology and details needed to apply this approach to the SAR and SEM models. The fourth section of the article provides an illustrative example of the proposed method in a model of origin–destination migration flows across the 48 contiguous U.S. states and the District of Columbia. There are numerous other spatial econometric applications where interest focuses on which explanatory variables are most important, providing a large number of opportunities to apply the methods described here.