Bayesian Model Averaging for Spatial Econometric Models


James P. LeSage, McCoy Endowed Chair of Urban and Regional Economics, McCoy College of Business Administration, Department of Finance and Economics, Texas State University—San Marcos, San Marcos, TX 78666


We extend the literature on Bayesian model comparison for ordinary least-squares regression models to include spatial autoregressive and spatial error models. Our focus is on comparing models that consist of different matrices of explanatory variables. A Markov Chain Monte Carlo model composition methodology labeled MC3 by Madigan and York is developed for two types of spatial econometric models that are frequently used in the literature. The methodology deals with cases where the number of possible models based on different combinations of candidate explanatory variables is large enough such that calculation of posterior probabilities for all models is difficult or infeasible. Estimates and inferences are produced by averaging over models using the posterior model probabilities as weights, a procedure known as Bayesian model averaging. We illustrate the methods using a spatial econometric model of origin–destination population migration flows between the 48 U.S. states and the District of Columbia during the 1990–2000 period.


There is a great deal of literature on Bayesian model comparison for nonspatial regression models, where alternative models consist of those based on differing matrices of explanatory variables. For example, Zellner (1971) sets forth the basic Bayesian theory behind model comparison for the case where a discrete set of m alternative models are under consideration. The approach involves specifying prior probabilities for each model as well as prior distributions for the regression parameters. Posterior model probabilities are then calculated and used for inferences regarding the alternative models based on different sets of explanatory variables.

More recent works such as that by Fernández, Ley, and Steel (2001a, b) consider cases where the number of possible models m is large enough such that calculation of posterior probabilities for all models is difficult or infeasible. A Markov Chain Monte Carlo model composition methodology known as MC3 proposed by Madigan and York (1995) has gained popularity in the mathematical statistics and econometrics literature (e.g., Denison, Mallick, and Smith 1998; Raftery, Madigan, and Hoeting 1997; Fernández, Ley, and Steel 2001a, b). The popularity of MC3 arises in part from its ability to provide a theoretically justifiable approach to a question that often arises in regression modeling—which explanatory variables are most important in explaining variation in the dependent variable vector?

This article develops the MC3 methodology for two important spatial econometric regression models that have received widespread application (LeSage and Pace 2004). A host of additional complications arise when attempting to extend regression-based approaches to these spatial regression estimators. We provide theoretical details regarding these issues as well as computationally efficient solutions.

The models for which we provide an MC3 modeling approach are members of a class of spatial regression models introduced in Ord (1975) and elaborated in Anselin (1988), shown in (1) and (2). The sample of n observations in the vector y represents a cross-section of regions located in space, for example, counties, states, or countries.

y = ρWy + αιn + Xβ + ɛ    (1)

y = αιn + Xβ + u,  u = λWu + ɛ    (2)
The n by k matrix X contains explanatory variables as in ordinary least-squares (OLS) regression, β is an associated k by 1 parameter vector, and ɛ is an n by 1 disturbance vector, which we assume takes the form ɛ ∼ N(0, σ2In), where In denotes the n by n identity matrix. The scalar α represents the intercept parameter and ιn an n by 1 vector of ones. The n by n matrix W specifies the structure of spatial dependence between observations (y) or disturbances (u), with a common specification having elements Wij > 0 for observations j=1, …, n sufficiently close (as measured by some distance metric) to observation i. As noted above, observations reflect geographical regions, and so distances might be calculated on the basis of the centroid of the regions/observations. The expressions Wy and Wu produce vectors that are often called spatial lags, and ρ and λ denote scalar parameters to be estimated along with β and σ2. Nonzero values for the scalars ρ and λ indicate that the spatial lags exert an influence on y or u. In this study, we refer to the model in (1) as the spatial autoregressive model (SAR) and that in (2) as the spatial error model (SEM). We note that setting ρ=0 in the SAR model or λ=0 in the SEM model produces a least-squares regression model.
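As a concrete illustration of the two data-generating processes, the sketch below simulates small SAR and SEM samples. The ring-shaped weight matrix and all parameter values are illustrative assumptions, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 49, 2

# Ring-shaped, row-standardized weight matrix with a zero diagonal:
# each observation has two equally weighted "neighbors".
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = rng.standard_normal((n, k))
beta = np.array([1.0, -1.0])
alpha, sigma = 0.5, 1.0
eps = sigma * rng.standard_normal(n)

# SAR (1): y = rho*W*y + alpha*1 + X*beta + eps,
# solved as y = (I - rho*W)^{-1} (alpha*1 + X*beta + eps).
rho = 0.6
A = np.eye(n) - rho * W
y_sar = np.linalg.solve(A, alpha * np.ones(n) + X @ beta + eps)

# SEM (2): y = alpha*1 + X*beta + u with u = lam*W*u + eps,
# so u = (I - lam*W)^{-1} eps.
lam = 0.6
u = np.linalg.solve(np.eye(n) - lam * W, eps)
y_sem = alpha * np.ones(n) + X @ beta + u
```

Solving the linear system rather than inverting (In − ρW) directly mirrors what careful implementations do for numerical stability.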

There are a number of alternative ways to specify the spatial connectivity structure for the matrix W, where a particular specification selects elements i, j that are nonzero, reflecting spatial dependence between observations i and j. By convention, the diagonal elements of the spatial weight matrix W are set to zero to preclude an observation or disturbance term from dependence on its own value. We do not focus on specification issues related to the weight matrix W, a potentially important aspect of SAR and SEM model specification. We focus exclusively on model specification issues regarding the choice of explanatory variables that appear in the matrix X. An implication of this is that estimates for β, ρ, λ, and σ are conditional on the particular weight structure used, but this is implicitly assumed by conventional maximum likelihood estimation algorithms for the models in (1) and (2), which treat the weight matrix W as given or fixed.
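One common specification of the connectivity structure uses the k nearest neighbors among region centroids. The helper below is a minimal sketch of this construction; the function name and the choice of Euclidean centroid distance are our own:

```python
import numpy as np

def knn_weight_matrix(coords, k=4):
    """Row-standardized k-nearest-neighbor spatial weight matrix.

    coords is an (n, 2) array of region centroids.  The diagonal is
    zero by construction and every row sums to unity.
    """
    n = coords.shape[0]
    # Pairwise Euclidean centroid distances, self-distance excluded.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        nearest = np.argsort(d[i])[:k]   # indices of the k closest regions
        W[i, nearest] = 1.0 / k          # equal weights, row sum = 1
    return W
```

Setting the self-distance to infinity before sorting enforces the zero-diagonal convention described above.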

A related model specification issue that arises in applied work is selection of the appropriate model, SAR or SEM in our case.1 Numerous Monte Carlo studies of alternative systematic or sequential specification search approaches that address this facet of spatial regression model specification have been published in the literature, and a review of these can be found in Florax, Folmer, and Rey (2003).

The literature focusing on the specific question of which explanatory variables should be included in the X matrix for the class of models including SAR and SEM models is less prevalent. Hepple (1995a, b) sets forth conventional Bayesian methods for the case involving a small set of m models, where it is feasible to calculate m posterior model probabilities. To our knowledge, the topic of MC3 methodology for this class of models has not yet been discussed.

In the next section, we review a unified Bayesian approach to model specification in the context of the SAR and SEM models. We extend the discussion set forth in Hepple (1995a, b) by including informative priors for the parameters β, λ, ρ, and σ in these models. Our presentation in the third section focuses on the MC3 methodology and details needed to apply this approach to the SAR and SEM models. The fourth section of the article provides an illustrative example of the proposed method in a model of origin–destination migration flows across the 48 contiguous U.S. states and the District of Columbia. There are numerous other spatial econometric applications where interest focuses on which explanatory variables are most important, providing a large number of opportunities to apply the methods described here.

A Bayesian approach to model specification

Other authors have set forth the Bayesian theory behind model comparison that involves specifying prior probabilities for each of the m alternative models M=M1, M2, …, Mm under consideration, which we label π(Mi), i=1, …, m, as well as prior distributions for the parameters π(η), where η=(ρ, α, β, σ) (e.g., Zellner 1971; Fernández, Ley, and Steel 2001b). Our focus here is on comparing models with different explanatory variables.

If the sample data are to determine the posterior model probabilities, the prior probabilities should be set to equal values of 1/m, making each model equally likely a priori. These are combined with the likelihood for y conditional on η as well as the set of models M, which we denote p(y|η, M). The joint probability for M, η, and y takes the form

p(M, η, y) = π(M)π(η|M)p(y|η, M)    (3)
Application of Bayes rule produces the joint posterior for both models and parameters as

p(M, η|y) = π(M)π(η|M)p(y|η, M)/p(y)    (4)
The posterior probabilities regarding the models take the form

p(Mi|y) ∝ π(Mi) ∫ p(y|η, Mi)π(η|Mi) dη    (5)
which requires integration over the parameter vector η. Our focus in the next two subsections is development of the marginal posterior in (5) for the SAR and SEM models as this plays a key role in the MC3 model comparison methodology set forth in the following section.

Log-marginal posterior for the SAR model

We initially consider the SAR model, where the likelihood function for the parameters η=(α, β, σ, ρ), based on the data y takes the form shown in (6), where we include the spatial weight matrix W to indicate that the likelihood is conditional on the particular weight matrix used in the model. That is, the weight matrix is taken as given and treated in the same manner as the sample data information in y, X.

p(y|η, Mi, W) = (2πσ2)^(−n/2) |In − ρW| exp[−(1/(2σ2)) e′e],  e = (In − ρW)y − αιn − Xβ    (6)
We can define the prior distributions for the parameters in η using a number of different approaches. An issue that arises in Bayesian model comparison is that posterior model probabilities can be sensitive to alternative specifications for the prior information. Use of diffuse priors on the model parameters might seem desirable in this situation, but can lead to paradoxical outcomes as noted by Lindley (1957). We draw on the work of Fernández, Ley, and Steel (2001b) for the case of least-squares models and rely on Zellner's g-prior for the parameters β in the model. An uninformative prior is assigned to the intercept parameter α and a conventional gamma prior is used for the parameter σ. We discuss two possible priors for the spatial dependence parameter ρ. In the sequel, we provide details regarding the priors for the parameters α, β, σ, and ρ in turn.

A great deal of computational simplicity arises if we follow Fernández, Ley, and Steel (2001a) and employ Zellner's g-prior (Zellner 1986) for the parameters β in the SAR model. In addition to simplifying matters, Fernández, Ley, and Steel (2001a) provide a theoretical justification for use of the g-prior as well as Monte Carlo evidence comparing nine alternative approaches to setting the hyperparameter g. Based on the Monte Carlo experiments, they recommend setting g = 1/max(n, ki2), where n denotes the number of observations, and ki the number of explanatory variables in the least-squares matrix X associated with the model indexed by Mi.

We allow for an intercept parameter α in the model as the dependent variable vector y is not transformed. The matrix X represents the set of explanatory variables excluding the constant term, transformed by subtracting the means of each variable vector and dividing by the standard deviation. We assume that the spatial lag vector Wy appears in all models as does the intercept term, leaving only the variable vectors in the matrix X subject to change as we compare alternative models. This approach mirrors that developed by Fernández, Ley, and Steel (2001a), where the intercept term appears in all models. Another justification for this approach is that in the absence of spatial dependence where the parameter ρ equals zero, we would rely on standard regression-based models of the type described in Fernández, Ley, and Steel (2001a). In our spatial context, it seems intuitively appealing that, in the absence of explanatory variables X, the dependent variable y is modeled to follow a simple spatial autoregression, y = ρWy + αιn + ɛ.

We can adopt an approach similar to that of Fernández, Ley, and Steel (2001a), who rely on a noninformative prior for the intercept term α, which seems desirable as the vector y is untransformed. The g-prior on the regression coefficients for model Mi, which we designate as βi, takes the form shown in (7). One motivation behind this prior setting is a desire to provide prior information that will not exert undue influence on posterior conclusions regarding choices between alternative models based on different sets of explanatory variables. Another aspect of the g-prior that is attractive in situations involving model comparisons based on alternative sets of explanatory variables is the ability of the prior to take the covariance structure of the explanatory variables in X into account.

βi|σ ∼ N[0, σ2(gXi′Xi)^(−1)]    (7)
Note that the studentized form of X will facilitate the prior mean of zero placed on β. Given the normal prior for the parameters β, an inverted gamma prior for σ2 shown in (8), with parameters ν and s2, allows us to draw on the conjugate nature of these two prior distributions from standard Bayesian regression theory.

π(σ2) ∝ (σ2)^(−(ν/2+1)) exp(−νs2/(2σ2))    (8)
The results that we derive can be extended to the case of a noninformative prior on σ2, as this arises as a special case when ν = 0.

When considering a prior for the parameter ρ, we note that numerical integration will be required with respect to this parameter to produce the marginal posterior distribution for the SAR model. This allows flexibility in specifying a prior πr(ρ). However, we note that Lemma 2 in Sun, Tsutakawa, and Speckman (1999) indicates a restricted region of support for the parameter ρ. For row-standardized weight matrices W (where the row-sums are unity) typically used in application of these models, ρ must lie in the interval (1/μmin, 1), where μmin denotes the minimum eigenvalue of the spatial weight matrix W. Further, μmin < 0, so 1/μmin < 0. We also note that for many spatial data sets that exhibit significant positive spatial dependence, a restriction of 0 < ρ < 1 may be appropriate. For more general cases where negative spatial dependence is not to be ruled out a priori, a restriction −1 < ρ < 1 is often reasonable in that this represents the effective region of support.
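The interval of support can be computed directly from the eigenvalues of W. A minimal sketch, assuming a row-standardized W (the ring-shaped example matrix is our own stand-in for a contiguity matrix):

```python
import numpy as np

def rho_support(W):
    """Interval of support (1/mu_min, 1) for rho when W is
    row-standardized, per Lemma 2 of Sun, Tsutakawa, and Speckman (1999)."""
    mu_min = np.linalg.eigvals(W).real.min()
    return 1.0 / mu_min, 1.0

# Illustrative ring-shaped W with row sums of unity.
n = 49
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

lo, hi = rho_support(W)   # lo is slightly below -1 for this W
```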

We suggest two alternative priors for the parameter ρ consistent with these notions: one being a uniform prior on the interval [−1, 1] and the other a Beta(a0, a0) prior centered on zero.

π(ρ) = (1/2)[1/B(a0, a0)][(1 + ρ)/2]^(a0−1)[(1 − ρ)/2]^(a0−1)    (9)
Fig. 1 depicts prior distributions associated with hyperparameter values for the Beta prior of a0=1.01, 1.1, and 2, illustrating that values of a0 near unity produce a relatively uninformative prior, while still downweighting toward zero the prior weight placed on the end points of the interval for ρ, consistent with the theoretical restrictions. We note that this prior can also be used for the spatial dependence parameter λ in the SEM model discussed later.
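The stretched Beta(a0, a0) density on (−1, 1) can be written out and checked numerically. The normalization below follows the standard change of variables from [0, 1] to [−1, 1]; it is our own rendering rather than code from the article:

```python
import numpy as np
from math import gamma

def beta_prior_rho(rho, a0):
    """Beta(a0, a0) prior stretched to (-1, 1) and centered on zero;
    a0 near 1 is close to uniform, larger a0 downweights the endpoints."""
    B = gamma(a0) ** 2 / gamma(2.0 * a0)   # Beta function B(a0, a0)
    return ((1 + rho) / 2) ** (a0 - 1) * ((1 - rho) / 2) ** (a0 - 1) / (2 * B)

# Total mass over a grid for the hyperparameter values shown in Fig. 1.
grid = np.linspace(-0.999, 0.999, 2001)
masses = [np.trapz(beta_prior_rho(grid, a0), grid) for a0 in (1.01, 1.1, 2.0)]
```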

Figure 1.  Priors for ρ, λ.

Using Bayes' theorem, the marginal distribution of the SAR model can be written as

p(y|Mi, W) = ∫∫∫∫ p(y|α, β, σ, ρ, Mi, W)π(α)π(βi|σ)π(σ)πr(ρ) dα dβ dσ dρ    (10)
Using the properties of the multivariate normal probability density function (pdf) and the inverted gamma pdf to analytically integrate with respect to β and σ, we can arrive at an expression for the log marginal that will be required for model comparison purposes. We note that, as the intercept term is common to all models, this leads to n−1 as the degrees of freedom in the posterior in the following equation




An important point regarding expression (11) is that we must rely on numerical integration to convert this into a scalar. This is in stark contrast to the results for conventional regression models, where analytical integration over the parameters β and σ leads to an expression that can be used to produce a scalar result. Details regarding computational issues surrounding numerical integration are set forth in Appendix A.
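A standard device for this kind of univariate integration is to subtract the maximum log value before exponentiating, so the integrand does not underflow. The helper below sketches this with a trapezoid rule on a grid; the function name is ours, and the toy Gaussian integrand in the usage stands in for the actual SAR log-marginal:

```python
import numpy as np

def log_integral(log_f, grid):
    """Return log of the integral of exp(log_f) over grid, subtracting
    the maximum log value first so exponentiation cannot underflow.
    In the article's setting, log_f would be the SAR (or SEM)
    log-marginal as a function of rho (or lambda)."""
    log_vals = np.array([log_f(r) for r in grid])
    c = log_vals.max()
    return c + np.log(np.trapz(np.exp(log_vals - c), grid))
```

Because the result is returned on the log scale, ratios of marginal likelihoods for two models reduce to a difference of two such values.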

Log-marginal posterior for the SEM model

For the case of the SEM model, y = αιn + Xβ + u, u = λWu + ɛ, we also rely on a noninformative prior for the intercept term α and constrain this term to appear in all models. Again, the matrix X is transformed to studentized form, but not the vector y. As a prior mean of zero is assigned to the parameters β by the g-prior, it is important that the X matrix is well scaled. We rely on the normal-gamma priors for β, σ2, but note that the matrix X does not contain a vector of ones, and the vector β does not contain an intercept term.

For this model, we define B = In − λW, ỹ = By, and X̃ = BX, noting that X is studentized (or well scaled) to facilitate the prior mean of zero placed on the parameters β. A departure of the SEM from the SAR model is reliance on a g-prior based on the covariance matrix constructed from X̃ = BX, that is, σ2(gX̃′X̃)^(−1). The motivation for this is that the SEM estimates are: β̂ = (X̃′X̃)^(−1)X̃′ỹ, and so following Zellner (1986), we should rely on the associated covariance matrix to form our g-prior. As in the case of the SAR model, this prior greatly simplifies the calculations needed to construct the log-marginal posterior for this model. It is convenient to define


which we use in (12) and the definitions that follow.




We note that when λ=0, the matrix B = In − λW reduces to In, and so our result is identical to that of Fernández, Ley, and Steel (2001a) in this case. As in the case of the SAR model, we must rely on numerical integration over the spatial dependence parameter λ to convert this into a scalar. Details regarding computational issues surrounding numerical integration are set forth in Appendix A.

MC3 spatial model comparison

A large literature on Bayesian model averaging (BMA) over alternative linear regression models containing differing explanatory variables exists (Raftery, Madigan, and Hoeting 1997; Fernández, Ley, and Steel 2001a, b). The MC3 approach introduced in Madigan and York (1995) is set forth here for the SAR and SEM models. For a regression model with k possible explanatory variables, there are 2^k possible ways to select regressors to be included or excluded from the model. For k = 15, say, we have 32,768 possible models, making computation of the log-marginal for all possible models impractical.

The MC3 method of Madigan and York (1995) devises a strategic stochastic Markov chain process that can move through the potentially large model space and sample regions of high posterior support. This eliminates the need to consider all models by constructing a sampler that explores relevant parts of the very large model space. If we let M denote the current model state of the chain, models are proposed using a neighborhood, nbd(M), which consists of the model M itself along with models containing either one variable more (labeled a "birth step") or one variable less (a "death step") than M. A transition matrix, q, is defined by setting q (M → M′)=0 for all M′∉nbd(M) and q (M → M′) constant for all M′∈nbd(M). The proposed model M′ is compared with the current model state M using the acceptance probability shown in the following equation:

α(M, M′) = min[1, p(M′|y)/p(M|y)]    (14)
The use of univariate numerical integration methods described in Appendix A allows us to construct a Metropolis–Hastings sampling scheme that implements the MC3 method. A vector of the log-marginal values for the current model M is stored during sampling along with a vector for the proposed model M′. These are then scaled and integrated to produce the ratio p(M′|y)/p(M|y) in (14) that determines acceptance or rejection of the proposed model. In contrast to conventional regression models, there is a need to store log-marginal density vectors for each unique model found during the MCMC sampling to calculate posterior model probabilities over the set of all unique models visited by the sampler.

Although the use of birth and death processes in the context of Metropolis–Hastings sampling will theoretically produce samples from the correct posterior, Richardson and Green (1997), among others, advocate incorporating a "move step" in addition to the birth and death steps in the algorithm. We rely on this approach as there is evidence that incorporating move steps improves the convergence of the sampling process (Denison, Mallick, and Smith 1998; Richardson and Green 1997). The move step takes the form of replacing a randomly chosen single variable in the current explanatory variables matrix with a randomly chosen variable not currently in the model. Specifically, we might propose a model with one less explanatory variable (death step) and then add an explanatory variable to this new model proposal (birth step). This leaves the resulting model proposal with the same dimension as the original one with a single component altered. This type of sampling process is often labeled "reversible jump" MCMC. The model proposals that result from birth, death, and move steps are all subjected to the Metropolis–Hastings accept/reject decision shown in (14), which is valid as long as birth, death, and move steps are each proposed with equal probability 1/3.
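The sampler just described can be sketched end to end. The code below is a simplified MC3 with birth, death, and move steps; for brevity it uses a BIC-style OLS log-marginal as a stand-in for the article's SAR/SEM log-marginals, and all function names are our own:

```python
import numpy as np

def log_marginal_ols(y, X, cols):
    """BIC-style stand-in for the log-marginal of an OLS model using the
    columns in `cols` plus an intercept; the article's SAR/SEM
    log-marginals (numerically integrated over rho or lambda) would be
    substituted here."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in sorted(cols)])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return -0.5 * n * np.log(resid @ resid / n) - 0.5 * Z.shape[1] * np.log(n)

def mc3(y, X, n_draws=5000, seed=0):
    """MC3 over subsets of the columns of X with birth, death, and move
    steps, each proposed with probability 1/3."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    current = frozenset(rng.choice(k, size=k // 2, replace=False).tolist())
    log_m = {current: log_marginal_ols(y, X, current)}
    for _ in range(n_draws):
        inside = [j for j in range(k) if j in current]
        outside = [j for j in range(k) if j not in current]
        step = rng.integers(3)
        prop = set(current)
        if step == 0 and outside:                 # birth: add one variable
            prop.add(int(rng.choice(outside)))
        elif step == 1 and inside:                # death: drop one variable
            prop.discard(int(rng.choice(inside)))
        elif step == 2 and inside and outside:    # move: swap one variable
            prop.discard(int(rng.choice(inside)))
            prop.add(int(rng.choice(outside)))
        prop = frozenset(prop)
        if prop not in log_m:                     # store each unique model
            log_m[prop] = log_marginal_ols(y, X, prop)
        # Metropolis-Hastings accept/reject with equal prior model probs
        if np.log(rng.uniform()) < log_m[prop] - log_m[current]:
            current = prop
    # posterior model probabilities over all unique models visited
    lm = np.array(list(log_m.values()))
    w = np.exp(lm - lm.max())
    w /= w.sum()
    return dict(zip(log_m.keys(), w))
```

As in the article, the stored log-marginals for all unique models visited are rescaled at the end to produce posterior model probabilities.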

The Bayesian solution to incorporating uncertainty regarding specification of the appropriate explanatory variables into the estimates and inferences is to "average" over alternative model specifications. This is in contrast with much applied work that relies on a single model specification identified using various model comparison criteria that lead to a "most preferred model." The averaging involves weighting alternative model specifications by their posterior model probabilities. We note that the MC3 procedure identifies models associated with particular explanatory variables and assigns a posterior model probability to each of these models. Like all probabilities, the posterior model probabilities sum to unity, and so they can be used as weights to form a linear combination of estimates from models based on differing explanatory variables. These weighted combinations of sampling draws from the posterior are used as the basis for posterior inference regarding the mean and dispersion of the individual parameter estimates.
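Given posterior model probabilities and per-model posterior means, the averaging step is a probability-weighted linear combination. A minimal sketch with hypothetical numbers (the probabilities and coefficients below are invented for illustration):

```python
import numpy as np

# Hypothetical MC3 output: posterior probabilities for three models and
# the corresponding posterior-mean coefficients for candidate variables
# x1..x3 (zero where a model excludes the variable).
probs = np.array([0.50, 0.30, 0.20])
betas = np.array([
    [1.2, -0.8, 0.0],   # model 1 uses x1, x2
    [1.1,  0.0, 0.4],   # model 2 uses x1, x3
    [1.3, -0.7, 0.5],   # model 3 uses x1, x2, x3
])
bma_mean = probs @ betas   # probability-weighted average of estimates
```

In practice the same weighting is applied draw by draw to the MCMC output, so dispersion as well as the mean reflects model uncertainty.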

Applied illustrations

In the first section, we contrast results from least-squares and SAR model MC3 procedures using a data-generated example based on a small data set and a small set of candidate explanatory variables. This allows us to validate our MC3 approach in a setting where the true parameter values and model specification are known. In the second section, we apply the least-squares and SAR model MC3 and BMA methods to a spatial sample of 2,401 origin–destination migration flows from each of the 48 U.S. states plus the District of Columbia. As in most models of origin–destination flows, we have two sets of explanatory variables: one associated with characteristics of the origin regions/states and another with the destination characteristics. This provides an illustration of how the SAR model MC3 and BMA methodology works in applied practice. As the sample data exhibit strong spatial dependence, we would expect differences between the least-squares and spatial results concerning which of the origin–destination characteristics are important in determining population migration flows.

A small sample illustration

If the data-generating process is the SAR model, then y = (In − ρW)^(−1)(αιn + Xβ + ɛ), and least-squares estimates for β are biased and inconsistent (see LeSage and Pace 2004). In these cases, we would expect that least-squares MC3 procedures would not produce accurate estimates and inferences regarding which variables are important. If the data-generating process is the SEM model, then y = αιn + Xβ + u, where u = (In − λW)^(−1)ɛ. Least-squares estimates are unbiased but inefficient (analogous to the case of serial correlation in the disturbances), and so we might expect better results for least-squares MC3 procedures.

To illustrate differences between least-squares and spatial results, we used a 49-location data set from Anselin (1988) containing Columbus, Ohio neighborhoods, along with observations on the median housing values (hvalue) for each neighborhood and household income (hincome). These variable values were used as the basis for a data-generated experiment where the true parameter values as well as the true model specification are known. The spatial weight matrix was based on the four nearest neighbors to each of the spatial observations, and the matrix was row-normalized to have row sums of unity. Two additional explanatory variables were constructed using spatial lags of the house values and household income. These variables represent an average of housing values and household income from the nearest four neighborhoods, constructed through multiplication of these variable vectors by the row-normalized spatial weight matrix W. Intuitively, housing values and household income levels in nearby neighborhoods might contribute to explaining variation in the y variable, which was neighborhood crime rates in the Anselin (1988) application. This produces a model shown in the following equation:
y = ρWy + αιn + β1hincome + β2hvalue + β3Whincome + β4Whvalue + ɛ    (15)
The intercept term and spatial lag are included in all models, and so the number of possible candidate variables is 4, leading to 2^4 = 16 possible models. This makes it simple to validate our MC3 algorithms by comparison with exact results based on posterior model probabilities for the set of 16 models. The explanatory variables were put in studentized form to accommodate the Zellner g-prior used by the MC3 procedures, which relies on a prior mean of zero for the β coefficients.

The true values for α and βi, i=1, …, 4 were set to fixed values, and a standard normal random deviate was used for ɛ in the generating process. The value of ρ was set to 0.6, indicating moderate spatial dependence.

Least-squares estimates as well as maximum likelihood SAR estimates are shown in Table 1. The bias associated with the least-squares estimates appears to be substantial because the two sets of estimates differ widely in magnitude, and the parameter ρ is large and significantly different from zero. The greatest disagreement in the two sets of estimates is with respect to the two spatially lagged explanatory variables, which might be a focus of model comparison and inference. Intuitively, it might be of interest whether housing values and household income levels in nearby neighborhoods contribute to explaining variation in say neighborhood crime rates. We note that the sign of the spatially lagged house value variable is different in the least-squares and SAR regression, and the significance of the spatial lag of household income is different.

Table 1.  OLS and SAR Model Estimates

Variable   Truth   OLS: Coefficient / t statistic / Probability   SAR: Coefficient / t statistic / Probability
ρ          0.6     n/a                                            0.637 / 5.86 / 0.000

Note: OLS, ordinary least squares; SAR, spatial autoregressive model.

As the number of possible models here is 2^4 = 16, it would have been possible to simply calculate the log-marginal posterior for these 16 models to find posterior model probabilities. Instead, we applied our MC3 algorithm to the least-squares and SAR models. A run of 10,000 draws was sufficient to uncover all 16 unique models, requiring 15 and 27 s, respectively, for the OLS and SAR MC3 procedures.2

Information regarding models that exhibited posterior model probabilities >1% is given in Table 2 for the OLS MC3 procedure and Table 3 for the SAR MC3 procedure, with the posterior model probabilities shown in the last row of the two tables. These tables use “1” and “0” indicators for the presence or absence of variables in each of the models presented in the tables.

Table 2.  OLS BMA Model Selection Information

Note: OLS, ordinary least squares; BMA, Bayesian model averaging.

Model probabilities: 0.015, 0.097, 0.097, 0.146, 0.644
Table 3.  SAR BMA Model Selection Information

Note: SAR, spatial autoregressive model; BMA, Bayesian model averaging.

Model probabilities: 0.018, 0.018, 0.051, 0.144, 0.237, 0.506

From the tables, we see that in the case of OLS, the true model was not among the five models with posterior model probabilities >1%. (It was assigned a model probability of 0.1%.) In contrast, the SAR MC3 procedure identified the true model and assigned it a posterior model probability >50%. This result should not be surprising given that the least-squares estimator is biased and inconsistent in the presence of spatial dependence. The SAR procedure resulted in the two variables, income and housing values used to generate the sample y vector, appearing in the four highest probability models that together account for nearly 94% of the posterior probability mass. In contrast, two of the top five least-squares models did not contain both of these variables, and these two models accounted for nearly 20% of the posterior probability mass.

We conclude this discussion by noting that the small sample example based on only 4 candidate variables suggests that the use of least-squares-based MC3 procedures in the presence of spatial dependence will exert an impact on the model selection inferences. Presumably, this impact is an adverse one as least-squares estimates are either biased and inconsistent for the case of the spatial lag model (SAR), or inefficient in the case of the SEM.

A large sample illustration

An example that models variation in state-to-state migration patterns using origin-to-destination flows provides a large sample of 2,401 observations. Starting with an n by n square matrix of interregional flows from each of the n = 49 origin states (including the District of Columbia) to each of the n destination states, we can produce an n² by 1 vector of these flows by stacking the columns of the flow matrix into a variable vector that we designate as y. Without loss of generality, we assume that each of the n columns of the flow matrix represents a different origin and the n rows reflect destinations. The first n elements in the stacked vector y reflect flows from origin 1 to all n destinations. The last n rows of this vector represent flows from origin n to destinations 1 to n. Typically, the diagonal elements of a flow matrix containing flows within a region, for example, from origin 1 to destination 1, origin 2 to destination 2, and so on, will be large relative to the off-diagonal elements representing interregional flows. We set these elements to values of zero to focus our efforts on explaining variation in flows between states. In addition, the vector y was constructed to represent the difference between (logged) state-to-state migration flows during the 1995–2000 period and the flows during the 1985–1990 period. This transformation converts the dependent variable into growth rates of flows over the 10-year period from 1990 to 2000, which produced a relatively normally distributed vector of 49² = 2,401 observations.
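The stacking convention can be sketched directly; the code below assumes the column-as-origin layout described above, with random numbers standing in for the actual migration flows:

```python
import numpy as np

n = 49
rng = np.random.default_rng(1)
# Flow matrix with columns as origins and rows as destinations:
# F[d, o] is the flow from origin o to destination d.
F = rng.uniform(size=(n, n))
np.fill_diagonal(F, 0.0)       # zero out within-state flows

# Stacking the columns: the first n elements of y are flows from
# origin 1 to all n destinations, the last n from origin n.
y = F.flatten(order="F")
```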

A conventional gravity or spatial interaction model based on a nonspatial regression might be used to explain variation in the vector y of origin–destination flows (see Fischer, Scherngell, and Jansenberger 2006; Sen and Smith 1995; Tiefelsdorf 2003). Here, we contrast the nonspatial regression to a spatial regression model of the type suggested by LeSage and Pace (2005). These regression models rely on an n by k matrix of explanatory variables that we label xd, containing k characteristics for each of the n destinations. Given the format of our vector y, where observations 1 to n reflect flows from origin 1 to all n destinations, this matrix would be repeated n times to produce an n² by k matrix of destination characteristics that we represent as Xd for use in the regression. A second matrix containing origin characteristics that we label Xo would be constructed for use in the gravity model. This matrix would repeat the characteristics of the first origin n times to form the first n rows of Xo, the characteristics of the second origin n times for the next n rows of Xo, and so on, resulting in an n² by k matrix of origin characteristics. Typically, the distance from each origin to destination is also included as an explanatory variable vector in the gravity model, and perhaps nonlinear terms such as distance-squared. We let D represent an n² by 1 vector of these distances from each origin to each destination formed by stacking the columns of the origin–destination distance matrix into a variable vector.3
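Under the stacking convention above, both matrices can be built with Kronecker products. A sketch with hypothetical state characteristics:

```python
import numpy as np

n, k = 49, 3
rng = np.random.default_rng(2)
x = rng.standard_normal((n, k))    # one row of characteristics per state

# Destination characteristics: the whole n-by-k block repeats n times.
Xd = np.kron(np.ones((n, 1)), x)   # n^2 by k
# Origin characteristics: each state's row repeats n times in turn.
Xo = np.kron(x, np.ones((n, 1)))   # n^2 by k
```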

In contrast to the traditional nonspatial gravity model, LeSage and Pace (2005) note that a spatial econometric model of the variation in origin–destination flows would be characterized by: (1) reliance on spatial lags of the dependent variable vector, resulting in a SAR model, or (2) spatial lags of the disturbance terms, producing a SEM. Spatial weight matrices represent a convenient and parsimonious way to define the spatial dependence or connectivity relations among observations.

For our applied illustration we use ideas from LeSage and Pace (2005) and rely on a spatial weight matrix consisting of the Kronecker product $W \otimes W$, where $W$ is an $n$ by $n$ spatial weight matrix based on contiguity of the $n$ regions. The $n^2$ by $n^2$ matrix $W \otimes W$ is normalized to have row sums of unity and captures spatial dependence among the origin–destination flows by creating a spatial lag vector $(W \otimes W)y$ that averages over neighbors to both the origin and destination regions. This leads to the SAR model:

$$y = \rho (W \otimes W) y + \alpha \iota_{n^2} + X_d \beta_d + X_o \beta_o + \gamma_1 D + \gamma_2 D^2 + \varepsilon \qquad (16)$$

In (16), the explanatory variable matrices $X_d$, $X_o$ represent $n^2$ by $k$ matrices containing destination and origin characteristics, respectively, and the associated $k$ by $1$ parameter vectors are $\beta_d$ and $\beta_o$. The vectors $D$ and $D^2$ denote the vectorized origin–destination distance matrix and its square, with associated scalar parameters $\gamma_1$ and $\gamma_2$. As is conventional in SAR models, we assume $\varepsilon \sim N(0, \sigma^2 I_{n^2})$.
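As a rough illustration (not the authors' MATLAB code), and assuming, as in LeSage and Pace (2005), that the origin–destination weight matrix is the Kronecker product W ⊗ W, the row-normalized weight matrix and the spatial lag it produces might be built with SciPy sparse matrices:

```python
import numpy as np
from scipy.sparse import csr_matrix, kron, diags

# Toy contiguity matrix W for n = 3 regions on a line (1-2 and 2-3 are neighbors).
W = csr_matrix(np.array([[0., 1., 0.],
                         [1., 0., 1.],
                         [0., 1., 0.]]))
n = W.shape[0]

# Kronecker product: the n^2 x n^2 weight matrix over origin-destination pairs.
Ww = kron(W, W, format='csr')

# Normalize to row sums of unity (guarding against rows with no neighbors).
rs = np.asarray(Ww.sum(axis=1)).ravel()
rs[rs == 0] = 1.0
Ww = diags(1.0 / rs) @ Ww

y = np.arange(n * n, dtype=float)   # hypothetical OD flow vector
Wy = Ww @ y                          # spatial lag: averages flows among neighbors
                                     # of both the origin and the destination
print(np.asarray(Ww.sum(axis=1)).ravel())
```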

As candidate explanatory variables, a series of 19 destination- and origin-specific variables plus distance and distance-squared were used, for a total of 40 explanatory variables. These variables were transformed using logs, so all coefficients should be interpretable as reflecting the elasticity response of the growth rate in state-to-state migration over the period 1990–2000 to changes in each variable. This scaling should help accommodate the prior mean of zero employed in the Zellner g-prior. The distance and distance-squared vectors were also logged. A description of the candidate explanatory variables, along with the variable names used to report estimation results, appears in Table 4.

Table 4.   Socioeconomic Demographic Variables
Variable name        Description
Area                 The origin–destination state area in square miles
Males                The number of males in 1990
Females              The number of females in 1990
Pcincome             Per capita income in 1990
Young                The # of persons aged 22–29 in 1990
Near retirement      The # of persons aged 60–64 in 1990
Retired              The # of persons aged 65–74 in 1990
Born in state        The # of persons born in the state in 1990
Foreign born         The # of foreign born persons in the state in 1990
Recent immigrants    The # of recent foreign immigrants, during the years 1980–1990
College grads        The # of persons over age 25 with college degrees in 1990
Grad/Profession      The # of persons over age 25 with graduate/professional degrees in 1990
House value          The median house value in 1990
Travel time          Mean travel time to work in 1990
Unemployment         The unemployment rate in 1990
Labor force          The # of persons over 16 years employed in 1990
Median rent          The median rent in 1990
Retirement income    Median retirement income in 1990
Self-employed        The # of persons self-employed in 1990
Distance             The origin–destination state distance
Distance2            The distance variable squared

Runs involving 250,000 draws were used to test for convergence of both the OLS and SAR MC3 results. Two sets of MCMC draws were carried out using randomly selected starting sets of variables in the explanatory variables matrix, resulting in model-averaged estimates that were identical to three decimal places for all parameters, and in most cases identical to four decimal places.

For the case of least-squares MC3, the sampling run of 250,000 draws produced 49,246 unique models. The 10 models with the highest posterior model probabilities are shown in Table 5, which takes the same format as Table 2. Variable names are preceded with the symbol "D-" or "O-" to indicate destination and origin characteristics. The top 10 models accounted for only 44.30% of the probability mass, with 95 models having posterior model probabilities >0.1%, accounting for 83.02% of the probability mass; 473 models exhibiting model probabilities >0.01%, totaling 95.51%; 1539 models with probabilities >0.001%, totaling 98.98%; and 4016 models with probabilities >0.0001%, accounting for 99.83% of the probability mass.
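Tabulations of this kind, counting the models whose posterior probability exceeds a threshold and the mass they account for, are straightforward to compute; a sketch using a hypothetical probability vector in place of actual MC3 output:

```python
import numpy as np

# Hypothetical posterior model probabilities from an MC3 run; a Dirichlet
# draw with small concentration mimics a few dominant models.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.full(500, 0.05))

def mass_above(probs, threshold):
    """Number of models with posterior probability above `threshold`,
    and the share of total probability mass those models account for."""
    sel = probs > threshold
    return int(sel.sum()), float(probs[sel].sum())

top10 = np.sort(probs)[::-1][:10].sum()
print(f"top 10 mass: {top10:.3f}")
for t in (1e-3, 1e-4):
    count, mass = mass_above(probs, t)
    print(f"> {t:g}: {count} models, {mass:.3f} of mass")
```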

Table 5.   Variables Entering the Top 10 Least-Squares Models
Variable name/model     10  9  8  7  6  5  4  3  2  1
D-Near retirement        1  1  1  0  1  1  1  1  1  1
D-Born in state          1  1  1  1  1  1  1  1  1  1
D-Foreign born           0  0  0  0  0  0  0  0  0  0
D-Recent immigrants      0  1  1  1  0  0  0  1  0  0
D-College grads          0  0  0  0  0  0  0  0  0  0
D-House value            0  0  0  0  0  0  0  0  0  0
D-Travel time            0  0  0  0  0  0  0  0  0  0
D-Labor force            1  1  1  1  1  1  1  1  1  1
D-Median rent            1  1  1  1  1  1  1  1  1  1
D-Retirement income      1  1  1  1  1  1  1  1  1  1
O-Near retirement        1  1  1  1  1  1  1  1  1  1
O-Born in state          0  0  0  0  0  0  0  0  0  0
O-Foreign born           0  0  0  0  0  0  0  0  0  0
O-Recent immigrants      1  1  1  1  1  1  1  1  1  1
O-College grads          0  0  1  0  0  1  0  0  1  0
O-House value            1  1  1  1  1  1  1  1  1  1
O-Travel time            1  1  1  1  1  1  1  1  1  1
O-Labor force            0  0  0  0  0  0  0  0  0  0
O-Median rent            0  0  0  0  0  0  0  0  0  0
O-Retirement income      1  1  1  0  1  1  0  0  1  0
Model probabilities  0.020 0.020 0.023 0.026 0.027 0.038 0.059 0.069 0.075 0.086

To illustrate convergence of the MC3 sampling process, we note that results from a second run of 250,000 draws based on a random selection of starting variables produced only 47,699 unique models, but the top 10 model probabilities were identical to those reported in Table 5. In addition, there were 96 models having posterior model probabilities >0.1%, accounting for 83.16% of the probability mass; 473 models exhibiting model probabilities >0.01%, accounting for 95.50%; 1550 models with probability >0.001%, accounting for 98.97%; and 4039 models with probability >0.0001%, accounting for 99.83% of the probability mass. This suggests that the BMA procedure is finding regions of the large model space with posterior support and ignoring regions with low support.

Table 5 shows the variables appearing in the 10 highest posterior probability models. Variables that appear in each model are designated with a "1" and those that do not appear with a "0." We find that 13 of the 38 origin–destination variables appear in all of the 10 highest probability models, and these variables are associated with posterior probabilities of inclusion >65% (probabilities of inclusion are shown in Table 7).4 One variable, "D-Near retirement," appeared in nine of the 10 highest probability models. After this, the next most frequently included variables appear in only six of the 10 models reported in the table, with probabilities of inclusion falling below 50%.
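Posterior inclusion probabilities of this kind are obtained by summing the posterior probabilities of every model that contains a given variable (Fernández, Ley, and Steel 2001a, b). A toy sketch with hypothetical indicator rows and model probabilities:

```python
import numpy as np

# Hypothetical MC3 output: each row flags which of 4 candidate variables
# entered a model, alongside that model's posterior probability.
models = np.array([[1, 1, 0, 0],
                   [1, 0, 1, 0],
                   [1, 1, 1, 0]])
post = np.array([0.5, 0.3, 0.2])

# Inclusion probability of variable j = summed posterior probability
# of all models in which variable j appears.
incl = models.T @ post
print(incl)
```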

Table 7.   Posterior Probabilities for Variables Entering the Model
Variable name          OLS MC3   SAR MC3   Difference
  1. OLS, ordinary least squares; SAR, spatial autoregressive model.

D-Near retirement       0.5013    0.0224    0.4789
D-Born in state         0.9406    0.0231    0.9175
D-Foreign born          0.0499    0.0159    0.0340
D-Recent immigrants     0.4531    0.0176    0.4355
D-College grads         0.0662    0.0205    0.0457
D-House value           0.0838    0.0246    0.0592
D-Travel time           0.0730    0.0990   −0.0260
D-Labor force           0.9382    0.4517    0.4865
D-Median rent           0.7327    0.0213    0.7114
D-Retirement income     0.7777    0.0228    0.7549
O-Near retirement       0.8562    0.9089   −0.0527
O-Born in state         0.1198    0.0221    0.0977
O-Foreign born          0.0766    0.0211    0.0555
O-Recent immigrants     0.8555    0.8791   −0.0236
O-College grads         0.2079    0.0233    0.1846
O-House value           0.9403    0.8907    0.0496
O-Travel time           0.6508    0.0214    0.6294
O-Labor force           0.1164    0.0224    0.0940
O-Median rent           0.2189    0.0218    0.1971
O-Retirement income     0.3633    0.0214    0.3419

A focus of inference for gravity models would be the relative importance of origin versus destination characteristics in explaining variation in state-to-state population migration. Table 5 shows that five of the 13 variables that appear in all of the 10 highest probability models are destination characteristics, as is the one variable that appeared in nine of the 10 top models. This leaves eight of the 13 variables appearing in all 10 top models as origin characteristics, suggesting a slight edge for the case of "origin push" as opposed to "destination pull." It will be of interest to contrast this result with those from the SAR model.

For the SAR MC3 procedure, 250,000 draws produced only 5220 unique models, a substantially lower number than in the case of least squares. The top 10 models are reported in Table 6, accounting for 76.40% of the total probability mass. Also, in contrast with the least-squares results, we found that 37 models with posterior probabilities >0.1% accounted for 96.11% of the probability mass, and 120 models with probabilities >0.01% accounted for 99.01% of the total probability mass. In summary, the set of high posterior probability models resulting from the spatial model was much more compact than that from least squares. As in the case of least squares, a second run of 250,000 draws produced nearly identical results.

Table 6.   Variables Entering the Top 10 Spatial Autoregressive Models
Variable name/model     10  9  8  7  6  5  4  3  2  1
D-Near retirement        1  0  0  0  0  0  0  0  0  0
D-Born in state          0  0  1  0  0  1  0  1  0  0
D-Foreign born           0  0  0  0  0  0  0  0  0  0
D-Recent immigrants      0  0  0  0  0  0  0  0  0  0
D-College grads          0  0  0  0  0  0  0  0  0  0
D-House value            0  0  0  0  0  0  0  0  0  0
D-Travel time            0  1  0  1  1  0  0  1  1  1
D-Labor force            0  0  1  0  0  1  0  0  0  0
D-Median rent            0  0  0  0  0  0  0  0  0  0
D-Retirement income      0  0  0  0  0  0  0  0  0  0
O-Near retirement        1  1  1  1  1  1  1  1  1  1
O-Born in state          0  0  0  0  0  0  0  0  0  0
O-Foreign born           0  0  0  0  0  0  0  0  0  0
O-Recent immigrants      1  1  1  1  1  1  1  1  1  1
O-College grads          0  0  0  0  0  0  0  0  0  1
O-House value            1  1  1  1  1  1  1  1  1  1
O-Travel time            0  0  0  0  0  0  0  0  0  0
O-Labor force            0  0  0  0  0  0  0  0  0  0
O-Median rent            0  0  0  0  0  0  0  0  0  0
O-Retirement income      0  0  0  0  0  0  0  0  0  0
Model probabilities  0.023 0.025 0.026 0.029 0.035 0.040 0.045 0.063 0.112 0.366

The SAR model MC3 results are presented in Table 6 in the same format as the least-squares results. In contrast with the least-squares results, only six variables enter all 10 top models, and these are all origin characteristics. The probability of variable inclusion (shown in Table 7) for these six variables was 0.87 or higher. One destination characteristic, "D-Grad/Profession," entered eight of the 10 top models but exhibited a probability of inclusion <50%. The conclusion we would draw here regarding the importance of origin versus destination characteristics is quite different from that reported earlier for least squares. It is interesting that the six origin characteristics that appear in all 10 top SAR models also appear in all 10 top least-squares models. The SAR results appear to exclude a great number of variables from the model relative to least squares. There is a theoretical motivation for this type of result. Ignoring the intercept term, we note that least-squares estimates are likely to be biased upward in the face of positive spatial dependence (ρ > 0) when the matrix X contains logs of positive values, as shown in the following equation:

$$E(\hat{\beta}_{OLS}) = (X'X)^{-1}X'\left(I_{n^2} - \rho(W \otimes W)\right)^{-1}X\beta = \beta + \rho(X'X)^{-1}X'(W \otimes W)X\beta + \cdots$$
Posterior probabilities for each of the origin–destination and distance variables entering the model are shown in Table 7 for both the least-squares and SAR MC3 procedures. The last column in the table shows the difference between the OLS and SAR model results. We see 6 of 40 cases where the differences are >0.50, and another 8 of 40 cases where these differences are between 0.29 and 0.48, pointing to a number of cases where the inclusion probabilities from the least-squares procedure are higher than those from the spatial model. In contrast, we see seven negative differences, with one equal to −0.4189 and the remainder smaller than 0.10 in magnitude, reflecting a small number of cases with higher variable inclusion rates for the SAR model. The average number of variables appearing in the top 10 least-squares models was 17.8, versus 9.7 for the SAR models.

Estimates for the variables would allow an examination of the elasticity response and direction of impact on population migration associated with the variables that exhibit high probabilities of inclusion. Table 8 reports model-averaged SAR estimates based on averaging over the 120 models with posterior probabilities >0.01%, and OLS estimates based on averaging over the 473 models exhibiting model probabilities >0.01%.5 Bayesian MCMC estimates for the OLS and SAR models were implemented using Zellner's g-prior, diffuse priors for the intercept and noise variance, and the Beta(1.01, 1.01) prior for ρ, producing 2000 retained draws. These draws were weighted by the posterior model probabilities and used to construct a posterior mean as well as 5% and 95% highest posterior density intervals (HPDI) that are reported in Table 8.
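The reweighting described in footnote 5 amounts to normalizing the retained models' posterior probabilities to sum to unity and forming a probability-weighted average of the per-model estimates; a minimal sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical posterior means of two coefficients from 3 retained models,
# with their (unnormalized) posterior model probabilities.
betas = np.array([[0.10, 0.00],
                  [0.12, 0.05],
                  [0.00, 0.04]])
probs = np.array([0.006, 0.003, 0.001])

# Reweight so the retained models' probabilities sum to unity ...
w = probs / probs.sum()
# ... then the model-averaged estimate is the probability-weighted mean.
beta_bma = w @ betas
print(beta_bma)
```

In practice the same weights would be applied to the full set of retained MCMC draws, so that HPDI limits are also model-averaged.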

Table 8.   Model Averaged Estimates
Variable name           SAR estimates                     OLS estimates
                        5% HPDI     Mean    95% HPDI      5% HPDI     Mean    95% HPDI
  1. OLS, ordinary least squares; SAR, spatial autoregressive model; HPDI, highest posterior density intervals.

D-Near retirement       −0.0090   −0.0065   −0.0042      −0.7743   −0.7439   −0.7119
D-Born in state         −0.0095   −0.0075   −0.0054      −0.3748   −0.3611   −0.3475
D-Foreign born           0.0000    0.0000    0.0000       0.0000    0.0000    0.0000
D-Recent immigrants      0.0000    0.0000    0.0000       0.1013    0.1124    0.1231
D-College grads         −0.0004   −0.0001    0.0001       0.0003    0.0005    0.0006
D-House value            0.0045    0.0061    0.0077       0.0014    0.0017    0.0021
D-Travel time            0.1462    0.1744    0.2018       0.0022    0.0027    0.0033
D-Labor force           −0.0331   −0.0301   −0.0270      −0.9079   −0.8786   −0.8505
D-Median rent            0.0000    0.0000    0.0000      −0.4110   −0.3890   −0.3671
D-Retirement income     −0.0005   −0.0004   −0.0003      −0.2321   −0.2204   −0.2084
O-Near retirement       −1.0307   −0.9335   −0.8415      −1.7331   −1.6795   −1.6257
O-Born in state          0.0000    0.0000    0.0000       0.0000    0.0000    0.0000
O-Foreign born          −0.0002    0.0000    0.0003       0.0000    0.0000    0.0000
O-Recent immigrants     −0.3757   −0.3348   −0.2930      −0.5383   −0.5215   −0.5047
O-College grads          0.0230    0.0562    0.0919       0.0495    0.0598    0.0700
O-House value            0.6192    0.6779    0.7379       1.0702    1.0993    1.1294
O-Travel time            0.0000    0.0000    0.0000      −0.1930   −0.1776   −0.1620
O-Labor force            0.0000    0.0000    0.0000       0.0000    0.0000    0.0000
O-Median rent           −0.0009    0.0003    0.0016       0.0122    0.0150    0.0175
O-Retirement income      0.0000    0.0000    0.0000      −0.0999   −0.0915   −0.0834

For interpretation purposes, the mean growth rate in interstate migration over the 10-year period was 0.0465, or less than one-half percent per year, with the 5th percentile being −0.4731, or around negative 5% per year, and the 95th percentile being 0.5868, or about 6% per year. An estimate for β of 0.10 suggests that a 10% change in the explanatory variable would give rise to a 1% change in the growth rate over the 10-year period, and we focus on coefficient estimates that are >0.10 in absolute value when analyzing the model-averaged estimates.

For the case of the SAR model estimates, there were five origin characteristics that exerted impacts greater than 0.10 (in absolute value) on migration flows: O-Unemployment, O-Retired, O-Near retirement, O-House value, and O-Recent immigrants. Of these, the unemployment rate, retired persons, and house value were positive, while persons near retirement and recent immigrants were negative. The origin unemployment rate had the largest impact on out-migration flows, with the estimate suggesting that a state with a 1% higher unemployment rate would exhibit a 2% higher growth rate in out-migration over the 10-year period. Retired persons and those near retirement age had the second largest impact, both exhibiting elasticities around unity, but with opposite signs. Intuitively, these signs seem correct. Higher house values in the origin state lead to higher out-migration flows, and a larger number of recent immigrants from abroad leads to lower out-migration, both of which seem intuitively plausible. There were only three destination characteristics that exerted an impact >0.10 on migration growth rates, D-Unemployment, D-Graduate/Professional, and D-Travel time, all <0.5639 in magnitude. The travel time and unemployment rates had a positive impact on in-migration flows to the destination, which seems implausible, whereas persons with graduate/professional degrees had a negative impact, a plausible result.

Taken together, the SAR model-averaged estimates suggest that origin characteristics are relatively more important than destination characteristics, leading to a “push” interpretation of migration flows.

The OLS model-averaged estimates were different in that they pointed to nine destination characteristics having coefficients >0.10 (in absolute value) and seven origin characteristics meeting this criterion. As in the case of the SAR model-averaged estimates, O-Unemployment had the largest impact, with a coefficient of 2.4341. As indicated earlier, this is likely to exhibit an upward bias relative to the SAR model estimate. Similar upward biases can be found in the OLS estimates for O-Near retirement, O-Retired, and O-House value. The second largest impact was D-Females, which, taken together with the large negative impact for D-Males, suggests a destination-state population size effect, as males plus females equal population. Interestingly, in the SAR model, which contains a spatial lag variable, this population size effect is not noticeable.

In total, the OLS model-averaged estimates suggest a different inference regarding the relative importance of origin versus destination characteristics on migration flows than that described above for the SAR model. We note that the model-averaged posterior mean estimate for the parameter ρ was 0.6077, with 5% and 95% HPDI limits of 0.5900 and 0.6252, respectively, suggesting a bias in the least-squares results.


It is often the case that sample data exhibit spatial dependence requiring use of spatial regression models of the type considered here. Spatial dependence can lead to least-squares estimates that are biased and inconsistent, and at best least-squares will produce inefficient estimates. This invalidates use of conventional least-squares MC3 and BMA techniques. We have developed the theory as well as computationally efficient algorithms for implementing MC3 and BMA techniques for two important spatial regression models. Expressions for the log-marginal posterior based on Zellner's g-prior were developed and found to be considerably more complex than in the case of least squares. Implementation of MC3 techniques for the spatial regression models requires repeated evaluation of the log-marginal posterior in the context of Metropolis–Hastings sampling. Numerical integration over the spatial dependence parameter as well as calculation of the log-determinant of an n by n matrix are required to evaluate the log-marginal posterior. We provide computational details for efficient algorithms that allow this to be done in a rapid fashion, making spatial regression model MC3 competitive with conventional least-squares MC3 algorithms. A set of public domain MATLAB functions that implement the methods described here is available for public use.

An applied illustration using a SAR approach to model state-to-state population migration flows over the period 1990–2000 demonstrated that different estimates and inferences result from using an OLS and SAR MC3 methodology in conjunction with model averaging. The results differed regarding which variables were important in explaining variation in population migration flows as well as the model-averaged estimates of the relative impact of the various variables.


This work was accomplished while LeSage was a visiting scholar at CREUSET, Université de Saint-Etienne, and Parent was a doctoral student at CREUSET, Université de Saint-Etienne, Saint-Etienne, France.


Appendix A

There are four separate terms involved in the univariate integration problem over the range of support for the parameter ρ in the SAR model. In the context of the MC3 procedure described in the third section, there is a need for a computationally fast scheme for this univariate integration, which must be carried out on every pass through the MCMC sampler and thus occurs thousands of times.

The four terms in (11) for the SAR model that vary with ρ are shown in (A1) as T1, T2, T3, and T4.


A log transformation can be applied to all terms T1, …, T4, allowing us to rely on computationally fast methods presented in Pace and Barry (1997) and Barry and Pace (1999) to compute the log-determinant in T1, that is, $\ln(\det(I_n - \rho W))$. One of the earlier computationally efficient approaches to solving for estimates in a model involving a sample of 3107 observations was proposed by Pace and Barry (1997). They suggested using direct sparse matrix algorithms such as the Cholesky or LU decompositions to compute the log-determinant. In addition to storage savings, sparse matrices result in lower operation counts as well, speeding computations.
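A sketch of the sparse-LU log-determinant computation, using Python/SciPy as a stand-in for the paper's MATLAB implementation (the toy W is hypothetical): the determinant of a triangular factor is the product of its diagonal entries, so the log-determinant is a sum of logs.

```python
import numpy as np
from scipy.sparse import identity, csc_matrix
from scipy.sparse.linalg import splu

# Toy row-stochastic W; in applications W is large and very sparse.
W = csc_matrix(np.array([[0.0, 1.0, 0.0],
                         [0.5, 0.0, 0.5],
                         [0.0, 1.0, 0.0]]))
n = W.shape[0]
rho = 0.5

# ln det(I - rho*W) via a sparse LU factorization.
A = (identity(n, format='csc') - rho * W).tocsc()
lu = splu(A)
# L has a unit diagonal, so its contribution is zero; we include it for clarity.
# For rho inside the valid interval, det(I - rho*W) > 0, so |.| is safe despite
# the row/column permutations splu may apply.
logdet = np.sum(np.log(np.abs(lu.U.diagonal()))) + np.sum(np.log(np.abs(lu.L.diagonal())))
print(logdet)
```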

Pace and Barry (1997) also suggest a vectorization of the terms in T1 and T2 of our problem that they found useful for maximum likelihood estimation of the SAR model. This involves constructing log-determinant values over a grid of q values of ρ, which is central to our task of integration for the terms T1(ρ) and T2(ρ). In the SAR and SEM models, support for ρ must lie in the interval $(1/\lambda_{\min}, 1/\lambda_{\max})$, where $\lambda_{\min}$ and $\lambda_{\max}$ denote the minimum and maximum eigenvalues of the spatial weight matrix W, but typical applied work simply relies on a restriction of ρ to the (−1, 1) or (0, 1) interval to avoid the need to compute eigenvalues.

Turning attention to the term T2(ρ), Pace and Barry (1997) write the sum-of-squared-errors term $e(\rho)'e(\rho)$ as a vector in q values of ρ. For our problem, letting $e_0$ and $e_d$ denote the least-squares residuals from regressing $y$ and its spatial lag $Wy$ on $X$, we have the expression shown in the following equation:

$$e(\rho)'e(\rho) = e_0'e_0 - 2\rho\, e_0'e_d + \rho^2\, e_d'e_d$$

Because the three scalar moments $e_0'e_0$, $e_0'e_d$, and $e_d'e_d$ need only be computed once, this expression can be evaluated over the entire grid of q values for ρ at negligible cost.
The term T3 can be vectorized using a loop over the ρi values to evaluate its quadratic form at each point in the grid. This presents the most problems for a matrix programming language such as MATLAB, as an explicit loop must be used to evaluate the quadratic form in T3 for each value of ρi.
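Assuming the standard Pace–Barry decomposition of the sum-of-squared-errors term into moments that are quadratic in ρ, the grid evaluation of T2 (which, unlike T3, needs no loop) can be sketched as follows; the toy data and a circular contiguity W are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.standard_normal((n, k))
W = np.zeros((n, n))                       # toy circular contiguity, row-normalized
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
y = rng.standard_normal(n)

# Residuals from regressing y and Wy on X, computed once ...
b0 = np.linalg.lstsq(X, y, rcond=None)[0]
bd = np.linalg.lstsq(X, W @ y, rcond=None)[0]
e0, ed = y - X @ b0, W @ y - X @ bd

# ... give the SSE as a quadratic in rho, evaluated cheaply on the whole grid.
rho_grid = np.linspace(-0.99, 0.99, 199)
S = e0 @ e0 - 2 * rho_grid * (e0 @ ed) + rho_grid**2 * (ed @ ed)

# Check one grid point against the brute-force residual sum of squares.
rho = rho_grid[100]
bhat = np.linalg.lstsq(X, y - rho * (W @ y), rcond=None)[0]
e = y - rho * (W @ y) - X @ bhat
print(np.allclose(S[100], e @ e))
```

The equality holds because least-squares projection is linear: the residual of y − ρWy on X is e0 − ρed.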

Finally, the term T4(ρ) representing the prior on the parameter ρ is simple to compute over a grid of q values for ρ and transform to logs.

One important point to note is that we need not estimate the parameters η=(α, β, σ, ρ) of the model to carry out numerical integration leading to posterior model probabilities. Intuitively, we have analytically integrated the parameters α, β, and σ out of the problem, leaving only a univariate integral in ρ. Given any sample data y, X along with a spatial weight matrix W, we can rely on the Pace and Barry (1997) vectorization scheme applied to our task. This involves evaluating the log-marginal density terms T1, …, T4 over a fine grid of q values for ρ ranging over the interval $(1/\lambda_{\min}, 1/\lambda_{\max})$, or [−1, 1]. Given a matrix of vectorized log-marginal posteriors, integration can be accomplished using Simpson's rule.
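Assuming log-marginal values stored over a grid of ρ, the Simpson's rule step might look like the following sketch, which uses a Gaussian stand-in whose integral is known analytically in place of an actual log-marginal:

```python
import numpy as np
from scipy.integrate import simpson

# Hypothetical vectorized log-marginal over a fine grid of rho values; a
# Gaussian stand-in (sigma = 0.1) whose normalizer is sqrt(2*pi)*0.1.
rho = np.linspace(-1.0, 1.0, 2001)
log_marginal = -0.5 * (rho / 0.1) ** 2          # peaks at rho = 0

# Integrate the antilog with Simpson's rule over the grid.
Z = simpson(np.exp(log_marginal), x=rho)
print(Z, np.sqrt(2 * np.pi) * 0.1)              # the two should agree closely
```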

Further computational savings can be achieved by noting that the grid can be rough, say based on 0.01 increments in ρ, which speeds the direct sparse matrix computations of Pace and Barry (1997) or Barry and Pace (1999). Spline interpolation can then be used to produce a much finer grid very quickly, as the log-determinant is typically quite well behaved for reasonably large spatial samples in excess of 250 observations.
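A sketch of this coarse-grid-plus-spline strategy, using a smooth stand-in function in place of the true log-determinant (which would come from the expensive sparse factorizations):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Suppose log-determinants were computed on a rough 0.01 grid (the costly step);
# here a smooth, well-behaved stand-in plays the role of ln det(I - rho*W).
rho_coarse = np.linspace(-0.99, 0.99, 199)
logdet_coarse = np.log1p(-0.5 * rho_coarse**2)

# Spline-interpolate onto a much finer grid (the cheap step).
spline = CubicSpline(rho_coarse, logdet_coarse)
rho_fine = np.linspace(-0.98, 0.98, 100001)
logdet_fine = spline(rho_fine)

# Interpolation error is tiny for smooth log-determinants.
err = np.max(np.abs(logdet_fine - np.log1p(-0.5 * rho_fine**2)))
print(err)
```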

Another important point concerns the scaling necessary to carry out the numerical integration, which is performed on the antilog of the log-marginal posterior density. Our approach allows one to evaluate log-marginal posteriors for each model under consideration and store these as vectors ranging over the grid of ρ values. Scaling can then be accomplished simply by finding the maximum of these vectors placed as columns in a matrix, that is, the maximum over all columns of the matrix. This maximum is then subtracted from all elements in the matrix of log-marginals, producing a value of zero as the largest element, so that the largest antilog is unity. This approach to scaling provides an elegant and universal solution that requires no user intervention and works for all problems.
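The max-subtraction scaling can be illustrated with hypothetical log-marginal columns small enough that direct exponentiation would underflow to zero:

```python
import numpy as np

# Columns are vectorized log-marginals for three hypothetical models over a
# rho grid; exp(-800) underflows to 0.0 in double precision.
logm = np.array([[-800.2, -800.8, -803.5],
                 [-800.0, -800.6, -803.1],
                 [-800.4, -801.0, -803.8]])

# Subtracting the global maximum makes the largest element zero, so the
# largest antilog is exactly one; relative model probabilities are unchanged
# because the shift multiplies every model's integral by the same constant.
scaled = np.exp(logm - logm.max())
post = scaled.sum(axis=0)          # stand-in for the numerical integration step
post /= post.sum()
print(post)
```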

Numerical integration over λ for the SEM model can use the same vectorization for the terms T1 and T4 as in the case of the SAR model. For this model, we need to construct the transformed vector $(I_n - \lambda W)y$ and transformed matrix $(I_n - \lambda W)X$, which are used to calculate a vectorized version of the sum-of-squared-errors term.

The term T3 expression for the SEM model, which contains an intercept term, is a quadratic form in these transformed variables that can be vectorized over the grid of λi values.


  1. There are also other models in the family elaborated in Anselin (1988) that are possible choices.

  2. MATLAB version 7 software was used in conjunction with a Pentium III M laptop computer.

  3. The diagonal elements of the distance matrix containing distances from origin 1 to destination 1, origin 2 to destination 2, and so on, will be zero.

  4. Fernández, Ley, and Steel (2001a, b) provide details on calculations of probabilities for inclusion of individual variables in the models.

  5. It is common practice to reweight the posterior model probabilities so that they sum to unity and use these normalized weights to produce model-averaged estimates. This was done here.