This article introduces measures to quantify spatial autocorrelation for vectors. In contrast to scalar variables, spatial autocorrelation for vectors involves an assessment of both direction and magnitude in space. Extending conventional approaches, measures of global and local spatial associations for vectors are proposed, and the associated statistical properties and significance testing are discussed. The new measures are applied to study the spatial association of taxi movements in the city of Shanghai. Complications due to the edge effect are also examined.

Eigenvector-based spatial filtering is a widely used approach to modeling spatial autocorrelation among the observations or errors in a regression model. In this approach, a subset of eigenvectors extracted from a modified spatial weight matrix is added to the model as explanatory variables. The subset is typically chosen by forward stepwise selection, which becomes very slow when the number of observations *n* is large. Hence, as a complement or alternative, the present article proposes the use of the least absolute shrinkage and selection operator (LASSO) to select the eigenvectors. The LASSO model selection procedure was applied to the well-known Boston housing data set and to a simulated data set, and its performance was compared with that of the stepwise procedure. The results suggest that the LASSO procedure is fairly fast compared with the stepwise procedure, and can select eigenvectors effectively even when the data set is relatively large (*n* = 10^{4}), a size at which the forward stepwise procedure is difficult to apply.
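The selection step can be illustrated with a minimal sketch, which is not the article's implementation. It assumes the candidate eigenvectors are orthonormal and are the only penalized terms in the model, in which case the LASSO solution reduces to soft thresholding of each eigenvector's inner product with the response. The function names, the toy eigenvectors, and the penalty value `lam` are all hypothetical; with additional covariates in the model a general LASSO solver would be needed.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_select(eigenvectors, y, lam):
    """LASSO coefficients for ORTHONORMAL candidate columns (assumption).

    Minimizes 0.5 * ||y - E b||^2 + lam * ||b||_1, whose solution for
    orthonormal columns is the soft-thresholded inner product e_j . y.
    """
    coefs = [
        soft_threshold(sum(e_i * y_i for e_i, y_i in zip(e, y)), lam)
        for e in eigenvectors
    ]
    selected = [j for j, b in enumerate(coefs) if b != 0.0]
    return coefs, selected


# Hypothetical orthonormal candidates (stand-ins for spatial eigenvectors).
eigenvectors = [
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]
# Response built mostly from the first candidate, with a weak second component.
y = [1.55, 1.45, -1.45, -1.55]

coefs, selected = lasso_select(eigenvectors, y, lam=0.5)
```

Each coefficient costs a single inner product, so the work grows linearly in the number of candidates. Forward stepwise selection refits the model for every remaining candidate at every step, which is one reason it scales poorly.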

When in geography one reconstructs individual behavior starting from aggregated data through ecological inference, a crucial aspect is the spatial variation of individual behavior. Basic ecological inference methods treat areas as if they were all exchangeable, which in geographical applications is questionable due to the existence of contextual effects that relate to area location and induce spatial dependence. Here that assumption is avoided by basing ecological inference on a model that simultaneously does a cluster analysis, grouping together areas with similar individual behavior, and an ecological inference analysis in each cluster, estimating the individual behavior in the areas of each group. That allows one to capture most of the spatial dependence and summarize the individual behavior at a local level through the behavior estimated for each cluster. This approach is used to investigate vote switching in Catalonia, where voters split across a national allegiance divide on top of the ideological divide. That leads to Catalans having a lot of options to choose from, and to them voting differently depending on whether the election is for the Catalan parliament or for the Spanish parliament. To investigate that, the results in the two most recent pairs of such elections are analyzed by simultaneously clustering areas based on the similarity of their vote and vote switch patterns, and estimating one vote switch pattern for each cluster. As a result, Catalonia is partitioned into four clusters that have a strong spatial structure, with all the areas in the same cluster having similar demographic composition. The estimated vote switch patterns are quite different across clusters but very similar across pairs of elections, and they help assess how the differential voter turnout and the strategic dual vote effects vary in space.

In many physical geography settings, principal component analysis (PCA) is applied without consideration for important spatial effects, and in doing so, tends to provide an incomplete understanding of a given process. In such circumstances, a spatial adaptation of PCA can be adopted, and to this end, this study focuses on the use of geographically weighted principal component analysis (GWPCA). GWPCA is a localized version of PCA that is an appropriate exploratory tool when there is a need to investigate spatial heterogeneity in the structure of a multivariate data set. This study provides enhancements to GWPCA with respect to: (i) finding the scale at which each localized PCA should operate; and (ii) visualizing the copious amounts of output that result from its application. An extension of GWPCA is also proposed, where it is used to detect multivariate spatial outliers. These advancements in GWPCA are demonstrated using an environmental freshwater chemistry data set, where a commentary on the use of preprocessed (transformed and standardized) data is also presented. The study is structured as follows: (1) the GWPCA methodology; (2) a description of the case study data; (3) the GWPCA application, demonstrating the value of the proposed advancements; and (4) conclusions. Most GWPCA functions have been incorporated within the GWmodel R package.

This article considers the most important aspects of model uncertainty for spatial regression models, namely, the appropriate spatial weight matrix to be employed and the appropriate explanatory variables. We focus on the spatial Durbin model (SDM) specification in this study, which nests most models used in the regional growth literature, and develop a simple Bayesian model-averaging approach that provides a unified and formal treatment of these aspects of model uncertainty for SDM growth models. The approach expands on previous work by reducing the computational costs through the use of Bayesian information criterion model weights and a matrix exponential specification of the SDM. The spatial Durbin matrix exponential model has theoretical and computational advantages over the spatial autoregressive specification due to the ease of inversion, differentiation, and integration of the matrix exponential. In particular, the matrix exponential has a simple log-determinant that vanishes for the case of a spatial weight matrix with a trace of zero. This allows for a larger domain of spatial growth regression models to be analyzed with this approach, including models based on different classes of spatial weight matrices. The working of the approach is illustrated for the case of 32 potential determinants and three classes of spatial weight matrices (contiguity-based, *k*-nearest neighbor, and distance-based spatial weight matrices), using a data set of income per capita growth for 273 European regions.
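The determinant property mentioned above follows from a standard identity for the matrix exponential. The symbol $\alpha$ for the spatial dependence parameter and the notation $S(\alpha)$ are assumptions introduced here for illustration and are not taken from the abstract.

```latex
S(\alpha) = e^{\alpha W} = \sum_{k=0}^{\infty} \frac{\alpha^{k} W^{k}}{k!},
\qquad
\det\!\left(e^{\alpha W}\right) = e^{\alpha \,\operatorname{tr}(W)},
\qquad
S(\alpha)^{-1} = e^{-\alpha W}.
```

A spatial weight matrix with zeros on its diagonal has $\operatorname{tr}(W) = 0$, so $\det(e^{\alpha W}) = 1$ and $\ln\det(e^{\alpha W}) = 0$. The log-determinant term then drops out of the likelihood, whereas a spatial autoregressive specification requires evaluating $\ln\det(I - \rho W)$ for each candidate value of $\rho$.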

Taylor's power law (TPL) is the power relation between the mean density and the variance of natural populations, and has been described as one of ecology's few ubiquitous laws. Although the power model has been increasingly applied in social systems modeling, including economics, this article, using English and Welsh economic data as an applied example, suggests that TPL ought to be imported more carefully. The article seeks to convince readers that ecological population methodologies can have an important role in the analysis of human spatial behavior, and that this function should not be diminished in pursuit of quick interdisciplinary results. Through the production of “scale-adjusted dispersion indicators,” the article proposes an application of TPL that is quite different from its use in ecological modeling.
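For readers unfamiliar with TPL, the power relation is usually written as variance = a × mean^b and estimated by ordinary least squares on the log-log scale. The sketch below shows that generic fit only. The function name and the synthetic data are hypothetical, and it does not reproduce the article's scale-adjusted dispersion indicators.

```python
import math


def fit_taylor_power_law(means, variances):
    """OLS fit of log(variance) = log(a) + b * log(mean); returns (a, b)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: the TPL exponent
    a = math.exp(y_bar - b * x_bar)    # intercept back-transformed to the scale factor
    return a, b


# Synthetic data generated exactly from variance = 2 * mean^1.5 (illustrative only).
means = [1.0, 2.0, 4.0, 8.0]
variances = [2.0 * m ** 1.5 for m in means]
a, b = fit_taylor_power_law(means, variances)
```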

In this article, we construct new, simple, and nonparametric tests for spatial independence using symbolic analysis. An important aspect is that the tests are free of a priori assumptions about the functional form of dependence, making them especially suitable in situations where the dependence is nonlinear. We define the concept of a similarity relation, which is used to keep track of similarity between neighboring observations. This similarity count is used to construct new statistical tests based on both random permutation simulations and derived asymptotic distributions. We include a Monte Carlo study to better illustrate the properties and the behavior of the new tests under several synthetically generated processes. Apart from being competitive compared with other nonparametric and parametric tests, results underline the outstanding power of the new tests for nonlinear-dependent spatial processes.

The spatial interaction model (SIM) is an important tool for retail location analysis and store revenue estimation, particularly within the grocery sector. However, there are few examples of SIM development within the literature that capture the complexities of consumer behavior or discuss model developments and extensions necessary to produce models which can predict store revenues to a high degree of accuracy. This article reports a new disaggregated model with more sophisticated demand terms which reflect different types of retail consumer (by income or social class), with different shopping behaviors in terms of brand choice. We also incorporate seasonal fluctuations in demand driven by tourism, a major source of non-residential demand, allowing us to calibrate revenue predictions against seasonal sales fluctuations experienced at individual stores. We demonstrate that such disaggregated models need empirical data for calibration purposes, without which model extensions are likely to remain theoretical only. Using data provided by a major grocery retailer, we demonstrate that statistically, spatially, and in terms of revenue estimation, models can be shown to produce extremely good forecasts and predictions concerning store patronage and store revenues, including much more realistic behavior regarding store selection. We also show that it is possible to add a tourist demand layer, which can make considerable forecasting improvements relative to models built only with residential demand.
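As background, a textbook production-constrained SIM allocates each zone's expenditure across stores in proportion to store attractiveness discounted by distance. The sketch below shows that baseline form only. It assumes an exponential distance-decay function, and the function names, the parameter `beta`, and all numbers are hypothetical. The article's disaggregation by consumer type and its tourist demand layer are not reproduced, though in a model of this shape they would typically enter as separate demand vectors with their own parameters.

```python
import math


def predict_flows(demand, attractiveness, distance, beta):
    """Production-constrained SIM: flows[i][j] is spend from zone i at store j.

    Each zone's demand is split across stores in proportion to
    attractiveness[j] * exp(-beta * distance[i][j]).
    """
    flows = []
    for i, o_i in enumerate(demand):
        weights = [
            w_j * math.exp(-beta * distance[i][j])
            for j, w_j in enumerate(attractiveness)
        ]
        total = sum(weights)  # balancing term so each row sums to the zone's demand
        flows.append([o_i * w / total for w in weights])
    return flows


def store_revenues(flows):
    """Sum the flows arriving at each store over all demand zones."""
    return [sum(row[j] for row in flows) for j in range(len(flows[0]))]


# Illustrative inputs: two demand zones, two stores.
demand = [100.0, 50.0]
attractiveness = [2.0, 1.0]
distance = [[1.0, 2.0], [3.0, 1.0]]
flows = predict_flows(demand, attractiveness, distance, beta=0.5)
revenues = store_revenues(flows)
```

Calibration then amounts to choosing `beta`, and any attractiveness parameters, so that predicted revenues or trip lengths match observed data. This is why the abstract stresses the need for empirical data.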

The vector assignment *p*-median problem (VAPMP) is one of the first discrete location problems to account for the service of a demand by multiple facilities, and has been used to model a variety of location problems in addressing issues such as system vulnerability and reliability. Specifically, it involves the location of a fixed number of facilities when the assumption is that each demand point is served a certain fraction of the time by its closest facility, a certain fraction of the time by its second closest facility, and so on. The assignment vector represents the fraction of the time a facility of a given closeness order serves a specific demand point. Weaver and Church showed that when the fractions of assignment to closer facilities are greater than those to more distant facilities, an optimal all-node solution always exists. However, the general form of the VAPMP does not have this property. Hooker and Garfinkel provided a counterexample to this property for the nonmonotonic VAPMP. However, they did not conjecture as to what a finite set may be in general. The question of whether there exists a finite set of locations that contains an optimal solution has remained open. In this article, we prove that a finite optimality set for the VAPMP consisting of “equidistant points” does exist. We also show a stronger result when the underlying network is a tree graph.
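The objective described above can be made concrete with a short sketch that evaluates the cost of a given facility set. It is illustrative only. It restricts facilities to network nodes, whereas the article concerns points that may lie elsewhere on the network, and it uses a single assignment vector shared by all demand points. The function name and the data are hypothetical.

```python
def vapmp_cost(demand, dist, facilities, fractions):
    """Weighted distance when each demand is served by ranked facilities.

    demand[i]     : weight of demand point i
    dist[i][j]    : distance from demand point i to candidate node j
    facilities    : indices of the nodes where facilities are located
    fractions[k]  : share of the time the (k+1)-th closest facility serves
    """
    total = 0.0
    for i, a_i in enumerate(demand):
        # Distances to the open facilities, from closest to farthest.
        ranked = sorted(dist[i][j] for j in facilities)
        total += a_i * sum(f * d for f, d in zip(fractions, ranked))
    return total


# Illustrative instance: two demand points, three candidate nodes, p = 2.
demand = [10.0, 20.0]
dist = [[0.0, 4.0, 6.0],
        [4.0, 0.0, 3.0]]
monotonic = vapmp_cost(demand, dist, facilities=[0, 2], fractions=[0.8, 0.2])
nonmonotonic = vapmp_cost(demand, dist, facilities=[0, 2], fractions=[0.2, 0.8])
```

In the monotonic case the closest facility receives the largest fraction. Reversing the fractions gives a nonmonotonic instance of the kind for which an all-node optimum is not guaranteed.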

Monitoring population characteristics and their patterns of spatial evolution is a fundamental component of urban management and policy decision-making. Societal issues such as health, transport, or crime are often explored using a range of models describing the urban dynamics of population attributes at specific scales that can be seen as complementary. Using and simulating data at different scales of aggregation requires analyzing and comparing spatiotemporal variations in order to better understand the model behaviors and emerging properties of the geosimulation. This article analyzes the uses of the entropy measure in the literature and the constraining factors needed for its potential extension to explore variations across geographic and time scales. In particular, the article discusses the need for a truly spatial entropy that takes into account the spatial contiguities of the observations usually aggregated within a zoning system of areal units. Two generic solutions are presented for the various geometries and attribute structures used for census-related analyses; they are based on existing measures for point data using (i) co-occurrences of observations and (ii) discriminant ratios of distances between groups of observations. Their extensions to areal compositional data are articulated around their conceptual changes and geocomputational challenges. A revisited and new version of the entropy decomposition theorem, encompassing a spatiality concept semantically related to correlation, is also presented; it efficiently reuses the constrained hierarchical zoning system of administrative units to enable discovery of emerging spatial pattern features from the geosimulation. A comparison of the results between the classical use of entropy and the spatial entropy framework devised shows the flexibility and added capabilities of the approach for new types of analyses, thus allowing new insight into studies of population dynamics.
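For orientation, the classical (aspatial) entropy decomposition theorem splits the Shannon entropy of unit shares into a between-group term and a share-weighted within-group term over a hierarchical zoning. The sketch below illustrates only that classical identity, not the revisited spatial version the article proposes. The function names and the shares are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy -sum(p * ln p), treating 0 * ln 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose_entropy(groups):
    """Split total entropy into (between-group, within-group) parts.

    groups: list of lists of unit shares; all shares together sum to 1.
    Each inner list holds the units nested in one higher-level zone.
    """
    group_shares = [sum(g) for g in groups]
    between = shannon_entropy(group_shares)
    within = sum(
        big_p * shannon_entropy([p / big_p for p in g])
        for g, big_p in zip(groups, group_shares)
        if big_p > 0
    )
    return between, within


# Illustrative shares: four small units nested in two larger zones.
groups = [[0.1, 0.2], [0.3, 0.4]]
total = shannon_entropy([p for g in groups for p in g])
between, within = decompose_entropy(groups)
```

Here `between + within` equals `total`. Neither term changes if units are rearranged within a zone, which is the limitation that motivates a spatial entropy sensitive to contiguity.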

This article discusses how standard spatial autoregressive models and their estimation can be extended to accommodate geographically hierarchical data structures. Whereas standard spatial econometric models normally operate at a single geographical scale, many geographical data sets are hierarchical in nature; for example, information about houses is nested within data about the census tracts in which those houses are found. Here we outline four model specifications by combining different formulations of the spatial weight matrix *W* and of ways of modeling regional effects. These are (1) groupwise *W* and fixed regional effects; (2) groupwise *W* and random regional effects; (3) proximity-based *W* and fixed regional effects; and (4) proximity-based *W* and random regional effects. We discuss each of these model specifications and their associated estimation methods, giving particular attention to the fourth. We describe this as a hierarchical spatial autoregressive model. We view it as having the most potential to extend spatial econometrics to accommodate geographically hierarchical data structures and as offering the greatest coming together of spatial econometric and multilevel modeling approaches. Subsequently, we provide Bayesian Markov Chain Monte Carlo algorithms for implementing the model. We demonstrate its application using a two-level land price data set where land parcels nest into districts in Beijing, China, finding significant spatial dependence at both the land parcel level and the district level.

Conventional methods used to identify crime hotspots at the small-area scale are frequentist and employ data for one time period. Methodologically, these approaches are limited by an inability to overcome the small number problem, which occurs in spatiotemporal analysis at the small-area level when crime and population counts for areas are low. The small number problem may lead to unstable risk estimates and unreliable results. Also, conventional approaches use only one data observation per area, providing limited information about the temporal processes influencing hotspots and how law enforcement resources should be allocated to manage crime change. Examining violent crime in the Regional Municipality of York, Ontario, for 2006 and 2007, this research illustrates a Bayesian spatiotemporal modeling approach that analyzes crime trend and identifies hotspots while addressing the small number problem and overcoming limitations of conventional frequentist methods. Specifically, this research tests for an overall trend of violent crime for the study region, determines area-specific violent crime trends for small-area units, and identifies hotspots based on crime trend from 2006 to 2007. Overall violent crime trend was found to be insignificant despite increasing area-specific trends in the north and decreasing area-specific trends in the southeast. Posterior probabilities of area-specific trends greater than zero were mapped to identify hotspots, highlighting hotspots in the north of the study region. We discuss the conceptual differences between this Bayesian spatiotemporal method and conventional frequentist approaches as well as the effectiveness of this Bayesian spatiotemporal approach for identifying hotspots from a law enforcement perspective.

Local statistics test the null hypothesis of no spatial association or clustering in the vicinity of a location. To carry out statistical tests, it is assumed that the observations are independent and that they exhibit no global spatial autocorrelation. In this article, approaches to account for global spatial autocorrelation are described and illustrated for the case of the Getis–Ord statistic with binary weights. Although the majority of current applications of local statistics assume that the spatial scale of the local spatial association (as specified via weights) is known, it is more often the case that it is unknown. The approaches described here cover the testing of local statistics with both known and unknown weights, and they are based upon methods that have been used with aspatial data, where the objective is to find changepoints in temporal data. After a review of the Getis–Ord statistic, the article provides a review of its extension to the case where the objective is to choose the best set of binary weights to estimate the spatial scale of the local association and assess statistical significance. Modified approaches that account for spatially autocorrelated data are then introduced and discussed. Finally, the method is illustrated using data on leukemia in central New York, and some concluding comments are made.
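As a reference point, the standard Getis–Ord Gi* statistic with binary distance-band weights can be computed as in the sketch below. This is the conventional form, which assumes no global spatial autocorrelation. The article's modified approaches and its procedure for choosing the weights are not reproduced. The function name, the radius, and the data are hypothetical.

```python
import math


def getis_ord_gi_star(values, coords, radius):
    """Standardized Gi* for each location using binary weights.

    w_ij = 1 when location j lies within `radius` of location i
    (including i itself), else 0. Assumes the values are not all equal
    and that no neighborhood covers every location.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for xi, yi in coords:
        w = [1 if math.hypot(xi - xj, yi - yj) <= radius else 0
             for xj, yj in coords]
        w_sum = sum(w)  # for binary weights, sum of w equals sum of w squared
        local_sum = sum(wj * v for wj, v in zip(w, values))
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local_sum - mean * w_sum) / denom)
    return scores


# Illustrative data: five locations on a line, one high value at the end.
values = [1.0, 1.0, 1.0, 1.0, 10.0]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
scores = getis_ord_gi_star(values, coords, radius=1.0)
```

Changing `radius` changes which neighbors receive weight 1, which is the sense in which the weights encode the spatial scale that the article treats as possibly unknown.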

Nearly all segregation measures use some form of administrative unit (usually tracts in the United States) as the base for the calculation of segregation indices, and most of the commonly used measures are aspatial. The spatial measures that have been proposed are often not easily computed, although there have been significant advances in the past decade. We provide a measure that is individually based (either persons or very small administrative units) and a technique for constructing neighborhoods that does not require administrative units. We show that the spatial distribution of different population groups within an urban area can be efficiently analyzed with segregation measures that use population count-based definitions of neighborhood scale. We provide a variant of a *k*-nearest neighbor approach, a statistic (spatial isolation), and a methodology (EquiPop) to map, graph, and evaluate the likelihood of individuals meeting other individuals of the same race or individuals of a different ethnicity. The usefulness of this approach is demonstrated in an application of the method to data for Los Angeles and three metropolitan areas in Sweden. This comparative approach is important as we wish to show how the technique can be used across different cultural contexts. The analysis shows how the scale (very small neighborhoods, larger communities, or cities) influences the segregation outcomes. Even if microscale segregation is strong, there may still be much more mixing at macroscales.
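The idea of a population count-based neighborhood can be sketched with a plain *k*-nearest neighbor calculation: for each individual, take the *k* closest other individuals and record the share belonging to the same group. This is a generic illustration and not the EquiPop implementation or the article's exact isolation statistic. The function name and the data are hypothetical.

```python
import math


def knn_same_group_share(points, groups, k):
    """Share of each individual's k nearest neighbors in the same group."""
    shares = []
    for i, (xi, yi) in enumerate(points):
        # Distances to every other individual, closest first.
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest = [j for _, j in others[:k]]
        same = sum(1 for j in nearest if groups[j] == groups[i])
        shares.append(same / k)
    return shares


# Illustrative data: six individuals on a line in two clusters.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
groups = ["A", "A", "B", "B", "B", "A"]
shares_k2 = knn_same_group_share(points, groups, k=2)
shares_k5 = knn_same_group_share(points, groups, k=5)
```

Raising `k` widens each neighborhood by population count rather than by area, so the same data can be examined from very small neighborhoods up to city scale.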

Social scientists characterize social life as a hierarchy of environments, from the microlevel of an individual's knowledge and perceptions to the macrolevel of large-scale social networks. In accordance with this typology, individuals are typically thought to reside in micro- and macrolevel structures, composed of multifaceted relations (e.g., acquaintanceship, friendship, and kinship). This article analyzes the effects of social structure on micro outcomes through the case of regional identification. Self-identification occurs in many different domains, one of which is regional; that is, the identification of oneself with a locationally associated group (e.g., a “New Yorker” or “Parisian”). Here, regional self-identification is posited to result from an influence process based on the location of an individual's alters (e.g., friends, kin, or coworkers), such that one tends to identify with regions in which many of his or her alters reside. The article is structured as follows. We begin with a discussion of the relevant social science literature on both social networks and identification. This is followed by a discussion of competing mechanisms for regional identification that are motivated first by the social network literature, and second by the social psychological and cognitive literature on decision making and heuristics. Next, the article covers the data and methods employed to test the proposed mechanisms. Finally, the article concludes with a discussion of its findings and further implications for the larger social science literature.

This article formulates a model to analyze the role of fixed costs in the design of optimal transportation hub networks. The primary purpose of this article is to better model costs in hub networks, an issue that has attracted considerable attention. This article allows particular versions of hub networks to emerge from the cost structure, rather than by imposing a rigid predefined connectivity protocol. The article integrates modeling approaches from an environmental hub location model with the three-index formulation of Ernst and Krishnamoorthy to produce a hub location model with fixed and variable costs for all arcs. Our goal is to demonstrate how the inclusion of a richer cost model in transportation hub location can generate a wide range of different network types, depending on the relative magnitudes of the cost elements. While the existence of special case network solutions is well known and has been exploited in optimization, the current research provides added insight to the cost of flow in a more, or less, connected hub network. Eight fundamental prototype networks are derived as special cases, and some additional unanticipated network types also emerge. The results are illustrated with a standard CAB25 data set.
