There are a number of statistics for describing the general properties of a distribution. These involve simple descriptions of overall pattern (global characteristics), descriptions of regional variation, and descriptions of small, concentrated clusters (hot spots). Among these are the mean center, center of minimum distance, standard deviational ellipse, and the directional mean (Ebdon 1988; LeBeau 1992).
These simple analogies to univariate statistics can be used to compare different types of distributions or to compare the same distribution for different time periods. As an example, Fig. 1 shows the standard deviational ellipses for burglaries in Precinct 12 in Baltimore County for June and July 1997 (footnote 5). The ellipse shifts between June and July: as summer progresses, some vacationers occupy the communities along the Chesapeake Bay, and the distribution of burglaries follows this pattern.
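To make the comparison across time periods concrete, the mean center is the point whose coordinates are the means of the X and Y coordinates of the incidents. The following Python sketch is illustrative only and is not CrimeStat code: the function names `mean_center` and `center_shift` and the coordinates are hypothetical, chosen to show how a shift between two periods might be measured.

```python
import math


def mean_center(points):
    """Return the mean center: the mean of the x and of the y coordinates."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)


def center_shift(points_a, points_b):
    """Return the straight-line distance between two mean centers."""
    ax, ay = mean_center(points_a)
    bx, by = mean_center(points_b)
    return math.hypot(bx - ax, by - ay)


# Made-up incident coordinates for two periods (arbitrary distance units).
june = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
july = [(3.0, 4.0), (5.0, 4.0), (5.0, 6.0), (3.0, 6.0)]

print(mean_center(june))         # (1.0, 1.0)
print(mean_center(july))         # (4.0, 5.0)
print(center_shift(june, july))  # 5.0
```

A standard deviational ellipse adds information about dispersion and orientation that the mean center alone does not carry, so a sketch like this only captures the change in location.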
A key concept in spatial statistics is that of spatial autocorrelation (Griffith 1987). There are various definitions of spatial autocorrelation but a simple one is that events are spatially arranged in a nonrandom manner, either more concentrated or, occasionally, more dispersed than would be expected on the basis of chance. There are several well-known global measures of spatial autocorrelation—Moran's I, Geary's C, and the Moran correlogram—that are included in CrimeStat (Moran 1948; Geary 1954; Ebdon 1988). There are also several statistics that describe spatial autocorrelation through the properties of distances between incidents including nearest neighbor analysis (Clark and Evans 1954), linear nearest neighbor analysis, K-order nearest neighbor (Cressie 1991), and Ripley's K statistic (Ripley 1976, 1981). The testing of significance for Ripley's K is done through a Monte Carlo simulation that estimates approximate confidence intervals.
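As an illustration of the distance-based approach, the nearest neighbor index of Clark and Evans (1954) divides the observed mean nearest-neighbor distance by the distance expected under complete spatial randomness, which is 0.5 * sqrt(A / N) for N points in a study area of size A. The sketch below is a minimal version under stated assumptions: it applies no edge correction, the function name `nearest_neighbor_index` is hypothetical, and the point patterns are synthetic.

```python
import math


def nearest_neighbor_index(points, area):
    """Observed mean nearest-neighbor distance divided by the value
    expected under complete spatial randomness (no edge correction)."""
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 * math.sqrt(area / n)
    return observed / expected


# Synthetic patterns of 16 points each in a 4 x 4 study area (area = 16).
dispersed = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
clustered = [(2 + 0.1 * i, 2 + 0.1 * j) for i in range(4) for j in range(4)]

print(nearest_neighbor_index(dispersed, 16.0))  # 2.0 (more dispersed than random)
print(nearest_neighbor_index(clustered, 16.0))  # well below 1 (clustered)
```

An index near 1 is consistent with a random arrangement, values below 1 indicate concentration, and values above 1 indicate dispersion.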
As an example, Fig. 2 below shows the Ripley's K distribution of motor vehicle crashes in Houston in 1998 and compares it to both an “envelope” from 100 random Monte Carlo simulations as well as the distribution of the 2000 population (measured by the centroids of census blocks; footnote 6). Ripley's K counts the cumulative number of other points within a circle of a certain radius placed over each point in the distribution. The count is made for multiple radii so that concentration can be compared at different scales. As seen, the distribution of vehicle crashes is highly concentrated (i.e., having a larger count within the search circle), more so than would be expected from the population distribution and certainly more so than would be expected under complete spatial randomness.
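The counting logic and the simulation envelope described above can be sketched in a few lines of Python. This is a simplified illustration, not the CrimeStat routine: it uses one common estimator, K(t) = (A / N^2) times the number of ordered pairs of points within distance t, applies no edge correction, simulates random points uniformly over a rectangle, and takes the envelope as the minimum and maximum over the simulations. The function names, the rectangle, and the synthetic "crash" points are all assumptions made for the example.

```python
import math
import random


def ripley_k(points, radii, area):
    """K at each radius: (area / n^2) * ordered pairs within that radius."""
    n = len(points)
    counts = [0] * len(radii)
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            for k, r in enumerate(radii):
                if d <= r:
                    counts[k] += 1
    return [area * c / (n * n) for c in counts]


def csr_envelope(n, width, height, radii, n_sims=100, seed=0):
    """Min and max K per radius over n_sims uniform random patterns."""
    rng = random.Random(seed)
    lows = [math.inf] * len(radii)
    highs = [-math.inf] * len(radii)
    for _ in range(n_sims):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        k_sim = ripley_k(sim, radii, width * height)
        lows = [min(a, b) for a, b in zip(lows, k_sim)]
        highs = [max(a, b) for a, b in zip(highs, k_sim)]
    return lows, highs


# Synthetic clustered pattern: 20 points around each of two centers
# in a 10 x 10 study area.
rng = random.Random(1)
crashes = [
    (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
    for cx, cy in [(3.0, 3.0), (7.0, 7.0)]
    for _ in range(20)
]

radii = [0.5, 1.0, 2.0]
observed = ripley_k(crashes, radii, 100.0)
lows, highs = csr_envelope(len(crashes), 10.0, 10.0, radii, n_sims=100)

for r, k_obs, hi in zip(radii, observed, highs):
    print(r, k_obs > hi)  # True means more concentrated than any simulation
```

An observed K above the upper envelope at a given radius suggests more concentration at that scale than the random simulations produced. Comparing against a baseline such as population would require simulating from that baseline instead of from a uniform distribution.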
Hot spot analysis
From an analytical perspective, tools that identify hot spots are very useful to police departments because departments tend to focus their deployment and prevention resources on the areas that are most likely to generate incidents (footnote 7).
To illustrate, Fig. 3 shows first- and second-order clusters, drawn as standard deviational ellipses, of driving while intoxicated (DWI) crashes in central Houston from 1999 to 2001, identified with the nearest-neighbor hierarchical clustering routine. The first-order clusters are groupings of incidents, while the second-order clusters are groupings of first-order clusters. As seen, the incidents tend to occur in small clusters. Several of the small clusters, in turn, are grouped into larger district clusters. Fig. 4 zooms into one of the clusters in the East End of Houston, a low-income community with many DWI crashes.
As another example, Fig. 5 shows the clustering of street robberies in west Baltimore County using the STAC clustering algorithm. As seen, three of the clusters fall along a major arterial in the county (State Highway 26); the robberies are concentrated at commercial strips along the arterial.
Because the hot spot tools are complex algorithms, statistical significance must be tested with a Monte Carlo simulation. The nearest-neighbor hierarchical clustering, the risk-adjusted nearest-neighbor hierarchical clustering, and the STAC routines each have a Monte Carlo simulation that allows the estimation of approximate confidence intervals or test thresholds for these statistics.
Of course, a hot spot routine only identifies a collection of points that are close together. It does not explain why they are together. For that, additional research and analysis are required. In the case of crime incident hot spots, the clustering could be due to a high concentration of potential victims (e.g., at a shopping mall), particular land uses that encourage crimes (e.g., an area with a concentration of bars and adult bookshops; Levine, Wachs, and Shirazi 1986), a common activity (e.g., a drug trade “center”), a location where many offenders live, or a neighborhood where a rash of incidents suddenly occurs (e.g., vehicle thieves often hit a neighborhood for a short period of time). The hot spot could also be due to chance; in any distribution, a certain amount of clustering will occur by chance. That is why it is important to test any hot spot against a random distribution (through a Monte Carlo simulation, for example) and also to examine several years of data to ensure that it is not transitory.
There are also several miscellaneous options in CrimeStat that make the program easier to use. Parameters can be saved and reloaded, tab colors can be changed, and Monte Carlo simulation data can be output.
CrimeStat is accompanied by sample data sets and a manual that gives the background behind the statistics with many examples. The manual includes examples contributed by researchers from many different fields. As mentioned, the software and documentation are available for free from the NIJ (see footnote 2).