Pattern Analysis Based on Type, Orientation, Size, and Shape

Authors


Emily Atwood Williams, Department of Geography, Arizona State University, Tempe, AZ 85287-0104
e-mail: emilya_williams@yahoo.com

Abstract

Quantifying the pattern of geographically defined areas has proven useful in analyzing and understanding geographic processes. Pattern analyses most commonly used by geographers measure the spatial distribution of nonspatial variables at each point or in each area. The objective of this research is to develop and assess a pattern analysis for area-class maps that considers more than the nonspatial attributes when describing a spatial pattern. We examine whether similar geographic areas—similar with respect to attribute type and several geometric properties—exhibit particular spatial patterns (e.g., clustered, random, or dispersed). The additional attributes we include are orientation, size, and shape. We refer to the method as TOSS, reflecting the Type, Orientation, Size, and Shape of areas. The goal of considering these additional attributes is to capture the geometric aspects of pattern that other approaches to pattern analysis do not. Sets or groups of similar areas are created using cluster analysis, based on a variable standardization score using the Gower coefficient of similarity. The spatial distributions for each set of geographic objects are then calculated using nearest neighbor analysis. To test the effectiveness of the TOSS method, results from pattern analysis and the TOSS method are qualitatively and quantitatively compared. The TOSS method offers a novel approach to pattern analysis.

Introduction

Understanding land use patterns can provide insight into the spatial processes that created these patterns. Fu and Chen (2000) researched the spatial pattern of agriculture in northern China in an attempt to determine the required landscape diversity for controlling soil erosion. Their results prompted an increase in vegetation land covers to aid in erosion control. Ripple, Bradshaw, and Spies (1991) used spatial indices of patch size, shape, abundance, and spacing to quantify the landscape of managed forest areas in the western Cascades of Oregon. They suggested that such a quantification of the spatial distribution of forest stands is important when evaluating and managing wildlife habitats.

Insight into characteristics of pattern, as the above examples illustrate, helps answer questions related to spatial scale, spatial heterogeneity, boundaries, and incomplete data. Understanding and quantifying pattern also lends insight into spatial processes and changes in these processes that may have led to the existing patterns. Pattern analysis is commonly performed with a metric that describes the spatial distribution of a nonspatial variable of interest (e.g., an attribute) using the terminology clustered, random, or dispersed (also called regular). Some researchers use a combination of indices, such as patch size, shape, fragmentation, and fractals, to quantify patterns on the landscape, but the metrics are evaluated separately (Ripple, Bradshaw, and Spies 1991; Remmel and Csillag 2003). No known approach evaluates whether similar geographic sets from area-class maps (similar with respect to attribute type and several geometric properties) exhibit particular spatial patterns.

Humans can discern patterns based on visual structures, such as the arrangement of different shapes. Picture a quilt. Shapes, such as squares and triangles, are combined in different ways thus creating a visual pattern. Also involved in the creation of quilts is the orientation of the shapes. The quilt may contain pieces of various sizes, adding to the complexity of the overall pattern. These pieces also range in color, a characteristic geographically referred to as the attribute or type (e.g., land use type, soil type, geology type). The unique combination of quilt pieces with varying sizes, shapes, orientations, and colors generates different patterns.

The objective of this research is to develop and assess a pattern analysis approach for area-class maps that considers the distribution of orientation, size, shape, and attribute type. Area-class maps represent data-driven patterns, such as those created by soil type or land cover mapping, unlike those created by census tracts or political boundaries (Bunge 1966; Mark and Csillag 1989; Chrisman 2002). The assumption for area-class maps is that homogeneously defined areas arise because of a stationary process across the study area (Boots 2003). Frequently, the data are represented as an irregular tessellation with categorical attributes, such as vegetation cover. Other times, the map is composed of discrete objects, such as lakes. The method could be applied to determine whether lakes with similar geometric configurations tend to form a pattern, which could suggest that a similar process occurred in their creation (e.g., the Finger Lakes in upstate New York, which were formed by continental glaciers).

We examine whether similar geographic areas—similar with respect to attribute type and geometric properties—exhibit a spatial pattern (e.g., clustered, random, or dispersed). Li and Reynolds (1993, 1994, 1995), Gustafson (1998), Remmel and Csillag (2003), and Boots (2003) defined pattern with two components: composition (e.g., the categories in a landscape) and their configuration (e.g., how the categories are distributed). We expand on this conceptually by considering orientation, size, and shape as additional components to composition of pattern. The only other similar technique is one proposed by Dacey (1965), who included size and shape but not orientation. When considering composition and configuration, the relative location of a nonspatial attribute is the basis for quantifying a spatial pattern. We combine attribute (or type, as we refer to it in our method) with its orientation, size, and shape before performing analysis on the configuration. It is reasonable to assume that the addition of the other geometric characteristics may contribute to the overall pattern in a meaningful way.

We introduce the TOSS method, represented by the Type, Orientation, Size, and Shape of each area that makes up a spatial pattern. By combining specific characteristics of individual areas within a spatial pattern, we examine (1) whether groups of similar areas (similar with respect to attribute type and geometric properties) exhibit patterns and (2) how our method describes spatial pattern compared with other pattern analysis methods. We evaluated these questions by applying TOSS to four data sets: a quilt, land use in two municipalities in Arizona, and Pennsylvania geology. In addition to comparing the quantitative results, we conducted a small survey to determine human perception of the TOSS approach.

Background

Area-class maps

Area-class maps, sometimes referred to as categorical maps, nominal maps, data-driven maps, and dasymetric maps, represent a single phenomenon in a defined area on the surface of the earth (Bunge 1966). In the context of area-class maps, time is fixed (e.g., land use is from one time period), attributes are controlled (e.g., land use categories are defined in advance), and space is measured (e.g., the boundaries of land use are identified on the surface of the Earth) (Chrisman 1999). Our approach to pattern analysis is not appropriate when the attributes are measured and space is controlled, whether the data are spatially intensive or spatially extensive. For example, in a choropleth map of the United States' population densities, time is fixed, space (and therefore the geometric characteristics) is controlled, and attribute (population density) is measured. Therefore, measuring the orientation, size, and shape of the state boundaries would not contribute meaningful insight to the distribution of population. Rather, the drivers of population density are more related to economics, politics, and history than they are to state size or shape.

The defined areas in area-class maps are dependent on factors such as scale, the number of categories, and the interpretation made by a cartographer or air photo interpreter. Understanding the constraints on data due to these factors is critical in the interpretation of results. Research on area-class maps has examined line generalization (Mark and Csillag 1989), modeling error (Chrisman 1989), classification (Zhang and Stuart 2001; Mennis 2003), visualization (Hengl et al. 2004), map comparison (Remmel and Csillag 2006), and spatial heterogeneity (Li and Reynolds 1993, 1994, 1995). Techniques to measure the spatial heterogeneity of area-class maps measure the configuration of categories. In a sense, they measure spatial pattern of area-class maps.

The pattern analysis technique most commonly applied to categorical data is the join count statistic. This approach measures the degree of clustering or dispersion of contiguous areas, typically for binary variables. A binary variable can only have one of two values assigned to it. Join count statistics have been applied to determine spatial clustering of genotypes (an organism's genetic information) in population genetics (Shimatani and Takahashi 2003), a field that spatial statistics were first introduced to by Sokal and Oden (1978). Boots (2003) developed a local measure for spatial association of categorical data as an extension to the join count statistic. Another extension of the two-value join count is the k-color map, where the spatial distribution of more than two values is evaluated (Cliff and Ord 1981). Cliff and Ord (1981) summarize Dacey's (1965) remarks on the limitations of the join count statistic. According to Cliff and Ord, Dacey describes the limitations of these techniques as the problem of “topological invariance” and the requirement of spatially contiguous areas. Topological invariance means that the pattern measures are invariant under geometric transformations after the geographic relationships are identified (the linked matrix). Geometric transformations include changing the shape and size of areas in the structure. The second limitation is that all areas need to be adjacent to one another. The TOSS method is our attempt to address these concerns with an approach to deal with continuous fields and discrete objects.

Pattern analysis

Pattern analysis in a broader sense involves calculating indices to describe the spatial distribution, arrangement, and structure of geographic objects (typically points, lines, areas, or grids and/or their attributes) (Chou 1995). There are a variety of techniques to assess pattern, including those that examine the geographic distribution of discrete objects, measure the geometry of objects, and measure spatial autocorrelation.

Techniques that examine the geographic distribution of objects are nearest neighbor analysis and quadrat analysis (Cressie 1993; Lee and Wong 2001). The terminology used to describe the dispersion of objects is typically clustered, random, and dispersed (the terms regular and uniform are also used). These are frequently calculated for point data but can also be applied to centroids of polygons or areas, such as area-class maps. Nearest neighbor analysis has been applied to measure the pattern of animal distributions (Heupel and Simpfendorfer 2005), plant distributions (Shaw et al. 2005), and landforms (Gao, Alexander, and Barnes 2005). Quadrat analysis has been utilized to examine the spatial distribution of accessibility (Sadahiro 2005) and to measure forest and plant distributions (Falcon-Lang 2003; Plante et al. 2004). Skelton (1996) offered suggestions on how quadrat analysis could be used in medicine. Quadrat analysis results are inaccurate when an inappropriate quadrat size is used. While nearest neighbor analysis tends to be more robust, both techniques are criticized for their inability to measure the correlation of attributes.

Other forms of pattern analysis measure the geometric form of landscape, typically on discrete objects. These indices measure irregularity, patch size, shape, contagion, and adjacency measures (Li and Reynolds 1993; McGarigal and Marks 1995; Haines-Young and Chopping 1996; Gustafson 1998; He, DeZonia, and Mladenoff 2000). Remmel and Csillag (2003) evaluated several of these landscape pattern indices and discovered that even small differences in land cover proportions can alter pattern indices significantly or similar pattern indices can be calculated from visually different landscapes. Fractal analysis, which is often used to generate artificial but realistic landscapes, has been used to quantify irregularity in landscapes. De Keersmaecker, Frankhauser, and Thomas (2003) utilized fractals to describe the irregularity of urban land uses in Brussels, Belgium. Haines-Young and Chopping (1996) provide a review of patch size, shape, and contagion indices with particular application to forestry projects. These indices have been applied to numerous scientific applications, particularly in the field of landscape ecology (Corry 2005; Grainger, van Aarde, and Whyte 2005; Wei and Hoganson 2005). The downside of these approaches is that they measure only specific features of pattern rather than a broader scope of pattern (Remmel and Csillag 2003).

Spatial tessellations and the application of Voronoi diagrams have been used to analyze the distribution pattern of points, to measure spatial intensity, and to aid in pattern extraction (Okabe et al. 2000). To measure the distribution of points, nearest neighbor analysis and the shape of the point patterns are calculated (Kristensen et al. 2006). By calculating a Voronoi diagram for a point distribution, Chiu (2003) was able to extract characteristics of the Delaunay triangles (e.g., interior angles and edge length), to test for complete spatial randomness of point distributions. Spatial intensity, how concentrated an activity or phenomenon is in a particular study area, is measured by calculating the area and elongation of Voronoi polygons. This approach has been used in a wide range of applications including agriculture, microbiology, and astronomy (Pásztor 1994; Wager, Coull, and Lange 2004; Kristensen et al. 2006). Use in pattern recognition and extraction is sparse but has potential as demonstrated for fire spread by Calabi and Hartnett (1968).

Pattern analysis is also frequently equated with measures of spatial autocorrelation, which represent both composition and configuration (Li and Reynolds 1993). Nonspatial attributes are evaluated on whether they exhibit positive or negative spatial autocorrelation. A positive spatial autocorrelation result indicates that neighboring areas or points are similar with respect to attribute values, while negative spatial autocorrelation means nearby regions are less similar in attributes than one would expect in a random pattern. Spatial autocorrelation is one component of pattern; it is limited to the clustering or dispersion of objects rather than measuring geometric aspects of pattern (Boots 2003). It is difficult to apply globally to area-class maps because the approach requires calculations on the attribute. Spatial autocorrelation techniques have been applied to subsets of area-class maps. For example, Overmars, de Koning, and Veldkamp (2003) used Moran's Index (I) on four different land use types in Ecuador. For each land use type they measured the spatial autocorrelation of parcel size.
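To make the sign convention of spatial autocorrelation concrete, the following minimal sketch computes the standard global Moran's I for a small set of areas. The function name, the four-area chain of neighbors, and the attribute values are illustrative assumptions and are not taken from any of the studies cited above.

```python
from typing import List, Sequence


def morans_i(values: Sequence[float], weights: List[List[float]]) -> float:
    """Global Moran's I for values observed on n areas.

    weights[i][j] is the spatial weight between areas i and j
    (here 1 for neighbors, 0 otherwise; the diagonal is 0).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighboring pairs.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical example: four areas in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
similar_neighbors = morans_i([1, 1, 5, 5], chain)      # positive
alternating_values = morans_i([1, 5, 1, 5], chain)     # negative
```

In this toy case, like values placed side by side give a positive index, whereas alternating values give a negative one. Note that the calculation needs a numeric attribute for every area, which is the limitation for nominal area-class maps described above.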

The TOSS method explained

The objective of this research is to develop a pattern analysis method that considers the orientation, size, and shape of geographic objects in addition to the attribute type for area-class maps. In area-class maps, the orientation, size, and shape measurements would be of interest and applicable to our method because the boundaries of the areas are defined and characterized by the processes that formed them.

Our hypothesis is that pattern can be described by quantifying the spatial configuration of sets of geographic objects derived from an area-class map. Each area in the map has associated with it four attributes—type, orientation, size, and shape. Each area in our test patterns is represented by a polygon. Sets or groups of similar polygons are then created based on the similarity of values of the four TOSS attributes. Then, the spatial configuration of each set is calculated. The spatial configuration of each set is assessed based on its tendency to be clustered, random, or dispersed.

There are three primary steps of the TOSS polygon pattern analysis method (Fig. 1). The first step of the TOSS method is to calculate the four TOSS attributes for each polygon in the input pattern (Table 1). The measurements described below are not the only possibilities for calculating these attributes, but they are the ones we used to test TOSS. The first input attribute of the TOSS method is a polygon's type. We define type as the unifying characteristic of a polygon, separating it from surrounding polygons. Existing pattern analyses refer to type as the “attribute” of interest. We use the term “attribute” as a general term that refers to each of the TOSS variables. Type is most frequently a nominal (categorical) variable, but could also be of ordinal (ranking) or interval/ratio (quantitative) scale. For example, in a land-use classification map, nominal classifications of areas may include commercial, residential, and agricultural. The orientation of an area is the directional degree of the area's primary axis. The orientation value is calculated through the use of proprietary software (note 1). The values of the orientation angle range from 0° to 180° and increase counterclockwise, starting from 0° when the primary axis points to the right along the data's x-axis and reaching 90° when the primary axis is perpendicular to the data's x-axis. The size of each polygon is equivalent to its measured area, a measurement typically calculated by standard geographic information systems (GIS) software. The measurement unit of size depends on the data's coordinate system. For our evaluation, polygon shape was determined using a perimeter-to-area ratio:

$$\text{Shape Index} = \frac{\sqrt{\text{Area}}}{0.282 \times \text{Perimeter}} \tag{1}$$
Figure 1. The three primary steps of TOSS polygon pattern analysis.

Table 1. Polygon Attributes and Their Measurements for TOSS Pattern Analysis

| Attribute | Measure | Calculation | Value |
|---|---|---|---|
| Type | Classification attribute | N/A | Class name or identifier |
| Orientation | Primary axis | Degree of rotation | 0°–180° |
| Size | Polygon's area | Internal to GIS | Number in projection units |
| Shape | Perimeter-to-area ratio | √Area/(0.282 × Perimeter) | Index: 0–1 |

Note: GIS, geographic information systems.

The shape index returns a value between 0 and 1, where numbers closer to 1 represent more compact shapes (i.e., a perfect circle equals 1). Many other shape measurements, including both single- and multiple-parameter indices, exist (Boyce and Clark 1964; Duda and Hart 1972; Wentz 2000). However, the perimeter-to-area ratio is a conceptually simple single-parameter (i.e., compactness) measurement and is easily calculated in the GIS environment.
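As a minimal sketch of the shape index in Table 1 (not the GIS routine used in this study), the function below evaluates √Area/(0.282 × Perimeter) for a few idealized shapes. The function name and the example shapes are illustrative assumptions.

```python
import math


def shape_index(area: float, perimeter: float) -> float:
    """Compactness index: sqrt(area) / (0.282 * perimeter)."""
    if area < 0 or perimeter <= 0:
        raise ValueError("area must be non-negative and perimeter positive")
    return math.sqrt(area) / (0.282 * perimeter)


# Hypothetical shapes, all in the same (arbitrary) linear unit.
circle = shape_index(math.pi * 10 ** 2, 2 * math.pi * 10)  # radius 10
square = shape_index(100.0, 40.0)                          # 10 x 10
strip = shape_index(100.0, 202.0)                          # 1 x 100 rectangle
```

The constant 0.282 approximates 1/(2√π), so a circle scores almost exactly 1 (marginally above it, because the constant is rounded). A square scores about 0.89, and the long strip scores far lower even though it has the same area as the square.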

Step two of the TOSS method is to create the sets of geographic objects based on similarities of the polygons in regard to their TOSS attributes. Because the four input attributes are of different measurement scales, the values for each polygon must first be standardized so that each contributes equally to the analysis. Gower (1971) presented a standardization method that allows for mixed data types and, therefore, was appropriate to use on the polygons with both nominal and interval/ratio attributes. The Gower similarity coefficient ($S_{ij}$) between objects $i$ and $j$ is defined as

$$S_{ij} = \frac{\sum_{k} w_k \, \delta_{ijk} \, s_{ijk}}{\sum_{k} w_k \, \delta_{ijk}} \tag{2}$$

Above, $s_{ijk}$ is the similarity between the $i$th and $j$th polygons as measured by the $k$th attribute; $\delta_{ijk}$ represents the possibility of such a comparison, equal to 1 when attribute $k$ can be compared for $i$ and $j$, and 0 otherwise; and $w_k$ is the weight applied to the $k$th attribute (if any). If no attribute is to be weighted more heavily than another, as was the case in our research, $w_k$ equals 1; an attribute is effectively ignored in the similarity calculation when $w_k$ equals 0. The effect of the denominator is to divide the sum of the similarity scores by the number of attributes or, if differential weighting is applied, by the sum of their weights.

For nominal variables (i.e., type in the TOSS method):

$$s_{ijk} = \begin{cases} 1 & \text{if } v_{ik} = v_{jk} \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

For directional data (i.e., orientation):

image(4)

where $v_{ik}$ and $v_{jk}$ are the angular values of polygons $i$ and $j$ for attribute $k$, such that $s_{ijk}$ ranges from 0 to 1: $s_{ijk}$ equals 0 when the two polygons are 90° apart (the maximum difference possible) and 1 when the two polygons are parallel in orientation.
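The behavior described for the orientation similarity can be illustrated with a short sketch. The linear form used below is an assumption chosen only because it satisfies the stated endpoints (0 at 90° apart, 1 when parallel); it is not necessarily the exact expression of equation (4), and the function name is hypothetical.

```python
def axial_similarity(angle_i: float, angle_j: float) -> float:
    """Similarity of two primary-axis orientations given in degrees.

    Assumed linear form: 1 when the axes are parallel, 0 when they
    are 90 degrees apart. Axes have no direction, so 10 and 170
    degrees are treated as only 20 degrees apart.
    """
    diff = abs(angle_i - angle_j) % 180.0
    diff = min(diff, 180.0 - diff)  # fold onto the range 0..90
    return 1.0 - diff / 90.0


parallel = axial_similarity(30.0, 30.0)        # same orientation
wrapped = axial_similarity(0.0, 180.0)         # same axis, opposite labels
perpendicular = axial_similarity(0.0, 90.0)    # maximum difference
nearly_parallel = axial_similarity(10.0, 170.0)
```

Folding the difference onto 0° to 90° is the step that distinguishes directional (axial) data from an ordinary interval variable: without it, orientations of 10° and 170° would be scored as very dissimilar.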

For interval/ratio variables (i.e., size and shape):

$$s_{ijk} = 1 - \frac{|v_{ik} - v_{jk}|}{v_{k,\max} - v_{k,\min}} \tag{5}$$

where $v_{ik}$ and $v_{jk}$ are the values of attribute $k$ for polygons $i$ and $j$, respectively, and $v_{k,\max}$ and $v_{k,\min}$ are the maximum and minimum values, respectively, of attribute $k$ in the complete data set, such that $s_{ijk}$ ranges between 1, for identical values ($v_{ik} = v_{jk}$), and 0, for the two extreme values $v_{k,\max}$ and $v_{k,\min}$.

Given these different measures of similarity depending on the scale of the attribute of comparison, every polygon in a given pattern is compared with every other polygon in that pattern, and a similarity score ($S_{ij}$) is calculated for every pair of polygons in the pattern. The resulting similarity matrix, $S$, contains these standardized scores ($S_{ij}$). The sets of geographic objects are determined by performing cluster analysis on the similarity matrix.
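The construction of the similarity matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the dictionary keys, the three example polygons, and the linear form of the orientation similarity are all hypothetical, and a missing value (None) is treated as a comparison that cannot be made.

```python
from typing import Dict, List, Optional

NUMERIC = ("size", "shape")  # interval/ratio attributes


def nominal_sim(a, b) -> float:
    return 1.0 if a == b else 0.0


def axial_sim(a: float, b: float) -> float:
    # Assumed linear form: 1 when parallel, 0 when 90 degrees apart.
    diff = abs(a - b) % 180.0
    return 1.0 - min(diff, 180.0 - diff) / 90.0


def range_sim(a: float, b: float, lo: float, hi: float) -> float:
    # Range-normalized similarity for interval/ratio attributes.
    return 1.0 if hi == lo else 1.0 - abs(a - b) / (hi - lo)


def gower_matrix(polygons: List[Dict],
                 weights: Optional[Dict[str, float]] = None) -> List[List[float]]:
    """Weighted Gower similarity for every pair of polygons."""
    weights = weights or {"type": 1.0, "orientation": 1.0,
                          "size": 1.0, "shape": 1.0}
    ranges = {}
    for k in NUMERIC:
        vals = [p[k] for p in polygons if p.get(k) is not None]
        ranges[k] = (min(vals), max(vals)) if vals else (0.0, 0.0)
    n = len(polygons)
    S = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            num = den = 0.0
            for k, w in weights.items():
                vi, vj = polygons[i].get(k), polygons[j].get(k)
                if vi is None or vj is None:
                    continue  # comparison not possible for attribute k
                if k == "type":
                    s = nominal_sim(vi, vj)
                elif k == "orientation":
                    s = axial_sim(vi, vj)
                else:
                    s = range_sim(vi, vj, *ranges[k])
                num += w * s
                den += w
            S[i][j] = S[j][i] = num / den if den else 0.0
    return S


# Three hypothetical polygons with equal attribute weights.
polygons = [
    {"type": "residential", "orientation": 0.0, "size": 100.0, "shape": 0.8},
    {"type": "residential", "orientation": 90.0, "size": 300.0, "shape": 0.6},
    {"type": "vacant", "orientation": 0.0, "size": 200.0, "shape": 0.8},
]
S = gower_matrix(polygons)
```

In this toy case the first and third polygons differ in type but agree in orientation and shape, so they score higher (0.625) than the first and second (0.25), which share only their type. Passing a different `weights` dictionary changes how much each attribute contributes to the score.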

Cluster analysis is an exploratory statistical method that aims to sort cases (for this research, polygons) into groups where the degree of association between members of the same group is strong and is weak between members of different groups (Wishart 2003). Using similarity coefficients, like the Gower similarity coefficient, clusters are determined. There is no best clustering method, although Everitt (1993) discussed several studies that compared the usefulness of various cluster analysis methods. Each investigation gave slightly different results, but the three most common methods to yield satisfactory results were average linkage, complete linkage, and Ward's increase in sum of squares.
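To illustrate how clusters can be grown from a similarity matrix, the sketch below applies agglomerative clustering with the Lance-Williams update for Ward's method. It is an illustrative approximation, not the software used in this study: treating 1 − S as a squared dissimilarity is an assumption, and the function name and the five-object example matrix are hypothetical.

```python
from typing import Dict, List, Tuple


def ward_clusters(S: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Merge objects until n_clusters remain, using Ward's criterion.

    Assumption: 1 - S[i][j] is treated as a squared dissimilarity.
    """
    n = len(S)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and len(S)")
    D: Dict[Tuple[int, int], float] = {
        (i, j): 1.0 - S[i][j] for i in range(n) for j in range(i + 1, n)
    }
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}
    next_id = n

    def d(a: int, b: int) -> float:
        return D[(a, b) if a < b else (b, a)]

    while len(members) > n_clusters:
        # Pick the pair whose merge adds least to the sum of squares.
        a, b = min(((p, q) for p in members for q in members if p < q),
                   key=lambda pq: D[pq])
        na, nb, dab = len(members[a]), len(members[b]), D[(a, b)]
        new = next_id
        next_id += 1
        for k in members:
            if k in (a, b):
                continue
            nk = len(members[k])
            # Lance-Williams update for Ward's method.
            D[(k, new)] = ((na + nk) * d(a, k) + (nb + nk) * d(b, k)
                           - nk * dab) / (na + nb + nk)
        members[new] = members.pop(a) + members.pop(b)
    return sorted(sorted(m) for m in members.values())


# Hypothetical similarities: objects 0-2 resemble each other, as do 3-4.
S = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
]
groups = ward_clusters(S, 2)
```

Because the number of clusters is fixed in advance, the routine always returns exactly that many groups, regardless of how well separated they are.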

The third and final step of TOSS pattern analysis is to calculate the spatial distributions for each set of geographic objects (the clusters of polygons) that were created in step two. Performing nearest neighbor analysis on the polygon centroids generates three spatial distributions for each data set. We chose to conduct nearest neighbor analysis for two reasons. First, this technique was easily performed (Lee and Wong 2001). More importantly, nearest neighbor analysis does not require contiguous polygons, which is a key consideration because the sets of polygons identified through the cluster analysis will likely not be contiguous. Each set is then described as having a clustered, random, or dispersed configuration.
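A minimal sketch of nearest neighbor analysis on centroids follows, using the standard nearest neighbor ratio (observed mean nearest neighbor distance divided by the distance expected under complete spatial randomness). The function names, the absence of any edge correction, the ±1.96 cutoff, and the example point sets are assumptions made for illustration; they are not details reported for this study.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest_neighbor_ratio(points: List[Point],
                           study_area: float) -> Tuple[float, float]:
    """Return (R, z) for a set of centroids in a study area.

    R < 1 suggests clustering, R near 1 randomness, R > 1 dispersion.
    """
    n = len(points)
    if n < 2 or study_area <= 0:
        raise ValueError("need at least two points and a positive area")
    nearest = [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i)
        for i, (xi, yi) in enumerate(points)
    ]
    observed = sum(nearest) / n
    density = n / study_area
    expected = 0.5 / math.sqrt(density)
    std_error = 0.26136 / math.sqrt(n * density)
    return observed / expected, (observed - expected) / std_error


def classify(z: float, critical: float = 1.96) -> str:
    # Assumed two-sided 5% cutoff.
    if z < -critical:
        return "clustered"
    if z > critical:
        return "dispersed"
    return "random"


# Hypothetical centroids in a 50 x 50 study area.
grid = [(5.0 + 10 * i, 5.0 + 10 * j) for i in range(5) for j in range(5)]
packed = [(float(i), float(j)) for i in range(5) for j in range(5)]
r_grid, z_grid = nearest_neighbor_ratio(grid, 2500.0)
r_packed, z_packed = nearest_neighbor_ratio(packed, 2500.0)
```

The evenly spaced grid yields a ratio of 2 and is labeled dispersed, while the same number of points packed into one corner yields a ratio of 0.2 and is labeled clustered. In TOSS, this calculation would be repeated once per set of polygons, using only that set's centroids.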

The advantages of the TOSS method over other pattern analysis methods are that it can be easily applied to nominal data as well as noncontinuous data. Furthermore, TOSS takes into account the structure (orientation, size, and shape) and distribution (regularity, relationship, and clustering) of geographic objects. For example, a geology data set of Pennsylvania has a noticeable dendritic pattern in the northwestern region of the state and a linear, diagonally oriented pattern in the southeast. Using a method such as TOSS, which includes type, orientation, size, and shape, is an attempt to capture this kind of visual description of pattern in a quantitative manner.

Evaluating TOSS

Approach

We applied the TOSS method to four data sets, where each area in our digital database is represented by a polygon (Fig. 2). We compared the results from TOSS to several other pattern analysis approaches. The other approaches we identify for comparison with TOSS are: join count, quadrat analysis, nearest neighbor analysis, Moran's I, and Geary's C. Each technique we selected for comparison represents a global measure that can be applied to area data—both continuous fields and discrete areas. We compare data requirements, limitations, simplicity of the measurement, and how well results match human interpretation.

Figure 2. The four sample data sets: (a) quilt color pattern, (b) Pennsylvania geology pattern, (c) Surprise land use pattern, and (d) Tempe land use pattern.

We administered a 16-question survey to undergraduate students enrolled in an introductory physical geography course. The fundamental purpose of this survey was to assess human perception of pattern terminology for our four sample data sets. This group clearly has certain biases (e.g., age), but we believe that because the course is not restricted to geography majors, the participants represent a cross-sectional view of what constitutes spatial pattern. Participants were shown digital images of each of our four data sets. First, they were asked to identify the term that best described the pattern they saw using the terms: random, clustered, dispersed, or none of the above. Second, they were shown the same four data sets again and asked to identify the phrase that best described the pattern they saw using the phrases: overall regular pattern, clusters of regular pattern, clusters of no discernable pattern, no overall pattern, or none of the above describe the pattern I see. Our intention was to determine whether phrases, which could be associated with TOSS results, provide a better framework for describing pattern than the simple terminology associated with several pattern analysis techniques.

Our first data set in our comparative analysis was a quilt pattern. We chose a quilt because humans can, through visual inspection, easily discern a pattern. In this way, the quilt pattern acts as a standard for comparison against other data sets and their results. Our second data set is the geology of the state of Pennsylvania, obtained from the United States Geological Survey (USGS). The USGS geology data are at a scale of 1:2,500,000 and represent rock types based on relative rock age. Our third and fourth data sets are of Maricopa County, Arizona land use for the year 2000 (MAG 2006). Two cities within the region, Tempe and Surprise, were extracted from this database. The difference between these cities is that Tempe is completely surrounded by other municipalities, whereas Surprise is located on the urban fringe, adjacent to open desert. Tempe's central location restricts annexation such that new development must occur on vacant lots, with a tendency for higher density development. In Surprise, by contrast, annexation is possible, potentially leading to a less dense land use plan. The latter three geographic data sets meet the measurement criteria discussed earlier, where time is fixed, attribute is controlled, and space is measured. All four layers were projected into NAD83 UTM coordinates (in meters): Zone 12 for the land use layers and (arbitrarily) the quilt layer, and Zone 18 for the geology layer.

Data processing

In processing the data for TOSS, we assigned the calculated values for type, orientation, size, and shape attributes to each polygon in our data sets. Type corresponds to the category that each polygon represents in its pattern. For the quilt data set, the type values are the display colors that make up the pattern: black, dark gray, light gray, and white. The geology types represent the relative age of the rock layers, denoted by names of the particular era, period, or epoch in order from youngest to oldest: Cretaceous, Triassic, Permian, Pennsylvanian, Mississippian, Devonian, Silurian, Ordovician, Cambrian, Paleozoic, and Proterozoic. The USGS originally classified the rock layers into 33 geologic time periods, but in order to generate more homogeneous sets of geographic objects, we combined the data into 11 classes based on a geologic time scale. The land use type corresponds to land use classes developed by MAG, which also originally comprised 33 classes. As with the geology data set, we combined these classes into nine general land use classifications: residential, commercial, industrial, office, miscellaneous employment, transportation, open space, multiple use, and vacant.

Orientation was calculated by measuring the angle of a polygon's primary axis. This resulted in a number ranging from 0° to 180°. Size, measured in square meters, was calculated during the creation of the GIS data layers for each input pattern. Shape was calculated from the area and perimeter values.

The next data processing step was the creation of the similarity matrices for each pattern. The Gower (1971) coefficient of similarity was calculated for every polygon pair in a pattern based on values of the four polygon attributes. Each TOSS attribute was given equal weight, although alternatively if a researcher knew a priori that a particular variable contributed more or less to a pattern's meaningfulness, each polygon variable could be weighted accordingly (using the wk variable when calculating the Gower similarity coefficient).

The final data processing step was the generation of the sets of geographic objects. This was done via cluster analysis on the similarity matrices. We applied Ward's method to generate the clusters because it produced clusters that contained a reasonable number of polygons, unlike the average linkage method, which produced clusters containing only one or two polygons. This was an important factor because the final step involved assessing the spatial distribution of each group and the dispersion of a small sample cannot be analyzed.

Results

TOSS pattern analysis

Step one of TOSS was the calculation of the four polygon attributes: type, orientation, size, and shape. Fig. 3 displays the quilt pattern in terms of these four characteristics as an example. Step two of the TOSS method was to create the sets of geographic objects based on values of the four polygon attributes. Fig. 4 shows each data set in terms of its three sets of geographic objects or clusters. Referring back to Fig. 3 provides an understanding of the process of grouping similar polygons together to make the sets of geographic objects during the Gower standardization and cluster analysis phases of the TOSS method. For example, most of the type black polygons have midranging orientation values (46°–135°), similar area values (the actual values range from 359 to 417 m², grouping them into the third area category of 314–782 m²), and similar shape measurements (Shape Index = 0.61–0.80). Therefore, each type black polygon is placed in the same set of geographic objects, Set 2 in Fig. 4. The other polygons that were placed in Set 2 were calculated to be more similar to each other than to the polygons in either Sets 1 or 3 in terms of their TOSS attributes.

Figure 3. The quilt polygon data set categorized by each of the four polygon attributes of the TOSS method: (a) Type, (b) Orientation, (c) Size (Area), and (d) Shape (Shape Index). Using these attributes, each polygon was grouped into the three most homogeneous clusters via hierarchical cluster analysis. These clusters are the sets of geographic objects.

Figure 4. The sets of geographic objects (clusters) after performing hierarchical cluster analysis (Ward's increase in sum of squares) on each of the polygon patterns: (a) quilt, (b) Pennsylvania, (c) Surprise, and (d) Tempe. Using the four polygon attributes as criteria, cluster analysis was restricted to the creation of only three clusters.

Table 2 summarizes the means and standard deviations of each of the three sets of geographic objects for each pattern in terms of the four TOSS polygon attributes. These statistics quantify the similarity of the polygons that exists within each set, which were created using Ward's increase in sum of squares clustering method. Ideally, clusters contain polygons that are the most similar to each other, reflected by lower standard deviations. Therefore, when comparing clusters or comparing a set to all data, lower standard deviations mean that the polygons within the cluster are more similar to one another than to polygons in another set. Table 3 compares the standard deviations of the polygons in each set of geographic objects to the standard deviations of the complete data set. The bold values indicate a decrease in the standard deviation from the overall pattern to the sets of geographic objects, signifying sets of more similar polygons.
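The comparison behind Table 3 amounts to a simple check of each set's standard deviation against the overall value. The sketch below applies that check to the quilt values for type and orientation reported in Table 3; the dictionary layout and the function name are illustrative assumptions.

```python
from typing import Dict, List

# Standard deviations for the quilt pattern as reported in Table 3.
quilt_sd = {
    "Type": {"sets": [0.00, 0.46, 0.00], "overall": 0.93},
    "Orientation": {"sets": [57.83, 57.20, 44.96], "overall": 52.96},
}


def tighter_than_overall(sets: List[float], overall: float) -> List[bool]:
    """True where a set's standard deviation is below the overall value."""
    return [sd < overall for sd in sets]


flags: Dict[str, List[bool]] = {
    attribute: tighter_than_overall(row["sets"], row["overall"])
    for attribute, row in quilt_sd.items()
}
```

For the quilt, all three sets are more homogeneous than the full pattern with respect to type, but only Set 3 is more homogeneous with respect to orientation.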

Table 2. Result Summary of Hierarchical Cluster Analysis (Ward's): Mean (Standard Deviation) by Attribute

| Pattern | Attribute | Set 1 | Set 2 | Set 3 |
|---|---|---|---|---|
| Quilt | Type* | 4.00 (0.00) | 1.69 (0.46) | 3.00 (0.00) |
| Quilt | Orientation (°) | 82.55 (57.83) | 85.26 (57.20) | 89.29 (44.96) |
| Quilt | Area/Size (m²) | 467.94 (449.05) | 322.60 (188.02) | 95.18 (5.01) |
| Quilt | Shape Index/Shape | 0.70 (0.05) | 0.77 (0.05) | 0.73 (0.00) |
| Quilt | Count (Polygons) | 68 | 65 | 88 |
| Geology | Type* | 7.25 (2.77) | 4.00 (0.00) | 5.00 (0.00) |
| Geology | Orientation (°) | 69.62 (38.41) | 62.24 (45.90) | 53.86 (39.77) |
| Geology | Area/Size (km²) | 1,823,877 (6,785,494) | 89,373 (183,614) | 288,172 (1,203,355) |
| Geology | Shape Index/Shape | 0.56 (0.22) | 0.82 (0.14) | 0.73 (0.21) |
| Geology | Count (Polygons) | 64 | 50 | 63 |
| Surprise Land Use | Type* | 5.63 (2.81) | 7.00 (0.00) | 1.00 (0.00) |
| Surprise Land Use | Orientation (°) | 96.73 (53.14) | 100.23 (50.69) | 93.24 (58.54) |
| Surprise Land Use | Area/Size (km²) | 5126 (30,331) | 1603 (5577) | 204 (571) |
| Surprise Land Use | Shape Index/Shape | 0.71 (0.21) | 0.63 (0.23) | 0.74 (0.16) |
| Surprise Land Use | Count (Polygons) | 122 | 47 | 187 |
| Tempe Land Use | Type* | 4.49 (2.14) | 9.00 (0.00) | 5.52 (2.72) |
| Tempe Land Use | Orientation (°) | 94.68 (54.18) | 91.76 (61.77) | 94.31 (56.59) |
| Tempe Land Use | Area/Size (km²) | 211 (567) | 561 (161) | 117 (682) |
| Tempe Land Use | Shape Index/Shape | 0.70 (0.19) | 0.71 (0.18) | 0.75 (0.16) |
| Tempe Land Use | Count (Polygons) | 132 | 76 | 619 |

* To input the attribute type into the Gower standardization formula, each type description had to be converted to a numerical attribute as follows. Quilt: 1=Black, 2=Dark Gray, 3=Light Gray, 4=White. Geology: 1=Cretaceous, 2=Triassic, 3=Permian, 4=Pennsylvanian, 5=Mississippian, 6=Devonian, 7=Silurian, 8=Ordovician, 9=Cambrian, 10=Paleozoic, 11=Proterozoic. Land use: 1=Residential, 2=Commercial, 3=Industrial, 4=Office, 5=Miscellaneous Employment, 6=Transportation, 7=Open Space, 8=Multiple Use, 9=Vacant. Size is reported here in km² although the data are recorded in m².

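Table 3. Standard Deviations: Sets of Geographic Objects vs. All Polygons in Pattern

| Pattern | Attribute | Set 1 | Set 2 | Set 3 | Overall |
|---|---|---|---|---|---|
| Quilt | Type | 0.00 | 0.46 | 0.00 | 0.93 |
| Quilt | Orientation | 57.83 | 57.20 | 44.96 | 52.96 |
| Quilt | Size | 449.05 | 188.02 | 95.18 | 312.16 |
| Quilt | Shape | 0.05 | 0.05 | 0.00 | 0.05 |
| Geology | Type | 2.77 | 0.00 | 0.00 | 2.15 |
| Geology | Orientation | 38.41 | 45.90 | 39.77 | 41.67 |
| Geology | Size | 6,785,494.00 | 183,614.00 | 1,203,355.00 | 4,217,584.00 |
| Geology | Shape | 0.22 | 0.14 | 0.21 | 0.23 |
| Surprise Land Use | Type | 2.81 | 0.00 | 0.00 | 3.03 |
| Surprise Land Use | Orientation | 53.14 | 50.69 | 58.54 | 55.76 |
| Surprise Land Use | Size | 30,331.00 | 5577.00 | 571.00 | 18,143.00 |
| Surprise Land Use | Shape | 0.21 | 0.23 | 0.16 | 0.19 |
| Tempe Land Use | Type | 2.14 | 0.00 | 2.72 | 2.74 |
| Tempe Land Use | Orientation | 54.18 | 61.77 | 56.59 | 56.72 |
| Tempe Land Use | Size | 567.00 | 161.00 | 682.00 | 635.00 |
| Tempe Land Use | Shape | 0.19 | 0.18 | 0.16 | 0.16 |

Bold type indicates instances where the standard deviations within the sets are less than the standard deviation values for the overall data sets.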
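The comparison that Table 3 makes can be expressed as a simple check of each set's standard deviation against the overall value. The sketch below applies that check to the quilt rows of Table 3 only; the helper name `tighter_than_overall` is hypothetical, and a strict "less than" test is assumed, following the table note.

```python
# Quilt rows of Table 3: ([Set 1, Set 2, Set 3], overall) standard deviations.
quilt_sd = {
    "Type": ([0.00, 0.46, 0.00], 0.93),
    "Orientation": ([57.83, 57.20, 44.96], 52.96),
    "Size": ([449.05, 188.02, 95.18], 312.16),
    "Shape": ([0.05, 0.05, 0.00], 0.05),
}


def tighter_than_overall(set_sds, overall_sd):
    """Flag the sets whose standard deviation is strictly below the overall value."""
    return [sd < overall_sd for sd in set_sds]


flags = {
    attribute: tighter_than_overall(set_sds, overall_sd)
    for attribute, (set_sds, overall_sd) in quilt_sd.items()
}
for attribute, result in flags.items():
    print(attribute, result)
```

For the quilt pattern, every set is more homogeneous than the full data set in type, while only Set 3 is more homogeneous in orientation and shape.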

Step three of the TOSS method was to calculate the spatial distributions of the sets of geographic objects for each pattern using nearest neighbor analysis, which required creating a point pattern subset for each polygon cluster (Fig. 5). Table 4 summarizes the results of nearest neighbor analysis on these point patterns for each set of geographic objects (Set 1, Set 2, Set 3). These are the results of performing TOSS pattern analysis on the four data sets. In theory, the R statistic ranges from 0 (completely clustered) through 1 (random) to 2.1491 (perfectly dispersed); in practice, any value significantly <1 is described as clustered and any value significantly >1 as dispersed. The statistical significance of these values is described with the Z score.
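As an illustration of the R statistic and its Z score, the sketch below assumes the standard Clark and Evans formulation with no edge correction: the expected mean distance is 0.5/sqrt(n/A) and its standard error is 0.26136/sqrt(n²/A). The paper does not state which variant was used, so the function names and the choice of formula are assumptions.

```python
import math


def nearest_neighbor_r(points, area):
    """Ratio of observed to expected mean nearest neighbor distance."""
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    r_observed = sum(nearest) / n
    r_expected = 0.5 / math.sqrt(n / area)  # expectation under complete spatial randomness
    return r_observed / r_expected


def z_from_r(r_ratio, n):
    """Z score of R; the study area cancels out of (r_obs - r_exp) / SE."""
    return (r_ratio - 1.0) * math.sqrt(n) * 0.5 / 0.26136


# Four points on a unit grid inside a 2 x 2 study area (hypothetical data).
grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(nearest_neighbor_r(grid, area=4.0))

# R and n as reported for Quilt Set 1 in Table 4; gives about 9.18.
print(round(z_from_r(1.582, 68), 2))
```

Under these assumptions, the R and n values reported for Quilt Set 1 reproduce its reported Z score, which suggests the uncorrected formula is at least consistent with the published results.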

Figure 5. The three point sets of geographic objects for each data set after performing TOSS pattern analysis. The points were derived from the polygon centroids according to their cluster classifications.
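Deriving a point pattern from polygons can be sketched as computing each polygon's centroid and grouping the centroids by cluster label. The example below uses the area-weighted (shoelace) centroid for simple polygons; the paper does not say how its centroids were computed, so this choice, the function name `polygon_centroid`, and the sample polygons and labels are all assumptions.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical polygons and their cluster (set) labels.
polygons = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(4, 0), (6, 0), (6, 2), (4, 2)],
    "C": [(0, 4), (3, 4), (0, 7)],
}
labels = {"A": 1, "B": 1, "C": 2}

# One point pattern per set of geographic objects.
points_by_set = {}
for polygon_id, vertices in polygons.items():
    points_by_set.setdefault(labels[polygon_id], []).append(polygon_centroid(vertices))
print(points_by_set)
```

Each list in `points_by_set` could then be passed to a point pattern statistic such as nearest neighbor analysis.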

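Table 4. TOSS Pattern Analysis Results

| Pattern | Set | n | R | Z_R |
|---|---|---|---|---|
| Quilt | Set 1 | 68 | 1.582 | 9.18 |
| Quilt | Set 2 | 65 | 1.414 | 6.38 |
| Quilt | Set 3 | 88 | 1.020 | 0.36 |
| Geology | Set 1 | 64 | 0.607 | −6.01 |
| Geology | Set 2 | 50 | 0.516 | −6.56 |
| Geology | Set 3 | 63 | 0.668 | −5.04 |
| Surprise Land Use | Set 1 | 122 | 0.426 | −12.14 |
| Surprise Land Use | Set 2 | 47 | 0.553 | −5.87 |
| Surprise Land Use | Set 3 | 18 | 0.365 | −16.38 |
| Tempe Land Use | Set 1 | 132 | 0.507 | −10.84 |
| Tempe Land Use | Set 2 | 76 | 0.483 | −8.62 |
| Tempe Land Use | Set 3 | 619 | 0.617 | −18.25 |

Bold type indicates that values were determined with a .05 level of significance.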

Pattern analysis comparison

Table 5 compares the results of the TOSS method to five global pattern analysis techniques. It is clear from this comparison that different approaches to analyzing pattern yield different results. TOSS results are similar to the nearest neighbor analysis, which is not surprising because nearest neighbor analysis is the basis for TOSS. Join-count could not be applied because, in the case of k-color maps, the distribution needs to be based on controlled boundaries such as census tracts or other municipal boundaries. This would allow for cases of the same attribute to be adjacent to one another. In the case of data-driven maps (where the attributes are fixed and the geographic boundaries are variable), if two areas with the same attributes were adjacent to one another, they would be merged into one area. Moran's I and Geary's C could not be applied to these data because the pattern attributes are nominal and these methods require quantitative data. Other studies have applied Moran's I to nominal data but the attribute measured is quantified by the size or intensity of the named attribute (the work by Overmars, de Koning, and Veldkamp (2003) cited earlier is an example of this type of application). Table 6 provides a qualitative comparison of several pattern analysis methods in terms of their data requirements, limitations, simplicity, and intuitiveness. While TOSS is only applicable to area-class maps, the table highlights how the TOSS method fills a niche that other methods do not—a pattern analysis method for discrete or continuous areas of mixed data types within the same data set that are based on controlled attributes and variable geographic space.

Table 5. Comparison of Results for TOSS and Existing Pattern Analysis Methods

| Method | Quilt | Geology | Surprise Land Use | Tempe Land Use |
|---|---|---|---|---|
| TOSS, Set 1 | Dispersed | Random | Clustered | Random |
| TOSS, Set 2 | Dispersed | Random | Random | Clustered |
| TOSS, Set 3 | Not significant | Random | Clustered | Random |
| Join Count* | N/A | N/A | N/A | N/A |
| Quadrat Analysis | Dispersed/Uniform (CV=0) | Random (CV=.84) | Clustered (CV=2.44) | Random to Clustered (CV=1.32) |
| NNA | Clustered (R=1.28) | Random (R=0.79) | Random (R=0.51) | Random (R=0.77) |
| Moran's I† | N/A | N/A | N/A | N/A |
| Geary's C† | N/A | N/A | N/A | N/A |
| Human subject on Clustered, Random, Dispersed | Clustered (32%) | Random (18%) | Random (27%) | Random (21%) |
| Human subject on TOSS | Clusters of no discernible pattern (19%) | None of the above (36%) | None of the above (25%) | None of the above (27%) |

NNA, nearest neighbor analysis.
* N/A because the data need to be binary or, for k-color maps, the polygon boundaries need to be controlled rather than variable.
† N/A because the attribute data are nominal and the technique cannot be applied globally.

Table 6. Qualitative Comparison of TOSS with Existing Pattern Analysis Methods

| Method | Geographic Attribute | Data Requirements | Limitations | Simplicity | Matches human intuition |
|---|---|---|---|---|---|
| TOSS | Contiguous or discrete polygons | Nominal, Ordinal, Interval, and Ratio | Applies only to area-class maps (no points) | Interpretation difficult | 100% No |
| Join Count | Contiguous polygons | Nominal, Ordinal, Interval, or Ratio | Applies best to binary data or fixed boundary polygons | Cumbersome with large datasets | N/A |
| Quadrat Analysis | Points or contiguous polygons | Nominal, Ordinal, Interval, or Ratio | Does not evaluate the distribution of the attributes | Dependent on study area definition | 50% Yes |
| NNA | Points or polygon centroids | Nominal, Ordinal, Interval, or Ratio | Does not evaluate the distribution of the attributes | Difficult to determine appropriate quadrat size | 100% Yes |
| Moran's I | Points or polygons | Interval or Ratio | Evaluates only the dispersion of attributes for interval or ratio data | Yes | N/A |
| Geary's C | Points or polygons | Interval or Ratio | Evaluates only the dispersion of attributes for interval or ratio data | Yes | N/A |

NNA, nearest neighbor analysis.

Human subject testing

A survey was administered to 234 undergraduate students enrolled in an introductory physical geography course at midsemester during the Fall of 2005. There was approximately a 50–50 split between male and female participants, so we do not expect to see any gender biases in our results. Student majors were also quite varied, with most of them in Engineering, Education, Physical Sciences, and Social Sciences. The strongest bias in our sample was that the majority of the students were young adults (nearly 74% were between the ages of 18 and 20 and 17% between the ages of 21 and 23), therefore representing a small subset of the adult population. This is not surprising but suggests that their life experiences (e.g., working in the job market) are limited. We tested their basic geography skills with three questions on spatial orientation and map reading. Only 11% answered all three questions correctly; the largest group (40%) answered only one question correctly. These results suggest that their skills in basic geography are poor despite 8 weeks of physical geography coursework.

Table 7 summarizes the results of the survey participants' perspectives on pattern terminology when asked to examine the four sample data sets used in this research. The survey evaluated how well pattern analysis terminology describes patterns. Results suggest that humans assign terminology similar to the terminology used for nearest neighbor analysis (4 out of 4 data sets were assigned the same term). The terminology we wanted to associate with TOSS did not match human perception of pattern. We, therefore, eliminated the step of assigning terminology to TOSS results.

Table 7. Summary of Human Subject Evaluation of Pattern Terminology

| Terminology | Rank | Quilt | Geology | Surprise Land Use | Tempe Land Use |
|---|---|---|---|---|---|
| Clustered, Random, Dispersed | Most popular answer | Clustered (56%) | Random (33%) | Random (36%) | Random (46%) |
| Clustered, Random, Dispersed | | None (20%) | Dispersed (31%) | Dispersed (25%) | Dispersed (32%) |
| Clustered, Random, Dispersed | | Dispersed (18%) | Clustered (20%) | Clustered (21%) | Clustered (15%) |
| Clustered, Random, Dispersed | | Random (3%) | None (14%) | None (15%) | None (5%) |
| Clustered, Random, Dispersed | Least popular answer | Incorrect (3%) | Incorrect (3%) | Incorrect (3%) | Incorrect (1%) |
| TOSS | Most popular answer | Clusters of regular pattern (36%) | None (38%) | No overall pattern (35%) | None (25%) |
| TOSS | | Overall regular pattern (29%) | No overall pattern (27%) | None (30%) | No overall pattern (15%) |
| TOSS | | Clusters of no discernible pattern (20%) | Clusters of no discernible pattern (16%) | Clusters of no discernible pattern (16%) | Clusters of regular pattern (11%) |
| TOSS | | No overall pattern (13%) | Clusters of regular pattern (15%) | Clusters of regular pattern (15%) | Clusters of no discernible pattern (6%) |
| TOSS | Least popular answer | None (<1%) | Overall regular pattern (4%) | Overall regular pattern (3%) | Overall regular pattern (<1%) |

Discussion

Effectiveness of TOSS

We evaluated whether groups of polygons that are similar with respect to attribute type, orientation, shape, and size produce patterns. The results of the TOSS method offer insight into the composition of the patterns and illustrate how the TOSS approach is an effective pattern analysis tool.

Each of the three sets of geographic objects generated from the Pennsylvania geology data set resulted in a statistically significant random pattern. Within the geology pattern, for example, all the polygons in Sets 2 and 3 are of the same type (Pennsylvanian and Mississippian, respectively) while Set 1 consists of the remaining polygon types. Set 1 (n=64) is composed of a mixture of geology types, orientation, the largest-sized areas, and fairly irregular shapes; Set 2 (n=50) is composed of only one geology type (type=4), orientation that averages 70° (where 90° represents due north), but has a high standard deviation, the smallest-sized areas, and the most compact shapes; and Set 3 (n=63) is composed of only one geology type (type=5), orientation, relatively small-sized areas (but larger than those in Set 2), and somewhat compact shapes (but less so than in Set 2). A geologist may want to investigate the fact that these rock layers are more similar to each other across four attributes than the other rock layers are to each other based on these findings.

In all but the Pennsylvania geology data set, TOSS results describe a combination of patterns for the three sets of geographic objects, providing different results than pattern analysis performed on the overall patterns. The interpretation can be further explained by reexamining the patterns of the three sets of geographic objects in each pattern (Fig. 5). The quilt pattern consists of two dispersed sets and one set where the pattern is not statistically significant based on the Z score. Set 3 represents a Type II statistical error because we expect that all three sets in the quilt pattern should be perfectly regular. The fact that the land use data sets contain some sets with clustered patterns is encouraging because the TOSS method aims to group similar polygons together. For example, with the two land use patterns, it is logical to assume that most residential lots of similar shape and size would be close to each other, just as commercial districts with larger lots would be clumped together as part of a city's land use plan.

Comparative analysis

Our comparative analysis suggests that the TOSS method provides researchers with a tool to analyze pattern that formerly did not exist. The method has the potential to provide insight into groups of polygons that are similar with respect to orientation, size, and shape. TOSS pattern analysis provides an approach to analyze data-driven patterns of discrete or continuous polygons and their attributes. The TOSS method can also be used on data sets with mixed data types, distinguishing it from the existing pattern analysis methods in our comparison. One drawback of TOSS is that the results can be difficult to interpret because sets of polygons each have their own descriptive pattern. While this has the potential to provide insight because a group of similar polygons is distinct, the method is not as straightforward as other approaches.

Our human subject testing revealed some negative results. We attempted to assign terminology to describe the overall pattern of TOSS (e.g., sets of clustered polygons or sets of dispersed polygons). The human subject testing suggested that our current phrases do not adequately describe pattern. We are not completely discouraged by this outcome for several reasons. First, our human subjects are not as experienced with analyzing geographic patterns as professionals in the field may be. While conducting the survey, we provided only a cursory explanation of the pattern terminology. It was left to the participants, drawing on their own life experiences, to assign the terms and phrases to the graphic images. As remedies, we can expand our participant pool to include more experienced professionals and provide more detailed explanations of the terms and phrases. Second, our four sample data sets did not represent the range of TOSS results that are possible. When the data sets were identified, it was our intention to capture a wide range of possibilities, from patterns that exhibit regularity (e.g., the quilt) to those with less regularity (e.g., land use). Based on the statistical results and the human subject assessment, this was not the case.

Finally, our evaluation did not allow participants to write down their perception of the patterns they were viewing. If participants had been permitted to provide their own description of pattern, we might have been able to capture phrases more appropriate to the patterns displayed. Based on the above considerations, we believe that a follow-up assessment is warranted. With a more informed audience and an expanded survey we may be able to determine better phrases for TOSS pattern analysis.

TOSS considerations

The TOSS method relies on equally weighted values of each polygon's type, orientation, size, and shape, attributes chosen for their visual prominence and ease of calculation. We could have instead chosen different attributes, used only 3 attributes or even 10, or weighted the polygon attributes to reflect their known significance or surmised effect (if any) on a pattern's formation. For our research each attribute was given equal weight during the Gower standardization phase of the TOSS method, a setting that can be easily adjusted. For example, a researcher may hypothesize or know that size has a significant influence on an overall pattern and should therefore be the focus of pattern analysis and weighted accordingly. The omission of particular attributes may also be warranted. For example, computing Pearson's product-moment correlation coefficient may show that the measurements for two variables are correlated. If the assumption is that the four TOSS attributes are independent, then this multicollinearity may cause an unintentional weighting of one or more of the TOSS attributes. For example, the size and shape attributes could be correlated either because of the measurements chosen to generate the data or because of inherent processes that contribute to the formation of shape itself. A likely cause of any correlation is that the size measurement for a polygon was its area and the shape measurement was a perimeter-to-area ratio, both of which include a polygon's area. A less obvious but also plausible cause is that, regardless of the measurements used to calculate a polygon's shape, intrinsic correlations may exist in all polygon pattern data between a particular shape and its size. Determining this, however, requires a thorough investigation comparing multiple shape indices and a polygon's area, which is beyond the scope of this research. If such a study concluded that a polygon's shape and size are inherently correlated, then both attributes should be included in pattern analysis and would essentially create a natural weighting factor for attributes that contribute to the structure of an overall pattern.
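A correlation check between size and shape of the kind described above can be sketched as follows. The compactness index 4πA/P² stands in for the shape measurement purely for illustration; the exact perimeter-to-area index used in this research is not restated here, and the rectangles are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compactness(area, perimeter):
    """Assumed shape index: 1.0 for a circle, smaller for elongated shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


# Hypothetical rectangles (width, height) that grow by elongation.
rectangles = [(10, 10), (20, 10), (40, 10), (80, 10)]
areas = [w * h for w, h in rectangles]
shapes = [compactness(w * h, 2 * (w + h)) for w, h in rectangles]

r = pearson_r(areas, shapes)
print(round(r, 3))
```

In this contrived sample, size and shape are strongly negatively correlated because larger rectangles are also more elongated; with real data, a coefficient near zero would support treating the two attributes as independent.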

Choosing to measure a polygon's shape with the perimeter-to-area ratio compactness index certainly affected what aspects of the polygons were considered (i.e., their perimeters and areas) and indirectly may have influenced the assigned set for a particular polygon during cluster analysis. Using an alternative multiple-parameter shape index, such as the Wentz trivariate approach (Wentz 2000), may have classified polygons based on more advanced mathematical measures, which may have grouped the polygons differently. Zhao and Stough (2005) introduced an elongation index that measures shape similarity, including fragmented and perforated shapes, offering another possible approach that could be extended for use in pattern analysis. The generation of the sets of geographic objects was a primary step of the TOSS method, so any decision made regarding the TOSS attribute measurements has an effect on the input values for the cluster analysis and, consequently, the end results.

The number of sets of geographic objects and the number of objects within each set generated during the cluster analysis phase of the TOSS method also affect the results. Theoretically, the more clusters generated, the more similar the polygons would be within each cluster until each polygon was alone in its own cluster. Although a polygon is more similar to the remaining polygons in its set than it is to the overall collection of polygons, limiting the clusters to three sets of geographic objects per pattern caused several seemingly different polygons to be grouped into the same cluster or set and may not have been enough to distinguish polygons that truly belong together from those that do not. The question remains as to what number of clusters is best, which may differ for each data set and cannot be standardized in a method such as the TOSS method. Another problem with creating sets of similar polygons arises during the use of the nearest neighbor analysis statistic. The results from nearest neighbor analysis become less meaningful when the sample size is too small. Therefore, sets must contain a large enough sample size to be evaluated and interpreted.
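The trade-off between the number of clusters and within-cluster similarity can be illustrated with a naive Ward-style agglomeration. This sketch is a simplification: it works on plain numeric vectors with squared Euclidean distance rather than on Gower-standardized TOSS attributes, and the function names and sample values are hypothetical.

```python
def centroid(cluster):
    """Component-wise mean of a list of equal-length tuples."""
    return [sum(values) / len(cluster) for values in zip(*cluster)]


def merge_cost(a, b):
    """Ward's criterion: increase in within-cluster sum of squares if a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(points, k):
    """Merge the cheapest pair of clusters repeatedly until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def within_ss(clusters):
    """Total within-cluster sum of squared deviations from each cluster centroid."""
    total = 0.0
    for cluster in clusters:
        center = criterion_center = centroid(cluster)
        total += sum(
            sum((x - m) ** 2 for x, m in zip(point, center)) for point in cluster
        )
    return total


# Hypothetical one-attribute data with three obvious groups.
data = [(1.0,), (2.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
for k in range(1, len(data) + 1):
    print(k, round(within_ss(ward_clusters(data, k)), 3))
```

The within-cluster sum of squares shrinks as k grows and reaches zero when every object is its own cluster, so a smaller value alone cannot identify the best number of sets.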

One more clustering analysis decision that affected the results of the TOSS method was the selection of clustering methods. The decision to ultimately perform point pattern analysis on the sets of geographic objects dictated that a reasonable number of polygons had to be within each set, a number that would allow for a spatial distribution description. Again, this may have caused polygons with drastically different TOSS attributes to still be clustered into the same set of geographic objects. Devising a pattern description scheme that allows for a small number of polygons to constitute a pattern would be necessary for there to be relevance in choosing an alternative clustering method, such as the average linkage method.

The TOSS method relies on an existing method of pattern analysis—in our case we used nearest neighbor analysis. An inherent characteristic of nearest neighbor analysis could have dictated the pattern descriptions, regardless of the patterns being separated into sets of geographic objects. Using a different method of analyzing dispersion could have made a difference in the outcome.

Conclusions

The TOSS method's most obvious distinguishing element is that it describes pattern based on groups of similar objects for area-class maps. These sets of objects are formed statistically based on attribute type, orientation, size, and shape. The Gower (1971) coefficient of similarity was introduced to pattern analysis as a means to group similar objects and to be able to do so using data of different measurement scales and mixed data types. The intention behind our research was to determine whether groups of similar objects from area-class maps exhibit a pattern. Our method is similar to pattern analysis methods that measure spatial heterogeneity in a pattern (e.g., local Moran), but the TOSS method assumes that these differences exist as a result of a combination of the geometric characteristics of areas as well as nonspatial attributes.

Future research objectives are aimed at testing the TOSS method's flexibility so that it may be applied to, and be effective on, a variety of spatial patterns in various disciplines. This is especially pertinent to the number of clusters or sets of geographic objects that are generated during the cluster analysis phase. Although three clusters is not a firm rule of the TOSS method, it was the chosen number for our research and we suggest that future research be more experimental in determining this number. Another decision open for future researchers is the importance given to particular polygon attributes, taking advantage of the weighting options during the Gower standardization phase. Weights could be experimental or based on known facts in a research field. The measurements chosen to calculate the attributes of the TOSS method are also variable. Results of the quilt pattern were the most promising in terms of the TOSS method's effectiveness, implying that the simple attribute measurements may be good for simple polygons; more complex measurements of orientation, size, and shape may be necessary for patterns of more complex and irregular polygons. Even the attributes chosen for the TOSS method are flexible, meaning that in different disciplines different combinations of nominal and interval/ratio data may be combined to describe polygonal patterns. This would, of course, change the acronym, but the important and unique aspects of the new pattern analysis method would be preserved.

Note

  1. We used ESRI's ArcInfo Workstation (Version 8.1) with the ZONALGEOMETRY command and the ELLIPSE parameter. This calculation measures the angle of a polygon's primary axis.
