Population densities are a key variable in ecology, but their measurement is complicated by problems that appear trivial only at first glance. Species are not distributed uniformly over their range, and the spatial distribution of their abundances or densities does not usually follow a simple model, such as a Gaussian curve around their geographical centre (Blackburn et al., 1999; Sagarin & Gaines, 2002a,b; Sagarin et al., 2006). This spatial heterogeneity has two consequences. First, mean population densities over a (larger) area do not satisfy all aspects of density-related research questions, for example if they relate to interactions of individuals where empty areas (such as water bodies for a terrestrial organism) are irrelevant (see Gaston et al., 1999, and references therein). Second, estimates of population densities from compilations of samples, particularly those taken from the published literature or otherwise based on ‘non-standardized’ methodology, might create biases if the location of sample areas has not been chosen independently of population densities. It is probably the norm rather than the exception that study sites are chosen non-randomly, as few researchers would decide to conduct a field study at a site where their target organism does not occur (or occurs only in very low numbers). If densities are calculated across compilations of samples (for the purpose of region-wide abundance estimates), biases due to non-randomly chosen sample sites must be expected to increase with decreasing sample area, as habitat heterogeneity (and hence unsuitable habitats) is increasingly represented as different (potential) sample plots, rather than being included in every sample. Pautasso & Gaston (2006) have shown that world-wide bird census data are consistent with this idea. The consequence is a lower-than-proportional increase of abundances with increasing areas, or a negative density–area relationship.
In an attempt to precisely identify the most extreme bias resulting from the exclusion of sample areas where a species does not occur at all, Pautasso & Weisberg (2008) presented data on a complete census of violets and their spatial distribution in 100 m² of English grassland (at eight replicates), which we call ‘perfect data sets’ for the purposes of such a study. By deliberately omitting areas without violets (‘zero-cells’) at different spatial resolutions (i.e. grid cell sizes), they could show that this creates a negative density–area relationship because at finer resolution a larger proportion of areas formed by empty grid cells could be cut out (note that there is a direct analogy to measuring range occupancies of species at different resolutions or ‘grains’). We applaud this approach, although we suspect that the studied system represents a ‘microcosm’ where habitat heterogeneity is possibly much lower than in larger systems (despite a large number of grid cells). Pautasso & Weisberg (2008) also analysed data for human population sizes in administrative units of various countries and regions, arguing that larger units contain more ‘zero-cells’ for the reasons pointed out above. However, administrative boundaries are probably delineated in order to precisely produce this pattern, i.e. to avoid small units with (almost) no-one living there. The authors recognized this problem and conceded that their data set on human population density is not ‘perfect’. In the present contribution we repeat Pautasso & Weisberg's violet analyses on gridded data of human population densities for Central Europe and the eastern United States. The grid pattern avoids issues of density variation associated with administrative units. In addition, we present spatial simulations investigating the link between the heterogeneity of densities (i.e. their spatial autocorrelation) and the resulting density–area relationship.
The empirical analysis of population densities is based on the LandScan 2005 Global Population Data Base (http://www.ornl.gov/sci/landscan/), which provides global data in a 30″ × 30″ (arc-second) geographical grid. The data set is based on census data where available, but also relies on interpolation techniques in combination with remote sensing data (such as roads and night-time illumination) to fill gaps in the census data. By restricting ourselves to regions where censuses are probably most reliable and interpolation assumptions met more often than outside the ‘developed world’ (e.g. availability of electricity, road access), we consider these data ‘nearly perfect’. We transformed data to an equal-area grid (using bilinear interpolation; one square cell ≈ 0.92 km²) and cut out 526 × 1024 cell rectangles in Europe and eastern America (not containing any sea areas or large water bodies). We calculated population densities of these rectangles after omitting empty cells, exploring different resolutions (i.e. grid cell sizes) in log2 classes (i.e. repeated doubling of cell sizes) rather than using log10 classes (Pautasso & Weisberg, 2008; this allowed a finer resolution of plots). As we did not apply fitted statistical models, we carried out calculations on non-transformed data. All matrix calculations were carried out in MATLAB (version 7.0).
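The zeros-excluded density calculation described above can be illustrated with a minimal Python sketch. It is not the MATLAB code used for the analysis: the function names, the toy grid and the choice to double the cell side length at each step (so that cell area quadruples) are assumptions made for illustration only.

```python
def aggregate(grid, factor):
    """Sum counts in non-overlapping factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    rows = len(grid) // factor
    cols = len(grid[0]) // factor
    return [
        [
            sum(grid[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def density_of_occupied(grid, cell_area):
    """Return (density over non-empty cells only, occupancy)."""
    cells = [value for row in grid for value in row]
    occupied = [value for value in cells if value > 0]
    if not occupied:
        return 0.0, 0.0
    density = sum(occupied) / (len(occupied) * cell_area)
    occupancy = len(occupied) / len(cells)
    return density, occupancy


def density_area_curve(counts, base_cell_area):
    """Zeros-excluded density for cell sides of 1, 2, 4, ... base cells.

    Assumption: 'doubling of cell sizes' is read as doubling the side length.
    """
    curve = []
    factor = 1
    while factor <= min(len(counts), len(counts[0])):
        coarse = aggregate(counts, factor)
        area = base_cell_area * factor * factor
        density, occupancy = density_of_occupied(coarse, area)
        curve.append((area, density, occupancy))
        factor *= 2
    return curve


# Hypothetical 4 x 4 grid of head counts; everyone lives in one corner.
toy_counts = [
    [8, 4, 0, 0],
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
toy_curve = density_area_curve(toy_counts, base_cell_area=1.0)
```

In this toy grid the zeros-excluded density falls as cells are coarsened, and it reaches the overall mean density once a single cell covers the whole grid and no empty cells remain to be omitted.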
Density data varying only in spatial structure were simulated with the geostatistical software package gstat (Pebesma & Wesseling, 1998; Pebesma, 2004). Random fields were generated by unconditional Gaussian simulation with a spherical variogram model and a standard normal population (μ = 0, σ² (sill) = 1). The simulation was repeated using different variogram ranges (1, 5, 10, 15, 20, 25, 30, 50 km). In other words, we randomly assigned log-transformed population densities, drawn from a normal distribution, to grid cells, constrained only by enforcing spatial autocorrelations of different strength. The simulations were carried out for a grid of the same shape and extent as those applied to the empirical population density data. Values in each random field were then exponentiated to gain lognormal distributions, and values below 1 were set to zero. Our inherent assumption of (truncated) lognormally distributed population densities is supported by the findings of Decker et al. (2007), who reported that human settlement sizes over their whole range of size classes are better described by lognormally distributed values than by ‘Zipf's law’ (a power function), which fitted only the ‘tail’ of larger agglomerations well.
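The simulation steps above can be sketched in plain Python for a very small grid. This is an illustration under stated assumptions, not the gstat procedure: the sketch draws the field by Cholesky factorization of the covariance matrix implied by a spherical variogram without nugget, which is only practical for a few dozen cells. All function names, the grid size, the seed and the small diagonal jitter are hypothetical choices.

```python
import math
import random


def spherical_covariance(h, variogram_range, sill=1.0):
    """Covariance implied by a spherical variogram without nugget."""
    if h >= variogram_range:
        return 0.0
    ratio = h / variogram_range
    return sill * (1.0 - 1.5 * ratio + 0.5 * ratio ** 3)


def cholesky(matrix):
    """Lower-triangular factor L such that L * L^T equals matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_density_field(n_rows, n_cols, variogram_range,
                           cell_size=1.0, seed=0):
    """Truncated lognormal density field with spherical autocorrelation."""
    rng = random.Random(seed)
    coords = [(r * cell_size, c * cell_size)
              for r in range(n_rows) for c in range(n_cols)]
    n = len(coords)
    cov = [[spherical_covariance(math.dist(coords[i], coords[j]),
                                 variogram_range)
            for j in range(n)] for i in range(n)]
    for i in range(n):
        cov[i][i] += 1e-9  # small jitter for numerical stability (assumption)
    lower = cholesky(cov)
    # Correlated standard normal values = L times independent normal draws.
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    log_density = [sum(lower[i][k] * white[k] for k in range(i + 1))
                   for i in range(n)]
    # Exponentiate to a lognormal field, then set values below 1 to zero.
    density = [math.exp(value) for value in log_density]
    density = [d if d >= 1.0 else 0.0 for d in density]
    return [density[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Hypothetical 6 x 6 field with a range of three cell widths.
example_field = simulate_density_field(6, 6, variogram_range=3.0, seed=1)
```

Because the underlying normal field has mean zero, roughly half of the cells fall below 1 after exponentiation and become zero-cells; the variogram range controls how strongly those empty cells cluster in space.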
Figure 1 shows our plots of population density of non-empty areas at different resolutions (grid cell sizes) for human populations of central Europe and the eastern USA. The decline of density with increasing grid size confirms the potential error caused by ignoring zero-cells, as suggested by Pautasso & Weisberg (2008). Figure 1 also shows that the relationship between density and grid cell size is not linear. Zeros-excluded and zeros-included density estimates are linked by a hyperbolic relationship in which the constant of proportionality is the occupancy of the sample set (i.e. the proportion of non-empty cells). This problem was already discussed in an ecological context decades ago (Aitchison, 1955; Pennington, 1983; Wright, 1991). Above a certain grid cell size there were no more empty cells and density measurements became constant. Effectively, the constant density value above a certain grid cell size reflects the achievement of a density representative for a given environment, e.g. an ecoregion or country. The density–area pattern is very similar for European and US data, although population densities were generally lower, at all spatial resolutions, in the USA. Negative density–area relationships (note that Fig. 1 has a different design from equivalent data displayed in Pautasso & Weisberg, 2008) develop because the area covered by non-empty grid cells increases with increasing cell size (coarsening resolution). Density–area functions display a much steeper slope in Europe than in the USA, reflecting a different spatial structure of data and population densities.
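The link between zeros-excluded and zeros-included estimates can be written out explicitly. The notation below is introduced here for illustration and is not taken from the cited sources: N is the total population of the region, a the area of one grid cell, n the total number of cells, n_occ the number of non-empty cells and p the occupancy, i.e. the proportion of non-empty cells.

```latex
D_{\mathrm{incl}} = \frac{N}{n\,a}, \qquad
D_{\mathrm{excl}} = \frac{N}{n_{\mathrm{occ}}\,a}, \qquad
p = \frac{n_{\mathrm{occ}}}{n}
\quad\Longrightarrow\quad
D_{\mathrm{incl}} = p\,D_{\mathrm{excl}}, \qquad
D_{\mathrm{excl}} = \frac{D_{\mathrm{incl}}}{p}.
```

For a fixed region the zeros-included density does not depend on cell size, so the zeros-excluded density varies as 1/p. Once cells are large enough that p = 1, the two estimates coincide, which corresponds to the constant density reached at coarse resolutions.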
Figure 2 shows the results of simulating different strengths of autocorrelation, or spatial ‘clumping’. With increasing variogram range (i.e. increasing autocorrelation), the effects of ignoring zero-cells became stronger and occurred at larger grid cell sizes (i.e. coarser spatial resolutions). This indicates that the risk of generating erroneous density data is higher in regions with strong contrasts in population density than in areas with relatively small changes in density. On visual inspection, mapped simulation results closely resembled empirical patterns when a measured variogram range of c. 13 km was used.
Our results demonstrate that research on the spatial pattern of population densities must proceed in at least two directions. For practical research methodology, we need to know how large errors of population measurement are when compiling non-standardized data based on sample areas of various sizes and unknown biases of location (Scherner, 1981; Gaston et al., 1999). As shown above, density–area relationships are linked to the spatial structure (i.e. autocorrelation) of populations. Further research might provide quantitative relationships between variogram parameters (i.e. range or lag distance) and the ‘zero-cell’ effects discussed above (e.g. at what spatial scale no bias of mean densities may be expected). Even rough estimates of the latter may then provide some information on the question of what data are suitable to include in measures of population density. Webster & Oliver (2001, pp. 85–103) discuss how to design field studies to estimate variogram parameters. They recommend a minimum sample size of 100, and preferably 150, plots for precise estimates. For the purpose of pre-study investigation, this is still in a realistic range of effort for large sampling schemes such as the regular British or American bird surveys (i.e. the British Breeding Bird Survey, http://www.bto.org/bbs/ and the North American Breeding Bird Survey, http://www.mbr-pwrc.usgs.gov/bbs/) but probably not for many less well-supported investigations. Without such information, mean densities across large, poorly surveyed areas must be used with the utmost care (Pautasso & Gaston, 2006). Depending on the research topic, their use may be avoidable (see, e.g., Beck et al., 2006, for meta-analyses of local results in a spatial context).
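As an illustration of what estimating variogram parameters from sample plots involves, the following Python sketch computes an empirical semivariogram with the classical (Matheron) estimator, i.e. half the mean squared difference between plot values within each distance class. It is a generic sketch rather than the procedure recommended by Webster & Oliver (2001); the function name, the binning scheme and the three-plot transect are hypothetical, and fitting a variogram model to obtain the range is not shown.

```python
import math
from collections import defaultdict


def empirical_semivariogram(plots, lag_width, max_lag):
    """Classical semivariance per distance class.

    plots: iterable of (x, y, value) tuples, one per sample plot.
    Returns {lag bin centre: (semivariance, number of pairs)}.
    """
    plots = list(plots)
    squared_diffs = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(plots)):
        xi, yi, zi = plots[i]
        for j in range(i + 1, len(plots)):
            xj, yj, zj = plots[j]
            h = math.hypot(xi - xj, yi - yj)
            if h > max_lag:
                continue
            # Bin k covers distances in [k * lag_width, (k + 1) * lag_width).
            bin_index = int(h // lag_width)
            squared_diffs[bin_index] += (zi - zj) ** 2
            pair_counts[bin_index] += 1
    return {
        (b + 0.5) * lag_width: (squared_diffs[b] / (2 * pair_counts[b]),
                                pair_counts[b])
        for b in sorted(pair_counts)
    }


# Hypothetical transect of three plots with densities 1, 2 and 4.
toy_plots = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
toy_variogram = empirical_semivariogram(toy_plots, lag_width=1.5, max_lag=3.0)
```

With real data, the distance at which the semivariance levels off gives a rough indication of the variogram range. With only a few plots each distance class contains very few pairs, which is one reason why sample sizes of the order recommended above are needed.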
Process-based models of population growth, migration, resource use and other behavioural traits, depending on habitat suitability, are commonly applied in ecology to understand the shaping of distribution patterns of species (e.g. Hubbell, 2001; Freckleton et al., 2005, 2006). Applying such modelling approaches to human systems may be especially fruitful since better empirical data on human population densities are available (as exemplified by the data used here) for testing spatially explicit models. Moreover, we may have deeper insight into the proximate causes of individual behaviour in human beings. Furthering this approach in order to understand the mechanisms leading to observed patterns of spatial autocorrelation of population densities would help us to cope with the methodological problems outlined above, as well as aiding our general understanding of ecological processes in a spatial context.