We used recorded occurrences of bird and mammal species in the USA and Australia, obtained from the Global Biodiversity Information Facility (GBIF) database as of November 2011 (GBIF 2008). To control the geographic accuracy of the records used in the analyses, we used only records with three or more decimal digits in both the longitude and latitude coordinates. Collection and museum data, as well as data-providing portals such as GBIF, include several types of bias arising from different sampling errors (Guralnick et al. 2007; Kadmon et al. 2009; Kent and Carmel 2011). A rigorous test of the effect of inherent biases in GBIF data (Kent and Carmel 2011) concluded that when many species are analyzed together across large spatial extents, these biases have no significant effect on multivariate analyses of species composition, such as the one conducted here. Several recent studies have also concluded that occurrence records may be used in spatial ecological studies under certain constraints (Graham et al. 2007; Loiselle et al. 2008), which offset the inherent biases in collection data. An additional measure that can reduce the effect of taxonomic bias in GBIF data is the use of external taxonomic lists (Guralnick et al. 2007). Here, we filtered our data to retain only bird species that breed in the respective regions, using a species list from the North American Breeding Bird Survey and a similar list provided by an expert ornithologist from Australia (J. Szabo, pers. comm.). We omitted all bat species from the mammal datasets, as we assumed that their ecological requirements differ substantially from those of terrestrial mammals.
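The record-level filters described above (coordinate precision, breeding-bird list, exclusion of bats) can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes each record is a dictionary whose keys follow the Darwin Core terms used in GBIF downloads (`decimalLatitude`, `decimalLongitude`, `class`, `order`, `species`), that coordinates are kept as the original strings (so trailing zeros count as recorded digits), and the helper names `decimal_places` and `filter_records` are hypothetical.

```python
from decimal import Decimal


def decimal_places(coord: str) -> int:
    """Number of digits after the decimal point in a coordinate string.

    Working on the string (not a float) preserves recorded trailing zeros,
    e.g. "-74.0060" has 4 decimal places.
    """
    exponent = Decimal(coord.strip()).as_tuple().exponent
    return max(0, -exponent)


def filter_records(records, breeding_species=None, min_places=3):
    """Keep records that pass the precision and taxonomic filters.

    Assumed (hypothetical) rules, mirroring the text:
    - both coordinates need at least `min_places` decimal digits;
    - bats (order Chiroptera) are dropped;
    - birds (class Aves) are kept only if listed in `breeding_species`.
    """
    kept = []
    for rec in records:
        if decimal_places(rec["decimalLatitude"]) < min_places:
            continue
        if decimal_places(rec["decimalLongitude"]) < min_places:
            continue
        if rec.get("order") == "Chiroptera":
            continue
        if (breeding_species is not None and rec.get("class") == "Aves"
                and rec["species"] not in breeding_species):
            continue
        kept.append(rec)
    return kept
```

In this sketch, a list of breeding species for each region would be passed as `breeding_species`; the precision threshold is a parameter so that the three-digit rule from the text is explicit.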
In addition to occurrence records, we compiled GIS layers of environmental variables for the two regions, using remotely sensed data (land use/land cover, LULC, from http://glcf.umiacs.umd.edu/ and NDVI from http://www.fao.org/geonetwork) and fine-scale global climatic and elevation variables available from WorldClim (Hijmans et al. 2005). All environmental variables (Table 1) were available at a spatial resolution of 0.008330° of longitude (equivalent to ~1 km² at the equator) or finer. To record the values of the different variables at various grain sizes, we resampled the variable layers, using the mean value for continuous variables and the majority value for categorical variables. Scale was altered quantitatively by changing grain and extent simultaneously, in order to maximize the explanatory power of the environmental variables (for details, see Appendix). Spatial scale, as defined here, consists of two components, extent and grain, following Wiens (1989). Extent is the area covered by a delineation of all sampling locations in a given study area. Each extent consists of a basic sampling grid, and the size of a single cell in a given sampling grid is the grain size. When moving from the finest scale to the next coarser scale, we doubled the side length of each grid cell, and we repeated this doubling 10 times. We created an ArcGIS Python script that generated sets of square sampling grids of extent E and grain g at each scale (Table 2). For each sampling grid, comprising 32 × 32 pixels (1024 cells in total), the script returned the number of pixels with species occurrences and the number of species in the grid. To meet the requirements of multivariate analyses, we set a threshold on the amount of data in each sampling grid. A sampling grid was included in the analyses if it met two conditions.
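The resampling rule (mean for continuous layers, majority for categorical layers) and the doubling of grain with a fixed 32 × 32 sampling grid can be illustrated with a small NumPy sketch. It is not the authors' ArcGIS script: it assumes the layers are already co-registered 2-D arrays, aggregates by non-overlapping blocks, drops incomplete edge blocks, and breaks majority ties toward the smallest class code (an arbitrary choice). The function names and the `base_grain` parameter are hypothetical; the actual grain and extent values are those in Table 2.

```python
import numpy as np


def _blocks(layer, factor):
    """Split a 2-D raster into non-overlapping factor x factor blocks.

    Rows/columns that do not fill a complete block are dropped.
    Returns an array of shape (block_rows, block_cols, factor * factor).
    """
    rows = layer.shape[0] // factor * factor
    cols = layer.shape[1] // factor * factor
    b = layer[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return b.swapaxes(1, 2).reshape(rows // factor, cols // factor, factor * factor)


def aggregate_mean(layer, factor=2):
    """Coarsen a continuous layer by averaging each block."""
    return _blocks(np.asarray(layer, dtype=float), factor).mean(axis=2)


def aggregate_majority(layer, factor=2):
    """Coarsen a categorical layer by taking the most frequent class per block.

    np.unique returns sorted classes and argmax picks the first maximum,
    so ties go to the smallest class code.
    """
    blocks = _blocks(np.asarray(layer), factor)
    out = np.empty(blocks.shape[:2], dtype=blocks.dtype)
    for i in range(blocks.shape[0]):
        for j in range(blocks.shape[1]):
            values, counts = np.unique(blocks[i, j], return_counts=True)
            out[i, j] = values[np.argmax(counts)]
    return out


def scale_series(base_grain, n_doublings, cells_per_side=32):
    """(grain, extent side length) pairs: grain doubles at each step and the
    extent of a sampling grid is always cells_per_side grains wide."""
    return [(base_grain * 2 ** k, cells_per_side * base_grain * 2 ** k)
            for k in range(n_doublings + 1)]
```

Repeatedly applying the factor-2 aggregation reproduces the doubling of cell side length from one scale to the next, while `scale_series` makes explicit that, with a fixed 32 × 32 grid, doubling the grain also doubles the side of the extent.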
First, it had to include at least 30 grid cells with nonsingleton occurrences (more than one occurrence record per cell); second, it had to contain data on at least six different species. For each sampling grid that met these thresholds, and at each scale, we ran a partial Canonical Correspondence Analysis (pCCA) using the vegan package (Oksanen et al. 2008) in the R statistical software, version 2.12 (R Core Team 2010). For pCCA, we divided the environmental variables into four groups: climate (mean annual temperature, temperature seasonality, mean annual precipitation, and precipitation seasonality), topography (elevation and elevation range), land use–land cover (distance to urban areas, population density, and percentages of agriculture, forest, grassland, urban, surface water, and wetland areas), and NDVI. We then applied pCCA to each group separately (ter Braak 1986; ter Braak and Verdonschot 1995; Cushman and McGarigal 2002; Legendre et al. 2005). To calculate the percentage of variance in species composition explained by each variable and each group, we divided the inertia of each group in each sampling grid by the total inertia of that sampling grid and multiplied the result by 100. Total inertia expresses the amount of variance in the species data within a sampling grid (ter Braak 1986), and individual inertia is the amount of variance related solely to the specific variable or group of variables (the exclusive fraction), after accounting for the variance shared with other variables (the shared fraction) and the interaction between the different variables (Cushman and McGarigal 2002). Owing to data limitations, we omitted the finest scale from the analysis of mammals in Australia and the two finest scales from the analysis of mammals in the contiguous USA (Tables 2 and 3).
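The inertia-based percentages can be made concrete with a compact NumPy sketch of (partial) CCA. This is a swapped-in illustration, not the vegan implementation the analyses used: it follows the textbook formulation in which total inertia is the sum of squared chi-square residuals of the site × species table, constrained inertia is the part of those residuals fitted by a weighted least-squares projection onto the explanatory variables, and the exclusive fraction of a group X given covariables Z is the inertia fitted by [X, Z] minus that fitted by Z alone. All function names are hypothetical.

```python
import numpy as np


def _ca_residuals(Y):
    """Chi-square standardized residuals decomposed by CA/CCA, plus site weights."""
    P = Y / Y.sum()
    r = P.sum(axis=1)                 # site (row) weights, sum to 1
    c = P.sum(axis=0)                 # species (column) weights, sum to 1
    expected = np.outer(r, c)
    return (P - expected) / np.sqrt(expected), r


def _weighted_design(X, r):
    """Centre explanatory variables with site weights, scale rows by sqrt(weight)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return (X - r @ X) * np.sqrt(r)[:, None]


def _fitted_inertia(Q, Xw):
    """Sum of squares of Q after least-squares projection onto the columns of Xw."""
    coef, *_ = np.linalg.lstsq(Xw, Q, rcond=None)
    return float(np.sum((Xw @ coef) ** 2))


def total_inertia(Y):
    """Total inertia = chi-square statistic of the table / grand total."""
    Q, _ = _ca_residuals(np.asarray(Y, dtype=float))
    return float(np.sum(Q ** 2))


def partial_inertia(Y, X, Z=None):
    """Inertia explained by X after partialling out covariables Z (exclusive fraction)."""
    Q, r = _ca_residuals(np.asarray(Y, dtype=float))
    Xw = _weighted_design(X, r)
    if Z is None:
        return _fitted_inertia(Q, Xw)
    Zw = _weighted_design(Z, r)
    return _fitted_inertia(Q, np.hstack([Xw, Zw])) - _fitted_inertia(Q, Zw)


def percent_explained(Y, X, Z=None):
    """Explained inertia of X (given Z) as a percentage of total inertia."""
    return 100.0 * partial_inertia(Y, X, Z) / total_inertia(Y)
```

For one sampling grid, `Y` would be the cell × species occurrence table and, for example, `percent_explained(Y, climate, other_groups)` the exclusive climate fraction. In vegan, the corresponding model is usually written with a `Condition()` term in the `cca()` formula.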
Effect of variability in the explanatory variables
The explanatory power of the environmental variables might be correlated with the range of conditions in the sampling grids; that is, the effect per unit change might be constant. In such cases, differences among explanatory variables in the amount of variance in species composition they explain can be attributed to the range of variable values (see, e.g., Steinitz et al. 2006). Alternatively, the strength of the relationship may be determined by the ecological affinity of species to the environmental variables, so that even variables with little variability may have strong explanatory power. We examined the per-unit effect of the climate and LULC variable groups (which consistently explained most of the variance in species composition) by calculating the coefficient of variation (CV) of each variable in each sampling grid used in the analyses. We then fitted linear regression models to test how much of the explained variance in species composition, for each species group at each spatial scale, could be attributed to CV. Linear regressions were carried out using R version 2.15.2 (R Core Team 2010). CV was calculated as the ratio of the standard deviation of an environmental variable in the sampling grid to the mean value of that variable in that grid. This was possible because the smallest sampling grid cell was coarser than the resolution of the original environmental data layers. A strong correspondence between the amount of explained variance in species composition and the CV of an environmental variable would suggest that the per-unit effect is constant and that the differences result from the amount of variability in the explanatory variable. In contrast, a large amount of explained variance in species composition attributed to environmental variables with low variability would indicate a strong ecological affinity.
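The two steps of this test, computing a CV per variable per sampling grid and regressing explained variance on CV, can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the R code used in the study. It assumes the sample standard deviation (`ddof=1`), and the names `coefficient_of_variation` and `fit_line` are hypothetical. CV is only meaningful for variables with a positive mean, so interval-scale variables that can be zero or negative (such as temperature in °C) would need care.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean of a variable within one sampling grid.

    Undefined (division by zero) when the mean is 0; misleading when the
    variable can take negative values.
    """
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()


def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns R^2 as well."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # polyfit lists the highest degree first
    residuals = y - (intercept + slope * x)
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return float(intercept), float(slope), float(r_squared)
```

In use, one CV value per sampling grid (for a given variable, species group, and scale) would be the predictor `x`, and the percentage of explained variance from the pCCA for the same grids would be `y`. Under the interpretation in the text, a high R² would point to a constant per-unit effect, while high explained variance at low CV would point to strong ecological affinity.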