The presence–absence matrix reloaded: the use and interpretation of range–diversity plots

Authors

  • Héctor T. Arita,

    Corresponding author
    1. Centro de Investigaciones en Ecosistemas, Universidad Nacional Autónoma de México, Apartado Postal 27-3, CP 58090, Morelia, Michoacán, Mexico
      Héctor T. Arita, Centro de Investigaciones en Ecosistemas, Universidad Nacional Autónoma de México, Apartado Postal 27-3, CP 58089, Morelia, Michoacán, Mexico. E-mail: arita@oikos.unam.mx
  • Andrés Christen,

    1. Centro de Investigación en Matemáticas, Apartado Postal 402, CP 36000, Guanajuato, Mexico
  • Pilar Rodríguez,

    1. Comisión Nacional para el Conocimiento y Uso de la Biodiversidad, Liga Periférico-Insurgentes Sur 4903, CP 14010 Mexico, DF, México
  • Jorge Soberón

    1. Natural History Museum, University of Kansas, Lawrence, KS 66045, USA


ABSTRACT

Aim  A great deal of information on distribution and diversity can be extracted from presence–absence matrices (PAMs), the basic analytical tool of many biogeographic studies. This paper presents numerical procedures that allow the analysis of such information by taking advantage of mathematical relationships within PAMs. In particular, we show how range–diversity (RD) plots summarize much of the information contained in the matrices by the simultaneous depiction of data on distribution and diversity.

Innovation  We use matrix algebra to extract and process data from PAMs. Information on the distribution of species and on species richness of sites is computed using the traditional R (by rows) and Q (by columns) procedures, as well as the new Rq (by rows, considering the structure of columns) and Qr (by columns, considering the structure by rows) methods. Matrix notation is particularly suitable for summarizing complex calculations using PAMs, and the associated algebra allows the implementation of efficient computational programs. We show how information on distribution and species richness can be depicted simultaneously in RD plots, allowing a direct examination of the relationship between those two aspects of diversity. We explore the properties of RD plots with a simple example, and use null models to show that while parameters of central tendency are not affected by randomization, the dispersion of points in RD plots does change, showing the significance of patterns of co-occurrence of species and of similarity among sites.

Main conclusion  Species richness and range size are both valid measures of diversity that can be analysed simultaneously with RD plots. A full analysis of a system requires measures of central tendency and dispersion for both distribution and species richness.

INTRODUCTION

Research on macroecology focuses on the analysis of spatial patterns and processes at the regional, continental and global scales. Patterns of interest are based on variables showing geographic variation, from intra-specific and inter-specific traits to attributes of whole assemblages (Gaston et al., 2008). Most of these patterns can be summarized in species × site matrices, in which rows represent taxa, columns correspond to localities, and each element shows some attribute of a particular species at a given site (Bell, 2003; Gaston et al., 2008). The most basic form of such matrices is the presence–absence matrix (PAM), in which elements acquire binary values that represent the presence (1) or absence (0) of a particular species in a given site (Gotelli, 2000; Arita et al., 2008). Matrices can be analysed by columns (Q-mode) or by rows (R-mode), yielding different kinds of information from the same data (Williams & Lambert, 1961; Sneath & Sokal, 1973; Simberloff & Connor, 1979; Legendre & Legendre, 1983). In large-scale studies, an analysis of PAMs by rows produces information on the range size of species, whilst an equivalent analysis by columns yields data on the species richness of sites.

Additional information can be extracted from PAMs by using Rq- and Qr-mode analyses (Arita et al., 2008). In the Qr-mode, data are computed by columns (by sites), but considering the structure of the rows that intersect a given column with a non-zero entry (that is, species occurring in the focal site). This procedure generates the ‘dispersion field’ of a site, which is the set of ranges of all species that occur in that locality (Graves & Rahbek, 2005; Arita et al., 2008). A comparison of dispersion fields for several sites allows in turn the analysis of the geographic variation of range sizes (Lutz, 1921; Anderson & Koopman, 1981; Rapoport, 1982; Hawkins & Diniz-Filho, 2006; Orme et al., 2006). Equivalently, Rq-mode analyses are performed by rows (by species), but incorporating information from the columns that intersect the focal row with a non-zero entry. The resulting set of species-richness values of the sites that form the range of a species is the ‘diversity field’ of that species (Arita et al., 2008; Villalobos & Arita, 2010).

The properties of dispersion and diversity fields can be visualized using range–diversity (RD) plots (Fig. 1), in which information on range size and species richness is depicted simultaneously (Arita et al., 2008; Borregaard & Rahbek, 2010). RD plots can be built by species or by sites, and a complete understanding of a system consisting of several species occurring in a number of sites would normally require the use of both types of plot. The location of points in RD plots by species depends on the covariation among species, which is ultimately defined by the patterns of co-occurrence. Because variance in species richness can be partitioned into components determined by the distribution of species (Schluter, 1984; Bell, 2003; Legendre et al., 2005), RD plots can be used as a visual tool for examining such decomposition, which can be tested quantitatively with a variance-ratio test (Schluter, 1984).

Figure 1.

Range–diversity plots for 10 species of mammals in 18 islands of the Thousand Islands region in New York (Lomolino, 1986). Note that some points overlap. (a) By species, showing their proportional range sizes (ordinates) and the average proportional species richness within their ranges (abscissas); histograms on top and on the right-hand side show the frequency distributions of those variables; the solid curved line marks the upper theoretical limit for points; the vertical dashed line corresponds to the mean proportional species richness of the 18 sites; and the hyperbolic dashed curves are lines of equal covariance among species. (b) By sites, showing their proportional species richness (ordinates) and the average proportional range size of species occurring in the sites (abscissas); the other elements of the graph correspond to those described for (a).

Rq and Qr procedures, by combining information on species richness and distribution, allow analyses that go beyond the standard studies that consider each variable separately. Thus, RD plots can be useful tools in studies that require the simultaneous consideration of patterns of diversity and distribution, for example when examining patterns of beta diversity or nestedness (Arita et al., 2008; Christen & Soberón, 2009). RD plots and associated parameters can also be useful in the validation of dynamic models of continental diversity (Gotelli et al., 2009; Borregaard & Rahbek, 2010; Villalobos & Arita, 2010) and in the identification of priority areas and species for conservation initiatives.

In this paper we discuss the use of RD plots in extracting and interpreting information from PAMs. In particular, we examine the role of covariance in determining the position of points on the graphs, and explore the use of variance ratios in detecting association among species, as proposed by Schluter (1984), or among sites, as proposed here. We use matrix algebra to derive the mathematical relationships between diversity and distribution, and show how this procedure enables fast and efficient computer algorithms. We also present empirical examples and null models to illustrate the analytical power of RD plots.

INNOVATION

In this section we present a step-by-step guide to building and interpreting RD plots by extracting information from a PAM. We employ the mathematical relationships between diversity and distribution that have been demonstrated by Arita et al. (2008) and present an alternative notation based on matrix algebra (Christen & Soberón, 2009). We use a worked example to show simple ways in which parameters can be readily calculated and provide a fully functional R script (R Development Core Team, 2008) that should enable readers to manage large datasets (Appendix S1 in Supporting Information). Most potential users of RD plots not interested in the mathematical details might find all the information that they require by following the step-by-step introductory example and by analysing their own data with the R script as it is. Other users might want to go through the mathematical derivations to be able to modify the R script to suit their particular datasets or analytical needs.

The presence–absence matrix

The basic source of information for building RD plots is an S × N presence–absence matrix Δ that summarizes the distribution of S species among N sites (we follow the convention of denoting matrices and vectors with bold characters). Each row represents a species, each column represents a site, and the elements of the matrix are δ(i,j) = 1 if species i is present in site j, and δ(i,j) = 0 otherwise. The sum of elements along a row yields the number of sites in which the corresponding species occurs (that is, its range size $n_i$), and the equivalent sum along a column equals the total number of species present in a site (that is, its species richness $s_j$). The vectors containing the S range size values and the N species-richness values can be easily calculated as $\mathbf{n} = \boldsymbol{\Delta}\mathbf{1}_N$ and $\mathbf{s} = \boldsymbol{\Delta}^{T}\mathbf{1}_S$, where $\mathbf{1}_N$ and $\mathbf{1}_S$ are vectors of ones of length N and S, respectively, and the superscript T indicates transpose.
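As a minimal illustration of these two matrix–vector products, the following Python sketch uses a small hypothetical 3 × 4 matrix (not data from the paper). The helper names `matvec` and `transpose` are introduced here for illustration and are not taken from the R script of Appendix S1.

```python
# Hypothetical 3-species x 4-site PAM (rows = species, columns = sites).
delta = [
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
S, N = len(delta), len(delta[0])


def matvec(matrix, vector):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]


ones_N = [1] * N
ones_S = [1] * S

n = matvec(delta, ones_N)             # range sizes, one per species
s = matvec(transpose(delta), ones_S)  # species richness, one per site
print(n, s)  # [2, 3, 1] [2, 1, 2, 1]
```

Both vectors add up to the same total, the number of ones in the matrix.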

Table 1 is a PAM showing the distribution of S = 10 species of mammals in N = 18 islands of the Thousand Islands region of New York (Lomolino, 1986). The column marked as ni is n, the vector of range size values (ni, which in this example is the number of islands in which each species occurs). The row marked as sj is the transpose of s, the vector of species-richness values for each island (sj). The averages of these vectors are $\bar{n} = 5.0$ sites and $\bar{s} = 2.78$ species, respectively.

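Table 1.  Presence–absence matrix (PAM) showing the distribution of 10 mammal species (sp) among 18 islands (si) in the Thousand Islands region of New York (data from Lomolino, 1986).

| | si 1 | si 2 | si 3 | si 4 | si 5 | si 6 | si 7 | si 8 | si 9 | si 10 | si 11 | si 12 | si 13 | si 14 | si 15 | si 16 | si 17 | si 18 | ni | Di | $\bar{D}_i$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sp 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 34 | 4.86 |
| sp 2 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 16 | 39 | 2.44 |
| sp 3 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 31 | 6.20 |
| sp 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 21 | 7.00 |
| sp 5 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4 | 26 | 6.50 |
| sp 6 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 32 | 6.40 |
| sp 7 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 28 | 7.00 |
| sp 8 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 19 | 9.50 |
| sp 9 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 19 | 9.50 |
| sp 10 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 19 | 9.50 |
| sj | 10 | 9 | 5 | 4 | 4 | 2 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | | | |
| Rj | 50 | 34 | 34 | 32 | 33 | 23 | 23 | 28 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 7 | | | |
| $\bar{R}_j$ | 5.00 | 3.78 | 6.80 | 8.00 | 8.25 | 11.50 | 11.50 | 9.33 | 16.00 | 16.00 | 16.00 | 16.00 | 16.00 | 16.00 | 16.00 | 16.00 | 16.00 | 3.50 | | | |

ni are the range-size (occupancy) values for each species; Di and $\bar{D}_i = D_i/n_i$ are the corresponding diversity field and range–diversity values. sj represents the species-richness values for islands, and Rj and $\bar{R}_j = R_j/s_j$ are the corresponding dispersion field volume and per site range size values. The values in the three right-hand columns and the three bottom rows are derived from row or column totals.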

The fill of the matrix is the total number of occurrences (ones) in it, $f = 50$ in this example. The fill can be easily computed as the sum of all species-richness values ($\sum_j s_j$) or of all range size values ($\sum_i n_i$), that is, $f = \sum_{j=1}^{N} s_j = \sum_{i=1}^{S} n_i$. If the $n_i$ and $s_j$ values are divided by the total number of sites or the total number of species, respectively, we obtain their proportional values $n_i^* = n_i/N$ and $s_j^* = s_j/S$. It is easy to show that $f^* = f/(SN) = \bar{n}^* = \bar{s}^*$, that is, the proportional fill of a matrix is always equal to the proportional mean richness or the proportional mean range size in the system. In the Thousand Islands example, $f^* = 50/180 = 0.278$, meaning that on average each island contains 27.8% of the species and that the average species occurs in 27.8% of the islands. Whittaker's index of beta diversity equals the inverse of the proportional fill, $\beta = (f^*)^{-1} = 3.6$ in the present example, so it can be envisioned either as the factor relating the total species richness with average local richness, $\beta = S/\bar{s}$ (Whittaker, 1960), or as the ratio of the total area of the region and the average range extent of species, $\beta = N/\bar{n}$ (Routledge, 1977; Arita et al., 2008).
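The quantities in this paragraph can be checked numerically. The sketch below is a plain-Python illustration (it is not the R script of Appendix S1); the list `OCCUPIED` is our own encoding of Table 1, giving for each species the islands where it is present, and all variable names are assumptions made for this example.

```python
# Islands (1-18) occupied by each of the 10 species, transcribed from Table 1.
OCCUPIED = [
    [1, 2, 4, 5, 6, 7, 8],      # sp 1
    [1] + list(range(3, 18)),   # sp 2 (island 1 and islands 3-17)
    [1, 2, 3, 5, 8],            # sp 3
    [1, 2, 18],                 # sp 4
    [1, 2, 3, 18],              # sp 5
    [1, 2, 3, 4, 5],            # sp 6
    [1, 2, 3, 4],               # sp 7
    [1, 2], [1, 2], [1, 2],     # sp 8-10
]
N = 18
pam = [[int(j in sites) for j in range(1, N + 1)] for sites in OCCUPIED]
S = len(pam)

n = [sum(row) for row in pam]        # range sizes (row sums)
s = [sum(col) for col in zip(*pam)]  # species richness (column sums)

fill = sum(n)                        # identical to sum(s)
f_star = fill / (S * N)              # proportional fill
mean_n_star = sum(x / N for x in n) / S
mean_s_star = sum(x / S for x in s) / N
beta = 1 / f_star                    # Whittaker's beta

print(fill, round(f_star, 3), round(beta, 1))  # 50 0.278 3.6
```

The same value of beta is obtained as `S / (fill / N)` (total richness over mean local richness) or as `N / (fill / S)` (number of sites over mean range size).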

Rq and Qr analyses

The diversity field volume ($D_i$) of species i is the summation of species-richness values of sites within its range. In the example, the diversity field volume of species 1 (first row) is the sum of the richness values of sites (columns) 1, 2, 4, 5, 6, 7 and 8, that is $D_1 = 34$ species. The dispersion field volume ($R_j$) of site j is the summation of range sizes of the species occurring in that site (Graves & Rahbek, 2005). The dispersion field volume of the 18th site (last column) in the example is the sum of the range sizes of species 4 and 5, that is $R_{18} = 7$ sites. The vectors of the S diversity field volumes and the N dispersion field volumes can be computed as $\mathbf{D} = \boldsymbol{\Delta}\mathbf{s} = \boldsymbol{\Delta}\boldsymbol{\Delta}^{T}\mathbf{1}_S$ and $\mathbf{R} = \boldsymbol{\Delta}^{T}\mathbf{n} = \boldsymbol{\Delta}^{T}\boldsymbol{\Delta}\mathbf{1}_N$, respectively.

Dividing the diversity field volumes by the corresponding range size of each species, we obtain the vector of average species-richness values within each range. Equivalently, dividing the dispersion field volumes by the corresponding species richness of each site, we can compute a vector of the mean range sizes of species occurring in the site. We call these parameters the mean range richness of a species ($\bar{D}_i = D_i/n_i$) and the mean per site range size of a locality ($\bar{R}_j = R_j/s_j$). Notice that the first one is a richness value that can be assigned to a species, and the second one is a range size variable assigned to a site. Dividing these variables by the total number of species or by the total number of sites, we obtain the proportional range richness of a species ($\bar{D}_i^* = \bar{D}_i/S$) and the proportional per site range size of a site ($\bar{R}_j^* = \bar{R}_j/N$).
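The Rq and Qr quantities can be sketched in a few lines of plain Python. This is an illustration only, not the authors' R implementation; `OCCUPIED` is our encoding of Table 1 and the names `D_mean` and `R_mean` are labels chosen here for the mean range richness and the mean per site range size.

```python
# Islands (1-18) occupied by each of the 10 species, transcribed from Table 1.
OCCUPIED = [
    [1, 2, 4, 5, 6, 7, 8],      # sp 1
    [1] + list(range(3, 18)),   # sp 2 (island 1 and islands 3-17)
    [1, 2, 3, 5, 8],            # sp 3
    [1, 2, 18],                 # sp 4
    [1, 2, 3, 18],              # sp 5
    [1, 2, 3, 4, 5],            # sp 6
    [1, 2, 3, 4],               # sp 7
    [1, 2], [1, 2], [1, 2],     # sp 8-10
]
N = 18
pam = [[int(j in sites) for j in range(1, N + 1)] for sites in OCCUPIED]
S = len(pam)
n = [sum(row) for row in pam]
s = [sum(col) for col in zip(*pam)]

# Rq mode: for each species, add the richness of the sites it occupies.
D = [sum(s[j] for j in range(N) if pam[i][j]) for i in range(S)]
# Qr mode: for each site, add the range sizes of the species it contains.
R = [sum(n[i] for i in range(S) if pam[i][j]) for j in range(N)]

# Means per species and per site (assumes no empty rows or columns).
D_mean = [D[i] / n[i] for i in range(S)]
R_mean = [R[j] / s[j] for j in range(N)]

print(D[0], R[17])                              # 34 7
print(round(D_mean[0], 2), round(R_mean[17], 2))  # 4.86 3.5
```

The division steps would fail for a species with no occurrences or a site with no species, so such rows or columns would need to be dropped first.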

RD plots

In RD plots by species, abscissas are the proportional range richness values ($\bar{D}_i^* = D_i/(n_i S)$) and ordinates are the proportional range sizes of species ($n_i^*$, Fig. 1a). In RD plots by sites, abscissas represent the proportional per site range size ($\bar{R}_j^* = R_j/(s_j N)$) and the proportional species-richness values correspond to the ordinates ($s_j^*$, Fig. 1b). In both cases, a vertical line is drawn to coincide, along the x-axis, with the proportional fill of the PAM, which equals both the average proportional range size and the average proportional species richness of the system; in this example, $f^* = 0.278$.

In Fig. 1(a) and (b), the dark curved lines represent mathematical constraints that mark a limit to the possible values of points in RD plots. Their shape and position depend on the minimum and maximum richness and range size values (Arita et al., 2008). These limits can be explained verbally using the law of large numbers. In the plot by species, the larger the ‘sample size’ (the number of sites forming a range), the closer the range richness value has to be to the overall mean. In the limit, the average richness of the sites forming the range of a species occurring everywhere is identical to the overall mean richness, so a point lying on the very top of the RD plot will necessarily be located on the vertical dashed line. By contrast, points corresponding to species occurring in a few sites (representing ‘small samples’) can vary widely along the abscissa, as shown by the larger dispersion of points in the bottom part of the RD plot in Fig. 1(a). By similar reasoning, the point corresponding to a site containing all species will necessarily be located on the top of the plot and on the vertical dashed line, because the average range size of species occurring there is identical to the overall mean. Sites with low diversity values, in contrast, will show more variation in average range size.

In Fig. 1(a) the point close to the top corresponds to species 2, which occurs in 16 of the 18 islands ($n_2^* = 0.89$) and co-occurs, on average, with 2.44 species in each island ($\bar{D}_2 = 2.44$, $\bar{D}_2^* = 0.244$). A species occurring on all islands would necessarily co-occur with an average of 2.78 species and its point would be at the top of the graph, exactly on the vertical dashed line. In contrast, species occurring on only one island could in principle have proportional range richness values ($\bar{D}_i^*$) from 1/10 = 0.1 (the focal species being the only one on the island) to 10/10 = 1.0 (the focal species sharing the island with all other species). In the example, points corresponding to species occurring on a few islands are all located on the right-hand side of the plot, indicating a tendency of restricted species to occur only in high-richness sites. In the plot by sites (Fig. 1b), island 1 harbours all 10 species ($s_1^* = 1.0$), so its point lies on top of the plot and exactly on the vertical line, indicating that species occurring there have an average range of 5.0 islands ($\bar{R}_1 = 5.0$, $\bar{R}_1^* = 0.278$). Almost all islands with low or intermediate species richness harbour species occurring in many sites, so their points lie on the right-hand part of the plot. The only exception is island 18, which contains two species, occurring on only three and four islands. The point corresponding to this island is the one on the lower left part of the plot.
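The coordinates of the points discussed above can be reproduced with a short plain-Python sketch. It only computes the (abscissa, ordinate) pairs and does not draw the plot; `OCCUPIED` is our encoding of Table 1 and `by_species` and `by_sites` are names chosen for this illustration.

```python
# Islands (1-18) occupied by each of the 10 species, transcribed from Table 1.
OCCUPIED = [
    [1, 2, 4, 5, 6, 7, 8],      # sp 1
    [1] + list(range(3, 18)),   # sp 2 (island 1 and islands 3-17)
    [1, 2, 3, 5, 8],            # sp 3
    [1, 2, 18],                 # sp 4
    [1, 2, 3, 18],              # sp 5
    [1, 2, 3, 4, 5],            # sp 6
    [1, 2, 3, 4],               # sp 7
    [1, 2], [1, 2], [1, 2],     # sp 8-10
]
N = 18
pam = [[int(j in sites) for j in range(1, N + 1)] for sites in OCCUPIED]
S = len(pam)
n = [sum(row) for row in pam]
s = [sum(col) for col in zip(*pam)]
D = [sum(s[j] for j in range(N) if pam[i][j]) for i in range(S)]
R = [sum(n[i] for i in range(S) if pam[i][j]) for j in range(N)]

# Plot by species: x = proportional range richness, y = proportional range size.
by_species = [(D[i] / (n[i] * S), n[i] / N) for i in range(S)]
# Plot by sites: x = proportional per site range size, y = proportional richness.
by_sites = [(R[j] / (s[j] * N), s[j] / S) for j in range(N)]

f_star = sum(n) / (S * N)  # position of the vertical dashed line

for label, (x, y) in [("species 2", by_species[1]),
                      ("island 1", by_sites[0]),
                      ("island 18", by_sites[17])]:
    side = "left of" if x < f_star else "on or right of"
    print(f"{label}: x = {x:.3f}, y = {y:.3f} ({side} the vertical line)")
```

Island 18 is the only low-richness site whose abscissa falls to the left of the vertical line, in agreement with the description of Fig. 1(b).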

The position of points in RD plots in relation to the vertical line is also related to the average covariance of species or sites. In general, $\rho_i = n_i^*(\bar{D}_i^* - f^*)$ is the average covariance of species i with all species, and $\tau_j = s_j^*(\bar{R}_j^* - f^*)$ is the average covariance of site j with all sites (Arita et al., 2008). Hence, the covariance of a species depends on the number of species with which it shares its distribution, and the covariance of a site is determined by the number of sites with which it shares species (Arita et al., 2008). Points located along the hyperbolic dashed curves in Fig. 1 have the same covariance (the ±0.01, 0.05, and 0.1 isocovariance lines are shown, negative covariances to the left, positive covariances to the right). In the plot by species, the farther a point is from the vertical dashed line, the higher is the absolute value of the covariance of the corresponding species with the complete biota. These lines are drawn using the equations $n^* = \rho/(\bar{D}^* - f^*)$ and $s^* = \tau/(\bar{R}^* - f^*)$, where ρ and τ are particular covariance values for species and for sites, respectively.
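The link between the position of a point and its average covariance can be sketched numerically. The code below assumes that the average covariance equals the ordinate of the point multiplied by the horizontal distance between its abscissa and the proportional fill, a form that follows from the definition of covariance for binary data; `OCCUPIED` (our encoding of Table 1), `iso_covariance` and the other names are chosen for this illustration.

```python
# Islands (1-18) occupied by each of the 10 species, transcribed from Table 1.
OCCUPIED = [
    [1, 2, 4, 5, 6, 7, 8],      # sp 1
    [1] + list(range(3, 18)),   # sp 2 (island 1 and islands 3-17)
    [1, 2, 3, 5, 8],            # sp 3
    [1, 2, 18],                 # sp 4
    [1, 2, 3, 18],              # sp 5
    [1, 2, 3, 4, 5],            # sp 6
    [1, 2, 3, 4],               # sp 7
    [1, 2], [1, 2], [1, 2],     # sp 8-10
]
N = 18
pam = [[int(j in sites) for j in range(1, N + 1)] for sites in OCCUPIED]
S = len(pam)
n = [sum(row) for row in pam]
s = [sum(col) for col in zip(*pam)]
f_star = sum(n) / (S * N)

# Abscissas of the two RD plots (proportional range richness, per site range size).
D_star = [sum(s[j] for j in range(N) if pam[i][j]) / (n[i] * S) for i in range(S)]
R_star = [sum(n[i] for i in range(S) if pam[i][j]) / (s[j] * N) for j in range(N)]

# Average covariance of each species with all species, and of each site with all sites.
rho = [(n[i] / N) * (D_star[i] - f_star) for i in range(S)]
tau = [(s[j] / S) * (R_star[j] - f_star) for j in range(N)]


def iso_covariance(x, c):
    """Ordinate of the curve of constant covariance c at abscissa x (x != f_star)."""
    return c / (x - f_star)


positive_species = sum(1 for r in rho if r > 1e-12)
positive_sites = sum(1 for t in tau if t > 1e-12)
print(positive_species, positive_sites)  # 9 15
```

Each point lies on its own curve of constant covariance: `iso_covariance(D_star[i], rho[i])` returns the proportional range size of species i.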

Histograms on top of RD plots in Fig. 1 show that the points for most species and most islands lie to the right of the vertical dashed line, that is, their average covariance is > 0. In fact, 9 of the 10 species and 15 of the 18 sites have average covariances between +0.05 and +0.1. This pattern, in which points of most species and most sites fall on the right-side sector of RD plots is characteristic of highly nested assemblages, in which if a species occurs in only a few sites, these sites tend to be areas of high species richness. Equivalently, low-richness sites are populated by species that are widespread.

Variance partitioning and variance-ratio tests

The N × N matrix of variance–covariance among sites is computed as $\mathbf{C}_{si} = [c_{si}(j,m)] = \frac{1}{S}\boldsymbol{\Delta}^{T}\boldsymbol{\Delta} - \mathbf{s}^{*}\mathbf{s}^{*T}$ for j and m = 1 to N, where $c_{si}(j,m)$ is the covariance between sites j and m and $\mathbf{s}^{*} = \mathbf{s}/S$. The equivalent S × S variance–covariance matrix for species is $\mathbf{C}_{sp} = [c_{sp}(i,l)] = \frac{1}{N}\boldsymbol{\Delta}\boldsymbol{\Delta}^{T} - \mathbf{n}^{*}\mathbf{n}^{*T}$ for i and l = 1 to S, where $c_{sp}(i,l)$ is the covariance between species i and l and $\mathbf{n}^{*} = \mathbf{n}/N$. The elements along the diagonals are the binary variances $v_{si}(j) = s_j^*(1 - s_j^*)$ for site j and $v_{sp}(i) = n_i^*(1 - n_i^*)$ for species i. Notice that $\boldsymbol{\Delta}\boldsymbol{\Delta}^{T}$ and $\boldsymbol{\Delta}^{T}\boldsymbol{\Delta}$ are the S × S matrix of co-occurrence of species and the N × N matrix of the number of species shared by sites, respectively. The diagonal of the first matrix is equal to the vector n of range sizes, the diagonal of the second matrix is equal to the vector s of species-richness values, and the trace of either of these matrices equals f, the fill of the matrix. The average covariance of any given site j with all sites is $\tau_j$, and the N × 1 vector of such values for all sites is given by $\boldsymbol{\tau} = \frac{1}{N}\mathbf{C}_{si}\mathbf{1}_N$. By symmetry, the average covariance of species i with all species is $\rho_i$, and the S × 1 vector of such values is $\boldsymbol{\rho} = \frac{1}{S}\mathbf{C}_{sp}\mathbf{1}_S$.
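A plain-Python sketch of these matrix computations follows. It assumes population covariances (co-occurrence counts divided by the number of sites or species, minus the product of the proportional totals), which is consistent with the binary variances stated above; `OCCUPIED` is our encoding of Table 1, and `matmul`, `cooc` and `shared` are names chosen here.

```python
# Islands (1-18) occupied by each of the 10 species, transcribed from Table 1.
OCCUPIED = [
    [1, 2, 4, 5, 6, 7, 8],      # sp 1
    [1] + list(range(3, 18)),   # sp 2 (island 1 and islands 3-17)
    [1, 2, 3, 5, 8],            # sp 3
    [1, 2, 18],                 # sp 4
    [1, 2, 3, 18],              # sp 5
    [1, 2, 3, 4, 5],            # sp 6
    [1, 2, 3, 4],               # sp 7
    [1, 2], [1, 2], [1, 2],     # sp 8-10
]
N = 18
pam = [[int(j in sites) for j in range(1, N + 1)] for sites in OCCUPIED]
S = len(pam)


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


pam_t = [list(col) for col in zip(*pam)]
cooc = matmul(pam, pam_t)    # S x S: number of sites shared by each pair of species
shared = matmul(pam_t, pam)  # N x N: number of species shared by each pair of sites

n = [cooc[i][i] for i in range(S)]    # diagonal = range sizes
s = [shared[j][j] for j in range(N)]  # diagonal = species richness

C_sp = [[cooc[i][l] / N - (n[i] / N) * (n[l] / N) for l in range(S)]
        for i in range(S)]
C_si = [[shared[j][m] / S - (s[j] / S) * (s[m] / S) for m in range(N)]
        for j in range(N)]

rho = [sum(row) / S for row in C_sp]  # average covariance of each species
tau = [sum(row) / N for row in C_si]  # average covariance of each site

print(sum(n), sum(s))  # 50 50 (trace of either product equals the fill)
```

For large matrices a numerical library would be far more efficient than these nested lists; the explicit loops are used only to keep the sketch self-contained.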

In any PAM the variance in species richness of sites (sj) equals the sum of the variances in range size for all species plus twice the sum of covariances among species (Schluter, 1984; Bell, 2005):

$$\mathrm{Var}(s) = \sum_{i=1}^{S} v_{sp}(i) + 2\sum_{i<l} c_{sp}(i,l) \qquad (1)$$

Notice that $\sum_{i} v_{sp}(i)$ is the trace of the matrix $\mathbf{C}_{sp}$, that is, the summation of the variances of species, and that $2\sum_{i<l} c_{sp}(i,l)$ is the summation of the non-diagonal elements of the matrix $\mathbf{C}_{sp}$, that is, twice the summation of all pair-wise covariances. In other words, the right-hand part of equation 1 is simply the summation of all elements of $\mathbf{C}_{sp}$, that is:

$$\mathrm{Var}(s) = \mathbf{1}_S^{T}\mathbf{C}_{sp}\mathbf{1}_S \qquad (2)$$
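The equality between the variance in richness and the sum of all elements of the species covariance matrix can be made explicit. The derivation below is a sketch that assumes population covariances computed across sites, $c_{sp}(i,l) = \frac{1}{N}\sum_j \delta(i,j)\,\delta(l,j) - n_i^* n_l^*$, which is consistent with the binary variances $v_{sp}(i) = n_i^*(1 - n_i^*)$; the symbol $\bar{s}$ for the mean richness is notation adopted here.

```latex
\begin{align*}
\sum_{i=1}^{S}\sum_{l=1}^{S} c_{sp}(i,l)
  &= \frac{1}{N}\sum_{j=1}^{N}
     \Big(\sum_{i=1}^{S}\delta(i,j)\Big)\Big(\sum_{l=1}^{S}\delta(l,j)\Big)
     - \Big(\sum_{i=1}^{S} n_i^{*}\Big)^{2} \\
  &= \frac{1}{N}\sum_{j=1}^{N} s_j^{2} - \bar{s}^{\,2}
   = \mathrm{Var}(s),
\end{align*}
% because each column sum of the PAM is s_j and
% \sum_i n_i^* = f/N = \bar{s}.
```

The same argument with rows and columns exchanged gives the corresponding result for the variance in range size and the covariance matrix by sites.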

This leads to the important result that the variance in species richness among sites depends on the variance and covariance of distributional values for species. This property can be used for testing the hypothesis of a possible association of species in terms of co-occurrence patterns (Robson, 1972; Schluter, 1984; Bell, 2005). From equation 1, if the sum of covariances of all species is zero (meaning that on average there is no association among them), then the ratio

$$V_{sp} = \frac{\mathrm{Var}(s)}{\sum_{i=1}^{S} v_{sp}(i)}$$

must be equal to 1. An observed Vsp that is less than an expected value generated by some null model would indicate a negative total covariance, which might point to a possible mechanism of avoidance or exclusion between species at local scales (Gotelli, 2000; Bell, 2005), or at spatial segregation due to contrasting climatic or habitat preferences at biogeographic scales.

Following the same logic, the variance in range size among species is determined by the variance and covariance of sites in terms of species richness:

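$$\mathrm{Var}(n) = \sum_{j=1}^{N} v_{si}(j) + 2\sum_{j<m} c_{si}(j,m) \qquad (3)$$

$$\mathrm{Var}(n) = \mathbf{1}_N^{T}\mathbf{C}_{si}\mathbf{1}_N \qquad (4)$$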

In equation 3 the first term on the right-hand side is the sum of variances and the second term is twice the sum of covariances among sites, so Var(n) equals the summation of all elements of the matrix Csi, as shown by equation 4. A variance-ratio test equivalent to the one proposed by Schluter (1984) can be used for sites to test for significant similarity in terms of shared species,

$$V_{si} = \frac{\mathrm{Var}(n)}{\sum_{j=1}^{N} v_{si}(j)}$$

Vsi can be used for testing a possible clustering of sites in terms of shared species. In principle, Vsp and Vsi are related through the relationships between variance among sites and among species, but they are not totally dependent on each other. A full description of a system, including patterns by species and by sites, could be achieved through the use of both parameters.

Table 2 shows the variance–covariance matrix by species (Csp) of the Thousand Islands example. The diagonal of the matrix includes the binary variances generated by the range size of each species, so the sum of these S = 10 elements is

$$\sum_{i=1}^{S} v_{sp}(i) = 1.518.$$

Table 2.  Variance–covariance matrix by species (sp) for the mammals of the Thousand Islands region, calculated from Table 1.

The sum of the S(S− 1) = 90 non-diagonal elements equals twice the sum of all pair-wise covariances,

$$2\sum_{i<l} c_{sp}(i,l) = 5.654.$$

From Table 1, we can calculate the population variance in richness:

$$\mathrm{Var}(s) = \frac{1}{N}\sum_{j=1}^{N}(s_j - \bar{s})^{2} = 7.173.$$

The partitioning of variance defined by equation 1 is readily corroborated: 7.173 = 1.518 + 5.654. Equivalently, the variance in range-size values can be partitioned into two components from the variance–covariance matrix by sites (equation 3):

$$\mathrm{Var}(n) = \sum_{j=1}^{N} v_{si}(j) + 2\sum_{j<m} c_{si}(j,m) = 2.32 + 13.48 = 15.80.$$

Schluter's (1984) variance-ratio parameter is Vsp= 7.173/1.518 = 4.72, and the equivalent ratio for sites is Vsi= 15.80/2.32 = 6.81.
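The worked example can be corroborated with a short plain-Python sketch that computes each variance component directly from the presence–absence data. It is an illustration only (not the R script of Appendix S1); `OCCUPIED` is our encoding of Table 1 and the helper `cov` is a population covariance written for this example.

```python
# Islands (1-18) occupied by each of the 10 species, transcribed from Table 1.
OCCUPIED = [
    [1, 2, 4, 5, 6, 7, 8],      # sp 1
    [1] + list(range(3, 18)),   # sp 2 (island 1 and islands 3-17)
    [1, 2, 3, 5, 8],            # sp 3
    [1, 2, 18],                 # sp 4
    [1, 2, 3, 18],              # sp 5
    [1, 2, 3, 4, 5],            # sp 6
    [1, 2, 3, 4],               # sp 7
    [1, 2], [1, 2], [1, 2],     # sp 8-10
]
N = 18
rows = [[int(j in sites) for j in range(1, N + 1)] for sites in OCCUPIED]
cols = [list(c) for c in zip(*rows)]
S = len(rows)
n = [sum(r) for r in rows]
s = [sum(c) for c in cols]


def cov(u, v):
    """Population covariance of two equal-length sequences."""
    m = len(u)
    return sum(a * b for a, b in zip(u, v)) / m - (sum(u) / m) * (sum(v) / m)


# Equation 1: variance in richness = species variances + twice the covariances.
var_s = cov(s, s)
sum_var_sp = sum(cov(r, r) for r in rows)
two_cov_sp = 2 * sum(cov(rows[i], rows[l])
                     for i in range(S) for l in range(i + 1, S))
V_sp = var_s / sum_var_sp

# Equation 3: variance in range size = site variances + twice the covariances.
var_n = cov(n, n)
sum_var_si = sum(cov(c, c) for c in cols)
two_cov_si = 2 * sum(cov(cols[j], cols[m])
                     for j in range(N) for m in range(j + 1, N))
V_si = var_n / sum_var_si

print(round(V_sp, 2), round(V_si, 2))  # 4.72 6.81
```

Both partitions hold to within floating-point error, that is, `var_s` equals `sum_var_sp + two_cov_sp` and `var_n` equals `sum_var_si + two_cov_si`.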

CONTINENTAL EXAMPLES: MAMMALS IN THREE MEXICAN REGIONS

In this section, we present data on the distribution and richness patterns of the mammals of three contrasting regions of Mexico to illustrate the analytical power of RD plots (Fig. 2), and present the results of three different null models to show how RD plots and variance-ratio tests can be used in combination to dissect the variance components of the distribution and diversity parameters (Table 3, Figs 3 & 4).

Figure 2.

Range–diversity plots for the mammal fauna of three regions in Mexico, by species (a, c, e) and by sites (b, d, f): (a, b) central Mexico; (c, d) the Isthmus of Tehuantepec; (e, f) the Yucatan Peninsula. Plots by species show proportional range sizes (ordinates) and the average proportional species richness within the ranges (abscissas); histograms on top and on the right-hand side show the frequency distributions of those variables; the solid curved line marks the upper theoretical limit for points; the vertical dashed line corresponds to the mean proportional species richness of the sites; and the hyperbolic dashed curves are lines of equal covariance among species. Plots by sites show proportional species richness (ordinates) and the average proportional range size of the species occurring in each site (abscissas); the other elements of the graph correspond to those described for the plots by species.

Table 3.  Parameters of diversity and distribution of the mammals of three Mexican regions.

| Parameter | Central Mexico | Isthmus | Yucatan |
|---|---|---|---|
| Parameters of the region | | | |
| Quadrats | 62 | 50 | 50 |
| Species | 212 | 206 | 111 |
| Fill of PAM | 5770 (0.44) | 6601 (0.64) | 4265 (0.77) |
| Whittaker's beta | 2.28 | 1.56 | 1.30 |
| Parameters of species | | | |
| Mean range size | 27.22 (0.44) | 32.04 (0.64) | 38.42 (0.77) |
| Mean range richness | 98.14 (0.46) | 137.07 (0.67) | 88.18 (0.79) |
| Vsp | 7.44 | 15.88 | 9.65 |
| Parameters of sites | | | |
| Mean species richness | 93.06 (0.44) | 132.02 (0.64) | 85.30 (0.77) |
| Mean per site range size | 40.86 (0.66) | 41.62 (0.83) | 45.14 (0.90) |
| Vsi | 24.08 | 26.77 | 28.97 |

Numbers in parentheses are proportional values. PAM, presence–absence matrix. Vsp and Vsi, variance-ratio parameters for species and sites, respectively.

Figure 3.

Range–diversity plots for three null models using empirical data for the mammal fauna of central Mexico, by species (a, c, e) and by sites (b, d, f). (a, b) Scattered ranges simulation retaining the empirical species-richness frequency distribution. (c, d) Scattered ranges simulation retaining the empirical range-size frequency distribution. (e, f) Cohesive ranges simulation using the spreading-dye algorithm, retaining the empirical range-size frequency distribution. Plots by species show proportional range sizes (ordinates) and the average proportional species richness within the ranges (abscissas); histograms on top and on the right-hand side show the frequency distributions of those variables; the solid curved line marks the upper theoretical limit for points; the vertical dashed line corresponds to the mean proportional species richness of the sites; and the hyperbolic dashed curves are lines of equal covariance among species. Plots by sites show proportional species richness (ordinates) and the average proportional range size of the species occurring in each site (abscissas); the other elements of the graph correspond to those described for the plots by species.

Figure 4.

Frequency distribution of the values of Schluter's variance-ratio parameter by species (a) and by sites (b) corresponding to two null models using data for the mammals of central Mexico. The left-hand slim histogram in each panel corresponds to the simulations using scattered ranges and retaining the empirical range-size frequency distribution; the right-hand histogram in each case corresponds to the simulations using the spreading-dye algorithm to model cohesive ranges. Numbers and arrows show the value and location of the empirical values. Histograms show the results of 1000 iterations of each simulation.

Each region consists of a set of half-degree quadrats in which the distribution of mammals was used to build the corresponding presence–absence matrices. The first region was located in central Mexico, a very heterogeneous area located in the transitional zone between the Nearctic and Neotropical biogeographic realms; the second region included parts of the Isthmus of Tehuantepec in south-eastern Mexico, also a highly heterogeneous area lying on the transitional zone but with a stronger component of Neotropical influence; the third region included the Yucatan Peninsula, a relatively homogeneous area of full Neotropical composition. The central Mexico region included 62 half-degree quadrats, while the other two regions included 50 quadrats each. Distributional data were extracted from the database described in Arita et al. (1997), and more details on the scaling of diversity patterns in these three regions can be found in a previous publication (Arita & Rodríguez, 2002).

In Fig. 2, the three regions are shown in order of increasing f* value (or, equivalently, in order of decreasing beta diversity), from central Mexico (Fig. 2a, b) to Yucatan (Fig. 2e, f). The fill of the PAM equals the summation of all range size values or the summation of all species-richness values; consequently, its magnitude is closely tied to the range-size and species-richness frequency distributions, which are shown in the right-hand panels of RD plots in Fig. 2. Histograms for the Yucatan region, for example, show a predominance of widespread species and species-rich sites, a fact that is reflected in the high f* value (Fig. 2e, f). The central Mexico region shows a more even distribution of range-size values and lower values of species richness for its sites, all of which is reflected in a lower f* (Fig. 2a, b). The Isthmus region is intermediate between the Yucatan and the central Mexico cases (Fig. 2c, d), with a range-size frequency distribution skewed to small ranges, but not as extreme as for the Yucatan region (Fig. 2e, f).

The position of points relative to the dashed vertical line depends on the degree of association among species or the degree of similarity among sites. In the RD plots by sites for the three regions, points are located to the right of the vertical line, with several points going farther than the +0.1 isocovariance line (Fig. 2b, d, f), which indicates that all sites show a positive average covariance with other sites. This is also shown by the high variance ratio by sites (Vsi > 24 in the three cases, Table 3). In the plots by species, points tend to lie to the right of the vertical line, but the tendency is much stronger in the Isthmus region (Fig. 2c) than in the central Mexico or Yucatan regions (Fig. 2a, e). Notice that in the case of the Yucatan (Fig. 2e), there are several points lying at the very top of the plot, coinciding with the vertical line. This pattern is corroborated by the variance-ratio values by species (Vsp), which are > 1.0 in the three cases, but higher for the Isthmus region (Table 3).

The central Mexico mammal fauna (Fig. 2a, b) is a combination of widespread and restricted taxa that generates a pattern of low average local richness but high regional richness (i.e. a high beta diversity). Covariance among species (association) is positive but low, and covariance among sites (similitude) is also low. Several mammalian species in the Isthmus region are widespread, but the region also harbours many species with restricted distributions. This pattern generates sites with local species-richness values that are higher than those for central Mexico but whose aggregate richness is lower, indicating a lower beta diversity (Fig. 2c, d). Finally, the Yucatan region consists of sites containing species that mostly occur all over the peninsula, generating a pattern of high local species richness, but a very low beta diversity (Fig. 2e, f).

Null models and the effect of range cohesion

Null models have the purpose of contrasting real-world assemblages against hypothetical patterns generated by randomizing some variables while retaining the empirical values for other parameters (Gotelli & McGill, 2006). We used three null models that have been shown to generate contrasting patterns when examined with RD plots (Borregaard & Rahbek, 2010; Villalobos & Arita, 2010).

In our first null model, we maintained the empirical column sums, that is, we retained the original species-richness frequency distribution but assigned sites to species at random without replacement. We did this by permuting the zeroes and ones in each column, so we conserved the empirical fill of the matrix and, consequently, the original Whittaker beta diversity. The range size values, in contrast, changed with this procedure, and so did the variance–covariance matrices both for species and for sites.
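The column-wise permutation can be sketched in a few lines. This is an illustration under stated assumptions, not the program used for the analyses: the PAM is taken to have species as rows and sites as columns, the function name `null_fixed_richness` and the toy matrix are hypothetical, and `Generator.permuted` requires NumPy 1.20 or later.

```python
import numpy as np

def null_fixed_richness(pam, rng):
    """Shuffle presences independently within each column (site).

    Column sums (species richness of sites) and the total fill are
    preserved; row sums (range sizes) are free to change.
    """
    # axis=0 permutes the entries of every column independently.
    return rng.permuted(np.asarray(pam), axis=0)

rng = np.random.default_rng(0)

# Hypothetical toy PAM: rows = species, columns = sites.
pam = np.array([
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
])

simulated = null_fixed_richness(pam, rng)

print(pam.sum(axis=0), simulated.sum(axis=0))   # identical richness values
print(pam.sum(axis=1), simulated.sum(axis=1))   # range sizes may differ
```

Swapping `axis=0` for `axis=1` would instead shuffle within rows, preserving range sizes rather than species richness.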

Figure 3(a, b) shows the results of one run of this null model applied to the central Mexico region. Notice that the position of the vertical line is identical to that in Fig. 2(a, b) (corresponding to f* = 0.44), and that the species-richness frequency distribution is the same in both cases (right-hand panel in Figs 2b & 3b). In contrast, the range-size frequency distribution (Fig. 3a, right panel), the frequency distributions of the range richness and per site range parameters (Fig. 3a, b, top panels), and the position of points are all changed. The randomization process generated a system in which the covariance among sites was zero, a pattern shown in the RD plot by sites in the arrangement of points along the vertical dashed line (Fig. 3b) and by the value of the variance-ratio parameter by sites, whose average across 1000 iterations of the model was practically equal to 1.0 (Vsi = 0.998, with variance = 0.009), contrasting with the empirical value (Vsi = 24.08, Table 1). Species also showed a lower variance ratio in the simulations than in the real-world system (Vsp = 4.69, mean for 1000 iterations, variance = 5.13 × 10^−5; Vsp = 7.44, empirical value; Table 1). This means that in the simulations, species had a tendency to overlap less than in the real world, but never attained total independence.

Our second null model was mathematically identical to the first one, but with the roles of sites and species inverted. We retained the empirical range-size frequency distribution (row totals) and permuted the entries within each row of the PAM to simulate the random assignment, without replacement, of species to sites. This model also retains the empirical f*, so the position of the vertical line is not changed (Fig. 3c, d, corresponding to the central Mexico region). In the RD plot by species (Fig. 3c), the range-size frequency distribution (right panel) is unchanged, but points are arranged along the vertical line, showing that the average covariance among species is close to zero, as a consequence of the randomization procedure. This pattern is also shown by the variance ratio being practically equal to 1 (Vsp = 0.994 for 1000 iterations, Fig. 4a left-hand histogram).

Sites showed less variation in species richness than in the real-world system (histograms in the right panel of Figs 2b & 3d) but had a strong positive covariance (similitude in species composition), as shown by the points in Fig. 3(d) being concentrated on the right-hand side of the plot and by the value of the variance ratio for sites (Vsi = 23.62, mean for 1000 iterations, Fig. 4b left-hand histogram). However, these Vsi values are less than 24.08, the empirical value for the system, meaning that species in the simulations show less overlap in their distributions than in the empirical system (Fig. 4b).

In the third null model we retained the empirical range-size frequency distribution but simulated ranges as random cohesive units by using the spreading-dye algorithm as described by Jetz & Rahbek (2002). For each species, we started with a randomly located site; then, we filled the available adjacent cells until the count of sites equalled the empirical range size of the species. The model generated higher covariance values among species than in the real-world systems. This is shown by the significantly higher variance ratio in the simulations (Vsp = 14.02, mean of 1000 iterations) than in the real-world system (Vsp = 7.44, Fig. 4a right-hand histogram). This tendency can also be seen in the RD plot by species (Fig. 3e). Sites also showed a higher covariance in the simulations than in the empirical dataset. In the simulations, the variance ratio was significantly higher (Vsi = 24.56, average for 1000 iterations of the model; Vsi = 24.08, empirical data) and points aggregated to the right in the RD plot by sites (Fig. 3f). These patterns show that real-world species tend to co-occur less frequently than expected if ranges are modelled as cohesive units, but more frequently than expected from scattered-ranges models (Arita & Rodríguez-Tapia, 2009).
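The range-growing step described above can be illustrated with a short sketch. It makes several simplifying assumptions that are not stated in the text: sites form a complete rectangular grid, adjacency is defined by shared edges (four neighbours), and each new cell is drawn uniformly from the unoccupied cells adjacent to the growing range. The function names and the example range sizes are hypothetical, and the published algorithm may differ in these details.

```python
import numpy as np

def neighbours(cell, grid_shape):
    """Yield the edge-adjacent cells of `cell` that lie inside the grid."""
    r, c = cell
    n_rows, n_cols = grid_shape
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n_rows and 0 <= cc < n_cols:
            yield (rr, cc)

def spreading_dye_range(grid_shape, range_size, rng):
    """Grow one cohesive range of `range_size` cells from a random seed cell."""
    n_rows, n_cols = grid_shape
    if not 1 <= range_size <= n_rows * n_cols:
        raise ValueError("range_size must fit inside the grid")
    occupied = np.zeros(grid_shape, dtype=bool)
    start = (int(rng.integers(n_rows)), int(rng.integers(n_cols)))
    occupied[start] = True
    frontier = set(neighbours(start, grid_shape))   # unoccupied adjacent cells
    while occupied.sum() < range_size:
        candidates = sorted(frontier)               # sorted for reproducibility
        cell = candidates[rng.integers(len(candidates))]
        occupied[cell] = True
        frontier.discard(cell)
        frontier.update(n for n in neighbours(cell, grid_shape)
                        if not occupied[n])
    return occupied

rng = np.random.default_rng(42)
grid_shape = (6, 8)                  # hypothetical 6 x 8 grid of sites
range_sizes = [1, 5, 12, 30]         # hypothetical empirical range sizes

# One row per species, one column per grid cell (flattened row by row).
null_pam = np.array([
    spreading_dye_range(grid_shape, size, rng).ravel().astype(int)
    for size in range_sizes
])

print(null_pam.sum(axis=1))          # row sums reproduce range_sizes
```

Because every added cell touches the cells already occupied, each simulated range is a single connected patch, while the range-size frequency distribution is retained exactly.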

Measures of central tendency and dispersion

When quantifying the species richness of sites and the range size of species through a PAM, mathematical relationships determine limits to the possible values that diversity and distribution components can attain. Our theoretical developments and null models, however, show that the species-richness frequency distribution cannot be fully predicted if only the range-size frequency distribution is known. The converse is also true: the species-richness frequency distribution sets limits to, but does not fully determine, the range-size frequency distribution.

The proportional fill of a PAM (inline image), or equivalently Whittaker's beta β = (f*)^−1, determines the central tendency of points in RD plots when the mean covariance is zero, but not their dispersion. A way to visualize this is to imagine a system in which the general parameters of the system (inline image, inline image, f*, β) are completely determined. Imagine now that we can move, distort, and even fragment the ranges of species with the only restriction that we retain their size. No matter how extreme our actions are, the values of the parameters mentioned above do not change. A direct consequence of this thought experiment is that Whittaker's index, despite being the most commonly used measure of beta diversity (Koleff et al., 2003; Tuomisto, 2010a,b; Anderson et al., 2010), is insensitive to transformations of the PAM that leave its dimension and fill constant (Arita & Rodríguez, 2002). In contrast, the manipulation of ranges implies changes in the parameters of variation around the mean, for example the variance–covariance matrices, the shape of the species-richness frequency distribution, the horizontal location of points in RD plots and Schluter's variance-ratio parameters.
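The thought experiment can be reproduced numerically. The sketch below is only an illustration, assuming a PAM with species as rows and sites as columns; the toy matrix and the helper `summarize` are hypothetical. Ranges are "fragmented" by shuffling presences within each row, which keeps every range size fixed.

```python
import numpy as np

def summarize(pam):
    """Return central-tendency and dispersion descriptors of a PAM."""
    n_species, n_sites = pam.shape
    richness = pam.sum(axis=0)
    fill_prop = pam.sum() / (n_species * n_sites)
    return {
        "fill_prop": fill_prop,
        "whittaker_beta": 1.0 / fill_prop,
        "mean_richness": richness.mean(),
        "var_richness": richness.var(),
    }

rng = np.random.default_rng(1)

# Hypothetical PAM in which four species share exactly the same three sites.
pam = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
])

# Rearrange each range at random while retaining its size (row sums fixed).
scattered = rng.permuted(pam, axis=1)

before = summarize(pam)
after = summarize(scattered)

for key in before:
    print(key, before[key], after[key])
```

The proportional fill, Whittaker's beta and the mean species richness are identical before and after the rearrangement, whereas the variance of species richness is free to change; in this example it cannot exceed its starting value, because the original matrix already places all sites at the extremes of possible richness.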

CONCLUSION

Species richness and range size are two sides of the same coin, that is, they are equally valid parameters for measuring biological diversity. A complete comprehension of the assemblage will require an analysis of parameters of central tendency and dispersion for both species richness and range size. RD plots and their associated parameters can be a powerful instrument in such an endeavour.

ACKNOWLEDGEMENTS

We thank R. K. Colwell, N. J. Gotelli, T. Rangel, P. Trejo and F. Villalobos for helpful discussion. G. Rodríguez-Tapia provided efficient technical support and crucial help in computer-based analyses. Financial support was provided by DGAPA-UNAM, PAPIIT program and by Microsoft Research KUCR no. 47780 for J. Soberón.

BIOSKETCHES

Héctor T. Arita is a research professor at the National University of Mexico, where he teaches community ecology, ecological statistics and conservation biology. He is a macroecologist specializing in mathematical models of diversity and distribution.

Editor: Tim Blackburn
