Evaluating the performance of species richness estimators: sensitivity to sample grain size



    Corresponding author
    1. Departamento de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales (CSIC), C/José Gutiérrez Abascal, 2, Madrid 28006, Spain;
    2. Departamento de Ciências Agrárias, CITAA, Universidade dos Açores, Campus de Angra, Terra-Chã, Angra do Heroísmo 9701 851, Terceira (Açores), Portugal; and

    1. Departamento de Ciências Agrárias, CITAA, Universidade dos Açores, Campus de Angra, Terra-Chã, Angra do Heroísmo 9701 851, Terceira (Açores), Portugal; and

    1. Departamento de Ciências Agrárias, CITAA, Universidade dos Açores, Campus de Angra, Terra-Chã, Angra do Heroísmo 9701 851, Terceira (Açores), Portugal; and
    2. BIOME Group, Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK

Joaquín Hortal, Departamento de Biodiversidad y Biología Evolutiva. Museo Nacional de Ciencias Naturales (CSIC), C/José Gutiérrez Abascal, 2, Madrid 28006, Spain. E-mail: jhortal@mncn.csic.es


  1. Fifteen species richness estimators (three asymptotic based on species accumulation curves, 11 nonparametric, and one based on the species–area relationship) were compared by examining their performance in estimating the total species richness of epigean arthropods in the Azorean Laurisilva forests. Data obtained with standardized sampling of 78 transects in natural forest remnants of five islands were aggregated in seven different grains (i.e. ways of defining a single sample): islands, natural areas, transects, pairs of traps, traps, database records and individuals, to assess the effect of using different sampling units on species richness estimations.
  2. Estimated species richness scores depended both on the estimator considered and on the grain size used to aggregate data. However, several estimators (ACE, Chao1, Jackknife1 and 2 and Bootstrap) were precise in spite of grain variations. Weibull and several recent estimators [proposed by Rosenzweig et al. (Conservation Biology, 2003, 17, 864–874), and Ugland et al. (Journal of Animal Ecology, 2003, 72, 888–897)] performed poorly.
  3. Estimations developed using the smaller grain sizes (pairs of traps, traps, records and individuals) presented similar scores for a number of estimators (the above-mentioned plus ICE, Chao2, Michaelis–Menten, Negative Exponential and Clench). The estimations from those four grain sizes were also highly correlated.
  4. Contrary to other studies, we conclude that most species richness estimators may be useful in biodiversity studies. Owing to their inherent formulas, several nonparametric and asymptotic estimators are insensitive to differences in the way the samples are aggregated. Thus, they could be used to compare species richness scores obtained from different sampling strategies. Our results also indicate that species richness estimations coming from small grain sizes can be directly compared, and that other estimators could give more precise results in those cases. We propose a decision framework based on our results and on the literature to assess which estimator should be used to compare species richness scores of different sites, depending on the grain size of the original data and on the kind of data available (species occurrence or abundance data).


Species richness is the most commonly used biodiversity indicator (see Gaston 1996 for a review) for conservation (e.g. Margules, Nicholls & Pressey 1988; Conroy & Noon 1996; Kerr 1997; van Jaarsveld et al. 1998), ecological research (e.g. Tilman, Wedin & Knops 1996; Naeem et al. 1996; Brown et al. 2001) and macroecology (e.g. Currie 1991; Gaston 2000; Whittaker, Willis & Field 2001). However, complete inventories of the fauna at a given place, for a specific community or geographical area are often exceedingly hard to get. In addition, biodiversity data suffer from heterogeneity in sampling strategies and/or sample size. Moreover, it is well known that differences in the characteristics of biological assemblages produce differences in sampling effectiveness. Thus, when the same sampling effort with standardized techniques is carried out in different areas and/or community types, sampling success may not be always the same, leading to important biases in the total species richness inventoried at each site.

Owing to this, studies involving comparisons of species richness among different areas, sites or communities need to use extrapolation or rarefaction techniques to ‘standardize’ richness data (see Palmer 1990, 1991; Baltanás 1992; Soberón & Llorente 1993; Colwell & Coddington 1994; Walther et al. 1995; Walther & Morand 1998; Gotelli & Colwell 2001; Walther & Martin 2001; Walther & Moore 2005). There are many methodologies currently in use for this task, from which four main groups can be distinguished:

Most times, species accumulation and species–area curves are confounded (see, e.g. the debate in Scheiner 2003, 2004 and Gray, Ugland & Lambshead 2004, 2005). For clarity, we consider species accumulation curves those where the area of each sample unit is not used to build the curve, whereas species–area curves include the area of each sample unit (e.g. a forest remnant, see below) in the formulation of the curve (see Colwell et al. 2004).

To date, numerous assessments of the performance of several of these estimators under different conditions and/or sample sizes have been carried out (e.g. Chazdon et al. 1998; Keating et al. 1998; Peterson & Slade 1998; Walther & Morand 1998; Chiarucci, Maccherini & De Dominicis 2001; Walther & Martin 2001; Brose 2002; Longino, Coddington & Colwell 2002; Borges & Brown 2003; Brose, Martinez & Williams 2003; Chiarucci et al. 2003; Melo et al. 2003; Brose & Martinez 2004; O’Hara 2005; Jiménez-Valverde et al. 2006; see review in Walther & Moore 2005). These works seek the most adequate estimator, that is, the one with the lowest estimation bias (deviation from the true richness value) and the highest precision (i.e. the smallest random error), and thus the highest accuracy (the combination of bias and precision) (see Walther & Moore 2005 for a review).

The idea behind such extensive evaluation work is to find estimators that could be used to compare species richness scores from different sites reliably. Chazdon et al. (1998) defined three features for an ideal richness estimator: (1) independence of sample size (amount of sampling effort carried out); (2) insensitivity to unevenness in species distributions; and (3) insensitivity to sample order. The above-mentioned studies used results from similar survey methods (thus, similar sampling units) with different intensities and/or sampling success to determine the adequacy of different estimators. However, an assessment of how the different estimators perform when the units used to describe sampling effort differ among the surveyed places is yet to be done. Different sampling strategies are often used to assess species richness in natural areas or large regions. To compare richness values obtained from different survey strategies (which is often the case for macroecology studies), we need a measure with low sensitivity to this source of variation, independently of its success in determining the real number of species in a given place. The scores obtained with such an estimator would allow direct comparison of species richness between sites surveyed with different sampling methodologies, as well as with unequal sampling efforts (provided that the sampling coverage is sufficiently large). Thus, we can add a fourth feature to those proposed by Chazdon et al. (1998) for an ideal estimator: (4) insensitivity to heterogeneity in the sample units used among studies. Following the recommendations of Whittaker et al. (2001), hereafter we will use the term grain size (see Levins 1968) to define the sampling effort unit (e.g. traps, transects, cells in a geographical grid, or landscape patches).

In this work, we analyse the effect of variation in grain size on species richness estimations. We evaluate the accuracy of the predictions obtained with many of the estimators presently available when different strategies are used to group the same dataset into different grains (i.e. sample units). To do this, we have used data from standardized surveys of the epigean arthropod community of native forests from the five major Azorean islands, in order to estimate the total number of arthropod species that occur in the Archipelago (excluding small islands), using seven different grain sizes.

Biological dataset

The dataset used for this work comes from a study conducted in the Azorean archipelago (North Atlantic; 37–40°N, 25–31°W), which comprises nine main islands and some small islets. Aligned on a WNW–ESE axis, these islands extend for about 615 km across the Mid-Atlantic Ridge, which separates the western group (Flores and Corvo) from the central (Faial, Pico, São Jorge, Terceira and Graciosa) and the eastern (São Miguel and Santa Maria) groups. All islands are of relatively recent volcanic origin, ranging from 250 000 years bp (Pico) to 8·12 Ma bp (Santa Maria) (Nunes 1999). The climate is temperate oceanic, with high relative atmospheric humidity (reaching 95% in high altitude native semitropical evergreen laurel forest), as well as limited temperature fluctuations throughout the year. The predominant native vegetation is ‘Laurisilva’, a humid evergreen broadleaf and microphyllous (hereafter short-leaf) laurel type forest. Dominant trees and shrubs include short-leaf Juniperus brevifolia and Erica azorica (both endemics), and the broadleaf species Ilex perado ssp. azorica (endemic), Laurus azorica (native) and the shrub Vaccinium cylindraceum (endemic) (Silva et al. 2005).

On seven of the Azorean islands (excluding the smaller and more disturbed Graciosa and Corvo) native vegetation was surveyed within Natural Forest Reserves and/or NATURA 2000 protected areas using standardized sampling protocols (see Borges et al. 2005a, 2006). During the summers of 1999–2003, 150 m × 5 m transects were randomly placed in each fragment of native protected forest. Simulating a species–area relationship of 0·35 slope (a 10× increase in area implies the duplication in the number of species), transects were set up using a logarithmic scale, placing two transects in 10 ha fragments, four transects in 100 ha fragments and eight transects in 1000 ha fragments. Therefore, larger reserves received higher sampling effort (Borges et al. unpublished data). For the present study, only five islands were covered, those with at least two native protected areas (see Table 1) and a total of 78 transects located within 17 forest remnants were selected.
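The logarithmic allocation of transects described above can be illustrated with a short sketch. The function names are hypothetical, and the closed-form rule is only an interpolation assumed from the three reference points given in the text (2, 4 and 8 transects for 10, 100 and 1000 ha); the number of transects actually available in each reserve is the one listed in Table 1.

```python
import math


def richness_ratio(area_factor: float, z: float) -> float:
    """Factor by which richness grows under a power-law species-area
    relationship S = c * A**z when area is multiplied by area_factor."""
    return area_factor ** z


def transects_for_area(area_ha: float) -> float:
    """Nominal number of transects, assuming the number doubles with each
    tenfold increase in area (2 at 10 ha, 4 at 100 ha, 8 at 1000 ha)."""
    return 2.0 ** math.log10(area_ha)


if __name__ == "__main__":
    for area in (10, 100, 1000):
        print(area, "ha ->", round(transects_for_area(area)), "transects")
    # A slope of 0.35 gives a bit more than a doubling per tenfold area increase.
    print(round(richness_ratio(10, 0.35), 2))
```

Under this rule the sampling effort follows the expected increase in richness with area rather than area itself, which is why larger reserves receive more, but not proportionally more, transects.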

Table 1.  List of the studied 11 natural forest reserves, one geological reserve (*) and five additional areas (†) with their name, code, island (PIC = Pico; FLO = Flores; SJG = São Jorge; SMG = São Miguel; TER = Terceira), number of available transects, area and altitude (minimum and maximum). When nature reserves correspond to different habitats, area and altitude are given just for their laurisilva forest remnants
No. | Name | Code | Island | Transects | Area (ha) | Altitude (m)
1 | Morro Alto e Pico da Sé | FLO-MA | FLO | 8 | 1558 | 300–915
2 | Caldeiras Funda e Rasa | FLO-FR | FLO | 4 | 459 | 350–600
3 | Mistério da Prainha | PIC-MP | PIC | 8 | 643 | 425–841
4 | Lagoa do Caiado | PIC-LC | PIC | 4 | 131 | 800–939
5 | Caveiro | PIC-C | PIC | 4 | 199 | 850–950
6 | Pico Pinheiro | SJG-P | SJG | 2 | 175 | 600–780
7 | Pico Frades – Topo | SJG-T | SJG | 2 | 50 | 600–942
8 | Pico do Galhardo | TER-GH | TER | 4 | 66 | 550–700
9 | Caldeira do Guilherme Moniz | TER-GM | TER | 4 | 408 | 455–470
10 | Terra Brava | TER-TB | TER | 8 | 143 | 600–750
11 | Serra de Sta Barbara e M. Negros | TER-SB | TER | 12 | 1274 | 550–1025
12 | Biscoito da Ferraria | TER-BF | TER | 6 | 391 | 475–808
13 | Algar do Carvão* | TER-AC | TER | 2 | 28 | 629
14 | Matela | TER-M | TER | 2 | 25 | 350–393
15 | Graminhais | SMG-G | SMG | 2 | 27 | 850–925
16 | Atalhada | SMG-A | SMG | 2 | 15 | 425–530
17 | Pico da Vara | SMG-PV | SMG | 4 | 245 | 400–1103

Along each transect, 30 pitfall traps were set in the ground at 5-m intervals for at least a 2-week period (see also Borges et al. 2005a). Fifteen traps were half-filled with a nonattractive solution with a small proportion of ethylene glycol, and the other half with a general attractive solution (Turquin) made of dark beer and some preservatives (see Turquin 1973). Traps were placed alternately along each transect. With such a procedure, it was expected not only to survey the relative abundance (although biased by their mobility) of each species sampled (with nonattractive traps), but also to capture the maximum number of species (with attractive traps).

All Araneae, Opiliones, Pseudoscorpiones and insects (excluding Collembola, Diplura, Diptera and Hymenoptera) were first sorted into morphospecies by students under supervision of a trained taxonomist (PB) (see Oliver & Beattie 1993, 1996). Later, the morphospecies were identified by one of us (PB) using voucher specimens already available in situ, and all unknowns were sent to several taxonomists for species identification (see Acknowledgements).

At the end of this survey, a total of 22 815 individuals, pertaining to 205 epigean arthropod species (or morphospecies) were captured. As there is no complete checklist available for the studied fauna, we provide a comparison with an expert ‘guesstimate’ of the total number of species. According to his expertise in such fauna, one of us (PB) has arbitrarily extrapolated the total number of epigean species for each one of the main arthropod groups present in the natural forests of the five studied islands. The recent checklist of the Azorean arthropods (Borges et al. 2005b), developed from an exhaustive survey of relevant literature and collections, was taken as a starting point. To get a ‘guesstimate’ for each group current numbers in such a checklist were taken into account, but also the degree of knowledge of each group at the Azores and PB's expertise. Only native forest fauna was considered, not counting introduced species and/or pasture dwelling species (see also Borges 1999). According to PB's knowledge, 306 species could be an approximate figure for the total number of species present in such forests (see Table 2). Such a guesstimate should not be taken as the true number of arthropod epigean species in Azorean Laurisilva, but only as a milestone to identify abnormally biased richness estimations.

Table 2.  Guesstimate (expert extrapolation) of total species richness of the epigean arthropod fauna of natural forest remnants in the five Azorean islands studied (São Miguel, Terceira, Pico, Flores and São Jorge). Sobs is the number of epigean species recorded during the survey
Group | Sobs | Guesstimate
Coleoptera | 82 | 126
Araneae | 50 | 70
Blattaria | 1 | 1
Chilopoda | 6 | 7
Dermaptera | 2 | 3
Diplopoda | 12 | 20
Hemiptera | 45 | 70
Opiliones | 2 | 2
Pseudoscorpiones | 3 | 4
Thysanura | 2 | 3

Data grouping

To study how variations in grain size affect species richness estimations, we have grouped data into samples using seven different strategies.

  1. Islands: grouping all individuals captured in each island as a single sample (n = 5).
  2. Natural areas: using all data from each forest remnant as a sample (n = 17).
  3. Transects: comprising all the individuals captured in each transect (n = 78).
  4. Pairs of traps (herein, ‘2Traps’): combining the data of each pair of Turquin and ethylene pitfalls into a single measure (e.g. traps 1 and 2; traps 3 and 4; … of a given transect), assuming that they constitute a heterogeneous though complementary sample unit, comprising two different capture methods (see the discussion of heterogeneous sampling units in Jiménez-Valverde & Lobo 2005) (n = 1116).
  5. Traps: using each pitfall trap separately (n = 2232).
  6. Database records (herein, ‘Records’): where all the individuals of the same species present in a single trap give rise to a single sample (e.g. if a trap contains two individuals of the same species, a single sample is used, with an abundance value of 2; if, on the contrary, these two individuals belong to two different species, two different samples occur, each one with an abundance value of 1; see examples in Hortal, Lobo & Martín-Piera 2001; Lobo & Martín-Piera 2002; and Martín-Piera & Lobo 2003) (n = 8666).
  7. Individuals: where each individual captured produces a sample (n = 22 815).

Here, it is important to point out that both Records and Individuals provide only incidence measures (although species abundance is directly included in Individuals, and indirectly in Records). Thus, the performance of abundance-based estimators (see below) could be less reliable for these two grains.
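As an illustration of the seven grouping strategies, the following sketch aggregates one set of capture records into each grain. The record layout, the toy data and the function name are hypothetical; only the grouping rules follow the descriptions above (for example, traps 1 and 2, 3 and 4, … of a transect form a pair).

```python
# Hypothetical capture records, one row per species per trap:
# (island, natural area, transect, trap number, species, individuals)
RECORDS = [
    ("TER", "TER-TB", "T01", 1, "sp_a", 2),
    ("TER", "TER-TB", "T01", 2, "sp_b", 1),
    ("TER", "TER-TB", "T01", 3, "sp_a", 1),
    ("PIC", "PIC-LC", "T02", 1, "sp_a", 1),
    ("PIC", "PIC-LC", "T02", 1, "sp_c", 3),
]

GRAINS = ("Islands", "Natural areas", "Transects", "2Traps",
          "Traps", "Records", "Individuals")


def group_by_grain(records, grain):
    """Return {sample key: {species: abundance}} for the chosen grain."""
    samples = {}
    for index, (island, area, transect, trap, species, count) in enumerate(records):
        if grain == "Islands":
            keys = [island]
        elif grain == "Natural areas":
            keys = [area]
        elif grain == "Transects":
            keys = [transect]
        elif grain == "2Traps":
            keys = [(transect, (trap + 1) // 2)]  # traps 1-2, 3-4, ...
        elif grain == "Traps":
            keys = [(transect, trap)]
        elif grain == "Records":
            keys = [index]  # one sample per species-in-trap record
        elif grain == "Individuals":
            keys = [(index, i) for i in range(count)]  # one sample per individual
        else:
            raise ValueError(f"unknown grain: {grain}")
        share = 1 if grain == "Individuals" else count
        for key in keys:
            sample = samples.setdefault(key, {})
            sample[species] = sample.get(species, 0) + share
    return samples


if __name__ == "__main__":
    for grain in GRAINS:
        print(grain, len(group_by_grain(RECORDS, grain)))
```

The total number of individuals is the same at every grain; only the number of samples and the way species occurrences are spread among them change.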

Species richness estimators

We compared the performance of 15 different species richness estimators (see abbreviations and descriptions in Table 3).

Table 3.  Characteristics of the species richness estimators used for this analysis
  1. Abbrev. is the abbreviation used throughout this work. Type refers to the kind of method used to estimate species richness: As, asymptote value of a fitted species accumulation curve; NP, estimation using a nonparametric model; SA, species–area curve; see text for further details. Data refer to the kind of data required for the estimation: In, incidence of the species in each sample; Ab, abundance of the species in each sample. Program refers to the software used for computing (estimateS 7·0 –Colwell 2004, available at http://purl.oclc.org/estimates; STAT – any common statistical program, in this case StatSoft 2001; ws2m 3·2 (beta) –Turner et al. 2000, available at http://eebweb.arizona.edu/diversity/; Ugland –Ugland et al. 2003, Excel® spreadsheet available at http://folk.uio.no/johnsg/main.htm). See text for further explanations.

Abbrev. | Estimator | Type | Data | Program | Reference
Clench | Estimation of Michaelis–Menten function asymptote | As | In | estimateS/STAT | Clench (1979) in Soberón & Llorente (1993)
Exp Neg | Estimation of Negative Exponential function asymptote | As | In | estimateS/STAT | Miller & Wiegert (1989) in Soberón & Llorente (1993)
Weibull | Estimation of Weibull function asymptote | As | In | estimateS/STAT | Brown & Mayer (1988) in Flather (1996)
MM | Nonparametric Michaelis–Menten richness estimator | As/NP | In | estimateS | Raaijmakers (1987) in Colwell (2004)
ACE | Abundance-based Coverage Estimator of species richness | NP | Ab | estimateS | Chao & Lee (1992); Chao et al. (2000); Chazdon et al. (1998) in Colwell (2004)
ICE | Incidence-based Coverage Estimator of species richness | NP | In | estimateS | Lee & Chao (1994); Chao et al. (2000); Chazdon et al. (1998) in Colwell (2004)
Chao1 | Abundance-based estimator of species richness | NP | Ab | estimateS | Chao (1984) in Colwell (2004)
Chao2 | Incidence-based estimator of species richness | NP | In | estimateS | Chao (1984, 1987); Colwell (2004)
Jackknife1 | First-order Jackknife richness estimator | NP | In | estimateS | Burnham & Overton (1978, 1979); Heltshe & Forrester (1983) in Colwell (2004)
Jackknife2 | Second-order Jackknife richness estimator | NP | In | estimateS | Smith & van Belle (1984) in Colwell (2004)
Bootstrap | Bootstrap richness estimator | NP | In | estimateS | Smith & van Belle (1984) in Colwell (2004)
F3 | Extrapolation nonparametric estimator 3 | NP | Ab | ws2m | Rosenzweig et al. (2003)
F5 | Extrapolation nonparametric estimator 5 | NP | Ab | ws2m | Rosenzweig et al. (2003)
F6 | Extrapolation nonparametric estimator 6 | NP | Ab | ws2m | Rosenzweig et al. (2003)
Ugland | Sample based species–area curve | SA | Ab | Ugland | Ugland et al. (2003)

For Clench, Negative Exponential and Weibull estimators, species richness is calculated as the asymptote value of a function fitted to the smoothed species accumulation curve provided by estimateS 7·0 (100 randomizations; Colwell 2004). This ideal curve represents an unbiased description of the sampling process, where possible effects due to the order in which the samples have been taken or listed are removed by randomizing their order of entrance in the curve. We used STATISTICA to fit each function to the data (see function equations in Soberón & Llorente 1993 for Clench and Negative Exponential, and Flather 1996 for Weibull), and then calculated the asymptote value from the parameters so obtained (see a description of the process in Jiménez-Valverde & Hortal 2003 or Hortal et al. 2004). Michaelis–Menten is a nonparametric formulation of the Clench one, and is calculated in estimateS as the mean score after 100 randomizations (Colwell 2004). As it presents slight differences from the scores obtained with Clench, we have included it in our analyses.
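The two steps described above (smoothing the accumulation curve by randomizing sample order, then taking the asymptote of a fitted function) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the Clench function is written as S(n) = a·n / (1 + b·n) with asymptote a/b, and the fit uses a simple linearization (1/S against 1/n) as a stand-in for the nonlinear fitting carried out in a statistical package.

```python
import random


def smoothed_accumulation(samples, randomizations=100, seed=0):
    """Mean cumulative richness over random sample orders.

    `samples` is a list of sets of species names, one set per sample.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(randomizations):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        for i, sample in enumerate(order):
            seen |= sample
            totals[i] += len(seen)
    return [t / randomizations for t in totals]


def clench(n, a, b):
    """Clench (Michaelis-Menten type) function; its asymptote is a / b."""
    return a * n / (1.0 + b * n)


def fit_clench_linearised(curve):
    """Ordinary least squares on 1/S = (1/a) * (1/n) + b/a.

    `curve[i]` is the mean richness after i + 1 samples. Returns (a, b).
    Assumption: a linearised fit replaces the nonlinear fit of the source.
    """
    xs = [1.0 / n for n in range(1, len(curve) + 1)]
    ys = [1.0 / s for s in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    a = 1.0 / slope
    b = intercept * a
    return a, b


if __name__ == "__main__":
    toy = [{"a", "b"}, {"b", "c"}, {"a"}, {"d"}, {"a", "e"}, {"b"}]
    curve = smoothed_accumulation(toy)
    a, b = fit_clench_linearised(curve)
    print("estimated asymptote:", a / b)
```

Because the linearization weights the first few samples heavily, its asymptote can differ from that of a nonlinear fit to the same curve.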

Together with the former, we evaluated another 10 nonparametric estimators. Seven are the ‘classical’ estimators provided by the estimateS software: ACE, ICE, Chao1, Chao2, Jackknife1, Jackknife2, Bootstrap (see Table 3 for references), and have been widely used and studied (e.g. Chazdon et al. 1998; Brose et al. 2003; Chiarucci et al. 2003). They are available in several programs (e.g. estimateS; spade, Chao & Shen 2003–05; or species diversity and richness, Henderson & Seaby 2002). Here, it is important to check the estimator formulas used by each program; while several estimators have been updated, these changes are not always available (e.g. the Chao1 bias-corrected formula is used by current versions of estimateS and spade, but not by species diversity and richness). The other three (F3, F5 and F6) are available in the ws2m software (Turner, Leitner & Rosenzweig 2000), but their adequacy has not yet been formally tested against other estimators, except in the paper where they were proposed (Rosenzweig et al. 2003). This is also the case for the only species–area-based estimator we test (Ugland), for which the only application available is its primary source (Ugland et al. 2003).
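For reference, a minimal sketch of four of the ‘classical’ nonparametric estimators is given below, using the formulas as they are commonly presented in the literature (e.g. Colwell & Coddington 1994). The function names are hypothetical, and this is not the code of any of the programs mentioned above, whose implementations may differ in the details noted in the text (such as the Chao1 bias correction).

```python
def chao1(abundances, bias_corrected=True):
    """Chao1 from the pooled abundance of each species.

    f1 and f2 are the numbers of species with exactly one and two individuals.
    """
    s_obs = sum(1 for n in abundances if n > 0)
    f1 = sum(1 for n in abundances if n == 1)
    f2 = sum(1 for n in abundances if n == 2)
    if bias_corrected:
        return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))
    if f2 == 0:
        raise ValueError("classic Chao1 is undefined without doubletons")
    return s_obs + f1 ** 2 / (2.0 * f2)


def _uniques_duplicates(occurrences):
    """Observed richness, and species found in exactly one or two samples."""
    present = [c for c in occurrences if c > 0]
    q1 = sum(1 for c in present if c == 1)
    q2 = sum(1 for c in present if c == 2)
    return len(present), q1, q2


def jackknife1(occurrences, m):
    """First-order jackknife; occurrences[k] = samples containing species k."""
    s_obs, q1, _ = _uniques_duplicates(occurrences)
    return s_obs + q1 * (m - 1) / m


def jackknife2(occurrences, m):
    """Second-order jackknife for m samples (m >= 2)."""
    s_obs, q1, q2 = _uniques_duplicates(occurrences)
    return s_obs + q1 * (2 * m - 3) / m - q2 * (m - 2) ** 2 / (m * (m - 1))


def bootstrap(occurrences, m):
    """Bootstrap estimator: S_obs + sum over species of (1 - p_k) ** m."""
    present = [c for c in occurrences if c > 0]
    return len(present) + sum((1 - c / m) ** m for c in present)


if __name__ == "__main__":
    print(chao1([10, 5, 1, 1, 1, 2, 2]))
    print(jackknife1([3, 1, 1, 2], m=4), jackknife2([3, 1, 1, 2], m=4))
```

In this form Chao1 uses only the pooled abundance of each species, so regrouping the same individuals into different samples leaves it unchanged in principle, whereas the incidence-based formulas depend on the number of samples m and on how occurrences are distributed among them.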

Comparison of estimator performance

We estimated the total species richness scores for the five studied islands using the 15 estimators and the seven different grain sizes (Table 4). The eight estimators included in the estimateS package (as well as the three asymptotic ones, see Table 3) were applied to all grains. Computing times were negligible (on a Pentium Centrino 1·7 GHz with 1·25 GB RAM), except for Records and Individuals (note that high computation times are due to the randomization process, which is not necessary to obtain the results of nonparametric estimators if a single estimate is needed; R. Colwell, personal communication). Computing times for ws2m were higher, reaching 10 min in the case of 2Traps, 15 min for Traps, and 140 min for Records, and it was impossible to compute the matrix of Individuals. For the species–area curve estimator (Ugland) the samples needed to be partitioned into up to five different groups, and owing to Excel® limitations, it was only possible to use up to 240 samples. Thus, we could only apply it to two grain sizes (Natural Areas and Transects), using either the 17 Natural Areas or the 78 Transects partitioned into the five groups (islands).

Table 4.  Species richness estimated for seven different grains by the 15 estimators analysed. n refers to the number of grains that could be used for each estimator; note that F3, F5 and F6 could not be used with Individuals, and the Ugland estimator could only be applied to two grains (see text). SD is the SD of the different results obtained with each estimator. Results according to estimateS output (note that slight differences could appear between estimateS and spade estimates; A. Chao, personal communication). Nt.Ar. is Natural Areas. Ind. is Individuals. Other abbreviations as in text and Table 3
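Estimator | n | SD | Islands | Nt.Ar. | Transects | 2Traps | Traps | Records | Ind.
Clench | 7 | 45·82 | 343·6 | 277·5 | 248·7 | 223·9 | 223·3 | 223·7 | 218·4
Exp Neg | 7 | 18·59 | 238·2 | 208·2 | 197·0 | 188·7 | 188·7 | 188·8 | 187·5
Weibull | 7 | 162·81 | 244·6 | 306·0 | 319·4 | 433·4 | 412·1 | 407·3 | 744·9
MM | 7 | 57·69 | 333·0 | 253·5 | 212·7 | 184·5 | 183·5 | 183·1 | 174·1
ACE | 7 | 0·23 | 288·3 | 288·3 | 288·3 | 288·3 | 288·3 | 289·0 | 288·3
ICE | 7 | 31·99 | 372·8 | 329·6 | 299·1 | 291·9 | 288·7 | 288·7 | 288·1
Chao1 | 7 | 2·79 | 298·1 ± 34·9 | 301·8 ± 34·9 | 301·8 ± 34·9 | 301·8 ± 34·9 | 301·8 ± 34·9 | 307·2 ± 37·7 | 304·2 ± 34·9
Chao2 | 7 | 32·35 | 379·5 ± 48·3 | 363·0 ± 49·3 | 361·0 ± 53·1 | 311·2 ± 37·7 | 311·2 ± 37·7 | 307·2 ± 37·7 | 304·2 ± 34·9
Jackknife1 | 7 | 10·08 | 293·8 ± 22·6 | 287·8 ± 13·8 | 282·0 ± 11·1 | 270·9 ± 8·3 | 271·0 ± 8·3 | 271·0 ± 8·1 | 268·0 ± 7·9
Jackknife2 | 7 | 15·24 | 345·1 | 345·6 | 339·7 | 316·9 | 316·9 | 317·0 | 311·0
Bootstrap | 7 | 4·80 | 244·3 | 240·6 | 237·3 | 232·9 | 232·9 | 232·9 | 231·7
F3 | 6 | 381·23 | 1255·3 | 631·2 | 435·5 | 297·8 | 295·1 | 261·1 | n.a.
F6 | 6 | 457·98 | 1588·9 | 618·4 | 512·3 | 422·6 | 444·6 | 415·2 | n.a.
Ugland | 2 | 113·29 | n.a. | 601·1 | 440·9 | n.a. | n.a. | n.a. | n.a.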

We evaluated the effect of variations in grain size on species richness estimation in two ways. First, we evaluated the sensitivity of the estimators to such variations, in order to identify those with higher independence from grain. To do this, we used precision (i.e. the variability in the estimates obtained with different grains) and, secondarily, bias (i.e. the distance between the estimates and the guesstimate of 306 species) (see Walther & Moore 2005). The best-performing estimators can be regarded as the most suitable to compare areas sampled with different grains. Second, we used the results from the more precise estimators to analyse the suitability of using different grain sizes for comparable biodiversity studies. We characterized the relationships among the estimates obtained with different grains, to determine which groups of grains presented similar values when the same estimator is used. This way, we identified which pairs or groups of grains produce comparable estimations.
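The two performance measures can be made concrete with a small sketch that uses the ACE and Bootstrap estimates reported in Table 4 for the seven grains. Two assumptions are made here: precision is computed as the sample SD of the estimates across grains, and bias is summarised as the mean estimate minus the guesstimate of 306 species. Function names are hypothetical.

```python
from statistics import mean, stdev

GUESSTIMATE = 306  # expert extrapolation used as a reference value

# Estimates per grain (Islands, Nt.Ar., Transects, 2Traps, Traps, Records, Ind.),
# copied from Table 4.
ESTIMATES = {
    "ACE": [288.3, 288.3, 288.3, 288.3, 288.3, 289.0, 288.3],
    "Bootstrap": [244.3, 240.6, 237.3, 232.9, 232.9, 232.9, 231.7],
}


def precision(values):
    """Variability of the estimates across grains (smaller is more precise)."""
    return stdev(values)


def bias(values, reference=GUESSTIMATE):
    """Mean distance to the reference value (negative means underestimation)."""
    return mean(values) - reference


if __name__ == "__main__":
    for name, values in ESTIMATES.items():
        print(f"{name}: SD = {precision(values):.2f}, bias = {bias(values):.1f}")
```

Small differences from the SDs printed in Table 4 are to be expected, because the table lists estimates rounded to one decimal place.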

Performance of species richness estimators

Five estimators (Ugland, F3, F5, F6 and Weibull) performed clearly worse than the other 10. Ugland could only be applied to two grain sizes, showed low precision, and both of its estimates seem unrealistic when compared with the guesstimate (601·1 species using Natural Areas, and 440·9 using Transects; see Table 4). The same occurred with F3, F5 and F6, which also showed low precision and extremely biased scores at the largest grain sizes (see Table 4 and Fig. 1). Although the three were more precise and less biased at grains smaller than Transects (which are in most cases directly comparable, see Relationships among grain sizes below), their overall performance was worse than that of the other estimators. While the ranges of their estimates at these three grains are close to or higher than 30 (36·71 for F3, 56·15 for F5, and 29·44 for F6), almost all the other estimators presented variations smaller than 5·5. The only ‘classical’ estimator with a similar performance is Weibull (with a range of 26·17 in the estimates for 2Traps, Traps and Records), which also behaves erratically. While the other estimators show a more or less constant pattern of diminishing estimated scores as grain diminishes (see Fig. 2), Weibull reaches an unrealistic top value when calculated using Individuals (nearly 750 species; see Table 4). This pattern is probably due to the high effectiveness of this function in adjusting to the smoothed species accumulation curves (adjusted R² scores always higher than 99·99%; see also Jiménez-Valverde et al. 2006). Such ‘overfitting’ to the different curves obtained with each grain makes this estimator extremely dependent on grain size. Owing to their poor performance, we have excluded these five estimators from the rest of the analyses.

Figure 1.

Variability in the species richness scores calculated by each estimator. Square plots represent the mean, boxes show the SD, and whiskers provide the 95% confidence interval (1·96*SD). The left graph shows 14 of the studied estimators (note that Ugland is excluded). As F3, F5 and F6 could only be applied to six grain sizes (see text), ‘Individuals’ is excluded from the calculations of this graph. Owing to the high variability in these three estimators, the other 11 are represented in the right graph, using data for the seven grain sizes. Here, the guesstimate (306 species) is represented by a discontinuous line, and the observed richness (205 species) as dots and lines (see text). Abbreviations as in text and Table 3.

Figure 2.

Performance of the 10 less variable estimators across grain sizes. The discontinuous line shows the guesstimate, and the observed number of species is represented as dots and lines. Abbreviations as in text and Tables 3 and 4.

All the other 10 estimators showed limited variability, with SDs always less than 60 species (c. 20% of the guesstimate) (Table 4). In general, most of them showed a decreasing pattern, producing higher estimates at the larger grains, and similar estimates at the four smaller grain sizes (2Traps, Traps, Records and Individuals) (Fig. 2). The abundance-based estimators ACE and Chao1, and, to a lesser extent, Bootstrap, showed high precision, with negligible SDs (0·23, 2·79 and 4·80, respectively; see Table 4) that reflect the small variations introduced by aggregating data into samples at each grain. First- and second-order Jackknife estimators also showed great precision. In contrast, the incidence-based estimators ICE and Chao2, and the asymptotic estimators (Clench, Negative Exponential and Michaelis–Menten) showed consistently higher scores at larger grain sizes. These three asymptotic estimators, as well as Bootstrap, were also negatively biased, in some cases yielding estimates even lower than the observed number of species (Negative Exponential and Michaelis–Menten estimators; Fig. 2). Again, ACE and Chao1 showed the best performance, with scores very close to the guesstimate throughout the different grains. ICE, Chao2, and the two Jackknife estimators performed quite well at the smaller grains. Although our guesstimate must be treated with caution, all these nonparametric estimators seemed to be more accurate, and thus seem a better option than the asymptotic estimators and Bootstrap.

The results of all 10 estimators were highly dependent on the level of sampling effort (sample coverage), providing lower richness scores when calculated from a lower number of samples. We calculated the predictions of these estimators at four levels of sample coverage (30, 50, 70 and 100% of the samples). For each grain, we calculated the average species richness predicted after 100 randomizations at each sample coverage level (e.g. the mean predictions for each of the 10 estimators at 23 (30%), 39 (50%), 55 (70%) and 78 (100%) samples of Transects, or at 2600, 4333, 6066 and 8666 samples of Records). Islands grain size was excluded due to its low number of samples. Then, we calculated the mean and SD for each estimator and level of sample coverage, to account for the bias and precision of their predictions (Table 5). Although the abundance-based nonparametric estimators (ACE and Chao1) are more precise, when we analyse the effect of sample coverage their advantage over the rest of the estimators diminishes. While their precision decreases at lower sampling intensities, the variability of incidence-based and Jackknife nonparametric estimators diminishes or remains more or less constant (see Table 5).
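The subsampling procedure can be sketched as below. Rounding the number of samples to the nearest integer reproduces the sample numbers quoted in the text for Transects and Records; the resampling step draws samples at random without replacement, which is an assumption about how the randomizations were carried out. Function names are hypothetical, and the first-order Jackknife is used only as an example estimator.

```python
import random


def samples_at_coverage(n_samples, percents=(30, 50, 70, 100)):
    """Number of samples at each coverage level, rounded to the nearest integer."""
    return [(p * n_samples + 50) // 100 for p in percents]


def jackknife1(samples):
    """First-order jackknife from a list of sets of species names."""
    m = len(samples)
    counts = {}
    for sample in samples:
        for species in sample:
            counts[species] = counts.get(species, 0) + 1
    uniques = sum(1 for c in counts.values() if c == 1)
    return len(counts) + uniques * (m - 1) / m


def mean_estimate(samples, k, estimator=jackknife1, randomizations=100, seed=0):
    """Mean estimate over random subsets of k samples drawn without replacement."""
    rng = random.Random(seed)
    values = [estimator(rng.sample(samples, k)) for _ in range(randomizations)]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(samples_at_coverage(78))    # Transects
    print(samples_at_coverage(8666))  # Records
    toy = [{"a", "b"}, {"b", "c"}, {"a"}, {"d"}, {"a", "e"}, {"b"}]
    for k in samples_at_coverage(len(toy)):
        print(k, round(mean_estimate(toy, k), 2))
```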

Table 5.  Mean and SD of the estimates of species richness at different sample coverage (30, 50, 70 and 100% of the samples; Islands was excluded from this analysis due to its low number of samples) for the 10 less variable estimators
Estimator | Mean 30% | Mean 50% | Mean 70% | Mean 100% | SD 30% | SD 50% | SD 70% | SD 100%
Exp Neg | 132·50 | 158·44 | 173·96 | 193·15 | 4·66 | 5·74 | 6·46 | 8·15
ACE | 184·63 | 229·71 | 255·64 | 288·44 | 14·23 | 11·44 | 8·86 | 0·25
Chao1 | 191·77 | 230·75 | 258·34 | 303·11 | 20·13 | 13·01 | 11·27 | 2·24
Chao2 | 209·27 | 243·49 | 271·47 | 326·33 | 6·13 | 3·77 | 4·85 | 27·78
Jackknife1 | 181·27 | 219·30 | 242·98 | 275·12 | 5·24 | 2·05 | 3·79 | 7·89
Jackknife2 | 209·99 | 247·97 | 281·97 | 324·52 | 6·77 | 6·87 | 6·17 | 14·37
Bootstrap | 154·37 | 187·54 | 208·08 | 234·70 | 6·97 | 1·98 | 0·64 | 3·46

Relationships among grain sizes

Species richness estimates were highly correlated among the smaller grains. We compared the results of the 10 best-performing estimators (see above) at each grain size to determine which grains produced similar estimates (Fig. 3). Islands was the most dissimilar grain, only significantly related to Natural Areas (Pearson R = 0·850) and Transects (Pearson R = 0·733). The extremely high correlation (Pearson R = 1·000) between Traps and 2Traps, and between Records and Individuals is remarkable. Interestingly, both pairs of grains were also highly correlated with each other (Pearson R = 0·996). Thus, it seems equivalent to use any of these four grains, at least in the studied case. In addition, these were the grains where the scores obtained from nonparametric estimators were less biased (see Fig. 2).
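The comparison between two grains can be sketched as a Pearson correlation across estimators, here using the 2Traps and Traps columns of Table 4 for the 10 retained estimators. The function name is hypothetical, and because the table values are rounded the coefficients obtained this way need not match the published ones exactly.

```python
from math import sqrt

# Estimates from Table 4, in the order Clench, Exp Neg, MM, ACE, ICE,
# Chao1, Chao2, Jackknife1, Jackknife2, Bootstrap.
TWO_TRAPS = [223.9, 188.7, 184.5, 288.3, 291.9, 301.8, 311.2, 270.9, 316.9, 232.9]
TRAPS = [223.3, 188.7, 183.5, 288.3, 288.7, 301.8, 311.2, 271.0, 316.9, 232.9]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    print(f"2Traps vs Traps: r = {pearson_r(TWO_TRAPS, TRAPS):.3f}")
```

A coefficient close to 1 means that the 10 estimators take almost the same values at the two grains, which is the sense in which those grains are treated as interchangeable.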

Figure 3.  Relationships between grain sizes. Cluster analysis was performed using Unweighted Pair-Group Average as linkage rule (see e.g. Sokal & Rohlf 1995), and Spearman correlations (Table 6) as similarity measure. Abbreviations as in text and Table 4.
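
As a rough illustration of this kind of analysis, the sketch below correlates the estimates obtained at different grains and clusters the grains by average linkage (UPGMA). It is a simplified stand-in rather than the procedure used for Figure 3: it uses the Pearson coefficient quoted in the text, takes 1 − R as the distance (an assumption), and runs on hypothetical numbers. The function names are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def upgma(names, dist):
    """Average-linkage clustering.
    dist maps frozenset({a, b}) to the distance between items a and b.
    Returns the merges as (cluster_a, cluster_b, height) tuples."""
    clusters = [(n,) for n in names]
    d = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d[frozenset([(a,), (b,)])] = dist[frozenset([a, b])]
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        height = d[pair]
        rest = [c for c in clusters if c not in (a, b)]
        merged = a + b
        # Keep distances not involving a or b, then add size-weighted averages.
        new_d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c in rest:
            new_d[frozenset([merged, c])] = (
                len(a) * d[frozenset([a, c])] + len(b) * d[frozenset([b, c])]
            ) / (len(a) + len(b))
        d = new_d
        clusters = rest + [merged]
        merges.append((a, b, height))
    return merges

# Hypothetical estimates: one list per grain, same estimators in the same order.
toy = {
    "Traps": [280.0, 300.0, 270.0, 320.0],
    "2Traps": [281.0, 301.0, 272.0, 321.0],
    "Transects": [285.0, 310.0, 300.0, 360.0],
}
names = list(toy)
dist = {
    frozenset([a, b]): 1 - pearson(toy[a], toy[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for a, b, height in upgma(names, dist):
    print(a, "+", b, "at distance", round(height, 4))
```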


Species richness is a central component of biodiversity (Gaston 1996; Gaston & Blackburn 2000). As the real number of species remains unknown after sampling campaigns, estimators are needed to provide a clearer picture of species richness patterns. Many estimators are now available, and their number increases as new approaches to analysing the process of species accumulation with sampling effort are developed (e.g. Colwell et al. 2004). However, estimating the true number of species in a given place or area remains elusive. Even when a nearly complete checklist is available (in simulations or extremely well-known areas), the actual richness figures are often not accurately estimated (e.g. Brose et al. 2003; Brose & Martinez 2004), depending on the data and the estimator used (see Walther & Morand 1998; Walther & Martin 2001). Despite this drawback, such estimators could still be useful if they provide a good picture of species richness patterns. Hortal et al. (2004) found that predictive models of the geographical distribution of species richness developed from Clench estimations were more accurate and representative than models developed from observed scores, because the estimations reduced the bias caused by unequal sampling effort. Here, in addition to accuracy in representing the real species richness score (which is virtually unknown, as in the present data), we stress the importance of estimates being comparable as a criterion for the adequacy of estimators (see, e.g. Palmer 1990, 1991). For an estimator to be useful, it should combine accuracy in representing real species richness with the capacity to give unbiased estimations of the differences in species richness among areas and/or grain sizes.

With regard to estimator performance, it is important to point out the poor results obtained for these data by the new estimators proposed by Rosenzweig et al. (2003) and Ugland et al. (2003), as well as by Weibull (Flather 1996). Given our results, the use of the estimators F3, F5, F6, Weibull and Ugland is currently not advisable, until future comparative analyses show that they are worth using in a number of cases. Once these estimators are excluded, the rest seem to perform more or less well. However, asymptotic estimators appear less desirable; they consistently underestimate species richness, a pattern well documented in the literature (see, e.g. Lamas et al. 1991; Soberón & Llorente 1993; Colwell & Coddington 1994; Chazdon et al. 1998; Gotelli & Colwell 2001; Walther & Martin 2001; Melo et al. 2003; Brose & Martinez 2004; Hortal et al. 2004; Walther & Moore 2005; Jiménez-Valverde et al. 2006). Although the performance of nonparametric estimators has often been questioned (see, e.g. Longino et al. 2002; Chiarucci et al. 2003; Jiménez-Valverde & Hortal 2003; O’Hara 2005), neither observed species richness, nor species–area curves, nor asymptotic estimators seem to perform better than most of these estimators. Therefore, we believe that nonparametric estimators should be used in the absence of complete inventories, despite their drawbacks and potential inaccuracy.

The estimations of the abundance-based Chao1 and ACE nonparametric estimators were independent of the grain used to aggregate the samples. This result was expected, as both estimators are calculated from the species frequencies obtained directly from the total pool of individuals (A. Chao and R. Colwell, personal communication). Therefore, abundance data remain the same, regardless of the grain size used to aggregate data (except for Records, a measure of incidence), so only small variations due to data aggregation in samples appear. This property makes both abundance-based estimators extremely useful for comparing data coming from different grains; as they depend on total species abundances, they should be quite robust to variations in grain size if sample coverage is sufficiently large (A. Chao, personal communication). However, differences in the survey methodologies and sampling effort used at the different sites (i.e. sampling bias sensu Walther & Moore 2005) could also affect estimations, so data from surveys with different grain sizes should be carefully examined to determine whether they are comparable. It is also worth mentioning the low bias of both Chao1 and ACE estimators in our study, although this result cannot be taken as a general rule (see Walther & Moore 2005). In contrast, although Bootstrap estimates are also highly precise, their clear underestimation of species richness (also reported by Chiarucci et al. 2003) prevents us from recommending this estimator.
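
The grain independence of abundance-based estimators can be made concrete with a small sketch. It assumes the commonly used bias-corrected forms of Chao1 and Chao2 (the exact variants computed in this study are not stated in this section) and uses hypothetical toy data. Pooling the same individuals by traps or by pairs of traps leaves the species abundances, and hence Chao1, unchanged, whereas the incidence-based Chao2 changes with the grain.

```python
from collections import Counter

def pooled_abundances(samples):
    """Total number of individuals per species, ignoring sample boundaries."""
    return Counter(species for sample in samples for species in sample)

def chao1(abundances):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    with F1 and F2 the numbers of singleton and doubleton species."""
    f1 = sum(1 for n in abundances.values() if n == 1)
    f2 = sum(1 for n in abundances.values() if n == 2)
    return len(abundances) + f1 * (f1 - 1) / (2 * (f2 + 1))

def chao2(samples):
    """Bias-corrected Chao2: S_obs + ((m - 1) / m) * Q1 * (Q1 - 1) / (2 * (Q2 + 1)),
    with Q1 and Q2 the numbers of species found in exactly one and two samples."""
    m = len(samples)
    incidence = Counter(species for sample in samples for species in set(sample))
    q1 = sum(1 for n in incidence.values() if n == 1)
    q2 = sum(1 for n in incidence.values() if n == 2)
    return len(incidence) + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))

# Hypothetical data: four traps, then the same individuals as two pairs of traps.
traps = [["a", "a", "b"], ["a", "c"], ["b", "d"], ["a", "e", "e"]]
pairs = [traps[0] + traps[1], traps[2] + traps[3]]

print("Chao1:", chao1(pooled_abundances(traps)), chao1(pooled_abundances(pairs)))
print("Chao2:", chao2(traps), chao2(pairs))
```

In this toy case the two Chao1 values coincide by construction, while Chao2 differs between the two grains because the counts of uniques and duplicates depend on how samples are defined.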

Having said this, other effects must be taken into account to determine whether these two abundance-based estimators are preferable. Given that other estimators have been identified as more accurate in previous analyses, they could be a better alternative when the data to be compared vary little in sample grain size. Interestingly, when the effect of sample coverage is studied, the precision of the abundance-based estimators ACE and Chao1 diminishes at lower sampling intensities, whereas it remains constant or even increases for incidence-based and Jackknife nonparametric estimators (see Table 5). A higher precision of incidence-based estimators at low sampling intensities has been reported previously (e.g. Chazdon et al. 1998), so they appear to be the best option when sampling effort has been scarce. In our study, Jackknife and incidence-based estimators behaved erratically at the larger grain sizes. However, when grain is held constant, these four estimators (ICE, Chao2, Jackknife1 and Jackknife2) have been reported to be more accurate and less sensitive to other drawbacks, such as sample coverage, patchiness of species distributions and variability in the probability of capture, among others (see Palmer 1990, 1991; Baltanás 1992; Palmer & White 1994; Chazdon et al. 1998; Chiarucci et al. 2001; Brose et al. 2003; Brose & Martinez 2004). Therefore, a balance between sample grain (see below) and sample coverage is needed to make the best choice of an estimator for specific datasets.

With regard to the most adequate grain size for aggregating samples, it seems that any of the small grains considered here (2Traps, Traps, Records and Individuals) could give an appropriate picture of the community. All these grains refer to data captured at a single point, or in a reduced area. Their high similarity could be due either to the effectiveness of such point data in describing the studied fauna, or to the fact that the arthropod community of a limited area is sampled equally by all of them. Gotelli & Colwell (2001) discuss the differences between using individuals or samples as a measurement of sampling intensity. In our results, estimations made using individuals did not differ much from those obtained using small samples, such as traps. On the other hand, larger grains, including Transects, do not seem adequate for obtaining unbiased species richness estimations that can be compared among different places and/or grains. In conclusion, data referring to a point or a small area (a plot) seem to be the most adequate for estimating species richness. We thus recommend that authors record and make available the abundance data for traps or similar units in biological databases and published papers. Compiling the information available from biodiversity studies only for transects or local studies, rather than in an extensive way, may result in the loss of useful information on biodiversity patterns.

The performance of Records is particularly remarkable; although this measure seems less informative, it performs as well as Individuals and Traps. Records is just an incidence measure; that is, a sample is counted each time a species is recorded in a different place, day or trap, or by a different collector, regardless of the number of individuals found. In spite of this, its performance in our study was also good when using abundance-based estimators (see above). Other studies have demonstrated the utility of this measure to characterize sampling effort (Hortal et al. 2001; Lobo & Martín-Piera 2002; Martín-Piera & Lobo 2003), and recent works demonstrate its accuracy in describing the process of species accumulation from heterogeneous data (J.M. Lobo unpublished results; Romo Benito & García-Barros 2005). Thus, we encourage the use of this measure to analyse data coming from heterogeneous sources, such as studies made with different methodologies, or museum and collection data, a still underestimated but valuable reservoir of biodiversity data (see Petersen & Meier 2003; Petersen, Meier & Larsen 2003; Graham et al. 2004; Suarez & Tsutsui 2004).

The information available for biodiversity meta-analyses often comes from heterogeneous sources and different sampling methodologies, and thus from different grains. Given the grain-associated performance presented here, as well as the recommendations made by Chazdon et al. (1998), Brose et al. (2003) and Brose & Martinez (2004), among others, we propose a decision framework to determine which estimator should be used in each case (see Table 6). When all data pertain to comparable grains (or small grains, which we have seen to provide similar estimates), ICE, either of the two Jackknife estimators, or Chao2 should be preferred. The relative performance of these estimators is still under discussion, as the results obtained to date are heterogeneous. For example, whereas Chazdon et al. (1998) found a better performance of ICE and Chao2 at low sample coverages, Brose et al. (2003) found that the behaviour of the latter is erratic and dependent on sample coverage, and that Jackknife estimators were the most accurate in determining species richness. We thus suggest following the recommendations given in Chazdon et al. (1998), as well as the decision paths provided by Brose et al. (2003) and Brose & Martinez (2004), to decide which estimator should be used when grains are similar or comparable. However, when grains are heterogeneous and, especially, when analysing data from large grain sizes, such as islands, regions or forest remnants, all these estimators lose accuracy, thus producing a grain-related effect in the estimated richness scores. When abundance data are available, the precision of ACE and/or Chao1 in our study makes them the best choice. However, both estimators give a lower bound to species richness (O’Hara 2005), producing only conservative estimates. Moreover, abundance data are not usually available, as checklists from different land patches and/or regions are often the only information available (see, e.g. the butterfly data of Ricketts et al. 1999; used by Rosenzweig et al. 2003). In those cases where incidence estimators are needed, we recommend the use of either of the Jackknife estimators, as both seem to be less affected by grain size than ICE and Chao2. Here, the decision framework provided by Brose et al. (2003) and Brose & Martinez (2004) could help to decide which Jackknife estimator best suits the available data.

Table 6.  Choice of the best species richness estimator for biodiversity meta-analyses, depending on data characteristics (abundance or incidence) and studied grain sizes (‘Comparable’ means similar grain sizes and ‘small grains’ refers to data on Traps, small groups of Traps, Records or Individuals). Note that the estimators recommended for abundance data when using ‘Comparable and/or small grains’ are of incidence type
                                     Abundance data                        Incidence data
Comparable and/or small grains       ICE, Jackknife1, Jackknife2, Chao2    ICE, Jackknife1, Jackknife2, Chao2
Noncomparable and/or great grains    ACE or Chao1                          Jackknife1 or Jackknife2
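
For readers who want to apply Table 6 programmatically, the lookup can be written as a small function. This is only a convenience sketch: the function name and the argument labels (`"abundance"`, `"incidence"`, `"comparable"`, `"noncomparable"`) are hypothetical, and the finer choice among the returned estimators still relies on the decision paths of Chazdon et al. (1998), Brose et al. (2003) and Brose & Martinez (2004).

```python
# Table 6 encoded as a lookup: (data type, grain situation) -> candidate estimators.
TABLE_6 = {
    ("abundance", "comparable"): ["ICE", "Jackknife1", "Jackknife2", "Chao2"],
    ("incidence", "comparable"): ["ICE", "Jackknife1", "Jackknife2", "Chao2"],
    ("abundance", "noncomparable"): ["ACE", "Chao1"],
    ("incidence", "noncomparable"): ["Jackknife1", "Jackknife2"],
}

def recommend_estimators(data_type, grains):
    """Return the estimators suggested in Table 6.

    data_type: "abundance" or "incidence".
    grains: "comparable" (comparable and/or small grains) or
            "noncomparable" (noncomparable and/or large grains).
    """
    key = (data_type.lower(), grains.lower())
    if key not in TABLE_6:
        raise ValueError(f"unknown combination: {key}")
    return list(TABLE_6[key])  # copy, so callers cannot alter the table

print(recommend_estimators("abundance", "noncomparable"))
print(recommend_estimators("incidence", "noncomparable"))
```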

Concluding remarks

Chao1 and ACE estimators are highly precise with respect to grain variations. When abundance data are available, these two estimators allow the direct comparison of species richness scores estimated from different grains. However, they present several drawbacks, so we provide a framework to decide which estimator should be preferred in each case, among the six that performed best in our study (ACE, ICE, Chao1, Chao2, Jackknife1 and Jackknife2). Although a general agreement on using a single measure to estimate species richness would be desirable (to allow the comparison of different works), the performance of these estimators varies between data and cases (see Walther & Moore 2005). Thus, we also recommend reporting the results of these six measures as a standard addition to the results of biodiversity studies.

With regard to the selection of grain size, the smaller grains (from Pairs of Traps to Individuals, which usually correspond to the real sample units applied in the field) seem to produce the most precise and unbiased estimations. Therefore, we recommend the use of these grains for biodiversity estimations. It is also important to remark on the good performance of Records (Martín-Piera & Lobo 2003), a measure that could be applied to most of the primary biodiversity data currently stored in old collections and classic literature, which have no associated information on sampling effort.


The authors wish to thank Alberto Jiménez Valverde, Jorge M. Lobo, Pedro Cardoso, Bruno A. Walther, Robert Colwell, Anne Chao and one anonymous referee for their suggestions, discussions and critical review, and Rafael Zardoya (Museo Nacional de Ciencias Naturales) for his kind help with the use of a Mac computer for computing the Individuals curve. We are also indebted to all the taxonomists who helped identify the morphospecies: H. Enghoff, F. Ilharco, V. Manhert, J. Ribes, A.R.M. Serrano and J. Wunderlich, and to J. Amaral, A. Arraiol, E. Barcelos, P. Cardoso, P. Gonçalves, S. Jarroca, C. Melo, F. Pereira, H. Mas i Gisbert, A. Rodrigues, L. Vieira and A. Vitorino for their contribution to field and/or laboratory work. Funding for field data collection was provided by ‘Direcção Regional dos Recursos Florestais’ (‘Secretaria Regional da Agricultura e Pescas’) through the Project ‘Reservas Florestais dos Açores: Cartografia e Inventariação dos Artrópodes Endémicos dos Açores’ (PROJ. 17.01-080203). This work was also supported by Project ATLANTICO, financed by European Community program INTERREG III B. JH was supported by a Portuguese FCT (Fundação para a Ciência e Tecnologia) grant (BPD/20809/2004), and also by the Fundación BBVA project ‘Yámana – Diseño de una red de reservas para la protección de la biodiversidad en América del Sur Austral utilizando modelos predictivos de distribución con taxones hiperdiversos’, the Spanish MEC project CGL2004-0439/BOS and the FCT Center CITAA. CG was supported by a FCT grant (BD/11049/2002).