Undersampling bias: the null hypothesis for singleton species in tropical arthropod surveys


  • Jonathan A. Coddington,

    Corresponding author
    1. Department of Entomology, National Museum of Natural History, NHB-105, Smithsonian Institution, PO Box 37012, Washington, DC 20013-7012, USA
      *Corresponding author. E-mail: coddington@si.edu
  • Ingi Agnarsson,

    1. Department of Entomology, National Museum of Natural History, NHB-105, Smithsonian Institution, PO Box 37012, Washington, DC 20013-7012, USA
    2. Department of Biology, University of Puerto Rico, P.O. Box 23360, San Juan, PR 00931-3360, Puerto Rico
  • Jeremy A. Miller,

    1. Nationaal Natuurhistorisch Museum Naturalis, Darwinweg 2, 2333 CR Leiden, The Netherlands
  • Matjaž Kuntner,

    1. Department of Entomology, National Museum of Natural History, NHB-105, Smithsonian Institution, PO Box 37012, Washington, DC 20013-7012, USA
    2. Institute of Biology, Scientific Research Centre of the Slovenian Academy of Sciences and Arts, Novi trg 2, PO Box 306, SI-1001 Ljubljana, Slovenia
  • Gustavo Hormiga

    1. Department of Biological Sciences, The George Washington University, 2023 G Street NW, Washington D.C., 20052, USA.



  • 1. Frequency of singletons – species represented by single individuals – is anomalously high in most large tropical arthropod surveys (average, 32%).
  • 2. We sampled 5965 adult spiders of 352 species (29% singletons) from 1 ha of lowland tropical moist forest in Guyana.
  • 3. Four common hypotheses (small body size, male-biased sex ratio, cryptic habits, clumped distributions) failed to explain singleton frequency. Singletons are larger than other species, not gender-biased, share no particular lifestyle, and are not clumped at 0·25–1 ha scales.
  • 4. Monte Carlo simulation of the best-fit lognormal community shows that the observed data fit a random sample from a community of ~700 species and 1–2 million individuals, implying approximately 4% true singleton frequency.
  • 5. Undersampling causes systematic negative bias of species richness, and should be the default null hypothesis for singleton frequencies.
  • 6. Drastically greater sampling intensity in tropical arthropod inventory studies is required to yield realistic species richness estimates.
  • 7. The lognormal distribution deserves greater consideration as a richness estimator when undersampling bias is severe.

Null models in biology perform the useful function of explaining many data in often infuriatingly simple ways (Gotelli & Graves 1996; Colwell & Lees 2000; Harte et al. 2001; Hubbell 2001; Green & Ostling 2003). Often they counterbalance ad hoc explanations of the pattern at hand. In this paper, we propose that the high frequency of ‘singleton’ species (those represented by single individuals) in tropical arthropod inventories or surveys is simply explained as undersampling, and use a large but incomplete survey of spiders in Guyana to make the point. Species richness estimation plays an increasingly important role in conservation and biological inventory assessment in multiple contexts (Cardoso et al. 2008; de Thoisy, Brosse, & Dubois 2008; Shen & He 2008; Schoeman, Nel, & Soares 2008).

Table 1 gives the results of a keyword search of Biological Abstracts (through 2007) for the largest and most ambitious tropical arthropod surveys that provide data on singletons. As these studies clearly show, high singleton frequencies characterize typical tropical arthropod surveys, averaging 32% of species from the 71 studies. Why are there so many singletons in those surveys? Clearly, community-level singletons (and the species they represent) would have no chance to reproduce and could play no significant ecological role. Although the tropics are said to harbour many rare species, presumably most are not so rare as to lack at least a few conspecific neighbours with which to mate. Hence, singletons in biological surveys are anomalies, and as such have attracted much attention. To explain them, an array of ad hoc hypotheses has been proposed. However, we propose that, particularly when singleton frequencies are high, undersampling as a null hypothesis should precede more biological ad hoc explanations (McGill 2003).

Table 1.  Summary of tropical arthropod surveys. Arthropod surveys from tropical forest sites reporting total abundance (abun., or species presence per sample for ants, Agosti et al. 2000), species richness (spp.), and singletons (reported, calculated from figures given, or approximated as Fisher's α, noted in source column). Intensity is abun./spp. A search of Biological Abstracts (1986–2007) on the terms (species richness) and (Arthropoda) and (Oriental region or Australasian region or Neotropical region or Ethiopian region) produced 514 results, many of which did not provide the required inventory statistics or were not from wet tropical sites. Those meeting our criteria, in addition to those known to us personally, are listed below. References for this table are listed in the Appendix
Taxon | Study site | Abun. | Spp. | Singletons | Intensity | Percentage of singletons | Source
Arthropods | Australia | 20 507 | 759 | 271 | 27 | 36 | Basset & Kitching 1991
Insecta | Costa Rica (Area 1) | 488 | 142 | 91 | 3 | 64 | Janzen & Schoener 1968
Insecta | Costa Rica (Area 2) | 1362 | 262 | 165 | 5 | 63 | ‘ ’
Insecta | Costa Rica (Area 3) | 4857 | 404 | 254 | 12 | 63 | ‘ ’
Insecta | Costa Rica (Area 4) | 1339 | 545 | 390 | 2 | 72 | ‘ ’
Insecta (leaf-chewing+ sap-sucking) | New Guinea | 80 062 | 1050 | 278 | 76 | 26 | Novotný & Basset 2000
Insecta | Guyana | 27 735 | 604 | 229·5 | 46 | 38 | Basset et al. 2001 (singletons calculated)
Blattaria | Panama (BCI) | 3224 | 79 | 15 | 41 | 19 | Wolda 1983 (Fisher's α)
Coleoptera | Australia (Queensland) | 10 000 | 1514 | 612 | 7 | 40 | Monteith & Davies 1984 (approx. values)
Coleoptera: Curculionidae | Panama (BCI) | 28 521 | 703 | 131 | 41 | 19 | Wolda 1987 (Fisher's α)
Coleoptera: Pselaphidae, Anthicidae | Panama (BCI) | 6482 | 114 | 19·7 | 57 | 17 | ‘ ’
Coleoptera | Panama (BCI) | 34 705 | 597 | 102·5 | 58 | 17 | ‘ ’
Coleoptera | New Guinea | 4840 | 633 | 321 | 8 | 51 | Allison et al. 1993
Coleoptera | New Guinea | 3977 | 418 | 199 | 10 | 48 | Allison et al. 1997
Coleoptera | Peru (Tambopata) | 15 869 | 3429 | 1728 | 5 | 50 | Erwin 1997
Coleoptera | Sulawesi | 18 000 | 1355 | 623 | 13 | 46 | Hammond et al. 1997 (approx. values)
Coleoptera | Brazil | 8454 | 993 | 446·9 | 9 | 45 | Didham et al. 1998 (singletons calculated)
Coleoptera: Curculionidae | Honduras | 26 891 | 293 | 38 | 9 | 13 | Anderson & Ashe 2000
Coleoptera: Staphylinidae | Honduras | 7349 | 224 | 53 | 33 | 24 | ‘ ’
Coleoptera | Malaysia | 8028 | 1711 | 823 | 5 | 48 | Chung et al. 2000
Coleoptera | Uganda | 29 736 | 1433 | 596 | 21 | 42 | Wagner 2000
Coleoptera | Ecuador | 2329 | 318 | 91 | 7 | 29 | Lucky et al. 2002
Coleoptera: Scarabaeinae | Bolivia | 4050 | 73 | 7 | 55 | 10 | Spector & Ayzama 2003
Coleoptera: Pselaphinae, Histeridae | Ecuador | 3465 | 385 | 155 | 9 | 40 | Carlton et al. 2004
Coleoptera: Phytophagous | Panama | 3009 | 364 | 139 | 8 | 38 | Ødegaard 2004
Coleoptera | Ecuador | 15 181 | 2001 | 397 | 8 | 20 | Erwin et al. 2005
Coleoptera: Scarabaeinae | Colombia | 7894 | 101 | 20 | 78 | 20 | Escobar et al. 2005
Coleoptera | Brazil (Parana) | 1883 | 518 | 266 | 4 | 51 | Ganho & Marinoni 2005
Coleoptera: Aticini | Brazil (Parana) | 1891 | 106 | 32 | 18 | 30 | Linzmeier et al. 2006
Coleoptera | Australia | 29 986 | 1473 | 526 | 20 | 36 | Stork & Grimbacher 2006
Diptera: Muscidae | Brazil (Parana) | 7014 | 91 | 10 | 77 | 11 | Costacurta et al. 2003
Diptera: Phoridae | Costa Rica | 3341 | 115 | 20 | 29 | 17 | Brown 2004
Diptera: Syrphidae | Brazil (Parana) | 392 | 76 | 12 | 5 | 16 | Marinoni et al. 2004
Ephemoptera | Panama (Corriente Grande) | 7178 | 27 | 4 | 266 | 15 | Wolda & Flowers 1985 (Fisher's α)
Ephemoptera | Panama (Miramar) | 29 120 | 33 | 4 | 882 | 12 | ‘ ’
Ephemoptera | Zaire | 29 892 | 21 | 2 | 1423 | 10 | ‘ ’
Hemiptera | Australia | 6004 | 98 | 35 | 61 | 36 | Andrew & Hughes 2005
Homoptera | Panama (BCI) | 22 046 | 458 | 82·1 | 48 | 18 | Wolda 1987 (Fisher's α)
Homoptera | Panama (Pipeline Rd.) | 1324 | 332 | 126 | 4 | 38 | Wolda 1979
Hymenoptera: Parasitica | Sulawesi | 700 | 293 | 179 | 2 | 61 | Noyes 1989
Hymenoptera: Formicidae | Costa Rica (Monteverde) | 3998 | 53 | 6 | 75 | 11 | Longino & Nadkarni 1990
Hymenoptera: Formicidae | Costa Rica | 7904 | 437 | 51 | 18 | 12 | Longino et al. 2002
Hymenoptera: Apidae | Brazil (Minas Gerais) | 1183 | 20 | 6 | 59 | 30 | Nemesio & Silveira 2006
Lepidoptera: butterflies | Malaysia | 9031 | 620 | 118 | 15 | 19 | Corbet 1942
Lepidoptera: moths | Malaysia | 9461 | 1048 | 538 | 9 | 51 | Barlow & Woiwod 1989
Lepidoptera: butterflies | Ecuador | 6690 | 130 | 20 | 5 | 15 | DeVries et al. 1997
Lepidoptera: butterflies | Ecuador | 883 | 91 | 22 | 10 | 24 | DeVries et al. 1999
Lepidoptera: butterflies | Ecuador | 11 861 | 128 | 18 | 93 | 14 | DeVries & Walla 2001
Lepidoptera | Borneo | 485 | 53 | 16 | 9 | 30 | Schulze et al. 2001
Lepidoptera: butterflies | Thailand | 1936 | 53 | 4 | 37 | 8 | Ghazoul 2002
Lepidoptera: Geometridae | Ecuador | 23 720 | 868 | 161 | 27 | 19 | Hilt et al. 2006
Odonata | Peru | 1537 | 136 | 31 | 11 | 23 | Louton et al. 1996
Orthoptera | Panama (BCI) | 1566 | 73 | 15·9 | 21 | 22 | Wolda 1987 (Fisher's α)
Psocoptera | Panama (BCI) | 10 092 | 148 | 20 | 68 | 14 | Broadhead & Wolda 1985 (Fisher's α)
Psocoptera | Panama (Fortuna) | 4301 | 84 | 10 | 15 | 12 | ‘ ’
Araneae | Bolivia (50 m) | 875 | 191 | 89 | 5 | 47 | Coddington et al. 1991, 1996
Araneae | Bolivia (1200 m) | 1109 | 329 | 147 | 3 | 45 | ‘ ’
Araneae | Bolivia (2200 m) | 654 | 158 | 70 | 4 | 44 | ‘ ’
Araneae | Brazil (Manaus) | 75 | 62 | 32 | 1 | 52 | Höfer et al. 1994
Araneae | Tobago | 1777 | 98 | 27 | 18 | 28 | Hormiga & Coddington 1994
Araneae | Peru (Samiria) | 5895 | 1140 | 520 | 5 | 46 | Silva 1996
Araneae | Peru (Pakitza) | 2616 | 498 | 207 | 5 | 42 | Silva & Coddington 1996
Araneae | Costa Rica | 7144 | 86 | 11 | 83 | 13 | Bodner 2002
Araneae | Tanzania (understorey) | 9096 | 170 | 32 | 54 | 19 | Sørensen et al. 2002
Araneae | Tanzania (canopy) | 5233 | 149 | 35 | 35 | 23 | Sørensen 2003
Araneae | Malaysia | 6999 | 578 | 145 | 12 | 25 | Floren & Deeleman-Reinhold 2005, personal communication
Araneae | Mt. Cameroon (500 m) | 573 | 231 | 93 | 2 | 40 | Coddington et al., unpublished
Araneae | Mt. Cameroon (3000 m) | 1555 | 55 | 14 | 28 | 25 | ‘ ’
Araneae | Peru (Tambopata) | 1821 | 635 | 341 | 3 | 54 | Coddington & Silva, unpublished
Araneae | Peru (Manu) | 222 | 123 | 78 | 2 | 63 | Erwin & Coddington, unpublished
Araneae | Guyana | 5964 | 351 | 101 | 17 | 29 | This study
Averages | | 9372 | 464 | 176 | 61·6 | 31·6 |

Singleton tropical arthropod species are anomalous for several reasons. First, minimum viable population sizes are conventionally at least 500 individuals (Gilpin & Soulé 1986). Second, many arthropods begin life clumped because eggs are clumped when laid – in spiders eggs are clustered within an egg sac. Most nonvolant arthropods are small and probably rarely travel hundreds or even dozens of metres to mate. Third, clumped distributions in nature are far more common than random or dispersed (Krebs 1999). While clumping certainly depends on scale, at hectare scales randomness is typical of canopy trees and jaguars, not small, nonflying, sedentary arthropods such as spiders.

Ad hoc explanations for singletons often invoke aspects of the biology of particular groups, such as host or food plant specificity (Price et al. 1995). In spiders, males of sedentary web-spinning species must wander to find females (potentially passing through atypical habitat patches, i.e. as tourists), and are likely to be small and rare (Vollrath & Parker 1992). General explanations include source-sink phenomena or mass-effects (e.g. ‘ecological drift’, Hubbell 2001) at both local (‘tourist’) and regional (‘waif’ or ‘vagrant’) scales (Schmida & Wilson 1985; Pulliam 1988; Southwood 1996; Stork & Hammond 1997; Novotný & Basset 2000; Magurran & Henderson 2003; Basset et al. 2004; Ødegaard 2004). Time, space, or method ‘edge effects’ are also frequent explanations. Adults outside their breeding seasons are scarce, and if only adults are identifiable (true for spiders), will be artefactually rare (Ulrich 2001; Longino, Coddington, & Colwell 2002; Scharff et al. 2003; Basset et al. 2004). Nocturnality or seasonal migration could produce similar effects. Space edge effects are usually microhabitat preferences. Species patches just trespassing on plot boundaries might produce many ‘false’ singletons. Method edge effects are the accidental sampling of a species by an inappropriate method, such as a canopy species in a pitfall trap (Longino et al. 2002; Scharff et al. 2003). Finally, singletons may be absolutely rare, i.e. sparse with large nearest-neighbour distances throughout their range. Perhaps, as is now recognized for tropical trees (Pitman et al. 1999; Kenfack et al. 2006), we drastically underestimate the scale at which many tropical arthropod species live and ought to be sampled.

Undersampling bias and biological explanations are not mutually exclusive. However, if repeated random sampling of communities modelled on statistical parameters estimated from the sample mimics the observed results, undersampling should serve as the initial null explanation for high singleton frequencies (McGill 2003), analogous to the use of null models in other fields (Harte et al. 2001; Hubbell 2001). Variation not explained by undersampling may then be attributed to more complex causes.

Statistical methods to assess undersampling bias are relatively recent; quantitative estimates of its magnitude have been historically difficult, if not impossible, to obtain. Observed richness values are traditionally used for descriptive or comparative purposes (Groombridge 1992; Heywood & Watson 1995; Levin 2001). If high singleton frequencies indicate undersampling, however, then tropical arthropod communities are substantially larger than measured, and comparisons based on observed numbers are misleading. This has important implications for conservation biology, and also implies that typical inventories are under-resourced and/or poorly designed.

Here we use the results of an intensive 1-ha survey of spiders to test various explanations for high singleton frequency. Although spiders are typical sedentary arthropod predators and these results may apply only to that guild, high singleton frequencies also characterize inventories of other tropical arthropods (Table 1). Specifically, we test four process hypotheses and the null hypothesis of undersampling bias: singletons tend to be small and therefore missed; singletons tend to be males because as adults they travel further than females; nearest conspecific distances exceed 0·25–1 ha spatial scales (population structure is much larger than anticipated); singletons are ‘cryptic’ and hard to detect; and singletons are simply an artefact of undersampling because the scope of the survey exceeded sampling resources.


study site

The study was carried out during 10 days, 5–14 July, 1999, in a primary lowland blackwater rainforest (1°36′46″N, 58°38′15″W) on the bank of Essequibo River, 240 m elevation, 4·42 km south of Gunn's landing, Upper Takutu-Upper Essequibo, Guyana. Four nested, concentric 0·25-ha subplots (total 1 ha) were established in uniform closed canopy forest (Fig. 1). Five experienced collectors worked simultaneously in the field during both day and night using a battery of collecting methods that broadly access most of the spider fauna (see Coddington et al. 1991; Sørensen, Coddington, & Scharff 2002; Scharff et al. 2003 for details).

Figure 1.

Plot design of four nested 0·25-ha subplots A-D with counts of singletons (italics) per subplot and doubletons per subplot pairs (lines with adjacent numbers).

conspecific distances

The nested subplot design provided a range of spatial scales. If singletons are spatial edge effects – multi-individual clumps or patches with only one individual in the subplot or plot – the outermost subplot with the largest perimeter should contain the most singletons (Fig. 1). Likewise, doubletons of spatially clumped species should occur in the same subplot. More generally, we tested for clumping by comparing the observed distribution of singletons and doubletons among subplots against the null hypothesis of equal frequency in all possible subplot combinations (e.g. A, B, C, or D for singletons, and AA, AB, AC, AD, BB, etc. for doubletons).
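As an illustration of the singleton part of this test, the sketch below computes a chi-square goodness-of-fit statistic against equal expected frequencies in the four equal-area subplots. The counts are hypothetical placeholders, not the values shown in Fig. 1, and the function names are ours; the P-value uses the closed-form upper tail of the chi-square distribution for 3 degrees of freedom.

```python
import math

def chi_square_equal_frequency(counts):
    """Chi-square statistic for observed counts against equal expected frequencies."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Hypothetical singleton counts for subplots A-D (NOT the values in Fig. 1)
singletons_per_subplot = [27, 24, 26, 24]

stat = chi_square_equal_frequency(singletons_per_subplot)
p_value = chi2_sf_df3(stat)  # four categories give 3 degrees of freedom
print(f"chi-square = {stat:.3f}, P = {p_value:.3f}")
```

A large P-value here would mean that the singleton counts give no evidence of an excess in any one subplot, including the outermost one with the longest perimeter.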

body size

We measured total body length to the nearest 0·1 mm of one individual of each sex (when available) of each species. The ‘average’ tropical spider is then the mean of these lengths (assuming equal sex ratios) weighted by the relative abundance of each species. A t-test then compared mean size of singletons to nonsingletons.

sex ratio

Using the observed sex ratio in the total sample as the null, we compared the singleton sex ratio to it with a chi-squared test, both for singletons as a whole and for singletons of web-spinning species only, where males must wander to encounter the sedentary females.

cryptic habits

Generally speaking, spider species within families are more alike in their biology than species in different families (Coddington & Levi 1991). Araneids mostly spin orb-webs, mostly above the forest floor, but anapids and symphytognathids spin theirs mostly in the leaf litter. Philodromids run on leaves, and salticids jump between them (but only during the day). Because tropical 1-ha inventories usually find at least 30 families, testing the relative frequency of singletons among families against a null of the relative abundance of families in the total sample should detect whether singletons tend to have one lifestyle more frequently than another.

undersampling bias

We assessed inventory completeness by visually inspecting the average of 50 resamples of the observed species accumulation curve, as well as the singleton and doubleton curves, and four commonly used species richness estimators (Chao1, Chao2, ICE and ACE; Peterson & Slade 1998; Walther & Martin 2001; Colwell 2005). In a complete inventory, the observed curve should asymptote and singletons should tend to zero, with doubletons lagging singletons. If incomplete but sufficient to estimate richness accurately, estimator curves should asymptote (Colwell & Coddington 1994). Constantly rising curves of all sorts imply incomplete inventories.
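For readers unfamiliar with these estimators, the sketch below shows the classic Chao1 formula, S_obs + F1²/(2·F2), where F1 and F2 are the numbers of singletons and doubletons. It is a minimal illustration and not the EstimateS implementation, which may apply a bias-corrected variant. The inputs are the totals given in the Results text (352 species, 101 singletons, 56 doubletons).

```python
def chao1(s_obs, singletons, doubletons):
    """Classic Chao1 lower-bound richness estimate: S_obs + F1^2 / (2 * F2)."""
    if doubletons == 0:
        raise ValueError("classic Chao1 is undefined without doubletons")
    return s_obs + singletons ** 2 / (2 * doubletons)

# Totals reported in the Results text of this study
estimate = chao1(352, 101, 56)

# Inventory completion as defined in Table 3: observed richness / Chao1 estimate
completion = 352 / estimate

print(f"Chao1 = {estimate:.1f} species, completion = {completion:.0%}")
```

With these inputs the estimate rounds to 443 species and completion to 79%, in line with the overall values reported for the inventory.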

We also fit the data to a lognormal distribution using the method described in Scharff et al. (2003) and Longino et al. (2002). We use the lognormal as a reasonable null hypothesis (McGill 2003). Other models, such as the parameter-rich zero-sum multinomial, have been proposed as better fits to empirical data than the lognormal (e.g. Hubbell 2001); however, a recent detailed test failed to support that claim, and indeed showed the opposite (McGill 2003). Given the high number of parameters in the zero-sum model, the cumbersome procedure of fitting the model, and lack of evidence for its superior fit to empirical data, McGill advised the preferential use of the lognormal as the simpler (more parsimonious) null model. Using the best-fit lognormal parameters, we generated 1000 replicate communities of each of four sizes (500, 600, 700, and 800 species) and randomly sampled 6000 individuals from each; these parameters were chosen to mimic the empirical sample. We then compared the observed sample to these 4000 simulated samples on numbers of singletons, doubletons, and species. If the observed sample clearly deviated, undersampling bias alone would not explain the high singleton frequency.
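The simulation logic can be sketched as follows. This is an illustrative reimplementation, not the procedure actually run in SYSTAT and Excel. The lognormal parameters `mu` and `sigma` are hypothetical placeholders on the natural-log scale (not the fitted values), only 20 replicates per community size are drawn to keep the example fast, and individuals are sampled with replacement, which closely approximates sampling from a community far larger than the sample.

```python
import random
from collections import Counter

def simulate_sample(n_species, mu, sigma, n_individuals, rng):
    """Draw one random sample from a simulated lognormal community.

    Returns (species observed, singletons, doubletons).
    """
    # Relative abundances drawn from a lognormal distribution (natural-log scale)
    abundances = [rng.lognormvariate(mu, sigma) for _ in range(n_species)]
    # Sample individuals in proportion to species abundance (with replacement)
    draws = rng.choices(range(n_species), weights=abundances, k=n_individuals)
    counts = Counter(draws)           # individuals per observed species
    freq = Counter(counts.values())   # number of species per abundance class
    return len(counts), freq[1], freq[2]

rng = random.Random(1)  # fixed seed so the run is repeatable
results = {
    size: [simulate_sample(size, 0.0, 2.0, 6000, rng) for _ in range(20)]
    for size in (500, 600, 700, 800)
}

for size, reps in results.items():
    mean_obs = sum(r[0] for r in reps) / len(reps)
    mean_single = sum(r[1] for r in reps) / len(reps)
    print(size, round(mean_obs), round(mean_single))
```

The observed sample (species, singletons, doubletons) would then be compared with the spread of simulated triples for each community size.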

Statistical tests, curve fitting, modelling, resampling procedures, and species richness estimation used a combination of SYSTAT 11 (Systat Software, Inc., Richmond, CA, USA), EstimateS 7·50 (Colwell 2005), and Microsoft Excel 2007 (Microsoft Corp., Redmond, WA, USA).

Finally, a few measures of leaf litter, shrub/subcanopy, and canopy tropical moist forest spider densities per m2 have been published (Table 2). Extrapolated to 1 ha, these statistics provide crude estimates of spider abundance that can be compared to the total abundances predicted by the lognormal model.

Table 2.  Spider density in tropical forests. Estimated number of total and adult spiders in a hectare of primary tropical rainforest. Min, computed from minimum values; max, from maximum values. References for this table are listed in the Appendix
Layer | Place | Method | Density, total (n/m2) | Density, adults (n/m2) | Source
Leaf litter | Brazil | Berlese | 221 | 71·8 | Höfer & Brescovit 2001
 | Brazil | Berlese | 330 | 125·2 | Höfer & Brescovit 2001
 | Brazil | Berlese | 129 | 41·9 | Morais 1985
 | Brazil | Berlese | 108 | 35·1 | Adis & Schubart 1984
Mean | | | 197·0 | 68·5 |
Shrub/understorey | Brazil | Beating | 0·5 | 0·2 | Höfer & Brescovit 2001
 | Brazil | Beating | 3·5 | 1·1 | Höfer & Brescovit 2001
 | Guyana | Beating Day | 2·7 | 0·9 | This study
 | Guyana | Beating Night | 1·9 | 0·6 | This study
Mean | | | 2·2 | 0·7 |
Canopy | Brunei | Fog | 4·7 | 1·5 | Russell-Smith & Stork 1994
 | Sulawesi | Fog | 4·6 | 1·5 | Russell-Smith & Stork 1995
 | Australia | Fog | 8·6 | 2·8 | Basset 1990, 1991
 | Tanzania | Fog | 4·8 | 1·5 | Sørensen 2003
 | Brazil | Fog | 4·8 | | Adis 1984
 | Brazil | Fog | 2·0 | 0·7 | Höfer et al. 2001
 | Brazil | Fog | 5·5 | 1·8 | Höfer et al. 1994
Mean | | | 5·0 | 1·6 |
Total spiders (1 m2 column) | | | 204 | 71 |
Min spiders (1 m2 column) | | | 111 | 36 |
Max spiders (1 m2 column) | | | 342 | 129 |
Total spiders (ha) | | | 2 041 500 | 708 333 |
Min spiders (ha) | | | 1 105 000 | 360 000 |
Max spiders (ha) | | | 3 421 000 | 1 291 000 |

specimens and sorting procedures

Each sample was labelled with plot, date, collector, method, and replicate number if two samples were otherwise identical. Team members (all arachnological taxonomists) or other experts on particular families sorted the specimens to morphospecies. All identifications of singletons and doubletons were checked and verified by at least two of the team members. Voucher specimens of each species identified in this study are deposited at the National Museum of Natural History (NMNH), Smithsonian Institution, Washington DC.


The five collectors accumulated 300 samples over 10 days from the 1-ha plot (Table 3) containing a total of 5965 adults (and 6953 juveniles) of 352 species, of which 101 were singletons (29%) and 56 were doubletons. The most abundant species numbered 412. Inventory completion (observed richness/Chao1 estimate) ranged from 15% to 71% among methods, and overall was 79%. Sampling intensity (no. of ind./no. of spp.) ranged from 1·4 to 10·5 among methods and overall was 17. The survey compares favourably to other large published efforts in intensity and numbers of species encountered, considering that most spider species cannot be trapped (Table 1). However, the continually rising accumulation curves and richness estimators indicate that the inventory was still incomplete by the end of sampling (Fig. 2). The 95% upper confidence limit of the Chao2 estimator (itself only a lower-bound estimate), for example, was 520 species, but clearly had not reached a limit. True species richness in the hectare almost certainly exceeded 500 species, and probably much more.

Table 3.  Collecting methods and results. AE, BE, CR, GR, PF, SW, D and N stand for aerial, beating, cryptic, ground, pitfall, sweeping, day, and night collecting methods, respectively (see text). Sample intensity is total individuals/total species. Inventory completion is total species/Chao1 estimate
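No. of samples | 12 | 76 | 36 | 19 | 28 | 20 | 28 | 32 | 46 | 2 | 1 | 300
Total individuals | 102 | 2210 | 644 | 272 | 528 | 399 | 621 | 703 | 439 | 23 | 24 | 5965
Total species | 45 | 210 | 138 | 95 | 69 | 72 | 69 | 115 | 57 | 15 | 17 | 352
Singletons | 32 | 73 | 57 | 53 | 29 | 34 | 30 | 54 | 25 | 11 | 14 | 101
Doubletons | 2 | 29 | 29 | 15 | 9 | 13 | 9 | 17 | 4 | 2 | 1 | 54
Sample intensity | 2·3 | 10·5 | 4·7 | 2·9 | 7·7 | 5·5 | 9·0 | 6·2 | 7·7 | 1·5 | 1·4 | 17·0
Percentage of Singletons | 71% | 35% | 41% | 56% | 42% | 47% | 43% | 47% | 44% | 73% | 82% | 29%
Chao1 estimate | 301 | 302 | 194 | 189 | 112 | 120 | 115 | 200 | 135 | 45 | n/a | 443
Inv. completion | 15% | 70% | 71% | 50% | 62% | 60% | 60% | 57% | 42% | 33% | | 79%
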
Figure 2.

Four species richness estimators (ACE, ICE, Chao 1, Chao 2), the 95% upper confidence limit of the Chao2 estimator, observed, singleton, and doubleton curves for data of Table 3.

The mean and standard deviation of the body lengths of adults collected was 2·89 ± 2·85 mm (thus an estimate of the average size of an adult lowland tropical moist forest spider). The mean singleton body length was 5·30 ± 4·67 mm. Singletons are significantly larger, not smaller, than the average species.

The overall male : female sex ratio in the sample was significantly female biased (1:1·3, P < 0·01). The overall singleton sex ratio was as biased as the total sample (1:1·7, P = 0·18). Sedentary web-spinner singletons were equally biased (1:2·8, P = 0·12). Singletons, therefore, are not disproportionately males of sedentary web-spinning species.

The distribution of doubletons across subplots (Fig. 1) was random (P = 0·82) as was the incidence of singletons from the centre to the outermost subplot (P = 0·80). Tripletons also showed no tendency to clump within subplots. Conspecific nearest neighbour distances, therefore, are not clumped at the coarse 0·25- to 1-ha scales tested here.

Singletons showed no taxonomic pattern, occurring in families in proportion to the latter's relative abundance (P > 0·99). If undersampling bias varied according to lifestyle defined as family identity, the effect was not detectable at this level of sampling intensity.

The observed data fit the lognormal distribution well (Fig. 3, 0·9 > P > 0·5). The predicted number of species in the modal octave S0 was 76·4 ± 13 (µ = 6·2562), the variance term (‘a’) was 0·195 ± 0·210 (σ = 3·6262) and estimated community size 694 species. 

Figure 3.

Lognormal fit (0·9 > P > 0·5) to data of Table 3. Predicted community size is 694 species. Note the over-estimation of abundant species at the right-hand tail.

Figure 4 shows the results of 1000 random draws of constant sampling effort (6000 individuals) from simulated lognormal distributions with the above parameter values for 500, 600, 700, and 800 total species, compared to the observed data (arrow). For clarity, only 25 randomly chosen samples from each community size are plotted, as otherwise the observed data point would have been completely obscured. Observed richness rises with total richness, and numbers and percentages of singletons and doubletons rise because, as richness increases, sampling intensity decreases. On these three statistics, the empirical sample falls between the 700 and 800 species model communities, roughly agreeing with the lognormal richness estimate in Fig. 3. Overall, it falls well within the stochastic variation seen in these random draws from ‘null’ lognormal distributions (Fig. 4). True singletons in the model lognormal communities averaged only 4% of the total.

Figure 4.

Singleton, doubleton, and observed species totals from the Guyana study (open circle, arrow) and 25 random samples of 6000 individuals from model lognormal communities of 500, 600, 700, and 800 species.

To assess how many more specimens would be required to enable richness estimators to cover true community richness under these circumstances, we sampled 60 000 (intensity 170) and 120 000 individuals (intensity 340) from the 700 species lognormal community, thus 10 and 20 times the actual sampling effort. At an intensity of 170, percentage of singletons was 14%, and the Chao and coverage estimators were 595–600 species with Chao upper confidence intervals of 636 species – still short of the true 700 species richness. At an intensity of 340, the Chao and coverage estimators were 650–663 species, with a Chao upper confidence interval of 702 species – thus just covering the true richness value – and percentage of singletons fell to 9%. 

Figure 5 depicts the logarithmic decline in singletons with sampling intensity for the data of Table 1, and predicts zero singletons at sampling intensities of roughly 1100. Sampling the model community at that intensity yielded on average 4% singletons and 658 species observed.

Figure 5.

Log-log plot of sampling intensity vs. percentage of singletons for data of Table 1 (r2 = 0·58; P = 0·001).

We present what few data exist on tropical spider densities in Table 2. Ignoring differences due to locality and construed as a ground to canopy vertical 1 m2 column, the leaf litter contains most individuals, the canopy/subcanopy less, and the shrub/understorey layer least. Given the decrease in leaf area or other substrate with height above the ground, the decline is plausible. It predicts, extremely roughly, about 2 million total spiders per hectare of tropical forest (range 1·1–3·4 M). The modelled lognormal populations ranged between 1·2 and 3·3 million individuals, which agrees with Table 2.
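The extrapolation from a 1 m2 column to a hectare is simple arithmetic, sketched below using the total-density values of Table 2. The grouping of rows by forest layer (Berlese for leaf litter, beating for shrub/understorey, fogging for canopy) is our reading of the table and should be treated as an assumption.

```python
# Total spider densities (n/m2) from Table 2, grouped by assumed forest layer
densities = {
    "leaf litter (Berlese)": [221, 330, 129, 108],
    "shrub/understorey (beating)": [0.5, 3.5, 2.7, 1.9],
    "canopy (fogging)": [4.7, 4.6, 8.6, 4.8, 4.8, 2.0, 5.5],
}
M2_PER_HA = 10_000

def column_total(summary):
    """Sum one summary statistic per layer into a ground-to-canopy 1 m2 column."""
    return sum(summary(values) for values in densities.values())

mean_ha = round(column_total(lambda v: sum(v) / len(v)) * M2_PER_HA)
min_ha = round(column_total(min) * M2_PER_HA)
max_ha = round(column_total(max) * M2_PER_HA)

# Table 2 reports 2 041 500, 1 105 000 and 3 421 000 spiders per hectare
print(mean_ha, min_ha, max_ha)
```

Because each layer's mean, minimum, and maximum are summed independently, the minimum and maximum columns are bounding cases rather than densities observed together at any one site.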


Rare species and estimating total species richness in particular are difficult statistical problems (Bunge & Fitzpatrick 1993; Ulrich 2001; Brose, Martinez, & Williams 2003; Magurran 2004; Ellison & Agrawal 2005; Cunningham & Lindenmayer 2005; Mao & Colwell 2005; Walther & Moore 2005). Estimating how many data are required to obtain robust and reliable species richness estimates is also difficult (Keating et al. 1998; McArdle 1990). This complicates inventory design. Modelling studies have suggested that nonparametric richness estimators do not begin to cover the true value until about two-thirds to four-fifths of the species have been observed (Walther & Morand 1998; Mao & Colwell 2005). On the other hand, intensely sampled communities usually are lognormal at local scales, even if the full distribution is truncated by failure to detect rare species (Sugihara 1980; Longino et al. 2002; Connolly et al. 2005, but see Williamson & Gaston 2005).

In this study, the empirical sample of 6000 individuals may have included only half the species present, with singletons comprising 29% of species observed. Nonparametric richness estimators suggested only 443–460 species, a shortfall of 35% compared to the lognormal estimate. While any singleton may have been due to any of the process explanations discussed above, the simplest explanation for the high frequency is undersampling. As sampling continues and singleton frequencies drop, biological explanations become more plausible.

Two ‘biological’ explanations were statistically significant, but neither in the direction hypothesized. Singletons were significantly larger (not smaller) than the average spider. Twenty-three singletons over 7 cm caused that difference. These were mostly large cursorial species (including ctenids, sparassids, and miturgids) for which absolute densities of one, or very few, individuals per hectare are plausible. Singletons are also disproportionately females, not males, but the sample in general was female-biased, and singletons no more so, even among sedentary web-spinning species where the presumed bias towards singleton wandering males should have been most pronounced. Adult male spiders are relatively short lived and wandering males experience exceptionally high mortality (Vollrath & Parker 1992); both of these factors likely contribute to a female-biased sex ratio in the inventory data, even if the sex ratio at birth were even (as it is for most spiders examined to date; see Avilés & Maddison 1991; Avilés, McCormack, Cutter, & Bukowski 2000).

The other explanations tested (lifestyle, spatial edge effects, and clumping of individuals at 1-ha scales and below) were not statistically significant. Novotný & Basset (2000) and Ulrich (2001) also found that few biological explanations of singletons were supported. Magurran & Henderson (2003) use a 21-year data set on a temperate fish community of 80 species to show that about a third to a half of the species accumulated over that time-span were tourists or waifs. In any given short-term sampling event, however, presumably few of the rare species would have been tourists or waifs. In a spider inventory of a ‘known’ fauna, Scharff et al. (2003) hypothesized 58% of singletons as phenological, methodological, or spatial edge effects, but they did not test the null hypothesis of undersampling bias. For relatively instantaneous events such as this inventory, singleton frequencies are about what one would expect from random samples of a lognormally distributed community – in this case, of about 700 species. The null hypothesis of undersampling bias cannot be rejected.

This was an intense, short-term inventory (300 person-hours), designed to yield an ‘instantaneous’ richness estimate that avoided the confounding effect of phenological change. Especially in relatively aseasonal tropical habitats, sampling year round or for multiple years might yield a more complete inventory over and above the effect of greater sampling intensity (DeVries, Walla, & Greeney 1999; Scharff et al. 2003). Increasing the sampling area might also improve efficiency, especially if, as perhaps suggested by the significantly larger singleton size and the possibility that some true singletons occur in any given hectare, we underestimated the scale at which sedentary tropical arthropods should be sampled. Their lifetime ranges may encompass much larger areas. On the other hand, species richness increases logarithmically with area (Rosenzweig 1995), burdening species richness estimates. Regardless, the key point is that the scope of the inventory must be carefully matched to available resources.

What little we know of tropical spider communities broadly agrees with the predictions of the lognormal fit (Table 2). Our empirical sample included only nine of 23 predicted octaves, yet the implied community, when randomly sampled at the same intensity, compared well to empirical observations of total species, numbers of singletons and doubletons, maximum abundance, and total numbers of individuals (Fig. 4). None of the collecting methods used in Table 2 is completely efficient; therefore, the actual hectare abundance of spiders is probably higher.
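As a rough illustration of how species abundances are grouped into octaves, the following Python sketch assumes the simple convention that octave k holds species represented by 2^k to 2^(k+1) − 1 individuals. Preston's original binning splits boundary counts between adjacent octaves, so this is one common simplification rather than the procedure used for the fit reported here; the function name `octave_histogram` is hypothetical.

```python
from collections import Counter


def octave_histogram(abundances):
    """Count species per abundance octave.

    Assumed convention: octave k contains abundances from 2**k to
    2**(k + 1) - 1, so singletons fall in octave 0, species with
    2-3 individuals in octave 1, 4-7 in octave 2, and so on.
    """
    histogram = Counter()
    for n in abundances:
        if n < 1:
            continue  # unobserved species contribute to no octave
        # For a positive integer, bit_length() - 1 equals floor(log2(n)).
        histogram[n.bit_length() - 1] += 1
    return dict(sorted(histogram.items()))


# Illustrative abundances only, not data from this inventory.
print(octave_histogram([1, 1, 2, 3, 4, 7, 8]))
```

Under this convention, a sample whose most abundant species has between 256 and 511 individuals spans nine octaves (0 to 8).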

When the modelled 700-species community was sampled at an intensity of 1100, on average 658 species and 4% singletons resulted. Lognormal distributions always predict some singletons (here on average 28 or 4%), and stochastic replicates never contain all 700 species (here on average 685). Practically speaking, a sampling intensity of 1100 detects just about as many species as stochastic models provide.

For these data, a sampling intensity of 340 (10 times the actual sampling effort) was just sufficient to include the known richness within the upper bound of the Chao2 estimator. This implies that inventories, as a rule of thumb, should aim for intensities between that and 1100 to obtain realistic nonparametric estimates of species richness.

Richness estimators are relatively more efficient if they can report the true richness based on relatively few data. The efficiency of available nonparametric richness estimators is poor in the sense that roughly three quarters of the community must be observed before the estimator confidence interval actually covers the true value (Walther & Morand 1998). Chao estimators, moreover, have a maximum upper bound of about half the square of the observed richness (if the sample of n species contains n-1 singletons or uniques and one doubleton or duplicate), but in practice such efficiencies are never achieved because of the improbability of so biased a sample.
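The upper bound described above can be checked numerically. The sketch below assumes the classic abundance-based Chao1 form, S_obs + F1²/(2·F2), where F1 and F2 are the numbers of singletons and doubletons, and falls back to the bias-corrected form when no doubletons are present; the function name and the example sample are illustrative assumptions, not part of the analysis reported here.

```python
def chao1(abundances):
    """Classic Chao1 richness estimate from per-species abundances."""
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)                         # observed richness
    f1 = sum(1 for a in counts if a == 1)       # singletons
    f2 = sum(1 for a in counts if a == 2)       # doubletons
    if f2 == 0:
        # Bias-corrected form avoids division by zero without doubletons.
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)


# Most extreme case: n species, of which n - 1 are singletons and 1 a doubleton.
n = 100
extreme_sample = [1] * (n - 1) + [2]
print(chao1(extreme_sample), n * n / 2)
```

For n = 100 the estimate is 100 + 99²/2 = 5000.5, close to n²/2 = 5000, which is the sense in which the bound is about half the square of the observed richness.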

The lognormal distribution can potentially result in higher richness estimates than nonparametric approaches (given the same data) because it assumes the relative abundance distribution is symmetric around the modal octave (Sugihara 1980; Longino et al. 2002), and therefore tends to at least double the observed richness. A number of authors argue that empirical communities show an asymmetric excess of rare species (Nee, Harvey, & May 1991; Nekola & Brown 2007), and Hubbell and co-workers argue from first principles that such is expected (Hubbell 2001; Volkov et al. 2003). However, McGill (2003) suggests that this observed skew in species abundance distributions may also be a sampling artefact. One might also point out that the lognormal even less realistically overestimates the abundant tail of the distribution (Fig. 3, Longino et al. 2002; Magurran & Henderson 2003). However, even if the lognormal slightly underestimates rare species, that error is small compared to the gross negative bias of nonparametric estimators at small sample sizes.

The stochastic variation in small samples drawn from the same lognormal population is impressive (Fig. 4). For the 700-species case, 1000 draws of 6000 individuals produced singleton counts of 62–134 and observed richnesses of 121–414, which comfortably cover the observed statistics of 101 and 351. The lognormal distribution therefore may still be a useful method to estimate species richness under circumstances where many data are available, yet not enough for nonparametric estimators to function well. Unlike the relative abundance distribution-based estimators of Ulrich (1999, 2005), it does not require an explicit ratio of sampled to total habitat area, and thus is more practical in the field.
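A minimal Python sketch of this kind of resampling experiment follows. It is not the simulation used for Fig. 4: the lognormal shape parameter `sigma`, the way relative abundances are generated, and all function names are assumptions made for illustration, so the ranges it prints should not be expected to match the figures quoted above.

```python
import random
from collections import Counter


def make_community(n_species=700, sigma=2.0, seed=0):
    """Hypothetical community: lognormal relative abundances (sigma assumed)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n_species)]


def draw_sample(weights, n_individuals, rng):
    """Draw individuals at random; return (observed richness, singletons)."""
    species = rng.choices(range(len(weights)), weights=weights, k=n_individuals)
    counts = Counter(species)
    singletons = sum(1 for c in counts.values() if c == 1)
    return len(counts), singletons


def replicate(weights, n_individuals=6000, n_draws=100, seed=1):
    """Repeat the draw from the same community and report min-max ranges."""
    rng = random.Random(seed)
    results = [draw_sample(weights, n_individuals, rng) for _ in range(n_draws)]
    richness = [r for r, _ in results]
    singles = [s for _, s in results]
    return (min(richness), max(richness)), (min(singles), max(singles))


community = make_community()
richness_range, singleton_range = replicate(community)
print("observed richness range:", richness_range)
print("singleton count range:", singleton_range)
```

Holding the community fixed and varying only the random draw isolates sampling variation, which is the quantity of interest when asking whether an observed singleton count is compatible with undersampling alone.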

If general, this result implies that even large survey efforts (Table 1, Fig. 5) continue to undersample tropical arthropod biodiversity by perhaps a factor of 2 if singletons average 32% of the total. In many surveys, the figure is much higher (Table 1). Undersampling is a serious issue even for large mammal and bird surveys, where singletons average 16% (Bernard & Fenton 2002; Shankar & Sukamar 2002; McCain 2004). Consequently, typical surveys will underestimate species richness, with obvious implications for our understanding of biodiversity, and for any conservation decisions based on such data.

In summary, it appears that most tropical arthropod biodiversity surveys have been severely under-resourced if their goal was to census or estimate species richness of a defined taxonomic community at a particular place and time. Reliable methods do exist to estimate how many data are required to estimate many ecological statistics (Krebs 1999; Magurran 2004), but species richness historically is an exception. One may hope that future statistical research will improve estimator efficiency, but in the meantime the use of existing estimators dramatically exposes the gap between inventory design as implemented and the minimum necessary to obtain reliable richness estimates. Here the lognormal was more efficient than nonparametric estimators, and perhaps should be used more frequently. Species richness estimators are increasingly used in basic research to detect undersampling bias; results thus far suggest that it is ubiquitous and severe. Rather than scaling back inventory goals, we suggest that inventory analyses continue to assess undersampling bias in order to justify the budgets required to obtain adequate data. Funding sources and consumers of these essential data can scarcely argue that inadequate results are acceptable. If results continue to demonstrate that much greater sampling intensities are required, such will eventually become the norm, rather than the exception.


Thanks to Vicky Funk, Carol Kelloff, David Clarke, Tom Hollowell, and Romeo Williams for logistical support, and Rick West, Gita Bodner, G. B. Edwards, Martín Ramírez, Fernando Alvarez-Padilla, Scott Larcher, and Dana DeRoche for specimen work and identification. We are grateful for the hospitality of the WaiWai community of Southern Guyana. We thank Robert Colwell, Phil DeVries, Jack Longino, and Anne Magurran for comments on an earlier draft. Support for this research came from a Smithsonian ‘Biodiversity of the Guianas’ grant to J. A. Coddington, a Smithsonian Neotropical Lowlands grant to J. Coddington, and a National Science Foundation grant to G. Hormiga and J. Coddington (DEB 9712353).