Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species
Article first published online: 16 OCT 2013
© 2013 The Authors. Methods in Ecology and Evolution © 2013 British Ecological Society
Methods in Ecology and Evolution
Volume 4, Issue 11, pages 1091–1100, November 2013
How to Cite
Chao, A., Wang, Y. T. and Jost, L. (2013), Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species. Methods in Ecology and Evolution, 4: 1091–1100. doi: 10.1111/2041-210X.12108
- Issue published online: 6 NOV 2013
- Accepted manuscript online: 19 AUG 2013 08:51PM EST
- Manuscript Accepted: 1 AUG 2013
- Manuscript Received: 14 FEB 2013
Funding
- Taiwan National Science Council. Grant Number: 100-2118-M007-006-MY3
- Population Biology Foundation
Keywords
Good-Turing frequency formula; mutual information; sample coverage; Shannon entropy; species accumulation curve; species discovery rate
Summary
- Estimating Shannon entropy and its exponential from incomplete samples is a central objective in many research fields. However, empirical estimates of Shannon entropy and its exponential depend strongly on sample size and typically exhibit substantial bias. This work presents a novel, accurate, low-bias analytic estimator of entropy based on species frequency counts. Our estimator does not require prior knowledge of the number of species.
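To make the sample-size dependence concrete, the following is a minimal sketch (not taken from the paper) of the empirical "plug-in" entropy estimate computed from observed relative frequencies; the community used here is hypothetical. When the sample size is smaller than the species richness, the plug-in estimate is bounded above by the log of the number of observed species and so necessarily undershoots the true entropy.

```python
import math
import random

def plugin_entropy(counts):
    """Plug-in (maximum-likelihood) Shannon entropy from species counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

# Hypothetical community: S equally common species, so the true entropy is ln(S).
random.seed(1)
S, n = 100, 50                       # sample size deliberately smaller than richness
sample = [random.randrange(S) for _ in range(n)]
counts = [sample.count(s) for s in set(sample)]

true_H = math.log(S)                 # ~4.605
est_H = plugin_entropy(counts)
# The plug-in estimate cannot exceed log(number of observed species) <= log(n),
# so with n < S it is guaranteed to underestimate the true entropy.
```

This illustrates why a correction for undetected species, rather than a larger sample alone, is needed for low-bias estimation.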
- We show that there is a close relationship between Shannon entropy and the species accumulation curve, which depicts the cumulative number of observed species as a function of sample size. We reformulate entropy in terms of the expected discovery rates of new species with respect to sample size, that is, the successive slopes of the species accumulation curve. Our estimator is obtained by applying slope estimators derived from an improved Good-Turing frequency formula. Our method is also applied to estimate mutual information.
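The reformulation above can be sketched in code: the observed frequencies contribute successive slope estimates of the species accumulation curve, and the singleton and doubleton counts (f1, f2) drive a Good-Turing-style correction for species not yet discovered. This is an illustrative sketch only; the exact form of the correction term should be verified against the published estimator.

```python
import math

def entropy_discovery_rate(counts):
    """Entropy estimator built from discovery rates of new species.

    Sketch of the approach described in the text: the first sum accumulates
    slope estimates of the species accumulation curve from observed
    frequencies; the second term is a Good-Turing-style correction for
    undetected species based on singletons (f1) and doubletons (f2).
    """
    n = sum(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)

    # Contribution of observed species via successive slope estimates.
    h = sum((c / n) * sum(1.0 / k for k in range(c, n))
            for c in counts if 1 <= c <= n - 1)

    # Good-Turing correction for species still undetected in the sample.
    if f1 == 0:
        return h
    if f2 > 0:
        A = 2 * f2 / ((n - 1) * f1 + 2 * f2)
    elif f1 > 1:
        A = 2 / ((n - 1) * (f1 - 1) + 2)
    else:
        A = 1.0
    if A < 1:
        tail = -math.log(A) - sum((1 / r) * (1 - A) ** r for r in range(1, n))
        h += (f1 / n) * (1 - A) ** (1 - n) * tail
    return h
```

For a sample with no rare species (f1 = 0), the correction vanishes and the estimate comes from the observed slopes alone; singletons and doubletons inflate the estimate to account for the entropy carried by undetected species.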
- Extensive simulations from theoretical models and real surveys show that, unless the sample size is unreasonably small, the resulting entropy estimator is nearly unbiased. Our estimator generally outperforms previous methods in bias and accuracy (lower mean squared error), especially when species richness is large and a large fraction of species remains undetected in samples.
- We discuss the extension of our approach to estimate Shannon entropy for multiple incidence data. The use of our estimator in constructing an integrated rarefaction and extrapolation curve of entropy (or mutual information) as a function of sample size or sample coverage (an aspect of sample completeness) is also discussed.