Many zeros does not mean zero inflation: comparing the goodness-of-fit of parametric models to multivariate abundance data

Authors

  • David I. Warton

    Corresponding author
    1. Department of Biological Sciences, Division of Environmental and Life Sciences, Macquarie University, NSW 2109, Australia
    2. Department of Statistics, School of Mathematics, University of New South Wales, NSW 2052, Australia

Abstract

An important step in studying the ecology of a species is choosing a statistical model of abundance; however, there has been little general consideration of which statistical model to use. In particular, abundance data have many zeros (often 50–80 per cent of all values), and zero-inflated count distributions are often used specifically to model this high frequency of zeros. However, in such cases it is often taken for granted that a zero-inflated model is required, and the goodness-of-fit of count distributions with and without zero inflation is rarely compared for abundance data.

In this article, the goodness-of-fit was compared for several marginal models of abundance in 20 multivariate datasets (a total of 1672 variables across all datasets) from different sources. Multivariate abundance data are quite commonly collected in applied ecology, and the properties of these data may differ from abundances collected in autecological studies. Goodness-of-fit was assessed using AIC values, graphs of observed vs expected proportion of zeros in a dataset, and graphs of the sample mean–variance relationship.
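
As a rough illustration of the observed vs expected proportion-of-zeros check, the sketch below compares the observed fraction of zeros for a single variable with the fraction implied by a Poisson and by a negative binomial distribution that share the sample mean. It is a minimal sketch under stated assumptions, not the procedure used in the article: the parameters are estimated by the method of moments rather than by maximum likelihood, there are no covariates, and the function name `zero_fractions` and the counts are hypothetical.

```python
import math

def zero_fractions(counts):
    """Return (observed, Poisson-expected, NB-expected) proportions of zeros.

    Assumption: moment estimates are used for simplicity; a likelihood-based
    fit would generally give somewhat different values.
    Requires at least two observations.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    observed = sum(1 for c in counts if c == 0) / n

    # Poisson: P(0) = exp(-mean)
    poisson = math.exp(-mean)

    # Negative binomial with variance = mean + mean^2 / k:
    # P(0) = (k / (k + mean))^k, with k estimated from the sample moments.
    if var > mean:
        k = mean ** 2 / (var - mean)
        negbin = (k / (k + mean)) ** k
    else:
        # No overdispersion in the sample: fall back to the Poisson value.
        negbin = poisson
    return observed, poisson, negbin

# Hypothetical counts of one species at 20 sites (illustrative only).
counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 0, 0]
obs, pois, nb = zero_fractions(counts)
print(f"observed zeros: {obs:.2f}")
print(f"expected under Poisson: {pois:.2f}")
print(f"expected under negative binomial: {nb:.2f}")
```

For these made-up counts the negative binomial value lies much closer to the observed proportion of zeros than the Poisson value does, even though no zero-inflation term is involved; the overdispersion alone accounts for most of the zeros.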

The negative binomial model, without zero inflation, was the best fitting of the count distributions. The high frequency of zeros was well described by the systematic component of the model (i.e. at some places predicted abundance was high, while at others it was zero) and so it was rarely necessary to modify the random component of the model (i.e. to fit a zero-inflated distribution). A Gaussian model based on transformed abundances fitted data surprisingly well, and rescaled per cent cover was usually poorly fitted by a count distribution. In conclusion, results suggest that the high frequency of zeros commonly seen in multivariate abundance data is best considered to come from distributions where mean abundance is often very low (hence there are many zeros), as opposed to claiming that there is an unusually high number of zeros compared to common parametric distributions. Copyright © 2005 John Wiley & Sons, Ltd.
