Volume 20, Issue 2
RESOURCE ARTICLE

Dirichlet‐multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data

Joshua G. Harrison

Corresponding Author

E-mail address: joshua.harrison@uwyo.edu

Department of Botany, University of Wyoming, Laramie, WY, USA

Correspondence

Joshua G. Harrison, Department of Botany, 3165, University of Wyoming, 1000 E. University Avenue, Laramie, WY 82071, USA.

Email: joshua.harrison@uwyo.edu

W. John Calder

Department of Botany, University of Wyoming, Laramie, WY, USA

Vivaswat Shastry

Department of Botany, University of Wyoming, Laramie, WY, USA

C. Alex Buerkle

Department of Botany, University of Wyoming, Laramie, WY, USA

First published: 24 December 2019

Abstract

Molecular ecology regularly requires the analysis of count data that reflect the relative abundance of features of a composition (e.g., taxa in a community, gene transcripts in a tissue). The sampling process that generates these data can be modelled using the multinomial distribution. Replicate multinomial samples inform the relative abundances of features in an underlying Dirichlet distribution. These distributions together form a hierarchical model for relative abundances among replicates and sampling groups. This type of Dirichlet‐multinomial modelling (DMM) has been described previously, but its benefits and limitations are largely untested. With simulated data, we quantified the ability of DMM to detect differences in proportions between treatment and control groups, and compared the efficacy of three computational methods to implement DMM: Hamiltonian Monte Carlo (HMC), variational inference (VI), and Gibbs Markov chain Monte Carlo. We report that DMM was better able to detect shifts in relative abundances than analogous analytical tools, while identifying an acceptably low number of false positives. Among methods for implementing DMM, HMC provided the most accurate estimates of relative abundances, and VI was the most computationally efficient. The sensitivity of DMM was exemplified through analysis of previously published data describing lung microbiomes. We report that DMM identified several potentially pathogenic bacterial taxa as more abundant in the lungs of children who aspirated foreign material during swallowing; these differences went undetected with other statistical approaches. Our results suggest that DMM has strong potential as a statistical method to guide inference in molecular ecology.
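The hierarchy described in the abstract (group-level expected proportions, replicate-level proportions drawn from a Dirichlet, and observed counts drawn from a multinomial) can be illustrated with a minimal simulation sketch in Python with NumPy. This is not the authors' implementation, which relies on HMC, VI, or Gibbs sampling. The function names, the concentration parameter `theta`, the example proportions, and the read depth are all assumptions chosen for illustration. The comparison step is a deliberate simplification: it pools counts across replicates and applies the conjugate Dirichlet update under a flat prior, which ignores among-replicate variation and therefore likely overstates certainty relative to a full hierarchical fit.

```python
import numpy as np


def simulate_group(pi, theta, n_reps, n_reads, rng):
    """Simulate replicate count vectors from a Dirichlet-multinomial hierarchy.

    pi      : expected relative abundances for the sampling group (sums to 1)
    theta   : concentration (hypothetical name); larger values keep
              replicate proportions closer to pi
    n_reps  : number of replicates in the group
    n_reads : reads (counts) observed per replicate
    """
    pi = np.asarray(pi, dtype=float)
    # Replicate-level proportions vary around the group-level expectation.
    p = rng.dirichlet(theta * pi, size=n_reps)
    # Observed counts are multinomial draws given each replicate's proportions.
    return np.vstack([rng.multinomial(n_reads, p_i) for p_i in p])


def pooled_posterior_draws(counts, rng, prior=1.0, n_draws=2000):
    """Draw group-level proportions from a conjugate Dirichlet posterior.

    Simplifying assumption: counts are pooled across replicates and a flat
    Dirichlet(prior) is used, so overdispersion among replicates is ignored.
    """
    pooled = counts.sum(axis=0)
    return rng.dirichlet(pooled + prior, size=n_draws)


rng = np.random.default_rng(1)

# Hypothetical four-taxon communities; the treatment shifts taxon 1 and taxon 4.
control_pi = [0.50, 0.30, 0.15, 0.05]
treatment_pi = [0.30, 0.30, 0.15, 0.25]

control = simulate_group(control_pi, theta=50.0, n_reps=10, n_reads=5000, rng=rng)
treatment = simulate_group(treatment_pi, theta=50.0, n_reps=10, n_reads=5000, rng=rng)

# Posterior distribution of the difference in proportions (treatment - control).
diff = pooled_posterior_draws(treatment, rng) - pooled_posterior_draws(control, rng)

# Share of posterior draws in which each taxon is more abundant under treatment.
prob_greater = (diff > 0).mean(axis=0)
print(np.round(prob_greater, 3))
```

Values of `prob_greater` near 0 or 1 suggest a shift in relative abundance between groups, while values near 0.5 suggest no detectable difference. A full DMM fit would instead estimate the group-level Dirichlet parameters jointly with the replicate-level proportions.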

DATA AVAILABILITY STATEMENT

All scripts and processed data used for this manuscript are available at https://github.com/JHarrisonEcoEvo/DMM (Harrison, Calder, Shastry, & Buerkle, 2019), and a snapshot corresponding to the status at publication is archived at Zenodo (https://doi.org/10.5281/zenodo.3558682). Data from Duvallet et al. (2019) can be downloaded from https://doi.org/10.5281/zenodo.2678108.
