Selecting traits that explain species–environment relationships: a generalized linear mixed model approach




Quantifying the effect of species traits on the assembly of communities is statistically challenging. A key question is how species occurrence and abundance can be explained by the trait values of the species and the environmental values at the sites.


Using a site × species abundance table, a site × environment table and a species × trait table, we address this question with a novel generalized linear mixed model (GLMM) approach. The GLMM overcomes problems of pseudo-replication and heteroscedastic variance by including sites and species as random factors. The method applies equally to presence–absence, count and multinomial data. We present a tiered forward selection approach for obtaining a parsimonious model and compare the results with those of alternative methods (the fourth corner method and RLQ ordination).
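
As a rough illustration of the data layout such a model needs, the three tables can be stacked into one record per site–species pair, with a trait × environment product term as a candidate fixed effect and the site and species labels as grouping factors for the random effects. The sketch below is only an assumption about how this might be arranged: the toy values, the table names `L`, `R` and `Q`, and the helper `to_long` are hypothetical and not taken from the paper.

```python
from itertools import product

# Hypothetical toy tables; all values are invented purely for illustration.
# L: site x species presence-absence, R: site x environment, Q: species x trait.
L = {
    "site1": {"spA": 1, "spB": 0},
    "site2": {"spA": 0, "spB": 1},
    "site3": {"spA": 1, "spB": 1},
}
R = {"site1": {"moisture": 1.0}, "site2": {"moisture": 4.0}, "site3": {"moisture": 2.0}}
Q = {"spA": {"sla": 20.0}, "spB": {"sla": 30.0}}


def to_long(L, R, Q, env, trait):
    """Return one record per site-species pair for a single env/trait pair."""
    rows = []
    for site, species in product(sorted(L), sorted(Q)):
        e = R[site][env]
        t = Q[species][trait]
        rows.append({
            "site": site,              # grouping factor for a site random effect
            "species": species,        # grouping factor for a species random effect
            "y": L[site][species],     # presence (1) or absence (0)
            env: e,                    # environmental value at the site
            trait: t,                  # trait value of the species
            f"{env}:{trait}": e * t,   # trait-environment interaction term
        })
    return rows


long_rows = to_long(L, R, Q, env="moisture", trait="sla")
```

Rows in this form could then be passed to any mixed-model routine that accepts site and species as grouping factors; the choice of routine, link function and random-effect structure is left open here.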


We illustrate the approach using presence–absence versions of two data sets. In the Dune Meadow data, species presence is parsimoniously explained by moisture and manure on the meadows in combination with seed mass and specific leaf area (SLA). In the Grazed Grassland data, species presence is parsimoniously explained by grazing intensity and soil phosphorus in combination with the C:N ratio and flowering mode.


Our GLMM approach can be used to identify which species traits and environmental variables best explain species distributions, and which traits are significantly correlated with environmental variables. We argue that the method is better suited than the fourth corner method and RLQ to providing an interpretable and predictive model.