  • Generalized Linear Model;
  • Logistic regression;
  • Predictive accuracy assessment;
  • Quercus;
  • ROC-curve;
  • Sensitivity;
  • Specificity;
  • Vegetation modeling
  • Nomenclature: Munz & Keck (1973)

Abstract. Generalized Linear Models (GLM) have been advocated in vegetation analysis because they can accommodate complex species response curves. This paper investigates the potential advantages of classification and regression trees (CART), a recursive partitioning method that is free of distributional assumptions. We used multiple logistic regression (a form of GLM) and CART to predict the distribution of three major oak species in California. We compared two types of model: polynomial logistic regression models optimized to account for non-linearity and factor interactions, and simple CART-models. Each type of model was developed using learning data sets of 2085 and 410 sample cases, and assessed on test sets containing 2016 and 3691 cases, respectively. The responses of the three species to environmental gradients were varied and often non-homogeneous or context-dependent. In tests of predictive accuracy, CART-models performed significantly better than our polynomial logistic regression models in four of the six cases considered, and equally well in the remaining two. CART also showed a superior ability to detect factor interactions. Insight gained from CART-models then helped us develop improved parametric models. Although the probabilistic form of logistic regression results is better suited to testing theories about species responses to environmental gradients, we found that CART-models are intuitive, easy to develop and interpret, and constitute a valuable tool for modeling species distributions.
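As an illustration of the accuracy measures named in the keywords (sensitivity, specificity and the area under the ROC-curve), the following minimal Python sketch shows how they could be computed from predicted probabilities of presence on a test set. It is not the code used in this study: the function names, the 0.5 threshold and the eight-site data set are assumptions made up for the example, and the ROC area is obtained here from the rank (Mann-Whitney) formulation rather than by integrating a plotted curve.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for presence (1) / absence (0) data.

    A site is predicted 'present' when its probability is >= threshold.
    The 0.5 default is an arbitrary choice for this illustration.
    """
    tp = fn = tn = fp = 0
    for observed, prob in zip(y_true, y_prob):
        predicted_present = prob >= threshold
        if observed == 1:
            if predicted_present:
                tp += 1  # presence correctly predicted
            else:
                fn += 1  # presence missed
        else:
            if predicted_present:
                fp += 1  # absence wrongly predicted as presence
            else:
                tn += 1  # absence correctly predicted
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC-curve via pairwise comparison.

    Equals the probability that a randomly chosen presence site receives a
    higher predicted probability than a randomly chosen absence site.
    Ties count one half, which matters for tree models because all cases
    in a terminal node share the same predicted probability.
    """
    presences = [p for y, p in zip(y_true, y_prob) if y == 1]
    absences = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = 0.0
    for p_pres in presences:
        for p_abs in absences:
            if p_pres > p_abs:
                score += 1.0
            elif p_pres == p_abs:
                score += 0.5
    return score / (len(presences) * len(absences))


# Hypothetical test set: observed presence/absence at eight sites and the
# probabilities of presence predicted by some fitted model.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

sens, spec = sensitivity_specificity(observed, predicted)
auc = roc_auc(observed, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.4f}")
```

Applying the same functions to the test-set predictions of two competing models (for instance a logistic regression and a tree) gives directly comparable figures. Testing whether a difference is significant would require an additional procedure that this sketch does not cover.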