Bioclimatic modelling using Gaussian mixture distributions and multiscale segmentation


Correspondence: Daniel G. Gavin, Department of Plant Biology, 265 Morrill Hall, University of Illinois, Urbana, IL 61801. E-mail:


Aim  To introduce Gaussian mixture distributions and sequential maximum a posteriori image segmentation (GM-SMAP) as a model that predicts species ranges from mapped climatic variables, and to compare its predictive capacity with two commonly used bioclimatic models: regression tree analysis (RTA) and smoothed response surfaces (SRS).

Location  North-west North America.

Methods  We compared models for their ability to predict the distributional range of western hemlock (Tsuga heterophylla). We calculated and projected nine climatic and water-balance variables to a 2-km grid up to 140 km from the T. heterophylla range. Models were trained using the five variables selected by RTA, as well as subsets of three variables. Goodness of fit was assessed using models trained and tested on the entire study area. Predictive capacity was assessed using 100 cross-validation tests, each trained on a randomly sampled 1% of the study area and tested on the complement of the study area.
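The cross-validation design described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names `holdout_splits` and `misclassified_fraction`, the fixed random seed, and the representation of grid cells as integer indices are all assumptions introduced here.

```python
import random


def holdout_splits(n_cells, n_tests=100, train_fraction=0.01, seed=0):
    """Yield (train, test) index lists for repeated random hold-out tests.

    Each test trains on a random sample of the grid cells (1% by default)
    and tests on the complement, i.e. every cell not used for training.
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    n_train = max(1, round(n_cells * train_fraction))
    for _ in range(n_tests):
        train = set(rng.sample(range(n_cells), n_train))
        test = [i for i in range(n_cells) if i not in train]
        yield sorted(train), test


def misclassified_fraction(predicted, observed):
    """Fraction of cells whose predicted presence/absence differs from observed."""
    wrong = sum(p != o for p, o in zip(predicted, observed))
    return wrong / len(observed)
```

In use, each model would be fitted on the `train` cells of a split, applied to the `test` cells, and scored with `misclassified_fraction`; the distribution of the 100 scores then summarizes predictive capacity.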

Results  Models using all five variables were significantly better than three-variable models. Model fit was greatest for SRS: GM-SMAP misclassified slightly more area than SRS, and RTA misclassified almost twice as much. However, cross-validation showed that predictive capacity was clearly greatest for GM-SMAP and lowest for SRS, indicating that GM-SMAP makes more accurate predictions from sparse data.

Main conclusions  GM distributions prevent overfitting using an information-theoretic approach, and the SMAP algorithm minimizes the spatial extent of the largest misclassified area using a multiscale method. These properties, useful for image classification, also contribute to the strong predictive capacity of GM-SMAP as a bioclimatic model. SRS overfit the data, lowering its predictive capacity, and RTA failed to capture details of interactions among variables, yielding a poor fit. These results demonstrate the strong potential of GM-SMAP as a bioclimatic model.
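To illustrate how an information-theoretic criterion can guard against overfitting in a Gaussian mixture, the sketch below penalizes the maximized log-likelihood by the number of free parameters and picks the number of mixture components with the lowest score. The abstract does not name the criterion used, so the Bayesian information criterion (BIC) is substituted here as a stand-in; the function names and the log-likelihood values in the usage note are hypothetical.

```python
import math


def n_free_parameters(n_components, n_dims):
    """Free parameters of a Gaussian mixture with full covariance matrices."""
    weights = n_components - 1  # mixing weights sum to one
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2
    return weights + means + covariances


def bic(log_likelihood, n_components, n_dims, n_samples):
    """Bayesian information criterion (a stand-in criterion); lower is better."""
    k = n_free_parameters(n_components, n_dims)
    return -2.0 * log_likelihood + k * math.log(n_samples)


def select_order(log_likelihoods, n_dims, n_samples):
    """Return the component count with the lowest BIC.

    log_likelihoods maps each candidate component count to the maximized
    log-likelihood of a mixture fitted with that many components.
    """
    return min(
        log_likelihoods,
        key=lambda m: bic(log_likelihoods[m], m, n_dims, n_samples),
    )
```

For example, with five climatic variables, 1000 training cells, and hypothetical log-likelihoods `{1: -5000.0, 2: -4700.0, 3: -4690.0, 4: -4688.0}`, `select_order` chooses two components: the small likelihood gains from a third or fourth component do not offset the penalty for their extra parameters.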