• Nearest-neighbor imputation
• Plant community composition
• Random forest
• Species distribution modelling
• Vegetation mapping
• Western Oregon



Landscape management and conservation planning require maps of vegetation composition and structure over large regions. Species distribution models (SDMs) are often used for individual species, but projects mapping multiple species are rarer. We compare maps of plant community composition assembled by stacking results from many SDMs with multivariate maps constructed using nearest-neighbor imputation.


Western Cascades ecoregion, Oregon and California, USA.


We mapped distributions and abundances of 28 tree species over 4,007,110 ha at 30-m resolution using three approaches. Two were SDMs built with machine learning (random forest), yielding (1) binary predictions (RF_Bin) and (2) basal area (abundance) predictions (RF_Abund); the third produced (3) multi-species basal area predictions using a nearest-neighbor imputation variant based on random forest (RF_NN). We evaluated the accuracy of binary predictions for all models, compared mapped area with plot-based areal estimates, assessed species abundance at two spatial scales, and evaluated communities for species richness, problematic compositional errors and overall community composition.
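To illustrate why imputation keeps species together, the following is a minimal sketch of single-neighbor (k = 1) imputation. It is not the RF_NN procedure itself: plain Euclidean distance in a made-up two-variable covariate space stands in for the random-forest-based nearness measure, and the plots, covariates and species names are all hypothetical.

```python
import math

# Hypothetical reference plots: (environmental covariates, basal area by species).
# All values are invented for illustration only.
PLOTS = [
    ((0.0, 0.0), {"sp_a": 10.0, "sp_b": 0.0}),
    ((1.0, 1.0), {"sp_a": 0.0, "sp_b": 8.0}),
    ((2.0, 0.5), {"sp_a": 3.0, "sp_b": 4.0}),
]


def impute_nearest(pixel_covariates, plots=PLOTS):
    """Assign a pixel the full species vector of its nearest reference plot.

    Euclidean distance is an assumption made here for simplicity; it replaces
    the random-forest-based nearness used by RF_NN.
    """
    nearest = min(plots, key=lambda plot: math.dist(pixel_covariates, plot[0]))
    # Copy the whole vector so every species value comes from one real plot.
    return dict(nearest[1])


pixel_prediction = impute_nearest((0.9, 1.2))
```

Because each pixel inherits an entire observed plot, any species combination on the map is one that actually occurred in the plot data, which is not guaranteed when separately fitted single-species maps are stacked.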


RF_Bin yielded the strongest binary predictions (median True Skill Statistic; RF_Bin: 0.57, RF_NN: 0.38, RF_Abund: 0.27). Plot-scale predictions of abundance were poor for RF_Abund and RF_NN (median Agreement Coefficient (AC): −1.77 and −2.28), but strong when summarized over 50-km radius tessellated hexagons (median AC for both: 0.79). RF_Abund's strength with abundance and weakness with binary predictions stem from its predicting small values instead of zeros. The number of zero-value predictions from RF_NN was closest to the count of zeros in the plot data. Correspondingly, RF_NN's map-based species area estimates closely matched plot-based area estimates. RF_NN also performed best for community-level accuracy metrics.
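The True Skill Statistic is sensitivity plus specificity minus one. The sketch below computes it and shows, on invented numbers, how small non-zero abundance predictions can depress it. The basal area values and the rule that any value above zero counts as presence are assumptions for illustration, not the thresholds or data used in the study.

```python
def true_skill_statistic(observed, predicted):
    """TSS = sensitivity + specificity - 1 for paired presence/absence values."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)
    fn = sum(1 for o, p in pairs if o and not p)
    tn = sum(1 for o, p in pairs if not o and not p)
    fp = sum(1 for o, p in pairs if not o and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("TSS needs at least one observed presence and one absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Hypothetical basal area values for one species at six plots.
observed_ba = [0.0, 0.0, 0.0, 5.0, 12.0, 0.0]
predicted_ba = [0.3, 0.1, 0.0, 4.0, 9.0, 0.2]

# Assumed conversion rule: any basal area above zero counts as presence.
observed_presence = [value > 0 for value in observed_ba]
predicted_presence = [value > 0 for value in predicted_ba]

tss = true_skill_statistic(observed_presence, predicted_presence)
```

In this toy case the abundance predictions are close to the observed values, yet three of the four true absences receive small positive predictions, so specificity falls to 0.25 and the TSS is only 0.25.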


RF_NN was the best technique for building a broad-scale map of diversity and composition because the modelling framework maintained inter-species relationships from the input plot data. Re-assembling communities from single-variable maps often yielded unrealistic communities. Although RF_NN rarely excelled at single-species predictions of presence or abundance, it was often adequate for many (but not all) applications in both dimensions. We discuss our results in the context of map utility for applications in ecology, conservation and natural resource management planning. We highlight how RF_NN is well suited for mapping current but not future vegetation.