Making better Maxent models of species distributions: complexity, overfitting and evaluation
Models of species niches and distributions have become invaluable to biogeographers over the past decade, yet several outstanding methodological issues remain. Here we address three critical ones: selecting appropriate evaluation data, detecting overfitting, and tuning program settings to approximate optimal model complexity. We integrate solutions to these issues for Maxent models, using the Caribbean spiny pocket mouse, Heteromys anomalus, as an example.
The study region is north-western South America.
We partitioned data into calibration and evaluation datasets via three variations of k-fold cross-validation: randomly partitioned, geographically structured and masked geographically structured (which restricts background data to regions corresponding to calibration localities). We then carried out tuning experiments by varying the level of regularization, which controls model complexity. Finally, we gauged performance by quantifying discriminatory ability and overfitting, and by visually inspecting maps of the predictions in geographic space.
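To make the geographically structured partitioning concrete, the following is a minimal Python sketch, not the procedure used in the study. It assumes localities are given as (longitude, latitude) pairs and that folds are formed as longitudinal bands of near-equal size; the function names `geographic_folds` and `split` are hypothetical, and the actual spatial structuring rule may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), an assumed representation


def geographic_folds(points: Sequence[Point], k: int) -> List[int]:
    """Assign each locality to one of k longitudinal bands of near-equal size.

    Assumption: banding by longitude stands in for whatever spatial
    structuring rule is appropriate for the species and region.
    """
    n = len(points)
    if k < 2 or k > n:
        raise ValueError("k must be between 2 and the number of localities")
    # Indices of the localities ordered from west to east.
    order = sorted(range(n), key=lambda i: points[i][0])
    folds = [0] * n
    for rank, idx in enumerate(order):
        folds[idx] = rank * k // n  # contiguous ranks share a fold
    return folds


def split(points: Sequence[Point], folds: Sequence[int], held_out: int):
    """Return (calibration, evaluation) localities for one held-out fold."""
    calibration = [p for p, f in zip(points, folds) if f != held_out]
    evaluation = [p for p, f in zip(points, folds) if f == held_out]
    return calibration, evaluation
```

Under the masked variant, background data would additionally be restricted to the bands that contain the calibration localities; that step depends on the environmental rasters and is omitted here.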
Performance varied among data-partitioning approaches and among regularization multipliers. The randomly partitioned approach inflated estimates of model performance and the geographically structured approach showed high overfitting. In contrast, the masked geographically structured approach allowed selection of high-performing models based on all criteria. Discriminatory ability showed a slight peak in performance around the default regularization multiplier. However, regularization levels two to four times higher than the default yielded substantially lower overfitting. Visual inspection of maps of model predictions coincided with the quantitative evaluations.
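As an illustration of how discriminatory ability and overfitting might be quantified for one regularization setting, the sketch below computes a rank-based AUC and the gap between calibration and evaluation AUC. This is an assumed choice of metrics for illustration; the abstract does not state which measures were used, and the function names `auc` and `auc_diff` are hypothetical. For simplicity it scores both sets against a single background sample, which would not hold under the masked approach.

```python
from typing import Sequence


def auc(presence_scores: Sequence[float], background_scores: Sequence[float]) -> float:
    """Probability that a presence locality outscores a background point.

    Ties count as one half, matching the Mann-Whitney formulation of AUC.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def auc_diff(
    calibration_scores: Sequence[float],
    evaluation_scores: Sequence[float],
    background_scores: Sequence[float],
) -> float:
    """Calibration AUC minus evaluation AUC; larger values suggest more overfitting."""
    return auc(calibration_scores, background_scores) - auc(
        evaluation_scores, background_scores
    )
```

In a tuning experiment, these quantities would be computed for each regularization multiplier and each held-out fold, then compared across settings.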
Species-specific tuning of model parameters can improve the performance of Maxent models. Further, accurate estimates of model performance and overfitting depend on using independent evaluation data. These strategies for model evaluation may be useful for other modelling methods as well.