Aim To explore the impacts of imperfect reference data on the accuracy of species distribution model predictions. The main focus is on the impact of reference data quality (labelling accuracy) and, to a lesser degree, data quantity (sample size) on species presence–absence modelling.
Innovation The paper challenges the common assumption that some popular measures of model accuracy, and the model predictions themselves, are independent of prevalence. It highlights how imperfect reference data may affect a study and the actions that may be taken to address the resulting problems.
Main conclusions The theoretical prevalence independence of popular accuracy measures, such as sensitivity, specificity, the true skill statistic (TSS) and the area under the receiver operating characteristic curve (AUC), is unlikely to hold in practice because of reference data error; all of these accuracy measures, together with estimates of species occurrence, showed prevalence dependency arising from the use of a non-gold-standard reference. The number of cases used also affected the ability of a study to meet its objectives. Ways to reduce the negative effects of imperfect reference data through study design and interpretation are suggested.
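The prevalence dependency described above can be illustrated with a small calculation. The sketch below is not the procedure used in this study: it assumes that the model and the reference data make errors independently of each other given the true presence or absence state (conditional independence), and the function name `apparent_accuracy` and all parameter values are hypothetical. It computes the sensitivity, specificity and TSS that a model would appear to have when scored against a non-gold-standard reference; AUC is not covered.

```python
def apparent_accuracy(prevalence, se_model, sp_model, se_ref, sp_ref):
    """Apparent sensitivity, specificity and TSS of a model scored against an
    imperfect reference.

    Assumption (hypothetical, for illustration): model and reference errors are
    conditionally independent given the true presence/absence state.
    """
    p = prevalence
    # Probability that the reference labels a site as "present"
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    # Probability that the reference labels a site as "absent"
    ref_neg = p * (1 - se_ref) + (1 - p) * sp_ref
    # Joint probability that model and reference both say "present"
    both_pos = p * se_model * se_ref + (1 - p) * (1 - sp_model) * (1 - sp_ref)
    # Joint probability that model and reference both say "absent"
    both_neg = p * (1 - se_model) * (1 - se_ref) + (1 - p) * sp_model * sp_ref
    se_app = both_pos / ref_pos
    sp_app = both_neg / ref_neg
    tss_app = se_app + sp_app - 1  # TSS = sensitivity + specificity - 1
    return se_app, sp_app, tss_app


if __name__ == "__main__":
    # Hypothetical model with true sensitivity = specificity = 0.9 (true TSS = 0.8),
    # scored against a reference that is itself 95% sensitive and specific.
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        se, sp, tss = apparent_accuracy(p, 0.9, 0.9, 0.95, 0.95)
        print(f"prevalence={p:.1f}  sensitivity={se:.3f}  specificity={sp:.3f}  TSS={tss:.3f}")
```

With a perfect reference (`se_ref = sp_ref = 1`) the apparent values equal the true ones at every prevalence. With the 95% accurate reference assumed here, apparent sensitivity rises and apparent specificity falls as prevalence increases, so the apparent TSS is no longer constant even though the model itself has not changed.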