Aim To offer an objective approach to some of the problems associated with the development of logistic regression models: how to compare different models, how to determine whether sample size is adequate, how the ratio of positive to negative cells influences model accuracy, and at what scale the hypothesis of a non-random distribution should be tested.
Location Test data were taken from Southern Africa.
Methods The approach relies mainly on the AUC (area under the curve) statistic, derived from threshold ROC (receiver operating characteristic) plots, for between-model comparisons. Data for the distribution of the bont tick Amblyomma hebraeum Koch (Acari: Ixodidae) are used to illustrate the methods.
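As an illustration of the AUC-based comparison described in the Methods, the following is a minimal sketch rather than the authors' implementation. It uses the rank-based (Mann-Whitney) form of the AUC: the probability that a randomly chosen positive cell receives a higher predicted score than a randomly chosen negative cell, with ties counted as one half. The function name `auc` and all predicted probabilities are hypothetical and are not taken from the tick data set.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: chance that a positive cell outscores a negative cell."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # positive cell ranked above negative cell
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities from two candidate models,
# split by whether the cell was scored positive or negative.
model_a_pos = [0.9, 0.8, 0.7, 0.6]
model_a_neg = [0.5, 0.4, 0.65, 0.2]
model_b_pos = [0.9, 0.4, 0.7, 0.3]
model_b_neg = [0.5, 0.4, 0.65, 0.2]

auc_a = auc(model_a_pos, model_a_neg)
auc_b = auc(model_b_pos, model_b_neg)
print(f"AUC model A = {auc_a:.4f}, AUC model B = {auc_b:.4f}")
```

Under this sketch, the model with the higher AUC discriminates better between positive and negative cells. Whether a difference between two AUCs is meaningful would still need a formal test, which is not shown here.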
Results Methods for estimating minimum sample sizes and for more accurate hypothesis testing are outlined. Logistic regression is robust to the assumption that uncollected cells can be scored as negative, provided that the sample size of cells scored as positive is adequate. The variation in temperature and rainfall at localities where A. hebraeum has been collected is significantly lower than expected from a random sample of points across the data set, suggesting that within-site variation may be an important determinant of its distribution.
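The comparison of collection localities against a random sample of points can be illustrated with a randomization test. The abstract does not state which test was used, so this is only a sketch under stated assumptions: each cell is assumed to carry a single within-site variability value (for example, the standard deviation of monthly temperature), and the test statistic is assumed to be the mean of that value across collection localities. The function name `randomization_p_value` and all data are hypothetical.

```python
import random
import statistics


def randomization_p_value(all_values, observed_values, n_iter=2000, seed=0):
    """One-sided p-value: how often a random sample of cells has a mean
    within-site variability as low as that of the collection localities."""
    rng = random.Random(seed)              # fixed seed for repeatability
    observed_mean = statistics.mean(observed_values)
    k = len(observed_values)
    as_low = 0
    for _ in range(n_iter):
        sample = rng.sample(all_values, k)  # k cells drawn without replacement
        if statistics.mean(sample) <= observed_mean:
            as_low += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_low + 1) / (n_iter + 1)


# Hypothetical within-site variability for 200 cells (assumed values).
all_variability = [2.0 + 0.05 * i for i in range(200)]
# Hypothetical collection localities: the 20 least variable cells.
collected = all_variability[:20]

p = randomization_p_value(all_variability, collected)
print(f"one-sided p = {p:.4f}")
```

A small p-value under this sketch would indicate that collection localities are less variable than randomly chosen cells, which is the pattern reported for A. hebraeum.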
Main conclusions Between-model comparisons relying on AUCs can be used to enhance objectivity in the development and refinement of logistic regression models. Both between-site and within-site variability should be considered as potentially important factors determining species distributions.