Profile or group discriminative techniques? Generating reliable species distribution models using pseudo-absences and target-group absences from natural history collections


Rubén García Mateo, Universidad de Castilla-La Mancha, ICAM, Laboratorio de SIG y Teledetección, Av. Carlos III s/n, Toledo, 45071, Spain.


Aim  The presence-only data stored in natural history collections are the most important source of information available on the distribution of organisms. These data can be used with profile techniques to generate species distribution models (SDMs), but pseudo-absences must be generated before group discriminative techniques can be applied. In this study, we evaluated whether SDMs generated with pseudo-absences are reliable and whether the results obtained with profile and group discriminative techniques differ.

Location  Ecuador, South America.

Methods  For each of five species of Anthurium, SDMs were generated from a training data set using six methods: two profile techniques (BIOCLIM and Gower’s distance index), three group discriminative techniques [logistic multiple regression (LMR), multivariate adaptive regression splines (MARS) and Maxent] and one mixed modelling approach, the genetic algorithm for rule-set production (GARP), which combines profile and group discriminative techniques and generates its own pseudo-absences. For LMR, MARS and Maxent, three types of absences were generated: (1) random pseudo-absences equal in number to the presences, excluding a buffer area around presences (except for Maxent, which assumes that this background sample includes presences); (2) a large number (10,000) of random pseudo-absences, also excluding a buffer area around each presence; and (3) ‘target-group absences’ (TGA), consisting of sites where the specialist collected other species of the group but not the species being modelled. To compare the predictive performance of the SDMs, the area under the curve (AUC) statistic was calculated using an independent testing data set for each species.
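
The three ingredients described in the Methods (buffered random pseudo-absences, target-group absences and the AUC comparison) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy coordinates, the species labels and the buffer distance are all hypothetical, coordinates are assumed to be planar (projected) so that Euclidean distance is meaningful, and the AUC is computed as the usual rank statistic.

```python
import math
import random


def random_pseudo_absences(presences, n, extent, buffer_dist, rng):
    """Draw n random points inside extent, rejecting any point closer than
    buffer_dist to a presence. Assumes planar coordinates and an extent that
    is not entirely covered by the buffers (otherwise the loop never ends)."""
    xmin, xmax, ymin, ymax = extent
    points = []
    while len(points) < n:
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if all(math.dist(candidate, p) >= buffer_dist for p in presences):
            points.append(candidate)
    return points


def target_group_absences(group_records, focal_species):
    """Sites where other species of the group were collected
    but the focal species was not."""
    focal_sites = {site for sp, site in group_records if sp == focal_species}
    other_sites = {site for sp, site in group_records if sp != focal_species}
    return sorted(other_sites - focal_sites)


def auc(presence_scores, absence_scores):
    """AUC as the proportion of presence/absence pairs in which the presence
    receives the higher model score (ties count as one half)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Toy data (hypothetical): two presences of the focal species "sp_a".
rng = random.Random(0)
presences = [(2.0, 2.0), (3.0, 5.0)]

# Option (1): as many pseudo-absences as presences, outside a buffer.
pseudo_abs = random_pseudo_absences(
    presences, n=len(presences), extent=(0.0, 10.0, 0.0, 10.0),
    buffer_dist=1.0, rng=rng,
)

# Option (3): collection sites of other species of the group.
records = [
    ("sp_a", (2.0, 2.0)),
    ("sp_b", (2.0, 2.0)),  # shared site, so not an absence for sp_a
    ("sp_b", (7.0, 1.0)),
    ("sp_c", (4.0, 8.0)),
]
tga = target_group_absences(records, "sp_a")

# Evaluation on independent test scores (hypothetical model outputs).
test_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Option (2) in the Methods corresponds to calling `random_pseudo_absences` with `n=10000`. With geographic (latitude/longitude) coordinates, a geodesic distance would need to replace `math.dist`.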

Results  MARS, Maxent and LMR produced better results than the profile techniques. The models created with TGA were generally more accurate than those generated with pseudo-absences.

Main conclusions  We explain the advantages and disadvantages of the different options for using pseudo-absences and TGA with profile and group discriminative modelling techniques, and make recommendations for future work.