A problem with variable selection in a comparison of correlative and process‐based species distribution models: Comments on Higgins et al., 2020

Abstract
Comments are presented on an article published in October 2020 in Ecology and Evolution (“Predictive ability of a process‐based versus a correlative species distribution model”) by Higgins et al. That study analyzed the natural distributions of Australian eucalypt and acacia species and assessed the adventive ranges of selected species outside Australia. Unfortunately, inappropriate variables were used with the MaxEnt species distribution model outside Australia, so large climatically suitable areas in the Northern Hemisphere were not identified. Examples from a previous analysis and from the freely available spatial portal of the Atlas of Living Australia (ALA) are provided to illustrate how the problem can be overcome. The comparison of methods described in the Higgins et al. paper is worthwhile, and it is hoped that the authors will be able to repeat their analyses using appropriate variables with the correlative model.

The analysis of suitable environmental conditions for Australian tree species, both in Australia and overseas, has been a recurrent theme since the first species distribution model (SDM) package became available in 1984 (see sections 2.2 and 3.0 in Booth & Muir, 2020).
The Higgins et al. (2020) paper analyzed natural distribution data for 664 Australian eucalypt and acacia species. Natural distribution data for 46 selected species were also analyzed to assess whether a process-based or a correlative SDM more effectively predicted the adventive range of these species outside Australia. It was concluded that "the correlative model, MaxEnt, has a superior ability to describe the data in the training data domain (Australia) and that the process-based model, TTR-SDM, has a superior ability to predict the distribution of the study species outside of Australia." These conclusions may be true, but readers cannot be sure, because there is a major problem with the variables selected when applying the correlative model outside Australia. This problem causes the MaxEnt model (Phillips et al., 2006) to fail to identify large environmentally suitable areas in the Northern Hemisphere.
In their methods section, Higgins et al. (2020) reported that their process-based modeling used monthly data for minimum, mean, and maximum temperature, soil moisture content, and solar radiation. A soil nitrogen content measure, held constant across all months, was also used. They stated that "This exact same list of environmental variables was used for the MaxEnt model fitting. This ensures that both models use the same information for estimating the species distribution models." The problem with using monthly variables in a correlative model for the whole world is that the same calendar months are not comparable across hemispheres: January falls in summer in Australia but in winter across the Northern Hemisphere. To avoid this problem, climatic classifications such as the widely used Köppen and Köppen-Geiger systems (Köppen, 1936; Peel et al., 2007) were developed using measures related to periods such as seasons (e.g., summer or winter) and the coldest month. Early SDM studies built on these ideas but provided sets of variables rather than a hierarchical classification system.
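To make the hemisphere problem concrete, the Python sketch below uses an invented, idealized temperature cycle (not data from any of the studies discussed) and shifts it by six months to mimic the same climate in the other hemisphere. Compared month by month, as a model using raw monthly predictors effectively does, the two sites look very different, whereas season-based summaries of the kind used by Köppen-type classifications are identical. The site values and function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical monthly mean temperatures (deg C), Jan..Dec, for an idealized
# Southern Hemisphere site with a Mediterranean-type cycle (warm Jan, cool Jul).
south = np.array([24.0, 24.5, 23.0, 20.0, 17.0, 14.5,
                  13.5, 14.0, 15.5, 17.5, 20.0, 22.5])

# The same climate placed in the Northern Hemisphere: an identical seasonal
# cycle shifted by six months, so the warmest month is July rather than January.
north = np.roll(south, 6)


def monthly_distance(a, b):
    """Euclidean distance between raw Jan..Dec vectors, compared month by month."""
    return float(np.linalg.norm(a - b))


def seasonal_summary(t):
    """Summaries that do not depend on calendar month: annual mean, warmest and coldest month."""
    return np.array([t.mean(), t.max(), t.min()])


print(monthly_distance(south, north))  # large, because the months are out of phase
print(np.allclose(seasonal_summary(south), seasonal_summary(north)))  # True
```

The six-month shift is a simplification; real climates in the two hemispheres are never exact mirror images, but the point about calendar months holds regardless.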
The first SDM package was called BIOCLIM (Booth et al., 2014; Nix, 1986) and used a set of 12 variables, which was extended to 19 variables in 1996. These variables are listed in the ANUCLIM publication (for the current version, see Xu & Hutchinson, 2011), which describes several programs including BIOCLIM. The set of 19 BIOCLIM variables was adopted by the WorldClim system (Fick & Hijmans, 2017; Hijmans et al., 2005), which, with over 20 000 citations in Google Scholar™, is the most widely used source of climatic data for SDM studies. MaxEnt parameter settings and region selection can have important impacts on analysis outputs (Merow et al., 2013), but there is no indication that these could account for the problem described here. All the analyses described here used default parameter settings. The analyses shown in Figures 1 and 3 covered the whole world. The Thompson et al. (2011) analysis shown in Figure 2 was clipped to include only the Mediterranean region but produced results for that region similar to those shown in Figure 3.
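BIOCLIM-style variables are derived from the same monthly data but summarize them by warmest and coldest months or quarters rather than by calendar month. The Python sketch below computes a subset of them using the commonly documented WorldClim definitions (BIO1, BIO5, BIO6, BIO7, BIO10, BIO11, BIO12, BIO18, BIO19). It is not the ANUCLIM or WorldClim code: the assumptions that monthly mean temperature equals (tmin + tmax) / 2 and that quarters are circular three-month windows wrapping from December into January are ours, and implementations may differ in such details. The site data are invented.

```python
import numpy as np


def quarter_sums(x):
    """Sums over the 12 circular three-month windows (Jan-Mar, Feb-Apr, ..., Dec-Feb)."""
    x = np.asarray(x, dtype=float)
    return np.array([x[[i, (i + 1) % 12, (i + 2) % 12]].sum() for i in range(12)])


def quarter_means(x):
    """Means over the same circular three-month windows."""
    return quarter_sums(x) / 3.0


def bioclim_subset(tmin, tmax, prec):
    """A subset of BIOCLIM-style variables from 12 monthly values of each input.

    Assumes monthly mean temperature is (tmin + tmax) / 2.
    """
    tmin, tmax, prec = (np.asarray(v, dtype=float) for v in (tmin, tmax, prec))
    tavg = (tmin + tmax) / 2.0
    tq = quarter_means(tavg)  # mean temperature of each quarter
    pq = quarter_sums(prec)   # total precipitation of each quarter
    return {
        "BIO1": tavg.mean(),              # annual mean temperature
        "BIO5": tmax.max(),               # max temperature of warmest month
        "BIO6": tmin.min(),               # min temperature of coldest month
        "BIO7": tmax.max() - tmin.min(),  # temperature annual range
        "BIO10": tq.max(),                # mean temperature of warmest quarter
        "BIO11": tq.min(),                # mean temperature of coldest quarter
        "BIO12": prec.sum(),              # annual precipitation
        "BIO18": pq[np.argmax(tq)],       # precipitation of warmest quarter
        "BIO19": pq[np.argmin(tq)],       # precipitation of coldest quarter
    }


# Hypothetical Southern Hemisphere site with winter-dominant rainfall.
tmin = [16, 16, 15, 12, 9, 7, 6, 6, 7, 9, 12, 14]
tmax = [30, 30, 28, 24, 20, 17, 16, 17, 19, 22, 25, 28]
prec = [10, 12, 18, 40, 100, 140, 150, 120, 80, 50, 25, 15]

site_s = bioclim_subset(tmin, tmax, prec)
# The same cycle shifted six months, as if the site were in the other hemisphere.
site_n = bioclim_subset(*(np.roll(v, 6) for v in (tmin, tmax, prec)))
print(all(np.isclose(site_s[k], site_n[k]) for k in site_s))  # True
```

Because every variable is defined relative to the warmest or coldest part of the year, shifting the seasonal cycle by six months leaves the values unchanged, which is what allows a model calibrated in Australia to recognize similar climates in the Northern Hemisphere.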
In conclusion, using only monthly variables is not appropriate for global applications of correlative models. It is recommended that BIOCLIM variables, such as those provided by the WorldClim database (Fick & Hijmans, 2017; Hijmans et al., 2005), be used. If researchers wish to use other variables derived from monthly values, it is recommended that they compare their detailed SDM results with the broad regions shown on Köppen-Geiger maps (Peel et al., 2007). Gross differences suggest that more variables may be needed or that the variables being used may not be appropriate. Higgins et al. (2020) is potentially a very interesting paper, but the Köppen-Geiger classification, the Thompson et al. (2011) paper, and the ALA analysis described here clearly indicate that there is a fundamental problem with its MaxEnt analyses.

… as illustrated by Thompson et al. (2011). Green-yellow areas are climatically suitable and reddish-brown areas are highly suitable. Note the extensive suitable areas compared with the almost total lack of suitable areas in Figure 1.
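The recommended comparison with Köppen-Geiger regions can be partly automated. The Python sketch below is one hypothetical way to do it, not a procedure used in any of the studies cited. It assumes that a Köppen-Geiger map (such as that of Peel et al., 2007) and a binary, thresholded SDM prediction have already been resampled to the same grid and flattened into arrays, and it flags climate classes that the species occupies in its native range but that are almost never predicted suitable elsewhere. The function names, the toy data, and the 10% cut-off (`min_fraction`) are illustrative assumptions.

```python
import numpy as np


def suitability_by_class(koppen, suitable):
    """Fraction of grid cells predicted suitable within each Köppen-Geiger class."""
    koppen = np.asarray(koppen)
    suitable = np.asarray(suitable, dtype=bool)
    return {str(c): float(suitable[koppen == c].mean()) for c in np.unique(koppen)}


def flag_gross_differences(koppen, suitable, native_classes, min_fraction=0.1):
    """Classes occupied in the native range that are rarely predicted suitable elsewhere.

    Classes absent from the comparison region are skipped rather than flagged.
    """
    frac = suitability_by_class(koppen, suitable)
    return sorted(c for c in native_classes if c in frac and frac[c] < min_fraction)


# Hypothetical flattened rasters for a region outside the native range.
koppen = np.array(["Csa", "Csa", "Csa", "Csb", "Csb", "BWh", "BWh", "Dfb"])
suitable = np.array([0, 0, 0, 0, 0, 0, 1, 0])
native_classes = {"Csa", "Csb", "BSk"}  # classes occupied in the native range

print(suitability_by_class(koppen, suitable))
print(flag_gross_differences(koppen, suitable, native_classes))  # ['Csa', 'Csb']
```

A flagged class is only a prompt for closer inspection: a species may genuinely occupy a small part of a broad class, so the cut-off should be interpreted with care rather than treated as a test.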

KEYWORDS
Acacia, BIOCLIM, ecological niche models, eucalypt, invasive species, species distribution models

ACKNOWLEDGEMENTS
I am grateful to Prof. Steven Higgins for some preliminary email exchanges concerning his paper with colleagues. Figures 1 and 2 are reproduced here under the terms of the Wiley Open Access Licence. I thank the Atlas of Living Australia team for the provision of their extremely useful facility. Thanks to Jacqui England and Libby Pinkard for comments on an early draft of this manuscript, as well as the Associate Editor and anonymous reviewers for their helpful comments. Thanks to Chris Foote (Editor-in-Chief, Ecology and Evolution) for advice on submitting a comment piece.

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
The data used to produce Figure 3 …

… Thompson et al. (2011). Purple dots indicate natural distribution data from Western Australia used to calibrate the model. The scale at the left of the map indicates climatic suitability, with yellow/green areas being relatively suitable and blue areas unsuitable. Note the extensive climatically suitable areas in the Mediterranean region compared with the output from the MaxEnt model shown in Figure 5 of Higgins et al. (2020).