Challenges and opportunities of species distribution modelling of terrestrial arthropod predators

Species distribution models (SDMs) have emerged as essential tools in the toolkit of many ecologists, useful for exploring species distributions in space and time and for answering an assortment of questions related to biogeography, climate change biology and conservation biology. Historically, most SDM research concentrated on well‐known organisms, especially vertebrates. In recent years, these tools have become increasingly important for predicting the distribution of understudied invertebrate taxa. Here, we reviewed the literature published on the main terrestrial arthropod predators (ants, ground beetles and spiders) to explore some of the challenges and opportunities of species distribution modelling in mega‐diverse arthropod groups.


| INTRODUCTION
A mainstream topic in ecology, biogeography and conservation biology is the extent to which climatic conditions affect species performance (Colinet et al., 2015;Rezende & Bozinovic, 2019), which together with geographical and historical constraints ultimately modulates species niches and observed range boundaries (Bennett et al., 2021;Thomas, 2010). Obtaining a nuanced understanding of the factors conditioning species distributions has gained new urgency amid the current climate emergency (Ripple et al., 2020), insofar as changing climatic conditions are determining fast redistributions of species along latitudinal, elevational and other spatial gradients (Chen et al., 2011;Lenoir et al., 2020). As global climate change redefines the geography of life, we are becoming spectators of a large-scale experiment of complex ecological responses (Halsch et al., 2021;Román-Palacios & Wiens, 2020), where interactions among previously isolated species can quickly occur (Krosby et al., 2015), invasions of novel areas by alien species are becoming routine (Hellmann et al., 2008;Liu, Clarke, et al., 2020;Liu, Blackburn, et al., 2020), and unnoticed extinctions are potentially taking place on a daily basis (Barnosky et al., 2011;Cardoso, Barton, et al., 2020;Hughes et al., 2004). Therefore, mapping the diversity of life has never been so urgent (Santini, Antão, et al., 2021).
Over the years, ecologists and statisticians have developed an assortment of methods for modelling the niche and distribution of species in space and time, several of which fall under the umbrella of correlative species distribution models or ecological niche models (Box 1). For simplicity, we will hereafter refer to these as "species distribution models" (SDMs), while redirecting interested readers to semantic discussions (Peterson et al., 2012; Sillero, 2011; Warren, 2012). Researchers have used SDM techniques for mapping the distribution of organisms in a variety of systems, although applications have not been evenly spread across habitats and the tree of life. For example, while the use of SDMs has grown exponentially in the terrestrial realm from the early 2000s onwards (Araújo et al., 2019; Lobo et al., 2010; Robinson et al., 2011), applications in systems where three-dimensionality is an important feature, for example marine ecosystems (Melo-Merino et al., 2020; Robinson et al., 2017), tree canopies (Burns et al., 2020), soils (Schröder, 2008) and caves (Mammola & Leroy, 2018), have lagged behind. Also, applications of SDMs in animals have concentrated mostly on vertebrates (Titley et al., 2017), while studies on arthropod groups remain scarcer, although they have recently been increasing (Figure 1).
This paucity of SDM studies is possibly related to a number of arthropod-specific modelling challenges. First, arthropods are often small organisms that move over small spatial scales and are strongly influenced by microclimatic conditions and microhabitat structure (Pincebourde & Woods, 2020). These characteristics are poorly captured by the ubiquitous bioclimatic variables derived from remote sensing at relatively large spatial scales (Lembrechts et al., 2020; Potter et al., 2013).
Second, arthropods often have short life cycles with wide population abundance fluctuations from season to season and strong metapopulation dynamics, making it difficult to determine what their real, constantly changing range is. Third, occurrence data sets for poorly known arthropod species are likely to be severely spatially and temporally biased, affecting our appreciation of their real distribution patterns. Thus, arthropods pose particular modelling challenges that add to the ones already present for vertebrates, but they should also offer opportunities for future SDM research as data and new methods are made available (Maino et al., 2016).
Natural history is indeed entering its next-generation phase (Anderson et al., 2021; Jarić et al., 2020; Tosa et al., 2021), one characterized by increasingly available data (not only distribution data but also species traits and phylogenies) that can be routinely integrated in our modelling exercises. This is made possible by a parallel development of new methods, ranging from computationally fast multispecies modelling platforms (Pichler & Hartig, 2021) to flexible techniques able to account for traits (phenotypic plasticity) and genetic data in making predictions (Brewer et al., 2016; Bush et al., 2016; Garzón et al., 2019), along with tools to ease model interpretability (Ryo et al., 2021). As entomology is entering a next-generation phase too (Høye et al., 2021; Liu, Clarke, et al., 2020; Liu, Blackburn, et al., 2020), in all likelihood these advances will soon cascade to positively affect our understanding of the distribution of less studied arthropod groups.
Not only anticipating this progress but also considering the recent upsurge of studies discussing an "insect apocalypse" and the related calls for understanding the drivers of arthropod extinction risk (Cardoso & Leather, 2019; Wagner et al., 2021), we conducted a systematic mapping of the literature to understand and synthesize trends in the use of SDMs in arthropod research. We explored these topics through the lens of the literature on dominant terrestrial arthropod predators: ants (c. 30,000 described species; Parr et al., 2017)

| Systematic literature search and metadata extraction
Between 20 and 24 November 2020, we searched the Web of Science (Clarivate Analytics) for articles relying on SDMs to predict distributions of terrestrial arthropod predators (ants, ground beetles and spiders) and, for comparative purposes, other terrestrial vertebrate and invertebrate groups (Table 1). For each taxonomic group considered, we found and extracted papers using the following general query: TS=("family name(s)" OR "vernacular name(s)") AND TS=("Species distribution model*" OR "Ecological niche model*" OR "Bioclimatic envelope model*" OR "Niche model*" OR "Distribution model*" OR "Habitat suitability model*") where TS denotes a search for "Topic," and the asterisk (*) is a wildcard that matches any word beginning with the preceding string of characters (e.g. "model*" matches "models," "modelling," and "modelled"). See Appendix S1 for the list of specific queries.
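As an illustration of how the general query above is assembled, the following minimal Python sketch builds a query string of the same shape from two lists of terms. The function name `build_wos_query` and the example taxon terms are hypothetical and chosen only for illustration; the queries actually used are those listed in Appendix S1. The last lines mimic the trailing-asterisk behaviour with the standard-library `fnmatch` module, which is only an analogy for the search engine's own wildcard handling.

```python
from fnmatch import fnmatchcase


def build_wos_query(taxon_terms, method_terms):
    """Join two term lists into a TS=(...) AND TS=(...) topic query."""
    def quoted(terms):
        return " OR ".join(f'"{t}"' for t in terms)
    return f"TS=({quoted(taxon_terms)}) AND TS=({quoted(method_terms)})"


# Method terms copied from the general query given in the text.
SDM_TERMS = [
    "Species distribution model*",
    "Ecological niche model*",
    "Bioclimatic envelope model*",
    "Niche model*",
    "Distribution model*",
    "Habitat suitability model*",
]

# Hypothetical taxon terms, for illustration only.
query = build_wos_query(["Araneae", "spider*"], SDM_TERMS)
print(query)

# A trailing asterisk matches any word that starts with the given stem.
matches = [w for w in ["models", "modelling", "modelled", "remodel"]
           if fnmatchcase(w, "model*")]
print(matches)
```

Generating the strings programmatically keeps the method terms identical across taxonomic groups, so that only the taxon names vary between queries.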
We exported all results into the online review application Rayyan (Ouzzani et al., 2016) for title, keywords and abstract screening, whereby we excluded by-catches of papers not actually dealing with SDMs or our model species (e.g. our search for the keyword "spiders" also captured papers dealing with spider monkeys, genus Ateles) (Table 1). Furthermore, for ants, ground beetles and spiders, we manually inspected papers to extract specific data (Appendix S2).
We recorded the geographical extent of each study and all the species modelled. We classified the type of predictors used, their resolution, and the SDM algorithm(s) and modelling protocol employed.
Specifically, we coded the modelling protocol under three main categories: single algorithm, when studies applied just one modelling technique; ensemble of models, when the authors applied several different models (e.g. generalized linear model, generalized additive model, random forest, MaxEnt) and combined the predictions of the individual models via an averaging formula or algorithm (Araújo et al., 2019); and no silver bullet (Qiao et al., 2015), when the authors applied a number of algorithms (e.g. generalized additive model, boosted regression tree, symbolic regression) and selected only one for projecting the distribution based on some measure of algorithm performance.
Finally, we summarized the key results of each study (Appendix S2).
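The difference between the "ensemble of models" and "no silver bullet" categories can be made concrete with a short Python sketch. It assumes that each algorithm has already produced a suitability value per site and a single performance score; all numbers, the score (labelled AUC here) and the function names are made up for illustration and do not come from any reviewed study.

```python
def ensemble_mean(predictions, weights=None):
    """Ensemble of models: (weighted) average of per-site predictions."""
    names = list(predictions)
    if weights is None:
        weights = {name: 1.0 for name in names}  # plain unweighted mean
    total = sum(weights[name] for name in names)
    n_sites = len(predictions[names[0]])
    return [
        sum(weights[name] * predictions[name][i] for name in names) / total
        for i in range(n_sites)
    ]


def best_single_model(predictions, scores):
    """No silver bullet: keep only the best-scoring algorithm."""
    best = max(scores, key=scores.get)
    return best, predictions[best]


# Made-up suitability values for three sites and made-up scores.
preds = {
    "GLM": [0.2, 0.8, 0.5],
    "GAM": [0.4, 0.6, 0.7],
    "RF": [0.3, 0.7, 0.9],
}
auc = {"GLM": 0.71, "GAM": 0.78, "RF": 0.83}

averaged = ensemble_mean(preds)
chosen_name, chosen_pred = best_single_model(preds, auc)
print(averaged, chosen_name, chosen_pred)
```

In both cases several algorithms are fitted; the categories differ only in whether the final map blends all of them or retains one.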

| Data analyses
We conducted analyses in R 3.6.3 (R Core Team, 2020) and visualized data using the ggplot2 R package (Wickham, 2016) and QGIS (Open Source Geospatial Foundation Project, 2020). The complete data set and R code used for the analyses are available on GitLab (https://gitlab.com/DenisLafage/sdm_review).

BOX 1 A general definition of species distribution models (SDMs) and their domain of applicability
As a broad and general definition, species distribution modelling implies using some statistical algorithms to explore the relationship between species occurrences (typically georeferenced localities) and environmental variables.
Once this relationship is determined, the model is used to characterize the ecological niche of a given species. This is usually achieved by projecting a probability surface or a habitat suitability map into a geographical space to represent its potential range of distribution (Guisan et al., 2017).
These models can be constructed using a wide range of algorithms including linear and additive regressions (Elith & Leathwick, 2009), symbolic regressions, tree-based machine learning (Zhang et al., 2019), maximum entropy models and more. Given the large variety of life histories and data sources, the best modelling algorithm and approach necessarily changes, with no universal best solutions (Qiao et al., 2015).
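To make the general definition tangible, the sketch below implements one of the simplest possible correlative models, a rectilinear climate envelope: the model records the range of each environmental variable across the occurrence localities and then flags any map cell falling inside all ranges as potentially suitable. This is a deliberately simplified Python illustration with made-up values and hypothetical variable names; it is not a reproduction of any specific published algorithm or software package.

```python
def fit_envelope(occurrence_env):
    """Learn the (min, max) range of each variable at occurrence sites."""
    variables = occurrence_env[0].keys()
    return {
        v: (min(rec[v] for rec in occurrence_env),
            max(rec[v] for rec in occurrence_env))
        for v in variables
    }


def inside_envelope(envelope, cell_env):
    """A cell is 'suitable' when every variable lies within its range."""
    return all(lo <= cell_env[v] <= hi for v, (lo, hi) in envelope.items())


# Made-up environmental values at three occurrence localities.
occurrences = [
    {"mean_temp": 11.0, "annual_precip": 800.0},
    {"mean_temp": 14.5, "annual_precip": 650.0},
    {"mean_temp": 12.2, "annual_precip": 900.0},
]
envelope = fit_envelope(occurrences)

# Projection step: evaluate cells of a (made-up) map.
cell_a = {"mean_temp": 13.0, "annual_precip": 700.0}
cell_b = {"mean_temp": 20.0, "annual_precip": 700.0}
print(inside_envelope(envelope, cell_a), inside_envelope(envelope, cell_b))
```

The two steps, fitting on occurrences and projecting onto a map, are shared by the more sophisticated algorithms listed above, which mainly differ in how the species-environment relationship is estimated.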
Whereas the first paper relying on species distribution modelling is now over three decades old [e.g. the first applications of the algorithm BIOCLIM can be traced back to 1986 (Booth, 2018)], there has been an acceleration in the use of these tools in just the last two decades (Araújo et al., 2019; Lobo et al., 2010; Figure 1). This trend was probably due to the increase in occurrence data (Wüest et al., 2020; Zhang, 2017) and easy-to-use, often automated statistical packages that perform species distribution modelling (reviewed in Angelov, 2019). These methods have become popular in the toolkit of many ecologists, being useful to answer a range of questions. Not only are SDMs routinely used to describe species distributions, they have also proved important to assist and complement taxonomic studies (Rödder et al., 2010) and to set conservation agendas (Guisan et al., 2013). Furthermore, given that these models are transferable in space and time (Yates et al., 2018), they find applications in studies on climate change (Dormann, 2007), historical biogeography (Peterson, 2009) and invasion biology (Liu, Clarke, et al., 2020; Liu, Blackburn, et al., 2020; Peterson, 2003), among other topics.
We analysed bibliometric data regarding the articles on ants, ground beetles and spiders with the bibliometrix R package (Aria & Cuccurullo, 2017). In order to map the production of articles per country for each group, we assigned articles to a country based on the affiliations of all the authors at the time when each article was published. In order to identify the most influential papers for researchers modelling the distributions of terrestrial arthropod predators, we used a weighted co-citation network. Initially introduced for bibliometric research, co-citation networks have proved useful to identify key literature items acting as bridges between disciplines (Trujillo & Long, 2018). A particular article is included in the network when it is cited by at least two papers from the data set under study (Batagelj & Cerinšek, 2013). The number of co-citations is the number of times two articles are cited together. Furthermore, we built a collaboration network to identify the existence of bridges among scientists working on ants, ground beetles and spiders.
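The two counting rules just described (inclusion when a reference is cited by at least two papers, and edge weight equal to the number of papers citing two references together) can be sketched in a few lines of Python. This is an illustrative re-implementation of the definitions only, not the bibliometrix code used for the analyses; the reference keys are placeholders.

```python
from collections import Counter
from itertools import combinations


def cocitation_network(reference_lists, min_citations=2):
    """Return citation counts and weighted co-citation edges."""
    # Number of papers in the data set citing each reference.
    citations = Counter(ref for refs in reference_lists for ref in set(refs))
    # Keep only references cited by at least `min_citations` papers.
    kept = {ref for ref, n in citations.items() if n >= min_citations}
    # Edge weight = number of papers citing both references together.
    edges = Counter()
    for refs in reference_lists:
        for pair in combinations(sorted(set(refs) & kept), 2):
            edges[pair] += 1
    return citations, edges


# Placeholder reference lists of three papers in a data set.
papers = [
    ["Phillips2006", "Hijmans2005", "Elith2006"],
    ["Phillips2006", "Hijmans2005"],
    ["Phillips2006", "Other2010"],
]
citations, edges = cocitation_network(papers)
print(citations.most_common(2), dict(edges))
```

In this toy example only two references pass the inclusion threshold, and they are linked by a single edge of weight two.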

| Caveats in the interpretation of the survey
Due to our search strategy in the Web of Science and selection of keywords (Appendix S1), we did not capture all possible studies on SDMs dealing with our focal groups. For example, we missed some studies on taxonomy that used SDMs to assist species delimitations, as these did not mention the methodology in their keywords, title or abstract. Similarly, SDMs have recently begun to be routinely used for assessing terrestrial arthropod risk of extinction (Branco et al., 2019; Fukushima et al., 2019; Milano et al., 2021; Seppälä et al., 2018a, 2018b, 2018c, 2018d), but most of these studies were missed for the same reason. Furthermore, for many groups, especially vertebrates, the authors may not mention the higher taxonomic ranks included in our query but exclusively the species, genus or family, in which case the study will not be captured. We also acknowledge that our search was not linguistically exhaustive as we only included articles in English (Konno et al., 2020). As a result, our estimation of the volume of the literature on the focal groups should be taken as an approximation of the real number of studies. While we operated under the assumption that the biases were homogeneously distributed across all taxonomic groups, allowing us to compare them and to draw general inferences, the comparison of absolute numbers of studies across taxa should still be taken with caution (e.g. in Figure 1).
Figure 1a). However, the total number of studies was greater for vertebrates (67%) than invertebrates, and this difference would be even greater if these numbers were relativized to the total number of known vertebrate and arthropod species. This is a typical pattern that is partly explained by the fact that there is more available information on vertebrates (e.g. distribution data; Troudet et al., 2017) and partly the result of a cognitive bias in terms of researchers' subjective preferences for certain taxa over others (Clark & May, 2002), which entomologists have termed "institutional vertebratism" or "taxonomic chauvinism" (Leather, 2009a, 2009b). The few available studies on arthropods are drops in the ocean when considering the number of described and as yet undescribed species of insects (Stork, 2018) and spiders (Agnarsson et al., 2013).

| TAXONOMIC BIAS IN SDM RESEARCH ON ARTHROPODS
Taxonomic bias towards certain groups exists also among articles dedicated to arthropods (Cardoso, 2012;Leandro et al., 2017).
For example, butterflies are among the most studied groups in SDM research (6.4%), which once again may be due to a greater availability of information (Brereton et al., 2011; van Swaay et al., 2008; Thomas, 2005), and which in turn might be driven by aesthetic characteristics. Other well-studied groups are those relevant from an economic point of view, such as vectors of diseases (Diptera, 8.9%), crop pests (other beetles, 6.6%) and pollinators (Apoidea, 3.2%) (Figure 1b).
As for our focal groups, we found that although spiders and ground beetles outnumber ants in terms of described species, the number of species studied was considerably higher for ants. This may be linked to the topic of articles, with most papers focusing on one of the numerous invasive ant species. It is likely that a few globally relevant invasive ant species (e.g. Argentine ant, fire ant) allow myrmecologists to obtain research funding, thus attracting most research attention (Holway et al., 2002; Silverman & Brightwell, 2008).
Inevitably, the few studies on ants, ground beetles and spiders have often been opportunistic, largely reflecting the specific interests of the few authors who have ventured to explore the potential of SDMs in terrestrial arthropod research. For example, this is evident when looking at a sample of papers on spiders: most studies focused on large-sized, taxonomically unique and/or charismatic species (Decae et al., 2019; Hamilton et al., 2016; Jiménez-Valverde et al., 2011; Wang et al., 2018), taxa of medical importance (Planas et al., 2014; Taucare-Ríos et al., 2018; Wang et al., 2018) or taxa inhabiting peculiar habitats that are the interest of certain authors, such as caves (Pavlek & Mammola, 2021).

| GEOGRAPHICAL BIAS IN SDM RESEARCH ON ARTHROPODS
The geography of studies, as inferred from author affiliations, revealed that the production of SDM papers on ants, ground beetles and spiders is mostly concentrated in North and South America and Europe (Figure 2). These are geographical areas that hold lower levels of diversity but greater availability of data. There were, however, some conspicuous differences among groups. For ants, modelled species are mostly in North and South America and Europe (Figures S1 and S2); only 15 studies modelled species distribution worldwide. For spiders and ground beetles, most studies focused on European species (Figures S3-S6), and only three studies and one study, respectively, had worldwide coverage. Considerably more ant species than spider or ground beetle species have been studied with SDMs.

| INFLUENTIAL PAPERS, COLLABORATIONS AND TOPICS
The co-citation network allowed us to identify key articles co-cited by the studies included in our survey (Figure 3). As expected, most co-cited papers were methodological rather than arthropod-specific papers. The top-cited papers were Phillips et al. (2006) and Hijmans et al. (2005), respectively, the reference for the algorithm MaxEnt and for the most widely used global climate database (WorldClim).
Among the less co-cited but still influential papers, there were several references to phylogenetic methods, suggesting that a number of articles potentially represent integrative research using multiple lines of evidence to deal with species delimitation (Ferretti et al., 2019; Ross et al., 2010) and historical biogeography (Magalhaes et al., 2014; Mammola et al., 2015; Planas et al., 2014; Solomon et al., 2008).

TABLE 1 Number of articles returned by the queries on Web of Science (WOS) and number of articles kept after title, keywords and abstract screening
Network analysis also revealed highly structured collaboration hubs around the three groups of interest (Figure 4). Observed collaboration hubs were strongly bound but limited in size, with only four cases of inter-group collaborations (ants-ground beetles, ants-spiders and ground beetles-spiders). Two cases were the result of multi-taxa studies (Christman et al., 2016; Jiménez-Valverde et al., 2009), and two were related to authors involved in articles dealing with two different groups: Williams S.E. (Staunton et al., 2014; Steiner et al., 2008) and Peterson A.T. (Peterson & Nakazawa, 2008; Planas et al., 2014; Roura-Pascual et al., 2009, 2006) probably on account of the reduced number of globally important known invasive spiders (Nentwig, 2015).
Finally, the focus of articles dealing with ground beetles was almost entirely climate change (52.6%) and the drivers of species  Figure 5), as also emphasized by the co-citation network (Figure 3). This is a recurrent pattern in the latest SDM research, as found for the research in other animal groups (e.g. bats; Razgour et al., 2016). This trend is probably due to the fact that MaxEnt is a presence-background technique, allowing users to overcome some of the difficulties associated with obtaining reliable absence data in the light of imperfect detection. Moreover, MaxEnt has proved to be a robust species distribution modelling technique according to comparative studies (Elith et al., 2006, a highly co-cited reference in our data set as shown in Figure 3).

| Environmental variables
Bioclimatic variables were by far the most used predictors to model and explain species distributions (Table 2).
Note, however, that this may be symptomatic of a wider bias to choose data that are preprocessed, with little effort to select appropriate variables given the biology of the species (Fourcade et al., 2018; van de Pol et al., 2016).
Topography, soil and land use, and habitat variables are used less often, possibly due to greater limitations in their availability.
Nevertheless, when used, these non-climatic factors were often selected as important in modelling the distribution (>65% for ants and >80% for spiders and ground beetles; Table 2).
Importantly, most environmental rasters used today for developing SDMs achieve a maximum resolution of 30 arc.sec (cell size c. 1 km² at the equator), which is excellent but might not be enough in the case of invertebrates that are known to respond to microclimatic characteristics over spatial scales of millimetres to metres (Potter et al., 2013; Suggitt et al., 2018). This is a key impediment that currently limits our ability to fully model the niche and distribution of terrestrial arthropods. In the analysed literature, mean variable resolution was rather similar for ants and ground beetles [respectively, 314.9 arc.sec (max = 1.4 arc.sec) and 414.7 arc.sec (max = 0.05 arc.sec)]. The mean resolution was finer for spiders [171.7 arc.sec (max = 1.4 arc.sec)].

FIGURE 3 Weighted co-citation network for the top 30 cited papers in the entire data set (ants, ground beetles and spiders). The size of the vertex is proportional to the number of articles citing a given reference. The colours of the links and vertex reflect citation clusters. The colour of the text corresponds to the paper theme

FIGURE 4 Collaboration network between authors. Colours represent clusters of collaboration and pictograms the group targeted. For readability, the network is restricted to those papers with at least one author having two articles in the data set. This represents 64 articles (out of 103) and 211 authors (out of 355)
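To relate angular raster resolutions such as those reported above to ground distances, the following Python sketch converts a cell size in arc-seconds into approximate kilometres. It assumes a spherical Earth with a mean radius of 6371 km (an assumption made here for simplicity, not a value taken from the reviewed studies) and accounts for the east-west shrinking of cells away from the equator.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def cell_size_km(arcsec, lat_deg=0.0):
    """Approximate (north-south, east-west) cell side lengths in km."""
    angle_rad = math.radians(arcsec / 3600.0)  # arc-seconds to radians
    north_south = EARTH_RADIUS_KM * angle_rad
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns_eq, ew_eq = cell_size_km(30)        # 30 arc.sec cell at the equator
ns_60, ew_60 = cell_size_km(30, 60.0)  # same cell at 60 degrees latitude
print(round(ns_eq, 3), round(ew_eq, 3), round(ew_60, 3))
```

Under these assumptions a 30 arc.sec cell is roughly 0.93 km on each side at the equator, consistent with the c. 1 km² figure, and its east-west extent halves at 60° latitude. Even the finest of these cells remains several orders of magnitude coarser than the millimetre-to-metre scales at which many arthropods experience their environment.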

| Mechanistic models and alternative data sources
The integration of SDM use with species functional traits and ecophysiological data was scarce. For ants, 10.2% of articles used traits and 6.2% ecophysiological data. For spiders, 2.4% of articles used functional traits and none ecophysiological data. For ground beetles, no articles used functional traits or ecophysiological data. In the few instances where similar variables were considered, these were not directly incorporated as predictors in the model but rather discussed in comparison with the modelled distribution. For the three groups, between 20% and 25% of papers used phylogenies, but without directly incorporating the phylogenetic information into the models (Table 2).

| SOLUTIONS TO ALLEVIATE DATA LIMITATIONS
The scarcity of data has been pointed out as one of the key limitations to our understanding of the drivers of biodiversity change in invertebrates (Cardoso & Leather, 2019), as summarized in a number of so-called biodiversity shortfalls (Cardoso et al., 2011; Ficetola et al., 2019; Hortal et al., 2015; Lopes-Lima et al., 2021). SDMs may help us to combat some of these impediments by identifying unexplored regions of high environmental suitability to fill geographical gaps in known species distributions (i.e. tackling the Wallacean shortfall), by identifying the environmental drivers of these distributions (Hutchinsonian shortfall) and even by suggesting suitable sites for further sampling (Linnean shortfall). However, SDM construction in itself requires robust and high-quality distribution data, creating a loop that is difficult to break. We provide below a few promising avenues for future improvements.

| Distribution data
A quick search for any bird species in the Global Biodiversity Information Facility (www.gbif.org) reminds us that it is unlikely we will ever possess for arthropods the same amount and quality of data  (Stork, 2018), while natural scientists are simply too few (Tewksbury et al., 2014). However, some recent technical advances may help us to overcome the main impediments related to data limitation and to get closer to the goal of modelling the distribution of arthropods with more confidence.
Foremost, the emergence of ensembles of small models has proved promising to optimize the modelling of species for which few occurrences are available; this is achieved by combining a set of small bivariate models to create a consensus model that avoids overfitting (details in Breiner et al., 2015, 2018). Second, modelling above the species level (Smith et al., 2019), for example by integrating data from related species when their niche overlap is large (Qiao et al., 2017), may be a useful shortcut to overcome a lack of distribution data in many circumstances. Finally, the recent advances in metabarcoding and environmental DNA are of major interest to overcome the issue of species detectability (Muha et al., 2017) and the lack of invertebrate taxonomists (Hebert & Gregory, 2005). Metabarcoding consists in identifying species using small DNA sequences that are highly variable between species and weakly variable within a given species. It is the basis of the environmental DNA approach, which consists in the identification of the species present in a given environment using the DNA left by individuals. Despite many technical challenges, environmental DNA and metabarcoding are becoming standard survey tools in ecology (Deiner et al., 2017; Liu, Clarke, et al., 2020; Liu, Blackburn, et al., 2020), including for ants, ground beetles and spiders (Kennedy et al., 2020; Piper et al., 2019; Toju & Baba, 2018). Their ability to provide reliable absence data and to produce a massive amount of presence data is predicted to improve the efficiency of SDMs in the near future (Muha et al., 2017). Recently, for example, the use of environmental DNA has proved useful to forecast the spread of invasive species (Zhang et al., 2020) or to monitor the success of reintroduction programmes (Riaz et al., 2020). Large-scale projects including metabarcoding of terrestrial arthropod communities [e.g. LIFEPLAN (https://www.helsinki.fi/en/projects/lifeplan) and the Insect Biome Atlas (https://www.insectbiomeatlas.com)] are currently taking place and will provide an unprecedented data baseline for SDMs. This will likely trigger the parallel development of tools to handle the big data era (Hallgren et al., 2016).
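The consensus step of the ensemble-of-small-models idea mentioned above can be sketched as follows. The Python example assumes that every pair of predictors has already been used to fit one bivariate model, and that each model comes with per-site predictions and an AUC score. Weighting by Somers' D (2 × AUC − 1) and dropping models that perform no better than random is our reading of the approach and should be treated as an assumption here; the predictor names and all numbers are made up.

```python
from itertools import combinations


def somers_d(auc):
    """Rescale AUC so that a random model (AUC = 0.5) scores zero."""
    return 2.0 * auc - 1.0


def esm_consensus(pair_predictions, pair_auc):
    """Weighted mean of bivariate model predictions, per site."""
    # Discard models that are no better than random.
    weights = {p: somers_d(a) for p, a in pair_auc.items() if somers_d(a) > 0}
    if not weights:
        raise ValueError("no bivariate model performed better than random")
    total = sum(weights.values())
    n_sites = len(next(iter(pair_predictions.values())))
    return [
        sum(w * pair_predictions[p][i] for p, w in weights.items()) / total
        for i in range(n_sites)
    ]


# Hypothetical predictors: every pair defines one small model.
predictors = ["temp", "precip", "slope"]
pairs = list(combinations(predictors, 2))

# Made-up predictions (two sites) and AUC scores for each small model.
pair_predictions = {pairs[0]: [0.9, 0.1], pairs[1]: [0.7, 0.3], pairs[2]: [0.2, 0.8]}
pair_auc = {pairs[0]: 0.80, pairs[1]: 0.70, pairs[2]: 0.45}

consensus = esm_consensus(pair_predictions, pair_auc)
print(len(pairs), consensus)
```

Because each small model estimates only a handful of parameters, it can be fitted with few occurrences, while the weighted consensus still draws on all predictors.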

| Micro-scale environmental predictors
Gigantic leaps forward are being made in the development of microclimatic databases (Kearney et al., 2014), as well as approaches for downscaling temperature data at high resolutions from thermal images (Senior et al., 2019) or airborne light detection and ranging data (George et al., 2015). It is predicted that in the following years, the use of remote sensing-derived data will become the standard for modelling and mapping the microclimate (Zellweger et al., 2019), especially in invertebrate research where the use of similar high-resolution data has already proved useful to achieve realistic conservation prioritization (Bombi et al., 2019).

| OPPORTUNITIES FOR SDM RESEARCH ON TERRESTRIAL INVERTEBRATES
SDMs are often used as a simple, correlative way to estimate species ranges based on the realized niche, having large uncertainties and often over- or underfitting the real distribution. In an influential paper published 15 years ago, it was foreseen that SDMs may offer "more than simple habitat models" (Guisan & Thuiller, 2005), by tackling biotic interactions, migration processes, dispersal limitations and (meta)population dynamics.
The challenges faced by conservation biologists today call for the development of more of these process-based models (or mechanistic models), providing causal explanations for the observed patterns (Briscoe et al., 2019). These can be defined as any model that mechanistically links model predictions and species fitness, measured either directly or indirectly using functional traits or environmental and biological interactions (e.g. competing or mutualistic species) (Kearney, 2006). This idea was reinforced by a seminal paper by Kearney and Porter (2009) calling for the explicit integration in mechanistic niche modelling of not only physiological data but also life history traits (including dispersal abilities and fitness).
Currently, there are proportionally more such studies for plants and marine invertebrates than for other animals (Chardon et al., 2020; Webb et al., 2020), because the large spatial data sets needed for integrating physiological trait variation are available for these groups (Chown & Gaston, 2016).
While all these applications are still rare when it comes to terrestrial arthropods (see Maino et al., 2016), recent studies have addressed biotic interactions (Mammola & Isaia, 2017), dispersal limitations (Monsimet et al., 2020) and metapopulation dynamics (Giezendanner et al., 2020), thereby showing promising directions for future research. Studies including probability of survival to different stresses such as cold (Cuddington et al., 2018) or desiccation (Barton et al., 2019) were also performed for particularly well-known groups, including lepidopterans and insects considered as pests. However, whereas mechanistic models are increasingly available, they have high data demands and thus cannot be routinely used for invertebrates, especially terrestrial arthropods where, as previously discussed, the scarcity of data on natural history and the large number of species are a clear challenge.
Some ideas towards a more mechanistic understanding of arthropod distributions are discussed in the following.

| Integration of species attributes and traits in SDMs
Species traits influence the outcome of SDMs in two ways. First, they themselves influence the distribution of species. Whether in the present, past or future, the ability of species to adapt to certain conditions, their history, their relations with other species and their ability to disperse all influence species distributions and their change through time (Diamond, 2018). Second, traits may influence how complete or biased the known distribution data are and hence how adequate the modelled distributions are for different purposes. Taking into account trait data before, during and after the modelling is therefore crucial for correct interpretation and for awareness of possible limitations (but see Beissinger & Riddell, 2021, for cautionary arguments).
The recent upsurge in open source trait databases and projects [ants (Parr et al., 2017), ground beetles (Homburg et al., 2014) and spiders (Lowe et al., 2020)] offers an unprecedented data baseline to integrate trait variability in modelling exercises and develop mechanistic descriptions of species distributions and their changes through time. Accordingly, the integration of correlative distribution analyses and functional approaches has recently been advocated (Thuiller et al., 2009; Wittmann et al., 2016), as it would make it possible to bridge biogeography and functional ecology into a "functional biogeography" (Violle et al., 2014).
There are various ways to link correlative SDMs and traits (Kearney & Porter, 2009). The most obvious one is a simple comparison between model outputs and trait variability, including the formulation of hypotheses about why these may concur or not.
Examples in invertebrates are the positive relationship between predicted habitat suitability and body size found in spiders, phenotype-environment associations observed in butterflies (Zaman et al., 2019) or the use of thermal physiology tests to define thermal safe zones in ants (Coulin et al., 2019).
With the aim of obtaining more meaningful and realistic predictions of biodiversity change, new modelling approaches that directly incorporate phenotypic plasticity and other functional traits into correlative modelling have recently come under scrutiny (e.g. AdaptR; Bush et al., 2016; ΔTraitSDM; Garzón et al., 2019). In our view, one of the most flexible ways to integrate traits in SDMs is via the so-called spatial Bayesian species distribution model (Brewer et al., 2016), as a Bayesian framework offers the possibility to constrain modelling conditions using priors (e.g. thermal limits derived from ecophysiological data or other traits). Whereas the performance of the approach has been tested in a handful of invertebrate species thus far (e.g. Feng et al., 2020; Zhou et al., under review), similar tools will become increasingly useful as the availability of traits and computation power increase, leading to more realistic and evolutionary-driven predictions of biodiversity change.
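The logic of constraining a correlative model with a trait-derived prior can be illustrated with a toy example. The Python sketch below is not the spatial Bayesian model of Brewer et al. (2016); it is a one-parameter grid approximation in which the probability of presence follows an assumed Gaussian response to temperature, and a Gaussian prior on the thermal optimum stands in for ecophysiological knowledge. The response shape, the fixed niche width, the prior values and the records are all assumptions made for illustration.

```python
import math


def posterior_optimum(records, prior_mean, prior_sd, width=5.0):
    """Posterior mean of the thermal optimum by grid approximation."""
    grid = [i * 0.1 for i in range(401)]  # candidate optima, 0 to 40 degrees C
    log_post = []
    for mu in grid:
        # Gaussian prior on the optimum (log scale, constant dropped).
        lp = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
        for temp, present in records:
            # Assumed Gaussian response curve: presence probability at temp.
            p = math.exp(-0.5 * ((temp - mu) / width) ** 2)
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep the logarithms finite
            lp += math.log(p) if present else math.log(1.0 - p)
        log_post.append(lp)
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]  # unnormalized posterior
    return sum(mu * w for mu, w in zip(grid, weights)) / sum(weights)


# Made-up records: (site temperature, 1 = presence / 0 = absence).
records = [(18.0, 1), (20.0, 1), (22.0, 1), (5.0, 0), (35.0, 0)]

vague = posterior_optimum(records, prior_mean=25.0, prior_sd=100.0)
informed = posterior_optimum(records, prior_mean=25.0, prior_sd=1.0)
print(round(vague, 2), round(informed, 2))
```

With a vague prior the estimate is driven by the occurrence records alone, whereas a tight physiological prior pulls the estimated optimum towards the laboratory-derived value. This is the kind of constraint that becomes valuable when occurrence data are few or biased.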

| Linking genetic data and distributions
SDMs have been criticized, among other things, for not taking into account heterogeneity in the genetic structure of populations within the species range (Hampe & Petit, 2005; Smith et al., 2019).
Several recent studies have demonstrated that genetically informed SDMs improve climate change predictions because they incorporate possible local adaptations (Ikeda et al., 2017; Marcer et al., 2016). Instead of building SDMs based on species occurrence defined using standard taxonomy, one can model the distribution of each genetic unit of the population. The identification of these units can be achieved using traditional molecular markers such as amplified fragment-length polymorphisms, microsatellites and even single nucleotide polymorphisms (see below). For example, Marcer et al. (2016) built SDMs for each haplotype of Arabidopsis thaliana (Brassicaceae) and found that even though most haplotype distribution ranges will shrink with global climate change, two of them will expand. Some authors also advocate the use of genetic data because it allows the production of real absence data (absence of a given genetic cluster), making it possible to fit logistic regressions and to incorporate endogenous spatial autocorrelation (Gotelli & Colwell, 2011). Recent advances in high-throughput sequencing techniques allow ecologists to collect single nucleotide polymorphism data (Peterson et al., 2012) for cluster identification at reasonable cost. Single nucleotide polymorphisms provide fine-scale resolution of population genetic structure, which can then be incorporated into SDMs. To our knowledge, this has rarely been done on animal populations (but see Hu et al., 2021; Razgour et al., 2018) and has never been done on terrestrial arthropod species.
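The idea of modelling each genetic unit separately can be illustrated with a deliberately simple sketch. It swaps a full SDM for a min-max climatic envelope on a single variable, and it assumes hypothetical occurrences that have already been assigned to two genetic clusters ("north" and "south"). Cluster labels, temperatures and function names are assumptions made for illustration only.

```python
from collections import defaultdict

def fit_envelopes(occurrences):
    """Fit a min-max temperature envelope for each genetic cluster."""
    temps = defaultdict(list)
    for cluster, temperature in occurrences:
        temps[cluster].append(temperature)
    return {c: (min(t), max(t)) for c, t in temps.items()}

def suitable_clusters(envelopes, temperature):
    """Return the clusters whose envelope contains the given temperature."""
    return sorted(c for c, (lo, hi) in envelopes.items() if lo <= temperature <= hi)

# Hypothetical occurrences: (genetic cluster, mean annual temperature in degrees C)
occurrences = [("north", 6.0), ("north", 8.5), ("north", 10.0),
               ("south", 13.0), ("south", 15.5), ("south", 18.0)]

# One envelope per genetic cluster versus one pooled species-level envelope
per_cluster = fit_envelopes(occurrences)
pooled = fit_envelopes([("species", t) for _, t in occurrences])

site_now, warming = 9.0, 3.0  # hypothetical site temperature and warming scenario
print(suitable_clusters(per_cluster, site_now))
print(suitable_clusters(per_cluster, site_now + warming))
print(suitable_clusters(pooled, site_now + warming))
```

With these toy data, the pooled envelope still flags the warmed site (12 °C) as suitable, whereas the site falls outside the envelope of both clusters. This is the kind of discrepancy that genetically informed models are meant to expose.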

| Accounting for dispersal
Using a correlative approach makes the inclusion of complex processes such as dispersal more difficult. While the inclusion of dispersal into SDMs has been advocated for more than 15 years (Seaborn et al., 2020) and can substantially improve model fit (Dormann, 2007), dispersal processes are still rarely taken into account. According to recent quantitative literature surveys, in the last two decades the proportion of SDM papers that included dispersal data in estimates of range shifts hovered around 20% (Holloway & Miller, 2017; Seaborn et al., 2020). The available studies on arthropods either considered dispersal by including a buffer of reachable areas around presences based on species-specific dispersal abilities [e.g. long-distance dispersal via ballooning for spiders (Mammola & Isaia, 2017)], or relied on more sophisticated approaches based on dispersal kernels [e.g. a butterfly model accounting for both demography and dispersal via a kernel distribution (Singer et al., 2018)].
In general, these are rough estimations, given that dispersal is a complex phenomenon that is not trivial to integrate into SDMs (Thuiller et al., 2013). Indeed, dispersal is characterized by three phases (Clobert et al., 2009).
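The two approaches mentioned above, a buffer of reachable areas around presences and a dispersal kernel, can be sketched as simple post-processing steps applied to SDM suitability values. The sketch assumes planar coordinates in kilometres, a hypothetical maximum dispersal distance of 20 km, and a negative-exponential kernel with a hypothetical mean distance of 10 km evaluated at the nearest presence. None of these values or function names come from the cited studies.

```python
import math

def within_buffer(cell, presences, max_distance_km):
    """True if the cell lies within the dispersal buffer of any presence."""
    return any(math.dist(cell, p) <= max_distance_km for p in presences)

def kernel_weight(cell, presences, mean_distance_km):
    """Negative-exponential dispersal kernel evaluated at the nearest presence."""
    nearest = min(math.dist(cell, p) for p in presences)
    return math.exp(-nearest / mean_distance_km)

# Hypothetical inputs: coordinates in km on a projected grid
presences = [(0.0, 0.0), (10.0, 0.0)]
suitability = {(3.0, 4.0): 0.9, (40.0, 30.0): 0.8}  # cell -> SDM suitability

# Buffer approach: suitability is kept only where the species can plausibly arrive
buffered = {c: (s if within_buffer(c, presences, 20.0) else 0.0)
            for c, s in suitability.items()}

# Kernel approach: suitability is down-weighted smoothly with distance
weighted = {c: s * kernel_weight(c, presences, 10.0)
            for c, s in suitability.items()}
print(buffered)
print(weighted)
```

The buffer yields a hard cut-off, whereas the kernel retains a small probability of colonizing distant cells. Neither captures the distinct phases of dispersal, which is why both remain rough approximations.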

| CONCLUSIONS
Efforts to map the diversity of invertebrate life have been mostly concentrated in the last ten years, emphasizing how more and more entomologists and other scientists are beginning to incorporate SDMs into their research. In the light of our ignorance about the diversity, distribution and life history of most arthropods, these versatile tools are proving useful to fill some major knowledge gaps regarding their diversity patterns. The importance of similar endeavours becomes apparent when considering the accumulating evidence about the silent extinctions of invertebrates (Eisenhauer et al., 2019; Wagner et al., 2021), the limited conservation efforts that are directed towards them (Cardoso, 2012; Mammides, 2019; Mammola et al., 2020; Milano et al., 2021) and the calls for solutions to these problems (Harvey et al., 2020; Samways et al., 2020).
Apart from the conservation implications of using SDMs to map arthropod diversity, we have shown how terrestrial arthropods may provide opportunities for advancing SDM research.
Given that terrestrial arthropod distributions are strongly influenced by microclimatic conditions and microhabitat structure, they represent ideal candidates for testing novel modelling approaches.
So far, this potential is still largely unexploited, and thus, we have discussed some recent avenues of research where the integration of different data sources may lead to mechanistic descriptions of key processes associated with species distributions. We are certain that our suggestions are a drop in the ocean when compared to what is currently available in terms of modelling possibilities: methodological advances in SDM-related theory are so quick that it is often difficult to keep pace. As brand new solutions to describe patterns and processes associated with species distribution are becoming available, we hope that this review will succeed in highlighting the potential of arthropods in SDM research and, in the future, that we will more often see them involved as protagonists in these developments.

ACKNOWLEDGEMENTS
We are grateful to Cathryn Primrose-Mathisen for proofreading. DL was supported by the SAD "PEPPS" (Région Bretagne).

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
The complete data set used for the analyses is available in Dryad.
... caves, as scalable model systems in which to minimize confounding effects and reduce the number of parameters needed to explore different eco-evolutionary processes.
Denis Lafage is a researcher focusing on community ecology and food webs of interface habitats. He is particularly interested in community response to perturbations, both natural and human-induced, at local and landscape scales.