Integrating intraspecific differentiation in species distribution models: Consequences on projections of current and future climatically suitable areas of species

Conventional species distribution models (SDMs) usually focus on the species level but disregard intraspecific variability. Phylogeographic structure and evolutionary significant units (ESU) have been proposed as pragmatic proxies to incorporate intraspecific differentiation in SDMs. Nevertheless, the efficiency of using these proxies in SDMs has been poorly investigated. We analysed how the projections of current and future climatically suitable areas can be affected when using ESU‐based or lineage‐based models compared to a species‐level model.


| INTRODUC TI ON
Species distribution modelling (SDM) is among the most widely used methods in macro-ecology and conservation ecology (Mainali et al., 2015;Zurell et al., 2016). Several methods are available but most studies use a correlative approach where the species niche is modelled by relating distribution data to environmental conditions (Guisan & Zimmermann, 2000). SDMs are typically applied to map species ranges and to project their changes under future global change scenarios (Guisan & Zimmermann, 2000;Rasmont et al., 2015). However, most of the SDM studies consider species as a unit, while intraspecific differentiation and potential niche divergences have largely been ignored (D'Amen, Zimmermann, & Pearman, 2013;Pearman, D'Amen, Graham, Thuiller, & Zimmermann, 2010;Valladares et al., 2014). This can bias models especially when smaller populations with potential local adaptations and individualistic responses are less-well represented by species-level models (D'Amen et al., 2013;Lecocq, Rasmont, Harpke, & Schweiger, 2016;Pearman et al., 2010). This is of particular concern in the context of climate change and respective mitigation options, especially when the resulting models underestimate or overestimate the climatic tolerance or sensitivity of species (Benito Garzón, Alía, Robson, & Zavala, 2011;. One way to lower such uncertainties of SDMs could be to integrate intraspecific niche divergences in the models (Valladares et al., 2014). However, this requires a priori definition of subunits differentiated within a species of concern .
Defining such subunits on the basis of features shaped by adaptation to regional climatic conditions (e.g., higher heat stress resistance for some populations; Martinet, Lecocq, Smet, & Rasmont, 2015) is the best solution (Alexander & Edwards, 2010;. However, comprehensive information on such features is seldom available for the majority of species. A more pragmatic approach may use proxies for the detection of intraspecific differentiation (Lecocq, Brasero, Meulemeester et al., 2015b) that could be related to different climatic requirements of populations (Lynch, 1996). This relies on the assumption that intraspecific differentiation triggered by demographic history and/or local adaptations shapes population specificities in genetic, morphology, physiology, behaviour, or ecology that can impact the environmental requirements (Hewitt, 2004;Lynch, 1996).
Considering the phylogeographic structure has been suggested as an effective way to integrate intraspecific differentiation into SDMs (i.e., lineage-based models) (D' Amen et al., 2013;Marcer, Méndez-Vigo, Alonso-Blanco, & Picó, 2016;Pearman et al., 2010). The phylogeographic structure is usually estimated with neutral markers which are impacted by the heritage of the populations resulting from the effects of neutral evolutionary forces (i.e., genetic drift, mutation, migration, hybridization; Avise, 2000). This allows defining differentiated allopatric lineages with individualistic evolutionary history (Avise, 2000) which has possibly shaped lineage-specific climatic requirements (Marcer et al., 2016). However, the definition of units below the species level based on solely neutral marker remains controversial because (a) DNA sequences analysed are often chosen arbitrarily (Cruaud et al., 2014) and (b) ecologically differentiated populations are not always characterized by the accumulation of many genetic differences (Ferguson, 2002;Salvato et al., 2002). These limitations can be overcome by combining different lines of evidence such as phylogeographic structure based on multi-markers along with phenotypic traits and/or ecological features within the context of the integrative taxonomy (Schlick-Steiner et al., 2010). This approach allows detecting conspicuous allopatric differentiated populations (Lecocq, Brasero, Meulemeester et al., 2015b) that are closely related to evolutionarily significant units (ESUs). Initially, the ESUs were defined as populations with current geographic separation, neutral marker genetic differentiation or locally adapted phenotypic traits (Conner & Hartl, 2004). However, this initial definition has been criticized because (a) current geographic disjunction can be triggered by very recent demographic events making adaptive differences between allopatric populations unlikely (e.g., see examples of recent local extinctions of bumblebees in , (b) ESUs defined solely using neutral markers ignore adaptive differences (Frankham, Ballou, & Briscoe, 2010), (c) trait divergences considered as locally adapted phenotypic traits can be shaped by phenotypic plasticity (i.e., the ability of a genotype to produce different phenotypes under different environmental or developmental conditions; Valladares et al., 2014). Therefore, we here use a more restrictive definition: ESUs are geographically isolated conspicuous groups differentiated in genetic and phenotypic traits (Frankham et al., 2010). These ESUs are used as an effective shortcut for estimating patterns of intraspecific diversity in conservation biology (Frankham et al., 2010). Since the integrative taxonomy approach is increasingly applied (e.g., arthropods: Hendrixson, Guice, & Bond, 2015;vertebrates: Costa & Amorim, 2014), corresponding information on ESUs is constantly growing for an increasing number of species. This could pave the way to a massive development of ESU-based SDMs and an improved different responses to climate change when they lead to different results than species-based models.

K E Y W O R D S
bumblebee, climate change, evolutionary significant unit, intraspecific variability, model performance, species distribution model projection of tomorrow's biodiversity. Nevertheless, the efficiency of ESU-based SDMs compared to that of lineage-based SDMs or specie-level SDMs has been less-well investigated to date.
In this paper, we analyse how the projections of current and future climatically suitable areas can be affected by using ESU-based or lineage-based SDMs compared to a species-level model. As examples, we use three widespread bumblebee species (Bombus lapidarius, B. pascuorum and B. terrestris) with comparable distributions across Europe, for which phylogeography and ESU delimitations are available (Lecocq, Brasero, Martinet, Valterovà, & Rasmont, 2015;Lecocq, Coppée et al., 2016;Lecocq, Dellicour et al., 2015).

| Studied species and geographic range
We focused on three (West)-Palaearctic bumblebee species: Bombus lapidarius, B. pascuorum and B. terrestris. These species are some of the most abundant bumblebees where they occur . They are sympatric in most of their range and not threatened (i.e., IUCN Red List status: Least Concern with stable populations) . The occurrence data came from the database "Base de données fauniques de Gembloux-Mons" . We extracted all species occurrences between 1950 and 2014 in the West-Palaearctic region but because of the rather low sampling effort and unavailability of lineage/ESU statuses, we had to exclude Russia, Belorussia, Kazakhstan and Ukraine. We obtained 35,252 observations for B. lapidarius, 63,565 observations for B. pascuorum and 34,970 observations for B. terrestris and called the resulting datasets "raw observation datasets." We aggregated these data to presence/absence data at 10 arc-minutes resolution grid (i.e., = 0.17° or ~14.5 km at 40°N) to account for differences in local sampling effort and to obtain reliable absence data. Additionally, we also used a lower resolution (i.e., 30 arc-minutes) to further taking into account the sampling effort differences. To further increase the reliability of absence data, we considered only grid cells as valid for absence data when at least one bumblebee individual from one

| Climatic data
We obtained 19 bioclimatic variables from the WorldClim database

| Climatic niche comparisons
We assessed potential climatic niche differentiation between species level, lineages and ESUs, by determining the niche position and niche breadth for each classification level via outlying mean index (OMI) analyses (Dolédec, Chessel, & Gimaret-Carpentier, 2000). This measure describes the average relative position of species in a multivariate environmental space by measuring the distance between the mean habitat conditions used by a target group and the mean habitat conditions across the entire study area. The higher the OMI value for a particular group of organisms (i.e., species, lineage, or ESU) is, the more marginal and atypical the group's niche is compared to mean habitat conditions observed in the studied region. In contrast, low (i.e., close to zero) OMI values indicate groups occurring in typical habitats of a region. Significance of OMI was tested with a randomization test based on 1,000 permutations. We estimated niche breadth per classification level and species by using the tolerance index which measures the amplitude in the distribution of each target group along a climatic gradient (Dolédec et al., 2000;Thuiller, Lavorel, Midgley, Lavergne, & Rebelo, 2004

| SDM development and evaluation
We developed SDMs for each species and classification level separately using boosted regression trees (BRTs) (Elith, Leathwick, & Hastie, 2008) in r (R-package gbm version 2.1.; Ridgeway, 2013) assuming a binomial error structure and using a learning rate of 0.001, a tree complexity of three (lower complexities resulted in worse models while higher complexities performed similarly), and a bag fraction of 0.75. To ensure an overall prevalence of 0.5 and thus comparable resulting occurrence probabilities, we weighted the absences by the ratio of the number of presences on the number of absences (Maggini, Lehmann, Zimmermann, & Guisan, 2006 We merged individual lineage/ESU models (so-called summed models) by keeping the highest probability obtained among models for a particular grid cell. To transform the resulting occurrence probabilities into presence/absence maps, we identified a threshold by maximizing the true skill statistic (TSS) (Allouche, Tsoar, & Kadmon, 2006). Spatial autocorrelation in model residuals is known to result in biased parameter estimates and the inflation of type I errors in SDM (Crase, Liedloff, Vesk, Fukuda, & Wintle, 2014). Therefore, we used correlograms plot Moran's I (i.e., a measure of spatial autocorrelation; Moran, 1950) between grid cells as a function of the distance between them (Kissling & Carl, 2007)  We evaluated the models using ( into presence/absence data by maximizing TSS, and performance of the summed models was evaluated using sensitivity, specificity and TSS. We based these assessments on the "raw observation dataset" to take into account all observations for each species (including observations for which we were not able to determine the lineage/ESU status).

| Future projections
We aimed at examining the impact of potential differences among the three SDM approaches on future projections of climatically suitable areas. We used the respective climate values obtained for the five global circulation models (GCMs; i.e., CCSM4, HadGEM2-AO, MIROC-ESM, MPI-ESM-LR and NorESM1-M) and according to the representative concentration pathways 8.5 scenario (Moss et al., 2008). We used only one climate change scenario as we aimed only at comparing the general consequences when using the three different SDM approaches and not at performing for a comprehensive assessment of potential future changes in the three species.
We obtained these data from the WorldClim database (Hijmans et al., 2005)  the current predicted geographic range that could remain climatically suitable in 2070 for each classification level.

| RE SULTS
As the results were quite similar at the two different resolutions, we here present only the results at 10 arc-minutes resolution.
The results at 30 arc-minutes resolution are shown in Supporting Information ( Figures S2 and S4), and only major differences with the other resolution are given in the main text.

| Climatic niche differentiation
The first three axes of the PCA based on the overall climatic conditions in the study area accounted for 95% of the total variability Our analyses indicated climatic niche differentiation within each of the three species. All species, lineages and ESUs showed significant OMI values indicating distinct climatic requirements compared to overall conditions within the study area (Supporting Information Table S1). However, the high residual tolerance values for most of the species/lineages/ESUs (Supporting Information   (Figure 1 and Supporting Information Figure S2).

| SDM performance comparison
The correlograms of the Moran's I statistics did not indicate significant autocorrelation (Supporting Information Figure S3). Model performance was good to excellent for the species-level models and each of the lineage-and ESU-based models ( Table 2 and Supporting Information Figure S2). The comparisons of performances between F I G U R E 1 Outlying mean index (OMI) analyses of each classification level (species, lineage and evolutionary significant units) for three bumblebee species. These analyses are based on the datasets at the resolution of 10 arc-minutes. Spaces occupied by studied species along the first and second axes of the OMI analysis are given on the left with the projection of mean position (centre of red crosses) and its variance (branches of the red crosses) occupied by each group of each classification level. Canonical weights of the five bioclimatic variables are given on the right (Bio1 = annual mean temperature, Bio4 = temperature seasonality, Bio5 = maximum temperature of the warmest month, Bio13 = precipitation of the wettest month and Bio15 = precipitation seasonality) alternative models showed quite similar results at each resolution (Tables 2 and Supporting Information Table S2). AUC values were higher for the species-level models than for other models (except for B. pascuorum; Tables 2 and Supporting Information Table S2).
Overall, TSS values showed a lower prediction ability of summed ESU-based models compared to the species-level models (Tables 2   and Supporting Information Tables S2-S4). TSS values of lineagebased models were as good as species-level models for B. pascuorum but were worse for B. lapidarius (Tables 2 and Supporting Information   Table S2). In general, summing up lineage-based and ESU-based models led to larger areas considered as climatically suitable compared to species-based models (Figure 2 and Supporting Information Figure S4; see individual SDM for each lineage/ESU in Supporting Information Figure S5). This was indicated by lower model specificity but correspondingly higher sensitivity for lineage-based and ESUbased models (Tables 2, Supporting Information Tables S2 and S4).
Summed lineage-based and summed ESU-based models did not only predict larger areas as climatically suitable compared to species-based models but also the unsuitable areas differed from species-based models (Figure 2). Using species-based models as a TA B L E 2 Evaluation of alternative models for the three bumblebee species of the BPD including 97% of the BPD and for B. terrestris 124% of the BPD but including only 90% of the BPD (Figure 2). For all species, lineage-and ESU-based SDMs predicted also a larger geographic extent for Eastern and Southern Europe (Figure 2 and Supporting Information Figure S4).

| Future projections
As performances of each SDM were similar between the two resolutions, we investigated future projections at 10 arc-minutes resolution. Climate change under the representative concentration pathways 8.5 scenario was projected to have considerable effects on future distributions of all three species (Figure 3 and Supporting Information Figure S6). However, the severity of this impact depended on the classification level used to assess species responses (Table 3; Figure 3). In particular, species-level models were more pessimistic than lineage-based or ESU-based models for all species in terms of relative loss of suitable area until 2070 (Table 3).

| D ISCUSS I ON
Our analyses were based on occurrence data from several previous field surveys. We argue that these datasets can be considered reliable for SDM analyses (see Methods). However, we cannot completely rule out that field surveys failed to detect or disregarded (i.e., some surveys may have only targeted a particular species and therefore did not record others) our species of concern. Therefore, climatic niche and modelled current and future distribution of the three bumblebee species should be considered with caution.

| Climatic niche differentiation within species
Our analyses show that the climatic niches of most of the lineages and ESUs are distinct (Figure 1; Supporting Information Table S1), as also observed in mammals (D'Amen et al., 2013) or other insects (Homburg, Brandt, Drees, & Assmann, 2014). One could argue that this is an expected result since the environment is spatially F I G U R E 2 Results of species distribution models for three bumblebee species. Potential climatically suitable area according to speciesbased models, summed lineage-based models and summed ESU-based models for three bumblebee species. Green areas indicate model predictions. Analyses are based on datasets at 10 arc-minutes resolution. The thresholds used to transform the occurrence probabilities into presence/absence map are those maximizing the true skill statistic: 0.56/0.  Velthuis & van Doorn, 2006). Therefore, we assume that there are indeed niche differentiations for the analysed bumblebee species, but further assessments, for example, via bioassays (e.g., Martinet, Lecocq et al., 2015), are needed.

| Does integrating ESU and lineage information improve species distribution models?
The existence of spatial autocorrelation may affect the performance metrics (e.g., Crase et al., 2014) which we used to compare species-level, lineage-based and ESU-based models. Although the Moran's I statistics were high for some distance classes, they were not significant. Therefore, we consider that our results were poorly impacted by potential issues triggered by the presence of spatial autocorrelation in model residuals.
In our case, integrating lineage or ESU information did not improve overall measures of accuracy like TSS or AUC, but model sensitivity and specificity changed (ESU and lineage-based SDMs vs. species-based SDM). While prediction of presence data was improved (i.e., increased sensitivity), decreased specificity led to constant over-prediction for all three species ranges. However, these results must be qualified. On one hand, the specificity of ESU and lineage-based SDMs could be underestimated in our analyses be- suitability in Eastern Europe. Therefore, the prediction by lineage/ ESU-based SDMs that these areas are climatically suitable for the species (thus increasing the sensitivity) could be an artefact of the over-prediction of these models rather than a performance increase.
Overall, our current results do not highlight that integrating ESU or lineage information increases the SDM performance under current climatic conditions. Therefore, none of the alternative SDM approaches appears to be superior under current conditions.

| Integrating intraspecific differentiation for climatic risk assessment
Our projections underline different climatically suitable areas in 2070 for all species but with pronounced differences depending on the classification level used for SDMs ( Figure 3). Indeed, the extent of change and the proportion of conserved, lost and gained areas depended on the kind of model used (

| Limitations and relevance of the approach
The main problem associated with integrating intraspecific in SDMs less observations) can lead to an overestimation of species range because smaller observation datasets can decrease the SDM precision resulting in broader climatic parameters inference (Stockwell & Peterson, 2002). In our study, the quite similar performances and the larger species range predictions of ESU and lineage-based SDMs do not confirm the first concern but could reflect the second potential issue.
Another potential issue is the relevance of the a priori classification of differentiated populations. Indeed, intraspecific units defined on the basis of genetic markers or ESU delimitation may not display physiologically different responses to climate change. For instance, a previous study showed that tolerance to heat is largely conserved among closely related species (Araújo et al., 2013). This would suggest that heat tolerance may also be conserved within species. However, several empirical studies (including studies on bumblebee species) underlined differentiations in genes coding for heat tolerance (Du, Li, Zhang, Meng, & Zhang, 2014) (Margules & Pressey, 2000). As SDM-based climatic risk predictions are increasingly used to produce guidelines to (non-)governmental agencies and to assess conservation plans (Carroll, 2010), divergences and limitations of each modelling approach should be taken into account for developing efficient biodiversity management strategies. Lineage and ESU-based SDMs offer the advantage to drag attention to species in which allopatric populations could display physiologically different responses to climate change when they lead to different results than species-based models. Therefore, they pave the way to further assessments of ecoclimatic tolerance of particular species where intraspecific differentiation could lead to an unexpected response to climate change.

ACK N OWLED G EM ENTS
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007(FP7/ -2013 under grant agreement no 244090, STEP Project (Status and Trends of European Pollinators, www.step-project.net) (Potts et al., 2011).

CO N FLI C T O F I NTE R E S T S
We have no competing interests.

DATA ACCE SS I B I LIT Y
The occurrence data came from the database "Base de données fauniques Gembloux-Mons" and were published in Rasmont et al. (2015).