Distributions of mammals in Southeast Asia: The role of the legacy of climate and species body mass

Current species distributions are shaped by present and past biotic and abiotic factors. Here, we assessed whether abiotic factors (habitat availability) in combination with past connectivity and a biotic factor (body mass) can explain the unique distribution pattern of Southeast Asian mammals, which are separated by the enigmatic biogeographic transition zone, the Isthmus of Kra (IoK), for which no strong geophysical barrier exists.


| INTRODUC TI ON
It is widely accepted that species distributions and thus biodiversity patterns result from the interplay of both biotic and abiotic factors (Soberón, 2007), not only those currently active, but also those experienced in the past (Dullinger et al., 2012;Svenning, Eiserhardt, Normand, Ordonez, & Sandel, 2015;Svenning & Skov, 2007a). However, the majority of biogeographic transition zones, i.e. zones separating the regions that encompass the distribution ranges of distinct groups of biota, are usually explained by the presence of a strong geological or geographic barrier to species movement (e.g. Isthmus of Panama; Bacon et al., 2015).
One prominent biogeographic transition zone, however, cannot be explained by presence of such a geophysical barrier-the Isthmus of Kra (IoK) on the Malay Peninsula (Figure 1a), which separates mainland Indochina from Sundaland (Hughes, Satasook, Bates, Bumrungsri, & Jones, 2011;Woodruff & Turner, 2009). The IoK is located at 10°30′ N and at its narrowest point is today only 44 km wide (Parnell, 2013). Depending on the taxa studied (i.e. bats: Hughes et al., 2011;birds: Hughes, Round, & Woodruff, 2003; butterflies: Corbet & Pendlebury, 1992;amphibians: Inger, 1999;mammals: Woodruff & Turner, 2009) To explain how IoK has become such a prominent biogeographic transition zone it has been suggested that rapid sea-level rises during the last 5 Myr have submerged the narrow central and northern land stretches, causing local faunal extinctions and compressing species distributions to regions north and south of IoK (Woodruff & Turner, 2009). However, low sea levels during glacial periods in the last 2 Myr uncovered land bridges that connected Sundaic and Indochinese landmasses (Lohman et al., 2011). Such land bridges could allow for species movement across IoK only if they were covered by suitable habitat. Some studies that focused on the reconstruction of the vegetation cover in Southeast Asia during the last glacial maximum (LGM, ~21,000 years ago) suggest that central Sundaland was covered by humid tropical forest that connected the Sunda islands in west-east direction (Cannon, Morley, & Bush, 2009;Raes et al., 2014), whereas savanna-like conditions persisted on the emerged lands north of Sundaland. However, other studies provide support for a much larger spread of open savannalike vegetation, which probably formed a transequatorial corridor that crossed Sundaland from north to south (Bird, Taylor, & Hunt, 2005;Gathorne-Hardy, Syaukani, Davies, Eggleton, & Jones, 2002;Meijaard, 2003). This would have restricted the tropical rain forests to smaller refugia mainly in Sumatra and Borneo. There is thus no consensus about what habitat prevailed on Sundaland during the LGM (Lohman et al., 2011). Meijaard (2009) has suggested that even in the absence of a strong geophysical barrier, the distinction between the Sundaic and Indochinese biota could have been maintained by the ecology of the species in combination with availability of suitable habitat. More generally, a conceptual framework suggested by Soberón (2007) distinguishes three sets of factors that affect species distributions:  (specialized Indochinese, general Indochinese, specialized Sundaic, general Sundaic), the number of species in each distribution group, and the taxonomic orders they belong to in consideration) and connectivity (reflecting the accessibility of the suitable abiotic conditions). Thus, the current distribution ranges are shaped by these factors acting during consecutive periods from the past to the present (Svenning et al., 2015). Although the factors affecting species distributions presently received a fair share of attention, the "legacy" effects looking at the impacts of factors in previous periods have mainly been studied in plant species (Dullinger et al., 2012;Normand et al., 2011;Svenning & Skov, 2007a, 2007b.
We addressed the question of how abiotic and biotic factors as well as connectivity shape biogeographic transition zones over time in the absence of a geophysical barrier, by focusing on IoK.
In particular, we assessed how the present distribution ranges of non-volant mammals inhabiting the areas around IoK are affected by (a) abiotic factors acting during the past (LGM, c. 21 ka and mid-Holocene, 6 ka) and present, (b) connectivity of available habitats during the LGM, and (c) biotic factors. Based on their present distributions, we classified 125 mammal species into two large distribution groups (Indochina and Sundaland), each with two subgroups, distinguishing the more specialized species (specialized Indochinese: ranges exclusively north of 12° N; specialized Sundaic species: ranges exclusively south of 5° N) from the ones with more general requirements (general Indochinese: ranges predominantly in Indochina, but reaching into the transition zone from 5 to 12° N; general Sundaic: ranges mainly in Sundaland, but protruding into the transition zone).
We note that the term "specialized" is used here to refer to general climatic (habitat) requirements of a species, and not to its diet.
To understand how IoK had become such a strong transition zone, we combined species distribution modelling (SDM), which allows hindcasting habitat suitability during the LGM and mid-Holocene, with comparative methods (phylogenetic linear model), which enable comparison of species-specific characteristics among related species, and tested the following two hypotheses ( Figure 2): H1-We hypothesized that the land emerged during LGM affected the habitat availability for predominantly general Indochinese and general Sundaic species, because they are less specialized and should be able to invade newly available habitats more readily. We derived two predictions based on H1: P1 (regarding the total habitat area in the past)-We predict that for general Indochinese species the total area of potentially suitable habitat was higher during the LGM F I G U R E 2 A schematic representation of the conceptual framework, hypotheses, respective predictions and findings (indicated as either support for or rejection of the hypotheses), and methods used to test specific hypotheses. The conceptual framework followed the one described in the text (Soberón, 2007). The numbering of hypotheses (i.e. H1, H2) and predictions (P1, P2) corresponds to the one used in the main text. For predictions the species distribution groups are abbreviated as follows: I S -specialized Indochinese, I G -general Indochinese, S S -specialized Sundaic, and S G -general Sundaic species. 'Tot suitable south' reflects the sum of predicted suitability values south of IoK All All compared to present day conditions, because the emerged land provided additional habitat more similar to that of current Indochina, irrespective of whether a transequatorial savanna corridor was crossing Sundaland or the Sunda islands were connected by the tropical rain forest (in this case savannas would still cover emerged areas north of Sundaland). For general Sundaic species, however, we predict (a) an increase in the total habitat area during the LGM compared to nowadays if the emerged land connecting the Sunda islands was covered by tropical rain forests (P1a); and (b) a decrease in the habitat area if rain forest habitats were restricted to refugia (P1b).
Our second prediction (P2) derived from H1 states that the change in the available habitat was directional, i.e. newly available habitat was located mainly south of IoK (as opposed to the newly Testing these hypotheses about how present and past factors have influenced mammal species distributions around IoK provides an alternative explanation for the persistence of a biogeographic transition zone in the absence of a strong geophysical barrier.

| Study area and species selection
We constrained our species list to non-volant mammals that according to The IUCN/SSC Red List of Threatened Species (http:// www.iucnr edlist.org/) are distributed in Southeast Asia, defined as an area between 91° West, 130° East, −15° South and 33° North and whose distribution range overlapped with at least one of the following countries: Myanmar, Thailand, Cambodia, Laos, Vietnam, Singapore, Malaysia, Brunei and Indonesia. Additionally, because we aimed to reveal a general pattern across multiple species, we excluded the species with very small distribution ranges (<130,000 km 2 , roughly the size of Java) because they are likely to be confined to rather specific small-scale climatic conditions (e.g. top mountain endemics) that are not typical of the general regional pattern. These selection criteria resulted in a preliminary set of 189 mammalian non-volant species (henceforth referred to as mammal species), of which 64 had to be removed because (a) their distribution ranges were located mainly outside of the study area and overlapped <5% with the study area; or (b) their distribution ranges covered the whole study area, rendering assignment to one of the four distribution groups impossible; (c) they are being driven to extinction by humans (e.g. hunting) or are easily spread by humans, meaning their current distribution ranges are likely to be determined predominantly by anthropogenic rather than environmental factors. The remaining 125 species were assigned to one of the four distribution groups (see Introduction and Figure 1b; for more details on the selection criteria see Appendix S1 and for the resulting species list see Table S2.1).

| Palaeo-projections with SDMs
We fitted SDMs for all 125 species in the following way: for each species we obtained its current distribution range from the Red List (maps downloaded on May 12th 2017) by using selected IUCN categories of presence, origin and seasonality, as detailed in Appendix S1. We rasterized this distribution range shapefile with a resolution of 2.5 arc minutes, because this is the finest resolution at which the climatic data used for hindcasting were available (see below). Next, we randomly sampled 10% of raster cells as occurrences for model fitting. By sampling the presences from the complete distribution ranges for each species we thus made sure that we were capturing the present climatic niche of each species (Soberón, 2007). For the species with >90% of their distribution range located within the study area (in total 113 species), we used the extent of the study area as a background. For the remaining 12 species, we used as a background a square encompassing the whole distribution range.
In both cases we excluded the Oceanian and Australian zoogeographic realms from our study area (Holt et al., 2013). We sampled the background points from the background area and set their number to 10,000 if the number of sampled presences was <35,000 (118 species), and to 100,000 otherwise (Table S2.1). We generated three sets of background samples to account for the randomness associated with their assignment. As environmental predictors for model fitting we used the bioclimatic variables provided by world clim.org/version1 (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005).
To avoid multicollinearity, for each species we calculated Spearman correlation coefficients among the environmental predictors at the locations assigned as presences and background points (for this we lumped all three sets of background samples; van Proosdij, Sosef, Wieringa, & Raes, 2016). Only environmental predictors with −0.7 < ρ < 0.7 (Spearman correlation) were retained for model building, resulting in a different set of predictors for each species (Table   S2.2). Models were fitted with the biomod2 package in R (Thuiller, Georges, Engler, & Breiner, 2016) using Maxent.
To evaluate the quality of the fitted models we used the approach suggested by Raes and Ter Steege (2007) for SDMs based on presence-only data. This approach is based on a comparison of the area under the curve (AUC) for the actually fitted model against a null distribution of expected AUC values. This null distribution is obtained by calculating AUC values for 999 models fitted to 999 sets of randomly drawn presence sets (and respectively generated background samples). We then compared the AUC value obtained with the model with the one-sided 95% confidence interval (CI) produced under the null distribution to determine whether the fitted model performed significantly better than expected by chance.
Next, we took the mean of the probabilities predicted with the models that were significantly better than expected by chance (all three models for each species, see Table S2.1) to obtain a single model for each species (Table S2.1). We found high consistency among the models based on the three sets of background samples for each species (results not shown), justifying averaging over these three models.
Next, we used the fitted habitat suitability model for each species to hindcast its habitat suitability to two past time periods using climatic data for the mid-Holocene (c. 6 ka) and the LGM (c. 21 ka), which were obtained with the following global climate models: CCSM4, MIROC-ESM and MPI-ESM-P (http://www.world clim.org/ paleo-climate1). These global climate models are commonly used for projecting the past habitat suitability with SDMs (Barker, Rodríguez-Robles, & Cook, 2015;Raes et al., 2014;Wilting et al., 2016). Finally, we built one model for each of the two past time periods by averaging the predicted probabilities obtained across the three climate models to account for uncertainty in climate model predictions (see Appendix S1 for details about the choices and assumptions made for SDM).
To test the sensitivity of our findings to the choice of the algorithm, we also fitted the models with boosted regression trees (BRT), and as the results obtained with the two algorithms were consistent (Appendix S2, Figures S2.1-S2.5), we here report the results based on Maxent only. Additionally, we assessed the sensitivity of our findings (see Appendix S1) to (a) the grid cell size used (by running the analyses with grid cells of 0.5°) because previous research found that predictions of SDMs based on fine-resolution environmental data may be biased (Hurlbert & Jetz, 2007); and (b) the proportion of the raster cells used as presences (in addition to the 10% reported in the main text we also used 1% and 20%). Our sensitivity analyses (see Appendix S2, Figure S2.6-S2.10) indicated that the results were not sensitive to the choice of the grid cell size and the proportion of cells used as presences. Therefore, our results validate our choices of using (a) the grid cell size of 2.5 arc minutes and (b) 10% of distribution range as presences.

| Impact of abiotic factors on species distributions
To assess how the habitat area of species inhabiting Southeast Asia changed across periods (H1), we converted the predicted habitat suitability values into presence/absence maps using as a threshold the value with maximum sum of sensitivity and specificity (i.e. max-SSS, Liu, Berry, Dawson, & Pearson, 2005;Liu, Newell, & White, 2016). To test prediction P1 we then used a linear model (Gaussian error distribution) with the total habitat area as response and period, species distribution group and their interaction as predictor variables. This model allowed us to assess how the total habitat area changed across the three periods for species in the four different distribution groups. Because the interaction of the period with distribution group was not significant, we excluded it from the model used to predict the total habitat area of each distribution group in each period.
To test prediction P2 we used the area of the habitat south of IoK as a response variable and fitted a linear model (Gaussian error distribution) with period, species distribution group, and their interaction as predictor variables. This model allowed us to assess how the area of the habitat south of IoK changed across the three periods for species in the four different distribution groups.
To account for the fact that the total landmass area changed in the LGM compared to the mid-Holocene and the current period, we fitted the two above-mentioned models by including another "period", which corresponded to the hindcasted LGM suitability map clipped to the current landmass. Furthermore, because the biogeographic transitions around IoK are reported to cover the latitudinal range from 5° N to 12° N, depending on the taxon considered (Hughes et al., 2011(Hughes et al., , 2003Woodruff & Turner, 2009), we checked the robustness of our results by using both the southern (5° N) and the northern (12° N) boundaries of the zoogeographic transition zone to recalculate the area of the habitat south of these boundaries.

| Impact of a suite of factors on species distributions
Since we have shown that during the LGM potentially suitable areas existed for Indochinese species south of IoK (see Results), we here focused on Indochinese species only. We tested the hypothesis that for Indochinese species, the area of the habitat that is currently available south of IoK depends on abiotic factors, a biotic factor and connectivity. As abiotic factors we used, for each species, the sum of predicted suitability values in raster grids south of IoK (10°30′N) during the LGM, mid-Holocene and present. The connectivity between the suitable areas located north and south of IoK in the LGM was calculated for each species using the least-cost path (LCP) analysis. We first assigned as the start and end points the grid cells with the maximum suitability value north and south of IoK (using the hindcasted LGM suitability map). The LCP algorithm then uses as costs the sum of resistance values (=inverse suitability values, i.e. 1/ suitability value) of each grid cell on the potential routes between the start and end points and selects a single path characterized by the least cost.
Next, for each species, we converted the least-cost distance (a sum of resistance values along the LCP) into connectivity (inverse of the least-cost distance value). For 14 species the least cost distance was estimated as infinity, a case when the resistance values of some of the grid cells on the identified path between the start and end points equal infinity because their suitability values are approaching 0. In such cases no LCP could be identified and we, therefore, assigned the least-cost distance to the maximum least cost distance identified across all species. We recognize that the LCP analysis provides only a proxy of connectivity while omitting the details of the movement path (e.g. inability to cross barriers such as rivers), but this method offered the best compromise between the detailed depiction of the movement (data-demanding) and discerning the pattern across multiple species within a large spatial extent.
As biotic factors we initially aimed to use several life history traits, such as adult body mass, age at first reproduction, inter-birth interval, litter size, number of litters per year, and age at sexual maturity, all available from the PanTHERIA database (Jones et al., 2009). But because most data were missing for many species (cf. González-Suárez & Revilla, 2013) we had to restrict our analysis to a single trait, body mass, which we used as a proxy of competitive ability of species, in particular reflecting the species' ability to colonize newly available habitats. Indeed, in mammals, adult body mass correlates strongly (Pearson correlation coefficient = 0.97) with adult forearm length (González-Suárez & Revilla, 2013), which is a known proxy for mammal mobility. Furthermore, larger body mass is associated with wider diet breadth, meaning that larger predators can feed on a wider range of prey species (Gilljam et al., 2011).
And, generally, for predators, the larger their size, the larger their prey, meaning that they would out-compete smaller-sized predators (Brose et al., 2006).
We then used a phylogenetic linear model to assess how body mass, connectivity during the LGM, and availability of suitable area south of IoK during all three periods affected the area of the habitat currently located south of IoK for Indochinese species (Figure 2), H2. We standardized (mean = 0 and SD = 1) all predictors prior to model fitting. The model was fitted using the function 'pgls()' from 'caper' package in R (Orme et al., 2013). Prior to model fitting, we tested for correlation among explanatory variables to avoid multicollinearity. Because the sum of suitability values correlated strongly among three periods (r > 0.7) and because the sum of suitability values in the LGM correlated with connectivity ( Figure   S2.11), we only retained the sum of suitability values estimated for the current period as an explanatory variable and excluded the suitability sums for the other two periods. For phylogeny we relied on the updated mammalian supertree (Fritz, Bininda-Emonds, & Purvis, 2009) and resolved polytomies randomly. To test how sensitive our results were to the assignment of the maximum leastcost distance for species for which the least-cost distance was estimated to infinity, we also re-run the model on the subset of data without these 14 species. F I G U R E 3 (a) Total predicted habitat area for species in the four distribution groups in each period (LGM, mid-Holocene and current) and (b) the area of the predicted habitat located south of the Isthmus of Kra for the species in the four distribution groups, per period. LGM_ curLand denotes the habitat area in the LGM calculated when considering current landmasses (to account for the differences in emerged landmasses between periods with different sea levels)

| Impact of abiotic factors on species distributions
As predicted (P1), we found that for general Indochinese species the area of suitable habitat had been larger during the LGM compared to the current period ( Figure 3a, Table 1). The same was found for general Sundaic species (Figure 3a, Table 1), providing support for the prediction that species dependent on evergreen rain forests found suitable habitat on the exposed Sunda Shelf (P1a). We did not find a stronger increase in the habitat area during the LGM for some distribution groups compared with others, as indicated by non-significant interaction between period and distribution group (F = 0.919, d.f. = 9, p = .5082). Importantly, if the hindcasted suitability maps for LGM were clipped to current landmasses we found that for all species there was a non-significant slight increase in the habitat area in the LGM compared to the current period (Table 1), indicating that additional suitable areas during the LGM were mainly located outside of the present landmasses.
The area of the habitat south of IoK depended on time period, species distribution group and the interaction between them (Table 1). As predicted (P2), we found that for general and specialized Indochinese species the area of the habitat south of IoK during the LGM was higher compared to the current period, even if the landmasses in the LGM were clipped to match the current period ( Figure 3b, Table 1). Similarly, for Sundaic species, the area of the habitat south of IoK was higher in the LGM compared to the current period, but only if we used the LGM landmasses. By clipping the LGM landmasses to match the current period, we found a slight decrease in the area of the habitat south of IoK for Sundaic species (compared to the current period), suggesting that the habitat available to Sundaic species in the LGM was mainly present on the emerged landmasses. The results were qualitatively unaffected by use of either the southern or northern boundaries (5° N and 12° N) of the biogeographic transition zone to calculate the proportion of the habitat south of IoK (Table S2.3 and Figure S2.12).
TA B L E 1 Results of linear models assessing the effects of predictors on a log-transformed (a) total habitat area (P1) and (b) habitat area south of IoK (P2) across the species in four distribution groups. As a baseline we used the distribution group specialized Indochinese species and the "current" period. Period corresponds to one of the studied time periods: current, mid-Holocene, LGM, and LGM corrected for the current landmasses (see Methods)

| Impact of a suite of factors on species distributions
We found partial support for H2 according to the phylogenetic model (

| D ISCUSS I ON
We showed that both present and historical abiotic factors in combination with body mass, as one proxy for biotic characteristics of to Java, as would be expected in the presence of a transequatorial savanna corridor (Bird et al., 2005;Gathorne-Hardy et al., 2002;Meijaard, 2003). We found that large parts of the exposed shelf around the equator between Borneo and Sumatra were not predicted to contain suitable habitat for Indochinese species ( Figure 5).

TA B L E 2
Results of the phylogenetic model testing how the area of the habitat south of IoK during the current period was affected by connectivity during the LGM, body mass, and sum of suitability values for the current period. Significant variables are highlighted in bold. The predictors were standardized (mean = 0 and SD = 1) prior to the analysis. The estimated phylogenetic signal alpha is 0 (  Ideally, fossil records could be used to validate the hindcasts made here (e.g. Metcalf et al., 2014). Unfortunately, the Quaternary fossil record in Southeast Asia is of poor temporal and spatial resolution (Louys, Curnoe, & Tong, 2007), limiting our ability to compare fossil data with our projections for each species for the two past periods with any accuracy. Another issue with the fossil record concerns poor knowledge about the history and formation of fossil sites, which means that the absence of a species from the fossils recovered at a particular site cannot be interpreted with 100% confidence as the absence of that species from the study area during a particular historical period. Nevertheless, we tested our hindcast projections by matching them with the fossil data (cf. Davis, Mcguire, & Orcutt, 2014;Martínez-Meyer, Peterson, & Hargrove, 2004) Barker et al., 2007;Louys, 2012) that were also included in our species list, had high habitat suitability during the LGM at the Niah cave locality (Table S2.5). And one of three species that overlapped between our species list and the fossils from Java (Wajak Cave, 8°37'59'' S, 116°9'E, fossil age 30-40 ka: Storm et al., 2013) had high habitat suitability in that region according to the hindcasts to the LGM. Although these few records cannot ascertain the accuracy of our model projections for all species, they provide some credibility for the overall patterns found in our study.
In addition to abiotic factors (suitable climatic conditions), a biotic factor, body mass, also affected the current distributions  Barker et al., 2015;Cooper et al., 2016;Fitzpatrick et al., 2011;Mathai et al., 2019;Patel et al., 2016;Raes et al., 2014;. Indeed, for our study region, the global climate models provide the best available climate projections in terms of spatial resolution. Furthermore, our analyses were conducted at a spatial resolution of <1°, which may be prone to commission errors (Hurlbert & Jetz, 2007), although our sensitivity results based on 0.5 degree resolution were consistent with the results reported in the main text. Taken together, we would like to highlight that given the high habitat fragmentation in our study region, special attention should be devoted to the species with lower mobility, particular as it has been predicted that suitable ecological conditions for 23%-46% of Bornean mammals will shift upslope under forecasts of land cover and climate change (Struebig et al., 2015).
Similarly to studies on plants (Normand et al., 2011;Svenning & Skov, 2007b), amphibians and reptiles (Araújo et al., 2008), we demonstrated that the legacy of previous climatic conditions and current climate have also affected mammal species distributions. We showed that, at least during the LGM, savanna-like habitats apparently did not stretch all the way southwards to Java. We further showed that the transition at the IoK between the distinct Indochinese and Sundaic biota was maintained through different climatic periods during the Late Pleistocene, even though the biota were directly connected during long periods of the Late Pleistocene by the exposure of the Sunda Shelf. This finding provides the first evidence that biogeographic transition zones may be explained by climatic factors and their legacy, and that a strong geophysical barrier, such as the Isthmus of Panama is not needed to maintain the separation of biota.

ACK N OWLED G EM ENTS
We thank Nicolas J. Deere for the comments on a previous draft of this manuscript and Julien Louys and two anonymous reviewers for valuable suggestions. This work was conducted as part of a Leibniz-Association project SAW-2013-IZW-2.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data supporting the results are available from the Dryad Digital Stephanie Kramer-Schadt https://orcid. org/0000-0002-9269-4446 Joerns Fickel https://orcid.org/0000-0002-0593-5820