Climatic predictors of species distributions neglect biophysiologically meaningful variables

Species distribution models (SDMs) have played a pivotal role in predicting how species might respond to climate change. Generating reliable and realistic predictions from these models requires climate variables that adequately capture the physiological responses of species to climate and therefore provide a proximal link between climate and their distributions. Here, we examine whether the climate variables used in plant SDMs differ from those known to influence plant physiology directly.


| INTRODUCTION
Over the last 20 years, species distribution models (SDMs) have become one of the most widely used methods for predicting how species will respond to global environmental change. A search in Web of Science (May 2018) for articles containing both "species distribution models" and "climate change", for example, gave over 7,800 returns. Studies that use SDMs, or develop tools for doing so, are amongst the most highly cited in ecology and conservation (e.g., Elith et al., 2006 [3,524 citations]; Guisan & Zimmermann, 2000 [3,479]; Phillips, Anderson, & Schapire, 2006 [4,953]; Thomas et al., 2004 [3,271]; Web of Science Core Collection, May 2018).
Moreover, results from SDMs have shaped 21st century conservation policy, highlighting that regions with favourable climates will soon lie beyond the natural limits of colonization of many current species distributions, and hence, that the redesign of protected area networks or species translocations may be needed (Guisan & Thuiller, 2005).
In the context of climate change, a premise of SDMs is that climate determines the natural distribution of species (Pearson & Dawson, 2003). On this basis, SDMs determine the statistical relationship between current species' presence/absence data and a set of climatic variables and use this to predict the areas that a species may be able to occupy in the future (Elith & Leathwick, 2009). The climate variables selected to model species distributions are therefore assumed to impose constraints on species such that at locations or times when climatic conditions are unsuitable, populations of a species are unable to survive in the wild (Pearson & Dawson, 2003).
The climatic variables used in SDMs can be identified in two main ways. Most commonly, a correlative approach is taken, whereby statistical associations between species' presence or absence data and a set of climate variables are initially tested and the strongest predictors included in the SDM (Elith & Leathwick, 2009). In contrast to these "correlative" SDMs, "mechanistic" or "physiological" models use variables for which experimental work has established direct links to biological processes of the study species.
The appropriate selection of climate variables is fundamental to the reliability of SDMs (Austin & Van Niel, 2011). If the variables selected cannot adequately represent climatic factors that influence a species' distribution, then subsequent range predictions in new locations or future climate scenarios may be incorrect. The degree to which climate variables are proximal is therefore an important consideration when constructing SDMs (Austin, 2002; Petitpierre, Broennimann, Kueffer, Daehler, & Guisan, 2017).
Proximal variables directly capture physiological mechanisms or processes of the study species and, as such, are causally linked to a species' distributional response to climate both in space and in time (Austin, 2002, 2007). Indirect links to species' physiology can be captured by "distal" variables which may provide a good "mean field approximation" for these proximal predictors (Bennie, Wilson, Maclean, & Suggitt, 2014). However, other factors in a species' environment, both climatic and non-climatic, may contribute strongly to observed relationships between distal variables and species distributions in correlative models. The influence of these additional factors may be unique to the time and place in which correlations between a distal variable and species distributions are determined so that in new locations or future climates the ability of a distal variable to predict species distributions may weaken or be lost (Jackson, Betancourt, Booth, & Gray, 2009). Proximal variables are thus likely to provide more robust estimates of distribution, particularly when applied to studies of species responses to climate change, and are often considered superior to distal alternatives when using SDMs for this purpose (Austin, 2002).
Despite recommendations to use proximal variables in SDMs (e.g., Helmuth, Kingsolver, & Carrington, 2005; Barbet-Massin & Jetz, 2014), those selected are often known to neglect physiological processes (Mod, Scherrer, Luoto, & Guisan, 2016). By definition, mechanistic SDMs will use proximal variables but correlative SDMs, which remain the most popular approach to modelling species distributions (Barry & Elith, 2006)
We focus only on plant species to provide a more complete and comparable analysis and because climate is widely accepted as the most dominant influence on plant distributions (cf. Box, 1981; Woodward, 1987). Further, as primary producers, plant distributions will influence resource availability at higher trophic levels, which in turn has important implications for the conservation of species further up the food chain (Haddad et al., 2009). As SDMs are used routinely to assess species distributions in the context of climate change (Austin & Van Niel, 2011), we analyse climate variables associated directly with a changing climate (Collins et al., 2013) and which are known to influence plant distributions (Austin & Van Niel, 2011), namely temperature and water availability (Körner et al., 2016).
While we acknowledge that factors such as dispersal and biotic interactions can also exert strong influence on species distributions (Gallien et al., 2015; Shea & Chesson, 2002), consideration of these is beyond the scope of this study. We hope to aid the effective parameterisation of the climatic component of SDMs, especially to meet demands to predict species responses to climate change accurately.

| Data sources
We compiled data from the peer-reviewed literature on species distributions and physiology. We performed two literature searches: Each study was inspected, and all climate variables found to affect plant physiology (e.g., growth, reproduction, survival), whether significantly or non-significantly, were recorded. In cases where experimental treatments were delivered over unspecified phenological stages, but occurred when the otherwise unmanipulated environmental variables were conducive to plant growth, we defined the temporal scale of the final variable as "during the growing season." We grouped soil water content into a single variable (for each unique time period of measurement), regardless of the way this was determined in the study (e.g., gravimetrically, volumetrically), as individually the variables would be very highly correlated and would not provide meaningful additions to the physiology list if separated. There were no other cases where the grouping of variables was necessary. Herein, we refer to any variables identified from the physiology search as the "physiology" or "physiological" variables. Full details of the physiology variables can be found in Appendix S1. A variable could be classed as both an "SDM" and a "physiology" variable if it was used to model the distribution of a plant in one of the SDM studies and also found to be physiologically relevant in a study from the plant physiology literature.

| Analysis
To identify the 10 most frequently used variables from each of the two searches, we summed the number of times each unique climate variable was used in their respective literature and sorted the results from highest to lowest. Final rankings of the physiology variables were calculated by dividing these frequencies (significant) by the number of times each variable was used (significant + insignificant) in the 150 papers reviewed. This accounts for ease of manipulation of these variables within an experimental setting, but a further limitation is that our literature search was non-exhaustive and variable rankings may therefore be sensitive to the studies selected. We therefore performed a post hoc sampling-with-replacement procedure to test the robustness of variable ranks. We generated 999 new samples and tested for concordance between the ranking order of the top 10 variables in each of the new samples and our original top 10 physiology variables using Kendall's W test (Tate & Clelland, 1957).
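The ranking and resampling steps described above can be sketched as follows. This is a minimal illustration and not the authors' R code: the record layout `(study_id, variable, significant)`, all function names, and the choice to resample whole studies are assumptions, and Kendall's W is computed here without a tie correction.

```python
import random
from collections import defaultdict


def proportion_significant(records):
    """Share of uses in which each variable had a significant effect.

    records: iterable of (study_id, variable, significant) tuples
    (a hypothetical layout chosen for this sketch).
    """
    significant = defaultdict(int)
    total = defaultdict(int)
    for _, variable, is_significant in records:
        total[variable] += 1
        significant[variable] += int(is_significant)
    return {v: significant[v] / total[v] for v in total}


def average_ranks(scores, variables):
    """Rank variables from highest score (rank 1); ties share the average rank."""
    ordered = sorted(variables, key=lambda v: -scores.get(v, 0.0))
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while (j + 1 < len(ordered)
               and scores.get(ordered[j + 1], 0.0) == scores.get(ordered[i], 0.0)):
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return [ranks[v] for v in variables]


def kendalls_w(rank_lists):
    """Kendall's coefficient of concordance for m rankings of n items (no tie correction)."""
    m, n = len(rank_lists), len(rank_lists[0])
    totals = [sum(ranking[i] for ranking in rank_lists) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def bootstrap_concordance(records, top_vars, n_boot=999, seed=0):
    """Resample studies with replacement and compare each new ranking to the original."""
    rng = random.Random(seed)
    by_study = defaultdict(list)
    for record in records:
        by_study[record[0]].append(record)
    studies = sorted(by_study)
    original = average_ranks(proportion_significant(records), top_vars)
    concordances = []
    for _ in range(n_boot):
        sample = [rec for s in rng.choices(studies, k=len(studies)) for rec in by_study[s]]
        resampled = average_ranks(proportion_significant(sample), top_vars)
        concordances.append(kendalls_w([original, resampled]))
    return concordances
```

Values of W close to 1 across the resamples would indicate that the ranking is insensitive to which studies happen to be included; how the individual W values were summarised in the original analysis is not specified here.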
Studies modelling distributions of a greater number of species may use more general climate variables, so we investigated whether use of the top 10 SDM variables was influenced by species number, using a generalized linear model (GLM). As variable use was represented as either 0 (false) or 1 (true), a binomial error distribution and logit link function were used. Species number was logarithmically transformed to reduce heteroscedasticity. Eight studies were excluded from this analysis as the species number was not stated.
We sought next to provide statistical comparison between the climatic conditions described by the SDM and physiology variables. However, climate variables are often correlated with one another, and in consequence, even if the variables are different, the spatial patterns of those most frequently used in SDMs may capture in aggregate the spatial patterns of the physiological variables. We followed a two-step process in order to compare the SDM and physiology variables: (a) principal component analysis (PCA) on both variable sets; and (b) multiple regression analysis of SDM principal component scores using scores from physiology principal components (PCs) 1-3 as predictors.
Principal component analysis can be used to reduce dimensionality in a dataset and indicate which variables contain the most information (King & Jackson, 1999). Here it allows us to determine which aspects of climate variation are described by the SDM and physiology variables. We performed two PCAs to identify which climate variables contributed most to the overall variation in conditions described by the top 10 SDM and physiology variables.
Total annual precipitation and mean annual temperature were not included in the analyses as they featured in both top 10 lists and so it was not necessary to examine the spatial differences between the SDM and physiology studies for these variables. Data were scaled to account for differences in units among each variable set, and a scree plot was used to determine how many PCs to retain from each PCA (Appendix S3: Figure S2). For both the SDM and physiology variables, we retained the first three PCs. We analysed the variable loadings for PCs 1-3 for both variable sets to determine the aspects of climate they described.
It is not possible to compare PCs in a like-for-like way between variable sets. PC1 for the SDM variables, for example, may correlate poorly with PC1, but well with PC2 of the physiology variables so that collectively the PCs from the two sets of variables may describe similar trends in global climate variation. To assess similarity between the climate variation described by the physiology and SDM variables, we therefore performed three multiple regression analyses on scores of each of the SDM PCs using the scores for the physiology PCs 1-3 as predictors (Appendix S3: Figure S5). To determine the variance unexplained collectively by the multiple regressions, we calculated the squared residuals for each regression and mapped the square root of the minimum of these residuals, thereby revealing where discrepancies in the spatial patterns of climate captured by the two sets of variables were greatest (Figure 3).
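The regression and residual-mapping step can be sketched as below. This is an illustrative outline, not the authors' implementation: it assumes the PC scores have already been computed and are supplied as one `[PC1, PC2, PC3]` list per grid cell, and all function names are hypothetical. Ordinary least squares is solved through the normal equations, which is adequate for a few uncorrelated predictors.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols_residuals(y, predictors):
    """Residuals from regressing y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


def min_residual_map(sdm_scores, phys_scores):
    """Per-cell square root of the smallest squared residual across the SDM PCs.

    Each SDM PC is regressed on all physiology PCs; the value returned for a
    cell is small if at least one SDM PC is well predicted there.
    """
    n_pcs = len(sdm_scores[0])
    residuals = [
        ols_residuals([cell[j] for cell in sdm_scores], phys_scores)
        for j in range(n_pcs)
    ]
    return [
        math.sqrt(min(residuals[j][i] ** 2 for j in range(n_pcs)))
        for i in range(len(sdm_scores))
    ]
```

The returned list has one value per grid cell, in input order, and could be written back to the grid for mapping.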
All data analyses were conducted using the statistical programme R (R Core Team, 2018).
We found strong concordance between variable ranking in the
Total annual precipitation and mean annual temperature were the only variables that featured in both the SDM and physiology top 10 lists. Of the top 10 SDM variables, six captured variation or extremes of temperature and four captured variation or extremes of precipitation. The top 10 physiology variables were more diverse in the aspects of climate that they describe and placed greater emphasis on water availability. Soil moisture content was the most commonly identified physiology variable yet was only included in one SDM study. The timing of climatic events within the growing season was important to five of the physiology variables but was not explicitly featured in any of the SDM variables.

| Spatial patterns
Comparison of global maps depicting the mean values for the SDM and physiology variables indicates that the spatial patterns of climate they describe are dissimilar. For example, global variation in growing season soil moisture content, the top physiology variable, was not matched by any of the SDM precipitation variables; growing season soil moisture content showed a patchier distribution, particularly in the Northern Hemisphere, whereas variation in the SDM precipitation variables generally radiated out from the equator. Similarly, temperature seasonality and maximum temperature during the growing season, the variables ranking fourth and seventh in the SDM and physiology top 10, respectively, showed clear differences in spatial variation despite both describing temperature indices of climate. Maximum temperature during the growing season captures climatic variation more independently of equatorial influence than temperature seasonality and acknowledges that extreme high temperatures (>35 °C) will be detrimental to plant growth. The physiology variables appeared to show greater spatial heterogeneity in climatic variation, particularly for the temperature-related variables. The physiology variables highlighted areas with climates distinct from the general trend in the surrounding area, such as along the west coast of South America, whereas the SDM variables appeared to smooth out these nuances.

| Principal component analyses
The first three components for the SDM and physiology variables explained 86% and 94% of the variance, respectively (Tables 3 and 4).

| Correlation with species number
Individually, total annual precipitation, mean diurnal range, temperature isothermality and temperature annual range (rankings 2, 7, 8 and 9, respectively) were more likely to be used with an increasing number of study species (GLM, p = 0.02, p = 0.02, p = 0.03, p = 0.04).

| Variable selection as a predictor of plant distributions
The climate variables used in SDMs are assumed to reflect the physiological constraints on the study species that affect where they can survive in the wild (Kearney & Porter, 2009). Proximal variables represent a direct link between climate and physiology (Austin, 2002; Jackson et al., 2009) and as physiological limits are inherent traits, their influences on a species' distribution are more likely to be conserved in time and space (Austin, 2002, 2007). Distal variables, however, correlate indirectly to species' physiology through their relationship to the proximal variables they replace (Merow et al., 2014). Although distal variables may provide a good "mean field approximation" for proximal predictors under existing climates (Bennie et al., 2014), it cannot be assumed that this relationship will be conserved in time and space, and in consequence, the use of distal variables in predictive models is questionable. Physiological variables may therefore be more robust predictors of species distributions in novel climates and locations (Austin, 2002 (Araya, Gowing, & Dise, 2010) to English meadows (Silvertown, Dodd, Gowing, & Mountford, 1999) has been attributed to hydrological niche separation.
Precipitation is often selected as a distal predictor for soil moisture (e.g., Austin & Van Niel, 2011) and, indeed, four of the top 10 SDM variables related to precipitation. However, precipitation has been shown to be a poor proxy for soil moisture conditions (Piedallu, Gégout, Perez, & Lebourgeois, 2013) and may therefore fail to capture accurately the amount of water that ultimately becomes available to plants (Dilts, Weisberg, Dencker, & Chambers, 2015). The discrepancy between precipitation and soil moisture variables may become increasingly important at finer spatial scales, where topography has greater influence on soil water content (Daws, Mullins, Burslem, Paton, & Dalling, 2002; Maclean, Bennie, Scott, & Wilson, 2012) or in transition zones between wet and dry climates where evaporation is high and feedbacks between soil moisture and precipitation occur (Koster et al., 2004). Substituting soil moisture variables with precipitation surrogates could therefore threaten the reliability of SDMs (Weltzin et al., 2003), and indeed, explicitly incorporating soil moisture predictors into SDMs for plants has been suggested as a way to increase the reliability of subsequent range predictions (e.g., Whitehead, 2001). Variance among the SDM variables, however, is described by factors reflecting the variability and extremes of temperature and precipitation throughout the year and could be considered to define climate continentality.
The PCA results suggest that, physiologically, it is important that climate variables consider the mutual availability of temperature and water (i.e., "better together") whereas the SDM variables describe the ranges or extremes in these aspects of climate (and usually model them independently of each other). Most plant phyla are known to have evolved during a "tropical planet" (Benton, 1993) and high niche conservatism in plants (Prinzing, 2001; Romdal, Araújo, & Rahbek, 2013) means that many species are likely to have retained a tropical affinity (Wiens & Donoghue, 2004). Our results concur with this hypothesis as climate variables indicating tropicality, particularly combinations of temperature and water, were found to be physiologically important to plants.
Our second hypothesis was also supported. By calculating and mapping globally the minimum residual differences between PC scores for the SDM and physiology variables, we show that the spatial patterns of climate variation described by the most commonly used SDM variables do not match those described by physiologically relevant variables. We conclude that the top 10 SDM variables are distal indicators of species distributions.
FIGURE 3 Global map of the minimum residual differences from multiple regression analyses of SDM principal components 1-3 using scores from principal components 1-3 for the physiology variables as predictors
Residual differences were greatest in areas where precipitation regimes or the mutual availability of temperature and water become more important to the classification of climate, which confirms that the SDM variables are poor proxies for physiological variables that relate to water availability, particularly soil moisture content and the timing of rainfall within the growing season. Areas of hot desert and polar climates (as defined by the Köppen-Geiger climate classification system; Geiger, 1961; see Kottek et al., 2006 for updated global map), however, were, in general, similarly described by both sets of variables. These are areas of temperature extremes (although in opposing directions), which suggests that once a certain temperature threshold is reached, average climate data can adequately capture physiologically limiting conditions and may be good substitutes for more proximal variables in these cases.

| Variable selection in a changing climate
Species distribution models have become a popular tool among ecologists and conservation biologists to predict how species might respond to climate change (Pearson & Dawson, 2003). Indeed, in the studies we examined, nearly one-third (48/150) aimed to predict species response to climate change as their primary objective and most referred to the application of SDMs for this purpose. As the climate warms further and the results of previous change become more evident, the role of SDMs in predicting the impacts of climate change on species distributions and aiding conservation strategies is likely to grow. Many authors have highlighted the need to account for climate change in protected area design (Araújo, Cabeza, Thuiller, Hannah, & Williams, 2004; Hannah et al., 2007) and to assess the best locations to protect species of conservation priority (e.g., 20082008; Porfirio et al., 2014).
When applying SDMs to climate change studies, the variables selected for modelling are assumed to be good predictors of a species' range in a new time and place. Until a forecasted future climate is realized, however, it will be impossible to determine the accuracy of these predictions. A major advantage of using proximal climate variables is that their direct link to physiology and therefore species distributions can be quantified and is unlikely to change significantly over the modelled time period (acknowledging that although local adaptation may occur, it is unlikely to match the rate of climate change; Davis & Shaw, 2001). This means proximal variables are likely to be more reliable indicators of future species distributions.
The distal variables often used in correlative studies may provide less robust estimates of future ranges as their correlations to proximal variables cannot confidently be extrapolated over space or time (Elith & Leathwick, 2009).
Alongside raising the need for accurate predictions of species ranges, climate change will increase the challenges associated with modelling their distributions. For example, climate change is expected to increase the frequency and intensity of extreme weather events (Collins et al., 2013) which can advance the change in species composition in response to altered environmental conditions (Jentsch, Kreyling, & Beierkuhnlein, 2007). The possibility of more extreme weather supports the use of physiologically relevant climate variables as correlations between proximal variables that reflect climatic events and those describing averaged trends may weaken or break down in more unpredictable climates. Fay, Carlisle, Knapp, Blair, and Collins (2003), for example, show that increased variability of rainfall, without reduction in the overall rainfall amount, can reduce above ground net primary productivity in a tall-grass prairie in Kansas.
Similarly, Orlowsky and Seneviratne (2012) report that predicted future seasonal extremes of temperature scale with changes in global annual mean temperature by a factor of more than two, with the consequence that limiting thresholds of temperature may not be captured in averaged data (e.g., Parker & Abatzoglou, 2017).
Although recent range expansions have been attributed to rises in mean annual temperatures (Wilson et al., 2005), this means that species responses to distal predictors are likely to be lagged, and the absolute number of days outside of their physiological tolerance may increase on a much shorter time-scale (Parmesan, Root, & Willig, 2000). Late frosts or summer heatwaves, for example, are likely to impact species almost immediately if these affect their ability to survive, grow and reproduce and proximal variables would be able to capture these tolerances and track changes to species distributions occurring in "real-time." This may also provide information on changes to species distributions on a time-scale that is more relevant to conservation decision-making and facilitate the development of proactive management strategies.
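A small worked example may help to show why averaged data can miss limiting thresholds. The sketch below compares two hypothetical daily maximum temperature series that share the same mean but differ in the number of days above a tolerance threshold. The series are invented for illustration, and the 35 °C default simply reuses the extreme-heat value quoted earlier in the text; it is not a species-specific limit.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def days_above(daily_max_temps, threshold=35.0):
    """Count days whose maximum temperature exceeds a tolerance threshold (°C)."""
    return sum(1 for t in daily_max_temps if t > threshold)


# Two invented 10-day series with identical means (30 °C).
stable = [30.0] * 10          # no variability
variable = [24.0, 36.0] * 5   # alternating cool and hot days

print(mean(stable), days_above(stable))
print(mean(variable), days_above(variable))
```

A distal predictor such as the mean cannot distinguish the two series, whereas a proximal predictor such as the count of days above the threshold separates them.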
A lengthening of the growing season is another expected result of climate change (Jentsch et al., 2007) and has already been observed in higher latitudes (Menzel & Fabian, 1999 to capture, at the population-level, the effect of many individuals responding to climatic pressures and must therefore offer a good "mean field approximation" for biological processes that determine whether a species can survive, grow and reproduce in an area (Bennie et al., 2014). Proximal variables are intimately tied to biological process of the study species and as such, may provide better approximations of the climatic requirements of a species that influence their distributions (Kearney & Porter, 2009 (Kearney, Shamakhy, et al., 2014). However, a number of climate forcing variables are required as inputs and the reliability of microclimate estimates may be compromised if hourly data are unavailable and therefore obtained by interpolation. Readily available datasets of ecophysiologically meaningful variables, or fine spatial and temporal climate data, which allow such variables to be derived (Kearney, Isaac, & Porter, 2014), are therefore much needed.
To predict how climate change may impact species distributions, physiological datasets for potential future climate scenarios will also be required. This may be possible through the use of statistical weather generators which produce multiple statistically plausible simulations of weather at temporal resolutions (e.g., Ivanov, Bras, & Curtis, 2007) which could in aggregate be used to generate probabilistic estimates of physiologically relevant variables.
Importantly, this approach to modelling future climate conditions can capture changes to climate extremes and variability (Semenov & Barrow, 1997) and has been applied with success in the agricultural literature in studies of crop suitability (e.g., White, Hoogenboom, Kimball, & Wall, 2011; Holzkämper, Calanca, Honti, & Fuhrer, 2015) and future climatic risk (Mosedale et al., 2015). Meanwhile, a useful next step would be to test the ability of our top 10 physiological variables to predict the current distributions of some species.

| CONCLUSION
Species distribution models should be constructed using aspects of climate to which the study species is known or most likely to respond (Bramer et al., 2018; Suggitt et al., 2017). We have shown here that the most commonly used SDM variables often neglect important physiological factors and, in particular, that soil moisture content and the timing of climatic events during the growing season should feature more explicitly in the climate variables used in plant SDMs. We echo other researchers in that climate variables should be justified based on the physiology of the study species (e.g., Austin & Van Niel, 2011), but more specifically, that they should be closely related to these proximal mechanisms. This is likely to be particularly important when predicting species distributions in tropical or mountainous environments, where we suggest that the results of SDMs that use distal variables are interpreted with more caution.
Data deficiencies are often considered a limiting factor for the use of proximal variables in SDMs. With the advent and recent improvements in remote sensing technology, there are more opportunities than ever before to measure physiologically relevant variables and use these data to model species distributions (e.g., Kemppinen et al., 2018). Wherever possible, new technologies should be exploited to expand physiologically relevant climate datasets as this could help prevent variable use being compromised based on data availability.
We also urge climatologists to consider, as a matter of priority, the development of high-resolution climate surfaces for physiologically meaningful variables. The ability of statistical weather generators to provide information on physiological conditions for possible future climate scenarios should also be explored. There is a growing demand for robust predictions of species distributions and taking steps to make physiologically relevant climate data more widely available for use in SDMs could support the best conservation decisions to protect global biodiversity as the climate changes.

ACKNOWLEDGEMENTS
A. S. Gardner was supported by the Natural Environment Research Council (NERC) iCASE studentship [Grant Reference: NE/P01229/1] in partnership with Cornwall Council. We thank Dr J. Serra-Diaz and two anonymous referees for their helpful comments on this manuscript.

DATA ACCESSIBILITY
Please refer to Appendix S1 and Appendix S2 of the Supporting Information. The R script used to build our analysis has been released as an R package (climvars) on GitHub (https://github.com/ilyamaclean/climvars).