Modelling native and alien vascular plant species richness: At which scales is geodiversity most relevant?

School of Geography, University of Nottingham, University Park, Nottingham NG7 2RD, United Kingdom
Geography Research Unit, University of Oulu, P.O. Box 8000, FI-90014, Finland

Correspondence
Joseph J. Bailey, School of Geography, University of Nottingham, University Park, Nottingham NG7 2RD, United Kingdom. Email: josephjbailey@outlook.com

Editor: Dr. Adriana Ruggiero

Funding information
This research was supported by the U.K. Natural Environment Research Council (NERC) PhD Studentship 1365737, which was awarded to J.J.B., University of Nottingham, in October 2013 (supervised by R.F. and D.B.). J.H. acknowledges the Academy of Finland (project number 285040).

Abstract


| INTRODUCTION
Understanding the spatial patterns of biodiversity is important for scientific theory, conservation and management of ecosystem services (Hanski et al., 2012; Lomolino, Riddle, Whittaker, & Brown, 2010). Climatic variables are well known to correlate strongly with species richness over large spatial extents (Hawkins et al., 2003); correlates of species richness at smaller extents (regional and landscape scales) are less well established (Field et al., 2009; Valdés et al., 2015), but environmental heterogeneity is widely thought to be important (Stein, 2015; Stein, Gerstner, & Kreft, 2014). Although a bewildering array of measures of environmental heterogeneity have been used, there is growing interest in geodiversity, both as having value in itself (Gray, 2013) and as a potential correlate and predictor of spatial biodiversity patterns (Lawler et al., 2015).
Quantification of these geofeatures varies across studies (e.g., Pellitero, Manosso, & Serrano, 2015). We introduce the term 'geodiversity component' (GDC; Figure 1b), to refer to the quantified geofeature, whether this be areal coverage (e.g., of a particular landform), richness (e.g., the number of geological types) or length (e.g., of a river). These GDCs together measure 'geodiversity' at the scale being studied. The GDCs we use here are intended to capture aspects of the abiotic heterogeneity with which living organisms interact, and thus better and more explicitly measure environmental heterogeneity for the purposes of explaining species richness patterns than crude topographic measures such as mean slope, elevational range or mean aspect (Figure 1).
Such topographic measures have been widely used as correlates or predictors of species richness, and to create a conceptual distinction we omit these from our definition of geodiversity.
A small but rapidly growing number of studies have found that explicit measures of geodiversity add explanatory power to statistical models accounting for spatial biodiversity patterns (e.g., Hjort et al., 2012; Kougioumoutzis & Tiniakou, 2014; Pausas, Carreras, Ferre, & Font, 2003; Tukiainen, Bailey, Field, Kangas, & Hjort, 2016; see the review by Lawler et al., 2015). However, these studies have tended either to consider only one or two aspects of geodiversity or use a single geodiversity variable that simply counts geofeatures to produce an overall measure of georichness (e.g., Hjort et al., 2012; Räsänen et al., 2016). The considerable improvements in explanatory power that these preliminary approaches have achieved indicate the need for fuller analysis of the relationship between biodiversity and geodiversity, and particularly for explicit consideration of the separate components of geodiversity (Beier et al., 2015a). To date, very few studies have attempted this, and even fewer at geographical extents greater than the landscape scale, except for that by Tukiainen et al. (2016), which only analyses threatened species. Therefore, we now have evidence suggesting that geodiversity affects biodiversity, but our understanding of how it does so remains severely limited.

FIGURE 1 Our definition of 'geodiversity', which is amongst the more specific in the context of the wider literature. It omits relatively crude topography and climate data (a) and consists of geodiversity components (GDCs). The GDCs used in our study, and their associated geofeatures and ecological relevance, are listed (b).
While we know much about the scale dependence of the relationships between species richness and many of its commonly used correlates (McGill, 2010; Mittelbach et al., 2001; Pausas et al., 2003; Ricklefs, 1987; Rosenzweig, 1995), little is known about the scales at which richness is most strongly correlated with geodiversity. Current thinking is that geodiversity is most relevant to species richness at landscape to regional extents, with climate dominating at broader (e.g., continental) extents and biotic interactions more locally (Lawler et al., 2015). Theoretically, the local and landscape extents are most relevant because the various GDCs may be amongst the most variable predictors at this scale (Tukiainen et al., 2016; Willis & Whittaker, 2002), unlike climate. Therefore, if GDCs are important determinants of the spatial arrangement of biodiversity, we should expect their statistical explanatory power to be strongest at the local and landscape scales.
We also know relatively little about the importance of grain size in modelling species richness. Theoretically, coarser grain sizes may average out fine-scale abiotic environmental heterogeneity over a larger area, thus relating more weakly to species richness, unless these fine-scale data are related to broad environmental gradients (Field et al., 2009; Hawkins et al., 2003).
A key reason for the limited research to date on geodiversity and its relationship with biodiversity is limited data availability. In broad-scale macroecological studies in particular, the widespread use of topographic measures to date, such as topographic range or standard deviation, in statistical models of species richness patterns is explained primarily by the difficulty of obtaining more sophisticated and meaningful environmental heterogeneity variables (e.g., O'Brien, Field, & Whittaker, 2000). However, better data and processing capabilities now allow landscape heterogeneity to be quantified in new ways. Here we take advantage of these developments to move beyond simplistic measures of topographic heterogeneity and derive novel geodiversity variables. In particular, we use 'geomorphon', a recently developed geomorphometric tool for extracting landform data from digital elevation models (Jasiewicz & Stepinski, 2013). This allows low-cost quantification of landform features, which we use to measure landform richness at a spatial resolution of 25 m across the whole island of Great Britain.
Alien and native species richness are likely to relate differently to the abiotic environment (Kumar, Stohlgren, & Chong, 2006; Pyšek et al., 2005), but little work has compared the relationship of alien and native species richness with environmental heterogeneity. Native species have had longer to equilibrate with abiotic environmental conditions (Räsänen et al., 2016), so their richness may be expected to be more closely related to geofeatures and topography. Conversely, geofeatures may account less well for alien species richness, especially of neophytes (species introduced after AD 1500), which are more likely to be found where temperatures are higher and where there is greater human presence and connectivity via transport networks (Celesti-Grapow et al., 2006; Pyšek, 1998). An exception may be waterways: these geofeatures can promote the spread of alien species (Deutschewitz, Lausch, Kühn, & Klotz, 2003). Natural disturbance processes may also create suitable conditions for alien species (Fleishman, Murphy, & Sada, 2006). Broadly, we expect native species to have the strongest relationship with geodiversity, followed by archaeophytes (alien species introduced before AD 1500) and then neophytes.
Overall, despite the clear potential for geodiversity to improve our understanding of spatial biodiversity patterns in relation to environmental heterogeneity, its incorporation into biodiversity modelling is underdeveloped conceptually, spatially and empirically. Outstanding questions include: At what spatial scales and in which types of location is geodiversity most relevant? For which taxa? Does it relate differently to alien species than to native species? Which geofeatures are most important? Here, we begin to address some of these knowledge gaps by analysing the relationships between a wide range of GDCs and the species richness of both native and alien vascular plants across Great Britain. We test the degree to which GDCs add explanatory power over and above widely used topographic and climatic variables at varying spatial scales, using two grain sizes and either seven (small grain size) or five (large grain size) study-area extents. Our main aims are to determine: (a) the scales at which geodiversity best accounts for species richness patterns; (b) which components of geodiversity account for the most variation in species richness, and how much; and (c) whether geodiversity-species richness relationships differ between native and alien species. Specifically, we tackle the following hypotheses: (H1) geodiversity will contribute significantly to biodiversity models, particularly at smaller study-area extents (Hjort et al., 2015; Tukiainen et al., 2016); (H2) the most relevant GDCs will vary between native and alien species (Deutschewitz et al., 2003) and, within alien species, between archaeophytes and neophytes.

| Data
All predictors and predictor sets are summarized in Table 1. Data sources are detailed further in Appendix S1 in the Supporting Information.
Data were compiled for each 1 km² (n = 222,111) and 100 km² (n = 2,121) British National Grid cell using ARCGIS 10 (and GRASS GIS for geomorphometry, as detailed below) and processed and analysed in R (R Core Team, 2016).
Vegetation data were provided by the Botanical Society of Britain and Ireland (BSBI) via the Distribution Database at two grain sizes: 1 km × 1 km ('monad') and 10 km × 10 km ('hectad') grid cells corresponding to the British National Grid. The BSBI hosts a single database to which data are contributed by its volunteers and coordinators, who are strongly encouraged to use unbiased sampling (Walker, Pearman, Ellis, McIntosh, & Lockton, 2010). We used accepted data records (those verified within the database) collected between January 1995 and September 2015.
Species were defined as native, archaeophyte (probably introduced by humans before AD 1500), or neophyte (after AD 1500). 'Casual aliens' (those that fail to establish) were excluded. Total species richness (all three groups plus uncategorized species or those with no accepted status) and alien species richness (archaeophytes plus neophytes) were also modelled. Status definitions of each species followed the Wild Flower Society (2010), which, in turn, used multiple sources. Grid cells with less than 75% land coverage (considering lakes and ocean) were excluded. The final dataset contained 6,932 species: 1,490 natives and 1,331 aliens comprising 151 archaeophytes and 1,180 neophytes, the rest of the species being unclassified.
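The grouping rules above can be illustrated with a short sketch. It is not the authors' code (the analysis was run in R); the function name, the status labels and the input structures are hypothetical, and it assumes that species without a status are counted only in total richness.

```python
from collections import defaultdict

# Hypothetical status labels; the real assignments follow the published status list.
ALIEN_GROUPS = {"archaeophyte", "neophyte"}


def richness_by_group(records, status, land_fraction, min_land=0.75):
    """Count distinct species per grid cell and status group.

    records: iterable of (cell_id, species) occurrence pairs.
    status: dict species -> 'native' | 'archaeophyte' | 'neophyte' | 'casual';
            species missing from the dict are treated as uncategorized.
    land_fraction: dict cell_id -> fraction of the cell that is land.
    """
    species_sets = defaultdict(lambda: defaultdict(set))
    for cell, species in records:
        if land_fraction.get(cell, 0.0) < min_land:
            continue  # cells with less than 75% land coverage are excluded
        group = status.get(species, "uncategorized")
        if group == "casual":
            continue  # casual aliens are excluded
        species_sets[cell][group].add(species)
        species_sets[cell]["total"].add(species)
        if group in ALIEN_GROUPS:
            species_sets[cell]["alien"].add(species)
    return {cell: {group: len(found) for group, found in groups.items()}
            for cell, groups in species_sets.items()}
```

Using sets means that repeated records of the same species in a cell count once, which is what a richness value requires.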
Undersampled grid cells were excluded; this removed bias arising from unrealistic species richness values due to undersampling. To determine undersampling, we performed a series of linear regressions that used climate and topography variables (not geodiversity) to account for the species richness of grid cells within a radius of 150 km around each hectad. A cell within this region was flagged as potentially undersampled if its standardized residual was less than −1.5 (i.e., if species richness in that cell was strongly over-predicted). This was repeated for every hectad for both grain sizes. Grid cells flagged as undersampled more than 15% of the time they were analysed were classed as undersampled and removed. Two hectads (0.1%) and 2,147 monads (1%) were removed, leaving 2,121 and 219,964, respectively. This procedure ensures that grid cells are not perceived to be undersampled when they are simply in harsh environments that would most likely contain few species anyway.
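The flagging rule can be sketched as follows. This is an illustration under stated assumptions, not the procedure as implemented: the regressions themselves are omitted, the function names are hypothetical, and residuals are standardized simply by dividing by their sample standard deviation (the text does not say which standardization was used).

```python
import statistics


def standardized_residuals(observed, predicted):
    """Residuals divided by their sample standard deviation (an assumption)."""
    raw = [obs - pred for obs, pred in zip(observed, predicted)]
    spread = statistics.stdev(raw)
    return [value / spread for value in raw]


def flag_undersampled(residuals_by_cell, threshold=-1.5, max_fraction=0.15):
    """Return the cells flagged in more than `max_fraction` of their analyses.

    residuals_by_cell: dict cell_id -> list of standardized residuals, one for
    each regional regression in which the cell took part.
    """
    removed = set()
    for cell, residuals in residuals_by_cell.items():
        if not residuals:
            continue
        flagged = sum(1 for value in residuals if value < threshold)
        if flagged / len(residuals) > max_fraction:
            removed.add(cell)
    return removed
```

Because each cell is judged across many overlapping regions, a cell that is species-poor for environmental reasons is flagged only rarely, whereas a genuinely undersampled cell is over-predicted in most of them.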
A 25 m × 25 m-resolution digital elevation model (DEM) was produced by resampling the 5 m × 5 m NEXTMap DEM from Intermap (obtained under academic licence via the NERC Earth Observation Data Centre; see Table 1). Using the DEM, we performed geomorphometric analyses (see below) and calculated commonly used topographic metrics (mean and standard deviation of elevation and slope). We downloaded c. 1-km² resolution climate data from WorldClim (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005). We calculated land-cover variety using the number of Corine land-cover classes. The total human population per grid cell was calculated from 2010 census data from Casweb.
We compiled GDCs (Figure 1b) using existing national datasets and automated extraction of landform coverage using geomorphometry (Table 1). Data included geological diversity and superficial deposit diversity derived from 1:50,000 scale shapefiles provided by the British Geological Survey under an academic licence. Soil texture data were from the same source but had a resolution of 1 km². We calculated river length and lake area using OS Strategi GIS data. We used the geomorphometric algorithm 'r.geomorphon' developed by Jasiewicz & Stepinski (2013) in GRASS GIS 7.1 (GRASS Development Team, 2016) to automatically extract landform coverage data from the DEM (Appendix S2). The following landforms and features were mapped in raster format: peaks, ridges, shoulders, spurs, slopes, footslopes, hollows, valleys, flat areas and pits. We did not explicitly quantify mineralogy and pH, but these are implicitly incorporated via geology. Fossils, important for geoheritage and geoconservation (Thomas, 2012), were not included because of their limited theoretical relevance to the biodiversity patterns studied here and a lack of consistent data. Maps of climate, topography and geofeatures are presented in Appendix S3.
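As an illustration of how a classified landform raster might be turned into per-cell GDCs, the sketch below summarises the pixels that fall in one grid cell. It assumes the classification has already been done (e.g., by r.geomorphon); the label strings and the function name are hypothetical, and the summary is a simple proportion of pixels rather than the authors' exact workflow.

```python
from collections import Counter

# Hypothetical labels for the ten mapped landform classes.
LANDFORMS = ("peak", "ridge", "shoulder", "spur", "slope",
             "footslope", "hollow", "valley", "flat", "pit")


def landform_summary(pixel_classes):
    """Summarise the classified landform pixels that fall in one grid cell.

    pixel_classes: iterable of landform labels, one per 25 m raster pixel.
    Returns (coverage, richness): coverage maps each landform to its share of
    the cell's pixels; richness is the number of landform classes present.
    """
    counts = Counter(pixel_classes)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in LANDFORMS}, 0
    coverage = {name: counts.get(name, 0) / total for name in LANDFORMS}
    richness = sum(1 for name in LANDFORMS if counts.get(name, 0) > 0)
    return coverage, richness
```

A 1 km² cell holds 1,600 pixels at 25 m resolution, so coverage values are resolved far more finely than landform richness, which can take only eleven values (0 to 10).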

| Analysis
We developed species richness models for three predictor sets: (a) geodiversity only, (b) geodiversity variables excluded (leaving standard topographic variables, climate, population and land-cover variety) and … The two smallest extents were not used for the coarser grain size for reasons of sample size. All regional models were run using the centroid of each hectad grid cell (n = 2,121) as the central point of each 'region'.
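Selecting the cells for one regional model can be sketched as below. This assumes circular regions defined by a diameter and centred on the hectad centroid, with coordinates in metres on the British National Grid; the function and argument names are hypothetical.

```python
import math


def cells_within_extent(centre, cells, diameter_km):
    """Return ids of grid cells whose centroids lie inside a circular extent.

    centre: (easting, northing) of the focal hectad centroid, in metres.
    cells: dict cell_id -> (easting, northing) of the cell centroid, in metres.
    diameter_km: diameter of the study-area extent, in kilometres.
    """
    radius_m = diameter_km * 1000 / 2
    centre_x, centre_y = centre
    return sorted(cell for cell, (x, y) in cells.items()
                  if math.hypot(x - centre_x, y - centre_y) <= radius_m)
```

Calling the function once per hectad centroid and per extent yields the nested sets of cells on which the regional models are fitted.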
We used boosted regression trees (BRTs) to model species richness in R 3.0.2 (R Core Team, 2016). BRT is a machine-learning method that can be seen as an advanced form of regression modelling (Elith, Leathwick, & Hastie, 2008). Here, with a complex dataset, largely unknown relationships (particularly GDCs) and multiple scales with variable collinearities and interactions, use of a BRT was efficient and appropriate. Additionally, BRTs explicitly consider interactions, which can indicate important combined effects, and handle nonlinearity and collinearity relatively well (Dormann et al., 2013;Elith et al., 2008).
However, we also assessed collinearities separately.
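To make the boosting idea concrete, the following is a deliberately minimal sketch, not the gbm.step procedure used here. It assumes squared-error loss, single-split trees (tree complexity 1 rather than 3) and no random subsampling (bag fraction 1 rather than 0.5); all function names are hypothetical.

```python
def stump_value(stump, row):
    """Prediction of a single-split tree for one row of predictors."""
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_stump(rows, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r for row, r in zip(rows, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, feature, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no predictor varies, so no split is possible")
    return best[1:]


def boost(rows, y, n_trees=100, learning_rate=0.05):
    """Fit boosted stumps to y; returns (baseline, learning_rate, stumps)."""
    baseline = sum(y) / len(y)
    fitted = [baseline] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Each new tree is fitted to what the current ensemble still gets wrong.
        residuals = [obs - fit for obs, fit in zip(y, fitted)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        fitted = [fit + learning_rate * stump_value(stump, row)
                  for fit, row in zip(fitted, rows)]
    return baseline, learning_rate, stumps


def predict(model, row):
    """Baseline plus the shrunken contributions of all stumps."""
    baseline, learning_rate, stumps = model
    return baseline + learning_rate * sum(stump_value(s, row) for s in stumps)
```

The learning rate shrinks each tree's contribution, so a lower rate needs more trees to reach the same fit. Deeper trees, which this sketch omits, are what allow interactions between predictors to be modelled.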
We used gbm.step ('gbm 2.1.1' package in R; Ridgeway, 2015) to implement BRT. This function controls the number of terms in order to produce parsimonious models. To quantify modelled effects of individual explanatory variables, the contribution (relative influence) of each predictor was obtained from gbm.step. These are scaled to add to 100, where '100' for a predictor would mean that it was the sole contributor to the final model. Where the model contribution reflected a negative relationship with species richness, we then made the value negative for display purposes. Combined model contributions were calculated for the predictor sets and subsets defined in Table 1. We used a tree complexity of 3 (allowing up to three-way interactions; Elith et al., 2008), a bag fraction of 0.5 and a preferred learning rate of 0.05, which was occasionally reduced to 0.01, 0.005 and then 0.001 according to data requirements.
Predictors contributing <10% (or sometimes <7.5%) were removed from the initial model, which was rerun with the simplified predictor set to produce the final results (further details are given in Appendix S4).
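The bookkeeping around model contributions can be sketched as follows. It assumes raw influence values are already available from a fitted model; the function names and predictor names are hypothetical, and the sign flip for negative relationships is left out.

```python
def scale_contributions(raw_influence):
    """Rescale raw predictor influences so that they sum to 100."""
    total = sum(raw_influence.values())
    return {name: 100 * value / total for name, value in raw_influence.items()}


def simplify_predictors(contributions, min_contribution=10.0):
    """Keep the predictors whose contribution reaches the threshold."""
    return [name for name, value in contributions.items()
            if value >= min_contribution]


def combined_contribution(contributions, predictor_sets):
    """Sum contributions within each predictor set (e.g., 'geodiversity')."""
    return {set_name: sum(contributions.get(name, 0.0) for name in members)
            for set_name, members in predictor_sets.items()}
```

After simplification the model is refitted with the retained predictors only, so the contributions reported for the final model again sum to 100.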
As well as evaluation using internal fit statistics ('self-statistics'), models were validated using 10-fold cross-validation (CV) in the 'gbm' package. This approach randomly subsamples the data 10 times according to the user-defined bag fraction; our bag fraction was 0.5, so each time 50% of the data were used to parameterize the model and the other 50% to evaluate it. The final cross-validation correlation statistic is the mean correlation between the training and testing data across 10 runs. Model statistics were compared with and without GDCs using paired-samples t-tests.
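The paired comparison of model statistics can be written out by hand as a sketch. It computes only the t statistic and degrees of freedom; obtaining a p-value requires the t distribution from a statistics library, which is omitted here. The function name is hypothetical.

```python
import math
import statistics


def paired_t_statistic(with_gdc, without_gdc):
    """Paired-samples t statistic and degrees of freedom.

    Each position pairs one model's statistic with GDCs against the same
    model's statistic without GDCs.
    """
    if len(with_gdc) != len(without_gdc) or len(with_gdc) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    differences = [a - b for a, b in zip(with_gdc, without_gdc)]
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample standard deviation
    n = len(differences)
    # Raises ZeroDivisionError if every pair differs by exactly the same amount.
    t = mean_difference / (sd_difference / math.sqrt(n))
    return t, n - 1
```

A positive t indicates that the statistic was, on average, higher when GDCs were included. Pairing is appropriate because both values come from models of the same region and species group.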

| RESULTS
Geodiversity components (GDCs) made the largest contributions to models at the smallest study extent and smallest grain size (in the geodiversity column of Figure 3, the left-hand blue boxplot is the highest).
At this scale, geodiversity was the strongest of all the predictor sets (of all the left-hand blue boxplots in Figure 3, those for geodiversity are the highest). With each increase in extent, the modelled contribution of geodiversity declined substantially relative to the other types of variable. GDCs were not relevant at the larger extents, giving way particularly to climate and human population. Climate was more important for archaeophytes than neophytes. The contribution of 'topography' (the coarse variables typically used in modelling species richness patterns) showed similar patterns to geodiversity, but was less important at smaller scales and declined less sharply as scales increased. Mapping the results (Figure 2) shows the widespread dominance of the geodiversity predictor set at the smaller geographical extents, its importance generally declining relative to climate with increasing extent, except in … (Table 2), particularly annual mean temperature; human population was also often important, especially for species richness of neophytes.
The contribution of geodiversity to biodiversity models was dominated by landform data from geomorphometry, but hydrology (rivers and lakes), and to a lesser extent materials (soil, superficial deposits and geology), were also important (Figure 3).

FIGURE 2 The dominant predictor set for native (top row) and alien (bottom row) species richness at the 1-km² grain size for three spatial extents. White spaces are where the quantity of data was insufficient to run a reliable model or cells were excluded as they were undersampled. An example of the six extent diameters (25, 50, 100, 150, 200 and 250 km) is shown in the bottom-right, in this case for British National Grid cell SK54, which is one of 2,121 cells around which species richness was analysed at the two grain sizes and six extents.

… with topography and climate (e.g., hydrology, rock variety, coverage of hollows, slopes and valleys) and others more strongly (e.g., coverage of peaks, ridges and spurs was moderately related to higher, cooler places at the coarser grain size and nationally), but these collinearities were still often much weaker than those between and within climate and topography predictors.
Self-statistics and cross-validation statistics were consistently higher (indicating better models) for larger extents, the coarser grain size of 100 km² and alien species richness (Figure 5, Appendices S10 and S11). Adding geodiversity often, but not always, resulted in significantly better models, especially at the smaller extents and for native species richness for both grain sizes (Table 3). Results for total species richness broadly followed those for native species richness (Figure 3), despite the presence of many uncategorized species in the overall richness data. Results for alien species richness tended to follow those for archaeophytes, even though there were relatively few archaeophyte species.

Table note: Numbers show the combined model contributions (rounded to whole numbers) for each predictor set. Model evaluation (mean cross-validation correlation, CV) and fit statistics (self-statistics, SS) are also presented. Arch = archaeophytes; Neo = neophytes.

FIGURE 4 Model contributions from individual geodiversity components at the 1 km × 1 km grain size for each extent. These graphs are truncated at 150% and 250%, but only a small minority of points lie beyond these values. A full version of this figure with all species groups is included in Appendix S6. 'Sup. Dep.', superficial deposits.

| DISCUSSION
Geodiversity made a significant addition to models of vascular plant species richness over and above widely used topographic metrics, particularly at smaller geographical extents (H1). At the smallest extent, geodiversity contributed more than any other type of predictor accounting for species richness, while at larger extents climatic variables became increasingly dominant. With respect to individual geodiversity components (GDCs), automatically extracted landform data were of particular explanatory value, demonstrating that species richness-landform relationships can be detected at macroecological scales.
These data represent a novel predictor set in macroecology and are relatively easily extracted from widely available DEMs. Our analyses also highlighted the importance of separately analysing individual GDCs rather than lumping them into a single variable for use in biodiversity modelling, as done in most of the limited research to date. Results were broadly similar for alien and native species richness patterns (H2), except that neophytes were more strongly related to human population than were the other plant groups. Results for total species richness were very similar to those for native species richness, despite the presence of many uncategorized species in the overall richness data.
Geodiversity therefore succeeded in capturing unique dimensions of environmental heterogeneity that have theoretical mechanistic links to species richness, and which add explanatory power when modelling species richness patterns of vascular plants. This is consistent with our first hypothesis (H1), which was based on theorized links between biodiversity and the presence and diversity of both landforms and surface materials, reflecting the presence of more resources and greater habitat and niche variety (Anderson & Ferree, 2010; Hjort et al., 2012; Lawler et al., 2015; Moser et al., 2005), and possibly the results of some disturbance processes (le Roux et al., 2013). Also consistent with H1 was the decline in magnitude of the contribution of geodiversity with increasing extent, at both grain sizes, as other variables (particularly broad-scale climate) took over. Geodiversity therefore seems to provide a predictor set that can account for the variety of the abiotic environment at these finer extents ('landscape' scale) where broad-scale climate is more constant. At these scales, geodiversity data may be strongly related to microclimate and localized hydrological, edaphic and geological conditions that are relevant to the establishment and persistence of species.

FIGURE 5 … Table 3. Archaeophyte and neophyte results can be seen alongside these in Appendix S10, and an equivalent graph for cross-validation statistics is also provided (Appendix S11). GDC, geodiversity component.
Theoretically, variables measuring fine-resolution environmental heterogeneity may contribute relatively little to models of species richness using large grain sizes because of the tendency for the heterogeneity to average out within grid cells (Field et al., 2009). If so, GDCs such as those measuring landforms should have reduced explanatory power at larger grain sizes, when extent is held constant, while climate- and productivity-related variables may increase. However, for the 100-, 150-, 200- and 250-km geographical extents (for which both grain sizes were assessed), we observed similar geodiversity results for each grain size, often with slightly higher relative geodiversity contributions at the 100-km² grain than at 1 km². This suggests that the size (extent) of the study area more strongly affects the relative contribution of geodiversity as a biodiversity predictor than does grain size. This may be because the heterogeneity measured by GDCs is correlated with broader environmental gradients, so the averaging of fine-scale variation at larger grains does not affect the explanatory power of GDCs much compared with the large increases in the degree to which broad climatic and topographic gradients are captured at larger geographical extents (Hawkins et al., 2003). Further research is required on this question.

If the use of GDCs results in greatly reduced multicollinearity problems compared with the use of crude topographic variables, then our ability to determine cause and effect should be improved; this is consistent with the notion that GDCs relate more directly to mechanisms than do crude topographic variables (Gray, 2013; and see the Introduction). That is, explicit consideration of landscape features in biodiversity modelling may enhance ecological understanding (Hjort et al., 2015), and is also likely to be highly relevant to the modelling of individual species' distributions.
Specific GDCs were important in the species richness models, consistent with the notion that species richness-GDC relationships can be detected at macroecological scales, and add to biodiversity models.
These results were far more informative than using a compound measure of geodiversity. For example, we observed some negative relationships between biodiversity and various GDCs (Figure 4), while valley coverage, river length and surface materials had more consistently …

TABLE 3 Mean difference in model fit (self-statistics, SS) and evaluation (cross-validation, CV) statistics (also see Figure 5 and Appendices S10 and S11) between models with and without geodiversity (i.e., a positive value indicates an increase in model performance after geodiversity was added).

… (Hjort et al., 2015) but not represented in the relatively coarse river maps that are generally available and used in this study. Knowledge of surface (soil and superficial deposits) and subsurface (geology) material richness was less useful than expected from previous research (e.g., Anderson & Ferree, 2010; Tukiainen et al., 2016). Perhaps an explicit consideration of the coverage of specific types of rock and soil (and mineralogy more generally) would be revealing. Further research on this would help us to better understand the links between specific GDCs and biodiversity.
The relative contributions of different predictors to alien and native species richness models showed broadly similar patterns across scales, but the magnitudes varied somewhat. The contributions from GDCs, particularly landforms, were greater for native species richness than alien, and native biodiversity models were also most improved by the addition of GDCs (partly consistent with H 2 ). Contributions from GDCs might therefore particularly add important information to native biodiversity models, which sometimes underperform compared with alien species richness models (Deutschewitz et al., 2003;Kumar et al., 2006). This finding is supported by the relative importance of geodiversity in explaining native richness compared with total richness models seen elsewhere (Räsänen et al., 2016).
The main difference between models of alien and native plant species richness lay in the contribution from human population, which was highest for neophytes, then archaeophytes and relatively low for natives. While the relationship between alien species richness and cities or human populations has been known for some time (e.g., Deutschewitz et al., 2003; Kühn, Brandl, & Klotz, 2004; McKinney, 2008; Pyšek, 1998), our results suggest that this relationship is more pronounced for neophytes and that the strength of this relationship is affected by scale (both extent and grain size). In line with known links between riverine habitats and neophyte species richness (Deutschewitz et al., 2003), we observed a substantial contribution of river length to neophyte richness models, and frequent interactions between river length and human population, implying increased human influence along rivers, which in turn may promote neophyte species richness.
However, there was no notable relationship between native richness and river length, in contrast to findings elsewhere (Deutschewitz et al., 2003). Other aspects of hydrology (including lake area) tended to be less important in our models than landforms, topography, human population and climate. Overall, transport, for which human population provides a proxy, and river networks (and their interaction) could be promoting the dispersal of alien species and thus increasing neophyte species richness (Hulme, 2009; Pyšek et al., 2010).

| Geodiversity in biodiversity science and conservation: Opportunities and challenges
We have used geodiversity to try to provide more explicit representation of environmental heterogeneity than crude topographic variables.
Indeed, GDCs more directly measure environmental conditions and processes, such as habitat diversity, resource gradients and microclimatic and sheltering effects (Hjort et al., 2015; Matthews, 2014), thus enabling us to more precisely capture the causal processes behind the biodiversity patterns. With this in mind, geodiversity may have benefits beyond species richness modelling, in species distribution modelling, for example. Investigating where geodiversity is most relevant to patterns of life globally requires research on geodiversity in geographical domains beyond our study area. It has also been suggested that such information might be important in the context of refugia by identifying parts of the landscape that can withstand long-term environmental change by providing stable microclimates (Keppel et al., 2015; Lawler et al., 2015); and geodiverse locations may facilitate the adaptation of species to climate change, as well as their persistence (Albano, 2015; Maclean, Hopkins, Bennie, Lawson, & Wilson, 2015).
There are also practical reasons for improving our understanding of the relationship between geodiversity and biodiversity; the first relates to conservation. The idea known as 'conserving nature's stage' is gaining momentum (see Anderson & Ferree, 2010; Beier, Hunter, & Anderson, 2015b). This suggests that instead of targeting individual species and habitats for conservation we target areas capable of supporting high biodiversity under future environmental changes, either by better maintaining the existing environment or by providing greater environmental heterogeneity. Furthermore, geodiversity has been related to the diversity of threatened species richness and rarity-weighted richness, at least in high latitudes (Tukiainen et al., 2016).
Indeed, the geodiversity data used in our study are likely to enhance our understanding of previously demonstrated links between abiotic diversity and site complementarity, which may be a measure of biodiversity that is more relevant to conservation than raw species richness. Areas high in geodiversity are also thought to promote greater resilience to climatic change for biodiversity and for essential provisioning, regulating, cultural and supporting ecosystem services (Brazier, Bruneau, Gordon, & Rennie, 2015; Gordon & Barron, 2013). However, we stress the importance of not overlooking the value of individual species in 'geohomogeneous' (low geodiversity) places or unique geofeatures in species-poor areas. For example, an endangered species in a forest underlain by a single geology and few landforms should not be overlooked, whilst unique and irreplaceable geofeatures (e.g., certain fossils or rare geological units and mineralogy) will not always be relevant to biodiversity and present-day species distributions, but have geoheritage value.
Another practical advantage of geodiversity data is that they are usually cheaper and faster to collect and collate than species occurrence data and, in areas where geodiversity correlates very strongly with biodiversity (e.g., northern Finland; Hjort et al., 2012), geodiversity may represent a useful surrogate for biodiversity. We have compiled a lengthy (but not exhaustive) table of GDC categories, including geology, soil, landforms and hydrology in Appendix S12 and a list of the remotely sensed datasets required, which may be a useful resource for reference. Importantly, a key dataset in our research was the automatically extracted landform data, which only required open-source GIS software (GRASS), a freely available algorithm within that software (r.geomorphon; Jasiewicz & Stepinski, 2013) and a DEM. Datasets related to land surface materials are less accessible in much of the world, but high-resolution datasets of ecologically relevant soil variables have recently become widely available (Hengl et al., 2014; www.isric.org/content/soilgrids). Other geodiversity data not used in our study might improve models further, including explicit data on pH and mineralogy, for example, or topographic wetness or insolation. Additional sources of geodiversity data, appropriate for smaller study areas than in our study, include those captured on airborne platforms, for example aerial photography combined with field surveys. It would be interesting to know whether such intensive data add explanatory power and ecological meaning to data obtained automatically from geomorphometry. Future sources of such data could include capture by unmanned aerial vehicles (UAVs), release of archival data and increased data sharing on capture (Lowman & Voirin, 2016).
In conclusion, we have shown that geodiversity can add significantly to models of species richness of vascular plants over and above the widely used topographic metrics in our study area. We found some differences in the response of alien and native species richness to geodiversity; further research on this may be beneficial for conservation and management. Our findings demonstrate the largely unexploited potential of explicit geodiversity data, which may aid explanation by more directly measuring causal factors and reducing multicollinearity of explanatory variables. Research on the role of geodiversity across a variety of taxonomic and geographical domains is still in its infancy, and we have pointed to some research needs. Our finding that automatically extracted landform data were valuable should encourage collaboration between geomorphologists, ecologists and biogeographers.