Past distribution of epiphyllous liverworts in China: The usability of historical data

Abstract Epiphyllous liverworts form a special group of bryophytes that primarily grow on the leaves of understory vascular plants in tropical and subtropical evergreen broadleaf forests. Being sensitive to moisture and temperature changes, epiphyllous liverworts are often considered to be good indicators of climate change and forest degradation. However, they are a poorly collected and taxonomically complicated group, with an only partly identified distribution pattern. In this study, we built four models based on 24 environmental variables at four different spatial resolutions (i.e., 1 km, 5 km, 10 km, and 15 km) to predict the past distribution of epiphyllous liverworts in China, using Maxent model and 63 historical location records (i.e., presence‐only data). Both area under the curve of the receiver operating characteristic (AUC) and true skill statistic (TSS) methods are used to assess the model performance. Results showed that the model with the predictors at a 15‐km resolution achieved the highest predictive accuracy (AUC=0.946; TSS=0.880), although there was no statistically significant difference between the four models (p > 0.05). The most significant environmental variables included aridity, annual precipitation, precipitation of wettest month, precipitation of wettest quarter, and precipitation of warmest quarter, annual mean NDVI, and minimum NDVI. The predicted suitable areas for epiphyllous liverworts were mainly located in the south of Yangtze River and seldom exceed 35°N, which were consistent with the museum and herbarium records, as well as the historical records in scientific literatures. Our study further demonstrated the value of historical data to ecological and evolutionary studies.


| INTRODUC TI ON
Epiphyllous liverworts, a special group of bryophytes that grow on the leaves of understory vascular plants, often inhabit constantly moist and warm forests in tropical and subtropical regions (Chen & Wu, 1964; Figure 1). There are three types of epiphyllous liverworts: obligate, facultative, and occasional. The obligate epiphyllous liverworts occur exclusively on living leaves. The facultative epiphyllous liverworts occur predominantly on living leaves but can grow on other substrates. While the occasional epiphyllous liverworts seldom occur on living leaves, but predominantly present on other substrates. Both obligate and facultative species belong to typical epiphyllous liverworts (Zhu & So, 2001). They are particularly sensitive to moisture and temperature changes and are regarded as potential indicators of climate change and forest degradation or integrity (Jiang et al., 2014;Pócs, 1996). Epiphyllous liverworts have been mainly found in Asia, Australia, Africa, Central and South America, and Macaronesian islands in Europe at latitudes of about 30 degrees north and south of the equator. At times, they have been found in regions at much higher latitudes such as Madeira (32.5°N) (Sjögren, 1975) and the Azores (38.5°N) (Sjögren, 1997) in Portugal, the Appalachians (35.0°-37.97°N) (Davison, 1997;Risk, Richardson, & Davison, 2011;Schuster, 1959), Caucasus Mountains (43.5°N) in Russia (Pócs, 1982), Sikoku (33.75°N) (Kamimura, 1939) and Niigata Prefecture (38°N) (Shirasaki, 1997) in Japan, Chiltern Hills (51. 75°N) in Britain (Porley, 1996), and British Columbia (49.42°N) in Canada (Vitt, Ostafichuk, & Brodo, 1973).
In China, field surveys and studies on epiphyllous liverwort have been conducted for almost a century (Chen & Wu, 1964).
Approximately 168 epiphyllous liverwort species have been found in China due to its diverse topography and climatic conditions, with a relatively high endemism rate and high conservation status (Zhu & So, 2001). These species are widely distributed in tropical rainforests and subtropical evergreen broad-leaved forests throughout the Chinese provinces within 30 degrees north latitude, including Anhui, Fujian, Guangdong, Guangxi, Guizhou, Hainan, Hongkong, Hubei, Hunan, Jiangxi, Sichuan, Taiwan, Tibet, Yunnan, and Zhejiang (Chen & Wu, 1964). However, recent studies also found them in regions even further north (i.e., 31°N), including Guanxian county in Sichuan province (Luo, 1990) and Houhe Nature Reserve in Hubei province (Peng, Liu, & Wu, 2002). The spatial distribution of epiphyllous liverworts may vary over time because of changes in climate and habitat conditions. The temporal patterns of species distribution can be examined by drawing a biological inference from species locational data of various periods via a GIS-based species distribution model (Butcher et al., 2014;Guisan & Thuiller, 2005;Guisan & Zimmermann, 2000).
Species distribution models (SDMs) are widely used in ecology and conservation, which relate species occurrence data to environmental predictor variables on the basis of statistically or theoretically derived response surfaces (Guisan & Zimmermann, 2000). Species occurrence data can be categorized as simple presence or presence-absence observations based on random or stratified field sampling or records obtained from natural history collections (Graham, Ferrier, Huettman, Moritz, & Peterson, 2004). Environmental variables can directly or indirectly affect species. Biologists have long been attempting to identify where a species will be in the future and to predict its temporal and spatial distribution in unknown regions on the basis of geographical distribution data of species in the past and present (Moya, Jacome, & Yoo, 2017;Ning, Wei, & Feng, 2017).
Understanding the spatial dynamics of species over time and their driving factors has a critical role in resource utilities, potential risk F I G U R E 1 Epiphyllous liverworts growing on leaves of various vascular plants. Photographs by Yanbin Jiang assessment, and conservation planning (Guisan & Thuiller, 2005;Johnson, Ober, & Adams, 2017). SDMs have already been applied for predicting the current distribution of epiphyllous liverworts (Jiang et al., 2014). However, the spatial and temporal dynamics of epiphyllous liverworts remain unknown. Epiphyllous liverworts are likely to be among the groups of organisms that would benefit most strongly from the use of historical records for ecological and conservation research, because these species have fast generation times and tightly coupled with the local environment. Therefore, natural history collections housed in museums and herbaria, as well as bibliographic records of historical data, may be used to predict their distribution and change.
A significant limitation of historical records of epiphyllous liverworts is the uncertainty about where the occurrences are located.
The local name, altitude, habitat, and collection time are the only valuable information available in most of the presence data. How historical records can be used to characterize the propagation patterns of epiphyllous liverworts, therefore, needs to be determined to examine the distribution range at the regional scale. A set of predictors available at fine resolution (grain size) may also need to be aggregated to coarser resolutions (Guisan, Graham, Elith, & Huettmann, 2007). Thus, this study aims to examine the past distribution of epiphyllous liverworts in China based on historical records of epiphyllous liverworts as well as environmental variables at different spatial resolutions. In particular, we set out to address the following questions: (a) How do spatial resolution (grain size) changes affect model performance using historical records for modeling the distribution of epiphyllous liverworts? (b) How wide is the modeled distribution of epiphyllous liverworts across China under the different spatial resolutions? (c) Which abiotic or biotic factors (e.g., topography, temperature, precipitation, and vegetation) limit the geographical distribution of epiphyllous liverworts at various spatial resolutions (e.g., 1 km, 5 km, 10 km, and 15 km) ?

| Species data
Typical epiphyllous liverworts, including both obligate and facultative species, were considered as the target species of the current study. In total, about 140 epiphyllous liverworts species belonging 28 genera of 11 families were involved. We considered all these species as a "species group." These species occurrence data were composed of historical records collected before 2000 (Appendix 1). These records were derived from publications  and natural history collections from the Herbarium, Institute of Botany, Chinese Academy of Sciences . Most of the location information of the historical species data was presented as descriptions of localities with variable accuracy. Records with the vague location information, such as province, county, or locations that cannot be found, were excluded in this study. We approximated geo-referenced point localities through Google Earth and the Vegetation Map of China (1:4,000,000) (http://westdc.westgis.ac.cn), considering the following three factors: (a) local name; (b) elevation; and (c) forest distribution. The geo-referenced historical data may have variable location accuracy, while accurate occurrence records are available at high resolutions (Engler, Guisan, & Rechsteiner, 2004). Occurrence localities with a distance of at least 15 km were retained to lessen spatially autocorrelated effect. A total of 63 historical records with the estimated location in the range of the study area were obtained and plotted in Figure 2.

| Environmental variables
Three categories of environmental variables, including bioclimatic data, topographic data, and satellite-derived vegetation indices, were used to predict the epiphyllous liverworts distribution in this study.
We downloaded 19 bioclimatic variables from the WorldClim website (http://www.worldclim.org/). WorldClim is a set of global climate layers (climate grids) at a 1-km resolution, which was generated by interpolating observations from over 4,000 weather stations around the world between 1950 and 2000 (Hijmans et al., 2005). We also downloaded the potential evapotranspiration (PET) and Aridity Index (AI) datasets from the CGIAR-CSI GeoPortal (http://csi.cgiar. org). Both PET and AI grid layers are available at 1-km spatial resolution representing the annual average over the 1950-2000 period.
PET is a measure of the ability of the atmosphere to remove water through evapotranspiration processes. AI defined as the ratio of annual potential evapotranspiration to annual precipitation, which can be used to quantify precipitation availability over atmospheric water. AI values increase for more humid conditions and decrease with more arid conditions. Topography is a relatively static variable compared with other biophysical factors, including climate, functioning as a key driver of biodiversity (Rosenzweig, 1995). We downloaded the GTOPO30 digital elevation model (DEM) data from the U.S. Geological Survey website (https://lta.cr.usgs.gov/GTOPO30), which has a 30-arc-seconds (approximately 1 km) spatial resolution. Then, we generated slope and aspect data layers from the GTOPO30 DEM using ArcGIS 10.1.

Satellite-derived Normalized Difference Vegetation Index
(NDVI) data contributed significantly to the distribution of epiphyllous liverworts (Jiang et al., 2014). As the species occurrence data were derived from 1936 to 1999 (Appendix 1) and the climatic data were derived from 1950 and 2000, the only time-equivalent NDVI data source was the GIMMS NDVI (http://glcf.umiacs.umd.edu/ data/gimms/). The GIMMS NDVI is originated from 1981, with a resolution of ~8 km. A time series of 20 yearly (1981 to 2000) averaged images was generated and used to calculate meaningful NDVI indices: annual maximum NDVI, annual mean NDVI, annual minimum NDVI, and NDVI standard deviation.
All the environmental variables were firstly resampled and projected as GIS raster layers in GCS_WGS_1984 at ca. 1-km resolution.
Then, the 1-km variables were aggregated to coarser resolutions of 5 km, 10 km, and 15 km, and converted all layers to ASCII format for use in Maxent. Projection and aggregation were implemented in ArcGIS 10.1. Table 1 shows the details of all the environmental variables used for modeling.

| Species distribution modeling
Maximum entropy (Maxent) modeling is a general-purpose method for characterizing probability distributions from incomplete information (Phillips, Anderson, & Schapire, 2006). The Maxent method does not require absence data, making it appropriate for modeling species distributions based on presence-only historical species records. We used Maxent software (version 3.3.3e, http://www.cs.princeton.edu/~schapire/maxent/) to generate the species distribution model. Recommended default values of convergence threshold (10 −5 ) and maximum number of iterations (500) were used when building the model. We generated 10,000 random points (i.e., background or pseudo-absence sample points) from the whole study area. Suitable features and regularization values used can reduce model overfitting and complexity (Warren & Seifert, 2011). According to Phillips and Dudik (2008), combinations of features including linear (L), quadratic (Q), and hinge (H) were set by default in Maxent when species occurrence samples were 15 to 79. We thus practiced the L, LQ, H, and LQH features, with regularizations of 0.5, 1, 1.5, 2, 2.5, 3, 3.5, and 4, respectively, in order to select the optimal settings of features and regularization. The selection of features and regularization was carried out based on the sample size corrected Akaike information criteria (AICc) (Warren & Seifert, 2011). The default logistic output of Maxent is continuous variables ranging from 0 to 1, where high values indicate high relative suitability.

| Model scenarios, evaluation, and statistical analysis
To determine the proper resolution of accurately modeling the past distribution of epiphyllous liverworts, we developed four model scenarios using the same species dataset (63 historical records).
The spatial resolution of these environmental layers was at 1 km, 5 km, 10 km, and 15 km, and each level of layers together with species dataset was a model scenario. To avoid sampling bias of species occurrence data, we used a bias file in each model scenario. The bias file was generated based on the point localities of historical records by applying kernel density function (Elith, Kearney, & Phillips, 2010).

F I G U R E 2 Study area and locations of the 63 occurrence records of epiphyllous liverworts in China used in the species distribution models
To facilitate model evaluation, we used the cross-validation approach with 10 replicates in the Maxent for each model scenario. During the cross-validation, the species dataset was divided into 10 random partitions, and the model was operated 10 times with each of the 10 partitions as a testing set (six or seven occurrence localities); the other nine partitions were used as a training set (57 or 56 occurrence localities) in a replicate. As a result, 10 datasets, including predicted values of training and testing localities and 10,000 background (pseudo-absence) localities, were generated automatically. We integrated 10 replicates for model evaluation, with a testing prediction and a corresponding background prediction by each replicate. Each locality in an evaluation replicate has two values: One is the observed occurrence value (background points = 0; test presence points = 1), and the other is the predicted value derived from the logistic output of the Maxent model.
To evaluate the predictive accuracy of models, we used both threshold-independent and threshold-dependent methods. The area under the curve (AUC) of the receiver operating characteristic (ROC) is a dominant tool in evaluating the accuracy of models predicting distributions of species because the ROC has the advantage of being threshold-independent. The resulting AUCs range from 0 to 1, with 1 indicating a perfect fit of the model, > 0.9 signifying excellent model performance, 0.7-0.9 as moderately useful models, < 0.7 for poor model performance, and "0.5" indicating randomness (Pearce & Ferrier, 2000). Considering that AUC cannot be used as a standard and sufficient measurement of accuracy in species distribution models (Austin, 2007), we also used the true skill statistic (TSS), a threshold-dependent method.
TSS is calculated by adding sensitivity and specificity together and subtracting 1. The TSS values range from −1 to 1, and 1 indicates a perfect fit and values of 0 or less indicate a performance no better The fixed thresholds that reject only the lowest 10% of possible predicted values (T10) were then examined (Pearson et al., 2007).
The additional one is the value that corresponds to the point on the ROC curve where sensitivity and specificity are maximized (Max Sensitivity + Specificity) (Braunisch & Suchant, 2010). We selected the second one because the thresholds of four model scenarios from the LPT were small and the maximum Sensitivity + Specificity significantly differed (Appendix 2). On the basis of the determined thresholds, we compared the spatial distribution range of epiphyllous liverworts at four spatial resolution levels. We applied the Jackknife test to diagnose which environmental variables were the key predictor variables to create the models (Prates-Clark, Saatchi, & Agosti, 2008). The importance of an environmental variable is determined on the basis of obtaining a large training gain when the variable is used alone in the model and a subsequent decrease in training gain when removed from the model. The response curves were also plotted to demonstrate how variables affected the presence probability of epiphyllous liverworts being present.
The response curves used all point localities and the respective environmental variable in isolation, and, thus, do not include interactions with other environmental variables (Phillips et al., 2006).

| Model performance
According to AIC c criteria, models with LQH features and regularization of 0.5 were selected. For all model scenarios, the AUC values were significantly higher than those of the random model The 15-km resolution models obtained the highest AUC and maximum TSS when compared with the three other models. By assessing the AIC c values, the 15-km resolution model also exhibited the best performance with the lowest AIC c (Table 2).

| Comparison of predictive performance
We derived four distribution maps of epiphyllous liverworts over entire China from four model scenarios on the basis of the environmental variables at 1 km, 5 km, 10 km, and 15 km resolutions, respectively. Logistic presence probability of epiphyllous liverworts is depicted in Figure 3. They exhibited similar distribution range; the north distribution extension did not exceed 35°N, and the most likely occurrence area was located in the south of

| Relative importance of environmental variables in determining species occurrence
Jackknife tests were performed to determine key variables influencing epiphyllous liverworts distribution at different spatial scales. The environmental variable with the highest training gain, when used in isolation, is considered to contain the most predictive ability of any variables. The environmental variable reduces the gain the most when it is omitted, which therefore appears to possess the highest amount of information that is not present in the other variables. Figure 4 shows the results of the jackknife experiments, which reveals that the factors that de-   The graphs depict the training gains when a variable is used in isolation, when the variable is excluded, and when all variables are utilized. The gain is a measure of how better the Maxent probability distribution fits the distribution of occurrence data. A variable has useful information when the gain is high as it is used in isolation and has unique information when it reduces the gain most when it is excluded Bio16, Bio18, NDVI_mean, and NDVI_min were preferable to epiphyllous liverworts presence and only if these variables reached or less than a particular value, epiphyllous liverworts probably occurred. For example, temperature annual range was less than 31°C, annual precipitation was higher than 1000 mm, and annual minimum NDVI exceeded 0.15, and the presence probability of epiphyllous liverworts could reach to 0.2.

| Factors determining model performance and distribution range
Spatial scale is a fundamental issue in the construction of the species distribution model. Sampling resolution should optimally be selected to be as coherent with the resolution of the predictor variables and to correspond to the scale relevant for habitat selection (Guisan & Thuiller, 2005). If species records are of vague locations, then a set of predictors available at a fine resolution may need to be aggregated to a coarse resolution (Guisan et al., 2007). Changing the resolution can result in two directions of model performance, that is, slight average toward model degradation at coarse resolution (Guisan et al., 2007) or model improvement at the coarse resolution compared with the fine resolution (Tobalske, 2002). In the present study, model performance exhibited an insignificant trend along resolution coarsening according to AUC and TSS. The effect of the resolution on the model performance could be species-specific (Gottschalk, Aue, Hotes, & Ekschmitt, 2011;Guisan et al., 2007;Seo, Thorne, Hannah, & Thuiller, 2009). For species in our study, the resolution did not significantly influence model fitting at the regional scale. Nevertheless, 15 km was suggested to be the optimal resolution of all the four resolutions to model epiphyllous liverwort distribution, because it F I G U R E 5 Response curves illustrating the relationship between presence probability of epiphyllous liverworts and environmental variables. These curves show how the response changes for a particular variable used in isolation. The response curves were derived from the 15-km model in Maxent showed the highest model fit and training gain among all model scenarios. Also, the distribution map derived from the 15-km resolution model showing less suitable area and more fragment distribution patches was more consistent with the observed or real species distribution according to expert knowledge of epiphyllous liverworts and our previous study (Jiang et al., 2014). The higher accuracy achieved by the coarse resolution model indicated that a proper spatial resolution of environmental variables in accordance with the accuracy of occurrence location should be taken into consideration.
As studied by numerous ecologists, sample size is another key issue on the performance of species distribution models (Hernandez, Graham, Master, & Albert, 2006;Stockwell & Peterson, 2002;Wisz et al., 2008). Models with a large number of occurrences in the training set generally performed better and had smaller variances than models built with few occurrences (Guisan et al., 2007). Accurate predictions of species distributions were also based on adequate sampling of environmental variation, because any two geographical regions will differ in the distribution and range of their environmental variation .

| Environmental variables accounted for epiphyllous liverworts occurrence at the regional scale
Climate is often considered a predominant range-determining mechanism at large spatial scale (Blach-Overgaard, Svenning, Dransfield, Greve, & Balslev, 2010;Guisan et al., 2007;Pearson & Dawson, 2003). The variables most often having the highest contributions in the Maxent model were variables related to precipitation and temperature. For epiphyllous liverworts in China, high annual precipitation and mean temperature increase the presence probability. As we analyzed that AI, Bio13, Bio16, and Bio18 were all closely correlated with annual precipitation, Bio4, and Bio7 were correlated with annual mean temperature tightly. These results reflect that epiphyllous liverworts favor habitats with humid and warm climate, which is consistent with past ecological studies on epiphyllous liverworts (Benavides & Sastre-De Jesus, 2011;Jiang et al., 2014;Kraichak, 2014;Olarinmoye, 1974). These areas with humid and warm climate determine the geographical distribution of evergreen forests in China, which occur between 18 and 32°N and 98-123°E, within areas dominated by tropical and subtropical climate, with annual mean temperature between 14°C and 26°C, and precipitation ranging from 1,000 to 5,000 mm (Wu, 1980). Annual mean and minimum NDVI also provided meaningful and significant contributions to defining the distribution range and spatial patterns of epiphyllous liverworts.
However, the importance of the NDVIs was a bit less than climatic variables in this study, which is inconsistent with our previous study (Jiang et al., 2014). Some reasons could explained: First, it may attributed to the uncertainty of the species occurrence data; second, the long-time span of historical records may be more sensitive to climate change other than vegetation cover; third, NDVIs originated from the GIMMS (8 km) were of much coarser resolution than that derived from SPOT sensor (1 km). By contrast, topographic variables had an insignificant influence on the regional presence of epiphyllous liverworts, which is consistent with the results of our previous study (Jiang et al., 2014). The descriptions of the known localities demonstrate that epiphyllous liverworts are distributed in a broad range of altitude, from 300 to 2,800 m, and they are sensitive to microclimate and small terrain changes. The topographic effect considerably weakens under a broad scale, with a resolution higher than 1 km.

| Importance of historical data
Systematic surveys with constant spatial scale as environmental variables are likely to be more powerful than haphazard historical records in species distribution modeling (Aikio, Duncan, & Hulme, 2010). Historical distributions of organisms in recent and distant (paleontological) past however have provided a platform for assessing biodiversity dynamics with and without anthropogenic influence (Graham et al., 2004). Historical data are consid- where predicted high occurrence probability may represent sufficient geographical conditions. Even spatial error exists due to the descriptive localities, which can be reduced after selecting an appropriate spatial resolution of environment layers. Historical data are therefore useful in helping to construct a reliable model when accurate samples are insufficient.

| CON CLUS ION
Successfully modeling the past distribution of epiphyllous liverworts based on historical records depended on several factors. Changes in resolution did not significantly affect model fitting performance, but influenced the suitable area and distribution pattern. 15 km was suggested to be the optimal resolution of the four resolutions (1 km, 5 km, 10 km, and 15 km) to model epiphyllous liverwort distribution, because this model possessed the highest model fit and training gain, and more consistent with the real species distribution. Climatic variables, especially humidity-related variables, such as annual precipitation and aridity, together with vegetation indices contributed significantly in defining species distribution range and spatial patterns. The low predicted area indicates that epiphyllous liverworts only occur in a restricted geographical range in China. The results of our study indicate that epiphyllous liverworts are suitable for the analyses of ecological and biogeographical patterns over time and space, and certainly help in assessing the effect of human disturbance on the distribution and predict future distribution to climate change. The predicted approximate habitat suitability and habitat loss also guide conservation and management.

CO N FLI C T O F I NTE R E S T
None declared. Threshold determined by the lowest predicted value was associated with any one of the observed presence records (LPT), rejecting the lowest 10% of possible predicted values (T10), and maximum specificity plus sensitivity (Max Se+Sp).