DNA barcode analyses improve accuracy in fungal species distribution models

Species distribution models based on environmental predictors are useful to explain a species' geographic range. For many groups of organisms, including fungi, the growth of occurrence data sets has made their use widespread. However, fungal species are not always easy to distinguish, and the taxonomy of many groups is not completely settled. This study explores the effect of taxonomic uncertainty in databases used for modeling fungal distributions. We analyze distribution models for three morphospecies from the corticioid genus Xylodon (Hymenochaetales, Basidiomycota), comparing models based on species names on voucher specimens with models derived from species identified by DNA barcode. Differences in the contribution of predictors driving the distribution of each modeled taxon and the extent of their ranges were studied. Records under Xylodon paradoxus, X. flaviporus, and X. raduloides were obtained from fungarium collections and the GenBank repository. Two grouping criteria were used: (a) specimens were grouped by their collection or sequence voucher names and (b) specimens were grouped following molecular identification using ITS sequences through barcoding gap species recognition (BGSR). Climatic, geographic, and biotic variables were used to predict the potential distribution of each taxon with the MaxEnt algorithm. From the three morphospecies selected according to voucher names, up to 19 species candidates were detected using BGSR. Climatic variables were the most important predictors in distribution models made from names on voucher specimens, but their importance decreased when BGSR was applied. In general, the extent of species distributions was more restricted for taxa under BGSR. Our results show that taxonomic uncertainty has a strong effect on Xylodon species distribution models. Misleading results can be obtained when cryptic species or identification errors mask the actual diversity of the presence records.
Preserved specimens in natural history collections offer the possibility to assess whether the species name on labels matches the current species recognition criteria.


| INTRODUCTION
In recent decades, ecological and biogeographical studies have increasingly used new tools for modeling species distributions from presence records (Elith et al., 2006). Modeling based on correlations between species occurrences and environmental predictors has been used to obtain maps of potential distributions for poorly studied species, to evaluate pest risks (Sutherst, 2014), or to help design nature reserves (Watts et al., 2009). The combination of powerful new algorithms with the increase in environmental cartography has made it possible to apply these methodologies to a broad range of organisms, including different groups of fungi such as ectomycorrhizal (Wolfe et al., 2010) or soil biocrust fungi (Belnap et al., 2014). Indeed, fungi have been pointed out as one of the groups that benefit most, due to the large number of occurrence records stored in fungarium collections (Hao et al., 2020; Wollan et al., 2008).
In the modeling process, much attention has been paid to algorithm performance (Qiao et al., 2015), the accuracy of predictor variables (Petitpierre et al., 2017), and the sample size and collection bias (Beck et al., 2014; Fourcade et al., 2014), but taxonomic uncertainty in presence records has attracted less interest (Elith et al., 2013). This could be due in part to the difficulty of assessing the reliability of records in reference collections (such as fungaria or herbaria) or citizen science databases (Lozier et al., 2009). On many occasions, only a list with geographic coordinates is available, and researchers must rely on the accuracy of geographic coordinates and, in particular, on the correctness of species identifications. This taxonomic uncertainty could produce misleading results with important conservation or economic consequences (Bortolus, 2008). This issue plays a major role in those groups for which taxonomy is not completely resolved, or organisms that require expertise to correctly identify the species (Smith et al., 2016).
One of the most important sources of taxonomic uncertainty in reference collections is the shift in species recognition criteria in recent decades (Bridge et al., 2003). The traditionally applied morphological species recognition, MSR (Taylor et al., 2000), has been used to identify more than 70,000 fungal species (Hawksworth et al., 1996; Taylor et al., 2000), resulting in a worldwide distribution for many of these taxa (Hallenberg, 1991). This homogeneous distribution for many fungal species has supported the Baas Becking hypothesis: "Everything is everywhere, but environment selects" (Baas Becking, 1934). This idea, originally applied to microorganisms, has been extended to include fungal species due to the small size of fungal spores, the main agent of fungal dispersal (Taylor et al., 2006). The apparently unlimited dispersal ability of many fungal species has often been used to explain their cosmopolitan distributions (Davison et al., 2015). Nowadays, the development of molecular tools has allowed identification of a significant amount of hidden biodiversity, with numerous cryptic or sibling species previously masked under a single species name (Fišer et al., 2018; Koufopanou et al., 1997). The shift from morphological to phylogenetic species recognition, PSR (Taylor et al., 2000), has redrawn the map of fungal distribution, and new biogeographical patterns have arisen when morphospecies were redefined following PSR criteria (May, 2018). There are already a number of cases where a single species with worldwide distribution has been redescribed as several species with regional or restricted distribution (Carlsen et al., 2011; Nilsson et al., 2003; Telleria et al., 2010). This new approach in the study of fungal diversity has promoted the idea that cosmopolitanism in fungi is just the result of the application of MSR, rather than an actual biodiversity distribution pattern (Sato et al., 2012).
In this context, reference collections or DNA sequence repositories allow for a re-evaluation of the species names assigned to collections or sequence vouchers, and, therefore, the assessment of the effects of taxonomic uncertainty or misleading specimen identifications in the potential distribution inferred by species distribution models.
Xylodon (Hymenochaetales, Basidiomycota) is a white-rot fungus considered one of the most species-rich corticioid genera (Hjortstam & Ryvarden, 2007) and plays an important role as a wood decomposer from temperate to tropical forests. It contains many species that have been traditionally cited worldwide, and its taxonomy has rapidly changed in recent years (Riebesehl & Langer, 2017). In addition, despite their macroscopic basidiocarps, the morphological traits used to distinguish among closely related species are highly homoplasic, making specimen identifications prone to errors.
The aim of the present study was to analyze the effect of taxonomic uncertainty and misidentifications in reference collections and sequence databases on species distribution models in Xylodon.
We analyzed the possible effects in two ways: first, we assessed whether a greater hidden diversity could be masked under a single species name by analyzing sequences from the ITS DNA region with barcoding gap analysis (Puillandre et al., 2012; Schoch et al., 2012); and second, we constructed species distribution models following both identification criteria (names on voucher collections; species candidates obtained from barcoding gap analysis) and analyzed differences in the contribution of predictor variables and the distribution area sizes.
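The idea behind a barcoding gap can be illustrated with a minimal Python sketch. It is not the pipeline used in this study and not an implementation of the method of Puillandre et al. (2012); it only assumes a toy, already aligned set of made-up sequences, uses uncorrected p-distances, takes the widest empty interval in the sorted pairwise distances as the "gap", and groups sequences by single linkage below the midpoint of that gap. All names (`p_distance`, `largest_gap`, `single_linkage_groups`, `toy`) are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def pairwise_distances(seqs):
    """Map each unordered pair of sequence names to its p-distance."""
    return {(i, j): p_distance(seqs[i], seqs[j])
            for i, j in combinations(sorted(seqs), 2)}


def largest_gap(distances):
    """Return the bounds (lower, upper) of the widest empty interval in the distances."""
    d = sorted(distances)
    if len(d) < 2:
        raise ValueError("need at least two distances")
    k = max(range(len(d) - 1), key=lambda i: d[i + 1] - d[i])
    return d[k], d[k + 1]


def single_linkage_groups(dists, names, threshold):
    """Group names so that any pair closer than `threshold` ends up in the same group."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for (a, b), d in dists.items():
        if d < threshold:
            parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Made-up aligned sequences, for illustration only.
toy = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "TTGTACCTGC",
    "s4": "TTGTACCTGA",
}
dists = pairwise_distances(toy)
low, high = largest_gap(dists.values())
groups = single_linkage_groups(dists, toy, threshold=(low + high) / 2)
print(low, high, groups)
```

In this toy case the within-group distances (0.1) and between-group distances (0.4 to 0.5) are separated by a clear gap, so two candidate groups are recovered. Real ITS data need a proper alignment, a substitution model, and a dedicated tool.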

| Species studied and selection of material
A general search of preserved specimens in the Global Biodiversity Information Facility (GBIF) confirmed the worldwide distribution of presence records assigned to these three morphospecies: Xylodon flaviporus, X. paradoxus, and X. raduloides (Figure 1). These three Xylodon species, traditionally known as being widely distributed, were selected to discuss the effect of taxonomic uncertainty on biogeographical hypotheses supported by species distribution models. These species were traditionally placed in Schizopora, but recent studies demonstrated that it is not possible to separate Schizopora from Xylodon on a morphological or molecular basis (Riebesehl & Langer, 2017), and therefore, Schizopora species are currently integrated into Xylodon.
[...] (Appendix S1) and has been reported from numerous hardwood substrates, such as Castanea, Eucalyptus, Fagus, and Quercus, but also on conifers, such as Picea and Pinus. Xylodon flaviporus has been considered as distributed worldwide (Figure 1), especially in warm and tropical zones. It has been reported from around the world: South America, Africa, southern Europe, and South Asia (Gilbertson & Ryvarden, 1987; Núñez & Ryvarden, 2001; Paulus et al., 2000; Ryvarden & Melo, 2014; Wu, 2000).
All the available collections of these three morphospecies were studied from a total of five fungaria: CFMR, MA-Fungi, NY, O, and PDD (Table 1). Label information was used to assign the geographic location for each record (unprojected coordinates, WGS84 datum).
When an exact location was not provided, label information such as towns or kilometer points along roads was used to obtain geographic coordinates. Only records with known coordinate uncertainty of less than 5 km (i.e., those which on average could be placed in a single 10 × 10 km cell) were considered. A basidiome fragment from fungarium specimens (less than 10 mg) was removed to perform molecular analyses.

FIGURE 1 Presence records for three Xylodon morphospecies traditionally considered as widely distributed, from the Global Biodiversity Information Facility database (GBIF). These occurrences correspond only to preserved specimens in natural history collections. DOIs: Xylodon flaviporus https://doi.org/10.15468/dl.tvfuk9; Xylodon paradoxus https://doi.org/10.15468/dl.mwpda3; Xylodon raduloides https://doi.org/10.15468/dl.wkcufk

TABLE 1 Selected specimens and species assignation following names in fungarium and sequence vouchers (Data Set 1) and following barcoding gap species recognition (Data Set 2, Figure 2)

FERNÁNDEZ-LÓPEZ et al.

Table 1 columns: Label/Voucher species name (Data Set 1); Country; BGSR (Data Set 2); Collection number; GenBank accession no.

[...] Whiting et al. (1997). In order to detect species candidates, a barcoding gap approach was used, based on the internal transcribed spacer (ITS), because this region is a universal barcode across fungi and is able to detect genetic variability at the species level (Schoch et al., 2012). The ITS5/ITS4 (White, 1990)

| Species distribution modeling
Three kinds of predictor variables were included in distribution models, representing different factors that usually affect species distributions: abiotic, biotic, and geographic variables (Soberón, 2007). [...] was also included as a predictor since Xylodon species are wood-decay fungi and depend on the existence of wood to grow and maintain their populations. Finally, to include pure geographic constraints that could affect species distributions by limiting their dispersal or colonization capacity, latitude and longitude were included as predictor variables (Acevedo et al., 2012). Due to the circular character of longitude, its sine and cosine components were used instead (Pewsey et al., 2013). All predictor layers were used on a 10 × 10 km resolution grid.
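The reason for replacing longitude with its sine and cosine components can be shown with a short Python sketch. The function name `longitude_components` and the example values are illustrative assumptions, not part of the original workflow: two points on either side of the 180° meridian are far apart in raw degrees but close together once longitude is expressed as a (sine, cosine) pair.

```python
import math
from typing import Tuple


def longitude_components(lon_deg: float) -> Tuple[float, float]:
    """Return (sin, cos) of a longitude given in decimal degrees."""
    rad = math.radians(lon_deg)
    return math.sin(rad), math.cos(rad)


def component_distance(lon_a: float, lon_b: float) -> float:
    """Euclidean distance between two longitudes in (sin, cos) space."""
    sa, ca = longitude_components(lon_a)
    sb, cb = longitude_components(lon_b)
    return math.hypot(sa - sb, ca - cb)


# Two cells on either side of the antimeridian (illustrative values).
east, west = 179.9, -179.9
raw_difference = abs(east - west)            # about 359.8 degrees apart as raw numbers
circular_difference = component_distance(east, west)  # very small in (sin, cos) space
print(raw_difference, circular_difference)
```

Both components are needed because sine or cosine alone maps two different longitudes to the same value; together they identify each longitude uniquely.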
Two data sets of presence records were built, depending on the species recognition criterion used to create them from the specimens analyzed: in Data Set 1, modeling groups were based on the taxonomic information of each specimen as recorded in fungaria or sequence databases (Table 1); in Data Set 2, barcoding gap analysis results were used to regroup specimens following the candidate species proposed by molecular barcoding analyses (BGSR). When the number of presences reported for a species candidate by BGSR was too low (less than 6), no distribution model was built for that species candidate due to the small sample size (Pearson et al., 2006; van Proosdij et al., 2016). The modeling approach was exactly the same for all arrangements in both data sets. We used the MaxEnt algorithm to build distribution models (Phillips et al., 2006, 2017). MaxEnt has been reported to perform well when only presence data (i.e., museum and herbarium/fungarium data) are used, as in our case. Moreover, this algorithm has demonstrated acceptable accuracy for small samples (Pearson et al., 2006). [...] (Merow et al., 2013), and a regularization multiplier of 1 and cloglog output were selected (Phillips et al., 2017). In order to control for possible sampling bias in our presence data sets, a layer of human footprint index was included as a bias grid in MaxEnt, to represent areas with greater human accessibility as more likely to have been sampled (Elith et al., 2011; Phillips et al., 2009). This index was obtained from the "Last of the Wild Project", version 2. It consists of an overlay of a number of global data layers [...]

FIGURE 2 Neighbor-joining distance tree for the whole ITS nrDNA matrix used in this study. Big clades indicate sequences used in the three independent barcoding gap species recognition analyses conducted. Species names in fungarium and sequence vouchers and species candidate arrangements obtained from barcoding gap species recognition analyses are shown at the tree tips
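The construction of the two data sets described in this section, together with the minimum sample size rule, can be sketched in Python as follows. This is only an illustration under stated assumptions: the record fields (`voucher_name`, `bgsr_candidate`, `lon`, `lat`), the candidate labels `SC-X1` and `SC-X2`, and the coordinates are made up, and the threshold follows the statement that groups with fewer than 6 presences were not modeled.

```python
from collections import defaultdict

MIN_PRESENCES = 6  # groups with fewer than 6 records are not modeled (assumed from the text)


def group_records(records, key):
    """Group presence coordinates by the value of `key` (one modeling unit per value)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append((rec["lon"], rec["lat"]))
    return dict(groups)


def modelable(groups, min_presences=MIN_PRESENCES):
    """Keep only the groups with enough presence records to build a model."""
    return {name: pts for name, pts in groups.items() if len(pts) >= min_presences}


# Hypothetical records: seven specimens under one label name that split
# into two made-up species candidates after molecular identification.
records = [
    {
        "voucher_name": "Xylodon paradoxus",
        "bgsr_candidate": "SC-X1" if i < 6 else "SC-X2",
        "lon": 10.0 + i,
        "lat": 40.0,
    }
    for i in range(7)
]

data_set_1 = modelable(group_records(records, "voucher_name"))    # grouped by label name
data_set_2 = modelable(group_records(records, "bgsr_candidate"))  # grouped by candidate
print(sorted(data_set_1), sorted(data_set_2))
```

Here the single record assigned to the hypothetical candidate `SC-X2` is dropped from Data Set 2, while all seven records remain available under the label name in Data Set 1.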

| RESULTS
A total of 150 collections were considered in this study, of which 83 were newly sequenced (Table 1). Following genetic distance tree results (Figure 2), sequences were separated into three major clades, [...] performed, those groups for which sample size was greater than 6 specimens (see Table 3, Figure 2).
The importance of each predictor in distribution models is shown in Table 3. Climatic variables obtained the highest percent contribution in models from names in fungarium and sequence vouchers (71.37% on average), followed by geographic predictors (21.64% on average) and finally by biotic variables (tree cover, 6.98% on average). However, for the models built from species candidates detected by BGSR, the contribution of climatic variables was generally lower (41.68% on average; but see SC-A4, SC-B5, and SC-C5). Geographic predictors were the most important for five of the nine species candidates under BGSR (53.74% contribution on average), and tree cover had the least predictive value (6.98% contribution on average).

Distribution models built from names on voucher collections showed worldwide distributions and lacked biogeographic patterns (Figure 3). The extent of those distributions ranged from 14% to 19% of total worldwide emerged lands (Table 3).

TABLE 3 Modeling results using both data sets: (1) Label and sequence voucher names and (2) [...]

FIGURE 3 Presence records and distribution models for specimens arranged following labels and vouchers species names

FIGURE 4 Presence records and distribution models for specimens arranged following ITS barcoding gap analyses (BGSR)

In contrast, distribution models obtained for species candidates detected by BGSR showed in most cases local or restricted distributions (Figure 4). The distributions predicted from these models were in general smaller, with the exception of the species candidates SC-B5 and SC-C5. AUC values were always high, independent of the arrangement criterion used, ranging from 0.90 to 0.99 (Table 3).

| DISCUSSION
The development of new statistical tools to predict species distributions has promoted the use of presence-only databases such as natural history collections (Elith & Leathwick, 2007). Herbaria/fungaria or museums have been an important source of vouchered records to address biogeographical studies in macrofungi (Wollan et al., 2008). Nowadays, those techniques are commonly applied for a broad range of purposes, from assessing pest invasion risks to conservation management (Franklin, 2013). They have also been used to evaluate the environmental factors that drive fungal species distributions (Wollan et al., 2008; Yuan et al., 2015) or to predict the potential distribution of ectomycorrhizal fungi under different climate change scenarios (Guo et al., 2017). However, the effects of taxonomic uncertainty have rarely been assessed in fungal distribution models (Elith et al., 2013). Xylodon is an appropriate case study to understand those effects due to its high diversity and the lack of macroscopic diagnostic characters in many of its species (Riebesehl & Langer, 2017).
Our results distinguished up to 19 species candidates under only three species names using molecular tools (Figure 2, Table 3). Although these species candidates are not all confirmed, because a deeper study is needed, they draw a more realistic picture of the actual diversity in our presence records. In fungi, it is becoming commonplace to detect many phylogenetic species when molecular data are analyzed within a single morphospecies (Cai et al., 2014; Fernández-López et al., 2020). Taxonomic issues are not fully solved in our analyses since only one DNA region was used, and multiple sources of evidence in an integrative framework are recommended to correctly define species boundaries (Dayrat, 2005). However, it has been demonstrated that the ITS barcoding region generally performs well in fungal species delimitation (Schoch et al., 2012), and barcoding region analyses are broadly used in fungal environmental studies (Tedersoo et al., 2014). Therefore, the species candidates delimited in this study are an appropriate first step to understand the complexity in the available Xylodon data in different reference collections. A deeper study of those candidates could be useful to detect new morphological or ecological traits to distinguish among species in Xylodon.
Genetic analyses pointed toward two sources of misleading information in the studied material: first, taxonomic uncertainty through cryptic speciation processes inside each morphospecies, since several subclades could be distinguished in the three major clades delimited (Figure 2); and second, a significant number of incorrect identifications even for the broadly defined morphospecies, especially between Xylodon paradoxus and X. raduloides. These results could be expected due to the morphological similarities of these two species. In addition, X. raduloides was split from X. paradoxus only in the late twentieth century (Hallenberg, 1983), and therefore, it is probable that several X. raduloides collections were still labeled under its old name.
Despite the relatively small sample size used in this study, our presence records described well the scope of the general distribution of the material available in reference collections (Figures 1 and 3). Predicted areas from models using label/voucher information described cosmopolitan distributions for the three morphospecies.
Predicted areas occupied up to 19% of the world's emerged lands, and the three morphospecies can be found in Africa, America, Asia, Europe, and Oceania. However, models derived from the molecular analysis showed local or restricted distributions in most cases (except SC-B5 and SC-C5), with a biogeographic pattern (Figure 4).
These reduced distributions support a more realistic picture of fungal diversity, since it has been demonstrated that genetic lineages remain at least partially isolated from each other in many fungi (Peay et al., 2010; Sato et al., 2012). It should be noted that the number of presence records for most of the candidate species (Data Set 2) is too small to affirm that the predicted distributions reflect the actual species ranges, as is the case for species candidates SC-A3 and SC-A4 (Table 3).
Thus, the lack of occurrences scattered over the actual species range could produce overfitted predictions, and therefore, distribution ranges can be underestimated. Moreover, the addition of new presence records could affect the distribution pattern described for each species candidate, especially for those with a smaller sample size.
However, the differences in distributions obtained between Data Set 1 and Data Set 2 are in accordance with similar patterns that have been reported in many Basidiomycota, for which there has been a transition from a few cosmopolitan species to numerous species with a regional distribution (Petersen & Hughes, 1999). The distinct geographic distribution of each lineage in the molecular analysis is in itself support for recognition of the lineages as distinct taxonomic entities, although distribution on its own would not be sufficient for recognition of segregate species.
Among the species candidates delimited by the BGSR approach, SC-B5 and SC-C5 maintained a worldwide distribution, with no biogeographic pattern supporting a genetic structure (Figure 4).
In the case of SC-B5, this is due to two specific samples (one from France and another from Canada), since the rest of the samples are located in Australia and New Zealand. It could be explained by human-mediated translocation, for example through the timber trade, which is commonly reported for wood-decay fungi (Fernández-López et al., 2019; Paulus et al., 2000). On the other hand, SC-C5 presents a much more complex pattern, with genetically closely related samples distributed around the world. This pattern could be due to the inability of the barcoding approach to distinguish between these closely related species, and therefore other sources of evidence or more DNA regions should be used to confirm this result (Balasundaram et al., 2015; Martín et al., 2018). Nevertheless, the hypothesis that the specimens arranged in this group constitute a species with a worldwide distribution cannot be discarded. New methodologies that explicitly account for long-distance dispersal should be applied to resolve this issue.
The distribution predicted for species arranged following fungarium and sequence vouchers was mainly driven by climatic predictors rather than geographic or tree cover predictors (Table 3).
It has been demonstrated that variables such as temperature or precipitation play a central role in fungal distributions (Hao et al., 2020). However, for distribution models derived from the genetic barcoding gap approach, although climatic factors remained important, they generally lost part of their predictive power in favor of geographic variables (Table 3). In our analyses, tree cover contributed less than climatic or geographic factors. However, its contribution could be masked by climatic factors due to collinearity among predictors, and therefore, it should not be evaluated.
Moreover, the resolution of cartographic layers could be too low to reflect the actual wood availability in small patches, where corticioid fungi can be present in isolated trees (Abrego et al., 2017).
It is important to highlight the inability of internal model validation [...] (Lobo et al., 2008). Since our study area is worldwide, AUC scores for our models could be a misleading measure of model performance.

| CONCLUSION
Our results demonstrate the important role that taxonomic uncertainty plays in the inferences obtained from species distribution models (Elith et al., 2013). Distribution patterns obtained from models based on names on fungarium collections and sequence vouchers appear to support the Baas Becking hypothesis "Everything is everywhere, but environment selects" in Xylodon. Similarly unrealistic and overestimated distributions could be assumed for other species that are involved in conservation programs or pest management plans, resulting in biological and economic losses (Bortolus, 2008). In this context, preserved specimens in natural history collections offer the possibility to reevaluate occurrence data sets by sequencing when taxonomic uncertainty may compromise the results obtained from species distribution models (Elith & Leathwick, 2007).

ACKNOWLEDGMENTS
Thanks to the curators of CFMR, NY, O, and PDD for their invaluable assistance arranging specimens and culture loans.

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
DNA sequences and voucher information: Genbank accession num-