Modelled distributions and conservation status of the wild relatives of chile peppers (Capsicum L.)

Aim: To fill critical knowledge gaps with regard to the distributions and conservation status of the wild relatives of chile peppers (Capsicum L.).


| INTRODUCTION
Crop wild relatives, the wild progenitors and closely related species of cultivated plants, have provided many important agronomic and nutritional traits for crop improvement (Dempewolf et al., 2017; Hajjar & Hodgkin, 2007). As populations of some of these taxa are adapted to extreme climates, adverse soil types, and important pests and diseases, they may provide key traits for the adaptation of crop plants to emerging and projected future challenges (Dempewolf et al., 2013).
Knowledge gaps with regard to wild genetic resources, including information on species' taxonomy and relatedness to pertinent crops (i.e., gene pool assignments), geographic distributions, and values for traits of interest, constrain their potential use in plant breeding (Dempewolf et al., 2017; Miller & Khoury, 2018). Such knowledge gaps also affect conservation efforts, which are essential to protect vulnerable populations from habitat destruction, over-harvesting, climate change, pollution, and invasive species (Bellon, Dulloo, Sardos, Thormann, & Burdon, 2017; Díaz et al., 2019; Jarvis, Lane, & Hijmans, 2008), and to ensure that these genetic resources are safeguarded for the long term and available for research in ex situ plant conservation repositories (Castañeda-Álvarez et al., 2016; Gepts, 2006). Global analyses indicate that many crop wild relatives are poorly represented in gene banks (Castañeda-Álvarez et al., 2016) and in protected areas (Khoury et al., 2019a). These reports highlight the urgency of addressing fundamental knowledge gaps so that the information needed to guide conservation and crop improvement efforts is available.
Three genetic (species) complexes have been recognized within the genus, based on genetic relatedness and reproductive compatibility with the domesticated taxa (Barchenger & Bosland, 2019; Emboden, 1961; Eshbaugh, 1970; Heiser & Smith, 1948; Pickersgill, 1971, 1980; Scaldaferro, 2019; Tong & Bosland, 1999). Each of these complexes contains both domesticated and wild taxa. Member species of the annuum complex generally have white, greenish, or yellowish flowers, and include the crop species C. annuum var. annuum, C. chinense and C. frutescens. Members of the baccatum complex typically have white flowers with yellow to green corolla spots.
Members of the pubescens complex have purple flowers. While comprehensive crossability studies between all species in the genus have yet to be completed (Barchenger & Bosland, 2019), successful hybridizations are known among species (Scaldaferro, 2019), including between those belonging to different complexes (Walsh & Hoot, 2001). Provisional clades of Capsicum species, based on their positions in phylogenetic trees using sequence-based molecular markers, have also been described (Carrizo García et al., 2016). Genetic relatedness classifications based on interfertility research, supplemented by taxonomic, phylogenetic and ploidy information, provide partial indications of the gene pools of the domesticated species (Table 1; USDA ARS NPGS, 2019).

TABLE 1 Capsicum L. taxa and their known chromosome numbers, clades, complexes, genetic relative/potential gene pool classifications, and domestication/cultivation status

[...] assessment indicated that six taxa may be critically endangered, three endangered, ten vulnerable, six near threatened and 12 least concern.
Main conclusions: Taxonomic richness hot spots, especially along the Atlantic coast of Brazil, in Bolivia and Paraguay, and in the highlands of Colombia, Ecuador, Peru and Venezuela, represent particularly high priority regions for further collecting for ex situ conservation as well as for enhanced habitat conservation.

KEYWORDS
biodiversity conservation, Capsicum, Chili peppers, crop wild relatives, plant genetic resources
A number of wild Capsicum taxa are harvested from nearby populations and sold in local and regional markets. For example, fruit of C. annuum var. glabriusculum [...] all of which were determined to be secure, although the assessments date from the 1990s (NatureServe, 2019).
Here we use taxonomic and geographic occurrence information to model the potential distributions of all 37 currently known wild taxa in the genus Capsicum, and to characterize their ecogeographic niches. We assess the conservation status of the taxa, in gene banks and botanic gardens (ex situ), and in protected areas (in situ), and perform preliminary threat assessments.

| Occurrence information
Reference occurrence data were obtained from all records listed within the genus Capsicum from the Global Biodiversity Information Facility (GBIF) [...]. Taxonomic names were standardized based on current literature (Barboza & Bianchetti, 2005; Barboza et al., 2011, 2019; Carrizo García et al., 2016; Jarrett et al., 2019) and a monograph on the genus soon to be published (Barboza et al., in prep). Cultivated taxa; records listed in sample status fields as other than wild, weedy or null (e.g., landrace, improved, breeding material and cultivated); fossil specimens in the GBIF dataset; and records listed in collecting/acquisition source fields as sourcing from markets, institutes and home gardens were excluded. In preparation for the conservation analysis, we classified each record according to whether it was a reference observation (labelled H, as most of these records were from herbaria), or a "site where germplasm collected" location of an existing ex situ plant gene bank or botanic garden conservation accession (labelled G, as most records were from gene banks). For GBIF, this classification was accomplished by filtering the "Basis of Record" field, assigning "living specimen" records as G, with the other categories (observation, literature, preserved specimen, human observation, machine observation, material sample and unknown) assigned as H. All records in Genesys, WIEWS and PlantSearch were assigned G, while GRIN Global records were assigned G when their status field was listed as "active" and H when "inactive". Records from the Global Crop Wild Relative Database had already been categorized accordingly. Gene bank/botanic garden (G) occurrences with detailed locality information but lacking coordinates were georeferenced using Google Earth (Google, 2019a) to maximize the comprehensiveness of the ex situ conservation gap analysis.
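The H/G classification rules described above can be illustrated with a short Python sketch. It is not the authors' code: the function name, the lower-cased source labels and the handling of GRIN Global statuses other than "active" are assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical source labels: the text names these databases but does not
# say how they are encoded in the occurrence table.
ALWAYS_G_SOURCES = {"genesys", "wiews", "plantsearch"}


def classify_record(source: str,
                    basis_of_record: Optional[str] = None,
                    status: Optional[str] = None) -> str:
    """Return "G" for an ex situ accession record, "H" for a reference record."""
    src = source.strip().lower()
    if src in ALWAYS_G_SOURCES:
        return "G"
    if src == "gbif":
        # Only "living specimen" records count as germplasm; every other
        # Basis of Record category is treated as a reference observation.
        basis = (basis_of_record or "").strip().lower().replace("_", " ")
        return "G" if basis == "living specimen" else "H"
    if src == "grin global":
        # "active" -> G and "inactive" -> H follow the text; treating any
        # other status as H is an assumption of this sketch.
        return "G" if (status or "").strip().lower() == "active" else "H"
    # Records from sources that arrive pre-categorized (e.g. the Global Crop
    # Wild Relative Database) should not be passed through this function.
    raise ValueError(f"no classification rule for source {source!r}")
```

For example, `classify_record("GBIF", "preserved specimen")` yields `"H"`, whereas `classify_record("GRIN Global", status="active")` yields `"G"`.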
To review the occurrence data in preparation for distribution modelling, H and G coordinates were uploaded to an interactive mapping platform (Google, 2019b). Occurrences in bodies of water or in clearly incorrect locations were corrected or removed. Refined occurrence data were extracted for use in distribution modelling.
The final occurrence dataset is available in Appendix S1, sheet 1 in the Supporting Information.
Variables for slope and aspect were also incorporated after having been calculated from the altitude dataset using the terrain function in the R package "raster". All ecogeographic predictors were processed at a spatial resolution of 2.5 arc-min (~5 km² at the equator) (values available in Appendix S1, sheet 2 in the Supporting Information; raw data available from Khoury et al., 2019b).
Ecogeographic variables (per taxon) were selected using the R package "VSURF" (Genuer, Poggi, & Tuleau-Malot, 2018). All variables with no measurable impact on model performance were removed and the remaining variables were ranked in order of importance. Starting with the most important predictor, variables whose Pearson's correlation coefficient with that predictor was greater than 0.7 were removed.
This process was performed for the top five predictor variables, with the remaining variables employed in the modelling process (Appendix S1, sheet 3 in the Supporting Information).
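The correlation-based pruning step can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' R implementation: the variable ranking is taken as given (the VSURF step is not reproduced), the absolute value of the correlation is compared with the threshold, and "top five" is read as the first five variables that survive pruning. All function and argument names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_correlated(ranked: List[str],
                      values: Dict[str, Sequence[float]],
                      threshold: float = 0.7,
                      top_n: int = 5) -> List[str]:
    """Drop lower-ranked variables that correlate with each of the top predictors.

    `ranked` lists variable names from most to least important.
    """
    kept = list(ranked)
    i = 0
    while i < min(top_n, len(kept)):
        anchor = kept[i]
        # Keep everything up to the anchor, then only the lower-ranked
        # variables that are not strongly correlated with it.
        kept = kept[: i + 1] + [
            v for v in kept[i + 1:]
            if abs(pearson(values[anchor], values[v])) <= threshold
        ]
        i += 1
    return kept
```

Because higher-ranked variables are always retained, the order of the importance ranking determines which member of a correlated pair survives.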
The number of comparative background points (pseudo-absences) was defined per taxon in proportion to the total area of the spatial background, which was calculated based on pertinent ecoregion boundaries, that is, the ecoregions defined in Olson et al. (2001) (available from Khoury et al., 2019b) wherein occurrence data fell, bounded by pertinent country borders, with a maximum of 5,000 pseudo-absences per taxon. Pseudo-absence points that fell within the same cell as a presence point were not included.
For each taxon with at least ten coordinates, the modelled distribution was calculated as the median of ten MaxEnt model replicates (K = 10), using linear, quadratic, hinge and product features, with a regularization parameter β = 1.0. For taxa with fewer than ten coordinates, the median of three replicates (K = 3) was calculated.
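A minimal Python sketch of the two background-point rules follows. The sampling density (`points_per_km2`) is a hypothetical parameter, since the text gives only the proportionality and the cap of 5,000; the grid indexing assumes a regular 2.5 arc-minute lattice anchored at (-180, -90), which may differ from the raster actually used.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees
CELL_DEG = 2.5 / 60.0        # 2.5 arc-minute cells, matching the predictor grid


def n_background(area_km2: float, points_per_km2: float, cap: int = 5000) -> int:
    """Number of pseudo-absences: proportional to background area, capped."""
    return min(cap, int(area_km2 * points_per_km2))


def cell_of(pt: Point) -> Tuple[int, int]:
    """Column and row index of the grid cell containing a point."""
    lon, lat = pt
    return (int((lon + 180.0) // CELL_DEG), int((lat + 90.0) // CELL_DEG))


def drop_shared_cells(background: List[Point], presences: List[Point]) -> List[Point]:
    """Remove pseudo-absences that fall in a cell already holding a presence."""
    occupied = {cell_of(p) for p in presences}
    return [b for b in background if cell_of(b) not in occupied]
```

Sampling the points themselves (uniformly within the ecoregion and country mask) is left out, as it depends on the spatial library in use.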
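The replicate rule and the cell-wise median can be sketched in Python as below. The MaxEnt fitting itself is not shown; each replicate is assumed to be available as an equally shaped two-dimensional grid of suitability values, and the function names are hypothetical.

```python
from statistics import median
from typing import List

Grid = List[List[float]]


def n_replicates(n_coords: int) -> int:
    """K = 10 for taxa with at least ten coordinates, otherwise K = 3."""
    return 10 if n_coords >= 10 else 3


def median_surface(replicates: List[Grid]) -> Grid:
    """Cell-wise median across K replicate suitability grids of equal shape."""
    rows, cols = len(replicates[0]), len(replicates[0][0])
    return [
        [median(rep[r][c] for rep in replicates) for c in range(cols)]
        for r in range(rows)
    ]
```

Taking the median rather than the mean limits the influence of any single outlying replicate on the final surface.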

| Ecogeographic characterization
Ecogeographic predictor information, at a resolution of 30 arc-seconds (approximately 1 km² at the equator) for 23 pertinent variables (slope and aspect variables were not included as they do not provide meaningful ranges with which to distinguish variation among taxa) from the WorldClim 2 and CGIAR-CSI datasets, was extracted for all records with coordinates, for all taxa (Appendix S1, sheet 4 in the Supporting Information). These data were used to characterize taxa with regard to their potential ecogeographic niches for each variable. We also used these data to assess the representation of these niches in ex situ conservation by comparing the distributions of G points for each taxon within the full spread of its occurrences, as a supplement to the conservation analysis described below.

| Conservation gap analysis
We assessed the degree of representation of each taxon in both ex situ and in situ conservation systems building on methods outlined in Khoury et al. (2019a). For ex situ, four scores were calculated.
To supplement the conservation gap analysis, we used the occurrence [...] endangered (EN) where 100 km² < EOO < 5,000 km² or 10 km² < AOO < 500 km², vulnerable (VU) where 5,000 km² < EOO < 20,000 km² or 500 km² < AOO < 2,000 km², possible near threatened (NT) where 20,000 km² < EOO < 45,000 km² or 2,000 km² < AOO < 4,500 km², and least concern (LC) where EOO ≥ 45,000 km² and AOO ≥ 4,500 km². We did not perform analyses based on rates of change over time due to the limited date information in the occurrence dataset, but provided observations based on our field experiences for some taxa.
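The category thresholds listed above translate into a simple decision rule, sketched here in Python. EOO and AOO are taken to be the IUCN extent of occurrence and area of occupancy, in km². Two points are assumptions of this sketch: the critically endangered band (EOO < 100 km² or AOO < 10 km²) follows IUCN criterion B because the corresponding text is missing, and values falling exactly on a boundary are assigned to the less severe category because the stated inequalities are strict.

```python
def preliminary_threat_category(eoo_km2: float, aoo_km2: float) -> str:
    """Preliminary threat category from extent of occurrence and area of occupancy."""
    # Bands are checked from most to least severe, so the first match wins.
    bands = [
        ("CR", 100.0, 10.0),        # assumed from IUCN criterion B (text missing)
        ("EN", 5_000.0, 500.0),
        ("VU", 20_000.0, 2_000.0),
        ("NT", 45_000.0, 4_500.0),
    ]
    for label, eoo_max, aoo_max in bands:
        if eoo_km2 < eoo_max or aoo_km2 < aoo_max:
            return label
    # Reached only when EOO >= 45,000 km² and AOO >= 4,500 km².
    return "LC"
```

Because the threatened bands are joined by "or", a taxon with a large EOO but a very small AOO is still placed in the more severe category indicated by its AOO.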
While the metrics and observations do not provide the full set of criteria needed for Red Listing, they may offer indications of the most probable threat status of the taxa.

[...] C. annuum var. glabriusculum, the putative wild progenitor of C. annuum var. annuum and the most widely dispersed and well-studied wild taxon in the genus.

| Distributions of wild Capsicum
Of the 29 taxa with at least ten distinct occurrences, and thus modelled with ten replicates, all passed the preset distribution modelling evaluation criteria and were therefore considered robust (Table S2.3 in the Supporting Information). The eight taxa with fewer than ten coordinates were each modelled with three replicates, producing ASD15 scores outside the threshold. Based on our current knowledge, we consider these models to be fair

| Ecogeographic characterization
Substantial variation with regard to ecogeographic niches was found across taxa. For example, the taxa with occurrences in the locations

| Conservation status
With regard to the conservation status of wild Capsicum in gene banks and botanic gardens, the overwhelming majority of taxa were found to be minimally or completely unrepresented ex situ. Twenty-three taxa (62.2% of the total) were not represented in the available germplasm databases. An additional nine taxa had fewer than ten accessions. A total of 35 taxa were assessed as high priority for further collecting, including the two putative crop progenitors (C. annuum var. glabriusculum, with an FCSex of 6.65, and C. baccatum var. baccatum, FCSex of 20.45) (Figure 2, Table 2; Table S2.2). Capsicum chacoense (FCSex = 27.1) was assigned medium priority, and C. cardenasii (FCSex = 82.11) was considered sufficiently conserved ex situ.
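The mapping from a final ex situ conservation score (FCSex, 0 to 100) to a priority category can be sketched in Python. The cut-offs of 25, 50 and 75 are not stated in this text; they are assumed here from the gap analysis methodology the study builds on (Khoury et al., 2019a), and they are consistent with the four scores reported above.

```python
def ex_situ_priority(fcs_ex: float) -> str:
    """Priority category for a final ex situ conservation score on a 0-100 scale."""
    if not 0.0 <= fcs_ex <= 100.0:
        raise ValueError("FCSex must lie between 0 and 100")
    # Cut-offs are assumptions of this sketch (see lead-in), not given in the text.
    if fcs_ex < 25.0:
        return "high priority"
    if fcs_ex < 50.0:
        return "medium priority"
    if fcs_ex < 75.0:
        return "low priority"
    return "sufficiently conserved"
```

Applied to the reported values, 6.65 and 20.45 fall in the high priority band, 27.1 in medium priority, and 82.11 in sufficiently conserved.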
The mean FCSex across all taxa was 6.60 on the conservation status scale from 0 to 100. Due to such a low level of ex situ conservation of these wild taxa, further collecting is needed throughout their distributions. Priorities for collecting largely mirror patterns of taxon richness; thus, uncollected populations of up to ten taxa potentially occur in the same ca. [...] (Table 2; Table S2.2). One taxon (C. piuranum) was determined to have no official habitat protection anywhere within its potential distribution in northern Peru. Thus, it was categorized as high priority for further action. However, protected areas were detected near the modelled distribution of the species. Two other taxa (C. tovarii and C. benoistii) were also assessed as high priority, 21 taxa as medium priority (including the two known crop progenitors), 11 as low priority and two (C. galapagoense and C. villosum var. muticum) as sufficiently conserved in situ.
As with the ex situ analysis, the ERSin scores per taxon were higher than the GRSin, in this case for all taxa but one (C. [...] (Table 2; Table S2.2). The FCSc-mean averaged across all taxa was 26.29. In summary, 18 taxa were determined to be high priorities for further conservation (including [...] (Table 2; Table S2.4).
These results provide further support for the current Red [...] (Scaldaferro et al., 2018).
Focusing on the putative crop progenitors and close relatives, that is the taxa most likely to be utilized in crop breeding, which also include the wild taxa sold in local markets or cultivated in home gardens, the high priority taxon C. annuum var. glabriusculum is poorly represented ex situ with regard to geographic coverage of its potential distribution, but fairly well represented with regard to ecogeographic and ecological variation (Table 2; Figure S2.2). This taxon is potentially present in protected areas that are fairly well distributed across its modelled range from the southern United States to northern South America, and was given a preliminary designation of LC in the threat assessment. The majority of the currently rec-

| Challenges and limitations to distribution modelling and conservation gap analysis
Distributions of wild Capsicum are influenced by factors beyond the 26 ecogeographic predictors used here. These may include biotic (e.g., dispersal agents, host plants, mycorrhizae, pathogens and pollinators) and other abiotic (e.g., soil parent material and other edaphic characteristics) factors (Carlo & Tewksbury, 2014; Kraft et al., 2014; Tewksbury et al., 1999, 2008). A number of the taxa, in particular C. annuum var. glabriusculum, but also C. baccatum var. baccatum, C. cardenasii, C. chacoense, C. eximium, C. praetermissum and C. frutescens (putative wild/feral populations), are harvested from the wild, with populations exposed to varying levels of human management and impact, which affect their distributions over the long term (Aguilar-Meléndez & Lira Noriega, 2018; van Zonneveld et al., 2015, 2018; Villalon-Mendoza et al., 2014). Furthermore, the current ecogeographic suitability models are unable to fully account for relatively recent extirpation events of populations due to habitat degradation or destruction, for example for C. lanceolatum, thought to now be extinct in Mexico (Barchenger & Bosland, 2019).
Our results, therefore, are best considered as planning tools to guide explorations for further confirmation in the field.

FIGURE 3 (a) Predicted further collecting priorities hot spots map for wild Capsicum L. taxa. The map displays richness of areas within the potential distributions of the 37 wild Capsicum taxa that have not been previously collected for ex situ conservation, with up to ten taxa in need of further collecting potentially found in the same areas. Sites where existing germplasm of taxa has been collected are overlaid. (b) Predicted further in situ protection priorities map for wild Capsicum. The map displays richness of areas within the potential distributions of the 37 wild Capsicum taxa that are outside of current protected areas, with up to ten taxa found in the same unprotected areas. Protected areas are displayed in green.

Biodiversity occurrence data are often spatially biased, tending to concentrate around roadways and major population centres (Stolar & Nielsen, 2015; Syfert, Smith, & Coomes, 2013). Alongside extensively reviewing the presence coordinate locations for accuracy, to mitigate the potential effect of spatial bias, we generated background points (pseudo-absences) only within the ecoregions in which the presence points were located. This limited the amount of variability present within the range of predictor values in the background dataset (Jarnevich & Young, 2019). For some taxa, it is possible that the current occurrence data did not capture the full ecogeographic range within which the species can be found. As a result, the edges of the predicted distribution models represent particularly important regions for further field exploration (Jarnevich, Stohlgren, Kumar, Morisette, & Holcombe, 2015).
With regard to the conservation analyses, openly available data [...]

Our ecogeographic suitability model-based results did not always align perfectly with our field experience, particularly with regard to presence in protected areas. For example, our models (as well as points) for C. villosum var. muticum were determined through the analysis to overlap quite well with the protected areas listed in the WDPA. Unfortunately, the observed restricted distribution of this taxon in fact falls just outside of protected areas, and the quality of its habitat has declined progressively during our field visits over the past six years.
While the lands listed in the WDPA hopefully afford collateral protection to wild Capsicum taxa as a result of overall land conservation practices, robust long-term protection of these plants in these areas will likely require the formation of active taxon- and [...]

[...]ses. Once the function of these candidate loci is established, the phenotypic effects of particular genetic variants can be mobilized for use (Tanksley & McCouch, 1997).

| Challenges to utilization of wild
Further, interspecific hybridizations can present challenges for breeding with wild species. For example, crosses made between members of the pubescens complex and other groups have sometimes resulted in unilateral incompatibility (Onus & Pickersgill, 2004;Pickersgill, 1997). Post-fertilization seed abortion or sterility in the offspring has also been reported in several interspecific crosses (Pickersgill, 1991;Smith & Heiser, 1957;Yoon, Yang, Do, & Park, 2006).
These constraints to utilization acknowledged, several successful strategies to overcome barriers to interspecific hybridization do exist in Capsicum (Yoon et al., 2006). Furthermore, for more than 20 years, genes from pepper have been moved into tomato using transgenic technologies, increasing resistance to key diseases (Tai et al., 1999). In the emerging era of genome editing, both the 12 and 13 base chromosome number wild Capsicum taxa could be useful in the development of more resilient peppers, as well as other crops, although we note that much of the environmental adaptation within wild plants is polygenic and quantitative (Tiffin & Ross-Ibarra, 2014), so there may be limits on the degree to which adaptation can be engineered. Regardless of the breeding methods used, ensuring adequate representation of these wild relatives in conservation systems, and further characterizing populations with regard to their adaptations to abiotic and biotic stresses, will provide the foundations for their more widespread use.

DATA AVAILABILITY STATEMENT
Occurrence data, processed ecogeographic data, and interactive taxon-level modelling and conservation status results and metrics are provided in the Supporting Information. Associated ecogeographic and spatial input data are available through open access repositories (Khoury et al., 2019b). All code implemented in the analysis is available at: https://github.com/dcarver1/cwrSDM.