Phylogenetic diversity and conservation of crop wild relatives in Colombia

Abstract Crop wild relatives (CWR) are an important agricultural resource as they contain genetic traits not found in cultivated species due to localized adaptation to unique environmental and climatic conditions. Phylogenetic diversity (PD) measures the evolutionary relationship of species using the tree of life. Our knowledge of CWR PD in neotropical regions is in its infancy. We analysed the distribution of CWR PD across Colombia and assessed its conservation status. The areas with the largest concentration of PD were identified as being in the northern part of the central and western Andean mountain ranges and the Pacific region. These centres of high PD were comprised of predominantly short and closely related branches, mostly of species of wild tomatoes and black peppers. In contrast, the CWR PD in the lowland ecosystems of the Amazon and Orinoquia regions had deeply diverging clades predominantly represented by long and distantly related branches (i.e. tuberous roots, grains and cacao). We categorized 50 (52.6%) of the CWR species as ‘high priority’, 36 as ‘medium priority’ and nine as ‘low priority’ for further ex‐situ and in situ conservation actions. New areas of high PD and richness with large ex‐situ gap collections were identified mainly in the northern part of the Andes of Colombia. We found that 56% of the grid cells with the highest PD values were unprotected. These baseline data could be used to create a comprehensive national strategy of CWR conservation in Colombia.

According to recent studies focusing on the global conservation of CWR (Vincent et al., 2019), Colombia is not considered a high-priority region for sourcing CWR of global importance.
In terms of CWR global hotspots, Colombia does not appear in the areas of high CWR diversity and centres of crop origins (Kell et al., 2017;Khoury, Carver, et al., 2019;Maxted et al., 2007;Maxted & Vincent, 2021). Consequently, Colombia remains largely unexplored as a region of importance for the biodiversity of CWR that are not yet formally recognized but which possess great potential. This might be because Colombia's CWR are poorly documented and do not represent a large proportion of the main global food crops or their centres of origin (Cañas-Gutiérrez et al., 2019;Couvreur et al., 2007;Jarvis et al., 2008;Ocampo et al., 2010).
Biodiversity is often measured by counting the number of species in an area (species richness). This is an informative metric but does not indicate the diversity of the tree of life in an area (Mishler, 2010). A key method to quantify evolutionary diversity in the tree of life is phylogenetic diversity (PD; Faith, 1992). PD measures evolutionary diversity by summing the lengths of branches connecting the tips of a phylogenetic tree, normally to the root of the tree but sometimes only to the most recent common ancestor. The PD metric is a key tool in the identification of evolutionary relationships across space, hence improving our capacity to measure important genetic resources (Faith, 2013). PD is recognized as one of the flagship conservation metrics to maximize protection of biodiversity (Daru et al., 2019;Forest et al., 2007;Gumbs et al., 2020;Laity et al., 2015;Rosauer et al., 2017;Tolley et al., 2016;Zhang et al., 2016).
The Convention on Biological Diversity (CBD) declared the tree of life, or PD, as an effective way of preserving biodiversity for people because it maintains the benefits of nature to humanity (Gumbs et al., 2020). In practical terms, PD not only represents a variety of evolutionary features and heritage of species but also safeguards key sections of the tree of life and therefore the potential uses of unexplored biodiversity under a current changing environment and its future threats.
Although PD is recognized as a key biodiversity indicator, finding available data to quantify PD in tropical regions is a challenge. This aspect, combined with the fact that Colombia is a potential source of unexplored CWR diversity, makes our case study a research priority for the in situ conservation of native genetic resources. To illustrate the importance of documenting CWR diversity in regions such as Colombia, we mention the example of the cacao biological expeditions, CacaoBIO (https://bit.ly/3j72GKz), which visited Colombian regions that had not been explored for more than 50 years due to armed conflict. A key result of that work was finding a large diversity of wild Theobroma cacao L. as well as extant plants of the genus Herrania, which is sister to Theobroma (González-Orozco, Sanchez, et al., 2020). The PD of wild cacao in Colombia has not yet been fully quantified, but sampling a wide variety of CWR gene pools increases the chances of using this cacao CWR biodiversity in future genetic studies.
There are many PD-based approaches that can be applied, depending on the extent and target group under study. However, there is not a single most comprehensive PD metric (Cadotte et al., 2010). When applied spatially, these alternate metrics fall under the umbrella of spatial phylogenetics (Azevedo et al., 2020; Gonzalez-Orozco et al., 2016; Laffan, 2018; Laffan et al., 2010; Mishler et al., 2020; Scherson et al., 2017; Thornhill et al., 2017). In this research, we apply some of the derived measures of spatial PD developed and tested extensively in different biological groups, continents, and regions (Mienna et al., 2020; Mishler et al., 2020). Relative phylogenetic diversity (RPD) identifies locations with unusual concentrations of long or short branches of the tree of life by comparing PD calculated using the observed tree with that calculated using a tree with the same topology but where all branch lengths are set to the mean non-zero branch length. This type of information helps to reveal evolutionary patterns such as where assemblages are more closely or distantly related over time. A key part of spatial phylogenetics is the use of randomization approaches to assess significance of observed diversity patterns, enabling the identification of distributions that are more extreme than expected under a random scenario (Mishler et al., 2014).
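The PD and RPD definitions above can be illustrated with a minimal sketch. The toy tree, its branch lengths and the function names are hypothetical and are not taken from the study. RPD is computed here as the ratio of PD on the observed tree to PD on the equal-branch comparison tree, which is equivalent to a ratio of proportional PD values because setting every branch to the mean non-zero length leaves the total tree length unchanged.

```python
from statistics import mean

# Hypothetical toy tree: node -> (parent, branch length to parent).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 4.0),
    "n1": ("n2", 3.0),
    "n2": (None, 0.0),  # root, no subtending branch
}

def pd(tree, tips):
    """Faith's PD: summed length of the branches connecting the tips to the root."""
    branches = set()
    for node in tips:
        while node is not None:
            branches.add(node)
            node = tree[node][0]
    return sum(tree[n][1] for n in branches)

def equal_branch_tree(tree):
    """Same topology, with every non-zero branch set to the mean non-zero length."""
    m = mean(length for _, length in tree.values() if length > 0)
    return {n: (p, m if length > 0 else 0.0) for n, (p, length) in tree.items()}

def rpd(tree, tips):
    """Relative PD: PD on the observed tree divided by PD on the equal-branch tree."""
    return pd(tree, tips) / pd(equal_branch_tree(tree), tips)

short_branch_cell = ["A", "B"]  # two close relatives
long_branch_cell = ["C"]        # one isolated lineage
print(pd(TREE, short_branch_cell), rpd(TREE, short_branch_cell))
print(pd(TREE, long_branch_cell), rpd(TREE, long_branch_cell))
```

In this toy case the cell holding the two close relatives has an RPD below 1 (branches shorter than the mean), while the cell holding the isolated lineage has an RPD above 1.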
Regarding the application of PD to close relatives of crops, Jovovic (2020) noted that CWR are a fundamental element in modern agriculture because they provide important genes for plant breeders. Therefore, it is important to investigate the use of PD for the conservation of wild relatives. Particularly, a stable conservation status of the CWR species in Colombia is lacking. One of the newly developed tools to assess species status in conservation is the GapAnalysis R package (Carver et al., 2021). Gap analysis assessments are extremely useful for conservation planning of genetic resources because they help identify areas where more species collections are required (Khoury, Carver, et al., 2019). However, they have not yet been used to explore the conservation status of PD.
In this study, we identify the major centres of CWR PD in Colombia. We also investigate how well conserved the areas of high PD and species richness are, and identify regions where further collecting is needed to plan ex-situ and in situ conservation of Colombia's CWR.

| Species occurrences
To determine which CWR species to include in our study, we used the species list and distribution records of CWR for Colombia from the Global Biodiversity Information Facility (GBIF) database version 1.12, generated by the International Centre for Tropical Agriculture (CIAT) (CWRODC, 2018). To create the spatial data set, we extracted all geocoded CWR species records for Colombia from the CWR GBIF database (GBIF, 2019). We used the accepted CWR taxonomic names listed in CWR GBIF to filter the records to be extracted from version 4 of the Botanical Information and Ecology Network (BIEN) data set (Maitner et al., 2017; see Table S2 for the genus list and Dataset S1 for the final spatial data and species list with botanical families). These data were imported into Biodiverse version 3.0 (Laffan et al., 2010) and aggregated to square grid cells with a spatial resolution of 0.1° (~10 km). This resulted in a total of 1801 grid cells spanning continental Colombia.
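The aggregation of point records to 0.1° grid cells can be sketched as follows. This is an illustration only and is not the Biodiverse import routine; the species labels, coordinates and function names are hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE = 0.1  # degrees, matching the resolution stated in the text

def cell_id(lon, lat, size=CELL_SIZE):
    """Integer (column, row) index of the square cell containing a point."""
    return (math.floor(lon / size), math.floor(lat / size))

def aggregate(records, size=CELL_SIZE):
    """Count records per species within each grid cell."""
    cells = defaultdict(lambda: defaultdict(int))
    for species, lon, lat in records:
        cells[cell_id(lon, lat, size)][species] += 1
    return cells

# Hypothetical geocoded records: (species, longitude, latitude).
records = [
    ("Solanum sp1", -74.05, 4.61),
    ("Solanum sp1", -74.08, 4.65),
    ("Piper sp1", -74.02, 4.63),
    ("Theobroma cacao", -67.48, 6.19),
]
cells = aggregate(records)
for cell, counts in sorted(cells.items()):
    print(cell, dict(counts))
```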

| Assembly of molecular data and phylogenetic analyses
A list of Colombian CWR plants was used to search GenBank using Matrix Maker (Freyman & Thornhill, 2016). Sequences of seven loci were searched for: trnL, matK, ITS, rbcL, trnL-trnF, atpB and matR (accessions for each locus are in Table S3). Individual alignments of each locus were created using MAFFT version 7 (Katoh, 2013) and concatenated using SequenceMatrix (Vaidya et al., 2011). A maximum-likelihood analysis was performed on the concatenated alignment using RAxML in the CIPRES Portal (Miller et al., 2009). Most of the nodes of the phylogeny have bootstrap support values of 100% across the tree (Figure S2). The phylogenetic tree is available in Dataset S2. The resulting tree was exported and converted to nexus format using FigTree version 1.3 (Rambaut, 2009).

| Species richness and sampling redundancy analyses
We used the Biodiverse software (Laffan et al., 2010), version 3.0, to calculate redundancy and species richness (SR) indices for each 0.1-degree grid cell. Redundancy is calculated as one minus the ratio of species to samples per grid cell and has values in the interval [0,1]. Cells with values close to 1 are well sampled, while those near zero have poor sample redundancy. Species richness represents the total number of unique taxa in each grid cell.
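The two per-cell indices can be sketched as below, assuming that redundancy is computed as one minus the ratio of richness to the number of samples, so that values near 1 indicate well-sampled cells. The function names and counts are hypothetical.

```python
def species_richness(counts):
    """Number of unique taxa recorded in a cell."""
    return sum(1 for n in counts.values() if n > 0)

def redundancy(counts):
    """1 - richness / samples; 0 when every record is a different species."""
    samples = sum(counts.values())
    if samples == 0:
        raise ValueError("cell has no samples")
    return 1 - species_richness(counts) / samples

# Hypothetical record counts per species for two cells.
well_sampled = {"Solanum sp1": 12, "Piper sp1": 8}
poorly_sampled = {"Solanum sp1": 1, "Piper sp1": 1}
print(species_richness(well_sampled), redundancy(well_sampled))
print(species_richness(poorly_sampled), redundancy(poorly_sampled))
```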

| Phylogenetic diversity analyses
Biodiverse version 3.0 (Laffan et al., 2010) was used to calculate a set of spatial phylogenetic indices for the 0.1-degree grid cells for the combined GBIF and BIEN data set. As defined in the introduction, observed PD and RPD were calculated using branches connecting the terminals to the root branch. The statistical significance of the PD and RPD values was estimated using a randomization process in which taxa were randomly allocated across the landscape, but where the range of each taxon and the richness of each cell were held constant (Laffan & Crisp, 2003;Mishler et al., 2014;Thornhill et al., 2016). A total of 999 random realizations were run and compared against the observed values to estimate rank-relative significance of observed PD (one-tailed high test) and RPD (two-tailed test). A significantly high PD score indicates there is more of the tree in a region than expected, while a significantly low PD score indicates there is less of the tree in a region than expected. For RPD, a significantly high value for a region indicates an over-representation of long branches, while a significantly low value indicates an over-representation of short branches (Mishler et al., 2014).
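The logic of the randomization test can be sketched as follows. This sketch substitutes a simple checkerboard swap for the Biodiverse randomization algorithm; the swap is one way of holding both the richness of each cell and the range size of each taxon constant. The tree, cell contents and function names are hypothetical.

```python
import random

# Hypothetical toy tree: node -> (parent, branch length to parent).
TREE = {
    "A": ("n1", 1.0), "B": ("n1", 1.0), "C": ("n2", 4.0), "D": ("n2", 2.0),
    "n1": ("n2", 3.0), "n2": (None, 0.0),
}

def pd(tree, tips):
    """Sum the branch lengths on the paths from the given tips to the root."""
    branches = set()
    for node in tips:
        while node is not None:
            branches.add(node)
            node = tree[node][0]
    return sum(tree[n][1] for n in branches)

def swap_randomize(cells, n_swaps, rng):
    """Exchange taxa between pairs of cells, preserving richness and range sizes."""
    cells = {c: set(taxa) for c, taxa in cells.items()}
    ids = sorted(cells)
    for _ in range(n_swaps):
        c1, c2 = rng.sample(ids, 2)
        only1 = sorted(cells[c1] - cells[c2])
        only2 = sorted(cells[c2] - cells[c1])
        if only1 and only2:
            a, b = rng.choice(only1), rng.choice(only2)
            cells[c1].remove(a)
            cells[c1].add(b)
            cells[c2].remove(b)
            cells[c2].add(a)
    return cells

def rank_significance(cells, tree, n_iter=999, n_swaps=100, seed=0):
    """Fraction of random realizations in which observed PD exceeds random PD."""
    rng = random.Random(seed)
    observed = {c: pd(tree, taxa) for c, taxa in cells.items()}
    higher = {c: 0 for c in cells}
    for _ in range(n_iter):
        for c, taxa in swap_randomize(cells, n_swaps, rng).items():
            if observed[c] > pd(tree, taxa):
                higher[c] += 1
    return {c: higher[c] / n_iter for c in cells}

cells = {"c1": {"A", "B"}, "c2": {"C", "D"}, "c3": {"A", "C"}, "c4": {"B"}}
print(rank_significance(cells, TREE))
```

Under this convention a cell would be read as significantly high in a one-tailed test when its value is at least 0.95, and a two-tailed test would use cut-offs of 0.975 and 0.025. These cut-offs are assumptions for illustration.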

| Species distribution modelling
In preparation for the gap analysis, the maximum entropy (MaxEnt) algorithm (Phillips et al., 2006) was applied to produce potential ecogeographic suitability models for 101 of the 185 species of CWR included in our study (Table S1). Thirty per cent of the occurrences per species were randomly set aside as test samples. The default settings were used for the remainder of the modelling parameters. The 89 species for which the MaxEnt models were not run did not meet the conditions required by the modelling parameters. An AUC threshold of >0.7 was used to optimize predictability, and the number of replicates was set to 5. Seven climate variables representing the average climatic history of Colombia from 1980 to 2010 (mean annual precipitation, average temperature, minimum and maximum temperature, relative humidity, solar radiation, wind speed) with a spatial resolution of 3 × 3 km were used as predictors for the MaxEnt modelling (Agrosavia, 2014; Alzate-Velásquez, 2017; González-Orozco, Porcel, et al., 2020). Suitability values >= 0.5 were labelled as 1 and those <0.5 were labelled as 0 to generate the binary classification required as an input of the GapAnalysis. Further, using the nearest neighbour resampling method, the MaxEnt rasters were masked using a Colombia protected areas layer using a buffer of 5 km following the default sug-

| Conservation gap analysis of CWR species
We assessed the degree of representativeness of 101 species of CWR in an ex-situ conservation system using the R package GapAnalysis (Carver et al., 2021). The 89 species of CWR that did not fulfil the gap analysis criteria were assigned as high priority for further collecting in both in situ and ex-situ systems. This way we were able to use the full set of 185 species used in the PD and species richness analyses. This approach estimates eight metrics of conservation representativeness. For both the ex-situ and in situ metrics, the indices calculated were the sampling score (SRSex-in), geographic score (GRSex-in), ecological score (ERSex-in), and the final conservation score (FCSex-in). A further index is a combined metric, or FCSc-mean, which was calculated for 95 species by averaging the final ex-situ (FCSex) and in situ (FCSin) scores. To assess the ex-situ representativeness of the species used in the analysis, the online plant genetic resource platform 'Genesys' (https://www.genesys-pgr.org/) was searched for the records included in our sample that were conserved in any of the germplasm collections listed in Genesys. The FAO WIEWS data set was also assessed as part of the review for building our data set of germplasm collections, but its information was already contained in Genesys. The final combined per-species scores were assigned a series of conservation status categories according to Khoury, Amariles, et al. (2019). Finally, a predicted species richness gap analysis was calculated by summing the SDM binary MaxEnt rasters for the 101 CWR species.
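The combined score and its priority category can be sketched as below. This is not the GapAnalysis R package API. The cut-offs of 25, 50 and 75 are assumed from the categories of Khoury, Amariles, et al. (2019), and the function names are hypothetical.

```python
def fcs_mean(fcs_ex, fcs_in):
    """Combined final conservation score: average of the ex-situ and in situ scores."""
    return (fcs_ex + fcs_in) / 2

def priority(score):
    """Map a 0-100 conservation score to a priority category (assumed cut-offs)."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    if score < 25:
        return "HP"  # high priority
    if score < 50:
        return "MP"  # medium priority
    if score < 75:
        return "LP"  # low priority
    return "SC"      # sufficiently conserved

# Hypothetical species with an ex-situ score of 40 and an in situ score of 6.
combined = fcs_mean(40.0, 6.0)
print(combined, priority(combined))
```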
For the cases where CWR species had ex-situ gap analysis results, a PD conservation gap richness indicator was developed. The observed PD raster (10 km) was resampled to fit the spatial resolution of the modelled species (5 km) using a bilinear function. Then, the PD and gap richness maps were standardized dividing by their respective maximum values to obtain values from 0 to 1. Finally, the standardized PD and ex-situ gap richness for further collecting were averaged to generate the PD conservation gap richness indicator.
This new spatial indicator of PD conservation gap richness allowed us to prioritize areas where there are important ex-situ sites that require further ex-situ collecting (i.e. germplasm banks). To complement the PD gap richness indicator, a PD and predicted species richness indicator was calculated as the sum of the binary SDMs for the 101 species instead of using gap richness. This PD and predicted species richness indicator allowed us to prioritize areas where there is a greater concentration of both species richness and PD.
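The construction of the PD and predicted species richness indicator can be sketched with small nested lists standing in for rasters. The sketch assumes the PD grid has already been resampled to the resolution of the suitability grids; all values and function names are hypothetical.

```python
def binarize(grid, threshold=0.5):
    """Label suitability values >= threshold as 1 and the rest as 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in grid]

def sum_grids(grids):
    """Cell-wise sum of equally sized grids (predicted species richness)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*grids)]

def standardize(grid):
    """Divide by the maximum so that values range from 0 to 1."""
    m = max(max(row) for row in grid)
    return [[v / m if m else 0.0 for v in row] for row in grid]

def average(a, b):
    """Cell-wise mean of two equally sized grids."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical MaxEnt suitability grids for two species and an observed PD grid.
suitability = [
    [[0.9, 0.2], [0.5, 0.1]],
    [[0.7, 0.6], [0.4, 0.0]],
]
pd_grid = [[4.0, 8.0], [2.0, 0.0]]

richness = sum_grids([binarize(g) for g in suitability])
indicator = average(standardize(pd_grid), standardize(richness))
print(richness)
print(indicator)
```

The PD conservation gap richness indicator would follow the same steps, with the ex-situ gap richness grid used in place of the summed binary models.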

| Conservation assessment of PD
An assessment of PD conservation was conducted using the spatial intersect tool in QGIS. Grid cells with the highest values of observed SR and PD (top 5-95 quantile), and grid cells with significantly high and low PD and RPD, that overlapped with the protected areas of each department in Colombia were counted. From these counts, a percentage of regional representativeness of PD inside protected areas per department was estimated. This assessment did not include any information from the gap analysis.
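The per-department percentage can be sketched as below. The sketch assumes that the highest-value cells are the top 5% of cells by value, which is one reading of the quantile stated above, and it replaces the QGIS intersect with a simple set lookup. Cell identifiers, values and department assignments are hypothetical.

```python
from collections import defaultdict

def top_cells(values, top_percent=5):
    """Cells ranked in the top given percentage by value (at least one cell)."""
    n = max(1, len(values) * top_percent // 100)
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:n])

def protected_share(hot_cells, department, protected):
    """Percentage of hot cells per department that fall inside protected areas."""
    tally = defaultdict(lambda: [0, 0])  # department -> [protected, total]
    for cell in hot_cells:
        tally[department[cell]][1] += 1
        if cell in protected:
            tally[department[cell]][0] += 1
    return {d: 100 * inside / total for d, (inside, total) in tally.items()}

# Hypothetical PD values for 40 cells split between two departments.
pd_values = {i: float(i) for i in range(40)}
department = {i: "Antioquia" if i % 2 else "Vichada" for i in range(40)}
protected = {39}  # cells intersecting a protected area

hot = top_cells(pd_values)
print(protected_share(hot, department, protected))
```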

| Phylogenetic tree
Our phylogenetic tree comprises 185 CWR species across 11 clades (Figure 1; Figure S2). The tree represents 17 of the 29 major gene pools of global crops prioritized by the CWR diversity project and recognized in the International Treaty on Plant Genetic Resources (Maxted & Kell, 2009).

| Observed patterns of species diversity
We found four main regions of high concentration of CWR SR in

| Observed patterns of PD
We found that the mountainous regions in the northern and central mountain ranges of Colombia host the greatest amount of PD. Macarena. One small group was in the 'Puerto Carreño' region of the Vichada department. These areas of high PD suggest that there is more of the tree and that those taxa were less closely related than would be expected by chance. We found incomplete representation of CWR PD in the Amazon and Orinoquia regions.
Four areas of significantly low RPD were found in places different to significantly low PD areas (locations 1-4 in Figure 2d). This suggests that CWR species in these areas have shorter than expected branches, but are not necessarily more closely related than in the main centre of significantly low PD. These Andean sites might indicate places with the potential for evolutionary adaptation and speciation. The areas of significantly high RPD in the Andean region of Antioquia (location 1 in Figure 2d) indicate branches significantly longer than expected. There were also some scattered grid cells with significantly high RPD in the eastern and northern parts of the country, including the Orinoquia, Amazon and Caribbean regions (Figure 2d). These cases might indicate places with the potential for retaining unique evolutionary history because they have many long branches.

| Conservation Gap analysis
The mean final conservation score (FCSc-mean) across all species was 25.5 on a conservation status scale where 0-25 is very poor and 75-100 is comprehensive, with scores ranging from 0 to 90.96 (Table S3).
The average ex-situ conservation score across species was 33.05 and 18.05 for the in situ conservation (Figure 3). We found that 50 species (52.6%) were assessed as high priority, 36 (37.8%) medium priority and 9 (9.4%) low priority for further collecting to address gaps in ex-situ conservation. For the case of in situ conservation, 73 (76.8%) were assessed as high priority, 21 (22.1%) medium priority and 1 (0.9%) low priority for further collecting to address gaps in in situ conservation.
The ecological representation of CWR species that have been collected for conservation repositories was indicated to be a high priority, with a mean ecological representativeness score (ERSex) of 21.6, compared with 31.1 for the geographic score (GRSex). For the case of ex-situ conservation, a total of 51 species (53.6%) were assessed as high priority, 26 (27.3%) low priority and 18 (18.9%) sufficiently conserved for further collecting to address gaps in ex-situ conservation (Figure 3). The ecological representativeness (ERSin) of CWR species showed a low priority with a mean score of 76.9, compared with 17.4 for the geographic score (GRSin). For this case of in situ conservation, a total of 77 species (81%) were assessed as high priority, 23 (24.2%) medium priority and 1 (0.9%) low priority for further collecting to address gaps in in situ conservation (Figure 3).

FIGURE 1 Phylogenetic tree of 185 species of CWR of Colombia representing 11 clades numbered as the main contributing groups
Although predicted ranges were high based on the ERSin, species were poorly represented in protected areas, with a mean final in situ conservation score (FCSin) of 18.4. In contrast, FCSex presented a score of 33.1, suggesting that more effort needs to be put into further collecting ex-situ diversity of CWR outside protected areas.

| Phylogenetic diversity Gap conservation indicator
The spatial distribution of the PD and predicted species richness indicator shows that the mountainous areas of the Andean region have the highest concentration of PD and richness (Figure 4).

FIGURE 3 Conservation gap analysis results for 95 species of CWR in Colombia. Species are listed by ascending priority for further conservation action by priority categories (HP = high priority; MP = medium priority; LP = low priority; SC = sufficiently conserved). The red X represents the final conservation score combined (FCSc-mean) for the species, which is the average of the final ex-situ (FCSex, green X) and in situ (FCSin, blue X) scores.

richness conservation gaps, these areas are suggested as the main candidates for further collecting of CWR species in Colombia.

| DISCUSSION
Crop wild relatives are a priority because they are genetic resources for agriculture (Miller & Khoury, 2018). Despite recent advances in exploring native biodiversity (Aitken et al., 2008;Faith, 2013;Jarvis et al., 2008;Sgrò et al., 2011;Winter et al., 2013), an important but unanswered question is 'What are the main consequences of PD loss?' (Uchida et al., 2019).
Most PD studies are focused on investigating PD directly associated with species of native flora, but more studies should be conducted linking CWR PD and conservation (Park et al., 2020;Zhang et al., 2015). According to the International Union for Conservation Particularly, exploring the PD of CWR is fundamental to identify cross-compatibility between crops and their wild relatives which is an essential aspect for the future of agriculture (Viruel et al., 2021). Pironon et al. (2020) proposed the idea of unifying a concept of agro-biodiversity when centres of diversity of wild and domesticated biodiversity are considered.

| PD spatial distribution and centres
We analysed spatial patterns of PD for 185 CWR taxa and found novel areas of relevance to global CWR assessments (Figure 1). This

TABLE 1 CWR conservation assessment based on the percentage of observed species richness (SRob), observed PD (PDob), randomized PD (PDr), and randomized RPD (RPDr) hot spots per department present inside protected areas of Colombia

| Centres of biodiversity and PD of CWR: Use in agriculture
Most of the literature on spatial phylogenetics focuses on the analysis of patterns of regional biodiversity (Dagallier et al., 2020; Garcia-R et al., 2019; González-Orozco et al., 2015; Laffan et al., 2016; Mekala et al., 2019; Scherson et al., 2017; Sosa et al., 2018). There are comparatively few applications in the agricultural sciences (Jovovic et al., 2020; Martin et al., 2019; Perales & Golicher, 2014; Turley & Brudvig, 2016). PD can be an effective means to explore the effect of agriculture on evolutionary diversity; for example, Turley and Brudvig (2016) showed that landscapes with an agricultural history had a decrease in PD of plant communities.
They found that plants became more closely related across time, leading to an increase in phylogenetic clustering, and suggesting a homogenization of the diversity of lineages in the tree of life.
Areas of high native biodiversity and agronomical resources of CWR could be a potential source of genetic resources well adapted and resilient to modern challenges (Pironon et al., 2020). In a recent study, González-Orozco (2021) identified three main centres of species richness and 25 areas of high endemism for the native terrestrial species of plants found in Colombia.
In the same region of high diversity of the terrestrial flora of Colombia, we found areas of significantly high CWR PD with an over-representation of short branches. Such patterns are indicative of phylogenetic clustering (Webb et al., 2002). Our results exem-

| Conservation of CWR species and PD
The taxonomic diversity of CWR included in this study represents 18 plant genera and 13 families (Table S2). Solanum, Piper, Ficus and Ipomoea are the genera that have the largest number of CWR species represented in the tree. There are still many other CWR taxa in Colombia that would require attention. It is therefore important that the taxa in Table 2 are used as a starting point in building further species lists for CWR prioritizations in Colombia.
Overall, 52.6% of the wild relatives in this study were assessed as high priority for further preserving in situ and collecting for ex-situ conservation. As additional metrics providing further detail to these results, we recommend using both the PD conservation gap richness and the PD-predicted species richness.
The uncertainty of where to look for relevant regions of biodiversity is a disadvantage in hyperdiverse countries such as Colombia. Therefore, the first task is to identify areas of greatest diversity (González-Orozco, 2021). However, to date, maps of critically important CWR PD have not been available for Colombia. The Informing conservation based on the results of the PD spatial patterns and species relationships is another use of the data (Table 1). For example, they could be used to create a discussion about which part of the tree of life is more strategic to preserve.
However, it is not for us to decide which option is better or worse.
Our PD results should be taken as a guideline but not a decision-making tool. We could ask the question 'Would areas with long branches of limited evolutionary relationships be most important to conserve?' Or would areas with short branches of closely related taxa, and thus potentially a large amount of recent evolution or diversification, be most important to conserve? We could argue that both are equally important because each of them represents a different facet of biodiversity and evolutionary history.
In a country such as Colombia which has extremely high alpha diversity, a decision on protecting areas with a high concentration of closely related taxa would be a logical conclusion. This is the case of the Andean biogeographic region, which hosts most of the hyperdiverse genera of CWR in Colombia. For example, we found that the slopes of the western range showed the main areas of significantly low RPD, meaning a high concentration of short branches, that are closely related to the genera Piper and Solanum. If the aim would be to preserve a younger evolutionary diversification, then these places in those mountainous areas would be the best candidates. However, there are some cradles of important evolutionary diversity such as the Amazon and Orinoquia lowlands. In this case, we could say that preserving areas of significantly high RPD would be the best option for conservation of ancient biodiversity. Despite the low sampling in those regions, we identify a small number of sites with significantly high RPD in the eastern lowlands of the country where genera such as Theobroma, Capsicum and Manihot were found. In the context of a mega biodiverse country with many conservation priorities other than CWR, protecting biodiversity that is younger in origin is a better opportunity than concentrating on older evolutionary clades.
However, some effort needs to be applied to strategic clades because losing ancient biodiversity such as the CWR genus Theobroma or cacao would imply compromising key biodiversity from nearer the root of the CWR tree of life.

| Limitations: Undersampling and taxonomic biases
The genetic data used here are not considered as representative of all CWR present in Colombia. However, our sampling allowed us to build a phylogeny composed of 185 CWR taxa grouped in 11 major clades of global importance (Figure 1). The CWR species in the phylogeny were chosen based on an international database of CWRs (GBIF-CIAT consortium), which is a valid species selection strategy.
Consequently, further selection of CWR species used in PD analysis should consider using a more targeted sampling to improve the representation of native genetic diversity that is underrepresented in the international databases.
Colombia's biogeographic regions are strongly influenced by elevation (González-Orozco, 2021). A gradual increase in height above sea level generated drastic changes in vegetation zones in Colombia.
We found that the main centres of PD and RPD (Figure 2c,d) follow specific elevational trends, which suggests that the CWR present there are adapted to different climate conditions. This study had better elevational representation of CWR from low to mid-elevations (1000-1500 masl). Unfortunately, due to a lack of readily available data, the distribution of CWR at very high or very low elevations is not as well sampled and is therefore underrepresented. In Colombia, the high-altitude zones above 2000 masl are more impacted by land-use changes due to the expansion of cities in the Andes. The other possibility is that less CWR diversity occurs in the highlands. The midlands (below 1500 masl), on the other hand, have been more recently modified by human impacts such as deforestation (Etter et al., 2006, 2008; Debouck et al., 1993; Ramirez-Villegas et al., 2020). However, even for these well-documented and comprehensive crop-related groups, their data are not fully represented in the most common open-source databases such as GenBank, GBIF and BIEN.

| CONCLUSIONS
The CWR in the Amazon and Orinoquia regions are undersampled.
The Andean mountains are the main reservoir of in situ and ex-situ conservation of CWR PD in Colombia. Fifty-two per cent of the CWR species ranked as 'high priority' and were poorly represented in germplasm databases and protected areas that possess the most ideal geo-ecological conditions. The geographic gaps in both ex-situ and in situ conservation of CWR largely aligned with areas of high concentrations of CWR PD and species richness. However, we found new areas of significant PD that are unexplored.
This study is the first attempt to quantify the PD of CWR in Colombia.
We identified one major centre of PD in the northern part of the central and western ranges of the Andean region of Colombia, where we mostly found short branches. The other important areas of PD were in the Pacific region and scattered in the southwest and the east of the country, where we mostly found long branches in the lowlands. There is an urgent need to generate a list of CWR in Colombia that includes the nontraditional and understudied wild crops. If not, we will continue to have an imbalanced understanding of CWR diversity in Colombia.

ACKNOWLEDGEMENT
The authors would like to thank the Corporación Colombiana de Investigación Agropecuaria-Agrosavia for providing funding.

CONFLICT OF INTEREST
The authors declare they have no conflict of interest regarding the data or inferences discussed in this manuscript.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the Supporting Information of this article.