Conservation status assessment of banana crop wild relatives using species distribution modelling

Crop wild relatives (CWR) are an essential source of genetic material for the improvement of certain traits in related crop species. Despite their importance and increasing public, scientific and political support, large gaps exist in the amount of genetic material collected and conserved for many CWR. Here, we construct a dataset on the distribution of wild banana species (Musa spp.) and assess their risk and conservation status. We address the following questions: (a) What areas are potentially suitable for wild banana species? (b) How much of the wild banana diversity is currently at risk or insufficiently conserved ex and in situ?


| INTRODUCTION
Crop wild relatives (CWR) are wild plant species closely related or ancestral to cultivated plants. In comparison with their associated crop, CWR often contain important traits for agriculture that are new or have been lost during domestication (Fielder et al., 2015; Hajjar & Hodgkin, 2007; Heywood et al., 2007). Their genetic resources can be used to provide pest and disease resistance in crops, as well as improved crop fertility, quality, agronomic, phenological or morphological traits (Dempewolf et al., 2017). Many CWR and other wild species are at risk due to increased abiotic and biotic stress related to climate change but especially due to anthropogenic pressure caused by the increasing world population and subsequent habitat loss and/or fragmentation, improper land use and the lack of effective nature conservation strategies (Govindaraj et al., 2015; Heywood et al., 2007).
Substantial efforts have been undertaken to improve CWR conservation (Castañeda-Álvarez et al., 2016). Ex situ conservation strategies have been given most attention, and a large amount of germplasm is already stored in gene banks. Such a strategy also makes it possible to distribute genetic material to crop breeders.
Nevertheless, current ex situ conservation has four major drawbacks. First, only a small proportion of the inter- and intraspecific genetic variation present in CWR is currently stored in gene banks (Castañeda-Álvarez et al., 2016; Guarino & Lobell, 2011). In Europe, only 1,095 CWR taxa (6% of CWR present in Europe) were included in the EURISCO catalogue of ex situ plant genetic resources (Ford-Lloyd et al., 2011). Second, some CWR produce recalcitrant seeds that cannot survive traditional ex situ conservation techniques such as drying and freezing (Bonner, 1990). Third, plants conserved ex situ are not subject to the same evolutionary selection processes as in their native environment, limiting the development of new genetic variation. As a result, adaptation to a gradually changing environment and new biotic interactions is prevented (Heywood, 2016; Meilleur & Hodgkin, 2004). Lastly, to safeguard ex situ collections from disappearing due to local natural or human-caused disasters, safety duplication of accessions is strongly encouraged. Currently, duplication of germplasm is not always documented and many accessions might not be duplicated at all (FAO, 2014).
It is thus essential to complement ex situ collections with in situ conservation strategies as a resource for future breeding strategies.
Moreover, some species cannot be established or regenerated outside their natural habitat, either because of their complex ecosystem interactions or because seed dormancy cannot be broken by known (artificial) methods (FAO, 1997; Rasmussen et al., 2015).
For example, long-term ex situ conservation of vanilla crop wild relatives is complicated due to their obligate association with mycorrhizal fungi, requiring both plant and mycobiont to be conserved simultaneously (e.g. by means of cryopreservation) (Flanagan et al., 2019; Merritt et al., 2014). In situ conservation comprises the conservation of species and their genetic variability in populations as well as the ecosystem in which they thrive. It involves many different procedures, starting from the selection of a target species to the management and monitoring of designated areas (Hunter & Heywood, 2011).
A first step in establishing a strategy for both in and ex situ conservation is a gap analysis, that is, determining where high levels of genetic variation of a selected species might be located and to what extent these species are already represented in gene banks or protected locally. Accordingly, habitats or ecosystems that need additional protection can be readily identified (Maxted et al., 2008). In particular for species with a poorly known distribution range, a gap analysis approach often requires advanced modelling tools to overcome the need for information about species' absences and consequently demands very large datasets containing occurrence records.
Modelling approaches based on presence-only data compensate for the lack of location data when modelling distribution ranges and can therefore aid in the establishment of conservation and management strategies for threatened species (Bosso et al., 2013; Khoury et al., 2015; Phillips et al., 2006).
With a production of over 125 million tonnes each year, bananas are considered one of the most important fruit crops in the world (FAO, 2018). However, considerably fewer conservation efforts and strategies exist for their wild relatives than for those of rice, wheat and maize (Castañeda-Álvarez et al., 2016). Bananas belong to the genus Musa (Häkkinen & Väre, 2008; Perrier et al., 2011).
Edible bananas are diploid, triploid or tetraploid hybrids, typically containing genetic information from M. acuminata subspecies (the "A" genome) and from M. balbisiana (the "B" genome). A few cultivars also contain genetic information from M. schizocarpa N.W.Simmonds (the "S" genome), such as in East African Highland Bananas, and genetic information from species of the Australimusa section (the "T" genome) (Carreel, 1994; Němečková et al., 2018). Fe'i bananas are another group of rare, edible bananas belonging to the former Australimusa section and were domesticated independently of M. acuminata and M. balbisiana (Ploetz et al., 2007). The presence of the M. balbisiana genome is often associated with drought tolerance and Xanthomonas resistance not found in Musa acuminata, but M. balbisiana genetic resources are currently largely underused due to incorporated sequences of the endogenous banana streak virus, a Badnavirus (Duroy et al., 2015). The narrow genetic base of current cultivated bananas and their limited fertility are major constraints on further improvement through classical breeding (Brown et al., 2017). (Dita et al., 2018).
In the late 1980s, a new strain of Fusarium oxysporum f. sp. cubense (tropical race 4, TR4) started to infect cultivars of the Cavendish subgroup, which had been selected in the past because of their resistance against TR1 (Pérez-Vicente, 2004; Ploetz, 2015). While the distribution of TR4 was restricted to East and parts of Southeast Asia for a long time, it was more recently discovered in Jordan and in banana-growing regions in Africa (Garcia-Bastidas et al., 2014; Zheng et al., 2018). Alarmingly, TR4 was recently also detected in Colombia and thus for the first time in Latin America (Garcia-Bastidas et al., 2019). Together with more extreme weather events associated with climate change, this spread makes the conservation of wild bananas even more important, as the wild material can serve as a potential source of disease resistance (e.g. against TR4) or drought tolerance (Castañeda-Álvarez et al., 2016; Heslop-Harrison & Schwarzacher, 2007; Zuo et al., 2018). Wild banana species are typically diploid and can vary in numerous traits such as height, flower and fruit shape and colour (Figure 1). They can be subdivided into two sections: the Callimusa section (former sections Australimusa, Callimusa and Ingentimusa) and the Musa section (former sections Musa and Rhodochlamys) (Häkkinen, 2013).
Several efforts have already been made to conserve wild banana germplasm ex situ in the form of seeds, in vitro cultures, cryopreserved material or living plants. Most of the available germplasm is kept as in vitro cultures or frozen meristems at the International Musa Germplasm Transit Centre (ITC) in Belgium (Panis et al., 2005; Van den houwe et al., 1995, 2003). Collection missions in the past mainly focussed on Musa acuminata subspecies, M. balbisiana and diploid and triploid cultivated varieties to serve as a potential source of genetic resources for banana breeders. However, little is known about the natural distribution of many wild species and consequently specific collecting and in situ conservation strategies are missing for these species. The botanical knowledge needed to identify species correctly is scarce because good herbarium material is lacking, a consequence of the large, fleshy architecture and ephemeral flowers of the plants, and molecular methods are often needed for a correct species distinction (Liu et al., 2002). Moreover, most wild species occur in remote and often inaccessible areas that require substantial travelling to reach. In addition, many of the tropical and subtropical regions in Southeast Asia are heavily understudied and field missions are needed to further map the distribution of wild Musa species (Sardos et al., 2018).
In this study, we establish a comprehensive dataset containing georeferenced occurrence records of wild banana species and subspecies. Subsequently, potential species distributions are modelled with MaxEnt (Phillips et al., 2006) using presence-only data to over-  (Mittermeier et al., 2011;Olson et al., 2001).

| Occurrence data
We compiled a dataset of occurrence records of wild Musa species and subspecies by combining information from known in situ and ex situ collections (e.g. Millennium Seed Bank, the International Musa Germplasm Transit Centre) and from other well-known databases (e.g. Naturalis Biodiversity Centre, Global Biodiversity Information Facility, Genesys PGR). Presence data obtained from scientific articles and recent field missions in Vietnam, Papua New Guinea and Bougainville were also included. Accurate locality descriptions without coordinates were georeferenced using Google Earth pinpoints (Google LLC, 2018). For some taxa, occurrences were obtained at the subspecies level and are referred to as species throughout the article. Duplicate records, outliers, zero coordinates, records in centroids of provinces and countries and erroneous occurrences in the sea were removed with the R package "CoordinateCleaner" (Zizka et al., 2019). Accession names were compared and adjusted to their currently accepted name according to the World Checklist of Selected Plant Families (WCSP, 2018). Data were thinned to a maximum of one occurrence per species per raster cell of 30 arcseconds to avoid strong autocorrelation between environmental variables.
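The thinning step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the grid origin at (-180, 90) and the rule of keeping the first record encountered in each cell are assumptions made for illustration.

```python
import math

# 30 arcseconds expressed in decimal degrees (the raster resolution named in the text).
CELL_DEG = 30 / 3600

def cell_index(lon, lat, cell=CELL_DEG):
    """Return the (column, row) of the grid cell containing a coordinate.

    Assumption: the grid starts at longitude -180 and latitude 90,
    a common convention for global rasters.
    """
    col = math.floor((lon + 180.0) / cell)
    row = math.floor((90.0 - lat) / cell)
    return col, row

def thin_records(records, cell=CELL_DEG):
    """Keep at most one record per species per grid cell.

    `records` is an iterable of (species, lon, lat) tuples.
    Assumption: the first record found in a cell is the one retained.
    """
    seen = set()
    kept = []
    for species, lon, lat in records:
        key = (species, cell_index(lon, lat, cell))
        if key not in seen:
            seen.add(key)
            kept.append((species, lon, lat))
    return kept

records = [
    ("Musa itinerans", 105.0010, 21.0010),
    ("Musa itinerans", 105.0020, 21.0020),   # same cell as the record above
    ("Musa itinerans", 105.0500, 21.0010),   # different cell
    ("Musa balbisiana", 105.0010, 21.0010),  # same cell, different species
]
thinned = thin_records(records)
```

Thinning is done per species, so two species recorded in the same cell are both retained.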
While methods exist to identify the minimum required sample size (van Proosdij et al., 2016), we set the minimum number of records to infer relationships between species and environmental conditions for each species at five (Appendix S1, sheet 1) (Raes et al., 2014).

| Environmental data
Current climatic conditions were represented by 19 bioclimatic variables obtained from the WorldClim 2 database with a spatial resolution of 30 arcseconds. The data represent average monthly climate data for 1970-2000 (Fick & Hijmans, 2017). The Maximum Green Vegetation Fraction was downloaded at a 30 arcsecond resolution from the USGS Land Cover Institute (Broxton et al., 2014). Digital elevation models (DEM) of Asia, Southeast Asia and Australia with a 30 arcsecond spatial resolution were retrieved, subsequently combined and aligned to fit the same dimensions and number of raster cells as the layers containing bioclimatic information. Additionally, slope and aspect (i.e. slope direction) were derived from the DEM using the terrain function in the "raster" package in R (Hijmans, 2020), resulting in a final set of 23 environmental variables (Appendix S2, Table S1).

| Distribution modelling
Species-specific variable selection was carried out based on random forests with the VSURF_thres function in the package "VSURF" in R (Genuer et al., 2019). Using 50 random forest runs of 2,000 trees each, variables were ranked from high to low variable importance (VI). A threshold was estimated based on the standard deviations of variable importance, and variables with a VI lower than the threshold were eliminated. Subsequently, the remaining variables were compared, in order of importance, with the top five predictors, and those with a Pearson's correlation coefficient larger than 0.7 were excluded (Appendix S1, sheet 2).
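The correlation filter can be sketched as follows. This is one possible reading of the step rather than the authors' code: the function names are hypothetical, the comparison uses the absolute value of Pearson's r, and the top-ranked predictors are always retained. The random forest ranking itself (VSURF) is not reproduced here; the sketch takes the ranking as input.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def filter_correlated(ranked, values, n_top=5, max_r=0.7):
    """Drop lower-ranked variables that correlate strongly with a top predictor.

    `ranked` lists variable names from high to low importance and
    `values` maps each name to its values at the occurrence locations.
    Assumption: the first `n_top` variables are kept unconditionally and
    |r| (not signed r) is compared against `max_r`.
    """
    top = ranked[:n_top]
    kept = list(top)
    for name in ranked[n_top:]:
        if all(abs(pearson(values[name], values[t])) <= max_r for t in top):
            kept.append(name)
    return kept

# Toy example with two "top" predictors instead of five, for brevity.
values = {
    "bio12": [1, 2, 3, 4, 5],
    "bio4": [1, -1, 0, -1, 1],
    "bio16": [2, 4, 6, 8, 11],   # nearly collinear with bio12
    "slope": [3, 1, 5, 2, 4],
}
selected = filter_correlated(["bio12", "bio4", "bio16", "slope"], values, n_top=2)
```

In the toy data, bio16 is removed because it is almost perfectly correlated with bio12, whereas slope is retained.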
The optimal combination of MaxEnt features (linear, quadratic, product) and regularization parameters (ranging from 0.1 to 10) for developing the models was selected using the ENMevaluate function in the R package "ENMeval" (Muscarella et al., 2014), using the random k-fold method to partition occurrence and background localities. As biologically meaningful thresholds are unknown or assumed, hinge and threshold features were excluded (Gomes et al., 2018; Merow et al., 2013). Treating the SDMs as a good proxy for the true species' distributions, we transformed them to binary species distribution maps using the maximized sum of sensitivity and specificity as threshold (Khoury et al., 2020; Liu et al., 2005). To model the area with suitable environmental conditions for each banana species, the values of the included variables were extracted at each occurrence location and at a maximum of 5,000 background points. We performed a species-specific background point selection as described in Khoury et al. (2020), that is, by limiting the background of each species to the ecoregions and countries of the original occurrence locations. Species distributions were modelled with the maximum entropy algorithm implemented in MaxEnt (Phillips et al., 2006) using the maxent function in the R package "dismo". This presence-background modelling software was developed to cope with presence-only (PO) data by contrasting them with a sample of background locations drawn from the study area where the presence of a species is unknown (Merow et al., 2013). It can compete with or even outperform other methods (e.g. ANN and GLM), in particular for small sample sizes and when species have a limited distribution (Aguirre-Gutiérrez et al., 2013; Elith et al., 2006, 2011; Williams et al., 2009).
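The threshold rule mentioned above (maximized sum of sensitivity and specificity) can be sketched in a few lines of Python. The function names are hypothetical, and two choices are assumptions: specificity is computed against background points, as is usual in a presence-background setting, and ties are resolved in favour of the lowest candidate threshold.

```python
def max_sss_threshold(presence_scores, background_scores):
    """Return the suitability threshold maximizing sensitivity + specificity.

    Sensitivity: share of presence points with a score >= threshold.
    Specificity: share of background points with a score < threshold.
    Candidate thresholds are the observed scores themselves.
    """
    candidates = sorted(set(presence_scores) | set(background_scores))
    best_threshold, best_sum = None, -1.0
    for t in candidates:
        sensitivity = sum(s >= t for s in presence_scores) / len(presence_scores)
        specificity = sum(s < t for s in background_scores) / len(background_scores)
        if sensitivity + specificity > best_sum:  # strict: the first maximum wins
            best_threshold, best_sum = t, sensitivity + specificity
    return best_threshold

def to_binary(scores, threshold):
    """Convert continuous suitability scores to a presence (1) / absence (0) map."""
    return [1 if s >= threshold else 0 for s in scores]

presence = [0.9, 0.8, 0.7, 0.6]
background = [0.1, 0.2, 0.3, 0.4]
threshold = max_sss_threshold(presence, background)
binary_map = to_binary([0.05, 0.45, 0.6, 0.95], threshold)
```

With real model output the two score distributions overlap, so the maximum is usually below the perfect value of 2 reached in this toy example.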
To produce and evaluate each SDM, occurrence records were split into training and testing data using a cross-validation approach with ten replicates for species with more than 10 occurrences or five replicates for species with fewer than 10. Distribution models were then calculated as the median of these replicates. Model evaluation was based on three different metrics: the area under the curve (AUC), the standard deviation of the AUC between replicates (SDAUC) and the proportion of the potential distribution model with a standard deviation > 0.15 (ASD15). Species with an AUC above 0.7, SDAUC < 0.15 and an ASD15 < 0.10 are considered stable (Ramírez-Villegas et al., 2010). Based on the models that passed these criteria, a species richness map was created. For species where no robust model could be generated, a buffer of 0.5 degrees (~50 km radius) was created around each occurrence record (Khoury et al., 2019).
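The three stability criteria can be expressed as a small check. The sketch below uses the thresholds stated in the paragraph above; the function name is hypothetical, and summarizing the replicate AUC values by their mean and sample standard deviation is an assumption, since the text does not specify these details.

```python
from statistics import mean, stdev

def is_stable(auc_replicates, asd15, min_auc=0.7, max_sd_auc=0.15, max_asd15=0.10):
    """Apply the three model evaluation criteria to one species.

    `auc_replicates`: AUC of each cross-validation replicate.
    `asd15`: proportion of the predicted distribution whose standard
             deviation across replicates exceeds 0.15.
    Assumption: AUC is summarized by the mean of the replicates and
    SDAUC by their sample standard deviation.
    """
    return (
        mean(auc_replicates) > min_auc
        and stdev(auc_replicates) < max_sd_auc
        and asd15 < max_asd15
    )

consistent = is_stable([0.82, 0.85, 0.79, 0.88, 0.81], asd15=0.04)
erratic = is_stable([0.95, 0.55, 0.90, 0.50, 0.92], asd15=0.04)
uncertain_area = is_stable([0.82, 0.85, 0.79, 0.88, 0.81], asd15=0.20)
```

The second example fails on the spread of its AUC values even though its mean AUC exceeds 0.7, which is the situation the SDAUC criterion is meant to catch.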
As species distributions are not limited by ecoregion or country borders, we ran a complementary analysis without restricting the background selection to specific countries or ecoregions. Because not all areas in the study region have been sampled to the same degree, we created a bias layer using all data records of all Musa species as target group (Appendix S2, Figure S1) (Rinnan, 2015). Sampling background points from a layer representing sampling bias has been shown to greatly improve model performance (Phillips et al., 2009; Syfert et al., 2013). This allows us to assess species richness for the study area, without excluding countries or ecoregions that had no occurrence records in our dataset. For this set of analyses, 10,000 background points were sampled and MaxEnt's standard settings were used together with a regularization parameter of one.
and other records from herbarium and field observation data were scored as "H." For G records, buffers of 0.5 degrees (~50 km radius) were made (CA50). The indicator is based on the calculation of six metrics, three each for in and ex situ conservation [Sampling Representativeness Score (SRS), Geographical Representativeness Score (GRS) and Ecological Representativeness Score (ERS)].
Based on the average of these three metrics, final in and ex situ conservation scores (FCS_in and FCS_ex, respectively) were calculated.
Combined conservation scores (FCS_c) were then used to determine the indicator score for the Musa species assessed in this study.
In this section, we used Musa balbisiana var. balbisiana as a case study to explain in detail the assessment of the combined conservation score.

$$\mathrm{ERS}_{ex} = \frac{\text{number of ecoregions represented within CA50 of G records}}{\text{number of ecoregions represented within SDM}} \times 100$$

$$\mathrm{SRS}_{in} = \frac{\text{number of occurrences in protected area}}{\text{total number of occurrences}} \times 100$$
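The two ratios given above, together with the averaging of three metrics into a final score, can be written as a short Python sketch. The function names and the input numbers are hypothetical, returning 0 for an empty denominator is an assumption, and the remaining metrics are not reproduced because their formulas are not given in this text.

```python
def percentage(part, whole):
    """Return part / whole as a percentage.

    Assumption: an empty denominator yields a score of 0.
    """
    if whole == 0:
        return 0.0
    return 100.0 * part / whole

def srs_in(n_in_protected_area, n_total):
    """In situ Sampling Representativeness Score: share of occurrences in protected areas."""
    return percentage(n_in_protected_area, n_total)

def ers_ex(n_ecoregions_ca50, n_ecoregions_sdm):
    """Ex situ Ecological Representativeness Score: share of the ecoregions
    in the modelled distribution that are covered by the CA50 buffers of G records."""
    return percentage(n_ecoregions_ca50, n_ecoregions_sdm)

def final_score(srs, grs, ers):
    """Final conservation score as the average of the three metrics."""
    return (srs + grs + ers) / 3

# Hypothetical inputs, for illustration only (not values from the study).
example_srs_in = srs_in(3, 12)
example_ers_ex = ers_ex(2, 8)
example_fcs = final_score(25.0, 10.0, 40.0)
```

Each score ranges from 0 to 100, with higher values indicating better representation.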

| Distribution of wild banana
Fifty-nine wild Musa species with more than five observations per species were found, resulting in 1,511 georeferenced, unique records in the study area (Figure 2). A total of 147 of those are conserved ex situ. The dataset includes the species name, coordinates, source type and unique identification numbers of each occurrence (Appendix S1, sheet 1).

| Distribution models
Out of the 59 species, 41 of the modelled predictions (70%) had an AUC > 0.7, an SDAUC < 0.15 and an ASD15 < 0.10 and were considered robust. Eight out of the 18 species that did not pass these criteria had fewer than 10 occurrence records (Appendix S1, sheet 3).
The predicted range of these 41 species and buffered maps of the species that did not pass the criteria are included in the appendix (Appendix S3). Clear differences between modelled species distributions were found. While some species models suggest a rather broad predicted distribution of the species, for example M. itinerans
FIGURE 2 Study area, ranging from north-eastern India to Australia including all islands in between. All 1,511 occurrences of 59 Musa species are marked on the map in different colours. Species-specific coordinates can be found in the occurrence list (Appendix S1)

| Risk assessment
Preliminary risk assessment with "ConR" based on IUCN criterion B indicated that 11 out of 59 Musa species are currently vulnerable and 9 are currently endangered. Most Musa species analysed in this study were considered as of least concern for future conservation efforts or as near-threatened (i.e. could become threatened in the near future). Occurrence in protected areas varied greatly between species, ranging from 0% for 11 species to 60% for Musa exotica R.V.Valmayor, with an average of 13.9% for the genus (Table 1). Table S2.
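For orientation, the size thresholds of IUCN criterion B can be sketched as below. This is a deliberately simplified illustration and not a reproduction of the ConR workflow: the function names are hypothetical, and the sketch uses only the extent of occurrence (EOO) and area of occupancy (AOO) limits, ignoring the additional subconditions (such as number of locations, continuing decline or extreme fluctuations) that a full criterion B assessment requires.

```python
# Upper limits in km2 for each category under IUCN criterion B.
EOO_LIMITS = [(100, "CR"), (5000, "EN"), (20000, "VU")]   # B1, extent of occurrence
AOO_LIMITS = [(10, "CR"), (500, "EN"), (2000, "VU")]      # B2, area of occupancy

# Categories ordered from lowest to highest threat.
SEVERITY = ["LC or NT", "VU", "EN", "CR"]

def size_category(area_km2, limits):
    """Return the category implied by an area and a list of (limit, category) pairs."""
    for limit, category in limits:
        if area_km2 < limit:
            return category
    return "LC or NT"

def criterion_b_size_only(eoo_km2, aoo_km2):
    """Return the more threatened of the EOO-based and AOO-based categories.

    Simplification: the subconditions of criterion B are not evaluated,
    so the result is only a preliminary, size-based indication.
    """
    categories = (
        size_category(eoo_km2, EOO_LIMITS),
        size_category(aoo_km2, AOO_LIMITS),
    )
    return max(categories, key=SEVERITY.index)

widespread = criterion_b_size_only(eoo_km2=50000, aoo_km2=5000)
small_range = criterion_b_size_only(eoo_km2=15000, aoo_km2=3000)
few_sites = criterion_b_size_only(eoo_km2=150000, aoo_km2=400)
```

Because AOO depends on the number of occupied grid cells, sparsely sampled species can receive a high threat category even when their EOO is large, as in the third example.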

| Suitable area
Species richness maps produced with SDMs that passed the evaluation criteria suggest large differences in environmental suitability throughout the study area. Next to the high number of different  (Janssens et al., 2016), a pattern that was also found in Impatiens L. species (Janssens et al., 2009) and likely applies for many other genera. This biodiversity hotspot is in general characterized by a high species richness and is believed to contain the sixth most endemic genera and species in the world (Mittermeier et al., 2011).
FIGURE 3 Species richness maps based on species that passed the criteria. (a) Species richness with background points constricted to countries and ecoregions in which occurrence records were present; (b) species richness based on analyses using the full study area as background
TABLE 1 Partial assessment of conservation status of wild bananas based on IUCN criterion B. EOO, extent of occurrence; AOO, area of occupancy. IUCN categories are determined and designated as follows: LC or NT, least concern or near-threatened; VU, vulnerable; EN, endangered
A drawback of restricting the background to countries where occurrences have been found is that countries that might have a suitable environment are completely ignored. Therefore, we did an additional analysis using the full study area as background. This resulted in simi-
Bananas require a high amount of annual precipitation, ranging between 1,100 and 2,690 mm and evenly distributed throughout the year (BIO12) (van Asten et al., 2011; Robinson & Alberts, 1986). Longer periods of dry soil might lead to root tip death and increased susceptibility to pathogens (BIO14, BIO18) (Nelson et al., 2006; Ochola et al., 2015; Turner et al., 2007). These requirements largely coincide with those of lowland tropical rain forests, which are here confirmed to be most environmentally suitable for most bananas in this study.

| Risk assessment and conservation status
Our extinction risk assessment with ConR suggests that nine wild package (Dauby et al., 2017). While many of the occurrence records included in this study were sampled in the last two to three decades, others were collected much earlier (e.g. Argent, 1976; Hotta, 1947; Simmonds, 1956). Including older occurrence records might lead to an overestimation of the current distribution, as these populations are more likely to have gone extinct than more recently sampled populations.
To date, at least one wild banana native to north-eastern Queensland, Musa fitzalanii F.Muell., has been reported as extinct or critically endangered in the wild. It has only been reported from a type specimen in the herbarium of Queensland (Pollefeys et al., 2004).
Another example is M. mannii H.Wendl. ex Baker, which was thought to be extinct for over 120 years but was recently rediscovered in north-eastern India, collected ex situ and listed as critically endangered in the wild (Häkkinen & Väre, 2008; Joe et al., 2014).
The extinction risk assessment (IUCN criterion B) suggests that, based on the size of their distribution, 39 wild species are currently listed as of least concern or as near-threatened. As only 5% of wild banana species were indicated as sufficiently conserved or of low priority for additional in and ex situ conservation, these species are prone to become threatened in the near future. Eleven out of 20 species that were classified as endangered (EN) or vulnerable (VU) in the risk assessment were marked as high priority for further conservation in the gap analysis based on their final conservation score.
Additionally, none of these species except M. coccinea Andrews had georeferenced records that are conserved ex situ, making greater conservation efforts even more critical for those species. The assessment of in situ conservation status assumes that plants in protected areas are effectively protected, yet occurrence in a protected area does not necessarily mean that the taxon of interest is being protected. The level of in situ conservation priority is therefore likely an underestimation compared to the reality as protected areas include both strictly and less strictly protected areas, as well as multiuse protected areas with zones of integrated management where some species (bananas in this case) are not actively protected (Ferraro
While not all ex situ collections (and coordinates) are currently available in online databases for many species (Khoury et al., 2019), we had access to the information of the International Musa Germplasm Transit Centre (ITC), the largest Musa germplasm collection in the world. Still, there are big differences between ex and in situ conservation scores (9.17 vs 41.63, respectively), suggesting that many wild banana species, like other CWR, are in need of additional ex situ conservation (Castañeda-Álvarez et al., 2016).
There are many local banana collections in the world (Bioversity International, 2019), but most focus on preserving cultivars rather than wild species. While a high number of wild accessions are present in some collections (e.g. UPLB in the Philippines, RIF in Indonesia, NTBG in Hawaii), collection-specific information and georeferenced localities are either unknown or unavailable to the public.

| Considerations on species importance for crop improvement
Because wild banana species belong to the same genus as the crop, they can all be considered as CWR. However, it is unlikely that all have the same value for crop improvement. As cultivated bananas are vegetatively propagated and typically have seedless fruits, considerable breeding effort, time and resources are required to develop more resistant varieties (Batte et al., 2019; Brown et al., 2017; Ortiz, 2013; Ortiz & Swennen, 2014). For this reason, the use of CWR has been more successful in conventional breeding programmes for crops such as maize, rice and wheat that are not limited to clonal propagation (Dempewolf et al., 2017). However, new molecular techniques such as genomic prediction in banana might facilitate the process and will further increase the importance of banana CWR. Moreover, cultivated bananas are derived from hybridization between Musa acuminata subspecies (A genome) and M. balbisiana (B genome), M. schizocarpa in some cultivars (S genome) and some Australimusa species (T genome) (D'Hont et al., 2000). This indicates that some species might be more interesting for breeding programmes than others. For example, cross-compatibility has  & de Vicente, 2010; D'Hont et al., 2000; Shepherd, 1999). banksii and M. balbisiana had a conservation score of 32.24, indicating that conservation efforts have already been successful but still need to be intensified. Nine members of the former Rhodochlamys section were included and seven are of high priority for further conservation (with an average FCS_c of 23.71). Because of their high tolerance to drought and resistance to Fusarium wilt and leaf spot disease, more attention needs to be given to their conservation (Uma et al., 2006). Members of the Australimusa section and potential progenitors of the Fe'i bananas are insufficiently conserved (FCS_c of 18.38 for the section) and 10 out of 13 included species are of high priority for further conservation. Here, M. bukensis is indicated to be absent in germplasm collections and in protected areas. Seeds of this species were, however, recently collected and conserved ex situ at Meise Botanic Garden (Sardos et al., 2018).

| CONCLUSION
While bananas are one of the most important fruit crops and many efforts exist in conserving large numbers of varieties, both in and ex situ conservation of their wild relatives is limited. With a less stable future climate and large-scale deforestation, collection needs to be accelerated for the conservation of species and important adaptive traits for crop improvement. We find that the highest Musa species richness is likely found in the north-eastern states of India and along the south China-northern Vietnam border. Based on a partial IUCN assessment, 20 out of 59 assessed wild species are considered vulnerable or endangered. The ex situ conservation assessment indicated that three species are of low priority for further conservation while 48 are of high priority because they are in need of further collecting or completely absent in germplasm collections. Thirteen out of 59 species are of low priority for additional in situ conservation, though it is hard to assess whether bananas are actively being protected and whether the conserved plants are good representatives of the gene pool of their species. Little is still known about many wild banana species and specific information on their distribution (e.g. georeferenced localities) is often scarce or insufficient for generating reliable SDMs.
Hence, there is a great need for supplementary field missions. Based on the species distribution and species richness maps that are provided, researchers have an indication where new individuals could be located. Therefore, our approach forms a basis for developing a proper collecting strategy. In the context of climate change, a follow-up study assessing the effect of different climate scenarios (according to the IPCC) on the distribution of wild Musa species might provide additional information on their conservation threat.

ACKNOWLEDGEMENTS
The authors are grateful to all donors who supported this work through their contributions to the CGIAR Fund (https://www.cgiar.

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13233.