Modelled distributions and conservation priorities of wild sorghums (Sorghum Moench)

To fill knowledge gaps regarding the distributions, ecogeographic niches and conservation status of sorghum's wild relatives (Sorghum Moench).


| Crop wild relatives
Crop wild relatives (CWR) are the close genetic relatives of domesticated crops, including their progenitors. In addition to providing unique ecosystem functions and biotic interactions in their native environments, CWR represent key sources of genetic material for introduction into crop lines through plant breeding. The use of CWR by agricultural scientists has been regular practice since the 1940s (Meilleur & Hodgkin, 2004) and has contributed to the development of new lines of many globally important crops (Dempewolf et al., 2017; Hajjar & Hodgkin, 2007). Recently, CWR have been included in the tools used to increase the range of conditions in which crops can be grown, as well as to bolster adaptability to changing climatic conditions and pathogens (Dempewolf et al., 2017). Despite their current and potential value, many CWR are threatened by habitat loss and degradation (Fischer & Lindenmayer, 2007; Kell et al., 2011), invasive species (Díaz et al., 2006; Ford-Lloyd et al., 2011) and climate change (Jarvis, Lane, et al., 2008). A variety of CWR conservation efforts are forming a response (Khoury, Greene, et al., 2019), both ex situ (in botanic gardens and seed banks) and in situ (in protected areas). A lack of representativeness of species and their intraspecific diversity has been recognized in genebanks and in protected areas (Heywood et al., 2007; Khoury, Amariles, Soto, Diaz, Sotelo, Sosa, Ramírez-Villegas, Achicanoy, Velásquez-Tibatá, et al., 2019; Maxted et al., 2013).

| Domesticated sorghum
Here, we refer to domesticated sorghum as the many varieties of the species Sorghum bicolor (L.) Moench, including the cultivated varie-  (Fuller & Stevens, 2018). Today, it is grown on every inhabited continent and is the fifth-most important cereal crop globally in terms of tons produced (FAO, 2019). Its predominant use remains human consumption, especially as a grain in sub-Saharan Africa, with its ability to grow without fertilizer being advantageous in subsistence systems (Hadebe et al., 2017). There is also widespread use of sorghum in the production of syrup and alcoholic beverages, and a growing market for gluten-free products. In developed countries, its major use is as animal feed, with pigs and chickens fed on the grain and cattle fed on the stem and leaves (Ronda et al., 2019). Sorghum is also grown for bioethanol production, with yield per hectare generally equalling that of maize and exceeding it under dry conditions (Putnam et al., 1991). One of sorghum's most notable agronomic traits is its superior drought and heat tolerance compared with other cereals (Dai, 2013; Hadebe et al., 2017; Rosenow & Clark, 1981).
Like many domesticated crops, sorghum exhibits genetic uniformity as a result of intensive selection for traits such as drought resistance and yield (Doebley et al., 2006). Sorghum diversification breeding with CWR has not advanced as far as in other major cereal crops, in part due to incompatibility constraints (Hodnett et al., 2005).
Fortunately, the introgression of traits from CWR into sorghum has recently become more achievable with the advent of S. bicolor lines that do not arrest the growth of pollen tubes of other species (Kuhlman et al., 2010). Hybrids have since been made by crossing S. bicolor with Sorghum macrospermum E. D. Garber (Kuhlman et al., 2010), and also with sugarcane (Saccharum L. spp.) (Hodnett et al., 2010). Genetic modification research in sorghum has also advanced due to the development of new transformation techniques with success rates of up to 20.7% (Liu & Godwin, 2012), compared with just 0.286% in the first published attempts (Casas et al., 1993).
This progress potentially allows a greater use of wild Sorghum Moench (and other genera), which cannot be crossed with the crop using conventional techniques.

| Sorghum's wild relatives
The genus Sorghum is currently considered to contain 22 wild taxa, whose collective range extends from Australia to the Pacific Islands, Southeast, East and South Asia, Central America and much of sub-Saharan Africa (Table 1). Seventeen wild taxa are native to Australia, with 13 being endemic, even though the crop itself was domesticated in Africa (Dillon, Shapter et al., 2007). Despite having a negligible contribution to the domestication of globally important crops, Australia's proximity to Asia and the Pacific Islands has engendered a surprising diversity of CWR, including those of sorghum, bananas and rice (Norton et al., 2017).
The majority of Australian Sorghum taxa are located in the northern, monsoonal region of the country (Andrew & Mott, 1983; Lazarides et al., 1991), mainly occurring in the Northern Territory, Western Australia and Queensland. Sorghum trichocladum (Rupr. ex Hack.) Kuntze is the only species native to the Americas, with a distribution between southern Mexico and Honduras. The five remaining taxa are distributed across Africa and Asia, including the two taxa most closely related to domesticated sorghum, S. bicolor subsp. verticilliflorum and Sorghum propinquum (Kunth) Hitchc., which respectively have broad distributions across sub-Saharan Africa and eastern Asia.
Most wild Sorghum taxa are able to adapt to a range of edaphic conditions and collectively cover a broad range of habitats, including rocky slopes, sand dunes, grasslands and forests (Lazarides et al., 1991). This suggests that the CWR might contain high levels of genetic variation across populations.
Various traits of sorghum's CWR have already been identified as potentially useful for introduction into S. bicolor, including resistance to pests such as sorghum shoot fly and spotted stem borer (Kamala et al., 2009;Venkateswaran, 2003), resistance to sorghum downy mildew (Kamala et al., 2002) and low cyanogenic glucoside concentrations (Cowan et al., 2020; Table 2). There is interest in expanding sorghum's environmental tolerance, especially tolerance to colder climates (Fiedler et al., 2016;Yu & Tuinstra, 2001) have also historically been used as food sources by Dagoman people (Arndt, 1961), showing that they are already palatable and may even show promise as new crops themselves. Unfortunately, much about the life history and conservation status of sorghum's wild relatives has not yet been documented (Ananda et al., 2020).
This study aims to provide a further understanding of the ecogeographic adaptations, distributions and conservation status of wild sorghums. To do this, we characterized the climatic and topographic niches of wild Sorghum taxa, calculated species distribution models using occurrence information combined with climatic and topographic data, used these models to assess the current conservation of wild Sorghum taxa both ex situ and in situ and conducted preliminary threat assessments for the taxa.

| Study taxa
In this paper, we analysed the distribution and conservation status of all 22 known wild taxa of the genus Sorghum as listed by USDA ARS NPGS (2020; Table 1). Landraces of domesticated sorghum, referred to as "wild" by some authors (Mace et al., 2013), were not included.
Cleistachne sorghoides Benth. was included as part of the genus in this study due to molecular evidence placing it within the Sorghum clade (Dillon, Lawrence, et al., 2007;Liu et al., 2014;Sun et al., 1994), despite its nomenclature not yet reflecting this evidence ( which were produced through hybridization of domesticated sorghum with wild taxa. Sorghum halepense is commonly found beyond its native range and is considered a noxious weed in many regions (Holm et al., 1977), compounding its lack of suitability for this study.
Sorghum bicolor subsp. verticilliflorum and S. propinquum are considered part of sorghum's primary gene pool, with all other taxa being in the tertiary gene pool (Harlan & de Wet, 1971; USDA ARS NPGS, 2020; Table 1). Taxonomic names were standardized as per USDA ARS NPGS (2020).

| Occurrence data
Occurrence data were compiled from the Global Biodiversity

| Species distribution modelling
Species distribution models were created using the maximum entropy (MaxEnt) algorithm (Phillips et al., 2006) in the R package "dismo" (Hijmans, Phillips, et al., 2017). Following Khoury, Amariles, Soto, Diaz, Sotelo, Sosa, Ramírez-Villegas, Achicanoy, Velásquez-Tibatá, et al. (2019), models were produced using 26 ecogeographic variables (Table S2.1 in Supporting Information), including 19 bioclimatic variables, solar radiation, water vapour pressure and wind speed, all of which were derived from WorldClim 2.0 (Fick & Hijmans, 2017). For the last three of these (solar radiation, water vapour pressure and wind speed), we produced annual values by calculating the median across monthly values. We also included altitude, which was compiled from the CGIAR-CSI dataset based on NASA Shuttle Radar Topography Mission data (Jarvis, Reuter, et al., 2008), and slope and aspect, which were calculated from the altitude data using the terrain function in the R package "raster". All ecogeographic variables were processed at a 2.5-arc-minute spatial resolution (grid cells approximately 5 km across at the equator). The ecogeographic variables used in MaxEnt models were selected separately for each taxon using the R package "VSURF" (Genuer et al., 2019). Variables were ranked in order of impact on model performance, and every variable that made no measurable impact was removed. The remaining variables were tested for Pearson's correlation with other variables, and any variable with a correlation coefficient greater than 0.7 or less than −0.7 with any variable more important than itself was removed. This process was repeated until no pair of variables among the five most important had a coefficient greater than 0.7 or less than −0.7.
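The correlation filter described above can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the function names `pearson` and `filter_correlated` and the example variable names are hypothetical, the importance ranking is assumed to be supplied already (for example from VSURF), and the iterative procedure is simplified to a single pass in which each variable is compared only with the more important variables already retained.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_correlated(ranked_names, values, threshold=0.7):
    """Keep a variable only if |r| <= threshold with every more important kept variable.

    ranked_names: variable names ordered from most to least important (assumed input).
    values: dict mapping each name to its values at the occurrence points.
    """
    kept = []
    for name in ranked_names:
        if all(abs(pearson(values[name], values[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical example: "bio12" is almost perfectly correlated with "bio1".
example = {
    "bio1": [1, 2, 3, 4, 5],
    "bio12": [2, 4, 6, 8, 10.5],
    "slope": [5, 1, 4, 2, 3],
}
selected = filter_correlated(["bio1", "bio12", "slope"], example)
```

In this toy example the less important member of the correlated pair is dropped, while the weakly correlated variable is retained.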
For each taxon, a spatial background was created based on the boundaries of the ecoregions in which taxon occurrences were located (Olson et al., 2001). Pseudoabsences were then generated in numbers proportional to the area of the taxon's spatial background, up to a maximum of 5,000. Ten replicate models were produced for each taxon using the MaxEnt algorithm (K = 10), using linear, quadratic, hinge and product features, with a regularization parameter β = 1.0. The median of these replicates formed the final MaxEnt model. Median models were evaluated using three measures: area under the receiver operating characteristic curve (AUC); standard deviation of the AUC across replicates (SDAUC); and the proportion of the potential distribution model with a standard deviation above 0.15 (ASD15). For a model to be considered accurate, all of the following criteria had to be met: AUC ≥ 0.7; SDAUC < 0.15; and ASD15 ≤ 10%. Lastly, MaxEnt models were thresholded using the maximum sum of sensitivity and specificity (Liu et al., 2005, 2013).
Models were clipped as required to the extent of the taxon's spatial background.
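As an illustration of the evaluation criteria and the thresholding rule, the following Python sketch may be helpful. It is a simplified stand-in rather than the dismo implementation: the function names are hypothetical, specificity is computed against background (pseudoabsence) scores, and ties are resolved in favour of the lowest threshold, which is an assumption.

```python
def passes_evaluation(auc, sd_auc, asd15_percent):
    """Apply the three criteria: AUC >= 0.7, SDAUC < 0.15 and ASD15 <= 10%."""
    return auc >= 0.7 and sd_auc < 0.15 and asd15_percent <= 10.0


def max_sss_threshold(presence_scores, background_scores):
    """Return the score threshold maximizing sensitivity + specificity.

    A cell is predicted 'present' when its score is >= the threshold.
    Ties keep the lowest candidate threshold (an assumption of this sketch).
    """
    candidates = sorted(set(presence_scores) | set(background_scores))
    best_threshold, best_sum = None, -1.0
    for t in candidates:
        sensitivity = sum(s >= t for s in presence_scores) / len(presence_scores)
        specificity = sum(s < t for s in background_scores) / len(background_scores)
        if sensitivity + specificity > best_sum:
            best_threshold, best_sum = t, sensitivity + specificity
    return best_threshold


# Hypothetical suitability scores at presence and background points.
threshold = max_sss_threshold([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```

Cells of the median model scoring at or above the returned value would then be treated as predicted presence.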

| Ecogeographic characterization
Ecogeographic predictor data, at a resolution of 2.5 arc minutes for the 26 ecogeographic variables from the WorldClim 2.0 and CGIAR-CSI datasets, were extracted for all georeferenced records for all taxa (Appendix S1 in Supporting Information). These data were used to characterize each taxon's potential ecogeographic niche for each variable. We also assessed the representation of these niches in ex situ conservation by comparing the distribution of a taxon's G occurrences with its full spread of occurrences.

| Conservation gap analysis
The ex situ and in situ conservation of each taxon was assessed fol- The first ex situ score was the sampling representativeness score (SRS ex), which is the ratio of G occurrences to H occurrences. Unlike the other scores, SRS ex takes into account both georeferenced and non-georeferenced G occurrences. The second ex situ score was the geographic representativeness score (GRS ex). To calculate this score, 50-km-radius buffers were created around each G occurrence. GRS ex is the percentage of the taxon's thresholded distribution model that is covered by these G occurrence buffers. The third ex situ score was the ecological representativeness score (ERS ex). This score made use of a raster layer, which divides the terrestrial world into 867 ecoregions (Olson et al., 2001), as well as the buffers around G occurrences. ERS ex is the percentage of ecoregions included in the taxon's distribution model that feature at least once in the taxon's G occurrence buffers.
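The three ex situ scores can be sketched in Python as follows. This is a hedged illustration, not the code used in the study: the function names are hypothetical, the distribution model and the G occurrence buffers are assumed to have been rasterized already into sets of grid-cell or ecoregion identifiers, and expressing SRS ex on a 0 to 100 scale capped at 100 is an assumption made for consistency with the other percentage scores.

```python
def srs_ex(n_g, n_h):
    """Sampling representativeness: ratio of G to H occurrences, as a percentage.

    Scaling to 0-100 and capping at 100 are assumptions of this sketch.
    """
    if n_h <= 0:
        raise ValueError("n_h must be positive")
    return min(100.0, 100.0 * n_g / n_h)


def percent_covered(model_units, buffered_units):
    """Percentage of the model's units that also occur within the G buffers."""
    if not model_units:
        return 0.0
    return 100.0 * len(model_units & buffered_units) / len(model_units)


def grs_ex(model_cells, buffered_cells):
    """Geographic representativeness: share of model grid cells inside G buffers."""
    return percent_covered(model_cells, buffered_cells)


def ers_ex(model_ecoregions, buffered_ecoregions):
    """Ecological representativeness: share of model ecoregions touched by G buffers."""
    return percent_covered(model_ecoregions, buffered_ecoregions)


# Hypothetical taxon: 3 G and 60 H records, 4 model cells, 4 ecoregions.
scores = (
    srs_ex(3, 60),
    grs_ex({1, 2, 3, 4}, {3, 4, 5}),
    ers_ex({"a", "b", "c", "d"}, {"a"}),
)
```

Representing rasters as sets of identifiers keeps the sketch free of geospatial dependencies; a real analysis would derive these sets from the thresholded model, the 50-km buffers and the ecoregion layer.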
The first in situ score was the sampling representativeness score (SRS in), which is the percentage of total occurrences that lie inside the protected areas marked as "designated," "inscribed" or "estab-

| Preliminary threat assessment
To complement the conservation gap analysis, we also used the oc- the EOO, which is actually occupied by a taxon by calculating the minimum number of 2 km × 2 km grid cells required to cover all occurrence points. These calculations were performed using the R package "redlistr" (Lee et al., 2019). Taxa were categorized using both metrics, whereby a taxon is Critically Endangered when EOO < 100 km² or AOO < 10 km²; Endangered when 100 km² < EOO < 5,000 km² or 10 km² < AOO < 500 km²; Vulnerable when 5,000 km² < EOO < 20,000 km² or 500 km² < AOO < 2,000 km²; Near Threatened when 20,000 km² < EOO < 45,000 km² or 2,000 km² < AOO < 4,500 km²; and Least Concern when EOO ≥ 45,000 km² and AOO ≥ 4,500 km² (IUCN Standards & Petitions Committee, 2019). While these metrics do not provide the full set of criteria needed for classification on the Red List, they offer indications of the threat status of each taxon.
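The categorization rule above can be expressed as a short Python sketch. It is illustrative only and does not reproduce redlistr: the function names are hypothetical, values falling exactly on a boundary are assigned to the less severe category (an assumption, since the text uses strict inequalities), and the AOO helper counts occupied cells on a single fixed-origin 2 km grid over projected coordinates in metres, which gives an upper bound on the minimum cell count rather than the minimum itself.

```python
CATEGORIES = [
    "Critically Endangered",
    "Endangered",
    "Vulnerable",
    "Near Threatened",
    "Least Concern",
]
EOO_LIMITS_KM2 = [100, 5_000, 20_000, 45_000]
AOO_LIMITS_KM2 = [10, 500, 2_000, 4_500]


def severity_rank(value, limits):
    """Index of the first limit the value falls below (0 = most threatened)."""
    for i, limit in enumerate(limits):
        if value < limit:
            return i
    return len(limits)


def preliminary_category(eoo_km2, aoo_km2):
    """Most severe category indicated by either EOO or AOO."""
    rank = min(
        severity_rank(eoo_km2, EOO_LIMITS_KM2),
        severity_rank(aoo_km2, AOO_LIMITS_KM2),
    )
    return CATEGORIES[rank]


def aoo_km2(points_m, cell_m=2000):
    """Area of occupied cells on a fixed-origin grid (projected coordinates, metres)."""
    cells = {(int(x // cell_m), int(y // cell_m)) for x, y in points_m}
    return len(cells) * (cell_m / 1000) ** 2


# Hypothetical occurrence points falling in three distinct 2 km cells.
area = aoo_km2([(100, 100), (1900, 1900), (2100, 100), (-50, 100)])
category = preliminary_category(50_000, area)
```

Because the more severe of the two ranks is taken, a small AOO can determine the overall category even when EOO alone would indicate Least Concern.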

| RESULTS
A total of 13,846 H records and 654 G records (of which 540 had coordinates) were compiled for analysis, with taxon occurrence numbers ranging from 40 records for S. macrospermum to 4,208 records for Sorghum plumosum (R. Br.) P. Beauv. All taxa had adequate occurrences for distribution modelling (van Proosdij et al., 2016), and all models passed the evaluation criteria (Table S2.3 in Supporting Information).

| Taxon distributions
The predicted range of wild Sorghum includes eastern and northern Australia, South, Southeast and East Asia, Papua New Guinea, Central America and much of sub-Saharan Africa (Figure 1)

| Ecogeographic characterization
Regarding ecogeographic niches, substantial variation was found between taxa. The CWR able to survive in the most extreme climatic niches, measured by median of occurrences, included the following:

| Conservation gap analysis
The majority of sorghum taxa (19 out of 23) were determined to be medium priorities overall for further conservation action, with three taxa being high priorities (S. nitidum, S. propinquum and S. trichocladum), and just one taxon low priority (Sorghum brachypodum Lazarides). FCSc results ranged from 18.96 to 51.10 (Figure 2; Table S2.2 in Supporting Information).

FIGURE 1 Predicted taxonomic richness map combining the 23 wild
The range in comprehensiveness of ex situ conservation was greater than that of FCSc, with FCS ex varying from 0 to 63.75. Nine taxa were classified as high priorities for ex situ conservation, with 13 taxa being of medium priority, and one (S. macrospermum) low priority (Figure 2).
The range in comprehensiveness of in situ conservation was also greater than that of FCSc, with FCS in ranging from 0 to 78.07. Within taxa, FCS in values were generally greater (i.e. indicating a better current state of conservation) than corresponding FCS ex values, with S. macrospermum (whose FCS in was 0) being the only taxon not following this trend. Only S. macrospermum was classified as a high priority for in situ conservation, with 16 taxa being classified as medium priorities, five as low priorities and one (S. brachypodum) as sufficiently conserved (Figure 2). (Table 3).
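The mapping from a conservation score to a priority class can be sketched as below. The cut-offs of 25, 50 and 75 are an assumption: they are not stated in this text, but they are the values commonly used in the gap analysis methodology that the study follows, and they are consistent with the scores and classes reported above. The function name is hypothetical.

```python
def priority_class(score):
    """Map a 0-100 conservation score to a priority class.

    Cut-offs (25, 50, 75) are assumed, not taken from this text.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    if score < 25:
        return "high priority"
    if score < 50:
        return "medium priority"
    if score < 75:
        return "low priority"
    return "sufficiently conserved"


# Extreme scores reported in the results above.
reported = {s: priority_class(s) for s in (0, 18.96, 51.10, 63.75, 78.07)}
```

Under these assumed cut-offs, the lowest and highest FCSc values (18.96 and 51.10) fall in the high and low priority classes, and the highest FCS in value (78.07) falls in the sufficiently conserved class.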

| General patterns
As has been found in other clades (Khoury, Carver, Barchenger, et al., 2020;Lebeda et al., 2019), Sorghum's in situ conservation scores were generally higher than corresponding ex situ scores.
This indicates the potential value of in situ conservation to CWR protection, with the possibility for many taxa to be protected by a single well-placed protected area (Maxted et al., 2013). This is, of course, subject to field verification of taxon presences and sound protected area management (Svancara et al., 2005). Even when it occurs within a protected area, a taxon remains vulnerable without monitoring and management plans (Mason et al., 2015; Pressey et al., 2015).
Predictably, national genebanks (with the exception of the Millennium Seed Bank) primarily store germplasm of wild Sorghum taxa native to their own regions. For example, the Australian Grains Genebank does not have more than seven different accessions of any Sorghum taxon that is not native to Australia, even in the case of S. bicolor subsp. verticilliflorum. While this trend is understandable, increased sharing of germplasm between genebanks, while avoiding excessive duplication, could aid in increasing the efficiency with which it can be distributed to local crop developers and researchers in each region, maximizing the genetic diversity available to them.
Although gap analysis scores were calculated across taxa in a consistent manner, potential spatial biases in the underlying datasets (Beck et al., 2014) could have affected distribution models and therefore taxon gap analysis scores to varying degrees. To mitigate this challenge, taxa are separated based on native region in the remainder of the discussion. Species native to Africa and Asia were grouped together because multiple taxa have distributions on both continents.
The preliminary threat assessments of thirteen taxa did not match their current Red List determination (Table 3). While this might suggest a need to revise the categorization of these taxa, our preliminary assessment did not include additional steps, such as change-over-time analyses and expert discussion, which are incorporated into official Red List assessments. Our assessments were based solely on EOO and AOO, with AOO determining the overall categorization for every Sorghum taxon (Table S2.4 in Supporting Information), despite AOO's potential to greatly underestimate true range size (Sheth et al., 2012).

| African and Asian taxa
Of the taxa native to Africa and Asia, only S. propinquum was classified as a high priority for further conservation, with every other taxon being a medium priority. Sorghum propinquum's relatively low ex situ conservation score for the region (8.98) is especially concerning because this species is in domesticated sorghum's primary gene pool. The other taxon in sorghum's primary gene pool, S. bicolor subsp. verticilliflorum, had a slightly higher ex situ conservation score (18.45), which potentially reflects the taxon's extensive historical use in sorghum breeding as the crop's progenitor, but again highlights room for improvement in the protection of this taxon.

| Australian taxa
No endemic Australian Sorghum taxa were listed as high priorities for further conservation overall. There is, however, room for improvement. Three of these taxa were considered high priorities for further ex situ conservation: S. amplum, Sorghum grande Lazarides and, despite it having the best FCSc score in the genus, S. brachypodum.
These low ex situ scores are unsurprising considering that these taxa had just two, three and three G accessions, respectively. Further

| Central American taxa
Sorghum trichocladum is the only taxon in the genus native to the Americas, as well as being the only one currently without any ex situ germplasm accessions documented on openly accessible platforms.
It is currently unclear whether there is indeed no germplasm available for this species in genebanks, or whether collections have not yet been identified or reported. Fortunately, S. trichocladum's distribution model significantly overlaps current protected areas, though field verification is needed to confirm these distributions. Its ERS in score of 90.91 is particularly positive and suggests that in situ protections may be well distributed across the different ecoregions in which the species is found.

| Challenges and limitations
There exist several limitations regarding the calculation and use of species distribution models, which should be acknowledged when considering the results of this study. Firstly, there are inevitable gaps in occurrence datasets for taxa that have not been fully sampled. This can lead to the exclusion of some areas of actual ranges in distribution models if these areas are not represented in available datasets.
Gaps in georeferencing data could also have affected model accuracy and influenced conservation scores. Secondly, spatial bias towards roadsides and other areas of human activity can impact models built from presence-only data (Hijmans, 2012). These issues are commonplace when using openly available occurrence datasets, but we attempted to mitigate them by producing ten replicate models for each taxon using different random splits between testing and training data. Models were also made more conservative by limiting taxon backgrounds to ecoregions in which taxon occurrence data existed. As mentioned, spatial biases and data availability issues generally affect data from developing countries more than developed ones, potentially leading to inconsistencies in the accuracy of underlying data from the different regions to which Sorghum is native (Beck et al., 2014). A final limitation is that our models took 26 ecogeographic variables into account but did not include some other factors that influence taxon distributions, including biotic interactions, edaphic variables and recent habitat degradation. The 2.5-arc-minute spatial resolution used can also lead to some microclimatic conditions within grid cells being overlooked, and to models being too general in their determination of the "presence" of a species within an environmentally heterogeneous cell. For these reasons, our distribution models should be considered planning tools to guide explorations for confirmation in the field, and not definitive guides of where a taxon is and is not present.
Additionally, there has been debate over the monophyletic status of Sorghum, as well as over which species belong in this genus (e.g. Dillon, Lawrence, et al., 2007; Hawkins et al., 2015; Kellogg, 2013; Spangler, 2003). For this reason, our knowledge about these CWR, in terms of conservation and use in crop improvement, should continue to be updated according to the most recent classifications of Sorghum taxa. Readers should also be aware of changing data regarding the distributions of taxa (whether they are extirpated in an area, or found in previously unknown areas) and consider our study in the light of these developments. Conservationists must act urgently, using knowledge already available, in order to ensure the persistence of CWR and their intraspecific diversity before populations decline further. The loss of CWR through extinction and extirpation is a constant threat with irreversible consequences. Further delay of conservation action to prevent these outcomes would be unwise.

| Future directions
In addition to further ex situ and in situ conservation efforts, there remain various actions that could be taken to maximize the value of sorghum's wild relatives to agriculture. Firstly, although the breeding process and genetic modification in sorghum have recently improved, continued advances in gene introgression and increased acceptance of genetically modified crops by regulatory bodies and the public would simplify the process, and would consequently allow more widespread use of wild Sorghum (and any other) taxa in crop development. Increased distribution of knowledge and resources (such as S. bicolor plants that allow cross-species hybridization and ex situ germplasm accessions from different regions) between researchers, crop developers and farmers would also allow faster progress in sorghum improvement. Current knowledge about the general biology of these CWR is limited (Table 2). It is vital that further research, particularly on the physiology of sorghum's CWR and their responses to environmental conditions, is conducted in order to allow a better understanding of which CWR might be useful in sorghum improvement.

ACKNOWLEDGEMENTS
The authors thank the botanists, taxonomists, plant collectors, geospatial scientists and genetic resource professionals who compiled and made available the occurrence and ecogeographic in-

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13166.

DATA AVAILABILITY STATEMENT
Occurrence data, processed ecogeographic data, and interactive