On the hunt for the alternate host of Hemileia vastatrix

Abstract Coffee leaf rust (CLR), caused by the fungal pathogen Hemileia vastatrix, has plagued coffee production worldwide for over 150 years. Hemileia vastatrix produces urediniospores, teliospores, and the sexual basidiospores. Infection of coffee by basidiospores of H. vastatrix has never been reported, and thus far no alternate host capable of supporting an aecial stage in the disease cycle has been found. For this reason, some argue that an alternate host of H. vastatrix does not exist. Yet the ability of H. vastatrix to overcome resistance in coffee cultivars, despite the apparent lack of sexual reproduction and of an aecial stage, has long puzzled the plant pathology community. The purpose of this study was to introduce a new method to search for the alternate host(s) of H. vastatrix. To do this, we present the novel hypothetical alternate host ranking (HAHR) method and an automated text mining (ATM) procedure, which draw on comprehensive biogeographical botanical data from the designated sites of interest (Ethiopia, Kenya, and Sri Lanka) and on plant pathology insights. With the HAHR/ATM methods, we produced prioritized lists of potential alternate host plants of coffee leaf rust. This is a first attempt to seek out an alternate plant host of a pathogenic fungus in this manner. The HAHR method identified Psychotria mahonii, Rubus apetalus, and Rhamnus prinoides as the highest-ranking probable alternate hosts. Cross-referencing the results of the two methods suggests that the plant genera of interest are Croton, Euphorbia, and Rubus. The HAHR and ATM methods may also be applied to other plant–rust interactions that involve an unknown alternate host, or to any other biological system that relies on data mining of published data.

Hemileia vastatrix penetrates coffee leaves via the stomatal openings and grows nutrient-absorbing mycelium through the leaf mesophyll. Vibrant, bouquet-shaped orange uredinia and telia are produced on the abaxial side of the coffee leaves (Arneson, 2000; Kumar et al., 2016). Uredinia give rise to urediniospores, which are dikaryotic and the only reported means of propagation for H. vastatrix (Arneson, 2000; Carvalho, Fernandes, Carvalho, Barreto, & Evans, 2011) (Figure 1). Dry urediniospores can survive up to 6 weeks on detached plant tissue but will only germinate again in the presence of rain or heavy dew (Arneson, 2000). Under cool, dry conditions, the telia give rise to the two more elusive spore types: teliospores and, subsequently, basidiospores (Arneson, 2000; Coutinho, Rijkenberg, & Asch, 1995). Teliospores are two-celled and thick-walled and consist of dikaryotic cells (Schumann & Leonard, 2000). Teliospores produce basidia, each of which develops four haploid basidiospores (Arneson, 2000; Coutinho et al., 1995) (Figure 1). In most rust fungi, only the teliospores are capable of long-term survival away from a living host plant (Schumann & Leonard, 2000). By producing both asexual and sexual spore types, rust fungi increase their chance of transmission to multiple hosts (Shattock & Preece, 2000). For this reason, many rusts have complex disease cycles and are classified by the spore types they produce as either macrocyclic (producing five spore types: spermatia, aeciospores, urediniospores, teliospores, and basidiospores) or microcyclic (often lacking aeciospores and urediniospores, with or without spermatia) (Shattock & Preece, 2000). The sexual stage of a rust fungus' life cycle is of particular importance because it facilitates the rise of new genotypes via recombination (Shattock & Preece, 2000).
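The two-way life-cycle definition above can be written down as a small rule, which makes it easy to see why the spore types reported for H. vastatrix do not fit either category. The sketch below is illustrative only: it labels stages with the conventional Roman numerals (0 spermogonia, I aecia, II uredinia, III telia, IV basidia), the scheme the tables later in this paper use for stages 0 to III, and it encodes only the macrocyclic/microcyclic criteria stated in the text, not a complete rust life-cycle taxonomy.

```python
from enum import Enum
from typing import AbstractSet


class Stage(Enum):
    """Rust spore stages, labelled with the conventional Roman-numeral scheme."""
    SPERMOGONIA = "0"  # spermatia
    AECIA = "I"        # aeciospores
    UREDINIA = "II"    # urediniospores
    TELIA = "III"      # teliospores
    BASIDIA = "IV"     # basidiospores


ALL_STAGES = frozenset(Stage)


def classify_life_cycle(observed: AbstractSet[Stage]) -> str:
    """Apply only the two definitions given in the text.

    macrocyclic: all five spore types are produced.
    microcyclic: aeciospores and urediniospores are both absent
                 (spermatia may or may not be present).
    Anything else falls outside this simple two-way scheme.
    """
    if observed == ALL_STAGES:
        return "macrocyclic"
    if Stage.AECIA not in observed and Stage.UREDINIA not in observed:
        return "microcyclic"
    return "unresolved"


# Spore types reported so far for H. vastatrix: no spermogonia or aecia are known.
h_vastatrix = {Stage.UREDINIA, Stage.TELIA, Stage.BASIDIA}
print(classify_life_cycle(h_vastatrix))  # unresolved
```

The "unresolved" result simply restates the gap discussed in this paper: the known cycle has uredinia, so it is not microcyclic, yet the stages that would be expected on an alternate host (0 and I) are missing.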
Despite the long history of CLR and the wide interest of the plant pathology community, critical aspects of the disease cycle of H. vastatrix remain unclear (Carvalho et al., 2011). Some have hypothesized that H. vastatrix is a heteroecious rust, thus requiring two hosts for the completion of the disease cycle (Gopalkrishnan, 1951; Petersen, 1974). The fact that basidiospores do not re-infect coffee supports this theory (Gopalkrishnan, 1951). Yet, an alternate host of H. vastatrix has never been reported. It has been postulated that the basidiospores of H. vastatrix are remnants of an earlier rust ancestor and are no longer used by the fungus (Arneson, 2000; Waller, 1982).
However, others argue that the retention of basidiospores in the observed disease cycle is evidence for a viable alternate host of H. vastatrix (Petersen, 1974). Still others have speculated, based on Tranzschel's Law (Shattock & Preece, 2000), that the alternate host of H. vastatrix is an orchid (Rodrigues, 1990).
One of the earliest attempts to re-infect coffee leaves with the "sporidia" (i.e., basidiospores) arising from H. vastatrix teliospores was described as an "utter failure" (Ward, 1882). Since then, there have apparently been no reports of infection by H. vastatrix basidiospores in any plant species. This raises the question of why the fungus produces this spore type at all. There are examples of autoecious (single-host) rust fungi that can infect the same host with all spore types, such as the macrocyclic rust Puccinia helianthi, the causal agent of sunflower rust (Hiratsuka & Sato, 1982) … it is most often observed that basidiospores do not infect the same plant species from which they originated (Kolmer, Ordonez, & Groth, 2009; Petersen, 1974). This implies a high likelihood of an unrelated alternate host, which H. vastatrix could infect to produce spermogonia and later aecia to complete the disease cycle (Figure 1). Furthermore, a British expedition to Sri Lanka in 1882 collected specimens of "jungle leaves," including palms, dicots, ferns, and grasses, that exhibited symptoms resembling those of H. vastatrix, namely chlorotic yellow "pin-spots" (Ferguson & Ferguson, 1882). However, upon later scientific examination, no signs of H. vastatrix could be confirmed (Ferguson & Ferguson, 1882). To the best of our knowledge, no subsequent studies searching for possible aecial hosts of H. vastatrix have been published. Another possibility is that H. vastatrix has multiple host species, as with the Cronartium species C. flaccidum and C. ribicola. These rust pathogens have been reported to infect eight diverse host plants from six different families in greenhouse inoculation experiments (Kaitera, Hiltunen, & Hantula, 2017).
Modern coffee breeding and cultivation have exerted continuous selection pressure on H. vastatrix through the selection of commercial Coffea spp. cultivars for resistance to CLR (Silva, Várzea, Paulo, & Batista, 2018). Today, more than 50 races of H. vastatrix are known (Talhinhas et al., 2017). Such diversity is difficult to explain for a pathogen that supposedly reproduces only clonally (Silva et al., 2018). Some reports have started to emerge hypothesizing that the different races of H. vastatrix are the result of cryptosexuality, that is, hidden sexual reproduction within the urediniospores (Carvalho et al., 2011). However, these findings would not explain why H. vastatrix produces basidiospores from its teliospores. Another hypothesis relating to the CLR outbreaks in Central America is based on primary host density (Burdon & Chilvers, 1982). It holds that the CLR epidemics occurred because thousands of coffee trees were planted in succession within the coffee-growing regions of Central America. This would remove the need for an alternate host for H. vastatrix to proliferate and spread, as the primary host is densely planted and highly accessible to the pathogen. However, this hypothesis does not account for new variation in the pathogen, only for the maintenance of clonal propagation of H. vastatrix.
The plant pathology community has adopted a somewhat ad hoc approach to identifying alternate host plant species, which are often found serendipitously in disease-prone environments using approaches that are not always structured (… having been unknown for a century (Jin, Szabo, & Carson, 2010). Here we present the hypothetical alternate host ranking (HAHR) and automated text mining (ATM) methods to address this gap in knowledge, based on a series of assumptions relating to the disease biology of this pathogen. Our integration of comprehensive geographical flora data mapping is novel in plant pathology publications. We believe that this new approach will encourage more multidisciplinary collaboration and hypothesis generation among plant pathologists and botanists in future studies in this area.

| METHOD
We formulated the so-called HAHR method to create ranked lists of plant species that could be likely alternate hosts (… com/) and Google Scholar (https://scholar.google.dk/) as of 11 June 2019. The following search terms were used both singly and in combination: "coffee," "coffea," "flora mapping," "vegetation," "origin," "Ethiopia," "Kenya," and "Sri Lanka." Plant species and genera listed in over 40 primary sources were then collected and arranged in an MS Excel spreadsheet. The filter function was used to rank plant species or genera according to their co-occurrence at the site of first discovery of H. vastatrix (Lake Victoria region, Kenya) (Ferreira & Boley, 1991; Waller, 1982) and/or the site of the first reported outbreak of CLR (Sri Lanka).
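The spreadsheet filtering step amounts to counting, for each species, the distinct sites of interest at which it was recorded, and sorting on that count. The following is a minimal sketch of that idea, not the authors' spreadsheet: the `records` list and the helper name `rank_by_co_occurrence` are hypothetical, and the site assignments for the three species are taken from the table rows quoted in the Results (Rubus apetalus: E, K & S; Cornus volkensii: E, K; Croton dichogamus: E, S).

```python
from collections import defaultdict

# Hypothetical (species, site) records as they might be transcribed from the
# primary flora sources; the same pair can appear in more than one source.
records = [
    ("Rubus apetalus", "Ethiopia"),
    ("Rubus apetalus", "Kenya"),
    ("Rubus apetalus", "Sri Lanka"),
    ("Cornus volkensii", "Ethiopia"),
    ("Cornus volkensii", "Kenya"),
    ("Cornus volkensii", "Kenya"),  # duplicate record from a second source
    ("Croton dichogamus", "Ethiopia"),
    ("Croton dichogamus", "Sri Lanka"),
]


def rank_by_co_occurrence(pairs):
    """Rank species by the number of distinct sites of interest they occur at."""
    sites_per_species = defaultdict(set)
    for species, site in pairs:
        sites_per_species[species].add(site)  # sets drop duplicate records
    # Most sites first; ties broken alphabetically for a stable order.
    return sorted(
        ((sp, sorted(sites)) for sp, sites in sites_per_species.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


for species, sites in rank_by_co_occurrence(records):
    print(f"{species}: {len(sites)} site(s) {sites}")
```

Counting distinct sites rather than raw records keeps a species from rising in the ranking merely because it is mentioned in many flora surveys of the same region.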

| Co-occurrence with native (undomesticated) Coffea spp. at the site of origin
We started by determining the initial plant species pool from flora mapping studies of sites where wild Coffea spp. occur, specifically in the south-western highlands of Ethiopia (Gole, 2003; Kelbessa & Soromessa, 2008; Nune, 2008; Schmitt, 2006; Senbeta & Denich, 2006; Tadesse & Nigatu, 1996). Most of the literature found concerned either Coffea arabica or unspecified species of wild coffee. As it is by no means certain with which coffee species H. vastatrix co-evolved, our method carries an inherent assumption, based on the available literature, that the pathogen originated with C. arabica or other unknown wild relatives.

Potential natural vegetation (PNV) maps of Ethiopia and Kenya were also used to compile the initial plant species pool (Table S1) (van Breugel et al., 2015). PNV maps depict the vegetation that would persist under current climatic conditions without human intervention (van Breugel et al., 2015). The Keffa and Sidamo regions (where the Geba-Dogi, Berhane-Kontir, Boginda-Yeba, and Harenna forest areas lie) have repeatedly been recognized as among the most probable areas of origin of wild Coffea species (Gole, 2003; Meyer, 1965; Schmitt, 2006; Senbeta & Denich, 2006). These regions were collectively assumed to be the site of origin (the sites of interest in Ethiopia). According to the PNV maps, the sites of interest in Ethiopia were postulated to consist of either a "Complex of Afromontane undifferentiated forest" with "wooded grasslands" or "evergreen or semi-evergreen bushland and thicket at lower margins" (Figure 3, code: Fb/Be/wd, Kaffa region, Ethiopia) and/or "Afromontane rain forest" (Figure 3, code: Fa, Sidamo region, Ethiopia). The Global Biodiversity Information Facility (https://www.gbif.org/) was used to cross-reference the primary literature with the PNV map. Plant species that fulfilled these criteria are listed in Table 1.
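Cross-referencing species records against the PNV vegetation types of each site can be thought of as a set intersection, where a composite code such as Fb/Be/wd stands for a mosaic of its component types. The sketch below assumes that reading of composite codes. The two site codes come from the text; the habitat annotations for "Species A/B/C" and the non-matching code "Zz" are hypothetical placeholders, since in practice these assignments would come from the flora literature and GBIF records.

```python
from typing import Set

# PNV codes of the Ethiopian sites of interest, as given in the text.
# Composite codes such as "Fb/Be/wd" are read here as a mosaic of types.
SITE_PNV = {
    "Kaffa": "Fb/Be/wd",
    "Sidamo": "Fa",
}


def pnv_components(code: str) -> Set[str]:
    """Split a (possibly composite) PNV code into its component types."""
    return {part.strip() for part in code.split("/") if part.strip()}


def matches_site(species_pnv: Set[str], site_code: str) -> bool:
    """True if the species is recorded in at least one of the site's vegetation types."""
    return bool(species_pnv & pnv_components(site_code))


# Hypothetical habitat annotations (PNV types each species is recorded in);
# "Zz" is a placeholder for any vegetation type not present at the sites.
habitats = {
    "Species A": {"Fa"},
    "Species B": {"Be"},
    "Species C": {"Zz"},
}

for species, pnv in habitats.items():
    hits = [site for site, code in SITE_PNV.items() if matches_site(pnv, code)]
    print(species, hits or "no match")
```

Treating any single shared component as a match is a deliberately permissive choice; a stricter variant could require the species to occur in the dominant component of a mosaic.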

| Co-occurrence at the site of first discovery of CLR
Subsequently, we analyzed flora mapping data from the site of first discovery of CLR. CLR was first discovered in the Lake Victoria region of Kenya in 1861, by a British explorer, on uncultivated wild coffee (Ferreira & Boley, 1991; Waller, 1982). Given this, we assumed the Lake Victoria region to be the natural site of first discovery of CLR. Based on the PNV maps, the sites of interest in Kenya were hypothesized to surround either the "Lake Victoria transitional rain forest" …

| Co-occurrence at the site of first major outbreak of CLR
The first major outbreak of CLR was reported in Sri Lanka (Ceylon) in 1869 (Berkeley & Broome, 1869). After this, CLR spread around the world in three sequential outbreaks (McCook, 2006 … been reported, but this spore type is more frequently dispersed locally by rain-splash, owing to its tendency to adhere strongly to other spores, to leaves, and to smooth surfaces (Brown & Hovmøller, 2002; Nutman, Roberts, & Bock, 1960). Thus, the disease severity of CLR has often been associated with heavy rainfall (Waller, 1982). Because of these aspects of spore transmission, and because urediniospores have a limited ability to survive on nonliving coffee leaves, we assumed that this first CLR outbreak resulted from the longer-living teliospores being transported to Sri Lanka on dry plant material. We also assumed that this outbreak was enhanced by the presence in Sri Lanka of a supportive alternate host for the basidiospores to infect, facilitating the generation of new virulent races. Primary flora data (Ashton & Gunatilleke, 1987; Ashton et al., 1997) … (Arthur, 1934; Wilson & Henderson, 1966) and the MyCoPortal (http://mycoportal.org). We focused on studies or observations concerning the plant species or genera of interest and any associated rust pathogen. The following search terms were used (in combination with the plant species or genus name): "rust," "fungi," "Hemileia," or "fungal pathogen." All fungal names and authors were verified in Species Fungorum (http://www.speciesfungorum.org/). Table 3 was compiled based on the presence of the plant species or genera at the site of origin (Ethiopia) and/or the site of first discovery (Kenya) or first outbreak (Sri Lanka), as well as their association with (a) Hemileia spp. and (b) other rust fungi. The different rust spore stages were classified according to Hiratsuka and Sato (1982):
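Taken together, the criteria for Tables 1 to 3 behave like a short decision rule. The sketch below is one possible reading of those criteria, not the decision tree of Figure 2: it assumes that Table 1 requires occurrence at both the Ethiopian and Kenyan sites, that Table 2 additionally requires the species or a congener at the Sri Lankan sites, and that Table 3 requires presence at any site of interest plus a rust association. Each species is placed in the highest tier it qualifies for; whether the published tables are nested or disjoint is not stated here. The class `Candidate`, the function `hahr_tier`, and the "Species X/Y/Z" entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    ethiopia: bool            # recorded at the Ethiopian sites of interest
    kenya: bool               # recorded at the Kenyan sites of interest
    sri_lanka: bool           # species or a congener recorded at the Sri Lankan sites
    hemileia: bool = False    # (a) associated with Hemileia spp.
    other_rust: bool = False  # (b) species or genus associated with other rust fungi


def hahr_tier(c: Candidate) -> Optional[str]:
    """Hypothetical reading of the HAHR ranking: return the highest tier met."""
    at_any_site = c.ethiopia or c.kenya or c.sri_lanka
    if at_any_site and (c.hemileia or c.other_rust):
        return "high (Table 3)"
    if c.ethiopia and c.kenya and c.sri_lanka:
        return "medium (Table 2)"
    if c.ethiopia and c.kenya:
        return "low (Table 1)"
    return None  # not retained in any list


candidates = [
    Candidate("Rubus apetalus", True, True, True, other_rust=True),
    Candidate("Croton dichogamus", True, False, True, other_rust=True),
    Candidate("Species Y", True, True, True),   # hypothetical
    Candidate("Species X", True, True, False),  # hypothetical
]
for c in candidates:
    print(c.name, "->", hahr_tier(c))
```

The rust-association flags for Rubus apetalus and Croton dichogamus mirror the table rows quoted in the Results; everything else is illustrative.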

| Automated text mining (ATM) approach
Automated text mining of the biomedical literature has been widely used to recognize entities such as species, proteins, or diseases in scholarly articles (e.g., Pafilis et al., 2013; Piñero et al., 2015). To our knowledge, this is the first attempt to apply this methodology in the context of plant flora data. Dictionary-based text mining matches a fixed set of identifiers and synonyms against the contents of scientific articles to identify articles that mention an entity of interest.
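Dictionary-based matching, and its use to find species comentioned with C. arabica, can be illustrated with a few regular expressions. This is a minimal sketch, not the tagger behind the ORGANISMS resource, which handles tokenization, ambiguity, and scale far more carefully. The dictionary entries and abstracts are hypothetical, and "comentioned" is simplified here to "appearing in the same abstract".

```python
import re
from collections import Counter

# Hypothetical dictionary: canonical species name -> synonyms to match.
DICTIONARY = {
    "Coffea arabica": ["Coffea arabica", "C. arabica", "arabica coffee"],
    "Rubus apetalus": ["Rubus apetalus"],
    "Croton dichogamus": ["Croton dichogamus"],
}


def compile_patterns(dictionary):
    """Build one case-insensitive, word-bounded regex per canonical name."""
    patterns = {}
    for name, synonyms in dictionary.items():
        # Try longer synonyms first inside the alternation.
        alternatives = sorted(synonyms, key=len, reverse=True)
        joined = "|".join(re.escape(s) for s in alternatives)
        patterns[name] = re.compile(rf"\b(?:{joined})\b", re.IGNORECASE)
    return patterns


def tag(text, patterns):
    """Return the set of canonical names mentioned in a text."""
    return {name for name, pat in patterns.items() if pat.search(text)}


def co_mentions(abstracts, patterns, anchor="Coffea arabica"):
    """Count, per species, the abstracts that also mention the anchor species."""
    counts = Counter()
    for text in abstracts:
        found = tag(text, patterns)
        if anchor in found:
            counts.update(found - {anchor})
    return counts


abstracts = [
    "Rubus apetalus grows near Coffea arabica in montane forest.",
    "Leaf rust on C. arabica; Croton dichogamus was also sampled.",
    "Croton dichogamus in dry bushland.",  # no anchor mention, so not counted
]
print(co_mentions(abstracts, compile_patterns(DICTIONARY)))
```

Word boundaries stop short names from matching inside longer words, but they cannot stop a taxon name that is also an ordinary word from matching, which is one source of the false positives discussed later.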
We used text mining to identify, in an automated manner, species that are comentioned with C. arabica. We used the ORGANISMS web resource (Pafilis et al., 2013)

TABLE 1 (Continued)
Olea capensis L.
Zanthoxylum gilletii (De Wild.) P.G.Waterman
Zanthoxylum rubescens Planch. ex Hook.

TABLE 2 Medium-ranking list of species considered as potential alternate host(s) of Hemileia vastatrix, based on occurrence at the sites of interest in Ethiopia and Kenya, as well as related plant species at the sites of interest in Sri Lanka

| Comparison of the HAHR and the ATM methods
A manual cross-referencing approach was applied to the list of species generated by the ATM method, using the "find" function in MS Excel for each of the plant species listed by the HAHR method (Tables 1-3). The ranking established in the HAHR method was also applied to the ATM results (Table 4). The abstracts listed by the ATM method were reviewed to assess whether both C. arabica and the potential host plant species were evident in the publication and how many times they were comentioned (Table 4). The percentage overlap of potential host species was calculated to compare the outputs of the HAHR and ATM methods.
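The "find"-based cross-referencing can also be expressed as a set intersection over normalized names. The sketch below is an assumption-laden stand-in for the manual Excel step: it reduces every name to a lower-case "genus epithet" key so that author citations (as in the HAHR tables) do not block a match, and it reports the overlap as a percentage of the HAHR list. The helper names and the ATM example list are hypothetical, and infraspecific names (subsp., var.) would be collapsed by this simple key.

```python
def species_key(name: str) -> str:
    """Reduce a name to lower-case 'genus epithet', dropping author citations."""
    parts = name.split()
    return " ".join(parts[:2]).lower()


def cross_reference(hahr, atm):
    """Return HAHR species also found by ATM, and that overlap as % of the HAHR list."""
    atm_keys = {species_key(n) for n in atm}
    hahr_by_key = {species_key(n): n for n in hahr}  # de-duplicates HAHR entries
    shared = sorted(name for key, name in hahr_by_key.items() if key in atm_keys)
    percent = 100 * len(shared) / len(hahr_by_key) if hahr_by_key else 0.0
    return shared, percent


hahr = [
    "Olea capensis L.",
    "Zanthoxylum gilletii (De Wild.) P.G.Waterman",
    "Rubus apetalus Poir.",
    "Cornus volkensii Harms",
]
atm = ["Olea capensis", "Rubus apetalus", "Rubus apetalus", "Coffea arabica"]  # hypothetical
shared, percent = cross_reference(hahr, atm)
print(shared, f"{percent:.1f}%")
```

Using the HAHR list as the denominator matches how the overlap is reported in the Results; duplicates in the ATM output do not inflate the count because matching is done on a set of keys.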

| RESULTS AND DISCUSSION
The HAHR method identified 158 plant species as potential alternate hosts of Hemileia vastatrix, while the ATM method listed over 2,179 species (although some duplication was found). Nineteen plant species were identified by both methods (Table 4), indicating that 12% of the HAHR findings were corroborated by the ATM method.
The low overlap reflects the differences between the two methods.
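The 12% figure is the overlap expressed relative to the HAHR list. As a worked check with the numbers reported above:

```latex
\text{overlap} = \frac{\lvert \mathrm{HAHR} \cap \mathrm{ATM} \rvert}{\lvert \mathrm{HAHR} \rvert} \times 100
= \frac{19}{158} \times 100 \approx 12.0\,\%
```

Taking the ATM list (over 2,179 species) as the denominator instead would give at most 19/2,179 × 100 ≈ 0.87%, so the choice of reference list strongly affects how the agreement between the two methods appears.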
The HAHR method produced a low-ranking short list (Table 1), a medium-ranking list (Table 2), and a high-ranking list (Table 3) of plant species. Table 1 comprises 158 plant species, which were found together with Coffea species at the sites of interest in Ethiopia and Kenya in flora mapping studies. The absence of species common to the sites of interest in Ethiopia, Kenya, and Sri Lanka (Table 2) could indicate that there is more than one possible alternate host species of CLR. High rates of endemism, caused by the island's long period of isolation (Gunatilleke & Gunatilleke, 1990), mean that over a quarter of the native species present in Sri Lanka are considered unique to the country (Ashton et al., 1997). However, the Deccan-Gondwana ancestry of the country means that the African continent may have made an early contribution to the natural plant communities of Sri Lanka's west coast (Ashton et al., 1997). This may explain why there are similarities at the genus level among the sites of interest in Ethiopia, Kenya, and Sri Lanka (Table 2).

… (0) and aecial rust stages (I) (Arthur, 1934 pp. 152; Nazareno et al., 2017)
Rubus apetalus Poir. | E, K & Sᵇ | Species and genus are susceptible to rust fungus: Kuehneola uredinis (Link) Arthur in the uredinial (II) and telial (III) rust stages (MyCoPortal, 2018; Van Reenen, 1995)
Category 2
… (Helfer, 2005; MyCoPortal, 2018) and reported susceptibility to decay by basidiomycete fungi (Desalegn, 2013)
Cornus volkensii Harms | E, K | Genus is susceptible to rust fungus: Puccinia porphyrogenita M.A. Curtis reported on Cornus canadensis in either the uredinial (II) or telial (III) rust stages (Arthur, 1934 pp. 251; MyCoPortal, 2018)
Croton dichogamus Pax | E, Sᵇ | Genus is susceptible to rust fungus: Bubakia crotonis Arthur reported on Croton argyranthemus, Croton californicus, Croton capitatus, Croton engelmannii, Croton monanthogynus, Croton punctatus, and Croton texensis in either the uredinial (II) or telial (III) rust stages (Arthur, 1934 pp. 60; MyCoPortal, 2018)

Both methods used in this hypothesis paper are limited by the published literature and databases that served as the "data pool" for each analysis. The HAHR method relies on published flora data in English and is region-specific, based on the decision tree (Figure 2), which makes it a targeted method in the context of this study. On the other hand, the ATM method is restricted to the PubMed database and includes all species (animals, bacteria, fungi, and plants) comentioned with C. arabica, making it more comprehensive than the HAHR method. However, this also leads to superfluous data retrieval, which had to be filtered manually.
Furthermore, the ATM method may yield false-positive hits because some species names listed in the NCBI taxonomy are falsely recognized in the analyzed articles. Again, manual filtering prevented these false positives from being included in the results. Given the incorporation of plant species geography into the HAHR method, we recommend that the findings from this method be prioritized over those of the ATM method.
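Part of the manual filtering described above could be pre-screened automatically by keeping only hits whose taxonomic lineage falls within green plants (Viridiplantae) and dropping names already known to cause false matches. The sketch below is hypothetical throughout: the abbreviated `LINEAGE` table stands in for NCBI Taxonomy lineage records, and the blocklist is illustrative. Names without a lineage are returned separately for manual review rather than silently discarded.

```python
# Hypothetical, abbreviated lineages standing in for NCBI Taxonomy records.
LINEAGE = {
    "Rubus apetalus": ["Eukaryota", "Viridiplantae", "Streptophyta", "Rosaceae"],
    "Hemileia vastatrix": ["Eukaryota", "Fungi", "Basidiomycota", "Pucciniales"],
    "Homo sapiens": ["Eukaryota", "Metazoa", "Chordata", "Hominidae"],
}


def filter_plant_hits(hits, lineage, blocklist=frozenset()):
    """Split text-mining hits into plant species and names needing manual review.

    Names on the blocklist (known false matches) are dropped; names with no
    lineage entry cannot be classified automatically and are returned separately.
    """
    kept, unresolved = [], []
    for name in hits:
        if name in blocklist:
            continue
        ranks = lineage.get(name)
        if ranks is None:
            unresolved.append(name)
        elif "Viridiplantae" in ranks:
            kept.append(name)
    return kept, unresolved


hits = ["Rubus apetalus", "Hemileia vastatrix", "Homo sapiens", "Unknown name"]
print(filter_plant_hits(hits, LINEAGE))
```

A lineage filter removes animals, bacteria, and fungi in one step, but it cannot catch a plant name that was matched in the wrong sense, so reading the abstracts would still be needed for those cases.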
The hypothesis raised in this study could be corroborated by morphological and molecular examination of historical plant leaf samples from herbarium collections. Plant species from the ranked lists (Tables 1-3) that were collected during epidemic periods may exhibit symptoms of CLR, which could help to verify the alternate host. This approach has led to the recent revision of the history and geographical range of Colletotrichum acutatum species (Sundelin et al., 2015), as well as to the sequencing of a unique genotype of Phytophthora infestans (HERB-1), which is now accepted as the virulent race that caused the 19th-century potato late blight epidemic (Yoshida et al., 2013). … It is our hope to help solve the mystery that has been perplexing the plant pathology community for more than 150 years.

ACKNOWLEDGMENTS
We gratefully acknowledge collaboration with the BREEDCAFS project and thank Jacques Avelino for his expert input relating to H. vastatrix and his suggestions for possible alternate plant host candidates. We also thank Benjamin and Charles Robotham for their review of the manuscript and helpful annotations.

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

AUTHOR CONTRIBUTIONS
AK was responsible for the HAHR analysis and the manual ATM analysis and for drafting the manuscript. AR and JPBL contributed to the development of the initial plant species pool and to the flora literature review. HJLJ and BJ contributed to the literature review concerning the pathogen biology and history. AJ was responsible for adapting the ATM method. All authors commented on the manuscript and approved the final version.

TABLE 4 Cross-reference of high-, medium-, and low-ranking species with the automated text mining (ATM) method