Habitat suitability and distribution potential of Liberibacter species (“Candidatus Liberibacter asiaticus” and “Candidatus Liberibacter africanus”) associated with citrus greening disease

To quantify current and predict future distribution of the citrus greening pathogens “Candidatus Liberibacter asiaticus” (Las) in Africa and “Candidatus Liberibacter africanus” (Laf) globally.

Dispersal takes place by movement of adult psyllids and transportation of infected nursery citrus stock (Gottwald, 2010;Lopes & Frare, 2007), with the latter being the most important factor in the long-distance dispersal of HLB and its vector (Halbert et al., 2010(Halbert et al., , 2012Narouei-Khandan et al., 2015). Trees infected with the Liberibacter pathogens show mild-to-severe yellowing on the shoots and subsequently progressive yellowing of the entire tree (Batool et al. 2007). Leaves may also become thicker, leathery and midribs, and lateral veins are sometimes enlarged, swollen and corky (Batool et al. 2007). Mottling and chlorosis are the main characteristic leaf symptoms. The fruits are often underdeveloped, lopsided, show colour inversion and can have a sour or bitter taste (Akhtar & Ahmad, 1999;ANR, 2010;Garnier & Bové, 1983;Jepson, 2009). Ultimately, the infected tree dies as there is currently no treatment for infected trees, and all commonly grown citrus varieties are susceptible to the disease (Li et al., 2019). This will lead to a decline in the production and subsequent collapse of the citrus industry in the region.
Presently, Huanglongbing has been reported in 44 countries in the Americas and Asia (CABI/EPPO, 2017). The first report of HLB on the African mainland was in Ethiopia (Saponari et al., 2010).
Currently, the habitat suitability for Liberibacter infection in Africa is poorly known. As a consequence, the design of spatially explicit management strategies to minimize losses in the citrus industry is challenging. Africa is an ecologically diverse continent with great diversity of agro-ecological zones varying in natural resources and land use patterns, which could lead to differences in the habitat suitability of Liberibacter in the continent. Generally, the distribution of species is significantly affected by climate change (Kerr, 2001), but no studies have assessed the role of climate change in the current and future suitability and distribution of Liberabacter in Africa.
Few risk assessment models are available currently for predicting the potential establishment of HLB (Gutierrez & Ponti, 2013), despite a call for predictive global mapping of the disease fifteen years ago (da Graça & Korsten, 2004). Most available models on the potential spread of HLB are based on suitable climate conditions for the psyllid vector D. citri (Aurambout, Finlay, Luck, & Beattie, 2009;Gutierrez & Ponti, 2013;Torres-Pacheco et al., 2013). However, the risk of the establishment of HLB is not based solely on the distribution of the vector, because other factors such as environmental requirements may differ between the psyllid and the bacterial pathogen (Gottwald, 2010;Narouei-Khandan et al., 2015).
Species distribution models are important tools for investigating the potential impacts of climate change on the distribution of species (Wiens, Stralberg, Jongsomjit, Howell, & Snyder, 2009). These models are employed to project potential future changes in the geographic ranges of species (Barrows, Rotenberry, & Allen, 2010;Gritti, Smith, & Sykes, 2006), estimate rates of extinction (Thomas et al., 2004;Williams, Bolitho, & Fox, 2003), examine the efficacy of existing reserve systems (Araujo, Cabeza, Thuiller, Hannah, & should inform strategies for surveillance and preventive management against the invasion and spread of this destructive disease. | 577 AJENE Et Al. Tellez-Valdes & Davila-Aranda, 2003) and help to prioritize biodiversity conservation efforts (Pyke, Andelman, & Midgley, 2005). Emissions scenarios represent probable futures and are used in species distribution predictions (Porfirio et al., 2014;Rosentrater, 2010;Weaver et al., 2013). They comprise of representative concentration pathways (RCPs) which are scenarios that describe alternate trajectories for carbon dioxide emmissions and the resulting atmospheric concentration from the year 2000 to 2,100. Four representative concentration pathway scenarios (rcp2.6, rcp4.5, rcp6 and rcp8.5) have been selected for climate modelling.
The moderate scenario (rcp4.5) is a stabilization scenario where total radiative forcing is stabilized shortly after the year 2,100, without overshooting the long-run radiative forcing target level (Clarke et al., 2007;Wise et al., 2009). The extreme scenario (rcp8.5) is characterized by continuous rapid increase in concentrations of atmospheric carbon dioxide reaching 950 ppm by the year 2,100 and continued increase for another 100 years (Riahi, Grübler, & Nakicenovic, 2007).
Representative concentration pathways are useful in species distribution modelling, as they aid in clarifying future climate scenarios and their impact on distribution range and the suitable habitat of species based on possible future greenhouse gas emission trajectories (Chhetri, Badola, & Barat, 2018). The use of the moderate and extreme scenarios in this study seeks to establish a wide prediction range on the effects of change in climate on species distribution while being realistic on the current global climatic trends. The use of the moderate and extreme scenarios in this study seeks to establish a wide range for predicting the effects of change in climate on species distribution while being realistic about the current global climate trends.
As a vector-borne plant pathogen, the Candidatus Liberibacter species resides mainly in the plant tissue. Therefore, the presence of the host plant under a particular bioclimatic condition will almost certainly mean the probable survivability of the Liberibacter species. Furthermore, the habitat suitability of certain bacterial species has been shown to be influenced by bioclimatic conditions for example; Stream ecosystem specificities related to the climatic and biogeochemical context can influence the microbial colonization (Sabater et al.., 2008). The hydrological regime, nutrient content, temperature conditions and biological interactions depend on the climate (Sabater et al., 2008). Bioclimatic conditions have also been shown to influence the abundance of the bacterial microbiome in amphibian microbiome (Kueneman et al., 2019). Thus, our approach seeks to provide an alternative method for surveillance purposes in addition to vector surveillance since non-vector transmission (human-mediated) is also a factor in the spread of the pathogen.
This study aimed to quantify the current and predict the future habitat suitability and distribution of Las in Africa under different climate change scenarios and to identify hotspots of habitat suitability of Las using consensus models. We also aimed to quantify the current and predict the future global habitat suitability of Laf to establish the potential worldwide distribution and risk of spread of African citrus greening disease.

| Species data
Historical global presence points from previous reports of Las and Laf were obtained from the Centre for Agriculture and Bioscience  Table   S3). Assessment of affected trees was made by visual inspection of foliage on site, and symptomatic tissue was tested for the presence of Las and Laf by PCR, and identities were confirmed by Sanger sequencing Rasowo et al., 2019). The presence data were cleaned by removing duplicate coordinates from the final dataset before running the models.

| Models
Three species distribution models were selected to study the potential distribution of Huanglongbing in Africa and African citrus greening disease in other parts of the world: (a) The classic climateenvelope-model BIOCLIM (Booth, Nix, Busby & Hutchinson, 2014), which employs only occurrence data to define a multidimensional environmental space in which a species can occur. The model predicts suitable conditions in a "bioclimatic envelope," consisting of a rectilinear region in environmental space representing the range (or some percentage thereof) of observed presence values in each environmental dimension (Phillips, Anderson, & Schapire, 2006).
To avoid the overpredictive effect of outliers, the resulting envelope can be reduced at specified percentiles or standard deviations (Araujo & Petersen 2012).
(b) Maximum Entropy (MaxEnt) (Phillips et al., 2006), which is a general-purpose method for making predictions from incomplete information (Phillips et al., 2006). It requires only presence data, together with environmental data of the study area, employs both continuous and categorical data, can incorporate interactions between different variables and can converge to the optimal probability distribution as a result of deterministic algorithms which have been developed within the model (Phillips et al., 2006).
(c) Boosted Regression Trees (BRT) (Friedman, 2001), which is one of several techniques that enhance the accuracy of a single model by using two algorithms: regression trees and boosting. Boosting improves the model accuracy (Schapire, 2003). It is a forward, stagewise procedure. In boosting, models are fitted in a stepwise manner to the training data; subsequently, appropriate methods are employed to gradually increase emphasis on poorly modelled observations by the existing collection of trees (Friedman, 2001).
These models were selected to represent a spectrum of model complexity as well as make up for the limitations of each individual model via the consensus approach.

| Selection of bioclimatic variables
Environmental predictors included 19 bioclimatic variables (Table 1) of 2.5 km spatial resolution for the current scenario using the baseline average   (Fick & Hijmans, 2017) and for each future scenario using the Coupled Model Intercomparison Project, Phase 5 (CMIP) 2055 average   (Taylor, Stouffer, & Meeh, 2012).
To assess the effect of climate change in the potential distribution of Las, we used two RCPs: CMIP5 rcp4.5 (moderate scenario) and CMIP5 rcp8.5 (extreme scenario). The environmental variables were obtained from the WorldClim database (http://www.world clim.

org/).
Expected multicollinearity among all the predictor bioclimatic variables were tested using the Pearson correlation test. The highly correlated variables were eliminated using the "findCorrelation" function in the "Caret" package (Kuhn et al., 2016) in R v3.5.1 environment via R-Studio (R evelopment Core Team, 2008). The correlation coefficient of |r|> 0.7 was set as a collinearity indicator for variables that would affect the models (Dormann et al., 2012).
Variables within this range were eliminated from the analysis, and only the uncorrelated predictor variables were used in the models.
A Jacknife test for variable importance was run on the selected variables in MaxEnt to determine the percentage contribution of each selected variable to the model.

| Model evaluations
Model accuracy was assessed using the area under receiver operating characteristic curve (ROC curve) (Thuiller, Arau´jo, & Lavorel, 2003). The ROC curve is a graphical technique that represents the relation between the false-positive fraction (1-specificity) and the sensitivity for a number of thresholds (Fielding & Bell, 1997;Phillips et al., 2006). A curve that maximizes sensitivity for low values of (1-specificity) infers good model performance.
The classification of the accuracy of a diagnostic test is the traditional academic point system (Swets, 1988): 0.90-1.00 = excellent; 0.80-0.90 = good; 0.70-0.80 = fair; 0.60-0.70 = poor; and 0.50-0.60 = fail. The difference between the areas under ROC curves generated by two or more models provides a measure of comparative discrimination ability of these models when applied to independent evaluation data. All models were run in R v3.5.1 environment via R-Studio, and the data were exported as ASCII files for enhanced visualization with QGIS software v2.18.15 (QGIS Development Team, 2016).

| MaxEnt
Presence locations for Las and Laf (occurrence data) were compared against all the pseudo-absence points that are available in the study area to avoid model overfitting of spatially clustered presence points and inability to predict spatially independent data. Pseudo-absence points are randomly sampled points from a given geographic area and treated like locations where the species of interest is absent.
Pseudo-absence points were generated automatically in MaxEnt by random selection from all points within the studied area excluding available presence points (Barbet-Massin, Jiguet, Albert, & Thuiller, 2012) ( Figure S4). The model was trained using 75% of the presence data and validated using 25%. The model was run with 5,000 iterations and > 10,000 background points for both current and future climate scenarios.

| BIOCLIM
The mean and standard deviation for each environmental variable were used individually to compute bioclimatic envelopes. The level TA B L E 1 Predictor bioclimatic variables used for modelling the ecological niche for "Candidatus Liberibacter asiaticus" and "Candidatus Liberibacter africanus." Variables selected through a multicollinearity test using the "Find correlation" function in the caret package in the R software are shown in bold. Data were sourced from the WorldClim database accessed on November 2018

| Boosted Regression Trees
The model was calibrated using the gradient boosting function "gbm.
step" in the "dismo" package (Hijmans, Phillips, Leathwick, & Elith, 2017) in R. The data and settings were as follows: the data frame containing the trained data, the predictor variables-gbm.x = 2.20, the response variable-gbm.y = 1, the nature of the error structure (family = "bernoulli"), the tree complexity (5), the learning rate (0.01), and the bag fraction (0.5). All other parameters not named in the call were set at as default.
Additionally, the use of several algorithms to project species distributions is important because models with the highest accuracy on current climate data may not be optimal in projecting onto new areas or climate conditions (Heikkinen, Marmion, & Luoto, 2012). We employed an ensemble modelling method of obtaining a consensus of the three models, weighted by their AUC scores.
Averaging several models have shown that the "output" of interest is isolated from the "noise" associated with individual model er-

| Effect of climate change
The impact of climate change on the future distribution of HLB was determined using the raster calculator in QGIS to resolve the differences between the outputs of the two scenarios assessed. The effect of climate change was calculated by subtracting the raster outputs. The potential distribution due to a change from the extreme emission scenario to the moderate emission scenario was calculated as CMIP5 rcp 4.5 -CMIP5 rcp 8.5 = change. The potential distribution due to a change from the moderate emission scenario to the extreme emission scenario was calculated as CMIP5 rcp 8.5 -CMIP5 rcp 4.5 = change. The final output was streamlined to project only the optimum habitat suitability for better visualization of the change in the distribution of the hotspots. The uncorrelated predictor variables, as well as the biologically relevant precipitation-dependent variables (Table 1), were used in the models. The jackknife test for variable importance showed that the precipitation of the wettest month, and the annual mean temperature had the highest percentage contribution and permutation importance to the model, respectively. Precipitation of the driest month had the least percentage contribution, while the mean temperature of the warmest quarter had the least permutation importance (Table 2).

| Predicted current distribution of Huanglongbing in Africa
The predicted current distribution of HLB, as obtained from the three-model consensus, showed all countries of Central, Eastern, Southern and Western Africa having marginal to optimal habitat suitability for Las (Figure 1a). In contrast, large areas of North Africa were predicted to have habitat unsuitable for Las, but the north-

| Predicted future (2050) distribution of Huanglongbing in Africa
The future distribution of HLB in Africa predicted using the threemodel consensus also showed a wide distribution under the moderate (CMIP5 rcp4.5) climate scenario and the extreme (CMIP5 rcp8.5) climate change scenarios. All countries of Central, Eastern, Southern and Western Africa showed marginal to optimal habitat suitability for Las (Figure 1). The extreme scenario (Figure 1b) showed increased areas of habitat suitability compared to the moderate scenario (Figure 1c). A shift in the distribution of HLB was shown by the difference in the hotspots between the current and the future scenarios showed. Under the extreme scenario, the areas of optimum habitat suitability absent in the current distribution and present in the future distribution were predominantly in Western, Central and Eastern Africa (Figure 2a), whereas under the moderate scenario, these areas were predominantly in Southern Africa (Figure 2b).

| Predicted effect of climate change on habitat suitability of Huanglongbing in Africa
Assessment of predicted habitat suitability showed that there are more areas of optimal habitat suitability for Las under the extreme scenario (CMIP5 rcp8.5) than in the moderate scenario (CMIP5 rcp4.5) (Figure 3a). The major citrus exporting countries in Africa (Egypt, Ethiopia and South Africa) showed fewer hotspots in the moderate scenario than in the extreme scenario (Figure 3b).

| Predicted current distribution of African citrus greening globally
The   (Figure 4).

| Predicted future (2050) distribution of African citrus greening globally
The predicted future distribution of ACG globally under the extreme scenario (CMIP5 rcp8.5) showed an increase in the areas of habitat suitability from the current distribution from the current scenario ( Figure 5). Specifically, few areas showed a slight alteration: Papua New Guinea showed a slight increase, while Australia showed a decrease. The predicted future distribution of ACG globally under the moderate scenario (CMIP5 rcp4.5) showed a decrease in the areas of habitat suitability from the current distribution from the current scenario ( Figure 6).

| D ISCUSS I ON
Huanglongbing caused by Las is a destructive disease of citrus worldwide and has recently been reported on the African continent Saponari et al., 2010), while Laf-associated African citrus greening disease has been reported in Africa since the 1920s (Buitendag & von Broembsen, 1993;Massonié et al., 1976;McClean & Oberholzer, 1965;Rasowo et al., 2019;Roberts et al., 2017). Vector-based modelling has been the norm for predicting the distribution of HLB because modelling ecological niches for plant pathogens can be a problematic endeavour considering the different variables involved. However, environmental and climate data, including monthly temperature and rain, are assumed to reflect the climate suitability for the growth and development of different organisms, including plant pathogens (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005). Ecological factors such as temperature, light and water availability, soil fertility, methane and CO 2 concentration can have positive, neutral or negative effects on disease development, as pathogens show varying responses to these factors (Velásquez et al., 2018). Furthermore, increased industrialization of developing countries leads to has a direct effect on the CO2 emissions and hence an effect on the distribution of plant pathogens (Anderson, 2010).
Studies have shown CO2 levels affect the photosynthetic rate and crop yield of C3 plants such as soybean, cotton, oranges and lemon.
The increase in CO2 concentrations has also been shown to increase disease severity in some plants such as rice and wheat (Kobayashi et al., 2006;Váry, Mullins, McElwain, & Doohan, 2015). Precipitation and soil moisture have a crucial effect on plant disease establishment, because most plant diseases are favoured by conditions of rain, high air humidity and high soil moisture (Velásquez et al., 2018).
For instance, fungal pathogens such as Magnaporthe oryzae require a minimum of five hours of leaf wetness for disease establishment (Magarey, Sutton, & Thayer, 2015). Temperature also plays a vital role in plant-pathogen interaction. Disease development in plants generally have an optimal range; for example, Xanthomonas oryzae has been shown to have an optimal daytime temperature of 35°C and night time temperature of 27°C for the infection of rice (Horino, Mew, & Yamada, 1982), while a temperature range between 26°C and 31°C are optimal for papaya ringspot virus to colonize papaya (Mangrauthia, Singh Shakya, Jain, & Praveen, 2009). Temperature is a crucial factor in the development of plant diseases, as the projected increase in the global temperature may change the areas where crops are susceptible to a particular pathogen (Velásquez et al., 2018). Furthermore, certain diseases with the potential to cause epidemics may never do so due to transient shifts in temperature (Velásquez et al., 2018). Temperature also has a significant effect on vectorborne pathogens such as Las transmitted by Diaphorina citri, Laf transmitted by Trioza erytreae and Banana bunchy top virus (BBTV) transmitted by Pentalonia nigronervosa (Robson, Wright, & Almeida, 2007) and influences the incidence and severity by affecting the vector (Robson et al., 2007). Temperatures favourable for the D. citri and T. erytreae could explain epidemics in Las and Laf even if the conditions are suboptimal for the bacterial replication.
HLB is known to be most severe in warm and wet climates (Bové, 2014), which are common in the areas that showed optimal habitat suitability in this study. We found that the environmental variables which had the highest contribution to the models were the temperature and temperature-dependent precipitation variables.
Specifically, the precipitation of the wettest month had the highest contribution to the habitat suitability of Liberibacter, while the annual mean temperature was the most important variable when considered in combination with other variables. These results are in agreement with a previous study that showed that optimal and limiting temperature conditions for HLB depend on rainfall (Shimwela et al., 2016). Also, the vector D. citri is heat-tolerant and survives in tropical and subtropical climates.
The ensemble approach to plant pathogen predictive modelling carried out in this study corroborates previous vector-based approaches to HLB prediction. Our results highlight the potential distribution of Las in Africa and show that large areas of Central Africa, Eastern Africa, Southern Africa and some parts of Western Africa are highly suitable for HLB establishment. This is in agreement with a global study on the potential spread of HLB that used two correlative modelling approaches to predict the potential distribution of Las based on the distribution of the insect vector (Narouei-Khandan, Halbert, Worner, & VanBruggen, 2015). The authors also pointed to central and south-eastern parts of Africa as highly suitable for HLB. Our results concur with a previous study highlighting areas in West Africa as highly suitable for HLB, also based on habitat suitability of the vector, as obtained from a multimodel framework (Shimwela et al., 2016). where the pathogen was found in previous studies Saponari et al., 2010). In 2016, the presence of Las was reported in Uganda (Kalyebi et al., 2016). However, the presence points from Uganda were not included in the models as the report was shown to be a misidentification of LafCl (Roberts et al., (2017).
In Uganda, large areas of marginal suitability and a few areas of high suitability for Las were predicted but no areas of optimal suitability for the pathogen were observed.
Citrus cultivation in most parts of Africa is a key source of livelihood for small holder farmers as production ranges from small to medium scale . In Kenya and Tanzania, ACG has had the greatest impact on citrus production in the cooler highland regions, causing yield losses of 25%-100% (Pole, Ndung'u, Kimani, & Kagunu, 2010 Vector-based modelling has shown that it took more than seven years after documented individual insect-based infections for the entire grove to become fully symptomatic (Kobori, Takasu, & Ohto, 2012). In the context of invasive pathogens such as Las, this approach could prove fatal as infected saplings from seemingly uninfected orchards can be unintentionally transferred to new groves, thereby facilitating non-vector dissemination of the disease. Infection of trees due to the presence of D. citri can occur such that, within 1-2 years, the entire grove can be asymptomatically infected (Lee et al., 2015).
HLB surveillance has been problematic because infected citrus groves may not show symptoms of the disease until up to 5 years after infection (Manjunath et al., 2008). Therefore, surveillance has relied on the detection of the pathogen within insects because the infection can be detected in the vector months before the plants develop symptoms (Manjunath et al., 2008;Shen et al., 2013). Our projection of the distribution of Las will inform; closer vector monitoring in citrus-producing areas with high suitability for HLB establishment, as well as periodic testing of asymptomatic citrus plants in these high-risk areas.
Liberibacter presence points obtained from detection in citrus plants and used in the models show that the habitat suitability for citrus production has an implicit impact on the potential distribu-

ACK N OWLED G EM ENTS
We would like to thank Dr Brian Isabirye and Abelmutalab Gesmalla for intellectual support and feedback. We gratefully acknowledge the support for this research by the following organizations and

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Raster layers of the models are available to managers and the scientific community upon request.