Geodemography, environment and societal characteristics drive the global diversity of emerging, zoonotic and human pathogens

Abstract Understanding human disease, zoonoses and emergence is a global priority. A deep understanding of pathogen ecology and the complex inherent relationships at the agent–environment interface are essential to inform disease control and mitigation and to predict the next zoonotic pandemic. Here, we present the first analysis of social and environmental factors associated with human, zoonotic and emerging pathogen diversity at a global scale, controlling for research effort. Predictor–response associations were captured by generalized additive models. We used national level data to aid in policy development to inform control and mitigation. We show that human population density, land area, temperature and the human development index at the country level are associated with human, emerging and zoonotic pathogen diversity. Multiple models demonstrating society–agent–environment interactions demonstrate that social, environmental and geographical factors predict global pathogen diversity. The analyses demonstrate that weather variables (temperature and rainfall) have the potential to influence pathogen diversity.


| INTRODUCTION
Infectious diseases are a leading threat to both human health and the global economy (Johnson et al., 2015). In a database of 335 emerging infectious disease (EID) events (defined as those caused by newly evolved strains of pathogens such as multi-drug-resistant pathogens, or pathogens that have recently entered human populations for the first time, or pathogens that have recently increased in incidence) between 1940 and 2004, more than 60% had a zoonotic origin (zoonotic pathogens were defined as pathogens that originated in non-human animals), and events increased over time (Jones et al., 2008). Over time, we have seen the emergence of many novel pathogens such as severe acute respiratory syndrome coronavirus, H1N1 influenza, Ebola and Nipah virus (Morse et al., 2012; Murphy, 1998). Thirty new diseases including Legionnaires' disease, acquired immune deficiency syndrome, hepatitis C, Nipah virus, Helicobacter pylori, severe acute respiratory syndrome, COVID-19, avian influenza, several viral haemorrhagic fevers and bovine spongiform encephalopathy/variant Creutzfeldt-Jakob disease have been reported in the last 50 years (Robin & Anthony, 2004). It is speculated that new infectious diseases could emerge anywhere across the globe at any time (Robin & Anthony, 2004), and the next human pandemic might have a zoonotic origin (Lancet, 2012) as evolution works on biodiversity to create novel pathogens.
Anthropogenic disturbances may play an important role in disease emergence and zoonoses. It has been argued that these disturbances primarily result from human incursions into pristine ecosystems and ecotones (Jones et al., 2013), and microorganisms often exploit such circumstances (Robin & Anthony, 2004); however, only limited information is available about the processes and actual facts of EIDs and zoonoses from wildlife globally. It is proposed that the interactions among ecological, biological and social processes in South East Asia enable microbes to exploit new ecological niches, leading to a high risk from emerging infectious diseases in this region (Coker et al., 2011). Rapid environmental change and agricultural intensification have been described as playing an important role in this phenomenon (Jones et al., 2013). On the other hand, it should also be noted that the majority of the recurringly transmissible zoonotic pathogens (with an existing animal reservoir) come from domestic animals, with rare occurrence from wildlife sources, except for a few specific diseases in which domiciliated animals such as rodents are the main reservoir hosts. Therefore, demographic change in domestic animals might be an important factor associated with the emergence of zoonoses, and this needs to be investigated. Understanding the ecological interactions underlying emerging and zoonotic infectious diseases (Qi et al., 2020; Ward et al., 2020a, 2020b) is important for their mitigation and control (Johnson et al., 2015).
Little is known about these processes, and the full range of interactions and dynamics underlying EIDs and zoonoses needs to be examined to develop a more comprehensive, context-specific understanding. Rapid increases in food trade and human travel are believed to be among the main drivers of pathogen emergence (Kilpatrick & Randolph, 2012). Modelling the dynamic factors underlying infectious diseases can assist the development of disease prevention and control strategies (Heesterbeek et al., 2015). Policymakers will benefit from the resulting quantitative evidence for decision-making and public health policy formulation (Heesterbeek et al., 2015).
To date, disease models to understand the factors responsible for pathogen diversity are limited. There are no mechanistic analyses that capture pathogen (human, zoonotic and emerging) diversity at the society-environment interface and their non-linear relationships.
Predictive modelling using national data to inform country-specific control programmes is also lacking. Jones and colleagues (Jones et al., 2008), using a global database, analysed 335 EID events reported between 1940 and 2004. However, only the 'emerging infectious disease events' data were modelled; the existing non-linear relationships (between predictor and response variables) were not accounted for and a limited number of socioeconomic and environmental predictor variables were used for model building. Following this, the occurrence of 147 'emerging infectious disease events' for wildlife zoonoses (or just zoonotic origins) was modelled using demographic and environment variables (Allen et al., 2017). However, this study also had some disadvantages as the ongoing linkage of all the diseases with animals could not be established. Biogeographic grouping patterns of 187 human infectious diseases have also been described (Murray et al., 2015).
Building on this previous research focussed on certain events/diseases, we conducted a novel cross-sectoral assessment rigorously investigating the relative contributions of potential geodemographic, environmental and social factors to the diversity of human, zoonotic and emerging pathogens. Noteworthy among these factors are the driving forces fundamental for emergence of infectious diseases (Binder et al., 1999; Jones et al., 2008; Morens et al., 2010; Robin & Anthony, 2004). A deep understanding of pathogen ecology, as well as the complex inherent relationships at the agent-environment interface, is essential to inform disease control and mitigation and to predict the next pandemic zoonosis. Public health measures are important barriers to disease emergence and zoonoses (Lindgren et al., 2012; Mitchell, 2000). Therefore, the effects of healthcare facilities and expenditure on health, research and development on pathogen diversity were also explored.
We also wish to make the reader aware of the uncertainties and controversies surrounding the ontologies used in this and in previous studies on this topic. As per the Joint WHO/FAO Expert Committee on Zoonoses, zoonoses are defined as 'those diseases and infections which are naturally transmitted between vertebrate animals and man' (WHO/FAO, 1959). However, even the WHO uses different definitions of the term 'zoonosis'. For example, these are two other, and very different, definitions of zoonoses used by WHO: (a) 'A zoonosis is any disease or infection that is naturally transmissible from vertebrate animals to humans' and (b) 'A zoonosis is an infectious disease that has jumped from a non-human animal to humans' (https://www.who.int/news-room/fact-sheets/detail/zoonoses). This has created a difference in opinion regarding the infectious agents that should be considered zoonotic pathogens. For example, some people believe that pathogens of zoonotic origin that have jumped species, such as HIV or the novel coronavirus, which are classified as zoonotic by the WHO (https://www.who.int/news-room/fact-sheets/detail/zoonoses), should be considered non-zoonotic because these pathogens are currently not transmitted between non-human vertebrates and humans. In the current and other studies (Taylor et al., 2001; Woolhouse & Gowtage-Sequeria, 2005), the standard definition of the WHO (WHO/FAO, 1959) for infectious agent classification was used, whereas Jones et al. (2008) coined a different definition to categorize pathogens as zoonotic (i.e., pathogens that originated in non-human animals). Therefore, these modelling studies should be interpreted and compared cautiously, owing to differences in the ontologies used for zoonotic or emerging pathogens.
Note that the results of our study might have been different if we had only considered the recurringly transmissible pathogens (requiring an animal reservoir for maintenance) between humans and non-human vertebrates as zoonotic pathogens.
Lastly, the zoonotic or emerging pathogens in this study were classified as described in the methods. The isolation of these organisms from different hosts/countries only describes their zoonotic and emerging potential; some of these might not actually be emerging or zoonotic in these specific countries.

| Microbe-country data set
A species-location interactions data set developed by Wardeh et al. (2015) was used to determine viral, bacterial, parasitic and fungal microbes isolated from vertebrate hosts in different countries. The data set used in the analyses contained 13,892 unique microbe-country interactions (Appendix S1, p 130-408). Full details of the microbe-country data set development are given in Appendix S1 (p 3). Note that there might be reporting bias due to factors such as differences in capacity among countries and a research focus shaped by prior decisions on risk.

| Human, zoonotic and emerging agents
Initially, we classified all microbes into human, zoonotic and emerging agents as reported by Taylor and colleagues (Taylor et al., 2001), and Woolhouse and Gowtage-Sequeria (Woolhouse & Gowtage-Sequeria, 2005). Zoonotic pathogens were classified as per the World Health Organization (WHO/FAO, 1959), as 'diseases and infections that are naturally transmitted between vertebrate animals and humans'.
However, species such as HIV that are no longer transmitted between humans and animals were not considered zoonotic (Taylor et al., 2001). Emerging pathogens were defined as those 'that have appeared in a human population for the first time or have occurred previously but are increasing in incidence or expanding into areas where they have not previously been reported, usually over the last 20 years' (IOM, 1992). Human agents were defined as previously reported infectious human pathogens (Taylor et al., 2001) plus additional microbes reported to have a human host (Wardeh et al., 2015).
The data set also included human/emerging/zoonotic agents that might not have been presented or reported in human hosts or have demonstrated their zoonotic/emerging potential in a given country but could infect humans and have been demonstrated to be zoonotic/emerging in some other parts of the world. This data set also had information about the microbe type (i.e., whether they were viruses, bacteria, parasites or fungi).

| Country-specific parameters
We examined published and official data and compiled information for 49 country-specific geodemographic, social, trade and environmental factors (Appendix S1, pp 4-5).

| Pathogen-country interactions and country-specific parameters
The unique pathogen-country interactions data were merged with the country-specific variables to construct the analytical data set.
The final data set had information about country-specific parameters and the numbers of human, emerging, zoonotic and total pathogens isolated from different countries (Appendix S1, pp 6-46).

| Missing data
Initially, we compiled information for 224 countries for 49 variables, but because of a lack of available information the following countries were excluded: Anguilla; Antarctica; Aruba; Bonaire, Sint Eustatius Islands. The final data set (for the initial modelling) contained information for 190 countries (Appendix S1, pp 6-46).

| Predictors and outcome
We used the country-specific parameters (Appendix S1, pp 4-5) as key predictors. Three host-pathogen-associated outcomes were explored: zoonotic, emerging and human pathogen diversity at the country level, where the zoonotic/emerging/human pathogen diversity was defined as the total number of different zoonotic/emerging/human pathogens reported from any given country.
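The outcome definition above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout, the function name `pathogen_diversity` and the placeholder country and pathogen names are all hypothetical, chosen only to show how the number of distinct pathogens per country and category could be counted from pathogen-country records.

```python
from collections import defaultdict

def pathogen_diversity(records):
    """Count distinct pathogens per (country, category).

    `records` is a hypothetical layout: (country, pathogen, categories),
    where `categories` is a set drawn from {"human", "zoonotic", "emerging"}.
    """
    seen = defaultdict(set)
    for country, pathogen, categories in records:
        for category in categories:
            # A set ignores repeated reports of the same pathogen.
            seen[(country, category)].add(pathogen)
    return {key: len(names) for key, names in seen.items()}

# Placeholder data, not taken from the study.
records = [
    ("Countryland", "Pathogen A", {"human", "zoonotic"}),
    ("Countryland", "Pathogen A", {"human", "zoonotic"}),  # duplicate report
    ("Countryland", "Pathogen B", {"human", "emerging"}),
    ("Otherland", "Pathogen A", {"human", "zoonotic"}),
]
diversity = pathogen_diversity(records)
```

In this sketch a pathogen that belongs to several categories contributes to each of the corresponding counts, which mirrors the way the three outcomes are defined separately in the text.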

| Statistical analysis
Descriptive analyses conducted included the creation of histograms and scatter plots of predictors with outcomes. Data were tested for the assumptions of linearity and normality. A variable was log-transformed if the assumptions of normality were not met (Appendix S1, p 47). We also detected non-error, non-representative outliers (values >1.5-fold interquartile range) in some predictor variables in the data (original or log-transformed). These outliers were treated using winsorization, and all further analyses were conducted both using the original and the winsorized data set.
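The outlier treatment described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: it assumes that outliers are values lying more than 1.5 times the interquartile range beyond the quartiles (Tukey fences) and that winsorization replaces such values with the nearest fence. The function names and the example values are hypothetical.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def winsorize(values, k=1.5):
    """Clamp values outside the fences to the nearest fence (assumed rule)."""
    low, high = tukey_fences(values, k)
    return [min(max(v, low), high) for v in values]

# Placeholder predictor values with one extreme observation.
raw = [1, 2, 3, 4, 5, 6, 7, 8, 100]
clean = winsorize(raw)
```

Here the quartiles are 3 and 7, so the fences are -3 and 13, and only the value 100 is pulled in to 13. Running every model on both `raw` and `clean` versions of the data corresponds to the dual analysis described in the text.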

| Generalized additive model (GAM)
We built three separate models for the response variables of zoonotic, emerging and human pathogen diversity using both the original and winsorized data. Because of non-linear associations between some predictors and the outcomes, we compared additive models using non-linear splines with linear models and retained the model in which the data were a significantly better fit and had a lower Akaike information criterion value.
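The comparison of linear and non-linear fits by Akaike information criterion (AIC) can be sketched in Python. This is a simplified stand-in rather than the study's method: a quadratic polynomial takes the place of a spline term, both models are fitted by ordinary least squares, and the AIC is computed for Gaussian errors up to an additive constant. The significance test mentioned in the text is not shown. All function names and data values are hypothetical.

```python
import math

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def polynomial_rss(x, y, degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    powers = range(degree + 1)
    design = [[xi ** p for p in powers] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in design) for j in powers] for i in powers]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in powers]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def gaussian_aic(rss, n_obs, n_coef):
    """AIC for Gaussian errors, up to a constant; +1 counts the error variance."""
    return n_obs * math.log(rss / n_obs) + 2 * (n_coef + 1)

# Placeholder data: a curved relationship with small deterministic noise.
x = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.3, -0.3, 0.1, 0.0]
y = [xi ** 2 + e for xi, e in zip(x, noise)]

aic_linear = gaussian_aic(polynomial_rss(x, y, 1), len(x), 2)
aic_curved = gaussian_aic(polynomial_rss(x, y, 2), len(x), 3)
preferred = "curved" if aic_curved < aic_linear else "linear"
```

With these values the curved model has the lower AIC, so it would be retained. The extra coefficient is penalised by the `2 * (n_coef + 1)` term, which is why a non-linear term is only preferred when it reduces the residual error enough to offset its added complexity.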
A correlation matrix was then generated (separately for data with outliers and winsorized data) among the predictor variables to test for collinearity. Initially, only one predictor was retained for the multivariable models from among the predictors having a Pearson correlation coefficient >0.90 (Appendix S1, pp 48-49). From these variables, final multivariable models were constructed. We followed a forward stepwise approach and retained variables with p < .001, followed by retesting of all the variables with p < .25 and all nonsignificant variables. A final multivariable model was also re-tested by individually replacing a non-linear spline variable with a linear variable and retaining the model in which the data fitted significantly better and had a lower Akaike information criterion value.
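The collinearity screen described above can be sketched as a greedy filter. This is only one way to apply the 0.90 threshold. The source does not state which predictor of a correlated pair was retained, so this sketch assumes the first one listed is kept, and it uses the absolute value of the Pearson coefficient, which is also an assumption. The predictor names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def drop_collinear(predictors, threshold=0.90):
    """Keep a predictor only if |r| <= threshold with every predictor already kept."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Placeholder predictors: predictor_b is almost a multiple of predictor_a.
predictors = {
    "predictor_a": [1, 2, 3, 4, 5],
    "predictor_b": [2, 4, 6, 8, 10.5],
    "predictor_c": [5, 1, 4, 2, 3],
}
kept = drop_collinear(predictors)
```

In this example `predictor_b` is dropped because its correlation with `predictor_a` exceeds 0.90, while `predictor_c` is retained. The result of a greedy screen depends on the order of the predictors, so in practice the choice of which variable to keep would be made on subject-matter grounds.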
Statistical analyses were conducted in the R statistical program (R statistical package version 3.4.0, R Development Core Team [2015], http://www.r-project.org). We used the parametric function 'linear' and non-parametric function 's spline (thin plate regression spline fit)' to fit the predictor variables. The function 'ti spline' was used to determine significant interactions among the combined parametric and non-parametric response variables. The GAM model fitting separated the linear trend of predictor variables from any other non-parametric association to determine significance of smoothing variables with non-linear patterns. The final models were examined for concurvity (a non-parametric analogue of multicollinearity), and model diagnostics were also examined.

| Controlling for research effort
To control for research effort, we included each country's gross domestic expenditure on research and development (GERD, in thousands of US dollars) in all the analytic models. All the non-significant variables were further re-tested in the models.

| Prediction of the expected response
The GAM models were used to predict the expected response variable using all the available data. In brief, we used data from all 224 countries (for predictions; Appendix S1, pp 50-71), wherever available for all the significant predictors reported in the GAM models.

| Generalized additive modelling
In total, all the predictor variables included in the GAM explained 74.8%-87.2% of the deviance in the different models.

| Zoonotic pathogen diversity
The logarithm of land area, human development index, logarithm of human population density, logarithm of mean annual temperature (average), logarithm of exports percentage of gross domestic product (GDP) and forest area percentage were associated with zoonotic pathogen diversity (Table 1, Figure 1). There were signifi- Additional results, including the analysis of winsorized data, are presented in Appendix S1 (pp 78-87).

| Emerging pathogen diversity
The GAM showed that emerging pathogen diversity is associated with the logarithm of human population density, agriculture land percentage, logarithm of land area, human development index value, national biodiversity index, longitude and the logarithm of sheep population (Table 2, Figure 3).
After controlling for research effort, emerging pathogen diversity was associated with the logarithm of human population density, logarithm of land area, human development index value and the logarithm of mean annual temperature average (Table 2, Figure 4). Further, significant interaction effects are reported between the human development index value and the logarithm of land area; longitude and the logarithm of human population density; the human development index value and the logarithm of mean annual temperature average; and the logarithm of mean annual temperature average and death rate per 1,000 people (Figure 4). Additional results, including analysis of the winsorized data, are presented in Appendix S1 (pp 88-106).

| Human pathogen diversity
The GAM showed a linear relationship of human pathogen diversity with the logarithm of human population density and logarithm of mean annual temperature average. Human pathogen diversity was also associated with the human development index value and national rainfall index (Table 3, Figure 5). Additional results, including analysis of the winsorized data, are presented in Appendix S1 (pp 107-121).

| Prediction of the expected response
The predicted values of the response variable for all the GAM models are shown in Appendix S1 (pp 108-112). A global richness map of pathogen diversity in different countries is shown in Figure 7.
The analysis indicates high pathogen diversity in many countries of Latin and North America, Asia, Australia and Europe (Appendix S1, pp 122-126).

| DISCUSSION
We conclude that geodemographic, environmental and social factors are significant predictors of zoonotic/emerging/human pathogen diversity. We believe that this analysis will help our understanding of the diversity of human, zoonotic and emerging pathogens as well as the associated global risk factors. The data generated are valuable for similar investigations in the future. Overall, we found a large range in the number of human, zoonotic and emerging pathogens reported from different countries. Lack of uniformity is likely due to differences in environmental, social and geographical factors at the country level, as well as potential differences in reporting.

TABLE 2 Generalized additive models of emerging pathogen diversity (outcome: log number of emerging pathogens) reported from different countries. Significant terms and interactions are presented in Figure 6.
TABLE 3 Generalized additive models of human pathogen diversity (outcome: log number of human pathogens) reported from different countries
We used national level data in this study, so any country could use these models after collecting the desired data for their own policy development and to inform their control or mitigation actions. However, these models do not account for subnational variations; therefore, the results should be carefully interpreted for countries that have large subnational differences in geodemography, environment and societal characteristics. We also controlled for research effort to reduce bias in the results; however, a certain level of uncertainty should be expected in the methods and outcomes. We believe gross domestic expenditure on research and development (GERD) per country to be a more appropriate proxy for controlling for research effort than the approach of a previously reported study (Jones et al., 2008), which accounted for these biases by quantifying reporting effort in an international journal (JID). The latter is biased; for example, scientists from non-English speaking countries are likely to have published their research in their native language journals, and many countries or institutes might not have published their research at all. GERD is a less biased representation of a country's research effort because laboratory support is more likely to translate into the number of pathogens reported from that country. The GAMs indicated a higher number of emerging pathogens (without controlling for research effort) for countries in the longitude range of −50.0° to 0° and ≥150°. After controlling for research effort, countries in a longitude range of −100° to −30.0° (North and South American countries) and having high human population density were found to report a higher number of zoonotic pathogens.
Although the effect was only marginally significant, a higher number of emerging pathogens was noted in the lower latitudes. The potential effect of proximity to the equator and hemisphere needs to be further investigated.
Certain variations were recorded in the analysis of winsorized data. For example, additional significant predictors, such as log mean annual temperature average, HDI value and longitude, were also associated with zoonotic pathogen diversity (after controlling for research effort), and forest area percentage was no longer significantly associated with zoonotic pathogen diversity (after controlling for research effort). After controlling for research effort, the predictors for emerging pathogen diversity (original and winsorized data) were similar except that the log health expenditure per cent GDP was additionally associated with emerging pathogen diversity (winsorized data). We accounted for non-linear relationships and used 49 socioeconomic and environmental variables during model development. By using pathogen biodiversity rather than EID events as our response variable, we included unknown or future disease emergence.
We found human population density, land area, mean annual temperature (average) and human development index value to be associated with the overall diversity of human, emerging and zoonotic pathogens, and this association is not limited to EID events as reported previously (Jones et al., 2008). The national rainfall index was a significant predictor in the models of human and zoonotic pathogen diversity, but had no role for emerging pathogen diversity.
Our analyses indicate that many socioeconomic and environmental factors are equally important for zoonotic and human pathogen diversity, as reported for disease emergence.
Our analysis shows the national biodiversity index as a significant predictor in emerging pathogen diversity models, as for EID events reported by Jones and colleagues (Jones et al., 2008). We also report forest area percentage to be associated with zoonotic pathogen diversity. Forests are home to terrestrial animal biodiversity (Brockerhoff et al., 2017), and deforestation and forest incursions have been linked to zoonosis (Wolfe et al., 2005). We found that the diversity of emerging human pathogens was additionally correlated with longitude and death rate per 1,000 people. This needs to be further explored.
However, we believe that the availability of our data will assist countries where no predictor/response data are currently available to develop such data, so that the desired information is available for analysis using our models. This will help overcome the limitations associated with incomplete data in the future.
Overall, social and environmental factors and geography are significantly associated with global pathogen diversity. Finally, our analyses demonstrate that weather variables (temperature and rainfall) have the potential to influence pathogen diversity. Further research is required to assess the long-term impact of these variables.
Similarly, the impact of climate change on pathogen diversity is a topic that needs to be researched. We believe future models based on simultaneous testing of host, agent and environment characteristics for prediction will shed more light on disease emergence and zoonoses.
We conclude that weather variables, as well as forest and biodiversity conservation, have the potential to influence human, zoonotic and emerging pathogen diversity in the near future.

ACKNOWLEDGEMENTS
The authors thank the Department of Education and Training, Australian Government for providing the 2018 Endeavour Research Fellowship to the primary author to conduct this research.

CONFLICT OF INTEREST
The authors declare no conflicts of interest.

ETHICAL APPROVAL
Informed consent for collection of epidemiological data was not required, as these data were already coded and available in the public domain. No identifiable personal information was used in this study.

DATA AVAILABILITY STATEMENT
The analysed data are available along with the manuscript. Sources of the raw data used in the analysis have been cited.