Different environmental gradients associated to the spatiotemporal and genetic pattern of the H5N8 highly pathogenic avian influenza outbreaks in poultry in Italy

Abstract Comprehensive understanding of the patterns and drivers of avian influenza outbreaks is pivotal to inform surveillance systems and heighten nations’ ability to quickly detect and respond to the emergence of novel viruses. Starting in early 2017, the Italian poultry sector has been involved in the massive H5N8 highly pathogenic avian influenza epidemic that spread in the majority of the European countries in 2016/2017. Eighty‐three outbreaks were recorded in north‐eastern Italy, where a densely populated poultry area stretches along the Lombardy, Emilia‐Romagna and Veneto regions. The confirmed cases, affecting both the rural and industrial sectors, depicted two distinct epidemic waves. We adopted a combination of multivariate statistics techniques and multi‐model regression selection and inference, to investigate how environmental factors relate to the pattern of outbreaks diversity with respect to their spatiotemporal and genetic diversity. Results showed that a combination of eco‐climatic and host density predictors were associated with the outbreaks pattern, and variation along gradients was noticeable among genetically and geographically distinct groups of avian influenza cases. These regional contrasts may be indicative of a different mechanism driving the introduction and spreading routes of the influenza virus in the domestic poultry population. This methodological approach may be extended to different spatiotemporal scale to foster site‐specific, ecologically informed risk mitigating strategies.


| INTRODUC TI ON
In 2017, the Italian poultry sector has been involved in the highly pathogenic avian influenza (HPAI) epidemic caused by type A (H5N8) viruses of clade 2.3.4.4 that spread across 29 European countries from the winter of 2016 (Brown et al., 2017). In Italy, eighty-three outbreaks were confirmed in the domestic poultry during 2017, that peculiarly occurred in two distinct epidemic waves, affecting both the rural and wild bird ecosystem, where all influenza virus subtypes are maintained in the low pathogenic forms, characterized by a predominantly environmental transmission between hosts, especially within residential and migrating populations of the orders Anseriformes and Charadriiformes, through sharing foraging and breeding areas.
The second is the poultry farms and their associated value-chain networks system, which forms a secondary system. In the latter, once the AI virus is introduced, its transmission is mainly related to human-mediated activities (Alexander, 2007). Environmental correlates of avian influenza virus occurrence and persistence have been broadly studied at local and global scale, both in wild birds and domestic poultry populations, by means of spatiotemporal and niche suitability modelling, especially for the H5N1 HPAI virus, which has spread extensively and for more than a decade worldwide (FAO, 2019). Several spatial analytical studies were reviewed by Gilbert and Pfeiffer (Gilbert & Pfeiffer, 2012 and references therein).
These authors highlighted three main categories of factors, whose pattern associated with H5N1 HPAI presence showed to be consistent worldwide: (a) high poultry density, with a specific indication of the domestic waterfowls productive type; (b) anthropogenic variables (human population density, distance to roads and cities, which can be a proxy for indirect contacts due to human-mediated activities, movements of poultry, poultry products, personnel, manure, equipment); and (c) presence of water or indices of habitat suitable for waterfowl. Besides, AI occurrence has been found associated to a wide range of values of eco-climatic variables (e.g. vegetation indices, temperature, precipitation) aggregated over different periods (month, year, season), reflecting the plasticity of this virus in nature and supporting the hypothesis that, for a directly transmitted disease, such factors did not represent a constraint. Land cover and land use were rarely systematically included and, if so, they define a combination of previously described factors (environment suitable for wild birds' ecology or human related activities) (Gilbert & Pfeiffer, 2012). Finally, low altitude values were found as well consistently associated to HPAI outbreaks occurrence (Loth et al., 2011;Mannelli, Ferre, & Marangon, 2006). Conversely, for the H5Nx viruses of clade 2.3.4.4 there is a notable paucity of such studies and most of them have a focus on local production systems (Dhingra et al., 2016;Guinat et al., 2019;Guinat et al., 2018;Kim et al., 2018), largely because at the time of publication, extensive data on these viruses potential range were not yet available. The most noticeable difference concerned the prevalent effect of host distribution, especially of extensively reared poultry, on H5N1 HPAI outbreaks occurrence, versus the association of H5Nx clade 2.3.4.4 viruses with a mixture of environmental, managerial (biosecurity practices) and host distribution variables, especially chicken intensively reared.
It is hypothesized that in domestic poultry populations, the risk of introduction is determined by the farm's neighbourhood characteristics, contact network and the levels of biosecurity. Therefore, if wild birds act as the main source of infection for domestic poultry, then the disease pattern is expected to be strongly associated to eco-climatic and physical environmental factors that directly or indirectly promote the availability of shelter and food for waterfowl. Contrarily, any anthropogenic, managerial and demographic aspects or their surrogate measures that are found associated to outbreaks occurrence or outbreaks' features could be deemed as a robust indication that a secondary transmission between domestic premises is at the origin of the infection.
Recently, it was postulated that assembling methodologies from diverse disciplines (ecology, epidemiology and phylogenetics) can  Primary (n = 4) and secondary (n = 32) infected premises that can be grouped into 4 different clusters inferable due to high number of connections (proximity between cases, genetic similarity and/or trade of living poultry  The industrial sector comprises holdings where birds are kept for commercial purposes only. The rural sector includes those holdings where birds (less than 250 in number) are reared and kept exclusively for self-consumption or are reared and traded with other rural premises, or a combination of these activities (Cecchinato et al., 2011) 3 One of the four distinct genomic groups identified during the H5N8 HPAI epidemic in Italy (Fusaro et al., 2017;Mulatti et al., 2018) 4 First wave (1): January-May 2017; second wave (2): July-December 2017 5 As defined by , a primary case (1) is when no at-risk contacts with previously detected outbreak emerged from the epidemiological investigation and no phylogenetic similarity was found. Contrarily, a secondary case (2) is when at-risk contacts with an infected farm during its outbound risk period could be reliably traced back and/or genomic analyses showed a high degree of similarity between virus isolates.

TA B L E 1 (Continued)
advance the understanding of emerging infectious diseases, which are inherently difficult to study because of ecological interactions among animal populations, viruses and their environment (Plowright, Sokolow, Gorman, Daszak, & Foley, 2008). The combined application of statistical techniques from multivariate data analysis (ordination and clustering methods) and classical regression modelling, proved well suited to serve that aim for avian influenza (Carrel, Emch, Nguyen, Todd Jobe, & Wan, 2012;Mughini-Gras et al., 2014). Data reduction is the first step in a complete analysis of ecological information, usually preceding the possibility to examine the ordination-environment relationship(s). Ordination is a well-established strategy for reducing the dimensionality of a multivariate data set by condensing a large number of original variables into a smaller set of new dimensions with a minimum loss of information. It has long been used in ecology for both data presentation and to reveal important and interpretable environmental gradients associated with the community data . By means of ordination, ecological community attributes can be summarized and plotted in a multidimensional graph, which can help us see whether the community data are structured or contain patterns. These patterns may reflect a community's response to multiple environmental changes or more subtle biological interactions. Besides, with the advent of phylogeography, integrating the genetic evolution dimension in the study of the ecological system within which AI viruses occur and evolve became easier.
Our goal in the present study was twofold. Firstly, to determine whether it was possible to identify a pattern in the Italian H5N8 HPAI outbreaks diversity, with respect to their peculiar spatiotemporal and genetic features, using multivariate statistical methods.
Secondly, to reveal whether the pattern of the outbreaks community could be related to eco-environmental variables gradients, through multi-model regression selection and inference. Once the pattern of H5N8 HPAI outbreaks in Italy will be related to environmental factors variation, this may prove useful in developing site-specific risk mitigating measures.

| Outbreaks data
The data set used to explore the environmental correlates of AI outbreaks pattern consists of the 83 H5N8 HPAI cases confirmed in the domestic poultry in Italy in 2017. Table 1 summarizes the main features of each outbreak as derived from the epidemiological investigation and genetic characterization of the virus isolates, which have been thoroughly described by Mulatti et al., (2018) and by Fusaro et al., (2017).

| Environmental data
By reviewing relevant literature, in particular for viruses of the H7 and H5 subtypes (such as H5N1 and H5NX clade 2.3.4.4), we identified factors, as well as proxies for some of these, known to define area suitability for HPAI outbreaks (Belkhiria, Alkhamis, & Martinez-Lopez, 2016;Dhingra et al., 2016;Guinat et al., 2019), or that best relate to virus occurrence or virus introduction and spread in domestic poultry and wild birds, both at local or global-scales (Gilbert & Pfeiffer, 2012;Si, de Boer, & Gong, 2013;Si et al., 2010). This information was then combined with an assessment of data availability, so that seventeen eco-environmental factors were selected as inputs for our data analysis. Those variables were grouped into three main categories: host, land cover and eco-climatic variables.
For each factor/group of factors, corresponding data sources, trends relative to AI virus spread or risk of introduction, rationale for model inclusion, and references are described below and summarized in Table 2.
Host variables include the densities of human population, of domestic poultry holdings and of their respective poultry population.
Poultry farms and poultry population densities were considered a surrogate for the risk of virus spread from farm to farm associated to densely populated poultry areas, where the high direct and indirect contact rates in an immunologically naïve hosts' community influence disease flare-ups with known major detrimental effects (Marangon, Capua, Pozza, & Santucci, 2004). We used the official poultry census data, which were derived in 2017 from the National Animal Registry, coupled with actual census data, collected during each outbreak investigation. Densities were computed for each outbreak for a buffer radius of 3 and 10 km, in relation to the protection and surveillance zones put in place during an outbreak. The number of holdings that fell within the buffer zones and were in operation up to two weeks preceding the date of onset of symptoms (considered as the maximum incubation period for AI in a flock (Swayne, 2016), as well as their maximum stocking capacity, were included in the calculation. Human population density was included as a proxy for any human-mediated transmission mechanisms, and was based on the data provided by the Italian National Institute of Statistics (ISTAT) at the level of municipal census block for the year 2017.
Land cover and eco-climatic variables were extracted from

| Phylogenetic analysis
Pairwise evolutionary distances between domestic cases were calculated from the branch lengths of phylogenetic trees estimated using an alignment comprising one virus sequence associated with a bird belonging to each case. Aligned haemagglutinin (HA) gene sequences and dates of onset of symptoms were used to reconstruct time-measured phylogenetic trees inferred using the program BEAST v1.10.4 (Suchard et al., 2018). Prior to implementation in BEAST, models of nucleotide evolution were compared using BIC and AIC calculated using jModelTest v2.1.7 (Darriba, Taboada, & Doallo, 2012). This indicated that sequence evolution should be modelled using the general time reversible model with a proportion of invariant sites and a gamma distribution describing among-site rate variation with four categories estimated from the data (GTR + I + Γ 4 ). This model was implemented in BEAST with a relaxed molecular clock with branch rates drawn from a lognormal distribution (Drummond, Ho, Phillips, & Rambaut, 2006). This model, where branch-specific rates of evolution are drawn from a lognormal distribution, was fa- Phylogenies were constructed using an alignment of 83 sequences from each domestic case and a further ten sequences from wild birds cases and, alternatively, using an alignment of domestic cases only.
For each alignment, two independent MCMC chains consisting of 30,000,000 steps were run and sampled every 3,000 steps. Samples were combined after removing 10% of samples as burn-in. To account for phylogenetic uncertainty, pairwise evolutionary distances were calculated for each of 100 trees drawn from these posterior samples of trees using the function cophenetic.phylo from the ape R package v5.4 (Paradis & Schliep, 2019). Mean pairwise evolutionary distances were calculated averaging across these and used for further analysis.

| Ordination and cluster analysis
The ordination of H5N8 Italian outbreaks was achieved using nonmetric multidimensional scaling (NMDS). This is a distance-based method, which attempts to represent as closely as possible the pairwise dissimilarity between objects, expressed by ranks.  (Borcard, Gillet, & Legendre, 2011). The goodness-offit criterion, which measures the degree of distortion of the ordination with respect to the original input data, is called stress.
Conventionally, stress values < 5% indicate a good fit for the data,

| Multivariable regression: multi-model selection and inference
When faced with a biological question, such as to determine which covariates are driving an ecosystem, one of the most difficult aspect of the data analysis is probably how to deal with collinearity, which implies that two or more regressors are conveying the same information (Zuur, Ieno, & Elphick, 2010). Before running the multivariable regression, we addressed this issue by sequentially dropping the covariate with the highest variance inflation factor (VIF), recalculate the VIFs and repeat this process until all VIFs were smaller than 5. When the variable with the highest VIF had an alternative measure of the same predictor (e.g. EVI and NDVI), then the more biologically plausible variable from this pair, or the one which gives a better measure of the relative quality of the regression model for a given set of data (i.e.
lower Akaike information criterion, AIC), was chosen to be retained. Within the set of all the potential models, only those whose total cumulative Akaike weights ( To find evidence of fine-scale spatial variation that was not accounted for in the model, plots of the empirical variogram of the standardized residuals for each of the multinomial models whose total cumulative wAICc was at least equal to 0.95 were examined. This was coupled with the computation of envelopes using data permutation (simulations n = 999) under the assumption of no correlation. If the variogram plot falls within the envelopes, the model is presumed to have considerably reduced spatial autocorrelation. The empirical variogram and the variogram envelopes were estimated using the geoR package in R.
Lastly, to visually examine the gradient of effect of the variables included in the best-ranked regression models across the ordination graph, smooth surface thinplate splines were fitted using the ordisurf function in R package vegan. This procedure uses generalized additive models (GAMs) to overlay a smoothed response surface to the ordination space, which allows a more detailed visual interpretation and reveals more complex patterns than just a linear relationship.

| Outbreaks ordination and cluster assignment
The NMDS analysis produced a two-dimensional solution with a final stress value of 1.29%, indicating that the ordination represented well the spatiotemporal and genetic differences among the 83 H5N8 HPAI outbreaks. The ordination graph, given in Figure Figure S4 shows the final dendrogram with boxes around the three clusters. A cophenetic correlation coefficient of 0,95 indicates that the cluster analysis well represent the original distance matrix. Clusters 1, 2 and 3 are arranged by colour code and superimposed on the ordination plot in Figure 2, showing a consistent representation of the spatial, graphical output of ordination and the hierarchical cluster assignment.
Within 1 km around each outbreak locations, the dominant land cover class is agricultural areas, followed by artificial surfaces (well-developed roads and railways) and a relatively higher percentages of forest and semi-natural areas in the third cluster, with the latter feature resembling the higher percentage of outbreaks involving the rural sector, located in more hilly areas. None of the land cover classes resulted significantly different between clusters.

| Regression analysis
Multi-collinearity was high (VIF > 5) among variables representing alternative measurements of animal host density, as well as within the group of eco-climatic variables, as well as between land cover variables and elevation. The two vegetation indexes were highly collinear. We decided to keep the NDVI because it is a better predictor

TA B L E 3
Ninety-five per cent confidence set of best-ranked regression models (the 5 models whose cumulative Akaike weight, wAICc, is ≤0.95) of AI occurrence in the literature (Gilbert & Pfeiffer, 2012;Mannelli et al., 2006;Si et al., 2010) and because it better predict the outcome variable in the regression analysis compared to the combination of agricultural areas and forest and semi-natural areas land cover classes. The following variables were omitted from further analysis in order from first to last: EVI, poultry farm density within 10 km, forest and semi-natural areas, poultry density within 10 km, human density and poultry farm density within 3 km, amount of precipitation, water bodies, rice fields, wetlands and agricultural areas.
The multinomial regression model included 6 predictor variables: poultry population density within 3 km, artificial surfaces, distance to the nearest wetlands, elevation, NDVI and the number of precipitation days.
Sixty-four models were built considering all the possible combination of variables. The highest ranked model had a wAICc of 0.43, and 5 models were required to reach a cumulative wAICc of 0.95 (Table 3). Within the multi-model inference framework, it was possible to quantify the relative variable importance, by estimating the probability that a given explanatory variable is included in the best IT model ( Table 4). The model with averaged parameters showed that only few factors suffice to discriminate the outbreaks observed in the H5N8 epidemic of 2017: host density, vegetation index (NDVI) and elevation, which were always included in the top 5 models (out of a set of 64) that accounted for a cumulative wAICc ≤ 0.95. The climatic variable 'number of rainy days' was included only in 3 out of 5 models.
In contrast, the other factors considered (the percentage of land covered by artificial surfaces and the distance to the nearest wetlands) were of relatively low importance (Table 4), as can be inferred from their lower probability of being included in the best AICc model.

Pattern of community similarity, for both Italy A (Cluster 2) and
Italy B (Cluster 3), was strongly and positively influenced by the host density, precipitation and NDVI, compared to outbreaks of the first wave (Cluster 1), the reference category. Diversity between clusters is reflected as well in the altitude. An increase in altitude can be found associated to Cluster 3 compared to Cluster 1, but conversely for Cluster 2 compared to Cluster1. Artificial surfaces and distance to the nearest wetlands contribute to explain the pattern of AI outbreaks, but with a lower relative importance. A higher percentage of artificial surfaces and increasing distance to the nearest wetlands are associated to the probability of belonging to Cluster 2 and 3 than to Cluster 1. The same relationship holds for the distance to the nearest wetlands, for which a higher probability of belonging to Cluster 2 and 3 than to Cluster 1 is associated to a greater distance to the nearest wetlands.
The variograms of the residuals for the 5 models whose cumulative wAICc was 0.95 were all contained by their envelopes, which suggests that no spatial autocorrelation greater than that which might be expected by chance remained after adjusting for the covariates. Therefore, models that account for spatial dependency were not explored in this study.
The GAM approach revealed a complex pattern in the relationship between the outbreaks and most of the environmental variables, with the exception of precipitation days and percentage of artificial areas, for which the algorithm fitted a linear trend. Figure   S6 shows how those variables (response surfaces) changes across the space of the NMDS graph, where the ordinated outbreaks are  were obtained for distance to the nearest wetlands (22.6%), number of precipitation days (11.2%) and artificial surfaces (7.08%). The fitted response surfaces were significant for all the models.  (Fusaro et al., 2017).

In the NMDS space, a high degree of distinctiveness between
Italy A and Italy B groups of outbreaks is well represented (they are located in two different and opposite quadrants) along with a clear separation from the outbreaks of the first wave (located in between the Italy A and Italy B groups). Indeed, genetic analyses (Fusaro et al., 2017) and further integrated phylogenetic and phylodynamic analyses (Harvey et al., 2019), suggested that Italy A and Italy B groups likely originated in Italy and emerged when the H5N8 virus was already circulating in this country, prior to the first case of the second wave. In addition, the outbreaks ascribed to Italy A and Italy B differed in the geographical locations where they occurred, which never overlapped during the epidemic . Italy A was isolated in outbreaks affecting the eastern part of the DPPA, while Italy B circulated mainly in the western part of that area, with sparse incursions in the southern regions involved in the epidemic in 2017. Contrarily, the similarity shown in the ordination graph within the Italy A and Italy B groups is very high, and it is possible to pinpoint multiple areas where points overlapped. Those correspond to the 4 different clusters inferable as such by Mulatti and colleagues  for the high number of connections (proximity between cases, genetic similarity and/or trade of living poultry).
Using multi-model selection, a combination of host density and eco-climatic variables, rather than land cover predictors, appeared to best model the probability of belonging to one of the three clusters of outbreaks identified in this study, with a ruling effect of poultry population density within a 3km radius, NDVI and elevation, followed by precipitation, presence of artificial surfaces and distance to the nearest wetlands. In addition, the response surfaces produced for each factor using additive models, showed that outbreaks be- Resuming the conceptual framework that tries to relate avian influenza outbreaks pattern to different drivers of the pathogen introduction and spreading routes, it posits that if AI virus is introduced into the domestic poultry via wild birds, the outbreaks pattern could be found related with features that promote wild bird host ecology (i.e. breeding and feeding habits). On the contrary, when the pattern is related to human related activities or some proxies for that, a secondary between holdings transmission hypothesis can be corroborated (Si et al., 2010(Si et al., , 2013 Notwithstanding, it is possible to pinpoint factors-elevation, distance to nearest wetlands and number of rainy days-with respect to whose gradients, Italy A and Italy B can be discriminated as a sort of sub epidemics. Higher altitudes, have always been found negatively associated to the risk of outbreaks occurrence (Gilbert & Pfeiffer, 2012;Mannelli et al., 2006;Si et al., 2010), possibly because of increased geographical isolation, which implies local environmen- A positive gradient of the number of rainy days in the 15 days preceding the onset of symptoms is found associated to outbreaks of Italy A type. Rainfall patterns may importantly determine breeding opportunities and are therefore linked to wild bird numbers (Si et al., 2010). This subsequently can influence the age structure of that community, which may well affect AI dynamics.
However, we cannot rule out that rainfall may also have a direct effect on AI dynamics, as AI virus is generally highly persistent in water (Stallknecht, Shane, Kearney, & Zwank, 1990 proximity. This translated as well in strong discontinuities along the data set of such variables. In cases like this, methods which set a threshold (e.g. regression or classification trees) would have been better at describing the relationship between these predictors and the clusters than linear models does (Ouellette, Legendre, & Borcard, 2012).
This study offered the opportunity to explore the peculiarities of the Italian H5N8 HPAI epidemic using a disease ecology perspective. The results corroborate the hypothesis that, in Italy, the H5N8 HPAI epidemic occurred within the domestic poultry population in 2017, owed their distinctive marks to different epidemiological processes that shaped the virus introduction and further spread (Harvey et al., 2019;Mulatti et al., 2018). However, to date the processes linked to the emergence in the winter and recrudes- One important advantage in our study is that outbreaks community patterns were derived from empirical data, rather than spatially extrapolated on the basis on environmental relationships or assumed ecological niches. Besides, phylogenetic data were integrated in order to consider virus evolution during the epidemic as a pivotal dimension of the epidemic ecological system. The present study has considerable potential, and its insights could be applied to initiate diverse modelling approaches in order to more accurately predict disease occurrence at different space and time scales associated to ecological variation, which are often mediated by complex, large-scale processes that are not immediately amenable to traditional approaches to causal inference. Two steps should be undertaken in any follow-up. Firstly, with more phylogeographic studies becoming available, it should be possible to separate true persistence in a country from new introductions, which would enhance our capacity to identify the areas most susceptible to those events. In conjunction with the recent advent of phylogenetic trait-based approaches, other ecological analysis frameworks may be considered to tackle ecological data assessment, when the relationship between a main explanatory data set and the response is non-linear (Ouellette et al., 2012;Zuur, Ieno, & Smith, 2007), such as niche modelling, multivariate regression trees or classification trees and their extensions (i.e. boosted regression trees). Secondly, we could have more finely estimated risk indices at farm level. For example, biosecurity practices, trade activities and inbound and outbound movements data often hide considerable within country variation. It can be anticipated that they may prove useful in refining several site-specific conditions where AI control efforts may be targeted. Similarly, it would also be important to consider how biological and climatic processes that help explaining outbreaks pattern in domestic poultry, interact with the dynamics of AI infection in the wild bird population, and the practical applications of this for surveillance programmes.
Importantly, eco-environmental conditions are dynamic and so are the interactions between hosts and the environment where avian influenza (AI) viruses circulate. Expanding temporal and spatial coverage to historical data of Italian LPAI and HPAI epidemics or to the European H5N8 HPAI most recent epidemic data may help identify trajectories over which some countries, or peculiar poultry production systems develop greater risk for AI incursions and persistence, and to fully describe the environmental gradients in which influenza viruses may evolve.

CO N FLI C T O F I NTE R E S T
The authors declare no competing interests.

E TH I C A L A PPROVA L
The study did not include any experimentation on animals or humans. Samples were taken from dead birds on infected farms, as part of measures provided by the European Council Directive on Avian Influenza (Council Directive 2005/94/EC).

DATA AVA I L A B I L I T Y S TAT E M E N T
The data supporting the findings of this study are available from the corresponding author upon reasonable request.

S U PP O RTI N G I N FO R M ATI O N
Additional supporting information may be found online in the Supporting Information section.