Combining Bayesian genetic clustering and ecological niche modeling: Insights into wolf intraspecific genetic structure

Abstract The distribution of intraspecific genetic variation and how it relates to environmental factors is of increasing interest to researchers in macroecology and biogeography. Recent studies investigated the relationships between the environment and patterns of intraspecific genetic variation across species ranges but only few rigorously tested the relation between genetic groups and their ecological niches. We quantified the relationship of genetic differentiation (F ST) and the overlap of ecological niches (as measured by n‐dimensional hypervolumes) among genetic groups resulting from spatial Bayesian genetic clustering in the wolf (Canis lupus) in the Italian peninsula. Within the Italian wolf population, four genetic clusters were detected, and these clusters showed different ecological niches. Moreover, different wolf clusters were significantly related to differences in land cover and human disturbance features. Such differences in the ecological niches of genetic clusters should be interpreted in light of neutral processes that hinder movement, dispersal, and gene flow among the genetic clusters, in order to not prematurely assume any selective or adaptive processes. In the present study, we found that both the plasticity of wolves—a habitat generalist—to cope with different environmental conditions and the occurrence of barriers that limit gene flow lead to the formation of genetic intraspecific genetic clusters and their distinct ecological niches.

The approach of Gotelli and Stanton-Geddes (2015) has so far only been applied in handful of studies (Harrisson et al., 2016;Ikeda et al., 2017;Marcer et al., 2016;Shinneman, Means, Potter, & Hipkins, 2016) , and the relationship between genetic distances among genetically defined clusters (e.g., genetic differentiation F ST ; Dupanloup, Schneider, & Excoffier, 2002;Jombart, Devillard, Dufour, & Pontier, 2008) and distances or similarities among their ecological niches (e.g., niche overlap) has not been tested so far. Moreover, none of these studies specifically investigated the environmental factors differing among genetic clusters. However, in cases of low ecological niche overlap among intraspecific genetic clusters-that is, clusters showing cluster-specific ecological niches-a given species could be considered as an assemblage of genetic or evolutionary lineages differing in their spatial distribution and their environments (Marcer et al., 2016).
Thus, in this study, we (a) estimated the pairwise ecological niche overlap between genetic groups resulting from spatial Bayesian clustering, (b) determined the relationship between genetic distances (F ST ) and ecological niche overlap between identified genetic clusters, and (c) identified the ecological factors related to the spatial location of genetic clusters. For this purpose, we used a genetic data set on wolves (Canis lupus), which were molecularly identified during multiyear large-scale monitoring projects based on carcasses, livetrapped individuals, and non-invasively collected samples from the central Apennines to the western Alps in Italy (Caniglia et al., 2012(Caniglia et al., , 2013Caniglia, Fabbri, Galaverni, Milanesi, & Randi, 2014;Fabbri et al., 2007). The population of wolves in Italy experienced a demographic bottleneck in the 1970s (Zimen & Boitani, 1975), which reversed in the 1980s due to legal protection, the occurrence of ample habitat in diverse landscapes abandoned by humans and the abundant availability of wild prey (Meriggi, Brangi, Schenone, Signorelli, & Milanesi, 2011). In about 40 years, the wolf population re-expanded from south-central Italy to the entire Apennine chain (including adjacent lower hills and plains) and the Italian and French western Alps (Caniglia et al., 2013;Fabbri et al., 2007;Galaverni, Caniglia, Fabbri, Milanesi, & Randi, 2016).
F I G U R E 1 Study area in Italy (black lines indicate provincial and regional borders) and wolf sampling locations (white dots with black circles) 2 | ME THODS

| Study area
Data collection was carried out from the central Apennines to the western Alps in Italy, in a study area of 97,044 km 2 (6°62′-13°91′E; 46°46′-42°39′N; Figure 1). The study area was characterized by a large altitudinal range (0-4,634 m a.s.l.), distinct climatic gradients (from temperate to alpine), and diverse human land uses. It was characterized by a high diversity of habitats, such as different types of meadows, pastures, rocky surfaces, and even glaciers in high mountains. In the lower mountains and foothills, abandoned fields currently changing into semi-natural shrublands and deciduous, mixed or evergreen forests were most abundant. In the valleys, plains, and at the coast, agricultural fields and urban areas were dominant.

| Genetic data set
Putative wolf samples (mainly feces but also blood or muscular tissues from carcasses as well as saliva, urine, and hairs; N = 9,317) were collected by more than 400 trained operators (Italian State Forestry Corps, park rangers and technicians, wildlife managers, researchers, students, and volunteers). Training was done by wolf experts, and operators were instructed to collect samples as fresh-looking as possible, excluding degraded ones. Samples were collected along randomly chosen trails and country roads across all habitat types, at least once per season. Sampling was part of a long-term monitoring program on wolves in Italy (from 2000 to 2011; Caniglia et al., 2012Caniglia et al., , 2014Fabbri et al., 2007). Samples were stored at −20°C in 10 volumes of 95% ethanol or Tris/SDS buffer (Caniglia et al., 2012).

The MULTIPROBE II EX Robotic Liquid Handling System of Perkin
Elmer and the QIAGEN stool and tissue extraction kits were used to extract DNA. Multi-locus genotypes, gender, and taxon (i.e., wolf, dog, or hybrid) were determined using 12 unlinked neutral microsatellites, a sex-specific restriction site and taxon-specific markers following the methods described in Caniglia et al., (2014Caniglia et al., ( , 2013 . PCR amplification of each sample was carried out four to eight times. A total of 3,815 samples were reliable genotyped at all markers and belonged to 923 unrelated wolf individuals (Figure 1), 93 dogs and 118 wolf ×dog hybrids (Milanesi et al., 2016, Milanesi, Holderegger, Bollmann, et al., ).

| Genetic clustering
To identify genetic clusters of Italian wolves, we considered the 923 wolf individuals and used the Bayesian genetic clustering method implemented in TESS 2.3 (Chen, Durand, Forbes, & François, 2007), incorporating spatial information on sampling locations (considering the spatial coordinates of the location where a given wolf has been sampled for the first time). We used the admixture model to calculate 10 runs for K max = 2-6 clusters (500,000 sweeps, 100,000 burn-in) using three different values of the spatial interaction parameter h (0, 0.6-the default value-and 0.99 indicating no spatial, medium, and high spatial dependence, respectively).
The optimal number of clusters was estimated through the deviance information criterion (DIC) plot (Chen et al., 2007). For the most likely number of clusters, mean cluster membership probabilities of the 10 runs were estimated with CLUMPP 1.1.2 (Jakobsson & Rosenberg, 2007) and average cluster membership probabilities were then interpolated across the whole study area by thin plate spline function (Tps) in the R package FIELDS (v. 3.3.1; Furrer, Nychka, & Sain, 2009; again only considering the spatial coordinates of the location where a given wolf has been sampled for the first time). Note. Variables with a variance inflation factor (VIF) >3 were removed from further analysis due to multi-collinearity with other variables.

TA B L E 1 Environmental factors used in generalized linear models
We estimated genetic differentiation between the identified ge-

| Predictor variables
We collected a total of 13 predictor variables encompassing environmental, topographic, and anthropogenic factors. Specifically, we estimated habitat diversity and the percentages of six land cover types derived from CORINE Land Cover 2006 IV Level, with a minimum mapping unit/width of 25 ha/100 m and a thematic accuracy ≥85% (https://www.sinanet.isprambiente.it/it/sia-ispra/downloadmais/corine-land-cover/corine-land-cover-2006-iv-livello/view; We calculated the variance inflation factor (VIF) for all the variables and removed predictor variables with values higher than 3 (i.e., highly correlated to other predictors; Zuur, Ieno, & Elphick, 2010; Table 1) to avoid multi-collinearity among predictors. Specifically, we removed cultivated fields, slope, and landscape roughness, because they all exhibited a VIF >3, and only considered the remaining 10 variables in further analyses (Table 1). We also estimated spatial autocorrelation within each predictor variable through Moran's I index (Table S1) using the R package RASTER (v. 2.5-8. Hijmans, 2011).
We found relatively high and positive Moran's I values considering the entire study area (indicating positive spatial autocorrelation, e.g., clustering; Cliff & Ord, 1981) but low values (close to zero, indicating a random pattern with no spatial autocorrelation; Cliff & Ord, 1981) considering only the values of the predictor variables at the locations were wolves where sampled (Supporting Information Table S1).

| Relating genetic differentiation and ecological niche overlap
We estimated the ecological niche of each genetic cluster considering the locations of wolves belonging to a particular cluster by  Table S2 for details on model parameters), which, in contrast to other ENMs, allows estimation of the ecological hyperspace considering the whole set of predictors (Blonder et al., 2014).
We quantified pairwise ecological niche overlap between clusters of wolves as the intersection of two shared/summed ecological niches in the hyperspace. We performed these analyses in the R package HYPERVOLUME (v. 2.0.8; Blonder et al., 2014). We also provided minimum, mean, and maximum values of the predictor variables of the hypervolume in the minimum convex polygon estimated for each wolf cluster (Supporting Information Table S3).
We then compared the pairwise F st -values between genetic clusters (response variable) and the corresponding ecological niche overlap between two clusters (fixed effect) in a linear mixed-effects model (a Toeplitz covariance matrix was considered as random effect to account for the non-independence of distance measurements between pairs of clusters; Milanesi et al., 2016;Milanesi, Breiner, et al., 2017;Milanesi, Holderegger, Bollmann, et al., ;Selkoe et al., 2010;Van Strien, Keller, & Holderegger, 2012).
Actually, linear mixed-effects models have recently been identified as the best performing among several landscape genetics technics (Shirk, Landguth, & Cushman, 2018). Specifically, our approach is conceptually similar to an isolation by environment test (Wang & Bradburd, 2014;Wang, Glor, & Losos, 2013), but instead of relate pairwise genetic and environmental distances (often derived by least-cost paths) among populations, we related pairwise genetic distances with ecological niche overlap between clusters.
In our approach, high standardized regression coefficient (β) values correspond to a large presumed effect of the predictor on the response variable. We expected an inverse relationship between Fst-values and ecological niche overlap (and thus a negative standardized β value), as genetic differentiation should decrease with increasing niche overlap. We used 10,000 permutations to assess significance of linear mixed-effects models using the PGIRMESS package (v. 1.6.7; Giraudoux, 2013) in R.

| Relating cluster distribution and predictor variables
To identify the particular environmental factors related to the spatial distribution of the different genetic clusters of wolves in Italy, we de- We carried out these analyses using the R package MUMIN (v. 1.0.0. Barton, 2009). We also calculated sampling effort through Gaussian kernel density (Elith, Kearney, & Phillips, 2010) based on all sampling locations (i.e., including wolves, dogs, and hybrids locations) and used the resulting values as case weights in the above-mentioned CAR models.

| RE SULTS
Bayesian genetic cluster analysis suggested the existence of four genetic clusters (Figure 2 (Table 2). These are relative values (ranging from 0 to 1), and hence, the overlap is somewhere in the middle between low and high.
Linear mixed-effects models with pairwise genetic differentiation F ST as dependent variable showed a significant relationship (p < 0.001) with ecological niche overlap between clusters with a standardized β-coefficient of 0.557.
Considering EC, the best models (N = 11; AIC ranging between 995.43 and 997.42, better than a random model at AIC = 1,029.28) included altitude, meadows, and habitat diversity, which were all positively related to cluster membership and had the highest W and p < 0.001 (Table 3), while deciduous and coniferous forests, human population density, shrublands, mixed woods, and human settlements showed lower values of W and were non-significant (Table 3). The best models of WA (N = 9; AIC ranging between 278.41 and 280.40, better than a random model at AIC = 300.67) included altitude, shrublands, coniferous, and deciduous forests (only the latter with negative β value) with the highest W and p < 0.001 (Table 3), while habitat diversity, mixed woods, meadows, human settlements, and human population density showed lower values of W and were non-significant (Table 3). In contrast, the best models of WC (N = 16; AIC ranging between 933.04 and 935.04, better than a random model at AIC = 1,018.07) included deciduous forests and mixed woods, both positively related to cluster membership, with the highest W and p < 0.001 (Table 3), while human settlements, coniferous forests, altitude, human population density, habitat diversity, meadows, and shrublands density showed lower values of W and were non-significant (Table 3). Finally, in the best models of NA (N = 12; AIC ranging between 680.43 and 682.42, better than a random model at AIC = 701.18) altitude, deciduous forests, and meadows (only the latter with negative β value) showed the highest W and p < 0.001 (Table 3), while habitat diversity, human settlements, mixed woods, coniferous forests, shrublands, and human population density showed lower values of W and were non-significant (Table 3).

| D ISCUSS I ON
In this study, through the analysis of neutral genetic markers (i.e., microsatellites), we identified four genetic clusters of wolves in Italy and developed cluster-specific ENMs to estimate niche overlap between them. We found moderate ecological niche overlap between the resulting clusters, which was nevertheless significantly related to the genetic distances between them, suggesting that genetically different clusters live in different environmental conditions.
Accordingly, we identified cluster-specific combinations of environmental factors influencing wolf occurrence in Italy.
F I G U R E 2 Bayesian clustering analysis showing barplots for the proportion of cluster memberships assigned to individual wolves under the estimated optimal numbers of clusters (K max = 4). Different colors indicate different clusters F I G U R E 3 Estimation of the number of genetic clusters (K max = 2-6) of wolves in Italy based on the mean (±95% confidence intervals from 10 runs) deviance information criterion (DIC)

| Genetic clusters of wolves in Italy
Why should genetic clusters reflect differences in ecological niches among clusters? There are two main possible explanations: (a) genetic clusters might reflect differential adaptation to different en- tive relevance (i.e., is located in adaptive genes or is closely linked to adaptive regions). In both cases, most or all of the genetic markers used for genetic clustering behave neutrally, suggesting that there probably is no direct adaptive relationship between genetic clusters and the ecological niches they occupy. In fact, demographic spatial patterns can mimic and confound adaptive signals (Beaumont & Nichols, 1996;Foll & Gaggiotti, 2008;Rellstab et al., 2015).
If so, intraspecific genetic clusters can also be caused by limited movement, dispersal, and gene flow among different regions within the distribution range of a species (Gotelli & Stanton-Geddes, 2015), and the genetic differentiation among them is then affected by the interplay of gene flow and genetic drift (Hutchison & Templeton, 1999;Slatkin, 1987). Restricted gene flow among clusters could be caused by many factors affecting movement and dispersal, especially by geographical or landscape barriers-a topic widely explored in evolution, biogeography, landscape ecology, and landscape genetics (Andrew, Ostevik, Ebert, & Rieseberg, 2012;Holderegger & Wagner, 2008;Manel, Schwartz, Luikart, & Taberlet, 2003;McCairns & Bernatchez, 2008;Storfer et al., 2007;Storfer, Murphy, Spear, Holderegger, & Waits, 2010;Temple, Hoffman, & Amos, 2006). Especially in mammals, intrinsic factors such as behavior or the specific use of food resources could also cause limited movement and dispersal among clusters (Pilot et al., 2006;Vonholdt et al., 2010).
The four genetic clusters identified in this study largely coincided with the topology of the main mountain regions occupied by wolves in Italy (parts of the Apennines and the Alps; Figure 3, Supporting Information Figure S1), occupying four continuous areas located TA B L E 2 Pairwise genetic differentiation Fst (below diagonal) and pairwise ecological niche overlap (above diagonal) between clusters of wolves northern Apennines and in the western Alps (Figure 3). A previous study on the history and dynamics of the wolf recolonization in the Apennines detected only three clusters in Italy, namely the central Apennines, the northern Apennines and the western Alps (Fabbri et al., 2007), based on a different genetic data set than the present one.
Thus, an interesting result of our analyses was the splitting of the central Apennine wolf population into two distinct genetic clusters, one on the western and one on the eastern slope of the Apennines ( Figure 3). The discovery of another cluster in the Italian wolf population was rather unexpected due to the assumed high mobility of TA B L E 3 Average standardized coefficients (β), standard errors (SE), p-values (p) and relative importance from Akaike weights (W) based on averaging spatial conditional autoregressive model (CAR) for cluster membership (considering only models with ΔAIC < 2) wolves. However, the two separated genetic clusters of wolves on opposite slopes of the Apennines could be the result of recent re-colonization of low-mountain or hilly areas close to the Tyrrhenian and the Adriatic Sea, originating from the former core habitat of wolves at higher altitudes in the central Apennines (Caniglia et al., 2013;Galaverni et al., 2017;Giacchini, Scotti, & Zabaglia, 2012). The cluster in the western Alps was mainly founded by long-distance dispersal from the Apennines at the onset of the re-colonization process of the Alps, followed by the setting of territorial behavior of reproducing packs there (Fabbri et al., 2007;Valière et al., 2003). Such a scenario is supported by similar patterns of genetic structure found for wild cats (Felix silvestris) in Italy (Mattucci et al., 2013).

| Different niches of wolf clusters in Italy
The four identified clusters of wolves were related to different ecological factors. Specifically, we found significant and positive relationships between WC and mixed woods and deciduous forests, while WA was positively related to coniferous forests and NA was positively related to deciduous forests. Thus, our results highlighted the importance of woods and forests in providing suitable habitat and shelter for wolves (Bassi, Willis, Passilongo, Mattioli, & Apollonio, 2015;Jędrzejewski, Niedzialkowska, Mysłajek, Nowak, & Jędrzejewska, 2005;Llaneza, López-Bao, & Sazatornil, 2012;Meriggi et al., 2011;Milanesi, Meriggi, & Merli, 2012). However, deciduous forests were negatively related to WA. Actually, the areas of occurrence of EC and NA were characterized by a low percentage of coniferous forests, but a high percentage of deciduous forests and mixed woods, while those of WA were mainly covered by coniferous forests. These results suggest that wolves simply are related to the main regional forest type, irrespective of its nature. Moreover, EC, WA, and NA were positively related to altitude, probably because of the abundance of prey, availability of safe areas, and lower disturbance by human activities at higher altitudes (Bassi et al., 2015;Glenz, Massolo, Kuonen, & Schlaepfer, 2001;Jędrzejewski et al., 2005;Llaneza et al., 2012;Pilot et al., 2006).
Shrublands were positively related to WA, probably because of high amounts of open shrublands consisting of land used as pastures by freely roaming livestock in the corresponding region (Eggermann, Costa, Guerra, Kirchner, & Petrucci-Fonseca, 2011). Meadows were positively related to EC, as a large share of meadows in a region potentially translates into a high availability of livestock as prey. Thus, meadows become important for wolves, in regions where forests are sparse (Jędrzejewski et al., 2005). In contrast, meadows were negatively related to the distribution of NA due to small extension of meadows or low accessibility of livestock as prey in this area (Meriggi et al., 2011). EC was positively related to habitat diversity, because wolves tend to use different habitat types for different activities (Houle, Fortin, Dussault, Courtois, & Ouellet, 2010) and because habitat heterogeneity is a known driver of wild ungulate diversity and distribution (Cromsigt, Prins, & Olff, 2009). In summary, wolf clusters in Italy seem to be simply associated with the most abundant land cover types of their respective area, showing that high ecological flexibility of wolves enabling them to cope with diverse and even fragmented landscapes (Llaneza et al., 2012).
Genetic clusters of animals should also be affected by anthropogenic factors. For instance, animals could show avoidance behavior against human settlements or roads. Such avoidance has been assessed in several wild animals (Cushman, McKelvey, Hayden, & Schwartz, 2006) and is well known to also affect wolf occurrence (Gaillard et al., 2010). Indeed, we found that human disturbance, in terms of human population density and human settlements, tended to be related to genetic clusters in Italian wolves, but not in a consistent and statistically significant way. Nevertheless, in agreement with other studies (Jędrzejewski et al., 2005;Llaneza et al., 2012;Milanesi et al., 2016), the distribution of the four different clusters were negatively related to human population density and human settlements, while distance from human settlements was not included in any of the best models for all clusters.

| Conclusions, management implications, and future research
In this study, we found that different genetic clusters of wolves in Italy are differently related to land cover types, probably caused by two main effects. First, the plasticity of the wolf-a distinct habitat Finally, we remark that when using genetic clusters based on neutral loci in combination with ENMs, one pitfall is to simply interpret the results in the light of adaptation. However, there is no direct link between neutral and adaptive genetic diversity or differentiation (Merilä & Crnokrak, 2001;Reed & Frankham, 2001).
Differences in the ecological niches of genetic clusters should thus be interpreted in light of neutral processes that influence movement, dispersal, and gene flow among genetic clusters (Gotelli & Stanton-Geddes, 2015), as done in the present study. However, further studies should compare genetic clusters separately based on neutral and adaptive genetic markers and relate them to ecological niches (as identified by ENMs; Deagle, Jones, Absher, Kingsley, & Reimchen, 2013;Manel & Holderegger, 2013). Since adaptive markers do not have to be in Hardy-Weinberg equilibrium nor in linkage equilibrium, commonly used Bayesian genetic clustering techniques cannot be applied. Nevertheless, assumption-free methods are now available and can be used for this purpose (e.g. TESS 3; Caye, Deist, Martins, Michel, & François, 2016).
Environmental association analyses (linking potentially adaptive genes to particular environmental factors; Rellstab et al., 2015) can then be combined with ENMs to tests whether genetic clusters based on adaptive genes are related to different ecological factors. In such analyses, spatial coincidence of adaptive and neutral genetic clusters would make a strong case for adaptively relevant differences among genetic clusters.

ACK N OWLED G M ENTS
We thank Chris Foote and two anonymous reviewers for their use- Mazzei) and Parco Naturale Regionale della Maremma (G.P. Sammuri).

CO N FLI C T O F I NTE R E S T
None declared.

DATA ACCE SS I B I LIT Y
Due to sensitive information (locations of an endangered species), the data of the present study cannot be made openly accessible.