Landscape genetic inferences vary with sampling scenario for a pond‐breeding amphibian

Abstract A critical decision in landscape genetic studies is whether to use individuals or populations as the sampling unit. This decision affects the time and cost of sampling and may affect ecological inference. We analyzed 334 Columbia spotted frogs at 8 microsatellite loci across 40 sites in northern Idaho to determine how inferences from landscape genetic analyses would vary with sampling design. At all sites, we compared a proportion available sampling scheme (PASS), in which all samples were used, to resampled datasets of 2–11 individuals. Additionally, we compared a population sampling scheme (PSS) to an individual sampling scheme (ISS) at 18 sites with sufficient sample size. We applied an information theoretic approach with both restricted maximum likelihood and maximum likelihood estimation to evaluate competing landscape resistance hypotheses. We found that PSS supported low‐density forest when restricted maximum likelihood was used, but a combination model of most variables when maximum likelihood was used. We also saw variations when AIC was used compared to BIC. ISS supported this model as well as additional models when testing hypotheses of land cover types that create the greatest resistance to gene flow for Columbia spotted frogs. Increased sampling density and study extent, seen by comparing PSS to PASS, showed a change in model support. As number of individuals increased, model support converged at 7–9 individuals for ISS to PSS. ISS may be useful to increase study extent and sampling density, but may lack power to provide strong support for the correct model with microsatellite datasets. Our results highlight the importance of additional research on sampling design effects on landscape genetics inference.


| INTRODUC TI ON
Habitat loss and fragmentation is one of the largest threats to wildlife populations worldwide. As global landscape change continues to accelerate, there is an increasing need to understand how species respond (Cushman, 2006). Knowledge of movement ecology and connectivity is difficult to obtain for many species, but is essential for evaluating population viability of a species at the regional scale (Fahrig & Merriam, 1994). Animal movement is often studied by physical tracking, which has a rich history of use across taxa with a variety of methodologies (Aarts, MacKenzie, McConnell, Fedak, & Matthiopoloulos, 2008;Langkilde & Alford, 2002); however, because movement does not always indicate the transfer of genes (Semlitsch, 2008), it alone is not the most appropriate tool for measuring functional connectivity.
Landscape genetics combines landscape ecology and population genetics to evaluate functional connectivity, which provides inferences about factors affecting movement and reproduction (Manel, Schwartz, Luikart, & Taberlet, 2003, Holderegger & Wagner, 2006, Manel & Holderegger 2013. Quantitative methods that link landscape features and genetic data allow researchers to infer migration events between populations (Storfer et al. 2007). By collecting genetic data across a landscape, researchers can identify how spatial genetic patterns may be influenced by landscape features (Manel et al., 2003). For example, common genetic patterns include isolation by distance (IBD;Wright, 1943), barriers to movement (IBB; Cushman, 2006), isolation by environment (IBE; Wang & Bradburd, 2014), and isolation by resistance (IBR; McRae, 2006).
Landscape genetic sampling schemes can be difficult to properly develop, identify, and implement (Manel et al., 2003;Oyler-McCance, Fedy, & Landguth, 2013;Segelbacher et al., 2010). The sampling scheme used in landscape genetics studies depends on the distribution of the species, the spatial and temporal scales of processes of interest, and availability of resources allocated to sampling (Balkenhol, Cushman, Storfer, & Waits, 2015;Manel et al., 2003;Schwartz & McKelvey, 2008). Inefficient or biased sampling design can decrease the ability of a study to correctly identify the processes leading to population structure (Oyler-McCance et al., 2013). Sampling schemes need to consider the extent of the study area and the distance between potential sampling sites, as well as species distribution, temporal scale, and life history traits of the study organism (Anderson et al., 2010;Prunier et al., 2013).
There are two broad groups of study design sampling types, using either individual or population as the unit of analysis, which often overlap in their hypotheses but vary in their overall approaches (Dyer, 2015). With individual sampling scheme (ISS), only one or few individuals are sampled per geographic location and genetic distances are calculated between all pairs of individuals to create matrices based on individual genotypes (Coulon et al., 2004;Prunier et al., 2013). In contrast, sampling at the population level can be applied where ecologically relevant population delineations occur by sampling many individuals in each aggregate and creating distance matrices by either averaging interindividual distance matrices, as we have done here, or by using population-level genetic distances, for example, F ST (Spear, Peterson, Matocq, & Storfer, 2005). The population-level sampling scheme (PSS) can be problematic because populations are often difficult to delineate a priori, and sufficient sample sizes of many species are difficult to obtain (Manel et al., 2003). A PSS is resource-and time-consuming and often results in a reduced sampling extent or a more diffuse sampling regime, leaving areas unsampled (Prunier et al., 2013).
If fewer than the target number of individuals is collected at a location, that population is often dropped from the final analysis, leaving a gap in sampling and excluding potentially informative genetic data. In addition, a PSS may not be appropriate for species where population delineation is difficult or habitat use is continuous, like highly mobile or migratory species. In these continuous distribution systems, ISS may be most appropriate (Luximon, Petit, & Broquet, 2014).
Despite the drawbacks of the PSS, it is more commonly utilized than ISS (Prunier et al., 2013) because of well-developed population genetic theory and analysis. There is a third, unexplored option, which is to include all individuals from all populations, regardless of number of individuals sampled from each population, This sampling scheme could be utilized when a target number of individuals are not obtained at each sampling area due to low densities, time, funding, or other constraints. Here, we aim to compare the ability of landscape genetic analyses to detect landscape genetic patterns using these three alternative sampling schemes (Box 1).

Box 1 Glossary of sampling scheme terms and abbreviations
To understand how inference of landscape genetic patterns can differ based on sampling scheme, we studied a pond-breeding species, the Columbia spotted frog (Rana luteiventris) in northern Idaho, USA, over an area of 1,555 km 2 . The Columbia spotted frog is a wide-ranging species, with a distribution from the southern Rocky Mountains to southeastern Alaska (Green, Kaiser, Sharbel, Kearsley, & McAllister, 1997). In northern Idaho, breeding populations are often small, with effective population sizes ranging from 3.2 to 37.8; because of this, the persistence of the Columbia spotted frogs in the region may be at risk (Davis & Verrell, 2005;Goldberg & Waits, 2009). Within a smaller extent in this area of the range (213 km 2 ), Columbia spotted frog functional connectivity was found to be negatively influenced by forest presence, while shrub/clear-cut and agriculture land cover types were found to have the lowest resistance to gene flow (Goldberg & Waits, 2010a). Pond-breeding amphibians are a useful model to investigate sampling scheme questions because, due to their distribution and population sizes, they can be sampled using either the ISS or the PSS. Although PSS may be more appropriate for the Columbia spotted frog due to pond-breeding amphibians being generally philopatric (Smith & Green, 2005), this system allowed us to evaluate sampling schemes ranging from a single individual to the population level in an iterative manner. We compared the level of functional connectivity inferred by individual, population, and proportion available sampling schemes using an information theoretic approach to model landscape resistance (Burnham & Anderson, 2002 species where adult migration is <2 km on average (Bull & Hayes, 2001;Pilliod, Peterson, & Ritson, 2002). We expected that ISS would indicate the same variables overall as PSS, albeit with less support.
We used multiple random draws at locations with more than one individual to create resampled ISS replicates. We expected that the ISS approach would indicate the same variables as PASS, but that including these sites with low sample sizes would add noise to the results; specifically, that multiple models would be supported in many of the resampled replicates. With higher numbers of individuals per site, we expected that the results would converge with the PSS based on the added statistical precision provided by more individuals. We did not have an expectation on the number of supported models or model weights across all datasets and replicates.  Dahl et al. 2000). The population size of the nearest city to the study area, Moscow, Idaho, increased 5.3% from 2010 to 2015, which is greater than the national average of 4.1% (United States Census Bureau, 2015). Only a small fraction (13%) of natural wetlands existed as of the most recent comprehensive survey  posing potential limitations for amphibian populations.

| MATERIAL S AND ME THODS
We analyzed tissue samples (mouth swabs and tail clips) from 334 individuals sampled at 40 wetlands from the randomly selected set surveyed for habitat modeling in the study area (Goldberg & Waits, 2009; Figure 1). We extracted DNA from these samples using This was based on the distribution of the sample sizes collected in the field, with 5 of the 18 population-level sites having 11 individuals collected. Prior to the landscape genetic analysis, we measured population genetic statistics on population-level samples and tested for Hardy-Weinberg equilibrium using GENALEX (Peakall & Smouse, 2012) and linkage disequilibrium with ARLEQUIN (Excoffier & Lischer, 2010). We analyzed data with ISS at two sampling densities, for a total of four sampling schemes: population-level sampling To determine the minimum sampling density for ISS approaches to reach the same conclusions as PSS/PASS, we also bootstrapped resampled subsets of 2 through 11 individuals.

| Genetic distances
We used proportion of shared alleles, D ps, as our metric of genetic distance as it can be estimated with both population-and individuallevel sampling (Bowcock et al., 1994).

| Landscape variables and models
We examined how sampling scheme influenced landscape genetic inference using resistance analysis evaluated by information criterion metrics. Land cover (from Pocewicz et al., 2008) was split into separate rasters at a resolution of 15 m 2 : water, shrub, lowdensity (LD) forest, high-density (HD) forest, development, barren, agriculture, and grassland ( Figure 1). Each raster was parameterized with two denoting raster cells containing the respective land cover type and one denoting any other land cover type. Riparian

| RE SULTS
Mean number of alleles per locus for the 18 populations was 4.410 ± 0.942 SD (Table 3)

| D ISCUSS I ON
Landscape genetics is a powerful method for evaluating functional connectivity (Manel et al., 2003, Holderegger & Wagner, 2006, Manel & Holderegger 2013), but appropriate sampling strategies and schemes can be difficult to determine and apply (Manel et al., 2003;Segelbacher et al., 2010). We found that inferences differed between individual and population sampling schemes when we compared the two datasets at 18 sites. This pattern for ISS-18 was likely due to a lack of statistical power. At 40 sites, the models supported by PASS were also supported by the ISS at 40 sites, but the support for the PASS top model was not as strong as for the full dataset and additional models were supported with the ISS-40 dataset. Support varied considerably within the ISS-18 and ISS-40 datasets as well.
No model was supported more than 90 percent, and with most of the models being supported around 25 percent of the time. With increased numbers of individuals sampled, the ISS converged with PSS and PASS. Convergence occurred at nine individuals with ML, but seven individuals with REML. This indicates that small numbers of sampled individuals may be appropriate under certain circumstances, for example, stronger population structure or increased number of loci (Landguth et al., , 2012Prunier et al., 2013).
However, the variation in model support suggests caution is important as mistaken inferences may be drawn if sample size is insufficient. The differences between REML and ML occurred at lower numbers of individuals per population, which highlights a potential methodological issue when moving to an ISS. Prunier et al. (2013) found that the individual sampling scheme could, in most cases, have similar inferences of landscape connectivity when compared with the population sampling scheme, when  (Willing, Dreyer, & Oosterhout, 2012), and subsequent inferences from landscape genetics. This is much lower than the standard recommended value of 20-30 individuals per population (Storfer et al., 2007). It may be appropriate to lower target sample number goals based on these results. However, the number of individuals needed is going to be related to the amount of population differentiation (Kalinowski, 2005). By lowering the number of individuals per population, researchers will be able to increase either the spatial extent or sampling density of their studies or save valuable resources. Because fixed effects vary among the models that were tested in these cases, ML would be most appropriate as opposed to REML (Verbeke & Molenberghs, 2000), although there are examples in the literature of using REML for MLPE (e.g., Emel & Storfer, 2014, Gurka, 2006, Row et al., 2017, Zancolli, Rodel, Steffan-Dewenter, & Storfer, 2014 forest was the most supported model with PSS using ML and at the smaller sampling extent and density when using REML, while high-density forest was the most supported model at the increased sampling extent and density when using REML. In addition, low-density and high-density forest occurred in almost all of the supported models when ML was used. Results of low-density forest and highdensity forest models indicate a reduction in gene flow when these land cover types occur between breeding populations. These results are similar to those previously found for this species at a finer scale in the southwestern portion of this study area, where forest was TA B L E 5 Parameters and information theoretic (BIC) results for models of genetic distance (proportion of shared alleles) in north Idaho, USA, for the Columbia spotted frog (Rana luteiventris) using maximum likelihood (ML). Parameters are land cover of low-density forest (forestld), high-density forest (foresthd), agriculture (ag), shrub/clear-cut (shrub), human development (dev), grassland (grass), distance (distance), solar radiation (solar  13.14 0.00 found to restrict gene flow (Goldberg & Waits, 2010a). However, Goldberg and Waits (2009) found breeding sites to be near low-and high-density forest; specifically, proximity to low-density forest was the most important land cover variable for the presence of breeding populations of this species. Together, these findings indicate the important influence that forests have on Columbia spotted frogs in this region, and the difference between habitat requirements and land cover contributions to functional connectivity.
Landscape features may influence gene flow differently across a species range depending on range-wide variation in interactions with climate and other variables. In this study area, slope was not supported outside of a correlation with high-density forest, but in more mountainous regions, topographic features were supported as an important influence on functional connectivity for this species (Funk et al., 2005;Murphy, Dezzani, Pilliod, & Storfer, 2010). The range of variation of interest may be key to explaining observed differences among study regions, which may explain differences in inferences of range-wide connectivity Short Bull et al., 2011). Although not tested in this study, there is evidence that other at-site abiotic and biotic factors, such as frost-free period, presence of predatory fish, and site productivity, are also important for this species .  (Puechmaille, 2016). Working with empirical data also meant that we were limited to treating the dataset with the greatest statistical power as truth, in contrast with simulation studies. It is possible that the results from the population sampling scheme do not reflect the true drivers of functional connectivity in this system; however, it represented our most-informed dataset.
One additional concern may be that we used unweighted allele-sharing genetic distances. This system has high levels of population structure (Goldberg & Waits, 2010a); because fre- it is assumed frequency distributions are identical for all loci and mating was completely random, which is unlikely to be the case in anuran systems (Arak, 1988;Davies & Halliday, 1977;Howard, 1980;Reading, 2001).
The most appropriate sampling scheme in landscape genetics is still a question that needs further investigation, and will vary by system (Balkenhol et al. 2015, Segelbacher et al., 2010, Storfer et al., 2007. Increased statistical power is obtained by increasing individuals sampled (Prunier et al., 2013), and so increasing sampling density or extent by adding more individuals or populations may result in similar conclusions; however, we observed increasing extent may not result in the same conclusion if the additional area encompasses  Program. We thank the many private landowners who allowed us access to their wetlands and R. Flatz, J. Bauder, and K. Butler assistance on sample collection and processing.

CO N FLI C T O F I NTE R E S T
None declared.