Waste not, want not: Microsatellites remain an economical and informative technology for conservation genetics

Abstract Comparisons of microsatellites and single‐nucleotide polymorphisms (SNPs) have found that SNPs outperform microsatellites in population genetic analyses, questioning the continued utility of microsatellites in population and landscape genetics. Yet, highly polymorphic markers may be of value in species that have reduced genetic variation. This study repeated previous analyses that used microsatellites with SNPs developed from ddRAD sequencing in the black‐capped vireo source‐sink system. SNPs provided greater resolution of genetic diversity, population differentiation, and migrant detection but could not reconstruct parentage relationships due to insufficient heterozygosities. The biological inferences made by both sets of markers were similar: asymmetrical gene flow from source sites to the remaining sink sites. With the landscape genetic analyses, we found different results between the two molecular markers, but associations of the top environmental features (riparian, open habitat, agriculture, and human development) with dispersal estimates were shared between marker types. Despite the higher precision of SNPs, we find that microsatellites effectively uncover population processes and patterns and are superior for parentage analyses in this species with reduced genetic diversity. This study illustrates the continued applicability and relevance of microsatellites in population genetic research.


| INTRODUC TI ON
Molecular markers allow us to answer an array of population genetic questions about gene flow (Edelaar & Bolnick, 2012;Hudson et al., 1992), parentage (García et al., 2002), and population structuring (Clark-Cockerham & Weir, 1993;Narum et al., 2008). The toolbox of molecular markers has rapidly advanced in the last 30 years, progressing from allozyme markers to microsatellites to single-nucleotide polymorphism (SNP) markers, each with progressively higher statistical power (Andrews & Luikart, 2014;Luikart et al., 2003;Morin et al., 2004). The higher resolution of microsatellite and SNP data has extended their use to landscape genetics (Sork et al., 2010), tests of adaptation and selection (Ahrens et al., 2018), hybridization (Toews et al., 2016), outbreeding and inbreeding depression (Steiner et al., 2013), and epigenetics (Harrisson et al., 2014). Of particular interest here is the relatively recent interdisciplinary field | 15801 HAUSER Et Al. of landscape genetics that combines the theory and methods from landscape ecology and population genetics to study how landscape (or seascape) features affect population processes such as gene flow (Zeller et al., 2012). Landscape genetics bridges the gap between environmental factors and species' responses, providing singular insights into ecological and evolutionary processes. Information gleaned from molecular markers offers crucial insights into application in conservation and management efforts.
Microsatellites and SNPs are the most commonly used markers for population genetic studies, each with pros and cons (Morin et al., 2009). Microsatellites are highly polymorphic, providing relatively high statistical power per locus but suffer null alleles, homoplasy, and complex and variable mutation processes that confound results (Defaveri et al., 2013;Putman & Carbone, 2014). The distribution of microsatellite markers genome-wide is also unknown across many species. Still, they are likely not distributed evenly across the genome, potentially yielding a poorer representation of overall genetic variation than SNPs (Narum et al., 2013). Although microsatellites were once the most used marker in population genetics, SNPs are quickly replacing them in ecological, evolutionary, and conservation studies (Baruch & Weller, 2008). SNPs are biallelic and thus have a simpler mutation model but are less informative per locus, requiring more loci than would be needed for microsatellites to achieve the same statistical power (Helyar et al., 2011). SNPs also have lower error rates than microsatellites and, with nextgeneration sequencing, have lower genotyping costs per marker (Morin et al., 2009;Weinman et al., 2015). However, SNPs are not immune to null alleles, especially when generated with restriction site-associated DNA (RAD) sequencing approaches Lowry et al., 2017;Puritz et al., 2014). SNPs occur across the genome, providing a better representation of genome-wide variation (Puckett & Eggert, 2016). Comparative assessments of the two markers found that SNPs outperformed microsatellites with estimates of genetic diversity and population structure (Morin et al., 2009;Muñoz et al., 2017) and performed equally or poorly with parentage analyses (Buchanan et al., 2017;Flanagan & Jones, 2019;Thrasher et al., 2018;Weinman et al., 2015). Nevertheless, microsatellites are still useful and can yield comparable results in population structure characterization and parentage inference (Liu et al., 2005;Väli et al., 2008). Although microsatellite versus SNP comparisons exist for genetic diversity estimates (Defaveri et al., 2013;Morin et al., 2004;Vali et al., 2008), population structure analyses (Helyar et al., 2011;Liu et al., 2005;Morin et al., 2009;Muñoz et al., 2017;Narum et al., 2008;Seddon et al., 2005), and parentage or pedigree inference (Baruch & Weller, 2008;Hauser et al., 2011;Kaiser et al., 2017;Labuschagne et al., 2015;Liu et al., 2017, Thrasher et al., 2018Tokarska et al., 2009;Weinman et al., 2015), their relative performance in landscape genetics studies remains less understood (Hall & Beissinger, 2014;Puckett & Eggert, 2016). For instance, microsatellites outperform other codominant markers in assessing bottlenecks (Spencer et al., 2000), but it is not clear if this advantage persists over large numbers of SNP loci (Morin et al., 2012;Zimmerman et al., 2020). We need such direct comparisons to understand the biases, strengths, and weaknesses associated with different marker types for accurate data interpretation and to inform the adoption of new marker types in long-term studies.
This study repeated microsatellite-based population and landscape genetic analyses (Hauser et al., 2019;Hauser & Leberg, 2020) with SNPs developed from ddRAD sequencing (Peterson et al., 2012). This direct empirical comparison evaluated the relative performance of microsatellites and SNPs for landscape genetic analyses in bottlenecked populations, wherein higher statistical power will often be required to disentangle fine-scale processes. The blackcapped vireo (Vireo atricapilla) source-sink metapopulation in central Texas (Hauser et al., 2019;Hauser & Leberg, 2020;Walker et al., 2016) serves as an ideal bottlenecked system for marker comparison. The species is recovering from a demographic and a genetic bottleneck (Athrey et al., 2012;Grzybowski et al., 1994;McFarland et al., 2013) that resulted in small fragmented populations. Further, the species habitat range is highly fragmented through land conversion from their breeding habitat, scrub habitat, to agriculture and human development, resulting in most of the remnant population being restricted to protected habitats and military bases. The highest density of black-capped vireos exists around Fort Hood in central Texas, where the species has been monitored carefully as a protected species (ESA Endangered from 1970Cimprich & Kostecke, 2006;Wilsey et al., 2014). The source-sink system, in which Fort Hood broadly acts as a source to nearby small sink sites, comprises fragmented habitat patches driven by brown-headed cowbird parasitism (Walker et al., 2016) and mediated by riparian corridors (Hauser & Leberg, 2020).

| ME THODS
We collected DNA samples (toenail clips and/or pin feathers) from & Leberg, 2020, for more details)]. We banded individuals with a unique U.S. Geological Survey band and a unique three-color band combination, and sexed and aged using reliable molt limits (Hauser et al., 2019;Hauser & Leberg, 2020;Pyle, 1997). These sites span the source-sink system identified by demography (Sources: ER, MD, and WR, Sinks: SS, BC and CB; Walker et al., 2016) and microsatellite analyses (Hauser et al., 2019;Hauser & Leberg, 2020; Figure 1). We extracted DNA from the samples using the Qiagen QIAamp Micro DNA Kit (Qiagen Inc, Hilden, Germany) following the protocol for isolation of genomic DNA from small volumes of blood. We also used these 338 samples in the microsatellite analysis using 12 speciesspecific loci (Barr et al., 2007;Hauser et al., 2019;Hauser & Leberg, 2020), and to which we compared the results from the following SNP analysis.
We used 185 of the best quality black-capped vireo samples for de novo SNP discovery and genotyping. We followed the ddRAD library preparation using the restriction enzymes speI and nlaIII for paired-end 150-bp reads (Peterson et al., 2012) and sequenced the libraries on an Illumina HiSeq 4000 lane. Library preparation, quality control, and sequencing were performed at the Texas A&M AgriLife Genomics core facility in College Station, Texas. Paired-end sequence reads (total sequence reads = 802,466,640) were demultiplexed and filtered for poor quality using the process_radtags function in Stacks v2.0 (Rochette et al., 2019), retaining 1,960,156 total reads. We optimized parameters for the de novo pipeline, resulting in the following parameters for genotype calling: m = 3, M = 2, n = 1, r = 0.80, min_maf = 0.05. In optimization, we tested a range of parameters (m = 3-5; M = 2-6; n = 1-6; r = 0.8-0.9; min_maf = 0.01-0.05) and chose the combination that yielded the highest quantity of SNP loci per Paris et al. (2017). We filtered the dataset further in VCFtools for the minor allele count (mac = 3) and genotyping rate (80%, Danecek et al., 2011). For direct comparison, the microsatellite dataset (n = 338; Hauser et al., 2019;Hauser & Leberg, 2020) was subsampled to the same 185 individuals for which SNP data were produced. We performed all following analyses on both the subsampled microsatellite dataset and the SNP dataset generated herein. We designed the following analysis methods to parallel the microsatellite analyses described in Hauser et al. (2019) andLeberg (2020) with minor modifications for large SNP datasets.

| Population genetics
We tested loci per study site and samples for Hardy-Weinberg equilibrium (HWE) deviations and linkage disequilibrium (LD). Loci found to be in LD or deviating from HWE were omitted from further analysis. We calculated observed and expected heterozygosity (H 0 and H e , respectively) using basic.stats function in hierfstat R package (v F I G U R E 1 Black-capped vireo study sites in central Texas (black circles) including Balcones Canyonlands (BC), Colorado Bend State Park (CB), San Saba Property (SS), and on Fort Hood (black triangles) including East Range combined (ER), Maxdale (MD), and West Range combined (WR). The six landscape cover types depicted as follows: agricultural croplands in brown, human development in magenta, forest in green, open habitat (including grazing lands) in yellow, scrub in orange, water bodies in navy blue, and wetlands in light blue 3.5.0) and allelic richness (A r ) using the allel.rich function in hierfstat R package (v 3.5.0) to estimate genetic diversity across the sites (Goudet, 2005). To evaluate how sites differed across these three metrics (H 0 , H e , and A r ), we performed a randomized block ANOVA, blocking by locus, using the "aov" function in R with a post hoc Tukey HSD test using the TukeyHSD R function. In these and subsequent analyses, we corrected alpha levels for multiple comparisons using a standard Bonferroni correction (Hauser et al., 2019;Hauser & Leberg, 2020;Rice, 1989;Sethuraman et al., 2019). Wherever calculating p-values with iterations was computationally impossible for our resources, we assessed significance using 95% confidence intervals (Altman & Krzywinski, 2016;Gardner & Altman, 1986).
We estimated population genetic differentiation (pairwise F ST ) using the pairwise.WCfst function in R package hierfstat (v 3.5.0), estimating 95% confidence intervals with the boot.ppfst function in the same R package (Goudet, 2005). We assessed the population structure using the Bayesian clustering program STRUCTURE (v 2.3.4). We used the admixture model with population as a prior (i.e., LOCPRIOR function; Hubisz et al., 2009)) to determine the number of unique genetic clusters (k) present within our system, testing k values ranging from 1 to 6. We performed these runs with 10 iterations, 500,000 burn-in period, and 500,000 MCMC (Monte Carlo Markov Chain) repetitions. We then submitted the STRUCTURE results to STRUCTURESELECTOR and used the Evanno and Puechmaille methods to determine k (Li & Liu, 2018;Puechmaille, 2016).
We used several approaches to investigate patterns of gene flow among the sites, specifically to determine if there was directional gene flow. Using GENECLASS (v 2.0), we detected first-generation migrants using "L_home/L_max" likelihood ratio, the Paetkau et al.
(1995) criterion, .01 allelic frequency, and .01 p-value threshold. We used parentage assignments in CERVUS (v 3.0.7) to directly observe migration among sites (Kalinowski et al., 2007). For both the SNP and microsatellite datasets, we used the following simulation parameters for 10,000 simulated offspring based on censused blackcapped vireo demography (Cimprich & Kostecke, 2006;Walker et al., 2016, D. Cimprich, personal communication): number of candidate mothers = 414 (5.1% sampled) and candidate fathers = 581 (9.64% sampled), the proportion of loci typed = 0.90. We assigned 81 second-year (SY) offspring to candidate mothers (n = 21) and fathers (n = 56), chosen by their relative age to a given SY offspring, using an SNP dataset with a 90% genotyping rate across the 178 individuals (N loci = 806), with a mean missingness of 8.3%. We additionally ran a small sensitive analysis to test ranges of parameters (number of offspring = 10,000-100,000; number of candidate mothers and father = 0.08-0.50; proportion of loci typed = 0.80-0.95). A more stringent genotyping rate was used for this analysis to avoid biases associated with missing data and parentage analyses (Hammerly et al., 2016) and to ensure that the program could accommodate the dataset (Kalinowski et al., 2007). Candidate parents needed to be sampled in the same year and in an age class old enough to feasibly produce SY offspring (after-second-year; ASY). For black-capped vireos, SY individuals disperse and establish their first breeding territories, whereas older (ASY) individuals have strong site fidelity and remain in the same population for subsequent years. Therefore, we categorized offspring in populations different from their assigned parent as a migrant, whereas SY individuals found to be in the same population as their parents were considered residents.

| Landscape genetics
For all landscape genetics analyses, we used the population-level proportion of shared alleles (Dps) as a metric of gene flow (pairwise. PropShared function in R package PopGenReport; Adamack & Gruber, 2014;Gruber & Adamack, 2017) as Dps is more directly related to gene flow than other metrics of genetic differentiation (Landguth et al., 2010). We tested for isolation by distance at individual and population levels using the mantel.randtest function in the R package adegenet (Jombart, 2008).
We used the same between-site and at-site predictor variable database as Hauser and Leberg (2020), including elevation, Euclidean distance, water, development, forest, scrub, open, agriculture, riparian, the proportion of scrub habitat, and brown-headed cowbird (BHCO) management at the sites. Between-site variables (elevation, Euclidean distance, water, development, forest, scrub, open, agriculture, and riparian) were transformed into resistance surfaces in CIRCUITSCAPE (McRae, 2006). We optimized the valuation of each resistance surface (see Hauser and Leberg (2020) for more details on optimization) using a linear mixed-effects model (R package lme4; Bates et al., 2015) with Dps as the response variable, each resistance value as the fixed effect, and site as the random effect. Only the optimized resistance values for a given variable, the value with the lowest AICc score via the univariate linear mixed-effects models, were used in subsequent hypotheses testing.
To investigate how landscape features influence gene flow in this system, we used a multivariate linear mixed-effects model approach using candidate models driven by a priori hypotheses (Table 1). All candidate models were checked for multicollinearity using a variance inflation factor (VIF) threshold of 4 before fitting the models. We used the linear mixed-effects models in the R package lme4 using the full maximum likelihood with D ps as the response variable, landscape features as fixed effects, and site as the random effect (Bates et al., 2015). We evaluated our candidate models with AIC c , ΔAIC c , and AIC c weights (R package GeNetIt). We considered models with a ΔAIC c < 2 to be competitive (Burnham & Anderson, 2002). Across all methods, we compared results from the SNP data, the subsampled microsatellite data, and the complete microsatellite data (n = 338) presented in Hauser et al. (2019) and Hauser and Leberg (2020).

| Population genetics
After filtering, the genomic dataset included 11,507 SNP loci for missingness of 15.2%. The microsatellite dataset was also subsampled to the same 178 individuals. We found no deviations from HWE or LD for both datasets after a Bonferroni correction at any of our study sites.
For the microsatellites, we found significant differences for H 0 and A r among sites (p < .001) but not for H e (p = .549). All sites, except BC and CB, had significantly lower H 0 than H e (Figure 2 in A r among sites, namely, ER and WR were significantly higher than the rest of the sites (Figure 2). Values for all genetic diversity metrics and their variances calculated using microsatellites were much higher than those using SNPs ( Table 2).
As expected, the full microsatellite dataset showed that most genetic differentiation was between central Texas sites and Fort Hood sites (Table 3; (Table 4). All migrants detected were from WR. Migrants in central Texas sites comprised a much larger proportion of the total population (14.7-29.5%) than those detected in Fort Hood (<1%-6.9%). Regardless of dataset, proportions TA B L E 1 Multivariate candidate models with predicted relationship with gene flow in parentheses (e.g., (−)Agriculture denotes a negative relationship between agriculture and gene flow), and the a priori hypothesis (rationale) for landscape genetic analyses for the black-capped vireo source-sink system Riparian areas, waterways, breeding habitat, and site productivity Note: These set of a priori candidate models were used for both microsatellite and SNP datasets.
Abbreviations: Ag, agriculture; Dev, developed; Distance, isolation by distance; From_CowbirdControl the level of Brown-headed Cowbird control at the emigration site, From_Scrub the area of scrub habitat at the emigration site.
of migrants in central Texas sites were an order of magnitude higher than those in Fort Hood.
In CERVUS, the microsatellite analysis assigned 20 parentoffspring pairs at the 95% confidence interval (Table 5). We identified most offspring assigned to parents as migrants (n = 16), of which most were from Fort Hood (n = 14). We found directional migration from Fort Hood to central Texas (n = 4) compared with central Texas to Fort Hood (n = 1). The full microsatellite dataset assigned more parent-offspring pairs (n = 21) at the 95% confidence interval (Hauser et al., 2019;Hauser & Leberg, 2020) and indicated similar patterns of directional migration from Fort Hood to central Texas as the subsampled dataset. The SNP-based analysis did not assign any candidate parents to offspring across any of the tested parameter settings. For the subsampled microsatellite data, 11 of the 20 candidate models had ΔAIC c < 2, including the null model, indicating a substantial loss of power using microsatellites with this reduced sample size (Table 6). Top models with ΔAIC c < 2 for the SNP dataset were "riparian + water + scrub" and "development" (Table 6). The top models from Hauser and Leberg (2020) Seddon et al., 2005) and genetic structuring (Liu et al., 2005;Morin et al., 2009;Seddon et al., 2005). However, the Bayesian clustering approach STRUCTURE was unable to detect fine-scale population structuring for either marker. This software has been found to perform poorly with fine-scale structure (Janes et al., 2017)

F I G U R E 3
Pairwise F ST estimates (dots) with 95% confidence intervals (error bars) between black-capped vireo populations: BC, CB, ER, SS, MD, WR. Estimates that overlap with 0 are not statistically significant and estimates in which 95% confidence intervals overlap are not statistically different from one another assertion that although SNP data have substantially higher statistical power than microsatellite data, these benefits do not necessarily extend to parentage analysis due to the low heterozygosity values  Note: Directional movement between regions (Fort Hood (FH) and central Texas (CT) (e.g., FH to CT denotes movement from Fort Hood to central Texas), percentage of each subcategory, migrants or residents (%), and percentage of total number of assigned offspring (% total) are also shown. Assignment data using SNP data not shown as there were no successful parent-offspring assignments. Kleinman-Ruiz et al., 2017;Seddon et al., 2005), significant findings do not necessarily translate to biologically relevant differences. Statistically significant differences found in genetic diversity (heterozygosity and allelic richness) and structure metrics (F ST ) among the black-capped vireo study sites were extremely small (on the order of thousandths) and may lack biological significance.
When calculating population genetic metrics, large SNP datasets, such as ours, increase the chance of statistically significant results (using p-values or 95% confidence intervals) and Type I error of results (Wigginton et al., 2005).
This study serves as one of the first direct marker comparisons (others include Hall and Beissinger, (2014) and Puckett and Eggert, (2016)) in a landscape genetic context showing varying results between SNPs and microsatellites. Neither SNPs nor microsatellites found any evidence for isolation by distance, as would be expected for a metapopulation with considerable admixture as seen here (Gaggiotti, 1996;Jenkins et al., 2010 using SNPs (Morin et al., 2009;Muñoz et al., 2017) and therefore could yield more accurate landscape genetic inferences. However, metrics such Dps as used in the present study have not been used in formal comparison and simulation studies. We need further investigations in landscape genetics to understand the respective accuracy and precision of microsatellites and SNPs, especially as many contemporary landscape genetics research is being done with one marker or the other.
We show that overall SNP and microsatellite data can infer similar biological processes and patterns. Microsatellites can still be used for a wide variety of population or conservation questions, despite an extensive adoption of genomics techniques in the field. We especially want to add our voice to the assertion for systems with existing or legacy microsatellite panels, in which development of new markers would be costly, piecewise genotyping is commonplace (as found in management), or where bioinformatics expertise or computational power is not accessible (Flanagan & Jones, 2019). In species with low genetic diversity or that have experienced bottlenecks, especially prevalent in conservation genetics, multiallelic markers, such as microsatellites, could provide the necessary power in parentage analyses when SNPs cannot. Nevertheless, in developing new molecular markers for a population genetic study, SNPs are less expensive per locus than microsatellites and have substantially more statistical power than microsatellites for most comparisons, yielding a cost-effective approach over microsatellites (Flanagan & Jones, 2019). SNPs also allow for investigation into adaptive variation with loci under selection whereas microsatellites cannot (Ahrens et al., 2018;Flanagan & Jones, 2019;Helyar et al., 2011). A third-choice researchers should consider is the microhaplotype, a multiallelic marker produced through next-generation techniques, which yields a higher number of loci than microsatellites and is randomly distributed throughout the genome (Baetscher et al., 2018). We urge researchers to thoroughly consider the utility of each marker based on their system and urge reviewers and editors not to disregard research using microsatellites. This comparison serves as an illustration of such a case where microsatellite and SNP results converge in conclusions and microsatellites still maintain a function in population genetics.

ACK N OWLED G M ENTS
We would like to acknowledge our funding through the Department Research, and the Natural Resources Division, particularly D.
Cimprich and his team at Fort Hood for providing us access and resources for this research. We also would like to thank the two anonymous reviewers who provided thoughtful and constructive feedback.

CO N FLI C T O F I NTE R E S T
The authors have no conflicts of interest to report.