Contemporary genetic structure affects genetic stock identification of steelhead trout in the Snake River basin

Abstract Genetic stock identification is a widely applied tool for the mixed‐stock management of salmonid species throughout the North Pacific Rim. The effectiveness of genetic stock identification is dependent on the level of differentiation among stocks which is often high due to the life history of these species that involves high homing fidelity to their natal streams. However, the utility of this tool can be reduced when natural genetic structuring has been altered by hatchery translocation and/or supplementation. We examined the genetic population structure of ESA‐listed steelhead in the Snake River basin of the United States. We analyzed 9,613 natural‐origin adult steelhead returning to Passive Integrated Transponder detection sites throughout the basin from 2010 through 2017. Individuals were genotyped at 180 single nucleotide polymorphic genetic markers and grouped into 20 populations based on their return location. While we expected to observe a common pattern of hierarchical genetic structuring due to isolation by distance, we observed low genetic differentiation between populations in the upper Salmon River basin compared to geographically distant populations in the lower Snake River basin. These results were consistent with lower genetic stock assignment probabilities observed for populations in this upper basin. We attribute these patterns of reduced genetic structure to the translocation of lower basin steelhead stocks and ongoing hatchery programs in the upper Salmon River basin. We discuss the implications of these findings on the utility of genetic stock identification in the basin and discuss opportunities for increasing assignment probabilities in the face of low genetic structure.

For example, conducting GSI on sockeye salmon (Oncorhynchus nerka) caught in the Port Moller test fishery allowed real-time shifts in fishing effort in Bristol Bay to reduce the risk of overharvesting low abundance stocks in the mixed-stock fishery (Dann et al., 2013).
In addition, GSI has been used to monitor the status and trends of steelhead (O. mykiss, Figure 1) stocks in the Snake River basin that are listed as threatened under the US Endangered Species Act (Northwest Fisheries Science Center, 2015). This method uses genetic data from reference populations (representing the contributing stocks) as a baseline to assign fish of unknown origin (e.g., Anderson, Waples, & Kalinowski, 2008; Hasselman et al., 2016; Shaklee, Beacham, Seeb, & White, 1999). Genetic stock identification is most effective when species are philopatric, and restricted dispersal among populations leads to significant levels of genetic differentiation (Araujo, Candy, Beacham, White, & Wallace, 2014).
Many salmonid species exhibit both strong homing fidelity and genetic structuring, making GSI an effective tool for their management.
Despite its widespread use, the accuracy of GSI can be reduced when natural genetic structuring has been altered by hatchery translocation and/or supplementation with out-of-basin stocks. For example, Pearse, Martinez, and Garza (2011) found that anthropogenic changes to wild coastal steelhead populations in California, including hatchery supplementation with common stocks, reduced genetic differentiation relative to historic conditions. Such reductions in genetic differentiation among populations can adversely affect the accuracy of genetic stock identification: substantial error can occur in estimating stock compositions when mean FST among populations is less than 0.01 (Araujo et al., 2014). Because hatchery supplementation programs are widespread throughout the North Pacific Rim, understanding how these programs change genetic structure across the landscape is an important step in accounting for uncertainty in GSI analyses.
Throughout the Columbia River basin of Washington, Oregon, and Idaho, GSI is widely used both to inform harvest management of steelhead (e.g., Byrne et al., 2018) and to monitor the status and trends of populations listed under the Endangered Species Act (Northwest Fisheries Science Center, 2015). Matala, Ackerman, Campbell, and Narum (2014) found significant isolation by distance (IBD) in both coastal and inland steelhead lineages throughout the Columbia River basin. Isolation by distance would lead to the expectation that, across basins, the most genetically differentiated populations should occur in streams located in the headwaters. However, patterns of IBD can be disrupted by anthropogenic influences, such as supplementation with non-native stocks that alters dispersal across the landscape, as was seen in California by Pearse et al. (2011).
Hatchery supplementation has been used in the Snake River, a tributary to the Columbia River, to mitigate for lost habitat and fisheries as a result of hydropower development (Busby et al., 1996). In addition, by the early 1960s, the Idaho Department of Fish and Game had initiated efforts to use captive-reared fish to supplement or reestablish steelhead populations in their historically occupied range in Idaho (Bjornn, 1978). Beginning in 1966, efforts were made by the Idaho Department of Fish and Game to supplement steelhead in the upper Salmon River using fish trapped at the recently completed Hells Canyon Dam (Figure 2, Reingold, 1967). These translocations continued annually through 1972 and their success resulted in the founding of two hatchery populations in the upper Salmon River watershed (Stiefel, 2013). These hatchery populations serve as genetic repositories for the steelhead stocks that previously spawned above Hells Canyon Dam (where there is no fish passage), and provide harvest opportunities as part of legally mandated mitigation programs.
Previous research has documented historical steelhead translocation and supplementation efforts in the Snake River basin, but the consequences of such introductions on GSI have yet to be examined. The objective of this study was to describe the contemporary genetic structure of steelhead populations across the Snake River basin and assess the effects of historical translocations into the upper Salmon River watershed on patterns of IBD and GSI. To accomplish this objective, we genotyped representative samples from across the basin using fin tissues collected from natural-origin adults at Lower Granite Dam whose last known locations were determined using Passive Integrated Transponder (PIT) tag detection sites. Lower Granite Dam provides a sampling point for all steelhead migrating upstream in the Snake River, and PIT tag arrays allow assignment of spawning locations for presumed natural-origin fish.
Returning adult steelhead with both a genetic sample and PIT tag detection can be used to assess the genetic structure in the basin. To assess the impacts of translocations on genetic structure we quantified patterns of IBD at a basin-wide scale including and excluding supplemented populations in the upper Salmon River watershed.
Finally, we assessed the accuracy and precision of GSI assignments of samples from the upper Salmon River watershed relative to other populations throughout the Snake River basin. Vu et al. (2015) developed the Snake River steelhead baseline version 3.1 that we used to assign individuals to one of 10 genetic stocks that were initially identified in the Snake River basin by Ackerman et al. (2012; Figure 2). These genetic stocks roughly correspond to, or are contained within, major population groups (MPGs) identified in the Snake River basin (Ackerman et al., 2012). This GSI baseline was constructed using O. mykiss collected between 1999 and 2013 (Vu et al., 2015), and reflects contemporary steelhead genetic structure in the Snake River basin. Accuracy of this baseline was initially assessed by Vu et al. (2015) using self-assignment tests in the program gsi_sim (Anderson, 2010; Anderson et al., 2008).

| Data collection
We analyzed 31,444 genetic samples collected from putatively natural-origin adult steelhead at the Lower Granite Dam adult fish trapping facility in spawning run years (July 1-June 30) 2010-2017. A small, nonlethal sample of fin tissue was collected from each fish for genotyping and subsequent GSI analyses. Wild adult steelhead that did not already carry a PIT tag when captured at the adult fish facility received one (Ogden, 2019 and references therein). Tissue samples were stored either in 95% nondenatured ethanol or on dry Whatman sampling paper (Lahood, Miller, Apland, & Ford, 2008) prior to extraction. Genomic DNA was extracted using a Nexttec Genomic DNA Isolation Kit for Fish Tissue according to the manufacturer's instructions (www.nexttec.biz), and fish were genotyped at a panel of 180 single nucleotide polymorphisms (SNPs) used in the Columbia River steelhead GSI baseline (Hess, Campbell, et al., 2014). Prior to analysis, we removed locus Omy_IL1b-163 due to poor performance (Vu et al., 2015). Genotyping was performed using Fluidigm® 96.96 Dynamic Array™ IFCs (chips) for steelhead returning in spawning run years 2010-2015. For spawning run years 2016-2017, genotyping was performed using the Genotyping-in-Thousands by sequencing (GT-seq) protocol (Campbell, Harmon, & Narum, 2015) on an Illumina NextSeq 500 DNA sequencer (Illumina). More detailed methods for, and results from, these genotyping efforts are reported elsewhere (Ackerman et al., 2012; Powell et al., 2017; Powell et al., 2018; Vu et al., 2015).
FIGURE 2 Location of adult steelhead sampling sites color coded by genetic stock. The gray triangle represents the Lower Granite Dam adult fish facility where returning steelhead were implanted with PIT tags and genetically sampled. Black circles represent PIT tag detection sites used in this study (Appendix). The gray square represents Hells Canyon Dam.

Samples were filtered to include only natural-origin fish that successfully genotyped at ≥90% of the amplified loci. We identified fish as natural-origin adult steelhead if they had no marks (e.g., adipose or ventral fin clip), no visible fin erosion (Latremouille, 2003), no coded wire tag (CWT), and did not assign to the Snake River hatchery steelhead parentage-based tagging baselines. Samples were then further filtered to include only those adult steelhead that were assigned a spawning location in one of 20 populations described by NMFS (2017, Appendix) based on PIT tag detection (Figure 2). Clear Creek was split from the lower Clearwater River population (CRLMA-s) because previous analyses indicate that steelhead from Clear Creek are genetically more similar to collections in the South Fork Clearwater River population than to collections from other drainages in the lower Clearwater River population (Ackerman et al., 2012; Vu et al., 2015). This new population was labeled CRLMA-s*. Spawning locations were assigned for adults returning in spawning run years 2010-2015 based on the upstream-most PIT tag detection site (i.e., maximum river kilometer from the mouth of the Columbia River) in a spawn year (Powell et al., 2017). For adults returning in spawning run years 2016-2017, spawning population assignments were determined based on the range of dates across which an individual was present above a given PIT tag detection site (Orme & Kinzer, 2018).
A total of 9,613 adult steelhead passed our filtering criteria and were included in the final analysis.
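The two mechanical filtering rules described above (the ≥90% genotyping threshold and the upstream-most detection rule used for spawning run years 2010-2015) can be illustrated with a minimal Python sketch. The record layout, site codes, and function names are hypothetical and are not taken from the authors' pipeline; the date-range rule used for 2016-2017 is not covered.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    site: str    # hypothetical PIT tag detection site code
    rkm: float   # river kilometers from the mouth of the Columbia River

def genotyping_rate(genotypes):
    """Fraction of loci with a called genotype (None marks a failed call)."""
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)

def passes_genotype_filter(genotypes, min_rate=0.90):
    """Keep a sample only if it genotyped at >= 90% of loci."""
    return genotyping_rate(genotypes) >= min_rate

def upstream_most_site(detections):
    """Return the detection site with the maximum river kilometer."""
    return max(detections, key=lambda d: d.rkm).site

# Toy example: 10 loci with one failed call, and two made-up detections.
sample_genotypes = ["AA", "AB", "BB", "AA", None, "AB", "AA", "BB", "AB", "AA"]
sample_detections = [Detection("SITE_LOW", 695.0), Detection("SITE_HIGH", 820.5)]

keep_sample = passes_genotype_filter(sample_genotypes)
spawning_site = upstream_most_site(sample_detections)
```

In this toy case the sample genotyped at 9 of 10 loci, so it sits exactly on the threshold and is retained.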

| Data analysis
Using PIT detections that reflected presumed natal origin, the corresponding genetic samples were grouped into 20 collections based on populations described in NMFS (2017). We set the minimum population size for this study to be 20 returning PIT tagged adult steelhead (Pruett & Winker, 2008) to minimize bias in population genetic parameter estimates. Populations with greater than 20 individuals sampled within a return year were tested for deviations from Hardy-Weinberg Equilibrium with the R package HardyWeinberg version 1.6.3 (Graffelman, 2015;Graffelman & Morales-Camarena, 2008).
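As an illustration of what a Hardy-Weinberg test checks at a single biallelic SNP, the following Python sketch computes a chi-square goodness-of-fit statistic from genotype counts. This is an assumption-laden stand-in: the authors used the R package HardyWeinberg, and the source does not say which of its tests was applied, so the chi-square form and the function name here are illustrative only.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions at one biallelic locus.

    Returns (chi2, p_value, heterozygote_deficit). The locus must be
    polymorphic, otherwise an expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    heterozygote_deficit = n_ab < expected[1]
    return chi2, p_value, heterozygote_deficit

# Made-up counts: far fewer heterozygotes than expected under equilibrium.
chi2, p_value, deficit = hwe_chi_square(40, 20, 40)
```

The third return value flags whether the deviation is in the direction of too few heterozygotes.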
We sought to ensure that the final analysis made comparisons across genetically homogenous collections because we sampled returning adults across multiple spawn years. To that end, we tested for genetic differentiation across spawning return years and PIT tag detection sites for all populations with more than 20 detected adult steelhead using 10,000 permutations in the R package hierfstat version 0.04-22 (Goudet & Jombart, 2015). If samples from the same population were statistically differentiated among years, we then tested for differentiation among PIT tag detection sites within years for all years with more than 20 detections. Tests were performed using a Bonferroni adjusted α based on 115 potential simultaneous tests of genetic differentiation (adjusted α = 4.35 × 10⁻⁴). Within populations, only groups of genetically homogenous spawn year and PIT tag detection site combinations were used for analysis.
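The adjusted threshold quoted above can be checked with a short calculation. The sketch below assumes a family-wise α of 0.05, which the text does not state explicitly but which reproduces the reported value when divided by 115 tests; the function names are illustrative.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests

def is_significant(p_value, family_alpha=0.05, n_tests=115):
    """Compare one p-value against the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_tests)

# Assumed family-wise alpha of 0.05 spread over 115 potential tests.
adjusted_alpha = bonferroni_alpha(0.05, 115)
print(f"{adjusted_alpha:.2e}")  # 4.35e-04
```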
For GSI assignments, we used the full Expectation-Maximization algorithm maximum likelihood estimate option in the program gsi_sim (Anderson, 2010;Anderson et al., 2008). Individuals were assigned to one of 10 genetic stocks in the Snake River steelhead baseline version 3.1 (Vu et al., 2015) based on their maximum probability of membership using the allocate sum procedure (Wood, McKinnell, Mulligan, & Fournier, 1987). These genetic stocks roughly correspond to the major population groups (MPGs) into which these 20 populations defined by NMFS (2017) are aggregated. To assess accuracy of individual GSI assignments, we calculated the proportion of fish returning to PIT tag detection sites within the Snake River basin that assigned to the appropriate genetic stock. To quantify the uncertainty of the GSI assignments, we calculated the cumulative distribution of assignment probability observed for each PIT tagged adult steelhead used in the analysis.
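The two summaries described above, per-stock assignment accuracy and the cumulative distribution of assignment probability, can be sketched in Python as follows. The stock codes and probabilities are invented for illustration, and the record layout is an assumption; the actual assignments come from gsi_sim. The second function computes the proportion of fish assigned with at least a given probability, which is one plausible reading of the cumulative distribution that was summarized.

```python
def assignment_accuracy(records):
    """Proportion of fish from each PIT-inferred stock assigned to that stock.

    records: iterable of (true_stock, assigned_stock, assignment_probability).
    """
    counts = {}
    for true_stock, assigned_stock, _ in records:
        n_correct, n_total = counts.get(true_stock, (0, 0))
        counts[true_stock] = (n_correct + (assigned_stock == true_stock), n_total + 1)
    return {stock: c / n for stock, (c, n) in counts.items()}

def proportion_at_least(probabilities, threshold):
    """Share of individuals whose assignment probability is >= threshold."""
    return sum(p >= threshold for p in probabilities) / len(probabilities)

# Hypothetical records: (stock inferred from PIT detection, GSI stock, probability).
toy_records = [
    ("STOCK_A", "STOCK_A", 0.62),
    ("STOCK_A", "STOCK_C", 0.48),
    ("STOCK_B", "STOCK_B", 0.97),
    ("STOCK_B", "STOCK_B", 0.91),
]

accuracy = assignment_accuracy(toy_records)
share_confident = proportion_at_least([r[2] for r in toy_records], 0.90)
```

Keeping the two summaries separate mirrors the distinction the study draws between accuracy (was the assignment right?) and confidence (how probable was the assignment?).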
We calculated FST using Weir and Cockerham's θ (Weir & Cockerham, 1984) in the R package hierfstat version 0.04-22 (Goudet & Jombart, 2015). We averaged pairwise FST values for populations with genetically differentiated spawning run years to have a single FST value for any pair of populations in the analysis. Stream distance was calculated using the river kilometer of the lowest PIT tag detection site within each population reported to PTAGIS (www.ptagis.org).
We examined patterns of IBD using the ratio FST/(1 − FST) as the response variable in a linear regression on stream distance between populations (Rousset, 1997). We tested for an association between stream distance and genetic distance using Mantel tests (Mantel, 1967) with 10,000 permutations. We used a statistical test for comparing the strength of IBD described in Powell (2014) that is analogous to the construction of Mantel-based confidence intervals presented in Manly (2007).
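A minimal Python sketch of this step is given below, assuming the genetic distance is Rousset's (1997) linearized FST, that the Mantel statistic is a Pearson correlation between the two distance matrices, and that the test is one-sided; none of these details is specified in the text, and the matrices are toy values rather than study data.

```python
import math
import random

def linearize_fst(fst):
    """Linearized genetic distance used as the regression response."""
    return fst / (1.0 - fst)

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permute(matrix, order):
    """Reorder rows and columns together, as a Mantel permutation requires."""
    return [[matrix[i][j] for j in order] for i in order]

def mantel_test(genetic, stream, n_perm=10000, seed=1):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    genetic_values = upper_triangle(genetic)
    observed = pearson(genetic_values, upper_triangle(stream))
    order = list(range(len(stream)))
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        r = pearson(genetic_values, upper_triangle(permute(stream, order)))
        if r >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)

# Toy data: five populations placed along one river (made-up river kilometers).
rkm = [0.0, 100.0, 250.0, 400.0, 700.0]
stream = [[abs(a - b) for b in rkm] for a in rkm]
genetic = [[0.0001 * d for d in row] for row in stream]  # perfect IBD
r_obs, p_val = mantel_test(genetic, stream, n_perm=1000)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is the reason a Mantel test is used instead of an ordinary correlation test.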
In this test, if two sets of populations have equivalent patterns of genetic differentiation, we expect to observe no relationship between the stream distance matrix of one set of populations (e.g., all populations in their true location) and the residual genetic distance matrix calculated using the slope of the regression line from another set of populations (e.g., populations after removing the upper Salmon River populations). Because the sample set after removing the upper Salmon River populations is used in both steps of the equivalence test, we set this as our reference regression line. If there was a homogenizing effect of translocating steelhead from the mid-Snake River to the upper Salmon River, then we should observe a negative correlation between the residual genetic distance matrix for the full dataset and the stream distance matrix. We would also expect to observe no relationship between the stream distance matrix and the residual genetic distance matrix after moving the upper Salmon River populations to Hells Canyon Dam. We incorporated uncertainty in the estimated slope of the regression line describing IBD after removing the upper Salmon River populations by sampling 10,000 slope coefficients from a Normal distribution with mean equal to the slope of the IBD line and standard deviation equal to the estimated standard error of the slope parameter. For each of these 10,000 sampled slope coefficients, we estimated a residual distance matrix and calculated a Mantel test statistic using a randomized stream distance matrix. Significance of IBD relationships was determined based on the proportion of these randomizations that produced a smaller test statistic than that observed with the original stream distance matrix.
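One possible reading of this procedure is sketched below in Python. It is an interpretation under stated assumptions, not the implementation from Powell (2014): residuals are taken as genetic distance minus slope times stream distance (the intercept is dropped because a constant shift does not change a correlation), the Mantel statistic is a Pearson correlation, and all input values are invented.

```python
import math
import random

def upper_triangle(matrix):
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permute(matrix, order):
    return [[matrix[i][j] for j in order] for i in order]

def residual_matrix(genetic, stream, slope):
    """Genetic distance left after removing the reference IBD trend."""
    n = len(genetic)
    return [[genetic[i][j] - slope * stream[i][j] for j in range(n)] for i in range(n)]

def residual_ibd_test(genetic, stream, ref_slope, ref_slope_se, n_draws=10000, seed=1):
    """Return (observed residual correlation, share of randomizations below it)."""
    rng = random.Random(seed)
    observed = pearson(
        upper_triangle(residual_matrix(genetic, stream, ref_slope)),
        upper_triangle(stream),
    )
    order = list(range(len(stream)))
    smaller = 0
    for _ in range(n_draws):
        # Propagate uncertainty in the reference slope.
        slope = rng.gauss(ref_slope, ref_slope_se)
        residual_values = upper_triangle(residual_matrix(genetic, stream, slope))
        # Break the spatial arrangement by shuffling population labels.
        rng.shuffle(order)
        randomized = upper_triangle(permute(stream, order))
        if pearson(residual_values, randomized) < observed:
            smaller += 1
    return observed, smaller / n_draws

# Toy data: the full set diverges more slowly with distance than the reference line.
rkm = [0.0, 100.0, 250.0, 400.0, 700.0]
stream = [[abs(a - b) for b in rkm] for a in rkm]
genetic = [[0.00005 * stream[i][j] + 0.001 * ((i + j) % 3) for j in range(5)]
           for i in range(5)]
obs_r, share_smaller = residual_ibd_test(genetic, stream, 0.0001, 0.00001, n_draws=1000)
```

In the toy data the reference slope is steeper than the trend in the full set, so the residuals decline with stream distance and the observed correlation is negative, which is the signature the text associates with a homogenizing effect.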
We constructed a neighbor-joining tree for the populations based on Cavalli-Sforza Edwards chord distance (Cavalli-Sforza & Edwards, 1967) using PHYLIP v3.5 (Felsenstein, 1993). In addition to PIT tag returns, we also included the 2016 broodstocks for Pahsimeroi, Oxbow, and Sawtooth fish hatcheries to provide collections representing the hatchery stocks used to supplement the upper Salmon River. Branch support was estimated by resampling loci 1,000 times, and trees were visualized with Dendroscope version 3.5.9 (Huson & Scornavacca, 2012). Unless otherwise stated, all analyses were performed in R version 3.6.1 (R Core Team, 2019).

| RESULTS
Three loci (Omy_109894-185, Omy_aldB-165, OMS00095) were out of Hardy-Weinberg equilibrium due to a deficit of heterozygotes in more than half of the populations analyzed in a given year and were removed from analysis. We observed significant genotypic differentiation across spawning run years and PIT tag detection sites within the lower Clearwater River (CRLMA-s), Imnaha River

We observed a general pattern of increasing assignment accuracy of GSI moving upstream from the mouth of the Snake River (Table 1). However, this pattern of increasing assignment accuracy was not directly replicated with a similar pattern of increasing assignment confidence (Figure 3). For example, we observed similar average individual assignment probabilities in the upper Salmon River and the lower Snake River basin genetic stocks (Figure 3).
We observed a pattern of IBD in natural-origin steelhead across the Snake River basin (p-value ≤ .0001, ρ = 0.34; Figure 4).
We observed a stronger association between genetic distance and geographic distance (p-value = .011) after excluding populations in the upper Salmon River from analysis (p-value ≤ .0001, ρ = 0.67; Figure 4). We did not observe a difference in the relationship between genetic distance and geographic distance between the test that excluded populations in the upper Salmon River and the test

TABLE 1 The proportion of steelhead returning to PIT tag detection sites within a genetic stock that were assigned to one of 10 genetic stocks in the Snake River steelhead GSI baseline version 3.
Note: Off-diagonal values in each row report the proportion of steelhead that assign to a genetic stock identification reporting unit in which they did not return to spawn.

| DISCUSSION
We used a comprehensive survey of 9,613 adult steelhead returning to PIT tag detection locations across the Snake River basin to investigate patterns of contemporary genetic structure. We found that populations of wild adult steelhead exhibit a pattern of IBD across the Snake River basin. This finding is consistent with expectations based on work performed across the eastern Pacific in both coastal (Arciniega et al., 2016; Garza et al., 2014; Heath, Pollard, & Herbinger, 2001; Pearse, Donohoe, & Garza, 2007; Pearse et al., 2011) and interior lineages. However, the strength of the association between geographic and genetic distance in the Snake River basin has likely been reduced as a result

FIGURE 3 Cumulative distribution functions of assignment probability for the 10 genetic stocks in the Snake River steelhead GSI baseline v3.1. This figure reports the probability that an individual assigned to a given genetic stock (lines) assigns to that genetic stock (y-axis) with at least a specified probability (x-axis)

Sawtooth Fish Hatchery was founded using smolts from Pahsimeroi Fish Hatchery (Moore, 1983). While hatchery production efforts in the upper Salmon River have also included the release of juvenile steelhead from the Clearwater River basins (Stiefel, 2013), the results of the genetic analyses described in this paper and by previous authors (Blankenship et al., 2011; Nielsen, Byrne, Graziano, & Kozfkay, 2009)  River is assigned to this genetic stock than other lower basin genetic stocks (Table 1) (Hartl & Clark, 2007) the increase in genetic divergence among populations is a slow process. Therefore, our confidence in genetic stock assignments in the upper Salmon River is reduced as a result of the lower genetic divergence within the Snake River basin due to the success of past translocation efforts, while the accuracy of these assignments reflects a low level of straying due to their geographic isolation.
Genetic stock identification has been an important tool for monitoring wild steelhead in the Snake River Evolutionarily Significant Unit despite the low genetic differentiation among steelhead populations from the upper Salmon River and middle Snake River. The primary use of GSI in the Snake River basin has been to parse the total wild escapement of adults that pass Lower Granite Dam for annual stock abundance estimation (Camacho et al., 2019). Powell et al. (2018) showed that estimated genetic stock proportions are unbiased and that individual assignment accuracy for the Middle Fork Salmon River, South Fork Salmon River, and Upper Clearwater River reporting groups is high (>90%). These watersheds are solely managed for wild fish production with no history of hatchery supplementation, characteristics that make them priorities for monitoring and conservation. However, prior to the implementation of GSI, abundance estimates for these areas were largely unavailable (Busby et al., 1996; Good, Waples, & Adams, 2005) because many populations are located in remote or wilderness areas and because environmental conditions at the time of spawning prevent the use of traditional counting methodologies (weirs, rotary screw traps, and redd count surveys). Therefore, GSI remains a critical tool for monitoring wild steelhead in the Snake River basin, where abundance is difficult to estimate with other methods.
Although the results presented here (and previously) indicate some limitations of differentiating stocks that have shared ancestries from translocation and supplementation efforts, there are opportunities to increase assignment accuracy by incorporating SNPs under selection (Ackerman, Habicht, & Seeb, 2011) and by moving to loci that contain multiple alleles (i.e., microhaplotypes) over single-SNP loci (Baetscher, Clemento, Ng, Anderson, & Garza, 2018).
Advances in reduced-representation sequencing and whole-genome sequencing make finding these loci much more cost-effective and efficient than previous methods (Andrews, Good, Miller, Luikart, & Hohenlohe, 2016;Li & Wang, 2017) and will be the focus of our GSI work moving forward.

ACKNOWLEDGMENTS
We would like to thank J. Dillon, T. Copeland, B. Leth, T. Delomas, J. Hargrove, and three anonymous reviewers for providing helpful comments on earlier drafts of this manuscript. Primary funding for this project comes from the Bonneville Power Administration (Project #2010-026-00).

CONFLICT OF INTEREST
The authors report no conflict of interest.