A genotyping‐in‐thousands by sequencing panel to inform invasive deer management using noninvasive fecal and hair samples

Abstract Studies in ecology, evolution, and conservation often rely on noninvasive samples, making it challenging to generate large amounts of high‐quality genetic data for many elusive and at‐risk species. We developed and optimized a Genotyping‐in‐Thousands by sequencing (GT‐seq) panel using noninvasive samples to inform the management of invasive Sitka black‐tailed deer (Odocoileus hemionus sitkensis) in Haida Gwaii (Canada). We validated our panel using paired high‐quality tissue and noninvasive fecal and hair samples to simultaneously distinguish individuals, identify sex, and reconstruct kinship among deer sampled across the archipelago, then provided a proof‐of‐concept application using field‐collected feces on SGang Gwaay, an island of high ecological and cultural value. Genotyping success across 244 loci was high (90.3%) and comparable to that of high‐quality tissue samples genotyped using restriction‐site associated DNA sequencing (92.4%), while genotyping discordance between paired high‐quality tissue and noninvasive samples was low (0.50%). The panel will be used to inform future invasive species operations in Haida Gwaii by providing individual and population information to inform management. More broadly, our GT‐seq workflow that includes quality control analyses for targeted SNP selection and a modified protocol may be of wider utility for other studies and systems where noninvasive genetic sampling is employed.

that can pose challenges for massively parallel DNA sequencing because of poor DNA quality, low DNA quantity, and exogenous DNA contamination (Andrews et al., 2018;Russello et al., 2015). Such impediments are now starting to be overcome, driven in part by multiplexed targeted amplicon sequencing (MTAS) approaches (Eriksson et al., 2020;Hayward et al., 2022;Natesh et al., 2019;. MTAS can target hundreds of single nucleotide polymorphisms (SNP) within a single PCR to genotype thousands of individuals in parallel (Campbell et al., 2015), yet genotyping success and sample retention has varied widely when applying these approaches to noninvasive samples (Eriksson et al., 2020;Hayward et al., 2022;Natesh et al., 2019).
While low costs and fast turnaround times have been highlighted as major advantages of MTAS for applications in conservation and molecular ecology (Campbell et al., 2015;Natesh et al., 2019), a unique aspect of these approaches is their scalability, wherein investigators can strategically include targeted loci that allow SNP panels to simultaneously provide different types of relevant information (e.g., individual/species identification, sex, population assignment). Such designs have been demonstrated in several studies using traditional genetic samples (Bootsma et al., 2020;Chang et al., 2021;May et al., 2020;Sjodin et al., 2020), but have been limited for those employing noninvasive sampling, especially regarding the number of loci and/or samples genotyped as well as the form of starting material. For example, Eriksson et al. (2020) demonstrated high levels of genotyping success and sample retention using coyote feces, but only multiplexed a small number of loci (n = 26). In contrast, Hayward et al. (2022) successfully genotyped several hundred loci using polar bear feces, but experienced only ~60% sample retention even following initial sample screening using species-specific qPCR. Importantly, such studies employing MTAS have been largely limited to fecal samples despite the fact that hair remains a valuable DNA source for many mammalian systems dating back to the first studies employing noninvasive genetic sampling (Andrews et al., 2018;Morin & Woodruff, 1992;Taberlet & Bouvet, 1992;Waits & Paetkau, 2005).
In the Haida Gwaii archipelago (British Columbia, Canada), Sitka black-tailed deer (Odocoileus hemionus sitkensis) is a widespread invasive species causing severe ecological and cultural impacts ( Figure 1). Deer have disrupted ecosystem structure and dynamics due to their unconstrained browsing of native vegetation, preventing temperate rainforest regeneration; these impacts have been observed on numerous islands with deer present, which exhibit lower species richness compared to deer-free islands across several taxa Stockton et al., 2005). In addition, the Haida Nation have longstanding practical and spiritual relationships with several preferred food species of Sitka black-tailed deer, such as western red cedar (Thuya plicata; Parks Canada and the Council of the Haida Nation, 2018). Given many islands with invasive deer lay within the boundary of Gwaii Haanas National Park Reserve, National Marine Conservation Area Reserve, and Haida Heritage Site (hereafter, Gwaii Haanas), Parks Canada and the Council of the Haida Nation have been working together to manage deer on ecologically and culturally significant islands within the park. Management of this highly mobile and invasive deer species could benefit from rapid and cost-effective tools that provide individual-and population-level information to guide levels of effort and approaches to reducing populations.
We used a modified protocol to successfully optimize the panel for use with noninvasive fecal and hair samples, demonstrating high levels of genotyping success and sample retention at hundreds of loci that were similar to those achieved with paired high-quality deer tissue samples. We further provided a proof-of-concept application of our workflow to distinguish individuals, identify sex, and reconstruct  were collected and stored dry (tissue/hair) or in 95% ethanol (tissue/feces) at −20°C. In 2020, a Parks Canada survey team collected fecal pellets (field feces, n = 33) on SGang Gwaay that were stored in 95% ethanol at 4°C. Sample collectors used alcohol wipes to sterilize hands and equipment, as well as pre-filled tubes containing 95% ethanol. Hairs were pulled from deceased animals and tissue was cut from ears using sterile knives. Toothpicks were used for collecting deer fecal pellets from the ground. Feces were also collected immediately after death from the colons of deceased animals (i.e., colon feces), minimizing environmental exposure that can lead to DNA degradation.

| Sample collection and DNA extractions
Genomic DNA from ear tissue (10-20 mg) was extracted using the Qiagen DNeasy ® Blood and Tissue Kit following the manufac-

| SNP selection for GT-seq panel development
We used a previously generated restriction-site associated DNA Each of the 16 SNP datasets was used to calculate the probability of identity (P ID ) and probability of identity between siblings (P IDSIB ) using GenAlEx v.6.5 (Peakall & Smouse, 2012). We compared the number of SNPs required from each MAF category to meet our designated threshold for acceptable individual identification where the probability of identification is less than the reciprocal of sample size (Peakall et al., 2006). Next, the ability of each dataset to estimate pairwise kinship was evaluated by employing the iRel r-package (Gonçalves da Silva & Russello, 2011). Four replicates of each SNP dataset were used to simulate 1,000,000 dyads for each of two forms of first-degree relatives (parent-offspring, PO; full siblings, FS), second-degree relatives (half siblings, HS), third-degree relatives (first cousins, FC), and unrelated (UR). Relationship assignment accuracy was calculated and averaged using all dyads for each relatedness category.
To identify SNPs with sufficient power to identify sex, we used the RADseq dataset to calculate allele frequencies by locus separately for males (n = 90) and females (n = 75), implemented in VCFtools v.0.1.16 (Danecek et al., 2011). We selected loci with differences in allele frequency between sex (≥0.20) and validated their performance using the Bayesian clustering approach implemented in STRUCTURE v.2.3.4. (100,000 MCMC replicates, 100,000 burn-in; Pritchard et al., 2000). For validation, we genotypically assigned 28 individuals of known phenotypic sex that were not among those used to generate the RADseq dataset.
DNA sequences containing candidate SNPs for GT-seq panel development were sent to GTseek LLC (https://gtseek.com/) for subsequent primer design. Primers were designed to be 15-17 bases in length with a GC content between 20% and 80% and annealing temperature range between 58 and 63°C. The targeted amplicons were between 50 and 100 bases.

| Panel optimization using high-quality DNA samples
The initial GT-seq library was constructed using high-quality tissue samples from 92 individuals that were previously genotyped using RADseq (Burgess et al., 2022). These individuals all had paired noninvasive samples (n hair = 46, n colon feces = 46), represented a ~1:1 male/ female sex ratio, and maximized the geographic representation of the dataset (Figure 2). We included four replicates that were previously used to estimate RADseq genotyping error (Burgess et al., 2022), enabling comparison between the two genotyping methods.
Library construction followed the original protocol described The PCR1 products were diluted 1:20 and 3 μl was used from each sample for amplification in PCR2. Next, PCR2 products were quantified with Quant-iT™ PicoGreen™ dsDNA assays (Invitrogen) using a Viia7 real-time PCR system (Life Technologies). PCR products were normalized to 20 ng/μl before pooling 5 μl from each sample for library purification using the MinElute ® PCR Purification Kit. Purified libraries (n = 4; see below) were eluted into 24 μl of nuclease-free water and sequenced using one Mid Output Reagent Kit (300 cycles) per library on an Illumina MiniSeq™.
Using the GT-seq bioinformatics pipeline (https://github.com/ GTseq), raw sequencing reads were demultiplexed into individual. fastq files and genotyped (GTseq_Genotyper_v3.pl). Panel optimization was largely guided by the outputs of GTseq_SeqTest.pl and GTseq_Primer-Interaction-Test.pl; the former identified overrepresented primers based on the proportion of total reads that contained a forward primer, while the latter identified primers that were responsible for copious primer-dimers or undesirable PCR artifacts. All primers found to be negatively affecting genotyping success were omitted from a second GT-seq library, which was constructed, sequenced, and genotyped following the same procedure described

| Panel optimization using noninvasive DNA samples
A third GT-seq library was prepared using noninvasive samples (n hair = 46, n colon feces = 46) that were paired with previously genotyped high-quality tissue samples and 33 field fecal samples collected from SGang Gwaay (Figure 3). The library also included four replicates of each sample type (e.g., hair, colon feces, field feces) for a total of 137 samples. Library construction was identical to previous libraries except samples were standardized at 3 ng/μl when possible due to low DNA concentrations in noninvasive samples after extraction. To account for the degraded DNA template of low concentrations, we performed PCRs twice using separate i7 primers and combined the resulting sequence data bioinformatically.
Given poor genotyping of the third library (see Results), we con- The final library was sequenced and then genotyped using the GTseq bioinformatics pipeline as detailed above.

| Panel validation for invasive species management applications
After the initial genotyping of each library, samples were removed if they yielded >50% missing data. The percentage of genotyping success was then calculated by locus and those with >50% missing data were excluded from the final SNP dataset. Genotyping error was measured as the discordance of genotypes between replicate samples and was calculated both within-sample types and acrosssample types using paired samples. Genotyping success and error were also calculated across sequencing methods by comparing the final GT-seq SNP dataset to the same sites generated using RADseq.
To validate our final GT-seq panel for the targeted invasive species management applications of individual identification, pairwise kinship estimation, and sex identification, we performed F I G U R E 3 Paired sampling strategy demonstrating the transition from noninvasive samples to genotypes using GT-seq and the validation of non-invasive samples with high-quality tissue samples comparative analyses between the noninvasive samples genotyped using GT-seq and their high-quality pairs originally genotyped using RADseq. GenAlEx was used to calculate P ID and P IDSIB and determine the number of unique individuals. Pairwise kinship was calculated between individuals in GenoDive v.3.0 (Meirmans & Van Tienderen, 2004) using the relative probability of allelic identity-bydescent (Loiselle et al., 1995). Finally, we employed Bayesian clustering (Pritchard et al., 2000) of the retained sex-associated SNPs to assign sex of noninvasive samples collected from individuals of known sex in the RADseq dataset. As a proof-of-concept application, the same analyses were used to identify and sex individuals and estimate pairwise kinship using the field fecal samples from SGang Gwaay (n = 33).

| SNP selection
Across both individual identification and pairwise kinship analyses, high MAF SNPs consistently performed better than the other categories. For individual identification, high MAF SNPs were most efficient at calculating P ID and P IDSIB , requiring only 6 and 10 SNPs, respectively; however, our results indicated that 40 SNPs of any MAF were able to meet this threshold (Figure 4a). For the pairwise kinship estimations, high MAF SNPs exhibited the greatest assignment accuracy, which improved with increasing numbers of SNPs across all categories (Figure 4b). Eight SNPs were identified as sexassociated candidates and successfully assigned sex in all 193 deer of known phenotypic sex (Figure 4c).
Based on these results, we submitted DNA sequences for GTseq primer design for 466 high MAF SNPs, plus eight sex-associated SNPs. Of these 474 candidate SNPs, primers were successfully designed for 369 after in silico testing, including for all eight sexassociated SNPs.

| Panel optimization
Sequencing of our first GT-seq library resulted in 10,925,017 paired-end reads, but genotyping percentage across all loci was low (12.5%). We removed 39 loci that were negatively impacting genotyping success, leaving 330 for construction of our second library.
Sequencing of this library resulted in 12,563,048 paired-end reads and genotyping percentage across all loci improved to 86.5%. A single sample was removed due to >50% missing data. An additional 22 loci were removed during optimization (i.e., the corresponding primers were over-amplifying and taking up a disproportionate F I G U R E 4 Analyses to inform GT-seq SNP selection for Sitka black-tailed deer. (a) Probability of identity (P ID ) and probability of identity of siblings (P IDSIB ) calculated using loci of varying minor allele frequency (MAF). The horizontal black line represents the threshold of acceptable individual identification (0.007). (b) Mean probability of correctly identifying first-degree (full-sibling), second-degree (halfsibling), third-degree (first cousin), and unrelated relationships using varying numbers of loci across different MAFs. (c) Sex assignment using Bayesian clustering of eight sex-associated loci number of reads), resulting in a panel of 308 SNPs that included 7 sex-associated SNPs.
The third library that was comprised of noninvasive samples yielded 17,155,628 paired-end reads; however, the mean percentage of on-target reads across samples was only 1.52%. Few samples (10.4%) genotyped at >50% of loci, including 12/46 (26.1%) hair samples, 1/46 (2.2%) colon fecal sample, and none of the field fecal samples (Figure 5a). After applying our revised PCR1 conditions to the same sample set, sequencing of the fourth library yielded 10,719,923 paired-end reads with 51.8% being on target. All hair samples genotyped at >50% of loci and only three fecal samples (one colon and two field) failed to genotype at >50% of loci, resulting in 122 of 125 samples retained (97.6%) for downstream analysis.
Sixty-four loci were removed due to >50% missing data, resulting in a final SNP dataset of 244 loci.

| Panel validation and application
Overall average genotyping success of retained noninvasive samples was high (90.3%) and comparable to that of high-quality tissue samples genotyped using both GT-seq (95.7%) and RADseq (92.4%; Table 1; Figure 5b). Within noninvasive sample types, genotyping success was 94.1% for hair, 89.4% for colon feces, and 85.6% for field feces. Mean sequencing depth was 176 for hair, 119 for colon feces, and 96 for field feces. Genotyping error was much lower (<0.8%) within GT-seq samples, regardless of sample type, than in highquality RADseq samples (6.0%; Burgess et al., 2022). Genotyping discordance between 45 paired tissue and hair samples was 0.27%, while genotyping discordance between 43 paired tissue and fecal samples was 0.74%. Bayesian clustering using the seven retained sex-associated SNPs accurately determined the sex of all test samples of known phenotypic sex (100%).
We identified 20 unique genotypes from the 33 field fecal samples collected on SGang Gwaay. In some cases, different pellets belonging to the same individual were geographically located within 5 m of each other, while others were found on opposite sides of the island (~650 m apart) demonstrating successful genetic markrecapture. All identified individuals from SGang Gwaay exhibited high levels of pairwise kinship (mean = 0.20). Thirty-two of 33 fieldcollected feces were sexed, as one sample was missing data at all sexing loci; of the 20 identified individuals, 9 were females and 11 were males.

| DISCUSS ION
Multiplexed targeted amplicon sequencing approaches, such as GTseq, have begun to overcome challenges related to data quality and quantity that are characteristic of noninvasive samples, allowing in-  Campbell, et al., 2020). Yet, there are still inefficiencies when applying GT-seq to noninvasive samples that can lead to sample loss, marker dropout, and degraded quality that can limit the informativeness of resulting genotypic data and the associated robustness of downstream analyses. Here, we optimized a GT-seq panel for utility with noninvasive fecal and hair samples; by adjusting the PCR1 conditions to account for low-quantity and degraded DNA, we demonstrated high sample retention (97.6%), genotyping success (90.3%), and genotyping concordance (99.5%) relative to those produced from high-quality tissue sample at the same panel of 244 SNPs.
Importantly, the adjustments we made did not require additional laboratory equipment or expertise, and maintained the rapidity and cost-effectiveness that has elevated GT-seq as an MTAS approach of choice in molecular ecology (Eriksson et al., 2020;Hayward et al., 2022;Schmidt, Campbell, et al., 2020).
Another advantage of GT-seq is the ability to design panels that can simultaneously provide insights at multiple scales by strategically targeting SNPs of varying signal. To maximize the effectiveness of multi-scaled panels, it is important to conduct sensitivity analyses in order to determine the number and characteristics of component SNPs, and perform validation to ensure the desired applications are feasible. We designed our SNP selection and validation workflow to construct a GT-seq panel for effectively identifying and sexing individuals, as well as estimating pairwise kinship up to third-order relationships. Other recent applications of multi-scaled GT-seq panels include identifying individuals and reconstructing population structure in polar bears (Hayward et al., 2022) and rattlesnakes , and identifying life history forms, assigning stock, and reconstructing population history in kokanee salmon (Chang et al., 2022;Setzke et al., 2021).
Noninvasive genetic sampling has been frequently used to monitor at-risk or elusive species for conservation, yet also has the potential to directly inform the management of invasive species. eDNA has been demonstrated as an effective noninvasive approach for invasive species research; however, such approaches are largely limited to presence-absence detection (Beng & Corlett, 2020). Often, invasive species management can benefit from population-level information, especially when planning eradications or culls (Browett et al., 2020;Burgess et al., 2021). For example, estimating population densities before a cull can help to inform stopping rules when target densities are reached, preventing excess resource investment (Ramsey et al., 2011). Alternatively, cataloging the number of remaining individuals during eradications can help decide when an eradication has been successfully completed (Macdonald et al., 2019).
As a proof-of-concept, we applied our GT-seq panel to field-

ACK N OWLED G M ENTS
The authors thank all members of the Parks Canada Gwaii Haanas field unit and Council of the Haida Nation who supported this work.
This study was funded by Parks Canada contribution agreement number GC-1031 and NSERC Discovery grant number RGPIN-2019-04621 (MAR). Computational resources were made available F I G U R E 5 GT-seq genotyping success at 244 loci across individuals grouped by sample type, which are represented by distinct shapes within inset legend. (a) Samples genotyped using the original GT-seq protocol, including paired hair and tissue (n = 46 each), paired colon feces and tissue (n = 46 each), and field feces (n = 33). (b) Identical non-invasive samples as in (a) but instead genotyped using our revised PCR1 conditions, including hair (n = 46), colon feces (n = 46), and field feces (n = 33). The dotted lines represent our threshold for acceptable missing data (50%); samples removed due to high missing data are colored grey  Figure 3 was created using BioRender. The authors thank the editor and two anonymous reviewers who helped improve the manuscript with their valuable feedback.

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
Illumina raw reads are available from the NCBI sequence read archive (BioProject ID: PRJNA831890); SNP genotypic data are deposited in the Dryad Digital Repository (doi.org/10.5061/dryad.