Population genomic analyses reveal a highly differentiated and endangered genetic cluster of northern goshawks (Accipiter gentilis laingi) in Haida Gwaii

Abstract Accurate knowledge of geographic ranges and genetic relationships among populations is important when managing a species or population of conservation concern. Along the western coast of Canada, a subspecies of the northern goshawk (Accipiter gentilis laingi) is legally designated as Threatened. The range and distinctness of this form, in comparison with the broadly distributed North American subspecies (Accipiter gentilis atricapillus), is unclear. Given this morphological uncertainty, we analyzed genomic relationships in thousands of single nucleotide polymorphisms identified using genotyping‐by‐sequencing of high‐quality genetic samples. Results revealed a genetically distinct population of northern goshawks on the archipelago of Haida Gwaii and subtle structuring among other North American sampling regions. We then developed genotyping assays for ten loci that are highly differentiated between the two main genetic clusters, allowing inclusion of hundreds of low‐quality samples and confirming that the distinct genetic cluster is restricted to Haida Gwaii. As the laingi form was originally described as being based on Haida Gwaii (where the type specimen is from), further morphological analysis may result in this name being restricted to the Haida Gwaii genetic cluster. Regardless of taxonomic treatment, the distinct Haida Gwaii genetic cluster along with the small and declining population size of the Haida Gwaii population suggests a high risk of extinction of an ecologically and genetically distinct form of northern goshawk. Outside of Haida Gwaii, sampling regions along the coast of BC and southeast Alaska (often considered regions inhabited by laingi) show some subtle differentiation from other North American regions. These results will increase the effectiveness of conservation management of northern goshawks in northwestern North America. More broadly, other conservation‐related studies of genetic variation may benefit from the two‐step approach we employed that first surveys genomic variation using high‐quality samples and then genotypes low‐quality samples at particularly informative loci.


| INTRODUC TI ON
Conservation policy and management are becoming increasingly important for the survival of wildlife populations, given the growing impact of human populations on the environment (Barnosky et al., 2011;Côté, Darling, & Brown, 2016;Robinson, 2006). An effective tool has been the listing of individual species or subspecies as Endangered or Threatened under the protection of laws such as the Species at Risk Act (in Canada) or the Endangered Species Act (in the United States). This protection has led to the recovery of species such as the peregrine falcon (Falco peregrinus; Ambrose, Florian, Ritchie, Payer, & O'brien, 2016;Watts et al., 2015), the Kirtland's warbler (Setophaga kirtlandii; Bocetti, Goble, & Scott, 2012), and pink sand-verbena (Ambronia umbellata; Parks Canada Agency, 2017).
Effective conservation depends on clear identification and classification of true biological entities, enabling identification of the individual organisms and populations that are subject to specific threats, regulations, and management actions. Groups of organisms that look similar to each other but are differentiated genetically, behaviorally, ecologically, and/or physiologically (Bickford et al., 2007;Pulido-Santacruz, Aleixo, & Weir, 2018;Toews & Irwin, 2008) pose a particular conservation challenge for policy-makers. Such "cryptic" biodiversity is important to identify and manage properly, as the extinction of one of the cryptic forms would result in the loss of important biological variation. Likewise, when the boundaries between forms that are only subtly differentiated (e.g., two subspecies or otherwise differentiated populations) are unclear, efforts to conserve one of those forms are greatly complicated.
Northern goshawks (Accipiter gentilis) living along the coast of British Columbia, Canada, and southeast Alaska, USA, provide an example of a species with unclear boundaries between forms that differ in their conservation status listing. Taverner (1940) determined that northern goshawks from the Haida Gwaii archipelago (called the "Queen Charlotte Islands" at the time) were more darkly colored than those from the mainland, whereas northern goshawks from Vancouver Island were "more variable and less plainly characterized." Taverner (1940) formally designated this darker form with the subspecies name "A. g. laingi" (also, the "Queen Charlotte Goshawk"), using a specimen from Haida Gwaii (from Masset, on Graham Island) as the type specimen. The other major North American subspecies of northern goshawk is A. g. atricapillus, which ranges across much of the continent. Since Taverner's (1940) work, the exact ranges of these two subspecies along the coast of BC, southeast Alaska, and the state of Washington have been under debate. About one-third of northern goshawks on Vancouver Island and within the Alexander Archipelago of southeast Alaska have been described as having a dark appearance approaching (but not completely matching) the darkness of those on Haida Gwaii (COSEWIC, 2013;Titus, Flatten, & Lowell, 1994;Webster, 1988), providing equivocal evidence as to whether the range of laingi should be considered to include those areas. Given this ambiguity from the appearance of the northern goshawks, biologists have turned to habitat modeling (based in part on radio-telemetry data) to more specifically delineate the inferred range of laingi. For example, the Northern Goshawk Accipiter gentilis laingi Recovery Team (2008)  Act (COSEWIC, 2000(COSEWIC, , 2013. However, continued debate over which individual northern goshawks should be considered laingi versus atricapillus, and hence which geographic areas should be considered within the range of each subspecies, has complicated conservation policy and management (COSEWIC, 2013).
Studies of molecular genetic variation can be of great use in revealing natural biological groups and in assigning individuals to those groups (Mason & Taylor, 2015;Wagner et al., 2013). Previous genetic research in northern goshawks was inconclusive in terms of clarifying the range of laingi. Sonsthagen et al. (2012) concluded that patterns in both mitochondrial DNA and microsatellites were consistent with gene flow among sites along the BC coast and southeast Alaska, but the study did not include northern goshawks from outside of this region. Bayard de Volo, Reynolds, Sonsthagen, Talbot, and Antolin (2013) reported some genetic structure in mitochondrial DNA across North America, but did not include northern goshawks from Haida Gwaii. Until recently, most such analyses of genetic variation in a conservation context have used a small set of molecular markers (e.g., mitochondrial DNA, which is inherited as a single unit from the mother; or a small number, e.g., 5-20, of microsatellite loci). These approaches may not reveal true population differences because the portion of the genome that differs between biological groups can be quite limited, even between groups that are morphologically distinct (Mason & Taylor, 2015;Toews, Taylor, et al., 2016). Recent advances in genomic sequencing allow for a much more comprehensive survey of genetic variation across the genome, such that small portions of the genome that differ between groups can be detected (Toews, Taylor, et al., 2016).
Here, we used a two-stage analysis that first surveys genomewide variation among high-quality genetic samples and then examines genotypes of highly informative loci in a much larger set

K E Y W O R D S
Accipiter gentilis laingi, conservation genetics, designatable unit, genomics, genotype-bysequencing, northern goshawk, population genetics, Species at Risk Act of low-quality genetic samples. Firstly, we used genotyping-by-sequencing (GBS; Elshire et al., 2011)  Europe. To assess congruence between nuclear and mitochondrial DNA patterns, we analyzed variation in mitochondrial DNA control region sequences. Secondly, given that many of the northern goshawk samples are from shed feathers or museum skin samples that provided DNA of insufficient quality for the GBS analysis, we developed assays for 10 markers with high frequency difference between the two main genetic clusters revealed by our genome-wide analysis. These assays were then used to genotype many low-quality DNA samples. We propose that this general two-step approach could be used to better clarify the ranges of other groups of conservation concern and to determine the genetic ancestry of otherwise difficult-to-identify individuals. The overall goal of our study was to provide clarity to conservation managers regarding the number, distributions, and genetic distinctiveness of different forms of northern goshawk in northwestern North America.  Table S1). The tissue type used for DNA varied; we extracted shed feathers (usually collected from nests or from the ground near nests), blood (collected from live birds), muscle tissue (from dead birds, now museum specimens), and toe pads (from museum study skins). All samples used in the data analysis were from different individuals (see below for details).

| Sampling and DNA extraction
DNA was extracted using a standard phenol-chloroform protocol following overnight digestion at 55°C with 0.3 μg of proteinase K in 400 μl of lysis buffer (0.1 M Tris, 5 mM EDTA, 0.2% SDS, 0.2 M NaCl, pH 8.5). In samples consisting of shed feathers, a separate digestion protocol was used (following Bayard de Volo, Reynolds, Douglas, & Antolin, 2008) before DNA extraction. DNA was resuspended in 50-100 μl of 1× TE buffer and quantified in a Qubit fluorometer (Invitrogen) following the manufacturer's instructions. The quality of DNA yielded from the different types of source material varied considerably, with blood and tissue producing the highest quality DNA in the largest quantities. Feathers and toe pad samples produced mostly degraded DNA (i.e., broken into relatively small fragments, hence appearing as a low-molecular-weight smear after electrophoresis), but toe pads typically produced more DNA than feathers did. Because the majority of our samples were from shed feathers, and because they produced mostly degraded DNA, we used a two-step approach in terms of methodology: We conducted GBS on our high-quality tissue samples and TaqMan genotyping assays of informative loci on a broader set of samples of varying DNA quality (described below). Some feather samples from critical areas like Haida Gwaii and Vancouver Island were included on our GBS plates but in general they performed poorly (Supporting Information   Table S1; note that we also had some good tissue samples from those areas; described below).

| Genotyping-by-sequencing
We used a reduced representation genome sequencing method, GBS (Elshire et al., 2011), following modifications as in Alcaide, Scordato, Price, and Irwin (2014) and specified in detail in Toews, Brelsford, Grossen, Milá, and Irwin (2016). We made the following minor modifications to the protocol in : (a) We used 10 μl of cleaned DNA fragments from the ligation reaction for a PCR of 18 cycles; (b) after Qubit quantification of PCR products, 100 ng of each PCR product was added to the pool which was then concentrated in a SpeedVac (Savant DNA110; Thermo Fisher) and run on 5 lanes of a 2% agarose gel; and (c) we selected fragments within a size range of 400-500 bp. Two GBS libraries, each containing up to 96 individually barcoded samples, were prepared as described above and submitted for paired-end sequencing on an Illumina HiSeq 2500 (at McGill University and Génome Québec Innovation Centre). For inclusion in GBS, we prioritized samples for which extractions yielded high molecular weight DNA, but we also included 11 samples whose extractions yielded considerably degraded DNA because they represented birds from important areas for this study (Haida Gwaii, n = 7; Vancouver Island, n = 1; mainland British Columbia, n = 2; and coastal mainland British Columbia, n = 1). The starting materials for DNA extraction for these degraded samples were shed feathers collected near nests (n = 10) or toe pads (n = 1). These 11 samples were submitted for sequencing multiple times (with different barcodes) to try to obtain enough sequence data for our analyses. Across the two libraries, a total of 172 barcodes were used for 160 samples (plus 4 blanks, two no-DNA blanks and two no-barcode blanks, and 16 samples from other projects; Supporting Information Table S1).
All resulting DNA sequences have been deposited in the NCBI SRA under BioProject PRJNA503605.

| mtDNA sequencing
We used the primers L16064 and H15426 (Sonsthagen, Talbot, & White, 2004) to PCR amplify and Sanger sequence 578 bp of the mitochondrial control region (between positions 1,169 and 1,747 of the complete mitochondrial sequence of A. gentilis [GenBank: Buteo jamaicensis, and one Haliaeetus leucocephalus (Supporting Information Table S1). PCR amplifications were carried out in a total volume of 25 μl, with 0.5 μM of each primer, 0.2 mM of dNTP's, 2 mM of MgCl 2 , 0.5 units of recombinant Taq DNA polymerase (Invitrogen), and between 2 and 20 ng of DNA. Cycling conditions were as follows: 94°C for 3 min; 35 cycles of 94°C for 45 s, 51°C for 30 s, and 72°C for 1 min; and a final extension step at 72°C for 10 min. PCR products were sent out for purification and Sanger sequencing at Macrogen (USA) using primer H15426. Mitochondrial DNA (mtDNA) sequences were visually inspected and manually aligned with BioEdit (Hall, 1999). The bald eagle and red-tailed hawk sequences were so divergent that they could not easily be aligned, and they were not considered further. All mtDNA sequences are deposited in NCBI's GenBank under accession numbers MK144145-MK144272.

| GBS sequencing, filtering, and analysis
Sequencing of the two genotyping-by-sequencing libraries resulted in over 1.12 billion reads in total. To analyze these reads, we followed the scripts available at https://doi.org/10.5061/f.t951d from Irwin, Alcaide, Delmore, Irwin, and Owens (2016) for GBS read processing and mapping. In brief, we demultiplexed the raw GBS reads from the two barcoded libraries using a custom perl script from Baute, Owens, Bock, and Rieseberg (2016)  of reads were successfully mapped, and for 134 samples, more than 95% of reads were successfully mapped. As expected, the sample with the highest fraction of reads mapped was from our bald eagle (98.7% reads mapped). The largest source of variation in the number of reads generated and number of reads mapped per sample appeared to be the source material used for DNA extraction. The two most common sources of DNA in our GBS analysis were muscle tissue preserved in ethanol (112 samples) and shed feathers (feathers found near nests; 33 samples). While on average both types of material generated large numbers of reads (4.61 M for feathers and 5.92 M for tissue), the average number of reads mapped was much lower for feathers (1.84 M) than for tissue (5.75 M). Three samples (Supporting Information Table S1) had <100 K reads mapped and were dropped from further analyses. Remaining analyses were performed only on the remaining 157 samples.
Despite most reads having been mapped to the bald eagle genome, our requirement that reads map with high quality (MAPQ ≥20) meant that on average only 56.5% of mapped reads from these 157 samples were used for further analysis. The evolutionary distance between northern goshawks and the bald eagle leads to lower mapping quality for northern goshawk GBS reads (e.g., for sample NGAK020, only 55.6% reads mapped with MAPQ ≥20; Supporting Information Figure S1) in comparison with our bald eagle GBS reads (84.6% of bald eagle reads mapped with MAPQ ≥20).
We used Picard (http://broadinstitute.github.io/picard/) and SAMtools  to generate for each individual a BAM file containing its sequence information. Reads from samples run with multiple barcodes were merged into a single file at this stage. We RealignerTargetCreator and IndelRealigner as this is discouraged with recent versions of GATK, (b) changed option "--max_alternate_ alleles 2" to "--max_alternate_alleles 4" in HaplotypeCaller, (c) did not use the options "--allSites" and "-L" in GenotypeGVCFs, and (d) used option "-hets 0.01" in GenotypeGVCFs.
The resulting VCF file was further filtered using VCFtools v0.1.11 (Danecek et al., 2011) to remove insertion and deletion polymorphisms and loci that were not biallelic. We used a custom perl script  to filter out loci with observed heterozygosity of 0.6 or greater as these are likely the result of paralogous variation instead of allelic variation. At this stage, we used VCFtools to remove from further analysis 22 samples that had more than 80% missing loci in the dataset prior to heterozygosity filtering. Shed feathers were 22 of the 25 (88.0%) eliminated samples due to low reads or high missing data, and only comprised 12 of the 135 (8.9%) remaining samples (Supporting Information Table   S1). Additionally, 554 loci suspected of being sex linked (Supporting Information Table S2 and Appendix S1) were removed from analysis, and seven samples identified as having close kin relationships with other samples in the study were also excluded from further analysis (Supporting Information Tables S3 and S4 and Appendix S1).
The resulting dataset included 128 individuals (Table 1 and Figure 1) and 2,885,805 SNPs. For specific analyses of population structure, this dataset was filtered in various ways (described below with each analysis); all involved filtering (using VCFtools) to include only SNPs with <30% missing genotypes and genotype quality of 10 or higher. We filtered out SNPs with rare alleles by either eliminating SNPs with rare alleles that occurred only once (i.e., eliminating singletons) or keeping only SNPs with a minor allele frequency of 0.05 or higher. Different analyses included either all individuals in the study (n = 128), only goshawks (n = 124), only North American goshawks (n = 119), or only North American goshawks excluding those from Haida Gwaii (n = 107). In some analyses (see below), we filtered to include only unlinked SNPs, using the R package SNPrelate (Zheng et al., 2012).
We used two complementary approaches to inferring patterns of population structure among North American goshawks using the GBS dataset, after filtering to unlinked SNPs with minor allele frequency (MAF) ≥0.05 (n = 6,058 SNPs). First, we performed principal components analyses (PCA) with the R package pcaMethods (Stacklies, Redestig, Scholz, Walther, & Selbig, 2007) (Pritchard, Stephens, & Donnelly, 2000), models the probability of the observed genotypes using ancestry proportions and population allele frequencies. Instead of a Bayesian approach, however, Admixture uses a maximum likelihood approach resulting in much faster runs.
We ran five replicates of Admixture allowing for the number of clusters (K) in the model to vary from 1 to 9, and we terminated each run when the difference in log-likelihood between successive iterations fell below 1 × 10 −9 . We chose the K that minimized cross-validation error and hence best fit the data (Alexander et al., 2009 indicates that sampling region X is admixed between Y and Z, whereas significantly positive values indicate no admixture. To account for linkage between nearby SNPs, we used the command "-k 50" in the estimation of f3 statistics so that SNPs were blocked in windows of 50 consecutive SNPs ordered assuming synteny with the bald eagle. Given that our results (see below) indicated population differentiation between Haida Gwaii and elsewhere, and to a lesser degree between coastal regions and elsewhere, we used assignment tests to explore the degree to which our genomic dataset can be used to predict geographic origin of a sample. This was done by applying discriminant analysis of principal components (DAPC) using the R package adegenet (Jombart, 2008) and using cross-validation values as probabilities of correct assignment. We trained our datasets with 60% of the data and tested them with the remaining 40%. Tests of assignment probability were performed across 10 PCA axes, and for each PCA axis, tests were replicated 20 times. Missing genotypes were imputed with the mean allele frequency for each SNP. We averaged correct assignment proportions across the first 10 PCA axes and across the 20 replicates.

| Inbreeding
To test whether there is more inbreeding in some regions compared to others, we calculated the average genome-wide inbreeding coefficient (F) for each individual within each sampling region using the method of moments implemented in VCFtools (option "--het," Danecek et al., 2011). We did this for each North American sampling region separately after first filtering out singletons (resulting in 24,322 SNPs).

| Divergence between species/subspecies
To quantify patterns of relative population differentiation, we used VCFtools (command "--weir-fst-pop," after first eliminating singleton SNPs) to estimate overall pairwise weighted F ST between sampling regions following Weir and Cockerham (1984).
For the same groups, we also estimated net nucleotide divergence, D A (Nei, 1987).
is the average pairwise nucleotide distance between groups, and D X and D Y are the average pairwise nucleotide distances within groups.
D XY , D X , and D Y were estimated across the 40 largest scaffolds, representing 592,096,274 bp, or 50.24% of the published genome sequence of the bald eagle, using the GATK pipeline and a custom R script described in Irwin et al. (2016). We retained invariant SNPs (using the "-allSites"option in GATK) and included SNPs with up to 60% missing data. For each statistic (D XY , D X , and D Y ), we generated estimates in windows (each containing 5,000 sequenced bp) across the 40 scaffolds, and we then averaged the windows together for a single overall estimate between each pair of populations. For the mtDNA dataset, sequence-based F ST (Hudson, Slatkin, & Maddison, 1992) was estimated in DNAsp v6 (Rozas et al., 2017) and D A in MEGA X (Kumar et al., 2018). To account for recurrent mutations, we used a maximum composite likelihood model (Tamura, Nei, & Kumar, 2004) with rate variation among sites modeled with a gamma distribution (shape parameter = 0.05) and taking into account composition bias among sequences (Tamura & Kumar, 2002).
We used the differentiation estimates between Haida Gwaii and the remaining North American sampling regions to get a rough estimate of the time since they started diverging using the expectation from the neutral theory that D = 2μt, where μ is mutation rate, t is time, and D is genetic distance. We used D A as a proxy for net genetic distance D, an approach that takes into account within-group variation. We used the nuclear DNA substitution rate as estimated from

| Ancestry informative assays
Our GBS results showed two main genetic clusters of North We performed a trial genotyping assay for each of the 11 loci with a small subset of samples for which we had genotypes at these loci from the GBS data. Genotyping was successful at 10 of the 11 TaqMan loci. Each of the 10 successful loci was then genotyped in the entire sample set in two 384 plates each containing 32 reference samples (Supporting Information Table S1). These 32 samples were selected from our GBS samples so that for each locus there were multiple observations of each genotype (homozygous allele 1, heterozygous, and homozygous allele 2).
These 10 assays were used to genotype 444 samples from our entire sample collection, which included large numbers of shed feathers and toe pads from museum skins for which we were only able to extract small amounts of highly degraded DNA. These data are deposited in Dryad under https://doi.org/10.5061/dryad.5sg5082. When multiple feathers were available from a single nesting territory, only one was used for the GBS analysis, but for the TaqMan assays, multiple feathers were used for 13 territories. In only one case, different feathers from the same territory were kept for further analyses because their multilocus genotypes differed.
We successfully genotyped 386 North American northern goshawks for 7 or more of the 10 loci (Supporting Information Table S6). The genotyping rate was high (range 89.6%-99.7%), and the discrepancy rate between the GBS and TaqMan datasets was low (0.79%; i.e., we observed eight genotype discrepancies between datasets out of a total of 1,013 genotypes). Seven discrepancies consisted of a heterozygote genotype for TaqMan and a homozygote genotype for GBS.
One discrepancy consisted of homozygote genotypes for different alleles (Supporting Information Table S6 and Figure S2).

| Distributions of main genetic clusters
We used the R package HIest (Fitzpatrick, 2012) Figure S3).

| Population structure within North American northern goshawks
Analysis of the North American northern goshawk samples clearly reveals two genetic clusters: one that includes all individuals from

Haida Gwaii (HG) and a second containing all remaining North
American individuals (see population abbreviations and sample sizes in Table 1 and Figure 1). This pattern is illustrated using both a principal component analysis (PCA) and an Admixture analysis ( Figure 3 and Supporting Information Table S7). The first PCA axis, explaining 5.8% of the genotypic variation, separates HG from all other sampling regions (Figure 3). Similarly, in the Admixture analysis, where K = 2 was the K value with the greatest support (Supporting Information Figure S4) SNPs that are observed at least twice; Supporting Information Figure   S6), but the Admixture analysis detects no population structure (i.e., one is the K value that minimizes the cross-validation error). We repeated these analyses with HG excluded, and again results suggest some weak population structuring in North America outside of HG (Supporting Information Figures S7 and S8).
These general patterns are further supported by cross-validation tests using discriminant analysis of principal components (DAPC).
When testing assignment to HG versus elsewhere in North America, DAPC correctly assigns individuals 96.2% of the time. Without HG in the dataset, a test of assignment to coastal (that is, AA, VI, and BCc treated as a single group) versus elsewhere in North America correctly assigns individuals 83.3% of the time (note that random assignment to two groups would be right 50% of the time). This suggests some differentiation of these regions, but provides only weak confidence in the assignment of single individuals to coastal versus elsewhere based on the genomic dataset.
A haplotype network of the mtDNA data ( Figure 4 and Supporting Information  (Figure 5a and Supporting Information Table   S5). In contrast, when HG is excluded, the distribution of F ST between areas currently considered within the range for laingi (i.e., VI, AA, and BCc) and areas considered within the range for atricapillus (i.e., BC, AK, WA, and East) is clustered much more tightly around zero (weighted F ST estimate is 0.0059), with the threshold for the top 1% of F ST estimates occurring at only 0.131 (Figure 5b; n = 10,217 SNPs with MAF of 0.05 or higher).

| Genotyping of 10 informative loci
The addition of TaqMan genotyping at ten loci allowed us to dramatically increase the number of individuals included in this study. With this extra information, we were able to more accurately map the geographic distributions of the two genetic clusters identified in the GBS study. To determine how closely this set of 10 loci recapitulates the GBS data (Figure 3), we compared the ancestry proportions calculated from these 10 loci with those estimated from the GBS dataset using the program Admixture. There is broad agreement between the two methods (Supporting Information Figure S9), except the Admixture estimates tend to be somewhat higher when estimated using the subset of 10 loci. This discrepancy is likely due to lower precision of the 10loci dataset compared to the thousands-of-loci GBS dataset.
For each sample, we estimated an ancestry index (S) in HIest Estimates of interclass heterozygosity can be informative with regard to the categories of admixed individuals. Interclass heterozygosity is expected to be very high in early hybrid generations and low in advanced hybrid generations. In our dataset, only four samples out of 386 (i.e., 1%) have interclass heterozygosity of 0.5 or higher suggestive of being first-or second-generation hybrids: two are from HG, one from AA, and one from BCc (Supporting Information Figure S10).

| Gene flow
While our results clearly indicate that HG is genetically distinct from other regions (Figures 3-6), there is some suggestion of recent gene flow between HG and other sampling regions, especially those coastal regions close to HG. Gene flow is suggested by the slightly closer similarity of these coastal regions to HG (i.e., AA, BCc, and VI) in the PCA (Figure 3a) and the Admixture analysis (Figure 3b), by the somewhat higher ancestry proportions F I G U R E 4 Median-joining network for mtDNA sequences (length 423 bp) obtained from 121 northern goshawks sampled in North America. Each pie chart represents a haplotype, with area proportional to its frequency in the sample set. Each black line indicates a single mutational step, and small white dots indicate missing haplotypes. Population abbreviations can be found in Table  1, sample details in Supporting Information Table S1, and haplotype details in Supporting Information  Hudson et al., 1992)  To formally test for genetic mixture between genetically differentiated sampling regions, we used Treemix with our GBS dataset to estimate the statistic f3. Four tests indicate admixture (significantly negative f3 test statistic), and in all four cases, AA appears as a population that is admixed between HG and another sampling region: between HG and East (Z = −6.34520; p < 0.001), HG and WA (Z = −5.32896; p < 0.001), HG and BC (Z = −4.71371; p < 0.001), and HG and AK (Z = −4.69555; p < 0.001). The test statistic also suggests a trend for AA being admixed between HG and VI (Z = −1.41276; p = 0.079). Note that the f3 test only has power to reveal admixture when populations are differentiated.

| Timing of differentiation
To estimate the timing of the divergence between HG and remaining populations, we used mutation rate estimates (see Section 2) together with observed net nucleotide divergence (D A ) between HG and other North American populations for both the nuclear and mtDNA datasets (Supporting Information Table S9). We advise caution regarding the resulting estimated divergence times because of the difficulty in applying long-term divergence rates to shorter time scales (due to saturation, for example), because mutation rates may differ between Resulting estimates of divergence between these genetic clusters are ~13 KYA for the mtDNA and ~24 KYA for the nuclear dataset. These results are about one order of magnitude lower than the estimated divergence time between North American and European goshawks, which is ~247 KYA for the mtDNA and ~346 KYA for the nuclear dataset.

| D ISCUSS I ON
Our analyses of variation in the nuclear and mitochondrial genomes of northern goshawks very clearly reveal a genetically distinct While molecular genetic studies of endangered and threatened species can be important in terms of clarifying the boundaries of conservation units and allowing inference of population history and current health, they are often hindered by greatly limited availability F I G U R E 6 Estimates of ancestry proportions (S) from HIest, ranging from 1 for pure HG cluster individuals to 0 for pure non-HG cluster individuals, based on 10 ancestry informative SNPs of northern goshawks for all North American populations (a), with detail in (b) for the Pacific Northwest. Population codes are in Figure 1 and Table 1   of DNA samples. This is for two reasons: First, these species are usually rare (that being the reason for their conservation status), making them difficult to find and sample. Second, ethical considerations and permitting requirements often limit direct handling of individuals of these species. Our study was subject to these limitations, requiring a creative approach to both sampling and DNA methodology. By widely broadcasting a call for samples, we received crucial contributions from a wide variety of biologists and institutions. These contributions came in many forms, ranging from high-quality tissue samples to feathers collected under nests after long exposure to the elements. Our two-step methodological approach, starting with surveying thousands of SNPs in high-quality samples and then genotyping low-quality samples at informative loci, was highly successful.
This approach maximized sample size in two ways: We obtained high sample size of SNPs in the GBS analysis, enabling us to identify that fraction of the genome that is most informative in terms of population structure; and we maximized our geographic coverage and number of individuals by using the TaqMan assays of those informative markers. We encourage the use of similar two-phase approaches in other genomic analyses of species of conservation concern.
We are not the first to document genetically distinct forms of taxa on Haida Gwaii. Pruett et al. (2013) Taverner's (1940) description of that dark-colored form occurring in Haida Gwaii, along with evidence that somewhat dark individuals can also be found on Vancouver Island (Taverner, 1940) and the Alexander Archipelago of southeast Alaska (Titus et al., 1994;Webster, 1988 (Burtt & Ichida, 2004).
As noted, the dark plumage characteristic upon which the subspecies laingi is based has a wider apparent distribution than just Haida Gwaii, but a pattern of apparent phenotypic intergradation with the more widespread atricapillus outside of Haida Gwaii has been known since the original description (Taverner, 1940;Titus et al., 1994;Webster, 1988 Gwaii and the Alexander Archipelago (and perhaps other nearby areas such as Vancouver Island or the BC coast). Furthermore, we did find two Haida Gwaii individuals in our genotype assay dataset that were supported as "laingi backcrosses." This finding lends some support to the idea that there have been very recent dispersal events to Haida Gwaii (followed by interbreeding), but there is some uncertainty in this assessment due to the small number of loci genotyped in those individuals.
In managing any species of conservation concern, understand- Our results suggest a taxonomic re-evaluation of the laingi subspecies may be appropriate, but this would likely depend in part on more detailed morphological analysis beyond the scope of the present paper. We note, however, that the subspecies concept is the subject of much debate, particularly in terms of how much subspecies taxonomy should depend on genetic clustering versus specific morphological traits contained in the initial subspecies description (Liu et al., 2018;Patten, 2010;Winker, 2009Winker, , 2010. Regardless of taxonomic treatment, we think the evidence in support of treating the Haida Gwaii population as a distinct conservation unit (e.g., a "designatable unit") is strong: Haida Gwaii goshawks are very clearly genetically distinct, have a recognized phenotype (darker plumage than elsewhere, even compared to those in southeast Alaska and Vancouver Island), and inhabit an ecologically differentiated and geographically separated archipelago. This Haida Gwaii population is presumably at very high risk of extinction, given its extremely small (and historically declining) size of just 48-57 individuals (COSEWIC, 2013).
While the Haida Gwaii population size is thought to have declined from historical levels (COSEWIC, 2000(COSEWIC, , 2013, it was likely never very large, given the typical territory sizes of northern goshawks and the limited size of the archipelago. Small populations are of special conservation concern because they tend to have low diversity, to experience inbreeding, and to respond poorly to natural selection given the power of genetic drift on small populations. Our data provide some insights into these three typical characteristics of small populations. First, levels of nuclear nucleotide variability in Haida Gwaii are about 80% of that in other North American sampling regions (Supporting Information Table S10). This reasonably high variability (given the small geographic range and population size) is likely a combined result of the relatively recent (i.e., within the last 20,000 years or so) population differentiation between Haida Gwaii and elsewhere and the occasional gene flow from other regions to Haida Gwaii. Second, our data do not reveal a higher amount of inbreeding in Haida Gwaii, as inbreeding coefficients range from −0.0621 to 0.278 which is within the range in other populations (Supporting Information Table S11 and Figure S11) and consistent with previous estimates (Sonsthagen et al., 2012). This suggests inbreeding is not presently a major concern. However, further decline and/or isolation of the population might elevate the possibility of inbreeding depression. Third, while genetic drift certainly is expected to overpower weak selection in a small population, our highly skewed distribution of F ST values in the comparison of Haida Gwaii and elsewhere suggests that strong selection has likely shaped some characteristics of the Haida Gwaii population. Future research will more specifically test for the role of selection in shaping patterns of genomic variation in these populations.

Outside of Haida Gwaii, populations of goshawks in British
Columbia and nearby regions have also declined from historical levels. This decline has been particularly closely examined for regions currently considered to be inhabited by laingi, given the listing of that taxon as Threatened (COSEWIC, 2000(COSEWIC, , 2013. It should also be noted the atricapillus subspecies (as currently delineated) was recently des- List status in British Columbia indicates that this organism is of special conservation concern within the province. Our genetic results have two implications for management of goshawk populations outside of Haida Gwaii: First, there is only relatively subtle differentiation between coastal populations (e.g., Vancouver Island, southeast Alaska, and the BC coast) and BC interior populations, suggesting that an integrated approach to management may be most effective.
Second, the subtle differentiation that is observed between some sampling regions suggests that goshawks tend to have limited gene flow between regions; this suggests that the causes of observed regional declines (Doyle et al., 2017) may be somewhat specific to each region and management strategies tailored to each region may be necessary to maintain local populations.
While Bancroft. For providing valuable support through sample contribution and/or advice regarding this project, we thank Sally