RAD‐tag and mitochondrial DNA sequencing reveal the genetic structure of a widespread and regionally imperiled freshwater mussel, Obovaria olivaria (Bivalvia: Unionidae)

Abstract Obovaria olivaria is a species of freshwater mussel native to the Mississippi River and Laurentian Great Lakes‐St. Lawrence River drainages of North America. This mussel has experienced population declines across large parts of its distribution and is imperiled in many jurisdictions. Obovaria olivaria uses the similarly imperiled Acipenser fulvescens (Lake Sturgeon) as a host for its glochidia. We employed mitochondrial DNA sequencing and restriction site‐associated DNA sequencing (RAD‐seq) to assess patterns of genetic diversity and population structure of O. olivaria from 19 collection locations including the St. Lawrence River drainage, the Great Lakes drainage, the Upper Mississippi River drainage, the Ohioan River drainage, and the Mississippi Embayment. Heterozygosity was highest in Upper Mississippi and Great Lakes populations, followed by a reduction in diversity and relative effective population size in the St. Lawrence populations. Pairwise F ST ranged from 0.00 to 0.20, and analyses of genetic structure revealed two major ancestral populations, one including all St. Lawrence River/Ottawa River sites and the other including remaining sites; however, significant admixture and isolation by river distance across the range were evident. The genetic diversity and structure of O. olivaria is consistent with the existing literature on Acipenser fulvescens and suggests that, although northern and southern O. olivaria populations are genetically distinct, genetic structure in O. olivaria is largely clinal rather than discrete across its range. Conservation and restoration efforts of O. olivaria should prioritize the maintenance and restoration of locations where O. olivaria remain, especially in northern rivers, and to ensure connectivity that will facilitate dispersal of Acipenser fulvescens and movement of encysted glochidia.


| INTRODUC TI ON
The conservation of freshwater mussels is essential to the health of freshwater aquatic ecosystem given their important ecosystem services, including biofiltration, nutrient cycling, and sediment formation (Elderkin et al., 2007;Vaughn et al., 2008). Freshwater mussels (Bivalvia: Unionidae) are one of the most imperiled groups of freshwater organisms (Ricciardi & Rasmussen, 1999), and like many species, they face tremendous declines due to global climate change and anthropogenic threats. Pollution, legacy contaminants, and invasive species have led to a dramatic loss in aquatic biodiversity (Haag & Williams, 2014). For freshwater mussels, the pearl and button industry heavily exploited populations in the early 1900s. Subsequent construction of dams led to the destruction of irreplaceable habitat and impeded dispersal of host fish for freshwater mussels, while increased agricultural land use and associated run-off into aquatic systems has also been detrimental (Haag, 2012). Lastly, in the early 1990s, Dreissena polymorpha and Dreissena rostriformis bugensis (Zebra and Quagga mussels) became established in the Great Lakes and can out-compete native freshwater mussels and even form large aggregates that suffocate native mussel species (Lucy et al., 2013).
Current management guidelines for imperiled or endangered species strive to preserve unique or rare genetic variation (Fraser, 2008;Jones et al., 2006) and examining trends in the diversity and structure of imperiled freshwater species is crucial to management and recovery planning (FMCS, 2016;Petit et al., 1998).
Obovaria olivaria (common name: Hickorynut, Rafinesque, 1820) is a member of the freshwater bivalve family Unionidae and is widely distributed in central North America (COSEWIC, 2011). Obovaria olivaria is a species typically found in large rivers from the Mississippi River drainage system and the Great Lakes-St. Lawrence basin, extending south to Missouri, Arkansas, and Louisiana, east to Quebec, New York, and Pennsylvania, and west to Kansas (Parmalee & Bogan, 1998). While considered least concern by the IUCN, O. olivaria is considered imperiled (e.g., endangered, threatened, or special concern) across much of its distribution, especially in the Great Lakes region (COSEWIC, 2011;Natureserve, 2021), and is endangered in Canada due to declines in habitat, host fish declines, and the introduction of dreissenid mussels (COSEWIC, 2011).
Like most North American freshwater mussels, O. olivaria is dioecious (COSEWIC, 2011;Hoeh et al., 1995). A female mussel typically broods glochidia (a parasitic larvae) until a suitable host fish is near before discharging mature glochidia to successfully parasitize the host (Barnhart et al., 2008). Glochidia eventually metamorphose into juveniles and dislodge from the gills of the host fish to drop to the bottom of the riverbed (Oesch, 1995). Once mussels develop into free-living filter-feeding adults, their natural movement is severely limited and thus are virtually dependent on the brief period of attachment to the host fish or the supplemental stocking of adult mussels for long-distance dispersal (Haag & Warren, 2011). Transformation of O. olivaria glochidia has been documented on Acipenser fulvescens (common name: Lake Sturgeon, Rafinesque 1817), and O. olivaria in the Great Lakes and the St. Lawrence drainage are only known from areas where A. fulvescens are present (Brady et al., 2004;COSEWIC, 2011). Obovaria olivaria may also utilize Scaphirhynchus platorynchus (Shovelnose Sturgeon) in parts of its range (Coker et al., 1921), particularly in the Upper Mississippi River basin where A. fulvescens is rare and considered vulnerable (Knights et al., 2002;Natureserve, 2021) and S. platorynchus is more abundant and secure (Knights et al., 2002;Natureserve, 2021).
Understanding how genetic diversity is structured among O. olivaria populations across this large distribution will be an important component of future conservation and management plans (Fraser, 2008;Jones et al., 2006). As discussed above, the distribution of freshwater mussels is tied to the distribution and movement of their host fish species (Barnhart et al., 2008;Leibold et al., 2004;Newton et al., 2008;Schwalb et al., 2013;Zanatta & Murphy, 2006;Zanatta & Wilson, 2011). Sturgeons are capable of dispersing over 200 km during spawning periods (Auer, 1996;Wildhaber et al., 2011). Thus, the large distribution of O. olivaria may be driven by long-distance dispersal of its glochidia by sturgeon hosts, combined with stream capture events of waterways during the Pleistocene glaciation (Brady et al., 2004;Coker et al., 1921;Underhill, 1986). These factors may also contribute to a history of extensive gene flow within O. olivaria. However, given limited contemporary connectivity among drainages and the potential for increasingly limited dispersal of fish hosts discussed above, distinguishing whether populations exhibit substantial connectivity (e.g., isolation-by-distance) or more discrete genetic structure among rivers will be important. Beyond impacting gene flow and population genetic structure, recent and historical changes in connectivity among river drainages can leave signatures on levels of genetic diversity, and quantifying variation in diversity among less threatened populations (e.g., Mississippi River) and endangered populations (e.g., St. Lawrence River) could provide important information on how genetic diversity relates to conservation status.
The objectives of this study are to (1) examine the genetic diversity of O. olivaria across its range and identify populations or regions harboring relatively low genetic diversity and (2) examine the genetic structure of O. olivaria across its range to identify patterns of connectivity or presence of discrete population structure and dispersal barriers that can assist in focal management planning. We employ two complementary molecular approaches, mitochondrial DNA (mtDNA) sequencing and restriction site-associated DNA sequencing (RAD-seq), to examine the phylogeography, genetic structure, and genetic diversity of O. olivaria populations throughout its range. Mitochondrial DNA has been widely used in range-wide phylogeographic studies for decades (Avise et al., 1987;Beebee & Roe, 2008), including numerous analyses of freshwater mussels to determine population structure at broad geographic scales (Inoue & Berg, 2017;Inoue et al., 2014;Zanatta & Harris, 2013), but represents only a single genetic marker. RAD-seq is a powerful high-throughput sequencing approach that is increasingly used in population genetics analyses (Andrews et al., 2016). By providing large numbers of genome-wide single-nucleotide polymorphisms (SNPs), RAD-seq is of great value in population and conservation genetics because of the improved power to detect differentiation even in small or geographically restricted populations without requiring prior genomic resources (Andrews et al., 2016;Kang et al., 2017). Considering the imperiled status and rarity of O. olivaria in many parts of its distribution, RAD-seq offers a robust and innovative means to detect genomic differences among populations. Assessing both mitochondrial and genomic diversity in O. olivaria should provide insights into the population structure and diversity that will be valuable for management of this species.   Inoue et al. (2013) from the White River in Arkansas (Mississippi Embayment) were provided by colleagues at Arkansas State University (J. Harris). For newly collected specimens, mantle tissue was nonlethally biopsied from each specimen (Berg et al., 1995), placed in 95% ethanol, and stored at −80°C.

| Genetic and genomic procedures
DNA was extracted using the Qiagen DNeasy Blood and Tissue Extraction Kit™ (QIAGEN) and protocol. Extracted DNA was stained with SYBR Green™ (Thermo Fisher Scientific) dye, and agarose gel electrophoresis was performed to confirm the presence of highquality, high molecular weight genomic DNA and concentrations were quantified using a Nanodrop spectrophotometer (Thermo Fisher model # ND-1000).

| Sanger sequencing preparation
The cytochrome oxidase subunit 1 (COI) region of the mitochondrial genome was amplified by polymerase chain reaction (PCR) using the primers and thermocycler conditions described by Campbell et al. (2005). PCR products were verified by electrophoresis on a 1.5% agarose gel and purified using Exonuclease I (EXO, Amersham Biosciences cat. #E70073X, 10 U/ml) and Shrimp Alkaline Phosphatase (SAP, Amersham Biosciences cat. #E70092X 1 U/ml). An EXOSAP solution was created with 78 µl ddH 2 O, 2 µl EXO, and 20 µl SAP, and then, 2 µl of the mixture was added to each PCR product to remove primers, dNTPs, and other impurities, and were Sanger sequenced by Eton Bioscience (www.etonb io.com, San Diego, California) using the forward primer (Campbell et al., 2005).

| Restriction site-associated DNA library construction and sequencing
We used the Best-RAD protocol (Ali et al., 2016) to develop a SNP dataset. Best-RAD employs restriction enzymes to cleave DNA into short fragments and uses high-throughput sequencing to produce sequence data adjacent to the large number of restriction enzyme cut sites across the genome (Ali et al., 2016;Andrews et al., 2016).
Genomic DNA was quantified using a PicoGreen ® dsDNA quantification assay (Thermo Fisher Scientific, Carlsbad, California) for precise standardization at 20 ng/µl for library preparation. Standardized

| Mitochondrial DNA barcoding dataset
The mtDNA sequences were proofread and aligned using BIOEDIT (Hall, 1999). Additional COI sequences from the White River drainage in the southern Ozark Highlands of Arkansas and the Ohio River were included in the analyses (Inoue et al., 2013;GenBank Accession Numbers: KF035244-KF035229). Metrics for genetic diversity (e.g., number of haplotypes, number of polymorphic sites, and nucleotide diversityπ) were calculated for each population sampled using ARLEQUIN v. 2.0 (Excoffier et al., 2005). Haplotype networks were created based on the number of nucleotide mutations between different haplotypes using POPART software and the TCS algorithm (Clement et al., 2000;Leigh & Bryant, 2015). To determine whether mussels were significantly differentiated within drainages and among sites, hierarchical analysis of molecular variance (AMOVA) was used to estimate haplotype partitioning within and among sampling locations (Excoffier et al., 1992) in Arlequin (Excoffier et al., 2005) with significance of variance components and F-statistics assessed using 1,000 permutations. Pairwise Φ ST values were also calculated to determine pooled sampling location differentiation at the drainage level ( Figure 2).

| RAD-seq generated SNP dataset
The quality of Illumina reads was assessed using FASTQC (Andrews, 2010), and the data were cleaned, processed, and called using Stacks v2.53 (Catchen et al., 2013). To demultiplex and clean reads, we used process_radtags (parameters -c, -q, -r, --best-rad, others default). Because there is no reference genome for O. olivaria, de novo locus assembly and SNP calling were performed using denovo_map.pl, with removal of PCR duplicates (parameters: --rm-duplicates, --paired, --time-components, others default). We used vcftools (Danecek et al., 2011) to create a final dataset with SNPs present in ≥75% of individuals and present in all sampling locations. We required a minimum minor allele frequency of 0.02 to reduce the impact of low-frequency alleles and possible genotyping error (Rochette et al., 2019, but see Supplementary materials Appendix S1 and S2 for results with all SNPs) and thinned the dataset to retain a single SNP per RAD-tag locus to remove tightly linked sites. Individuals with >50% missing data were removed.
Unless otherwise stated, all remaining analyses were conducted in R version 4.0.3 (R Core Team, 2018). The R package radiator v1.1.9 was used for import and data format conversion (Gosselin, 2017).
To confirm that the removal of rare alleles did not impact the genetic structure observed in the dataset, a principal component analysis (PCA) was performed. First, the vcf file was converted to a genind object in adegenet v2.1.3, and then, the function tab was used to replace missing data with mean allele frequency (Jombart & Ahmed, 2011). The PCA was conducted in R package using ade4 v1.7.16 (Dray & Dufour, 2007) with the function dudi.pca (see Appendix S3). Table 1 To assess genetic diversity, we calculated observed and expected heterozygosity and the inbreeding index (H o, H e , F IS, and private alleles) of SNPs for each sampling location using the radiator package's summary_rad and private_alleles functions (Gosselin, 2017). The radiator package's write_genepop function (Gosselin, 2017) was used to convert the vcf to genepop format (Raymond & Rousset, 1995;Rousset et al., 2020), and Hardy-Weinberg calculations were conducted using the R package genepop v1.1.7 with the function test_ HW (Rousset et al., 2020). We tested for significant Hardy-Weinberg deviations for each locus within each population and across populations (combining results across populations using Fisher's Method) and overall for each population (combining results across SNPs using Fisher's Method). Genetic structure was assessed using several complementary methods. Individual-level genotype clustering was first determined using the snmf function in the R package LEA v3.2.0 (Frichot & François, 2015) with K (number of genetic populations) determined by the value producing the lowest cross-entropy (Frichot & François, 2015). significance determined using 1000 bootstrap replicates. We used AMOVA in Arlequin v3.5 as above (Excoffier et al., 2005) to estimate genotype partitioning between and among sampling locations and genetic clusters determined by the DAPC in adegenet (Jombart & Ahmed, 2011).

F I G U R E 1 Distribution of Obovaria olivaria collection sites color-coded by drainage. Collection site codes as in
To assess the location of potential geographic barriers to gene flow exist with the SNP dataset, Monmonier's algorithm was used to find the boundaries of maximum differences between contiguous polygons in a tessellation and to detect geographic locations of barriers to gene flow among genotypes (Monmonier, 1973). The Monmonier's algorithm was executed in R package adegenet using TA B L E 1 Collection sites, including abbreviations (code), major drainage basin, state/province, and the number of Obovaria olivaria samples sequenced for the mitochondrial gene COI, the number of haplotypes found at each collection location, number of unique haplotypes, the mean number of pairwise differences among haplotypes, and the mean nucleotide diversity (π) the monmonier function with pairwise F ST and site coordinates (Jombart & Ahmed, 2011;Manni et al., 2004;Monmonier, 1973 to account for lakes and short overland distances between drainages (i.e., Wisconsin River and Fox/Wolf River into the Great Lakes and the Lake Nipissing drainage in the Great Lakes and Ottawa River/ St. Lawrence River drainages). After assigning the Mississippi (above STCR site) as the river mouth in riverdist, the detectroute function was used to measure the distances between points on the river network segment (Tyers, 2017). Mantel tests for isolation-by-distance were performed in R package adegenet with the function mantel.randtest with 999 permutations using pairwise F ST and pairwise river distances (Jombart & Ahmed, 2011). We excluded sites with only one sampled individual (MISS-P25, LDES-OTT) and one site (DOJO-STL) because of channel braiding, which can produce incorrect distance calculations (Tyers, 2017).
Finally, we were interested in using the SNP data to test various demographic scenarios that might explain genetic diversity and structure among the major regional populations. We were interested in varying patterns of divergence that reflect postglacial expansion from likely glacial refugia from southern into northern populations.
Frequently, freshwater mussels exhibit a stepping-stone model of postglacial colonization, entering the Great Lakes from the Mississippi or Wabash refugia and then expanding northward (Beaver et al., 2019;Elderkin et al., 2007Elderkin et al., , 2008Mathias et al., 2018). For model testing, we conducted approximate Bayesian computation with random forests using DIYABC RF v1.0 and the R package diyabcGUI v 1.0.14 (Collin et al., 2021). We grouped mussels into three regional population clus-  Table 2 for population assignments to regions). We first COI sequence data, and maximum effective population sizes and generation times were varied to determine the best fit for the SNP dataset. We ran the COI testing models with 140,000 iterations and validated models with 500 random forests. Based on the effective population size estimates from mtDNA sequence data, in the SNP data, we set uniform prior distributions for effective population size (henceforth, N e ) and time to be a maximum of 2.5 million individuals for all regional populations and set a maximum divergence time of 50,000 generations. We also examined models constraining N e to 1/10 th this value (250k) to examine sensitivity to this prior. We tested (2) Next, we simulated a model with divergence between Southern used to determine the best model fit, and parameter estimates from the best fit model were determined from 120,000 simulations using random forest with 500 trees for each parameter, prediction of models and parameter estimates was with 500 trees for each parameter estimated.    Hardy-Weinberg equilibrium were detected in any population (Table 2). Four private alleles were found in the SNP dataset, and all were found south of the St. Lawrence (Table 2).

| SNP dataset
The DAPC and snmf analyses of the SNP dataset both revealed two major clusters, but there was clear admixture at spatially intermediate sites (Figure 3 Great Lakes watershed, the White River, Wabash, and White River in Illinois, but as above the Great Lakes mussels were intermediate ( Figure 4). We repeated the snmf analysis for K = 3, which partly separated sites from the Great Lakes region; however, admixture among ancestral populations was still apparent. AMOVA showed significant genetic differentiation among the two regional clusters (F CT = 0.11, p < .001) (Table 4) p < 0.001; Figure 5).
The DIYABC scenario choice determined that the null model (simultaneous divergence of three regional populations) was the best fit model for the SNP and mitochondrial datasets, suggesting limited power in this system to resolve any directional colonization history. However, DIYABC models did consistently show that relative N e was sharply reduced in the Northern populations of Obovaria olivaria (Table 5; Appendix S9). DIYABC models were sensitive to

TA B L E 3
Analysis of molecular variance (AMOVA) for Obovaria olivaria using COI mtDNA sequence data prior parameter settings and although Northern N e was always wellestimated, N e from Southern and Great Lakes populations had broad posterior distributions (Appendix S9). We thus urge caution when interpreting results as real-world values but suggest the variation in N e from population to population is more indicative of general trends in relative effective size (Table 5; Appendix S9). The median generation time since population split was 26,498 generations in the upper bound model (although as for N e the posterior was broad), but similar to geologic estimates of the Great Lakes formation of ~14,000 YBP (Larsen, 1985) in the lower bound model (13,166 generations). Overall, we were unable to precisely estimate parameters in this dataset with DIYABC, but results are consistent with summary statistics presented above that suggest substantially reduced genetic diversity in O. olivaria from the northern St. Lawrence River populations.

| DISCUSS ION
Our combined analysis of mitochondrial and RAD-tag sequences in O. olivaria provides evidence for ongoing or recent connectivity driven by spatial separation within and between regions, which also exhibit differences in genetic diversity. The patterns of range-wide isolation-by-distance presented here imply that O. olivaria dispersal is distance-limited, but such limitations occur at large scales and suggest the potential for long-distance movement on host fish. We also find that O. olivaria exhibits notably reduced genetic diversity in northern sites (the St. Lawrence and Ottawa Rivers), which is especially concerning given the endangered status of these populations.
In the mitochondrial DNA dataset, diversity declined from south to north, with reduced haplotype and nucleotide diversity in the Great Lakes and St. Lawrence drainages and the greatest number of haplotypes and unique haplotypes is found in the Mississippi River drainage. The White (AR) and Black River individuals had 4 unique haplotypes out of 5 haplotypes, which is consistent with other phylogeographic studies on freshwater taxa from this region that display similar endemism (Crandall, 1998;Mayden, 1985;Vaughn et al., 1996;Zanatta & Murphy, 2008). Despite the presence of some unique haplotypes in the southern populations and significant regional Φ ST values, the COI data indicate no extensive phylogeographic barriers across the O. olivaria range.
Results from the SNP data largely parallel those from the mitochondrial sequences. Heterozygosity was greatest in the Mississippi and Great Lakes drainage sites and declined toward the St. Lawrence collecting sites (Table 2) Patterns of genetic structure in freshwater mussel species often reflect changes in the patterns of hydrologic connections in the Great Lakes at the end of the last glacial period (Elderkin et al., 2007(Elderkin et al., , 2008Mathias et al., 2018). For example, at the end of the Pleistocene glaciation (~6000 to ~4500 YBP), the Great Lakes and the St. Lawrence River drainages remained connected near North Bay, Ontario (Larsen, 1985;Teller, 1985), but these connections were later severed (~4500 YBP) (Larsen, 1985).  (Hewitt et al., 2018;Kimura & Weiss, 1964;Mathias et al., 2018;Zanatta & Harris, 2013

| Conservation Recommendations
The results of this study emphasize the importance of riverscape genetics in identifying historically connected populations (Davis et al., 2018). These data indicate that gene flow has likely been  (Table 5), so it may take decades to centuries for any contemporary barriers to gene flow to result in detectable divergence of recently isolated populations (Hoffman et al., 2017). Therefore, routine genetic monitoring of at-risk populations may be advisable.
Further, in both the mitochondrial and SNP data, the St. Lawrence and Ottawa river populations, which are considered endangered, exhibit the lowest genetic diversity and N e , suggesting that these populations should be given priority for conservation and restoration efforts. Although no discrete lineages are present in O. olivaria across its range, the clinal population structure and diversity leads us to recommend that if supplementation or propagation is required to enhance diversity of threatened or endangered populations, the use of genotypes from adjacent populations would be the optimal strategy. It is important to note that while this study focuses on neutral genetic variation that shows evidence of considerable admixture in many parts of the distribution of O. olivaria, adaptive differences may exist between populations. The numbers of SNPs here are insufficient to detect such adaptation, but as whole-genome sequencing becomes more readily accessible examining genetic markers that may be under selection could elucidate evolutionary differences among populations beyond those seen here (Funk et al., 2012). Finally, management plans should also take into consideration the genetic structure and diversity of not only O. olivaria but also that of A. fulvescens and S. platorynchus, as the continued persistence of O. olivaria is intrinsically dependent on these sturgeon host fishes.

ACK N OWLED G M ENTS
Funding for this project was provided by the Department of Fisheries Thanks to the many colleagues who sent tissues or assisted in field