The ghost of introduction past: Spatial and temporal variability in the genetic diversity of invasive smallmouth bass

Abstract Understanding the demographic history of introduced populations is essential for unravelling their invasive potential and adaptability to a novel environment. To this end, levels of genetic diversity within the native and invasive range of a species are often compared. Most studies, however, focus solely on contemporary samples, relying heavily on the premise that the historic population structure within the native range has been maintained over time. Here, we assess this assumption by conducting a three‐way comparison of the genetic diversity of native (historic and contemporary) and invasive (contemporary) smallmouth bass (Micropterus dolomieu) populations. Analyses of a total of 572 M. dolomieu samples, representing the contemporary invasive South African range, contemporary and historical native USA range (dating back to the 1930s when these fish were first introduced into South Africa), revealed that the historical native range had higher genetic diversity levels when compared to both contemporary native and invasive ranges. These results suggest that both contemporary populations experienced a recent genetic bottleneck. Furthermore, the invasive range displayed significant population structure, whereas both historical and contemporary native US populations revealed higher levels of admixture. Comparison of contemporary and historical samples showed both a historic introduction of M. dolomieu and a more recent introduction, thereby demonstrating that undocumented introductions of this species have occurred. Although multiple introductions might have contributed to the high levels of genetic diversity in the invaded range, we discuss alternative factors that may have been responsible for the elevated levels of genetic diversity and highlight the importance of incorporating historic specimens into demographic analyses.


| INTRODUC TI ON
Understanding the demographic history of populations constitutes a fundamental aspect of evolutionary biology. Invasive species are particularly suitable for demographic analyses, as they frequently experience rapid alternations in levels of genetic diversity following introduction (Chown et al., 2015;Hui & Richardson, 2017;Lee, 2002;Rius & Darling, 2014;Roman & Darling, 2007).
Numerous studies have attempted to assess the effects of invasion dynamics on genetic variation (e.g., founder effects, genetic bottlenecks, admixture, propagule pressure; Baker & Stebbins, 1965;Hui & Richardson, 2017;Mayr, 1963) by comparing populations in the native and invasive ranges (Kelly, Muirhead, Heath, & Macisaac, 2006;Kolbe et al., 2004;Naccarato, Dejarnette, & Allman, 2015;Rollins, Woolnough, Wilton, Sinclair, & Sherwin, 2009). These types of studies aid in unravelling the demographic history of the invasive species in question (Ficetola, Bonin, & Miaud, 2008;Gillis, Walters, Fernandes, & Hoffman, 2009;Gray et al., 2014;Neilson & Stepien, 2011). Yet, despite the wealth of specimens and information housed within Natural History collections, the majority of invasion studies to date have focussed exclusively on contemporary populations, thereby relying heavily on the premise that the historic population structure within the native range has been maintained over time.
Historic DNA serves as a valuable reference when examining contemporary genetic diversity (Bouzat, 2000;Dormontt et al., 2014;Guinand, Scribner, & Page, 2003;Lozier & Cameron, 2009), as it allows for the monitoring of temporal changes in genetic diversity across generations (Guinand et al., 2003;Sefc, Payne, & Sorenson, 2007). This temporal approach increases the chance of detecting subtle changes frequently overlooked by studies focussing only on contemporary data (Lozier & Cameron, 2009) and thus allows us to delineate the most likely invasion scenario (Gillis et al., 2009;Thompson et al., 2011;Van Kleunen, Weber, & Fischer, 2010) and reveal connectivity levels among invasive populations (Beneteau, Walter, Mandrak, & Heath, 2012;Funk, Garcia, Cortina, & Hill, 2011;Snyder & Stepien, 2017). This may be of particular importance in studies conducted on taxa for which there is a priori reason to suspect temporal fluctuations in genetic variation, such as highly exploited (and subsequently stocked) taxa or species often associated with human-mediated dispersal. Hence, from an evolutionary perspective, the incorporation of historic DNA is therefore of fundamental importance.
Smallmouth bass, Micropterus dolomieu (Lacepèdé, 1802), presents a suitable model system to investigate variation in genetic diversity through space and time, as the species' exploitation and subsequent stocking events within the native range are well documented (Long, Allen, Porak, & Suski, 2015), and its formal introduction history and subsequent spread into and throughout South Africa are well recorded (De Moor & Bruton, 1988). Twenty-nine M. dolomieu specimens originating from broodstock collected in the Wheeling River, West Virginia, USA, were shipped from the Lewistown hatchery in Maryland, USA, to the Jonkershoek hatchery in South Africa in 1937 (De Moor & Bruton, 1988;Loppnow, Vascotto, & Venturelli, 2013;Powell, 1967). Here, they were reared and bred before being released into multiple water bodies across the country to provide opportunities for angling (De Moor & Bruton, 1988). Most of the documented stockings (De Moor & Bruton, 1988) occurred prior to the cessation of government support to stocking programs in the early 1990s (Ellender, Woodford, Weyl, & Cowx, 2014). predict that the invasive South African range will have a lower genetic diversity when compared to the native (historic and contemporary) North American range due to a loss of alleles, as suggested by Dlugosch and Parker (2008). Furthermore, as heavily exploited species often experience genetic bottlenecks, leaving traces in the species' genetic diversity (Pinsky & Palumbi, 2014), we predict that the genetic diversity will be lower in contemporary time when compared to historical samples in the native range.

| DNA collection and extraction from historical specimens
Specimens representing the historical native range (Figure 1), corresponding to the approximate time of introduction into South Africa (1930Africa ( -1941, were obtained from a host of collections housed at the Smithsonian National Museum of Natural History (NMNH), The Academy of Natural Sciences of Drexel University (ANSP), University of Michigan Museum of Zoology (UMMZ) and the Ohio State University Museum (OSUM) ( Table 1; Appendix 1). In total, 53 formalin-fixed specimens representing 11 drainage systems were obtained for genetic analyses (Table 1). These specimens represent a subset of the M. dolomieu genetic diversity that was present in the native range 20-25 generations ago (Barthel et al., 2008).
Genomic DNA was extracted from preserved muscle tissue (20-50 mg) in a room previously unexposed to fish DNA using sterilized equipment. Prior to each extraction, all equipment and surfaces were treated with 10% bleach to remove any potential contaminants. Pikor, Enfield, Cameron, and Lam (2011) showed that high-quality DNA can be extracted from formalin-fixed tissue if the samples are rehydrated with a series of ethanol washes prior to extraction. Thus, 500 μl of 100% ethanol was added to each tissue sample and vortexed vigorously for 30 s. The liquid was removed, and the process was repeated with 500 μl 70% ethanol, followed by 1,000 μl distilled water. Lastly, 1,000 μl distilled water was added to each sample and left to soak at 55°C for 5 days, vortexing the sample every 24 hr.
Once rehydrated, the sample was moved to a dry Eppendorf tube before DNA extraction, using the QIAamp DNA FFPE tissue extraction kit (QIAGEN). In a recent review, Paireder et al. (2013) demonstrated that this kit consistently outcompeted other extraction methods when working with old (1820-1950), formalin-fixed tissue. Apart from doubling the amount of proteinase K added to each sample (60 μl), extraction followed the manufacturers' protocol. To break the formalin bonds, the samples were heated to 90°C for 1 hr before commencing with the wash steps. Lastly, to ensure the maximum elution of bound DNA, 10 μl elution buffer (warmed to 25.5°C) was added and left to "incubate" at room temperature for 5 min before centrifuging at 20,000 g for 1.5 min. This was repeated three times to yield a total DNA extraction volume of 30 μl. All DNA extractions were stored at −20°C. Nine localities rendering a total of 213 specimens were sampled from the same "broad" area represented by the historical samples to allow for direct genetic diversity comparisons (Table 1)

| Historical and contemporary DNA amplification
To corroborate the morphological identification of the contemporary collected specimens and assess genetic diversity and demographic history of both native and invasive populations, two partial mitochondrial DNA (mtDNA) gene regions, namely cytochrome b (cytb) and control region (CR), were amplified for all the contemporary samples (n = 519). This was not possible for the historical samples due to the limited availability of tissue and the degraded nature of the DNA. A standard 25 μl mastermix was prepared for both mtDNA polymerase chain reactions (PCRs). The internal cytb primers, basscytbf1 (5′-CAC CCC TAC TTC TCC TAC AAA GA-3′) and basscytbr1 (5′-AAG GCR AAG F I G U R E 1 Map of native USA (left) and invasive SA (right) sampling localities. Letters A-K denote historical sampling localities, while numbers denote contemporary sampling localities. All letters and numbers correspond to those used in CGG GTG AGG G-3′; Near, Kassler, Koppelman, Dillman, & Philipp, 2003), were used to amplify the cytb fragment. The primer set CB3R-L (5′-CATATTAAACCCGAATGATATTT-3′; Palumbi, 1996) and HN20-R (5′-GTGCTTATGCTTTAGTTAAGC-3′; Bernatchez & Danzmann, 1993) Lma102, Lma117-Neff, Fu, & Gross, 1999) were successfully amplified. As Lma102 and Lma117 were not polymorphic for a subset of specimens, they were excluded; therefore, nine polymorphic loci were used in the present study (Supporting Information Table   S1). Three multiplex reactions were performed using the KAPA2G™ Fast Multiplex PCR Kit (KapaBiosystems, Cape Town, South Africa).
The same nine microsatellite loci were amplified for the historic samples, following the amplification procedure used for the contemporary DNA, but due to the degraded nature of the DNA, this did not yield results. Thus, the resulting PCR products for each multiplex were diluted with distilled water to obtain a 1/10 PCR product which, as positive controls. Historical specimens were scored blindly (i.e., specimen name removed) and repeated three times to ensure accuracy and consistency. Where scoring inconsistencies were observed (historical specimens) and more than three loci could not be scored (for both historical and contemporary specimens), the entire specimen was removed from the data set and excluded from the study.
Similarly, as one microsatellite locus, Mdo8, did not amplify for the majority of historical samples, it was removed from the historical data set entirely. Thus, nine microsatellite loci were analysed for the contemporary data set, but only eight microsatellite loci were analysed for the historical data set.

| Contemporary mtDNA analyses
To assess genetic diversity levels in both the contemporary native (USA-CN) and invasive (SA-CI) ranges, the number of haplotypes (H), haplotype diversity (h) and nucleotide diversity (π) were calculated for each sample site. The population history for M. dolomieu in both ranges was examined using Fu's Fs (Fu, 1997) and Tajima's D (Tajima, 1989). Assessment of genetic population structure was conducted combining both native and invasive contemporary data sets for each gene fragment. Pairwise F ST values were calculated and a hierarchical analysis of molecular variance (AMOVA) conducted to determine the amount of population subdivision among sampled localities. All analyses were conducted in ARLEQUIN 3.5.2.2 (Excoffier & Lischer, 2010), with statistical significance assessed with 10,000 permutations.
Statistical significance of F IS was assessed after 1,000 permutations in FSTAT 2.9.3.2 (Goudet, 1995). Allelic richness (AR) was calculated using HP-Rare 1.1 (Kalinowski, 2005), correcting for sample size disparity through rarefaction analysis. Analyses were conducted per population for the two contemporary data sets, but due to the small sample size for most of the historical localities (Table 1) a Bonferroni post hoc test was used to further assess the differences between groups. In addition, a stacked bar graph was constructed to visualize the variation among localities and loci. Second, Weir's (1996) F ST was employed to assess the genetic differentiation among sampled localities using FreeNA 1.2 (Chapuis & Estoup, 2007). FreeNA, employing the ENA correction method (Chapuis & Estoup, 2007), was chosen as it has been shown to correctly estimate F ST values in the presence of null alleles (detected in the previous analysis; Chapuis & Estoup, 2007). A jackknife approach with 1,000 bootstrap replicates was employed to assess statistical significance (Chapuis & Estoup, 2007). Next, BOTTLENECK 1.2.02 (Piry, Luikart, & Cornuet, 1999) was used to test the prediction that both contemporary populations (CI and CN) experienced a recent genetic bottleneck. Populations that have undergone a genetic bottleneck are often associated with a loss of (rare) alleles and display elevated levels of heterozygosity when compared to stable populations (Piry et al., 1999). Thus, significant heterozygote excess was evaluated for each of the three groups using a Wilcoxon rank test (10,000 iterations) for two mutational models often associated with microsatellite evolution: the two-phase mutation model (TPM) and the infinite alleles model (IAM).  (Earl & vonHoldt, 2012) was used to determine the most probable K following the Evanno method (Evanno, Regnaut, & Goudet, 2005), before using CLUMPP 1.1.2 (Jakobsson & Rosenberg, 2007) to compile the five replicate runs for the most likely K. DISTRUCT 1.1 (Rosenberg, 2004)  Appendix 2). At last, as suggested by Guillemaud, Beaumont, Ciosi, Cornuet, and Estoup (2010), three supplementary scenarios were simulated to determine whether the two SA groupings (CI and CI S ) originated from (a) a single serial introduction from the source population (CN + HN), (b) two independent introduction events from the same source or (c) an unsampled source population ( Figure 5: i-iii; Appendix 2). To prevent overparameterization, parameters were specified according to the program guidelines (Cornuet et al., 2014).
First, we performed a pre-evaluation of the data set to ensure that at least one scenario and its associated priors could generate simulated data sets similar to that of the observed. This was accomplished by simulating 100,000 data sets and comparing summary statistics for both simulated single-sample (i.e., mean number of alleles, genetic diversity and allele size variance across loci) and two-sample statistics (i.e., mean genetic diversity, number of alleles, allele size variance, mean index of classification, shared allele distance, distance between samples and F ST ) to the observed data (Cornuet et al., 2014). As the mean M index across loci (Garza & Williamson, 2001) was initially developed with conservation planning in mind, this statistic does not perform well with small, unequal sampling sizes and small starting population sizes (Garza & Williamson, 2001). Hence, it was excluded from the summary statistics used in the current analyses. Next, we simulated 10 6 data sets per scenario before calculating the posterior probability (PP) for each. Scenarios were subsequently compared through a logistic regression, which was conducted on the linear discriminant analysis components (Cornuet et al., 2014). Each scenarios error rate was evaluated by generating 100 pseudo-observed data sets, using parameter values obtained from one of the scenarios (e.g., scenario 1). The type I error rate was determined by counting the number of times the PPs were higher for any scenario other than the chosen scenario, divided by the number of pseudo-observed data sets (i.e., 100), while the type II error rate was calculated by counting the number of pseudo-observed data sets that unrightfully received the highest PP support (Cornuet, Ravigne, & Estoup, 2010). between sampling localities and gene fragment (Table 2). In particular, overall nucleotide diversity was higher for cytb in the CI populations (   (Figure 2, Supporting Information Figure S2). Moreover, the ANOVA revealed   Table S3).

| Contemporary and historical microsatellite analyses
The remaining six CI populations, however, displayed substantial levels of admixture, in particular localities BE and OL (Figure 4a). The  Table 1; Figure 4b).
The ABC analysis consistently supported the notion of a more recent introduction. The first set of scenarios tested (Scenarios 1-6; Figure 5) revealed that Scenario 2 had the highest posterior probability (Supporting Information Table S5). The second set of analyses (Scenario A-I; Supporting Information Figure S1) supported both Scenarios C and F (Supporting Information Table S5). The third set of simulations (Scenarios i-iii; Supporting Information Figure S1), where we tested for a single versus multiple introductions from a single source or an unsampled source population, was inconclusive. Scenario iii (unsampled source population) did, however, marginally receive the most support (Supporting Information Table S5).
Type I and Type II error rates were marginally low for the first two sets of simulations conducted (Supporting Information Table S5), but not for the third simulation (Supporting Information Table S5).

| Genetic diversity through space and time
Elevated levels of genetic diversity were observed in the contemporary invasive (CI) range when compared to the contemporary native (CN) range, contradicting the general assumption that genetic diversity is lower in recently invaded ranges than in long-established native populations. However, when comparing all three groups, the historical native (HN) range displayed the highest levels of heterozygosity, number of alleles (Na) and allelic richness (AR). Although this might have resulted from a statistical artefact due to the smaller sample size for the HN range, similar findings were previously reported for Atlantic salmon (Salmo salar; Nielsen, Hansen, & Loeschcke, 1997). The authors observed a significant decrease in Na for the contemporary population when compared to samples collected 60 years before, likely due to a recent genetic bottleneck. Our results support this proposition, as the CN population exhibited high haplotype, but low nucleotide genetic diversity, as well as significantly negative Tajima's D and Fu's Fs levels, all of which are commonly observed in a population that had undergone a genetic bottleneck before experiencing expansion (Grant & Bowen, 1998). Moreover, the lack of population structure in the CN range, as well as low AR and Na, further supports this notion. Strong and sustained declines in population size, such as the ones experienced by commercially exploited species, are known to leave signatures in the genetic diversity of species, in particular by reducing Na and AR (Pinsky & Palumbi, 2014).
Thus, the observed contemporary population dynamics of M. dolomieu in its native range might have resulted from the interaction between overfishing and restocking events during the last two centuries (Long et al., 2015). Micropterus dolomieu has been harvested both commercially and recreationally since the 1800s and has experienced several population declines and even extirpations in some localities (Marsh, 1867). This led the US government to start breeding programmes and enforce stricter regulations on fishing in the 1870s (Long et al., 2015). In 1903 alone, ~500,000 juvenile black bass were released into waterbodies across the USA (Bowers, 1905;Long et al., 2015;Loppnow et al., 2013).   (Chown et al., 2015;Gray et al., 2014). Given that hatcheries make use of artificial selection techniques to enhance species production and abundance (e.g., Aprahamian, Smith, McGinnity, McKelvey, & Taylor, 2003;Lamaze et al., 2012), it is possible that the introduced M. dolomieu F I G U R E 3 Principal component analysis (PCA) conducted on the combined microsatellite genotypes for the three groups (i.e., CI-contemporary invasive SA, CN-contemporary native USA, HN-historical native USA). Each dot represents a genotyped individual, and colours correspond to sampled localities. Variance explained in parentheses F I G U R E 4 STRUCTURE plots representing the population structure within (a) each of the three groups (CI-contemporary invasive SA, CN-contemporary native USA, HN-historical native USA) when ran independently, and (b) population structure for all localities combined into a single run. Each genotyped individual is represented by a vertical line, with each lines' colour proportional to the cluster membership of the individual were of admixed or hybrid origin, as has been reported for stockings of S. fontinalis (Cooper, Miller, & Kapuscinski, 2010;Lamaze et al., 2012;Sloss, Jennings, Franckowiak, & Pratt, 2008).

| Population substructuring in an invaded range
Invasive species capable of harbouring large, genetically diverse source populations are thought to make better invaders (Gaither, Bowen, & Toonen, 2013), as they are equipped with higher adaptive potential (Dlugosch, 2006;Lavergne & Molofsky, 2007;Wellband & Heath, 2017). Within the invasive South African range, M. dolomieu experiences an array of climatic conditions with fluctuating rainfall and temperature regimes (Rutherford, Mucina, & Powrie, 2006). However, despite this, M. dolomieu has not only survived, but also established viable populations and spread throughout the systems into which it was introduced (Van Der Walt, Weyl, Woodford, & Radloff, 2016).
Although the initial introduced individuals may have been of admixed stock, the substantial admixture observed among M. dolomieu populations in the invaded range may also have resulted from hybridization post introduction  as has been observed for M. dolomieu introductions elsewhere (Avise et al., 1997;Bagley, Mayden, Roe, Holznagel, & Harris, 2011;Pipas & Bulow, 1998;Whitmore & Butler, 1982;Whitmore & Hellier, 1988

| The influence of sampling design on genetic diversity
Molecular techniques are indispensable tools in invasion biology (Blanchet, 2012;Muirhead et al., 2008), particularly for reconstructing species invasion histories and routes Guillemaud et al., 2010Guillemaud et al., , 2015Wilson, Dormontt, Prentis, Lowe, & Richardson, 2009 (Hargrove, Weyl, & Austin, 2017). Their results revealed that all lentic populations had allele frequencies dominated by a single allele, but that a population sampled from Kowie Weir, located at the end of a 580 km 2 catchment, was more diverse, suggesting multiple introduction events or hybridization between co-occurring Micropterus species (Hargrove et al., 2017). Thus, choice of sampling locality and, in particular, the degree of isolation are important considerations when assessing the demographic or invasion history of a species.

| Management implications
Understanding the introduction history of an invasive species is crucial when wanting to decide on a management strategy for the species in question  In conclusion, while studies comparing contemporary genetic variation among native and invasive ranges are valuable (Lozier & Cameron, 2009), incorporating historical DNA is essential for monitoring temporal changes in genetic diversity that are often overlooked in comparisons using only contemporary data (Hansen, 2002;Lozier & Cameron, 2009 Our study highlights the importance of including historical DNA; however, caution should be taken when working with historical specimens as the degraded nature of the DNA not only hampers the successful amplification of the specimens (Sefc, Payne, & Sorenson, 2003;Sefc et al., 2007), but also renders it susceptible to genotyping discrepancies. Despite this, we recommend that future studies attempting to infer the demographic history of invasive species should incorporate native historical samples.

ACK N OWLED G EM ENTS
The authors would like to thank Louis Bernatchez and two anonymous reviewers for valuable comments and suggestions on earlier versions of the manuscript. We would like to thank the following peo- This work is based on research supported partly by the Department of Science and Technology (DST) and National Research Foundation (NRF) of South Africa (Grant Nos. OLFW: 110507, 109015;CH: 89967, 109244) and the DST-NRF Centre of Excellence for Invasion Biology.

CO N FLI C T O F I NTE R E S T
None declared. A P P E N D I X 1 (Continued) originate from an admixture event between the sampled CN populations and an unsampled ghost population.

S CE N A R I O I-I I I
The following three scenarios were run to test if both introductions (CI and CI S ) did in fact originate from one source population, that is, USA (CN + HN). Scenario G: Both CI and CI S originated independently from the source population (i.e., multiple introductions from single source). Contrastingly, scenario H suggests that only CI S originated from the source population, with CI originating from CI S (i.e., single introduction). At last, scenario I states that both CI and CI S were founded independently from an unsampled source population, which in turn originated from the source (i.e., multiple introductions, but only a single introduction from the source).