Testing for evolutionary change in restoration: A genomic comparison between ex situ, native, and commercial seed sources of Helianthus maximiliani

Abstract Globally imperiled ecosystems often depend upon collection, propagation, and storage of seed material for use in restoration. However, during the restoration process demographic changes, population bottlenecks, and selection can alter the genetic composition of seed material, with potential impacts for restoration success. The evolutionary outcomes associated with these processes have been demonstrated using theoretical and experimental frameworks, but no study to date has examined their impact on the seed material maintained for conservation and restoration. In this study, we compare genomic variation across seed sources used in conservation and restoration for the perennial prairie plant Helianthus maximiliani, a key component of restorations across North American grasslands. We compare individuals sourced from contemporary wild populations, ex situ conservation collections, commercially produced restoration material, and two populations selected for agronomic traits. Overall, we observed that ex situ and contemporary wild populations exhibited similar genomic composition, while four of five commercial populations and selected lines were differentiated from each other and other seed source populations. Genomic differences across seed sources could not be explained solely by isolation by distance nor directional selection. We did find evidence of sampling effects for ex situ collections, which exhibited significantly increased coancestry relative to commercial populations, suggesting increased relatedness. Interestingly, commercially sourced seed appeared to maintain an increased number of rare alleles relative to ex situ and wild contemporary seed sources. However, while commercial seed populations were not genetically depauperate, the genomic distance between wild and commercially produced seed suggests differentiation in the genomic composition could impact restoration success. 
Our results point toward the importance of genetic monitoring of seed sources used for conservation and restoration as they are expected to be influenced by the evolutionary processes that contribute to divergence during the restoration process.


| INTRODUCTION
Restoration aims to mitigate the loss and degradation of native ecosystems by reducing the abundance of non-native species, increasing biodiversity and habitat connectivity, and re-establishing native plant communities resilient to change (Benayas et al., 2009;Hobbs & Norton, 1996;Hodgson et al., 2016;Thomson et al., 2009). To achieve these goals, extensive inputs of native seed are required, often in quantities too large to be harvested from local, wild populations (Broadhurst et al., 2008;Merritt & Dixon, 2011;Pedrini et al., 2020). To compensate for these deficits, seeds used in restoration are often produced commercially. However, commercial seed production can lead to the evolution of differences that may impact restoration goals (Dyer et al., 2016;Espeland et al., 2017;Nagel et al., 2019;Pizza et al., 2021;Roundy et al., 1997). Evolution of seed material can occur through bottlenecks and sampling effects following collection, propagation, or cultivation which can lead to reductions in genetic diversity and the loss of locally adapted alleles, impacting fitness and reducing the evolutionary potential of restored populations (Blanquart et al., 2013;Fant et al., 2008;Kawecki & Ebert, 2004;Robichaux et al., 1997;Williams, 2001;Wright, 1938).
Combined with selection, which may intentionally or unintentionally lead to genomic and consequent phenotypic change, there is substantial opportunity for evolution during the restoration process (Dyer et al., 2016; Espeland et al., 2017; Nagel et al., 2019). Given the impact different evolutionary processes could have, understanding how these factors interact to influence seed material will have substantial economic and ecological consequences for restoration success (Bischoff et al., 2006; Bucharova et al., 2017; Gerla et al., 2012; Keller et al., 2000; Kimball et al., 2015). Despite a substantial body of work that has outlined best practices for sampling ex situ seed (Griffith et al., 2021; Hoban & Schlarbaum, 2014), selection and sampling effects imposed during collection may also pose a significant challenge to the preservation of genetic variation. Ex situ seed collections aim to preserve extant genetic variation to incorporate into restoration or breeding programs in the future (Hamilton, 1994; Li & Pritchard, 2009).
Both commercial and ex situ seed collections aim to maximize genetic diversity while maintaining locally adaptive genetic variation across space and time (DiSanto & Hamilton, 2020; Griffith et al., 2015). Consequently, genomic comparisons between contemporary wild populations, commercially produced material, and ex situ conservation collections provide an ideal means to evaluate the evolution of seed material maintained for conservation and restoration (Robichaux et al., 1997; Schoen & Brown, 1993; Taft et al., 2020).
Genomic comparisons of conservation and restoration seed sources with contemporary native populations can be used to infer whether evolutionary challenges inherent to the collection and maintenance of these resources cause them to differ from the wild populations they are intended to match (Pizza et al., 2021).
Sampling effects can generate substantial genomic differences across seed sources with lasting impacts to conservation goals and restoration outcomes (DiSanto & Hamilton, 2020; Diwan et al., 1995; Franco et al., 2005; Hamilton, 1994). The genomic effects of sampling correspond to those found following population bottlenecks, including a reduction in effective population sizes (N e ) (Leberg, 1992; Wright, 1938), making this a useful metric to compare across populations when quantifying the effects of sampling. In addition, following a bottleneck, rare alleles are more likely to be lost, influencing the distribution of allele frequencies (Excoffier et al., 2009; Maruyama & Fuerst, 1985; Tajima, 1989). Stochastic changes in allele frequencies associated with sampling may also more broadly affect the estimation of inbreeding coefficients (F is ), linked to inbreeding depression (Cavalli-Sforza & Bodmer, 1971; García-Cortés et al., 2010; Husband & Schemske, 1996), or estimates of coancestry (θ), indicative of relatedness among individuals within populations.
Importantly, not only are these metrics useful for assessing the magnitude of sampling effects, they are also common proxies for evaluating short- and long-term fitness effects associated with inbreeding depression and increased relatedness among breeding individuals (Angeloni et al., 2011; Caballero & Toro, 2000; Hughes et al., 2008; Keller & Waller, 2002). Thus, these metrics provide valuable comparisons to assess the quality of restoration and conservation seed resources relative to their wild counterparts.
In addition to stochastic processes associated with sampling, directional selection in the agronomic environment can cause evolution during cultivation. This can include selection associated with chemical inputs and fertilizers used to improve yield, reductions in competition, or abiotic stress (Dyer et al., 2016;Espeland et al., 2017). Moreover, individuals with traits promoted by mechanized agricultural harvest, such as reduced shattering, minimum heights, or selected phenology, could evolve in commercially produced material relative to wild populations (Dyer et al., 2016;Nagel et al., 2019).
Previous experimental evidence indicates selection can influence the genetic and phenotypic composition of restoration seed (Dyer et al., 2016; Nagel et al., 2019), but no study to date has directly compared the genomic composition of commercially produced seed with native remnant populations in the region of restoration. Genomic signatures of selection can be identified through a variety of statistical analyses developed from the site frequency spectrum (SFS), the distribution of allele frequencies sampled across the genome (Hohenlohe et al., 2010; Nielsen, 2001). Of these metrics, Tajima's D is notable for possessing relatively high statistical power compared to other methods that estimate the strength of selection (Simonsen Tajima, 1983). If selection occurs during commercial seed production, then we would expect commercial populations to have more negative values of Tajima's D relative to wild populations. Natural variation in gene flow could also contribute to genetic differentiation among populations, manifesting as isolation by distance (IBD) when genetic differences increase with spatial distance (Slatkin, 1993; Wright, 1943). When present, IBD is expected to produce a positive relationship between genetic differences and the spatial separation between populations and could explain genomic patterns that might otherwise be attributed to selection associated with different seed sources. Consequently, the relationship between geographic and genetic distance can provide a valuable null hypothesis against which alternative evolutionary scenarios can be tested (Bradburd et al., 2016). If genomic differentiation among seed source populations is solely explained by IBD, evolution associated with seed source type has likely not occurred. However, if IBD is absent or is insufficient to explain population differences, other evolutionary factors likely contribute to differentiation across seed source types.
KEYWORDS comparative genomics, ecological restoration, evolutionary potential, ex situ conservation, genetic bottlenecks, seed provenance, selection
North American grasslands remain one of the most threatened ecosystems globally, with over 98% converted due to anthropogenic development (Comer et al., 2018;Hoekstra et al., 2004;Samson et al., 2004). However, restoration efforts are ongoing to mitigate the loss of native grasslands by planting commercially produced seed mixes (Benayas et al., 2009;Hobbs & Norton, 1996;Thomson et al., 2009). The perennial forb Helianthus maximiliani Schrad. (or Maximilian sunflower) is a common constituent of native grassland communities and a frequent component of restoration seed mixes (McKenna et al., 2019;USDA, 2004). H. maximiliani is distributed across an extensive range of climatic variation spanning a broad latitudinal range from northern Mexico to southern Canada (Kawakami et al., 2014;USDA 2004). Previous genetic studies using microsatellites revealed substantial heterozygosity and low inbreeding rates within populations, consistent with an obligate outcrossing mating system (Kawakami et al., 2014). Quantitative genetic experiments have also found differentiation in traits associated with climatic variation, including freezing tolerance and flowering time (Kawakami et al., 2014;Tetreault et al., 2016). There have also been efforts to breed H. maximiliani (hereafter selected lines) as a source of seed oil by selecting for increased height, reduced shattering, and increased seed yield (Asselin et al., 2020). Here, we take a genomics approach to evaluate the factors contributing to evolutionary change among H. maximiliani seed sources to inform both conservation and restoration efforts into the future.
Specifically, we compare contemporary wild populations with seed from ex situ collections, seed commercially produced for restoration, and agronomically selected seed to (1) test for differences in genomic composition using ordination and metrics of genetic differentiation, (2) test whether isolation by distance can explain genomic differences among seed source populations, and (3) quantify the impact of sampling and selection across seed source types by comparing genomic summary statistics. With this third objective, we compare statistics that indicate how much genetic variation is maintained across seed sources as a metric of evolutionary potential, including expected heterozygosity (H e ), inbreeding coefficients (F is ), and linkage disequilibrium effective population size (LD-N e ). We also estimate and compare parameters that can be used to evaluate whether sampling effects or the impact of selection contribute to genomic differences across seed sources. This includes F is and LD-N e , in addition to coancestry (θ), and Tajima's D. Overall, this study serves as an important test of recent hypotheses that identify the role evolutionary processes can play throughout the collection, propagation, and implementation stages of conservation and restoration. Our results provide valuable guidance for the future collection and deployment of seed for restoration while identifying new avenues of research that can address the evolutionary consequences of seed collection and cultivation.

| Population sampling
To assess the impact of demographic variation and unintentional selection on the evolution of seed material used in restoration, we compared the genomic composition of contemporary wild populations to seed from three distinct sources: seed collected and/or cultivated by commercial suppliers for restoration, seed preserved in ex situ collections, and lines selected for agronomic traits (hereafter selected lines). Ex situ seed populations included in this study were sourced from the USDA National Genetic Resources Program (https://www.ars-grin.gov/). These bulk seed collections, designated by local provenance, were collected in North Dakota in September of 1991 (4 collections) and 1995 (2 collections). Ex situ seeds were bulk-harvested by clipping mature seed heads, following which seed heads were dried and cleaned prior to storage in a cold room at 4℃ with 25% humidity. Leaf tissue samples were collected from 20 randomly selected individuals from each population or seed collection (20 individuals × 13 sources = 260 total individuals) and stored in silica gel prior to DNA extraction.

| DNA sequencing and genotyping
We extracted DNA from ~10 mg of dried leaf tissue using a modi- Sequence files were demultiplexed using ipyrad version 0.9.12 (Eaton 2014) allowing for zero mismatches in the barcode region.
Following demultiplexing, single nucleotide polymorphisms (SNPs) were called across populations and seed sources using the dDocent v2.7.8 pipeline (Puritz et al., 2014a,b). In the first step of the pipeline, reads were trimmed using the program TRIMMOMATIC (Bolger et al., 2014), including the removal of low-quality bases and Illumina adapters. Following read trimming, the pipeline aligned reads to the Helianthus annuus v1.0 genome using BWA (Li & Durbin 2009). Sequence alignment was performed using the software's default parameters (a match score of 1, mismatch score of 4, and gap score of 6). Finally, dDocent called SNPs using the software FREEBAYES (Garrison & Marth 2012), which produced a VCF file with 4,735,557 total SNPs. Downstream SNP filtering of the VCF file first removed missing loci variants with conditional genotype quality (GQ) <20 and genotype depth <3. Then, loci with Phred scores (QUAL) ≤30, allele counts <3, minor allele frequencies <0.05, call rates across all individuals <0.9, mean depth across samples >154 (based on the equation from Li, 2014), and linkage scores >0.5 within a 10 kb window were removed. Following downstream filtering, 12,943 polymorphic loci were kept and used for subsequent analyses. Individuals with more than 30% missing genotypes were removed from the analysis. In total, 14 individuals from wild contemporary populations, two individuals from ex situ collections, and one individual from a commercial supplier were discarded, leaving a total of 363 genotyped individuals for inclusion in subsequent analysis (Table 1).
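The locus-level and individual-level thresholds listed above can be summarized in a short sketch. This is an illustration only, not the pipeline used in the study: the `LocusSummary` container and both function names are hypothetical, and the genotype-level GQ and depth step and the linkage pruning within 10 kb windows are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class LocusSummary:
    """Hypothetical per-locus summary; field names are illustrative."""
    qual: float          # Phred-scaled site quality (QUAL)
    allele_count: int    # count of the alternate allele
    maf: float           # minor allele frequency
    call_rate: float     # fraction of individuals with a called genotype
    mean_depth: float    # mean read depth across samples


def passes_locus_filters(locus: LocusSummary) -> bool:
    """Keep a locus only if it clears every threshold stated in the text."""
    return (
        locus.qual > 30             # loci with QUAL <= 30 were removed
        and locus.allele_count >= 3  # allele counts < 3 were removed
        and locus.maf >= 0.05        # MAF < 0.05 was removed
        and locus.call_rate >= 0.9   # call rates < 0.9 were removed
        and locus.mean_depth <= 154  # mean depth > 154 was removed
    )


def keep_individual(missing_fraction: float) -> bool:
    """Individuals with more than 30% missing genotypes were discarded."""
    return missing_fraction <= 0.30
```

The counts reported above are internally consistent: 14 + 2 + 1 = 17 discarded individuals, and 363 + 17 = 380 individuals before filtering.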

| Population structure and genetic differences between seed sources
To test for the effects of seed source on the genetic structure among H. maximiliani populations, we used principal component analysis (PCA) and discriminant analysis of principal components (DAPC) to partition the genetic variation observed in our sampling. Pairing these methods provides valuable insight as it allows comparison of a method agnostic to a priori expectations for population structure (PCA) to one which attempts to best depict differences between populations (DAPC). Additionally, while PCA allows for the visualization of individual axes which explain decreasing amounts of the total genomic variation, DAPC can combine and display variation across multiple axes of variation simultaneously. DAPC accomplishes this by first partitioning genetic variance using PCA and then using discriminant analysis to maximize interpopulation variation while minimizing intrapopulation variation. This allows DAPC to identify the axes of variation that simultaneously maximize between group differences and minimize within group differences (Jombart et al., 2010). Thus, DAPC will isolate and incorporate only those axes that contribute to differences between our populations, while PCA projects population groupings onto the major axes of variation.
TABLE 1 Geographic location, sample size, and year of harvest for Helianthus maximiliani seed sources
Note: n: number of individuals in each population retained for genetic analysis.
State Collected From: KS, Kansas; MN, Minnesota; ND, North Dakota; and SD, South Dakota.
Cultivated: N, seed collected from naturally growing stands and Y, grown in an agroecosystem for at least one generation prior to seed harvest.
We performed principal component analysis (PCA) on the matrix of SNPs used for all individuals in the study. Missing data (2.5% of all loci) were substituted with the mean allele frequencies at each locus.
We calculated the total variation explained by each axis by dividing the eigenvalue of each PCA axis by the sum of all eigenvalues. PCA was performed with the dudi.pca function within the ADEGENET package (Jombart 2008; Jombart & Ahmed 2011). We then plotted individuals along the first two PCA axes using the s.class function in the package ADEGRAPHICS (Siberchicot et al., 2017).
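The two data-handling steps in this paragraph, mean substitution of missing genotypes and the proportion of variation explained per axis, can be sketched as follows. This is a minimal illustration, not the ADEGENET workflow. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and that every locus has at least one called genotype. The function names are hypothetical.

```python
def impute_with_locus_mean(genotypes):
    """Replace missing calls (None) with the mean of the called values
    at the same locus. Rows are individuals, columns are loci."""
    n_loci = len(genotypes[0])
    imputed = [list(row) for row in genotypes]
    for j in range(n_loci):
        called = [row[j] for row in genotypes if row[j] is not None]
        locus_mean = sum(called) / len(called)  # assumes >= 1 called genotype
        for row in imputed:
            if row[j] is None:
                row[j] = locus_mean
    return imputed


def variance_explained(eigenvalues):
    """Proportion of total variation per axis: each eigenvalue divided
    by the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]
```

With this definition, an axis reported as explaining 4.0% of the variation is one whose eigenvalue is 0.04 of the eigenvalue sum.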
We first applied DAPC to the entire SNP dataset and then to a subset of the data including only individuals from wild contemporary and ex situ populations. Cross-validation identified the optimal number of PC axes (175 and 108, respectively) necessary to describe among population differences for analysis of all populations and for the ex situ and wild contemporary population comparison alone. We then retained 18 and 11 discriminant functions for depicting between group differences for all seed sources and ex situ and wild contemporary analysis, respectively. All DAPC analyses were performed using the R package ADEGENET (Jombart 2008).

| Isolation by distance in seed collections
Genomic differences across populations can arise from the independent evolution of populations connected by limited gene flow giving rise to isolation by distance (IBD). Patterns of neutral evolution produced by IBD could create genomic differences that are erroneously attributed to the effects of selection or sampling. For this reason, we tested for any correlation between F st calculated between two populations and the spatial distance between them.
Testing for IBD was also necessary due to the uneven spatial distribution of populations from different seed sources. If we identify a positive signal of IBD, then genomic differences could be due to the spatial arrangement of sampling, rather than any evolved differences. Therefore, it is necessary to test for IBD to confidently attribute genomic differences to environmental or sampling effects associated with different seed source types in our later analyses.
Pairwise genetic differences between populations were calculated as F st using Weir and Cockerham's method, which is unbiased with regard to differences in sample sizes (Weir & Cockerham 1984; Willing et al., 2012). Unlike DAPC, pairwise F st allows us to quantify the total genetic differences between population pairs. Importantly, we can compare the magnitude of genetic differences for populations of the same seed source type (e.g., two wild contemporary populations) to differences calculated between populations of different seed source types (e.g., wild contemporary population versus commercial population). Therefore, before testing for IBD, we confirmed that estimates of pairwise F st indicated the presence of genomic differences between seed sources and were similar to patterns of differentiation observed with DAPC and PCA. If evolved differences between populations have developed due to conditions associated with seed source type, we expect inter-source F st to be larger than intra-source F st . To test for differences in inter- and intra-source F st , we used a Wilcoxon rank sum test implemented with the function "wilcox.test" in R.
To test for effects of IBD and seed source types on pairwise F st , we first calculated the geographic distance between seed collections. Exact geographic location data were available for all wild contemporary and ex situ populations. Locations for commercial populations C-1, C-2, and C-3 were estimated as approximate locations based on descriptions of the counties, cities, reserves, and geographic features associated with the provenance for each collection. Provenance data were not available for the remaining two commercial populations (C-4 and C-5) and selected lines (S-1 and S-2), and therefore, these populations were not included in the analysis.
Geographic distances were calculated using the haversine formula, which accounts for the curvature of the earth (Robusto 1957), and then square root transformed to improve model fits. Distance measurements were made with the R package GEODIST.
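For readers unfamiliar with the haversine formula, the calculation can be sketched as below. This is an illustrative stand-alone version rather than the GEODIST implementation. The Earth radius of 6371 km and the choice of kilometres as the unit are assumptions, since the text does not state the units used before the square root transformation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Guard against rounding pushing a marginally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def sqrt_distance(lat1, lon1, lat2, lon2):
    """Square-root-transformed distance, as used for the model predictor."""
    return math.sqrt(haversine_km(lat1, lon1, lat2, lon2))
```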
The relationship between F st , distance, and seed source types used to calculate F st was evaluated using model selection with a series of linear mixed models. In these models, F st was expressed as a function of spatial distance (fixed effect) and a factorial variable coded for the different pairwise seed source comparisons with random slopes and intercepts. The variable for seed source comparisons required six levels in total, three for each of the intra-source comparisons (wild to wild, ex situ to ex situ, and commercial to commercial) and three for each of the inter-source comparisons (wild to ex situ, wild to commercial, and ex situ to commercial). We compared the full model which included both spatial distance and seed sources to two reduced models each of which included only one of the terms. A likelihood-ratio test, implemented with the lrtest function in the package LMTEST (Zeileis & Hothorn 2002), was used to identify significant differences between the full and reduced models. When the full and reduced models were significantly different, we chose the model with the greatest log-likelihood value as the model with the best fit.
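The six-level comparison factor described above can be built by labelling every population pair with the seed source types of its two members. The sketch below is illustrative only. The function names are hypothetical, and the wild population identifiers in the usage note are placeholders, since only some population codes appear in the text.

```python
from itertools import combinations


def comparison_level(source_a, source_b):
    """Order-independent label for a pair of seed source types."""
    first, second = sorted((source_a, source_b))
    return f"{first} to {second}"


def is_intra_source(source_a, source_b):
    """True when both populations belong to the same seed source type."""
    return source_a == source_b


def label_pairs(population_sources):
    """Map every unordered population pair to its comparison level."""
    return {
        (pop_a, pop_b): comparison_level(population_sources[pop_a],
                                         population_sources[pop_b])
        for pop_a, pop_b in combinations(sorted(population_sources), 2)
    }
```

With three source types, each represented by at least two populations, this produces exactly six levels: three intra-source and three inter-source. For example, `label_pairs({"W-1": "wild", "W-2": "wild", "ES-E": "ex situ", "C-1": "commercial"})` labels the pair `("W-1", "W-2")` as `"wild to wild"`.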

| Signatures of sampling and selection
To ascertain the importance of sampling effect and selection in contributing to differences among seed sources, we calculated expected heterozygosity (H e ), inbreeding coefficients (F is ), linkage disequilibrium effective population size (LD-N e ), and coancestry coefficients (θ). Expected heterozygosity (H e ) and inbreeding coefficients (F is ) were calculated individually for each SNP using the R package We compared H e and F is across seed source types using linear mixed models with seed source type as a fixed effect and population as a random effect. The significance of individual terms and post hoc tests were performed with the R package LMERTEST (Kuznetsova et al., 2017). Differences between LD-N e and θ among wild contemporary, ex situ, and commercial seed sources were compared using linear models implemented with the lm function. Selected populations were not included in linear models due to lack of replication (n = 2).
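As a point of reference for the per-SNP statistics named above, expected heterozygosity and the inbreeding coefficient for a biallelic locus can be sketched as follows. This is a textbook formulation and not necessarily the estimator implemented in the package used for the study. It assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and applies no small-sample correction.

```python
def expected_heterozygosity(genotypes):
    """H e = 2p(1 - p) for a biallelic SNP, with p the alternate allele frequency."""
    called = [g for g in genotypes if g is not None]
    p = sum(called) / (2 * len(called))
    return 2 * p * (1 - p)


def observed_heterozygosity(genotypes):
    """H o = fraction of called genotypes that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


def inbreeding_coefficient(genotypes):
    """F is = 1 - H o / H e; undefined (None) at monomorphic loci."""
    h_e = expected_heterozygosity(genotypes)
    if h_e == 0:
        return None
    return 1 - observed_heterozygosity(genotypes) / h_e
```

Under this definition, a locus with fewer heterozygotes than expected gives positive F is, and an excess of heterozygotes gives negative F is.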
To test for signatures of selection or bottlenecks across seed source types, we calculated Tajima's D for each population (Tajima 1989). We also note here that plotting per-locus F st as a function of H e revealed that the data were depauperate in low H e , high F st loci (Figure 2). This pattern has been found in other work and matches the expected relationship for these variables when drift and selection contribute similarly to evolutionary differentiation (Narum & Hess 2011). When drift and selection similarly impact the genome, accounting for neutral differentiation in outlier analysis could increase type II error, while failing to account for it would increase type I error. As a result, outlier analyses are not expected to yield reliable results and were therefore not considered in this manuscript.
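For orientation, Tajima's D contrasts two estimators of diversity: mean pairwise differences (π) and the number of segregating sites (S) scaled by a sample-size constant. The sketch below follows the standard formulation of Tajima (1989) for n sampled sequences. It illustrates the statistic itself and is not the software or settings used in this study, which are not specified in the text above. The function name is hypothetical.

```python
import math


def tajimas_d(n, segregating_sites, pi):
    """Tajima's D from sample size n (sequences), number of segregating
    sites S, and mean pairwise differences pi. Returns None when the
    statistic is undefined."""
    if n < 3 or segregating_sites < 1:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = segregating_sites
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(variance)
```

D is zero when the two estimators agree. It becomes negative when rare alleles are in excess, so that π falls below S/a1, and positive when intermediate-frequency alleles are in excess.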
To visualize the frequency of rare alleles and overall genetic diversity across seed source types, we constructed a folded site frequency spectrum (SFS) for each seed source, with the exception of the selected lines. SFSs were estimated from the filtered SNP dataset (12,943 variants, 363 individuals) using the set of R functions available at https://github.com/shenglin-liu/vcf2sfs. Individuals from populations classified as the same seed source type (Table 1) were pooled together to generate seed source-specific allele frequency profiles.
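The folding step can be illustrated with a short sketch. It is not the vcf2sfs code. It assumes every locus has complete data for the same number of sampled chromosomes (twice the number of pooled diploid individuals), so missing genotypes would need to be handled before this step. The function name is hypothetical.

```python
def folded_sfs(alt_allele_counts, n_chromosomes):
    """Folded site frequency spectrum.

    alt_allele_counts: alternate-allele count at each locus.
    n_chromosomes: number of sampled chromosomes (2 x diploid individuals).
    Returns a list whose entry k - 1 is the number of loci with minor
    allele count k, for k = 1 .. n_chromosomes // 2.
    """
    spectrum = [0] * (n_chromosomes // 2)
    for count in alt_allele_counts:
        # Folding: the minor allele is whichever allele is rarer.
        minor = min(count, n_chromosomes - count)
        if minor == 0:
            continue  # monomorphic in this pool, not part of the spectrum
        spectrum[minor - 1] += 1
    return spectrum
```

Because the spectrum is folded, an allele carried on 1 of 8 chromosomes and one carried on 7 of 8 fall in the same class, so no ancestral state is required.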

| Population structure and genetic differences between seed sources
In total, 363 individuals and 12,943 SNP loci passed our filtering requirements. PCA of the entire dataset required 110 axes to explain over 50% of the total genetic variation across all seed source types.
A total of 4.0% of the total genetic variation was explained by the first principal component axis, which differentiated the two selected populations from all other seed sources (Figure 3A). The second axis explained 2.2% of the total genetic variation and separated ex situ population ES-E and commercial populations C-2 and C-5 on either end of the axis and from all remaining populations at the center.
When genetic variation was partitioned for all seed sources using DAPC, the first two axes explained 32.2% and 18.1% of genetic variation, respectively (Figure 3B). These values are considerably greater than PCA axes because DAPC incorporates and depicts the relationships across multiple axes of variation simultaneously.
DAPC, which also attempts to maximize differences between pre-

| Isolation by distance in seed collections
Pairwise F st ranged from −0.001, between commercial populations C-2 and C-5, to 0.238 between ex situ population ES-E and selected population S-1 (Figure S2). Negative F st likely arises due to similarity between populations and the infrequent presence of polymorphic loci at otherwise monomorphic sites (Roesti et al., 2012).
Pairwise  Table 2). The model using only distance as an independent variable was not significantly different from the full model.
Wild contemporary and commercial seed populations spanned a wide range of LD-N e estimates in comparison with ex situ and selected populations (Figure 4C). Despite these trends, we did not observe significant differences in LD-N e between seed source types (F 2,14 = 3.3, p = 0.069). Patterns of coancestry across seed source types mirrored those for LD-N e , and the linear model comparing the effect of seed source was significant (F 2,14 = 3.6, p = 0.037).
Although wild contemporary and commercial populations had lower θ (range 0.001-0.022) than ex situ and selected populations (0.011-0.042), post hoc comparison revealed only ex situ and commercial populations were significantly different (Figure 4D).
Genome-wide estimates of Tajima's D were not significantly different from neutral expectations for any population, including selected lines (Table S1). Nonetheless, Tajima's D estimated for commercial populations was significantly greater than that estimated for wild contemporary populations (analysis of variance: F 2,14 = 3.82, p = 0.048; Tukey post hoc test wild and commercial: p = 0.047) (Figure 4E), suggesting differences in the site frequency spectrum among populations for these two seed sources. The shape of the folded SFS for wild, ex situ, and commercial seed sources was similar, with few rare alleles, a peak at a frequency around 0.09, and a gradual decline at higher frequencies (Figure 5). The SFS for commercial genotypes could be distinguished from ex situ and wild genotypes by a higher abundance of the rarest alleles.

| DISCUSSION
An overarching goal of both restoration and conservation is to maintain evolutionary potential to ensure populations sustain the ability to adapt to change (Hoffmann & Sgro 2011).
However, for both ex situ conservation collections and seed propagated for restoration, the efficacy of these goals may be dependent upon the amount and type of genetic variation maintained in populations. Sampling effects and genetic bottlenecks associated with seed collection and selection during propagation can create genotypic differences between seed source types. Using a genomic dataset assembled from wild contemporary, commercial, ex situ, and selected populations of H. maximiliani, we tested for the presence of genomic differences that could be attributed to seed source type. We found evidence that commercial seed and selected lines were genetically differentiated from wild and ex situ collections.
These differences could not be explained by neutral evolutionary processes, such as isolation by distance, implicating other explanations for genomic differences among seed sources. While we did not find direct evidence that selection caused genomic differentiation between seed sources, increased coancestry and low LD-N e in ex situ collections were consistent with an impact of sampling during seed collection. The varying genomic composition of commercial seed sources relative to wild, contemporary populations suggests further study is required to evaluate whether genomic differences correspond to functional differences that impact restoration success.
Common garden studies have shown that seed transfer across environments can impact plant traits and performance (Bucharova et al., 2017;Giencke et al., 2018;Johnson et al., 2004;Lesica & Allendorf 1999;Yoko et al., 2020). Consequently, the genomic differences we observe here warrant additional study to link genomic differences in H. maximiliani seed sources to functional traits and persistence in restored environments. Our work also suggests that similar studies in other plant species are warranted to better understand how restoration seed inputs can evolve and the degree to which genomic differences among seed sources could impact restoration success.
Evolution in commercially produced seed material could be caused by population bottlenecks during initial collection or during cultivated propagation (Espeland et al., 2017; Pizza et al., 2021). A common garden study comparing commercial seed and wild collected seed found fitness was reduced in commercial seed, regardless of whether the environment was stressful or not, consistent with the expectations of a population bottleneck reducing fitness (Pizza et al., 2021). The potential of sampling to affect the genomic composition of collections has also been a concern in ex situ conservation. Population bottlenecks can co-occur with selection caused by the application of fertilizers and pesticides, the use of machinery to harvest seed, or an unnatural biotic environment (Espeland et al., 2017). local practices in commercial production or that suppliers are pulling from similar genetic resources. We did not find robust evidence for selection as a cause for the differences between wild and commercial seed, which suggests it is more likely that seed suppliers are using similar genetic stock. Regardless of the reason for this effect, similarity in commercial seed does not match most restoration goals, which attempt to balance high genetic diversity with the need for locally adapted seed inputs (Hufford & Mazer 2003; McKay et al., 2005), a problem compounded when commercial seed is not a close analogue to wild populations. Commercial seed is used for restoration because the necessary volume of seed cannot be sustainably harvested from wild populations (Broadhurst et al., 2008).
If there are few H. maximiliani populations of appropriate size for harvesting seed within different regions, it would then be unsurprising if different commercial suppliers obtained and mixed germplasm from the same wild sources. While we do not know the fitness of commercial seed relative to wild genotypes, the genomic dissimilarity between commercial and wild seed warrants greater communication between seed suppliers and restoration practitioners to understand the potential causes of differences observed.
Genomic differences between H. maximiliani populations were not correlated with geographic distances and do not appear to demonstrate patterns of IBD. In natural populations, genomic differences are expected to increase in response to increasing spatial distance and a corresponding reduction in gene flow among populations (Slatkin 1993; Wright 1943). The absence of IBD in our data could have multiple explanations. First, there could be sufficient gene flow to connect H. maximiliani populations across the largest spatial scales included in our analysis, but substantial gene flow should also homogenize the genomic variation between populations. This does not correspond to the results of our DAPC analysis, which was able to partition genomic variation, not just at the scale of seed source types, but at the level of individual populations. An alternative cause for the lack of IBD could be rates of gene flow near zero, such that every population is functionally isolated, negating the effect of distance. Although fragmentation of prairie habitat in North America has indeed increased the isolation among plant populations (Samson et al., 2004; Wimberly et al., 2018), the complete cessation of gene flow across populations has not been observed in other species. In the grass Festuca hallii, distance was still correlated with genetic variation across the same geographic region considered in our study (Qiu et al., 2009). Although grasses and sunflowers differ in their pollination ecology and methods of seed dispersal, these patterns of differentiation in F. hallii suggest it is unlikely that prairie plant populations are so isolated that geographic distance has no effect on population structure. Rather, given the structure of our analysis, it is more probable that seed source differences disrupted patterns of IBD and more strongly predicted dif- found that molecular evidence of evolution was not apparent in two (Nagel et al., 2019).
Species that were perennial or outcrossing, such as H. maximiliani, were also less likely to exhibit evidence of selection. Thus, although we uncovered multiple ways in which commercial and wild populations differ, the life history and mating system of H. maximiliani may have buffered against evolutionary change during commercial production. Overall, the genomic differences between commercial and wild populations do not appear to be driven by selection during cultivation, a phenomenon which might be more common in plant species with shorter life cycles or higher rates of selfing.
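The isolation-by-distance (IBD) expectation discussed above, that pairwise genomic distance increases with geographic distance, is commonly evaluated with a Mantel permutation test. The following is a minimal sketch under stated assumptions, not necessarily the analysis used in this study: the function name `mantel_test`, the simulated coordinates, and the toy genetic distances are all hypothetical and serve only to illustrate what a positive IBD signal looks like.

```python
import numpy as np

def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Permutation Mantel test between two square, symmetric distance matrices.

    Returns the Pearson correlation of the upper-triangle entries and a
    one-sided p-value for a positive association.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("both inputs must be square matrices of equal size")

    iu = np.triu_indices_from(a, k=1)          # each population pair once
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]

    rng = np.random.default_rng(seed)
    n = a.shape[0]
    n_extreme = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                 # relabel populations
        a_perm = a[np.ix_(p, p)]               # permute rows and columns together
        r_perm = np.corrcoef(a_perm[iu], b[iu])[0, 1]
        if r_perm >= r_obs:
            n_extreme += 1
    p_value = (n_extreme + 1) / (n_perm + 1)
    return r_obs, p_value

# Hypothetical example: 10 populations at random coordinates (km).
rng = np.random.default_rng(42)
xy = rng.uniform(0, 500, size=(10, 2))
geo = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)

# Assumed toy genetic distances that scale exactly with geographic distance,
# i.e. an idealized IBD pattern.
gen_ibd = 0.001 * geo
r_ibd, p_ibd = mantel_test(gen_ibd, geo)
print(f"Mantel r = {r_ibd:.3f}, p = {p_ibd:.3f}")
```

Under a pattern like the one reported here, where seed source rather than distance structures the data, the correlation would be expected to be weak and the permutation p-value non-significant.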
We found significant differences in coancestry between ex situ and commercial seed sources. Ex situ populations also had lower LD-Ne than commercial populations, and although this comparison was not significant, high coancestry should coincide with higher rates of linkage disequilibrium and lower LD-Ne. Low LD-Ne and higher coancestry without corresponding increases in FIS could reflect the sampling methods used to establish these collections. Alleles are more likely to be identical by descent in populations with greater coancestry and are less likely to represent the uniform sampling of large populations (Cavalli-Sforza & Bodmer 1971). In ex situ seed collections, high coancestry and low LD-Ne could result from sampling large quantities of seed from a relatively small number of maternal individuals. Sampling in this manner would not immediately reduce He or increase FIS in a self-incompatible species prior to sexual reproduction (Allendorf 1986; Leberg 1992), but would increase coancestry and lower LD-Ne because of the large number of half-siblings represented in the collection. The difference between commercial and ex situ collections may imply that commercial seed provides a superior resource by harboring greater genotypic diversity. Whether or not this is true likely depends on the specific goal of the collection.
For example, high coancestry could be mitigated if multiple ex situ collections are mated prior to deployment in the wild. Additionally, genomic clustering analyses indicate ex situ collections are closer analogues to contemporary wild populations and could be superior resources for restoration if the genotypic differences depicted in our analysis correlate with functional differences. Additional work to evaluate the consequences of high coancestry and of genomic differences from wild populations is therefore needed, and will be essential for applying our results to restoration practice.
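The sampling explanation above, that many seeds from few maternal plants raises coancestry, can be illustrated with a short calculation. This is a minimal sketch under simplifying assumptions that are not taken from the study: mothers are unrelated and non-inbred, every seed from the same mother has a different unrelated father (so seeds sharing a mother are half-siblings with coancestry 1/8), and seeds from different mothers are unrelated. The function name `mean_coancestry` and the collection sizes are hypothetical.

```python
def mean_coancestry(n_mothers, seeds_per_mother, theta_halfsib=0.125):
    """Mean pairwise coancestry among seeds in a collection.

    Assumes (for illustration only) that seeds sharing a mother are
    half-siblings and that all other seed pairs are unrelated.
    """
    n_seeds = n_mothers * seeds_per_mother
    if n_seeds < 2:
        raise ValueError("need at least two seeds to form a pair")
    # Number of seed pairs that share a mother.
    same_mother_pairs = n_mothers * seeds_per_mother * (seeds_per_mother - 1) / 2
    # Number of all distinct seed pairs in the collection.
    total_pairs = n_seeds * (n_seeds - 1) / 2
    return theta_halfsib * same_mother_pairs / total_pairs

# Two hypothetical collections of 1000 seeds each.
few_mothers = mean_coancestry(n_mothers=10, seeds_per_mother=100)
many_mothers = mean_coancestry(n_mothers=500, seeds_per_mother=2)
print(f"10 mothers x 100 seeds: {few_mothers:.5f}")
print(f"500 mothers x 2 seeds:  {many_mothers:.5f}")
```

With collection size held constant, concentrating the sample on fewer mothers increases the share of half-sibling pairs and therefore mean coancestry. If some seeds from one mother also share a father, the true value would be higher than this sketch suggests.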
The production of seed for restoration and conservation includes an inherent conflict between maintaining the genomic composition of wild populations and supplying large volumes of seed (Broadhurst et al., 2008; Espeland et al., 2017). In addition to these challenges, the goals of conservation are themselves sometimes in conflict: populations should remain locally adapted to contemporary environments while also holding as much genetic diversity as possible to buffer against future environmental challenges (Bucharova et al., 2017; Hamilton et al., 2020). The loss of genetic diversity and evolution of functional traits during cultivation is thus a major concern for restoration efforts. In our comparison of commercial and wild H. maximiliani collections, we did not find evidence of selection or reduced genetic variation in commercial seed, but we did observe significant differences in their genotypic composition. Additionally, the surprising genomic similarity of commercial seed sourced from the same region is evidence for a homogenizing factor either during seed collection or cultivation. High similarity across commercial seed inputs is at odds with the goal of maximizing genetic diversity while maintaining local adaptation and has the potential to reduce the efficacy of restoration in the short and long term (McKay et al., 2005). Given the species-specific evolutionary consequences of cultivation (Nagel et al., 2019), it is also possible that other seed inputs which are less buffered against the genomic effects of selection, due to their life history or mating strategies, will exhibit increased differentiation from wild populations during commercial production (Ballesteros-Mejia et al., 2016; Hamrick et al., 1979). Additional study is needed to compare trait variation and the contribution of H. maximiliani to ecosystem services between wild and commercial seed across varied restored habitats.
Furthermore, to fully integrate the consequences of our study for restoration, similar work comparing plant species commonly used in restoration will be important for generalizing these results. Until this work can be performed, increased collaboration between producers and users of commercial seed is needed to better understand the effects of provenance, individual methods of harvest, and cultivation on seed material needed to best meet restoration goals.