A rangewide herbarium‐derived dataset indicates high levels of gene flow in black cherry (Prunus serotina)

Abstract Isolation by Distance (IBD) is a genetic pattern in which populations geographically closer to one another are more genetically similar to each other than populations that are farther apart. Black cherry (Prunus serotina Ehrh.) (Rosaceae) is a forest tree species widespread in eastern North America, and found sporadically in the southwestern United States, Mexico, and Guatemala. IBD has been studied in relatively few North American plant taxa, and no study has rigorously sampled across the range of such a widespread species. In this study, IBD and overall genetic structure were assessed in eastern black cherry (P. serotina Ehrh. var. serotina), the widespread variety of eastern North America. Dense sampling across the entire range of eastern black cherry was made possible by genotyping 15 microsatellite loci in 439 herbarium samples from all portions of the range. Mantel tests and STRUCTURE analyses were performed to evaluate the hypothesis of IBD and genetic structure. Mantel tests demonstrated significant but weak IBD, while STRUCTURE analyses revealed no clear geographic pattern of genetic groups. The modest geographic/genetic structure across the eastern black cherry range suggests widespread gene flow in this taxon. This is consistent with P. serotina's status as a disturbance‐associated species. Further studies should similarly evaluate IBD in species characteristic of low‐disturbance forests.

analyzed large sets of samples, these are often drawn from a relatively small number (<100) of locations (Campitelli & Stinchcombe, 2013; Griffin & Barrett, 2004; Hoban et al., 2010; Mylecraine, Kuser, Smouse, & Zimmermann, 2004; Parker, Hamrick, Parker, & Stacy, 1997; Waselkov & Olsen, 2014) and/or from a subset of the total species range (Dennhardt, DeKeyser, Tennefos, & Travers, 2016; Hadziabdic, 2010; Lloyd, Roche, & Roberts, 2011; Yakimowski & Eckert, 2008). This limited sampling reflects feasibility constraints, as field collecting material across large spatial scales is an expensive and extremely time-consuming task (planning routes, obtaining permits, fieldwork, preparing voucher specimens, permit reporting). Field collecting is therefore often done in a highly non-random manner, with sampling confined to a few groups of geographically clustered populations. As a result, our current understanding of the geographic structure of genetic variation in widespread North American plants is based on uneven sampling designs.

found no signal in North American Arabidopsis thaliana (L.) Heynh. (p = 0.99) with amplified fragment length polymorphisms (AFLPs), and Kirk, Paul, Straka, and Freeland (2011) and Stabile et al. (2016) found no signal of IBD in North American Phragmites australis (Cav.) Trin. ex Steud. (p = 0.979 and 0.62, respectively) using microsatellite markers. Durka, Bossdorf, Prati, and Auge (2005) did find a significant but weak signal of IBD in Alliaria petiolata Cavara & Grande (p = 0.002, R² = 0.043) with microsatellite loci. This general lack of IBD in exotic species is not surprising given the potential for multiple introductions of genetically variable material to the non-native range, and the propensity for widespread non-native species to exhibit frequent long-range gene flow.
Previous research therefore suggests that IBD is to be expected in native North American plants, and this study aims to test this basic hypothesis of genetic structure in eastern black cherry (Prunus serotina Ehrh. var. serotina), a widespread eastern North American forest tree that is important ecologically as a wildlife food source (Thompson & Willson, 1979) and as timber (Auclair & Cottam, 1971).
Each year over 7 million board feet of black cherry are harvested (Howard & Westby, 2013), largely for use as a veneer in furniture making (Gatchell, 1971). [...] McVaugh from Texas to Guanajuato (McVaugh, 1951).
As described above, obtaining a geographically representative set of eastern black cherry samples would not be feasible through traditional fieldwork. In this study, we bypass this time limitation through the extensive use of museum tissue. Black cherry is commonly collected, and thousands of specimens are archived in North American herbaria. Specimen locality data allow for the rapid choice of a geographically representative set of specimens from which a genetic dataset can be obtained.

| Obtaining samples
Our group personally visited 21 herbaria (obtaining 373 specimens) and requested tissue from 13 herbaria (131 specimens) in 2014, 2015, and 2016 (see Acknowledgments and Table S1 in Appendix S1). Following visits to large national institutions, specific regional and local collections were targeted in order to address geographic gaps in our sample set. We attempted to choose one specimen per county, generally avoiding adjacent counties. All identifications were verified, with a positive identification for P. serotina var. serotina involving two or more of the following characteristics when possible: small orange hairs at the base of the midrib, (hooked) serrated leaf margins, leaves more than 2.5× as long as wide, and persistent sepals when in fruit. One or two small leaves were removed when sufficient material was present, and labels noting that material was removed for DNA extraction were affixed to all sampled sheets. All specimens were georeferenced with Google Earth (Google Inc, 2009) and EarthPoint (Clark, 2015) software. All specimen locality data can be found in Dryad (https://doi.org/10.5061/dryad.r1f08nt).
Amplicons were sized at the University of Chicago Comprehensive Cancer Center DNA Sequencing and Genotyping Facility, and alleles were scored with GeneMarker 1.9 (SoftGenetics, 2012). As a quality control measure, 20% of samples were both re-extracted and re-amplified, and an additional 14% were re-amplified from existing extractions. All genotypes included in the study can be found in Dryad (https://doi.org/10.5061/dryad.r1f08nt).

| Genotyping success assessment
To determine whether specimen age or curatorial conditions affected genotyping success, we explored the relationship between the number of loci genotyped and both the age of the specimen and its herbarium of origin. A generalized linear model ("glm" function) implemented in R (R Core Team, 2015) was constructed with the "poisson" distribution argument because the number of loci genotyped is not normally distributed. A likelihood ratio test comparing nested models ("drop1" function with the "Chisq" test argument) was then used to detect any significant relationship between specimen age, herbarium, and number of loci genotyped.
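The model comparison described above can be illustrated with a small, self-contained sketch. The authors worked in R with "glm" and "drop1"; the Python below is not their code. It assumes a single continuous predictor (specimen age), leaves out the herbarium factor, and uses hypothetical function names and invented example values. It fits a Poisson regression with a log link by Newton-Raphson and compares it with an intercept-only model through a likelihood ratio test on one degree of freedom.

```python
import math


def fit_poisson_slope(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu) = a + b * (x - mean(x)) by Newton-Raphson.

    Returns (b, log_likelihood). The log(y!) constant is dropped because it
    cancels in likelihood ratios. Assumes x is not constant and sum(y) > 0.
    """
    n = len(y)
    x_mean = sum(x) / n
    xc = [xi - x_mean for xi in x]          # centring does not change the slope
    a, b = math.log(sum(y) / n), 0.0        # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in xc]
        # Score vector (gradient of the log-likelihood).
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum(xi * (yi - mi) for xi, yi, mi in zip(xc, y, mu))
        # Fisher information matrix (2 x 2).
        i_aa = sum(mu)
        i_ab = sum(xi * mi for xi, mi in zip(xc, mu))
        i_bb = sum(xi * xi * mi for xi, mi in zip(xc, mu))
        det = i_aa * i_bb - i_ab * i_ab
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    loglik = sum(yi * (a + b * xi) - math.exp(a + b * xi)
                 for xi, yi in zip(xc, y))
    return b, loglik


def poisson_lrt(x, y):
    """Likelihood ratio test of the slope; returns (slope, LR statistic, p)."""
    y_mean = sum(y) / len(y)
    ll_null = sum(yi * math.log(y_mean) - y_mean for yi in y)
    slope, ll_full = fit_poisson_slope(x, y)
    lr = max(0.0, 2.0 * (ll_full - ll_null))
    # Chi-square survival function for 1 degree of freedom.
    p = math.erfc(math.sqrt(lr / 2.0))
    return slope, lr, p


# Invented illustration: specimen ages (years) and loci genotyped (of 15).
ages = [5, 10, 20, 30, 40, 60, 80, 100]
loci = [15, 15, 14, 13, 11, 8, 5, 3]
print(poisson_lrt(ages, loci))
```

A categorical predictor such as herbarium of origin would need one indicator column per level and a test with more than one degree of freedom, which this sketch does not cover.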

| Mantel tests and structure analyses
Isolation by Distance was evaluated using Mantel tests (Mantel, 1967), which assessed the overall correlation between matrices of genetic and geographic distances among samples. The "≥12-locus dataset" was chosen for the remainder of our analyses to strike a balance between missing data and sample number.
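As a rough illustration of the Mantel procedure, the following Python sketch correlates the upper triangles of a genetic and a geographic distance matrix and assesses significance by jointly permuting the rows and columns of the genetic matrix. It is a minimal sketch, not the authors' implementation: the function names, the one-sided test, the 999 permutations, and the great-circle helper are assumptions, and the genetic distance measure is left to the caller because it is not specified in this excerpt.

```python
import math
import random


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def mantel(gen, geo, permutations=999, seed=0):
    """One-sided Mantel test for a positive correlation; returns (r, p).

    gen and geo are symmetric n x n distance matrices (lists of lists).
    """
    n = len(gen)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo_vec = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel samples in the genetic matrix
        perm_vec = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(perm_vec, geo_vec) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Invented illustration: eight samples along a west-east transect.
coords = [(35.0, -90.0 + 2 * k) for k in range(8)]
geo = [[haversine_km(a, b) for b in coords] for a in coords]
gen = [[d / 1000.0 for d in row] for row in geo]  # perfectly distance-linked
print(mantel(gen, geo))
```

Permuting whole samples, rather than individual matrix cells, preserves the dependence among the pairwise distances that share a sample, which is the reason an ordinary correlation test is not used here.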
Mantel tests were also conducted on subsets of the ≥12-locus dataset to rule out effects of specimen age and site quality. These subsets included an "old" dataset (individuals collected >33 years ago), a "young" dataset (individuals collected <33 years ago), and a "quality" dataset (individuals collected from relatively ecologically intact locations in which disturbance was minimal). "Quality" locations were those with relatively specific location information which did not note obviously anthropogenically altered sites (roadsides, agricultural landscapes, etc.). A Mantel correlogram analysis (Oden & Sokal, 1986), which tests for correlation between genetic and geographic distances among samples within given geographic distance classes, was also conducted.

at K = 2 to K = 10, with each iteration featuring 100,000 burn-in generations followed by 500,000 data collection generations. All STRUCTURE analyses assumed the admixture and independent allele frequency models. The approach outlined in Evanno, Regnaut, and Goudet (2005) and implemented in Structure Harvester (Earl & vonHoldt, 2012) was used to find the most likely value of K.
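The ΔK statistic of Evanno et al. (2005) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Structure Harvester code: the function name and input layout are hypothetical, the log-likelihood values are invented, and ΔK is computed here as the absolute second difference of the mean log-likelihoods divided by the standard deviation across replicate runs at that K.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute delta K from replicate log-likelihoods.

    lnp_by_k maps each K to a list of ln P(D) values, one per replicate run.
    Delta K is undefined for the smallest and largest K tested, because the
    second difference needs both neighbours.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        # Identical replicates give sd = 0; report infinity instead of failing.
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Invented illustration with three replicate runs per K.
runs = {
    2: [-1000.0, -1002.0, -998.0],
    3: [-900.0, -901.0, -899.0],
    4: [-890.0, -895.0, -885.0],
    5: [-885.0, -880.0, -890.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

For runs spanning K = 2 to K = 10, as described above, this calculation yields ΔK values only for K = 3 through K = 9.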

suggests that IBD has not changed qualitatively in this species over the past century, and that the modest signal we observed is not an artifact of comparing samples representing different collection periods. Similarly, the signal observed with the "quality" ≥12-locus dataset suggests that low observed IBD is not a product of over-sampling individuals from anthropogenic sites. Herbarium collections are strongly biased toward roadsides and other human-accessible areas (Daru et al., 2018), a fact that should be considered when choosing specimens for landscape genetic study. Consistent with this modest relationship between genetic similarity and geographic distance, STRUCTURE suggests that few samples are strongly assigned to any of the optimal set of genetic groups. It is important to emphasize that our dataset differs from those of most studies in that we utilized a "many populations, one individual per population" (inter-individual) strategy, as opposed to one featuring a "few populations, many

The apparently powerful homogenizing force of gene flow across P. serotina var. serotina's range is consistent with its life history. The species is fast-growing (Auclair, 1975), pollinated by generalist insects (Fortuna, García, Guimarães, & Bascompte, 2008; Guitian, Guitian, & Sanchez, 1993; Jacobs et al., 2009; Lander et al., 2013), and has seeds that are potentially dispersed via ornithochory (Marks, 1974; Thompson & Willson, 1979). It is also often found in ruderal habitats, many of which are likely dispersal corridors (road margins, fencerows, railroads). Indeed, P. serotina var. serotina has become a highly problematic invasive species in Europe (Pairon et al., 2010).
The Floristic Quality Assessment (FQA) method incorporates a coefficient of conservatism assigned to each species based on its tolerance of anthropogenic disturbance, with these "C-values" ranging from 0 (high tolerance) to 10 (low tolerance) (Freyman, Masters, & Packard, 2016). [...] Juglans cinerea, C = 6.6; Cornus florida, C = 5.8) (Hadziabdic, 2010; Hoban et al., 2010; Victory et al., 2006).

| Implications and future directions
If the low level of IBD observed here indeed results from eastern black cherry's life history, this result requires context before any generalization is made regarding widespread North American forest trees. Fortunately, similar rangewide inter-individual studies of widespread woody plants typical of higher quality habitats could be easily undertaken given the efficiency of the herbarium sampling strategy and the availability of microsatellite markers.
For example, a set of loci developed for Liriodendron tulipifera L.

CONFLICT OF INTEREST
The authors have identified no potential conflicts of interest.

AUTHOR CONTRIBUTIONS
J.B. and J.S. conceived the ideas; L.K. and J.B. collected the data; L.K. and J.B. analyzed the data; and L.K. and J.B. led the writing.

DATA ACCESSIBILITY
All microsatellite genotypes and specimen locality data can be found in Dryad: https://doi.org/10.5061/dryad.r1f08nt