Demographic history and gene flow in the peatmosses Sphagnum recurvum and Sphagnum flexuosum (Bryophyta: Sphagnaceae)

Abstract Population size changes and gene flow are processes that can have significant impacts on evolution. The aim of this study was to investigate the relationship of geography to patterns of gene flow and population size changes in a pair of closely related Sphagnum (peatmoss) species: S. recurvum and S. flexuosum. Both species occur in eastern North America, and S. flexuosum also occurs in Europe. Genetic data from restriction‐site‐associated DNA sequencing (RAD‐seq) were used in this study. Analyses of gene flow were accomplished using coalescent simulations of site frequency spectra (SFSs). Signatures of gene flow were confirmed by f 4 statistics. For S. flexuosum, genetic diversity of plants in glaciated areas appeared to be lower than that in unglaciated areas, suggesting that glaciation can have an impact on effective population sizes. There is asymmetric gene flow from eastern North America to Europe, suggesting that Europe might have been colonized by plants from eastern North America after the last glacial maximum. The rate of gene flow between S. flexuosum and S. recurvum is lower than that between geographically disjunct S. flexuosum populations. The rate of gene flow between species is higher among sympatric plants of the two species than between currently allopatric S. flexuosum populations. There was also gene flow from S. recurvum to the ancestor S. flexuosum on both continents which occurred through secondary contact. These results illustrate a complex history of interspecific gene flow between S. flexuosum and S. recurvum, which occurred in at least two phases: between ancestral populations after secondary contact and between currently sympatric plants.

Gene flow, the movement of genetic material between individuals from differentiated populations, can act as a homogenizing force among partially divergent species. In some instances, however, gene flow can augment genetic diversity within populations and provide potentially adaptive alleles or even promote the speciation process (Abbott et al., 2013;Morjan & Rieseberg, 2004;Richards & Martin, 2017;Slatkin, 1987;Suarez-Gonzalez et al., 2018). Rates of gene flow can obviously be affected by physical distances; proximate individuals are more likely to exchange genetic material than more distant ones. Gene flow can occur between populations within the same species (intraspecific gene flow) and between populations of different species (interspecific gene flow). Since individuals from different species usually have some degree of reproductive isolation, interspecific gene flow in general should occur at lower levels than intraspecific gene flow (Edelman & Mallet, 2021;Ellstrand, 2014).
Changes in population size through time can also influence the genetic makeup of populations. For example, a population expansion can create an excess of rare alleles that can mimic signatures of selection. Population sizes can be influenced by changes in environmental conditions that cause populations to contract or expand (Nielsen et al., 2009). Major environmental changes such as glaciation can have profound effects on population sizes (Abbott & Brochmann, 2003). Founder events following dispersal or range expansions can also significantly impact the genetic makeup of populations (Hewitt, 1996).
Developments in sequencing technologies have made genomescale data in non-model organisms much easier to acquire. Such large datasets allow for the analyses of complex evolutionary models (Ekblom & Galindo, 2011). Statistical methods have been developed to compare demographic models that include different effective population sizes through time and variable gene flow histories (Beichman et al., 2018;Excoffier et al., 2013). This can be instrumental in understanding the speciation process of closely related species, which involves a complex interaction of population divergence, population size changes, and gene flow.
Sphagnum is of unparalleled ecological importance because some 25%-30% of the entire terrestrial pool of carbon is estimated to be bound up in partially decomposed peat within Sphagnum-dominated peatlands (Gorham, 1991;Yu, 2011). Thus, understanding evolutionary and ecological processes in Sphagnum has profound implications for biogeochemistry and the control of global climate (Weston et al., 2018).
There are around 300-500 species of Sphagnum worldwide, and although the Sphagnum clade is hundreds of millions of years old, most extant species seem to have emerged through relatively recent diversification during the last 10-15 million years . Sphagnum is capable of long-distance dispersal via either spores or vegetative fragments (Sundberg, 2013). Many species have intercontinental ranges, with some degree of population structure across their geographic ranges (Kyrkjeeide et al., 2016).
Long-distance dispersal allows populations in different geographic regions, even between continents, to remain connected by gene flow (Shaw et al., 2015;Stenøien et al., 2011). Multiple species of Sphagnum often occupy the same habitat, usually by specializing in different microhabitats. In fact, many sites have ten or more sympatric species. Different Sphagnum species in the same habitat can hybridize, at least occasionally (Cronberg, 1998;Cronberg & Natcheva, 2002). In addition to hybridization occurring in current populations, recent analyses using genomic data showed signatures of ancient introgressions between Sphagnum species (Meleshko et al., 2021). Many Sphagnum species occur in northern areas that were covered by ice during the last glacial maximum (LGM) and have experienced significant shifts in geographical range during the recent past (Abbott & Brochmann, 2003;Gignac et al., 2000). These attributes, interspecific gene flow, recent range changes, and the potential for long distance dispersal can make the demographic history of Sphagnum species very complex. Moreover, the broad intercontinental geographic ranges of individual Sphagnum species add a layer of potential demographic and evolutionary complexity compared to most seed plant species that have much more restricted geographic ranges (Frahm & Vitt, 1993;Qian, 1999). Understanding patterns of gene flow and population size changes in closely related Sphagnum species is required to fully understand speciation processes and Sphagnum diversification.
These species are members of the so-called S. recurvum complex (Flatberg, 1992), which is part of the subgenus Cuspidata.
The geographic distributions of S. recurvum and S. flexuosum provide a natural experiment for testing factors that impact patterns of interspecific gene flow, intraspecific gene flow between continents, and population size changes in these closely related species. Plant communities in Europe and eastern North America have been affected differently during the LGM. Europe suffered more diversity lost during the LGM (Adams & Woodward, 1989;Svenning, 2003). Fossil records have shown that there are many woody plant genera that existed in Europe during the Upper Tertiary (25-2 Mya) but now persist only in eastern North America and Asia (Adams & Woodward, 1989). One explanation for this pattern is that with the Appalachian Mountains oriented in a north-south direction, plants in eastern North America were able to freely migrate during cold periods of the Pleistocene, whereas plants in Europe were more likely blocked by the east-west orienting Alps (Hewitt, 1996;Soltis et al., 2006). Another explanation for greater diversity loss in Europe during the LGM is that southern refugia in Europe had dry climates that could not support many mesic temperate plants (Svenning, 2003). Most of the mesic temperate tree species in Europe that survived the LGM were restricted to only the Mediterranean and Black Sea regions (Svenning et al., 2008). Since Opportunities for interspecific gene flow between S. recurvum and S. flexuosum were likely impacted by their intercontinental ranges. Hybridization between the species is obviously more likely between plants currently growing on the same continent, but intercontinental migration within these spore-reproducing plants makes it possible that plants now disjunct across the Atlantic Ocean could bear signatures of gene flow as well (Shaw et al., 2014;Stenøien et al., 2011). There are several possibilities for interspecific gene flow between S. flexuosum and S. recurvum: between presently allopatric plants (i.e., eastern North American S. recurvum and European S. flexuosum), between presently sympatric plants (S. recurvum and eastern North American S. flexuosum), or between plants ancestral to current population systems (S. recurvum and the ancestor of both S. flexuosum populations).
The goals of this study were to answer the following questions.
(1) are eastern North American versus European metapopulation systems within S. flexuosum connected by intraspecific gene flow? If so, is the rate of gene flow symmetrical between the two continents?
(2) Is the rate of interspecific gene flow between S. flexuosum and S. recurvum higher between plants currently sympatric on the same continent than between plants currently separated on different continents? (3) Is there evidence of gene flow between S. recurvum and those ancestral to the currently disjunct populations within S. flexuosum? And if so, was that gene flow limited to the period during and after speciation, did it occur after secondary contact, or was it continuous? (4) Is genetic diversity in S. flexuosum lower in glaciated than unglaciated areas of eastern North America and Europe? 2 | ME THODS

| Taxon sampling
Restriction-site-associated DNA sequencing (RAD-seq) raw reads from Duffy et al. (2020) were used in this study. For DNA-extraction, library preparation, sequencing protocols, and data availability, see Duffy et al. (2020). A total of 60 samples were divided into three groups for the present study: S. recurvum (16 samples), S. flexuosum from eastern North America (28 samples, hereafter "ENA S. flexuosum"), and S. flexuosum from Europe (16 samples, hereafter "EUR S. flexuosum"). All our European samples of S. flexusosum were collected from a relatively small area in Norway, which limits some generalities about the species in "Europe." Recently collected samples from other areas were not available. Nevertheless, the questions we address should be relatively robust to this sampling limitation (see discussion). Figure 2 shows the geographical locations of samples used in this study. In addition to S. flexuosum and S. recurvum samples, one sample of S. cuspidatulum Müll. Hal. and two samples of S. fallax H.
Klinggr. were also included for the introgression analysis. RAD-seq reads for S. fallax samples were obtained from Duffy et al. (2020), while the S. cuspidatulum sample was acquired from in silico digestion of the genomic resequencing sample (see data availability for more information). Specimen voucher information is provided in the appendix (Table A1).

| RAD-seq data assembly
The RAD-seq raw reads were assembled using ipyrad version 0.7.29 (Eaton, 2014) with default parameters except noted here. The reads were aligned to the S. divinum (v1.1) reference genome (https:// phyto zome-next.jgi.doe.gov/), which is an outgroup species relative to S. recurvum and S. flexuosum  in order to infer derived versus ancestral alleles. The samples were treated as haploid. Based on a previous study (Duffy et al., 2020), a read clustering threshold of 0.9 was used to maximize the number of variable sites.
Loci presented in less than 80% of the samples were discarded.

| Genetic diversity and introgression analysis
Within population nucleotide diversity (π), and pairwise F st , genetic distance (D xy ) and number of fixed, shared, and monomorphic sites among ENA S. flexuosum, EUR S. flexuosum, and S. recurvum were calculated by the R package popgenome (Pfeifer et al., 2014).
Nucleotide diversity and genetic distance are defined as the average pairwise nucleotide differences between samples within and between populations, respectively (Nei & Li, 1979). For the analysis comparing genetic diversity among geographic regions within S. flexuosum, two subsets of ENA S. flexuosum samples were created: Maryland (ten samples) and central New York (nine samples). These subsets have similar distributional ranges to the Norwegian (EUR) collections. The samples from Europe and central New York represent glaciated regions, and samples from Maryland come from an unglaciated region. Our sampling is insufficient to confirm any relationship between glacial history and genetic diversity but can yield a preliminary assessment. Nucleotide diversity was calculated for each group of S. flexuosum samples using the same method as above.
Jackknife resampling was used to calculate the variance of nucleotide diversity estimates; n subsamples of each group was made by excluding one sample from the dataset, where n is the number samples in the group. Statistical differences of nucleotide diversity estimates were analyzed by ANOVA and post-hoc Student's t-test using Bonferroni correction for multiple comparisons. Additional samples of ENA S. flexuosum samples were excluded from these geographic comparisons so we could use samples from comparable areas, but these were included in other analyses.
Two ABBA/BABA site pattern statistics were calculated using the program Dsuite (Malinsky et al., 2021) to detect signatures of introgression: Patterson's D statistics (Green et al., 2010) and f 4 ratios (Patterson et al., 2012). In the introgression analyses, outgroup samples of S. cuspidatulum and S. fallax were also included. Sphagnum cuspidatulum is a tropical species from Southeast Asia (Eddy, 1977).
Phylogenetic analyses have shown that S. cuspidatulum is strongly supported as sister to S. recurvum (unpublished data). Including this species in the analyses allows for an inference about introgression between S. recurvum and the ancestor of S. flexuosum Europe and eastern North America. Sphagnum fallax was also included as an outgroup because it is one of the closest relatives to the "S. flexuosum + S. recurvum" clade (Duffy et al., 2020).
This method uses site frequency spectra (SFS) as input. Unfolded SFS were calculated using easySFS pipeline (https://github.com/ isaac overc ast/easySFS). It is possible to calculate unfolded SFS in this case because the RAD-seq reads were aligned to an outgroup reference genome, thus retaining information about derived and ancestral alleles. Since SFS requires each site to have no missing data, the easySFS pipeline allows SNPs to be subsampled from the dataset. This reduces the number of samples but increases the number of SNPs with no missing data. The populations were subsampled as F I G U R E 2 Geographic locations of S. recurvum and S. flexuosum samples used in this study. First, demographic parameters were inferred from the SFS containing all SNPs. According to Excoffier et al. (2013), the use of linked SNPs should not bias demographic parameter estimation and can help increase the amount of information for parameter inference. At least 100 independent runs were performed for each model. In each run, the expected SFSs were generated from 50,000 simulations, and the demographic parameters were optimized in 40 ECM cycles.
In the second set of analyses, the best demographic parameters for each model were used to compute the approximate likelihood based on SFS containing only one SNP per RAD-seq locus. In this case, the expected SFS was generated from 10 million simulations to increase the accuracy of the approximate likelihood. This approximate likelihood was then used to calculate an Akaike information criterion (AIC) for the model.
Confidence intervals of demographic parameters in the best model were obtained from parametric bootstrap. Demographic parameters in the best model were used to simulate 100 independent SFSs. For each of the simulated SFS, ten independent runs were performed using 50,000 simulations and 40 ECM cycles. Demographic parameters from the best run of each simulated SFS were then combined to calculate confidence intervals.

| RAD-seq reads assembly
The total number of raw reads from 60 samples was 90,196,383, ranging from 375,249 to 2,594,985 reads per sample (median ± SD = 1,565,588.5 ± 572,048.9). The assembly pipeline yielded 14,874 loci that are present in more than 80% of the samples, and 13,756 of those contained one or more SNPs. The mean SNP coverage was 75.12%.

TA B L E 1
Nucleotide diversity within populations (π) and pairwise comparisons of the fixation index (F st ), genetic distance (D xy ), shared polymorphic sites, and fixed differences between S. recurvum and populations within S. flexuosum   (Table 1).

| Introgession estimates: ABBA/BABA site patterns analyses
In all four species trios (Table 3)

| Demographic history
Of 33 demographic models tested, the best model was "model 10 with secondary contact" (Figure 4) flexuosum has the smallest effective population size, followed by ENA S. flexuosum, and then S. recurvum with the largest. Figure 5 shows variation in demographic parameter estimates from parametric bootstrap. Tables B1 and B2 show demographic parameters, approximate likelihoods, and AIC values for all 33 demographic models tested. Table B2 provides 95% confidence intervals for demographic parameters in "full migration", "full migration with secondary contact", "model 10", and "model 10 with secondary contact" models based on parametric bootstrap. Figures B2-B5 show boxplots of demographic parameters estimates for the models included in Table B2. TA B L E 3 ABBA/BABA site pattern statistics. Z-score and p-value of D statistics were computed by jackknife. Significant p-value (<.0125) suggests the presence of introgression.

| Gene flow between S. flexuosum and S. recurvum
There is much evidence that hybridization is widespread in plants and can have significant evolutionary impacts (Rieseberg, 1995;Rieseberg & Carney, 1998;Suarez-Gonzalez et al., 2018). One possible outcome of hybridization is introgression, where hybrids backcross to one or both parental species. After generations of backcrossing, a small genomic fraction can be transferred from one species to another (Abbott et al., 2013;Edelman & Mallet, 2021). In mosses, the initial F1 hybrid is the short-lived sporophyte generation, but meiosis in the sporangia (capsules) of such hybrids yields recombinant haploid gametophytes with allelic representation from the two parental species across loci. There is no (or little) heterozygosity to shield hybridity from natural selection. In Sphagnum, it is common for many species to grow intimately mixed, and demonstrably recombinant individuals have been detected in Sphagnum (Cronberg, 1989;Cronberg, 1998;Cronberg & Natcheva, 2002). Allopolyploid species with diploid gametophytes and tetraploid sporophytes have also been documented in Sphagnum from all over the world (for example, Karlin et al., 2010;Ricca & Shaw, 2010;Såstad et al., 2001), and these provide further evidence that hybridization can and does occur in the genus.

| Relative genetic diversity and intercontinental gene flow in S. flexuosum
The last glacial maximum caused huge changes in species distributions and genetic structure of organisms in the Northern Hemisphere (Abbott & Brochmann, 2003;Hewitt, 2000). Plants in areas previously covered by ice sheets could have survived in local refugia or could have been extirpated and recolonized from another continent after the ice sheet receded. It has been shown that for angiosperms, Europe suffered more diversity losses during the last glacial maximum than did those in North America (Adams & Woodward, 1989;Svenning, 2003). With its intercontinental amphi-Atlantic distribution, and with most of its current distribution located in areas pre-  (Table 1).
However, when considering demographic models with all possible migration events, migration rates and directions between S. flexuosum in Europe and eastern North America are not consistent (Tables B2 and B3). In the "full migration" model, the migration rate from eastern North America to Europe is 20 times higher than the rate from Europe to eastern North America. This pattern corresponds to the best demographic model. However, in the "full migration with secondary contact" model, the pattern is reversed; the migration rate from eastern North America to Europe is approximately half the migration rate from Europe to eastern North America. Nevertheless, Sphagnum species suggest dates on the order of at least 10 million years ago (e.g., Shaw et al., 2010Shaw et al., , 2019. Nevertheless, it seems unlikely that a genetic signature of any bottleneck associated with that ancient speciation process and range expansion persists today. Another important signature of recent population divergence is that there is no fixed difference between S. flexuosum in eastern North America and Europe. Thus, the difference in genetic diversity detected here is more likely to be caused by recent events and possibly associated with glaciation. This scenario corresponds to an earlier work by Ledent et al. (2019) which showed that post-glacial assembly of European bryophytes involves high contribution of migrants from other continents.  (Désamoré et al., 2016). This is also the case in some circumarctic angiosperms (Brochmann & Brysting, 2008). shown that ice-free areas in Europe were drier than in eastern North America, which can produce severe effects on plants that cannot tolerate drought (Svenning, 2003). A study using species distribution modeling on European trees during the LGM has shown that boreal species have existed in northern refugia across the plains of Central and Eastern Europe, while nemoral species were restricted to southern refugia such as the Mediterranean and Black Sea regions (Svenning et al., 2008). Furthermore, a comparison of niche requirements of the relictual and extinct plant taxa in Europe has shown that relictual taxa are more cold and drought-tolerant than the extinct taxa (Svenning, 2003). Studies of genetic diversity of bryophytes within glaciated and unglaciated areas of Europe also yielded similar patterns. Plants of the epiphytic bryophyte Leucodon sciuroides, which relies on host trees, have lower genetic diversity in glaciated areas than unglaciated areas (Cronberg, 2000). On the other hand, the cold-tolerant Hylocomium splendens appears to have a center of genetic diversity in Northern Scandinavia, which was glaciated (Cronberg et al., 1997). Thus, since Sphagnum requires mesic habitats, it is reasonable to expect that Sphagnum in Europe would have been affected by the LGM in ways similar to temperate angiosperms that were less able to tolerate drought.  (Ellegren & Galtier, 2016;Kimura, 1983). Variation in mutation rates can alter the relationship between genetic diversity and effective population size, but since this study focuses on plants from the same species, it can be assumed that the mutation rates are similar in all the groups being compared. There are empirical evidence showing positive correlation between genetic diversity and effective population sizes (Hague & Routman, 2016;Leimu et al., 2006). For some models, there can be a set of parameters that explain the data even better than the best model, but those set of parameters were not evaluated. It is also possible that 100 independent runs per model are not enough to adequately cover the parameter space.

| Limitations associated with the inference of demographic models
This can be problematic if there are multiple demographic models with similar approximate likelihood but have substantially different values of demographic parameters. In this case, it will be difficult to determine the best demographic model. Thus, in addition to the best demographic model reported here, it is prudent to compare parameter estimates of the best model with other models that have similar approximate likelihoods, especially the "full migration model" which contains all demographic parameters.

| CON CLUS IONS
This study supports the interpretation that S. flexuosum in glaciated areas has lower genetic diversity than unglaciated areas, that plants in Europe are derived from eastern North America, and that the population systems disjunct across the Atlantic Ocean are still con-

ACK N OWLED G M ENTS
We would like to thank George Tiley for useful discussions about the inference of demographic models. We would like to thank Rossarin Pollawath for helping with S. cuspidatulum sample collection. The two anonymous reviewers have provided constructive suggestions.
The corresponding author has been supported by the Department of Biology, Duke University, and the Queen Sirikit Scholarship, the Crown Property Bureau, Thailand. Genome assembly and annotation for the S. divinum reference (v1.1) is available at Phytozome (https://phyto zome-next.jgi.doe.gov/) This research was supported by NSF grant DEB-1928514.

DATA AVA I L A B I L I T Y S TAT E M E N T
In silico digested reads from a genomic resequencing sample of S. cuspidatulum (Library IUSS) are available in Dryad (https://doi. org/10.5061/dryad.1c59z w3xc). Demultiplexed Illumina reads from RAD-seq samples of S. flexuosum, S. recurvum, and S. fallax are available in Dryad https://doi.org/10.5061/dryad.1g1jw sts7 (Duffy et al., 2020). For the list of samples used in this study, see Appendix A.

R E FE R E N C E S
A PPEN D I X A TA B L E A 1 Voucher information of the specimens used in this study, for more information see Duffy et al. (2020).

TA B L E A 1 (Continued)
A PPEN D I X B Demographic models tested in this study.

TA B L E B 1
List of all tested demographic models and the inferred parameters, see Figure B1 for the definition of parameter names. Note: Highlighted models were selected for parametric bootstrap (see Table B2; Figures B2-B5) TA B L E B 2 Demographic parameters and confidence intervals of the full migration model and model 10 (the best demographic model) with continuous ancestral migration and secondary contact, see Figure B1 for the definition of parameter names.   Gene Flow Rates, secondary contact Gene Flow events (see Figure B1)