Using genomic information for management planning of an endangered perennial, Viola uliginosa

Abstract Species occupying habitats subjected to frequent natural and/or anthropogenic changes are a challenge for conservation management. We studied one such species, Viola uliginosa, an endangered perennial wetland species typically inhabiting sporadically flooded meadows alongside rivers/lakes. In order to estimate genomic diversity, population structure, and history, we sampled five sites in Finland, three in Estonia, and one each in Slovenia, Belarus, and Poland using genomic SNP data with double‐digest restriction site‐associated DNA sequencing (ddRAD‐seq). We found monophyletic populations, high levels of inbreeding (mean population F SNP = 0.407–0.945), low effective population sizes (N e = 0.8–50.9), indications of past demographic expansion, and rare long‐distance dispersal. Our results are important in implementing conservation strategies for V. uliginosa, which should include founding of seed banks, ex situ cultivations, and reintroductions with individuals of proper origin, combined with continuous population monitoring and habitat management.

newly founded populations and their source population. This was illustrated in the annual jewelweed (Impatiens capensis), which showed strong genetic structuring along bodies of water, due mainly to genetic drift (Toczydlowski & Waller, 2019). Additionally, differential selection pressures can increase divergence between the populations at the newly occupied and the source sites, if the newly occupied site stays occupied long enough for natural selection to operate. For instance, the endangered Spanish catchfly (Silene otites), in situ, and their respective ex situ populations became highly genetically differentiated within a few decades, partly attributed not only to the effect of genetic drift, but also to unintended selection during cultivation (Lauterbach, Burkart, & Gemeinholzer, 2012).
Genetic diversity, therefore, tends to be lost during founding of new populations. Such cases have been reported in an isolated population of alder buckthorn (Frangula alnus) in Ireland (Finlay, Bradley, Preston, & Provan, 2017) and in invasive populations of Carolina geranium (Geranium carolinianum) in China (Shirk, Hamrick, Zhang, & Qiang, 2014). Moreover, small populations commonly have lower genetic diversity than large populations, clearly demonstrated in a survey of 247 plant species, where rare and common species were compared at a generic level using allozymes and the rare species had significantly lower genetic diversity than their common counterparts (Cole, 2003). Further, small populations are prone to increased homozygosity, inbreeding, and inbreeding depression, leading to reduced fitness of highly related individuals (Charlesworth & Willis, 2009), which can threaten their viability (Frankham, 2005).
The above features pose challenges for conservation management. If a habitat for a population is lost for a period of time, a population could be reintroduced to the site provided that the habitat once again becomes favorable and a proper source population exists. When an imminent threat to an occupied habitat exists, the population could be relocated to another suitable site. Identification of proper source populations or introduction sites requires understanding of genetic structure and diversity of the species at the range in question, in addition to understanding the evolutionary processes underlying the observed genetic structure and diversity.
Here, we studied endangered populations of Viola uliginosa, which are at the northern edge of the species' range in Finland, to clarify the amount of genetic diversity, inbreeding, and population structure. Our goal was to define whether and which populations could be used as source populations for reintroductions to sites from which the species became locally extinct, but are still suitable for V. uliginosa. In addition, we sampled nearby populations from Estonia and distant populations at the center and western edge of the distribution range to infer the origin and phylogenetic position of the Finnish populations.

| Study species and sampling
Viola uliginosa Besser is a perennial wetland species that inhabits typically rich, flooded meadows along rivers/lakes. The stronghold for the species distribution is in Eastern Europe, with sporadic sites at the western and northern edges of the main distribution ( Figure 1). Viola uliginosa is listed as endangered in Finland , existing at present in only six sites. It has been monitored for 30 years in an attempt to reduce further loss (Siitonen, 1990). As it occupies rare eutrophic swamp forests and flooded meadows, it can be considered as an indicator species for these habitats (Kuris & Ruskule, 2006). The species also has been classified as near-threatened, vulnerable, endangered, or critically endangered in many other parts of its range, primarily because of the habitat type (e.g., temporal flooding mostly at springtime), habitat loss, and human disturbance (Ranta, Jokinen, & Laaka-Lindberg, 2016).
Viola uliginosa reproduces primarily by clonal propagation using subterranean stems (Cieślak, Paul, & Ronikier, 2006;Paul et al., 2016), but it also forms seeds that are able to exist in a seed bank and can be viable for years (Ranta & Siitonen, 2011 (Małobęcki et al., 2016). Pollination in this species has not been studied in detail, but cross-pollination occurs by several species of bees, hoverflies, and flies, as in other Viola species, in addition to self-pollination (Beattie, 1971). Even though reproduction is thought to be mostly clonal, seeds have a high germination capacity, putatively allowing rapid adaptation to changes in the habitat (Ranta et al., 2016). Local-level seed dispersal by snails and ants has been observed to occur, and long-distance dispersal can occur by floating in water currents (P. Ranta, personal communication).
We focused our sampling on extant Finnish populations of V. uliginosa, which constitute the northern edge of its global distribution. For inferring the history of the Finnish populations, we also collected samples from Estonia (isolated by the Gulf of Finland) and Belarus, both of which form the species' center of distribution and from Poland and Slovenia, which form the southern and western edge populations, respectively. Thus, our sampling provides a latitudinal perspective. We collected one or two leaves per individual from each sampling site during summer 2016 and 2017 (with all relevant permits), with a special effort to sample individuals as distantly from each other as possible from each site. We sampled five sites in Finland (Kökar,Åland N = 30,Hanko N = 30,Vihti N = 30,Sastamala N = 20,Tohmajärvi N = 30), three in Estonia (Maatsalu N = 21, Vormsi, two populations N = 10 + 20, Ridala N = 21), one in Slovenia (Ljubljansko Barje N = 30), one in Belarus (Iljinka, consisting of two adjacent sites, N = 10 + 30), and one in Poland (Lipa, consisting of three adjacent sites, N = 10 in all) (Figure 1). The sixth known Finnish site in Mäntsälä was rapidly declining at the time of sampling, thus was not sampled (see Table S1 for more detailed sampling information  Table S1).

| ddRAD library preparation and sequencing
The quantity of genomic DNA (gDNA) was determined with the PicoGreen Kit (Molecular Probes, Eugene, OR, USA). The ddRADseq library was implemented following protocols described in Peterson, Weber, Kay, Fisher, and Hoekstra (2012) and Lee et al. (2018) with the following modifications. Briefly, gDNA was digested at 37°C for 3 hr using the restriction enzymes PstI and  Ranta et al. (2016) with slight modifications according to Matulevičiūté (2015)

| Bioinformatics
Raw paired-end reads were demultiplexed with no mismatches tolerated using their unique barcode and adapter sequences with ipyrad v.0.7.23 (Eaton & Overcast, 2016). The quality of raw demultiplexed reads was checked with FastQC software (available at http://www. bionf ormat ics.babra ham.ac.uk/proje cts/fastq/ ). The demultiplexed paired-end reads were run through PEAR (Zhang, Kobert, Flouri, & Stamatakis, 2014) using default setting to merge overlapping reads, and input into the ipyrad pipeline. All ipyrad defaults were used, with the following exceptions: The minimum depth at which majority rule base calls are made was set to 3, the cluster threshold was set to 0.90, the minimum number of samples that must have data at a given locus for it to be retained was set to 48, 70, and 95, and the assembly method was set to "denovo" and "reference" for independent testing. Consensus sequences that had low coverage (<6 reads), excessively undetermined, or characterized with many heterozygous sites (>8), potentially resulting from paralogs or highly repetitive genomic regions, were discarded. Additionally, we excluded all loci with excessive (>50% of samples) shared polymorphic sites as they likely represented clustering of paralogs. Up to four shared polymorphic sites per called locus were allowed to accommodate polyploid genomes. This was done, because tetraploid marker data can be indistinguishable from diploid data (Gompert & Mock, 2017) and the species has been reported to produce polyploid individuals in a micropropagation experiment (Slazak et al., 2015); thus to our knowledge, polyploidy has not been observed in the wild. As there was no evidence of polyploidy, all samples were treated as diploids in the following analyses, thus allowing two haplotypes per polymorphic site. In the "denovo" assembly, sequences were assembled without any reference genome with homology inferred during alignment clustering by sequence similarity using the program vsearch (http://github.com/torog nes/vsearch). In the "reference" assembly, the sequences were mapped to the whole genome of Viola pubescens (GenBank, GCA_002752925) using BWA with default bwa-mem setting (Li, 2013) based on 90% of sequence similarity.

| Genetic differences among populations
The ddrad_m95 dataset (Table 1) was used for population analysis.
Patterns of genetic diversity among populations were examined using Arlequin v.3.5.1 (Excoffier & Lischer, 2010 SNP-based inbreeding coefficients were calculated using vcftools (Danecek et al., 2011). Vcftools calculates the inbreeding coefficient O is the observed number of homozygotes, E is the expected number of homozygotes (given population allele frequency), and N is the total number of genotyped loci. Boxplots between populations and F SNP values were executed with R v.3.5.2 (R Core Team, 2015) and graphically represented using the packages corrplot (Wei, 2013) and ggplot2 (Wickham, 2009).

| Clustering analysis
Population clustering with admixture from SNP frequency data was inferred to better visualize genomic variation between individuals with STRUCTURE v.2.3.1 (Pritchard, Stephens, & Donnelly, 2000).
In this analysis, 1,958 putatively unlinked SNPs were identified by selecting a single SNP from each locus in the "ddrad_m95" data matrix (Table S1). Ten replicates were run with K = 1-11. Each run had a burn-in of 50K generations followed by 500K generations of sampling. Replicates were permuted using CLUMPP (Jakobsson & Rosenberg, 2007) and the optimal K value was inferred using StructureHarvester (Earl & VonHoldt, 2012) according to the ad hoc ΔK statistics (Evanno, Regnaut, & Goudet, 2005), which is the second-order rate of change in the likelihood function. Structure results were visualized using DISTRUCT (Rosenberg, 2004).
The package FineRADstructure was also used to investigate the genetic structure in V. uliginosa (Malinsky, Trucchi, Lawson, & Falush, 2018). The package includes RADpainter, a program designed to infer the co-ancestry matrix and estimate the number of populations within the dataset. The input file used was an allele.loci matrix ("ddrad_m48" = 15% of missing data) generated with ipyrad program. The allele data were converted using a python script (available at http://github.com/ edgar domor tiz/fineR ADstr ucture-tools ; last accessed 21 June 2019).
Samples were assigned to populations using 100,000 iterations as burn-in prior to sampling with 100,000 iterations. The trees were constructed using 10,000 iterations and the output visualized using the fineradstructureplot.r and finestructurelibrary.r R scripts (http://cichl id.gurdon.cam.ac.uk/fineR ADstr ucture.html).

| RE SULTS
We obtained over 2,000,000-bp sequence and 31,000 SNPs for the ddrad_m48 dataset and over 430,000-bp sequence and over 4,100 SNPs for the ddrad_m95 dataset with only 15% and 2.0% of missing data, respectively. There were 8,770 parsimony-informative sites in ddrad_m48 data and 1,713 in the ddrad_m95 data. The alignment with the reference sequence exceeded 2,500,000 bp and 51,400 SNPs (Table 1).
The three sites with lowest diversity values were all from Finland (Vihti, Kökar, and Hanko) ( Table 2) Table 3).
The maximum-likelihood tree (Figure 3 and Figure S2) showed that each population is monophyletic with high support and that  Note: n = number of individuals, S = number of polymorphic sites, π = nucleotide diversity, HRag = raggedness statistic r, τ = 2 µt, where µ is the mutation rate and t is time in generations, θ 0 = theta before population size change, θ 1 = theta after population size change.

TA B L E 3
Average linkage disequilibrium (r 2 ), effective population sizes (N e ) with 95% confidence intervals, and average inbreeding coefficients (F SNP ) with standard deviations in brackets of Viola uliginosa populations. Note that infinite estimates of the effective population size are likely due to a correction for the sample size that is larger than r 2 rather than a truly infinite population size ( Note: Infinite N e estimates occur when genetic variation is high and the sampling error among individuals is stronger than the signal of genetic drift from a finite number of parents (Do et al., 2014), thus infinite ≠ infinitely large population.   (Table 2). This was further supported by the mismatch distributions constructed from all the data and from the Finnish and Estonian populations ( Figure S5). The best scenario according to the DIYABC was scenario 2 in Figure 2 (simultaneous divergence of first Estonian and then Finnish populations) that received 100% support against the one constructed using the branching order of the phylogenetic tree. Based on the principal component analysis, the observed data were located among the data simulated according to scenario 2 and distant from data simulated according to scenario 1 (see Figure S6). Divergence times were given in generations, and as there were no estimates for a generation time
Thus, the genus Viola in general has wide variation in genetic diversity, likely stemming from different levels of sexual reproduction, varying population sizes, generation lengths, and ages of species and populations, and in addition to the strength and type of selection, the species and/or populations are facing. Genetic diversity of V. uliginosa in Poland suggested high genetic uniformity over the study populations, when AFLP markers were applied. Similar to our results, each individual possessed an own genotype profile although individuals from a given subpopulation were very alike, with similarity coefficients typically of 0.94-0.99 (Cieślak et al., 2006).
Individual inbreeding coefficients in our study populations were high, with 68 individuals from the 95 studied having values above 0.500. Similar high inbreeding coefficients have been observed in the endangered seaside violet V. grayi (Hirai et al., 2012). In addition, all effective population sizes for V. uliginosa were < 51, supporting a reproductive strategy favoring clonal propagation. The northernmost population in Finland had the highest inbreeding coefficient and the N e estimate for one population was < 1, possibly due to a higher proportion of clonality. It has been commonly suggested that ecologically marginal populations suffer from small effective population sizes, because of fragmentation and lack of gene flow due to lack of suitable habitat (e.g., Caughley, Grice, Barker, & Brown, 1988;Young, Boyle, & Brown, 1996). The small N e is manifested by an increase in self-pollination, inbreeding, and clonality in plants at the edge populations when compared to central populations (Arnaudhaond et al., 2006;Beatty, McEvoy, Sweeney, & Provan, 2008;Eckert, 2002;Silvertown, 2008). Our data support the abundant center hypothesis, which predicts that genetic diversity is lowest at the species margin because biological tolerances of a species reach their limit and/or due to repeated founder events during postglacial colonization (Brown, 1984;Petit et al., 2003;Sagarin & Gaines, 2002). However, the hypothesis is subject to further evaluation, as a meta-analysis reported a decline in genetic diversity toward range margins in only 64.2% of the reviewed 134 studies (Eckert et al., 2010 (Ranta et al., 2016) supporting the potential importance of sexual reproduction. Indeed, the species was reintroduced in 2014 at one site after 40 years of absence and was still flowering in 2017 (Hyvärinen, 2018;Ranta, 2014). The reintroduction used seeds collected from a private garden with plants originating from almost the same site, which had been destroyed due to construction work in 1975 (this population was not sampled for this study, Kulmala, Ryttäri, & Laaka-Lindberg, 2016;Ranta, 2014). Thus, although sexual reproduction may not be common, it can play an important role in genetic diversity of V. uliginosa populations, especially at the northern edge of the species' range.

| Population history and the origin of the Finnish populations
The phylogenetic tree constructed using the maximum-likelihood

ACK N OWLED G M ENTS
We are grateful for the invaluable help from Laura Törmälä for assistance in the laboratory. The authors also wish to acknowledge