Intercontinental migration pattern and genetic differentiation of arctic‐alpine Rhodiola rosea L.: A chloroplast DNA survey

Abstract Our study describes genetic lineages and historical biogeography of Rhodiola rosea a widely distributed arctic‐alpine perennial species of the Northern Hemisphere based on sequence analysis of six chloroplast regions. Specimens of 44 localities from the Northern Hemisphere have been sequenced and compared with those available in the GenBank. Our results support the migration of the species into Europe via the Central Asian highland corridor, reaching the European Alpine System (EAS) and also the western European edge, the British Isles. The EAS proved to be an important center of genetic diversity, especially the region of the Eastern Alps and the Dolomites where signs of glacial refugia was observed. Apart from those of the EAS, a common lineage was detected along the Atlantic coast from the British Isles toward Scandinavia as well as Iceland and the eastern parts of North America. Accordingly, the British Isles represent a main link between the northern Atlantic and southern EAS lineages.

Engelskjøn, Gielly, Taberlet, & Brochmann, 2005). Moreover, some Scandivanian populations of Dryas octopetala provided evidence for the existence of a contact zone in the area where the colonizing European lineage most probably was intermingled with immigrants of the Northeast lineage (Alsos et al., 2007;Skrede, Eidesen, Pińeiro Portela, & Brochmann, 2006). The complex biogeographic connection of the QTP and the NH involves also intercontinental disjunction reaching North America, which was demonstrated in some herbaceous species inhabiting the arctic regions and high mountain elevations (Alsos et al., 2015). The radiation toward the new habitats, the evolution of genetic lineages and the different migration and colonization potential of species in response to the historical climatic oscillations and abiotic changes provided a complex biogeographic pattern in most taxa originating from the QTP (Zhang, Meng, Allen, Wen, & Rao, 2014). Moreover, studies on population level of single species across the entire range provided further insights into species' phylogeography and evolution in relatively recent historical times involving the climate oscillations of the Quaternary (Christe et al., 2014;Schönswetter, Stehlik, Holderegger, & Tribsch, 2005;Taberlet, Fumagalli, & Wust-Saucy, 1998).
The number of species in the Rhodiola genus varies according to different authors between 60 (Ohba, 2005) to 200 (Germano & Ramazanov, 1999). This genus (fam. Crassulaceae) is one of the most studied plant genera of QTP origin (Mayuzumi & Ohba, 2004;Zhang et al., 2014). Phylogenetic studies have revealed a relatively rapid diversification and radiation with high species diversity on the QTP and adjacent regions. Only five species are distributed in northeast Asia and three species of Rhodiola exhibit intercontinental distribution. These are Rhodiola integrifolia Raf. that occurs on both sides of the Bering Strait, Rhodiola rhodanta (A. Gray) H. Jacobsen that is distributed only in the western part of North America (Ohba, 1989) and R. rosea. L. with the broadest range among all Rhodiola species, ranging from East Asia, toward the Arctic regions, Europe as well as eastern North America. In Europe R. rosea is distributed in Iceland, the British Isles, the European Alpine System, and Scandinavia (Hegi, 1963). It is a dioecious, cold-adapted perennial, occupying a narrow range of the arctic-alpine habitats ( Figure 1). In eastern North America, it is restricted to the eastern coastal areas (Cuerrier, Archambault, Rapinski, & Bruneau, 2015;Ohba, 1989;Olfelt & Freyman, 2014).
Due to its extremely high morphological variability as described by Ohba (1981), there are several taxonomic interpretations within has smaller leaves, shorter flowering stem and the leaves are serrate (Fu & Ohba, 2001). Gontcharova, Gontcharov, Yakubov, and Kondo (2009) recognize three subspecies: R. rosea subsp. rosea, R. rosea subsp. Arctica, and R. rosea subsp. sachalinensis based on seed surface morphology. Yanbaev, Bairamgulov, Redkina, and Mullagulov (2007) reported Rhodiola iremelica Boriss. as an endemic species to the southern Urals, but based on molecular evidences, using the internal transcribed spacer György, Szabó, Bacharov, and Pedryc (2012) could not differentiate this species proposing this taxa only at subspecies rank of R. rosea. Several other molecular studies also demonstrate the high genetic variability within this taxon (Elameen, Klemsdal, Dragland, Fjellheim, & Rognli, 2008;György, Fjelldal, Szabo, Aspholm, & Pedryc, 2013;György, Vouillamoz, Ladányi, & Pedryc, 2014;Kozyrenko, Gontcharova, & Gontcharov, 2011). Zhang et al. (2014) stated that Rhodiola species of disjunct intercontinental distribution have reached North America at least twice, via Beringia and via the amphi-atlantic route. Latter route seems to fit well for R. rosea with its extremely small, winged seeds compared to other species. This particular seed morphology makes R. rosea a candidate species for long-distance dispersal that most probably dates to the period when the NALB (Northern Atlantic Land Bridge) was not available anymore for species' migration.
Molecular study applied on the European populations of R. rosea by microsatellite markers has revealed distinct clusters of populations within the EAS and suggested at least two glacial refugia for the species, one in the area of the Eastern Alps and the Carpathians and another one in the Dolomites (György, Vouillamoz, & Höhn, 2016).
With its wide and disjunct circumboreal distribution and high genetic variation, R. rosea seems to be a model plant species to study spatial diversification and migration from the QTP toward the Northern Hemisphere. Kozyrenko et al. (2011) based on ISSR markers, have come to the conclusion that at least two distinct evolutionary lines exist within R. rosea. The species migrated from its center of origin westward along the Ural Mts. and eastward by the mountain ridges of eastern Siberia. In a recent study on the chloroplast trn L-F region, Cuerrier et al. (2015) reported the existence of two infraspecific variants. Coastal and Alpine populations were found to differ in sharing two indels (duplication of 23 and 19 bp) suggesting for coastal population a common North American and Scandinavian origin, while the Alpine populations seem to have originated from Eurasia via the Central Asian highland corridor.
Accordingly, we assumed that along the westward expansion of the species, R. rosea has diversified within the area of Europe (EAS), which resulted in the two infraspecific variations observed by Cuerrier et al. (2015). These genetic lineages potentially contributed to the colonization of the Northern American continent. In the present study, chloroplast sequence data were used to gain further insights into the species' genetic diversity along its distribution range, especially in Europe and by using a phylogeographic approach to propose a historical biogeographic scenario for the present-day distribution of the species, focusing on the genetic lineages and evolution within the EAS.

| Sampling material
Altogether 44 populations of R. rosea were sampled, which are listed in Table 1. Mostly leaves were collected and stored at −20ºC.
In some cases, seeds were obtained from Pharmaplant Ltd., germinated and the plantlets were used as source of DNA. Vouchers of the samples were deposited at Szent István University. DNA was extracted with SP Plant Mini Kit (Omega, VWR International Kft, Budapest). DNA concentration and quality was assessed using NanoDrop (BioScience, Hungary) and visually checked on 1% agarose gel.
In some samples, not one, but two, (Triglav, Passo Gavia, Val Fredda, Zirbitzkogel, Präbichl) or even three fragments (Hochkar) were detected on the agarose gel in case of the trnL-F locus. After cloning and sequencing, these fragments all turned out to represent the same sequence. Duplication has happened in the primer binding site and due to this one (or two) of the fragments was an artifact.

| Diversity and differentiation analysis
Genetic diversity for each loci, that is, the number of haplotypes (h), haplotype diversity (Hd) (Nei & Tajima, 1981), and nucleotide diversity (p) (Nei, 1987) were calculated using the software DnaSP v5.1.0.1 (Librado & Rozas, 2009). Genetic distances (p-uncorrected) within and between lineages were calculated with MEGA7 (Kumar et al., 2016). Polymorphic information content (PIC) values were calculated for each chloroplast locus based on the formula: PIC = 1 − ∑(P ij )2, where P ij is the frequency of the ith allele revealed by the jth marker summed across all alleles amplified (Botstein, White, Skolnick, & Davis, 1980). The marker is highly informative if PIC value is >0.5 but only reasonably informative if PIC value is ranging from 0.25 to 0.5.
PopART (Leigh & Bryant, 2015) with implemented Templeton-Crandall-Singh (TCS) statistical parsimony network analysis (Clement, Snell, Walker, Posada, & Crandall, 2002) was used to evaluate genealogical relationships among sequences. Each insertion/ deletion (indel) was considered as a single mutation event, and all indels were therefore coded as single positions in the final alignments.
The connection limit for the TCS analysis was 95% and gaps were treated as a fifth state.
The Bayesian clustering approach implemented in STRUCTURE 2.3.4 (Pritchard, Stephens, & Donnelly, 2000) was used to infer groups or subpopulations in the sequence dataset. The analysis was performed with an admixture linkage model with correlated allele frequencies. K value (the number of genetic groups) was set to 1-10 with a burn-in period of 105 steps followed by 106 repetitions of Markov chain Monte Carlo (MCMC), which is capable to deal with linked markers. Twenty repetitions were set for each run. The webbased STRUCTURE HARVESTER (Earl & von Holdt, 2012) was used to apply the Evanno method (Evanno, Regnaut, & Goudet, 2005) to detect the value of the optimal K that best fit the sequence data.

| Phylogenetic analysis
The trnL-F region was evaluated for phylogenetic construction, since GenBank (NCBI) sequences for other geographical regions, including Greenland and North America, were only available for this locus. Prior the analysis, FastGap 1.2 (Borchsenius, 2009) was used to code the phylogenetic information of gaps (indels) as binary characters within the sequence matrix by applying the "simple indel coding" algorithm of Simmons and Ochoterena (2000). The refined alignment of sequences and the indel binary matrix was incorporated in the concatenated dataset. The maximum likelihood (ML) approach (Felsenstein, 1981) implemented in RaxMLGUI 1.5b2 (Silvestro & Michalak, 2012) was used for tree construction, which is based on explicit models of sequence evolution and proved to be statistically robust. RaxMLGUI was run under the GTR nucleotide substitution model with gammadistributed rate heterogeneity (GTR+Γ). Bootstrap support of branches was calculated with 1,000 data resamples. Rhodiola semenovii, Rhodiola kirilowii, and R. integrifolia voucher specimens were used as outgroups in the phylogenetic analyses.

| RE SULTS
We psbA-trnH GUG by Zhang et al., 2014). The mean nucleotide diversity estimates for these was π = 7.688 × 10 −3 . Tajima's D values for all loci studied did not show a significant deviation from neutrality (Table 2).
We assigned the studied genotypes to five geographic regions:   Table 2.

| Analysis of the studied loci
In case of the trnL-F locus, the amplified fragment length was between 921 and 1,055 bp resulting in an alignment of 1,109 bp length.
Altogether five indels were identified at four sites of the trnL-F region and ten SNP, of which seven are singletons. Indels of 23 and 19 bp were already known from this region (Cuerrier et al., 2015), which were also detected in this study. Both are duplications.
Furthermore, an indel of 12 bp was found close to the 3′ end, which is also a duplication. At the 5′ end, an insertion of 67 bp was detected alone or duplicated (Figure 2).
In case of the psbA5'R-matk8F locus, the amplified fragment length was between 845 and 852 bp resulting in an alignment of 863 bp length. This region contained six indels of short sizes at four sites and 23 SNPs of which more than 50% were singleton. The region of psbA-trnH GUG was found to be extremely variable. The am-  (Table 1).

| Haplotype network analysis
There are between seven and 27 haplotypes for each loci, separately in our samples, and we constructed the haplotype networks for each of these loci. Unrooted haplotype genealogies were estimated from the substitution polymorphisms (including indels coded as single characters) observed at each of the six loci (Figure 3), and each analysis yielded one most parsimonious arrangement.
There is a phylogeographic pattern in the distribution of R. rosea to the Asian samples, sometimes sharing the haplotype. For trnL-trnF, psbB-psbH, and 5'rpS12-rpL20, a rather simple network structure outlined, while for psbA5'R-matk8F, psbA-trnH GUG , and trnC GCA -ycf6R, the structure is more complex, but still biogeographically interpretable.
A concatenated haplotype network (Figure 4) was also built, but since using all six loci resulted in a very complex network, in which almost all samples represented a separate haplotype, we decided to omit the two most polymorphic loci (psbA5'R-matk8F, psbA-trnH GUG ).
We used Bayesian Structure analysis to see whether the samples from the same geographic region are clustered together in the same group. Even though we assigned the samples into five geographical groups, the Evanno method indicates that a K = 3 model fits best the data ( and also an Irish sample. These samples contain both indels that were previously described by Cuerrier et al. (2015). Cluster E includes a single sample from Hochkar (Göstling Alps) and F includes some more samples from the Austrian Alps, Julian Alps, and the Dolomites. These latter two groups are closely related, not including the two earlier described indels by Cuerrier et al. (2015), A few sequences are also available for the psbA-trn region in the GenBank. However, this locus proved to be the most variable region, so even though an attempt was made to generate a maximum parsimony tree also for these sequences a very complex tree was obtained without any clear clustering.

| D ISCUSS I ON
The noncoding regions of the chloroplast are widely used to study genetic lineages and relationships among taxa, and to describe complex historical biogeography of species and genera in relation to their present distribution. The main advantages of these chloroplast regions are the universality of the primers and the robustness of the amplification process .
In the study of DNA barcoding of Rhodiola species by Zhang et al.
(2014, five frequently used sequences (rbcL, matK, trnH-psbA, ITS and trnL-F) were tested for their utility. Even though ITS sequences were found to be the most powerful, trnL-F region showed to be also a promising alternative marker for barcoding Rhodiola species. Regarding phylogeographical patterns, the study by Cuerrier et al. (2015) has revealed two intraspecific variants of R. rosea based on this chloroplast region.
Coastal and Alpine populations were found to differ in the presence or absence of two indels (duplications of 23 and 19 bp). Based on this, the authors have concluded that coastal populations of North America F I G U R E 5 Estimated structure for K = 2, K = 3 and K = 4 of assignment analysis performed in STRUCTURE (Pritchard et al., 2000). The most likely number of clusters (K) detected by Evanno et al. (2005) method implemented in STRUCTURE HARVESTER (Earl & von Holdt, 2012). Each individual is represented by a vertical stacked column indicating the proportions of K groups and Scandinavia are genetically separated from the Alpine populations.
Our Structure analysis ( Figure 5) and the extended maximum likelihood tree ( Figure 6) confirm their findings, but on the other hand, our results revealed a more elaborated pattern within Europe. Both the Structure analysis ( Figure 5) and the haplotype networks (Figures 3 and 4) identify the samples from the Central-European mountains as a distinct group. In the QTP, which is considered to be the center of origin and in adjacent regions, no insertions are present in the trnF-L region of roseroot. Neither were these insertions detected anywhere in Asia nor in most of the European samples. However, since there are some coastal samples in Europe like Dingel in Ireland, or Novaya Zemlya, Kamchatka in Asia which do not exhibit these two characteristic "coastal" indels (Cuerrier et al., 2015) and there are samples containing the two indels being not exactly nearby the coastal areas like Halti or Kilpisjärvi in Finland the term amphi-Atlantic would be more accurate to be used instead of coastal.
The lineage harboring the two insertions, which is distributed along the coastal parts of Scandinavia, the British Isles and in the eastern parts of North America suggest common origin of these populations. We interpret this pattern as evidence for the large periglacial distribution of R. rosea during the time of the Pleistocene from where it could have colonized the North Atlantic costs, Iceland, and North America postglacially (Abbott & Brochmann, 2003;Brochmann, Gabrielsen, Nordal, Landvik, & Elven, 2003;Schmitt, 2007). However, in the British Isles, both types are present (containing and missing the two insertions), while in Scandinavia only those with the two insertions were found. Moreover, we also observed individuals in Wales and in the Inismore island (Ireland) bearing only one insertion ( Figure 3A). Accordingly, the earlier described two indels might have had their origin on the British Isles from where they have moved toward the north (Scandinavia) and west (North America; Figure 7B). Alternatively, individuals exhibiting the two insertions have expanded from an ancient, Taymir-Siberian Northeastern lineage ( Figure 7A) (Alsos et al., 2007) (Holderegger & Abbott, 2003) or V. uliginosum (Alsos et al., 2005). In case of V. uliginosum, a deep phylogenetic split was presumed early before the glaciations. However, Atlantic cost seems to have genetic material of different origin. In case of D. octopetala, Scandinavian territories were considered to be even contact zones between the European and Eastern lineages (Skrede et al., 2006).
In R. rosea, the lineage without any insertion of the British Isles most probably originates from the EAS and underwent stepping stone mutation process on the British Isles. This is indicated by the existence of only one insertion present in Wales and in Inishmore island. Accordingly, the British Isles proved to be a major diversification center in case of R. rosea. Moreover, two other loci studied in our work, psbA-trnH GUG and trncGCA-ycf6R, support this hypothesis ( Figure 3C,F).
Coastal and intercontinental colonization toward the northwest by long-distance dispersal resulted in the eastern North American distribution of R. rosea as it has been earlier reported for other amphi-arctic species (Alsos et al., 2015). Molecular analysis of Rhodiola species extending toward North America have reported the presence of the species at least since the Middle Pleistocene and suggested its entry into the American continent at least twice.
According to Zhang et al. (2014), ancestors of R. rhodantha and R. integrifolia have reached the continent from the east via Beringia while R. rosea arrived more likely via the amphi-Atlantic route, which is supported by our results as well.
Our trnL-F results confirm the two distinct ancient evolutionary lineages from the QTP: an eastward and a westward expanding lineage, as proposed by Kozyrenko et al. (2011). After reaching the EAS most probably before the glacial cycles or at the beginning of the glacial cycles of the Pleistocene (Taberlet et al., 1998),  (Schönswetter et al., 2005;Taberlet et al., 1998). Also all other loci studied confirm diversification in the region of the EAS (Figures 3 and 4). Our former study based on nuclear microsatellites has already revealed that the Eastern Alps and the Dolomites exhibit a distinct genetic pattern compared to other Alpine regions and might have served as possible refugia for R. rosea (György et al., 2016)  F I G U R E 7 Hypothetical distribution routes for Rhodiola rosea based on trn L-F sequences. A, two distinct ancient evolutionary lineages started from the QTP: an eastward and a westward expanding lineage. The westward route followed the mountain ranges of the Central Asiatic highland corridor and the EAS to the west and the Taymir-Siberian region to the northwest. The mutations resulting in the two indels happened in the area of the British Isles in a stepwise manner and the altered lineage moved forward to the North to Scandinavia and to the western parts of North America; B, two distinct ancient evolutionary lineages started from the QTP: an eastward and a westward expanding lineage. The westward route followed the mountain ranges of the Central Asiatic highland corridor and the EAS to the west and the Taymir-Siberian region to the northwest. The mutations resulting in the two indels have happened somewhere in the northern glacial refugia or even before the glaciations. The altered lineage has moved through the Amphi-Atlantic route and in the British Isles it met the ancient lineage where they hybridized resulting the local unexpectedly high diversity. Signs are showing the mutation events Interestingly, in case of the trnL-F region another, earlier not known duplication has been revealed within the EAS, namely in the sample of the Eastern Carpathians, the Calimani Mts. (Figure 3A). Although only one population was included in the study, we could detect the signs of a different gene stock only persisting in the Eastern Carpathians. Samples from the neighboring Southern Carpathians proved to be different as they belong to the large group without any insertions in the trnL-F region ( Figure 2, group A). The sample from the Calimani Mts. indeed represents a distinguished haplotype based on psbA5'R-matk8F, psbA-trnH GUG , and 5'rpS12-rpL20 loci also. We consider it would be interesting to analyze more samples from this region to highlight the characteristics of the gene stock preserved along the Carpathians. Earlier, genetic lineages different from those of the Alps have been reported from the Northern Carpathians, the Tatra Mts.; in case of D. octopetala, an eastern genetic lineage was detected being more closely related to the northeast Russian ones than to the Alpine (Skrede et al., 2006). The available case studies reported also phylogeographical structuring of populations from the Eastern and Southern Carpathians (Mráz et al., 2007;Ronikier, 2011).

| CON CLUS ION
Our study explores high resolution in the genetic pattern of R. rosea a widely distributed arctic-alpine perennial species of the Northern Hemisphere based on six chloroplast regions. As it has been already stated this species originates from the QTP from where it has expanded through the NH. Migration and diversification toward the east were documented by several studies (Hou & Lou, 2014;Kozyrenko et al., 2011). Based on the sequence alignment of the trnL-F region, northern expansion toward Siberia and the colonization of western Eurasia have started most probably at the same historical time (Zhang et al., 2014). Our results support the migration of the species into Europe via the Central Asian highland corridor, reaching the EAS and also the western Apart from those of the EAS, a common lineage was detected along the Atlantic coast from the British Isles toward Scandinavia as well as Iceland and the Eastern parts of North America. Accordingly, the British Isles seems to represent the main link between the northern Atlantic and southern EAS lineages.

ACK N OWLED G M ENTS
The following people are acknowledged for their assistance in col-