Multiple late-Pleistocene colonisation events of the Antarctic pearlwort Colobanthus quitensis (Caryophyllaceae) reveal the recent arrival of native Antarctic vascular flora

Aim: Antarctica's remote and extreme terrestrial environments are inhabited by only two species of native vascular plants. We assessed genetic connectivity amongst Antarctic and South American populations of one of these species, Colobanthus quitensis , to determine its origin and age in Antarctica. Taxon: quitensis Methods: Four chloroplast markers and one nuclear marker were sequenced from 270 samples from a latitudinal transect spanning 21–68° S. Phylogeographic, population genetic and molecular dating analyses were used to assess the demographic history of C. quitensis and the age of the species in Antarctica. Results: Maritime Antarctic populations consisted of two different haplotype clusters, occupying the northern and southern Maritime Antarctic. Molecular dating analyses suggested C. quitensis to be a young (<1 Ma) species, with contemporary population structure derived since the late-Pleistocene. Main conclusions: The Maritime Antarctic populations likely derived from two independent, late-Pleistocene dispersal events. Both clusters shared haplotypes with sub-Antarctic South Georgia, suggesting higher connectivity across the Southern Ocean than previously thought. The overall findings of multiple colonization events by a vascular plant species to Antarctica, and the recent timing of these events, are of significance with respect to future colonizations of the Antarctic Peninsula by vascular plants, particularly with predicted increases in ice-free land in this area. This study


| INTRODUC TI ON
Antarctic terrestrial ecosystems experience some of the most extreme conditions on Earth. Estimates of current ice-free land surface area range from ~0.2% to 0.4% (Burton-Johnson, Black, Fretwell, & Kaluza-Gilbert, 2016;Terauds et al., 2012), with glacial models suggesting that most if not all of this area has been covered by ice during multiple glacial cycles (DeConto & Pollard, 2016;Pollard & DeConto, 2009). Significant ice sheet expansions occurred in the Miocene  (Convey et al., 2008).
Recent biological research has challenged this view, revealing many examples of species with long-term pre-glacial persistence.
Recent molecular research has revealed the Antarctic bryophyte flora to comprise a mixture of long-term survivors (Biersma et al., 2017;Biersma, Jackson, Stech, et al., 2018;Ochyra, 2003;Pisa et al., 2014) and more recent arrivals (Biersma, Jackson, Bracegirdle, et al., 2018;Biersma et al., 2017;Kato, Arikawa, Imura, & Kanda, 2013). While long-term survivors can be found within the bryoflora, the low diversity within the vascular flora suggests that it may be of recent origin. Fasanella, Premoli, Urdampilleta, González, and Chiapella (2017), studying the genetic diversity within D. antarctica, recently detected 17 nuclear DNA and six plastid DNA haplotypes in Patagonia, while Antarctica had just one nuclear and four plastid DNA haplotypes. As the haplotypes present in Antarctica were only a small fraction of those present in Patagonia, and the nuclear haplotype in Antarctica was also found in Patagonia, this suggested that the species likely dispersed to the Antarctic in the mid-to late-Pleistocene.
Here, by applying population genetic and molecular dating analyses to C. quitensis specimens collected from across the widest range of localities sampled to date, we aimed to assess (a) whether C. quitensis may have survived the LGM in refugia in the Maritime Antarctic (encompassing the Antarctic Peninsula, South Shetland Islands and South Orkney Islands), or (b) whether its arrival in these regions is a more recent post-glacial event.

| Sampling
The full biogeographic range of C. quitensis includes areas of the Antarctic Peninsula north from 69° S (Convey, Hopkins, Roberts, & Tyler, 2011), the South Shetland Islands, the South Orkney Islands, South Georgia, the Falkland Islands, the southern ranges of Chile and Argentina, the High Andes regions of Chile, Argentina, Ecuador and Bolivia, and extends into Mexico (Moore, 1970). Our dataset consisted of 270 samples collected from across the southern part of the species' biogeographic range, where it is most commonly found (see Figure 1a; for the full distribution of the species see Figure 1b). To allow for a detailed study of both within-population and wider geographical variation, we combined two types of available datasets: (a) a population level dataset of a total of 200 freshly collected samples from 19 different field locations, with several (n > 1) samples collected per location, and (b) a dataset derived from single samples (n = 1) from 70 locations, derived from herbarium specimens, single fresh collections and one previously sequenced specimen from GenBank (see Table S1, Appendix S1). We included as many samples from the two datasets as possible in all analyses.
For the phylogenetic analyses, we included 21 samples (four from GenBank and 17 newly sequenced samples) from seven additional Colobanthus species as outgroups, viz., C. subulatus (D'Urv.) Hook.f., C. kerguelensis Hook.f., C. apetalus (Labill.) Druce, C. strictus Cheesem., C. hookeri Cheesem., C. affinis (Hook.) Hook.f. and C. masonae L.B.Moore. For the molecular dating analyses, nine specimens from Sagina species (seven from GenBank and two newly sequenced samples) were included as outgroups, based on the relationships reported by Dillenberger and Kadereit (2014) and Greenberg and Donoghue (2011) (see Table S1 for information on samples and GenBank accession numbers). Herbarium samples were obtained from the herbaria of the British Antarctic Survey, UK, and the University of Magallanes in Punta Arenas, Chile (herbarium codes AAS and HIP, respectively).
DNA was extracted from leaf tissue using the DNeasy Plant Mini Kit (Qiagen GmbH, Hilden, Germany), and E.Z.N.A. Plant DNA Kit (Omega Biotek, USA) following the manufacturers' instructions, using liquid nitrogen and a mortar and pestle for tissue disruption. PCR amplification was carried out using the Taq PCR Core Kit (Qiagen GmbH, Hilden, Germany) and Platinum Taq DNA polymerase (Invitrogen, Life Technologies) according to the manufacturers' instructions, with the addition of 1 μl of bovine serum albumin.
Primer information and annealing temperatures are given in Table S2 (Appendix S1). Forward and reverse sequencing was performed by LGC Genomics (Berlin, Germany) and Macrogen (Seoul, Korea).
Forward and reverse sequences were combined and aligned with prank 140603 (Löytynoja & Goldman, 2008), using default settings, with minor corrections made manually. Models of DNA sequence evolution were selected using jModelTest 2.1.10 (Darriba, Taboada, Doallo, & Posada, 2012;Guindon & Gascuel, 2003) implementing the SPR base tree search, G rate variation option and the corrected Akaike information criterion (AICc) method for model comparisons.

| Phylogenetic and population genetic analyses
Bayesian analyses were performed using MrBayes 3.2 (Ronquist et al., 2012). All analyses were run for 25 × 10 6 generations, applying default settings and the closest match to the jModelTest identified substitution models per partition (cpDNA: nst = 6, rates = gamma; ITS: nst = 1, rates = equal), sampling every 1.0 × 10 3 generations, and omitting the first 25% of trees as burn-in. Convergence was assessed using Tracer 1.6 (Rambaut, Suchard, Xie, & Drummond, 2014) by verifying that split frequencies had an average standard deviation of <0.01 and all posterior parameter estimates exceeded effective sample sizes by >200. Maximum clade credibility trees with median heights were visualized using Figtree 1.4.2 (http:// tree.bio.ed.ac.uk/softw are/figtr ee/). Maximum likelihood analyses were performed using RAxML-GUI 1.3.1 (Silvestro & Michalak, 2012), applying the 'bootstrap+consensus' option (1,000 iterations) using the jModelTest identified models of evolution with other settings as default. We inferred trees for ITS, trnQ-rps16, atpB-rbcL and the combined chloroplast regions ndhF-rpl32R and rpl32F-trnL (the latter were combined because they were always present together), as well as generating a combined tree using ndhF-rpl32R, rpl32F-trnL and ITS (for which data from most specimens were included; n = 232). To assess for topological incongruence among phylogenies derived from the cpDNA and nrDNA partitions, we used >70% bootstrap (BS) and >95% posterior probability support (PP) thresholds. Topological conflicts were assumed to be significant if two conflicting relationships for the same set of taxa were both supported with bootstrap values ≥70% and PP ≥95%. Phylogenetic analyses of the combined cpDNA and nrDNA datasets were conducted using an alignment containing unique sequences only (for a list of unique sequences see Table S1), that were extracted from the full dataset using Geneious 9.1.8 (https:// www.genei ous.com).

| Molecular dating
Relative divergence times and ages for C. quitensis were calculated using starbeast 2.5.1 (Bouckaert et al., 2014) on the combined ndhF-rpl32R, rpl32F-trnL and ITS dataset that also included nine Sagina specimens as outgroups. Analyses were performed using the unique haplotypes only. As there are no fossil or geological calibration points available for molecular dating within the genus, we used two alternative methods for date calibration: (a) employing a previously calculated divergence date for the split between Colobanthus and Sagina from Dillenberger and Kadereit (2017), in the form of a lognormal prior of 3.44 Ma and a 95% highest posterior density interval of 1.34-5.91 Ma, and (b) applying a substitution rate on the cpDNA partition of 0.8 ± 0.06 × 10 −9 subst./site/year, based on the rate estimated for chloroplast noncoding regions by Yamane, Yano, and Kawahara (2006) and previously applied on the Caryophyllaceae species Silene acaulis (Gussarova et al., 2015). For both methods, we applied a coalescent Bayesian Skyline tree prior, strict molecular clock and the JC69 and GTR+G models of evolution for ITS and cpDNA markers, respectively, allowing independent clocks for both genomic partitions. We used a linear multi-species coalescent with constant root as well as the appropriate ploidy level gene trees for both partitions. All runs had a chain length of 1.0 × 10 8 generations, logging parameters every 5.0 × 10 3 generations. Convergence was assessed in Tracer as described above. A maximum clade credibility tree with median node heights and 10% burn-in was constructed using TreeAnnotator 2.5.1 (Bouckaert et al., 2014) and visualized using Figtree 1.4.2 (http://tree.bio.ed.ac.uk/softw are/figtr ee/).

| Genetic diversity
Of all markers, the combined rpl32-trnL and ndhF-rpl32 regions were the most variable, followed by ITS, atpB-rbcL and trnQ-rps16 (containing 16, 7, 5 and 3 PI sites, respectively; Table 1). Amongst the different biogeographic regions, Patagonia showed the highest molecular diversity (gene diversity, variable and parsimony informative sites and nucleotide diversity), followed by the High Andes, the Antarctic Peninsula and South Georgia, with no variation being detected within the South Shetland Islands (Table 1). The neutrality tests revealed a significant negative Fu's F s value for the combined atpB-rbcL and the combined ndhF-rpl32 and trnQ-rps16 datasets, indicating a likely recent population expansion or change in selection for these DNA regions within the species.

| Phylogenetic and population genetic analyses
The phylogenetic analyses revealed no significant topological incongruences among rpl32-trnL and ndhF-rpl32 and ITS partitions, allowing for a combined analysis of data from these cpDNA and nrDNA regions ( Figure 2; see Figure S1.1 and S1.2a-d in Appendix S1 for phylogenetic trees of combined and single markers, respectively).

| Divergence time analysis
The estimated age using the fossil-calibrated method (I) revealed an earlier split of the most recent common ancestor (T MRCA ; split Sagina -Colobanthus) than the rate-informed dating analysis (II) ( Table 3).
Both methods suggested that the genus Colobanthus diverged throughout the course of the Pleistocene, and that C. quitensis origi- using method (I) were 7.20 ± 0.06 × 10 −9 subst./site/year for ITS, and 2.27 ± 0.06 × 10 −8 subst./site/year for ndhF-rpl32R+rpl32F-trnL. The rate calculated for ITS using method (II) was approximately sixfold lower, at 1.13 ± 0.01 × 10 −9 subst./site/year. TA B L E 1 (a) Summary of genetic diversity indices on all markers within Colobanthus quitensis. Tajima's D and Fu's F s neutrality tests were performed on chloroplast markers (monophyletic groups only). (b) Genetic indices of biogeographic regions with sample sizes of n > 10 within the concatenated ndhF-rpl32 and rpl32-trnL dataset  America could also have been the original source location, with this region being thought to have harboured various Pleistocene F I G U R E 2 (a) Bayesian phylogeny for Colobanthus quitensis and outgroup species of unique sequences within the combined ITS and ndhF-rpl32R+rpl32F-trnL dataset (for list of unique sequences see Table S1). Posterior probabilities and maximum likelihood bootstrap values from MrBayes and RAxML-GUI analyses, respectively, are shown below each node. Colours indicate biogeographical regions shown in Figure 1. Note a disparity between posterior probabilities and maximum likelihood bootstrap support values, likely caused by short branch lengths and/or polytomies (Lewis, Holder, & Holsinger, 2005) refugia (Sersic et al., 2011). Intriguingly, the southern Maritime

| The origin of Colobanthus quitensis in Maritime Antarctica and in South Georgia
Antarctic group also showed a close affinity to the northern High Andes populations (separated by only two mutational steps; Figure 3b,c), a finding worthy of investigation in future studies.

| Recent arrival of the Antarctic vascular flora
Exactly when the dispersal events across the Southern Ocean occurred is not certain, but the shared haplotypes and genotypes with specimens from South Georgia as well as the genetic similarity to specimens from South American regions suggest that C.
quitensis reached the Antarctic on a relatively recent (late-Pleistocene) timescale. As we report here, previous studies have recorded low genetic variation within C. quitensis (Acuña-Rodríguez et al., 2014;Androsiuk et al., 2015;Koc et al., 2018;Lee & Postle, 1975), suggesting a recent spread of the species along the Antarctic Peninsula. This stands in contrast with an earlier suggestion that C. quitensis is a likely pre-glacial relict present in Antarctica since the Oligocene-Pliocene (Parnikoza et al., 2007). Notably, we also find that C. quitensis is itself a relatively young species (<1 Ma; see Table 3), and much younger than the Oligocene-Pliocene.
Our overall results suggest that C. quitensis likely only became

S. Georgia
Falkland Is.
10 samples 1 sample High Andes

| Genetic variation of Colobanthus quitensis within southern South America
Based on the sampling included in this study, an early split could be found between the majority of the populations from the central South American Andes and those from the remaining populations, including Patagonia ( Figure 2). We note that we did not have access to material from populations from areas further north in South America and in southern North America (Mexico), where the species is also sporadically found (see Figure 1b). With its origin in the Andean range and/or cold regions of Patagonia (see Figure 2), C. quitensis is likely pre-adapted to cold, high altitude environments, which are characterized by highly variable conditions (e.g. in temperatures and water availability). The genetic similarity of C. quitensis across its current biogeographical distribution suggests ecological niche conservation in its ability to withstand harsh and/or variable conditions (as shown by its tolerance to cold, moderately saline and/or dry environments), combined with opportunistic dispersal capabilities to reach and colonize other suitable habitats (e.g. Antarctic and sub-Antarctic environments). Overall, the genetic information shown here may be useful for future studies that apply niche comparative methods to link macroclimatic variables in explaining the past, present and future distribution of C. quitensis.  (Dillenberger & Kadereit, 2017). b Based on estimated substitution rate for noncoding chloroplast regions (Yamane et al., 2006).

TA B L E 3
Mean estimated time to most recent common ancestor (T MRCA ) (95% HDP lower-upper) for Sagina and Colobanthus, the genus Colobanthus and species C. quitensis, using two alternative methods for date calibration (see footnotes). All analyses were performed on the combined ndhF-rpl32R, rpl32F-trnL and ITS dataset While the overall distribution of Colobanthus appears Gondwanan (including representatives from New Zealand, Australia, South America, many sub-Antarctic islands and Antarctica), the genus' age is clearly much younger than the break-up of Gondwana (see Table 3 and Dillenberger & Kadereit, 2017), supporting the hypothesis that many Colobanthus species are efficient trans-oceanic dispersers. This is confirmed by their presence on various geologically young islands, such as sub-Antarctic Prince Edward Island (c. 215 ka; Hänel & Chown, 1998).
Further phylogeographical studies are required to assess historical dispersal and speciation patterns within the wider genus.

| Possible modes of dispersal
While bryophytes and other spore-dispersed biota could have been distributed to Antarctica by wind (e.g. see Biersma, Jackson, Bracegirdle, et al., 2018), the weight of the seeds of C. quitensis . Whether its seeds could survive exposure to seawater during rafting is unknown, but recently an example of rafting kelp has revealed that Antarctica is not completely isolated from biological sea-rafting particles from mid-latitude source populations (Avila et al., 2020).
Another possible mode of dispersal for C. quitensis could have been via the plumage of common Antarctic birds, such as gulls (Parnikoza et al., 2012(Parnikoza et al., , 2018. However, C. quitensis seeds are smooth and have no hooks or spines to facilitate their attachment to bird plumage, lessening the likelihood of this type of trans-oceanic dispersal. Alternatively, dispersal via the guts of birds, such as the whiterumped sandpiper (Calidris fuscicollis) could have facilitated a historical dispersal event. This long-distance migrating shorebird, breeding in the North American Arctic and wintering in southern South America and the Falkland Islands, is also a rare visitor to South Georgia and the South Shetland Islands (Trivelpiece et al., 1987), where sightings have increased over the last 30 years (Korczak-Abshire, Angiel, & Wierzbicki, 2011). The species has also been observed east of the Antarctic Peninsula (James Ross Island), on the western Antarctic Peninsula, and as far south as Rothera Point, Adelaide Island (Pavel & Weidinger, 2013). In its wintering grounds in southern South America, C. fuscicollis feeds on wetlands, shores and saltmarshes. Here, its diet consists mainly of invertebrates, but it also feeds on seeds, including Caryophyllaceae and Poaceae species, which can make up its entire stomach contents (Montalti, Arambarri, Soave, Darrieu, & Camperi, 2003). A rare dispersal event of this seed-foraging shorebird could thus have facilitated the transfer of either or both C. quitensis and D.
antarctica to Antarctica. Future studies are needed to investigate the likelihood of this mode of dispersal.

ACK N OWLED G EM ENTS
We thank Osvaldo J. Vidal for access to the HIP herbarium, Simon

DATA AVA I L A B I L I T Y S TAT E M E N T
Sequence data have been submitted to the GenBank database under accession numbers MN640112-MN640391 and MN614479-MN615128 (see Table S1, Appendix S1). Phylogenetic and Popart matrixes are available in the Dryad Digital Repository (https://doi. org/10.5061/dryad.qrfj6 q5bw).