The xeric side of the Brazilian Atlantic Forest: The forces shaping phylogeographic structure of cacti

Abstract In order to investigate biogeographic influences on xeric biota in the Brazilian Atlantic Forest (BAF), a biodiversity hotspot, we used a monophyletic group including three cactus taxa as a model to perform a phylogeographic study: Cereus fernambucensis subsp. fernambucensis, C. fernambucensis subsp. sericifer, and C. insularis. These cacti are allopatric and grow in xeric habitats along BAF, including isolated granite and gneiss rock outcrops (Inselbergs), sand dune vegetation (Restinga forest), and the rocky shore of an oceanic archipelago (islands of Fernando de Noronha). The nucleotide information from nuclear gene phytochrome C and plastid intergenic spacer trnS‐trnG was used to perform different approaches and statistical analyses, comprising population structure, demographic changes, phylogenetic relationships, and biogeographic reconstruction in both spatial and temporal scales. We recovered four allopatric population groups with highly supported branches in the phylogenetic tree with divergence initiated in the middle Pleistocene: southern distribution of C. fernambucensis subsp. fernambucensis, northern distribution of C. fernambucensis subsp. fernambucensis together with C. insularis, southern distribution of C. fernambucensis subsp. sericifer, and northern distribution of C. fernambucensis subsp. sericifer. Further, the results suggest that genetic diversity of population groups was strongly shaped by an initial colonization event from south to north followed by fragmentation. The phylogenetic pattern found for C. insularis is plausible with peripatric speciation in the archipelago of Fernando de Noronha. To explain the phylogeographic patterns, the putative effects of both climatic and sea level changes as well as neotectonic activity during the Pleistocene are discussed.

. The BAF comprises two distinct bioclimatic regions (northern and southern), with a transition zone near the Doce River (Carnaval et al., 2014). Plant taxa restricted to only one of these regions are common, resulting in strong floristic distinction between the northern and southern BAF (Fiaschi & Pirani, 2009). Moreover, recent studies suggest that BAF combines influences of historical connections with other biomes such as the Amazon forest (Sobral-Souza, Lima-Ribeiro, & Solferini, 2015), seasonally dry tropical forest (SDTF, Mogni, Oakley, & Prado, 2015), and Cerrado (Antonelli, Verola, Parisod, & Gustafsson, 2010), resulting in a wide heterogeneity of plant communities. Although BAF harbors predominantly evergreen rainforest, it also includes xeric or open vegetation areas differing markedly from that of the surroundings, such as the open scrub vegetation along the sandy coastal plains named restinga, as well as the inselberg flora (Scarano, 2002).
Restinga communities are predominantly Quaternary habitats characterized by sandy soils mainly covered by herbaceous and shrubby xeric vegetation exposed to oceanic influence and direct solar radiation (Scarano, 2002). This vegetation extends in a narrow belt along most Brazilian coastal plains between evergreen forest and the sea. Inselbergs are isolated rock outcrops of Precambrian granite and gneiss, harboring a rich flora associated with harsh conditions such as poor soil, high temperature, and insolation, leading to low moisture retention (Porembski, 2007). As these rock outcrops are mostly embedded in a forest matrix, they are frequently considered continental islands (Pinheiro et al., 2013;Porembski, 2007). Different factors are proposed to explain phylogeographic patterns in BAF, including rivers (e.g., Cazé et al., 2016;Neto, Furtado, Zappi, Filho, & Forzza, 2016) and geological faults (Batalha-Filho et al., 2013;Thomé, Zamudio, Haddad, & Alexandrino, 2014;Thomé et al., 2010) as putative geographic barriers. Further, Pleistocene climatic changes have also been invoked to explain diversification within BAF (e.g., Cabanne et al., 2016;Cardoso, Cristiano, Tavares, Schubart, & Heinze, 2015), by persistence of rainforest species in stable areas (refugia) in the northern bioclimatic region and expansion of open vegetation formation in the south (Carnaval et al., 2014). These global climatic changes also had impacts on sea level, which in turn might have influenced the geographic distribution of coastal vegetation (Ramos-Fregonezi et al., 2015), for instance by the exposition of Brazilian continental shelf during glacial periods (Leite et al., 2016).
In order to contribute to the understanding of the diversification events in xeric habitats of BAF, we performed a phylogeographic study with the columnar cacti Cereus fernambucensis Lemaire and C. insularis Hemsley (Cactaceae; Cereeae), which represents a Pleistocene monophyletic lineage (Franco et al., 2017). While C. insularis grows only on the small oceanic archipelago of Fernando de Noronha (3.8°S, 32°W), C. fernambucensis has a wider distributional range along xeric areas of BAF and is represented by two allopatric subspecies named C. fernambucensis subsp. fernambucensis and C. fernambucensis subsp. sericifer (Ritter) Taylor & Zappi (Taylor & Zappi, 2004). The main morphological distinctions between these subspecies are the size of the vegetative body and flowers, flower color, and fruit color (see Appendix S1). Cereus fernambucensis subsp.
fernambucensis is a characteristic component of restinga forest, growing in dunes and rocky seashores along the eastern Brazilian coastal plains, with latitudes around 5-25°S. Conversely, C. fernambucencis subsp. sericifer has an inland and more fragmented distribution, associated with granitic and gneissic inselbergs in southeastern Brazil.
Here, we addressed several biological questions regarding the studied Cereus taxa and delineate the following predictions: (i) considering the close phylogenetic relationship between C. insularis and C. fernambucensis (Franco et al., 2017) and based on island biogeography assumptions (Cowie & Holland, 2006), we expect a peripatric origin of C. insularis caused by a founder effect from continental populations, likely leading to the paraphyly of the progenitor lineage (C. fernambucensis subsp. fernambucensis); (ii) higher population divergence in C. fernambucensis subsp. sericifer due to its more fragmented range in comparison with C. fernambucensis subsp. fernambucensis; (iii) based on previous phylogeographic data from co-occurring restinga species of cactus (Menezes et al., 2016) and cactophilic flies (Franco & Manfrin, 2013), we expect to find population fragmentation along restinga forest; and finally, (iv) based on Pleistocene Refuges Hypothesis (PRH), it is expected that paleoclimatic Pleistocene oscillations should have influenced the demographic dynamics of our focal group in the past, likely promoting range expansion and higher genetic connectivity during glacial periods and, consequently, shaping the present-day patterns of geographic distribution and population structure.

| Sampling and molecular methods
We sampled individuals from 31 localities, including 20 locations of C. fernambucensis subsp. fernambucensis, seven of C. fernambucensis subsp. sericifer, and four from C. insularis, covering the entire documented distribution of our ingroup (Table 1; Figure 1). Total Genomic DNA was extracted from root tissues with DNeasy Plant Mini Kit (Qiagen). Exon 1 from nuclear phytochrome C (PHYC) gene and the plastid intergenic spacer trnS-trnG were used as molecular markers.
These segments were selected based on previous variability screening for Cereus (Romeiro-Brito, Moraes, Silva et al., 2016). Amplification reactions for trnS-trnG and PHYC were performed following Bonatelli, Zappi, Taylor, and Moraes (2013) and Helsen, Browne, Anderson, Verdyck, and Dongen (2009), respectively. The direct sequencing was prepared using the Big Dye terminator 3.1 kit (Applied Biosystems) and conducted in Gene Amp PCR System 9700 (Applied Biosystems). The forward and reverse sequences were assembled in Chromas 1.5 software, and the alignments were performed in ClustalW (Thompson, Higgins, & Gibson, 1994
Species tree (Edwards, Liu, & Pearl, 2007) was estimated using BEAST2 (Bouckaert et al., 2014), assuming the selected substitution model, Yule tree coalescent prior and the relaxed LogNormal clock model for PHYC and strict clock model for trnS-trnG. The clock model for each partition was selected after comparison of the marginal likelihoods from independent runs assuming strict or relaxed lognormal clocks in a path sampling analysis with eight steps and 500 thousands generation after a 50% burn-in. The species tree was obtained after 100 million MCMC generations, with a 25% burn-in, and sampling trees every 5,000 steps. Divergence time for each node was estimated using a uniform prior distribution for the plastid marker trnS-trnG including the minimum and maximum substitution rates observed in the chloroplast sequences of angiosperms, that is, 0.29% and 0.11% of substitution per million years (Bonatelli et al., 2014;Wolfe et al., 1987), respectively, and using a wide prior for PHYC evolutionary rate following Perez, Bonatelli, Moraes, and Carstens (2016).
The discriminant analysis of principal components (DAPC) was performed in the R package adegenet (Jombart, Devillard, & Balloux, 2010) after a preliminary run using the smallest number of principal components (PC) that accounted for the total variance in the data, and for a crescent number of clusters (K) from 2 to 10, to assess the most likely number of groups through Bayesian Information Criterion (BIC). An optimization procedure was carried to select the number of PCs, in order to maximize the successful reassignment of data, measured as the α-score. The output of the DAPC analysis was plotted with Distruct (Rosenberg, 2004). Global and hierarchical analysis of molecular variance (AMOVA) and standard diversity indexes were conducted with the program Arlequin 3.5.2.2 (Excoffier & Lischer, 2010).

Geographic coordinates (S, W) Voucher
We also performed extended Bayesian skyline plot (EBSP) in BEAST2 (Bouckaert et al., 2014) using trnS-trnG and PHYC datasets, assuming the models for nucleotide evolution and molecular clock identified for each partition. The substitution rate was available only for plastid sequences. Depending on the analysis, the MCMC runs were carried for 20 to 80 million generations sampling every 2,000 steps, with a 15% burn-in. The results of EBSP were analyzed in TRACER 1.6 (available from http://beast.bio.ed.ac.uk/Tracer).
We used 1,000 random trees from our species tree output in both analyses and assumed four as the maximum number of ancestral areas.
For discrete phylogeography (DP) approach, we implemented the diffusion model (

| Circumscription of genetic groups
We ter. An additional DAPC run using the α-score to refine our analysis suggests that only 3 PCs (89.6% of the total variance) needed to be retained, and we showed the results of this optimization. The four groups recovered by the DAPC matched the same grouping as revealed in the haplotypes network (Figure 1c).
The four population groups recovered in DAPC were also highly supported by the species tree analysis, but the genealogical relationships among them showed low support (Figure 2). AMOVA results were congruent with DAPC and the species tree, as the among-group variance taking into account four genetic groups was higher than the taxonomic circumscription ( Table 2). The standard indices of diversity by population group are given in Table 3.

| Demographic analyses
We considered the reliability of inferred demographic events according to the congruence detected by both analyzed markers.
Thus, only FS group showed a signature of population expansion as evidenced by a unimodal mismatch distribution and significant negative values of Tajima's D (Table 3). However, the EBSP performed for this group did not reach convergence even after several attempts with different priors, probably due to low intragroup genetic resolution. To overcome the lack of genetic variation, we also reported the neutrality tests (Table 3) and EBSP for total sample, as we adopted a sampling strategy similar to the "pooled" scheme simulated by Heller, Chikhi, and Siegismund (2013) which minimizes the effects of substructure in demographic inferences. At any rate, we cannot reject constant population size as the parameter "number of population changes" statistically did not differ from zero [mean: 1.02 (95% HPD: 0.00-3.00)].

| Time estimates and biogeographic analysis
The  (Table 4). The S-DIVA recovered a large geographic area (NII + SII + SRF + NRF1) as a most probable ancestral range.
For BBM analysis, the results were partially congruent (Table 4; see Appendix S3).
Although both S-DIVA and BBM analyses are more suitable for higher phylogenetic levels and polytomies, as we found here, may obscure ancestral reconstructions (Yu et al., 2015) we performed the discrete phylogeography diffusion (Lemey et al., 2009)

| DISCUSSION
The levels of genetic diversity found in both the trnS-trnG and PHYC were similar, with the plastid marker exhibiting a higher geographic structure (Appendix S2). In plants, contrasting population structure estimates from plastid and nuclear markers might be associated with differences between seed and pollen dispersal leading to cytonuclear

Source of variation df
Variance components T A B L E 2 Analysis of molecular variance (AMOVA) based on trnS-trnG and PHYC variation discordance (Petit & Excoffier, 2009). For C. fernambucensis, xenogamy is the predominant system of reproduction and its pollination is promoted mainly by the hawkmoth Cocytius antaeus (Locatelli & Machado, 1999). As the hawkmoths travel long distances (Locatelli & Machado, 1999), they may potentially spread pollen and genetic information through time among populations.

Percentage of variation
Regarding seed dispersal, the genus Cereus has zoochorous fruits for which dispersal is attributed to frugivorous bats, small mammals, and birds (Taylor & Zappi, 2004). There is no specific information for our focal group, but birds seem to be the main agents responsible for seed dispersal in C. jamacaru (Gomes, Quirino, & Araujo, 2014), a member of the sister lineage of our ingroup (Franco et al., 2017). BAF includes a high bird diversity (Myers et al., 2000); therefore, we may hypothesize that seed dispersal for cacti is also very effective in this biome. In the face of potentially high capacities for seed and pollen dispersal in our target species, the different levels of phylogeographic structure between trnS-trnG and PHYC should be better explained by distinct effective population size of these markers, leading the nucleus to retain shared polymorphism for a longer time than the plastid.
Instead of a clear reciprocally monophyletic pattern for the three taxa included in our sample, we found four supported genetic groups with unresolved relationships among them. The divergence observed among these lineages, rather than the lack of phylogenetic signal, is likely explained by a rapid process of population diversification in the The codes for populations groups are the same used in Figure 1.
The least-squares procedure to fit model mismatch distribution and observed distribution did not converge after 2,000 steps.
T A B L E 4 Results of biogeographic reconstructions in S-Diva and BBM. The geographic areas (ISL, NRF1, NRF2, SRF, NII, and SII) are described in Figure 2 Lineages Ancestral range (probability value) a Biogeographic event (probability value) a Pleistocene ( Figure 2). Moreover, the absence of a clear signature of recent demographic shift suggests that patterns of genetic diversity in these groups were strongly shaped by the initial colonization event followed by fragmentation. Interestingly, C. insularis composes a monophyletic group together with northern populations of C. fernambucensis subsp. fernambucensis, leading C. fernambucensis to be paraphyletic ( Figure 2). This is likely a consequence of accelerated population differentiation of the C. insularis lineage after the colonization at Fernando de Noronha by archaic continental populations of C. fernambucensis lineage, favoring peripatric speciation. In fact, in a peripatric model of speciation, the parental widespread lineage becomes paraphyletic (Rieseberg & Brouillet, 1994), at least until the lineage sorting renders the parental species a monophyletic status (Funk & Omland, 2003). In agreeing with the idea of its peripatric origin, C. insularis shown a series of morphological vegetative differences in relation to C. fernambucensis, including higher rib number and smaller flower size (Taylor & Zappi, 2004) that might be a consequence of founder-event speciation which lead to rapid morphological differentiation of the peripheric and small population.

S-DIVA results
As these kind of oceanic islands were never totally connected to the continent (Cowie & Holland, 2006), peripatric speciation is far more plausible for C. insularis. Further, even though Fernando de Noronha orogenesis initiate during the Miocene (Cordani, 1970;Lopes & Ulbrich, 2016), it is also plausible that the long-term colonization of this archipelago occurred during the Pleistocene, as suggested by our age estimative (Figure 2). This is justified by the occurrence of intense volcanic activity until late Pleistocene (Cordani, 1970) that must have promoted recurrent extinctions and precluded the maintenance of terrestrial biota in these islands. The Fernando de Noronha archipelago is nowadays located at about 350 km from the continent (Figure 1) as the easternmost of a chain of seamounts aligned in an east-west direction to the Brazilian continental shelf (Vital, 2014). In this chain, only the Fernando de Noronha and Atol das Rocas archipelagos are emergent nowadays. However, sea level fluctuations probably promoted massive changes in the extension of oceanic island areas, which might have affected conditions for immigration, speciation, and extinction in such islands (Weigelt, Steinbauer, Cabral, & Kreft, 2016). In glacial periods of the Pleistocene, the average sea level was much lower than at present, exposing the northeast Brazilian continental shelf, which has around 40 km of extension in this region (Vital, 2014), and probably reducing the linear distance between Fernando de Noronha and the continent. More meaningfully, the lowering of sea level exposed some of the present-day submerged seamounts, which might have allowed a stepping stone colonization that reached Fernando de Noronha islands during the Pleistocene. A similar kind of colonization was also proposed for other oceanic islands, such as those from the Macaronesian archipelago (Rijsdijk et al., 2014).
Based on our population grouping inferences, and assuming that C. fernambucensis subsp. sericifer could be a monophyletic lineage, despite the low support in species tree for SS and SN population groups (PP = 0.77; Figure 2), we conjecture at least three main geographic centers of diversification for these cacti in BAF: (i) the inland rock outcrops and inselbergs; (ii) the southern restinga forest; and (iii) the northern restinga forest together with the islands of Fernando de Noronha. As the diversification was estimated to be within the Pleistocene (Figure 2), the influence of range shifts of BAF vegetation during this period could be hypothesized as the major driver of population differentiation across these areas, following PRH.
For BAF, majority of the evidences suggest that northern evergreen forest was more stable across Pleistocene climate changes, while the southern forest was covered predominantly by open vegetation during glacial times (Behling, 2002;Carnaval et al., 2014). Expansion of open vegetation in the past likely promoted a higher connectivity between inselberg and southern restinga vegetation. The initial fragmentation of these areas was likely driven by the isolation of inselberg and restinga biotas by subsequent expansion of the rainforest matrix between them, with dendritic infiltration along river valleys and surrounding rock outcrops during the past interglacial times. This probably explains both the initial differentiation of the C. fernambucensis subspecies, as well as the discontinuity in the distribution and differentiation of SN and SS populations groups of C. fernambucensis subsp. sericifer, which nowadays occur in distinct mountain ranges, separated by lowlands and river valleys ( Figure 1).
Although some studies show long-term connectivity in taxa from restinga, such as orchids (Pinheiro et al., 2011) and ants (Cardoso et al., 2015), we detected a break in geographic distribution of populations around latitude −17°, in the region politically known as southern Bahia, limiting the FS and FNI populations groups (Figure 1).
Despite this area being near to the delta of the Jequitinhonha River, which has been proposed as a riverine barrier for many BAF groups, including cacti from Pilosocereus arrabidae (Lem.) Byles & Rowley group (Menezes et al., 2016), the FNI group includes locations on both sides of this river (the S94 location at south and the remaining at north) suggesting that Jequitinhonha river has, at least nowadays, limited impact as barrier to gene flow in C. fernambucensis subsp. fernambucensis.
To conjecture about an alternative hypothesis to explain the disjunction observed in our target species in southern Bahia, it is important to observe that this region has a unique flora (Fernandes & de Queiroz, 2015). Further, it is highlighted in several biogeographic studies as a place of disjunction (Cazé et al., 2016;Menezes et al., 2016;Pinheiro et al., 2013)  As these studies include different organisms with distinct dispersal capacities, such as flies (Franco & Manfrin, 2013), amphibians (Carnaval et al., 2009), lizards (Pellegrino et al., 2011), and plants (Cazé et al., 2016;Pinheiro et al., 2013), it is reasonable that asynchrony and recurrent events such as the climatic Pleistocene oscillations could explain some of these empirical observations. However, is also possible that only PRH is an insufficient model to explain all these biogeographic data obtained from groups that diversified in different time scales and in some cases predating the Pleistocene (Pellegrino et al., 2011).

An alternative hypothesis to explain the disjunctions in southern
Bahia is the recent tectonic activity in this region. The geological faults identified in the "Barreiras" formation from southern Bahia, informally named here as "Cabralia" faults ( Figure 1), were likely active during the Quaternary, changing the landscape and the drainage system from a dendritic to a subparallel system (Lima, Vilas Boas, & Bezerra, 2006).
Thus, before the neotectonic activity, a net of interlaced channels forming temporary coastal lagoons probably predominated (Lima et al., 2006), and those might have acted as a barrier for cacti and, together with climatic change during the Pleistocene, could have contributed to the diversification of the two population groups of C. fernambucensis.
Likewise, these events could be related to the continuity/discontinuity dynamic inferred for other BAF taxa during the Pleistocene. Despite the fact that neotectonic activity in the "Guapiara" fault has been invoked as the main cause of vicariance in frogs (Thomé et al., 2010(Thomé et al., , 2014 and birds (Batalha-Filho et al., 2013) in southern BAF, to the best of our knowledge, this is the first time that such activity is used to conjecture about biogeographic scenarios in northern BAF. Further studies are needed to understand the causal impact of possible tectonic activity in the "Barreiras" formation for geographic distributions of BAF biota.
The ABC results were conclusive in rejecting the north-to-south colonization, as model 3 showed very low PP (Figure 3), and extremely low Bayes Factor values when compared to the other two models (results not shown). Further, the independent analyses performed here indicate that south-to-north colonization is more likely to explain our data, despite the lack of strong statistical support to discriminate between models 1 and 2. First, a south-to-north colonization is congruent with all independent biogeographic reconstructions based on distinctive assumptions. Second, the internal H1 haplotype found in SFR operational unit showed the higher outgroup weight (0.31) in statistical parsimony analysis, which is consistent with a condition of an ancient haplotype. Likewise, the haplotypes from the FNI population group found in operational geographic units NRF1 (H11 and H12) and ISL (H14) present a tip position in genealogy (Figure 1) in agreement with more recent haplotypes.
The pattern of south-to-north dispersion for taxa associated with xeric habitats and the opposite pattern for those taxa associated with forested areas might be a widespread phylogeographic pattern in BAF, considering the idea that the northern BAF has been more stable in maintaining the forests during glaciations than the southern BAF (Carnaval & Moritz, 2008;Carnaval et al., 2014). However, this expectation is not always observed in empirical studies, probably due to the differential impacts of climatic changes in different organisms as well as other biogeographic influences and idiosyncrasies.
BAF presents complex topography, broad latitudinal variation, and sea influence leading to a miscellany of historical events that can be ascribed to explain this hyperdiverse biome (Amaral et al., 2016;Cabanne et al., 2016). However, much emphasis has been given to PRH and riverine barriers, while alternative explanations such as the role of both recent orogenic activity and regression/transgression of sea level are relatively neglected. The emphasis on riverine and PRH hypotheses may partially be attributed to the scarcity of precise geological data for most of the Brazilian coast (Thomé et al., 2010). Another possible reason is the strong tendency of researchers to explain historical events occurring during the Quaternary as a single consequence of refuge models or, particularly for BAF, to use the traditional riverine hypothesis to explain discontinuity in geographic distribution. Evidently, these are meritorious explanatory models with much support by several studies. However, in the statistical phylogeography era, these preconceptions may result in the establishment of biased and simplistic scenarios to be tested in model-based analysis, thus limiting the emergence of new hypotheses.
The discussions about the main drivers of diversification in BAF have puzzled scientists over several years and remain controversial. In recent years, many efforts have been made to understand its astonishing diversity, including several case studies and the establishment of alternative biogeographic hypotheses, such as the putative impact of Brazilian shelf topography in the retention of forested areas (Leite et al., 2016; but see Amaral et al., 2016). Despite of this, data from taxa associated with xeric habitats in BAF are still scarce in comparison with taxa inhabiting core evergreen BAF forest. For a more complete picture of BAF biogeography, this bias needs to be minimized, as climatic events may impact taxa from forested and open vegetation areas in different ways. Furthermore, the idiosyncrasies of a particular taxon in response to climatic changes or putative geographic barriers have to be taken into account.

ACKNOWLEDGMENTS
We thank MSc Heidi S.M. Utsunomiya for technical assistance and Dr Gulzar Khan for revising the English. We also agree to the anonymous referees for critical comments on previous versions of this manuscript and to Mr. Gerardus Olsthoorn for supplying some of the samples used in this study. This work was supported by grants from the