Gene flow and Andean uplift shape the diversification of Gasteracantha cancriformis (Araneae: Araneidae) in Northern South America

Abstract The Andean uplift has played a major role in shaping the current Neotropical biodiversity. However, in arthropods other than butterflies, little is known about how this geographic barrier has impacted species historical diversification. Here, we examined the phylogeography of the widespread color polymorphic spider Gasteracantha cancriformis to evaluate the effect of the northern Andean uplift on its divergence and assess whether its diversification occurred in the presence of gene flow. We inferred phylogenetic relationships and divergence times in G. cancriformis using mitochondrial and nuclear data from 105 individuals in northern South America. Genetic diversity, divergence, and population structure were quantified. We also compared multiple demographic scenarios for this species using a model‐based approach (phrapl) to determine divergence with or without gene flow. At last, we evaluated the association between genetic variation and color polymorphism. Both nuclear and mitochondrial data supported two well‐differentiated clades, which correspond to populations occurring on opposite sides of the Eastern cordillera of the Colombian Andes. The final uplift of this cordillera was identified as the most likely force that shaped the diversification of G. cancriformis in northern South America, resulting in a cis‐ and trans‐Andean phylogeographic structure for the species. We also found shared genetic variation between the cis‐ and trans‐Andean clades, which is better explained by a scenario of historical divergence in the face of gene flow. This has been likely facilitated by the presence of low‐elevation passes across the Eastern Colombian cordillera. Our work constitutes the first example in which the Andean uplift coupled with gene flow influenced the evolutionary history of an arachnid lineage.


| INTRODUC TI ON
The northern Andes in South America is one of the most biodiverse regions on the planet, and the origins of this rich diversity have been linked to past geological and climatic events such as the uplift of the Andes and quaternary climatic oscillations (Kattan, Franco, & Rojas, 2004). The effect of these geoclimatic events in promoting divergence between Neotropical populations and species can be elucidated using genetic data, especially by detecting deviations from the expected coalescent patterns in neutral loci (Rull, 2011). Most studies addressing this question have identified the uplift of the northern Andes as a major driver of Neotropical diversification in a scenario consistent with allopatric differentiation, wherein the complex topography of the Andes isolated populations on both sides of this barrier thus restricting gene flow (Antonelli et al., 2009;Brumfield & Capparella, 1996;Hoorn et al., 2010). In contrast with this predominant view, a recent comparative phylogeographic study found discordant divergence times for multiple avian lineages with cross-Andean distribution, a result that is better explained by dispersal ability across the Andes rather than a single vicariant event (Smith, Harvey, Faircloth, Glenn, & Brumfield, 2014;Smith, McCormack et al., 2014). In line with this finding, new evidence supports the notion that common diversification modes in Neotropical birds include secondary contact between cross-Andean populations or between divergence in the presence of gene flow, facilitated by low-elevation corridors along the Andes (Cadena, Pedraza, & Brumfield, 2016;Oswald, Overcast, Mauck, Andersen, & Smith, 2017).
However, our current knowledge on the modes of animal diversification in the northern Andes is mostly based on vertebrates, and despite arthropods being the most diverse group of animals, analyses of their diversification in this region remain scarce (De-silva et al., 2017;Turchetto-Zolet, Pinheiro, Salgueiro, & Palma-Silva, 2013).
Some studies limited to insects, especially butterflies, show that the Andean mountains played an important role in their diversification, where speciation with and without gene flow across the Andes has occurred (Arias et al., 2014;Chazot et al., 2016Chazot et al., , 2017De-silva et al., 2017;Díaz et al., 2014;Dick, Roubik, Gruber, & Bermingham, 2004;Elias et al., 2009). Despite this, a comprehensive understanding on how the Andean orogeny has promoted Neotropical animal diversification requires the inclusion of additional arthropod taxa, such as arachnids.

Gasteracantha cancriformis (Linnaeus, 1758) is a common
Neotropical orb-web spider that exhibits sexual dimorphism, with females being up to four times larger than males (5 to 13 mm vs 2 to 3 mm, respectively; Muma, 1971). Furthermore, while males apparently spin no definite web, females build a stronger net with reinforced guylines, distinct tufts of silk on the foundation lines and a definitive central disk (Muma, 1971). This orb-web varies in size and position, and is used to capture prey such as whiteflies, flies, moths, and beetles (Muma, 1971). The species is also characterized by its color polymorphism, with at least eight known morphs (Gawryszewski, 2007), but the causes maintaining this variation remain unknown (Gawryszewski & Motta, 2012).
Gasteracantha cancriformis has a wide distribution in the Neotropical and subtropical region (from the south of the United States to northern Argentina; Levi, 1978) and consequently occurs on both sides of the Andes and in the Colombian inter-Andean valleys. This distribution makes it an excellent model to test whether the uplift of the Andes has influenced its diversification at the population level. For example, the Andean uplift may have fragmented populations at opposite sides, thus limiting genetic connectivity; this scenario predicts reciprocally monophyletic clades at each flank of the Andes. Otherwise, long-distance migration of individuals across dispersal corridors in the Andes could favor gene flow among areas; this scenario of divergence with gene flow predicts monophyletic groups at opposite sides of the Andes sharing genetic variation between them. At last, rampant migration of individuals across the Andes could promote genetic homogenization of populations in spite of the geographic barrier; in this scenario, it is expected to find population clustering not explained by geography.
Here, we implemented a multilocus approach to study the genetic connectivity between polymorphic populations of G. cancriformis across the northern Andes ( Figure 1, Supporting information Table S1) and tested scenarios of strict vicariance versus divergence with gene flow. We also evaluated whether lineage clustering in this spider is explained by geography or color pattern. Overall, this work contributes to deepen our understanding of how Andean orogeny has shaped processes of biotic diversification and biogeography in the Neotropics.

| Sample collection
We used standard aerial searching and beating methods to sample a total of 105 individuals of G. cancriformis from 17 localities distributed from the north of Colombia to southeastern Brazil (Figure 1, Supporting information Table S1). Specimens were color coded following the categorization established by Gawryszewski (2007), preserved in a 20% dimethylsulfoxide (DMSO) solution saturated with NaCl, and stored at −80°C. Colombian samples were deposited in the "Colección de Artrópodos de la Universidad del Rosario" (CAUR#229), and Brazilian specimens were deposited in the "Coleção Científica de Aracnídeos e Miriápodes of the Instituto Butantan" (São Paulo, Brazil).

| Molecular phylogenetics and divergence times
The nucleotide substitution model for each mitochondrial gene was selected using the BIC criterion in jMoDeltest 0.1.1 (Posada, 2008).
The most suitable models were HKY+I for COI and TIM+I for 16S.
Tree topologies were estimated with Bayesian inference (BI) using beast 1.7.4 (Drummond, Suchard, Xie, & Rambaut, 2012) and including two Micrathena vigorsi individuals as outgroups (Supporting information Table S1). We unlinked the substitution model for each gene and linked the clock model and the tree model. We applied a lognormal relaxed clock to estimate divergence times using a mutation rate of 0.0112 (SD = 0.001) substitution/site/million years previously reported for node dating and calibration in spiders (Bartoleti et al., 2017;Bidegaray-Batista & Arnedo, 2011;Kuntner, Arnedo, Trontelj, Lokovšek, & Agnarsson, 2013). We ran two runs of 100 million generations sampling every 1000 generations. The initial 10,000 trees were discarded as burn-in using treeannotator (Drummond & Bouckaert, 2015). We examined the output in tracer 1.7 (Rambaut, Drummond, Xie, Baele & Suchard, 2018) to confirm that all effective sample size (ESS) values were >200 and to confirm the convergence of the chains to the stationary distribution. The maximum credibility tree that best represented the posterior distribution was visualized and edited with Figtree 1.4 .
Phylogenetic reconstruction was also performed with maximum likelihood (ML) in IQtree (Nguyen, Schmidt, Von Haeseler, & Minh, 2015) using the same substitution models described before Medellin, and (r) Campinas and applying the edge-proportional partition. Node support was assessed with 1000 bootstrap replicates. We also constructed haplotype median-joining networks per locus with PoPart (Leigh, Bryant, & Nakagawa, 2015).

| Population genetics
We calculated haplotype diversity (h), nuclear diversity (π), number of segregating sites (ss), and Tajima's D with DnaSP v5.10 (Librado & Rozas, 2009). Genetic structure was evaluated using F ST at two different levels: among phylogenetic clades (i.e., populations occurring in opposite sides of the Eastern cordillera of the Colombian Andes) and among populations. Significance of deviations from panmixia was assessed with the Hudson permutation test (Hudson, Boos, & Kaplan, 1992). An analysis of molecular variance (AMOVA) was also calculated for the same levels of differentiation with arlequin 3.5 (Excoffier & Lischer, 2010) using 10,000 permutations.
Using the nuclear dataset, we identified the number of population genetic clusters (K) with the Bayesian clustering approach implemented in structure 2.3.4 (Pritchard, Stephens, & Donnelly, 2000). We ran the analysis under the admixture model with a 50,000 burn-in, 100,000 MCMC sampling generations for K ranging from 1 to 13 (localities with only one individual were removed from this analysis), and 20 iterations for each value of K.
We determined the best K by applying three complementary approaches as recommended by Janes et al. (2017): (i) according to the Δ K method of Evanno (Evanno, Regnaut, & Goudet, 2005), (ii) plotting the likelihood of K for each value of K (Earl & vonHoldt, 2012), and (iii) reporting multiple barplots for K values between 2 and 5. All these tests were implemented in cluMPak (Kopelman, Mayzel, Jakobsson, Rosenberg, & Mayrose, 2015). We ran a multivariate analysis in order to make an additional validation of the genetic clusters for each locus. We did this by transforming fasta sequences into a genind object and loading it into the adegenet R package (Jombart & Ahmed, 2011), in which we performed a principal component analysis (PCA). We retained the first two components for a subsequent canonical discriminant analysis using the R package candisc (Friendly & Fox, 2017).
As isolation by distance (IBD) can obscure population structure signals, we investigated the presence of IBD for each locus using the Mantel (Mantel, 1967) with the R package vegan (Dixon, 2003).
In order to do this, pairwise geographic distances among localities were calculated with the function distm from the R package geosphere (Hijmans, 2016), while genetic distances were estimated by linearizing the F ST values previously obtained. We also implemented a partial Mantel test (Smouse, Long, & Sokal, 1986) to separate the effect of geographic distance from the population assignments based on structure results.

| Demographic model testing and parameter estimation
We used PhraPl (Jackson, Morales, Carstens, & O'Meara, 2017) to choose a demographic model that fits our data. PhraPl compares the topologies obtained from empirical data with those simulated under multiple demographic models and then, by calculating the proportion of times that simulated topologies match the empirical ones, it approximates the log-likelihood of the data under a given model.
PhraPl calculates Akaike information criterion (AIC) as a measure of lack of model fit while the associated AIC weights (wAIC) can be interpreted as model probabilities.
PhraPl can compare various demographic models, which can be broadly categorized as (a) isolation only (IO), (b) migration only (MO), (c) isolation with migration (IM), and (d) mixed models (ME).
We tested six models that fall into the IO and IM categories. The first is an IO model with a single coalescent event and no migration, while the other five are IM models that assume constant gene flow along the branch length and differ in the direction and strength of migration.
As input for PhraPl, we assigned each individual to its collection site (east or west of the Eastern cordillera) and loaded five midpoint rooted trees estimated with IQ-tree (one per locus; Nguyen et al., 2015). We used jMoDeltest (Posada, 2008) to select the most plau- in each dataset as such information is not available yet for G. cancriformis. When we detected gene flow in our dataset, we tested two additional models that correspond to recent (τ-9τ/10) and ancient secondary contact (ττ/5), starting from the tips.

| Phenotype by genotype association
To test whether there is an association between the coloration of individuals and their genetic variation, we ran a chi-square Monte Carlo test under the null hypothesis of independence between coloration and genetic haplotypes. This analysis was run for each locus and establishing color morphs categories following the categories of Gawryszewski (2007). The analysis included only Colombian specimens, due to the fact that Brazilian samples do not have color records.

| Phylogenetic relationships and divergence time
Both BI and ML showed the same phylogenetic pattern which re-  Figure S1). This estimate largely coincides to that of G-PhoCS using mitochondrial data, where divergence time was 1.68 Ma (95% HPD = 0.66-2.53 Ma; Supporting information Figure S1 and Table S2). Both of these dates are very close to the Pliocene/Pleistocene boundary and concordant with the final EC uplift (Gregory-Wodzicki, 2000). When using nuclear data, G-PhoCS estimated a divergence time of 0.66 Ma (95% HPD = 0.37-0.96 Ma; Supporting information Figure S1).

| Population genetics
Mitochondrial and nuclear median-joining networks revealed two groups with shared haplotypes, corresponding to the eEC and wEC clades (Supporting information Figure S2). These clades are separated by 25 mutational steps in the mitochondrial marker COI and by 10 mutational steps in the mitochondrial marker 16S. In nuclear loci, the groups are differentiated by one mutation (Supporting information Figure S2). These groups are genetically differentiated as shown by our F ST analysis (F ST_ mtDNA = 0.60; F ST_ ITS = 0.24; F I G U R E 2 Mitochondrial phylogeny. Best recovered tree with mtDNA where node supports are represented by circles divided into two: The upper half corresponds to posterior probabilities obtained by Bayesian inference, and the lower half to the maximum-likelihood bootstrap values after 1,000 bootstrap pseudoreplicates. Colored circles at the tips represent the color phenotype in the opisthosoma of each individual. Green and red squares highlight the eastern (eEC) and western (wEC) sides of the Eastern Cordillera of the Colombian Andes. Green branches highlight individuals sampled in the eEC that cluster into the wEC clade F ST_ 28S = 0.25; F ST_ HSP90 = 0.20; for all loci p < 0.05 in the Hudson permutation test).
Mitochondrial nucleotide diversity was higher in the eEC clade than in the wEC clade (Table 1); however, this did not hold true for nuclear loci, which may be due to differences in effective size and other causes of mito-nuclear discordance (Toews & Brelsford, 2012).
None of the loci showed significant Tajimas' D suggesting neutral evolution (Table 1). For all loci, genetic structure was more pronounced among populations sampled at different sides of the EC than among populations at the same side (Supporting information Figure S3). This pattern was also reflected in the AMOVA analysis, where most of the mitochondrial variation is explained by differences among regions (eEC and wEC clades; Supporting information Table S3). However, for nuclear loci, most of the variance is due to differences within populations.
All methods applied to select the optimal value of K consistently revealed two groups (K = 2; Figure Figure S3). In agreement with those results, the canonical discriminant analyses also identified two geographic clusters (Supporting information Figure S6). Individuals from both groups share variation between them, for example, most individuals from Villavicencio (eastern foothills of the EC) had either wEC or eEC mtDNA, and up to 30% of their nDNA was of wEC origin. Furthermore, there were two individuals from this locality with wEC mtDNA, and almost ~80% of their nDNA was of wEC origin (Figures 2 and 3, Supporting information Figure S7). Likewise, two individuals from Boquia and Bucaramanga (West and Central Cordillera, respectively) presented wEC mtDNA, but their nDNA showed a shared ancestry of almost 50% with the eEC populations (Figures 2 and 3; Supporting information Figure S7). We ruled out any effect of isolation by distance rather than Andean divergence causing the geographic population structure observed here (Supporting information Figure S8 and Table S4).

| Demographic models
PhraPl revealed different wAIC values and high ∆AIC between models with gene flow (IM) and isolation only (IO; Table 2 and  Table S5). The model with no migration had the lowest wAIC indicating that a single vicariant event with no genetic exchange is not plausible. However, it is difficult to differentiate between symmetrical and asymmetrical gene flow (Table 2; Supporting information Table S5). When we tested recent versus ancient secondary contact, the latter model was better supported, ruling out recent secondary contact but suggesting at least some isolation caused by the vicariant event (Table 2). These results imply that gene flow is responsible for the shared ancestry of G. cancriformis between eEC and wEC geographic regions. Likewise, G-PhoCS also detected reciprocal gene flow between individuals in these two regions (Supporting information Table S2).
TA B L E 1 Population genetic summary statistics for the eastern (eEC) and western (wEC) sides of the Eastern Colombian cordillera for each locus None of the loci showed Tajima's D values that departed from neutral expectations.

| Phenotype by genotype association
We found the white phenotype to be the most frequent morph among all populations studied, while the black-white morph was only present in the eEC populations. Nonetheless, in the Colombian Cauca valley (wEC), we collected a black morph that had not been previously reported (Supporting information Figure S9). Although our molecular sampling (mtDNA and nDNA) revealed a statistical association between genetic variation and geography, such association was not found for color polymorphism (Supporting information Table S6). This lack of association is also evident in the mtDNA phylogeny, as individuals of different colors cluster within same clade (Figure 2).

| D ISCUSS I ON
Our mitochondrial and nuclear data showed two well-supported genetic clusters separated by the EC of the Colombian Andes, although this pattern was more evident in mtDNA than in nDNA.
This discordance is likely due to the differences in effective population sizes between nDNA and mtDNA, the latter having four times less effective population size causing it to complete the process of lineage sorting faster and, as a consequence, estimate older coalescent times (Toews & Brelsford, 2012). In contrast, the larger effective population size of nDNA causes not only younger F I G U R E 3 Bayesian population assignment test based on nDNA. A population assignment test with the software structure based on three nuclear loci identified two distinct populations (K = 2). Bar plots show Bayesian assignment probabilities for individuals where each color represents the most likely ancestry from which the genotype was derived (green: eEC and red: wEC). Bars on the bottom indicate the geographic region that each population belongs to. Populations are coded as described in Figure 1. In population d, individuals 67 and 68 (light gray arrow) have almost 80% of their nDNA from wEC. Individual 78 in population i (mid gray arrow) and individual 95 in population m (black arrow) have wEC mtDNA, but their nDNA showed almost 50% of shared ancestry with the eEC populations TA B L E 2 Measures of fit of alternative models, labeled as in Figure 4 Model Note. Isolation only (a), isolation with migration (b to f), and secondary contact (g and h).
F I G U R E 4 Demographic scenarios tested for the evolution of G. cancriformis in the northern Andes. (a) Divergence with no migration, (b) divergence with unidirectional migration from wEC to eEC, (c) divergence with unidirectional migration from eEC to wEC, (d) divergence with bidirectional symmetrical migration, (e) divergence with bidirectional asymmetrical migration from eEC to wEC, and (f) divergence with bidirectional asymmetrical migration from wEC to eEC coalescent times but also the co-existence of multiple haplotypes with minor differences among them, which increases the "within population" variance, as reflected in our AMOVA analysis. This mito-nuclear discordance is not exclusive to our case but has also been observed in other studies which found the Andes mountains to play a role in splitting populations (Smith, Harvey et al., 2014;Toews & Brelsford, 2012 (Fouquet, Santana Cassini, Fernando Baptista Haddad, Pech, & Trefaut Rodrigues, 2014;Harvey & Brumfield, 2015;Werneck, Gamble, Colli, Rodrigues, & Sites, 2012). The genetic homogeneity found within the wEC clade may also be explained by the topography of the Western and Central Colombian Cordilleras, which are considerably narrower than the Eastern one (Guarnizo et al., 2015), and which might have facilitated the dispersal of individuals across this region.
Here, despite the vicariance associated with the Andean uplift that resulted in eEC and wEC Andean clades for G. cancriformis, we found individuals with shared ancestry between the main two geographic groups. The approximate likelihood demographic model implemented, identified gene flow as the most likely explanation for this. Furthermore, the model with the best support implies divergence in the face of gene flow after τ/5 generations forward in time, which suggests a short allopatric period.
Altitudinal depressions across the Andes can contribute to the dispersal of individuals, thus allowing admixture between populations that occur at opposite sides of this barrier. In fact, the EC of the Colombian Andes is not a uniform barrier along its length and has at least two depressions, the Andalucía pass and the Suaza-Pescado valleys, which may be acting as dispersal corridors (Cadena et al., 2016 (Gressitt, 1965). Although this displacement strategy has not been observed in G. cancriformis, it is used by its sister taxa (Bell, Bohan, Shaw, & Weyman, 2005).
Color polymorphism in the opisthosoma of G. cancriformis did not explain the structure found in this species. However, the lack of association of mtDNA or nDNA haplotypes with coloration may be due to the nature of the loci studied, as they evolve neutrally and are not members of any known pigmentation pathway in arthropods (Wittkopp & Beldade, 2009). It is also important to bear in mind that information of color phenotype is not available for all samples (mainly for those from Brazil), which may be hindering a phenotype by genotype association. This seems, however, unlikely as most of the samples within the wEC clade have phenotype information and still do not cluster by phenotype. Even so, the mtDNA phylogenetic pattern suggests, to some extent, that this polymorphism pre-dates the geographic split. Otherwise, the genetic connectivity between the populations at both sides of the Andes may be favoring the flow of color alleles thus maintaining phenotypic variation. This remains to be tested.
The scenario of vicariance coupled with gene flow found here supports the original ideas of Chapman (1917, 1926) and Haffer (1967a, who claimed that the similarities in the composition of the flora and fauna at both sides of the Andes might be due to dispersals through altitudinal depressions in the cordilleras (Chapman, 1917(Chapman, , 1926Haffer, 1967a,b). However, because we are studying populations of a single species, the signatures of allele sharing we obtained could also be the result of incomplete lineage sorting (ILS), especially affecting nuclear loci. By bad luck, determining the extent to which gene flow and ILS contribute to such shared genetic variation is difficult (Holder, Anderson, & Holloway, 2001;Kubatko, 2009;Kubatko & Degnan, 2007). A more exhaustive evaluation of the role of the This case constitutes one of the few phylogeographic studies in arthropods, and the first arachnids, showing that the Andes acts as a permeable barrier rather than an absolute one for the free movement of alleles. However, our findings are based on a limited sample of the genome. Therefore, further work on this topic implementing genomic approaches is needed, especially in invertebrates. This will allow the implementation of comprehensive analyses of comparative phylogeography to understand how geographic barriers and genetic connectivity across them shape genetic diversity and impact genetic structure.

CO N FLI C T O F I NTE R E S T
All authors declare no conflict of interest.