Local adaptation in mainland anole lizards: Integrating population history and genome–environment associations

Abstract Environmental gradients constrain physiological performance and thus species’ ranges, suggesting that species occurrence in diverse environments may be associated with local adaptation. Genome–environment association analyses (GEAA) have become central for studies of local adaptation, yet they are sensitive to the spatial orientation of historical range expansions relative to landscape gradients. To test whether potentially adaptive genotypes occur in varied climates in wide‐ranged species, we implemented GEAA on the basis of genomewide data from the anole lizards Anolis ortonii and Anolis punctatus, which expanded from Amazonia, presently dominated by warm and wet settings, into the cooler and less rainy Atlantic Forest. To examine whether local adaptation has been constrained by population structure and history, we estimated effective population sizes, divergence times, and gene flow under a coalescent framework. In both species, divergence between Amazonian and Atlantic Forest populations dates back to the mid‐Pleistocene, with subsequent gene flow. We recovered eleven candidate genes involved with metabolism, immunity, development, and cell signaling in A. punctatus and found no loci whose frequency is associated with environmental gradients in A. ortonii. Distinct signatures of adaptation between these species are not associated with historical constraints or distinct climatic space occupancies. Similar patterns of spatial structure between selected and neutral SNPs along the climatic gradient, as supported by patterns of genetic clustering in A. punctatus, may have led to conservative GEAA performance. This study illustrates how tests of local adaptation can benefit from knowledge about species histories to support hypothesis formulation, sampling design, and landscape gradient characterization.


| 11933
PRATES ET Al. Berrigan, & Serra, 2000). Yet, experimental studies indicate that physiological function can be highly conserved within species over wide environmental gradients (e.g., Crowley, 1985;Hertz, Huey, & Nevo, 1983;John-Alder, Barnhart, & Bennett, 1989;Prates & Navas, 2009, Prates, Angilleta, Wilson, Niehaus, & Navas, 2013Van Damme, Bauwens, Castilla, & Verheyen, 1989), and this lack of phenotypic differentiation has been associated with the homogenizing effects of population gene flow (Bridle & Vines, 2007;Lenormand, 2002). It is therefore unclear to which extent the occurrence of species in distinct environments is linked to local adaptation and associated genetic differentiation. Uncovering how environmental gradients contribute to the genetic makeup of organisms, while documenting how population structure and history constrain differentiation, can shed light on the role of local adaptation in the establishment of species in diverse habitats.
To examine local adaptation in heterogeneous landscapes, analyses of genome-environment associations (GEAA) have become a central tool (Rellstab, Gugerli, Eckert, Hancock, & Holderegger, 2015). GEAA studies test for correlations between allele frequencies at multiple loci and environmental predictors across a species' range (Frichot, Schoville, Bouchard, & François, 2013) and are able to account for neutral patterns of population genetic structure that can mimic signatures of selection (Rellstab et al., 2015). However, these analyses are sensitive to the spatial orientation of historical range expansions relative to that of the environmental gradients: A simulation study found that GEAA studies perform best (have lower false discovery rates) when the axes of historical expansion and ecological variation are aligned in space (Frichot, Schoville, Villemereuil, Gaggiotti, & François, 2015). This finding has important implications for empirical studies. First, it suggests that knowledge about the history of occupation of the landscape is crucial to analyses of adaptive genomic differentiation on the basis of GEAA. Second, it implies that studies of local adaptation will benefit from targeting species whose biogeographic trajectories have led to a spatial and ecological arrangement that maximizes GEAA performance.
A system that fits these GEAA requirements well is that of organisms that have undergone range expansions in South American forests. Several species colonized the Atlantic Forest from Amazonia, subsequently expanding southward toward subtropical regions (Batalha-Filho, Fjeldså, Fabre, & Miyaki, 2013;Costa, 2003;Gehara et al., 2014;Prates, Rivera, Rodrigues, & Carnaval, 2016;Prates, Xue, et al., 2016). It has been shown that Amazonia and the Atlantic Forest experience largely distinct climates (Ledo & Colli, 2017;Sobral-Souza, Lima-Ribeiro, & Solferini, 2015). Moreover, pronounced climatic differences distinguish the northern from the southern Atlantic Forest . Species whose ranges now encompass these regions span a climatic gradient from warm and wet conditions in Amazonia to cooler and less rainy settings in the Atlantic Forest. Therefore, these organisms represent an appropriate system to examine signatures of local adaptation associated with species occurrence along environmental gradients on the basis of GEAA. However, the role of genotypes that confer adaptation in shaping the distribution of tropical species remains underexplored (Damasceno, Strangas, Carnaval, Rodrigues, & Moritz, 2014;Moritz, Patton, Schneider, & Smith, 2000;Teixeira, 2016).
This investigation asks whether genetic differentiation identified as potentially adaptive is associated with climatic differentiation along the ranges of the lizards Anolis ortonii and Anolis punctatus.
To explore GEAA in this system, we first infer population structure and history within each anole species, estimating levels of population gene flow-a force that could oppose local adaptation.
Then, we test whether species establishment in distinct climates is associated with potentially adaptive genetic differentiation between populations. To characterize the climatic gradients presently occupied by A. ortonii and A. punctatus, we also estimate climatic space occupancy over their ranges.

| Molecular sampling
We sampled 46 individuals of Anolis punctatus and 23 of Anolis ortonii (specimen and locality information in Supporting Information Table S1; Supporting Information deposited in the Dryad Digital Repository: https://doi.org/10.5061/dryad.1bj51s9), encompassing a substantial portion of their distributions (Ribeiro-Júnior, 2015). Reduced representation sequence data were generated as described by Prates, Xue, et al. (2016). Briefly, genomic DNA was extracted from liver fragments preserved in 100% ethanol through a high-salt extraction protocol following proteinase and RNAase treatment. After examining DNA fragment size using agarose gels, DNA concentration was measured in a Qubit 2 fluorometer (Invitrogen, Waltham) and diluted to ensure a final concentration of 30-100 ng DNA/µl in a total volume of 30 µl (in TE buffer). A restriction site-associated DNA library was generated through a genotype-by-sequencing protocol (GBS; Elshire et al., 2011) at the Institute of Biotechnology at Cornell University.
Genomic DNA was digested with the EcoT22I restriction enzyme, and the resulting fragments were tagged with individual barcodes, PCR-amplified, multiplexed, and sequenced on a single lane on an Illumina HiSeq 2500 platform (rapid run mode). The number of single-end reads per individual ranged from around 500,000 to six million, with a read length of 100 base pairs. De-multiplexed raw sequence data were deposited in the Sequence Read Archive (accession PRJNA492310).
We used Ipyrad v. 0.7.23 (Eaton, 2014; available at https://ipyrad.readthedocs.io) to de-multiplex and assign reads to individuals based on sequence barcodes (allowing no mismatches from individual barcodes), perform de novo read assembly (clustering similarity threshold > 0.9), align reads into loci, and call single-nucleotide polymorphisms (SNPs). To filter out poor-quality reads and reduce base-calling error, we implemented a minimum Phred quality score (=33), minimum sequence coverage (=10×), minimum read length (=70 bp), maximum proportion of heterozygous sites per locus (=0.25), and maximum number of heterozygous individuals per locus (=5) while ensuring that variable sites had no more than two alleles. Following the de-multiplexing step in Ipyrad, read quality and length were assessed for each sample using FastQC (available at https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). For phylogenetic and genetic clustering analyses, a single SNP was randomly extracted per locus to reduce linkage disequilibrium across sites and to maximize sampling of independent SNP histories. Previous to GEAA and genetic clustering analyses, we used VCFtools v. 0.1.13 (Danecek et al., 2011) to filter out SNPs whose minor allele frequency (MAF) was lower than 0.1 to prevent the inclusion of SNPs that may correspond to sequencing errors (Ahrens et al., 2018).

| Inferring population genetic structure
To approximate the number of populations in A. ortonii and A. punctatus, we estimated the number of genetic clusters (k) within each species separately using sNMF v. 1.2 (Sparse Nonnegative Matrix Factorization; Frichot, Mathieu, Trouillon, Bouchard, & François, 2014). We tested k = 1-10, with 100 replicates for each k. The run with the lowest entropy value, estimated by masking 5% of the samples, was considered to identify the best k (Frichot et al., 2014). To examine the robustness of sNMF to the regularization parameter (a), we ran preliminary analyses with a = 1, 10, 100, and 1,000; the best entropy scores were obtained with a = 100. Up to 10% missing data were allowed (i.e., each DNA site was present in at least 90% of sampled individuals). The resulting datasets were composed of 2,580 unlinked SNPs for A. ortonii and 1,807 for

A. punctatus.
To further characterize population genetic clustering, we per-

| Inferring population trees
To obtain a population tree for the estimation of gene flow under a historical demographic framework (see below), we inferred phylogenetic relationships between the genetic clusters (thereafter populations) identified by sNMF. Phylogenetic analyses were performed for A. punctatus only, for which three populations were recovered; in the case of A. ortonii, where only two populations were found based on the genetic data (see Section ), they were treated as sister to each other in downstream analyses.  Bouckaert et al., 2014). A gamma-distributed prior (given by parameters shape, α, and scale, ρ) was applied for the θ parameter (= 4Neμ, where Ne corresponds to the effective population size and μ to the per-generation mutation rate), using α = 1 and ρ = 0.05 (with mean = 0.05). A gamma prior with α = 2 and ρ = 100 (with mean = 200) was set to the rate of the Yule diversification process (λ) given an expected tree height (τ = Tμ, where T corresponds to the total tree time in number of generations) of ~0.004 substitutions per site based on recent phylogeographic study that included A. punctatus (Prates, Rivera, et al., 2016). To help define λ prior ranges, we used the python script Pyule (available at https://github.com/joaks1/ pyule). Default settings were implemented for the mutation parameters. Tree rooting followed Prates, Xue, et al. (2016). To help define prior distributions implemented in SNAPP and G-PhoCS (see below), we used the qgamma, dgamma, and plot functions in R v. 3.3.3 (R Core Team, 2018).
We performed two independent SNAPP runs of five million generations each, sampling every 1,000 steps. After ensuring convergence between runs and stationarity of model parameters in Tracer v. 1.6 (Drummond, Suchard, Xie, & Rambaut, 2012) and discarding 10% of generations as burn-in, we visualized posterior tree densities in DensiTree v. 2.4.8 (Bouckaert et al., 2014). Due to computational constraints, population trees were inferred based on five individuals from each of the populations inferred by sNMF, while allowing no missing data. Selected individuals were sampled in sites with no evidence of admixture as inferred by sNMF. The resulting dataset was composed of 2,927 unlinked SNPs across 15 samples of A. punctatus.

| Estimating historical demographic parameters
To estimate historical demographic parameters such as gene flow, we used G-PhoCS v. 1.3 (Generalized Phylogenetic Coalescent Sampler; Gronau, Hubisz, Gulko, Danko, & Siepel, 2011), which uses the entire nucleotide sequences of loci (as opposed to SNPs only). Populations of A. ortonii and A. punctatus were defined as the populations inferred through sNMF; historical relationships between populations were defined based on SNAPP results for A. punctatus, while the two populations of A. ortonii were treated as sister to one another.
To assess historical population gene flow, we included bidirectional migration bands for every population pair (i.e., one migration band in each direction).
In G-PhoCS analyses, we applied gamma distributions (given by parameters shape, α, and rate, β) for the θ and τ priors, setting α = 1 and β = 20 (with mean = 0.05). The gamma prior for each migration band, m (=M/μ, where M corresponds to the proportion of individuals in one population that arrived from another population in each generation, and μ is the per-generation mutation rate), was set with α = 1 and β = 2 × 10 −7 (with mean = 5,000,000, which corresponds to 1/1,000 of the individuals in one population having arrived from another population in each generation). To convert mutation ratescaled parameter estimates to absolute effective population sizes and divergence times, we assumed an average substitution rate of 2.4 × 10 −9 substitutions per site per year, as estimated by a study that included South American anole species (Prates, Rivera, et al., 2016). To convert estimated coalescent times from the number of generations to years, we assumed a generation time of one year in anoles (Muñoz et al., 2013;Tollis, Ausubel, Ghimire, & Boissinot, 2012).
We performed G-PhoCS analyses for each species separately implementing a Markov Chain Monte Carlo of 400,000 generations and sampling every 100 generations. The stationarity of each run and the posterior distribution of model parameters were assessed as described for SNAPP analyses. Due to computational constraints, G-PhoCS analyses used 2,000 randomly selected loci while allowing for no more than 10% missing data.

| Choice of environmental variables
As environmental predictors in GEAA, we used bioclimatic variables from the WorldClim database (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005) that describe spatial patterns of temperature and precipitation variation. We limited our analysis to environmental variables that are easily interpretable and that showed clear geographic variation across the distribution of our study anoles: annual mean temperature, maximum temperature of the warmest month, mean temperature of the warmest quarter, mean temperature of the coldest quarter, annual precipitation, precipitation of the wettest month, and precipitation of the wettest quarter. Values were extracted from the collection site of each lizard sample using QGIS v. 2.18.15 (available at https://github. com/qgis/QGIS). Because temperature and precipitation variables were often correlated (pairwise Pearson correlation coefficients >0.7), we followed best practices for GEAA and implemented a PCA on all variables, using the first principal component as a predictor in GEAA (Frichot et al., 2013). PCA was applied using the prcomp function in R.
To visualize and compare environmental space occupancy across the range of A. ortonii and A. punctatus, we used plots of the climatic PC1 against latitude based on sites sampled for genetic data. To better characterize the climatic spaces occupied by these two species, we also compiled a broader dataset that includes georeferenced museum specimens, with a total of 89 and 144 unique records for A. ortonii and A. punctatus, respectively (Supporting Information Table S2).

| Performing genome-environment association analyses
To test for associations between environmental predictors and allele frequencies, we used LFMM (Latent Factor Mixed Linear Models; Frichot et al., 2013). LFMM jointly models the effects of environmental variables and neutral genetic structure on allele frequencies, incorporating population structure in the form of unobserved (latent) factors. LFMM has low rates of false positives and is robust to the effects of isolation-by-distance and a variety of sampling designs and underlying demographic histories (Rellstab et al., 2015). To define the number of latent factors used in the LFMM analyses, we explored values around the best number of genetic clusters (k) inferred by sNMF, as recommended by Frichot et al. (2013). Specifically, we performed preliminary runs incorporating one to six latent factors. Two and five latent factors provided better control of the effects of neutral genetic structure in A. ortonii and A. punctatus, respectively (see below), and were used in final analyses.
To ensure that false discoveries (i.e., spurious associations between allele frequencies and environmental predictors) were adequately controlled for, we corrected association p-values based on empirical genomic inflation factors (λ). These factors are used to modify the baseline null hypothesis in GEAA with the goal of limiting inflation related to population structure and other confounding factors (François, Martins, Caye, & Schoville, 2016). To select λ, we inspected histograms of corrected p-values (Supporting Information Figure S1) to ensure that they exhibited a uniform distribution over the interval [0,1] (as expected under the null hypothesis of selective neutrality for most loci) with a peak close to 0 (which corresponds to potentially selected loci; see François et al., 2016). Specifically, we implemented λ = 1.5 and 0.45 for A. ortonii and A. punctatus, respectively. A list of candidate SNPs was generated using the Benjamini-Hochberg algorithm (Benjamini & Hochberg, 1995) by assuming a maximum false discovery rate of 10 −5 .
LFMM was implemented using the lfmm function of the LEA R package, with 10 independent runs of 50,000 iterations each and 25,000 iterations as burn-in. A maximum of 10% missing data was allowed. To reduce the potential effects of missing sites in GEAA, we performed haplotype imputation based on the ancestry coefficients estimated by SNMF, using the impute function of the LEA package

| Candidate gene identity and function
LFMM analyses recovered SNPs whose frequencies are linked to environmental gradients (see Section 3.4). To examine whether the candidate loci harboring these SNPs correspond to genes, we  Alföldi et al., 2011). For blasting, we used a maximum e-value of 10 −5 while masking low complexity (e.g., repetitive) regions. In the case of mapped loci that correspond to protein-coding regions, we then examined gene identity and function as summarized by gene ontology terms based on the UniProt database (available at https://www.uniprot. org/uniprot/).

| Population genetic structure
Clustering analyses using SNMF inferred higher levels of population structure in Anolis punctatus than in Anolis ortonii, with three and two populations recovered in each species, respectively (Figure 1a,b).

| Population history
A scenario of early separation between Amazonian and Atlantic Forest populations in A. ortonii differs from the pattern seen in A. punctatus. In A. punctatus, phylogenetic analyses using SNAPP found that the Atlantic Forest population is nested among Amazonian samples (Figure 2). This Atlantic population is the sister of the population that occurs in eastern Amazonia; the clade formed by them is sister to that population that extends from southwestern Amazonia (around the Madeira River) northwards.

G-PhoCS analyses inferred large effective population sizes in
all regions for both A. ortonii and A. punctatus, estimated as hundreds of thousands to millions of individuals ( Figure 2). Coalescent F I G U R E 1 Genetic clustering based on all SNPs from Anolis ortonii (a) as well as all SNPs (b) and candidate SNPs only (c) from Anolis punctatus. Proportions in pie charts on maps correspond to ancestry coefficients estimated by genetic clustering analyses. Gray areas on map indicate South American rainforests. Red arrows indicate A. punctatus sample MTR 20798 from Pacaraima on the Brazil-Venezuela border in the Guiana Shield region, a locality that is climatically similar to Atlantic Forest sites (see Figure  3); this sample is genetically more similar to eastern Amazonian samples based in the entire SNP dataset, yet more similar to Atlantic Forest samples based on the candidate SNPs only A. ortonii (all SNPs) A. punctatus (all SNPs) A. punctatus

| Climatic space occupancy
The first component of the PCA based on temperature and precipitation variables (PC1), used as a predictor variable in GEAA, explains >70% of the total climatic variation across sampled sites in both species. Positive scores in climate PC1 correspond to lower temperature and precipitation values (Supporting Information Table S3). Plots of climatic space occupancy along a latitudinal gradient based on PC1 suggest that both species experience similar climates over their range in Amazonia (Figure 3). However, while A. punctatus occupies cooler and less rainy environments in the southern end of its distribution in the Atlantic Forest, A. ortonii is restricted to warmer and more humid sites in the northern Atlantic Forest, which is climatically more similar to Amazonia. This pattern holds when considering both sites sampled for genetic data and a broader dataset that also includes hundreds of occurrence records from museum specimens (Figure 3).

| Genome-environment association analyses
GEAA controlling for neutral genetic structure using LFMM found allele frequencies of 86 SNPs in 39 loci to be significantly associated with climatic gradients in A. punctatus. Among these loci, 30 blasted against regions of the annotated reference genome of A. carolinensis. Eleven loci uniquely mapped to known protein-coding genes (Table 1); two candidate loci mapped non-specifically   Note. Only genes that successfully blasted against known protein-coding regions of the genome of A. carolinensis are shown.

Anolis punctatus
to more than four genes; and the remaining loci blasted against non-coding regions, which may correspond to regions that regulate gene expression (e.g., enhancers or transcription factor recognition sites) or that are physically linked to genes that underwent selection.
In the case of A. ortonii, GEAA found no evidence of genomic differentiation associated with environmental gradients. In this species, no SNPs were linked with temperature and precipitation variation across space, even when relaxing the false discovery rate from 10 −5 to 10 −3 .
Inspection of read quality using FastQC indicated similar highquality reads for individuals of both species (not shown), suggesting that differences in DNA quality do not account for distinct GEAA results between A. punctatus and A. ortonii.

| Patterns of candidate allele sharing among populations
In  (Figure 1b). A genetic PCA based on these two SNP datasets shows the same pattern (Supporting Information Figure S2).

| D ISCUSS I ON
Based on genomewide data of A. ortonii and A. punctatus, two widespread South American anole lizard species, we infer distinct genetic populations in Amazonia and the Atlantic Forest, suggesting that separation of these forests following a period of contact has favored genetic divergence (Figure 1). In both species, historical demographic analyses indicate large effective population sizes, mid-Pleistocene colonizations, and post-divergence population gene flow ( Figure 2). Lastly, we find evidence of associations between environmental gradients and allele frequency patterns in A. punctatus, but not in A. ortonii (Table 1), which is consistent with local adaptation in A. punctatus only. Analyses of environmental space occupancy suggest that the two species experience largely similar climates in Amazonia and the northern Atlantic Forest, yet A. punctatus occupies cooler and less humid localities that are not occupied by A. ortonii in the southern Atlantic Forest (Figure 3).

| Population history of South American forest anoles
Historical relationships and coalescent times between Amazonian and Atlantic Forest populations of both A. ortonii and A. punctatus support the hypothesis that a forest corridor connected these two major forest blocks in the mid-Pleistocene. These findings corroborate the results of previous analyses based on much smaller genetic datasets for these species (Prates et al., 2015;Prates, Rivera, et al., 2016). Moreover, they agree with paleontological, geochemical, and palynological records supporting a history of pulses of increased precipitation in South America during the Quaternary. Wetter climates have been implicated in periodic forest expansions in present-day northeastern Brazil, leading to connections and species exchange between Amazonia and the Atlantic Forest (e.g., Cheng et al., 2013;de Oliveira, Barreto, & Suguio, 1999;de Vivo & Carmignotto, 2004).
Recurrent forest corridors may have led to periodic re-establishment of gene flow following the colonization of new regions. A previous test of this idea, based on a multi-locus dataset from A. ortonii and A. punctatus, was inconclusive, presumably due to limited signal in the genetic data available (Prates, Rivera, et al., 2016). Here, on the basis thousands of restriction site-associated markers, we found evi-

| Candidate genes point to ecologically relevant physiological processes
GEAA results provide insights about physiological processes that may have experienced selection in response to climatic regimes in A. punctatus. The candidate genes identified play important roles in energy metabolism, immunity, development, and cell signaling. For instance, the malic enzyme 1 gene (ME1) encodes an enzyme that links the glycolytic and citric acid pathways (Jiang, Du, Mancuso, Wellen, & Yang, 2013) while generating NADPH necessary for fatty acid biosynthesis (Bukato, Kochan, & Świerczyński, 1995). The collagen type V alpha 2 chain gene (COL5A2) encodes a structural component of the extracellular matrix that is involved with skeletal, skin, and eye development through collagen fibril organization (Andrikopoulos, Liu, Keene, Jaenisch, & Ramirez, 1995). The product of the dedicator of cytokinesis 2 gene (DOCK2) is crucial in immune response by mediating the differentiation of T cells and lymphocyte migration to sites of infection (Fukui et al., 2001;Kunisaki et al., 2006). Finally, the coxsackie-and adenovirus receptor-like membrane protein gene (CLMP) encodes a transmembrane component of cell-to-cell junctional complexes that is essential for intestine development (Raschperger, Engstrom, Pettersson, & Fuxe, 2003). By pointing to ecologically relevant physiological processes, candidate genes recovered by GEAA can guide eco-physiological investigations of South American lizards. Future experimental studies that compare physiological tolerances between individuals from distinct habitats have the potential to reveal links between patterns of genetic variation, organismal function, and species distributions (Navas, 2002).
Similar to this study, other investigations found differences in the frequency of alleles that underlie ecologically relevant physiological processes between populations that inhabit contrasting habitats (e.g., Yoder et al., 2014;Benestan et al., 2016;Dennenmoser, Vamosi, Nolte, & Rogers, 2017;Guo, Lu, Liao, & Merilä, 2016 Figure S2a) and previous analyses of historical relationships based on individual samples of A. punctatus (Prates, Rivera, et al., 2016;Prates, Xue, et al., 2016) found that Guiana Shield samples are not closely related to Atlantic Forest ones. Alternatively, this pattern may reflect differences in how neutral and beneficial alleles spread in geographic space, as demonstrated in other animal species that occur in contrasting habitats (Hoekstra, Drumm, & Nachman, 2004). Future analyses of local adaptation in populations that occur in the Guiana highlands will rely on improved sampling for this key region.

| GEAA in species that have expanded along environmental gradients
Our analyses of local adaptation may have been affected by conservative GEAA performance in species that underwent a history of range expansions along environmental gradients. When the axis of range expansion is parallel to the orientation of ecological gradients, the selected sites are expected to exhibit patterns of spatial structure that are similar to those of the neutral sites. As a result, the test's null hypothesis (which relies on neutral background genetic variation) yields a stricter control of false positives , and GEAA is expected to detect only the outlier loci that exhibit strong associations with the environmental predictor. A similar pattern of spatial structure between presumably se-

CO N FLI C T O F I NTE R E S T
None declared.

DATA ACCE SS I B I LIT Y
All sequence data were deposited in the Sequence Read Archive