Detection of genes positively selected in Cuban Anolis lizards that naturally inhabit hot and open areas and currently thrive in urban areas

Abstract Species of Anolis lizards of the West Indies that naturally inhabit hot and open areas also tend to thrive in urban areas. In this study, transcriptome was sequenced for nine species of Cuban Anolis lizards that are closely related to each other, but inhabit different thermal microhabitats. Using PAML and HyPhy software, we attempted to identify genes and amino acid sites under positive selection in the common ancestral branch of A. porcatus and A. allisoni, and the branch of A. sagrei, which inhabit hot and open areas, and thrive in urban areas. Although there were no genes where positive selection was commonly detected on both of the tested branches, positive selection was detected in genes involved in the stress response (e.g., DNA damage and oxidative stress) and cardiac function, which could be related to adaptive evolution of tolerance to heat or ultraviolet radiation, on both branches. These findings suggest that adaptive evolution of the response to stress caused by heat or ultraviolet radiation might have occurred in ancestors of Anolis species inhabiting hot and open areas and might be related to the current thriving in urban areas of them.


| INTRODUC TI ON
Because forests have a thermal buffering capacity (De Frenne et al., 2019), a reduction in forest or canopy cover due to human activities can destroy the habitats of tree-dwelling organisms, and would also result in severe temperature fluctuations on the ground. Urbanization, which is associated with the loss of forests, not only increases the amount of open areas with low thermal buffering capacity, but also changes the physical structure, humidity, light exposure, and predatorprey relationships (Forman, 2014). However, some species appear to rapidly respond to such changes and thrive in urban environments (e.g., Johnson & Munshi-South, 2017;Winchell et al., 2016Winchell et al., , 2018. Anolis lizards inhabiting the Caribbean islands are model organisms for the study of mechanisms underlying adaptive radiation as different ecomorphs have adapted to distinct structural microhabitats (Losos, ). Some species of Anolis lizards thrive in the urban areas, while other species are rarely seen there . Species of Caribbean Anolis lizards inhabiting hot and dry environments in the Lesser Antilles tend to have greater tolerance to urban areas .
Cuba has the largest number of endemic Anolis lizard species in the Caribbean Losos, ;Uetz et al., ). In addition to structural habitat differentiation through evolution, ecomorphs segregate in different thermal microhabitats, such as shaded forests, forest margins, and open areas (Cádiz et al., 2013). Species of various lineages living in different thermal microhabitats, such as deep forests and open areas can be found in the same local area, which could be linked to local species diversity (Cádiz et al., 2013). For example, A. allisoni (Figure 1), A. porcatus, and A. sagrei are found in hot and open areas, while A. alutaceus and A. isolepis, A. garridoi, A. allogus, and A. me-strei are found in cool and shaded forests, and A. homolechis is found in intermediate forest margins (Rodríguez-Schettino, 1999). A. allogus, A. homolechis, A. mestrei, and A. sagrei are all trunk-ground species and can coexist sympatrically owing to differentiated thermal microhabitats (Cádiz et al., 2013;Losos, ;Ruibal, 1961;Rodríguez-Schettino, 1999;Rodríguez-Schettino et al., 2010). In Cuba, several species of Anolis lizards occupy urban areas. In the city of Havana, A. porcatus and A. sagrei are the most common in urban environments, including the downtown and residential areas, which are open areas with high solar radiation and temperatures, similar to their natural habitats.
A. allisoni has been also introduced into such urban environments, although this species is not as common in Havana as A. porcatus and A. sagrei. In addition, A. carolinensis, a close relative of A. porcatus and the only Anolis lizard native to North America (Glor et al., 2005), has recently invaded the islands of Okinawa and Ogasawara (Japan), as well as the Hawaiian islands (USA) (Suzuki-Ohno et al., 2017;Tamate et al., 2017), while A. porcatus and A. allisoni have been observed in the state of Florida (USA) (Donini & Allman, 2017;Kolbe et al., 2007).
A. sagrei has also recently invaded various areas, including Florida, the Hawaiian islands, and the island of Taiwan (Kolbe et al., 2004). Traits related to the adaptation to hot and open habitats, and tolerance to urban environments may also be associated with invasiveness.
Several studies of the genetic basis underlying adaptation of Cuban Anolis lizards to different thermal microhabitats have been conducted by comparing A. sagrei inhabiting hot and open areas with A. allogus inhabiting cool and shaded forests and A. homolechis inhabiting forest margins. Akashi et al. (2018) analyzed the temperature at which these three species escape from heat source by behavioral experiment. Then, they expressed a putative molecular heat sensor, transient receptor potential ion channel ankyrin 1 (trpa1) of these three species on Xenopus oocytes and electrophysiologically quantified activation temperature of TRPA1 channel encoded by trpa1 . The results of their experiments showed that the temperature which elicits escaping behavior and thermal activation threshold of TRPA1 of A. homolechis and A. sagrei are higher than those of A. allogus . In addition, Akashi et al. (2016) detected differentially expressed genes (DEGs) associated with adaptation to different thermal microhabitats and identified a gene associated with circadian regulation and metabolism, nr1d1, as a DEG with opposite expression patterns for the hotadapted A. sagrei and the cool-adapted A. allogus. However, little progress has been made in the identification of genetic variants involved in the adaptation to different thermal microhabitats.
Anolis cristatellus recently expanded into urban areas on the island of Puerto Rico. Campbell-Staton et al. (2020) investigated the adaptive evolution of this species to heat stress and found that nonsynonymous polymorphisms at one site within the protein synthesis gene rars are associated with plasticity of heat tolerance with parallel selection across multiple urban populations. A. cristatellus may originally have had the ability to adapt to a wide range of environments, from forests to urban areas. Winchell et al. (2020) suggested that species whose natural habitats are hot and dry tend to thrive in urban areas. Therefore, species that evolutionary adapt to open, hot, and dry habitats could have an innate ability to invade and colonize urban environments. Hence, to determine how certain species can thrive in urban environments, it is necessary to clarify the genetic changes that occurred during adaptive evolution to open and hot natural habitats.
The purpose of this study was to detect genes that evolved under positive selection in Cuban Anolis lizards of different lineages that adapted to open and hot environments and are now thriving in urban areas. The estimated evolutionary history of urban tolerance liability associated with hot and dry conditions in natural habitats shows a possible evolutionary increase in urban tolerance in the common ancestor of A. allisoni and A. porcatus, and in A. sagrei . In this study, we phylogenetically analyzed protein-coding sequences in nine species of Cuban Anolis species: A. alutaceus, A. garridoi, A. isolepis, A. allisoni, A. porcatus, A. allogus, A. homolechis, A. mestrei, and A. sagrei

| RNA extraction, RNA sequencing (RNAseq), and de novo assembly
According to a previous report, genes expressed in the brain, liver, and skin of three Anolis lizards (A. allogus, A. homolechis, and A. sagrei) differ to some extent (Akashi et al., 2016). To sequence by RNA-seq and comprehensively detect positive selection for a wide range of genes, it is necessary to extract RNA from multiple tissues that have a difference in the expressed genes. Therefore, we extracted RNA from those three tissues (i.e., brain, liver, and skin). The excised tissues were stored in RNAlater®. DRA004457) and similarly assembled. The isolates of A. allogus, A. homolechis, and A. sagrei individuals were named "a_allogus," "a_homolechis," and "a_sagrei," respectively (Table 1A). The quality and completeness of the de novo assembled transcriptomes were evaluated using Transrate (Smith-Unna et al., 2016) with a protein sequence set of A. carolinensis obtained from the Ensembl 99 as a reference and BUSCO (Simão et al., 2015) using the ortholog database for vertebrates, respectively.

| Ortholog detection and multiple alignment
For the transcriptome of each species, coding sequences (CDSs) were predicted and translated into protein sequences using the TransDecoder program implemented in Trinity software (Grabherr et al., 2011). To estimate homologous protein pairs, homology searches were performed reciprocally between the predicted proteins and the A. carolinensis protein sequence set obtained from Ensembl 99 using National Center for Biotechnology Information (NCBI) Blast+ (Camacho et al., 2009) (e-value < 10 -15 ). If multiple protein sequences were registered for one gene of A. carolinensis in the Ensembl 99 database, the longest one was used. The protein sequences of each species that were reciprocally homologous with a protein sequence of A. carolinensis were retrieved and used as an ortholog group together with the A. carolinensis sequence in subsequent analyses. The annotation of the A. carolinensis ortholog in Ensemble 99 was adopted for gene name encoding these proteins.

If a gene name was not annotated to an A. carolinensis ortholog in
Ensembl 99, the gene name of a human or chicken ortholog for the gene in Ensembl 99 was used. For genes where positive selection was detected, if gene name was not annotated to the ortholog of A. carolinensis, human or chicken in Ensembl 99, we conducted homology search by Web BLAST (https://blast.ncbi.nlm.nih.gov/Blast. cgi). Then, if the protein sequence of the A. carolinensis ortholog registered in Ensembl 99 best hit a gene of A. carolinensis with gene name registered in NCBI, the gene name of the gene registered in NCBI was adopted. The orthologous CDSs from multiple species were aligned together with a codon model using the PRANK multiple alignment program without a user-specified initial guide tree (Löytynoja & Goldman, 2005). Then, sites that were gaps in more F I G U R E 2 The phylogenetic relationships of nine Cuban Anolis lizards used to detect genes under positive selection and A. carolinensis used as reference species for ortholog detection and gene name assignment, and their natural habitats. The phylogenetic relationship of nine Cuban Anolis lizards is based on a reconstruction by Cádiz et al. (2018). The position of A. carolinensis is based on a reconstruction by Glor et al. (2005). Branches colored in yellow indicate foreground branches for detection of genes under positive selection by codeml and aBSREL. The color of the circle represents thermal microhabitat of each species, where blue indicates cool and shaded habitats, such as forests, green indicates intermediate forest margin habitats, and yellow indicates hot and open habitats than two species were removed, and sequences with lengths after removal of the gap sites that were less than half those of A. carolinensis were excluded from further analysis of positive selection. A maximum likelihood phylogenetic tree for each ortholog group was reconstructed using RAxML with GTR-GAMMA model (Stamatakis, 2014).

| Detection of positive selection
Positive selection was detected using two software packages that implemented the branch site model: codeml from the PAML package (Yang, 2007) and aBSREL (Smith et al., 2015) from HyPhy package (Pond & Muse, 2005). For codeml, two models were fit and tested based on the null and alternative hypotheses of sequence evolution.
The null hypothesis assumed that the gene evolved neutrally and the value of ω (i.e., dN/dS) was = 1 in all branches (Model A1: model = 2, NSsites = 2, fix_omega = 1, omega = 1), while the alternative hypothesis assumed that the ω value was > 1 in the foreground branch (Model A: model = 2, NSsites = 2, fix_omega = 0, omega = 1). The p-values were calculated using the chi-squared test with the maximum log-likelihood of these two models. aBSREL (Smith et al., 2015) implemented in the HyPhy package (Pond & Muse, 2005)  p-values for each analysis using codeml and aBSREL in accordance with the method described by Storey et al. (2020), and genes were considered as candidates where positive selection occurred in the foreground branches when the q-value obtained from codeml or aB-SREL analysis was less than 0.05. If an amino acid site was assigned a posterior probability > 95% by Bayes empirical Bayes (BEB) analysis (Yang et al., 2005) in codeml, which infers amino acid sites under positive selection, and the amino acid of this site of A. allisoni and A. porcatus, or A. sagrei, differed from that of other forest-dwelling and forest margin-dwelling species, the site was considered a candidate under positive selection (hereafter, referred to as "a selection site"). The enrichment of gene ontology in genes, where positive selection was detected, was examined using PANTHER (Thomas et al., 2003).

| Prediction of the impact of amino acid substitutions at selection sites
The potential effect of amino acid substitutions at selection sites on the biological function of proteins encoded by genes, where positive selection was detected, was predicted using SIFT (Kumar et al., 2009) with the vertebrate subsections of UniProtKB and Provean (Choi et al., 2012) with NCBI nr protein database. Those two software tools calculate the impact of amino acid substitutions TA B L E 1 (A) Downloaded RNA-seq data generated by Akashi et al. (2016)  on the protein in the context of evolutionary conservation. For each gene, protein sequence of A. isolepis was used as query. When the SIFT score was less than 0.05 or the Provean score was less than −2.5, the amino acid substitution at the BEB selective site was considered to have a significant impact.

| Genes and amino acid sites with positive selection detected in lineages that inhabit hot and open areas, and thrive in urban areas
Genes under positive selection were detected in the common ances-  (Table 3A, Table S1). aBSREL analysis alone detected a positive selection in two of these genes (baz1b and atp2a1), whereas both analyses detected a positive selection in five of these genes (trpm7, mxi1, trim39, abcb6, and eif3a) ( Table 3A, Table S1). In the A. sagrei branch, positive selection was detected in 14 genes by either codeml or aBSREL analysis (Table 3B, Table S1). Of these, positive selection was detected by the codeml analysis alone in four genes (hgh1, tent4a, gba, and agap2), by aBSREL analysis alone in three genes (utp3, tex9, and pnpla6), and by both analyses in seven genes (cadm1, terf2, adgrl4, sptbn2, mef2a, tlcd3a, and c10orf88) ( Table 3B, Table S1).
There was no common gene where positive selection was detected in the common ancestral branch of A. allisoni and A. porcatus, and TA B L E 2 Results of assessment of the quality (Transrate assembly score) and the completeness (BUSCO metrics) of de novo transcriptome assemblies, and CDS prediction and reciprocal ortholog detection with A. carolinensis *Indicates gene names not annotated to the ortholog of A. carolinensis but assigned using the results of homology searches using Web BLAST. Gene names that were not annotated to the ortholog of A. carolinensis but were annotated to human orthologs are indicated by †.

Number of contigs
the A. sagrei branch. Of seven genes where positive selection was detected in the common ancestral branch of A. allisoni and A. porcatus, selection sites (defined in the methods) were detected in mxi1, trim39, abcb6, and eif3a (Table 4). In 14 genes where positive selection was detected in the A. sagrei branch, no selection sites were detected. No significant enrichment of gene ontology with a p-value of < .05 after correction of the false discovery rate was detected when the enrichment of gene ontology of genes where positive selection was detected in the two foreground branches was examined.
In mxi1, trim39, abcb6, and eif3a where positive selection was detected in the common ancestral branch of A. allisoni and A. porcatus, the impact of amino acid substitutions of selection sites was predicted. As a result, of those selection sites, a significant impact was detected by both SIFT and Provean analysis at one selection site in abcb6, and a significant impact was detected only by Provean analysis at one selection site in eif3a (Table 4).  after DNA damage (Oppikofer et al., 2017), TRIM39 encoded by trim39 regulates the balance between cytostasis and apoptosis after DNA damage , and TERF2 encoded by terf2 is involved in recognition of genomic double-strand breaks as an early response to DNA damage (Bradshaw et al., 2005). trpm7, mxi1 and abcb6, detected in the common ancestral branch of A. allisoni and A. porcatus, and gba and c10orf88 in the A. sagrei branch might be related to response to oxidative stress. TRPM7 encoded by trpm7 is activated by oxidative stress (Abiria et al., 2017) and mediates oxidative stress-induced anoxic neuronal death (Aarts et al., 2003). MXI1 encoded by mxi1 is thought to be involved in the protection of cells from apoptosis (Corn et al., 2005) and prevention of excessive production of reactive oxygen species (Delpuech et al., 2007). ABCB6 encoded by abcb6 is thought to protect cells from peroxide-induced stress (Lynch et al., 2009). PAAT encoded by c10orf88 plays an important role in the maintenance of mitochondrial homeostasis, while the depletion of PAAT promotes oxidative stress-induced cell death and mitochondrial DNA damage (Yang et al., 2014). gba might also be related to oxidative stress (Cleeter et al., 2013), but it may contribute to epidermal tolerance to desiccation since the expression of this gene is increased by epidermal water loss (Buraczewska et al., 2009).

| D ISCUSS I ON
In addition, eif3a, in the common ancestral branch of A. allisoni and A. porcatus, and mef2a, in the A. sagrei branch, are involved in the response to cellular stress, such as iron depletion and hypoxia (Lane et al., 2013;Zhao et al., 1999).
The physiological limitation in thermal tolerance is thought to be related to cardiac function (Casselman et al., 2012;Pörtner et al., 2017). The expression of sarcoplasmic/endoplasmic reticulum calcium ATPase encoded by atp2a1, detected in the common ancestral branch of A. allisoni and A. porcatus, plays an important role in thermal acclimation of cardiac function (Korajoki & Vornanen, 2012) and a deficiency of ELTD1 encoded by adgrl4, in the A. sagrei branch, has been reported to exacerbate cardiac function (Xiao et al., 2012).
Of the 21 genes where positive selection was detected in the common ancestral branch of A. allisoni and A. porcatus or the A. sagrei branch, selection sites were detected in mxi1, trim39, abcb6, and eif3a, and a significant impact was detected at one of the selection sites of abcb6 and eif3a. An amino acid variation at these sites would be a priority target to experimentally search for mutations related to adaptive evolution to hot and open habitats, and urban tolerance.
However, future studies are necessary to examine how these amino acid mutations affect the living body.
Positive selection was not commonly detected on both of the two tested lineages in any of 5,962 genes examined in total. This result suggests that there might not be many common genes that were subject to positive selection in those two lineages. Corbett-Detig et al. (2020) recently suggested that the convergent evolution of morphology and behavior in Anolis lizards may involve the evolution of many different protein or amino acid sites. Thus, adaptive evolution to hot and open habitats in different species might have been caused by the evolution of many different proteins or amino acid sites. However, many genes were excluded from analyses in this study (e.g., because a gene sequence for one or more species was not obtained or multiple alignment for a gene was significantly shortened in lengths after the gap removal). Completion of gene sequencing of these species is therefore needed for more comprehensive analysis.

| CON CLUS ION
In this study, positive selection was detected in seven and 14 genes in the common ancestral branch of A. allisoni and A. porcatus, and that of A. sagrei inhabiting hot and open area and thrive in urban areas, respectively. Although there were no genes where positive selection was commonly detected in both of the tested branches, positive selection was detected in genes involved in the stress response (e.g., DNA damage and oxidative stress) and cardiac function, which could be related to tolerance to high temperatures or UV radiation, on both branches. These results suggest that the positively selected changes of these genes might be associated with adaptation to high temperatures and open habitats, which in turn allows tolerance to hot and open urban habitats. The finding that positive selection of any of the 5,962 genes examined was not detected commonly in both of the tested lineages suggests that adaptive evolution of a common gene is rare. However, the focus of this study was limited to the coding regions and many genes were not able to be included in our analyses. Hence, further studies of more complete gene sets of many species, which can be obtained through genome sequencing, are required.

CO N FLI C T O F I NTE R E S T
None declared. Funding acquisition (lead); Investigation (lead); Project administration (lead); Supervision (lead); Writing-original draft (supporting);

AUTH O R CO NTR I B UTI O N
Writing-review & editing (lead).

DATA AVA I L A B I L I T Y S TAT E M E N T
RNA-seq raw sequence reads are available through the DDBJ Sequence Read Archive under accession no. DRA010304. All the other data are included in Supplementary files.