Natural CO2 seeps reveal adaptive potential to ocean acidification in fish

Abstract Volcanic CO2 seeps are natural laboratories that can provide insights into the adaptation of species to ocean acidification. While many species are challenged by reduced‐pH levels, some species benefit from the altered environment and thrive. Here, we explore the molecular mechanisms of adaptation to ocean acidification in a population of a temperate fish species that experiences increased population sizes under elevated CO2. Fish from CO2 seeps exhibited an overall increased gene expression in gonad tissue compared with those from ambient CO2 sites. Up‐regulated genes at CO2 seeps are possible targets of adaptive selection as they can directly influence the physiological performance of fishes exposed to ocean acidification. Most of the up‐regulated genes at seeps were functionally involved in the maintenance of pH homeostasis and increased metabolism, and presented a deviation from neutral evolution expectations in their patterns of DNA polymorphisms, providing evidence for adaptive selection to ocean acidification. The targets of this adaptive selection are likely regulatory sequences responsible for the increased expression of these genes, which would allow a fine‐tuned physiological regulation to maintain homeostasis and thrive at CO2 seeps. Our findings reveal that standing genetic variation in DNA sequences regulating the expression of genes in response to a reduced‐pH environment could provide for adaptive potential to near‐future ocean acidification in fishes. Moreover, with this study we provide a forthright methodology combining transcriptomics and genomics, which can be applied to infer the adaptive potential to different environmental conditions in wild marine populations.


| INTRODUC TI ON
Human-driven global change is challenging the scientific community to understand how marine species might adapt to predicted environmental conditions in the near future (e.g., hypoxia, ocean warming, and ocean acidification). The effects of the uptake of anthropogenic atmospheric CO 2 by oceans propagate across the biological hierarchy, from changes in the building blocks of life at nanoscales (Leung et al., 2020) to organismal physiology and behavior (Strader et al., 2020) through to ecosystem processes and their properties (Nagelkerken et al., 2016;Sunday et al., 2017;Teixidó et al., 2018).
These higher-order changes alter biodiversity, species abundances | 1795 PETIT-MARTY ET Al. and composition, and how energy propagates through food webs (Doubleday et al., 2019;Martínez-Crego et al., 2020;Nagelkerken et al., 2020). Some observations of wild populations are surprising because they reveal species thriving rather than declining in naturally acidified environments Nagelkerken et al., 2017), but we still have little understanding of how adjustments to climate change at organismal level are linked to adaptive processes at the population level.
To survive in a reduced-pH environment, marine organisms have to adjust their physiology, which, at the molecular level, is achieved by modifying the expression of genes. At the organismal level, we are learning that reduced-pH environments alter the expression of many genes (hundreds or more, e.g., Evans et al., 2017;Schunter et al., 2016) mostly affecting traits related to calcification, metabolism, acid-base regulation, stress responses, and behavior (Strader et al., 2020). Such changes in gene expression can reflect the capacity of species to respond to environmental variability (i.e., physiological acclimatization; Schlichting & Smith, 2002) but could also indicate responses with negative effects on the fitness of individuals, which would compromise the long-term survival of the species (Kelly & Hofmann, 2013;Strader et al., 2020). At the population level, little is known about the potential of the wild species to long-term adaptation to ocean acidification, given that naturally pH-reduced coastal marine environments are not common (Kelly & Hofmann, 2013;Palumbi et al., 2019). This knowledge from natural environments is critical for the understanding of the adaptive potential to environmental stressors in wild populations because the inherent buffering processes within naturally complex systems (Connell & Ghedini, 2015) reduce the applicability of observations from laboratory experiments to natural ecosystems .
Long-term adaptation happens when genetic changes alter phenotypes improving the fitness of the individuals within the environment they live in. Alteration to individual fitness is driven by environmental or ecological changes that exert selective pressures on populations. The fitness differential among individuals of a population fuels natural selection allowing long-term adaptation to occur (Orr, 2009). If a genetic change improves the fitness of an individual (beneficial or adaptive mutation), natural selection will increase the frequency of this change toward reaching the whole population (i.e., complete or local adaptation, Orr, 2009). However, the fitness effect of a mutation may vary spatially among different environments inhabited by a single population with random mating. When this happens natural selection is therefore spatially heterogeneous (Hedrick, 2006(Hedrick, , 2007Levene, 1953), implying that within one generation, a mutation could be positively selected if the carriers reside in one area (i.e., individuals with the mutation will contribute more to the next generation) but selected against or not selected if they reside in another (i.e., individuals with the mutation will contribute less to the next generation). This process is especially important in marine species as most of them distribute across heterogeneous environments and have highly dispersive larvae (Bernatchez, 2016;Kelly & Hofmann, 2013;Palumbi et al., 2019). Spatially heterogeneous selection was first proposed by Levene (1953) to explain how genetic variation can be maintained within a population distributed across different ecological niches. Currently, this kind of selection is commonly known as a form of balancing selection (Charlesworth, 2006;Hedrick, 2006Hedrick, , 2007. Balancing selection refers to different selective processes that result in the maintenance of genetic diversity within populations (Charlesworth, 2006;Fijarczyk & Babik, 2015;Hedrick, 2007). Hence, the importance of balancing selection as a mechanism of preserving the adaptive potential of marine species to climate change has been highlighted in recent studies (Bernatchez, 2016;Bitter et al., 2019;Brennan et al., 2019;Kelly & Hofmann, 2013;Palumbi et al., 2019).
Balancing selection can be identified as it leaves a footprint in the pattern of neutral nucleotide polymorphisms in the nearby genomic regions of mutations under selection, mostly characterized by an excess of intermediate-frequency alleles (Charlesworth, 2006;Fijarczyk & Babik, 2015;Hedrick, 2007). Evidence of balancing selection's capacity to maintain adaptive mutations to ocean acidification has been demonstrated in the sea urchins (Strongylocentrotus purpuratus; Brennan et al., 2019) and mussels (Mytilus galloprovincialis; Bitter et al., 2019), but not yet for fishes.
Understanding how fitness is connected to genotypes is a 'holy grail' toward enabling predictions on the adaptive potential of species to environmental change in general and climate change in particular. Changes in gene expression due to a change in the environment can provide insights into the fitness benefits of a genotype because the responding genes are likely targets of natural selection (Kelly & Hofmann, 2013;Schlichting & Smith, 2002). The transcriptome of individuals in a given environmental condition could be seen as a first representation of their phenotypes. Genes with increased transcription under environmental change are especially relevant as they are likely triggering regulatory circuits enhancing or suppressing the transcription of other genes (Yosef & Regev, 2011). Hence, if an environmental change persists in time, genetic changes in those responding genes can allow faster or improved responses, which are then expected to be positively selected and increase their frequencies in the population. Therefore, transcriptomics has become a powerful tool to investigate changes in expression of genes underlying the responses to environmental stressors and can aid in revealing the adaptive mechanisms of these responses.
Volcanic CO 2 seeps provide a unique opportunity to study the long-term adaptation to ocean acidification of marine species. CO 2 seeps exhibit a net resource enrichment that benefits the whole trophic chain (Doubleday et al., 2019;Martínez-Crego et al., 2020;Nagelkerken et al., 2017) and can boost population densities of various fish species Nagelkerken et al., 2016).
While the net resource enrichment at CO 2 seeps can be a positive selective pressure for adaptation to reduced pH by marine species, the molecular mechanisms underlying the adaptation to ocean acidification remain elusive and are still unknown in wild fish populations.
Here, we use transcriptomics to discover candidate genes of adaptive selection in a naturally reduced-pH environment to infer potential mechanisms of adaptation to ocean acidification in a fish species.
We study a fish (Common triplefin, Forsterygion lapillum Hardy, 1989) that experiences doubling in its population size on temperate rocky reefs at natural CO 2 vents at White Island, New Zealand, as a model species whose populations benefit from ocean acidification.

| Sampling
Two CO 2 seeps at White Island (New Zealand) are located within temperate rocky reef habitats at 6 to 8 m of depth and have an average pH reduction of about −0.17 and −0.24 units, respectively, across years compared with ambient pH seawater (Nagelkerken et al., 2021), reflective of representative concentration pathway emission scenarios 4.5-6.0 (Bopp et al., 2013). Twenty individuals of the Common triplefin Forsterygion lapillum were collected on SCUBA with hand nets from rocky reefs habitats at the volcanic vents and adjacent control sites of White Island (New Zealand) in January 2019. Ten individuals were caught at the underwater CO 2 seeps with reduced pH (hereafter called 'seeps') and ten in neighboring zones (~25 m away) with ambient pH (hereafter called 'control'). Individuals were measured for total body length.
Their gonad tissue was dissected out immediately and placed in RNAlater (Invitrogen Inc.) and eventually frozen at −80 C for further processing. After sequencing, the average number of paired-end RNA-seq reads per individual was 39.1 ± 2 M. Additionally, one RNA-seq library from gonads of a wild-type zebrafish (Danio rerio) was downloaded from the NCBI SRA database (Accession no. SRX4149436; N reads = 16.8 M) and was used as a control for the de novo assembly and annotation processes.

| RNA sequencing and processing
All sequenced reads were trimmed using the program Trimmomatic (Bolger et al., 2014) eliminating sequencing adaptors and low-quality positions (sliding window 4:15). Unpaired reads and reads with a final length lower than 40 nts were removed from further analysis. The read quality was assessed using FastQC (Andrews, 2010) and summarized with MultiQC (Ewels et al., 2016) with the average Phred score of 36 after trimming.
The final average number of Common triplefin reads per sample was 22.7 ± 1.3 M.

| Sexing of the individuals
Given the small size of the Common triplefin gonads, the sex of the fish was determined by using the levels of expression of zona pellucida-related genes (zp genes). We looked for highly expressed zp genes in female gonads of zebrafish (Danio rerio) in the Bgee DataBase (https://bgee.org/). Then, we retrieved the protein sequences of the orthologous genes of these zp genes for four phylogenetically closest species to the triplefin with sequenced genomes (from Eupercaria group of species in Ensembl genome phylogeny: Labrus bergylta (ballan wrasse), Cottoperca gobio (channel bull blenny), Gasterosteus aculeatus (stickleback), and Larimichthys crocea (large yellow croaker) from Ensembl orthologous gene database (https://www.ensem bl.org). Then, RNA-seq libraries of the Common triplefins were transformed to FASTA format using se-qKit (Shen et al., 2016) and blasted against a database with the orthologous zp genes, using BLASTn software with the task dcmegablast (Smith & Waterman, 1981). The number of reads with significant blast hits for each sample was taken as first evidence to separate males from females. An RNA-seq library presenting a high number of blast hits (putative female) was assembled and annotated. Then, the levels of gene expression of the orthologous zp genes were quantified for all samples following the methodology described below. We found striking concordance between the number of blast hits and the number of quantified reads for zp genes and classified the samples according to these results (Table   S1). The 20 fish randomly collected represented ten females (six from control sites and four from CO 2 seeps) and ten males (four from control sites and six from CO 2 seeps).

| Primary transcriptome assembly
De novo transcriptome assemblies were performed using the software Trinity v2 with default options (Grabherr et al., 2011) separately for (1) one male and one female from control, (2) one male and one female from seeps, and (3) one sample of zebrafish gonads (assembly control). Assembled transcripts were translated into proteins and coding nucleotide sequences using the software Transdecoder v5 (Haas & Papanicolaou, 2015). The primary transcriptome assembly steps are shown in the first part of Figure S1 and basic statistics in Table S2.

| Primary assembly postprocessing and annotation
The longest protein sequences of the assembled transcripts for the four triplefin samples (min. 100 aa) were blasted against the Ensembl Release 99 protein databases (https://www.ensem bl.org) of the four phylogenetically closest species with sequenced genomes: Labrus bergylta, Cottoperca gobio, Gasterosteus aculeatus, and Larimichthys crocea. The same was performed for the control sample of zebrafish, but the primary assembly was blasted against the zebrafish Ensembl protein database.
To eliminate transcripts redundancy produced by alternative transcripts and orthologous genes, we filtered the BLASTp results following these steps (second part of Figure S1):

Each Common triplefin assembly (N = 4) was blasted (BLASTp)
separately against each reference protein database (N = 4) with a cutoff of e-value <10 −6 and a percentage of sequence identity >70%.
2. For each BLASTp result to a reference species, we merged the results of the four samples grouping them by the ID of the reference protein and keeping the ID of the assembled transcript whose blast alignment covered the highest length of a reference protein.
The number of assembled transcripts is presented in Table S3 at protein/transcript level by each assembly and at the gene level for the merged assembly in Table S4. The comparison with zebrafish showed that the postprocessing method using blast analyses against phylogenetically related, though distant, species, results in nearly the same number of assembled transcripts as expected in a gonad sample from zebrafish (Table S3).
3. The BLASTp results for each reference species (N = 4) were merged grouping the results by a unique gene ID represented by the orthologous gene in stickleback (the best-annotated genome of the four reference species), and keeping the ID of the assembled transcript whose blast alignment covered the highest length of a reference protein sequence. Merging different assemblies and using different reference genomes for annotation give a higher number and quality of assembled coding sequences than using single samples and only one reference species (Tables S3   and S4).
The reference transcriptome assembly has 13,630 nonredundant genes with protein sequences that can be aligned to reference sequences in more than 50% of their length (82% covering more than 90% and 64% covering the full length; Table S5). Seventyeight percent of the genes in this reference assembly has annotated orthologous genes in the four related species, while only 4% have annotated orthologous genes in one single species. Thus, the Common triplefin reference transcriptome presents great reliability in terms of functional annotation, with two parameters useful to assess their confidence: the number of related species with significant protein homology, and the length of the recovered transcript (Table S5).

| Gene expression quantification and differential gene expression analysis
The quantification of expression of the genes was performed using Salmon v1.2 (Patro et al., 2017). To obtain TPM (transcripts per million) estimates, we used Salmon's options --validatMapping --GC, and bootstrapping (--numBootstraps 10,000). DESeq2 v2 (Love et al., 2014) was used to test for differential gene expression between control and seep individuals. Normalized read counts were calculated by sample and gene from Salmon expression quantification results. A likelihood-ratio test (LRT) was used to test for significant differential gene expression contrasting a model including environmental pH factor (pH) plus the covariants (sex and body size) and the interaction between pH and the covariants, against an alternative model without the environmental pH factor. p-values from the differential expression analysis were adjusted for multiple testing with the Benjamini-Hochberg procedure (Love et al., 2014). To obtain a set of genes to be used as a neutral reference in the selection analysis, we searched for genes without evidence of differential expression. This was done by means of Wald tests contrasting the expression of genes among sexes (N = 8916 DE genes, nonadjusted p-value < 0.05) and among pH environments without taking into account the variance in sex and body length (N = 8248 DE genes, nonadjusted p-value < 0.05). We used an ANCOVA test to investigate the effect of pH on the overall level of gene expression controlling for the variation in body size and sex. These gene expression levels represent the average of normalized read counts across all genes, as computed by DESeq2. Two-way ANOVAs (with fixed factors pH and sex) were used to test differences in levels of gene expression for the whole dataset and by functional keywords categories (homeostasis, energy production, protein production, protein binding, actin, and signal transduction). Statistical analyses were performed in R (R Core Team, 2020).

| Functional annotation and enrichment analysis of up-regulated genes
Gene Ontology terms (GO) were obtained for the orthologous genes in stickleback genome from Ensembl Release 99 Database (https://www.ensem bl.org). The functional enrichment analysis was performed by counting the number of genes for each GO category in the up-regulated gene dataset and comparing this with the number of annotated genes in each category in the stickleback genome. Tests for differences used Fisher's exact test where p-values were adjusted via Bonferroni correction. Functional annotation for genes without GO annotation in stickleback Ensembl database (N = 8) was obtained from orthologous protein annotation in UniProt for stickleback or zebrafish. Functional keywords were extracted from the GO terms of each gene (Table S6).
Information on genes related to X-linked disease in humans was retrieved from the GeneCards webpage (https://www.genec ards. org).

| Natural selection analysis
Patterns of nucleotide variation across genome regions are informative regarding the presence of natural selection. The folded site frequency spectrum (fSFS) shows the frequency of the minor allele (MAF) in a dataset and is used to find deviations of the pattern of nucleotide diversity related to that expected under a neutral model. Thus, we evaluated the impact of natural selection on genes expressed at significantly higher levels in individuals from CO 2 seeps (up-regulated genes) by comparing the fSFS of these genes with a set of neutral genes. We considered neutral genes the set of genes (N = 106) without evidence of different levels of expression between seeps and control, or between males and females (Wald test p-value > 0.05), and not functionally related to up-regulated genes (i.e., not containing any of the functional keywords listed in Table S6).
As we are searching for genetic variation in differentially expressed genes and the methods for calling single nucleotide polymorphisms (SNPs) are based on read coverage, we only call SNPs for genes with evidence of being expressed in all individuals. For this, we mapped all filtered RNA-seq reads against the datasets of up-regulated and neutral reference coding sequences using MagicBlast v1.5 with default parameters (Boratyn et al., 2019). The obtained bam files were then used to guide the assemblies of up-regulated and neutral genes for each sample with Trinity v2 using the option -genome_guided, and a minimum number of read coverage of 3 (Grabherr et al., 2011). We commonly assembled transcripts in all samples for 39 up-regulated and 97 neutral genes. To perform the SNP calling, we piled up the bam files using SAMTools (Li et al., 2009)

and then used
VarScan mpileup software (Koboldt et al., 2009) with the parameters: --min-coverage = 10, --min-reads 2 = 3, --min-avg-qual = 15, Alternatively, we called SNPs by aligning the longest assembled transcripts obtained for each gene and sample using MUSCLE (Edgar, 2004) obtaining the variant positions with a custom perl script that counts the number of the different nucleotides (A, C, T, and G) in each position of the alignment. This alternative method (aligning method) lacks power to detect low-frequency SNPs. However, aligning complete assembled transcripts minimizes the probability of false SNP discovery, and therefore, we use this method for the verification of the detected high-frequency SNPs with VarScan.
We identified 265 and 1116 SNPs in up-regulated and neutral genes, respectively, with VarScan. We did not find a decrease in the average number of low-frequency SNPs (MAF < 10%) in control (low expression) when compared to seeps (high-expression) in candidate genes (Table S7), indicating that the SNP calling is not affected by the differences in expression. Basic statistics of the discovered SNPs by the two methods are presented in Table S7. To get the fSFS, we only took biallelic SNPs with allele information in all twenty fish into account. fSFS was calculated for up-regulated genes, neutral genes, and the up-regulated genes grouped by their functional category.
All coding SNPs with intermediate frequency were also detected by the aligning method of SNP detection, and classified in synonymous or nonsynonymous by aligning the assembled transcripts from the twenty fish to the coding sequence of the orthologous species with high similarity (see Primary assembly postprocessing and annotation). Differences among fSFSs were tested using one-sided two-sample Kolmogorov-Smirnov tests as implemented in R (R Core Team, 2020). Differences in SNP allele frequency between seeps and control were tested using Fisher's exact test, and the q-value R package (R Core Team, 2020) was used to correct for multiple testing.

| Overall elevated gene expression in fish from natural CO 2 seeps
High variability in gene expression is expected from wild populations, and therefore, we first analyzed the effect of elevated CO 2 on the overall gene expression levels, taking into account individual traits that can influence this. The average level of gene expression was significantly higher in fish from seeps than from control (analysis of covariance, p = 0.016), and this covaried negatively with fish body length (p = 0.006; Table S8; Figure S2). Sex did not have a significant effect on overall gene expression (Table S8, p > 0.05), although the highest levels of expression were observed in males from seeps ( Figure S2B).
To identify all genes with significantly elevated expression in fish from CO 2 seeps versus those from controls, a LRT was performed.
For each gene, we calculated the likelihood of the differences in gene expression in a model including the three factors (sex, body length, and environmental pH), and in a model not including the pH as a factor. For 107 genes, the model including environmental pH better explained the differential expression than the alternative model excluding this factor (LRT, adjusted p < 0.05; Table S9). Sixty-six of these genes exhibit higher levels of expression (up-regulated) in seep individuals (Table 1). Up-regulated genes are potential candidates as targets of adaptive selection to ocean acidification and further analyses focused on these.

| Functional analyses of candidate genes for adaptation to ocean acidification
The expression of the 66 up-regulated genes in seeps was on average higher in males than in females (Figure 1). We found only three genes with clear evidence of being highly expressed in seep females, but less expressed in seep males (EMP1, tmod4, hypk; Figure S3A-C), and one with similar elevated levels of expression in both sexes at seeps (taldo1; Figure S3D).
The functional annotation of these 66 genes included 260 different GO terms (Table S10) to gonad-specific functions, but we found five genes related to TA B L E 1 List of the 66 up-regulated genes found in Common triplefin living at CO 2 seeps with information of the confidence of the assembled transcripts (N species: number of related species where high similarity proteins were found; % recovered protein: percentage of the protein sequence that was aligned with the orthologous reference protein), and its functions as described by functional keywords hormonal activity (Table S10). Moreover, two genes are associated with X-linked diseases in humans, EMP1 and ophn1. EMP1 is one of the three genes showing average elevated values of gene expression in females from seeps but also presented a high variance among individuals ( Figure S3C). The opposite was observed for ophn1 gene, with highest expression in males from seeps and less variance among individuals ( Figure S3E).
Sixty-four out of the 66 genes could be annotated and grouped by common functional keywords (FK ; Table S6) and can be represented by eleven FK that we further grouped into six general main functions (Table 1)

| Natural selection footprint in candidate genes for adaptation to ocean acidification
Fish larvae of the same age cohort recruit across control and adjacent seep sites at White Island, and hence, adaptive selection leading to local adaptation at CO 2 seep environments is counteracted by the dispersal of larvae (i.e., balancing selection). We tested this hypothesis by comparing the fSFS of the candidate genes to adaptive selection (i.e., the up-regulated genes in seeps) against the fSFS obtained from a set of neutral genes (i.e., without any evidence of differential expression in seeps). To minimize SNP miss-calling (see Methods), we only obtained fSFS for genes in which full-length transcripts could be assembled in all 20 samples (N = 39 and 97, for candidate and neutral genes, respectively). No significant difference was found between fSFSs from seep and control sites (one-side K-S test, p = 0.818; Figure 2a). However, a significant skew toward higher minor allele frequencies (MAFs) was found in the candidate when compared to the neutral genes for the whole population (seep + control; one-side K-S test, p = 0.041; Figure 2 A), as can be expected under a balancing selection scenario. In agreement with this balancing selection scenario, we found higher values of average heterozygosity in candidate compared with neutral genes (Table S7).
To investigate whether this skew toward higher MAF in the fre- SNPs being more evident in the energy production-related genes ( Figure 2B).
If there is differential postlarval mortality among seep and control environments, or juveniles move preferentially to the environment to which they are better fitted, then the adults we analyzed do not necessarily represent the recruited larval population at the CO 2 seeps, but the best-fitted ones. In this case, the whole fSFS (control + seeps) would also present an excess of SNPs with intermediate frequencies, but the frequency of the alternative alleles would be different between control and seeps. We evaluated this possibility by testing differences in the allele frequencies of the SNPs detected in the candidate genes among seeps and control sites. The results indicated that there are no significant differences in the allele frequencies of the SNPs among the sites after correcting for multiple testing (p-value > 0.05), suggesting no genetic differentiation between control and seeps adult fish for the up-regulated genes in a reduced-pH environment.

| Fitness effects of the potential adaptive mutations to ocean acidification
Nearly half of the evaluated candidate genes (47%) presented one or are also functionally related to homeostasis maintenance (Table 1).
rasa3 is also related to signal transduction function, while hsp90ab1 is a molecular chaperone with many different functions such as response to estrogen and regulation of specific target proteins involved in cell cycle control and signal transduction (FK: homeostasis, energy production and protein binding; Table 1). Finally, taldo1 is the one with similar elevated expression levels in both males and females from CO 2 seeps and it is important in the metabolism of carbohydrates (FK: energy production; Table 1). Hence, these six genes with greatest evidence of an adaptive process to the reduced-pH environment also show strong evidence of elevated expression at CO 2 seeps ( Figure S3D,F-J) and are functionally related to energy production, homeostasis maintenance, and regulation of downstream biological processes.

| D ISCUSS I ON
Volcanic CO 2 vents can act as natural laboratories to study molecular responses to reduced environmental pH in marine species, providing insights into their scope for adaptation to future ocean acidification. Fish from natural CO 2 seeps showed increased gene expression levels across the gonad transcriptomes, suggesting an effect of ocean acidification on the overall regulation of gene expression. The elevated gene expression was more pronounced in males than in females from CO 2 seeps, which could have reproductive consequences and is possibly linked to the higher reproductive investment in males (Nagelkerken et al., 2021) as Common triplefin males provide parental care of the nests (Wellenreuther & Clements, 2007). Among the genes that presented significant elevated expression in individuals from CO 2 seeps compared with those from control areas, we found seven genes that are regulators of gene expression.
We also detected elevated expression in genes with regulatory functions on downstream biological processes, such as hsp90ab1, rasa3, and agmat. Hence, the increased expression of these genes with regulatory functions can potentially trigger the overall elevated rates of gene expression observed in the fishes from CO 2 seeps.
Nearly half of the genes with significantly higher expression in individuals (males and females) from CO 2 seeps are functionally related to acid-base regulation to maintain pH homeostasis. This observation adds to the emerging view that some genes are being up-regulated in response to the greater energetic demand of living in acidified conditions (Kelly & Hofmann, 2013;Strader et al., 2020). Indeed, maintaining pH homeostasis places greater energetic demands on individuals via increased production of proteins transporting ions within and among cells. Accordingly, we find elevated expression in genes involved in protein and energy production, suggesting an overall increase in metabolic rates in individuals living at CO 2 seeps. Increased metabolism could potentially create energetic trade-offs, where increased energetic investment in maintaining homeostasis occurs to the detriment of other important energy-demanding physiological traits such as reproduction (Kelly & Hofmann, 2013). However, Common triplefins from these CO 2 seeps experience higher food availability due to bottom-up CO 2 enrichment (Doubleday et al., 2019), and with their behavioral dominance over other fish species, they gain better access to food resources . Thus, individuals living at CO 2 seeps can maintain physiological homeostasis and compensate for the increasing energetic demand by increasing their food intake or reducing their activity levels (Nagelkerken et al., 2021). This energetic compensation might not involve long-term adaptation directly but involve mechanisms of physiological acclimatization and behavioral adjustments. Nevertheless, the increase in population size of Common triplefins within CO 2 seeps (Nagelkerken et al., 2016) (Hedrick, 2006(Hedrick, , 2007Levene, 1953). Because larvae recruit randomly among acidified and nonacidified environments, the adaptive alleles for reduced pH did not reach fixation (i.e., local adaptation) in the Common triplefins at White Island but rather are maintained as standing genetic variation within the whole population. However, these adaptive alleles provide advantages when fish are recruited and live in the acidified environment and therefore offer adaptive potential to future ocean acidification.
Most of the potential adaptive SNPs to a reduced-pH environment that we found in Common triplefins do not change proteins but are likely to be mutations located in noncoding nearby genomic regions. Such regions contain transcription's promoters and en-  (Yosef & Regev, 2011), which would explain the observed increased transcription levels in fish from the CO 2 seeps. These mutated regulatory sequences would not impact the fitness of the individuals carrying them when living in an ambient pH environment, but these might allow fine-tuned physiological regulation in a reduced-pH environment. This idea of adaptation through changes in regulatory sequences of gene transcription is supported by the growing evidence indicating that such changes are highly important in species evolution (Carroll, 2008;Holloway et al., 2007) and adaptation to local conditions (Fraser, 2013;Mack et al., 2018). Moreover, we find the footprint of natural selection on the pattern of nucleotide polymorphism in genes grouped by functional categories. This result agrees with the emerging view that adaptation is more likely happening through the adaptive selection of mutations with small fitness effects located in many functionally related genes (Pritchard & Rienzo, 2010). Thus, our results suggest that Common triplefins have adapted to life in an acidified environment through genetic changes in the cis-regulatory sequences of several genes functionally important to respond to a reduced-pH environment.
The mechanism of adaptation we observed for the Common triplefin suggests some differences with current evidence of adaptation to acidified environments in other marine species. Evidence of balancing selection that maintains adaptive genetic variation to ocean acidification has also been found in wild populations of purple sea urchins (Brennan et al., 2019) and Mediterranean mussels (Bitter et al., 2019). These studies showed changes in allele frequencies due to selection on standing genetic variation that impacts larval survival after a treatment of reduced pH (Bitter et al., 2019;Brennan et al., 2019;Pespeni et al., 2013). As a consequence of this kind of selection, adaptation would be fast, but the adult population size is expected to decrease in the acidified environment. This decline in population size could have long-term consequences such as loss of adaptive potential to other environmental conditions (Pespeni et al., 2013). However, in our case-study, the Common triplefin shows an increased adult population size at CO 2 seeps . Moreover, our results do not show evidence of differences in allele frequencies among seeps and control in candidate genes to adaptive selection, which suggests that these do not affect larval survival but most likely the physiological performance of adults under reduced pH. Hence, we propose a new conceptual model explaining how balancing selection could maintain the adaptive genetic variation to ocean acidification in Common triplefins, which is an alternative to that previously proposed for the purple sea urchin F I G U R E 3 Models of potential adaptation to ocean acidification through balancing selection on (a) mutations with large fitness effects (e.g., affecting larval survival; modified from Palumbi et al., 2019) where individuals carry an allele of one mutation (phenotype represented by red circles) that is beneficial under reduced pH (i.e., increasing survival) but detrimental under ambient pH, (i.e., decreasing survival), and (b) polygenic mutations with small fitness effects (e.g., enhancers that increase gene expression in the reduced-pH environment), where individuals carry some or several mutations that are beneficial under reduced pH. The amount of red color in the circles represents the different phenotypes produced by the different combinations of beneficial mutations. The genetic load or evolutionary cost in (a) will be higher than in (b), as many larvae that settle randomly in the different environments will not leave descendants (i.e., all gray circles in reduced pH and all red circles in ambient pH), while in (b), only a small fraction of the individuals will not be well fitted in either of the environments (Pespeni et al., 2013, Palumbi et al., 2019; Figure 3A). The new model involves adaptive selection of mutations with small fitness effects located in the cis-regulatory sequences of several genes functionally important to respond to a reduced-pH environment ( Figure 3B). Different from the model proposed for purple sea urchin, this new model would release the Common triplefin population from the high evolutionary cost of maintaining beneficial mutations to reduced pH with large fitness effects (i.e., on larval survival). Thus, most of the Common triplefins from White Island are probably carriers of some or many beneficial genetic variants that might provide adaptation to ocean acidification through the regulation of gene expression.
In summary, our results indicate that adaptive genetic variation to reduced pH could be located in regulatory sequences of the transcription of genes responding to this environmental change. Regulatory mutations could potentially change the regulatory circuits of gene expression allowing fine-tuned physiological responses to ocean acidification. Our case-study in the Common triplefin shows that these adaptive mutations are maintained within the whole population at a local scale. Evidence of population structuring at regional scales has been found for Common triplefins (Rabone et al., 2015), and the degree of maintenance of adaptive mutations to ocean acidification at this scale deserves further studies in this species. However, the model presented here is applicable to all fish species with pelagic larvae and genetic flow among different environments. Acidification of the ocean is intensifying, and marine species tend to occur across wide geographical ranges with variable pH. Thus, it is likely that adaptive mutations allowing for life in slightly reduced-or variable-pH environments already exist within local populations (Kelly & Hofmann, 2013;Palumbi et al., 2019;Strader et al., 2020). Highly dispersive marine larvae contribute to the flow of this local adaptive genetic variation among species populations, while balancing selection likely allows for the long-term maintenance of this adaptive potential within populations. Hence, it might be expected that mutations in regulatory sequences of gene expression efficiently adjusting the physiological responses to reduced pH will become targets of pervasive adaptive selection in the near future under increasing ocean acidification.

ACK N OWLED G EM ENTS
This study was financially supported by an HKU Seed Fund for Basic Research Grant (201902159006)

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available at NCBI BioProject at https://www.ncbi.nlm.nih.gov/biopr oject, with reference number: PRJNA658206.