Unique genetic variants underlie parallel gene expression within a young adaptive radiation despite specialization on highly divergent resources

There are many cases of parallel gene expression underlying the evolution of convergent niche specialization, but parallel expression could also underlie divergent specialization. We investigated divergence in gene expression and genetic variation across three sympatric Cyprinodon pupfishes endemic to San Salvador Island, Bahamas. This recent radiation consists of a generalist and two derived specialists adapted to novel niches – a ‘scale-eater’ and a ‘snaileater.’ We sampled total mRNA from all three species at two early developmental stages and compared gene expression with whole-genome genetic differentiation between all three species. 82% of genes that were differentially expressed between snail-eaters and generalists were up or downregulated in the same direction between scale-eaters and generalists; however, there were no shared fixed variants underlying this parallel expression. These genes showing parallel expression did not exhibit increased developmental constraints, but were enriched for effects on metabolic processes. Alternatively, genes showing divergent expression were enriched for effects on cranial skeleton development and pigment biosynthesis, reflecting the most divergent phenotypes observed between specialist species. Our findings reveal that convergent adaptation to higher trophic levels between divergent niche specialists through shared genetic pathways is governed by unique genetic variants.

scaffold N50, = 835,301; contig N50 = 20,803; (38)). We followed Genome Analysis Toolkit (v 155 3.5) best practices and hard filter criteria to call and refine our SNP variant dataset (QD < 2.0; FS 156 < 60; MQRankSum < -12.5; ReadPosRankSum < -8 (39)). Our final SNP dataset contained 16 157 million variants and a mean sequencing coverage of 7× per individual (range: 5.2-9.3×). 158 We identified SNPs that were fixed in each specialist species. We calculated genome 159 wide F st using VCFtools' 'weir-fst-pop' function for two different population comparisons 160 involving samples collected from San Salvador: generalists (n = 13) vs. snail-eaters (n = 11) and 161 generalists (n = 13) vs. scale-eaters (n = 9). Our SNP dataset included 14 scale-eaters, however, 162 we split our scale-eater population into two groups (large-jawed scale-eaters, n = 9 and small-163 jawed scale-eaters, n = 5) based on previous evidence that these two populations are genetically 164 distinct (31,32). This allowed us to identify SNPs unique to large-jawed scale-eaters (i.e. C. 165 desquamator (40)), which were the only type of scale-eater we sampled for RNA-seq. We 166 identify zebrafish protein orthologs with high similarity (E-value < 1) to NCBI protein 184 accessions for genes that we identified as differentially expressed between Cyprinodon species. 185 We used these orthologs to determine if any gene ontology categories were enriched across 186 differentially expressed genes. We grouped enriched GO categories into similar representative 187 terms using the REVIGO clustering algorithm (46). 188 We also used these orthologs to estimate pleiotropy for differentially expressed genes 189 based on the number of associated GO biological processes, protein-protein interactions (PPIs), 190 and developmental stages when they are known to be expressed (47). We again used the GO 191 Consortium (44) to determine the number of biological processes associated with each gene. We 192 examined biological process annotations only for genes from ZFIN (zfin.org) with experimental 193 evidence (GO evidence code EXP). The String protein database (v. 10; (48)) calculates a 194 combined score measuring confidence in protein interactions by considering known interactions 195 (experimentally determined and from manually curated databases) and predicted interactions. 196 genes, focusing only on interactions with experimental evidence (i.e. non-zero experimental 198 evidence scores). 199 Next, we determined the number of developmental stages where a gene is known to be 200 expressed using the Bgee expression call database for zebrafish (v. 14.0 (49)). We considered 201 eight developmental stages from larval day five to juvenile day 89 from the Zebrafish Stage 202 Ontology (ZFS) that were deemed 'gold quality,' meaning there was no contradicting call of 203 absence of expression for the same gene, in the same developmental stage (49). We tested 204 whether levels of gene pleiotropy were different in genes showing parallel expression in 205 specialists versus divergently expressed genes by fitting a generalized linear model (negative 206 binomial family; glm.nb function in the R library "MASS") on count data for pleiotropy 207 estimates. We did not measure pleiotropy for genes expressed at 17-20 dpf due to the low 208 number of zebrafish orthologs matched for genes with parallel expression in craniofacial tissues 209 (11 out of 23). 210 Finally, we used the duplicated genes database (50) to identify paralogs in our 211 differentially expressed gene dataset. We calculated whether genes showing parallel expression 212 in specialists had more paralogs than divergently expressed genes using Pearson's Chi-square 213 test in R. 214 215

Differential expression between generalists and specialists 217
Total mRNA sequencing across all 29 samples resulted in 677 million raw reads, which was 218 reduced to 674 million reads after quality control and filtering. 81.2% of these reads successfully 219 aligned to the reference genome and 75.5% of aligned reads mapped to annotated features with 220 an average read depth of 309× per sample. We used this dataset to identify differences in gene 221 expression associated with specialist phenotypes at two developmental stages and within two 222 groups of tissues. 223 out 24,383 total genes annotated in the Cyprinodon variegatus assembly (NCBI, C. variegatus 225 Annotation Release 100, (38)). These analyses revealed 1,014 genes (1,119 isoforms) 226 differentially expressed between generalists vs. snail-eaters and 5,982 genes (6,776 isoforms) 227 differentially expressed between generalists vs. scale-eaters ( Fig. 1) (Benjamini and Hochberg 228 adjusted P < 0.05). 833 genes (923 isoforms) showed the same direction of expression (up or 229 down regulated) in both specialists relative to generalists, indicating parallel expression in 230 specialists (Fig. 2). Specifically, 497 differentially expressed isoforms showed lower expression 231 in both specialist species compared to generalists, while 424 showed higher expression in 232 specialists (Fig. S1). This is significantly more parallel expression than would be expected by 233 chance for genes (n = 6,996; 100,000 permutations; P < 1.0 × 10 -5 ) and gene isoforms (n = 234 8,998; 100,000 permutations; P < 7.0 × 10 -4 ) (Fig. S2). 235 Craniofacial morphology is the most rapidly diversifying trait in the San Salvador 236 radiation (51). In order to detect genes expressed during jaw development, we compared 237 expression within craniofacial tissue at the 17-20 dpf stage. We discovered 120 genes (123 238 isoforms) differentially expressed between generalists vs. snail-eaters and 1,903 genes (2,222 239 isoforms) differentially expressed between generalists vs. scale-eaters (Benjamini and Hochberg 240 adjusted P < 0.05). We observed a lower proportion of parallel expression in specialists for 241 craniofacial tissue compared to whole-body tissue. 23 genes showed the same direction of 242 expression in specialists relative to generalists, with 11 differentially expressed isoforms 243 showing lower expression in both specialist species and 14 showing higher expression. This is 244 significantly less parallel expression than would be expected by chance for genes (n =2,023; 245 100,000 permutations; P < 1.0 × 10 -5 ) and gene isoforms (n = 2,345; 100,000 permutations; P < 246 1.0 × 10 -5 ) (Fig. S2). 247 We also identified 4 genes that showed opposite patterns of expression in specialists 248 relative to generalists (Table S2). In 8-10 dpf whole-body tissue, 2 genes (agxt2, si:ch211-249 197h24.9) were upregulated in snail-eaters and downregulated in scale-eaters, while one gene 250 (plin2) was upregulated in scale-eaters and downregulated in snail-eaters. In 17-20 dpf 251 craniofacial tissue, one gene (mybpc2a) was upregulated in snail-eaters and downregulated in 252 scale-eaters. 253 generalists and snail-eaters and 1,543 SNPs fixed between generalists and scale-eaters. None of 257 these fixed variants were shared between specialists. Next, we determined which of these fixed 258 SNPs fell within gene regions (either exonic, intronic, or within 10kb of the first or last exon). 259 29% of SNPs fixed in snail-eaters overlapped with 21 gene regions, while 59% of SNPs fixed in 260 scale-eaters overlapped with 245 gene regions. We found 365 SNPs fixed in scale-eaters (24%) 261 within 68 gene regions that showed differential expression between generalists and scale-eaters 262 in whole-body tissue (8-10 dpf) and 123 SNPs (8%) within 26 gene regions differentially 263 expressed between generalists and scale-eaters in craniofacial tissue (17-20 dpf). We suspect that 264 some of these fixed variants are causal cis-regulatory variants responsible for species specific 265 expression patterns that ultimately give rise to phenotypic differences in scale-eaters. 266 Conversely, we only identified a single SNP fixed in snail-eaters within a gene (tmprss2) that 267 was differentially expressed between generalists and snail-eaters in whole-body tissue (8-10 dpf). 268 We did not find any fixed variants near genes differentially expressed between generalists and 269 snail-eaters in craniofacial tissue, possibly suggesting that fixed variants regulate expression 270 divergence at an earlier developmental. 271 Since we did not find any variants that were fixed between scale-eaters and generalists 272 that were also fixed between snail-eaters and generalists, we repeated our analysis with a lower 273 threshold of genetic divergence by examining SNPs with F st We performed GO enrichment analyses with zebrafish orthologs for genes showing parallel 304 expression patterns in specialists (n = 620) and genes showing divergent expression patterns in 305 scale eaters (n = 3,349) and snail-eaters (n = 102). We restricted these analyses to genes 306 expressed at 8-10 dpf because the number of genes showing parallel expression in specialists at 307 17-20 dpf (n = 23) was low and did not show enrichment for any biological process. 308 Genes showing parallel expression in trophic specialists were enriched for metabolic 309 processes (20% of GO terms). In contrast, genes with divergent expression patterns in specialists 310 were enriched for cranial skeletal development and pigment biosynthesis (7% and 3% of terms, 311 viscerocranium, striated muscle, liver, erythrocyte, and the development of the heart, optic cup, 314 pancreas, and brain. 41 divergently expressed genes were annotated for viscerocranium 315 development, which is particularly interesting because this tissue gives rise to the most divergent 316 craniofacial structures in specialists (55,56). 317

318
The genetic basis of extreme craniofacial divergence 319 We previously described 30 candidate gene regions containing variants fixed between trophic 320 specialist species associated with variation in jaw length (31). These candidates also showed 321 signatures of a recent hard selective sweep (31). Encouragingly, we found eleven of these genes 322 differentially expressed between generalists and specialists (seven at 8-10 dpf and four at 17-20 323 dpf) (Table S4). 324 We searched for variants fixed in specialists within gene regions across all 7,394 genes 325 that were differentially expressed between either generalists and snail-eaters or generalists and 326 scale-eaters. We discovered fixed SNPs in 81 of these gene regions (either intronic or within 327 10kb of the first or last exon), potentially implicating cis-acting variants regulating gene 328 expression. Further testing this hypothesis, we searched for signatures of hard selective sweeps 329 in specialists. Interestingly, 95% of these gene regions containing fixed SNPs showed signs of 330 experiencing a selective sweep (estimated by SweeD; CLR > 95 th percentile across their 331 respective scaffolds) ( Table S5). All of these genes regions contained SNPs fixed between 332 generalists and scale-eaters and showed differential expression in this same comparison. Five of 333 these 81 genes were indicated as strong craniofacial candidates in our previous study (31). 334 Finally, we compared this list of genes experiencing selection to those annotated for cranial 335 skeletal system development (GO:1904888) and muscle organ development (GO:0007517). This 336 revealed three genes containing fixed variation in scale-eaters that likely influence craniofacial 337 divergence through cis-acting regulatory mechanisms (loxl3b (annotated for cranial effects); 338 fbxo32 and klhl40a (annotated for muscle effects)). 339 340 evolution of two trophic specialist species that rapidly diverged from a generalist common 343 ancestor within the last 10,000 years. We examined how gene expression and SNP variation 344 influence snail-eater and scale-eater niche adaptations using comparisons between each specialist 345 and their generalist sister species. We found a significant amount of parallelism at the level of 346 gene expression yet no parallelism at the level of fixed genetic variation in specialists. 347 Specifically, 82% of genes that were differentially expressed between snail-eaters and generalists 348 were up or downregulated in the same direction when comparing expression between scale-349 eaters and generalists (Fig. 2). We show that this parallel expression is not the result of the same 350 underlying genetic variation. 351 We explored two possible explanations for this pattern: 1) developmental constraints 352 resulting from pleiotropy or functional redundancy favor parallel expression in specialist species 353 or 2) parallel selective environments act on similar genetic pathways needed for adaptation to 354 higher trophic levels. 355 356

Developmental constraints on the evolution of parallel gene expression 357
We tested two hypotheses considering how different forms of developmental constraint could 358 promote parallel expression patterns in specialist species. One mechanism constraining the 359 probability of parallelism is pleiotropy -when the expression of one transcript influences 360 multiple phenotypic characteristics (22,47,57). Higher gene pleiotropy is correlated with 361 participation in more protein-protein interactions (PPIs), which can cause cascading downstream 362 effects on multiple biological processes (52,53). We predicted that genes showing parallel 363 expression might be under higher constraint due to negative pleiotropy. We estimated three 364 measures of gene pleiotropy (number of associated GO biological processes, protein-protein 365 interactions (PPIs), and developmental stages when they are known to be expressed) and found 366 no significant difference in any measure for genes showing parallel versus divergent expression 367 patterns in specialists (Fig. 3). This finding is consistent with some empirical evidence and 368 generalists (66). Fish scales and mollusks contribute to more nitrogen-rich diets in specialists 386 compared to generalist species that primarily consume algae and detritus (66). Perhaps the same 387 metabolic processes required for this type of diet are adaptive at higher trophic levels for both 388 scale-eaters and snail-eaters, which might explain patterns of parallel expression. Thus, we 389 predicted that genes showing parallel expression would affect metabolic processes that may be 390 similar between specialists, whereas genes showing divergent expression in specialists would 391 affect developmental processes. 392 GO enrichment analyses support both hypotheses. We found that 20% of GO categories 393 enriched for genes showing parallel expression described metabolic processes, and zero 394 described cranial skeletal development or pigment biosynthesis (Fig. 4A). In contrast, 10% of 395 terms showing enrichment in the divergently expressed gene set described developmental 396 processes (cranial skeletal development and pigment biosynthesis) and only 11% described 397 generalist (CHM and JAM personal observation). In contrast, genes showing divergent 403 expression in specialists are more important for shaping divergent skeletal and pigmentation 404 phenotypes between species (Fig. 4B, C and D). 405 406

Parallel gene expression despite divergent genetic variation 407
We find significant parallel gene expression across genes that are annotated for effects on 408 metabolism, yet shared expression patterns do not seem to be driven by the same fixed variants. 409 This is surprising in this young radiation given that the probability of parallel genetic variation 410 underlying phenotypic convergence increases with decreasing divergence time (1,24,67). 411 Although 95% of differentially expressed gene regions containing fixed SNPs show signs of 412 experiencing a selective sweep, and almost none of these variants in these SNPs were in exons, it 413 is still possible that fixed alleles do not regulate parallel expression of metabolic genes. Instead, 414 we reasoned that segregating variants shared between specialists might be responsible for shared 415 expression patterns. However, we still failed to find any shared variants at lower levels of 416 differentiation (F st ≥ 0.8). Another possibility is that parallel expression is not controlled by 417 shared SNPs, but rather a more complex type of genetic variation (copy number variants, 418 inversions, insertions, deletions). However, preliminary results suggest that there are no shared 419 insertion/deletions between specialists (JAM unpublished data). Many studies on parallel 420 adaptation through gene reuse describe convergence within pigmentation and skeletal 421 development pathways (6,18,23,68). Perhaps the architecture of metabolic adaptation is more 422 flexible, having more mutational targets or employing more late-acting developmental regulatory 423 networks that are less constrained than early-acting networks (67,69-73). Our findings highlight 424 the importance of understanding the probability of genetic convergence across different 425 biological processes. 426 427 influences larval development, however, some activation of parallel gene networks is likely 430 specified at pre-hatching developmental stages (69,70). It is also possible that we did not have 431 the power to identify subtle differences in expression for genes that showed high divergence 432 between specialists and generalists. Detecting differential expression of transcripts is notoriously 433 difficult when read counts are low and variance within treatment groups is high (74,75). We 434 were able to detect differential expression for genes with a mean normalized count as low as 1.6 435 (median = 150) and log 2 fold change as low as 0.2 (median = 1.11). Thus, we could not detect 436 any variation causing very slight fold changes in expression below 0.2. Furthermore, our scale-437 eater sample sizes (8-10 dpf n = 3; 17-20 dpf n = 2) were lower than that of generalists and snail-438 eaters (n = 6 at both stages) (Fig. S1), which likely resulted in more false positives for scale-eater 439 comparisons. 440 Finally, our results are robust in a recently published independent analysis of gene 441 expression in San Salvador pupfishes that identified many of the same genes we found 442 divergently expressed between specialists (38). We examined this dataset using the same 443 significance thresholds for differentially expressed genes as described in Lencer et al. for mRNA 444 extracted from all three species at 8 dpf and 15dpf (p < 0.1 and |Log 2 fold change| > 0.2). We 445 found that 40% of genes divergently expressed between specialists in this dataset were 446 divergently expressed in our own dataset. Importantly, Lencer et al. only examined cranial 447 tissues at both of these developmental stages, and they did not investigate parallel expression 448 between each specialist. Next we searched for evidence of parallel expression for mRNA 449 extracted from all three species at 8 dpf. 28.8% of genes that were differentially expressed 450 between snail-eaters and generalists were up or downregulated in the same direction between 451 scale-eaters and generalists. This is a lower proportion of parallel expression than we identified 452 (Fig. 2)  scale-eaters (right). Genes that show the same expression patterns in specialists relative to 732 generalists are blue, and those showing divergent expression patterns unique to each specialist 733 are green. Significantly more genes show the same expression pattern (either up or 734 downregulated) in specialists relative to generalist gene expression at 8-10 dpf than expected by 735 chance (Fig. S2; 100,000 permutations; P < 1.0 × 10 -5 ). Alternatively, significantly fewer genes 736 show the same expression pattern in 17-20 dpf craniofacial tissue ( Fig. S2; 100,000 737 permutations; P < 7.0 × 10 -4 ). 738