Genetic divergence and phenotypic plasticity contribute to variation in cuticular hydrocarbons in the seaweed fly Coelopa frigida

Male sexual signals facilitate female choice and thereby reduce mating costs. The seaweed fly Coelopa frigida is characterized by intense sexual conflict, yet it is unclear which traits are under sexual selection. In this study, we investigated phenotypic and genetic variation in cuticular hydrocarbons (CHCs) in C. frigida for the first time. CHCs are ubiquitous in insects and they act as sex pheromone in many dipterans. We characterized CHCs and found that composition was affected by both genetic (sex and population origin) and environmental (larval diet) factors. We identified general shifts in CHC chemistry as well as individual compounds and found that the methylated compounds, mean chain length, proportion of alkenes, and normalized total CHCs differed between sexes and populations. We combined this data with whole genome resequencing data to examine the genetic underpinnings of these differences. We identified 11 genes related to CHC synthesis and found population level outlier SNPs in 5 that are concordant with phenotypic differences. Together these results reveal that the CHC composition of C. frigida is dynamic, strongly affected by the larval environment, and likely under natural and/or sexual selection.


INTRODUCTION 21
Females and males often differ in optimal reproductive strategies, particularly regarding 22 the mode and frequency of matings. Males can increase their fitness by mating with 23 multiple females, while one or a few matings are generally sufficient for females to 24 maximize their reproductive success (Bonduriansky, 2010). This sexual conflict over 25 mating rates commonly leads to males harassing females to increase the likelihood that 26 their genes get passed on to the next generation (Sánchez-Guillén et al., 2017). Such 27 male mating harassment can carry considerable costs for females, e.g. by reducing 28 reproductive success and increasing mortality (Gosden & Svensson, 2007). For example, 29 repeated matings are the norm in the blue-tailed damselfly Ischnura elegans, particularly 30 when population densities are high (Corbet, 1999), with females showing elevated 31 immune responses, possibly because of mating related injuries or sexually transmitted 32 diseases (Chauhan et al., 2016). Likewise, in the guppy Poecilia reticulata, males attempt 33 to copulate with females by thrusting their gonopodium towards the female's genital pore 34 as often as once every minute, disrupting normal activities like foraging (Magurran & 35 Nowak, 1991). 36 37 Females can minimize the cost of unwanted copulations by making it difficult for males 38 to detect them (as seen in species with female limited colour polymorphism, 39 standard wrack consisting of 50 % Fucus spp. and 50 % Saccharina latissima which had 132 been chopped, frozen, and then defrosted. The exception to this is a subset of Ystad 133 adults that were allowed to mate on the same material they had been collected in (field 134 wrack). Thus we had 5 treatment groups: All four populations (Skeie,Østhassel,Stavder 135 ,Ystad) on standard wrack, and Ystad also on field wrack. Adult flies were allowed to lay 136 eggs on the provided wrack and this second generation was raised entirely in a 137 temperature controlled room at 25 °C with a 12 h/12 h light dark cycle. As larvae pupated 138 they were transferred to individual 2 mL tubes with a small amount of cotton soaked in 139 5 % (w/v) glucose to provide moisture and food for the eclosing adult. This ensured 140 virginity in all eclosing flies. The tubes were checked every day and the date of eclosure 141 of each fly was noted. Two days after eclosure adult flies were frozen at -80 °C and kept 142 there until CHC extraction. 143

CHC Analysis 144
Frozen flies were placed in 12-well porcelain plates to defrost and dry, as moisture can 145 affect lipid dissolution. Each fly was then placed in a 1.5 ml high recovery vial 146 containing 300 µl of n-hexane, vortexed at a low speed for 5 second, and extracted for 5 147 minutes. Afterwards flies were removed from the vial and allowed to air dry on the 148 porcelain plates before they were weighed (Sartorius Quintix 124-1S microbalance) to 149 the nearest 0.0001 grams and sexed. Extracts were evaporated until dry under nitrogen 150 gas and then stored at -20°. Before analysis 20 µl of n-hexane containing 1 µg/ml n-151 nonane was added as an internal standard to each vial, which was then vortexed at 152 maximum speed for 10 seconds. 153 The extracts from 20 male and 20 female flies from every treatment group were analyzed 154 on a GC (Agilent GC 6890) coupled to a MS (Agilent 5973 MSD) using a HP-5MS 155 capillary column (Agilent) (see Table S1 in supplementary materials for instrument 156 settings). 157 The total ion chromatograms were quality checked, de-noised and Savitzky-Golay 158 filtered before peak detection and peak area integration in OpenChrom®. Peaks were 159 then aligned using the R-package 'GCalignR' (Ottensmann, Stoffel, & Hoffman, 2017) 160 prior to statistical analysis. Peaks were tentatively identified by their mass spectra and 161 retention time index based on a C21-C40 n-alkane standard solution (Sigma-Aldrich). 162

163
The effects of sex and population on CHC profile were assessed using flies raised on 164 standard wrack using a balanced sampling design (N=13 for each combination) and 127 165 peaks were identified by the R package "GCalignR". The effects of sex and diet (i.e. 166 wrack type) during the larval stage on CHC profiles were analyzed using a balanced 167 design of flies from the Ystad population (N=12 for each combination) and 111 peaks 168 were identified by R package "GCalignR". Peak areas were normalized on the peak area 169 of the internal standard and based on the weight of the fly before being auto-scaled prior 170 to statistical analysis. Clustering of samples was visualized with a PCA, and group 171 differences were analyzed using a PERMANOVA followed by multiple group 172 comparisons using the PRIMER-E 7 software. We also analyzed group differences via 173 (O)PLS-DA using the "ropls"-packages in R (Thevenot, Roux, Xu, Ezan, & Junot, 2015). 174 Candidate compounds for group differentiation were determined from variables of importance for OPLS-DA projection (VIP scores) (Galindo-Prieto et al., 2014), which 176 were further assessed by univariate analysis using a false discovery adjusted significance 177 level of α = 0.05 using the Benjamini and Hochberg's FDR-controlling procedure 178 (Benjamini & Hochberg, 1995). 179 We examined sex and country effects in more depth to examine shifts in chemistry that 180 could be linked to our genetic data (see below). Our results above indicated that the 181 differences in CHC profiles between sexes and populations of C. frigida were due to 182 quantitative rather than qualitative differences (i.e. changes in relative amounts rather 183 than presence/absence of certain compounds). Thus, we looked for general shifts in CHC 184 chemistry by examining 1. The total normalized peak area (i.e. the sum of all peaks used 185 in analysis), 2. The proportion of alkenes, and 3. The proportion of methylated 186 compounds. For #1 we used all 127 peaks used in the previous analysis and for #2 and #3 187 we used the subset of these peaks that could be accurately categorized (116 peaks). For 188 the total peak area we transformed the data with a log transformation and then used the 189 'dredge' function from the MuMIn package (Barton, 2017) to compare AIC values for 190 nested glm (Gaussian distribution, link=identity) models with the terms sex, country, and 191 country*sex. If there were two models that differed in AIC by less than 1, we chose the 192 simpler model. For alkenes and methylated compounds our data was proportional and did 193 not include 1 or 0 so we used the 'betareg' function from the betareg package (Grun et 194 al., 2012). We again used the 'dredge' function to compare AIC values for nested models 195 with the terms sex, country, and country*sex and took the simpler model in case of an 196 AIC difference of less than 1. 197 We also analysed the distribution of chain lengths in our samples. To do this, we 199 subsetted all peaks where we had accurate length information (107 peaks) and then 200 calculated the weighted mean chain length as 201 where l i is the length of a carbon chain and p i is the proportional concentration of all 203 chains of that length. We also calculated dispersion around this mean as 204 We analyzed these in the same manner as described above, using AIC values to conduct 206 model choice from glm models (Gaussian distribution, link=identity). 207 Candidate gene analysis 208 Based on a literature search for genes involved in CHC synthesis, we identified 37 209 reference protein sequences, which were downloaded from Flybase (flybase.org) in 210 October 2017. We used tblastx with the default settings to identify orthologs in the C. 211 frigida transcriptome (Berdan & Wellenreuther, unpublished). We then used gmap (Wu 212 & Watanabe, 2005) to find areas of the C. frigida genome (Wellenreuther,et al.,213 unpublished) that might contain these genes. We output all alignments as a gff3 file. We 214 have a previously made a VCF file created from 30x whole genome re-sequencing of 46 215 C. frigida across five populations in Scandinavia (Berdan et. al, unpublished). This VCF 216 file was made by aligning raw reads to the C. frigida genome using bwa-mem (Li & 217 Durbin, 2010), duplicate reads were marked using 'picard' (http://picard.sourceforge.net). Auwera et al. 2013). VCF filtering was done as described in Berdan et al. 2015. We used 221 this VCF file and the gff3 output from gmap in SNPeff (Cingolani et al., 2012) to 222 annotate the SNPs in these genes. 223

224
To find which SNPs are diverged between populations we used vcftools (Danecek et al., 225 2011) to calculate Cockerham and Weir's F ST estimate (Weir & Cockerham, 1984) for all 226 SNPs in our reference genome. We retained the top 5% of this distribution (F ST ≥ 0.142, 227 mean F ST 0.024) as SNPs that were potentially divergent. We then subset the VCF file to 228 include only outlier SNPs located in our previously identified genomic regions. 229

CHC analysis 231
A diverse blend of hydrocarbons with more than 100 different compounds were found in 232 the cuticular extract of C. frigida. The majority of CHCs ranged in chain length from 23 233 to 33 carbon atoms. They contained odd numbered n-alkanes, methyl-, dimethyl-and 234 trimethyl-alkanes, as well as odd numbered alkenes. However, some even numbered 235 alkanes were also present but in lower quantities. 236

Sex and population effects 237
Results of the PERMANOVA indicated significant differences between females and 238 males and between populations but also a significant interaction between sex and 239 population (Table 1A). Pairwise tests on the interaction showed a significant difference between males and females for all populations except Ystad (Table S2) Table S3). After combining the Swedish and Norwegian populations, the 246 PERMANOVA indicated highly significant differences between females and males as 247 well as between Sweden and Norway (Table 1B). OPLS-DA models for both sex and 248 country were significant and lead to separate clusters of females and males, and Sweden 249 and Norway, respectively ( Figure 2B, D, Table S3). The OPLS-DA model revealed 41 250 peaks that contributed with a VIP score >1 to the differentiation between females and 251 males. Of these, 31 were also significant in t-tests with a FDR adjusted p-values (Table  252 2). Generally compounds were more abundant in males with the exception of only 3 253 peaks that were more abundant in females. Looking at country differentiation between 254 Swedish and Norwegian populations we found 43 compounds with a VIP score >1, of 255 which 29 were significant in t-tests with FDR adjusted p-values and consisted of an equal 256 mix of compounds more abundant in either of the countries (Table 2). Differences in 257 CHC profiles between sexes and populations were due to quantitative differences, 258 affecting a wide range of compounds instead of qualitative differences, i.e. the presence/ 259 absence of a few distinct compounds 260 261

Sex and diet effects 262
Ystad population and also a significant effect of the larval diet (wrack type) (Table 1C). 264 Both OPLS-DA models for either sex or wrack type were significant and lead to separate 265 clusters of females and males and of flies raised on standard or field wrack ( Figure 3B,C, 266 Table S3). The OPLS-DA model revealed 38 peaks that contributed to the differentiation 267 between females and males with a VIP score >1. Of these, 24 compounds were also 268 significant in a t-test with FDR adjusted p-values ( Table 2). All of these, except for 3 269 peaks, were more abundant in males. Overall 65% (26 compounds) were shared between 270 this analysis and the previous analysis investigating sex and population effects. Looking 271 at differentiation between flies raised on a diet consisting of standard or field wrack we 272 found 33 compounds with a VIP score >1. Of these, 27 compounds were significant 273 using a t-test with FDR adjusted p-values, all with higher amounts in flies reared on field 274 wrack (Table 2). Flies reared on field wrack had more CHCs per gram body weight than 275 flies reared on standard wrack (normalized totals: Females field wrack 7.96 ± 0.94, 276 females standard wrack 4.09 ± 0.47, males field wrack 11.24 ± 1.03, males standard 277 wrack 7.09 ± 1.21). Differences in CHC profiles between sexes and diets were 278 exclusively due to quantitative differences, affecting a wide range of compounds instead 279 of qualitative differences, i.e. the presence/ absence of a few distinct compounds 280 281

Shifts in Chemistry 282
As our results above indicated an overall effect of country rather than population we used 283 country and sex for our terms in the glm models. For total normalized peak area, the best 284 model contained only sex as a main effect (for a full AIC comparison for all models see Table S4A-E). In general, males had more CHCs per gram body weight than females 286 ( Figure 4A). For the proportion of alkenes, the best model contained country and sex as 287 main effects (for type II analysis of deviance tables for all best models see Table S5A-E). 288 Females tended to have more alkenes than males and individuals from Norway had more 289 alkenes than individuals from Sweden ( Figure 4B). The best model for the proportion of 290 methylated compounds contained sex and country as the main effects. As with alkenes, 291 females tended to have more methylated compounds than males and individuals from 292 Norway tended to have more methylated compounds than individuals from Sweden 293 ( Figure 4C). 294 295 The best model for weighted mean chain length was country and sex. Overall, males had 296 longer mean chain lengths than females and individuals from Sweden had longer mean 297 chain lengths than individuals from Norway ( Figure 5A). The best model for dispersion 298 around mean chain length was also country and sex. Males had higher dispersion than 299 females and individuals from Norway had higher dispersion than individuals from 300 Sweden ( Figure 5B). 301 302

Genetic analysis 303
We identified 15 isoforms from 13 transcripts in our transcriptome with matches to our 304 query proteins. These transcripts mapped to 16 unique areas of the genome assembly. We 305 blasted these areas against the NCBI nr database (accessed in April 2018) to confirm our 306 annotation. Five of these areas had either no match or matched to an unrelated protein 307 and were thus removed. The remaining genes include 2 putative fatty acid synthases, 2 308 putative desaturases, 5 putative elongases, 1 putative cytochrome P450 reductase, and 1 309 putative cytochrome P450-4G1 (Table 3). We found 1042 SNPs within these genes 310 (including 5K upstream and 5K downstream), of these 11 had an F ST value > 0.142 (i.e. 311 in the top 5% of the F ST distribution, Table 4). 312

314
In species with sexual conflict females may minimize unwanted copulations via mate 315 rejection responses; however, these responses typically carry associated costs (Watson et 316 al., 1998). The ability to modulate mating outcomes and thus the level of male 317 harassment experienced by the female based on male traits is especially beneficial in 318 systems where no "courtship" exists. Here we explored a potential male trait, cuticular 319 hydrocarbons (CHCs), and detected considerable phenotypic CHC variation attributable 320 to sex, population, and diet and genetic variation in candidate genes involved in CHC 321 synthesis that was attributable to population. We discuss these results and their 322 consequences below. 323 324 Males and females differed in their CHC composition. Not only did the composition 325 between males and females differ (Table 2, Figure 2-5) but, in addition to that, males 326 consistently had more CHC per gram body weight than females ( Figure 4A). This is a 327 strong indication that CHCs play a role in sexual communication in C. frigida under a 328 female choice scenario, as it is unlikely that desiccation risk would differ between males 329 and females. A role for CHCs in sexual selection in C. frigida would be in line with the 330 general finding that CHCs play a major role in communication in insects, specifically in 331 Diptera (Howard & Blomquist, 2005;Blomquist & Bagnères, 2010;Ferveur & Cobb, 332 2010). Differences in CHCs have been shown to be involved in a wide variety of 333 behaviours in diptera such as courtship, aggregation, and dominance (Ferveur & Cobb, 334 2010). For instance, 11-cis-vaccenyl acetate, a male specific hydrocarbon in Drosophila 335 melanogaster, increases male-male aggression while suppressing male mating (Wang & 336 Anderson, 2010). The same compound is also involved in aggregation behavior in both 337 D. simulans and D. melanogaster (Bartelt et al., 1985;Schaner et al., 1987). Other 338 aggregation hydrocarbons, such as (Z)-10-heneicosene, in D. virilis have been shown to 339 attract males and females of certain ages (Bartelt & Jackson, 1984). Behavioural tests 340 will be necessary to confirm that CHCs are used for mate selection in C. frigida but 341 attention should also be paid to their possible role in behaviours such as aggression and 342 aggregation. This may be particularly relevant to Coelopa frigida as this species has been 343 described as forming large aggregations (Mather, 2011). 344

345
We found a shift in the composition between sexes, populations, and diets rather than 346 differences in the presence/absence of specific compounds. This type of pattern differs 347 from some Dipterans where sex and population differences are mostly qualitative 348 (Carlson & Schlein, 1991;Ferveur & Sureau, 1996;Dallerac et al., 2000b;Gomes et al., 349 2008;Everaerts et al., 2010) although many dipterans show quantitative differences (ex: 350 Jallon & David, 1987;Byrne et al., 1995). This discrepancy is likely to be partially 351 explained by the distribution of compounds that make up the CHCs in C. frigida. While 352 species ) the distribution of these compounds is somewhat 354 different. Many dipterans have principal CHC components that make up a large 355 proportion of the CHC composition (Ferveur & Sureau, 1996;Etges & Jackson, 2001;356 Gomes et al., 2008). Cuticular extracts from C. frigida have a large number of 357 compounds, but do not appear to have specific compounds that are dominant in either sex 358 or any of the populations. When averaging across all samples, the most abundant 359 compound was the C25 alkane, which made up on average 15% of the CHCs. 360

361
We examined coding variation to determine if we could tie population level phenotypic 362 differences to genetic differences. We observed country level variation with strong 363 differences between Norway and Sweden ( Figure 2D, Figure 4, Figure 5). This pattern 364 mirrors a strong genetic split between the two countries (representing approximately 14% 365 of genetic variation, Berdan et al, unpublished). We found potential outlier SNPs in 366 desaturases, elongases, a cytochrome P450-4G1, and a cytochrome P450 reductase. 367 Desaturases add double bonds or triple bonds to alkanes (Howard & Blomquist, 2005;368 Blomquist & Bagnères, 2010). We found an outlier missense variant in a putative 369 Desaturase 1 as well as a missense and two downstream outlier variants in a putative 370 Desaturase (Table 4). This aligns with our phenotypic data showing differences in the 371 proportion of alkanes vs. alkenes between Norway and Sweden. Elongases lengthen fatty 372 acyl-CoAs and are necessary for chains longer than 16 carbon atoms (Blomquist & 373 Bagnères, 2010). We found both synonymous and downstream outlier variants in putative 374 C. frigida elongases (Table 4) and corresponding differences in both the mean chain length and variance of the chain length (dispersion around the mean) due to country of 376 origin. P450 reductases are responsible for reducing fatty acids to aldehydes (Blomquist 377 & Bagnères, 2010). Knockdown of P450 reductases in D. melanogaster leads to a 378 striking reduction of cuticular hydrocarbons (Qiu et al., 2012). P450-4G genes, which are 379 unique to insects, encode an oxidative decarboxylase that catalyzes the aldehyde to 380 hydrocarbon reaction. Work in other insect species has found that a knockdown or 381 knockout of this gene leads to a decrease in overall hydrocarbon levels (Reed et al., 1995;382 Qiu et al., 2012;Chen et al., 2016). We found a downstream outlier variant in our 383 cytochrome P450-4G1 and multiple kinds of outlier variants in a cytochrome P450 384 reductase. However, the total amount of CHCs did not differ between populations, only 385 sexes. As males and females share a genome (with the exception of the sex chromosome) 386 it is unlikely that these SNPs affect this difference. Finally, we also found variation 387 between Sweden and Norway in the proportion of methylated compounds. Other studies 388 suggest that differences in the proportion of methylated compounds are often caused by 389 either variation in Fatty Acid Synthase (FAS, de Renobales et al., 1986;Blomquist et al., 390 1994;Juarez et al., 1996) or multiple copies of FAS with different functions (Chung et 391 al., 2014). We found two different FAS' in our genome (Table 3)  and tests of function of these loci will be necessary to provide a direct link. However, the 395 association between the known function of several of these genes (elongases, desaturases) 396 and the corresponding differences in populations are marked. Given that we see shifts in 397 multiple chemical aspects (length, methylation, and alkene/alkane ratio) it is likely that 398 multiple loci are responsible for these patterns. 399

400
The wrackbed itself also had a strong influence on the CHC composition ( Figure 3C). 401 Wrackbed composition and microbiome (i.e. the food source for C. frigida larvae) varies 402 across C. frigida populations in Europe (Berdan, et al., in prep, Day et al., 1983;Butlin & 403 Day, 1989;Wellenreuther et al., 2017), indicating that natural populations may be have 404 different CHC composition. This is concordant with other research showing that larval 405 diet impacts CHC composition (Liang & Silverman, 2000;Rundle et al., 2005;Etges & 406 de Oliveira, 2014;Stojkovic et al., 2014). Adding to this, we found strong population 407 (country) level differences in CHC composition even when larvae were reared on the 408 same substrate. We also found several outlier SNPs in a variety of genes involved in 409 CHC synthesis. Together these results indicate several possibilities: 1. Natural selection 410 may select for a different CHC composition in different environments and natural 411 polymorphism in CHC genes may be under direct selection due to this. For example, 412 Drosophila serrata and D. birchii differ in the environment they inhabit (habitat 413 generalist vs. habitat specialist for humid rainforests). These two species also differ in the 414 concentration of methyl branched CHCs (especially important in desiccation resistance) 415 caused, in part, by changes in the cis-regulatory sequence likely under natural selection 416 (Chung et al., 2014). 2. Natural populations differ strongly in their CHC composition. If 417 assortative mating or selection on female preferences is present that could lead to locally 418 adapted male traits and female preferences (Chung & Carroll, 2015). This would reduce 419 effective migration between populations, as foreign males would be at a disadvantage.
Differences in CHC composition and corresponding preferences have been shown to 421 cause behavioral isolation between many different Drosophila species or populations 422 (Coyne et al., 1994;Rundle et al., 2005;Etges & de Oliveira, 2014). 423

424
In conclusion, the analysis of cuticular hydrocarbon extracts from male and female C. 425 frigida from multiple populations revealed a complex and variable mix of more than 100 426 different compounds. We found extensive phenotypic variation attributable to diet, sex, 427 and country as well as associated genetic variation attributable to the latter. This reveals 428 that CHC composition is dynamic, strongly affected by the larval environment, and most 429 likely under natural and/or sexual selection. Further work is needed to confirm that CHCs 430 are used as sexual signals in this system and to explore the evolutionary consequences of 431 Table 2 -List of compounds that are significantly important for differentiation of at least factor (sex, diet, or country). If compounds are significant for differentiation of any category in any comparison a * appears under that factor. Color indicates the direction of the differentiation (red-higher in females, blue -higher in males, grey -higher in Norwegian populations, green -higher in Swedish populations, yellow -higher in flies raised on standard wrack, purple -higher in flies raised on field wrack).