Contrasting patterns of divergence at the regulatory and sequence level in European Daphnia galeata natural populations

Abstract Understanding the genetic basis of local adaptation has long been a focus of evolutionary biology. Recently, there has been increased interest in deciphering the evolutionary role of Daphnia's plasticity and the molecular mechanisms of local adaptation. Using transcriptome data, we assessed the differences in gene expression profiles and sequences in four European Daphnia galeata populations. In total, ~33% of 32,903 transcripts were differentially expressed between populations. Among 10,280 differentially expressed transcripts, 5,209 transcripts deviated from neutral expectations and their population‐specific expression pattern is likely the result of local adaptation processes. Furthermore, a SNP analysis allowed inferring population structure and distribution of genetic variation. The population divergence at the sequence level was comparatively higher than the gene expression level by several orders of magnitude consistent with strong founder effects and lack of gene flow between populations. Using sequence homology, the candidate transcripts were annotated using a comparative genomics approach. Additionally, we also performed a weighted gene co‐expression analysis to identify population‐specific regulatory patterns of transcripts in D. galeata. Thus, we identified candidate transcriptomic regions for local adaptation in this key species of aquatic ecosystems in the absence of any laboratory‐induced stressor.

Genetic variation within and among populations is strongly influenced by their colonization history and the demographic changes following the primary establishment of a population. Population sizes may vary after colonization across the species based on environmental factors and further colonization (Böndel et al., 2015).
Colonization events depend on dispersal ability, and dispersal rates strongly differ from gene flow estimates in several species (De Meester, Gomez, Okamura, & Schwenk, 2002). This is particularly evident in freshwater zooplankton species, where several studies suggest a high potential for dispersal when populations rapidly colonize new habitats and spread invasively (Havel, Colbourne, & Hebert, 2000;Louette & De Meester, 2004;Mergeay et al., 2008).
However, genetic studies show that the observed rate of gene flow is much lower than would be expected in organisms with high dispersal potential (Boileau, Hebert, & Schwartz, 1992;De Meester et al., 2002;Thielsch, Brede, Petrusek, De Meester, & Schwenk, 2009).
This ambiguity between dispersal potential and rate of gene flow can be explained by founder effects (Boileau et al., 1992) complemented by local adaptation, resulting in monopolization of resources by local populations (De Meester et al., 2002). This process leads to the impression that population genetic variation correlates with the colonization patterns (Orsini, Vanoverbeke, Swillen, Mergeay, & De Meester, 2013).
Amongst freshwater zooplankton species, the water flea Daphnia (Figure 1) is the best studied and has been an important model for ecology, population genetics, evolutionary biology, and toxicology (Ebert, 2005). This genus belongs to the order Cladocera and has attracted scientific interest since the 17th century (Desmarais, 1997). It inhabits most types of freshwater habitats and includes more than 100 known species of freshwater plankton organisms (Ebert, 2005). Daphnia make an interesting subject of investigation in comparative functional genomics (Eads, Andrews, & Colbourne, 2008). Apart from the fact that Daphnia species have an appropriate size for being used in laboratory cultures, they are easy to cultivate and have short generation times. Because of their clonal mode of reproduction, Daphnia are highly suited for quantitative genetic studies, which can enhance our understanding of their evolutionary ecology.
In the present study on Daphnia galeata, sampled from four different lakes in Europe, we conducted a large-scale RNA-seq study in the absence of any laboratory-induced environmental stressor.
Using transcriptome data, we quantified the constitutive expression profiles and performed a sequence analysis of the four populations.
We addressed the following questions: (a) Are there differences in gene expression profiles between the four populations? (b) How is the observed variation explained by the different levels of organization, that is, genotype and population? (c) Do the observed differences in expression profiles result from genetic drift or selection? (d) What is the role of genetic drift and/or natural selection in shaping sequence variation? (e) What are the functional roles of the transcripts?
Our study brought contrasting patterns of divergence at the regulatory and sequence level into light. While no population-specific gene expression patterns were found for majority of the analyzed transcripts, divergence patterns at the sequence level hinted at strong influences of founder effects, bottleneck events, and divergent selection. Further, our gene co-expression network analysis showed conserved patterns while assessing the population-specific networks and supported our observations at the regulatory level.
We were able to identify candidate transcripts for local adaptation using combined approaches. Further comparative genomics analyses are needed to complement our preliminary functional annotations of these candidate transcripts to identify the ecological drivers behind the observed patterns of adaptation.  Dp196,Dp281,Dp512,SwiD1,SwiD10,SwiD12,SwiD14,SwiD15,SwiD2), following protocols by Taylor, Hebert, and Colbourne (1996) and Yin, Wolinska, and Giessler (2010), respectively.

| Sampling and RNA collection
Mature females for six clonal lines per lake were placed at equal densities (40 individuals per L) in semi-artificial medium for a week, during which the juveniles were regularly removed. Gravid females from the equal density beakers were then collected within three days during a time window of a few hours. Twenty to thirty individuals were homogenized in a 1.5-ml centrifuge tube in 1 ml Trizol (Invitrogen, Waltham, MA, USA) immediately after removing the water. The Trizol homogenates were kept at −80°C until further processing.

| RNA preparation and sequencing
Total RNA was extracted following a modified phenol/chloroform protocol and further processed using the RNeasy kit (Qiagen, Hilden, Germany). The total RNA was eluted in RNAse-free water, and the concentration and quality (RNA integrity number and phenol contamination) were checked using a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE, USA) and a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). The 72 total RNA samples (4 lakes × 6 genotypes × 3 biological replicates) were sent to the company GATC (Konstanz, Germany) for library preparation and sequencing. Following reverse transcription and cDNA construction using random primers, 50 bp single-end (SE) reads were sequenced on an Illumina HiSeq 2000 (San Diego, CA, USA). To avoid block effects and confounding effects in the downstream analysis, we used a completely randomized design; each library was sequenced on at least two different lanes, on a total of nine lanes. Detailed information can be found in Supporting Information Table S1.

| Quality trimming, mapping, and read counts
All reads with ambiguous bases (Ns) were removed before trimming.
Bases with a phred score below 20 were trimmed at the 3′ and 5′ ends. Reads shorter than 45 bp after trimming were discarded. All trimming steps were conducted using locally installed version of Galaxy at the Gene Center in Munich, Germany.
Trimmed reads were mapped to the reference D. galeata transcriptome (Huylmans, Lopez Ezquerra, Parsch, & Cordellier, 2016; available from NCBI: https://www.ncbi.nlm.nih.gov, GenBank ID: HAFN00000000.1) using NextGenMap (Sedlazeck, Rescheneder, & Haeseler, 2013) with increased sensitivity (-i 0.8 -kmer-skip 0 -s 0.0). Read counts were obtained from the SAM files using a custom python script (available upon request) and discarding ambiguously mapped reads. The raw count table was analyzed in R (R Development Core Team, 2008) using the package DESeq2 (Love, Huber, & Anders, 2014). Normalization was done with size factor procedure. Standard differential analysis steps of DESeq2 such as estimation of dispersion and negative binomial GLM fitting were applied. The count outliers were automatically detected using Cook's distance, which is a measure of how much the fitted coefficients would change if an individual sample was removed (Cook, 1977). The four lists of DETs obtained above were combined to identify population-specific transcripts, and Venn diagrams depicting the overlap between the contrasts were created using the VennDiagram package (Chen & Paul, 2011) in R.

| Evaluating the role of natural selection on transcript expression levels: DRIFTSEL
We searched for transcripts for which the identified differential expression could not be explained by phylogenetic distance and genetic drift alone. To identify signals of possible selection, we used the approach of Ovaskainen, Zheng, Karhunen, Cano Arias, and Merilä (2011) implemented in the R package DRIFTSEL 2.1.2 (Karhunen, Merila, Leinonen, Cano, & Ovaskainen, 2013), considering the expression of every single transcript as a trait. To perform this analysis, we made use of the microsatellite data and normalized read count values. Allele frequencies were obtained from microsatellite data collected in a previous study, independently from the species identification step outlined above. Microsatellite data of 30-40 resting eggs also sampled from the same sediment layers the resurrected clonal lines come from was obtained from a study by Herrmann (2017). Briefly, eleven microsatellite loci were analyzed for each clonal line according to the protocol published by Thielsch et al. (2009). Primers for all loci were multiplexed, and PCR was performed using the Type-it Microsatellite PCR Kit (Qiagen, Hilden, Germany). Alleles were recorded manually, and allelic frequencies were calculated with GenAlEx (Peakall & Smouse, 2012).
Using microsatellite allelic frequencies, the coancestry coefficients by admixture F model were calculated using "do.all" function implemented in the RAFM package (Karhunen & Ovaskainen, 2012).
We ran a total of 200,000 iterations with thinning at an interval of 1,000 and discarded the first 1,000 iterations as "burn-in." The output was a list which contained samples from the posterior distri- Large H-values (H-value ≥ 0.95) imply that the populations are more locally adapted than expected by chance.

| Intra-and interpopulation variation
To quantify the respective contributions of the factors "genotype" and "population" to the observed variation in gene expression profiles, we performed a linear mixed model analysis in R on the normalized read counts obtained from DESeq2. We used the slope of "Genotypes" as a random factor (to account for multiple replicates per genotype per population) and "Population" and "Genotype" as fixed factors. To compute p-values for our model, we used the "ANOVA" function in the R package "car" (Weisberg, 2011). To correct for multiple testing, p adj values were calculated for each transcript using the Benjamini-Hochberg procedure.

| Variant calling and filtering
The variant calling and filtering steps have already been described in Herrmann, Ravindran, Schwenk, and Cordellier (2017b). Briefly, the aligned reads from RNA-seq data were merged using samtools (Li et al., 2009) Using the SNPRelate package (Zheng et al., 2012) in R/ BioConductor, the variant dataset was limited to only biallelic sites for downstream analysis. These were further pruned for linkage disequilibrium considering a threshold of 0.2 (r 2 > 0.2), thereby retaining 393,514 SNPs. A PCA was plotted using the functions in SNPRelate which include calculating the genetic covariance matrix from genotypes, computing the correlation coefficients between sample loadings and genotypes for each SNP, calculating SNP eigenvectors (loadings), and estimating the sample loadings of a new dataset from specified SNP eigenvectors.

| Neutrality statistics
To obtain alignments of transcript sequences, SNP calling datasets were filtered as described above. Beagle 4.1 (Browning & Browning, 2007) was used to phase SNP calling data, and a python script (available upon request) was used to parse the phased vcf file to sample sequences in fasta format. After phasing, we obtained 13,006 transcripts containing SNPs and the sequences were input in R. A

multiple sequence alignment and Tajima's D statistics (with p-values)
were obtained population-wise for each transcript using the pegas package (Paradis, 2010) in R.

| Inbreeding coefficient and mutation frequencies
The inbreeding coefficient for the final SNP dataset was calculated with the -het function in VCFtools (Danecek et al., 2011). The ratio between the expected heterozygosity (H E ) and observed heterozygosity (H O ) was calculated based on available SNP information, and plots were created using ggplot2 (Wickham, 2016) in R.

| Sequence versus regulatory variation
To visualize the proportion of transcripts responsible for local adaptation at regulatory and sequence level, we consolidated the list of transcripts from various analyses as performed above and represented it with an alluvial diagram (http://rawgraphs.io/). In an alluvial diagram, each black rectangle is called a "node," the colored regions linking the nodes are called "flows," and the vertical group of nodes is called "steps." In our analyses, we had four steps: DESeq2, DRIFTSEL, LOSITAN, and Tajima's D (Figure 4).
We classified the orthoMCL clusters into the following categories: a Clusters that contain only D. galeata-specific transcripts.
b Clusters that are shared between D. galeata and D. pulex.
c Clusters that are shared between D. galeata and D. magna. f Clusters that are shared among all five analyzed species (Daphnia and both insects).

| Inparalogs and misassemblies
To assess whether D. galeata DETs in an orthologous group are "inparalogs," isoforms or the result of misassembly, we computed the pairwise sequence divergence for those orthoMCL clusters containing DETs from at least two different populations. Since each significantly differentially expressed transcript was assigned as a DET only to the population in which it was upregulated the most, clusters containing more than one DET most likely contained DETs from different populations. Based on the number of populations within their orthoMCL cluster, the DETs were classified into the categories: "1Pop," "2Pop," "3Pop," and "4Pop," and unclustered DETs were categorized as "no-cluster DETs." "No-cluster DETs" and "1Pop" DETs were excluded from further analysis. In total, there were 716 orthoMCL clusters that contained DETs from at least two populations. Pairwise alignments of the amino acid sequences in each orthologous group were performed using the iterative refinement method incorporating local pairwise alignment information (L-INS-i) in MAFFT (Katoh, Misawa, Kuma, & Miyata, 2002). We then used EMBOSS tranalign (Rice, Longden, & Bleasby, 2000) to generate alignments of nucleic acid coding regions translated from aligned protein sequences. Pairwise genetic divergence was computed with "dist.dna" function implemented in the ape package (Paradis, Claude, & Strimmer, 2004) in R, using the Kimura-2-parameters model with gamma correction. We used an arbitrary cutoff value of 2 to distinguish inparalogs from misassembled sequences.

| Gene ontology (GO) enrichment analysis
DETs with a H.value ≥0.95 (DRIFTSEL result) and transcripts with a nonzero D value in each of the four populations (Tajima's D result) were analyzed with "topGO" (Alexa & Rahnenfuhrer, 2016) in R, using a custom GO annotation for D. galeata. GO terms enriched in the transcripts of interest in each population from each analysis (DRIFTSEL and Tajima's D) were identified using the "weight01" algorithm for all three ontologies, namely molecular function, cellular component, and biological processes. We used a Fisher test, and those GO terms with a classicFisher value ≤0.05 were considered to be enriched for each ontology in each population. A multiple testing procedure was not applied as the p-values returned by the "weight01" algorithm are interpreted to be corrected and might exclude "true" annotations (Alexa & Rahnenfuhrer, 2016).

| Weighted gene co-expression network analysis
To gain insights into the population-specific regulatory patterns of transcripts in D. galeata, we performed a weighted gene co-expression network analysis with WGCNA (Langfelder & Horvath, 2008) using the variance stabilized normalized read counts obtained in DESeq2 analysis. Transcripts and samples that had lower expression values were excluded from every population using the "goodSam-pleGenes" function in WGCNA and used for downstream analysis. In total, 32,375 transcripts were used for the construction of gene coexpression networks. To identify population-specific co-expression modules (i.e., clusters of highly correlated transcripts), a network was first built using the full dataset (i.e., with samples and transcripts from all populations) and one network for each population using expression values specific to all genotype and biological replicates.
The population-specific network was compared to the reference network, and an adjacency matrix was calculated. Clusters were identified using the WGCNA Topological Overlap Matrices (TOM). if a transcript has an MMred value close to ±1, the transcript is assigned to the red module (Langfelder & Horvath, 2008). Each module is assigned a color based on the module size: "turquoise" denotes the largest module, blue next, followed by brown, green, yellow, and so on. The color "grey" is reserved for unassigned transcripts (Langfelder & Horvath, 2008). Similarly, the module "gold" consists of 1,000 randomly selected transcripts that represent a sample of the whole network and statistical measures have no meaning for this module (Langfelder & Horvath, 2008).
After obtaining the module definitions from each comparison, we assessed how well our modules in the reference network are preserved in the population-specific networks using the "modulePreservation" function, which outputs a single Z-score summary. The higher the Z-score, the more preserved a module is between the reference and population-specific network. A module was deemed to be preserved if the Z-score value was above 10, an arbitrary value deemed suitable by Langfelder, Luo, Oldham, and Horvath (2011).

| Sequencing results and mapping statistics
The dataset used for this study has been described in a previous publication by Herrmann et al. (2017b)

| Differential expression
The intraspecific variation in transcript expression in the four populations was visualized from a read counts matrix of the 32,903 tran-

| Role of natural selection on transcript expression levels
The DRIFTSEL multivariate approach was used to identify transcripts for which the observed differential expression could not be explained by phylogenetic distance and genetic drift alone; the alternative explanation being that the observed divergence would be attributable to selection and therefore possibly to local adaptation events. In total, 48% of 10,820 differentially expressed transcripts

| Expression variation among individuals and populations
The statistical significance of difference between group means of expression values was assessed with a linear mixed model analysis for each transcript, evaluating the factors "population" and "Genotype." For 414 transcripts, the means were statistically significantly different between populations but not between genotypes. The reverse was true for 10,201 transcripts. For 10,060 transcripts, the factors "genotype" and "population" explained the observed variation in gene expression. The remaining 12,228 transcripts had no significant p adj values for either of the factors.

| Sequence-based divergence
After applying the VariantFiltration criteria in the GATK SNP calling step, the resulting SNP set contained 414,546 variants distributed in 14,860 transcripts. These transcripts had an average of 28.2 SNPs per transcript. The vast majority (13,597 transcripts) was found to be biallelic, and 1,083 transcripts were multiallelic (Table 1).
A PCA was carried out based on a matrix of all biallelic SNP sites to illustrate the population structure among the four populations.
Although PC1 explained the maximum variance (12%) (Figure 3a) and four distinct clusters corresponding to the populations were seen against PC2. PC2 and PC3 each explained 8% of the variance (Supporting Information Appendix S2). PC2 clearly separated the genotypes belonging to Pop.J from the remainder of the data (Figure 3a).

| Sequence evolution
To assess the respective contributions of random and nonrandom evolutionary events on DNA sequence divergence, we calculated the Tajima's D statistic for each transcript in the four populations.

| Sequence versus regulatory variation
The proportion of transcripts identified to be candidates for local adaptation at both sequence and regulatory level were visualized using a flow diagram (Figure 4). Among the 10,820 transcripts identified to be differentially expressed, ~46% showed signs of selection at the regulatory level according to DRIFTSEL. Of these, ~15% were identified as outliers under balancing and/ or diversifying selection in LOSITAN. About 26% of these outliers had a significantly negative or positive Tajima's D value in at least one population, which might be attributed to selection but can also stem from other evolutionary processes such as population growth, reduction or subdivision, bottleneck events, and migration.

Population
All We were able to predict domains for ~50% of our transcripts.
Among the DETs, a slightly higher proportion of transcripts, ~53%, had known protein domains (Supporting Information Appendix S3b,

| Assessment of assembly artifacts and inparalogs
In total, 3,325 DETs belonged to the "no-cluster DETs" category

| Gene ontology enrichment analysis
GO enrichment analysis was performed on the candidate transcripts as identified from DRIFTSEL (H value ≥0.95) and Tajima's D analyses.
We observed an enrichment for several metabolic processes such as ATP binding, DNA binding, microtubule binding, transporter activities, and signaling pathways (Supporting Information

| Weighted gene co-expression network analysis
The WGCNA on 32,375 transcripts identified 29 co-expression modules ( Figure 6) in the reference network (see Section 2). We observed varying numbers of modules and transcripts clustered in each population-specific network (Supporting Information Table S7a-d).
However, after assessing the conserved modules, where each population-specific network was compared to the reference network, 24 modules (out of 29) were well conserved (Z-score ≥10) among the

| D ISCUSS I ON
In this study, we describe an approach to distinguish between neutral and adaptive evolutionary processes at gene expression and DNA sequence level using D. galeata transcriptome data. We identified differentially expressed transcripts in each of the four populations. We also used the multivariate DRIFTSEL approach combining expression values and microsatellite data, to investigate the role of selection in shaping D. galeata differential expression profiles. Furthermore, we identified SNPs to understand the sequence-level differentiation among the four populations. Finally, we annotated the functions of our candidate transcripts for local adaptation. This study is a first step towards description of polymorphisms in D. galeata possibly involved in phenotypic responses to environmental perturbations and as such promising candidates for future studies.

| Population divergence at the sequence level
SNPs became the absolute marker of choice for molecular genetic analysis as the mining of polymorphisms is the cheapest source for F I G U R E 5 Flow diagram representing the proportion of transcripts that are candidates for local adaptation at the regulatory and sequence level. Each analysis or "step" is represented by a vertical group of black rectangle bars, called nodes. The colored areas linking the nodes are called "flows." The DESeq2 step contains four nodes: PopG (yellow), PopJ (black), PopLC (pink), and PopM (green), which represent the number of transcripts specifically upregulated in each of the four populations as identified by DESeq2 analysis. The DRIFTSEL step contains 2 nodes: "H.value ≤0.95" (grey) and "H.value ≥0.95" (purple). The LOSITAN step contains 5 nodes: "NC" (grey) with transcripts without LOSITAN result (not calculated); "noOL" (grey), transcripts where none of the SNPs in a transcript were identified as outliers; "Bal" (cyan), transcripts containing at least one SNP that is under balancing selection; "Div" (pink), transcripts containing at least one SNP under diversifying selection; and "BalDiv" (pale green), transcripts containing SNPs that are under both balancing and diversifying selection. The Tajima Genetic differentiation among populations of passively dispersed aquatic invertebrates is strong in most cases, despite the dispersal probability expedited by water birds and other vectors carrying their diapausing eggs (Mills, Lunt, & Gomez, 2007;Munoz, Chaturvedi, De Meester, & Weider, 2016;Ventura et al., 2014). Population genetic differentiation has been observed even at small spatial scales (i.e., less than 1 km) in Daphnia (Hamrova, Mergeay, & Petrusek, 2011;Yin et al., 2010). Additionally, the monopolization effect, a concept in Daphnia from Lake Greifensee and Lake Constance ) and crustacean zooplankton from Lake Constance (Straile & Walter, 1998) which have all undergone historical bottleneck events.
Similarly, Lake Müggelsee, a large shallow lake, has undergone severe bottlenecks due to increased turbidity and because vegetation disappeared almost completely after the 1960s (Okun, Mendonca, & Mehner, 2005). One other explanation for the excess of rare alleles is selection against genotypes carrying deleterious alleles. lower frequency of rare alleles also occurs if there is a recent population admixture (Stajich & Hahn, 2005). This argument is consistent with our inbreeding coefficient measures. Most of the genotypes in population G, as well as M9, were more heterozygous than expected.
While genotype M9 from Müggelsee might be an exception, the pattern observed in Greifensee could be the consequence of outbreeding and/or high genetic variability in this population. This might also stem from past hybridization events . Although inbreeding (Keller & Donald, 2002), but also due to a lack of variation in the source population, either caused by a small founder population size or a severe bottleneck during population history (Luikart, Allendorf, Cornuet, & Sherwin, 1998). Further, the ecology and growth dynamics of Daphnia populations might exacerbate the founder effects. After an initial hatching phase from the resting eggs bank and exponential population growth in the spring, clonal selection occurs throughout the growing season (Vanoverbeke & De Meester, 1997). Therefore, it is possible that only a few clonal lines contribute to the resting eggs population each year. However, while a reduced number of clonal lines might contribute to the yearly "archiving" of genetic diversity, two processes counteract the immediate diversity loss. First, the spring recruitment does not only rely on eggs from the previous year but rather on a mixture (Vanoverbeke & De Meester, 1997), and might even integrate overwintering clones in larger permanent lakes (but see Yin, Gießler, Griebel, & Wolinska, 2014 for an overview). Second, clonal erosion does not affect the same genotypes every year, leading to year-to-year heterogeneity, such as the one observed in the long-term study by Griebel, Giessler, Yin, and Wolinska (2016). Clonal erosion thus does not necessarily lead to a downward spiral of genetic diversity loss, and the high stochasticity of both clonal selection and hatching ensures a preservation of the genetic diversity in every habitat. in Gliricidia sepium (Chalmers, Waugh, Sprent, & Simons, 1992) and

| Gene expression variability and signals of selection
Arabidopisis halleri (Macnair, 2002)  Macháček, 1991). A common garden experiment conducted on the very same clonal lines also showed a higher phenotypic variability within populations than among populations (Tams, Luneburg, Seddar, Detampel, & Cordellier, 2018). Finally, the observed relative homogeneity in the gene expression profiles might be the consequence of high selective pressure on transcription regulation or canalization (Waddington, 1942

| Sequence versus regulatory variation in Daphnia galeata
Correlating expression profiles with sequence divergence helps to

| Gene annotation and evaluation of inparalogs
Gene annotation is quite challenging in organisms lacking reference genomes, and functional annotation then relies on the availability of transcriptomic sequences from the closest available taxon. In this study, we were able to annotate 66.5% of the transcripts using BLAST analysis (Supporting Information Appendix S3a). However, many of the transcripts were homologous to a D. pulex "hypothetical protein," likely because (a) they are similar in function to noncoding regions or pseudogenes or (b) novel coding transcripts that are yet to be functionally characterized (Vatanparast et al., 2016). Furthermore, we were able to predict domains for 80% of the transcripts using Pfam analysis (Supporting Information Appendix S3b, Table S4).
Our orthoMCL results (Supporting Information Appendix S3c, Table   S4) showed that several (~45%) of the D. galeata transcripts were orthologous to one or all species of Daphnia used for comparison, indicating that the genes/transcripts have all evolved from a common Daphnia-specific ancestral gene via speciation. In addition to this, ~25% of Daphnia genes/transcripts are orthologous to two insect species (D. melanogaster and N. vitripennis). Our level of unannotated transcripts is similar to results reported from other organisms lacking extensive genomic resources, for example, from plants like field pea (Pisum sativum; Sudheesh et al., 2015), chickpea (Cicer arietinum; Kudapa et al., 2014), and winged bean (Psophocarpus tetragonolobus; Vatanparast et al., 2016). This limited our interpretation of the functional role of Daphnia transcripts and thereby their associations to known ecological stressors. A second issue raised when lacking a reference genome is that it might be difficult to tease apart inparalogs created by duplication events, isoforms, and even misassemblies, leading to an artificially inflated number of similar sequences for each distinct gene in the transcript set. Only ~18% of the population-specific DETs had one or more putative paralogs also identified as differentially expressed in at least one other population. For DETs from two or more populations that co-occurred in orthoMCL clusters, we were able to distinguish between actual paralogs (transcript pairs that had a sequence divergence value >2, Figure 5b) and transcripts with sequence divergence value <2. Genomic information is now required for this species in order to accurately assign transcripts to genes and correctly assess whether two different populations might indeed express different gene copies with similar functions.

| FUTURE D IREC TI ON S AND CON CLUS IONS
In summary, we described here an approach that combines both transcriptomic expression profiles and sequence information to understand local adaptation in D. galeata. Although the set of transcripts contributing to population divergence at the sequence and the expression level differs, both levels constitute alternative routes for responding to selection pressures (Pai, Pritchard, & Gilad, 2015), showing that these transcripts can contribute to local adaptation and paving way for future research. From our functional analysis, it was evident that most of our transcripts were Daphnia specific although they had hypothetical functions. To understand the function of the hypothetical transcripts in D. galeata and their response to environmental perturbations, a comparative approach using the gene expression data from numerous other Daphnia studies should be used. Although we noticed correlations between expression patterns and sequence divergence for the D. galeata transcripts, we lack genomic and phylogenetic information. This information may help "bridge the gap" for understanding the relative roles of positive or negative selection in driving coding sequence and gene expression divergence.

ACK N OWLED G M ENTS
We would like to thank A. Jueterbock and O. Ovaskainen for suggestions and help on the DRIFTSEL implementation, and four anonymous reviewers as well as Luisa Orsini for their comments on earlier versions of the manuscript. This study was financially supported by an individual grant from the Volkswagen Stiftung (to MC). This work benefits from and contributes to the Daphnia Genomics Consortium.

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R ' S CO NTR I B UTI O N S
SPR and MC planned the study; MC conducted the molecular work; SPR and MC designed the analysis; SPR, MC, and MH analyzed the data; SPR and MC wrote the manuscript; all authors commented on results and contributed substantially to the manuscript.

DATA ACCE SS I B I LIT Y
The raw sequence reads used for this study as well as the experimental setup for the analysis of differentially expressed genes are available on ArrayExpress (https://www.ebi.ac.uk/arrayexpress; Accession no.: E-MTAB-6144).
The raw read counts used as input for differential transcript expression, results for the pairwise contrast analysis conducted in DESEq2, and the number of variants per sample before and after filtering, number of variant sites per transcript are all available on DRYAD in Supporting Information Tables S10, S11, and S3-S4,