The strength and form of natural selection on transcript abundance in the wild

Abstract Gene transcription variation is known to contribute to disease susceptibility and adaptation, but we currently know very little about how contemporary natural selection shapes transcript abundance. Here, we propose a novel analytical framework to quantify the strength and form of ongoing natural selection at the transcriptome level in a wild vertebrate. We estimated selection on transcript abundance in a cohort of a wild salmonid fish (Salmo trutta) affected by an extracellular myxozoan parasite (Tetracapsuloides bryosalmonae) through mark–recapture field sampling and the integration of RNA‐sequencing with classical regression‐based selection analysis. We show, based on fin transcriptomes of the host, that infection by the parasite and subsequent host survival is linked to upregulation of mitotic cell cycle process. We also detect a widespread signal of disruptive selection on transcripts linked to host immune defence, host–pathogen interactions, cellular repair and maintenance. Our results provide insights into how selection can be measured at the transcriptome level to dissect the molecular mechanisms of contemporary evolution driven by climate change and emerging anthropogenic threats. We anticipate that the approach described here will enable critical information on the molecular processes and targets of natural selection to be obtained in real time.


| INTRODUC TI ON
Understanding how natural selection acts on traits and eventually on organisms represents a fundamental challenge in biology (Mayr, 1982). Using a now classical regression-based approach (Lande & Arnold, 1983), ecologists have generated thousands of phenotypic selection estimates over the past 35 years; these estimates help to understand the contemporary selection processes in nature and enable comparisons of the strength and mode of selection across traits and species (Kingsolver et al., 2001;Kingsolver & Pfennig, 2007;Siepielski et al., 2017). However, despite this wealth of phenotypic selection estimates and a large number of studies that indirectly infer the roles of different evolutionary forces in shaping gene expression patterns (Fraser et al., 2010;Gilad et al., 2006), we know very little about how natural selection affects transcript abundance in the wild (Miller et al., 2011). This is remarkable given that | 2725 AHMAD et Al. variation in transcript abundance is of central importance to evolution (Emilsson et al., 2008;Fraser, 2013;Fraser et al., 2010;Gilad et al., 2006;Miller et al., 2011), linking molecular functions to performance and Darwinian fitness.
Here, we present an integrative approach investigating how contemporary natural selection shapes transcriptomic variation by combining analyses of selection differentials and gradients (Lande & Arnold, 1983) with the high-throughput screening of molecular phenotypes at the gene transcription level. Such use of the socalled molecular phenotypes has been highly successful in medical science for discovering the mechanisms underlying complex human diseases (e.g., Chaussabel et al., 2008;Cobb et al., 2005), but we currently know very little about how within-generation natural selection in the wild translates to changes at the RNA and protein levels (Husak, 2016). However, regression-based and distributional selection differentials and gradients (Henshaw & Zemel, 2016;Lande & Arnold, 1983), which measure the effect of a trait on relative fitness in standard deviation trait units, can be used to estimate the form and strength of contemporary natural selection on any quantitative trait, including transcript abundances, allowing direct comparisons among traits, populations and species (Lande & Arnold, 1983).
We focus on a host-parasite system consisting of brown trout (Salmo trutta) as the host and a myxozoan parasite (Tetracapsuloides bryosalmonae), the causative agent of temperature-dependent proliferative kidney disease (PKD) in salmonid fishes (Okamura et al., 2011).
Mass release of T. bryosalmonae spores from bryozoans occurs from spring to early summer (Hedrick et al., 1993), resulting in the synchronized infection of young-of-the-year fish through gills and/or skin (Longshaw et al., 2002). Inside the salmonid host, the parasite multiplies and induces an inflammatory response and tumour-like chronic lymphoid hyperplasia in the kidney (Bettge et al., 2009;Hedrick et al., 1993). The impairment to the kidney, the major organ responsible for haematopoiesis in fish, results in anaemia (Clifton-Hadley et al., 1987;Hedrick et al., 1993), which decreases oxygen transportation capacity, lowering the maximum metabolic rate and upper thermal tolerance (Bruneaux et al., 2016). The reduction in aerobic and renal capacity, combined with decreased oxygen solubility and increased oxygen demand at higher temperatures, makes PKD a serious threat to cold-water salmonid populations in regions affected by warm summers, which are expected to become more frequent under global warming (Okamura et al., 2011). Compared to many other host-parasite systems, brown trout and T. bryosalmonae represents a highly suitable model for studying contemporary natural selection on host gene expression in the wild because the parasite shows a high prevalence (Hedrick et al., 1993) and imposes a strong temperature-dependent effect on host physiology, performance (Bruneaux et al., 2016) and survival (Hedrick et al., 1993).
Furthermore, many challenges associated with field data, such as differences in host age, infection onset and conspecific co-infection dynamics Doeschl-Wilson et al., 2012) or host exposure avoidance (Graham et al., 2010), are minimal or absent.
To quantify the strength and form of within-generation selection on transcript abundance, we collected small fin biopsies from wild juvenile trout in August, when we expected all individuals to be infected, for transcriptome and multilocus fingerprint profiling, after which they were released back into their native environment.
Approximately 1 month later, after the anticipated period of parasite-associated mortality, we recaptured and identified survivors based on multilocus genotype information and tested whether fin-tissue transcript abundances measured in August correlate with survival. To further elucidate the transcriptional signatures linked with the observed mortality, we also measured the T. bryosalmonae load in kidney tissue among survivors to identify transcripts and protein-protein interaction (PPI) networks associated with both survival and parasite load (PL).

| Study population, temporal abundance and temperature records
The coastal river Altja (length 17.6 km, catchment area 46.1 km 2 ) flows into the Gulf of Finland in the Baltic Sea ( Figure S1) and supports a small wild anadromous brown trout population with a high prevalence of Tetracapsuloides bryosalmonae (Dash & Vasemägi, 2014).

| Field sampling, phenotyping and genetic markrecapture analysis
On 30 August 2015, we electrofished 278 young-of-the-year trout in the river Altja from the same five areas along a 330-m stretch that were sampled in 2014 (Table S1, Figure S1, area IDs 1-5). Individuals were anaesthetized with buffered MS-222 (SigmaAldrich) and measured for fork length (±1 mm) as a measure of body size. After small biopsies of the right pelvic fin tissue for genetic mark-recapture analysis and 3′ RNA sequencing (see below), we released the recovered trout within their original capture area. As fins regenerate in teleost fishes, a small fin biopsy is unlikely to impair fish survival (Gjerde & Refstie, 1988). Biopsies were immediately frozen in liquid nitrogen and stored at −80°C.
We used a fin subsample for DNA analysis and individual identification (see below). Approximately 1 month after initial electrofishing (22-27 September), we caught 685 young-of-the-year trout along a 780-m stretch that included the initial 330-m stretch (Table S1).
The five initial catch areas (area IDs 1-5) were electrofished three times (Table S1), and we estimated the capture probability and total number of fish using the depletion method (Zippin, 1958) implemented in the fsa (Fisheries Stock Assessment) package version 0.8.17 (Ogle, 2017)  ; kidney-to-muscle ratio as a measure of kidney swollenness [KS]) and quantification of PL were carried out as previously described (Debes et al., 2017). The relationships between PL and PKD-linked phenotypic traits (Hct, KS, FL) were analysed using general linear mixed models in asreml-r 3.0 (Butler et al., 2009). To control for genetic variation in the expression of the traits, the models accounted for the genetic relationship matrix of the individuals (A) via linking (A −1 ) to random individual effects.
The relationship matrix was obtained by genotyping the sampled individuals (see next section) and then reconstructing their parentage using colony version 2.0.6.5 (Jones & Wang, 2010).

| Microsatellite analysis and identification of individual recaptures
We genotyped individuals using 14 highly variable microsatellite loci as previously described (Debes et al., 2017). Individuals with at least 10 successfully genotyped loci were included in the analysis. Recaptured individuals were defined as having identical genotypes with at most one allele mismatch (at least a 95% match when 10 loci were genotyped) using microsatellite toolkit (Stephen D. E. Park, Trinity College, Dublin, Ireland). To estimate genotyping error F I G U R E 1 Temperature dependence of PKD in wild trout. (a) Dead young-ofthe-year brown trout found in the Altja river with putative PKD-associated death symptoms (swollen kidney, a widely opened mouth and flared gills suggestive of anaemia). (b) Body section of trout with normal (left) and swollen (right) kidney. (c) Effect of temperature on juvenile trout abundance during 2005-2017 in the Altja river in relation to average summer air temperature (7-year moving average mean summer air temperature over 73 years is highlighted in bold). rates, we amplified and genotyped 440 randomly selected samples twice, which indicated low error rates (mean allelic dropout rate: 0.0107, range 0.0013-0.0292; mean false allele rate: 0.0027, range 0.0010-0.0176).

| Quantitative real-time polymerase chain reaction (qPCR)
PL was determined from kidney tissues collected in September 2015 by qPCR using previously described protocols (Debes et al., 2017). For .031 × PL2015). To estimate technical bias among plates, we included a log 10 dilution series (50, 5, 0.5, 0.1 and 0.05 ng/µl) from a reference sample in quadruplicate per gene on every plate. Standardized amplification efficiency was calculated among plates, using plate-and gene-specific efficiencies estimated from the Cq versus log 10 -dilution slopes (Debes et al., 2017). Subsequently, we fitted a linear mixed model to estimate PL for each individual that accounted for technical bias among plates and wells (Debes et al., 2017).

| RNA isolation and library preparation for Illumina sequencing
Total RNA was successfully extracted from pelvic fin tissue of 238 individuals (85.6%) out of 278 collected in August 2015 (i.e., survivors and nonsurvivors) using the NucleoSpin RNA kit (Macherey-Nagel) and ensuring RNA quality using the Agilent 2100 Bioanalyzer.
We prepared a barcoded library using Lexogen QuantSeq 3′ mRNA-Seq Library Prep Kit FWD for Illumina according to the manufacturer's recommendations (Lexogen). QuantSeq provides an efficient and cost-effective protocol for generation of strand-specific nextgeneration sequencing libraries close to the 3′ end of polyadenylated RNAs (Moll et al., 2014). This approach is analogous to other tag-based RNA sequencing (RNAseq) approaches, such as TagSeq (Meyer et al., 2011), which have been shown to generate more accurate estimates of transcript abundances than standard RNAseq with a fraction of the sequencing effort (Lohman et al., 2016).
We inspected all barcoded libraries for quality with fragment analyzer (Advanced Analytical, AATI) using the High Sensitivity NGS Fragment Analysis Kit. The three pooled barcoded libraries, consisting of 64, 91 and 96 individuals, were single-end sequenced using an Illumina HiSeq2500 in 14 lanes. For the first two pooled libraries, we generated 125-bp-long reads in two lanes. For the remaining 12 lanes, we generated 100-bp reads. Overall, we obtained 2.21 billion raw reads, with 1.5-34.6 million reads per individual (median 8.9 million reads). In addition, conventional Illumina mRNA pairedend sequencing (100 bp) was carried out for two fin-clip mRNA We assessed the quality of reads before and after trimming using fastqc (version 0.11.2; https:// www.bioin forma tics.babra ham.ac.uk/proje cts/fastq c/). A total of 1.1-27.1 million QuantSeq reads per sample remained after quality filtering (median 6.8 million reads).

| Trout fin-specific splice sites, reference genome modifications and mapping
To identify brown trout fin-specific splice sites, we mapped qualityfiltered paired-end reads from two pooled fin libraries (total 44.5 million paired and 9.1 million unpaired reads) and three 3′ mRNA-Seq samples with the highest read depth (total 58.4 million reads) to the Atlantic salmon genome (GCF_000233375.1) (Lien et al., 2016) using spliced aligner hisat2 (version 2.1.0 (Kim et al., 2015). Subsequently, the resulting spliced alignment and the salmon genome annotation file were used as an input for stringtie (Pertea et al., 2015) to assemble full-length transcripts expressed in the trout fins. The splice-site locations of the stringtie output were extracted using the extract_splice_sites.py script provided with hisat2. Furthermore, single nucleotide variants were called from the pooled fin paired-end alignment using mpileup in samtools (Li et al., 2009). The Atlantic salmon genome was modified with the alternative alleles. Finally, all quality-controlled reads from QuantSeq 3′ mRNA-Seq were splice aligned to the modified reference genome using hisat2 with trout fin-specific splice sites as a guide.

| Estimation of transcript abundance and batch effect correction
All alignment data were loaded into R as the Ranged-SummarizedExperiment object returned by the summarizeOverlaps function available in the R package genomicalignments (Lawrence et al., 2013). The original salmon-genome-annotation GFF file was used to dissect exons on the basis of gene information, and union mode was selected for assigning the reads within the exons while considering strand information. Read counts from separate lanes, runs and replicate files were summed to individual counts using collapseReplicates implemented in the R package deseq2 (Love et al., 2014). The resulting data object contained a raw read count matrix and phenotypes for each sample. For subsequent analysis, we selected only protein-coding, nuclear genes with an average of >10 reads per individual. To make the gene expression data compatible for linear modelling, the raw read counts were converted into quantile-normalized log 2 -counts per million (logCPM) using the voom method (Law et al., 2014) implemented in the limma package (Ritchie et al., 2015). Pooled library batch effects were removed by employing the ComBat function implemented in the sva package (Leek et al., 2012), and corrected gene counts were used in differential gene expression analysis.

| Differential expression (DE)
To describe the relationships between the continuous phenotype (PL) and transcript abundance, generalized linear models were fitted using the glm function available in R. Each gene in the corrected gene count matrix was used as a predictor against the phenotype (response variable) assuming the normal distribution for both.
Q-values were calculated using the qvalue package implemented in R. To identify genes associated with survival, we used an iterative random forest (RF) classification approach using the ranger (Wright & Ziegler, 2017) R package. The corrected gene count matrix and survival status (dependent variable) were used as an input for the classification.
After each RF iteration (100,000 trees), genes with permutation importance value <0 were eliminated for the next iteration. The iterations ended when all the genes in the input matrix have permutation importance values ≥0. After 64 iterations, a final set of 1270 genes classified individuals into survivors and nonsurvivors with a 16% error rate. RF misclassified 25 (10.5%) recaptured individuals as nonsurvivors, and 13 (5.5%) uncaptured individuals as survivors. While misclassification of the recaptured individuals may reflect their poor physiological status, it is likely that some proportion of uncaptured individuals survived. This was further supported by mark-recapture analysis, which indicated that a small number (n = 13.9, 95% CI 7.2-23.1) of surviving fish that were marked in August were not recaptured in September (Table S1). For subsequent analysis, we therefore treated the 13 putatively uncaptured individuals as survivors based on their transcript profiles (Table S4) Figure S3). Furthermore, DE analysis based on uncorrected survival produced a hill-shaped, rather than uniform, p-value distribution, indicating that misclassification of individuals probably resulted in violation of the statistical test assumptions ( Figure S2). Thus, corrected survival status was used in the DE analysis using deseq2. First, the raw read counts, library size factors and dispersion were estimated using estimateSizeFactors and estimateDispersions, respectively and then the differential gene expression was performed using nbinomWaldTest along with the three pooled library IDs as a covariate. However, given that this work primarily aimed to generate new hypotheses rather than validate earlier findings and focused on pathways rather than single transcript detection, we adopted a relatively relaxed significance threshold (unadjusted p < .01) for DE and survival analysis.

| Weighted gene co-expression network analysis (WGCNA)
Genes associated with survival were subjected to automatic network construction and module detection using the blockwiseModules function implemented in the R package wgcna (Langfelder & Horvath, 2008). The cut-off for the minimum scale-free topologyfitting index was set to 0.8 (power = 7), and we used biweight midcorrelation (bicor) to estimate correlations (other parameters: networkType = "signed," minKMEtoStay = 0.2). For the analysis of quadratic selection differentials (see below), similar parameters were used (except cut-off for the minimum scale-free topology fitting index = 0.61 [power = 6]).

| Gene Ontology (GO) and protein-protein interaction network analysis
Atlantic salmon orthologue gene symbols, entrez IDs, and descriptions in humans (86.8%), zebrafish (3.6%) or other organisms (9.6%) were searched using complete gene names in NCBI using the rentrez (Winter, 2017) package in R. GO-enrichment analysis was performed using string-db (Szklarczyk et al., 2015) and gorilla (Eden et al., 2009), in which all orthologue gene symbols were used as a background list. For string-db PPI analysis, we used single lists of gene symbols against human protein references (minimum interaction score: 0.70; text mining disabled).

| Quantification of linear and nonlinear selection differentials
We estimated linear and nonlinear (i.e., quadratic) selection differentials for each of the 18,717 gene transcripts quantified in 238 individuals based on both corrected and uncorrected survival (binary status, nonsurvivor = 0, survivor = 1; Table S2). Subsequently, estimates of linear or quadratic selection differentials were computed using generalized linear models under the glm function in R. These models used logit-link functions and binomial error distributions for the binary survival response and the predictor of the mean-centred and variance-scaled (mean = 0, SD = 1) transcript levels (linear differentials) or added a quadratic scaled transcript-level predictor (nonlinear differentials). To transform the logistic regression model coefficients to selection differentials on the relative fitness scale that is meaningful to microevolutionary studies, we used the method suggested by Janzen and Stern (1998). The p-values for each selection differential were estimated using t tests that were constructed based on logistic regression coefficients, their standard errors and model residual degrees of freedom. In addition, to calculate the distribution of linear and nonlinear selection differentials when no selection is acting on transcripts, we randomized the survival values (1000 permutations) and estimated selection differentials as described above. To compare the strength of directional and nondirectional selection, we used a recently developed unified measure, the distributional selection differential (DSD) (Henshaw & Zemel, 2016) for standardized trait values (mean = 0, SD = 1). This enabled us to use a single framework to partition total selection (DSD) into selection on the trait mean (dD) and selection on the shape of the trait distribution (dN).

| Quantification of linear selection gradients
The linear selection gradients for the DE genes were reconstructed from a subset of principal components (Chong et al., 2018), as this approach not only enables the multicollinearity between the predictors to be handled but is also suitable for cases in which the number of predictors exceeds the number of observations. For this purpose, the principle components were calculated from the correlation matrix of the standardized values of 416 DE genes, and the linear selection gradients were subsequently computed for the first 55 PCs (explaining 76% of the variation) with the glm function in R, using the logit-link function and binomial error distribution. The eigenvectors from the original 55 PCs were then matrix multiplied with the resulting linear selection gradients to reconstruct the selection gradients for individual genes (Chong et al., 2018). Similarly, the selection gradients for 416 DE genes were calculated by including FL as a predictor. The standard errors were reconstituted for these gradients as described by Chong et al. (2018). The t-statistic was calculated by dividing the gradients by the standard errors, and the p-values were estimated from the results using 237 degrees of freedom. The pvalues were corrected for the FDR using the p.adjust function in R.

| Potential links between survival and parasite load
PPI network analysis using string-db indicated that both survival-as-

| Linear selection differentials and gradients
To quantify the strength and form of contemporary natural selection on transcript abundance, we first estimated standardized linear (s) and quadratic selection differentials (λ) for 18,717 transcripts.

Selection differentials quantify selection (both direct and indirect)
on a trait in terms of the effects of trait values on relative fitness in units of standard deviations of the trait, allowing direct comparisons among traits, populations and species (Lande & Arnold, 1983). We compared their magnitudes to a large data set of phenotypic selection estimates based on a variety of traits and taxa (1,834 published estimates of s) (Siepielski et al., 2017). The vast majority of s values, which measure the change in a population's mean trait value before and after selection, were small (median(|s|) = 0.047; 95% values of s between −0.132 and 0.129, Ŝ ± SE = −0.0011 ± 0.0005, t 18716 = −2.14, p = .033), whereas the coregulated gene associated with survival showed larger values of s (Figure 4a). Similar results were obtained for the linear differential estimates calculated using both uncorrected and corrected survival information ( Figure S4).
Next, we measured the linear selection gradients (β) for each of the 416 transcripts that covaried with survival after removing the effect of indirect selection in a multiple regression framework using principal component scores (Chong et al., 2018). Among the recon-  (Table S2) were highly correlated (r 2 = 0.950) indicating that fish size has little effect on these estimates.

| Nonlinear selection
Direct comparison of the strength of linear and nonlinear selection using distributional selection differentials (Henshaw & Zemel, 2016) revealed that the linear component of selection was generally stronger than the nonlinear component, which represents selection on the shape of the trait distribution (mean dD = 0.053, dN = 0.031; signed test, p = 9.4 × 10 −206 ; Figure 4b). Nevertheless, for 7273 (40.8%) genes, the nonlinear differentials were higher than the linear selection differentials ( Figure S5). Furthermore, permutations indicated that while a small proportion of transcripts were affected by directional selection, the data set was highly enriched for transcripts influenced by disruptive selection, reflecting the elevated survival associated with extreme transcript abundance ( Figure 5); the distribution of λ was shifted strongly towards the right tail (Figure 4b, 95% values of λ between −0.137 and 0.279), and its mean differed from zero (̂ ± SE = 0.058 ± 0.001, t 18,716 = 78, p = 2.2 × 10 −16 ; compared to 658 phenotypic λ estimates [Siepielski et al., 2017], two-sample Wilcoxon test p = 2.2 × 10 −16 ). Similar results were obtained for the quadratic selection differentials calculated for both corrected and uncorrected survival information ( Figure S4).  Figure 4b).

GO analysis indicated that genes shaped by disruptive selection
The constructed co-expressed gene modules showed further enrichment for more specific GO terms, such as the cellular response to cytokine stimulus (brown module, GO:0071345, FDR = 0.015, n = 23) and antigen processing and presentation of peptide antigen via MHC class II (brown module, GO:0002495, FDR = 0.049, n = 10).

| D ISCUSS I ON
There has long been interest in understanding the relative roles of drift and selection shaping gene expression variation within and between species (Dunn et al., 2013;Romero et al., 2012). The common approach to this complex question encompasses phylogenetic or comparative analyses that aim to indirectly identify patterns of expression, which do not fit neutral expectations over evolutionarily long time periods. However, these approaches describe the response to selection (R) and not the strength of selection (S) when expressed in the context of the breeder's equation (R = Sh 2 ), where h 2 is narrow-sense heritability. By combining 3′ RNA-sequencing, genetic mark-recapture and selection analysis, we adopted an alternative approach as in Groen et al. (2020) to directly quantify the intensity and form of contemporary natural selection on transcript abundance. As a result, we were able to characterize the transcriptomic targets and potential molecular pathways involved in the process of contemporary parasite-driven selection.

| The strength of natural selection on transcript abundance
Based on uni-and multivariate regression analysis, we identified a small number of transcripts potentially affected by selection. This is F I G U R E 4 Strength of survival selection on 18,717 transcripts and published phenotypic traits. (a) Linear selection differentials s. (b) Quadratic selection differentials λ. Differentials for transcripts and published phenotypic traits (Siepielski et al., 2017) are shown as dark and light grey histograms, respectively. Negative and positive λ values reflect stabilizing and disruptive selection, respectively. Estimates <−0.75 were assigned a value of −0.75, and estimates >0.75 were assigned a value of 0.75. Selection differentials for the WGCNA gene modules are shown as coloured pins. The inserted figures illustrate the relationships between survival and module eigengenes as cubic spline (Schluter, 1988) functions (95% CI in grey) for the red and brown modules; short insert lines reflect individual data. The inserted boxplot illustrates total selection as measured by the distributional selection differential (DSD; Henshaw & Zemel, 2016), which is broken down into components representing selection on the trait mean (dD = |s|) and selection on the shape of the trait distribution (dN). The line across the box represents the median; the box edges represent the upper and lower quartiles; the whiskers extend to a maximum of 1.5× interquartile range beyond the box; and the points represent outliers consistent with recent work in rice, suggesting that directional selection is generally weak at microevolutionary times, and the strength of selection depends on environmental conditions (Groen et al., 2020).
Nevertheless, the estimated selection differentials measuring both indirect and direct selection on traits ranged widely from −0.26 to 0.23, implying that if heritable variation is present and constraints are absent, selection can exert evolutionary changes in transcript abundances at evolutionarily short timescales (Campbell-Staton et al., 2017;Donihue et al., 2020;Kingsolver et al., 2001;Kingsolver & Pfennig, 2007). When we compared our transcriptomic data with published phenotypic traits (n = 1834) in terms of the strength of linear selection (which includes both indirect and direct selection components), similar frequency distributions were observed, with the large majority of estimated differentials being close to zero.
However, the phenotypic traits possessed longer "tails" for selection differentials than the transcripts, suggesting either rare but very strong linear selection on some phenotypic traits and/or potential bias due to small sample sizes. On the other hand, selection differentials quantify selection considering each trait separately and measure the total selection acting on the trait (both direct and indirect).
Thus, when traits are highly correlated, as is likely among the many transcripts, it becomes impractical to distinguish separate influences of individual transcript abundances on relative fitness. To overcome this limitation, we subsequently calculated selection gradients (β) for 416 transcripts that covaried with survival to quantify the strength of direct selection on individual genes after removing indirect selection from other correlated transcripts. Altogether, we identified 67 significant β estimates ranging from −0.47 to −0.15 and from 0.15 to 0.63, indicating that direct selection on transcript abundances has the potential to cause substantial evolutionary changes at relatively short timescales. However, further studies are clearly needed to shed light on how environmental conditions driven by climatic fluctuations influence the strength and form of selection on transcript abundances (Groen et al., 2020). Given that gene expression variation has a strong environmental component, we expect that the patterns of selection often vary considerably among years and populations, and changes in the direction of selection are frequent, as observed for phenotypic traits (Campbell-Staton et al., 2017;Donihue et al., 2020;Siepielski et al., 2009Siepielski et al., , 2017.

| The form of natural selection on transcript abundance
Direct comparison of the strength of linear and nonlinear selection using the distributional selection differentials (Henshaw & Zemel, 2016) revealed that for 40% of transcripts, the nonlinear differentials were higher than the linear selection differentials.
Furthermore, when nonlinear selection was partitioned into stabilizing and disruptive components, our data set was highly enriched for transcripts showing signatures indicative of disruptive selection. This is unexpected because disruptive selection is thought to be rare in nature (Kingsolver et al., 2001;Kingsolver & Pfennig, 2007).
Moreover, this finding contrasts with the expectation that stabilizing selection is more common than disruptive selection if most populations are well adapted to their current environment (Kingsolver et al., 2001). On the other hand, it has been suggested that disruptive selection may be more widespread than previously thought, reflecting density-dependent or frequency-dependent competition for resources (Kingsolver & Pfennig, 2007). Thus, our results corroborate with phenotypic selection estimates, and also suggest that host transcript abundance may be influenced by disruptive selection in response to parasite infection. For example, it may be more beneficial for a host to either invoke a strong immune response (i.e., highly resistant hosts with the lowest PL and lowest disease expression as measured by kidney hyperplasia) or tolerate the damage from a high PL than to partially control the parasite load (i.e., hosts suffering from damage by both parasites and immunopathology). The functional categorization of genes and gene modules under disruptive selection supported this hypothesis, because they were highly enriched for biological processes related to host immune defence, host-pathogen interactions, cellular repair and maintenance. These inferred functions presumably reflect the complex nature of hostparasite interactions, as the transcripts shaped by disruptive selection were involved in a wide range of molecular processes.

| Functional annotation of putative targets of selection
Variation in transcript abundance, similar to that in morphological or physiological traits, is expected to be shaped by selection through whole-animal performance, which can be defined as how well an organism accomplishes certain ecologically relevant tasks (Arnold, 1983). Therefore, it is pertinent to ask what performance traits are "visible" to selection in the studied host-parasite system.
The functional categorization of genes and correlated gene modules provides some clues to this question, as both survival-and PL-associated transcripts shaped by linear selection were highly enriched for genes involved in the mitotic cell cycle. First, it is unlikely that the genes associated with survival reflect variation for general stress response of the host caused by Tetracapsuloides bryosalmonae infection. This is because most stress factors lead to an arrest of mitosis (Burgess et al., 2014;Kassahn et al., 2009;Martín-Hernández et al., 2017), whereas we detected that PL associated with up-regulation, rather than down-regulation (what may be expected for arrest of mitosis), of the key mitotic cell cycle host genes (AURKB, UBE2C, BIRC5; Figure 3). Second, the observed associations between fin tissue transcriptome, PL and survival may reflect the host response to parasite entry because T. bryosalmonae enters its salmonid host through surface tissues, which may include fins (Longshaw et al., 2002). However, we currently lack experimental evidence that T. bryosalmonae entry causes upregulation of cell-cycle activity in fin or/and other mucosal tissues of the host. Third, the coupling of the transcription of mitotic cell cycle genes in fin, PL and survival may reflect the severe physiological impact of PKD on the host at the whole organismal level.
For example, previous studies in salmonids have demonstrated that one of the main PKD symptoms is tumour-like proliferation of the lymphoid renal tissue, where the kidney may increase in size to over ten times its normal volume (Figure 1; Bettge et al., 2009;Clifton-Hadley et al., 1987;Hedrick et al., 1993). Similarly, PKD causes enlargement of the spleen, and several studies suggest that PKD in salmonids is a systemic disease that affects multiple organs and tissues (Bettge et al., 2009;Bruneaux et al., 2016;Clifton-Hadley et al., 1987;Hedrick et al., 1993;Longshaw et al., 2002;Okamura et al., 2011). In teleost fishes, pelvic fins consist of epidermis, bony rays, ligaments, nerve fibres, connective tissue cells and blood vessels. The observed associations between cell-cyclerelated host genes, PL and survival may, therefore, reflect the importance of blood homeostasis and sustaining normal kidney function. However, analysis of multiple tissues, including renal, blood and fin transcriptomes, during the progression of PKD is clearly needed to further dissect the molecular mechanisms of the host response, as we currently lack comprehensive understanding of the inflammatory, mitotic and immune processes across organs (Chevrier, 2019). Regardless of the specific physiological mecha- Two limitations in this study may be addressed in future research. First, despite the high electrofishing effort, low dispersal and relatively high recapture probabilities, we probably did not recover all survivors. We therefore carried out functional and selection analysis based on both initial recapture information and by treating 13 putatively uncaptured individuals as survivors, as suggested by their transcript profiles that matched recaptured individuals. However, given that the main findings (e.g., distribution of the linear and nonlinear selection coefficients, GO enrichment patterns) remained very similar irrespective of the classification, imperfect classification of small number of survivors probably had only a small effect on the main conclusions. Second, even though it was not possible to analyse the primary target tissue of the parasite (kidney), requiring lethal sampling and thereby preventing survival estimates, the systemic nature of PKD conceivably enabled us to acquire biologically meaningful information from fin biopsies with a minimal expected effect on fish survival (Gjerde & Refstie, 1988). Similarly, transcript abundances in fin tissue have recently been associated with ageing-related mortality in fish, demonstrating the usefulness of fin tissue for linking gene expression and whole-organism performance (Baumgart et al., 2016). More generally, because of their major role in pathogen defence, mucosal surface tissues have been widely used to study innate and adaptive immune responses in teleost fishes (Gomez et al., 2013).
In summary, our work demonstrates the power and challenges of integrating nonlethal sampling and transcriptomics with classical ecological methods to dissect complex high-order organismal traits, such as survival in the wild, into functionally interpretable molecular processes. As such, our study provides a novel perspective for studying contemporary selection at the suborganismal level and is readily applicable to other species and systems, where nonlethal sampling of blood, mucosal and other tissues is feasible. We anticipate that the approach described here will enable critical information on the molecular mechanisms and targets of natural selection to be obtained in real time, as wild populations increasingly contend with novel selective pressures (Hendry et al., 2008), including those imposed by global warming (Hoffmann et al., 2011).

AUTH O R CO NTR I B UTI O N S
A.V. conceived the study. A.V., F.A., P.V.D., S.K. and L.P. collected the samples. P.V.D. measured haematocrit and kidney swollenness.