Balancing selection in Pattern Recognition Receptor signalling pathways is associated with gene function and pleiotropy in a wild rodent

Pathogen‐mediated balancing selection is commonly considered to play an important role in the maintenance of genetic diversity, in particular in immune genes. However, the factors that may influence which immune genes are the targets of such selection are largely unknown. To address this, here we focus on Pattern Recognition Receptor (PRR) signalling pathways, which play a key role in innate immunity. We used whole‐genome resequencing data from a population of bank voles (Myodes glareolus) to test for associations between balancing selection, pleiotropy and gene function in a set of 123 PRR signalling pathway genes. To investigate the effect of gene function, we compared genes encoding (a) receptors for microbial ligands versus downstream signalling proteins, and (b) receptors recognizing components of microbial cell walls, flagella and capsids versus receptors recognizing features of microbial nucleic acids. Analyses based on the nucleotide diversity of full coding sequences showed that balancing selection primarily targeted receptor genes with a low degree of pleiotropy. Moreover, genes encoding receptors recognizing components of microbial cell walls etc. were more important targets of balancing selection than receptors recognizing nucleic acids. Tests for localized signatures of balancing selection in coding and noncoding sequences showed that such signatures were mostly located in introns, and more evenly distributed among different functional categories of PRR pathway genes. The finding that signatures of balancing selection in full coding sequences primarily occur in receptor genes, in particular those encoding receptors for components of microbial cell walls etc., is consistent with the idea that coevolution between hosts and pathogens is an important cause of balancing selection on immune genes.

& Clark, 2013; Tennessen et al., 2015;Unckless et al., 2016). In humans, for example, there is evidence of balancing selection on a number of immune genes other than MHC genes, including NOD2 which encodes an innate immune receptor (Nakagome et al., 2012), DEFB1 which encodes an antimicrobial peptide (Cagliani et al., 2008) and TRIM5 which encodes a retroviral restriction factor . Taken together, this clearly supports the view, originally proposed by Haldane (1949), that host-pathogen interactions are important for the maintenance of genetic diversity (Lazzaro & Little, 2009;Woolhouse, Webster, Domingo, Charlesworth, & Levin, 2002) One factor that could conceivably affect the likelihood of pathogen-mediated balancing selection on an immune gene is its function.
Here we consider two broad functional categories of immune genes: those encoding receptors directly involved in the recognition of pathogens, and those encoding signal transduction proteins downstream of receptors. Host-pathogen interactions can result in balancing selection through several different processes (Hedrick, 2002;Karasov et al., 2014). One particularly influential idea is that balancing selection is a result of coevolution between a host-pathogen pair driven by negative frequency-dependent selection, often referred to as Red Queen dynamics (Woolhouse et al., 2002). Models of this process-"gene-for-gene" and "matching-allele" models, and variants of them-typically assume that the host genes involved encode proteins that interact directly with pathogens (Agrawal & Lively, 2003;Dybdahl, Jenkins, & Nuismer, 2014). Consequently, if host-pathogen coevolution is an important cause of balancing selection, genes encoding innate immune receptors involved in pathogen recognition could be expected to be more important targets than genes encoding signal transduction proteins. Moreover, receptors can be divided into two types: those recognizing components in bacterial or fungal cell walls, bacterial flagella, or viral capsids, and those recognizing features of microbial nucleic acids (e.g., dsRNA and unmethylated CpG sequences; Takeuchi & Akira, 2010). This classification is motivated because components of microbial cell walls, flagella and capsids are likely to be evolutionarily more dynamic than features of microbial nucleic acids. Thus, if balancing selection is driven by coevolution, we would expect it to primarily target receptors recognizing components of cell walls etc.
Another factor that could affect the likelihood of balancing selection on a gene is its degree of pleiotropy. In analyses across species, genes with a high degree of pleiotropy, measured for example as the number of protein-protein interactions, are more constrained (Chesmore, Bartlett, Cheng, & Williams, 2016;Fraser, Hirsh, Steinmetz, Scharfe, & Feldman, 2002). It seems likely that this effect can operate also at the intraspecific level, so that more pleiotropic genes are less likely to evolve under balancing selection. Gene function and pleiotropy are probably associated with each other (receptors having lower pleiotropy than signalling genes), and as far as we are aware no study has disentangled their relative importance for balancing selection. These transcription factors induce expression of inflammatory cytokines, chemokines, costimulatory molecules, etc. (Caruso, Warner, Inohara, & Núñez, 2014;Gay, Symmons, Gangloff, & Bryant, 2014;Geijtenbeek & Gringhuis, 2009;Loo & Gale, 2011). Some NLRs also activate inflammatory cytokines such as interleukin-1β (IL-1β) and IL-18 (which are initially present as inactive precursors) through the formation of inflammasomes, instead of inducing transcription of cytokines through activation of transcription factors (Wen, Miao, & Ting, 2013).
We used whole-genome resequencing data from a population of bank voles (Myodes glareolus) to analyse patterns of intraspecific diversity in a set of 123 genes in PRR pathways, from receptors to transcription factors. First, we analysed which factors (gene function, pleiotropy) influence the amount of variation (nucleotide diversity) in coding sequences, and performed population genetic analyses to test if genes with exceptionally high diversity were under balancing selection. Second, we scanned for signatures of balancing selection in both protein coding and noncoding sequences of PRR pathway genes using betascan (Siewert & Voight, 2017), and then tested if these signatures were associated with gene function and pleiotropy.
F I G U R E 1 Pattern Recognition Receptor (PRR) signalling pathway genes included in the analyses. PRR pathway genes were divided into two main functional categories: genes encoding receptors and genes encoding signal transduction proteins ("receptor genes" and "signalling genes," respectively). Receptors were divided into two subcategories depending on what types of ligands (Microbe-Associated Molecular Patterns; MAMPs) they recognize: components of bacterial and fungal cell walls, bacterial flagella or viral capsids versus features of microbial nucleic acids. Signal transduction genes include genes encoding adaptor proteins, protein kinases and other enzymes, transcription factors, etc [Colour figure can be viewed at wileyonlinelibrary.com] [Colour figure can be viewed at wileyonlinelibrary.com] 2 | MATERIAL AND ME THODS
PRR pathway genes were divided into two main categoriesreceptors and signalling genes-depending on the function of the Consortium, 2017) and PhosphoSitePlus (Hornbeck et al., 2015) about the function of the encoded protein.

| Study species and field work
The bank vole (Myodes glareolus, Schreber 1780) is a small rodent in the subfamily Arvicolinae (lemmings, voles and muskrats). It occurs in forests and meadows with thick ground cover, from western Europe to central Siberia (Wilson, Lacher, & Mittermeier, 2017).
Bank voles for this study were trapped at six different sites (N = 2-8 per site, depending on the area of each site) at Revingehed, an area comprising 43 km 2 of grazed grassland with small woods, marshes and meadows, situated 20 km east of Lund, southern Sweden. All sites are well connected, without any obvious dispersal barriers.
Previous analyses based on microsatellite loci showed weak differentiation among the sites (F ST 0.022-0.055; Tschirren, Andersson, Scherman, Westerdahl, & Råberg, 2012). To avoid including siblings we only sampled adults (i.e., after post-juvenile dispersal). Samples were collected with permission from the Malmö/Lund board for animal experiment ethics (permission M47-14).  (Table S3). All libraries were sequenced on a Hiseq X (Illumina). The reads were assembled de novo using a combination of allpaths-lg (Gnerre et al., 2011), sspace (Boetzer, Henkel, Jansen, Butler, & Pirovano, 2011) and besst (Sahlin, Chikhi, & Arvestad, 2016), with gap closing provided by gapcloser in the soapdenovo package (Li et al., 2010) (Table S4). The initial assembly was filtered for contaminating sequences and short and redundant scaffolds. We further assembled scaffolds containing partial coding sequences of our focal immune genes by using synteny to the prairie vole or mouse genome. Gene annotation was performed on a repeatmasked version of the genome. Briefly, augustus (Stanke, Diekhans, Baertsch, & Haussler, 2008) was used to generate gene models guided by RNAseq data from six tissues (Table S5) mapped to the genome with gsnap (Wu, Reeder, Lawrence, Becker, & Brauer, 2016) and rodent proteins mapped to the reference genome with exonerate (Slater & Birney, 2005). We further performed manual curation of the focal genes in webapollo (Lee et al., 2013). See the Supporting Information and Table S6 for further details on de novo genome assembly and annotation.

| Whole-genome resequencing and variant calling
DNA from spleens of 30 bank voles (Table S7) were extracted.
Sequence libraries with a targeted insert size of 350 bp were prepared and sequenced on 30 Hiseq X (Illumina) lanes. For mapping we also included the 670-bp library used in the de novo genome assembly. The reads were mapped to the reference assembly using bwa (Li & Durbin, 2009) and read duplicates were removed with picardtools (http://broad insti tute.github.io/picar d/). Variants were called using freebayes (Garrison & Marth, 2012). The raw variants were filtered in several steps, including for low quality score, coverage and overlap with annotated repeats. See the Supporting Information and Table S8 for further details on whole-genome resequencing and variant calling.
We performed a phylogenetic clustering of mitochondrial haplotypes by downloading a set of complete bank vole mitochondrial genomes originating from several bank vole populations in Europe (Filipi, Marková, Searle, & Kotlík, 2015). We aligned the mitochondrial sequences of our samples with those of the European population set using clustalw and created a neighbor-joining tree in the R package ape (Paradis, Claude, & Strimmer, 2004).

| Analyses of balancing selection in coding regions
We used popgenome (Pfeifer, Wittelsbu, Ramos-onsins, & Lercher, 2014) to calculate nucleotide diversity (π) for the coding sequence of each of the 123 PRR pathway genes. To obtain an empirical background distribution of nucleotide diversity, we extracted coding sequences from a set of control genes in the bank vole genome. We initially considered all genes (N = 12,314) that could be uniquely assigned to a mouse protein through blast (excluding, for example, genes split over multiple scaffolds) and that were not included in our focal set of immune genes. We also required that the same bank vole gene was the reciprocally best blast hit of the mouse gene and that the highest scoring segment pair (HSP) of the coding sequence of the bank vole gene covered at least 90% of the mouse protein. This resulted in a total of 9,246 control genes.
For PRR pathway genes with exceptionally high nucleotide diversity in the coding region ("outlier genes," with π above the 95th percentile of the empirical distribution based on 9,246 control genes), we used Hudson-Kreitman-Aguadé (HKA) tests (Hudson, Kreitman, & Aguadé, 1987) to investigate whether the high nucleotide diversity was a result of balancing selection (Fijarczyk & Babik, 2015).
HKA tests were performed with the software mlhka (Wright & Charlesworth, 2004), where polymorphism/divergence ratios (at synonymous sites) for a focal gene are compared to polymorphism/ divergence ratios for a set of neutrally evolving genes. We first used either house mouse (Mus musculus) or prairie vole (Microtus ochrogaster) as the outgroup for a few of the highly polymorphic genes.
In all cases the analyses yielded identical conclusions and we ultimately decided to perform all analyses with the house mouse as outgroup because its reference genome is better annotated. We used tblastx to match the coding sequence of the outlier bank vole genes to protein-coding transcripts of mouse and extracted the highest scoring transcript to be used for alignment with the bank vole sequences. The sequence alignment was performed with geneious R11 (Biomatters) using "translation align." The alignments were manually inspected and fixed insertions or deletions between the species were removed. We used dnasp version 6 (Rozas et al., 2017) to obtain polymorphism and divergence estimates for the aligned data.
Specifically, for each gene we calculated the number of synonymous segregating sites and the total number of synonymous sites in the bank vole samples. We also estimated the population-scaled mutation rate (θ) for each gene, to use as starting value for the mlhka analysis. For divergence estimates, we calculated the number of synonymous differences between the mouse sequence and one of the haplotypes of the bank vole reference genome. To obtain a set of putatively neutrally evolving control genes for the mlhka analyses, we first filtered the 9,246 control genes according to the following criteria: (a) ≥3 synonymous sites, (b) Tajima's D of nonsynonymous and synonymous sites between − 1 and 1, (c) size range within the distribution of the outlier genes (498-5,169 bp) and (d) the mouse orthologue is not located on a sex chromosome. From this set of suitable control genes (N = 3,340) we randomly selected 60 genes.
We then performed McDonald-Kreitman tests on these genes (McDonald & Kreitman, 1994), using dnasp (Rozas et al., 2017). We ranked the 60 genes according the Neutrality Index (NI). From this list we selected a set of 20 genes with a median NI as close as possible to 1 (NI median = 1.04, range: 0.7-2.0, all p ≥ .36), which were then used as control genes in the mlhka tests. The alignment of these sequences to the mouse orthologue and the calculation of divergence and polymorphism followed the same approach as described above for the outlier genes (Table S9). For each outlier gene, we ran the mlhka analysis at least four times with chain length = 100,000 and varying input parameters (divergence time [T] and mutation rate [θ]) to ensure convergence. We performed the HKA test on individual genes separately instead of groups of genes (e.g., comparing all receptor genes versus the 20 control genes) because we see little reason to expect that all (or even most) genes in a certain category should be under balancing selection.
Some of the focal genes are paralogues situated close to each other; they are therefore prone to paralogous gene conversion (PGC), which could confound signatures of balancing selection. To test for PGC, we used geneious R11 to construct dot plots, and dnasp version 6 (Rozas et al., 2017) to perform sliding window analyses of coding sequence nucleotide diversity (π) and nucleotide divergence between the paralogues (D xy ). We used popart (Leigh & Bryant, 2015) to construct median-joining networks (Bandelt, Forster, & Röhl, 1999).

| Analyses of balancing selection in coding and noncoding regions
We used betascan (Siewert & Voight, 2017) to search for more localized signatures of long-term balancing selection in coding and noncoding sequences of the 123 PRR pathway genes. betascan calculates a summary statistic (β) that quantifies allele frequency correlations between single nucleotide polymorphisms (SNPs). Under long-term balancing selection there is expected to be a cluster of SNPs with similar allele frequencies, which will cause β to be high (Siewert & Voight, 2017). In the software we used the folded allele frequency spectrum, and set a minimum minor allele frequency of 0.15 (to reduce the number of false positive signals) as recommended (Siewert & Voight, 2017). As the average recombination rate in rodents is about half that in humans (Jensen-Seaman et al., 2004), we used a window size of 2 kb, twice of that used by Siewert and Voight (2017).
We also evaluated window sizes of 1 and 5 kb and found the results qualitatively similar to those based on 2 kb. To reduce the risk of spurious signals caused by copy number variation and incorrectly mapped paralogues, we filtered beta values for SNPs located within 1 kb of a duplication or deletion called by delly (Rausch et al., 2012) (see Supporting Information). For each PRR pathway gene we selected the highest β value within the annotated gene interval ± 2 kb (β max ). β could only be calculated for 119 of 123 PRR pathway genes because some lacked SNPs with sufficiently high minor allele frequency. To obtain a null distribution for β max , we calculated the empirical distribution of β max within the annotated gene interval ± 2 kb of the control genes (N = 8,818; fewer than in the set of control genes for π because of the filtering steps above; Figure S1), and considered PRR pathway genes with a β max ≥ the 95th percentile (β = 235.5) as potential targets of long-term balancing selection.
In analyses of β max we included gene length as a covariate, as longer genes provide a larger target for balancing selection. Gene length was calculated as the length of the annotated gene interval plus 2,000 bp at either end. For genes with signatures of balancing selection according to betascan, we used popgenome (Pfeifer et al., 2014) to perform sliding window analyses with a window size of 2,000 bp and a step size of 500 bp for nucleotide diversity and Tajima's D (where a high value is indicative of balancing selection; Fijarczyk & Babik, 2015;Tajima, 1989) across the entire gene and in up-and downstream regions.

| Sanger sequencing
Copy-number variation and incorrect mapping of paralogues can lead to artefactually high nucleotide diversity and indicate that a locus is under balancing selection. To avoid this, the genome resequencing data were subjected to a number of filtering steps before the analyses above (Supporting Information). To ensure these filtering steps were sufficient, we performed Sanger sequencing of a selection of highly polymorphic genes (CARD6, CD36, CLEC5A, DDX58, IFIH1,   IRAK3, LY75, NLRP10, NOD1, TLR1, TLR2, TLR5, TLR6; see Table S10 for primers). Primers were designed to amplify a fragment covering the peak (or, in the case of NOD1, as close as possible to the peak) in nucleotide diversity and/or β. The PCR products were sequenced in both directions on a Genetic Analyser 3500 (Applied Biosystems) using Big Dye terminator chemistry (Applied Biosystems). Sequences were processed, assembled and aligned using geneious R11. For each locus, we Sanger sequenced 5-38 individuals from an independent set of bank voles (sampled from the same population as the genome resequencing data set; see Table S10 for sample sizes for each gene), so that we could confirm that SNPs with relatively even allele frequencies were present as both homozygotes and heterozygotes (which should rule out copy-number variation and incorrect mapping of paralogues). Sanger sequencing confirmed the SNPs observed in the genome resequencing data set for all the tested genes. Thus, we are confident that the filtering steps were sufficient to eliminate confounding effects of copy-number variation and incorrect mapping.

| Pleiotropy
We used the number of protein-protein interactions (PPI, i.e., the number of other host proteins a given protein in PRR signalling pathways interacts with) as a proxy for pleiotropy (Papakostas et al., 2014;Zhang & Yang, 2015). The numbers of PPIs for Mus musculus orthologues of the bank vole PRR pathway genes were retrieved from the STRING database (Szklarczyk et al., 2015), and were based on the sources "experiments" and "databases," with high confidence (0.7).

| Statistical analyses
Analyses of factors influencing nucleotide diversity and β max were performed with proc glm in SAS 9.4 (SAS Institute). Nonsignificant terms (p ≥ .1) were deleted in a step-wise manner. Nucleotide diversity, β max , PPI and gene length were log 10 -transformed to normalize the distribution of residuals and improve homoscedasticity.

| De novo genome assembly
We constructed a de novo genome assembly from a male bank vole caught in southern Sweden using four paired-end short read libraries (Table S3). Following several assembly and filtering steps we obtained an assembly comprising 50,926 scaffolds with a scaffold N50 of 1.6 Mb and a total scaffold length of 2.4 Gb (Table S4). Using RNAseq data from six different tissues (Table S5) and protein homology data from other rodents we annotated 24,940 gene models that correspond to 16,983 unique mouse genes.

| Whole-genome resequencing
To assay variation in a population of bank voles from southern Sweden we performed whole-genome resequencing of an additional 30 individuals and mapped these data to the reference genome with an average coverage of mapped reads of 44× (Table S8). Using the whole-genome resequencing data, together with a paired-end library from the reference genome sample, we obtained a set of 21.6 million filtered SNPs of which 21.4 million were bi-allelic. To provide a basic characterization of the samples included in the resequencing data set from a phylogeographic perspective, we performed a phylogenetic analysis of full-length mitochondrial haplotypes. This showed that our samples formed a monophyletic group with shallow divergence belonging to the "Carpathian" mitochondrial DNA (mtDNA) clade ( Figure S2), confirming previous analyses of bank vole populations in southern Sweden based on smaller data sets (Filipi et al., 2015;Morger et al., 2015).

| PRR gene set
We obtained complete or nearly complete coding sequences (at least 75% of the mouse orthologues) for 123 PRR signalling pathway genes: 35 encoding receptors and 88 encoding signal transduction proteins (Figure 1; Table S1). Of the receptors, 22 recognize components of microbial cell walls etc., 10 recognize nucleic acids, and for three the ligands are unclear (Figure 1).

| Associations between gene function, pleiotropy and coding sequence nucleotide diversity of PRR pathway genes
To test for effects of gene function (receptor versus signalling) and number of protein-protein interactions (PPI; as a proxy for pleiotropy; see Material and Methods) on nucleotide diversity, we performed a general linear model with nucleotide diversity against gene function, PPI and their interaction. There were significant main effects of both gene function (F = 15.65, df = 1, 119, p < .0001) and PPI (F = 22.14, df = 1, 119, p < .0001), and also a significant interaction between gene function and PPI (F = 8.13, df = 1, 119, p = .0051). Figure 2a shows that receptor genes on average had higher diversity than signalling genes, but only at low PPI. This conclusion is robust to restricting the analysis to subsets of the data (e.g., PPI ≤ 10 or PPI ≤ 40) to eliminate problems with heteroscedasticity and different ranges of PPI between receptor and signalling genes (data not shown).

Inspection of
To test for a difference in diversity between receptors recognizing different types of ligands (proteins etc. versus nucleic acids), we performed a general linear model with nucleotide diversity against receptor type, PPI and their interaction (excluding the three receptors with unknown ligands; Figure 1). This analysis showed a main effect of receptor type (F = 6.61, df = 1, 27, p = .016; Figure 2b) and a marginally nonsignificant interaction (F = 3.87, df = 1, 27, p = .06), but no overall effect of PPI (PPI: F = 1.48, df = 1, 27, p = .23).

| Signatures of balancing selection in coding sequences of PRR pathway genes
High nucleotide diversity can be a result of balancing selection, but it could also simply be a result of a high mutation rate and/or relaxed  in the coding sequence of some PRR genes (see Figure 2a) was a result of balancing selection, we first identified which of the 123 PRR genes had exceptionally high nucleotide diversity. For this, we obtained SNP variation of the coding sequence for a set of 9,246 high-quality control genes (see Materials and Methods for criteria) to use as an empirical background distribution. Both receptor and signalling genes had higher median nucleotide diversity than control genes (Mann-Whitney tests, receptor versus. control: p < .001; signalling versus. control: p < .006; Figure S3). We used the 95th percentile of nucleotide diversity (π = 0.0037) in the control gene data set as a cut-off to define which of the focal immune genes had exceptionally high levels of variation. Nine receptors and three signalling genes had a coding sequence diversity above this threshold ( Figure S3;

| Associations between gene function, pleiotropy and signatures of balancing selection in coding and noncoding sequences
The HKA test of whole coding sequences described above pick up signals of balancing selection that span a substantial part of the coding sequence of a gene, but has little power to detect signals of more localized selection on the coding sequence (in particular in large multi-exon genes), and selection on noncoding regions of a gene. We  Figure S4).
Of the 119 PRR pathway genes for which we could calculate β, seven (four receptors and three signalling genes; 4/34 versus 3/85, Fisher's exact test: p = .1) had a β max greater than or equal to the 95th percentile of the empirical distribution (Table 2). In most cases, β max was located in noncoding regions (mostly introns) and coincided with peaks in nucleotide diversity and Tajima's D ( Figure S5). There was a TA B L E 1 PRR pathway genes with coding sequence nucleotide diversity (π) above the 95th percentile of the empirical genome-wide distribution positive correlation between β max and nucleotide diversity of the full coding sequence (r s = 0.648, N = 119, p < .01).

| Confounding effects of paralogous gene conversion?
One potentially confounding factor in analyses of balancing selection is PGC. PGC can occur between paralogues with high sequence identity situated in tandem (Chen, Cooper, Chuzhanova, Férec, & Patrinos, 2007). TLR1 and TLR6 are estimated to be 26 kb apart in the bank vole genome and the coding sequences have 71.7% identity at the nucleotide level; these two genes are therefore prone to PGC. Indeed, previous analyses have shown that PGC constrains divergence of TLR1 and TLR6 between species (Kruithof, Satta, Liu, Dunoyer-geindre, & Fish, 2007). The rate of PGC is typically at least an order of magnitude higher than single nucleotide mutations; consequently, PGC can inflate the nucleotide diversity (Innan, 2003;Thornton, 2007). The high nucleotide diversity of TLR1 and TLR6 could therefore be a result of PGC instead of balancing selection.

Detailed analyses of diversity within and divergence between TLR1
and TLR6 provided strong evidence for PGC between these genes also in the bank vole, and nucleotide diversity peaked in the region affected by PGC (Figure 4a). PGC violates assumptions of neutrality tests traditionally used to detect balancing selection, such as the HKA test (Innan, 2003;Thornton, 2007). We therefore regard the results of the HKA tests for both TLR1 and TLR6 as dubious. Still, haplotype networks based on the regions subject to PGC showed that in both TLR1 and TLR6 there were divergent haplotype groups at relatively even frequencies ( Figure 4b); this pattern suggests that the high diversity is not just an effect of PGC in combination with drift, but that balancing selection has also played a role (Storz et al., 2007).
PGC has, as far as we are aware, not been described for orthologues of any other genes with signatures of balancing selection in   , , ,
The inclusion of a set of control genes in the HKA analyses makes these relatively robust to confounding effects of demography (Wright & Charlesworth, 2004). We addressed the potential effects of gene conversion and found evidence that it has contributed to the high diversity of TLR1 and TLR6, but not in other cases. Further analyses based on more than one bank vole population, and other closely related species, are required to investigate if admixture and introgression have contributed to the high levels of polymorphism in some PRR pathway genes in the bank vole, as appear to be the case for some immune genes in humans (Dannemann, Andrés, & Kelso, 2016;Fumagalli et al., 2010). With these caveats, we regard the genes identified here (Tables 1 and 2) as highly likely targets of balancing selection.
Analyses of coding sequences showed that genes encoding receptors had higher nucleotide diversity than genes encoding signal- balancing selection based on β max (i.e., with β max above the 95th percentile). This is possibly an effect of β max in these cases being mostly located in noncoding sequences (in particular in introns; Table 2), suggesting that β max primarily picked up signatures of selection acting on regulatory variation (although we note that it is difficult to determine the location of the target of selection based on these analyses alone). The prediction, based on models of coevolution (Agrawal & Lively, 2003;Dybdahl et al., 2014), that receptors recognizing cell wall components etc. should be key targets of balancing selection primarily concerns selection on coding sequences. In contrast, when it comes to balancing selection on regulatory variation, it is more difficult to see why genes in a certain functional group should be particularly important targets.
The analyses of whole coding sequences showed a negative association between pleiotropy and nucleotide diversity. Similarly, the analyses of localized signatures of balancing selection in coding and noncoding sequences showed negative associations between pleiotropy and β max . Again, these patterns mirror results of analyses across species, where genes with a higher number of protein-protein interactions are more constrained (Chesmore et al., 2016;Fraser et al., 2002). However, in the case of divergence across species, the most important factor is the level of expression (highly expressed genes are more constrained), and it is debatable whether there is an independent effect of pleiotropy (Alvarez-Ponce, 2012;Fraser & Hirsh, 2004;Pál, Papp, & Lercher, 2006;Zhang & Yang, 2015).
Further studies are required to determine if the association between nucleotide diversity and pleiotropy found in the present study is a direct effect, or rather a result of both pleiotropy and diversity being associated with the level of gene expression.
Our finding that genes encoding receptors for pathogen recognition, in particular receptors recognizing components of microbial cell walls etc., are primary targets of balancing selection is consistent with the idea that coevolution is an important driver of pathogen-mediated balancing selection (Haldane, 1949;Woolhouse et al., 2002). However, it is important to recognize that this pattern does not exclude other causes of balancing selection, such as heterozygote advantage (Doherty & Zinkernagel, 1975), environmental heterogeneity in pathogen abundance (Hedrick, 2002;Lazzaro & Little, 2009) and "diffuse interactions" (i.e., negative frequency-dependent selection involving a multitude of pathogens, but without coevolution; Karasov et al., 2014). Ultimately, the relative importance of different processes needs to be determined gene by gene through analyses of functional effects of polymorphisms. One key step here will be to test assumptions behind different models, for example whether a polymorphism leads to a host genotype-by-pathogen genotype interaction, as required for Red Queen coevolution (Woolhouse et al., 2002).

ACK N OWLED G EM ENTS
This work was supported by grants from the Swedish Research Council