A candidate tolerance gene identified in a natural population of field voles (Microtus agrestis)

The animal immune response has hitherto been viewed primarily in the context of resistance only. However, individuals can also employ a tolerance strategy to maintain good health in the face of ongoing infection. To shed light on the genetic and physiological basis of tolerance, we use a natural population of field voles, Microtus agrestis, to search for an association between the expression of the transcription factor Gata3, previously identified as a marker of tolerance in this system, and polymorphism in 84 immune and nonimmune genes. Our results show clear evidence for an association between Gata3 expression and polymorphism in the Fcer1a gene, with the explanatory power of this polymorphism being comparable to that of other nongenetic variables previously identified as important predictors of Gata3 expression. We also uncover the possible mechanism behind this association using an existing protein–protein interaction network for the mouse model rodent, Mus musculus, which we validate using our own expression network for M. agrestis. Our results suggest that the polymorphism in question may be working at the transcriptional level, leading to changes in the expression of the Th2‐related genes, Tyrosine‐protein kinase BTK and Tyrosine‐protein kinase TXK, and hence potentially altering the strength of the Th2 response, of which Gata3 is a mediator. We believe our work has implications for both treatment and control of infectious disease.

those of mounting an immune response. Tolerance of infection is now attracting considerable interest in the immunological and ecological literature (Medzhitov et al., 2012;R aberg, Graham, & Read, 2009) and provides a new perspective to help understand how the immune response in animals functions following infection, which has hitherto been viewed primarily in the context of resistance only.
Individuals in apparently similar circumstances differ in their responses to infection, and some are worse than others at either resisting or tolerating infection (Arriero et al., 2017;Buehler, Piersma, Matson, & Tieleman, 2008;Kluen, Siitari, & Brommer, 2013). Beyond recognizing that such variation exists in natural populations, though, we understand little of the genetic and physiological basis of this variation. Improving this understanding is a key step towards predicting which individuals are most vulnerable to infectious disease (R aberg, 2014). Genetic variation for tolerance has been previously demonstrated in inbred strains of laboratory mice (Raberg, Sim, & Read, 2007) and, to a more limited extent, in natural systems (Regoes et al., 2014). However, knowledge of specific genes controlling tolerance, and hence potentially driving this heritable variation in strategy in the wild, is lacking. Candidate genes include those involved in limiting immunopathology and/or regulation of the immune response (Medzhitov et al., 2012;R aberg et al., 2009). In the laboratory, a genetic locus on mouse chromosome 11 (Ctrq3) has been shown to influence tolerance to Chlamydia psittaci infection, with circumstantial evidence for candidate genes belonging to the family of immunityrelated GTPases (Miyairi et al., 2012). Another study has also identified a signalling protease required for melanization in Drosophila melanogaster (CG3066) as being of importance (Ayres & Schneider, 2008).
Finally, in humans, an association between HLA-B genotype and degree of tolerance to HIV has been shown (Regoes et al., 2014).
Our own work has previously identified the expression of a particular master transcription factor, Gata3, as a marker of tolerance in mature male field voles, Microtus agrestis. This work showed that macroparasite infection in these mature voles gave rise to elevated levels of Gata3 expression, which in turn gave rise to improved body condition and enhanced survival (Jackson et al., 2014). This fits with the known role of Gata3 as a mediator of the Th2 response and the role of the Th2 immune system in tissue repair (Allen & Wynn, 2011). Furthermore, we have shown evidence for consistent differences between individuals in their typical level of Gata3 expression, after other measured sources of variation have been taken into account (Arriero et al., 2017).
Together, our results imply consistent difference between individuals in the strength of their tolerance response.
Here, we address the contribution of genotype to consistent individual differences in the expression of Gata3, a marker of tolerance. We use a natural population of wild M. agrestis to search for an association between Gata3 expression and polymorphism in 84 immune and nonimmune genes. We find Gata3 expression associated with polymorphism at the Fcer1a gene (which encodes the alpha chain of the high-affinity receptor for immunoglobulin epsilon, IgE) and show that the proportion of variation in Gata3 expression explained by this polymorphism is comparable to that explained by other environmental and physiological variables. We also shed light on the possible mechanism behind this association by constructing a protein-protein interaction network for the mouse model rodent, Mus musculus, which we validate using our own expression network for M. agrestis.

| Field design and animals
We studied M. agrestis in Kielder Forest, Northumberland (55°13 0 N, 2°33 0 W), using live-trapping to access individual animals from natural populations. Our studies were designed to permit the analysis of individual variation in host condition and survival, infection status and the expression of immune genes (for full details of some of the methods below, see Jackson et al., 2011Jackson et al., , 2014. The studies were divided into longitudinal and cross-sectional components.

| Initial survey
We repeated our field design at two spatially separate sites (BLB and SQC) in 2008-2009, and a further two (SCP and KTH) in 2009-2010. Each site contained a central trapping grid (~0.375 ha) of 150 (10 9 15) regularly spaced traps (3-to 5-m intervals) which was used in a capture-recapture study (reported elsewhere). The crosssectional component reported here utilized curvilinear transects of 100 live traps arranged at 5-to 10-m intervals, which were placed around the margins of each habitat.
Ten voles per month were destructively sampled from the transects between February and November (2008) or April and November (2009-2010. In November (2008) and March (2009and 2010, larger numbers of animals were sampled both from the transects and from the central grid habitats. These samples are used here to carry out a haplotype association analysis. On capture, each animal was examined for ectoparasites. Only results for male M. agrestis are reported here given the focus of previous work (Jackson et al., 2014). Males were classified as either immature (nonmating with undeveloped testes) or mature (mating with large testes and expanded seminal vesicles). Some biometric data were also collected, including body weight (g) and snout-vent length (mm). All animal procedures carried out as part of this initial survey were performed with approval from the University of Liverpool Animal Welfare Committee and under a UK Home Office licence (PPL 40/3235 to MB). count of adult cestodes found in the gut (Anoplocephaloides dentata aff., Paranoplocephala sp., Rodentolepis asymmetrica, Arostrilepis horrida). We collected infection metrics for these macroparasites because they are the most common species that would be expected to be in strong contact with the host immune system (Jackson et al., 2014).
Similarly to the initial survey, each site contained a trapping grid of 150-197 regularly spaces traps (at~5 m intervals), but this was used both for cross-sectional and longitudinal components (not reported here). Sixty-four voles were also destructively sampled from the grids between July and October 2015 to assay expression by RNA-Seq. In this study, voles were killed by a rising concentration of CO 2 , followed by exsanguination. Both females and males were included to maximize sample size. These samples were shown to be comparable in terms of weight, age and sex to the population sampled in the initial survey (Table S1)

| Immunological assays
We used two-step reverse transcription quantitative real-time PCR (Q-PCR) to measure messenger RNA (mRNA) accumulation of Gata binding protein 3 (Gata3, a transcription factor associated with the Th2 response) from splenocyte cultures stimulated with mitogen phytohaemagglutinin (PHA; see Jackson et al., 2011 for details). Gata3 has previously been identified as a marker of tolerance in mature male voles (Jackson et al., 2014). PHA preferentially activates and stimulates proliferation of CD4+ helper T cells in vitro (O'Donovan, Johns, & Wilcox, 1995). Here, we use that observed expression profile as a measure of the potential responsiveness of the immune system in vivo.

| SNP identification and genotyping
We identified 288 single nucleotide polymorphisms (SNPs) in 85 immune-related genes and 25 nonimmune genes. Immune genes included cytokine genes and other genes known to be involved in pathogen resistance. The Immunome database version 1.1. (http:// structure.bmc.lu.se/idbase/Immunome/index.php), a manually curated database containing information on 893 genes considered essential to the human immune system, was used a starting point for identifying a list of candidate immune genes (Ortutay & Vihinen, 2006). First, we excluded all those genes in this database with no known orthologue in house mice. We then applied a heuristic approach to ensure that those genes which were most likely to be of interest given our previous work (Jackson et al., 2014) were represented in our list and excluded those genes with no known polymorphisms in M. agrestis. We also chose a set of nonimmune genes to act as a control for spurious associations, caused, for example, by demographic effects. This set was composed solely of metabolic genes, as these are far less likely to be involved in host-pathogen interactions (see Table S2 for full list of immune and nonimmune genes identified).
DNA was extracted from the livers of voles that had been destructively sampled as part of the cross-sectional study and for which Gata3 expression levels were available (n = 221) using DNeasy Blood and Tissue Kit (Qiagen). Genotyping was then performed by KBiosciences (Hoddesdon, UK; http://www.kbioscience.co.uk) using the KASPar SNP genotyping system. This included negative controls (water) and duplicate samples to validate reproducibility. The SNP genotyping data were (i) converted into haplotype data for each gene and (ii) tested for associations with mitogen-stimulated Gata3 expression while controlling for other known covariates, using the HAPASSOC package (Burkett, Graham, & McNeney, 2006;Burkett, McNeney, & Graham, 2004). This software allows likelihood inference of trait associations with SNP haplotypes and other attributes, adopts a generalized linear model framework and estimates parameters using an expectation-maximization algorithm. If the haplotype combination of an individual cannot be inferred from its genotyping data (i) because it is heterozygous at two or more markers or (ii) because it has missing data for a single marker, the approach implemented in HAPASSOC is to consider all possible haplotype combinations for that individual. Standard errors accounting for this added uncertainty are calculated using the Louis' method (Louis, 1982).

| Statistical analyses
We assumed an additive genetic model, where Gata3 expression is linearly related to the number of copies of a haplotype present and we pooled together all those haplotypes with frequencies below 5%. Gata3 expression values were Yeo-Johnsontransformed (Yeo & Johnson, 2000) to achieve approximately normal and homoscedastic residuals. Other nongenetic covariates included in this model were site (BLB, SQC, SCP & KTH), maturity (either immature or mature male), residual weight (adjusted for body size) and the first principal component from a PCA summarizing the macroparasites measured. This component explained 47% of the variation in macroparasite burden and showed high positive loadings for all three macroparasite groups (ticks: 0.56, fleas: 0.57 and adult cestodes: 0.60). Grouping of ectoparasites and endoparasites in this way is in line with previous work that shows that both ectoparasites Boppana, Thangamani, Alarcon-Chaidez, Adler, & Wikel, 2009) and endoparasites (Anthony et al., 2007;Harris & Gause, 2011) stimulate the Th2 response, which has been suggested to be a mechanism for adaptive and rapid tissue repair against parasite-induced damage (Allen & Wynn, 2011). These variables have previously been identified as important predictors of Gata3 expression (Jackson et al., 2011(Jackson et al., , 2014. All nongenetic covariates were tested for independence (variance inflation factors = 1.09-1.24).
As this was a haplotype association analysis, we excluded all genes with a single SNP and all monomorphic SNPs (see Table S2 for these), resulting in a total of 238 SNPs in 62 immune-related genes and 22 nonimmune genes being included in the analysis (see Table S3 for final list of immune and nonimmune genes). We also excluded those subjects for which more than one single-locus genotype had missing data. Because of the large number of association tests performed, the Benjamini and Hochberg method of correction was applied to all p-values, with the false discovery rate set to .1 (Benjamini & Hochberg, 1995). Resulting q-values (FDR-corrected p-values) were checked for a uniform distribution.
We were unable to include any random variables or interactions between nongenetic variables in the initial haplotype association analysis due to the framework adopted by HAPASSOC (Burkett et al., 2006 and site as random effects. It also included previously identified interactions between maturity and macroparasitic load, as well as maturity and residual weight (Jackson et al., 2014). A single haplotype of interest was identified in the initial haplotype association analysis (Section 3.1). Genotype was therefore coded as the number of copies of this haplotype. This was treated as a continuous variable because only five individuals were found to have two copies of this haplotype, making it difficult for reliable comparisons to be made between factor levels. Treatment of genotype as a continuous variable also reduced the number of degrees of freedom by one. Only those individuals whose combination of haplotypes or "haplotype phase" could be determined with certainty were included in this analysis, but this was the majority of individuals (n = 191; 86%). The contribution of genotype relative to other predictors in explaining variance in Gata3 expression was assessed by calculating the marginal R 2 using the MUMIN package (Barto n, 2017) for (i) the full LMM and (ii) the LMM with each of the fixed effects (as well as any associated interaction terms) removed individually. The AICc for each model was also calculated to compare model fits. This treatment, as for exposure to PHA, selectively promotes the proliferation of T cells (Frauwirth & Thompson, 2002). RNA was extracted using Invitrogen PureLink kits. cDNA sequencing libraries were prepared using Illumina RiboZero kits to deplete rRNA followed by library construction with NEBNext Ultra directional RNA library prep kit according to manufacturers protocols. Samples were sequenced to produce 2 9 75 bp paired-end reads on an Illumina HiSeq4000 platform. Adaptor sequences were removed using CUTADAPT version 1.2.1 and further trimmed with SICKLE version 1.200 with a minimum window quality score of 20. This resulted in a mean library size of 18 million (range = 5-50 million) paired-end reads.

| Read mapping
High-quality reads were mapped against a draft genome for M. agrestis (GenBank Accession no. LIQJ00000000), using TOPHAT version 2.1.0 (Trapnell, Pachter, & Salzberg, 2009). BRAKER2 was used to generate a set of predicted gene models using mapped reads to guide Augustus (Hoff, Lange, Lomsadze, Borodovsky, & Stanke, 2015). Mapped reads were then counted using FEATURECOUNTS (Liao, Smyth, & Shi, 2014). Further analysis of gene count data was performed in R version 3.4.0 (R Core Team, 2017) using the EDGER package (Robinson, McCarthy, & Smyth, 2010). Count data were filtered to remove those genes with fewer than 3 counts per million across all samples to avoid convergence problems later on. Following filtering, library sizes were recalculated, data were normalized and MDS plots were generated to check for any unusual patterns in the data.

| Protein-protein interaction network construction
The STRING database version 10 (Szklarczyk et al., 2015) for M. musculus was used to construct a network of proteins known to interact with either Gata3 or Fcer1a using the stringApp in CYTOSCAPE version 3.3.0 (Shannon et al., 2003). The default confidence score cut-off of 0.4 was used to extract those interactions that were well supported.

| Expression network construction
To validate the M. musculus network, which included seven genes (Fcer1a, Gata3 and five other genes; see Section 3.3), we constructed a network for the same seven genes using the normalized count data. Spearman rank correlation coefficients were calculated for each combination of these genes, and associated p-values deduced from a null distribution composed of 2 9 10 8 coefficients generated from a randomized version of the data set. Only statistically significant correlations (p < .05) were reported and included in the network. Two paralogous vole genes were found for the mouse gene, Btk, but these were summarized as a single node in the vole network. This resulted in one pair of duplicated edges between these Btk paralogues and Jun-the more significant edge is presented in the network.

| RESULTS
The majority of SNPs were found to be in Hardy-Weinberg equilibrium (n = 259; 90%), and only four genes were found to have all SNPs departing from Hardy-Weinberg equilibrium: Gucy2f, Il13ra1, Tlr13, Tlr7 and Tlr8 (see Table S2 for summary of all loci). High LD was detected between SNPs within the same genes (mean D 0 = .76; 95% CI = 0.72-.081) but not between SNPs located in different genes (mean D 0 = .28; 95% CI = 0.28-0.28).

| Gata3 expression is associated with polymorphism in Fcer1a
Of the 84 immune and nonimmune genes tested, only polymorphism in the gene Fcer1a was found to be significantly associated with Gata3 expression (q = 0.07; FDR cut-off = 0.1). Three haplotypes were identified at this locus: GCC, ACC and ACT at frequencies of 0.12, 0.76 and 0.07, respectively. The GCC haplotype was associated with lower expression levels of Gata3 than the ACC and ACT haplotypes (p = .003; 0.01; Figure 1). This was confirmed by the LMM (p = .002; Table 1). No significant association was found between polymorphism in the Gata3 gene itself and Gata3 expression (q = 1.00).

| The Fcer1a polymorphism is comparable in explanatory power to nongenetic variables previously identified as important predictors of Gata3 expression
The percentage variance in Gata3 expression explained by the fixed effects in the full model (or marginal R 2 ) including genotype was 10%. This dropped to about 5% when genotype, macroparasites or maturity was removed (individually) and to 8% when maturity 9 macroparasites was removed, indicating that genotype was comparable in explanatory power to other nongenetic variables previously identified as important predictors of Gata3 expression (Table 2). Furthermore, the greatest increase in AICc (relative to the full model) was observed when genotype was removed (DAICc = 7.7). However, a degree of overlap or multicollinearity between the variables was evident from these estimates.

| Both Fcer1a and Gata3 are associated with Btk and Txk in the mouse model and vole
The M. musculus network included seven nodes (the proteins Fcer1a and Gata3, as well as Txk, Btk, Jun, Fos and Itk) and 18 edges (Figure 2a). The M. agrestis network included six of these nodes connected by 10 edges (Figure 2b). Itk could not be included as it was not annotated in the vole genome. Nine of 18 of the edges in the M. musculus network were identified, in addition to a significant edge between Btk and Txk (q = À.32; p < .01). Btk was found to be significantly correlated with both Fcer1a (q = .26, p = .02) and Gata3 (q = À.41, p < .001), as was Txk (Fcer1a: q = À.23, p = .03; Gata3: q = .43, p < .001).

GCC ACC ACT
Gata3 expression level

| DISCUSSION
In this study, we have found an association between polymorphism in the gene Fcer1a and the expression of the transcription factor Gata3, which has previously been identified as a marker of tolerance to infection in this system. We have also shown that this polymorphism is comparable in explaining power to other nongenetic variables previously identified as important predictors of Gata3 expression (Jackson et al., 2014).
Our results indicate that genotype has the potential to play an important role in driving consistent individual differences in immune gene expression in the wild (Arriero et al., 2017). This suggests that individuals are, to a significant, detectable degree, hard-wired to respond in a certain way to challenges from parasites and pathogens.
However, little is known about how natural selection acts on tolerance. Previous studies have found evidence for tolerance being less costly than resistance (Howick & Lazzaro, 2014). Under this scenario, one may expect tolerance to evolve more quickly and to have lower levels of genetic variation than resistance (R aberg, 2014). Indeed, some evidence for positive directional selection on tolerance already exists (Hayward et al., 2014). However, genetic variation may also be maintained by temporal shifts in the strengths and directions of selection pressures. This may lead to low frequencies of individual haplotypes, as observed here.
Our results also shed light on the potential molecular and physiological mechanisms driving tolerance in the wild, which hitherto  have been neglected. We find no effect of polymorphism in the Gata3 gene on its own expression, but rather a trans-acting effect of Fcer1a on Gata3 expression. By starting with an existing mouse protein-protein interaction network and subsequently validating this using a novel vole expression data set, we have also found evidence for a functionally relevant mechanism for this association.
Fcer1a encodes the alpha chain of the high-affinity receptor for immunoglobulin epsilon (IgE). This receptor is expressed on basophils, mast cells and eosinophils. When activated by an antigen interacting with Fcer1-bound IgE, these cells promote a cascade of antimacroparasitic Th2 responses, of which Gata3 is also a mediator. This is reflected by the fact that, among other proteins, both Gata3 and Fcer1a are known to interact with two nonreceptor kinases: Tyrosine-protein kinase BTK (Btk) and Tyrosine-protein kinase TXK (Txk). Btk plays a key role in B-cell development, differentiation and signalling (Maas & Hendriks, 2001), and Txk exerts its effects on Th cell differentiation and function (Sahu et al., 2008).
We were able to validate both of these interactions using our own expression network for M. agrestis. This suggests that the polymorphism in question may be working at the transcriptional level, leading to changes in the expression of Th2-related genes and hence potentially altering the strength of the Th2 response.
We focus here on tolerance, as this is a neglected area of study, but a diversity of immune strategies have been identified in natural populations (Abolins, Pocock, Hafalla, Riley, & Viney, 2011;Buehler et al., 2008). In our own study population of voles, we have shown a link between Gata3 expression and macroparasite resistance in immature male voles (Jackson et al., 2014), suggestive of an important role for Gata3 not just as a marker of tolerance, but more generally, of the immune strategy adopted by an individual. Indeed, this is consistent with previous work in a laboratory setting, which shows that polymorphism at a single locus can confer both resistance and tolerance (Ayres & Schneider, 2008;Miyairi et al., 2012). In the context of tolerance though, these results could have important implications for controlling the spread of disease, as high levels of tolerance can be associated with neutral or even positive effects on parasite prevalence (Miller, White, & Boots, 2006;Roy & Kirchner, 2000) and tolerant individuals can act as "superspreaders," responsible for a large proportion of transmission events (Lloyd-Smith, Schreiber, Kopp, & Getz, 2005). In general, the identification of tolerance genes or haplotypes could facilitate the identification of such high-risk individuals, enabling more targeted control and helping to prevent the spread of disease in the wild. On the other hand, tolerance is also associated with good health and condition despite infection, which could act as a potential pathway for the development of new treatments for infectious disease (Medzhitov et al., 2012;R aberg, 2014). Mapping out the network mediating the effects of a tolerance gene is a first step towards this. For these reasons, we believe this is an exciting and rare example of a candidate tolerance gene in a natural population, which we hope to continue monitoring to shed further light not only on tolerance, but on immune strategy more generally, in the wild.

ACKNOWLEDG EMENTS
The authors wish to thank the many individuals involved in obtaining