Genotype–environment associations reveal genes potentially linked to avian malaria infection in populations of an endemic island bird

Patterns of pathogen prevalence are, at least partially, the result of coevolutionary host–pathogen interactions. Thus, exploring the distribution of host genetic variation in relation to infection by a pathogen within and across populations can provide important insights into mechanisms of host defence and adaptation. Here, we use a landscape genomics approach (Bayenv) in conjunction with genome‐wide data (ddRADseq) to test for associations between avian malaria (Plasmodium) prevalence and host genetic variation across 13 populations of the island endemic Berthelot's pipit (Anthus berthelotii). Considerable and consistent spatial heterogeneity in malaria prevalence was observed among populations over a period of 15 years. The prevalence of malaria infection was also strongly positively correlated with pox (Avipoxvirus) prevalence. Multiple host loci showed significant associations with malaria prevalence after controlling for genome‐wide neutral genetic structure. These sites were located near to or within genes linked to metabolism, stress response, transcriptional regulation, complement activity and the inflammatory response, many previously implicated in vertebrate responses to malarial infection. Our findings identify diverse genes – not just limited to the immune system – that may be involved in host protection against malaria and suggest that spatially variable pathogen pressure may be an important evolutionary driver of genetic divergence among wild animal populations, such as Berthelot's pipit. Furthermore, our data indicate that spatio‐temporal variation in multiple different pathogens (e.g. malaria and pox in this case) may have to be studied together to develop a more holistic understanding of host pathogen‐mediated evolution.

Importantly, these mechanisms and divergent host-pathogen coevolutionary cycles across a host's geographic range may lead to spatial differences in the variants conferring defence against a given pathogen at any given time.Investigating the distribution of genetic variation within and among host populations facing a given pathogen can help to identify the loci and mechanisms that underlie the host's evolutionary response to that pathogen.
It can, however, be challenging to identify the underlying host loci involved in modulating host-pathogen interactions in natural systems.Analyses of vertebrate pathogen defences tend to have focused on well-known resistance-related genes (i.e.those that prevent or limit infection) (Aguilar et al., 2004;Brouwer et al., 2010;Davies et al., 2021;Hawley & Fleischer, 2012;Quéméré et al., 2015;Tschirren et al., 2011).By comparison, the genes involved in tolerance mechanisms which act to mitigate fitness costs by minimising direct or immune-mediated damage have been largely overlooked, despite evidence that tolerance is an important host defence strategy (Råberg et al., 2007(Råberg et al., , 2009)).The candidate gene approach, as well as other approaches such as genome-wide association studies (GWAS), also requires reliable individual-based phenotypic datasets to establish robust links between genetic variants and pathogen infection.However, limited knowledge of individual-level exposure to, or recovery from, infectious diseases in wild host systems can confound or conceal associations with genotypes.In such complex scenarios, bottom-up genome scanning methods therefore offer a valuable alternative to uncover potentially adaptive loci and identify candidate genes for testing in future in-depth studies.These methods involve systematically screening thousands of variants across the genome, operating without the constraints of pre-existing knowledge about candidate genes or specific individual host phenotypes (Pardo-Diaz et al., 2015).
Conducting genome scans to investigate genomic variation linked to divergent and spatially variable selection entails two primary methods: genetic differentiation-based approaches and genotype-environment association (GEA) analyses (Hoban et al., 2016;Schoville et al., 2012).While differentiation-based methods (such as those that involve F ST outlier tests or alternative statistics) identify loci that exhibit extreme values of differentiation among populations compared to the genome-wide average (Beaumont, 2005;Lewontin & Krakauer, 1973), GEA analyses identify loci that show a strong association between the geographic pattern of allele frequencies and spatial variation in an environmental variable (Rellstab et al., 2015).
GEA methods can therefore help to elucidate the environmental selective pressure driving the allele frequency patterns, yet they do not explicitly incorporate individual phenotypic data.Instead, they operate on the assumption that population-level variation in an environmental variable (e.g. in pathogen prevalence) is representative of the strength of selective pressure acting on phenotypes in that population.Importantly, a number of model-based GEA tools exist that also account for the effects of neutral genetic structure across populations (e.g.latent factor mixed models, Bayenv, BayPass, Günther & Coop, 2013;Gautier, 2015;Frichot et al., 2013).Thus, the application of GEA in host-pathogen systems can help to identify the genetic variants that may have accumulated in different populations of the same host species due to spatially variable selective pressure from a specific pathogen (e.g.Fumagalli et al., 2011;Garroway et al., 2013;Mackinnon et al., 2016;Zueva et al., 2014Zueva et al., , 2018)).
Avian malaria (here defined widely as infection by Plasmodium and Haemoproteus spp.) is a well-studied disease of wild birds due to its near-global distribution, wide host range and relevance for host health and conservation (Valkiunas, 2005).Infected birds typically undergo an acute phase, whereby high parasite loads are briefly experienced soon after infection, followed by a chronic phase with low parasitaemia (that may be undetectable in the blood) which can last throughout an individual's lifetime with potential relapses (Lapointe et al., 2012).Evidence from experimentally infected birds has shown reductions in condition and survival as a result of high parasitaemia (Atkinson et al., 1995;Palinauskas et al., 2008;Yorinks et al., 2000).Wild-caught birds are more likely to have chronic levels of malaria rather than acute infections because the probability of capturing an individual depends on its survival and mobility (Mukhin et al., 2016); however, numerous studies have also shown negative effects on fitness including survival, lifespan (Asghar et al., 2015;Lachish et al., 2011;Marzal et al., 2008) and reproductive success (Asghar et al., 2011;Pigeault et al., 2020).Collectively, evidence suggests that malaria has the potential to drive host variation in both immune genes and other parts of the genome related to health and homeostasis.
Berthelot's pipit (Anthus berthelotii) is endemic across three archipelagos in the North Atlantic -the Canary Islands, the Madeira archipelago and the Selvagens.It has only relatively recently dispersed across this island system (c.8500-50,000 years ago) (Illera et al., 2007;Martin et al., 2023;Spurgin et al., 2014) since its origin in the Canary Islands c. 2.1-2.5 million years ago (Martin et al., 2023;Voelker, 1999).Previous genomic work has shown an absence of subsequent gene flow between populations (Martin et al., 2021), thus potentially permitting population differentiation within and among archipelagos through local adaptation.The pipit therefore provides a useful system for studying the initial evolutionary responses to differential selective pressures in a wild vertebrate, including pathogen-mediated selection (e.g.Armstrong et al., 2018Armstrong et al., , 2019;;González-Quevedo et al., 2015, 2016).Patterns of avian blood pathogen pressure are due to the combined effect of climatic, biogeographic and anthropogenic factors (González-Quevedo et al., 2014;Illera et al., 2017;Padilla et al., 2017;Sehgal, 2015;Spurgin et al., 2012) which are highly heterogenous within and among these island archipelagos.Importantly, prevalence of pathogen infection in Berthelot's pipits, that is, avian malaria and pox (Avipoxvirus species), differs markedly between populations, aligning closely with biogeographical expectations based on island size and isolation (Spurgin et al., 2012).Simultaneously, prevalence demonstrates relative consistency within populations, as evidenced by studies conducted over a time frame ranging up to 15 years (Illera et al., 2008;Sheppard et al., 2022;Spurgin et al., 2012).Different populations are therefore likely to have been exposed to consistently different levels of pathogen-mediated selection leading to variation in disease resistance/tolerance evolving across the Berthelot's pipit geographic range.
Polymorphisms in key genes such as Toll-like receptors (TLRs) and the major histocompatibility complex (MHC), as well as a variant near to interleukin-16 (an inflammatory regulator), were found to be associated with malaria (Armstrong et al., 2018(Armstrong et al., , 2019;;González-Quevedo et al., 2016).However, it remains uncertain whether the variants underlying individual trait variation exhibit patterns of local adaptation across varying spatial scales, given the potential influence of distinct coevolutionary processes.Observations within populations or archipelagos may not necessarily extrapolate to broader spatial contexts (Armstrong et al., 2018;Bonneaud et al., 2006;Kloch et al., 2010).Genome scans using measures of population differentiation (EigenGWAS; Chen et al., 2016) have also revealed other loci affecting immunity, homeostasis and metabolism that are highly divergent between and/or within archipelagos (Armstrong et al., 2018;Martin et al., 2021).However, such an approach does not incorporate environmental or disease data, thus revealing little about how these drivers of spatially variable selection affect specific sites in the genome.Furthermore, differentiation-based methods primarily identify the most divergent SNPs, whereas GEA methods are more likely to detect loci with gradual allele frequency changes in response to spatially varying environments (Hancock et al., 2010).
There is also evidence that GEA tests possess higher power than differentiation outlier methods in detecting adaptive loci within island models (Lotterhos & Whitlock, 2015).A broad, multi-population, multi-gene assessment of associations between pathogen prevalence and genetic variation would therefore be beneficial within this species to gain further insights into the genetic basis of disease.
We recently utilised a Bayenv method with restriction siteassociated DNA sequencing (RAD-seq) data across the Berthelot's pipit range to uncover candidate genes involved in the host's response to avian poxvirus (Sheppard et al., 2022).Here, we use RAD-seq data and population-level measures of avian malaria prevalence from Berthelot's pipit, in conjunction with the Bayenv method, to identify loci that may have evolved in response to avian malaria infection.We then explore the overlap between previously identified avian pox-associated single nucleotide polymorphisms (SNPs) and malaria-associated SNPs in an attempt to disentangle the effects of both diseases.We hypothesise that there could be a limited degree of overlap in SNPs linked to genes governing resistance, primarily due to the specificity of pathogen-targeting mechanisms.
In contrast, SNPs linked to genes influencing tolerance are expected to exhibit a higher degree of overlap, given the shared mechanisms aimed at mitigating harm during infections.

| Blood sampling
We used 1574 Berthelot's pipits blood samples obtained between 2005 and 2020 as part of other studies (Armstrong et al., 2019;González-Quevedo et al., 2014;Illera et al., 2007;Sheppard et al., 2022;Spurgin et al., 2012) (details in Table S1).Sampling included the 12 islands of the species' range across Macaronesia (Figure 1).Individuals inhabiting the plateau of Mount Teide (>2000 m above sea level) were considered a 13th population, separated from the lowland Tenerife population by the dense forest vegetation at moderate elevations which the pipit does not inhabit.Individuals were caught using spring nets baited with Tenebrio molitor larvae, and blood samples (c. 25 μL), collected from the brachial vein, were stored in absolute ethanol (800 μL) at room temperature.Every bird was ringed with a numbered metal band from the relevant Spanish or Portuguese authorities to prevent resampling.

| DNA extraction and molecular detection of malaria infection
Genomic DNA was extracted from blood using a salt extraction protocol (Richardson et al., 2001).Samples were screened for Haemoproteus and Plasmodium spp.based on mitochondrial genome sequences using either the nested PCR protocols described in Waldenström et al. (2004) (samples collected prior to 2019) or the multiplex PCR assay described in Ciloglu et al. (2019) (samples collected in 2019 and onwards).Recent research into the comparative effectiveness of nested and multiplex PCR methods in diagnosing malaria suggests that they exhibit comparable sensitivity for Plasmodium and Haemoproteus (Ciloglu et al., 2019;Musa et al., 2022).
However, the multiplex PCR assay outperforms the nested PCR in detecting co-infections, involving more than one species or strain of pathogen.Negative and positive controls were included in each PCR run.Each sample underwent two separate successful PCR screening events (three if the first two tests were inconsistent) and any samples that were positive twice were assigned as infected.All positive samples collected prior to 2019 (n = 400) had been previously Sanger sequenced to identify the strain of haemosporidian (see Armstrong et al., 2019;González-Quevedo et al., 2014;Illera et al., 2008;Spurgin et al., 2012).No evidence of Haemoproteus spp.infection was detected, but four strains of Plasmodium (LK6, LK5, KYS9 and PS1530) were detected with no evidence of mixed-strain infections.The LK6 strain accounted for >95% of infections, therefore testing for differences in strain-specific genetic associations was not possible.Instead, birds were classified by their Plasmodium infection status (not infected/infected) only.

| RAD sequencing
RAD-seq markers have previously been used in GWAS analyses (based on individual assessments of disease) to identify genetic variants associated with LK6 infection status in Berthelot's pipits (Armstrong et al., 2018), and in a GEA to identify genetic variants associated with poxvirus prevalence (population level) across their range (Sheppard et al., 2022).Here, we used the 'Berthelot's' double-digest RAD-seq (ddRAD-seq) library generated in Armstrong et al. (2018) and applied the same filtering steps as in Sheppard et al. (2022).The initial ddRAD-seq library was prepared were mapped to the zebra finch (Taeniopygia guttata) genome version 3.2.4(Warren et al., 2010).Loci with ambiguous genotypes in ≤3 samples or those that showed an excess of missing/ambiguous genotypes (>10% of samples) were treated as missing data.Multiple SNPs were outputted from each RAD-tag and treated as separate loci (SNPs are denoted by the RAD-tag followed by their bp distance from the beginning of the tag), resulting in 9960 SNPs.We then performed additional filtering steps using Plink version 1.9 (Chang et al., 2015), removing loci with a minor allele frequency (MAF) of <0.05 and one variant from each pair showing strong linkage disequilibrium (LD; r 2 > .5)with a sliding window of 50 kb (step size of five).

| Pathogen covariables for association analyses
Population malaria prevalence (Table S2) was calculated using the Plasmodium infection status of all birds sampled between 2005 and 2020 to account for year-to-year variability arising from natural fluctuations in infection rates and differences in sample size (cases of The distribution of Berthelot's pipit populations and the mean malaria (Plasmodium) prevalence (percentage infected) in those populations across a 15-year sampling period.Canary Island populations: EH, El Hierro; FV, Fuerteventura; GC, Gran Canaria; GOM, La Gomera; GRA, La Graciosa; LP, La Palma; LZ, Lanzarote; TEID, El Teide; TF, Tenerife.Madeiran populations: DG, Deserta Grande; M, Madeira; PS, Porto Santo.Selvagens population: SG, Selvagem Grande.
infected individuals/total number of individuals × 100).We considered utilising a binomial logistic regression to model the relationship between malaria and pox prevalence (previously estimated in Sheppard et al., 2022) across populations because prevalence is a proportion derived from counts of infected and uninfected individuals.However, model diagnostics suggested that a beta distribution provided a more suitable fit for our data.Consequently, we represent prevalence as the proportion of infected birds and employed a zero-inflated beta regression using the glmmTMB package (version 1.1.8;Brooks et al., 2017) in R version 4.3.1 (R Core Team, 2023).
A beta regression model is well-suited for variables that are doubly bounded and often exhibit non-normal distributions and heteroscedasticity (Douma & Weedon, 2019).As our results indicated a strong positive correlation between malaria prevalence and pox prevalence across populations (see Section 3), we used both the 'raw' malaria prevalence values and (in a separate model) residuals from malaria prevalence regressed on pox prevalence as inputs for the Bayenv models.This approach allowed us to examine the role of malaria independently of the influence of pox virus, while the 'raw' values encompass more environmental variability and therefore provide greater power.All values were standardised to a mean of 0 and a standard deviation of 1 prior to input into GEA analyses.

| Genotype-environment association analyses
GEAs were conducted using Bayenv2.0 (Günther & Coop, 2013) to identify associations between the allele frequencies of each SNP across populations and the gradient of malaria prevalence (or residual malaria) observed across populations.Subsequent analyses and visualisation of the output were performed using R version 4.3.1 (R Core Team, 2023).
Bayenv2.0 employs a univariate Bayesian framework and explicitly accounts for shared evolutionary history and population structure by incorporating a covariance matrix of population allele frequencies as the null model (Coop et al., 2010;Günther & Coop, 2013).The matrix was generated previously by Sheppard et al. (2022) by running Bayenv2.0 with the filtered ddRAD-seq data and averaging the last covariance matrices of 10 replicate runs, each of 100,000 iterations.
For each SNP in the dataset, the analysis assigns a Bayes factor (BF) value, which describes the strength of evidence provided by the data for the alternative model -that is, a linear relationship between allele frequency and the pathogen covariable across populations -relative to the null model.Rather than relying solely on the parametric model, we also calculated Spearman's rank correlation coefficients (ρ) and ensured candidates indicated by BFs are not confounded by outlying populations (Günther & Coop, 2013).To further confirm this, we plotted population-level malaria prevalence against population allele frequencies for any candidate SNPs, using Plink version 1.9 (Chang et al., 2015), to estimate minor allele frequencies (MAFs).Following the recommendation of Blair et al. (2014), we averaged estimates across five independent runs, each of 1,500,000 iterations and with different random seeds, because Markov chain Monte Carlo (MCMC) algorithms can generate high run-to-run variability (mean pairwise correlation between the five runs for the main analysis: for BF values, R = .690± .222, and for ρ, R = .997± .0001).
SNPs ranked above both the 99th percentile of averaged BF and the 90th percentile of averaged absolute value of ρ were then considered robust candidates, in line with the manual (Gunther & Coop, 2018).We used the zebra finch genome Taeniopygia_guttata-3.2.4 (Warren et al., 2010) in the NCBI Genome Data Viewer v.5.1 (www.ncbi.nlm.nih.gov/ genome/ gdv/ browser) to identify candidate genes.These candidates were defined as genes located within a 10 kb proximity either upstream or downstream of a malaria-associated SNP.
Given the observed extent of LD within populations of Berthelot's pipit (Armstrong et al., 2018;Martin et al., 2021), we consider candidate loci located at this distance to be physically linked to the respective gene.

| RE SULTS
Across the Berthelot's pipit geographic range, there was considerable inter-population variation in malaria (Plasmodium) prevalence (Table S2).Two populations were completely free of infection (Deserta Grande and Selvagem Grande) and prevalence in infected populations ranged between 1.1% in Madeira and 66.3% in Porto Santo -neighbouring populations in the Madeiran archipelago.
Malaria was detected in every population within the Canary Islands, where the general pattern of prevalence is characterised by a negative east-west gradient (Figure 1).At the population level, the proportion of individuals infected with malaria was positively associated with previous estimates (Sheppard et al., 2022) of pox infection within this species (beta regression estimate = 9.16 ± 1.99, p < .01;A total of 2334 SNPs with minor allele frequency >0.05 were retained for analysis with Bayenv following LD filtering (r 2 > .5) of the ddRAD-seq dataset (Armstrong et al., 2018).Twenty-three SNPs (c.1% of the total SNPs) passed the stringent thresholds (Figure 3) and were identified as having an association between allele frequencies and malaria prevalence at the population level following five runs of Bayenv.These associations did not seem to be driven by single outlying populations (Figure 4).The SNPs found in both the top 1% of the averaged BFs (>6.68) and 10% of the averaged absolute values of Spearman's ρ (>0.25) were located in proximity to (within 10 kb upstream or downstream) or within 20 annotated candidate genes (Table 1).Specifically, eight SNPs were located within genes (34.8%), seven were located 10 kb up-or downstream of a gene(s) (30.4%), one was located within 10 kb of a pseudogene (4.3%) and seven were not near genes (30.4%).
When the same Bayenv analysis was performed using 'residual malaria', 18 SNPs exhibited a strong association.However, several of these associations appeared to be influenced by the most divergent, malaria-free population(s) (Figure S2).These SNPs were located in proximity to or within 13 candidate genes (Table S3).Notably, two of these SNPs, corresponding to three candidate genes, demonstrated significant associations that were shared between the analyses conducted with 'raw' prevalence values and 'residual malaria'.Specifically, eight SNPs were located within genes (34.8%), three

F I G U R E 3
The distribution of Bayes factor values and absolute Spearman's rank correlation coefficients (ρ), generated and averaged from five replicate runs of Bayenv, for genome-wide ddRAD SNPs among 13 Berthelot's pipit populations with respect to malaria prevalence.SNPs were considered candidates for associations with population-level malaria prevalence if they ranked in the highest 1% of Bayes factor values (>6.68, threshold indicated by the vertical red line) and 10% of Spearman's ρ (threshold indicated by the horizontal red line).Twentythree SNPs were identified as candidates (in red).

F I G U R E 4
Minor allele frequency (MAF) of 23 candidate SNPs identified by testing for associations with population-level malaria prevalence in Bayenv.Bayenv corrects for neutral population structure among allele frequencies in the analysis.Nearby genes are noted below the SNP names.Populations are grouped by archipelago (CI, Canary Islands; M, Madeira; S, Selvagens) and then ordered according to malaria prevalence (highest-lowest).Malaria-free populations are indicated by an asterisk.Acronyms: DG, Deserta Grande; EH, El Hierro; FV, Fuerteventura; GC, Gran Canaria; GOM, La Gomera; GRA, La Graciosa; LP, La Palma; LZ, Lanzarote; M, Madeira; PS, Porto Santo; SG, Selvagem Grande; TEID, El Teide; TF, Tenerife.
were located 10 kb up-or downstream of a gene(s) (16.7%), one was located within 10 kb of a pseudogene (5.5%) and six were not near genes (33.3%).

| DISCUSS ION
Applying a broad multi-population GEA approach, we identify candidate SNPs associated with avian malaria across the range of Berthelot's pipit.Population-level allele frequency variation at these candidate SNPs correlated with the local prevalence of malaria after controlling for genome-wide divergence due to neutral structure.
These results reveal genes potentially involved in a host's response to malaria and suggest that local adaptation to spatially variable pathogen pressures, such as malaria, may be an important driver of genetic variation in Berthelot's pipit.
Spatially variable selective pressure from avian malaria has been shown to shape host genetic structure across populations (Foster et al., 2007), drive variation in immunity and health (Names, Schultz, Hahn, et al., 2021) and drive the evolution of variable tolerance (Atkinson et al., 2013).In the present study, the main GEA analysis revealed 23 loci that showed strong evidence of association with population-level malaria prevalence, likely as a result of selective pressure.While Bayenv corrects for neutral population allele frequency patterns (Coop et al., 2010;Günther & Coop, 2013), the methods used here are correlative so inferences can be confounded by collinearity along the axis of interest, whether with an additional environmental variable (discussed below in relation to pox) or the principal axes of neutral population structure.From previous work, malaria prevalence seems to be independent of genetic structure across Berthelot's pipit populations; rather, other biogeographical and environmental factors likely explain the distribution of avian malaria (Spurgin et al., 2012).For instance, Porto Santo experiences the highest levels of malaria prevalence yet other populations within the same archipelago, between which there is little genetic structure, are completely free of avian malaria (Figure 1).Hence, it is unlikely that prior distribution of susceptible alleles through founder effects would drive associations between malaria and specific loci.
Instead, the identified genomic associations are consistent with A set of 20 malaria-associated candidate genes were located within 10 kb of the 23 associated SNPs, many of which have not been previously linked to divergence across this system.The molecular mechanisms that underlie the host response to malaria have predominantly been investigated in humans and murine models, as well as captive and natural populations of birds.We did not identify some of the immune-related genes commonly reported to influence malaria-related phenotypes in these taxa such as g6pD, abO, cd40lG, tnF, atp2b4, beta globins, interleukins, toll-like receptors or MHC (Bonneaud et al., 2006;Longley et al., 2011;MalariaGEN, 2019;Ravenhall et al., 2018;Rockett et al., 2014;Sepil et al., 2013).This may in part be due to a lack of genomic resolution to evaluate these in our study (see below) or the differences in specific host-pathogen interactions.However, we found a few genes that function in the innate immune system and inflammatory response (c8A, c8B and smad2).For example, all complement pathways culminate in the formation of the membrane attack complex -an important innate immune effector that transverses pathogen cell membranes and facilitates pathogen clearance -of which complement 8 (c8) is a key component (Bayly-Jones et al., 2017).Studies in both humans and mice have revealed the complement system is activated against malaria ( Kurtovic et al., 2020;Silver et al., 2010) with elevated levels of membrane attack complex formation observed during infection (Roestenberg et al., 2007).In birds, other genes of the complement system have been shown to be differentially expressed during infection with malaria (Videvall et al., 2015) and there is evidence that elevated complement activity might act to control parasitaemia (Ellis et al., 2015).Also, smad2 encodes an intracellular transcription factor, activated by the transforming growth factor β (TGFβ) signalling pathway, which together regulates inflammation and interacts with other immune system pathways.TGFβ itself was inferred to be under selection from malaria in low-elevation populations of Hawai'i 'amakihi (Chlorodrepanis virens) (Cassin-Sackett et al., 2019).Given the dual role inflammation plays in malaria clearance and associated pathological changes (Popa & Popa, 2021), genes involved in its regulation may represent important mechanisms of malaria tolerance.Overall, these genes correspond with previous findings that suggest both the innate immune system and inflammatory response are commonly initiated in response to malaria infection, thus these genes seem appropriate candidates for further study into responses to malaria infection.
Recent transcriptomic studies have uncovered genes that are upregulated or downregulated during infectious disease (e.g.Boštjančić et al., 2022;Chu et al., 2016;Huang et al., 2013;McNew et al., 2022;Rosenblum et al., 2012).Those examining gene expression changes in the host response to malaria have identified genes that were related to a number of functions beyond those strictly within the immune system, including the stress response, cell death regulation, metabolism, cell signalling, transcriptional regulation and telomerase activity (Quin et al., 2017;Videvall et al., 2015Videvall et al., , 2020)).In the present study, we detected associations with malaria prevalence within two of the same genes that were previously overexpressed in infected avian hosts (Videvall et al., 2015), ube2H and cwc27.Both of these genes have been associated with metabolic processes; ube2H is involved in ubiquitin-dependent protein catabolism, targeting proteins for degradation and can also affect antigen processing and presentation (Stewart et al., 2016), and cwc27 is a spliceosome component likely involved in mRNA processing and protein metabolic processes (Busetto et al., 2020).Among the malaria-associated variants identified in the present study, many were also located near genes with links to protein and RNA metabolism (hspa8,sod1,mylk3,dupd1,kat6B,zcchc14,znf423,xrcc6bp1) and a single variant was near a gene involved in fatty-acid oxidative metabolism (mlycD).Immune functions incur high energetic and metabolic costs (Lochmiller & Deerenberg, 2000;Martin et al., 2003;Ots et al., 2001) and there is some evidence that energy costs of infection may impact body condition in Berthelot's pipit (Spurgin et al., 2012), thus metabolic adaptations could influence host fitness.
During malaria infection, the production of reactive oxygen species (ROS) occurs as a result of phagocytosis (which is beneficial for pathogen clearance), as well as host inflammatory signalling, host metabolic changes induced by infection and the degradation of haemoglobin by the pathogen itself (Vasquez et al., 2021).Thus, the production of ROS is a key component of the host defence response, however, high levels can lead to non-specific oxidative damage to host lipids, proteins and DNA (Szabó, 2003).The imbalance between oxidants and antioxidant defences is referred to as oxidative stress and is a common occurrence across a variety of taxa experiencing  (Pamplona & Costantini, 2011).Among the candidate genes we identified in this study, hspa8, an isoform of heat shock protein 70 (hsp70), contributes to the protein damage stress response (Sottile & Nadin, 2018).hsp70 has previously been shown to be highly differentiated between malaria-naïve and -exposed populations of Hawai'i 'amakihi (passerine birds) and therefore inferred to be under selection (Cassin-Sackett et al., 2019).Another candidate gene, xrc-c6bp1, is potentially involved in the repair of DNA double-strand breaks (Fischer & Meese, 2007), but as far as we are aware, it has not been linked to malaria response previously.A variant near sod1 was also identified.This gene encodes a well-known antioxidant enzyme that catalyses superoxide radicals, thereby regulating levels of ROS and limiting their potential toxicity (Fridavich, 1995).
Superoxide dismutase has been repeatedly linked to malaria infection across a range of taxa, including evidence for lower antioxidant activity in birds (Jiménez-Peñuela et al., 2023), decreasing levels in monkeys (Srivastava et al., 1992) and downregulation of gene expression (Reuling et al., 2018) and links with parasitaemia in humans (Andrade et al., 2010).There is also evidence that polymorphism within sod1 may be associated with malaria susceptibility (Fernandes et al., 2015).Therefore, it is plausible that variation within or near these genes may mediate stress, minimise tissue damage and affect the host's ability to tolerate infection by malaria.
Many of the genes identified in our study are regulatory (e.g.hspa8, smad2, c8A, c8B, sod1, mylk3, mlycD, kat6B, znf423, rpap3, crhr2, adgrd1), which suggests that mechanisms for pathogen tolerance may be an important part of the response to malaria infection in Berthelot's pipit.Furthermore, our findings suggest malariamediated selection may drive the evolution of parts of the genome involved in diverse functions related to metabolic processes, stress response and transcriptional regulation, as well as the innate immune system and inflammatory response.This result underscores the importance of genome-wide approaches for the identification of candidate genes that underlie host-pathogen interactions, even when their relevance may not be immediately apparent.Some of the genes we have identified have multifarious roles in broad cellular processes (Table 1), therefore understanding the exact mechanism by which these genes influence the host's malaria response is not possible here.While much of the variation that we have identified seems to be linked with appropriate candidate genes that may putatively confer defence against malaria, we stress that validating an adaptive role requires further exploration through functional, General signatures of selection in the genome of Berthelot's pipit have previously been investigated at the population and archipelago level using differentiation-based analyses (Armstrong et al., 2018;Martin et al., 2021), with a single overlapping candidate gene region identified between studies at these different scales.In contrast to differentiation-based methods, GEA methods have the capability to identify the specific environmental variables driving selection, providing a more direct insight into the selective pressures at play.
Here, we have specifically looked at loci that align with malaria prevalence across the geographic range.One of the genes identified in the current study (znf598), which codes for a zinc finger protein, was inferred to be under divergent selection between archipelagos (Armstrong et al., 2018).This family of proteins can interact with DNA, RNA and proteins and are involved in numerous cellular functions including transcriptional regulation, ubiquitin-mediated protein degradation and regulation of apoptosis (Cassandri et al., 2017), but the link to malaria protection is unclear.Thus, the most strongly divergent SNPs among populations of Berthelot's pipit (as identified in differentiation-based analyses) may not be shaped by pathogen pressures because we only detected a signal for one of the same genes using the GEA.
We detected genetic associations with population-level malaria prevalence within our RAD-generated SNP set.While RAD-seq is a cost-effective way to generate genome-wide marker sets, it is a reduced representation sequencing approach so potentially important genes in unrepresented regions could have gone undetected.
This could explain why we did not detect immune genes identified as being relevant for malaria resistance in other taxa (mentioned above), or additional immune genes overall since only a small fraction of avian immune genes can be evaluated with this dataset (Sheppard et al., 2022).Equally, some identified variants were not located within genes, which could either reflect the limited density of the marker set or potentially important functional non-coding regions.Even when associated variants were identified within genes, some lack research (e.g.zcchc14) making it difficult to interpret their potential function during infection.It is also important to note that while Bayenv is a powerful tool, it is not without weaknesses.False positives can derive from confounding correlations between environmental factors, which may be especially difficult to separate at the between-population level (see below), or even strong genetic drift.We employed a rather conservative approach, selecting only the candidates that ranked above high percentiles.While this could increase the number of false negatives, the aim of this study was to detect the top candidate gene set that merits future withinpopulation investigation, rather than every possible association with malaria.Future research aimed at detecting additional variants Concurrent infections of avian malaria and poxvirus have been documented in a number of host species (e.g.Alley et al., 2010;Atkinson et al., 2005;Ha et al., 2012;Samuel et al., 2018), including Berthelot's pipits (Spurgin et al., 2012).Although high pathogen prevalence itself might suggest a strong likelihood of coinfection, our previous research demonstrated non-random cooccurrence within the pipit populations of Tenerife and Porto Santo through individual-based analyses (Sheppard et al., 2022).
In that same study, we used a Bayenv analysis to identify genetic associations with pox prevalence differences between populations.Here, we show that patterns of both malaria and pox prevalence are similarly structured at the population level, thus making it technically difficult to isolate the effects of either pathogen in this natural system.Due to the univariate nature of Bayenv2.0, the model lacks the capability to account for shared variance among multiple predictors.Therefore, caution must be exercised when interpreting loci correlated with multiple environmental variables.
While some identified genes may be associated with pox rather than malaria, and vice versa, there is variability allowing the detection of additional candidates involved in malaria protection.In the current study, we identified 15 novel candidate genes (17 variants) putatively involved in malaria response in Berthelot's pipit that were not identified in the previous pox analysis.Conversely, seven genes (eight variants) identified in the previous pox analysis were not associated with malaria prevalence in the current study, and only five genes (six variants) were identified in both studies (hspa8, smad2, ube2H, cwc27 and mlycD).
To better discern malaria's individual contribution as a driver of genetic variation, we also tested for genomic associations with 'residual malaria' prevalence.Beyond the three initially identified genes with 'raw' prevalence values (c8A, c8B and crhr2), the analysis identified an additional 10 candidate genes (16 SNPs).As expected, none of these genes, including the overlapping three, were identified in the previous pox study (Sheppard et al., 2022).A literature search did not reveal logical associations with malaria infection for these new genes (Table S3), however, a few have been identified as outliers/candidates in prior pipit studies using differentiation-based outlier tests (wdr72) (Armstrong et al., 2018;Martin et al., 2021) or in individual-level associations with other unrelated traits such as bill length (plekhn1) (Armstrong et al., 2018).Many of these outliers can be attributed to a combination of factors, including the demographic history of certain populations aligning with the gradient of malaria prevalence across populations.The 'residual' analysis, with reduced variation in pathogen data, has constrained power, potentially resulting in the dominance of drift-driven false positives.Notably, smaller populations like Selvagem Grande and Deserta Grande, having undergone extreme bottlenecking and being the only malariafree populations, may be driving these new associations when power is limited (refer to Figure S2).Considering the analysis with 'raw' malaria prevalence had significantly greater power, associations with residual malaria did not yield a substantially improved candidate gene set.However, future in-depth analyses examining SNPs associated with malaria in this species should undoubtedly include c8A, c8B and crhr2.

Reasons for the association between malaria and pox in
Berthelot's pipit are unknown but could include common risk factors that mediate exposure, such as host or vector densities, behaviours and climatic/habitat conditions (Hellard et al., 2015;Johnson & Buller, 2011).For example, the geographic distribution of malaria and pox infections might be shaped by similar anthropogenic factors at very local scales, as has been shown in other studies (Carrete et al., 2009;González-Quevedo et al., 2014;Padilla et al., 2017) and both pathogens can be transmitted among wild birds via mosquito vectors (Akey et al., 1981;Forrester, 1991).Alternatively, the co-occurrence of these pathogens may reflect pathogen-induced immunosuppression facilitating subsequent infection by a second pathogen (synergistic interaction) (Clark et al., 2016;Schat & Skinner, 2013).Many pathogenic infections, including malaria, are known to induce immunosuppression of the host response, increasing host susceptibility to other pathogens (Cox, 2001;Mabbott, 2018).An investigation of avian pox and malaria in Hawaiian birds by Samuel et al. (2018) concluded from epidemiological data that synergistic interactions or age-related acquisition were likely explanations for positive associations between these two pathogens.We believe the latter is unlikely in our system since age was not a significant predictor of malaria (Spurgin et al., 2012) or pox infection status (Sheppard et al., 2022).
We have been operating under the assumption that the covariation of SNP frequencies with malaria prevalence reflects pathogenmediated selection.It is important to note, however, that the causal relationship might also be reversed; populations with high frequencies of specific protective SNPs could potentially be more resistant to malaria.The mechanisms identified in this study, such as the regulation of inflammation, metabolism, stress response and cell death, likely represent broad defence mechanisms that could be activated during infections by various pathogens.If these mechanisms respond similarly to pox infection, then the cooccurrence patterns of avian malaria and pox across Berthelot's pipit populations could, theoretically, be attributed to spatial variation in underlying resistance or tolerance to multiple pathogens.Nevertheless, we consider this scenario unlikely, given that the spatial cooccurrence patterns align with biogeographical rules (Spurgin et al., 2012) and are also likely to be influenced by additional environmental factors (González-Quevedo et al., 2014).Regardless, the confounding factors that exist within the data presented here make it difficult to tease apart the initial drivers of pathogen cooccurrence.
Using the literature across taxa, we categorised genes identified in the GEA pipit studies (Sheppard et al., 2022, and the present study; 27 in total) that might overlap due to the collinearity between pox and malaria prevalence across populations.This categorisation was based on whether these genes have previously been directly linked to malaria, pox or both (Table 2).Among these, five genes exhibited associations with both pox and malaria in Berthelot's pipit (Tables 1 and 2).Two of these had previously been linked with both malaria and pox in other species and taxa, two had been linked with malaria only and none had been linked exclusively to pox only in other species or taxa.For genes identified exclusively in either of our two pipit studies (i.e. for an association with malaria prevalence only or an association with pox prevalence only), only malaria-associated candidates could be linked with malaria and/or pox in other species or taxa.However, a lack of research and annotation for many genes is likely to be an issue.For example, it seems genes with easily identifiable functional pathways, such as the heat shock response, complement pathway and corticosterone stress response, could clearly be linked to both pathogens.The greater number of genes linked to malaria does not necessarily imply a higher involvement of candidates in malaria response compared to pox response.It might reflect an imbalance in research efforts into malaria and pox viruses.
Ideally, we would leverage the gradient of malaria and pox prevalence across populations of Berthelot's pipits (Figure 2) to determine the most suitable populations to try and isolate the effect of each pathogen.In many populations where one pathogen appears to be absent, the other is present but at low infection rates (c.1%-5%).Consequently, selective pressure from that pathogen might be too weak to detect any associations.Outlier populations TA B L E 2 Evidence from the literature for links between the candidate genes identified in Berthelot's pipit and malaria and/or poxviruses in other species/taxa.

F
SNPs identified as being associated with population-level malaria prevalence using Bayenv, ordered by Bayes factor (BF) values.chain Encodes subunits of the complement system component essential for the formation of the membrane attack complex (MAC).The MAC serves as an important immune effector, creating pores in the membranes of target cellscontrol (RQC) pathway when a ribosome has stalled during translation May play a role in regulated exocytosis Note: SNPs highlighted in blue text were previously also identified as being associated with pox prevalence (Sheppard et al., 2022) using the same genetic dataset in a Bayenv analysis.a Chromosome location and base position based on Taeniopygia_guttata-3.2.4 genome assembly.b Within 10 kb of focal SNP.TA B L E 1 (Continued) locally divergent selection across the species' range and provide support for the hypothesis that there is adaptation to the local pathogen regime.
experimental and genomic investigations.If the GEA has identified genes involved in local adaptation in response to malaria, then future research should establish the association among gene-specific variants, infection status and individual host fitness.This can be achieved through extensive within-population studies or -although challenging in non-model wild species like Berthelot's pipit -experimental infection studies.
under malaria-mediated selection, or indeed corroborating the candidate genes identified in the current study, may utilise alternative statistical approaches.Specifically, harnessing the capabilities of whole-genome sequencing data would facilitate a comprehensive identification of genetic variants.The broad coverage provided by whole-genome data enhances statistical power, offering a significant advantage in capturing subtle or weak signals of selection.Furthermore, exploiting museum specimens could prove valuable for characterising historical polymorphic SNPs that are close to or have reached fixation in contemporary populations due to positive or purifying selection.
of corticosterone, the main glucocorticoid stress hormone in birds, for which high levels may result in higher malaria parasite load in a passerine bird(Names, Schultz, Krause, et al., 2021) Candidate genes listed here were identified by testing for associations with population-level avian malaria (in this study) or avian pox(Sheppard et al., 2022) prevalence in Bayenv.Genes are first ordered according to the associated pathogen(s) and then by decreasing Bayes factor value.