Genome‐wide evidence for divergent selection between populations of a major agricultural pathogen

Abstract The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using a XtX‐based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host‐driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation.

fields, which imposes strong directional selection that leads to the evolution of more virulent and drug-resistant pathogen populations (Stukenbrock & McDonald, 2008). Despite abundant evidence for rapid evolution of pathogens in agricultural ecosystems, the loci involved in adaptive evolution remain largely unknown. Identifying loci under recent positive selection in populations can provide key insights into the mechanisms of adaptation (Weigel & Nordborg, 2015).
Signatures of positive selection were found in key loci functionally shown to play a role in the evolution of virulence, that is effector genes, and fungicide resistance in pathogen populations (Aguileta, Refr egier, Yockteng, Fournier, & Giraud, 2009). During the co-evolution of hosts and pathogens, the host evolves defence mechanisms that target specific pathogen genotypes. In turn, the pathogen evolves strategies to escape recognition by the host and infect the host tissues (Jones & Dangl, 2006). Pathogen effector genes should be under particularly strong positive selection because expressing these genes can be highly detrimental or beneficial for the pathogen (de Jonge, Bolton, & Thomma, 2011;Presti et al., 2015). Strong directional selection was found to affect polymorphism in a small number of well-characterized effector genes. These genes mostly encode small secreted proteins that are under selection to avoid recognition by the prevalent host genotypes (Dai, Jia, Correll, Wang, & Wang, 2010;Van de Wouw et al., 2010). A few genes encoding cell wall degrading enzymes and host-specific toxins were also under positive selection in pathogen populations (Brunner, Torriani, Croll, Stukenbrock, & McDonald, 2013;McDonald, Oliver, Friesen, Brunner, & McDonald, 2013). The fixation of adaptive mutations in the mitochondrial cytochrome b gene and the CYP51 gene contributed to strobilurin and azole fungicide resistance, respectively (Brunner, Stefanato, & Mcdonald, 2008;Brunner, Stefansson, Fountaine, Richina, & McDonald, 2015;Delmas et al., 2017;Estep et al., 2015;Pereira, McDonald, & Brunner, 2017;Torriani, Brunner, McDonald, & Sierotzki, 2009). Despite these examples, the proportion of the genome under positive selection and the functions of the loci in these selected regions remain largely unknown.
In contrast to the environmental and genetic uniformity present at the field scale, agricultural landscapes may be highly heterogeneous at the continental scale. Environmental heterogeneity in agricultural landscapes may also be created on a temporal scale through crop rotations. These spatial and temporal differences in environmental heterogeneity are likely to impose selection on the corresponding pathogen populations. Earlier analysis of global genotypic and phenotypic diversity in allopatric fungal populations indicated that divergent selection likely contributed to local adaptation Stefansson, McDonald, & Willi, 2013;Zhan & McDonald, 2011;Zhan, Stefanato, & McDonald, 2006;Zhan et al., 2005). Divergent selection can also be detected at a very local scale, with host genotypes or fungicide treatments selecting for adapted pathogen populations even within single fields (Cowger, Hoffer, & Mundt, 2000;Walker et al., 2017). Because similar agricultural practices can lead to similar environments (e.g., by planting genetically identical crops, applying the same fertilizers and spraying the same fungicides) for pathogen populations on different continents, there are opportunities for parallel adaptation affecting the same pathogen traits. However, it remains largely unknown whether the same loci will be affected in the same way by similar selection pressures applied in different regions .
Genome-wide signatures of recent selection can be detected using genome scans. Selective sweeps are detected based on changes in genetic diversity along chromosomes, with the power to detect sweeps resting largely on hitchhiking effects between an adaptive locus and proximal polymorphisms. Scans for divergence screen for extreme population differentiation at a small subset of the loci (Nielsen, 2005;Vitti, Grossman, & Sabeti, 2013). Genome scans were successfully applied to fungal populations found in natural ecosystems. Selective sweeps were found to affect between 1% and 17% of the genome in two sister species of the anther smut fungus, Microbotryum lychnidis-dioicae and M. silenes-dioicae (Badouin et al., 2017). The sweep regions contained several genes with pathogenicity-related functions. A dominant force of selection was proposed to be adaptation to the host (Badouin et al., 2017). In other plant-associated fungi, genome-wide scans for divergent selection identified several outlier regions for population divergence along a gradient of abiotic environments that included salinity and temperature (Branco et al., 2015(Branco et al., , 2017Ellison et al., 2011). The genomes of fungal pathogens in agricultural ecosystems are likely to be similarly affected by selection due to a combination of biotic and abiotic factors, but there was no empirical evidence for this until now.
The fungus Zymoseptoria tritici is the most damaging wheat pathogen in Europe (Fones & Gurr, 2015). The fungus establishes itself first as an apparent biotroph on wheat leaves, then switches to necrotrophy after killing the host cells and finally lives as a saprotroph on the dead plant material. The fungus undergoes several cycles of sexual and asexual reproduction annually (Eyal, 1999). High levels of gene flow through airborne ascospore dispersal and frequent sexual reproduction maintain large effective population sizes, leading to a rapid decay in linkage disequilibrium (Croll, Lendenmann, Stewart, & McDonald, 2015;Zhan et al., 2005). Given these properties, we expect that Z. tritici populations should respond rapidly to selection pressures during the cropping season. Studies of field populations showed that the pathogen rapidly evolved resistance to fungicides and gained the ability to infect previously resistant hosts (Cowger et al., 2000;O'Driscoll, Kildea, Doohan, Spink, & Mullins, 2014). Recent studies based on quantitative trait loci (QTL) mapping identified candidate loci for fungicide resistance, melanization, temperature sensitivity and virulence Lendenmann, Croll, Palma-Guerrero, Stewart, & McDonald, 2016;Lendenmann, Croll, Stewart, & McDonald, 2014;Mirzadi Gohari et al., 2015;Stewart et al., 2018). Genome-wide association mapping showed that the pathogen overcame host resistance by mutations in genes encoding small secreted proteins (Hartmann, S anchez-Vallet, McDonald, & Croll, 2017;Zhong et al., 2017). Transcriptomic studies showed waves of expression of genes encoding secreted proteins (i.e., effectors, peptidases, peroxidases and cell wall degrading enzymes) and secondary metabolite production pathways during infection Rudd et al., 2015). In addition, analyses of synonymous versus nonsynonymous substitutions among Z. tritici and its wild relatives identified 27 positively selected genes . Three of these genes had an experimentally validated impact on virulence and reproduction during infection (Poppe, Dorsheimer, Happel, & Stukenbrock, 2015).
A study using a similar strategy identified diversifying selection in six of 48 genes encoding plant cell wall degrading enzymes . Although targeted analyses of loci segregating variation within the species and genes under positive selection among lineages identified candidate genes involved in adaptation, the genomic regions and corresponding genes under recent selection in populations of Z. tritici are unknown.
In this study, we aimed to identify the proportion of the genome and the specific gene functions under recent positive selection in populations of Z. tritici. We analysed whole-genome data of 123 isolates from four field populations located on three continents and planted to different wheat cultivars. In Oregon (USA), isolates were from two sympatric populations collected on the same day from two different wheat cultivars planted in the same field (Croll, Zala, & McDonald, 2013;Torriani, Stukenbrock, Brunner, McDonald, & Croll, 2011;Zhan et al., 2005). Population structure was found to be consistent with the geographic origin of the isolate, with distinct environments likely leading to locally adapted pathogen populations based on selection for fungicide resistance, temperature sensitivity and virulence (Zhan et al., 2005(Zhan et al., , 2006. Individual populations harboured substantial genetic variation based on genome-wide analyses of single nucleotide polymorphism (SNP) and gene content . Linkage disequilibrium decayed within 10 kb for all populations and within much less distance in the most polymorphic populations  We analysed a total of 123 isolates of Z. tritici that were collected from naturally infected wheat fields between 1990 and 2001: Australia (n = 26), Israel (n = 24), Switzerland (n = 27) and Oregon, USA (n = 46). In each location, all isolates were collected from a single field and cultivar, except for the Oregon isolates that were collected from two different wheat cultivars, Madsen and Stephens, growing in the same field (Zhan et al., 2005). Whole-genome sequencing data were generated for all 123 isolates Torriani et al., 2011). In brief, high-quality genomic DNA was extracted from liquid cultures and Illumina paired-end sequencing of 100-bp read length, and an insert size of ca. 500 bp was performed to generate 0.7-2.5 Gb sequence data per isolate. All Illumina sequence data are accessible from the NCBI Short Read Archive (for Accession nos, see Supporting information: Table S1).
Average coverage ranged from 89 to 299 for all 123 isolates and no evidence of chromosomal aneuploidy was found .
SNPs were filtered for quality using the GATK VariantFiltration and SelectVariants tools (QUAL < 250). We retained 1,678,238 SNPs with a genotyping rate >90% from FREEBAYES. To identify the most stringently called SNPs, we retained only SNPs called using Haplo-typeCaller and FREEBAYES using the option -diff-site of VCFTOOLS version 0.1.13 (Danecek et al., 2011). The overlap contained 1,456,070 SNPs (i.e., 83.2% of all HaplotypeCaller SNPs; Supporting information: Figure S1A). The SNP quality (QUAL) and the alternative allele frequency were highly correlated (Supporting information: Figure S1B-S1D). Pearson's product-moment correlation tests were performed using the open source software R. We excluded SNPs located on accessory chromosomes and tri-allelic SNPs. After these additional filtering steps, we retained a total of 1,375,999 SNPs for all further analyses. We annotated and predicted the effect of SNPs using SnpEff 4.3i (Cingolani, Platts et al., 2012) and SnpSift (Cingolani, Patel et al. et al., 2012).

| Assignment of ancestral and derived SNP alleles
To identify ancestral alleles at SNPs, we analysed whole-genome sequencing data of the two closest known sister species of Z. tritici.

| Analyses of population structure
We analysed the population structure of all 123 Zymoseptoria tritici isolates using three methods. We performed a principal component analysis (PCA) based on all SNPs using the software TASSEL version 5.2.14 (Bradbury et al., 2007). We estimated population differentiation by calculating all pairwise F ST fixation indices among populations (Wright, 1951). F ST was calculated using the R package {hierfstats} (Goudet, 2005) that implements Yang's algorithm (Yang, 1998). Furthermore, we used the Bayesian unsupervised genetic clustering algorithm implemented in the software STRUCTURE version 2.3.4 (Pritchard, Stephens, & Donnelly, 2000). We used an admixture model with correlated frequencies and no prior information about the population demography. The K parameter was tested for values ranging from 1 to 8 with 10 repetitions for each tested K value. We used 50,000 samples as a burn-in period and 100,000 samples per run for the Monte Carlo Markov Chain (MCMC) replicates. Parameter convergence was inspected visually. Cluster assignment probabilities were computed using the CLUMPP program (Jakobsson & Rosenberg, 2007) and prepared for visualization using the DISTRUCT program (Rosenberg, 2004). For

| Detection of selective sweeps
We detected selective sweeps using an extended haplotype homozygosity (EHH) and a composite likelihood ratio (CLR) tests. We performed each test for the four allopatric populations separately and included only SNPs with known ancestral states. For both tests, we used conservative percentile thresholds of the test statistics distribution to identify the strongest selective sweep regions. For the EHH test, we computed the integrated haplotype score (iHS) measure as implemented in the R package REHH (Gautier & Vitalis, 2012). The EHH corresponds to the decay of haplotype identity as a function of distance (Sabeti et al., 2007). Alleles favoured by positive selection are expected to be found on long haplotypes. The iHS statistic aims to detect abnormally long haplotype blocks by comparing the integrated EHH of the ancestral allele and the integrated EHH of the derived allele at each SNP. The method controls for heterogeneous recombination rates across the genome (Voight, Kudaravalli, Wen, & Pritchard, 2006). The 99.9th percentile of the distribution of absolute iHS values was used as a threshold for the detection of outlier SNPs. We clustered outlier SNPs in a set of selective sweep regions based on EHH variation around significant SNPs. Similar to Park et al. (2012), we computed windows around each core SNP based on a EHH decay threshold (Park et al., 2012). To be conservative, we computed windows around each core SNP where EHH decayed to 0.4 and grouped SNPs with windows overlapping on length >50%. Selective sweep region coordinates were based on the coordinates of the largest overlaps. As a second method, we performed the CLR test implemented in the software SWEED v3.3.2 (Pavlidis, Zivkovi c, Stamatakis, & Alachiotis, 2013). SWEED analyses the variation in the site-frequency spectrum along the chromosome and implements the composite likelihood ratios (CLRs) test of SweepFinder (Nielsen, 2005). The CLR statistic computes the ratio of the likelihood of a selective sweep at a given position (referred to as grid point) to the likelihood of a null model without a selective sweep.
The CLR statistic is robust to demographic events such as population expansions because the null model relies on the variation of the site-frequency spectrum along the sequence of the whole genome or a full contig rather than the standard neutral model (Nielsen, 2005;Pavlidis et al., 2013). We calculated the CLR within each population and for each chromosome separately at grid points for every 1 kb. As low SNP density can lead to misleading CLR scores, we computed SNPs density in 50 kb nonoverlapping windows along chromosomes at the genome-wide level. We retained only CLR values of grid points contained in 50 kb windows of at least 100 SNPs. Chromosome-wide estimations of linkage disequilibrium decay were previously estimated to range from 0.6 to 2.2 kb in these three populations . Based on these estimations, we maintained separate sweep regions if the distance between grid points with outlier CLR scores was at least 5 kb. For each identified selective sweep region, we extended the region containing outlier CLR scores by adding 2.2 kb to each end to account for potential blocks of high linkage disequilibrium. As the Australian population showed slower linkage disequilibrium decay (decay to r 2 < 0.2 within 10.2 kb; , we grouped grid points with outlier CLR scores if the grid points were <20 kb and extended selective sweep regions by adding 10 kb to each end. We considered selective sweep regions to be shared among populations and identified by both methods if the selective sweep regions overlapped by >50% in length. At last, we calculated the nucleotide diversity per site (p) and the Tajima's D (Tajima, 1989) statistic per gene using the POPGENOME R package (Pfeifer, Wittelsb€ urger, Ramos-Onsins, & Lercher, 2014). We performed the analyses in the four Z. tritici populations separately using the entire SNP data set for each population. We calculated statistics only for genes containing at least 10 SNPs (i.e., 4,865, 9,299, 9,025 and 8,096 genes in the populations from Australia, Switzerland, Israel and Oregon, respectively).

| Divergence tests and detection of outlier loci
To detect SNPs harbouring signatures of population divergence, we calculated the XtX statistic implemented in the program BAYPASS (formerly BAYENV) using the software default parameters (Gautier, 2015).
We used all SNPs detected among the 123 isolates with a minor allele frequency of at least 0.05 (732,840 SNPs in total). A covariance matrix of allele frequencies was generated for all SNPs and used in the inference model to incorporate information on the shared demographic history of the populations (Coop, Witonsky, Rienzo, & Pritchard, 2010;G€ unther & Coop, 2013). To determine a significance threshold for loci under selection, we used the R function simulate.baypass() provided by BAYPASS to generate a pseudoobserved data set of 1,000 SNPs following the specified inference model. We then computed the XtX statistics for this pseudoobserved data set. In addition, we extracted the XtX values for a set of 1,457 synonymous SNPs separated by at least 20 kb (Supporting information: Figure S5). As a conservative significance threshold, we used the maximum XtX value obtained for both data sets. In addition, we computed pairwise F ST values in 1,000-bp nonoverlapping windows for the entire set of SNP loci (Nei, 1973;Wright, 1951).
We used the R package {hierfstat} (Goudet, 2005) that implements Yang's algorithm (Yang, 1998)  To investigate the presence of regions under positive selection in the two Oregon sympatric populations, we computed the XtX statistic as described above on a set of 571,265 polymorphic SNPs with minor allele frequency >0.05. We performed the cross-population extended haplotype homozygosity (XP-EHH) test (Sabeti et al., 2007) using the R package REHH. We used as an outlier detection threshold the 99.9th percentile of the distribution of absolute XP-EHH values. For divergence scans, we clustered significant SNPs in separate genomic regions with high population differentiation if the distance between significant SNPs was at least 5 kb. This distance corresponds to the average distance of linkage disequilibrium decay (r 2 < 0.2) across the four populations.

| Analysis of loci in candidate regions
Selective sweep regions identified by genome scans were analysed for their gene content using the updated gene models produced for the reference genome (Grandaubert, Bhattacharyya, & Stukenbrock, 2015). We used an in-house pipeline to functionally annotate all | 2729 based on RNAseq data obtained from analyses of wheat seedling infections (Rudd et al., 2015). Rudd et al. (2015) measured gene transcript expression at five key stages of the wheat infection cycle (i.e., 1, 4, 9, 14 and 21 days postinfection). Expression level changes (X-fold) were calculated as the ratio of the maximum value versus minimum value of reads per kilobase of transcript per million mapped reads (RPKM) during host infection.
We performed GO enrichment analyses for the genes located in selective sweep regions detected using the site-frequency-spectrumbased method (SWEED software) and the iHS based method (REHH software). We used the R packages {GSEABase} and {GOstats} (Falcon & Gentleman, 2007). The significance of enrichments was assessed using hypergeometric tests with a false discovery rate threshold of 0.05. We included only GO terms that were assigned to at least five different genes in the genome.

| Linkage disequilibrium analyses
We analysed the linkage disequilibrium block structure in the selective sweep region affecting the polyketide synthase (PKS) 3 gene cluster on chromosome 5. For this, we used all SNPs with a minor allele frequency >0.1 in the Swiss, Israel and Oregon populations.

| Widespread and narrow selective sweep regions in pathogen populations
We performed genome scans using whole-genome sequence data generated for 123 Z. tritici isolates sampled from single wheat fields in Australia, Switzerland, Israel and Oregon, USA (Figure 1a). We detected a total of 1,375,999 high-confidence, bi-allelic SNP loci that were confirmed by two independent SNP callers. All analyses were restricted to SNPs on core chromosomes (i.e., chromosomes shared among all isolates; Supporting information: Figure S1). Population structure was consistent with the geographic origin of the isolates, and no field population showed evidence of intrapopulation genetic substructure (Figure 1b,c; Supporting information: Figure S2, Supporting information: Note S1). Overall, genetic diversity decreased with the distance from the pathogen centre of origin in the Fertile Crescent and the number of bottlenecks, consistent with previous studies (Figure 1a; Supporting information: Table S2; Supporting information: Figure S3; Zhan et al., 2005;Stukenbrock et al., 2007).
To detect the proportion of the genome that experienced recent positive selection in Z. tritici, we first screened for signatures of selective sweeps in individual populations. For this, we restricted the analyses to SNPs with known ancestral state. Using whole-genome sequence data of four and five strains from the two closely related sister species Z. ardabiliae and Z. pseudotritici, respectively, we were able to assign ancestral alleles for 584,327 SNPs (42.5%; Figure 1a; Supporting information: Table S2). Genome scans potentially produce high numbers of false positives as the demographic history of the population (i.e., population size variation and bottlenecks) can cause signatures resembling selective sweeps (Nielsen, 2005;Vitti et al., 2013). To limit the rate of false positives, we chose methods that incorporate genome-wide patterns of genetic variation, used First, we performed an extended haplotype homozygosity (EHH) test. We computed the integrated haplotype score (iHS) statistic (Voight et al., 2006) using the algorithm implemented in the R pack- Supporting information: Table S3). Sweep regions detected using the iHS statistic were on average 1.2 kb in length and covered a total of 4% of the core genome (Table 1) Figure S4). This is consistent with expectations for signatures produced by selective sweeps. However, we found no overall significant differences in diversity statistics for genes contained in selective sweeps identified by the EHH test, which may be due to the fact that EHH tests have higher sensitivities to detect incomplete sweeps.   Given that populations showed predominantly population-specific signatures of selection, we next analysed evidence for divergent selection among populations. For this, we calculated the XtX statistic implemented in the software BAYPASS that incorporates both the population co-ancestry and demography history (Gautier, 2015;Coop et al., 2010;G€ unther & Coop, 2013). The XtX statistic accounts for population structure by calculating a kinship matrix for a subsample of genome-wide SNPs. We used all detected SNPs filtered for minor allele frequency of 5% (732,840 SNPs). We identified 390 SNPs that were outliers for population differentiation based on simulations and synonymous SNPs (XtX > 16.7; Supporting information: Figure S5).

|
Highly differentiated SNPs clustered into 51 genomic regions distributed over all 13 core chromosomes (Figure 3; Supporting information:    (Rudd et al., 2015). Remarkably, one outlier SNP was located in the major (2) Percentage of genes in 10 kb nonoverlapping windows (gradient shows differences from 0 to 100%). (3) Content of transposable element sequences in 10 kb nonoverlapping windows (gradient shows differences from 0% to 50%). (4-6) Mean values calculated in 1 kb nonoverlapping windows of pairwise F ST for SNPs shared between the Swiss and Israel populations (4), the Swiss and Oregon populations (5), and the Israel and Oregon populations (6). (7) XtX statistics calculated at each SNP locus using the BAYPASS software. The red dots correspond to outlier SNPs [Colour figure can be viewed at wileyonlinelibrary.com] avirulence effector gene AvrStb6 located on chromosome 5 at 69,206 bp (Zhong et al., 2017). Outlier SNPs were also found in three genes of the PKS3 gene cluster on chromosome 5, including the polyketide synthase (5_00021) and a MFS transporter gene (5_00006; Figure 5). A strong XtX outlier region was located on chromosome 5 (Figure 5c). This region showed also high F ST values (>0.75) in the pairwise comparisons of Israel-Oregon and Israel-Switzerland (Figure 5b). A selective sweep of the PKS gene (5_00021) was also detected by the CLR test in the Swiss population (123.3-127.8 kb; Figure 5a). This region was also characterized by high levels of linkage disequilibrium (r 2 > 0.5) for pairs of SNPs at >10 kb distance in the Swiss, Israel and Oregon populations (Figure 5e), whereas linkage disequilibrium decayed generally withiñ 1-2.2 kb . The high levels of linkage disequilibrium are consistent with a recent selective sweep.

| Evidence for host-driven selection in sympatric pathogen populations
As the two sympatric Oregon populations were sampled on the same day from two different wheat cultivars, Madsen and Stephens (Zhan et al., 2005), growing in the same field, we investigated whether genotypes showed signatures of positive selection according to their cultivar of origin. We performed divergence scans using the XtX statistic but found no evidence for highly differentiated loci between populations (Supporting information: Figure S6). Using the cross-population extended haplotype homozygosity (XP-EHH) test (Sabeti et al., 2007) with a 99.9% outlier threshold, we identified 221 SNPs that were under positive selection only in one cultivarassociated population but not the other. Outlier SNPs clustered in 12 genomic regions on chromosomes 1, 2, 6, 9 and 12 (Supporting information: Figure S7; Supporting information: Table S8). A total of 170 of 221 outlier SNPs were located in 16 distinct genes. These genes encoded proteins with either no conserved domain or functions linked to general metabolism and transcriptional regulation.
The gene 9_00014 encoded a cysteine-rich extracellular membrane protein with a CFEM domain (Supporting information: Table S12). Our and previous studies suggest that the number of selective sweeps detected in sexually reproducing fungal pathogen populations is higher than found in populations of mammals or plants (Evans et al., 2014;Qanbari et al. 2014;Bonhomme et al., 2015).
These differences might be due to intrinsic population genetic properties of sexually reproducing fungal pathogens, such as high recombination rates and large population sizes that can accelerate the response to selection. Fungal populations with lower recombination rates will likely show less prevalent signatures of positive selection.
The observed differences might also be due to the genetic architecture of pathogenicity traits and the strength of selection imposed on plant pathogens. In agroecosystems, the introduction of new host cultivars with different resistance mechanisms can rapidly alter selection pressures on pathogen populations. In addition, abiotic factors such as the application of fungicides, and fluctuations in temperature and humidity can alter selection pressures during the life cycle of a pathogen. Our findings were consistent with numerous soft sweeps caused by a variety of environmental factors. Largely due to their high effective population sizes, soft sweeps are thought to be dominant in rapidly evolving microorganisms, including plant pathogens (Delmas et al., 2017;Messer & Petrov, 2013).
We found little overlap between selective sweep regions detected by CLR and EHH tests. Some degree of discordance is expected because the methods are designed to detect different types of selection signatures (i.e., shifts in allele frequency spectra versus large haplotype block size). The CLR test allows detection of hard sweeps, whereas iHS is more powerful for detecting incomplete sweeps (Pavlidis & Alachiotis 2017; Vitti et al., 2013). Little overlap between genomic regions identified with these selection scan methods is common (Evans et al., 2014;Pavlidis & Alachiotis 2017 We found fewer selective sweeps in the less genetically diverse populations from Australia and Oregon Zhan et al., 2005). Smaller population sizes and a slower decay in linkage disequilibrium are expected to make selection less efficient because beneficial mutations are more likely to be in linkage disequilibrium with slightly deleterious mutations. This is consistent with the smaller number of selective sweeps detected in populations with smaller effective population sizes. In addition to true differences in numbers of selective sweeps, the ability to detect selective sweeps can be affected by differences in the number of sampled individuals, levels of genetic diversity and the decay of linkage disequilibrium (Crisci, Poh, Mahajan, & Jensen, 2013). Although our study was based on genome scans that are robust to demographic histories of individual populations, some differences in selection signatures among populations may represent false positives.
We found differences in the number and position of selective sweeps across the Z. tritici genome among the four allopatric populations. Differences in power and demographic history may partly explain this pattern, although it is unlikely the only explanation given the extent of the observed heterogeneity. The identification of genomic regions with high population differentiation using the XtX test confirms that selection pressures were heterogeneous among these populations and favoured different alleles in different populations. We previously identified signatures of divergent selection at loci affected by copy number variation among these four populations . Such divergent selection is likely as the populations experienced differences in fungicide usage, annual mean temperatures and deployed host cultivars (Zhan & McDonald, 2011;Zhan et al., 2005Zhan et al., , 2006. In natural ecosystems, fungal populations collected across ecological gradients also showed differences in the number and identity of loci under positive selection, which is strongly indicative of divergent selection (Branco et al., 2015(Branco et al., , 2017Ellison et al., 2011). Given sufficiently low levels of gene flow, divergent selection leads to locally adapted populations. Evidence for local adaptation in plant-colonizing fungi has mostly been found at the phenotypic level for adaptation to temperature and hosts (Branco et al., 2017), but the loci underlying local adaptation remain largely unknown . Alternatives to local adaptation that can also lead to highly differentiated genomic regions include environment-independent factors such as genetic incompatibilities due to prezygotic or postzygotic isolation, epistasis or underdominance. These factors often spatially coincide with ecological factors, leading to high differentiation among spatially structured populations (Bierne, Welch, Loire, Bonhomme, & David, 2011).
As our system is haploid, allelic incompatibilities due to hybrid infertility, selection against hybrids, or underdominance cannot occur.
However, epistasis among loci under selection may indeed exist and could interfere with the power to detect loci under positive selection. The fact that we analysed complete genomes instead of a reduced set of candidate loci provided significant power to identify novel adaptive loci. Analyses of putative functions showed that many loci in the selected regions were likely to encode functions linked to host and abiotic adaptation. In the end, analysing the contribution of individual loci to phenotypic traits will enable a clear distinction between loci underlying local adaptation and endogenous barriers (Hoban et al., 2016).
The two different hosts cultivated in sympatry in Oregon may have generated divergent selection pressures that favoured host specialization. It is interesting that the two sympatric populations were shown to differ in levels of fungicide resistance and aggressiveness on two other wheat cultivars, suggesting that the cultivars planted in Oregon indeed acted as an agent of divergent selection (Yang, Gao, Shang, Zhan, & McDonald, 2013;Zhan et al., 2005).
However, the physical proximity of the two cultivars in the field and the ability of Z. tritici to disperse likely constrained the opportunity for local adaptation to either of the two host populations. We identi- showed that many of these genes were strongly upregulated upon host infection. In Z. tritici, association mapping showed that mutations in two genes encoding SSPs (8_609 and AvrStb6) were associated with host-specific virulence Zhong et al., 2017). These two genes were not found in selective sweep regions, which suggests that these populations were not experiencing recent positive selection by the host genotypes specifically associated with 8_609 and AvrStb6. Remarkably, AvrStb6 was an outlier for divergent selection among populations in the XtX scan, which suggests that in at least one population selection on the AvrStb6 locus may have been significant. AvrStb6 encodes a protein recognized by the product of the host resistance gene Stb6 and drastically reduces pathogen fitness on hosts carrying Stb6 (Zhong et al., 2017). Several codons of the AvrStb6 gene were shown to be under positive diversifying selection (Brunner & McDonald, 2018).
We found no evidence for selective sweeps in the proximity of the second known host-specific virulence gene 8_609. The 8_609 locus is characterized by large chromosomal rearrangements that led to the adaptive deletion of the virulence gene . Environmental factors are also likely to impose significant selection pressure (Stukenbrock & McDonald, 2009). Plant pathogens must cope with fluctuating temperatures and humidity, the host microbiota, variability in nutrients obtained from the host and agricultural practices (e.g., fertilizer and fungicide applications). This is reflected by the selective sweep regions that encompass genes with diverse functions that are most likely unrelated to host adaptation.
In particular, we found an enrichment in transmembrane transport functions. Transmembrane transporters are involved in a wide range of cellular processes including obtaining nutrients during the infection (Chen et al., 2010), efflux of toxic compounds and fungicides (Del Sorbo, Schoonbeek, & De Waard, 2000), virulence (Wahl, Wippel, Goos, K€ amper, & Sauer, 2010), environment sensing and stress responses (Bahn et al., 2007). Genome scans of populations of the root symbiont S. brevipes showed that adaptation to cold, salinity and water stress was most likely mediated by a series of transmembrane transporters (Branco et al., 2015(Branco et al., , 2017. However, none of the transporter genes in the identified selective sweeps have yet been experimentally shown to affect adaptation in Z. tritici. Regions affected by selective sweeps also encoded proteins playing a role in regulatory pathways (e.g., transcription factors) and secondary metabolite production. The roles of secondary metabolites produced by Z. tritici remain poorly understood, but the clusters are likely to play roles in host adaptation, competition with microbes and environmental stress (Howlett, 2006). The PKS1 cluster is involved in melanin production in Z. tritici (Butler & Day, 1998;Lendenmann et al., 2014), but compounds produced by the four secondary metabolite clusters found in selective sweep regions remain unknown. These four clusters are upregulated at different stages of the infection process, suggesting distinctive roles in host colonization . In addition to evidence for selection, the PKS9 cluster is affected by a large-scale deletion polymorphism affecting the entire cluster . The polyketide synthase gene of the PKS3 cluster was an outlier for population differentiation and was affected by a selective sweep. This gene had a peak of expression during the early establishment of the infection and may be involved in the production of a toxin targeting the host . Exposure to azole fungicides led to high levels of resistance in the Swiss population, but not in any other analysed population (Zhan et al., 2005). Therefore, sweeps due to selection for azole resistance should be restricted to the Swiss population. The CYP51 gene encoding the target of azoles is indeed highly polymorphic in the Swiss population, with a large number of amino acid substitutions contributing to resistance (Brunner et al., 2008;Cools & Fraaije, 2013;Lendenmann et al., 2015). Although we found no direct evidence of positive selection in the CYP51 locus, multiple SNPs had high XtX population differentiation values (>10).
EHH and CLR tests may be unsuitable to detect the very rapid diversification into dozens of haplotypes that are typically observed at the CYP51 locus.
Heterogeneity in both biotic and abiotic stresses creates the opportunity for local adaptation in plant pathogens Mboup et al., 2012). We found evidence for divergent selection among four allopatric populations with selection likely due to both abiotic factors and host resistance mechanisms. We also found evidence that different host genotypes in sympatry caused divergent selection pressure on the pathogen. The widespread evidence for selection across the genome suggests that the high effective population sizes of the pathogen strongly favoured rapid responses to divergent selection pressures. Loci under divergent selection among populations may ultimately constitute the genetic basis for local adaptation, but may also reveal constraints in the evolution of the pathogen. For example, selection for fungicide resistance is known to lead to correlated responses that negatively affect growth rates (Mohd-Assaad,  or virulence (Hagerty & Mundt, 2016). Using genome scans to identify genes under recent selection in pathogens will fill important gaps in our understanding of host-pathogen interactions and their evolutionary trajectory (Hall, Bento, & Ebert, 2017