The genomic determinants of adaptive evolution in a fungal pathogen

Background
Antagonistic host-pathogen co-evolution is a determining factor in the outcome of infection and shapes genetic diversity at the population level in both partners. Little is known about the overall genomic rate of evolution in pathogens or about the molecular bases of rapid adaptation. Here we apply a population genomic approach to infer genome-wide patterns of selection among thirteen isolates of the fungal pathogen Zymoseptoria tritici.

Results
Based on whole-genome alignments, we report extensive presence-absence polymorphism of genes resulting from a high degree of within-population karyotypic variation. We apply different test statistics based on the distribution of non-synonymous and synonymous polymorphisms (pN/pS) and substitutions (dN/dS) to estimate rates of adaptation and to identify specific targets of selection. We report that the pN/pS ratio of sites under purifying selection correlates negatively with the local recombination rate, as expected under a scenario of background selection. Contrasting polymorphism and divergence with a closely related species, we estimate that 44% of substitutions in the proteome of Z. tritici are adaptive, and we find that the rate of adaptation is positively correlated with recombination rate. The proportion of adaptive amino acid substitutions in genes encoding determinants of pathogenicity is as high as 68%, underlining the importance of positive selection in the evolution of virulence-associated traits. Using a maximum likelihood approach and codon models of sequence evolution, we furthermore identify a set of 786 additional genes showing signatures of diversifying selection, which are preferentially located in regions of high recombination rate.

Conclusions
We report a high rate of adaptive mutations in the genome of Z. tritici, in particular in genes involved in host-pathogen interactions. Furthermore, we identify a set of genes with signatures of balancing or diversifying selection, which are related to secondary metabolism and host infection. Finally, we report a strong impact of recombination on the efficacy of purifying, adaptive and diversifying selection, suggesting a pervasive effect of linked selection in the genome and thereby highlighting the importance of recombination in this pathogen species.
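The proportion of adaptive substitutions (α) quoted above comes from contrasting polymorphism and divergence. The estimator used by the authors is not described in this section; as a minimal sketch, the classic Smith & Eyre-Walker (2002) form of the McDonald-Kreitman ratio, α = 1 - (dS·pN)/(dN·pS), can be computed from counts of non-synonymous and synonymous fixed differences (dN, dS) and polymorphisms (pN, pS). The function name and the counts below are hypothetical and are not data from this study.

```python
def proportion_adaptive(dn, ds, pn, ps):
    """Smith & Eyre-Walker (2002) estimator of alpha, the proportion of
    non-synonymous substitutions fixed by positive selection.

    dn, ds: counts of non-synonymous / synonymous fixed differences (divergence)
    pn, ps: counts of non-synonymous / synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    # alpha = 1 - (Ds * Pn) / (Dn * Ps)
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, not data from this study.
alpha = proportion_adaptive(dn=200, ds=400, pn=120, ps=400)
print(f"alpha = {alpha:.2f}")  # alpha = 0.40
```

This simple form is biased downward when slightly deleterious non-synonymous polymorphisms segregate (they inflate pN without reaching fixation) and can even become negative, which is why methods that model the distribution of fitness effects are often preferred in practice.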

[…] selection. The genic signature of positive diversifying selection is an increased proportion of non-synonymous polymorphisms [17]. Yet, in a functionally important gene, the majority of non-synonymous sites will be evolving under selective constraint, and only a small number will be free to change without negatively affecting protein function. Therefore, in most cases, searching for signatures of positive diversifying selection simply by comparing the average proportion of non-synonymous to synonymous polymorphisms in a gene has little statistical power [18]. To circumvent this, evolutionary codon models that allow selection intensity to vary along a gene sequence can be applied to identify specific codon positions with an excess of non-synonymous relative to synonymous mutations (a higher pN/pS ratio) [18].

In this study we assess the different types of selection affecting the genome of a fungal wheat […]
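The loss of power described above can be illustrated with simple arithmetic. Assuming, for simplicity, that synonymous sites evolve at the same rate throughout the gene, the gene-wide non-synonymous/synonymous ratio is the codon-weighted mean of the ratios of the individual site classes. In the hypothetical gene below, 5% of codons evolve under diversifying selection (ratio 3.0), yet the gene-wide ratio stays far below 1, so a whole-gene comparison would not flag it. The function name and the class values are illustrative assumptions.

```python
def gene_average_ratio(site_classes):
    """Gene-wide non-synonymous/synonymous ratio for a gene whose codons
    fall into classes evolving under different selective regimes.

    site_classes: list of (fraction_of_codons, ratio) pairs; fractions sum to 1.
    Assumes a uniform synonymous rate, so the gene-wide ratio is the
    fraction-weighted mean of the class-specific ratios.
    """
    total = sum(fraction for fraction, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(fraction * ratio for fraction, ratio in site_classes)


# Hypothetical gene: 95% of codons strongly constrained, 5% under diversifying selection.
classes = [(0.95, 0.05), (0.05, 3.0)]
print(f"gene-wide ratio = {gene_average_ratio(classes):.4f}")  # gene-wide ratio = 0.1975
```

Codon site models avoid this dilution by estimating a separate ratio for each class of sites and testing whether allowing a class with a ratio above 1 significantly improves the fit.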

Genetic structure of Z. tritici populations

We generated a population genomic dataset of thirteen Z. tritici isolates obtained from different field populations in Europe and Iran (Table S1). Given the high extent of structural variation in genomes of Z. tritici isolates, we assembled each genome de novo to capture structural differences. We used the resulting thirteen genome assemblies to generate a filtered multiple genome alignment of 27 Mb (Table S2) comprising a total of 1,489,362 SNPs, of which approximately 50% are located in protein-coding regions (Table S1). We used the SNP data to compute the overall nucleotide diversity of the sample, which has a mean value of π = 0.022 per site.

Several studies have addressed the population genetic structure of Z. tritici at different spatial scales (e.g. [26,27]). These studies concluded that Z. tritici comprises a panmictic population with very little population structure. Here we used the genomic data to investigate the relationships among the Z. tritici isolates. First, we assessed population genetic structure by analyzing the ancestral recombination graph of the thirteen genomes. To this end, we slid 10 kb windows along the multiple genome alignment and estimated the genealogy for each window. The resulting 1,850 trees were combined into a super tree (Figure 1A). If the sample of genomes is taken from a panmictic population, the super tree is expected to be a star tree. However, here we observe at least two clusters, one comprising all European isolates and the other comprising two isolates collected in Iran (Figure 1A). We […]

[…] chromosome fragments and genes among the thirteen isolates. We focused on non-repetitive DNA and show, in agreement with a previous study [35], that 96.4% of the core chromosomes (chromosomes 1 to 13) are shared among all isolates (Figure S1).
However, as expected, the mapping of assembled contigs to the accessory chromosomes varied considerably between isolates (Figure S1).
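The mean nucleotide diversity reported above (π = 0.022 per site) is the average number of pairwise differences per aligned site. The sketch below computes π for a toy alignment. How gaps and ambiguous bases were handled in the study is not stated here, so skipping non-ACGT sites pair by pair is an assumption of this example, as is the function name.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def nucleotide_diversity(alignment):
    """Nei & Li's pi: mean number of pairwise differences per site.

    For each pair of aligned sequences, only sites where both carry an
    unambiguous base (A, C, G or T) are compared; gaps and Ns are skipped.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    per_pair = []
    for a, b in combinations(alignment, 2):
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in VALID_BASES and y in VALID_BASES:
                compared += 1
                diffs += x != y
        if compared:
            per_pair.append(diffs / compared)
    if not per_pair:
        raise ValueError("no comparable sites")
    return sum(per_pair) / len(per_pair)


toy = ["AAAA", "AAAT", "AATT"]
print(round(nucleotide_diversity(toy), 4))  # 0.3333
```

On a genome-scale alignment the same quantity is usually computed from per-site allele counts rather than explicit pairwise loops, which gives the same mean when no data are missing.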

Overall, 9,499 of the 11,839 genes predicted in the IPO323 reference genome are present in all thirteen isolates. Of the remaining genes, 1,584 are located on core chromosomes and 667 on accessory chromosomes. These genes are present at different frequencies among the thirteen Z. tritici isolates and include 89 genes present only in IPO323, of which 77 are located on core chromosomes.
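The gene counts above amount to a presence/absence classification across isolates. As a minimal sketch, assuming presence has already been called from the whole-genome alignment (the criterion, such as per-gene alignment coverage, is not specified in this section), genes can be split into core genes present in all isolates and non-core genes, with reference-only genes as a subset of the latter. The function name, the isolate names other than IPO323, and the gene identifiers are hypothetical.

```python
def classify_genes(presence, isolates, reference):
    """Split genes by presence/absence across a panel of isolates.

    presence:  dict mapping gene id -> iterable of isolates carrying the gene
    isolates:  all isolates in the panel (including the reference)
    reference: name of the reference isolate
    Returns a dict of gene-id lists: 'core' (present in every isolate),
    'non_core' (missing from at least one isolate) and 'reference_only'
    (present only in the reference; a subset of 'non_core').
    """
    panel = set(isolates)
    result = {"core": [], "non_core": [], "reference_only": []}
    for gene, carriers in presence.items():
        carriers = set(carriers) & panel
        if carriers == panel:
            result["core"].append(gene)
        else:
            result["non_core"].append(gene)
            if carriers == {reference}:
                result["reference_only"].append(gene)
    return result


# Hypothetical three-isolate panel; "isoA" and "isoB" are made-up names.
panel = ["IPO323", "isoA", "isoB"]
presence = {
    "gene1": {"IPO323", "isoA", "isoB"},
    "gene2": {"IPO323"},
    "gene3": {"IPO323", "isoB"},
}
classes = classify_genes(presence, panel, reference="IPO323")
print({name: len(genes) for name, genes in classes.items()})
# {'core': 1, 'non_core': 2, 'reference_only': 1}
```

Keeping reference-only genes as a subset of the non-core set mirrors the text, where the 89 IPO323-specific genes are counted among the genes not shared by all isolates.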

A previous genome comparison of two Swiss Z. tritici isolates revealed nearly 300 novel genes not characterized in the reference genome [36]. We hypothesised that other Z. tritici isolates could also contain accessory genes and chromosomes not present in the reference IPO323. To test this, we investigated the subset of contigs that did not align to the reference genome. We found a total of 5.2 Mb in the twelve re-sequenced Z. tritici isolates with no homology to sequences in the reference genome. These new Z. tritici sequences, corresponding to accessory fragments, encompass between 602 kb and 995 kb of non-repetitive DNA per genome. We applied an ab initio gene predictor to assess whether these accessory sequences encode proteins and found evidence for 1,041 novel full-length gene models encoding proteins of at least 30 amino acids. Of these, 75% encode proteins without known function, consistent with the general pattern observed on accessory chromosomes, which carry a high proportion of gene models without a predicted function [19].
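The 30-amino-acid threshold can be illustrated with a naive open reading frame scan. This is not an ab initio gene predictor: it ignores introns, splice sites and the reverse strand, all of which a real predictor models, and the function name is hypothetical. It only shows how a minimum protein length filter is applied to candidate coding sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=30):
    """Naive forward-strand ORF scan (ATG ... stop) in all three frames.

    Keeps ORFs encoding at least `min_aa` amino acids (stop codon excluded).
    Returns (start, end, n_aa) tuples with 0-based, end-exclusive
    coordinates whose span includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    orfs.append((start, i + 3, n_aa))
                start = None
    return orfs


# Synthetic sequence: ATG + 30 alanine codons + stop, i.e. a 31-aa protein.
demo = "CC" + "ATG" + "GCT" * 30 + "TAA" + "GG"
print(find_orfs(demo))  # [(2, 98, 31)]
```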

Isolate-specific fragments were previously shown to encode effector candidate genes strongly […]

Measures of adaptive evolution in Z. tritici

We next aimed to obtain a quantitative assessment of adaptive mutations in the genome of Z. […]

[…] Table 2), showing that highly expressed genes are more conserved in Z. tritici. We furthermore find a highly significant negative effect of recombination rate on ω₀, implying that purifying selection is more efficient in regions with high recombination rate […]

Lastly, we find that GC content is significantly negatively correlated with ω₀; that is, genes with higher GC content tend to be under stronger purifying selection. This may be explained by an […]

We fitted codon models for genes present in at least three isolates (83% of genes located on core chromosomes and 31% of genes on the accessory chromosomes) (Table S3) […] (Table 3), and three of them appeared to have a highly significant effect. First, we report that highly expressed genes are less likely to be under positive selection, which we hypothesise is due to a higher level of selective constraint. Next, we find that genes in regions with higher recombination rates are more likely to be under positive diversifying selection, which we interpret as evidence for a […]

In a previous study, the evolutionary history of Z. tritici was inferred using a whole-genome coalescence analysis [16]. We showed that divergence of Z. tritici and its sister species Z. […]

[…] coordinates. Based on 10,000 random permutations, only clusters containing more than three genes were significant at the 5% level.

Simulation with recombination

We used the program ms to simulate ancestral recombination graphs (ARGs) for thirteen individuals and a given recombination rate [82]. The resulting ARGs were used to […]

Figure S1: Chromosome polymorphism in the twelve re-sequenced genomes compared to […]