The transfer of a mitochondrial selfish element to the nuclear genome and its consequences

Homing endonucleases (HE) are enzymes capable of excising their encoding gene and inserting it into a highly specific target sequence. As such, they act both as intronic sequences (type-I introns) and selfish invasive elements. HEs are present in all three kingdoms of life and in viruses; in eukaryotes, they are mostly found in the genomes of mitochondria and chloroplasts, as well as in nuclear ribosomal RNA genes. Here, we report the case of a HE that integrated into a telomeric region of the fungal maize pathogen Ustilago maydis. We show that the gene has a mitochondrial origin, but its original copy is absent from the U. maydis mitochondrial genome, suggesting a subsequent loss or a horizontal transfer. The telomeric HE underwent mutations in its active site and acquired a new start codon, but we did not detect significant transcription of the newly created open reading frame. The insertion site is located in a putative RecQ helicase gene, truncating the C-terminal domain of the protein. The truncated helicase is expressed during infection of the host, together with other homologous telomeric helicases. This unusual homing event represents a singular evolutionary time point: the creation of two new genes whose fate is not yet written. The HE gene lost its homing activity and can potentially acquire a new function, while its insertion created a truncated version of an existing gene, possibly altering its original function.


Introduction
The elucidation of the mechanisms at the origin of genetic variation is a longstanding goal of molecular evolutionary biology. Mutation accumulation experiments, together with comparative analysis of sequence data, are instrumental in studying the processes shaping genetic diversity at the molecular level (Kondrashov and Kondrashov 2010; Eyre-Walker and Keightley 2007). They revealed that the spectrum of mutations ranges from single nucleotide substitutions to large scale chromosomal rearrangements, and encompasses insertions, deletions, inversions, and duplication of genetic material of variable length (Lynch et al. 2008). Mutation events may result from intrinsic factors such as replication errors and repair of DNA damage. In some cases, however, mutations can be caused or favored by extrinsic factors, such as mutagenic environmental conditions or parasitic genome entities like viruses or selfish mobile elements. Such particular sequences, able to replicate and invade the host genome, may have multiple effects including inserting long stretches of DNA that do not encode any organismic function, but also disrupting, copying and moving parts of the genome sequence. These selfish element-mediated mutations can significantly contribute to the evolution of their host: first, the invasion of these elements creates "junk" DNA that can significantly increase the genome size (Lynch 2007), and some of this material can be ultimately domesticated and acquire a new function, beneficial to the host (Kaessmann 2010; Volff 2006). Second, the genome dynamics resulting from the activity of these elements can generate novelty by gene duplication (Ohta 2000; Dutheil et al. 2016) or serve as a mechanism of parasexuality and compensate for the reduced diversity in the absence of sexual reproduction (Dong et al. 2015; Möller and Stukenbrock 2017). Finally, mechanisms that evolved to control these elements (such as repeat-induced point mutations in fungi (Gladyshev 2017)) may also incidentally affect genetic diversity (Grandaubert et al. 2014).
Selfish elements whose impact on genome evolution is less well documented are the homing endonuclease genes (HEG), encoding a protein able to recognize a particular genomic DNA sequence and cut it (homing endonuclease, HE). The resulting double-strand break is subsequently repaired by homologous recombination using the HEG itself as a template, resulting in its insertion in the target location (Stoddard 2005). As the recognized sequence is highly specific, the insertion typically happens at a homologous position. In this process, a heg+ element containing the endonuclease gene converts a heg− allele (devoid of HEG but harbouring the recognition sequence) to heg+, a mobility mechanism referred to as homing (Dujon et al. 1989). After the insertion, the host cell is homozygous heg+, and the HEG segregates at a higher frequency than the Mendelian rate (Goddard and Burt 1999). The open reading frame of the HEG is included in a sequence capable of self-splicing, either at the RNA or protein level, avoiding disruption of functionality when inserted in a protein-coding gene. This mechanism results in the so-called group-I introns or inteins, respectively (Chevalier and Stoddard 2001; Stoddard 2005). The dynamics of HEGs have been well described and involve three stages: (i) conversion from heg− to heg+ by homing activity, (ii) degeneration of the HEG leading to the loss of homing activity, but still protecting against a new insertion because the target is altered by the insertion event, and (iii) loss of the HEG leading to the restoration of the heg− allele (Gogarten and Hilario 2006; Barzel et al. 2011). This cycle leads to recurrent gains and losses of HEG at a given genomic position, and ultimately to the loss of the HEG at the population level unless new genes invade from other locations or by horizontal gene transfer (Gogarten and Hilario 2006).
HEGs are found in all kingdoms of life as well as in the genomes of organelles, mitochondria and chloroplasts (Stoddard 2005; Lambowitz and Belfort 1993; Belfort and Roberts 1997). In several fungi, HEGs are residents of mitochondria. Here, we study the molecular evolution of a HEG from the fungus Ustilago maydis, which serves as a model for the elucidation of (1) fundamental biological processes such as cell polarity, morphogenesis and organellar targeting, and (2) the mechanisms allowing biotrophic fungi to colonize plants and cause disease (Steinberg and Perez-Martin 2008; Djamei and Kahmann 2012; Vollmeister et al. 2012; Ast et al. 2013). U. maydis is the best-studied representative of smut fungi, a large group of plant pathogens, because of the ease with which it can be manipulated both genetically and through reverse genetics approaches (Vollmeister et al. 2012). Besides, its compact, fully annotated genome comprises only 20.5 Mb and is mostly devoid of repetitive DNA (Kämper et al. 2006). The genome sequences of several related species, Sporisorium reilianum, S. scitamineum and Ustilago hordei, causing head smut in corn, smut whip in sugarcane and covered smut in barley, respectively, provide a powerful resource for comparative studies (Schirawski et al. 2010; Laurie et al. 2012; Dutheil et al. 2016). We report here the case of a gene from U. maydis, which we demonstrate to be a former mitochondrial HEG recently integrated into the nuclear genome. The integration truncated the gene containing the insertion and was followed by inactivation of the endonuclease active site, which generated a new open reading frame that contains the DNA-binding domain of the HEG (Derbyshire et al. 1997).

Results
We report the analysis of the nuclear gene UMAG_11064 from the smut fungus U. maydis, which was identified as an outlier in a whole-genome analysis of codon usage. We first provide evidence that the gene is a former HEG and then reconstruct the molecular events that led to its insertion in the nuclear genome using comparative sequence analysis. Finally, we assess the phenotypic impact of the insertion event.
The UMAG_11064 nuclear gene has a mitochondrial codon usage
We studied the synonymous codon usage in protein-coding genes of the smut fungus U. maydis, using within-group correspondence analysis. As opposed to other methods, within-group correspondence analysis allows codon usage to be compared while adequately taking into account confounding factors such as variation in amino-acid usage (Perrière and Thioulouse 2002). We report a distinct synonymous codon usage for nuclear genes and mitochondrial genes (Figure 1A), with the notable exception of the nuclear gene UMAG_11064, which displays a typical mitochondrial codon usage. The UMAG_11064 gene is located in the telomeric region of chromosome 9, with no further downstream annotated gene (Figure 1B). It displays a low GC content of 30%, which contrasts with the GC content of the flanking regions (50%) and the rather homogeneous composition of the genome sequence of U. maydis as a whole. It is, however, in the compositional range of the mitochondrial genome (Figure 1B). Altogether, the synonymous codon usage and GC content of UMAG_11064 suggest a mitochondrial origin.
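To make the notion of synonymous codon usage concrete, the following Python sketch computes, for one coding sequence, the relative frequency of each codon among the codons encoding the same amino acid. These within-amino-acid frequencies are the kind of quantity that a within-group analysis compares between genes; the sketch does not perform the correspondence analysis itself. The function name and the toy sequence are hypothetical, and the standard genetic code is assumed (mitochondrial genes would require the appropriate mitochondrial code).

```python
from collections import Counter, defaultdict

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (assumption: a mitochondrial code would have to be substituted for mitochondrial genes).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_codon_frequencies(cds):
    """Return {amino acid: {codon: frequency among its synonymous codons}}."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    per_aa = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TABLE.get(codon)  # None for codons with ambiguous bases
        if aa is None or aa == "*":
            continue  # skip ambiguous codons and stop codons
        per_aa[aa][codon] = n
    return {
        aa: {codon: n / sum(codons.values()) for codon, n in codons.items()}
        for aa, codons in per_aa.items()
    }


# Hypothetical toy sequence: three alanine codons and one methionine codon.
toy_frequencies = synonymous_codon_frequencies("GCTGCTGCCATG")
```

Working within each amino acid, rather than on raw codon counts, is what keeps differences in amino-acid composition from being mistaken for differences in codon preference.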
In order to confirm the chromosomal location of UMAG_11064, we amplified and sequenced three regions encompassing the gene using primers within the UMAG_11064 gene and primers in adjacent chromosomal genes upstream and downstream of UMAG_11064 (Figure S1). The sequences of the amplified segments were in full agreement with the genome sequence of U. maydis (Kämper et al. 2006), thereby ruling out possible assembly artefacts in this region.
Surprisingly, the sequence of UMAG_11064 has no match in the mitochondrial genome of U. maydis (GenBank entry NC_008368.1), which suggests that UMAG_11064 is an authentic nuclear gene. As both the GC content and synonymous codon usage of UMAG_11064 are indistinguishable from the ones of mitochondrial genes and have not moved toward the nuclear equilibrium, the transfer of the gene to its nuclear position must have occurred recently.

The UMAG_11064 gene contains parts of a former GIY-YIG homing endonuclease
To gain insight into the nature of the UMAG_11064 gene, its predicted nucleotide sequence was searched against the NCBI non-redundant nucleotide sequence database. High similarity matches were found in the mitochondrial genome of three other smut fungi (Supplementary Table S1): S. reilianum (87% identity), S. scitamineum (79%), and U. bromivora (76%). Two other very similar sequences were found in the mitochondrial genome of two other smut fungi, Tilletia indica and Tilletia walkeri, as well as in mitochondrial genomes from other basidiomycetes (e.g. Laccaria bicolor) and ascomycetes (e.g. Leptosphaeria maculans, see Supplementary Table S1). The protein sequence of UMAG_11064 shows high similarity with fungal HEGs, in particular of the so-called GIY-YIG family (Supplementary Table S2) (Stoddard 2005).
The closest fully annotated protein sequence matching UMAG_11064 corresponds to the GIY-YIG HEG located in intron 1 of the cox1 gene of Agaricus bisporus (I-AbiIII-P). The amino-acid sequence of UMAG_11064 matches the N-terminal part of this protein containing the DNA-binding domain of the HE (Derbyshire et al. 1997). As the GC profile of UMAG_11064 suggests that the upstream region also has a mitochondrial origin (Figure 1B), we performed a codon alignment of the 5' region with the full intron sequence of A. bisporus, T. indica and T. walkeri as well as the sequence of I-AbiIII-P in order to search for putative traces of the activity domain of the HE (Figure 2). We used the Macse software (Ranwez et al. 2011) to infer codon alignment in the presence of frameshifts. We found that the intergenic region between UMAG_11065 and UMAG_11064 displays homology to the activity domain of other GIY-YIG HEs, and contains remnants of the former active site of the type GVY-YIG (Figure 2). Compared to I-AbiIII-P and homologous sequences in Tilletia, however, a frameshift mutation has occurred in the active site (a 7 bp deletion). The predicted gene model for UMAG_11064, therefore, starts at a conserved methionine position, 14 amino-acids downstream of the former active site (Figure 2). Altogether, these results suggest that UMAG_11064 is a former HE which inserted into the nuclear genome, was then inactivated by a deletion in its active site and acquired a new start codon.
The UMAG_11064 gene is similar to an intronic sequence of the mitochondrial cox1 gene of the smut fungus S. reilianum, while this sequence is absent from the mitochondrial genome of U. maydis. The cox1 genes of S. reilianum and U. maydis both have eight introns, of which only seven are homologous in position and sequence (Figure 3). S. reilianum has one extra intron in position 1, while U. maydis has one extra intron in position 6. In U. maydis, all introns but the sixth one are reported to be of type I, i.e. contain a HEG which is responsible for their correct excision. A blast search of this intron's sequence, however, revealed similarity with a homing endonuclease of type LAGLIDADG (Supplementary Table S4). In S. reilianum, intron 1 (the putative precursor of UMAG_11064) and intron 2 are not annotated as containing a HEG. Blast searches of the corresponding sequences, however, provided evidence for homology with a GIY-YIG HE (Supplementary Table S5) and a LAGLIDADG HE (Supplementary Table S6), respectively.
Furthermore, intron 1 in S. reilianum was not detected in U. maydis. A closer inspection showed that the ORF could be aligned with related HEs (Figure 2). This alignment revealed an insertion of four amino-acids, a deletion of the first glycine residue in the active site plus several frameshifts at the beginning of the gene, which suggests that this gene has been altered and might not encode a functional HE any longer.

UMAG_11064 inserted into a gene encoding a RecQ helicase
In order to study the effect of the HEG insertion in the nuclear genome, we looked at the genomic environment of the UMAG_11064 gene. Downstream of UMAG_11064 are telomeric repeats, while the next upstream gene, UMAG_11065, is uncharacterized. A similarity search for UMAG_11065 detected 13 homologous sequences in the U. maydis genome (including one, UMAG_12076, on an unmapped contig), but only low-similarity matches in other sequenced smut fungi (see Methods). The closest non-smut related sequence comes from a gene from Fusarium oxysporum. We inferred the evolutionary relationships between the 14 genes by reconstructing a maximum likelihood phylogenetic tree, and found that the UMAG_11065 gene is closely related to UMAG_04486 (Table 1). The UMAG_04486 gene, however, is predicted to be almost six times as long as UMAG_11065, suggesting that the latter was truncated because of the UMAG_11064 insertion. A search for similar sequences of UMAG_11065 and its relatives in public databases revealed homology with so-called RecQ helicases (Supplementary Table S3), enzymes known to be involved in DNA repair and telomere expansion (Singh et al. 2012). While this function is only predicted by homology, we note that all 12 chromosomal recQ-related genes are located very close to telomeres in U. maydis (Table 1), suggesting a role of these genes in telomere maintenance (Sánchez-Alonso and Guzmán 1998).
Interestingly, this gene family also contains the gene UMAG_03394, which is located four genes upstream of UMAG_11065. Chromosome 9 appears to be the only chromosome with two helicase genes on the same chromosome end (Table 1).

U. maydis populations show structural polymorphism in the telomeric region of chromosome 9
Because the UMAG_11064 gene still displays a strong signature of its mitochondrial origin (codon usage and GC content), its transfer most likely occurred recently. In order to provide a timeframe for the insertion event, we examined the structure of the genomic region of the insertion in other U. maydis and S. reilianum isolates, as well as the structure of the cox1 exons 1, 2 and 7.
The regions that could be amplified and their corresponding sizes are listed in Table 2. The UMAG_11064 gene is present in the FB1-derived strain SG200, as well as the Holliday strains 518 and 521, but is absent in the nuclear as well as mitochondrial genome sequences of a recent U. maydis isolate from the US, strain 10-1, as well as from 5 Mexican isolates (I2, O2, P2, S5 and T6, Figure S2A). The UMAG_11072 gene, however, which is located further away from the telomere on the same chromosome arm, could be amplified in all strains (Figure S2B). All U. maydis strains possess intron 6 in the mitochondrial cox1 gene, which is absent in S. reilianum, while the three S. reilianum strains tested carry intron 1, which is absent in all U. maydis strains (Figure S2C-D). These results suggest that the UMAG_11064 gene inserted in an ancestor of the two strains 518 and 521, after the divergence from other U. maydis strains, an event that occurred very recently. Moreover, the most direct descendant of the progenitor of the HEs, i.e. intron 1 in the cox1 gene, could not be found in any of the sequenced mitochondrial genomes of U. maydis strains, while it is present in the three sequenced S. reilianum strains (Figure S2).

Functional characterization
To shed light on the functional implication of the translocation of the HEG and subsequent mutations, we (i) assessed the expression profile of these genes and (ii) generated a deletion strain and phenotyped it. For the expression analysis we relied on a previously published RNASeq data set (Lanver et al. 2018), from which we extracted the expression profiles of genes in the telomeric region of chromosome 9 (Figure 5A). While the expression of UMAG_11064 remained close to zero in the three replicates, expression of UMAG_11065 increased during plant infection. The telomeric region was highly heterogeneous in terms of expression profile: while UMAG_11066 and UMAG_03393 did not show any significant level of expression, UMAG_03392 was down-regulated starting at twelve hours post-infection, and UMAG_03394, another RecQ-encoding gene homologous to UMAG_11065, displayed constitutively high levels of expression (Figure 5A). All homologs of UMAG_11065 show a significantly higher expression during infection (Tukey's posthoc test, false discovery rate of 5%, Figure 5B). The comparison of expression profiles revealed two main classes of genes (Figure 5C): highly expressed genes (upper group), and moderately expressed genes (lower group), to which UMAG_11065 belongs. We further note that the differences in expression profiles do not mirror the protein sequence similarity of the genes (Mantel permutation test, p-value = 0.566).
To assess their function, UMAG_11064 and UMAG_11065 were simultaneously deleted in SG200, a solopathogenic haploid strain that can cause disease without a mating partner (Kämper et al. 2006), using a PCR-based gene replacement approach (Kämper 2004). Gene deletion was verified by Southern analysis (Figure S3). Virulence assays, conducted in triplicate, revealed no statistically significant difference in symptoms between SG2001106511064 and SG200 in infected maize plants (Figure 6A, Chi-square test, p-value = 0.453). Since RecQ helicases contribute to dealing with replication stress (Kojic and Holloman 2012), we also determined the sensitivity of the mutant to various stressors including UV, hydroxyurea and Congo Red (Figure 6B). We report that the deletion strain shows increased sensitivity to cell wall stress induced by Congo Red and increased resistance to UV stress. Since UMAG_11064 does not show any detectable level of expression, we hypothesize that the deletion of UMAG_11065 is responsible for this phenotype.

Discussion
The codon usage and GC content of the UMAG_11064 gene, as well as its similarity to known mitochondrial HEGs, point to a recent transfer into the nuclear genome of U. maydis. Moreover, the precursor of this gene is absent from the mitochondrial genome of this species. To explain this pattern, we propose a scenario involving a transfer of the gene to the nuclear genome followed by a loss of the mitochondrial copy (Figure 7). We hypothesize that the mitochondrial HEG was present in the U. maydis ancestor. The evolutionary scenario involves two events: the insertion of the HEG into the nuclear genome, on the one hand, creating a HEG+ genotype at the nuclear locus (designated [HEG+]nuc), and the loss of the mitochondrial copy, on the other hand, creating a HEG− genotype at the mitochondrial locus (designated [HEG−]mit). These two events might have happened [...]. Alternatively, the HEG was not ancestral to U. maydis, but was horizontally transferred from S. reilianum (or a related species). In support of this hypothesis is the high similarity of the UMAG_11064 gene to the S. reilianum mitochondrial HEG (Figure 2), which contrasts with the relatively high nucleotide divergence between the two species, which diverged around 20 My ago. Besides, it is worth noting that U. maydis and S. reilianum share the same host, and that hybridization between smut species has been reported (Fischer 1957; Boidin 1986).
HEGs are found in eukaryotic nuclei but are usually restricted to small and large ribosomal RNA subunit genes (Lambowitz and Belfort 1993; Dunin-Horkawicz et al. 2006). While transfer of DNA segments and functional genes from organellar genomes to the nucleus is well [...]. Contrasting with this result, the GIY-YIG HEG that inserted into the ancestor of the UMAG_11065 gene potentially had non-neutral effects, resulting in an expressed truncated protein.
The sequence of UMAG_11064 suggests a recent transfer into the nuclear genome but, given the several mutations found within the active site, the encoded protein is unlikely to be functional. As no significant level of expression was measured for this gene, this newly acquired gene is most likely undergoing pseudogenisation. However, as this mitochondrial HEG inserted into a nuclear U. maydis gene, it might have had phenotypic consequences not directly due to the HEG itself.
The UMAG_11065 gene appears to have been truncated by the HEG insertion, which removed the C-terminal part of the encoded protein, and the truncated UMAG_11065 is expressed during infection. While we were unable to detect a contribution to virulence, our results point to a putative role of the truncated RecQ helicase in stress tolerance, as its deletion increases both resistance to UV radiation and susceptibility to cell wall stress. We hypothesise that the first effect is possibly due to the truncated UMAG_11065 protein interfering with telomere maintenance, making the cell more susceptible to UV damage. How the truncated UMAG_11065 RecQ helicase could improve coping with cell wall stress, however, remains to be investigated, as well as the potential fitness benefit or cost of these phenotypes.

Conclusions
In this study, we report instances of two stages of the life cycle of HEGs. Intron 1 of the mitochondrial cox1 gene of S. reilianum was shown to contain a degenerated GIY-YIG HEG, while the homologous position in the U. maydis gene displays no intron. Besides, in the telomeric region of chromosome 9 of the nuclear genome of U. maydis, we found evidence of a recent migration of a very similar GIY-YIG HEG. This very rare event could be uncovered thanks to its recent occurrence and the singularly homogeneous composition of the U. maydis nuclear genome. It likely represents a snapshot of evolution, when a mutational event has occurred but selection has not yet had time to act. The future of this insertion remains, therefore, to be written. Its absence from all field isolates of U. maydis sequenced so far suggests that either the mutation was lost in natural populations, or that it occurred in the lab after the selection of the original Holliday strains. These results demonstrate that HEGs, like other mobile elements, may represent a so far understudied source of genetic diversity.

Methods
[...] (Mewes et al. 2011). Mitochondrial genes were extracted from the U. maydis full mitochondrial genome (GenBank accession number: NC_008368.1). Within-group correspondence analysis of synonymous codon usage was performed using the ade4 package for R, following the procedure described in Charif et al. (2005). The proportion of G and C nucleotides was computed along the first 10 kb of U. maydis chromosome 9, using 300 bp windows slid by 1 bp. The corresponding R code is available as Supplementary File S1.
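The GC profile described above (300 bp windows slid by 1 bp) can be illustrated with a minimal Python sketch. This is not the R code of Supplementary File S1: the function names and the toy sequence are hypothetical, and ambiguous bases are simply excluded from the denominator, which is an assumption of this sketch.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def sliding_gc(seq, window=300, step=1):
    """GC fraction of every full window of `window` bp, moved by `step` bp."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich stretch,
# scanned with a short window so that the shift in composition is visible.
toy_profile = sliding_gc("AAAAGGGG", window=4, step=2)
```

A compositionally distinct insertion, such as an AT-rich gene in a GC-balanced region, appears in such a profile as a local drop whose width approximates the length of the inserted segment.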

Strains, growth conditions and virulence assays
The haploid S. reilianum strains SRZ1 and SRZ2 as well as the solopathogenic strain JS161 derived from SRZ1 have been described (Schirawski et al. 2010). Deletion mutants were generated by gene replacement using a PCR-based approach and verified by Southern analysis (Kämper 2004). [...] um11064_lb_fw and um11065_rb_rv, transformed into SG200, and transformants carrying a deletion of UMAG_11064 and UMAG_11065 were identified by Southern analysis (Figure S3).
U. maydis strains were grown at 28°C in liquid YEPSL medium (0.4% yeast extract, 0.4% peptone, 2% sucrose) or on PD solid medium (2.4% Potato Dextrose broth, 2% agar). Stress assays were performed as described in Krombach et al. (2018). Transformation and selection of U. maydis transformants followed published procedures (Kämper et al. 2006). To assess virulence, seven-day-old maize seedlings of the variety Early Golden Bantam (Urban Farmer, Westfield, Indiana, USA) were syringe-infected. At least three independent infections were carried out and disease symptoms were scored according to Kämper et al. (2006). Consistency of replicates was tested using a chi-squared test and p-values were computed using 1,000,000 permutations. As no significant difference between replicates was observed (p-value = 0.347 for the wild type and p-value = 0.829 for the deletion strain), observations were pooled across all replicates for each strain before being compared.
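The logic of a chi-squared test with a permutation-based p-value, as used above to compare symptom distributions, can be sketched in Python as follows. This is an illustrative label-permutation version under stated assumptions and not the code used in the study: the function names are hypothetical, the counts in the example are invented, and far fewer permutations are drawn than the 1,000,000 reported.

```python
import random


def chi2_statistic(table):
    """Pearson chi-squared statistic of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # empty categories contribute nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def permutation_chi2(table, n_perm=10000, seed=0):
    """Return (statistic, p-value); the p-value comes from shuffling category labels."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table into one (group, category) pair per scored plant.
    groups = [i for i, row in enumerate(table) for c in row for _ in range(c)]
    cats = [j for row in table for j, c in enumerate(row) for _ in range(c)]
    observed = chi2_statistic(table)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cats)  # breaks any association while keeping both margins fixed
        permuted = [[0] * n_cols for _ in range(n_rows)]
        for g, c in zip(groups, cats):
            permuted[g][c] += 1
        if chi2_statistic(permuted) >= observed - 1e-12:
            hits += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Invented counts: rows are two strains, columns are three symptom categories.
stat, p_value = permutation_chi2([[12, 8, 5], [10, 9, 6]], n_perm=2000)
```

Resampling avoids relying on the asymptotic chi-squared distribution, which can be inaccurate when some symptom categories contain few plants.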

Blast searches and gene alignment
We performed BlastN and BlastP (Altschul et al. 1990) searches using the (translated) sequence of UMAG_11064 as a query using NCBI online blast tools. The non-redundant nucleotide and protein sequence databases were selected for BlastN and BlastP, respectively. Results were further processed with scripts using the NCBIXML module from Biopython (Cock et al. 2009). The Macse codon aligner (Ranwez et al. 2011) was used in order to infer the position of putative frameshifts in the upstream region of UMAG_11064. The alignment was depicted using the Boxshade software and was further manually annotated. The sequences of U. maydis cox1 intron 6, as well as S. reilianum cox1 introns 1 and 2, were used as queries and searched against the non-redundant protein database using NCBI BlastX, excluding environmental samples and model sequences.
The cox1 genes from U. maydis and S. reilianum were aligned, and synteny and local pairwise similarity were depicted using the genoPlotR package for R (Guy et al. 2010).

History of the UMAG_11065 family
The sequence of the UMAG_11065 protein was used as a query for a BlastP search (Altschul et al. 1990) against the complete proteomes of several smut fungi (U. maydis, U. hordei, S. reilianum, S. scitamineum, Melanopsichium pennsylvanicum, Pseudozyma flocculosa). The search found 17 hits within the U. maydis genome with an E-value below 0.0001, as well as two genes in Sporisorium scitamineum (SPSC_04622 and SPSC_05783) and two genes in Pseudozyma flocculosa (PFL1_06135 and PFL1_02192). Using NCBI BlastP, we found several sequences from Fusarium oxysporum with high similarity. We selected the sequence FOXG_04692 as a representative and added it to the data set. The Guidance web server with the GUIDANCE2 algorithm was then used to align the protein sequences and assess the quality of the resulting alignment. Default options from the server were kept, selecting the MAFFT aligner (Katoh et al. 2002). Several sequences appeared to be of low alignment quality and were discarded. The remaining sequences were realigned using the same protocol. Four iterations were performed until the final alignment had a quality good enough for phylogenetic inference. The final alignment contained 14 sequences and had a global score of 0.79. These 14 alignable sequences comprised 13 U. maydis sequences (including UMAG_11065) and the F. oxysporum gene; other sequences from smut genomes were too divergent to be unambiguously aligned. Using Guidance, we further masked columns in the alignment with a score below 0.93 (a maximum of one position out of 14 in the column was allowed to be uncertain).
A phylogenetic analysis was conducted using the program Seaview 4 (Gouy et al. 2010). First, a site selection was performed in order to filter regions with too many gaps, leaving 506 sites.
[...] as well as of neighbouring genes and paralogs elsewhere in the genome, were extracted from the Gene Expression Omnibus data set GSE103876 (Lanver et al. 2018). Gene clustering based on expression profiles was conducted using a hierarchical clustering with an average linkage on a Canberra distance, suitable for expression counts, as implemented in the 'dist' and 'hclust' functions in R (R Core Team 2018). The resulting clustering tree was converted to a distance matrix and compared to the inferred phylogeny of the genes using a Mantel permutation test, as implemented in the 'ape' package for R (Paradis et al. 2004). Differences in expression between time points were assessed by fitting the linear model "expression ~ time * gene", testing the effect of time while controlling for interaction with the "gene" variable. Residuals were normalized using a Box-Cox transform as implemented in the MASS package for R. Tukey's posthoc comparisons were conducted on the resulting model, allowing for a 5% false discovery rate.
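As an illustration of the distance-based comparison described above, the following Python sketch computes Canberra distances between expression profiles and compares two distance matrices with a Mantel permutation test. It is a simplified stand-in under stated assumptions, not the R workflow of the study: the function names and toy values are hypothetical, the test is one-sided and based on the sum of cross-products, and the two matrices are compared directly rather than via a clustering tree. The implementation in the 'ape' package may differ in such details.

```python
import random


def canberra(x, y):
    """Canberra distance; terms whose denominator is zero are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def distance_matrix(profiles):
    """Symmetric matrix of pairwise Canberra distances between profiles."""
    n = len(profiles)
    return [[canberra(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def mantel_z(m1, m2, order):
    """Sum of cross-products over the lower triangle, with m2 relabelled by `order`."""
    n = len(m1)
    return sum(m1[i][j] * m2[order[i]][order[j]] for i in range(n) for j in range(i))


def mantel_test(m1, m2, n_perm=999, seed=0):
    """Return (statistic, one-sided p-value) obtained by permuting the labels of m2."""
    rng = random.Random(seed)
    order = list(range(len(m1)))
    observed = mantel_z(m1, m2, order)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # rows and columns of m2 are permuted together
        if mantel_z(m1, m2, order) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example with invented values: four genes, three time points.
expression = [[1, 5, 9], [2, 6, 8], [50, 40, 30], [60, 45, 20]]
sequence_dist = [[0, 1, 4, 4], [1, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
z, p_value = mantel_test(distance_matrix(expression), sequence_dist)
```

Rows and columns are permuted jointly because the entries of a distance matrix are not independent observations; shuffling individual cells would not give a valid null distribution.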
[...] repair and telomere maintenance.

Tables
Table 1 (notes): (1) Position reported relative to the length of the chromosome or contig. (2) N-terminal fragment only.
Table 2 (notes): Plus and minus signs indicate whether the corresponding gene could be amplified or not. Numbers indicate the size of the amplified region in base pairs.

Supplementary
Supplementary Table S1: [...] the NCBI non-redundant nucleotide database, using BlastN. All hits with an E-value lower than 1E-04 are included, along with the corresponding alignment length and percentage of sequence identity.
Supplementary Table S2: Homology search results using UMAG_11064 as a query on the NCBI non-redundant protein database, using BlastP. All hits with an E-value lower than 1E-04 are included, along with the corresponding alignment length and percentage of sequence identity.
Figure 2: [sequence alignment including I-AbiIII-P; content not recoverable from the source]
Figure caption fragment: [...] with the closest homolog from F. oxysporum (see Table 1). Support values higher than 0.6 are reported.
Figure caption fragment: [...] Table S7.
Figure S2: Amplification of UMAG_11064, UMAG_11072 and cox1 exons 1 and 7 in several U. maydis and S. reilianum strains. Strains are as in Table 2. Primer sequences are provided in Table S7.
Figure S3: Verification of the deletion of UMAG_11064 and UMAG_11065. A) Schematic map of the genomic region containing UMAG_11064 and UMAG_11065 in SG200 and SG2001106411065. Primers used to amplify the left and right border sequences are indicated. B) DNA of SG200 and SG2001106411065 was cleaved with Fsp1 and subjected to Southern blot analysis using a mixture of probes 1 and 2 indicated in A). The 2.94 kb fragment is diagnostic for SG200 while the 4.19 kb fragment is diagnostic for the deletion of UMAG_11064 and UMAG_11065.

Supplementary file:
Supplementary File S1: Scripts used to conduct the phylogenetic and statistical analyses, as well as R code used to generate Figures 1, 3, 4, 5 and 6.