The soil fungus Rhizoctonia solani is an economically important pathogen of agricultural and forestry crops. Here, we present the complete sequence and analysis of the mitochondrial genome of R. solani field isolate Rhs1AP. The genome (235 849 bp) is the largest mitochondrial genome of a filamentous fungus sequenced to date and contains a rich accumulation of introns, novel repeat sequences, homing endonuclease genes, and hypothetical genes. The stable secondary structures formed by the repeat sequences suggest that they comprise functional, possibly catalytic, RNA elements. RNA-Seq expression profiling confirmed that the majority of accessory genes (homing endonuclease genes and hypothetical genes) are transcriptionally active. Comparative analysis suggests that the mitochondrial genome of R. solani exemplifies the dynamic history of mitochondrial genome expansion in filamentous fungi.
The soil fungus Rhizoctonia solani (teleomorph = Thanatephorus cucumeris, phylum Basidiomycota) is an economically important pathogen of agricultural and forestry crops. The fungus is also a competitive saprobe and an endomycorrhizal symbiont of orchids that represents an evolutionary link between beneficial and disease-causing fungi (Cubeta & Vilgalys, 2000; Rodriguez-Carres et al., 2011). Recently, our consortium initiated a collaborative project to sequence the complete nuclear and mitochondrial (mt) genomes of the potato pathogen R. solani anastomosis group 3 (AG-3), strain Rhs1AP. Here, we present the assembly and annotation of the R. solani mt genome, together with a phylogenetic analysis of mt genomes of related fungi in the Basidiomycota.
Fungal mt genomes vary in size from 24 874 bp in Cryptococcus neoformans to 135 005 bp in Agaricus bisporus (Ferandon et al., 2013) (http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/organelles.html). This size disparity suggests a dynamic history of expansion and contraction, similar to that of plant mitochondria (Scott & Logan, 2007). Several major factors are known to contribute to mt genome diversity, including proliferation of noncoding content (e.g. short dispersed repeats), expansion of existing introns, gene duplication followed by inactivation (e.g. partial copies), and acquisition of homing endonuclease genes (HEGs) or other sequences from diverse sources (Sloan et al., 2012). The relatively small size and mostly uniparental inheritance of fungal mitochondria make them ideal candidates for studies of evolution, fungicide insensitivity, population genetics, and taxonomy (Bullerwell & Lang, 2005; Alverson et al., 2010; Joardar et al., 2012). However, such studies are hampered by the paucity of sequenced mt genomes from the Basidiomycota, of which fewer than twenty are currently available. Our comparative genomic analysis suggests that R. solani has the largest sequenced fungal mt genome and that its size is driven by the multiplication of novel repetitive elements. Phylogenetic analysis confirmed the position of R. solani as an early diverging, transitional lineage of the Agaricomycotina.
Materials and methods
Mycelial mats of R. solani Rhs1AP grown for 5–7 days in potato dextrose broth at 24 °C were washed three times with sterile deionized water and freeze-dried. Lyophilized tissue was ground to a fine powder in liquid nitrogen with a mortar and pestle and used for DNA extraction with SDS buffer and phenol-chloroform as previously described (Murray & Thompson, 1980; Zolan & Pukkila, 1986; Vilgalys & Gonzalez, 1990). Extracted DNA was placed into 10-mL tubes and subjected to cesium chloride ultracentrifugation in a VTi80 rotor for 12 h at 508 735 g and 20 °C, following the procedure of Sambrook et al. (1989).
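The ultracentrifugation condition above is expressed as relative centrifugal force (RCF). As a minimal, hedged illustration, the standard conversion RCF = 1.118 × 10^-5 × r × N² (r, radius in cm; N, rotor speed in rpm) relates g to rotor speed. The 80 000 rpm and 7.11 cm maximum radius used below are assumptions about how the VTi80 run was configured, chosen only because they reproduce the reported value; the rotor manual should be checked before reusing them.

```python
import math

# Standard conversion between rotor speed and relative centrifugal force:
# RCF (x g) = 1.118e-5 * r_cm * rpm**2
K = 1.118e-5


def relative_centrifugal_force(rpm: float, radius_cm: float) -> float:
    """Return the RCF (in multiples of g) at radius_cm for a rotor spinning at rpm."""
    return K * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Invert the conversion: rotor speed needed to reach a target RCF at radius_cm."""
    return math.sqrt(rcf / (K * radius_cm))


# Assumed (hypothetical) VTi80 settings: 80 000 rpm at r_max = 7.11 cm
print(round(relative_centrifugal_force(80_000, 7.11)))  # 508735
print(round(rpm_for_rcf(508_735, 7.11)))                # 80000
```

Because RCF scales with the square of rotor speed, a small change in rpm has a large effect on g; specifying runs in g, as the text does, keeps them comparable across rotors.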
Sequencing was performed as part of the complete genome sequencing project on a Sanger 3730xl sequencing instrument, yielding 272 974 reads from libraries with different insert sizes (4- and 10-kb plasmid and 40-kb fosmid libraries) that provided 28× sequence coverage and linkage of contigs. Continuous clone coverage provided clear evidence for the circularity of the mt genome.
Mitochondrial genome assembly and closure
The reads were initially assembled with Celera Assembler (CA) version 5.3 (Miller et al., 2008). To identify mitochondrial contigs, known mitochondrial sequences from R. solani and other fungi in the Basidiomycota were used as queries in blastn searches against all CA output files. The contigs were manually assembled into the final genomic sequence, which was trimmed and rotated using the finishing package Consed (Gordon et al., 1998).
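The trimming and rotation steps can be illustrated with a minimal sketch. It assumes that a circular molecule is emitted as a linear contig whose two ends share an exact overlap, and that the finished sequence should start at a chosen anchor (for example, the start of a particular gene). Both functions and the `min_overlap` default are hypothetical and do not reproduce Consed's behavior.

```python
def trim_circular_overlap(contig: str, min_overlap: int = 50) -> str:
    """Drop the longest exact overlap (at least min_overlap bp) between the start
    and end of a linear contig that represents a circular molecule."""
    for k in range(len(contig) // 2, max(min_overlap, 1) - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:-k]  # keep a single copy of the duplicated junction
    return contig  # no overlap found; return unchanged


def rotate_to(circular: str, anchor: str) -> str:
    """Rotate a circular sequence so that it starts at the first occurrence of
    anchor, including an occurrence that spans the end-to-start junction."""
    doubled = circular + circular[: len(anchor) - 1]
    i = doubled.find(anchor)
    if i < 0:
        raise ValueError("anchor not found in circular sequence")
    return circular[i:] + circular[:i]


core = "ATGCGTACGTTAGCCAT"
contig = core + core[:5]  # the two ends share a 5-bp overlap
trimmed = trim_circular_overlap(contig, min_overlap=5)
print(trimmed == core)                # True
print(rotate_to(trimmed, "CCATATG"))  # CCATATGCGTACGTTAG
```

Appending the first `len(anchor) - 1` bases before searching is what lets `rotate_to` find an anchor that straddles the artificial break point of the linear representation.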
RNA sample preparation and Illumina sequencing (RNA-Seq)
Total RNA was isolated with TRIzol reagent (Life Technologies) from R. solani mycelium harvested at 12, 48, 144, and 196 h during sclerotium formation. Oligo(dT)-based enrichment for mRNA was carried out using the MicroPoly(A)Purist™ Kit (Ambion). The enriched mRNA was then fragmented and reverse-transcribed to cDNA with the SuperScript® Double-Stranded cDNA Synthesis Kit (Life Technologies). After each preparation step, sample quality and quantity were assessed with an Agilent 2100 Bioanalyzer and the Agilent RNA 6000 Nano Kit (Agilent). cDNA library construction involved end repair, A-tailing, adapter ligation, and library amplification, followed by cluster generation and sequencing on an Illumina Genome Analyzer II (GA II) instrument (www.illumina.com).
RNA-Seq data analysis
Raw sequence data were processed, filtered, and normalized with the Illumina pipeline to generate FASTQ files, which were analyzed using the RNA-Seq module of CLC Genomics Workbench. All reads were mapped to the R. solani mt coding sequences, and expression values for each gene were calculated in RPKM units (reads per kilobase of exon model per million mapped reads; Mortazavi et al., 2008). Genes were considered expressed if more than four reads mapped to their exonic regions.
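The RPKM normalization can be written out directly: RPKM = reads × 10^9 / (exon length in bp × total mapped reads). The sketch below is a minimal illustration with invented gene names and counts. It assumes, for the per-gene table, that the library size is the sum of the supplied counts; a pipeline such as CLC Genomics Workbench may use a different total.

```python
def rpkm(gene_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads:
    reads / (length_kb * total_millions) == reads * 1e9 / (length_bp * total)."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)


def rpkm_table(read_counts: dict[str, int], exon_lengths: dict[str, int]) -> dict[str, float]:
    """RPKM for every gene, taking the library size as the sum of the supplied
    counts (an assumption; a pipeline may use all mapped reads instead)."""
    total = sum(read_counts.values())
    return {gene: rpkm(n, exon_lengths[gene], total) for gene, n in read_counts.items()}


# Hypothetical genes and counts, for illustration only
print(rpkm(100, 2000, 1_000_000))  # 50.0
print(rpkm_table({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 3000}))
```

In the second call, geneB has three times the reads of geneA but also three times the exon length, so both receive the same RPKM; this length correction is the point of the unit.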
Mitochondrial genome annotation
Open reading frames (ORFs) were identified using Artemis (Rutherford et al., 2000) with genetic code 4 (the mold, protozoan, and coelenterate mitochondrial code, in which UGA encodes tryptophan rather than a stop). Functional assignments were based on sequence similarity to characterized fungal mitochondrial proteins, using blastp searches against NCBI databases and HMM searches against protein families in the Pfam database (Finn et al., 2010). ORFs of at least 65 amino acids with no significant similarity to known genes were annotated as hypothetical proteins. tRNAs and ribosomal RNAs were identified using tRNAscan-SE (Lowe & Eddy, 1997) and Rfam (Griffiths-Jones et al., 2005).
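To illustrate why the choice of genetic code matters, here is a minimal sketch of an ORF scan under NCBI translation table 4, in which TGA encodes tryptophan. It is not Artemis: it recognizes only ATG starts (table 4 also permits alternative initiation codons), treats the sequence as linear, and applies the 65-residue cut-off from the annotation criteria above. All function names are hypothetical.

```python
from itertools import product

# NCBI translation table 1 (standard code); codons enumerated in TCAG order
# with the first base varying slowest: TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(table_id: int = 4) -> dict[str, str]:
    """Codon -> amino acid map for NCBI table 1 or table 4."""
    codons = ("".join(c) for c in product("TCAG", repeat=3))
    table = dict(zip(codons, STANDARD_AA))
    if table_id == 4:
        table["TGA"] = "W"  # table 4 reads UGA as tryptophan, not stop
    elif table_id != 1:
        raise ValueError("this sketch only covers tables 1 and 4")
    return table


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_aa: int = 65, table_id: int = 4) -> list[tuple[str, int, str]]:
    """Return (strand, frame, protein) for ATG-initiated ORFs that end in a stop
    codon and encode at least min_aa residues. Linear scan only: ORFs crossing
    the origin of a circular genome are not handled."""
    table = codon_table(table_id)
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            start = None  # position of the open ATG in this frame, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif table.get(codon, "X") == "*":
                    protein = "".join(table.get(s[k:k + 3], "X") for k in range(start, i, 3))
                    if len(protein) >= min_aa:
                        orfs.append((strand, frame, protein))
                    start = None
    return orfs


demo = "ATG" + "TGA" * 70 + "TAA"
print(len(find_orfs(demo, table_id=4)))  # 1: TGA read as Trp gives a 71-aa ORF
print(len(find_orfs(demo, table_id=1)))  # 0: TGA is an immediate stop
```

Under the standard code the same sequence yields no qualifying ORF, which is why annotating a fungal mt genome with the wrong table truncates genes at every in-frame TGA.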
Repeat identification and analysis
Interspersed repeats were identified using RepeatMasker (Tarailo-Graovac & Chen, 2009) with a curated fungal-specific repeat database (Repbase; Jurka et al., 2005), as well as with RepeatScout (Price et al., 2005) and PrintRepeats (http://www.genome.ou.edu/miropeats.html), using default settings. PSI-BLAST-based sequence homology searches were performed using TransposonPSI. MITE-Hunter (Han & Wessler, 2010) was used to search for miniature inverted-repeat transposable elements (MITEs). The nine consensus repeat sequences generated by RepeatScout were used as the reference repeat library for RepeatMasker. Outputs from PrintRepeats were clustered using USEARCH (http://www.drive5.com/usearch/), and blastn searches were performed on these clusters and on the nine RepeatScout consensus sequences. Tandem repeats were identified using Tandem Repeats Finder (TRF) with default parameters.
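RepeatScout, RepeatMasker, and TRF implement far more elaborate methods. As a rough, hedged illustration of the seed idea that de novo repeat finders start from, the sketch below counts exact k-mers that recur across a sequence. The k-mer length and count threshold are arbitrary assumptions, and real tools extend seeds into consensus families and also search the reverse strand.

```python
from collections import Counter


def recurring_kmers(seq: str, k: int = 16, min_count: int = 3) -> dict[str, int]:
    """Count every exact k-mer (forward strand only) and keep those seen at least
    min_count times; windows containing ambiguous bases are skipped."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {
        kmer: n
        for kmer, n in counts.items()
        if n >= min_count and set(kmer) <= set("ACGT")
    }


unit = "GATTACAGATTACAGG"
toy = "CCCC" + unit + "TTTT" + unit + "AAAA" + unit + "CCCC"
print(recurring_kmers(toy))  # {'GATTACAGATTACAGG': 3}
```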
Fourteen core proteins (atp6, atp8, atp9, cox1, cox2, cox3, cob, nadh1, nadh2, nadh3, nadh4, nadh4l, nadh5, and nadh6) encoded by 13 mt genomes were concatenated and aligned using MUSCLE (Edgar, 2004). Poorly aligned regions were removed with Gblocks using default settings (Castresana, 2000). Maximum likelihood (ML) trees were generated with RAxML (Randomized Axelerated Maximum Likelihood; Stamatakis et al., 2005), using the JTT amino acid substitution model with the Gamma model of rate heterogeneity. Multiple ML trees were generated and the best-scoring tree was identified; one hundred bootstrapped trees were then used to assign bootstrap support values to the best-scoring ML tree. Podospora anserina was used as the outgroup taxon. A maximum parsimony tree was also generated using the protpars program of the PHYLIP package (Felsenstein, 1989).
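The supermatrix and bootstrap steps can be sketched as follows under stated assumptions: sequences are held in plain dictionaries, the concatenated sequences are assumed to have already been aligned (MUSCLE) and trimmed (Gblocks) before resampling, and tree inference itself (RAxML, protpars) is left to those programs. The function names and toy taxa are hypothetical.

```python
import random

# Gene order used to build each genome's supermatrix row (as listed above)
GENE_ORDER = ["atp6", "atp8", "atp9", "cox1", "cox2", "cox3", "cob",
              "nadh1", "nadh2", "nadh3", "nadh4", "nadh4l", "nadh5", "nadh6"]


def concatenate(proteins: dict[str, dict[str, str]], gene_order: list[str]) -> dict[str, str]:
    """Join each genome's protein sequences in a fixed gene order."""
    return {genome: "".join(genes[g] for g in gene_order) for genome, genes in proteins.items()}


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Nonparametric bootstrap: resample alignment columns with replacement,
    keeping each sampled column intact across all taxa."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = lengths.pop()
    columns = [rng.randrange(n) for _ in range(n)]
    return {taxon: "".join(seq[c] for c in columns) for taxon, seq in alignment.items()}


toy = {"taxonA": {"atp6": "MK", "cox1": "VL"}, "taxonB": {"atp6": "MR", "cox1": "VI"}}
supermatrix = concatenate(toy, ["atp6", "cox1"])
print(supermatrix)  # {'taxonA': 'MKVL', 'taxonB': 'MRVI'}
replicates = [bootstrap_replicate(supermatrix, random.Random(seed)) for seed in range(100)]
```

Each of the 100 replicates would be passed to the tree-building program, and the fraction of replicate trees containing a given clade becomes that clade's support value on the best-scoring tree.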
Mitochondrial genome sequence
The mt genome of R. solani Rhs1AP is a circular molecule of 235 849 bp and represents the largest fungal mt genome sequenced to date (Fig. 1a). This size is consistent with the c. 200-kb estimate obtained by Jian et al. (1998) from restriction analysis and with the size of the recently published R. solani AG1-IB mitochondrial genome (162 751 bp; Wibberg et al., 2012). The next largest fungal mt genomes are those of A. bisporus (135 005 bp; Ferandon et al., 2013) and Chaetomium thermophilum (127 206 bp). The mt genome of R. solani Rhs1AP had a G + C content of 35.9%, with no strand bias. We found no evidence of heterozygosity in the mt genome, suggesting that it is completely homozygous. In contrast, population genetic studies of field isolates of R. solani AG-3 showed that their nuclear genomes are heterozygous (Ceresini et al., 2007). Unlike those of other anastomosis groups of R. solani, the Rhs1AP mt genome had no apparent integrated or free-replicating linear or circular plasmids (Miyasaka et al., 1990). Synteny analysis of the two R. solani mt genomes showed high identity (> 95%) and general conservation of core mitochondrial gene sequences and order, but substantial differences in accessory and noncoding regions (Fig. 1c).
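G + C content and strand bias can be summarized from base counts alone. The sketch below assumes that strand bias is measured as GC and AT skew, (G − C)/(G + C) and (A − T)/(A + T), on the positive strand; the text does not state which metric was used, and a genome-wide value near zero can hide local skews that a sliding window would reveal. The function name is hypothetical.

```python
def base_composition(seq: str) -> dict[str, float]:
    """G + C content (%) and GC/AT skews of one strand; ambiguous bases are ignored."""
    seq = seq.upper()
    a, c, g, t = (seq.count(b) for b in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return {
        "gc_percent": 100.0 * (g + c) / total,
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,  # > 0: strand is G-rich
        "at_skew": (a - t) / (a + t) if a + t else 0.0,  # > 0: strand is A-rich
    }


print(base_composition("GGCCAATT"))
# {'gc_percent': 50.0, 'gc_skew': 0.0, 'at_skew': 0.0}
```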
The R. solani Rhs1AP mt genome contained 138 genes, including 15 core genes found in all Basidiomycota mt genomes. The core genes comprised atp6, atp8, atp9, cox1, cox2, cox3, cob, nadh1, nadh2, nadh3, nadh4, nadh4l, nadh5, and nadh6, which encode hydrophobic subunits of the respiratory chain complexes and ATP synthase, and rps3, which encodes ribosomal protein S3. Accessory genes included 33 homing endonuclease genes and 86 hypothetical genes. In addition, partial copies of atp6, cox2, nadh3, and nadh6 were identified; these lacked the N-terminus and were located close to their full-length copies (Fig. 1a). The genome encoded 27 tRNAs for all 20 amino acids, as well as the small- and large-subunit ribosomal RNAs. All core genes were on the positive strand, whereas one tRNA gene and 20 hypothetical genes were on the negative strand.
Six core genes, cox1, cox2, cox3, nadh1, nadh5, and cob, were intron-rich and contained complex nested intronic structures composed of hypothetical genes, repeats, and HEGs, each containing multiple introns. Although known group IB and IIB1 introns were found in the cox1 and SSU rRNA genes (Supporting Information, Fig. S1), most introns did not align well with known group I or group II introns. For example, the cox1 gene spanned 20 892 bp but had a coding sequence of only 1587 bp in seven exons (Fig. 1b). Intronic regions of cox1 contained 44 repeat elements, five HEGs, and four hypothetical genes. The fourth intron contained a LAGLIDADG HEG, which itself had three repeat-containing introns. Interestingly, the cox1 locus is even larger in some other fungi (e.g. 29 902 bp in A. bisporus; Ferandon et al., 2010), but there the expansion is driven by group I introns rather than by repeat elements as observed in R. solani.
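The cox1 figures above imply that only about 7.6% of the locus is coding (1587 of 20 892 bp), leaving about 19.3 kb of intronic sequence. A minimal sketch of this bookkeeping from exon coordinates follows; the coordinates are invented for illustration (the actual cox1 exon boundaries are shown only in Fig. 1b), and the function name is hypothetical.

```python
def gene_structure(exons: list[tuple[int, int]]) -> dict[str, object]:
    """Summarize a spliced gene from 1-based, inclusive exon coordinates on one strand."""
    exons = sorted(exons)
    coding = sum(end - start + 1 for start, end in exons)
    span = exons[-1][1] - exons[0][0] + 1
    # Intron length = gap between the end of one exon and the start of the next
    introns = [nxt_start - prev_end - 1 for (_, prev_end), (nxt_start, _) in zip(exons, exons[1:])]
    return {
        "coding_bp": coding,
        "span_bp": span,
        "intron_bp": introns,
        "coding_fraction": coding / span,
    }


# Hypothetical three-exon gene (not the real cox1 coordinates)
print(gene_structure([(1, 100), (301, 400), (1001, 1087)]))
```

A useful consistency check is that coding length plus total intron length equals the span; any discrepancy points to overlapping or mis-sorted exon coordinates.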
Abundance of homing endonuclease genes and accessory genes
We identified 33 genes encoding homing endonucleases, which are associated with self-splicing mobile genetic elements. All R. solani Rhs1AP homing endonucleases belonged to the two families most frequently found in fungal mitochondria, which are characterized by the LAGLIDADG and GIY-YIG sequence motifs. These motifs are usually found in group I intronic sequences and intergenic regions (Belfort et al., 2005). In total, the genome contained eight GIY-YIG and 25 LAGLIDADG HEGs. Twenty-two of the 33 HEGs were located inside introns, and the remaining 11 were in intergenic regions. Eight core genes contained introns, five of which contained HEGs. Homing endonucleases are common in fungal and other eukaryotic mt genomes (Belfort et al., 2005) and can bind asymmetric recognition sequences of 12–40 bp. HEGs are hypothesized to be ‘semi-selfish elements’ that promote transfer (homing) of introns by cleaving DNA at a specific site (Belfort et al., 2005). Homing endonucleases can also function as intron-specific maturases that facilitate (but do not catalyze) RNA splicing (Lambowitz & Perlman, 1990; Lambowitz & Belfort, 1993). Mitochondrial introns are typically self-splicing, consistent with the limited pre-mRNA processing in the organelle, and homing endonucleases can facilitate both their splicing and their integration into new sites. Thus, acquisition of endonuclease genes by self-splicing introns in R. solani likely facilitated intron expansion by converting the introns into invasive mobile elements (Haugen & Bhattacharya, 2004). In addition to HEGs, the R. solani mt genome contained 86 hypothetical genes whose products have no significant similarity to other protein sequences in public databases, to Pfam domains, or to each other. The hypothetical proteins ranged from 66 to 467 amino acids in length. Twenty of the 86 hypothetical genes were on the negative strand, which suggests that they are of recent origin.
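As a crude illustration of how the two endonuclease families differ at the sequence level, the sketch below screens protein sequences with regular expressions. The patterns are loose approximations chosen for this example (a degenerate LAGLIDADG core, and GIY and YIG separated by an assumed 10–11 residues); the study itself classified HEGs with blastp and Pfam HMM searches, which are far more sensitive. Names are hypothetical.

```python
import re

# Loose, illustrative motif patterns (assumptions), not the Pfam HMMs used in the study
HEG_MOTIFS = {
    "LAGLIDADG": re.compile(r"[LIVMFA]AG[LIVMF][LIVMF]D[AG]DG"),
    "GIY-YIG": re.compile(r"GIY.{10,11}YIG"),
}


def heg_families(protein: str) -> list[str]:
    """Return the names of motif families whose pattern occurs in the protein."""
    protein = protein.upper()
    return [name for name, pattern in HEG_MOTIFS.items() if pattern.search(protein)]


print(heg_families("MKTLAGLIDADGSFK"))             # ['LAGLIDADG']
print(heg_families("MSGIY" + "A" * 10 + "YIGKT"))  # ['GIY-YIG']
```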
One-third of the genome is occupied by interspersed repeats
The most salient feature of the R. solani Rhs1AP mt genome was the presence of interspersed repeats occupying 34% of the genome. We found at least three types of repeats: short interspersed palindromic sequences (< 40 bp), mid-length sequences (50–95 bp), and longer elements (> 100 bp, up to 963 bp). No known transposable or interspersed elements were detected. Each palindromic sequence was repeated numerous times, in one case up to 100 times (Fig. 1a). The predicted RNA products of three highly similar mid-length repeated elements had significantly lower folding free energy than randomly shuffled sequences with the same nucleotide and dinucleotide composition (t-test, P < 10^-30). Furthermore, these elements were predicted to fold into conserved, highly stable secondary structures resembling those of RNAs with enzymatic activity (Fig. 2a). In aggregate, these elements were repeated 26 times in the genome, and every copy was preceded by a candidate promoter sequence (TTAATTCGCCTA[A/T]) (Kubelik et al., 1990). One additional element, found three times, had a conserved secondary structure similar to those of introns with ribozyme activity (Fig. 2b) and may represent a highly stable core structure with catalytic activity. Based on our analysis, none of the elements showed all the characteristics required for classification as MITEs, short interspersed elements (SINEs), or other well-characterized classes of TEs, although some had terminal inverted repeats and others had apparent target site duplications (TSDs). The presence of TSDs provides evidence of past transposition events. However, because none of the predicted genes shared sequence similarity with known transposases or reverse transcriptases, it is unlikely that any of these elements behave as transposons. Usually, three or more of these repeat elements were clustered in islands in the mt genome (Fig. 1a), with the largest element (963 bp) containing all other elements except two of the palindromic repeats. These findings suggest two possibilities: either these regions are insertion hot spots into which many elements inserted consecutively, or a single large element was introduced into the genome and then copied, in some cases degenerating. Subsequently, smaller elements were copied around the genome by recombination or possibly by transposition using an unknown transposase, as suggested by the TSDs. None of the repeat elements were shared between the R. solani Rhs1AP and AG1-IB mitochondria, underscoring the relatively large divergence between these groups. Furthermore, our analysis of other fungal genomes in the Basidiomycota revealed no similar repeat structures (Table S1), suggesting that this phenomenon is unique to R. solani and that the acquisition and evolution of these repeats warrant further investigation.
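The comparison against shuffled sequences relies on shuffles that preserve dinucleotide composition. Below is a minimal sketch of one such shuffle, an Eulerian-walk dinucleotide shuffle of the kind usually attributed to Altschul and Erickson; it is an assumption about how such controls can be built, not the procedure used in this study. Folding each sequence and computing its minimum free energy (for example with ViennaRNA's RNAfold) is outside the sketch, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def _exits_reach_last(exit_edge: dict[str, str], last: str) -> bool:
    """True if following each vertex's chosen exit edge always ends at `last`."""
    for v in exit_edge:
        node, seen = v, set()
        while node != last:
            if node in seen:
                return False  # the chosen exits form a cycle
            seen.add(node)
            node = exit_edge[node]
    return True


def dinucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Random sequence with exactly the same dinucleotide counts, first base,
    and last base as seq (Eulerian walk over the dinucleotide graph)."""
    if len(seq) < 3:
        return seq
    edges = defaultdict(list)  # base -> list of bases that follow it
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last = seq[0], seq[-1]
    # Pick a final outgoing edge for every base except `last` until these
    # edges form a tree leading to `last`; this guarantees a complete walk.
    while True:
        exit_edge = {v: rng.choice(out) for v, out in edges.items() if v != last}
        if _exits_reach_last(exit_edge, last):
            break
    order = {}
    for v, out in edges.items():
        rest = list(out)
        if v in exit_edge:
            rest.remove(exit_edge[v])
        rng.shuffle(rest)
        order[v] = rest + ([exit_edge[v]] if v in exit_edge else [])
    # Walk the path, consuming each base's outgoing edges in shuffled order
    out_seq, cur, used = [first], first, Counter()
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        out_seq.append(nxt)
        cur = nxt
    return "".join(out_seq)


rna = "GGAUCCAUUGGCAUAGCUAGGCUUAAGCC"
shuffled = [dinucleotide_shuffle(rna, random.Random(seed)) for seed in range(5)]
```

Preserving dinucleotides rather than single nucleotides matters for RNA folding controls because base stacking energies depend on adjacent pairs; a mononucleotide shuffle would make the native sequence look more stable than it really is relative to its background.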
Rhizoctonia solani represents a basal and transitional lineage within the Agaricomycotina
To understand the evolutionary relationships between R. solani and other fungi in the Basidiomycota, we performed a phylogenetic analysis of the concatenated amino acid sequences of 14 core mt proteins (Fig. 3). The coding regions of the two R. solani mt genomes were highly similar (> 97% identity). Eleven Basidiomycota mt genomes that were publicly available at the time of our analysis were included, and P. anserina, a fungus in the Ascomycota, was used as the outgroup taxon. Our results are congruent with previous findings based on multilocus sequence typing analyses (Rodriguez-Carres et al., 2011) and suggest that R. solani represents an early diverging, transitional lineage within the Agaricomycotina. Earlier studies placed R. solani in the Cantharelloid clade of the Agaricomycotina (mushroom-forming fungi) (Moncalvo et al., 2006). Numerous phylogenies of the kingdom Fungi based on different loci (RPB1, RPB2, EF1-alpha, and ribosomal DNA) (Matheny, 2005; James et al., 2006; Hibbett et al., 2007) have shown that genera in the Cantharelloid clade (Botryobasidium, Cantharellus, Ceratobasidium, Thanatephorus, and Uthatobasidium) usually occupy a basal position within the Agaricomycotina.
Majority of accessory genes are transcriptionally active
To validate the predicted endonuclease and hypothetical genes, RNA samples collected from R. solani were sequenced using RNA-Seq (Nagalakshmi et al., 2008). A total of 114 195 reads mapped to mt genes (31 281 to protein-coding genes and 82 914 to ribosomal RNA genes), and an additional 46 760 reads mapped to intergenic sequences (30% of reads; Fig. S1). The fraction of reads mapping to the mt genome was lower than anticipated from the estimated ratio of mt to nuclear genome size (0.0024) (M.A. Cubeta and D.C. Schwartz, unpublished) and the expected mt copy number. Nevertheless, the data show that many accessory genes were transcriptionally active (Fig. 1a). Eighty of the 138 genes were expressed (minimum coverage of four reads), with a median coverage of 72 reads. In addition to the 15 core genes, 53 hypothetical genes, six LAGLIDADG HEGs, and three GIY-YIG HEGs were expressed. A hypothetical gene (343.t000094) had the highest expression level among the mt genes. Notably, many of the represented intergenic regions were repeats surrounding HEGs or were also present in the assembled nuclear genome, suggesting that their relatively high read abundance may partly reflect sequence overrepresentation rather than a particular biological function.
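The expression call above amounts to a read-count cut-off followed by a summary statistic. A hedged sketch with invented gene names and counts is shown below; the cut-off is a parameter (inclusive by default) so that either a minimum-of-four or a more-than-four rule can be applied. Function names are hypothetical.

```python
from statistics import median


def expressed_genes(read_counts: dict[str, int], min_reads: int = 4) -> dict[str, int]:
    """Genes whose exonic read count reaches min_reads (inclusive cut-off;
    use min_reads=5 for a strict 'more than four reads' rule)."""
    return {gene: n for gene, n in read_counts.items() if n >= min_reads}


def expression_summary(read_counts: dict[str, int], min_reads: int = 4) -> tuple[int, float]:
    """Number of expressed genes and the median read coverage among them."""
    hits = expressed_genes(read_counts, min_reads)
    return len(hits), (median(hits.values()) if hits else 0.0)


# Invented counts for hypothetical genes
counts = {"cox1": 120, "orf1": 3, "orf2": 4, "heg1": 72, "orf3": 0}
print(expression_summary(counts))               # (3, 72)
print(expression_summary(counts, min_reads=5))  # (2, 96.0)
```

The median is reported rather than the mean because a few very highly expressed genes would otherwise dominate the summary.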
To our knowledge, the mitochondrial genome of the soil fungus and potato pathogen R. solani is the largest sequenced from a filamentous fungus. In addition to the functional core set of genes, the genome comprises a rich collection of introns, homing endonucleases, hypothetical genes, and novel repetitive elements. Our data provide evidence for mitochondrial genome expansion and highlight the role of interspersed repeat elements in the evolution of fungal mitochondria. Although the genetic mechanisms underlying this expansion are unknown and beyond the scope of this study, there are several plausible explanations for this observation.
The fungal strain Rhs1AP used in this study is a naturally occurring heterokaryotic (N + N) organism that possesses at least two distinct haploid nuclear genomes, based on whole-genome sequence data and an optical map of the chromosomes (M.A. Cubeta, W.C. Nierman, and D.C. Schwartz, data not shown). The optical mapping data have also provided evidence for recombination and for the occurrence of at least three copies of five different chromosomes distributed across the genome. This genome complexity suggests that our strain has undergone previous hyphal fusion (anastomosis) and interaction events that contributed to its nuclear complexity and possibly to the observed cytoplasmic mtDNA diversity. In nature, heterokaryons of R. solani form through the interaction and fusion of hyphae from two compatible haploid (homokaryotic) strains during mating (Cubeta & Vilgalys, 1997). Heterokaryons can also form during the interaction of heterokaryotic and closely related homokaryotic strains. In both types of hyphal interaction, cytoplasm and mitochondrial DNA are mixed, which can result in heteroplasmy. Although it is widely accepted that mitochondrial DNA in animals and fungi is inherited uniparentally and does not recombine (Taylor et al., 1986; Griffiths, 1996; Basse, 2010), there is also evidence for biparental inheritance and recombination in these organisms under laboratory and field conditions (Barr et al., 2005). For example, Saville et al. (1998) and Smith et al. (1990) demonstrated low-frequency mtDNA recombination in a natural population of the root-infecting basidiomycete Armillaria gallica. de la Bastide and Horgen (2003) also provided experimental evidence of nonparental mitochondrial DNA haplotypes in crosses of homokaryons of the common button mushroom A. bisporus. More recently, Xu et al. (2000) and van Diepeningen et al. (2010) reported mtDNA recombination in natural populations of the basidiomycete human pathogenic yeast C. gattii and the dung-inhabiting filamentous ascomycete P. anserina, respectively. Interestingly, the authors of the latter study suggested that mitochondrial fusion coupled with the activity of HEGs may contribute to recombination and to patterns of mtDNA inheritance; in R. solani, HEGs were found in the assembled nuclear genome and adjacent to intergenic regions of the mtDNA. Recent research also suggests an association between mating genes and uniparental inheritance of mtDNA in the human and plant pathogenic basidiomycetes C. neoformans and Ustilago maydis (Basse, 2010). The nuclear condition (homokaryon vs. heterokaryon) of the vegetative hyphae of interacting strains may also influence the diversity and inheritance of mtDNA, as suggested by Skosireva et al. (2010), who showed deviations from expected patterns of uniparental mtDNA inheritance in crosses of haploid and nonhaploid strains of C. neoformans.
In addition to the genetic compatibility systems that regulate hyphal fusion and mating, most fungi investigated possess somatic recognition systems and the ability to distinguish self from nonself (Glass et al., 2004). The genetic basis of somatic recognition is not well understood in Rhizoctonia fungi but is thought to involve the interaction of multiple loci. In filamentous basidiomycetes, somatic incompatibility often occurs between two interacting heterokaryotic strains that differ at the loci regulating somatic recognition, resulting in programmed cell death of the interacting hyphal cells. It has been hypothesized that somatic recognition systems evolved in fungi to prevent the introgression of mitochondrial and nuclear DNA, mycoviruses, plasmids, and other selfish elements and organelles, although alternative hypotheses related to kin selection have also been proposed (Aanen et al., 2008). In R. solani, mitochondrial- and nuclear-associated plasmids and double-stranded RNA (dsRNA) elements are common and can often bypass cell death responses (Tavantzis, 1994; Charlton & Cubeta, 2007) and the systems that regulate uniparental inheritance. Plasmids and dsRNAs might promote the spread of associated mitochondrial elements that contribute to the diversity and size expansion of the R. solani mt genome observed in this study.
The soil fungus R. solani represents an important basal and transitional lineage of filamentous fungi, and the sequence of its mt genome will provide a valuable resource for further examining the phylogenetic relatedness of mushroom-forming fungi and their allies, which have been poorly sampled. Several of the core genes that harbor complex nested intronic structures in the R. solani mt genome encode common targets of the antifungal agents used to manage plant diseases caused by this fungus. Notably, resistance (insensitivity) to the quinone outside inhibitor (QoI) strobilurin fungicides has recently been reported in R. solani anastomosis group 1-IA, a major pathogen that causes sheath blight of rice (Castroagudin et al., 2013; Olaya et al., 2013). Resistance to these fungicides is associated with the F129L mutation (a phenylalanine-to-leucine substitution at codon 129) in the mitochondrial cytochrome b (cob) gene (Olaya et al., 2013). The information generated in this study will provide a unique opportunity to understand fungicide resistance mechanisms and to develop diagnostic assays and fungicide resistance monitoring strategies for a plant pathogen of global economic importance.
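As a hedged sketch of the kind of check a diagnostic assay could automate, the function below reports the residue encoded at a given codon (default 129) of a cytochrome b coding sequence. It assumes that the introns have already been removed (cob is intron-rich in R. solani) and that codon numbering follows the supplied coding sequence; published numbering may refer to a reference cytochrome b sequence, so positions may need to be mapped first. The sequences and function name are hypothetical.

```python
# Codons for the two residues of interest (identical in NCBI tables 1 and 4)
PHE_CODONS = {"TTT", "TTC"}
LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}


def residue_at(cds: str, codon_number: int = 129) -> str:
    """Classify the residue encoded by a 1-based codon of an intron-free CDS."""
    start = 3 * (codon_number - 1)
    codon = cds.upper()[start:start + 3]
    if len(codon) != 3:
        raise ValueError(f"CDS has fewer than {codon_number} codons")
    if codon in PHE_CODONS:
        return "F"  # wild-type phenylalanine
    if codon in LEU_CODONS:
        return "L"  # leucine, as in the F129L substitution
    return "other"


# Hypothetical spliced cob CDS with TTA at codon 129
mutant = "ATG" + "GCT" * 127 + "TTA" + "TAA"
print(residue_at(mutant))  # L
```

Because several different codons encode leucine, an assay that targets a single nucleotide change would miss some F129L alleles; classifying the whole codon avoids that.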
We thank Dana Busam, Karen Beeson, Sana Scherbakova, and Lakshmi Viswanathan from JCVI for assistance with sample processing, library construction, and sequencing. We are also grateful to Paul Paukstelis from the University of Maryland for helpful suggestions regarding the analysis of the RSOL-mtRPT secondary structure. We thank JoAnne Crouch, Frank Martin, and Stellos Tavantzis for presubmission review of this manuscript. Funding for this study was provided by a grant from the NSF/USDA-CSREES Microbial Genome Sequencing Program #2007-35600-18550 to W.N., R.D., and M.C.