Comparative analysis of mitochondrial genomes of Rhizophagus irregularis – syn. Glomus irregulare – reveals a polymorphism induced by variability generating elements


  • Damien Formey,

    1. Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, UPS, UMR5546, Castanet-Tolosan Cedex, France
    2. CNRS, UMR5546, Castanet-Tolosan Cedex, France
    3. Agro-Nutrition, Carbonne, France
    • These authors contributed equally to this work.

  • Marion Molès,

    1. Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, UPS, UMR5546, Castanet-Tolosan Cedex, France
    2. CNRS, UMR5546, Castanet-Tolosan Cedex, France
    • These authors contributed equally to this work.

  • Alexandra Haouy,

    1. Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, UPS, UMR5546, Castanet-Tolosan Cedex, France
    2. CNRS, UMR5546, Castanet-Tolosan Cedex, France
  • Bruno Savelli,

    1. Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, UPS, UMR5546, Castanet-Tolosan Cedex, France
    2. CNRS, UMR5546, Castanet-Tolosan Cedex, France
  • Olivier Bouchez,

    1. Plateforme Génomique, Campus INRA Chemin de Borde-Rouge, Castanet-Tolosan Cedex, France
  • Guillaume Bécard,

    1. Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, UPS, UMR5546, Castanet-Tolosan Cedex, France
    2. CNRS, UMR5546, Castanet-Tolosan Cedex, France
  • Christophe Roux

    Corresponding author
    1. CNRS, UMR5546, Castanet-Tolosan Cedex, France
    2. Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, UPS, UMR5546, Castanet-Tolosan Cedex, France

Author for correspondence:

Christophe Roux

Tel: +33 562 19 35 04



  • Arbuscular mycorrhizal (AM) fungi are involved in one of the most widespread plant–fungus interactions. A number of studies on the population dynamics of AM fungi have used mitochondrial (mt) DNA sequences, and yet AM fungal mt genomes are poorly known. To date, four mt genomes from three species of AM fungi are available, two of which are from Rhizophagus irregularis.
  • In order to study intra- and interstrain mt genome variability of R. irregularis, we sequenced and de novo assembled four additional mt genomes of this species. We used 454 pyrosequencing and Illumina technologies to directly sequence mt genomes from total genomic DNA.
  • Each strain harbors a single mt genome. Interstrain divergences in genome size were observed, resulting from highly polymorphic intergenic and intronic sequences. This polymorphism is generated by three types of variability generating element (VGE): homing endonucleases, DNA polymerase domain-containing open reading frames and small inverted repeats. Based on VGE positioning, mt sequences and nuclear markers, two subclades of R. irregularis were characterized.
  • The discovery of VGEs highlights the great intraspecific plasticity of the R. irregularis mt genome. VGEs allow the design of powerful mt markers for the typing and monitoring of R. irregularis strains in genetic and population studies.


Arbuscular mycorrhizal (AM) symbiosis is considered to be the most widespread plant–fungus interaction, as it involves c. 80% of terrestrial plant species in all ecosystems (Smith & Read, 2008). AM fungi belong to a specific phylogenetic group basal to Dikarya, the Glomeromycota (Schwarzott et al., 2001; Krüger et al., 2011), and are of great agronomic interest because of their role in plant water and mineral assimilation (Smith & Read, 2008). A considerable amount of work on the ecological impact of AM fungi under natural and field conditions has been undertaken. A correlation was established between the diversity of AM fungal species in the soil and plant biodiversity and productivity above ground (van der Heijden et al., 1998). Conversely, plant species were reported to influence the relative abundance of AM fungi by modifying the structure of the soil microbial community (van der Heijden et al., 1998; Klironomos, 2002). Overall, these studies highlight the importance of AM fungi in structuring plant communities and increasing ecosystem productivity. The most widely studied species among Glomeromycota is Rhizophagus irregularis (Błaszk., Wubet, Renker & Buscot) C. Walker & A. Schüßler – syn. Glomus irregulare Błaszk., Wubet, Renker & Buscot (Blaszkowski et al., 2008; Stockinger et al., 2009; Schüßler & Walker, 2010), a ubiquitous species present in natural and cultivated soils (Börstler et al., 2008). Population and diversity analyses of R. irregularis require the definition of robust and reliable molecular markers. Mitochondrial (mt) DNA sequences have mainly been used for this purpose, particularly the mt large-subunit (mtLSU) ribosomal RNA genes (Raab et al., 2005; Börstler et al., 2008, 2010; Croll et al., 2008; Thiéry et al., 2010).
The relevance of using mt ribosomal regions as molecular markers was strengthened by the fact that a unique haplotype of mtLSU was identified within each fungal isolate (Raab et al., 2005), whereas nuclear markers, like the internal transcribed spacer (ITS) regions, were found to be highly polymorphic within fungal isolates (Stockinger et al., 2009). Börstler et al. (2008) successfully designed a restriction fragment length polymorphism (RFLP) strategy on mtLSU sequences. They were able to distinguish 12 haplotypes among 16 isolates of R. irregularis. Surprisingly, haplotype diversity in two cultivated field sites was found to be higher than in two seminatural sites used as references (Börstler et al., 2010). These studies highlighted the utility of using mtLSU markers to investigate the diversity of R. irregularis strains in natural conditions. They also revealed that the sole use of mtLSU markers probably underestimates the actual strain diversity (Croll et al., 2008). To sum up, these studies emphasized that: (1) there is only one mtLSU haplotype in each R. irregularis strain; (2) mtLSU allows the differentiation of most strains in different haplotypes; and (3) to reach a finer scale of typing and to identify individuals, it will be necessary to define more polymorphic mt regions.

Only a comprehensive analysis of several complete mt genomes from strains of the same species will reveal highly polymorphic regions and strain-specific markers. To date, the mt genomes of two different strains of R. irregularis have been sequenced: FACE#494 (Lee & Young, 2009) and DAOM197198 (HQ189519.1). The latter isolate is the model AM fungal strain adopted by the scientific community because of its relative ease of propagation and in vitro production in large quantities (Chabot et al., 1992). Recently, mt genomes of two other species of AM fungi, Gigaspora margarita (Pelin et al., 2012) and Gigaspora rosea (Nadimi et al., 2012), have been released. The mt genomes of these three AM fungal species were found to be standard fungal mt genomes, containing the expected classical set of genes, whereas intergenic sequences showed high interspecific variability.

In addition to these studies on the interspecific variability of mt genomes, our objective was to evaluate intraspecific polymorphism of the R. irregularis mt genome. Intraspecific polymorphic regions would be helpful to design markers for fundamental genetic studies in vitro or in the field, or for quality control of AM fungal inoculum. We therefore sequenced and de novo assembled new mt genomes of four different strains of R. irregularis. In order to evaluate the efficiency of direct sequencing from total genomic DNA (gDNA) using next generation sequencing (NGS) approaches, Roche 454 GS FLX pyrosequencing and the MiSeq System (Illumina) were used. We then compared these mt genomes to analyze intra- and interstrain variability. Finally, we identified useful mt genomic regions to be used as selective markers for strain typing.

Materials and Methods

DNA preparation from fungal strains

Four strains of R. irregularis obtained from GINCO_BEL and strain DAOM197198 were selected for this study (Table 1). All strains were cultured on M medium (solidified with 0.3% gellan gum; Phytagel, Sigma) with Ri T-DNA transformed carrot roots (Bécard & Fortin, 1988) in a two-compartment Petri dish system (St-Arnaud et al., 1996). Mycelium and spores collected from the fungal compartment after 2 months of cultivation (20 000–50 000 spores) were ground in liquid nitrogen. The resulting powder was used for DNA extraction based on the cetyltrimethylammonium bromide (CTAB) method (Saghai-Maroof et al., 1984).

Table 1. Strains of Rhizophagus irregularis used in this study and the number of reads and contigs obtained for each strain after 454 or MiSeq sequencing
| Name | Origin of isolate | Reads, 454 (MiSeq) | Contigs, 454 (MiSeq) |
|---|---|---|---|
| MUCL_43204 | Clarence Creek, Ontario, Canada | 261 046 | 5928 |
| MUCL_46239 | Iles-de-la-Madeleine, Quebec, Canada | 381 424 | 7311 |
| MUCL_46240 | Buckingham, Quebec, Canada | 87 901 (5.5 × 10⁶) | 797 (1.4 × 10⁵) |
| MUCL_46241 | Ripon, Quebec, Canada | 249 597 | 3632 |
| DAOM197198 | Pont Rouge, Quebec, Canada | 247 631 | 4360 |

Sequencing and assembling procedures

Sequencing procedures were performed at the GeT-PlaGe platform, a core facility for genomics in Toulouse (France). Approximately 0.5 μg of gDNA was used for the construction of Roche 454 GS FLX sequencing libraries following the manufacturer's protocol. Genomic DNA from strain DAOM197198 was first sequenced in two independent runs, each on one-eighth of a PicoTiterPlate, to test the efficiency of sequencing the mt genome by Roche 454 pyrosequencing. Total gDNA of the four additional R. irregularis strains (MUCL_43204, MUCL_46239, MUCL_46240, MUCL_46241) was then sequenced. A specific index (MID, multiplex identifier) was assigned to each strain library. The libraries were then pooled in equimolar amounts and sequenced on two half-plates. To complete the mt genome of MUCL_43204, we performed additional sequencing on a further one-eighth region of a PicoTiterPlate. The generated 454 reads were cleaned and filtered with Pyrocleaner 1.2 software (Mariette et al., 2011), which removed 12.8% of the sequences per library on average. Cleaned reads were assembled using GS De Novo Assembler 2.5.3 (Roche), yielding a mean of 4406 contigs per library (see Table 1), each at least 500 bp long. To complete the sequence of the mt genome of strain MUCL_46240 obtained by Roche 454 pyrosequencing, and to compare with another sequencing technique, total gDNA (0.5 μg) was also sequenced using the MiSeq System (Illumina, San Diego, CA, USA). The generated reads were de novo assembled using CLC Genomics Workbench 5.0.1 software (CLC bio, Aarhus, Denmark).
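Multiplexing works because every read from a given library starts with that library's MID, so pooled reads can be sorted back to their strain after the run. The Roche pipeline performs this step itself; the Python sketch below only illustrates the principle, assuming exact-match MIDs at the very start of each read. The tag sequences and their assignment to strains are hypothetical placeholders, not the identifiers used in this study.

```python
from collections import defaultdict

# HYPOTHETICAL MID-to-strain assignment: illustrative 10-nt tags,
# NOT the identifiers actually used in this study.
MIDS = {
    "ACGAGTGCGT": "MUCL_43204",
    "ACGCTCGACA": "MUCL_46239",
    "AGACGCACTC": "MUCL_46240",
    "AGCACTGTAG": "MUCL_46241",
}


def demultiplex(reads, mids=MIDS):
    """Assign pooled reads to strain libraries by an exact leading MID match.

    Returns {strain: [reads with the MID trimmed]}; reads that start with
    no known MID are collected under the key None.
    """
    bins = defaultdict(list)
    for read in reads:
        for tag, strain in mids.items():
            if read.startswith(tag):
                bins[strain].append(read[len(tag):])
                break
        else:  # no tag matched this read
            bins[None].append(read)
    return dict(bins)
```

Production demultiplexers usually tolerate one or two mismatches in the tag; exact matching simply keeps the sketch short.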

Sequence annotation and comparison

The mt genomes were annotated using Artemis 13.2.7 software from the Wellcome Trust Sanger Institute (Rutherford et al., 2000) and NCBI BLAST (Altschul et al., 1997; Carver et al., 2005; Johnson et al., 2008) against the nonredundant (nr) and CDD v2.05 (Conserved Domain Database) databases. Transfer RNAs were annotated using the tRNAscan-SE Search Server 1.21 (Lowe & Eddy, 1997; Schattner et al., 2005). Small inverted repeats (SIRs) were manually annotated using NCBI BLAST, and their secondary structure was predicted using the RNAfold WebServer (Gruber et al., 2008). The complete mt genomes of MUCL_43204, MUCL_46239 and MUCL_46240/MUCL_46241 have been deposited in GenBank under accession numbers JQ514224, JQ514223 and JQ514225, respectively.

The mt genomes were compared using Artemis Comparison Tool (ACT) v8 (Carver et al., 2005) and MAUVE software 2.3.1 (Darling et al., 2004) with the progressive Mauve algorithm (Darling et al., 2010). In the MAUVE alignment, the height of the similarity profile is inversely proportional to the average alignment column entropy over a region of the alignment. The percentage of nucleotide identity of the different mt genomes, and of the conserved mitochondrial genes (CMGs) (ATP6, ATP8, ATP9, CYTB, COX1, COX2, COX3, NAD1, NAD2, NAD3, NAD4, NAD4L, NAD5, NAD6), relative to the FACE#494 genome was calculated with EMBOSS Stretcher online, using either the complete mt genome sequence or a consensus assembly of the CMG sequences.
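The two quantities used above can be illustrated with a minimal Python sketch, assuming gapped alignments are already available: percent identity counted as identical columns over the full alignment length (gaps included in the length, which is how EMBOSS pairwise tools report Identity), and a windowed conservation profile that decreases as mean column entropy rises. The profile uses log2(5) minus the mean entropy as a simple decreasing stand-in; this is not Mauve's exact scaling, and treating gaps as a fifth symbol is our assumption.

```python
import math
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Identical columns / alignment length x 100 (gap columns count in the length)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; '-' is a fifth symbol."""
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in Counter(column).values())


def similarity_profile(alignment: list[str], window: int = 10) -> list[float]:
    """Per-window conservation score that falls as mean column entropy rises."""
    max_h = math.log2(5)  # upper bound: five equally frequent symbols (A, C, G, T, -)
    columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
    entropies = [column_entropy(c) for c in columns]
    profile = []
    for start in range(0, len(entropies), window):
        chunk = entropies[start:start + window]
        profile.append(max_h - sum(chunk) / len(chunk))
    return profile
```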

Single-nucleotide polymorphism (SNP) analysis

The analyses of intrastrain polymorphism were performed on mt and nuclear contigs. For the mt genome, analyses were carried out on the consensus contig assembled by GS Assembler; for the nuclear genome, they were performed on the 10 longest contigs that did not match the mt genome. For each strain, the consensus contig assembly generated by GS Assembler was used as a reference, and reads were realigned onto it using the Burrows–Wheeler Aligner (BWA) (Li & Durbin, 2010) to generate alignment files in SAM format. Sequence alignments were then processed with SAMtools (Li et al., 2009) to obtain a list of potential SNPs. A Perl script was developed to retain only the SNPs that met the following criteria: (1) they are carried by reads without large soft-clipped portions upstream or downstream of the aligned region and with a matching quality > 100; (2) they are found at least twice at the given read depth; and (3) they neither lie within nor generate a nucleotide homopolymer of three bases or more. Data obtained from the Illumina MiSeq System were analysed using the SNP detection function of CLC Genomics Workbench 5.0.1.
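The three filtering criteria can be expressed compactly. The authors' Perl script is not reproduced here; the following is a hypothetical Python re-expression, assuming each candidate SNP comes with its reference context and the reads that carry it. The text does not define what counts as a "large" soft clip, so `max_softclip=10` is an assumed placeholder, and `SupportingRead` is an illustrative data structure rather than a SAMtools object.

```python
from dataclasses import dataclass


@dataclass
class SupportingRead:
    """One read carrying the candidate SNP (HYPOTHETICAL illustrative structure)."""
    left_softclip: int   # soft-clipped bases before the aligned region
    right_softclip: int  # soft-clipped bases after the aligned region
    quality: int         # matching quality of the read alignment


def run_length_at(seq: str, pos: int) -> int:
    """Length of the run of identical bases that contains seq[pos]."""
    base = seq[pos]
    left = pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = pos
    while right < len(seq) - 1 and seq[right + 1] == base:
        right += 1
    return right - left + 1


def in_or_creates_homopolymer(ref: str, pos: int, alt: str, min_run: int = 3) -> bool:
    """True if the reference base lies in, or the alternative base creates, a run >= min_run."""
    mutated = ref[:pos] + alt + ref[pos + 1:]
    return run_length_at(ref, pos) >= min_run or run_length_at(mutated, pos) >= min_run


def keep_snp(ref, pos, alt, reads, max_softclip=10, min_quality=100, min_support=2):
    """Apply criteria (1)-(3) to one candidate SNP at 0-based position pos."""
    good = [
        r for r in reads
        if r.left_softclip <= max_softclip   # ASSUMED cut-off for a "large" soft clip
        and r.right_softclip <= max_softclip
        and r.quality > min_quality          # criterion (1): matching quality > 100
    ]
    # criterion (2): seen at least twice; criterion (3): no homopolymer of >= 3 bases
    return len(good) >= min_support and not in_or_creates_homopolymer(ref, pos, alt)
```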

Microsatellite analysis

Genomic DNA was extracted as described above. We used 16 of the 18 microsatellite markers described by Mathimaran et al. (2008): Glint02 to Glint16 and Glint18. PCR products were detected indirectly using a fluorescein amidite (FAM)-labeled M13 primer (5′→3′ GTTTTCCCAGTCACGACGTTG), all forward primers carrying an M13 extension. Reactions containing either Pfu or Taq polymerase were run using the following thermal cycling conditions: 2 min at 94°C, followed by 35 cycles of 30 s at 94°C, 30 s at 55°C and 45 s (Pfu polymerase) or 30 s (Taq polymerase) at 72°C, and a final extension for 5 min at 72°C. The FAM-labeled M13 primer allowed fragment lengths to be visualized on a 48-capillary 3730 DNA Analyzer (Applied Biosystems, Inc., USA). Microsatellite data were analysed manually using GeneMapper 4.0 (Applied Biosystems, Inc., Carlsbad, CA, USA).

Phylogenetic analyses

The alignments of the mtLSU ribosomal nucleotide sequences or CMG consensus amino acid sequences were performed with MAFFT online (Katoh et al., 2002). Alignments were automatically trimmed with the gappyout method of the software trimAl (Capella-Gutiérrez et al., 2009). The evolutionary history was inferred using both the maximum likelihood method based on the Tamura–Nei model (Tamura et al., 2004), conducted in MEGA5 (Tamura et al., 2011), and the Le–Gascuel (LG) model of evolution implemented in PhyML 3.0 (Guindon et al., 2010). The percentage of trees in which the associated taxa clustered together (bootstrap support) was calculated from 500 replicates. For the phylogeny conducted in MEGA5, the initial tree for the heuristic search was obtained as follows: when the number of common sites was < 100 or less than one-quarter of the total number of sites, the maximum parsimony method was used; otherwise, the BIONJ method with a Maximum Composite Likelihood (MCL) distance matrix was used. All positions containing gaps and missing data were eliminated. For distance analyses using data from microsatellite markers and variability generating elements (VGEs), character matrices were constructed by coding each marker as an individual character, and trees were then inferred by neighbor joining (with observed divergence as the distance criterion) using PAUP software (Swofford, 1993).
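The distance analysis on coded markers can be sketched as follows: observed divergence (p-distance) between character strings, then the standard neighbor-joining agglomeration. This is a minimal Python illustration, not PAUP's implementation; the tie-breaking (first pair found) and the even split of the final edge are our own choices, and the presence/absence matrix in the usage example is hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Observed divergence: proportion of characters whose states differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def neighbor_joining(taxa: dict[str, str]) -> str:
    """Neighbor-joining tree (Newick string) from a dict of taxon -> character string."""
    labels = list(taxa)
    dist = {a: {b: p_distance(taxa[a], taxa[b]) for b in labels} for a in labels}
    while len(labels) > 2:
        n = len(labels)
        r = {a: sum(dist[a][b] for b in labels) for a in labels}
        best = None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                q = (n - 2) * dist[a][b] - r[a] - r[b]  # NJ Q criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # branch lengths from the joined pair to their new common node
        la = dist[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        node = f"({a}:{la:.4f},{b}:{lb:.4f})"
        dist[node] = {node: 0.0}
        for c in labels:
            if c not in (a, b):
                d = (dist[a][c] + dist[b][c] - dist[a][b]) / 2
                dist[node][c] = d
                dist[c][node] = d
        labels = [c for c in labels if c not in (a, b)] + [node]
    a, b = labels
    half = dist[a][b] / 2  # split the last edge evenly (arbitrary root placement)
    return f"({a}:{half:.4f},{b}:{half:.4f});"
```

For example, with a hypothetical matrix where 1/0 code presence/absence of four VGEs, `neighbor_joining({"A": "1100", "B": "1101", "C": "0011", "D": "0010"})` pairs A with B and C with D.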

Primer design and PCR

One set of primers was designed on strain-specific zones established by mt genome comparison (Supporting Information Table S1). PCR amplifications were carried out using Taq polymerase and the reaction buffer supplied (Promega) with the following conditions: 94°C for 2 min and then 40 cycles at 94°C for 30 s, 60°C for 1 min and 72°C for 1 min, followed by a final extension period at 72°C for 10 min. Another set of primers was designed on flanking regions of short variable sequences to analyze size polymorphism by PCR amplification (Table S1). The PCR protocol was identical except for the elongation time (72°C for 30 s).
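Designing primers on flanking regions amounts to predicting, for each genome, the size of the product between two primer sites; Table 4 reports such in silico predictions for FACE#494. A minimal Python sketch of that prediction follows, assuming exact primer matches and taking only the first forward-primer hit on each strand. Real tools tolerate mismatches, so this is an illustration rather than the procedure used by the authors. Because mt genomes are circular, the sequence is doubled so that products spanning the arbitrary coordinate origin are still found.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def _size_on_strand(seq, fwd, site, circular):
    """Product length on one strand, primers included, or None."""
    n = len(seq)
    search = seq + seq if circular else seq  # doubling handles the circular origin
    start = search.find(fwd)
    if start == -1 or start >= n:
        return None
    end = search.find(site, start + len(fwd))  # reverse-primer site downstream
    if end == -1:
        return None
    size = end + len(site) - start
    return size if size <= n else None


def amplicon_size(genome: str, fwd: str, rev: str, circular: bool = True):
    """Predicted PCR product size (bp) for a primer pair, trying both strands."""
    seq, fwd, rev = genome.upper(), fwd.upper(), rev.upper()
    site = revcomp(rev)  # the reverse primer anneals to the opposite strand
    for strand in (seq, revcomp(seq)):
        size = _size_on_strand(strand, fwd, site, circular)
        if size is not None:
            return size
    return None
```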

Results

Direct sequencing of mt genomes

To evaluate the efficiency of sequencing mt genomes from total gDNA, preliminary 454 GS FLX Titanium sequencing runs were performed on strain DAOM197198, whose mt genome was already available. From two one-eighth-plate sequencing runs, 247 631 reads were obtained (Table 1). The resulting de novo assembled mt genome was identical to HQ189519.1, with 5.97× coverage of the mt genome. Based on this evaluation, the gDNA of the four other strains was tagged in four distinct libraries and sequenced in one run of 454 GS FLX Titanium. This run yielded 343 Mb of sequence in 979 968 reads, corresponding to an average of 244 992 reads per strain, with an average length of 436.5 bp. As shown in Table 1, sequencing efficiency was not equivalent among the four strains. Long contigs, corresponding to circular mt genomes, were obtained for strains MUCL_46239 and MUCL_46241. Complementary 454 sequencing provided the missing information for MUCL_43204. For strain MUCL_46240, the mt genome was not completely sequenced: only 84% of it was covered when compared with the closest mt genome (MUCL_46241). To complete this mt genome sequence and to test the sequencing efficiency of another NGS technique, the genome of MUCL_46240 was sequenced independently with the MiSeq System (Illumina), generating 1.2 Gbp. In the end, complete mt genomes of strains MUCL_43204, MUCL_46239, MUCL_46240 and MUCL_46241, in addition to that of DAOM197198, were obtained with average coverages of 11.7×, 14.8×, 46.3× and 10.3×, respectively.
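The paragraph above uses "coverage" in two senses: mean depth (e.g. 11.7×, aligned bases divided by genome length) and breadth (e.g. 84% of the genome covered by at least one read). A minimal Python sketch computing both from aligned read spans, assuming 0-based half-open intervals and that reads crossing the origin of the circular genome have already been split into two intervals:

```python
def coverage_stats(intervals, genome_size):
    """Mean depth and breadth of coverage from aligned read spans.

    intervals: (start, end) pairs, 0-based and half-open, on the reference.
    Returns (mean_depth, breadth); breadth is the fraction of positions
    covered by at least one read.
    """
    diff = [0] * (genome_size + 1)  # difference array of depth changes
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    depth = covered = total = 0
    for i in range(genome_size):
        depth += diff[i]   # running depth at position i
        total += depth
        if depth > 0:
            covered += 1
    return total / genome_size, covered / genome_size
```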

SNP analysis

For all strains, the sequencing depth provided information on intrastrain SNPs in the mt genomes. The first analyses were performed on sequences obtained by Roche 454 pyrosequencing, that is, for strains MUCL_43204, MUCL_46239, MUCL_46241 and DAOM197198. Nuclear sequences presented an average SNP rate of 139.7 × 10⁻⁵ SNP/nucleotide, whereas mt sequences did not contain any SNPs (Table 2). In addition, we identified a high rate of short indels in reads when mapping them onto mt and nuclear genomes (not shown). These indels could be 454 sequencing errors, given the known biases caused by homopolymeric regions and low GC content (Balzer et al., 2011). In agreement with this hypothesis, no indels were found in the mt genome of strain MUCL_46240 obtained with the MiSeq System.
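Table 2 expresses rates per 10⁵ nucleotides, so 139.7 × 10⁻⁵ SNP/nucleotide corresponds to roughly 1.4 SNPs per kb. The Python sketch below shows the normalization; the text says "average" without specifying how strains are combined, so the unweighted mean across strains is an assumption, and the counts in the test are hypothetical.

```python
def snp_rate(n_snps: int, n_positions: int, per: float = 1e5) -> float:
    """SNPs per `per` analysed nucleotides (Table 2 uses per 10^5)."""
    if n_positions <= 0:
        raise ValueError("n_positions must be positive")
    return n_snps * per / n_positions


def mean_rate(per_strain):
    """ASSUMED unweighted mean of per-strain rates; items are (n_snps, n_positions)."""
    rates = [snp_rate(s, p) for s, p in per_strain]
    return sum(rates) / len(rates)
```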

Table 2. Intrastrain single nucleotide polymorphism (SNP) rate estimated in nuclear and mitochondrial sequences, obtained by Roche 454 pyrosequencing, of four strains of Rhizophagus irregularis
| Strain | Nuclear SNP rate (SNPs per 10⁵ nucleotides)ᵃ | Mitochondrial SNP rate (SNPs per 10⁵ nucleotides) |
|---|---|---|

ᵃThe nuclear contigs used correspond to the 10 longest contigs of each strain that do not match mitochondrial sequences.


Phylogenetic analyses

We confirmed that the strains used belonged to the clade of R. irregularis species employing mtLSU ribosomal sequences (Thiéry et al., 2010; Fig. 1) and a consensus of amino acid sequences of CMG (Supporting Information Fig. S1). In the latter tree, all CMG sequences from AM fungi (R. irregularis, Gigaspora rosea and Gigaspora margarita) were grouped in a clade distinct from the other fungal taxa. Interestingly, when comparing mtLSU or CMG nucleotide sequences among the R. irregularis strains, the sequences split into two subclades (Fig. S2a,b). A similar distribution of strains in subclades was observed using the nuclear ITS (Fig. S3). To further confirm that nuclear sequences support this dichotomy, nuclear microsatellite markers defined for typing R. irregularis strains (Croll et al., 2008) were used. The tree deduced from the microsatellite allele size pattern (Table S2) revealed similar clustering to that obtained from the mt genome comparison: one group formed by MUCL_43204 and the other formed by MUCL_46239, MUCL_46240, MUCL_46241 and DAOM197198. Intriguingly, mtLSU sequences, CMG sequences (Fig. S2) and microsatellite patterns (Table S2) of MUCL_46240 were found to be identical to those of MUCL_46241.

Figure 1.

Phylogeny of 22 isolates of arbuscular mycorrhizal (AM) fungi based on mitochondrial large subunit (mtLSU) ribosomal sequence comparison. Alignment was based on data from Thiéry et al. (2010) leading to 369 conserved sites of mtLSU. Phylogenetic inferences were calculated using the maximum likelihood method based on the Tamura–Nei model. The major clades are surrounded by colored rectangles. The tree with the highest log likelihood (−1363.6593) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Branch lengths are representative of the number of substitutions per site. All positions containing gaps and missing data were eliminated.

Genome composition and comparison

The mt genomes of strains MUCL_46240 and MUCL_46241 were found to be identical, confirming the above analyses (Figs 2, S4). The four resulting distinct mt genome sequences were compared with that of strain FACE#494 (Table 3). All genomes are circular, with an average GC content of c. 37%. Their sizes fall within the usual range of fungal mt genomes (usually < 100 kb). Nevertheless, genome size varied widely among the strains sequenced here, from 70 783 bp for DAOM197198 to 87 754 bp for MUCL_43204. All strains have the same classical set of protein-coding genes, as already annotated in strain FACE#494 (Fig. S4). Transfer RNAs form a complete set and are the same in all strains, except for MUCL_43204, which lacks the pseudo-Asp-tRNA. The strains also differ in their number of predicted open reading frames (ORFs), FACE#494 having the lowest ORF content (19) and MUCL_43204 the highest (34) (Table 3). This variation in the number of predicted ORFs could partly explain the variation in genome size. Calculation of the percentage of identity of each genome with that of FACE#494, using either the whole genome or CMGs only, revealed (Table 3) that (1) most divergences were found in the intergenic regions, and (2) MUCL_43204 is distant from the others, with 65.8% nucleotide sequence identity (CMGs) compared with over 99% for the other strains. The synteny and identity analyses of these genomes showed that coding sequences are conserved in organization and identity, whereas introns and intergenic regions differ in sequence or position (Fig. 2). For instance, the light blue block on the left of the figure changes position (after or before the green block) and orientation (above or below the central line of each panel). This region corresponds to part of a DNA polymerase domain, as do the purple blocks on the right of the figure. These purple blocks differ in size depending on the strain, and contribute to genome size variation.
In MUCL_43204, this phenomenon is easily distinguishable: multiple insertion events have created introns that invaded the genome and increased its size. These examples indicate that the variations observed in genome organization and size are caused by the insertion/deletion of genetic elements. Three types of VGE were identified: homing endonuclease ORFs (HEOs), DNA polymerase domain-containing ORFs (DPDCOs) (Fig. 3) and SIRs (Fig. S5). HEOs and SIRs are found both within CMGs and in intergenic regions, whereas DPDCOs are found only in the intergenic regions between CMGs.

Figure 2.

Multiple mitochondrial genome alignment of five Rhizophagus irregularis strains performed with MAUVE 2.3.1. Each genome's panel contains the name of the strain, a scale showing the sequence coordinates for that genome and a single black horizontal center arrow representing the limit between the strands. Colored blocks correspond to presumably homologous genome sequences. When a block lies above or below the center line, the aligned region is in the forward or reverse orientation, respectively, relative to the first strain. Regions without a block lack detectable homology among the compared genomes. Inside each block, the height of the similarity profile corresponds to the average level of conservation in that region. Areas that are completely white were not aligned. Vertical lines link homologous blocks between the genomes. The boxes below the blocks represent the features of the GenBank annotation. The top row represents the features on the forward strand, the middle boxes represent the features on the reverse strand and the bottom boxes represent the small inverted repeats. White box, conserved mitochondrial genes; blue box, DNA polymerase domain-containing open reading frame (ORF); red box, homing endonuclease ORF; green box, predicted ORF.

Figure 3.

Distribution of the different variability generating elements (VGEs) among the mitochondrial genomes of five different strains of Rhizophagus irregularis. Roman numerals, Arabic numerals and letters represent the locus of each homing endonuclease open reading frame (ORF) (HEO) with a GIY-YIG domain, HEO with a LAGLIDADG domain and DNA polymerase domain-containing ORF (DPDCO), respectively. Black dots represent the presence of the element in the genome; the absence of a black dot indicates that the corresponding VGE is not present at that locus. The tree on the right illustrates the distance analysis based on HEOs and DPDCOs as characters.

Table 3. Comparison of mitochondrial genomes of five strains of Rhizophagus irregularis obtained in this study with that of FACE#494 (Lee & Young, 2009)

| Strain | %GC | Genome size (bp) | CMGs | Structural genes | Predicted ORFsᵃ | % ID genome vs FACE#494 | % ID CMGs vs FACE#494 | HEOs, GIY-YIG domain | HEOs, LAGLIDADG domain | DPDCOs | SIRs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FACE#494 | 37.2 | 70 606 | 14 | 28 | 19 | – | – | 2 | 5 | 5 | 26 |
| DAOM197198 | 37.2 | 70 783 | 14 | 28 | 19 | 92.8 | 99.9 | 2 | 5 | 6 | 27 |
| MUCL_46239 | 37.2 | 70 818 | 14 | 28 | 21 | 95.6 | 99.8 | 2 | 5 | 6 | 28 |
| MUCL_46241/MUCL_46240 | 37.3 | 74 797 | 14 | 28 | 23 | 90.0 | 99.5 | 2 | 5 | 6 | 32 |

CMG, conserved mitochondrial gene; DPDCO, DNA polymerase domain-containing ORF; HEO, homing endonuclease ORF; ORF, open reading frame; SIR, small inverted repeat.
ᵃIncluding ORFs encoding proteins of unknown function, and pseudogenes.

HEOs are ORFs, embedded within introns, that encode homing endonucleases. In the genomes studied, we found two HEOs with GIY-YIG domains, two with a double type 1 LAGLIDADG domain and three with a single type 2 LAGLIDADG domain. The MUCL_43204 strain has a LAGLIDADG HEO organization different from that of the other strains: it comprises two HEOs with a putative LAGLIDADG domain, three with a single type 1 LAGLIDADG domain, four with a double type 1 LAGLIDADG domain and three with a single type 2 domain. HEO insertions are responsible for 42% of the size increase in MUCL_43204 (not shown).

DPDCOs are long ORFs that contain complete or degenerate DNA polymerase domains and are differentially distributed in the genomes studied. We found five DPDCOs in strains FACE#494 and MUCL_43204, and six in strains MUCL_46239, MUCL_46241 and DAOM197198. One DPDCO alone explains most of the genome size difference observed between FACE#494 and MUCL_46241 (purple block in Fig. 2).

SIRs are small elements (33–99 bp) whose predicted secondary structure allows them to fold into a single hairpin (Boer & Gray, 1991). Five different types of SIR were identified based on their sequence and folding structure (Fig. S5). They were differentially distributed among the five genomes: 26 SIRs in FACE#494, 27 in DAOM197198, 28 in MUCL_46239, 32 in MUCL_46241 and 50 in MUCL_43204.
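The hairpin property that defines SIRs can be screened crudely in sequence space: a candidate folds into a single hairpin when its 5′ arm is the reverse complement of its 3′ arm, leaving a loop in between. The Python sketch below implements only this perfect-match terminal inverted-repeat check; it is not a replacement for the RNAfold predictions used in this study (no free-energy model, no G·U pairs, no internal loops), and `min_stem=10` is a hypothetical threshold.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def stem_length(seq: str, min_loop: int = 3) -> int:
    """Length of the perfect terminal inverted repeat (hairpin stem).

    Grows the stem inward from both ends while the 5' arm is the reverse
    complement of the 3' arm, keeping at least `min_loop` unpaired bases.
    """
    seq = seq.upper()
    stem = 0
    while 2 * (stem + 1) + min_loop <= len(seq):
        s = stem + 1
        if seq[:s] != revcomp(seq[-s:]):
            break
        stem = s
    return stem


def looks_like_sir(seq: str, min_len: int = 33, max_len: int = 99, min_stem: int = 10) -> bool:
    """Crude SIR screen: length in the reported 33-99 bp range and a long terminal stem."""
    return min_len <= len(seq) <= max_len and stem_length(seq) >= min_stem
```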

Overall, the number of SIRs and the number and position of HEOs/DPDCOs are the main differences between MUCL_43204 and the other strains, supporting the separate subclade formed by MUCL_43204 (Fig. 3).

PCR-based strain identification strategies

As VGEs clearly differentiate the mt genomes of the different strains, these elements were targeted to define strain-specific PCR primers. The first strategy was to design primers targeting a region (DPDCO or HEO) specific to each strain, leading to a diagnostic method based on the presence/absence of PCR amplification. Primers allowing specific amplification from a single spore were successfully designed for strains DAOM197198, MUCL_43204 and MUCL_46240/MUCL_46241 (Fig. S6). Strain MUCL_46239 did not contain a sequence sufficiently specific or long to design a pair of specific primers.

A second strategy was developed based on the length polymorphism of amplicons. Two pairs of primers targeting the region flanking VGEs (HEO or SIR) were chosen that discriminated between the five strains (Table 4, Fig. S7). As expected, the patterns were identical for the pair of strains MUCL_46240/MUCL_46241.

Table 4. Amplicon size (bp) obtained with the polymorphism length primers for different strains of Rhizophagus irregularis
| Primer | FACE#494ᵃ | DAOM197198 | MUCL_46239 | MUCL_43204 | MUCL_46240 | MUCL_46241 |
|---|---|---|---|---|---|---|

ᵃFor FACE#494, sizes are in silico predictions from the mitochondrial genome sequence (Lee & Young, 2009).
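Whether a set of length-polymorphism primers discriminates strains can be checked by grouping strains whose amplicon sizes agree, within sizing error, for every primer pair; strains that end up in the same group, like MUCL_46240 and MUCL_46241 here, cannot be told apart. A minimal Python sketch, assuming a hypothetical 5-bp sizing tolerance and hypothetical sizes (not the values of Table 4):

```python
def pattern_groups(sizes: dict[str, tuple[int, ...]], tol: int = 5) -> list[list[str]]:
    """Group strains whose amplicon sizes agree within `tol` bp for every primer pair.

    Each strain is compared with the first member of each existing group;
    strains matching no group start a new one.
    """
    groups: list[list[str]] = []
    for strain, pattern in sizes.items():
        for group in groups:
            reference = sizes[group[0]]
            if all(abs(a - b) <= tol for a, b in zip(pattern, reference)):
                group.append(strain)
                break
        else:  # no existing group matched
            groups.append([strain])
    return groups
```

With one group per strain, the primer set fully discriminates the strains tested.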

Discussion

In order to analyze mt genome variability in R. irregularis at the strain level and to provide markers for typing these strains, we sequenced the mt genomes of four new strains of R. irregularis in addition to DAOM197198. To our knowledge, this is the second study dealing with the intraspecific comparison of fungal mt genomes, and the first conducted on AM fungal mt genomes. Torriani et al. (2008) compared the mt genomes of isolates of the ascomycete Mycosphaerella graminicola from the USA and Europe, and observed little variation. We show here that comparison of the mt genomes of R. irregularis, obtained by direct sequencing of total DNA using NGS approaches, opens new avenues to investigate the populations and genetics of this species.

Multiplexed direct sequencing of mt genomes

Recent analyses of mt genomes from different taxa have underlined the power of NGS techniques (Zaragoza et al., 2010; Horn et al., 2012; Jex et al., 2012). One of our goals was to estimate the ability of these techniques to generate entire mt genome sequences of AM fungi from a reasonable amount of total gDNA in a semi-routine manner. The first published R. irregularis (FACE#494) mt genome was sequenced after a preliminary step of whole genome amplification (WGA; Lee & Young, 2009). WGA is a powerful approach when the quantity of biological material is limiting, but polymerase amplification introduces sequence errors and, moreover, is not sufficiently exhaustive to ensure that all regions of the mt genome will be represented (Pinard et al., 2006). From our experience with both 454 and Illumina sequencing, the Illumina technique, combined with user-friendly bioinformatics software (CLC Genomics Workbench 5.0.1), is the most efficient strategy to sequence and de novo assemble AM fungal mt genomes from the gDNA of R. irregularis. Total gDNA was extracted from a reasonable number of spores (minimum of 10 000, i.e. one two-compartment Petri dish). The steady increase in the effectiveness/cost ratio of NGS techniques, the constant decrease in the amount of biological material required and the improvement in bioinformatic algorithms should favor their routine use.

mt genomes showed no variation within strains

As indicated in Table 2, in contrast with nuclear genomes, no SNPs were detected in mt genomes. A higher SNP rate in nuclear than in mt genomes has already been observed in yeasts (Clark-Walker, 1991), in contrast with the observations in mammals (Saccone et al., 2000). The fact that the assembly of the R. irregularis mt genome did not show any ambiguity, together with the absence of SNPs, argues for a single mt genome per strain, in agreement with the results of previous studies (Kuhn et al., 2001; Börstler et al., 2008; Lee & Young, 2009).

Interspecific phylogenetic relationships

The phylogeny of Glomeromycota and related taxa is still debated, as the relative position of this group varies according to the sequences used (nuclear, James et al., 2006; mt, Lee & Young, 2009; nuclear and mt, Liu et al., 2009) and the number of taxa sampled. Recently, two phylogenetic studies based on mtDNA-encoded proteins (Nadimi et al., 2012; Pelin et al., 2012) proposed that Mortierellales and Glomeromycota are sister clades. Nadimi et al. (2012) further showed that support for this grouping decreases when Smittium culisetae, a Harpellales species, is included in the analysis. After adding our sequences of mtDNA-encoded proteins to those available in GenBank, we obtained a tree (Fig. S1) in which Rhizopus oryzae and Mortierella verticillata grouped together as a clade sister to Glomeromycota, the latter still forming an independent clade. This new placement of basal fungi reinforces the conclusion of Nadimi et al. (2012) that only broad taxon sampling of mtDNAs will resolve the phylogeny of basal fungi.

Intraspecific phylogenetic relationships and co-segregation of mt and nuclear genomes

By showing that intraspecific variation was lower than interspecific variation, we confirmed that all strains belong to the same species, R. irregularis (Fig. 1). However, both mt (LSU, CMG, VGE number and position) and nuclear (ITS and microsatellites) markers showed a concordant subgrouping that separated MUCL_43204 from the other strains (Figs S2, S3 and Table S2). Several studies have shown that nuclear material can be exchanged between strains (Angelard et al., 2010; Angelard & Sanders, 2011; Colard et al., 2012), but mt genomes were not investigated because of the lack of appropriate markers. Our results suggest co-segregation and/or co-evolution of the two genomes. The mechanisms governing the transmission of mitochondria could act at the segregation or post-segregation level. For example, mechanisms of vegetative incompatibility during anastomosis could prevent the mixing of mt material from different strains. Alternatively, selection could occur after hyphal fusion, driven by interactions between nuclear and mt genomes, as described in the oomycete Phytophthora infestans (Giovannetti et al., 1994). Although meiotic and mating type-like genes have been identified in Rhizophagus spp. (Halary et al., 2011; Tisserant et al., 2012), there is no evidence of sexual reproduction. The combination of mt genome homogeneity and nuclear genome heterogeneity within strains clearly raises fundamental questions about the mechanisms involved in the exchange of genetic material between individuals.
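The species-level argument above rests on a simple comparison: the largest within-species distance should be smaller than the smallest between-species distance. The minimal Python sketch below illustrates that check with uncorrected p-distances on pre-aligned sequences. The function names, the toy sequences and the species labels are hypothetical, and the phylogenetic analysis in Fig. 1 may well rely on a different distance model.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        diffs += x != y  # True counts as 1
    return diffs / compared if compared else 0.0


def intra_below_inter(seqs, species):
    """True if the largest within-species distance is below the smallest between-species one."""
    intra, inter = [], []
    for a, b in combinations(seqs, 2):
        d = p_distance(seqs[a], seqs[b])
        (intra if species[a] == species[b] else inter).append(d)
    return max(intra, default=0.0) < min(inter, default=float("inf"))


# Hypothetical aligned fragments (not real R. irregularis data).
seqs = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "outgroup": "TCGAACCTAG",
}
species = {"isolate_A": "sp1", "isolate_B": "sp1", "outgroup": "sp2"}
print(p_distance(seqs["isolate_A"], seqs["isolate_B"]))  # 0.1
print(intra_below_inter(seqs, species))  # True
```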

mt genome plasticity is generated by three types of VGE

The comparison of the five mt genomes allowed us to identify three types of VGE: HEOs, DPDCOs and SIRs.


HEOs participate in genome dynamics through the creation and reorganization of introns (Gimble, 2000). Through the enzymes they encode, these ORFs can confer on introns the ability to be spliced and/or re-inserted into the genome at a target site of 10–40 bp. These sequences are considered mobile elements that do not alter the RNA produced by the host gene, because of their ability to self-splice (Chevalier & Stoddard, 2001). HEOs are also known to generate partial duplications of the 3′ end of genes at their insertion site, resulting in different gene versions (Paquin et al., 1997). In some cases, these genes can be spliced differently, giving HEOs a potential role in neofunctionalization. In Podospora anserina, the first intron of the mt COX1 gene has been shown to play a crucial role in the lifespan of the organism (Begel et al., 1999). The COX1 gene of MUCL_43204 has nine insertions not present in the other strains; one lies in the first intron and corresponds to a degenerate HEO with a LAGLIDADG domain. Metabolic activation of mitochondria is important during the presymbiotic growth of AM fungi (Besserer et al., 2006, 2008). It would be interesting to examine the impact of these HEO-induced genome rearrangements through experiments on cell respiration, growth rate, host specificity and colonization efficiency.


Several VGEs contained partial or complete DPDCOs. mtDNA polymerases are largely unknown or uncharacterized in most eukaryotes (Shutt & Gray, 2006), with the exception of those of humans and yeasts (Kaguni, 2004). In R. irregularis, these motifs range in size from 307 to 2021 bp; some are shared between strains and others are strain-specific, indicating high variability and plasticity of the mt genome. In contrast with FACE#494, the other five strains have an insertion (at position 7611 in FACE#494) that corresponds to a DPDCO (1560 bp). This observation in closely related isolates suggests that the insertion–deletion event is recent and that these elements can be mobile. It has recently been proposed that DPDCOs in the ciliate Oxytricha trifallax originate from linear mt plasmids, which act as mobile elements that periodically invade the mt genome (Swart et al., 2011). In the basidiomycete Moniliophthora perniciosa, stable integration of a linear plasmid into the circular mt genome has been reported (Formighieri et al., 2008). This integrated plasmid encodes two hypothetical proteins and an RNA polymerase on the forward strand, and one hypothetical ORF and a DNA polymerase on the reverse strand. In the mt genome of strain MUCL_46241, a sequence with a large 5-kb ORF, the main source of genome size increase compared with FACE#494, likewise encodes two hypothetical proteins on the forward strand and a DNA polymerase on the reverse strand. The organization and composition of this sequence suggest the insertion of a free mt plasmid. We propose that R. irregularis mt genomes have undergone similar integration events and that other strain-specific DPDCOs are ancient integrated plasmids that have since degenerated. As for HEOs, the impact of DPDCOs on the metabolic activity of mitochondria remains to be investigated.
In Podospora anserina, the integration of linear plasmids into the mt genome was correlated with greater longevity (Hermanns & Osiewacz, 1996).


The third type of VGE identified, SIRs, consists of small elements of 33–99 bp that can fold into a hairpin-like structure. SIRs were first identified in Chlamydomonas reinhardtii (Boer & Gray, 1991). They are assumed to contribute to genome rearrangement through a mechanism analogous to bacterial transposition or recombination (Denovan-Wright & Lee, 1994; Nedelcu & Lee, 1998). In fungi, similar elements of the same size range have also been found, but these fold into double hairpin elements (DHEs) (Paquin et al., 2000). The origin and dispersal mechanisms of these elements remain unclear, but some studies have reported that some of them are mobile (Nakazono et al., 1994), probably moving by a ‘cut-and-paste’ transposition mechanism (Grindley & Reed, 1985). Moreover, the target sites of homing endonucleases have been reported to range from 15 to 40 bp, and some are palindromic, making them structurally close to SIRs (Jurica et al., 1998). In R. irregularis, we found SIRs in all the strains studied, in introns, hypothetical proteins, DPDCOs, HEOs and intergenic regions, but never in CMG exons, suggesting that selection acts before or after insertion. We identified five types of SIR differing in sequence, which could potentially be targeted by different homing endonucleases with different cleavage-site specificities. We speculate that SIRs and HEOs act collaboratively: SIRs containing a cleavage site could transpose to a new position and then be cleaved by homing endonucleases, whose coding sequences could insert there, creating new HEOs. This hypothesis is supported by the presence of different types of SIR and different HEOs in mt genomes. However, we were not able to identify a previously described HEO cleavage site (Jurica & Stoddard, 1999) in any of the SIR types.
Several SIRs were conserved between strains, showing identical primary or secondary structures, although at different positions in the genomes. This indicates that SIR transposition is a dynamic, still-active system that contributes substantially to the plasticity of the R. irregularis mt genome.
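Because SIRs are defined by their ability to fold into a hairpin, the underlying sequence signature is an inverted repeat: a stem followed, after a short loop, by its reverse complement. The minimal Python sketch below scans a sequence for such perfect stem-loops. It is an illustration only, not the detection method used in this study: it assumes perfect Watson-Crick stems (no mismatches, no G-T pairs), the parameters `min_stem`, `min_loop` and `max_loop` are arbitrary, and the demo sequence is synthetic.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def pairs(x, y):
    """True if bases x and y form a Watson-Crick pair."""
    return (x, y) in WATSON_CRICK


def find_hairpins(seq, min_stem=6, min_loop=3, max_loop=12):
    """Find perfect stem-loops; returns (start, end, stem_len, loop_len), end exclusive."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for l in range(1, n):  # l = index of the first loop base
        for loop in range(min_loop, max_loop + 1):
            r = l + loop  # loop occupies seq[l:r]; right stem starts at r
            if r >= n:
                break
            # If the loop ends can pair while leaving a loop >= min_loop, the same
            # hairpin is reported elsewhere with a shorter loop and a longer stem.
            if loop - 2 >= min_loop and pairs(seq[l], seq[r - 1]):
                continue
            s = 0
            while l - 1 - s >= 0 and r + s < n and pairs(seq[l - 1 - s], seq[r + s]):
                s += 1  # extend the stem outward one base pair at a time
            if s >= min_stem:
                hits.append((l - s, r + s, s, loop))
    return hits


# Synthetic sequence: stem ACGGTC, loop TTTT, then its reverse complement GACCGT.
demo = "AAAA" + "ACGGTC" + "TTTT" + "GACCGT" + "AAAA"
print(find_hairpins(demo))  # [(4, 20, 6, 4)]
```

Scanning palindromic target sites for homing endonucleases would use the same pairing logic with a loop length of zero or one, which is one way to see why the two kinds of element are structurally close.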

Mitochondrial markers are suitable for strain differentiation and monitoring

mt genomes have often been used to investigate fungal population dynamics in the field (Börstler et al., 2008, 2010; Sykorova et al., 2011) and in genetic studies, such as that of mitochondrial inheritance after crosses in the ascomycete Mycosphaerella graminicola (Ware, 2006). Thanks to the development of precise mt markers, we were able to differentiate our R. irregularis strains using only one spore per fungal sample and simple PCR and electrophoresis analyses. We also showed that strains MUCL_46240 and MUCL_46241 could not be differentiated using nuclear microsatellites, by comparing their mt genomes or, a fortiori, with the mt markers mtLSU and CMG. MUCL_46240 and MUCL_46241 were collected in two cultivated fields in Quebec (Canada), c. 30 km apart. Unless an experimental error occurred during strain isolation or subculturing, we must conclude that MUCL_46240 and MUCL_46241 are two isolates of the same strain. This underlines the value of the mt genome as a source of stable (one genome per individual) and robust markers for typing and tracing AM fungal strains. Finally, the mt primers designed from newly identified VGEs, which can be used in single-spore PCR assays, will allow studies of mt exchanges in R. irregularis populations and will help to investigate the ‘parental’ inheritance of mitochondria in AM fungi.
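The typing logic above reduces to comparing marker profiles: two isolates that give identical amplicons for every marker cannot be told apart, which is the situation observed for MUCL_46240 and MUCL_46241. The minimal Python sketch below groups isolates by identical profiles. The isolate names, marker order and amplicon sizes are hypothetical placeholders, not results from this study.

```python
from collections import defaultdict


def group_indistinguishable(profiles):
    """Group isolates whose amplicon profiles are identical across all markers."""
    groups = defaultdict(list)
    for isolate, profile in profiles.items():
        groups[tuple(profile)].append(isolate)  # identical profiles share a key
    return sorted(sorted(members) for members in groups.values())


# Hypothetical amplicon sizes in bp per marker (None = no PCR product).
profiles = {
    "isolate_1": (450, None, 310),
    "isolate_2": (450, None, 310),
    "isolate_3": (520, 700, 310),
}
print(group_indistinguishable(profiles))  # [['isolate_1', 'isolate_2'], ['isolate_3']]
```

Any group with more than one member marks isolates that the current marker set cannot resolve, which is where additional VGE-derived markers would be needed.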

In conclusion, we have identified new variable regions in mt genomes of R. irregularis. These regions are composed of three different types of VGE which provide new genetic markers for the precise monitoring of R. irregularis at the strain level. These markers will also be useful for fundamental studies on genetic exchanges in AM fungi.


The authors thank Jérôme Lluch (PlaGe, Campus INRA Auzeville, Castanet-Tolosan Cedex, France) and Francis Carbone (Laboratoire de Recherche en Sciences Végétales (LRSV), Castanet-Tolosan Cedex, France) for their role in Illumina sequencing. This work was supported by the FUI project Neofertil. M.M. and A.H. were funded by grants from Neofertil. D.F. was funded by Agronutrition. This work was performed in the LRSV laboratory, which is part of the TULIP ‘Laboratoire d'Excellence (LABEX) (ANR-10-LABX-41)’.