DNA barcoding of arbuscular mycorrhizal fungi


Author for correspondence:
Arthur Schüßler
Tel: +49 89 2180 74730
Email: arthur.schuessler@lmu.de


  • Currently, no official DNA barcode region is defined for the Fungi. The COX1 gene DNA barcode is difficult to apply. The internal transcribed spacer (ITS) region has been suggested as a primary barcode candidate, but for arbuscular mycorrhizal fungi (AMF; Glomeromycota) the region is exceptionally variable and does not resolve closely related species.
  • DNA barcoding analyses were performed with datasets from several phylogenetic lineages of the Glomeromycota. We tested a c. 1500 bp fragment spanning the small subunit (SSU), the ITS region and the large subunit (LSU) nuclear ribosomal DNA for species-resolving power. Subfragments covering the complete ITS region, c. 800 bp of the LSU rDNA, and three c. 400 bp fragments spanning the ITS2, the LSU-D1 or the LSU-D2 domain were also analysed.
  • Barcode gap analyses did not resolve all species, but neighbour joining analyses, using Kimura two-parameter (K2P) distances, resolved all species when based on the 1500 bp fragment. The shorter fragments failed to separate closely related species.
  • We recommend the complete 1500 bp fragment as a basis for AMF DNA barcoding. This will also allow future identification of AMF at species level based on 400 or 1000 bp amplicons in deep sequencing approaches.


This study aimed to define a DNA barcoding region for arbuscular mycorrhizal fungi (AMF) that is also useful for molecular in-field community studies. Although AMF are perhaps the most important fungi in terrestrial ecosystems, forming mutualistic symbioses with c. 80% of land plants (Brundrett, 2009), much of their biology is still enigmatic. One recent and surprising finding is the presence of Mycoplasma-related endobacteria in AMF (Naumann et al., 2010), whose function is completely unknown. The lack of knowledge about many aspects of AMF biology is partly due to their asexual, obligately symbiotic and subterranean lifestyle. All AMF belong to the phylum Glomeromycota (Schüßler et al., 2001), and molecular biological methods have revealed cryptic species and shown, for example, that spore morphs previously defined as different species in distinct families (e.g. morphs of Ambispora leptoticha) are conspecific (Sawaki et al., 1998; Redecker et al., 2000; Walker et al., 2007). However, asexual reproduction and potentially clonal diversity complicate the interpretation of AMF species boundaries (Stukenbrock & Rosendahl, 2005). Despite this limitation, the present species concept is valuable, congruent with phylogenetic analyses (Walker et al., 2007; Msiska & Morton, 2009; Stockinger et al., 2009) and important for uncovering functional diversity. Unfortunately, knowledge of preferential associations of AMF with plants under particular environmental conditions is still very limited, although a better understanding of differential AMF–plant associations and symbiotic preferences is of high ecological relevance and will affect sustainable management practices in agriculture and forestry.

Identification of AM fungal species from the field

Community analyses based on morphological monitoring of AMF spores in the soil provide important hints about species composition in different ecosystems (Oehl et al., 2009; Robinson-Boyer et al., 2009), but spores are resting stages and may not reflect the species that are physiologically active at the time of sampling (Sanders, 2004). Moreover, relatively little is known about how environment or host plant influence sporulation dynamics over space and time (Walker et al., 1982).

To overcome such drawbacks, molecular methods were developed to detect AMF directly within roots. The most frequently used markers are one or more of the nuclear rRNA genes, for example the widely used small subunit (SSU) rRNA gene (Helgason et al., 1999; Wubet et al., 2006; Lee et al., 2008), the internal transcribed spacer (ITS) rDNA region including the 5.8S rRNA gene (Wubet et al., 2004; Hempel et al., 2007; Sýkorová et al., 2007), and part of the large subunit (LSU) rRNA gene (Gollotte et al., 2004; Pivato et al., 2007; Rosendahl et al., 2009). However, many molecular analyses are biased, as some of the primers used detect only parts of the community, and in most cases the level of taxonomic resolution is uncertain. Species-level community analyses based on rDNA regions should be feasible (Gamper et al., 2009; Stockinger et al., 2009), but no single molecular marker or DNA barcode is yet suitable for species-level resolution of all AMF.

DNA barcoding for fungal species definition and identification

DNA barcoding in the strict sense is defined as the standardized analysis of an easily amplifiable PCR fragment for sequence-based identification of species. Identifications must be accurate, rapid, cost-effective, culture-independent, universally accessible and usable by nonexperts (Frézal & Leblois, 2008). By DNA barcoding, organisms can be identified in life cycle stages not suited for morphological identification (Gilmore et al., 2009).

In DNA barcoding, species are separated by standardized barcode gap analyses or by phylogenetic tree-building methods. A barcode gap exists if the minimum interspecific variation is larger than the maximum intraspecific variation. Alternatively, phylogenetic neighbour joining analysis based on Kimura two-parameter (K2P = K80) distances is a suggested standard method, and more sophisticated phylogenetic methods will most likely be applied in future.
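Both criteria can be stated compactly. In the block below the notation is introduced here for illustration only: D(x, y) is the pairwise distance between sequences x and y, S is the set of sequences assigned to one species, and P and Q are the proportions of compared sites that differ by a transition or a transversion, respectively (Kimura, 1980).

```latex
\[
  \mathrm{gap}(S) \;=\; \min_{x \in S,\; y \notin S} D(x, y) \;-\; \max_{x, x' \in S} D(x, x'),
  \qquad \text{a barcode gap exists for } S \text{ if } \mathrm{gap}(S) > 0
\]
\[
  D_{\mathrm{K2P}} \;=\; -\tfrac{1}{2}\ln\!\left(1 - 2P - Q\right) \;-\; \tfrac{1}{4}\ln\!\left(1 - 2Q\right)
\]
```

For example, one transition among ten compared sites (P = 0.1, Q = 0) gives D = ½ ln(1/0.8) ≈ 0.112, slightly above the uncorrected proportion of 0.1.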

A part of the mitochondrial cytochrome c oxidase 1 (COX1) gene has become the first official animal DNA barcode (Hebert et al., 2004; http://www.barcoding.si.edu/) and for plants an agreed system is based on the plastid loci rbcL and matK (Hollingsworth et al., 2009), but no official consensus strategy exists for fungi. A standardized DNA-based species identification system for fungi would be extremely useful. There are c. 100 000 named fungi (Kirk et al., 2008), and estimates suggest that as many as 1.5–3.5 million species exist (Hawksworth, 2001; O’Brien et al., 2005). Identification of many of these, particularly from their vegetative state, will only be possible by molecular methods.

Primers have long been available for the nuclear ITS rDNA region (White et al., 1990; Gardes & Bruns, 1993), and these are now commonly used for fungal identification (Kõljalg et al., 2005; Summerbell et al., 2007). The ITS rDNA region will probably be proposed to the Consortium for the Barcode of Life (CBOL, http://www.barcoding.si.edu) as a fungal barcode (Seifert, 2009). As for many other organism groups, fungal sequence data derived from inaccurately identified material exist in the public databases (Ryberg et al., 2008), and a lack of vouchers often precludes verification of sequences (Agerer et al., 2000). Unfortunately, third-party corrections in the GenBank sequence database are prohibited (Bidartondo et al., 2008). Initiatives such as UNITE (http://unite.ut.ee) were established to provide validated and curated data, but such data are still lacking for AMF.

COX1 is not suited as a general fungal barcode

If the COX1 region is shown to be unsuitable for easy PCR amplification, sequencing and species identification, its use is precluded under the CBOL standards. Although this region showed promise for Penicillium spp. (Seifert et al., 2007), the length of fungal COX1 is highly variable (1.6–22 kb), and the shortest potential barcoding region varies in length from 642 bp to > 12 kb (Seifert, 2009). Moreover, fungal species-level discrimination with COX1 genes may be inaccurate (Chase & Fay, 2009), and in Fusarium and the Aspergillus niger complex multiple paralogues hinder species-level resolution (Geiser et al., 2007; Gilmore et al., 2009). For the AMF Glomus sp. FACE#494, the barcoding region of COX1 spans 2200 bp and contains several introns (Lee & Young, 2009). In addition, the mtDNA of Glomus diaphanum contains a COX1 intron with high sequence similarity to a corresponding COX1 intron in plants and Rhizopus oryzae (Lang & Hijri, 2009). The plant intron is thought to have originated by horizontal gene transfer (HGT) from fungi (Vaughn et al., 1995; Lang & Hijri, 2009), further questioning the general usability of COX1 as a barcode for either fungi or plants.

Defining a DNA barcoding region for AMF

Both potential primary barcoding regions, COX1 with its large length variation and the ITS rDNA with its lack of discrimination between closely related AMF species (Stockinger et al., 2009), seem unsuited for AMF. We therefore aimed to define a DNA barcoding region for the Glomeromycota by comparing different nuclear rRNA gene regions and the ITS.

Hereafter, for simplicity, we abbreviate the nuclear SSU rRNA gene as SSU, the LSU rRNA gene as LSU and the 5.8S rRNA gene as 5.8S, and use the term ‘ITS region’ for the complete ITS1–5.8S–ITS2 rDNA (Fig. 1). A DNA fragment of 1420–1602 bp, amplified with AMF-specific primers (Krüger et al., 2009), was sequenced from species in widely separated AMF clades. The fragment covers c. 240 bp of the SSU, the 400–526 bp long ITS region and 776–852 bp of the LSU. We compared the complete fragment, the ITS region, the LSU region and three c. 400 bp fragments, covering the 5.8S + ITS2, LSU-D1 or LSU-D2, for species-resolving power and suitability as a DNA barcode. The length of these short fragments corresponds to the c. 400 bp average read length of environmental deep sequencing with the current 454 GS-FLX Titanium system. The barcode we propose here will also facilitate species identification with future deep sequencing systems offering read lengths > 1000 bp (http://www.454.com; http://www.pacificbiosciences.com).

Figure 1.

 Schematic representation of the nuclear ribosomal DNA regions studied. Triangles indicate positions of priming sites that were used as borders for in silico analyses of the fragments. Lines indicate the fragments analysed.

Materials and Methods

Taxa and public sequences used for analyses

The ‘core dataset’ sequences investigated in this study (see the Supporting Information, Table S1) cover the partial SSU, the ITS region and the partial LSU, completely spanning the region amplified with primers SSU-Glom1 (Renker et al., 2003) and NDL22 (van Tuinen et al., 1998). For all AMF analysed, a culture identifier or a voucher deposited in a herbarium (W-numbers) is known; for most, both are available. The attempt (Att) numbers refer to the culture collection of Christopher Walker, BEG identifiers to the ‘International bank for the Glomeromycota’ (http://www.kent.ac.uk/bio/beg), INVAM to the ‘International culture collection of (vesicular) arbuscular mycorrhizal fungi’ (http://invam.caf.wvu.edu) and MUCL to the ‘Glomeromycota in vitro collection’ (GINCO; http://emma.agro.ucl.ac.be/ginco-bel/). Some additional identifiers are listed in Table S1. For the five AMF species included in the AFTOL (Assembling the Fungal Tree of Life) project (James et al., 2006), the individual SSU, ITS and LSU sequences were assembled into a contiguous consensus sequence. For the ‘extended dataset’ used in the analyses of the Ambisporaceae, Diversisporaceae and Glomus Group Aa, additional public database sequences (Tables S2–S6) were included. Sequences probably derived from contaminants (Schüßler et al., 2003) were excluded.
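Assembling separate SSU, ITS and LSU records into one contig relies on their overlapping ends. The sketch below is a minimal illustration, assuming the parts are supplied in 5'→3' order on the same strand and overlap exactly; real consensus building must also reconcile mismatches and ambiguity codes, which this does not attempt. The function names, the min_overlap default and the toy sequences are hypothetical.

```python
from typing import List, Optional


def merge_overlapping(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Join two sequences on the longest exact suffix/prefix overlap.

    Returns None when no overlap of at least `min_overlap` bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Try the longest possible overlap first, then shrink.
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def assemble_contig(parts: List[str], min_overlap: int = 20) -> Optional[str]:
    """Chain-merge ordered parts (e.g. SSU, ITS, LSU records) into one contig."""
    contig = parts[0]
    for part in parts[1:]:
        merged = merge_overlapping(contig, part, min_overlap)
        if merged is None:
            return None  # parts do not overlap: no contiguous sequence
        contig = merged
    return contig


if __name__ == "__main__":
    # Toy sequences, invented for illustration
    print(assemble_contig(["AAAACCCCGGGG", "CCGGGGTTTTAA", "TTAACGCGCG"], min_overlap=4))
    # AAAACCCCGGGGTTTTAACGCGCG
```

Trying the longest overlap first avoids merging on a short, spurious match when a longer true overlap exists.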

DNA extraction, PCR amplification, cloning and sequencing

Spores were cleaned and DNA was extracted as described in Schwarzott & Schüßler (2001). Initially, PCR was performed with primer SSU-Glom1 combined with NDL22 or LR4+2 (Stockinger et al., 2009). Later, for the majority of the AMF characterized (Table S1), the PCR approach with AMF-specific primers described in Krüger et al. (2009) was used. Polymerase chain reactions with the Phusion High Fidelity DNA polymerase (Finnzymes, Espoo, Finland), cloning, restriction fragment length polymorphism (RFLP) analyses and sequencing were performed as described in Krüger et al. (2009), except that Glomus caledonium BEG20 was amplified using a Taq DNA polymerase (Peqlab, Erlangen, Germany) and some clones were obtained using the StrataClone Blunt PCR Cloning Kit (Stratagene Agilent Technologies, La Jolla, CA, USA). Sequences were assembled and proofread with SeqAssem (http://www.sequentix.de) and deposited in the EMBL database under accession numbers FN547474–FN547681.

Phylogenetic and sequence divergence analyses

The partial SSU, ITS region and partial LSU sequences from this study and public database sequences covering the same regions were analysed (Table S1). Data were mainly from single-spore DNA extractions or single-spore isolates of characterized AMF species. Shorter regions were delimited either by gene borders or by primer binding sites. The fragments used for analyses were: the ITS region (400–526 bp), including the 5.8S and cut at the gene boundaries to the SSU and LSU; the LSU fragment (776–852 bp), covering the LSU up to the binding site of primer LSUmBr (Krüger et al., 2009); the ITS2 fragment (352–430 bp), corresponding to an ITS3–ITS4 (White et al., 1990) amplicon and including most of the 5.8S and the complete ITS2 region; the LSU-D1 fragment (281–394 bp), corresponding to the portion bordered by the LR1 (van Tuinen et al., 1998) and FLR3 (Gollotte et al., 2004) priming sites (although FLR3 is a forward primer); and the LSU-D2 fragment (370–436 bp), corresponding to an FLR3-LSUmBr amplicon (Fig. 1).
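Cutting a subfragment in silico at two priming sites can be sketched as follows. This is a minimal illustration under stated assumptions: the primer strings are invented placeholders (the actual ITS3, ITS4, LR1, FLR3 and LSUmBr sequences are not given in this text), matching is exact, so IUPAC degenerate primer bases are not handled, and the end border is assumed to come from a reverse primer. A border defined by a forward primer such as FLR3 would instead be searched for on the same strand, which this sketch does not cover.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extract_between(seq: str, fwd_primer: str, rev_primer: str,
                    include_primers: bool = True) -> Optional[str]:
    """Cut `seq` from the forward priming site to the reverse priming site.

    `fwd_primer` is written 5'->3' on the same strand as `seq`;
    `rev_primer` is written 5'->3' on the opposite strand, so its reverse
    complement is searched for downstream of the forward site.
    Returns None if either site is missing.
    """
    seq = seq.upper()
    fwd = fwd_primer.upper()
    start = seq.find(fwd)
    if start < 0:
        return None
    rc_rev = reverse_complement(rev_primer)
    end = seq.find(rc_rev, start + len(fwd))
    if end < 0:
        return None
    if include_primers:
        return seq[start:end + len(rc_rev)]
    return seq[start + len(fwd):end]


if __name__ == "__main__":
    # Toy sequence and primers, invented for illustration
    toy = "TTTTACGTACGGGCCCTGCAACAAAA"
    print(extract_between(toy, "ACGTAC", "GTTGCA"))         # ACGTACGGGCCCTGCAAC
    print(extract_between(toy, "ACGTAC", "GTTGCA", False))  # GGGCCC
```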

For some analyses, shorter or less well-defined sequences from the database were included and manually aligned to the core dataset with Align (http://www.sequentix.de) or ARB (Ludwig et al., 2004; http://www.arb-home.de). The resulting dataset is referred to as the ‘extended dataset’. Sequence divergences were calculated based on the K2P model (Kimura, 1980) with pairwise deletion of gaps, using the ape package of R (Paradis et al., 2004). To illustrate sequence divergences within and between species, TaxonGap 2.3 (Slabbinck et al., 2008) was used.
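The K2P distance with pairwise deletion can be sketched in a few lines. This is a minimal illustration, not the ape implementation: it assumes aligned, equal-length sequences and treats any character other than A, C, G or T (gaps, N, IUPAC codes) as missing, skipping that column for the pair being compared only.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter (K80) distance between two aligned sequences.

    Pairwise deletion: a column is skipped if either sequence has a gap,
    N or other ambiguity code there, so the number of compared sites can
    differ from pair to pair.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in BASES or y not in BASES:
            continue  # missing data at this column for this pair
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # distance undefined (saturation)
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # One transition among ten compared sites; the gapped column is skipped
    print(round(k2p_distance("A-AAAAAAAAA", "GAAAAAAAAAA"), 4))  # 0.1116
```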

The analyses of database sequences included some identical sequences, because the database entries did not exclude the possibility that they originated from different spores or cultures. Phylogenetic analyses were performed with PHYLIP 3.6 (Felsenstein, 2005) using neighbour joining tree-building based on K2P distances. A consensus tree was calculated from 1000 bootstrap replicates with SumTrees (Sukumaran & Holder, 2008). As an alternative approach, sequences were aligned automatically using the MAFFT online server (MAFFT version 6; http://align.bmr.kyushu-u.ac.jp/mafft/online/server/) before phylogenetic analyses, with the iterative refinement option of MAFFT set to FFT-NS-i (Katoh et al., 2002). Phylogenetic trees were processed with TreeGraph 2 (treegraph.bioinfweb.info), TreeViewJ (Peterson & Colosimo, 2007) and TreeDyn (Chevenet et al., 2006) and refined with Adobe Illustrator CS3.
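The neighbour joining step itself can be illustrated with a compact sketch of the standard algorithm: at each step, the pair of nodes minimizing the Q-criterion (n − 2)d(i, j) − r(i) − r(j) is joined, where r is a node's summed distance to all other nodes. This is not the PHYLIP code; the function name and the Newick output with four-decimal branch lengths are choices made here, and negative branch lengths, which neighbour joining can produce on non-additive data, are left unclamped. Bootstrapping and consensus building are separate steps not shown.

```python
from typing import List


def neighbour_joining(labels: List[str], dist: List[List[float]]) -> str:
    """Return an unrooted neighbour joining tree in Newick format."""
    if len(labels) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(labels)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Pick the pair (i, j) minimizing the Q-criterion
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining node
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the final three nodes as an unrooted trifurcation
    d01, d02, d12 = d[0][1], d[0][2], d[1][2]
    l0 = (d01 + d02 - d12) / 2
    l1 = (d01 + d12 - d02) / 2
    l2 = (d02 + d12 - d01) / 2
    return f"({nodes[0]}:{l0:.4f},{nodes[1]}:{l1:.4f},{nodes[2]}:{l2:.4f});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    print(neighbour_joining(taxa, matrix))
    # (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

The toy matrix is additive, so the sketch recovers its tree and branch lengths exactly; real K2P matrices are rarely additive.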


The phylum Glomeromycota presently contains 219 described species. Of these, 81 are available as cultures from the INVAM, BEG and GINCO collections. Only some of these are single-spore isolates and some may be misidentified. Many undescribed or unaffiliated AMF are also hosted in culture collections. In the present work, we analysed a core dataset represented by 28 characterized AMF species from three different orders, with a focus on close relatives. For the Diversisporaceae, five of the eight known species could be covered, whereas within the Gigasporaceae (sensu Morton & Benny, 1990) and the Acaulosporaceae five of the 45 and four of the 36 known species, respectively, were studied. For the Pacisporaceae (seven species; not available as cultured AMF), one species could be analysed from stored DNA extracts from the study of Walker et al. (2004). In the monogeneric Glomerales, 11 of the 102 described Glomus species and, in the Ambisporaceae, two of eight could be studied. Further well-defined sequences were used for some groups, such as the Ambisporaceae ITS region for five of the eight known species. In general, the availability of well-defined isolates is a major bottleneck for the study of many AMF taxa.

We did not test the AM1-NS31 SSU fragment, used in many environmental studies including a recent 454 GS-FLX sequencing approach (Öpik et al., 2009), because the AM1 primer discriminates against many AMF taxa and the amplified region lacks species-resolving power.

Intraspecific rDNA sequence variation

No universal percentage of intraspecific sequence variation (K2P distance) could be defined as a threshold to separate AMF species. For the longest DNA fragment studied, SSUmCf-LSUmBr (c. 1500 bp, see Table S7, corresponding to the core dataset), the maximum intraspecific variation ranged from 0.47 to 10.8%. Considering only the seven species for which at least 24 sequence variants are available (Acaulospora laevis, Gigaspora margarita, Gigaspora rosea, Scutellospora gilmorei, Glomus intraradices, Glomus sp. ‘irregulare-like’ DAOM197198 and Glomus versiforme), the maximum intraspecific variation was at least 1.55%. The highest value, 10.8%, was found in G. intraradices (cultures FL208 and MUCL49410).

For the ITS region, the maximum intraspecific variation ranged from 0.23 to 14.6%, or from 2.96 to 14.6% when only the seven species with at least 24 variants of the SSUmCf-LSUmBr fragment were analysed. Glomus intraradices (FL208 and MUCL49410) again showed the highest intraspecific variation. The range of variation in the LSU-D2 fragment was 0–15.7% (2.8–15.7% for species with at least 24 known sequence variants), again with G. intraradices showing the highest value.

For the LSU-D1 fragment (LR1-FLR3), five species lacked intraspecific variation (number of distinct sequences in parentheses): Glomus sp. WUM3 (6), G. caledonium (3), Acaulospora scrobiculata (4), Glomus luteum (5), Diversispora celata (3). In general, this region showed the lowest intraspecific variation for most species analysed, with one exception, Kuklospora kentinensis (14) where the ITS2 fragment (ITS3–ITS4) showed the lowest variation with only a single basepair insertion in some sequences. Further K2P distance data are shown in the Supporting Information Figs S1, S2.

Barcode gap analyses

A barcode gap is not a prerequisite for DNA barcoding, but may allow species to be distinguished easily (Hebert et al., 2004). Barcode gaps could not be found for all AMF species studied. Comparison of the different regions, regardless of the alignment method used (Table S7, Fig. S1), showed that the complete fragment (SSUmCf-LSUmBr) resulted in the lowest number (4) of species without a barcode gap, followed by the complete ITS region (5) and the LSU region (7). Analysis of the LSU-D2 fragment also resulted in seven species lacking a barcode gap, whereas the LSU-D1 fragment revealed 12 species without a barcode gap. The ITS2 fragment (covering most of the 5.8S) resulted in eight species without a barcode gap. For the complete fragment, the size of the barcode gaps, where they existed, varied from only 0.1% to 22%. Further analyses of the Ambisporaceae and Diversisporaceae are shown in Fig. S2.
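Counting species without a barcode gap follows directly from the definition. A minimal sketch, assuming a precomputed distance for every pair of sequences (keyed here by an unordered pair) and a mapping from sequence to species; the function name and the toy values in the example are invented for illustration and are not data from this study.

```python
from itertools import combinations
from typing import Dict, FrozenSet


def barcode_gaps(species_of: Dict[str, str],
                 dist: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Per species: minimum interspecific minus maximum intraspecific distance.

    `species_of` maps sequence ID -> species; `dist` maps each unordered
    pair of sequence IDs to their distance. A value <= 0 means no gap.
    """
    intra_max: Dict[str, float] = {}
    inter_min: Dict[str, float] = {}
    for a, b in combinations(species_of, 2):
        value = dist[frozenset((a, b))]
        sa, sb = species_of[a], species_of[b]
        if sa == sb:
            intra_max[sa] = max(intra_max.get(sa, 0.0), value)
        else:
            for s in (sa, sb):
                inter_min[s] = min(inter_min.get(s, float("inf")), value)
    return {s: inter_min.get(s, float("inf")) - intra_max.get(s, 0.0)
            for s in set(species_of.values())}


if __name__ == "__main__":
    # Invented toy distances: species B varies more internally (0.08)
    # than its distance to species A (0.05), so B has no barcode gap.
    species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
    pairs = {("a1", "a2"): 0.02, ("b1", "b2"): 0.08,
             ("a1", "b1"): 0.05, ("a1", "b2"): 0.09,
             ("a2", "b1"): 0.06, ("a2", "b2"): 0.10,
             ("a1", "c1"): 0.20, ("a2", "c1"): 0.21,
             ("b1", "c1"): 0.15, ("b2", "c1"): 0.16}
    gaps = barcode_gaps(species_of, {frozenset(k): v for k, v in pairs.items()})
    print(sorted(s for s, g in gaps.items() if g <= 0))  # ['B']
```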

Phylogenetic analyses of the core dataset

The Gigasporaceae, Acaulosporaceae, Diversisporaceae, Ambisporaceae, Glomus Group B, Glomus Group Aa and Glomus Group Ab were analysed separately, because the high variation in the ITS region made alignment across family-level groups impossible. For each group, five defined regions covered by the SSUmCf-LSUmBr fragment were analysed (Fig. 1). All alignment positions were included in the neighbour joining analyses (Figs 2, S3–S8), and the results for the core dataset are summarized in Table 1.

Figure 2.

 Phylogenetic tree computed from all c. 1500 bp SSUmCf-LSUmBr fragment sequences analysed (core dataset), demonstrating species level resolution. Neighbour joining analyses (1000 bootstraps) with bootstrap (BS) support displayed down to the level of species. Note that the BS support values differ from those given in Table 1, because an unambiguous alignment of internal transcribed spacer 1 (ITS1) and ITS2 sequences between families, as computed here, is impossible. Therefore, the BS values shown here are biased by ambiguously aligned sites in the highly variable regions and for species level comparison the values from Table 1 should be referred to. The corresponding species is written to the right of each cluster; every second cluster is highlighted in grey.

Table 1.   Bootstrap values supporting each species as monophyletic in neighbour joining analyses (based on K2P distances, 1000 bootstraps) of six regions: the complete SSUmCf-LSUmBr fragment, the complete internal transcribed spacer (ITS) region, the large subunit (LSU) region, and the ITS2, LSU-D1 and LSU-D2 fragments
Species                           SSUmCf-LSUmBr   ITS   LSU  ITS2  LSU-D1  LSU-D2
Gigaspora margarita                          88    75    55    47       –      34
Gigaspora rosea                             100    90    48    90       –      59
Scutellospora gilmorei                      100    99    88    93       –      69
Scutellospora spinosissima                   92    98     –    95       –       –
Scutellospora heterogama                    100    99   100   100      97      98
Length of alignment (positions)            1505   468   795   394     398     376
Acaulospora laevis                          100   100   100   100     100     100
Acaulospora scrobiculata                    100   100   100   100     100     100
Acaulospora sp. WUM18                       100   100   100   100     100     100
Kuklospora kentinensis                      100   100   100   100     100     100
Length of alignment (positions)            1591   525   826   436     403     401
Diversispora celata                         100    95   100    70      99     100
Diversispora spurca                         100    96   100    97       –     100
Glomus aurantium                            100    94    94    95       –      94
Glomus eburneum                             100    75   100    72      99      93
Glomus versiforme                           100   100   100   100     100     100
Length of alignment (positions)            1600   497   860   407     398     440
Glomus cf. clarum                           100   100   100   100     100     100
Glomus intraradices                          72     –     –     –       –       –
Glomus sp. ‘irregulare-like’                100    96    99    53       –      95
Glomus proliferum                            94    80     –     –       –       –
Length of alignment (positions)            1644   540   863   437     400     440
Glomus mosseae                              100    97   100    93      98      99
Glomus sp. WUM3                             100    97   100    98       –     100
Glomus caledonium                           100   100    96    99       –      97
Glomus coronatum                            100   100   100   100      99      99
Length of alignment (positions)            1664   565   862   448     397     442
Glomus etunicatum                           100    99   100    90      96     100
Glomus sp. W3349                            100   100   100   100     100     100
Glomus luteum                               100   100   100   100      96      93
Length of alignment (positions)            1624   539   843   433     392     430
–, species not resolved as monophyletic.

The complete fragment (SSUmCf-LSUmBr) provided the best discriminatory power. Each of the analysed species was resolved with bootstrap support of at least 72%, and most species with > 90%. The AFTOL sequences of Glomus mosseae and Scutellospora heterogama cluster with those of the corresponding species. Sequences of Glomus sp. ‘irregulare-like’ DAOM197198 (= MUCL43194 = DAOM181602, used for the ongoing Glomus genome sequencing project) and ‘GINCO #4695rac-11G2’ cluster with those of Glomus irregulare and together likely represent one species, confirming the evidence of Stockinger et al. (2009).
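The values in Table 1 are, for each species, the percentage of bootstrap trees in which its sequences form a group. A minimal sketch of that bookkeeping, under stated assumptions: alignment columns are resampled with replacement, tree building for each pseudo-replicate is left to external software (e.g. a neighbour joining program such as the PHYLIP one used here), and each replicate tree is supplied as a list of its non-trivial splits, each given by the taxa on one side. The function names and toy splits are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One bootstrap pseudo-replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def is_monophyletic(members: FrozenSet[str], splits: List[FrozenSet[str]],
                    taxa: FrozenSet[str]) -> bool:
    """True if `members` forms one side of a split of an unrooted tree.

    Each split is given by the taxa on one of its sides, so both the set
    itself and its complement are checked.
    """
    complement = taxa - members
    if len(members) <= 1 or len(complement) <= 1:
        return True  # trivial split, present in every tree
    return any(side == members or side == complement for side in splits)


def species_support(members: FrozenSet[str], replicate_splits: List[List[FrozenSet[str]]],
                    taxa: FrozenSet[str]) -> float:
    """Percentage of replicate trees in which `members` is monophyletic."""
    hits = sum(is_monophyletic(members, s, taxa) for s in replicate_splits)
    return 100.0 * hits / len(replicate_splits)


if __name__ == "__main__":
    taxa = frozenset("abcde")
    # Non-trivial splits of three hypothetical replicate trees
    replicates = [
        [frozenset("ab"), frozenset("de")],   # ((a,b),c,(d,e))
        [frozenset("ac"), frozenset("de")],   # ((a,c),b,(d,e))
        [frozenset("ab"), frozenset("abc")],  # ((a,b),c,(d,e)), other side of one split
    ]
    print(round(species_support(frozenset("ab"), replicates, taxa), 1))  # 66.7
    print(species_support(frozenset("de"), replicates, taxa))            # 100.0
```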

Almost all species could be separated using the complete ITS region, except G. intraradices and its close relatives. The same situation was reported for maximum likelihood analyses of this region (Stockinger et al., 2009) and holds true for analyses of the LSU region alone. Using the LSU, neither Scutellospora spinosissima (three sequences) nor Glomus proliferum (15 sequences) was resolved as monophyletic, and the Gigaspora rosea clade (27 sequences) had bootstrap support below 50%. When the ITS2, LSU-D1 and LSU-D2 fragments were analysed separately, the LSU-D1 fragment performed worst, with sequences from 11 of the 25 species not forming monophyletic clades. The ITS2 and LSU-D2 fragments performed better, but still did not separate G. proliferum (15 sequences) from G. intraradices (47 sequences). Gigaspora margarita BEG34 did not form a well-supported clade with either fragment. As for the 800 bp LSU, S. spinosissima (three sequences) was not resolved in the LSU-D2 analysis.

Although not included in the CBOL standards or recommendations, a BLAST approach was tested in addition to the phylogenetic analyses. We used the NCBI blastn default settings in both public-database and local BLAST searches, and checked all SSUmCf-LSUmBr fragment sequences for correct identification. This alternative approach always gave first hits corresponding to the correct species (data not shown).
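The top-hit check can be automated from BLAST+ tabular output (blastn -outfmt 6, whose default 12 columns are listed in the code). A minimal sketch, assuming sequence IDs can be mapped to species names; self-hits are skipped so that a query searched against a local database containing it is not trivially matched to itself. The helper names, IDs and labels in the example are invented placeholders, not results from this study.

```python
import csv
import io
from typing import Dict, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(tabular: str, exclude_self: bool = True) -> Dict[str, str]:
    """Map each query ID to the subject ID with the highest bit score."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        query, subject = rec["qseqid"], rec["sseqid"]
        if exclude_self and query == subject:
            continue
        score = float(rec["bitscore"])
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {q: s for q, (_, s) in best.items()}


def fraction_correct(hits: Dict[str, str], species_of: Dict[str, str]) -> float:
    """Share of queries whose top hit carries the same species label."""
    correct = sum(species_of[q] == species_of[s] for q, s in hits.items())
    return correct / len(hits)


if __name__ == "__main__":
    # Invented hits; IDs and species labels are placeholders
    table = ("q1\tq1\t100.0\t1500\t0\t0\t1\t1500\t1\t1500\t0.0\t2771\n"
             "q1\tspA_2\t99.2\t1500\t12\t0\t1\t1500\t1\t1500\t0.0\t2700\n"
             "q1\tspB_1\t95.0\t1500\t75\t0\t1\t1500\t1\t1500\t0.0\t2300\n"
             "q2\tspA_3\t97.0\t1480\t44\t0\t1\t1480\t1\t1480\t0.0\t2500\n")
    hits = top_hits(table)
    labels = {"q1": "A", "q2": "B", "spA_2": "A", "spA_3": "A", "spB_1": "B"}
    print(hits)                            # {'q1': 'spA_2', 'q2': 'spA_3'}
    print(fraction_correct(hits, labels))  # 0.5
```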

Phylogenetic analyses of the extended dataset

Shorter sequences from the public database, selected according to their assigned name or culture identifier, were included in some analyses. In addition, some environmental sequences were used, predominantly from the Ambisporaceae, Diversisporaceae and Glomus Group Aa.

Analyses of Ambisporaceae. SSUmCf-LSUmBr fragments were available for only two Ambisporaceae species (Table S7, Fig. S1), but five ITS regions and several environmental sequences of Ambispora species could be analysed. All were phylogenetically well separated (Fig. S9). The environmental sequences (number in parentheses) from Taxus baccata (6), Prunus africana (1) or Plantago lanceolata (1) roots form branches distant from the characterized species.

Analyses of Diversisporaceae. The ITS analyses of the Diversisporaceae (Fig. S10) did not reveal any fundamental differences from the analyses of the core dataset (Fig. S7). We note here that several Glomus species have not yet been formally transferred to the genus Diversispora and therefore carry the ‘wrong’ genus name. The four ITS database sequences from the INVAM culture AZ237B from Arizona and the four sequences of NB101 from Namibia are most likely of conspecific origin. Also, a set of 30 environmental ITS sequences annotated as G. versiforme in the database clusters separately from G. versiforme BEG47 and should be annotated as an unknown Diversispora species. It was already known that Glomus fulvum (five sequences), Glomus megalocarpum (2) and Glomus pulvinatum (2) form a clade far apart from other Diversisporaceae species and together probably represent a distinct genus (Redecker et al., 2007).

For the LSU analyses (Fig. S11), the four database sequences (AM947664,65, AY842573,74) from G. versiforme BEG47 clustered with the 25 sequences of our BEG47 core dataset sequences, but the sequence EU346868 from a G. versiforme culture HDAM-4 was widely separated. All database sequences (EF067886-88) referring to Glomus eburneum INVAM AZ420A as well as D. celata (Gamper et al., 2009) clustered with those of the respective species in our core dataset. Three Glomus aurantium LSU database sequences (EF581860,62,63) are separated from two other sequences (EF581861,64). All five sequences are linked to voucher W4728 and originate from one trap culture setup with material collected near Tel Aviv in Israel (J. Błaszkowski, pers. comm. 21 September, 2009). As trap cultures usually contain several species, it is not certain that the sequences in the subclades were derived from conspecific organisms.

Analyses of Glomus Group Aa (‘Glomus mosseae group’). Analysis of our core dataset of this group showed clear separation of species with the ITS region, the ITS2 fragment and both LSU fragments analysed. However, the situation changed when database sequences were included for the ‘extended dataset’ (see Figs 3, S4).

Figure 3.

 Internal transcribed spacer (ITS) region (a), ITS2 fragment (b) and the large subunit (LSU)-D2 fragment (c) neighbour joining analyses (1000 bootstraps) of Glomus Group Aa. Analysis (c) is performed with a different dataset than (a) and (b) (for details see the Supporting Information, Tables S5, S6). Some long branches were reduced in length to 50% (//). ‘AY635833, AY997053, DQ273793’ represents the consensus sequences of these sequences. Glomus mosseae (closed square), Glomus sp. WUM3 (grey circle), Glomus coronatum (grey triangle, apex up), Glomus caledonium (black triangle, apex right), Glomus monosporum (open square with cross), Glomus fasciculatum (diamond), Glomus geosporum (grey triangle, apex down), Glomus dimorphicum (open square), Glomus constrictum (black circle), Glomus fragilistratum (grey triangle, apex right).

For the ITS region, Glomus sp. WUM3 (six sequences), G. caledonium (10 sequences) and Glomus geosporum (31 sequences) formed well-separated clades. Glomus mosseae sequences formed two well-supported subclades (Fig. 3), which were rendered paraphyletic by the ex-type of Glomus coronatum BEG28 (16 sequences) clustering between them. However, the minor G. mosseae clade (only seven sequences) consists exclusively of sequences derived from field-sampled spores with identifiers GMO2 and GMO3. From spore GMO2, one sequence (AF161058) clusters in the minor clade, whereas all other sequences from this spore (AF161055-57, AF166276) cluster within the major clade.

The ITS sequences in Glomus Group Aa reveal more discrepancies. Glomus monosporum (IT102: AF004689; FR115: AF004690, AF125195), Glomus dimorphicum (BEG59: X96838-41) and ‘Glomus fasciculatum’ BEG58 (X96842,43; but see following text) sequences cluster in the major G. mosseae clade.

For the G. mosseae major clade (excluding the GMO2 and GMO3 sequences), the intraspecific variation of the complete ITS region is 12.1% (100 sequences). When adding the G. monosporum, G. fasciculatum BEG58 and G. dimorphicum sequences clustering in this clade the variation increased only marginally to 12.2% (109 sequences). The intraspecific variation of the other characterized species within Glomus Group Aa varied between 0.8 and 2.8%.

The LSU-D2 fragment analysis resulted in clear separation into several well-supported clades (Fig. 3), but some contain sequences from more than one species. One Glomus fragilistratum sequence clusters within the G. caledonium clade. One G. coronatum BEG49 sequence is distant from those of the ex-type culture G. coronatum BEG28 (= Att108); it clusters with Glomus sp. WUM3, but a Glomus constrictum BEG130 sequence also falls in this clade. For G. mosseae, the intraspecific variation of the LSU-D2 fragment is 19.4% (170 sequences); the major G. mosseae clade had a variation of 15.8% (158 sequences) and the smaller clade of 11.2% (12 sequences). The other species in this group showed intraspecific variation between 1.2 and 5.0% (5–28 sequences per species).


In this study, we analysed several regions of the nuclear rDNA as possible candidates for DNA barcoding of AMF, including the ITS region, which is widely used for fungal identification. Because the ITS region alone has been shown to be unable to resolve closely related AMF species (Stockinger et al., 2009), whereas a longer 1500 bp fragment could be successfully applied, we used this longer rDNA fragment as a baseline. Moreover, c. 400 bp fragments were analysed for their power to resolve species and their suitability for community analyses using 454 GS-FLX Titanium pyrosequencing (Valentini et al., 2009).

Intraspecific rDNA variation and its definition

In the present study, we calculated intrasporal and intraspecific rDNA variability for several species. However, species determination in the Glomeromycota is largely based on a morphological species concept, and the apparently asexual lifestyle may complicate the interpretation of species borders, although asexual speciation is found in diverse organism groups. Glomus Group Ab, perhaps the best-studied AMF clade, exemplifies the problems. A very high intraspecific variation was found in G. intraradices (Stockinger et al., 2009). This species was characterized from two isolates and the parent culture of one of them (FL208, the ‘ex-type culture’ of this species, derived from a root trap culture). Interestingly, the 1500 bp rDNA from a single spore roughly encompassed the amount of rDNA variation, and also the pattern of sequence types, found in all samples analysed, which were derived from the two isolates and the FL208 culture. Both isolates originated from the same field site, but from material sampled 20 yr apart. The results raise questions such as whether one AMF spore contains most of the existing intraspecific rDNA variation, or whether the similarity in the sequence-type patterns reflects, for example, the sampling of two recent descendants of a clonal lineage. These questions remain open, but the closely related ‘G. irregulare clade’ (likely representing a single species) contains a huge number of sequences derived from diverse ecosystems on many continents. Glomus intraradices sequences have never been detected in these ecosystems and so far are known only from Citrus sp. in Florida. We interpret these data as most likely reflecting a biologically meaningful genetic separation of different organisms. Although we can currently separate all morphospecies studied, and take this as support for the applicability of DNA barcoding to AMF, it must be noted that the species concept used to define these asexual organisms may change.

The intraspecific and intrasporal variation differed considerably among the AMF studied, for all regions analysed (Figs S1, S2). Here, we followed the CBOL barcoding standards (http://www.barcoding.si.edu) and used K2P distances. We stress this because values for sequence variation differ considerably depending on the estimation method; for example, the K2P distance of 14.6% for the G. intraradices ITS region (47 sequences) corresponds to > 23% uncorrected distance when gaps are treated as a fifth character (Stockinger et al., 2009). Similarly high K2P distances occur for the ITS region of G. mosseae (12.2%, 109 sequences). The intrasporal ITS variation we found in the G. mosseae sequences was 4.6% (16 sequences) and increased only slightly, to 5.3%, when 45 database sequences from cultures of geographically widespread origin, published by Avio et al. (2009), were added. An example of high ITS variation is G. fulvum (Diversisporaceae), in which the addition of one sequence raises the variability from < 10% to 15% (five sequences in total). The ‘outlier’ sequence is derived from a different geographical location and might represent a closely related but distinct species.
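To illustrate why K2P and uncorrected values are not interchangeable, the following Python sketch computes both for a pair of aligned sequences. It is not the software used in this study; the function names are hypothetical, and the gap handling (columns with gaps or ambiguous bases skipped for K2P; gaps counted as a fifth character state for the uncorrected distance) is an assumption chosen to mirror the two conventions named above. The K2P formula is the standard one, d = ½ ln(1/(1 − 2P − Q)) + ¼ ln(1/(1 − 2Q)), with P and Q the proportions of transitions and transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Columns with a gap or a non-ACGT character in either sequence are
    skipped (pairwise deletion; an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # P, proportion of transitions
    q = transversions / compared  # Q, proportion of transversions
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf           # too divergent for the K2P correction
    return 0.5 * math.log(1 / w1) + 0.25 * math.log(1 / w2)


def p_distance_gaps_as_fifth(seq1: str, seq2: str) -> float:
    """Uncorrected distance treating '-' as a fifth character state.

    Columns that are gaps in both sequences are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    differences = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else 0.0
```

For the toy pair `ACGT-A` and `ACGTTA`, the K2P distance is zero because the gap column is skipped, whereas the gap-as-fifth-state distance is 1/6; on a small scale, this is why the same alignment can yield very different percentages.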

Given this heterogeneity, the simple use of a single percentage variation value as a threshold to define and cluster molecular operational taxonomic units (MOTUs) for AMF species identification must in general be considered inapplicable.
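The problem with a fixed threshold can be shown with a minimal single-linkage MOTU clustering sketch in Python. The function name and the four distances are hypothetical, constructed so that species A has an intraspecific distance (0.12) larger than its distance to species B (0.06); single linkage is one common clustering choice, assumed here for illustration only.

```python
from itertools import combinations


def threshold_motus(names, dist, threshold):
    """Single-linkage MOTUs: sequences are joined when a chain of pairwise
    distances <= threshold connects them. dist maps frozenset pairs to distances."""
    parent = {n: n for n in names}

    def find(n):
        # follow parent links to the cluster representative
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical distances: a1, a2 belong to species A; b1, b2 to species B.
names = ["a1", "a2", "b1", "b2"]
dist = {frozenset(p): d for p, d in [
    (("a1", "a2"), 0.12), (("b1", "b2"), 0.01),
    (("a1", "b1"), 0.06), (("a1", "b2"), 0.065),
    (("a2", "b1"), 0.15), (("a2", "b2"), 0.155),
]}
for t in (0.03, 0.07, 0.13):
    print(t, threshold_motus(names, dist, t))
# 0.03 [['a1'], ['a2'], ['b1', 'b2']]
# 0.07 [['a1', 'b1', 'b2'], ['a2']]
# 0.13 [['a1', 'a2', 'b1', 'b2']]
```

No threshold recovers the two species: a low cut-off splits species A, and any cut-off high enough to unite a1 and a2 also merges them with species B.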

Barcode gap and phylogenetic analyses

The comparison of the maximum intraspecific and the minimum interspecific variation revealed that none of the studied DNA fragments allowed absolute AMF species separation by barcode gap analyses. Evidently, when based on the rDNA regions studied, this method cannot be applied to AMF. More generally, barcode gaps may often be an artefact of insufficient taxon sampling (Wiemers & Fiedler, 2007). The likely existence of a large number of undescribed and uncharacterized species (Sýkorová et al., 2007; Öpik et al., 2009) adds further complexity to the topic. Moreover, the public sequence databases contain several inaccurate species determinations, and contaminant sequences cannot be ruled out when spores from mixed species cultures are used (Schüßler et al., 2003). Examples of such inconsistencies are the G. fasciculatum BEG53 and BEG58 sequences, which cluster in Glomus Group Ab and in Glomus Group Aa, respectively. On morphological grounds, it is very unlikely that the BEG58 sequences belong to G. fasciculatum (Lloyd-MacGilp et al., 1996).
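The barcode gap comparison described above can be sketched in Python as follows. The function name, the input layout (a species label per sequence and a table of pairwise distances, for example K2P) and the toy distances are hypothetical; the logic simply compares each species' maximum intraspecific distance with its minimum distance to any other species.

```python
from itertools import combinations


def barcode_gaps(labels, dist):
    """Per species: (max intraspecific, min interspecific, gap).

    labels maps sequence name -> species; dist maps frozenset pairs -> distance.
    A positive gap means a distance threshold could separate that species.
    Singleton species get an intraspecific maximum of 0.0 by construction.
    """
    species = set(labels.values())
    intra = {sp: 0.0 for sp in species}
    inter = {sp: float("inf") for sp in species}
    for a, b in combinations(labels, 2):
        d = dist[frozenset((a, b))]
        sa, sb = labels[a], labels[b]
        if sa == sb:
            intra[sa] = max(intra[sa], d)
        else:
            inter[sa] = min(inter[sa], d)
            inter[sb] = min(inter[sb], d)
    return {sp: (intra[sp], inter[sp], inter[sp] - intra[sp]) for sp in species}


# Hypothetical K2P distances for two species, A (highly variable) and B.
labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
dist = {frozenset(p): d for p, d in [
    (("a1", "a2"), 0.12), (("b1", "b2"), 0.01),
    (("a1", "b1"), 0.06), (("a1", "b2"), 0.065),
    (("a2", "b1"), 0.15), (("a2", "b2"), 0.155),
]}
for sp, (max_intra, min_inter, gap) in sorted(barcode_gaps(labels, dist).items()):
    print(sp, max_intra, min_inter, "gap" if gap > 0 else "no gap")
```

In this toy case species B shows a gap, whereas species A does not, because its intraspecific maximum exceeds its nearest interspecific distance; this is the pattern that prevented absolute species separation here.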

DNA barcode-based identification of species can also be derived from phylogenetic inference. A simple neighbour joining analysis based on K2P distances of the complete fragment (SSUmCf-LSUmBr) supported all species investigated here, including all closely related species in Glomus Group Ab. The species concept in this difficult group is also supported by the fact that the mitochondrial LSU rDNA, used as a marker (Börstler et al., 2008), distinguishes G. intraradices from the genome-sequenced Glomus species DAOM197198, which is represented by the ‘G. irregulare clade’.
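The neighbour joining step can be illustrated with a compact Python sketch of the standard algorithm applied to a precomputed distance table (for example, pairwise K2P distances). It is a textbook implementation written for illustration, not the software used in this study, and it omits bootstrap resampling; the function name and the four-taxon distances are hypothetical.

```python
from itertools import combinations


def neighbour_joining(names, d):
    """Neighbour joining on distances keyed by frozenset pairs.

    Returns an unrooted tree as a Newick string. Branch lengths are not
    constrained to be non-negative, as in the standard algorithm.
    """
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    nodes = list(names)
    dist = {frozenset(p): d[frozenset(p)] for p in combinations(nodes, 2)}

    def D(a, b):
        return dist[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D(a, k) for k in nodes if k != a) for a in nodes}
        # join the pair that minimises the Q criterion
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        li = D(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"  # new internal node, as a subtree
        for k in nodes:
            if k not in (i, j):
                dist[frozenset((u, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # resolve the last three nodes around a central vertex
    a, b, c = nodes
    la = (D(a, b) + D(a, c) - D(b, c)) / 2
    lb = D(a, b) - la
    lc = D(a, c) - la
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Hypothetical distances generated from ((a1:0.02,a2:0.03):0.05,(b1:0.01,b2:0.02))
pairs = {("a1", "a2"): 0.05, ("b1", "b2"): 0.03, ("a1", "b1"): 0.08,
         ("a1", "b2"): 0.09, ("a2", "b1"): 0.09, ("a2", "b2"): 0.10}
d = {frozenset(p): v for p, v in pairs.items()}
print(neighbour_joining(["a1", "a2", "b1", "b2"], d))
```

With these additive toy distances the sketch recovers the pairs a1+a2 and b1+b2 with the branch lengths of the generating tree.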

For the 1500 bp fragment, blast searches performed well and could be an alternative identification tool, but this may be problematic for species not represented in the reference data, for which a search still returns a best hit from another species. It should be kept in mind that similarity-based comparisons can be misleading and that phylogenetic methods generally perform better. We therefore recommend a phylogenetic approach, but blast is certainly a useful alternative for fast data screening or for selecting sequences for more detailed analysis.
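The limitation of similarity-based identification can be made concrete with a minimal best-hit sketch in Python. This is not blast: it computes simple percent identity over pre-aligned sequences, with no local alignment or statistics, and the function names, reference names and sequences are hypothetical.

```python
def percent_identity(query: str, ref: str) -> float:
    """Identity over aligned columns, ignoring columns that are gaps in both."""
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to equal length")
    matches = compared = 0
    for a, b in zip(query.upper(), ref.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def best_hit(query: str, references: dict) -> tuple:
    """Return (reference name, species, % identity) of the most similar reference.

    references maps name -> (species, aligned sequence). A best hit is always
    returned, even if the query belongs to a species absent from the references.
    """
    name, (species, seq) = max(
        references.items(), key=lambda item: percent_identity(query, item[1][1])
    )
    return name, species, percent_identity(query, seq)


# Hypothetical aligned reference sequences.
refs = {"ref1": ("species_A", "ACGTACGTAC"), "ref2": ("species_B", "TTTTACGTAC")}
print(best_hit("ACGTACGTTT", refs))  # ('ref1', 'species_A', 80.0)
```

A query from an undescribed species would likewise be assigned to its nearest described relative; adding an identity cut-off to reject such hits would reintroduce the single-threshold problem discussed for MOTUs.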

The ITS region

The ITS region resolved many of the known species, but not the closely related members within Glomus Group Ab or within Glomus Group Aa. However, it was suited to resolve relatively closely related species in the Ambisporaceae (Walker et al., 2007), and it also shows, for example, that a set of environmental ITS sequences labelled as G. versiforme does not cluster with those of G. versiforme BEG47; these sequences probably represent distinct species. The ITS region might therefore be useful for species delineation, but with some limitations.

Other problems with species resolution might be caused by synonyms. For example, in Glomus Group Aa several sequences with uncertain species assignment are from G. dimorphicum and G. monosporum, which have been discussed, on morphological grounds, as possibly conspecific with G. mosseae (Walker, 1992). The difficulties might, however, also result from the use of mixed species cultures. The fungus identified as G. monosporum INVAM FR115 was in a culture that also contained spores of G. mosseae and Paraglomus occultum (http://invam.caf.wvu.edu/cultures/accessionculturedetails.cfm?ID=6356, accessed 12 February 2010). The G. monosporum culture INVAM IT102 also contained G. mosseae and Glomus etunicatum spores (http://invam.caf.wvu.edu/cultures/accessionculturedetails.cfm?ID=6895, accessed 12 February 2010). It therefore cannot be ruled out that the spores identified as G. mosseae and G. monosporum are of conspecific origin, or that contaminant sequences gave rise to incorrect assignments.

The G. mosseae ITS sequences formed two distinct clades, with the minor clade consisting only of sequences from two field-sampled spores (GMO2 and GMO3). As already discussed by Antoniolli et al. (2000), spore GMO3 could belong to an unidentified species, and the ‘outlier’ sequence AF161058 from spore GMO2 might be interpreted as a contaminant originating from GMO3. Currently, when the database ITS sequences are included, it seems impossible to state whether the G. mosseae clade consists of one species or of several species that cannot be separated or have been misidentified. Analysing the complete fragment (SSUmCf-LSUmBr) for more, well-defined isolates may resolve such questions.

The LSU region

Using the 800 bp LSU region of the core dataset resulted in more unresolved species than using the ITS region, but the LSU-D2 region alone showed about the same species resolution power as the ITS region. The LSU-D1 fragment performed worst with both the extended and the core datasets. It seems unsuited to obtaining good resolution, which may explain why the resolution of the 800 bp LSU region is not better than that of the shorter LSU-D2 fragment. The G. mosseae sequences analysed by Rosendahl et al. (2009), from cultures of geographically widespread origin, all fell into the main G. mosseae LSU subclade (Fig. 3, lower clade). Based on the genetic variability found in the LSU and in FOX2 and TOR gene introns, those authors proposed that these cultures are closely related and that their panglobal distribution was likely caused by anthropogenic dispersal. Notably, three single-spore isolates (HG isolate 209, BEG224, JJ isolate 243) each gave rise to divergent sequence variants located in both G. mosseae LSU subclades. This indicates that the rDNA variation reported in some other studies is an underestimate, caused by a failure to detect less frequent sequence types (represented by the upper LSU-D2 subclade in Fig. 3).

DNA fragments for deep sequencing technologies

The 454 GS-FLX Titanium pyrosequencing technology currently allows an average read length of c. 350–450 bp and offers great potential for ecological studies. Our data demonstrate that a read length of 400 bp will not be sufficient to identify all AMF species with certainty when neighbour joining analyses are based on such a short fragment alone. However, alternative phylogenetic approaches may overcome this lack of resolution by using an alignment of longer sequences as a ‘backbone’ for the phylogenetic inference. For example, the program raxml 7.2.6 (http://arxiv.org/abs/0911.2852v1; Stamatakis et al., 2010) includes a novel likelihood-based algorithm for the evolutionary placement of short reads into a given reference tree of full-length sequences. Our results indicate that the LSU-D2 and ITS2 fragments are good candidates for species identification by 454 pyrosequencing. The LSU-D2 region may be preferred if AMF sequences are specifically amplified from roots or soil (Krüger et al., 2009). In studies that also investigate the diversity of other fungal groups, the ITS2 fragment is a good alternative and can be amplified with established fungal primers. Although most published ITS and LSU primers do not match all AMF sequence variants, many do not strictly discriminate against AMF taxa, because they match at least 50% of the known intraspecific sequence variants. These primers are as follows (ratio of total number of sequences analysed : total mismatches : 3′-end mismatches in the last four sites): ITS1 (White et al., 1990), 1250 : 56 : 5; ITS4, 1271 : 23 : 5; ITS5 (White et al., 1990), 1217 : 36 : 4; LR3 (http://www.biology.duke.edu/fungi/mycolab/primers.htm), 929 : 24 : 15; and ITS1F (Gardes & Bruns, 1993), 1250 : 75 : 4.
ITS1F has mismatches to a number of AMF, such as most Ambispora species, some Glomus species, Scutellospora projecturata and many members of the Diversisporaceae and Acaulosporaceae, but at positions that should not hamper amplification if PCR conditions are not too stringent. Conversely, the following primers should be regarded as unsuitable for amplifying all AMF: the LSU forward primer FLR3 (1239 : 128 : 64) discriminates against, for example, some Scutellospora and Paraglomus species; and ITS3 (1219 : 604 : 577) has 3′-end mismatches to most members of Glomus Group Ab and the Ambisporaceae, and to an unidentified Acaulospora species. Moreover, ITS3 has up to five 5′-end mismatches to the Geosiphon pyriformis sequences.
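Ratios of the form ‘sequences analysed : total mismatches : 3′-end mismatches in the last four sites’ could be tallied as in the following Python sketch. How the mismatches were actually counted here (strand orientation, treatment of gaps and degenerate bases) is not stated, so these conventions are assumptions; the function name is hypothetical, and the primer and target sites in the test are invented, not real AMF primers.

```python
# IUPAC nucleotide codes, allowing degenerate positions in the primer.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatch_summary(primer: str, sites: list, window: int = 4) -> tuple:
    """Return (sequences analysed, total mismatches, 3'-end mismatches).

    primer is written 5'->3'; each site is the aligned target stretch in the
    same orientation as the primer (for a reverse primer, the reverse
    complement of the forward-strand region is assumed). A gap or any other
    character not covered by the primer base counts as a mismatch.
    """
    primer = primer.upper()
    total = three_prime = 0
    for site in sites:
        if len(site) != len(primer):
            raise ValueError("site and primer lengths differ")
        for pos, (p, t) in enumerate(zip(primer, site.upper())):
            if t not in IUPAC[p]:
                total += 1
                if pos >= len(primer) - window:  # within the last `window` 3' sites
                    three_prime += 1
    return len(sites), total, three_prime
```

Counting 3′-end mismatches separately matters because, as noted above for ITS1F, mismatches away from the 3′ end are more likely to be tolerated under non-stringent PCR conditions.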

New developments in 454 pyrosequencing will soon allow read lengths of 1000 bp. New primers could then be designed to target a fragment consisting of the ITS2-LSU region (the complete ITS2 and the LSU up to primer LSUmBr), with a length of c. 960–1117 bp. NJ analyses of this fragment resolved all species investigated (data not shown), although with lower bootstrap support than the 1500 bp fragment.


We have shown that barcode gap analyses based on the rDNA regions studied are not suited for AMF barcoding. The intraspecific variation seems heterogeneous and is exceptionally high in some groups. Phylogenetic analyses of the c. 1500 bp SSUmCf-LSUmBr rDNA fragment distinguished all species investigated, whereas shorter rDNA fragments did not allow a separation of very closely related species. The LSU-D2 and ITS2 fragments appear most suitable for the high-throughput 454 GS-FLX Titanium pyrosequencing technology with a read length of 400 bp.

However, beyond methodological aspects, species recognition is mainly hampered by the lack of a comprehensive and accurate baseline dataset and by the limited accessibility of biological material. To overcome this, and to avoid problems arising from mixed or cross-contaminated cultures, it would be desirable to establish, provide and use single-spore isolates. Many open questions could be answered by studying more defined cultures and isolates, or sometimes by more in-depth characterization of field material. Surprisingly, for many very recently described AMF species no biological material seems to be available at all, except for the voucher that is needed for the formal description. Consequently, these species are not available from culture collections, which makes any verification or improvement of the species concepts very difficult.

From a molecular biological point of view, the use of proofreading polymerases under optimal PCR conditions is highly recommended, because it considerably reduces PCR errors and sequence chimaeras (see, for example, Lahr & Katz, 2009), although it should be noted that the Phusion-PCR conditions used in that paper are unsuitable (see http://www.finnzymes.com). To mark errors in the public databases, a third-party annotation facility in GenBank (as proposed by many mycologists, e.g. Bidartondo et al., 2008) would help, but unfortunately this is not permitted. Therefore, curated databases such as UNITE currently seem to be the only option for providing reliable data.

For future analyses, a ‘quantitative world of community analysis’ beyond the current limit of 400 bp read length will be feasible, as 1000 bp 454 reads are possible (http://www.454.com) and new high-throughput (and possibly low-cost) sequencing technologies may soon allow even longer reads (e.g. Pacific Biosciences, http://www.pacificbiosciences.com; Eid et al., 2009). This may be taken as a further argument in favour of using longer DNA barcodes for better species resolution, as suggested here.

As a baseline for Glomeromycota DNA barcoding, we propose the sequencing of variants of the easily PCR-amplifiable SSUmCf-LSUmBr 1500 bp fragment. We also recommend that such a molecular characterization be included in AMF species descriptions whenever possible. The sequence data will be very important for future molecular ecological studies of AMF–plant associations and preferences in the field, which are still largely hidden.


The grant for H.S. was funded by the Marie Curie Early Stage Research Training Fellowship of the European Community’s Sixth Framework Programme (MEST-CT-2005-021016, ‘TRACEAM’). The grants for M.K. and A.S. were financed by the German Research Foundation (DFG). Thanks to all who supplied samples. We thank Chris Walker for discussion and proofreading of the manuscript.