DNA barcoding of lichenized fungi demonstrates high identification success in a floristic context

Authors


Author for correspondence:
Laura J. Kelly
Tel: +44 20 83325374
Email: l.kelly@rbgkew.org.uk

Summary

  • Efforts are currently underway to establish a standard DNA barcode region for fungi; we tested the utility of the internal transcribed spacer (ITS) of nuclear ribosomal DNA for DNA barcoding in lichen-forming fungi by sampling diverse species across eight orders.
  • Amplification of the ITS region (ITS1–5.8S–ITS2) was conducted for 351 samples, encompassing 107, 55 and 28 species, genera and families, respectively, of lichenized fungi. We assessed the ability of the entire ITS vs the ITS2 alone to discriminate between species in a taxonomic dataset (members of the genus Usnea) and a floristic dataset.
  • In the floristic dataset, 96.3% of sequenced samples could be assigned to the correct species using ITS or ITS2; a barcode gap for ITS is present in 92.1% of species. Although fewer species have a barcode gap in the taxonomic dataset (73.3% with ITS and 68.8% with ITS2), up to 94.1% of samples were assigned to the correct species using BLAST.
  • While discrimination between the most closely related species will remain challenging, our results demonstrate the potential to identify a high percentage of specimens to the correct species, and the remainder to the correct genus, when using DNA barcoding in a floristic context.

Introduction

Molecular techniques have a long history of use for species identification in the Fungi (reviewed by Seifert, 2009; Begerow et al., 2010). However, DNA barcoding can be distinguished from other molecular identification tools as it requires the use of a standardized DNA region across a given taxonomic group, such as a portion of the mitochondrial gene COI for animals (Hebert et al., 2003) or regions of the plastid genes matK and rbcL for land plants (CBOL Plant Working Group, 2009). Recently, efforts have begun to establish a standard DNA barcode for fungi (see reviews by Seifert, 2009; Begerow et al., 2010). Although several studies have investigated the possibility of applying the COI barcode region to fungi, results have been variable, with successful species discrimination in some groups (Seifert et al., 2007), but problems with the presence of paralogous copies and introns in others (reviewed by Seifert, 2009). The internal transcribed spacer (ITS; comprising ITS1–5.8S–ITS2) of nuclear ribosomal DNA (nrDNA) is more commonly sequenced in fungi than any other region of DNA (Begerow et al., 2010) and has been endorsed by representatives of the mycological community for use as the fungal DNA barcode (Seifert, 2009). Indeed, the Barcode of Life Data Systems (BOLD; Ratnasingham & Hebert, 2007) now includes blast-based identification of fungi using the ITS region (although the small size of the current reference database limits the utility of this tool). However, it has also been suggested that the ITS2 spacer alone might be a useful barcode region for fungi (e.g. Chen et al., 2010), particularly in the context of environmental studies that make use of high throughput sequencing technologies (Nilsson et al., 2009; Stockinger et al., 2010). By contrast, the ITS1 spacer alone has been used recently as a DNA barcode, in combination with 454 sequencing, to identify species and assess fungal diversity in environmental samples (Buée et al., 2009; Tedersoo et al., 2010). 
Thus, the formal adoption of the ITS as the official fungal DNA barcode (for which approval from the Consortium of the Barcode of Life is required) awaits further analysis to assess how well this region performs across a variety of fungal groups (Seifert, 2009; Eberhardt, 2010).

DNA barcoding for lichenized fungi

With c. 13 500 accepted species, lichenized fungi account for c. 18% of known fungal diversity (reviewed by Hawksworth, 2001) and it has been estimated that 28 000 species are likely to occur worldwide (Lücking et al., 2009). Although lichenized fungi represent one of the best known components of fungal diversity (Schmit & Mueller, 2007), their accurate identification is a nontrivial task that requires taxonomic expertise, particularly in the case of complex groups where the characters used for identification may be subtle and difficult to discern. Moreover, even in expert hands some samples may prove intractable, such as juvenile or fragmentary material that lacks the morphological or chemical characters necessary to make an accurate identification. DNA barcoding presents the opportunity to capture specialist knowledge of lichen taxonomy, in the form of a reference sequence database built from expertly identified voucher specimens, and translate this into an accessible tool. The accurate identification of lichen samples is required for applications such as biomonitoring of air pollution (Garty et al., 2002) and the bioindication of habitats of conservation importance (Nascimbene et al., 2010) as well as for assessment of species distributions and effective conservation of lichen biodiversity (Hunter & Webb, 2002). The availability of a robust and accurate system of DNA barcoding for the identification of lichenized fungi could facilitate such processes.

The ITS region has been used extensively in the study of lichenized fungi, including in the assessment of species boundaries and in testing the correlation between genetic and morphological diversity in species complexes (Wirtz et al., 2008; Wedin et al., 2009; Kotelko & Piercey-Normore, 2010). Recently Del-Prado et al. (2010) tested the utility of the ITS for species delimitation in the large family Parmeliaceae where, in most cases, there was no overlap between intraspecific and interspecific distances. Cases where genetic distances within and between species did overlap coincided with taxa that are considered to be species complexes, and the authors concluded that ITS data can be used to aid species identification of lichenized fungi (Del-Prado et al., 2010). However, lichen-forming fungi are found within several major fungal lineages (see Schoch et al., 2009) and further data are required to test the ability of the ITS region to distinguish between species in these diverse groups.

Testing the effectiveness of DNA barcoding

The lichen flora of Great Britain and Ireland is extremely well known (see Smith et al., 2009) and represents an ideal test-case for DNA barcoding as the essential taxonomic foundation is in place (Kress et al., 2009), without which the ability of barcodes to identify samples to species cannot be assessed. Here we use taxonomic and floristic test-cases, comprising samples from Great Britain and Ireland, to assess the utility of the ITS region as a DNA barcode for lichenized fungi for two practical applications where its potential as a species identification tool is likely to be of greatest use; first to distinguish between closely related and ‘hard-to-identify’ species and second to facilitate biodiversity surveys within a given geographic location. Usnea (Parmeliaceae), a cosmopolitan genus containing > 600 species (Wirtz et al., 2006), was chosen as the taxonomic test-case. The genus itself can be easily recognized by its gross morphology, but the identification of individual species can be extremely difficult and specimens with atypical states for morphological characters used in species delimitation are not uncommon (Randlane et al., 2009). Moreover, identification of specimens may require the use of chemical characters, necessitating the use of thin layer chromatography (TLC; Clerc, 1998). The difficulties associated with identification of Usnea species make this group an ideal target for DNA barcoding. Our second test-case aimed to assess the utility of barcoding within a floristic context by sampling c. 100 species of lichenized fungi from habitats of conservation importance within the Cairngorms National Park, Scotland. Samples were primarily collected from native aspen stands (Invertromie Wood) within the Insh Marshes National Nature Reserve, a habitat-type of high species diversity and conservation importance (Cosgrove & Amphlett, 2002; Parrott & MacKenzie, 2009). In addition, we visited further sites within the same watershed, targeting National Vegetation Classification (NVC) W18: Pinus sylvestris–Hylocomium splendens woodland (Rodwell, 1991). As the ITS2 spacer has also been proposed as a possible DNA barcode in fungi (Nilsson et al., 2009; Chen et al., 2010), we also assessed the potential utility of this subregion of the ITS for species identification.

Materials and Methods

Sampling of study material

For the taxonomic test-case, we sampled 112 Usnea specimens (both freshly collected and herbarium material), representing 16 of the 19 species occurring in the British Isles, with collections targeted to maximise the geographic range represented (Supporting Information Table S1). Determinations were based on standard morphological (Clerc, 1987a,b; Smith et al., 2009) and chemical characters; all Usnea samples were subjected to standard thin layer chromatography methods (Culberson, 1972; Orange et al., 2001). For species where distinct chemotypes are recognized, we aimed to include material with a range of chemistries. Samples were assigned to categories, based on the ease with which a final determination was reached: (1) identified by morphology alone; (2) identification required TLC data and morphology; (3) morphologically ambiguous specimens requiring an iterative process of identification, utilising TLC and ITS sequence data; (4) ambiguous specimens with conflict between morphology/chemical characters and ITS data. In addition, a fifth category – (5) juvenile material lacking the necessary morphological/chemical characters to make an identification – was included to test the ability of DNA barcodes to identify poorly developed specimens.

For the floristic test-case, 248 specimens were collected from native woodland habitats, comprising 94 species from 55, 28 and 8 genera, families and orders, respectively, of lichenized fungi (Table S2) following Smith et al. (2009). Determinations were based on standard morphological and chemical characters. The epiphytic flora was sampled extensively; saxicolous and terricolous macrolichens were also included. Nine Usnea samples from the floristic dataset were also included in the Usnea dataset; the total number of specimens sampled for this study was 351. Samples for extraction were selected under a dissecting microscope, and comprised either young vegetative material of macrolichens, sections of apothecia for crustose taxa or small homogeneous scrapes of sterile crusts. Separate individuals of crustose taxa were collected from separate trees. To account for intraspecific variation we aimed to collect a minimum of three samples for as many species as possible (66.0% of species were represented by ≥ 3 samples and 77.7% by ≥ 2 samples). All nomenclature follows Smith et al. (2009), except where indicated.

Molecular methods

DNA was extracted from fresh material using a DNeasy Plant Mini Kit (Qiagen) following the manufacturer’s protocol. Initial tests with this kit on herbarium specimen material indicated that those collected ≥c. 2 yr ago failed to yield amplifiable DNA; some freshly collected samples also failed to yield amplifiable DNA using this kit. DNA was extracted from these specimens using a sodium dodecyl sulphate (SDS)-based protocol modified from Whiting et al. (1997) by Martin & Winka (2000).

The ITS region was amplified with one of the following pairs of fungal-specific primers: ITS1-F (Gardes & Bruns, 1993) + ITS4 (White et al., 1990); ITS1-F + ITS4A (Larena et al., 1999); ITS5 (White et al., 1990) + ITS4A. Reactions contained 2.5 μl of 10× NH4 reaction buffer (Bioline, London, UK), 0.6 units BIOTAQ DNA polymerase (Bioline), 2.5 mM MgCl2 (Bioline), 0.2 mM dNTPs (Promega), 0.8 μg BSA (Invitrogen), 0.3 μM of each primer, 4% dimethyl sulphoxide (DMSO), 1 μl genomic DNA (or in some cases 1 μl of a 10× dilution) and double-distilled water to a total of 25 μl. The PCR amplifications were performed with a programme of 94°C for 3 min, followed by 30–34 cycles of 94°C for 45 s, 50–60°C for 60 s, and 72°C for 90 s, with a final 10 min at 72°C.

For a small number of samples that failed to yield a single, discrete, product with standard PCR, nested PCR was performed using the ITS1 (White et al., 1990) + ITS4 primers for the secondary PCR. Reactions were as above, except 1 μl of a 10× dilution of the primary PCR product was used as the template. The amplification programme for secondary PCR was: 94°C for 3 min; 28 cycles of 94°C for 30 s, 50°C for 30 s, 72°C for 60 s, with a final 10 min at 72°C. Where PCR failed to yield products suitable for sequencing, amplification of the target region was attempted for some samples directly from the source material by placing a thin slice of a single apothecium into the PCR mixture. Direct PCR was performed with the same conditions as for genomic DNA, but using an initial 5 min denaturation at 94°C. Where the whole ITS region could not be obtained we amplified the ITS2 region using the ITS3 (White et al., 1990) + ITS4A primer pair with the same reaction mix and amplification conditions as for the full ITS region.

Amplification products were cleaned using ExoSAP-IT (Affymetrix, Santa Clara, California, USA) following the manufacturer’s protocol. Cycle sequencing was conducted with a BigDye Terminator v 3.1 100 Reaction Ready kit (Applied Biosystems, Carlsbad, California, USA) following the manufacturer’s protocol using the same primers as for the original PCR (in some cases internal primers, ITS2 (White et al., 1990) + ITS3, were also used). Cycle sequencing products were analysed by The GenePool (University of Edinburgh, Edinburgh, UK). Electropherograms were visually inspected for polymorphic positions, or a clear secondary signal underlying the primary base calls in strong sequencing reactions, indicative of heterogeneous amplification products. Of 46 samples (13.1% of all samples studied) showing clear evidence of heterogeneity within PCR products from the full ITS region (Table S3), 23 were selected for cloning, representing a range of taxonomic groups, to test for evidence of intra-individual ITS variation (i.e. distinct ITS copy types) or co-amplification of nontarget fungi (e.g. parasites or endophytes). In addition, for one sample with a single polymorphic base the two possible ITS sequence types were inferred (Table S3); sequences from the remaining 22 samples not selected for cloning were excluded from further analyses. The PCR products were cleaned using an illustra GFX PCR Purification Kit (GE Healthcare, Little Chalfont, Buckinghamshire, UK) and cloned into the pGEM-T Easy Vector System (Promega). Twelve colonies were screened for each sample via colony PCR, using the same primers as in the initial PCR. Colony PCR samples contained 2.5 μl of 10× NH4 reaction buffer, 0.6 units BIOTAQ DNA polymerase, 2.5 mM MgCl2, 0.2 mM dNTPs, 0.3 μM of each primer, 0.8 μg BSA and double-distilled water to a total of 25 μl; part of a single colony was resuspended into this mix. 
The following temperature profile was used: 94°C for 10 min; 28 cycles of 94°C for 30 s, 52°C for 30 s, 72°C for 90 s; 72°C for 7 min. Cloned PCR products were cleaned and sequenced as above.

Sequence alignment and analysis

Contigs were assembled and edited using sequencher v4.5 (Gene Codes Corporation, Ann Arbor, Michigan, USA). Initial assessment of sequence identity was performed via blastn searches (Altschul et al., 1997) of GenBank (Benson et al., 2010); sequences with top hits to ITS accessions from the same (or closely related) species were retained for further analysis. For samples with no ITS sequences from closely related taxa on GenBank, sequences were considered to be from the target organism if all samples of a given species yielded highly similar sequences. Sequences with top hits to obvious fungal contaminants (e.g. mitosporic Herpotrichiellaceae or Tremellomycetes) were discarded. Sequences were submitted to GenBank, accession numbers FR799032 – FR799323 (Tables S1, S2); sequences and electropherograms have also been submitted to BOLD (Ratnasingham & Hebert, 2007), project code: BARLI.

Sequences from the Usnea dataset were aligned using muscle (Edgar, 2004) with default parameters and subsequent optimization by eye in macclade v4.04 (Maddison & Maddison, 2002). As the presence of recombination can confound phylogenetic methods (Schierup & Hein, 2000; Posada & Crandall, 2002), tests for recombination were carried out on the Usnea dataset using gard (Kosakovsky Pond et al., 2006a,b) and rdp3 (Martin et al., 2005). gard was run under the best-fit model of evolution (as assessed with the corrected Akaike’s Information Criterion (AICc); Sugiura, 1978) using multiple-breakpoints with β-Γ rate variation and three rate classes, via the Datamonkey server (http://www.datamonkey.org). The following recombination detection methods in rdp3 were used: Bootscanning (Salminen et al., 1995); Chimaera (Posada & Crandall, 2001); GENECONV (Padidam et al., 1999); MaxChi (Maynard Smith, 1992); RDP (Martin et al., 2005); SiScan (Gibbs et al., 2000); and 3SEQ (Boni et al., 2007). Settings for each method were optimized following recommendations in the rdp3 manual. Although the floristic dataset was not analysed using phylogenetic methods (the broad taxonomic spread necessitating the use of methods that do not rely on a multiple sequence alignment), the presence of chimaeric clones (i.e. PCR recombinants) could result in an overestimation of the degree of intra-individual ITS variation. Therefore, cloned sequences from a given sample were aligned with each other and tested for recombination as described earlier. No evidence of recombination was detected for the Usnea dataset and none of the clones from the floristic dataset were identified as recombinants; therefore, all sequences were retained for subsequent analyses. Sets of cloned sequences for a single accession forming highly supported clusters in preliminary phylogenetic analyses were merged to form consensus sequences in the final datasets.

Preliminary analyses of species discrimination using phylogenetic methods revealed that sequences from several Usnea samples failed to cluster with any species in our dataset. To test whether these singletons cluster with other previously sequenced species we downloaded all Usnea ITS sequences from GenBank (excluding those containing ambiguous bases or indels within the 5.8S gene) and aligned these with our newly generated sequences, as described earlier for the Usnea dataset, to create an expanded dataset. Because of inaccuracies potentially associated with fungal accessions in GenBank (Nilsson et al., 2006) we based measures of percentage species discrimination on only those sequences generated by us. Before further analysis of the datasets, regions of the alignment corresponding to the 18S (containing a group I intron in some taxa) and the 28S genes were excluded. Sequence alignments have been submitted to TreeBase (study accession S11204). Best-fit models of evolution for both Usnea datasets were selected using the AIC in modeltest v3.7 (Posada & Crandall, 1998) and mrmodeltest v2.2 (Nylander, 2004) for implementation in distance and Bayesian analyses respectively. Datasets were partitioned into ITS1, 5.8S and ITS2, and each partition was evaluated separately.

To assess whether sequences from Usnea form species-specific clusters, we analysed the entire ITS region (ITS1–5.8S–ITS2) and ITS2 using the bionj version of the neighbor-joining algorithm (Gascuel, 1997) in paup* v4.0 b10 (Swofford, 2003), and with Bayesian inference (BI) in mrbayes v3.1.2 (Huelsenbeck & Ronquist, 2001; Ronquist & Huelsenbeck, 2003), for both our sequences and the expanded Usnea dataset. Species were scored as successfully discriminated if samples formed a species-specific cluster with ≥ 70 bootstrap percentage (BP) under bionj and ≥ 0.95 posterior probability (PP) under Bayesian inference. The bionj analyses were conducted with distance options set to the best-fit model of evolution. Node support was assessed using the bootstrap (Felsenstein, 1985), running 1000 pseudoreplicates of bionj with the same distance options as for the original search. Analysis by BI was carried out using the parallel version of mrbayes v3.1.2 (Huelsenbeck & Ronquist, 2001; Ronquist & Huelsenbeck, 2003; Altekar et al., 2004) through the Computational Biology Service Unit of Cornell University, USA (http://cbsuapps.tc.cornell.edu/index.aspx), or the Bioportal (http://www.bioportal.uio.no) applying the best-fit model of evolution for each partition. Each search was run with the following settings: four chains (three with default heating); random starting tree; sampling every 1000 generations; flat priors. Four independent 10 million generation runs were carried out for the Usnea dataset. For the expanded Usnea dataset, four independent 30 million generation runs were performed. Parameter values from each run were viewed in tracer v1.4 (http://tree.bio.ed.ac.uk/software/tracer) to confirm appropriate effective sample sizes had been obtained and stationarity reached. We checked further for convergence by using awty (Nylander et al., 2008) to compare split frequencies between each run. Trees corresponding to the first 500 000 generations of each run for the Usnea dataset, and the first 3 000 000 generations for the expanded Usnea dataset, were discarded as the burn-in. Single majority rule consensus trees for each dataset, showing all compatible groupings, were constructed in paup* v4.0b10 from the combined post burn-in trees.

When calculating the success of barcoding, all subspecies, varieties, forms or chemotypes of a given species were treated as the same (e.g. Species ‘a’ subsp. ‘a’ was scored as correctly identified if it matched samples from Species ‘a’ subsp. ‘a’ or Species ‘a’ subsp. ‘b’). To assess levels of species discrimination we used a custom Perl script to create all possible pairwise sequence alignments with muscle under the default settings. A modified version of the pDIST subroutine from the ‘simple pairwise matching’ suite of scripts (Little, 2009) was used within a custom Perl script to calculate the uncorrected p distance for each alignment and to score the success or failure of discrimination for each sample. Discrimination of a sample was counted as successful if the minimum uncorrected interspecific p distance exceeded the maximum uncorrected intraspecific p distance (i.e. a barcode gap is present; for samples yielding multiple ITS sequence types, a sample was only scored as successfully discriminated if all sequences from that sample could be discriminated). A species was scored as successfully discriminated if all samples from that species possessed a barcode gap. The percentage of successfully discriminated samples/species was based on the results from species where sequences for the target region (ITS or ITS2) had been obtained for > 1 sample, although all sequences were included in the analysis to act as potential causes of failed discrimination. The analysis was performed for the ITS and ITS2 datasets for both test-cases. We also assessed the use of ITS data for species identification by conducting blast searches against custom databases created from both full ITS and ITS2 sequences from the floristic and Usnea datasets. The stand-alone version of blast (blastall v2.2.16) was used to conduct all vs all blastn searches for each database with the default Expect value. 
A custom Perl script was used to score the number of samples successfully identified for all species where the ITS/ITS2 region had been obtained for > 1 sample; a sample was counted as correctly identified if the top blast hit, excluding hits by the query to itself, was from the same species as the query. Species were scored as successfully identified if all samples from that species were successfully identified.
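The top-hit scoring described above can be sketched as follows. This is a minimal illustration in Python, not the authors' Perl script: the function name `score_top_hits`, the tuple layout of the hits and the use of bit score to rank hits are assumptions made for the example, and ties are resolved in favour of whichever hit is encountered first.

```python
def score_top_hits(hits, species_of):
    """Score blast-based identification from all-vs-all search results.

    hits: iterable of (query_id, subject_id, bit_score) tuples (hypothetical layout).
    species_of: dict mapping every sample id in the database to its species name.
    Returns (sample_ok, species_ok), restricted to species with > 1 sample.
    """
    # Best non-self hit per query; ranking by bit score is an assumption.
    best = {}
    for query, subject, score in hits:
        if query == subject:
            continue  # exclude hits by the query to itself
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)

    # Count samples per species so that singleton species can be left unscored.
    n_samples = {}
    for species in species_of.values():
        n_samples[species] = n_samples.get(species, 0) + 1

    sample_ok = {}
    for sample, species in species_of.items():
        if n_samples[species] < 2:
            continue  # singleton: stays in the database but is not scored
        top = best.get(sample)
        sample_ok[sample] = top is not None and species_of[top[1]] == species

    # A species counts as identified only if all of its scored samples are.
    species_ok = {}
    for sample, ok in sample_ok.items():
        species = species_of[sample]
        species_ok[species] = species_ok.get(species, True) and ok
    return sample_ok, species_ok


if __name__ == "__main__":
    # Toy data: sample s3 (species B) has its best non-self hit to s5 (species C).
    species_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "C"}
    hits = [
        ("s1", "s1", 900), ("s1", "s2", 850), ("s1", "s3", 600),
        ("s2", "s2", 900), ("s2", "s1", 850),
        ("s3", "s3", 900), ("s3", "s5", 880), ("s3", "s4", 870),
        ("s4", "s4", 900), ("s4", "s3", 870),
        ("s5", "s5", 900), ("s5", "s3", 880),
    ]
    print(score_top_hits(hits, species_of))
```

In the toy data the singleton species C is not scored itself, yet it still causes species B to fail, which mirrors the way all sequences were retained in the databases as potential causes of failed identification.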

Results

Universality of amplification and sequencing

The full ITS region was obtained from 75.9% of samples from the Usnea dataset and 83.9% of samples from the floristic dataset, or 80.9% of the combined dataset of 351 samples. If samples with partial sequences (i.e. single direction or ITS2 alone) are also considered, a total of 85.5% samples yielded some ITS data. The most common reason for not obtaining the target region in the Usnea dataset was the partial (9.8% of samples) or complete (7.1% of samples) failure of amplification (Table 1). All samples for which amplification failed completely were taken from herbarium specimens, and had been collected from the field at least 3 yr before extraction. All full-length, unambiguous, consensus sequences were derived from DNA extracted within 1 yr of the field collection. For the floristic dataset, the primary reasons for not obtaining the target region were amplification of heterogeneous PCR products (6.0% of samples) or preferential amplification of nontarget fungal species (2.8% of samples; Table 1). Of the eight orders represented in the floristic dataset, the Arthoniales had the lowest proportion of samples (33.3%) successfully yielding the full target region (Table S4). The next lowest levels of success were for the Pertusariales (61.9%) and Peltigerales (62.5%). For the remaining five orders, the target region was obtained for ≥ 87.2% of samples. Of the 28 families sampled, a 100% success rate was obtained for 15 and a 0% success rate for three (Chrysothricaceae, Peltigeraceae and Roccellaceae). In Pertusaria, excessive amplicon length (Table 1) prevented full sequence data being obtained from P. coronata and P. pertusa. Amplification of ITS using ITS1-F or ITS5 as the forward primer resulted in amplicons of c. 2 kb because of the presence of one or more group I introns downstream of the primer annealing sites. For example, in P. pertusa the ITS region itself is only c. 520 bp in length, but c. 1.4 kb of 18S gene (including the intron insertion/s) was also amplified. A PCR using the ITS1 primer failed as the annealing site was interrupted by the intron insertion. While group I introns were found in several other taxa, the amplicons did not exceed c. 1.1 kb.

Table 1.   Reasons for failure to obtain the internal transcribed spacer (ITS) barcode region

Reason | Usnea dataset: number (%[a]) of samples | Floristic dataset: number (%[a]) of samples | Study: number (%[a]) of samples
No amplification product obtained | 8 (7.1) | 4 (1.6) | 12 (3.4)
Amplification product could only be obtained for the ITS2 region[b] | 11 (9.8) | 2 (0.8) | 13 (3.7)
Sequences were from a nontarget species[c] | na | 7 (2.8) | 7 (2.0)
Full consensus not obtained because of ≥ 1 mononucleotide runs | na | 5 (2.0) | 5 (1.4)
Sequence only obtained in a single direction | 1 (0.9) | 3 (1.2) | 4 (1.1)
Region too long[d] | na | 4 (1.6) | 4 (1.1)
ITS PCR product requires cloning | 7 (6.3) | 15 (6.0) | 22 (6.3)

[a] As a percentage of the total number of samples in the Usnea dataset (112), Floristic dataset (248), and the total number of samples for study as a whole (351).
[b] Includes samples where the ITS2 region was amplified, but requires cloning.
[c] For example, parasitic fungi.
[d] Full consensus sequence not obtained because of group-I introns in the 18S gene.
na, not applicable.
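As a cross-check, the proportions of samples yielding the full ITS region can be recovered from the counts in Table 1 by subtracting the failures in each column from the dataset totals. The short Python sketch below does this arithmetic; the function name `full_its_rate` is hypothetical, and entries given as "na" in the table are treated as zero.

```python
def full_its_rate(total, failure_counts):
    """Percentage of samples yielding the full ITS region, to one decimal place."""
    obtained = total - sum(failure_counts)
    return round(100 * obtained / total, 1)


# Failure counts per dataset, in the row order of Table 1 ("na" entered as 0).
table1 = {
    "Usnea": (112, [8, 11, 0, 0, 1, 0, 7]),
    "Floristic": (248, [4, 2, 7, 5, 3, 4, 15]),
    "Study": (351, [12, 13, 7, 5, 4, 4, 22]),
}

for name, (total, failures) in table1.items():
    print(f"{name}: {full_its_rate(total, failures)}% of {total} samples")
```

This gives 75.9%, 83.9% and 80.9% for the Usnea dataset, the floristic dataset and the study as a whole, in agreement with the values reported in the text.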

Of 23 samples cloned, for six (26%) no clones matched the target species (i.e. blastn searches of GenBank yielded top hits to nontarget fungi). For two samples (8.7%), the clones represented two distinct ITS sequence types (pairwise sequence divergence 2.5% and 17%), both apparently from the target species (but see the Discussion). The remaining samples yielded either near-identical clones that matched only the target species (seven samples) or a mixture of clones matching the target species and clones matching nontarget fungi (e.g. mitosporic Tremellales or mitosporic Herpotrichiellaceae), indicating that polymorphism observed from direct sequencing in these cases was likely to be caused by the co-amplification of nonlichenized fungi rather than intra-individual ITS heterogeneity (Table S3).

Species discrimination

Tree-based measures of species discrimination  Of the 15 species of Usnea represented by > 1 sample in the ITS dataset, 11 (73.3%) formed species-specific clusters with ≥ 70 BP under the bionj method (Fig. 1, see the Supporting Information, Fig. S1), and a PP of ≥ 0.95 under Bayesian inference (BI; tree not shown). For the ITS2 region, nine (56.3%) of the 16 species represented by > 1 sample formed species-specific clusters with ≥ 70 BP (Fig. S2); samples from Usnea filipendula and Usnea rubicunda also form individual clusters, but with < 70 BP. Under BI, samples from 12 species (75.0%) formed species-specific clusters, but only eight (50.0%) of these with ≥ 0.95 PP (tree not shown). Three samples, Usnea florida EDNA09-02125, Usnea subfloridana EDNA09-02133 and U. sp. EDNA09-02360, failed to cluster with any species in the ITS or ITS2 trees (Figs 1, S1, S2). Two of these samples also did not cluster with any species in the analysis of the expanded dataset, including Usnea sequences from GenBank (Fig. S3), whilst the third (U. florida EDNA09-02125) clustered with samples from U. florida and Usnea wasmuthii in a group distinct from the main clades for these species (Fig. S3b). All four juvenile specimens for which sequence data were obtained could be placed to a species, using ITS or ITS2, under both bionj and BI (Figs S4, S5, Table S5).

Figure 1.

Phylogram from the bionj analysis of the Usnea internal transcribed spacer (ITS) dataset based on GTR+I+Γ distances (see bottom left-hand corner for scale), with midpoint rooting. EDNA accession numbers (referring to the DNA bank at the Royal Botanic Garden, Edinburgh, UK) follow species names; numbers in parentheses indicate the number of clones used to generate a consensus sequence in samples that required cloning. Terminals are shaded to indicate the country of origin for the specimens from which the ITS sequences have been obtained (see top left-hand corner for key). Asterisks above branches indicate nodes with bootstrap support of ≥ 70%. See the Supporting Information, Fig. S1, for a full colour version.

Barcode gap analysis  For the Usnea dataset, discrimination based on uncorrected p distances was 75.0/73.3% of samples/species with the ITS region and 61.2/68.8% of samples/species with ITS2 (Fig. 2a,b, Table 2). For the floristic dataset, 94.1/92.1% of samples/species could be discriminated with the ITS region and 89.3/88.9% of samples/species with the ITS2 region (Fig. 2c,d, Table 2). For the floristic dataset, samples lacking a barcode gap for the ITS region came from two genera (Cladonia and Physcia), with the majority of these samples (90.9%) from Cladonia (Table S6). For ITS2, four genera lack a barcode gap (Cladonia, Lecidella, Parmelia and Physcia), with Cladonia again the genus with the largest number of samples (12) in this category (Table S6). For all samples from the floristic dataset that lacked a barcode gap for ITS, the minimum interspecific p distance was to a congeneric species (data not shown).

Figure 2.

Plots of minimum interspecific uncorrected p distances vs maximum intraspecific uncorrected p distances for 80 samples from the Usnea internal transcribed spacer (ITS) dataset (a), 85 samples from the Usnea ITS2 dataset (b), 187 samples from the floristic ITS dataset (c) and 187 samples from the floristic ITS2 dataset (d). Data points represent samples from species where the ITS/ITS2 region was successfully obtained from > 1 sample; points that fall above the 1 : 1 line indicate samples with a barcode gap.
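The barcode gap criterion plotted in Figure 2 can be sketched in a few lines of Python. This is an illustrative simplification and not the authors' Perl implementation: it assumes that all sequences have already been aligned to a common length (whereas the study realigned every pair of sequences with muscle), it skips alignment columns containing a gap or N in either sequence (an assumption), and the function names `p_distance` and `barcode_gap` are hypothetical.

```python
def p_distance(a, b):
    """Uncorrected p distance: proportion of compared sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # assumption: gapped or ambiguous columns are ignored
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def barcode_gap(seqs, species_of):
    """Return {sample: True/False} for samples from species with > 1 sample.

    A sample has a barcode gap if its minimum interspecific distance
    exceeds its maximum intraspecific distance.
    """
    result = {}
    for s, seq in seqs.items():
        intra = [p_distance(seq, seqs[t]) for t in seqs
                 if t != s and species_of[t] == species_of[s]]
        inter = [p_distance(seq, seqs[t]) for t in seqs
                 if species_of[t] != species_of[s]]
        if not intra or not inter:
            continue  # singleton species are compared against but not scored
        result[s] = min(inter) > max(intra)
    return result


if __name__ == "__main__":
    # Toy alignment: singleton c1 is identical to a2, so species A has no gap.
    seqs = {
        "a1": "ACGTACGTAC",
        "a2": "ACGTACGTAT",
        "b1": "TTGTACGGAC",
        "b2": "TTGTACGGAC",
        "c1": "ACGTACGTAT",
    }
    species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
    print(barcode_gap(seqs, species_of))
```

A species would then be scored as discriminated only if every one of its samples returns True, following the rule given in the Materials and Methods.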

Table 2.   Proportion of samples and species successfully discriminated for the Usnea and floristic datasets

Dataset | Tree-based analyses[a]: % samples[b]/species[c] | Barcode gap analysis: % samples[b]/species[c] | blast analysis: % samples[b]/species[c]
Usnea, ITS | 70.0/73.3 | 75.0/73.3 | 92.5/80.0
Usnea, ITS2 | 35.3/50.0 | 61.2/68.8 | 94.1/81.3
Floristic, ITS | na | 94.1/92.1 | 96.3/92.1
Floristic, ITS2 | na | 89.3/88.9 | 96.3/90.5

[a] Species were scored as successfully discriminated if samples formed a species-specific cluster with ≥ 70 bootstrap percentage under bionj and ≥ 0.95 posterior probability under Bayesian inference. Only samples from successfully discriminated species were also scored as successfully discriminated.
[b] As a percentage of the total number of samples from species with > 1 successfully sequenced sample in the Usnea ITS dataset (80), Usnea ITS2 dataset (85), floristic ITS dataset (187) and floristic ITS2 dataset (187).
[c] As a percentage of the total number of species with > 1 successfully sequenced sample in the Usnea ITS dataset (15), Usnea ITS2 dataset (16), floristic ITS dataset (63) and floristic ITS2 dataset (63).
na, not applicable.

blast-based identification  In blast analyses, 92.5/80.0% of samples/species from the Usnea dataset were correctly identified with the ITS region and 94.1/81.3% of samples/species with ITS2 (see Tables S6, S7 for samples/species with failed identification). The percentage of samples correctly identified is higher for ITS2 than for ITS, owing to a sample of U. florida (EDNA09-02125; not assigned to species with the tree-based methods, see the paragraph ‘Tree-based measures of species discrimination’) being assigned to U. wasmuthii with ITS but U. florida with ITS2. Juvenile samples were not included in the blast database, but their identification was checked via blast to the database of other Usnea sequences. The ITS and ITS2 regions gave the same identification for the juvenile samples (Table S5) and were congruent with the results of the tree-based analyses (Figs S4, S5). For the floristic dataset, 96.3/92.1% of samples/species were correctly identified with the ITS region and 96.3/90.5% of samples/species with the ITS2 region (Table 2). Samples from the floristic dataset for which blast-based discrimination with ITS failed were from two genera (as was also found for the barcode gap analysis, see the paragraph ‘Barcode gap analysis’ and Table S6), with Cladonia the genus with the largest number of failed samples (six). For the ITS2 region, failed samples were from three genera (Cladonia, Parmelia and Usnea; Table S6). All samples from the floristic dataset that were not assigned to the correct species with blast were assigned to the correct genus for both ITS and ITS2 (data not shown).
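The paired samples/species percentages reported for the blast analyses can be illustrated with a short tally. The sketch below assumes that each sample already has a best-hit species assignment (however obtained) and that a species counts as correctly identified only when all of its samples are; this scoring rule, the function name and the example records are assumptions made for illustration and are not taken from the study's scripts or results.

```python
from collections import defaultdict


def identification_success(records):
    """records: iterable of (sample_id, true_species, assigned_species).

    Returns (% samples correctly identified, % species correctly identified).
    Assumed rule: a species is correct only if all its samples are correct.
    """
    per_species = defaultdict(list)
    for _sample_id, true_species, assigned_species in records:
        per_species[true_species].append(assigned_species == true_species)
    if not per_species:
        raise ValueError("no records supplied")
    n_samples = sum(len(flags) for flags in per_species.values())
    n_correct = sum(sum(flags) for flags in per_species.values())
    n_species_ok = sum(all(flags) for flags in per_species.values())
    return (100.0 * n_correct / n_samples,
            100.0 * n_species_ok / len(per_species))


# Hypothetical records: sample identifiers and assignments are invented.
example = [
    ("x1", "Usnea florida", "Usnea florida"),
    ("x2", "Usnea florida", "Usnea wasmuthii"),  # misassigned best hit
    ("x3", "Usnea wasmuthii", "Usnea wasmuthii"),
    ("x4", "Usnea wasmuthii", "Usnea wasmuthii"),
]
pct_samples, pct_species = identification_success(example)
```

A single misassigned sample lowers the species-level figure more sharply than the sample-level figure (here 50% of species vs 75% of samples), which is one way in which a small number of divergent samples can produce the pattern seen in Table 2.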

Discussion

Species discrimination in lichenized fungi using ITS

Depending on the dataset examined (taxonomic vs floristic), and method of analysis used, our results reveal levels of successful species discrimination of between 73.3% and 92.1% for ITS and 50.0% and 90.5% for ITS2. In addition, we have shown that individual samples can be assigned to the correct species in a floristic context with a very high success rate (up to 96.3% of samples using blast-based identification), a result that encourages the use of the ITS as a DNA barcode for biodiversity studies in lichen-rich areas. Moreover, samples not assigned to the correct species were consistently assigned to the correct genus. Thus, although it may not be possible to discriminate all samples to the species level with ITS, unknown samples can be reliably ascribed to a genus or to a group of closely related congeneric species. Species that could not be discriminated with ITS come from two genera (Cladonia and Physcia), with the majority of samples belonging to Cladonia (90.9% of the samples lacking a barcode gap and 85.7% of the samples not assigned to the correct species using blast). Our study included seven species of Cladonia with at least two samples, only three of which (C. portentosa, C. squamosa and C. subulata; see Table S7) could be discriminated using the barcode regions tested here. The genus Cladonia contains several species complexes, where morphological characters overlap between taxa (Fontaine et al., 2010), a phenomenon that also occurs in other lichen genera (Grube & Kroken, 2000). Recent studies of some Cladonia species revealed that morphologically distinct taxa cannot be resolved using ITS or other molecular data (Kotelko & Piercey-Normore, 2010; Fontaine et al., 2010) raising the possibility that morphological characters used to define separate species could reflect variation in environmental conditions (Kotelko & Piercey-Normore, 2010) or may be homoplasious as a result of convergent evolution (Fontaine et al., 2010). 
Such phenomena could account for the failure to discriminate between Cladonia samples in this study. However, discrimination also failed in the easy-to-identify Cladonia gracilis (Table S7), suggesting that ITS variation may not always track species boundaries in this genus.

In common with several other barcode studies that have focused on a limited geographic area, our floristic results demonstrate the increased level of success that can be obtained compared with barcoding in a given taxonomic group of closely related species. DNA barcoding may be particularly effective in a floristic context as it is likely that not all close relatives of a given species will be found in the study area, allowing the reference database to be restricted to those species that are known to occur (Chase & Fay, 2009), resulting in higher levels of species discrimination. For example, a study of the complete butterfly fauna of Romania achieved a 90% success rate for species identification with the animal barcode COI (Dincă et al., 2010), and Lahaye et al. (2008) reported that > 90% of species were successfully identified with DNA barcoding in their study of the plant floras of two biodiversity hotspots. Our results indicate that it may be possible to distinguish c. 92% of species of lichen-forming fungi within a restricted geographic location. Nevertheless, as ITS data could not be obtained for all samples in our floristic dataset, the percentages of species discrimination reported may represent an overestimation of the actual proportion of species that it is possible to distinguish by barcoding (of 73 species represented by ≥ 2 samples, 10 could not be assessed for species discrimination as sequence data for the full ITS region were obtained for < 2 samples). Additional studies will be required to test the effectiveness of DNA barcoding in other lichen-rich areas in order to establish whether the high level of successful species identification reported here is more widely applicable.
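The possible size of this overestimation can be explored with simple arithmetic on the figures given above (63 assessed species, 10 unassessed, 92.1% with a barcode gap for ITS). The bounds below are an illustrative calculation only and are not results reported in this study; they assume the two extreme cases in which none, or all, of the 10 unassessed species could be discriminated.

```python
def discrimination_bounds(assessed: int, unassessed: int, rate: float):
    """Bound the overall discrimination rate when some species are unassessed.

    assessed   -- number of species that could be scored
    unassessed -- number of species lacking enough sequence data
    rate       -- proportion of assessed species discriminated (0 to 1)

    Returns (discriminated_count, lower_percent, upper_percent).
    """
    discriminated = round(rate * assessed)
    total = assessed + unassessed
    lower = 100.0 * discriminated / total                 # no unassessed species discriminated
    upper = 100.0 * (discriminated + unassessed) / total  # all unassessed species discriminated
    return discriminated, lower, upper


# Figures taken from the text; the bounds themselves are illustrative.
count, lower, upper = discrimination_bounds(assessed=63, unassessed=10, rate=0.921)
print(f"{count} species discriminated; bounds {lower:.1f}% to {upper:.1f}%")
```

Under these assumptions, 58 of the 63 assessed species are discriminated, and the proportion across all 73 species would lie between roughly 79% and 93%, depending on how the unassessed species behave.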

Whilst the level of species discrimination was lower in the Usnea dataset, a high percentage of individual samples were assigned to the correct species using blast (92.5% for ITS and 94.1% for ITS2), indicating that a small number of samples are responsible for the overlap between intra- and interspecific distances in some species (as is also evident from the tree-based analyses, e.g. Fig. 1; see also Table S8). U. florida and U. subfloridana are among the species lacking a barcode gap, and it has been suggested that they constitute a single species because, although they differ in reproductive strategy (sexual versus predominantly asexual), specimens from both taxa formed a single monophyletic group on the basis of data from the β-tubulin gene and ITS (Articus et al., 2002). Thus the intermingling of the majority of samples from these two species in a well-supported clade (Fig. 1) may reflect the fact that they are conspecific. However, there are two more divergent samples which do not cluster with any other species in our taxon-set (EDNA09-2125 and EDNA09-2133; Fig. 1). In the expanded dataset, U. florida EDNA09-2125 clusters with several U. florida, and one U. wasmuthii, sequences from GenBank in a group distinct from the main clades for these species (Fig. S3b) whilst U. subfloridana EDNA09-2133 groups with U. sp. EDNA09-02360, a specimen that cannot be assigned to any species in the British flora. Two other specimens in the Usnea dataset, U. fragilescens EDNA10-00740 and U. glabrescens EDNA09-01568, also failed to cluster with conspecific samples. In all of these cases the samples’ chemistry and morphology are consistent with the species concepts defined in The Lichens of Great Britain and Ireland (Smith et al., 2009; this is also the case for samples with failed discrimination from the floristic dataset; see Tables S8, S9). 
The presence of specimens with ITS sequence types distinct from the other conspecific samples may indicate that the species concepts we have applied need revising, and that there are previously unrecognised, or cryptic, species present in Great Britain and Ireland, a phenomenon that has been found elsewhere in several lichen groups (reviewed by Crespo & Pérez-Ortega, 2009). Nevertheless, divergent ITS sequences could be indicative of processes other than speciation. For example, incomplete concerted evolution may lead to differences between ITS copies in an array (Simon & Wieß, 2008) and subsequent homogenisation towards different copy-types, or bias in PCR amplification (Wagner et al., 1994), could result in distinct ITS sequences being retrieved from individuals of the same species. Any hypothesis regarding cryptic species would need to be investigated further by examining data from additional lines of evidence, preferably including independently evolving gene regions (Crespo & Pérez-Ortega, 2009), along with a critical re-examination of morphological characters and species concepts.

Future challenges for the DNA barcoding of lichenized fungi

Our findings highlight some of the challenges that need to be addressed in order to maximize the potential for DNA barcoding in lichenized fungi, in particular those relating to the amplification of multiple, or nontarget, fungal species during the PCR. The presence of heterogeneous amplification products, or the preferential amplification of nontarget fungi, was the greatest limiting factor for successful amplification and sequencing of the ITS in the floristic dataset. Cloning results for the floristic dataset revealed that in six out of 20 cases no clones were obtained from the target species; instead, the sequences derived from nontarget groups such as mitosporic fungi. Tests in silico have demonstrated that some ITS primers commonly used in fungi can be subject to a range of biases (including taxonomic and length biases; Bellemain et al., 2010); similar biases may explain the preferential amplification of nontarget species from some samples in our study, particularly if nontarget fungi lack group I introns, which are commonly found in some lichenized taxa (DePriest & Been, 1992; Simon et al., 2005). From the remaining 14 cloned samples, eight yielded one or more clones that were not from the target taxon. This indicates that the major cause of mixed amplification products for the ITS region is the coamplification of nonlichenized fungal species along with the target species. As well as potentially contaminating fungi occurring on lichen surfaces (e.g. lichenicolous fungi), endophyte-like fungi (endolichenic fungi) are found within the thalli of lichens (Arnold et al., 2009). Moreover, endolichenic, lichenicolous and lichen-forming species may all be found within the same class of fungi (Arnold et al., 2009). 
While it might be possible to exclude some contaminating fungi by increasing the specificity of primers or amplification conditions, lichenized fungi occur across diverse lineages (see, for example, Schoch et al., 2009) and the use of more specific barcoding protocols could limit the degree of standardization and restrict the range of lichenized fungi that can be barcoded under a given protocol. In addition to a standard barcode region, the development of DNA barcoding as a widely accessible tool also requires standard and robust protocols that require little optimization. As such, for some samples, rather than increasing the specificity of conditions used for amplification of the ITS region, it may be necessary to survey all fungal ITS types present using more general protocols and rely on the reference database to sort between the lichen-forming and other fungal species present.

In addition to the coamplification of nonlichenized fungi, heterogeneous PCR products may be indicative of intragenomic ITS variation, a phenomenon that would prove problematic for DNA barcoding unless all sequence types reflect species boundaries. Previous studies have found evidence for multiple ITS sequence types within individuals of several fungal species. Simon & Wieß (2008) found intragenomic ITS polymorphism within four species of fungi, while Simon et al. (2005) detected evidence for intragenomic nrDNA variation within two species of lichenized fungi. We detected three possible cases of such variation, in samples from Diplotomma pharcidium (EDNA09-01532), Hypocenomyce scalaris (EDNA09-02422) and Micarea denigrata (EDNA09-01552). In the case of H. scalaris, a single polymorphic position was detected, and both ITS sequence types correctly identified the sample. For the other two samples, much greater divergence was found between sequence types: 2.5% difference between the two D. pharcidium sequence types and 17.0% difference in the case of M. denigrata. blast-based analyses for D. pharcidium EDNA09-01532 gave a correct identification irrespective of the sequence used, and a barcode gap was maintained for this species (which was represented in our dataset by three samples). Nevertheless, as only one species of Diplotomma was included in our study, the possibility remains that the intra-individual ITS variation found for D. pharcidium may be shared across other species of this genus. In the case of M. denigrata, only one individual was included, so we were unable to test whether both sequence types could correctly identify this sample (M. denigrata is also not represented by ITS sequence data on GenBank). The ITS product cloned for M. denigrata was amplified via direct PCR on part of a single apothecium. However, the degree of difference between the two M. denigrata sequences may suggest that, aside from intragenomic ITS variation, the coamplification of multiple fungal species present on or within this sample (see Arnold et al., 2009) could also be responsible for this result.

Conclusion

Our results, from testing the utility of DNA barcoding across eight orders of lichen-forming fungi, demonstrate that a high proportion of samples (> 92%) can be assigned to the correct species using either the ITS (ITS1–5.8S–ITS2) or ITS2 region. Moreover, within a floristic context, c. 92% of species possess a barcode gap when using the ITS. These findings highlight the potential utility of DNA barcoding as an identification tool for lichenized fungi.

Acknowledgements

We thank Mary Gibby, Michelle Hollingsworth, Alexandra Clark, Ruth McGregor and Alan Forrest for their support on this project, Barbara Benfield, Rod Corner, David Hill, Peter Lambley, Stephen Ward, Ray Woods and members of the British Lichen Society for providing Usnea material, Damon Little for providing access to his simple pairwise matching scripts and Philippe Clerc for advice on the identification of Usnea specimens. We also thank Philippe Clerc and two anonymous reviewers for comments on an earlier version of this manuscript. This work was supported financially via funding from the Scottish Government Rural and Environment Research and Analysis Directorate (RERAD) to the Royal Botanic Garden Edinburgh, UK.
