Employing 454 amplicon pyrosequencing to reveal intragenomic divergence in the internal transcribed spacer rDNA region in fungi

The rDNA internal transcribed spacer (ITS) region has been accepted as a DNA barcoding marker for fungi and is widely used in phylogenetic studies; however, intragenomic ITS variability has been observed in a broad range of taxa, including prokaryotes, plants, animals, and fungi, and this variability has the potential to inflate species richness estimates in molecular investigations of environmental samples. In this study 454 amplicon pyrosequencing of the ITS1 region was applied to 99 phylogenetically diverse axenic single-spore cultures of fungi (Dikarya: Ascomycota and Basidiomycota) to investigate levels of intragenomic variation. Three species (one Basidiomycota and two Ascomycota), in addition to a positive control species known to contain ITS paralogs, displayed levels of molecular variation indicative of intragenomic variation; taxon inflation due to presumed intragenomic variation was ≈9%. Intragenomic variability in the ITS region appears to be widespread but relatively rare in fungi (≈3–5% of species investigated in this study), suggesting this problem may have minor impacts on species richness estimates relative to PCR and/or pyrosequencing errors. Our results indicate that 454 amplicon pyrosequencing represents a powerful tool for investigating levels of ITS intragenomic variability across taxa, which may be valuable for better understanding the fundamental mechanisms underlying concerted evolution of repetitive DNA regions.


Introduction
The internal transcribed spacer (ITS) region of nuclear ribosomal DNA is the most commonly sequenced region in fungi and is used in fungal systematics to define species, to infer phylogenetic relationships, and for identification (DNA barcoding) of fruiting bodies, cultures, and DNA in environmental samples (Horton and Bruns 2001;Peay et al. 2008;Begerow et al. 2010). The ITS region has recently been proposed as the universal barcode for all fungi (Schoch et al. 2012). Although mycologists rely heavily on ITS to define and detect species and to understand fungal evolution, there are many long-recognized problems with using this region. Problems range from a lack of interspecific variation in some groups of fungi, especially some Ascomycota (Rehner and Buckley 2005;Balajee et al. 2007;Rojas et al. 2010), to an abundance of variation among individuals within populations (K ar en et al. 1997;Kauserud and Schumacher 2002;Nilsson et al. 2008;Blaalid et al. 2013). These problems are not unique to the ITS region and it is unlikely that any single, short DNA region includes levels of molecular variation suitable for separating species across a phylogenetic group as broad as kingdom Fungi, with an estimated 1.5-5.1 million extant species (Hawksworth 2001;Schmit and Mueller 2007;Blackwell 2011).
However, one problem that is relatively unique to rDNA regions, including the ITS region, is the possibility for significant intragenomic (within-individual) variability. This potential arises because the ribosomal tandem array occurs at high copy number, which in fungi can range from approximately 45 to 200 copies per genome and span several chromosomes (Maleszka and Clark-Walker 1990;Ganley and Kobayashi 2007). Intragenomic ITS variability has been observed in a wide range of taxa, including prokaryotes, plants, animals, and fungi (Feliner et al. 2004;W€ orheide et al. 2004;Stewart and Cavanaugh 2007;Simon and Weiss 2008;James et al. 2009;Vydryakova et al. 2012). In one recent case, intragenomic ITS variation was noted in the fungal genus Laetiporus, a group of brown-rot polypores in the Antrodia clade (Lindner and Banik 2011). Intragenomic variation in this group was found to inflate estimates of species richness and to complicate phylogenetic investigations when cloned ITS sequences rather than ITS sequences obtained by direct Sanger sequencing were analyzed. Unfortunately it is not known how widespread this phenomenon is in kingdom Fungi. If such intragenomic variation is common it will cause significant problems with the analysis of environmental sequencing data. These problems could be especially severe with high-throughput next-generation sequencing methods (e.g., 454 pyrosequencing, Illumina, and IonTorrent), where even low-frequency ITS paralogs will be detected.
Our aim was to explore levels of intragenomic divergence in the Dikarya (Ascomycota and Basidiomycota) using large-scale sequencing of ITS1 amplicons derived from axenic single-spore cultures. Laetiporus cincinnatus, a species known to contain significant intragenomic variation (Lindner and Banik 2011), was included as a positive control. A wide range of phylogenetically diverse Basidiomycota and Ascomycota single-spore cultures were chosen from culture collections and the ITS1 region was amplified and subjected to 454 pyrosequencing (Margulies et al. 2005).

Fungal cultures
One hundred and twenty-seven single-spore cultures from diverse phylogenetic lineages in the Dikarya (Ascomycota and Basidiomycota) were originally screened for use in this study. Of these, 99 produced >100 pyrosequencing reads following initial data filtering (see methods below) and were included in the final dataset; 44 were Ascomycota and 55 were Basidiomycota (Appendix). Cultures were obtained from the culture collections of the Center for Forest Mycology Research (CFMR), maintained by the US Forest Service, Northern Research Station in Madison, WI; the ARON culture collection at the Department of Biology, University of Oslo; the Norwegian Veterinary Institute; and from the Norwegian Forest and Landscape Institute culture collections. All Basidiomycota were checked for, and found to lack, clamp connections, one potential sign of a dikaryotic mycelium.

Molecular methods
DNA was extracted from the axenic cultures following a 2% CTAB (hexadecyl-trimethyl-ammonium bromide) miniprep method described by Murray and Thompson (1980) with minor modifications: DNA was resuspended in 60-lL distilled sterile H 2 O at the final step of extraction. Samples were prepared for 454 pyrosequencing by performing nested PCR amplification using the fungalspecific primers ITS1F and ITS4 (White et al. 1990;Gardes and Bruns 1993) in the first step, and fusion primers including ITS5 and ITS2 (White et al. 1990) in the nested step. Fusion primers were constructed by adding 16 different unique 10 bp tags (Technical bulletin 005-2009, Roche Diagnostics Corp., Basel, Switzerland) and 454 pyrosequencing Titanium adaptors A and B to ITS5 and ITS2, respectively. The same tags were added to both forward and reverse primers. All PCR reactions were performed in three parallels for all samples for both PCR steps. PCR was performed on an MJ thermal cycler PTC-200 in 20-lL reactions containing 2-lL template DNA and 18-lL reaction mix. Final concentrations were 0.10 mmol/L dNTP mix, 0.125 lmol/L of each primer, and 0.5 units polymerase (Phusion Hot Start II, Finnzymes, Vantaa). The PCR amplification program was as follows: 30 sec at 98°C, followed by 20 cycles of 10 sec at 98°C, 20 sec at 50°C, 20 sec at 72°C, and a final extension step at 72°C for 7 min before storage at À20°C. The nested PCR was run with the same reaction concentrations and amplification program, but with a 509 diluted PCR mix as a template. After normalization of DNA concentration using the SequalPrep TM Normalization Plate (96) Kit following the manufacturer's protocol (Invitrogen, CA), PCR products were pooled into 8 equimolar amplicon libraries and cleaned with Wizard â SV Gel and PCR Clean-Up System (Promega, Madison, WI). The 454 Titanium sequencing of the tagged amplicons was performed at the Norwegian High-Throughput Sequencing Centre (http://www.sequencing.uio.no) using a 454 plate divided into eight compartments.

Bioinformatics analyses
As an initial filter, we removed all sequences with more than two errors in the primer sequence; with one or more errors in the tag sequence; with one or more DNA ambiguity symbols (N); or with an overall length of less than 150 bases. Reads with noncompatible tag combinations (Carlsen et al. 2012) also were removed. Sequence data were not denoized (e.g., Quince et al. 2011) so as to retain PCR and sequencing errors in addition to intragenomic ITS variation. Based on tag information, the sequences were split into 127 datasets representing the various single-spore cultures plus two negative controls. Twenty-eight datasets were discarded from further analyses due to a low number of reads (<100), leaving 99 species in the final dataset. Alignments were constructed in MAFFT 6.903 (Katoh and Toh 2008) for all datasets using the default (auto) strategy, which typically resulted in the FFT-NS-1 or FFT-NS-2 algorithm being selected. Manual inspection and BLAST searches (Altschul et al. 1997) of GenBank (Benson et al. 2012) also identified "contaminant sequences" in some datasets that represented species from the other datasets. These were interpreted as sequences that had switched tags at both ends (see Carlsen et al. 2012) and were excluded from the analysis. The final MAFFT alignments of the 99 accepted datasets were analyzed in DnaSP (Librado and Rozas 2009), where descriptive molecular variation statistics were calculated, including number of reads, number of alignment sites, number of haplotypes, haplotype diversity, nucleotide diversity (pi), and average number of nucleotide differences (k).

Results
After filtering, a total of 148,046 sequences were analyzed from the 99 isolates, yielding 9086 individual haplotypes (i.e., the number of clusters at 100% sequence similarity; Appendix). The number of reads per species ranged from 176 to 4212 with an average of 1495 reads per species. A strong positive correlation was observed between number of haplotypes per species and sequencing depth (Fig. 1A). However, the average number of nucleotide differences per species did not correlate with sequencing depth (Fig. 1B). There was a weak positive correlation between the total number of clusters detected per species and sequencing depth both at the 97% (Fig. 1C) and 99% ( Fig. 1E) clustering level. However, when the comparison was restricted to nonsingleton clusters, no relationship was detected between the number of clusters and sequencing depth at 97% (Fig. 1D) or 99% (Fig. 1F). In the full dataset of 99 taxa, 97% clustering of sequences produced 286 clusters. When excluding the singletons, 110 clusters were retained (Appendix). Even at a 99% sequence clustering level, 79% of the species included only one cluster when excluding singletons. In 92 of the 99 species, between 99.6% and 100% of the sequences were assigned to a single cluster by the clustering process (Appendix). Only in three species (Aspergillus sp., L. cincinnatus, and L. huroniensis) were more than 1.1% of the sequences affiliated with cluster(s) other than the most frequent.
As expected, the L. cincinnatus sequences exhibited high levels of molecular variation (k = 6.5) (Appendix, Fig. 1B) reflecting the already documented intragenomic ITS divergence in this species (Lindner and Banik 2011). Another Laetiporus species (L. huroniensis) that was poorly sampled by Lindner and Banik (2011) showed similarly high levels of molecular variation (k = 7.3). These two species displayed three nonsingleton 97% operational taxonomic units (OTU) (Fig. 1D) and numerous subgroups in the ITS phylogenies (Fig. 2). With a few exceptions, the remaining species displayed low sequence variation primarily with k < 1 and one nonsingleton 97% OTU. However, in addition to the two Laetiporus species, four additional species (Armillaria cf. novae-zelandiae, Aspergillus sp., Annulohypoxylon multiforme, and Polyporales sp.; see Fig. 1D) displayed more than one nonsingleton OTU at 97% sequence identity (Appendix).
Four species (Aspergillus sp., Annulohypoxylon multiforme, L. cincinnatus, and L. huroniensis) displayed signs of intragenomic ITS variation when neighbor-joining trees were constructed; the remaining species displayed star-shaped trees (unrooted) more suggestive of PCR and pyrosequencing error (Fig. 2). One species, Saccharomyces cerevisiae, displayed a very high number of divergent sequences at both the 99% and 97% levels, although the vast majority of these sequences were singletons ( Fig. 1C and E) and the unrooted neighbor-joining tree for this species was star shaped (Fig. 2).

Discussion
Although three of the 98 previously unsampled fungal species (%3%) displayed signs of intragenomic variation in the ITS region based on neighbor-joining analyses and five species (%5%) displayed greater than one nonsingleton cluster at 97% sequence identity, the majority of species displayed levels of sequence variation that likely could be ascribed to PCR and sequencing errors. Hence, most species seem to possess well-homogenized ITS tandem arrays, indicating that intragenomic variation in the ITS region will not severely affect environmental studies utilizing next-generation sequencing if certain datahandling steps are followed. In our dataset of 98 taxa (excluding the positive control L. cincinnatus), 97% clustering of sequences produced 281 OTUs (187% inflation), whereas similar clustering with the exclusion of singletons produced 107 OTUs (9% inflation) (Appendix). Our results support removal of all singleton clusters in addition to sequence denoizing (Quince et al. 2011) as critical steps for limiting taxon inflation due to PCR/sequencing errors and/or intragenomic variation.
Interestingly, one species, Saccharomyces cerevisiae, showed an abundance of divergent singleton sequences (42 of 43 OTUs were singletons at 97% clustering), although without greater sequencing depth it is difficult to say if these divergent singletons are due to PCR/ sequencing errors or intragenomic variability. Unfortunately it is difficult to distinguish among PCR/sequencing errors and intragenomic variation, although in the case of PCR/sequencing errors the number of haplotypes should increase with sequencing depth, whereas for intragenomic     variability there should be little correlation between the number of haplotypes and sequencing depth (cf. Dickie 2010). With great enough sequencing depth, it should be possible within individual species to distinguish between plateauing/stabilizing numbers of intragenomic haplotypes and sequencing errors. Additional methods specifically correcting for inflation due to intragenomic variability could be developed as next-generation sequencing methods are employed to screen larger numbers of taxa for the presence of intragenomic heterogeneity and rare ITS paralogs are documented. For traditional Sanger-based sequencing projects utilizing consensus sequences (e.g., from root tips, fruiting bodies, or cultures), the presence of rare ITS haplotypes in a genome does not appear to be a major concern, given that species with significant intragenomic ITS variation (e.g., L. cincinnatus) can produce "clean" consensus sequences representing the most common ITS variants (Lindner and Banik 2011). However, if certain variants in an ITS array become common, these copies could manifest themselves as seemingly unresolvable bases in sequence chromatograms, a phenomenon observed when allelic heterozygosity in ITS is encountered as a result of differing nuclei in a dikaryotic/heterokaryotic mycelium (Huang et al. 2010;Hyde et al. 2013). In the case of allelic heterozygosity of ITS, one would expect to observe two primary variants in approximately equal ratios.
For the fungal strains used in this study, we cannot entirely rule out that some of the observed variation is due to differing nuclei within a single mycelium (see Horton 2006), despite the fact that efforts were made to ensure monokaryotic isolates (e.g., sampling cultures derived from single spores and screening all Basidiomycota for the presence of clamp connections, a morphological feature indicative of a dikaryon). For species containing intragenomic variation (e.g., Annulohypoxylon multiforme, Aspergillus sp., L. cincinnatus, and L. huroniensis; Fig. 2), we observed more than two variants/clades and variants were not observed in approximately equal ratios, as would be expected if this variation was due to heterozygosity in a dikaryotic/heterokaryotic mycelium. In addition, the level of variation observed for some species was very high, with k (average nucleotide difference) ranging up to 7.4 in L. huroniensis. Such a high level of allelic divergence is typically not expected in a heterozygous individual (Hughes et al. 2009).
In order to fully understand the extent of intragenomic ITS variation in fungi, a broader phylogenetic range of species will need to be surveyed, including members of other fungal phyla such as Chytridiomycota s.l., Glomeromycota, and Zygomycota s.l. Significant intragenomic ITS variation has recently been detected in Batrachochytrium dendrobatidis, the chytrid fungus implicated in worldwide amphibian declines (Berger et al. 1998). Individual B. dendrobatidis genomes were found to contain up to 20 ITS haplotypes per genome (Schloegel et al. 2012), suggesting that significant intragenomic variation in the ITS region is a phenomenon that occurs in diverse fungal lineages. Our results indicate that high-throughput sequencing works well for detecting intragenomic variation and could be applied to an even wider range of species, although it will be difficult to screen fungi that are difficult to culture (e.g., Glomeromycota) or for which haploid material may be difficult to obtain. Given an estimated 1.5-5.1 million fungal species worldwide (Hawksworth 2001;Schmit and Mueller 2007;Blackwell 2011), few generalizations can be made because at best approximately 0.01% of fungal species have been sampled for intragenomic variation to date.
Because fungi are extremely diverse and it is difficult and time consuming to characterize species using traditional methods, it has recently been suggested that fungal species could be formally named based on environmental ITS data (Hibbett et al. 2011). Given the current findings, formal naming of environmental ITS sequences may present potential problems that need to be taken into account because environmental sequences will not always correspond to species in the traditional sense. Reconciling disparate ITS copies under one and the same species would be possible under such systems, but would require prior knowledge and manual intervention. Despite these potential problems, the ITS region seems to be the best DNA barcode currently available, although additional regions will be needed in the future for many fungal groups (cf. Gazis et al. 2011).
The present work suggests that significant intragenomic variation in the ITS region is potentially widespread in a small percentage of species throughout kingdom Fungi. A possible mechanism for generation of intragenomic variation is hybridization (James et al. 2009), although mechanisms capable of maintaining this variation are poorly understood. Understanding the mechanisms that allow ITS paralogs to "escape" concerted evolution in certain species may be the key to understanding how concerted evolution acts so efficiently in the majority of situations. Despite the fact that a large percentage of eukaryotic DNA is repetitive and subject to homogenization via concerted evolution, the fundamental mechanisms of concerted evolution remain largely unknown (Dover 1993;Elder and Turner 1995;Liao 1999).
Given that ITS regions often differ among species, it must be concluded that the ITS region typically evolves significantly during the time it takes for species to diverge. However, it is not known if species displaying large levels of intragenomic variation are being "caught in the act" of evolving, or whether these species can maintain this variation over long evolutionary periods of time. When ITS sequences diverge via speciation, the observed variation will be based on a combination of how quickly species diverge (i.e., how quickly the ITS regions diverge) relative to how quickly concerted evolution erases variation. If rare ITS paralogs do indeed represent traces of previous speciation or hybridization events, it may be possible to use these variants to better understand the evolution of fungal species complexes.
Appendix. List of single-spore isolates subjected to pyrosequencing and summary of molecular variation statistics by isolate.  Appendix. Continued.  Appendix. Continued.