• DNA sequences;
• phylogenetics;
• public DNA databases;
• taxonomic misidentification;
• fungal material

In the October 2003 issue of New Phytologist, Bridge et al. (2003) presented a study on the unreliability of published DNA sequences, also covered in a commentary in the same issue (Vilgalys, 2003). In common with many other phylogeneticists, taxonomists and ecologists, we have realised the increasing impact that misidentified or otherwise erroneous sequences may have on the reliability of conclusions inferred from use of the sequence data. In principle we therefore welcome this type of initiative. One consequence of this is that we have never included in our own papers sequence data from sources that we do not trust. Another consequence of the problem is that we are involved in initiatives for the development of smaller, quality controlled sequence databases where all accepted sequences must originate from well annotated fungal material (sporocarps or axenic cultures) deposited in public herbaria or culture collections, and identified/verified by recognized experts (Kõljalg et al., 2003). Unfortunately, Bridge et al. (2003) fall into another pitfall by publishing without taking the time to look at the sequence accessions, read the original papers in which the sequence data were presented, or contact the sequence submitters when results appear to be highly unexpected.

Sadly, as a result of publication pressure and limited time, there is a growing and widespread trend leading to citation of secondary and tertiary references, instead of citing the original publications. In consequence, data and conclusions become misinterpreted, and although we do not accuse Bridge et al. at this particular point, more and more often hypotheses are presented as well documented facts. We find this approach an unfortunate development.

Bridge et al. (2003) report their results from three case studies of sequences retrieved from the EMBL/GenBank/DDBJ sequence database. The largest of the cases rests on 100 nuclear ribosomal small subunit sequences from specimens classified as members of the Ascomycete order Helotiales. We could find only 91 accession numbers listed in their Table 1, and as far as we could identify from the unordered list, 47 of the 91 were deposited by us (Holst-Jensen et al., 1997a, 1997b, 1999). We further deduced from their discussion that another six unlisted sequences (AJ226071–AJ226076) included in their study came from species of the genus Myriosclerotinia (Sclerotiniaceae) and were also deposited by us (Holst-Jensen et al., 1999). We have not been able to deduce what the remaining three missing accessions were.

A sequence accession includes a description of the deposited sequence. To take one example from the Helotiales case, AJ226069 from Myriosclerotinia caricis-ampullaceae (Fig. 1), the complete accession contains 483 bases, of which only the first six (1–6) and the last 48 (436–483) correspond to the functional ribosomal small subunit RNA (SSU). The 429 bases in between (7–435) represent a putative group IC1 intron (Holst-Jensen et al., 1999). Bridge et al. (2003) evidently missed this information, although it was available with the accession (see Fig. 1) and from the lengthy discussion in the referenced paper.
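The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates are those given in the text above, the `Feature` class and helper functions are hypothetical names introduced here, and the sequence itself is a placeholder string of `N`s rather than the real bases of AJ226069.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One annotated region of an accession (hypothetical helper type)."""
    kind: str   # "rRNA" for SSU regions, "intron" for the group IC1 intron
    start: int  # 1-based, inclusive, as in the accession's feature table
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Coordinates for AJ226069 as stated in the text (not parsed from the database).
AJ226069_FEATURES = [
    Feature("rRNA", 1, 6),
    Feature("intron", 7, 435),
    Feature("rRNA", 436, 483),
]


def length_by_kind(features):
    """Sum the number of bases annotated under each feature kind."""
    totals = {}
    for feature in features:
        totals[feature.kind] = totals.get(feature.kind, 0) + feature.length
    return totals


def extract(sequence, features, kind):
    """Concatenate only the regions of the sequence annotated as `kind`."""
    return "".join(
        sequence[f.start - 1:f.end] for f in features if f.kind == kind
    )


totals = length_by_kind(AJ226069_FEATURES)

# Placeholder sequence: 483 unknown bases, NOT the real AJ226069 sequence.
placeholder_sequence = "N" * 483
ssu_only = extract(placeholder_sequence, AJ226069_FEATURES, "rRNA")

print(totals)         # bases per feature kind
print(len(ssu_only))  # bases left once the intron is excluded
```

Only 54 of the 483 bases are SSU sequence. A comparison that uses the whole accession as if it were SSU is therefore dominated by the intron.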


Figure 1. Sequence accession AJ226069, showing the format of a typical EMBL/GenBank/DDBJ sequence accession. Sequence related information that may be retrieved includes, among others, description of the source, and products coded by subsequences.


The method used by Bridge et al. (2003) to retrieve and compare sequences is simple and bound to lead to erroneous conclusions from time to time. Repeated key word searching in EMBL with the Sequence Retrieval System (e.g. combining ‘SSU’, ‘ribosomal’ and ‘Helotiales’) would lead to retrieval of all sequences containing these key words, including several with only partial and flanking SSU sequences. Apparently the description block (DE, Fig. 1) would be sufficient to remove the latter type of accessions. However, as already explained, it is not always that simple. Bridge et al. should have included at least one additional step: the data should have been verified by careful examination of the deposit information and a thorough search for related literature, in particular where the best matches were to apparently distantly related taxa.
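As a rough illustration of the additional step suggested above, the following Python sketch separates naive key word retrieval from a check of the deposit information. It is not the procedure used by Bridge et al. or by any database tool. The `Record` type, the function names, the 0.5 threshold and the description strings are all assumptions made for this example. Only the length (483) and SSU content (54 bases) of AJ226069 come from the text; the second record is entirely invented.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Minimal stand-in for a sequence accession (hypothetical type)."""
    accession: str
    description: str  # placeholder text, not the real DE line
    length: int       # total bases in the accession
    ssu_bases: int    # bases annotated as SSU rRNA in the feature table


def keyword_hits(records, keywords):
    """Naive retrieval: keep records whose description has every key word."""
    wanted = [k.lower() for k in keywords]
    return [
        r for r in records
        if all(k in r.description.lower() for k in wanted)
    ]


def needs_review(record, min_ssu_fraction=0.5):
    """Flag records in which SSU makes up less than the chosen fraction.

    The 0.5 default is an arbitrary assumption for illustration.
    """
    return record.ssu_bases / record.length < min_ssu_fraction


records = [
    # Length and SSU content as stated in the text; description is invented.
    Record("AJ226069", "Helotiales isolate, partial SSU ribosomal RNA gene "
                       "with intron (placeholder description)", 483, 54),
    # Entirely hypothetical record, added for contrast.
    Record("EXAMPLE01", "Helotiales isolate, SSU ribosomal RNA gene "
                        "(placeholder description)", 1700, 1700),
]

hits = keyword_hits(records, ["SSU", "ribosomal", "Helotiales"])
flagged = [r.accession for r in hits if needs_review(r)]

print([r.accession for r in hits])  # both records match the key words
print(flagged)                      # only the intron-dominated record
```

A flagged accession is not necessarily wrong. It is a record whose deposit information and associated literature should be read before it is compared with full SSU sequences.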

A total of eight of our 53 SSU sequence accessions included in the Helotiales case contain group IC1 introns (AJ226069–AJ226076), corresponding to a case given particular attention. We quote from Bridge et al. (2003): ‘Sequences of eight of the nine representatives of the genus Myriosclerotinia… showed good matches to other sequences obtained in the same study from the same laboratory, but either no match, or very low matches, to other Helotiales. … The ninth sequence from this genus was obtained in a different study (accession number Z81386; Holst-Jensen et al., 1997b) and identified closely with the genus Monilinia. … This finding therefore supports the suggestion that the eight ambiguous sequences have not been correctly identified. All of these sequences were obtained from cultures and so could again be the result of faster-growing contaminants in the original samples’. Any search combining the key words ‘Myriosclerotinia’ and the voucher numbers from the eight sequence accessions would have retrieved sequence data from a related taxonomic study (Holst-Jensen et al., 1998) from which correct identification of the isolates would have been evident. Group I introns are mobile intervening sequences that are relatively common in nuclear rRNA genes of algae, fungi and slime molds. Phylogenetic studies have contributed considerable evidence for inter-kingdom host jumps, and the Myriosclerotinia study (Holst-Jensen et al., 1999) from which the eight sequence accessions are derived provided evidence of horizontal transfer of these introns at lower taxonomic levels. To quote from the abstract of that study: ‘Incongruent branching patterns of intron-based and rDNA based (internal transcribed spacer) phylogenetic trees suggest that the fungal host genomes and the group I introns do not share a common evolutionary history’.

The information now available from the databases is attractive. However, naïve use of this information may lead to misinterpretation and potentially far-reaching and erroneous conclusions. As a mycologist you would not compare spore data from members of a genus in a database without determining whether each described spore is a meiospore or a particular type of mitospore (micro- or macroconidium, or chlamydospore). What we need, therefore, is to improve the competence of the database-exploring scientists.

Bridge et al. (2003) also point to another problem, that many of the accessions are unpublished and/or unvouchered. Updating sequence data is important, but it is not evident who is responsible. At present, the responsibility primarily lies with the depositor. Personally, we believe that up-to-date records can be obtained only if the databases perform their own regular literature searches and update the database accordingly. Any use of apparently unpublished sequence data should initiate, as a minimum, a literature search using combinations of author names, taxa and sequence based key words. We expect that the majority of putatively unpublished data would be identified as published data, and that many of them would include voucher information.

We strongly support critical reviews of the works of scientists, and welcome any well founded criticism of our hypotheses and conclusions. However, care should be taken to verify correct data interpretation, and the sources should always be examined before publishing.

