Current progress in DNA barcoding and future implications for entomology

Authors


Utsugi Jinbo, Department of General Systems Studies, Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan. Email: cujinbo@mail.ecc.u-tokyo.ac.jp

Abstract

DNA barcoding is a technique for identifying organisms based on a short, standardized fragment of genomic DNA. The standardized sequence region is called a DNA barcode because it serves as a barcode tag for each taxon. Since this concept was proposed and a large project named the Barcode of Life was launched, this simple technique has attracted attention from taxonomists, ecologists, conservation biologists, agriculturists, plant-quarantine officers and others, and the number of studies using DNA barcodes has rapidly increased. The extreme diversity of insects and their economic, epidemiological and agricultural importance have made this group a major target of DNA barcoding. However, there is some controversy about the utility of DNA barcoding. In this review, we present an overview of DNA barcoding and its application to entomology. We also introduce current advances and future implications of this promising technique.

INTRODUCTION

Species identification is a fundamental part of recognizing and describing biodiversity. Traditionally, identification has been based on morphological diagnoses provided by taxonomic studies. Only experts such as taxonomists and trained technicians can identify taxa accurately, because it requires special skills acquired through extensive experience.

As interest in biodiversity has increased in the fields of ecology, evolutionary biology, agriculture and economics, among others, it has become increasingly important to identify species precisely. However, the number of taxonomists and other identification experts has drastically decreased. Consequently, alternative and accurate identification methods that non-experts can use are required. One of the most promising approaches is the use of molecular instead of morphological data for identifying taxa, which has long been a fundamental idea of many biologists (Busse et al. 1996; Blaxter 2003). Advances in DNA-sequencing technologies have enabled researchers studying biodiversity to conduct simple, cost-effective and rapid DNA analyses. This progress in biotechnology, and the taxonomy crisis itself, played a large role in the creation of DNA barcoding.

OVERVIEW OF DNA BARCODING

Hebert et al. (2003a,b) proposed a technique using a primer set to amplify a 648-base pair (bp) region of the mitochondrial cytochrome-c oxidase subunit 1 (COI) gene to ensure rapid and accurate identification of a broad range of biological specimens. They named this technique “DNA barcoding”. Then, the Barcode of Life project was proposed to promote DNA barcoding as a global standard for sequence-based identification of eukaryotes. In 2004, this project was formally initiated by the establishment of the Consortium for the Barcode of Life (CBOL), which aims to develop a standard protocol for DNA barcoding and to construct a comprehensive DNA barcode library. Recently, the Barcode of Life project entered a new phase with the launch of the International Barcode of Life project (iBOL; International Barcode of Life 2010a). The iBOL is a huge international collaboration of 26 countries that aims to establish an automated identification system based on a DNA barcode library of all eukaryotes. In the first five years, the iBOL will focus mainly on developing a barcode library, including five million specimens of 500 000 species. The iBOL will also address the development of technologies, including new or improved protocols, informatics, equipment, DNA extraction methods and faster information systems.

The CBOL and iBOL have launched campaigns to build DNA barcode libraries of each animal group. The major targets are fish (Fish-BOL; Ward et al. 2009), birds (ABBI; Hebert et al. 2004a), mammals (Mammalia Barcode of Life), marine life (MarBOL) and insects. The Canadian Barcode of Life Network (BOLNET.ca) was the first national network for DNA barcoding. Subsequently, the following regions or countries have also initiated projects as a part of the iBOL: Europe (ECBOL; http://www.ecbol.org/), Norway (NorBOL; http://dnabarcoding.no/en/), Mexico (MexBOL; http://www.mexbol.org/) and Japan (JBOLI; http://www.jboli.org/). JBOLI provides information and promotes collaborative projects on DNA barcoding in Japan (see http://www.jboli.org/en/projects for relevant projects). There are also thematic programs, such as polar life (PolarBOL), quarantine and plant pathogens (QBOL, as a part of the ECBOL; Bonants et al. 2010) and human health (HealthBOL). As for insects, campaigns for Lepidoptera, Trichoptera, ants (Formicidae) and bees have been started. Tables 1 and 2 list the current iBOL campaigns and the number of barcoded specimens and species for each insect order, respectively. The most progress to date has been made for the Lepidoptera group. Presently, 430 000 barcodes representing about 50 000 species (30% of all known species) have been collected (Silva-Brandão et al. 2009; International Barcode of Life 2010b). Barcoding projects combined with inventories for relatively small areas are currently in progress. An inventory of the Área de Conservación Guanacaste, a World Heritage Site in Costa Rica, is one such project that includes barcoding, with a focus on insects (Janzen et al. 2005, 2009; Hajibabaei et al. 2006a). Another example is the Moorea Biocode Project, a comprehensive inventory of Moorea Island in French Polynesia that incorporates DNA barcoding (Check 2006).

Table 1. Current progress of international Barcode of Life campaigns

| Name of campaign | Target group | Total species number | Specimens barcoded | Species barcoded (% of total) | Clusters recognized by barcode |
|---|---|---|---|---|---|
| Formicidae Barcode of Life | Ants | 12 205 | 8 495 | 792 (6%) | 1697 |
| Trichoptera Barcode of Life | Caddisflies | 13 165 | 17 823 | 2 347 (18%) | 654 |
| Lepidoptera Barcode of Life | Butterflies and moths | approx. 165 000 | 438 341 | 48 676 (29%) | <4000 |
| All Bird Barcoding Initiative | Birds | 9 933 | 20 246 | 3 281 (33%) | 31 |
| Coral Reef Barcode of Life | Animals in Great Barrier Reef | 16 807 | 28 619 | 5 431 (32%) | no data |
| Fish Barcode of Life Initiative | Fishes | 31 220 | 60 385 | 7 882 (25%) | no data |
| Mammalia Barcode of Life | Mammals | 5 426 | 19 862 | 858 (16%) | 305 |
| Marine Barcode of Life | Marine life | 55 451 | 37 182 | 6 199 (11%) | no data |
| Shark Barcode of Life | Sharks | 1 160 | 4 339 | 557 (48%) | no data |

  • Data accessed 15 October 2010.
  • “Clusters recognized by barcode”: specimen clusters that can be discriminated from those of any described species.

Table 2. Current progress of DNA barcoding library of insects stored in the BOLD system

| Order | Number of specimens barcoded | Number of species barcoded |
|---|---|---|
| Diplura | 8 | 4 |
| Protura | 0 | 0 |
| Collembola | 0 | 0 |
| Archaeognatha | 4 | 24 |
| Thysanura | 11 | 3 |
| Ephemeroptera | 7 192 | 513 |
| Odonata | 3 521 | 291 |
| Dictyoptera | 4 | 2 |
| Blattaria | 60 | 494 |
| Isoptera | 467 | 134 |
| Mantodea | 228 | 140 |
| Dermaptera | 49 | 6 |
| Plecoptera | 3 221 | 400 |
| Orthoptera | 3 395 | 654 |
| Phasmida | 87 | 25 |
| Embioptera | 19 | 11 |
| Zoraptera | 0 | 0 |
| Grylloblattodea | 1 | 1 |
| Mantophasmatodea | 2 | 1 |
| Psocoptera | 70 | 3 |
| Phthiraptera | 527 | 85 |
| Thysanoptera | 880 | 103 |
| Hemiptera | 14 518 | 2 129 |
| Neuroptera | 769 | 99 |
| Megaloptera | 829 | 103 |
| Raphidioptera | 10 | 5 |
| Planipennia | 0 | 0 |
| Coleoptera | 18 926 | 4 428 |
| Strepsiptera | 9 | 7 |
| Mecoptera | 32 | 26 |
| Siphonaptera | 75 | 11 |
| Diptera | 61 140 | 6 182 |
| Trichoptera | 24 003 | 3 457 |
| Lepidoptera | 433 843 | 47 732 |
| Hymenoptera | 91 024 | 12 247 |
| Total | 664 924 | 79 320 |

  • Data accessed 15 October 2010.

The COI region does not work well as a DNA barcode for plants and fungi. Therefore, alternative sequence regions have been proposed for use as barcodes in these groups. For plants, two regions of chloroplast DNA, ribulose-bisphosphate carboxylase (rbcL) and maturase K (matK), are considered standard barcodes and some other regions are considered supplementary barcodes (CBOL Plant Working Group 2009; Consortium for the Barcode of Life 2009). For fungi, the internal transcribed spacer (ITS) region has been proposed as the standard (Seifert et al. 2007; Seifert 2009). Large barcoding projects for both trees (TreeBOL) and fungi (All Fungi Barcoding) have been launched.

To construct an automated identification support system for DNA barcoding initiatives, it is necessary to accumulate comprehensive DNA barcode records for all organisms. Recent advances in information technology have made it possible to manage huge datasets. The Barcode of Life Data Systems (BOLD), developed by the Canadian Centre for DNA Barcoding (CCDB), is the official informatics workbench for the Barcode of Life project (Ratnasingham & Hebert 2007). BOLD provides a data repository for DNA barcodes, an identification support system based on them, and web services for other system developers. BOLD is freely available to any researcher via the Internet, although registration is required to create private databases and/or access restricted data. To identify unknown samples, researchers simply submit their sequenced barcode regions as queries on the BOLD website. The results are displayed in tables showing the most closely related species and related higher taxa, as well as data on 50 closely related barcodes in the library.

Advantages of DNA barcoding as an identification technique

As mentioned in the introduction, molecular-based identification is not a new concept. Many molecular identification systems have been developed, including a bacteria identification system using 16S rRNA sequences (Busse et al. 1996). However, DNA barcoding has several advantages over previous methods. One advantage is its broad applicability. The standard DNA barcode region, a fragment of COI, is very efficient for species identification and has good discrimination power for most animal groups. The universal primer set used to amplify it, originally designed for marine invertebrates, can be applied to all animal phyla (Folmer et al. 1994; Hebert et al. 2003a,b, 2004b). A 648-bp fragment carries enough information and can be sequenced directly. The alignment process is not difficult because this is a protein-coding region, and errors can be detected by checking whether the obtained sequence is translatable. These useful features are the reason why the COI region was selected as the standard DNA barcode. Thus, DNA barcoding can be a simple but powerful method for non-experts, especially those who routinely identify a large number of samples.
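As an illustration only, the translatability check mentioned above could be sketched in Python as follows. The function names are hypothetical and do not belong to any barcoding software. The sketch assumes the invertebrate mitochondrial genetic code, in which TAA and TAG are stop codons and TGA encodes tryptophan, and it treats the reading frame as unknown.

```python
# Stop codons under the invertebrate mitochondrial genetic code
# (assumption: NCBI translation table 5, where TGA is tryptophan, not stop).
STOP_CODONS = {"TAA", "TAG"}


def stop_free_frames(seq):
    """Return the forward reading frames (0, 1, 2) that contain no stop codon."""
    seq = seq.upper()
    frames = []
    for frame in range(3):
        # Walk through complete codons only; a trailing partial codon is ignored.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(frame)
    return frames


def looks_translatable(seq):
    """A read is plausible if at least one frame is free of stop codons."""
    return len(stop_free_frames(seq)) > 0
```

A read for which `looks_translatable` returns `False` has a stop codon in every forward frame and is a candidate for a sequencing error or a pseudogene. A read that passes is not thereby proven correct.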

Another advantage of DNA barcoding is that identifications can be verified against voucher specimens, which links the technique to taxonomy. DNA barcodes are validated by taxonomic experts, who identify the voucher specimens from which the barcodes were obtained. A barcode record requires a species name, voucher specimen data (locality, date, depository of specimen, photographs etc.), a sequence, polymerase chain reaction (PCR) primers and trace files (the sequencer's original outputs). CBOL and the National Center for Biotechnology Information (NCBI) have already proposed a standard format (keyword “BARCODE”) for barcode sequences in GenBank (Consortium for the Barcode of Life 2005). Information on voucher specimens and trace files helps to confirm whether the previous identification and sequence data are correct.
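The components of a barcode record listed above can be pictured as a simple data structure. The following Python sketch is hypothetical: the class and field names are invented for illustration and do not reproduce the BOLD or GenBank schema. It only shows how a record might be checked for missing components before submission.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BarcodeRecord:
    # Field names are illustrative assumptions, not an official format.
    species_name: str = ""
    locality: str = ""
    collection_date: str = ""
    depository: str = ""
    sequence: str = ""
    pcr_primers: list = field(default_factory=list)
    trace_files: list = field(default_factory=list)

    def missing_fields(self):
        """Names of components that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values, invented for the example.
record = BarcodeRecord(species_name="Genus species", sequence="ACGT")
```

Here `record.missing_fields()` lists the voucher data, primers and trace files that would still have to be supplied.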

DNA barcoding and taxonomy

There is considerable controversy regarding the taxonomic perspective of molecular data, including DNA barcoding (Meier 2008). There are two principal issues: (i) species identification; and (ii) species discovery. These are sometimes confused.

Species identification using barcodes depends on the number of representatives of each species included in the database. The most reliable way to obtain a DNA barcode that accurately represents a species is to base it on the type specimen of that species. The first description of a new species using a DNA barcode from the holotype was by Brown et al. (2003), who used this method to describe a new species of Xenothictis (Lepidoptera: Tortricidae). Since then, many new species have been described with DNA barcodes from the holotype or paratypes, not only in arthropods, but also in other animals (e.g. Burns et al. 2007; Badek et al. 2008; Dabert et al. 2008a,b; Vaglia et al. 2008; Yassin et al. 2008; Yoshitake et al. 2008; Adamski et al. 2009).

In contrast, species discovery is defined as the taxonomic process of recognizing a cluster of individuals and/or populations as a single species. The DNA barcode can accelerate species discovery. First, DNA barcoding can be used to identify cryptic, previously overlooked species (Hebert et al. 2004b; Janzen et al. 2005). Second, DNA barcode information helps sort all specimens of related taxa, especially when taxonomic studies of these taxa are inadequate (e.g. Smith et al. 2006, 2007, 2008). However, as discussed below, it should be noted that DNA barcoding cannot detect all candidates for undescribed species, especially in recently diverged groups.

Some researchers have envisioned “DNA taxonomy”, a concept of adopting DNA sequencing as a central criterion for taxonomic decisions and descriptions, and have proposed using DNA barcodes as the standard method of analysis (Blaxter 2003; Tautz et al. 2003; Vogler & Monaghan 2007). However, there is concern over adopting one specific sequence region as the only criterion for taxonomic studies (Lipscomb et al. 2003; DeSalle 2006; Rubinoff 2006). In addition, it is quite apparent that the DNA barcode itself is not a new species concept (i.e. a species cannot be defined based on the barcode only); neither does it provide enough information to describe unknown specimens as a new species. The results of barcoding can only suggest new species candidates (Witt et al. 2006; Hajibabaei et al. 2007; Miller 2007; Waugh 2007) and must be combined with other valuable supporting information (e.g. distribution, life history, host plants) in taxonomic studies (e.g. integrative taxonomy: Dayrat 2005, see Yoshitake et al. 2008 and Schlick-Steiner et al. 2010). Species descriptions using barcodes based on type specimens will become more common and important in the near future.

Accuracy of DNA barcode-based identification

One of the most critical issues regarding DNA barcoding is its accuracy of species identification. The accuracy fundamentally depends on the extent of overlap between interspecific divergence and intraspecific variation. That is, the larger the “gap” between intra- and interspecific differences in genetic distance, the more successful the species identification (Hebert et al. 2004a; Meyer & Paulay 2005). Indeed, some early studies reported very high identification success and the presence of a distinct barcoding gap. A divergence of 10 times the mean intraspecific divergence was proposed as the standard threshold for differentiating species (e.g. Hebert et al. 2003a, 2004a,b). However, according to Meier et al. (2008), the barcoding gap is sometimes misinterpreted and should be quantified as the difference between intraspecific distances and the minimum congeneric distance, instead of using mean interspecific values.
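The quantities involved in the barcoding gap can be made concrete with a small Python sketch. It is a simplified illustration under stated assumptions: sequences are already aligned, the uncorrected p-distance stands in for whatever distance model a study actually uses, all species in the toy data are treated as congeneric, and the function names and data are invented. The "10 times" rule is applied here to the smallest interspecific distance, which is one possible reading of the threshold.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcoding_gap(records):
    """records: list of (species_name, aligned_sequence) pairs.

    Returns (mean_intra, max_intra, min_inter, mean_inter).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need both within- and between-species pairs")
    return (sum(intra) / len(intra), max(intra),
            min(inter), sum(inter) / len(inter))


# Toy data, invented for illustration only.
records = [
    ("sp_A", "AAAAAAAAAA"),
    ("sp_A", "AAAAAAAAAT"),
    ("sp_B", "AAAATTTTTT"),
    ("sp_B", "AAAATTTTTA"),
]
mean_intra, max_intra, min_inter, mean_inter = barcoding_gap(records)
passes_10x_rule = min_inter >= 10 * mean_intra  # threshold based on the mean
gap = min_inter - max_intra                      # gap based on the smallest interspecific distance
```

In this toy case the mean interspecific distance (0.55) is larger than the smallest one (0.5), which shows why a gap computed from mean values looks wider than one computed from minimum values.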

When intra- and interspecific distances overlap widely, DNA barcoding-based identification is not effective (Moritz & Cicero 2004; Meyer & Paulay 2005; Elias et al. 2007; Wiemers & Fiedler 2007). Overlap can be caused by several factors, including large genetic diversity in a species (Davis & Nixon 1992; DeSalle et al. 2005). Another major cause is paraphyly or polyphyly of species that appear to be closely related. Indeed, it is estimated that one-fourth of animal species are not monophyletic (Funk & Omland 2003). Species may appear to be polyphyletic or paraphyletic in phylogenetic analyses due to incomplete lineage sorting of mitochondrial DNA, introgression or incongruence in the definition of morphological species. Wiemers and Fiedler (2007) reported that DNA barcode-based identification often failed in Lycaenidae (Lepidoptera) because of high intraspecific divergence, probably due to incomplete lineage sorting. Two groups of organisms may share the same DNA barcode(s) but may not belong to the same species, in particular if they have diverged very recently. Such situations are rather common (e.g. Kaila & Ståhls 2006; Langhoff et al. 2009; Burns et al. 2010; Žurovcová et al. 2010). These points show the limitations of a DNA barcoding method that depends on a single region of mitochondrial DNA. In such cases, supplemental analyses combined with other traits, such as nuclear genes, are required (Hebert et al. 2003a; Baker et al. 2009). Another factor that may lead to overlap is the incongruence between molecular data and the traditional definition of species, in particular when a group is poorly studied taxonomically (Avise & Walker 1999; Meyer & Paulay 2005). Such cases may be improved by integrative taxonomic revisions that combine genetic and morphological data (Funk & Omland 2003; Dayrat 2005; Meyer & Paulay 2005; Kehlmaier & Assmann 2010).

The development of algorithms for DNA barcode-based identification is a challenge in the field of bioinformatics. In the identification engine of the BOLD system, sequences similar to a query are collected from the reference library by a linear search (Ratnasingham & Hebert 2007). The result is also available as a tree based on the neighbor-joining (NJ) method. In the tree-based approach, a query sequence is assigned to a species when the query is included in a cluster consisting entirely or even partially of conspecifics (Hebert et al. 2003a; Meier et al. 2006). There is controversy about the accuracy of tree-based approaches, such as the NJ method, for DNA barcoding-based identification. Meier et al. (2006) introduced distance-based criteria, in which a query sequence is assigned to the species of the best-matched barcode either regardless of similarity (best match method) or only when the distance between the query and the best-matched barcode falls within the range that contains 95% of all intraspecific distances (best close match method). Virgilio et al. (2010) compared the performance of DNA barcoding-based identification among insect orders and between the tree-based and distance-based criteria, and concluded that the distance-based criterion showed higher and more robust performance than the tree-based one. Another criterion is character-based identification, which directly uses nucleotide variation in each base position as a diagnostic character. This criterion may provide more accurate results than distance-based approaches in which all variation is reduced to a single vector, even for subspecies and populations that show very little variation (Rach et al. 2008; Lowenstein et al. 2009). However, the accuracy of the character-based approach tends to be low without a comprehensive library of species or species complexes (Little & Stevenson 2007). Many algorithms based on different approaches have been proposed and their performances have been compared (Frézal & Leblois 2008; Austerlitz et al. 2009).
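The two distance-based criteria can be sketched in Python as follows. This is a simplified illustration and not the implementation of Meier et al. (2006) or of BOLD: the function names and toy library are invented, sequences are assumed to be aligned, the uncorrected p-distance is used, and the cut-off is taken as the distance below which 95% of within-species pairs in the library fall.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def intraspecific_threshold(library, fraction=0.95):
    """Distance below which `fraction` of all within-species distances fall."""
    dists = sorted(p_distance(s1, s2)
                   for (sp1, s1), (sp2, s2) in combinations(library, 2)
                   if sp1 == sp2)
    if not dists:
        raise ValueError("library has no within-species pairs")
    return dists[max(0, math.ceil(fraction * len(dists)) - 1)]


def best_match(query, library):
    """Return (species, distance) of the closest reference barcode.

    The species is None when equally close references belong to different species.
    """
    scored = [(p_distance(query, seq), species) for species, seq in library]
    best = min(d for d, _ in scored)
    species = {sp for d, sp in scored if d == best}
    return (species.pop() if len(species) == 1 else None), best


def best_close_match(query, library, threshold):
    """Like best_match, but reject matches more distant than the threshold."""
    species, dist = best_match(query, library)
    return species if dist <= threshold else None


# Toy reference library, invented for illustration only.
library = [
    ("sp_A", "AAAAAAAAAA"),
    ("sp_A", "AAAAAAAAAT"),
    ("sp_B", "AAAATTTTTT"),
    ("sp_B", "AAAATTTTTA"),
]
threshold = intraspecific_threshold(library)
```

With this library, the query `"CCCCCAAAAA"` is assigned to `sp_A` by `best_match` even though it differs at half of its sites, whereas `best_close_match` leaves it unidentified. This is the practical difference between the two criteria when the true species is absent from the library.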

The most important factor affecting the accuracy of species identification is the coverage and reliability of available barcode libraries (Ekrem et al. 2007). Barcode-based identification will fail if the DNA barcode data of the species in question have not been registered in a library. In fact, most identification errors are caused by a lack of reference data (Virgilio et al. 2010). In addition, intraspecific variation might be underestimated when the samples included in the library do not reflect the overall genetic diversity and/or do not include all clades of non-monophyletic species groups, and interspecific variation might be overestimated if data on closely related species are unavailable. Wiemers and Fiedler (2007) reported that the barcoding gap in Lycaenidae (Lepidoptera) is an artifact caused by insufficient sampling across taxa. It should be emphasized that the misidentification of reference barcode data is another serious problem. Many records from misidentified samples have been submitted to GenBank (Ruedas et al. 2000; Harris 2003). Meier (2008) reported that misidentified barcode data are also submitted to the BOLD database, which does not have a mechanism for verifying records. DNA barcodes obtained from misidentified specimens can be detected by comparison with multiple barcodes of the same species. Misidentifications can then be corrected by re-identification of the voucher specimens by taxonomic experts. Thus, quality control in collaboration with taxonomists is required for the proper construction of reference DNA barcode libraries.
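One simple way to screen a reference library for possibly misidentified records, in the spirit of the comparison described above, is to flag every record whose nearest neighbour carries a different species name. The Python sketch below is an assumption-laden illustration with invented names and data; it is not a BOLD feature. A flagged record is only a candidate for re-examination of its voucher, because shared haplotypes between valid species produce the same signal.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def flag_suspect_records(library):
    """Indices of records whose nearest other record belongs to another species."""
    suspects = []
    for i, (species, seq) in enumerate(library):
        others = [(p_distance(seq, s), sp)
                  for j, (sp, s) in enumerate(library) if j != i]
        if not others:
            continue
        nearest = min(d for d, _ in others)
        nearest_species = {sp for d, sp in others if d == nearest}
        # Not flagged if any equally close record shares the species name.
        if species not in nearest_species:
            suspects.append(i)
    return suspects


# Toy library, invented for illustration; record 2 is deliberately mislabelled.
library = [
    ("sp_A", "AAAAAAAAAA"),
    ("sp_A", "AAAAAAAAAT"),
    ("sp_A", "AAAATTTTTT"),
    ("sp_B", "AAAATTTTTA"),
    ("sp_B", "AAAATTTTAA"),
]
suspects = flag_suspect_records(library)
```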

DNA barcode-based identification is quite effective at discriminating a limited set of species, such as species occurring in a small area, agricultural pest species and invasive species (Meier 2008; Kress et al. 2009). In these cases, the gap between intraspecific and interspecific diversity is mostly distinct because the number of closely related species complexes is small and each species shows comparatively low intraspecific diversity. However, error rates can be high when there are locally diverged species complexes or when invasive species and/or populations contaminate native populations of the same or closely related species (Meyer & Paulay 2005). Field inventories and preliminary reference database surveys are necessary to develop strategies for the creation of a robust identification framework for each specific purpose.

As mentioned above, COI barcodes do not provide adequate information for species identification when intra- and interspecific distances overlap widely. However, one can identify samples by combining supplementary molecular data with COI barcodes. In such cases, DNA barcode-based identification consists of two processes: a rough identification using the COI barcode and a detailed identification using the supplementary molecular data for a specific group of insects. The BOLD system accepts these supplementary molecular data (supplementary barcodes) in addition to the standardized DNA barcode regions. In the future, a database of identification workflows for each taxon, combined with the BOLD system, will be required for such integrative procedures. Gompert et al. (2006) discriminated between two subspecies of Lycaeides melissa (Lepidoptera: Lycaenidae) using a nuclear marker (amplified fragment length polymorphism, AFLP). The two subspecies share some haplotypes of the COI barcoding region, probably as a result of introgression. Dasmahapatra et al. (2010) emphasized that the AFLP marker is a useful tool for checking results given by DNA barcodes.

The presence of multiple copies of mitochondrial gene sequences in an individual, such as nuclear pseudogenes of mitochondrial origin (NUMTs) or heteroplasmy (the coexistence of multiple mitochondrial haplotypes in an individual), also reduces the validity of DNA barcoding. This problem has been reported for many insects (Gellissen & Michaelis 1987; Zhang & Hewitt 1996; Bensasson et al. 2000; Brower 2006; Rubinoff et al. 2006; Magnacca & Brown 2010a,b) and can affect the barcoding results (Song et al. 2008). However, two methodological advances may lessen the impact. Moulton et al. (2010) revealed that specific primer sets for the COI gene reduce the co-amplification of NUMTs. Magnacca and Brown (2010a) reported that the degree of heteroplasmy differs among tissues and that DNA extracted from large tissues, such as the abdomen, reduces polymorphism in the barcode. In addition, Magnacca and Brown (2010b) showed that the species identification success rate increases when polymorphic bases are treated as characters.

Methodological advances

Some recent methodological advances in the field of DNA extraction and PCR extend the range of application for DNA barcoding. In addition to the identification accuracy reviewed above, there are two fundamental limitations of DNA barcoding for biodiversity surveys. The first is damage to voucher specimens caused by the DNA-extraction procedure. While extracting DNA, a small portion of tissue (usually thoracic muscle or legs in insects) is removed from the specimen. This procedure inevitably causes the loss of morphological information. In particular, for some extremely small insects, such as egg parasitoid wasps, preparation for dissection (e.g. swelling of the specimen) damages DNA; in addition, the specimen may be damaged during the dissection itself. Many non-destructive DNA-extraction methods have been proposed (Johnson & Clayton 2003: lice; Favret 2005: aphids; Pons 2006; Gilbert et al. 2007: Coleoptera; Petersen et al. 2007: tarantulas; Rowley et al. 2007: Acarina, Araneae, Coleoptera, Diptera and Hymenoptera; Badek et al. 2008: analgoid mites; Hunter et al. 2008: Diptera; Katoh et al. 2008: Coleoptera; Castalanelli et al. 2010: Coleoptera, Diptera, Hemiptera, Acari). These techniques enable researchers to determine the DNA barcode from voucher specimens of important museum collections or small insect specimens with minimal damage. However, these methods have been applied only to limited orders of insects and need to be tested on more taxonomic groups.

The second limitation is sample condition. Drying and pinning is the most common method of insect preservation, but the DNA of dried, pinned specimens is degraded by heat, oxidation (Lindahl 1993; Zimmermann et al. 2008) and fumigation gas (Saito 2002). Thus, DNA barcoding has mainly been applied to fresh samples or specimens preserved in a manner suited to molecular work (refrigerated or stored in ethanol or acetone). However, recent methodological and technical advances allow the extraction of archival or ancient DNA from historical museum specimens or fossilized samples. The extraction and amplification of this DNA has become one of the hottest trends in molecular ecology, evolutionary biology, paleobiology and anthropology, and many different methods have been used for animals, plants and fungi (Höss et al. 1994; Yang et al. 1996; Ozawa et al. 1997; Parducci et al. 2005; Austin & Melville 2006). Table 3 summarizes the methods that have been applied to the study of insects. As shown in the table, the PCR success rate changes depending on the insect order of study and on the condition of the samples, but in general the amplification of DNA fragments becomes extremely difficult for specimens that have been preserved for more than 50 years. Strange et al. (2009) showed that molecular markers work well in Bombus specimens up to 101 years old, although the amplification rate is significantly lower in materials that are more than 60 years old. Surprisingly, Thomsen et al. (2009) obtained DNA from fossilized Coleoptera preserved in permafrost for more than 10 000 years, even though only a short fragment of DNA was amplified by PCR. Other technical advances, such as efficient DNA extraction methods, high-efficiency DNA polymerases, reagents that decrease the effect of impurities that inhibit PCR and DNA-repairing enzymes (Hajibabaei et al. 2005; Juen & Traugott 2006; Ball & Armstrong 2008; see also Chelomina 2006), have made it more feasible to amplify DNA from historical and fossilized specimens.

Table 3. List of past attempts to amplify the DNA from historical museum or fossilized insect specimens

| Insect order | DNA extraction method (tissue from which DNA was extracted) | Number or rate of successful amplifications | Maximum age of samples with positive PCR result in years (preservation condition) | Amplified fragment length (bp) | References |
|---|---|---|---|---|---|
| Coleoptera | Gilbert's method (non-destructive) | 13/14 | 54 (dried) | 220 | Gilbert et al. (2007) |
| Coleoptera | Gilbert's method (non-destructive) | 15/20 | 180 (dried) | 78 | Thomsen et al. (2009) |
| Coleoptera | Gilbert's method (non-destructive) | 17/20 | 180 (dried) | 204 | Thomsen et al. (2009) |
| Coleoptera | Gilbert's method (non-destructive) | 1/23 | 280–1870 (non-frozen) | 78–204 | Thomsen et al. (2009) |
| Coleoptera | Gilbert's method (non-destructive) | 3/12 | approx. 26 000 (permafrost) | 109–159 | Thomsen et al. (2009) |
| Coleoptera | ANDE (non-destructive) | 3/4 | 31 (dried) | 550 | Castalanelli et al. (2010) |
| Diptera | Chelex (non-destructive) | 4/4 | 34 (dried) | 137 | Junqueira et al. (2002) |
| Diptera | DNAzol (non-destructive) | 29/29 | 65 (dried) | 137–315 | Junqueira et al. (2002) |
| Diptera | Phenol–chloroform (non-destructive) | 2/2 | 18 (dried) | 137–357 | Junqueira et al. (2002) |
| Diptera | CTAB (head, legs and flight muscle) | 16/35 | 30 (dried) | 287 | Hartley et al. (2006) |
| Diptera | ANDE (non-destructive) | 2/2 | 13 (dried) | 800 | Castalanelli et al. (2010) |
| Lepidoptera | Salting out (leg) | 12/12 | 109 (dried) | 100–300 | Harper et al. (2006) |
| Lepidoptera | NucleoSpin (non-destructive) | 32/33 | 21 (oven dried) | 134–221 | Hajibabaei et al. (2006b) |
| Hymenoptera | Chelex (leg) | 46/55 | 25 (dried) | 119–217 | Inoue et al. (2010) |
| Odonata | DNeasy (leg) | 77.1% | 51 (dried) | 118–272 | Watts et al. (2007) |

  • Modified from phenol–chloroform method.
  • ANDE, high speed DNA extraction method (see http://www.ande.com.au).
  • CTAB, cetyl trimethyl ammonium bromide method.

DNA amplification from ancient specimens may also depend on the length of the amplified fragments. As shown in Table 3, fragments that are shorter than 200 bp are relatively well amplified even from old specimens, whereas longer ones are not. Indeed, most attempts to amplify such DNA have adopted primer sets for 20–200-bp fragments (Table 3). This low amplification success rate for longer fragments may be caused by fragmentation of DNA within the specimens.

Two strategies have been proposed for addressing this problem. The first is to identify species based only on short fragments that are easily amplified. Several authors (Hajibabaei et al. 2006b; Fan et al. 2009) tested this process and showed that short barcodes are effective for species identification when the taxonomic group of the sample has been narrowed down in advance. This strategy is especially effective when DNA barcoding is used to identify historical samples by comparing them to a reference barcode library. The second strategy is to obtain a full-length DNA barcode by connecting the short fragments. Van Houdt et al. (2010) demonstrated such a method by amplifying 269–363-bp fragments within the barcode region using newly developed universal primers and then connecting these fragments with a Bayesian algorithm, using as a guide a complete barcode sequence obtained from a fresh sample of the same species (or a congeneric species). Although much time and effort is required, this strategy makes it possible to obtain full-length barcodes from archival specimens such as type specimens.
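The idea of connecting short fragments with the help of a guide sequence can be illustrated with a deliberately simplified Python sketch. It does not reproduce the Bayesian algorithm of Van Houdt et al. (2010). Instead, each fragment is placed at the ungapped position on the guide where it has the fewest mismatches, which is a much cruder substitute. The function names and the toy sequences are invented for this example.

```python
def best_offset(fragment, guide):
    """Ungapped placement of `fragment` on `guide` with the fewest mismatches.

    Returns (offset, mismatches); ties go to the leftmost placement.
    """
    if len(fragment) > len(guide):
        raise ValueError("fragment is longer than the guide sequence")
    best = None
    for offset in range(len(guide) - len(fragment) + 1):
        window = guide[offset:offset + len(fragment)]
        mismatches = sum(f != g for f, g in zip(fragment, window))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def assemble(fragments, guide):
    """Lay fragments over the guide; positions not covered stay as 'N'.

    Where fragments overlap, the base from the earlier fragment is kept.
    """
    assembled = ["N"] * len(guide)
    for fragment in fragments:
        offset, _ = best_offset(fragment, guide)
        for i, base in enumerate(fragment):
            if assembled[offset + i] == "N":
                assembled[offset + i] = base
    return "".join(assembled)


# Toy data, invented: a 20-bp "guide" and two overlapping fragments from an
# archival specimen, each differing from the guide at one site.
guide = "ACGTACGGTTCAGCTAAGTC"
fragments = ["ACGTACGCTTCA", "TTCAGCTTAGTC"]
full_length = assemble(fragments, guide)
```

The assembled sequence keeps the bases of the archival specimen, including the two sites where it differs from the guide. The guide is used only to position the fragments.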

This progress in the barcoding of old specimens increases the value of museum collections as a source of genetic diversity information that is relevant to ecology, evolutionary biology, population genetics and conservation biology (Wandeler et al. 2007). However, most primers used for DNA barcoding are universal and can amplify DNA from a wide range of organisms. This raises the risk of contaminating archival DNA with contemporary DNA. Thus, archival or ancient DNA barcoding should be conducted under very specific conditions, including at least two repetitions of PCR amplification and the elimination of contemporary DNA from the laboratory (see Chelomina 2006).

Quantitative analysis using the DNA barcode

Several authors have attempted to quantify species diversity in an environmental sample directly using barcodes. The fundamental idea is to amplify all DNA barcodes in a sample (here, we refer to this array of DNA barcodes as the environmental DNA barcode) and quantify the frequency of each species. Summerbell et al. (2005) proposed a cost-effective method for doing this without reading each barcode sequence. PCR amplicons are labeled with digoxigenin dideoxy-UTP and annealed with species-specific oligonucleotide probes bound to nylon membranes. Then the signal intensities of each probe are quantified to measure the relative amounts of each barcode in the sample. This method is very cost-effective, although it has three drawbacks: the number of species that can be tested in each annealing procedure is limited (<200 spp./membrane), complicated preliminary tests are needed to prove specificity and establish annealing conditions, and a complete set of sequence information for all species expected to exist in the environment is necessary.

A more direct and straightforward strategy for quantifying the environmental DNA barcode is to determine the sequence of each species within the array. Recent advances in pyrosequencing have made it possible to obtain numerous DNA fragments at once and quantify the frequency of species in an environmental DNA barcode. Although there have been no studies on insects using this method, several attempts have been made on diets of vertebrates (Soininen et al. 2009; Valentini et al. 2009; Deagle et al. 2010) and these showed that pyrosequencing is an effective method for DNA barcoding. Pyrosequencing has limitations, however; these include its read length and cost. Its read length is less than the full-length animal DNA barcodes (648 bp). This is problematic considering that environmental DNA includes multiple sequences from multiple species and individuals. Thus, researchers need to prepare complete barcode libraries for species that are expected to occur in the environment until technical advances extend the read length (currently up to 350 bp) to exceed full barcode length.
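How short reads might be converted into species frequencies with the help of a complete reference library can be sketched in Python. This is a toy illustration under strong assumptions: reads are assigned only when they match exactly one reference barcode as an exact substring, so sequencing errors and intraspecific variation are ignored, and read counts are treated as a rough proxy for abundance. The function names, library and reads are invented.

```python
from collections import Counter


def assign_read(read, library):
    """Return the single species whose reference barcode contains `read`, else None."""
    hits = {species for species, barcode in library.items() if read in barcode}
    return hits.pop() if len(hits) == 1 else None


def species_frequencies(reads, library):
    """Return (relative frequency per species, number of unassigned reads)."""
    counts = Counter(assign_read(read, library) for read in reads)
    unassigned = counts.pop(None, 0)
    total = sum(counts.values())
    freqs = {sp: n / total for sp, n in counts.items()} if total else {}
    return freqs, unassigned


# Toy reference library and reads, invented for illustration only.
library = {
    "sp_A": "ACGTACGGTTCAGCTAAGTC",
    "sp_B": "ACGTTCGGATCAGGTAAGTC",
}
reads = ["ACGGTTCA", "GCTAAGTC", "TCGGATCA", "TAAGTC", "GGGGGG"]
freqs, unassigned = species_frequencies(reads, library)
```

The read `"TAAGTC"` occurs in both reference barcodes and is therefore left unassigned. This shows on a small scale why reads shorter than the full barcode depend on a complete library and may still fail to discriminate species.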

Despite the technical and methodological problems mentioned above, the quantification of species using the total DNA barcode will open up a wide range of possibilities, such as estimating the diet of insects and entomophagous animals from their feces (reviewed below), determining the composition of an insect's bacterial symbionts and how they change in time, investigating novel bacterial or fungal pathogens of insect pests and estimating hidden biodiversity in soil samples (Hugo et al. 2006; Juen & Traugott 2006, 2007).

DNA barcoding and other database projects

DNA barcoding projects will help to document biodiversity together with other database projects. As information technology has advanced, various large-scale database projects have been established to share biodiversity data (e.g. species names, distribution of species, observations, specimen data in natural history collections) and use them for scientific studies, conservation activities or political decision making. For example, the Global Biodiversity Information Facility (GBIF) maintains a portal website to share species names and occurrence data; the Catalogue of Life project provides a comprehensive list of organism names; and the Encyclopedia of Life and Tree of Life projects are constructing websites to describe all species, higher taxa and their phylogenetic relationships. The species name is an essential component of these databases and is used as the principal key to explore the data. Thus, species identification is also essential for these biodiversity databases. However, species identification is difficult for most users and data providers of these databases, such as ecologists, governmental officers and policy makers. Converting DNA barcode sequences into species names through an identification system would make the DNA barcode into a new keyword that allows non-taxonomists to retrieve precise data from multiple biodiversity databases (Fig. 1). Collaborations between DNA barcoding projects and other biodiversity projects would enable users of DNA barcoding to seamlessly obtain various data about a species, including its diagnosis, geographic range or specimen information, in addition to its species name. Furthermore, researchers can assess biodiversity patterns and processes by assembling and integrating various biodiversity resources using DNA barcodes (see Guralnick & Hill 2009).

Figure 1.

Workflow for retrieving biodiversity information from databases by (A) traditional approach and (B) DNA barcoding.
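The barcode-based workflow (B) can be sketched in a few lines. This is a toy illustration under stated assumptions: the reference library and the two "databases" are hypothetical in-memory dictionaries with invented names and records, identification is reduced to an exact sequence lookup, and no real database interface is called.

```python
def identify(barcode, reference_library):
    """Convert a barcode sequence into a species name (exact match only)."""
    return reference_library.get(barcode)


def retrieve_records(barcode, reference_library, databases):
    """Use the identified species name as the key into each database."""
    name = identify(barcode, reference_library)
    if name is None:
        # Unidentified barcode: nothing can be retrieved by name.
        return None, {}
    return name, {db: table.get(name) for db, table in databases.items()}


# Hypothetical reference library and name-keyed databases.
reference_library = {"ACGTTGCAAGGC": "Genus_a species_x"}
databases = {
    "occurrence_db": {"Genus_a species_x": ["locality 1", "locality 2"]},
    "name_catalogue": {"Genus_a species_x": "accepted name"},
}

name, records = retrieve_records("ACGTTGCAAGGC", reference_library, databases)
```

The point of the design is that the user supplies only a sequence; the species name, which remains the principal key of every database, is obtained from the identification step.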

The genetic sequence databases and DNA barcode databases are also cooperating with each other. The sequence data including voucher specimen information stored in BOLD are also registered to NCBI, the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL) when they are ready for public release (Ratnasingham & Hebert 2007).

APPLICATIONS OF DNA BARCODING FOR ENTOMOLOGY

The unique features of DNA barcoding mentioned above also provide many benefits to both basic and applied entomology.

Identifications using molecular data can help elucidate the relationships of morphologically variable individuals of the same species, such as individuals in different developmental stages, castes in social animals and sexually dimorphic individuals (Miller et al. 2005; Johnson et al. 2009). Insects, especially those of holometabolous orders, are extremely variable, and numerous attempts have been made to associate their life stages using molecular markers (Miller et al. 2005; Ahrens et al. 2007; Sutou et al. 2007; Johnson et al. 2009; Gattolliat & Monaghan 2010; Hayashi & Sota 2010; Kathirithamby et al. 2010; Murría et al. 2010; Pauls et al. 2010). In addition to the features of typical non-barcode molecular markers, the advantages of DNA barcoding include primer universality, the accumulation of information on a wide range of taxonomic groups, and its association with taxonomy. These advantages may aid the study of ecologically interesting insect phenomena, such as host plant alternation among aphids, extreme sexual dimorphism and heterotrophic heteronomy of Strepsiptera, as Kathirithamby et al. (2010) investigated using non-barcode molecular markers.
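Associating an unidentified larva with adult reference specimens can be sketched as a nearest-match search on pairwise sequence divergence. This is a minimal sketch, not the procedure of any study cited above: it assumes pre-aligned sequences of equal length, uses the uncorrected p-distance, and applies an arbitrary illustrative threshold of 2%; all sequences and names are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def associate(query, adult_refs, threshold=0.02):
    """Return (species, distance) for the closest adult reference.

    The species is None when even the closest reference is more divergent
    than the threshold (an assumed, illustrative value).
    """
    best_name, best_dist = None, None
    for name, ref in adult_refs.items():
        d = p_distance(query, ref)
        if best_dist is None or d < best_dist:
            best_name, best_dist = name, d
    if best_dist is not None and best_dist <= threshold:
        return best_name, best_dist
    return None, best_dist


# Hypothetical adult reference barcodes and one larval sequence.
adult_refs = {
    "species_A": "ACGTTGCAAGGCTTAACCGT",
    "species_B": "ACGTTGCATGGCATAACGGT",
}
larva = "ACGTTGCAAGGCTTAACCGT"
match = associate(larva, adult_refs)
```

A suitable threshold depends on the taxon and on how well intraspecific variation has been sampled, so any fixed value should be treated with caution.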

Cryptic host specificity

DNA barcoding can also help to identify species interactions. Host specificity of parasitic insects is a major topic of interest. Smith et al. (2006) evaluated the feasibility of using DNA barcodes for studying parasitoid insects. They identified caterpillar parasitoid flies (Tachinidae) in Costa Rica from a large number of specimens using morphology-based methods. Then they repeated the exercise using DNA barcodes, and were able to recognize 17 morphologically discriminated species candidates and another 15 cryptic species candidates. An extraordinary diversity of tachinid flies and parasitic wasps in the study area was also revealed using a similar approach (Smith et al. 2007, 2008). Li et al. (2010) revealed actual host utilization of fig-associated Sycophila wasps (Hymenoptera: Eurytomidae) using both barcode and non-barcode sequences. Plant DNA barcode data make it possible to identify host plants from plant tissues on insect bodies and from digested materials. Jurado-Rivera et al. (2009) estimated host specificity of Australian leaf beetles (Coleoptera: Chrysomelidae) and their associations with plants using the DNA barcoding approach. They amplified the host plant DNA barcode (chloroplast trnL intron, which is a supplementary barcode region for plants) directly from extracts of 76 species of beetle and attempted to identify each host plant. The DNA barcodes they obtained revealed previously unknown host plants of the beetles. Matsuki et al. (2008) showed that the feeding habits of phytophagous insects can be estimated by amplifying plant DNA from their feces. The future accumulation of reference DNA barcodes for entire biota will make DNA barcoding a useful tool for studying host specificity and diversification processes in nature.

Trophic relationships

The idea of tracking trophic links in the field using molecular data has become common in the past 15 years with advances in PCR technology. Numerous studies have been conducted to reveal the trophic relationships between predator and prey or herbivore and plant by detecting prey or host DNA from the gut contents or feces of the predators or herbivores using specific primers or antibodies (Asahida et al. 1997; Kohn & Wayne 1997; Zaidi et al. 1999; Farrell et al. 2000; Gariepy et al. 2007; Matheson et al. 2007; Dunshea 2009; Weber & Lundgren 2009; King et al. 2010; reviewed in Sheppard & Harwood 2005; Fournier et al. 2008; King et al. 2008). These studies have shown that a molecular approach can reveal trophic interactions in nature. Clare et al. (2009) amplified DNA barcodes from guano of the Eastern red bat Lasiurus borealis to estimate the composition of the bat's prey. Quantitative analysis of DNA barcodes revealed that the bats prey mostly on Lepidoptera except for Arctiidae, even though many of the prey species have a tympanal organ, which was believed to be an effective defense mechanism against bat attack. This result also suggests the effectiveness of multiple putative defense mechanisms in Arctiidae, such as ultrasonic jamming.

As reported in previous non-barcode molecular studies (Agustí et al. 2003; Sheppard & Harwood 2005; Davey et al. 2007; Hosseini et al. 2008; Greenstone et al. 2010; Monzó et al. 2010), DNA in the animal gut is detectable for one or a few days. Thus the DNA barcode enables researchers to trace not only trophic links, but also changes in diet according to season or the developmental stage of an insect (e.g. Davidson & Evans 2010). Quantitative analyses such as pyrosequencing may make trophic studies using the DNA barcode more comprehensive, quicker and easier, as discussed by Deagle et al. (2009).

DNA barcoding is not a perfect tool for trophic ecology. For instance, researchers cannot estimate a target animal's complete feeding habit from animal barcodes alone when the target is omnivorous (polyphagous), consuming not only animals but also plants, fungi and detritus. In addition, PCR does not reveal whether an amplified fragment originated from predation or scavenging. Furthermore, the high sensitivity of PCR-based methods may lead to misleading results about feeding habit based on gut content: universal primers amplify DNA fragments originating not only from the predator's gut contents but also from the gut contents of its prey. These problems can be reduced by combining DNA barcoding with other methods such as stable isotope analysis using δ13C and δ15N in animal tissues, as discussed in Okuzaki et al. (2010).

Applied entomology and commercial use

A simple and rapid species identification system is necessary for commercial, agricultural, environmental, conservational and epidemiological uses. For commercial use, there are representative case studies such as the identification of tuna fish and bush meat to detect mislabeling and the illegal trade of products (Eaton et al. 2009; Lowenstein et al. 2009). DNA barcodes can also be used to detect illegal trade in endangered or protected insects, such as birdwing butterflies used for ornaments and some stag beetles kept as pets. For agriculture and conservation biology, the rapid detection of serious pests and/or invasive species could prevent their establishment. Many molecular-based methods for identifying various organisms using various tools and target molecules have been introduced. However, methods that can be applied to a range of targets are necessary because of the drastic increase in and globalization of potential targets for identification (Bonants et al. 2010). For epidemiological purposes, rapid identification methods would facilitate the monitoring of disease vectors such as mosquitoes. DNA barcoding has the potential to become a standard tool for species identification in these fields (Floyd et al. 2010).

Invasive pests are the most serious threat to biodiversity, and their rapid and accurate identification is indispensable in terms of biosecurity. For this purpose, global coverage in the DNA barcode library is of great value. Armstrong and Ball (2005) compared the performance of DNA barcoding-based identification to previous molecular-based methods and found that DNA barcoding was a better solution, useful for monitoring pests and detecting unpredictable species. They also emphasized that a DNA barcoding system can be extended simply by appending barcode data for more species. Two groups, tussock moths (Lepidoptera: Lymantriidae: Lymantria and Orgyia) and fruit flies (Diptera: Tephritidae), were initially selected for test cases. Subsequently, identification performance was also tested using three important pest groups of Lepidoptera, a species group of Lymantria, yellow peach moth (Crambidae: Conogethes) and fall webworms (Arctiidae: Hyphantria) (Armstrong 2010). Scheffer et al. (2006) surveyed outbreaks of invasive leaf miner pests (Diptera: Agromyzidae) in the Philippines and detected three species of Liriomyza. Another barcoding study reported four unrecorded alien species at an urban park in Vancouver, Canada (deWaard et al. 2009). DNA barcoding also helps to identify specimens in various developmental stages, which are difficult or impossible to identify morphologically due to a lack of reliable characteristics (Edwards et al. 2008; Zhang et al. 2008; Emery et al. 2009; Malumphy et al. 2009; Pieterse et al. 2010). Rapid identification of the larvae of pest species is very important for pest control. Doskocil et al. (2008) investigated the species composition and seasonal occurrence of turfgrass-infesting larvae of the Phyllophaga beetle (Coleoptera: Scarabaeidae) using a DNA barcode and proposed an efficient control strategy based on their results. Tokuda et al. (2009) identified gall midge larvae inducing galls on cultivated roses in Japan using DNA barcodes and revealed that a gall midge species associated with wild roses occasionally feeds on cultivated roses.

DNA barcoding is also a useful tool for searching for candidate biological control agents and evaluating their potential risks. The importance of biological control agents that prey upon or parasitize pests in nature has increased due to the expansion of international commerce, which has resulted in an increased chance of invasion by non-native pests. However, searching and screening for, and evaluating the risk of, such agents require long-term feeding experiments. Hence, using DNA barcoding to identify agents based on their gut contents would make this process more efficient (Symondson 2002; Greenstone 2006; Neumann et al. 2010).

Combining rapid identification using DNA barcodes (Besansky et al. 2003; Cywinska et al. 2006; Kumar et al. 2007) and adequate knowledge of the fundamental ecology of hematophagous vector arthropods would make it possible to prevent or minimize the epidemiologic risk of vector-borne pathogens such as malaria parasites, trypanosomes and many viruses. As mentioned above, DNA barcoding can also help elucidate the basic ecology (e.g. habit and diet of larva, verification of male and female adults) of vector insects (e.g. Garros et al. 2008; Dhananjeyan et al. 2010). DNA barcodes from blood meals in the midguts of vectors have revealed complex interactions between vectors and their vertebrate hosts (Townzen et al. 2008; Alcaide et al. 2009). For example, Alcaide et al. (2009) amplified a DNA barcode from a mixed blood meal of hematophagous arthropods (Diptera, Hemiptera and an ixodid tick species) and showed that some mosquito species occasionally feed on multiple vertebrates (e.g. feed on both mammalian and avian hosts). The accumulation of this kind of information on vector–host relationships, including results of non-barcoding molecular studies (Kent 2009), is important for predicting transmission patterns of vector-borne pathogens.

The universality of PCR primers and databases makes the barcode a more powerful tool than other molecular methods. The establishment of reference barcode libraries for each field is an urgent issue. As surveyed above, many campaigns for pests (TBI; Tephritidae), pathogens (QBOL) and disease vectors (MBI; mosquito and HealthBOL) have been launched. The QBOL aims to obtain DNA barcode data of important species (fungi, arthropods, bacteria, nematodes, viruses, phytoplasmas) and to construct a diagnostic tool for quarantine (Bonants et al. 2010). Moreover, the working group for agricultural and forestry pests and their parasitoids in iBOL is planning to barcode a total of 25 000 species, including aphids, thrips, true fruit flies, scale insects, sawflies and gall wasps (International Barcode of Life 2010a). These activities will be performed in collaboration with other projects such as the Global Invasive Species Information Network (GISIN; Simpson 2004).

CONCLUSIONS

DNA barcoding has become increasingly common since it was proposed in 2003. Currently, more than one million records are available in the BOLD system, which is the official depository of DNA barcode data. The new large-scale project, iBOL, will accelerate the creation of reference barcode libraries and will facilitate the application of this simple identification method. In the near future, DNA barcoding will become a standard identification protocol for various organisms. As reviewed above, insects are one of the initial major targets of DNA barcoding. Lepidoptera have been adopted to assess the feasibility of DNA barcoding using a large dataset. Subsequently, many campaigns for various insect groups have been launched to build comprehensive DNA barcode libraries of target taxa. The records can be used for taxonomic, phylogenetic, ecological, conservational and agricultural research. DNA barcoding projects are strongly related to other biodiversity and genetic database projects. Together with the identification support system, the DNA barcode will be a new keyword to explore biodiversity and will serve as a bridge between research in the fields of biodiversity and genomics.

Some taxonomists are concerned that DNA barcoding will compete with traditional taxonomic studies (e.g. Ebach & Holdrege 2005a,b). However, we emphasize that DNA barcoding is inseparably linked to taxonomy and is a powerful tool that complements taxonomic studies (Schindel & Miller 2005; Hajibabaei et al. 2007). The integration of various types of data, such as morphological, ecological, physiological and molecular data, including DNA barcodes, will improve species discovery and description processes (Waugh 2007; Padial et al. 2010). This integrative approach will be strengthened by various biodiversity databases.

ACKNOWLEDGMENTS

We express sincere thanks to members of laboratories for their reviews on early drafts of this manuscript. Part of the work on DNA barcoding in Japan is supported by the GBIF Japan National Node, conducted within the framework of the National BioResource Project (NBRP), initiated and supported by the Japan Science and Technology Agency (JST) and the Ministry of Education, Culture, Sports, Science and Technology (MEXT).
