Sequence diversity in the cytochrome c oxidase subunit 1 gene has been shown to be an effective tool for species identification and discovery in various groups of animals, but has not been extensively tested in mammals. We address this gap by examining the performance of DNA barcodes in the discrimination of 87 species of bats from Guyana. Eighty-one of these species showed both low intraspecific variation (mean = 0.60%), and clear sequence divergence from their congeners (mean = 7.80%), while the other six showed deeply divergent intraspecific lineages suggesting that they represent species complexes. Although further work is needed to examine patterns of sequence diversity at a broader geographical scale, the present study validates the effectiveness of barcoding for the identification of regional bat assemblages, even highly diverse tropical faunas.
DNA barcoding seeks to advance both species identification and discovery through the study of patterns of sequence divergence in a standardized gene region. A segment near the 5′-terminus of the mitochondrial cytochrome c oxidase subunit 1 (COI) gene has been selected as the barcode region for members of the animal kingdom (Hebert et al. 2003). Its effectiveness has been validated for various animal groups, and most investigated species (> 94%) possess distinct barcode arrays, with low intraspecific variation and high divergences from closely allied taxa (Ward et al. 2005; Hajibabaei et al. 2006a). Barcode sharing has been found between a few congeneric species, largely among taxa that are known to hybridize (Kerr et al., in press). Most prior barcode studies have generated hypotheses concerning overlooked (cryptic) species (Hebert et al. 2004a), many of which have subsequently been recognized as having morphological and ecological differences (Ward et al. 2005; Hajibabaei et al. 2006a).
The earliest DNA barcode studies involved investigations of sequence variation in local faunas (Hogg & Hebert 2004; Ball & Hebert 2005), but these are now leading to continental or global barcode campaigns for a few groups such as birds, fish and Lepidoptera (Marshall 2005). Although the efficacy of DNA barcoding has gained increasing validation, prior work on mammals has been restricted to two studies of primate species, most represented by a single individual (Lorenz et al. 2005; Hajibabaei et al. 2006b). Bats (order Chiroptera) are an obvious target for analysis as approximately 20% (1116 of 5416) of all mammal species belong to this order (Wilson & Reeder 2005). Moreover, although most mammal species are thought to have been described, the incidence of overlooked taxa is likely to be high within bats due to their cryptic behaviour and morphology.
In a previous survey of bats, sequence diversity in cytochrome b (cyt b) established that most species show mean intraspecific divergences that are less than 2.5% (Ditchfield 2000; Bradley & Baker 2001). Because rates of evolution in cyt b and COI are roughly similar, these earlier investigations provide a useful benchmark for our work. Here we build on earlier genetic work with bats and previous DNA barcoding studies by examining patterns of COI divergence in a highly diverse Neotropical bat fauna. Bats follow the usual pattern of increased taxon diversity in tropical regions (Willig & Selcer 1989) and thus represent a good group with which to test earlier predictions that barcoding will fail in the tropics (Moritz & Cicero 2004). Neotropical bats are a particularly good test system as this region has historically been regarded as containing the highest microchiropteran species density in the world (Willig & Selcer 1989), provoking detailed taxonomic studies (e.g. Simmons & Voss 1998; Barquez & Diaz 2001; Lim & Engstrom 2001; Lim et al. 2003). We focus on Guyana because of both recent taxonomic work (Lim & Engstrom 2001) and the availability of vouchered specimens. Our investigation seeks to assess the effectiveness of DNA barcoding for species discrimination in this fauna.
We sampled tissue from 840 vouchered (skin and skeleton or whole in alcohol) specimens held at the Royal Ontario Museum. These specimens, representing 87 species, 47 genera and 7 bat families, had previously been collected from a variety of locations within Guyana (Fig. 1). Taxonomic designations followed Wilson & Reeder (2005) with the following exceptions: we retain Artibeus planirostris as a species distinct from Artibeus jamaicensis following Lim et al. (2004), and we retain the name Artibeus bogotensis because taxonomic revisions are in progress. We also retained a Molossus sp. designation for one specimen because Lim & Engstrom (2001) concluded that it was either a species unknown from Guyana or an undescribed taxon. Details on each specimen, including sampling location, are available within the ‘Bats of Guyana’ project in the Published Projects section of the Barcode of Life Data Systems (BOLD, http://www.barcodinglife.org). Specimen information [global positioning system (GPS) coordinates of collection, institution holding the voucher, voucher number, etc.] is found by following the ‘view all records’ link and then clicking on the ‘specimen page’ for each specimen. Similarly, sequence information and trace files are found under the ‘sequence page’ linked to each specimen.
DNA isolation, amplification and sequencing
A 1-mm³ piece of frozen tissue (liver, heart or kidney) from each specimen was placed directly into 96-well plates containing lysis buffer and proteinase K. Subsequent DNA extraction employed a glass fibre protocol (Ivanova et al. 2006). The 658-bp target region of COI was amplified using two primer cocktails and visualized in a 96-well E-Gel (Invitrogen). Cocktail 1 was C_VF1di/C_VR1di (Ivanova et al. 2006), while cocktail 2 (C_VF1LFt1/C_VR1LRt1) was an improved version including M13-tailed versions of the primers and an additional primer pair, LepF1_t1 and LepRI_t1, in the following ratio: 10 pmol/µL, VF1_t1: VF1d_t1: LepF1_t1: VF1i_t1 (1:1:1:3) or VR1_t1: VR1d_t1: LepRI_t1: VR1i_t1 (1:1:1:3) (Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN, pers comm). All primer sequences are available from the Canadian Center for DNA Barcoding by visiting the website at http://www.dnabarcoding.ca/clareetal2006.php. The polymerase chain reaction (PCR) mix included 6.25 µL of 10% trehalose, 1.25 µL 10× PCR buffer, 0.625 µL (2.5 mM) MgCl2, 0.125 µL (10 µM) forward and reverse primer cocktail, 0.625 µL (10 mM) dNTPs, 0.625 µL Taq polymerase and 4 µL H2O + template DNA (Hajibabaei et al. 2005). PCRs were run under the following thermal cycle conditions: 1 min at 94 °C, followed by 5 cycles of 30 s at 94 °C, 40 s at 50 °C, and 1 min at 72 °C, followed by 35 cycles of 30 s at 94 °C, 40 s at 55 °C, and 1 min at 72 °C, and finally 10 min at 72 °C. PCR products were sequenced on an ABI 3730 (Hajibabaei et al. 2005).
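The thermal cycling conditions described above can be written out as a small data structure, which makes the total cycle count and programmed hold time easy to check. The following Python sketch is only an illustration: the variable and function names are hypothetical, and the total covers hold times only, ignoring instrument ramp times.

```python
# Hold times transcribed from the thermal cycling conditions in the text.
# The layout and names below are illustrative assumptions, not instrument software.
INITIAL_DENATURATION_S = 60   # 1 min at 94 °C
FINAL_EXTENSION_S = 600       # 10 min at 72 °C

# Each stage: (number of cycles, [(step label, temperature in °C, seconds), ...])
STAGES = [
    (5,  [("denature", 94, 30), ("anneal", 50, 40), ("extend", 72, 60)]),
    (35, [("denature", 94, 30), ("anneal", 55, 40), ("extend", 72, 60)]),
]


def total_cycles(stages):
    """Return the total number of amplification cycles across all stages."""
    return sum(cycles for cycles, _ in stages)


def total_hold_seconds(initial_s, stages, final_s):
    """Return the summed hold time in seconds (ramp times are not included)."""
    cycling = sum(
        cycles * sum(seconds for _, _, seconds in steps)
        for cycles, steps in stages
    )
    return initial_s + cycling + final_s


total_s = total_hold_seconds(INITIAL_DENATURATION_S, STAGES, FINAL_EXTENSION_S)
minutes, seconds = divmod(total_s, 60)
print(f"{total_cycles(STAGES)} cycles, {minutes} min {seconds} s of programmed hold time")
```

Under these assumptions the programme comprises 40 cycles; actual run time on a thermal cycler would be longer because of ramping between temperatures.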
Samples producing single clear amplicons from cocktail 1 were sequenced with VF1d and VR1d (Ivanova et al. 2006), while those from cocktail 2 were sequenced with M13F and M13R (Messing 1983) using BigDye version 3.1 on an ABI PRISM 3730 capillary sequencer (Applied Biosystems).
Sequences were aligned using SeqScape version 2.1.1 (Applied Biosystems) and edited manually. Sequences and original trace files are available in the ‘Bats of Guyana’ project on BOLD and on GenBank (accession nos EF079971–EF080810). Sequence divergences were calculated using the Kimura two-parameter (K2P) model of base substitution (Kimura 1980). A neighbour-joining (NJ) tree of K2P sequence distances showing intraspecific variation was created using BOLD, and an NJ tree of interspecific divergence with bootstrap analysis (500 replications) was constructed using MEGA3 (Kumar et al. 2004). K2P sequence divergences for all levels in the taxonomic hierarchy were determined using the ‘Distance Summary’ tool on BOLD. Species that split into two or more distinct groups with high bootstrap support and sequence divergences greater than 2.5% between them were hypothesized to represent provisional species following the observations of Ditchfield (2000) and Bradley & Baker (2001). Such groups were differentiated by adding a suffix (PS1, PS2) to the current species name. Regression analysis was used to assess the relationships between mean and maximum intraspecific sequence divergence and sample size, and between sequence divergence and geographical distance for individuals of each species.
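For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively (Kimura 1980). The Python sketch below is a minimal illustration of that formula and of the 2.5% screening threshold mentioned above. It is not the BOLD or MEGA3 implementation: the function names are hypothetical, and the choice to skip sites with gaps or ambiguous bases is an assumption made for this example.

```python
import math

PURINES = {"A", "G"}
VALID_BASES = {"A", "C", "G", "T"}
PROVISIONAL_SPECIES_THRESHOLD = 0.025  # 2.5% divergence, as used in the text


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an assumption of this sketch, not necessarily what BOLD or MEGA3 do).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        compared += 1
        if a == b:
            continue
        # A change within purines (A<->G) or within pyrimidines (C<->T)
        # is a transition; any purine<->pyrimidine change is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def exceeds_threshold(distance, threshold=PROVISIONAL_SPECIES_THRESHOLD):
    """True if a divergence is greater than the screening threshold."""
    return distance > threshold


# Toy 10-bp example (invented sequences, one A<->G transition).
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(f"K2P distance: {d:.4f}, above 2.5%: {exceeds_threshold(d)}")
```

In the study itself the threshold was applied to mean divergences between groups that also had high bootstrap support, so a single pairwise value such as this one would not be sufficient on its own.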
A COI amplicon was recovered from all 840 individuals and more than 97% of the sequence reads were greater than 600 bp in length (most 657 or 658 bp). The mean K2P sequence distance within species was 0.60%, while the mean divergence between congeners was 13× higher at 7.80% (Fig. 2; Table S1, Supplementary material). Regression analysis indicated that mean divergence values were not significantly correlated with sample size (adjusted R² = 0.013, P = 0.161) and that maximum divergence was only weakly related to sample size (adjusted R² = 0.105, P = 0.003) (Fig. 3). Geographic distance also explained very little of the sequence divergence among conspecific individuals (adjusted R² = 0.016, P < 0.001). Six species showed deep intraspecific variation, forming two or more intraspecific barcode groups with greater than 2.5% mean divergence between them (Table 1, Fig. 4).
Table 1. Six bat species from Guyana with large sequence divergence (K2P) between lineages at COI. Lineages without bootstrap support represent single specimens
An NJ tree of sequence divergences (K2P) at the COI region indicated that most genera formed cohesive units (Fig. 4; see Figure S1, Supplementary material, for a complete NJ tree showing intraspecific variation). However, levels of sequence divergence between congeneric taxa varied substantially, appearing to approximate a bimodal distribution. At the extremes, two genera showed very low divergences among species: Molossus ater and Molossus molossus were 2.18% divergent, while Carollia brevicauda and Carollia perspicillata were just 1.2% divergent. By comparison, Peropteryx leucoptera and Peropteryx kappleri showed almost 20% sequence divergence.
Several previous studies on vertebrates have raised concerns regarding the acquisition and ease of interpretation of DNA barcode data. For example, Vences et al. (2005) encountered difficulties in amplifying the barcode region from amphibian lineages, but they did not use a primer set that was designed for this group. Their difficulty contrasts with the results on other vertebrate groups, including birds and fishes, where amplification of the barcode region has proven straightforward (Hebert et al. 2004b; Ward et al. 2005). While we encountered early difficulties in barcode recovery for bats, development of a primer cocktail enabled amplification of all species included in our study. Because the component primers in this cocktail were tailed with M13, sequencing reactions were straightforward. The application and extension of this formulation strategy not only promises a solution to barcode amplification for groups such as the amphibians, but also the generation of cocktails that amplify the barcode region for broad assemblages of life.
Aside from difficulties in PCR amplification of the barcode region, concerns have been raised about interpretational problems arising from the inadvertent amplification of COI nuclear pseudogenes of mitochondrial origin (NUMTs, Bensasson et al. 2001). In practice, the ∼650-bp length of the barcode amplicon provides substantial protection against this, as most NUMTs are < 200 bp in length (Richly & Leister 2004). In addition, because of the higher copy number of mitochondrial COI sequences, prior barcode studies have shown that NUMTs are detected in a very small percentage of species (Kerr et al., in press). Moreover, when detected, NUMTs regularly show indels or diagnostic mutations (e.g. stop codons) that reveal their presence. Rigorous inspection of trace files, especially those with low phred quality scores (Ewing & Green 2006) linked to heterozygous peaks or uncalled bases, can also be used to filter possible NUMTs. Detection allows a shift in analytical strategy to suppress their amplification. In the present study, we detected no signs of pseudogenes, a fact that may be correlated with the unusually small genome sizes of bats (Van Den Bussche et al. 1995; Gregory 2002).
Prior barcode studies have established that more than 95% of species possess diagnostic sequence arrays for the barcode region. For example, Ward et al. (2005) found that all 207 Australian fish species that they examined had a diagnostic barcode sequence array. Few species assemblages have been surveyed on a continental scale, but work on North American birds revealed that nearly 95% of recognized species have distinct barcode arrays (Kerr et al., in press). The few cases where barcodes failed to separate bird species involved either closely allied allopatric taxa whose status as distinct species is uncertain or sister taxa that hybridize (Kerr et al., in press). Although no prior investigation has assessed the effectiveness of DNA barcoding in tropical vertebrates, a study on more than 500 species of Costa Rican Lepidoptera established that 98% of these species had diagnostic barcode arrays (Hajibabaei et al. 2006a).
Our investigation reinforces the conclusions of earlier barcode studies on animals. All of the bat species that we examined possessed a diagnostic array of COI sequences, enabling their identification. In 81 of 87 species, sequences formed a single cohesive cluster that was clearly divergent from those of congeneric taxa, as evidenced by the 13-fold higher mean sequence divergence among congeners than among members of a species. The other six species showed a different pattern of variation: sequences fell into two or three clusters, each showing substantial sequence divergence from its sister clusters. Similar cases of deep sequence variation within ‘species’ have been regularly encountered in prior DNA barcode surveys, and they provide prima facie evidence that the taxon under investigation represents a species complex. The incidence of overlooked taxa recognized in this fashion ranges from lows of 3% in birds (Hebert et al. 2004b) and 5% in Lepidoptera (Hajibabaei et al. 2006a), groups that have seen much taxonomic work, to more than 200% in groups that have received little attention (Blaxter et al. 2005). With an incidence of 6.9%, the bats of Guyana show a higher level of overlooked taxa than most other vertebrate groups, but not dramatically so. At least one of the six taxa we identified has been previously recognized as a probable species complex in other geographical areas (Lewis-Oritt et al. 2001; Wilson & Reeder 2005).
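The two summary figures quoted in this paragraph follow directly from values reported in the Results. The short Python sketch below reproduces the arithmetic; the variable names are illustrative only, and the inputs are the rounded values given in the text.

```python
# Values as reported in the text (percent K2P divergence and species counts).
mean_within_species = 0.60    # mean intraspecific divergence (%)
mean_between_congeners = 7.80  # mean divergence between congeners (%)
species_total = 87
species_with_deep_lineages = 6

# Fold difference between congeneric and intraspecific divergence.
fold_difference = mean_between_congeners / mean_within_species

# Incidence of putative species complexes among the sampled species (%).
incidence_percent = 100 * species_with_deep_lineages / species_total

# Species forming a single cohesive cluster.
species_single_cluster = species_total - species_with_deep_lineages

print(f"fold difference: {fold_difference:.1f}")
print(f"incidence of deep lineages: {incidence_percent:.1f}%")
print(f"single-cluster species: {species_single_cluster}")
```

Because the inputs are rounded means, the fold difference is an approximation of the ratio that would be obtained from the unrounded data.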
We did not observe any striking anomalies in the patterning of barcode variation within bats although levels of sequence variation within species were higher than those in most other groups. For example, treating the provisional species as separate taxa, intraspecific variation in Guyanese bats averaged 0.60% vs. 0.27% in North American birds (Hebert et al. 2004b), 0.39% in marine fishes (Ward et al. 2005) and 0.46% in Lepidoptera (Hajibabaei et al. 2006a). The higher level of variation in bats reflected a general shift in the pattern of intraspecific diversity. Within other groups, most species showed a single dominant sequence and rare variants around it. By contrast, in some Guyanese bat species, every individual that we examined had a different sequence. This elevated variation may reflect some unique aspect of mitochondrial evolution in bats.
In summary, we have established the effectiveness of DNA barcoding for identification of the bat fauna of Guyana, despite its high diversity. By extension, our results suggest that DNA barcode libraries will create highly effective identification systems for any regional bat fauna. We further conclude that the assembly of these local libraries will generate a substantial number of hypotheses regarding overlooked species. In this study, geographical distance between collection sites explained very little of the sequence divergence among conspecific specimens. This result is not unexpected, given that these localities were never more than 700 km apart, a distance representing just a small segment of most species' ranges. Hence, the present investigation provides no guide to the further diversity that will be revealed when barcode data are gathered on bats from different geographical regions. However, a substantial number of additional taxa will surely be revealed.
Although bats show a high incidence of cryptic species and substantial difficulties in species identification, similar complexities occur in other mammal orders such as rodents and insectivores. Viewed from this context, the assembly of a DNA barcode library for the global mammal fauna will not only aid recognition of currently overlooked species, but will also lead to the development of an automated identification system that will be particularly valuable for taxa in these groups. The latter tool will further be a useful practical resource for varied ecological, biodiversity and evolutionary investigations. With just over 5000 species, the global mammal fauna represents a smaller challenge than the campaigns that seek to barcode all birds and fishes by 2011. As a consequence, despite a delayed start, it seems likely that a comprehensive barcode inventory for all mammals can reach closure by the same timeline.
This work was supported by grants from the Gordon and Betty Moore Foundation, from Genome Canada through the Ontario Genomics Institute, and from the Natural Sciences and Engineering Research Council of Canada to P.D.N.H. and by an NSERC postgraduate scholarship to E.L.C. We thank N.V. Ivanova for aid with sequence analysis, S. Ratnasingham and G. Downs for bioinformatics support and M. Hajibabaei, A. Borisenko, N. Ivanova, R. Hanner, M.B. Fenton and an anonymous reviewer for helpful comments on this manuscript.