SEARCH

SEARCH BY CITATION

Keywords:

  • DNA barcoding;
  • sequence analysis;
  • software

Abstract

Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for the analysis of large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern and has been criticized for its local optimization. However, current more accurate software requires sequence alignment or complex calculations, which are time-consuming when dealing with large data sets during data preprocessing or during the search stage. Therefore, it is imperative to develop a practical program for both accurate and scalable species identification for DNA barcoding. In this context, we present VIP Barcoding: a user-friendly software in graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method is utilized to reduce searching space by screening a reference database. The alignment-based K2P distance nearest-neighbour method is then employed to analyse the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than Blastn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that this platform is able to deal with both large-scale and multilocus barcoding data with accuracy and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/.