A fuzzy-set-theory-based approach to analyse species membership in DNA barcoding


Ai-Bing Zhang, Fax: +86 1068901860; E-mail: zhangab2008@mail.cnu.edu.cn


Reliable assignment of an unknown query sequence to its correct species remains a methodological problem for the growing field of DNA barcoding. While great advances have been achieved recently, species identification from barcodes can still be unreliable if the relevant biodiversity has been insufficiently sampled. We here propose a new notion of species membership for DNA barcoding—fuzzy membership, based on fuzzy set theory—and illustrate its successful application to four real data sets (bats, fishes, butterflies and flies) with more than 5000 random simulations. Two of the data sets comprise especially dense species/population-level samples. In comparison with current DNA barcoding methods, the newly proposed minimum distance (MD) plus fuzzy set approach, and another computationally simple method, ‘best close match’, outperform two computationally sophisticated Bayesian and BootstrapNJ methods. The new method proposed here has great power in reducing false-positive species identification compared with other methods when conspecifics of the query are absent from the reference database.