
Building the Coleoptera tree-of-life for >8000 species: composition of public DNA data and fit with Linnaean classification


Correspondence: Ladislav Bocak, Department of Zoology UP, 17. listopadu 50, 771 46 Olomouc, Czech Republic. E-mail:


Species representation in public databases is growing rapidly and permits increasingly detailed phylogenetic inferences. We present a supermatrix based on all gene sequences of Coleoptera available in GenBank for two nuclear (18S and 28S rRNA) and two mitochondrial (rrnL and cox1) genes. After filtering for unique species names and the addition of ~2000 unpublished sequences for cox1 and 18S rRNA, the resulting data matrix included 8441 species-level terminals and 6600 aligned nucleotide positions. The concatenated matrix represents the equivalent of 2.17% of the 390 000 described species of Coleoptera and includes 152 beetle families. The remaining 29 families constitute small lineages with ~250 known species in total. Taxonomic coverage remains low for several major lineages, including Buprestidae (0.16% of described species), Staphylinidae (1.03%), Tenebrionidae (0.90%) and Cerambycidae (0.58%). The current taxon sampling was strongly biased towards the Northern Hemisphere. Phylogenetic trees obtained from the supermatrix agreed very well with the Linnaean classification, particularly at the family level; agreement was lower at the subfamily level and lowest at the genus level. The topology supports the basal split of Derodontidae and Scirtoidea from the remaining Polyphaga, and the broad paraphyly of Cucujoidea. The data extraction pipeline and detailed tree provide a framework for placement of any new sequences, including environmental samples, into a DNA-based classification system of Coleoptera.
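The abstract does not spell out how the supermatrix was assembled, so the following is only a minimal sketch of the general idea of concatenating per-gene alignments into one row per species. It assumes each gene has already been aligned and filtered to one sequence per species name; the function name `build_supermatrix`, the toy sequences, and the use of `?` for missing genes are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List


def build_supermatrix(
    gene_alignments: Dict[str, Dict[str, str]],
    gene_order: List[str],
    missing: str = "?",
) -> Dict[str, str]:
    """Concatenate per-gene alignments into one row per species.

    gene_alignments maps gene name -> {species name -> aligned sequence}.
    Species lacking a gene are padded with the `missing` character
    (hypothetical convention chosen for this sketch).
    """
    # Every sequence within one gene alignment must have the same length.
    lengths: Dict[str, int] = {}
    for gene in gene_order:
        seq_lengths = {len(seq) for seq in gene_alignments[gene].values()}
        if len(seq_lengths) != 1:
            raise ValueError(f"alignment for {gene} is empty or ragged")
        lengths[gene] = seq_lengths.pop()

    # The terminals are the union of species names across all genes.
    species = sorted({sp for gene in gene_order for sp in gene_alignments[gene]})

    # Join the gene blocks in a fixed order, padding where a gene is absent.
    return {
        sp: "".join(
            gene_alignments[gene].get(sp, missing * lengths[gene])
            for gene in gene_order
        )
        for sp in species
    }


# Toy data (invented for illustration only).
toy = {
    "18S": {"Species a": "ACGT", "Species b": "AC-T"},
    "cox1": {"Species b": "GGA", "Species c": "GCA"},
}
matrix = build_supermatrix(toy, ["18S", "cox1"])
```

In this toy case every row has the same total length (the sum of the gene alignment lengths), which is the property that lets a sparse matrix with many missing genes still be analysed as a single alignment.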
