Excluding spatial sampling bias does not eliminate oversplitting in DNA‐based species delimitation analyses

Abstract DNA barcoding and DNA‐based species delimitation are major tools in DNA taxonomy. Sampling has been a central debate in this context, because the geographical composition of samples affects the accuracy and performance of DNA barcoding. Performance of complex DNA‐based species delimitation is to be tested under simpler conditions in absence of geographic sampling bias. Here, we present an empirical dataset sampled from a single locality in a Southeast‐Asian biodiversity hotspot (Laos: Phou Pan mountain). We investigate the performance of various species delimitation approaches on a megadiverse assemblage of herbivorous chafer beetles (Coleoptera: Scarabaeidae) to infer whether species delimitation suffers in the same way from exaggerate infraspecific variation despite the lack of geographic genetic variation that led to inconsistencies between entities from DNA‐based and morphology‐based species inference in previous studies. For this purpose, a 658 bp fragment of the mitochondrial cytochrome c oxidase subunit 1 (cox1) was analyzed for a total of 186 individuals of 56 morphospecies. Tree‐based and distance‐based species delimitation methods were used. All approaches showed a rather limited match ratio (max. 77%) with morphospecies. Poisson tree process (PTP) and statistical parsimony network analysis (TCS) prevailingly over‐splitted morphospecies, while 3% clustering and Automatic Barcode Gap Discovery (ABGD) also lumped several species into one entity. ABGD revealed the highest congruence between molecular operational taxonomic units (MOTUs) and morphospecies. Disagreements between morphospecies and MOTUs have to be explained by historically acquired geographic genetic differentiation, incomplete lineage sorting, and hybridization. The study once again highlights how important morphology still is in order to correctly interpret the results of molecular species delimitation.


| INTRODUC TI ON
Since DNA barcoding was formally proposed (Hebert et al., , 2004Ratnasingham & Hebert, 2007), cox1 sequences have been rapidly accumulated from all around the world (Porter & Hajibabaei, 2018). Early studies mostly had a narrow systematic focus and targeted local or regional species assemblages.
In order to infer in more detail the empirical behavior of species delimitation approaches that are currently in use in combination with particular genetic markers, it would be desirable to test commonly used methods on a dataset without geographic bias that still provides a sufficient number of closely related (i.e., congeneric) taxa that provide a sufficiently wide hypothesis space to test. Since low dispersal capacity might be a highly relevant factor for the outcome of such an investigation, we specifically investigate a soil-dwelling group of beetles (Sericini chafers) which have short emergence periods, little-specific host preferences, and a high degree of endemism (e.g., Ahrens, 2004Ahrens, , 2005Dalstein et al., 2019;.
We focus on cox1, since it continues and will continue to be a widely used marker for taxonomy in barcoding and metabarcoding studies.
Identification success may decrease with increasing spatial scale of sampling, up to a drop of 50% at continental scales (Bergsten et al., 2012). Sampling on a continental scale thus considerably increases the complexity of barcoding studies. Most of the "northern" latitude studies, however, are supposed to contain species with only low infraspecific haplotype diversity (due to extinctions and recolonization events during and after the Pleistocene; for example, Ahrens et al., 2013;Hewitt, 1996Hewitt, , 1999Schmitt, 2007), and often assemblages only contain a small number of closely related species. Thus, these data do not represent suitable test cases of species delimitation performance when the component of actual geographic genetic variation is excluded. On the other hand, most of studies, which focused on tropical groups, collected specimens from more than one site (i.e., under geographical bias) (e.g., Elias et al., 2007;Janzen & Hallwachs, 2011;Janzen et al., 2009;Thormann et al., 2016). Often, mismatch of MOTUs with morphospecies was seen as evidence for cryptic diversity (e.g., Janzen & Hallwachs, 2011;Janzen et al., 2009).
Here we present a dataset that was sampled from one local assemblage in a Southeast-Asian biodiversity hotspot (Laos: Phou Pan mountain). We investigate the performance of various species delimitation approaches on a megadiverse assemblage of herbivore Sericini chafer beetles (Coleoptera: Scarabaeidae). Our objective is to infer whether species delimitation suffers from exaggerate infraspecific variation in the same way that led to inconsistencies between entities from DNA-based and morphology-based species inference in previous studies, despite the lack of geographic genetic variation. We are interested in the degree of deep coalescence (Maddison, 1997) in this local species assemblage and in how species delimitation approaches handle these data. Excluding geographic genetic variation, we would expect less problems due to deep coalescences and thus higher rates of taxonomic congruence between morphospecies and MOTUs. Furthermore, we employ clustering algorithms similar to those used in metabarcoding approaches, to explore the reliability of this critical step in current metabarcoding analysis pipelines (e.g., Coissac et al., 2012;Deiner et al., 2017;Macher et al., 2018;Ruppert et al., 2019).

| Study group, sampling, and identification
The study group is the megadiverse tribe Sericini that contains worldwide nearly 4,000 described species in about 200 genera (Ahrens, 2006). They are one of the oldest lineages of phytophagous Scarabaeidae and diversified with the rise of the angiosperms 108 Ma (Ahrens et al., 2014;. Sericini are nearly worldwide distributed, except in Australia, most oceanic islands, archipelagos, and circumpolar regions (Ahrens, 2006). The polyphagous herbivore adults are fully winged while larvae feed on roots and underground stems of living plants (Ritcher, 1966). Some species are considered as crop pests (Nair, 1986). Their highly similar external morphology makes the species difficult to distinguish, but highly complex male genitalia are well-differentiated between species and show only little intraspecific variation (Ahrens & Lago, 2008).
Sampling was conducted during 4 weeks in April 2014 by Carolus Holzschuh and local collectors in the Phou Pan mountain area (Laos, Hua Phan province) (Figure 1) (ca. 20°12′N, 104°01′E), at an elevation between 1,300 and 2,000 m. Specimens were collected using light traps, by hand, or netting during daytime. The Phou Pan mountain is situated in the Indo-Burmese biodiversity hotspot area (Myers et al., 2000) which is characterized by extremely high endemism.
The habitat with its dense rainforests (Müller et al., 2013) offers a large variety of plant species for herbivore insects to feed on. For this study, we used only males (1,086 specimens), since they were assignable to distinct morphospecies, while females are often not distinguishable among closely related syntopically occurring species.
Specimens were presorted to morphospecies using the complex shape of their copulation organ, that is, aedeagus, which has been proven to be the best suited trait system to robustly infer species entities for this group (Dalstein et al., 2019;Eberle et al., 2016). For this purpose, male genitalia of all specimens were dissected. Habitus and genitalia of each species were photographed with a stereomicroscope (ZEISS Stereo Discovery.V20) connected to a ZEISS Axiocam.
Presumably undescribed species that were not yet referable to an available species name were numbered consecutively (sp1, sp2, etc.).

| DNA sequencing
We sequenced the cox1 gene (5′-end) of multiple specimens (3)(4)(5) per morphospecies (in total 190). Laboratory work followed the standard protocols of the German Barcode of Life project (Geiger et al., 2016). DNA was extracted from a mesothoracic leg and attached muscles using the Qiagen DNeasy Blood and Tissue Kit or the Qiagen BioSprint96 magnetic bead extractor.
The PCR reaction was carried out in total reaction mixes of 20 μl, including 2 μl of undiluted DNA template, 0.8 μl of each primer (10 pmol/μl), and standard amounts of the reagents provided with the "Multiplex PCR" kit from Qiagen using prim- Raw DNA sequences were assembled (forward and reverse sequence) and edited in Geneious R7 (version 7.1.3, Biomatters Ltd.) to correct base-calling errors and to assign ambiguities (when forward and reverse sequence were not congruent for certain nucleotides).

| Phylogenetic analysis and species delimitation
Putative morphospecies were compared with results obtained from the DNA-based species delimitation methods. We applied Poisson tree process (PTP) (Zhang et al., 2013), statistical parsimony network analysis (TCS) (Templeton et al., 1992), Automatic Barcode Gap Discovery (ABGD) , distance-based clustering, and Barcode of Life database (BOLD)-Barcode Index Numbers (BINs). These methods were applied on all sequenced beetles to result in clusters that are considered molecular taxonomic units (MOTUs) (Floyd et al., 2002), that is, DNA-based species-assignments by the respective method.
A phylogenetic tree was calculated with maximum likelihood from the final multiple alignment of all DNA sequences using the IQ-TREE web server (IQ-TREE version 1.6.12; http://iqtree.cibiv.univie.ac.at/) (Nguyen et al., 2015;Trifinopoulos et al., 2016); the best substitution model (GTR+F+I+G4) was chosen with ModelFinder (Kalyaanamoorthy et al., 2017) according to Bayesian information F I G U R E 1 Collecting area in Laos (20°12′N, 104°01′E) (marked with a black dot) criterion (BIC). Branch support was calculated by generating 1,000 samples for ultrafast bootstrapping (Hoang et al., 2018 (Huson & Bryant, 2006) to visualize incompatible and ambiguous signals in the cox1 dataset. In these networks, the parallel edges, rather than the single branches, illustrate splits concluded from the data.
We used both versions of the Poisson tree process model (PTP) on the PTP web server (https://speci es.h-its.org/; accessed on 5 August 2020): bPTP, which adds Bayesian support (pp) values to branches that delimit species in the input tree, and the refined multirate mPTP. PTP uses the shift in the number of substitutions at internal nodes to identify branching rate transition points (Zhang et al., 2013) which indicate speciation events. We used default settings for the bPTP analysis (100,000 MCMC generations, thinning: 100, burn-in: 0.1, seed 123).
Statistical network analysis as performed with TCS v. 1.21 separates the sequence data into clusters of closely related haplotypes connected by changes that are nonhomoplastic with a 95% probability (Templeton et al., 1992); if applied to mtDNA the extent of the networks has been found to be largely congruent with morphospecies (Ahrens et al., 2007;Meier et al., 2006). Distance-based clustering was done with the tclust-function in the R-package spider (v. 1.5.0; Brown et al., 2012). A threshold of 3% was applied to the pairwise distance matrix of all specimens that was corrected with the Kimura model (K80). The logic of this approach underlies most metabarcoding protocols (Liu et al., 2020;Macher et al., 2018;Piper et al., 2019), relying on the presence of a barcoding gap (Elbrecht et al., 2017), which was chosen as a gap at 3% pairwise distance by the majority of studies (however, see Beentjes et al., 2019 for an 2% example). Finally, we compared outcome from species delimitations to Barcode Index Number (BIN) assignments (Ratnasingham & Hebert, 2013) in the BOLD data base (Project-SCOIB: Sericini COI Barcoding).
To check the performance and accuracy of the DNA-based delimitation methods compared to the a priori morphospecies hypotheses based on the genital morphology, the match ratio  was calculated: Match ratio = 2 * N match /(N MOTU + N morph ).
N match is the number of species with exact matches, when the morphospecies and DNA-based species delimitation results to include the same specimens. N MOTU is the number of classified groups by the different delimitation methods, and finally, N morph is the number of morphospecies. All resulting MOTUs were mapped onto the phylogenetic tree beside terminal's labels (Figure 2). Due to shared haplotypes in different morphospecies, lowest inter-and infraspecific distances were both zero (Table 1), while maximum infraspecific distances were around 7%. The infraspecific mean distance was 1.5%, and the median even lower (0.8%). Nine morphospecies (i.e., 16% of the taxa) had infraspecific distances larger than 3%.   Ma. peregoi,N. sp 26,Mi. panzona,Mi. varians,G. marginalis,G. carolusi). In all others, infraspecific branches were rather shallow. Taxa sampled with more than three specimens and that were represented by a single haplotype did not occur. For all those cases with deep coalescence, at least one of the DNA-based species delimitations split the morphospecies, which in turn decreased the match ratio.

| D ISCUSS I ON
In the present paper, we investigated DNA-based species delimitation using the mitochondrial cox1 gene in a megadiverse assemblage of chafer beetles (Sericini) with particular focus on the performance of commonly used species delineation methods. The setup of examining barcodes of beetles from a single locality was chosen to investigate molecular species delimitation performance using data without geographic bias. While we know that match ratios strongly vary in tropical taxa (e.g., from 0.14 to 1.00; , we theoretically expected that match ratios would go against one due to the exclusion of geography-induced variance. Instead, for different standard species delimitation methods, we could also not report match ratios higher than 0.77. Interestingly, the 3% threshold clustering that is commonly used in metabarcoding approaches did not perform worse than more sophisticated approaches (like PTP or TCS); however, an accuracy of only less than 80% is not really what one could call a reliable taxonomy assessment.
DNA-based species delimitation approaches may oversplit morphological entities , while at the same time the opposite may be also the case (Dalstein et al., 2019), even in the same taxon (as demonstrated here for the tribe Sericini). This particularly proved to be true in presence of incomplete lineage sorting and hybridization and if geographic bias is not excluded (match ratio <0.5; Dalstein et al., 2019). Extreme oversplitting has been reported for both mtDNA and nDNA, when sex-biased dispersal occurs, which also limits general dispersal .
Oversplitting in our data is caused by the relatively deep coalescence in 21% of the species, which widely corresponds with the missing match to the morphospecies ( Figure 4). Also, the lack of significant interspecific divergence (i.e., a barcoding gap) seems to coincide with failure of accurate species delimitation. The impact is high with only 31 out of the 56 morphospecies matching perfectly the boundaries of inferred MOTUs (Figure 2). The nature of maternal inheritance of mtDNA and its very low recombination rate is probably the major reason for these patterns of deep coalescence. Historically acquired genetic differentiation, for example, in previously isolated populations is maintained in secondarily mixing populations (e.g., Ahrens et al., 2013). The more often such isolated populations occur in time and space, for example, due to climatic fluctuation during the Pleistocene in geographically highly structured areas such as Southeast Asia, the more often we encounter such "paleogeographically induced" infraspecific variation which leads to the same result as current geographic variation. This effect consequently impedes species delimitation methods in the same way, particularly in a single marker system (e.g., cox1).
In addition to such historical effects, our data also appear to . In those instances, lumping of morphospecies in DNA-based species delimitation seems to be more likely; however, also oversplitting was observed (e.g., Microserica).
Despite strong divergence in male genital morphology, hybridization between closely related species of Sericini has been reported (e.g., Dalstein et al., 2019). The rather divergent structure of the aedeagus between species might indicate a case of mechanical isolation (lock-and-key hypothesis) that prevents mating between different species (Eberhard, 1985 Padial et al., 2010;Schlick-Steiner et al., 2010;Tautz et al., 2002).
We showed that the use of different clustering-and tree-based delimitation methods (Carstens et al., 2013) with the same single maker reproduces the same erroneous signal in slightly different variations.
It is thus critical to corroborate results from DNA-based species delimitation with data from other sources (e.g., genital or larval morphology, feeding traits, behavior, etc.; e.g., Janzen et al., 2009) to allow for independent testing of species boundaries.
Sericini chafers proved to be a valuable model system, because of robust morphospecies assignments that were facilitated by highly dissimilar and morphologically complex male genitalia that serve as accurate species diagnostic trait (Dalstein et al., 2019;Eberle et al., 2016).
Overall, the initial hypothesis of impeccable DNA-based species boundaries in syntopical species assemblages clearly had to be rejected. This was rather unexpected, especially since there was no additional evidence from other sources that these oversplittings could relate to cryptic diversity (Janzen & Hallwachs, 2011;Janzen et al., 2009Janzen et al., , 2017.
Given the highly simplified parameters of DNA-based species delimitation in this one-site species assemblage, it becomes clear how complex species delimitation with DNA-based methods is.
Performance with mean error rates of more than 30% is under the expectations for proper use for applied sciences and conservation management. Methods that are more sophisticated did not perform better than over-simplified threshold clustering methods as used, for example, in metabarcoding. Once more, we highlight the necessity of morphology for the verification of de novo species delimitation results and the constant need of integrative taxonomic approaches.

ACK N OWLED G M ENTS
We are grateful to Benedict Wipfler for his technical assistance.

CO N FLI C T O F I NTE R E S T
We have no conflicts of interest to declare.