The first comprehensive DNA barcode reference library of Chinese Tanytarsus (Diptera: Chironomidae) for environmental DNA metabarcoding

Reliable DNA barcode reference libraries are critical for biodiversity monitoring based on environmental DNA (eDNA). DNA barcoding has proven successful for species delimitation and for associating life stages in many animal groups, including non‐biting midges (Chironomidae). We use the genus Tanytarsus van der Wulp, 1874, the second largest genus of the family Chironomidae, to test whether DNA barcodes can be used to identify unknown and cryptic species. The resulting comprehensive DNA barcode library is intended to facilitate further eDNA studies for biodiversity monitoring and conservation in freshwater ecosystems.


| INTRODUCTION
Ongoing climate change and human disturbance are causing a continuous decline in global biodiversity (Butchart et al., 2010), which greatly impacts the sustainability of our planet (Díaz et al., 2006). Efforts to conserve biodiversity at regional, national and international scales (Butchart et al., 2010; UNEP, 2011) depend on precise knowledge of taxonomy and species distributions (Kelly et al., 2014; Thomsen & Willerslev, 2015). Consequently, conservation practice is still limited by the poor understanding of biodiversity in numerous undescribed taxa from vast geographical regions (Vié et al., 2009). Biological monitoring in aquatic ecosystems is particularly dependent on taxonomic expertise (Kelly et al., 2014).
To overcome these challenges, DNA-based approaches provide effective tools for identifying aquatic organisms. DNA barcoding, which uses standardized genetic markers for species identification (Hebert, Cywinska, & Ball, 2003; Hebert, Ratnasingham, & deWaard, 2003), has become one of the most useful methods for estimating species diversity and revealing cryptic species (Hajibabaei et al., 2016; Hebert et al., 2004; Kanturski et al., 2018; Young et al., 2019). Owing to recent advances in sequencing technologies and bioinformatics, DNA metabarcoding (Taberlet et al., 2012; Yu et al., 2012) of organismal and environmental DNA (eDNA) (Macher et al., 2018; Majaneva et al., 2018) is increasingly used for biodiversity assessment and biomonitoring of aquatic biota (Elbrecht & Steinke, 2019; Elbrecht et al., 2017; Morinière et al., 2017; Sun et al., 2019; Valentini et al., 2016). Because it builds on sensitive, cost-efficient and ever-advancing DNA sequencing technologies, eDNA metabarcoding is becoming critical for biodiversity monitoring and environmental policy formulation (Kelly et al., 2014). From a single standardized freshwater sample containing metabolic wastes, eDNA from entire communities across many taxonomic groups can be analysed simultaneously by comparison with barcode libraries (Deiner et al., 2017; Kelly et al., 2014). Yet these biodiversity assessments remain strongly limited by the availability of reference barcode libraries, so more comprehensive, taxon-specific barcode libraries are integral to monitoring and conserving aquatic biodiversity. In Europe, several national freshwater macroinvertebrate DNA barcode libraries have been built (Morinière et al., 2017, 2019; Weigand et al., 2019).
However, freshwater macroinvertebrates (e.g. caddisflies and chironomids) are highly diverse in Asia, where only a minor portion of this diversity has been barcoded to date (Lin et al., 2015; Zhou et al., 2007).

| Taxon sampling and taxonomy
Field sampling was conducted in China from 2008 to 2019 using Malaise traps, light traps, sweep nets, drift nets and kick nets. Sampled specimens (Table S1) were preserved in ethanol (85% for adults and 95% for immatures) and stored at 4°C in the dark until further processing. Selected specimens were dissected under a stereomicroscope and identified under a compound microscope using available references (e.g. Ekrem, 2002; Kawai, 1991; Lin et al., 2018a; Sasa, 1983; Sasa & Kawai, 1987). Tissue samples (three legs or the head-thorax) were used for genomic DNA extraction. Afterwards, the exoskeleton of the head-thorax was cleaned and mounted in Euparal® on the same microscope slide as the corresponding body parts (Saether, 1969).

| Molecular laboratory protocols
Molecular work was carried out in two parts. Part 1. For most specimens, DNA was extracted using the Qiagen DNeasy Blood & Tissue Kit at the College of Life Sciences, Nankai University, and at the Department of Natural History, NTNU University Museum. COI barcodes were amplified by PCR with the universal primers LCO1490 and HCO2198 (Folmer et al., 1994) following the protocol of Lin et al. (2018a). Purified PCR products were Sanger sequenced on an ABI 3730 at BGI (Beijing, China) and with BigDye 3.1 chemistry at MWG Eurofins (Ebersberg, Germany). Part 2. For the remaining specimens, DNA extraction, PCR amplification and sequencing were performed at the Canadian Centre for DNA Barcoding (CCDB, University of Guelph, Canada) using standard high-throughput protocols (deWaard et al., 2008).

| Data analysis
Raw sequences were edited and assembled in SeqMan (DNASTAR), aligned using MUSCLE (Edgar, 2004) as implemented in MEGA X (Kumar et al., 2018), and checked for stop codons.
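The stop-codon check can be illustrated with a short Python sketch. It assumes the barcodes are translated with NCBI genetic code 5 (invertebrate mitochondrial), the code normally used for insect COI, and that the correct reading frame is already known (passed here as a hypothetical `frame` argument). The function names are illustrative only and are not part of SeqMan or MEGA X.

```python
# Minimal sketch: translate a COI barcode with NCBI genetic code 5
# (invertebrate mitochondrial) and report in-frame stop codons.
from typing import List

BASES = "TCAG"
# Amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
TABLE5_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: TABLE5_AA[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a nucleotide sequence; codons with ambiguity codes become 'X'."""
    seq = seq.upper().replace("-", "")  # drop alignment gaps
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stop_codon_positions(seq: str, frame: int = 0) -> List[int]:
    """Return 0-based codon indices of stop codons (TAA and TAG under code 5)."""
    return [i for i, aa in enumerate(translate(seq, frame)) if aa == "*"]


# Under code 5, TGA codes for Trp and AGA for Ser instead of stop and Arg.
print(translate("ATGTGAAGA"))  # MWS
```

An internal stop codon in a COI barcode usually points to a sequencing or editing error, or to a nuclear mitochondrial pseudogene (NUMT), so such sequences are normally re-checked before analysis.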
In addition to the novel data provided here, we searched the Barcode of Life Data System (BOLD, http://www.boldsystems.org/) (Ratnasingham & Hebert, 2007) for publicly available Tanytarsus records from China. In total, 298 COI barcode records of Chinese Tanytarsus (Figure S1; Table S1) were included in the present dataset (as of 16 April 2020). A distribution map of the 298 records from China was created with the packages "ggplot2" (Wickham et al., 2018) and "mapdata" (Becker & Wilks, 2019) in R (version 3.5.2).
The "Barcode Gap Analysis" tool on BOLD was used to calculate sequence divergences for the present dataset: the mean and maximum pairwise intraspecific distances, the mean and minimum pairwise interspecific distances, and the minimum genetic distance to the nearest neighbour.
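The quantities reported by the Barcode Gap Analysis can be reproduced in outline with the sketch below. It computes Kimura 2-parameter (K2P) distances with pairwise deletion and, for each species, compares the maximum intraspecific distance with the distance to the nearest neighbour. This is an illustrative re-implementation under our own assumptions (input as dictionaries of aligned sequences and species labels; singletons given a maximum intraspecific distance of 0.0 as a placeholder), not BOLD's code.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(s1: str, s2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped
    (pairwise deletion).
    """
    transitions = transversions = compared = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in VALID or b not in VALID:
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the distance is saturated (argument <= 0).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def barcode_gap_summary(seqs: dict, species: dict) -> dict:
    """Per species: maximum intraspecific and nearest-neighbour K2P distances."""
    ids = list(seqs)
    dist = {}
    for a, b in combinations(ids, 2):
        dist[a, b] = dist[b, a] = k2p_distance(seqs[a], seqs[b])
    summary = {}
    for sp in sorted(set(species.values())):
        members = [i for i in ids if species[i] == sp]
        others = [i for i in ids if species[i] != sp]
        intra = [dist[a, b] for a, b in combinations(members, 2)]
        max_intra = max(intra, default=0.0)  # 0.0 placeholder for singletons
        nearest = min((dist[a, b] for a in members for b in others), default=None)
        summary[sp] = {
            "max_intra": max_intra,
            "nearest_neighbour": nearest,
            "gap": nearest is not None and max_intra < nearest,
        }
    return summary


seqs = {"A1": "AAAAAAAAAA", "A2": "AAAAAAAAAG", "B1": "AAAAAACCCC"}  # toy data
species = {"A1": "A", "A2": "A", "B1": "B"}
print(barcode_gap_summary(seqs, species)["A"]["gap"])  # True
```

A species shows a barcode gap when its largest within-species distance is smaller than the distance to its closest heterospecific sequence; species with high intraspecific divergence, such as T. biwatrifurcus here, are the ones most likely to fail this test.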
In addition, all 298 barcode sequences were analysed with the Barcode Index Number (BIN) system on BOLD. The BIN system clusters DNA barcodes into OTUs, using a 2.2% threshold as a rough boundary between intraspecific and interspecific genetic distances (Ratnasingham & Hebert, 2013).
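The BIN algorithm itself (refined single linkage, RESL) refines an initial single-linkage clustering with further steps that are not reproduced here. The sketch below shows only that first step, under our own assumptions: specimens are linked whenever their distance is at or below a threshold (2.2% by default, matching the value above), and linked specimens are merged transitively. The input format, a symmetric nested dictionary of pairwise distances, is hypothetical.

```python
def single_linkage_otus(ids, dist, threshold=0.022):
    """Cluster specimens by single linkage at a fixed distance threshold.

    ids: list of specimen IDs; dist[a][b]: pairwise distance (proportion).
    Returns a sorted list of OTUs, each a sorted list of IDs.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if dist[a][b] <= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    otus = {}
    for i in ids:
        otus.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in otus.values())


def symmetric(pairs):
    """Build a nested distance dict from {(a, b): d} entries."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


dist = symmetric({("a", "b"): 0.010, ("b", "c"): 0.020, ("a", "c"): 0.030,
                  ("c", "d"): 0.050, ("a", "d"): 0.060, ("b", "d"): 0.055})
print(single_linkage_otus(["a", "b", "c", "d"], dist))         # [['a', 'b', 'c'], ['d']]
print(single_linkage_otus(["a", "b", "c", "d"], dist, 0.015))  # [['a', 'b'], ['c'], ['d']]
```

In the toy example, chaining through b places a and c in one OTU although their own distance (3%) exceeds the threshold. Lowering the threshold can only keep or increase the number of single-linkage clusters, which is one reason a lower cut-off tends to produce more OTUs.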
A neighbour-joining (NJ) tree (Saitou & Nei, 1987) was constructed from the aligned dataset in MEGA X using the Kimura 2-parameter (K2P) model (Kimura, 1980), 1,000 bootstrap replicates and pairwise deletion.
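To make the tree-building step concrete, the following sketch implements the basic Saitou and Nei neighbour-joining algorithm on a precomputed distance matrix (for example, K2P distances) and returns an unrooted tree in Newick format. It omits the bootstrap resampling and pairwise-deletion handling performed by MEGA X, breaks ties by taking the first minimal pair, and does not clamp negative branch lengths; it is an illustration, not MEGA's implementation.

```python
def neighbour_joining(names, matrix):
    """Neighbour joining (Saitou & Nei, 1987) on a symmetric distance matrix.

    names: taxon labels; matrix: list of lists of pairwise distances.
    Returns an unrooted tree as a Newick string with branch lengths.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in matrix]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the two joined nodes to their new parent.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new parent node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the last three nodes as an unrooted trifurcation.
    x, y, z = nodes
    lx = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    ly = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lz = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({x}:{lx:.4f},{y}:{ly:.4f},{z}:{lz:.4f});"


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
print(neighbour_joining(names, matrix))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

The matrix is a standard textbook additive example, not Tanytarsus data; on additive distances NJ recovers the true branch lengths exactly, which makes it a convenient correctness check.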

| OTU delineation based on DNA barcodes using Automatic Barcode Gap Discovery (ABGD)
Compared with distance-based methods, single-locus tree-based delimitation methods, such as the Generalized Mixed Yule Coalescent (GMYC) (Pons et al., 2006) and Bayesian Poisson tree processes (PTP) (Zhang et al., 2013), are suspected to over-split species (Lin et al., 2018b; Pentinsaari et al., 2017; Talavera et al., 2013). Automatic Barcode Gap Discovery (ABGD) (Puillandre et al., 2012), a distance-based method, partitions sequences into groups by comparing pairwise distances and is considered more reliable in this respect. We therefore analysed our data with ABGD to compare the resulting number of OTUs with those from the BIN-based "Barcode Gap Analysis" and the NJ tree. ABGD analysis was carried out on 17 April 2020 using the web interface (https://bioinfo.mnhn.fr/abi/public/abgd/abgdweb.html). We used the K2P model with a relative gap width of X = 1.0 and kept the default settings for the remaining parameters.
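ABGD's actual procedure (searching the ranked pairwise distances for the first significant gap beyond a prior intraspecific divergence P, controlled by the relative gap width X, followed by recursive re-partitioning) is more elaborate than can be shown here. The sketch below illustrates only the underlying barcode-gap idea with a deliberately simplified heuristic of our own: place the threshold in the middle of the widest jump between consecutive sorted pairwise distances, then group sequences connected by distances below it. It is not the ABGD algorithm and should not be expected to reproduce the ABGD results reported here.

```python
from itertools import combinations


def largest_gap_threshold(distances):
    """Midpoint of the widest jump between consecutive sorted distinct distances."""
    values = sorted(set(distances))
    if len(values) < 2:
        raise ValueError("need at least two distinct distances")
    _, low, high = max((b - a, a, b) for a, b in zip(values, values[1:]))
    return (low + high) / 2


def partition(ids, dist, threshold):
    """Group IDs into connected components of pairs closer than threshold."""
    groups = [{i} for i in ids]
    for a, b in combinations(ids, 2):
        if dist[frozenset((a, b))] < threshold:
            group_a = next(g for g in groups if a in g)
            group_b = next(g for g in groups if b in g)
            if group_a is not group_b:
                group_a |= group_b  # merge b's group into a's group
                groups.remove(group_b)
    return sorted(sorted(g) for g in groups)


pairs = {("a", "b"): 0.010, ("c", "d"): 0.012, ("a", "c"): 0.100,
         ("b", "c"): 0.105, ("a", "d"): 0.110, ("b", "d"): 0.120}
dist = {frozenset(k): v for k, v in pairs.items()}  # toy distances
t = largest_gap_threshold(dist.values())             # about 0.056
print(partition(["a", "b", "c", "d"], dist, t))      # [['a', 'b'], ['c', 'd']]
```

When intraspecific and interspecific distances overlap, as reported for some Tanytarsus morphospecies, the widest jump may fall in an uninformative place; ABGD's prior P and relative gap width X exist to make the gap search more robust than this naive version.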

| DNA barcode analysis
The 298 aligned sequences ranged from 607 to 658 base pairs in length, and 211 of them covered the full 658-bp barcode region. They were assigned to 66 BINs, comprising 20 singleton BINs (representing 20 species) and 46 concordant BINs. Twenty-three new BINs were added to BOLD, most of which represent new species entries in the system (Figure 2; Table S1). Ten species were represented by two BINs: Tanytarsus biwatrifurcus Sasa & Kawai, 1987, Tanytarsus formosanus Kieffer, 1912 … Sasa, 1984. The mean intraspecific divergence across all species was 0.76%, with a maximum of 6.75% in T. biwatrifurcus. The mean interspecific divergence was 12.4%, with a minimum of 7.2% between Tanytarsus oscillans Johannsen, 1932 and Tanytarsus unagiseptimus Sasa, 1985 (Figure 2).

| Species discrimination
The neighbour-joining tree based on the 298 COI barcodes comprises 56 well-separated clusters representing 33 named and 23 unnamed species. Of the 23 unnamed species, 21 may be new to science and the remaining two are unidentified. In general, DNA barcode clusters corresponded well to morphospecies in Tanytarsus.

| Species without identification
As the larvae of some Tanytarsus species are not described and the females are difficult to differentiate, a larva of Tanytarsus sp. 31XL

| Cryptic species diversity
The DNA barcodes generated in this study revealed several potential cryptic species in the Tanytarsus chinyensis-, mcmillani-, norvegicus-, signatus-, tamakutibasi-, thaicus- and triangularis species groups, as well as in some other species (Figure S2; Table S1).

FIGURE 2 Barcode Gap Analysis of 298 COI barcode sequences of 56 Tanytarsus species from China. Two histograms show the distributions of mean intraspecific distances and of distances to the nearest neighbour. Three scatter plots test for the existence and magnitude of the barcode gap: the first two plot the maximum and mean intraspecific distances against the interspecific (nearest-neighbour) distances, and the third plots the number of individuals in each species against their maximum intraspecific distance, as a test for sampling bias.

and Tanytarsus sp. 27XL) are closely related to T. biwatrifurcus, T. iriolemeus Suzuki, 2000 and Tanytarsus tamakutibasi Sasa, 1983, with highly similar hypopygia. Of these seven species, five were well differentiated by multiple nuclear markers in a previous molecular phylogeny of Tanytarsus (Lin et al., 2018c, Figure 3). Because immatures were associated with adults through DNA barcodes, a few diagnostic morphological characters were found for species within the T. tamakutibasi species group.
Similar cases were observed in the Tanytarsus thaicus and Tanytarsus triangularis species groups. Within the T. thaicus species group, three genetic lineages are well supported by the NJ tree and the TCS haplotype network based on COI barcodes (Figure 4). Two of them, the potential cryptic species Tanytarsus sp. 6XL and Tanytarsus sp. 17XL, are morphologically close to Tanytarsus thaicus Moubayed, 1990. T. sp. 17XL could be separated from T. thaicus by additional loci (Lin et al., 2018c, Figure 1). Within the T. triangularis species group, Tanytarsus sp. 20XL, Tanytarsus simantoteuus Sasa, Suzuki & Sakai, 1998 and Tanytarsus tobaoctadecimus Kikuchi & Sasa, 1990 are difficult to separate morphologically. DNA barcodes of these three species were clustered into three OTUs in both the NJ tree and the haplotype network (Figure 5), indicating that T. sp. 20XL is a potential cryptic species. Re-examination of these vouchers revealed some diagnostic characters in the immatures and adult males associated through DNA barcodes. Moreover, the three species have also been separated by multiple nuclear markers (Lin et al., 2018c, Figure 1).
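Building a haplotype network starts by collapsing identical barcodes into haplotypes and counting the mutational steps between them. The sketch below performs only this preparatory step and lists haplotype pairs within a user-chosen step limit (`max_steps`, a hypothetical parameter). It does not implement TCS statistical parsimony, which derives its connection limit from a 95% parsimony criterion and infers unsampled intermediate haplotypes. Sequences are compared as given, so sequences differing only by gaps or ambiguity codes form separate haplotypes even though those sites are ignored when counting steps.

```python
from itertools import combinations


def mutational_steps(a: str, b: str) -> int:
    """Count differing sites, ignoring gaps and ambiguity codes."""
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


def haplotype_network(seqs: dict, max_steps: int):
    """Collapse identical aligned sequences into haplotypes H1, H2, ...

    Returns (members, edges): members maps each haplotype label to its
    specimen IDs; edges lists (label, label, steps) for pairs within max_steps.
    """
    haplotypes = {}
    for specimen, seq in seqs.items():
        haplotypes.setdefault(seq.upper(), []).append(specimen)
    labels = {hap: f"H{n + 1}" for n, hap in enumerate(haplotypes)}
    members = {labels[hap]: ids for hap, ids in haplotypes.items()}
    edges = []
    for a, b in combinations(haplotypes, 2):
        steps = mutational_steps(a, b)
        if steps <= max_steps:
            edges.append((labels[a], labels[b], steps))
    return members, edges


members, edges = haplotype_network(
    {"s1": "ACGT", "s2": "ACGT", "s3": "ACGA", "s4": "TTTT"}, max_steps=1)
print(members)  # {'H1': ['s1', 's2'], 'H2': ['s3'], 'H3': ['s4']}
print(edges)    # [('H1', 'H2', 1)]
```

In the toy input, H3 stays unconnected because it is three or more steps from every other haplotype; in a real network, such a disconnected cluster is the pattern that suggests a separate lineage.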

| Life stages association
In this study, the larvae of 24 species, the pupae of three species and the adult females of five species were associated with their adult males through DNA barcodes (Figure S2; Table S1). Of these, the larvae of 22 species, the pupae of two species and the adult females of four species have not yet been described.
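The logic of associating an unidentified larva, pupa or female with a named adult male can be sketched as a nearest-reference search: each unknown specimen takes the species name of its closest identified reference, but only if that distance falls below a cut-off. The 2% cut-off, the specimen names and the input format below are hypothetical choices for illustration, not necessarily how the associations reported here were made.

```python
def associate_life_stages(unknown, references, dist, max_distance=0.02):
    """Assign each unknown specimen the species of its nearest reference.

    unknown: IDs of larvae/pupae/females; references: {ID: species name}
    for identified adult males; dist[u][r]: pairwise distance (proportion).
    Returns {unknown ID: species name, or None if no reference is close enough}.
    """
    result = {}
    for u in unknown:
        nearest = min(references, key=lambda r: dist[u][r])
        within = dist[u][nearest] <= max_distance
        result[u] = references[nearest] if within else None
    return result


references = {"M1": "Tanytarsus sp. A", "M2": "Tanytarsus sp. B"}  # hypothetical
dist = {"L1": {"M1": 0.005, "M2": 0.090}, "L2": {"M1": 0.080, "M2": 0.070}}
print(associate_life_stages(["L1", "L2"], references, dist))
# {'L1': 'Tanytarsus sp. A', 'L2': None}
```

Returning None rather than forcing a match keeps specimens such as L2, whose nearest adult is still distant, flagged as unassociated until a closer reference is barcoded.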

| OTU delineation based on DNA barcodes using Automatic Barcode Gap Discovery (ABGD)
Because some morphospecies showed comparatively high intraspecific divergence, no definite "barcode gap" was observed in the pairwise distances (Figure 6). ABGD analysis of the present dataset recognized 55 OTUs with a prior maximal intraspecific divergence of Pmax = 3.59%.

FIGURE 3 Genetic analyses of 31 COI barcodes of the Tanytarsus tamakutibasi species group from China. (a) Neighbour joining tree of the T. tamakutibasi group based on K2P distances; numbers on branches are bootstrap values from 1,000 replicates; the scale bar represents K2P genetic distance. (b) TCS haplotype network based on the COI barcodes of the T. tamakutibasi group. Mutations are shown as lines on the branches.
Minor discrepancies were observed for Tanytarsus okuboi Sasa & Kikuchi, 1986 and T. sp. 4XL, which were clustered into one group by ABGD but recognized as two OTUs in both the BIN system and the NJ tree.
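Discrepancies of this kind can be located systematically by cross-tabulating two partitions of the same specimens. The sketch below, with hypothetical specimen and cluster labels, reports clusters of one method that are split by the other; the example mimics the situation described above, where two lineages form one ABGD group but two BINs.

```python
from collections import defaultdict


def split_clusters(labels_a: dict, labels_b: dict) -> dict:
    """Clusters of method A whose specimens fall into several clusters of B.

    labels_a, labels_b: {specimen ID: cluster label} for the same specimens.
    """
    spread = defaultdict(set)
    for specimen, cluster in labels_a.items():
        spread[cluster].add(labels_b[specimen])
    return {c: sorted(bs) for c, bs in spread.items() if len(bs) > 1}


abgd = {"s1": "G1", "s2": "G1", "s3": "G2"}  # hypothetical labels
bins = {"s1": "BIN1", "s2": "BIN2", "s3": "BIN3"}
print(split_clusters(abgd, bins))  # {'G1': ['BIN1', 'BIN2']}
print(split_clusters(bins, abgd))  # {}
```

Running the comparison in both directions separates the two kinds of disagreement: groups one method lumps that the other splits, and vice versa.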

| Tanytarsus DNA barcode reference library
In this study, we investigated the taxonomy of Chinese Tanytarsus non-biting midges and generated a DNA barcode reference library. The library includes 298 records in 66 BINs representing 56 putative species. Of these, 33 species currently have a Linnaean name, while the remaining 23 lack one, comprising 21 species new to science and two unidentified species. Overall, DNA barcode coverage is nearly complete for the known Tanytarsus species of China.
Our results also suggest a high incidence of cryptic species, indicating that the number of Tanytarsus species is likely to be much higher than previously recognized. For comparison, 99 Tanytarsus species have been recorded in Japan (Yamamoto & Yamamoto, 2014). Although 56 morphological species of Tanytarsus have been barcoded here, wider sampling is needed to build a more comprehensive DNA barcode reference library.
Previous studies have shown the great potential of macroinvertebrate metabarcoding for biodiversity assessments of freshwater ecosystems (Elbrecht & Steinke, 2019;Elbrecht et al., 2017).
Compared with classical morphological approaches, metabarcoding-based biodiversity assessment can reduce costs and time by generating millions of DNA sequences from thousands of specimens at once. However, few barcode libraries for macroinvertebrates have been built in China, although such libraries are crucial for biomonitoring. The present study is an important contribution towards a more comprehensive DNA barcode library for Chinese chironomids.
We show that DNA barcoding is an efficient tool for species delimitation and for recognizing cryptic species in Tanytarsus. Several immatures associated with adults through DNA barcodes belong to putative new species and are reported here for the first time. In general, the NJ tree and ABGD yielded concordant OTUs corresponding to morphospecies, whereas the BIN system, with its lower threshold, yielded slightly more OTUs. However, using a single locus such as COI to differentiate species can overestimate species diversity because of deep intraspecific divergence (Lin et al., 2015, 2018b) or underestimate it because of recent speciation.
Incomplete lineage sorting (Willyard et al., 2009), horizontal gene flow (Naciri et al., 2012), recent speciation and insufficient taxon sampling (Luo et al., 2015; Zhang et al., 2010) can all lead to inaccurate species delimitation. Therefore, an integrative taxonomy combining molecular, morphological and ecological data is crucial for exploring species boundaries (Galimberti et al., 2012; Hebert et al., 2004; Schutze et al., 2017; Zheng et al., 2020). For instance, species boundaries among closely related species within two Tanytarsus species groups were explored using both morphology and DNA, and eight cryptic species were uncovered and described (Lin et al., 2018a, 2018b). Although we reveal many putative new species with DNA barcodes here, further taxonomic study combining nuclear markers and morphology is still needed to diagnose and describe these putative new taxa.

| CONCLUSION
Our results demonstrate that DNA barcodes enable the discovery and identification of cryptic new species of Tanytarsus non-biting midges.
Our DNA barcode reference library of Chinese Tanytarsus includes 298 records in 66 BINs representing 56 putative species, among them 21 putative new species. A comprehensive DNA barcode reference library will speed up accurate species delimitation and discovery by providing a reliable reference resource for taxonomic studies. Furthermore, more comprehensive DNA barcode reference libraries will improve eDNA metabarcoding for biomonitoring of freshwater ecosystems.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13209.

DATA AVAILABILITY STATEMENT
The list of all specimen records, images, trace files and collateral information is publicly accessible on BOLD through the dataset "DS-2020TACH", DOI: dx.doi.org/10.5883/DS-2020TACH.