A DNA barcode library for 5,200 German flies and midges (Insecta: Diptera) and its implications for metabarcoding‐based biomonitoring

Abstract This study summarizes results of a DNA barcoding campaign on German Diptera, involving analysis of 45,040 specimens. The resultant DNA barcode library includes records for 2,453 named species comprising a total of 5,200 barcode index numbers (BINs), including 2,700 COI haplotype clusters without species‐level assignment, so-called “dark taxa.” Overall, 88 out of 117 families (75%) recorded from Germany were covered, representing more than 50% of the 9,544 known species of German Diptera. Until now, most of these families, especially the most diverse, have been taxonomically inaccessible. By contrast, within a few years this study provided an intermediate taxonomic system for half of the German dipteran fauna, which will provide a useful foundation for subsequent detailed, integrative taxonomic studies. Using DNA extracts derived from bulk collections made by Malaise traps, we further demonstrate that species delineation using BINs and operational taxonomic units (OTUs) constitutes an effective method for biodiversity studies using DNA metabarcoding. As the reference libraries continue to grow, and gaps in the species catalogue are filled, BIN lists assembled by metabarcoding will provide greater taxonomic resolution. The present study has three main goals: (a) to provide a DNA barcode library for 5,200 BINs of Diptera; (b) to demonstrate, based on the example of bulk extractions from a Malaise trap experiment, that DNA barcode clusters, labelled with globally unique identifiers (such as OTUs and/or BINs), provide a pragmatic, accurate solution to the “taxonomic impediment”; and (c) to demonstrate that interim names based on BINs and OTUs obtained through metabarcoding provide an effective method for studies on species‐rich groups that are usually neglected in biodiversity research projects because of their unresolved taxonomy.

represents the foundation for all conservation and biodiversity initiatives. However, the inventory of biodiversity cannot be completed through morphological approaches alone. Both the speed and costs associated with sequence characterization of a standardized DNA fragment can be improved using DNA barcoding. Usually DNA barcoding studies provide a basis for establishing the reference sequence libraries required to identify specimens of known species (Gwiazdowski, Foottit, Maw, & Hebert, 2015; Hebert, Cywinska, Ball, & Dewaard, 2003). Herein we show that it is also an efficient method for registering unknown and taxonomically challenging species, so-called "dark taxa" (Page, 2016). Sequenced taxa can subsequently be associated with established binomens by taxonomic specialists using a reverse taxonomy approach, based on accurately identified museum specimens (ideally type specimens) and expert knowledge. During this process, specimens that belong to unnamed molecular character-based units (operational taxonomic units [OTUs] or barcode index numbers [BINs]) will either be referenced to known species or they may represent overlooked species that are new to science (Geiger, Moriniere, et al., 2016). A curated and comprehensive DNA barcode reference library enables fast and reliable species identifications in those many cases where time, personnel and taxonomic expertise are limited. Furthermore, such a library also supports large-scale biodiversity monitoring that relies upon metabarcoding bulk samples (Hajibabaei, Shokralla, Zhou, Singer, & Baird, 2011; Hajibabaei, Spall, Shokralla, & Konynenburg, 2012; Shokralla, Spall, Gibson, & Hajibabaei, 2012), like those obtained from Malaise traps (Gibson et al., 2014; Leray & Knowlton, 2015; Morinière et al., 2016; Yu et al., 2012).
This publication presents the first results of the Diptera campaign and it provides coverage for 5,200 BINs (Ratnasingham & Hebert, 2013). It covers ~55% of the known Diptera fauna from Germany. According to the checklist of German Diptera (Schumann et al., 1999) and the three additions published so far (Schumann, 2002, 2004, 2010), 9,544 species of Diptera have been recorded from Germany. The Diptera library now includes a total of 2,453 reliable species identifications, and 2,700 BINs, which possess either interim species names or just higher-level taxonomy (genus or family; "dark taxa"). Although it has been shown that BINs correspond closely to biological species of most insect orders (Hausmann et al., 2013), there are other studies reporting difficulties in determining species through DNA barcodes within Diptera. In particular, well-studied groups such as the syrphids represent a problem, because here additional genes for a clear type assignment must be consulted in many genera (Mengual, Ståhls, Vujić, & Marcos-Garcia, 2006; Rojo, Ståhls, Pérez-Bañón, & Marcos-García, 2006). Further examples of problems in species delineation due to barcode gaps, at least for some genera, are the Tachinidae and the Calliphoridae (Nelson et al., 2012; Pohjoismäki et al., 2016; Whitworth, Dawson, Magalon, & Baudry, 2007). In one of the few studies dealing with DNA barcoding in Diptera, it was shown that less than 70% of a composition of about 450 species covering 12 families of Diptera could be reliably identified by DNA barcoding, as there was wide overlap between intra- and interspecific genetic variability on the COI gene (Meier, Shiyang, Vaidya, & Ng, 2006). However, we find that more than 88% of the studied species, identified based on morphology or BIN matches to the BOLD database, can be unambiguously identified using their DNA barcode sequences.
BINs enable the creation of an interim taxonomic system in a structured, transparent and sustainable way and thus become a valuable foundation for subsequent detailed, integrative taxonomic studies. Furthermore, the BIN system enables analyses that are equivalent to studies based on named species, that is, where the underlying specimens are identified by specialists using traditional methods (i.e., morphology). The latter will play a special role in the processing, classification and genetic inventorying of less-explored "dark taxa," which have been treated and processed with less priority by previous DNA barcoding activities. Moreover, this automated approach of delineating species is less affected by operational taxonomic biases, so it can provide more objective identifications than conventional approaches (Packer, Gibbs, Sheffield, & Hanner, 2009; Schmidt et al., 2015). Using DNA extracts derived from bulk collections made by Malaise traps, we further demonstrate that species delineation using interim names based on BINs and OTUs constitutes an effective method for biodiversity studies using DNA metabarcoding. As the reference libraries continue to grow and gaps in the species catalogue are subsequently filled, BIN lists assembled by metabarcoding will provide improved taxonomic resolution.
The present study has three main goals: (a) to provide a DNA barcode library for 5,200 BINs of Diptera; (b) to demonstrate, based on the example of bulk extractions from a Malaise trap experiment, that DNA barcode clusters, labelled with globally unique identifiers (such as OTUs and/or BINs), provide a pragmatic, accurate solution to the "taxonomic impediment"; and (c) to demonstrate that interim names based on BINs and OTUs obtained through metabarcoding provide an effective method for studies on species-rich groups that are usually neglected in biodiversity research projects because of their unresolved taxonomy.

| Fieldwork, specimens and taxonomy
A network of 130 (professional and voluntary) taxonomists and citizen scientists collected and contributed specimens to the DNA barcoding projects, primarily from various German states, but also from surrounding European countries (Austria, Belgium, Czech Republic, France, Italy). Most specimens (94.5%, 42,587 of 45,040 with COI sequences >500 bp) were collected by Malaise traps, which were deployed from 2009 to 2016. The study sites included more than 683 localities in state forests, public lands and protected areas such as the Nationalparks "Bayerischer Wald" and "Berchtesgadener Land," the EU habitats directive site "Landskrone," as well as alpine regions at altitudes up to 2,926 m (Zugspitze). Detailed information on collection sites and dates is available in Appendix S1. Since 2009, more than five million specimens of Diptera were collected by hand collecting, sweep netting, and by Malaise, window and pitfall trapping. However, most voucher specimens have been extracted from Malaise trap samples. Twenty to 100 Malaise traps were deployed in each of seven years (2011-2017), mostly across habitats in Bavaria and Baden-Württemberg; one trap was placed in Rhineland-Palatinate. Samples were screened morphologically to maximize the diversity of species submitted for sequence characterization. Most vouchers were derived from Germany (44,511), but others were collected in France (222), Czech Republic (147), Belgium (106), Austria (70) and other Central European countries (18). All samples and specimens are now stored in the SNSB-ZSM or ZFMK except for a few held in private collections. From the entire collection, ~3,000,000 specimens of potential interest, most of which were derived from the large Malaise trap experiments in the framework of the GMTP, were identified to family level mostly by D.D. and to a minor extent by B.R. and experienced specialists using appropriate literature (Oosterbroek, 2006 and references therein; Papp & Darvas, 1997, 1998, 2000a, 2000b; Schumann et al., 2011). From this material, 59,000 specimens were submitted for sequence analysis through the DNA barcoding pipeline (including sample preparation, high-quality imaging and metadata acquisition for each specimen) established at the ZSM to support its involvement in national and international DNA barcoding projects. Most samples (>99%) were stored in 96% EtOH before DNA extraction. Specimen ages generally ranged from 1 to 5 years (43,112 specimens, 96%); only 4% were more than 5 years old. The number of specimens analysed per species ranged from one to 1,356 (i.e., Megaselia rufa (Wood, 1908); see Appendix S1). When taxonomic expertise was available, specimens were sent to specialists to obtain as many species-level identifications as possible.

| Laboratory protocols
A tissue sample was removed from each specimen and transferred into 96-well plates at the SNSB-ZSM for subsequent DNA extraction.
For specimens with a body length >2 mm, a single leg or a leg segment was removed for DNA extraction. The whole voucher was used for some very small specimens (e.g., ≤1 mm, such as small Cecidomyiidae, Chironomidae and Sciaridae), but replacement vouchers from the same locality were retained. In other cases (vouchers from Malaise traps), DNA was extracted from the whole voucher at the CCDB (Guelph, Canada) using "voucher-recovery" protocols (DeWaard et al., 2019) and the specimens were repatriated to the SNSB-ZSM and ZFMK for identification and curation. DNA extraction plates with the tissue samples were sent to the Canadian Center for DNA Barcoding (CCDB) where they were processed using standard protocols. All protocols for DNA extraction, PCR amplifications and Sanger sequencing procedures are available online (ccdb.ca/resources/). All samples were PCR-amplified with a cocktail of standard and modified Folmer primers CLepFolF (5′-ATTCAACCAATCATAAAGATATTGG) and CLepFolR (5′-TAAACTTCTGGATGTCCAAAAAATCA) for the barcode fragment (5′ COI; see Hernández-Triana et al., 2014), and the same primers were employed for subsequent bidirectional Sanger sequencing reactions (see also Ivanova, Dewaard, & Hebert, 2006; deWaard, Ivanova, Hajibabaei, & Hebert, 2008; DeWaard et al., 2019). Voucher information such as locality data, habitat, alti-

| Data analysis
Sequence divergences for the COI-5P barcode region (mean and maximum intraspecific variation and minimum genetic distance to the nearest-neighbouring species) were calculated using the "Barcode Gap Analysis" tool on BOLD, employing the Kimura 2-parameter (K2P) distance metric (Puillandre, Lambert, Brouillet, & Achaz, 2012). Sequences were aligned with the program MUSCLE, and the analysis was restricted to sequences with a minimum length of 500 bp. Neighbour-joining (NJ) trees were calculated from the alignment based on K2P distances.
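As an illustration of the distance metric named above, the following minimal sketch computes the standard K2P distance between two aligned sequences from the proportions of transitions (P) and transversions (Q), d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)]. It is not the BOLD implementation; the function name `k2p_distance` and the choice to skip gapped or ambiguous sites are assumptions made for this example.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration. Sites where either sequence has a
    gap or an ambiguous base are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # math.log raises ValueError if the sequences are too divergent
    # for the K2P correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Maximum intraspecific and nearest-neighbour distances of the kind reported by a barcode gap analysis can then be obtained by taking the maximum of such pairwise values within a species and the minimum between species.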
The "BIN Discordance" analysis on BOLD was used to reveal cases where specimens assigned to different species shared a BIN, and those cases where a particular species was assigned to two or more BINs.
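The two kinds of discordance described above can be sketched as a simple grouping operation. The code below is an illustrative reimplementation under stated assumptions, not the BOLD tool itself: the function name `bin_discordance`, the record layout and the species and BIN labels in the usage note are all hypothetical.

```python
from collections import defaultdict


def bin_discordance(records):
    """Find shared and split BINs in a list of specimen records.

    records: iterable of (specimen_id, species_name, bin_id) tuples
    (a hypothetical layout chosen for this sketch).

    Returns two dicts:
      shared: BIN -> sorted species names, for BINs holding >1 species
      split:  species -> sorted BINs, for species found in >1 BIN
    """
    species_by_bin = defaultdict(set)
    bins_by_species = defaultdict(set)
    for _specimen_id, species, bin_id in records:
        species_by_bin[bin_id].add(species)
        bins_by_species[species].add(bin_id)

    shared = {b: sorted(s) for b, s in species_by_bin.items() if len(s) > 1}
    split = {sp: sorted(b) for sp, b in bins_by_species.items() if len(b) > 1}
    return shared, split
```

For example, with placeholder records `[("s1", "Species A", "BIN1"), ("s2", "Species B", "BIN1"), ("s3", "Species C", "BIN2"), ("s4", "Species C", "BIN3")]`, the function reports `BIN1` as shared by two species and `Species C` as split across two BINs.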
Sequences are grouped into clusters of closely similar COI barcode sequences, which are assigned a globally unique identifier, termed a "barcode index number" or BIN (Ratnasingham & Hebert, 2013). This system enables tentative species identifications when taxonomic information is lacking. The BIN system involves a three-step online pipeline, which clusters similar barcode sequences algorithmically into OTUs being "named" by a number. For the majority of studied insect orders, specimens sharing a BIN very often represent a close species proxy as delineated by traditional taxonomy (e.g., for Lepidoptera, Hausmann et al., 2013). However, some genera or families throughout the insects exhibit problems with species delineation based on DNA barcodes, due to high intra- or low interspecific genetic distances (e.g., cryptic diversity, BIN sharing or the barcode gap; see Hubert & Hanner, 2015). Within the Diptera, this phenomenon has been well documented (Meier et al., 2006), at least in some families, such as calliphorid, syrphid and tachinid species (Mengual et al., 2006; Nelson et al., 2012; Pohjoismäki et al., 2016; Rojo et al., 2006; Whitworth et al., 2007), but may also occur in families of "dark taxa."
Every other "disagreement/conflict" case is the starting point for re-evaluation of both molecular and morphological data. We follow the concept of Integrative Taxonomy (Fujita et al., 2012; Padial et al., 2010; Schlick-Steiner et al., 2010, 2014) to infer whether there are previously overlooked species ("cryptic taxa") in the sample, or whether barcode divergence between species is too low or absent to allow valid species to be delineated using only COI characteristics.

| Reverse-taxonomy approach
When sequenced specimens could only be assigned to a category above the species level (family, subfamily or genus), we used interim species names (such as TachIntGen1 sp.BOLD:AAG2112) based on the corresponding BIN, so these specimens could be included in the "Barcode Gap Analysis." This provided more comprehensive estimates of the distribution of genetic divergences among both specimens assigned to Linnaean species and those with only BIN assignments.
This analysis was conducted on all specimens at the same time after updating the interim taxonomy where necessary. For specimen records that lack lower-level taxonomy (e.g., those uploaded only as "Diptera"), we applied the highest "conflict-free" taxonomy (for example the genus name, when other specimens within that BIN had the same identification) using a BIN match with the public data on BOLD (e.g., Melanagromyza sp. BOLD:ACP6151). All specimens that could not be identified to species or genus level, and whose vouchers were in acceptable condition (e.g., unbroken antennae and/or legs after retrieval from Malaise trap), were selected using the corresponding BINs for identification by taxonomic specialists.
Interim names were subsequently moved into the "Voucher status" field in the BOLD metadata tables after all analyses were performed.
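The "conflict-free" assignment and the interim naming convention described in this section can be sketched as follows. This is a minimal illustration under assumptions, not the procedure as implemented on BOLD: the function names, the dictionary layout and the three-rank hierarchy are hypothetical, and the name format simply follows the example given above (Melanagromyza sp. BOLD:ACP6151).

```python
RANKS = ("family", "genus", "species")  # coarse to fine; simplified hierarchy


def conflict_free_taxonomy(bin_members):
    """Return the most specific taxonomy on which all identified members agree.

    bin_members: list of dicts with optional "family", "genus" and "species"
    keys, one per identified specimen sharing a BIN (hypothetical layout).
    Descends rank by rank and stops at the first rank that is either
    unrecorded or carries more than one name.
    """
    agreed = {}
    for rank in RANKS:
        names = {m[rank] for m in bin_members if m.get(rank)}
        if len(names) != 1:
            break
        agreed[rank] = names.pop()
    return agreed


def interim_name(taxonomy, bin_id):
    """Build an interim name such as 'Melanagromyza sp. BOLD:ACP6151'."""
    if "species" in taxonomy:
        return taxonomy["species"]
    label = taxonomy.get("genus") or taxonomy.get("family") or "Diptera"
    return f"{label} sp. {bin_id}"
```

A record uploaded only as "Diptera" would thus inherit the genus name when all other identified specimens in its BIN agree at genus level but not at species level.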

| Metabarcoding and bioinformatic data analysis
The potential utility of the DNA barcode library for biomonitoring Diptera was tested with field samples, focusing on an early warn-  (Martin, 2011). Forward and reverse reads in each sample were merged with the vsearch program "fastq_mergepairs" with a minimum overlap of 40 bp, yielding ~313-bp sequences.
Forward and reverse primers were removed with cutadapt, using the "discard_untrimmed" option to discard sequences for which primers were not detected at ≥90% identity. Quality filtering was done with the "fastq_filter" in vsearch, keeping sequences with at most one expected error ("fastq_maxee" 1). Sequences were dereplicated with "derep_fulllength," first at the sample level, and then concatenated into a fasta file, which was then dereplicated. Chimeric sequences were removed from the fasta file using "uchime_denovo." The remaining sequences were then clustered into OTUs at 97% identity employing "cluster_size," a greedy, centroid-based clustering program. OTUs were blasted against the Diptera database downloaded from BOLD including taxonomy and BIN information in Geneious (version 9.1.7; Biomatters) following the methods described in Morinière et al. (2016). The resulting csv file, which included BIN, Hit-%-ID value, family, genus and species information for each OTU, was exported from Geneious and combined with the OTU table generated by the bioinformatic pipeline. The combined results table was then filtered by Hit-%-ID value and total read numbers per OTU. All entries with identifications below 97% and total read numbers below 0.01% of the summed reads per sample were removed from the analysis.
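To make the idea of greedy, centroid-based clustering concrete, the sketch below processes sequences in order of decreasing abundance and assigns each to the first existing centroid it matches at or above the identity threshold; otherwise the sequence founds a new cluster. It is a deliberately simplified illustration, not the vsearch algorithm: identity is computed here by position-wise comparison of equal-length sequences (an assumption of this example), whereas vsearch uses pairwise alignment and a k-mer based search order. The function names are hypothetical.

```python
def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of matching positions; 0.0 if lengths differ (simplification)."""
    if not seq_a or len(seq_a) != len(seq_b):
        return 0.0
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return matches / len(seq_a)


def greedy_cluster(abundances, threshold=0.97):
    """Greedy centroid clustering of dereplicated sequences.

    abundances: dict mapping sequence -> read count.
    Returns a dict mapping each centroid sequence to its member sequences.
    """
    clusters = {}
    # Most abundant sequences are considered first and become centroids.
    ordered = sorted(abundances.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq, _count in ordered:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]  # no centroid matched: start a new OTU
    return clusters
```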
OTUs were then assigned to the respective BIN (Appendix S2).
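The two filtering criteria described above (a 97% identity cut-off and a minimum of 0.01% of the summed reads per sample) can be sketched as follows. The data layout and the function name `filter_otu_table` are hypothetical, and the sketch assumes that the two filters are applied independently and that per-sample totals are computed before any OTU is removed; the original pipeline may differ in these details.

```python
def filter_otu_table(otu_rows, min_identity=97.0, min_read_fraction=0.0001):
    """Filter a combined OTU/BLAST table.

    otu_rows: list of dicts such as
        {"otu": "OTU1", "hit_pct_id": 99.2, "reads": {"sampleA": 1200}}
    (hypothetical layout). An OTU is dropped if its best hit is below
    min_identity; within each sample, an entry is dropped if its reads fall
    below min_read_fraction of that sample's summed reads.
    """
    # Summed reads per sample, computed over all OTUs before filtering.
    sample_totals = {}
    for row in otu_rows:
        for sample, n_reads in row["reads"].items():
            sample_totals[sample] = sample_totals.get(sample, 0) + n_reads

    kept = []
    for row in otu_rows:
        if row["hit_pct_id"] < min_identity:
            continue
        reads = {
            sample: n_reads
            for sample, n_reads in row["reads"].items()
            if n_reads > 0 and n_reads >= min_read_fraction * sample_totals[sample]
        }
        if reads:  # keep the OTU only if it survives in at least one sample
            kept.append({**row, "reads": reads})
    return kept
```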
Presence-absence overviews of selected Diptera taxa (BINs) within the metabarcoding study were created; one-sided Pearson correlation coefficients were calculated to estimate the percentage of "dark taxa" with mid-range body size versus the number of species
No sequence information was recovered from 13.77% (8,139) of the specimens. Barcode recovery was most successful for EtOH-preserved specimens less than 10 years old. For the subsequent analyses we selected 45,040 specimens with high-quality DNA barcode sequences (≥500 bp), which fulfilled the requirements for being assigned to a BIN. This data set included ~5,200 BINs (2,500 were assigned a total of 2,453 Linnean species while 2,700 lacked a species designation, 52.4% of the data set). These BINs included one or more representatives from 88 of the 117 (75%) dipteran families known from Germany (Figure 1, Table 1; Appendix S3, Krona graph in Figure S2). More than one-third (1,829) of the BINs were new to BOLD.
Inspection of the COI sequence clusters using NJ trees (created with analytical tools on BOLD) and using the TaxCl-approach for detecting taxonomic incongruences (Rulik et al., 2017) revealed high congruence with morphology-based identifications. Among the 2,453 taxa assigned a Linnean binomen based on morphological identifications and "conflict-free" BIN matches, 88.67% (2,138) were unambiguously discriminated by their COI sequences. Another 122 species (4.97%), representing 8.7% of all studied specimens (3,951 individuals), were assigned to more than one BIN, resulting in a total of 255 BINs (Table 1; Appendix S3). For purposes of re-identification, the species in this subset can also be unambiguously assigned to a current species. For 34 of these taxa, the maximum intraspecific variation (maxISP) was <3% (range: 1.1%-3.0%), cases which may reflect either young sibling species or high intraspecific variation arising from secondary contact between phylogeographical lineages.
Another 88 species showed considerably higher divergences with maxISP ranging from 3% to 6% in 48 species and from 6% to 12% in another 40 species, cases that are strong candidates for overlooked cryptic diversity. Most of these cases involved species whose members were assigned to two BINs (112 species), but specimens of nine species were assigned to three BINs and those of one other to four BINs. Another 156 species (6.56%), representing 2.9% of all specimens (1,316 specimens), involved two or more named species that shared a BIN.
Using the Diptera data, we identified a total of 1,403 BINs including representatives of 71 families (1,385 species) within the metabarcoding data set (Appendix S2). Almost one-third (498/1,403) of these BINs belonged to "dark taxa." Figure 2 illustrates examples of presence/absence overviews for the families Muscidae, Cecidomyiidae, Chironomidae and Syrphidae for selected Malaise trap sites.

Among families containing "dark taxa," the percentage of unnamed taxa was inversely correlated with body size (r = −0.41, p = 0.0004) and positively with numbers of species reported from Germany (r = 0.33, p = 0.0037) (Figure 3; Appendix S4).

| DISCUSSION
(Schumann, 2002, 2004, 2010; Schumann et al., 1999). Until now, most of these families, especially the most diverse, have been taxonomically inaccessible because of the lack of specialists. By contrast, within just a few years, this study provided an interim taxonomic identification system for half of the German Diptera fauna. Although half these species still lack a Linnean name, their BIN assignments are useful "taxonomic handles" for work in ecology, conservation biology and other biodiversity research (see Geiger, Moriniere, et al., 2016). The study demonstrates the efficiency of DNA barcoding in the identification of Central European Diptera, reinforcing the results of earlier studies. DNA barcode coverage was nearly complete for many species-poor families (e.g., Megamerinidae, Opetiidae, Phaeomyiidae) known from Germany and the incidence of "dark taxa" in these families was low. Overall, there was a strong inverse relationship between the number of "dark taxa" and average body size: the smaller the average body size of a family, the higher the ratio of "dark taxa" (Figure 3). Among families with the smallest body sizes, our results suggest a higher incidence of cryptic diversity and overlooked species, indicating the number of dipteran species in Germany is likely to be much higher than previously recognized. Among families such as the "Iteaphila group" (Empidoidea; see Meyer & Stark, 2015), Milichiidae and Trichoceridae, DNA barcoding indicates unexpectedly high levels of diversity, as their BIN count is substantially higher than the number of species known from Germany (Schumann et al., 1999). The Cecidomyiidae represent the most impressive example, as we encountered 930 BINs while only 886 species are known from Germany (Table 1; Jaschhof, 2009; Schumann et al., 1999). As such, they represent by far the largest family of Diptera in the studied area. When compared with the other families in Figure 2b, it is clear that the Cecidomyiidae show a lower average interspecific variation, indicating an increased evolutionary rate. As already proposed by Hebert et al. (2016), the extraordinary species number (or BIN number) might be linked to their unusual mode of reproduction, namely haplodiploidy. Here, paternally inherited genomes of diploid males are inactivated during embryogenesis (Normark, 2003).
The phenomenon of haplodiploidy is known from Hymenoptera (Branstetter et al., 2018; Hansson & Schmidt, 2018), another group known to be rate accelerated, but it is largely unstudied throughout Diptera. Despite the need for more study, we conclude the true diversity of Diptera in Germany, Europe and the world has been seriously underestimated, a conclusion reached in several other studies (Erwin, 1982; Hebert et al., 2016; May, 1988; Ødegaard, 2000).
Within the metabarcoded Malaise trap samples collected over just one season in one region of Germany, we identified 1,735 OTUs with a sequence identity higher than 97% to a dipteran record. This result indicates that metabarcode analysis of bulk samples will be a valuable approach for assessing the diversity of Diptera in Germany (Appendix S2). Variation in overall biodiversity between sampling sites as well as annual phenologies of certain taxa can easily be visualized using presence-absence maps (Figure 2). This will be a useful feature for comparison of large data sets and for monitoring beneficial or pest insects (L. A. Hardulak et al. in preparation). Although a third of the OTUs within the metabarcoding data set could not be assigned to a Linnean species, interim names, such as BIN assignments, make it possible to compare sampling sites. OTUs with lower sequence similarities (<97%) to known taxa can be used to track "dark taxa," those species missing from the reference sequence library.
Although such taxa may only be assigned to a family or genus, their  for soil biology (Oliverio, Gan, Wickings, & Fierer, 2018). This approach combines the advantages of DNA barcoding, namely the capacity to identify any life stage, body fragment or even trace DNA in the environment, with the ability of high-throughput sequencers to analyse millions of DNA fragments and thousands of specimens at a time.
The application of this technology to biodiversity assessments will certainly enable species surveys at larger scales, shorter time and lower costs compared with classical morphological approaches (Douglas et al., 2012;Hajibabaei et al., 2011;Ji et al., 2013;Taberlet, Coissac, Pompanon, Brochmann, & Willerslev, 2012). The ability to upscale biomonitoring projects is crucial, as is the need to generate biodiversity data fast and with less dependence on often unavailable taxonomic experts. Additionally, data generated by ongoing metabarcoding studies, such as from annual national biomonitoring projects, can be combined and reanalysed, producing recursively more comprehensive species lists, when new reference sequences become available or when taxonomic annotations have been improved. While biomonitoring studies have traditionally employed small subsets of indicator species, metabarcoding will enable comprehensive assessments of biodiversity because even "dark taxa" can be tracked. Furthermore, metabarcoding can enhance the ability to rapidly assess biodiversity patterns to identify regions that are of most significance for conservation.
Although this project aimed to develop a comprehensive DNA barcode library, resource constraints meant that only half the specimens sorted to a family or better taxonomy could be analysed. It is certain that many species and genera currently absent from the reference library remain within this sorted material, making the remaining samples a valuable resource for future extension of the reference library. Our work has also highlighted the potential of DNA barcoding and metabarcoding to aid efforts to conserve the world's fauna.
Because these technologies greatly enhance our ability to identify, and thus conserve, biodiversity, they should be pursued vigorously.
As our study has provided several thousands of voucher-based DNA barcode records, we invite the global community of dipteran taxonomists to improve identifications for the many "dark taxa" encountered in our study by identifying these vouchers using reverse taxonomic approaches.
The present study represents an important component of a decade of work directed toward creating a comprehensive DNA barcode library for German animal species. Because Diptera represents the largest and taxonomically most challenging insect order, they have received less attention than other orders (e.g., Lepidoptera, Coleoptera, freshwater orders) with lower species richness and more taxonomic expertise. Our work on Diptera has not only confirmed that this order is extremely species-rich, but also that several of its most diverse families include a large proportion of "dark taxa." The present study represents a cornerstone for subsequent research on these unexplored groups of Diptera. This paper pres- is labour intensive and time consuming, especially for taxonomically challenging taxa.
Our study presents results from one of the most comprehensive DNA barcoding projects on Diptera, a megadiverse and almost certainly the most diverse insect order. Our results strongly support the conclusion that DNA barcoding will enable the discovery and identification of most dipteran taxa. Some cases of low interspecific variation were observed in the Syrphidae, Tachinidae and Calliphoridae where additional markers may be needed for species identification (Haarto & Ståhls, 2014; Nelson et al., 2012; Pohjoismäki et al., 2016; Whitworth et al., 2007). However, in most cases, there was congruence between BINs and species defined by traditional morphological methods, supporting the use of DNA barcoding as a species identification tool for Diptera. This conclusion and the finding that many of the species we encountered represent "dark taxa" indicate that DNA barcoding will speed the discovery of genetic entities that will eventually gain recognition as biological species. Our data release aims at making these results accessible to the scientific community through a public data portal so they will be available for taxonomic research, biodiversity studies and barcoding initiatives at national and international levels.
In summary, the application of DNA barcoding enabled a comprehensive assessment of German Diptera, including several highly diverse families, which would otherwise have been excluded due to a lack of taxonomic expertise. By selecting morphospecies from the pool of specimens collected by the year-long deployment of Malaise traps in ecosystems ranging from alpine to lowland settings, we constructed a reference library for most dipteran families known from Germany. Due to the diversity of sampling sites, we encountered a wide range of taxa from microendemics to wide-ranging generalists with varied seasonal phenologies. We emphasize that DNA barcoding and the resultant barcode reference libraries provide an easy, intuitive introduction to molecular genetics, an approach accessible to undergraduate students in a way that genome sequencing is not. Because DNA barcoding workflows have been implemented in many laboratories around the world and because current primer sets reliably generate amplicons, this method is ideal for educational purposes. Democratization of the method, the analytical tools and data through the BOLD database (Ratnasingham & Hebert, 2007) further facilitates its use in real-world situations. The approach has the additional advantage of allowing students to not only work with "real organisms," but also to solve long-standing taxonomic puzzles. The latter work leads students to probe the historical literature, to revel in past expeditions in search of type locations or type material, and potentially to end the chase by describing a new species.
However, it is critical that senior taxonomists and professors recognize these possibilities and encourage their students to embrace this approach, as it offers such a clear solution to the taxonomic impediment.
Germany has a tradition of more than 250 years of entomological research, and the number of Diptera species recorded is the highest for any European country comprising almost half of the European fauna. Despite this long effort, knowledge of its Diptera fauna must be regarded as fragmentary. In accordance with the species accumulation curve presented by Pape (2009) for the British Isles, additional species were revealed from current collecting efforts for practically every species-rich family. Recording "new" species is slowed by the lack of experts for many of these families as well as by the lack of up-to-date identification keys. A particularly important result of our study is that the estimated number of dipteran species in Germany is certainly much higher than formerly thought. High proportions of unrecorded species were evident for the Agromyzidae, Anthomyiidae, Cecidomyiidae, Ceratopogonidae, Chironomidae, Chloropidae, Phoridae, Sciaridae and Sphaeroceridae, and to a lesser extent for the Empidoidea, Limoniidae, Mycetophilidae and others.
Further studies point to an enormous underestimation of the species diversity in the Cecidomyiidae (Borkent et al., 2018). Although our data do not allow for an accurate projection of total species numbers, it seems quite likely that this single family contains thousands of unrecorded species in Germany.

ACKNOWLEDGEMENTS
We express our extreme gratitude to the taxonomists, citizen sci-

DATA AVAILABILITY
All specimen data have been made publicly available within the BOLD workbench; a DOI for the dataset has been added.