New mitochondrial primers for metabarcoding of insects, designed and evaluated using in silico methods

Abstract Insect metabarcoding has been mainly based on PCR amplification of short fragments within the “barcoding region” of the gene cytochrome oxidase I (COI). However, because of the variability of this gene, it has been difficult to design good universal PCR primers. Most primers used today are associated with gaps in the taxonomic coverage or amplification biases that make the results less reliable and impede the detection of species that are present in the sample. We identify new primers for insect metabarcoding using computational approaches (ecoprimers and degeprime) applied to the most comprehensive reference databases of mitochondrial genomes of Hexapoda assembled to date. New primers are evaluated in silico against previously published primers in terms of taxonomic coverage and resolution of the corresponding amplicons. For the latter criterion, we propose a new index, exclusive taxonomic resolution, which is a more biologically meaningful measure than the standard index used today. Our results show that the best markers are found in the ribosomal RNA genes (12S and 16S); they resolve about 90% of the genetically distinct species in the reference database. Some markers in protein‐coding genes provide similar performance but only at much higher levels of primer degeneracy. Combining two of the best individual markers improves the effective taxonomic resolution with up to 10%. The resolution is strongly dependent on insect taxon: COI primers detect 40% of Hymenoptera, while 12S primers detect 12% of Collembola. Our results indicate that amplicon‐based metabarcoding of insect samples can be improved by choosing other primers than those commonly used today.

damselflies) (Rach et al., 2017). However, the reference libraries for these alternative markers are small in comparison with those for COI.
In recent years, new high-throughput sequencing (HTS) platforms have opened up the possibility of analysing the taxonomic composition of entire environmental samples in a single analysismetabarcoding. This has created a lot of excitement in the biodiversity research community. Unfortunately, the Folmer region of COI is not suitable for HTS platforms because it is too long. Therefore, special mini-barcodes (short fragments, of variable length and position, within the Folmer region) have been developed for metabarcoding. Hajibabaei et al. (2006) and Meusnier et al. (2008) showed that minibarcodes from 135 bp up to~450 bp can provide the same degree of taxonomic discrimination as the whole 658 bp Folmer region.
Mini-barcodes have the additional advantage that they more easily can be amplified when the DNA is damaged or fragmented, which is common in environmental DNA samples (Taberlet, Coissac, Hajibabaei, & Riesenberg, 2012;Yu et al., 2012).
Ideally, a marker used for metabarcoding should have highly conserved sequence stretches that can be used for the design of "universal" primers amplifying all taxa of interest in the sample and that flank a highly variable region that can be used for species discrimination.
Unfortunately, being a protein-coding gene, COI is highly variable in the third position of most codons due to the redundancy of the genetic code, making it quite challenging to design primers for metabarcoding with good taxonomic coverage (Deagle, Jarman, Coissac, Pompanon, & Taberlet, 2014). Inevitably, there will be a varying number of mismatches between the primers and the templates in the sample, translating into differential affinity of the primers for different templates. The primer-template pairs with fewer mismatches will be amplified more easily in each cycle, potentially resulting in extreme overrepresentation of these sequences in the final PCR product. These biases in "universal" COI primers have been documented empirically in several studies. Hajibabaei, Shokralla, Zhou, Singer, and Baird (2011) and Brandon-Mong et al. (2015) reported biases with Lep-F1/Lep-R1 primers (Hebert, Stoeckle, Zemlak, & Francis, 2004), Yu et al. (2012) showed that the Folmer primers (Folmer et al., 1994) fail to amplify many species of Hymenoptera, and Clarke, Soubrier, Weyrich, and Cooper (2014) showed that several primer pairs are associated with amplification bias resulting in overrepresentation of Diptera and Lepidoptera sequences. The use of degenerate primers can reduce the bias to some extent (Clarke et al., 2014;Elbrecht & Leese, 2017). Morinière et al. (2016) studied the amplification success of four COI primer pairs with different degrees of degeneracy for different taxonomic groups taken from a Malaise trap sample. The amplification success was strongly dependent on degeneracy, varying from 5% for primers with no degeneracy to 49% for highly degenerate primers.
The amplification bias of COI primers has resulted in several metabarcoding studies exploring alternative markers. It is common, when working with a very broad taxonomic scope (up to phylum), to use a very conserved but easily amplified marker, such as the nuclear small subunit ribosomal RNA (rRNA) gene (18S) (Pawlowski et al., 2012). Several examples of 18S metabarcoding can be found, mostly involving soil/sediment biodiversity assessment (Drummond et al., 2015, soil;Brannock &Halanych, 2015 andDell'Anno, Carugati, Corinaldesi, Riccioni, &Danovaro, 2015, meiofauna) and eukaryotic microorganisms (Pawlowski et al., 2012). For vertebrates, especially fishes, the most used alternative marker is the mitochondrial small subunit rRNA gene (12S) (Furlan, Gleeson, Hardy, & Duncan, 2016;Port et al., 2015;Valentini et al., 2016). Performance of the mitochondrial large subunit rRNA gene (16S) has been tested in insect metabarcoding with promising results. Using in silico analyses, Clarke et al. (2014) showed that 16S mini-barcodes of <200 bp identified just slightly fewer species than mini-barcodes of COI of the same length when applied to a set of 315 species (constituting 264 genera and 23 orders) of insects, while the taxonomic coverage (no. of species successfully amplified) was 75%-90% with 16S vs. only 50% with COI. However, longer COI mini-barcodes increased the taxonomic resolution between closely related species to almost 100%, while the resolution of 16S peaked at 85%. Remarkably, taxonomic coverage and taxonomic resolution of 16S were consistent through 11 analysed insect orders, while the best COI taxon coverage was just above 50% within Diptera and Lepidoptera, and only between 0% and 47% within other insect orders.
Similar results were obtained by Elbrecht et al. (2016): 16S amplified more species and more equally through orders, thus enhancing biomass estimation. They state that if the goal is to identify the species present in the sample, COI is still the best choice due to the availability of extensive public reference databases, but when the aim is to assess the biodiversity in numbers-rather than in terms of species names-16S would be a better choice. Additionally, 16S metabarcoding has the advantage of amplicons not being mistaken with nuclear pseudogenes or Wolbachia, as can be the case with COI (Clarke et al., 2014). Deagle et al. (2014) suggested that the best strategy to follow in the future would be to build local databases of several markers and conduct metabarcoding studies with these different markers simultaneously rather than focusing on a single universal marker (COI).
Clearly, there is a need for more systematic search for optimal metabarcoding markers. The increased interest in sequencing mitochondrial genomes in recent years, and the development of sophisticated software for primer design and evaluation, has opened up new and faster ways of tackling this task. Here, we take advantage of these opportunities in searching for optimal metabarcoding primers. Our results show that most previously published primers perform poorly in silico compared with the optimal primers or primer-pair combinations identified here and that a combination of at least two markers can significantly increase the species detection compared to analyses using a single marker.

| Overview
In short, we compiled all publicly available insect mitogenomes and then applied two different primer design software packages: | 91 ECOPRIMERS and DEGEPRIME, to identify suitable markers for insect metabarcoding. ECOPRIMERS https://git.metabarc oding.org/obitools/ecoprimers/wikis/home) is part of the OBITOOLS bioinformatics package (Boyer et al., 2016;https://git.metabarc oding.org/obitools/obitools/wikis/home). Given amplicon length interval and maximum number of errors between primer and template (with the possibility of constraining those errors not to be in the 3′ end of the primer), ECOPRIMERS creates a list of the best primers from a set of sequences or genomes in the dedicated ECOPCR database format. DEGEPRIME (Hugerth, Wefer, et al., 2014; https://github.com/EnvGen/DEGEPRIME) is a program for semi-automated design of degenerate primers, originally developed for microbial 16S rRNA metabarcoding but applicable more generally.
Given a maximum degeneracy (number of unique primer sequence combinations) and primer length, DEGEPRIME creates a list of the best primers for an input alignment.
After identifying a set of suitable markers for insect metabarcoding using this approach, we then evaluated their performance against that of previously published ones, focusing on taxonomic coverage and resolution of species in the mitogenome reference database. We also assessed the performance of different primerpair combinations.

| Data preparation
Two sets of complete mitochondrial genomes of Hexapoda were created by downloading all accessible entries from GenBank, the first accessed in October 2015 and the second accessed in September 2016 (Supporting Information Figure S1; Tables S1 and S2). The first set (D1), comprising a total of 1,138 genomes (corresponding to 801 species, 607 genera, 268 families and 34 orders; 75 species (9%) were represented by more than one sequence), was converted into ECOPCR database format (for use with ECOPRIMERS) and, in parallel, split into different FASTA files, each containing either a protein-coding gene or an rRNA gene, using GENEIOUS 8.1.7 (https://www.geneious.c om, Kearse et al., 2012) (for use with DEGEPRIME). The protein-coding genes ATP8, ND2 and ND6 could not be extracted by aligning them to a reference sequence using the default settings of GENEIOUS due to high variability. Consequently, they were not considered for designing primers that could amplify a wide range of insect taxa. The rest of the extracted genes (12S, 16S, ATP6, COI, COII, COIII, ND1, ND3, ND4, ND4L and ND5) were aligned using MAFFT v7.266 (Katoh & Standley, 2013). The second set (D2), comprising a total of 1,600 genomes (corresponding to 1,081 species, 766 genera, 311 families and 34 orders; 948 of the species (12%) were represented by more than one sequence) was converted into ECOPCR database format. D1 was used for primer design and D2 for primer evaluation. To assess the accuracy of the taxonomic annotation of the GenBank entries, the complete COI sequence of each entry in D1 was submitted to BOLD database using custom scripts. The GenBank species identification for each sequence and the identity provided by BOLD were then compared.

| Design of new primers
Primer design was done using data set D1. The script TrimAlignment.pl (included in the DEGEPRIME package) was run over the sets of extracted genes, trimming away alignment columns with more than 10% of the sequences having gaps (-min 0.9). For the DEGEPRIME analysis, we used a primer length of 18 bp (-l 18) and maximum degeneracy of 12-and 216-fold (-d 12/-d 216). Maximum degeneracy was set low (12-fold) to find primers with high specificity and low risk of forming primer dimers, and higher (216-fold) to explore results from the other end of the trade-off between unspecificity/primer dimers and higher sequence matching. These analyses will be referred to as DEGEPRIME-d12 (12-fold degeneracy) and DEGEPRIME-d216 (216-fold degeneracy). Entropy for each potential primer site is calculated by DEGEPRIME as ∑p i log 2 p i , where p i is the frequency of sequence i, where sequence i has the same length as the primer. After finding primer sites with low entropy, we identified the best primer pairs for each mitochondrial gene amplifying a sequence of suitable length for metabarcoding (100-500 bp long). ECOPRIMERS was run over the ECOPCR database-formatted dataset with the options of amplicon length of 50-500 bp (-l 50 -L 500), no mismatches in at least 70% of the species (default) and up to three mismatches in 90% of the species (default), none of them in the 3′ end of the primer (-3 3) and considering the sequences as circular (since the mitochondrial genome is circular) (-c).

| In silico PCR
The primers found in the previous step, plus already published primers for barcoding and metabarcoding of insects targeting COI, 16S and CytB, were subjected to in silico PCR using D2 and ECOPCR in the OBITOOLS package (Ficetola et al., 2010; https://git.metabarcoding. org/obitools/ecopcr/wikis/home). Compared to D1, D2 contains approximately 500 more sequences and 200 more species, thus providing a good test of the ability of the primers to detect and discriminate new taxa. Stringent PCR conditions (corresponding to high annealing temperature, resulting in higher specificity and lower amplification bias) were emulated by setting the ECOPCR options such that we allowed no mismatches (-e 0) and amplicon length ±10% of the expected length for each primer pair. Nine insect orders within D2 were also amplified separately. Specifically, we targeted those insect orders that are most abundant in Malaise traps (Hymenoptera, Diptera, Lepidoptera, Hemiptera) and in pitfall traps (Coleoptera, Collembola, Blattodea, Orthoptera and Thysanoptera).

| Measuring the performance of markers
The performance of each marker, consisting of a primer pair and its associated amplicon, was assessed using different indices measuring taxonomic coverage and resolution. Taxonomic coverage (B C ) is the proportion of species in the data set that are amplified by the given primer set (Ficetola et al., 2010). Taxonomic resolution (B S ) is the proportion of species whose amplified sequences are unambiguously separated from the remainder of the set at a given similarity threshold (Ficetola et al., 2010). A species is considered unambiguously identified when it does not present any synonymy conflict, that is, when no cluster of barcodes from the species contains sequences of other species. However, this way of measuring resolution does not penalize the presence of two or more clusters with the same species label. This is not a problem when the analysis is reference-based; that is, the downstream diversity analysis aggregates split clusters based on taxonomic annotations in a reference database. However, when such a reference database is missing and the analysis is focused on molecular operational taxonomic units (MOTUs), this measure artificially inflates taxonomic resolution as the similarity threshold increases (Riaz, 2011). To address MOTU-based scenarios, we propose an alternative measure of taxonomic resolution, which we refer to as exclusive taxonomic resolution (B E ), in which the presence of two or more clusters sharing the same species label is considered as an ambiguity (even when each of these clusters only contains sequences with the same species label).
It is important to note that B S and B E are calculated over the set of sequences amplified by the primer pair, not over the original set, which can potentially lead to misinterpretations in case of high values of B S (or B E ) and low values of B C . For instance, if a primer pair amplifies 60% of the species in a mixture and the generated barcodes are able to discriminate between 87% of those amplified species, one might get the impression of having a good primer pair, while the reality is that such a primer pair is incapable of detecting almost half the species in the sample (87% of 60% is 52%). To get a general idea of how many species can be detected in a sample, we propose the effective taxonomic resolution (ETR) index, defined as the product between B C and B E .
For a formal definition of these measures, let S U be the set of species occurring in a single, homogeneously labelled cluster (a uniquely resolved or unambiguously identified species) and let S R be the set of species occurring in a homogeneously labelled cluster (regardless of whether there are more clusters with the same label). Let S A be the set of species among the amplified sequences, and let S be the total set of species in the database. With standard set theory notation, where |S| denotes the number of (unique) elements in a set S, we can then define the indices as follows: An important property of B E is that it varies with the similarity threshold used for clustering, peaking at a value (the "barcoding gap") that is characteristic for the marker. If the similarity threshold is too low, many closely related species will not be distinguished; if it is too high, variable species will lower B E . To identify this peak in B E , it is important to have many species represented by multiple sequences.
Our reference databases have many singletons and only a few species represented by multiple sequences. To facilitate the identification of the optimal similarity threshold under such circumstances, we introduce an alternative definition of exclusive taxonomic resolution, B 0 E , which is defined relative to clusters and not species. Thus, it penalizes oversplitting by counting the additional clusters generated by splitting the species that are well represented in the database into more than two clusters, using this to compensate for the fact that we cannot detect oversplitting in the many singleton species.
At peak resolution (the barcoding gap), B E 0 should be a reasonable approximation of B E measured over a database where all species are represented by many sequences. For a formal definition, let C A be the total set of clusters produced during the clustering of the amplified sequences. This set is composed of three subsets: C U is the set of clusters with a single, unique species label, C M is the set of clusters with more than one label, and C N is the set of clusters with a single but not unique label (i.e., the label is shared with other clusters). Then, we define the alternative index B

| Combinations of primer pairs
Two approaches were used to find good combinations of markers (primer pairs and associated amplicons) for metabarcoding. First, we simply examined pairs of markers that were identified in the previous steps as having good performance when used on their own (independent approach). Second, we considered the best marker identified for each gene in the previous steps, and then searched for the best marker for the fraction of the data set that the first marker was unable to detect (residual approach).
To measure the success of a pair of markers, we looked at the total number of species resolved by at least one of the two markers relative to the total number of species in the database. We regarded this as the total ETR of the two markers, ETR T . The contribution of each marker was then be teased apart by focusing on the species that were uniquely resolved by one marker but not the other. Formally, let S ðiÞ U be the set of species uniquely resolved by marker i, with similar index notation for other species sets. Then, we define the total ETR of two markers i and j as ETR ði;jÞ and the uniquely contributed taxonomic resolution of marker i as.
Note that S X \S Y refers to the elements occurring in S X but not in S Y . Finally, we define the redundant taxonomic resolution of two markers i and j, that is the species that are unambiguously resolved by both markers, as ETR ði;jÞ Note that these indices are additive, such that Primer quality indices and other definitions are summarized in

| Computations
The B C index values were computed using the command ecotaxstat from the OBITOOLS package. The B S values were calculated using the command ecotaxspecificity with a similarity threshold varying from 95% to 100% in steps of 1%. The same similarity threshold range was used with the algorithm UCLUST (implemented in the program USEARCH, Edgar 2010), and then, custom scripts were used to count the number of clusters with a single, unique species label, |C U |, and the clusters with mixed labels or with a single but not unique label, |C M | and |C N |, which were then used to compute B E 0 and all variants of the ETR index. All scripts used for the study are available at https://github.com/metagusano/new_primers_insects.

| Primer design
Potential primer sites in the rRNA genes (12S and 16S) have significantly lower entropy than the best primer sites in the protein-coding genes ( Figure 1, Supporting Information Figure S2). Both rRNA genes offer primer sites with entropy well below 4, which is rare in the other genes. Only primer pairs matching more than half of the D1 sequences were considered for further examination. For DEGEPRIME-d12, no primers filling this requirement were found for ATP6, ND1, ND3, ND4 or ND4L, but a large number of potential primer pairs were identified in the remaining genes (Supporting Information Table S3). The 12S primers F1 and F2, as well as R1 and R2, are very closely located one to each other and partially overlap with SR-J-14199-SR-N-14594 published by Kambhampati and Smith (1994).
All five pairs amplify fragments of the 16S gene.

| Measuring the quality of primer pairs
Among all primer pairs analysed (found using DEGEPRIME, ECOPRIMERS or previously published), the ones with the highest coverage (B C ) for each gene were used to test the differences between different ways of measuring resolution (B S and B E 0 ). For all markers, taxonomic resolution measured in the standard way (B S ) increased monotonically as the similarity threshold increased, reaching its maximum at 100%  Figure 3 shows the performance of previously published markers (primer pair and associated amplicon) and the best markers designed with DEGEPRIME-d12, DEGEPRIME-d216 and ECOPRIMERS. Detailed results for all primers are given in the supplementary material (Supporting Information Tables S3-S5, S8 and S9; see also separate csv files S12-S16).

| Performance of marker pairs
When combining two markers, the best results were obtained with COI + COII (ETR = 0.89), and 12S + COI or 16S + COI (both combinations with ETR = 0.88), followed by 12S + 16S, 12S + COII, 12S + CytB and COI + CytB (ETR = 0.85), and 16S + COII, 16S + CytB and COII + CytB (ETR = 0.84) (Figure 5a, Supporting Information Tables S10 and S11). For the three best marker pairs, which all F I G U R E 3 Taxonomic coverage (green), exclusive taxonomic resolution (blue) and effective taxonomic resolution (orange) of examined primers. Top left: newly designed primers with ECOPRIMERS and published primers with no degeneracy. Top right: newly designed primers with DEGEPRIME and published primers with degeneracy lower or equal to 12-fold. Bottom: newly designed primers with DEGEPRIME and published primers with degeneracy higher than 192-fold. Among the published primers, only those with B C > 0.15 are shown in this graph [Colour figure can be viewed at wileyonlinelibrary.com] involve COI, and for COI + CytB, the ETR U of COII, 12S, 16S and CytB is at least double that of COI. Outside these cases, the ETR U values of the two markers of the pair are more balanced.
Optimizing the primers for the second marker by designing them only over the set of sequences missed by the first marker (Figure 5b, Supporting Information Figure S3; Table S9) produced similar results to those obtained by combining the original, independently designed markers (Figure 5a).

| DISCUSSION
The rapid pace in the publication of mitochondrial genomes, and the improvement of in silico tools for primer design and evaluation, opens up new possibilities in the quest for optimal metabarcoding protocols. Even though the performance of the primers found using in silico approaches must still be validated experimentally, these methods are clearly here to stay. The current activity of mitogenome sequencing is well illustrated by the difference in size between the two data sets used for this study, downloaded only 11 months apart and indicated an annual increase in the publicly available mitogenomes by approximately 50%. The data set used by us to design primers (D1) was downloaded in September 2015 and contains mitochondrial genomes of more than 800 species. In a previous study published only a year earlier, Clarke et al. (2014) had access to data from only 315 insect species (excluding Collembola, Diplura and Protura), less than half of the data set used here.
One potential problem with the computational approach is that publicly available data do not necessarily reflect the composition of real environmental samples. For instance, Malaise trap samples tend to be dominated by Diptera and Hymenoptera specimens, but these orders are less frequently targeted in mitogenome sequencing projects than more popular groups like Lepidoptera, Coleoptera and  Figure S3) [Colour figure can be viewed at wileyonlinelibrary.com] some of the minor insect orders. However, the compositional biases are perhaps less problematic than one might fear. Even though they are clearly underrepresented, Diptera and Hymenoptera are still reasonably well represented in the databases we used (Supporting Information Figure S1).
At a finer taxonomic scale, there may also be important differences between the databases and real environmental samples. For instance, we observed that a few groups, like Drosophila, are very well represented in the mitogenome databases, with numerous sequences from the same or very closely related species, while most groups are represented by a diversified selection of mitogenomes from different species. In an environmental sample, such as a Malaise trap sample, one might expect a different type of distribution of sequence abundance and similarity. Nevertheless, given the size of the publicly available databases, we do expect our results to be at least indicative of the performance of the primer pairs in metabarcoding of real samples.
In terms of in silico tools, we compared results obtained with ECOPRIMERS, a popular primer design software for metabarcoding studies, with those from DEGEPRIME. Although DEGEPRIME has been widely used for primer design in the field of microbial metabarcoding, its use in eukaryotic studies has so far been restricted to unicellular organisms (Hu, Karlson, Charvet, & Andersson, 2016;Hugerth, Muller, et al., 2014;Parada, Needham, & Fuhrman, 2016). However, DEGEPRIME presents a series of advantages over ECOPRIMERS. For instance, it gives the user more control over the parameters that are important in finding adequate primers. It also allows the design of degenerate primers, making it possible to find primers that amplify a larger proportion of the sequences in the database at stringent PCR conditions. In our study, the primers found using ECOPRIMERS did not nearly perform as well as those found with DEGEPRIME, primarily because they were not degenerate.
In the ideal case, a primer pair used for metabarcoding should amplify the desired DNA sequence of all representatives of the target group present in the sample, and bioinformatic processing of these sequences would then be able to identify the species (and their abundance). To do this, the selected DNA sequence should have highly variable regions that are able to discriminate between closely related species, flanked by regions that are conserved across the target group so that they form suitable targets for PCR primers (Ficetola et al., 2010). These features should also be present in a short fragment to fit current sequencing platforms and to allow analysis of degraded DNA (Taberlet et al., 2012).
How can these properties be quantified? Ficetola et al. (2010; see also Riaz, 2011) proposed the indices B C and B S for taxonomic coverage and taxonomic resolution, respectively. These indices have proven useful and have been widely employed (Bellemain et al., 2010;Clarke et al., 2014;Epp et al., 2012). However, B S increases monotonically with the similarity threshold, making it useless in finding the barcoding gap (Figure 2a; Hebert et al., 2003, Meyer & Paulay, 2005. Thus, B S fails to discriminate between inter-and intraspecific genetic variation and does not consider the haplotype diversity within species that barcoding (and, by extension, metabarcoding) should assume. Only the presence of a rich reference database would allow safe identification of the clusters that belong to the same species, and even, the COI reference databases are still not complete enough for this in most cases. Therefore, B S is not an adequate measure for comparing the performance of metabarcoding markers.
In theory, the exclusive taxonomic resolution index we propose here (B E ) solves these problems. Given a rich reference database covering intraspecific variation in all taxa, it should allow us both to identify the barcoding gap and to compare the performance of different metabarcoding markers when species circumscriptions cannot be deduced from a reference database. However, we found that B E did not decrease rapidly enough at high similarity values to allow safe identification of the barcoding gap using our database. The reason is apparently the small number of species for which any intraspecific variation is covered in our database. Therefore, we also propose the alternative definition of the index, B E 0 , which measures resolution relative to clusters and not to species in the database.
This results in the index being increasingly penalized as the few abundantly represented species are split into smaller and smaller clusters when the similarity threshold value is increased beyond the barcoding gap. This version of the index allowed us to easily find the barcoding gap ( Figure 2). Our modified index should lag slightly behind B E in the decrease in resolution seen beyond the barcoding gap because of the relative shortage of high amounts of intraspecific diversity in the database. However, the decrease in B E 0 should be faster and more dramatic at high threshold values than for B E , as is also evidenced by our plot (Figure 2). At the barcoding gap, B E 0 should be a good approximation of B E .
We asked ourselves why we never obtained exclusive taxonomic resolution over 0.90 in our analyses, even for COI markers that supposedly should provide almost perfect taxonomic resolution. The fact that most (67%-86%) of the unresolved species were shared between marker pairs (Supporting Information Table S7) suggests that most of these species are impossible to circumscribe correctly using the mitochondrial data in the database. This could be because the taxonomic annotation in the database is incorrect, or because the mitochondrial genetic variation within species is unusually small or high in these species. In either case, it seems likely that the taxonomic resolution values presented here are conservative estimates of the true values.
Arguably, the best metabarcoding primers we found are the 12fold degenerate primer pairs targeting 16S and 12S. It was necessary to increase the degeneracy level considerably to obtain similar performance for the primers targeting protein-coding genes. Clearly, this reflects the superiority of rRNA genes as metabarcoding markers because of the presence of continuous, highly conserved regions allowing the design of universal primers. The variability seen at third codon positions in protein-coding genes makes it more difficult to find good primers. There is also an increased risk that new sequence variants not considered in the design phase will show up in environmental samples and that they will not be amplified due to primer mismatch.
One might have expected the performance of rRNA markers to be negatively affected by the difficulty of separating closely related species based on a conservative sequence, but our results indicate that this is a not a major issue when the amplicon is long enough to allow for several nucleotides of difference. The exclusive taxonomic resolution of the best rRNA markers is very similar to that of the best protein-coding markers. Closely related species have more similar rRNA marker sequences, but they are still distinct.
The situation appears to be different for nuclear rRNA markers.
The rRNA sequences of the large and small ribosomal subunits (18S and 28S) have conserved regions that allow the design of primers with good coverage, like the mitochondrial rRNA markers, but the resulting amplicons also tend to be conservative, resulting in very low taxonomic resolution (Wangensteen et al., 2018). Among nuclear markers, the internal transcribed spacer (ITS) may be the most promising for metabarcoding. It is known to provide good taxonomic resolution and has the advantage of being flanked by conserved regions (subunits 5.8S and 28S), so primers can be satisfactorily designed (Schoch et al., 2012 Multilocus or multigene approach has been used in several metabarcoding studies as a solution to fill the gaps in the detection of species by a single marker that is not able to amplify or discriminate between certain species (Cowart et al., 2015;Drummond et al., 2015;Shaw et al., 2016). Alberdi, Aizpurua, Gilbert, and Bohmann surveys, however, the multilocus approach has been less frequently applied (Alberdi et al., 2017;Elbrecht et al., 2016).
Even though the use of two primer pairs can increase taxonomic resolution significantly, it must be borne in mind that the individual HTS reads of the two markers cannot be combined, at least not using standard protocols. Thus, advanced bioinformatic processing will be needed to synthesize results across analyses using two separate markers. Interestingly, such analyses could help improve the quantification of the abundance of different species if amplification biases differ between markers.
A key finding of this study is that the best metabarcoding markers for insects are not found in the COI gene. The rRNA markers offer much broader taxonomic coverage at low levels of primer degeneracy and under stringent PCR conditions, while still resolving most species that can be separated genetically. At high levels of primer degeneracy, markers in the protein-coding genes can compete in performance with the rRNA markers, but under such conditions, the best COI markers are often outperformed by markers in other genes like COII and CytB. It is true that we currently lack reference data for these alternative markers. However, there is much to suggest that recent advances in the sequencing of mitochondria, such as mitochondrial metagenomics (Crampton-Platt, Yu, Zhou, & Vogler, 2016, Cicconardi et al., 2017Zhou et al., 2013) and long-range PCR for whole mitochondrial genome amplification (Deiner et al., 2017), will change this in the near future. As the public mitogenome databases grow in size, so do the reference libraries for all mitochondrial markers, as the existing COI reference data can be used to provide reliable taxonomic annotation of entire mitochondrial genomes, where such annotation is not available from other sources. HTS techniques can also now be used to facilitate the generation of reference libraries for custom markers during local or national barcoding campaigns. Therefore, we think metabarcoding projects should seriously consider the use of markers outside of COI as a complement to or replacement of COI-based analyses.

ACKNOWLEDGEMENTS
We are thankful to Niclas Gyllenstrand for asking the question that

DATA ACCESSIBILI TY
All the ecopcr files, as well as the two mitochondrial genomes data sets D1 and D2, can be accessed at https://zenodo.org/record/ 1326419#.W2WVr62B1E5.