The topological nature of tag jumping in environmental DNA metabarcoding studies

Metabarcoding of environmental DNA constitutes a state‐of‐the‐art tool for environmental studies. One fundamental principle implicit in most metabarcoding studies is that individual sample amplicons can still be identified after being pooled with others—based on their unique combinations of tags—during the so‐called demultiplexing step that follows sequencing. Nevertheless, it has been recognized that tags can sometimes be changed (i.e., tag jumping), which ultimately leads to sample crosstalk. Here, using four DNA metabarcoding data sets derived from the analysis of soils and sediments, we show that tag jumping follows very specific and systematic patterns. Specifically, we find a strong correlation between the number of reads in blank samples and their topological position in the tag matrix (described by vertical and horizontal vectors). This observed spatial pattern of artefactual sequences could be explained by polymerase activity, which leads to the exchange of the 3′ tag of single stranded tagged sequences through the formation of heteroduplexes with mixed barcodes. Importantly, tag jumping substantially distorted our data sets—despite our use of methods suggested to minimize this error. We developed a topological model to estimate the noise based on the counts in our blanks, which suggested that 40%–80% of the taxa in our soil and sedimentary samples were likely false positives introduced through tag jumping. We highlight that the amount of false positive detections caused by tag jumping strongly biased our community analyses.

To facilitate the simultaneous analysis of multiple samples during sequencing, each DNA amplicon from individual samples is labelled (tagged) by adding a pair of short but unique oligonucleotide sequences (hereafter referred to as tags) to the 5′ end of both primers in a PCR reaction (Binladen et al., 2007), allowing the tagged amplicons to be pooled for sequencing. These tags leave specific marks in the amplicons that can be identified during the later bioinformatics (demultiplexing) step and used to backtrack the sequence to its original sample. This tagging technique is a well stablished approach that has been applied on high throughput screening platforms (Quéméré et al., 2013;Shehzad et al., 2012).
Tags are assumed to be immobile, but it has been shown that tags can indeed switch during the library preparation step (Carøe & Bohmann, 2020;Esling et al., 2015;Schnell et al., 2015). At this time, amplicons go through an end-repairing reaction, after which the Illumina indexes and adapters are ligated to both ends of the sequences (Meyer & Kircher, 2010). During the library preparation stage, artefactual (nonoriginal) sequences carrying a different combination of tags can be generated by tag jumping (Schnell et al., 2015). In fact, studies have detected amplicons with incorrect tags following processing through the Illumina sequencing platform, which can lead to amplicons being attributed to incorrect samples if the tag combination formed matches that of an actual sample (Esling et al., 2015;Schnell et al., 2015). The result of this tag jumping is a sequence being misidentified as belonging to an amplicon from another sample, causing an effect similar to sample cross-contamination. The tag jumping described above should not be confused with "index hopping", an issue previously presented by van der Valk et al. (2020), who showed that indexes added to amplicons can move between libraries during sequencing and affect the results of studies with low DNA yields (van der Valk et al., 2020). Nevertheless, similar to index hopping, tag jumping can introduce sequences that artificially inflate species diversity in samples.
In this study we evaluate the latter of the suggested tag jumping explanations illustrated in Figure 1, mediated by a polymerase (usually T4 DNA polymerase) with 3′-5′ exonuclease activity during end repair (Schnell et al., 2015;van Orsouw et al., 2007). These polymerases are commonly used during library preparation (of adapter ligation protocols) for Illumina in the end-repair step (Meyer & Kircher, 2010). The applied polymerase can digest nonmatching DNA nucleotides at the 3′-end of the DNA strand and extend that same end using the complementary strand as a template. During this process, single-stranded tagged sequences of two different amplicons can interact, forming a heteroduplex with mixed tags (van Orsouw et al., 2007). In a heteroduplex, the activity of the commonly used T4 DNA polymerase can switch the 3′ tag at each side, creating artefactual sequences that carry a combination of tags formed by the 5′-tag of each interacting strand (van Orsouw et al., 2007). If the newly formed tag combination is shared by another sample, it cannot be separated from 'true' sample sequences during the demultiplexing step.
Within a tag matrix, where primers are arranged to generate unique tag combinations for all samples using both forward (rows) and reverse primers (column), tag jumping (if occurring as described in Figure 1) is constrained to produce misidentified amplicons along horizontal or vertical lines in the original tags matrix. That is, artefactual combinations carrying the 5′ tag of each strand interacting during the end-repair step will share the same row/column in the matrix as the two interacting strands. If chimera amplification during a second PCR can be excluded as an explanation for tag jumps, artefactual sequences forming "cross-like" patterns in tagging matrices are a unique characteristic of a tag jumping processes mediated by T4 DNA polymerase as described by Schnell et al. (2015). That is, other suggested processes such as cluster bleeding and/or the crosscontamination of primers do not generate cross-shaped patterns related to sample positioning within the tagging matrix. To date, this unique topological signature of heteroduplex-driven tag jumping has not been explored.
In this study, we evaluated topological patterns in four metabarcoding libraries by both inspecting findings of rare taxa (i.e., taxa present in few samples) as well by assessing artefactual sequences detected through our extensive use of sample blanks. Here, the rationale for our focus on rare taxa was that their absence from most samples simplified visualization of eventual cross-like patterns. We applied a topological model accounting for a sample's position in the tagging matrix (column × row) to provide the first quantitative mapping of samples affected by a tag jumping process that behaves as proposed by Schnell et al. (2015). We hypothesized that tag jumping is a process potent enough to substantially alter the findings and outcome of metabarcoding studies applied to soil and sediment samples.

| DNA extraction
We extracted DNA from two sets of samples: First, lake sediments from Lake Grosssee (Flumserberg, Switzerland) in 47°04′43.8" N 9°14'47.1" E at 1619 m.a.s.l. Sediment cores were recovered in 2017 from the central basin of Lake Grosssee with a percussion piston-scoring system (Uwitec, Austria). Cores were split lengthwise and aligned to a composite of 7.73 m as described in Glaus (2018)  For each depth, three subsamples (extraction replicates, 0.5 g of material each) were extracted using the DNeasy Power Soil kit (Qiagen). DNA extractions were carried out in a dedicated ancient DNA laboratory at Umeå University that is isolated from other modern DNA laboratories and has a positive air pressure system accompanied by a HEPA air filter system. We followed protocols to avoid contamination during the extraction process (Paijmans et al., 2019), including using extensive personal protective gear (facemask, gloves, and clean suit). Multiple extraction blanks and control amplifications were processed alongside all samples to screen for potential contamination.

| DNA metabarcoding with tagged primers and sequencing
All prepolymerase chain reaction (PCR) preparations for DNA metabarcoding were performed in the ancient DNA laboratory to minimize possible contamination. On a 96-well plate, tagged primers were arranged in a matrix. We arranged at least one PCR blank (with PCR mix and water, no primer, no DNA) per row and per column to assess the extent of tag jumping. Each extract, including those from extraction blanks, had 2-3 independent PCR replicates (i.e., with individual tag combinations; sequences and number of replicates detailed in Data S1) in the same column of a 96-well plate; all the extracts were randomly distributed on the PCR plate. We also included at least six PCR negatives (with PCR mix, water, and primers, but no DNA) for each plate and one PCR positive with DNA extracted from (3) during the after following library preparation, single stranded sequences interact to form a heteroduplex. (4) during the end-repair step the formed heteroduplex is exposed to T4 polymerase. The 3′-5′ exonuclease activity of this polymerase can splice the non-matching 3′ ends. (5) the 3′ end is synthesized again matching the complementary strand. The switching of tags between the ends results in a new tag combination for the heteroduplex (6). The sequences that formed will be identified as a sample if the new combination match that of an actual sample. This artefactual sample will consist of strands derived from two different samples. mitochondrial DNA) included amplicons amplified by the trnL g/h (Taberlet et al., 2007), MamP007F/R (Giguet-Covex et al., 2014), 16Smam-1/−2 (Taylor, 1996), and EW-B/−E (Bienert et al., 2012) primers, respectively.
DNA amplification for the PLAN library was carried out in 20 μl reactions containing 1x PCR buffer (Qiagen), 2 mM MgCl 2 , 0.2 mM dNTPs (Qiagen), 0.4 μM trnL g/h primers, 1.25 U HotStartTaq DNA polymerase (Qiagen), 4 μg bovine serum albumin (BSA, Thermo Fisher Scientific), and 1 μl DNA template. PCR was used for the enzyme activation at 95°C for 15 min and then 45 cycles consisting of 94°C for 30 s, 55°C for 30 s, and 72°C for 30 s, followed by final elongation at 72°C for 10 min.

| Bioinformatics
Data quality was checked with fastQC (Andrews, 2010) and all subsequent steps from raw data to assignment were carried out using OBITools3 (Boyer et al., 2016) through the following data processing pipeline: (1) Reads were merged using the function obi alignpairedend that uses no parameters. Only aligned reads were kept, using the function obi grep with the parameter -a mode: alignment. (2) Aligned reads were then demultiplexed with the function obi ngsfilter, allowing for no mismatches on the tags with the parameter -e 0. (3) The file was dereplicated by removing strictly identical sequences with obi uniq, keeping only the counts in each sample. (4) Quality filtering of sequences shorter that 40 base pairs (bps) or less than 10 counts in the whole library were removed using obi grep.
We also removed additional low-repeated sequences by hand after counts for the whole library were removed while all other assignments were grouped to the deepest taxonomic level. Note that this data set, which includes very low read counts (<10 in a sample), was used for regression analysis in the context of tag jumping, because even low numbers of reads may be explained by the effects of tag jumping. Additionally, low read count samples (<10 counts) were removed prior to clustering analysis and analysis of similarities, following standard practice for curation protocols.

| Testing the validity of the T4 DNA polymerase model
To test the validity of the T4 DNA polymerase tag jumping model proposed by Schnell et al. (2015), we initially relied on a visual inspection of how reads of rare taxa were distributed within the tag matrices. The rationale behind this is that for rare taxa, tag jumping patterns between samples (column-wise or row-wise) should be more easily detectible, given that these species are expected to be absent in most samples. We followed the distribution of rare taxa within the tags matrix and searched for cross-like patterns in abundance emerging along horizontal (row) and vertical (column) directions.
To statistically test to what extent a samples' topological position could be related to artefactual sequences formed by tag jumps, we performed a regression analysis using a topological descriptor modelling the heteroduplex formation (reads in the row × reads in the column) and the number of reads in our blanks after which residuals were checked for normality. All regression statistics were performed in MATLAB (MathWorks). To further scrutinize the impact of tag jumping, sequenced species data from library MAM2 was imported into the program Primer-e and transformed into presenceabsence data. The latter data set was used to build a Jaccard's similarity matrix, where we clustered samples using the group average method with a cophenetic correlation coefficient of 0.8967. In addition, we conducted a multidimensional similarity analysis (ANOSIM) where we used PCR and extract replicates as groups to assess within group variance and test for significant (p < .05) differences between pairwise comparisons.

| RE SULTS
All results of sequenced data can be found in Table S1, but a library specific summary follows below. In line with the proposed heteroduplex tag jumping model (Figure 1), we found counts of rare species forming horizontal and vertical patterns in the tags matrix as exemplified in Figure 2. High absolute numbers of reads (and relative number of reads) were systematically found in samples originally placed within the same column and/or rows within the tag matrix. Here, rare species reads F I G U R E 2 An example of the distribution of reads for a rare species (Lepus sp.) in one of our sedimentary DNA libraries (MAM1). Reads are shown as a function of the topologic space of the tagging matrix where forward primers are shown as rows and reverse primers as columns. Blue circles represent PCRblanks. Samples were originally randomly distributed within the tagging matrix, and the observed linear distribution of reads within the matrix (as indicated by the rook symbol) is a pattern to be expected from T4 polymerase-mediated tag jumping.
were always highest at the point of intersection, where horizontal and vertical tagging patterns crossed. Importantly, replicates positioned in other places around the tag matrix did not show positive detections.
Through the four libraries, 0.6% to 9.6% of the total reads detected in the PCR blanks had tag combinations that did not match any sample. In all four data sets we examined, the total number of reads observed in the PCR blanks could be largely explained by a linear regression taking into account the topological position (row × column) of the PCR blank and the total number of reads associated with each of the tags (Table S2,  proportion of reads derived from tag jumping relative to the total number of measured reads was used to group detections into four categories of uncertainty: (1) Unreliable detections, that is, taxa in which the total number of reads was within the upper 95% confidence interval of the regression predicting number of artefactual reads per sample as a function of matrix position; (2) uncertain detections, taxa in which total counts were only ≤50% higher than those predicted to be artefactual sequences (inferred using the 95% confidence boundaries of the regression analyses); (3) reliable detections, that is, samples with counts ≥50% higher than the predicted artefactual sequences; and (4) detections outside the limits, that is, taxa where their position within the tag matrix were not covered by the positions of the PCR blanks used to derive the regressions. Our classification of detections found that a considerable proportion of the taxa was classified as unreliable ( Figure 4). That proportion varied from 40% in library PLAN to as high as 80% in library WORM.
In line with these high numbers of unreliable detections, we found that sample similarities were more dependent on their tag combination than on their shared background (i.e., sourcing from the same sample), as highlighted by our cluster analysis ( Figure 5).
Surprisingly, sample replicates and blanks showed no clear pattern of similarity. Some few replicates that showed similarities with another replicate also shared a similar tag (Figure 5a). Pairwise comparison of replicate groups suggested that there were no significant (p > .05) differences between most groups (i.e., when samples were Red dotted lines denote the 95% confidence interval around each regression line. Symbols indicate how samples were classified according to our four categories of uncertainties: Unreliable samples (red diamonds), uncertain samples (pink squares), reliable samples (green symbols) and samples judged outside the limits of the regression (blue stars). Black circles enclose the PCR blanks that were used to construct the regression. Equations for the regression lines reads: Upper left panel (Total reads = 4.383*10-7*X − 301.76); upper right panel (Bos reads = 5.248*10-7*X + 38102.2); lower left panel (Ovis reads = 7.208*10-6*X − 13.025); and lower right panel (Sus reads =6.6327*10-5*X − 8.407). Not axis have been cut and scaled to better show the regressions. similar). In fact, we could only find significant differences between two replicate groups despite their vastly different sample origins.
Instead, similarities between clusters were primarily explained by their sharing of tags (Figure 5b).
When removing detections classified as unreliable or uncertain (categories 1 and 2), some taxa were strongly affected by this filtering. For example, sequences of Cervus in library MAM1 were indistinguishable from the predicted tag jumping noise in most of the replicates and were removed from 72 of 81 occurrences ( Figure 6).
Others, such as Capra or Sus, experienced dramatic shifts in distribution after filtering. When focusing only on replicates classified as reliable (category 3), Capra or Sus were only found to be present in a few samples, while the unfiltered data suggested a nearly ubiquitous distribution of these rare taxa across samples. Nevertheless, distribution of some taxa remained relatively unaffected by our filtering approach. For example, Bos or Ovis distributions with the samples remained largely intact, even after removing the unreliable samples.
In MAM1, each taxon was on average found to be reliable in only every ninth sample-a frequency that illustrates the strong impacts that tag jumping imposes on the data set (see Table S3 for a complete breakdown by taxa in all libraries). We also note that the PCR blanks, which did not contain DNA or primers and should therefore be free from any sequences, became clean after our topology-based filtering. In contrast to the original data set, the analysis of similarity in the denoised data proves a significant difference between groups F I G U R E 4 Categories of sample uncertainties for each of the four studied libraries (MAM1, MAM2, WORM, and PLAN). Bars show the percentage of taxa, averaged per sample for each library (taxa in category/taxa in sample). Categories representing levels of uncertainty (shown in colours) are based on the predicted number of artefactual sequenced counts using the regressions in Figure 3 and the actual number of reads in the sample (see result section for a detailed description on how categories were calculated).

F I G U R E 5 Cluster dendrogram
constructed from the Jaccard's similarities matrix for the analysed samples. Black branches represent clusters identified in the analysis to share similarities (cophenetic correlation coefficient = 0.8967). Note the black ellipses marking PCR blanks combinations and thus, samples that should be free from taxa. In panel (a) replicates share the same coloured symbol. Here, the spread of coloured symbols at the base of the dendrogram illustrate that most replicates are dissimilar to each other. In panel (b), tags combination used for each samples are shown at the base. Note how samples identified as being similar to each other. This model predicts how artefactual sequences are distributed along columns or rows of the tag matrix, and is also consistent with the strong positive correlation between reads per row and column and the number of detected sequences present in the blanks. Given that Schnell et al. (2015) point towards the activity of the T4 DNA polymerase as one of the main drivers of tag jumps, and that this polymerase was also used in our end-repair step, it appears that our results strongly support the previous inferences made by Schnell et al. (2015).
Metabarcoding and high-throughput sequencing technologies have repeatedly proven their importance and utility in environmental studies. Nevertheless, there is an urgent need to improve the underlying methods and account for tag jumping effects, as has been previously suggested (Esling et al., 2015;Schnell et al., 2015;Zinger et al., 2019). In that sense, the reliability of positive detections is obviously a key element. As we show here, tag jumping can produce major distortions of sequenced data and-if not accounted for-can lead to incorrect interpretations of the data. The latter is evident from the striking contrast between our topologybased filtering and the original data sets (Figure 6a,b). Hence, we seemingly validate our main hypothesis that the artefacts F I G U R E 6 Comparison of sequence data from library MAM2 before (a) and after (b) filtering away samples predicted to be strongly affected by tag jumping according our topologic based model. Here, samples that were removed were part of the categories classified as unreliable and uncertain (see result section). Note that some taxa (such as Cervus or Sus) were present across many samples in the unfiltered dataset (a) but had well constrained distribution after filtering away samples strongly affected by tag jumping (b).
introduced by tag jumps are potent enough to substantially distort the conclusions of metabarcoding studies. Clearly, there is an urgent need to recognize impacts of tag jumping in environmental DNA data sets and to understand its underlying mechanism(s).
Regarding the former, it is important to highlight that the use of multiple replicates cannot rule out false detections induced by tag jumping, as it is possible that several replicates could present as false positives if they share one of the tags. While tag jumping may have grave implications for environmental DNA studies, our findings suggest that tag jumping generates a predictable pattern that not only provides routes for detecting affected data and reducing its impacts, but also offers key insights into the mechanism underlying the process.
Importantly, tag jumping should not be regarded as a minor effect that can be removed by simply raising the detection limit based on the number of reads observed in laboratory blanks. This is especially true considering that most studies use few controls (i.e., only 1 or 2 control blanks), which means they cannot fully capture the topological patterns we identified in our data sets that strongly suggest tag jumps are unevenly distributed across the tagging matrix.
In other words, assuming that artefactual sequences present in one and/or a few blank samples are representative of all samples within the tagging matrix is too simplistic and might generate flawed data.
We also show that the overall impact of tag jumping on a data set is an artificially inflated similarity between samples sharing a tag (e.g., Figure 5), as well as a distortion of taxa presence (e.g., Figure 6).
Clearly, our data proves that PCR-free indexing methods, which have been suggested to reduce tag jumping (Esling et al., 2015), are not enough to assure that samples are not affected by tag jumps.
It was only after "denoising" our data sets with the algorithms we developed that differences between individual sample groups became significant. That is, tag jumping initially masked the community differences between groups in the original sequenced data.
This finding suggesting that tag jumping increased false positive detections to the point that our sample communities were no longer different indicates that this process can strongly influence interpretations of metabarcoding data. It is important to note that tag jumping does not necessarily generate homogenizing effects, as differences between groups could also be artificially inflated-the direction and magnitude of the effects will ultimately depend on the arrangement of samples in the tagging matrix. For example, sample replicates could be placed in the same row/column of the tagging matrix, which would cause jumps to artificially lower betweenreplicate variance and thereby increase the likelihood of finding statistical differences between these samples and others.
Intriguingly, we also found that the impact of tag jumping varied substantially between our four libraries. These library-specific differences may be the result of differences in the amplified barcoding regions. Previous studies have shown that chimera rates are higher when barcodes share a high degree of similarity (Shin et al., 2014;Wang & Wang, 1997). As similarities between barcodes modulate the interaction between DNA strands, we argue that barcodespecific variation can also be expected in tag jumping. In addition, small differences in sample post-processing, GC content, time of processing, or temperature during purification and/or storage prior to sequencing may also contribute to variable levels of tag jumping between libraries. Additional research is needed to fully evaluate effects of these factors on the tag jumping process.

| Current practices and the way forward
There are generally three methods for library preparation: (1) One-step PCR, (2) two-step PCR and (3) adapter ligation (Bruce et al., 2021). Importantly, one-step PCR and two step PCR (including Phusion primers) are not likely affected by the tag jumping process as shown to affect libraries in our study. However, adapter ligation is sensitive to the tag jumping. According to Illumina, this latter technique has been cited in almost 10,000 publications. Whether these studies were affected by tag jumping or not is unknown, but there is a rationale for expecting a large number of them may have been affected by tag jumping.
We suggest that adequately reporting tag jumping, and, ideally, preventing these processes is important for all future metabarcoding studies that use adapter ligation. This is particularly relevant for environmental DNA studies where the validity of a positive detection within complex DNA templates cannot be verified through comparison with in situ monitoring data (for example, when used in palaeoecological reconstructions where reconstructed taxa cannot always be validated against historical documentations). We suggest four practices that can be used to reduce the impacts arising from tag jumping (see Data S1 for more details): 1. Twin-tagging. Using each tag only once eliminates all cross contamination mediated by heteroduplex driven processes.
Examples of the use of this tagging technique can be found for example in "An appetite for pests: Synanthropic insectivorous bats exploit cotton pest irruptions and consume various deleterious arthropods" (Cohen et al., 2020) or "Amazonian mammal monitoring using aquatic environmental DNA" (Coutant et al., 2021). However, this tagging strategy substantially increases analytical costs and workloads.
3. Use of a systematic blank system within the tags matrix. By using multiple blanks embedded in the matrix in a way that every column and every row has at least one PCR blank, it is possible to characterize tag jumping and filter out affected samples. That is, by using regressions predicting the baseline noise as shown in our study. While this approach cannot completely remove tag jumping from data sets, it can substantially improve the reliability of results and sharpen data curation procedures.
4. Skip the end-repair step. Polymerase activity during the end repair step seems to explain the occurrence of tag jumping in our data. By removing the end-repair step, the tag jumping issue is probably reduced, although the impact on DNA sequencing beyond tag jumping is unknown.

| Conclusions
Artefactual sequences arising due to tag jumps leave behind a topological signature in metabarcoding data sets. This signature can be traced back to each samples' original position in the tag matrix, wherein the abundance of artefactual sequences form a "cross-like" pattern. In this study, where chimera from second-step PCR can be ruled out as a possible mechanism behind tag jumping, the topological pattern is consistent with a process caused by a 3′-end digestion occurring through the activity of the T4 DNA polymerase. We highlight that tag jumping is a major concern for metabarcoding studies: the degree of false positivity in our data sets was high enough to cause different sample communities to become statistically indistinguishable from one another. The high number of false positives in our PCR blanks strongly suggests that when attempting to curate data sets, the contaminating influence from tag jumping cannot be avoided by setting a threshold for reliable data based on the number of reads found in the blanks-a procedure commonly done in the published literature. While multiple sample replicates can be used to detect the effects of tag jumping, increasing the number of replicates cannot, by itself, solve the problem.
Only adequate laboratory protocols and careful inspection of the data can guarantee data unaffected by tag jumping.

ACK N O WLE D G E M ENTS
We thank the Swedish research council for founding JK (Dnr 2017-04548), the Swiss National Science Foundation for founding MAM (grant no. P2BEP2-188256) and four anonymous reviewers that helped us to improve our study.

O PE N R E S E A RCH BA D G E S
This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available at [https://doi.org/10.5061/ dryad.tdz08kq33].

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available in Dryad at https://doi.org/10.5061/dryad.tdz08 kq33.