Application of eDNA metabarcoding in a fragmented lowland river: Spatial and methodological comparison of fish species composition

Assessments of fish communities tend to rely on capture-based methods that, due to sampling biases, can underestimate actual species richness. Alternatively, environmental DNA (eDNA) based metabarcoding is a

A number of questions remain, however, regarding the spatial and temporal distribution of eDNA in lotic systems, given how they differ from other habitat types due to their continuous and directional water flow. In lotic systems, organisms release genetic material which is expected to disperse downstream until it is chemically and/or biologically decomposed (Deiner & Altermatt, 2014; Jane et al., 2015; Wilcox et al., 2016). Therefore, at a sampling site, eDNA collected from water is likely to represent the composition of both local fish communities and those located upstream (Nakagawa et al., 2018). This is in contrast to more traditional fish monitoring techniques based on capture methods that provide highly localized data at the time of sampling (Radinger et al., 2019). Temporal changes in fish behavior (e.g. spawning migrations, summer vs. winter habitat use, diel migrations) and varying rates of DNA degradation due to biotic and abiotic factors might strongly influence the detectable levels of eDNA (Shogren et al., 2017).
In river fish assemblages, a major anthropogenic pressure is the presence of impoundments (e.g. dams, weirs) that were originally constructed for navigation and hydrological regulation, but which impact longitudinal connectivity and fragment habitats (Oliveira, Baumgartner, Gomes, Dias, & Agostinho, 2018; van Puijenbroek, Buijse, Kraak, & Verdonschot, 2019). Even single barriers can interrupt the longitudinal connectivity of a river (Jager, Chandler, Lepla, & Van Winkle, 2001), leading to species isolation (Falke & Gido, 2006), restricting the natural movements of fish for reproduction, feeding, and habitat colonization, and potentially leading to genetic impoverishment. In Western Britain, the lower River Severn basin was subjected to considerable river engineering in the 19th century through the construction of a series of weirs that enabled navigation further upstream for industrial purposes (Figure 1; Aprahamian, 1988). These inhibited the spawning migration routes of a number of diadromous fishes, including allis shad (Alosa alosa), twaite shad (Alosa fallax), and sea lamprey (Petromyzon marinus) (Maitland & Lyle, 2005; Aprahamian, Aprahamian, & Knights, 2010).
There is now a plan of river reconnection in place ("Unlocking the Severn") that either removes these weirs or provides fish passes that facilitate the upstream passage of migratory fish (Antognazza et al., 2019; Environment Agency, 2019a). Such reconnection schemes in river systems can lead to changes in the fish community (Catalano, Bozek, & Pellett, 2011; Magilligan, Nislow, Kynard, & Hackman, 2016). Therefore, knowledge of the community composition and distribution of fishes prior to reconnection is required to enable post-reconnection changes to be assessed and the management evaluated. There are also considerable spatial differences evident in the river habitats across the lower River Severn basin, ranging from the River Teme tributary providing a relatively narrow channel of pool and riffle characteristics, through to the main River Severn providing relatively deep, impounded sections (Figure 1; Gutmann Roberts, Hindes, & Britton, 2019). These spatial differences in habitat typologies are then also likely to be reflected in considerable differences in the fish assemblages (Noble, Cowx, & Starkie, 2007).
Consequently, the aim of this study was to apply eDNA-based metabarcoding to characterize the distribution of fish species of the lower River Severn and its River Teme tributary, above and below major impoundments and prior to their river reconnection. The objectives were to identify the fish species present across the two rivers using eDNA and compare the fish species composition and relative abundance with long-term data obtained from approximately 26 years of fish surveys completed using capture methods. It was predicted that while both methods would demonstrate considerable spatial differences in the fish assemblage across the study area, eDNA would be more powerful at detecting fish of high conservation importance given the likelihood of these species being in low abundance (Jerde et al., 2013; Wilcox et al., 2013; Sigsgaard, Carl, Møller, & Thomsen, 2015; Deiner et al., 2017).

| Sampling sites
Water sample replicates were collected every 2 weeks across the River Severn and its tributary, the River Teme, during May and June 2018 (Figure 1; Table 1). An additional 15 water samples were collected in May and June 2017 for one site ("Powick") (Table 1). Samples collected during May and June 2018 are the same samples used for monitoring shads through a real-time assay specific to Alosa spp. (Antognazza et al., under review), while samples collected in 2017 were used during the development of a real-time assay specific to Alosa spp. (Antognazza et al., 2019).
The River Teme is approximately 130 km in length and is impounded in its lower reaches by a weir ("Powick Weir") that has a head of approximately 1.5 m. This weir, located 3 km from its confluence with the River Severn, is considered largely impassable for most fish in the river (Figure 1). The Powick sampling site was located just below this weir. The other sampling site on the River Teme was "Tenbury," located approximately 48 km upstream of Powick Weir, with a further weir ("Knightwick Weir") located between them (Figure 1). Note that Knightwick Weir is considered less of a barrier to fish movements due to a lower head (Figure 1). The River Severn, approximately 354 km in length, has a series of six weirs in its lower reaches that disrupt its longitudinal connectivity. The most downstream sampling site on this river was at the second most downstream weir on the nontidal section of the river ("Diglis Weir"), situated around 2 km upstream of the River Teme confluence. This weir was used as the most downstream site on the river, rather than the most downstream weir (Upper Lode Weir), as the latter represents the tidal limit of the river under most flows and is considered passable to most fish in the river, including all of the anadromous fishes, especially during large spring tides when the weir is flooded (Bolland et al., 2019) (Figure 1). Correspondingly, sampling sites were located up- and downstream of Diglis Weir, and then upstream of the weirs at Bevere (10 km upstream from Diglis), Holt Fleet, and Lincomb (15 and 30 km upstream from Diglis, respectively) (Figure 1). In subsequent analyses and evaluations, Diglis (Severn) and Powick (Teme) were therefore considered as the weirs that potentially represented the principal blockages to the movements of fishes between the different sections of each river. The fish species known to be present for at least some of the year in both rivers are provided in Table S2.

FIGURE 1 Locations of sampling sites on the River Teme and River Severn (gray circles) where the water samples were collected for the eDNA metabarcoding and the fish surveys were performed. Thick black lines refer to the two main impoundments on the Teme and Severn, being Powick and Diglis Weirs, respectively. Light gray lines refer to minor impoundments present on both rivers.

| Contamination control
In order to minimize the probability of contamination, clean and consistent field collection protocols were established. Negative controls were included in the field, water filtration, DNA extraction, and DNA amplification steps. These were then sequenced, resulting in 36% of the total sequenced samples being negative controls. In addition to intermittent negative controls during the filtration process, negative controls were included at the start and end of field sampling for each site, water filtration, and DNA extraction.
Field equipment was stored and prepared for field sampling in a laboratory located in a separate wing to any DNA and tissue handling laboratory. Re-usable plastic bottles, ropes, and weights were used to collect water in the field after undergoing a stringent decontamination protocol that involved cleaning all equipment with 10% commercial bleach solution (immersion for a minimum of 30 min), followed by thoroughly rinsing them with sterilized water and then autoclaving them. Prior to field sampling, each bottle was prepared and stored in a sterile plastic bag, which was wiped on the outside with 10% Microsol detergent (Anachem). All sampling equipment per site was stored separately in large sterile bags that were wiped on the outside with 10% Microsol detergent and sealed until their use at the specific site. Each site also had its own dedicated equipment, which was sterilized and held in a site-specific sterile bag, including single-use disposable gloves, a spray bottle with 10% Microsol detergent, tissue paper, a plastic bucket (cleaned with 10% Microsol solution) for storing weights after sampling, scissors, duct tape, and cable ties for finishing the set-up of sample bottles (see the next section for details), a sterile plastic bucket for storing all used equipment during sampling (this ensured the equipment did not come in direct contact with the field environment), a rubbish bag, and a sterile ice cooler for storing the collected water samples. After sampling, the cooler was again cleaned with 10% Microsol solution before it was opened for the filtration step.

| Sampling methods
Water samples were collected using 1-L sterile plastic bottles by sampling the river across its width using road bridges that traversed its entire width, with some samples also collected from the riverbank. Both methods were described and compared in Antognazza et al. At two sites on the River Severn, Lincomb and Bevere, no bridges were available from which samples could be safely collected, so water samples were collected by samplers standing on the riverbank, with water collected in a sterile plastic bottle attached to an extendible pole (from 1.8 to 3.7 m), with the bottle submerged sufficiently to allow collection through the water column. Sampling equipment was cleaned after each sample using 10% Microsol detergent. A total of five water samples were collected from each site.
Samples were alternately collected with the pole at its shortest and at its longest length. Two negative controls were also collected, one before starting water sample collection (with the pole at its shortest length) and one at the end (with the pole at its longest length). These negative controls consisted of 1-L sterile plastic bottles filled with sterile water which were treated in the same way as sample collection bottles; the lid was removed and put back on the bottle, and the closed bottle was then dipped in the water. The sampling equipment was changed between each sampling point with the pole sterilized using 10% Microsol solution. All samples were immediately stored on ice in a sterilized cooler and then in the sterile fridge overnight (5°C).

| eDNA filtering and extraction
All samples were filtered within 24 hr of their collection by passing the water through a 0.45 µm cellulose nitrate filter membrane (Whatman™). Filtration negatives (1 L distilled water) were run before the first filtration and then after every sixth sample, plus at the end in order to test for contamination during the filtration step.
Filtration and DNA extraction were performed under a biological flow cabinet (Nuaire Labgard Class II biological safety cabinet) in a laboratory not dedicated to any DNA processing. Prior to filtration, all equipment was sterilized by submersion in 10% commercial bleach solution for 15 min and then washed with sterile water, followed by being placed under the flow cabinet with UV light for 20 min. Following each sample filtration, the filter paper was removed using sterile tweezers and placed in an individual power bead tube for DNA extraction and stored in a refrigerator. Tweezers were sterilized after each use in 10% Microsol solution, for at least 10 min and then washed with distilled water. Filtration equipment was sterilized after each sample in a 10% commercial bleach solution for 15 min, followed by flushing with tap water and then followed by two washes with distilled water.
The day after filtration, DNA was extracted using the DNeasy PowerWater Kit (Qiagen), according to the manufacturer's guidelines, and was eluted in 100 µl elution buffer. Samples were quantified using the Nanodrop and stored at −20°C for a maximum of 3 months prior to their amplification.

| Amplification steps
All

| Bioinformatics and data analysis
Raw reads were processed through the DADA2 pipeline ver. 1.8 (Callahan et al., 2016). To assign taxonomy, a megablast search against the NCBI database was performed initially using the Galaxy platform online (usegalaxy.org; Afgan et al., 2018), aiming to find a match for each of the 6,490 sequences. The expectation value (e-value) cutoff was set at 1e-06.
A match was not achieved for all 6,490 sequences, but was achieved for 99.3% of them (6,445 sequences) at 76% identity cutoff (e-value) and the minimum query per hsp, allowing gaps. The remaining unmatched sequences (45 sequences) were re-analyzed separately to confirm the absence of any match with fishes before they were discarded. Blast using a custom database (

| Semi-quantitative eDNA analysis
Following the approach adopted in Balasingham et al. (2018), eDNA detections for target species were interpreted in a semi-quantitative way. The number of the eDNA sequences for a target species at a specific sampling location was divided by the total number of eDNA sequences returned at the same sampling location. This approach can be used to estimate the abundance of the signal of a target species as it gives a proportion of a specific species in relation to the total number of reads at one location (Balasingham et al., 2018).
Therefore, if a species occurs in high abundance at a particular location, and possibly further upstream, the water samples collected there should reflect a higher proportion of eDNA sequence returns for that species as a measure of the relative signal (Balasingham et al., 2018).
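The proportion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' analysis script: the function name `read_proportions`, the data layout, and all read counts in the example are hypothetical.

```python
def read_proportions(read_counts):
    """Convert per-site species read counts into per-site proportions.

    read_counts: dict mapping site name -> dict of species -> number of reads.
    Returns a dict of the same shape holding each species' share of the
    total reads returned at that site.
    """
    proportions = {}
    for site, counts in read_counts.items():
        total = sum(counts.values())
        if total == 0:
            # No reads at this site: report zero signal for every species.
            proportions[site] = {species: 0.0 for species in counts}
        else:
            proportions[site] = {
                species: reads / total for species, reads in counts.items()
            }
    return proportions


# Hypothetical read counts, for illustration only (not data from the study).
example_counts = {
    "Powick": {"Phoxinus phoxinus": 600, "Alosa spp.": 300, "Barbus barbus": 100},
    "Diglis": {"Rutilus rutilus": 750, "Alosa spp.": 250},
}
example_proportions = read_proportions(example_counts)
```

Because each proportion is relative to the total reads at one site, values are comparable as relative signal between sites but do not measure absolute abundance.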

| Fish survey
In the last 20 years, electric fishing surveys have been completed on the River Teme in the areas close to where the eDNA surveys were completed. With the River Severn being a much larger and deeper river than the Teme, it is more challenging to sample by electric fishing. Consequently, fyke net surveys (fished overnight) and micromesh seine netting of 0+ fish at the end of the summer (Table S3) complemented electric fishing data for the River Severn.
Electric fishing surveys on the River Teme were completed con-

| Comparing eDNA metabarcoding detection and fish capture survey techniques
The species recorded during the fish surveys and detected in the eDNA-based metabarcoding were compared based on presence/absence of each species detected and then the proportion of species according to their numerical abundance within each survey.
To enable this, the eDNA data were managed in two ways: (a) all sequences that gave at least one read (i.e. including singletons), prior to the application of a minimum read count threshold (threshold set at 320 reads, see section "Library quality and raw data control" for details), and (b) only sequences that passed the minimum read count threshold (320 reads, determined from the maximum number of reads attributed to contamination; see section "Library quality and raw data control" for details). Data from the fish surveys were summarized in three ways using detection thresholds in as similar a manner as possible to the metabarcoding, although it is recognized that the physical capture of a fish species is not equivalent to an erroneous recording in eDNA. However, this is countered by the fact that a fish species appearing in only one sample and never captured in subsequent samples is either rare or highly transient, and thus would not be considered a long-term member of the natural fish assemblage. Correspondingly, the thresholds applied were (  Values of S_S vary between 0 (no similarity in species composition between the methods) and 1 (perfect similarity between the methods), and the coefficient was applied to comparisons between the two survey methods and, for the eDNA metabarcoding data, to each river to compare composition of the fish assemblage between the sampling areas (eDNA with the threshold of minimum reads applied and fish surveys with the threshold of species presence in at least 50% of surveys). Data collected in Powick in 2017 were only compared to data collected there in 2018. Secondly, a semi-quantitative approach was used where the relative abundance of the fishes detected as present in the river was compared between the methods using a bubble plot. For eDNA, this was based on the proportion of the number of reads (after bioinformatics filtering and minimum threshold reads applied as described above).
For fish surveys, it was the proportion by species (according to numerical abundances) captured by the sampling method.
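For readers unfamiliar with the similarity coefficient used in these comparisons, a minimal Python sketch follows. It assumes the standard presence/absence form of the Sørensen coefficient, S_S = 2a / (2a + b + c), where a is the number of species shared by both lists and b and c are the numbers unique to each list; the exact formulation used by the authors is not shown in this section. The function name and the example species lists are hypothetical.

```python
def sorensen_similarity(species_a, species_b):
    """Sørensen similarity between two species lists (presence/absence).

    Returns 2a / (2a + b + c), which equals
    2 * |A ∩ B| / (|A| + |B|) for species sets A and B.
    """
    set_a, set_b = set(species_a), set(species_b)
    if not set_a and not set_b:
        raise ValueError("Similarity is undefined when both lists are empty")
    shared = len(set_a & set_b)
    return 2 * shared / (len(set_a) + len(set_b))


# Hypothetical detections at one site, for illustration only.
edna_species = {"Alosa spp.", "Barbus barbus", "Phoxinus phoxinus", "Petromyzon marinus"}
survey_species = {"Barbus barbus", "Phoxinus phoxinus", "Esox lucius"}
example_similarity = sorensen_similarity(edna_species, survey_species)
```

In this hypothetical case two species are shared out of lists of four and three species, so the coefficient is 4/7. Identical lists give 1 and lists with no species in common give 0, matching the range described in the text.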

| Library quality and raw data control
After quality filtering and merging, a total of 6,490 amplicon sequence variants (ASVs) were retrieved, with only 71 ASVs belonging to fishes. These 71 ASVs were assigned to 20 fish species and were compiled in a table of "amplicon sequence variants" (Table S1). Out of all 68 negatives analyzed, only five displayed contamination (7.4%), and for two of them (negatives PBN9 and WBN5) the level of contamination was negligible. Specifically, field negative PBN9 displayed 71 reads in total (divided between two species: 32 reads assigned to Phoxinus phoxinus, 39 reads assigned to Alburnus alburnus) and field negative WBN5 displayed 465 reads in total (divided between two species: 312 reads assigned to Alosa spp. and 153 reads assigned to Pseudorasbora parva) (Table S1). Therefore, the reads threshold was set at 320 (i.e. all samples with fewer than 320 reads were discarded from further analyses). The additional three negative samples that were contaminated were: (a) an extraction negative (NE2), which displayed a high level of contamination with >20,000 reads assigned to Alosa spp.; (b) a filter negative (N28, reads >8,000 assigned to Alosa spp.); and (c) a filter negative (N7) with 6,000 reads each assigned to Alosa spp. and P. parva (Table S1).
In order to detect the source of contamination, the filtering, extraction, and amplification workflows were reviewed, and samples suspected to be contaminated (i.e. samples associated with a contaminated negative control) were removed from further analyses, resulting in six samples being removed (after sequence quality check and merging): two samples from Diglis, two samples from Bevere, and two samples from Lincomb (Table S1). The selection of the 320 reads threshold resulted in five species having read numbers below this threshold, and these had to be considered as undetected by eDNA metabarcoding at the applied detection threshold. The five species were Gasterosteus aculeatus (L., 1758), Perca fluviatilis (L., 1758), Salmo trutta (L., 1758), Salmo salar (L., 1758), and Thymallus thymallus (L., 1758).
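The effect of the minimum reads threshold can be illustrated with a short Python sketch. It assumes the threshold is applied to each species' read count within a sample, which is one plausible reading of the text rather than a confirmed description of the authors' workflow. The function name is hypothetical, the two field negatives use the read counts reported above, and the "Example site" counts are invented for illustration.

```python
MIN_READS = 320  # minimum reads threshold reported in the text


def apply_read_threshold(read_counts, min_reads=MIN_READS):
    """Drop species detections with fewer than min_reads reads.

    read_counts: dict mapping sample name -> dict of species -> number of reads.
    Returns a dict of the same shape containing only detections whose read
    count is at least min_reads (assumption: per-species, per-sample filter).
    """
    return {
        sample: {
            species: reads
            for species, reads in counts.items()
            if reads >= min_reads
        }
        for sample, counts in read_counts.items()
    }


example_reads = {
    # Field negatives with the read counts reported in the text.
    "PBN9": {"Phoxinus phoxinus": 32, "Alburnus alburnus": 39},
    "WBN5": {"Alosa spp.": 312, "Pseudorasbora parva": 153},
    # Hypothetical field sample, for illustration only.
    "Example site": {"Rutilus rutilus": 5400, "Salmo trutta": 45},
}
filtered_reads = apply_read_threshold(example_reads)
```

With the threshold at 320, every read count in the two field negatives (maximum 312) falls below the cutoff, while the hypothetical low-count species in the example sample is treated as undetected.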

| Species detection with eDNA
Species assignment was not possible for eight of the 71 final ASVs due to a nonconcordant match within the Blast analyses (Table 2).
Specifically, five sequences initially identified as A. alburnus (L., 1758), two sequences identified as Blicca bjoerkna (L., 1758) and one sequence identified as Cottus aleuticus (Gilbert, 1896) were discarded from further analysis, as no clear identification was possible (Table 2). Following these steps, a total of 15 fish species were  Table S4). The species with the highest number of reads was P. phoxinus in the Teme and R. rutilus in the Severn (Figure 2; Table S4).
The Sørensen's similarity coefficient between eDNA data col-

| Comparing eDNA metabarcoding detection and fish capture survey techniques
Comparisons of the eDNA metabarcoding versus the fish survey data under the low-stringency scenario resulted in the highest values of the Sørensen coefficient, ranging from 0.60 to 0.81 (Table 3). Under the moderate-stringency scenario, this reduced to between 0.13 and 0.71, while under the high-stringency scenario it ranged from 0 to 0.71 (Table 3). Generally, similarity decreased as the stringency level increased, except at Powick where it remained relatively high. Across both methods and the lowest level of stringency, the total number of fish species identified in the river was 25 (Table 3; Table S5a), but again this number declined as the level of stringency increased (17 at the high-stringency scenario), especially for the fish surveys (Table 3; Table S5a-S5c).
Comparison of the proportion of the fish species identified by the two methods (with the threshold applied for eDNA reads), upstream and downstream of the weirs, revealed some contrasting results. P. parva was not detected in any fish survey and P. marinus was detected only once (albeit recorded as a "lamprey" due to taxonomic ambiguity in identification); both species were detected using eDNA. While Alosa spp. were detected at all sites by eDNA, they were only recorded twice in fish surveys and only at Powick. Also, while eDNA-based metabarcoding detected the presence of B. barbus at all sites, this was not the case for fish surveys (Figure 2; Table S5a-S5c). In fish surveys, E. lucius, A. anguilla, A. alburnus, and G. cernuus were more prevalent than suggested by our eDNA-based results (Figure 2). Indeed, there appeared to be more differences by sampling method within each river than were apparent between the two rivers (Figure 2). Spatial comparisons of the fish species detected between the two sampling areas of each river, for both methods, revealed that similarity in species composition was higher in the River Severn (S_S = 0.89 for eDNA, S_S = 0.86 for fish survey) than the River Teme (S_S = 0.71 for eDNA, S_S = 0.53 for fish survey).

| DISCUSSION
The eDNA-based metabarcoding approach used here was able to detect most of the species that have been detected using fish capture methods in the two studied rivers over the last 26 years. Applying a range of stringency levels to both the metabarcoding data and the fish survey data revealed that at relatively low stringency levels, a larger number of species was detected with both methods (n = 25 by fish surveys, 20 by metabarcoding); at the highest stringency level, this decreased to 15 species by metabarcoding and nine species by fish surveys. In the low-stringency scenario, no minimum reads threshold was applied to the metabarcoding data, which was considered the equivalent of using all the fish capture data to represent the species richness of the fish assemblage (including occasions when a single individual of a species was captured in a single survey). Application of the high-stringency scenario represents results where greater rigour has been applied in analyses, ensuring that only species which were regularly present in fish catches and at relatively high proportions in the metabarcoding data were used to describe the composition of the fish assemblage.

TABLE 2 Sequences with unambiguous matches between databases
A criticism of eDNA-based detection is the possibility of false positives due to contamination; therefore, decontamination procedures, designed to significantly limit contamination (Goldberg et al., 2016), were followed in the field and laboratory processes for this study. There were 68 negative samples sequenced (Table S6).
In contrast, P. parva was detected in 2.6% of the samples (at Diglis and Worcester), but also in 1.4% of the negative samples. Therefore, its potential presence in the River Severn has to be considered carefully, especially as it is a highly invasive species across Europe (Gozlan et al., 2010) whose presence in England has been minimized through a programme of eradicating lentic populations to prevent their dispersal into lotic systems (Britton, Davies, & Brazier, 2010).
Notwithstanding, P. parva populations have been known to be present in the lower River Severn basin, with a population eradicated from a pond connected to the River Teme in 2005 (Britton, Brazier, Davies, & Chare, 2008), with a more recent population present in a pond connected to the River Severn that was eradicated in early spring 2017, just prior to our water sampling (Canal River Trust, 2018).
As these ponds have connection to the Severn, there was some likelihood of their dispersal from the pond into the river, as this dispersal mechanism is common in this species (Davies & Britton, 2016). For example, in the Meuse River in the Netherlands, floodplain lakes are used by P. parva as spawning, nursery, and adult habitats, with river channels mainly used as dispersal corridors (Pollux & Kurosi, 2006).
Given the high invasiveness of P. parva, allied to it hosting a novel, generalist pathogen (Sana et al., 2018), more work is recommended to determine more definitively whether P. parva and/or its pathogens are now present in the river.
Note: The total number of species highlighted in bold is the sum of species detected using both methods.
In order to track contamination of water samples, negative samples were included at each step (i.e. water sampling, filtering, DNA extractions, first and second PCR, sequencing) (Goldberg et al., 2016). However, a consistent pattern of contamination could not be detected, preventing it from being tracked back to a specific stage in the analytical process (e.g. filtration, DNA extraction or PCR stage).
As not all of the analyzed negative samples amplified Alosa spp. DNA (Table S6), contamination during filtration and DNA extraction can be excluded. As such, the source of contamination has to remain speculative, with it potentially occurring at the library preparation stage and/or during multiplexing prior to sequencing (due to 96-well plates being used), or it could be due to tag-jumping (Schnell, Bohmann, & Gilbert, 2015). As the contamination was dealt with by removing all of the samples that were associated with contaminated negative samples, the subsequent discussion points are based on data analyses that can be considered robust.
In general, the eDNA-based metabarcoding detected most of the species that had been recorded in fish surveys completed over a 26-year period and also had increased detection of Alosa spp. and to be present in the river from either the fish surveys or from angler catches (e.g. Nolan, & Britton, 2018). As such, while the eDNA data have shown high promise in detecting species that were not apparent in fish surveys, they need to be complemented by species detections using other sampling methods if reliable spatial comparisons of total fish species richness are to be made. In addition, the completion of fish surveys and angler catches in recent years on both rivers has provided biometric data on fish populations that have high utility for fisheries management (e.g. Amat Trigo, Roberts, & Britton, 2017; Nolan et al., 2019), aspects which cannot be provided by eDNA methods.
The effect of multiple environmental factors on the efficiency of eDNA detection (e.g. pH, temperature, UV, PCR inhibitors, organic materials) has been recently investigated (Barnes et al., 2014; Jane et al., 2015; Strickler, 2015; Takahara, Minamoto, Yamanaka, Doi, & Kawabata, 2012), as their impact ultimately determines the concentrations of DNA in the environment. Pivotal to this is the need for increased knowledge on the transport and degradation rate of eDNA within fluvial systems (Deiner et al., 2017; Shaw et al., 2016). Given that many of the factors affecting eDNA presence and detection are environmental and seasonally variable, the timing of surveys remains an important factor in determining the efficiency of species detection. However, this is also the case for fish surveys based on capture methods, where there can be considerable differences in seasonal habitat use between species and within species, with potential for high habitat partitioning between different life-stages of fishes (eggs, larvae, juvenile, adult) (Radinger et al., 2019). Many adult fishes are also highly vagile, with species such as B. barbus having home ranges of over 12 km and showing considerable movements upstream in early spring for spawning. Thus, the completion of sampling events for eDNA at discrete times of the year, such as spring in this study, is likely to be too simplistic to provide a comprehensive perspective on the fish assemblage; a more robust spatial and temporal description is likely to require increased sampling frequency throughout the year and across a greater number of sampling sites.
In summary, the eDNA metabarcoding provided a relatively robust description of the composition of the fish assemblage from a limited number of water samples collected over a discrete 2-month sampling period. Indeed, depending on the level of stringency applied to the data, the results were similar to those retrieved from over 20 years of fish capture surveys. eDNA metabarcoding also detected some fish that rarely, if ever, get sampled by capture techniques, such as Alosa spp. and P. marinus. The eDNA data also provided a snapshot of the fish assemblage on the two rivers prior to the outlined works on river reconnection, providing baseline data on fish distributions in spring for subsequent com- the River Severn has the potential to influence the composition of the fish assemblage through most of the basin, then refined methods should enable improved assessments that will ultimately support management conservation actions.

ACKNOWLEDGMENTS
Financial support for this study was provided by a studentship from the Severn Rivers Trust and Bournemouth University, with funding from "Unlocking the Severn for LIFE," LIFE Nature Programme (LIFE15/NAT/UK/000219), and HEIF 5+1+1 funding. D.S.R. was supported by the Natural Environment Research Council award number NE/R016429/1 as part of the UK-SCAPE programme delivering National Capability.

CONFLICT OF INTEREST
The authors declare no conflict of interest.