Habitat suitability models reveal the spatial signal of environmental DNA in riverine networks

of eDNA and habitat suitability models allows larger-scale and spatially integrative inferences about biodiversity, which are ultimately needed for its management and protection.


Introduction
Landscapes are undergoing significant changes from regional to global scales, driven by anthropogenic activities that often trigger losses in biodiversity (Leclère et al. 2020, Tickner et al. 2020, Gonzalez et al. 2023). Freshwater systems are among the most affected by direct and indirect anthropogenic impacts (overexploitation, agricultural expansion, pollution, climate change), and they are losing species at accelerating rates (Dudgeon 2019). To quantify and subsequently mitigate species loss, robust, high-resolution biodiversity monitoring data are crucial but challenging to gather (Soberón and Peterson 2009, Chandler et al. 2017, Stephenson et al. 2017). Rivers are highly structured and provide ecological niches that current biodiversity monitoring rarely samples comprehensively, and for many taxa, occurrence data are still scattered (Inoue et al. 2017). Species show heterogeneous distribution patterns that reflect their physiological requirements (Miller and Holloway 2017). This link between the abiotic environment and species' occurrences can be used to predict species distributions in comparable ('analog') landscapes (Guisan and Zimmermann 2000). As such, species distribution models provide large-scale projections based on the identification of suitable habitats (Guisan et al. 2002).
Species distributions can also be derived from direct sampling in the field, which is, however, restricted to discrete sampling points. Sampling environmental DNA (eDNA) has become a cost-efficient and non-invasive method to monitor communities across time and space (Baird and Hajibabaei 2012, Deiner et al. 2017, Altermatt et al. 2020). DNA suspended in water (as free fragments, within organelles, or in living cells) can be extracted, sequenced, and taxonomically assigned with the help of reference sequence databases (Taberlet et al. 2012, Pawlowski et al. 2020). eDNA sampling has become established as a monitoring tool for aquatic communities (Thomsen and Willerslev 2015, Leese et al. 2016, Bruce et al. 2021). Because waterbodies contain a mixture of genetic material, a single sample integrates genetic information from multiple taxonomic groups (Keck et al. 2017, 2022a, Cordier et al. 2018) over larger spatial scales than traditional point samples (Deiner et al. 2016, Carraro et al. 2020a), thus giving a more integrated perspective of ecosystems.
The diversity and distribution of species in rivers are determined by the networks' dendritic structure, by local abiotic and biotic factors, and by the spatial linkage between upstream and downstream patches. Species are heterogeneously distributed, and indicator groups vary differently across the spatial scales of monitoring in response to environmental change (Kaelin and Altermatt 2016). Currently, the description of aquatic indicator communities is often limited to localized sampling (Kuemmerlen et al. 2016). For a comprehensive assessment, models based on spatially explicit measurements of environmental parameters can validate spatially constrained field sampling (Kuemmerlen et al. 2016, Keck et al. 2022b). For eDNA sampling, species distribution models can strengthen the plausibility of eDNA-based species detections and help validate the detection of rare species (Riaz et al. 2020). They can also link eDNA signals to their source populations upstream, improving the ability to interpret these signals robustly.
The transport of eDNA has been addressed by experimental and observational studies, in which eDNA was tracked in artificial streams (Shogren et al. 2019), in natural streams by placing caged organisms and measuring their DNA signals downstream (Jane et al. 2015, Thalinger et al. 2020), or by detecting species with geographically narrow distributions, such as species constrained to lakes (Deiner et al. 2015, Pont et al. 2018). Estimates of the distance that DNA travels in water vary widely (Jo and Yamanaka 2022), and information on eDNA transport has only recently been used to assess and extrapolate biodiversity (Carraro et al. 2020a). Yet, the combination of habitat suitability models and eDNA-based biodiversity assessment has barely been exploited to better understand the spatial distribution of organisms and the transport of eDNA, and thereby to improve the spatial assessment of biodiversity in riverine systems (Riaz et al. 2020).
Here, we used high-performance habitat suitability modelling to evaluate the spatial plausibility of eDNA-based species detections. Taking a landscape perspective, we combined predictions from habitat suitability models (HSMs) with environmental DNA (eDNA) to characterize the distributions of 127 indicator species across all major river catchments of Switzerland. We combined HSM predictions aggregated for almost 22 000 subcatchments with eDNA sampled at 172 river sites as part of a biomonitoring program. The 127 species are aquatic insects of the orders Ephemeroptera, Plecoptera and Trichoptera (EPTs). We used the HSMs to test the spatial integration and plausibility of eDNA-based detections, and further estimated downstream transport distances of DNA signals at the landscape level.

Study area and routine biomonitoring (NAWA project)
We studied 127 species of may-, stone-, and caddisflies (Ephemeroptera, Plecoptera and Trichoptera) across all rivers and streams in Switzerland. The study area covers 41 285 km², discretized into 21 858 subcatchments of about 2 km² each. Habitat suitability modelling targeted these subcatchments, and we contrasted its predictions with eDNA-based detections of these species at 172 sites. An overview of the workflow for both approaches can be found in the Supporting information.
The 172 eDNA sampling sites are part of a Swiss federal biomonitoring program conducted by the Swiss Federal Office for the Environment (NAWA: Nationale Beobachtung Oberflächengewässerqualität, i.e. National Surface Water Quality Monitoring; BAFU 2013, 2016). eDNA samples were collected in early summer 2018 (NAWA Spez) and in spring 2019 (NAWA Trend), at the same time as the regular aquatic invertebrate monitoring.

eDNA sampling
For eDNA sampling, water was filtered at each site using Sterivex filters (pore size 0.22 µm). Samples and negative controls were filtered in the field. Four filter replicates were collected per site, each filtering 500 ml of water, for a total of 2 l. The filters were sealed with Luer caps, placed in zip-lock bags, and transported to the lab in a cool box with cooling elements. They were stored in the lab at −20 °C until DNA extraction. Further details of the processing of the eDNA data are described in Brantschen et al. (2021, 2022).

DNA extraction of Sterivex filters
DNA extractions were conducted in a laboratory dedicated to eDNA work. It includes an anteroom for putting on protective laboratory suits and for cleaning all equipment that enters the eDNA lab with bleach solution. Laminar flow hoods are bleached and UV-treated before use, and reagents are stored and handled in a hood separate from the DNA samples. The cleanroom has positive air pressure to avoid contamination, is restricted to pre-PCR work, and is spatially separated from other molecular work to avoid any contamination of the samples (Deiner et al. 2015). DNA was extracted using the Qiagen PowerWater Sterivex Extraction Kit. The eDNA samples were randomly assigned to batches of 12 filters and processed following the manufacturer's protocol. Finally, the DNA of each filter replicate was eluted in 100 µl of elution buffer and stored at −20 °C until further processing.

eDNA metabarcoding: PCR steps
In total, three sequencing libraries targeting the COI barcode region were prepared. Two primer pairs were used for the NAWA Trend sites (n = 89), and only one primer pair for the NAWA Spez sites (n = 83). The fwhF2/EPTDr2n primer pair (Vamos et al. 2017, Leese et al. 2021) targets a 142 bp fragment within the Folmer region and was used for both the NAWA Trend and the NAWA Spez campaigns. The mlCOIintF/jgHCO2198 primer pair (Geller et al. 2013, Leray et al. 2013) is more degenerate and targets a 313 bp fragment in the same gene region. The NAWA Trend libraries were prepared with the Nextera XT library prep kit (Illumina, Inc., San Diego, CA, USA), and the fwhF2/EPTDr2n library was loaded onto a flow cell at a target concentration of 14 pM with 10% PhiX. The same was done for the mlCOIintF/jgHCO2198 library, with a target concentration of 16 pM. Both libraries were paired-end sequenced on an Illumina MiSeq (Illumina, Inc., San Diego, CA, USA). The NAWA Spez library (fwhF2/EPTDr2n) was sequenced on a NovaSeq, on one lane of an SP flow cell with v1 chemistry, using 150 bp paired-end sequencing.

Bioinformatic analysis
The two sequencing runs generated for the NAWA Trend were demultiplexed and quality-checked with FastQC (Andrews et al. 2012). Raw reads were then end-trimmed and merged, and primer sites were removed using usearch (Edgar 2010). We obtained chimera-filtered, zero-radius operational taxonomic units (ZOTUs) using UNOISE3 (Edgar 2016). For the NAWA Spez, the NovaSeq data were processed with a customized workflow, using default settings for all functions unless otherwise specified. A more detailed description is provided in the Supporting information.
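As a minimal sketch of the idea behind UNOISE3 (not the usearch implementation), the Python code below applies the abundance-skew criterion described by Edgar (2016): a less abundant sequence is merged into a more abundant centroid if their abundance ratio is at most β(d) = 1/2^(αd+1), where d is the number of differing positions and α is a tuning parameter (assumed here to be usearch's default, α = 2). For simplicity, the sketch assumes equal-length sequences compared by Hamming distance and omits chimera removal and the minimum-abundance filter; all function names are hypothetical.

```python
def beta(d: int, alpha: float = 2.0) -> float:
    """Maximum abundance skew for a sequence d differences away from a centroid."""
    return 1.0 / 2 ** (alpha * d + 1)


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def denoise(abundances: dict, alpha: float = 2.0) -> dict:
    """Greedy UNOISE-style denoising; returns centroid -> summed read count.

    Unique sequences are visited in decreasing abundance. Each is merged into
    the first centroid for which abundance(seq) / abundance(centroid) <= beta(d);
    otherwise it becomes a new centroid (a putative zero-radius OTU).
    """
    centroids = {}
    for seq, count in sorted(abundances.items(), key=lambda kv: -kv[1]):
        for c in centroids:
            if count / abundances[c] <= beta(hamming(seq, c), alpha):
                centroids[c] += count
                break
        else:
            centroids[seq] = count
    return centroids


reads = {"ACGTACGT": 1000, "ACGTACGA": 50, "TTGTACGA": 400}  # made-up unique reads
print(denoise(reads))
```

In this toy input, the one-mismatch variant carrying 5% of the centroid's reads is absorbed (β(1) = 0.125), whereas the three-mismatch sequence with 40% of the reads is kept as a separate ZOTU.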

Habitat suitability models
Habitat suitability models (HSM; Guisan et al. 2017) for each of the 127 species were computed using the 'N-SDM' software (Adde et al. 2023a). N-SDM is a state-of-the-art HSM platform built around a spatially nested framework that combines a 'global' model, quantifying the species' response to the bioclimatic conditions found across its full distributional range, with a 'regional' model fitted with fine-scale habitat covariates. In this study, the global and regional models were combined using the 'covariate' nesting strategy available in N-SDM. Two sets of species occurrence records were used, covering occurrences within and outside Switzerland. These records form the most extensive dataset available, combining data from routine biomonitoring and experts. For Switzerland, occurrence records for the 127 species, aggregated at 25 m resolution for the period 1980-2021, were provided by the Swiss Species Information Center InfoSpecies (www.infospecies.ch) on 23 August 2021 (https://doi.org/10.15468/htjezm). Occurrence records outside Switzerland for matching species and periods were obtained from the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/) on 27 October 2021 (https://doi.org/10.15468/dl.zwp3dx). To limit spatial clustering effects, occurrence records were spatially disaggregated. For each species and occurrence set, 10 000 background absences were randomly generated across the target areas to contrast with the observations. N-SDM default settings were used for modelling all species. For general specifications and technical details, see the settings file used for running N-SDM (Supporting information) and the companion ODMAP (Overview, Data, Model, Assessment and Prediction) protocol (Zurell et al. 2020) (Supporting information). In short, the 'covsel' covariate selection procedure (Adde et al. 2023b) embedded in N-SDM was used to select the best subset of variables for modelling each species (see the Supporting information for the detailed list of candidate variables and a description of the HSM fitting). The most frequently selected predictors in the global model were related to temperature (mean annual temperature, mean annual temperature range, temperature seasonality, isothermality) and precipitation (driest month, wettest month, seasonality) (Supporting information). Among the regional predictors, the most frequently selected were associated with land use and topography (agricultural cover, runoff, baseflow index, landslides, alpine pasture cover, distance to waterbodies) (Supporting information).
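The species-specific maxTSS threshold used to binarize the HSM predictions at the subcatchment level is, by definition, the probability cut-off that maximizes the true skill statistic, TSS = sensitivity + specificity − 1. A minimal Python sketch, assuming that predicted probabilities at presence records and at background points are already available, scans all observed probability values as candidate cut-offs; it illustrates the criterion only and is not the N-SDM code.

```python
def tss(pres_scores, bg_scores, threshold):
    """True skill statistic (sensitivity + specificity - 1) at one cut-off."""
    sensitivity = sum(p >= threshold for p in pres_scores) / len(pres_scores)
    specificity = sum(b < threshold for b in bg_scores) / len(bg_scores)
    return sensitivity + specificity - 1


def max_tss_threshold(pres_scores, bg_scores):
    """Return (threshold, TSS) for the cut-off that maximizes TSS."""
    candidates = sorted(set(pres_scores) | set(bg_scores))
    best = max(candidates, key=lambda t: tss(pres_scores, bg_scores, t))
    return best, tss(pres_scores, bg_scores, best)


presences = [0.9, 0.8, 0.7, 0.4]        # made-up predictions at presence records
background = [0.1, 0.2, 0.3, 0.5, 0.6]  # made-up predictions at background points
print(max_tss_threshold(presences, background))
```

For these made-up values, the cut-off 0.7 retains three of the four presences and excludes all background points, giving TSS = 0.75.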
16000587, 0, Downloaded from https://nsojournals.onlinelibrary.wiley.com/doi/10.1111/ecog.07267 by Paul Scherrer Institut PSI, Wiley Online Library on [06/06/2024].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
ing data was done using the 'phyloseq' package and the tidyverse environment (Wickham et al. 2023). All analyses at the subcatchment level (i.e. the 2 km² subcatchments) were based on shapefiles from the FOEN basic geodata set 'Topographische Einzugsgebiete Schweizer Gewässer' (topographic catchments of Swiss waterbodies). We used a layer with the geometries of all 2 km² catchments (n = 21 858; hereafter: subcatchments). This layer contains the subcatchments' spatial extent (polygons), a unique identifier for each subcatchment, and their nesting within larger 40 km² catchments. For each subcatchment, we obtained information on the spatial hierarchy (upstream H1/downstream H2) within the 40 km² catchment and the coordinates of its outlet (the lowest point in flow direction, through which the subcatchment drains). For the analysis, the continuous HSM predictions were discretized at the subcatchment level: a subcatchment was scored as suitable if the species' predicted probability exceeded the species-specific maxTSS threshold (the cut-off maximizing the true skill statistic) in any of its pixels. The result was a 'species × subcatchment' matrix containing presence/absence data for each of the EPT species. For the eDNA data, we filtered the sequencing data to species of the orders Ephemeroptera, Plecoptera and Trichoptera, and combined the detections from the two primer pairs into a presence/absence matrix for all sampling sites and all detected species. This constitutes the spatially most extensive dataset on eDNA-based EPT detection in Switzerland. The coordinates of the eDNA sampling sites were then intersected with the spatial layer to obtain the corresponding subcatchment identifiers (n = 172). We compared local detections between HSM and eDNA at the subcatchment level, individually for each species.
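The discretization and site-level comparison can be sketched in Python as follows, assuming pixel predictions are available as (subcatchment identifier, probability) pairs and eDNA detections as sets of site identifiers per primer pair. A subcatchment counts as suitable if any pixel exceeds the threshold, detections from the two primer pairs are merged with a logical OR, and each site is assigned to one of the four match/mismatch classes of Fig. 1. All names and data structures are hypothetical; the published analysis was run in R.

```python
from collections import defaultdict


def suitable_subcatchments(pixels, threshold):
    """Return the set of subcatchments with at least one pixel above `threshold`.

    pixels: iterable of (subcatchment_id, predicted_probability) for one species.
    """
    max_prob = defaultdict(float)
    for sc_id, prob in pixels:
        max_prob[sc_id] = max(max_prob[sc_id], prob)
    return {sc_id for sc_id, prob in max_prob.items() if prob > threshold}


def classify_sites(site_to_sc, suitable, detected_primer1, detected_primer2):
    """Assign each eDNA site to one of four match/mismatch classes.

    site_to_sc: site id -> subcatchment id (from intersecting site coordinates)
    detected_primer1/2: sets of site ids where each primer pair detected the species
    """
    detected = detected_primer1 | detected_primer2  # presence if either primer detects
    classes = {}
    for site, sc_id in site_to_sc.items():
        hsm, edna = sc_id in suitable, site in detected
        if hsm and edna:
            classes[site] = "both"
        elif hsm:
            classes[site] = "HSM only"
        elif edna:
            classes[site] = "eDNA only"
        else:
            classes[site] = "neither"
    return classes


# Toy example with hypothetical identifiers and a threshold of 0.5
suitable = suitable_subcatchments([(1, 0.2), (1, 0.8), (2, 0.3), (3, 0.6)], 0.5)
sites = {"s1": 1, "s2": 2, "s3": 3, "s4": 2}
print(classify_sites(sites, suitable, {"s1"}, {"s2"}))
```

Keeping the per-subcatchment maximum probability makes the "any pixel above threshold" rule a single comparison per subcatchment.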
The maps showing the overlap between methods were generated with the Swissriverplot package (Alther and Altermatt 2020). To account for eDNA detections outside suitable habitat, we tested the plausibility of DNA transport distances by calculating the Euclidean distance between the eDNA sampling coordinates and the outlets of upstream suitable subcatchments. This was an iterative process. First, for each species, we filtered the sampling sites where only eDNA detected the species. We then identified all subcatchments upstream within the same 40 km² catchment using the hierarchy information (H1, H2) provided in the geodata. We intersected these subcatchments with the HSM predictions and retained only subcatchments that were upstream of the eDNA sampling point and where suitable habitat was predicted. eDNA detections with no suitable habitat upstream were excluded from this calculation. Next, we calculated the Euclidean distances between the sampling site and the outlets of the suitable upstream subcatchments with the crossdist.default function in the spatstat package (Baddeley and Turner 2005). Because the HSMs were aggregated at the subcatchment level, the Euclidean distance to the sampling point is a reasonable approximation of the river path. For each eDNA sampling site with multiple suitable habitats upstream, we retained only the closest one. This shortest distance was calculated for all sites, out of the 172, where a species was detected only by eDNA and not predicted by the HSM. In total, the 136 species could have been detected at 172 sites, leading to 23 392 potential detections. Of these, 6199 were detections by eDNA only. For all of these, we calculated the minimal distance to the outlet of an upstream subcatchment with suitable habitat, where possible. For 824 of these eDNA-only detections (3.5% of all potential detections), there was no suitable habitat upstream. To test whether the calculated minimal distances reflect plausible transport effects, we created a null model of mean minimal distances.
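A minimal Python sketch of the distance step, under two stated assumptions: the H1/H2 hierarchy is replaced by a hypothetical table mapping each subcatchment to its downstream neighbour (restricted to one 40 km² catchment), and coordinates are planar metres. Here `math.dist` plays the role of crossdist.default in the R analysis, returning the Euclidean distance from the site to each candidate outlet; only the minimum is kept.

```python
import math
from collections import defaultdict


def upstream_of(target, downstream):
    """Return all subcatchments that drain (directly or indirectly) into `target`.

    downstream: dict subcatchment_id -> next subcatchment downstream (None at the
    catchment outlet); a hypothetical stand-in for the H1/H2 hierarchy.
    """
    children = defaultdict(list)
    for sc, down in downstream.items():
        if down is not None:
            children[down].append(sc)
    found, stack = set(), [target]
    while stack:
        for up in children[stack.pop()]:
            if up not in found:
                found.add(up)
                stack.append(up)
    return found


def min_upstream_distance(site_xy, site_sc, downstream, suitable, outlets):
    """Shortest Euclidean distance from an eDNA-only site to the outlet of a
    suitable upstream subcatchment, or None if none exists (scenario 3)."""
    candidates = upstream_of(site_sc, downstream) & suitable
    if not candidates:
        return None
    return min(math.dist(site_xy, outlets[sc]) for sc in candidates)


# Toy network (coordinates in metres): A -> B -> C <- D
downstream = {"A": "B", "B": "C", "C": None, "D": "C"}
suitable = {"A", "D"}
outlets = {"A": (0, 3000), "B": (0, 1500), "C": (0, 0), "D": (4000, 0)}
print(min_upstream_distance((0, 500), "C", downstream, suitable, outlets))
```

For a site in C, both A and D qualify as suitable upstream habitat, and the closer outlet (A, 2.5 km away) is returned; a site in the headwater subcatchment D has nothing upstream and returns None.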
For this, we randomly placed sampling sites across Switzerland (Supporting information) and assigned species detections to these points, keeping the number of detections per species equal to the observed number. We then calculated distances in the same way as for the observed detections: for every species and sampling point, we selected only upstream subcatchments with a positive HSM prediction and calculated the minimal distance between the random eDNA detection and these upstream subcatchments.
As every species had multiple sampling points with a (random) eDNA-only detection, we averaged the minimal distances per species, resulting in one mean minimal distance per species and a frequency distribution of mean minimal distances across species. We then tested for a difference between the observed shortest distances to upstream suitable habitat and those of the null model using a paired t-test, with species as the pairing unit.
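The per-species averaging and the paired comparison with the null model can be sketched with Python's standard library. The sketch assumes that per-site minimal distances (None where no suitable habitat lies upstream) are already available for observed and randomized detections; the randomization itself is not shown. The statistic is the standard paired t, i.e. the mean of the per-species differences divided by its standard error, with n − 1 degrees of freedom; scipy.stats.ttest_rel(null_means, observed_means) would return the same statistic together with a p-value.

```python
import math
import statistics


def species_mean_min_distance(min_distances_by_species):
    """Average per-site minimal distances for each species.

    Sites without suitable habitat upstream are stored as None and skipped;
    species with no valid distance are left out.
    """
    means = {}
    for species, dists in min_distances_by_species.items():
        valid = [d for d in dists if d is not None]
        if valid:
            means[species] = statistics.mean(valid)
    return means


def paired_t(observed, null):
    """Paired t statistic over species present in both dicts.

    Returns (t, degrees_of_freedom); t > 0 means null distances are longer.
    """
    shared = sorted(observed.keys() & null.keys())
    diffs = [null[s] - observed[s] for s in shared]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired species")
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Made-up mean minimal distances (km) for three species
observed = species_mean_min_distance({"sp1": [1.0, None], "sp2": [1.5, 2.5], "sp3": [1.5]})
null = {"sp1": 4.0, "sp2": 3.5, "sp3": 5.0}
print(paired_t(observed, null))
```

Pairing by species removes between-species differences in range size, so the test only asks whether each species' observed distance is shorter than its own randomized counterpart.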

General patterns in the prediction of habitat suitability and eDNA detections
EPT species showed highly heterogeneous distributions across Switzerland. For the least widespread species, suitable habitat was predicted in 492 of the 21 858 subcatchments (2.3%), and for the most common species in 21 479 subcatchments (98.3%). A subset of these subcatchments was sampled for eDNA (172 sampling sites) as part of the biomonitoring program targeting the same species. On average, the HSM predicted a species to occur at 99 ± 47 of the sampled subcatchments, whereas eDNA detected a species on average at 17 ± 24 sites (range 1-140). Thus, suitable habitat was more widely predicted across the 172 sampling sites than species were detected with eDNA (which also implies a higher predicted local species richness; Supporting information). The frequency with which a species was detected by eDNA and the frequency with which it was predicted by the HSM were significantly correlated (Supporting information).
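The shares of subcatchments with predicted suitable habitat follow directly from the subcatchment counts; a quick Python check:

```python
TOTAL_SUBCATCHMENTS = 21_858


def share_percent(n_suitable: int, total: int = TOTAL_SUBCATCHMENTS) -> float:
    """Percentage of subcatchments predicted as suitable."""
    return 100 * n_suitable / total


for label, n in [("least widespread species", 492), ("most common species", 21_479)]:
    print(f"{label}: {share_percent(n):.1f}%")
```

This prints shares of about 2.3% and 98.3% of all subcatchments.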
Fig. 1A aggregates the overlap between HSM predictions and eDNA detections across all species. Many NAWA sampling sites were predicted to be unsuitable for a given EPT species, which was also not detected there with eDNA (Fig. 1A, dark-blue boxplot, mean = 27, SD ± 47). The number of sites where suitable habitat was predicted but eDNA did not detect the species was also high on average (Fig. 1A, orange boxplot, mean = 83, SD ± 42). Congruence between the methods was generally low, as eDNA detected species within predicted suitable habitat at only a few sites (Fig. 1A, magenta boxplot, mean = 9, SD ± 23). Similarly, eDNA sampling led to few detections at sites where no suitable habitat was predicted (Fig. 1A, light-blue boxplot, mean = 7, SD ± 9). Overall, the HSMs predicted EPT species to be more broadly distributed than eDNA detected them (i.e. species were often predicted in subcatchments where eDNA was sampled but did not detect them), whereas the number of species detected with eDNA but not predicted at a site was low.
The distribution of species (both the predicted suitable habitat and the eDNA detections at the sampled sites) was highly variable between species (Fig. 1B). The most common species, at the top of the distribution (Baetis rhodani), was detected/predicted at most sites, whereas species at the bottom of the distribution have narrow geographic ranges, which is reflected in little predicted suitable habitat and few eDNA detections.

Figure 1. Number of detections/predictions for each of the 127 aquatic insect species across the 172 sites, and match/mismatch between the eDNA and HSM approaches. (A) Across all species, the number of sites of positive congruence (both methods detected/predicted the species) is given in magenta; the number of sites at which only one method predicted/detected the species is given in orange (HSM only) and light blue (eDNA only); and the number of sites of negative congruence (neither method detected/predicted the species) is given in dark blue. (B) Fraction of records, visualizing differences in detectability across all species; horizontal bars indicate the proportions of sites assigned to the four categories described in (A), with species ordered by decreasing number of HSM predictions from top to bottom.

Species maps show spatial integration of the detections/predictions
EPT species show distinct distributions across the Swiss landscape. Species were widespread in the lowlands of the Swiss Plateau (Fig. 2A), or, in an almost opposite pattern, constrained to alpine streams (Fig. 2B), or showed narrower distribution ranges, e.g. on the northern flanks of the Alps (Fig. 2C). For each species, habitat suitability was predicted as presence/absence at fine resolution (pixel size) across all catchments in Switzerland (left panels). The predictions were aggregated at the subcatchment level, and the range mapping subsequently constrained the analysis to subcatchments within a certain range of the observations (middle panels). Combining the predicted suitable habitat with the range mapping resulted in maps of suitable habitat that were compared with species detections at discrete eDNA sampling sites (right panels).
To illustrate the heterogeneous patterns in detections/predictions, we show three example species that differ in their occurrence and that summarize the general findings across all 127 species; maps for all individual species are available in the Dryad repository (S2). The habitat suitability maps of three insect species with distinct distribution patterns (Fig. 2A Brachyptera risi, 2B Drusus biguttatus, and 2C Rhithrogena dorieri) were overlaid with the detections from the eDNA sampling campaign.

Spatial linkage between suitable habitat predictions and eDNA detections
In a small proportion of sites, eDNA sampling also detected species' signals where no suitable habitat was predicted. This spatial mismatch points to potential flow-directed linkages between the eDNA signal and upstream suitable habitat. For a given species (e.g. Chaetopteryx villosa), we illustrated its detections and suitable habitat across the 172 sites (Fig. 3A). The mismatches appear in different spatial patterns, illustrated in the insert (Fig. 3B): 1) At some sites, eDNA did not detect the species even though the site was situated in a subcatchment where the HSM predicted suitable habitat; however, at a sampling site further downstream, eDNA picked up the species' signal. 2) A large, coherent habitat patch predicted as suitable for the species was sampled at multiple sites, but eDNA detected the species at only one of them. 3) At the last site, the species was detected only by eDNA, at a single isolated sampling site; the closest sampling sites were not predicted to be suitable habitat, but multiple nearby subcatchments upstream were.

Figure 2. Predicted habitat suitability of three representative aquatic insect species, namely Brachyptera risi (A), Drusus biguttatus (B), and Rhithrogena dorieri (C), and the match/mismatch with their recorded occurrence based on a single eDNA sampling campaign. Left panels show predicted suitable habitat for each of the three species at 25 m resolution, using global and local environmental predictors. Middle panels show aggregated suitable subcatchments (light-green polygons, at about 2 km² scale) based on true records. For this, a biogeographic filter constrained potentially suitable catchments based on a clustering around true records (historic observations, red plus signs). Finally, the right panels show the combined predicted potential distribution, based on the intersection between predicted suitable habitat and observed biogeographic range. At 172 sampling sites, the respective species was then assessed using eDNA, and the match/mismatch between the two methods (HSM and eDNA) is shown as colored dots, indicating whether a species was locally detected (or not) with eDNA and whether it was locally predicted by the HSM (or not).

Minimal upstream distance of suitable habitat to eDNA detections
Locally, we distinguished between scenarios based on the overlap of suitable habitat predictions and eDNA detections at the sampling sites (Fig. 4A). Where the two coincided (both positive or both negative for a given species' signal), the HSMs validated the eDNA signals (Fig. 4A, scenario 1).
At some sampling sites, we detected the DNA of a species outside the predicted suitable habitat (Fig. 4A, scenario 2). Such a scenario can also provide information on the possible spatial extent of eDNA transport and/or the possible limits of the spatial projection of the HSM. Detections by eDNA sampling only can be linked to a subcatchment with suitable habitat situated upstream at a distance D (Fig. 4A). Alternatively, eDNA detections may not be hydrologically linked to any suitable habitat upstream (Fig. 4A, scenario 3). This spatial context allows us to infer the minimal distance D for sampling sites with upstream suitable habitat. For each species detected only by eDNA at a site, we calculated the minimal distance between the sampling site of the eDNA detection and the closest upstream suitable subcatchment, and averaged it across all eDNA-only sites per species (Supporting information). The mean minimal distances ranged between 0.2 and 41 km across species (median 1.06 km, median absolute deviation ± 1.2 km). To put these minimal distances into a spatial context, we calculated minimal distances for randomized species detections at random points across Switzerland (Supporting information), keeping the total number of detections per species constant, i.e. rare species had fewer random detections and widespread species were detected at more random sites (Supporting information). This randomization of possible eDNA-based species detections disrupts the linkage between eDNA and upstream HSM predictions for a given species. The minimal distances between random detections and suitable habitat predictions were significantly longer (p < 0.001, t = 4.2, df = 122), ranging from 0.9 to 13 km (median 4.2 km, MAD ± 3.8 km), with no direct correlation with the observed distances of a given species (Supporting information).

Figure 3. (A) Distribution of suitable habitat at the catchment scale based on a habitat suitability model (HSM), and eDNA sampling across Switzerland, exemplified with the Trichoptera species Chaetopteryx villosa. The predictions of suitable habitat were aggregated at the subcatchment level (green polygons) and compared locally with eDNA sampling sites (colored points). At seven sites, the HSM prediction and the eDNA detection were congruent (magenta points); at 31 sites, the HSM predicted suitable habitat, yet the species was not detected with eDNA (orange points); and at 7 sites, the species was detected with eDNA outside subcatchments with predicted suitable habitat (light-blue points). Finally, the species was neither detected nor predicted at the remaining 132 sites. Mismatches between eDNA and HSM can arise, among other reasons, because the HSM predicts habitat suitability rather than occurrence in the strict sense, or because eDNA can pick up a signal downstream of an occurrence due to transport, as illustrated in the insert (B). Focusing on the spatial linkage between the detections: 1) some sites were situated in a suitable subcatchment, but eDNA did not detect the species locally, only at the nearest site downstream (light-blue point); 2) in a larger coherent suitable habitat patch, the HSM predicted all sampling sites in the subcatchments to be suitable, but eDNA overlapped at only one sampling site (magenta point); 3) at a sampling site further downstream, only eDNA detected C. villosa, with multiple suitable habitat patches upstream and in close proximity to the detection.
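Summaries such as the median and the median absolute deviation (MAD) of per-species minimal distances can be computed as below. One caveat worth making explicit: R's mad() multiplies the raw MAD by a default consistency constant of 1.4826, so reported values depend on whether this scaling was applied; the sketch exposes it as a parameter. The input values are made up and are not the distances reported above.

```python
import statistics


def median_and_mad(values, constant=1.0):
    """Median and median absolute deviation (MAD) of a sequence of distances.

    constant=1.0 gives the raw MAD; R's mad() uses 1.4826 by default, which
    rescales the MAD to estimate a standard deviation under normality.
    """
    med = statistics.median(values)
    mad = constant * statistics.median([abs(v - med) for v in values])
    return med, mad


distances_km = [0.5, 1.0, 2.0, 4.0, 10.0]  # made-up per-species values
print(median_and_mad(distances_km))          # raw MAD
print(median_and_mad(distances_km, 1.4826))  # R-style scaled MAD
```

The median and MAD are robust to the few very long distances (such as the 10 km value here), which is why they summarize a right-skewed distance distribution better than the mean and standard deviation.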

Discussion
To understand the state and change of biodiversity in freshwater systems, a robust and spatially explicit understanding of species' occurrences is needed. As two complementary approaches, habitat suitability models (HSMs) provide information about species' potential occurrence, while eDNA-based detections indicate species' actual occurrence. Both approaches are used individually in ecological studies to characterize biodiversity, yet few studies have combined them (Riaz et al. 2020). Here, we integrated high-resolution habitat suitability models with eDNA-based assessments of aquatic invertebrates in riverine networks to understand their individual and combined capacity to inform on species' occurrence, as well as the spatial extent of eDNA-based inference. Habitat suitability models are based on environmental parameters that are documented more continuously throughout a landscape, and can thus inform about gaps between eDNA sampling points. We used the predictions of suitable habitat for the aquatic insect orders Ephemeroptera, Plecoptera and Trichoptera (EPTs) to identify DNA signals detected at sampling sites outside subcatchments with predicted suitable habitat, but downstream of suitable subcatchments. The predictions integrated the eDNA sampling into the landscape and allowed us to establish a spatial link to suitable habitat upstream by calculating a minimal distance of DNA transport. We found that the observed distances between local eDNA detections and upstream HSM predictions were significantly shorter than expected from randomly distributed detections.

The predicted versus the observed niche: mismatches between HSM and eDNA
The HSMs estimated species to be more homogeneously distributed across the sampling sites than observed with eDNA, for two reasons. As with every sampling technique, false negatives and false positives also occur with eDNA sampling, and they cannot be minimized at the same time. With only one sampling campaign, we expect false negatives to be more relevant than false positives; further sampling time points, larger volumes or greater sequencing depth would likely have resulted in more species being detected, yet at the cost of more false positives (Goldberg et al. 2016, Cristescu and Hebert 2018, Altermatt et al. 2023). As such, our approach is relatively conservative, which also allows us to infer transport distances of upstream signals that at some point fade away and become undetectable.
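The expected gain from additional replicates, sampling occasions or sequencing effort can be illustrated with a simple calculation that assumes each replicate independently detects a present species with the same probability p (a hypothetical value; detection probabilities were not estimated in this study): the probability of at least one detection in k replicates is 1 − (1 − p)^k.

```python
def cumulative_detection(p: float, k: int) -> float:
    """Probability of at least one detection in k independent replicates,
    each detecting a present species with probability p."""
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError("p must lie in [0, 1] and k must be non-negative")
    return 1.0 - (1.0 - p) ** k


# Hypothetical per-replicate detection probability of 0.3
for k in (1, 4, 8):
    print(k, round(cumulative_detection(0.3, k), 3))
```

With p = 0.3, a single replicate detects the species 30% of the time, whereas four replicates raise this to about 76%. The independence assumption is a simplification, and the same added effort would also accumulate false positives.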
Importantly, some level of mismatch (e.g. predicted absence of a species yet a positive eDNA signal, or vice versa) is expected, as HSMs indicate possible habitats but do not prove species occurrences or absences, such that species can also be found outside the predicted range. Conversely, the presence of a DNA signal does not automatically indicate the presence of a species, as DNA is known to be transported or imported through vectors.

Figure 4. (A) The spatial linkage between predicted suitable habitat and eDNA detections for any given species can fall into three classes (cases numbered 1 to 3). 1) A positive eDNA signal (red) coincides with a subcatchment in which the HSM predicts suitable habitat (light green). 2) A positive eDNA signal is found in a subcatchment, yet the HSM predicts suitable habitat only upstream. The distance D to this closest upstream subcatchment is calculated, used for the calculation of the minimal upstream distance to suitable habitat, and interpreted as possible eDNA signal transport through water flow. 3) A positive eDNA signal is observed in a subcatchment, yet without predicted suitable habitat in this or any upstream subcatchment. These cases were calculated for all species in (B). The observed minimal distance (∆D) for every eDNA detection of a species (red) is compared with that of randomly selected detections (grey) to the respective closest upstream suitable habitat. The observed minimal distances were averaged across all detections for each species. The y-axis shows the number of species for a given mean minimal distance (x-axis).
Methodological biases of habitat suitability models are associated with the resolution of the geo-environmental predictors (e.g. geographic, topographical, climatic) used to describe the realized environmental niche of species (Guisan et al. 2017). The species occurrence records used for fitting the HSMs reflect a combination of the abiotic environment, biotic interactions in the local community, and to some extent the ability of species to reach a location (dispersal) (Wisz et al. 2013, Riaz et al. 2020, Poggiato et al. 2021). Nevertheless, these models still tend to overpredict distributions; this can be corrected a posteriori, e.g. by restricting predictions to regional species pools derived from actual species ranges (e.g. IUCN) and from observation-based polygons (Lessard et al. 2012).

Integrating discrete sampling (eDNA) with continuous predictions (HSM)
Of particular interest among the mismatching scenarios are eDNA detections of a species where there is no local overlap with predicted suitable habitat, but suitable habitat is situated further upstream. The continuous and spatially explicit information that HSMs provide for upstream areas makes it possible to infer potential unseen populations of a sampled species, and also to validate eDNA-based detections where they are locally congruent. Such validation is highly relevant for eDNA-based monitoring of waterbodies and has so far often been done by comparison with traditional methods at a local level (Brantschen et al. 2021, Keck et al. 2022b). Although HSMs are not intended to provide biomonitoring data, they have the potential to complement and validate field-based methods (here eDNA) and to embed the sampling point in a landscape perspective, leading to a broader understanding of biodiversity dynamics in the context of conservation and management (Riaz et al. 2020).
We used the broader spatial scope of HSMs to estimate eDNA transport within a catchment, as eDNA sampling provides an integrative perspective of upstream biodiversity (Deiner et al. 2017, Carraro et al. 2020a). Experimental studies and statistical approaches have been employed to better disentangle the transport, retention, and resuspension of DNA in flowing waters (Harrison et al. 2019). Measurements of DNA signals in artificial stream channels yielded estimated transport distances between 54 m and 81 m (Shogren et al. 2017), whereas similar experiments placing caged fish in natural streams successfully detected DNA at 240 m (Jane et al. 2015) or up to 1.3 km downstream (Thalinger et al. 2020). Modelling approaches can upscale the predictions from experiments; for example, a simple mass balance model estimated mussel DNA to be transported 35 km downstream (Sansom and Sassoubre 2017). In observational studies, DNA signals of a rare species indicated DNA transport >20 km downstream (Villacorta-Rath et al. 2021), and detections of geographically constrained species in downstream rivers yielded transport distances of at least 10 km for lake species (Deiner and Altermatt 2014), more than 3 km for mussels (Stoeckle et al. 2021), and around 60 km for the fish species Coregonus sp. (Pont et al. 2018).
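To illustrate how a simple model can upscale experimental transport estimates, the sketch below assumes steady, plug-flow transport with first-order DNA loss, so that concentration declines downstream as C(x) = C0·exp(−k·x/v), and takes the maximal detectable distance as the point where C(x) falls to a detection limit. This is a generic assumption for illustration, not necessarily the mass balance model of Sansom and Sassoubre (2017); the function names and all parameter values are hypothetical.

```python
import math


def concentration_at(x_m: float, c0: float, k_per_s: float, v_m_per_s: float) -> float:
    """eDNA concentration x_m metres downstream of a source (plug flow, first-order loss)."""
    return c0 * math.exp(-k_per_s * x_m / v_m_per_s)


def max_detectable_distance(c0: float, c_lod: float, k_per_s: float, v_m_per_s: float) -> float:
    """Distance (m) at which the concentration decays to the detection limit c_lod."""
    if k_per_s <= 0 or v_m_per_s <= 0 or c_lod <= 0:
        raise ValueError("k, v and the detection limit must be positive")
    if c0 <= c_lod:
        return 0.0  # signal already at or below the detection limit at the source
    # Solve c0 * exp(-k * x / v) = c_lod for x.
    return (v_m_per_s / k_per_s) * math.log(c0 / c_lod)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    x = max_detectable_distance(c0=100.0, c_lod=1.0, k_per_s=1e-4, v_m_per_s=0.5)
    print(f"maximal detectable distance: {x / 1000:.1f} km")
```

With these placeholder values, x_max = (0.5 / 10⁻⁴)·ln(100) ≈ 23 km; the distance scales linearly with flow velocity and inversely with the decay rate, which is one reason why field estimates diverge so widely.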
Distances estimated from observational, experimental, or statistical approaches vary widely (Jo and Yamanaka 2022). Experiments are usually constrained by the length of the artificial channels and cannot robustly predict distances realized in the field, whereas observational approaches depend on knowledge of source populations upstream of the eDNA sampling point. The latter requires continuous data on species distributions along a river upstream of the eDNA sampling point. The sampling of communities is limited to discrete points, but understanding the transport dynamics of eDNA is crucial for the spatial interpretation of biomonitoring data and for designing efficient sampling networks (Carraro et al. 2020b).
Using HSMs to establish the spatial linkage between populations and their genetic traces, as done here for 127 insect species, offers a more scalable approach to estimating DNA transport distances (median minimal estimated transport distance: 1.06 km ± 1.2 km median absolute deviation) across systems spanning a gradient of environmental and hydrogeological conditions, and across multiple species. Importantly, our approach can be adapted more widely to the monitoring of multiple taxonomic groups with variable body sizes and dispersal potential. In river systems, combining HSMs with eDNA monitoring could inform about species migrations and suitable habitat, or even identify populations that would otherwise go unnoticed. As eDNA sampled in rivers can integrate biodiversity information across ecosystems, the application may be especially powerful in remote and hardly accessible landscapes or for elusive species, where sampling eDNA from rivers informs not only about the distribution of aquatic but also of terrestrial species upstream. Such detections can be validated through habitat suitability models and guide effective conservation and management.
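The summary statistic reported above (a median across species with a median absolute deviation) can be sketched as follows, assuming each species contributes one mean minimal distance averaged over its detections. R's `mad()` multiplies by a consistency constant of 1.4826 by default; whether the reported value is scaled is not stated here, so the sketch exposes the factor as a `scale` parameter. The helper names and the example distances are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple


def per_species_mean(distances_by_species: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the minimal distances of all detections of each species (empty lists are skipped)."""
    return {sp: mean(d) for sp, d in distances_by_species.items() if d}


def median_and_mad(values: Sequence[float], scale: float = 1.0) -> Tuple[float, float]:
    """Median and median absolute deviation; pass scale=1.4826 to mimic R's default mad()."""
    if not values:
        raise ValueError("need at least one value")
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center, scale * mad


if __name__ == "__main__":
    # Hypothetical per-detection distances (km), for illustration only.
    example = {"sp1": [0.0, 1.2, 2.5], "sp2": [0.4], "sp3": [3.1, 0.9]}
    species_means = list(per_species_mean(example).values())
    print(median_and_mad(species_means))
```

Median and MAD are robust to the few species with very long inferred distances, which a mean and standard deviation would not be.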
In summary, the estimated transport distance is within the range of previously reported empirical values and allows transport distances to be extrapolated across many taxa and riverine systems. The combination of eDNA and habitat suitability models allows larger scale and spatially integrative inferences about communities, ultimately needed for the management and protection of biodiversity.

Figure 4. (A) Conceptual illustration of how we calculated the minimal distance between an eDNA detection and the closest upstream predicted suitable habitat (HSM) of the respective species. The spatial linkage between predicted suitable habitat and eDNA detections for any given species can fall into three classes (numbered 1 to 3). 1) A positive eDNA signal (red) coincides with the subcatchment in which the HSM predicts suitable habitat (light green). 2) A positive eDNA signal is found in a subcatchment, yet the HSM predicts suitable habitat only upstream. The distance D to this closest upstream subcatchment is calculated, used as the minimal upstream distance to suitable habitat, and interpreted as possible eDNA signal transport through water flow. 3) A positive eDNA signal is observed in a subcatchment, yet the HSM predicts no suitable habitat in this or any upstream subcatchment. (B) These cases were calculated for all species. The observed minimal distance (∆D) to the closest upstream suitable habitat is calculated for every eDNA detection of a species (red) and compared to that of randomly selected detections (grey). The observed minimal distances were averaged across all detections of each species. The y-axis shows the number of species for a given mean minimal distance (x-axis).
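The three-class logic of Figure 4A can be sketched as an upstream search on a dendritic river network. This is a minimal sketch under stated assumptions, not the implementation used in this study: the network is encoded (hypothetically) as a mapping from each subcatchment to its downstream neighbour and the channel length in km between their outlets, distances are measured along the channel between outlets, and whether class-1 detections (distance 0) enter the per-species mean is left as an `include_local` option. `random_null` is one possible way to draw "randomly selected detections" and is likewise hypothetical.

```python
import heapq
import random
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical encoding: subcatchment -> (downstream subcatchment, channel length in km
# between the two outlets); the catchment outlet maps to None.
Network = Dict[str, Optional[Tuple[str, float]]]
Upstream = Dict[str, List[Tuple[str, float]]]


def upstream_graph(network: Network) -> Upstream:
    """Invert downstream links so we can walk from a sampling site towards the headwaters."""
    up: Upstream = defaultdict(list)
    for node, link in network.items():
        if link is not None:
            downstream, length_km = link
            up[downstream].append((node, length_km))
    return up


def classify_detection(node: str, suitable: Set[str], up: Upstream) -> Tuple[int, Optional[float]]:
    """Return (class, distance_km) for one eDNA detection, following classes 1-3 of Figure 4A."""
    if node in suitable:
        return 1, 0.0  # class 1: detection coincides with predicted suitable habitat
    # Dijkstra search upstream: the first suitable subcatchment popped is the closest one.
    best = {node: 0.0}
    heap = [(0.0, node)]
    while heap:
        dist, current = heapq.heappop(heap)
        if dist > best[current]:
            continue  # stale heap entry
        if current in suitable:
            return 2, dist  # class 2: suitable habitat only upstream, at distance D
        for parent, length_km in up.get(current, []):
            candidate = dist + length_km
            if candidate < best.get(parent, float("inf")):
                best[parent] = candidate
                heapq.heappush(heap, (candidate, parent))
    return 3, None  # class 3: no suitable habitat here or anywhere upstream


def mean_minimal_distance(detections: Iterable[str], suitable: Set[str], up: Upstream,
                          include_local: bool = True) -> Optional[float]:
    """Average the distances of one species' detections; class 3 has no distance and is skipped."""
    distances = []
    for node in detections:
        cls, dist = classify_detection(node, suitable, up)
        if cls == 2 or (cls == 1 and include_local):
            distances.append(dist)
    return mean(distances) if distances else None


def random_null(all_nodes: Iterable[str], n_detections: int, suitable: Set[str], up: Upstream,
                n_draws: int = 999, include_local: bool = True, seed: int = 0) -> List[float]:
    """Hypothetical null: mean minimal distance for randomly placed pseudo-detections."""
    rng = random.Random(seed)
    nodes = sorted(all_nodes)
    draws = []
    for _ in range(n_draws):
        sample = rng.sample(nodes, n_detections)
        value = mean_minimal_distance(sample, suitable, up, include_local)
        if value is not None:
            draws.append(value)
    return draws
```

In a strictly dendritic network every upstream subcatchment is reached by a single path, so a plain traversal would give the same distances; the priority queue simply lets the search stop at the nearest suitable subcatchment. Comparing a species' observed mean with the distribution returned by `random_null` mirrors the red-versus-grey contrast of panel B.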