Thermal stratification and fish thermal preference explain vertical eDNA distributions in lakes

Significant advances have been made towards surveying animal and plant communities using DNA isolated from environmental samples. Despite rapid progress, we lack a comprehensive understanding of the “ecology” of environmental DNA (eDNA), particularly its temporal and spatial distribution and how this is shaped by abiotic and biotic processes. Here, we tested how seasonal variation in thermal stratification and animal habitat preferences influence the distribution of eDNA in lakes. We sampled eDNA depth profiles of five dimictic lakes during both summer stratification and autumn turnover, each containing warm- and cool-water fishes as well as the cold-water stenotherm, lake trout (Salvelinus namaycush). Habitat use by lake trout was validated by acoustic telemetry and was significantly related to eDNA distribution during stratification. Fish eDNA became “stratified” into layers during summer months, reflecting lake stratification and the thermal niches of the species. During summer months, lake trout, which rarely ventured into shallow waters, could only be detected at the deepest layers of the lakes, whereas the eDNA of warm-water fishes was much more abundant above the thermocline. By contrast, during autumn lake turnover, the fish species assemblage as detected by eDNA was homogenous throughout the water column. These findings contribute to our overall understanding of the “ecology” of eDNA within lake ecosystems, illustrating how the strong interaction between seasonal thermal structure in lakes and thermal niches of species on very localised spatial scales influences our ability to detect species.


Introduction 24
Environmental DNA (eDNA) is increasingly being used to conduct biodiversity surveys, 25 species occupancy studies, and detect endangered and invasive species (Deiner et  By contrast, the influence of water movement on eDNA transport and species detection has 66 largely been neglected for lacustrine systems. An important seasonal feature of many 67 temperate lakes is stratification, where isolated layers of water are formed. During summer, 68 the upper warm layer (epilimnion) is separated from a deep, cold layer of the lake 69 (hypolimnion) by the formation of a thermocline (a temperature-dependent density gradient) 70 between these layers. Brief periods of whole water-column mixing occur prior to and after 71 stratification in dimictic lakes during spring and autumn (Wetzel, 2001). These hydrological 72 layers give rise to distinct temperature and oxygen conditions that create different habitat 73 niches for aquatic organisms. Thus, the seasonal cycle of lake stratification can concentrate 74 organisms within, or isolate organisms from, certain habitats at different times of the year. 75 The general view is that eDNA signal is more or less homogenous in freshwater lakes and 76 ponds due to the relatively small size of such habitats when compared with the much larger 77 and less discrete marine realm. However, there have been interesting insights from studies 78 of single lakes which have found differences in eDNA community composition at the top and 79 bottom of the water column, possibly indicating a role for the thermocline in separating these 80 molecular signals (Hänfling et al., 2016 Temperature-driven habitat segregation among species of freshwater fish has the potential 120 to create depth-specific molecular signals during stratification. Temperate freshwater lakes 121 often remain stratified for about half of the calendar year. Given that warm-and cold-water 122 fishes spend most of their time at shallower and deeper depths respectively during 123 stratification, it is likely that they release the bulk of their eDNA in these habitats. The 124 general view is that eDNA signals of aquatic organisms are more or less homogenous in 125 freshwater lakes and ponds, despite the distinct thermal preferences of the fish occupying 126 these ecosystems. Thus, eDNA studies often involve the collection of surface water samples 127 only, without considering the important seasonal forces which shape thermal stratification 128 and the habitat preferences of organisms. However, there is emerging evidence that eDNA 129 can reflect local species richness and also peak in concentration during seasonal events 130 In this study we explored the impact of lake stratification and turnover on the distribution of 134 eDNA in dimictic lakes and make specific predictions for warm-and cold-water fishes. We 135 validated our results by simultaneously collecting detailed acoustic telemetry data to define 136 fine-scale habitat preferences of an obligate cold-water stenothermic fish (lake trout). We 137 hypothesised that: 1) Lake thermal stratification (i.e. summer) results in strong stratification 138 of eDNA signals for species that are highly constrained (cold-and warm-water species) and 139 less stratification for more generalist species (cool-water species) ( Figure 1A). 2) Isothermal 140 conditions (i.e. autumn turnover) result in homogenous eDNA signals for all thermal guilds of 141 fishes throughout the water column ( Figure 1B).  (Table S1). Monitoring of fish species at IISD-ELA has been conducted annually or 150 bi-annually since the 1970s, therefore the species composition of most lakes is well known. 151 There are 14 species of fish across all the study lakes (mean 8, range 6-10 species per lake, 152 Table S2). All lakes have overlapping community compositions, including lake trout 153 (Salvelinus namaycush), a cold-water top predator, in every lake. Sampling dates were 154 chosen based on decades-long records of the timing of seasonal stratification and turnover 155 (mixing) in these lakes. Moreover, temperature measurements of the water column were 156 used to confirm lake stratification or turnover at the time of sampling (Table S3). 157 Water samples were taken at six depths, evenly dispersed throughout the water column at 158 the deepest point of each lake (Table S3). Four 500 ml replicate water samples were taken 159 per depth using an electrical pump and Jayflex PVC tubing (Winnipeg Johnston Plastics, 160 MB, Canada) secured to a weight. To prevent contamination between lakes, dedicated 161 tubing was used for each lake. Moreover, to prevent contamination among depth samples 162 within a lake, the tubing was cleaned by flushing one litre of 30% bleach, then one litre of 163 distilled water, followed by a two-minute flush of depth-specific lake water through the 164 apparatus. For each sampling point, 500 ml of lake water was sampled and stored in an  Table S2). For lake trout, we collected acoustic telemetry data on depth 177 occupancy to determine seasonal habitat use and compared it with depth profiles collected 178 with eDNA data. Extensive telemetry studies conducted at IISD-ELA over the past two 179 decades have shown that the seasonal vertical distribution of lake trout is strongly influenced downloaded (~8 h duration per lake, semi-annually). The pressure sensor data were 198 converted to depth information using Vemco VUE software for each detection for the 199 duration of the study (yielding ~200-700 depth detections for each fish in a typical 24-hr 200 period). After downloading, duplicate detections (single tag signals detected by more than 201 one receiver) were removed. In order to assess whether different time periods of cumulative 202 eDNA persistence in the lakes affected the relationship between eDNA counts and telemetry 203 data, we grouped telemetry data for each fish at different temporal scales, ranging from the 204 day of eDNA sample collection, as well as one week, and one month prior to sample 205 collection. The total number of detections of all fish were grouped into depth intervals 206 reflecting the vertical distribution of the eDNA sampling (6 intervals per lake). We adjusted 207 for varying depth interval size and variation in the total amount of telemetry detections for 208 each lake over the relevant time period. 209

Molecular analysis 210
DNA was extracted from filters using the Qiagen Blood and Tissue kit. We followed the 211 manufacturer's instructions with minor modifications: 370 µl buffer ATL was used in the initial 212 incubation step, and the DNA was eluted in 2 x 60 µl of AE buffer. After elution, DNA was 213 stored at -80 ⁰C. We included a DNA extraction control of blank sample for each lake. All

Contamination prevention 236
Steps to prevent contamination were taken at each phase of work. During fieldwork, we used 237 a dedicated boat and separate tubing for each lake to prevent between-lake transfer of DNA. 238 All field equipment was decontaminated in 30% bleach and triple-washed with distilled water 239 the evening before. Nitrile gloves were used when collecting the samples and changed 240 between sampling points. The field lab used for filtering and storing of field equipment at 241 IISD-ELA had not previously been used for sampling or storage of animal tissues. Benches 242 were cleaned thoroughly with 20% bleach before use. After use, Buchner filtration funnels 243 were washed in soapy water, soaked in 30% bleach for ten minutes, and vigorously triple-244 rinsed in ultrapure water between samples. DNA extraction and pre-PCR preparation were 245 conducted in a dedicated environmental DNA lab at McGill University. The lab and 246 equipment were thoroughly cleaned with 10% bleach before and after use (e.g. surfaces, 247 floors, main shelving). Filter tips were used for all molecular work. There was no detectable 248 PCR amplification in any field, DNA extraction or PCR negative controls based on gel 249 electrophoresis, but we included all blanks for sequencing. 250

Bioinformatics 251
We used custom scripts to remove adapters, merge paired sequences, check quality and 252 generate amplicon sequencing variants (ASVs). Samples were received as demultiplexed 253 fastq files from Génome Québec. Non-biological nucleotides were removed (primers, indices 254 and adapters) using cutadapt (Martin, 2011). Paired reads were merged using PEAR 255 Information). We used a custom reference database which contained only fish known to 266 exist in the Lake of the Woods region (Ontario, CA), downloaded from NCBI. Biomonitoring 267 has been ongoing since the 1960s so there is a well-developed knowledge of species 268 composition in this area. We also compared our assignments against the full NCBI database 269 and found only one additional fish ASV with the larger database. This matched to the 270 Hypophthalmichthys genus (a carp), which is not known to exist at IISD-ELA but appeared at 271 high abundance in one sample, possibly indicating a laboratory false positive. Other 272 taxonomic groups appeared at very low frequencies when our ASVs were matched against 273 the NCBI database, such as bacterial, mammalian and bird taxa, but as they were not the 274 focus of our study they were excluded. 275

Statistical approach 276
We used a variance stabilising transformation on our sample x ASV matrix to account for 277 uneven library size across our samples. Unlike rarefaction, this approach does not discard 278  We examined the relationship between fish community assemblages and the interaction 286 between lake depth and lake state (stratified or isothermal) with PERMANOVA analysis. We 287 used a Bray-Curtis distance matrix on our transformed sample x ASV matrix as the response 288 variable. We tested the interaction between lake depth (coded as a continuous variable) and 289 lake state on community composition, specifying 5000 permutations constrained within lake 290 "strata". We then tested for homogeneity in multivariate dispersion between our groups with 291 the function betadisper. We used non-metric multi-dimensional scaling to visualise fish 292 communities, by specifying either 2 or 3 dimensions (to minimise stress and achieve 293 convergence) and 200 random starts. 294 We explored the contribution of each species to seasonal differences in ASV counts at 295 different depths by fitting mixed effects models. We used ASV count for each species in 296 each sample as the response variable modelled as the interaction between lake state (i.e. 297 stratified or isothermal), depth of sample, and fish species to investigate whether 298 stratification and turnover had variable effects for different species. We implemented 299 negative binomial mixed effects models with lake identity as a random effect in glmmTMB 300 us to control for library size while retaining interpretable response data (for example, in 303 comparison to transforming variables which has been used in other studies). We also fitted 304 several reduced models and compared these with AIC, always retaining the lake identity as 305 a random effect term due to the nature of the experimental design. Once we had selected 306 our best-fitting model with AIC, we confirmed the significance of the highest-level interaction 307 term with a likelihood ratio test. Final models were evaluated for overdispersion. 308 We fitted a second series of mixed effects models to examine the relationship between the 309 strength of eDNA signal in the water and habitat use by lake trout as detected by acoustic 310 telemetry. We fitted the counts of lake trout ASVs as the response variable, and the 311 interaction between lake state (stratified or isothermal) and telemetry detections as the 312 explanatory variables, as this would allow the relationship to vary according to differential 313 habitat use and presence of the thermocline. We implemented negative binomial mixed 314 effects models with lake identity as a random effect in glmmTMB (Brooks et al., 2017), again 315 using the total library size (DNA sequence counts for each sample) as a log offset in the 316 model. This analysis was performed for each of the three temporal datasets of telemetry 317 data collected (one day, one week and one month before the point of sampling), to test 318 whether differences in the temporal range of habitat selection better explained the 319 distribution of eDNA, as it is known to persist in the water column for several days to weeks. 320 Several simpler models with a reduced fixed effects structure were fitted for each temporal 321 dataset, and we compared all models with AIC. 322

Results 323
Thermal habitat structure 324 Temperature profiles in each lake confirmed that eDNA sampling occurred during 325 stratification and turnover (isothermal or near-isothermal conditions) within the lakes under 326 study (Table S3). The thermocline was confirmed as being between 4.60-6.60 m from the 327 surface (approximately between eDNA sampling depths two and three for most lakes). 328 These patterns are typical of those found in previous years during peak stratification and 329 turnover for lakes in this region (Sichewski & Cruikshank, 1998). 330

Recovery of eDNA sequences and taxonomic assignment 331
We recovered 94,013 ± 6,389 sequences per demultiplexed sample with an initial quality 332 score of 33.0 ± 0.23. After removing adapters, discarding low quality sequences, merging 333 paired end sequences and length filtering we retained 76,734 ± 5,954 sequences per 334 sample. From the entire dataset we created 373 ASVs, onto which we were able to map 335 back 98.6% of filtered sequences (Table S5). A total of 28 ASVs were assigned to fish 336 species known to exist at IISD-ELA. Although this number was small as a proportion of the 337 total number of ASVs, 95.1% of all the filtered sequences in the dataset belonged to fish 338 found at IISD-ELA. The ASVs from other taxonomic groups had very low numbers of reads. 339 This indicates that most sequences in our dataset belong to fish from this geographic region, 340 rather than resulting from the amplification of non-target taxonomic groups (e.g. bacteria, 341 birds and mammals, which appeared at very low frequencies, Figure S1). 342 In the mock community, we made 19/27 correct detections at species level. Of those not 343 detected at species level, four were detected at genus level (i.e. the last common ancestor 344 algorithm assigned a match of the correct genus with no species name), two were detected 345 at family level (i.e. the correct family but no species or genus given by the last common 346 ancestor algorithm), one had many congenerics detected although not the correct species, 347 and one could not be detected at any level. 348 The eDNA samples detected the majority (12/14) of fish species confirmed by both historical 349 and present-day fishing surveys as being present in these habitats. The two species which 350 were not detected (brook stickleback and longnose dace) are known to prefer near-shore 351 and stream habitats and are also noted as being rare in many of these lakes, and thus 352 sampling at the centre point of the lake may not be optimal to detect them at these times of 353 year. We were able to assign the majority of ASV sequences at species-level using the last 354 common ancestor algorithm with two exceptions. Coregonus artedi could only be assigned 355 at genus level, as a closely related congener (Coregonus clupeaformis) also exists in this 356 region (although C. clupeaformis is not present in any of our study lakes). Chrosomus 357 neogaeus and Chrosomus eos were both assigned at genus level, possibly because pure 358 Chrosomus eos does not exist in this region but instead forms both cytoplasmic and nuclear 359 hybrids with Chrosomus neogaeus (Mee & Taylor, 2012). 360

Fish community assemblages 361
During stratification, the relative proportions of ASVs from each species per sample changed 362 dramatically at different depths in the lakes ( Figure 3A). The overall species composition of 363 the lakes was the same, yet species detection differed greatly at certain depths; with the 364 greatest change taking place between points 2 and 3, which demarcates the thermocline in 365 most lakes. For example, eDNA from cold-water stenotherms could only be detected in large 366 proportions at the bottom of the lakes during lake stratification (lake trout Salvelinus 367 namaycush and slimy sculpin Cottus cognatus). Lake trout were not detectable at all at the 368 shallowest measurement points (1-1.5 m from the surface) at this time. Warm-water minnow 369 species, which habitually inhabit shallow and littoral waters (e.g. Chrosomus neogaeus, 370 Margariscus margarita, and Pimephales promelas), were detected in much greater 371 proportions at the surface, with large decreases in the proportions of sequences in samples 372 taken from below the thermocline. eDNA from cool-water eurytherms was distributed across 373 all sampling depths, with the exception of Coregonus, which was only abundant at points 2 374 and 3 and could barely be detected at either the shallowest or deepest depths. 375 During lake turnover in late autumn, fish community detection by eDNA was much more 376 homogenous throughout the different depths of the lake ( Figure 3B Coregonus detections were no longer concentrated to the middle of the water column but 385 could be detected at shallow and deep depths as well. 386 There was a significant interaction between lake depth and lake state affecting fish 387 community assemblages detected by eDNA (PERMANOVA, F1,335 = 4.35, p = 0.0002). This 388 result indicates that fish communities were detected throughout the water column differently 389 if the lake was stratified or isothermal. NMDS plots for each lake showed that communities 390 were clearly grouped by lake state (Figure 2), with distinct communities detected during 391 stratification and turnover in most lakes. This result was confirmed by our mixed effects 392 modelling approach to describe the distribution of fish ASV counts. The model which best fit 393 the data included the three-way interaction between lake state (stratified or isothermal), 394 eDNA sample depth, and fish species as an explanatory factor, when compared to any 395 reduced model (ΔAIC 92.7). A full list of the reduced models that we tested and their AIC 396 scores appears in Table S6. The three-way interaction between lake state, sample depth 397 and species was highly significant (likelihood ratio test = 112.7, p < 0.001). eDNA from 398 different fish species was distributed across the vertical column differently in each water 399 mixing period. 400 Relationship between eDNA and lake trout habitat use 401 Lake trout eDNA was primarily concentrated in the bottom half of lakes ( Figure 4A) during 402 lake stratification (corresponding to points deeper than 6.25 -16.5 m depending on the lake 403 sampled). During lake turnover, lake trout eDNA was very abundant at all points in the water 404 column, with no clear patterns according to sampling depth. Acoustic telemetry showed the 405 lake trout inhabited the bottom two thirds of the water column during stratification, although 406 they were less likely to occupy the deepest depths ( Figure 4B red bars, median depth of 407 telemetry detections = 7.74 -11.90 m). During turnover, lake trout primarily selected habitat 408 in the top third of the water column, with frequency tailing off at the deepest part of the lake 409 ( Figure 4B blue bars, median depth of telemetry detections = 1.73 -6.51 m). The difference 410 between median depths of fish one month and one week before, as well as the day of 411 sampling was not large (Table S7). 412 The top ranked model to explain lake trout eDNA counts included the interaction between 413 lake state (stratified or isothermal) and telemetry detection frequency for the month prior to 414 the day of sampling (log(lake trout ASV counts) = -2.14 + 6.80telemetry + 0.97turnover -415 6.02telemetry x turnover). There was a positive correlation between lake trout telemetry 416 detections and eDNA counts during lake stratification, but no relationship during turnover 417 ( Figure 5). There were also five other models within two AIC counts of the top ranked model, 418 which could be considered as having equal explanatory power (all models are listed in Table  419 S8). These included a model with only the two main effects (no interaction) for average 420 telemetry detections for the data from a month prior to sampling, as well as models with and 421 without the interaction term for both the week prior to sampling, as well as the day of 422 sampling, indicating that there were not large differences in the abilities of the different 423 temporal groupings of telemetry detections to predict lake trout eDNA. 424

Discussion 425
Our study was designed to test the influences of lake stratification and mixing on eDNA 426 distribution within the framework of a replicated, whole-lake experimental design. Our results 427 demonstrate that eDNA signals show very strong seasonal stratification during summer and 428 mixing during autumn in a manner that closely reflects the thermal preference of fishes. We 429 detected large differences in fish community composition during different lake states ( Figure  430 2). During stratification, the most dramatic changes in community composition measured 431 with eDNA took place in samples above and below the thermocline: warm-water fish eDNA 432 was stratified above the thermocline and cold-water fish eDNA was concentrated below the 433 thermocline ( Figure 3). These differences were observed even across very small spatial 434 scales (<30 m) between shallow and deep sampling points. By contrast, during lake 435 turnover, eDNA of all fish species was relatively homogenous throughout the water column. 436 Few studies have managed to weigh the relative importance of abiotic and biotic influences 437 on the distribution of eDNA -in this system, the two are intrinsically linked through 438 bioenergetic requirements of fish which are manifest as thermal preferences. Thermal 439 density gradients of lake water during stratification create distinct microhabitats for lake trout 440 that provide suitable oxythermal habitat, which is generally defined as the volume of the lake 441 that is <15°C with >4 mg L -1 DO (Plumb & Blanchfield, 2009). In late summer, optimal 442 oxythermal habitat for lake trout is greatly reduced, concentrating this species into a narrow 443 band within lakes that is often only a few meters thick (Plumb & Blanchfield, 2009). As a 444 result, lake trout eDNA becomes localised due to narrow habitat selection by this cold-water 445 stenotherm and the presence of the thermocline, which restricts water mixing between the 446 epilimnion and hypolimnion (Wetzel, 2001). This is an important finding for the design of 447 eDNA sampling studies, given that our study lakes are some of the smallest capable of 448 supporting lake trout habitat. During lake turnover, the shallow-water presence of lake trout 449 (shown by acoustic telemetry results to be in the top third of the water column) is decoupled 450 from the distribution of eDNA signals, highlighting the role that water column mixing may 451 have to play in dispersing the eDNA signal (Figure 4) (Table S2). Moreover, eDNA sampling during lake turnover showed 466 a much more equitable distribution of eDNA signals for warm-water minnow species. Thus, 467 the contribution of water mixing to transporting warm-water fish eDNA to the bottom of the 468 lake and shaping the distribution of eDNA is likely to be considerable. Interestingly, the 469 minnows in our study lakes are classified as littoral-benthic species, spending the majority of preservation of DNA has already been called for to limit false negatives (Ficetola et al., 503 2015), but it is apparent from our analysis that carefully planning the timing of sampling 504 and/or location of samples is highly important, when a difference of even a few metres could 505 alter conclusions regarding species presence or absence. Maintaining the status quo of a 506 surface sampling approach during the summer months will exclude or limit the consistent 507 detection of cold-water species during periods of seasonal stratification, resulting in poor 508 representation of these species in datasets. By planning monitoring campaigns for lake 509 turnover, practitioners can use surface samples (which are often easier and faster to collect) 510 to reliably sample fish species with a wide range of bioenergetic requirements. If sampling 511 must be carried out during lake stratification, cold-water species can be targeted by sampling Yet, there are many interacting facets that control the rates of production, transport and 530 decay of eDNA within ecosystems that cannot be observed within small artificial systems, as 531 has been argued in other areas of ecology which make use of mesocosm studies 532 (Carpenter, 1996). Equally, the ecological significance of these factors cannot be tested 533 when examined in isolation (Carpenter, Chisholm, Krebs, Schindler, & Wright, 1995).  Acknowledgements 546 We are indebted to the many IISD Experimental Lakes Area students and staff for 547 maintaining records of field data and for logistical assistance with this project. S Michaleski, 548 release in response to fish habitat selection and lake stratification/turnover. A lake during stratification (A) has isolated layers of water due to the formation of a temperature-dependent density gradient. There is minimal mixing between upper (epilimnion) and lower (hypolimnion) layers. Fishes select habitat due to bioenergetic requirements: this diagram shows potential habitat selection by warm-water, cool-water (able to inhabit all layers of the lake), and cold-water fishes. eDNA is released into stratified water layers and is slow to mix between the layers of the lake. Symbols represent the eDNA of warm-water fish (red squares), cool-water fish (open grey circles) and cold-water fish (filled dark blue circles). By contrast, during lake turnover (B) there is an isothermal water column with mixing between deep and shallow waters. Cold water fishes are now able to inhabit the entire water column. eDNA of all species is thoroughly mixed throughout the water column. Panel C shows temperature changes with lake depth during lake stratification (red line) and lake turnover (blue line) for Lake 373 during the 2018 sampling season. Figure 3: Proportional barplot shows the relative species composition detected by amplicon sequencing variants (ASVs) of all lakes combined during lake stratification (A) and lake turnover (B), at different sample intervals in the water column. The depth variable comprises of six evenly spaced vertical sampling points in the water column, and thus absolute measurements will vary for lakes of different depths. Point 1 is the shallowest measurement near the surface of the lake. Fish species are arranged in order of warm to cold thermal guilds (Table S2). Figure 4: Lake trout amplicon sequencing variants (A) and lake trout telemetry detections (B) ordered by lake depth with stratified samples in red, turnover samples in blue. The depth variable is comprised of six evenly spaced vertical sampling points in the water column, and thus absolute measurements will vary for lakes of different depths (minimum lake depth = 13.2 m, maximum lake depth = 30.4 m). Point 1 is the shallowest measurement near the surface of the lake. Telemetry signal counts are expressed as a proportion of the total telemetry counts for that lake over the previous month. Depth interval size is also controlled for. Figure 5: Model predictions from the best fit model to explain lake trout amplicon sequencing variants (ASVs). The best fit model included the interaction between seasonal water column thermal structure and proportion of telemetry signals in that depth interval. Telemetry signal counts are expressed as a proportion of the total telemetry counts for that lake over the previous month, and depth interval size is also controlled for. Shaded error bars are 95% confidence intervals.