Archaea in boreal Swedish lakes are diverse, dominated by Woesearchaeota and follow deterministic community assembly

forest lakes. Archaea-speci ﬁ c sequencing revealed that freshwater archaeal diversity could be partly explained by lake variables associated with nutrient status. Combined with deterministic co-occurrence patterns this ﬁ nding suggests that ecological drift is overridden by environmental sorting, as well as other deterministic processes such as biogeographic and evolutionary history, leading to lake-speci ﬁ c archaeal biodiversity. Acetoclastic, hydrogenotrophic and methylotrophic methanogens as well as ammonia-oxidizing archaea were frequently detected across the lakes. Archaea-speci ﬁ c sequencing also revealed representatives of Woesearchaeota and other phyla of the DPANN superphylum. This study adds to our understanding of the ecological range of key archaea in freshwaters and links these taxa to hypotheses about processes governing biogeochemical cycles in lakes.


Introduction
Lakes around the globe receive, transport and transform sizable amounts of carbon (Battin et al., 2009;Tranvik et al., 2009). Yet the magnitude of their role in global carbon cycling remains uncertain and may rest to a large extent on poorly understood microbes that drive ecosystem-scale processes. Moreover, these freshwater systems are thought to be particularly sensitive to climate warming (Schneider and Hook, 2010) enhancing microbial productivity (Prowse and Stephenson, 1986;Rouse et al., 1997) and ultimately biogeochemical processes (Thornton et al., 2015;Wik et al., 2016) and water quality (Roulet and Moore, 2006;Weyhenmeyer et al., 2016).
To construct accurate models for water quality and predict the role of lakes in global biogeochemical cycles, such as the production of greenhouse gases (GHGs) including carbon dioxide (CO 2 ), methane (CH 4 ) and nitrous oxide (N 2 O), it is essential to understand the microbes underpinning these processes. Most studies specific to freshwater archaea have focused on sediments and wetlands because these habitats are important CH 4 sources (Bastviken et al., 2004;Bridgham et al., 2013). Pelagic and oxygenated freshwaters are, however, also a source of CH 4 (Grossart et al., 2011;Bogard et al., 2014;Angle et al., 2017, Biži c et al., 2020 and show considerable archaeal diversity (Auguet et al., 2011;Hugoni et al., 2013).
In contrast to extensive studies of bacterial (e.g. Newton et al., 2011;Eiler et al., 2012;Savio et al., 2015;Ruiz-González et al., 2017) and to a lesser extent eukaryotic microbial diversity (Debroas et al., 2017) in freshwater systems, archaeal diversity is largely under-sampled in freshwaters. Because archaea form only a minor fraction of the microbial community, their diversity is poorly represented in studies using PCR primers that amplify both bacterial and archaeal 16S rRNA genes (Caporaso et al., 2012;Wang et al., 2012;Klindworth et al., 2013;Thompson et al., 2017). Despite their low abundance, archaea are expected to be a critical component of the microbial community in freshwaters, due to their role in CH 4 and nitrogen cycling and the unknown metabolic potential of the recently described archaeal lineages.
The best-known archaea in the pelagic zones of freshwaters are CH 4 producing archaea (methanogens) and ammonia-oxidizing archaea (AOA). The activity of methanogens is the main source of biogenic CH 4 , and nitrification by AOA contributes to N 2 O production. Methanogens often found in lakes include Methanobacteriales, Methanosarcinales and Methanomicrobiales (Earl et al., 2003; that are particularly abundant in the anoxic bottom waters of dystrophic lakes (Peura et al. 2015). AOA comprise the phylum Thaumarchaeota and groups such as Nitrosoarchaeumlike (group I.1a) and Nitrosotalea-like (SAGMGC-1) archaea. AOA have been shown to inhabit in particular oligotrophic surface freshwaters (Pouliot et al., 2009;Hu et al., 2010;Auguet and Casamayor, 2013;Berdjeb et al., 2013;Hugoni et al., 2013;Mukherjee et al., 2016). Much less is known about the distribution and drivers of planktonic freshwater archaea other than methanogens or AOA. In a survey of high-altitude lakes, the main archaeal groups were Woesearchaeota and Parvarchaeota, recently described lineages with poorly defined functions (Ortiz-Alvarez and Casamayor, 2016). In a comparison of two alpine lakes differing in trophic status, environmental factors explained considerably less spatio-temporal variation of the archaeal community than in a parallel study on bacteria (Berdjeb et al., 2013). In summary, there is a need for more extensive phylogenetic sampling and characterization of the habitat preferences of archaea in freshwaters, as well as to develop hypotheses about the assembly of the archaeal community in freshwaters.
The structure of the archaeal community can be assumed to depend on the balance between stochastic and deterministic processes. Stochastic processes (i.e. ecological drift) will result in random combinations of taxa, whereas, if deterministic processes dominate, predictable patterns of taxa distributions and abundances will emerge. Deterministic processes have been shown to override random processes in macroorganisms (Gotelli and McCabe, 2002), and this has also been demonstrated for bacteria and eukaryotes in a number of habitats (Horner-Devine et al., 2007;Ofiteru et al., 2010;Caruso et al., 2011;Eiler et al., 2011;Vanwonterghem et al., 2014). Important deterministic processes that determine community composition are thought to be environmental filtering and species interactions (Diamond, 1975;Gotelli and McCabe, 2002), as well as biogeographic and evolutionary history (Vuilleumier and Simberloff, 1980;Cracraft, 1988). However, to our knowledge, it has not been tested if the dominance of deterministic processes also applies to freshwater archaeal communities.
Here, we use high-throughput amplicon sequencing of (i) a universal region of the small subunit (SSU) rRNA gene (V6-V8) covering all three domains, and (ii) the V3-V5 region of the archaeal 16S rRNA gene (Gantner et al., 2011) to obtain a detailed measure of archaeal diversity in surface water of 109 boreal lakes. We compare phylogenetic diversity of lake archaea to environmental variables such as catchment land use/cover, hydrological, and geochemical properties to determine what variables best correlate with the distribution of archaeal taxa, and the roles random (i.e. ecological drift) and deterministic (i.e. environmental sorting and dispersal) factors in archaeal community assembly.

Characteristics of surveyed lakes
We explored and quantified variation in the diversity of epilimnic freshwater archaea at latitudes ranging from 55.4 to 68.3 (Fig. S1) representing globally the latitudes with the highest concentration, area and perimeter of inland water bodies (Verpoorter et al., 2014). The sampled lakes span from nemoral to arctic (subalpine) vegetation zones and represent summer conditions as they were all sampled during August 2014. Metadata associated with each of the 119 sampled lakes (from 109 sufficient archaeal sequences were retrieved), or at least a substantial subset thereof, included latitude and longitude, temperature, chlorophyll concentration, nutrients and catchment characteristics (for summary statistics see Table S1).

Archaeal contribution and diversity in freshwater lakes
Based on universal sequence reads of archaeal, bacterial and eukaryotic SSU rRNA genes, archaeal SSU rRNA genes comprised from 0.03% up to 1.5% of the overall diversity (Fig. S2). This range with the average of 0.8% of archaeal reads corresponds to the lower ranges of previous studies of archaeal relative abundances in lakes (Pernthaler et al., 1998;Glöckner et al., 1999;Keough Fig 1. Correlation matrix between alpha diversity measures of archaea and environmental variables of the study lakes. The size of the symbol is inversely related to the P-value (correlations with P values >0.05 are not shown), while colour coding indicates the correlation coefficient (r > 0 in blue, r < 0 in red). Abbreviations: nitrite and nitrate nitrogen (NO 2 + NO 3 ), silicate (Si), ammonium nitrogen (NH 4 ), total phosphorous (TP), total organic carbon (TOC), total nitrogen (TN), phosphate phosphorus (PO 4 -P); specific ultraviolet absorbance (SUVA); diversity measures: Simpson diversity index (Simpson, InvSimpson), Shannon index (Shannon), phylogenetic diversity (PD), species richness (SR), Chao 1 richness estimator (Chao1), abundance based coverage estimator (ACE); oxygen concentration (O 2 ); catchment characteristics: percentage of Water, Agriculture, and Wetland. [Color figure can be viewed at wileyonlinelibrary.com] Auguet and Casamayor, 2008;Ortiz-Alvarez and Casamayor, 2016). Partial least squares modelling revealed that the relative abundance of archaeal in relation to bacterial and eukaryotic SSU rRNA genes increased with catchment land cover classified as forests and wetlands (Fig. 2). The highest relative abundances were found in dystrophic lakes. Many dystrophic lakes are characterized by anoxic bottom waters (hypolimnion) where electron acceptors for respiration are highly depleted. Accordingly, the genomes of hypolimnion microbes show the potential for fermentation and methanogenesis (Peura et al. 2015;Peura et al. 2018). A high prevalence of archaea also in the surface waters (epilimnion) of these net-heterotrophic, greenhouse-gasemitting systems can be speculated to be the result of anoxic microenvironments (Grossart et al., 2011) and their transitory occurrence in oxygenated waters.
In silico analysis of the selected PCR primers against the SILVA 16S rRNA database predicted 86.8% coverage of archaea for the archaea-specific primers and 71.5% coverage of archaea for the universal primers. The archaeal primers had only minor cross-domain amplification predicted which was confirmed in the sequencing: on average, 95.2% (range 72.2%-100% per sample) of the archaeal sequencing reads mapped to sequences of the target domain. While our approach targeted most of the archaeal diversity deposited to databases at the time of analysis, recently discovered archaeal groups emerging through random shotgun metagenomic sequencing were potentially missed by our amplicon sequencing approach. Only 4.9% of Asgardaeota in SILVA matched the archaea-specific primers, and we determined an incomplete matching to Altiarchaeota (40.7%), Diapherotrites (52.5%) and Nanoarchaeota (58.5%). The selection against the unknown diversity is a well-known limitation of primer-based sequencing approaches (Karst et al., 2018), and thus uncertainty in the taxonomic coverage needs to be accounted for in data interpretation.
The archaea-specific amplification detected in total 119 483 archaeal amplicon sequence variants (ASVs) with an average of 1309 ASVs per sample (range 107-4024). Rarefaction analysis suggested that a large part of the diversity in individual lakes was recovered (Fig. 3A). Most of the archaeal ASVs occurred in single lakes (90.6%) with another 7.1% occurring in two or three lakes. The remaining 2.0% of the ASVs (n = 2488) were found in more than three lakes. In these lakes, these ASVs were the dominant ASVs with a combined relative abundance ranging from 22.7% to 82.4% per lake and an average of 50.1% across the lakes (Fig. S3A). There were 346 ASVs that occurred in more than 10 lakes, and six ASVs (including a Woesearchaeia, a Methanobacterium, two Methanosaeta and two Methanoregula ASVs) that occurred in more than 50 lakes. The most prevalent ASV (a Woesearchaeia) occurred in 76 lakes (prevalence distribution of the ASVs is shown in Fig. S3B). The high number of ASVs that were restricted to a single lake together with the low prevalence of most ASVs (Fig. S3C) indicates high among-lake richness. High richness was confirmed using species accumulation curves (Fig. 3B). Unsaturation strongly suggests that the ASV pool is widely undersampled along the broad environmental gradients sampled, despite sequencing effort with almost 120 000 unique ASVs detected. However, although these results are based on denoised data, which aim to remove artificial diversity introduced by PCR amplification and sequencing, outputs likely include incorrect sequences that can inflate richness and distort community composition.

Community-level biogeography corresponds with lake characteristics
We examined trends in archaeal diversity and richness in relation to lake physicochemical variables, latitude and catchment land cover. Archaeal diversity measures (phylogenetic PD, inverse Simpson, Shannon, Fisher indices, ACE, Chao1, observed ASVs) were highly intercorrelated (r > 0.7, P > 0.001) and positively correlated with TOC, TP, TN, chlorophyll a, conductivity (ion concentrations) and turbidity (Fig. 1). All of these environmental variables are indicative of productivity, suggesting that high productivity enhances archaeal richness. Increased richness with increased productivity, as predicted by ecological theory (Cardinale et al., 2009;Duffy et al., 2017), was also corroborated by positive correlations with the combined bacterial, eukaryotic and archaeal diversity in the universal primer dataset.
Archaeal beta diversity across the lakes showed a similar pattern of community composition when assessed either by Bray-Curtis distance or by UniFrac distance (Procrustes superimposition; R = 0.261, P = 0.001). Among-lake community dissimilarity based on either of the two distance measures was poorly explained by the spatial distance between the lakes (Partial Mantel test controlling for environmental variables, R unifrac = 0.03, P unifrac = 0.19, R bray = 0.08, P bray = 0.011). This result does not support a distancedecay relationship of the archaeal species distribution, similar to what has been described for bacteria (Bell, 2010). Controlling for the effects of between-lake geographic distance revealed that archaeal community structure was weakly correlated with the measured environmental variables (Partial Mantel test, R unifrac = 0.18, P unifrac = 0.04, R bray = 0.17, P bray = 0.001). Correlation showed that archaeal phylogenetic composition was associated with lake productivity (nutrients, TN and TP and chlorophyll concentrations), as well as with pH, conductivity and other lake physicochemical variables (Table 1).  As observed in other large-scale spatial studies on bacteria and protists (e.g. Lima-Mendez et al., 2015;Thompson et al., 2017), archaeal beta diversity was significantly related to many potential explanatory variables, indicating that environmental sorting contributes to structuring archaeal communities over large regional scales. However, ecological drift, a stochastic process, should be the driving force of archaeal community assembly when the basic entities of diversity are assessed at the ASV level. Populations of ASVs are expected to be small and inhabit geographically isolated habitats (i.e. lake systems), a pattern that is supported by the low abundance and narrow distribution of most archaeal ASVs in our study lakes. Regarding the low abundant ASVs, demographic stochasticity is expected to play a strong role. In addition, drift processes may dictate the likelihood of population detection when dormant life stages, such as a seed bank, dominate the community (Lennon and Jones, 2011). The consequences of ecological drift are that ASV abundances fluctuate randomly, increasing the differences among otherwise equivalent communities (Gilbert and Levine, 2017). Such fluctuations potentially explain the high number of ASVs with low prevalence, often occurring only in single lakes.
To assess the importance of random versus deterministic factors shaping the archaeal communities, we performed Monte Carlo simulations on the co-occurrence patterns of the ASVs in the study lakes. We observed a positive Standard Effective Size (5.34) significantly different from random (P < 0.001) indicative of low cooccurrence and deterministic processes being important in community assembly. Although the random ecological drift is suggested to be ever present, important deterministic processes that determine archaeal community composition can be environmental filtering and species interactions (Diamond, 1975;Gotelli and McCabe, 2002), as well as biogeographic and evolutionary history (Vuilleumier and Simberloff, 1980;Cracraft, 1988). According to the size-dispersal hypothesis, small organisms such as archaea are more likely affected by species sorting than dispersal limitation, because organisms with a size on the micrometer-scale can widely disperse (Cottenie, 2005;Beisner et al., 2006;Shurin et al., 2009). Thus, the distribution is mainly a reflection of the environmental properties (Farjalla et al., 2012). Lack of distance decay in the archaeal community composition along our geographic gradient, as suggested by the partial Mantel tests, further emphasizes that environmental sorting plays a more prominent role than deterministic dispersal processes such as mass effects or biogeographic history in community assembly of archaea. Our finding corroborates results from the study on bacteria (Lindström and Langenheder, 2011).

Taxonomical distribution of archaea across freshwater lakes
The most abundant archaeal classes across the successfully sequenced 109 lakes were Woesearchaeia, Methanomicrobia and Nitrosophaeria (Fig. 4A). As there is a current revolution in archaeal taxonomy, we also linked SILVA taxonomy (v. 132) with other current taxonomic hierarchies (Castelle et al., 2015). The Woesearchaeia (previously DHVEG-6 and Parvarchaea, also termed Woesearchaeota phylum), which are recently discovered members of the DPANN superphylum (Castelle et al., 2015), were the most dominant class in our dataset in the total number of reads and the number of unique ASVs (n > 58 000). While most lakes were dominated by Woesearchaeia, in a fifth of the lakes the dominant archaeal class was Methanomicrobia or Nitrosphaera (Fig. 4A). Redundancy analysis indicated that nutrient status and the aromatic character of the dissolved organic matter are important explanatory variables underpinning the taxonomic shift at the class level (Fig. 4B). However, if this shift reflects the metabolic predictions from the available Woesearchaeota genomes such as potentially fermentative metabolism (Castelle et al., 2015;Lazar et al., 2017) or symbiotic lifestyle (Castelle et al., 2015) is still unknown.

Methane-cycling archaea and their distribution
Lakes are sources of GHGs such as CH 4 , CO 2 and N 2 O, with archaea playing a particularly important role in CH 4 cycling. Consistent with other freshwater systems such as wetlands Bridgham et al., 2013;Narrowe et al., 2017), the second most dominant archaeal class was methanogenic Methanomicrobia. We identified multiple methanogen groups including hydrogenotrophic methanogens (Methanoregula, Methanobacterium), acetoclastic methanogens (Methanosaeta) and as minor groups methylotrophic methanogens that use methylated compounds such as methylamines and methanol to produce CH 4 (Methanomassiliicoccales, Candidatus Methanofastidiosa; Nobu et al., 2016). The most abundant methanogen genera in our dataset were hydrogenotrophic Methanoregula and acetoclastic Methanosaeta (Fig. 5A). As in previous studies of freshwater lakes and wetlands, these two genera were frequently identified together. While their relative dominance has been suggested to depend on factors such as pH, season and carbon availability (Kotsyurbenko et al., 2007;Sun et al., 2012;He et al., 2016), our study points to the importance of nutrient status and catchment land cover as important determinants. For example, Methanoregula showed the highest relative abundances in lakes with a forested catchment, while in eutrophic (phosphorus and nitrogen rich) lakes Methanobacterium and Methanolinea had their highest relative abundances (Fig. 5B). In two lakes, the dominant methanogen genus was Ca. Methanoperedens. Members of this taxon have been shown to conduct anaerobic oxidation of CH 4 using nitrate (Raghoebarsing et al., 2006;Haroon et al., 2013;Arshad et al., 2015) or Fe(III) and Mn(IV) (Ettwig et al., 2016) as electron acceptors. Related sequences have also been linked to anaerobic oxidation of CH 4 coupled to sulfate reduction in freshwaters (Schubert et al., 2011;Timmers et al., 2015). As mentioned earlier, it can be speculated that the detected methanogens may either represent transient members originating from anoxic environments or inhabit anoxic microenvironments (Grossart et al., 2011) in the oxygenic part of the water column. Either way, their presence and taxonomic composition suggest an active role in CH 4 cycling of freshwater lakes.

Ammonia-oxidizing archaea
Thaumarchaeota classes detected in the lakes included Group 1.1c, SCGC_AB-179-E04 and most prominently Nitrosophaeria including Ca. Nitrosotalea and Ca. Nitrosoarchaeum, which are described as AOA. In freshwater ecosystems, especially those with high allochthonous inputs such as dystrophic lakes, increased nitrogen supply could promote the occurrence of ammonium oxidizers. AOA, however, are favoured at low levels of nitrate, low light and low pH (French et al., 2012;Hatzenpichler, 2012). Accordingly, Nitrosophaeria showed the highest relative numbers in lakes with low pH and TN (Fig. 5B). Recent genome analysis of a Ca. Nitrosotalea devanaterra revealed genes encoding both a predicted high-affinity substrate acquisition system and potential pH homeostasis mechanisms, which were expressed during acidophilic growth (Lehtovirta-Morley et al., 2016). While Ca. Nitrosotalea were abundant in dystrophic lakes, Ca. Nitrosoarchaeum, previously found in low-salinity habitats worldwide (Blainey et al., 2011), were the abundant members of Nitrosophaeria in most other lake types.

Novel taxa with potentially versatile metabolic roles
In addition to canonical methanogens, we detected Bathyarchaeota and Verstraetearchaeota (Ca. Methanomethylicus; Vanwonterghem et al., 2016), which are hypothesized to represent methylotrophic methanogens.
The class Bathyarchaeia contributed highly to the archaeal community in lakes with agricultural land use in their catchments, as indicated by redundancy analysis (Fig. 4B). The detected Bathyarchaeia were highly diverse, represented by more than 7761 ASVs that were collectively abundant across the lakes. Bathyarchaeia are among the most abundant organisms reported in marine and freshwater sediments globally (Biddle et al., 2006;Lloyd et al., 2013;Fillol et al., 2015;Lazar et al., 2015, Wurzbacher et al., 2017. Verstraetearchaeia, occurring as a minor archaeal class in our dataset, appear to be capable of fermentation utilizing sugars as a carbon source and generating acetyl-CoA via the Embden-Meyerhof-Parnas pathway and pyruvate-ferredoxin oxidoreductase (Vanwonterghem A B Fig 5. Taxonomic composition of the archaeal methanogenic genera (following SILVA 132 taxonomy) (A) and their number of reads (relative abundance in percentage) in relation to environmental variables as inferred by redundancy analysis (B) with the 15 most abundant genera represented. In lake system names, 'slu' refers to the lakes of the national sampling campaign and 'jam' and 'asa' to separate sampling campaigns.
[Color figure can be viewed at wileyonlinelibrary.com] et al., 2016). Combined with the taxonomic results discussed above, our findings support the diverse functional roles of archaea in freshwater systems, beyond CH 4 and ammonium metabolism.

Conclusion
Monte Carlo simulations on archaeal ASV patterns suggested that freshwater archaeal communities were shaped by deterministic processes such as environmental sorting as well as biogeographic and evolutionary history, overriding stochastic processes such as ecological drift. Large among-lake variability was reflected in the cumulative rarefaction curves of archaeal ASVs that did not saturate over the lake gradient, although many rarefaction curves of archaeal ASVs in individual lakes did reach an asymptote. Furthermore, our observations of a coupling between productivity indicators and archaeal richness provide evidence for the generality of the productivity-diversity relationship across all organismal kingdoms.
We revealed a marked phylogenetic diversity of archaea in freshwater lakes, not restricted to functionally characterized groups such as canonical methanogens or AOA. As such our study expands the knowledge of archaeal diversity inhabiting freshwater environments and shaping carbon and nitrogen cycling and CH 4 emissions of lakes. Future studies should aim to estimate the mass and energy fluxes through the archaeal compartment in these aquatic environments of regional (water quality) and global (GHG emissions) significance.

Experimental procedures
Sampling. Samples of freshwater microbes were collected from 95 Swedish lakes in August 2014 as part of a national lake monitoring programme (Fig. S1). In addition, 24 additional lakes were sampled in two separate campaigns during the same time period (Fig. S1). Water samples for analysis of water chemistry were collected from the depth of 0.5 m with a 0.5 m long tube sampler (Ruttner-type). Water samples for analysis of microbes were sampled from the depth of the whole epilimnion (usually down to between 2 and 8 m) with a 2-m long tube sampler. The epilimnion samples were pooled in a large bucket and a subsample of 100 and 500 ml was filtered with a peristaltic pump on 142 mm Millipore (Billerica, MA, USA) polycarbonate filters with 0.2-μm pore size until the filters clogged. Filters were immediately stored at −80 C until further processing.
Environmental data. Geographical information, such as lake area and catchment area, was derived from the database Svenskt Vattenarkiv (Swedish Water Archive, SMHI, 2012). Catchment land use/cover for each lake (e.g. % forest, % agriculture, % urban) were downloaded from the Swedish Land Cover Data database (Naturvårdsverket, 2014), part of the CORINE Land Cover data (European Environment Agency, 2014).
Water physicochemical data from the Swedish national lake monitoring program at the time of sampling are publicly available from the national data host (https://www.slu.se/en/ departments/aquatic-sciences-assessment/data-host/). The physicochemical variables were measured according to international (ISO) or European (EN) standards.
Large surveys, like in our case, result in datasets containing missing values for various reasons, often encoded as NaNs, blanks, or any other placeholders. One way to handle this problem is to get rid of the observations that have missing data. However, such an approach will risk losing overall statistical power and data points with valuable information, as well as introducing systematic biases. In addition, different subsets of the metadata for different analysis results in different values, which can lead to biased comparisons among analysis results and most importantly do not allow the application of multivariate statistical analyses. Thus, we used a strategy where we factored in the missing values (n = 425; 16% of total metadata entries) and learned the best imputation values for the missing data (see below for further description). Prior to analyses, two samples were entirely removed as they contained highly incomplete metadata. In addition, 10 variables were removed because they were redundant or contained high numbers of NAs.
DNA extraction, amplification of SSU rRNA genes and Illumina MiSeq sequencing. Filters were cut into small pieces and DNA was extracted following MoBio Power Soil kit protocol. For both PCR assays (i.e. archaeal and universal) used in this study, a two-step PCR protocol was applied to minimize PCR biases, such as chimera and heteroduplex formation (Thompson et al., 2002) and to add barcodes, Illumina handles and adapters to the amplicons of individual samples. In the archaeal protocol, we added 1 μl of DNA extract to duplicate PCR tubes containing dNTPs (0.2 mM), the archaeal primers 340f (5 0 -ACACTCTTTCCCTACACGACGCTCTTCCGATCT-NNNN-CCCTAYGGGGYGCASCAG-3 0 ) and 1000r (5 0 -AGACGTGTGCTCTTCCGATCT-GGCCATGCACYWCY TCTC-3 0 ) [modified from Gantner et al. (2011) with added Illumina adapters] at 0.25 μM, as well as Q5 Polymerase (0.4 unit), Q5 enhancer (4 μl) and PCR buffer (1 ×) in a final volume of 20 μl. Amplicon size was 650-750 bp. The first PCR consisted of an initial denaturation step of 98 C for 30 s and then 30 cycles of 10 s at 98 C, 30 s at 63 C, 30 s at 72 C and a final elongation of 2 min at 72 C.
After once again purifying the samples using the Agencourt AMPure XP kit and quantification by a PicoGreen assay (Quant-iT PicoGreen, Invitrogen), 16S rRNA gene and universal SSU rRNA gene samples were pooled separately in equimolar amounts. Samples were sequenced on a MiSeq using v3 chemistry and software version 2.6.1.1 (Science for Life Laboratory, Uppsala). Final sequencing results were obtained from 103 samples (94 samples from the national monitoring program and nine from the separate sampling campaign) using the universal primer pair and for 109 samples (88 from the national program and 21 from the separate campaigns) using the archaea-specific primer pair.
Amplicon sequences produced in this study are publicly available at the NCBI-SRA under accession numbers SRR10321796-10321906 (archaea) and SRR10965090 -SRR10965180 (universal).
Amplicon data processing. After raw sequence data had been trimmed of primers with CUTADAPT (Martin, 2011) and sequences without matching primers removed, they were analysed with R package dada2 (Callahan et al., 2016) for de-replicating, denoising and sequencepair assembly. After manual inspection of quality score plots, the forward and reverse reads of the universal amplicons were trimmed at 280 and 260 bp length respectively and the archaeal forward and reverse reads at 230 and 180 bp length respectively. Additional quality filtering removed any sequences with unassigned base pairs and reads with a single phred score below 10. The universal SSU rRNA gene amplicons were assembled by merging the read pairs. The archaeal 16S rRNA gene amplicon reads, for which the read pairs did not overlap, were concatenated. After reads were dereplicated, forward and reverse error models were created in dada2 with a subset of the sequences (10 7 reads). Chimeras were removed using 'removeBiomeraDenova' in the dada2 package, which resulted in the final taxon table.
Phylogenetic and taxonomic analysis. MAFFT (Katoh and Standley, 2013) version 7.305 was used for sequence alignment. FastTree 2 (Price et al., 2010) was used for generating a phylogenetic tree of the aligned sequences. The function assignTaxonomy of the DADA2 package was used to assign taxonomy using version 132 of the Silva database . Chloroplast and mitochondrial reads were removed from the universal dataset and non-archaeal reads from the archaeal dataset. Taxonomic coverage of the archaeal primers was tested in silico with Silva TestPrime (Klindworth et al., 2013; https://www.arb-silva.de/search/ testprime/) allowing one mismatch and no mismatches in the 3 0 end.
Statistical analysis. All statistical analyses and plotting were performed in R version 3.5.2 (R Core Team, 2016) using multiple R packages with R code deposited to https://gitlab.com/eiler_lab/scandinavian-archaeadiversity. Missing values in the metadata were approximated using multiple imputation with Fully Conditional Specification implemented by the MICE algorithm as described in van Buren and Groothuis-Oudshoorn (2011). This approach uses variables as predictors that (i) appear in the complete data model, (ii) are related to the none-response, and (iii) explain a considerable amount of the variance while (iv) removing variables that have too many missing values within the subgroup of incomplete cases. If variables autocorrelated, a subset of the variables was retained for downstream statistical analysis. Partial least squares regression models were used to infer environmental variables that could predict the contribution of archaeal rRNA gene reads to the total read abundance in the dataset obtained with universal rRNA gene primers. Prior to alpha and beta diversity analyses, ASV data were rarefied to the smallest sample size (4649 reads) using the function rrarefy in the vegan package (Oksanen et al., 2019). Alpha diversity measures were obtained using EstimateR on rarefied community data (i.e. ACE, Chao1, Shannon, Fisher and Simpson index) and picante (phylogenetic diversity; PD). Beta diversity matrices were calculated using UniFrac and Bray-Curtis distances on the rarefied and Hellinger transformed community data. We used distance-based redundancy analysis (dbRDA) with the function capscale in vegan to identify environmental variables co-varying with archaeal beta diversity.
Model significance of dbRDA was tested by permutational analysis (function anova.cca) for the overall model and for the significance of individual factors (marginal effects). Redundancy analysis on the relative proportions of archaeal classes and methanogen genera was carried out with the function rda to identify environmental variables co-varying with the classes and genera. The importance of random versus deterministic factors was assessed with EcoSim (Gotelli and McGill, 2006). EcoSim performs Monte Carlo randomizations to create 'pseudo-communities' (Pianka, 1986), then statistically compares the patterns in these randomized communities with those in the real data matrix.

Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher's web-site: Figure S1 Map of the sampled Swedish lakes. Figure S2 Proportion of domain-specific rRNA gene reads resulting from amplicon sequencing with universal primers. Figure S3 The contribution of archaeal amplicon sequence variants (ASVs) unique to individual study lakes to the overall community of each lake as well as the contribution of archaeal ASVs occurring in two to three lakes and more to each lake community (A). The prevalence of archaeal ASVs cumulation curve (B) and (C). Table S1 Summary statistics of metadata including 119 sampled lakes.