Beyond taxonomy: Validating functional inference approaches in the context of fish‐farm impact assessments

Characterization of microbial assemblages via environmental DNA metabarcoding is increasingly being used in routine monitoring programs due to its sensitivity and cost‐effectiveness. Several programs have recently been developed which infer functional profiles from 16S rRNA gene data using hidden‐state prediction (HSP) algorithms. These might offer an economic and scalable alternative to shotgun metagenomics. To date, HSP‐based methods have seen limited use for benthic marine surveys and their performance in these environments remains unevaluated. In this study, 16S rRNA metabarcoding was applied to sediment samples collected at 0 and ≥1,200 m from Norwegian salmon farms, and three metabolic inference approaches (Paprica, Picrust2 and Tax4Fun2) evaluated against metagenomics and environmental data. While metabarcoding and metagenomics recovered a comparable functional diversity, the taxonomic composition differed between approaches, with genera richness up to 20× higher for metabarcoding. Comparisons between the sensitivity (highest true positive rates) and specificity (lowest true negative rates) of HSP‐based programs in detecting functions found in metagenomic data ranged from 0.52 and 0.60 to 0.76 and 0.79, respectively. However, little correlation was observed between the relative abundance of their specific functions. Functional beta‐diversity of HSP‐based data was strongly associated with that of metagenomics (r ≥ 0.86 for Paprica and Tax4Fun2) and responded similarly to the impact of fish farm activities. Our results demonstrate that although HSP‐based metabarcoding approaches provide a slightly different functional profile than metagenomics, partly due to recovering a distinct community, they represent a cost‐effective and valuable tool for characterizing and assessing the effects of fish farming on benthic ecosystems.

Metagenomics, herein defined as the study of the entire DNA material recovered from environmental samples, and metatranscriptomics, as the study of gene expression from mRNA recovered from environmental samples, have also received considerable attention in the last few years (Bourlat et al., 2013;Grossart et al., 2020;Knapik et al., 2019;Semmouri et al., 2020). While amplicon-based eDNA metabarcoding provides information on which organisms are present, metagenomics and metatranscriptomics enable insights into the functions they possess, and in the latter instance, the activity of these functions. This is particularly relevant when microorganisms are being used as indicator organisms. For organisms such as bacteria, little is known about the ecology of the vast majority of species. As such, it is their metabolic capability rather than their identity that is usually of greatest interest, and likely to provide more information about current environmental conditions. Taxonomic and functional profiles can respond differently to biogeography, abiotic environmental variables (e.g., organic content, metal concentration) and community processes and interactions. As such, they can exhibit different level of stochasticity and temporality, and provide complementary information that may increase our understanding of the mechanisms behind community turnover (Barberan et al., 2012;Cordier et al., 2020;Hornick & Buschmann, 2018;Louca et al., 2016).
Having both taxonomic and functional information also enables the computation of functional redundancy within the community, which may also help assess resilience (Escalas et al., 2019). While metagenomics and/or metatranscriptomics may eventually replace metabarcoding for whole microbial community assessment, their mainstream use is still limited by their relatively low sample throughput and cost-efficiency, and heavy computational and data management requirements (Bowman & Ducklow, 2015;Breitwieser et al., 2018;Nagpal et al., 2016). To circumvent these issues, several programs have been developed to infer metabolic profiles from eDNA 16S ribosomal RNA (rRNA) gene data (e.g., functional annotation of prokaryotic taxa [FaProTax] (Louca et al., 2016), PiPhillin (Iwai et al., 2016), Vikodak (Nagpal et al., 2016), Prediction by phylogenetIC plAcement [PaPrica] (Bowman & Ducklow, 2015), Phylogenetic Investigation of Communities by Reconstruction of Unobserved States [Picrust2; Douglas et al., 2019) and Tax4Fun2 (Wemheuer et al., 2018). The last three methods use a hidden-state prediction (HSP) approach (Zaneveld & Thurber, 2014) where genomic content is inferred according to the position of the genome in a reference phylogenetic tree. These methods have been shown to provide functional profiles that correlate with a varying degree of success to metagenomic and metabolomic profiles (e.gBowman & Ducklow, 2015;Douglas et al., 2019;Sun et al., 2020;Wemheuer et al., 2018). While not as accurate as metagenomic functional analyses, these HSP methods can provide more complete metabolic profiles as they do not require high sequencing depth to assign functions (e.g., pathways of rare taxa can still be assigned; Langille et al., 2013), and may be useful in situations where metagenomics would be prohibitively expensive, such as in broad microbial routine monitoring surveys.
Metabolic inference methods have been evaluated and used in a large variety of studies, for example in clinical trials (Millares et al., 2015), oyster aquaculture (Arfken et al., 2017), aquatic urban systems (Wang et al., 2018), acid mine drainages (Aguinaga et al., 2018), subterranean estuaries (Hong et al., 2019), gut microbiota (Pacheco-Sandoval et al., 2019), marine biofilms (Salerno et al., 2018), and coral reef associated microbiomes (Pearman et al., 2019). However, there has been limited use in the context of benthic marine monitoring surveys (Cordier, 2020;Hornick & Buschmann, 2018;Laroche et al., 2018). Before this application can be routinely applied in such situation, it is essential to evaluate its performance against similar genetic data (herein DNA) recovered from a nontargeted approach. In particular, the accuracy of functional inference methods is strongly influenced by the completeness of the reference databases and by the genetic plasticity of some taxonomic groups (Bowman & Ducklow, 2015). For example, certain functions, especially those involving few genes, tend to occur at shallow phylogenetic depth (Martiny et al., 2015) and can be difficult to correctly predict.
The aim of this study was to evaluate the performance of three metabolic inference methods, PaPrica, Picrust2 and Tax4Fun2, against metagenomic and environmental data, in the context of salmon farm benthic surveys. In particular, we aimed to: I) compare predictions and abundance correlations between 16S rRNA amplicon-based metabarcoding functional inference approaches (hereafter referred to as HSP methods) and shotgun metagenomic functional profiling, (ii) contrast the taxonomic and functional microbial diversity recovered from metabarcoding and metagenomics, and (iii) assess and compare the sensitivity of functional communities derived from HSP methods and metagenomics with respect to microbial turnover and in correlation with environmental data.

| Sample collection
Benthic sediment samples (depths of 32-85 m) were collected at four large scale Atlantic salmon (Salmo salar) farms, two located in a semi-exposed coastal region of mid-Norway (farms: FRØ and SMØ) and two inside fjords in northern Norway (farms: NOR, STO) ( Figure 1). At FRØ, six samples were collected: three biological replicates located next to the pen (0 m), and three reference (control) replicates located 1,200 m away from the fish pens. The objective of this sampling design was to compare bacterial communities between impacted and nonimpacted sediments. To test HSP methods across different geographic settings within Norway, samples were also collected next to the pen at the NOR and STO farms (one each) in northern Norway, and two samples at SMØ (Southern-Norway), one next to the pen and one located at a reference site located 7,920 m away from the farm. The selection of the reference sites was made following extensive region-wide surveys and hydrodynamic modelling prior to undertaking this sampling (see Dunlop et al., 2020). The criteria for choosing reference sites were: >1 km away, comparable depth band, aspect and habitats in vicinity, and with comparable hydrodynamics. Approximately 5 g of the top (1 cm) sediment layer, collected with a van Veen grab sampler (surface area 0.1 m 2 ), was subsampled per grab with a sterile spatula and stored at -20°C until further processing.

| Physicochemical and macrofaunal analyses
A complimentary suite of environmental parameters was obtained from parallel studies. The prevalence of three terrestrial fatty acids (oleic acid, 18:1n-9; linoleic acid, 18:2n-6; and α-linolenic acid, 18:3n-3) in the sediments, which indicate fish-feed-derived organic matter (White et al., 2017), were assessed by Folch lipid extraction and direct methylation as described in Woodcock et al. (2019). The organic and inorganic carbon content of the sediment was determined by drying the sediment at 40°C for 48 h followed by combustion at 450°C for 2 h (LOI450). Measures of benthic respiratory fluxes, including ammonium (NH 4 ), total carbon dioxide (TCO 2 ), and oxygen (O 2 ) were obtained from, and following the methods described in Keeley et al. (2019). Additionally, the macrofaunal communities were obtained from and characterized using the methods described in Keeley et al. (2019)

| Bioinformatic analysis of 16S rRNA data
Primers from the demultiplexed fastq files were removed with cuTadaPT (version 2.6; Martin, 2011), using the --no-indels flag and a minimum overlap of 17 bp, and reads quality filtered, denoised, merged and chimera filtered with the dada2 r program (version 1.14; Callahan et al., 2016). Prior to quality filtering, reads were truncated at 226 and 220 bp on the 5' end to remove the lower quality section and reduce the number of reads lost during quality trimming.
Quality filtering and denoising were performed using the default parameters, and merged using a perfect minimum overlap of 10 bp.
Chimera removal was performed using the consensus method where sequences found to be chimeric in the majority of samples (default value =90%) are discarded. Nonchimeric sequences were taxonomically assigned using DADA2's inbuilt RDP Naïve Bayesian Classifier (Wang et al., 2007) trained on the SILVA reference database (version 132 clustered at 99% similarity; Quast et al., 2013). Sequences found in negative controls, including DNA extraction, PCR, indexing and sequencing blanks were examined and subsequently removed from across all samples. Sequences unclassified at kingdom level or not identified as bacteria were discarded. Additionally, rare amplicon sequence variants (ASVs; less than 10 reads across the entire study) were removed from the data set. The resulting sequences were used for taxonomic and functional profiling using three pipelines: PaPrica and metabolic inference, sequencing depth per sample was visualized with the "rarecurve" function of the "vegan" r package (version 2.5.6; Oksanen et al., 2019) to ensure that all samples had sufficient sequencing depth to recover most of the diversity ( Figure S1). The default parameters implemented within each method were used to keep the analysis as simple as possible.

| Bioinformatic analysis of metagenomic data
The metagenomic pipeline used is based on the fully automated workflow of the humann2 software (version 2.8.2; Franzosa et al., 2018). Reads were first preprocessed with kneaddaTa (version 0.7.4; https://bitbu cket.org/bioba kery/knead datae) to perform both quality filtering with TrimmomaTic (version 0.39; Bolger et al., 2014), and screening of undesired reads (herein phix, human and salmon DNA) with BowTie2 (version 2.3.5.1; Langmead & Salzberg, 2012). Profiling of taxa was performed with meTaPhlan2 (version 2.7.8; Truong et al., 2015) and results used to construct a sample-specific database from functionally annotated pangenomes (referred to as the chocoPhlan database). humann2 then performed a nucleotide-level mapping with BowTie2 of all reads against the custom database. Reads that remained unaligned were subjected to an additional translated search against the UniRef50 protein database (Suzek et al., 2015) using diamond (version 0.8.36.98; Buchfink et al., 2015). The gene family abundance table was then converted to a KO and EC tables using the "humann2 _regroup_table" function and the uniref50_ko and uniref50_level4ec groups, respectively.

| Data analysis and statistics
Taxonomic and functional richness differences between metabarcoding and metagenomic data were visualized with box and bar plots using the "ggplot2" R package (version 3.3.1; Wickham, 2016).

Spearman correlations of functions across all samples and per
samples between the HSP methods and the metagenomic data were assessed using the "stats" R package (version 3.6.1; R Core Team, 2017) and the "cor" function. For this analysis, EC and KO abundances were transformed to the centred-log ratio with the "clr" function of the "composition" R package (version 1.40.4; van den Boogaart et al., 2020) and only functions shared between inference methods and metagenomics were maintained. Because several ECs and KOs co-occur within pathways and genomes, correlation of functions between HSP and metagenomic samples can be naturally high, even between completely unrelated samples (Douglas et al., 2019;Sun et al., 2020). To take this dependency into account, we added a randomized data set referred to as "null expectation" for each HSP method. Based on the original ASV abundance table, the permaswap function of the "vegan" R package (parameters: times =1, burnin =20,000, thin =500, mtype = "count", shuffle = "both") was first used to create a dataframe with permuted samples and ASVs. This table was then input into each HSP method to obtain "null expectation" data sets of functional inferences. Differences between results of the actual and "null expectation" data were tested with Welch's two sample t tests, with assumption of normal distribution tested with a Shapiro-Wilk test.
Correspondence of ASV and functional communities between HSP methods with metagenomics was assessed with a procrustes test using the "protest" function (symmetric analysis with 9,999 permutations and scores = "sites") of the "vegan" R package (version 2.5.6; Oksanen et al., 2019). The correspondence of the ASV and functional communities derived from HSP and metagenomics with macrofaunal communities (transformed with the Wisconsin method implemented in "vegan") as well as physicochemical data (scaled with the rda function of the "vegan") was also assessed with procrustes tests using the same parameters. Physicochemical data were only completely available for the FRØ locality, therefore only samples from this site were kept for the latter analysis. In addition to the procrustes tests, the sensitivity of each data set towards fish farming was assessed using permutational analyses of variance (PERMANOVA; Anderson, 2005) between samples collected at the 0 m and ≥1,200 m from the pen. For the procrustes and PERMANOVA analyses, two methodologies were evaluated: (i) Aitchison distance matrices computed from centred-log ratio transformed abundances, as suggested in Gloor et al. (2017) for compositional data, and (ii) Jaccard distance matrices for presence/absence data. Multivariate homogeneity of groups dispersions analyses between distance categories (0 m and ≥1,200 m) were performed with the "betadisper" function of the "vegan" R package.
Compared to reference sites, benthic environments in proximity to fish farm activities are typically characterized by higher concentrations of organic matter and nutrients, especially phosphorus and nitrogen (Buschmann et al., 2006;Wang et al., 2012), which can lead to eutrophic conditions and anaerobic microbial degradation (e.g., sulphate reduction and methanogenesis; Valdemarsen et al., 2009). To explore this, the response of pathways associated to the nitrogen and sulphur cycle between near and far-field sites, and between metagenomic and HSP-based pathways profiles were compared. Pathways were clustered to their parent class based on the MetaCyc database (https://metac yc.org/) and only classes involved in nitrogen (fixation, ammonification, nitrification, denitrification and degradation) and sulphur cycle (oxidation, reduction and degradation) were maintained. Groups of pathways ssociated to more than one parent class were filtered out. The response of the remaining classes between pen and reference sites was analysed using centred-log ratio transformed data and the "lm" function of the stats R package and visualized with barplots using the ggplot2 R package.
Only PaPrica and Picrust2 provide pathways data derived from the MetaCyc database and are similar to those of humann2, therefore only these two HSP methods were assessed. In addition, functional groups determined by the FaProTax methodology (version 1.2.1; Louca et al., 2016), where ecologically relevant groups are assigned to ASVs based on available literature from cultured strains, was performed and differential abundance assessed with ancom2 (version 2.1; Kaul et al., 2017) between pen and reference sites in order to compare results with metagenomic and HSP data.

| Metagenomic sequencing and preprocessing
A total of 314,003,232 reads with a mean of 31,400,323 per sample were obtained from the ½ Novaseq SP flow cell (Table S3). Trimming low quality reads reduced their number by 19.3% and removing reads associated to phix and human DNA by 0.6%, and to salmon DNA by another 0.6% (Table S3).
On average, 4,425.6 gene families per sample matched the chocoPhlan database after nucleotide alignment with BowTie2 (Table S4). Translating reads with diamond and using the uniref50 database, an average of 36% of reads could be aligned, with a mean of 1,084,334 gene families identified per sample (Table S4).

| Metabarcoding versus metagenomic-based functional profiling
In both the 16S rRNA metabarcoding and metagenomic data sets, the two main bacterial Phyla were Proteobacteria and Bacteroidetes (Figure 2a). The Proteobacteria families Desulphobacteraceae, Psychromonadaceae and Vibrionaceae, and the Bacteroidetes family Flavobacteriaceae were the most abundant taxa. The total number of bacterial families detected was substantially higher for metabarcoding (224) compared to metagenomics (15; Figure 2b). Subsequently, the mean number of families detected per sample was also considerably higher for metabarcoding (114)  At the sample level, Spearman correlations of the HSP data sets with metagenomics was highest for Picrust2 (0.81), followed by Tax4Fun2 (0.67) and PaPrica (0.6), but no significant difference with null expectation data sets was observed (Figure 4 and Table S5). Assessment of the normality of the distribution with a F I G U R E 2 Taxonomic composition at family and phylum level (a), and taxonomic (b) and functional (c) richness per data set. In (a), unclassified families and those out of the most five abundant families within their phylum were grouped under "Others". Similarly, Phyla out of the top six most abundant were grouped under "Others". In (b) and (c), barplots represent the total richness per data set while boxplots indicate richness per sample and data set. Taxonomic richness comparison is made at family level. In (c), humann2 columns represent the metagenomic data. EC, enzyme commission numbers; KO, KEGG orthologue numbers  The correspondence of the ASV (16S rRNA metabarcoding) community with the functional community derived from metagenomics was relatively weak using either centred-log ratio transformation (r = .49, p-value = .21) or presence/absence (r = .4, p-value = .44) data ( Figure 5). This contrasts with the strong and significant correspondence of the HSP-derived functional communities with metagenomics, with PaPrica and Tax4Fun2 showing highest correlation (r ≥ .87; Figure 5). Correspondence of these two HSP methods with metagenomics noticeably improved when using presence/absence data and Jaccard dissimilarity indices (r = .92 and .04, respectively; Figure 5).

| Sensitivity of 16S rRNA metabarcoding and metagenomics
The response of the taxonomic (ASVs) and functional communities toward fish farm activities was tested for each data set with a PERMANOVA, by comparing the variance between samples collected at the pen (0 m) versus those collected at reference sites (≥1,200 m). Except for Picrust2, the data transformed to the centredlog ratio showed significant differences between near and far-field samples. In general, highest sensitivity was achieved when using presence/absence data and Jaccard dissimilarity indices, with PaPrica and Tax4Fun2 being the most responsive (R 2 = 0.39 and 0.4, respectively; Table 1). A betadisper analysis of homogeneity of groups dispersions showed no significant difference between distance groups ( Figure S2 and Figure S3, Table S7).
Correlation of the taxonomic (ASVs) and functional communities with macrofaunal communities and physicochemical data were explored with Protest analyses. While the taxonomic profile did not significantly correlate with the macrofauna, all functional communities were strongly and significantly correlated (r ≥ 0.68), with the strongest associations with PaPrica, Tax4Fun2 and metagenomic data (Table 2). Conversely, only the EC-based data sets (PaPrica and humann2 EC) were significantly associated with the physicochemical data (r ≥ 0.96, p-value ≤ .048), although correlations were relatively strong with all molecular data sets (r ≥ 0.5; Table 2).
Pathways of particular interest involved in the nitrogen and sulphur cycle were compared between pen and reference sites and between the metagenomics (humann2) and 16S rRNA HSP-based data ( Figure 6). Only four out of the nine pathways investigated were found to be affected by fish farm activities within the metagenomic and Picrust2 data sets. These included nitrate-reduction (increased prevalence; +), allantoin-degradation (reduced prevalence; −), sulphuroxidation (−) and glycosaminoglycan-degradation (+). In comparison, PaPrica identified four additional pathways affected by fish farming including nitrogen-fixation (+), sulphur-reduction (−), sulphite and sulphide-reduction (−), and dimethylsulphide-degradation (+). All pathways found to be either positively or negatively affected by fish farm activities in the metagenomes were found to respond similarly in both PaPrica and Picrust2 data sets. Since we expected sulphur related pathways of both metagenomic and 16S rRNA HSP-based data to be more strongly affected by fish farm activities, we also assessed functional

| DISCUSS ION
In this study, our main objectives were to assess the level of correspondence between the taxonomic and functional profiles derived from amplicon-based 16S rRNA metabarcoding data and shotgun metagenomics, evaluate the strengths and weaknesses of both approaches, and determine whether functional profiles from HSP methods can be used as a substitute to metagenomics for monitoring functional changes associated with fish farming in marine environments.

| Alpha-diversity differences between methods
The overall taxonomic richness recovered by 16S rRNA metabarcoding was up to 20-fold higher (depending on the taxonomic level) than metagenomics. This is due, in part, to the differences in the ability of identifying sequences of different lengths (~450 bp [metabarcoding] vs. ~150 bp [metagenomics]) and by differences in the taxonomic assignment methods used. However, it is also well recognized that metagenomics requires much more sequencing effort than metabarcoding to reach equivalent 16S rRNA coverage as it captures all DNA material (Cottier et al., 2018;Singer et al., 2020). This is especially problematic in highly diverse communities such as the marine sediment samples assessed in this study. Small differences in taxonomic composition were anticipated because 16S rRNA primer sets are never truly universal (Pollock et al., 2018). In this case, few gene families could be assigned using a nucleotide search against the chocoPhlan that are essential to cellular activities. Additionally, several nonubiquitous functions, and especially those occurring at shallow phylogenetic depth, are difficult to accurately predict (Martiny et al., 2015) and may therefore be omitted by HSP methods (Bowman & Ducklow, 2015). These false negatives were prominent for PaPrica, which had the lowest specificity. The opposite scenario where functional richness is artificially increased due to false positive predictions is also a possibility. For example, phylogenetic plasticity and genomic variability can result in loss of functions within taxa that cannot be detected by HSP methods. However, because of the substantial differences in terms of recovered taxonomic diversity between metagenomic and 16S metabarcoding methods and limits imposed by sequencing depth, it is also possible that some functions predicted by the HSP methods were not detected by metagenomics. As such, the true sensitivity of the HSP methods, which was lowest for Tax4Fun2 and highest for PaPrica, was probably underestimated.

| Inferred pathways correspondences with metagenomic, biological and environmental data
In general, we observed little correlation and/or no significant difference with the null expectation data sets based on the abundance of functions shared between HSP methods and metagenomics. Other studies have reported weak correlations of HSP derived pathways abundance with metagenomics when compared to null expectation data sets, with decreasing performance for more complex and/ or less characterized environments (Douglas et al., 2019;Sun et al., 2020). These low correlations could be due to preferential amplification of certain DNA sequences, primers biases, and varying gene copy numbers of 16S rRNA per taxa, although HSP methods typically try to correct this bias (Bowman & Ducklow, 2015;Douglas et al., 2019;Wemheuer et al., 2018). The increased detection sensitivity of 16S rRNA metabarcoding can also create a bias in the number of contributing taxa to certain functions, which can negatively affect correlations with functions derived from metagenomics. Considering that amplicon-based 16S rRNA metabarcoding and metagenomics uncovered a substantially different bacterial diversity, the weak correlation in functional abundance between the two methods is expected.
An alternative and possibly more appropriate approach to comparing functional profiles of HSP and metagenomic approaches is by contrasting their correlation with metadata (Sun et al., 2020) or by evaluating the correspondence between the ordination of their functional communities. Using procrustes analyses, there was a very strong and significant correlation between HSP methods and metagenomics, especially when using presence/absence data.
The ASV communities showed no significant relationship with the functional profiles, suggesting that the taxonomic and functional communities were influenced differently by the biological and/or environmental conditions. We also tested the correspondence of the bacterial taxonomic and functional profiles with macrofaunal communities and physicochemical data. While both profiles correlated relatively well to physicochemical data (r ≥ 0.5), with the EC-based data performing best, only functional profiles correlated strongly and significantly with the macrofaunal communities. A higher association of functional versus taxonomic beta-diversity with macrofaunal data was also reported by Laroche et al. (2018), which suggest that interactions between these communities are especially driven by microbial metabolic capabilities rather than specific phylogenetic association.
The sensitivity of the different data sets in detecting the effect of fish farm activities was evaluated by comparing changes in community composition between near-field (0 m from pen) and far-field samples (>= 1,200 m from pen). In general, we found higher sensitivity for the HSP methods, and especially for PaPrica shifts in response to a contamination gradient compared to functional community changes. This observation has also been reported by Cordier (2020), Hornick andBuschmann (2018), Laroche et al. (2018) and Ren et al. (2016) and is probably due to several taxa sharing the same metabolic capabilities and their succession in the ecosystem may have less to do with environmental changes than with biological properties (e.g., growth cycle and bacterial interactions) and geotopographic factors (e.g., depth and geographic distance). As such, functional profiles may be slightly more robust and sensitive in detecting environmental alterations caused by fish farm activities, although further research is needed to properly test this assumption.

| Presence and abundance of function of particular interest between methodologies
Benthic environments under cage fin-fish aquaculture are usually enriched in organic waste and nutrients such as phosphorus and nitrogen compounds (from faeces and uneaten fish feed for example), which can lead to eutrophic conditions, microbial anaerobic activities and the production of ammonia and hydrogen sulphide gasses (Brooks & Mahnken, 2003;Buschmann et al., 2006;Valdemarsen et al., 2009;Wang et al., 2012). In the present study, we were particularly interested in comparing the response of classes of pathways associated to the nitrogen and sulphur cycles between the pen and reference sites, and between the metagenomic and HSP-based data.
Overall, results from both approaches were very similar, with an increase near the pens of pathways associated to nitrate reduction and glycosaminoglycan degradation, and a decrease of pathways affiliated to allantoin degradation and sulphur oxidation. Additionally, the PaPrica analysis showed a decrease in pathway abundance associated with sulphur reduction, sulphite and sulphide reduction, and an increase of dimethylsulphide degradation near the pens. While we expected pathways of nitrate-reduction to be in higher abundance near the fish farms, due to enriched nutrients and possibly anoxic conditions, it was somewhat surprising that pathways associated with sulphur compounds were less abundant in both the metagenomic and HSP data sets. However, sulphite respiration was found to be associated with the pens when using taxonomic information of the 16S rRNA data and literature-based functional association (FaProTax methodology). It is likely that certain pathways associated to the sulphur cycle, such as sulphite oxidation and reduction, were indeed more prominent near the pens but were not fully picked-up by metagenomic and HSP-based functional profiling, possibly due to the incompleteness of reference databases. For example, pathways associated to sulphite respiration were absent from both the humann2 and Picrust2 data sets. Glycosaminoglycan degradation is responsible for the degradation of long linear polysaccharides made of repeating disaccharide units, also referred to as mucopolysaccharides (Ernst et al., 1995). It is probable that high quantities of mucopolysaccharides originate from mucus produced and excreted by the caged salmon (see Jacobsen et al., 2019;Reverter et al., 2018)  activity, involving more samples and taking into account regional and temporal variability is needed to identify putative functional indicators of fish farm ecological impacts that could be eventually integrated into benthic health indexes.

| SUMMARY
Collectively, our results suggest that the lower specificity of HSP methods may be offset by the ability of amplicon-based metabarcoding to provide a much more exhaustive assessment of the taxonomic community, and hence of functions that have low genomic variability. This allows HSP methods to provide functional profiles that are relatively similar to those of metagenomics, and which respond similarly to environmental changes. While the accuracy and sensitivity of HSP methods are still strongly affected by the incompleteness of reference databases, our results suggest that they can provide a useful functional profiling alternative to metagenomics, and a valuable tool in detecting and evaluating the effects of salmon farming on benthic ecosystems. Further research testing the ability of inferred functional profiles in detecting degrees of impact is warranted to appropriately assess the applicability of the methodology to threshold-based monitoring strategies. O. (2012). Exploration of community traits as ecological markers in F I G U R E 6 Centred-log ratio (CLR) abundance of pathway classes of particular interest between metagenomics (humann2) and hiddenstate prediction (HSP) methods (PaPrica and Picrust2). Classes of pathways with positive CLR were more prevalent at the pens while those with negative CLR were more prevalent at the reference sites [Colour figure can be viewed at wileyonlinelibrary.com]