Spineless and overlooked: DNA metabarcoding of autonomous reef monitoring structures reveals intra‐ and interspecific genetic diversity in Mediterranean invertebrates

The ability to gather genetic information using DNA metabarcoding of bulk samples obtained directly from the environment is crucial to determine biodiversity baselines and understand population dynamics in the marine realm. While DNA metabarcoding is effective in evaluating biodiversity at community level, genetic patterns within species are often concealed in metabarcoding studies and overlooked for marine invertebrates. In the present study, we implement recently developed bioinformatics tools to investigate intraspecific genetic variability for invertebrate taxa in the Mediterranean Sea. Using metabarcoding samples from Autonomous Reef Monitoring Structures (ARMS) deployed in three locations, we present haplotypes and diversity estimates for 145 unique species. While overall genetic diversity was low, we identified several species with high diversity records and potential cryptic lineages. Further, we emphasize the spatial scale of genetic variability, which was observed from locations to individual sampling units (ARMS). We carried out a population genetic analysis of several important yet understudied species, which highlights the current knowledge gap concerning intraspecific genetic patterns for the target taxa in the Mediterranean basin. Our approach considerably enhances biodiversity monitoring of charismatic and understudied Mediterranean species, which can be incorporated into ARMS surveys.


| INTRODUC TI ON
The development and application of biodiversity monitoring tools enhance the power and pace of data generation, which is important to track ecological changes in the face of environmental shifts. From communities to populations, genetic diversity is a valuable tool to characterize these responses to surrounding pressures. Patterns of genetic differentiation are, therefore, fundamental to understanding community and species resilience (Reusch et al., 2005), evolutionary potential, adaptive ability (DuBois et al., 2022) and connectivity patterns (Darnaude et al., 2022). These aspects have implications for both marine ecosystems themselves and for the natural services they provide (Darnaude et al., 2022;Pinsky et al., 2020). Nevertheless, genetic diversity information is still low and patchy for marine invertebrates partially due to challenges with sampling (Costantini et al., 2018).
At smaller spatial scales, genetic variability affects local dynamics and thus the resulting in community composition. Biodiversity assessments are an important tool to capture this local variability at both species and population levels. In recent years, efforts have been made to standardize biodiversity data collection, allowing for studies at larger spatial and temporal scales (Obst et al., 2020). One such method is the deployment of Autonomous Reef Monitoring Structures (ARMS; Leray & Knowlton, 2015) which have been used across the world's oceans over the last few years (Obst et al., 2020;Pearman et al., 2020, see www.natur alhis tory.si.edu/resea rch/globa l-arms-progr am/publi cations for further updates). ARMS consist of nine stacked PVC plates arranged in a three-dimensional structure ( Figure 1) that provide suitable settlement surfaces for a remarkable variety of species and can be deployed without causing a significant impact on the surrounding environment. Following retrieval, organisms collected on and within the structures can be identified using morphological and molecular techniques (Leray & Knowlton, 2015).
Regarding the latter, DNA metabarcoding is a popular and efficient approach that allows for the rapid assessment of species diversity within a community (Taberlet et al., 2012). The mitochondrial cytochrome oxidase I (COI) gene has been extensively used to describe whole communities for countless metazoan surveys (e.g. Leray & Knowlton, 2015;Nichols et al., 2022), and traditional population genetic studies use the same target gene for its intraspecific variability to study phylogeographic distributions of populations (Pérez-Portela et al., 2013;Wäge et al., 2017). However, the within-species variability of the COI is typically concealed in metabarcoding studies, despite representing valuable ecological information with major implications for ecosystem dynamics (DuBois et al., 2022;Reusch et al., 2005).
Unlocking this data, therefore, has enormous potential for assessing intraspecific genetic diversity for hundreds of species simultaneously.
Classical metabarcoding studies have primarily focused on generating species lists rather than assessing within-species diversity. This is primarily an error-avoidance strategy as both PCR amplification of bulk samples and next-generation sequencing inherently introduce false sequences through, e.g., chimeras and tag-jumping (Elbrecht et al., 2018;Turon et al., 2020). Clustering sequences is currently a common practice to reduce input data to molecular operational taxonomic units (MOTUs) which removes the within-species variability that is typically binned into MOTUs. Elbrecht et al. (2018) devised a method for detecting intraspecific genetic diversity by extracting haplotypes at the community level. This approach has gained traction in recent years, as evident from numerous studies (Andújar et al., 2021;Antich et al., 2021Antich et al., , 2022Brandt et al., 2021;Macé et al., 2021;Porter & Hajibabaei, 2020;Shum & Palumbi, 2021;Turon et al., 2020) that have utilized this technique to reveal valuable insights into genetic diversity. The term "metaphylogeography" was coined by Turon et al. (2020) to describe the application of this method in uncovering intraspecific genetic patterns and population structures within ecological communities. With these alternative bioinformatics tools, raw metabarcoding reads can be processed to recover amplicon sequence variants (ASVs) at the species level. Once ASVs or haplotypes are generated from the metabarcoding data, they can be employed to make inferences about population-level genetic diversity. However, an issue remains in that the number of reads per haplotype does not necessarily correlate with the biomass or number of individuals exhibiting that haplotype (Elbrecht et al., 2017 are crucial for population inferences, and several approaches can be considered to quantify relative abundances. Azarian et al. (2021) proposed the use of a frequency of occurrence metric instead of read abundances when an appropriate amount of samples are collected.
If reads are recorded for a haplotype in a given sample, at least one individual with that haplotype was present. This approach is advantageous for population-level inferences from community DNA collections when adequately sampled and replicated (Azarian et al., 2021), as demonstrated by Shum and Palumbi (2021).
In the present study, we show the potential of ARMS to provide new insights into intraspecific genetic patterns for several invertebrate taxa from the Mediterranean Sea. This region harbours unique biodiversity hotspots with prominent levels of endemic fauna, accounting for an estimated 7% of global marine biodiversity (Coll et al., 2012). Nonetheless, Mediterranean coastal areas are subject to compounding anthropogenic impacts such as habitat disturbance, depletion and destruction (Coll et al., 2012;Micheli et al., 2013). Further, knowledge about the genetic structure and diversity of many of the local invertebrate fauna is severely lacking (Costantini et al., 2018;Mugnai et al., 2021). Information regarding these species' genetic diversity in different locations would, therefore, aid in characterizing their response to environmental pressures in a unique marine setting.
Further, as demonstrated in this study, this information can be acquired through existing biodiversity monitoring efforts. Here, we use metabarcoding data gathered from ARMS deployed in the Tyrrhenian (Livorno and Palinuro, Italy) and Adriatic (Rovinj, Croatia) Seas. These marine basins constitute two distinct biogeographic environments with species-specific levels of connectivity for invertebrates (Villamor et al., 2014), which can affect both community patterns and intraspecific diversity. The strong abiotic barriers may contribute to higher genetic variability in echinoderms and polychaetes due to population isolation and ongoing speciation (Patarnello et al., 2007;Villamor et al., 2014). Conversely, lower genetic variability may be exhibited by corals and sponges due to inherently less variation within the COI region for these taxa (Calderón et al., 2006;Erpenbeck et al., 2016;Shearer et al., 2002). However, there are generally few studies on the subject in the Mediterranean region (Costantini et al., 2018).  Figure S1). From each ARMS, five samples were collected corresponding to different plate positions within the structure (Figure S2). Moreover, one additional sample was collected from the natural substrate near each ARMS. Each sample was analysed in three PCR replicates. Two samples from the same ARMS in Livorno contained insufficient sample volume for genetic analysis. As such, there was a total of 480 PCR replicates (3 locations * 3 sites * 3 units * 6 samples * 3 replicates-6 unavailable replicates).

| Study design
The resulting metabarcoding data were compared with two additional datasets: morphological identification data of organisms present on and within the ARMS, and existing sequence data derived from single-specimen samples published on GenBank.  (Table S1) and retrieved in July 2019. ARMS were placed at the same depth within a given site. During retrieval, ARMS were covered with PVC boxes to prevent loss of vagile fauna, then removed from the substrate. Once transported to the boat, each ARMS was placed into a separate PVC bin filled with 20μm filtered seawater. Natural substrate was sampled in a 22.5 × 22.5 cm area near each ARMS using a scraper and a vacuum pump at the time of retrieval. The sample net was collected and placed into a sterile PE ziplock bag.

| Sample collection
In the laboratory, ARMS were disassembled, and vagile fauna was collected by sieving the water from the bins. The 2000μm fraction was placed in ethanol for morphological identification to the lowest taxonomic rank possible. Each ARMS plate (22.5 × 22.5 cm) was placed into individual sterile PVC trays and photographed on each side to estimate per cent coverage of sessile organisms. The morphological data collected were used for comparison with metabarcoding data in the present study and a summarized dataset is available in Appendix S1. Settlement plates were then scraped and the organic material was collected according to plate position ( Figure S2), resulting in five samples from within each ARMS. All samples (five from ARMS, one from the nearby natural substrate) were homogenized separately in a blender at maximum speed for 15 s. The resulting material was F I G U R E 2 Map of sampling locations in the Mediterranean Sea. dried, split into sub-replicates in 15-mL falcon tubes with 96% ethanol and stored at −20°C until DNA extraction.

| DNA extraction, amplification and sequencing
DNA was extracted using the NucleoSpin® Soil Kit (Thermo Fisher Scientific) following the manufacturer's protocol with two modifications; the initial sample amount was doubled from 500 to 1000 mg, and the elution buffer was incubated at 70°C before use.
Samples were amplified using the newly designed degenerate primer set targeting a 313-bp fragment of cytochrome c oxidase subunit 1 (COI) (forward IIICRrev: GGNTG AAC NGT NTA YCCNCC; reverse HBR2d: TAWAC TTC DGG RTG NCC RAA RAAYCA) that has been developed to amplify a wide range of phyla across eukaryotes and algae. Forward and reverse primers were indexed with 11-13 nucleotide long tags differentiated by at least three different nucleotides (for details see Corse et al., 2017), and a 0 to 3 nt heterogeneity spacer (none/N/NN/NNN) was added to mitigate the issues caused by low sequence diversity in Illumina amplicon sequencing (Fadrosh et al., 2014). A unique combination of tags was used for each sample. Each 15-μL amplification mix consisted of 7.5-μL QIAGEN® Multiplex Master Mix, 2.5-μL QIAGEN® Nuclease-Free H20, 1.5 μL of each respective tagged primer (2 μM) and 2-μL DNA. All samples, including seven positive and three negative controls, were amplified in three PCR replicates. The positive controls consisted of six identical terrestrial mock communities (containing 14 known species) and one marine mock community (containing 12 known species) with each species having a single Sanger sequenced barcode present in equimolar concentrations (Appendix S1). Negative controls consisted of three negative controls for extraction that consisted of 50 μL of DNA-free water subjected to DNA extraction protocol, three negative controls for DNA aerosols that consisted of a 1.5-mL vial containing 50 μL of DNA-free water that remained open but otherwise untouched during the extraction protocol, and one PCR controls (DNA-free water) (see Corse et al. 2017).

| Bioinformatics pipeline
The quality of the resulting raw Illumina reads was inspected using FASTQC (Andrews, 2010; www.bioin forma tics.babra ham.ac.uk/ proje cts/fastq c/). OBITOOLS (Boyer et al., 2016) was used for initial quality filtering. Reads below a minimum quality threshold of 28 were removed (obicut), pair-end reads aligned and alignments with quality scores below 40 discarded (illuminapairedend). The aligned sequences were demultiplexed (ngsfilter), filtered strictly for length at 313 bp, reads with ambiguous bases were removed (obigrep) and parsed by replicate.
The final dataset was decontaminated using decontam (Davis et al., 2018) in R which statistically identifies contaminant sequences through comparisons with negative controls. Contaminants were removed based on the prevalence model using a 0.5 threshold after excluding eight negative control replicates with library sizes >2000.
Each step was evaluated by inspecting the number of haplotypes remaining in terrestrial positive control sequence clusters, i.e., the molecular operational taxonomic units (MOTUs) containing the terrestrial control haplotypes. The marine positive controls were not used for this purpose due to potential crosscontamination between samples. After decontamination, spurious haplotypes remained and the haplotype table was, therefore, further curated using lulu (Frøslev et al., 2017) in R with default settings (84% similarity threshold, 95% co-occurrence ratio). The following filters were then applied in successive order: haplotype relative abundance per replicate >0.01%, haplotype absolute abundance in replicate >5 reads and occurrence in at least two out of three PCR replicates.
The resulting sequences were inspected in MEGA11 (Tamura et al., 2021) and those containing stop codons were removed. Sequences were taxonomically assigned using the RDP classifier (v2.12; Wang et al., 2007) trained on a COI database from 2016 (Wangensteen & Turon, 2016). A second taxonomic assignment was made using BOLDigger (Buchner & Leese, 2020) to account for updated databases since the creation of the RDP classifier reference.
In BOLDigger, the BOLDigger method was used to identify top hits and flag suspicious matches. Species-level uncertainties (e.g. due to incomplete taxonomic resolution in the reference match) were inspected and corrected when possible. The taxonomic tables from RDP and BOLDigger were combined so that BOLDigger assignments replaced RDP assignments in cases when BOLDigger had sequence identity match ≥98% and RDP species-level bootstrap values were lower than 98%. RDP species-level assignments were accepted for species-level bootstrap values ≥90%. If RDP bootstrap values for class were lower than 85% and no appropriate BOLDigger assignment was available to replace it, no class was assigned and the haplotype was removed. If a given MOTU contained haplotypes where some remained unassigned at species level, all haplotypes received the species assignment when no conflicts were present. MOTUs containing conflicting class assignments were removed. The final taxonomic list was filtered to exclude non-marine and non-metazoan taxa, and haplotypes clustered into MOTUs containing control sequences (including from marine mock communities) were removed to account for cross-contamination between control and real samples.

| Data analysis
In the final haplotype table, it was noted that haplotypes assigned to the same species were frequently clustered into separate MOTUs.
For this reason, haplotypes were grouped into species based on taxonomic assignment rather than MOTUs. Haplotypes without species-level assignments were, therefore, disregarded after initial data exploration.
To generate population-level information, the resulting dataset was converted to presence/absence per haplotype per sample (see Azarian et al., 2021;Shum & Palumbi, 2021). Note that this is different from presence per PCR replicate since a haplotype had to be present in at least two out of three PCR replicates in previous steps to be considered present in a sample. The plate position for each sample was not treated as a variable in any analyses since this was beyond the scope of the present study.
To validate and compare results, sequences from single-specimen samples were included from GenBank for species with more than three haplotypes in the metabarcoding dataset. Sequences were downloaded when more than 10 sequences from Mediterranean locations were available. Sequence data meeting these criteria were available for four groups: Ophiothrix fragilis (Pérez-Portela et al., 2013), Platynereis dumerilii and P. massiliensis (Calosi et al., 2013;Wäge et al., 2017), Clytia spp. (Cunha et al., 2017) and Eualus spp. (Conforti & Costantini, 2022). The GenBank sequences were combined with the species data from the present study and aligned in MEGA11 (Tamura et al., 2021) using the Muscle algorithm (Edgar, 2004). Sequences were trimmed to the longest possible overlapping fragment in MEGA11.
Haplotype networks for species with at least five haplotypes were created using pegas (Paradis, 2010) in R. Pegas was also used to compute genetic distance (K2P) and nucleotide diversity (π). The R package ade4 (Dray & Dufour, 2007) was used for analysis of molecular variance (AMOVA) with a randomization test (n = 999) to test the effects of location and site (nested within location) on genetic variability in each species. Accumulation and rarefaction curves were created using the R package vegan (Oksanen et al., 2022).
To confirm that species found in the resulting dataset had previously been observed in the study region, species occurrence records were checked in the Global Biodiversity Information Facility (GBIF; www.gbif.org/) occurrence database.

| Pipeline outputs
A total of 12,999,884 reads from 480 PCR replicates were generated after evaluation based on 42 positive control replicates with mock communities and 10 negative control replicates (Table S2) Figure S4.
The pipeline generated a dataset with 145 unique species. Fifty of these species belonged to more than one MOTU, with five species clustering into four or more MOTUs. Conversely, only four MOTUs contained multiple species assignments. For this reason, analyses following initial data exploration considered species to consist of haplotypes with the same taxonomic assignment at species level, rather than individual MOTUs. As a result, haplotypes without species assignments were disregarded.

| Comparison with morphological observations
Photoanalysis of sessile organisms on ARMS plates identified 22 invertebrate categories to at least genus level (Figure 3). Of these categories, only one (the ascidian Ciona edwardsi) was also observed in the metabarcoding data. The most frequent categories identified in the photoanalysis were the Annelida Polychaeta Salmacina spp./Filograna implexa species complex (Kupriyanova & Jirkov, 1997) and the Bryozoa Schizobrachiella sanguinea, which both lacked haplotype occurrences in the metabarcoding data. The photoanalysis identified a larger number of bryozoan (classes Gymnolaemata and Stenolaemata) and ascidian (class Ascidiacea) species than the metabarcoding method (Figure 3).
Morphological identification of vagile fauna (≥2 mm) observed 92 categories defined to at least genus level in the ARMS. Of these, 17 categories matched with species present in the metabarcoding data ( Figure 3). A larger number of mollusc species (classes Bivalvia and Gastropoda) were detected using morphological identification than metabarcoding.
Conversely, metabarcoding detected 128 species not found in either sessile or vagile data, primarily species of classes Polychaeta and Demospongiae (Figure 3).

| Location and site comparisons
The most abundant classes were Polychaeta, Hydrozoa and Demospongiae (Figures 4 and 5). The three classes constituted 84% of haplotype occurrences in the final dataset and were prevalent across all locations ( Figure 5), although Rovinj had more classes represented overall. Rovinj had the highest numbers of unique species, haplotypes and haplotype occurrences when comparing the three locations, F I G U R E 3 Comparison of species detected using morphological identification (sessile and vagile fractions) and metabarcoding methods at class level. Venn diagram illustrates the total number of species detected. followed by Livorno and then Palinuro (Table S3). Only 10 species were present with more than five occurrences in all three locations. Eight of these occurred in sufficient numbers to test differences between locations. AMOVA with a randomization test (n = 999) using locations and sites as groups revealed two species having significant variance explained by location: the Porifera Halisarca dujardinii (Φ 0.084, 8.4% variance explained, p = .001) and the Porifera Strongylacidon bermuda (Φ 0.081, 8.1% variance explained, p = .034), see Table S4. However, the overwhelming majority of variance was contained within sites, and most sites had several unique haplotypes (i.e. haplotypes occurring strictly at a given site; hashed bars in Figure 5). Polychaeta had the highest proportion of unique haplotypes per site and was the most abundant class; Hydrozoa and Demospongiae had a lower proportion of unique haplotypes despite being abundant ( Figure 5). Further, within-site variance was also apparent, i.e., differences between ARMS. In total, 197 haplotypes from 82 species were only found on one ARMS, mainly belonging to the most abundant classes.

| Intraspecific diversity patterns
Most species (84 of 145) had only one haplotype present, and 60 of those species were only found in one sample (i.e. one settlement plate or one natural substrate plot). Overall, nucleotide diversity was low across species, with some exceptions. A total of six species had nucleotide diversities >0.06 (Figure 4) Haplotype networks for species with at least five haplotypes are available in Appendix S2. F I G U R E 4 Frequency of nucleotide diversity per species in classes with at least five haplotype occurrences.

| Ophiothrix fragilis
For the brittle star Ophiothrix fragilis, five haplotype sequences were found in the present study across the three study locations. These After alignment and trimming, the combined data collapsed into 89 unique haplotypes with a length of 313 bp ( Figure S7). In the present study, four haplotypes were new, and one had previously been found grouped with the largest Mediterranean cluster (Figure 7). F I G U R E 5 Number of haplotypes per class, region and site (one site = 3 ARMS). Hashed bars represent site-unique haplotypes (i.e. do not occur in any other site but may occur on multiple ARMS within the site). Plain bars represent the total number of haplotypes occurring within the site. Haplotype occurrences are shown for classes with more than four occurrences across the dataset.
F I G U R E 6 (Lengend on next page) 3.6.2 | Clytia gracilis and C. hemisphaerica gracilis and five C. hemisphaerica sequences from this study were combined with 17 sequences from Cunha et al. (2017) and Govindarajan et al. (2006). Sequences from Govindarajan et al. (2006) were removed due to the presence of ambiguous bases. The combined data collapsed into 22 haplotypes of 313 bp. In this study, seven new and four previously described haplotypes were found F I G U R E 6 Haplotype networks for the most diverse species in the dataset. Colours correspond to sites within regions. Species are shown if they have at least five haplotypes and confirmed distribution in the study region. Haplotype circle sizes are not to scale due to differences in abundance between species. *Region explained a significant amount of genetic variance.

| DISCUSS ION
Autonomous reef monitoring structures have been deployed throughout the world's oceans to monitor biodiversity and evaluate the influence of environmental and anthropogenic impacts on species richness (Obst et al., 2020;Pearman et al., 2020). Here, we use DNA metabarcoding data from ARMS deployed in highly diverse locations around Italian coastal waters in the Tyrrhenian and Adriatic Seas. With a combination of existing bioinformatics tools, we disentangle haplotypic variation at an intraspecific level across biogeographic regions. We reveal trends of genetic variability between and within both locations and sites that would otherwise be missed using traditional single-species methods. Our approach documents valuable genetic information for 145 benthic invertebrate species simultaneously and enables the first investigation of the genetic patterns of diversity for several species.
This study highlights the potential of ARMS biodiversity surveys coupled with DNA metabarcoding-based haplotype analysis to reveal hidden genetic patterns among Mediterranean invertebrates.

F I G U R E 8
Haplotype network and map of sample sites for Clytia gracilis and C. hemisphaerica including sequences from the present study and Cunha et al. (2017). Haplotypes introduced in this study are marked with an asterisk (*).

| Comparison with morphological data
In addition to the molecular information, morphological data were collected from the ARMS. This included estimates of percentage cover of various sessile organism categories and morphological identification to at least genus level of vagile fauna ≥2 mm. Overall, species occurrences from the DNA metabarcoding data show slight overlap with morphological observations (Figure 3), a pattern which has been observed in similar studies (Cahill et al., 2018;Obst et al., 2020). In the present study, metabarcoding detected 128 species not found through the morphological identification methods, or alternatively, found but not identified at genus level. Conversely, metabarcoding methods failed to detect the most abundant genera in morphological observations. The fact that morphological identifications did not match well with the DNA metabarcoding data, despite the rarefaction curves being mostly saturated ( Figure S3), highlights the potential limitations and discrepancies between the two approaches. The saturation of rarefaction curves suggests that the sequencing depth was sufficient to capture the majority of the species diversity in the samples. However, the mismatch between the morphological and molecular data could be attributed to various factors.
Firstly, morphological identification may not be able to detect cryptic species, which are genetically distinct but morphologically indistinguishable, leading to an underestimation of true species diversity. Secondly, the DNA metabarcoding approach may detect species that are difficult to identify morphologically, such as sponges, or specimens otherwise overlooked due to their small size or rarity. Lastly, molecular methods may fail to identify-or misidentify-certain species due to, e.g., primer bias or mismatch (Cahill et al., 2018), insufficient sample amounts and incomplete or inaccurate reference databases (Mugnai et al., 2021).
Overall, these patterns underscore the importance of integrating both morphological and molecular methods to achieve a more comprehensive understanding of biodiversity in ecological studies.  (Ardizzone et al., 1989;Ponti et al., 2015;Spagnolo et al., 2014).

| Location and site comparisons
The three locations had similar class compositions, with Polychaeta, Demospongiae and Hydrozoa constituting the majority of haplotypes identified. Rovinj, in the northern Adriatic Sea, had the greatest number of haplotypes and represented classes. In a similar study by Pearman et al. (2020), differences in community composition between Western Mediterranean and Adriatic locations were more apparent, with Arthropoda dominating in the former and Polychaeta in the latter subregions.
Many haplotypes were unique for the different sites within each location, indicating potential within-region variation in community composition and intraspecific genetic variability ( Figure 5). This was particularly the case in Rovinj sites. Polychaetes had a larger number of site-unique haplotypes across the three locations, while most hydrozoan and demosponge haplotypes were shared between two or more sites. For example, the polychaete Sabellaria spinulosa had a large number of haplotypes in the Isola site (Livorno) but few in other sites and none in the Palinuro location ( Figure 6). Other species, such as the hydrozoan Campanularia hincksii, had haplotypes represented in and shared between almost all sites ( Figure 6). This highlights species-specific patterns in genetic diversity across the study area.
Further, a comparatively large number of haplotypes were also only found on one ARMS unit. As such, variation at all spatial scales, from location to individual ARMS unit, was observed at species level. This brings an additional perspective to the within-site community differences presented by Pearman et al. (2020).
Assessment of location differences in species-specific genetic patterns was made difficult by the small number of haplotype occurrences remaining after stringent filtering, the overall low diversity and the fact that most species did not occur across the three locations. Despite this, eight abundant species were analysed during location comparison and two of these (both demosponges) had significant genetic variance explained by location, Halisarca dujardinii and Strongylacidon bermuda. Several invertebrate species show population genetic structure in the Mediterranean, sometimes relating to geographic barriers or isolation by distance (e.g. Costantini et al., 2018;Villamor et al., 2014). For some species, Villamor et al. (2014) indicated connectivity between the Tyrrhenian and Adriatic Seas, despite significant environmental differences between these areas. In the present study, we found no clear patterns of regional variation but highlight the potential of metabarcoding methods for this purpose (see Turon et al., 2020).

| Intraspecific diversity patterns
Most species had low COI diversity, with 58% of recorded species having only one haplotype. Moreover, 41% of described species were only found in one sample. Across all species, including those with a single occurrence, the average nucleotide diversity was 0.8% ± 2.6 SD. When excluding species with zero diversity, the corresponding number was 1.9% ± 3.7 SD. Similarly low intraspecific diversity was found by both Shum andPalumbi (2021) andTuron et al. (2020) using metabarcoding methods in the Pacific Ocean and Atlantic/Mediterranean Seas respectively. The early-stage communities on artificial structures can be expected to have lower diversity than established communities on natural substrates. In addition, diversity estimates can be influenced by the size of the gene fragment since the 313 bp used in this study is shorter than the 650 bp in conventional COI-based population genetics. In the present study, this was demonstrated in the case of Ophiothrix fragilis, where trimming the additional data from Pérez-Portela et al. (2013) collapsed 125 sequences to 90 haplotypes and removed the distinction between two known lineages ( Figure S7). Further, COI inherently harbours low variability in several taxa, which can also explain the observed patterns. These taxa are typically at the base of the Metazoan tree (Huang et al., 2008) and include, for example, sea anemones (Shearer et al., 2002), corals (Calderón et al., 2006;Shearer et al., 2002) and demosponges (Erpenbeck et al., 2016). Apart from the class Demospongiae, this aligns with the observed patterns in the present study ( Figures 4 and 5).
At the other end of the spectrum, some instances of high intraspecific diversity were found. High nucleotide and haplotype diversities were observed in Polychaeta, Malacostraca, Hydrozoa and Demospongiae, which also were the most abundant classes. Haplotype diversity per species significantly increased with increasing haplotype occurrences ( Figure S9), while nucleotide diversity unexpectedly did not ( Figure S10). For example, some abundant species had only one haplotype or low nucleotide diversity (e.g. the Porifera among Trypanosyllis pseudocryptic lineages, which was lower than the maximum "intraspecific" distance found in this study (42.6%; Appendix S1).
Another species with unexpected genetic patterns was the Ross worm, Sabellaria spinulosa, an abundant reef-building polychaete in the Mediterranean Sea (Schimmenti et al., 2015) which had many haplotype occurrences in our samples. The abundance was expected as it is a pioneer species, and its presence was confirmed by the morphology-based records. More unexpectedly, however, this study presents relatively high nucleotide diversity (0.025) and very high intraspecific distances (up to 10.7%) for this species. For comparison, a previous study of S. spinulosa in the Mediterranean found the intraspecific distances to be only 0.9% (Schimmenti et al., 2015) employing a longer COI fragment (610 bp). This highlights potential population structure or crypticity within an important yet understudied species in this region. Similarly, the polychaete Subadyte pellucida had extremely high nucleotide diversity (16.1%) and has been found living on or attached to other organisms such as coral colonies (Mastrototaro et al., 2010), sponges (Goren et al., 2021), anemones (Mangano et al., 2010) and sea stars/brittle stars (e.g. Ophiothrix fragilis; Pettibone, 1993). However, this species has not been studied at a population genetic level in the Mediterranean and requires further investigation related to their host specialization and diversity. These results demonstrate the benefit of metabarcoding-based methods to identify potential cryptic lineages and highlight prospective avenues of research in species discovery and distribution.
On a species level, we were able to compare our findings with existing studies of four groups: the brittle star Ophiothrix fragilis, the annelid polychaetes Platynereis dumerilii and P. massiliensis, the hydrozoans Clytia gracilis and C. hemisphaerica and the shrimp Eualus spp. We found both new and previously described haplotypes in our dataset for all species groups, and further provide records from previously unstudied locations. The findings in the present study aligned with results from studies applying traditional methods, for which two examples are highlighted here. In the case of C. gracilis and C. hemisphaerica, we find that C. gracilis forms a polyphyletic clade (Figure 8; Figure S10) and therefore contend this species consists of a species complex in accordance with Cunha et al. (2017) and Govindarajan et al. (2006). For O. fragilis, Pérez-Portela et al. (2013 suggested the existence of two distinct lineages separated by genetic distances of at least 15.8% (p distances). Lineage I was observed almost exclusively in the Atlantic Ocean and lineage II in the Mediterranean Sea, except for one sequence from lineage I collected in Blanes, Spain (BLA1; Figure 7). However, when sequences were trimmed to 313 bp and collapsed, the lineage I sequence BLA1 clustered with lineage II haplotypes. Upon further inspection, any distinction between the two lineages disappeared when using the shorter sequence ( Figure S7). As such, this COI fragment failed to detect significant population structure within O. fragilis and highlights target gene limitations. Haplotype networks combining data from the current and comparable studies for Eualus spp. and Platynereis spp. are available in Appendix S2 ( Figures S5 and S6).

| Method limitations
The mismatch between morphological and metabarcoding records and the presence of extreme outliers in nucleotide diversity suggest that the metabarcoding methods may fail to detect, or mislabel, certain species. For example, our data contain some unlikely taxonomic assignments, such as the demosponge Strongylacidon bermuda. This species was the most abundant demosponge across samples but has previously only been recorded in North America (de Voogd et al., 2022). Unless this is a recently introduced and highly successful non-indigenous species, which we consider unlikely, its presence in our dataset is almost certainly some form of error. In this case, the error is likely due to incomplete public databases, which is a common issue in DNA metabarcoding (Mugnai et al., 2021). In our study, it is exemplified by the small proportion of metazoan haplotypes that received taxonomic assignment at species level (less than 10%) despite using the BOLD database. In addition, for the species we identified, we find very limited sequence data from the Mediterranean region for comparisons in GenBank, which further highlights the regional discrepancy in database coverage.
Intraspecific studies involving data from ARMS will have some inherent differences from traditional approaches. ARMS units provide a snapshot of diversity, capturing only those species that settle onto the plates during the deployment period. This design could potentially lead to undersampling of species diversity, which may affect the overall assessment of community composition. On the other hand, haplotype diversity refers to the genetic variation within species or populations. The observed haplotype patterns in our study may be influenced by factors such as natural population structure, selection pressures and study design, rather than solely undersampling. The single haplotypes observed for many species could reflect the true genetic structure of these populations. By recognizing the difference between these two aspects of diversity, we can better interpret our study results and understand the potential limitations associated with the ARMS units and the chosen molecular techniques.
Future studies should consider incorporating additional sampling methods and strategies to capture a more comprehensive representation of both species and haplotype diversity.
Furthermore, metabarcoding approaches include bulk samples, in this case, material scraped from plates, and the sample composition can affect the outcome. In the present study, we excluded macrophyte taxa during analysis following our research objective, despite rhodophytes constituting a significant portion of the biomass. Biomass-dominant taxa can affect the read abundances of both rarer and smaller organisms, and therefore limit the probability of their detection (Elbrecht et al., 2017;Leray & Knowlton, 2015).
Subdividing scraped material by size fraction may, therefore, be useful for future studies when not all target organisms constitute a considerable proportion of the biomass (Wangensteen et al., 2018).
Lastly, any bioinformatic pipeline will have trade-offs when attempting to remove erroneous haplotypes from the dataset. For the present study, we chose a stringent filtering approach which almost certainly excludes rare species or haplotypes and therefore prevents more in-depth analysis such as demographic inferences (Shum & Palumbi, 2021). Incorporating positive controls with intraspecific variation in the experimental design would also be beneficial for pipeline development, which was not the case with the current dataset. As such, both bioinformatics methods and study design should be chosen carefully depending on the research objectives moving forward.

| CON CLUS ION
In this study, we recover haplotype information for hundreds of Mediterranean marine invertebrate species simultaneously. This was achieved by implementing COI-based metabarcoding on ARMS deployed in coastal marine ecosystems. Our haplotype findings overlap with comparable published studies using single-species population genetic methods. In general, we find low intraspecific diversity across several invertebrate classes in the three study locations in the central Mediterranean Sea. However, we also identify species with high diversity that warrant further investigation regarding potential crypticity and population structure. In addition, we highlight the presence of unique haplotypes at all spatial scales for almost all classes studied, indicating small-scale genetic variability within the invertebrate species presented here. We demonstrate DNA metabarcoding as an important tool to generate intraspecific genetic information, yet emphasize the limitations of these methods, as they rely on taxonomic accuracy in barcode reference databases. The approach presented here greatly enhances DNA metabarcoding methods to reveal interspecific COI variation both within and between ARMS units. This has the potential to strengthen species monitoring across vast scales that help track geographical range shifts and climate-related impacts on biodiversity.

CO N FLI C T O F I NTER E S T S TATEM ENT
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

DATA AVA I L A B I L I T Y S TAT E M E N T
Sequence data and pipeline step outputs have been deposited to Zenodo and are available at https://zenodo.org/recor d/7781906. Scripts used for the bioinformatics pipeline, data analysis and visualization are available at https://github.com/thoma sdott er/spine less-haplo types.

B EN EFIT-S H A R I N G S TATEM ENT
Benefits from this research accrue from the sharing of our data and results on public databases as described above.