Genomic signatures of host‐associated divergence and adaptation in a coral‐eating snail, Coralliophila violacea (Kiener, 1836)

Abstract The fluid nature of the ocean, combined with planktonic dispersal of marine larvae, lowers physical barriers to gene flow. However, divergence can still occur despite gene flow if strong selection acts on populations occupying different ecological niches. Here, we examined the population genomics of an ectoparasitic snail, Coralliophila violacea (Kiener 1836), that specializes on Porites corals in the Indo‐Pacific. Previous genetic analyses revealed two sympatric lineages associated with different coral hosts. In this study, we examined the mechanisms promoting and maintaining the snails’ adaptation to their coral hosts. Genome‐wide single nucleotide polymorphism (SNP) data from type II restriction site‐associated DNA (2b‐RAD) sequencing revealed two differentiated clusters of C. violacea that were largely concordant with coral host, consistent with previous genetic results. However, the presence of some admixed genotypes indicates gene flow from one lineage to the other. Combined, these results suggest that differentiation between host‐associated lineages of C. violacea is occurring in the face of ongoing gene flow, requiring strong selection. Indeed, 2.7% of all SNP loci were outlier loci (73/2,718), indicative of divergence with gene flow, driven by adaptation of each C. violacea lineage to their specific coral hosts.

There are a number of reasons for this reassessment. First, absolute physical barriers in the sea are exceedingly rare (Ludt & Rocha, 2015;Rocha & Bowen, 2008;Rocha et al., 2005). As a result, speciation must often proceed with varying levels of gene flow and aided by divergent selection in different habitats or hosts (Palumbi, 1994). Second, the strong interspecific interactions that can promote ecological speciation in terrestrial species (e.g., host-parasite, mutualisms) are also common in certain marine ecosystems (Blackall, Wilson, & van Oppen, 2015;Stella, Jones, & Pratchett, 2010). For example, reef-building corals have tight ecological associations with a wide variety of invertebrate taxa (Zann, 1987), including ~900 named species of sponges, copepods, barnacles, crabs, shrimp, worms, bivalves, nudibranchs, and snails (reviewed by Stella et al., 2010). This wide array of symbiotic relationships creates tremendous potential for host shifting and the development of host specificity that can lead to sympatric speciation.
While encouraging, there are gaps in our knowledge that with the expansion of genomic technologies, we are now in a position to begin to fill. Detecting signatures of natural selection in populations where there is likely ongoing gene flow is now possible using genome-wide data, lending insight into the mechanisms of ecological speciation (Bernal, Gaither, Simison, & Rocha, 2017;Campbell, Poelstra, & Yoder, 2018;Puebla, Bermingham, & McMillan, 2014;Westram et al., 2018). To date, however, no studies examining the genomic signatures of ecological divergence in marine host-parasite systems have been conducted.
The ~6 million km 2 Coral Triangle region is home to over 500 species of reef-building corals (Veron et al., 2011) and thousands of unique species of fishes and invertebrates (Barber & Boyce, 2006;Briggs, 2003), making it the global center of marine biodiversity (Cowman & Bellwood, 2011;Hoeksema, 2007). Most of the literature examining the evolution of this biodiversity hotspot has focused on allopatric processes such as divergence across geological and oceanographic features such as the Sunda Shelf or Halmahera Eddy during Pleistocene low sea levels stands (for reviews, see Barber, Cheng, Erdmann, Tenggardjaja, & Ambariyanto 2011;Carpenter et al., 2011;Gaither & Rocha, 2013). Allopatric divergence is clearly an important factor in the biodiversity of the Coral Triangle. However, the extraordinary diversity in this region, combined with the prevalence of strong species-species interactions on coral reefs, makes it likely that ecological speciation also contributes to the evolution of biodiversity in this hotspot.
The marine snail, Coralliophila violacea (Figure 1), is an obligate ectoparasite, living, feeding, and reproducing exclusively on corals in Poritidae, a highly abundant and diverse coral family (Kitahara, Cairns, Stolarski, Blair, & Miller, 2010), which is found in shallow reefs across the tropical Indo-Pacific. The snails attach themselves to their host, form feeding aggregations, and drain energy from their host as it tries to repair damaged tissues (Oren, Brickner Loya, 1998). They are sequential hermaphrodites, a common trait of parasitic mollusks (Heller, 1993), and breed with conspecifics on their host coral colony. Two genetically distinct lineages of C. violacea occur sympatrically on reefs of the Coral Triangle, but each lineage occupies one of two groups of Porites corals, suggesting ecological divergence (Simmonds et al., 2018). A lack of evidence of genetic structure within each lineage of C. violacea inside the Coral Triangle precludes physical isolation as an explanation for the observed divergence. Host specificity commonly results from preferential larval settlement (Ritson-Williams, Shjegstad, & Paul, 2003 populations should be considered incipient species (Drès & Mallet, 2002;Malausa et al., 2007).

Coral Triangle
Genomic tests of selection are key to distinguishing between these possibilities. If divergence among C. violacea lineages results purely from neutral processes, genetic drift and migration should have approximately equal effects on all parts of the genome (Nielsen, 2005), and frequencies of neutral loci should show similar levels of differentiation (Via, 2009 (Feder et al., 1994;Nosil, Funk, & Ortiz-Barrientos, 2009), because natural selection affects non-neutral parts of the genome, as well as linked loci, to a greater extent (Smith & Haigh, 1974). As such, frequencies of loci under selection (outlier loci) or linked loci should either be unusually high or unusually low, in host-associated populations, depending on the type of selection occurring (Beaumont & Nichols, 1996).
In this study, we use genome-wide single nucleotide polymor-

| Sample collection
We collected snails on snorkel during 2011-2013 from six sympatric populations of two lineages of C. violacea representing unique parasite-host groups (Table 1, Figure 2, Appendix S1). We chose snails from the most abundant Porites species from each group (P. lobata, P. cylindrica, Dana, 1846, Figure 1) to maximize the number of samples and reduce potentially confounding effects of differences among hosts within the same group. To further reduce confounding effects resulting from taxonomic complexity within P. lobata (Forsman, Barshis, Hunter, & Toonen, 2009;Prada et al., 2014), we used coral host species identifications from Simmonds et al. (2018) that were confirmed through RAD-seq data.

| Creation of RAD libraries
We extracted genomic DNA from 20 mg of foot tissue from 67 individual C. violacea (34 from P. cylindrica and 33 from P. lobata;

| RAD-seq data processing
To prepare raw sequence data for SNP identification, we truncated all raw sequence reads to the insert size (36 bp), filtered for quality (PHRED scores >20), and discarded empty constructs. We then processed the resulting data using custom scripts written by Misha Matz, available on the 2bRAD GitHub site (https ://github.com/ z0on/2bRAD_denovo). First, we counted unique tag sequences (minimum sequencing depth 5×) and the number of sequences in reverse-complement orientation and then merged these tags into one table. Then, we clustered all sequences in CD-HIT (Fu, Niu, Zhu, Wu, & Li, 2012) using a 91% similarity threshold. Next, we defined the most abundant sequence in the cluster as a reference sequence and then filtered a locus-annotated table from the previous two steps, excluding reads below 5× depth and those exhibiting strand bias. Lastly, we flipped the orientation of the resulting clustered sequences to match the most abundant tag in a cluster.
To call genotypes (as population-wide RAD-tag haplotypes), we used GATK (McKenna et al., 2010) and applied mild allele filters (10× total depth, allele bias cutoff 10, and strand bias cutoff 10), with the additional requirement that alleles appear in at least two individuals.
We then applied locus filters allowing a maximum of 50% heterozygotes at a locus, no more than two alleles, genotyped in 30% of samples and polymorphic. Finally, we removed loci with the fraction of heterozygotes >75% (potential lumped paralogs) and missing >70% of genotypes. The final set of SNPs was then thinned to one per tag (that with the highest minor allele frequency) for F ST and STRUCTURE analysis to remove linked loci.

| Individual sample filtering steps
To maximize the quality of the final dataset, we further filtered out individuals (N = 11) with low genotyping rates, indicating low DNA quality, by taking the log 10 of the number of sites genotyped per individual, and removing any individuals that were outside one standard deviation (SD) of the mean. We used VCFtools (Danecek et al., 2011)

| Genetic structure
To test whether the patterns observed in a mitochondrial locus were present in loci genome-wide, we inferred the population genetic structure of the full RAD-seq dataset (2,718 loci), outlier loci only (73 loci), and neutral loci only (2,645 loci), from 51 individuals using two methods. First, we ran the Bayesian model-based clustering method STRUCTURE (Pritchard, Stephens, & Donnelly, 2000) using a burn-in period of 20,000 followed by 50,000 MCMC replicates for K = 1-12, and 10 runs for each K. We used the admixture model, with allele frequencies correlated among populations. The results from STRUCTURE were then analyzed in CLUMPAK v1.1 (Kopelman, Mayzel, Jakobsson, Rosenberg, & Mayrose, 2015) to select for the best K and graphically display the results.

| Outlier analyses
To test for evidence of natural selection in relation to coral host, we compared SNPs between lineages of snails on different hosts, pooled across six localities, with two datasets: (a) including all individuals and (b) excluding migrants and admixed individuals that we identified using STRUCTURE. First, we performed an outlier loci analysis using BayeScan v2.1 (Foll & Gaggiotti, 2008) with a prior of 10, a sample size of 5,000, and 100,000 iterations, using a burn-in of 50,000, and 20 pilot runs of 5,000 each. To explore the impact of misleading data, we employed a 10% false discovery rate.
To further explore outlier loci, we used a second method to detect loci under selection (FDIST2) as implemented in ARLEQUIN (Excoffier & Lischer, 2010

| Candidate gene identification and annotation
To annotate the putative functions of genes linked to outlier loci, we aligned sequences containing SNP outlier loci to nucleotide collections (nr/nt) available on the NCBI website, in Blast2GO 5 Basic version (October 7, 2019) using the BLASTn algorithm (Altschul et al., 1997) with a taxonomic filter for Mollusca (taxid:6447). We adjusted parameters (expected threshold 10, word size 7, no low complexity filter, no mask for look-up table only) to accommodate short read sequences. We only examined hits with a high query coverage (>80%).
Then, we identified and annotated any associated genes using NCBI and GeneCards ® .

| RE SULTS
After removing empty constructs and filtering for quality, we obtained an average of 5,710,091 unique sequence reads per individual at a minimum 5× depth. In total, we sequenced and genotyped 17,676 high-quality RAD-seq loci with ≥25× coverage, in 67 snails collected from two different coral host species, at six locations. After filtering for 30% maximum missing data per locus, this total decreased to 5,999 loci and then to 2,718 SNPs following thinning to one SNP per loci to remove any physically linked SNPs for STRUCTURE and F ST analyses. Next, we removed 16 individuals that had either low DNA quality (missing data ≥ +1SD from the mean) or potential contamination issues (inbreeding coefficient ≥ +2SD from the mean), leaving 51 individuals.

| Migration and admixture
Inferring the ancestry of individuals in STRUCTURE, using host as a prior, revealed strong differences among C. violacea living on different coral hosts (P. lobata and P. cylindrica, Figure 4), despite some migration and admixing between sympatric lineages.
Moreover, migration rates were strongly asymmetric between snails living on these two hosts. In total, 19% (5 of 26 samples) of the snails collected from P. cylindrica had P. lobata genetic ancestry, while no snails (0 of 25 samples) with P. cylindrica ancestry were ever found on P. lobata (Appendix S4 and S5). Admixed individuals were only found at locations where migration was also observed (Dumaguete and Pulau Mengyatan; Appendix S5). After excluding migrants and admixed individuals, the mean F ST across all loci increased from 0.047 to 0.075 and the weighted F ST from 0.090 to 0.150.

| Host-specific directional selection
Because STRUCTURE identified 9/51 individuals that were either migrants from one coral host to the other, or of admixed ancestry (Appendix S5), we used two different datasets for detecting host-specific selection: (a) all individuals in the filtered dataset and (b) excluding migrants and admixed individuals. We then searched for loci under selection using two methods. The first involved a Bayesian model, BayeScan (Foll & Gaggiotti, 2008). Using the default false discovery rate (FDR) of 10%, we identified six loci as outliers (pairwise F ST = 0.241-0.354, mean F ST = 0.305, Figure 5a, Table 2) in the dataset with all snails. Three of these outlier loci (tag21753, tag39884, tag52997) had log 10 (PO)> 1 giving substantial-to-strong support as candidate loci, based on criteria from (Jeffreys, 1961). After excluding all admixed and migrant individuals, the number of outlier loci only increased to eight (pair- Figure 5b, Table 2).

| Mapping and annotation of outlier loci
The majority (78%) of putative outlier loci did not align to any other mollusk sequences currently available in the NCBI database (11/2019, that plays a role in gastropod feeding behavior (Miller, 2019), and a hormone receptor gene (tag28347, HR96 gene) involved in the regulation of xenobiotic detoxification (Lindblom & Dodd, 2006;Richter & Fidler, 2014). At tag28347, there were two alleles that occurred F I G U R E 4 Bar plot of Bayesian assignment probability from STRUCTURE for K = 2 using 2,718 loci from 51 Coralliophila violacea. Each vertical bar corresponds to an individual. The proportion of each bar represents an individual's assignment probability to cluster one (green) or two (gold), shown grouped by coral host and then by location as numbered in Table 1, Figure 2 in almost equal frequency (43%, 57%) in the P. lobata-associated lineage of snails but were nearly fixed (97%) for one allele in the P.

| Divergence with gene flow
In parasitic species such as C. violacea, divergence with gene flow likely occurs through two mechanisms of premating isolation (Nosil, Vines, & Funk, 2005). The first is host preference for egg laying and/ or recruitment to their host (either individual or species). Divergence occurs when mating takes place solely on that host, eventually

TA B L E 2 (Continued)
leading to speciation (Funk, Filchak, & Feder, 2002;Hawthorne & Via, 2001). Second is host adaptation, where selection acts against immigrants from another host via immigrant inviability (Nosil, 2007;Nosil et al., 2005;Porter & Benkman, 2017). Our study suggests that both mechanisms may be occurring in C. violacea. All migrants were individuals that genetically sorted to the lineage associated with P. lobata but were instead living on P. cylindrica. Additionally, only admixed individuals were observed on P. lobata. This pattern suggests that gene flow and admixture between host-associated lineages are unidirectional-from lobata to cylindrica. Such unidirectional gene flow could result from two possible scenarios, either the failure of larvae to recruit, or the failure of recruited larvae to survive.
Larval recruitment processes could promote asymmetrical gene flow if the lineage associated with P. cylindrica strongly prefers P.
cylindrica as a host over P. lobata or does not respond to chemical settlement cues from P. lobata, preventing the recruitment of P.
cylindrica-associated larvae to P. lobata. In addition, larvae from P.
lobata would need to be less selective in their recruitment, occasionally landing on P. cylindrica rather than P. lobata. Such a mechanism makes sense, given that there are twice as many coral species (N = 8) in the clade of Porites to which P. lobata belongs, than in that to which P. cylindrica belongs.
An alternative, but not mutually exclusive explanation is that asymmetry in gene flow and admixture could result from postsettlement processes. For example, if larvae from P. cylindrica-associated individuals settle on P. lobata, but are less likely to survive and reproduce, this could lead to immigrant inviability (Ingley & Johnson, 2016;Nosil et al., 2005;Richards & Ortiz-Barrientos, 2016) and asymmetry in admixture. Under such a scenario, genes beneficial to snails living on P. cylindrica would likely be less helpful on P. lobata and we should see some indication of a selective sweep in the derived lineage with respect to the standing genetic variation of the ancestral lineage (Przeworski, Coop, & Wall, 2005). Indeed, results showed some outlier loci (e.g., HR96, detoxification gene) that were in equal proportions in P. lobata (43%, 57%) but were at near fixation in P. cylindrica (97%), indicating a soft sweep on standing genetic variation at that locus.
Regardless of whether the limited misalignment of snails and coral hosts results from pre-or postrecruitment processes, the fact that the vast majority of snails sort by host coral in the face of hybridization and gene flow indicates that natural selection must be relatively strong to counteract gene flow of Nm>10 (Funk, Egan, & Nosil, 2011). Moreover, the high fidelity of the snails occupying P.
cylindrica and lower fidelity of snails occupying P. lobata, combined with selective sweeps in P. cylindrica, suggest that snails parasitizing P. lobata are the ancestral lineage. This conjecture is consistent with the observation that specialist species often evolve from generalist ancestors (Nosil, 2002), likely because specialization constrains further evolution by reducing genetic variation (Moran, 1988). If it is generally true that specialists evolve from generalists (Kawecki, 1996(Kawecki, , 1998, then host specialization could be an important mechanism of divergence within the Coral Triangle (Briggs, 2005) as increased diversity should raise niche partitioning, leading to more opportunities for host specialization (Janz, Nylin, & Wahlberg, 2006).

| Candidate genes involved in adaptation to host
Outlier loci can provide insights into the targets of natural selection (Storz, 2005) and are a useful starting point for determining how selection may be acting on lineages diverging on different hosts.
Our analysis revealed 73 putative gene regions with F ST values significantly higher than neutral expectations, suggesting that they are likely under selection and could be involved in adaptation to coral hosts, or linked to such genes via hitchhiking (Via, 2012).
There is no a priori information on the types of genes involved in the adaptation of mollusks to different hosts and, due to a lack of genomic resources for C. violacea, only 9 of 73 outlier loci mapped to gene regions with predicted functions. However, a useful comparison can be found in ectoparasitic phloem-feeding insects adapting to different host plants (Oren et al., 1998). Genes under selection in these insect-plant interactions include genes involved in sensing hosts, that protect insects against plant defenses and facilitate feeding, and that code for digestive and detoxifying enzymes to neutralize plant toxins (e.g., metal-ion binding, Simon et al., 2015).  (Freeman et al., 2003). Notably, this is only the first genomic exploration of C. violacea and a broader survey of genomic diversity would be needed to pin down areas of the genome that are crucial for adaptations to coral hosts. Future work would benefit from a fully annotated genome of C. violacea that would allow us to examine the genomic architecture of divergence with gene flow and quantitative trait loci. In turn, this would allow us to better pinpoint regions of the genome under selection, and the specific functions of genes involved in adapting to different hosts.

| Ecological divergence in the sea
John Briggs originally proposed the idea of sympatric speciation as an important diversification mechanism within the Coral Triangle (i.e., "Center of Origin" hypothesis), as well as in the export of species formed under intense competition within the region (Briggs, 1999(Briggs, , 2005. To support his hypothesis, he pointed to multiple cases of sympatric sibling species with distributions centered on the Coral Triangle, where the older of the two species has a wide range, while the younger has a much more restricted range limited to the Coral Triangle (Briggs, 1999). Our study provides the first genomic evidence to support his assertion that ecological divergence with gene flow could be generating biodiversity in the Coral Triangle. In addition, spatial patterning of C. violacea sympatric host lineages also matches the pattern Briggs described, with the ancestral P. lobata host lineage having a broad geographic distribution, and the derived P. cylindrica host lineage restricted to the Coral Triangle (Simmonds et al., 2018).
While there is ongoing debate (Evans, McKenna, Simpson, Tournois, & Genner, 2016;Huang, Goldberg, Chou, & Roy, 2018;Di Martino, Jackson, Taylor, & Johnson, 2018;Matias & Riginos, 2018), there is clearly a multiplicity of processes driving diversification in this region (Barber & Meyer, 2015). Given the results of this study, it is important to expand our thinking beyond models that focus solely on allopatry to advance our understanding of marine speciation and origins of the Coral Triangle biodiversity hotspot.

ACK N OWLED G M ENTS
We are grateful to Dr. Eli Meyer and Dr. Misha Matz for the use of their scripts to process and analyze 2b-RAD sequence data. We acknowl-