Symbiont type and environmental factors affect transcriptome‐wide gene expression in the coral Montipora capitata

Abstract Reef‐building corals may harbor genetically distinct lineages of endosymbiotic dinoflagellates in the genus Symbiodinium, which have been shown to affect important colony properties, including growth rates and resilience against environmental stress. However, the molecular processes underlying these differences are not well understood. In this study, we used whole transcriptome sequencing (RNA‐seq) to assess gene expression differences between 27 samples of the coral Montipora capitata predominantly hosting two different Symbiodinium types in clades C and D. The samples were further characterized by their origin from two field sites on Hawai‘i Island with contrasting environmental conditions. We found that transcriptome‐wide gene expression profiles clearly separated by field site first, and symbiont clade second. With 273 differentially expressed genes (DEGs, 1.3% of all host transcripts), symbiont clade had a measurable effect on host gene expression, but the effect of field site proved almost an order of magnitude higher (1,957 DEGs, 9.6%). According to SNP analysis, we found moderate evidence for host genetic differentiation between field sites (F ST = 0.046) and among corals harboring alternative symbiont clades (F ST = 0.036), suggesting that site‐related gene expression differences are likely due to a combination of local adaptation and acclimatization to environmental factors. The correlation between host gene expression and symbiont clade may be due to several factors, including host genotype or microhabitat selecting for alternative clades, host physiology responding to different symbionts, or direct modulation of host gene expression by Symbiodinium. However, the magnitude of these effects at the level of transcription was unexpectedly small considering the contribution of symbiont type to holobiont phenotype.

Some types, most notably in clade D, may provide the holobiont with increased tolerance of elevated temperatures and can reduce susceptibility to coral bleaching (Berkelmans & van Oppen, 2006;, the breakdown of the coral-Symbiodinium symbiosis which is emblematic of the current coral health crisis. Genome-wide analyses of host gene expression hold great promise to reveal the genetic underpinnings of these functional differences. Host transcriptional changes have been studied in nonpathogenic symbioses of bacteria and vertebrates (Rawls, Samuel, & Gordon, 2004), insects (Wilson et al., 2006), cephalopods (Chun et al., 2008;Wier et al., 2010), deep-sea mussels (Boutet et al., 2011), and polychaetes (Nyholm, Robidart, & Girguis, 2008). In cnidarians, most transcriptomic studies have focused on the establishment of symbiosis in coral larvae (Mohamed et al., 2016;Schnitzler & Weis, 2010;Voolstra et al., 2009), recently settled coral polyps (O'Rourke et al., 2017), and adult symbiotic and nonsymbiotic sea anemones (Lehnert et al., 2014;Rodriguez-Lanetty, Phillips, & Weis, 2006). In most of these studies, significant changes in host gene expression suggested an active role of the host in the establishment and regulation of symbiosis. However, only few studies have investigated the transcriptional response of cnidarian hosts to different symbiont types. The first to show a correlation between Symbiodinium type and host transcription, DeSalvo et al. (2010) reported expression changes in genes involved in protein metabolism in the coral Montastraea faveolata. This observation was more recently corroborated in Acropora millepora by Barfield, Aglyamova, Bay, & Matz (2018), who demonstrated that both the symbiont type and the native reef environment result in host gene expression differences in a common garden experiment. Modulation of host physiology, including metabolism, stress response, and immune system, was also found in the sea anemone Exaiptasia pallida during colonization by an opportunistic Symbiodinium type, with respect to the dominant type (Matthews et al., 2017).
The objective of the present study was to expand upon this incipient knowledge of how different coral-Symbiodinium partnerships affect gene expression, and to shed more light on how interactions between symbiont partners are regulated at the genetic level in Montipora capitata, one of the most common coral species in Hawai'i. Using whole transcriptome sequencing (RNA-seq), we examined the transcriptional profiles of M. capitata colonies in field populations harboring Symbiodinium types representing clades C and D. We compared these profiles to the transcriptomic states corresponding to two other variables, field site and disease status. The two field sites, while both located on Hawai'i Island, differed markedly with respect to underwater topography and climatic conditions, especially rainfall and temperature.
Organisms can respond to environmental changes through local adaptation (allele frequency changes between generations due to natural selection) and acclimatization (nongenetic, usually shortterm phenotypic responses), both of which can manifest in gene expression changes. Environmentally sensitive genes may comprise a significant portion of the transcriptome (5%-15%; Santo et al., 2013;Zhou, Campbell, Stone, Mackay, & Anholt, 2012). The third variable we included was the presence or absence of Growth Anomaly (GA), a widespread coral disease characterized by abnormally enlarged skeletal growth. In a previous study partially based on the same dataset, we found that GA has a surprisingly limited impact on gene expression (Frazier, Helmkampf, Bellinger, Geib, & Takabayashi, 2017). Comparing the effects of field site and disease status to clade-related gene expression differences in the present study allowed us to evaluate the magnitude of these effects.
Further, we measured genetic differentiation between field sites and among corals harboring alternative clades directly from the RNA-seq data to assess the role of host genotype. We also provided a list of host genes whose expression was correlated with Symbiodinium clade, and employed coexpression network analysis to illuminate the connectivity and function of genes that might be responsive to the presence of different symbiont clades.

| Description of field sites, sample collection, and sequencing
Sample collection, RNA extraction, and RNA sequencing were conducted as described previously (Frazier et al., 2017) (Johnson & Wiegner, 2013). Waiʻōpae, while on an open coastline, was composed of numerous permanently and temporarily submerged basaltic tidepools, ranging in size and connectivity at low tide (c. 1 m 3 to 2,500 m 3 , 0.25-3 m depth). Such complex underwater topography at Waiʻōpae, along with high rainfall and groundwater discharge, created high variability in temperature and salinity (Wiegner, Mokiao-Lee, & Johnson, 2016).
The 27 core samples originated from a total of 18 colonies (12 from Waiʻōpae and six from Kīholo) found at a depth between 0.5 and 3.0 m relative to mean lower low water. Half of the colonies at each site were morphologically healthy, in which case one sample was taken per colony. The other half was visibly affected by the coral disease GA. From each diseased colony, two samples were taken, one each from lesioned and morphologically unaffected tissue. In the laboratory, coral tissue was scraped off the cores using a sterile razor, and total RNA was extracted from pulverized tissue using a combination of TRIzol (Invitrogen), chloroform, and the RNeasy Mini Kit (Qiagen) following the manufacturer's instructions. Total RNA was sent to the Yale Center for Genomic Analysis for mRNA isolation and sequencing on three lanes of an Illumina HiSeq 2000.
Libraries were constructed and multiplexed as described in Frazier et al. (2017), and paired-end sequencing was conducted for 75 cycles for each read pair.

| Transcriptome assembly and separation
The assembly process has also been described in more detail elsewhere (Frazier et al., 2017). In short, 25 million paired-end reads were generated on average per sample (range: 17-42 million). A meta-transcriptome representing the host and associated microorganisms was assembled from the quality-filtered raw reads using the Trinity package (Grabherr et al., 2011). Of initially 660,340 isotigs (transcripts), 87,085 were retained after removing low expression transcripts (FPKM <0.5 summed across all samples, see below) and transcripts without open reading frames. To separate coral host from symbiont transcripts, this dataset was subjected to three processes: (a) ortholog detection using InParanoid version 4.1 (Sonnhammer & Ostlund, 2014), with genomes and transcriptomes of the coral Acropora digitifera (Shinzato et al., 2011) and clade B symbiont Symbiodinium minutum (Shinzato et al., 2011;Shoguchi et al., 2013) serving as references; (b) ortholog detection based on reciprocal best hits to cnidarian sequences in GenBank using BLASTp; and (c) retention of transcripts with a GC content ≤47% after a strong GC content bias was observed in independently annotated coral versus symbiont sequences. Transcripts that were classified as coral or cnidarian by the first two homology methods, and unclassified transcripts meeting the GC content cutoff, were regarded as M. capitata transcripts, resulting in a coral host transcriptome reference of 20,461 sequences.

| Symbiodinium clade determination
To determine the predominant Symbiodinium clade in each sample, we searched the meta-transcriptome assembly (all isotigs before quality filtering) for transcripts of the internal transcribed spacer 2 (ITS2) rDNA region using BLASTn. As queries, we selected previously described ITS2 sequences representing clades A-D (LaJeunesse, 2001: AF333505, AF333511;Putnam, Stat, Pochon, & Gates, 2012: HE578979, HE579036). Four meta-transcriptome transcripts were found at an e-value <0.001, one having high similarity to clade C, and three to clade D. Two of the latter differed only in flanking sequence and were identical regarding ITS2, and one was discarded because it appeared to be misassembled (the sequence homologous to ITS2 was not flanked by 28S, in addition to being lowly expressed). The two remaining sequences, referred to as reference transcripts from here on, were identified to type level by aligning them separately to ITS2 sequences of a wide diversity of types within each clade (C1, C3, C15, C17, C21, C31 for clade C; D1a, D8, D12, D13, D15 for clade D; most were taken from Putnam et al., 2012; with MAFFT version 7 (Katoh & Standley, 2013). Reference transcripts were then identified by their closest phylogenetic affiliation using RAxML version 8.1.20 (Stamatakis, 2014) with the GTRCAT model and 250 rapid bootstrap replicates. The composition of the Symbiodinium assemblage in each sample was estimated by mapping quality-filtered raw reads to the two reference transcripts using bowtie version 2.2.4 (Langmead & Salzberg, 2012) and the --very-sensitive-local option. The number of uniquely mapped reads was counted for each reference transcript (read pairs were counted only once, regardless of whether both mates aligned concordantly or discordantly, or whether only one mate aligned), and per-sample clade composition expressed as the proportion of reads mapping to the clade C or D reference. Reads that did not perfectly match the reference transcripts, suggesting the presence of intragenomic variants, alternative Symbiodinium types or sequencing errors, were identified by the "XM" flag in SAM output files, and their genetic distance estimated in terms of nucleotide differences.

| Gene expression analyses
The abundance of coral host transcripts was measured in counts and FPKM values by the RSEM method (Li & Dewey, 2011) in Trinity.
In all cases, transcripts with a minimum count of 1 in fewer than three samples were first removed (note this filtering step was conducted in addition to the removal of lowly expressed genes during meta-transcriptome assembly processing as described above), and counts normalized to counts per million (CPM) using edgeR version 3.10.5 (Robinson & Oshlack, 2010). Outliers were identified by hierarchical clustering of CPM distances between all samples, leading to the exclusion of sample WA8. In addition, we also defined two samples with relative ITS2 read abundances of ≥0.25 for more than one Symbiodinium clade as outliers (WH8 and WA12), to allow unambiguous binary coding of clade in the following analyses. Gene expression profiles were visualized by metric multidimensional scaling (MDS) of logCPM values using limma's plotMDS function (Ritchie et al., 2015) with default settings.

| Genetic differentiation
Genome-wide estimates of F ST values between sample groups (Wai'ōpae vs. Kīholo, clade C vs. clade D, and clade C vs. clade D from Wai'ōpae only) were calculated from synonymous SNPs identified in the RNA-seq reads making up the default dataset, excluding GA-affected samples (n = 17). SNPs were called using the Samtools/ BCFtools pipeline (Li, 2011;Li et al., 2009), implementing Bowtie2 (Langmead & Salzberg, 2012) to map lightly trimmed reads with a minimum mapping quality score (MAPQ) of 20 to the coral host transcriptome reference. Preliminary SNP calls were subjected to the following filters: minor allele frequency of 0.1, maximum allowable missing data threshold of 0.95, and a minimum depth of 20. Of the resulting 46,307 SNPs, 12,287 were accepted as synonymous using SnpEff (Cingolani et al., 2012) by comparison to open reading frames identified with TransDecoder (Haas et al., 2013). Extracting only a single SNP per transcript by selecting the first occurrence in the longest isoform reduced this number to 2,524 (dataset is available from the Dryad Digital Repository: https://doi.org/10.5061/ dryad.db4r63m). Average pairwise F ST values between samples of different groups were calculated from these remaining SNPs according to Weir and Cockerham (1984), using the genet.dist function (method = "WC84") in hierfstat (Goudet, 2005). Confidence intervals were obtained from 1,000 bootstrap replicates with the boot. ppfst function.

| Symbiodinium clade and host genetic differentiation
We examined the composition of the Symbiodinium community in each sample by extracting ITS2 sequences from the meta-transcriptome assembly and individual dataset mapping. The two reference transcripts in the assembly were identified as belonging to clade C, type C31, and to clade D, type D1a (Symbiodinium trenchii). In 22 of 27 samples, raw reads only mapped to either the C31 or the D1a reference (Table 1; Supporting Information Table S1). The remaining five samples were characterized by a mix of both clades, with the more frequent clade contributing 67%-91% of the ITS2 reads.
The majority of samples, especially among those assigned to clade D, contained reads that did not perfectly match the reference (see Supporting Information Table S1). In these cases, the number of mismatched reads ranged from 7% to 44% of all mapped reads per sample (averages: clade C = 11%, clade D = 24%), and typically manifested as a single-nucleotide difference, or more rarely, as several differences at the 3′ end of the read. This suggests that most mismatched reads represented sequencing errors, intragenomic variants, or alternative genotypes, although we cannot exclude that some samples hosted alternative types of the same clade at low frequencies. Thus, the symbiont community in each colony appeared to be dominated by Symbiodinium from a single clade (C or D), and types C31 and D1a specifically.
Colonies from Wai'ōpae were equally likely to house either clade (C = 6, D = 6), while most colonies from Kīholo were associated with clade D (C = 1, D = 5). The association between clade composition and site was not significant (Yates's chi-square of independence, X 2 = 1.87, p-value = 0.17). Regarding GA, a higher prevalence of clade D was observed in healthy colonies (C = 2, D = 7) than in diseased colonies (C = 5, D = 4), although this association was also not significant (Yates's chi-square of independence, X 2 = 2.10, p-value = 0.15).
Clade composition did not differ between affected and unaffected tissue in the same colony, except in some cases where clade composition in affected tissue was observed to shift away from the more homogeneous state in unaffected tissue, resulting in a mixed composition (Table 1)  Supporting Information Figure S1). Gene expression profiles of samples associated with clade D appeared more uniform than those with clade C, which were more spread out. As expected based on a previous study (Frazier et al., 2017), GA disease status explained less of the variation in gene expression than site and clade, with the profiles of healthy and GA-diseased samples overlapping significantly.

| Differential gene expression
Notably, samples obtained from GA-affected and GA-unaffected tissue of the same colony were consistently found in close proximity to each other, and clustered separately from healthy samples, implying that GA-associated changes at the transcriptome level affect the whole colony. After removing outliers, pairwise comparisons of sample groups yielded a high number of differentially expressed genes (DEGs) at p < 0.05 in the default dataset. Consistent with the strong separation of gene expression profiles by site, the highest number of DEGs was found between Wai'ōpae and Kīholo, with 773 and 1,184 more highly expressed genes, respectively (total = 1,957; Figure 1b).
GO term enrichment analyses in the Biological Process category revealed only a few common, nonoverlapping gene functions to be overrepresented among these genes, including organic cyclic compound biosynthetic process (61 genes, p = 0.021) and DNA-templated regulation of transcription (47 genes, p = 0.045; Table 2). and "D" indicate only symbionts of one clade were detected, otherwise estimated relative frequencies for each clade are given (see also Supporting Information Table S1 for absolute read counts and library sizes). b "GA, affected" and "GA, unaffected" refers to lesioned and morphologically unaffected tissue sampled from GA-diseased colonies.
A moderate number of DEGs was also detected between samples housing different clades, with 167 and 106 genes expressed at higher levels in clade C and D samples, respectively (total = 273; Figure 1b).
In contrast, testing for DEGs between healthy and GA-diseased samples (GA-affected and GA-unaffected tissue combined) resulted in only 11 genes total. To examine the effects of Symbiodinium clade on host gene expression in more detail, we removed environmental effects by looking at a subset of data consisting only of samples from Wai'ōpae. This approach yielded 80 genes more highly expressed in clade C, and 74 in clade D hosts. GO term analysis of these 154 genes showed enrichment of several terms pertaining to metabolic processes, including DNA metabolic process (10 genes, p = 0.004) and protein phosphorylation (seven genes, p = 0.046), as well as DNA integration (seven genes, p = 0.006) and transport processes (total eight genes, p ≤ 0.003; Table 2). About half of these 154 genes

| Coexpression network analysis
Patterns of host gene expression, and how they relate to external factors including Symbiodinium clade, were also examined with a system genetics approach. In the default dataset, we detected 17 modules, or groups of coregulated genes, ranging in size from 38 to 5,302 genes (Supporting Information Figure S2). In addition, a sizable number of genes (n = 3,079) could not be assigned to any module. According to linear regression analysis, three and four of these modules were significantly correlated with site (R 2 = 0.28-0.51) and clade (R 2 = 0.28-0.63), respectively (Supporting Information Table   S2). No correlation was found between modules and GA disease status. Site-associated modules L1 and L4 were enriched in GO terms related to biosynthetic processes (314 genes, p < 0.001), gene expression/RNA metabolism (>200 genes, p < 0.001), and others (Table 3; L15 was excluded from this analysis due to its insufficient size). Clade-associated modules were examined in more detail in a second WGCNA analysis based on the Wai'ōpae dataset. Unbiased by field site-related effects, this analysis yielded a more fine-grained network consisting of 25 modules, ranging from 54 to 4,810 genes in size (Supporting Information Figure S3). Of these, four modules  (Table 3). Various metabolic processes were also identified for module M3, for example, protein metabolic process (60 genes, p = 0.003) and phosphorus metabolic process (41 genes, p < 0.001), as well as DNA-templated regulation of transcription (27 genes, p = 0.006) ( Table 3).
To further converge on individual host genes that might be particularly responsive to symbiont clade, we analyzed the connectivity of genes-that is, the extent to which their expression is correlated with that of other genes. Of the 100 most connected genes (by kTotal value), 77 were discovered to belong to clade-associated module M2. The remainder was assigned to module M1, which was not significantly correlated with clade. None of these 100 genes overlapped with the 154 between-clade DEGs (Supporting Information

| Symbiodinium composition
Our primary objective was to evaluate the effects of Symbiodinium types from different clades on host gene expression in M. capitata, and identify groups of genes potentially involved in the maintenance and regulation of different symbiotic relationships. We also considered two other variables-samples were collected from sites on two opposite sides of Hawai'i Island, and from colonies and tissues unequally affected by GA, a widespread coral disease we focused on in a previous publication (Frazier et al., 2017

| Host gene expression profiles
Across 27 RNA-seq datasets from 18 colonies, we detected distinctly different transcriptional responses to the three variables described above. According to overall transcriptional profiles ( Figure 1a They may also represent a phenotypic response (acclimatization), for example, through epigenetic effects or polyphenism. Conceivably, the environment at the two field sites differed sufficiently to induce both adaptation and acclimatization, especially in a sessile organism. For instance, as a typical fringing reef on Hawai'i Island's arid Western side, Kīholo is characterized by stable conditions with constant sea water circulation and little rainfall. In contrast, Waiʻōpae on the Eastern side received higher rainfall and groundwater influx and was defined by its complex underwater topography with partially submerged tide pools of different size and connectivity to the open ocean. Waiʻōpae was therefore subject to much higher fluctuations in temperature and salinity (Wiegner et al., 2016) than Kīholo, necessitating different short-and long-term physiological responses. The magnitude of gene expression differences between sites was comparable to those in a similar study, which reported about 3,000 DEGs between two sites (Barfield et al., 2018) in Australian A. millepora.
These genes showed enrichment in macromolecule biosynthesis and RNA processing functions, suggesting some overlap with our GO term analysis, which highlighted cyclic compound biosynthesis and transcription as potentially important functions in response to different environments (Table 2).

| Site-controlled clade effects
We found little to no overlap in DEGs obtained from comparisons based on site, clade, and GA disease (Figure 2), and in coexpression modules correlated with these variables (Supporting Information   Table S2). This suggests that gene expression responses to site, clade, and GA disease are largely independent of each other, despite the overrepresentation of clade D in Kīholo and healthy samples, respectively. Because the sampling site was found to be a significant factor and clade C was underrepresented at Kīholo (Figure 1a   host (Loram et al., 2007;Matthews et al., 2017). Symbiodinium types in clades C and D may also be adapted to different light regimes, and influence the vertical distribution pattern of their hosts (Iglesias-Prieto et al., 2004). Most notably, some clade D symbionts confer increased resistance to elevated temperatures to the host, and reduce its susceptibility to coral bleaching (Berkelmans & van Oppen, 2006;Mieog et al., 2009; but see e.g., Hume et al., 2015).
As the abundance of clade D symbionts has been increasing globally in habitats impacted by bleaching and across a wide range of host species, some are suspected to be opportunists capable of outcompeting and replacing optimally coadapted symbionts (e.g., in clade C) in health-compromised corals

| Key genes and coexpression networks
The magnitude of gene expression differences (e.g., low hundreds of genes) associated with variation in symbiont composition was similar to that reported from several other cnidarian host species.  host transcriptome, the effect size is not fully comparable to the present study. In contrast, Barfield et al. (2018) found more than 3,000 DEGs between corals associated with different symbionts, possibly because of this study's extra level of sensitivity due to its common garden design. Among the DEGs we identified in coral hosts harboring different Symbiodinium clades, genes involved in DNA metabolism and integration, protein phosphorylation, and transport processes were significantly over-represented (Table 2). Individual genes with the most significant expression differences between clades included genes encoding two transcription elongation factors, a putative but fragmentary Toll-like receptor (TLR), collagen and ankyrin repeat-containing proteins (the former possibly involved in Wnt signaling), and a pore-forming cytotoxin homologous to Delta actitoxin Aas1a. We detected few transcriptional differences in genes associated with the immune system and stress response (e.g., in lectins, additional TLRs, antioxidants)-differences which might be expected when comparing an optimally coadapted (clade C) and a more opportunistic symbiont (clade D), as observed by Matthews et al. (2017)  Consistent with the DEGs between sites, two of the site-dependent modules appeared to revolve around the production of macromolecules and organic cyclic compounds, as well as RNA metabolism.
Focusing on clade effects in the smaller dataset, two of the modules significantly correlated with clade (M6 and M17) did not produce meaningful GO term results likely due to their small size. However, some information could be gleaned from the 10 most highly connected genes in each module, which represent hub gene candidates that may influence the activity of a disproportionate number of genes in the network (Supporting Information Tables S3 and S4). In module M6, hub gene candidates included serine proteases, as well as genes potentially involved in DNA repair and cytoskeleton regulation (e.g., homologs of Mediator of DNA damage checkpoint protein 1, spectrins). Potential hub genes in module M17 comprised a homolog of S-adenosylmethionine synthase, several aminoacyl tRNA synthases, and ABC transporters, suggesting a module function in translation, membrane trafficking, and possibly immune response.
The two larger of the significant modules, M2 and M3, showed enrichment in GO terms mostly pertaining to metabolic processes, as well as to a lesser extent, signal transduction and regulation of transcription (Table 3). Both modules featured a functionally diverse cast of potential hub genes. For instance, M2 hub gene candidates may be involved in endocytosis (Adaptor protein complex 2), RNA processing (e.g., La protein, Polypyrimidine tract binding protein 1), and signal transduction (e.g., Cyclic nucleotide phosphodiesterase, and a putative G protein), among others. In the case of M3, potential hub genes seemed to mostly play a role in signaling pathways and cytoskeleton regulation, comprising several kinases, putative kinase receptors, and others (e.g., STK, Ras and Zap70 homologs, Ankyrin 2, to focus on the most reliably annotated ones). Notably, module M2 also featured a disproportionate number of highly connected genes, since the majority of these genes in the entire transcriptome (top 100 by kTotal) fell into this group (while kTotal is correlated with the number of genes per module, M1 contained more genes in total but only a minority of highly connected genes). By combining DGE and coexpression network analyses, we discovered that none of the top overall connected genes overlapped with the 154 DEGs between clades, suggesting that clade-associated genes are not central regulators in the host transcriptome. On the other hand, most DEGs (70%) were found within the modules significantly correlated with clade. Coexpression network analyses therefore pointed to more extensive, subtle changes in biological processes between hosts colonized by different clades than was observable directly through DGE analysis. In summary, these processes appeared to revolve around metabolic changes, transcription and translation, and cell signaling, which are consistent with previous observations in similar systems (DeSalvo et al., 2010;Matthews et al., 2017). However, it is important to note that the usefulness of homology-based gene annotation and GO term analysis remains limited in nonmodel organisms, and typically rely on a small number of genes. In the present study, we were only able to annotate a quarter of all transcribed host genes with Swiss-Prot homologs, and to identify Pfam domains (on which GO annotations were based) in three-quarters. Sequence homology may not be a reliable predictor of functional homology across greater phylogenetic distances, and genes that cannot be annotated may represent taxonomically restricted or highly divergent genes with important functional roles (Tautz & Domazet-Lošo, 2011;Wissler, Gadau, Simola, Helmkampf, & Bornberg-Bauer, 2013). These concerns appeared to be particularly pronounced in the site-dependent DEGs. Despite their high number, these genes were characterized by only four main GO terms, three of which were highly similar despite efforts to reduce term redundancy (Table 2). In addition, only four out of the 10 most significant DEGs could be functionally annotated based on sequence homology, indicating the rest may be taxonomically restricted. Such genes without homology to known genes have been linked to species-specific adaptations in response to changing environments (Colbourne et al., 2011;Voolstra et al., 2011). It is thus conceivable taxonomically restricted genes are over-represented in DGEs from two sites characterized by different environmental conditions, although this possibility requires further analysis.

| CON CLUS IONS
Unraveling the complexity of host-symbiont interactions at the molecular level remains an important goal in the life sciences. In this study, we investigated transcriptome-wide differences in host gene expression associated with two dominant types of Symbiodinium clade C and D in the coral M. capitata, which is of particular interest considering some clade D symbionts may convey higher resistance to elevated water temperatures and are globally increasing in prevalence. In our study population, we found that association with predominantly clade C or clade D types was correlated with measurable changes in host gene expression profiles, a moderate number of DEGs, and changes in the activity of coexpressed gene networks.
However, considering the differences in physiology and performance between coral colonies harboring different types, in particular of clades C and D, the clade-dependent effects appeared unexpectedly limited in scope. This was especially evident in comparison with the much more pronounced effects of environmental factors, which we explored through the inclusion of different field sites. Notably, we did not uncover large-scale transcriptional changes pertaining to the immune or stress response, although increased activity of these systems might be expected from hosts confronted with less optimally coadapted symbionts of clade D Symbiodinium. This raises the question whether the types investigated here indeed differ significantly with regards to their role along the mutualism-parasitism spectrum, as has been hypothesized for clades C and D .
The clade-based effects we observed, however, were consistent with previous studies, and involved changes in metabolic processes, as well as more subtle variation within several other systems, including the regulation of transcription and translation, cell signaling, and membrane trafficking. Differences in metabolic pathways and transport may be linked to differences in symbiont-host nutrient transfer (Loram et al., 2007), but to what extent transcriptional changes translate into metabolic changes remains unclear until further experiments (compare Matthews et al., 2017). Our experimental setup was also not designed to determine to what extent the correlation between host gene expression and symbiont type is caused by the host genomic background, the host responding to symbiont type, or more intriguingly, cross-kingdom transcriptional modulation of the host by Symbiodinium. A promising avenue of research to this end would be to investigate the role of symbiont regulatory RNAs. Such an approach could further illuminate where the relationships between clade D symbionts and their hosts are located on the spectrum from mutualism to parasitism, which has important long-term implications for coral reef health and conservation . By providing a candidate list of clade-associated genes in M. capitata, we hope to motivate the further characterization of these genes, which all too often remain only incompletely annotated-after all, identifying and comparing unknown candidate genes that are found consistently in nonmodel organisms has the potential to make unexpected contributions to our understanding of host-symbiont interactions.

ACK N OWLED G M ENTS
We would like to acknowledge John Burns, Ben Clark, Elizabeth Clemens, Dan Jennings-Kam, Lauren Kapono, Keo Lopes, Kanoelani Steward, and Julia Stewart for sample collection, and Scott Geib for help in the laboratory and with data processing.
We are grateful for many years of fruitful work at the Wai'ōpae tidepools, which were destroyed on June 4 , 2018 by a lavaflow during the recent eruption of Kīlauea. Gary Banks and Sachie Ohia, whose home was also destroyed during this event, were invaluable by providing a home base during field work. Finally, we would like to thank three anonymous reviewers whose comments led to important improvements in this manuscript. This work was supported by the United States National Science

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N S
MT, MF, and MH designed the research. MH, MF, and RB performed the experiments and analyzed the data. MH wrote the manuscript.
All authors critically revised and approved the final manuscript.