The draft genome of Actinia tenebrosa reveals insights into toxin evolution

Abstract Sea anemones have a wide array of toxic compounds (peptide toxins found in their venom) which have potential uses as therapeutics. To date, the majority of studies characterizing toxins in sea anemones have been restricted to species from the superfamily, Actinioidea. No highly complete draft genomes are currently available for this superfamily, however, highlighting our limited understanding of the genes encoding toxins in this important group. Here we have sequenced, assembled, and annotated a draft genome for Actinia tenebrosa. The genome is estimated to be approximately 255 megabases, with 31,556 protein‐coding genes. Quality metrics revealed that this draft genome matches the quality and completeness of other model cnidarian genomes, including Nematostella, Hydra, and Acropora. Phylogenomic analyses revealed strong conservation of the Cnidaria and Hexacorallia core‐gene set. However, we found that lineage‐specific gene families have undergone significant expansion events compared with shared gene families. Enrichment analysis performed for both gene ontologies, and protein domains revealed that genes encoding toxins contribute to a significant proportion of the lineage‐specific genes and gene families. The results make clear that the draft genome of A. tenebrosa will provide insight into the evolution of toxins and lineage‐specific genes, and provide an important resource for the discovery of novel biological compounds.

Recent studies have revealed a high frequency of cnidarian-specific genes is enriched within the cnidocyte (Sebé-Pedrós et al., 2018;Sunagar et al., 2018). Many of these cnidarian-specific genes expressed in the cnidocytes encode for toxin peptides Sebé-Pedrós et al., 2018). This highlights that cnidarians possess both morphological and biochemical novelties and that the evolution of these innovations may be related. This is consistent with evidence that acrorhagin 1 and acrorhagin 2 are novel toxincoding genes which are localized to the acrorhagi, a morphological structure used for envenomation that is unique to sea anemones from Actinioidea (Honma et al., 2005;Macrander, Brugler, & Daly, 2015).
Indeed, understanding the evolution of venom and its delivery in cnidarians can provide insights into the innovation of morphological and biochemical novelties. While the majority of cnidarian toxin research has focussed on sea anemones from the Actinioidea superfamily (Prentis et al., 2018), no highly complete sequenced genomes for members of this superfamily currently exist (Urbarova et al., 2018).
This lack of genomic resources limits our collective ability to understand the phylogenetic and molecular evolutionary histories of toxinencoding genes within this superfamily. Such a resource would provide an excellent model to investigate the evolution of novel morphological and cellular structures, and their relationship with novel genes.
Moreover, genetic innovations restricted to Actinioidea are found to be enriched for functions related to venom and its delivery. The suite of toxin and toxin-like (TTL) genes identified in A. tenebrosa reveal an abundance of gene families evolving through lineage-specific duplications and, in some cases, concerted evolution. This study shows that gene duplication and divergent selective pressures have shaped the genetic variation in genes encoding toxins in actiniarians.

| Sample preparation, sequencing, and assembly
Samples of A. tenebrosa were collected from the intertidal zone at Coolum, (QLD, Australia). Tissue from a single individual was used to extract high-quality gDNA using the E.Z.N.A. Mollusc DNA Kit (Omega Bio-Tek;Stefanik, Wolenski, Friedman, Gilmore, & Finnerty, 2013). Extracted gDNA was used to construct four paired-end (PE) libraries sequenced on Illumina 2500 HiSeq platform using multiple insert sizes (170, 500, 2,000, 5,000 bp) with a read length of 100 bp (NCBI BioProject PRJNA505921). Sequencing resulted in over 150 million PE reads per library, with over 96% being high-quality (Q > 30, [N(ambiguous bases) < 1%]). Contiguous sequences were generated and scaffolded using a manual operation of ALLPATH-LG (Butler et al., 2008) with a focus on removing redundant sequences.
The presence of the complete mitochondrial genome of A. tenebrosa in the draft genome was investigated. Assembled contigs were queried using BLASTN against a database which consisted of the complete mitochondrial genome of A. equina. Contigs receiving a significant hit (e value 1e −05 ) were imported into Geneious 9.1.6 and aligned using a global alignment with free end gap and 100% identity. This resolved a single sequence, of 20,691 bp, and was aligned to the complete mitochondrial genome of A. equina using eight iterations of MUSCLE. Gene order and annotation of the mitochondrial genome of A. tenebrosa were performed as per Wilding and Weedall (2019).
All repeat models were curated to remove models putatively part of protein-coding genes. Any models confidently annotated by LTR_retriever or RepeatModeler (i.e., not classified as "Unknown") were removed from consideration as they are not likely to be part of protein-coding genes. Open reading frames from the remaining repeat models were extracted and examined using HMMER 3.1b2 (Eddy, 2011) to identify models that only contained domains associated with transposable elements. For this purpose, we collated a list of transposon-associated domains which primarily consisted of domains identified by Piriyapongsa, Rutledge, Patel, Borodovsky, and Jordan (2007) with additional Pfam (Finn et al., 2014) and NCBI CDD (Marchler-Bauer et al., 2015) domains included on the basis of manual inspection of domain prediction results for putative transposable elements. Repeat models that contained a TE-associated domain prediction were removed from consideration and assumed to be truepositives. A custom database of known genes was created to enable BLAST comparison of remaining repeat models and subsequent removal of false predictions from protein-coding genes. The database includes the UniProtKB/Swiss-Prot proteins as well as the gene models of Nematostella vectensis (v.2.0;Putnam et al., 2007;Schwaiger et al., 2014), Exaiptasia pallida (v.1.1; Baumgarten et al., 2015), Acropora digitifera (v.0.9; Shinzato et al., 2011), and Hydra vulgaris (Chapman et al., 2010). This database had probable transposons removed via the same process detailed above using HMMER 3.1b2 and domain organization. Any remaining repeat models were removed from the initial custom repeat library (CRL) if they had significant BLASTX hits (e value < 1e −02 ) when queried against the gene model database. The final curated CRL was used to soft-mask the A. tenebrosa genome using RepeatMasker (-e ncbi -s -nolow -no_is -norna -xsmall) for later gene prediction. Scripts were produced to automate this process and are available from https ://github.com/zkste wart/Genome_analy sis_scrip ts/tree/maste r/repeat_pipel ine_scripts.

| Gene model prediction and annotation
Following the masking of repeat regions, gene models were predicted using ab initio methods guided by transcriptional expression. These reads included the Red and Brown ecotypes obtained from NCBI (Bioproject PRJNA313244; van der Burg, Prentis, Surm, & Pavasovic, 2016). Raw reads were quality trimmed using Trimmomatic (Bolger, Lohse, & Usadel, 2014) with parameters used by the Trinity de novo assembler (Haas et al., 2013;MacManes, 2014). Trimmed sequences were aligned against the genome using STAR 2.5 (Dobin et al., 2013) using the two-pass procedure for the de novo identification of transcription splice sites. The SAM file produced by STAR was converted to BAM format and sorted using samtools v.1.5 (Li et al., 2009). Gene models were predicted by BRAKER1 v1.11 (Hoff, Lange, Lomsadze, Borodovsky, & Stanke, 2016) using the soft-masked genome assembly and the STAR alignment file as inputs. The completeness of the proteincoding genes was then assessed using BUSCO (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015;Waterhouse et al., 2018). Gene models were annotated by querying models against the Uniclust90 database (Mirdita et al., 2017) using MMseqs2 with an e value < 1e −05 . Gene Ontology (GO) terms associated with the representative UniProtKB sequence for each Uniclust90 hit were attributed to the A. tenebrosa gene model using the idmapping_selected.tab file provided by UniProtKB.
A total of 1,314 SCO were identified and aligned using clustal-omega (Sievers et al., 2011). The alignments were the concatenated, and the best evolutionary mode protein model (JTT+F+I+G4) was determined. Finally, a maximum-likelihood tree with 1,000 ultrafast bootstrap replicates was generated using IQ-TREE (Nguyen, Schmidt, von Haeseler, & Minh, 2015).
Following the generation of a cnidarian species tree, the gain and loss of gene families across Cnidaria were inferred using the DOLLOP program from the PHYLIP package version 3.696 (Felsenstein, 1989; http://evolu tion.genet ics.washi ngton.edu/phylip.html). The species tree and a presence/absence matrix of gene families were imported into the DOLLOP program. The most parsimonious evolutionary scenario for the gain and loss of gene families was estimated using Dollo's parsimony law, which assumes genes arise once on the evolutionary tree and can be lost independently in different evolutionary lineages (Farris, 1977). The predicted proteomes from cnidarian species with sequenced genomes were used to investigate the evolution of protein domains. Protein domains were predicted using HMMER 3.1b2 against the Pfam database (e value < 1e −05 ), the best hit was retained, and overlapping domains were removed. A Fisher exact test was performed to determine Pfam enrichment with p-value of .05. The presence of toxin and toxin-like (TTL) genes was investigated in A. tenebrosa. The TTL genes were identified as per Surm et al. (2019). Briefly, BLASTP was performed against the against the manually curated Swiss-Prot database (e value < 1e −05 ). Significant queries with top BLAST annotations from sequences in the Tox-Prot database (Jungo & Bairoch, 2005) were considered candidate proteins. Candidate proteins were further interrogated and required to contain a signal peptide identified using SignalP (Petersen, Brunak, Heijne, & Nielsen, 2011). aligned to sequences used in previous studies (Alieva et al., 2008;Ikmi & Gibson, 2010).

| Genome assembly
Using a whole-genome shotgun strategy, we sequenced and assembled the genome of A. tenebrosa. A total of 1.2 billion paired-end reads, with a length of 100 bp, were sequenced across four different insert size libraries (170, 500, 2,000, and 5,000 bp; Table S1).
Raw reads were used to assemble the A. tenebrosa genome using ALLPATHS-LG. The genome size of A. tenebrosa is estimated to be ~255 Mbp (Table 1). The draft genome assembled is of similar quality to other cnidarian genomes (Table 1). Although the assembly resulted in the scaffold and contig N50 lower than other cnidarian genomes, the predicted genome completeness using metazoan Augustus gene models is among the highest (89.6%) for cnidarian genomes, with only N. vectensis having a more complete assembly (91.6%). The assembly contains ~19% repetitive DNA sequences, which is similar to reported values for other cnidarians (Tables 1 and S2).

| Functional annotation of predicted gene models
The ab initio gene model prediction identified 31,556 protein-coding genes in A. tenebrosa. All gene models were validated, receiving significant BLAST hits against multiple A. tenebrosa transcriptomes (van der Burg et al., 2016). Our ab initio gene model prediction was highly complete compared with other cnidarian genomes, increasing the previous BUSCO score to 94.6% (Table 1). Only E. pallida gene models were more complete (94.7%). Of the 31,556 proteincoding genes, 19,022 and 25,478 returned a significant BLAST hit (e value 1e −05 ) against the Swiss-prot and TREMBL database, respectively. This highlights that ~80% of the predicted proteome shares sequence similarity to known protein sequences, with ~20% having no similarity to other proteins. In contrast, only 6.56% of E. pallida predicted proteome returned no hits to known proteins at this stringency. However, other cnidarian genomes returned similar levels of novelty, with Discosoma sp. having 16.17% of proteins returning no hits. The annotation of protein domains revealed 19,056 (~60%) gene models contained identifiable Pfam domains. This is less than other sea anemone genomes, with 78.64% and 68.35% of E. pallida and N. vectensis gene models having a protein domain, respectively. Additionally, both corallimorpharians genomes reported less than 60% of gene models to encode proteins with known protein domains. Taken together, these results highlight that the draft genome of A. tenebrosa is mostly complete, yet a significant proportion of its genes are unique (Table 2).
Our assembly also resolved the complete mitochondrial genome for A. tenebrosa (GenBank accession MK291977), shown to be 20,691 bp long ( Figure S1).

| Gene family evolution
Manual curation and a phylogenomic characterization of seven Cnidarian species drove our investigation of Cnidarian gene turnover. Using 1,314 genes, we built a representative cnidarian species tree from all seven genomes (Figure 1). This species tree confirmed TA B L E 2 Functional annotation of gene models from seven cnidarian species  tively (Castañeda et al., 1995).
With evidence supporting that genetic innovations in the genome of A. tenebrosa are related to venom, we further investigated its total and toxin-like gene (TTL) complement. Overall, we identified 113 TTL gene families in A. tenebrosa (Table S4). Manual curation of TTL genes revealed that sea anemone type 3 (BDS-LIKE) KTx family is the most highly expanded TTL gene family (15 copies, 11 of which are full-length sequences). A phylogeny of sea anemone type 3 (BDS-LIKE) KTx was generated from these full-length sequences (Figure 4), as well as functionally characterized sequences from other sea anemones (Jouiaei, Sunagar, et al., 2015;. The 11 A. tenebrosa sequences clustered into four distinct clades, one of which includes only A. tenebrosa paralogs (Clade A). This suggests a process of concerted-like evolution. Investigation into the genomic localization of the 11 A. tenebrosa sequences revealed no presence of tandem duplication, a common mechanism observed during concerted evolution. Furthermore, the sequence identity among A. tenebrosa sea anemone type 3 (BDS-LIKE) KTx paralogs is highly divergent at 34.2% and 43.5% at the nucleotide and protein level, respectively.
The sea anemone sodium channel inhibitory toxin family (NaTx) has also previously been shown to evolve via concerted evolution in multiple different sea anemone species (Moran et al., 2008). Here we generated a phylogeny for the NaTx gene family ( Figure 5), using sequences from a previously published alignment Investigating the phylogenetic histories of cnidarian MACPF gene family also revealed evidence of concerted-like evolution ( Figure 6).
This included two paralogs of A. tenebrosa MACPF sequences clustering together. Genomic localization further revealed these sequences evolved through tandem duplication ( Figure S2). Evidence of concerted evolution was also revealed with A. tenebrosa MACPF sequences being highly homogenous, sharing 94.3% and 92.8% similarly at the nucleotide and protein level, respectively. In fact, clustering of paralogs of MACPF was observed in the majority of the anthozoan genomes investigated, including all sea anemones. Multiple tandem duplications were observed in E. pallida; however, this was not consistent in all sea anemones with N. vectensis paralogs not adjacent to each other in the genome. We also found evidence of concerted-like evolution in the sea anemone type 1 KTx family (Figure 7). While we did not observe this for A. tenebrosa paralogs, this process was observed for A. viridis paralogs.
While concerted-like evolution appears to be a consistent pattern of TTL genes families in sea anemones, similar pattern is also observed broadly in cnidarians florescent protein (FP) family ( Figure   F

| Selection patterns on toxin gene families
In this study, we further explored the evolutionary histories of TTL gene families to provide insights into the selective pressures acting on them (Table S5) For N. vectensis, no evidence of positive selection could be inferred among the highly homogenous Nv1 sequences ( Figure 5); however, F I G U R E 4 Maximum-likelihood tree with midpoint root depicting relationships among sea anemone type 3 (BDS-LIKE) KTx coding sequences. Bootstrap values after 1,000 iterations are shown next to nodes, values under 70% not reported. The GenBank accession numbers for the protein-coding gene used in this phylogenetic analysis are described in Sunagar and Moran (2015). A corresponding bar plot is provided which shows the computed dN/dS value for orthologs and paralogs. 1 = Actiniaria orthologs, 2 = Anemonia viridis paralogs, 3 = Actinia tenebrosa paralogs, 4 = Bunodosoma granulifera paralogs, 5 = Actinia tenebrosa Clade A paralogs (Clade A), and 6 = Actinia tenebrosa paralogs diverged we also have evidence of concerted-like evolution for the FP and MACPF gene families; however, we did not observe any paralogs escaping this process (Figures 6 and S3).

| D ISCUSS I ON
Here we present a draft genome assembly and annotation of A. tenebrosa. This complete draft assembly is the first from any species of the superfamily Actinioidea. Overall, the assembly was of F I G U R E 5 Maximum-likelihood tree with midpoint root depicting relationships among NaTx coding sequences. Bootstrap values after 1,000 iterations are shown next to nodes, values under 70% not reported. The GenBank accession numbers for the protein-coding gene used in this phylogenetic analysis are described in Sunagar and Moran (2015) and Sachkova et al. (2019). A corresponding bar plot is provided which shows the computed dN/dS value for orthologs and paralogs. 1 = Actiniaria orthologs, 2 = Actinia orthologs, 3 = Nematostella vectensis paralogs, 4 = Actinia equina paralogs, 5 = Actinia tenebrosa paralogs, 6 = Anemonia viridis paralogs, 7 = Nv1 paralogs, 8 = Av2 paralogs, 9 = Nematostella vectensis paralogs diverged, and 10 = Anemonia viridis paralogs diverged    (Baumgarten et al., 2015;Chapman et al., 2010;Putnam et al., 2007;Shinzato et al., 2011;Wang et al., 2017), verifying its suitability for comparative genomic studies. Insights into the evolution of gene families across Cnidaria revealed significant conservation among anthozoan species, with the many gene families gained in either the last common ancestor of Cnidaria or Anthozoa. Notably, all anthozoans used in this study are from Hexacorallia, highlighting a high conservation of gene families shared among this subclass. This is consistent with previous studies that have suggested that this shared gene set plays an important role in the evolution of traits essential to Hexacorallia taxa, including symbiosis with dinoflagellates, stress response, and delivery of venom (Baumgarten et al., 2015;Rachamim et al., 2015;Wang et al., 2017).
The A. tenebrosa genome is the most gene dense among cnidarians, with only E. pallida having a smaller genome and only A. digitifera having more protein-coding genes. However, flow cytometry revealed that the genome size of A. equina is larger (~520 Mb) than that predicted here ~255 Mb (Adachi, Miyake, Kuramochi, Mizusawa, & Okumura, 2017). One hypothesis for the discrepancy observed between estimated genome sizes may be associated with repeat regions that have not been fully captured in our assembly. The A. tenebrosa genome also contained a higher proportion of lineagespecific genes compared with other cnidarian genomes. Previous studies have identified this pattern in species from the superfamily Actinioidea, particularly those genes that encode for peptide toxins (Prentis et al., 2018;Surm et al., 2019). It is shown that there is relatively little overlap of toxin genes among cnidarian species and that a high proportion are restricted to specific lineages (Rachamim et al., 2015;Surm et al., 2019). In addition, many lineage-specific toxins from A. tenebrosa have expression restricted to acrorhagi, a novel structure used for envenomation. These data support the hypothesis that novel genes are expressed in novel morphological structures. Evidence in support of this hypothesis in other cnidarian species is equivocal. For example, although Nematostella-specific genes comprise a significant proportion of genes expressed in the nematostome, a novel structure only found in this genus, many of these genes were also expressed in tissues common to all sea anemone species (Babonis, Martindale, & Ryan, 2016).
The origin of new genes is considered to be an important source of evolutionary novelty, by providing the substrate upon which natural selection can act. New genes may be formed through multiple processes, ranging from gene duplication through exon shuffling to de novo gene formation (Kaessmann, 2010;McLysaght & Hurst, 2016;Tautz & Domazet-Lošo, 2011). Genes created through these processes produce copies of a gene that are identical to the ancestral sequence or generate genes with novel sequences that are restricted to specific lineages (Capra, Pollard, & Singh, 2010). Here, we have revealed that lineage-specific gene families undergo increased rates of gene duplication compared with gene families shared among cnidarian orders. This suggests that following the formation of new genes in cnidarian taxa, repeated duplication events occur.
However, this also suggests that few new genes arise through de novo gene evolution in cnidarians, as genes generated through this mechanism have been reported to undergo limited gene duplications (Schlötterer, 2015).
Sea anemones, and in particular species from the superfamily Actinioidea, are an important group used to understand the evolu- We propose that the major contributor to the evolution of new genes in cnidarians is through a process of gene duplication.

Significant expansions of neurotoxins are observed in A. tenebrosa.
This was evident from the increased copy number of Pfam domains (ShK and Defensin_4) which are associated with neurotoxins that modulate potassium ion channels. The Defensin_4 domain is associated with the sea anemone type 3 (BDS-LIKE) potassium channel toxin family, and both the gene family and protein domain are restricted to Actinioidea (Diochot, Schweitz, Béress, & Lazdunski, 1998). This is supported by evidence of the sea anemone type 3 (BDS-LIKE) potassium channel toxin family identified to be gained in the 947 Actinioidean-specific gene families ( Figure 1). Furthermore, sea anemone type 3 (BDS-LIKE) KTx appears to be the most highly duplicated toxin-encoding gene in A. tenebrosa.
In this study, we explored the selective pressures acting on orthologs and paralogs in TTL gene families to investigate the adapta- can be driven through a process of concerted evolution. Concerted evolution is the homogenization of paralogs that results in sequence similarity greater within species compared to between species (Liao, 1999;Nei & Rooney, 2005). This homogenization is typically attributed to gene conversion or unequal-crossing over (Brown, Wensink, & Jordan, 1972;Eickbush & Eickbush, 2007;Szostak & Wu, 1980). Here we observe concerted-like evolution in multiple TTL gene families including sea anemone types 1 and 3 (BDS-LIKE) KTx, NaTx, MACPF, and the nontoxin gene family FP. Whether the concerted-like evolution observed is through lineage-specific duplications or concerted evolution remains elusive.
Concerted evolution of a sea anemone toxin gene family has previously been reported in multiple species (Moran, Genikhovich, et al., 2012;Moran et al., 2008). Nv1, a member of NaTx TTL gene family, is the major adult venom component in N. vectensis Nv1 has evolved via concerted evolution. This is supported by the evidence of Nv1 copies being encoded by a cluster of at least 12 highly conserved sequences (Moran, Genikhovich, et al., 2012;Moran et al., 2008). This is further supported in the NaTx phylogeny we generated in this study, with Nv1 sequences clustering strongly together ( Figure 5). From our selection analyses, we could not infer that these highly homogenous Nv1 sequences are evolving under diversifying selection. Divergently, the N. vectensis paralogs that escaped this homogenization are inferred to be evolving under diversifying selection, which consists of Nv3-8. Recent experimental evidence supports the adaptive evolution of these paralog escaping the process of concerted evolution (Sachkova et al., 2019). This is evident with Nv4 and Nv5 paralogs expression being mostly restricted to early life stages compared with Nv1, suggesting neofunctionalization or subfunctionalization. The Nv4 and Nv5 paralogs also exhibit divergent activity being highly toxic to fish, compared with Nv1 which has greater activity against arthropods (Sachkova et al., 2019). Indeed, a similar pattern is also observed in A. viridis NaTx paralogs with the Av2 copies being highly similar and other copies escaping this homogenization. These escaped paralogs are also inferred to be evolving under diversifying selection. While evidence supports that both Nv1 and Av2 are evolving through a process of gene conversion or unequal-crossing over consistent with concerted evolution (Moran et al., 2008), the escaped paralogs, however, may have resulted from lineage-specific duplications (Sachkova et al., 2019).
Overall, our phylogenetic analyses provide repeated evidence of paralogs clustering closer together than orthologs for multiple TTL gene families in cnidarians. Whether this occurs through a process of concerted evolution (gene conversion or unequal-crossing over) or lineage-specific duplications is unclear, especially given that a combination of both processes may be occurring in parallel. We propose that concerted evolution is an important process in the evolution of ancient actiniarian venom, occurring in gene families recruited into the venom of at least last common ancestor. Subsequently, lineage-specific duplications allow paralogs to escape the homogenizing process associated with concerted evolution, with selection driving these new duplicates to undergo neofunctionalization or subfunctionalization.
In venomous animals, biochemical and morphological innovations result in phenotypic adaptations, such as toxin peptides and an envenomation system. Although the cnidarian envenomation system is largely conserved across this phylum, our analysis revealed duplication events in gene families enriched in A. tenebrosa include many nematocyte-related proteins such as toxin peptides. We propose that the genome sequence of A. tenebrosa will aid future research to improve our understanding of Actinioidean innovations involved in venom production and its delivery.

ACK N OWLED G M ENTS
The authors would like to thank the Evolutionary and Physiological

CO N FLI C T O F I NTE R E S T
We declare there are no conflicts of interest. Peter J. Prentis https://orcid.org/0000-0001-6587-8875