Gene expression profiling during the embryo‐to‐larva transition in the giant red sea urchin Mesocentrotus franciscanus

Abstract In echinoderms, major morphological transitions during early development are attributed to different genetic interactions and changes in global expression patterns that shape the regulatory program for the specification of embryonic territories. In order more thoroughly to understand these biological and molecular processes, we examined the transcriptome structure and expression profiles during the embryo‐to‐larva transition of a keystone species, the giant red sea urchin Mesocentrotus franciscanus. Using a de novo assembly approach, we obtained 176,885 transcripts from which 60,439 (34%) had significant alignments to known proteins. From these transcripts, ~80% were functionally annotated allowing the identification of ~2,600 functional, structural, and regulatory genes involved in developmental process. Analysis of expression profiles between gastrula and pluteus stages of M. franciscanus revealed 791 differentially expressed genes with 251 GO overrepresented terms. For gastrula, up‐regulated GO terms were mainly linked to cell differentiation and signal transduction involved in cell cycle checkpoints. In the pluteus stage, major GO terms were associated with phosphoprotein phosphatase activity, muscle contraction, and olfactory behavior, among others. Our evolutionary comparative analysis revealed that several of these genes and functional pathways are highly conserved among echinoids, holothuroids, and ophiuroids.

From an ecological and evolutionary perspective, the process of development could be considered as a balancing act in an unpredictable world. It is often remarkably resilient, producing consistent phenotypes in the face of new mutations and environmental perturbations, but it is also adaptable, allowing for the evolution of novel phenotypic traits in response to environmental change (Garfield et al., 2013;Sultan, 2007) For example in sea urchins with planktotrophic life-history mode, larvae are developmentally plastic in response to changes in environmental conditions (Byrne, Lamare, Winter, Dworjanyn, & Uthicke, 2013;Padilla-Gamiño, Kelly, Evans, & Hofmann, 2013;Yu et al., 2013) and food availability (Adams, Sewell, Angerer, & Angerer, 2011;Carrier, King, & Coffman, 2015;McAlister, 2008), which allow them to regulate the rate of growth and skeletal formation. Such developmental adjustments are achieved mostly through variation in the expression of regulatory genes (Carrier et al., 2015;Runcie et al., 2012). This kind of variation has been documented as an intrinsic characteristic of natural populations of sea urchins (e.g., Strongylocentrotus purpuratus), in which changes in timing or level of gene expression are well buffered during early development by on/ off switch-like regulatory mechanisms, while during later development they have a greater impact on the expression of downstream target genes and on morphology (Garfield et al., 2013).
Most of the efforts to understand GRN architectures and changes in the expression of regulatory and structural genes during indirect development have used the purple sea urchin S. purpuratus as a study system (Garfield et al., 2013;Hammond & Hofmann, 2012;e.g., Hinman & Davidson, 2003a;Rafiq, Cheers, & Ettensohn, 2012;Rafiq, Shashikant, McManus, & Ettensohn, 2014;Runcie et al., 2012;Tu, Cameron, & Worley, 2012). However, thanks to the advances in nextgeneration sequencing (NGS) technologies, the ability to sequence the entire transcriptome of a given tissue or life-history stage in a matter of days is providing new opportunities to explore the complexity of developmental GRNs in other echinoderms (Delroisse, Ortega-Martinez, Dupont, Mallefet, & Flammang, 2015;Dilly, Gaitán-Espitia, & Hofmann, 2015;Dylus et al., 2016;Gildor, Malik, Sher, Avraham, & Ben-Tabou de-Leon, 2015;Tulin, Aguiar, Istrail, & Smith, 2013) and marine invertebrates (Jackson & Degnan, 2016;Layden, Rentzsch, & Röttinger, 2016). As changes in gene expression may underlie many of the phenotypic differences between species (Brawand et al., 2011), studying transcriptomic divergence of sympatric species may shed light upon the initial genetic targets of natural selection in speciation events (Filteau, Pavey, St-Cyr, & Bernatchez, 2013). Under this context, here we examine the transcriptome structure and the expression of genes during the embryo-to-larva transition of the giant red sea urchin Mesocentrotus franciscanus (syn. Strongylocentrotus franciscanus), exploring questions regarding the level of conservation of homologous genes and functional pathways. This sea urchin is one of the most distantly related strongylocentrotid species yet lives in sympatry with S. purpuratus (Lee, 2003), and the development, survival, and settlement of its planktotrophic larvae are affected by changes in environmental conditions and by large-scale oceanographic processes associated with "El Niño" (Dewees, 2003;Ebert, Schroeter, Dixon, & Kalvass, 1994). Therefore, assessing the developmental genetic background of early ontogenetic stages of M. franciscanus would provide valuable genomic and molecular tools to study not just the evolution of GRNs in echinoderms, but also the plastic and adaptive capacity of these organisms to respond to the rapid environmental change projected for the future in one of the most productive marine regions in the world, the California Current Large Marine Ecosystem (Hofmann et al., 2014).

| Ethics statement
This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the Comisión Nacional de Investigación Científica y Tecnológica de Chile (CONICYT). The protocol was approved by the Committee on the Ethics of Animal Experiments of the University of California, Santa Barbara.

| Animal collection, fertilization, and larval culturing
Mature adults of the giant red sea urchin (Mesocentrotus franciscanus) were collected by SCUBA divers near Goleta Pier in the Santa Barbara Channel, California, USA (34°24′N, 119°49′W) in June 2013. Animals were maintained in a flowing seawater system under ambient conditions (~14°C, 32%o and pH ~ 8.0) at the University of California, Santa Barbara. Gametes were obtained by intracoelomic injection of 0.5 mol/L KCl. The eggs were resuspended in 0.35μm-filtered, UV-sterilized seawater (FSW) at ambient temperature and pCO 2 .
Sperm was collected dry and kept on ice until dilution on FSW prior to fertilization. Nine families were established using single dam-sire crosses (the sperm from one male was used to fertilize the eggs of a single female). After fertilization, embryos from each family were equally (~20 embryos/ml) and randomly distributed in plastic buckets. Development was tracked by recording the proportion of embryos to reach gastrula and early pluteus stages (~29 hr and 82 hr, respectively). At these ontogenetic stages, larvae from each bucket were collected in a small water volume using a reverse-filtration siphon and transferred to 1.5-ml Eppendorf tubes. These samples were then centrifuged at low speed to pellet the larvae, removing the excess water and adding 1 ml of TRIzol Reagent (Invitrogen, Carlsbad, CA, USA).
Samples were quickly frozen in liquid nitrogen and stored at −80°C for subsequent analysis.

| RNA extraction, cDNA library construction, and sequencing
Total RNA was extracted from each larval culture (~30,000 larvae) at gastrula and early pluteus stages (18 samples) following the guanidine isothiocyanate method (Chomczynski & Sacchi, 1987). Extracted RNA was further cleaned to remove degraded, fragmented RNA using an RNeasy® Mini Kit (Qiagen, Valencia, CA) according to manufacturer instructions. Quality and quantity of the RNA were analyzed by Agilent 2100 Bioanalyzer RNA assays and evaluated by calculating the ratio of the 28S and 18S ribosomal RNA intensity peaks. Highquality RNA (RINs over 8.5) was pooled (~5 μg each sample) according to their ontogenetic stages (gastrula and early pluteus) in order to average out differences in expression levels between families. Illumina TruSeq RNA library preparation, clustering, and sequencing reagents were used throughout the process following the manufacturer's recommendations (Illumina, San Diego, CA). Briefly, 1 μg of total RNA was used as starting material for library preparation with the Illumina TruSeq RNA preparation Kit V2. Poly-T oligo-attached magnetic beads were used to purify mRNA, which was then fragmented for 8 min at 94°C. First-strand and second-strand DNA were subsequently synthesized. Libraries were multiplexed with Illumina barcodes and sequenced in two lanes at GENEWIZ, Inc. (South Plainfield, NJ) using a 2 × 100 paired-end (PE) configuration on Illumina HiSeq2000 platform. Image analysis and base calling were conducted using the HiSeq Control Software (HCS). These base calls were used to generate BCL and FASTQ files by mean of the Illumina's CASAVA 1.8.2 program.

| Transcriptome assembly and annotation
Following sequencing, quality control of the raw data was performed using the CLC Genomics Workbench software v.8.5 (CLC bio, Denmark) in a four-step pipeline: (1) removal of adapter sequences; (2) removal of all reads containing more than 5% of ambiguous nucleotides ("N"); (3) trimming base pairs with a Phred quality score ≤ 30 from the 3′-end of each sequence; and (4) removal of reads shorter than 30 bp after trimming. Because the available reference genome of M. franciscanus is still in the early stages (low coverage and annotations), three de novo assemblies were developed in this study: (1) the reference transcriptome including all the libraries and ontogenetic stages; (2) the gastrula; and (3) the pluteus transcriptomes. Assemblies of high-quality reads were carried out using the Trinity software (Grabherr et al., 2011). Assemblies were performed with default settings and a minimum contig length of 200 nt. Reads that were not incorporated into any contig (i.e., singletons) were discarded and excluded from further analyses. Duplicate sequences were removed after the de novo assemblies.
The de novo assemblies were separately blasted against the UniProt (Swiss-Prot and TrEMBL) and NCBI RefSeq (nr) protein databases using the BLASTX algorithm with an e-value cutoff of 10 e −5 .
Annotated unigenes (consensus, nonredundant sequences) were further searched for Gene Ontology (GO) terms using the Blast2GO software (Conesa et al., 2005) according to the main categories of Gene Ontology (GO; molecular functions, biological processes, and cellular components; Ashburner et al., 2000). Complementary annotations were performed with the InterProScan v.5 software (Jones et al., 2014), which provides functional analysis of proteins by classifying them into families and predicting domains and important sites.
The annotation results were further fine-tuned with the Annex and GO slim functions of the Blast2GO software in order to improve and summarize the functional information of the transcriptome dataset.
The distribution of annotated unigenes among GO categories was mapped using the WEGO software (Ye et al., 2006). Additionally, a GO enrichment analysis using Fisher's exact test was also performed in Blast2GO to test whether any of the GO terms appeared signifi-  (P. parvimensis), and SRX666715 (A. filiformis). The same quality control and pipelines applied for M. franciscanus were implemented to obtain high-quality contigs of the additional species. Transcripts of the five species were translated and used for comparisons of orthologous clusters with the OrthoVenn software (Wang, Coleman-Derr, Chen, & Gu, 2015) using default parameters to identify GO categories and any GO enrichment.

| Differential gene expression analysis
In order to identify differentially expressed genes on early developmental stages of the giant red sea urchin M. franciscanus, reads of each sample were mapped to the de novo assembled developmental transcriptome (gastrula + pluteus) using CLC Genomics Workbench software v.8.5 (CLC bio, Denmark). The length fraction was set to 0.7, and the minimum similarity was set to 0.9, which meant that at least 70% of the individual reads had at least 90% identity with the reference sequences to be mapped and aligned. The read counts were normalized by calculating the number of reads per kilobase per million mapped reads (RPKM) and log 2 transformed. Then, the Manhattan metric distance was used for hierarchical cluster analysis, and a Kal's test was used to compare gene expression levels for larval stages. The Kal's test relies on an approximation of the binomial distribution by the normal distribution. This proportion-based test for gene expression is applicable for single sample to single-sample comparisons. The cutoff of FDR p-value correction ≤0.05 and fold-change value ≥2 was used to determine significant differential expression. The results of this comparative analysis were checked by inspecting the distribution of differentially expressed genes using volcano plots. Enrichment analysis of differentially expressed genes was conducted by hypergeometric tests using CLC Genomics Workbench software v.8.5 (CLC bio, Denmark) against the background of expressed genes (cutoff P FDR ≤ 0.05). The REViGO Web server was used for visualization of the GO terms associated with the differentially expressed genes.
Treemaps in REViGO present hierarchical data as nested rectangles and provide an intuitive visualization of the dataset (Supek, Bošnjak, Škunca, & Šmuc, 2011). Size of the rectangles was adjusted to reflect the p-value using the abs_log_pvalue option in REViGO. Finally, comparison of the expression profiles during the embryo-to-larva transition of M. franciscanus and its sympatric species S. purpuratus was developed using the Query tool available in Echinobase (echinobase. org; Cameron, Samanta, Yuan, He, & Davidson, 2009;Tu, Cameron, & Davidson, 2014). Here, our DEGs were assigned to the 24 Function Classes described in Tu et al. (2014) and compared to the profile of those embryonic genes in S. purpuratus.  Table   S1). The SRA raw reads were deposited on GenBank public database under the accession numbers SRS823202 (gastrula), SRS823216

| Sequence analysis and assembly
(gastrula) SRS823218 (pluteus), and SRS823221 (pluteus) of the bioproject PRJNA272924. After a stringent filtering process, ~91% high-quality, adapter-free, and nonredundant reads were retained for de novo assembly and further downstream analyses. Quality metrics of assembled transcriptomes (e.g., N 50 and L 50 ) are summarized in Table 1. Completeness assessments using CEGMA identified 247 out of the 248 core proteins (99.6%) as complete (defined as >70% alignment length with core protein) and 248 (100%) as partially present (Table 1). This was consistent with the Benchmarking Universal Single-Copy Orthologs (BUSCO; Table 1).

| Functional annotation
Our reference de novo assembled transcriptome contains 176,885 tentative consensus sequences with an N 50 of 1,637 bp and an L 50 of 21,423 (Table 1). From these unigenes, 60,439 (~34%) were blasted to known proteins in the public databases NCBI (nr) and UniProt (Swiss-Prot and TrEMBL), while 116,446 (~66%) had no matches and may represent: (1) specific unigenes of M. franciscanus with unknown function; (2) sequences with low similarity to those compared in public databases; and/or (3) chimeric sequences. Although the percentage of unigenes with a BLAST-hit may appear to be relatively low, we found that the number of unigenes with significant alignments (≤1 e −5 ) to known proteins in M. franciscanus is higher than those reported in other studies with nonmodel echinoderms (Delroisse et al., 2015;Dilly et al., 2015;Gaitán-Espitia, et al., 2016 Fig. S1).
In order to obtain a comprehensive insight into the possible functions of blasted unigenes in the gastrula (48,347) and pluteus (44,977) transcriptomes, we merged the gene ontology (GO) annotations obtained from Blast2GO and InterProScan, resulting in 107,758 and 98,586 GO terms, respectively. These GO annotations were similarly distributed among the main GO categories in both transcriptomes (Table S2) (Table S3). This is explained by the higher representation of Toll-like receptor proteins (e.g., sp-Tlr037), binding proteins (e.g., Sp-Pacsin2), developmental proteins such as the Frizzled proteins Frizz4 and the Dishevelled proteins Dsh involved in the Wnt signaling pathway (Bilić et al., 2007), and proteins from the TFG-beta family involved in growth such as the Admp (Table   S4). Unique clusters in the gastrula transcriptome (Table S5) (Table S4).
Active biological pathways during early ontogenetic stages of M. franciscanus were obtained from the KEGG Orthology database.
Here, a total of 5,428 transcripts were linked to EC numbers and mapped to six enzyme classes in which hydrolases showed the highest number of sequences followed by transferases and oxidoreductases ( Fig. S3). This pattern was consistent among the three do novo assemblies (Fig. S3). The high representation of these three main enzyme classes could be related to particular developmental processes. For instance, some hydrolases (e.g., lysosome acid hydrolases) are known to play an important role in intracellular digestion and synchronic degradation of the major yolk glycoproteins (e.g., vitellins) during  (Kari, 1985;Yokota & Kato, 1988 : specific (1) or shared by 2, 3 or 4  From these, signal transduction, endocrine system, and nervous system were the most represented pathways in the reference, gastrula, and pluteus transcriptomes (Figure 3a), and unigenes were mainly allocated to the KO categories of Organism System, Metabolism and Environmental Information Processing (Figure 3b).

| Identification of genes expressed during the embryo-to-larva transition
In developmental biology, gene regulatory networks (GNRs) are used to describe the progression of regulatory states, in embryonic space and time, necessary to specify different cell types present in a multicellular organism (Davidson, 2006). For sea urchins, these GRNs are represented by over 200 experimentally verified regulatory interactions acting during early development, from the unfertilized egg through the formation of a planktonic feeding larva (Israel et al., 2016). In our study, we identified ~2,600 functional, structural, and regulatory genes in M. franciscanus involved in developmental process (GO:0032502). Some of these genes (~580) were specifically related to several aspects of embryonic development (GO:0009790), from organogenesis through to the differentiation of the musculoskeletal, cardiovascular, and nervous system (Table S7). As has been documented in other studies, sea urchin fertilization triggers a series of preprogrammed events functioning to activate egg metabolism, incorporate the paternal genome, and initiate development (Walker, Unuma, & Lesser, 2006). Perhaps one of the first events is the rise of calcium signals (Ca 2+ ), which are modulated by ryanodine receptors genes (RyRs) in order to prevent polyspermy and reinitiate the cell cycle for development (Jaffe, Giusti, Carroll, & Foltz, 2001   axis formation and nervous system patterning by the establishment of ternary complexes with low-density lipoprotein receptors (e.g., LRP6) and Frizzled receptors (e.g., FZD5; Table S7; Bilić et al., 2007). After Wnt stimulation, the segment polarity protein disheveled (e.g., DVL3) mediates the phosphorylation of the LRP6 by Casein kinases (e.g., CSK1D), promoting recruitment of the negative regulator Axin (e.g., AIDA-A), which, in turn, stabilizes Wnt signaling transducers such as the β-catenin (Table S6; Angerer & Angerer, 2000;Bilić et al., 2007).
Between the early blastula and early gastrula stages of sea urchins, all of the GRNs controlling the major embryonic territories (i.e., skeletogenic, endomesoderm, and ectoderm) have been activated, initiating the programs of cell-type-specific differentiation (Wei, Angerer, & Angerer, 2006). These processes are modulated by subcircuits of genes encoding transcription factors and their linkages (Israel et al., 2016). In our study, we were able to identify genes related to transcription regulatory activity ( Table S2) could reflect differences in the time of specification and differentiation of cell types (e.g., endomesoderm vs. neural cell types), and also sequential activation of genes in a regulatory hierarchy (Wei et al., 2006). In fact, in our analysis, some genes related to the determination of left/right symmetry (GO:0007368) were only present in the pluteus transcriptome (Table S7), supporting the idea of time-dependent processes associated with GRNs (Israel et al., 2016).

| Evolutionary conservation of genes among echinoderms
The embryogenesis and early larval development of echinoderms involve the interactions of regulatory, structural, and functional genes that, in some cases, are highly conserved even among distantly related echinoderm species (Hinman & Davidson, 2003a,b;Hinman, Nguyen, Cameron, et al., 2003;. However, differences in skeleton formation and embryonic territories specification in this group are the result of evolutionary changes, which include gene duplications, protein function diversification, and genes co-opted to different functions (Dylus et al., 2016;Hinman & Davidson, 2007;Hinman, Nguyen, Cameron, et al., 2003;McCauley et al., 2010McCauley et al., , 2012. Here, comparisons of early developmental transcriptomes of echinoderms revealed several proteins/genes that are highly conserved among sea urchins, sea cucumbers, and brittle stars (Figure 2b,c). Although the number of proteins from each species was very similar (Table S9) Table S12). Some clusters between M. franciscanus and S. purpuratus were particularly associated with sea urchin embryogenesis, involving homolog proteins of the blastula protease 10 (Adam/TsL6), the short gastrulation Chordin, and the protein twisted gastrulation (Tsg ; Table S12).

CONFLICT OF INTERESTS
None declared.

DATA DEPOSITION
The datasets generated and analyzed during this study are available in the National Center for Biotechnology Information (NCBI) repository into the bioproject PRJNA272924. The NCBI Sequence Read Archive codes of the raw reads are SRS823202, SRS823216, SRS823218, and SRS823221. In addition, raw reads and the assembled transcriptome have been deposited in Dryad, DOI: 10.5061/dryad.hc7v5. All other data generated or analyzed during this study are included in this published article and its supporting information files.