Volume 60, Issue 3 p. 174-182
ORIGINAL ARTICLE
Free Access

HpBase: A genome database of a sea urchin, Hemicentrotus pulcherrimus

Sonoko Kinjo

Center for Information Biology, National Institute of Genetics, Mishima, Japan

Masato Kiyomoto

Corresponding Author

Marine and Coastal Research Center, Ochanomizu University, Chiba, Japan

Correspondence

Masato Kiyomoto

Email: kiyomoto.masato@ocha.ac.jp

Kazuho Ikeo

Email: kikeo@nig.ac.jp

and

Shunsuke Yaguchi

Email: yag@shimoda.tsukuba.ac.jp

Takashi Yamamoto

Department of Mathematical and Life Sciences, Graduate School of Science, Hiroshima University, Hiroshima, Japan

Kazuho Ikeo

Corresponding Author

Center for Information Biology, National Institute of Genetics, Mishima, Japan

Shunsuke Yaguchi

Corresponding Author

Shimoda Marine Research Center, University of Tsukuba, Shizuoka, Japan

First published: 13 March 2018
Citations: 11

Abstract

To understand the mystery of life, it is important to accumulate genomic information for various organisms because the whole genome encodes the commands for all the genes. Since the genome of Strongylocentrotus purpuratus was sequenced in 2006 as the first sequenced genome in echinoderms, the genomic resources of other North American sea urchins have gradually been accumulated, but no sea urchin genomes are available in other areas, where many scientists have used the local species and reported important results. In this manuscript, we report a draft genome of the sea urchin Hemicentrotus pulcherrimus because this species has a long history as the target of developmental and cell biology in East Asia. The genome of H. pulcherrimus was assembled into 16,251 scaffold sequences with an N50 length of 143 kbp, and approximately 25,000 genes were identified in the genome. The size of the genome and the sequencing coverage were estimated to be approximately 800 Mbp and 100×, respectively. To provide these data and the annotation information, we constructed a database, HpBase (http://cell-innovation.nig.ac.jp/Hpul/). In HpBase, gene searches, genome browsing, and BLAST searches are available. In addition, HpBase includes the “recipes” for experiments from each lab using H. pulcherrimus. These recipes will continue to be updated according to the circumstances of individual scientists and can be powerful tools for experimental biologists and for the community. HpBase is a suitable dataset for evolutionary, developmental, and cell biologists to compare H. pulcherrimus genomic information with that of other species and to isolate gene information.

1 INTRODUCTION

Phylogenetically, echinoderms share a common ancestor with chordates in the deuterostome clade. This gives us the opportunity to consider echinoderms as model organisms in discussing how deuterostomes have evolved from the common ancestors of bilaterians. In particular, because the centralization of the nervous system in deuterostomes is one of the largest questions in biology, the neurogenesis and neural functions in echinoderms become good targets in evolutionary studies (Garstang, 1922, 1984). In addition, a number of scientists using echinoderms with experimental‐biology techniques have answered fundamental questions in biology (Davidson, 2010; Evans, Rosenthal, Youngblom, Distel, & Hunt, 1983; Levine & Davidson, 2005). However, although four sea urchin species have mainly been used as model organisms in echinoderm experimental biology throughout the world, the genomes of only the North American species, Strongylocentrotus purpuratus and Lytechinus variegatus, have been reported (Sergiev, Artemov, Prokhortchouk, Dontsova, & Berezkin, 2016; Sodergren et al., 2006). Therefore, the genomic information of the other two species, Hemicentrotus pulcherrimus in Asia and Paracentrotus lividus in Europe, is awaited because the information is now essential for evolutionary and experimental biology.

In Japan, H. pulcherrimus has a long history as a model organism in the field of biology, and reports on H. pulcherrimus have increased since the mid‐1900s. The number of scientists using H. pulcherrimus is now the second largest in the sea urchin experimental biology field, and a number of excellent studies using this species have been published. For example, papers about the mechanisms of body axis formation (Kominami, 1988), neurogenesis (Yaguchi & Katow, 2003), and cilia/flagella motility (Yano & Miki‐Noumura, 1981) have provided many important biological insights. In addition, genome editing was introduced into H. pulcherrimus first among echinoderms (Ochiai et al., 2010, 2012). These facts indicate that providing the fully accessible genomic information of this species will advance our understanding of many biological questions.

Here, we report the genomic information for a species of non‐American sea urchin, H. pulcherrimus. We also build a publicly available database for the H. pulcherrimus genome, HpBase, and discuss the possibility of H. pulcherrimus as a model organism for modern experimental, evolutionary, cell, and developmental biology.

2 MATERIALS AND METHODS

2.1 Genome materials and sequencing

Genomic DNA of H. pulcherrimus isolated from the sperm of a single male collected in Shimoda, Shizuoka, Japan, was used for constructing the genomic libraries: a paired‐end library with inserts of approximately 300 bp and two mate‐pair libraries with inserts of 3 kbp or 10 kbp. These genomic libraries were sequenced with Illumina HiSeq 2000. The library construction and sequencing were conducted by Takara Bio Inc.

2.2 Transcriptome materials and sequencing

For transcriptome sequencing, we used a mixture of total RNA extracted from embryos from a single pair of a male and a female at several developmental stages (2, 14, 21, and 43 hr after fertilization). The paired‐end RNA‐seq library was sequenced with a read length of 90 bp using Illumina HiSeq 2000. The library construction and sequencing were performed by Beijing Genomics Institute (BGI).

2.3 Genome and transcriptome assemblies

De novo genome assembly for H. pulcherrimus was performed with ALLPATHS‐LG (Gnerre et al., 2011) with the parameter “HAPLOIDIFY=TRUE” by Takara Bio Inc. Raw genome sequence reads with a quality score of less than 10 were removed from the analysis. Additional sequences of 10,307 BAC ends (accession IDs: AG981701–AG992007) were used to improve the assembly quality using SSPACE (Boetzer, Henkel, Jansen, Butler, & Pirovano, 2011). Gaps in the scaffold sequences were filled using the SOAPdenovo GapCloser module, version 1.12 (Luo et al., 2012), implemented on the NGS data analysis platform, Maser (see below). The transcriptome reads were assembled using Trinity r2013‐02‐25 (Grabherr et al., 2011; Haas et al., 2013) after adapter trimming.

2.4 Gene predictions and annotation

We used MAKER2 (Holt & Yandell, 2011) to predict the gene structure from the genomic sequences. Briefly, MAKER2 first predicts gene models using two gene prediction programs and aligns these primary gene models to known sequences by BLAST (ncbi‐blast‐2.2.28+) searches, producing the integrated gene models. We collected the reference sequences of related species for BLAST searches from the NCBI nucleotide and protein database. Finally, MAKER2 generated the nucleotide and protein sequences of the H. pulcherrimus gene models and a gene structure file.

We also predicted protein‐coding genes based on the transcriptome assembly using TransDecoder 2.0.1 (Haas et al., 2013). The resultant ORFs were clustered using the CD‐HIT program version v4.6.1 (Li & Godzik, 2006) under a similarity threshold of 100% to remove redundant or highly redundant sequences.

To infer gene function and orthology, we conducted reciprocal (bidirectional) BLASTP (NCBI‐blast‐2.2.30+) searches between H. pulcherrimus and S. purpuratus gene models (SPU_nucleotide.fasta deposited in Echinobase at http://www.echinobase.org). We also searched H. pulcherrimus gene models against the UniProt database and NCBI nucleotide collections to help annotate the gene models. From these BLAST search results, the top hit subject (E‐value ≤ 1e‐10) was retrieved.

2.5 Genome size estimation

The Illumina sequencing reads of the H. pulcherrimus genome were used for genome size estimation with a genome character estimator, GCE v1.0.0 (Liu et al., 2013), which takes into account the repeat content and heterozygosity inferred from the k‐mer distribution to estimate the genome size more accurately. We first calculated the distribution of the 17‐mer frequency for the Illumina raw reads using Jellyfish v1.1.11 (Marçais & Kingsford, 2011) and then ran GCE using the resultant 17‐mer frequency with optional settings of ‐c 100 and ‐H 1 (for adjusting non‐unique peaks and a heterozygous‐caused peak, respectively).
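The principle behind k‐mer‐based size estimation can be illustrated with a minimal sketch. This is not the GCE algorithm: it simply divides the total number of k‐mers by the depth at the main peak of the k‐mer histogram and ignores the repeat and heterozygosity corrections that GCE applies. The function name, the `min_depth` error cutoff, and the toy histogram are assumptions made for illustration only.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Rough genome size estimate from a k-mer depth histogram.

    histogram: dict mapping depth -> number of distinct k-mers at that depth.
    min_depth: assumed cutoff; k-mers below it are treated as sequencing errors.
    Returns (estimated_size, peak_depth).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak approximates the sequencing depth of single-copy k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences divided by the peak depth approximates genome size.
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical values, not the H. pulcherrimus data).
toy = {1: 5000, 2: 800, 48: 900, 49: 1000, 50: 1200, 51: 1000, 52: 900, 100: 50}
size, peak = estimate_genome_size(toy)
print(size, peak)
```

In this toy case the low‐depth k‐mers are discarded as errors, and the small population at depth 100 (a two‐copy repeat) is counted twice in the size, which is the behaviour one wants from a depth‐weighted sum.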

2.6 Assessment of assembly and annotation

The completeness of the genome/transcriptome assembly and gene prediction was assessed using BUSCO (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015). BUSCO searches the target sequences for 843 genes that are highly conserved in Metazoa and evaluates completeness as the proportion of these genes that is found (the higher the ratio, the higher the completeness). We ran BUSCO for the H. pulcherrimus genome/transcriptome assembly and gene models predicted from each sequence and for the genome assembly of S. purpuratus as a comparison. For S. purpuratus, genome assembly v3.1 (Spur_3.1.LinearScaffold.fa) released by Echinobase was used for the assessment.

2.7 Gene model comparison and orthology prediction

To investigate how de novo gene models of H. pulcherrimus were supported by the other data and to infer those orthologs, we conducted reciprocal BLASTP searches of sequences between the genome‐derived and transcriptome‐derived gene models of H. pulcherrimus and S. purpuratus gene models. For S. purpuratus, peptide sequences, SPU_peptide.fasta, embedded in annotation database build 7 were used. An e‐value less than 1e‐10 was used as a cut‐off to define a BLAST hit. We also defined orthologous genes as pairs of genes that are mutually unique best hits in a reciprocal BLAST search.
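The reciprocal best hit criterion can be sketched as follows. This is a minimal illustration, assuming that the BLAST results have already been parsed into (query, subject, E‐value, bit score) tuples and that the best hit is the one with the highest bit score; the gene IDs are hypothetical, and ties (relevant to the "unique" requirement) are not handled.

```python
def best_hits(rows, max_evalue=1e-10):
    """Return {query: subject}, keeping the highest-bit-score hit per query.

    rows: iterable of (query, subject, evalue, bitscore) tuples.
    Hits with an E-value not below max_evalue are ignored.
    """
    best = {}
    for query, subject, evalue, bitscore in rows:
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_rows, reverse_rows, max_evalue=1e-10):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(forward_rows, max_evalue)
    reverse = best_hits(reverse_rows, max_evalue)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)


# Hypothetical toy results: Hp queries against Sp subjects, and the reverse.
hp_vs_sp = [
    ("Hp1", "Sp1", 1e-50, 200.0),
    ("Hp1", "Sp2", 1e-20, 90.0),
    ("Hp2", "Sp2", 1e-30, 120.0),
    ("Hp3", "Sp3", 1e-5, 40.0),   # fails the E-value cut-off
]
sp_vs_hp = [
    ("Sp1", "Hp1", 1e-48, 198.0),
    ("Sp2", "Hp1", 1e-20, 90.0),
    ("Sp2", "Hp2", 1e-18, 85.0),
]
pairs = reciprocal_best_hits(hp_vs_sp, sp_vs_hp)
print(pairs)
```

Here Hp2 has a hit in S. purpuratus (Sp2), but the best hit of Sp2 is Hp1, so Hp2 falls into the "best hit under 1e‐10" class rather than the reciprocal best hit class.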

2.8 Maser and Genome Explorer (GE) for NGS data analysis and visualization

For the data analysis, we used the NGS data analysis platform, Maser, which is managed by the National Institute of Genetics, Japan (http://cell-innovation.nig.ac.jp/maser/). Maser is equipped with over 400 bioinformatic tools, including tools for the annotation of the de novo genome and transcriptome. We also used a genome browser, Genome Explorer (GE), to visualize the genome assembly and gene features. GE is a web‐based genome browser that was developed by the National Institute of Genetics, Japan, and implemented on Maser. GE can be used for gene searches by gene ID, name, or synonyms and for retrieval of sequences, and therefore can be used as a simple database. We used this function of GE to visualize the H. pulcherrimus genome and to construct its database.

2.9 Web browser requirements

To use HpBase, Google Chrome is recommended because GE works best with Google Chrome.

3 RESULTS AND DISCUSSION

3.1 Genome size estimation and sequence coverage

The size and sequence coverage of the H. pulcherrimus genome were estimated to be 807 Mbp and 103.9×, respectively, using GCE (Liu et al., 2013). This estimate is comparable to the 814 Mbp of S. purpuratus (Sodergren et al., 2006) but much larger than the total length (568.9 Mbp) of all genomic scaffolds of H. pulcherrimus (Table 1), implying that many reads remain unassembled, probably because of the high repeat content and heterogeneity of the H. pulcherrimus genome.

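Table 1. Summary of genome assembly and gene annotation

| Source | Dataset | Number | Average length (bp) | N50 (kbp) | Total length (Mbp) | Number of gaps (n) |
|---|---|---|---|---|---|---|
| Genome | Contig | 134,246 | 3,236 | 4.7 | 431.7 | |
| Genome | Scaffold (v0.5) | 16,236 | 34,768 | 137.8 | 567.6 | 135,872,766 |
| Genome | Scaffold (v0.8) | 16,251 | 35,121 | 143.3 | 570.8 | 139,009,206 |
| Genome | Scaffold (v1.0) | 16,251 | 35,008 | 142.6 | 568.9 | 93,327,514 |
| Genome | Gene model | 24,860 | 1,685 | | | |
| Transcriptome | Contig | 124,330 | 832 | | | |
| Transcriptome | Gene model | 20,564 | 1,657 | | | |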

3.2 Genome assembly v0.5

The Illumina HiSeq 2000 platform generated 1,700,169,112 reads (100 bp each): 1,152,419,110 reads for the paired‐end library, 243,436,308 reads for the 3k mate‐pair library, and 304,313,694 reads for the 10k mate‐pair library. As a result of the first assembly with these reads, we obtained 134,246 contigs with an N50 length of 4,745 bp and 16,326 scaffolds with an N50 length of 134,805 bp (Table 1).
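For readers unfamiliar with the N50 statistic used throughout this section, it is the length L such that sequences of length L or longer together contain at least half of the total assembled bases. A minimal sketch of the standard definition follows; the function name and the toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 is undefined for an empty collection")


# Toy example: total length is 32, and the two longest sequences reach 16.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```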

3.3 Genome assembly v0.8

We reconstructed scaffolds using additional sequences of H. pulcherrimus BAC clones (AG981701–AG992007), improving the assembly quality slightly: the number of scaffolds decreased to 16,251, and the N50 length was extended to 143,251 bp (Table 1).

3.4 Genome assembly v1.0 (HpulGenome_v1_scaffold.fa)

Finally, we filled gaps in the genome assembly by using the SOAPdenovo gap closure module (Luo et al., 2012). As a result, the number of gaps (denoted by “N”) fell from 139,009,206 in the genome assembly v0.8 to 93,327,514 in v1.0, a reduction of approximately 33%, and the scaffold N50 was shortened slightly (from 143,251 bp in v0.8 to 142,559 bp in v1.0) (Table 1). This is the latest version of the assembly of the H. pulcherrimus genome shown in HpBase.
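Gap content of this kind can be tallied directly from a scaffold FASTA file. The sketch below assumes that the gap figures refer to the number of “N” characters in the scaffold sequences; the function name and the toy records are hypothetical.

```python
def count_gap_bases(fasta_lines):
    """Count N (gap) characters in FASTA-formatted lines.

    Header lines (starting with '>') are skipped; both upper- and
    lower-case N are counted.
    """
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        total += line.upper().count("N")
    return total


# Toy records (hypothetical); a real file could be passed as an open handle,
# e.g. open("HpulGenome_v1_scaffold.fa").
toy_fasta = [
    ">scaffold_1",
    "ACGTNNNNACGT",
    "nnAC",
    ">scaffold_2",
    "ACGT",
]
print(count_gap_bases(toy_fasta))
```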

3.5 Gene prediction based on genome assembly

MAKER2 (Holt & Yandell, 2011) predicted 25,147 de novo gene models based on the latest version of the genomic assembly, HpulGenome_v1 (Table 1). Of these, invalid gene models that were too short (less than 50 aa) or contained multiple stop codons, as well as mitochondrial genes, were removed. Finally, 24,860 gene models with an average length of 1,685 bp were regarded as genome‐derived gene models (Table 1). The resultant sequences and gene structure datasets were summarized in “HpulGenome_v1_nucl.fa” (nucleotide sequences in fasta format), “HpulGenome_v1_prot.fa” (protein sequences in fasta format), and “HpulGenome_v1.gff” (gene structure in general feature format). These datasets are deposited at the website for the H. pulcherrimus genome (see below). Based on the data in Echinobase (http://www.echinobase.org/) (Cameron, Samanta, Yuan, He, & Davidson, 2009; Kudtarkar & Cameron, 2017), there were 29,948 total annotated genes of S. purpuratus and 22,105 total annotated genes of L. variegatus as of June 2017, indicating that the number of predicted genes in H. pulcherrimus is reasonable for a first genome assembly.
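The filtering step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run: “multiple stop codons” is interpreted as more than one “*” in the translated sequence, mitochondrial genes are supplied as a pre‐determined set of IDs, and all gene IDs are hypothetical.

```python
def filter_gene_models(models, mitochondrial_ids=frozenset(), min_length=50):
    """Drop gene models that are too short, have multiple stops, or are mitochondrial.

    models: dict mapping gene ID -> translated protein ('*' marks a stop codon).
    mitochondrial_ids: IDs already identified as mitochondrial (assumed input).
    min_length: minimum number of amino acids, excluding stop symbols.
    """
    kept = {}
    for gene_id, protein in models.items():
        residues = protein.replace("*", "")
        if len(residues) < min_length:
            continue  # shorter than 50 aa
        if protein.count("*") > 1:
            continue  # more than one stop codon
        if gene_id in mitochondrial_ids:
            continue  # mitochondrial gene
        kept[gene_id] = protein
    return kept


# Hypothetical toy models.
toy_models = {
    "g1": "M" + "A" * 60 + "*",                   # valid
    "g2": "M" + "A" * 20 + "*",                   # too short
    "g3": "M" + "A" * 30 + "*" + "A" * 30 + "*",  # internal stop
    "g4": "M" + "A" * 70 + "*",                   # mitochondrial
}
kept = filter_gene_models(toy_models, mitochondrial_ids={"g4"})
print(sorted(kept))
```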

3.6 Transcriptome assembly (HpulTranscriptome)

A total of 53,376,104 RNA‐seq reads (100 bp each) were assembled into 124,330 contigs with an average length of 832.2 bp and an N50 length of 1,541 bp. These sequences are included in the dataset “HpulTranscriptome.fa” in HpBase. Based on the transcriptome assembly, 32,169 ORFs were identified by TransDecoder 2.0.1 (Haas et al., 2013). After the removal of highly redundant sequences (95% sequence similarity to each other), 20,564 gene models with an average length of 1,657 bp remained as transcriptome‐derived ORFs (Table 1) (deposited in HpBase as “HpulTranscriptome_nucl.fa” and “HpulTranscriptome_prot.fa”). In this study, we mixed total RNAs equally from four stages, maternal (2 hr), early blastula (14 hr), mesenchyme blastula (21 hr), and prism larva (43 hr), and assembled the reads. Thus, the data here lack the stage‐specific information and are not sufficient to cover all developmental events. However, combined with the gene prediction from the genome data, this transcriptome becomes a powerful tool in H. pulcherrimus biology because it contains the real ORFs (see below) and UTR regions, which are essential for designing knockdown experiments and/or for making multiple in situ hybridization probes. Together with the previously reported gastrula EST data (Shoguchi, Tokuoka, & Kominami, 2002), most of the embryonic transcripts should be covered and will be available in HpBase.

3.7 Assessment of assembly and gene prediction completeness

The completeness of the H. pulcherrimus genome and transcriptome assemblies and gene predictions was assessed using BUSCO (Simão et al., 2015) (Table 2). BUSCO returned high degrees of completion (85.7%–97.2% of core metazoan genes, counting complete and fragmented BUSCOs together) in both genome and transcriptome datasets. The degree of completion is higher for transcriptome sequences (95.0%–97.2%) than for genome sequences (85.7%–94.1%). The completeness is highest in transcriptome‐derived ORFs, although the number of predicted ORFs (20,564) is smaller than that of genome‐derived gene models (24,860) (Table 1).

Table 2. Assessment of genome completeness and gene annotation by searching for 843 BUSCOs

| Species | Dataset | Complete single‐copy BUSCOs (number) | Complete single‐copy BUSCOs (%) | Fragmented BUSCOs (number) | Fragmented BUSCOs (%) | Complete single‐copy + fragmented BUSCOs (number) | Complete single‐copy + fragmented BUSCOs (%) |
|---|---|---|---|---|---|---|---|
| Hemicentrotus pulcherrimus | Genome assembly (HpulGenome_v1_scaffold.fa) | 662 | 78.53 | 131 | 15.54 | 793 | 94.07 |
| Hemicentrotus pulcherrimus | Gene model from genome assembly (HpulGenome_v1_prot.fa) | 511 | 60.62 | 211 | 25.03 | 722 | 85.65 |
| Hemicentrotus pulcherrimus | Transcriptome assembly (HpulTranscriptome.fa) | 679 | 80.55 | 122 | 14.47 | 801 | 95.02 |
| Hemicentrotus pulcherrimus | ORFs inferred from transcriptome assembly (HpulTranscriptome_prot.fa) | 707 | 83.87 | 112 | 13.29 | 819 | 97.15 |
| Strongylocentrotus purpuratus | Genome assembly v3.1 (Spur3.1_Linearscaffolds.fa) | 749 | 88.85 | 62 | 7.35 | 811 | 96.20 |
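The percentages in Table 2 follow directly from the BUSCO counts and the size of the reference set (843). A minimal sketch of that arithmetic is given below; the function and constant names are illustrative, and the counts are copied from Table 2.

```python
TOTAL_BUSCOS = 843  # size of the metazoan BUSCO set used in this study


def percent(count, total=TOTAL_BUSCOS):
    """Percentage of the BUSCO set, rounded to two decimals."""
    return round(100 * count / total, 2)


def completeness(complete, fragmented, total=TOTAL_BUSCOS):
    """Return (% complete, % fragmented, % complete + fragmented)."""
    return (
        percent(complete, total),
        percent(fragmented, total),
        percent(complete + fragmented, total),
    )


# (complete single-copy, fragmented) counts from Table 2.
table2_counts = {
    "HpulGenome_v1_scaffold.fa": (662, 131),
    "HpulGenome_v1_prot.fa": (511, 211),
    "HpulTranscriptome.fa": (679, 122),
    "HpulTranscriptome_prot.fa": (707, 112),
}
for dataset, (complete, fragmented) in table2_counts.items():
    print(dataset, completeness(complete, fragmented))
```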

3.8 Comparison of gene models

To investigate how genome‐derived gene models were supported by transcriptome data, we conducted reciprocal BLASTP searches of genome‐derived gene models against transcriptome‐derived ORFs of H. pulcherrimus. As a result, 11,557 (46.5%) H. pulcherrimus genome‐derived gene models formed a reciprocal best hit pair with a single transcriptome‐derived gene model, and a further 13,123 (52.8%) matched at least one transcriptome‐derived gene model with an E‐value below 1e‐10 (Table 3). Only the remaining 180 (0.7%) had no similar gene, indicating that most genome‐derived gene models were supported by transcriptome data.

Table 3. Result of BLASTP searches for Hemicentrotus pulcherrimus gene models against transcriptome and Strongylocentrotus purpuratus gene models

| Query | Subject | Reciprocal best hit pair | Best hit under 1e‐10 | No hit |
|---|---|---|---|---|
| H. pulcherrimus gene model from genome assembly (HpulGenome_v1_prot.fa) | H. pulcherrimus gene model from transcriptome assembly (HpulTranscriptome_prot.fa) | 11,557 (46.5%) | 13,123 (52.8%) | 180 (0.7%) |
| H. pulcherrimus gene model from genome assembly (HpulGenome_v1_prot.fa) | S. purpuratus gene model (SPU_peptide.fa) | 13,772 (55.4%) | 10,877 (43.8%) | 211 (0.8%) |

Next, to infer gene orthology relationships, we conducted reciprocal BLASTP searches of gene models between H. pulcherrimus and S. purpuratus (Table 3). Among the 24,860 H. pulcherrimus gene models, 13,772 (55.4%) were matched to a single S. purpuratus gene model as reciprocal and unique best hits; these gene pairs are therefore very likely to be orthologous. Of the remaining genes, 10,877 (43.8%) were matched to at least one S. purpuratus gene model with an E‐value of 1e‐10 or less, while 211 (0.8%) had no similar gene among the S. purpuratus gene models. Based on these analyses, the genes predicted in this study using Maser are of sufficient quality for developmental, genetic, cell, and evolutionary biology.

3.9 Gene name annotation

Based on the BLASTP search results against S. purpuratus genes, we assigned gene names to de novo gene models of H. pulcherrimus according to the following criteria. (i) If gene models of H. pulcherrimus and S. purpuratus matched each other as a single unique best hit, then we gave the H. pulcherrimus gene model the same name as the S. purpuratus gene (e.g., Sp‐FoxQ2 → Hp‐FoxQ2). (ii) If the H. pulcherrimus gene model matched any S. purpuratus gene with an E‐value of 1e‐10 or less but not as a reciprocal best hit, we added “‐like” to the end of the gene name (e.g., Sp‐FoxQ2 → Hp‐FoxQ2‐like). However, under both criteria (i) and (ii), there are some gene models for which the top hit S. purpuratus gene is not well annotated (namely, the gene name is not assigned and is denoted by “none”). Such gene models were named “hypothetical protein”, as were those that satisfied the third criterion. (iii) The remaining H. pulcherrimus gene models that do not have any similar gene in S. purpuratus were named “hypothetical protein” (e.g., Hp‐Hypp_0001). We also conducted BLASTP searches of H. pulcherrimus gene models against the UniProt database and NCBI nucleotide collections to help annotate the gene models. From these BLAST search results, the top hit subject (E‐value < 1e‐10) was retrieved. The results of gene name annotation and BLAST searches are listed in “HpulGenome_v1_annot.txt”.
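The three naming criteria can be expressed as a small decision function. The sketch below is an illustration only: the function signature is hypothetical, the four‐digit serial number format is inferred from the single example “Hp‐Hypp_0001”, and plain ASCII hyphens are used in the identifiers.

```python
def assign_gene_name(sp_top_hit_name, is_reciprocal_best, serial):
    """Derive an H. pulcherrimus gene name from its top S. purpuratus hit.

    sp_top_hit_name: name of the top S. purpuratus hit (e.g. "Sp-FoxQ2"),
        "none" if that gene is unannotated, or None if there is no hit
        under the E-value threshold.
    is_reciprocal_best: True if the two genes are reciprocal best hits.
    serial: running number used for hypothetical proteins (assumed format).
    """
    # Criterion (iii), and unannotated top hits under (i) or (ii).
    if sp_top_hit_name is None or sp_top_hit_name == "none":
        return f"Hp-Hypp_{serial:04d}"
    # Replace the species prefix Sp- with Hp-.
    if sp_top_hit_name.startswith("Sp-"):
        base = sp_top_hit_name[len("Sp-"):]
    else:
        base = sp_top_hit_name
    if is_reciprocal_best:
        return f"Hp-{base}"       # criterion (i)
    return f"Hp-{base}-like"      # criterion (ii)


print(assign_gene_name("Sp-FoxQ2", True, 1))
print(assign_gene_name("Sp-FoxQ2", False, 1))
print(assign_gene_name(None, False, 1))
```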

3.10 Visualization of H. pulcherrimus genome on Maser

The genome assembly HpulGenome_v1 and gene structures were visualized on a genome browser, Genome Explorer (GE), implemented on Maser (Figure 1). We embedded the information of gene annotation listed in “HpulGenome_v1_annot.txt”; therefore, newly assigned gene IDs and names of H. pulcherrimus can be used as keywords for gene searches. Additionally, gene IDs and names of S. purpuratus can be used for gene searches, showing the top hit gene in BLAST for H. pulcherrimus recorded in the annotation file.

Figure 1. Hemicentrotus pulcherrimus genome assembly on Genome Explorer (GE). The GE display is composed of four components: (i) Genome box, (ii) Chromosome/Scaffold/Contig bar, (iii) Control menu bar, and (iv) Mapping view. (i) The left Genome box shows a list of genomic scaffolds of H. pulcherrimus. The scaffold or position to be displayed in the viewer can be specified by using the filter or position box (arrow and arrowhead, respectively) (i), and gene searches by keywords (Entrez gene ID, Refseq ID, gene name, and symbol) are available in the Search tab (i′). The top Scaffold bar indicates the name and length of the scaffold displayed in the viewer (ii). The Control menu bar, second from the top, contains icons to zoom in and out, shift left or right, and retrieve any sequences in the range specified in the ‘From’ and ‘To’ boxes (iii). The Mapping view consists of multiple tracks indicating, in order from the top, the positions of gene models from the genome assembly, gene models from the transcriptome assembly, and arbitrary data uploaded by the user (e.g., RNA‐seq reads) (iv)

3.11 HpBase web site construction on Public Maser

Finally, we created a website, HpBase, for public release of the aforementioned datasets and genome viewer of H. pulcherrimus (Figure 2). This website provides a link to GE and direct download links to the datasets of genome assembly, gene models, and annotation.

Figure 2. Screenshots of HpBase (http://cell-innovation.nig.ac.jp/Hpul/). HpBase consists of six pages: Home (a); Gene Search (b); Homology Search with BLAST (c); Genome Viewer with GE (d); Data Download (e); Protocols download (f)

The following six links are shown on the top page of HpBase:
  • Home: A link to go back to the home page of HpBase (Figure 2a).
  • Gene Search: A link to go to the gene search page, on which people can search for the specific gene of interest by using the gene ID, gene name, synonyms, transcriptome ID, or SPU gene ID (Figure 2b).
  • Homology Search: People can use a BLAST search to isolate sequence information of interest. Blastn, blastp, blastx, tblastn, and tblastx are available as BLAST programs. The database lists the whole genome scaffold, nucleotide and protein sequences for annotated genes from the genome and transcriptome data. This page is equipped with several parameters for more detailed analyses (Figure 2c).
  • Genome Viewer: A link to the Genome Explorer (Figure 2d).
  • Data Download: Raw data files for genome assembly, gene models, transcriptome, and annotation files are downloadable (Figure 2e).
  • Protocols: This page contains PDFs showing brief experimental protocols from each laboratory. All laboratories using H. pulcherrimus or other organisms are welcome to upload their protocols to this page through submission and a brief review by the database administrator. The contents of this page will help researchers using H. pulcherrimus by providing experimental information based on the experiences of each scientist and/or each laboratory. For example, “Reagents for microinjection” shows how to prepare the reagents for microinjection into the eggs of H. pulcherrimus. Based on the published data for H. pulcherrimus, the effective concentrations of morpholino oligonucleotides (MO) used to knock down genes have so far ranged between 100 μmol/L (e.g., Nodal‐MO1) (Yaguchi et al., 2010) and 3.8 mmol/L (e.g., Wnt7‐MO1) (Yaguchi, Takeda, Inaba, & Yaguchi, 2016). The experimental concentrations shown in the protocol produced healthy morphants. To show the specificity of a morpholino, using a second, non‐overlapping morpholino should be the first choice. The second choice is using a different species of sea urchin to confirm the same phenotype. The third choice is using a control morpholino, but the universal control‐MO, GFP‐MO, and randomized‐MO obtained from Gene Tools cannot be used at high concentrations for H. pulcherrimus because 3.8 mmol/L aliquots of these morpholinos killed the embryos for unknown reasons, whereas some morpholinos, such as the Wnt7 morpholino mentioned above, produced healthy embryos at the same concentration (Figure 2f).

4 CONCLUSION

Since the genome of S. purpuratus was sequenced in 2006 (Sodergren et al., 2006), most sea urchin biology has relied on the S. purpuratus genome data stored in Echinobase, even when the researchers use different species such as H. pulcherrimus or P. lividus. The data were so powerful that the experimental speed for cloning interesting genes or genomic fragments has been dramatically improved. However, the differences between species affect detailed information such as the UTR sequences in mRNA or the non‐coding sequences in the genome, especially when Asian or European species are used. Thus, the data shown here and stored in HpBase will help researchers using H. pulcherrimus in their detailed analyses in the fields of cell, developmental, and evolutionary biology.

HpBase contains a number of tools useful for studies using sea urchins. In particular, “Protocols” is a new challenge because researchers can easily make their own protocols available to the public and refer to other labs' protocols. Since no strict review of each protocol will be performed when it is deposited, researchers must judge for themselves whether the listed information is reliable or applicable to their own experiments. Nevertheless, it will be very useful and powerful information for experimental biology.

The descendants of the male whose sperm was used for genomic sequencing are kept in two independent marine stations, Shimoda Marine Research Center, University of Tsukuba, and Marine and Coastal Research Center, Ochanomizu University. These sea urchins are available by request as a reference for genomic studies, and we also intend to make inbred strains and will read the genome in the future.

5 AVAILABILITY OF DATA

The sequence data were submitted to DDBJ/EMBL/GenBank databases (BioProject Accession PRJDB6441). The accessions of sequences of genome assembly HpulGenome_v1 and transcriptome assembly HpulTranscriptome are BEXV01000001‐BEXV01016251 and IACU01000001‐IACU01124330, respectively. The raw sequencing data of whole genome sequencing and RNA‐seq are deposited in the DDBJ Sequence Read Archive (DRA), with Accessions DRA006268 and DRA006267, respectively.

ACKNOWLEDGMENTS

We thank Masafumi Muraoka, Kazutoshi Yoshitake, Sadahiko Misu, and Norikazu Monma for their help in the analyses and HpBase construction. This research was partially supported by the Platform Project for Supporting Drug Discovery and Life Science Research from the Japanese Agency for Medical Research and Development (AMED) (to KI and SK) and a project of the Joint Usage/Educational Center (to MK) and Grant‐in‐Aid for Scientific Research (B) (26290070) (to TY and SY), (C) (25440101) (to SY), and Grant‐in‐Aid for Young Scientists (B) (23770241) (to SY) by the Ministry of Education, Culture, Sports, Science & Technology in Japan.
