Chromosome‐level genome assembly of Paralithodes platypus provides insights into evolution and adaptation of king crabs

Abstract The blue king crab, Paralithodes platypus, which belongs to the family Lithodidae, is a commercially and ecologically important species. However, a high‐quality reference genome for the king crab has not yet been reported. Here, we assembled the first chromosome‐level blue king crab genome, which contains 104 chromosomes and an N50 length of 51.15 Mb. Furthermore, we determined that the large genome size can be attributed to the insertion of long interspersed nuclear elements and long tandem repeats. Genome assembly assessment showed that 96.54% of the assembled transcripts could be aligned to the assembled genome. Phylogenetic analysis showed the blue king crab to have a close relationship with the Eubrachyura crabs, from which it diverged 272.5 million years ago. Population history analyses indicated that the effective population of the blue king crab declined sharply and then gradually increased from the Cretaceous and Neogene periods, respectively. Furthermore, gene families related to developmental pathways, steroid and thyroid hormone synthesis, and inflammatory regulation were expanded in the genome, suggesting that these genes contributed substantially to the environmental adaptation and unique body plan evolution of the blue king crab. The high‐quality reference genome reported here provides a solid molecular basis for further study of the blue king crab's development and environmental adaptation.


| INTRODUC TI ON
The blue king crab (Paralithodes platypus) is the largest crab, even one of the world's largest arthropods, and is routinely commercially caught and eaten. It is protected by a rigid exoskeleton that covers all external parts (Cunningham et al., 1992;Stevens, 2014). The ecologically and commercially valuable blue king crab is widely distributed throughout the North Pacific, including coastal areas of USA, Russia and Japan (Daly & Long, 2014), with combined annual landings of 50,000 metric tons (Otto & Jamieson, 2001). However, its population has declined significantly since the late 1990s (Garber-Yonts & Lee, 2016). In addition to overfishing, viral-related diseases have greatly threatened its population health (Ryazanova et al., 2015).
Like most marine invertebrates, crustaceans, such as blue king crabs, are exposed to high concentrations of microorganisms. The defence system against microbes in crustaceans is largely dependent on cellular activities, such as adhesion, phagocytosis, encapsulation, nodule formation and melanization, performed by haemocytes. Other antimicrobial factors, such as lysozymes, agglutinins and haemolysins, are also critical parts of their immune system (Stevens, 2014).
In addition, the blue king crab has a complex life cycle, with multiple larval stages, including four pelagic larval stages and a semibenthic post-larval stage (Daly & Long, 2014). During these long larval stages, crabs are particularly vulnerable to pathogens. Although the immune system has been extensively studied (Muraille & Goriely, 2017;Parkin & Cohen, 2001;Takeuchi & Akira, 2010), the genes involved in the immune response of crabs are poorly identified. For example, the Serpin gene is known to play a key role in the immune system of many model species (Gettins, 2002;Huntington, 2011;Wang et al., 2017), and its unique immune network has been clearly studied (Dittmann et al., 2015). However, the Serpin gene in the king crab is only known to have anticoagulant and anticomplement effects (Dittmann et al., 2015), and other immune responses for adaptative evolution remain to be systematically clarified.
There are 129 king crab species in the 15 genera of the superfamily Lithodoidea (Stevens, 2014). To date, however, only limited RNA-sequencing (RNA-seq) data on Paralithodes camtschaticus have been released, and no other king crab species, including the blue king crab, have been systematically studied by omics-based analyses.
Based on genome size of crabs in the Animal Genome Size Database (http://genom esize.com), the C-values of Necora puber in Portunidae and Xantho pilipes in Xanthidae are 15.17 and 11.77, respectively (Bonnivard et al., 2009), indicating that the genomes of these two crab species are larger than 10 Gb. However, the two published crab genomes, namely swimming crab (Portunus trituberculatus) and Chinese mitten crab (Eriocheir japonica sinensis), are approximately 1.00 and 1.27 Gb, respectively . Thus, genome size among crab species appears to vary greatly. Several previous studies have investigated the reason for expanded genome size in specific species, with insertion and replication of transposable elements found to be the primary contributors (Marburger et al., 2018;Naville et al., 2019). Moreover, although all crabs belong to Decapoda, such as Portunus trituberculatus and Eriocheir japonica sinensis, most of them (called true crabs) belong to Brachyura, but king crabs belong to Anomura. Several studies have analysed the phylogenetic relationships of king crabs, and found that king crabs are closely related to true crabs (Bracken-Grissom et al., 2013;Tan et al., 2018). However, these studies were based on limited genetic data. Therefore, to better understand various issues related to the king crab, such as genome size, phylogenetic relationship, population history, development and immune response, obtaining a high-quality reference genome is crucial.
By combining BGI short reads, PacBio long reads and Hi-C reads, we have generated the first chromosome-level high-quality reference genome for the blue king crab, the continuity and completeness of which were comparable with that of other species.
Population history analysis revealed that the effective population size declined in the Cretaceous but increased more recently. Gene family analysis identified the expansion of several genes involved in development-related signalling pathways, steroid and thyroid hormone synthesis, and inflammatory regulation, suggesting that these genes probably contributed to the environmental adaptation of the blue king crab. This high-quality genome will provide a valuable resource for studying complex development processes and environmental adaptation of the blue king crab.

| Sampling and sequencing
A male adult blue king crab (Paralithodes platypus) was bought from the Sunkfa Company (place of sale: Beidaihe area, China; Figure S1). Genomic DNA was extracted from muscle using a contributed substantially to the environmental adaptation and unique body plan evolution of the blue king crab. The high-quality reference genome reported here provides a solid molecular basis for further study of the blue king crab's development and environmental adaptation.

K E Y W O R D S
adaptation, blue king crab, evolution, genome and sequenced on the BGI-seq 500 platform. A 20-kb long-read library was constructed and sequenced on the Sequel platform (Pacific Biosciences). Three Hi-C libraries were constructed with the restriction endonuclease MboⅠ and sequenced on the BGI-seq 500 platform. RNAs from four different tissues (liver, gill, stomach and heart) were extracted using a TRIzol kit and sequenced on the BGI-seq 500 platform.
PacBio long reads were used to construct the skeleton of the genome assembly. BGI-seq short reads were used to investigate the genome characteristics (such as genome size and heterozygosity) before assembly, as well as for assembly refinement (correcting sequencing errors that exist in long-reads) and evaluation of assembly quality. After these steps, Hi-C reads were used to anchor the contig-level assembly into the final chromosome-level genome assembly.

| Quality control of sequencing data
All sequencing data were filtered to reduce low-quality bases and duplicated reads using different strategies based on the platforms used. For the BGI-seq platform data (including genomic short-reads and RNA-seq reads), reads were filtered using the following steps: first, PCR duplications in read pairs produced during library construction were removed. Second, adaptors were removed from the sequencing reads. Third, read pairs with more than 50% low-quality bases were removed. Fourth, read pairs were excluded if any one read had more than 10% unknown bases (Chen et al., 2019). For the PacBio long reads, subreads were directly produced by the default parameters of the sequencing equipment (Sequel). For Hi-C reads, sequencing data were first filtered using the same method as for the BGI-seq short-insert reads, then further filtered by hic-pro (Servant et al., 2015).

| Genome size evaluation
To investigate the genome size of the blue king crab, all filtered short-insert reads from the BGI platform were used for 29-mer analysis with jellyfish (version 2.2.10; Marcais & Kingsford, 2011).
Specifically, reads were artificially divided into 29-bp length sequences with a 1-bp step size. If the read length is L, the read will produce (L − 29 + 1) k-mers. Therefore, the genome size (G) can be estimated by the formula: G = K number /K depth , where K number represents the total number of k-mers produced and K depth represents the peak value of k-mer depth generated.
To polish the assembled genome, the PacBio long-reads were first mapped to the genome assembly by minimap2 (version 2.9; Li, 2017) with default parameters except "-ax map-pb," and then corrected by racon (version 1.2.1; Talay & Altilar, 2009) with four-round iterations. Because of the high error rate of base-calling in the PacBio sequencing platform relative to the BGI platform, we further corrected the single-base errors in the genome assembly using the high-quality BGI short reads with bwa-mem (0.7.12-r1039; ) for mapping and pilon (version 1.21; Walker et al., 2014) for the correction process (two rounds). Based on the polished assembly and filtered Hi-C reads, we applied juicer  to process the data, then used 3D de novo assembly (version 180419 ;Dudchenko et al., 2017) to anchor the contigs at the chromosomal level. Lastly, we used juicebox (version 1.9.8;  to visualize and manually modify the results.

| Transcript assembly
Using the clean RNA-seq data, we de novo assembled the transcripts using bridger (r2014-12-01; Chang et al., 2015). These transcripts were further clustered based on pairwise sequence similarity using tgicl (tgicl_linux; Pertea et al., 2003), with the individual clusters then assembled to produce longer, more complete consensus sequences.
All transcripts were then used for the evaluation of genome assembly and transcript-based annotation of protein-coding genes.

| Genome annotation
We annotated repetitive sequences in the blue king crab genome using various software. tandem repeat finder (version 4.04; Benson, 1999) was used to identify tandem repetitive elements, and repeatmodeler was used to de novo detect transposable elements (TEs), including long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), DNA elements, and long tandem repeats (LTRs). The de novo repeat library produced via repeatmodeler analysis and the RepBase (RepBase16.02) database were separately analysed using repeatmasker (open-4.0.7; Bedell et al., 2000) to identify repeat sequences. Moreover, we used repeatproteinmask to identify TEs at the protein level. To specifically investigate the main contributors to the large king crab genome, we calculated the insertion times of different TEs based on the formula T = K/2R, where K is the Kimura value extracted from the annotation result, R is the evolution rate calculated by r8s (Sanderson, 2002) and T is the divergence time. Using repeatmasker, the TEs, including LINEs, SINEs, LTRs and DNA elements, were annotated. All annotated TEs were changed to fasta format with TE classification as repeatmodeler output consensus sequences. The newly combined consensus sequences were used as the de novo library for repeatmasker annotation and calculation of the Kimura value of each sequence. Divergence times of these species were calculated using r8s. Lastly, we used the Perl script parseRM.pl (downloaded from https://github.com/4urel iek/Parsi ng-Repea tMask er-Outputs) to analyse raw alignment outputs of repeatmasker with the parameters of "-l 300, 5 -age 30 -v. " We then masked all repetitive sequences, except for the tandem repeats, in the blue king crab genome for protein-coding gene annotation. First, we de novo predicted protein-coding genes using augustus (version 3.2.1; Stanke & Waack, 2003  "-evalue 1e-5 -outfmt 6." The results were then used for relationship determination using orthomcl (version 2.0.9; Li et al., 2003).

| Orthologous gene identification
Finally, 127 genes were found to have only one copy in each species and were thus considered as single-copy orthologous genes among these species.

| Phylogenetic relationships and divergence time estimation
The identified 127 single-copy orthologous genes among the 13  Li et al., 2003) were aligned using muscle (version 3.8.31; Edgar, 2004aEdgar, , 2004b) and concatenated to super-genes for phylogenetic tree and divergence time analyses. The phylogenetic relationships among species were determined using the concatenated single-copy genes with the maximum-likelihood model in raxml (version 8.2.10, see Supporting File for detailed parameters; Stamatakis, 2014). Divergence times among species were analysed using the fourfold degenerate synonymous site with the MCMCtree model in paml (version 4.8; Yang, 1997Yang, , 2007, with the fossil records downloaded from the TIMETREE database (www.timet ree.org) for calibration of results.

| Expansion and contraction analysis of gene family
All identified gene families determined by orthomcl were used as input data in cafe (version 3.1; Han et al., 2013). The phylogenetic relationship results determined by raxml and divergence times determined by mcmctree were also used for expansion/contraction analysis in cafe. To estimate the global error of the input data, we applied an error model before running the cafe program using the python script caferror.py in which the key parameter "lambda -s" automatically finds value(s) of lambda that maximize the log likelihood of the data for all families. Estimation of the global error was 6.84e-04, and the corresponding lambda value, which indicates the probability of both gene gain and loss per gene per unit time in the phylogeny, was 1.57e-03. Using this lambda value, we applied the main program of cafe for all input gene families as the final step, with these results used for later analyses.

| Relative evolution rate
The relative evolution rates of species were analysed by two methods. (a) The concatenated protein sequences were used for analysis by Tajima's relative rate test model in mega7 (Kumar et al., 2016).
The tpcv model in lintre (Takezaki et al., 1995) was used to test molecular evolution. For both methods, we evaluated the relative evolution rate between the blue king crab and other species, with spider as the outgroup.

| Functional enrichment analysis
Genes predicted in the blue king crab genome underwent functional annotation by alignment to the gene ontology (GO) and KEGG databases. Enrichment analysis was performed with gene ID of the blue king crab according to their homologous relationships. GO and KEGG enrichments were analysed by enrichgo, and R scripts, respectively (Beissbarth & Speed, 2004;Huang et al., 2009).

| Demographic history reconstruction
The population history of the blue king crab was inferred using the Pairwise Sequentially Markovian Coalescent (PSMC) model. The high-quality BGI short-insert-size reads were aligned to the blue king crab genome by bowtie2 (version 2.2.9; Langmead & Salzberg, 2012), and the alignment results were converted (samtools view -bS) into bam format and sorted (samtools sort) using samtools (version 1.3.1; . Single nucleotide polymorphisms (SNPs) were identified using bcftools (version 1.3.1; Narasimhan et al., 2016) with key parameters "-d 150 -q 20 -Q 20." The produced results were converted into the input format of PSMC using the vcfutils.pl Perl script (in bcftools). Population history was analysed by PSMC (0.6.5-r67; Li & Durbin, 2011) using generation time and mutation rate information. PSMC analysis was performed using the parameters "-N25 -r5 -p 4 + 25*2 + 4+6," where -N is the maximum number of iterations, -r is the initial theta/rho ratio and -p is the atomic time intervals.
Results were delivered to the plot script using the parameters "-g 2 -u 2.1442e-9 -Y 100," where -g is the number of years per generation and -u is the absolute mutation rate per nucleotide per generation.
We calculated the mutation rate using r8s (version 1.71) with the calibration time downloaded from the TIMETREE database (www. timet ree.org).

| Assembly, annotation and characteristics of the Paralithodes platypus genome
To resolve any difficulties that may arise during the genome assembly process, we employed genome survey analysis using ~447.62 Gb short-insert-size clean reads produced from the BGI platform (Table S1) before large-scale sequencing. Evaluation of genome characteristics indicated that the genome was ~5.49 Gb and showed extremely high heterozygosity ( Figure S2), thus suggesting a complex genome of blue king crab. To acquire a high-quality genome assembly,  Simao et al., 2015), as well as short reads and transcripts, were used for analysis. In total, more than 225 (74%) and 755 (77%) core eukaryote and metazoan genes, respectively, were successfully identified in the genome (Table 2), and assembly quality was comparable with that of several published species (Table S3). We then aligned all filtered short reads produced on the BGI-seq platform to the genome using bwa (BWA-MEM algorithm; , with more than 4,106 million reads (97.96%) mapped to the genome (Table S4). Moreover, we de novo assembled the transcripts with the 18.39 Gb of RNA-seq data using bridger (Chang et al., 2015; Tables S5 and S6). In total, 164,651 transcripts were acquired, with 158,958 (96.54%) mapped to the assembled genome (Table S7). We also found that the average GC content in the blue king crab (~41.57%) was similar to that in other species sampled here (33.62%-42.01%; Figure S3). Until now, no chromosome-level genome has been reported among Malacostraca species, except for Portunus trituberculatus, which is partially because the genome karyotypes of species in these taxa are extremely complex.
Thus, construction of a chromosome-level genome is crucial. Here, ~321.74 Gb of Hi-C reads (Table S8) were produced for chromosome construction using 3D de novo assembly (Dudchenko et al., 2017). A total of 104 chromosomes were anchored with a mounting rate of  Figure 1a). Therefore, to study genome expansion, we performed genome annotations in the blue king crab. We first annotated repetitive sequences in the blue king crab genome by both the de novo and homologue methods, resulting in the identification of ~3.71 Gb of repeat sequences, accounting for 77.73% of the assembled blue king crab genome (Table S10).
Among these repetitive sequences, LINEs accounted for the high-    (Table S11). We further investigated the expansion history of repetitive elements and found that the blue king crab experienced one expansion event within the last ~100 million years (Figure 1b), with both Gypsy and CR1 showing a sharp expansion within the most recent ~40 and ~20 million years, respectively ( Figure 1c). Therefore, these results indicate that LINE and LTR expansions were major contributors to the large genome size in the blue king crab, with expansion history and time shown in

| Phylogenetic relationship and evolutionary history of blue king crab based on comparative genomic analysis
We used protein-coding gene annotation and identified 28,287 high-quality protein-coding genes. To check the prediction quality of the protein-coding genes, we compared mRNA, coding sequence (CDS), exon and intron lengths and found that the quality of the predicted genes in the blue king crab was comparable with that of other species examined (Figure 2a). Gene functional annotation was employed using the InterPro, SwissProt, TrEMBL, GO, KEGG and NR databases. Results showed that most of the predicted genes had functional annotations in these databases (  (Li et al., 2003). As a result, 15,990 gene families were constructed, with 127 single-copy genes identified among the 13 species (Figure 2b; Table S12). Previous nuclear gene studies have indicated that the blue king crab has a very close phylogenetic relationship with true crabs (Bracken-Grissom et al., 2013;Tan et al., 2018). However, whether whole-genome data show the same results remains unclear. Here, we aligned all singlecopy genes among the 13 species using muscle (version 3.8.31; Edgar, 2004aEdgar, , 2004b, with protein alignments, and then concatenated to supergenes for phylogenetic analysis by raxml (-m PROTGAMMAAUTO; version 8.2.10; Stamatakis, 2014) and with TA B L E 4 Statistics of de novo annotated repeat sequences in the blue king crab genome    Yang, 1997Yang, , 2007. Results indicated that the blue king crab and two true crabs (i.e., Chinese mitten crab and swimming crab) diverged ~272.5 Ma. Furthermore, these three crabs diverged from shrimp ~301.7 Ma, Peracarida (H. azteca) diverged from Eucarida 328.5 Ma, and Malacostraca diverged from Maxillopoda 452.6 Ma ( Figure 3). Population history analysis indicated that the effective population size of the blue king crab declined ~1 Ma, but then began to expand ~100,000 years ago, suggesting a possible increase in genetic diversity and adaptation (Figure 4). Relative evolution rate analysis can clarify the historical status and the rate of molecular evolution of a species. We analysed the blue king crab and other examined species and found that the blue king crab has a relatively faster evolution rate than the swimming crab and Chinese mitten crab ( Figure 5; Tables S13 and S14).

| Specific/expanded gene families facilitate development of unique features and physiology
Using the clustering results of these species, we performed gene family analysis. We found 1,866 gene families that commonly existed in the 13 species (Figure 6a). In addition, 2,613 gene families existed in all five Malacostraca species (H. azteca, Portunus trituberculatus, Eriocheir japonica sinensis, Paralithodes platypus and Penaeus vannamei; Figure 6a), and 20 GO terms, such as carbohydrate metabolic process (p = 6.45e-04) and primary metabolic process (p = 2.11e-04), and 86 KEGG terms, such as glutathione metabolism (p = 6.10e-06), retinol metabolism (p = 4.15e-04), sphingolipid metabolism (p = 9.17e-05) and fatty acid metabolism (p = 1.32e-02), were functionally enriched (Tables S15 and S16). Among the five Malacostraca F I G U R E 4 Population history analysis of the blue king crab. This figure reflects changes in effective population size over time. Parameter "g" represents generation time, parameter "μ" represents mutation rate per generation and KYA represents thousand years ago  biosynthesis and metabolism processes of its unique body plan as well as in physiological processes (Figure 6b; Tables S17 and S18).
To investigate the expanded gene families in the above species, we employed expansion and contraction analysis of gene families using cafe (Han et al., 2013), and found that several gene families were changed in the five Malacostraca species (Tables S19-S22). We further studied whether the blue king crab showed unique changes in gene families and identified 98 expanded gene families and 113 contracted gene families in this species (p < .01). Functional enrichment analysis of these families (Tables S23-S26)

| D ISCUSS I ON
The blue king crab has great economic value and occupies an important evolutionary position. Genomic information regarding this species could help better understand its rapid reproductive rate as well as its unique body plan and environmental adaptation. Here, based on various sequencing platforms, we generated different sequencing data, including ~447.62 Gb of BGI-seq short-insert reads, ~229.39 Gb of PacBio long reads, ~321.74 Gb of Hi-C sequencing reads and ~18.39 Gb of transcriptomic data, to construct and annotate a chromosome-level reference genome of the blue king crab.
Using a hybrid assembly method combining wtdbg, racon, pilon and 3D de novo assembly, we successfully acquired a 4.77-Gb chromosome-level genome assembly of the blue king crab, which represents the first reference genome of Anomura (Figure 8; Tables 1 and 3; Tables S1, S2, S5 and S8). Genomic comparisons revealed this assembly to be the most complex chromosome-level genome assembly in Malacostraca to date, with successful anchoring of 104 chromosome sequences (Figure 8; Table 3). This number is consistent with chromosome results from karyotype analysis of another king crab species (Paralithodes camtschatica; Niiyama, 1935), further supporting the accurate chromosome number acquired in the current study.
Therefore, this genome assembly could act as a reference genome for studies on Malacostraca species. Our results also showed that the blue king crab had a large genome relative to the other arthropods examined. The main reason for genome expansion in the blue king crab was the expansion of LINE and LTR repetitive elements ( Figure 1a,b). Further analysis indicated that CR1 and Gypsy accounted for the largest proportions in LINE and LTR types and sharply increased within the last ~20 and ~40 million years, respectively ( Figure 1c). These analyses will assist in our understanding of the expansion history of the large blue king crab genome.
Phylogenetic analysis using various high-quality genome assemblies of Crustacea and several other representative species showed that the blue king crab has a close relationship with true crabs, including the Chinese mitten crab and swimming crab, with divergence ~272.5 Ma (Figure 2a). Gene family analysis identified several expanded gene families in blue king crab, including those related to several signalling pathways (mTOR, Hedgehog and PI3K-Akt signalling pathways), biosynthesis processes (steroid biosynthesis and thyroid hormone synthesis), inflammatory reaction and homologous recombination (Figure 7; Table S24). These signalling pathways, especially the Hedgehog pathway, play several key conserved roles in early development and body plan formation of many species (Chuong et al., 2000). For example, previous studies of mice and zebrafish indicate that the Hedgehog pathway plays a key role in craniofacial development (Schwend & Ahlgren, 2009).
In Drosophila, the Hedgehog protein regulates development of the eye imaginal disc (Strutt & Mlodzik, 1996). In mammals, the Hedgehog signalling pathway plays a central role in gonadal (and thus sexual) development (Barsoum, 2009). Our results together with those of previous studies further suggest that related genes may have contributed to the development and formation of the unique body plan of the king crab. The blue king crab has a long and complex development process and is vulnerable to virus attack during its lengthy larval stages. Interestingly, our comparative genomic analysis found that gene families involved in inflammatory reactions were expanded in the blue king crab, which may partly explain the strong environmental adaptation ability of king crabs. Like marine invertebrates, and indeed all marine animals, blue king crabs are constantly exposed to high concentrations of microorganisms.
However, they lack the elements of adaptive immunity, such as T-cells and B-cells, and primarily depend on other immune factors, such as antimicrobial peptides, to combat invading pathogens (Moe et al., 2018). Therefore, the expansion of immune-related genes may help king crabs avoid injury and invasion from pathogens, and thus better adapt to the marine environment. We also found that related homologous recombination genes were expanded in the king crab, suggesting that this species could quickly repair damaged genomic DNA via homologous recombination. Together, the genome assembly and transcriptomic data generated in this study will provide valuable resources for research on the developmental processes and adaptive evolution of the blue king crab.

ACK N OWLED G EM ENTS
We thank Nextomics Biosciences Co., Ltd (Wuhan, China) for assisting with library preparation and genome sequencing. This work was partly supported by the the National Natural Science Foundation of

DATA AVA I L A B I L I T Y S TAT E M E N T
Raw sequencing data and the genome assembly have been deposited in the National Center for Biotechnology Information (NCBI) database with accession no. PRJNA555178. The genome assembly has also been deposited in the Dryad database (https://datad ryad. org/stash/ datas et/doi:10.5061/dryad.jm63x sj6c).