The complete genome sequence of Exiguobacterium arabatum W‐01 reveals potential probiotic functions

Abstract Shrimp is extensively cultured worldwide. Shrimp farming is suffering from a variety of diseases. Probiotics are considered to be one of the effective methods to prevent and cure shrimp diseases. Exiguobacterium arabatum W‐01, a gram‐positive and orange‐pigmented bacterium, was isolated from the intestine of a healthy Penaeus vannamei specimen. Whole‐genome sequencing revealed a genome of 2,914,854 bp, with 48.02% GC content. In total, 3,083 open reading frames (ORFs) were identified, with an average length of 843.98 bp and a mean GC content of 48.11%, accounting for 89.27% of the genome. Among these ORFs, 2,884 (93.5%) genes were classified into Clusters of Orthologous Groups (COG) families comprising 21 functional categories, and 1,650 ORFs were classified into 83 functional Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. A total of 27 rRNA operons and 68 tRNAs were identified, with all 20 amino acids represented. In addition, 91 genomic islands, 68 potential prophages, and 33 tandem repeats, but no clustered regularly interspaced short palindromic repeats (CRISPRs), were found. No resistance genes and only one virulence gene were identified. Among the 150 secreted proteins of E. arabatum W‐01, a variety of transport system substrate‐binding proteins, enzymes, and biosynthetic proteins, which play important roles in the uptake and metabolism of nutrients, were found. Two adherence‐related protein genes and 31 flagellum‐related protein genes were also identified. Taken together, these results indicate potential probiotic functions for E. arabatum W‐01.


| Isolation and purification of bacterial strain
Ten healthy P. vannamei individuals were obtained from a farmer's market. The intestines were removed, transferred to a sterilized Petri dish, and washed three times with sterilized PBS.
After the washed intestines were homogenized in sterilized PBS, 0.2 ml of homogenate was spread on a tryptic soy agar (TSA) plate and incubated at 28°C for 24 hr. Several single colonies of the dominant colony type were selected and streaked three times in succession to obtain a pure culture.

| The identification of bacterial strain
The bacterial strain was identified by amplifying and sequencing the 16S rRNA gene. Genomic DNA was extracted with a DNA extraction kit (Takara, Japan) following the manufacturer's instructions. DNA integrity was checked by electrophoresis on a 1% agarose gel, and yield and purity were quantified using an ND-2000 NanoDrop UV spectrophotometer (NanoDrop Technologies). Primers 27F (5′ AGAGTTTGATCCTGGCTCAG 3′) and 1492R (5′ GGTTACCTTGTTACGACTT 3′) were used in the polymerase chain reaction (PCR) under the following conditions: 5 min at 94°C; 35 cycles of 15 s at 94°C (denaturation), 15 s at 55°C (annealing), and 1 min at 72°C (extension); and 10 min at 72°C (final extension), using an ABI 2720 Thermal Cycler (Applied Biosystems).
The PCR product was purified and sequenced. The sequence was compared against sequences in the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov).

| High-density pyrosequencing and sequence assembly of the genome
Whole-genome sequencing was performed on the PacBio RS II platform using a 10-kb library. After quality control, a fine-scale genome map was completed through bioinformatic analysis. Sequencing data were self-corrected using FastqToCA and assembled according to principles similar to those used for first-generation sequencing data. Continuous long reads were obtained from three Single-Molecule, Real-Time (SMRT) sequencing runs; reads longer than 500 bp with a quality value over 0.80 were merged into a single dataset. Next, the PBcR pipeline was used to correct random errors. The longest 25X subset of the corrected data was used for de novo assembly with Celera Assembler, which employs an overlap-layout-consensus (OLC) strategy, using default parameters. The chromosome atlas was drawn with circos-0.69 (Wong, 2012).
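The read selection described above (keep reads longer than 500 bp with a quality value over 0.80, then take the longest reads up to 25X coverage) can be illustrated with a minimal Python sketch. This is not the PBcR or Celera Assembler code; the function names, the tuple layout, and the toy reads are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical read record: (read_id, length_bp, read_quality)
Read = Tuple[str, int, float]


def filter_reads(reads: List[Read], min_length: int = 500,
                 min_quality: float = 0.80) -> List[Read]:
    """Keep reads longer than min_length with quality above min_quality."""
    return [r for r in reads if r[1] > min_length and r[2] > min_quality]


def longest_subset(reads: List[Read], genome_size: int,
                   target_coverage: float = 25.0) -> List[Read]:
    """Take the longest reads until their summed length reaches
    target_coverage x genome_size (or the reads run out)."""
    target_bases = target_coverage * genome_size
    selected: List[Read] = []
    total = 0
    for read in sorted(reads, key=lambda r: r[1], reverse=True):
        if total >= target_bases:
            break
        selected.append(read)
        total += read[1]
    return selected


# Toy data (not from the study): five reads, a 1,000 bp "genome", 2X target.
toy_reads = [
    ("a", 1200, 0.85),
    ("b", 400, 0.90),   # dropped: too short
    ("c", 900, 0.82),
    ("d", 700, 0.79),   # dropped: quality too low
    ("e", 600, 0.88),
]
kept = filter_reads(toy_reads)
subset = longest_subset(kept, genome_size=1000, target_coverage=2.0)
print([r[0] for r in kept])    # ['a', 'c', 'e']
print([r[0] for r in subset])  # ['a', 'c']
```

Sorting by length before accumulating bases is what makes the subset "the longest 25X"; the real pipeline may apply additional criteria that are not described in the text.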
RepeatMasker software and the Repbase database (Horvath & Barrangou, 2010) were used to annotate repeat sequences.
For VFDB screening, virulence genes were annotated by comparison against the database according to threshold criteria. An E-value cutoff of < 1e-5 was used for BLASTn searches.
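As an illustration of the E-value threshold above, the following sketch filters BLAST hits in the standard 12-column tabular format (`-outfmt 6`), where the E-value is the 11th column. The use of tabular output, the function name, and the example identifiers are assumptions; the text does not state how the hits were post-processed.

```python
from typing import Iterable, List, Tuple


def filter_hits(lines: Iterable[str],
                max_evalue: float = 1e-5) -> List[Tuple[str, str, float]]:
    """Return (query_id, subject_id, evalue) for hits with evalue < max_evalue.

    Expects BLAST tabular output (-outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits


# Hypothetical hits (identifiers and values are placeholders, not study data).
example = [
    "orf1\tVF_a\t98.5\t300\t4\t0\t1\t300\t1\t300\t2e-30\t550",
    "orf2\tVF_b\t35.0\t60\t39\t0\t1\t60\t1\t60\t0.003\t28.1",
]
print(filter_hits(example))  # only the orf1 hit passes the 1e-5 cutoff
```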

| Genome features
A genome size of 2,914,854 bp with a GC content of 48.02% was determined for E. arabatum W-01 using PacBio RS II sequencing technology (Figure 1). One scaffold of 2,914,854 bp was obtained without gaps. [...] however, significantly fewer sequences were observed at sizes of 2,500-2,599, 2,700-2,799, 2,800-2,899, and 2,900-2,999 (Figure 2).

| Gene annotation
Gene annotation analysis showed that of the 3,083 ORFs, 2,884 (93.5%) could be classified into COG families comprising 21 functional categories, among which general function prediction only, function unknown, and transcription were the most abundant (Figure 3). In addition, 1,650 genes were classified into 83 functional KEGG pathways (Table S1) (Table S2). Sixty-eight tRNAs representing all 20 amino acids were also found (Table S3).
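The reported percentages can be reproduced from the counts given in the abstract and in this section. The short check below is only a worked calculation; it assumes that coding density is approximated as ORF count times mean ORF length divided by genome size, which counts any overlapping ORF regions twice.

```python
# Values as reported in the article.
genome_size_bp = 2_914_854
orf_count = 3_083
mean_orf_length_bp = 843.98
cog_classified = 2_884

# Approximate coding density: total ORF length over genome length.
coding_bp = orf_count * mean_orf_length_bp
coding_fraction = coding_bp / genome_size_bp

# Share of ORFs assigned to a COG family.
cog_fraction = cog_classified / orf_count

print(f"coding density: {coding_fraction:.2%}")  # 89.27%
print(f"COG-classified: {cog_fraction:.1%}")     # 93.5%
```

Both values agree with the figures stated in the text (89.27% of the genome and 93.5% of ORFs).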

| Genomic islands
A GEI sequencing strategy has been developed to improve the time efficiency of target gene sequence analysis. Ninety-one genes in GEIs were predicted to have sequence similarity to previously identified genes from different species, such as genes involved in iron transport in Bacillus cereus and a variety of metabolism-related enzymes in Paenibacillus mucilaginosus (Table S4, Figure 4). COG, Nr, and KEGG database analyses resulted in the annotation of two adherence-related and 31 flagellum-related protein genes. The former were mainly annotated as adhesion lipoprotein, adhesion, and periplasmic component/surface adhesion, and the latter as flagellar basal body rod protein, flagellum site-determining protein, flagellar M-ring protein, flagellar hook-associated protein, flagellar assembly factor, and flagellum-specific ATP synthase. The circle in Figure 4 represents a single chromosome, with red bars around the perimeter indicating all of the predicted GEI locations across the three methods; within the circle, GEI predictions are differentiated by prediction method.

FIGURE 1 Chromosome Atlas for E. arabatum W-01. The scale is shown by the outer black circle. Moving inward, the first and second circles illustrate predicted coding sequences on the positive and negative strand, respectively, and are colored according to different functional categories. The third circle represents tRNAs (blue) and ribosomal RNA genes (red). The fourth and fifth (innermost) circles represent the mean-centered G+C content of the genome (red: above mean; blue: below mean) and GC skew (G-C)/(G+C), respectively. The data were calculated using a 1 kb window in 500 bp steps.

TABLE 1 Statistics of assembly results

Statistics Scaffold
Total number 1
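The G+C content and GC skew tracks described in the Figure 1 caption (1 kb window, 500 bp steps, skew defined as (G-C)/(G+C)) can be computed with a simple sliding window. The sketch below is an illustration under stated assumptions, not the code used to produce the figure: the function names are hypothetical, windows are not wrapped around the circular chromosome, and windows with no G or C are given a skew of 0.

```python
from typing import Iterator, List, Tuple


def gc_windows(sequence: str, window: int = 1000,
               step: int = 500) -> Iterator[Tuple[int, float, float]]:
    """Yield (start, gc_fraction, gc_skew) for each full window.

    gc_skew is (G - C) / (G + C); it is set to 0.0 when the window
    contains neither G nor C.
    """
    seq = sequence.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_fraction = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_fraction, gc_skew


def mean_centered(values: List[float]) -> List[float]:
    """Subtract the mean so values above/below average are positive/negative."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Toy sequence with a small window so the result is easy to follow.
toy = "GGGCATATCCCG"
rows = list(gc_windows(toy, window=4, step=2))
print(rows)
print(mean_centered([gc for _, gc, _ in rows]))
```

Mean-centering the per-window G+C values corresponds to the red (above mean) and blue (below mean) coloring described in the caption.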

| IS elements
In total, 26 IS elements were identified in the genome (Table S5), 25 of which are located in predicted GEI sequences (Table S6).

FIGURE 2 Coding sequence length distribution of E. arabatum W-01
FIGURE 3 COG functional classifications of E. arabatum W-01 coding sequences

| Repeat sequence, prophage, and CRISPR elements
A total of 33 transposable repeat sequences were found, including 14 short interspersed nuclear elements, 18 long interspersed nuclear elements, and one long terminal repeat (LTR) element. A total of 68 potential prophages were predicted using Phage-finder software, and the proteins encoded by eight were annotated as two large terminases and tails, two baseplates, one small terminase, one portal, one lytic enzyme, and one tyrosine recombinase. No CRISPR element was predicted using CRISPR Finder software.
Among the secreted proteins, transport system substrate-binding proteins were the most abundant, including glucose uptake protein, iron ABC transporter substrate-binding protein, D-methionine transport system substrate-binding protein, phosphate transport system substrate-binding protein, peptide/nickel transport system substrate-binding protein, polar amino acid transport system substrate-binding protein, and maltose/maltodextrin transport system substrate-binding protein. This group was followed in abundance by a variety of ectoenzymes, such as alpha-amylase, N-acetylmuramoyl-L-alanine amidase, beta-N-acetylhexosaminidase, and subtilisin. clpC (endopeptidase Clp ATP-binding chain C) was the only hypothetical virulence gene found. No resistance genes were revealed by these analyses.

| Nucleotide sequence accession numbers
The complete genomic sequence of E. arabatum W-01 has been deposited in the GenBank database under accession number SRP064228.

| DISCUSSION
The assembly, consisting of one scaffold with no gaps, indicated that the entire genome was covered. The total assembled length was 2,914,854 bp, similar to the sizes of 13 reference Exiguobacterium genomes (2.9-3.2 Mbp). This similarity in genome size to other members of the genus supports the completeness of our genome sequence.
Based on previous reports, E. arabatum W-01 was considered to be a potential probiotic for P. vannamei (Shi et al., 2015). The main mechanism by which probiotics inhibit disease is the secretion of antagonistic substances or competition with pathogens for adhesion sites or nutrients, thereby inhibiting pathogen growth and reproduction (Gatesoupe, 1999). For example, Pseudomonas fluorescens Ah2 inhibits the growth of Vibrio anguillarum by competing for free iron ions via siderophore secretion (Gram et al., 1999). Siderophores are ecologically important because they enable the absorption of nutrients from the environment and deprive competitors of those nutrients (Li et al., 2011). The secreted proteins encoded by the E. arabatum W-01 genome largely comprise a variety of transport system substrate-binding proteins, which are related to the uptake of glycerol, sugar, phosphate, oligopeptides, and iron. These proteins would not only contribute to the growth of E. arabatum W-01 but would also allow it to compete with intestinal pathogenic bacteria. Within the same ecosystem, competition for nutrients and energy among different microbial populations plays an important role in the function of intestinal probiotics.
Many enzymes and biosynthetic proteins (Table S7), which play an important role in the metabolism of proteins and starches, were found among the secreted proteins of E. arabatum W-01. Indeed, digestive enzymes, such as alpha-amylase and subtilisin, effectively improve feed utilization and provide nutrition for growth. Poly-gamma-glutamate synthesis protein, one such secreted protein, plays an important role in the biosynthesis of glutamate, which can be utilized by the host.
FIGURE 4 Genomic island distribution of E. arabatum W-01. GEIs were predicted using IslandViewer. The circle represents a single chromosome, with red bars around the perimeter indicating the locations of all GEI predictions across the three methods. Within the circle, GEI predictions are differentiated by prediction method with IslandPath-DIMOB (blue), SIGI-HMM (orange), and IslandPick (green), all shown