Genomic characterization of Nocardia seriolae strains isolated from diseased fish

Abstract Members of the genus Nocardia are widespread in diverse environments; a wide range of Nocardia species are known to cause nocardiosis in several animals, including cats, dogs, fish, and humans. Of the pathogenic Nocardia species, N. seriolae is known to cause disease in cultured fish, resulting in major economic loss. We isolated two N. seriolae strains, CK-14008 and EM150506, from diseased fish and sequenced their genomes using the PacBio sequencing platform. To identify their genomic features, we compared their genomes with those of other Nocardia species. Phylogenetic analysis showed that N. seriolae shares a common ancestor with a putative human pathogenic Nocardia species. Moreover, N. seriolae strains were phylogenetically divided into four clusters according to host fish families. Through genome comparison, we observed that the putative pathogenic Nocardia strains had additional genes for iron acquisition. Dozens of antibiotic resistance genes were detected in the genomes of N. seriolae strains; most of the corresponding antibiotics inhibit the biosynthesis of proteins or cell walls. Our results demonstrated the virulence features and antibiotic resistance of fish pathogenic N. seriolae strains at the genomic level. These results may be useful to develop strategies for the prevention of fish nocardiosis.

Recently, we isolated two N. seriolae strains, CK-14008 and EM150506, from diseased Channa argus (Northern snakehead) and Anguilla japonica (Japanese eel), respectively. To reveal the putative virulence factors and the genomic features of N. seriolae strains, we determined their genome sequences using the PacBio sequencing platform and compared them with the genomes of phylogenetically close Nocardia species and other N. seriolae strains.

| Bacterial isolation, cultivation, and identification
Diseased snakehead (C. argus) and Japanese eel (A. japonica), which exhibited lethargy and skin ulcers, were reported in 2014 (Busan, Korea) and 2015 (Gimcheon, Gyeongsangbuk-do, Korea), respectively. Diseased fish samples were collected in ice-cooled boxes and transported directly to the laboratory of the Korean National Institute of Fisheries Science for further diagnosis. Several swabs from the kidney, spleen, and liver of diseased fish were streaked on tryptic soy agar (TSA) and brain heart infusion agar (BHIA) plates and incubated at 25°C for 2 weeks. Two isolates, CK-14008 and EM150506, were selected and cultured in tryptic soy broth medium at 25°C for 7 days under constant shaking at 100 rpm to obtain cell mass for DNA extraction. The preparation of genomic DNA and PCR amplification of the 16S rRNA gene were carried out following Chun and Goodfellow (1995) for the identification of the two isolates.

| Genome sequencing, assembly and annotation
Genomic DNA from N. seriolae CK-14008 and EM150506 was extracted using the Qiagen DNeasy Blood & Tissue kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. For genome sequencing, a 20-kb PacBio SMRTbell library was prepared for each genome, and each library was sequenced on a PacBio RS II (Pacific Biosciences, Menlo Park, CA, USA) with P6 polymerase and C4 chemistry, using one single-molecule real-time (SMRT) cell per genome.
For comparison between N. seriolae strains, the genomes of the strains N-2927 (Imajoh et al., 2015), U-1 (Imajoh et al., 2016), and ZJ0503 (Xia et al., 2015) were downloaded from EzBioCloud, and the nucleotide sequences of strain SY-24 and of strain UTF1 (Yasuike et al., 2017) were downloaded from the NCBI genome database. For consistency of gene prediction and functional annotation of the analyzed genomes, the genome sequences of the strains SY-24 and UTF1 were uploaded to the Whole Genome (WG) pipeline of BIOiPLUG (https://www.bioiplug.com, ChunLab Inc., Seoul, Republic of Korea) and quality-controlled genome information was obtained.
To understand the phylogenetic relationships of the analyzed strains, OrthoANI values were calculated and UPGMA dendrograms were generated using the Orthologous ANI Tool (OAT) of ChunLab (Lee, Kim, Park, & Chun, 2015). For synteny analysis of the genomes at the nucleotide level, Blast Ring Image Generator (BRIG) was used with the default parameters (Alikhan, Petty, Ben Zakour, & Beatson, 2011).
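The clustering step behind a UPGMA dendrogram can be illustrated with a minimal Python sketch. It assumes that pairwise OrthoANI values are already available and converts them to distances as 100 minus ANI. The strain labels "A" to "C" and the ANI values are hypothetical placeholders, not results of this study, and the function is a generic UPGMA implementation rather than the code used by OAT.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA tree (nested tuples) from pairwise distances.

    dist maps frozenset({a, b}) to the distance between labels a and b.
    """
    # Current clusters: tree (label or nested tuple) -> number of leaves.
    sizes = {label: 1 for label in labels}
    d = dict(dist)
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to each remaining cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical OrthoANI values (%), for illustration only.
ani = {("A", "B"): 99.9, ("A", "C"): 85.0, ("B", "C"): 85.2}
dist = {frozenset(pair): 100.0 - value for pair, value in ani.items()}
tree = upgma(["A", "B", "C"], dist)
print(tree)
```

In this toy input, the two genomes with the highest ANI are joined first and the third genome is attached to that pair.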
Analysis of the pan-genome and core genome was conducted using the Comparative Genomics (CG) pipeline of BIOiPLUG Apps (https://www.bioiplug.com/apps, ChunLab Inc.). Pan-genome orthologous groups (POGs) were determined by a combined reciprocal best hit (RBH) method using uBLAST with an E-value threshold of 1 × 10⁻⁶ (Ward & Moreno-Hagelsieb, 2014) and an open reading frame (ORF)-independent method using nucleotide sequences with cutoff values of at least 70% of gene coverage (Chun et al., 2009). A plot for the pan-genome and core genome sizes, Venn diagrams for the numbers of orthologous genes, and a heat-map for the presence and absence of the genes were generated using the CG pipeline of BIOiPLUG in ChunLab.
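The reciprocal best hit criterion can be sketched in a few lines of Python. This is a simplified illustration, not the BIOiPLUG pipeline: it assumes that similarity search results for each direction are already available as (query, subject, E-value, bit score) tuples, and the gene names and scores below are hypothetical.

```python
def best_hits(hits, max_evalue=1e-6):
    """Return {query: subject}, keeping the highest-bit-score hit per query.

    hits is an iterable of (query, subject, evalue, bitscore) tuples.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard hits weaker than the E-value threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-6):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical search results between genome A and genome B.
a_vs_b = [
    ("a1", "b1", 1e-50, 200.0),
    ("a1", "b2", 1e-10, 60.0),
    ("a2", "b2", 1e-3, 30.0),  # fails the E-value threshold
]
b_vs_a = [
    ("b1", "a1", 1e-48, 195.0),
    ("b2", "a1", 1e-10, 60.0),
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Only a1 and b1 are reported here, because b2 points back to a1 while a1 prefers b1.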
Analysis of the secondary metabolite biosynthetic gene clusters and antibiotic resistance genes was conducted using the antiSMASH webserver (Blin et al., 2017) and the Antibiotic Resistance Genes Database (ARDB) webserver (Liu & Pop, 2009), respectively.
Investigation of the putative virulence genes was conducted by keyword searches of the annotated genes or by homology searches using amino acid sequences against the putative virulence proteins of N. farcinica (Ishikawa et al., 2004; Yasuike et al., 2017).

| General genomic features of N. seriolae isolates
To analyze the genomic features of Nocardia strains isolated from the diseased fish, we sequenced the genomes of N. seriolae CK-14008 and EM150506, isolated from diseased C. argus (Northern snakehead) and A. japonica (Japanese eel), respectively. The genome of strain EM150506 consisted of a complete circular chromosome, and the genome of strain CK-14008 consisted of a complete circular chromosome with two incomplete plasmids.
Both strains had an 8.3-Mb chromosome with a G+C content of 68.1%. The general genomic characteristics of the two strains are described in Table 1.
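For reference, G+C content is the percentage of G and C bases among all unambiguous bases of a sequence. A minimal Python sketch is given below; the toy sequence is a hypothetical example, not part of either genome.

```python
def gc_content(sequence):
    """Return G+C content (%) over unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    # Ambiguous bases such as N are ignored; empty input yields 0.0.
    return 100.0 * gc / acgt if acgt else 0.0


# Hypothetical toy sequence with 8 G/C bases out of 10.
print(gc_content("GCGCATGCCG"))
```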

| Phylogenetic relationships and genomic features of Nocardia bacteria
To identify the genomic features of the N. seriolae strains isolated from diseased fish, we compared their genomes with the genomes of Nocardia species isolated from human medical samples and natural sources. The general features of the analyzed Nocardia genomes are described in Supporting Information Table S1. A phylogenetic tree of eight Nocardia species based on the OrthoANI algorithm (Yoon et al., 2017) showed that the analyzed Nocardia species could be divided into two clades (Figure 1a). One clade contains the species isolated from the diseased fish (CK-14008) and human medical samples (NBRC 100128, NBRC 100130, NBRC 100131, and NBRC 100430), while the other clade contains the species isolated from soil (NBRC 103114 and NBRC 108247) and diseased oysters (NBRC 100342). These phylogenomic relationships were consistent with the results of previous studies (Tamura et al., 2012, 2018). The distributions of the genes assigned to clusters of orthologous groups (COG) showed that, apart from the genes assigned to the COG category "general function prediction only", the genes associated with the COG categories "transcription" and "amino acid transport and metabolism" were the most abundant in the genomes of Nocardia species (Supporting Information Table S2).
The relative abundances of the genes assigned to each COG category were highly similar among Nocardia species. However, in the genomes of the N. seriolae strains, the relative abundances of the genes assigned to the COG categories "transcription" and "secondary metabolites biosynthesis, transport, and catabolism" were approximately 1% lower than in other species. Of the COG assigned genes, the genes assigned to the COG category "replication, recombination, and repair" were significantly higher in the genomes of N.
To investigate the metabolic features associated with host specificity, we compared the orthologous genes from four N. seriolae strains (Figure 3b), and analyzed the strain-specific genes in the seven N. seriolae strains (Figure 3c). In the genomes of the strains EM150506, CK-14008, SY-24, N-2927, U-1, UTF1, and ZJ0503, a total of 74, 35, 47, 5, 11, 13, and 9 genes were detected as strain-specific genes, respectively. However, as most of them encoded hypothetical proteins, it was difficult to identify the host-specific metabolic features of the strains through the comparison of orthologous genes.
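The notion of strain-specific genes used above can be expressed with simple set operations on a presence/absence table of orthologous groups. The following Python sketch is an illustration under that assumption; the strain names and POG identifiers are hypothetical and do not correspond to the data in Figure 3.

```python
def strain_specific(groups_by_strain):
    """Map each strain to the orthologous groups found in no other strain."""
    result = {}
    for strain, groups in groups_by_strain.items():
        others = set()
        for other, other_groups in groups_by_strain.items():
            if other != strain:
                others |= set(other_groups)
        result[strain] = set(groups) - others
    return result


def core_and_pan(groups_by_strain):
    """Return (core genome, pan-genome) as sets of orthologous groups."""
    sets = [set(groups) for groups in groups_by_strain.values()]
    return set.intersection(*sets), set.union(*sets)


# Hypothetical presence table: strain -> orthologous groups it carries.
toy = {
    "strain_1": {"POG1", "POG2", "POG3"},
    "strain_2": {"POG1", "POG2", "POG4"},
    "strain_3": {"POG1", "POG5"},
}
specific = strain_specific(toy)
core, pan = core_and_pan(toy)
print(specific, core, len(pan))
```

In this toy table, POG1 is the only core group and each strain carries exactly one strain-specific group.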

| Putative virulence-associated genes
Dozens of Nocardia species have been isolated from diseased hosts, and N. seriolae strains were mainly isolated from diseased fish. In the genomes of the analyzed strains, dozens of genes known to be candidate virulence factors in N. farcinica (Ishikawa et al., 2004; Yasuike et al., 2017) were detected using the parameters E ≤ 1 × 10⁻⁵ and ≥ 50% sequence identity (Table 2).
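The two cutoffs above (E ≤ 1 × 10⁻⁵ and ≥ 50% identity) can be applied to homology search output with a short filter. The Python sketch below assumes that the search results are available in the standard 12-column BLAST tabular layout (percent identity in column 3, E-value in column 11), which is an assumption about the file format and not a description of the tool used in this study. The query and subject names are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue=1e-5, min_identity=50.0):
    """Keep rows of 12-column BLAST-style tabular output passing both cutoffs."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity, evalue = float(row[2]), float(row[10])
        if evalue <= max_evalue and identity >= min_identity:
            kept.append((query, subject, identity, evalue))
    return kept


# Hypothetical hits against a set of candidate virulence proteins.
rows = [
    ["q1", "vf_A", "72.5", "300", "80", "2", "1", "300", "5", "304", "1e-90", "410"],
    ["q2", "vf_B", "41.0", "250", "140", "5", "1", "250", "3", "252", "1e-20", "95"],
    ["q3", "vf_C", "55.0", "120", "54", "0", "1", "120", "1", "120", "0.01", "30"],
]
example = "\n".join("\t".join(r) for r in rows)
hits = filter_hits(example)
print(hits)
```

Here q2 is rejected for low identity and q3 for a weak E-value, so only q1 is retained.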

| Secondary metabolite biosynthetic genes
Dozens of secondary metabolite biosynthetic genes were detected in the genomes of Nocardia strains, and were identified in 94 biosynthetic gene clusters (Supporting Information  Figure 4a). The biosynthetic gene cluster for nocobactin includes five core biosynthetic genes (OJF78308 to OJF78312); the biosynthetic modules for adenylation, acyltransferase, condensation, ketoreductase, ketoacyl synthase, peptidyl carrier protein, and thioesterase were detected in these core biosynthetic genes (Figure 4b).
Approximately 30-kb upstream of the nocobactin biosynthetic gene with the genes from Nocardia species (Figure 4c,d). Approximately 10-kb downstream of the mycobactin biosynthetic gene cluster, two large nonribosomal peptide synthetase genes containing adenylation, condensation, and peptidyl carrier modules were detected (OJF82997 and OJF82998). However, they showed no homology with previously known secondary metabolite biosynthetic genes.

| Antibiotic resistance genes
In the genomes of Nocardia strains, resistance genes against several kinds of antibiotics were detected with more than 30% similarity to previously known antibiotic resistance genes (Figure 5). According to Yasuike et al., the N. seriolae strains can be divided into two groups according to their α-glucosidase activity and susceptibility to erythromycin or oxytetracycline (Ismail, Takeshita, Umeda, Itami, & Yoshida, 2011; Yasuike et al., 2017). All N. seriolae strains analyzed in this study contained the gene encoding α-glucosidase, but did not have resistance genes against erythromycin or oxytetracycline. Of the detected antibiotic resistance genes, genes involved in resistance to macrolides were the most abundant in the genomes of Nocardia species, and the diversity of antibiotic resistance genes was the highest in the genomes of N. seriolae strains. In particular, the number of genes involved in resistance to vancomycin was approximately twofold higher in the genomes of N. seriolae strains than in those of other Nocardia species. Furthermore, the resistance genes against amikacin, cephalosporin, dibekacin, fluoroquinolone, isepamicin, netilmicin, sisomicin, streptomycin, tobramycin, and tobramycintilmicin were mainly detected in the genomes of N. seriolae strains.

| DISCUSSION
Nocardia strains are widespread in diverse habitats such as soil and water (Luo, Hiessl, & Steinbuchel, 2014). Some pathogenic Nocardia species causing nocardiosis have been detected in humans and in animals such as cats, dogs, and fish (Eroksuz et al., 2017; Harada et al., 2009; Kudo et al., 1988). Recently, the occurrence of nocardiosis in farmed fish has been increasing gradually because of the high environmental stress caused by the dense cultivation of fish and environmental conditions that favor pathogens, such as warming seawater (Le Roux et al., 2015; Pulkkinen et al., 2010). However, there are currently no treatment options to cure nocardiosis in fish; suppressing the growth of the pathogens by antibiotic treatment is the only viable method to prevent disease occurrence (Nayak & Nakanishi, 2016).
For several decades, the virulence factors of some pathogenic Nocardia species have been actively studied, and several virulence features were identified including invasion into the host cells, survival in the cells, and bacterial lytic activity (Beaman & Beaman, 1994).
The major virulence feature of pathogenic Nocardia species is the invasion of the bacterium into host cells including macrophages (Beaman & Beaman, 1994). For attachment and invasion into the host cells, the most well-known virulence factor in the genus Nocardia is the mammalian cell entry (Mce) family of proteins (Arruda, Bomfim, Knights, Huima-Byron, & Riley, 1993; Yasuike et al., 2017). In our study, eight to 12 copies of mce operons were detected in the ge-
In host cells, especially macrophages, invading pathogens have to defend against reactive oxygen species produced by the defense responses of the host cells (Fang, 2004).

FIGURE 4 Nocobactin and mycobactin biosynthetic gene clusters in the genome of Nocardia seriolae CK-14008. (a) Genomic region around the nocobactin biosynthetic gene cluster. Red-colored genes (OJF78308 to OJF78312) indicate the core nocobactin biosynthetic genes. The core biosynthetic genes (OJF78260 to OJF83928) located upstream of the nocobactin biosynthetic genes showed high homology with laspartomycin biosynthetic genes. This gene structure was highly conserved in the genomes of N. seriolae EM150506 and UTF1, which are fully sequenced N. seriolae genomes. The coding DNA sequence (CDS) numbers follow the protein ID from GenBank. (b) Structure of the biosynthetic modules detected in the core biosynthetic genes of nocobactin (OJF78308 to OJF78312) and laspartomycin (OJF78260 to OJF83928). (c) Comparison of the mycobactin biosynthetic gene cluster between N. seriolae and Mycobacterium tuberculosis. Of the mycobactin biosynthetic genes in M. tuberculosis, only three genes (gray shadows) showed homology with the N. seriolae genes. Percentages indicate the sequence identity between two genes. Downstream of the mycobactin biosynthetic gene cluster of N. seriolae, additional large secondary metabolite biosynthetic genes were detected (OJF82997 and OJF82998), but they showed no homology with previously known secondary metabolite biosynthetic genes. This gene structure was highly conserved in the genomes of N. seriolae EM150506 and UTF1. The CDS numbers follow the protein ID from GenBank.

ACKNOWLEDGMENTS
This work was financially supported by the National Institute of Fisheries Science, Republic of Korea (R2018062).

CONFLICT OF INTEREST
The authors declare there are no conflicts of interest.