Phylogenic position and low genomic diversity of “Candidatus Rickettsia kotlanii” inferred by complete genome sequences of two Japanese isolates

Many Rickettsia species of the spotted fever group (SFG) cause tick‐borne diseases known as “spotted fever.” One of the candidate SFG Rickettsia species is “Candidatus Rickettsia kotlanii,” which was first detected in Haemaphysalis concinna in Hungary in 2006. However, its precise phylogenetic position in the SFG is not clear because only single‐gene sequence–based phylogenetic analyses were performed using very limited genes. Here, we present the complete genome sequences of two Japanese “Ca. R. kotlanii” isolates, which differed only by a 135 bp insertion/deletion (InDel). Using these genomes and publicly available whole genome sequences of other Rickettsia species, the precise phylogenetic position of “Ca. R. kotlanii” in Rickettsia was determined to be in a clade of the SFG. The phylogenetic relationships and average nucleotide identity of “Ca. R. kotlanii” relative to the other species indicated that “Ca. R. kotlanii” is an independent taxon in the SFG. Notably, although the genomes of the two isolates were almost identical, the isolates were obtained from different tick species in different regions and years, suggesting extremely low genomic diversity in “Ca. R. kotlanii.” While the genome of “Ca. R. kotlanii” is the smallest in the transitional group and SFG Rickettsia sequenced to date, we identified genes uniquely present or absent in “Ca. R. kotlanii,” but most were apparently degraded. Therefore, analyses of differences at the sequence (single nucleotide polymorphisms and small InDels) or gene expression level will be required to understand the functional or physiological features unique to “Ca. R. kotlanii.”


INTRODUCTION
Rickettsia are Gram-negative bacteria belonging to the class Alphaproteobacteria with obligate intracellular life cycles between mammalian hosts and hematophagous arthropod vectors and within nonhematophagous arthropods in nature. They are divided into four groups according to their phylogeny: the spotted fever group (SFG), the transitional group (TRG), the typhus group (TG), and the ancestral group. 1,2 Among these, the SFG Rickettsia are found worldwide, infecting a wide array of wild and domestic vertebrates mostly through tick bites. The SFG Rickettsia include more than 25 validated species, with additional new species or subspecies being proposed. 1 One of the candidate SFG species is "Candidatus Rickettsia kotlanii," which was detected in Haemaphysalis concinna in Hungary in 2006 and 2010. 3,4 The first isolate of this organism, named HM-2, from Haemaphysalis megaspinosa was reported in 2014 in Japan, 5 while its detection in vertebrates has been reported only recently, specifically in Polish raccoons. 6 Although phylogenetic analyses suggested that "Ca. R. kotlanii" belongs to the SFG, its precise phylogenetic position in the group is not clear because, in previous studies, only single-gene sequence-based phylogenetic analyses were performed using very limited genes. 4,5 Here, we present the complete genome sequences of two Japanese "Ca. R. kotlanii" isolates, which differed only by a 135 bp insertion/deletion (InDel). Using these genomes and publicly available whole genome sequences (WGSs) of other Rickettsia species, we showed the precise phylogenetic position of "Ca. R. kotlanii" in Rickettsia and that it has the smallest genomes among the Rickettsia species sequenced thus far, except for the TG Rickettsia species. Through interspecies genomic comparison, genes specifically present in or absent from "Ca. R. kotlanii" were also identified.

Bacterial isolates, cultivation, and genomic DNA preparation
The strain information for the two "Ca. R. kotlanii" isolates analyzed in this study, HM-2 and FLA-4, is shown in Table 1 To prepare genomic DNA for WGS determination, rickettsial isolates were inoculated into L929 cells, incubated in Eagle's minimal essential medium (Sigma-Aldrich, MA) at 34°C with 5% CO 2 for 7-10 days, and collected as previously described 7 except without EDTA treatment. Genomic DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen, Venlo, Netherlands) according to the manufacturer's instructions.

Genome sequencing and analysis
Genomic DNA libraries for Illumina sequencing were prepared using the NEBNext Ultra II FS DNA Library Preparation Kit (New England Biolabs, MA) and sequenced on the Illumina MiSeq platform (Illumina, CA) to generate paired-end sequence reads (301 bp x2; 1,298,728 reads for HM-2 and 2,488,346 reads for FLA-4). Illumina reads were quality controlled by trimming low-quality bases and adapter sequences using Trimmomatic version 0.39 integrated into KneadData version 0.10.0 (https://bitbucket.org/ biobakery/kneaddata). 8 Illumina reads derived from L929 cells were removed by mapping to the mouse reference genome (GRCm39) using Bowtie2 version 2.4.5 9 in KneadData, resulting in 747,304 (59.6%) and 768,756 (31.5%) nonmouse-derived reads for HM-2 and FLA-4, respectively. Genome assembly was performed using MetaPlatanus version 1.2.2, 10 yielding a single contig for each isolate. Closed genome sequences were obtained by capillary sequencing of PCR-amplified genomic regions between contig ends (Supporting Information: see Table S1 for primer sequences). Quality assessment of the assemblies, gene annotation, and sequence comparison of the two genomes were performed using CheckM version 1.1.3, 11 dfast version 1.2.6, 12 and nucdiff version 2.0.3, 13 respectively. The lists of other Rickettsia strains analyzed in this study are provided in Supporting Information: Table S2. The assembly levels of their genomes were referred from the NCBI Assembly (https://www.ncbi.nlm.nih.gov/assembly). Core genes were determined by Roary 14 using an amino acid sequence identity threshold of 80%. Phylogenetic trees based on core gene sequences were constructed using RAxML-NG version 1.0.1 15 with the best model inferred by ModelTest-NG version 0.1.6 16 and 200 bootstrap replicates. Average nucleotide identity (ANI) values were calculated using PYANI version 0.2.10. 17 The closed chromosome sequences were rotated using Circlator version 1.5.5 18 so that dnaA was the first gene in each sequence. Dot-plot T A B L E 1 "Ca. R. kotlanii" isolates analyzed in this study.

RESULTS
General genomic features of "Ca. R. kotlanii" The genomes of both isolates comprised a single circular chromosome and contained no plasmids. The chromosome of HM-2 was 1,204,030 bp in length (GC content; 32.2%) and contained 1242 coding sequences (CDSs), three rRNA genes (23S, 16S, and 5S), and 33 transfer RNA (tRNA) genes ( Table 1). The length of the FLA-4 chromosome was very similar to that of HM-2 (1,204,165 bp). These chromosome sizes were within the range observed for Rickettsia (1111-1657 kb; Supporting Information: see Table S2 for the list of representative strains of Rickettsia species used for the phylogenetic analysis described below) and the smallest among the species other than those in the TG (Rickettsia prowazekii and Rickettsia typhi; Figure 1). The comparison of the two "Ca. R. kotlanii" chromosomes revealed that they were identical except for a 135 bp InDel found in the tandem repeat-containing region of the gene for the Arp2/3 complex-activating protein RickA that promotes Rickettsia motility. 21 The InDel induced a copy number variation in the 45 bp repeat unit: three copies in HM-2 and six copies in FLA-4. The Sca2 protein related to motility in the late phase of Rickettsia infection appears to be the full-length type in "Ca. R. kotlanii," similar to that in other SFG Rickettsia. 22 Phylogenomic position of "Ca. R. kotlanii" To clarify the phylogenetic position of "Ca. R. kotlanii" in the SFG Rickettsia, we performed a phylogenetic analysis based on core genes (n = 690) of the 30 validated or proposed SFG and TRG species and "Ca. R. kotlanii." As shown in Figure 1 (note that the root was determined by a similar phylogenetic analysis including two TG Rickettsia, R. typhi, and R. prowazekii; Supporting Information: see Figure S1), this analysis revealed that the SFG Rickettsia species analyzed were separated into three distinct clades  Table S2.
(named clades 1-3) and that "Ca. R. kotlanii" belonged to clade 3, which includes well-known human pathogenic SFG Rickettsia, such as R. rickettsii and R. conorii, as well as Rickettsia japonica, the major causative agent of rickettsiosis in Japan. 23 The ANI values of "Ca. R. kotlanii" relative to the other members of clade 3 were between 95.29% and 96.04%, but they were 91.75%-93.02% relative to clades 1 and 2 and 89.98%-91.47% relative to the TRG Rickettsia (Figure 1 and Supporting Information: Table S2).
Chromosome structure of "Ca. R. kotlanii" Rickettsia genomes are characterized by a high degree of synteny conservation with a few exceptions. [24][25][26] To analyze the chromosomal structure of "Ca. R. kotlanii," we performed dot-plot sequence comparison of the chromosome of HM-2 with the closed chromosome of other Rickettsia species. This analysis revealed that the HM-2 sequence showed collinearity with the sequences of most Rickettsia species analyzed (Supporting Information: Figure S2). As the overall structure of the HM-2 chromosome appeared to be most similar to that of R. japonica and the two species appeared to be relatively close in clade 3 (Figure 1), we analyzed the genomic differences between the two species in further detail (note that while we used the YH_M sequence as the representative of R. japonica genomes, the genomic diversity of R. japonica was extremely low 27 ). As shown in Figure 2, in addition to two small inversions, several small regions were unique to each species in this analysis. As the presence of such regions is potentially related to the unique features of "Ca. R. kotlanii," we performed a pangenome analysis of the validated or proposed SFG and TRG Rickettsia species, including "Ca. R. kotlanii." This analysis identified 22 genes/ gene clusters unique or nearly unique to "Ca. R. kotlanii" among the SFG and TRG Rickettsia species analyzed (referred to as unique genes) and 22 genes/gene clusters specifically or nearly specifically missing from "Ca. R. kotlanii" (referred to as missing genes; Table 2, Figure 2). Among the unique genes, seven (HM2_12030-12070, HM2_12090, and HM2_12100) were clustered in the same region on the chromosome, one of the "Ca. R. japonica (Figure 2). HM-2_12100 contained the C-terminal catalytic domain of bacteriophage P4 integrase, and HM2-12080 and HM2-12090 also appeared to be parts of a degraded integrase gene, suggesting that "Ca. R. kotlanii" horizontally acquired this region ( Figure 2). However, all 22 unique genes were predicted to encode hypothetical proteins (or apparently degraded genes), and most were shorter than 100 amino acids (18/22; Supporting Information: Table S3). Many of the 22 missing genes were functionally annotated, but their lengths were highly variable between species (Supporting Information: Table S4), suggesting that they have been degraded in some or most other species. In fact, a close inspection of these genes in R. japonica revealed that at least nine of them have been split into two or more CDSs, and at least six were much smaller than the longest counterpart found in other SFG/TRG Rickettsia species (Supporting Information: Table S4).

DISCUSSION
Through the determination of complete genome sequences of two "Ca. R. kotlanii" isolates and core gene-based phylogenetic analysis, the precise phylogenetic position of "Ca. R. kotlanii" in Rickettsia was determined to be within clade 3 of the SFG. The phylogenetic relationships and ANI values (95.29% and 96.04%) of "Ca. R. kotlanii" relative to the other validated or proposed species in clade 3 indicated that "Ca. R. kotlanii" is an independent taxon in the SFG. Although clade 3 includes well-known human pathogenic SFG Rickettsia, its pathogenicity remains unclear because genetic markers/traits specific to human pathogenic species are not known. However, as "Ca. R. kotlanii" has not been detected in humans so far, this species may be nonpathogenic.
Notably, the genomes of the two "Ca. R. kotlanii" isolates were almost identical, although they were isolated from different tick species in different regions and years. This suggests that the genomic diversity of "Ca. R. kotlanii" is extremely low, as shown for R. japonica and Rickettsia heilongjiangensis, 7,27 although this possibility needs to be verified by genome analyses of more isolates from other geographic regions, especially Europe. Such analyses will also reveal the vector range of "Ca. R. kotlanii," as genome sequencing provides accurate species and lineage information. The only difference between the two isolates was a variation in the copy number of the 45 bp repeat unit in the gene for RickA. Although RickA has been reported to induce actin polymerization and promote Rickettsia motility immediately after invasion, 21 it is unknown whether this variation in copy number affects the function of RickA.
The genome of "Ca. R. kotlanii" (1204 kb) is smaller than those of the TRG Rickettsia and the other SFG Rickettsia sequenced to date. In addition, direct genome sequence comparison of "Ca. R. kotlanii" with R. japonica revealed several small regions unique to "Ca. R. kotlanii." A pangenome analysis of the SFG and TRG Rickettsia, including "Ca. R. kotlanii," uncovered a total of 44 genes/gene clusters that are specifically or almost specifically present or absent in "Ca. R. kotlanii." However, the characterization of these genes suggests that most were apparently degraded, as we previously showed for the genomic difference between R. japonica and R. heilongjiangensis. 7 Therefore, analyses of the differences at the sequence (single-nucleotide polymorphisms and small InDels) or gene expression level will be required to understand the functional or physiological features unique to "Ca. R. kotlanii."