Brucella ceti and Brucella pinnipedialis genome characterization unveil genetic features that highlight their zoonotic potential

Abstract The Gram‐negative bacteria Brucella ceti and Brucella pinnipedialis circulate in marine environments primarily infecting marine mammals, where they cause an often‐fatal disease named brucellosis. The increase of brucellosis among several species of cetaceans and pinnipeds, together with the report of sporadic human infections, raises concerns about the zoonotic potential of these pathogens on a large scale and may pose a threat to coastal communities worldwide. Therefore, the characterization of the B. ceti and B. pinnipedialis genetic features is a priority to better understand the pathological factors that may impact global health. Moreover, an in‐depth functional analysis of the B. ceti and B. pinnipedialis genome in the context of virulence and pathogenesis was not undertaken so far. Within this picture, here we present the comparative whole‐genome characterization of all B. ceti and B. pinnipedialis genomes available in public resources, uncovering a collection of genetic tools possessed by these aquatic bacterial species compared to their zoonotic terrestrial relatives. We show that B. ceti and B. pinnipedialis genomes display a wide host‐range infection capability and a polyphyletic phylogeny within the genus, showing a genomic structure that fits the canonical definition of closeness. Functional genome annotation led to identifying genes related to several pathways involved in mechanisms of infection, others conferring pan‐susceptibility to antimicrobials and a set of virulence genes that highlight the similarity of B. ceti and B. pinnipedialis genotypes to those of Brucella spp. displaying human‐infecting phenotypes.

constitute a subgroup that is unique in its ability to live in and spread through the aquatic environment (Guzmán-Verri et al., 2012;Nymo et al., 2011). Since the first described case in an aborted Bottlenose dolphin (Tursiops truncatus) fetus in 1994 (Ewalt et al., 1994), B. ceti has been detected among a variety of species from both odontocetes (toothed whales) and mysticetes (baleen whales) groups, including the very recent first isolation from a Risso's dolphin (Grampus griseus) and a Killer whale (Orcinus orca) , from a Minke whale (Balaenoptera acutorostrata)  and three Sowerby's Beaked whales (Mesoplodon bidens) . Nevertheless, human cases of B.
ceti infection have been reported, either as a result of exposure to the pathogen in laboratory settings or naturally acquired and putatively related to raw shellfish consumption (Brew et al., 1999;McDonald et al., 2006;Sohn et al., 2003;Whatmore et al., 2008). Likewise, since its putative first detection in harbor seals (Phoca vitulina) in 1994 (Ross et al., 1994), B. pinnipedialis was reported to infect a variety of marine mammal species, including both true-seals (Foster et al., 2018;Hirvelä-Koski et al., 2017;Kroese et al., 2018;Lambourn et al., 2013) and eared-ones  among pinnipeds, as well as cetaceans of both odontocetes and mysticetes suborders (Buckle et al., 2017;Whatmore et al., 2017). The increase of reported B. ceti and B.
pinnipedialis infections in marine mammals, as well as their host range (Foster et al., 2018;Mauroo et al., 2020) and geographical endemicity expansion (Jensen et al., 2013;Ohishi et al., 2016;Whatmore et al., 2017), are aspects of concern, since they may impact conservation efforts on most vulnerable species already threatened by anthropogenic factors, environmental elements (Van Bressem et al., 2009) or by other pathogens (e.g., morbilliviruses) . Furthermore, the ability of B.
ceti, and potentially also of B. pinnipedialis, to spread across the human population by entering the food chain or after direct contact of humans with infected marine mammals represents a new potential zoonotic threat Maquart et al., 2008;Moreno, 2014;Nymo et al., 2016). Within this picture, undertaking a comprehensive characterization of the B. ceti and B. pinnipedialis genome is a preliminary and fundamental step to either understand the genetic features that contributed to the evolutionary success of these organisms infecting their hosts and spread through the marine environment or to identify the genetic basis of brucellosis pathological phenotypes. Although research efforts have recently focused in this direction, so far molecular investigations have mainly concentrated on genomic surveillance (Duvnjak et al., 2017;Maquart et al., 2008;Ueno, Kumagai, et al., 2020;Zygmunt et al., 2021) and molecular epidemiology (Garofolo et al., 2020;Kroese et al., 2018;Lambourn et al., 2013;Tian et al., 2019;, evolutionary phylogenesis (Audic et al., 2011;Duncan et al., 2014;Suárez-Esquivel et al., 2017;Whatmore et al., 2017) and more recently on the differences in pathogenesis between genotypes (Curtiss et al., 2022;Damiano et al., 2015;Ocampo-Sosa & García-Lobo, 2008), whereas functional characterization of B. ceti and B. pinnipedialis genes has gained less attention. Our study provides a comprehensive description of core-and pan-genome elements from all available (as of February 2022) B. ceti and B. pinnipedialis full genomes and raw sequencing data, with a specific focus on putative genetic determinants for virulence and pathogenesis. Our findings highlight the zoonotic potential of these emerging pathogens, whose ecological role at the human-wildlife interface poses a threat to the health of coastal communities worldwide.

| Available assemblies
The National Center for Biotechnology Information (NCBI) Assembly database (Kitts et al., 2016) was queried for the "B. ceti" and "B.
pinnipedialis" keywords (first access on 23 June 2021, last access on 1 August 2022) and all available assemblies were downloaded (n = 15).
When accessible, relevant metadata such as date of collection, country of isolation, host species, and pathology were retained.

| Raw sequences and assembling
The Sequence Read Archive (SRA) database (Leinonen et al., 2011) was queried with the "B. ceti" and "B. pinnipedialis" keywords and available experiments were further selected using the SRA Run selector imposing the Illumina platform, and metadata reporting at least the year of isolation and the geographic area. After download, the raw reads were trimmed using the fastp software (Chen et al., 2018), checked for contamination by Confindr software (Low et al., 2019), and only experiments showing a theoretical coverage, calculated on trimmed reads, equal or higher than 70X, and no contamination, were retained. Trimmed reads were de novo assembled using the Spades 3.15 software (Bankevich et al., 2012) with default parameters for 2 × 250 or 2 × 150 Illumina paired-end reads. Scaffolds longer than 200 bp were retained and polished by the Pilon software (Walker et al., 2014).

| Criteria for assembly inclusion into data sets
Both the downloaded genomes and those assembled in this study were evaluated by the Quast software (Gurevich et al., 2013), and only those showing an overall size of between 3.0 and 3.4 megabases, the number of contigs <250 and N50 > 125,000 were retained (n = 56). As an additional check, assemblies were analyzed to evaluate completeness and contamination by using the CheckM software (Parks et al., 2015), imposing at least 98% of completeness and contamination lower than 2%. Genome assemblies were also checked for the correct species assignment using the Minihash database (Ondov et al., 2016). Finally, the sequence type (ST) was extracted using the multi-locus sequence typing (MLST) software (Seemann, 2022, https://github.com/tseemann/mlst), and samples missing the call of two or more alleles were discarded. Overall, the criteria for inclusion into the final data sets are summarized in Supporting Information:

| Phylogeny
The phylogenetic placement of B. ceti and B. pinnipedialis assemblies was performed by alignment-free single nucleotide polymorphisms (SNPs)-based approach using the KSNP3 tool (Gardner et al., 2015) to obtain an SNPs matrix, followed by MegaX (Kumar et al., 2018) software analysis for deducing a maximum-likelihood (ML) phylogenetic tree after choosing the most suitable substitution model as calculated by the Model Selection routine implemented in MegaX. Both downloaded and assembled genome sequences were compared against all the assemblies belonging to the Brucella genus at any stage of completeness, retrieved from the NCBI assembly database, as long as they met the abovementioned inclusion criteria. The B. ceti and B. pinnipedialis intra-genus ML phylogenetic tree was obtained using the RaXML software (Stamatakis, 2014) with default parameters starting from the core genome alignment obtained by the core genome analysis (see below).

| Annotation and pan-genome characterization
The assemblies passing criteria were annotated using a local installation of the Prokka software package (Seemann, 2014) set with default parameters. Pseudogenes were identified using the pseudofinder software (Syberg-Olsen et al., 2022) starting from the Prokka annotation. The core/pan-genome investigation was performed by the Panaroo software package (Tonkin-Hill et al., 2020) with default settings except for the core-threshold parameter, which regulates the gene detection in the data set for a gene included in the core genome, which was set to 100%. The genome openness was calculated according to the work of Tettelin and colleagues (Tettelin et al., 2008), using the micropan R package (Snipen & Liland, 2015).
The pan-genome protein coding products were annotated according to the clusters of orthologous genes (COG) database (Tatusov et al., 2000) using the NCBI CD-batch search tool, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa et al., 2021) using the KoalaBlast tool at the prokaryotic species level.
The pan-genome was further investigated for its intrinsically pathogenic properties. Briefly, the pan-genome (including pseudogenes) was compared to a well-characterized set of protein databases specialized in virulence factors as the virulence factor database (VFDB) (Chen et al., 2005), comprehensive antibiotic resistance database (CARD) (Alcock et al., 2020), Resfinder (Bortolaia et al., 2020), and BacMET (Pal et al., 2014) databases, by the suitable Blast package algorithm (Blastp or tBlastx) (Altschul et al., 1990). The similarity threshold at the protein level and horizontal coverage values were both set to 80% for considering a positive match. Finally, the integrated prophages were searched using the Phage Search Tool Enhanced release (PHASTER) web server (Arndt et al., 2016).

| RESULTS
3.1 | Characterization of B. ceti and B. pinnipedialis genomic data set All B. ceti (n = 47) and B. pinnipedialis (n = 9) raw sequencing data available in the NCBI SRA database were downloaded and assembled and, together with already assembled B. ceti and B. pinnipedialis genomes (nine and six entries, respectively), made up the initial genomic data set for this study. Genome assemblies that did not match our inclusion criteria (see methods) were excluded from further analysis (n = 11). Overall, 60 genomic sequences were further characterized, including 50 B. ceti (Table 1) and 10 B. pinnipedialis genomes ( Table 2). Metadata of this initial data set showed that samples were acquired over 26 years  and, when indicated, referred to a large sampling campaign from South America (Costa Rica, n = 20), several samplings from Europe (Italy, n = 8; Great Britain, n = 12; Norway, n = 2; Spain, n = 3) and one from Asia (Japan, n = 1). B. ceti samples were collected from specimens belonging to several cetacean species, mostly odontocetes (toothed whales) such as the striped dolphin (Stenella coeruleoalba, n = 31), the common bottlenose dolphin (T. truncatus, n = 3), the short-beaked common dolphin (Delphinus delphis, n = 3), the harbor porpoise (Phocoena phocoena, n = 3), and the Atlantic white-sided dolphin (Lagenorhynchus acutus, n = 1), but also including mysticetes (baleen whales) such as the common minke whale (B. acutorostrata, n = 1). Most B. ceti samples were isolated from the central nervous system (including the brain, vertebral lesions, and cerebrospinal fluid, n = 29) followed by the spleen (n = 4), skin or subcutaneous lesions (n = 3), lungworms (n = 2), lung (n = 1), liver (n = 1), uterus (n = 1), placenta (n = 1), and connective tissue (n = 1). At the time of collection, signs of neurobrucellosis were reported in most individuals (n = 23), one showing signs of focal necrosis in the spleen, the liver, and the lymph nodes, and one displaying osteomyelitis, while in a few others (n = 3), no Brucella-related pathology was observed. Notably, while most of the samples were putatively ascribable to cetacean specimens found either dead or live-stranded, one was taken from one T. truncatus kept in captivity in an aquarium and reported to be affected by osteomyelitis (Table 1). For B. pinnipedialis, samples were collected from specimens belonging to the species P. vitulina (n = 2) and the hooded seal (Cystophora cristata) (n = 1) among pinnipeds, to ORSINI ET AL. B. acutorostrata (n = 1) among cetaceans, and the Eurasian otter (Lutra lutra) (n = 1) among other aquatic mammals. Samples were isolated from the spleen (n = 3) and lymph nodes (n = 2), with one individual showing signs of brucellosis, one with focal necrosis in the liver, and the other two reported as asymptomatic (Table 2). Overall, the size of the genome assemblies in the data set ranged from 3.27 to 3.39 Mb (GC%: 56.09-57.29), while the number of protein-coding genes returned in the annotation ranged from 3054 to 3251. Furthermore, for all but one sample, the MLST returned an assignment to an already established ST among the ST23, ST26, ST27, and ST24, ST25 for B. ceti and B. pinnipedialis respectively (Jolley & Maiden, 2010).
The remaining sample (B. pinnipedialis, GCF_015624465) could not be assigned to any ST due to gaps in one locus (Supporting Information: -  (Wattam et al., 2009(Wattam et al., , 2012, that is all the strains belonging to species related Europe (Great Britain, n = 2) and North America (the United States, n = 1) and were sampled from P. vitulina (n = 3) and L. acutus (n = 1).
We, therefore, included these samples in the following analysis raising our final data set to 53 B. ceti and 12 B. pinnipedialis genomes, | 5 of 20 respectively (Table 3)     On average, each of these accounted for more than 6% of the core genome and 1.1%-9.0% of the accessory one ( Figure 5a). Other  were exclusively represented by gene products from the core genome, the X was represented only by genes located in the accessory genome and the N was represented by genes equally distributed between the core and the accessory genome ( Figure 5a).   (Courvalin, 2006), which may be indicative of a non-fully functioning related pathway. Similarly, while efflux pumps were among the gene products related to the beta-lactam resistance pathway (Gruenheid & Moual, 2012), the lack of products encoded by genes such as BlaI/Z and AmpR/C suggests that the metabolic pathway branch leading to the beta-lactam degradation is somehow interrupted. In addition, no gene products were assigned to any heavy-metal resistance pathway, with the only exception of three gene products putatively involved in conferring resistance to platinum (Supporting Information:    (Figure 5b).

| B. ceti and B. pinnipedialis virulence factors genes
To further characterize the B. ceti pan-genome for the presence of determinants of virulence and pathogenesis, our genomic data set was searched against the VFDB (Chen et al., 2005). Results showed 61 entries matching with genes encoding for well-established virulence factors identified within multiple bacterial species including those in the Brucella genus (Figure 6a and Supporting Information:  Figure 6b). Conversely, the loci glmM_1 (manBcore gene) and  21251406.v1). Furthermore, the ricA gene product was found, whose interaction with the host Rab2 protein affects the BCV maturation, thus decreasing intracellular replication rates and contributing to the evasion of the innate immune response (de Barsy et al., 2011). Interestingly, 6 genes, all located in the B. ceti core genome are related to the production of brucebactin, a highlyconserved siderophore firstly identified in B. abortus and putatively acting as an iron transporter in iron-limiting conditions at the beginning of the stationary growth phase (González Carreró et al., 2002). Four additional gene products, all but one encoded by genes in the B. ceti core genome, namely the ugpB, bhuA, flhP, and fliQ, are virulence factor candidates according to the VFDB classification. The latter two display significant sequence similarity with orthologs in Bartonella bacilliformis, a Gram-negative bacterium that causes febrile anemia in humans (Scherer et al., 1993). Noteworthily, analysis of the B. pinnipedialis data set returned an identical pattern of virulence factors (Figure 6a,b and Supporting Information: Results showed that B. ceti and B. pinnipedialis share the most virulence-related genes with zoonotic terrestrial species, mainly located in the core genome (Figure 6b). The gene BtpB, whose function is to inhibit the innate inflammatory response in alveolar macrophages through the TLR/nuclear factor kappa B (NF-kB) pathway, and to suppress the formation of cellular reactive oxygen species (ROS) during Brucella infection (Li et al., 2022), is shared by B.
ceti and B. pinnipedialis with B. abortus and B. ovis (Figure 6b). The gene RicA, which interacts with human Rab2 (de Barsy et al., 2011) and is also required by B. abortus for intracellular proliferation (Fugier et al., 2009), is exclusively shared by B. ceti and B. pinnipedialis with this terrestrial zoonotic Brucella (Figure 6b). Notably, despite all Brucella species being described as nonmotile organisms (Coloma-Rivero et al., 2021), we detected in all species the two genes flip and fliQ, type III secretion exporters belonging to the flagella pathway.
Indeed, since flagellum is known to contribute to virulence, cell growth, and biofilm formation, it is deemed to provide Brucella with peculiar infection versatility (Coloma-Rivero et al., 2021). In addition, we found that the two genes are closely related to orthologs in species from the genus Bartonella, whose members are capable of wide-range interspecies transmission and long-lasting bacteremia (Supporting Information:

| B. ceti and B. pinnipedialis antimicrobial resistance (AMR) genes and prophage elements
The presence in our data set of genes encoding for products potentially involved in AMR was assessed by searching against the CARD (Alcock et al., 2020), Bacmet (Pal et al., 2014), and Resfinder (Bortolaia et al., 2020) databases. CARD database returned 2 hits as AMR candidates, which were present in all samples and whose genes are located in the B. ceti and B. pinnipedialis core genome. These are gyrA, which potentially confers resistance to the nalidixic acid (Khan et al., 2019), and mprF, which encodes for an integral membrane protein that confers resistance to cationic peptides (Ernst et al., 2009) (Supporting Information: to the tecC enzyme that confers resistance to tetracycline (Saenger et al., 2000). Notably, this gene product was detected in two samples only, tempting to speculate that B. ceti phenotypes may still display a substantial pan-susceptibility to antibiotics. By contrast, the tecC gene was not detected in any of the B. pinnipedialis samples (Supporting Information:  (namely, bepC, bepD, bepE, bepF, and bepG), all encoding for multi-drug efflux pumps and involved in the efflux of toxic and hydrophobic compounds, but also contributing to resistance to a drug such as a deoxycholate, sodium dodecyl sulfate and nalidixic acid (Martin et al., 2009;Posadas et al., 2007). Finally, the PHASTER web server (Arndt et al., 2016)

| DISCUSSION
The Brucella genus comprehends species such as B. melitensis, B. canis, B. abortus, and B. suis, which are all well-established zoonotic agents, but also species with peculiar host restrictions thatapart from sporadic cases of infectionhamper the spread of these bacteria across the human population (Moreno, 2014). Nevertheless, among these host-restricted species, the marine mammal-infecting B. ceti and B. pinnipedialis are emerging worldwide as pathogens of concern at the human-wildlife interface of coastal ecosystems (Guzmán-Verri et al., 2012;Nymo et al., 2011). However, such interface is somehow limited as regards human interactions with marine mammals, and given the overall low prevalence of human brucellosis ascribable to B.
ceti and B. pinnipedialis so far reported, a correct evaluation of the zoonotic potential of these organisms is of utmost importance (Głowacka et al., 2018). A key issue that needs to be addressed is whether the relative rarity of human infections with B. ceti and B.
pinnipedialis is due to low levels of human exposure to the pathogen or rather to the poor ability of these bacteria to infect human hosts.
On the one hand, evaluation of the level of exposure is not a trivial task, as it would require large epidemiological studies and extensive surveillance programs (Pereira et al., 2020 (Larsen et al., 2013), as well as parasites of edible fish such as the Atlantic Cod (Gauds morhua) Nymo et al., 2016), investigations towards their zoonotic potential and capability of entering the human food chain should be intensified.
Furthermore, we observed that the B. ceti and B. pinnipedialis genomes are pan-susceptible to all known antibiotics, and no evidence of recently acquired prophages was detected. Moreover, not all the assembled contigs were ascribable to known plasmids, which suggests that genetic determinants of virulence and pathogenesis cannot in principle be transmitted by conjugation. Nevertheless, we were able to identify antimicrobial genetic determinants in two B. ceti genomic samples from the Adriatic Sea, a Mediterranean region that is re-known for being subjected to high anthropogenic pressure (Basili et al., 2021;Bruschi et al., 2021). These aspects, combined with the genome closeness observed in our B. ceti and B.
pinnipedialis genomic data set, lead to the hypothesis that these bacteria have a natural tendency to remain confined to their particular ecological niche with few opportunities for genomic exchange with other bacterial populations. However, it is tempting to speculate that, when exposed to anthropogenic pressures and high levels of microbial biodiversity, they may readily display capabilities for the lateral acquisition of antimicrobial resistance genes (Wattam et al., 2014). Altogether, these features should act as a warning in arguing that, under the convergence of specific ecological circumstances, B. ceti and B. pinnipedialis may represent a real threat to human health, thereby improving surveillance on all human activities that can raise the levels of exposure to these organisms a priority in the one-health agenda.

ACKNOWLEDGMENTS
The authors want to thank Francesca Ellero for the fruitful discussion and the English language professional editing of the manuscript.

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
All assemblies provided in this article are available under the NCBI

ETHICS STATEMENT
None required.