Phylogeography of the marine pathogen, Vibrio vulnificus, revealed the ancestral scenarios of its evolution

Abstract Vibrio vulnificus is the leading cause of seafood‐associated deaths worldwide. Despite the growing knowledge about the population structure of V. vulnificus, the evolutionary history and the ancestral relationships of strains isolated from various regions around the world have not been determined. Using the largest collection of sequence and isolate data of V. vulnificus to date, we applied ancestral character reconstruction to study the phylogeography of V. vulnificus. Multilocus sequence typing data from 10 housekeeping genes were used for the inference of ancestral states and reconstruction of the evolutionary history. The findings showed that the common ancestor of all V. vulnificus populations originated from East Asia, and later evolved into two main clusters that spread with time and eventually evolved into distinct populations in different parts of the world. While we found no meaningful insights concerning the evolution of V. vulnificus populations in the Middle East; however, we were able to reconstruct the ancestral scenarios of its evolution in East Asia, North America, and Western Europe.


Martinez
. The phylogenomic analysis showed that V. vulnificus populations are largely divided into two main groups and two small clusters, one cluster dominated by strains from Western Europe and the other cluster by strains from Israel (Koton, Gordon, Chalifa-Caspi, & Bisharat, 2015;Lopez-Perez et al., 2019;Roig et al., 2018).
Despite the growing knowledge about the population structure of V. vulnificus, the evolutionary history and the ancestral relationship of strains isolated from Western Europe and Israel with those from other parts of the world have not been determined. Ancestral character reconstruction (ACR) can be used to recover various ancestral character states, including the genetic sequence (ancestral sequence reconstruction), protein sequence, gene order, morphological properties, and the geographical range of an ancestral population or species (Joy, Liang, McCloskey, Nguyen, & Poon, 2016;Pagel, Meade, & Barker, 2004;Pupko, Pe'er, Shamir, & Graur, 2000).

Molecular sequences collected in present time provide abundant
information about past evolutionary events. Relevant probabilistic models of molecular evolution enable reconstructing ancestral sequences to a sample of taxa, and phylogenetics provides an adequate framework for the reconstruction of ancestral sequences.
From a phylogenetic tree that depicts the evolutionary history of a sample of taxa, along with a set of corresponding homologous sequences, the sequences at each internal node of the tree can be estimated (Oliva et al., 2019). ACR is commonly used to reconstruct ancestral sequences that correspond to specific tree nodes. This can also enable determining changes in a character of interest on a phylogenetic tree over evolutionary time, by assigning the most likely ancestral character states to every internal node (Ishikawa, Zhukova, Iwasaki, & Gascuel, 2019). Global reconstruction over the entire tree describes the ancestral scenario of the character of interest.
In this study, we applied a phylogeography framework to reconstruct the spatial and temporal dynamics of V. vulnificus based on strains isolated from various geographical areas around the world.
As of September 2019, data of sequences and isolates of a total of 707 strains were available for this study. These included 1260 alleles of 10 MLST loci, resulting in 530 unique sequence types (ST).
Of the 707 strains, 219 were obtained from human tissues and the remaining (n = 488) from environmental sources (water and sediment), fish or eel, and shellfish. The list of strains used in this study and their assigned STs and alleles are available in Tables A1 and A2 (https://doi.org/10.6084/m9.figsh are.12666104).
Subsequently, the phylogenetic tree with the ML method was analyzed with IQ-TREE (Nguyen, Schmidt, von Haeseler, & Minh, 2014) using the default settings, with 1000 bootstrap values for tree evaluation. IQ-TREE (http://www.iqtree.org/) calculates and maps the likelihood of all possible sequence quartets using the best-fitting nucleotide substitution model (Schmidt, Strimmer, Vingron, & von Haeseler, 2002). GrapeTree was used to visualize the phylogenetic relationships between the various isolates. GrapeTree is a web-browser application that efficiently reconstructs and visualizes intricate minimum spanning trees, together with detailed metadata (Zhou et al., 2018). For the phylogeographic studies, this dataset was annotated with: sampling years (between 1964 and 2018); countries, which we grouped into 7 regions (North America, East Asia, Southeastern Asia, Western Europe, Middle East, South America, and Australia); and genotypes representing all three existing biotypes (biotypes 1, 2, and 3). We inferred an ML tree from the DNA sequences and rooted the tree using Vibrio parahemolyticus as an outgroup. The phylogeography of V. vulnificus was reconstructed from the ML tree, and locations were annotated using PastML with default options (Ishikawa et al., 2019). PastML uses decision-theory concepts to associate each node in the tree with a set of likely states. In the easy regions of the tree (typically close to the tips), the program predicts a unique state, whereas, in the more challenging parts of the tree (typically close to the root), it may predict several likely states, reflecting the uncertainty of the inferences. PastML takes as input a rooted tree (as  (Yang, 2007). The Joint is computed with the maximal posterior probability using dynamic programming (Pupko et al., 2000).

| Phylogenetic analysis
Phylogenetic analysis showed that V. vulnificus populations are divided into two main lineages, both of which included environmental and human-pathogenic strains. One lineage (lineage I) was dominated by strains from human samples, while another (lineage II) was dominated by strains from environmental sources (Figure 1). Two other small distinct lineages were also identified, lineage III consisting of biotype 3 strains (all belonging to sequence type 8) and lineage IV dominated by environmental strains. The same analysis based on the geographical source showed that the vast majority of strains within lineage I was dominated by strains from East and Southeastern Asia, while lineage II strains mainly originated from Germany, USA, and the Netherlands (Figure 2). Lineage III consisted of biotype 3 strains from Israel and a closely related genotype from Western Europe.
Lineage IV consisted of strains originating from Western Europe (Germany, Spain, Denmark, and the Netherlands). The list of strains with their assigned STs and lineages is shown in

| DISCUSS ION
Ancestral sequence reconstruction, using the combination of phylogenetic trees with extrinsic traits, was used to decipher the evolutionary scenarios of V. vulnificus. This study showed that the common ancestor of all V. vulnificus populations most likely originated in East Asia, and later evolved into two main clusters that spread and eventually evolved into distinct clusters in East Asia, North America, and Western Europe. is likely to have evolved from biotype 1 populations (Bisharat et al., 2005Danin-Poleg, Elgavish, Raz, Efimov, & Kashi, 2013;Efimov et al., 2014;Raz et al., 2014). Given the genetic relatedness between strains from Israel and Western Europe, it is reasonable to postulate that they share the same ancestral origins.  , 1975;Hollis, Weaver, Baker, & Thornsberry, 1976) and later from East Asia during the mid-1980s (Chan, Woo, Lo, & French, 1986;Chuang et al., 1992;Lee, Chung, & Lee, 2013;Park, Shon, & Joh, 1991)

. Human infections in Western Europe and the Middle
East were reported during the 1990s (Bisharat et al., 1999;Bisharat & Raz, 1996;Bock et al., 1994;Dalsgaard, Frimodt-Moller, Bruun, Hoi, & Larsen, 1996;Torres, Escobar, Lopez, Marco, & Pobo, 2002;Veenstra, Rietra, Coster, Slaats, & Dirks-Go, 1994). The global epidemiology of V. vulnificus is complex, and the changes observed are likely to have been the result of multiple factors such as the prevalence of the organism in the environment (Faruque & Nair, 2006), the impact of global warming on seawater temperatures which is extending the geographical range of V. vulnificus into the Northern Hemisphere (Vezzulli et al., 2016), and the emergence of new virulent strains due to the high and frequent horizontal gene transfer in the Vibrionaceae (Efimov et al., 2014;Kim et al., 2011;Quirke, Reen, Claesson, & Boyd, 2006). It is possible that improved diagnostics of infectious diseases may have played a role in shaping the epidemiology of V. vulnificus.
Our findings were based on sequence data from 10 housekeeping genes supplemented by sampling locations and sampling dates.
There are normally fewer polymorphic sites in individual housekeeping genes than in hypervariable genes. Nonetheless, the use of the combined sequences of multiple housekeeping genes has been shown to provide high discriminatory power while retaining signatures of long-term evolutionary relationships (Feil & Enright, 2004;Margos et al., 2008). The phylogeny inferred from MLST data in this study was highly congruent with the phylogeny inferred from draft or complete genomes (Lopez-Perez et al., 2019;Roig et al., 2018). This challenges the observations made by some authors that MLST-based phylogeny is inaccurate compared with phylogeny inferred from concatenated blocks of sequences or genome single-nucleotide polymorphism profiles (Tsang, Lee, Yiu, Lau, & Woo, 2017) (Yang et al., 2019). Similarly, persistent aquatic environmental reservoirs for V. cholerae O1 are present in Asia, causing a disease that has recurrently manifested in seven worldwide pandemics originating from Asia (Colwell & Huq, 1994;Mavian et al., 2020;Weill et al., 2017).
This study has some limitations that should be acknowledged when interpreting the findings. First, our conclusions regarding the ancestral states and the reconstruction of the evolutionary history of this species were based on sequences of 10 housekeeping genes and not whole-genome sequences, which has become a standard approach for studying the population structure and dynamics of bacteria. Within this context, using sequences of a limited number of conserved genes could be considered an acceptable proxy for inferring the population structure and evolution as has been recently used for studying the evolution of other vibrios (Moore et al., 2015;Ono et al., 2019). Furthermore, the resolution provided by MLST can be considered acceptable since sequence data included in the MLST scheme provide a valid phylogenetic signal (as described above) to infer the phylogeography and reconstruction of ancestral states.
Finally, one of the most important aspects in studies aiming to define the population structure, phylogeography, or routes of dispersal is the coverage (in space and time) of the pool of strains included in the study. To obtain reliable results, it is critical to use collections with a wide geographical and temporal coverage. For this aim, the historical repository of data deposited on the MLST website is a unique source of sequence data, in terms of a number of strains, countries of origin, and long time span. Second, another limitation of the study is the possible unbalance in the composition of the database with a dominance of clinical strains for some lineages (e.g., the Israeli strains that are dominated by clinical isolates). The absence of representative strains from environmental sources may introduce some bias and hence provide a partial picture of the genomic landscape of V. vulnificus. This may also imply that some human-pathogenic strains may have been imported via marine products from other sites. We believe that this scenario (import through the marine product) is highly unlikely in the case of disease emergence in Israel.
Lineage 3 clinical strains (Israel) have not been isolated in any other part of the world and have evolved from existing non-pathogenic environmental populations (Bisharat et al., 2005). Third, the methodology applied in this study for the inference of ancestral states and reconstruction of the evolutionary history is based on ML phylogeny. This methodology was initially tested on viruses with short sequences and rapid evolution (over short periods, few months to years) (Korber et al., 2000;Yang, Lauder, & Lin, 1995). Within this context, ML approaches effectively estimate the mutation rate and reconstruct ancestral scenarios. However, ML methodologies are not equally effective for bacteria, detecting less ancestral states (nodes in the tree) than other approaches based on Bayesian inference. Nevertheless, and rather interestingly ML-based methodology has been recently used for studying the evolution of Vibrio cholerae reservoirs in aquatic ecosystems (Mavian et al., 2020).

| CON CLUS IONS
Using a global collection of MLST sequence data of V. vulnificus spanning over 55 years, this study provided insights into the spatial and temporal dynamics of the evolution of V. vulnificus. We urge investigators of V. vulnificus to submit MLST data to PubMLST.org and thus contribute further knowledge and insights to the evolution of this important marine pathogen.

ACK N OWLED G M ENTS
We would like to thank all the contributors who have submitted isolate and sequence data to https://pubml st.org/vvuln ificu s/ throughout the years.

CO N FLI C T O F I NTE R E S T
None declared.

E TH I C S S TATEM ENT
None required.

DATA AVA I L A B I L I T Y S TAT E M E N T
The datasets generated and analyzed during the current study are available at https://pubml st.org/vvuln ificus. Tables A1-A3 are available in figshare: https://doi.org/10.6084/m9.figsh are.12666104.