A comparative whole genome analysis of Helicobacter pylori from a human dense South Asian setting

Abstract Helicobacter pylori, a Gram-negative bacterium, is associated with a wide range of gastric diseases such as gastritis, duodenal ulcer, and gastric cancer. The prevalence of H pylori and the risk of disease vary in different parts of the world depending on the prevailing bacterial lineage. Here, we present a contextual and comparative genomics analysis of 20 clinical isolates of H pylori from patients in Bangladesh. Despite a uniform host ethnicity (Bengali), isolates were classified as belonging to either the HpAsia2 (50%) or the HpEurope (50%) population. Of the twenty isolates, eighteen were cagA positive and two, both HpEurope, were cagA negative. Three EPIYA motif patterns (AB, AB-C, and ABC-C) were observed among the cagA-positive isolates. Three vacA genotypes were observed: s1m1i1d1c1 in 75% of isolates, s1m2i1d1c2 in 15%, and s2m2i2d2c2 in 10%. The non-virulent genotype s2m2i2d2c2 was observed only in HpEurope isolates. Genotypic analysis of the oipA gene, present in all isolates, revealed five different patterns of the CT repeat; all HpAsia2 isolates were “ON,” while 20% of HpEurope isolates were genotypically “OFF.” Among the three genes encoding blood group antigen-binding adhesins (bab genes), the most common genotype was babA/babB/-, found in eight isolates, six of which were HpAsia2. The babA gene was found in all HpAsia2 isolates but in only half of the HpEurope isolates. In silico antibiotic susceptibility analysis revealed that 40% of the strains were multi-drug resistant. Mutations associated with resistance to metronidazole, fluoroquinolone, and clarithromycin were detected in 90%, 45%, and 5% of the strains, respectively. 
In conclusion, two populations of H pylori with similar antibiotic resistance profiles are predominant in Bangladesh, and genotypically the HpAsia2 isolates appear to be potentially more virulent than the HpEurope isolates.


| INTRODUCTION
Helicobacter pylori is a highly successful human pathogen that colonizes the gastric mucosa of over half of the world's population. The prevalence of H pylori infection appears to be higher in developing countries than in developed countries, and it varies between populations and between groups within the same population. 1,2 The prevalence in Asia and Africa varies from 54.7% to 79.1%; in North and South America it is 37.1% and 63.4%, respectively; and in Europe it averages 47.0%. 3 Bangladesh, a country of low socioeconomic status in South Asia with more than 160 million inhabitants, is the eighth most populous country in the world. One study reported a high prevalence of H pylori (92%) in the Bangladeshi population. 4,5 Furthermore, H pylori infection in Bangladesh was reported to be significantly higher in smokers than in non-smokers or subjects below 15 years of age. 6 Despite the high occurrence of H pylori infection in Bangladesh, the incidence of gastric cancer is low, as in other developing countries. 7 Helicobacter pylori has been co-evolving with humans for more than 60 000 years. 8 This long, intimate symbiotic association of H pylori with humans has led to the emergence of different genotypes through the accumulation of host-specific adaptive changes over time, 9 and distinct genotypes predominate in different geographical regions around the world. 10 Moreover, the frequency and severity of gastric illnesses associated with H pylori are strongly associated with the dominant genotype in a region. For example, the incidence of gastric cancer is higher in East Asian countries such as Japan and Korea than in Western countries. 11 Helicobacter pylori can colonize a host during infancy, establishing a chronic infection that can persist for decades if not eradicated. 
12 The bacterium has numerous mechanisms to manipulate and evade host defenses to ensure its survival in the stomach. To counter the acidic pH of the stomach, H pylori uses the enzyme urease, which creates a neutral micro-environment around the bacterium. 13 CagPAI (cytotoxin-associated gene pathogenicity island), which contains several virulence genes that trigger abnormal cellular signals, is considered to be the most important risk factor for H pylori-associated gastric cancer. CagA, the most important virulence factor of CagPAI, plays a crucial role in H pylori pathogenesis. Comparative analysis revealed a significant functional difference between East Asian and Western cagA: East Asian cagA was shown to induce higher pro-inflammatory secretion than Western cagA. Furthermore, cagA activates a number of signal transduction pathways and binds to and disrupts epithelial junctions, leading to aberrations in tight junction function, cell polarity, and cell differentiation. 14 H pylori also produces vacuolating cytotoxin A (vacA), which, after entering host cells by endocytosis, induces various cellular activities, including membrane channel formation, cytochrome c release leading to cell death, and binding to cell membrane receptors, which initiates a proinflammatory response. 15 As with cagA, functional differences have also been reported for vacA and are believed to result from host-specific adaptive changes. 16 The gene encoding vacA shows allelic diversity in its signal (s) region (alleles: s1 and s2) and middle (m) region (alleles: m1 and m2). In vitro experiments showed that s1m1 strains induce cell vacuolation more frequently than s1m2 and s2m2 strains, from which it was inferred that s1m1 is more cytotoxic. 17 Allelic differences in the intermediate (i) region (alleles: i1 and i2) are suggested to correlate better with disease severity than those in the "s" and "m" regions. 
18 The more frequently observed deletions in vacA have been classified, and the type carrying a 69-81 bp deletion (d2) has been observed to be less virulent than the type with no deletion (d1). 19 The outer membrane proteins (OMPs) of H pylori are considered to be possible virulence factors. OipA (outer inflammatory protein), one member of this large protein family, is involved in bacterial adherence to gastric epithelial cells and in mucosal inflammation. 20 Additionally, OipA is associated with interleukin (IL)-8 induction, mucosal damage, and duodenal ulcer. 20 The bab genes have likewise been shown to be positively correlated with gastric cancer and duodenal ulcers. 21,22 Like other H pylori OMPs, BabA has two closely related paralogues, BabB (also known as HopT) and BabC (also known as HopU), although these paralogues are not well characterized.
Inappropriate and irrational use of antibiotics against infectious diseases has already resulted in the emergence of multi-drug-resistant bacteria globally. Currently, triple therapy comprising two antibiotics and a proton pump inhibitor is used as the eradication regimen for H pylori. However, increasing resistance of H pylori to metronidazole, clarithromycin, and fluoroquinolones has decreased the success rate of eradication. Inactivating mutations of the rdxA and frxA genes, such as frameshift and nonsense mutations, insertions, and deletions, are suggested to confer resistance to metronidazole in H pylori. 23 In addition, amino acid substitutions in rdxA are also suggested to confer resistance to metronidazole. 24-27 Fluoroquinolone resistance in H pylori, on the other hand, has been associated with mutations in the quinolone resistance-determining region (QRDR) of gyrA. 28 Clarithromycin resistance has been attributed mainly to mutations at nucleotide positions (A2142G, A2143G, A2142C, A2146, A2147G, and G2224A) in the 23S rRNA. 29-31 Despite the high prevalence of H pylori in Bangladesh, little is known about the local lineages, the local prevalence of particular cagA, vacA, and babA genotypes, or any associations with disease severity. 32 Therefore, this study used a genomics-based approach to characterize genotypes and infer lineage classification, with the aim of evaluating the prevalence of distinct genotypes of cagA, vacA, babA/B, and other virulence factors, together with genotype-based antibiotic resistance profiles, in the prevailing Bangladeshi H pylori lineages, and of examining their association with clinical outcome. 
The findings should also provide a better understanding of the prevalence of antibiotic-resistant H pylori strains and their genotypic resistance mechanisms, facilitating the design of more rational and effective combination antibiotic therapy for eradication of H pylori infection. These data enable contextualization and comparison of drug-resistant H pylori-associated disease in Bangladesh with disease in other parts of the world.

| MATERIALS AND METHODS
The genome sequences of 20 randomly selected H pylori isolates, collected from a cohort of 174 (125 adults and 49 children) H pylori-positive symptomatic or asymptomatic patients, were determined (File S1). Out of 2010 H pylori isolates, 1% were selected in a two-step randomization process. In the first step, 20 patients (out of 174) were selected randomly, and in the second step one isolate from each patient (from ten single-colony isolates stored for each patient) was selected. The selected isolates were sub-cultured on selective medium (BHI-7.5% sheep blood plate, 0.4% isovitalex, 0.4% DENT supplement) under microaerophilic conditions (5% O2; 15% CO2; 80% N2) at 37°C for 3-5 days. 32 The identity of the isolates was confirmed by mass spectrometry using MALDI-TOF (Bruker, Germany). Genomic DNA was prepared from confluent growth using a commercial DNA extraction kit (Qiagen DNA Mini kit, Germany).
Genomic library was prepared using Nextera DNA sample prepara-
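The two-step randomization described for isolate selection can be sketched as follows. This is a minimal illustration only: the patient and colony labels, the `two_step_selection` helper, and the fixed seed are all hypothetical, and the actual randomization tool used in the study is not stated in the text.

```python
import random

def two_step_selection(isolates_by_patient, n_patients, seed=None):
    """Pick n_patients at random, then one stored isolate per chosen patient."""
    rng = random.Random(seed)
    # Step 1: sample patients without replacement (sorted so a seed is reproducible).
    chosen = rng.sample(sorted(isolates_by_patient), n_patients)
    # Step 2: pick one isolate at random from each chosen patient's stored colonies.
    return {patient: rng.choice(isolates_by_patient[patient]) for patient in chosen}

# Hypothetical cohort: 174 patients, each with ten single-colony isolates.
cohort = {
    f"P{p:03d}": [f"P{p:03d}_c{c}" for c in range(1, 11)]
    for p in range(1, 175)
}
selected = two_step_selection(cohort, n_patients=20, seed=1)
```

Sampling patients first and colonies second guarantees that no patient contributes more than one genome, which is the property the two-step design is meant to provide.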

| Bacterial genome assembly and annotation
Paired FASTQ files obtained for each isolate were processed as follows. Low-quality bases were removed and trimmed using the NGS QC Toolkit 33 and the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit), respectively. Read sets were then assembled using SPAdes to produce a draft genome sequence for each isolate. 34 Contigs were reordered to be consistent with the genome of H pylori 26695 using Contig Layout Authenticator (CLA). 35 Gene prediction and annotation of the assembled draft genomes were carried out using Prokka. 36 The Artemis genome viewer was used to access specific annotated features in each of the draft genome sequences. 37 tRNA and rRNA genes were identified in the draft genomes using tRNAScan and RNAmmer, respectively. 38,39 Phage-related regions were identified using PHASTER. 40 In addition, sequences of CagPAI-positive and CagPAI-negative isolates were aligned against the H pylori 26695 (typical HpEurope) and H pylori F57 (typical HpEastAsia) reference strains using BLAST Ring Image Generator (BRIG). 41

| Phylogenetic analysis
Thirty-one reference H pylori genome sequences representing different populations/lineages were downloaded from the National Center for Biotechnology Information (NCBI) database (listed in Table S1). All sequences from the present study and the reference sequences were used to construct a whole genome-based phylogenetic tree using Harvest. 42 The H pylori 26695 genome sequence was used as the reference for the core genome alignment. The tool was run to build a core genome-based phylogenetic tree that excluded recombination regions, as previously done by Kumar et al. 43 The phylogenetic tree was visualized in Interactive Tree of Life (iTOL). 44

| Core genome and pan genome analysis
OrthoMCL was used to identify orthologous gene clusters from the predicted protein sequences of all studied isolates (minimum length of 50 amino acids, with identity and e-value thresholds of 70% and 0.00001, respectively). 45 The identified genes were aligned against the EggNOG database to define their functional categories. Genes containing more than one domain from distinct categories were classified as multiple-class genes. The functional categories were graphically represented using R (http://www.R-project.org).
Genes without an appropriate hit against the database were classified as unknown genes. Furthermore, H pylori isolate-specific genes were identified using OrthoMCL followed by an in-house Perl script. The strain-specific genes, functionally categorized with EggNOG, were likewise depicted graphically using R. In addition, a floral Venn diagram depicting the number of core genes and isolate-specific genes was produced using R. The core and specific gene content of HpAsia2 and HpEurope strains was also analyzed separately using OrthoMCL with the above-mentioned parameters.
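The split between core and isolate-specific genes can be illustrated with a short sketch. It assumes the ortholog clusters have already been parsed into a mapping from cluster identifier to (isolate, gene) pairs; the `classify_clusters` helper and all identifiers below are hypothetical, and the in-house Perl script used in the study is not described in the text, so this is not a reconstruction of it.

```python
def classify_clusters(clusters, isolates):
    """Split ortholog clusters into core clusters (present in every isolate)
    and isolate-specific clusters (present in exactly one isolate)."""
    all_isolates = set(isolates)
    core = []
    specific = {name: [] for name in isolates}
    for cluster_id, members in clusters.items():
        present = {isolate for isolate, _gene in members}
        if present == all_isolates:
            core.append(cluster_id)
        elif len(present) == 1:
            specific[next(iter(present))].append(cluster_id)
    return core, specific

# Toy input: cluster id -> list of (isolate, gene) pairs. All names are hypothetical.
clusters = {
    "OG1": [("iso1", "g1"), ("iso2", "g7"), ("iso3", "g2")],
    "OG2": [("iso1", "g2"), ("iso2", "g8")],
    "OG3": [("iso3", "g9"), ("iso3", "g10")],
}
core, specific = classify_clusters(clusters, ["iso1", "iso2", "iso3"])
```

Clusters shared by some but not all isolates (such as `OG2` here) fall into neither group; they make up the accessory portion of the pan genome.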

| Identification of virulence-related and outer membrane proteins
The amino acid sequences of the predicted genes were compared with the H pylori virulence genes listed in VFDB. 46 Similarly, major OMPs from H pylori J99 were used to detect the genes encoding these proteins in each of the isolates, as described previously, 47 using BLASTp. Identity and coverage of 80% and 70%, respectively, were used as the detection thresholds. 48 Among the virulence genes, we mainly focused on the presence or absence of the babA, babB, and babC genes and on the translation potential of a major OMP, OipA, in the isolates.
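The identity and coverage thresholds above can be applied to tabular BLASTp output roughly as follows. This sketch assumes a custom six-column tab-separated layout (query id, subject id, percent identity, query start, query end, query length) and assumes that coverage is measured over the query; neither the column layout nor the exact coverage definition is given in the text, and the sequence identifiers are hypothetical.

```python
def passes_threshold(pident, qstart, qend, qlen, min_identity=80.0, min_coverage=70.0):
    """Apply the identity and query-coverage cut-offs to a single hit."""
    coverage = 100.0 * (abs(qend - qstart) + 1) / qlen
    return pident >= min_identity and coverage >= min_coverage

def filter_hits(lines, min_identity=80.0, min_coverage=70.0):
    """Return (query, subject) pairs for hits that pass both thresholds."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, pident, qstart, qend, qlen = line.rstrip("\n").split("\t")[:6]
        if passes_threshold(float(pident), int(qstart), int(qend), int(qlen),
                            min_identity, min_coverage):
            hits.append((qseqid, sseqid))
    return hits

# Hypothetical hits: the second fails on coverage, the third on identity.
example = [
    "babA_J99\tiso1_00042\t91.2\t1\t700\t744",
    "babB_J99\tiso1_00100\t85.0\t1\t300\t700",
    "oipA_J99\tiso1_00200\t75.0\t1\t300\t305",
]
kept = filter_hits(example)
```

A gene is then called present in an isolate when at least one hit for it survives the filter.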

| Phylogenetic analysis of CagA and VacA genotypes
The nucleotide sequences of the cagA and vacA genes from the genomes of the isolates and from ten representative H pylori genomes covering a range of genotypes from NCBI (Table S1) were extracted and aligned using CLUSTALW as implemented in MEGA 5.2. 49 These alignments were then used to construct cagA and vacA gene-based phylogenetic trees using the neighbor-joining algorithm with 1000 bootstrap replicates. The output tree was visualized in iTOL in each instance. The multiple sequence alignments were also used for the allelic/genotypic classification of the cagA and vacA genes present in each isolate.

| In silico analysis of antibiotic susceptibility
A BLASTn search was performed to extract the rdxA, frxA, gyrA, and 23S rRNA nucleotide sequences from each strain using the reference genes hp0954, hp0642, hp0701, and hpr01, respectively, of H pylori 26695.
The extracted rdxA, frxA, and gyrA gene sequences were then translated into amino acid sequences prior to alignment. The extracted 23S rRNA gene sequences were aligned at the nucleotide level. Aligned sequences were then compared with the reference sequence of H pylori 26695 to identify reported and novel mutations.
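Comparing an aligned sequence with the reference to list mutations can be sketched as below. The `list_substitutions` helper and the toy sequences are hypothetical; positions are counted along the ungapped reference supplied to the function, which may not match the numbering conventions used in the literature (for example, for the 23S rRNA positions), so any mapping to published mutation names is an assumption that would need checking.

```python
def list_substitutions(ref_aligned, query_aligned, gap="-"):
    """List differences between two aligned sequences, numbered by the
    position in the ungapped reference (for example 'D3N')."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    ref_pos = 0
    for r, q in zip(ref_aligned, query_aligned):
        if r != gap:
            ref_pos += 1
        if r == q:
            continue
        if r == gap:
            # Extra residue in the query, inserted after reference position ref_pos.
            changes.append(f"ins{ref_pos}{q}")
        elif q == gap:
            changes.append(f"{r}{ref_pos}del")
        else:
            changes.append(f"{r}{ref_pos}{q}")
    return changes

# Toy protein alignments (not real RdxA, FrxA, or GyrA sequences).
substitution = list_substitutions("MKDAN", "MKNAN")
insertion = list_substitutions("MK-DA", "MKQDA")
deletion = list_substitutions("MKDA", "M-DA")
```

The same function works at the nucleotide level, since it only compares aligned characters column by column.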
Thirty percent (6/20) of the patients reported dyspepsia, heartburn, and occasional H2 blocker use. Among the isolates, 12 (60%) were from the gastric antrum, seven (35%) from the gastric body, and one (5%) was from gastric juice.

| Genome characteristics
genes were present in the core of HpEurope. In addition to this, 142 and 210 genes were identified as unique in HpAsia2 and HpEurope strains, respectively. In both lineages, the majority of the unique genes were assigned to the L (replication, recombination and repair) and S (function unknown) classes.

| Core genome phylogenetic analysis
We used whole genome SNP-based phylogenetic analysis to infer the lineage of each of the isolates. The isolates from the present study separated into two distinct populations: 50% (10/20) belonged to the HpAsia2 cluster and 50% (10/20) to the HpEurope cluster, despite all being from one ethnicity (Figure 1). HpAsia2 and other with HpEurope. Moreover, one isolate belonging to HpEurope lacked the EPIYA-C motif (AB type) (Table 2). Seven
Moreover, sequence analysis of the pre-EPIYA region revealed that none of the isolates had the 39 bp deletion (approximately 300 bp upstream of the first EPIYA motif) that is typically observed in strains from Western countries. A total of 55 EPIYA motifs were identified in  (Table S3).

| VacA Genotype
The predominant vacA "s" allele type was s1 (18/20, 90%), which is typical of HpAsian and Western strains. The remaining isolates carried the s2 allele and belonged to the HpEurope lineage.
The vacA m1 allele type occurred in 75% (15/20) of isolates, while the m2 allele type accounted for 25% (5/20). vacA gene-based phylogenetic analysis of all 20 Bangladeshi isolates, together with ten Western and East Asian strains harboring the vacA gene, revealed a clear distinction between m1 and m2 sequences (Figure 3). The preva-

| OipA gene
We found that the oipA gene was present in all 20 isolates and showed five different CT repeat patterns (Table 3). Nine of the 20 isolates contained six CT repeats, giving an "ON" status for OipA, while five strains that were also "ON" contained a (2 + 3) pattern of CT repeats. Two strains harbored a (5 + 2) CT repeat architecture and another two carried nine direct repeats of the CT dinucleotide; in both cases the gene status was "ON." In contrast, isolates 61A5 and 89B9 had 10 direct CT dinucleotide repeats, giving an "OFF" gene status; both belonged to the HpEurope lineage.
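The link between CT repeat number and "ON"/"OFF" status can be illustrated with a simplified sketch. It assumes the input spans the gene from its start codon to its expected stop codon, and it treats the gene as "ON" only when that span reads through in frame; the helper names and toy sequences are hypothetical and are not real oipA sequences. Split patterns such as (2 + 3) are not resolved by the run counter, which reports only the longest uninterrupted run.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_ct_run(seq):
    """Number of CT units in the longest uninterrupted run of CT dinucleotides."""
    runs = re.findall(r"(?:CT)+", seq.upper())
    return max((len(run) // 2 for run in runs), default=0)

def frame_status(gene_seq):
    """Return 'ON' if the sequence is an intact reading frame
    (ATG start, length divisible by 3, a single terminal stop codon), else 'OFF'."""
    seq = gene_seq.upper()
    if not seq.startswith("ATG") or len(seq) % 3 != 0:
        return "OFF"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[-1] not in STOP_CODONS:
        return "OFF"
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "OFF"
    return "ON"

# Toy constructs: start codon, a CT repeat tract, one sense codon, stop codon.
six_repeats = "ATG" + "CT" * 6 + "GCA" + "TAA"
nine_repeats = "ATG" + "CT" * 9 + "GCA" + "TAA"
ten_repeats = "ATG" + "CT" * 10 + "GCA" + "TAA"
```

In these toy constructs, adding three CT units (six bases) keeps the frame intact, whereas adding a single CT unit shifts it, which mirrors how slipped-strand changes in repeat number can switch a gene between the two states.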

| Bab genes
The prevalence of bab genes among the isolates was studied. The    (Table S4).

| In silico antibiotics susceptibility analysis
In silico antimicrobial susceptibility analysis revealed that 40% of the strains were multi-drug resistant. We also found that 90% (18/20) of the strains were predicted to be resistant to metronidazole. Of those metronidazole-resistant strains, nine were expected to express non-functional  Table 4).
An extraordinary N-terminal extension of GyrA by five amino acid residues (QDNSV) and amino acid exchanges in the QRDR (N87, D91, and R295) have been reported to occur solely in fluoroquinolone-resistant H pylori. 31 Our genotypic analysis revealed that 45% (9/20) of the H pylori strains were predicted to be resistant to fluoroquinolone antibiotics. Of these nine fluoroquinolone-resistant strains, five had the N-terminal extension of GyrA by five amino acid residues immediately after the start codon, three exhibited an amino acid exchange in the QRDR (D91), and one strain showed a mutation at R295.
A point mutation at G2224 in the 23S rRNA, which is suggested to confer resistance to clarithromycin, was detected in only one strain (Table 4).

| DISCUSSION
Helicobacter pylori has a complex and long-standing coexistence with humans, so much so that particular bacterial lineages are strongly linked to regionally associated human lineages. 8 The plas-  genome comparison has allowed the generalized preservation of lineage-specific differences to be observed. A higher percentage of core genes belonging to the J (translation, ribosomal structure, and biogenesis) and M (cell wall/membrane/envelope biogenesis) functional classes, also identified in our analysis, may reflect the adaptive stress imposed on this gastric pathogen by the dynamic micro-environment of the stomach, as previously reported. 43 Our analysis also revealed that a majority of the specific genes belonged to the L (replication, recombination, and repair) functional class, indicating the requirement of H pylori to maintain a robust recombination and repair mechanism.
Based on the incidence rate of gastric cancer in people from There are several reports of an association between BabA-positive status of H pylori and an increased risk of developing peptic ulcer disease. 21 In this study, we looked into the prevalence of different bab genes in various combinations among the isolates.
The study revealed that all HpAsia2 isolates harbored babA, and most of them had both babA and babB but no babC gene (babA/babB/-). Patients carrying HpAsia2 lineage H pylori showed more severe gastric outcomes than HpEurope carriers. One important explanation could be the absence of babA, or of the babA/babB/- genotype, in most of the Bangladeshi isolates of the HpEurope lineage. This result is in concordance with the other findings described above, which suggest that the pathogenic potential of HpAsia2 is greater than that of HpEurope.
The prevalence of metronidazole-resistant H pylori in Bangladesh is quite high, with a resistance rate of more than 90%. Resistance rates to levofloxacin and clarithromycin have also been increasing in H pylori isolates from Bangladesh. 32,54 Although the main mechanism of acquiring metronidazole resistance involves inactivating mutations of RdxA and/or FrxA, the considerable number of missense mutations in both the rdxA and frxA genes means that a role for them in metronidazole resistance, through conformational changes of the RdxA and FrxA proteins, cannot be ruled out. Alongside the remarkably high occurrence of metronidazole resistance, our study also showed that about half of the strains would be resistant to fluoroquinolones. This acquired resistance is primarily due to amino acid substitutions in the QRDR of the GyrA protein, which we also observed in some of the strains predicted genotypically to be fluoroquinolone resistant. Interestingly, more than half of the pre- Our study had several limitations: the two-step randomization procedure used in this study may not capture all of the variation present in a population, and in silico analysis of antimicrobial resistance may not identify novel or yet-to-be-determined mutations associated with antimicrobial resistance.

ACKNOWLEDGEMENTS
Authors from icddr,b are grateful to the Governments of Canada, Sweden, Bangladesh, and the UK for providing core/unrestricted support.

DISCLOSURE
The authors have no competing interests.