Comparative genomics of the Komagataeibacter strains—Efficient bionanocellulose producers

Abstract Komagataeibacter species are well‐recognized bionanocellulose (BNC) producers. This bacterial genus, formerly assigned to Gluconacetobacter, is known for its phenotypic diversity manifested by strain‐dependent carbon source preference, BNC production rate, pellicle structure, and strain stability. Here, we performed a comparative study of nineteen Komagataeibacter genomes, three of which were newly contributed in this work. We defined the core genome of the genus, clarified phylogenetic relationships among strains, and provided genetic evidence for the distinction between the two major clades, the K. xylinus and the K. hansenii. We found genomic traits, which likely contribute to the phenotypic diversity between the Komagataeibacter strains. These features include genome flexibility, carbohydrate uptake and regulation of its metabolism, exopolysaccharides synthesis, and the c‐di‐GMP signaling network. In addition, this work provides a comprehensive functional annotation of carbohydrate metabolism pathways, such as those related to glucose, glycerol, acetan, levan, and cellulose. Findings of this multi‐genomic study expand understanding of the genetic variation within the Komagataeibacter genus and facilitate exploiting of its full potential for bionanocellulose production at the industrial scale.

Komagataeibacter strains are highly resistant to acetic acid and are the dominant species of submerged vinegar processes, where acidities are highly elevated (Barja et al., 2016).
When growing in a liquid medium, synthesis of a cellulose membrane enables retention of cells close to the medium surface, where the amount of oxygen is high (Czaja, Young, Kawecki, & Brown, 2007;Williams & Cannon, 1989). The process of bacterial cellulose synthesis has been described only from the metabolic point of view.
Although AAB are known for their ability to oxidize ethanol, tolerance to extremely acidic conditions, and production of cellulose, these features appear to be transient, as they are often rapidly lost when cells are cultured in media without the selective pressure of acetate or ethanol (Beppu, 1993;Coucheron, 1991;Krystynowicz et al., 2002Krystynowicz et al., , 2005Sokollek, Hertel, & Hammes, 1998;Takemura, Horinouchi, & Beppu, 1991). This phenotypic instability has a negative impact on the industrial performance of these species. Studies in various, mainly pathogenic bacteria, have shown that the genetic basis of phenotypic instability has been most commonly caused by spontaneous mutations and genome rearrangements, often associated with genetic elements, such as insertion sequences (IS), genomic islands (GIs), transposable elements (TEs), and transposable bacteriophages (Brzuszkiewicz, Gottschalk, Ron, Hacker, & Dobrindt, 2009;Chan et al., 2015;Kung, Ozer, & Hauser, 2010). Some IS have been found in the genomes of Komagataeibacter species, which were associated with cessation of the EPS synthesis (Coucheron, 1993;Iversen, Standal, Pedersen, & Coucheron, 1994;Standal et al., 1994).
Moreover, alterations of the plasmid profile have been observed in cellulose-negative (Cel − ) cells (Coucheron, 1991). Other genetic elements of the mobile genome of the Komagataeibacter genus, however, have not been studied.
Overall, the molecular biology of Komagataeibacter has been tested to a limited extend so far and focused mainly on the metabolic pathway of cellulose production from glucose. Furthermore, different laboratories have been using various strains and typically considered only single genes in their analysis . In parallel, many changes in the taxonomic classification bring additional challenges in interpreting the available data. Published results appeared to be very strain-specific thus making any conclusions about metabolism or regulatory mechanisms, responsible for the BNC production rate, unattainable at the genus level. Therefore, there is a need to broaden the genetic knowledge and to find common features of these strains that drive them to cellulose overproduction. By using the next-generation sequencing (NGS) technology, we sequenced and assembled the genomes of three strains producing bionanocellulose. These are K. xylinus E26 from own collection, K. xylinus BCRC 12334, and K. hansenii ATCC 53582. Strain K. xylinus BCRC 12334 has been exploited mainly in Taiwan (Kuo, Chen, Liou, & Lee, 2016;Wu & Liu, 2012). The strain ATCC 53582, also known as NQ5, has been commonly used in the research worldwide, which can be emphasized by the fact that it has been sequenced by two independent teams during the time of our work (Florea, Reeve, Abbott, Freemont, & Ellis, 2016;Pfeffer, Mehta, & Brown, 2016b). Especially recently, we can observe a steep increase in the number of sequenced Komagataeibacter genomes (Pfeffer, Mehta, & Brown, 2016a;Pfeffer, Santos, Ebels, Bordbar, & Brown, 2017a,b,c;Wang et al., 2017;Zhang, Poehlein, Hollensteiner, & Daniel, 2018;Zhang et al., 2017). The growing number of genomic sequences of Komagataeibacter strains encouraged us to perform a comparative study with the aim to infer the precise phylogeny of the genus and to investigate sequence conservation in both, the core and the flexible part of the genome. By further focusing on functional diversity among the strains in carbohydrate uptake, EPS biosynthesis, and c-di-GMP signaling, we harness the gathered knowledge with the aim to explain the observed phenotypic variability of the in-house strains.
1% cellulase (from Trichodrma reesei ATCC 26921, Sigma-Aldrich) was added to the culture, and the released cells were harvested for genomic DNA purification.

| Genomic DNA isolation and genome sequencing
Genomic DNA from the three strains was isolated according to the procedure published elsewhere (Ausubel et al., 1992) with modifications. Bacterial cells from 3.0 ml of liquid culture were pelleted and washed twice with TGE buffer (25 mM Tris-Cl; 10 mM EDTA, 50 mM glucose; pH 8) prior to lysozyme (1 mg/ml in TGE) treatment (RT, 30 min). Next, SDS was added to the cells suspension (up to final conc. 0.5%), which was next treated with proteinase K (0.1 mg/ml final conc., Qiagen), at 56°C for 10 min. In the next step, incubation at 65°C for 30 min with CTAB and NaCl (final conc. 1% and 0.7 M, respectively) was applied. After cooling down, the nucleic acids were extracted twice with equal volume of phenol-chloroform-isoamyl alcohol (25:24:1) mixture and once with equal volume of chloroform/isoamyl alcohol (24:1) mixture.
Finally, RNaseA (Qiagen) hydrolysis was done at 37°C for 20 min followed by additional extraction steps (same as above). Purified genomic DNA was precipitated with 4 M ammonium acetate and 100% ethanol at RT. The DNA pellet was washed twice with 70% ethanol, air-dried, and suspended in TE buffer. All reagents were from Sigma-Aldrich if not otherwise stated. NGS libraries were prepared using Nextera XT DNA Library Preparation Kit (Illumina).
Genome sequencing was performed using the Illumina MiSeq platform, in 2 × 150 bp paired-end reads mode. The genomes were sequenced at 20× coverage, on average.

| Measurement of carbon source effect on the BNC yield
A pre-culture was prepared in 5 ml of HS medium (as described above) from one isolated colony and incubated for 72 hr. The final culture was prepared, using 5% inoculum, in 6-well plates with 10 ml of HS medium in which carbon source (glucose) was replaced with, either fructose, maltose, sucrose, or glycerol. The culture was incubated for 7 days at 30°C. BNC membranes were next treated with 2% solution of NaOH for one night and 1.5% of acetic acid for 4 hr, and then carefully washed in distilled water until neutral pH was reached. The purified membranes were pressed between two filter papers and dried at 80°C in a Gel Dryer apparatus (model 543, Bio-Rad) until a constant weight was reached (Tiboni et al., 2012). For each strain (K. xylinus E25, K. xylinus E26, K. xylinus BCRC 12334, and K. hansenii ATCC 53582 strains) and each carbon source, six cultures (replicates) were prepared.

| Scanning electron microscopy
Cellulose biofilms were harvested after 7 days of incubation in 250-ml flasks, purified by several washes in water and 0.1% NaOH.
Finally, water was exchanged into iso-propanol. Such membranes were deep-frozen in liquid nitrogen and then freeze-dried. Before imaging, probes were coated with gold. Membranes' surfaces were analyzed by Scanning Electron Microscope FEI, Quanta FEG 250, at 40,000× magnification, at Bionanopark sp. z o.o., Lodz, Poland.

| Genome assembling and annotation
The sequencing reads were assembled de novo using SPAdes (with activated mismatch careful mode and otherwise default settings; v. 3.5.0; Nurk et al., 2013

| Orthologous proteins analysis and core genome calling
The clusters of orthologous proteins in the analyzed genomes were generated using Proteinortho program (v. 5.11;Lechner et al., 2011

| Genomic islands, insertion sequences, prophages, and CRISPR-Cas loci prediction
Genomic islands were predicted using IslandViewer3 online ser-  Grissa, Vergnaud, & Pourcel, 2007). Only the confirmed loci were considered. The results generated by both programs agreed in the majority of cases.

| Functional enrichment of the core genome
The proteome of K. xylinus E25 was annotated using COG through WebMGA server of Wei Zhong Li Lab (http://weizhongli-lab.org/ metagenomic-analysis/server/cog/ (Accessed April 2016). In total, 2,461 genes were assigned to at least one COG category, which was the case for 1,402 proteins of the core genome set. Additionally, this proteome was annotated using RAST annotation service (Aziz et al., 2008). In this case, 1,350 genes were annotated with at least one RAST category. For every COG or RAST category, assigned proteins were counted in the entire and in the core genome. Multicategory proteins (assigned to more than one COG/RAST category) were counted in each of their categories. Functional enrichment was conducted using Fisher's exact test (one-tailed) as implemented in R.

| Phenotypic and genomic diversity of the Komagataeibacter strains
We have observed that four Komagataeibacter strains from the inhouse collection displayed phenotypic differences related to the produced cellulose pellicle (Figure 1a). Each of pellicles differed in microscopic arrangements of fibrils, which formed pores of various sizes. Consistent with previous reports, one of the strains, K. hansenii ATCC 53582, produced wider cellulose fibers (Czaja et al., 2007).
Furthermore, the four strains from the in-house collection varied in cellulose productivity when grown on different carbon sources with glucose and glycerol being most preferable, whereas maltose was the least preferable one ( Figure 1b). Similar discrepancies have been published in independent studies testing other cellulose-producing strains (Keshk & Sameshima, 2005;Mikkelsen, Flanagan, Dykes, & Gidley, 2009;Nguyen, Flanagan, Gidley, & Dykes, 2008). The most explicit differences between all tested strains were pronounced when grown on glucose. Specifically, K. hansenii ATCC 53582 and K. xylinus E26 strains productivity was four and three times higher as compared to the other two K. xylinus strains, respectively. Interestingly, we noticed exceptionally high cellulose productivity for the K. xylinus E26 strain grown on the medium with glycerol as the main carbon source and K. xylinus BCRC 12334 strain grown on sucrose ( Figure 1b). Another important difference between the phenotypes of the investigated strains is stability of cellulose production. Among the four in-house strains compared here, K. hansenii ATCC 53582 strain was the most productive in submerged cultures (data not shown).
To understand the genetic basis of the observed phenotypic differences between the three K. xylinus in-house strains and the K. hansenii ATCC 53582 strain, their genomes were sequenced using the NGS technology. Since the complete genome of K. xylinus E25 strain was already available (Kubiak et al., 2014), the remaining three genomes were sequenced. The sequencing was conducted in a cost-effective manner, at low coverage of approximately 20×.
The assembled draft genomes consist of 300 contigs, on average (Table 1). The assembly of K. hansenii ATCC 53582 genome is of the highest quality (the highest contiguity) and has the smallest GC content among the sequenced genomes (Table 1). Moreover, mapped read sequences of K. xylinus E26 and K. xylinus BCRC 12334 strains, unlike those of K. hansenii ATCC 53582 strain, covered the genome of K. xylinus E25 strain almost completely ( Figure 1c).
Next, we estimated genomic distances between these four genomes using the MUM index, which is based on the number of maximal unique and exact matches shared by two genomes and takes values between 0 and 1, for very similar and very diverged genomes, respectively (Deloger, El Karoui, & Petit, 2009 We annotated de novo our and publicly available genome sequences of Komagataeibacter strains using Prokka (Seemann, 2014). Additionally, one complete genome of Gluconacetobacter diazotrophicus PAl 5, a well-studied free-living strain, was included as a reference. Table 1 Figure S1).
Next, we generated a cladogram based on the MUM index to gain a rough overview of the sequence similarity among the twenty genome sequences. This analysis grouped genomes into several clades according to reciprocally low values (0-0.2; Figure 2a

| Phylogenetic analysis and the core genome of the Komagataeibacter genus
In order to verify the cladding pattern based on MUMi values, we performed the phylogenetic analysis based on the 16S rRNA sequence. This yielded a tree of low confidence, as tested by bootstrap test (Supporting Information Figure S2A). Moreover, on this tree, many of the K. xylinus strains clustered together with the Note. In bold are given genomes, which were sequenced in this work. Total assembly length for each genome was calculated by adding all genome sequences (chromosomes, plasmids, contigs, or scaffolds). Total number of predicted genes coding for rRNA (5S/16S/23S) is given in the "rRNA" column. +/−: the strain synthesizes/does not synthesize cellulose; NA: not applicable; ND: not determined; NP: no publication; TW: this work.
proteins from at least two strains. Out of these, 1,578 orthologous gene clusters are present in every Komagataeibacter genome (constituting the core genome). After further filtering of the core genome set to select those clusters, which hold only single copy genes, we chose a set of 868 ortholog clusters. The phylogenetic analysis based on this group yielded a tree separating the genomes into species-specific clades ( Figure 2b). The tree topology was further confirmed with a previously proposed method, which employed dnaK-groEL-rpoB genes only (Supporting Information Figure S2B; Cleenwerck, De Vos, & De Vuyst, 2010  this work); the second clade constitutes of the three K. hansenii strains (now on called in this work the K. hansenii clade).
Additionally, we tested the sharing pattern of the entire orthologs set (Supporting Information Figure S3). Here, again the fewest orthologs were shared between the K. hansenii and the K. xylinus clade. Next, we focused on the groups of orthologs, which are not shared by all Komagataeibacter strains by investigating the cladding of their presence/absence pattern (Figure 2c). Roughly three major orthologs' clusters were formed, one, the biggest, grouping genes present in every Komagataeibacter genome (genes clustered at the right side of the Figure 2c); one characteristic of the K. xylinus species; and another one distinguishing the K. hansenii strains. In general, the distinction of the K. hansenii strains from the rest of the  Figure S4).

| The mobile genome elements in Komagataeibacter strains
Sequence-based and phylogenetic analysis conducted so far defined the core genome of the tested Komagataeibacter strains and enabled their classification, suggesting the clear separateness of the K. hansenii clade. Next, we focused on the mobile part of Komagataeibacter genomes (namely plasmid DNA, genomic islands GIs, insertion sequences IS, and prophages) because phenotypic diversity among bacterial strains of the same species is typically connected with this portion of DNA (Darmon & Leach, 2014).

| Plasmids
The diversity among plasmid DNA has already been connected with phenotypic changes for other AAB strains (Akasaka et al., 2015;Azuma et al., 2009

| Short mobile genetic elements prediction
Two best-studied groups of short mobile elements commonly related to bacterial strains phenotypic diversity are insertion sequences (ISs) and prophages. There are well-established bioinformatics tools enabling prediction of these mobile sequences available. In order to assure accuracy, complete assemblies of genomes are required. Therefore, four chromosome sequences from the completed or closed genomes (  (Coucheron, 1991(Coucheron, , 1993Standal et al., 1994). In contrast, our predictions showed quite complex composition of the IS families; for example, fifteen of them were identified in K. xylinus E25 genome (Supporting Information Figure S6). Absence of prophage sequences is very unusual in bacterial genomes but has been reported previously, for example, for Acetobacter pasteurianus 386B strain-a stable and industrially exploited AAB representative (Illeghems, De Vuyst, & Weckx, 2013). On the other hand, the presence of numerous diverse prophages is often linked with short-term strain variation, for example, in Streptococcus pneumoniae lineages (Croucher et al., 2014). Therefore, the relatively low number of insertion sequences and lack of intact prophages in K. hansenii ATCC 23769 strain genome may be interpreted as an indication of the highest stability of this genome among the compared three representatives of the Komagataeibacter genus.

| Genomic islands prediction
In contrast to prophages and ISs, genomic islands (GIs) are linked with more ancient evolutionary events and frequently are composed of genes of horizontal transfer origin, which appeared to be beneficial for a bacterial cell (Croucher et al., 2014;Gyles & Boerlin, 2014).
Interestingly, genes encoded on GIs may take part in attenuation of other genetic elements' mobility (Darmon & Leach, 2014 (Gyles & Boerlin, 2014). It was expected to find the source of phenotypic divergence among the in-house K. xylinus strains in the sequences or orthologs' content of the identified genomic islands (Supporting Information Figure S7A). It appeared that only a few rearrangements on GI sequences could be found and that in each of the three analyzed strains, only 13-20% of GI-localized genes were unique (Supporting Information Figure S7B). Among unique sequences, genes connected with DNA rearrangements, transcriptional regulation, and cellular stress response (e.g., error-prone polymerases, HTH-transcriptional regulators, type IV secretion system genes and chaperons) were found, suggesting indirect influence on cellulose production process. There was a huge predominance of CDSs annotated as hypothetical proteins among them (namely: 68% in K. xylinus E25, 69% in K. xylinus E26, and 80% in K. xylinus BCRC 12334 strain), causing difficulty in posing any hypothesis about the functional pathways adopted on these islands.   First, we tested for the presence of TA systems in the studied genomes, using our orthologous groups identified previously. Only one toxin-antitoxin system (namely fitA/B) was found as common in all

| Predicted mechanisms regulating flexibility of Komagataeibacter genomes
Komagataeibacter strains tested with the exception of K. kakiaceti JCM 25156 and the whole K. hansenii clade, in which relB/E system is primarily found (Figure 3b).

| Functional diversity of the Komagataeibacter genus
Ortholog-conservation pattern displayed distinctive clusters, which may group proteins responsible for species-specific features ( Figure 2c). Investigation of these clusters, however, did not expose any particular functional groups among them, likely due to poor annotation (data not shown). In the next steps, we focused on functional analyses of the core genome. For this purpose, we investigated functions represented by the core genes. To do this, we performed functional enrichment, based on the COG and RAST functional categories, in respect to K. xylinus E25 genome annotation only (Supporting Information Figure S8A,8B). This analysis revealed a few overrepresented housekeeping COG categories, such as nucleotide transport and metabolism (F); coenzyme transport and metabolism (H); translation, ribosomal structure and biogenesis (J); posttranslational modification, protein turnover, chaperones (O). This could suggest some divergence among other functional groups and thus higher than expected intragenus functional diversity of the Komagataeibacter strains. Nevertheless, a meaningful analysis is not yet possible for these species, since functional categories, either COG or RAST, were assigned to only ca. 30% of the proteins in K. xylinus E25 genome. Therefore, we further concentrated on the most characteristic features, such as carbon source preference and exopolysaccharides production, including cellulose.

| Predicted carbohydrate uptake mechanisms
One of the surprising results revealed by functional enrichment was the lack of "Carbohydrates" category (Supporting Information Figure S8A,S8B) since soluble exopolysaccharides and cellulose secretion are regarded as shared features of the genus. In our four in-house strains, we observed variability in the utilization of the two carbon sources that promote the highest cellulose yield, that is, glucose and glycerol (Figure 1b). Comparative analysis be-

| Predicted glucose uptake mechanism
In gram-negative bacteria, glucose must pass through two membranes before it enters the cytosol. The permeability of the outer membrane is often dependent on the channels formed by porins.
Based on Prokka annotation and InterProScan search, one candidate for a glucose transporter in Komagataeibacter strains may be the homolog of OprB porin from Pseudomonas aeruginosa (Wylie & Worobec, 1995). OprBs of Komagataeibacter are also likely betabarrel proteins (predictions made using Boctopus2). The number of oprB gene copies varies depending on genome, with the highest number of fifteen genes predicted for K. sp. SXCC1 strain (Supporting Information Figure S9). Due to the high number of oprB gene copies in each Komagataeibacter genome, it is possible that these proteins play an important role in carbohydrates transport (potentially, other than glucose as well; Wylie & Worobec, 1995). The conservation pattern of OprB orthologs highlights their diversity in the Komagataeibacter genus (Figure 4a).

GGGCGGGGCT GGCCT CGT CGGT CGAGA A T AT CGCCT A T T CCGA A CGT CAG GGGCGGGGCT GGCCT CGT CGGT CGAGA A T AT CGCCT A T T CCGA A CGT CAG GGGCGGGGCT GGCCT CGT CGGT CGAGA A T AT CGCCT A T T CCGA A CGT CAG GGGCGGGGCT GGCCT CGT CGGT CGAGA A T AT CGCCT A T T CCGA A CGCCAG GGGCGGGGCT GGCCT CGT CGGT CGAGA A T AT CG CCT A T T CCGA A CGT CAG GGGCGGGGCT GGCCT CGT CGGT CGAGA A T AT CG CCT A T T CCGA A CGCCAG GGGCGGGGCT GGCCT CGT CGGT CGAGA A T AT CGCCT A T T CCGA A CGCCAG GGGCGGG ------CCAGC AGGT CGAGA A T AT CGCCT A T CCCGA A CGCCAG GGGCGGGGCT GGCCT CGT CGGT CGAGA A T AT CGCCT A T T CCGA A CGT CAG GGGCGGGGCT GGCCT CGT CGGT CGAGA A T AT CGCCT A T T CCGA A CGT CAG GAGCGGGGCT GGCGT CGT CGGT CGAGA A T AT CGCCT A T T CCGA A CGT CAG GGGCGGGGCT GGCCT CGT CGGT CGAGA A CA T CGCCT A T T CCGAGCGGC AG GGGCGGGGCT GGCCT CGT CGGT CGAGA A CA T CGCCT A T T CCGAGCGGC AG GGGCGGGA CT GGCT T CGT CGGT GGAGA A CA T T G CCT A T T CCGAGCGT CAG GGGCGGGGCT T GCGT CCT CGGT T GA A A A T AT T GCCT A T GC CGA A CGGC AG GGGCGGGGCT T GCGT CCT CGGT T GA A A A T AT T GCCT A T GC CGA A CGGC AG GGGCGGGGCT T GCGT CCT CGGT T GA A A A T AT T GCCT A T GC CGA A CGGC AG
(a) galactose permease GalP (Hernández-Montalvo et al., 2003). Up to four copies of galP gene are present in the Komagataeibacter genomes and their products are well conserved (Figure 4b). Summarizing, our findings suggest, that glucose is not phosphorylated during transport through the inner membrane, but in the cytosol (by glucokinase).
The conservation pattern of both, OprB and GalP proteins, is almost exactly the same in K. xylinus E25, K. xylinus E26, K. xylinus BCRC 12334 strains. Furthermore, direct genome sequence comparison between these three strains did not identify a putative transporter present uniquely in K. xylinus E26 strain. Therefore, we cannot explain, at this stage, why K. xylinus E26 strain is more efficient in glucose-based cellulose synthesis. It is therefore possible that the observed differences in cellulose synthesis yield between the three strains are not due to differences in glucose transport, but due to distinctive modes of glucose metabolism regulation.

| Predicted glycerol metabolic enzymes
It has been shown in Escherichia coli that the proteins encoded by the glp regulon mediate the utilization of glycerol and sn-glycerol-3-phosphate (G3P) (Cozzarelli, Freedberg, & Lin, 1968). This regulon comprises of five operons which are located in three different regions of the genome (Zhao et al., 1994). We found homologs of glp proteins, which form one putative operon composed of glpD, glpK, glpX, fba genes in the tested Komagataeibacter strains (Figure 4c).
The components of this operon encode: aerobic G3P dehydrogenase, a cytoplasmic glycerol kinase, fructose-1,6-bisphosphatase, fructose-bisphosphate aldolase, respectively. The sequence upstream of the glp operon encodes the glpF gene, whose product is a cytoplasmic membrane protein facilitating diffusion of glycerol into the cell (Weissenborn, Wittekindtn, & Larsonsii, 1992). Adjacent to the glp operon is located the glpR gene, which encodes the repressor of the glp regulon of the deoR family of the transcriptional regulators (Zeng, Ye, & Larson, 1996). The repression of the glp regulon is relived in the presence of glycerol-P (Zeng et al., 1996). The majority of Komagataeibacter genomes also contain a homolog of the glpQ gene (located outside of the glp operon), whose pro duct is a periplasmic glycerophosphodiester phosphodiesterase. Moreover, several of the genomes carry additional copies of the predicted glp genes (Figure 4d). The structure of the glp operon is well conserved in the Komagataeibacter genomes, with the highest divergence displayed by the K. hansenii strains (Figure 4c). It can be noticed that, the glpF gene is much shorter in K. xylinus E25 strain than in other K. xylinus strains (Figure 4c). It is due to a single-nucleotide insertion in its sequence, resulting in a frameshift and a premature termination (data not shown). Since GlpF is a putative glycerol diffusion facilitator, its disruption may negatively influence the glycerol uptake. This could partially explain, why K. xylinus E25 strain has the worst cellulose productivity out of the four in-house strains, in the medium containing glycerol as carbon source (Figure 1b). On the other hand, a good performance of K. hansenii ATCC 53582 in this medium may be due to the presence of an additional copy of glpF gene in its genome (Figure 4d). A further inspection of the nucleotide sequence of the glp regulon allowed the detection of a polymorphism in the otherwise well-conserved region of the glpR gene (Figure 4e). The glpR sequence of K. xylinus E26 strain contained a six-nucleotide deletion and several mutations, which translated to the deletion of two amino acids (Leu61 and Ala62 in the consensus sequence; Supporting Information Figure S11) and four amino acid substitutions (Ser63Gln, Ser64Gln, Ser71Pro, and Gly156Arg in the consensus sequence; Supporting Information Figure S11). The deletion and the adjacent mutations are located at the C-terminus of the DeoR-like helix-turnhelix DNA binding domain predicted in this protein (InterProScan analysis). It is possible that binding of the GlpR repressor to the operator of the glp operon is impaired in K. xylinus E26 strain. Therefore, the expression of the glp operon may be active even in the absence of glycerol. This may explain why K. xylinus E26 strain has the highest cellulose yield in the medium containing glycerol among the tested strains (Figure 1b). From a more general point of view, this result shows that such an important phenotypic difference may be due to discrete changes in a single gene sequence, which influences indirectly the metabolic pathways via regulatory mechanisms.
These genes are homologous to gum-like heteropolysaccharide genes of Xanthomonas campestris responsible for xanthan synthesis, which has been recently discovered in Kozakia baliensis (Brandt, Jakob, Behr, Geissler, & Vogel, 2016). In K. xylinus E25 strain, this cluster consists of seventeen genes (Figure 5a). It is completely absent in K. europaeus LMG 18494, whereas the majority of these genes is missing in the K. hansenii strains, which was additionally verified using tblastn (Figure 5b). The most important seems to be the absence of aceA (gumD), which is thought to initiate acetan biosynthesis by transferring a glucosyl-1-phosphate residue from UDP-glucose to an undecaprenyl-phosphate lipid carrier anchored in the inner membrane (Brandt et al., 2016;. This would implicate that the K. hansenii strains and K. europaeus LMG 18494 should not produce acetan. However, it has been shown that some strains of these species do secrete EPS of similar monosaccharides composition as in acetan (Fang & Catchmark, 2014Valepyn, Berezina, & Paquot, 2012). It has been also observed that the presence of acetan in a culture medium influences its viscosity, thus enhancing cellulose dispersion (Ishida, Sugano, Nakai, et al., 2002). Moreover, it has been shown that EPS can modulate bundling and width of cellulose ribbons, and thus influencing cellulose porosity (Fang & Catchmark, 2014. Therefore, the differences in cellulose membrane structure observed for K. hansenii and the other three in-house strains may be due to the divergence in soluble EPS synthesizing enzymes.
The acetan gene cluster is flanked by pol genes (colored in navy blue in the Figure 5a,b). These genes are organized in a cluster in Acetobacter tropicalis and Kozakia baliensis (polABCDE; Deeraksa et al., 2005;Brandt et al., 2016). This cluster is responsible for the biosynthesis of capsular polysaccharide (CPS), which consists of galactose, glucose, and rhamnose. It has been shown that polE plays an important role in this process, since it is likely responsible for anchoring of the polysaccharide to the cell surface (Deeraksa et al., 2005). CPS is thought to serve as a protective barrier from acetic acid in AAB. However, it has been observed that Komagataeibacter species do not produce this membrane polysaccharide (Andrés- Barrao et al., 2013;Barja et al., 2016).
Our genomic analysis supports this finding as we observed that in all analyzed strains, polABCD operon is disrupted by the acetan cluster and, moreover, lacks the crucial polE gene (Figure 5a,b).
Komagataeibacter species must clearly have a distinctive, from other AAB, mechanism for acetic acid resistance.

| Levan biosynthesis
Apart from acetan, K. xylinus was also reported to synthesize levan, an EPS produced from extracellular sucrose (Kornmann et al., 2003;Limoli, Jones, & Wozniak, 2015). Activity of levansucrase, which is the main enzyme involved in the levan biosynthesis process, has been reported in K. xylinus I-2281 strain (Kornmann et al., 2003). By screening Komagataeibacter genomes (using tblastn with the levansucrase (LsdA) protein sequence of Ga. diazotrophicus SRT4; Arrieta et al., 2004), we were unable to find levansucrase gene in all but K. xylinus NBRC 13693 and K. kakiaceti JCM 25156 strains. In K. xylinus NBRC 13693 strain, levansucrase gene is arranged in a cluster with levanase (lsdB) gene ( Figure 5c) and the components of the type II secretion system (data not shown), similarly as it has been reported in Ga. diazotrophicus (Arrieta et al., 2004). The levanase is responsible for hydrolysis of levan to free fructose. Therefore, these genes are necessary for the bacterium to feed on sucrose. It has been reported that in Gluconacetobacter representatives, sucrose cannot be transported through cell membrane and has to be hydrolyzed into glucose and fructose in the periplasm (Velasco-Bedrán & López-Isunza, 2007). However, the obtained results suggest that the majority of Komagataeibacter strains should not synthesize levan. Other systems of sucrose transport and hydrolysis were not reported for this genus, and we were unable to discover them in the sequenced genomes. We have shown that culturing of the in-house strains in the medium with sucrose results in a low cellulose yield, except for K. xylinus BCRC 12334 strain (Figure 1b).
Since the genome of K. xylinus BCRC 12334 strain does not encode the lsdA-lsdB operon, this would suggest that there should exist another, yet undefined, system for sucrose metabolism in Komagataeibacter strains.
Furthermore, PilZ domain of this subunit has been shown as c-di-GMP binding site (Fujiwara et al., 2013;Morgan, Strumillo, & Zimmer, 2012). BcsB is a periplasmic protein, which participates in β-glucan chain synthesis and translocation . Role of BcsC and BcsD is unclear, and however, they are required for cellulose biosynthesis in vivo Römling, 2002). The type I cellulose synthase operon has on average size of 9 kbp (Figure 6a). When the structure of this operon is compared inside of the Komagataeibacter genus, several differences can be observed. First, in the K. hansenii strains, genes coding for subunits bcsAI and bcsBI are fused, unlike in the K. xylinus strains, as it was reported before (Saxena et al., 1994). What is more, some diversity in the structure of bcs genes can be observed. The subunit A of K. hansenii ATCC 53582 strain is much longer than in any of the Komagataeibacter strains. For K. hansenii ATCC 23769 and K. hansenii JCM 7643 strains, bcsCI is split into two, not similar parts.
In case of other strains, for K. medellinensis, bcsBI is disrupted due to a frameshift mutation, as previously reported for this cellulosenegative strain (Matsutani et al., 2015). The alignment shown in the subunit that is very well conserved among Komagataeibacter strains is bcsDI, which function is up to date the most speculative. It was suggested that bcsDI has originally evolved in AAB since it lacks homology with any other bacteria (Matsutani et al., 2015).
The type II bcs operon consists of four genes, bcsABII, bcX, bcsY, and bcsCII. It is believed that bcsII synthesizes the acylated, amorphous cellulose due to homology of bcsY to transacylase (Chawla et al., 2009;Umeda et al., 1999). This operon is absent in K. europaeus LMG 18494 and K. europaeus NBRC 3261 strains, as well as in Ga. diazotrophicus PAl 5 strain (Figure 6b), which was additionally checked using tblastn. K. oboediens 174Bp2 is unusual, as its genome contains four copies of bcsII operon (Figure 6b) (Matsutani et al., 2015). Additionally, there are many insertions or deletions (InDels) present at the start of the bcsCII across the compared strains, likely causing frameshift mutations in some of them.
Sequenced-based predictions suggest that bcsCII, like bcsCI, is a beta-barrel protein, likely forming a channel in the outer membrane (predictions made using Boctopus2; Hayat, Peters, Shu, Tsirigos, & Elofsson, 2016). Since the putative role of bcsC is export of cellulose, disturbance of bcsCII may influence secretion of the acylated polymer. Furthermore, sequence variations, in the both subunits, across Komagataeibacter strains may be responsible for differences in cellulose structure. Generally, the often-seen disruption of mainly the bcsC subunits suggests that cellulose export may be the first target of evolutionary forces.  K _ e u r o p a e u s _ L M G _ 1 8 8 9 0 K _ e u r o p a e u s _ N B R C _ 3 2 6 1 K _ h a n s e n ii _ A T C C _ 2 3 7 6 9 K _ h a n s e n ii _ A T C C 5 3 5 8 2 K _ h a n s e n ii _ J C M _ 7 6 4 3 K _ in te r m e d iu s _ A F 2 K _ in te r m e d iu s _ T F 2 K _ k a k ia c e ti _ J C M _ 2 5 1 5 6 K _ m e d e ll in e n s is _ N B R C _ 3 2 8 8 K _ o b o e d ie n s _ 1 7 4 B p 2 K _ r h a e ti c u s _ A F 1 K _ s p _ S X C C 1 K _ x y li n u s _ B C R C _ 1 2 3 3 4 K _ x y li n u s _ E 2 5 K _ x y li n u s _ E 2 6

| Diversity in the c-di-GMP-based regulatory network
K _ x y li n u s _ N B R C _ 1 3   Figure 7a). Already at this point, it appears a unique feature of the Komagataeibacter genus to have both domains present in tandem in the majority of c-di-GMP-metabolizing proteins (Figure 7a).
In Gluconacetobacter diazotrophicus PAl 5 and in strains of genera Acetobacter and Gluconobacter, it appears that this distribution is less skewed and there are many proteins harboring only a single, mostly a GGDEF, domain (Figure 7a,b). These numbers are nevertheless much higher than previously proven by Prof. Moshe Benziman (three cdg operons mentioned above; Tal et al., 1998).
After observing diversity in the number of c-di-GMP turnover proteins across the Komagataeibacter genomes, we next investigated how well these proteins are conserved. We observed a very complicated presence/absence pattern of these important proteins in the twenty analyzed genomes (Figure 7c). Furthermore, even the three canonical operons identified by Tal et al. (1998)    and may suggest primitive stage of these pathways organization in the Komagataeibacter genus. There is not enough data to speculate about specialization of the discovered genes in c-di-GMP metabolism or signaling by allosteric interactions. Nevertheless, cellulose producers seem to be a very poorly studied group of bacterial strains, which, when explored further, can bring interesting molecular discoveries in nucleotide-regulated signaling field.

| CON CLUS IONS
Importantly, this work contributed the first description of gene clusters important for carbohydrate metabolism (glucose and glycerol transport, EPS synthesis and secretion). Furthermore, comparative analysis enabled finding of Komagataeibacter features, believed to be genus-typical, which appeared to be strainspecific, for example, acetan and levan synthesis, or presence of three cdg operons. Therefore, many metabolic features, revealed by the previously reported production process optimizations, may appear not common for the whole genus. Moreover, great phenotypic differences may not be clearly distinguishable on genome level as it was exemplified here by our K. xylinus in-house strains, which were overall similar in DNA sequence. Most probably phenotypic diversity among the compared strains was due to discrete sequence changes (glpR polymorphism in K. xylinus E26 strain) or fine-tuning of gene expression and protein activities via yet to-bediscovered signaling pathways (the poorly understood c-di-GMP signaling network, hypothetical proteins identified as unique on genomic islands and in ortholog groups). Taken together, our findings showed that the genomic approach is sensible for elucidation of the numerous misleading results found in the literature. Precise In the future, together with the growing number of genomic sequences and biochemical studies, new genome engineering strategies should become available for cellulose production optimization.

ACK N OWLED G M ENTS
We are grateful to OpenExome s.c. and Department of Genetics, World Hearing Center, Institute of Physiology and Pathology of Hearing, Kajetany/Warsaw, Poland, for providing the training and facilities for genome sequencing. We would like to thank Teresa Pankiewicz for her valuable and multilevel help throughout the project's time. We thank Jolanta Płoszyńska for her help with media and strain preparation. We would like to further acknowledge Aleksandra Placzyńska (nee Kot) for her contribution to CRISPR-Cas loci finding. Furthermore, we would like to acknowledge Ganga Jena for her support. We are grateful to Nova Do and Bartłomiej Jan Blus for proofreading of the manuscript.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.

E TH I C A L S TATEM ENT
The presented research did not involve studies with human or animal subjects or recombinant DNA.

DATA ACCE SS I B I LIT Y
The DNA sequences of the assembled genomes are available at NCBI under the BioProject accession: PRJNA339514, PRJNA339679, PRJNA339678. Additional data sets generated in this work are provided in the Supporting Information Data S1 of this article.