Conserved Active Site Architecture Between Bacterial Cellulose and Chitin Synthases

Glycosyltransferases (GTs) are a large and diverse group of enzymes that catalyze the formation of a glycosidic bond between a donor molecule, usually a monosaccharide, and a wide range of acceptor molecules, thus playing critical roles in various essential biological processes. Chitin and cellulose synthases are two inverting processive integral membrane GTs belonging to the type-2 family and are involved in the biosynthesis of chitin and cellulose, respectively. Herein, we report that bacterial cellulose and chitin synthases share a common E-D-D-ED-QRW-TK active site motif that is spatially co-localized. This motif is conserved among evolutionarily distant bacterial species despite their low amino acid sequence and structural similarities. This theoretical framework offers a new perspective on the current view that bacterial cellulose and chitin synthases are substrate specific and that chitin and cellulose are organism specific. It lays the ground for future in vivo and in silico assessment of the catalytic promiscuity of cellulose synthase toward uridine diphosphate N-acetylglucosamine and of chitin synthase toward uridine diphosphate glucose.

Melina Shamshoum [a] and Filipe Natalio* [a]
Glycosyltransferases (GTs) are essential enzymes that belong to a diverse and extensive group, playing vital roles in various biological processes, such as cell signaling, immune response, and protein folding. [1] Mechanistically, GTs catalyze the transfer of a sugar moiety from an activated donor molecule, most commonly a uridine diphosphate monosaccharide, to a wide range of acceptor biomolecules, including proteins, lipids, and other carbohydrates, thereby forming a glycosidic linkage. [2] Chitin and cellulose synthases are examples of the type-2 GT family that comprise inverting processive integral membrane proteins, which are involved in the biosynthesis of chitin and cellulose, respectively. [3] They catalyze the cleavage of uridine diphosphate nucleotide monosaccharides, namely, UDP-glucose [4] and UDP-N-acetylglucosamine, [5] to form the corresponding polymer. [4a,5] In both biopolymers, each monomer is covalently linked to the non-reducing end of the nascent linear polymer by β-1,4-glycosidic bonds and rotated by 180° relative to the last added monomer. As a result, the nascent polymer is channeled through the transmembrane pore, extruded outside the cell, [6] and assembled into a highly organized structure.
It is generally accepted that cellulose synthases originated in many diverse bacteria, including members of the cyanobacterial lineage, probably as a response to environmental stresses such as UV radiation and desiccation. [7] The cyanobacterial ancestral genes were potentially transferred horizontally to algae and then, by vertical evolution, to plants. [7][8] The divergence of chitin synthases from cellulose synthases is still debated. Ruiz-Herrera et al. suggested that approximately 1 billion years ago, chitin synthases diverged from a common β-glycosyltransferase ancestor with broad substrate specificity. [9] This divergence might have initiated synthase substrate specialization, [10] embodied by a remarkable separation between organisms that build either cellulose or chitin, i.e., cellulose developed into a major component of plant cell walls, [11] while chitin became a major component in the cell walls of fungi, single-celled algae, and the exoskeletons of arthropods. [6c] Later, a genome-wide study by Gonçalves et al. proposed that bacterial chitin synthases originated through horizontal transfer from a eukaryotic donor. [12]
In this study, we present findings that reveal a shared E-D-D-ED-QRW-TK motif in the active sites of bacterial cellulose and chitin synthases. Despite the low global similarity in amino acid sequence and structure, this motif is consistently observed in the active sites of cellulose and chitin synthases from various bacterial species that are evolutionarily distant from each other. Additionally, this motif is spatially co-localized within these synthases.
We initially explored the level of global similarity between the amino acid sequences of bacterial cellulose synthase subunit A from Rhodobacter sphaeroides (BscA) [6a] and bacterial chitin synthase 1 from Pectobacterium atrosepticum (Pachs) and observed low similarity (21.84 %) (Figure S1). However, the high similarity between the two substrates suggests local similarities in the catalytic sites, potentially overshadowed by the constraints of whole-sequence comparisons at extremely high divergence. We therefore compared the spatial distribution of the amino acids located in the catalytic pocket of BscA and Pachs. While the structure of BscA has been resolved, [6a] the structure of Pachs was predicted using AlphaFold. The results revealed spatial superposition of some of the active site residues of BscA and Pachs. Figure 1a shows their superimposition, where similarity is color-coded in green. This spatial analysis identified an overlapping E-D-D-ED-QRW-TK motif represented by the residues E151/E54, D180/D91, D246/D190, E342/E303, D343/D308, Q379/Q342, R382/R345, W383/W346, T506/T544, and K508/K545 for BscA/Pachs, respectively. The conserved E-D-D-ED-QRW-TK motif overlaps reasonably well in space between BscA and Pachs (Figure 1b). However, the divergent positions of residues T506/T544 and K508/K545 might be attributed either to the T544 and K545 residues of Pachs being located in a less well-predicted region or to a true structural effect induced by the absence of an alanine residue between the T and K residues in bacterial chitin synthases.
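As an illustration of how a global pairwise score of the kind quoted above can be obtained from an existing alignment, the following minimal Python sketch computes percent identity over the columns in which neither sequence has a gap. This is only one of several common conventions, and it measures identity rather than similarity; the scoring scheme behind the reported 21.84 % is not stated in the text, so the function name `percent_identity` and the toy fragments are assumptions made for illustration and are not the BscA or Pachs sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (one convention among several)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy fragments, NOT real BscA/Pachs sequences.
toy_a = "MEDD-QRGW"
toy_b = "MKDDAQRRW"
print(percent_identity(toy_a, toy_b))
```

Choosing a different denominator (for example, the full alignment length or the length of the shorter sequence) changes the value, which is why the convention should be reported alongside any such percentage.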
Interestingly, in the QRxW region, we found that x is a glycine residue (G381) in BscA, which is replaced by arginine (R344) in Pachs. These amino acids do not seem to interact directly with the substrate. However, their location within the first coordination sphere, between two crucial amino acids involved in catalysis, could potentially impact the orientation of the active site residues, the orientation of the substrate, and ultimately the efficiency of the catalytic process.
We explored whether the E-D-D-ED-QRW-TK motif is conserved in other cellulose and chitin synthases from different bacterial species and genera. To this end, we retrieved 83 bacterial sequences and aligned them using Clustal Omega. The resulting phylogenetic tree revealed a clear evolutionary separation between cellulose and chitin synthases (Figure S2). We then aligned the same 83 sequences according to the E-D-D-ED-QRW-TK motif using a semi-automated approach that combines ProbCons [14] and manual alignment. We found the E-D-D-ED-QRW-TK motif to be conserved in all of the bacterial amino acid sequences analyzed (Figure S3), with arginine consistently present in chitin synthases and glycine in cellulose synthases at position x of the QRxW region.
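The glycine/arginine pattern at position x of the QRxW region can be tallied across a set of sequences with a few lines of Python. The sketch below is not the semi-automated ProbCons-plus-manual procedure described above; it simply searches each ungapped sequence for the first `QR?W` occurrence, which assumes that the first hit is the active-site one. The helper names and the toy fragments are hypothetical.

```python
import re
from collections import Counter

# Pattern for the QRxW stretch; the variable residue x is captured.
QRXW = re.compile(r"QR(.)W")


def qrxw_residue(sequence: str):
    """Return residue x of the first QRxW hit, or None if there is no hit."""
    hit = QRXW.search(sequence.replace("-", "").upper())  # drop alignment gaps
    return hit.group(1) if hit else None


def tally_by_family(records):
    """records: iterable of (family, sequence) -> {family: Counter of x residues}."""
    tally = {}
    for family, seq in records:
        tally.setdefault(family, Counter())[qrxw_residue(seq)] += 1
    return tally


# Hypothetical toy fragments for illustration only.
toy_records = [
    ("cellulose", "AAEDLLQRGWTAK"),
    ("cellulose", "GGED--QRGWSTK"),
    ("chitin", "PPEDMMQRRWTK"),
]
print(tally_by_family(toy_records))
```

For real data, restricting the search to the alignment columns that hold the motif would be safer than a whole-sequence pattern match, because an unrelated QRxW stretch elsewhere in the protein would otherwise be counted.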
The phylogenetic tree built from this semi-automated alignment suggests that bacterial cellulose synthases preceded the origin of bacterial chitin synthases (Figure 2). According to Figure 2, the evolutionary mixture of bacterial cellulose and chitin synthases can be attributed to a gradual progression of evolutionary changes through several horizontal gene transfers between eukaryotes and bacteria and/or between bacteria, while conserving the active site architecture. This hypothesis is supported by the genome-wide study by Gonçalves et al. [12] In this hypothetical framework, the occurrence of an early branching during the separation of the Plantae kingdom from the eukaryotic lineage approximately one billion years ago [9] is considered less likely. If chitin synthases had diverged into a distinct clade, subsequent horizontal gene transfer from a eukaryotic chitin synthase to a bacterium would exclude the involvement of cellulose intermediates, as depicted in Figure 2.
[Figure 1: superimposed active sites of BscA [6a] and Pachs, color-coded in green for similar amino acids. The structure of Pachs was predicted using AlphaFold. [13]]
[Figure 2: phylogenetic tree built from the semi-automated alignment combining ProbCons [14] and manual alignment, which suggests that bacterial cellulose synthases preceded the origin of bacterial chitin synthases. Red arrows point to selected cellulose synthases. Black arrows point to selected chitin synthases.]
We explored whether the conserved E-D-D-ED-QRW-TK motif is spatially co-localized in the active sites of both cellulose and chitin synthases. To this end, we selected four evolutionarily distant bacterial cellulose synthases (Abcs, Alphaproteobacteria bacterium MedPE-SWcel; BscA, Rhodobacter sphaeroides; Otcs, Oceaniglobus trochenteri; Pocs, Pararhodobacter oceanensis) (Figure 2, red arrows) and four evolutionarily distant bacterial chitin synthases (Dpchs, Dickeya parazeae; Pachs; Avchs, Agrobacterium vitis; and Pcchs, Pseudomonas cichorii) (Figure 2, black arrows). All of the selected bacterial chitin synthases belong to different clades. Abcs belongs to an early clade together with the cellulose synthase from Pseudoceanicola pacificus. BscA is found within clades containing mainly cellulose synthases. Otcs represents a clade immediately before the branching into two major clades, one containing cellulose synthases such as Pocs and the other containing chitin synthases.
We used AlphaFold to predict the structures of these seven synthases. We then compared the cellulose synthase structures with the X-ray diffraction-resolved structure of BscA [6a] (Figure S4) and the chitin synthase structures with the EM-resolved structure of Candida albicans chitin synthase 2 (Cachs2 bound to UDP-GlcNAc, pdb: 7STM) [6b] (Figure S5). The complete 3D structures of the seven synthases predicted by AlphaFold exhibited an extensive degree of overlap with those with either X-ray diffraction or EM-resolved structures (Figures S4 and S5).
[Figure 3: a) Superimposed 3D structures of the E-D-D-ED-QRW-TK conserved motif from the active sites of bacterial cellulose synthases. This includes the X-ray diffraction-resolved structure from Rhodobacter sphaeroides subunit A (BscA) [6a] and three AlphaFold-predicted structures from the bacteria Alphaproteobacteria bacterium MedPE-SWcel, Oceaniglobus trochenteri, and Pararhodobacter oceanensis, rainbow-colored from the N- to C-terminus. In all cases, the amino acids show a remarkable spatial overlap. b) Superimposed 3D structures of the E-D-D-ED-QRW-TK conserved motif from the active site amino acids of the EM-resolved structure from Candida albicans chitin synthase 2 [6b] and four evolutionarily distant bacterial chitin synthases whose structures were predicted by AlphaFold, namely, Pectobacterium atrosepticum, Agrobacterium vitis, Dickeya parazeae, and Pseudomonas cichorii, rainbow-colored from the N- to C-terminus. The amino acids display a similar spatial distribution for all of the structures except for residue R646 (R, lemon green) and the T and K residues (orange and red) located in a low electron-density region. c) Superimposed 3D structures of the E-D-D-ED-QRW-TK conserved motif from the active sites of the four evolutionarily distant bacterial chitin synthases, Cachs2, and the four evolutionarily distant bacterial cellulose synthases, showing good overlap and similar 3D architecture. d) Same as (c) but with a molecular surface rainbow-colored from the N- to C-terminus, showing similar spatial distribution of the designated amino acids and 3D superposition of the conserved active site motif in both cellulose and chitin synthases. All figures were prepared using UCSF Chimera X v.14. [15]]
Then, we extracted this motif from the X-ray diffraction-resolved structure of BscA, [6a] from the cryo-EM-resolved structure of Cachs2 bound to UDP-GlcNAc, and from the seven 3D structures predicted by AlphaFold, using the semi-automated alignment to locate the amino acid positions (Figure S3, Table S1).
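A residue-level extraction of this kind can be sketched in plain Python by reading the fixed-column ATOM records of a PDB-format file and keeping the C-alpha atom of each requested residue. This is a minimal illustration and not the authors' workflow: the function names are hypothetical, chain "A" is an assumption, and the coordinates in the toy records are invented (only the residue numbers 151 and 180 echo the BscA numbering given in the text).

```python
def ca_coordinates(pdb_text: str, residue_numbers, chain: str = "A"):
    """Map residue number -> (x, y, z) of its C-alpha atom (PDB fixed columns)."""
    wanted = set(residue_numbers)
    coords = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 54:
            continue  # ignore HETATM, headers, and truncated records
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        resseq = int(line[22:26])
        if resseq in wanted and resseq not in coords:  # keep first alt. location
            coords[resseq] = (
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
    return coords


def toy_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format one ATOM record in PDB column layout (helper for toy data only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Invented coordinates; only the residue numbering echoes the text.
toy_pdb = "\n".join([
    toy_atom_line(1, " N  ", "GLU", "A", 151, 1.0, 2.0, 3.0),
    toy_atom_line(2, " CA ", "GLU", "A", 151, 1.5, 2.5, 3.5),
    toy_atom_line(3, " CA ", "ASP", "A", 180, -4.0, 0.25, 9.125),
    toy_atom_line(4, " CA ", "ALA", "A", 200, 0.0, 0.0, 0.0),
])
print(ca_coordinates(toy_pdb, [151, 180]))
```

Using only C-alpha atoms keeps the sketch short; a side-chain-aware comparison would need all heavy atoms of each motif residue.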
We found that the 3D spatial distribution of the conserved active site E-D-D-ED-QRW-TK motif in three evolutionarily distant cellulose synthases (Abcs, Otcs, and Pocs) overlaps well with that of the corresponding residues in the X-ray diffraction-resolved structure of BscA (Figures 3a and S4).
Regarding the chitin synthases, a direct comparison of the conserved E-D-D-ED-QRW-TK motif between the cryo-EM-resolved structure of Cachs2 and the four evolutionarily distant chitin synthases (Pachs, Avchs, Pcchs, Dpchs) shows reasonably good overlap (Figures 3b and S5), with some discrepancies in the R646 residue and the TK region. Specifically, the R646 residue in Cachs2 adopts a conformation distinct from that in the predicted structures of the other chitin synthases (Figure S6a). However, it is noteworthy that the region surrounding R646 displays no discernible electron density (Figure S6b). Moreover, the most significant disparities among the chitin synthases examined, including Cachs2, were found in the 3D spatial location of the TK motif. In the Cachs2 structure, the TK region lacked electron density, leaving the most likely spatial location of these two residues unresolved. Furthermore, this region was not modeled in the cryo-EM-resolved structure of chitin synthase 1 from Phytophthora sojae, [6d] also due to the absence of electron density (Figure S7). Overall, the conserved active site E-D-D-ED-QRW-TK motif of these representative bacterial cellulose and chitin synthases exhibits a remarkably similar 3D spatial distribution (Figures 3c and d).
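The degree of spatial agreement between two sets of motif residues can also be expressed as a single number. The sketch below uses a distance RMSD (dRMSD), which compares the matching intra-motif pair distances of two structures and therefore needs no prior superposition. This is a swapped-in, simpler measure and not the superimposition carried out for the figures; the function name and the toy coordinates are assumptions, and the measure cannot distinguish a structure from its mirror image.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """Root-mean-square difference between matching intra-set pair distances."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally sized sets of at least two points")
    squared = []
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])  # distance within structure A
        d_b = math.dist(coords_b[i], coords_b[j])  # same residue pair in B
        squared.append((d_a - d_b) ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Toy points: set_b is set_a rotated by 90 degrees about z and then shifted,
# so every pair distance is preserved and the dRMSD is numerically zero.
set_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (1.0, 1.0, 2.0)]
set_b = [(-y + 10.0, x - 5.0, z + 1.0) for x, y, z in set_a]
print(distance_rmsd(set_a, set_b) < 1e-9)
```

The two coordinate lists must hold equivalent residues in the same order, for example the ten motif positions of two synthases as located through the alignment.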
Here, we demonstrated that the active site architecture of cellulose and chitin synthases is highly conserved between evolutionarily distant species, despite the very high dissimilarities among the overall amino acid sequences and 3D structures. This creates a theoretical framework for future experimental validation, in which recombinant cellulose synthases can be probed for their catalytic activity toward uridine diphosphate N-acetylglucosamine and recombinant chitin synthases toward uridine diphosphate glucose. Combined with in silico simulations, the generation of active site mutants, and the assessment of their catalytic activity, this can reveal new insights into the relationship between mechanism, architecture, and catalytic promiscuity of these enzymes.

Supporting Information
Supporting information is available at: https://github.com/fnatalio/phylogenetics. It also contains all the AlphaFold structures, the selected/extracted active site E-D-D-ED-QRW-TK motifs, the phylogenetic tree, and the alignment of 83 sequences from bacterial chitin and cellulose synthases.