Polysaccharide utilization loci encoded DUF1735 likely functions as membrane‐bound spacer for carbohydrate active enzymes

Proteins featuring the Domain of Unknown Function 1735 are frequently found in Polysaccharide Utilization Loci, yet their role remains unknown. The domain and vicinity analyzer programs we developed mine the Kyoto Encyclopedia of Genes and Genomes and UniProt to enhance the functional prediction of DUF1735. Our datasets confirmed the virtually exclusive presence of DUF1735 in Bacteroidota genomes, with Bacteroides thetaiotaomicron harboring 46 copies. Notably, 97.8% of DUF1735 are encoded in PULs, and 89% are N‐termini of multimodular proteins featuring C‐termini like Laminin_G_3, F5/8‐typeC, and GH18 domains. Predominantly possessing a predicted lipoprotein signal peptide and sharing an immunoglobulin‐like β‐sandwich fold with the BACON domain and the N‐termini of SusE/F, DUF1735 likely functions as an N‐terminal, membrane‐bound spacer for diverse C‐termini involved in PUL‐mediated carbohydrate utilization.

induces the expression of the susA-G gene cluster (Fig. 1B) [12]. Subsequently, the lipid-anchored sugar-binding protein SusD and the cell surface starch-binding proteins SusE and SusF bind starch and bring the substrate into close proximity to the membrane-bound endo-acting α-amylase SusG [6-11]. SusG further hydrolyses starch into maltooligosaccharides, which are transported into the periplasmic space via the TonB-dependent sugar transporter SusC. In the periplasm, SusA and SusB, acting as neopullulanase and α-glucosidase respectively, further degrade the maltooligosaccharides into glucose, which is transported into the cell for glycolysis (Fig. 1B). Similar physically linked gene clusters encoding synergistically acting proteins that degrade complex carbohydrates were found to be conserved within the phylum Bacteroidota (previously named Bacteroidetes) [4,13-16]. Since the susC/D gene pair, encoding SusC/D, was highly conserved, its homologs became the genetic markers of gene clusters named Polysaccharide Utilization Loci (PULs). The PUL database (PULDB) (www.cazy.org/PULDB/ [17]) predicts PULs in sequenced genomes by localizing a susC/D gene pair and extending the PUL with physically linked genes encoding CAZymes and regulatory proteins. Besides proteins with predicted functions, PULs can also contain Proteins of Unknown Function (PUFs) that often contain Domains of Unknown Function (DUFs), making PULDB a valuable tool for novel CAZyme discovery. A powerful advantage of PUL-based enzyme discovery is the possibility to predict putative substrates for the PUFs present by matching the predicted catalytic functions of annotated enzymes of the PUL with the monosaccharides and linkages present in a substrate. While using this approach, PUFs containing DUF1735 have frequently been observed one to two genes downstream of the susC/D gene pair (referred to as the susE-F position). DUF1735 has been reported as the most overrepresented Pfam-A domain (curated Pfam domains; automatically generated domains are labeled as Pfam-B) in the Human Gut Microbiota (HGM), and a remote homology of DUF1735 to SusE has been mentioned [18-20]. To our knowledge, only two DUF1735-containing proteins, namely BT3986 (DUF1735 + LamG3; UniProt: Q8A0N5_BACTN) and BT3987 (DUF1735 + GH18; UniProt: Q8A0N4_BACTN), have been investigated, but only their C-termini were functionally characterized and used for naming the complete protein [21-23]. The exclusion of DUF1735 from the protein annotation made gathering all published knowledge challenging. Therefore, PULDB was first used to identify all literature-derived PULs encoding DUF1735 and to collect the corresponding literature. Subsequently, we developed two programs, the DOMAIN ANALYZER and the VICINITY ANALYZER, which generated comprehensive datasets of all DUF1735-containing proteins available in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and UniProt [23,24]. These datasets allowed the analysis of the taxonomic distribution of DUF1735 homologs, their abundance within genomes, as well as their PUL and domain architecture.
Due to the observed abundance and specific location of DUF1735 at the susE-F position in PULs of species belonging to the Bacteroidota phylum, we hypothesized that DUF1735 likely has a function in bacterial carbohydrate utilization. The previously investigated Bacteroidota-Associated Carbohydrate-binding Often N-terminal (BACON) domain (IPR024361, PF13004, cd14948) was also frequently found in the phylum Bacteroidota. It was described as an N-terminal domain of multimodular proteins involved in carbohydrate utilization, and mucin was predicted as its putative substrate [6,13,15,16,25]. The viral-specific crAss-BACON domain (from now on called BACON_2, PF19190) evolved from the BACON domain and is encoded in crAss-like phages that are predicted to infect species of the Bacteroidota phylum [25,26]. The common N-terminal location of DUF1735, BACON, and BACON_2 and their association with the Bacteroidota phylum motivated us to generate the same dataset for all three domains [25,26].

Analysis of DUF1735-containing proteins within literature-derived PULs
The locus tags of DUF1735 (PF08522)-containing proteins within literature-derived PULs in PULDB (www.cazy.org/PULDB/ [17]) were provided by Prof Nicolas Terrapon (accessed on 15.09.2022). For each hit, the gene ID, host species, domain architecture, predicted functional annotation, and predicted substrate specificity of the PUL were manually extracted from PULDB and a database was created (Table S1). This initial dataset was used to confirm the abundance of DUF1735 in PULs and to determine the position of DUF1735-containing proteins within literature-derived PULs.

Mining KEGG and UniProt for homologs with the DOMAIN ANALYZER
The automated analysis of the target domains was conducted with the DOMAIN ANALYZER (https://github.com/gaenssle/DomainAnalyzer), a program written in PYTHON 3.8 [27]. This program performs three distinct steps. First, it retrieves all gene IDs associated with the entered domain name (DUF1735, BACON, BACON_2) from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and UniProt [23,24] via the DBGET function of genome.jp. Then, information on taxonomy, sequence, and domain architecture is downloaded for each gene ID. For UniProt IDs, data are retrieved via genome.jp. KEGG entries are obtained using the KEGG package from the Biopython module (Bio.KEGG.REST), with the exception of the functional domain annotation, which was accessed from kegg.jp and subsequently filtered with a cutoff E-value of 0.0001 to refine the analysis of the domain architecture. Last, the retrieved data are summarized and counted based on the distribution of domain architecture and taxonomic classification. The generated datasets are available in Tables S2A-C and S3 and are based on data downloaded in February 2024.
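The filtering and counting steps described above can be illustrated with a minimal sketch. It is not the DOMAIN ANALYZER code itself: the record layout (`name`, `start`, `evalue`), the function names, and the choice to keep hits with an E-value at or below the cutoff are assumptions made for illustration, and the example proteins are hypothetical.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-4  # cutoff stated in the Methods (0.0001)


def domain_architecture(hits, cutoff=E_VALUE_CUTOFF):
    """Join domain names from N- to C-terminus, keeping hits with E-value <= cutoff.

    `hits` is a list of dicts with the hypothetical keys
    'name', 'start' (first residue of the hit) and 'evalue'.
    """
    kept = [hit for hit in hits if hit["evalue"] <= cutoff]
    kept.sort(key=lambda hit: hit["start"])
    return " + ".join(hit["name"] for hit in kept)


def count_architectures(proteins, cutoff=E_VALUE_CUTOFF):
    """Count how often each filtered domain architecture occurs."""
    return Counter(domain_architecture(hits, cutoff) for hits in proteins.values())


# Hypothetical example records, not taken from the published tables.
example_proteins = {
    "protein_1": [
        {"name": "GH18", "start": 200, "evalue": 1e-30},
        {"name": "DUF1735", "start": 30, "evalue": 1e-12},
    ],
    "protein_2": [
        {"name": "DUF1735", "start": 25, "evalue": 1e-9},
        {"name": "Laminin_G_3", "start": 180, "evalue": 1e-20},
        {"name": "BACON", "start": 400, "evalue": 0.5},  # fails the cutoff
    ],
    "protein_3": [
        {"name": "DUF1735", "start": 28, "evalue": 1e-8},
        {"name": "GH18", "start": 210, "evalue": 1e-25},
    ],
}

architecture_counts = count_architectures(example_proteins)
```

Sorting by start position before joining makes the architecture string independent of the order in which hits are reported, so that identical domain arrangements are counted together.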

Studying genomic neighbors of target domains with VICINITY ANALYZER
To study whether a target domain (DUF1735, BACON, or BACON_2) was encoded in a PUL, the VICINITY ANALYZER (https://github.com/gaenssle/VicinityAnalyzer) was developed. It counts the occurrence of SusD (or other inputs) within a distance of ±5 genes from the target domain. The continuous numbering of genes within a genome stored in KEGG was exploited, as KEGG gene IDs consist of the corresponding KEGG genome ID and the index of the gene within the genome. As the gene indices of a genome were not always consecutive numbers (e.g. 1, 2, 3, ...) but also occurred in steps of 5 and 10, the program was written to both detect and correct for such index numbering. By calculating the distance between KEGG gene IDs, neighboring genes can be identified. SusD was selected as a reliable PUL marker because it was consistently assigned a KEGG Orthology (KO). Besides the SusD KO K21572 (indicated as SusD-ID in Table S4), the SusD-associated domains 'RagB' and 'SusD-like domain' (indicated as SusD-D in Table S4), and the corresponding annotations 'SusD' and 'RagB' (indicated as SusD-N in Table S4) were applied as indirect PUL identifiers. The generated dataset is based on data downloaded in February 2024 and can be found in Table S4.
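The index-based neighborhood test can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the VICINITY ANALYZER source: the gene ID format, the helper names, and the heuristic of taking the most common gap between indices as the numbering step are all hypothetical choices.

```python
import re
from collections import Counter


def split_gene_id(gene_id):
    """Split an ID such as 'xx:XX_0050' into (prefix, numeric index)."""
    match = re.match(r"^(.*?)(\d+)$", gene_id)
    if match is None:
        raise ValueError(f"no trailing gene index in {gene_id!r}")
    return match.group(1), int(match.group(2))


def infer_step(indices):
    """Guess the numbering step (e.g. 1, 5 or 10) as the most common gap."""
    ordered = sorted(set(indices))
    gaps = Counter(b - a for a, b in zip(ordered, ordered[1:]))
    return gaps.most_common(1)[0][0] if gaps else 1


def has_marker_nearby(target_id, marker_ids, genome_gene_ids, max_distance=5):
    """Return True if any marker gene lies within +/- max_distance genes of the target."""
    prefix, target_index = split_gene_id(target_id)
    parsed = [split_gene_id(gene_id) for gene_id in genome_gene_ids]
    step = infer_step([index for gene_prefix, index in parsed if gene_prefix == prefix])
    for marker_id in marker_ids:
        marker_prefix, marker_index = split_gene_id(marker_id)
        if marker_prefix != prefix:
            continue  # marker belongs to a different genome or replicon
        distance = abs(marker_index - target_index) / step
        if 0 < distance <= max_distance:
            return True
    return False


# Hypothetical genome whose gene indices increase in steps of 5.
genome = [f"xx:XX_{index:04d}" for index in range(5, 105, 5)]
near = has_marker_nearby("xx:XX_0050", ["xx:XX_0030"], genome)  # 4 genes away
far = has_marker_nearby("xx:XX_0050", ["xx:XX_0010"], genome)   # 8 genes away
```

Without the step correction, the marker at index 30 would appear 20 positions away from the target at index 50 and would be missed, which is the situation the text describes for genomes numbered in steps of 5 or 10.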

Analyzing cellular localization based on signal peptide prediction
The extracted protein sequences of DUF1735-, BACON-, and BACON_2 domain-containing proteins (Table S2A-C, FASTA files on request) were used as input for signal peptide prediction using SIGNALP6.0 [28].

Conserved residues and their structural localization in DUF1735-containing proteins
The Multiple Sequence Alignments of DUF1735, BACON, and BACON_2 seed sequences were downloaded from InterPro (accessed in February 2024) [29]. BT3987 (PDB: 6T8i) was analyzed with ConSurf, for which the pre-calculated ConSurfDB analysis was used [30,31].

PeSTo analysis
DUF1735 crystal structures deposited in PDB were analyzed to predict interactions of the protein with other proteins, DNA/RNA, lipids, ligands, and ions using PeSTo (https://pesto.epfl.ch/) [32].

Analysis of DUF1735-containing proteins in literature-derived PULs
While mining PULs for CAZyme discovery, DUF1735-containing proteins were frequently observed at the susE-F-like position in the PULs. To quantitatively confirm this observation and obtain more relevant literature, a dataset consisting of all literature-derived PULs in PULDB that contain one or more DUF1735 was created. A total of 135 individual DUF1735 domains in 117 DUF1735-containing proteins, encoded in 86 literature-derived PULs of 11 different species all belonging to the Bacteroidota phylum, and 10 corresponding publications were collected (Table S1) [33-41]. Manual analysis of the obtained dataset confirmed that DUF1735-containing proteins were predominantly encoded at the susE-like (47%) or susF-like (35%) position, and less often at the susG-like position (18%; Table S1). Besides the DUF1735-containing proteins, the most commonly predicted CAZymes in the 86 DUF1735-containing literature-derived PULs were the SusC/D markers (11% each), and members of the Glycoside Hydrolase family 18 (GH18; 6%), GH92 (4%), GH130 (1%), GH2 (1%), and Sulfatases family 1 (Sulf_1; 1.5%; Table 1). Members of GH18 are described as enzymes capable of chitin degradation or as non-catalytic proteins such as xylanase inhibitors [42], concanavalin B [43], and narbonin [44]. Members of GH92 are enzymes involved in the degradation of mannose-containing carbohydrates [21,45] and members of GH130 are mannose-specific phosphorylases [46]. GH2 is a very diverse family of predominantly exo-acting hydrolases accepting an array of different monosaccharides, including mannose [47]. Thus, although the largest fraction (32%) of all proteins encoded in DUF1735-containing literature-derived PULs were PUFs, mannose-based carbohydrates were the best-informed suggestion as possible targets of the proteins encoded in these PULs. This was further supported by the endo-β-N-acetylglucosaminidase activity of BT3987 (DUF1735 + GH18), the only DUF1735-containing protein with a described catalytic function to date [21]. BT3987 is encoded in PUL72 (referred to as HMNG-PUL) of Bacteroides thetaiotaomicron VPI-5482 and degrades high-mannose mammalian N-glycan (HMNG) together with the other enzymes of the PUL, including the sugar-binding protein BT3986 (DUF1735 + LamG3), another DUF1735-containing protein (discussed later) [21].
Our initial screening for DUF1735-containing proteins in literature-derived PULs confirmed their conserved location at the susE-G-like position within PULs and suggested their relevance for the phylum Bacteroidota, whose species are known to produce a vast number of enzymes required for complex carbohydrate degradation [4,13-16].

Taxonomic distribution and genomic localization of DUF1735-homologs
To provide a comprehensive overview of DUF1735-containing proteins beyond literature-derived PULs, KEGG and UniProt were mined for DUF1735 homologs by using the DOMAIN ANALYZER, a newly developed program. KEGG contained 9718 genomes of all kingdoms of life (Plants, Fungi, Animals, Bacteria, Archaea, and Protista), of which the majority (85%) were of bacterial origin (Fig. 2A). UniProt is the bigger database with 250 323 total genomes, of which 68% are bacterial (Fig. 2A). The extended dataset of DUF1735 homologs increased the number of individual DUF1735 domains by 5.7-fold (KEGG) and 30-fold (UniProt), from 135 found in literature-derived PULs to 775 in KEGG and 4047 in UniProt (Fig. 2B, Table S2A). For BACON and BACON_2 domains, 911 and 780 entries in KEGG and 5942 and 4515 entries in UniProt were identified, respectively (Fig. 2B, Table S2B,C). For each entry, the KEGG genome and gene ID, the UniProt ID, its annotation, its kingdom, phylum, and species of origin, the protein sequence and length, as well as its domain architecture were collected (Table S2A-C). To investigate whether the extended list of DUF1735 homologs was consistently encoded in PULs, their vicinity to a SusD homolog was determined by using the VICINITY ANALYZER and the KEGG dataset. The results revealed that the majority (98%) of DUF1735 homologs were indeed in close vicinity to a SusD homolog, out of which 14% were identified via its unique KO K21572 (Tables S4 and S5). This almost exclusive presence of DUF1735-containing proteins in PULs suggests their involvement in PUL-based carbohydrate utilization. In comparison, only 38% of BACON domains and 28% of BACON_2 domains were found to be encoded in PULs (Tables S4 and S5).
Interestingly, of all bacterial genomes in UniProt only 0.6% belong to the Bacteroidota phylum (Fig. 2A), yet 98% of all DUF1735 are encoded in one of their genomes (Fig. 2C). The virtually exclusive presence of DUF1735 within Bacteroidota genomes and the fact that 85% of all Bacteroidota genomes contained at least one DUF1735 highlight its evolutionary importance for this phylum (Fig. 2A,B). Similar to DUF1735, the majority of all BACON domain-containing proteins (87%, 5177) were also encoded in species of Bacteroidota (Fig. 2B,C). These results were as expected and aligned with the previously published correlation between BACON domains and the phylum Bacteroidota [25]. The BACON_2 domain is also most frequently encoded in species of Bacteroidota, but these account for only 35% of the entries (1580, Fig. 2B,C). Notably, 16% (929) of all BACON_2-containing proteins were found in the Gram-positive phylum Actinomycetota, highlighting a difference between the DUF1735 and BACON_2 domains and making BACON_2 domains the only domains present in Gram-positive species (Fig. 2C).
At least one copy of the 775 DUF1735 was present in 199 genomes, resulting in an average of 4.0 DUF1735 per genome (Fig. 2B). A detailed analysis of the numbers of DUF1735 per genome, however, revealed that 13 species contained > 10 DUF1735 within their genomes, with a maximum number of 46 DUF1735 in B. thetaiotaomicron 7330 (Table 2). This high number of gene copies suggests a relevant function of DUF1735 for B. thetaiotaomicron 7330, which is a keystone member of the HGM and a known specialist in carbohydrate degradation [48,49]. All 46 DUF1735 were encoded in one of the many PULs of this species (Tables S2A and S4). Twelve of the species containing > 10 DUF1735 in their genome belonged to the order Bacteroidales, whose members are also commonly found in the HGM, while Paraflavitalea soli was isolated from greenhouse soil (Yongin city, Korea [50]) and belonged to the order Chitinophagales.
The extended dataset generated by the DOMAIN and VICINITY ANALYZER confirmed the virtually exclusive presence of DUF1735 in PULs of species belonging to the Bacteroidota phylum, with up to 46 DUF1735 domains in a single genome. These results further supported the hypothesis that DUF1735-containing proteins are involved in PUL-driven carbohydrate utilization.
Similar to the function of SusE or SusF, BT3986 (DUF1735 + LamG3) acted as a surface-bound sugar-binding protein, keeping the produced oligosaccharides in close proximity to the cell surface (Figs 1 and 3A) [21]. The N-terminal DUF1735 domain of BT3986 (DUF1735 + LamG3) carries a lipoprotein signal peptide and is likely responsible for anchoring its C-terminus to the cell surface (Fig. 3A). Notably, it was not specified which of the two domains was responsible for carbohydrate binding [21]; however, it is likely that the C-terminal LamG3 domain is responsible, as carbohydrate-binding capacity has been described previously for this domain [52]. The two LamG3 domains of the multimodular arabinofuranosidase Abf43A-Abf43B-Abf43C (WP_024834488) from Ruminiclostridium josui are examples. They were characterized as new Carbohydrate Binding Modules (CBMs) capable of recognizing α-(1,5)-linked L-arabinofuranosyl residues [52]. A predictive analysis of the protein binding interface of the ALPHAFOLD model of BT3986 with PeSTo (https://pesto.epfl.ch/) further revealed that DUF1735 does not show any predicted interactions with ligands, while some residues of LamG3 were indicated to interact with ligands with prediction scores between 0.5 and 0.8 (Fig. S1).
Similar to the function of SusG, it has been shown that BT3987 (DUF1735 + GH18) acted as a surface-bound GH, specifically as an endo-N-acetylglucosaminidase separating the oligosaccharide moiety in HMNG from its polypeptide (Figs 1 and 3A) [21]. In addition, Trastoy et al. solved five crystal structures of BT3987 (DUF1735 + GH18) in complex with its substrates and products, revealing that the N-terminal DUF1735 is not involved in substrate binding, which has been confirmed by a PeSTo analysis [22,32]. Thus, no catalytic activity or carbohydrate-binding capacity has yet been described for DUF1735 itself, but a membrane-linker function that provides proximity of its C-termini to other membrane-bound enzymes is suggested. This is plausible, as a similar function has been proposed for BACON domains [53]. BoGH5A (BACON + GH5), encoded in PUL113 of B. ovatus ATCC 8483, has been functionally characterized and, similar to BT3987 (DUF1735 + GH18), contains a non-catalytic N-terminal and a catalytic C-terminal domain [21,53]. The catalytic GH5 showed endo-xyloglucanase activity and acted as keystone enzyme for PUL-based xyloglucan degradation in B. ovatus. Based on its solved crystal structure (PDB: 3ZMR) and the predicted lipoprotein signal peptide in the BACON domain, the authors suggested that BACON anchors the catalytic C-terminus to the membrane, thus acting as a spacer that allows GH5 to move freely close to the membrane (Fig. 3B) [53].
The second most abundant domain architecture for DUF1735-containing proteins was DUF1735* + DUF4361 (18.8%, Table 3). Interestingly, DUF4973 mostly overlaps with the DUF1735 domain, which might question their distinction (indicated as DUF1735*). Further, it was observed that DUF4973, DUF4361, and DUF5627 were exclusively present in combination with the N-terminal DUF1735 (Table S6). To our knowledge, no protein containing these domains has been functionally characterized to date, but in the previously mentioned study by Cuskin et al. [21], two PULs were described that each encode a multimodular protein consisting of an N-terminal DUF1735 and a C-terminal DUF4361 (BT2624, BT3790). BT2624 and BT3790 were both encoded at the susE-like position in the two mannan-degrading PULs, MAN-PUL1 (PUL 90 in PULDB) and MAN-PUL2. While BT2624 and BT3790 have yet to be functionally characterized, there is a strong likelihood that proteins containing both DUF1735 and DUF4361 play a significant role in mannan degradation in B. thetaiotaomicron due to their simultaneous upregulation and expression with other functional members of the PULs [12].
For BACON- and BACON_2 domain-containing proteins, the diversity of C-termini, with 151 and 156 different domains respectively, was four times higher than for DUF1735-containing proteins (Table S2A-C). Similar to DUF1735 and DUF4973, BACON and BACON_2 domains frequently overlap, indicating their similarity. They are, however, also the most frequent independent C-termini for BACON and BACON_2 domain-containing proteins (Table S2B,C). Other C-terminal domains were F5/8-typeC, cellulases, Leucine-rich repeat_5 (LRR_5), and Por secretion system C-terminal sorting domain (Por_secre_tail). Notably, LamG3 was only rarely present as C-terminus for BACON and BACON_2 domain-containing proteins (Table S2B,C).
To investigate whether DUF1735-, BACON-, and BACON_2 domain-containing proteins were membrane-bound, the extracted protein sequences were used as input for SIGNALP6.0 to predict signal sequences (Tables S4 and S5) [28]. The most frequently predicted signal peptide for DUF1735- (96%) and BACON-containing proteins (76%) was an N-terminal lipoprotein signal peptide (Tables S4 and S5). A smaller fraction (1% of DUF1735, 10% of BACON, 25% of BACON_2) carried an N-terminal signal peptide for secretion or no signal peptide (4% of DUF1735, 14% of BACON, 0% of BACON_2). These results suggest that most DUF1735- and BACON domain-containing proteins and a third of all BACON_2 domain-containing proteins are indeed membrane-bound and could likely provide local proximity of their varying C-termini to the outer cell membrane (Fig. 3, Table S4).

Structural evidence that DUF1735 proteins act as spacer
In the RCSB Protein Data Bank (PDB), the crystal structures of six proteins containing a DUF1735 domain and two proteins containing a BACON and/or BACON_2 domain were deposited (Table 4). DUF1735- and BACON domain-containing proteins both have an immunoglobulin (Ig)-like β-sandwich fold with a core structure of two packed antiparallel β-sheets. The Ig-like β-sandwich fold had been reported for the N-termini of SusE and SusF of the Sus, which also function as flexible spacers for their C-terminal CBMs, similar to what has been described for BACON [1,25,58]. However, it is important to mention that differences become apparent when the Ig-like β-sandwich folds are compared more closely (Fig. S2).
To strengthen the proposed function as spacer and to highlight that a function as carbohydrate-binding domain is unlikely, conservation studies on the sequence and structural level were performed and combined. Surface-exposed aromatic residues are known to interact with cyclic monosaccharides via CH/π-stacking [59-61]. Only two aromatic residues are conserved in the DUF1735 homologs and, when visualized in the crystal structure of BT3987 (PDB: 6T8i), neither of them appears to be on the surface. This makes a function of DUF1735 as carbohydrate-binding module unlikely. Furthermore, all DUF1735 crystal structures deposited in PDB were analyzed to predict interactions of the protein with other proteins, DNA/RNA, lipids, ligands, and ions [32]. Our analysis revealed that DUF1735 does not show clear predicted interactions with any of the tested entities. Only some residues of DUF1735 from 4QNI (DUF1735 + DUF4361) and BT3986 (DUF1735 + LamG3) revealed predicted protein interactions (data not shown).
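As an illustration of the sequence-level part of such a conservation check, the sketch below scans the columns of a multiple sequence alignment for positions dominated by aromatic residues. It is not the procedure used here (which relied on the InterPro seed alignments and ConSurf): the restriction to Phe, Trp, and Tyr, the 80% threshold, the function name, and the toy alignment are all assumptions, and surface exposure would still have to be judged on the structure.

```python
AROMATIC_RESIDUES = set("FWY")  # assumption: Phe, Trp and Tyr only


def conserved_aromatic_columns(alignment, min_fraction=0.8):
    """Return (column index, aromatic fraction) for columns at or above min_fraction.

    `alignment` is a list of equal-length aligned sequences; gap characters
    count as non-aromatic, so gappy columns are penalised.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(sequence) != length for sequence in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for column in range(length):
        residues = [sequence[column].upper() for sequence in alignment]
        fraction = sum(residue in AROMATIC_RESIDUES for residue in residues) / len(residues)
        if fraction >= min_fraction:
            columns.append((column, fraction))
    return columns


# Hypothetical toy alignment, not derived from the DUF1735 seed alignment.
toy_alignment = ["AWK-F", "GWRTY", "AWKSL", "SFKTF"]
strict_hits = conserved_aromatic_columns(toy_alignment)
relaxed_hits = conserved_aromatic_columns(toy_alignment, min_fraction=0.75)
```

Columns returned by such a scan are only candidates; whether the corresponding side chains are solvent-exposed, and therefore available for CH/π-stacking, has to be checked in a structure such as 6T8i.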
Based on the shared Ig-like β-sandwich fold of DUF1735, SusE, SusF, and BACON, the genetic localization of DUF1735 at the susE-F-like position in PULs, and the fact that all were predominantly present as multimodular proteins with varying C-termini involved in carbohydrate metabolism, a similar function as N-terminal spacer for various C-termini was predicted for DUF1735.

Conclusion
Our generated datasets and the analysis of DUF1735 present in PULDB, KEGG, and UniProt revealed several indications that DUF1735 likely acts as N-terminal membrane-bound spacer of SusE- or SusF-like proteins linked to varying C-termini involved in carbohydrate binding or degradation. The most frequent C-termini were LamG3, F5/8-typeC, and GH18. Further, it was confirmed that DUF1735-containing proteins were almost exclusively encoded in PULs (97.8%) and in genomes of species of the Bacteroidota phylum (98%), with a maximum of 46 copies in the genome of B. thetaiotaomicron 7330. Our comparison of DUF1735-, BACON-, and BACON_2 domain-containing proteins revealed both commonalities, such as their N-terminal position in multimodular proteins with a suggested function of DUF1735 and BACON as membrane-bound spacer and domain repetition in genomes, and distinctions, notably the rather significant differences in the Ig-like β-sandwich fold, the presence of DUF1735 in PULs, and its less variable C-termini. Mutation screenings and further investigation of the partially functionally characterized multimodular DUF1735-containing proteins BT3986 (DUF1735 + LamG3), BT3987 (DUF1735 + GH18), BT2624, and BT3790 (both DUF1735 + DUF4361) could unravel whether DUF1735 is crucial for the function of their C-terminus.

Fig. 1. Scheme of the Starch utilization system (Sus) in Bacteroides thetaiotaomicron based on [1,14,58]. (A) Scheme of the Sus gene cluster consisting of BT3698-BT3705 (annotated as PUL66 in PULDB (www.cazy.org/PULDB/ [17])). Gene sizes are represented to scale; red bars indicate the margins of the assembled region. (B) The transcription regulator SusR induces transcription of the susA-G genes upon maltose binding. SusA-G proteins work synergistically for efficient starch degradation. The three starch-binding outer membrane lipoproteins SusD-F bind starch, bringing the substrate in close proximity to the membrane-bound α-amylase SusG. SusG hydrolyses starch into maltooligosaccharides that transit into the periplasm through the TonB-dependent transporter SusC. In the periplasm, the neopullulanase SusA and the α-glucosidase SusB hydrolyze maltooligosaccharides into glucose that is transported into the cytoplasm for glycolysis.

Fig. 2. Quantification and taxonomic distribution of DUF1735, BACON, and BACON_2 domains within genomes in KEGG and UniProt [23,24]. (A) Quantification of total, bacterial, and Bacteroidota genomes. (B) Quantification of genomes and genes encoding the target domains, as well as quantification of individual target domains. (C) Taxonomic phyla distribution of the target domain in UniProt.

Table 4. Overview of all crystal structures of DUF1735-, BACON-, and BACON_2-containing proteins. Data accessed in February 2024. + indicates separate domains.
FEBS Open Bio 14 (2024) 1133-1146 © 2024 The Authors. FEBS Open Bio published by John Wiley & Sons Ltd on behalf of Federation of European Biochemical Societies.

Table 2. Overview of species containing > 10 DUF1735 in their genome. The KEGG database was used because it provides a unique identifier per sequenced genome.