Sequence mining yields 18 phloretin C‐glycosyltransferases from plants for the efficient biocatalytic synthesis of nothofagin and phloretin‐di‐C‐glycoside

C‐glycosyltransferases (C‐GTs) offer selective and efficient synthesis of natural product C‐glycosides under mild reaction conditions. In contrast, the chemical synthesis of these C‐glycosides is challenging and environmentally harmful. The rare occurrence of C‐glycosylated compounds in Nature, despite their stability, suggests that their biosynthetic enzymes, C‐GTs, might be scarce. Indeed, the number of characterized C‐GTs is remarkably lower than that of O‐GTs. Therefore, discovery efforts are crucial for expanding our knowledge of these enzymes and their efficient application in biocatalytic processes. This study aimed to identify new C‐GTs based on their primary sequence. Eighteen new C‐GTs were discovered, 10 of which yielded full conversion of phloretin to its glucosides. Phloretin is a dihydrochalcone natural product, and its mono‐C‐glucoside, nothofagin, has various health‐promoting effects. Several of these enzymes enabled highly selective production of either nothofagin (UGT708A60 and UGT708F2) or phloretin‐di‐C‐glycoside (UGT708D9 and UGT708B8). Molecular docking simulations, based on structural models of selected enzymes, showed productive binding modes for the best phloretin C‐GTs, UGT708F2 and UGT708A60. Moreover, we characterized UGT708A60 as a highly efficient phloretin mono‐C‐glycosyltransferase (kcat = 2.97 s⁻¹, KM = 0.1 μM) active in non‐buffered, dilute sodium hydroxide (0.1–1 mM). We further investigated UGT708A60 as an efficient biocatalyst for the bioproduction of nothofagin.

having high metabolic stability. [4,5] The first studies of C-GTs were published in the early 2000s, [6-8] with the majority of known C-GTs discovered within the last few years. [9,10] Most reported C-GTs are from plants, display the GT-B structural fold, [11] utilize UDP-Glc as glycosyl donor, and react on the aromatic ring of specific hydroxylated acceptors. [1,3] These C-GTs belong to glycosyltransferase family 1 in the Carbohydrate Active enZyme (CAZy) classification. [12] Early C-GT discovery studies were motivated by the presence of C-glycosylated compounds in particular organisms such as Pueraria lobata [13] and Glycyrrhiza glabra [14] and species of bamboo and cereals. [9,15] Fundamental properties that make C-GT discovery particularly challenging include their high sequence similarity with related O-glycosyltransferases (O-GTs) and the occurrence of both C- and O-glycosylation activities in a single enzyme towards related compounds or even the same aglycon. [14,16] Moreover, the C-glycosylation activity of a C-GT might be shifted to O-glycosylation by simple amino acid substitutions, [17] and some C-GTs are capable of forming all four types of glycosidic linkage (C-, O-, N-, and S-). [14,18] In this study, we aimed to discover new C-GTs based exclusively on the protein primary sequence. C-GT candidates from different plants were selected based on sequence identity (>53%) with known C-GTs and/or specific sequence motifs; DPF and DPFXL motifs were previously described in the majority of reported C-GTs. [1] In addition, we selected for the PSPG (plant secondary product glycosyltransferase) consensus motif, which is conserved among UDP-dependent glycosyltransferases. [19] C-GT candidates were recombinantly overproduced in Escherichia coli, partially purified, and screened for activity towards common C-glycosylation acceptor substrates. 
18 new phloretin C-GTs were discovered, with four of them yielding a single product, that is, either nothofagin or phloretin-di-C-glycoside. These four were further investigated functionally and structurally using AlphaFold and molecular docking. Specifically, this study identifies UGT708A60 from Hordeum vulgare (barley) as a highly active phloretin mono-C-GT forming nothofagin, and characterizes it biochemically by estimating its kinetic parameters as well as pH and temperature optima.
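The sequence-based selection described above (an identity cutoff of >53% and/or the presence of a DPF or DPFXL motif) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the gap-excluding definition of percent identity, the requirement that sequences are already pairwise aligned, and the toy sequences are all assumptions made for illustration, and the PSPG consensus check is omitted because its pattern is not given in the text.

```python
import re

# Motifs named in the text: DPF, and DPFXL where X is any residue.
DPF = re.compile(r"DPF")
DPFXL = re.compile(r"DPF.L")


def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: this gap-excluding definition is one common convention;
    the source does not state which definition was used.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("inputs must be pre-aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_candidate(aligned_query, aligned_reference, min_identity=53.0):
    """Keep a sequence if it carries a DPF motif and/or exceeds the cutoff."""
    ungapped = aligned_query.replace("-", "")
    has_motif = DPF.search(ungapped) is not None
    identity = percent_identity(aligned_query, aligned_reference)
    return has_motif or identity > min_identity


# Toy, made-up sequences (not real UGTs), only to exercise the functions.
reference = "MSHVDPFTLAA"
query = "MSHIDPFALGA"
print(round(percent_identity(query, reference), 1))
print(DPFXL.search(query) is not None)
print(is_candidate(query, reference))
```

In practice the alignment would come from a dedicated alignment tool, and the cutoff would be applied against each known C-GT rather than a single reference.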

Reagents and chemicals
Phloretin, phlorizin dihydrate (both ≥99%) and naringenin (≥95%) were purchased from Sigma-Aldrich. Nothofagin (>98%), vitexin (≥97%), and isovitexin (≥98%) were purchased from Carbosynth, Santa Cruz Biotechnology and Merck chemicals, respectively. All reagents were analytical grade.

Sequence selection
protease (10 μg/ml final, produced in house using the pRK793 plasmid) was added to the sample and TEV cleavage was carried out at 10 °C for 5 h. The sample was then loaded onto the HisTrap FF column, which was equilibrated with 25 mM HEPES, 150 mM NaCl, pH 7.5, and purification was performed using a gradient from 0 to 250 mM imidazole in the same buffer. The flow-through fractions with UGT708A60 were collected and concentrated in the storage buffer (25 mM HEPES, 150 mM NaCl, pH 7, 1 mM DTT). The sample purity was analyzed using SDS-PAGE and was >85% (Figure S2).

In vitro activity screening
In vitro glycosylation reactions with the partially purified protein sam-

Phloretin C-glucosylation by selected UGTs
To investigate phloretin mono and di-C-glycosylation reactions over

Biochemical characterization of UGT708A60
For the identification of the pH optimum of the enzyme, activity was
Mixtures of water and acetonitrile containing 0.1% formic acid were used as the mobile phase. A gradient from 5% to 25% acetonitrile for 1.5 min followed by gradients from 25% to 80% for 2 min and from 80% to 100% for 1.5 min were applied for the separation of phloretin and its glucosides as well as apigenin and naringenin reactions. The flow rate was 1 ml/min, and most analytes were detected at 290 nm. Apigenin, vitexin and isovitexin were detected at 320 nm.
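The mobile-phase gradient described above can be written as a piecewise-linear function of time. The sketch below assumes that the three gradient segments run back to back starting at t = 0 and that each segment is linear; the function and variable names are illustrative, and no hold or re-equilibration step is modelled because none is described.

```python
# Breakpoints as (time in min, % acetonitrile), assuming the segments
# run consecutively from t = 0: 5->25% in 1.5 min, 25->80% in 2 min,
# 80->100% in 1.5 min.
GRADIENT = [(0.0, 5.0), (1.5, 25.0), (3.5, 80.0), (5.0, 100.0)]


def percent_acetonitrile(t_min):
    """Return % acetonitrile at time t_min by linear interpolation."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last breakpoint the composition is held at the final value.
    return GRADIENT[-1][1]


for t in (0.0, 0.75, 1.5, 2.5, 3.5, 5.0):
    print(f"{t:4.2f} min: {percent_acetonitrile(t):5.1f}% acetonitrile")
```

Under these assumptions the full gradient takes 5 min at the stated flow rate of 1 ml/min.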

Sequence analysis and structure modelling
The using AlphaFold v2.0, using all available structural homologs, and the database search preset was set to "reduced_dbs." [22] Only the highest ranking (in pLDDT score) models were used in further analyses (Table S2).

Molecular docking
Binary complexes of protein and sugar donor were obtained by structurally aligning protein model structures on the crystal structure of PtUGT1 from Polygonum tinctorium, which has a

TABLE 1 Conversion of common C-glycosylation substrates by C-GT candidates

Estimation of raw material costs
Reaction conditions for the cyclodextrin-containing process were taken from Schmölzer et al. [28] as follows: 120 mM phloretin, 300 mg
Their activities on the common C-glycosylation substrates phloretin, its C-glucoside nothofagin, as well as phlorizin, apigenin and naringenin were measured (Table 1). Only trace amounts of apigenin C-glucoside were detectable in the reaction with UGT708AC2, whereas almost all investigated enzymes showed C-glycosylation activity towards phloretin and nothofagin. Ten C-GT candidates yielded complete glucosylation of phloretin, whereas the others did not reach complete conversion of 100 μM phloretin after 22 h (Table 1 and Figure S1).
Phloretin di-C-glucoside was detected as the final product for most of  Figure 2D).
To investigate structural differences responsible for the different mono- and di-C-glucosylation preferences of the four enzymes, structural models were generated using AlphaFold2. [22] The four selected UGTs were analyzed structurally, including the binding of the acceptor phloretin in the presence of the donor (UDP-glucose, superimposed). We investigated three parameters to assess whether a docking pose could be reactive: the relative nucleophilic attack distance (distance C1(glc)-C3'(phlo), Figure 3A), deprotonation of the phenolic oxygen by the catalytic histidine (distance Nδ(His)-OH2'(phlo)), and orbital angles (angle between the vector perpendicular to the aromatic A ring of phloretin and the C1(glc)-O1(glc) bond, Figure 3B). Reactive poses for phloretin were obtained for both UGT708F2 and UGT708A60, presenting coplanarity angles of 13.5° and 17.2°, respectively, C1(glc)-C3'(phlo) distances of 4.5 Å and 4.4 Å, respectively, and Nδ(His)-OH2'(phlo) distances of 3.2 Å for both (Figure 3). On the other hand, in UGT708D9 and UGT708B8, the pose with phloretin best positioned for a C-glucosylation reaction, according to angles and distances, presented the aromatic A ring in a tilted orientation (41.1° and 31.6° angles), and in UGT708B8, the nucleophilic attack distance was too long (5.5 Å, Figure 3).

All four enzymes have the residues known to enable C-glycosylation: the DPF motif (90-92), the catalytic dyad H24 and D121, and R283 (indexing from UGT708A60). [31] Notably, in all modelled structures, R283 forms a salt bridge with D90 of the DPF motif (only D is shown in Figure 3). Interestingly, this Arg, which is not otherwise conserved in the GT1 family, was found to be present in all 18 sequences selected based on this DPF motif (Figure S3). Chen and colleagues analyzed two highly similar enzymes (91% sequence identity) from Mangifera indica, MiCGT and MiCGTb, which catalyze either mono- (MiCGT) or di-C-glycosylation (MiCGTb). [16,18,32] It was shown that I152 of MiCGTb had a crucial role in di-C-glycosylation, and its mutation to glutamate converted the enzyme to a mono-C-GT. Moreover, the MiCGT mutant E152I efficiently catalyzed di-C-glucosylation, convincingly pinpointing residue 152 as the determining factor between these two enzymes. [18] They also reported that MiCGT mutants E152F and E152C produced di-C-glucosides with ∼20% and ∼10% conversion rates, respectively. However, UGT708D9 and UGT708B8, where the analogous residues are F150 and F148, respectively (Figure 3), achieve complete di-C-glycosylation, while UGT708F2 (F153) and UGT708A60 (C153) have severely reduced di-C capability. Hence, while residue 152 was the essential discriminant in the Mangifera indica enzymes, this does not extend to the proteins analyzed here.
Chen and co-workers proposed that the 152 position had an effect on the active site size, and that di-C-glucosylation was allowed by the wider active site. A similar observation was reported in a separate study with another di-C-glycosyltransferase, GgCGT. [14] However, the comparison of modelled structures revealed that the di-C-glycosylation-capable UGT708D9 and UGT708B8 had smaller binding sites than UGT708F2 and UGT708A60. Indeed, UGT708D9, the most proficient at di-C, presents bulky hydrophobic residues (W88 and F120 instead of Phe and Ala/Thr in the other three, respectively) that considerably reduce the size of the acceptor binding site. UGT708B8, which favors mono- over di-C by an order of magnitude, presents slightly more space, with M117 instead of Val/Ile. In contrast, all residues in the binding site vicinity of UGT708F2 were equal or smaller in size than in UGT708D9/UGT708B8, and UGT708A60, having the largest site, also comprised C153 instead of phenylalanine, and A123 instead of threonine/phenylalanine. Hence, among these four enzymes, the smaller the active site, the more proficient the enzyme is at di-C-glycosylation.
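The geometric criteria used to judge docking poses (an interatomic distance, and the angle between the normal of the aromatic A ring and the C1-O1 bond of the donor glucose) can be computed from atomic coordinates as in the following minimal sketch. It is not the authors' analysis script: the function names are hypothetical, the ring atoms are assumed to be supplied in order around the ring, the angle is folded into the 0-90° range, and the coordinates in the example are synthetic rather than taken from any model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(sum(x * x for x in a))


def distance(p, q):
    """Euclidean distance between two atoms, in the units of the input."""
    return _norm(_sub(p, q))


def ring_normal(ring_atoms):
    """Unit normal of a ring, from summed cross products about the centroid.

    Assumes the atoms are listed in order around the ring.
    """
    n = len(ring_atoms)
    centroid = tuple(sum(p[i] for p in ring_atoms) / n for i in range(3))
    total = (0.0, 0.0, 0.0)
    for i in range(n):
        u = _sub(ring_atoms[i], centroid)
        v = _sub(ring_atoms[(i + 1) % n], centroid)
        total = tuple(t + c for t, c in zip(total, _cross(u, v)))
    length = _norm(total)
    return tuple(t / length for t in total)


def normal_to_bond_angle(ring_atoms, bond_start, bond_end):
    """Angle in degrees (0-90) between the ring normal and a bond vector."""
    normal = ring_normal(ring_atoms)
    bond = _sub(bond_end, bond_start)
    cos_theta = abs(sum(x * y for x, y in zip(normal, bond))) / _norm(bond)
    return math.degrees(math.acos(min(1.0, cos_theta)))


# Synthetic example: an ideal hexagon in the xy-plane and a tilted bond.
ring = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)), 0.0)
        for k in range(6)]
print(round(normal_to_bond_angle(ring, (0, 0, 3.0), (1.0, 0, 4.0)), 1))
print(round(distance((0, 0, 0), (3.0, 4.0, 0)), 1))
```

Applied to a docked pose, `distance` would be evaluated for the C1(glc)-C3'(phlo) and Nδ(His)-OH2'(phlo) atom pairs, and `normal_to_bond_angle` for the A ring and the C1(glc)-O1(glc) bond.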
We also performed amino acid sequence analysis of new

tively. According to the BRENDA database on July 8th, 2022, it is the lowest KM ever reported for a GT1 enzyme against any acceptor. [33] The enzyme efficiently catalyzes the synthesis of nothofagin, showing a catalytic efficiency of 2.8 × 10⁷ s⁻¹ M⁻¹, which is the highest reported so far for selective nothofagin-producing enzymes (Table 2). The enzyme also presents slight substrate inhibition that quickly plateaus at 1.60 s⁻¹ (Figure S5). Another phloretin C-GT with a comparable catalytic efficiency (>2.4 × 10⁷ s⁻¹ M⁻¹) to UGT708A60 is FcCGT; [10] however, it was characterized as a di-C-GT producing both: phloretin and nothofagin.
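As a worked check of the catalytic efficiencies compared here, kcat/KM can be recomputed from the kcat and KM values given in the text for OsCGT, MiCGTb and UGT708A60. The sketch below assumes that kcat is in s⁻¹ and KM in μM, which is consistent with the reported efficiencies; the function name is illustrative. Because the published KM of UGT708A60 is rounded to 0.1 μM, the recomputed value (about 3.0 × 10⁷ s⁻¹ M⁻¹) is slightly higher than the reported 2.8 × 10⁷ s⁻¹ M⁻¹, which was presumably derived from unrounded parameters.

```python
def catalytic_efficiency(kcat_per_s, km_micromolar):
    """Return kcat/KM in s^-1 M^-1 from kcat (s^-1) and KM (micromolar)."""
    return kcat_per_s / (km_micromolar * 1e-6)


# (kcat in s^-1, KM in micromolar) as listed in the text.
enzymes = {
    "OsCGT": (10.84, 4.78),
    "MiCGTb": (0.79, 166.0),
    "UGT708A60": (2.97, 0.1),
}

for name, (kcat, km) in enzymes.items():
    efficiency = catalytic_efficiency(kcat, km)
    print(f"{name}: {efficiency:.2e} s^-1 M^-1")
```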
Although enzymatic conversion of phloretin to nothofagin is likely competitive with conventional chemistry with respect to environmental sustainability, it must also be economically feasible to be implemented. State-of-the-art enzymatic nothofagin production uses cyclodextrin to solubilize the hydrophobic phloretin. [28] We estimate the raw material costs of this approach at 1536 USD per gram of nothofagin produced, driven by the high price of cyclodextrin (Table S3). If, instead, nothofagin is produced at lower titers with higher water consumption, as demonstrated here, thus omitting the need for cyclodextrin, we arrive at a raw material cost of 556.8 USD per gram, or 5.9 USD per gram if UDP-Glc is recycled with the SuSy system, provided that compatible conditions can be found (e.g., regarding pH, fructose concentrations, etc.). [28] This paves the way for biotechnological production of this valuable compound.
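The relative savings implied by these three raw material cost estimates can be made explicit with a short calculation. The sketch below uses only the per-gram figures quoted in the text; the scenario labels and the function name are illustrative, and no underlying reagent prices are assumed.

```python
# Raw material cost estimates from the text, in USD per gram of nothofagin.
costs = {
    "cyclodextrin process": 1536.0,
    "dilute process without cyclodextrin": 556.8,
    "dilute process with UDP-Glc recycling": 5.9,
}


def fold_reduction(baseline, alternative):
    """How many times cheaper the alternative is than the baseline."""
    return baseline / alternative


baseline = costs["cyclodextrin process"]
for scenario, cost in costs.items():
    print(f"{scenario}: {cost} USD/g, "
          f"{fold_reduction(baseline, cost):.1f}-fold vs. cyclodextrin process")
```

On these figures, omitting cyclodextrin lowers the raw material cost roughly 2.8-fold, and adding UDP-Glc recycling lowers it roughly 260-fold relative to the cyclodextrin process.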

CONCLUSIONS
C-glycosyltransferases (C-GTs) are attractive enzymes for biotechnological production of valuable C-glycosylated polyphenols. However, information about their substrates, specificity and reaction mechanism is limited, due in part to the small number of these enzymes discovered to date. To discover new C-GTs, we mined sequence databases and successfully identified and produced 18 C-GTs, all presenting activity on the polyphenol phloretin. While the molecular discriminants that govern the mono- and di-C-glucosylation balance could not be determined, we showed that a narrower active site does not necessarily favor mono-C-glucosylation. Our results suggest that previously identified residues and properties do not translate to all C-GT enzymes. Hence, it is important to discover and report more systems presenting either mono- or di-C-glycosylation, to obtain a comprehensive view of the mechanistic determinants. Moreover, we fully characterized UGT708A60, and propose it as an efficient biocatalyst for green and economically feasible production of the mono-C-glucoside nothofagin.

TABLE 2
Enzyme        kcat (s⁻¹)   KM (μM)   kcat/KM (s⁻¹ M⁻¹)
OsCGT [15]    10.84        4.78      2.2 × 10⁶           0.587
MiCGTb [32]   0.79         166       0.047 × 10⁵         -
UGT708A60     2.97         0.1       2.8 × 10⁷

CONFLICT OF INTEREST
The authors declare no competing interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the article and/or the supplementary material of this article.