The first crystal structure of a family 45 glycoside hydrolase from a brown‐rot fungus, Gloeophyllum trabeum GtCel45A

Here we describe the first crystal structure of a beta‐1,4‐endoglucanase from a brown‐rot fungus, Gloeophyllum trabeum GtCel45A, which belongs to subfamily C of glycoside hydrolase family 45 (GH45). GtCel45A is ~18 kDa in size and the crystal structure contains 179 amino acids. The structure was refined at 1.30 Å resolution to an Rfree of 0.18. The enzyme consists of a single catalytic module folded into a six‐stranded double‐psi beta‐barrel domain surrounded by long loops. GtCel45A is very similar in sequence (82% identity) and structure to PcCel45A from the white‐rot fungus Phanerochaete chrysosporium. Surprisingly though, initial hydrolysis of barley beta‐glucan was almost twice as fast in GtCel45A as compared to PcCel45A.

Brown rot fungi are major wood decomposers in coniferous forests. They are considered generalists or gymnosperm specialists with regard to host preference [1]. As the name suggests, a visual consequence of the degradation is browning of the residual wood. This is because brown rot fungi primarily depolymerize cellulose and hemicellulose and leave most of the lignin in the residue [2,3]. Substrate degradation is achieved by secretion of various enzymes, such as cellulases, lytic polysaccharide monooxygenases (LPMOs) and xylanases [4][5][6]. Compared with white rot fungi, brown rot fungi contain fewer glycoside hydrolase and LPMO encoding genes, yet glycoside hydrolase family 45 (GH45) subfamily C genes appear to be more common in brown rot fungi [7]. The brown rot fungus Gloeophyllum trabeum is known to secrete all of the aforementioned types of enzymes [8,9].
Fungal cellulases and their use as tools in utilizing industrial lignocellulosic waste have been a research topic for more than 60 years. Perhaps the most studied is the application of fungal cellulases to generate biofuels from cellulosic biomass, often carried out by introducing potent fungal cellulases in the enzyme mixtures. However, rapid and complete cellulose degradation is not always the desired characteristic in an industrial cellulase. For example, for cellulases used in washing powders, mild cellulolytic activity is desirable to avoid bulk degradation of intact cotton fibers. A common cellulase used for this purpose is the GH45 cellulase from Humicola insolens, HiCel45A [10][11][12][13].
GH45 enzymes are generally small and have the smallest catalytic modules among GH families. Although often called cellulases, they commonly have broad specificity and hydrolyze other cell wall polysaccharides more readily (e.g. beta-glucan, glucomannan), with low activity on insoluble cellulose [10,14]. Currently, ~570 entries are listed in GH45 in the CAZy database: ~40 from bacteria, the great majority from fungi, and fewer from other eukaryotes (e.g. mollusks, insects, crustaceans, nematodes).
Phylogenetic analyses have divided the GH45 family into three subfamilies. The subfamilies are referred to either as A, B, and C or, according to the CAZy database classification, as 1, 2, and 3, respectively [10,15,16]. The division of GH45 into subfamilies A, B, and C predates the CAZy division. To date, subfamily A, which includes HiCel45A, has been the most studied and subfamily C the least. Subfamilies B and C are more similar to each other, and to the non-hydrolytic cell wall-active proteins called expansins, loosenins, and swollenins, than they are to subfamily A [17]. All these proteins contain a conserved double-psi beta-barrel domain (DPBB), also known as the GH45-like domain. Loosenins and most GH45s consist of this domain alone, while expansins and swollenins also contain a beta-sandwich domain (CBM63) in combination with the DPBB. Swollenins and some GH45s (e.g. HiCel45A) have an additional carbohydrate-binding module attached by a flexible linker. There are also examples of GH45s with more complex domain architectures.
GH45 enzymes from all three subfamilies employ an inverting hydrolysis mechanism, with an aspartate residue acting as catalytic acid, which is also conserved in expansins, loosenins and swollenins [14]. However, the Asp proposed to act as catalytic base in subfamilies A and B is missing in subfamily C, as well as in the non-hydrolytic proteins [15]. Instead, an asparagine at another position has been proposed to act as catalytic base in GH45 subfamily C, based on neutron diffraction structure studies of PcCel45A from the white-rot fungus Phanerochaete chrysosporium [18]. Although the mechanism of subfamily C may differ from that of A and B, subfamilies B and C are more alike with regard to reaction product profile [14].
Crystal structures of nine GH45 enzymes are available in the PDB (rcsb.org), six from kingdom Fungi (ascomycetes and basidiomycetes) and three from Metazoa (gastropod, bivalve, springtail), but no structure from brown rot fungi. There are structures for six members of subfamily A, two members of subfamily B (from mollusks) and only one of subfamily C, PcCel45A.
Here we describe a second structure of a GH45 subfamily C enzyme, GtCel45A from the brown-rot fungus Gloeophyllum trabeum.
Recombinant expression of GtCel45A, biochemical characterization and activity profiling have been published previously, in comparison with a subfamily A enzyme, MtGH45 from the ascomycete Myceliophthora thermophila [19]. Activity optima for GtCel45A were at pH 5 and 65 °C, and activities were in a similar range to those of MtGH45 on carboxymethyl cellulose, beta-glucan, lichenan, and Avicel substrates.

Cultivation, expression, and purification
Spores of an Aspergillus nidulans A773 strain expressing GtCel45A (GenBank: EPQ56593) were kindly provided by Dr. Fernando Segato, University of São Paulo, Brazil. The generation of the constructs is described by Berto et al. [19]. The approximate protein size is 18.4 kDa (183 aa) and the calculated pI is 4.5.
A spore suspension in water was made from a 13-day-old sporulating culture growing on potato dextrose agar (PDA) plates. To prepare a pre-culture, 50 mL of minimal medium (70 mM NaNO3) and maltose (3%) were inoculated with 200 µL of the spore suspension. The pre-culture was incubated in shake flasks at 25 °C, 120 rpm for 10 days.
Protein expression was done in the same medium at 30 °C in 1 L shake flasks, 4 × 350 mL cultures, shaking at 75 rpm. Cultures were harvested after 7 days and filtered twice (through a 1 µm GF/B Whatman glass-fiber filter, followed by vacuum filtration through a 0.45 µm PES membrane).
Protein concentration was determined at 280 nm with a NanoDrop UV-Vis Spectrophotometer using an extinction coefficient of A0.1% = 1.71, which was calculated by the Protein Identification and Analysis Tools on the Expasy Server [20]. The purified protein was concentrated to 14.12 mg·mL⁻¹ in 700 µL.
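The conversion behind such a measurement can be illustrated with a minimal sketch of the Beer-Lambert relation, using the mass extinction coefficient and approximate molar mass quoted in the text. The 1 cm path-length normalization, the function names, and the example absorbance reading are assumptions made for illustration and are not values from this study.

```python
# Sketch: protein concentration from A280 (Beer-Lambert law).
# Assumptions: absorbance is normalized to a 1 cm path length, and the
# example reading at the bottom is hypothetical (not a measured value).

A280_PER_MG_ML = 1.71      # A(0.1%) at 280 nm, i.e. absorbance of 1 mg/mL (from the text)
MOLAR_MASS_DA = 18_400.0   # approximate size of GtCel45A in Da (from the text)


def conc_mg_per_ml(a280, dilution_factor=1.0):
    """Mass concentration (mg/mL) of the undiluted sample."""
    return a280 / A280_PER_MG_ML * dilution_factor


def conc_micromolar(mg_per_ml):
    """Convert mg/mL (= g/L) to micromolar using the approximate molar mass."""
    return mg_per_ml / MOLAR_MASS_DA * 1e6


if __name__ == "__main__":
    # Hypothetical reading of a 10-fold diluted sample.
    c = conc_mg_per_ml(2.414, dilution_factor=10)
    print(f"{c:.2f} mg/mL = {conc_micromolar(c):.0f} uM")
```

Concentrated samples usually have to be diluted into the linear range of the instrument, which is why the sketch carries an explicit dilution factor.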
The condition was further optimized in hanging drops to 0.1 M HEPES, pH 7.5, 20% PEG 3350 and 3 mM NiCl2 with a protein solution concentration of 14 mg·mL⁻¹. The proportion of protein to reservoir solution in the hanging drops was 1 : 1.
To confirm the identity of the crystallized protein, a protein crystal was dissolved in distilled water and the sample was analyzed by SDS/PAGE (Fig. S2).

Data collection and structure determination
X-ray diffraction data were acquired at the ID30B beamline of the European Synchrotron Radiation Facility (ESRF), Grenoble, France. In total, 7000 images were collected with an oscillation angle of 0.10° and an X-ray exposure time of 0.02 s at 4.9% transmission.
Of these, 4000 images were used and processed with the XDS software package [21]. The structure was solved by molecular replacement with phenix.phaser from the PHENIX suite [22] using the mature protein sequence of Cel45A from G. trabeum (UniProtKB: S7QB86) and coordinates from PDB entry 5KJO (P. chrysosporium Cel45A) as search model. The structure was refined using phenix.refine from the PHENIX suite. Visual inspection and real space refinement were carried out using COOT [23]. Statistics are summarized in Table 1. The structure was deposited in the Protein Data Bank under ID: 8BZQ.
Protein structure figures were prepared with the PYMOL Molecular Graphics System, Version 2.0 (Schrödinger, LLC, New York, NY, USA). Polar contacts were identified in PYMOL. The secondary structure plot was created with the EMBL-EBI online service PDBsum, where the secondary structure motifs are computed by v.3.0 of Gail Hutchinson's PROMOTIF program [24].
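For orientation, the rotation range and cumulative exposure covered by the collected and the processed images follow directly from the image count, oscillation angle and exposure time given above. The short sketch below only performs this arithmetic; the function name is illustrative.

```python
# Sketch: rotation range and cumulative exposure for a set of diffraction images.
# Input values are taken from the text; the function name is illustrative.


def sweep_summary(n_images, osc_deg, exposure_s):
    """Return (total rotation in degrees, total exposure in seconds)."""
    return n_images * osc_deg, n_images * exposure_s


collected = sweep_summary(7000, 0.10, 0.02)   # all images collected
processed = sweep_summary(4000, 0.10, 0.02)   # images used for processing

for label, (rotation, exposure) in (("collected", collected), ("processed", processed)):
    print(f"{label}: {rotation:.0f} degrees, {exposure:.0f} s")
```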

Bioinformatic analysis
The protein-protein BLAST algorithm of the blastp suite was used to search for similar sequences [25][26][27]. The following sequences were used as queries: Mytilus edulis MeCel45A, Ampullaria crossean AcCel45A, Humicola insolens HiCel45A, P. chrysosporium PcCel45A, and Trichoderma reesei TrCel45A. The top 100 sequences with a minimum of 50% identity were collected and later revised to remove incomplete or identical sequences.
For each subfamily, a multiple sequence alignment was generated with MUSCLE (version 5) [28]. Consensus sequences were created using UGENE (version 41.0) [29][30][31], with consensus type "strict" and a 90% threshold. A minimum of 50 sequences was used for the generation of each consensus sequence. Visualization of the consensus sequence alignment was carried out using ESPRIPT 3.0 [32]. To determine the percentage of sequence identity between GtCel45A and PcCel45A, and Cel45A from Gymnopilus dilepis, pairwise sequence alignments were made using EMBOSS Needle, available online at the EMBL-EBI server (ebi.ac.uk) [33]. T-COFFEE (Version 11.00) was used to create a structure-based multiple sequence alignment [34]. Domain identification in Cel45A from G. dilepis was carried out with the ScanProsite tool, available online at the Swiss Bioinformatics Resource Portal (expasy.org) [35].
To mark the secondary structure elements, the following crystal structures were visualized in PYMOL and used as a template for the corresponding subfamily: HiCel45A (PDB ID: 2ENG), MeCel45A (PDB ID: 1WC2), GtCel45A (PDB ID: 8BZQ), and a homology model for TrCel45A. A homology model for GdCel45A (GenBank: PPQ98991.1) was created using RoseTTAFold [36]. PYMOL was used to generate the 3D structure images of all GH45 enzymes in this paper. All figures were made ready for publication using the vector design application AFFINITY DESIGNER (version 1.10, 8ab56).
For the modeling of cellulose chain binding in GtCel45A, the GtCel45A structure was superposed with the structure of PcCel45A in complex with two cellopentaose molecules, binding in subsites −5 to −1 and +1 to +5, respectively (PDB: 3X2M). To improve the fit of the cellulose chain (G10) in the substrate binding area, adjustments were made using Coot. The glucose residue in the −1 subsite was replaced with a distorted −1 Glc from an MD simulation model of cello-oligosaccharide binding in HiCel45A [37]. The Asn95 sidechain clashed with the +1 Glc unit and was therefore rotated to the orientation that the corresponding residue has in the PcCel45A complex structure (PDB: 3X2M). The G10 chain was merged with the GtCel45A structure and regularized in Coot. The resulting structure was refined in PHENIX. The G10 chain was reconstructed by merging the −5 to +1 Glc units from the refined structure (BGC210-205; including glycosidic oxygen O4 from BGC204) with the +2 to +5 units from the PcCel45A complex (BGC204-GLC201).
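The idea of a thresholded consensus can be sketched as follows. This is a minimal illustration under stated assumptions and not UGENE's implementation: here a column yields a residue only if the most frequent residue reaches the threshold fraction of all sequences, and a gap character otherwise. The handling of gaps and ties, the function name, and the toy alignment are all hypothetical.

```python
from collections import Counter

# Sketch of a thresholded per-column consensus (NOT the UGENE code).
# Assumption: frequency is computed over all sequences, gaps included.


def threshold_consensus(alignment, threshold=0.9, gap="-"):
    """Emit the most common residue of each column if its frequency
    reaches the threshold, otherwise emit the gap character."""
    if not alignment:
        return ""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(column) >= threshold:
            consensus.append(residue)
        else:
            consensus.append(gap)
    return "".join(consensus)


if __name__ == "__main__":
    # Toy alignment: 9 identical sequences and 1 variant in the second column.
    toy = ["ACD"] * 9 + ["AGD"]
    print(threshold_consensus(toy, threshold=0.9))
    print(threshold_consensus(toy, threshold=1.0))
```

With a 90% threshold, a position is reported even if a few sequences deviate, which makes the consensus tolerant of occasional sequencing or annotation errors in a set of 50 or more sequences.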

Enzyme activity assays
GtCel45A and PcCel45A at 0.1 µM concentration were each incubated at 30 °C, 400 rpm, in 0.1 M sodium acetate buffer pH 5.0 with 0.1% barley beta-glucan (purity ~95%, Megazyme, Wicklow, Ireland) dissolved in water. To stop the reaction, 1 M sodium hydroxide solution was added in a 1 : 1 ratio to the samples, followed by cooling of the samples on ice. Reducing sugar formation was determined colorimetrically by adding p-hydroxybenzoic acid hydrazide (PHBAH, Sigma) solution: 0.1 M PHBAH; 0.2 M sodium potassium tartrate; 0.5 M sodium hydroxide solution [38]. The PHBAH solution was prepared directly before use and added in a 1 : 1 ratio to the samples. Samples containing the PHBAH reagent were immediately incubated at 95 °C for 15 min and then cooled on ice for 10 min. Samples were kept at room temperature for 5 min before absorbance was read at 410 nm.
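Converting an absorbance reading into a reducing-end concentration typically relies on a standard curve. The sketch below assumes a linear glucose standard curve that was taken through the same NaOH and PHBAH additions as the samples, so that no extra dilution correction is needed by default. The standard-curve values, function names, and the example reading are hypothetical and are not data from this study.

```python
# Sketch: reducing-end concentration from A410 via a linear standard curve.
# The standard values below are hypothetical, for illustration only.


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def reducing_ends_uM(a410, slope, intercept, dilution=1.0):
    """Concentration in the reaction mixture (uM). `dilution` corrects for
    volume additions that the standards did not go through."""
    return (a410 - intercept) / slope * dilution


if __name__ == "__main__":
    glucose_uM = [0, 50, 100, 200]        # hypothetical standards
    a410 = [0.05, 0.15, 0.25, 0.45]       # hypothetical readings
    slope, intercept = fit_line(glucose_uM, a410)
    print(f"{reducing_ends_uM(0.274, slope, intercept):.0f} uM reducing ends")
```

If the standards are prepared without the 1 : 1 NaOH stop step, the `dilution` argument would have to account for it.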

Results and Discussion
Overall structure of GtCel45A
The final crystal structure model of GtCel45A was refined at 1.3 Å resolution, with R/Rfree values of 0.166/0.183. The space group was P2₁ with one protein chain per asymmetric unit. Data collection and refinement statistics are summarized in Table 1. The structure has been deposited in the Protein Data Bank under PDB ID: 8BZQ.
GtCel45A has a globular structure, with dimensions of approximately 26 Å × 37 Å × 48 Å. It contains a six-stranded beta barrel with a double-psi beta barrel (DPBB) fold, also known as the N-terminal domain of expansins, surrounded by six short helices and seven long loops. An open substrate binding groove spans the surface and is 48 Å in length, with the widest part, around 14 Å, near the center and a depth of around 12 Å (Fig. S1). The first four amino acids of the mature protein sequence (LEER) are not visible in the crystal structure.
The secondary structure along the protein sequence is depicted in Fig. 1 and the 3D structure with labeled secondary structure elements is shown in Fig. 2A.
In the following, beta strands are assigned the letter β and numbered based on their order in the amino acid sequence. Loops in the structure are assigned a letter depending on their position relative to the substrate binding area (Fig. 2A). Loops positioned above the substrate binding area (where the reducing end of the cellulose chain is to the right side relative to the active site) are assigned capital letter A, and loops below are assigned capital letter B. Loop numbering is based on their order in the amino acid sequence. The structure model begins with loop A1, which contains a three amino acid long alpha helix. Five amino acids connect the helix to the first beta strand (β1), which is connected to the next beta strand (β2) via a 15 amino acid loop (B2) (Fig. 2A). The secondary structure pattern of helix, β-strand, β-strand is repeated once more (Fig. 1), with loops at least 12 amino acids long connecting the secondary structure elements. This pattern is followed by two repetitions of helix, β-strand. Two helices and long loops make up the C-terminal part of the structure.
On loop B2, shortly after strand β1, lies Tyr22, a residue conserved in GH45 enzymes as well as in expansins. Gly24 on loop B2 is at the same position as the catalytic base in GH45 subfamily A and B members, for example Asp10 in HiCel45A and Asp24 in MeCel45A. On loop A5, between strands β4 and β5, lies an asparagine residue, Asn95 (Fig. 2A). The residue at this position, an aspartic acid, is regarded as the assisting residue in subfamily A enzymes. The putative catalytic acid Asp117 is found at the end of β5. The GtCel45A structure contains five disulfide bridges formed by cysteine pairing. All disulfide bonds appear to become reduced with increasing X-ray exposure time during data collection. Two of the cysteine pairs (Cys61/Cys145, Cys161/Cys175) are found at either end of loop B7. Loop B7 carries Trp157 in the same position as Trp154 in PcCel45A (structure with two cellopentaose molecules, PDB ID: 3X2M) and similar to Trp64 in MeCel45A, which function as a sugar-binding platform in subsite −4.
The position of loop A5, which carries the putative alternate base Asn95, is fastened by Cys90/Cys99, which positions Asn95 toward the catalytic acid Asp117. Molecular dynamics simulations have shown that, in the absence of a substrate, the corresponding loop in PcCel45A is able to close in toward the catalytic acid, forming a polar contact between Asn92 and Asp114 (corresponding to Asn95 and Asp117 in GtCel45A) [39].

GH45 subfamily C
The only other structure of a subfamily C member that has been deposited in the PDB is that of PcCel45A. GtCel45A has 82% sequence identity and 88% sequence similarity with PcCel45A. The crystal structure of GtCel45A consists of 179 amino acids, while that of PcCel45A consists of 180. At the end of loop B4, GtCel45A has His80, while two residues, Gly76 and Gln77, are located at this position in PcCel45A.
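How a percentage identity is derived from a pairwise alignment can be sketched as below. This is a simplified illustration, not the EMBOSS Needle code: it assumes that identity is counted over all alignment columns, with gapped columns included in the denominator, and different tools may define the denominator differently. The function name and the toy sequences are hypothetical.

```python
# Sketch: percent identity from two already-aligned sequences.
# Assumption: the denominator is the full alignment length, gaps included.


def identity_percent(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy example: one substitution and one gap in ten columns.
    print(identity_percent("ACDEFGHIKL", "ACDQFG-IKL"))
```

A similarity percentage is obtained in the same way, except that columns with a positive score in the chosen substitution matrix are counted instead of exact matches.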
A total of 31 residues differ in the crystal structure of GtCel45A relative to the crystal structure of PcCel45A. Most are away from the substrate binding groove (Fig. 2C); three can be found in its periphery (Ala93/Gln90, Trp98/Phe95, Thr158/Asn155 in GtCel45A/PcCel45A, respectively), but none at the catalytic site. All available PcCel45A crystal structures to date are missing the first four N-terminal residues present in the mature protein sequence.
The subfamily C enzymes are unique in the GH45 family due to their ability to achieve hydrolysis despite lacking a traditional catalytic base residue in the active site (Fig. 2B). Both subfamily A and subfamily B enzymes have an aspartic acid residue acting as catalytic base in the active site, while subfamily C has a glycine residue at this position.
It has been suggested that subfamily C members utilize an asparagine residue (Asn95 in GtCel45A) in their catalytic mechanism, instead of the residue corresponding to the position of the catalytic base in the other two subfamilies. Asn95 is positioned on a loop that closes in toward the active site. An asparagine is conserved at this location in subfamilies B and C, and an aspartic acid in subfamily A (Fig. 3). Mutation of the asparagine residue at this position in the subfamily C enzymes from P. chrysosporium and Fomitopsis palustris leads to a significant decrease in their hydrolytic activity [41].
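The residue pairs quoted in this section suggest a simple relationship between the two numbering schemes, which can be captured in a small helper. The sketch below is an inference from the pairs given in the text and not a mapping stated by the authors: it assumes an offset of +4 up to Gly76 of PcCel45A and +3 from Gln77 onward, with no further insertions or deletions. The function and constant names are hypothetical.

```python
# Sketch: map PcCel45A residue numbers onto GtCel45A numbering.
# Assumption (inferred, not stated in the text): offset +4 up to Gly76 and
# +3 from Gln77 onward, because Gly76 and Gln77 in PcCel45A both correspond
# to His80 in GtCel45A; no other indels are assumed.

LAST_PC_RESIDUE_WITH_PLUS4 = 76


def pc_to_gt(pc_resnum):
    """Return the GtCel45A residue number aligned with a PcCel45A residue."""
    offset = 4 if pc_resnum <= LAST_PC_RESIDUE_WITH_PLUS4 else 3
    return pc_resnum + offset


# Residue pairs (PcCel45A -> GtCel45A) quoted in the text.
QUOTED_PAIRS = {76: 80, 77: 80, 90: 93, 92: 95, 95: 98, 114: 117, 154: 157, 155: 158}

if __name__ == "__main__":
    for pc, gt in sorted(QUOTED_PAIRS.items()):
        status = "ok" if pc_to_gt(pc) == gt else "mismatch"
        print(f"Pc {pc} -> Gt {pc_to_gt(pc)} (text: {gt}) {status}")
```

A structure-based alignment remains the authoritative source for such a mapping; the helper is only a convenience for reading the residue numbers in this section side by side.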
According to this mechanism, the imidic acid form of Asn95 could act as a general base in GtCel45A.
Consensus sequences show that 14 residues are strictly conserved in all of the GH45 subfamilies known to date (Fig. S3). Subfamily B sequences appear less conserved, presumably due to the larger diversity of organisms expressing subfamily B enzymes (mollusks and fungi).
As mentioned previously, the absence of an acidic residue at the catalytic base position distinguishes subfamily C from the other two subfamilies (Fig. S3). However, there is further diversity among these enzymes which requires investigation. Most subfamily C enzymes are single-domain proteins, but at least one appears to have a CBM: Gymnopilus dilepis Cel45A (GenBank: PPQ98991.1) [42], which has an N-terminal CBM1 domain connected by a linker to the catalytic domain (Fig. S4). After removal of the CBM1 and linker from the sequence, G. dilepis Cel45A (GdCel45A) is 75% identical in sequence to GtCel45A.
Thus far, studies of enzymatic activity have not been carried out under identical conditions, and therefore no uniform conclusion on the catalytic activities of subfamily C enzymes can be drawn. It is clear that GH45 subfamily C enzymes are able to hydrolyze CMC, barley beta-glucan and glucomannan [14,15,19,41]. Studies show that the pH optimum is around pH 5 when determined on CMC, and some GH45 subfamily C enzymes appear stable at 50 °C [19,41].
We compared the hydrolytic activity of GtCel45A and PcCel45A on barley beta-glucan at 30 °C, pH 5. Despite the high sequence and structural similarity, GtCel45A exhibited a ~3.8 times higher initial rate: it took 2.15 min for 0.1 µM GtCel45A to produce 112 ± 19 µM of reducing ends and 8.15 min for 0.1 µM PcCel45A to produce 111 ± 20 µM. Moreover, GtCel45A reached a plateau faster than PcCel45A (Fig. 4). At present we do not have any explanation for this difference.
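The comparison can be retraced with a short calculation from the time points and product concentrations quoted above. The sketch treats each pair of values as a single-point average rate and ignores the stated uncertainties; the apparent turnover numbers it derives are an illustration and were not reported as such. Function and variable names are hypothetical.

```python
# Sketch: average initial rates from the single time points quoted in the text.
# Uncertainties (+/- 19 and +/- 20 uM) are not propagated here.

ENZYME_UM = 0.1  # enzyme concentration in the assay, uM (from the text)


def initial_rate(product_uM, minutes):
    """Average rate of reducing-end formation (uM per min)."""
    return product_uM / minutes


gt_rate = initial_rate(112, 2.15)   # GtCel45A
pc_rate = initial_rate(111, 8.15)   # PcCel45A
rate_ratio = gt_rate / pc_rate

# Apparent turnover: reducing ends formed per enzyme molecule per second.
gt_turnover = gt_rate / ENZYME_UM / 60
pc_turnover = pc_rate / ENZYME_UM / 60

print(f"GtCel45A: {gt_rate:.1f} uM/min, apparent turnover {gt_turnover:.1f} per s")
print(f"PcCel45A: {pc_rate:.1f} uM/min, apparent turnover {pc_turnover:.1f} per s")
print(f"rate ratio Gt/Pc: {rate_ratio:.2f}")
```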

Conclusions
Cel45A from the brown-rot fungus G. trabeum is a GH45 subfamily C member with a structure nearly identical to that of Cel45A from the white-rot fungus P. chrysosporium, and yet it shows higher catalytic activity on barley beta-glucan. The diversity within GH45 subfamily C would benefit from elucidation, as at least one subfamily C member, the Cel45A from G. dilepis, has a CBM1 domain. Fourteen residues are strictly conserved across the GH45 family.

Fig. 1. Secondary structure plot for GtCel45A. Arrows represent beta strands (A) and spirals represent alpha helices (H); numbered connections show cysteine disulfide pairings; the letter β indicates a beta turn motif; a beta hairpin motif is depicted in red.

Fig. 2. Overall structure of GtCel45A. (A) Ribbon drawing with numbering of loops, corresponding amino acid range indicated in brackets; (B) Superposed active site of GtCel45A (green) and PcCel45A (blue) with substrate bound; (C) GtCel45A surface structure. Residues in GtCel45A that are not identical to the corresponding residues in PcCel45A are indicated in red on the surface structure of GtCel45A. The catalytic center is indicated with an arrow. Residues positioned at the location of catalytic residues in subfamily A are circled in orange in the sequence alignment. [Correction added on 08 February, 2024, after first online publication: The residue numbering of loop A4 in Figure 2A has been corrected in this version.]

Fig. 3. Structure-based sequence alignment of H. insolens Cel45A, T. reesei Cel45A and G. trabeum Cel45A. Alignment visualized in ESPRIPT 3.0. Secondary structure elements are represented as rounded rectangles (helices) and arrows (β-strands). Character coloration according to ESPript 3.0: a filled red box and a white character indicate strict identity; a red character indicates similarity within a group. Active site residues of subfamily A and residues at the corresponding location in other subfamilies are marked with a yellow frame.
Fig. S3. (A) An alignment of GH45 subfamily A, B and C consensus sequences; (B) An alignment of subfamily B consensus sequence, subfamily B consensus sequence in phylum Ascomycota, and phylum Mollusca.
Fig. S4. Homology structure model of the bimodular GH45 subfamily C enzyme GdCel45A from Gymnopilus dilepis with linker and CBM1.

Table 1. Data collection and refinement statistics. Values taken from the validation report for the deposited structure.