M. G. Tuohy, Molecular Glycobiotechnology Group, Department of Biochemistry, National University of Ireland, Galway, Ireland. Fax: +353 91 512504, Tel.: +353 91 524411, E-mail: email@example.com
The X-ray structure of native cellobiohydrolase IB (CBH IB) from the filamentous fungus Talaromyces emersonii, PDB 1Q9H, was solved to 2.4 Å by molecular replacement. 1Q9H is a glycoprotein that consists of a large, single domain with dimensions of ≈ 60 Å × 40 Å × 50 Å and an overall β-sandwich structure, the characteristic fold of Family 7 glycosyl hydrolases (GH7). It is the first structure of a native glycoprotein and cellulase from this thermophilic eukaryote. The long cellulose-binding tunnel seen in GH7 Cel7A from Trichoderma reesei is conserved in 1Q9H, as are the catalytic residues. As a result of deletions and other changes in loop regions, the binding and catalytic properties of T. emersonii 1Q9H differ from those of T. reesei Cel7A. The gene (cel7) encoding CBH IB was isolated from T. emersonii and expressed heterologously, with an N-terminal polyHis-tag, in Escherichia coli. The deduced amino acid sequence of cel7 is homologous to fungal cellobiohydrolases in GH7. The recombinant cellobiohydrolase was virtually inactive against methylumbelliferyl-cellobioside and chloronitrophenyl-lactoside, but partial activity could be restored after refolding of the urea-denatured enzyme. Profiles of cel7 expression in T. emersonii, investigated by Northern blot analysis, revealed that expression is regulated at the transcriptional level. Putative regulatory element consensus sequences for cellulase transcription factors have been identified in the upstream region of the cel7 genomic sequence.
Cellulose is the major constituent of all plant materials and is the most abundant organic molecule on Earth [1,2]. Microbial breakdown of cellulose creates the potential for the production of energy [3–5]. Cellulases are used in waste recycling processes and in the processing of cellulose-rich raw materials for the paper and textile industries. Cellulose is composed of repeating glucose units, where each glucose unit is rotated 180° relative to its neighbours along the main axis, so that the basic repeating unit is cellobiose. Plant cellulose exists in a highly crystalline form. Hydrolysis of cellulose requires the co-operative action of three classes of cellulolytic enzymes, namely endo-β-1,4-glucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) and β-glucosidases (EC 3.2.1.21). The CAZy (carbohydrate active enzymes) classification system collates glycosyl hydrolase (GH) enzymes into families according to sequence similarity, which has been shown to reflect shared structural features. To date, GH enzymes are members of 87 families, of which 43 have been assigned a retaining mechanism of action and 24 an inverting mechanism; the stereochemical mode of action of the remaining families has yet to be determined. The endoglucanases are commonly characterized by a groove or a cleft into which a linear cellulose chain can fit in a random manner. Classically, exoglucanases such as the cellobiohydrolases (CBHs) possess tunnel-like active sites, which can only accept a substrate chain via its terminal regions. These exo-acting CBH enzymes act by threading the cellulose chain through the tunnel, where successive cellobiose units are removed in a sequential manner. Sequential hydrolysis of a cellulose chain is termed ‘processivity’. However, some cellulase enzymes are capable of both endo- and exo-actions [10,11].
Moreover, some GH families include both endo- and exo-enzymes, indicating that the mode of action can be independent of sequence homology and structural fold. Relatively minor changes in the lengths of relevant loops in the general proximity of the active site in such enzymes, may dictate the endo- or exo-mode of action without significant differences in the overall fold. In Trichoderma reesei Cel7A, deletion of the exo-loop (residues 243–256) has been shown to decrease activity against crystalline cellulose. It was therefore postulated that the exo-loop has evolved to facilitate processive hydrolysis of crystalline cellulose by T. reesei Cel7A . Fungal cellulolytic enzymes reported to date comprise a single polypeptide chain, frequently glycosylated, which contains a catalytic domain usually connected to a cellulose-binding domain by a proline/serine/threonine-rich linker . CBHs from Humicola grisea, Phanerochaete chrysosporium and Aspergillus niger have been shown to consist solely of a catalytic domain. The most characterized CBH members of GH7 are Cel7A from T. reesei and Cel7D (CBH58) from P. chrysosporium. Both CBHs consist of two β-sheets that pack face-to-face to form a β-sandwich . Cel7A from T. reesei is composed of long loops, on one face of the sandwich, that form a cellulose-binding tunnel of ≈ 50 Å. The catalytic residues are glutamate 212 and 217, which are located on opposite sides of the active site, separated by an intervening distance consistent with a double-displacement retaining mechanism . Members of GH7 are thought to follow a retaining mechanism of action. Kinetic parameters and enzyme–ligand interactions of GH7 enzymes are well characterized [19–21]. Genes from this family have been cloned and characterized from a variety of fungal sources, including H. grisea, T. reesei[22,23], Penicillium janthinellum, P. chrysosporium and Aspergillus species [16,25,26], but until recently, never from a truly thermophilic fungal species .
The thermophilic aerobic fungus, Talaromyces emersonii, isolated from composting biomass, produces a completely thermostable cellulase system that has not been fully characterized to date [21,28–30]. CBH enzymes from T. emersonii have been purified, characterized and assigned to GH families 6 and 7 [21,27,31]. Protein thermostability is not, however, reflected in the overall fold of a protein and is thought to be the result of more localized differences, causing thermophilic enzymes to be somewhat less flexible than mesophilic enzymes . In this article we present the 3D structure of the native CBH IB from T. emersonii, the first structure of any protein from this source and the first structure of a native fungal CBH core (glycoprotein). Molecular cloning, transcriptional regulation analysis and overexpression of the cel7 gene in Escherichia coli are also reported. The 3D structure has been deposited in the Protein Data Bank as 1Q9H.
Fungal strain and growth conditions
Mycelia harvested from cultures grown from T. emersonii strain CBS 814.70 at 45 °C on Sabouraud dextrose agar were used to inoculate liquid nutrient media, as described previously . Cultures were grown at 45 °C with shaking at 220 r.p.m. At appropriate time-points, mycelia were harvested by filtration through several layers of fine-grade muslin, washed with 75 mm sodium citrate, pH 7.5, and frozen immediately under liquid nitrogen for nucleic acid extraction.
PCR cloning of genomic DNA
Chromosomal DNA was isolated from T. emersonii mycelia harvested after 24 h of culture on 2% (w/v) glucose, by using the method of Raeder & Broda. Amplification of a DNA fragment encoding a portion of the catalytic domain of T. emersonii cel7 was performed by using PCR and degenerate primers designed from alignments of existing CBH sequences in the databases. Reaction cocktails contained 2.5 U of Qiagen HotStar™Taq DNA polymerase, 1× buffer (Qiagen, Crawley, West Sussex, UK), 0.5× Q solution, 200 µm of each deoxynucleotide triphosphate, 1.5 mm MgCl2 and 1 µm of the appropriate gene-specific primers. Reaction conditions for PCR amplification were 94 °C for 15 min (initial DNA polymerase activation), followed by 30 cycles of 94 °C for 1 min, 50–60 °C for 1 min and 72 °C for 1 min, and a final extension of 10 min. PCR products were separated by electrophoresis through a 1.2% (w/v) agarose gel, purified by using a Wizard PCR preps DNA purification system (Promega, Southampton, UK) and subcloned into the pGEM-T easy vector (Promega), following the manufacturer's guidelines. Plasmids were purified from E. coli JM109 cultures by using a spin miniprep kit (Qiagen), and sequenced. Sequencing reactions were carried out by Altabioscience Laboratories (University of Birmingham, Birmingham, UK). Sequence analysis and database similarity searches were performed by using the online program blast against protein (blastp) and nucleotide (blastx and blastn) sequences stored at the National Center for Biotechnology Information (NCBI).
Rapid amplification of cDNA ends
RNA (10 µg), isolated after growth (48 h) of T. emersonii on solka floc (ball-milled cellulose), was used as a template for RACE, which involved a modification of the manufacturer's (Ambion Europe Ltd., Huntingdon, Cambridgeshire, UK) RACE protocol described previously. An aliquot (1 µL) of the reaction mixture was used as a template for performing 5′- and 3′-RACE PCRs by using the outer and inner RACE primers supplied by the manufacturer and the outer and inner gene-specific primers designed from the cel7 PCR products. The Cel7 outer and inner RACE primers were as follows: outer 5′-RACE primer 5′-CATGCGGTAAGGGTTGAAGTCACA-3′; inner 5′-RACE primer 5′-GTTTGCTTCCCAGACATCCATC-3′; outer 3′-RACE primer 5′-ATGCTGTGGTTGGATTCCGACTAC-3′; and inner 3′-RACE primer 5′-AACTCCTACGTGACCTACTCGAAC-3′. PCR products were cloned and sequenced as described previously.
Isolation of cel7 cDNA and genomic genes
Full-length cDNA and genomic sequences corresponding to cel7 were amplified from T. emersonii first-strand cDNA and genomic DNA, respectively, by PCR with primers corresponding to the 5′ start and 3′ stop sequences identified in the 5′- and 3′-RACE products. The cel7 sense and antisense primers were 5′-ATGCTTCGACGGGCTCTTCTTCTA-3′ and 5′-TCACGAAGCGGTGAAGGTCGAGTT-3′, respectively. Reactions contained 1.25 U of Pfu DNA polymerase, 1× Pfu reaction buffer, 200 µm of each deoxynucleotide triphosphate and 1 µm of the appropriate gene-specific primers. PCR products were gel purified, subcloned and sequenced as described previously.
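As a quick sanity check on the primer pair quoted above, the following minimal sketch computes the length, GC fraction and a rough melting temperature for each primer, and maps the antisense primer back onto the coding strand. The primer sequences are taken from the text. The helper names and the use of the Wallace rule (2 °C per A/T, 4 °C per G/C) are assumptions made for illustration only. The Wallace rule is a crude estimate intended for short oligonucleotides and is not the method used by the authors.

```python
# Primer sequences are copied from the text; the helper functions are illustrative.
SENSE = "ATGCTTCGACGGGCTCTTCTTCTA"
ANTISENSE = "TCACGAAGCGGTGAAGGTCGAGTT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, primer in (("sense", SENSE), ("antisense", ANTISENSE)):
        print(name, len(primer), "nt,",
              f"GC = {gc_fraction(primer):.2f},",
              f"Wallace Tm ~ {wallace_tm(primer)} C")
    # The antisense primer read on the coding strand should end in a stop codon.
    print("antisense on coding strand:", reverse_complement(ANTISENSE))
```

The sense primer begins with the ATG start codon, and the reverse complement of the antisense primer ends with a TGA stop codon, which is consistent with the two primers spanning the complete open reading frame.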
Northern blot analysis and genomic library screening
Northern blot analysis of cel7 expression was carried out as described previously. A T. emersonii Sau3A genomic library was prepared in LambdaGEM-11 (Promega). E. coli KW251 was used as the host strain in the preparation and screening of the genomic library. Plaque lifts were carried out as described by Sambrook et al. Hybridization was conducted overnight at 68 °C in 5× NaCl/Cit, 0.1% (w/v) N-lauroylsarcosine, 0.02% (w/v) SDS and 1% (w/v) blocking reagent. Full-length digoxigenin (Roche Molecular Biochemicals, Roche Diagnostics Ltd., Lewes, East Sussex, UK)-labelled cel7 (20 ng·mL−1 of hybridization buffer) was used as a probe. Detection was performed according to the manufacturer's instructions. The presence of the full-length gene in positively hybridizing single plaque-forming units was confirmed by PCR, and the plaques were purified by using a lambda purification kit (Qiagen). The purified phage DNA was then sequenced directly, using a cel7 gene-specific sequencing primer (5′-GCATTCCTGCCATGTCAG-3′) to generate sequence data for the 5′ region upstream of the ATG start codon.
Expression of cel7 in E. coli
Primers F1 (5′-CACCCAGCAGGCCGGCACGGCG-3′) and R1 (5′-TCACGAAGCGGTGAAGGTCGAGTT-3′), corresponding to the N- and C-terminal regions of the mature protein, were used to amplify cel7 cDNA (the N-terminal signal peptide, i.e. amino acids 1–18, was removed). The CACC at the 5′ end of primer F1 corresponds to the GTGG overhang in the TOPO® cloning vector (Invitrogen Ltd., Paisley, UK). The purified PCR product was ligated into the pENTR/SD/D-TOPO vector and transformed into One Shot® Top10 E. coli competent cells, according to the manufacturer's instructions. The product of an LR recombination reaction between the entry clone, pE-Cel7, and the destination vector, pDEST-17 (Invitrogen), was transformed into E. coli DH5α library-efficient cells, thereby generating the expression clone, pD-Cel7, with an N-terminal poly-histidine tag. Multiple transformants were analysed by restriction analysis and PCR to confirm the presence and correct orientation of the insert at all stages. For expression, plasmid DNA was purified and transformed into BL21-AI competent E. coli cells (Invitrogen), which were cultured to mid-log phase, and expression was induced by the addition of 0.2% (w/v) arabinose followed by a further growth period of 4 h at 37 °C. Pilot experiments indicated that the CBH protein was expressed in the inclusion body fraction. Cells were harvested by centrifugation (3630 g for 5 min) from a 50 mL culture, and the cell pellet was resuspended in 8 m urea. The cell lysate was sonicated with three, 5-s, high-intensity pulses, centrifuged at 1307 g for 15 min to pellet cellular debris, and the supernatant was applied to a nickel-nitrilotriacetic acid purification matrix (Invitrogen).
The lysate was allowed to interact with the matrix at room temperature for 30 min with gentle agitation and then washed with 2 volumes of wash solution (containing 8 m urea, 20 mm sodium phosphate, 500 mm NaCl, pH 7.8), followed by 2 volumes of a second wash solution (containing 8 m urea, 20 mm sodium phosphate, 500 mm NaCl, pH 6.0). The column was then washed with 4 volumes of a final wash solution of 50 mm sodium phosphate, and 20 mm imidazole, pH 8.0. Recombinant CBH (reCBH) was eluted from the column by application of a solution of 50 mm sodium phosphate, pH 8.0, containing 250 mm imidazole.
Denaturation and refolding of reCBH and enzyme assay
reCBH (1 mg) was denatured by incubation in a solution of 8 m urea/0.1 m Tris/HCl, pH 8.0, in the presence of 100 mm dithiothreitol and 1 mm EDTA, for 2 h at 20 °C. The pH was lowered to pH 4.0 by dropwise addition of 1 m HCl, and the dithiothreitol was removed by dialysis against the same buffer without the dithiothreitol. Denatured and reduced reCBH was diluted 1 : 100 in a buffer solution containing 0.1 m Tris/HCl, pH 8.5/1 mm EDTA/0.3 mm oxidized glutathione/3 mm glutathione, and then incubated in a renaturation buffer containing 2.5 mg of protein disulphide isomerase at 30 °C for 30 h. reCBH was dialysed against 100 mm sodium acetate, pH 5.0, followed by concentration in a Millipore microconcentrator fitted with a 10 kDa cut-off membrane. reCBH activity was measured by incubating 10 µL of renatured enzyme with 100 µL of 1 mm chloronitrophenyl-lactate (CNP-lactate), 1 mm 4-nitrophenyl-cellobioside or 50 µm 4-methylumbelliferyl-cellobioside, at 50 °C. Reactions were terminated by the addition of 100 µL of 1 m Na2CO3 or 0.2 m glycine/sodium hydroxide, pH 10.5, and the absorbance (405 nm) or UV fluorescence was measured.
Purification of CBH IB
CBH IB was purified from 2% (w/v) solka floc cellulose-induced cultures and characterized as described previously . The purified enzyme was concentrated to 20 mg·mL−1 in 20 mm Tris buffer, pH 7.5, and stored at 4 °C. Peptide sequence information for native CBH IB was determined by Edman degradation on an automated sequenator (J. Gray, University of Newcastle-upon-Tyne, Newcastle-upon-Tyne, UK).
Crystallization and data collection
Native CBH IB from T. emersonii was crystallized by using the hanging-drop vapour-diffusion method with ammonium phosphate (dibasic) as a precipitant at pH 8.5.
Crystals of CBH IB, which diffracted to 2.4 Å, were obtained. Data were collected at room temperature on the multipolar wiggler beamline, BW7B, at the DORIS storage ring, EMBL Hamburg Outstation, Germany using a Mar345 area detector. Data processing indicated that CBH IB crystallised in the tetragonal space group P41212, with unit cell dimensions a = b = 74.42 Å, c = 176.92 Å.
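The cell dimensions above fix the unit-cell volume, which for a tetragonal lattice is simply V = a²c. The sketch below computes that volume and, as an optional follow-on, a Matthews coefficient (cell volume per dalton of protein). The cell dimensions are taken from the text. The count of eight general positions for P41212 is standard crystallographic knowledge. The molecular mass used in the demonstration (50 000 Da) is a placeholder chosen purely for illustration and is not a value reported for CBH IB, and the function names are hypothetical.

```python
# Cell dimensions from the text (angstroms).
A_AXIS = 74.42
C_AXIS = 176.92

# P4(1)2(1)2 has eight general equivalent positions per unit cell.
GENERAL_POSITIONS_P41212 = 8


def tetragonal_cell_volume(a: float, c: float) -> float:
    """Unit-cell volume (cubic angstroms) for a tetragonal cell, V = a^2 * c."""
    return a * a * c


def matthews_coefficient(cell_volume: float,
                         general_positions: int,
                         molecular_mass_da: float,
                         copies_per_asu: int = 1) -> float:
    """Matthews coefficient V_M = V / (Z * M), in cubic angstroms per dalton."""
    return cell_volume / (general_positions * copies_per_asu * molecular_mass_da)


if __name__ == "__main__":
    volume = tetragonal_cell_volume(A_AXIS, C_AXIS)
    print(f"unit-cell volume: {volume:.0f} cubic angstroms")
    # ASSUMPTION: 50 000 Da is a placeholder mass, not a measured value.
    vm = matthews_coefficient(volume, GENERAL_POSITIONS_P41212, 50_000)
    print(f"V_M with one hypothetical 50 kDa molecule per asymmetric unit: {vm:.2f}")
```

Substituting the experimentally determined mass of the glycosylated enzyme for the placeholder would give the actual packing density of the crystal.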
The structure was solved by molecular replacement by utilizing the program amore. Molecular replacement was completed using two separate search models, chosen based on sequence homology. The models used were the catalytic domain of T. reesei Cel7A (PDB 1CEL) and the catalytic domain of P. chrysosporium Cel7D (PDB 1GPI).
A total of 5% of the reflections in the data set was set aside for free R-factor calculations during refinement. refmac5 from the ccp4 suite of programs was used throughout this refinement, with the program turbo being employed for graphical displays and manipulation of the models. With each round of refinement, maps were produced and the model was rebuilt where electron density supported the changes. Water molecules were located and refined by using the program arp_warp. The stereochemical quality of the model was followed by using the program procheck.
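For reference, the crystallographic R-factor monitored during refinement is conventionally defined as shown below. This is the standard textbook definition, written without the overall scale factor that is normally applied to the calculated amplitudes, and is not a formula quoted from the refinement program. The free R-factor uses the same expression, evaluated only over the test set T of reflections excluded from refinement (here, the 5% set aside).

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|},
\qquad
R_{\mathrm{free}} = \frac{\sum_{hkl \in T} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
                         {\sum_{hkl \in T} |F_{\mathrm{obs}}(hkl)|}
```

A gap between R and the free R-factor that widens during refinement is commonly taken as a warning sign of overfitting.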
Isolation of genomic and cDNA clones
The cel7 degenerate primers amplified a 719 bp PCR product from T. emersonii chromosomal DNA. The product was cloned, sequenced and found to exhibit homology to other fungal cel7 gene sequences. Based on this sequence, 5′- and 3′ outer and inner RACE PCR primers were designed to amplify the 5′- and 3′ ends of the cel7 gene. Sequence analysis confirmed the RACE products to be part of the cel7 gene, which included a 54 bp 5′ untranslated region and a 281 bp 3′ untranslated region, including a polyA tail. The full-length cDNA (GenBank AY081766) and genomic (GenBank AF439935) cel7 clones were amplified from first-strand cDNA and chromosomal DNA, respectively, by using N- and C-terminal gene-specific primers based on the RACE products. Cel7 was encoded by a 1365 bp open reading frame encoding 455 amino acids and interrupted by two introns (52 and 61 bp), with consensus 5′- and 3′ intron splice sites (Fig. 1).
Peptides sequenced from native CBH IB confirmed the identity of the T. emersonii cel7 gene/gene product; the location of these peptides in the deduced polypeptide sequence is given in the legend to Fig. 1. Comparison of the deduced cel7 amino acid sequence from T. emersonii with those from P. chrysosporium (GenBank: AAA19802), T. reesei (GenBank CAA49596), A. niger (GenBank AAF04491), H. grisea (GenBank AAD11942) and A. aculeatus (GenBank BAA25183) gave sequence identity values of 65%, 64%, 73%, 51% and 68%, respectively (Fig. 2). Alignment of the deduced polypeptide sequence of T. emersonii cel7 reveals the presence of a terminal catalytic domain. Other cel7 (cbhI) gene products possessing a catalytic domain exclusively have been identified and include H. grisea and A. niger. Cel7 genes from T. reesei, P. chrysosporium and A. aculeatus, however, contain a modular structure composed of a C-terminal carbohydrate-binding module linked via a proline/serine/threonine-rich linker to the catalytic domain . There are two predicted N-glycosylation sites in the catalytic domain of 1Q9H (Fig. 3), i.e. Asn-X-Ser/Thr (X is any amino acid except proline) consensus sequence, at Asn267 and Asn431. There are 18 residues corresponding to the signal peptide at the N-terminus of the translated protein product. Alignment of the existing fungal CBH sequences revealed that 1Q9H from T. emersonii comprises features found in both Cel7D of P. chrysosporium and in Cel7A of T. reesei.
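The Asn-X-Ser/Thr consensus described above is simple enough to scan for directly. The following minimal sketch uses a regular expression with a lookahead so that overlapping sequons are also reported. The function name is hypothetical, and the demonstration sequence is an invented toy peptide, not the CBH IB sequence. Residue numbers returned for a real sequence depend on whether the precursor or the mature protein numbering is supplied.

```python
import re

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead makes the scan report overlapping candidate sites.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # ASSUMPTION: toy sequence for illustration only.
    toy = "AANGSAANPTAANATA"
    # N3 (NGS) and N13 (NAT) qualify; N8 (NPT) is excluded because X is proline.
    print(find_sequons(toy))
```

A sequon match only predicts a potential site. Occupancy has to be confirmed experimentally, as was done here from the electron density.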
Analysis of the T. emersonii cel7 upstream region
Initial screening of 6000 λ phage clones from the T. emersonii Sau3A genomic library identified two positively hybridizing clones. Sequence analysis of the 5′ region upstream from the start codon of the purified cel7 clones revealed putative TATA-like and CCAAT box sequences located upstream of the start codon at bp −99, −132, −340, −1040, −1242, −1348, −1476 and −1694. In filamentous fungi, and in higher eukaryotes, the CCAAT sequence is known as an upstream activating sequence. The binding sites for putative cellulase transcription factors [activator of cellulase expression I (ACEI) and ACEII] [48,49] are located upstream of the start codon at bp −562, −844, −853 and −1175, while putative binding sites for the catabolite repressor element (Cre) [50,51] are located upstream of the start codon at bp −239, −265, −320, −359, −460, −977, −1404 and −1523.
Northern blot analysis of T. emersonii cel7 expression
Solka floc cellulose, lactose and beechwood xylan induce high levels of cel7 expression in T. emersonii (Fig. 4). Similar cellulase expression with complex cellulose has been documented in P. chrysosporium and T. reesei. Methyl xylose and gentiobiose, a β-1,6-linked glucose disaccharide, induce low levels of cel7 expression, relative to solka floc, in T. emersonii. Gentiobiose has been shown to induce other cellulases in T. emersonii. Other researchers have reported that the CBH A and B genes, and endoglucanase genes, from A. niger are also induced by d-xylose. Sophorose, a β-1,2-linked disaccharide of glucose, has previously been shown to be a poor inducer of cellulase activity in T. emersonii, and it has been postulated that sophorose could be the natural inducer of cellulase expression in T. reesei. Cellobiose is a poor inducer of the T. emersonii cellulases and did not induce detectable levels of cel7 in this study. Glucose-induced cultures displayed no detectable levels of cel7. Indeed, the addition of 2% (w/v) glucose for 2 h to T. emersonii mycelia, previously cultured on solka floc for 48 h, resulted in the abolition of the cel7 signal. The regulatory proteins, CreA and Cre1, similar to Mig1 in Saccharomyces cerevisiae, mediate glucose repression in Aspergillus and Trichoderma species. The 5′ upstream region of T. emersonii cel7 has eight potential catabolite repressor-binding sites (SYRGG). The sequence of a gene encoding CreA from T. emersonii has recently been submitted to the GenBank database (AF440004).
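The SYRGG consensus quoted above uses IUPAC ambiguity codes (S = C or G, Y = C or T, R = A or G). The sketch below shows one way such a motif could be located on both strands of an upstream region and reported relative to the start codon. The function names and the toy sequence are hypothetical, and the coordinate convention (−1 is the base immediately 5′ of the ATG, positions refer to the 5′-most base of the match on the given strand as read along the top strand) is an assumption that may differ from the convention used for the coordinates reported in the text.

```python
import re

# IUPAC codes needed for the SYRGG consensus, and their complements.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "S": "[CG]", "Y": "[CT]", "R": "[AG]"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C",
              "S": "S", "Y": "R", "R": "Y"}


def motif_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif into a lookahead regex (overlaps are reported)."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in motif) + ")")


def reverse_complement_motif(motif: str) -> str:
    """Reverse complement of an IUPAC motif, e.g. SYRGG -> CCYRS."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def scan_upstream(upstream: str, motif: str = "SYRGG") -> list[tuple[int, str]]:
    """Return (position, strand) hits; `upstream` must end just before the ATG."""
    upstream = upstream.upper()
    length = len(upstream)
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement_motif(motif))):
        for match in motif_regex(pattern).finditer(upstream):
            hits.append((match.start() - length, strand))
    return sorted(hits)


if __name__ == "__main__":
    # ASSUMPTION: invented 20 bp sequence, not the cel7 promoter.
    toy_upstream = "TTTGCGGGGAAACCCCACTT"
    print(scan_upstream(toy_upstream))
```

Short degenerate motifs of this kind occur frequently by chance, so hits from a scan like this identify candidate sites only, in line with the description of the sites in the text as potential.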
Expression of cel7 in E. coli
A recombinant protein, of ≈ 57 000 relative molecular mass, was expressed in E. coli BL21-AI. Under the conditions tested, reCBH was present in the insoluble inclusion body fraction (Fig. 5A). The protein was purified under hybrid conditions (denaturing/renaturing) on a Ni-nitrilotriacetic acid column. reCBH was inactive against CNP-lactate. Denaturation of reCBH, followed by refolding, with concomitant disulphide bond formation, in the presence of protein disulphide isomerase in renaturation buffer, successfully restored partial biological activity of reCBH against CNP-lactate and methylumbelliferyl-cellobioside (Fig. 5B).
Structure solution and refinement of CBH IB
Molecular replacement was performed by using the ccp4 (1994) programs contained in the Automated package for Molecular Replacement (amore) . Both 1GPI, which was solved at 1.32 Å resolution and 1CEL, which was solved at 1.8 Å resolution, were used as search models. Rotational and translational searches were performed at different resolutions in the range of 69 Å to 2.4 Å. Rigid body refinement was carried out after each translation function to refine the position of the potential solution. Euler angles, fractional coordinates, correlation coefficients and R factors for the best molecular replacement solution for each model, using P41212 as the space group of the 1Q9H crystal, were found. The best solutions had correlation coefficients of 56.1% and 55.3%, and R-factors of 40.0% and 40.9% for 1CEL and 1GPI, respectively. Refinement of the models produced by amore was performed by using the ccp4 program, refmac5 . refmac5 was used to carry out restrained refinement on X-ray data by using the maximum likelihood method. The Roverall and Rfree values from the first round of refmac5 cycles on the 1GPI model were 28.6% and 33.6%, respectively, while those for the 1CEL model were 28.9% and 35.5%, respectively. The 1GPI model was used for further analysis. The graphics program turbo was used to examine both models and the maps produced by refmac5. 2Fo-Fc and Fo-Fc maps were used and analysed with contour levels set to 1.0. The amino acid sequence of the 1GPI model was changed to that of 1Q9H, and the model was rebuilt where changes were supported by the electron density. Changes in R factors were used as a guide to improvements to the overall structure. Further rounds of model mutation and rebuilding resulted in a model with an R-factor of 16.1% and an R-free of 22.9%(Table 1). Electron density maps showed almost continuous density for the backbone of CBH IB. The final model, 1Q9H, included 430 of the 437 amino acid residues of CBH IB. 
The final two amino acids and the loop region (from amino acids 193–197) were not visible in electron density maps. In addition, no side-chain density was apparent for four residues, which were subsequently modelled as alanine. All of these residues are located on the surface of the protein and are presumed to be disordered. Three N-acetylglucosamine residues and 175 water molecules were located within the model. Average isotropic temperature factors (B factors) for the 1Q9H structure were calculated by using the ccp4 program baverage. The average isotropic temperature factor for the main chain was 20.91 Å², and root-mean-square deviations (rmsd) from ideal bond lengths and angles were 0.009 Å and 1.281°, respectively. procheck was used to verify the stereochemical quality of the model. The Ramachandran plot showed that 86% of residues lie in the most favoured regions and 13.8% lie in the allowed region, while Ser311 was the only nonglycine residue in the generously allowed region and there were no residues in the disallowed regions. Peptide bond planarity for the main chain was found to be 7.0°, nonbonded interactions were 0.6 per 100 residues, α-carbon tetrahedral distortion was 1.8°, the standard deviation of the hydrogen bond energies was 0.7 and overall G-factor, a measure of the normality of the structure, was 0.0.
Table 1. Final statistics for the structure of Talaromyces emersonii 1Q9H. Values in parentheses refer to the last resolution shell.
Unit cell dimensions
a = b
α = β = γ (°)
Resolution range (Å)
No. of reflections
Mean I > 2σ(I) (%)
No. of water molecules
No. of sugar molecules
rms bond lengths (Å)
rms bond angle (°)
Average B main chain (Å2)
Average B water (Å2)
Overall structure of T. emersonii 1Q9H
1Q9H is a large single-domain protein with overall dimensions of ≈ 60 Å × 40 Å × 50 Å (Fig. 6). About one-third of this domain is arranged in two large antiparallel β-sheets, which are stacked face-to-face and are highly curved, forming convex and concave surfaces. The convex and concave sheets of the β-sandwich are composed of seven β-strands. Many of the side-chains in the β-sheets are hydrophobic, and interactions between these residues appear to hold the β-sandwich in position. With the exception of four α-helices and two pairs of short β-strands, the rest of the protein consists almost entirely of loops connecting the β-strands. The loops extending from the β-sandwich form a tunnel, which runs the length of the concave sheet and can accommodate the cellulose substrate. The β-sandwich represents the characteristic fold of GH7 and is also the fold of the legume-lectin family and of GH16.
The loops extending from the β-sandwich are stabilized by the presence of nine disulphide bonds which are located between residues 19–25, 50–71, 61–67, 135–401, 169–207, 173–206, 227–253, 235–240 and 258–334. The N-terminal glutamine residue is present as the modified pyroglutamate group, as observed in other GH structures [17,58]. Electron density corresponding to N-glycosylation is visible at two asparagine residues, namely Asn267 and Asn431 (Fig. 3). It was possible to position two N-acetylglucosamine residues, linked via a β-1,4 bond, at Asn267. A single N-linked N-acetylglucosamine was seen in the model at position Asn431.
Structure of T. emersonii 1Q9H, in comparison with P. chrysosporium 1GPI and T. reesei 1CEL
A blast search of the Protein Data Bank (PDB) revealed that the protein structures with the highest sequence homology to 1Q9H were structures 1GPI and 1CEL, which are the catalytic domains of CBH Cel7D from P. chrysosporium and CBH Cel7A from T. reesei, respectively. The P. chrysosporium enzyme has a sequence identity of 67% with 1Q9H, while the T. reesei enzyme has an identity of 66%.
While the levels of sequence homology of 1CEL and 1GPI with 1Q9H are similar, the areas of shared homology differ. Superimposing the C-alpha traces of 1GPI and 1CEL on 1Q9H gave rmsd values of 0.71 Å and 0.67 Å, respectively (Fig. 7).
The X-ray structure of the T. reesei CBH, with eight glucose residues bound (PDB 7CEL), identifies some 20 residues involved in enzyme–substrate interactions. Superposition of this structure on 1Q9H shows that all but two of these residues are conserved and suitably positioned for interactions with the substrate. Four tryptophan residues form a glucosyl-binding platform in sites −7, −4, −2 and +1 in the tunnel of 1CEL; equivalent tryptophan residues are found in 1Q9H at positions 38, 40, 371 and 380. A tyrosine residue (Tyr47) present in the T. emersonii CBH IB sequence, and seen in 1GPI but not in 1CEL, is located at the entrance of the tunnel, and Munoz et al. suggest that it may constitute an additional binding subsite. Three arginine residues in the product sites of 1CEL (+1, +2 and +3) are proposed to assist in the binding and positioning of the substrate and to play a role in the recognition of the reducing end of the cellulose chain. Arginine side-chains are present in all equivalent locations in 1Q9H (Fig. 7).
There are four major loops involved in the cellulose-binding tunnel in 1CEL. It is postulated that Asn197 and Asn198 make van der Waals interactions with Tyr370 and Tyr371, on the opposite loop, thus enabling the formation of a fully enclosing tunnel. While sequence analysis shows that 1Q9H possesses the equivalent Asn residues (Asn193 and Asn194), one of the tyrosine residues on the opposite loop is replaced by an alanine (Ala374), forming a more open tunnel; however, electron density in this area of 1Q9H is poor. In 1GPI, neither asparagine residue is present and a histidine and an alanine residue are found in the equivalent tyrosine positions. The tunnel-forming loop (amino acids 240–248) in 1GPI is significantly shorter than in 1CEL and 1Q9H, owing to a six amino acid deletion, resulting in a more exposed catalytic site for 1GPI. In 1CEL, three amino acids form a tight turn over site −6, with Gln101 hydrogen bonding to the glycosyl residue in site −5, thus forming the lid of the binding site. The structures 1Q9H and 1GPI have a deletion of these three residues, thus leading to a more open substrate-binding site.
Catalytic binding site
Brooks et al. showed, by NMR, that the CBHs I from T. emersonii have a retaining mechanism of action. This type of mechanism, as shown by Davies & Henrissat, involves a proton donor and a base separated by ≈ 5.5 Å [59–61]. Henrissat classified all members of GH7, which catalyse the hydrolysis of the β-1,4-glycosidic bond of cellulose, as retaining enzymes, i.e. they retain the configuration of the anomeric carbon. Glu212 and Glu217 have been identified as the nucleophile and proton donor, respectively, in 1CEL. Sequence analysis of 1Q9H and 1GPI shows that these residues are conserved, suggesting that they carry out the same function. Based on the proposed mechanism of action from 1CEL, Glu209 of 1Q9H may act as the nucleophile, while the proton donor is likely to be Glu214. The proposed catalytic residues are separated by 5.57 Å. The Asp211 residue of 1Q9H is in a position to share a proton with the nucleophile, in a short hydrogen bond (O-O distance 2.51 Å; Fig. 8). The residue Glu214 forms a weak hydrogen bond to Asn138. A platform of hydrophobic residues has recently been identified as being mechanistically relevant as a transition-state stabilizing factor in GH family members. A tyrosine residue (Tyr142), present near the −1 subsite in 1Q9H, is thought to be involved in this platform.
This article presents the first report on the purification and 3D structural determination of a native core CBH protein, and of the cloning and over-expression of the corresponding gene, from a thermophilic fungal source. CBH IB is extremely thermostable with a temperature optimum of 68 °C at pH 5.0 and a half-life (t½) of 68.0 min at 80 °C and pH 5.0. In comparison, Cel7A from T. reesei has a temperature optimum of 62 °C over the pH range 3.5–5.6. The cel7 gene from T. emersonii was cloned and the deduced amino acid sequence used during the structure solution of the native enzyme. Family 7 contains both CBHs and endoglucanases. The structure of CBHs is distinguished from that of endoglucanases by the presence of loops of polypeptide chain covering the active site residues, which convert the active site cleft of endoglucanases into the characteristic tunnel of CBHs. Three CBHs belonging to GH Family 7 (T. reesei 1CEL, P. chrysosporium 1GPI and T. emersonii 1Q9H) are generally similar in structure. The catalytic domains are single domain proteins with two large antiparallel β-sheets that stack face-to-face to form a β-sandwich. The rest of each of the three CBHs consists almost entirely of loops connecting the β-strands. However, on closer inspection of the structures, local variations reflect the sequence differences. The cellulose-binding sites in 1Q9H are more accessible than those in 1CEL. The absence of the three amino acids that are observed to form a tight turn over the −5/−6 subsites in 1CEL confers a more open entrance to the cellulose-binding sites in 1Q9H. This proposal is supported by the replacement of Asn7 in 1CEL by a smaller threonine residue in 1Q9H and 1GPI at the −7 subsite and of Tyr371 in 1CEL by Ala374 in 1Q9H at the −3/−4 subsites. A tyrosine residue present in 1GPI and 1Q9H, but absent in 1CEL, has been suggested, by Munoz et al., to be an additional substrate-binding site.
The more open tunnel structure is probably an adaptation to the lack of a CBM, allowing short-chain oligosaccharides more access to the active site. Supporting evidence comes from the higher catalytic rate (kcat) and catalytic efficiency (kcat/Km) of 1Q9H, 13.4 s−1 and 3.6 s−1·mM−1, respectively (compared with 0.093 s−1 and 0.23 s−1·mM−1 for 1CEL), with the oligosaccharide derivative 4-NP-lactopyranoside. An insertion of eight amino acid residues common to 1Q9H, P. chrysosporium Cel7D and T. reesei endoglucanase Cel7B can be seen. Although this insertion is located at the outer regions of the structure, it could potentially have implications for function and will be the target of future protein engineering studies. The probable catalytic residues of 1Q9H, nucleophile Glu209 and proton donor Glu214, are located approximately on opposite sides of the cleavable glycosidic linkage in the −1/+1 subsites, with their carboxylic groups 5.57 Å apart. Four tryptophan residues located along the substrate-binding tunnel in 1CEL, which are the determinants of the glycosyl-binding sites, are conserved in 1Q9H. Density was poor for one of the tunnel-forming loops of 1Q9H (residues 193–197). The tunnel is composed of loops that are inherently flexible, and the absence of good density in this loop is perhaps indicative of its flexibility. It is worth noting that the structures of T. reesei GH7 CBHs were solved in the presence of substrates. As 1Q9H was solved in the absence of bound substrate, one could imagine that, if a substrate were present in the structure, the loops would close over it, yielding a structure more like that of 1CEL.
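The kinetic comparison above can be made explicit with a short calculation. The sketch below takes the four values from the text, assumes the efficiency units are s−1·mM−1, and assumes simple Michaelis–Menten behaviour so that an implied Km can be back-calculated as kcat divided by kcat/Km. The class and variable names are illustrative, and the implied Km values are derived here, not reported in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Kinetics:
    kcat: float        # turnover number, s^-1
    efficiency: float  # kcat/Km, s^-1 mM^-1 (unit assumed)

    @property
    def km(self) -> float:
        """Implied Km in mM, assuming simple Michaelis-Menten kinetics."""
        return self.kcat / self.efficiency

# Values quoted in the text for 4-NP-lactopyranoside.
cbh_ib = Kinetics(kcat=13.4, efficiency=3.6)    # T. emersonii 1Q9H
cel7a = Kinetics(kcat=0.093, efficiency=0.23)   # T. reesei 1CEL

print(f"kcat ratio (1Q9H/1CEL):    {cbh_ib.kcat / cel7a.kcat:.0f}-fold")
print(f"kcat/Km ratio (1Q9H/1CEL): {cbh_ib.efficiency / cel7a.efficiency:.1f}-fold")
print(f"implied Km, 1Q9H: {cbh_ib.km:.2f} mM; 1CEL: {cel7a.km:.2f} mM")
```

On these numbers, turnover differs by roughly two orders of magnitude while catalytic efficiency differs by about 16-fold, which implies a higher apparent Km for 1Q9H with this substrate.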
The cel7 gene consists of a 1365 bp open reading frame, interrupted by two introns, that encodes 455 amino acids. The deduced amino acid sequence revealed a secretory signal peptide and a CBH catalytic domain. The 5′ upstream region of cel7 has eight potential Cre-binding sites, and it is probable that glucose repression of cellulase transcription is mediated through a Cre protein in T. emersonii (the gene sequence for a Cre-like protein has been cloned from T. emersonii). It has been shown previously that sophorose is a weak inducer of cellulases in T. emersonii, although it is the proposed natural inducer of cellulase expression in T. reesei. Induction of cel7 and cbhII expression by gentiobiose suggests that this glucose disaccharide may be the natural cellulase inducer in T. emersonii and is indicative of an alternative cellulase induction mechanism in this fungus. The carbohydrate-binding module and linker region that are characteristic of some other GH family members were not encoded in the gene, in contrast to cbh 2 from the same source. Biochemical analysis of the CBHs from T. emersonii, previously reported from this laboratory, has shown that the hydrolysis of crystalline cellulose (Avicel) by CBH IB is 77% lower than that observed with CBH IA. Earlier studies revealed that removal of the carbohydrate-binding module from the T. reesei CBH resulted in a 90% decrease in activity against Avicel [65,66]. More recently, Nutt et al. have shown, by progress curve analysis, that intact CBHs from T. reesei and P. chrysosporium show higher activities than their corresponding cores against bacterial microcrystalline cellulose. Takashima et al. suggest that the exoglucanase (EXO1) of H. grisea displays lower activity towards crystalline cellulose than the corresponding CBH I enzyme from this organism. The same study indicated exo-synergism between EXO1 and CBH I in the hydrolysis of crystalline cellulose, and a similar co-operativity between CBHs in T. emersonii may occur.
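Potential Cre-binding sites of the kind mentioned above are typically located by scanning the upstream sequence for a consensus motif on both strands. The sketch below assumes the consensus 5′-SYGGRG-3′ widely reported for CreA/Cre1 (S = C or G, Y = C or T, R = A or G); the text does not state which consensus was used. The function name and the test sequence are invented for illustration and are not taken from the cel7 promoter.

```python
import re

# Assumed consensus 5'-SYGGRG-3' written as a regular expression. The lookahead
# lets overlapping sites be reported.
FORWARD = re.compile(r"(?=([CG][CT]GG[AG]G))")
# Reverse complement of SYGGRG is CYCCRS: a site on the opposite strand,
# as it appears when reading the given strand.
REVERSE = re.compile(r"(?=(C[CT]CC[AG][CG]))")

def find_cre_sites(seq: str) -> list[tuple[int, str, str]]:
    """Return sorted (0-based start, strand, matched sequence) for each candidate site."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in REVERSE.finditer(seq)]
    return sorted(hits)

# Hypothetical upstream fragment, made up for this example.
upstream = "ATATCTGGGGATATACCCCAGATAT"
for start, strand, site in find_cre_sites(upstream):
    print(f"position {start}, strand {strand}: {site}")
```

A six-base degenerate motif occurs frequently by chance, so hits from a scan like this are candidates only, which is consistent with the text describing the eight sites as potential.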
Despite the reduced activity of CBH IB against crystalline cellulose, the enzyme hydrolyses Avicel in a processive manner. In processive cellulose hydrolysis, the initial hydrolytic attack occurs at the chain end; glucose or cellotetraose is produced only upon this initial attack, and cellobiose is the principal product of hydrolysis thereafter. During hydrolysis of Avicel by CBH IB, glucose production is markedly low and remains constant after the initial hydrolytic attack. Cellobiose is the predominant product of hydrolysis and increases in concentration as the reaction proceeds. The exo-loop of 1CEL (amino acids 243–256) forms the roof of the active site tunnel at the catalytic centre. Deletion of this loop has been shown to lead to a decreased processivity of 1CEL against crystalline cellulose. This exo-loop is conserved in 1Q9H and is presumed to contribute to the processivity of CBH IB against crystalline cellulose. It should be noted, however, that 1GPI has a natural deletion of the exo-loop, yet Cel7D from P. chrysosporium is able to maintain high processivity, leading to efficient crystalline cellulose hydrolysis. Therefore, conclusions drawn for one enzyme do not necessarily apply to others within the same family, because of different substrate preferences. We were able to restore biological activity of the denatured reCBH, although enzyme activity remained very low. 1Q9H has nine disulphide bridges, and so regeneration of the native CBH enzyme in high yield by in vitro reoxidation of the reduced, denatured polypeptide is extremely complex. Expression at lower temperatures has been carried out and has yielded similar activity results. Therefore, heterologous expression studies in other hosts are currently in progress. Future site-directed mutagenesis of specific residues in 1Q9H should provide a valuable insight into the structural basis of the enhanced thermostability of the CBH IB protein from T. emersonii.
This work was funded by HEA pre-PRTLI and Enterprise Ireland awards to M.G.T. C.M.C. and R.T. are grateful for junior teaching fellowships from NUI, Galway, and postgraduate scholarships from Enterprise Ireland.