Sequence diversity within a family of functional enzymes provides a platform for elucidating structure–function relationships and for protein engineering to improve properties important for applications. Access to nature's vast sequence diversity is often limited by the fact that only a few enzymes have been characterized in a given family. Here, we recombined the catalytic domains of three glycoside hydrolase family 48 bacterial cellulases (Cel48; EC 3.2.1.176) – Clostridium cellulolyticum CelF, Clostridium stercorarium CelY, and Clostridium thermocellum CelS – to create a diverse library of Cel48 enzymes with an average of 106 mutations from the closest native enzyme. Within this set, we found large variations in properties such as the functional temperature range, stability, and specific activity on crystalline cellulose. We showed that functional status and stability were predictable from simple linear models of the sequence–property data: recombined protein fragments contributed additively to these properties in a given chimera. Using these models, we correctly predicted sequences that were as stable as any of the native Cel48 enzymes described to date. The characterization of 60 active Cel48 chimeras expands the number of characterized Cel48 enzymes from 13 to 73. Our work illustrates the role that structure-guided recombination can play in helping to identify sequence–function relationships within a family of enzymes by supplementing natural diversity with synthetic diversity.
Cellulolytic anaerobic bacteria use macromolecular structures known as cellulosomes to hydrolyze recalcitrant cellulosic substrates. Within the cellulosome, cellulases and other glycoside hydrolases [2, 3] are assembled onto multidomain scaffoldin proteins for efficient degradation of cellulosic substrates. Cellulosome assembly is achieved by binding of dockerin domains from enzymes to cohesin domains in scaffoldin, and interaction with the substrate is mediated by one or more carbohydrate-binding modules (CBMs) on the scaffoldin [1, 5].
The modularity of cellulosomes has spurred interest in ‘designer cellulosomes’ [4, 6], whereby different cellulases are synthetically combined for a specific application. Within a given glycoside hydrolase family, a diverse pool of potential cellulases would be beneficial for designer cellulosomes by providing a suite of enzymes with differing properties and an extensive platform for further enzyme engineering. Glycoside hydrolase family 48 cellulases (Cel48; EC 3.2.1.176) are ideal candidates for designer cellulosomes. As one of the most important families of bacterial cellulases [7, 8], they are usually a major constituent of bacterial cellulosomes [9, 10]. Of the 116 bacterial Cel48 genes currently predicted in the CAZy database (http://www.cazy.org/), only 13 have been characterized.
Here, we used SCHEMA recombination to synthesize a diverse set of new Cel48 sequences. SCHEMA is a structure-guided, site-directed protein recombination method that has been used to generate thousands of novel cytochrome P450s, β-lactamases, and fungal cellulases [15, 16]. SCHEMA identifies optimal crossover locations for shuffling homologous genes, based on minimizing structural disruption in the resulting chimeric proteins. The chimeric proteins that are made by recombining natural sequences differ from the parent sequences at many amino acid positions, and provide a convenient platform for structure–function studies. The new Cel48 enzymes described here are chimeras of the catalytic domains of three native Cel48 enzymes from mesophilic and thermophilic Clostridia. Sequence–function analysis of this synthetic enzyme library demonstrates a high degree of additivity in the sequence–stability relationship, as observed in previous studies [15, 17]. This simple relationship between the sequence block identity and its contribution to chimera stability has allowed us to predict highly stable, highly active Cel48 enzymes. We have also investigated the relationship between thermostability and optimal catalytic temperature in this enzyme family.
Cel48 parental enzymes
Three extensively characterized Cel48 cellulases were chosen as parents for construction of the SCHEMA recombination library: CelF from the mesophile Clostridium cellulolyticum ATCC 35319, CelY from the thermophile Clostridium stercorarium, and CelS (also known as CelA) from the thermophile Clostridium thermocellum ATCC 27405. All three enzymes are known to act on crystalline cellulose in a processive manner [10, 11]. Crystal structures of CelF and CelS show that the Cel48 catalytic domain is an (α/α)6-barrel fold. The sequence and structural similarities of the catalytic domains (Fig. S1) suggest that these enzymes can be recombined to make functional catalytic domain chimeras.
Outside of the catalytic domain, however, the parent enzymes exhibit significant structural variations. CelF and CelS consist of a 70-kDa catalytic domain connected to their organisms' respective dockerin domains, whereas CelY is a noncellulosomal 103-kDa protein with its N-terminal catalytic domain attached via a 10-kDa domain of unknown function (DUF) to a 17-kDa cellulose-binding domain (CBM3). Thus, CelY can directly bind cellulose, whereas CelF and CelS bind their respective scaffoldins.
As the noncatalytic domains (dockerin, scaffoldin, and CBM) differ among the parent enzymes, we chose to construct the library by using the C. thermocellum architecture. Having a single architecture for the cellulases enables fair comparison of the chimeric cellulase catalytic domains. A miniscaffoldin consisting of a C. thermocellum cohesin and a CBM was constructed as previously described, and the CelS dockerin domain was fused to the C-termini of the catalytic domains of CelF, CelS, and CelY (see Experimental procedures). The parental constructs, with the added C. thermocellum dockerin domain, are referred to as CelF-1, CelS, and CelY-2, and are highlighted (boxed) in Fig. 1. These constructs can attach to the miniscaffoldin to produce minicellulosomes. Another CelY construct was created by the addition of its DUF. Because the presence or absence of this DUF did not affect activity of the CelY constructs (Fig. S2B), the DUF was excluded in constructing the recombination library.
We first characterized and compared the activities on crystalline cellulose of the parental enzymes with and without the miniscaffoldin. For all cellulases with a dockerin, activity was substantially higher in the presence of miniscaffoldin than without it (Fig. 2A–C). Thus, cohesin–dockerin binding occurs, and CBM-mediated attachment to cellulose enhances the rate of sugar release from crystalline cellulose, as observed previously. Figure 2D directly compares the activity profiles for the dockerin-containing cellulases in the presence of C. thermocellum miniscaffoldin. Under these conditions, CelY and CelS displayed the highest activity at 70–80 °C and very low activity below 50 °C. In contrast, CelF was most active at ~50 °C, but quickly lost activity at higher temperatures. In a previous study, we compared the activities of three homologous bacterial glycoside hydrolase family 9 CBM3c cellulases from mesophilic and thermophilic organisms over a range of temperatures. They all displayed similar activities at lower temperatures, and that activity increased with temperature until the enzyme was no longer stable. Here, in contrast, the Cel48 cellulase from the mesophilic organism is significantly more active than its two thermophilic homologs at the lower temperature.
SCHEMA recombination library design
SCHEMA is a structure-guided computational approach to designing a library of chimeric genes. It identifies crossover sites for recombination of homologous proteins that maximize the likelihood that proteins in the resulting library will retain their folded structure. Contacts (residues that are < 4.5 Å from one another) are identified from one or more of the crystal structures, and the SCHEMA energy (E) for a given chimera is calculated by counting the number of residue–residue contacts that are disrupted by recombination. Recombination sites are chosen to minimize the average SCHEMA energy, <E>, of all possible sequences made by recombining those sequence fragments.
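The contact-counting step can be illustrated with a minimal Python sketch. It assumes one common reading of a "disrupted" contact: the two residues are inherited from different parents, and the resulting amino acid pair does not occur at those positions in any parent. The function name, the toy sequences, and the toy contact list are hypothetical and are not taken from the Cel48 structures.

```python
def schema_energy(parents, contacts, block_of, chimera):
    """Count residue-residue contacts broken in one chimera.

    parents  : aligned parent sequences (equal-length strings)
    contacts : (i, j) pairs of 0-based alignment positions in contact
    block_of : block index for every alignment position
    chimera  : donor parent index for every block
    """
    energy = 0
    for i, j in contacts:
        donor_i = chimera[block_of[i]]
        donor_j = chimera[block_of[j]]
        if donor_i == donor_j:
            continue  # both residues come from one parent: contact kept
        pair = (parents[donor_i][i], parents[donor_j][j])
        # Assumption: a pair already seen in some parent is not disrupted.
        if any((p[i], p[j]) == pair for p in parents):
            continue
        energy += 1
    return energy


# Hypothetical toy alignment: 3 parents, 4 positions, 2 blocks.
toy_parents = ["ACDE", "AGDF", "TCHE"]
toy_contacts = [(0, 2), (1, 3)]
toy_block_of = [0, 0, 1, 1]

e_parent = schema_energy(toy_parents, toy_contacts, toy_block_of, [0, 0])
e_mixed = schema_energy(toy_parents, toy_contacts, toy_block_of, [0, 1])
e_worst = schema_energy(toy_parents, toy_contacts, toy_block_of, [2, 1])
```

Averaging this count over every chimera in a candidate design gives the library-level <E> that the crossover search minimizes.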
We designed the recombination library of Cel48 catalytic domains by using the raspp algorithm to identify crossover sites that minimized <E>. raspp returned a set of candidate library designs (Fig. S3). The chosen library has crossovers located before residues Pro122, Ala260, Asp292, His348, Gly396, Asn437, and Leu556, based on the numbering of CelS [Protein Data Bank (PDB): 1L2A]. This library has an <E> of 31, and an average number of mutations from the closest parent (<m>) of 106. The individual structural elements (‘blocks’) for this design, shown in Fig. 3A, are not obvious from the secondary or domain structure. Crossovers between blocks B and C, C and D, and G and H, for example, lie within α-helices. This design, however, sequesters as many residue–residue contacts as it can within blocks, given limitations on block size (Fig. 3B).
Chimeric genes were assembled from 24 gene fragments, representing the eight blocks from each of the three parents, with the sequence-independent site-directed chimeragenesis method to generate a gene library of 3^8 (6561) different sequences (Table S1; Fig. S4). A C. thermocellum dockerin was attached to the C-terminus of each chimeric sequence during reassembly. The methods used to express, purify and identify functional chimeras are described in detail in Experimental procedures.
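The library size follows from choosing one of three parents independently at each of eight blocks. The short Python sketch below enumerates all block combinations and computes the number of mutations from the closest parent for a single chimera. The two-residue block sequences are hypothetical placeholders, not the real Cel48 blocks.

```python
from itertools import product

N_PARENTS, N_BLOCKS = 3, 8

# Every chimera is a tuple giving the donor parent of blocks A-H.
library = list(product(range(N_PARENTS), repeat=N_BLOCKS))


def mutations_from_closest_parent(chimera, block_seqs):
    """Smallest Hamming distance between a chimera and any parent.

    block_seqs[p][b] is the sequence of block b taken from parent p.
    """
    chimera_seq = "".join(block_seqs[p][b] for b, p in enumerate(chimera))
    distances = []
    for parent_blocks in block_seqs:
        parent_seq = "".join(parent_blocks)
        distances.append(sum(x != y for x, y in zip(chimera_seq, parent_seq)))
    return min(distances)


# Hypothetical two-residue blocks, for illustration only.
toy_blocks = [
    ["AA", "CC", "DD", "EE", "FF", "GG", "HH", "II"],
    ["AT", "CC", "DT", "EE", "FT", "GG", "HT", "II"],
    ["TT", "CT", "DD", "ET", "FF", "GT", "HH", "IT"],
]

m_parent = mutations_from_closest_parent((0,) * 8, toy_blocks)
m_single_swap = mutations_from_closest_parent((1, 0, 0, 0, 0, 0, 0, 0), toy_blocks)
```

Averaging `mutations_from_closest_parent` over `library` would give the <m> statistic used to compare candidate designs.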
Characterization of chimeric Cel48 cellulases
Upon screening 4872 library members with a 96-well plate cellulase activity assay (see Experimental procedures), we identified the functional enzymes, from which we purified and characterized 50 unique, novel Cel48 enzymes. As shown in Fig. 4, these enzymes have, on average, > 80 mutations from the closest parent cellulase. Their SCHEMA E-values range from 8 to 36, and they have 12–142 mutations from the closest parent cellulase. Sequences from all three parental enzymes are well represented at each block in the functional chimeras, except for CelF, which is underrepresented in blocks E, G, and H.
We measured the thermostabilities (T50) and optimal catalytic temperatures (Topt) of the 50 Cel48 chimeras and their three parents; these values are reported in Fig. 4. T50 is the temperature at which an enzyme loses 50% of its activity after a 10-min incubation (see Experimental procedures), and is a measure of its ability to resist temperature-induced irreversible inactivation. Topt is the temperature at which a cellulase is most active during a 2-h assay (see Experimental procedures), and is a measure of its ability to remain active at elevated temperature. Thermostability, the ability to withstand denaturation, is necessary but not sufficient for increasing an enzyme's optimal catalytic temperature. In the chimeras, both these measured properties extend beyond the range of the parents. Many of the chimeras are very stable: indeed, this experiment has added 35 new Cel48 enzymes with a Topt of > 60 °C to the six natural thermostable cellulases that have been characterized to date: C. thermocellum ATCC 27405 CelS, C. thermocellum F7 CelS, C. thermocellum ATCC 27405 CelY, Thermobifida fusca YX CelF, C. stercorarium CelY, and Anaerocellum thermophilum DSM 6725 CelA.
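As an illustration of the T50 definition, the following Python sketch estimates the temperature of 50% residual activity by linear interpolation between the two incubation temperatures that bracket the crossing. This is an assumed, simplified procedure; the text does not state how the inactivation curves were fitted, and the data points are hypothetical.

```python
def t50_by_interpolation(temps, residual):
    """Estimate T50 from residual activities after heat treatment.

    temps    : incubation temperatures in ascending order (deg C)
    residual : activity remaining, as a fraction of the unheated control
    Returns None when the curve never crosses 50%.
    """
    points = list(zip(temps, residual))
    for (t_lo, a_lo), (t_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 > a_hi:
            # Linear interpolation between the bracketing points.
            return t_lo + (a_lo - 0.5) * (t_hi - t_lo) / (a_lo - a_hi)
    return None


# Hypothetical inactivation curve.
toy_temps = [60, 65, 70, 75, 80]
toy_residual = [1.0, 0.95, 0.8, 0.3, 0.05]
toy_t50 = t50_by_interpolation(toy_temps, toy_residual)
```

A sigmoidal fit to the whole curve would be a reasonable alternative when the temperature gradient is coarse or the data are noisy.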
We also measured the specific activities of all the Cel48 chimeras at their respective optimal catalytic temperatures (Figs 4 and 5A). The chimeras tend to have specific activities that are similar to or slightly less than the parent enzymes. We did not observe a correlation between Topt and specific activity at that temperature for all of the sampled chimeras (Fig. 5B). However, recombination may have compromised the activities of many of the chimeras. If only the most active enzymes are considered, there does appear to be a correlation between Topt and specific activity (Fig. 5B, dotted line), whereby increasing temperature leads to higher specific activity.
Modeling and predicting the function of chimeric cellulases
As previously demonstrated for fungal CBHI and CBHII cellulases [15, 16], we can use information from a small number of sequences to predict the properties of all the chimeras in the recombination library. To demonstrate this for Cel48, we built predictive models of T50 and Topt based on the sequences and SCHEMA E-values of the 50 functional chimeric cellulases and the three parental enzymes. We modified the simple sequence–stability linear regression model first used by Li et al. to include an additional parameter for second-order SCHEMA contacts in the chimeras (Eqn S1). As shown in Fig. 6A, the thermostability model fits the T50 measurements of all 53 enzymes well (r² = 0.88), and is an improvement over the simpler model that does not include the SCHEMA E-parameter (r² = 0.82), as illustrated in Fig. S5.
With this model, we were able to identify the contribution that each sequence block makes to stability (Fig. 6B). When trained on Topt measurements, the same additive block model also accurately predicts the measured values (Fig. 6C), and the block contributions to optimal catalytic temperature are very similar to their contributions to thermostability (Fig. 6D). These models trained on data from the sample set can be used to predict the T50 and Topt of all the remaining chimeras in the library.
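A minimal Python sketch of an additive block model of this kind is given below. It assumes the form T50 = a0 + (sum of per-block, per-parent terms, with parent 0 as the reference at each block) + aE·E; the exact form of Eqn S1 is not reproduced in the text, so this parameterization is an assumption. The models in this work were trained with MATLAB; the sketch instead solves ordinary least squares in plain Python on a hypothetical two-block toy library.

```python
from itertools import product


def encode(chimera, e_value, n_parents=3):
    """Design-matrix row: intercept, one-hot blocks (parent 0 = reference), E."""
    row = [1.0]
    for parent in chimera:
        row += [1.0 if parent == k else 0.0 for k in range(1, n_parents)]
    row.append(float(e_value))
    return row


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, chimera, e_value):
    """Predicted property for one chimera under the additive model."""
    return sum(w * x for w, x in zip(weights, encode(chimera, e_value)))


# Hypothetical toy library: 2 blocks x 3 parents, made-up E-values and weights.
toy_chimeras = list(product(range(3), repeat=2))
toy_e = [0, 3, 5, 2, 0, 4, 6, 1, 0]
true_w = [60.0, 2.0, 5.0, -1.0, 3.0, -0.5]
rows = [encode(c, e) for c, e in zip(toy_chimeras, toy_e)]
toy_t50 = [sum(w * x for w, x in zip(true_w, r)) for r in rows]

fitted = fit_least_squares(rows, toy_t50)
```

Because the toy data are noise-free, the fitted weights recover the generating ones; with real measurements, the fitted block terms play the role of the block contributions to stability, and `predict` extends them to untested chimeras.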
We wished to construct and test the chimeric cellulases that are predicted to be the most thermostable. Not every chimeric cellulase, however, is functional. To investigate how recombination leads to nonfunctional sequences, we analyzed 28 unique inactive chimeras identified during the activity screen. A chimera was defined as nonfunctional if, upon a five-fold increase in enzyme concentration, from 0.2 to 1 μM, no detectable activity was measured between 45 °C and 80 °C. These nonfunctional cellulases are all soluble proteins of the correct length on an SDS/PAGE gel (data not shown). Using CD, we analyzed 17 of the 28 nonfunctional chimeras at 25 °C, and found that all gave a similar signal to the parent enzymes (Fig. S6), suggesting that nonfunctional chimeras are folded and have a similar secondary structure to functional ones.
Inspired by the success of the additive block models for thermostability and thermoactivity, we took a similar approach to modeling and predicting chimera functional status. We constructed a linear model in which each block contributes independently to whether a chimera is functional or not. As with thermostability, we also included the SCHEMA E-value as a parameter. The output from the model should be a value between 0 and 1 to represent the probability that a chimera is active. To do this, we augmented the output of the linear model by using a linking function, flink, which scales outputs of the model to the required range (Eqn S2). The coefficients for this model can be found by linear regression (Table S2), although, unlike the thermostability model, the block contributions are only additive under the linking function.
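The role of the linking function can be sketched in Python as follows. The sketch assumes that flink is the logistic function, which is consistent with the logistic regression named in Experimental procedures, although Eqn S2 itself is not reproduced in the text. All weights are hypothetical.

```python
import math


def probability_active(chimera, e_value, block_weights, intercept, e_weight):
    """Probability that a chimera is functional under an additive logit model.

    block_weights[b][p] is the contribution of parent p at block b.
    """
    z = intercept + e_weight * e_value
    z += sum(block_weights[block][parent] for block, parent in enumerate(chimera))
    # Assumed link: the logistic function maps any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for a two-block, three-parent toy library.
toy_block_weights = [[0.0, 0.5, -1.0], [0.0, -0.5, 1.0]]

p_low_e = probability_active((1, 2), 10, toy_block_weights, 2.0, -0.1)
p_high_e = probability_active((1, 2), 40, toy_block_weights, 2.0, -0.1)
```

With a negative weight on E, the same block pattern becomes less likely to be active as more contacts are broken, while the block terms remain additive only on the scale of z, before the link is applied.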
We trained the activity model on 81 cellulases (53 active; 28 inactive), and assessed its predictive ability by cross-validating the predictions of functional chimeras with the measurements of functional chimeras. The model successfully predicted the functional status of 88% of the chimeras (Table S3). A low SCHEMA E-value is known to increase the likelihood of a chimera being active, but E alone correctly predicted the functional status of only 77% of these chimeras under the same cross-validated conditions. Running the functionality model on all block combinations, we predict that the library contains more than 3000 unique active Cel48 enzymes.
Using the T50 model trained on the 53 experimentally active sequences in combination with the functionality model, we predicted the 13 most stable enzymes that are also expected to be catalytically active. These were constructed and characterized. Ten of the 13 were active (Table S4); these sequences and their stabilities are reported in Fig. 4. As shown in Fig. 7A, their stabilities closely matched the predictions. Five of these variants were slightly more stable than the most stable parental enzymes. Interestingly, two of the highly stable chimeras also hydrolyzed more cellulose than the most active parental enzyme, CelY-2, both in a 1-h assay (Figs 5 and 7C,D) and in a 48-h assay (Fig. 7B), demonstrating the potential utility of these chimeric enzymes for the construction of designer cellulosomes.
Probing biochemistry with synthetic diversity
With 60 active cellulase chimeras in hand, we next examined the relationship between the optimal temperature for catalytic activity (Topt) and resistance to temperature-induced denaturation (T50) over a broad range of temperatures. These two properties are closely correlated (Fig. 8), indicating that engineering Cel48 enzymes for greater thermostability increases their optimal catalytic temperatures. Some of the chimeric cellulases have a Topt higher than their T50. We believe that this reflects the stabilizing effect of cellulose substrate, because the substrate is present in the Topt assays but not in the denaturation step of the T50 assays. This effect can be seen in Fig. S7, where T50 values in the presence of cellulose are ~ 2 °C higher than in its absence.
The dearth of characterized Cel48 enzymes with different properties is an impediment to their use in designer cellulosomes for specific engineering applications, and inhibits the discovery of sequence–function relationships for these important enzymes. We have used structure-guided protein recombination to expand the diversity of characterized Cel48 enzymes. Using SCHEMA to identify suitable crossover locations for shuffling sequence blocks among the three parent Cel48 catalytic domains, we have generated a large set of novel, active cellulases that have the same architecture and are expressed under the same conditions in the same Escherichia coli host, where they are straightforward to characterize and compare. As expected, we found that properties such as Topt (the ability to remain active at elevated temperature), T50 (the ability to withstand denaturation at high temperature) and the specific activity at Topt vary greatly among these novel enzymes. We also found that functional status, T50 and Topt can be predicted from simple linear models built from sequence–function data from a small sample of the library. This has enabled us to efficiently identify stable chimeras, some of which have high cellulolytic activities.
This set of related enzymes can contribute to our understanding of how sequence affects Cel48 properties. The thermostability model illuminates stabilizing blocks of amino acids, whether they exist in the most stable proteins or not. Two of the most stabilizing blocks are predicted to be from the parent CelS at positions F and G. These blocks are located in the C-terminus of the catalytic domain, close to where the dockerin attaches, which suggests an important stabilizing interaction between these blocks and the C. thermocellum dockerin. When the dockerin binds the cohesin, the linker between the catalytic domain and dockerin is pleated, and this brings the dockerin into close contact with the catalytic domain. A CelS dockerin–cohesin crystal structure would be valuable for identifying specific stabilizing interactions between these two domains.
With this work, we also address another biochemical question with important engineering implications. Using this accessible set of related enzymes, we investigated the correlation between the temperature at which an enzyme is most active and the temperature at which it denatures irreversibly. We found that Cel48 chimeras with greater thermostability also have their activity optima at higher temperatures, and that these temperatures are closely related. In other words, the ability to withstand temperature-induced denaturation at ever-higher temperatures leads to increases in the optimum temperature for activity. It is not necessarily the case that increased structural stability and resistance to denaturation and irreversible inactivation will result in the ability to catalyze the reaction efficiently at higher temperatures, particularly if local instability or dynamics influence catalysis. Among the Cel48 chimeras, however, there is sufficient structural stability in key catalytic regions to make T50 a good surrogate for Topt.
We found that two of the predicted thermostable chimeras had higher specific activities at Topt than the most active parental enzyme, CelY-2. When assayed over a 48-h period, they hydrolyzed twice as much cellulose as CelY-2. These chimeric enzymes, which we have analyzed in a cellulosomal construct, may find potential uses in designer cellulosomes. An important next step will be to determine whether they provide an enhanced cellulolytic capability to a system such as the C. thermocellum cellulosome.
Parental enzyme constructs
Cel48 genes from CelF and CelS were PCR-amplified with Phusion-polymerase from genomic DNA, with primers CTHE312.40 and CTHE2453.40 for CelS, and CCEL786.41 and CCEL2864.41 for CelF, introducing HindIII and SacI sites at the 5′-end, as well as a NotI site at the 3′-end (Table S5). Taq polymerase was used to add A-overhangs for TA-cloning into pGEM-T Easy (Promega, Madison, WI, USA). The resulting plasmids were called pGEMT–CTHEwt and pGEMT–CCELwt. The CelS dockerin was added to the CelF catalytic domain to create the plasmid pGEMT–CCELmut1. These constructs were cloned into pET-22(+) by the use of NdeI and NotI sites.
We designed a synthetic gene for CelY from C. stercorarium on the basis of available sequence information but with restriction sites NdeI, HindIII, BsaXI, PstI and SapI removed. The gene was codon-optimized for expression in E. coli by DNA 2.0 (Doc. S1). The CelY gene was cloned into pET-22(+) by the use of NdeI and NotI restriction sites. The resulting construct was termed pET22b+CSTEwt, and contains the catalytic domain, the DUF, and the CBM. Two more constructs were made from the CelY gene: CelY-1, containing only the catalytic domain and C. thermocellum dockerin, and CelY-2, containing the catalytic domain, the DUF and the C. thermocellum dockerin. Products were cloned into pET-22(+) by the use of NdeI and NotI restriction sites.
An XbaI site was introduced by overlap extension PCR into all parental constructs between the catalytic domain and the dockerin. Introducing an XbaI restriction site between the catalytic domain and the dockerin allowed swapping of catalytic domains and dockerins. The XbaI site did not affect activity (Fig. S2A).
Recombination library design
The SCHEMA library was designed with the tools available on the Arnold group homepage (http://www.che.caltech.edu/groups/fha/). The catalytic domains of CelF, CelY and CelS were aligned, with clustalw, from Tyr40 to Phe661, based on the numbering of CelS. We analyzed all available structures without point mutations of the catalytic domains of CelS and CelF [CelF PDBs – 1F9O, 1FAE, 1FBO, 1FCE, and 1G9G; CelS PDBs – 1L1Y (six chains), and 1L2A (six chains); a total of 17 chains]. Of the 3035 unique residue–residue contacts in all 17 structures, on average 73% are conserved between any CelF structure and CelS structure, as compared with an average of 80% of contacts conserved between any two CelF structures, and an average of 80% of contacts conserved between any two CelS structures. As contacts between structures of the same enzyme vary almost as much as contacts between structures of CelF and CelS, we made use of all 17 available structures in designing the library. The average SCHEMA energy for a library (<E>) was calculated for each structure, and libraries were evaluated on the basis of the average <E> from all 17 structures. Seven crossover sites were chosen with the raspp algorithm, with a minimum fragment size of 30 residues. raspp returned a set of candidate libraries characterized by <E> (the average number of contacts broken within a library for a given structure), ≪E≫ (the average of <E> for a given library across all 17 different structures), and <m> (the average number of amino acid substitutions from the closest parent within a library). Figure S3A shows ≪E≫ as a function of <m>. We removed solutions without a conserved amino acid at the designated crossover sites (Fig. S3B). To obtain libraries with mutations more evenly distributed into blocks, we also calculated the standard deviation of the average number of mutations per block for each library. Lower numbers indicate more evenly distributed blocks.
Figure S3C shows ≪E≫ as a function of the standard deviation of block mutations. From this set, we picked a library that would contain a large number of active enzymes with high sequence diversity: the chosen library has an ≪E≫ of 31.3 and an <m> of 106. Calculated for each of the 17 structures, <E> for the library varies from 28 to 34.
Construction of chimeras
Chimeric genes were assembled from 24 gene fragments, representing the eight blocks from each of the three parents, with the sequence-independent site-directed chimeragenesis method. The following consensus sites were used for the crossover sites: (a) CCG; (b) GCC; (c) GAC; (d) CAT; (e) GGT; (f) AAC; and (g) TTA (Table S6). Mini-libraries were cloned into pGEMT by the use of SpeI and SacII sites. Full libraries were made by isolating large amounts of DNA from plasmids digested with SpeI and SacII, not by PCR amplification. Instead of SapI, the isoschizomer LguI was used. A C. thermocellum dockerin was attached to the C-terminus of each chimeric sequence during reassembly. The genes were expressed in pET-22(+) under the control of an isopropyl thio-β-D-galactoside (IPTG)-inducible T7 promoter in E. coli BL21(DE3). A similar approach was used for construction of the specific chimeras predicted to be thermostable, except that only the specific blocks for the desired chimera were used in the ligation steps.
Quality of library
We completely sequenced 61 randomly chosen chimeras in order to assess the frequency of library construction artefacts, including point mutations, deletions, and insertions. Eighty-nine per cent of the library (54 of 61) contained no amino acid mutations, no insertions, and no deletions. We found one single insertion, and two sequences were missing one-half of the library. Two sequences were back-to-front in the vector, and two sequences contained one remaining tag. Every block from every parent was found in the randomly sequenced chimeras, but CelF block E appears to be underrepresented in the library. The distribution of each block is shown in Table S7.
Protein expression in 96-well plates
In 96-well shallow-well plates, 300 μL of LB medium (10 g of tryptone, 5 g of yeast extract, 10 g of NaCl) containing 100 mg·L−1 ampicillin were inoculated with a single colony of E. coli BL21(DE3) having the cellulase gene on a pET-22(+) plasmid. Plates were grown overnight in an orbital shaker at 37 °C and 250 rpm. In a 96-well deep-well plate, 900 μL of TB medium (12 g of tryptone, 24 g of yeast extract, 4 mL of glycerol, in 1 L of H2O with 17 mM KH2PO4 and 72 mM K2HPO4) containing 100 mg·L−1 ampicillin were inoculated with 50 μL, and grown in an orbital shaker at 37 °C until the D600 nm reached 1.6–1.8. Plates were cooled to < 17 °C, induced with a final concentration of 50 μM IPTG, and grown at 17 °C for 16 h. Cultures were harvested by centrifugation at 5000 g for 10 min, and stored at −20 °C.
Cellulase activity assay in 96-well plates
Cells were resuspended in 300 μL of lysis buffer (10 mM Tris, pH 8.0, 10 mM MgCl2, 0.7 mg·mL−1 lysozyme, 4 U·mL−1 DNase) per well, and incubated for 60 min at 37 °C. Plates were centrifuged for 5 min at 5000 g at 4 °C. From the supernatant, 100 μL was transferred to a 96-well PCR plate with 50 μL of a 10 g·L−1 Avicel suspension in reaction buffer (50 mM succinate, pH 6.0, 1 mM CaCl2) and 0.2 μM purified miniscaffoldin (Fig. S8). Hydrolysis proceeded overnight at both 50 °C and 75 °C. Plates were centrifuged for 3 min at 200 g at 4 °C, and from each well 50 μL of supernatant was transferred to a new plate. The amount of reducing ends was determined with the Park–Johnson assay.
Park–Johnson activity assay
Reagent A comprised 0.5 g·L−1 K3Fe(CN)6 and 0.2 M K2HPO4 (pH 10.6). Reagent B comprised 5.3 g·L−1 Na2CO3 and 0.65 g·L−1 KCN. Reagent C comprised 2.5 g·L−1 FeCl3, 10 g·L−1 poly(vinylpyrrolidone), and 1 M H2SO4. In a 96-well PCR plate, 50 μL of test sample was mixed with 150 μL of a 2 : 1 A/B mixture (i.e. 100 μL of reagent A and 50 μL of reagent B). The plate was sealed, heated to 95 °C for 15 min, and then cooled to 4 °C. From this plate, 180 μL was transferred to a transparent flat-bottomed screening plate containing 90 μL of reagent C. The plate was incubated in the dark for 1–3 min before the A520 nm was measured in a TECAN plate reader. If glucose equivalents were determined, a calibration curve made from solutions of defined glucose concentrations was included on each plate.
Enzymatic glucose activity assay
The β-glucosidase (BG) solution comprised 0.25 g·L−1 almond BG in 50 mM sodium acetate (pH 5.0). The tetramethylbenzidine (TMB) solution comprised 0.8 g·L−1 TMB in double-distilled H2O. The horseradish peroxidase (HRP) solution comprised 0.15 g·L−1 HRP in 50 mM sodium acetate (pH 5.0). The glucose oxidase (GOX) solution comprised 0.1 g·L−1 GOX in 50 mM sodium acetate (pH 5.0). In a transparent flat-bottomed screening plate, 100 μL of test sample was mixed with 50 μL of BG solution. If glucose equivalents were determined, a calibration curve made from solutions of defined glucose concentrations was included on each plate. The plate was sealed, and incubated for 16 h at 37 °C. For development, 50 μL of TMB solution and 20 μL each of HRP solution and GOX solution were added to the plate. After 5 min, the A650 nm was measured in a TECAN plate reader.
Each cellulase was purified from E. coli BL21(DE3), which contains the cellulase gene with a C-terminal His-tag on a pET-22(+) plasmid under the control of an IPTG-inducible promoter. The cells were grown in TB medium (12 g of tryptone, 24 g of yeast extract, 4 mL of glycerol, in 1 L of H2O with 17 mM KH2PO4 and 72 mM K2HPO4) at 37 °C with 100 mg·L−1 ampicillin. Cells were induced with a final concentration of 50 μM IPTG, grown for 16 h at 17 °C, and harvested by centrifugation for 10 min at 5000 g. Pellets were resuspended in buffer A (20 mM Tris, pH 7.4). The solution was lysed by sonication, and centrifuged at 75 000 g for 30 min to sediment cell debris. The supernatant was loaded onto a 1-mL Ni2+–nitrilotriacetic acid His-trap column (GE Healthcare, Little Chalfont, UK), and purified by washing with 1% buffer B (20 mM Tris, pH 7.4, 100 mM NaCl, 300 mM imidazole) for 15 column volumes, followed by a gradient elution (increase to 80% buffer B in 10 column volumes). Cellulase-containing fractions were pooled, and concentrated with protein concentrators with cellulose-free membranes (Vivaproducts, Middleton, MA, USA). Buffer was changed to 10 mM Tris (pH 8.0) by repeated refills. Purified proteins were flash frozen, and stored at −20 °C for up to 3 months. Protein concentration was determined with the Bradford assay, with BSA as the protein standard. Protein purity was determined from SDS/PAGE gels. The amounts of isolated protein were 15–60 mg·L−1 for dockerin-containing constructs and 120 mg·L−1 for CelY.
Thermostability assay (T50 measurements)
For each well of a 96-well PCR plate, 50 μL of a 20 g·L−1 Avicel suspension in reaction buffer (50 mM succinate, pH 6.0, 1 mM CaCl2) was mixed with 25 μL of 0.8 μM miniscaffoldin and spun down for 10 min at 5000 g. In a different PCR plate, 30 μL of 0.8 μM cellulase in reaction buffer were pipetted per well. Plates were incubated for 10 min in a gradient PCR cycler at the indicated temperatures, and then placed on ice. Heat-treated cellulases were transferred (25 μL per well) to the Avicel-containing PCR plate, and the reaction was run for 60 min at the indicated temperature. Plates were spun down for 3 min at 200 g. Then, 50 μL of supernatant was transferred to a new 96-well PCR plate and tested with either the Park–Johnson assay or the enzymatic glucose assay.
Temperature profiles (Topt measurements)
A final concentration of 0.2 μM enzyme or 0.2 μM enzyme plus 0.2 μM miniscaffoldin was added to a preheated suspension of 10 g·L−1 Avicel in reaction buffer (50 mM succinate, pH 6.0, 1 mM CaCl2). The hydrolysis was performed at a range of temperatures for 2 h in duplicate. Samples were spun down for 1 min at 200 g at 4 °C. From each well, 50 μL of the supernatant was transferred to a 96-well PCR plate, and analyzed with either the Park–Johnson assay or the enzymatic glucose assay. The Topt was determined from the temperature profiles of the chimeras.
Forty-eight-hour activity assay
A final concentration of 0.2 μM enzyme plus 0.2 μM miniscaffoldin was added to a preheated suspension of 10 g·L−1 Avicel in reaction buffer (50 mM succinate, pH 6.0, 1 mM CaCl2) at 75 °C. At regular intervals, the Avicel was resuspended, and a sample of the reaction mixture was removed and cooled to 4 °C. Samples were spun for 1 min at 200 g, and 50 μL of a 1 : 10 dilution of the supernatant was analyzed with the Park–Johnson assay. The measurements were performed in triplicate.
CD measurements were carried out with an Aviv Model 62DS spectrometer with 6 μM protein sample. Wavelength scans to determine the ellipticity were carried out at 25 °C.
Regression models for T50 and Topt were trained with MATLAB's ‘regress’ function. The regression model for functionality was trained with L1-regularized logistic regression from the glmnet toolbox for MATLAB [33, 34].
This work was supported by the Department of the Interior through grant D10AP00065 from the Defense Advanced Research Projects Agency to F. H. Arnold. M. A. Smith is supported by a Resnick Sustainability Institute fellowship, A. Rentmeister by a DFG postdoctoral fellowship, and T. Wu by a CIT summer undergraduate research fellowship (SURF).