Thermophilic lignocellulose deconstruction


  • Sara E. Blumer-Schuette,

    1. Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
    2. Bioenergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  • Steven D. Brown,

    1. Bioenergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA
    2. Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  • Kyle B. Sander,

    1. Bioenergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA
    2. Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN, USA
  • Edward A. Bayer,

    1. Department of Biological Chemistry, The Weizmann Institute of Science, Rehovot, Israel
  • Irina Kataeva,

    1. Bioenergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA
    2. Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
  • Jeffrey V. Zurawski,

    1. Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
    2. Bioenergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  • Jonathan M. Conway,

    1. Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
    2. Bioenergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  • Michael W. W. Adams,

    1. Bioenergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA
    2. Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
  • Robert M. Kelly

    Corresponding author
    1. Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
    2. Bioenergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA
    • Correspondence: Robert M. Kelly, Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695-7905, USA. Tel.: +1 919 515 6396; fax: +1 919 515 3465; e-mail:



Thermophilic microorganisms are attractive candidates for conversion of lignocellulose to biofuels because they produce robust, effective, carbohydrate-degrading enzymes and survive under harsh bioprocessing conditions that reflect their natural biotopes. However, no naturally occurring thermophile is known that can convert plant biomass into a liquid biofuel at rates, yields and titers that meet current bioprocessing and economic targets. Meeting those targets requires either metabolically engineering solventogenic thermophiles with additional biomass-deconstruction enzymes or engineering plant biomass degraders to produce a liquid biofuel. Thermostable enzymes from microorganisms isolated from diverse environments can serve as genetic reservoirs for both efforts. Because of the sheer number of enzymes that are required to hydrolyze plant biomass to fermentable oligosaccharides, the latter strategy appears to be the preferred route and thus has received the most attention to date. Thermophilic plant biomass degraders fall into one of two categories: cellulosomal (i.e. multienzyme complexes) and noncellulosomal (i.e. ‘free’ enzyme systems). Plant-biomass-deconstructing thermophilic bacteria from the genera Clostridium (cellulosomal) and Caldicellulosiruptor (noncellulosomal), which have potential as metabolic engineering platforms for producing biofuels, are compared and contrasted from a systems biology perspective.


There has been substantial progress recently in engineering model host microorganisms to produce biofuel molecules ranging from simple alcohols, such as ethanol and butanol (Zhang et al., 2011b), to lipid-based drop-in replacements for aviation fuels, such as biodiesel and biosynthetic paraffinic kerosene (Georgianna & Mayfield, 2012; Larkum et al., 2012). This progress has been based on the strategic incorporation of heterologous pathways, recruited from a biodiverse collection of microorganisms, into the metabolic engineering host. Genome and metagenome sequencing efforts have provided an ever-expanding pool of genes encoding appropriate biocatalysts to make these efforts possible. However, the most difficult challenge to overcome in producing biofuels and industrial chemicals from renewable feedstocks is arguably the economic barrier that is associated with the recalcitrance of lignocellulose (Himmel et al., 2007). As a consequence, chemical, physical, and biological pretreatment is usually factored into the design of a biofuel process, with substantial economic implications (Lynd et al., 2008). However, certain microorganisms are capable of deconstruction of lignocellulosic materials to a significant extent (Himmel et al., 2010). The complexity and heterogeneity of plant biomass are reflected in the diversity of enzymes produced by these organisms that are dedicated to deconstructing plant biomass (Lykidis et al., 2007; Martinez et al., 2008; Barabote et al., 2009; Raman et al., 2009; Berka et al., 2011; Dam et al., 2011; Blumer-Schuette et al., 2012; Suzuki et al., 2012). One can argue that endowing the capacity for lignocellulose degradation that involves perhaps hundreds of genes to a biofuel-producing microorganism lacking this characteristic trait is more of an obstacle than adding metabolic pathways for biofuel or biochemical products to those that are intrinsic plant biomass degraders.

Microorganisms capable of converting plant biomass into fermentable sugars offer the possibility that little or no pretreatment may be required to produce a biofuel or industrial chemical. Recently, a great deal of interest has centered on thermophiles, which degrade lignocellulose by one of two biocatalyst-based strategies: packaging a set of enzymes and substrate-binding modules into an extracellular assembly, referred to as the ‘cellulosome’, or, alternatively, secreting ‘free’ multimodular enzymes that synergistically attack the range of carbohydrates comprising the plant cell wall (Blumer-Schuette et al., 2008; Himmel et al., 2010). The two strategies share some common features but are, to a large extent, distinguishable. In this review, the physiological and biochemical basis for the ‘cellulosomal’ and ‘noncellulosomal’ approaches to lignocellulose deconstruction by thermophiles will be compared and contrasted, with an eye toward the ultimate goal of producing liquid biofuels and industrial chemicals.

Production of biofuels and industrial chemicals by thermophilic microorganisms

Interest in thermophilic microorganisms as a source of enzymes stable under process conditions for industrial biotechnology is not new (Zeikus, 1979; Lamed & Zeikus, 1980; Cowan, 1992; Adams, 1993; Adams et al., 1995; Egorova & Antranikian, 2005). Higher process temperatures minimize the potential for biological contamination, reduce viscosity of substrate and product streams, and increase solubility or bioavailability of substrates, which can increase reaction rates (Egorova & Antranikian, 2005). Enzymes derived from thermophilic microorganisms allow for higher process temperatures and are often tolerant of otherwise harsh process conditions for biological systems, including high pressure and protein-denaturing solvents (Unsworth et al., 2007). Access to thermophilic biocatalysts has been significantly expanded by genomic technology, because homologous, thermophilic versions of technologically important mesophilic enzymes for biomass processing can often be identified from genome sequence databases (Frock & Kelly, 2012). In fact, genomic DNA need not be the direct source of the gene encoding the enzyme of interest, because synthetic versions are readily generated for subsequent production in a suitable recombinant host. Since the publication of the first genome sequence of a hyperthermophilic archaeon, Methanococcus jannaschii (Topt 85 °C) in 1996 (Bult et al., 1996), and the first hyperthermophilic bacterium, Thermotoga maritima (Topt 80 °C) in 1999 (Nelson et al., 1999), there have been well over 230 sequencing projects focused on thermophilic archaea and bacteria (GOLD). However, only a select few of these microorganisms have the metabolic capability to convert plant biomass into fermentable sugars (Demain et al., 2005; Blumer-Schuette et al., 2008).

Biological conversion of lignocellulose to fuel molecules

Biological production of fuels and chemicals from plant biomass involves one of three described processing strategies: separate hydrolysis and (co-)fermentation (SH[c]F), simultaneous saccharification and (co-)fermentation (SS[c]F), or consolidated bioprocessing (CBP; Fig. 1). Multiple vessels are often required for saccharification and fermentation in the SHF schema, thereby incurring capital costs above those for SSF, which combines saccharification and fermentation into one bioreactor. Overall, the economic costs of procuring or producing separate enzymes for SHF and SSF cannot be ignored (Lynd et al., 2008; Klein-Marcuschamer et al., 2012) and are one of the main drivers behind developing commercial technology that combines enzyme production, hydrolysis, and fermentation into one reactor vessel, that is, CBP (Wooley et al., 1999). Other advantages of using thermophiles in SSF or CBP are that the optimal temperatures of the enzymes and fermentative organism are closely matched, boosting the efficiency of saccharification (Shaw et al., 2008; Ou et al., 2009), and that concurrent saccharification and fermentation relieves product inhibition by mono- and disaccharides. Enhanced thermal lability of liberated oligosaccharides is also a benefit of concurrent saccharification and fermentation because liberated oligosaccharides have been shown to better promote growth at very high temperatures (Driskill et al., 1999) and, in the case of Clostridium thermocellum, are assimilated at an energy savings over the assimilation of disaccharides (Zhang & Lynd, 2005). One potential issue with running concurrent hydrolysis and fermentation at elevated temperatures is enzyme inactivation by ethanol (Podkaminer et al., 2011); however, this could be alleviated by engineering ethanol-tolerant enzymes and/or continuous distillation of ethanol from the fermentation broth.
Techno-economic analyses still need to be conducted to predict any cost savings between an aerobic, mesophilic versus anaerobic, thermophilic bioprocessing scheme; however, there are some potential savings in capital investment or operating costs that can be predicted. At first glance, the operating costs of running bioreactors at high temperatures may not be above those at ambient temperatures. Balancing of energy costs to achieve savings during the design of anaerobic, thermophilic production facilities would consider heat integration throughout the facility. Additionally, instead of cooling the bioreactors during cell growth as is the case for mesophilic fermentation, the energy input from cell metabolism and hydrolysis reactions would help to maintain the vessel temperature. Further potential cost savings can be realized from maintaining anoxic bioreactors, which would not require any aeration and thus require lower agitation levels than the aerobic ethanologens. It remains to be seen whether running fermentation at elevated temperatures near the boiling point of ethanol will incur an economic benefit of auto-evaporation (Cysewski & Wilke, 1977; Shabtai et al., 1991; Taylor et al., 1995), given that the titer of ethanol will be low. It is possible that the concentration of ethanol in the microenvironment around the cells may allow for evaporation from the fermentation broth, a scenario that requires further experimental inquiry. Additional capital investment and operating cost savings with bioprocessing can be gained if the same microorganism is able to efficiently deconstruct untreated biomass (Yang et al., 2009b; Kataeva et al., 2013). Significant engineering of the organism with respect to enzyme production and enzyme robustness would be required for saccharification to occur at industrially relevant substrate loadings.
Trading off between oligosaccharide consumption for high titers of enzyme production versus fermentation to ethanol will have to be a consideration when engineering biomass-deconstructing and solventogenic microorganisms.

Figure 1.

Strategies for the production of second-generation biofuels. Biomass typically requires pretreatment to reduce recalcitrance, allowing for enzyme access to crystalline cellulose (Himmel et al., 2007). SHF (in green) is a two-stage process that requires a pretreatment step to reduce recalcitrance after which the solids (cellulose and lignin) will be hydrolyzed separate from the liquid phase (mostly pentoses; Geddes et al., 2011). Dashes around pentose hydrolysis and fermentation indicate that this step is not always included in the SHF strategy. Resulting hydrolysate will then be fermented by hexose- or pentose-fermenting organisms or co-fermented by a single organism (SHcF). SSF is a one-stage process which increases efficiency by alleviating glucose inhibition of cellulases while reducing the volume required to produce comparable amounts of biofuel in comparison with SHF (Ghosh et al., 1982). Further efficiency can be gained using a microorganism capable of co-fermentation of hexoses and pentoses from the hydrolysate (SScF; Margeot et al., 2009). In both SHF and SSF strategies, off- or on-site production of plant-biomass-deconstructing enzymes is required for hydrolysis, incurring an additional economic cost to biofuel production. A process designed to combat the economic costs of separate enzyme production is CBP, which consolidates enzyme production, hydrolysis, and fermentation in a single-stage process (Lynd et al., 2008). Using microorganisms capable of reducing recalcitrance during hydrolysis could potentially allow for the pretreatment step to be skipped in a mature variation on the CBP. In this strategy, a single organism or community of microorganisms will produce plant-biomass-deconstructing enzymes that will release oligosaccharides that are then imported by the CBP microorganism(s) and co-fermented. 
Additionally, a CBP strategy requires that the microorganism(s) used be robust in both their enzymatic and metabolic capabilities, including the absence of carbon catabolite repression (CCR), which allows for co-fermentation of hexoses and pentoses.
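The three strategies in Fig. 1 differ mainly in which steps share a stage and in whether the deconstructing enzymes are produced separately. The sketch below is only an illustrative encoding of the caption; the class, field, and function names are hypothetical, and pretreatment is left out because the caption treats it as typically required in all three cases.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Strategy:
    """Illustrative record of one bioprocessing strategy from Fig. 1."""
    name: str
    separate_enzyme_production: bool      # enzymes made off- or on-site, outside the process
    stages: Tuple[Tuple[str, ...], ...]   # each inner tuple lists steps sharing one stage


STRATEGIES = (
    # SHF: hydrolysis and fermentation run as two separate stages.
    Strategy("SHF", True, (("hydrolysis",), ("fermentation",))),
    # SSF: hydrolysis and fermentation combined in one stage.
    Strategy("SSF", True, (("hydrolysis", "fermentation"),)),
    # CBP: enzyme production is consolidated with the other two steps.
    Strategy("CBP", False, (("enzyme production", "hydrolysis", "fermentation"),)),
)


def steps_per_stage(strategy: Strategy) -> list:
    """Return the number of steps combined in each stage."""
    return [len(stage) for stage in strategy.stages]


def describe(strategy: Strategy) -> str:
    """Render a one-line summary of a strategy."""
    stages = " -> ".join(" + ".join(stage) for stage in strategy.stages)
    source = ("separately produced enzymes"
              if strategy.separate_enzyme_production
              else "enzymes made in situ")
    return f"{strategy.name}: {stages} ({source})"


if __name__ == "__main__":
    for s in STRATEGIES:
        print(describe(s))
```

Read this way, CBP is the only strategy in which no enzymes are supplied from outside the process, which is the economic argument made for it in the text.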

Due to the recalcitrant nature of plant cell walls, pretreatment of feedstock prior to biological or enzymatic deconstruction is often required. Here, we consider pretreatment to include any method that increases the bioavailability of polysaccharides including those through mechanical, chemical or genetic means, or combinations of all three. Strategies for pretreatment will be a function of the biomass feedstock used (Hamelinck et al., 2005). However, for all biomass feedstocks, regardless of the hydrolysis and fermentation process, some amount of mechanical pretreatment will be required. Mechanical pretreatment, or simply breaking biomass down into smaller particles, increases the surface area, thereby promoting adherence of hydrolytic enzymes and microorganisms and access to the substrate. However, mechanical disruption of biomass to fine particles (1–3 mm) remains costly for large-scale production facilities (Wooley et al., 1999). Potential solutions to lessen the costs of mechanical pretreatment include increased enzyme efficiency and specificity allowing for larger particle sizes of plant biomass feedstock.

Pretreatment of plant biomass is an absolute requirement for processes based on either SH(c)F or SS(c)F, because there is no direct microbial attack on the feedstock which can in turn ‘release’ cellulose by hydrolyzing the enveloping hemicellulose. Common to all three bioprocess systems is the need to overcome recalcitrance of plant biomass owing to lignin. Physicochemical pretreatment, for example steam explosion (Grous et al., 1986; Avellar & Glasser, 1998; Mosier et al., 2005) or ammonia fiber expansion (Dale et al., 1996; Balan et al., 2009), is an attractive pretreatment where solubilization of hemicellulose and lignin can be accomplished while also increasing the surface area for subsequent enzymatic attack. Many variations on chemical pretreatment also exist, including the use of an alkali, acid or hot water to partially solubilize lignin and hemicellulose, increasing the availability of cellulose, which in turn increases efficiency of the enzymes used (Kumar et al., 2009). One drawback is that the byproducts of chemical pretreatment, such as phenolics, furans, and acetate, can be inhibitory to fermentative microorganisms (Klinke et al., 2004). However, advances have been made in selecting for inhibitor-tolerant microorganisms, including the extremely thermophilic bacterium, Thermoanaerobacter mathranii (Topt 70–75 °C; Klinke et al., 2001). The development of high-throughput methods to assess the effects of enzyme loading on sugar release could be used to optimize not only SH(c)F or SS(c)F processes, but also assist in the selection of less recalcitrant plant genotypes for bioprocessing that bypass the need for chemical pretreatment (Studer et al., 2010; Studer et al., 2011).

Ideally, a fully developed and mature bioprocessing technology would incorporate an engineered microorganism capable of deconstructing unpretreated biomass solely by virtue of its enzymatic and physiological capabilities. Additionally, use of genetically modified plant biomass feedstocks could benefit such a process (Sticklen, 2007). Genetically modified, and less recalcitrant, switchgrass led to increased ethanol yield concomitant with a reduction in enzyme loading for SSF using yeast, as well as an increase in fermentation metabolites from C. thermocellum (Topt 60 °C), a proposed thermophilic candidate for CBP (Fu et al., 2011b). In general, plant biomass that is genetically modified to be less recalcitrant, such as aspen (Hu et al., 1999; Li et al., 2003), poplar (Coleman et al., 2008; Mansfield et al., 2012), alfalfa (Chen & Dixon, 2007), and switchgrass (Fu et al., 2011a), exhibits higher gains in glucan release, on a per dry weight basis.

Temperature might also be used advantageously to accelerate lignocellulose deconstruction in a thermophilic bioreactor. Thermophilic bioprocessing will act as a mild hot water pretreatment, and in one reported case, it acts in concert with the organism to solubilize all components of plant biomass, including lignin (Kataeva et al., 2013). Another potential use of thermophilic processes is in planta expression of thermostable biomass-degrading enzymes by the plant feedstocks. Here, the biomass feedstock would be brought up to the optimal temperature of the transgenic enzyme(s) to facilitate saccharification prior to or during fermentation (Herbers et al., 1995; Jensen et al., 1996; Borkhardt et al., 2010; Santa-Maria et al., 2011; Zhang et al., 2011a).

When considering potential microbial platforms to engineer for a CBP process, one quandary is the type of organism to start with: primarily fermentative or plant biomass-degrading microorganisms. There are undoubtedly benefits to both strain development strategies (Fig. 2). The ‘recombinant cellulolytic strategy’ (Lynd et al., 2005; la Grange et al., 2010) starts with a species possessing desirable fermentation characteristics, such as the production of ethanol or butanol. Fermentation of sugars to desired products would reduce the amount of metabolic engineering required to efficiently produce a biofuel or chemical. However, the upstream processes, requiring biomass-degrading enzymes, sugar transport, and appropriate metabolic steps, would have to be optimized and introduced into, or modified in, the fermentative organism. As an alternative, the ‘native cellulolytic strategy’ (Lynd et al., 2005; la Grange et al., 2010) starts with an organism that is already equipped for upstream processing, biomass degradation, and carbohydrate transport, requiring minimal engineering of these cellular functions. However, extensive engineering of the downstream processes would be required, including the re-direction of carbon for production of the desired biofuel or chemical. An alternative to both strategies would start with an organism that is completely engineered, ‘top to bottom’, to incorporate both the upstream and downstream processes required for CBP (Bokinsky et al., 2011). While this last strategy appears ‘on paper’ to require more engineering than the other two approaches, it is likely that many challenges will arise in optimizing or diverting native metabolic pathways. By building a wholly synthetic CBP organism using rational design, starting with a ‘minimal’ organism, some of these bottlenecks may be avoided. 
To this end, a chemically synthesized 1.08 Mbp genome that controls its recipient cell has been reported (Gibson et al., 2010), but the design of a synthetic CBP organism will be a nontrivial task, particularly one that can overcome the recalcitrance of plant biomass, as we discuss below.

Figure 2.

Comparison of strain development strategies for CBP for biofuels. Bold text indicates engineered processes. (a) For a native cellulolytic scenario, species used for strain development will require the incorporation of exogenous fuel pathways or engineering of existing fuel pathways. Upstream processes would already be in place including enzymatic machinery to deconstruct C5 and C6 sugars from plant biomass, carbohydrate transporters capable of importing mono- and oligosaccharides, and metabolic pathways to reduce both C5 and C6 sugars to acetate. (b) For a recombinant cellulolytic strategy, organisms will require the incorporation of most upstream processes, including enzymes capable of plant biomass deconstruction, and carbohydrate transporters. Downstream processes including carbohydrate metabolism and fuel pathways (i.e. butanol or ethanol) would already be in place.

Thermophilic microorganisms for a ‘native cellulolytic strategy’

As mentioned above, there are benefits to operating a CBP reactor at higher temperatures. Microorganisms that are capable of surviving and deconstructing biomass at these temperatures (≥ 60 °C) have received considerable interest for their robust enzymes, and also as potential hosts for CBP (Turner et al., 2007; Blumer-Schuette et al., 2008). In this review, we will classify thermophilic biomass degraders as being either ‘cellulosomal’ or ‘noncellulosomal’, based on the presence or absence of the large multienzyme complex, respectively.

The thermophilic cellulosomal bacterium (Topt 60 °C), C. thermocellum, was the first microorganism from which a cellulosome was described (Lamed et al., 1983a). While other thermophilic clostridia, such as C. clariflavum (Topt 55–60 °C; Izquierdo et al., 2012), also use cellulosomes, by far the best characterized is C. thermocellum, on which this review will primarily focus. Deconstruction of biomass by C. thermocellum involves the cellulosome and complementary free-enzyme systems (Vazana et al., 2010), both of which include cellulases, xylanases, and pectinases (Zverlov et al., 2005b). In addition, C. thermocellum produces ethanol as a fermentation product (Freier et al., 1988), indicating that the minimal pathways required for production of this biofuel are present, and these may only require metabolic fine-tuning to increase ethanol production. However, while the ability of C. thermocellum to degrade plant biomass and cellulose, in particular, is promising (Zhang & Lynd, 2005), this bacterium does not metabolize the pentoses that are released from hemicellulose, a major component of the biomass, during deconstruction (Demain et al., 2005). Clostridium thermocellum also lacks xylooligosaccharide transporters (Nataf et al., 2009) and has no identifiable xylose isomerase (Kanehisa et al., 2011), thus limiting the amount of biomass-derived sugars that can ultimately be reduced to alcohols. One common trend among currently isolated thermophilic and cellulosomal members of the genus Clostridium is the lost ability to metabolize pentoses (Kato et al., 2004; Izquierdo et al., 2012). Clearly, further metabolic engineering of this organism will be required to insert genes for import and conversion of pentoses and for subsequent optimization of ethanol production. Toward this goal, progress has been made in developing a tractable genetic system for C. thermocellum, including the development of both a transformation system based on electroporation (Tyurin et al., 2004; Tyurin et al., 2005) and a classical pyrF selection system for genetic modification (Tripathi et al., 2010). Furthermore, the framework for metabolomics has been established; a near full accounting of the carbon balance for C. thermocellum ATCC 27405 fermenting microcrystalline cellulose and cellobiose has been reported (Ellis et al., 2012), and the development of a genome-scale metabolic model has been described (Roberts et al., 2010).

In contrast to the cellulosomal paradigm, members of the extremely thermophilic genus (Topt 65–78 °C) Caldicellulosiruptor use complex biocatalysts that are secreted to deconstruct plant biomass (Bolshakova et al., 1994; van de Werken et al., 2008; Dam et al., 2011; Lochner et al., 2011a). One hallmark of this genus is the secretion of multimodular enzymes with a variety of catalytic activities (Gibbs et al., 2000; VanFossen et al., 2011; Blumer-Schuette et al., 2012). Members of the genus Caldicellulosiruptor are metabolically diverse and capable of hydrolyzing and co-metabolizing hexoses and pentoses (Kádár et al., 2004; van de Werken et al., 2008; VanFossen et al., 2009). Fermentation products are minimal: typically molecular hydrogen, acetate, and lactate, in addition to trace amounts of ethanol (Svetlichnyi et al., 1990; Rainey et al., 1994; Mladenovska et al., 1995; Huang et al., 1998; Bredholt et al., 1999; Onyenwoke, 2006; Miroshnichenko et al., 2008; Hamilton-Brehm et al., 2010). Hydrogen production can approach the theoretical limit for anaerobic hydrogenesis of 4 mol H2/mol glucose (van Niel et al., 2002; Kádár et al., 2004; de Vrije et al., 2007; Zeidan & van Niel, 2009; Zeidan & van Niel, 2010), hinting at a metabolic potential that could be engineered into other desirable end products. Recently, a pyrF-based genetic system for C. bescii has been reported, and expansion of genetics for this bacterium and other species in this genus will facilitate the inclusion of liquid fuel and organic chemical pathways into the genome for development as a CBP platform (Chung et al., 2012, 2013a,b,c). Initial efforts into metabolic engineering of the genus demonstrated a successful deletion of the lactate dehydrogenase gene in C. bescii (Cha et al., 2013).
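The 4 mol H2/mol glucose limit quoted above follows from the standard stoichiometry of fermenting glucose entirely to acetate. The worked balance below is a simplified sketch that assumes acetate and lactate are the only carbon sinks; the symbol f (the fraction of glucose routed to acetate) is introduced here for illustration and does not appear in the original.

```latex
% All glucose fermented to acetate: the maximum hydrogen yield
\mathrm{C_6H_{12}O_6 + 2\,H_2O \;\longrightarrow\; 2\,CH_3COOH + 2\,CO_2 + 4\,H_2}

% Glucose diverted to lactate yields no hydrogen
\mathrm{C_6H_{12}O_6 \;\longrightarrow\; 2\,CH_3CH(OH)COOH}

% If a fraction f of the glucose follows the acetate route (0 \le f \le 1),
% the hydrogen yield per mole of glucose is
Y_{\mathrm{H_2}} = 4f \quad \text{mol H}_2\ \text{per mol glucose}
```

Under these assumptions, any carbon that ends up as lactate lowers the hydrogen yield proportionally, which is one way to read the interest in deleting the lactate dehydrogenase gene.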

Other potential, but not well-studied, thermophilic targets for the native cellulolytic strategy also exist. One such group includes thermophilic species from the genus Geobacillus that have been enriched on microcrystalline cellulose or composted plant biomass (Ng et al., 2009; Rastogi et al., 2009; Assareh et al., 2012). One strain of Geobacillus (Topt 70 °C), isolated from a gold mine, was demonstrated to ferment pretreated corn stover and cord grass to ethanol (Zambare et al., 2011). It remains to be seen whether genetic manipulation of geobacilli can establish these bacteria as competitive CBP microorganisms.

Potential CBP platforms also include the aerobic, thermophilic actinomycetes: Acidothermus cellulolyticus (Topt 55 °C; Barabote et al., 2009) and Thermobifida fusca (Topt 50 °C; Lykidis et al., 2007). These actinomycetes are of interest for their thermophilic enzymes, which often are stable at temperatures above their optimal growth temperatures (Tucker et al., 1989; Wilson, 1992). Beyond being a promising source of enzymes for biomass deconstruction, the capability of A. cellulolyticus to grow in solid-state fermentation, using hot-water-extracted switchgrass, was demonstrated (VanderGheynst et al., 2010). However, the measurement of fermentation end products, beyond CO2, was not reported. Clearly, the enzymatic inventory of A. cellulolyticus is promising for CBP. Recently, T. fusca was engineered to produce 1-propanol and was shown to ferment untreated plant biomass (switchgrass and corn stover) to this higher alcohol (Deng & Fong, 2011), indicating that this bacterium also has promise for CBP.

Other candidate CBP platforms will likely emerge in the next few years. As the cost of genome sequencing continues to drop, more of the cellulolytic thermophiles initially isolated and characterized in the pregenomics era will be sequenced and, in some cases, capture attention as prospects for bioprocessing. Furthermore, as an alternative to the traditional ‘isolate and sequence’ strategy, targeted metagenomic screening may also yield new genomes or enzyme systems for metabolic engineering. Screening for biomass-degrading microorganisms in a ruminant fiber-adhered metagenome identified novel carbohydrate-active enzymes with low sequence identity to other proteins in GenBank (Hess et al., 2011). Further characterization of these microorganisms is warranted to broaden the search for superior candidates for CBP platforms, while at the same time expanding our understanding of how microorganisms interact with and deconstruct plant biomass.

Thermophilic microorganisms for a ‘recombinant cellulolytic’ or co-culture strategy

Thermophilic microorganisms capable of ‘solventogenesis’ are required for a thermophilic ‘recombinant cellulolytic’ strategy, where plant biomass-degrading enzymes would be engineered into a suitable host to enable liquid fuel or chemical production. Ethanol production by thermophilic members of the genera Caldanaerobacter (Xue et al., 2001; Soboh et al., 2004), Clostridium (Demain et al., 2005), Thermoanaerobacterium (Shaw et al., 2008), and Thermoanaerobacter (Lamed & Zeikus, 1980; Hemme et al., 2011) has raised interest for potential CBP schema. While most of these ethanologenic species, with the exception of C. thermocellum, are not cellulolytic, many Thermoanaerobacter and Thermoanaerobacterium species do possess xylanolytic enzymes and/or the capacity to metabolize C5, in addition to C6, sugars (Lacis & Lawford, 1991; Onyenwoke & Wiegel, 2009; Hemme et al., 2011). A superior candidate for the recombinant cellulolytic strategy would require ethanol tolerance above a concentration of 40 g L−1 for the conversion of plant biomass to a liquid fuel to be economically viable for recovery (Lynd, 1996). Wild-type ethanologens, such as C. thermocellum, often tolerate < 16 g L−1 (Herrero & Gomez, 1980), but adapted and engineered strains exhibiting stable growth at 40 g L−1 ethanol have been reported (Williams et al., 2007; Brown et al., 2011). Furthermore, adapted strains have been reported to survive up to 80 g L−1 ethanol, albeit with minimal growth (Williams et al., 2006). Development of ethanol tolerance in Thermoanaerobacter species is also of interest, with wild-type T. mathranii subsp. mathranii A3 (Larsen et al., 1997) and Thermoanaerobacter strain A10 (Georgieva et al., 2007b) tolerating 40 and 37.6 g L−1 ethanol, respectively. An adapted ethanol-tolerant Thermoanaerobacter ethanolicus strain (39A-H8) survived in up to 64 g L−1 of ethanol (Burdette et al., 2002), and an engineered T. mathranii BG1 strain with a disrupted lactate dehydrogenase gene (BG1L1; Mikkelsen & Ahring, 2007) was also adapted to be able to ferment xylose in the presence of up to 66.4 g L−1 ethanol (Georgieva et al., 2007a). The mechanisms by which these bacteria tolerate ethanol are also important for metabolic engineering strategies in ethanol-sensitive microorganisms (see section ‘Ethanol tolerance’ below).
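The tolerance figures quoted above are easier to compare when set against the 40 g L−1 recovery target. The sketch below simply tabulates the values given in the text; the dictionary labels and function names are hypothetical shorthand, the wild-type C. thermocellum entry uses the reported upper bound (< 16 g L−1), and treating exactly 40 g L−1 as meeting the target is an assumption made here for illustration. The numbers also mix different endpoints (stable growth, survival, xylose fermentation), so the comparison is only indicative.

```python
# Ethanol concentrations (g/L) as quoted in the text; labels are shorthand.
TARGET_G_PER_L = 40.0  # recovery threshold attributed to Lynd (1996) in the text

reported_tolerance = {
    "C. thermocellum wild type (upper bound)": 16.0,
    "C. thermocellum adapted/engineered (stable growth)": 40.0,
    "C. thermocellum adapted (survival, minimal growth)": 80.0,
    "T. mathranii subsp. mathranii A3 wild type": 40.0,
    "Thermoanaerobacter strain A10 wild type": 37.6,
    "T. ethanolicus 39A-H8 adapted (survival)": 64.0,
    "T. mathranii BG1L1 adapted (xylose fermentation)": 66.4,
}


def margin_to_target(tolerance_g_per_l, target=TARGET_G_PER_L):
    """Difference between a reported tolerance and the target, in g/L."""
    return round(tolerance_g_per_l - target, 1)


def strains_at_or_above(data, target=TARGET_G_PER_L):
    """Names of entries whose reported value is at or above the target.

    Assumption: a value equal to the target counts as meeting it.
    """
    return sorted(name for name, value in data.items() if value >= target)


if __name__ == "__main__":
    for name, value in reported_tolerance.items():
        print(f"{name}: {value} g/L (margin {margin_to_target(value):+.1f})")
```

Only the two wild-type entries below the threshold (C. thermocellum and strain A10) fall short in this tabulation; the adapted and engineered strains all sit at or above it.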

Crucial to the development of ethanologenic strains for CBP is the availability of their genome sequences (Bao et al., 2002; Hemme et al., 2010), including for those species that are also xylanolytic (Shaw et al., 2008; Hemme et al., 2010) or cellulosomal (Hemme et al., 2010; Feinberg et al., 2011; Brown et al., 2012). At the same time, the availability and efficacy of genetic systems will likely be a major limiting factor in strain development at industrial scale. With the recent advances in the genetic system of C. thermocellum (Tripathi et al., 2010), metabolic engineering for improved ethanol yield has proceeded (Argyros et al., 2011). Initial genetic manipulation of Thermoanaerobacterium saccharolyticum (Topt ~ 55 °C) to reduce organic acid production consequently increased ethanol production (Shaw et al., 2008). This was followed by the discovery of natural competence in T. saccharolyticum, which hastened advances in genetic manipulation techniques (Shaw et al., 2010), leading to the creation of markerless deletions to increase ethanol titer (Shaw et al., 2011).

As wild-type C. thermocellum lacks the capacity to import C5 sugars (Nataf et al., 2009) and hence does not ferment xylose (Ng et al., 1977), it is a natural complement to fermentative, C5-metabolizing, thermophilic genera such as Thermoanaerobacter and Thermoanaerobacterium (Demain et al., 2005). In fact, some strains of C. thermocellum, including the original culture by McBee (1954) and strain C. thermocellum JW20 (ATCC 31549), were later determined to be co-cultures with other bacteria. In the case of C. thermocellum JW20 (ATCC 31549), the co-culture included a new strain of Thermoanaerobacter pseudethanolicus, similar to T. pseudethanolicus 39E (ATCC 33223; Erbeznik et al., 1997). This helped resolve the disputed observations of C. thermocellum growing on xylose (Freier et al., 1988). Laboratory co-cultures of C. thermocellum and Thermoanaerobacter strains have included those with T. ethanolicus (Wiegel, 1980; Wiegel & Ljungdahl, 1981), T. pseudethanolicus and Thermoanaerobacter sp. (strain X514; He et al., 2011; Hemme et al., 2011), and T. thermohydrosulfuricus (Saddler & Chan, 1984). Interestingly, the supplementation of vitamin B12 in co-cultures of C. thermocellum and T. pseudethanolicus increased ethanol titer (He et al., 2011; Hemme et al., 2011) and also partially explained the stimulatory effect of yeast extract on ethanol titer (He et al., 2011). Co-cultures of C. thermocellum and Thermoanaerobacterium species were also successful in the production of ethanol (Table 1). Creation of Thermoanaerobacterium thermosaccharolyticum strains with low organic acid production through random mutagenesis (Lynd et al., 2001) increased ethanol titer in co-culture (Wang et al., 1983; Venkateswaran & Demain, 1986). However, the highest ethanol titer was achieved through metabolic engineering of both C. thermocellum and T. saccharolyticum (Argyros et al., 2011; Table 1).

Table 1. Cellulosomal-ethanologenic co-cultures for liquid fuel production
Co-culture partner | C. thermocellum strain | Substrate(s) | Ethanol titer (g L−1) | References
T. ethanolicus ATCC 31550 | ATCC 31549 | Cellulose | 3.2 | Ljungdahl & Wiegel (1981) and Wiegel & Ljungdahl (1981)
T. pseudethanolicus ATCC 33223 | ATCC 35609 | Cellulose | 4.5a | Ng et al. (1981)
T. pseudethanolicus ATCC 33223 | ATCC 35609 | Cellulose | 6.6a | He et al. (2011) and Hemme et al. (2011)
T. thermohydrosulfuricus DSM 2247 | DSM 1313 | Biomass, cellulose | < 3 | Saddler & Chan (1984)
Thermoanaerobacter sp. str. X514 | ATCC 35609 | Cellulose | 12.2a | He et al. (2011) and Hemme et al. (2011)
T. saccharolyticum JW/SL-YS485 | DSM 1313 | Cellulose | < 3 | Argyros et al. (2011)
T. saccharolyticum ALK2 | M1570 | Cellulose | 38.1 | Argyros et al. (2011)
T. thermosaccharolyticum ATCC 31925 | DSM 1313 | Biomass, cellulose | < 3 | Saddler & Chan (1984)
T. thermosaccharolyticum ATCC 31960 | ATCC 27405 | Cellulose | 13.8 | Venkateswaran & Demain (1986)
T. thermosaccharolyticum ATCC 31960 | ATCC 31924 | Cellulose | 16 | Venkateswaran & Demain (1986)
T. thermosaccharolyticum ATCC 31925 | ATCC 31924 | Biomass, cellulose | 9.7, 25.3 | Wang et al. (1983)

a Reported as millimolar, converted to grams per litre using a molecular weight of 46.07 g mol−1 for ethanol.
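
The unit conversion described in the footnote to Table 1 can be sketched as follows. This is a minimal illustration rather than the procedure used for the table: the function names and the 100 mM example value are hypothetical, and only the molecular weight of 46.07 g mol−1 is taken from the footnote.

```python
# Molecular weight of ethanol as given in the footnote to Table 1.
ETHANOL_MW_G_PER_MOL = 46.07


def millimolar_to_g_per_l(conc_mm: float,
                          mw_g_per_mol: float = ETHANOL_MW_G_PER_MOL) -> float:
    """Convert a concentration in mmol per litre to grams per litre."""
    return conc_mm / 1000.0 * mw_g_per_mol


def g_per_l_to_millimolar(conc_g_per_l: float,
                          mw_g_per_mol: float = ETHANOL_MW_G_PER_MOL) -> float:
    """Convert a concentration in grams per litre back to mmol per litre."""
    return conc_g_per_l / mw_g_per_mol * 1000.0


if __name__ == "__main__":
    example_mm = 100.0  # hypothetical titer, not a value from the cited studies
    print(f"{example_mm} mM ethanol = {millimolar_to_g_per_l(example_mm):.3f} g/L")
```

The inverse function is included only so that titers listed in g L−1 can be compared with values reported in millimolar in the original studies.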

Fermentation of plant biomass to butanol is gaining attention as a ‘drop-in’ liquid fuel (Green, 2011). Only a few thermophilic strains capable of fermenting sugars to butanol have been described. A strain of T. thermosaccharolyticum (DSM 571) was reported to produce both ethanol and butanol through fermentation, although the ratio of ethanol to butanol approached 2 : 1 (Freier-Schröder et al., 1989). A Thermoanaerobacterium sp. strain 260 (DSM 21660) that directly ferments glycerol to butanol has been reported, although it remains to be seen whether a dedicated butanol pathway is responsible or whether the observed production of butanol is due to promiscuous enzyme activity (Mladenovska & Dabrowski, 2010). A Geobacillus/Ureibacillus strain (Green Biologics Ltd.) was engineered to produce butanol through conversion of acetyl-CoA using a pathway constructed of thermophilic enzymes (Green et al., 2011). Whether native thermophilic butanol pathways occur in these organisms is not clear. However, with renewed interest in ‘bio-butanol’ (Cascone, 2008; Green, 2011) and rapid genome sequencing capabilities, it may be just a matter of time before native thermophilic butanol producers are identified and characterized.

As an alternative to converting sugars to butanol in the same microorganism, sequential co-cultures of C. thermocellum with mesophilic butanol-producing clostridia, using C. thermocellum to hydrolyze cellulose, have been developed (Nakayama et al., 2011). The mesophilic, acetone–butanol–ethanol (ABE) fermenting C. saccharoperbutylacetonicum was able to convert the released cellooligosaccharides to butanol without the exogenous addition of butyrate, unlike other ABE mesophiles such as C. acetobutylicum and C. beijerinckii (Nakayama et al., 2011). As sequential co-cultures of thermophiles and mesophiles are at best an extension of SSF schemes, and likely to be economically infeasible, the discovery and engineering of thermophilic butanol pathways as part of a recombinant cellulolytic strategy for CBP would be logical. It is curious that there are so few examples as yet of wild-type, thermophilic microorganisms producing butanol. It may be that at higher temperatures it is thermodynamically more favorable to produce hydrogen or ethanol or, more likely, that we simply have not looked hard enough for this metabolic feature in high-temperature biotopes.

Thermophiles as genetic reservoirs of enzymes for lignocellulose deconstruction

Whether a ‘native cellulolytic’ or ‘recombinant cellulolytic’ strategy is considered, inserting additional enzymes into an already established CBP microorganism can either expand the spectrum of polysaccharides hydrolyzed from plant biomass or incorporate more efficient enzymes for polysaccharide hydrolysis. Many bacteria, while not ideal CBP platforms themselves, can serve as genetic reservoirs for thermophilic, carbohydrate-active enzymes (Tucker et al., 1989; Bergquist et al., 2002; Wilson, 2004; Conners et al., 2006; Blumer-Schuette et al., 2008). Such organisms, with optimal growth temperatures close to desired process temperatures, often produce enzymes that are optimally active at or above those temperatures (Table 2).

Table 2. Characterized thermophilic enzymes for potential inclusion into engineered CBP organisms
Locus tag | Activity | Modules | Species | Topt (°C) | References
Dtur_0462 | β-Glucosidase | GH1 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
DICTH_1689 | β-Glucosidase | GH1 | D. thermophilum DSM 3960 | 90 | Zou et al. (2012)
CTN_0782 (GghA) | β-Glucosidase | GH1 | T. neapolitana | 95 | Yernool et al. (2000)
Tbis_0839 | β-Glucosidase | GH1 | T. bispora | 60 | Wright et al. (1992)
Tfu_0937 (BglC) | β-Glucosidase | GH1 | T. fusca YX | 60 | Spiridonov & Wilson (2001)
Dtur_0219 | β-Glucosidase | GH3 | D. turgidum DSM 6724 | 85 | Kim et al. (2011b)
CTN_0670 (BglB) | β-Glucosidase | GH3 | T. maritima | 90 | Zverlov et al. (1997b)
Acel_0614 (E1) | Endo-1,4-β-glucanase | GH5, CBM2 | A. cellulolyticus | 83 | Tucker et al. (1989)
(EBI-244) | Endo-1,4-β-glucanase | GH5 | Archaeon EBI-244 | 109 | Graham et al. (2011)
Dtur_0276 | Endo-1,4-β-glucanase | GH5 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
Dtur_0670 | Endo-1,4-β-glucanase | GH5 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
Dtur_0671 | Endo-1,4-β-glucanase | GH5 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
Tfu0901 (E5) | Endo-1,4-β-glucanase | CBM2, GH5 | T. fusca YX | N/A | Lao et al. (1991)
Tfu2712 | Endo-1,4-β-glucanase | GH5, CBM3 | T. fusca YX | 77 | Posta et al. (2004)
TM1751 | Endo-1,4-β-glucanase | GH5 | T. maritima | 80 | Chhabra et al. (2002) and Bharadwaj et al. (2010)
Acel_0615* (GuxA) | Endo-1,4-β-glucanase | GH6, CBM3, GH12, CBM2 | A. cellulolyticus | 62 | Ding et al. (2003)
Tfu0620 (E3) | Exo-1,4-β-glucanase | GH6, CBM2 | T. fusca YX | 55 | Lao et al. (1991) and Zhang et al. (1995)
Tfu1074 (E2) | Endo-1,4-β-glucanase | GH6, CBM2 | T. fusca YX | 72 | Lao et al. (1991) and Beadle et al. (1999)
Aaci_2475 | Endo-1,4-β-glucanase | GH9 | A. acidocaldarius DSM 446 | 70 | Eckert et al. (2002)
Tfu1627 (E1) | Endo-1,4-β-glucanase | CBM4, GH9, CBM2 | T. fusca YX | N/A | Jung et al. (1993)
Tfu2176 (E4) | Processive endo | GH9, CBM3, CBM2 | T. fusca YX | N/A | Lao et al. (1991) and Jung et al. (1993)
Acel_0615* (GuxA) | Endo-1,4-β-glucanase | GH6, CBM3, GH12, CBM2 | A. cellulolyticus | 62 | Ding et al. (2003)
Acel_0619 | Endo-1,4-β-glucanase | GH12, CBM2 | A. cellulolyticus | N/A | Linger et al. (2010)
Rmar_1627 | Endo-1,4-β-glucanase | GH12 | R. marinus DSM 4252 | 90–100 | Halldórsdóttir et al. (1998)
TM1524 | Endo-1,4-β-glucanase | GH12 | T. maritima | 90 | Liebl et al. (1996)
TM1525 | Endo-1,4-β-glucanase | GH12 | T. maritima | 85 | Liebl et al. (1996)
TM0024 | Endo-1,3-β-glucanase | CBM4, GH16, CBM4 | T. maritima | N/A | Bronnenmeier et al. (1995)
CTN_0671 (Lam16A) | Endo-1,3-β-glucanase | CBM4, GH16, CBM4 | T. neapolitana | 95 | Zverlov et al. (1997a)
Tpet_0899 (TpLam) | Endo-1,3-β-glucanase | CBM4, GH16, CBM4 | T. petrophila | 91 | Cota et al. (2011)
Acel_0617 | Exo-1,4-β-glucanase | CBM3, GH48, CBM2 | A. cellulolyticus | N/A | Ding et al. (2003)
Tfu1959 | Exo-1,4-β-glucanase | CBM2, GH48 | T. fusca YX | 50 | Irwin et al. (2000) and Kostylev & Wilson (2011)
Aaci_0048 | Endo-1,4-β-glucanase | GH51 | A. acidocaldarius DSM 446 | 80 | Eckert & Schneider (2003)
Acel_0618 | Exo-1,4-β-glucanase | GH74, CBM3, CBM2 | A. cellulolyticus | N/A | Ding et al. (2003)
TM0305 | Endo-1,4-β-glucanase | GH74 | T. maritima | 90 | Chhabra & Kelly (2002)
Tfu1612 | Endo-1,4-β-glucanase | GH74, CBM2 | T. fusca YX | 60 | Irwin et al. (2003)
Tfu2130 | Endo-1,3-β-glucanase | GH81 | T. fusca YX | 50 | McGrath & Wilson (2006)
CTN_0783 (CbpA) | Cellobiose phosphorylase | GH94 | T. neapolitana | 85 | Yernool et al. (2000)
Dtur_0852 | β-Xylosidase | GH3 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
TM0076 | β-Xylosidase | GH3 | T. maritima | 90 | Xue & Shao (2005)
Acel_0372 | Endo-1,4-β-xylanase | GH10 | A. cellulolyticus | 90 | Barabote et al. (2010)
(XynA) | Endo-1,4-β-xylanase | GH10 | D. thermophilum Rt46B.1 | 85 | Gibbs et al. (1995) and Borkhardt et al. (2010)
Dtur_1647 | Endo-1,4-β-xylanase & Endo-1,4-β-glucanase | GH10 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
Dtur_1715 | Endo-1,4-β-xylanase | CBM22, CBM22, GH10, CBM9, CBM9 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
Rmar_0119 (Xyn1) | Endo-1,4-β-xylanase | CBM4, CBM4, GH10 | R. marinus DSM 4252 | 80 | Karlsson et al. (2004)
Tfu2791 (10B) | Endo-1,4-β-xylanase | GH10 | T. fusca YX | 62 | Kim et al. (2004)
TM0070 (XynB) | Endo-1,4-β-xylanase | GH10 | T. maritima | 90 | Winterhalter et al. (1995) and Zhengqiang et al. (2001)
TM0061 (XynA) | Endo-1,4-β-xylanase | CBM22, CBM22, GH10, CBM9, CBM9 | T. maritima | 90 | Winterhalter et al. (1995)
(XynB) | Endo-1,4-β-xylanase | GH11, CBM36 | D. thermophilum Rt46B.1 | 85 | Morris et al. (1998), McCarthy et al. (2000) and Borkhardt et al. (2010)
Tfu1213 (XynA) | Endo-1,4-β-xylanase | GH11, CBM2 | T. fusca YX | 75 | Irwin et al. (1994)
Tfu1616 | β-Xylosidase | GH43 | T. fusca YX | 55–60 | Moraïs et al. (2012)
TM1624 | β-Mannosidase | GH2 | T. maritima | 87 | Parker et al. (2001b)
CTN_0834 (Man2) | β-Mannosidase | GH2 | T. neapolitana | 92 | Parker et al. (2001a)
Tfu0915 | β-Mannosidase | GH2 | T. fusca YX | 53 | Béki et al. (2003)
Acel_0616 (ManA) | β-Mannanase | GH5 | A. cellulolyticus | | Ding et al. (2006)
Tfu0900 | β-Mannanase | GH5, CBM2 | T. fusca YX | 80 | Hilge et al. (1998)
TM1227 | β-Mannanase | GH5, CBM27 | T. maritima | 90 | Parker et al. (2001b) and Chhabra et al. (2002)
Tpet_1542 (TpMan) | β-Mannanase | GH5, CBM27 | T. petrophila | 81–93 | Santos et al. (2012)
CTN_1345 (Man5) | β-Mannanase | GH5, CBM27 | T. neapolitana | 90–92 | Duffaud et al. (1997) and Parker et al. (2001b)
(ManA) | β-Mannanase | CBM35, GH26 | D. thermophilum Rt46B.1 | 80 | Gibbs et al. (1999)
Rmar_0016 (ManA) | β-Mannanase | CBM35, GH26 | R. marinus DSM 4252 | 85 | Politz et al. (2000)
AaManA | β-Mannanase | GH113 | A. acidocaldarius Tc-12-31 | 65 | Zhang et al. (2008)
Side-chain degrading
Dtur_1799 | β-Galactosidase | GH1 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
Aaci_1895 (GlyB) | β-Galactosidase | GH1 | A. acidocaldarius DSM 446 | 85 | Di Lauro et al. (2006)
TM0434 | α-Glucuronidase | GH4 | T. maritima | 50 | Suresh et al. (2002)
TM0752 | α-Glucuronidase | GH4 | T. maritima | 80 | Suresh et al. (2002)
TM0306 | α-L-Fucosidase | GH29 | T. maritima | N/A | Tarling et al. (2003)
Dtur_1670 | α-Galactosidase | GH36 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
TM1192 | α-Galactosidase | GH36 | T. maritima | 90–95 | Liebl et al. (1998)
TM1851 | α-Mannosidase | GH38 | T. maritima | 80 | Nakajima et al. (2003)
Aaci_2891 (LacB) | β-Galactosidase | GH42 | A. acidocaldarius DSM 446 | 65–70 | Di Lauro et al. (2008) and Yuan et al. (2008)
Dtur_0505 | β-Galactosidase | GH42 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
TM1195 | β-Galactosidase | GH42 | T. maritima | N/A | Moore et al. (1994)
TM0281 | α-L-Arabinofuranosidase | GH51 | T. maritima | 90 | Miyazaki (2005)
Tpet_0631 | α-L-Arabinofuranosidase | GH51 | T. petrophila | 65 | Souza et al. (2011)
TM0055 | α-Glucuronidase | GH67 | T. maritima | 85 | Ruile et al. (1997)
Dtur_1714 | α-Glucuronidase | GH67 | D. turgidum DSM 6724 | N/A | Brumm et al. (2011)
TM0077 | Acetyl esterase | CE7 | T. maritima | 100 | Hedge et al. (2012) and Levisson et al. (2012)
TM0437 (PelB) | Exopolygalacturonase | GH28 | T. maritima | 80 | Parisot et al. (2003) and Kluskens et al. (2005)
TM1201 | Endo-β-1,4-galactanase | GH53 | T. maritima | 90 | Yang et al. (2006)
TM0433 (PelA) | Pectin lyase | PL1 | T. maritima | 90 | Kluskens et al. (2003)
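
One way to use Table 2 when shortlisting enzymes for a CBP host is to keep only those entries whose reported temperature optimum is at or above the intended process temperature. The sketch below illustrates that idea under stated assumptions: the 60 °C process temperature is hypothetical, the function names are invented for illustration, only a handful of rows are copied from Table 2, and a reported range is treated conservatively by taking its lower bound.

```python
import re
from typing import List, Optional, Tuple

# (locus tag, activity, Topt string) for a few rows copied from Table 2.
TABLE2_SUBSET: List[Tuple[str, str, str]] = [
    ("Dtur_0462", "β-Glucosidase", "N/A"),
    ("CTN_0782 (GghA)", "β-Glucosidase", "95"),
    ("Tfu_0937 (BglC)", "β-Glucosidase", "60"),
    ("Rmar_1627", "Endo-1,4-β-glucanase", "90–100"),
    ("Tfu1959", "Exo-1,4-β-glucanase", "50"),
    ("Tfu1616", "β-Xylosidase", "55–60"),
]


def lower_topt(topt: str) -> Optional[float]:
    """Return the lower bound of a Topt entry, or None if no value is reported."""
    numbers = re.findall(r"\d+(?:\.\d+)?", topt)
    return float(numbers[0]) if numbers else None


def enzymes_at_or_above(entries: List[Tuple[str, str, str]],
                        process_temp_c: float) -> List[str]:
    """List locus tags whose (lower-bound) Topt meets the process temperature."""
    selected = []
    for locus, _activity, topt in entries:
        low = lower_topt(topt)
        if low is not None and low >= process_temp_c:
            selected.append(locus)
    return selected


if __name__ == "__main__":
    # 60 °C is a hypothetical process temperature chosen for illustration.
    for locus in enzymes_at_or_above(TABLE2_SUBSET, 60.0):
        print(locus)
```

Entries marked N/A are skipped rather than assumed to qualify, since no optimum was reported for them.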

Soil environments – the thermophilic actinomycetes

Thermostable enzymes from a number of moderately thermophilic actinomycetes, most notably A. cellulolyticus and T. fusca (formerly ‘Thermomonospora fusca’), have been characterized (Tucker et al., 1989; Baker et al., 1994; Wilson, 2004). Genome sequences are available for all four thermophilic actinomycetes discussed here (Lykidis et al., 2007; Barabote et al., 2010; Liolios et al., 2010; Chertkov et al., 2011), thereby facilitating the identification of thermostable enzymes for plant biomass deconstruction. Currently, the only enzymes from Thermobispora bispora or Thermomonospora curvata that have been biochemically characterized are a β-glucosidase (Wright et al., 1992) and an endo-glucanase (Lin & Stutzenberger, 1995), respectively. Few modular enzymes are evident in the genome of T. curvata with a relatively low ‘modular metric’ (MM, see Fig. 3) of 7.2; the MM is based on the number of glycoside hydrolase (GH) and carbohydrate-binding modules (CBM) present in the genomes. However, the genome of T. bispora encodes more modular enzymes (MM = 43.2), which await biochemical characterization. The most extensively characterized enzyme systems come from T. fusca and A. cellulolyticus, which, as mentioned above, employ a ‘free-enzyme’ paradigm, including both simple and modular biocatalysts, to deconstruct plant biomass (Wilson, 2004; Barabote et al., 2009).

Figure 3.

Thermophilic bacteria and archaea of interest as potential enzyme or pathway donors. DNA gyrase B sequences were aligned using clustalw (Thompson et al., 1994), and a phylogenetic tree was built using mega5 (Tamura et al., 2011). Branches of solventogenic thermophiles are highlighted in blue. Species names, MM, and polysaccharide profiles are highlighted according to optimal growth temperature: red (Topt ≥ 80 °C), brown (Topt ≥ 65 °C), and green (Topt ≥ 50 °C). The ‘MM’ represents the ratio of CBM to GHs over the total number of CBM and GH ([CBM/GH]/[CBM + GH]) and is used here as a relative measure of modular enzymes present in a given genome. The column header ‘Polysaccharide’ represents the predicted types of carbohydrates that could be hydrolyzed based on annotated GHs by CAZy (Cantarel et al., 2009). Polysaccharides are denoted as α, α-linked glucans; β, β-linked glucans; C, crystalline cellulose; P, polygalacturonate; X, xylan. Species in bold are those that could be used in a ‘native cellulolytic strategy’ (la Grange et al., 2010) for strain development. Species abbreviations used are gene locus tags, here followed by the full species name and optimal growth temperature: Tpet (Thermotoga petrophila, 80 °C); TM (Thermotoga maritima, 80 °C); TRQ2 (Thermotoga sp. strain RQ2, 76–82 °C); CTN (Thermotoga neapolitana, 80 °C); Tlet (Thermotoga lettingae, 65 °C); Pmob (Petrotoga mobilis, 58–60 °C); Dtur (Dictyoglomus turgidum, 72 °C); DICTH (Dictyoglomus thermophilum, 73–78 °C); Tthe (Thermoanaerobacterium thermosaccharolyticum, 60 °C); Tsac (Thermoanaerobacterium saccharolyticum, 60 °C); Thexy (Thermoanaerobacterium xylanolyticum, 60 °C); TTE (Caldanaerobacter subterraneus subsp. tengcongensis, 75 °C); Tmath (Thermoanaerobacter mathranii subsp. mathranii, 70–75 °C); Teth514 (Thermoanaerobacter sp. strain X514, 60 °C); Theet (Thermoanaerobacter ethanolicus, 69 °C); Teth39 (Thermoanaerobacter pseudethanolicus, 65 °C); Cthe (Clostridium thermocellum ATCC 27405, 60 °C); Clo1313 (C. thermocellum DSM 1313, 60 °C); Csac (Caldicellulosiruptor saccharolyticus, 70 °C); Calkr (Caldicellulosiruptor kristjanssonii, 78 °C); Calla (Caldicellulosiruptor lactoaceticus, 68 °C); Calhy (Caldicellulosiruptor hydrothermalis, 65 °C); Calkro (Caldicellulosiruptor kronotskyensis, 70 °C); Cbes (Caldicellulosiruptor bescii, 75 °C); Calow (Caldicellulosiruptor owensensis, 75 °C); COB47 (Caldicellulosiruptor obsidiansis, 78 °C); Tbis (Thermobispora bispora, 55 °C); Tcur (Thermomonospora curvata, 50 °C); Tfu (Thermobifida fusca, 55 °C); Acel (Acidothermus cellulolyticus, 55–60 °C); Rmar (Rhodothermus marinus, 65 °C); Aaci (Alicyclobacillus acidocaldarius strain DSM 446, 75 °C); STHERM (Spirochaeta thermophila strain DSM 6192, 64–66 °C).
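
The bracketed expression given for the MM in the caption can be written out literally as a small function. This is only a sketch of the expression as printed, ([CBM/GH]/[CBM + GH]): the function name and the example counts are hypothetical, and the MM values quoted in the text (for example 7.2 or 43.2) are much larger than this literal expression yields for realistic module counts, so some scaling not stated in the caption is presumably applied. The function below returns the unscaled value only.

```python
def modular_metric_unscaled(n_cbm: int, n_gh: int) -> float:
    """Evaluate (CBM/GH) / (CBM + GH) exactly as written in the figure caption.

    n_cbm: number of carbohydrate-binding modules annotated in a genome.
    n_gh:  number of glycoside hydrolase modules annotated in a genome.
    """
    if n_gh <= 0:
        raise ValueError("at least one GH is needed to form the CBM/GH ratio")
    if n_cbm < 0:
        raise ValueError("module counts cannot be negative")
    return (n_cbm / n_gh) / (n_cbm + n_gh)


if __name__ == "__main__":
    # Hypothetical counts for illustration; not taken from any genome in Fig. 3.
    print(modular_metric_unscaled(n_cbm=10, n_gh=40))
```

Because the result is used only as a relative measure, values computed this way should be compared between genomes rather than with the numbers quoted in the text.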

A cellulase system has been described in T. fusca (Topt 55 °C), which includes two exo-acting, four endo-acting, and one processive cellulase (Lao et al., 1991; Irwin et al., 1993; Jung et al., 1993; Zhang et al., 1995; Irwin et al., 2000; Lykidis et al., 2007; Table 2). Optimal temperatures for these enzymes range from 50 to 77 °C, and, as such, are potential targets for incorporation into a thermophilic CBP platform (Table 2). While many thermophiles have endo-1,4-β-glucanases capable of hydrolyzing soluble forms of cellulose (from the genera Thermotoga and Dictyoglomus, for example), there are far fewer enzymes capable of hydrolyzing crystalline cellulose. A key criterion for the identification of superior enzymes is based on the activity of these cellulose-acting enzymes, both alone and in concert with one another. In the case of T. fusca, not all of the endo-acting cellulases acted synergistically with one another; however, all endo- plus exo-acting cellulase combinations tested worked in synergism as did the exo- and processive endo-acting cellulases (Irwin et al., 1993).

Unlike the other thermophilic soil-based actinomycetes, A. cellulolyticus (Topt 55 °C) was originally isolated from acidic hot springs in Yellowstone National Park (Mohagheghi et al., 1986). One of the originally characterized thermostable endoglucanases was, in fact, an endo-1,4-β-glucanase (E1) from A. cellulolyticus (Tucker et al., 1989). This enzyme is located in a cluster of five GH genes (Ding et al., 2003), encoding both endo- and exo-acting cellulases (Table 2). E1 has been extensively characterized; structural analysis supports the fact that E1 has a catalytic mechanism similar to other GH5 family enzymes (Sakon et al., 1996; Liu et al., 2010). Site-directed mutagenesis was used to study the modulation of cellulose-binding capacity in this enzyme (McCarter et al., 2002), and modeling the energetics involved in ‘peeling’ a single polysaccharide chain off of crystalline cellulose provided additional insights (Skopec et al., 2003).

Considering the isolation site for A. cellulolyticus, the incorporation of enzymes or binding modules from other, more thermophilic organisms into its own genome is a plausible explanation for its observed GH inventory. While CBM2s are enriched in Actinobacteria modular enzymes (60.4% CBM2) compared with Firmicutes (10.2% CBM2), CBM3s are enriched in Firmicutes (77% CBM3) compared with Actinobacteria (9.5% CBM3; Hamilton-Brehm et al., 2010). Considering the relatively even number of CBM2s and CBM3s (10 vs. 9, respectively), this actinomycete appears to have selected for enzymes that are hybrids between the typical Actinobacteria and typical Firmicutes enzyme architectures. Both CBM families appear together in the same enzyme in all but two cases, where only a single CBM2 is present (Acel_0614 and Acel_0619). The most thermophilic, endo-acting enzyme studied to date from A. cellulolyticus, however, is E1, which is comprised of GH5 and CBM2 modules; only the GH5 module has any amino acid sequence homology to other GH5 enzymes from various hyperthermophilic Archaea (< 44%). The data regarding temperature optima for the various A. cellulolyticus enzymes are still rather sparse, and whether the coupled theme of CBM3 and CBM2 in the same enzyme reflects a requirement for substrate binding of this bacterium in its hyperthermophilic environment remains to be demonstrated experimentally.

It is interesting that the endo-1,4-β-glucanase with the highest reported optimal temperature among the T. fusca cellulases is also a modular enzyme containing both GH5 and CBM3 modules (Tfu_2712, Table 2). This is in contrast to the majority of the T. fusca cellulases that have the classical Actinobacteria architecture based on various GH families and CBM2 modules (Table 2) and points to gains in thermostability of some enzymes being in part due to ancient horizontal gene transfer events. In T. fusca, only two CBM3s are linked to GH modules, the other CBM3 being coupled to a GH9 module and important for the processive nature of the enzyme (Sakon et al., 1997), compared with 15 modular enzymes with CBM2s (Lykidis et al., 2007). The CBM3 module may play a thermostabilizing role in the case of Tfu_2712, or alternatively, thermostability may result from this enzyme having been gained by an ancient gene transfer from an extremely thermophilic partner. Based on blast protein homology (Altschul et al., 1990) over the entire multimodular structure of Tfu_2712, the proteins with the highest sequence identity come from the extremely thermophilic genus Caldicellulosiruptor, which may indicate that this enzyme was in fact obtained from an ancient extremely thermophilic species.

Both A. cellulolyticus and T. fusca also produce thermophilic xylanases from GH families 10 and 11 and xylosidases from GH family 43 (Table 2). Most notable is Xyn10A from A. cellulolyticus, which has an optimal temperature and specific activity similar to GH10 xylanases from T. maritima (Topt 80 °C; Barabote et al., 2010). A homolog of XynA from T. fusca (‘TfxA’) is also relatively thermophilic and is modular with a CBM2. This makes TfxA attractive for plant biomass deconstruction, because the CBM2 binds both cellulose and insoluble xylan (Irwin et al., 1994), thereby bringing the enzyme into close contact with the complex polysaccharide matrix of plant cell walls.

Marine environments – Rhodothermus and Thermotogales

Members of the thermophilic bacterial order Thermotogales and the genus Rhodothermus were isolated from marine hot springs, where α- and β-linked glucans, chitin, xylooligosaccharides, mannans, and glucuronides are present and often reflect the broad GH profile observed in these bacteria (Blumer-Schuette et al., 2008). However, as crystalline cellulose is lacking in marine environments, a dearth of primary (‘true’) cellulases and cellulose-binding modules is expected and that is what is observed.

Rhodothermus marinus (Topt 65 °C), an aerobic thermophile, produces thermostable enzymes of interest, including a GH12, endo-acting glucanase (Halldórsdóttir et al., 1998), xylanases (Dahlberg et al., 1993), and an as yet uncharacterized GH9 enzyme (Nolan et al., 2009). Few enzymes from this bacterium are modular, that is, few contain both a GH catalytic module and a CBM (MM = 17.8). Modular enzymes from this species include a mannanase (CBM35-GH26, Rmar_0016) and a putative xylanase (GH43-CBM6, Rmar_1068). The latter is found in a genomic region, rich with carbohydrate-active enzymes, that includes a modular xylanase (CBM4-CBM4-GH10, Rmar_1069). The xylanase is apparently localized to the outer cell wall, similar to what has been found for another marine thermophile, T. maritima (Topt 80 °C; Liebl et al., 2008). In addition, this species does produce an intracellular, thermostable endo-acting glucanase (Rmar_1627), with activity at temperatures well above the optimal growth temperature of R. marinus (Table 2). There are also two putative GH9 cellulases identified in the genome of this species, one extracellular and the other intracellular (Rmar_0076 and Rmar_0525, respectively), although this activity has yet to be confirmed.

Potential plant biomass-hydrolyzing enzymes from members of the genus Thermotoga include those that degrade almost every major polysaccharide from plant cell walls, with the notable exception of crystalline cellulose (Fig. 3). The lack of true cellulase activity in the Thermotogales is consistent with the absence of CBM3s linked to glucanases. In support of this, the activities of GH74 and GH5 glucanases from T. maritima (Table 2) on microcrystalline cellulose were increased when fused to non-native CBMs (Chhabra & Kelly, 2002; Mahadevan et al., 2008). While Thermotoga species are typically thought to use a free-enzyme system, both an extracellular xylanase, XynA (TM0061, Table 2; Liebl et al., 2008) and amylase (Schumann et al., 1991), from T. maritima bind to the outer cell membrane (the ‘toga’). Subcellular localization of enzymes (i.e. cell wall bound) may be just as important as their catalytic activity, as already shown in C. thermocellum, where the close proximity between the bacterium and substrate, facilitated by the cellulosome, enhances enzymatic activity (Lu et al., 2006). These enzymes are not only ideal for CBP platform development, but also potential mechanisms for in vivo pretreatment of biomass in plant tissue by xylanases (Kim et al., 2011a), α-amylases (Santa-Maria et al., 2009; Santa-Maria et al., 2011), and endo-glucanases (Kim et al., 2010; Mahadevan et al., 2011). Use of transgenic plant biomass that undergoes auto-hydrolysis to fermentable sugars under controlled conditions is an interesting concept for keeping enzyme and pretreatment costs down during bioconversion of lignocellulose.

Terrestrial hot springs – Alicyclobacillus, Dictyoglomi, and Spirochaeta

Alicyclobacillus acidocaldarius (formerly ‘Bacillus acidocaldarius’, Topt 65 °C) was among the original thermoacidophilic bacteria isolated at Yellowstone National Park (Darland & Brock, 1971). Consistent with its natural habitat, extracellular enzymes from A. acidocaldarius are thermoacidophilic, an important trait for incorporation into a CBP scheme that includes acid pretreatment. An extracellular endo-acting GH51 glucanase with both thermal and acidic tolerance was described (Eckert & Schneider, 2003); GH51 typically includes α-L-arabinofuranosidases and glucanases, not solely glucanases (Table 2). Another enzyme from this bacterium with interesting properties is an intracellular endo-acting cellulase (GH9; Eckert et al., 2002). Its structure was solved (Pereira et al., 2009) in complex with cellooligosaccharides (Eckert et al., 2009) to examine the reasons for cellobiose release from short-chain cellooligosaccharides, but not from longer chains of soluble cellulose. A novel GH family (GH113) was also established, based on a β-mannanase, with preference for glucomannans (Zhang et al., 2008).

Originally isolated from terrestrial hot springs in Japan, members of the genus Dictyoglomus grow at temperatures up to 80 °C and produce thermostable xylanases. Dictyoglomi form multicellular ‘rotund bodies’ (Saiki et al., 1985; Patel et al., 1987) comprised of many protoplast-like cells surrounded by a common outer membrane (Hoppert et al., 2012). Given that both the Thermotogae and Dictyoglomi form large outer membranes that surround protoplasts (Liebl et al., 2008; Hoppert et al., 2012), it is not surprising that comparison of entire genome sequences indicates that the Dictyoglomi are more closely related to the phylum Thermotogae than to the Firmicutes (Nishida et al., 2011). Thermophilic xylanases and mannanases from Dictyoglomi (Gibbs et al., 1995; Morris et al., 1998) were touted for potential use in pulp and paper bleaching (Mathrani & Ahring, 1992; Table 2). Although one member of the genus, isolated from Kamchatka, Russia, grew on carboxymethylcellulose (Patel, 2010), no evidence of cellulose-binding CBMs can be found in the genome (Brumm et al., 2011), indicating that the responsible enzymes are most likely glucanases, and not cellulases. Currently, whole-genome sequences are available for Dictyoglomus turgidum and D. thermophilum (GenBank accession number CP001146.1). Each species contains numerous carbohydrate-active enzymes (56–57), as identified by CAZy (Cantarel et al., 2009), but only a few of which are linked to CBMs (Fig. 3; Table 2). Similar to most of the Thermotogae, the Dictyoglomi hydrolyze a broad spectrum of polysaccharides, despite the paucity of CBMs (Fig. 3). Mining the D. turgidum genome for carbohydrate-active enzymes (Table 2), using shotgun cloning in conjunction with bioinformatics analysis, identified 12 different GHs with catalytic activities important for plant biomass deconstruction (Brumm et al., 2011). 
There is evidence of a broader pan genome for the genus Dictyoglomus, as isolates from hot springs in the Uzon Caldera of Kamchatka, Russia, were able to hydrolyze microcrystalline cellulose and carboxymethylcellulose, although the corresponding enzymes responsible for this activity were not identified (Kublanov et al., 2009).

In addition to the thermophilic members of the phyla Firmicutes and Dictyoglomi, thermophilic members from the phylum Spirochaetes, Spirochaeta thermophila DSM 6192 and DSM 6578 (Topt 65 °C) in particular, are capable of degrading a wide variety of polysaccharides, including cellulose (Rainey et al., 1991; Aksenova et al., 1992). While these strains remain relatively uncharacterized with respect to plant biomass degradation, a novel family of CBMs that bind to cellulose was identified in S. thermophila DSM 6192 and is found in many modular enzymes, including those that are predicted to be cellulases (Angelov et al., 2011). Currently, this novel family (CBM64) has only been identified in the genomes of both S. thermophila strains (DSM 6192 and DSM 6578) and a species from the Firmicutes, Mahella australiensis. Additionally, the lack of an identifiable exo-cellulase in the genome of S. thermophila hints at a novel mechanism for crystalline cellulose hydrolysis and clearly warrants further biochemical characterization of this bacterium's carbohydrate-active enzymes.

New approaches for enzyme discovery may yield more candidate enzymes for plant biomass deconstruction. A recent screen of cellulolytic communities yielded a novel, thermostable endo-acting cellulase EBI-244 (Table 2). By enriching on crystalline cellulose, an archaeal consortium capable of deconstructing cellulose at temperatures above 90 °C was obtained and screened for novel cellulases (Graham et al., 2011). No known CBM families were found in this enzyme, although the C-terminal modules were needed for catalytic activity and may very well represent an as yet uncharacterized CBM family. Further screening of natural cellulolytic communities for novel, highly thermostable exo-acting cellulases is warranted to provide tools for improving thermophilic CBP platform microorganisms.

The cellulosome of C. thermocellum

Clostridium thermocellum provides the paradigm for bacterial crystalline cellulose hydrolysis using a single multiprotein complex. More than 30 years ago, the adherence of C. thermocellum to insoluble cellulose, even when grown under conditions of constant agitation, was noted, and the mechanism was subsequently characterized (Bayer et al., 1983). The complex responsible was isolated (Lamed et al., 1983a) and described as a ‘cellulosome’ (Lamed et al., 1983b). The C. thermocellum cellulosome (Ct-cellulosome) has been shown to be capable of efficiently degrading plant cell wall polysaccharides, including crystalline cellulose (Lu et al., 2006). Most cellulosomes display a range of plant-biomass deconstruction catalytic activities, including xylanase, mannanase, arabinofuranosidase, lichenase, and pectate lyase, in addition to endoglucanase and exoglucanase (Schwarz, 2001; Bayer et al., 2004; Demain et al., 2005). Many members of the order Clostridiales possess the machinery to produce cellulosomes, although C. thermocellum and C. clariflavum are the only two described cellulosomal species capable of efficient crystalline cellulose hydrolysis at higher temperatures (Topt ≥ 55–60 °C; Demain et al., 2005; Izquierdo et al., 2012). Other thermophilic and cellulolytic members include C. straminisolvens, which has yet to be sequenced (Kato et al., 2004) and whose cellulosome-possessing status is unknown, and three noncellulosomal C. stercorarium subspecies that currently carry the genus name (Fardeau et al., 2001; Poehlein et al., 2013) but are classified in the Ruminococcaceae rather than the Clostridiaceae family.

Many C. thermocellum enzymes, both cellulosomal and ‘free’, are highly modular. In fact, C. thermocellum has the highest MM of all thermophiles considered (see Fig. 3, MM = 196.0–212.2). Cellulosome systems of various anaerobic bacteria differ in their complexity and diversity (Schwarz, 2001; Bayer et al., 2004; Fontes & Gilbert, 2010), with the cellulosomes of the mesophilic bacteria Ruminococcus flavefaciens (Rincon et al., 2010) and Acetivibrio cellulolyticus (Dassa et al., 2012), appearing to be even more complex than the Ct-cellulosomal system, although their assembly is based on the same specific cohesin–dockerin interactions (Pagès et al., 1997). One explanation for this may be the level of competition for plant biomass at mesophilic compared with thermophilic temperatures, with the higher competition among mesophilic plant biomass-deconstructing bacteria driving the level of complexity observed so far.

The ‘cellulosome concept’

A simplified model of the Ct-cellulosome is shown in Fig. 4. It is composed of a cellulosome-integrating protein (CipA) and (hemi-)cellulolytic enzymes (catalytic subunits), which are attached to cell-wall-bound anchoring scaffoldins (Fig. 4). Modules in CipA are separated by linker regions, which exhibit intrinsic flexibility (Hammel et al., 2004; Hammel et al., 2005). Flexible interactions between CipA, catalytic subunits, and the substrate result in multiple structural changes that facilitate efficient attachment of the cellulosome to the substrate (Gilbert, 2007; Bomble et al., 2011; Garcia-Alvarez et al., 2011). While this flexibility in the overall structure of the cellulosome is required for adherence to nonuniform substrates, it also imposes a certain amount of mechanical stress on the structural components of the cellulosome. Accordingly, use of single-molecule force microscopy to physically unfold Ct-cohesin I (Ct-CohI) modules determined that scaffoldins are among the most mechano-stable proteins discovered to date (Valbuena et al., 2009).

Figure 4.

Cellulosomal system of Clostridium thermocellum. The Ct-cellulosome is composed of cell-bound (CBS) and cell-free (CFS) subsystems (Raman et al., 2009; Fontes & Gilbert, 2010). The monomeric cellulosome is composed of a scaffoldin (CipA; Lamed et al., 1983a; Wu et al., 1988; Gerngross et al., 1993) containing nine cohesins of type I (CohI, in yellow), which form a complex by binding nine enzymes, each containing a dockerin module of type I (DocI, in yellow; Kruus et al., 1995). The C-terminal dockerin module of type II (DocII, in green) specifically binds cohesins of type II (CohII, in green; Leibovitz & Béguin, 1996; Carvalho et al., 2005). CBS are composed of five types of protein complexes, differentiated by the protein used to anchor the cellulosome to the cell surface. Three CBS complexes are assembled via interactions between the S-layer and S-layer homology domain (SLH, purple rectangles) proteins: SdbA (Leibovitz et al., 1997), Orf2p (Fujino et al., 1993), and OlpB (Lemaire et al., 1995), containing one, two (18 enzymes total), and seven (63 enzymes total) CohII modules, respectively. These cohesins then bind corresponding numbers of individual scaffoldins via the C-terminal DocII module of the CipA scaffoldin. Two additional cell-bound proteins, OlpA (Salamitou et al., 1994) and OlpC (Pinheiro et al., 2009), contain a CohI module capable of binding a single DocI-bearing enzyme. CFS is represented by two newly identified scaffoldins, Cthe_0736 and Cthe_0735, which contain seven CohII modules and one CohII module, respectively, and would thus in turn bind seven scaffoldins and one scaffoldin, respectively (Raman et al., 2009).
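The enzyme totals quoted in this caption follow from simple multiplication: each CipA scaffoldin carries nine type I cohesins, and each type II cohesin on an anchoring or cell-free scaffoldin can hold one CipA. A minimal Python sketch of that arithmetic is given below, using only the CohII counts stated in the caption; the dictionary and function names are illustrative and are not taken from the cited work.

```python
# Number of type I cohesins (enzyme-binding sites) on one CipA scaffoldin,
# as stated in the Figure 4 caption.
COHI_PER_CIPA = 9

# Type II cohesin counts per anchoring (CBS) or cell-free (CFS) scaffoldin,
# as stated in the Figure 4 caption.
COHII_COUNTS = {
    "SdbA": 1,
    "Orf2p": 2,
    "OlpB": 7,
    "Cthe_0736": 7,
    "Cthe_0735": 1,
}


def max_enzymes(cohii_count, cohi_per_cipa=COHI_PER_CIPA):
    """Upper bound on bound enzymes: one CipA per CohII, nine enzymes per CipA."""
    return cohii_count * cohi_per_cipa


capacities = {name: max_enzymes(n) for name, n in COHII_COUNTS.items()}

for name, total in capacities.items():
    print(f"{name}: {COHII_COUNTS[name]} CohII -> up to {total} enzymes")
```

These values are upper bounds that assume every cohesin is occupied; actual occupancy in vivo is not addressed by this calculation.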

Each catalytic subunit of the cellulosome (Table 3) contains a type I dockerin module (DocI), capable of binding to the CohI modules present in the scaffoldin (Tokatlidis et al., 1991; Salamitou et al., 1992). The assembly of catalytic components into the cellulosome occurs via CohI-DocI interactions (Fig. 5a) with a 1 : 1 stoichiometry (Kataeva et al., 1997; Carvalho et al., 2003). This is strongly dependent on calcium ions, which promote the correct folding of DocI (Lytle et al., 2000; Bayer et al., 2004). CohI-DocI interactions are often species-dependent; however, the DocI modules from Ct-Xyn11A and Ct-CelJ bind to CohI modules from C. josui (Jindou et al., 2004; Sakka et al., 2009), and the DocI module from Ct-CelJ will bind to the CohI module from C. cellulolyticum (Pinheiro et al., 2009). Binding of DocI to CohI causes conformational changes in CipA (Fig. 5a). It has been demonstrated with a mini-scaffoldin containing two CohI modules and Ct-CelD (GH9-DocI) that binding of the first Ct-CelD unit increases the affinity of binding of the second Ct-CelD unit (Kataeva et al., 1997). As a part of the scaffoldin, the CBM3a module binds to crystalline cellulose, effectively anchoring the cellulosome to its substrate (Morag et al., 1995; Liu et al., 2009; Fig. 5c). The C-terminal X-DocII module binds to the type II cohesin (CohII) module of the anchoring scaffoldin located on the cell surface (Carvalho et al., 2005; Adams et al., 2006; Fig. 5b).

Table 3. Ct-cellulosomal carbohydrate-active enzymes
| Locus tag (a) | Protein name | Modules (b) | Activity (c) | Topt (°C) | References |
| --- | --- | --- | --- | --- | --- |
| Cthe_0032 | | GH26, CBM35, DocI | β-Mannanase | N/A | |
| Cthe_0043 | CelN | GH9, CBM3c, DocI | Endo-1,4-β-glucanase | 70 | Zverlov et al. (2003) |
| Cthe_0211 | LicB | GH16, DocI | Endo-1,3-1,4-β-glucanase | 80 | Schimming et al. (1991) |
| Cthe_0269 | CelA | GH8, DocI | Endo-1,4-β-glucanase | 75 | Schwarz et al. (1986) |
| Cthe_0274 | CelP | GH9, DocI | Endo-1,4-β-glucanase | N/A | Schwarz (2001) |
| Cthe_0405 | CelL | GH5, DocI | Endo-1,4-β-glucanase | N/A | Schwarz (2001) |
| Cthe_0412 | CelK | GH9, CBM4, DocI | Exo-1,4-β-glucanase | | |
| Cthe_0413 | CbhA | GH9, CBM4, CBM3b, DocI | Exo-1,4-β-glucanase | N/A | Zverlov et al. (1998b) and Schubot et al. (2004) |
| Cthe_0433 | | GH9, CBM3c, DocI | Endo-1,4-β-glucanase | N/A | |
| Cthe_0536 | CelB | GH5, DocI | Endo-1,4-β-glucanase | N/A | Béguin et al. (1983) |
| Cthe_0543 | CelF | GH9, CBM3c, DocI | Endo-1,4-β-glucanase | N/A | Navarro et al. (1991) |
| Cthe_0578 | CelR | GH9, CBM3c, DocI | Endo-1,4-β-glucanase | 78.5 | Zverlov et al. (2005a) |
| Cthe_0624 | CelJ | GH9, GH44, CBM30, DocI | Endo-1,4-β-glucanase | 70 | Ahsan et al. (1997) |
| Cthe_0625 | CelQ | GH9, CBM3c, DocI | Endo-1,4-β-glucanase | 60 | Arai et al. (2001) |
| Cthe_0660 | | GH81, DocI | Endo-1,3-β-glucanase | N/A | |
| Cthe_0745 | CelW | GH9, CBM3c, DocI | Endo-1,4-β-glucanase | | |
| Cthe_0797 | CelE, CtCE2 | GH5, DocI, CE2 | Endo-1,4-β-glucanase, Acetylxylan esterase, 6-O-glycoside deacetylase | 70 | Hall et al. (1988), Abdeev et al. (2001) and Montanier et al. (2009) |
| Cthe_0825 | CelD | GH9, DocI | Endo-1,4-β-glucanase | 60 | Joliff et al. (1986) |
| Cthe_1472 | CelH | GH26, GH5, CBM11, DocI | Endo-1,4-β-glucanase | N/A | Yague et al. (1990) |
| Cthe_2089 | CelS | GH48, DocI | Exo-1,4-β-glucanase | 70 | Kruus et al. (1995) |
| Cthe_2147 | CelO | GH5, CBM3, DocI | Exo-1,4-β-glucanase | 65 | Zverlov et al. (2002) |
| Cthe_2360 | CelU | GH9, CBM3b, CBM3c, DocI | Endo-1,4-β-glucanase | N/A | |
| Cthe_2760 | CelV | GH9, CBM3b, CBM3c, DocI | Endo-1,4-β-glucanase | N/A | |
| Cthe_2761 | | GH9, CBM3c, DocI | Endo-1,4-β-glucanase | N/A | |
| Cthe_2812 | CelT | GH9, DocI | Endo-1,4-β-glucanase | 70 | Kurokawa et al. (2002) |
| Cthe_2872 | CelG | GH5, DocI | Endo-1,4-β-glucanase | | Lemaire & Beguin (1993) |
| Cthe_0912 | XynY | CBM22, GH10, CBM22, DocI, CE1 | Endo-1,4-β-xylanase, Feruloyl esterase | N/A | Blum et al. (2000) |
| Cthe_1398 | Xgh74A | GH74, DocI | Xyloglucanase | 75 | Zverlov et al. (2005c) |
| Cthe_1838 | XynC | CBM22, GH10, DocI | Endo-1,4-β-xylanase | 80 | Hayashi et al. (1997) |
| Cthe_1963 | XynZ | CE1, CBM6, DocI, GH10 | Endo-1,4-β-xylanase, Feruloyl esterase | 60 | Kataeva et al. (2001) |
| Cthe_2137 | | GH39, CBM35, CBM35, DocI | β-Xylosidase | N/A | |
| Cthe_2193 | CtXyl5A | GH5, CBM6, CBM13, CBM62, DocI | Arabinoxylanase | N/A | Correia et al. (2011) |
| Cthe_2590 | XynD | GH10, CBM22, DocI | Endo-1,4-β-xylanase | 80 | Zverlov et al. (2005c) |
| Cthe_2972 | XynA/U | GH11, CE4, CBM6, DocI | Endo-1,4-β-xylanase | | Hayashi et al. (1999) |
| Cthe_0032 | | GH26, CBM6, DocI | β-Mannanase | N/A | |
| Cthe_0821 | CtMan5A | GH5, CBM32, DocI | β-Mannanase | 60 | Mizutani et al. (2012) |
| Cthe_2811 | ManA | GH26, CBM35, DocI | β-Mannanase | 65 | Halstead et al. (1999) |
| Side chain-degrading enzymes | | | | | |
| Cthe_0015 | | GH43, GH54, DocI | α-L-arabinofuranosidase | N/A | |
| Cthe_0246 | | PL11, CBM6, DocI | Rhamnogalacturonan lyase | N/A | |
| Cthe_0270 | ChiA | GH18, DocI | Endo-chitinase | N/A | Zverlov et al. (2002) |
| Cthe_0661 | Ct1,3Gal43A | GH43, CBM13, DocI | Exo-1,3-β-galactanase | 50 | Ichinose et al. (2006) |
| Cthe_0798 | | CE3, CE3, DocI | Acetylxylan esterase | N/A | |
| Cthe_1400 | | GH53, DocI | Endo-1,4-β-galactanase | N/A | |
| Cthe_2138 | | GH42, CBM43, DocI | α-L-arabinofuranosidase | N/A | |
| Cthe_2139 | | GH30, GH42, CBM43, DocI | α-L-arabinofuranosidase | N/A | |
| Cthe_2194 | | CE1, CBM6, DocI | Acetylxylan esterase | N/A | |
| Cthe_2196 | | GH43, CBM6, DocI | α-L-arabinofuranosidase | N/A | |
| Cthe_2179 | | PL1, PL9, CBM6, DocI | Pectate lyase | N/A | |
| Cthe_2949 | | CE8, DocI | Pectinesterase | N/A | |
| Cthe_2950 | | PL1, CBM6, DocI | Pectate lyase | N/A | |
| Cthe_2038 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_0109 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_0435 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_0438 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_0640 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_0729 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_0918 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_1271 | | GH43, CBM6, DocI | Hemicellulase | N/A | |
| Cthe_1806 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_1890 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_2195 | | CBM6, DocI | Cellulosome enzyme | N/A | |
| Cthe_2197 | | GH2, CBM6, DocI | Cellulosome enzyme | N/A | |
| Cthe_2271 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_2549 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_2879 | | CEnc, DocI | Carbohydrate esterase | N/A | |
| Cthe_3012 | | GH30, CBM6, DocI | Cellulosome enzyme | | |
| Cthe_3132 | | DocI | Cellulosome enzyme | N/A | |
| Cthe_3141 | | CE12, CBM6, DocI | Carbohydrate esterase | N/A | |

  1. a Locus tags are based on the C. thermocellum ATCC 27405 genome.
  2. b Catalytic modules follow the convention established by CAZy: GH, glycoside hydrolase; CBM, carbohydrate-binding module; CE, carbohydrate esterase; PL, polysaccharide lyase. Each catalytic module is followed by the family number, or nc in the case of nonestablished families.
  3. c Activity reported for enzymes without references is predicted based on annotated GH activity.

Figure 5.

Structures of Ct-cellulosome protein components. (a) Scanning electron micrograph of Clostridium thermocellum cells exhibiting ‘protuberances’ (representing cellulosomes and polycellulosomes anchored to cell surface proteins; Bayer & Lamed, 1986). Figure 5a is reproduced with the permission of the Journal of Bacteriology. (b) CohI-DocI: the crystal structure of the CohI-DocI complex, boxed in teal, shows helices one and three from DocI (blue structure) binding to CohI (beige structure) via hydrophobic interactions and limited hydrogen bonding between serine and threonine residues. Red spheres represent calcium ion binding sites in the DocI module (Carvalho et al., 2003). (c) CBM3a (yellow structure) from CipA binds amorphous and crystalline cellulose (Morag et al., 1995). It is the major CBM responsible for the attachment of the Ct-cellulosome to the substrate. CBM3a possesses a β-sandwich fold with nine strands, where one of the β-sheets displays a planar topology reflecting the planar structure of crystalline cellulose (Tormo et al., 1996). (d) The structure of the X-DocII-CohII complex, boxed in magenta, displays the hydrophobic interaction between DocII (green structure) and CohII (orange structure) and the accessory function of the X module (beige structure) in binding (Adams et al., 2006). While the structures of CohI and CohII appear to be similar, residues required for hydrogen bonding are not conserved (Carvalho et al., 2005). Furthermore, the overall structures of DocI and X-DocII are divergent, so that the orientation in which the Coh-Doc pairs line up differs (Adams et al., 2006). (e) Overall schematic of the highlighted cellulosomal structural components. Type I cohesin–dockerin pairs are shaded in gold, and type II cohesin–dockerin pairs are shaded in green. Abbreviations are as follows: CD, catalytic module; DocI, dockerin type I; CohI, cohesin type I; DocII, dockerin type II; CohII, cohesin type II; SLH, surface-layer homology domain.

The modules in Ct-cellulosomal enzymes are separated by linkers that are relatively short compared with those in higher thermophiles (Dam et al., 2011). It has been demonstrated that in the seven-module enzyme, Ct-Cbh9A, units are coupled as multimodular constructs, which exhibited more cooperative unfolding than did the individual modules upon in vitro denaturation. Calcium ions promoted the correct interaction between the modules (Kataeva et al., 2004; Kataeva et al., 2005). Recently, an interaction between CBM3a and a linker separating CBM3a and CohI3 from Ct-CipA has been reported (Yaniv et al., 2012a). The synergism observed between cellulosomal components (Fontes & Gilbert, 2010) can at least be partially explained by inter-/intraprotein/modular interactions. The CipA linkers are highly O-glycosylated (Gerwig et al., 1993) and are also predicted to be phosphorylated, in accordance with the high phosphorus content of the Ct-cellulosome (Choi & Ljungdahl, 1996).

Beyond their 3-D structure and function, cellulosomes, as large extracellular protein complexes, must also carry post-translational modifications that protect them from proteolytic degradation. Previous studies noted that the scaffoldin and catalytic proteins of C. thermocellum are extensively glycosylated (Gerwig et al., 1993), and it has been suggested that glycosylation works to further protect these protein complexes from proteolysis (Schwarz & Zverlov, 2006). Furthermore, C. thermocellum contains two genes that encode serine protease inhibitors (Ct-serpin1 and 2) and a subtilisin-related protease with a DocI module (Zverlov et al., 2005b). Inhibition of subtilisin was found to occur with a stoichiometry of 1 : 1 and is thought to occur irreversibly through formation of a bound complex with subtilisin, suggesting that cellulosomal serpins may interact with a cellulosomal protease identified in the C. thermocellum genome, CprA (Kang et al., 2006). Additionally, these protease inhibitors may protect the cellulosome from extracellular protease attack. It is clear that there is more to be learned about protecting the cellulosome, which will be of importance for CBP processes that require enzyme stability.

In support of the intensive studies of the Ct-cellulosome, mutagenesis has helped to gain new insights into its structure and function, which in turn will guide the creation of engineered cellulosomes for CBP strain development. Mutagenesis studies have been reported and continue to be used, in conjunction with crystallography and in vitro biochemical assays, to characterize the various cellulosomal components and functions. Mutagenesis targets have included the SdbA surface anchoring protein (Leibovitz et al., 1997), cohesin–dockerin interactions (Mechaly et al., 2000; Handelsman et al., 2004), the CelK CBM (Kataeva et al., 2001), and especially the CipA scaffoldin protein (Miras et al., 2002; Adams et al., 2006; Zverlov et al., 2008). In addition, Ct-cel48S has been deleted (Olson et al., 2010).

Assembly of the Ct-cellulosome occurs in a random manner, due to the high similarity in structure and in sequence between CohI modules from CipA and DocI modules from the enzymes (Bayer et al., 2004). The composition of the cellulosome varies with different growth substrates (Raman et al., 2009), suggesting that there is coordinated substrate-specific regulation of the subunits. With the availability of genome sequences of C. thermocellum strains ATCC 27405 (GenBank accession number CP000568) and DSM 1313 (Feinberg et al., 2011), it was determined that the cellulosome of this bacterium is composed of both cell-bound (CBS) and cell-free (CFS) systems, not to be confused with the ‘free’ noncellulosomal enzymes (Fig. 4). It will be interesting to see how the organism modulates the composition of enzymes in the CBS vs. CFS, and whether there are certain types of enzymes found preferentially in the CFS. Previous studies have demonstrated that specific CohI-DocI pairs have higher affinity for each other. For example, two C. thermocellum proteins preferentially bind to OlpC at the cell surface, rather than to the scaffoldin CipA (Pinheiro et al., 2009). Additional cases such as this are certain to exist and may help unravel the mechanisms by which catalytic subunits from the cellulosome assemble into the various CBS and CFS cellulosomes.

Cellulosome self-assembly, as well as some aspects of cellulosomal synergistic solubilization, was investigated using free native dockerin-containing catalytic subunits from a CipA-deficient mutant (Zverlov et al., 2008). The recombinantly expressed versions were allowed to spontaneously assemble on different recombinant cohesin-containing CipA variants (Krauss et al., 2012). Previous work in C. cellulolyticum indicated that the CohI-DocI interaction may occur spontaneously (Fierobe et al., 1999); however, this had yet to be demonstrated in C. thermocellum. Overall, binding was shown to be spontaneous and random, with little or no preference for CohI positions on the scaffoldin; the amount of synergism displayed among enzymes increased with the number of CohI positions on the scaffoldin (Krauss et al., 2012). The in vitro reconstitution of an active cellulosome is a promising step, not only toward engineered cell-free cellulosomes but also toward the engineering of catalytic subunits that can be incorporated into in vivo cellulosomes to further increase catalytic activity on plant biomass substrates.
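As a rough illustration of what "random, with little or no preference for CohI positions" implies, the Python sketch below fills the nine CohI sites of a scaffoldin by drawing enzymes uniformly at random and tallies occupancy per position. It is a toy model, not the experimental procedure of Krauss et al. (2012): the enzyme names are taken from this review, but treating them as an equally abundant pool with identical affinities, and all function names, are assumptions made only for this example.

```python
import random
from collections import Counter

N_COHI = 9  # type I cohesin positions on CipA (Fig. 4)

# Enzyme names appear in this review; an equally abundant pool with equal
# affinities is an assumption made only for this illustration.
ENZYME_POOL = ["Cel8A", "Cel48S", "Cel9R", "Xyn10D"]


def assemble_scaffoldin(rng, pool=ENZYME_POOL, n_sites=N_COHI):
    """Fill every CohI site with an enzyme drawn uniformly at random."""
    return [rng.choice(pool) for _ in range(n_sites)]


def occupancy_by_position(n_scaffoldins, seed=0):
    """Tally which enzyme lands on which CohI position over many assemblies."""
    rng = random.Random(seed)
    tallies = [Counter() for _ in range(N_COHI)]
    for _ in range(n_scaffoldins):
        for position, enzyme in enumerate(assemble_scaffoldin(rng)):
            tallies[position][enzyme] += 1
    return tallies


N_TRIALS = 10_000
tallies = occupancy_by_position(N_TRIALS)
for position, counts in enumerate(tallies, start=1):
    fractions = {e: round(counts[e] / N_TRIALS, 2) for e in ENZYME_POOL}
    print(f"CohI{position}: {fractions}")
```

Under these assumptions each enzyme occupies each position in roughly a quarter of the assemblies; a positional preference would show up as a systematic departure from that uniform pattern.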

Despite extensive studies, there are insufficient data to explain the complex mechanisms of microbial plant cell wall deconstruction, although an understanding of this phenomenon is critical to develop efficient conversion processes. Computational modeling is a powerful tool for generating new hypotheses on cellulosome properties, complemented by experimental validation (Beckham et al., 2011). Multiscale modeling, from atomic resolution to coarse-grained representations, is essential and supports other structural techniques. The challenge for computer modeling is to develop methods applicable to the large cellulosome systems and their interactions with carbohydrate substrates. Computational technology is currently capable of describing systems that contain up to 10⁶ atoms, but the cellulosome exceeds 10⁸ atoms, rendering atomistic models impractical for large complexes such as cellulosomes (Taylor et al., 2008). Therefore, a coarse-grain approach must be applied to cellulosomal modeling, which considers defined clusters of atoms as single units with structural constraints. Modeling at coarse-grain scale simplifies calculations, and provided the model is chosen properly with well-defined parameters, accurate predictions can be made regarding cellulosomal structure and behavior (Ding et al., 2008; Beckham et al., 2011). Coarse-grained modeling was applied to investigate self-assembly of the Ct-cellulosome. The model was based on physical characteristics of three catalytic subunits (Ct-Cel5B, Ct-Cel48S, and Ct-Cbh9A) and of scaffoldin CipA to explore the effect of protein size, flexibility, and shape on self-assembly. It was determined that the shape and modular structure of the subunits, but not their mass, dominate the assembly of the cellulosome. In particular, the large 140-kDa seven-module enzyme, Cbh9A, binds CipA more frequently than the simpler Cel48S (GH48-DocI) or Cel5B (GH5-DocI; Bomble et al., 2011).
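To make concrete what treating "defined clusters of atoms as single units" can look like, the sketch below collapses consecutive groups of atoms into beads placed at each group's centre of mass. It is a generic Python illustration and not the model used by Bomble et al. (2011); the function name, the random toy coordinates, and the fixed cluster size of 100 atoms are assumptions chosen for the example (published models typically group atoms by residue or module rather than by a fixed count).

```python
import random


def coarse_grain(coords, masses, atoms_per_bead):
    """Collapse consecutive groups of atoms into beads at their centres of mass.

    coords: list of (x, y, z) atomic positions
    masses: list of atomic masses, same length as coords
    atoms_per_bead: hypothetical fixed cluster size; a trailing partial
                    group is dropped
    Returns a list of (centre_of_mass, total_mass) tuples, one per bead.
    """
    beads = []
    n_beads = len(coords) // atoms_per_bead
    for b in range(n_beads):
        start = b * atoms_per_bead
        stop = start + atoms_per_bead
        group_xyz = coords[start:stop]
        group_m = masses[start:stop]
        total_mass = sum(group_m)
        com = tuple(
            sum(m * xyz[axis] for m, xyz in zip(group_m, group_xyz)) / total_mass
            for axis in range(3)
        )
        beads.append((com, total_mass))
    return beads


# Toy system, far smaller than a cellulosome.
rng = random.Random(0)
n_atoms = 10_000
coords = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_atoms)]
masses = [rng.uniform(1.0, 16.0) for _ in range(n_atoms)]

beads = coarse_grain(coords, masses, atoms_per_bead=100)
print(len(beads), "beads from", n_atoms, "atoms")
```

At 100 atoms per bead, a system of 10^8 atoms would reduce to 10^6 particles, which is the kind of reduction that brings a cellulosome-sized complex within reach of simulation.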

Cellulosomal and free-enzyme systems from C. thermocellum

The presence of multiple CohII-containing proteins in the CBS and CFS scaffoldin platforms agrees with the fact that the genome of C. thermocellum encodes over 70 dockerin-containing catalytic subunits (Zverlov et al., 2005b; Kang et al., 2006; Raman et al., 2009; Table 3). Noncellulosomal cellulolytic systems are also present in C. thermocellum (Hazlewood et al., 1993; Gilad et al., 2003; Berger et al., 2007), in addition to two S-layer bound enzymes (Fuchs et al., 2003; Selvaraj et al., 2010). By far, the most extensive system in C. thermocellum is the cellulosomal system, which can assemble up to 63 catalytic units into the OlpB-based CBS and Cthe_0736-based CFS (Fontes & Gilbert, 2010; Fig. 4). A well-observed phenomenon in organisms that produce cellulosomes is the catalytic synergy exhibited by the cellulolytic machinery. This is attributed to two major effects: a targeting effect (targeting of enzymes to the substrate surface) and a proximity effect (the fact that enzymes are in close proximity to each other and adhered to the cell surface). The synergistic effect of cellulosomes has been discussed extensively (Fierobe et al., 2002; Lynd et al., 2002; Bayer et al., 2004; Lynd et al., 2005) and quantified (Lu et al., 2006; Vazana et al., 2010). Synergy exhibited by catalytic module (CM) units in cellulosomes and the proximity between the cell and substrate collectively drive cellulose hydrolysis rates and are potential mechanisms that can be exploited in CBP strain development.

As discussed above, the enzymes of the Ct-cellulosome are composed of at least one CM, one DocI module, and potential accessory modules (Table 3). Around 20 cellulosomal subunits also have more complex modular architectures and contain, in particular, CBMs or other CMs, such as GH modules or carbohydrate esterases (CE; Zverlov et al., 2005b; Raman et al., 2009). The individual CBMs likely assist CBM3 of CipA in the correct positioning of the cellulosome on the cellulosic substrate, with respect to the parent enzymatic subunit. Several Ct-cellulosomal enzymes contain cellulose-, xyloglucan-, xylan-, and pectin-specific CBMs targeting their attached CMs to the target substrates (Boraston et al., 2004; Fontes & Gilbert, 2010). Catalytic activities and binding specificities within one multimodular enzyme can be related to the same substrates, but this is not always the case.

A common strategy for discovering and isolating cellulosomal components before the C. thermocellum genome sequence was available was to screen random libraries for cellulase activity and then observe whether any isolates that screened positive for such activity possessed a dockerin module (Maki et al., 2009). Structural analysis of cellulosomal enzymes has also complemented biochemical analyses, and to date 59 crystal structures representing over 30 discrete DocI, GH, CBM, and CE modules are available in the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). After the draft genome sequence of C. thermocellum strain ATCC 27405 became available in 2006, additional cellulosomal components were identified, using bioinformatic analyses to scan the entire genome for coding sequences containing dockerin modules.
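Once module annotations exist, a genome-wide scan of this kind amounts to filtering coding sequences by module content. The minimal Python sketch below assumes the annotations are already available as comma-separated strings in the style of Table 3; the helper names are illustrative, the first three entries are copied from Table 3, and the last entry is a made-up placeholder for a protein without a dockerin. Actual scans detect dockerin modules by sequence similarity rather than by string matching.

```python
# Module annotations keyed by locus tag. The Cthe_ entries are copied from
# Table 3; "example_free_enzyme" is a hypothetical placeholder, not a real locus.
annotations = {
    "Cthe_0269": "GH8, DocI",
    "Cthe_0412": "GH9, CBM4, DocI",
    "Cthe_2089": "GH48, DocI",
    "example_free_enzyme": "GH5, CBM3",
}


def parse_modules(module_string):
    """Split a 'GH9, CBM4, DocI' style annotation into a list of module names."""
    return [m.strip() for m in module_string.split(",") if m.strip()]


def has_dockerin(modules):
    """True if any module is a dockerin (names beginning with 'Doc')."""
    return any(m.startswith("Doc") for m in modules)


cellulosomal = sorted(
    tag for tag, mods in annotations.items() if has_dockerin(parse_modules(mods))
)
print(cellulosomal)
```

The same filter can be pointed at other module classes (for example, names beginning with "CBM") to ask which dockerin-bearing subunits also carry their own binding modules.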

This bioinformatics-mediated scanning technique, coupled with 2D gel electrophoresis, was used to identify three additional catalytic cellulosome components (Zverlov et al., 2005b). Of the identified enzymes, CtCel9R, an endoglucanase, was also shown to be equally active in a recombinant ‘free’ state, in addition to its cellulosomal state in concert with Ct-Cel48S. This highlighted the importance of CBM-mediated targeting over proximity of CMs for both cellulosomal and free enzymes (Vazana et al., 2010). The other two enzymes identified from cellulosomal protein fractions, Ct-Xyn10D and Ct-Xgh74A, had substrate preferences for xylan and xyloglucan, respectively. Due to their abundance in the Ct-cellulosome, they are proposed to be major components of the hemicellulolytic machinery (Zverlov et al., 2005c). Further structural and biochemical analysis of Ct-Xgh74A determined that it is a true xyloglucanase, exhibiting little cross-reactivity and possessing a complex binding pocket that reflects the branched nature of its preferred substrate (Martinez-Fleites et al., 2006).

In addition to Ct-Xgh74A, several Ct-cellulosomal enzymatic components (CMs and, in some cases, ancillary modules) have been crystallized and their structures determined, some of which constitute the first cellulases to have structures solved. In this context, the structures were solved for the CMs for endoglucanases Ct-CelD (Juy et al., 1992; Chauvaux et al., 1995), Ct-Cel8A (Alzari et al., 1996; Schmidt et al., 2002), the CM for the cellobiohydrolase Ct-Cel48S (Guimaraes et al., 2002), and the GH44 module from endoglucanase Ct-CelJ (Kitago et al., 2007). Noncellulosomal enzymes are also of interest from C. thermocellum, including the endoglucanase CelC (Dominguez et al., 1994; Alzari et al., 1996).

Structural analyses have also been helpful in assigning functions to conserved modules. For example, X6 modules are now known as CBM family 22 (Charnock et al., 2000) and currently have over 350 representatives from the kingdoms of Bacteria and Eukaryota. More recently, a hypothetical protein (Cthe_0435/Ct-Cel124) containing a dockerin module was found to be enriched in the cellulosome of C. thermocellum during growth on crystalline cellulose (Raman et al., 2009) and was catalytically active. Furthermore, this CM of the cellulosome was determined to be the first member of the novel GH124 family, based on structural analysis of the enzyme (Brás et al., 2011). This newly discovered family appears to be unique to the cellulosomal Clostridia and currently comprises only three enzymes. Another example is the crystal structure of the Ct-CelT CM, which was solved to answer questions about its function. It contains a CM similar to that of other GH9 modules, as well as a number of additional, yet to be characterized, structural elements located far from the catalytic site (Kesavulu et al., 2012). Ct-CelT is unique in that it does not contain a dedicated CBM or immunoglobulin-like module as most other family 9 catalytic subunits do. However, even without a CBM, Ct-CelT maintains activity levels comparable with other GH9 enzymes (Kurokawa et al., 2002).

Structural data from individual and combined components of the large, modular cellulosomal cellobiohydrolase Ct-Cbh9A have been used to understand their respective functions. Ct-Cbh9A is a large (> 135 kDa), elaborate, cellulosomal enzyme composed of seven modules: CBM4-Ig-GH9-X11-X12-CBM3b-DocI (Zverlov et al., 1998a). Extensive structural analysis of this cellulosomal enzyme was conducted to better understand the biochemical and binding activity of Ct-Cbh9A, and computer modeling of all the interacting modules was possible. Overall, the size of the enzyme complex limits its diffusion in solution, thus allowing for longer residence times at the scaffoldin and thereby increasing the number of Ct-Cbh9A subunits that are incorporated relative to other, smaller subunits (Bomble et al., 2011). Other insights gained by structural analysis of Ct-Cbh9A components include highlighting aromatic residues and peptide loops involved in amorphous cellulose adhesion from the modular pair CBM4-Ig (Alahuhta et al., 2010). Similar structures were also observed in a CBM4 from Cel9K, supporting the theory that additional peptide loops may be a feature of cellulosomal CBM4 modules (Alahuhta et al., 2011). The biological function of the Ig module is still not clear, although its deletion from the N-terminus of the GH9 CM results in complete loss of activity, presumably due to the observed structural instability (Kataeva et al., 2004). Structural analysis of the Ig-like module in combination with the GH9 CM showed that the extensive interface between these two modules is maintained by several hydrophobic and hydrophilic interactions (Schubot et al., 2004). This finding supported the hypothesis that the function of the Ig-like module is to stabilize the CM (Kataeva et al., 2004).

The crystal structures of two X1 modules from Ct-Cbh9A have also been solved individually and in combination. It has been suggested that the X1 modules aid in the thermostability of Ct-Cbh9A, allowing it to hydrolyze cellulose at temperatures well above the optimal growth temperature of C. thermocellum (Kataeva et al., 2005). Despite the low amino acid sequence similarity (25%), the X1 modules have a similar fold (Brunecky et al., 2012). A noncatalytic module (CBM3b), localized near the DocI module, was crystallized from Ct-Cbh9A and was determined not to bind cellulose effectively. However, mutations introduced into the binding region restored cellulose binding (Yaniv et al., 2012b).

Other alternative roles for some of the noncellulosomal proteins have also recently been proposed, based on homology of their N-terminus to RsgI-like proteins (Nataf et al., 2010). These noncellulosomal proteins with a GH5 or GH10 module at the C-terminus are part of an extracellular oligosaccharide-responsive signaling network that will in turn regulate expression of cellulosomal genes (Nataf et al., 2010). Curiously, only the GH10-containing regulator had detectable catalytic activity, while the GH5-containing regulator had an amino acid residue substitution in the active site rendering it catalytically inactive (Bahari et al., 2011). The latter GHs were thus proposed to act as CBMs rather than enzymes. This cellulosome-related regulation system has important implications for strain development, because it could potentially be used to modulate cellulosomal components in vivo during CBP.

Engineered cellulosomes

As the cohesin–dockerin interaction is generic at the species level for cellulosomes (see above), this presents a useful platform on which to build synthetic cellulosomes. Many studies concerning in vitro cellulosome composition have been carried out with C. thermocellum (Uversky & Kataeva, 2006; Maki et al., 2009; Elkins et al., 2010a; Zhang, 2011). Such studies bode well for engineering more efficient platforms for plant biomass degradation, and these are summarized here.

In a recent study (Gefen et al., 2012), hydrolysis performance of cellulosomes acting on microcrystalline cellulose and pretreated switchgrass was improved by protein engineering. A normally intracellular β-glucosidase was fused to a CohII module and incorporated into the scaffoldin at the C-terminal DocII site. This allowed the native Ct-cellulosome components to attach to type I cohesin sites on the CipA scaffoldin. The hydrolysis of cellobiose by β-glucosidase can reduce product inhibition on cellobiohydrolase enzymes of the cellulosome (Lamed et al., 1991) and other cellulases (Sternberg et al., 1977). The cellulosome-type II cohesin-β-glucosidase complex was shown to degrade cellulose substrates at a faster rate and to a higher overall degree than either the native cellulosome or an enzyme mixture of cellulosomes and wild-type (free) C. thermocellum β-glucosidase. Catalytic efficiency was increased threefold for microcrystalline cellulose and twofold for switchgrass (Gefen et al., 2012). Improving the performance of individual cellulosomal catalytic subunits is another way cellulosomes can be improved through engineering. A recent example is the engineering of a more thermostable Cel8A using error-prone PCR and directed evolution, which increased the thermostability of Cel8A by 6.2 °C while maintaining enzymatic activity (Anbar et al., 2010). A follow-up study also aimed to increase the thermotolerance of the very same enzyme, using targeted mutations in three consensus residues, further increasing thermostability by 9.5 °C (Anbar et al., 2012).

In addition to computer modeling and detailed structural information, other tools are being used to learn more about cellulosome function. For example, a protein-based microarray was developed that was designed to assay the specificity of dockerin modules for cohesin modules using 28 known cohesins spanning six bacterial species and one archaeon (Haimovitz et al., 2008). Another cellulosome-specific tool developed recently is a modification of protein affinity tag chromatography, consisting of a shortened dockerin module as bait for cohesin modules (Demishtein et al., 2010). The approach was developed to improve the use of the wild-type cohesin–dockerin system (Craig et al., 2006), in which elution of the bound dockerin-tagged protein is difficult. A similar approach for reducing Ca2+-dependent, cohesin–dockerin binding was recently reported in which dockerin residues of the Ca2+-binding motif were mutated (Kamezaki et al., 2010). More attention should be paid in the future to better utilize the special characteristics of the cohesin–dockerin interaction for applied biotechnological purposes. The potential of this interaction was recognized almost two decades ago (Bayer et al., 1994), but very little has been accomplished throughout the years to realize its true biotechnological capacity.

Why are cellulosomes not found in hyperthermophiles?

Cellulosomal complexes are limited to only anaerobic bacteria growing at temperatures optimally at or below 60 °C (Blumer-Schuette et al., 2008). Some species of Thermotoga that grow optimally near 80 °C produce a broad spectrum of extracellular enzymes related to degrading plant biomass, but they lack cellulase activity, consistent with the fact that a hyperthermophilic cellulosome has not been found. It has been shown that cohesin- and dockerin-like structures are present in some hyperthermophilic Archaea, but again a cellulase complex capable of degrading crystalline cellulose is not present (Shoham et al., 1999; Peer et al., 2009). The genomes of such organisms usually encode for only one or a few of these modules, often combined in one protein, and are found in genomes of non(hemi)cellulolytic species. In particular, two consecutive genes identified in Archaeoglobus fulgidus encode for proteins with a putative cohesin module or both cohesin- and dockerin-like sequences (Bayer et al., 1999; Voronov-Goldman et al., 2011). Their function has been confirmed biochemically, as both recombinant cohesin modules recognize the lone dockerin. However, this recognition was observed at room temperature (Haimovitz et al., 2008), and it is not clear whether these modules interact at 80 °C, the optimum growth temperature of the organism. Thermophiles growing above 60 °C produce (hemi)cellulolytic enzymes with more complex modular structures than those from bacteria growing below 60 °C (Doi, 2008; Zverlov & Schwarz, 2008a; Xu et al., 2011). A characteristic feature of these enzymes is the presence of long linkers separating individual modules. It is possible that at ultra-high temperatures, combinations of CAZy modules can only be held together in a single polypeptide chain by covalent bonding and not by hydrophobic interactions that hold the cohesin–dockerin types. 
For example, it has been demonstrated with modules from C. cellulolyticum that the cohesin–dockerin complex can dissociate under elevated temperatures (Mingardon et al., 2007). In microorganisms growing at higher temperatures, long linkers enable multimodular proteins to remain intact under such conditions. We speculate that the tenacious, noncovalent binding provided by the cohesin–dockerin interaction is functional only up to about 60 °C, although more studies are needed to confirm this.

Thermophilic CBP paradigm 1: Clostridium thermocellum

The use of C. thermocellum as a CBP organism has been reviewed previously (Lynd et al., 2002; Demain et al., 2005; Carere et al., 2008; Maki et al., 2009; Olson et al., 2012), and there have been a number of reviews of its cellulosome system (Garcia-Martinez et al., 1980; Sheehan & Himmel, 1999; Schwarz, 2001; Bayer et al., 2004; Gilbert, 2007; Fontes & Gilbert, 2010; Himmel et al., 2010). These earlier reviews covered many aspects that will not be discussed further here, including the requirement to move beyond fossil fuels, microbial cellulose utilization, biomass composition, ecological aspects of cellulose breakdown, enzymes, media, processes that would use co-cultures, carbon source utilization and uptake, bioenergetics and kinetics, regulation, and targets for strain development (e.g. acetate kinase, phosphotransacetylase and hydrogenase; Lynd et al., 2002). However, in the last several years, there have been important advances in DNA sequencing technologies, other technologies such as mass spectrometry, and in tool development for genetic manipulation that will enable more rapid advances in the future. This section will highlight the most recent advances in thermophilic lignocellulose deconstruction using C. thermocellum. Later sections will compare and contrast the properties of this organism with those of Caldicellulosiruptor spp. as a different model for thermophilic lignocellulose deconstruction. As discussed above, C. thermocellum naturally produces its own powerful enzymes that form large extracellular complexes, or cellulosomes, for biomass deconstruction. It has been observed that C. thermocellum (Topt 60 °C), as well as Caldicellulosiruptor spp. (Topt up to 78 °C), have higher maximum specific growth rates on crystalline cellulose compared to mesophilic organisms (Lynd et al., 2002). However, native C. thermocellum strains are not as prolific ethanologens as yeast or bacteria, such as Zymomonas mobilis, and C. 
thermocellum growth is limited to hexose (six-carbon) sugars (Demain et al., 2005). The advent of new tools and models offers the prospect of making faster progress on required improvements in fuel molecule yield, titer, productivity, and the possibility of adding genes for pentose uptake and utilization or to form other products.

Clostridium thermocellum genome sequence

The genome sequence of C. thermocellum strain ATCC 27405 was the first generated for this species. The first draft sequence was available to the public in November 2003; however, repetitive sequences, such as transposases and those present in cohesin modules, made closing this genome challenging, and the genome sequence was not finished until February 2007. The gene modeling prediction program Prodigal (Hyatt et al., 2010) was recently applied to the C. thermocellum ATCC 27405 genome sequence, and a total of 3173 candidate protein-coding sequences (CDSs) were predicted in the latest annotation of the C. thermocellum ATCC 27405 genome record (CP000568.1). A comprehensive gene-by-gene comparison of annotation versions has been made available. A number of other microbial genomes have been updated in a similar fashion (Yang et al., 2009a; Hauser et al., 2011), and it is likely that further improvements will continue to be made as algorithms improve and new features, such as small regulatory RNAs, are discovered and described.

Twenty genome sequences for clostridial species spanning multiple genera were recently released (Hemme et al., 2010), two of which were for C. thermocellum strains JW20 (DSM 4150) and LQRI (DSM 2360). The GenBank records show that the genomes for strains JW20 and LQRI are noncontiguous finished genomes (Chain et al., 2009), consisting of 21 and 110 contigs, respectively. Finished and noncontiguous finished genomes have been reported for C. thermocellum DSM 1313 (Feinberg et al., 2011) and C. thermocellum YS and AD2 (Brown et al., 2012), respectively. The ATCC 27405 genome sequence has been used to design PCR primers for related strain DSM 1313, indicating the strains are closely related (Tripathi et al., 2010).

The availability of genome sequences for different strains (Brown et al., 2012) may allow for new insights into, and interpretation of, cellulose conversion. Clostridium thermocellum strain YS and the derived cellulose adhesion-defective mutant (AD2) had critical roles in describing the original cellulosome concept (Bayer et al., 1983; Lamed et al., 1983a, b), in which cellulases and related polysaccharide-degrading enzymes are bundled in ordered, high molecular weight, cellulolytic enzyme complexes. Clostridium thermocellum DSM 1313 is the background strain for a newly developed genetic system. Multiple genome sequences will allow more refined bioinformatics predictions for genes, operons, and cis-regulatory motifs in the future.

Molecular genetic tools for C. thermocellum

The ability to genetically manipulate bacteria is essential both for understanding a microorganism's physiology and regulation and for developing strains with industrial applications. Until recently, C. thermocellum mutants had been created by random mutagenesis and selection for phenotypes, such as loss of the ability to bind to crystalline cellulose (Bayer et al., 1983; Zverlov et al., 2008), and ethanol tolerance (Herrero & Gomez, 1980; Williams et al., 2007). However, in the last several years, considerable progress has been made in clostridial genetics. Gene knock-out and knock-in systems have been developed for mesophilic members of the genus Clostridium (Heap et al., 2007; Tolonen et al., 2009; Heap et al., 2010; Heap et al., 2012), deletion of key genes involved in organic acid formation has facilitated higher ethanol titers by the thermophile T. saccharolyticum (Shaw et al., 2008), and more recently progress has been made in C. thermocellum. Restriction-modification systems appear to be important barriers for efficient clostridial transformation, and restriction endonuclease activity has been identified in C. thermocellum ATCC 27405 (Klapatch et al., 1996). Electroporation was reported for C. thermocellum, and higher efficiencies were found for C. thermocellum DSM 1313 compared with C. thermocellum ATCC 27405 (Tyurin et al., 2004; Tyurin et al., 2005), likely due to differences in their restriction-modification systems (Guss et al., 2012). Clostridium thermocellum strains display a high degree of similarity, both through synteny and homology, at the genome level (Fig. 6), which reflects that these strains are closely related. Detailed methodologies have been provided for C. thermocellum transformations, and one key recommendation is that strain DSM 1313 be used over strain ATCC 27405, because low and variable transformation efficiencies have been observed for the latter (Olson & Lynd, 2012b).

Figure 6.

Clostridium thermocellum comparative genomic analysis. The BLAST Ring Image Generator (BRIG) software (Alikhan et al., 2011) was used to compare C. thermocellum strains whose genome sequences were available. Different genome sequences, indicated by colors within concentric circles, are as follows from the outermost circle inward: red, strain ATCC 27405 (GenBank accession NC_009012); yellow, strain AD2 (GenBank accession AJGS00000000); blue, strain YS (GenBank accession AJGT00000000); indigo, strain LQRI (DSM 2360, GenBank accession ACVX00000000); teal, strain JW20 (GenBank accession ABVG00000000); violet, strain DSM 1313 (GenBank accession NC_017304); GC content; GC skew (+, green; −, purple). All C. thermocellum genomes were compared to strain ATCC 27405, and unique genetic loci found in ATCC 27405 are labeled in red: 1, Cthe_0512 : Cthe_0526; 2, Cthe_1114 : Cthe_1161; 3, Cthe_1591 : Cthe_1750; 4, Cthe_2455 : Cthe_2503; 5, Cthe_3201 : Cthe_3221; 6, Cthe_3232 : Cthe_3238. We thank Sagar Utturkar (Graduate School of Genome Science and Technology, University of Tennessee) for assistance in figure preparation.

The thermophilic lifestyle of C. thermocellum, as well as other cellulolytic organisms such as Caldicellulosiruptor spp., precludes the use of many antibiotic markers and has prevented the application of approaches used for mesophilic organisms. Several gene deletion strategies, based on homologous recombination, and the use of auxotrophic markers have been described recently for C. thermocellum DSM 1313 (Olson et al., 2010; Tripathi et al., 2010; Argyros et al., 2011; Olson & Lynd, 2012b). Initial C. thermocellum genes targeted for deletion included cel48S (also known as CelS, SS, and S8), which encodes a GH48 cellulase that has been shown to be the most abundant enzymatic cellulosome subunit (Lamed et al., 1983a; Wu et al., 1988; Zverlov et al., 2005b; Gold & Martin, 2007; Raman et al., 2009; Olson et al., 2010), and the phosphotransacetylase (pta) gene, deleted to produce a strain that did not produce acetate (Tripathi et al., 2010). A double deletion strain was also created to remove the lactate dehydrogenase (ldh) and phosphotransacetylase (pta) genes (Argyros et al., 2011).

In one approach, termed ‘allelic replacement’, the pyrF gene was deleted from the genome of C. thermocellum strain DSM 1313 by the common approach of using the toxic uracil analog 5-fluoroorotic acid (5-FOA) and selection to create a uracil auxotrophic ΔpyrF background strain (Tripathi et al., 2010). The pyrF gene encodes orotidine 5′-phosphate decarboxylase, which is involved in de novo pyrimidine biosynthesis. Auxotrophs can be rescued with the addition of uracil or if a plasmid supplies a functional pyrF gene, which can then be used as positive selection. To create a deletion mutant, a plasmid is constructed that contains the pyrF gene, and DNA regions that flank the target are amplified and inserted on either side of the cat gene (chloramphenicol acetyl-transferase). The cat gene provides resistance to chloramphenicol and thiamphenicol. Initially, when introduced into C. thermocellum and selected for, the plasmid integrates at one of the homologous DNA sites via a single crossover and thiamphenicol selection for the cat gene. Marker exchange via a second recombination is subsequently forced by 5-FOA selection against pyrF, which leaves a deletion strain containing the cat gene in place of the wild-type locus.
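The two selection steps of allelic replacement can be summarized as a small truth-table model. The sketch below is only an illustration of the selection logic described above, not a protocol: the function name, the set-based genotype encoding, and the media compositions are hypothetical, and the uracil supplement in the second step is an assumption made because the resulting strain lacks pyrF.

```python
# Toy model of the selection logic in pyrF-based allelic replacement.
# Gene names follow the text; the encoding and media sets are hypothetical.

def grows(genotype, medium):
    """Return True if a strain carrying the given functional genes is
    expected to grow on a medium described by its set of additives."""
    # Thiamphenicol kills cells that lack the cat resistance gene.
    if "thiamphenicol" in medium and "cat" not in genotype:
        return False
    # 5-FOA is toxic to cells that carry a functional pyrF gene.
    if "5-FOA" in medium and "pyrF" in genotype:
        return False
    # Cells without pyrF are uracil auxotrophs and need supplied uracil.
    if "pyrF" not in genotype and "uracil" not in medium:
        return False
    return True


# Genotypes at each stage (sets of functional genes of interest).
background = {"target"}                        # delta-pyrF strain, target intact
single_crossover = {"target", "cat", "pyrF"}   # whole plasmid integrated
double_crossover = {"cat"}                     # target replaced by cat, pyrF lost

# Media for the two selection steps (step 2 uracil supplement is assumed).
step1 = {"thiamphenicol"}
step2 = {"5-FOA", "uracil"}

if __name__ == "__main__":
    for name, strain in [("background", background),
                         ("single crossover", single_crossover),
                         ("double crossover", double_crossover)]:
        print(name, "step 1:", grows(strain, step1),
              "step 2:", grows(strain, step2))
```

In this model only the integrant passes the first step and only the marker-exchanged strain passes the second, which is the outcome the two selections are designed to enforce.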

A second method has been developed to create C. thermocellum markerless deletion strains (Argyros et al., 2011). The plasmid design to create a markerless C. thermocellum deletion strain differs in that 5′ and 3′ regions of homology are adjacent to one another on the initial plasmid construct; the cat and hpt genes are typically adjacent in a 3′ position, and a third region of homologous chromosomal DNA is downstream. Plasmid DNA integrates into the chromosome under thiamphenicol selection, and the tdk and hpt counterselection markers are also inserted. Negative selection then proceeds in several rounds using the toxic analogs 5-fluoro-2′-deoxyuridine (FuDR) and 8-azahypoxanthine (8AZH) against the tdk and hpt markers, respectively, and results in the creation of a markerless deletion strain. The use of hpt as a negative-selectable marker requires an hpt- background, which must be created initially using 8AZH if the genome encodes the gene (Argyros et al., 2011). Similar C. thermocellum fermentation end product deletion strains were characterized and shown to secrete free amino acids (van der Veen et al., 2013). It was hypothesized that greater production of amino acids may have been employed to ameliorate intracellular redox imbalance by recycling NADP(+).

In the last several years, a variety of tools have been developed for C. thermocellum. Complementation is required to confirm genotypic characterization of allelic replacement or markerless deletion mutant strains. A temperature-sensitive C. thermocellum plasmid has also been developed (Olson & Lynd, 2012a). The SNAP-tag system was used to fluorescently label C. thermocellum proteins to investigate dockerin interactions and cellulosome composition (Waller et al., 2013). While this system may be useful as a reporter system in the future, it is likely that further optimization will be required. Finally, one of the most exciting recent developments is the use of a mobile group II intron from the thermophilic cyanobacterium Thermosynechococcus elongatus to construct a thermophilic ‘targetron’ system (Mohr et al., 2013). The mesophilic mutagenesis system based on the mobile group II intron has been a very powerful tool used to rapidly mutate genes in a variety of bacteria (Heap et al., 2007). The new thermophilic version has been applied to C. thermocellum and offers great promise to rapidly mutate and characterize genes of interest.

Ethanol tolerance

Progress has been made recently in characterizing inhibitory compounds found in pretreated lignocellulosic biomass hydrolysates and identifying genes involved in overcoming the inhibitors. There is often more than one pathway to achieve tolerance in a range of microorganisms (Liu, 2006; Almeida et al., 2007; Nevoigt, 2008; Mills et al., 2009; Pienkos & Zhang, 2009; Dunlop, 2011). Tolerance mechanisms include efflux pumps, heat-shock proteins, membrane modifications, and other stress responses (Dunlop, 2011). So far, only limited progress has been made in understanding the genetic basis of inhibitor tolerance (Stephanopoulos, 2007). Hence, the molecular basis for C. thermocellum tolerance to a range of inhibitors present in pretreated biomass is unknown, and the extent to which different genes or traits (e.g. hypercellulase production; Mori, 1990) can be combined remains to be seen.

High product titer is important industrially, especially as it relates to capital and processing costs (Stephanopoulos, 2007). As briefly discussed earlier, while C. thermocellum has an elaborate and powerful enzyme system to break down biomass, it is sensitive even to low ethanol concentrations (16 g L−1; Herrero & Gomez, 1980). Clostridium thermocellum strain SS22 was tolerant to 64 g L−1 of ethanol, and ethanol concentrations between 27 and 50 g L−1 inhibited the growth of strains A1, C9, and S7 to about half that of wild-type levels (Lynd et al., 2002); maximal levels of 80 g L−1 of ethanol have been tolerated by adapted strains (Williams et al., 2007). However, the highest reported concentration of ethanol produced by C. thermocellum is < 30 g L−1 (Rani et al., 1996). Ethanol and hydrogen yields of ~ 0.6 and 1.3 mol mol−1 hexose, respectively, have been reported, and it is recognized that these values are well below the ‘Thauer limit’ of 2 moles of ethanol or 4 moles of hydrogen per mole hexose (Thauer et al., 1977; Rydzak et al., 2009; Rydzak et al., 2012). Discrepancies between the tolerance and productivity of isobutanol by Escherichia coli have been observed, and the difference between the highest concentration of a compound that is tolerated by a microorganism and the maximum concentration that it can produce is referred to as the ‘titer gap’ (Olson et al., 2012).
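To make the scale of these shortfalls concrete, the short calculation below expresses the reported yields as fractions of the Thauer limit and estimates the ethanol titer gap from the figures quoted above. It is a rough sketch: the function and variable names are hypothetical, the yields are approximate values from the text, and because the highest produced titer is reported only as < 30 g L−1, the gap computed from 30 g L−1 should be read as a lower bound.

```python
# Back-of-the-envelope comparison of reported yields with the Thauer limit.
# All numbers come from the text; names are illustrative only.

THAUER_LIMIT = {"ethanol": 2.0, "hydrogen": 4.0}     # mol per mol hexose
REPORTED_YIELD = {"ethanol": 0.6, "hydrogen": 1.3}   # approximate, mol per mol hexose


def fraction_of_limit(product):
    """Reported yield divided by the theoretical maximum for that product."""
    return REPORTED_YIELD[product] / THAUER_LIMIT[product]


def titer_gap(tolerated_g_per_l, produced_g_per_l):
    """Highest tolerated concentration minus highest produced concentration."""
    return tolerated_g_per_l - produced_g_per_l


if __name__ == "__main__":
    for product in THAUER_LIMIT:
        print(f"{product}: {fraction_of_limit(product):.1%} of the Thauer limit")
    # 80 g/L tolerated by adapted strains; produced titer is below 30 g/L,
    # so the true gap is larger than this value.
    print("ethanol titer gap (lower bound, g/L):", titer_gap(80.0, 30.0))
```

On these figures, both products are made at roughly one-third of their theoretical maximum, and adapted strains tolerate at least 50 g L−1 more ethanol than any strain has been reported to produce.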

Ethanol tolerance and inhibition are typically complicated, incompletely understood traits, and it has been suggested that ethanol tolerance is unlikely to be a single gene trait (Ingram, 1990; Stephanopoulos, 2007). Ethanol and other solvents alter membrane fluidity, and membrane integrity is recognized as a key factor in ethanol tolerance (Ingram, 1990). Williams et al. (2007) took a targeted proteomics gel-based approach to investigate differences in membrane proteins between wild-type C. thermocellum and an ethanol-tolerant or ethanol-adapted (EA) culture in one of the first C. thermocellum ‘omic’ studies after the initial genome sequence became available. Eighty-one proteins were identified, the majority (73%) of which were encoded by genes that were down-regulated in the EA strain, and many of these were related to carbohydrate transport and metabolism. About one-third of the proteins that showed greater abundance in EA had roles in chemotaxis and signal transduction. This study suggested that EA membrane-associated proteins were being made in lower amounts or were improperly integrated into the membrane. Timmons et al. (2009) determined the wild-type and EA fatty acid compositions and their membrane anisotropy and noted that EA had more fatty acids with chain lengths > 16 : 0 and a significantly greater proportion of 16 : 0 plasmalogens. They proposed a model in which EA ethanol tolerance is due to fatty acid alterations that increase membrane rigidity and counteract the fluidizing effect of ethanol (Timmons et al., 2009). These and earlier studies were conducted before inexpensive DNA sequencing techniques became readily available and before genetic manipulation of C. thermocellum was possible.

The genome sequences were determined for C. thermocellum wild-type strain ATCC 27405 and the previously derived EA culture (Brown et al., 2011). Mutational hot spots were identified in the EA mutant genome sequence, and many were found in genes related to cellulose breakdown (Brown et al., 2011), which is consistent with its poor growth on crystalline cellulose (Williams et al., 2007). The EA alcohol dehydrogenase gene, encoded by adhE, had several nonsynonymous mutations, and this enzyme catalyzes the final step in ethanol biosynthesis (Brown et al., 2011). A combination of genetics, biochemistry, and structural studies showed that the mutant version of the AdhE protein had altered cofactor specificity that resulted in loss of NADH-dependent activity and associated gain of NADPH-dependent activity. The mutant version of EA adhE was found to be an important genetic determinant for the enhanced C. thermocellum ethanol tolerance and was not predicted a priori. The nicotinamide co-factor binding site of the ADH domain was also mutated in two independent, ethanol-tolerant C. thermocellum mutants, named E50A and E50C (Shao et al., 2011). The selection of the E50A and E50C mutant strains was notable, as the strategy alternated between increasing ethanol concentrations and selection pressure relief, which resulted in tolerant strains that grew similarly to the wild-type strain or better. Genetic studies are required to test whether or not the mutant versions of AdhE found in the E50A and E50C strains confer a similar advantage in ethanol tolerance as for strain EA and adhE*. Additional biochemical and structural studies of AdhE are also required to investigate the molecular mechanisms of ethanol tolerance in greater detail. 
Others have also been able to show that single gene targets can be identified for improved alcohol tolerance in Saccharomyces cerevisiae (Hong et al., 2010), and the link (if any) between metabolic inhibition and prior observations that ethanol intolerance is due to a loss of membrane integrity deserves more consideration. Nevertheless, observations that single gene targets can be identified and manipulated for improved industrially relevant traits will be useful for later rational engineering purposes.

The identification of important genes and mutations for traits of interest feeds into the concept of standard ‘parts lists’ and synthetic biology. Key aspects of synthetic biology include the establishment of a ‘parts list’ with detailed information on function and predictable behavior; an array of promoters for control and fine-tuning of expression and regulation; sensors or switches that sense and respond to different cues; and techniques that allow rapid and efficient variations or permutations for pathways or products of interest using synthetic DNA and mathematical, rational, or engineering principles (Lee et al., 2008; Carr & Church, 2009; Lu et al., 2009; Clomburg & Gonzalez, 2010; Blaby-Haas & de Crécy-Lagard, 2011; Cho et al., 2011; Du et al., 2011; Dunlop, 2011; Lee et al., 2011; Pasotti et al., 2012). There is much promise for systems-level modeling and biodesign; however, the field of synthetic biology is nascent and many more studies are required before the promise of synthetic biology is fully realized for microorganisms like C. thermocellum.

Physiology and systems biology studies

The rapid expansion of next-generation DNA sequencing technologies, access to genome sequences, and increased sensitivities of mass spectrometers have enabled the emerging fields of systems biology and genome-scale metabolic models. ‘Omics’ tools have been put to good use in the study of thermophilic, microbial lignocellulose deconstruction.

Clostridium thermocellum transcriptomics

A C. thermocellum ATCC 27405 whole-genome DNA microarray was constructed (Brown et al., 2007) and updated to evaluate transcriptomes at different time points during batch crystalline cellulose fermentations (Raman et al., 2011). Cell growth slowed as the cells entered stationary phase during batch fermentations on Avicel, along with down-regulation of genes involved in glycolysis and energy generation, translation, amino acid, nucleotide, and coenzyme metabolism (Raman et al., 2011). In contrast, higher transcription levels were noted for genes involved in chemotaxis, flagella biosynthesis, signal transduction, transcription, and cellulosomal genes, perhaps as cells began to search for more favorable conditions.

Growth conditions are important to cellulosome composition (Bayer et al., 1985; Lynd et al., 2002; Demain et al., 2005), and the production of C. thermocellum cellulases has been investigated from cellular and cellulose yield perspectives (Zhang & Lynd, 2005). A global gene expression study examined transcript profiles for C. thermocellum ATCC 27405 grown with either cellobiose or crystalline cellulose in continuous culture at several different dilution rates. A subset of 348 genes had expression profiles responding to the different substrates and growth rates (Riederer et al., 2011). GHs, scaffoldins, and other proteins involved in cellulose utilization were among the most highly expressed genes. Each of these global transcript profiling studies provided insights into energy generation, redox balance, and cellular systems, including cellulosomal genes. In a number of cases, results from earlier studies that indicated cellulosomal genes were among the most highly expressed in C. thermocellum were confirmed. The location of Cthe_0271 in a region with other genes for cellulose utilization suggested a function for this putative protein (Riederer et al., 2011). Cthe_2089 (celS) was the second most abundant transcript in this chemostat study. Differential gene expression, based on either substrate or growth rate, indicates that C. thermocellum has mechanisms for fine control of expression of key genes under different conditions. A number of genes encoding hypothetical proteins were highly expressed, and regulatory genes were also characterized (Riederer et al., 2011).

Genes and proteins related to nitrogen uptake and metabolism were among those most highly up-regulated in C. thermocellum after ethanol shock, as the cells attempted to overcome inhibition and resume growth (Yang et al., 2012). Interestingly, the C. thermocellum ureABCDEFG genes were expressed in T. saccharolyticum, which permitted a higher titer of T. saccharolyticum ethanol production (Shaw et al., 2012). Different nitrogen sources affect C. thermocellum growth and cellulase production, with urea giving optimal endo-β-glucanase production in cellobiose fermentations (Garcia-Martinez et al., 1980); this may point to possible routes for C. thermocellum metabolic engineering in the future (Yang et al., 2012). The global study into ethanol stress and control conditions was an important initial look into solvent inhibition from a systems perspective (Yang et al., 2012).

Clostridium thermocellum proteomics

Advances in mass spectrometry have led to improvements in mass accuracy, sensitivity, quantification, and the number of proteins identified in complex samples [for a recent proteomics review see Walther & Mann (2010) and other cited reviews within]. After the ATCC 27405 draft genome sequence became available, it was analyzed and at least 71 putative cellulosomal genes were identified that also had dockerin modules (Zverlov et al., 2005b). Clostridium thermocellum F7 was grown on cellulose, and its cellulosomes were isolated and purified by an affinity digestion method (Morag et al., 1995). Subsequently, proteins were separated using two-dimensional gel electrophoresis and 13 were identified by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. The affinity digestion procedure takes advantage of natural cellulosome release late in exponential growth and in stationary phase for a discrete examination of cellulosomes. In a subsequent proteomics study, the number of cellulosomal components identified was expanded to 32 proteins in C. thermocellum ATCC 27405 fermenting cellulose, cellobiose, cellulose/xylan, or barley β-glucan (Zverlov & Schwarz, 2008b). The aim of these studies was to elucidate cellulosome composition, including the approximate abundance of different subunits, to better understand the system and possibly recreate synthetic or designer cellulosome systems.

A metabolic isotope-labeling strategy was employed, in conjunction with liquid chromatography–tandem mass spectrometry (LC-MS/MS), to measure the relative makeup of C. thermocellum ATCC 27405 cellulosome subunits from cells grown on either cellulose or cellobiose (Gold & Martin, 2007). The detection of 41 cellulosomal proteins represented the highest number of cellulosomal proteins reported at that time. Among the proteins were 36 type I dockerin-containing proteins and 16 novel subunits. Spectral counts were normalized for scaffoldin protein, CipA, and greater numbers of anchor protein OlpB, exoglucanase Cel48S, Cel9K, and the GH9/GH44 endoglucanase CelJ were measured for cellulose-grown cellulosomes compared with cellobiose-grown cellulosomes. The endoglucanases from GH families GH8 (CelA) and GH5 (CelB, CelE, CelG) and hemicellulases (Xyn11A, Xyn10C, Xyn10Z, and Xgh74A) were found at lower levels for cellulose-grown cellulosomes. Components of the cellulosome appeared to be regulated and tailored to different growth requirements, consistent with earlier studies (Lynd et al., 2002). In a similar study, C. thermocellum ATCC 27405 cellulosome composition was investigated for an expanded range of substrates and included proteomic analysis of C. thermocellum cellulosomes from cells grown on dilute-acid pretreated switchgrass, a model cellulosic feedstock for ethanol production (Raman et al., 2009). Growth substrates included crystalline cellulose (Avicel), amorphous cellulose (Z-Trim®), cellobiose, combinations of cellulose with pectin and xylan, and switchgrass. Proteomic analysis identified a total of 59 dockerin- and 8 cohesin-containing subunits that included 16 previously undetected cellulosomal proteins. These studies provided important insights into essential cellulosome proteins and different compositions under a variety of conditions that may prove useful for designer cellulosome or industrial enzyme preparations in the future.
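The idea of normalizing spectral counts to the CipA scaffoldin before comparing growth substrates can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the counts below are invented purely for illustration and are not data from Gold & Martin (2007), the function names are hypothetical, and a simple per-sample ratio to CipA is assumed as the normalization, which may differ from the procedure used in the study.

```python
# Sketch of scaffoldin-normalized spectral counts compared across substrates.
# The counts are made-up illustrative values, NOT published measurements.
import math


def normalize_to_scaffoldin(counts, scaffoldin="CipA"):
    """Divide every spectral count by the scaffoldin count in the same sample."""
    reference = counts[scaffoldin]
    if reference <= 0:
        raise ValueError("scaffoldin count must be positive")
    return {protein: count / reference for protein, count in counts.items()}


def log2_ratio(sample_a, sample_b, protein):
    """log2 of the normalized abundance in sample_a relative to sample_b."""
    return math.log2(sample_a[protein] / sample_b[protein])


# Hypothetical raw spectral counts for two growth substrates.
cellulose_counts = {"CipA": 200, "Cel48S": 300, "CelA": 40}
cellobiose_counts = {"CipA": 100, "Cel48S": 100, "CelA": 60}

cellulose = normalize_to_scaffoldin(cellulose_counts)
cellobiose = normalize_to_scaffoldin(cellobiose_counts)

if __name__ == "__main__":
    for protein in ("Cel48S", "CelA"):
        ratio = log2_ratio(cellulose, cellobiose, protein)
        print(f"{protein}: log2(cellulose/cellobiose) = {ratio:+.2f}")
```

Expressing each subunit per unit of scaffoldin removes differences in the total amount of cellulosome recovered from each culture, so a positive log2 ratio reflects a change in composition rather than in yield.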

Several recent system-level studies have examined C. thermocellum ATCC 27405 proteomes using LC-MS/MS approaches. Shotgun proteomics profiles for one time point for untreated control and ethanol-treated cells were generated as part of a larger transcriptomic and metabolomics study (Yang et al., 2012). More than 1300 proteins were detected, which represented the largest number of C. thermocellum proteins detected at that time, and, of these, 77 proteins were present at significantly different levels. Cellulosome proteins were not differentially expressed under the conditions used in this study. Among the proteins that were most highly up-regulated (8- to 50-fold) after ethanol treatment were proteins for nitrogen metabolism, such as Cthe_0197 (glutamine amidotransferase, class-II) and Cthe_0198 (glutamate synthase; NADPH), and others, such as Cthe_1823 (extracellular ligand-binding receptor) and Cthe_0200 (FAD-dependent pyridine nucleotide-disulfide oxidoreductase). For this time point comparison, the corresponding genes (Cthe_0197–8, Cthe_1823, and Cthe_0200) were also up-regulated. Cthe_0087 (maf protein), Cthe_3183 (TrkA-N), Cthe_0395 (RbsD or FucU transport), and Cthe_0715 (adenosylmethionine decarboxylase) were down-regulated 25- to 50-fold at the protein level in the treated cells. In this case, only Cthe_0395 and Cthe_0715 were significantly down-regulated at the transcript level. The incomplete correlation between transcript and protein levels can be attributed in part to inherent differences in technology (Hack, 2004) and in part to physiology, including post-translational modification and transcript or protein turnover rates (Cox et al., 2005). Taken together, these observations support the application of systems-level studies to obtain a global view of an organism's metabolic networks.

A detailed proteomic analysis of the core metabolism of C. thermocellum ATCC 27405 was undertaken recently using an LC-MS/MS-based approach that also incorporated the use of isobaric tags for relative and absolute quantitation (Rydzak et al., 2012). This proteomic study adds to several elegant studies that investigated C. thermocellum ATCC 27405 enzyme profiles for pyruvate catabolism and end product formation during different growth phases and after adding different end products (i.e. H2, acetate, ethanol, formate, and lactate; Rydzak et al., 2009; Rydzak et al., 2011). Enzyme profiles and bioinformatics analyses of key proteins were determined under different growth conditions and led to a proposed pathway for pyruvate catabolism and end product formation (Rydzak et al., 2009). In a subsequent study, the addition of end products was shown to shift C. thermocellum metabolism (Rydzak et al., 2011): exogenous ethanol and lactate led to increases in H2 and acetate yields, whereas ethanol yields increased when exogenous hydrogen (H2), acetate, or lactate was applied. However, key enzymes did not necessarily correlate with differences in final product yields, suggesting that end product yield changes could be directed by thermodynamic considerations rather than enzyme levels. Core proteins for glycolysis, glycogen metabolism, pentose phosphate pathway (PPP), and enzymes for the conversion of phosphoenolpyruvate to end products were measured, and their relative abundance indexes and differences in expression levels were compiled (Rydzak et al., 2012). This study represents an important resource or foundation against which different mutant strains or conditions can be compared in the future.

Gene regulation in C. thermocellum

There has been extensive study of the cellulosome over the years. However, relatively few studies have looked into molecular mechanisms responsible for sensing and exerting coordinate regulatory control over C. thermocellum physiology under different conditions. The C. thermocellum ATCC 27405 genome encodes three members of the LacI family of transcriptional regulators, and one – glyR3 (Cthe_2808) – has been characterized in some detail (Newcomb et al., 2007). glyR3 is co-transcribed with celC (GH5 family cellulase, Cthe_2807) and licA (GH16 family lichinase, Cthe_2809); this operon represents the first example of a characterized C. thermocellum transcriptional regulator of GH genes. The GlyR3 protein has been shown to bind to an 18-bp near-perfect palindrome sequence in the celC promoter region for repression of celC, which is reversible by laminaribiose, a β-1,3-linked glucose dimer that likely signals β-1,3-glucan availability. Clostridium thermocellum is able to grow using laminaribiose or laminarin as sole carbon sources. Four C. thermocellum ABC transporters for β-1,4-linked glucose oligomers (cellodextrins) and one for β-1,3-linked glucose dimer (laminaribiose) were identified and their substrate specificities characterized (Nataf et al., 2009).

The region adjacent to the celC operon was examined, and transcripts for the celC–glyR3–licA–orf4–manB–celT region were mapped using Northern blot analyses and transcription initiation sites determined by primer extension (Newcomb et al., 2011). Under the conditions assayed, Northern analysis indicated co-transcription of celC–glyR3–licA, and transcript mapping using real-time RT-PCR analysis showed the manB-celT (cellulosomal genes) were co-transcribed. celC, glyR3, and licA expression levels were highest in late exponential phase, when cells were grown on laminarin, and ≥ 2.5-fold higher than for cellobiose or cellulose growth.

Sigma factors are crucial components of the bacterial RNA polymerase, and they have integral roles in transcriptional regulatory networks through their selectivity to different promoter sequences and the genes they control. As briefly mentioned above, genomic analysis of C. thermocellum identified anti-σ factors that had strong similarity to the Bacillus subtilis membrane-associated anti-σ factor RsgI over the N-terminal domains (Kahel-Raifer et al., 2010). At the C-termini, six C. thermocellum RsgI-like proteins had predicted CBMs, which were novel to this species and absent in cellulolytic clostridia. The genes are arranged in several putative bi-cistronic operons with each encoding an RsgI-like anti-σ factor and a putative alternative sigma factor σI (SigI). A novel regulatory model for cellulosomal gene expression was proposed, such that extracellular biomass components are sensed by the CBM on the RsgI. This in turn transmits the signal to the intracellular SigI to activate expression of the suitable CAZyme genes to deconstruct the appropriate biomass. Dissociation constants were in the range of 0.02–1 μM for anti-σI factors and their cognate σ factors, and differences in expression levels were revealed by quantitative real-time RT-PCR (Kahel-Raifer et al., 2010) and by microarray analysis (Raman et al., 2011; Riederer et al., 2011; Yang et al., 2012).
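To give a feel for what the reported dissociation constant range (0.02–1 μM) implies, the following minimal sketch computes the fraction of σI that would be bound at a given free anti-σI concentration. It assumes simple 1:1 equilibrium binding with the free partner in excess, which is an illustrative simplification and not a model taken from Kahel-Raifer et al. (2010); the function name and the 0.1 μM concentration are hypothetical.

```python
def fraction_bound(free_partner_uM, kd_uM):
    """Fraction of sites occupied for simple 1:1 binding: [L] / (Kd + [L])."""
    if free_partner_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_partner_uM / (kd_uM + free_partner_uM)


# Kd values span the range reported in the text; 0.1 uM is an assumed
# free anti-sigma factor concentration chosen only for illustration.
for kd in (0.02, 1.0):
    print(f"Kd = {kd} uM -> fraction bound = {fraction_bound(0.1, kd):.3f}")
```

Under this simplified model, a 50-fold difference in Kd translates into a large difference in occupancy at the same partner concentration, which is one way pairs with different affinities could respond differently.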

Recent genetic system developments and advances in systems biology offer the prospect of being able to map C. thermocellum transcription unit architecture and build predictive models of gene-regulatory networks. However, global studies of C. thermocellum physiology and regulation are still in the early stages, and many more mutant strains and conditions need to be studied and integrated.

Genome-scale metabolic models

Constraint-based metabolic network modeling plays an important role in biosystems design of microorganisms as novel microbial biocatalysts. Metabolic pathway analysis (e.g. elementary mode analysis) and flux balance analysis are powerful and predictive tools for rational strain design (Trinh et al., 2009; Lewis et al., 2012). For reviews on genome-scale metabolic modeling, see Price et al. (2004), Feist & Palsson (2008), Park et al. (2008), Durot et al. (2009), Orth & Palsson (2010), Lutke-Eversloh & Bahl (2011), Kim et al. (2012), and Lewis et al. (2012).
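For readers unfamiliar with flux balance analysis, its standard textbook formulation is a linear program, sketched below. The symbols are generic and are not taken from any of the cited models: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c a vector of objective weights (for example, selecting a biomass or ethanol-producing reaction), and the lower and upper bounds encode reversibility and measured uptake limits.

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

The equality constraint expresses the steady-state assumption that each intracellular metabolite is produced and consumed at equal rates.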

A C. thermocellum ATCC 27405 genome-scale metabolic model, named iSR432, was developed to better understand and manipulate C. thermocellum physiology for biofuel production (Roberts et al., 2010). The model represented 577 reactions for 525 intracellular metabolites, 432 genes and also considered a proteomics representation of a cellulosome. The metabolic model led to proposed annotation changes for 27 genes over the annotations automatically assigned. The iSR432 model also predicted gene deletions for increased ethanol production, some of which had been suggested in earlier studies (Lynd et al., 2002). Subsequently, 454-based RNA-Seq data were integrated into the iSR432 genome-scale metabolic model, and predicted fluxes for 88 reactions were altered to provide a more detailed metabolic assessment (Gowen & Fong, 2010). Later, C. acetobutylicum and C. thermocellum metabolic models were compared and a knowledgebase environment was proposed to provide open, standardized descriptions for metabolites and reactions, so that metabolic pathways and models can be more readily compared, and strains can be developed rapidly (Kumar et al., 2012). Models describing new experimental observations, such as the fact that C. thermocellum strain ATCC 27405 can enter a dormant L-form that is different from the spore form, and analysis of environmental triggers may in the future assist industrial strain development (Mearls et al., 2012).

A complete accounting of a microorganism's carbon balance is imperative to meet applied strain development goals, which will also likely feed into improved metabolic models. Until recently, there had often been incomplete C. thermocellum carbon recovery when typical fermentation products had been measured. The gap between theoretical and experimental carbon recovery was substantially closed, and ~ 11% of the original substrate carbon was identified in nonstandard extracellular compounds that included malate, pyruvate, uracil, soluble glucans, and extracellular free amino acids; these account for ~ 93% of the final product carbon (Ellis et al., 2012). Nitrogen was also tracked, because a high concentration of protein and free amino acids (~ 5%) was found in the medium. A defined, low-background carbon growth medium was developed as part of these studies to enable metabolic studies and additional medium improvements (Holwerda et al., 2012). Access to systems biology data linked to controlled fermentations, and new genetic capabilities, will facilitate further refinements in C. thermocellum metabolic models for strain development purposes.
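The carbon accounting described above amounts to a simple bookkeeping calculation, sketched below. The sketch assumes that recovery is expressed as total carbon in measured products divided by carbon in the consumed substrate; all quantities are hypothetical and are not data from Ellis et al. (2012).

```python
# Carbon atoms per molecule for each compound considered in the balance.
CARBONS = {"glucose_equivalent": 6, "ethanol": 2, "acetate": 2, "CO2": 1, "malate": 4}


def carbon_mmol(amounts_mmol):
    """Total carbon (mmol C) in a dict of compound amounts (mmol)."""
    return sum(amount * CARBONS[name] for name, amount in amounts_mmol.items())


def carbon_recovery(substrate_mmol, products_mmol):
    """Fraction of consumed substrate carbon found in the measured products."""
    return carbon_mmol(products_mmol) / carbon_mmol(substrate_mmol)


# Hypothetical fermentation, invented for illustration only.
substrate = {"glucose_equivalent": 10.0}
standard_products = {"ethanol": 8.0, "acetate": 8.0, "CO2": 16.0}
with_nonstandard = dict(standard_products, malate=1.5)

print(f"standard products only: {carbon_recovery(substrate, standard_products):.0%}")
print(f"including malate:       {carbon_recovery(substrate, with_nonstandard):.0%}")
```

In this invented case, counting one additional extracellular compound raises the apparent recovery, which mirrors how measuring nonstandard products can narrow the gap between theoretical and experimental carbon recovery.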

Thermophilic CBP Paradigm 2: Caldicellulosiruptor spp.

Interest in finding an extremely thermophilic, cellulolytic cognate of C. thermocellum led to the discovery of the genus Caldicellulosiruptor (Reynolds et al., 1986; Sissons et al., 1987) and, consequently, a second paradigm for thermophilic cellulose hydrolysis that involves large modular enzymes rather than a cellulosome (Gibbs et al., 2000). Species belonging to the genus Caldicellulosiruptor (Topt 65–78 °C) are globally distributed and have been isolated from terrestrial hot springs in New Zealand (Rainey et al., 1994), Russia (Svetlichnyi et al., 1990; Miroshnichenko et al., 2008), Iceland (Nielsen et al., 1993; Mladenovska et al., 1995; Bredholt et al., 1999), and North America (Huang et al., 1998; Hamilton-Brehm et al., 2010). Initially described isolates from New Zealand were enriched in medium containing microcrystalline cellulose and screened for reducing sugars to indicate active cellobiohydrolase activity (Sissons et al., 1987), although since then, less cellulolytic members of the genus have been described (Blumer-Schuette et al., 2010). Whole-genome sequences for eight Caldicellulosiruptor species ranging from weakly to strongly cellulolytic members are available (van de Werken et al., 2008; Kataeva et al., 2009; Elkins et al., 2010b; Blumer-Schuette et al., 2011). These have facilitated genome-level comparisons to other thermophiles (Dam et al., 2011) and within the genus (Blumer-Schuette et al., 2012; Fig. 7). Overall, the most distinctive aspect of the genus Caldicellulosiruptor lies in the architecture of modular carbohydrate-active enzymes that may have been an ancient corollary to cellulosomes at high temperatures.

Figure 7.

Representation of the Caldicellulosiruptor pan and core genome. The inner (black) circle represents all common proteins shared across the eight sequenced Caldicellulosiruptor species. The middle (grey) circle represents the dispensable genome which includes proteins that are shared between two to seven genomes. The broken outer circles represent the number of unique proteins determined for each species. Carbohydrates listed in the circles indicate which catalytic capacities are endowed by the core- or dispensable genomes. Species abbreviations follow the gene locus convention: Athe, Caldicellulosiruptor bescii; Calhy, Caldicellulosiruptor hydrothermalis; Calkr, Caldicellulosiruptor kristjanssonii; Calkro, Caldicellulosiruptor kronotskyensis; Calla, Caldicellulosiruptor lactoaceticus; COB47, Caldicellulosiruptor obsidiansis; Calow, Caldicellulosiruptor owensensis; Csac, Caldicellulosiruptor saccharolyticus.
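The partition shown in Figure 7 can be expressed with simple set operations. The sketch below assumes that each genome has already been reduced to a set of ortholog group identifiers; the function name, the three toy genomes, and the identifiers are hypothetical, and the sketch does not reproduce the ortholog-calling method of Blumer-Schuette et al. (2012).

```python
from collections import Counter


def partition_pan_genome(genomes):
    """Split ortholog groups into core, dispensable, and species-unique sets.

    genomes maps a species label to the set of ortholog group ids it encodes.
    """
    n = len(genomes)
    # Number of genomes in which each ortholog group occurs.
    occurrences = Counter(g for groups in genomes.values() for g in groups)
    core = {g for g, c in occurrences.items() if c == n}
    dispensable = {g for g, c in occurrences.items() if 1 < c < n}
    unique = {
        species: {g for g in groups if occurrences[g] == 1}
        for species, groups in genomes.items()
    }
    return core, dispensable, unique


# Toy data with three genomes, invented for illustration only.
toy = {
    "Athe": {"og1", "og2", "og3"},
    "Csac": {"og1", "og2", "og4"},
    "Calhy": {"og1", "og5"},
}
core, dispensable, unique = partition_pan_genome(toy)
print(sorted(core), sorted(dispensable), {k: sorted(v) for k, v in unique.items()})
```

Applied to eight genomes, the same logic yields the categories in the figure: groups present in all eight (core), in two to seven (dispensable), or in exactly one (unique).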

Evolution of modular, multifunctional carbohydrate-active enzymes

GH and CBM duplication and shuffling

Large, multimodular carbohydrate-active enzymes were discovered to be the distinguishing feature of the genus Caldicellulosiruptor, with CelB (Table 4) being the first described modular multifunctional enzyme (Saul et al., 1990), followed by the sequencing of other modular enzymes, such as manA (Gibbs et al., 1992) and celA (Te'O et al., 1995). Incidentally, although discovered almost 20 years ago, CelA remains one of the largest discrete cellulase proteins known (1751 amino acids, predicted molecular weight of 194.8 kDa). Native homologous versions of CelA from C. bescii appeared larger than the predicted molecular weight by gel electrophoresis (observed molecular weight of 230 kDa), most likely as a result of glycosylation (Zverlov et al., 1998a). These modular, multifunctional enzymes are often co-located in genomic loci of strongly to moderately cellulolytic species, and this region was later called the ‘glucan degradation locus’ (GDL; Blumer-Schuette et al., 2012).

Table 4. Biochemically characterized carbohydrate-active enzymes from the genus Caldicellulosiruptor

| Locus tag (a) | Activity | GH family (b) | Topt (°C) (c) | References |
| --- | --- | --- | --- | --- |
| Athe_0458 | Cellodextrinase | GH1 | N/A | Su et al. (2012b) |
| Athe_1859 | Endo-1,4-β-glucanase | GH5B, CBM3, CBM3, GH44 | 85 | Ye et al. (2012) |
| Athe_1865 | Endo-1,4-β-glucanase | GH5B, CBM3, CBM3, CBM3, GH9 | 85 | Su et al. (2012a) |
| Athe_1867 (CelA) | Endo-1,4-β-glucanase | GH9, CBM3, CBM3, CBM3, GH48 | 95–100 | Zverlov et al. (1998a) |
| Athe_1867 (CelA) | Exo-1,4-β-glucanase | GH9, CBM3, CBM3, CBM3, GH48 | 85 | Zverlov et al. (1998a) |
| Csac_0678 (d) | Endo-1,4-β-glucanase | GH5A, CBM28 | 75 | Ozdemir et al. (2012) |
| Csac_1076 (CelA) | Endo-1,4-β-glucanase | GH9, CBM3, CBM3, CBM3, GH48 | N/A | Te'O et al. (1995) |
| Csac_1077 (ManA) | Endo-1,4-β-glucanase | GH5B, CBM3, CBM3, GH44 | N/A | Frangos et al. (1999) |
| Csac_1078 (CelB) | Endo-1,4-β-glucanase | GH10, CBM3, GH5C | 95 | Saul et al. (1990), Park et al. (2011) and VanFossen et al. (2011) |
| Csac_1079 (CelC) | Endo-1,4-β-glucanase | GH9, CBM3, CBM3, CBM3 | N/A | Morris et al. (1995) |
| Csac_1089 (BglA) | β-Glucosidase | GH1 | 70 | Love et al. (1988), Hong et al. (2009b) and Hardiman et al. (2010) |
| NA10 CelB | Endo-1,4-β-glucanase | GH10, CBM3, GH5C | N/A | Miyake et al. (1998) |
| Athe_0185 | Endo-1,4-β-xylanase | GH10 | N/A | Su et al. (2012b) |
| Csac_0678 (d) | Endo-1,4-β-xylanase | GH5A, CBM28 | N/A | Ozdemir et al. (2012) |
| Csac_1078 (CelB) | Endo-1,4-β-xylanase | GH10, CBM3, GH5C | 80 | Saul et al. (1990), Park et al. (2011) and VanFossen et al. (2011) |
| Csac_2404 (XynC) | β-Xylosidase | GH39 | N/A | Lüthi et al. (1990) |
| Csac_2405 (ORF4) | Endo-1,4-β-xylanase | GH10 | N/A | Lüthi et al. (1990) |
| Csac_2408 (XynA) | Endo-1,4-β-xylanase | GH10 | 70 | Lüthi et al. (1990) |
| Csac_2409 (XynD) | Endo-1,4-β-xylanase | GH39 | N/A | VanFossen et al. (2011) |
| Csac_2410 (XynE) | Endo-1,4-β-xylanase | CBM22, CBM22, GH10 | N/A | VanFossen et al. (2011) |
| Rt69B.1 XynA | Endo-1,4-β-xylanase | CBM22, CBM22, GH10 | 65 | Morris et al. (1999) |
| Rt69B.1 XynB | Endo-1,4-β-xylanase | CBM22, CBM22, CBM22, GH10, CBM9, CBM9 | 70 | Morris et al. (1999) |
| Rt69B.1 XynC | Endo-1,4-β-xylanase | CBM22, CBM22, GH10, CBM3, CBM3, CBM3, GH43, CBM6 | 65 | Morris et al. (1999) |
| Rt69B.1 XynD | Endo-1,4-β-xylanase | GH11, CBM36 | 70 | Morris et al. (1999) |
| Rt8B.4 XynA | Endo-1,4-β-xylanase | CBM22, CBM22, GH10 | 70 | Dwivedi et al. (1996) |
| Athe_1859 | β-1,4-Mannanase | GH5B, CBM3, CBM3, GH44 | 85 | Ye et al. (2012) |
| Athe_1865 | β-1,4-Mannanase | GH5B, CBM3, CBM3, CBM3, GH9 | 90 | Su et al. (2012a) |
| Csac_1077 (ManA) | β-1,4-Mannanase | GH5B, CBM3, CBM3, GH44 | 80 | Lüthi et al. (1991) and Gibbs et al. (1992) |
| Csac_1080 (ManB) | β-1,4-Mannanase | GH5 | 80 | Morris et al. (1995) |
| Rt8B.4 ManA | β-1,4-Mannanase | CBM27, CBM27, CBM35, GH26 | 75 | Gibbs et al. (1996) and Sunna (2010) |
| Side chain-degrading | | | | |
| Csac_1018 | β-Galactosidase | GH42 | 80 | Park & Oh (2010) |
| Csac_1561 | α-L-Arabinofuranosidase | GH51 | 80 | Lim et al. (2010) |
| Csac_2411 (XynF) | α-L-Arabinofuranosidase | GH43, CBM22, GH43, CBM6 | N/A | Bergquist et al. (1999) and VanFossen et al. (2011) |
| Csac_1560 | Endo-1,5-α-L-arabinanase | GH43 | 75 | Hong et al. (2009a) |

a. Locus tag abbreviations indicate species: Athe, Caldicellulosiruptor bescii; Csac, Caldicellulosiruptor saccharolyticus; NA10, ‘Thermoanaerobacter cellulolyticus’; Rt69B.1, Caldicellulosiruptor sp. strain Rt69B.1; Rt8B.4, Caldicellulosiruptor sp. strain Rt8B.4.
b. In the case of modular enzymes, the GH module responsible for reported activity is in bold.
c. In the case of modular enzymes, the optimal temperature for designated catalytic activity is reported.
d. The GH5A catalytic module displays both activities.

Sequenced genomic library clones from C. saccharolyticus (Te'O et al., 1995) and Caldicellulosiruptor sp. Tok7B.1 (Bergquist et al., 1999; Gibbs et al., 2000) were the first observations of co-located, large modular enzymes containing CBM3s (Fig. 8). Similar clusters of CBM3-containing enzymes were also found in C. bescii (Dam et al., 2011) and other cellulolytic Caldicellulosiruptor species, in addition to the weakly cellulolytic C. kristjanssonii (Blumer-Schuette et al., 2012). One hypothesis concerning the architecture of this region is that recombination after gene duplication led to the diversity of modular enzymes (Bergquist et al., 1999; Gibbs et al., 2000), and similar hypotheses have also been proposed as to the proliferation of cellulosomal enzymes (Bayer et al., 2006). Indeed, a reported high level of amino acid sequence identity across CBM3s, GH48 modules, and GH5 modules from the GDL in C. bescii supports this theory (Dam et al., 2011).

Figure 8.

Layout of the glucan degrading genomic locus (GDL) with enzymes enriched in family 3 CBM. Each GDL is lined up in reference to respective genes that encode for orthologous GH9 proteins, designated by a caret (^). Angled, broken lines indicate a contiguous sequence that has been broken up to align with larger GDL layouts. Boxes with dotted lines indicate genes encoding enzymes that have swapped genomic locations in Caldicellulosiruptor bescii with reference to the GDL of Caldicellulosiruptor kronotskyensis. Solid boxes indicate genes encoding enzymes that are from separate areas of the respective species’ genome. Symbols above the GDL indicate genes encoding enzymes or genomic regions that are conserved in two or more species. Species abbreviations follow gene locus tags in the case of completed genome sequences: Athe, Caldicellulosiruptor bescii; Calkro, Caldicellulosiruptor kronotskyensis; COB47, Caldicellulosiruptor obsidiansis; Csac, Caldicellulosiruptor saccharolyticus; Tok7, Caldicellulosiruptor sp. strain Tok7B.1; Calla, Caldicellulosiruptor lactoaceticus; Calkr, Caldicellulosiruptor kristjanssonii; Rt69, Caldicellulosiruptor sp. strain Rt69B.1. GH modules are depicted as ovals: red, GH9; grey, GH48; olive, GH5; magenta, GH74; teal, GH44; lilac, GH10; green, GH43. Polysaccharide lyases are depicted as blue rectangles, glycoside transferase family 39 as beige arrows, AraC transcriptional regulator as orange rectangles and hypothetical proteins as white arrows. CBM3s are white diamonds, CBM22s are red diamonds, and CBM6s are yellow diamonds. Modified from the study by Blumer-Schuette et al. (2012).

Phylogeny based on amino acid sequence of all GH5 modules from the genus Caldicellulosiruptor indicates that there are three clades of extracellular GH5 modules, here referred to as GH5a, GH5b, and GH5c (Fig. 9). The GH5a module has not been subject to duplication and also is unique in that it is likely the only S-layer located enzyme from the genus Caldicellulosiruptor that exhibits β-1,4-glucanase activity (Csac_0678, Table 4; Ozdemir et al., 2012). As a part of the overall ability of one species, C. bescii, to hydrolyze crystalline cellulose, synergy between the GH44 module of Athe_1859 and various enzymes from C. bescii with endo-glucanase activity, such as Athe_0594 (GH5a), Athe_1865 (GH9), and Athe_1866 (GH5c), has been reported (Ye et al., 2012). The GH5b module is subject to the most duplication in genomes, potentially once before speciation between the New Zealand and Russian species and then again before speciation of the cellulolytic Russian species. Biochemical characterization of modular enzymes and truncation mutants from this module family confirms their role as mannanases (Morris et al., 1995; Frangos et al., 1999; Su et al., 2012a) that are paired with various β-1,4-glucanase modules (GH9, GH44 and GH5c). The third extracellular module, GH5c, is also found as a single module in multifunctional, modular enzymes paired with xylanases (GH10; VanFossen et al., 2011) or β-mannanases (GH5b). In support of the intragenic recombination hypothesis, a linker segment from a modular GH5c enzyme (Csac_1078) was found to be longer (van de Werken et al., 2008; VanFossen et al., 2011) than previously reported (Saul et al., 1990), and orthologs of this enzyme vary in the number of CBM3s observed (one, Csac_1078; two, COB47_1671; and three, sp. Tok7B.1). Overall, the abundance of cellulase/mannanase pairings in various Caldicellulosiruptor species underscores the significance and potential synergy between these modules during plant biomass deconstruction.

Figure 9.

Clades of GH5 modules from the genus Caldicellulosiruptor based on amino acid alignments. Clade ‘A’ represents orthologs to the surface-layer-bound extracellular enzyme, Csac_0678 (Ozdemir et al., 2012); Thermoanaerobacterium saccharolyticum Tsac_2253 is used as an outlier for this clade. Clade ‘B’ represents modular extracellular enzymes with GH5 modules that are linked to GH5 modules from Clade ‘C’, GH9 or GH44 modules; Mahella australiensis Mahau_1112 is used as an outlier for this clade. Csac_1080 is the exception, being a nonmodular, intracellular GH5 enzyme, but was included in this analysis because it is separated from Csac_1079 by a frameshift mutation. Clade ‘C’ represents modular extracellular enzymes with GH5 modules that are linked to GH5 modules from Clade ‘B’, or GH10 modules; Clostridium thermocellum Cthe_0405 is used as an outlier for this clade. The intracellular clade of GH5 modules represents those enzymes with no signal peptides.

In comparison with multiple subsets of GH5 modules, a single GH48 module, which was previously reported as being a genetic determinant for cellulolytic ability (Blumer-Schuette et al., 2012), is present in multiple copies in the genomes of C. obsidiansis, C. kronotskyensis, and C. bescii. As observed for C. bescii (Dam et al., 2011), these GH48 modules share 99% amino acid identity within each species, likely indicating that the multiple GH48 modules are the result of postspeciation duplication. Duplication of GH48 modules that has been selected for in at least three Caldicellulosiruptor species highlights the value of cellobiohydrolases for cellulolytic species, which has also been observed in Clostridium thermocellum (Olson et al., 2010). Both GH9 modules that are also found in the GDL are not as related and represent two distinct clades. However, these two modules are still more similar in amino acid sequence than other GH9 modules from other genera and likely represent an ancient domain duplication event in a common ancestor. When overall architecture of these modular enzymes is considered, the GDL appears to be fairly stable, considering that relatively few novel modular combinations are observed within the GDL of various species (Fig. 8). Comparing available whole genomes and partial sequence data, there are three unique modular combinations from the GDL (GH9/GH44, sp. Tok7B.1; PL11/GH44, C. lactoaceticus; and GH74/GH44, C. kristjanssonii; see Fig. 8).
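Percent amino acid identity figures such as the 99% quoted above depend on how the alignment is scored. The minimal sketch below assumes two sequences that are already aligned to equal length and counts identical residues over the columns where neither sequence has a gap; other denominators are in common use, and this is not necessarily the definition applied by Dam et al. (2011). The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over alignment columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, invented for illustration only.
print(percent_identity("MKT-LV", "MKTALI"))
```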

Genome sequences from other species in the genus have uncovered other modular enzymes even larger than those located in the GDL, including a modular GH16/GH55 enzyme of 2,435 amino acids (predicted molecular weight, 263.9 kDa) from C. kronotskyensis (GenBank accession YP_004022846). This unique genomic area comprises two clusters of genes rich in CBM32 signatures. Similar to how the GDL may have undergone domain duplication and intragenic recombination, this unique locus also has multiple GH modules from the same family (GH81, GH55, GH16) linked to CBM32s. As more Caldicellulosiruptor species are genome-sequenced, it will be interesting to see whether there are other CBM families around which loci like this and the GDL are built. Identifying another example of a duplication locus similar to the GDL gives more credence to the theory that these loci are evolutionary parallels to the cellulosomal paradigm, where selection pressure for large multimodular enzymes with synergistic catalytic activity is present in hot spring environments.

Cellulolytic capability – gene gain or loss?

Beyond investigating how loci like the GDL are initially selected for and develop, it was also possible to use comparative genomics to determine genetic determinants for cellulolytic capacity within the genus, by comparing weakly to strongly cellulolytic species of Caldicellulosiruptor (Blumer-Schuette et al., 2010). Overall, the defining determinant between the strongly and weakly cellulolytic species was the presence or absence of a single GH48 module (Blumer-Schuette et al., 2012). This particular GH family is found as part of modular, multifunctional enzymes in the GDL of strongly to moderately cellulolytic Caldicellulosiruptor species. In weakly cellulolytic species, C. hydrothermalis, C. kristjanssonii, and C. owensensis, most if not all of the GDL is absent, along with any enzymes containing a GH48 module (Fig. 8). In the case of C. hydrothermalis and C. owensensis, it appears that the entire GDL was deleted, as the conserved up- and downstream sequences are present as a consecutive block (Fig. 10). An additional portion is missing (Fig. 10) from C. hydrothermalis, which led to the hypothesis that weakly cellulolytic species had undergone gene loss in this genomic region (Blumer-Schuette et al., 2012). Additionally, when the genomes of C. kristjanssonii and C. lactoaceticus were annotated, rearrangement in their genomes had led to the dispersal of the modular GDL enzymes into four loci, and both species also have the consecutive block of up- and downstream sequences that typically surround the GDL similar to C. hydrothermalis and C. owensensis (Figs 8 and 10). Further intragenic recombination in the genome of C. kristjanssonii after speciation from C. lactoaceticus has created novel modular combinations (GH74/GH44) and also the loss of or the division of some of the modular, multifunctional enzymes. In the case of moderately cellulolytic C. lactoaceticus, some of the modular, multifunctional enzymes in the GDL have been divided into nonmodular or modular, mono-functional enzymes (Fig. 8).

Figure 10.

Genomic neighborhoods surrounding the GDL from fully sequenced Caldicellulosiruptor species. Green boxes denote homologous upstream neighborhoods, and purple boxes denote downstream neighborhoods. Gold denotes the GDL in species that possess that particular locus. Dashed lines link contiguous genomic neighborhoods. For Csac, the dashed lines indicate that the purple downstream region appears in a different region of its genome. Species abbreviations follow the gene locus tag convention: Athe, Caldicellulosiruptor bescii; Calhy, Caldicellulosiruptor hydrothermalis; Calkr, Caldicellulosiruptor kristjanssonii; Calkro, Caldicellulosiruptor kronotskyensis; Calla, Caldicellulosiruptor lactoaceticus; COB47, Caldicellulosiruptor obsidiansis; Calow, Caldicellulosiruptor owensensis; Csac, Caldicellulosiruptor saccharolyticus.

As discussed above, weakly and moderately cellulolytic species were found to have rearrangements in their genome resulting in pieces of the GDL re-locating away from the typical colinear block. Mobile elements would be suspected as the culprits in this process. When the genome of strongly cellulolytic C. bescii was analyzed, fewer copies of complete mobile elements were found, in comparison with C. saccharolyticus, indicating that the genome of C. bescii was more stable (Dam et al., 2011). However, both species possessed a similar number of partial mobile elements, hinting at an active history of lateral transfers and recombination in their genomes. In all cases of the strongly cellulolytic Caldicellulosiruptor species, the GDL is flanked by one or more partial mobile elements. However, these elements are not the same from species to species, indicating that while the flanking regions of the GDL appear to be hot spots for mobile elements, recent activity in the area has occurred postspeciation. It is tempting then to propose that at some point the last common ancestor had gained an enzyme or enzymes through lateral transfer that then duplicated and evolved into the large (14.4- to 44.6-kb) GDL that is currently observed in the genomes of strongly cellulolytic species. A more representative hypothesis concerning the cellulolytic history of the genus Caldicellulosiruptor may involve two stages: the first stage involving the last common ancestor gaining an enzyme or enzymes that eventually became the GDL through lateral transfer, and then a second stage occurring postspeciation with either gradual gene loss or gene/domain duplication.

Adherence to complex substrates: S-layer located proteins and enzymes

While the genus Caldicellulosiruptor is differentiated from the moderately thermophilic clostridia by the absence of a cellulosome, some extracellular enzymes are attached to the cell. Attachment is facilitated by the surface (‘S’)-layer, a para-crystalline protein lattice that is thought to protect the cell and also facilitate a pseudo-periplasmic space for Gram-positive bacteria (Engelhardt, 2007). One of the defining features of the Avicel- and xylose-induced secretome from C. saccharolyticus is the major S-layer protein, which was theorized to play a role in biomass deconstruction although its exact role was unclear (Blumer-Schuette et al., 2010). Formation of a biofilm on crystalline cellulose by C. obsidiansis was observed and characterized to be similar to ‘craters’ that Clostridium thermocellum forms during cellulose hydrolysis (Wang et al., 2011; Dumitrache et al., 2013). The common mechanism by which a cellulolytic biofilm will drill into substrate, regardless of the presence of a cellulosome or not (Wang et al., 2011), indicates that C. obsidiansis and other Caldicellulosiruptor species, by extension, possess mechanisms through which they adhere to cellulose. S-layer linked (SLH) enzymes and proteins have been demonstrated to be a part of this mechanism for the genus Caldicellulosiruptor (Blumer-Schuette et al., 2012).

One annotated S-layer-associated enzyme is part of the Caldicellulosiruptor core genome, and an ortholog has been characterized from C. saccharolyticus. This core enzyme is a modular GH5/CBM28 endo-glucanase/xylanase (Ozdemir et al., 2012) that is represented by the GH5a clade discussed earlier. Biochemical characterization of one ortholog (Csac_0678) revealed that this enzyme has the unusual property of being multifunctional against C5 and C6 polysaccharides, including microcrystalline cellulose, while only possessing one CM (Ozdemir et al., 2012). Furthermore, the presence of the CBM28 is critical for adherence to Avicel and also for activity on insoluble substrates (Ozdemir et al., 2012). Homologs to this GH5/CBM28 enzyme have been detected in proteomic screening of extracellular fractions from C. bescii grown on filter paper (Dam et al., 2011) and whole-cell lysate from C. obsidiansis grown on filter paper, Avicel and switchgrass (Lochner et al., 2011a). As this enzyme possesses S-layer homology domains, detection of the enzyme in whole-cell lysates would be expected and supports the role of SLH enzymes and proteins in the mechanism by which Caldicellulosiruptor species adhere to substrates. In addition, inclusion of this enzyme in the Caldicellulosiruptor core genome appears to be correct, as homologs to this enzyme have been sequenced from a novel Caldicellulosiruptor species (sp. F23) and from an enrichment culture (GenBank accession numbers AFO70071 and AEL31246, respectively).

Aside from this GH5/CBM28 enzyme, there are other, noncore, S-layer-associated enzymes present in the Caldicellulosiruptor pan genome. Two species isolated from Kamchatka, Russia (C. hydrothermalis and C. kronotskyensis), are enriched in SLH enzymes, possibly selected for under stronger pressure to directly adhere to substrates that they are deconstructing. Other SLH proteins are also present in all eight sequenced genomes, some with identifiable protein modules (Dam et al., 2011; Ozdemir et al., 2012). Biochemical characterization of one such protein from C. saccharolyticus (Csac_2722) showed that this protein was capable of binding to Avicel and also possessed weak endo-glucanase activity (Ozdemir et al., 2012). Interestingly, this protein is larger than some of the modular enzymes (2593 aa) and is unique to C. saccharolyticus with no identified homologs as of yet. Other large SLH proteins have also been identified (Ozdemir et al., 2012) and, in the case of C. bescii (Athe_0012; 3027 aa), also detected in extracellular and membrane protein fractions from cells grown on cellulose and xylan (Dam et al., 2011; Ozdemir et al., 2012). Further characterization of additional SLH proteins and parsing out their roles in extracellular interactions is clearly warranted.

Systems biology of the genus Caldicellulosiruptor

After the first whole-genome sequence of C. saccharolyticus was determined in 2007 (van de Werken et al., 2008), it was possible to complement biochemical data with transcriptomics and proteomics. With the release of the genome sequence from C. bescii, a comparison between the two Caldicellulosiruptor genomes determined that they shared over 2300 orthologs, or roughly 87% of their genomes (Dam et al., 2011). Sequences of five additional genomes from the genus Caldicellulosiruptor (Elkins et al., 2010b; Blumer-Schuette et al., 2011) helped to define the proteins that are conserved across all eight sequenced species (‘core genome’, Fig. 7; Blumer-Schuette et al., 2012), which is considerably smaller than the original core genome proposed based on the genomes from C. saccharolyticus and C. bescii. Given that roughly half of a species’ genome is shared (core) among the genus Caldicellulosiruptor, transcriptomic and proteomic studies not only stand to explain phenomena in one individual species, but also other closely related members from the genus. Aside from answering basic questions about the physiology of the genus Caldicellulosiruptor, it is critical to identify enzymes, sugar transporters, substrate adhering proteins, and catabolic pathways that are expressed in response to plant biomass and plant biomass-derived carbohydrates for the development of species from this genus as CBP microorganisms.
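The idea of a core genome that shrinks as more genomes are added can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of ortholog-group identifiers; the species names, group IDs, and function names below are hypothetical toy values, not data from the cited studies, and the orthology assignment itself is not shown.

```python
def core_and_pan(genomes):
    """Return (core, pan) ortholog-group sets.

    genomes: dict mapping a species name to the set of ortholog-group
    IDs found in that genome (hypothetical input format).
    """
    group_sets = list(genomes.values())
    core = set.intersection(*group_sets)  # groups present in every genome
    pan = set.union(*group_sets)          # groups present in at least one genome
    return core, pan


def core_fraction(genomes, core):
    """Fraction of each genome's ortholog groups that belong to the core."""
    return {name: len(core) / len(groups) for name, groups in genomes.items()}


# Toy data only: three made-up genomes with four ortholog groups each.
toy_genomes = {
    "species_A": {"OG1", "OG2", "OG3", "OG4"},
    "species_B": {"OG1", "OG2", "OG3", "OG5"},
    "species_C": {"OG1", "OG2", "OG6", "OG7"},
}

# Core from the first two genomes only, then from all three.
two_genome_core, _ = core_and_pan(
    {name: toy_genomes[name] for name in ("species_A", "species_B")}
)
core, pan = core_and_pan(toy_genomes)
fractions = core_fraction(toy_genomes, core)

print(len(two_genome_core), len(core), len(pan))
print(fractions)
```

In this toy case the two-genome core contains three groups and the three-genome core only two. This mirrors the observation above that the eight-species core genome is considerably smaller than the one first proposed from two genomes.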

Caldicellulosiruptor monosaccharide transcriptomes

Caldicellulosiruptor saccharolyticus was the first Caldicellulosiruptor species sequenced and, as such, was also the first member of this genus for which transcriptomic analysis could be used to observe how the entire genome responded to environmental stimuli, such as differing carbon sources. Along with the interest in the genus Caldicellulosiruptor for their GH enzymes, this genus also generates high yields of hydrogen when grown on monosaccharides (van Niel et al., 2002; Kádár et al., 2003; de Vrije et al., 2007) and waste paper or plant biomass hydrolysates (Kádár et al., 2003; Kádár et al., 2004; de Vrije et al., 2009). Furthermore, this genus has been shown to co-ferment both C5 and C6 sugars (Kádár et al., 2003; van de Werken et al., 2008). Prior to the public release of a genome sequence for C. saccharolyticus, metabolite analysis based on the fermentation of various 13C-labeled carbons in glucose determined that C. saccharolyticus uses the Embden-Meyerhof (EM) pathway for glycolysis (de Vrije et al., 2007). Transcriptomics of C. saccharolyticus growing on glucose, xylose, or a mixture thereof supported the biochemical evidence and genes involved in the EM pathway were consistently up-regulated, along with the nonoxidative PPP when xylose was present. These data also supported the premise that carbon catabolite repression (CCR) by glucose does not occur in C. saccharolyticus (van de Werken et al., 2008; VanFossen et al., 2009). While no CCR system exists in C. saccharolyticus, many studies have noted that xylose is preferentially fermented versus glucose (Kádár et al., 2004; van de Werken et al., 2008). Another study determined that of six monosaccharides tested, a hexose (fructose), rather than a pentose (xylose), was consumed at the highest rate (VanFossen et al., 2009).

Transcriptomic analysis of the response of C. saccharolyticus to different monosaccharides could predict the specificity of eight out of 24 predicted ATP-binding cassette (ABC) transporters (VanFossen et al., 2009). Preference for fructose may be the result of additional fructose-transporters, comprised of both ABC transporters and a phosphotransferase system that is capable of importing fructose into the cell, compared with glucose and xylose which are only imported by ABC transporters (van de Werken et al., 2008; VanFossen et al., 2009). From the transporters predicted to import glucose, one substrate-binding protein (Csac_2506) was also detected in the extracellular fraction of C. saccharolyticus, which is not unexpected because the protein is predicted to be extracellular (Andrews et al., 2010). This particular ABC transporter was also up-regulated on a variety of sugars (van de Werken et al., 2008; VanFossen et al., 2009) and may be an important, broad-specificity sugar transporter for C. saccharolyticus. Quantitative detection of proteins can be a useful complement to transcriptomics, as the detection of a protein can support the transcriptomic evidence or bolster evidence for poorly transcribed genes.
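The logic of inferring transporter specificity from differential transcription can be sketched roughly as follows. This is only an illustration under stated assumptions, not the analysis used in the cited studies: the transporter names, the log2 fold-change values, and the threshold of 1.0 are all hypothetical.

```python
def predict_substrates(log2_fold_changes, threshold=1.0):
    """Assign candidate substrates to each transporter.

    log2_fold_changes: dict mapping a transporter name to a dict of
    {sugar: log2 fold change relative to a reference condition}.
    A sugar is called a candidate substrate when the transporter is
    up-regulated on it by at least `threshold` (an assumed cutoff).
    """
    predictions = {}
    for transporter, by_sugar in log2_fold_changes.items():
        predictions[transporter] = sorted(
            sugar for sugar, value in by_sugar.items() if value >= threshold
        )
    return predictions


# Toy values only: three made-up transporters on three sugars.
toy_expression = {
    "transporter_1": {"glucose": 0.2, "xylose": 2.1, "fructose": 0.1},
    "transporter_2": {"glucose": 1.5, "xylose": 1.8, "fructose": 1.2},
    "transporter_3": {"glucose": 0.3, "xylose": -0.4, "fructose": 0.0},
}

predictions = predict_substrates(toy_expression)
for name, sugars in predictions.items():
    print(name, sugars or "no prediction")
```

Here the hypothetical transporter_2 responds to every sugar, the pattern expected of a broad-specificity transporter, whereas transporter_3 shows no response and stays unassigned, as most of the predicted transporters did in the study described above.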

Caldicellulosiruptor polysaccharide transcriptomes

As Caldicellulosiruptor species are potential CBP microorganisms, their response to polysaccharides found in biomass will be important in the context of identifying important enzymes, carbohydrate transporters, and other biomass-deconstruction-related proteins. Studies have so far focused on the transcriptomics of C. saccharolyticus grown on xylan and xyloglucans (VanFossen et al., 2009; VanFossen et al., 2011), transcriptomics of C. bescii grown on filter paper, proteomics of C. bescii grown on filter paper or xylan (Dam et al., 2011), proteomics of C. bescii and C. obsidiansis grown on Avicel (Lochner et al., 2011b), and proteomics of seven Caldicellulosiruptor species grown on Avicel (Blumer-Schuette et al., 2012). The up-regulation and secretion of plant-biomass-deconstructing enzymes are of considerable interest, and several strongly cellulolytic Caldicellulosiruptor species have been studied. Comparisons between cellulose- and xylan-induced proteomes from C. bescii found only a few enzymes that are detected under one condition but not the other. Consistent with growth on xylan, enzymes capable of hydrolyzing (GH10, Athe_0618) and de-esterifying xylan (CE7, Athe_0152) were detected as xylan-specific (Dam et al., 2011). Growth on cellulose led to the detection of glucan-specific enzymes, including β-glucosidases, cellobiose phosphorylases, and enzymes involved in α-glucan hydrolysis, which was proposed as evidence that the regulation systems are not discriminating against the type of sugar linkage present (Dam et al., 2011). Enzymes typically associated with cellulose hydrolysis, mostly from the previously discussed GDL genomic region, were found in the cellulose- and xylan-induced extracellular proteomes of C. bescii (Dam et al., 2011), in the cellobiose-induced extracellular proteome of C. obsidiansis (Lochner et al., 2011a), and also in the glucose-induced extracellular proteome of C. saccharolyticus (Andrews et al., 2010), indicating that these organisms can adapt to rapidly changing environments by expressing crucial enzymes at low levels (VanFossen et al., 2011). Further comparisons between the extracellular proteomes of strongly cellulolytic C. bescii and C. obsidiansis highlighted the time-dependent accumulation of cellulose-hydrolyzing enzymes from the GDL in the extracellular proteome (Lochner et al., 2011b).

The importance of enzymes encoded by the GDL was also confirmed in a study that compared both weakly and strongly cellulolytic Caldicellulosiruptor species, with the CBM3-containing enzymes detected in the substrate-bound and extracellular proteomes of the strongly cellulolytic species (Blumer-Schuette et al., 2012). Continuous accumulation of the CBM3-containing enzymes from C. bescii and C. obsidiansis in the extracellular proteome is most likely due to a combination of gene up-regulation and saturation of binding sites on the substrate, leading to the increased presence of the enzymes in cell-free supernatant. After 24 hours of growth on Avicel, many of these enzymes, homologs of CelA for example, were highly enriched in the substrate-bound fractions versus the extracellular fraction (Blumer-Schuette et al., 2012).
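One simple way to express enrichment in the substrate-bound fraction relative to the extracellular fraction is a log2 ratio of abundances. The sketch below assumes spectral-count-like abundance values and a pseudocount of 1 to avoid division by zero. The protein labels and numbers are hypothetical, and this is not necessarily the scoring used in the cited study.

```python
import math


def log2_partition(bound, free, pseudocount=1.0):
    """Log2 ratio of substrate-bound to extracellular abundance.

    Positive values indicate enrichment on the substrate, negative values
    enrichment in the cell-free fraction. The pseudocount is an assumed
    smoothing term so that proteins absent from one fraction stay finite.
    """
    return math.log2((bound + pseudocount) / (free + pseudocount))


# Toy abundances only: (substrate-bound, extracellular) for made-up proteins.
toy_counts = {
    "CBM3_enzyme": (63.0, 15.0),
    "unpartitioned_protein": (7.0, 7.0),
    "secreted_only_protein": (0.0, 3.0),
}

scores = {
    name: log2_partition(bound, free)
    for name, (bound, free) in toy_counts.items()
}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:+.1f}")
```

With these toy numbers the hypothetical CBM3 enzyme scores +2.0 (fourfold enrichment on the substrate), the unpartitioned protein scores 0, and the secreted-only protein scores -2.0.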

Aside from enzymes, further analysis of the response of ABC transporters from C. saccharolyticus to polysaccharides predicted functions for an additional five transporters that import either xyloglucans or xylooligosaccharides (VanFossen et al., 2009). Three of these ABC transporters are a part of the core genome that is common to all eight sequenced Caldicellulosiruptor species (Blumer-Schuette et al., 2012) and present a unique case for which transcriptomics or proteomics from other species in the genus can support the predictions made using C. saccharolyticus. Two of these common ABC transporters are predicted to import xylooligosaccharides from xylan, and proteomic evidence from C. bescii grown on xylan supports this prediction. Using proteomics, both substrate-binding proteins were detected in the extracellular and membrane-bound fractions of C. bescii cells grown on xylan (Dam et al., 2011). As a case where analysis of orthologous systems can help with predictions, one core ABC transporter, which had no clear substrate preference based on transcriptomic analysis using C. saccharolyticus (Csac_2692–2694; VanFossen et al., 2009), appears to be a xylooligosaccharide transporter, based on proteomic evidence from C. bescii (Athe_0847, Athe_0849–0851; Dam et al., 2011). Time-dependent accumulation of solute-binding proteins (SBP) from ABC transporters was also noted during C. bescii and C. obsidiansis growth on Avicel (Lochner et al., 2011b). One of these (orthologs to Csac_0681) is part of the Caldicellulosiruptor core genome (Blumer-Schuette et al., 2012) and is predicted to be part of a xyloglucan transporter based on C. saccharolyticus transcriptomics (VanFossen et al., 2009). However, this may be an oligosaccharide transporter. This SBP was highly enriched in the substrate-bound fractions from weakly and highly cellulolytic Caldicellulosiruptor species (Blumer-Schuette et al., 2012), and the proximity of this transporter to a core endo-glucanase/endo-xylanase discussed above (Csac_0678) lends credence to the re-assessment of this transporter. As more detailed analyses become available for other members of this genus, it will be interesting to assess these predictions.

Proteins predicted to be cell-wall-bound were also detected in protein fractions from C. bescii grown on xylan or cellulose, with the characterized SLH enzyme Athe_0594 (ortholog to Csac_0678) being detected only during growth on cellulose and a noncatalytic SLH protein (Athe_0438) detected only during growth on xylan (Dam et al., 2011). While not significantly accumulated over time in the extracellular proteomes of C. bescii and C. obsidiansis (Lochner et al., 2011b), orthologs to Csac_0678 were detected as being enriched in the substrate-bound proteomes of six weakly to strongly cellulolytic species, and in the extracellular proteome of C. owensensis, most likely due to a truncated CBM (Blumer-Schuette et al., 2012). Other SLH proteins that were detected in both the cellulose- and xylan-induced proteomes of C. bescii, Athe_0012 and the main S-layer protein Athe_2303 (Dam et al., 2011), were also detected in the time-course-dependent extracellular proteome of C. bescii, and an ortholog to Athe_2303 was also detected in the extracellular proteome of C. obsidiansis (Lochner et al., 2011b). In contrast to comparing extracellular proteomes, the comparison of Avicel-bound proteomes to extracellular and whole-cell lysate proteomes identified SLH proteins that play a role in cellulose adhesion and possibly hydrolysis (Blumer-Schuette et al., 2012). Orthologs to Csac_0678 are included in this group and are the only orthologous group of SLH proteins that have any identifiable protein motifs other than the SLH domains. A lack of functional homology to these orthologous groups of proteins may illustrate the novel mechanisms that the genus Caldicellulosiruptor uses to adhere to substrates.

Other cell-wall-bound proteins that have been identified in extracellular proteomes include flagellin orthologs in C. bescii and C. obsidiansis (Lochner et al., 2011b). While no flagella were observed in electron micrographs of C. bescii or C. obsidiansis (Lochner et al., 2011b), flagella have been observed in electron micrographs of negatively stained C. hydrothermalis (Miroshnichenko et al., 2008) and C. owensensis (Huang et al., 1998). Noncatalytic proteins that are enriched in the Avicel-bound proteome were also the main focus of the seven-species comparative proteomics screen (Blumer-Schuette et al., 2012). Orthologs to the flagellin domain proteins Athe_0597/COB47_0918 were enriched in the substrate-bound and whole-cell proteomes of all seven species analyzed, which supports the theory that this protein is involved in cellular adherence to cellulose. In addition, analysis of the substrate-bound and extracellular proteomes determined that the proteins forming flagella of weakly cellulolytic species were enriched in the substrate-bound proteome, compared with enrichment in the extracellular proteome for highly cellulolytic species (Blumer-Schuette et al., 2012). Interestingly, comparisons between the extracellular proteomes of C. obsidiansis grown on cellobiose and cellulose found that more flagellum-related proteins were detected in the cellulose-induced proteome, possibly related to repression by the disaccharide cellobiose (Lochner et al., 2011a). Other extracellular structures important for cellulose adherence are encoded by a genomic locus upstream of the GDL (Fig. 10), which contains genes predicted to form a type 4 pilus (T4P). Proteins that form the T4P were also detected predominantly in the cellulose-induced membrane proteome from C. bescii (Dam et al., 2011) and in the cellulose-induced whole-cell proteome from C. obsidiansis (Lochner et al., 2011a). This cluster was highly enriched in the substrate-bound proteome of strongly cellulolytic Caldicellulosiruptor species. Two hypothetical proteins in the cluster were also classified as ‘adhesins’, based on their extreme partitioning in the substrate-bound proteomes of cellulolytic species (Blumer-Schuette et al., 2012).

Caldicellulosiruptor plant biomass transcriptomes

Ultimately, the bacterial platform used for CBP development will need to attach to and hydrolyze plant biomass efficiently. Limited transcriptomic (VanFossen et al., 2011) and proteomic data (Lochner et al., 2011a) exist for Caldicellulosiruptor species grown on biomass. However, the available physiological data indicate that all sequenced members of this genus grow well on acid-pretreated switchgrass (Blumer-Schuette et al., 2012). Tolerance to compounds released during pretreatment of biomass is of concern, with one study noting that C. saccharolyticus is sensitive to furans that are dehydration products of C5 and C6 sugars (de Vrije et al., 2009). Thus, the use of untreated biomass would be ideal for a CBP process. Currently, two members of the genus, C. bescii and C. saccharolyticus, have been shown to grow on untreated biomass such as poplar (Yang et al., 2009b; VanFossen et al., 2011), and C. bescii also grew on untreated switchgrass (Yang et al., 2009b). A comparison of the transcriptional response of C. saccharolyticus grown on untreated poplar or acid-pretreated switchgrass with its response to monosaccharides noted that predicted xylan and xyloglucan transporters were up-regulated as expected; genes were up-regulated from the GDL, with CelB showing the highest up-regulation, and xylanase-containing genomic loci also responded to plant biomass (Fig. 11, VanFossen et al., 2011). Compounds that are released during pretreatment of plant biomass interfere with the methods used for proteomic analysis, so only the whole-cell proteome is available for C. obsidiansis. Of the most abundant proteins in the switchgrass-induced proteome, an SBP (COB47_0549) was 3.3-fold more abundant than in the cellobiose-induced proteome (Lochner et al., 2011a). The role of this transporter in uptake of oligosaccharides released from biomass is supported by up-regulation of the orthologous gene from C. saccharolyticus (Csac_0681) when grown on plant biomass (VanFossen et al., 2009). Enzymes important for hydrolysis of the oligosaccharides released during plant biomass deconstruction were also noted as being enriched in the switchgrass-induced proteome from C. obsidiansis, including five enzymes from the Caldicellulosiruptor core genome with members from GH families 3, 28, 31, 51, and 67. Two extracellular enzymes, the core S-layer-located GH5 enzyme and CelA, were also noted as being enriched in the switchgrass proteome, which is of note because elevated intracellular levels should correlate with elevated extracellular levels (Lochner et al., 2011a).

Figure 11.

Transcript levels of select regions from the Caldicellulosiruptor saccharolyticus genome. Transcript levels represent ORFs transcribed with a least-squares mean larger than ± 1.5. Red represents higher than average and green lower than average transcript levels. From outer to inner rings: open reading frames (orange: + strand, green: − strand); XG, xyloglucan; XGO, xylogluco-oligosaccharides; xylan; X, xylose; G, glucose; Pop, poplar; SG, switchgrass; glycoside hydrolases (blue) and ABC transporters (orange).

Future directions

There have never been better ‘tools’ available for the development of thermophilic microbial strains designed to produce liquid biofuels and chemicals from renewable feedstocks. These include genome sequence information, a spectrum of powerful ‘omics’ capabilities, molecular genetics, and the potential to ‘customize’ plant biomass through genetic manipulations to match the capabilities of CBP candidates. These tools can be used to maximize the genetic potential of candidate CBP microorganisms. At the same time, there are still many questions to be answered about the prospective thermophilic platforms under consideration – whether cellulosomal or noncellulosomal. How much pretreatment is necessary for significant conversion efficiencies? To what extent, if any, are thermophiles advantageous compared to mesophiles? How genetically stable are wild-type and recombinant CBP thermophiles in the face of heterogeneous biomass feedstocks? Are metabolic engineering strategies ultimately limited by core bioenergetic constraints? How resistant can thermophilic CBP microorganisms be to solvents and chemicals, and to the potentially inhibitory influences of lignin-derived moieties? While many of these questions have been asked about less thermophilic CBP candidates, the fact that much less is known about thermophile growth physiologies perhaps means that the answers will be different. De novo design of CBP microorganisms, if this comes to pass sometime in the future, will rely heavily upon what is learned from wild-type and recombinant systems currently being considered. Perhaps, this information will be used for optimization or at least provide the basis for rational approaches to develop commercially important strains. In any case, the considerable interest in producing biofuels and industrial chemicals from renewable feedstocks has accelerated scientific and technological efforts to bring this element to the energy landscape. The relative importance of thermophiles in this picture will depend on the outcome of efforts currently underway.


The BioEnergy Science Center is a U.S. Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the DOE under Contract DE-AC05-00OR22725.