Cellulose factories: advancing bioenergy production from forest trees


Author for correspondence:
Alexander A. Myburg
Tel: +27 0 12 4204945
Email: zander.myburg@fabi.up.ac.za


Fast-growing, short-rotation forest trees, such as Populus and Eucalyptus, produce large amounts of cellulose-rich biomass that could be utilized for bioenergy and biopolymer production. Major obstacles must be overcome before these genera can be deployed as energy crops, including the effective removal of lignin and the subsequent liberation of carbohydrate constituents from wood cell walls. However, significant opportunities exist both to select for and to engineer the structure and interaction of cell wall biopolymers, which could improve processing and product development. The molecular underpinnings and regulation of cell wall carbohydrate biosynthesis are rapidly being elucidated, providing tools to strategically guide the targeted modifications required to adapt forest trees to the emerging bioeconomy. Much insight has already been gained from the perturbation of individual genes and pathways, but it is not known to what extent natural variation in the sequence and expression of these same genes underlies the inherent variation in wood properties of field-grown trees. The integration of data from next-generation genomic technologies applied in natural and experimental populations will enable a systems genetics approach to the study of cell wall carbohydrate production in trees, and should advance the development of future woody bioenergy and biopolymer crops.


With the growing need for alternative sources of energy and raw materials, fast-growing plantation tree species, such as Populus and Eucalyptus, are important candidates for renewable sources of lignocellulosic biomass (for recent reviews on the feasibility of bioenergy production from wood biomass, refer to Carroll & Somerville, 2009; Hinchee et al., 2009; Mansfield, 2009; Richard, 2010; Somerville et al., 2010; Séguin, 2011). These two genera, broadly representing the Northern and Southern Hemispheres, respectively, produce large amounts of woody biomass (> 50 m³ ha⁻¹ yr⁻¹ for eucalypts in highly productive areas, such as Brazil) in relatively short rotation times and, in general, do not compete for land dedicated to food crop production. In addition, unlike agriculture-derived biomass, tree-derived lignocellulosics can be harvested all year round, ensuring a stable, predictable and constant supply of raw material for bioenergy or biofuel production. Establishment costs and carbon footprints of multiyear forest plantations are also lower than those of annually planted crops, especially for coppicing eucalypt species, which can be grown on marginal lands (Hinchee et al., 2009). Well-established industrial breeding programmes already exploit the substantial genetic variation inherent in these genera, which can be (and has been) expanded by interspecific hybridization and ultimately captured in clonal plantations (Grattapaglia et al., 2009). The processing of wood fibre and, especially, cellulose from woody biomass has been improved and optimized for decades, providing a technology base from which to develop processing plants for biofuels and biomaterials. One major consideration that is often overlooked when forecasting bioenergy feedstocks is that bioenergy end-uses will have to compete with the high-value products derived from chemical cellulose and its derivatives (Fig. 1), and that many of the traits desired for bioenergy applications are also desired for chemical cellulose production. Thus, the objective of improving feedstock characteristics in trees is complementary to current tree breeding programmes directed at traditional forest-reliant industries.

Figure 1.

Examples of the diversity of currently produced, high-value derivatives of wood-derived cellulose. The structure of the repeating unit of cellulose, cellobiose, is shown in the middle, with a ‘head-to-tail’ arrangement of two glucose molecules joined by a β-1,4 linkage. Substitution of the hydroxyl groups at C2, C3 and/or C6 (highlighted in red) with side chains yields a variety of derivatives with distinct physicochemical properties, which are used in diverse industrial and commercial products (top). Pure cellulose can also be broken down into microcrystalline cellulose (bottom) by chemical disruption of the noncrystalline regions or, alternatively, the entire polymer can be separated into nanocellulose crystals.

Cellulose-rich biomass derived from fast-growing tree species offers many advantages over agricultural feedstocks for bioenergy production, but the removal of lignin to allow the effective and efficient extraction of cell wall carbohydrates remains one of the primary hurdles (Studer et al., 2011). Efficient deconstruction of lignocellulosic biomass requires a detailed understanding of how wood cell walls are synthesized, deposited and modified in planta (Mansfield, 2009). Recent research has mainly focused on the modification of lignin, the most abundant natural biopolymer after cellulose (Vanholme et al., 2008), but much remains to be learned about how the synthesis of cellulose can be modified and regulated, and how such changes affect the overall chemistry and ultrastructure of wood cell walls. Although major advances have been made in understanding the biosynthesis of cellulose itself (Joshi & Mansfield, 2007), the underlying cellular and biochemical processes that influence cellulose properties in wood cell walls have not yet been fully dissected.

Most of our current knowledge of cellulose biosynthesis stems from studies in model herbaceous plants, such as Arabidopsis thaliana, and, to some extent, from the extension of this knowledge to woody plant genera, such as Populus (Joshi et al., 2011). The poplar genome sequence (Tuskan et al., 2006) has been available for 5 yr and, as of 2011, the genome sequence of Eucalyptus grandis (A. A. Myburg et al., unpublished) has also been publicly available (http://www.phytozome.net). These two landmark achievements have opened up new avenues for exploiting the genetic variation in forest trees and strategically improving the physicochemical properties of woody biomass. The availability of a genome sequence is particularly important for Eucalyptus, the most widely grown hardwood crop in the world (c. 20 million ha). With advances in next-generation sequencing technologies, comparative genomics can now be used to rapidly transfer knowledge gained from herbaceous models and other woody plants, such as poplars, to accelerate Eucalyptus improvement. However, with so many candidate genes known to influence xylogenesis, how does one prioritize targets when considering forest trees as bioenergy crops? How can one expand the fundamental understanding of the biology and biosynthesis of cellulose and its interaction with other wood cell wall polymers?

Here, we summarize current understanding of the molecular biology of cellulose production in plants, and discuss how the integration of emerging functional genomics technologies with the wealth of fundamental information on wood properties in tree breeding programmes could be used to accelerate the improvement of cellulose and bioenergy potential in trees.

An integrated view of the proteins involved in cellulose biosynthesis and deposition

Historically, research on cellulose biosynthesis has focused on the plasma membrane-localized cellulose synthase (CESA) proteins that constitute the active cellulose synthase complex (CSC), which is ultimately responsible for producing the glucan chains that coalesce to form cellulose microfibrils in the primary and secondary cell walls of plants (Delmer, 1999; Doblin et al., 2002; Saxena & Brown, 2005; Somerville, 2006; Bessueille & Bulone, 2008; Taylor, 2008; Guerriero et al., 2010). Building on these solid foundations, a full understanding of this intricate biological process requires an integrated view that incorporates a diverse set of proteins and regulatory mechanisms. Such a view should take into consideration the variety of cellular processes and metabolic fluxes that could, and do, influence the synthesis, deposition and physical properties of cellulose in the two distinct types of cell wall. This holistic view should also include the inherent and tightly regulated interactions of cellulose with other cell wall biopolymers, such as lignin and hemicellulose. For example, the biosynthesis and deposition of xylan, a major constituent of the dicot secondary cell wall (Scheller & Ulvskov, 2010), is closely coordinated with the deposition of cellulose (Hertzberg et al., 2001; Schrader et al., 2004). Thus, to advance our fundamental understanding and further the biotechnological objective of improving cellulose-rich resources, research should focus on the transcriptional regulation of xylem-forming genes; post-translational modification, protein folding and protein complex assembly; substrate (metabolite) production, transport and availability; the transport of proteins and/or polysaccharides between organelles and to the plasma membrane; and signalling and feedback between the extracellular environment and the cytoplasm, organelles and nucleus.

Using Arabidopsis as the primary model, the proteins and cellular processes currently thought to be involved in, or to influence, the biosynthesis and deposition of cellulose and xylan are illustrated in Fig. 2. At the level of transcriptional regulation, several transcription factors have been shown to directly regulate secondary cell wall CesA genes in Arabidopsis (Zhong et al., 2008; Yamaguchi et al., 2010; Xie et al., 2011). Three of these, SND2, SND3 and MYB103, appear to specifically regulate secondary cell wall CesA genes, but not xylan or lignin genes (Zhong et al., 2008). These transcription factors are part of a complex transcriptional network regulating various aspects of xylogenesis, the extent of which is still being resolved in Arabidopsis (Kubo et al., 2005; Zhong et al., 2006, 2007, 2008; Demura & Fukuda, 2007) and, more recently, in Populus (McCarthy et al., 2010; Zhong & Ye, 2010; Zhong et al., 2010, 2011).

Figure 2.

An integrated view of currently known proteins and some cellular processes involved in cellulose and xylan biosynthesis. Proteins are indicated as coloured circles in the cell areas with which they are associated, and classes of proteins are coloured as indicated by the legend at the bottom left of the figure. It should be noted that the proximity of proteins in the figure does not imply interaction. Actin (blue beads) and microtubules (red and orange tubes) are also shown. References for the inclusion of specific proteins and full protein names can be found in the text.

CESA proteins are synthesized and assembled into complexes in the endoplasmic reticulum (Rudolph, 1987) and, with the help of chaperones, packaged and delivered to the Golgi (Haigler & Brown, 1986). The Golgi (Fig. 2) is also the site for xylan biosynthesis (Bolwell & Northcote, 1983), which can be divided, simplistically, into primer synthesis (PARVUS), chain elongation (IRX9, 10 and 14) and side chain modifications by IRX7, IRX8, PGSIP1, DUF579- and/or DUF231-containing proteins (Brown et al., 2007, 2009, 2011; Lee et al., 2007; York & O’Neill, 2008; Wu et al., 2009, 2010; Jensen et al., 2011). Once the CSCs are assembled, they are transported from the Golgi to the plasma membrane, via the trans-Golgi network, in specialized microtubule-associated compartments (MASCs; Crowell et al., 2009) that interact with actin through MYOSIN (Wightman & Turner, 2008; Szymanski, 2009). At the plasma membrane, MASCs interact with cortical microtubules, possibly, but not conclusively, via KINESIN, and bud vesicles containing CSCs that fuse with and become embedded in the plasma membrane (Giddings et al., 1980; Szymanski, 2009; Crowell et al., 2010).

On the cytoplasmic face (Fig. 2), the CSCs associate with cortical microtubules, putatively through proteins such as the kinesin-like FRAGILE FIBER 1 (FRA1) (Zhong et al., 2002), CESA-interactive protein 1 (CSI1) (Gu et al., 2010) and other microtubule-associated proteins (MAPs). Cortical microtubule organization is therefore important in the regulation and deposition of cellulose, and the structure and orientation of these microtubules are influenced by a variety of factors. After α- and β-TUB are assembled at microtubule assembly sites containing γ-TUB and gamma-complex proteins (Pastuglia & Bouchez, 2007; Cai, 2010), the growth and modification of the microtubules are influenced by strong association with actin via KCH (kinesin with calponin-homology domain) and MAP190 (Cai, 2010), association with other microtubules via MAP65-1, MAP200, TMBP200 (TOBACCO MICROTUBULE BUNDLING POLYPEPTIDE) and/or MICROTUBULE ORGANIZATION 1 (MOR1) (Cai, 2010), and association with the plasma membrane via proteins such as end-binding 1 (EB1) (Morrison, 2007), P-161 (Cai et al., 2005), A. thaliana kinesin 5 (ATK5) (Ambrose & Cyr, 2007; Pastuglia & Bouchez, 2007), SPIRAL 1 (SPR1) (Nakajima et al., 2004, 2006; Sedbrook et al., 2004), cytoplasmic linker proteins (CLIPs) and CLIP-associating proteins (CLASPs) (Galjart, 2005; Ambrose & Wasteneys, 2008) and PHOSPHOLIPASE-D (Cai, 2010). Microtubule length and organization are also modified by KATANIN (McNally & Vale, 1993; Burk et al., 2001; Stoppin-Mellet et al., 2006; Sharma et al., 2007), which can therefore affect the quality and quantity of cellulose. Transamination, tyrosylation or acetylation of microtubules can influence the binding of KINESIN proteins, whereas glutamylation or glycylation of microtubules has been shown to influence KATANIN activity (Cai, 2010). These and other, as yet unidentified, proteins could all have direct or indirect effects on cellulose deposition through their influence on cortical microtubule dynamics.

Movement of the CSC along the membrane is believed to be driven by the force of cellulose microfibril synthesis itself against the cell wall matrix (Diotallevi & Mulder, 2007), and is guided by the cortical microtubules (Paredez et al., 2006), with membrane-associated sucrose synthase (SUSY) providing uridine diphosphate (UDP)-glucose as substrate for the CSC (Fig. 2). Towards the cell wall side, KORRIGAN (KOR; Lane et al., 2001) and possibly other glycosyl hydrolases (GHs) edit elongating cellulose chains as they are synthesized, whereas COBRA/COBL and possibly other glycosylphosphatidylinositol (GPI)-anchored proteins, as well as the fasciclin-like arabinogalactan (FLA) proteins and/or other arabinogalactan proteins (AGPs), are thought to interact with cellulose as it is deposited, and concurrently relay signals back to the cytoplasm to regulate its synthesis (Zhang et al., 2003; Seifert & Roberts, 2007; MacMillan et al., 2010).

Cell wall feedback signalling is mediated by a number of pathways. Recently, the Rop/Rac guanosine triphosphatases (GTPases) (Fig. 2), which are regulated by RIC (ROP-INTERACTIVE CRIB MOTIF-CONTAINING PROTEIN) and ROPGEF (RHO GUANYL-NUCLEOTIDE EXCHANGE FACTOR), have been highlighted as playing an important role in cell wall signalling, together with IQD (IQ DOMAIN) and CTL (CHITINASE-LIKE) proteins, wall-associated kinases (WAKs) and other receptor-like kinases (Oikawa et al., 2010). Examples include, amongst others, the receptor-like kinase THESEUS (Hématy et al., 2007) and KOBITO/ELONGATION DEFECTIVE 1 (ELD1) (Pagant et al., 2002; Lertpiriyapong & Sung, 2003), both of which have been shown to have an impact on cell wall properties. In the secondary cell wall, laccases (LACs) and peroxidases oxidize monolignols, leading to the combinatorial radical coupling of lignin monomers and the formation of the lignin macromolecule (Boerjan et al., 2003; Ralph et al., 2004; Mattinen et al., 2008), whereas other, as yet unidentified, GHs and carbohydrate-binding module (CBM)-containing proteins appear to be involved in mediating cellulose–cellulose, cellulose–xylan, xylan–xylan or xylan–lignin interactions as the different biopolymers are synthesized, deposited and arranged.

In addition to the cellular processes and specific proteins involved in cellulose deposition itself, it is important to consider metabolic flux and its channelling into the various biochemical pathways that lead to the synthesis of cellulose and xylan. For example, a key metabolite is UDP-glucose, the immediate precursor for cellulose biosynthesis by CESA proteins, which can also be readily converted to UDP-xylose for xylan biosynthesis (Fig. 3). UDP-glucose is produced either directly, through the cleavage of sucrose by SUSY, or indirectly, through invertase (Barratt et al., 2009; Kleczkowski et al., 2010), which hydrolyses sucrose to monomeric glucose and fructose. Glucose is then converted to UDP-glucose in three steps: phosphorylation at the C6 position (HEXOKINASE/GLUCOKINASE), transfer of the phosphate to the C1 position (PHOSPHOGLUCOMUTASE) and, finally, reaction of glucose-1-phosphate with UTP, catalysed by UTP-glucose-1-phosphate uridylyltransferase (UGP), to form UDP-glucose. UDP-glucose can be used directly by CESA proteins for cellulose biosynthesis, or converted to UDP-xylose via oxidation to UDP-D-glucuronate by UDP-glucose 6-dehydrogenase (UGHD), followed by the removal of CO2 by uridine-diphosphoglucuronate decarboxylase (UXS). UDP-xylose then serves as the substrate for synthesis of the xylan backbone, with glucuronic acid (GlcA) and acetyl groups added to the backbone or side chains to form heteroxylan.
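The two routes to UDP-glucose described above, and the onward branch to UDP-xylose, can be made explicit by treating the pathway as a small directed graph. The Python sketch below is illustrative only: the edge list is a hypothetical, hand-written encoding of the steps named in this paragraph (it is not an export from KEGG), and the helper `all_routes` is an invented name. It enumerates every enzyme sequence leading from a source metabolite to a target metabolite.

```python
# Hypothetical, hand-written encoding of the steps described in the text;
# each edge is (substrate, product, enzyme). Not derived from KEGG.
PATHWAY = [
    ("sucrose", "UDP-glucose", "SUSY"),
    ("sucrose", "glucose", "INVERTASE"),
    ("glucose", "glucose-6-phosphate", "HEXOKINASE/GLUCOKINASE"),
    ("glucose-6-phosphate", "glucose-1-phosphate", "PHOSPHOGLUCOMUTASE"),
    ("glucose-1-phosphate", "UDP-glucose", "UGP"),
    ("UDP-glucose", "cellulose", "CESA"),
    ("UDP-glucose", "UDP-D-glucuronate", "UGHD"),
    ("UDP-D-glucuronate", "UDP-xylose", "UXS"),
]


def all_routes(edges, source, target):
    """Return every enzyme sequence leading from source to target (simple paths only)."""
    graph = {}
    for substrate, product, enzyme in edges:
        graph.setdefault(substrate, []).append((product, enzyme))

    routes = []

    def walk(node, enzymes, visited):
        if node == target:
            routes.append(enzymes)
            return
        for product, enzyme in graph.get(node, []):
            if product not in visited:  # never revisit a metabolite (no cycles)
                walk(product, enzymes + [enzyme], visited | {product})

    walk(source, [], {source})
    return routes


for route in all_routes(PATHWAY, "sucrose", "UDP-xylose"):
    print(" -> ".join(route))
```

Run as written, the sketch lists the direct route through SUSY and the longer indirect route through invertase, both converging on UGHD and UXS. Attaching flux estimates to the edges would turn this into a simple flux model, but any such values would have to come from measurement rather than from this encoding.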

Figure 3.

Metabolic pathways and processes leading to cellulose and xylan biosynthesis, based on the Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg/), as well as recent literature revealing putative biosynthetic enzymes involved in xylan biosynthesis (Brown et al., 2007, 2009; York & O’Neill, 2008; Oikawa et al., 2010). Metabolites are represented as circles, and enzymatic processes or known enzymes of interest as boxes. BGL, β-glucosidase; CESA, cellulose synthase; SPS, sucrose phosphate synthase; SPP, sucrose phosphate phosphatase; SUSY, sucrose synthase; UDP, uridine diphosphate; UGHD, UDP-glucose 6-dehydrogenase; UGP, UTP-glucose-1-phosphate uridylyltransferase; UXS, uridine-diphosphoglucuronate decarboxylase.

Studies have shown that alterations in the metabolic flux of UDP-glucose can indeed affect the relative abundance and structure of cell wall polysaccharides. For example, upregulation of SUSY in poplar trees resulted in an increase in cell wall thickness of fibres and the production of more cellulose that displayed enhanced crystallinity (Coleman et al., 2009). The combination of SUSY and UGP overexpression in tobacco also resulted in a synergistic increase in plant height and biomass (Coleman et al., 2006). It should be noted that the overall phenotypic effect of increased SUSY or UGP levels is dependent on the source and sink sugars and other metabolites (Haigler et al., 2001; Coleman et al., 2009; Meng et al., 2009), which will vary in different plant species, and under an array of physiological conditions. These studies demonstrate that changes in metabolite levels, through intracellular and intercellular transport or enzymatic activity, could greatly influence the resulting abundance and/or structure of cell wall polysaccharides.

Towards systems genetics of cellulose production in trees

The scale of cellulose biosynthesis and biomass production in fast-growing plantation trees is vastly different from that in herbaceous models, with an emphasis on large-scale cambial cell differentiation, cell elongation, secondary cell wall deposition and programmed cell death. Because developing xylem is such a strong sink, the tree as a system must prioritize the channelling of carbon towards the synthesis of xylem biopolymers. Therefore, information cannot always be directly extended from herbaceous models to trees. Good examples of this are the different outcomes of SUSY overexpression in tobacco (Coleman et al., 2006) and in poplar (Coleman et al., 2009), and the finding that, in Arabidopsis, INVERTASE is necessary and sufficient for normal growth, whereas direct UDP-glucose production through SUSY is not required (Barratt et al., 2009). Recent findings have also suggested that the transcriptional network regulating cell wall biopolymer synthesis in woody plants may be more complex and comprise novel transcription factors not previously linked to secondary cell wall formation in Arabidopsis (Zhong et al., 2011). This implies a need to study the functions of secondary cell wall-related genes independently in trees. Some practical considerations are that very few commercial species and clonal genotypes have optimized transformation protocols, that mature wood properties take several years to develop, and that wood properties are complex traits affected by large numbers of genes. Rigorous glasshouse studies and field trials are required for each candidate, and these carry significant economic, ecological and regulatory burdens (for recent reviews on this issue, see Strauss et al., 2009; Ahuja, 2011; Harfouche et al., 2011). What is required is an approach that prioritizes genes or pathways that underlie variation in wood properties in mature, field-grown trees.

At our disposal is a rich history of tree breeding, resulting in large, structured populations, and large amounts of genetic diversity in these populations (Sederoff et al., 2009; Neale & Kremer, 2011). These resources have been exploited through the application of molecular marker technologies and forward genetics approaches in multiple forest tree pedigrees, where high linkage disequilibrium (LD) has allowed the efficient identification of quantitative trait loci (QTLs; Grattapaglia & Kirst, 2008), as well as in large association populations where low LD has allowed the association of single genes with wood properties (Groover, 2007; Neale & Ingvarsson, 2008). Single gene associations detected in Eucalyptus and Populus (Thumma et al., 2005, 2009; Wegrzyn et al., 2010) have not always been intuitive – for example, the association between a lignin gene (cinnamoyl CoA reductase, CCR) and a physical cellulose property (microfibril angle) in Eucalyptus (Thumma et al., 2005). This illustrates that our understanding of the causal relationship of genes and complex traits is still incomplete.
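A single-gene association of the kind described above is, in its simplest form, a test of whether a phenotype varies with the number of copies of a given allele that each tree carries. The Python sketch below fits an additive model by ordinary least squares, estimating the allele substitution effect (the slope) and the proportion of phenotypic variance explained (R²). The genotypes and trait values are invented for illustration, `additive_association` is a hypothetical helper, and a real analysis would also assess statistical significance and correct for population structure and relatedness.

```python
def additive_association(dosages, phenotypes):
    """Least-squares fit of phenotype = a + b * dosage; returns (b, R^2).

    dosages: allele counts (0, 1 or 2) at one marker; phenotypes: trait values.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        raise ValueError("marker or trait is monomorphic; no association can be estimated")
    slope = sxy / sxx                   # estimated effect of each extra allele copy
    r_squared = sxy ** 2 / (sxx * syy)  # share of phenotypic variance explained
    return slope, r_squared


# Invented toy data: six trees genotyped at one marker, one wood trait each.
dosages = [0, 0, 1, 1, 2, 2]
trait = [20.0, 22.0, 18.0, 19.0, 15.0, 16.0]
slope, r2 = additive_association(dosages, trait)
print(f"effect per allele = {slope:.2f}, R^2 = {r2:.2f}")
```

The additive coding (0, 1, 2) is the simplest choice; dominance effects would need an extra term, and a large R² in a small toy sample like this one says nothing about significance.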

Phenotypic variation in tree breeding populations is influenced by a variety of intrinsic (and measurable) biological processes, mainly those of transcriptional and translational regulation of various biochemical pathways (Du & Groover, 2010), as well as the flux of metabolic intermediates in these pathways (Mansfield, 2009). In addition, these biological processes are strongly influenced by environmental cues and seasonal variation over the lifetime of these long-lived organisms (Groover, 2007). A more holistic research approach encompassing genetic, biochemical and environmental variation must therefore be adopted to understand and improve wood property traits in trees.

Systems genetics (Fig. 4) connects the intermediate components of a complex phenotype (e.g. transcript, protein and metabolite levels) in related individuals to measurable phenotypic traits, such as wood properties or bioenergy potential, in the context of the underlying genetic variation in populations (MacKay et al., 2009; Nadeau & Dudley, 2011). An extension of genetical genomics (Jansen & Nap, 2001), systems genetics is a network approach that explores the interconnectedness of the component levels of biological variation. It has been successfully applied in model organisms, such as Drosophila (Ayroles et al., 2009; Morozova et al., 2009; Jumbo-Lucioni et al., 2010) and mouse (Farber et al., 2011). It has also been applied in humans (Plaisier et al., 2009; Romanoski et al., 2010) and, importantly, in animal breeding (Kadarmideen et al., 2006; Kadarmideen & Janss, 2007, 2009), which has many similarities to plant breeding. The power of systems genetics is that it reveals emergent properties of the system, providing insight into novel gene–gene, gene–trait and trait–trait relationships that would not be detected at the level of the individual. This often allows the reconstruction of complex directional gene regulatory networks and metabolic pathways (Kadarmideen et al., 2006; Keurentjes et al., 2007), adding insight into previously identified single gene associations and the molecular basis of QTLs. Systems genetics could also explain the biology underlying complex phenomena, such as genotype-by-environment (G × E) interactions, epigenetic control, biotic and abiotic interactions and hybrid vigour (heterosis), which are key themes to be addressed in tree improvement in the near future.
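The network step of a systems genetics analysis can be illustrated, in greatly simplified form, by correlating every pair of traits measured in the same individuals and keeping strongly covarying pairs as edges. The Python sketch below does only that: the trait names and values are invented, `covariation_network` and the 0.8 threshold are arbitrary choices made for illustration, and genotypes are not modelled at all, whereas real systems genetics studies also map the loci underlying each component trait and use more robust network-inference methods.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def covariation_network(traits, threshold=0.8):
    """Return edges (trait_a, trait_b, r) for trait pairs with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(traits), 2):
        r = pearson(traits[a], traits[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Invented toy measurements for five related trees: component traits
# (transcript and metabolite levels) alongside a complex wood trait.
traits = {
    "transcript_X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "transcript_Z": [3.0, 1.0, 4.0, 1.0, 5.0],
    "metabolite_Y": [2.1, 3.9, 6.2, 8.0, 9.8],
    "cellulose_content": [40.0, 42.0, 44.0, 46.0, 48.0],
}
for edge in covariation_network(traits):
    print(edge)
```

With these toy values, transcript_X, metabolite_Y and cellulose_content form a tightly connected module, while transcript_Z (|r| of roughly 0.35 with each of the others) falls below the threshold and drops out of the network. Correlation alone cannot give the edges a direction; that is where the genetic component of systems genetics comes in.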

Figure 4.

A systems genetics approach to understanding the molecular basis of complex phenotypic traits in forest trees. Left: systems genetics allows the molecular dissection of polygenic traits by relating phenotypic and genetic variation in experimental populations to measurable component traits (in developing cells, tissues and organs of trees) segregating in the same populations. Right: conceptual network resulting from the integration of the covariation of complex and component traits, revealing novel correlations among genes, expression modules, metabolites and complex wood phenotypes that would not be observed at the level of the individual.

Tree breeding programmes already make use of structured pedigrees and populations replicated across environments, and therefore present an ideal starting place for systems genetics. Variation in transcriptomes has already been studied at the population level in Eucalyptus (Kirst et al., 2005; Grattapaglia & Kirst, 2008) and Populus (Drost et al., 2010). Transcriptome, proteome and metabolome profiling at the population level will allow integrated modelling of biomass production in trees. Systems genetics is complementary to fundamental biological investigations performed in model organisms and will also complement association genetics approaches and genomic selection strategies that are being implemented in forest tree breeding programmes (Grattapaglia & Resende, 2011). Moreover, systems genetics will allow the identification and prioritization of candidate genes for functional genetic testing in glasshouse and field trials of forest trees.


An understanding of how cellulose is deposited during xylogenesis in wood fibre cells has important implications for our ability to manipulate and select for bioenergy traits in trees. We also need to understand the complex genetic relationships and biochemical interactions that underlie wood property variation in tree populations. The application of next-generation DNA and RNA sequencing (Mizrachi et al., 2010), and the adoption of high-throughput proteomics and metabolomics technologies in trees (Abril et al., 2011; Dauwe et al., 2011; Robinson & Mansfield, 2011), will allow integrated approaches to study complex relationships of genes, metabolites and wood (bio)chemistry traits at the population level. A systems genetics approach, which also includes the measurement of bioenergy potential, is a viable and increasingly cost-effective method to dissect complex phenotypes in trees, and will complement genomic selection efforts. It will also permit us to address the fundamental question of whether the same genes linked to cell wall biosynthesis by functional genetic studies in individual genotypes also influence cell wall properties in natural or experimental populations. In addition, the diversity of applications of next-generation DNA sequencing will enable the investigation of other types of regulation, such as allele-specific expression, splice site variation, gene regulation by endogenous small RNAs or epigenetic modification, which may have an impact on the bioenergy potential of forest trees. Finally, the completion of additional tree genome sequences will permit comparative genomics approaches to dissect vital biosynthetic pathways important to industrial trait development, which should form the foundations of the emerging bio-based economy.